Pandas
Integrating with Pandas: Extending Your Data Science Toolkit
Working with Pandas in Pixeltable
Pixeltable integrates with Pandas, so you can use both tools in a single workflow. The two serve different purposes and complement each other: Pixeltable excels at persistent storage, incremental updates, and multimodal data handling, while Pandas is well suited to in-memory analysis and exploration.
Key Differences
Pixeltable dataframes differ from Pandas DataFrames in two important ways:
- Pixeltable dataframes do not hold data in memory or allow direct updates (use insert/update/delete operations instead)
- Query execution in Pixeltable must be initiated explicitly to return results (for example, by calling collect() or head())
Importing Pandas Data into Pixeltable
The simplest way to import Pandas data into Pixeltable is the import_pandas function:
import pixeltable as pxt
import pandas as pd
# Create or load a Pandas DataFrame
df = pd.read_csv("my_data.csv")
# Create a Pixeltable table from the DataFrame
table = pxt.io.import_pandas("my_table", df)
The import_pandas function:
- Automatically creates a new Pixeltable table
- Infers the column types from your Pandas DataFrame
Data Type Mapping
When importing from Pandas, Pixeltable automatically maps data types:
- Numeric types (int, float) map to the corresponding Pixeltable numeric types (IntType, FloatType)
- String/object types map to StringType
- Datetime types map to TimestampType
- Complex types (lists, dicts) map to JsonType
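To preview how a DataFrame's columns line up with the mapping above before importing, you can inspect the dtypes on the Pandas side. The sketch below is an illustration only: guess_pixeltable_type is a hypothetical helper, not part of Pixeltable, and it approximates the rules listed above rather than reproducing Pixeltable's actual inference logic. The column names are made up for the example.

```python
import pandas as pd
from pandas.api import types as ptypes


def guess_pixeltable_type(series):
    """Hypothetical helper: name the Pixeltable type suggested by the list above.

    This only approximates the documented mapping; it is not Pixeltable's
    own type inference.
    """
    if ptypes.is_integer_dtype(series.dtype):
        return "IntType"
    if ptypes.is_float_dtype(series.dtype):
        return "FloatType"
    if ptypes.is_datetime64_any_dtype(series.dtype):
        return "TimestampType"
    values = list(series.dropna())
    # Lists and dicts are the "complex" values the mapping sends to JSON.
    if values and any(isinstance(v, (list, dict)) for v in values):
        return "JsonType"
    if values and all(isinstance(v, str) for v in values):
        return "StringType"
    return "unknown"  # not covered by the mapping listed above


# Small hand-built DataFrame with one column per mapped category.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "score": [0.5, 1.5, 2.5],
    "name": ["a", "b", "c"],
    "created": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "tags": [["x"], ["y", "z"], {"k": 1}],
})

preview = {col: guess_pixeltable_type(df[col]) for col in df.columns}
print(preview)
```

A preview like this can flag columns that fall outside the listed categories (reported as "unknown") so they can be cleaned up in Pandas before the import.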
Extracting Data to Pandas
You can convert Pixeltable query results to Pandas DataFrames using the to_pandas() method:
# Query Pixeltable and convert to Pandas
result = table.select(table.column1, table.column2).collect()
df = result.to_pandas()
# Now use standard Pandas operations
print(df.describe())
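After converting to Pandas, it can help to check the resulting dtypes before running numeric operations. The sketch below does not call Pixeltable. It builds a small DataFrame by hand, with hypothetical column names id and value, to show a general Pandas behavior: an integer column that contains missing values is stored as float64 unless it is converted to the nullable Int64 dtype.

```python
import pandas as pd

# Stand-in for query results; built by hand for illustration only.
df = pd.DataFrame({
    "id": [1, None, 3],        # integer column with a missing value
    "value": [1.5, 2.5, None],
})

# Pandas stores the integers as floats because of the missing value.
dtype_before = str(df["id"].dtype)
print(dtype_before)

# Convert to the nullable integer dtype to keep whole numbers.
df["id"] = df["id"].astype("Int64")
dtype_after = str(df["id"].dtype)
print(dtype_after)
```

Whether a given query result contains missing values depends on your data, so treat this as a check to run rather than a step that is always required.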
Common Operations Comparison
Here's how common data operations translate between Pixeltable and Pandas:
# Computing a new feature
# Pandas:
df["test"] = df["col1"] - df["col2"]
df["test"].head(5)
# Pixeltable:
table.select(table.col1 - table.col2).head(5)
Best Practices
- Memory Management
  - Use Pixeltable for persistent storage and large datasets
  - Convert to Pandas only for the specific data segments you need to analyze
  - Be cautious with collect() on large tables without limits
- Incremental Processing
  - Let Pixeltable handle incremental updates through computed columns
  - Use Pandas for one-off analytical tasks and exploratory data analysis
- Type Safety
  - Leverage Pixeltable's type system for data validation
  - Be aware of type conversions when moving between Pixeltable and Pandas
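The memory advice above can be wrapped in a small helper that always caps the number of rows before a query is materialized. This is a sketch, not part of Pixeltable: to_pandas_capped and its max_rows default are hypothetical, and the helper assumes the query object exposes limit(), collect(), and to_pandas() in the way the examples on this page use them.

```python
def to_pandas_capped(query, max_rows=1000):
    """Hypothetical helper: run a query with a row cap and return a DataFrame.

    `query` is assumed to be a Pixeltable query such as
    table.select(...).where(...). Nothing is executed until collect()
    is called, so the cap is applied before any rows are fetched.
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    return query.limit(max_rows).collect().to_pandas()
```

With a table like the one in the example workflow, a call might look like to_pandas_capped(table.select(table.id, table.value), max_rows=100). Choosing the cap explicitly at each call site keeps accidental full-table loads out of exploratory code.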
Example Workflow
Here's a complete example showing how to effectively combine Pixeltable and Pandas:
import pixeltable as pxt
import pandas as pd
# Create a Pixeltable table with some data
table = pxt.create_table('example', {
    'id': pxt.IntType(),
    'value': pxt.FloatType()
})
# Insert some data
table.insert([
    {'id': i, 'value': float(i)}
    for i in range(10)
])
# Query specific data and convert to Pandas
df = table.select(
    table.id,
    table.value
).where(
    table.value > 5
).collect().to_pandas()
# Perform Pandas operations
df['squared'] = df['value'] ** 2
# Create new Pixeltable table from results
results_table = pxt.io.import_pandas(
    'results',
    df
)