Tables are the fundamental data storage units in Pixeltable. They function similarly to SQL database tables but with enhanced capabilities designed specifically for AI and ML workflows. Each table consists of columns with defined data types and can store both structured data and unstructured media assets.In Pixeltable, tables:
Persist across sessions, meaning your data remains available even after restarting your environment
Maintain strong typing for data consistency
Support operations like filtering, querying, and transformation
Can handle specialized data types for machine learning and media processing
Group logically into directories (namespaces) for organization
Creating a table requires defining a name and schema that describes its structure:
Copy
Ask AI
import pixeltable as pxt# Create a directory to organize tablespxt.create_dir('example')# Create a table with a defined schemafilms = pxt.create_table('example.films', { 'title': pxt.String, 'year': pxt.Int, 'revenue': pxt.Float})
Pixeltable allows you to explicitly cast column values to ensure they conform to the expected type. This is particularly useful when working with computed columns or transforming data from external sources.
Copy
Ask AI
# Cast columns to different typestable.update({ 'int_score': table.score.astype(pxt.Int), # Cast float to integer 'string_count': table.count.astype(pxt.String), # Cast integer to string})# Using casting in computed columnsfilms.add_computed_column( budget_category=films.budget.astype(pxt.String) + ' million')# Casting in expressionsfilms.where(films.revenue.astype(pxt.Int) > 100).collect()
Column casting helps maintain data consistency and prevents type errors when processing your data.
# Basic row countfilms.count() # Returns total number of rows# Basic filteringfilms.where(films.budget >= 200.0).collect()# Select specific columnsfilms.select(films.title, films.year).collect()# Limit resultsfilms.limit(5).collect() # First 5 rows (no specific order)films.head(5) # First 5 rows by insertion orderfilms.tail(5) # Last 5 rows by insertion order# Order resultsfilms.order_by(films.budget, asc=False).limit(5).collect()
Create tables or insert data directly from external sources:
Copy
Ask AI
import pixeltable as pxtimport pandas as pd# Create a table from a CSV filetable = pxt.create_table('world_population', source='https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/world-population-data.csv')# Create a table from a pandas DataFramedf = pd.DataFrame({ 'cca3': ['FRA', 'DEU', 'ITA'], 'country': ['France', 'Germany', 'Italy'], 'continent': ['Europe', 'Europe', 'Europe'], 'pop_2023': [68_000_000, 83_000_000, 59_000_000]})table = pxt.create_table('europe_population', source=df)# Insert data from a pandas DataFrame into an existing tablenew_df = pd.DataFrame({ 'cca3': ['ESP', 'GBR', 'POL'], 'country': ['Spain', 'United Kingdom', 'Poland'], 'continent': ['Europe', 'Europe', 'Europe'], 'pop_2023': [47_000_000, 67_000_000, 38_000_000]})table.insert(new_df)
Pixeltable supports importing from various data sources:
CSV files (.csv)
Excel files (.xls, .xlsx)
Parquet files (.parquet, .pq, .parq)
JSON files (.json)
Pandas DataFrames
Pixeltable DataFrames
Hugging Face datasets
Update Operations
Modify existing data:
Copy
Ask AI
# Update all rowsfilms.update({ 'budget': films.budget * 1.1 # Increase all budgets by 10%})# Conditional updatesfilms.where( films.year < 2000).update({ 'plot': films.plot + ' (Classic Film)'})# Batch updates for multiple rowsupdates = [ {'id': 1, 'budget': 175.0}, {'id': 2, 'budget': 185.0}]films.batch_update(updates)
Delete Operations
Remove data with conditions:
Copy
Ask AI
# Delete specific rowsfilms.where( films.year < 1995).delete()# Delete with complex conditionsfilms.where( (films.budget < 100.0) & (films.year < 2000)).delete()# WARNING: Delete all rows (use with caution!)# films.delete() # Without where clause deletes all rows
Column Operations
Manage table structure:
Copy
Ask AI
# Add new columnfilms.add_column(rating=pxt.String)# Drop columnfilms.drop_column('rating')# View schemafilms.describe()
Versioning
Manage table versions:
Copy
Ask AI
# Revert the last operationfilms.revert() # Cannot be undone!# Revert multiple times to go back furtherfilms.revert()films.revert() # Goes back two operations
Export Operations
Extract data for analysis:
Copy
Ask AI
# Get results as Python objectsresult = films.limit(5).collect()first_row = result[0] # Get first row as dicttimestamps = result['timestamp'] # Get list of values for one column# Convert to Pandasdf = resultdf['revenue'].describe() # Get statistics for revenue column
Join Tables
Combine data from multiple tables using different join types.
Returns all records from the left table and matching records from the right table.
Copy
Ask AI
left_join_result = customers.join( orders, on=customers.customer_id == orders.customer_id, how='left').select( customers.name, orders.amount)left_df = left_join_result.collect()print(left_df)# Output will show all customers (1-5), with null for amount where no order exists
Returns all records from the right table and matching records from the left table.
Copy
Ask AI
right_join_result = customers.join( orders, on=customers.customer_id == orders.customer_id, how='right').select( customers.name, orders.amount)right_df = right_join_result.collect()print(right_df)# Output will show all orders (order_id 101-104), with null for name where no customer exists
# table.py - Run once to set uppxt.create_table(..., if_exists="ignore")# app.py - Production codetable = pxt.get_table("myapp.mytable")if table is None: raise RuntimeError("Run table.py first!")