Learn more about Pixeltable tables and the data operations with our in-depth guide.

What are Tables?

Tables are the fundamental data storage units in Pixeltable. They function similarly to SQL database tables but with enhanced capabilities designed specifically for AI and ML workflows. Each table consists of columns with defined data types and can store both structured data and unstructured media assets.

In Pixeltable, tables:

  • Persist across sessions, meaning your data remains available even after restarting your environment
  • Maintain strong typing for data consistency
  • Support operations like filtering, querying, and transformation
  • Can handle specialized data types for machine learning and media processing
  • Group logically into directories (namespaces) for organization

Creating a table requires defining a name and schema that describes its structure:

import pixeltable as pxt

# Create a directory to organize tables
pxt.create_dir('example')

# Create a table with a defined schema
films = pxt.create_table('example.films', {
    'title': pxt.String,
    'year': pxt.Int,
    'revenue': pxt.Float
})

Type System

# Schema definition
table = pxt.create_table('example', {
    'text': pxt.String,     # Text data
    'count': pxt.Int,       # Integer numbers
    'score': pxt.Float,     # Decimal numbers
    'active': pxt.Bool,     # Boolean values
    'created': pxt.Timestamp # Date/time values
})

Data Operations

Query Operations

Filter and retrieve data:

# Basic row count
films.count()  # Returns total number of rows

# Basic filtering
films.where(films.budget >= 200.0).collect()

# Select specific columns
films.select(films.title, films.year).collect()

# Limit results
films.limit(5).collect()  # First 5 rows (no specific order)
films.head(5)  # First 5 rows by insertion order
films.tail(5)  # Last 5 rows by insertion order

# Order results
films.order_by(films.budget, asc=False).limit(5).collect()

String Operations

Manipulate text data:

# String contains
films.where(films.title.contains('Inception')).collect()

# String replacement
films.update({
    'plot': films.plot.replace('corporate secrets', 'subconscious secrets')
})

# String functions
films.update({
    'title': films.title.upper(),        # Convert to uppercase
    'length': films.title.len()          # Get string length
})

Insert Operations

Add new data:

# Insert single row
films.insert(
    title='Inside Out 2',
    year=2024,
    plot='Emotions navigate puberty',
    budget=200.0
)

# Insert multiple rows
films.insert([
    {
        'title': 'Jurassic Park', 
        'year': 1993, 
        'plot': 'Dinosaur theme park disaster',
        'budget': 63.0
    },
    {
        'title': 'Titanic', 
        'year': 1997, 
        'plot': 'Ill-fated ocean liner romance',
        'budget': 200.0
    }
])

Update Operations

Modify existing data:

# Update all rows
films.update({
    'budget': films.budget * 1.1  # Increase all budgets by 10%
})

# Conditional updates
films.where(
    films.year < 2000
).update({
    'plot': films.plot + ' (Classic Film)'
})

# Batch updates for multiple rows
updates = [
    {'id': 1, 'budget': 175.0},
    {'id': 2, 'budget': 185.0}
]
films.batch_update(updates)

Delete Operations

Remove data with conditions:

# Delete specific rows
films.where(
    films.year < 1995
).delete()

# Delete with complex conditions
films.where(
    (films.budget < 100.0) & 
    (films.year < 2000)
).delete()

# WARNING: Delete all rows (use with caution!)
# films.delete()  # Without where clause deletes all rows

Column Operations

Manage table structure:

# Add new column
films.add_column(rating=pxt.String)

# Drop column
films.drop_column('rating')

# View schema
films.describe()

Versioning

Manage table versions:

# Revert the last operation
films.revert()  # Cannot be undone!

# Revert multiple times to go back further
films.revert()
films.revert()  # Goes back two operations

Export Operations

Extract data for analysis:

# Get results as Python objects
result = films.limit(5).collect()
first_row = result[0]  # Get first row as dict
timestamps = result['timestamp']  # Get list of values for one column

# Convert to Pandas
df = result.to_pandas()
df['revenue'].describe()  # Get statistics for revenue column

Always use a where() clause with delete(). Without it, all rows in the table will be deleted! While you can use revert() to undo operations, it’s better to be careful with destructive operations.

Best Practices

Schema Definition

  • Use clear naming for directories and tables
  • Document computed column dependencies

Application Code

  • Use get_table() to fetch existing tables
  • Use batch operations for multiple rows

Common Patterns

Additional Resources

Remember that Pixeltable automatically handles versioning and lineage tracking. Every operation is recorded and can be reverted if needed.