Pixeltable automatically tracks every change to your tables—data insertions, updates, deletions, and schema modifications. Query any point in history, undo mistakes, and maintain full reproducibility without manual version management.

How it works

Every operation that modifies a table creates a new version:
```python
import pixeltable as pxt

# The 'demo' directory must exist before tables can be created in it
pxt.create_dir('demo', if_exists='ignore')

# Version 0: Table created
products = pxt.create_table('demo.products', {
    'name': pxt.String,
    'price': pxt.Float
})

# Version 1: Data inserted
products.insert([
    {'name': 'Widget', 'price': 9.99},
    {'name': 'Gadget', 'price': 24.99}
])

# Version 2: Schema changed
products.add_computed_column(price_with_tax=products.price * 1.08)

# Version 3: Data updated
products.update({'price': 19.99}, where=products.name == 'Widget')
```
No configuration is required: versioning is always on.
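To make the version-per-operation idea concrete, here is a toy in-memory model in plain Python. ToyVersionedTable and its methods are hypothetical stand-ins that mimic the four calls above. They are not Pixeltable's API or storage design. The sketch keeps an append-only list of versions, and each mutation stores a frozen copy of the table state.

```python
import copy
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


class ToyVersionedTable:
    """Hypothetical in-memory model of 'every mutation creates a new version'.

    Not Pixeltable's implementation: each mutating method appends a
    deep-copied table state to an append-only list of versions.
    """

    def __init__(self, columns: Dict[str, str]) -> None:
        self.columns = dict(columns)
        self.computed: Dict[str, Callable[[Row], Any]] = {}
        self.versions: List[Dict[str, Any]] = []
        self._commit('schema', [], note='Initial Version')  # version 0

    def _commit(self, change_type: str, rows: List[Row], note: Optional[str] = None) -> None:
        # Deep-copy so later changes can never alter an older version
        self.versions.append({
            'version': len(self.versions),
            'change_type': change_type,
            'columns': dict(self.columns),
            'rows': copy.deepcopy(rows),
            'note': note,
        })

    def _recompute(self, row: Row) -> None:
        # Fill every computed column from the row's current values
        for name, fn in self.computed.items():
            row[name] = fn(row)

    @property
    def version(self) -> int:
        return self.versions[-1]['version']

    @property
    def rows(self) -> List[Row]:
        return copy.deepcopy(self.versions[-1]['rows'])

    def insert(self, new_rows: List[Row]) -> None:
        rows = self.rows
        for r in new_rows:
            row = dict(r)
            self._recompute(row)
            rows.append(row)
        self._commit('data', rows)

    def add_computed_column(self, name: str, fn: Callable[[Row], Any]) -> None:
        self.columns[name] = 'computed'
        self.computed[name] = fn
        rows = self.rows
        for r in rows:
            self._recompute(r)
        self._commit('schema', rows, note=f'Added: {name}')

    def update(self, values: Row, where: Callable[[Row], bool]) -> None:
        rows = self.rows
        for r in rows:
            if where(r):
                r.update(values)
                self._recompute(r)  # dependent computed values follow the update
        self._commit('data', rows)


# Same sequence as the example above, using plain Python callables
t = ToyVersionedTable({'name': 'String', 'price': 'Float'})                         # version 0
t.insert([{'name': 'Widget', 'price': 9.99}, {'name': 'Gadget', 'price': 24.99}])   # version 1
t.add_computed_column('price_with_tax', lambda r: round(r['price'] * 1.08, 2))     # version 2
t.update({'price': 19.99}, where=lambda r: r['name'] == 'Widget')                   # version 3
print(t.version)                 # 3
print(t.versions[1]['rows'][0])  # {'name': 'Widget', 'price': 9.99}
```

Deep-copying the whole state on every commit keeps the sketch short. A production system would likely record changes far more compactly.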

Viewing history

Human-readable history

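```python
products.history()
```
Returns a DataFrame listing every version with its timestamp, change type, and row counts:

| version | created_at | change_type | inserts | updates | deletes | schema_change |
| --- | --- | --- | --- | --- | --- | --- |
| 3 | 2025-01-15 10:30:00 | data | 0 | 1 | 0 | None |
| 2 | 2025-01-15 10:29:00 | schema | 0 | 2 | 0 | Added: price_with_tax |
| 1 | 2025-01-15 10:28:00 | data | 2 | 0 | 0 | None |
| 0 | 2025-01-15 10:27:00 | schema | 0 | 0 | 0 | Initial Version |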

Programmatic access

```python
versions = products.get_versions()  # List of dictionaries, newest version first
latest = versions[0]
print(f"Version {latest['version']}: {latest['inserts']} inserts")
```
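Once you have the version list, ordinary Python can answer questions such as "when did the schema last change?" The sketch below runs on a hard-coded sample copied from the history table above, so it works without Pixeltable. Only the 'version' and 'inserts' keys appear in the example above. The other keys ('change_type', 'updates', 'deletes', 'schema_change') are an assumption that mirrors the history() column names, so check them against your actual get_versions() output.

```python
from typing import Any, Dict, List, Optional

# Hypothetical sample shaped like the history() output above.
# Keys other than 'version' and 'inserts' are assumed, not confirmed.
versions: List[Dict[str, Any]] = [
    {'version': 3, 'change_type': 'data', 'inserts': 0, 'updates': 1, 'deletes': 0, 'schema_change': None},
    {'version': 2, 'change_type': 'schema', 'inserts': 0, 'updates': 2, 'deletes': 0,
     'schema_change': 'Added: price_with_tax'},
    {'version': 1, 'change_type': 'data', 'inserts': 2, 'updates': 0, 'deletes': 0, 'schema_change': None},
    {'version': 0, 'change_type': 'schema', 'inserts': 0, 'updates': 0, 'deletes': 0,
     'schema_change': 'Initial Version'},
]


def latest_version(entries: List[Dict[str, Any]]) -> int:
    # Take the maximum explicitly instead of relying on list order
    return max(e['version'] for e in entries)


def last_schema_change(entries: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    schema_entries = [e for e in entries if e['change_type'] == 'schema']
    return max(schema_entries, key=lambda e: e['version'], default=None)


def net_rows(entries: List[Dict[str, Any]]) -> int:
    # Inserted minus deleted rows across the whole history
    return sum(e['inserts'] - e['deletes'] for e in entries)


print(latest_version(versions))                       # 3
print(last_schema_change(versions)['schema_change'])  # Added: price_with_tax
print(net_rows(versions))                             # 2
```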

Time travel queries

Query any historical version using the table_name:version syntax:
```python
# Get the table at version 1 (before the computed column was added)
products_v1 = pxt.get_table('demo.products:1')
products_v1.collect()  # Returns data as it was at version 1

# Compare with current state
products.collect()  # Returns current data
```
Version handles are read-only: you cannot modify historical data.
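If you build versioned paths in scripts, for example from a logged version number, a small helper keeps the table_name:version convention in one place. split_versioned_path and versioned_path are hypothetical helpers, not Pixeltable functions. They only build and split strings, and pxt.get_table() still does its own parsing.

```python
from typing import Optional, Tuple


def split_versioned_path(path: str) -> Tuple[str, Optional[int]]:
    """Hypothetical helper: split 'dir.table:N' into ('dir.table', N)."""
    base, sep, version = path.rpartition(':')
    if not sep:
        return path, None  # no suffix: refers to the current, mutable table
    if not base or not version.isdigit():
        raise ValueError(f'invalid versioned path: {path!r}')
    return base, int(version)


def versioned_path(table_path: str, version: int) -> str:
    """Hypothetical helper: build the string you would pass to pxt.get_table()."""
    if version < 0:
        raise ValueError('version must be non-negative')
    return f'{table_path}:{version}'


print(split_versioned_path('demo.products:1'))  # ('demo.products', 1)
print(split_versioned_path('demo.products'))    # ('demo.products', None)
print(versioned_path('demo.products', 3))       # demo.products:3
```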

Use cases

  • Debugging: Compare data before and after a problematic update
  • Auditing: Track what changed and when
  • Recovery: Find and extract accidentally deleted or modified data
  • Reproducibility: Query exact data used for a specific model training run
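For the debugging case, comparing two versions amounts to diffing two sets of rows matched on a key. The sketch below works on plain lists of dicts. How you export each version into that shape (for example, from a historical handle and from the current table) is left open and is an assumption. diff_rows is a hypothetical helper, and the sample rows reuse the Widget/Gadget data with the accidental price update from this page.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def diff_rows(before: List[Row], after: List[Row], key: str) -> Dict[str, Any]:
    """Hypothetical helper: compare two versions of rows matched on `key`."""
    old = {r[key]: r for r in before}
    new = {r[key]: r for r in after}
    changed: Dict[Any, Dict[str, tuple]] = {}
    for k in sorted(old.keys() & new.keys()):
        # Record (old, new) for every column whose value differs
        cols = {c: (old[k].get(c), new[k].get(c))
                for c in sorted(old[k].keys() | new[k].keys())
                if old[k].get(c) != new[k].get(c)}
        if cols:
            changed[k] = cols
    return {
        'added': sorted(new.keys() - old.keys()),
        'removed': sorted(old.keys() - new.keys()),
        'changed': changed,
    }


before = [{'name': 'Widget', 'price': 9.99}, {'name': 'Gadget', 'price': 24.99}]
after = [{'name': 'Widget', 'price': 0.0}, {'name': 'Gadget', 'price': 24.99}]
print(diff_rows(before, after, key='name'))
# {'added': [], 'removed': [], 'changed': {'Widget': {'price': (9.99, 0.0)}}}
```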

Reverting changes

Undo the most recent change with revert():
```python
# Oops, wrong update
products.update({'price': 0.00}, where=products.name == 'Widget')

# Undo it
products.revert()  # Removes version N; the table is now at version N-1
```
revert() permanently removes the latest version. This cannot be undone.
You can call revert() multiple times to go back further, but cannot revert past version 0 or past a version referenced by a snapshot.
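The revert rules above can be captured in a few lines. ToyVersionLog is a hypothetical model, not Pixeltable's implementation. Versions are plain integers, and a snapshot pins whichever version is current when the snapshot is created. revert() drops only the latest version and refuses to remove version 0 or a pinned version.

```python
from typing import List, Set


class ToyVersionLog:
    """Hypothetical model of the revert() rules described above."""

    def __init__(self) -> None:
        self.versions: List[int] = [0]  # version 0 = table creation
        self.pinned: Set[int] = set()   # versions referenced by snapshots

    @property
    def current(self) -> int:
        return self.versions[-1]

    def commit(self) -> int:
        self.versions.append(self.current + 1)
        return self.current

    def snapshot(self) -> int:
        # A snapshot pins the version that is current when it is created
        self.pinned.add(self.current)
        return self.current

    def revert(self) -> int:
        latest = self.current
        if latest == 0:
            raise RuntimeError('cannot revert version 0')
        if latest in self.pinned:
            raise RuntimeError(f'version {latest} is referenced by a snapshot')
        self.versions.pop()  # permanent: the removed version is gone
        return self.current


log = ToyVersionLog()
log.commit()          # version 1
log.commit()          # version 2
log.snapshot()        # pins version 2
log.commit()          # version 3 (the bad update)
print(log.revert())   # 2
try:
    log.revert()
except RuntimeError as e:
    print(e)          # version 2 is referenced by a snapshot
```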

Snapshots

Create named, persistent point-in-time copies for long-term preservation:
```python
# Freeze current state before a major data update
baseline = pxt.create_snapshot('demo.products_baseline', products)

# Later: source table changes, but snapshot remains unchanged
products.insert([{'name': 'NewItem', 'price': 99.99}])

products.count()   # 3 rows (updated)
baseline.count()   # 2 rows (frozen)
```
Snapshots vs Time Travel:
  • Time travel (pxt.get_table('table:N')) queries historical versions in place
  • Snapshots create a named, independent copy that persists even if the source table is modified or deleted
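The difference between the two can be modeled with two dictionaries. In this hypothetical sketch (not Pixeltable's storage model), time travel reads a stored version in place through read-only row views, while a snapshot is a separately named copy that later source changes do not touch. The names tables, snapshots, commit, time_travel, and create_snapshot are illustrative only.

```python
import copy
from types import MappingProxyType
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical stores: table path -> list of versions (each a list of rows),
# and snapshot name -> frozen rows
tables: Dict[str, List[List[Row]]] = {'demo.products': [[]]}  # version 0: empty
snapshots: Dict[str, List[Row]] = {}


def commit(path: str, rows: List[Row]) -> int:
    # Append a new version and return its number
    tables[path].append(copy.deepcopy(rows))
    return len(tables[path]) - 1


def time_travel(path: str, version: int) -> Tuple[MappingProxyType, ...]:
    # Read the stored version in place, through read-only row views
    return tuple(MappingProxyType(r) for r in tables[path][version])


def create_snapshot(name: str, path: str) -> None:
    # Named copy of the latest version, decoupled from later source changes
    snapshots[name] = copy.deepcopy(tables[path][-1])


v1 = commit('demo.products', [{'name': 'Widget', 'price': 9.99},
                              {'name': 'Gadget', 'price': 24.99}])
create_snapshot('demo.products_baseline', 'demo.products')
commit('demo.products', tables['demo.products'][-1] + [{'name': 'NewItem', 'price': 99.99}])

print(len(tables['demo.products'][-1]))          # 3 (current)
print(len(time_travel('demo.products', v1)))     # 2 (historical, read in place)
print(len(snapshots['demo.products_baseline']))  # 2 (named copy)
```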

Data lineage

Pixeltable tracks the complete lineage of your data:

Schema lineage

Every computed column records its dependencies:
```python
products.add_computed_column(
    discounted=products.price * 0.9
)
products.add_computed_column(
    discounted_with_tax=products.discounted * 1.08
)

# Pixeltable knows: discounted_with_tax → discounted → price
```
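Column lineage is a dependency graph. The sketch below hard-codes the dependencies from the examples on this page and walks them in both directions: upstream (what a column is derived from) and downstream (what would need recomputing if a column changed). The deps mapping and both functions are hypothetical illustrations. Pixeltable derives this information from the column expressions itself.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency map: computed column -> columns it reads
deps: Dict[str, List[str]] = {
    'price_with_tax': ['price'],
    'discounted': ['price'],
    'discounted_with_tax': ['discounted'],
}


def upstream(col: str) -> List[str]:
    """Every column `col` transitively depends on, nearest first (BFS)."""
    seen: List[str] = []
    frontier = deque(deps.get(col, []))
    while frontier:
        c = frontier.popleft()
        if c not in seen:  # also guards against revisiting shared ancestors
            seen.append(c)
            frontier.extend(deps.get(c, []))
    return seen


def downstream(col: str) -> Set[str]:
    """Computed columns whose values depend, directly or not, on `col`."""
    return {c for c in deps if col in upstream(c)}


print(upstream('discounted_with_tax'))  # ['discounted', 'price']
print(sorted(downstream('price')))      # ['discounted', 'discounted_with_tax', 'price_with_tax']
```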

View lineage

Views automatically track their source tables:
```python
expensive = pxt.create_view(
    'demo.expensive_products',
    products.where(products.price > 20)
)
# View lineage: expensive_products → products
```
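View lineage is a chain of source references. In the hypothetical model below, each view keeps a pointer to its source and a filter, and lineage() follows those pointers back to the base table. This toy simply re-filters on every read, which is a simplification and not how Pixeltable maintains views. The second-level view 'demo.premium_products' is invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class ToyTable:
    path: str
    rows: List[Row]


@dataclass
class ToyView:
    """Hypothetical view: a named filter over a source table or view."""
    path: str
    source: ToyTable | ToyView
    predicate: Callable[[Row], bool]

    @property
    def rows(self) -> List[Row]:
        # Re-filter the source on every read (a simplification)
        return [r for r in self.source.rows if self.predicate(r)]


def lineage(obj: ToyTable | ToyView) -> List[str]:
    """Follow source pointers from a view back to its base table."""
    chain = [obj.path]
    while isinstance(obj, ToyView):
        obj = obj.source
        chain.append(obj.path)
    return chain


products = ToyTable('demo.products', [{'name': 'Widget', 'price': 9.99},
                                      {'name': 'Gadget', 'price': 24.99}])
expensive = ToyView('demo.expensive_products', products, lambda r: r['price'] > 20)
premium = ToyView('demo.premium_products', expensive, lambda r: r['price'] > 50)

products.rows.append({'name': 'NewItem', 'price': 99.99})
print([r['name'] for r in expensive.rows])  # ['Gadget', 'NewItem']
print(lineage(premium))  # ['demo.premium_products', 'demo.expensive_products', 'demo.products']
```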

What’s tracked

| Change type | Tracked information |
| --- | --- |
| insert() | Row count, timestamp, computed values generated |
| update() | Rows affected; old vs. new values (via version comparison) |
| delete() | Number of rows removed |
| add_column() | Column name, type, dependencies |
| add_computed_column() | Column name, expression, dependencies |
| drop_column() | Column removed |
| rename_column() | Old name → new name |

Best practices

Use Snapshots for Milestones

Create snapshots before major data loads, model training runs, or production deployments.

Version Numbers for Reproducibility

Log table version numbers alongside model artifacts: products.get_versions()[0]['version']
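One way to follow this practice is to write a small manifest next to each model artifact. write_data_manifest and read_time_travel_path are hypothetical helpers that use only the standard library, and the 'model' field and 'run-42' value are made-up examples. In a real run the version would come from products.get_versions()[0]['version']. Here it is hard-coded so the sketch runs without Pixeltable. The manifest also stores the table_name:version string, so the exact training data can be reloaded later with time travel.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def write_data_manifest(out_dir: Path, table_path: str, version: int, **extra: Any) -> Path:
    """Hypothetical helper: record which table version a model was trained on."""
    manifest: Dict[str, Any] = {
        'table': table_path,
        'version': version,
        # The string you would pass to pxt.get_table() to reload this data
        'time_travel_path': f'{table_path}:{version}',
        **extra,
    }
    path = out_dir / 'data_manifest.json'
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return path


def read_time_travel_path(manifest_path: Path) -> str:
    return json.loads(manifest_path.read_text())['time_travel_path']


with tempfile.TemporaryDirectory() as d:
    p = write_data_manifest(Path(d), 'demo.products', 3, model='run-42')
    print(read_time_travel_path(p))  # demo.products:3
```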

Revert for Quick Fixes

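Use revert() immediately after a mistake. Because revert() removes versions only from the most recent one backwards, use time travel to locate and recover the affected data when the problem is older.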

Namespace by Environment

Use a separate directory per environment (for example, dev.products and staging.products) so each environment keeps its own independent version history.
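A small helper can keep environment prefixes consistent across scripts. env_table_path, the PXT_ENV environment variable, and the 'prod' entry in ALLOWED_ENVS are hypothetical choices, not Pixeltable conventions. The function only builds the path string that you would then pass to calls such as pxt.create_table() or pxt.get_table().

```python
import os
from typing import Mapping, Optional

ALLOWED_ENVS = ('dev', 'staging', 'prod')  # hypothetical environment names


def env_table_path(table: str, env: Optional[str] = None,
                   environ: Mapping[str, str] = os.environ) -> str:
    """Prefix a table name with its environment directory, e.g. 'dev.products'.

    Falls back to the hypothetical PXT_ENV variable, then to 'dev'.
    """
    env = env or environ.get('PXT_ENV', 'dev')
    if env not in ALLOWED_ENVS:
        raise ValueError(f'unknown environment {env!r}')
    return f'{env}.{table}'


print(env_table_path('products', environ={}))                      # dev.products
print(env_table_path('products', environ={'PXT_ENV': 'staging'}))  # staging.products
```

Validating the environment name up front means a typo such as 'stagng' fails early instead of silently creating a new directory with its own history.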

Comparison with other systems

| Feature | Pixeltable | Git | Delta Lake |
| --- | --- | --- | --- |
| Automatic versioning | ✅ Every operation | Manual commits | ✅ Every operation |
| Time travel queries | table:N syntax | Checkout commit | VERSION AS OF |
| Schema versioning | ✅ Tracked | File-based | ✅ Schema evolution |
| Computed column lineage | ✅ Automatic | N/A | N/A |
| Revert | revert() | git revert | RESTORE |
| Named snapshots | create_snapshot() | Tags/branches | N/A |

Next steps