
Concurrent Access & Scaling

| Aspect | Details |
|---|---|
| Locking | Automatic table-level locking for schema changes |
| Isolation | PostgreSQL SERIALIZABLE isolation prevents data race conditions |
| Retries | Built-in retry logic handles transient serialization failures |
| Multi-Process | Multiple workers/containers can safely read/write to the same instance |
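Because locking, isolation, and retries are handled inside Pixeltable, concurrent writers need no extra coordination. A minimal sketch, assuming an existing table at an illustrative path:
import pixeltable as pxt

def ingest_worker(batch: list[dict]):
    # Safe to run from several processes or containers at once:
    # SERIALIZABLE isolation plus built-in retries resolve write conflicts.
    docs = pxt.get_table('prod_myapp/documents')  # illustrative path
    docs.insert(batch)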
| Scaling Dimension | Current Approach | Limitation |
|---|---|---|
| Metadata Storage | Single embedded PostgreSQL instance | Vertical scaling (larger EC2/VM) |
| Compute | Multiple API workers connected to the same instance | Shared access to storage volume required |
| High Availability | Single attached storage volume | Failover requires volume detach/reattach |
Multi-node HA and horizontal scaling planned for Pixeltable Cloud (2026).

GPU Acceleration

  • Automatic GPU Detection: Pixeltable uses CUDA GPUs for local models (Hugging Face, Ollama) when available.
  • CPU Fallback: Models run on CPU if no GPU detected (functional but slower).
  • Configuration: Control GPU visibility via the CUDA_VISIBLE_DEVICES environment variable (see the sketch below).
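A minimal sketch for in-process models (e.g., Hugging Face via PyTorch); for a separate runtime such as an Ollama server, set the variable in that process's environment instead:
import os

# Must be set before any CUDA-aware library (e.g., torch) initializes.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'   # restrict to GPU 0
# os.environ['CUDA_VISIBLE_DEVICES'] = ''  # hide all GPUs: force CPU fallback

import pixeltable as pxt  # import only after setting the variable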

Error Handling

| Error Type | Mode | Behavior |
|---|---|---|
| Computed Column Errors | on_error='abort' (default) | Fails entire operation if any row errors |
| Computed Column Errors | on_error='ignore' | Continues processing; stores None with error metadata |
| Media Validation | media_validation='on_write' (default) | Validates media during insert (catches errors early) |
| Media Validation | media_validation='on_read' | Defers validation until media accessed (faster inserts) |
Access error details via table.column.errortype and table.column.errormsg.
# Example: Graceful error handling in production
table.add_computed_column(
    analysis=llm_analyze(table.document),
    on_error='ignore'  # Continue processing despite individual failures
)

# Query for errors
errors = table.where(table.analysis.errortype != None).collect()
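The media_validation modes are chosen up front. A minimal sketch, assuming the option is passed at table creation and using an illustrative table name and schema:
# Defer media validation to read time for faster bulk inserts
import pixeltable as pxt

imgs = pxt.create_table(
    'prod_myapp/images',
    {'img': pxt.Image},
    media_validation='on_read',  # 'on_write' is the default
)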

Testing Transformations Before Deployment

When you add a computed column, Pixeltable executes it immediately for all existing rows. For expensive operations (LLM calls, model inference), validate your logic on a sample first using select(); nothing is stored until you commit with add_computed_column().
# 1. Test transformation on sample rows (nothing stored)
table.select(
    table.text,
    summary=summarize_with_llm(table.text)
).head(3)  # Only processes 3 rows

# 2. Once satisfied, persist to table (processes all rows)
table.add_computed_column(summary=summarize_with_llm(table.text))
This “iterate-then-add” workflow lets you catch errors early without wasting API calls or compute on your full dataset.
Pro tip: Save expressions as variables to guarantee identical logic in both steps:
summary_expr = summarize_with_llm(table.text)
table.select(table.text, summary=summary_expr).head(3)  # Test
table.add_computed_column(summary=summary_expr)          # Commit

Full Tutorial

Step-by-step guide with examples for built-in functions, expressions, and custom UDFs

Schema Evolution

| Operation Type | Examples | Impact |
|---|---|---|
| Safe | Add columns, add computed columns, add indexes | Incremental computation only |
| Destructive | Modify computed columns (if_exists='replace'), drop columns/tables/views | Full recomputation or data loss |
Production Safety:
# Use if_exists='ignore' for idempotent schema migrations
import pixeltable as pxt
import config

docs_table = pxt.get_table(f'{config.APP_NAMESPACE}/documents')
docs_table.add_computed_column(
    embedding=embed_model(docs_table.document),
    if_exists='ignore'  # No-op if column exists
)
  • Version control setup_pixeltable.py like database migration scripts.
  • Rollback via table.revert() (single operation) or Git revert (complex changes); see the sketch below.
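For example, undoing the most recent operation on the table from the snippet above:
# Roll back the last operation (e.g., an unintended add_computed_column)
docs_table.revert()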

Deployment Patterns

Web Applications:
  • Execute setup_pixeltable.py during deployment initialization
  • Web server processes connect to Pixeltable instance
  • Pixeltable uses connection pooling internally
  • Example: FastAPI with pxt.get_table() in endpoint handlers (sketched below)
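A minimal FastAPI sketch; the table path, doc_id column, and response shape are illustrative, and the collected result is assumed to iterate as row dicts:
import pixeltable as pxt
from fastapi import FastAPI

app = FastAPI()

@app.get('/documents/{doc_id}')
def get_document(doc_id: int):
    # get_table() is a lookup, not a data copy; cheap to call per request
    docs = pxt.get_table('prod_myapp/documents')
    result = docs.where(docs.doc_id == doc_id).collect()
    return {'rows': list(result)}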
Batch Processing:
  • Schedule via cron, Airflow, AWS EventBridge, GCP Cloud Scheduler
  • Isolate batch workloads from real-time serving (separate containers/instances)
  • Use Pixeltable’s incremental computation to process only new data (see the sketch below)
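Because computed columns are evaluated incrementally, a batch job only inserts new rows; existing rows are never recomputed. A sketch with illustrative paths:
import pixeltable as pxt

docs = pxt.get_table('prod_myapp/documents')
# Computed columns (embeddings, summaries, ...) run only for these new rows.
docs.insert([{'document': '/data/incoming/new_report.pdf'}])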
Containers:
  • Docker provides reproducible builds across environments
  • Full Backend: Mount persistent volume at ~/.pixeltable
  • Kubernetes: Use ReadWriteOnce PVC (single-pod write access)
  • Docker Compose or Kubernetes for multi-container deployments
# Dockerfile for Pixeltable application
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Initialize schema and start application
CMD python setup_pixeltable.py && uvicorn app:app --host 0.0.0.0

Environment Management

Multi-Tenancy and Isolation

| Isolation Type | Implementation | Use Case | Overhead |
|---|---|---|---|
| Logical | Single Pixeltable instance with directory namespaces (pxt.create_dir(f"user_{user_id}")) | Dev/staging environments, simple multi-user apps | Low |
| Physical | Separate container instances per tenant | SaaS with strict data isolation | High |
Logical Isolation Example:
# Per-user isolation via namespaces
pxt.create_dir(f"user_{user_id}", if_exists='ignore')
user_table = pxt.create_table(f"user_{user_id}/chat_history", schema={...})

High Availability Constraints

| Configuration | Status | Details |
|---|---|---|
| Single Pod + ReadWriteOnce PVC | ✅ Supported | One active pod writes to a dedicated volume. Failover requires volume detach/reattach. |
| Multiple Pods + Shared Volume (NFS/EFS) | ❌ Not Supported | Will cause database corruption. Do not mount the same pgdata to multiple pods. |
| Multi-Node HA | 🔜 Coming 2026 | Available in Pixeltable Cloud (serverless scaling, API endpoints). Join the waitlist. |
Single-Writer Limitation: Pixeltable’s storage layer uses an embedded PostgreSQL instance. Only one process can write to ~/.pixeltable/pgdata at a time.

Troubleshooting

Reset Database (Development Only)

To completely reset Pixeltable’s local state during development:
# Stop all Pixeltable processes first, then:
rm -rf ~/.pixeltable/pgdata ~/.pixeltable/media ~/.pixeltable/file_cache
This deletes all data. Only use in development. For production, use backups and table.revert() or snapshots instead.

Common Issues

| Symptom | Cause | Solution |
|---|---|---|
| "Cannot connect to database" | Stale lock file | Remove ~/.pixeltable/pgdata/postmaster.pid if no process is running |
| Slow first query | File cache miss | Files download on first access; subsequent queries are fast |
| "Table not found" | Wrong namespace | Check pxt.list_tables() and verify config.APP_NAMESPACE |
| OOM on large media | Full file loaded into memory | Use iterators (FrameIterator, DocumentSplitter) to process incrementally |
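For the OOM case, iterators materialize media piece by piece instead of loading whole files into memory. A sketch using FrameIterator, with illustrative table and column names:
import pixeltable as pxt
from pixeltable.iterators import FrameIterator

videos = pxt.get_table('prod_myapp/videos')  # assumes a 'video' column
frames = pxt.create_view(
    'prod_myapp/video_frames',
    videos,
    iterator=FrameIterator.create(video=videos.video, fps=1),  # 1 frame/sec
)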

Environment Separation

Use environment-specific namespaces to manage dev/staging/prod configurations:
# config.py
import os

ENV = os.getenv('ENVIRONMENT', 'dev')
APP_NAMESPACE = f'{ENV}_myapp'  # Creates: dev_myapp, staging_myapp, prod_myapp

# Model and API configuration
EMBEDDING_MODEL = os.getenv('EMBEDDING_MODEL', 'intfloat/e5-large-v2')
OPENAI_MODEL = os.getenv('OPENAI_MODEL', 'gpt-4o-mini')

# Optional: Cloud storage for generated media
MEDIA_STORAGE_BUCKET = os.getenv('MEDIA_STORAGE_BUCKET')

Testing

Staging Environment:
  • Mirror production configuration.
  • Test schema changes, UDF updates, application code changes.
  • Use representative data (anonymized or synthetic).
# Test environment with isolated namespace
import pixeltable as pxt

TEST_NS = 'test_myapp'
pxt.create_dir(TEST_NS, if_exists='replace')
# Run setup targeting test namespace
# Execute tests
# pxt.drop_dir(TEST_NS, force=True)  # Cleanup