
Concurrent Access & Scaling

| Aspect | Details |
| --- | --- |
| Locking | Automatic table-level locking for schema changes |
| Isolation | PostgreSQL SERIALIZABLE isolation prevents data race conditions |
| Retries | Built-in retry logic handles transient serialization failures |
| Multi-Process | Multiple workers/containers can safely read/write to the same instance |
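Because writes are serialized and transient conflicts are retried automatically, independent worker processes can share one instance without extra coordination. A minimal sketch (the table name and schema are hypothetical):

# worker.py -- any number of these processes can run concurrently
import pixeltable as pxt

events = pxt.get_table('prod_myapp.events')  # hypothetical existing table
# Concurrent inserts are safe: serialization conflicts are retried internally.
events.insert([{'payload': 'worker-1 event'}])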
| Scaling Dimension | Current Approach | Limitation |
| --- | --- | --- |
| Metadata Storage | Single embedded PostgreSQL instance | Vertical scaling (larger EC2/VM) |
| Compute | Multiple API workers connected to the same instance | Shared access to storage volume required |
| High Availability | Single attached storage volume | Failover requires volume detach/reattach |
Multi-node HA and horizontal scaling are planned for Pixeltable Cloud (2026).

GPU Acceleration

  • Automatic GPU Detection: Pixeltable uses CUDA GPUs for local models (Hugging Face, Ollama) when available.
  • CPU Fallback: Models run on the CPU if no GPU is detected (functional but slower).
  • Configuration: Control GPU visibility via the CUDA_VISIBLE_DEVICES environment variable, as sketched below.
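A minimal sketch of pinning inference to a single device; the variable must be set before any CUDA-aware library initializes:

# Restrict local models to GPU 0, or force CPU fallback with an empty value.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # '' hides all GPUs (CPU fallback)

import pixeltable as pxt  # import after setting the variable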

Error Handling

| Error Type | Mode | Behavior |
| --- | --- | --- |
| Computed Column Errors | on_error='abort' (default) | Fails the entire operation if any row errors |
| | on_error='ignore' | Continues processing; stores None with error metadata |
| Media Validation | media_validation='on_write' (default) | Validates media during insert (catches errors early) |
| | media_validation='on_read' | Defers validation until media is accessed (faster inserts) |
Access error details via table.column.errortype and table.column.errormsg.
# Example: Graceful error handling in production
table.add_computed_column(
    analysis=llm_analyze(table.document),
    on_error='ignore'  # Continue processing despite individual failures
)

# Query for errors
errors = table.where(table.analysis.errortype != None).collect()
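Media validation mode is likewise set per table; a minimal sketch, assuming create_table accepts the media_validation option described above and a hypothetical video table:

# Defer media validation for faster bulk inserts
import pixeltable as pxt

videos = pxt.create_table(
    'prod_myapp.videos',           # hypothetical table name
    schema={'video': pxt.Video},
    media_validation='on_read',    # validate only when the media is accessed
)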

Schema Evolution

| Operation Type | Examples | Impact |
| --- | --- | --- |
| Safe | Add columns, add computed columns, add indexes | Incremental computation only |
| Destructive | Modify computed columns (if_exists='replace'), drop columns/tables/views | Full recomputation or data loss |
Production Safety:
# Use if_exists='ignore' for idempotent schema migrations
import pixeltable as pxt
import config

docs_table = pxt.get_table(f'{config.APP_NAMESPACE}.documents')
docs_table.add_computed_column(
    embedding=embed_model(docs_table.document),  # embed_model: an embedding UDF defined elsewhere
    if_exists='ignore'  # No-op if column exists
)
  • Keep setup_pixeltable.py under version control, like database migration scripts.
  • Roll back via table.revert() for a single operation (sketched below) or Git revert for complex changes.
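A minimal rollback sketch for undoing one bad operation:

# Undo the most recent operation (insert, schema change, ...) on a table
import pixeltable as pxt
import config

docs_table = pxt.get_table(f'{config.APP_NAMESPACE}.documents')
docs_table.revert()  # restores the table's previous version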

Deployment Patterns

Web Applications:
  • Execute setup_pixeltable.py during deployment initialization
  • Web server processes connect to the Pixeltable instance
  • Pixeltable uses connection pooling internally
  • Example: FastAPI with pxt.get_table() in endpoint handlers (see the sketch below)
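A minimal FastAPI sketch; the table name and endpoint are hypothetical:

# Obtain a lightweight table handle inside the request handler
import pixeltable as pxt
from fastapi import FastAPI

app = FastAPI()

@app.get('/documents/count')
def count_documents():
    docs = pxt.get_table('prod_myapp.documents')  # hypothetical table
    return {'count': docs.count()}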
Batch Processing:
  • Schedule via cron, Airflow, AWS EventBridge, GCP Cloud Scheduler
  • Isolate batch workloads from real-time serving (separate containers/instances)
  • Use Pixeltable’s incremental computation to process only new data (sketched below)
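A hedged sketch of a scheduled ingestion job; computed columns defined on the table run automatically for the newly inserted rows only:

# batch_ingest.py -- run on a schedule (cron, Airflow, etc.)
import pixeltable as pxt

docs = pxt.get_table('prod_myapp.documents')        # hypothetical table
new_paths = ['s3://my-bucket/incoming/report.pdf']  # hypothetical: newly discovered files
# Only these rows trigger computation; existing rows are untouched.
docs.insert([{'document': p} for p in new_paths])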
Containers:
  • Docker provides reproducible builds across environments
  • Full Backend: Mount persistent volume at ~/.pixeltable
  • Kubernetes: Use ReadWriteOnce PVC (single-pod write access)
  • Docker Compose or Kubernetes for multi-container deployments
# Dockerfile for Pixeltable application
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Initialize schema and start application
CMD python setup_pixeltable.py && uvicorn app:app --host 0.0.0.0

Environment Management

Multi-Tenancy and Isolation

| Isolation Type | Implementation | Use Case | Overhead |
| --- | --- | --- | --- |
| Logical | Single Pixeltable instance with directory namespaces (pxt.create_dir(f"user_{user_id}")) | Dev/staging environments, simple multi-user apps | Low |
| Physical | Separate container instances per tenant | SaaS with strict data isolation | High |
Logical Isolation Example:
# Per-user isolation via namespaces
pxt.create_dir(f"user_{user_id}", if_exists='ignore')
user_table = pxt.create_table(f"user_{user_id}.chat_history", schema={...})

High Availability Constraints

| Configuration | Status | Details |
| --- | --- | --- |
| Single pod + ReadWriteOnce PVC | ✅ Supported | One active pod writes to a dedicated volume. Failover requires volume detach/reattach. |
| Multiple pods + shared volume (NFS/EFS) | ❌ Not supported | Will cause database corruption. Do not mount the same pgdata to multiple pods. |
| Multi-node HA | 🔜 Coming 2026 | Available in Pixeltable Cloud (serverless scaling, API endpoints). Join the waitlist. |
Single-Writer Limitation: Pixeltable’s storage layer uses an embedded PostgreSQL instance. Only one process can write to ~/.pixeltable/pgdata at a time.

Environment Separation

Use environment-specific namespaces to manage dev/staging/prod configurations:
# config.py
import os

ENV = os.getenv('ENVIRONMENT', 'dev')
APP_NAMESPACE = f'{ENV}_myapp'  # Creates: dev_myapp, staging_myapp, prod_myapp

# Model and API configuration
EMBEDDING_MODEL = os.getenv('EMBEDDING_MODEL', 'intfloat/e5-large-v2')
OPENAI_MODEL = os.getenv('OPENAI_MODEL', 'gpt-4o-mini')

# Optional: Cloud storage for generated media
MEDIA_STORAGE_BUCKET = os.getenv('MEDIA_STORAGE_BUCKET')
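A minimal setup sketch that consumes this config; the table name and schema are hypothetical:

# setup_pixeltable.py -- idempotent, environment-aware schema setup
import pixeltable as pxt
import config

pxt.create_dir(config.APP_NAMESPACE, if_exists='ignore')
pxt.create_table(
    f'{config.APP_NAMESPACE}.documents',  # hypothetical table
    schema={'document': pxt.Document},
    if_exists='ignore',                   # no-op on redeploys
)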

Testing

Staging Environment:
  • Mirror production configuration.
  • Test schema changes, UDF updates, application code changes.
  • Use representative data (anonymized or synthetic).
# Test environment with isolated namespace
import pixeltable as pxt

TEST_NS = 'test_myapp'
pxt.create_dir(TEST_NS, if_exists='replace')
# Run setup targeting test namespace
# Execute tests
# pxt.drop_dir(TEST_NS, force=True)  # Cleanup
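For automated tests, the same pattern fits a pytest fixture; a hedged sketch, assuming if_exists='replace_force' and drop_dir behave as in current Pixeltable releases:

# conftest.py -- isolated-namespace fixture
import pytest
import pixeltable as pxt

@pytest.fixture
def test_namespace():
    ns = 'test_myapp'
    pxt.create_dir(ns, if_exists='replace_force')  # fresh namespace, replacing any leftovers
    yield ns
    pxt.drop_dir(ns, force=True)                   # cleanup after the test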