Concurrent Access & Scaling

Aspect | Details
Thread Safety | Each thread gets its own database connection and transaction context automatically
Locking | Automatic table-level locking for schema changes
Isolation | PostgreSQL SERIALIZABLE isolation prevents data race conditions
Retries | Built-in retry logic handles transient serialization failures
Scaling Dimension | Current Approach | Limitation
Metadata Storage | Single embedded PostgreSQL instance | Vertical scaling (larger EC2/VM)
Compute | Multiple API workers connected to same instance | Shared access to storage volume required
High Availability | Single attached storage volume | Failover requires volume detach/reattach
Multi-node HA and horizontal scaling planned for Pixeltable Cloud (2026).

Web Framework Concurrency

Pixeltable is thread-safe and works with FastAPI, Flask, Django, and other web frameworks out of the box. The key rule: use sync (def) endpoint handlers, not async def.

Why Sync Endpoints

FastAPI (and Starlette) dispatches sync (def) handlers to a thread pool. Each concurrent request gets its own thread, and Pixeltable automatically creates an isolated database connection per thread. This gives you true parallel request handling with no extra configuration.
from pydantic import BaseModel
from fastapi import FastAPI
import pixeltable as pxt

app = FastAPI()

class SearchResult(BaseModel):
    text: str
    score: float

@app.post("/ingest")
def ingest(text: str):
    t = pxt.get_table('myapp/documents')
    status = t.insert([{'text': text}])
    return {'inserted': status.num_rows}

@app.get("/search")
def search(query: str, limit: int = 10) -> list[SearchResult]:
    t = pxt.get_table('myapp/documents')
    sim = t.text.similarity(string=query)
    results = (
        t.order_by(sim, asc=False)
        .limit(limit)
        .select(t.text, score=sim)
        .collect()
    )
    return list(results.to_pydantic(SearchResult))
Do not use async def for endpoints that call Pixeltable. Pixeltable’s API is synchronous. Inside an async def handler, Pixeltable calls block the event loop, serializing all requests and starving other coroutines. With def handlers, FastAPI’s thread pool handles concurrency for you.
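If an async def handler is truly unavoidable (for example, because the same endpoint awaits other async work), one workaround is to hand the Pixeltable call to a worker thread yourself. A minimal sketch using the standard library's asyncio.to_thread; the /ingest-async route is illustrative and reuses the table path from the example above:
import asyncio

@app.post("/ingest-async")
async def ingest_async(text: str):
    def _insert():
        # Runs on a worker thread: Pixeltable gets its own per-thread
        # connection and the event loop is not blocked.
        t = pxt.get_table('myapp/documents')
        return t.insert([{'text': text}])

    status = await asyncio.to_thread(_insert)
    return {'inserted': status.num_rows}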

Returning Query Results

table.select(...).collect() returns a ResultSet object, which Pydantic cannot serialize directly. You have two options.
Option 1: to_pydantic() (recommended for FastAPI). Define a Pydantic model and let Pixeltable validate and convert each row; FastAPI serializes these natively.
class Item(BaseModel):
    name: str
    score: float

@app.get("/rows")
def get_rows() -> list[Item]:
    t = pxt.get_table('myapp/items')
    return list(t.select(t.name, t.score).collect().to_pydantic(Item))
Option 2: to_pandas() + to_dict(). Convert via pandas when you don’t need a Pydantic model.
@app.get("/rows")
def get_rows():
    t = pxt.get_table('myapp/items')
    df = t.select(t.name, t.score).collect().to_pandas()
    return {'rows': df.to_dict(orient='records')}

uvloop Compatibility

Pixeltable is compatible with uvloop, the high-performance event loop used by default in many production deployments. No special configuration is needed — sync endpoints work identically whether the server uses the default asyncio loop or uvloop.
# uvicorn with uvloop (the default when uvloop is installed)
uvicorn app:app --host 0.0.0.0 --port 8000 --workers 1

GPU Acceleration

  • Automatic GPU Detection: Pixeltable uses CUDA GPUs for local models (Hugging Face, Ollama) when available.
  • CPU Fallback: Models run on CPU if no GPU detected (functional but slower).
  • Configuration: Control via CUDA_VISIBLE_DEVICES environment variable.
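For example, setting CUDA_VISIBLE_DEVICES when launching the process pins local model execution to a specific GPU, or forces the CPU fallback; a minimal sketch (ports and script names are illustrative):
# Pin local models to GPU 0 when serving
CUDA_VISIBLE_DEVICES=0 uvicorn app:app --host 0.0.0.0 --port 8000

# Force CPU-only execution, e.g. to exercise the CPU fallback path
CUDA_VISIBLE_DEVICES="" python setup_pixeltable.py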

Error Handling

Error Type | Mode | Behavior
Computed Column Errors | on_error='abort' (default) | Fails entire operation if any row errors
Computed Column Errors | on_error='ignore' | Continues processing; stores None with error metadata
Media Validation | media_validation='on_write' (default) | Validates media during insert (catches errors early)
Media Validation | media_validation='on_read' | Defers validation until media accessed (faster inserts)
Access error details via table.column.errortype and table.column.errormsg.
# Example: Graceful error handling in production
table.add_computed_column(
    analysis=llm_analyze(table.document),
    on_error='ignore'  # Continue processing despite individual failures
)

# Query for errors
errors = table.where(table.analysis.errortype != None).collect()

Testing Transformations Before Deployment

When you add a computed column, Pixeltable executes it immediately for all existing rows. For expensive operations (LLM calls, model inference), validate your logic on a sample first using select(); nothing is stored until you commit with add_computed_column().
# 1. Test transformation on sample rows (nothing stored)
table.select(
    table.text,
    summary=summarize_with_llm(table.text)
).head(3)  # Only processes 3 rows

# 2. Once satisfied, persist to table (processes all rows)
table.add_computed_column(summary=summarize_with_llm(table.text))
This “iterate-then-add” workflow lets you catch errors early without wasting API calls or compute on your full dataset.
Pro tip: Save expressions as variables to guarantee identical logic in both steps:
summary_expr = summarize_with_llm(table.text)
table.select(table.text, summary=summary_expr).head(3)  # Test
table.add_computed_column(summary=summary_expr)          # Commit

Full Tutorial

Step-by-step guide with examples for built-in functions, expressions, and custom UDFs

Schema Evolution

Operation Type | Examples | Impact
Safe | Add columns, add computed columns, add indexes | Incremental computation only
Destructive | Modify computed columns (if_exists='replace'), drop columns/tables/views | Full recomputation or data loss
Production Safety:
# Use if_exists='ignore' for idempotent schema migrations
import pixeltable as pxt
import config

docs_table = pxt.get_table(f'{config.APP_NAMESPACE}/documents')
docs_table.add_computed_column(
    embedding=embed_model(docs_table.document),
    if_exists='ignore'  # No-op if column exists
)
  • Version control setup_pixeltable.py like database migration scripts.
  • Rollback via table.revert() (single operation) or Git revert (complex changes).
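For example, if a newly added computed column turns out to be wrong, revert() undoes the most recent operation on the table; a minimal sketch using the docs_table handle from the snippet above:
# Undo the last table operation (e.g., a bad add_computed_column)
docs_table.revert()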

Updating Models

The most common schema evolution is switching an embedding or LLM model. In a traditional stack this requires a migration script, a compute cluster, reprocessing every row, and a maintenance window. In Pixeltable it’s one line: the old column keeps working while the new one backfills.
Traditional approach:
# 1. Write migration script
# 2. Spin up compute to re-embed all rows (hours of downtime)
# 3. Swap the column in application code
# 4. Deploy during maintenance window
# 5. Monitor for consistency issues

data = db.query("SELECT id, content FROM documents")
for row in data:
    new_vec = new_model.encode(row["content"])
    db.execute("UPDATE documents SET embedding = %s WHERE id = %s", (new_vec, row["id"]))
Pixeltable approach:
# Add a new computed column. Old column still serves queries — zero downtime.
docs.add_computed_column(
    embedding_v2=sentence_transformer(docs.text, model_id='intfloat/e5-large-v2'),
    if_exists='ignore'
)
# Pixeltable backfills in batches, rate-limited, with automatic retries.
# Switch your queries to embedding_v2 when ready.
Because both columns coexist, you can A/B test retrieval quality before cutting over — no rollback plan needed.
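As a rough sketch of such a comparison, assuming both embedding columns store plain vectors and that query_vec_v1 and query_vec_v2 are query embeddings you computed with the old and new models (hypothetical variables, not part of Pixeltable):
import numpy as np

# Pull text plus both embedding columns; to_pandas() is shown earlier in this guide.
rows = (
    docs.select(docs.text, docs.embedding, docs.embedding_v2)
    .collect()
    .to_pandas()
    .to_dict(orient='records')
)

def top_k(query_vec, col_name, k=5):
    # Rank rows by cosine similarity against one embedding column.
    q = np.asarray(query_vec)
    scored = [
        (float(np.dot(np.asarray(r[col_name]), q) /
               (np.linalg.norm(r[col_name]) * np.linalg.norm(q))), r['text'])
        for r in rows
    ]
    return sorted(scored, reverse=True)[:k]

# query_vec_v1 / query_vec_v2: query embeddings from the old and new models (assumed to exist)
print(top_k(query_vec_v1, 'embedding'))      # ranking with the old model
print(top_k(query_vec_v2, 'embedding_v2'))   # ranking with the new model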

Deployment Patterns

Web Applications:
  • Execute setup_pixeltable.py during deployment initialization (a minimal sketch follows this list)
  • Web server processes connect to Pixeltable instance
  • Pixeltable uses connection pooling internally
  • Use sync (def) endpoint handlers for concurrent request support
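A minimal sketch of what such a setup_pixeltable.py can contain, using the idempotent if_exists='ignore' pattern from the Schema Evolution section (the namespace, table name, and schema are placeholders):
# setup_pixeltable.py -- run once at deployment initialization
import pixeltable as pxt

pxt.create_dir('myapp', if_exists='ignore')

docs = pxt.create_table(
    'myapp/documents',
    schema={'text': pxt.String},
    if_exists='ignore',
)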

App Template

Clone a working FastAPI + React app with multimodal upload, search, and agent endpoints already configured.
Batch Processing:
  • Schedule via cron, Airflow, AWS EventBridge, GCP Cloud Scheduler
  • Isolate batch workloads from real-time serving (separate containers/instances)
  • Use Pixeltable’s incremental computation to process only new data
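A minimal sketch of a scheduled batch job built on that behavior: it inserts only records that are new since the last run, and computed columns on the table backfill just those rows (the table path and new_records() helper are placeholders):
# batch_ingest.py -- invoked by cron / Airflow / EventBridge
import pixeltable as pxt

def new_records():
    # Placeholder: fetch only records added since the last run
    # (e.g., from an upstream queue or a bucket listing).
    return [{'text': 'example document'}]

t = pxt.get_table('myapp/documents')
rows = new_records()
if rows:
    status = t.insert(rows)
    # Computed columns (embeddings, summaries, ...) are evaluated
    # incrementally, only for the rows just inserted.
    print(f'inserted {status.num_rows} rows')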
Containers:
  • Docker provides reproducible builds across environments
  • Full Backend: Mount persistent volume at ~/.pixeltable
  • Kubernetes: Use ReadWriteOnce PVC (single-pod write access)
  • Docker Compose or Kubernetes for multi-container deployments
# Dockerfile for Pixeltable application
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Initialize schema and start application
CMD python setup_pixeltable.py && uvicorn app:app --host 0.0.0.0

Environment Management

Multi-Tenancy and Isolation

Isolation Type | Implementation | Use Case | Overhead
Logical | Single Pixeltable instance with directory namespaces (pxt.create_dir(f"user_{user_id}")) | Dev/staging environments, simple multi-user apps | Low
Physical | Separate container instances per tenant | SaaS with strict data isolation | High
Logical Isolation Example:
# Per-user isolation via namespaces
pxt.create_dir(f"user_{user_id}", if_exists='ignore')
user_table = pxt.create_table(f"user_{user_id}/chat_history", schema={...})

High Availability Constraints

Configuration | Status | Details
Single Pod + ReadWriteOnce PVC | ✅ Supported | One active pod writes to dedicated volume. Failover requires volume detach/reattach.
Multiple Pods + Shared Volume (NFS/EFS) | ❌ Not Supported | Will cause database corruption. Do not mount same pgdata to multiple pods.
Multi-Node HA | 🔜 Coming 2026 | Available in Pixeltable Cloud (serverless scaling, API endpoints). Join the waitlist.
Single-Writer Limitation: Pixeltable’s storage layer uses an embedded PostgreSQL instance. Only one process can write to ~/.pixeltable/pgdata at a time.

Troubleshooting

Reset Database (Development Only)

To completely reset Pixeltable’s local state during development:
# Stop all Pixeltable processes first, then:
rm -rf ~/.pixeltable/pgdata ~/.pixeltable/media ~/.pixeltable/file_cache
This deletes all data. Only use in development. For production, use backups and table.revert() or snapshots instead.

Common Issues

Symptom | Cause | Solution
"Cannot connect to database" | Stale lock file | Remove ~/.pixeltable/pgdata/postmaster.pid if no process is running
Slow first query | File cache miss | Files download on first access; subsequent queries are fast
"Table not found" | Wrong namespace | Check pxt.list_tables() and verify config.APP_NAMESPACE
OOM on large media | Full file loaded to memory | Use iterators (FrameIterator, DocumentSplitter) to process incrementally
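For the OOM case, the usual fix is an iterator-backed view that materializes media components incrementally instead of loading whole files; a sketch using FrameIterator on a hypothetical videos table (exact parameters may vary by version):
import pixeltable as pxt
from pixeltable.iterators import FrameIterator

videos = pxt.get_table('myapp/videos')  # hypothetical table with a 'video' column

# One row per extracted frame; frames are produced incrementally
# rather than loading each full video into memory.
frames = pxt.create_view(
    'myapp/video_frames',
    videos,
    iterator=FrameIterator.create(video=videos.video, fps=1),
    if_exists='ignore',
)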

Environment Separation

Use environment-specific namespaces to manage dev/staging/prod configurations:
# config.py
import os

ENV = os.getenv('ENVIRONMENT', 'dev')
APP_NAMESPACE = f'{ENV}_myapp'  # Creates: dev_myapp, staging_myapp, prod_myapp

# Model and API configuration
EMBEDDING_MODEL = os.getenv('EMBEDDING_MODEL', 'intfloat/e5-large-v2')
OPENAI_MODEL = os.getenv('OPENAI_MODEL', 'gpt-4o-mini')

# Optional: Cloud storage for generated media
MEDIA_STORAGE_BUCKET = os.getenv('MEDIA_STORAGE_BUCKET')

Testing

Staging Environment:
  • Mirror production configuration.
  • Test schema changes, UDF updates, application code changes.
  • Use representative data (anonymized or synthetic).
# Test environment with isolated namespace
import pixeltable as pxt

TEST_NS = 'test_myapp'
pxt.create_dir(TEST_NS, if_exists='replace')
# Run setup targeting test namespace
# Execute tests
# pxt.drop_dir(TEST_NS, force=True)  # Cleanup