Code Organization

Separate schema definition from router logic.
Schema Definition (setup_pixeltable.py):
  • Tables, views, computed columns, indexes, and agent-internal @pxt.query functions
  • Flat module with if_exists='ignore' for idempotency (no setup() wrapper, no _initialized flag)
  • Run once before starting workers: python setup_pixeltable.py
Router Files (routers/data.py, routers/search.py, etc.):
  • Call pxt.get_table() directly to get table handles
  • Define router-facing @pxt.query functions next to the routes that use them
  • No import setup_pixeltable needed; tables already exist from the init step
Configuration (config.py):
  • Externalizes model IDs, API keys, thresholds, connection strings
  • Uses environment variables (.env + python-dotenv) or secrets management
  • Never hardcodes secrets
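A config.py along these lines would satisfy those points. This is a minimal sketch, not the Starter Kit's actual file: every variable name (APP_NAMESPACE, EMBEDDING_MODEL_ID, SIMILARITY_THRESHOLD, OPENAI_API_KEY) and every default value is a hypothetical placeholder. python-dotenv is treated as optional, so the module still imports when only real environment variables are set.

```python
# config.py: minimal sketch; all names and defaults below are hypothetical placeholders
import os

try:
    # python-dotenv copies values from a local .env file into os.environ
    # (variables already set in the environment win by default)
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    pass  # without python-dotenv, rely on variables already set in the environment


def require_env(name: str) -> str:
    """Return a required setting such as an API key, failing fast if it is unset."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f'missing required environment variable: {name}')
    return value


# Non-secret settings may have safe defaults
APP_NAMESPACE = os.getenv('APP_NAMESPACE', 'myapp')
EMBEDDING_MODEL_ID = os.getenv('EMBEDDING_MODEL_ID', 'sentence-transformers/all-MiniLM-L6-v2')
SIMILARITY_THRESHOLD = float(os.getenv('SIMILARITY_THRESHOLD', '0.7'))


def openai_api_key() -> str:
    # Secrets get no default; reading them lazily means importing config never needs the key
    return require_env('OPENAI_API_KEY')
```

Because secrets are read on demand, `import config` also works in setup scripts and tests that never touch the API key.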
# setup_pixeltable.py — creates tables (flat, idempotent, no queries for routers)
import pixeltable as pxt
import config

pxt.create_dir(config.APP_NAMESPACE, if_exists='ignore')

docs = pxt.create_table(
    f'{config.APP_NAMESPACE}/documents',
    {'document': pxt.Document, 'metadata': pxt.Json, 'timestamp': pxt.Timestamp},
    if_exists='ignore',
)

# ---

# routers/data.py — queries live next to the routes that use them
import pixeltable as pxt
from pixeltable.serving import FastAPIRouter
import config

router = FastAPIRouter(prefix="/api/data", tags=["data"])
docs = pxt.get_table(f'{config.APP_NAMESPACE}/documents')

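@pxt.query
def list_documents():
    # Only columns defined in setup_pixeltable.py exist: document, metadata, timestamp
    return docs.select(docs.document, docs.metadata, docs.timestamp).order_by(docs.timestamp, asc=False)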

router.add_query_route(path="/documents", query=list_documents, method="get")

Project Structure

project/
├── config.py              # Environment variables, model IDs, API keys
├── functions.py           # Custom UDFs (imported as modules)
├── setup_pixeltable.py    # Schema definition (tables, views, indexes)
├── main.py                # FastAPI app, mounts routers
├── routers/
│   ├── data.py            # CRUD routes + queries for data pipeline
│   ├── search.py          # Search routes + queries
│   └── agent.py           # Agent routes (declarative + hand-written)
├── requirements.txt       # Pinned dependencies
└── .env                   # Secrets (gitignored)
Key Principles:
  • Schema separate from routers: setup_pixeltable.py defines tables/views/indexes. Router files define @pxt.query functions next to the routes that use them. No cross-imports needed.
  • Module UDFs (functions.py): UDFs imported from a module pick up code changes when the module is updated, and are easier to test in isolation.
  • Idempotency: Use if_exists='ignore' to make setup_pixeltable.py safely re-runnable.
  • Built-in HTTP serving: For standard endpoints, consider pxt serve with a TOML config.
  • return_rows=True: Pass to insert()/update() to get computed column values back without a follow-up query. See HTTP Serving.
  • Multi-worker deployments: With --workers N, run python setup_pixeltable.py before uvicorn so schema creation happens once, not per worker (see Starter Kit Dockerfile).
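One way to enforce that ordering in a container is a small entrypoint script that runs the schema step once and then hands the process over to uvicorn. This is a hypothetical entrypoint.py, not the Starter Kit's Dockerfile: the `serve` argument, the APP_WORKERS variable, and the main:app module path are assumptions.

```python
# entrypoint.py: hypothetical sketch; run as `python entrypoint.py serve`
import os
import subprocess
import sys


def uvicorn_command(app: str = 'main:app', host: str = '0.0.0.0',
                    port: int = 8000, workers: int = 4) -> list[str]:
    """Build the uvicorn CLI invocation for a multi-worker server."""
    return ['uvicorn', app, '--host', host, '--port', str(port),
            '--workers', str(workers)]


def main() -> None:
    # 1. Create tables/views/indexes exactly once, in a single process;
    #    check=True aborts startup if the setup script fails
    subprocess.run([sys.executable, 'setup_pixeltable.py'], check=True)
    # 2. Replace this process with uvicorn; each worker then only calls pxt.get_table()
    cmd = uvicorn_command(workers=int(os.getenv('APP_WORKERS', '4')))
    os.execvp(cmd[0], cmd)


if __name__ == '__main__' and sys.argv[1:2] == ['serve']:
    main()
```

Using `os.execvp` replaces the Python process with uvicorn, so uvicorn becomes the container's main process and receives stop signals directly.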

Pixeltable Starter Kit

See this structure in action: a production-ready FastAPI + React app with schema definition, config, UDFs, and endpoint routers already wired up. Includes deployment configs for Docker, Helm, Terraform (EKS/GKE/AKS), and AWS CDK.

Storage Architecture

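Pixeltable is an OLTP database built on embedded PostgreSQL. It combines several storage mechanisms: PostgreSQL holds the catalog and table data, a File Cache holds downloaded remote media, and a Media Store holds generated media.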
Important Concept: Pixeltable directories (pxt.create_dir) are logical namespaces in the catalog, NOT filesystem directories.
How Media is Stored:
  • PostgreSQL stores only file paths/URLs, never raw media data.
  • Inserted local files: path stored, original file remains in place.
  • Inserted URLs: URL stored, file downloaded to File Cache on first access.
  • Generated media (computed columns): saved to Media Store (default: local, configurable to S3/GCS/Azure per-column).
  • File Cache size: configure via file_cache_size_g in ~/.pixeltable/config.toml. See the configuration guide.
For large datasets with remote media, consider increasing file cache size to avoid repeated downloads (default is 20% of available disk):
# ~/.pixeltable/config.toml
file_cache_size_g = 50  # 50 GB cache

References, Not Copies

Unlike vector databases that require ingesting data into their own storage format, Pixeltable stores references to external files. Your original media stays in S3/GCS/Azure; only computed results (embeddings, metadata, generated media) are stored locally or in configured cloud buckets. This means:
  • No data duplication — you don’t pay for storage twice.
  • Schema changes don’t require re-upload — add a column, not a migration script.
  • Works with existing storage — point Pixeltable at your current buckets.
Deployment-Specific Storage Patterns:
Approach 1 (Orchestration Layer):
  • Pixeltable storage can be ephemeral (re-computable).
  • Processing results exported to external RDBMS and blob storage.
  • Reference input media from S3/GCS/Azure URIs.
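For Approach 1, the export step amounts to copying result rows into the system of record. The sketch below assumes the rows have already been materialized as plain dicts (for example from a collected Pixeltable query; that conversion is not shown) and uses Python's built-in sqlite3 purely as a stand-in for your external RDBMS. `export_rows` and the `doc_results` table are hypothetical.

```python
import json
import sqlite3
from collections.abc import Iterable, Mapping
from typing import Any


def export_rows(conn: sqlite3.Connection, table: str,
                rows: Iterable[Mapping[str, Any]]) -> int:
    """Append result rows (plain dicts) to a table in an external database.

    Returns the number of rows written. Column names come from the first row.
    """
    batch = list(rows)
    if not batch:
        return 0
    columns = list(batch[0])
    # Identifiers cannot be bound as SQL parameters, so validate them instead
    for name in [table, *columns]:
        if not name.isidentifier():
            raise ValueError(f'unsafe SQL identifier: {name!r}')
    col_list = ', '.join(columns)
    placeholders = ', '.join('?' for _ in columns)

    def encode(value: Any) -> Any:
        # Store structured values (e.g. a Json metadata column) as JSON text
        return json.dumps(value) if isinstance(value, (dict, list)) else value

    with conn:  # commit on success, roll back on error
        conn.execute(f'CREATE TABLE IF NOT EXISTS {table} ({col_list})')
        conn.executemany(
            f'INSERT INTO {table} ({col_list}) VALUES ({placeholders})',
            [tuple(encode(row[c]) for c in columns) for row in batch],
        )
    return len(batch)
```

Because only the exported results live in the external database, the Pixeltable side can be rebuilt from source media at any time, which is what makes its storage ephemeral in this approach.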
Approach 2 (Full Backend):
  • Pixeltable IS the RDBMS (embedded PostgreSQL, not replaceable).
  • Requires persistent volume at ~/.pixeltable (pgdata, media, file_cache).
  • Media Store configurable to S3/GCS/Azure buckets for generated files.
All Starter Kit deployment configs set PIXELTABLE_HOME=/data/pixeltable pointing to persistent storage (Docker volumes, K8s PVCs, or EFS). For large media workloads, configure external blob storage:
PIXELTABLE_INPUT_MEDIA_DEST=s3://your-bucket/input    # or gs:// or az://
PIXELTABLE_OUTPUT_MEDIA_DEST=s3://your-bucket/output
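A fail-fast startup check for these variables might look like the following. It is a sketch that only accepts the s3://, gs://, and az:// schemes shown above and skips variables that are unset; `check_media_dest` and `check_media_env` are hypothetical helpers, and Pixeltable itself may accept or reject other forms.

```python
import os
from collections.abc import Mapping
from urllib.parse import urlparse

# Schemes taken from the examples above; extend if your deployment uses others
ALLOWED_SCHEMES = {'s3', 'gs', 'az'}
MEDIA_DEST_VARS = ('PIXELTABLE_INPUT_MEDIA_DEST', 'PIXELTABLE_OUTPUT_MEDIA_DEST')


def check_media_dest(uri: str) -> str:
    """Validate one bucket URI of the form scheme://bucket/prefix."""
    parsed = urlparse(uri)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f'unsupported scheme in {uri!r}; expected one of {sorted(ALLOWED_SCHEMES)}')
    if not parsed.netloc:
        raise ValueError(f'missing bucket or container name in {uri!r}')
    return uri


def check_media_env(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the configured media destinations, raising on the first malformed one."""
    configured = {}
    for name in MEDIA_DEST_VARS:
        value = env.get(name)
        if value:  # unset or empty: nothing to validate
            configured[name] = check_media_dest(value)
    return configured
```

Calling `check_media_env()` before the first Pixeltable operation surfaces a typo such as `s3:/bucket` at startup rather than at the first media write.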

Dependency Management

Virtual Environments: Use venv, conda, or uv to isolate dependencies.
Requirements:
# requirements.txt
pixeltable==0.4.6
fastapi==0.115.0
uvicorn[standard]==0.32.0
pydantic==2.9.0
python-dotenv==1.0.1
sentence-transformers==3.3.0  # If using embedding indexes
  • Pin versions: package==X.Y.Z
  • Include integration packages (e.g., openai, sentence-transformers)
  • Test updates in staging before production
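To enforce the pinning rule in CI, a small check can flag any requirement that is not an exact `==` pin. This is a simplified sketch (a regular expression, not a full PEP 508 parser), and `unpinned` is a hypothetical helper name; pip options such as `-r` or `--index-url` are skipped rather than validated.

```python
import re

# name[extras]==version, optionally followed by an environment marker (; ...)
EXACT_PIN = re.compile(
    r'^[A-Za-z0-9][A-Za-z0-9._-]*(\[[A-Za-z0-9,._ -]+\])?'
    r'\s*==\s*[0-9][0-9A-Za-z.+!_-]*\s*(;.*)?$'
)


def unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    problems = []
    for raw in requirements_text.splitlines():
        line = raw.split('#', 1)[0].strip()  # drop trailing comments and whitespace
        if not line or line.startswith('-'):  # skip blanks and pip options (-r, --index-url, ...)
            continue
        if not EXACT_PIN.match(line):  # ranges (>=), wildcards (==2.*), bare names
            problems.append(line)
    return problems
```

In CI you could fail the build whenever `unpinned(Path('requirements.txt').read_text())` returns a non-empty list.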

Data Interoperability

Pixeltable integrates with existing data pipelines via import/export capabilities. See the Import/Export SDK reference for full details.
# Export query results to Parquet
from datetime import datetime

import pixeltable as pxt

docs_table = pxt.get_table('myapp/documents')
# Compare the Timestamp column against a datetime value
results = docs_table.where(docs_table.timestamp > datetime(2024, 1, 1))
pxt.io.export_parquet(results, '/data/exports/recent_docs.parquet')
Last modified on May 9, 2026