
Code Organization

Both deployment strategies require separating schema definition from application code.
Schema Definition (setup_pixeltable.py):
  • Defines directories, tables, views, computed columns, indexes
  • Acts as Infrastructure-as-Code for Pixeltable entities
  • Version controlled in Git
  • Executed during initial deployment and schema migrations
Application Code (app.py, endpoints.py, functions.py):
  • Assumes Pixeltable infrastructure exists
  • Interacts with tables via pxt.get_table() and @pxt.udf
  • Handles missing tables/views gracefully
Configuration (config.py):
  • Externalizes model IDs, API keys, thresholds, connection strings
  • Uses environment variables (.env + python-dotenv) or secrets management
  • Never hardcodes secrets
# setup_pixeltable.py
import pixeltable as pxt
import config

pxt.create_dir(config.APP_NAMESPACE, if_exists='ignore')

pxt.create_table(
    f'{config.APP_NAMESPACE}/documents',
    {
        'document': pxt.Document,
        'metadata': pxt.Json,
        'timestamp': pxt.Timestamp
    },
    if_exists='ignore'  # Idempotent: safe for repeated execution
)

# ---

# app.py
import pixeltable as pxt
import config

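# get_table() raises pxt.Error when the table does not exist (the default
# behavior), so catch it and report a setup hint instead of checking for None.
try:
    docs_table = pxt.get_table(f'{config.APP_NAMESPACE}/documents')
except pxt.Error as exc:
    raise RuntimeError(
        f"Table '{config.APP_NAMESPACE}/documents' not found. "
        "Run setup_pixeltable.py first."
    ) from exc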
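
Both files above import a config module. A minimal sketch of what it might contain, assuming all settings come from environment variables. Only APP_NAMESPACE appears in the examples above; the other setting names and the default values are hypothetical, and loading .env through python-dotenv is left as a comment so the sketch runs with the standard library alone:

```python
# config.py (sketch; every name except APP_NAMESPACE is an illustrative assumption)
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# In a real project, `from dotenv import load_dotenv; load_dotenv()` would
# typically run here first so that values from .env reach os.environ.


@dataclass(frozen=True)
class Settings:
    app_namespace: str
    embedding_model_id: str
    similarity_threshold: float
    openai_api_key: Optional[str]  # secret: read from the environment only


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, with non-secret defaults."""
    return Settings(
        app_namespace=env.get('APP_NAMESPACE', 'myapp'),
        embedding_model_id=env.get('EMBEDDING_MODEL_ID', 'all-MiniLM-L6-v2'),
        similarity_threshold=float(env.get('SIMILARITY_THRESHOLD', '0.5')),
        openai_api_key=env.get('OPENAI_API_KEY'),  # None if unset; never hardcoded
    )


settings = load_settings()
APP_NAMESPACE = settings.app_namespace  # the name used by setup_pixeltable.py and app.py
```

Passing a plain dict to load_settings() makes the configuration easy to exercise in tests without touching the real environment.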

Project Structure

project/
├── config.py              # Environment variables, model IDs, API keys
├── functions.py           # Custom UDFs (imported as modules)
├── setup_pixeltable.py    # Schema definition (tables, views, indexes)
├── app.py                 # Application endpoints (FastAPI/Flask)
├── requirements.txt       # Pinned dependencies
└── .env                   # Secrets (gitignored)
Key Principles:
  • Module UDFs (functions.py): Pick up code changes when the module is updated; improve testability.
  • Retrieval Queries (@pxt.query): Encapsulate complex retrieval logic as reusable functions.
  • Idempotency: Use if_exists='ignore' to make setup_pixeltable.py safely re-runnable.

Storage Architecture

Pixeltable is an OLTP database built on embedded PostgreSQL. It uses multiple storage mechanisms, described below.
Important Concept: Pixeltable directories (pxt.create_dir) are logical namespaces in the catalog, NOT filesystem directories.
How Media is Stored:
  • PostgreSQL stores only file paths/URLs, never raw media data.
  • Inserted local files: path stored, original file remains in place.
  • Inserted URLs: URL stored, file downloaded to File Cache on first access.
  • Generated media (computed columns): saved to Media Store (default: local, configurable to S3/GCS/Azure per-column).
  • File Cache size: configure via file_cache_size_g in ~/.pixeltable/config.toml. See the configuration guide.
For large datasets with remote media, consider increasing file cache size to avoid repeated downloads (default is 20% of available disk):
# ~/.pixeltable/config.toml
file_cache_size_g = 50  # 50 GB cache
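
To judge whether the default is enough, it can help to compare it with the amount of remote media a workload touches repeatedly. The following is a rough sketch and not part of Pixeltable: the helper names are hypothetical, "available disk" is assumed to mean free space on the volume that holds ~/.pixeltable, and sizes are computed in GiB without checking how Pixeltable itself counts a gigabyte.

```python
import shutil
from pathlib import Path
from typing import Optional

GIB = 1024 ** 3


def home_volume_free_bytes() -> int:
    """Free bytes on the volume holding the home directory (assumed to hold ~/.pixeltable)."""
    return shutil.disk_usage(Path.home()).free


def default_cache_gib(free_bytes: int, fraction: float = 0.20) -> float:
    """Approximate default cache size: a fraction of free disk space, in GiB."""
    return free_bytes * fraction / GIB


def suggested_cache_gib(free_bytes: int, working_set_gib: float) -> Optional[float]:
    """Return a value to set for file_cache_size_g, or None if the default suffices."""
    if working_set_gib <= default_cache_gib(free_bytes):
        return None
    return working_set_gib
```

For example, suggested_cache_gib(home_volume_free_bytes(), 50) returns None when 20% of free space already covers a 50 GiB working set, and 50 otherwise.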
Deployment-Specific Storage Patterns:
Approach 1 (Orchestration Layer):
  • Pixeltable storage can be ephemeral (re-computable).
  • Processing results exported to external RDBMS and blob storage.
  • Reference input media from S3/GCS/Azure URIs.
Approach 2 (Full Backend):
  • Pixeltable IS the RDBMS (embedded PostgreSQL, not replaceable).
  • Requires persistent volume at ~/.pixeltable (pgdata, media, file_cache).
  • Media Store configurable to S3/GCS/Azure buckets for generated files.
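
For Approach 2, a container that starts without its persistent volume mounted would begin with an empty data directory. A small pre-start check can catch this early. The sketch below is illustrative only: the function name is hypothetical, and the subdirectory names are taken from the list above rather than from any Pixeltable API.

```python
import os
from pathlib import Path
from typing import List

# Subdirectories named in the list above; treated here as an assumption about the layout.
EXPECTED_SUBDIRS = ('pgdata', 'media', 'file_cache')


def check_pixeltable_home(home: Path, first_run: bool = False) -> List[str]:
    """Return a list of problems found with the Pixeltable home directory."""
    if not home.is_dir():
        return [f'{home} does not exist; is the persistent volume mounted?']
    problems = []
    if not os.access(home, os.W_OK):
        problems.append(f'{home} is not writable')
    if not first_run:
        # After the first deployment, missing data directories suggest a fresh, empty volume.
        for name in EXPECTED_SUBDIRS:
            if not (home / name).is_dir():
                problems.append(f'{home / name} is missing')
    return problems
```

A deployment script could call check_pixeltable_home(Path.home() / '.pixeltable') before starting the application and abort if the returned list is not empty.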

Dependency Management

Virtual Environments: Use venv, conda, or uv to isolate dependencies.
Requirements:
# requirements.txt
pixeltable==0.4.6
fastapi==0.115.0
uvicorn[standard]==0.32.0
pydantic==2.9.0
python-dotenv==1.0.1
sentence-transformers==3.3.0  # If using embedding indexes
  • Pin versions: package==X.Y.Z
  • Include integration packages (e.g., openai, sentence-transformers)
  • Test updates in staging before production
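
The pinning rule is easy to enforce mechanically, for example as a CI step. A minimal sketch using only the standard library; the function name is hypothetical, and the pattern handles only simple name==version lines (with optional extras and trailing comments), not the full requirements-file syntax:

```python
import re
from typing import List

# Package name, optional [extras], then an exact '==' pin such as 1.2.3
PINNED = re.compile(r'^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?==[A-Za-z0-9._!+-]+$')


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that are not pinned to an exact version."""
    bad = []
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith('-'):  # skip blank lines and pip options such as -r
            continue
        if not PINNED.match(line):
            bad.append(line)
    return bad
```

Running unpinned_requirements() on the contents of requirements.txt and failing the build when the result is non-empty keeps range specifiers such as >= from slipping in.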

Data Interoperability

Pixeltable integrates with existing data pipelines via import/export capabilities. See the Import/Export SDK reference for full details.
Export:
# Export query results to Parquet
from datetime import datetime

import pixeltable as pxt

docs_table = pxt.get_table('myapp/documents')
# Compare the Timestamp column against a datetime rather than a string literal
results = docs_table.where(docs_table.timestamp > datetime(2024, 1, 1))
pxt.io.export_parquet(results, '/data/exports/recent_docs.parquet')
Last modified on January 25, 2026