Code Organization

Both deployment strategies require separating schema definition from application code.
Schema Definition (setup_pixeltable.py):
  • Defines directories, tables, views, computed columns, indexes
  • Acts as Infrastructure-as-Code for Pixeltable entities
  • Version controlled in Git
  • Executed during initial deployment and schema migrations
Application Code (app.py, endpoints.py, functions.py):
  • Assumes Pixeltable infrastructure exists
  • Interacts with tables via pxt.get_table() and @pxt.udf
  • Handles missing tables/views gracefully
Configuration (config.py):
  • Externalizes model IDs, API keys, thresholds, connection strings
  • Uses environment variables (.env + python-dotenv) or secrets management
  • Never hardcodes secrets
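A config.py along these lines illustrates the pattern. This is a minimal sketch: the variable names and defaults are hypothetical, not Pixeltable APIs, and the python-dotenv import is optional so the module also works where values come straight from the environment.

```python
# config.py -- illustrative sketch; variable names and defaults are hypothetical
import os

try:
    # Optional in local dev: python-dotenv loads .env into the environment
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    # In production, values come from the environment or a secrets manager
    pass

APP_NAMESPACE = os.environ.get('APP_NAMESPACE', 'myapp')
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')  # never hardcoded
SIMILARITY_THRESHOLD = float(os.environ.get('SIMILARITY_THRESHOLD', '0.8'))
```

Application modules then `import config` and read these names, so rotating a key or changing a model ID never requires a code change.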
# setup_pixeltable.py
import pixeltable as pxt
import config

pxt.create_dir(config.APP_NAMESPACE, if_exists='ignore')

pxt.create_table(
    f'{config.APP_NAMESPACE}.documents',
    {
        'document': pxt.Document,
        'metadata': pxt.Json,
        'timestamp': pxt.Timestamp
    },
    if_exists='ignore'  # Idempotent: safe for repeated execution
)

# ---

# app.py
import pixeltable as pxt
import config

# pxt.get_table() raises pxt.Error if the table does not exist;
# fail fast with an actionable message instead of at the first query.
try:
    docs_table = pxt.get_table(f'{config.APP_NAMESPACE}.documents')
except pxt.Error as exc:
    raise RuntimeError(
        f"Table '{config.APP_NAMESPACE}.documents' not found. "
        "Run setup_pixeltable.py first."
    ) from exc

Project Structure

project/
├── config.py              # Environment variables, model IDs, API keys
├── functions.py           # Custom UDFs (imported as modules)
├── setup_pixeltable.py    # Schema definition (tables, views, indexes)
├── app.py                 # Application endpoints (FastAPI/Flask)
├── requirements.txt       # Pinned dependencies
└── .env                   # Secrets (gitignored)
Key Principles:
  • Module UDFs (functions.py): UDFs defined in a module pick up code changes on redeploy and are easier to unit-test.
  • Retrieval Queries (@pxt.query): Encapsulate complex retrieval logic as reusable functions.
  • Idempotency: Use if_exists='ignore' to make setup_pixeltable.py safely re-runnable.
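A minimal functions.py sketch of the first two principles; the table path and the names `word_count` and `recent` are hypothetical, and the module assumes setup_pixeltable.py has already run.

```python
# functions.py -- hypothetical sketch of a module UDF and a reusable query
import pixeltable as pxt

@pxt.udf
def word_count(text: str) -> int:
    """Plain-Python body: unit-testable independently of any table."""
    return len(text.split())

docs = pxt.get_table('myapp.documents')  # assumes the schema already exists

@pxt.query
def recent(n: int):
    """Parameterized retrieval logic, reusable across endpoints."""
    return docs.order_by(docs.timestamp, asc=False).limit(n)
```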

Storage Architecture

Pixeltable is an OLTP database built on embedded PostgreSQL. It uses multiple storage mechanisms:
Important Concept: Pixeltable directories (pxt.create_dir) are logical namespaces in the catalog, NOT filesystem directories.
How Media is Stored:
  • PostgreSQL stores only file paths/URLs, never raw media data.
  • Inserted local files: path stored, original file remains in place.
  • Inserted URLs: URL stored, file downloaded to File Cache on first access.
  • Generated media (computed columns): saved to Media Store (default: local, configurable to S3/GCS/Azure per-column).
  • File Cache size: configure via file_cache_size_g in ~/.pixeltable/config.toml. See configuration guide
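For example, the cache can be capped in ~/.pixeltable/config.toml (the 50 GB value is illustrative):

```toml
# ~/.pixeltable/config.toml
[pixeltable]
file_cache_size_g = 50  # maximum File Cache size, in GB
```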
Deployment-Specific Storage Patterns:
Approach 1 (Orchestration Layer):
  • Pixeltable storage can be ephemeral (re-computable).
  • Processing results exported to external RDBMS and blob storage.
  • Reference input media from S3/GCS/Azure URIs.
Approach 2 (Full Backend):
  • Pixeltable IS the RDBMS (embedded PostgreSQL, not replaceable).
  • Requires persistent volume at ~/.pixeltable (pgdata, media, file_cache).
  • Media Store configurable to S3/GCS/Azure buckets for generated files.

Dependency Management

Virtual Environments: Use venv, conda, or uv to isolate dependencies.
Requirements:
# requirements.txt
pixeltable==0.4.6
fastapi==0.115.0
uvicorn[standard]==0.32.0
pydantic==2.9.0
python-dotenv==1.0.1
sentence-transformers==3.3.0  # If using embedding indexes
  • Pin versions: package==X.Y.Z
  • Include integration packages (e.g., openai, sentence-transformers)
  • Test updates in staging before production

Data Interoperability

Pixeltable integrates with existing data pipelines via import/export capabilities.
Import:
  • CSV, Excel, JSON: pxt.io.import_csv(), pxt.io.import_excel(), pxt.io.import_json()
  • Parquet: pxt.io.import_parquet()
  • Pandas DataFrames: table.insert(df) or pxt.create_table(source=df)
  • Hugging Face Datasets: pxt.io.import_huggingface_dataset()
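For instance, a CSV import might look like this (the table path and file location are placeholders):

```python
import pixeltable as pxt

# Creates the table and infers column types from the CSV contents
docs = pxt.io.import_csv('myapp.docs_import', '/data/incoming/docs.csv')
```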
Export:
  • Parquet: pxt.io.export_parquet(table, path) for data warehousing
  • LanceDB: pxt.io.export_lancedb(table, db_uri, table_name) for vector databases
  • PyTorch: table.to_pytorch_dataset() for ML training pipelines
  • COCO: table.to_coco_dataset() for computer vision
  • Pandas: table.collect().to_pandas() for analysis
# Export query results to Parquet
from datetime import datetime

import pixeltable as pxt

docs_table = pxt.get_table('myapp.documents')
# Compare the Timestamp column against a datetime, not a string
results = docs_table.where(docs_table.timestamp > datetime(2024, 1, 1))
pxt.io.export_parquet(results, '/data/exports/recent_docs.parquet')