Code Organization
Both deployment strategies require separating schema definition from application code.
Schema Definition (`setup_pixeltable.py`):
- Defines directories, tables, views, computed columns, indexes
- Acts as Infrastructure-as-Code for Pixeltable entities
- Version controlled in Git
- Executed during initial deployment and schema migrations
Application Code (`app.py`, `endpoints.py`, `functions.py`):
- Assumes Pixeltable infrastructure exists
- Interacts with tables via `pxt.get_table()` and `@pxt.udf`
- Handles missing tables/views gracefully (see the sketch below)
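A minimal sketch of this pattern; the table name `app.docs` and the helper are hypothetical:

```python
# app.py - assumes setup_pixeltable.py has already created the schema
import pixeltable as pxt

TABLE_PATH = 'app.docs'  # hypothetical table created by setup_pixeltable.py

def get_docs_table() -> pxt.Table:
    # Fail fast with an actionable message instead of an opaque error later.
    if TABLE_PATH not in pxt.list_tables():
        raise RuntimeError(
            f'{TABLE_PATH!r} not found; run setup_pixeltable.py first')
    return pxt.get_table(TABLE_PATH)
```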
Configuration (`config.py`):
- Externalizes model IDs, API keys, thresholds, connection strings
- Uses environment variables (`.env` + `python-dotenv`) or secrets management (see the example below)
- Never hardcodes secrets
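For example, a minimal `config.py`; the variable names and defaults are illustrative:

```python
# config.py - every deployment-specific value comes from the environment
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env in development; existing env vars take precedence

OPENAI_API_KEY = os.environ['OPENAI_API_KEY']  # required: fail loudly if unset
EMBED_MODEL_ID = os.getenv('EMBED_MODEL_ID', 'all-MiniLM-L6-v2')
SIMILARITY_THRESHOLD = float(os.getenv('SIMILARITY_THRESHOLD', '0.75'))
```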
Project Structure
- `config.py`
- `functions.py`
- `setup_pixeltable.py`
- `app.py`
Key Principles:
- Module UDFs (`functions.py`): update when the code changes; improve testability.
- Retrieval Queries (`@pxt.query`): encapsulate complex retrieval logic as reusable functions.
- Idempotency: use `if_exists='ignore'` to make `setup_pixeltable.py` safely re-runnable, as in the sketch below.
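A sketch tying these principles together. The directory, table, and UDF names are hypothetical, and it assumes `add_computed_column()` accepts `if_exists` the same way `create_table()` does:

```python
# setup_pixeltable.py - safe to re-run: existing entities are left untouched
import pixeltable as pxt

# In a real project this UDF would live in functions.py and be imported;
# it is defined inline here to keep the sketch self-contained.
@pxt.udf
def clean_text(text: str) -> str:
    return text.strip().lower()

pxt.create_dir('app', if_exists='ignore')
docs = pxt.create_table('app.docs', {'content': pxt.String}, if_exists='ignore')
docs.add_computed_column(cleaned=clean_text(docs.content), if_exists='ignore')

# Reusable retrieval logic, callable from application code.
@pxt.query
def first_docs(limit: int):
    return docs.select(docs.content, docs.cleaned).limit(limit)
```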
Storage Architecture
Pixeltable is an OLTP database built on embedded PostgreSQL and uses multiple storage mechanisms.
How Media is Stored:
- PostgreSQL stores only file paths/URLs, never raw media data.
- Inserted local files: path stored, original file remains in place.
- Inserted URLs: URL stored, file downloaded to File Cache on first access.
- Generated media (computed columns): saved to Media Store (default: local, configurable to S3/GCS/Azure per-column).
- File Cache size: configure via `file_cache_size_g` in `~/.pixeltable/config.toml` (see the configuration guide).
Deployment implications:
- Pixeltable storage can be treated as ephemeral, since derived data is re-computable.
- Processing results can be exported to an external RDBMS and blob storage.
- Input media can be referenced in place via S3/GCS/Azure URIs (illustrated below).
- Pixeltable IS the RDBMS (embedded PostgreSQL, not replaceable with an external database).
- A persistent volume is required at `~/.pixeltable` (pgdata, media, file_cache).
- The Media Store is configurable to S3/GCS/Azure buckets for generated files.
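For example, referencing input media by URI instead of copying it in; the bucket, directory, and table names are hypothetical:

```python
import pixeltable as pxt

pxt.create_dir('media_demo', if_exists='ignore')
videos = pxt.create_table(
    'media_demo.videos', {'video': pxt.Video}, if_exists='ignore')

# Only the URI is stored in PostgreSQL; the file is downloaded into the
# local File Cache the first time the column is actually read.
videos.insert([{'video': 's3://my-bucket/clips/interview.mp4'}])
```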
Dependency Management
Virtual Environments: Use `venv`, `conda`, or `uv` to isolate dependencies.
Requirements:
- Pin versions: `package==X.Y.Z` (sample file below)
- Include integration packages (e.g., `openai`, `sentence-transformers`)
- Test updates in staging before production
Data Interoperability
Pixeltable integrates with existing data pipelines via import/export capabilities. See the Import/Export SDK reference for full details, and the round-trip example after the lists below.
Import:
- CSV, Excel, JSON: `pxt.io.import_csv()`, `pxt.io.import_excel()`, `pxt.io.import_json()`
- Parquet: `pxt.io.import_parquet()`
- Pandas DataFrames: `table.insert(df)` or `pxt.create_table(source=df)`
- Hugging Face Datasets: `pxt.io.import_huggingface_dataset()`
Export:
- Parquet: `pxt.io.export_parquet(table, path)` for data warehousing
- LanceDB: `pxt.io.export_lancedb(table, db_uri, table_name)` for vector databases
- PyTorch: `table.to_pytorch_dataset()` for ML training pipelines
- COCO: `table.to_coco_dataset()` for computer vision
- Pandas: `table.collect().to_pandas()` for analysis
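A short round trip using these APIs; the file paths and table name are hypothetical:

```python
import pixeltable as pxt

t = pxt.io.import_csv('reviews', 'data/reviews.csv')  # CSV -> new table
df = t.collect().to_pandas()                          # results -> pandas
pxt.io.export_parquet(t, 'exports/reviews.parquet')   # table -> Parquet
```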