What Pixeltable Replaces
Most multimodal AI stacks look like this: blob storage for media, a relational database for metadata, a vector database for embeddings, an orchestrator for scheduling, and custom glue code holding it all together.
Traditional stack: 5+ services to deploy and maintain (blob storage, orchestrator, relational DB, vector DB, cache), plus custom retry logic, rate limiting, sync scripts, and error handling to wire them together.
Systems Pixeltable Replaces
You don’t install, configure, or manage these; Pixeltable handles them natively.
| Instead of … | With Pixeltable … |
|---|---|
| PostgreSQL / MySQL | pxt.create_table() — schema is Python, versioned automatically |
| Pinecone / Weaviate / Qdrant | add_embedding_index() — one line, auto-maintained on insert/update/delete |
| S3 / boto3 / blob storage | pxt.Image / Video / Audio / Document types with transparent caching; destination='s3://…' for cloud routing |
| Airflow / Prefect / Celery | Computed columns trigger on insert — no orchestrator, no workers, no DAGs |
| LangChain / LlamaIndex (RAG) | @pxt.query + .similarity() + computed column chaining |
| pandas / polars (multimodal) | .sample(), ephemeral UDFs, then add_computed_column() to commit — same code, prototype to production |
| DVC / MLflow / W&B | Built-in history(), revert(), time travel (table:N), snapshots — zero config |
| Custom retry / rate-limit / caching | Built into every AI integration; results cached, only new rows recomputed |
| Custom ETL / glue code | Declarative schema — Pixeltable handles execution, caching, incremental updates |
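For a sense of what the "Custom retry / rate-limit / caching" row refers to, hand-written glue in a traditional stack often resembles the sketch below. It is a generic standard-library illustration, not Pixeltable code; the name cached_with_retry and its parameters are hypothetical.

```python
import time
from typing import Callable


def cached_with_retry(
    fn: Callable[[str], str],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[str], str]:
    """Wrap fn with an in-memory cache and exponential-backoff retries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    cache: dict[str, str] = {}

    def wrapper(key: str) -> str:
        if key in cache:  # cached result: fn is not called again
            return cache[key]
        for attempt in range(max_attempts):
            try:
                cache[key] = fn(key)
                return cache[key]
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # out of attempts: surface the last error
                sleep(base_delay * 2 ** attempt)  # wait longer after each failure
        raise AssertionError("unreachable")

    return wrapper
```

Every external call in a hand-rolled pipeline needs some variant of this wrapper; per the table above, Pixeltable's AI integrations handle retries, rate limiting, and caching internally.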
Tools Pixeltable Abstracts
These tools run under the hood, but you interact through a cleaner interface. This is a sample: Pixeltable wraps 30+ AI providers, dozens of built-in functions for media and data processing, and supports any Python library via @pxt.udf.
| Tool | Raw usage | Through Pixeltable |
|---|---|---|
| FFmpeg | Install binary, subprocess calls, format conversion, frame seeking | extract_audio(video, format='mp3') for audio; frame_iterator(video, fps=1) for frame extraction via pxt.create_view() |
| Pillow/PIL | Image.open(), resize, convert, encode, save, handle formats | pixeltable.functions.image module: resize(), crop(), thumbnail(), b64_encode(), rotate(), blend(), plus width(), height(), get_metadata() |
| spaCy | pip install spacy, download model, load pipeline, parse documents | document_splitter(doc, separators='sentence') — spaCy runs under the hood (configurable via spacy_model parameter). Also supports 'heading', 'paragraph', 'page', 'token_limit', 'char_limit' separators |
| sentence-transformers | Load model, tokenize, encode batches, normalize vectors | sentence_transformer.using(model_id='intfloat/e5-large-v2') passed to add_embedding_index(). Pixeltable handles model loading, batching, and index maintenance |
| OpenAI CLIP | Load model, preprocess images/text differently, encode, handle multimodal alignment | clip.using(model_id='openai/clip-vit-base-patch32') — multimodal embedding index that accepts both image and text queries for cross-modal search |
| OpenAI Whisper | API key setup, audio format handling, chunking long files, parsing responses | openai.transcriptions(audio=table.audio_col, model='whisper-1') as a computed column — automatic rate limiting, caching. Also supports local Whisper via whisper.transcribe() |
| Anthropic Claude tool calling | Construct messages, define tool schemas as JSON, parse tool_use blocks, execute tools, re-call with results | anthropic.messages() + anthropic.invoke_tools() + pxt.tools() — all as chained computed columns. Tool schemas derived automatically from @pxt.udf function signatures |
| + many more | | See the full SDK Reference, AI Integrations, and Cookbooks |
What Pixeltable Doesn’t Replace
You still need these; Pixeltable is a data layer, not a full application framework.
| Tool | Why you still need it |
|---|---|
| FastAPI / Flask / Django | Standard CRUD endpoints can use built-in HTTP serving; custom logic still needs a framework |
| Pydantic | Request/response validation for your API endpoints (Pixeltable’s .to_pydantic() bridges the two) |
| React / Vue / frontend | UI layer — Pixeltable has no frontend |
| Docker / Kubernetes / Terraform | Deployment infrastructure — Pixeltable runs inside your containers, it doesn’t provision them |
| Authentication / authorization | User management, API keys, OAuth — outside Pixeltable’s scope |
| Domain-specific UDFs | Business logic you write as @pxt.udf functions (e.g., web search, custom scoring) — Pixeltable provides the framework, you provide the logic |
Deployment Decision Guide
Pixeltable supports three production deployment patterns. Choose based on your constraints:
| Question | Answer | Recommendation |
|---|---|---|
| Building a web app with a frontend? | Yes | Full Backend (FastAPI + React) |
| Need an API with zero web code? | Yes | Declarative Serving (pxt serve from TOML) |
| Need batch/background processing (cron, queue, Cloud Run Job)? | Yes | Batch Processing (pure Python script, no HTTP server) |
| Existing production DB that must stay? | Yes | Batch Processing (process in Pixeltable, export_sql to your DB) |
| Need semantic search (RAG)? | Yes | Full Backend or Declarative Serving |
| Expose Pixeltable as MCP server for LLM tools? | Yes | Full Backend + MCP Server |
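The decision table can also be read as a small lookup function. The sketch below is one possible encoding, assuming a priority order that the table itself does not state (infrastructure constraints first, then serving needs); the Constraints fields and the recommend function are illustrative names, not part of Pixeltable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constraints:
    """Answers to the decision-table questions (field names are illustrative)."""
    has_frontend: bool = False
    wants_zero_web_code: bool = False
    batch_or_background: bool = False
    existing_db_must_stay: bool = False
    needs_semantic_search: bool = False
    exposes_mcp_server: bool = False


def recommend(c: Constraints) -> Optional[str]:
    """Return the pattern the table suggests, or None if no row applies."""
    # Assumed priority: hard infrastructure constraints first, then serving needs.
    if c.existing_db_must_stay or c.batch_or_background:
        return "Batch Processing"
    if c.exposes_mcp_server:
        return "Full Backend + MCP Server"
    if c.has_frontend:
        return "Full Backend"
    if c.wants_zero_web_code:
        return "Declarative Serving"
    if c.needs_semantic_search:
        return "Full Backend or Declarative Serving"
    return None


print(recommend(Constraints(has_frontend=True, needs_semantic_search=True)))
```

When several rows apply at once, the ordering of the checks decides the outcome, so adjust it to match whichever constraint is least negotiable in your environment.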
Technical Capabilities (Both)
Regardless of deployment mode, you get:
- Multimodal Types: Native handling of Video, Document, Audio, Image, JSON.
- Computed Columns: Automatic incremental updates and dependency tracking.
- Views & Iterators: Built-in logic for chunking documents, extracting frames, etc.
- Model Orchestration: Rate-limited API calls to OpenAI, Anthropic, Gemini, local models.
- Data Interoperability: Import/export CSV, JSON, Parquet, PyTorch, LanceDB, pandas.
- Configurable Media Storage: Per-column destination (local or cloud bucket).
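To make "automatic incremental updates and dependency tracking" concrete, here is a toy model of the idea in plain Python. It is a conceptual sketch only and does not reflect Pixeltable's implementation or API: ToyTable, add_computed, and the calls counter are hypothetical names, and dependency order is simply assumed to be declaration order.

```python
from typing import Any, Callable


class ToyTable:
    """Toy model of a table whose computed columns are filled in on insert."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []
        # Declaration order doubles as dependency order: later columns may read earlier ones.
        self.computed: dict[str, Callable[[dict[str, Any]], Any]] = {}
        self.calls = 0  # counts function evaluations, to make incremental work visible

    def _fill(self, row: dict[str, Any], name: str) -> None:
        row[name] = self.computed[name](row)
        self.calls += 1

    def add_computed(self, name: str, fn: Callable[[dict[str, Any]], Any]) -> None:
        """Register a column and backfill it for rows that already exist."""
        self.computed[name] = fn
        for row in self.rows:
            self._fill(row, name)

    def insert(self, **values: Any) -> None:
        """Store a row and compute every derived column for that row only."""
        row = dict(values)
        for name in self.computed:
            self._fill(row, name)
        self.rows.append(row)


t = ToyTable()
t.insert(text="a b c")
t.add_computed("n_words", lambda r: len(r["text"].split()))  # backfills the existing row
t.add_computed("is_long", lambda r: r["n_words"] > 2)        # reads the n_words column
t.insert(text="d e")  # computes both columns for the new row; the old row is untouched
print(t.rows, t.calls)
```

The point of the model is the cost profile: adding a column touches existing rows once, and each later insert pays only for the new row.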
Use Case Comparison
| Capability | ML Data Wrangling | AI Applications |
|---|---|---|
| Multimodal Types | ✅ Video, Audio, Image, Document | ✅ Video, Audio, Image, Document |
| Computed Columns | ✅ Enrichment & pre-annotation | ✅ Pipeline orchestration |
| Embedding Indexes | ✅ Curation & similarity search | ✅ RAG & retrieval |
| Versioning | ✅ Dataset snapshots | ✅ Data lineage |
| Data Sharing | ✅ Publish datasets | ✅ Team collaboration |
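As a rough picture of what a similarity search over an embedding index returns, the sketch below ranks items by cosine similarity using brute force. This is a conceptual illustration under stated assumptions, not how Pixeltable executes queries; the vectors are made-up 2-d values and cosine and top_k are hypothetical helpers.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query: list[float], items: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k item ids most similar to the query, best first."""
    scored = [(item_id, cosine(query, vec)) for item_id, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 2-d "embeddings"; real models produce hundreds of dimensions.
vectors = {"cat.jpg": [1.0, 0.0], "dog.jpg": [0.8, 0.6], "invoice.pdf": [0.0, 1.0]}
print(top_k([1.0, 0.1], vectors))
```

The same ranking serves both columns of the table: curation workflows use it to find near-duplicates or similar samples, while RAG pipelines use it to fetch context for a prompt.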
Deployment Strategies
Approach 1: Batch Processing
Use Pixeltable as a batch processing engine: a Python script that ingests data, lets computed columns process it, exports results to your existing serving database via export_sql, and exits. No HTTP server, no FastAPI. Run it as a Cloud Run Job, ECS Task, Kubernetes Job, Lambda, or a cron container.
Use When
- Existing RDBMS (PostgreSQL, MySQL, Snowflake) and blob storage (S3, GCS, Azure Blob) must remain
- Long-running batch jobs (processing thousands of documents, hours of video)
- Background tasks triggered by a queue, cron, or webhook
- You don’t need an HTTP API at all
Architecture
- Run Pixeltable in an ephemeral container (Cloud Run Job, ECS Fargate, K8s Job, Lambda)
- Define tables, views, computed columns in schema.py (idempotent)
- Insert data from queue, RDBMS, or cloud storage
- Computed columns process everything automatically (chunking, embeddings, LLM calls)
- export_sql pushes structured results to your serving database
- destination parameter routes generated media to cloud buckets
- Container exits when done
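The overall shape of such a job (idempotent setup, ingest, process, export, exit) can be sketched with the standard library alone. In the sketch below, sqlite3 stands in for the serving database and the enrich function stands in for computed columns; both are assumptions for illustration, and none of the names correspond to Pixeltable calls such as export_sql.

```python
import sqlite3


def enrich(row: dict) -> dict:
    """Stand-in for computed columns: derive new fields from the raw input."""
    text = row["text"]
    return {**row, "n_words": len(text.split()), "preview": text[:40]}


def run_batch(rows: list[dict], conn: sqlite3.Connection) -> int:
    """Ingest rows, enrich them, write results to the serving DB, return the count."""
    # Idempotent setup: safe to run at the start of every job.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "id TEXT PRIMARY KEY, text TEXT, n_words INTEGER, preview TEXT)"
    )
    processed = [enrich(r) for r in rows]
    # Upsert so that re-running the job on the same input does not duplicate rows.
    conn.executemany(
        "INSERT OR REPLACE INTO results (id, text, n_words, preview) "
        "VALUES (:id, :text, :n_words, :preview)",
        processed,
    )
    conn.commit()
    return len(processed)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a real job would connect to the serving DB
    n = run_batch([{"id": "a", "text": "hello multimodal world"}], conn)
    print(f"exported {n} rows")
    conn.close()  # the container exits once the script returns
```

Keeping both the schema setup and the export idempotent matters for ephemeral containers, because a retried or rescheduled job may process the same input twice.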
What This Provides
- Native multimodal type system (Video, Document, Audio, Image, JSON)
- Declarative computed columns eliminate orchestration boilerplate
- Incremental computation automatically handles new data
- export_sql for any SQL database (PostgreSQL, MySQL, Snowflake, SQLite)
- destination parameter for routing media to S3/GCS/Azure Blob
- LLM call orchestration with automatic rate limiting
- Iterators for chunking documents, extracting frames, splitting audio
Approach 2: Pixeltable as Full Backend
Use Pixeltable for both orchestration and storage as your primary data backend.
Use When
- Building a new multimodal AI application
- Semantic search and vector similarity required
- Storage and ML pipeline need tight integration
- Stack consolidation preferred over separate storage/orchestration layers
Architecture
- Deploy Pixeltable on a persistent instance (EC2 with EBS, EKS with persistent volumes, VM)
- Build API endpoints (FastAPI, Flask, Django) that interact with Pixeltable tables
- Frontend calls endpoints to insert data and retrieve results
- Query using Pixeltable’s semantic search, filters, joins, and aggregations
- All data stored in Pixeltable: metadata, media references, computed column results
What This Provides
- Unified storage, computation, and retrieval in a single system
- Native semantic search via embedding indexes (pgvector)
- No synchronization layer between storage and orchestration
- Automatic versioning and lineage tracking
- Incremental computation propagates through views
- LLM/agent orchestration
- Data export to PyTorch, Parquet, LanceDB
FastAPIRouter auto-generates request/response schemas from column types, handles file uploads via uploadfile_inputs, and supports background=True for long-running inserts. OpenAPI docs are available at /docs.
Approach 3: Declarative Serving (pxt serve)
Generate a complete REST API from a TOML config. No FastAPI code, no frontend, no hand-written endpoints. Define your schema in Python, declare routes in pyproject.toml, and run pxt serve.
Use When
- You need an API but not a frontend
- Endpoints are standard insert, query, delete, or export_sql operations
- Prototyping an API before building a full application
- You want zero Python web framework code
Architecture
- Define tables, views, computed columns, embedding indexes in schema.py
- Declare routes in pyproject.toml using [[tool.pixeltable.service.routes]]
- Run pxt serve my-service to generate and start a FastAPI app
- Supports insert, query, delete, and export_sql route types
- Auto-generates OpenAPI/Swagger docs
What This Provides
- Complete REST API from configuration alone
- Auto-generated request/response schemas
- Background job support for long-running inserts
- export_sql routes for pushing data to external databases
- OpenAPI documentation out of the box
- Same Pixeltable capabilities (computed columns, embedding indexes, etc.)
Get Started
Scaffold a project in one command, then customize:
| Directory | Pattern | What it demonstrates |
|---|---|---|
| backend/ + frontend/ | Full Backend | FastAPI + React with persistent storage, multimodal upload, cross-modal search, tool-calling agent. Deployment via Docker, Helm, Terraform, CDK. |
| batch/ | Batch Processing | Pure Python script: ingest, computed columns, export_sql to serving DB. Deploy to Cloud Run Jobs, Lambda, ECS Fargate, K8s Jobs. |
| serving/ | Declarative Serving | pxt serve from TOML config: zero-code REST API with insert, query, search, and export_sql routes. |
| templates/ | Application Templates | 7 vertical templates (knowledge base, chat agent, audio transcription, video search, media indexing, image dataset, full-stack showcase) scaffolded via pixeltable-new --template. |
Next Steps
- HTTP Serving: Expose tables and queries as HTTP endpoints with TOML or Python
- Infrastructure Setup: Code organization and storage architecture
- Production Operations: Concurrency, error handling, and schema evolution
- Security & Backup: Backup strategies and security best practices