What Pixeltable Replaces

Most multimodal AI stacks look like this: blob storage for media, a relational database for metadata, a vector database for embeddings, an orchestrator for scheduling, and custom glue code holding it all together.
That adds up to 5+ services to deploy and maintain (blob storage, orchestrator, relational DB, vector DB, cache), plus the custom retry logic, rate limiting, sync scripts, and error handling needed to wire them together.

Systems Pixeltable Replaces

You don’t install, configure, or manage these — Pixeltable handles them natively.
Instead of … | With Pixeltable …
PostgreSQL / MySQL | pxt.create_table(): schema is Python, versioned automatically
Pinecone / Weaviate / Qdrant | add_embedding_index(): one line, auto-maintained on insert/update/delete
S3 / boto3 / blob storage | pxt.Image / Video / Audio / Document types with transparent caching; destination='s3://…' for cloud routing
Airflow / Prefect / Celery | Computed columns trigger on insert: no orchestrator, no workers, no DAGs
LangChain / LlamaIndex (RAG) | @pxt.query + .similarity() + computed column chaining
pandas / polars (multimodal) | .sample(), ephemeral UDFs, then add_computed_column() to commit: same code, prototype to production
DVC / MLflow / W&B | Built-in history(), revert(), time travel (table:N), snapshots; zero config
Custom retry / rate-limit / caching | Built into every AI integration; results cached, only new rows recomputed
Custom ETL / glue code | Declarative schema; Pixeltable handles execution, caching, incremental updates

Tools Pixeltable Abstracts

These tools run under the hood, but you interact through a cleaner interface. This is a sample — Pixeltable wraps 30+ AI providers, dozens of built-in functions for media and data processing, and supports any Python library via @pxt.udf.
Tool | Raw usage | Through Pixeltable
FFmpeg | Install binary, subprocess calls, format conversion, frame seeking | extract_audio(video, format='mp3') for audio; frame_iterator(video, fps=1) for frame extraction via pxt.create_view()
Pillow/PIL | Image.open(), resize, convert, encode, save, handle formats | pixeltable.functions.image module: resize(), crop(), thumbnail(), b64_encode(), rotate(), blend(), plus width(), height(), get_metadata()
spaCy | pip install spacy, download model, load pipeline, parse documents | document_splitter(doc, separators='sentence'): spaCy runs under the hood (configurable via spacy_model parameter). Also supports 'heading', 'paragraph', 'page', 'token_limit', 'char_limit' separators
sentence-transformers | Load model, tokenize, encode batches, normalize vectors | sentence_transformer.using(model_id='intfloat/e5-large-v2') passed to add_embedding_index(). Pixeltable handles model loading, batching, and index maintenance
OpenAI CLIP | Load model, preprocess images/text differently, encode, handle multimodal alignment | clip.using(model_id='openai/clip-vit-base-patch32'): multimodal embedding index that accepts both image and text queries for cross-modal search
OpenAI Whisper | API key setup, audio format handling, chunking long files, parsing responses | openai.transcriptions(audio=table.audio_col, model='whisper-1') as a computed column, with automatic rate limiting and caching. Also supports local Whisper via whisper.transcribe()
Anthropic Claude tool calling | Construct messages, define tool schemas as JSON, parse tool_use blocks, execute tools, re-call with results | anthropic.messages() + anthropic.invoke_tools() + pxt.tools(), all as chained computed columns. Tool schemas derived automatically from @pxt.udf function signatures
+ many more | See the full SDK Reference, AI Integrations, and Cookbooks
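
The Anthropic row notes that tool schemas are derived from function signatures. The general idea can be sketched with the standard library alone. This is an illustration of the technique, not Pixeltable's implementation: `tool_schema`, `web_search`, and the type mapping are hypothetical names, and only the `name` / `description` / `input_schema` layout follows the shape Anthropic's API expects for a tool definition.

```python
import inspect
from typing import Any, Callable

# Hypothetical mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: 'string', int: 'integer', float: 'number', bool: 'boolean'}


def tool_schema(fn: Callable[..., Any]) -> dict[str, Any]:
    """Build a JSON-Schema-style tool description from a function signature."""
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unmapped types fall back to 'string' in this sketch.
        properties[name] = {'type': _JSON_TYPES.get(param.annotation, 'string')}
        # Parameters without a default value are treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        'name': fn.__name__,
        'description': inspect.getdoc(fn) or '',
        'input_schema': {'type': 'object', 'properties': properties, 'required': required},
    }


def web_search(query: str, max_results: int = 5) -> str:
    """Search the web and return a text summary (placeholder body)."""
    return f'{max_results} results for {query!r}'


schema = tool_schema(web_search)
```

Here `query` ends up in `required` because it has no default, while `max_results` is optional. A real integration would also need to handle containers, optional types, and per-parameter descriptions.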

What Pixeltable Doesn’t Replace

You still need these — Pixeltable is a data layer, not a full application framework.
Tool | Why you still need it
FastAPI / Flask / Django | Standard CRUD endpoints can use built-in HTTP serving; custom logic still needs a framework
Pydantic | Request/response validation for your API endpoints (Pixeltable’s .to_pydantic() bridges the two)
React / Vue / frontend | UI layer; Pixeltable has no frontend
Docker / Kubernetes / Terraform | Deployment infrastructure; Pixeltable runs inside your containers, it doesn’t provision them
Authentication / authorization | User management, API keys, OAuth; outside Pixeltable’s scope
Domain-specific UDFs | Business logic you write as @pxt.udf functions (e.g., web search, custom scoring); Pixeltable provides the framework, you provide the logic

Migrating from a specific stack? See the step-by-step migration guides with side-by-side code comparisons.

Deployment Decision Guide

Pixeltable supports three production deployment patterns. Choose based on your constraints:
Question | Answer | Recommendation
Building a web app with a frontend? | Yes | Full Backend (FastAPI + React)
Need an API with zero web code? | Yes | Declarative Serving (pxt serve from TOML)
Need batch/background processing (cron, queue, Cloud Run Job)? | Yes | Batch Processing (pure Python script, no HTTP server)
Existing production DB that must stay? | Yes | Batch Processing (process in Pixeltable, export_sql to your DB)
Need semantic search (RAG)? | Yes | Full Backend or Declarative Serving
Expose Pixeltable as MCP server for LLM tools? | Yes | Full Backend + MCP Server
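
The table can also be read as a lookup from requirements to patterns. A minimal sketch, assuming every question answered "yes" contributes its recommendation (the table does not define a precedence between rows); `recommend` and the short question keys are hypothetical names introduced for illustration.

```python
# Rows of the decision table: (question key, recommendation).
# The keys are hypothetical short names for the questions in the table.
DECISION_TABLE = [
    ('frontend', 'Full Backend (FastAPI + React)'),
    ('api_without_web_code', 'Declarative Serving (pxt serve from TOML)'),
    ('batch_processing', 'Batch Processing (pure Python script, no HTTP server)'),
    ('existing_production_db', 'Batch Processing (process in Pixeltable, export_sql to your DB)'),
    ('semantic_search', 'Full Backend or Declarative Serving'),
    ('mcp_server', 'Full Backend + MCP Server'),
]


def recommend(**answers: bool) -> list[str]:
    """Return the recommendation for every question answered 'yes'."""
    unknown = set(answers) - {key for key, _ in DECISION_TABLE}
    if unknown:
        raise ValueError(f'unknown question keys: {sorted(unknown)}')
    # Preserve table order so the output reads like the table itself.
    return [rec for key, rec in DECISION_TABLE if answers.get(key, False)]


picks = recommend(batch_processing=True, existing_production_db=True)
```

When several answers point at different patterns, the final choice remains a judgment call; the sketch only collects the candidates.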

Technical Capabilities (Both)

Regardless of deployment mode, you get:

Use Case Comparison

Capability | ML Data Wrangling | AI Applications
Multimodal Types | ✅ Video, Audio, Image, Document | ✅ Video, Audio, Image, Document
Computed Columns | ✅ Enrichment & pre-annotation | ✅ Pipeline orchestration
Embedding Indexes | ✅ Curation & similarity search | ✅ RAG & retrieval
Versioning | ✅ Dataset snapshots | ✅ Data lineage
Data Sharing | ✅ Publish datasets | ✅ Team collaboration

Deployment Strategies

Approach 1: Batch Processing

Use Pixeltable as a batch processing engine: a Python script that ingests data, lets computed columns process it, exports results to your existing serving database via export_sql, and exits. No HTTP server, no FastAPI. Run it as a Cloud Run Job, ECS Task, Kubernetes Job, Lambda, or a cron container.
Use when:
  • Existing RDBMS (PostgreSQL, MySQL, Snowflake) and blob storage (S3, GCS, Azure Blob) must remain
  • You run long batch jobs (processing thousands of documents, hours of video)
  • Background tasks are triggered by a queue, cron, or webhook
  • You don’t need an HTTP API at all

How it works:
  • Run Pixeltable in an ephemeral container (Cloud Run Job, ECS Fargate, K8s Job, Lambda)
  • Define tables, views, computed columns in schema.py (idempotent)
  • Insert data from queue, RDBMS, or cloud storage
  • Computed columns process everything automatically (chunking, embeddings, LLM calls)
  • export_sql pushes structured results to your serving database
  • destination parameter routes generated media to cloud buckets
  • Container exits when done

What Pixeltable provides:
  • Native multimodal type system (Video, Document, Audio, Image, JSON)
  • Declarative computed columns that eliminate orchestration boilerplate
  • Incremental computation that automatically handles new data
  • export_sql for any SQL database (PostgreSQL, MySQL, Snowflake, SQLite)
  • destination parameter for routing media to S3/GCS/Azure Blob
  • LLM call orchestration with automatic rate limiting
  • Iterators for chunking documents, extracting frames, splitting audio

# schema.py: declarative schema (idempotent, safe to re-run)
import pixeltable as pxt
from pixeltable.functions.huggingface import sentence_transformer
from pixeltable.functions.string import string_splitter
from pixeltable.functions.uuid import uuid7

pxt.create_dir('pipeline', if_exists='ignore')
embed_fn = sentence_transformer.using(model_id='all-MiniLM-L6-v2')

documents = pxt.create_table('pipeline.documents', {
    'title': pxt.String,
    'body': pxt.String,
    'source_id': pxt.String,
    'uuid': uuid7(),
    'timestamp': pxt.Timestamp,
}, primary_key=['uuid'], if_exists='ignore')

sentences = pxt.create_view(
    'pipeline.sentences', documents,
    iterator=string_splitter(text=documents.body, separators='sentence'),
    if_exists='ignore',
)
sentences.add_embedding_index(
    'text', idx_name='sentences_embed', string_embed=embed_fn, if_exists='ignore'
)

# pipeline.py: ingest, compute, export, exit
import json
from datetime import datetime
from pixeltable.io.sql import export_sql
import schema

SERVING_DB_URL = 'postgresql+psycopg://user:pass@host/db'

with open('batch.json') as f:
    batch = json.load(f)

now = datetime.now()
for row in batch['documents']:
    row.setdefault('timestamp', now)

# Insert triggers computed columns: chunking, embeddings, etc.
schema.documents.insert(batch['documents'])

# Export structured results to serving DB
export_sql(
    schema.documents.select(
        schema.documents.source_id,
        schema.documents.title,
        schema.documents.body,
    ),
    'processed_documents',
    db_connect_str=SERVING_DB_URL,
    if_exists='replace',
)
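
The comment "Insert triggers computed columns" describes incremental computation: each insert runs the column functions on the new rows only, and results for earlier rows are reused. The toy model below illustrates that behaviour in plain Python. It is not how Pixeltable is implemented; `ToyTable` and its method signatures are hypothetical and only borrow the vocabulary used in this guide.

```python
from typing import Any, Callable

Row = dict[str, Any]


class ToyTable:
    """In-memory stand-in for a table with computed columns (illustration only)."""

    def __init__(self) -> None:
        self.rows: list[Row] = []
        self.computed: dict[str, Callable[[Row], Any]] = {}
        self.calls = 0  # counts how many times a column function ran

    def add_computed_column(self, name: str, fn: Callable[[Row], Any]) -> None:
        """Register a column and backfill it for rows that already exist."""
        self.computed[name] = fn
        for row in self.rows:
            row[name] = self._run(fn, row)

    def insert(self, new_rows: list[Row]) -> None:
        """Store new rows and compute every registered column for them only."""
        for row in new_rows:
            stored = dict(row)
            for name, fn in self.computed.items():
                stored[name] = self._run(fn, stored)
            self.rows.append(stored)

    def _run(self, fn: Callable[[Row], Any], row: Row) -> Any:
        self.calls += 1
        return fn(row)


docs = ToyTable()
docs.add_computed_column('word_count', lambda row: len(row['body'].split()))
docs.insert([{'title': 'a', 'body': 'one two three'}])
docs.insert([{'title': 'b', 'body': 'four five'}])
```

After the second insert, `docs.calls` is 2 rather than 3: the first row was not recomputed. That property is what lets a batch script be re-run on new data without repeating earlier work.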

Approach 2: Pixeltable as Full Backend

Use Pixeltable for both orchestration and storage as your primary data backend.
Use when:
  • Building a new multimodal AI application
  • Semantic search and vector similarity are required
  • Storage and ML pipeline need tight integration
  • Stack consolidation is preferred over separate storage/orchestration layers

How it works:
  • Deploy Pixeltable on a persistent instance (EC2 with EBS, EKS with persistent volumes, VM)
  • Build API endpoints (FastAPI, Flask, Django) that interact with Pixeltable tables
  • Frontend calls endpoints to insert data and retrieve results
  • Query using Pixeltable’s semantic search, filters, joins, and aggregations
  • All data stored in Pixeltable: metadata, media references, computed column results

What Pixeltable provides:
  • Unified storage, computation, and retrieval in a single system
  • Native semantic search via embedding indexes (pgvector)
  • No synchronization layer between storage and orchestration
  • Automatic versioning and lineage tracking
  • Incremental computation that propagates through views
  • LLM/agent orchestration
  • Data export to PyTorch, Parquet, LanceDB

# Example: FastAPIRouter endpoints backed by Pixeltable
import fastapi
import pixeltable as pxt
from pixeltable.serving import FastAPIRouter

app = fastapi.FastAPI()
router = FastAPIRouter(prefix="/api", tags=["documents"])

docs = pxt.get_table('myapp/documents')

# File upload with background processing (returns job handle)
router.add_insert_route(docs, path="/upload",
    uploadfile_inputs=["document"], inputs=["uploaded_at"],
    outputs=["uuid"], background=True)

# Search via @pxt.query
@pxt.query
def search_documents(query_text: str, limit: int = 10):
    sim = docs.embedding.similarity(string=query_text)
    return docs.order_by(sim, asc=False).limit(limit).select(
        docs.document, docs.summary, similarity=sim)

router.add_query_route(path="/search", query=search_documents)

# Delete by primary key
router.add_delete_route(docs, path="/delete")

app.include_router(router)

FastAPIRouter auto-generates request/response schemas from column types, handles file uploads via uploadfile_inputs, and supports background=True for long-running inserts. OpenAPI docs are available at /docs.

When to keep hand-written endpoints: Use @router.post() for multi-table operations, conditional logic, or custom response shapes. Since FastAPIRouter extends APIRouter, hand-written and declarative routes coexist on the same router. See the migration guide for details.

Use sync (def) endpoints, not async def. FastAPI dispatches sync endpoints to a thread pool, giving each request its own thread. Pixeltable is thread-safe and handles concurrent requests automatically. Using async def would block the event loop and serialize all requests. See Production Operations for details.
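
The difference between the two endpoint styles can be reproduced with the standard library. In this sketch `time.sleep` stands in for a blocking database call (an assumption for illustration; no Pixeltable or FastAPI code is involved), and `asyncio.to_thread` plays the role of the thread pool that FastAPI uses for sync endpoints.

```python
import asyncio
import time
from typing import Awaitable, Callable

BLOCKING_SECONDS = 0.1
REQUESTS = 4


def blocking_call() -> None:
    """Stand-in for a synchronous, blocking operation."""
    time.sleep(BLOCKING_SECONDS)


async def handler_blocking_in_async() -> None:
    # Runs the blocking call directly on the event loop thread.
    blocking_call()


async def handler_in_thread_pool() -> None:
    # Hands the blocking call to a worker thread, keeping the loop free.
    await asyncio.to_thread(blocking_call)


async def time_handlers(handler: Callable[[], Awaitable[None]]) -> float:
    """Run REQUESTS handlers concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(REQUESTS)))
    return time.perf_counter() - start


serialized = asyncio.run(time_handlers(handler_blocking_in_async))
threaded = asyncio.run(time_handlers(handler_in_thread_pool))
print(f'blocking inside async def: {serialized:.2f}s')
print(f'dispatched to threads:     {threaded:.2f}s')
```

The first timing grows with the number of requests because each handler holds the event loop until its blocking call returns; the second stays close to the duration of a single call as long as worker threads are available.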

Approach 3: Declarative Serving (pxt serve)

Generate a complete REST API from a TOML config. No FastAPI code, no frontend, no hand-written endpoints. Define your schema in Python, declare routes in pyproject.toml, and run pxt serve.
Use when:
  • You need an API but not a frontend
  • Endpoints are standard insert, query, delete, or export_sql operations
  • Prototyping an API before building a full application
  • You want zero Python web framework code

How it works:
  • Define tables, views, computed columns, embedding indexes in schema.py
  • Declare routes in pyproject.toml using [[tool.pixeltable.service.routes]]
  • Run pxt serve my-service to generate and start a FastAPI app
  • Supports insert, query, delete, and export_sql route types
  • Auto-generates OpenAPI/Swagger docs

What Pixeltable provides:
  • Complete REST API from configuration alone
  • Auto-generated request/response schemas
  • Background job support for long-running inserts
  • export_sql routes for pushing data to external databases
  • OpenAPI documentation out of the box
  • Same Pixeltable capabilities (computed columns, embedding indexes, etc.)

# pyproject.toml
[project]
name = "my-api"
requires-python = ">=3.10"

[[tool.pixeltable.service]]
name = "my-service"
prefix = "/api"
modules = ["schema"]

[[tool.pixeltable.service.routes]]
type = "query"
path = "/search"
query = "schema:search_documents"

[[tool.pixeltable.service.routes]]
type = "insert"
table = "pipeline.documents"
path = "/ingest"
inputs = ["title", "body"]
outputs = ["title", "body"]

[[tool.pixeltable.service.routes]]
type = "export_sql"
path = "/export"
query = "schema:export_query"
db_url = "postgresql+psycopg://user:pass@host/db"
table_name = "exported_docs"
if_exists = "replace"

# Shell: start the service, then query it with a JSON body
pxt serve my-service
# curl -X POST localhost:8000/api/search -H 'Content-Type: application/json' -d '{"query_text": "machine learning"}'
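
The same request can be sent from Python. A minimal sketch using only the standard library, assuming the service from the curl example is listening on localhost:8000 and accepts a JSON body; `build_search_request` is a hypothetical helper, and the call that actually sends the request is left commented out because it needs a running server.

```python
import json
import urllib.request

BASE_URL = 'http://localhost:8000/api'  # host, port, and prefix taken from the curl example


def build_search_request(query_text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the /search route with a JSON body."""
    body = json.dumps({'query_text': query_text}).encode('utf-8')
    return urllib.request.Request(
        url=f'{base_url}/search',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


request = build_search_request('machine learning')

# Sending the request requires `pxt serve my-service` to be running:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```

Setting the Content-Type header explicitly matters here: without it, a JSON payload may be rejected by the server's request validation.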

Get Started

Scaffold a project in one command, then customize:
uvx pixeltable-new myapp              # declarative serving (default)
uvx pixeltable-new myapp --backend    # full FastAPI + React app
uvx pixeltable-new myapp --batch      # batch processing script
Or scaffold a vertical application template:
uvx pixeltable-new myapp --template knowledge-base         # cross-modal search + RAG Q&A
uvx pixeltable-new myapp --template chat-agent             # tool-calling agent with memory
uvx pixeltable-new myapp --template audio-transcription    # audio transcription + search
uvx pixeltable-new myapp --template full-stack-showcase    # FastAPI + React reference app
uvx pixeltable-new myapp --template video-search           # video frame analysis + search
uvx pixeltable-new myapp --template media-indexing         # enterprise media processing + export
uvx pixeltable-new myapp --template image-dataset          # ML dataset auto-labeling + export
Or clone the full Starter Kit, which ships with Docker, Helm, Terraform, CDK, and cloud job runner configurations. It contains reference implementations for all three deployment patterns, plus the application templates:
Directory | Pattern | What it demonstrates
backend/ + frontend/ | Full Backend | FastAPI + React with persistent storage, multimodal upload, cross-modal search, tool-calling agent. Deployment via Docker, Helm, Terraform, CDK.
batch/ | Batch Processing | Pure Python script: ingest, computed columns, export_sql to serving DB. Deploy to Cloud Run Jobs, Lambda, ECS Fargate, K8s Jobs.
serving/ | Declarative Serving | pxt serve from TOML config: zero-code REST API with insert, query, search, and export_sql routes.
templates/ | Application Templates | 7 vertical templates (knowledge base, chat agent, audio transcription, video search, media indexing, image dataset, full-stack showcase) scaffolded via pixeltable-new --template.

Next Steps

HTTP Serving

Expose tables and queries as HTTP endpoints with TOML or Python

Infrastructure Setup

Code organization and storage architecture

Production Operations

Concurrency, error handling, and schema evolution

Security & Backup

Backup strategies and security best practices
Last modified on May 22, 2026