Pixeltable is the only open-source Python library that provides declarative data infrastructure for building multimodal AI applications. It supports incremental storage, transformation, indexing, retrieval, and orchestration of data. With Pixeltable, you define your entire data processing and AI workflow declaratively using computed columns on tables. Focus on your application logic, not the data plumbing.

Before Pixeltable

AI teams are building on images, video, audio, and text, but the infrastructure is broken:

Fragmented Data

Data lives across object stores, vector DBs, SQL, and ad-hoc pipelines. No single source of truth.

Costly Iteration

Every model change requires reprocessing. Pipelines are brittle and hard to reproduce.
This creates high engineering cost, slow iteration, and production risk.
Pixeltable solves this with one system for storage, orchestration, and retrieval. Transactions, incremental updates, and automatic dependency tracking are built in.

With Pixeltable

Persistent Storage

All data and computed results are automatically stored and versioned.

Incremental Updates

Data transformations run automatically on new data. No orchestration code needed.

Multimodal-Native

Images, video, audio, and documents integrate seamlessly with structured data.

AI Integration

Built-in support for OpenAI, Anthropic, Gemini, Hugging Face, and dozens more.

Get started

Quick Start

Install Pixeltable and run your first pipeline in 5 minutes.

10-Minute Tour

See Pixeltable in action with a hands-on image workflow.

Core Concepts

Learn about tables, computed columns, views, and the type system.

SDK Reference

Complete API reference for the Pixeltable Python SDK.
Many documentation pages are interactive notebooks (marked in the sidebar). Open them in Colab, Kaggle, or locally to follow along.

Core Primitives

Pixeltable provides a small set of primitives that compose into any multimodal AI workflow:
Create tables with native multimodal types
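import pixeltable as pxt

# Create the parent directory first; table paths are '<directory>.<table>'
pxt.create_dir('myapp', if_exists='ignore')
t = pxt.create_table('myapp.media', {
    'video': pxt.Video,
    'image': pxt.Image,
    'audio': pxt.Audio,
    'document': pxt.Document,
    'metadata': pxt.Json,
})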

Tables & Data

Create, insert, update, delete

Type System

All supported types
Declarative computed columns: API calls, LLM inference, local models, vision
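from pixeltable.functions import openai
from pixeltable.functions.yolox import yolox  # assumed import path; it may differ between releases

# LLM API call (assumes the table has a 'text' column; the model name is an example)
t.add_computed_column(summary=openai.chat_completions(
    messages=[{'role': 'user', 'content': 'Summarize: ' + t.text}],
    model='gpt-4o-mini'
))

# Local model inference
t.add_computed_column(objects=yolox(t.image, model_id='yolox_s'))

# Vision analysis (the model name is an example)
t.add_computed_column(desc=openai.vision(prompt='Describe', image=t.image, model='gpt-4o-mini'))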

Computed Columns

Incremental transforms

AI Integrations

OpenAI, Anthropic, Gemini, Hugging Face…
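To make the idea of an incremental computed column concrete, here is a minimal plain-Python sketch. It is not Pixeltable's implementation: `ToyTable` and its methods are hypothetical names used only for illustration. The point it tries to show is that a column defined once is backfilled for existing rows and then evaluated only for newly inserted rows.

```python
from typing import Any, Callable

Row = dict[str, Any]


class ToyTable:
    """Toy model of a table with computed columns (hypothetical, not Pixeltable's API)."""

    def __init__(self) -> None:
        self.rows: list[Row] = []
        self.computed: dict[str, Callable[[Row], Any]] = {}
        self.calls = 0  # how many times any computed function has run

    def _run(self, fn: Callable[[Row], Any], row: Row) -> Any:
        self.calls += 1
        return fn(row)

    def add_computed_column(self, name: str, fn: Callable[[Row], Any]) -> None:
        # Register the column, then backfill it for rows that already exist.
        self.computed[name] = fn
        for row in self.rows:
            row[name] = self._run(fn, row)

    def insert(self, new_rows: list[Row]) -> None:
        # Only the newly inserted rows are evaluated; existing rows are untouched.
        for row in new_rows:
            row = dict(row)
            for name, fn in self.computed.items():
                row[name] = self._run(fn, row)
            self.rows.append(row)


table = ToyTable()
table.insert([{'text': 'hello world'}])
table.add_computed_column('n_words', lambda r: len(r['text'].split()))
table.insert([{'text': 'one two three'}])
```

After the second insert, the word-count function has run exactly twice: once to backfill the first row and once for the new row.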
Explode rows: video→frames, doc→chunks, audio→segments
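# Import paths below assume a recent release that exposes these iterator functions
from pixeltable.functions.video import frame_iterator
from pixeltable.functions.document import document_splitter

# Extract frames from video at 1 fps
frames = pxt.create_view('myapp.frames', t, iterator=frame_iterator(t.video, fps=1))

# Chunk documents for RAG
chunks = pxt.create_view('myapp.chunks', t, iterator=document_splitter(t.document))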

Views

Virtual tables

Iterators

Frame, Document, Audio splitters
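The "explode rows" idea can be illustrated without any library: each source row yields several derived rows, and each derived row keeps a reference to its parent. The sketch below is a plain-Python approximation, not how Pixeltable views or iterators are implemented. `split_into_chunks`, `explode`, and the fixed word-count chunking rule are all assumptions made for the example.

```python
from typing import Iterator


def split_into_chunks(text: str, max_words: int) -> Iterator[str]:
    """Yield consecutive chunks of at most max_words words."""
    words = text.split()
    for start in range(0, len(words), max_words):
        yield ' '.join(words[start:start + max_words])


def explode(rows: list[dict], max_words: int = 4) -> list[dict]:
    """Turn each document row into one row per chunk, keeping a link to the parent row."""
    out = []
    for row_id, row in enumerate(rows):
        for pos, chunk in enumerate(split_into_chunks(row['document'], max_words)):
            out.append({'parent_id': row_id, 'pos': pos, 'text': chunk})
    return out


docs = [{'document': 'a b c d e f'}, {'document': 'g h'}]
chunks = explode(docs)
```

Here the six-word document produces two chunk rows and the two-word document produces one, so `chunks` holds three rows in total.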
Add embedding indexes for semantic search
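# The embedding model name is an example
t.add_embedding_index('text', embedding=openai.embeddings.using(model='text-embedding-3-small'))

# Search by similarity: build the expression once, then rank by it
sim = t.text.similarity('find relevant docs')
results = t.order_by(sim, asc=False).limit(10).collect()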

Embedding Indexes

Vector search with automatic maintenance
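As a rough picture of what a similarity search over an embedding index does, the sketch below ranks stored texts by cosine similarity to a query. It is a toy in plain Python: the `embed` function is a stand-in that counts words rather than calling an embedding model, and `search` does a linear scan where a real index would use an approximate nearest-neighbor structure. All names here are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in embedding: word counts. A real index would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing keys, so iterating over one side is enough.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    # Embed the query once, then rank every stored vector against it.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(item[1], q), reverse=True)
    return [text for text, _ in ranked[:k]]


texts = ['cats chase mice', 'dogs chase cats', 'stock prices fell']
index = [(text, embed(text)) for text in texts]  # embeddings are computed once and stored
top = search(index, 'cats and mice', k=1)
```

The design point is that document embeddings are computed at index time and reused, so each query costs one embedding call plus the ranking step.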
Write custom functions with @pxt.udf and @pxt.query
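@pxt.udf
def extract_entities(text: str) -> list[str]:
    # Placeholder logic: treat capitalized words as entities; replace with your own
    return [word for word in text.split() if word.istitle()]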

@pxt.query
def search_by_topic(topic: str):
    return t.where(t.category == topic).select(t.title, t.summary)

UDFs & Queries

Custom Python functions
Tool calling for AI agents and MCP integration
# Load tools from MCP server, UDFs, and queries
mcp_tools = pxt.mcp_udfs('http://localhost:8000/mcp')
tools = pxt.tools(search_by_topic, extract_entities, *mcp_tools)

# LLM decides which tool to call; Pixeltable executes it
t.add_computed_column(response=openai.chat_completions(
    messages=[{'role': 'user', 'content': t.question}],
    model='gpt-4o-mini',  # example model name
    tools=tools
))
t.add_computed_column(result=openai.invoke_tools(tools, t.response))

Tool Calling

Build agents with tools

Agents & MCP

MCP servers, memory, Pixelbot
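The tool-calling flow has two halves: the model chooses a tool and its arguments, and the application executes that choice. The following plain-Python sketch covers only the second half, under simplifying assumptions. The `tool` decorator, the `TOOLS` registry, and the shape of `model_output` (a list of `name` plus JSON-encoded `arguments`) are hypothetical and are not Pixeltable's or any provider's actual format.

```python
import json
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a plain function under its own name so it can be looked up later."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: int, b: int) -> int:
    return a + b


@tool
def shout(text: str) -> str:
    return text.upper()


def invoke_tool_calls(tool_calls: list[dict]) -> dict[str, list]:
    """Execute each requested call and group the results by tool name."""
    results: dict[str, list] = {name: [] for name in TOOLS}
    for call in tool_calls:
        fn = TOOLS[call['name']]              # KeyError if the model names an unknown tool
        kwargs = json.loads(call['arguments'])  # arguments arrive as a JSON string
        results[call['name']].append(fn(**kwargs))
    return results


# Simplified stand-in for what a model might return when it decides to call a tool
model_output = [{'name': 'add', 'arguments': '{"a": 2, "b": 3}'}]
result = invoke_tool_calls(model_output)
```

Tools that were not called still appear in the result with an empty list, which keeps the output shape stable from row to row.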
SQL-like queries + test transformations before committing
# Query data with familiar syntax
results = t.where(t.score > 0.8).order_by(t.timestamp).limit(10).collect()

# Test transformations on sample rows BEFORE adding to table
# (summarize stands for a function you have defined, for example with @pxt.udf)
t.select(t.text, summary=summarize(t.text)).head(3)  # Nothing stored yet
t.add_computed_column(summary=summarize(t.text))      # Now commit to all rows

Queries & Expressions

Select, filter, aggregate

Iterative Development

Test before commit
Time travel and automatic versioning
t.history()                    # View all versions
t.revert(version=5)            # Rollback changes
old_data = pxt.get_table('myapp.media:3')  # Query past version

Version Control

History, snapshots, lineage
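The versioning model can be pictured as a log in which every mutation produces a new version, so older states stay readable and the latest change can be undone. The sketch below is a deliberately simple plain-Python illustration, not Pixeltable's storage design. `VersionedTable` is a hypothetical class, and it stores a full copy per version where a real system would more likely store deltas.

```python
import copy
from typing import Optional


class VersionedTable:
    """Toy append-only version log (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._versions: list[list[dict]] = [[]]  # version 0 is the empty table

    @property
    def version(self) -> int:
        return len(self._versions) - 1

    def insert(self, rows: list[dict]) -> None:
        # Every mutation creates a new version instead of changing the old one.
        self._versions.append(copy.deepcopy(self._versions[-1]) + copy.deepcopy(rows))

    def rows(self, version: Optional[int] = None) -> list[dict]:
        # Read the current state, or any earlier version by number.
        v = self.version if version is None else version
        return copy.deepcopy(self._versions[v])

    def revert(self) -> None:
        # Drop the most recent version, restoring the previous state.
        if self.version == 0:
            raise ValueError('nothing to revert')
        self._versions.pop()


vt = VersionedTable()
vt.insert([{'x': 1}])
vt.insert([{'x': 2}])
past = vt.rows(version=1)   # state as of version 1
vt.revert()                 # undo the second insert
```

After the revert, the current state matches what version 1 held, and the table's version number is 1 again.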
Load from any source, export to ML formats
# Import from files, URLs, S3, Hugging Face
t.insert(pxt.io.import_csv('data.csv'))
t.insert(pxt.io.import_huggingface_dataset(dataset))

# Export to ML/analytics formats
from torch.utils.data import DataLoader

pxt.io.export_parquet(t, 'output.parquet')
loader = DataLoader(t.to_pytorch_dataset(), batch_size=32)
coco_path = t.to_coco_dataset()

Data Import

CSV, JSON, Parquet, S3, HF

Data Export

PyTorch, Parquet, COCO, LanceDB
Publish and replicate datasets via Pixeltable Cloud
pxt.publish(t, 'my-dataset')              # Share publicly
pxt.replicate('user/dataset', 'local')   # Pull to local

Data Sharing

Publish, replicate, collaborate

Use Cases

Pixeltable’s primitives are use-case agnostic. They compose into any multimodal AI workflow:

Data Wrangling for ML

Curate, augment, export training datasets. Pre-annotate with models, integrate Label Studio, export PyTorch.

Backend for AI Apps

Build RAG systems, semantic search, and multimodal APIs. Pixeltable handles storage, retrieval, and orchestration.

Agents & MCP

Tool-calling agents with persistent memory, MCP server integration, and automatic conversation history.
Start with the Quick Start to get running in 5 minutes, or explore Cookbooks for hands-on examples covering RAG, video analysis, audio transcription, and more.

Choose How You Run Pixeltable

Pixeltable OSS

Open-source Python library. Install with pip install pixeltable and run locally. Same APIs scale to production.

Pixeltable Cloud

Data sharing available now. Managed endpoints and live tables coming soon.

Book a Demo

Schedule a call to discuss your use case and see how Pixeltable can help.

Next steps

Join the Community

Get help, share projects, and connect with other developers

GitHub

Star the repo, report issues, and contribute
Last modified on March 15, 2026