This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation.
Why Pixeltable
Every multimodal AI app needs the same five things: store media, run
models, index embeddings, serve endpoints, version everything. Most
teams glue together 5-8 services (Postgres + Pinecone + S3 + Airflow +
LangChain + …) and spend more time on infrastructure than on the
product.
Pixeltable is a single system that handles all five. One
pip install, one Python API, one place to store, transform, index,
retrieve, serve, version, observe, and debug.
For developers and vibe coders: Pixeltable’s declarative API means
AI assistants generate correct, production-grade code. No glue logic, no
orchestrator configs, no serialization code. Experimenting on multimodal
data (extract a frame, run a model, draw bounding boxes) is one
expression, not a pipeline. Install the Pixeltable
Skill and prompt.
For teams evaluating infrastructure: Transaction integrity, async
execution, parallelization, caching, retries, and observability are
built in. One system to operate, monitor, and maintain. Schema changes
are one line. Model upgrades are zero-downtime.
Extensible via @pxt.udf, @pxt.uda, and @pxt.query. 20+ AI providers built in.
Skill | MCP Server | Starter Kit | llms.txt: docs.pixeltable.com/llms.txt
%pip install -qU pixeltable google-genai 'fastapi[standard]'
%pip install -qU torch torchvision transformers # optional, for object detection
Note: you may need to restart the kernel to use updated packages.
import getpass
import logging
import os
import warnings
warnings.filterwarnings('ignore')
logging.getLogger('asyncio').setLevel(logging.CRITICAL)
logging.getLogger('huggingface_hub').setLevel(logging.CRITICAL)
if (
    'GEMINI_API_KEY' not in os.environ
    and 'GOOGLE_API_KEY' not in os.environ
):
    os.environ['GEMINI_API_KEY'] = getpass.getpass('Gemini API Key: ')
import pixeltable as pxt
from pixeltable.functions import gemini
BASE_URL = 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources'
- Store: Multimodal Tables
Video, audio, images, and documents are first-class column types.
pip install pixeltable is all you need.
from pixeltable.functions.uuid import uuid7
pxt.drop_dir('demo', force=True, if_not_exists='ignore')
pxt.create_dir('demo')
videos = pxt.create_table(
    'demo/videos',
    {'id': uuid7(), 'video': pxt.Video, 'title': pxt.String},
    primary_key='id',
)
videos
Created directory 'demo'.
Created table 'videos'.
- Orchestrate: AI as Computed Columns
Add a computed column, and Pixeltable calls Gemini on every insert, caches
results, retries failures, and keeps embeddings in sync.
videos.add_computed_column(
    response=gemini.generate_content(
        [videos.video, 'Describe this video in detail.'],
        model='gemini-3-flash-preview',
    )
)
videos.add_computed_column(
    description=videos.response.candidates[0]
    .content.parts[0]
    .text.astype(pxt.String)
)
videos.add_embedding_index(
    'description',
    embedding=gemini.embed_content.using(
        model='gemini-embedding-2-preview'
    ),
)
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.03 s
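The description column above walks a JSON path into the Gemini response. As a rough illustration of what that path selects, here is a plain-Python sketch over a hand-written dict. Only the nesting (candidates, content, parts, text) is taken from the column expression; the sample payload and the first_text helper are hypothetical, and this is not how Pixeltable evaluates the expression internally.

```python
# Hypothetical response payload: only the nesting mirrors the column expression
# videos.response.candidates[0].content.parts[0].text
sample_response = {
    'candidates': [
        {'content': {'parts': [{'text': 'A busy street at dusk.'}]}}
    ]
}


def first_text(response):
    """Return the text of the first part of the first candidate, or None if absent."""
    try:
        return response['candidates'][0]['content']['parts'][0]['text']
    except (KeyError, IndexError, TypeError):
        # A missing key, an empty list, or a non-dict level all mean "no text".
        return None


description = first_text(sample_response)
missing = first_text({'candidates': []})
```

The helper returns None instead of raising when a level is missing, which is one possible way to handle rows with no usable response.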
- Insert: One Call Triggers the Full Pipeline
insert() downloads videos, runs Gemini, extracts text, and computes
embeddings. Open the Dashboard to watch in real time.
videos.insert(
    [
        {
            'video': f'{BASE_URL}/bangkok.mp4',
            'title': 'Bangkok Street Tour',
        },
        {
            'video': f'{BASE_URL}/The-Pursuit-of-Happiness-Video-Extract.mp4',
            'title': 'The Pursuit of Happiness',
        },
    ]
)
Inserted 2 rows with 0 errors in 22.85 s (0.09 rows/s)
2 rows inserted.
videos = pxt.get_table('demo/videos')
videos.select(videos.title, videos.description).collect()
- Retrieve: Semantic Search
Embedding index stays in sync automatically. No separate vector DB.
sim = videos.description.similarity(string='street food')
# Filter + rank in one expression
videos.where(videos.description != None).order_by(sim, asc=False).limit(
    5
).select(videos.title, videos.description, sim).collect()
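To make the ranking step concrete, here is a minimal plain-Python sketch of ordering rows by similarity to a query vector. It assumes cosine similarity, which this page does not confirm as the metric used by the index. The vectors, titles, and helper names are made up for illustration, and the real index is maintained by Pixeltable rather than computed like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, rows, limit=5):
    """Score every (title, vector) row, sort descending, and keep the top `limit`."""
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Made-up 3-dimensional embeddings; real embeddings have far more dimensions.
rows = [
    ('clip A', [0.1, 0.9, 0.2]),
    ('clip B', [0.9, 0.1, 0.0]),
]
query_vec = [1.0, 0.0, 0.0]

ranked = rank_by_similarity(query_vec, rows)
```

Sorting in descending order and truncating corresponds to order_by(sim, asc=False).limit(5) in the query above.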
Extract a frame, run DETR object detection, draw bounding boxes, all in
one expression. Change timestamp and re-run to explore different
frames.
from pixeltable.functions import huggingface
from pixeltable.functions.vision import bboxes_draw
frame = videos.video.extract_frame(timestamp=2.0)
detections = huggingface.detr_for_object_detection(
    frame, model_id='facebook/detr-resnet-50'
)
videos.select(
    videos.title,
    annotated=bboxes_draw(
        frame,
        boxes=detections.boxes,
        labels=detections.label_text,
        fill=True,
        fill_alpha=0.15,
        width=2,
        font_size=14,
    ),
).collect()
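When exploring frames, it can help to drop low-confidence detections before drawing them. The sketch below is plain Python over a hand-written dict. It assumes the detection result carries a per-box scores list alongside boxes and label_text; only boxes and label_text appear in the expression above, so the scores field, the sample values, and the filter_detections helper are all assumptions for illustration.

```python
def filter_detections(detections, min_score=0.8):
    """Keep only the boxes whose score is at least `min_score`.

    Assumes `detections` holds parallel lists under 'boxes', 'label_text',
    and 'scores' (the 'scores' key is an assumption, not shown on this page).
    """
    keep = [i for i, score in enumerate(detections['scores']) if score >= min_score]
    return {
        'boxes': [detections['boxes'][i] for i in keep],
        'label_text': [detections['label_text'][i] for i in keep],
        'scores': [detections['scores'][i] for i in keep],
    }


# Hand-written sample: two boxes as [x_min, y_min, x_max, y_max].
sample = {
    'boxes': [[10, 20, 110, 220], [5, 5, 50, 50]],
    'label_text': ['car', 'person'],
    'scores': [0.95, 0.42],
}

confident = filter_detections(sample, min_score=0.8)
```

The filtered lists stay aligned by index, so they could be passed on as boxes and labels in the same way as the unfiltered ones.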
- Serve: Queries Become API Endpoints
A @pxt.query function becomes an HTTP endpoint via FastAPIRouter. In
production, use pxt serve service.toml. See HTTP Serving.
import fastapi
from fastapi.testclient import TestClient
from pixeltable.serving import FastAPIRouter
@pxt.query
def search_videos(query_text: str, limit: int = 5):
    sim = videos.description.similarity(string=query_text)
    return (
        videos.order_by(sim, asc=False)
        .limit(limit)
        .select(videos.title, videos.description, sim)
    )
app = fastapi.FastAPI()
router = FastAPIRouter()
router.add_query_route(path='/search', query=search_videos)
router.add_insert_route(
    videos,
    path='/ingest',
    inputs=['video', 'title'],
    outputs=['id', 'title', 'description'],
)
router.add_delete_route(videos, path='/delete')
app.include_router(router)
client = TestClient(app)
resp = client.post(
    '/search', json={'query_text': 'street food', 'limit': 2}
)
resp.json()
{‘rows’: [{‘title’: ‘The Pursuit of Happiness’,
‘description’: ‘In this clip from the movie “The Pursuit of Happyness,” Chris Gardner (played by Will Smith) has just finished an interview for a competitive internship at a brokerage firm. Despite his disheveled appearance—wearing a grey work jacket and looking tired—he is approached by Jay Twistle, a senior manager at the firm.\n\nDetailed Scene Breakdown:\n\n* The Approach: The scene opens with Chris looking down, appearing stressed or emotional. A voice calls out “Chris…”, and he turns to see Mr. Twistle walking toward him with a wide, congratulatory smile. They are in a professional office lobby with people moving in the background and a reception desk visible.\n* The Interaction: Mr. Twistle expresses his admiration, saying, “I don't know how you did it dressed as a garbage man, but you really pulled it off in there.” This refers to Chris's impressive performance during the interview despite his unconventional attire (having come straight from a night in a jail cell due to unpaid parking tickets).\n* Building Rapport: Chris politely thanks him, addressing him as “Mr. Twistle.” In a sign of newfound respect and a positive result, Twistle insists, “Hey, now you can call me Jay. We'll talk to you soon.” He gives Chris a friendly pat on the shoulder before walking away.\n* Gardner's Reaction: Chris is left standing in the hallway, a look of immense relief and quiet triumph washing over his face. The scene highlights a pivotal moment where his intelligence and determination overcame his difficult circumstances.\n\nThe video features the “Binge Society” logo in the top left corner and copyright information at the bottom for Columbia Pictures Industries, Inc. and GH One LLC from 2006.’,
‘similarity’: 0.4487778141613705},
{‘title’: ‘Bangkok Street Tour’,
‘description’: “The video is a static, high-angle shot overlooking a busy multi-lane city street in what appears to be Bangkok, Thailand, indicated by the presence of tuk-tuks and brightly colored taxis. The scene captures the constant flow of traffic throughout the entire clip.\n\nIn the foreground on the left, a blue hatchback and a traditional three-wheeled tuk-tuk with a pink delivery bag on its back are either stationary or moving very slowly. Throughout the video, various vehicles, including white sedans, silver SUVs, motorcycles, and the city’s signature pink and green-yellow taxis, navigate the lanes. \n\nThe road is divided by a narrow median with small green bushes. Traffic moves in both directions, with vehicles heading away from the camera and towards it. On the left side of the street, large multi-story buildings feature several prominent billboards, one of which displays a woman’s face. On the right, a row of trees lines the sidewalk, behind which several large, white-roofed structures with pink accents are visible. In the background, a pedestrian overpass crosses the busy road, and taller city buildings can be seen in the distance under a bright, overcast sky. The overall atmosphere is one of a typical, bustling urban afternoon.”,
‘similarity’: 0.13178936719516832}]}
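Outside a notebook, the same route can be called over plain HTTP. The following is a minimal standard-library sketch, assuming the app is served at http://localhost:8000 (the host and port are assumptions; the /search path, the query_text and limit fields, and the rows response shape come from the example above). The helper names are hypothetical, and the request is built but not sent so the snippet runs without a server.

```python
import json
import urllib.request

# Assumption: the FastAPI app is reachable locally on port 8000.
BASE = 'http://localhost:8000'


def build_search_request(query_text, limit=5):
    """Build (without sending) a JSON POST request for the /search route."""
    body = json.dumps({'query_text': query_text, 'limit': limit}).encode('utf-8')
    return urllib.request.Request(
        f'{BASE}/search',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def top_titles(payload, min_similarity=0.0):
    """Return titles from a payload shaped like {'rows': [{'title', 'similarity', ...}]}."""
    return [
        row['title']
        for row in payload.get('rows', [])
        if row['similarity'] >= min_similarity
    ]


req = build_search_request('street food', limit=2)
# To actually send it: payload = json.load(urllib.request.urlopen(req))

# Abbreviated copy of the response shown above (descriptions omitted).
sample_payload = {
    'rows': [
        {'title': 'The Pursuit of Happiness', 'similarity': 0.4487778141613705},
        {'title': 'Bangkok Street Tour', 'similarity': 0.13178936719516832},
    ]
}
print(top_titles(sample_payload, min_similarity=0.2))  # ['The Pursuit of Happiness']
```

Filtering on the similarity field client-side is only one option; a threshold could equally be applied inside the query itself.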
- Version: Automatic History
Every insert, update, and delete is versioned. history() returns the
full changelog.
An agent is just more computed columns. Define tools from @pxt.query
functions, wire tool calling and context assembly as a chain of columns,
and every insert triggers the full reasoning pipeline. Memory is a table
with an embedding index.
This pattern scales to production. See the Starter Kit for a complete
implementation with documents, images, video, and cross-modal search.
from pixeltable.functions.anthropic import invoke_tools, messages
# 1. Define tools from existing @pxt.query functions
tools = pxt.tools(search_videos)
# 2. Memory: chat history with embedding index
chat = pxt.create_table(
    'demo/chat',
    {
        'role': pxt.String,
        'content': pxt.String,
        'conversation_id': pxt.String,
    },
    if_exists='ignore',
)
chat.add_embedding_index(
    'content',
    string_embed=gemini.embed_content.using(
        model='gemini-embedding-2-preview'
    ),
    if_exists='ignore',
)
# 3. Agent pipeline: each step is a computed column
agent = pxt.create_table(
    'demo/agent', {'prompt': pxt.String}, if_exists='ignore'
)
# Step 1: LLM decides which tools to call
agent.add_computed_column(
    response=messages(
        model='claude-sonnet-4-20250514',
        messages=[{'role': 'user', 'content': agent.prompt}],
        tools=tools,
        tool_choice=tools.choice(required=True),
        max_tokens=4096,
    ),
    if_exists='ignore',
)
# Step 2: Pixeltable executes the tool calls (runs search_videos)
agent.add_computed_column(
    tool_output=invoke_tools(tools, agent.response), if_exists='ignore'
)
# In production, add more steps: assemble context, call LLM again with results.
# See the Starter Kit for the full multi-step agent pipeline.
agent.describe()
Created table 'chat'.
Created table 'agent'.
Added 0 column values with 0 errors in 0.01 s
Bonus: Cloud Storage (Optional)
Pixeltable Cloud includes a free managed bucket. Set two config values, and
computed media flows to the cloud. See Cloud Services.
# ~/.pixeltable/config.toml
[pixeltable]
api_key = "your-pixeltable-cloud-api-key"
output_media_dest = "pxtfs://yourorg:yourdb/home"
Summary
Links