Persistent Storage

All data and computed results are automatically stored and versioned.

Computed Columns

Define transformations once; they run automatically on new data.

Multimodal Support

Handle images, video, audio, and text seamlessly in one unified interface.

AI Integration

Built-in support for AI services like OpenAI, YOLOX, Together, Label Studio, Replicate…

The steps below will get you started in about a minute. To learn more, see this tutorial on GitHub.

Getting Started

Start Building (Step 1)

pip install pixeltable

Get up and running with basic tables, queries, and computed columns.

import pixeltable as pxt

# Create a table to hold data
t = pxt.create_table('films_table', {
    'name': pxt.String,
    'revenue': pxt.Float,
    'budget': pxt.Float
})

All data and computed results are automatically stored and versioned.

Add Processing (Step 2)

Add LLMs, computer vision, and embedding indexes, then build your first multimodal app.

from pixeltable.functions.huggingface import clip
import PIL.Image

# Create an embedding index on the 'img' column of table 't'
# (assumes 't' has an image column 'img' and an 'id' column)
t.add_embedding_index(
    'img',
    embedding=clip.using(model_id='openai/clip-vit-base-patch32')
)

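# The index is kept up to date as rows are added, so searches stay relevant.
# 'sample_img' is assumed to be a PIL.Image.Image that you have already loaded.
sim = t.img.similarity(sample_img)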

res = (
    t.order_by(sim, asc=False)  # Order by similarity
    .where(t.id != 6)  # Metadata filtering
    .limit(2)  # Limit number of results to 2
    .select(t.id, t.img, sim)
    .collect()  # Retrieve results now
)

Pixeltable orchestrates model execution, ensuring results are stored, indexed, and accessible through the same query interface.
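To make the query above concrete, here is a plain-Python sketch of the same three steps on toy data: score each row by similarity, filter out one id, and keep the top two. It is an illustration only, not how Pixeltable is implemented. It assumes cosine similarity and uses small hand-written vectors in place of CLIP embeddings; `cosine`, `top_k`, and `rows` are hypothetical names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy stand-ins for rows with an 'id' and an embedded image.
rows = [
    {'id': 1, 'vec': [1.0, 0.0, 0.0]},
    {'id': 2, 'vec': [0.0, 1.0, 0.0]},
    {'id': 6, 'vec': [0.9, 0.1, 0.0]},
    {'id': 7, 'vec': [0.7, 0.7, 0.0]},
]
query = [1.0, 0.0, 0.0]  # stand-in for the embedding of the sample image

def top_k(rows, query, exclude_id, k):
    # Metadata filter (id != exclude_id), then score each remaining row.
    scored = [
        (r['id'], cosine(r['vec'], query))
        for r in rows
        if r['id'] != exclude_id
    ]
    # Order by similarity, highest first, and keep the first k.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

result = top_k(rows, query, exclude_id=6, k=2)
print(result)
```

Row 6 is the second-closest match to the query but is removed by the filter, so the result contains rows 1 and 7.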

Scale Up (Step 3)

Handle production data volumes and deploy your application.

# Import media data (videos, images, audio...)
v = pxt.create_table('videos', {'video': pxt.Video})

prefix = 's3://multimedia-commons/'
paths = [
    'data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4',
    'data/videos/mp4/ffe/feb/ffefebb41485539f964760e6115fbc44.mp4',
    'data/videos/mp4/ffe/f73/ffef7384d698b5f70d411c696247169.mp4'
]
v.insert({'video': prefix + p} for p in paths)

Handle images, video, audio, numbers, arrays, and text seamlessly in one interface.

Next Steps