Quick Start

This guide takes you from zero to a working AI application in under 5 minutes. For more detail, see the accompanying tutorial on GitHub.

Create Your First Multimodal AI Application

Let’s build an image analysis application that combines object detection and OpenAI Vision.

Installation

For full setup instructions, see the installation section of the documentation. The packages used in this guide can be installed with:

pip install -qU torch transformers openai pixeltable

Create Table Structure

import pixeltable as pxt

# Create a directory for our tables.
# Note: drop_dir with force=True deletes any existing 'demo' directory and its contents.
pxt.drop_dir('demo', force=True)
pxt.create_dir('demo')

# Create table with image column
t = pxt.create_table('demo.first', {'input_image': pxt.Image})

This creates a persistent, versioned table with a single image column named input_image.

Add Object Detection

from pixeltable.functions import huggingface

# Add object detection with DETR (ResNet-50 backbone)
t.add_computed_column(
    detections=huggingface.detr_for_object_detection(
        t.input_image,
        model_id='facebook/detr-resnet-50'
    )
)

# Extract just the labels
t.add_computed_column(detections_text=t.detections.label_text)

Computed columns are populated whenever new data is added to their input columns.
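To make this behavior concrete, here is a minimal plain-Python sketch of a table with computed columns. ToyTable and its methods are hypothetical names used only for illustration; this is a model of the behavior described above, not Pixeltable's implementation or API. Each insert fills every computed column for the new row, and the toy also backfills rows that already exist when a column is added.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class ToyTable:
    """Hypothetical stand-in for a table with computed columns (not Pixeltable's API)."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.computed: Dict[str, Callable[[Row], Any]] = {}

    def add_computed_column(self, name: str, fn: Callable[[Row], Any]) -> None:
        # Register the function and backfill rows that already exist.
        self.computed[name] = fn
        for row in self.rows:
            row[name] = fn(row)

    def insert(self, **values: Any) -> None:
        # Store the input values, then fill each computed column for the new row
        # in the order the columns were defined, so later columns can use earlier ones.
        row = dict(values)
        for name, fn in self.computed.items():
            row[name] = fn(row)
        self.rows.append(row)


table = ToyTable()
table.add_computed_column('width_px', lambda row: row['size'][0])
table.add_computed_column('is_wide', lambda row: row['width_px'] > 500)

# No explicit "run" step: inserting data is what triggers the computation.
table.insert(size=(640, 480))
table.insert(size=(320, 240))
print(table.rows)
```

The second column depends on the first, which mirrors how detections_text above is derived from detections.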

Add OpenAI Vision Analysis

import os
import getpass

from pixeltable.functions import openai

# Prompt for the API key only if it is not already set in the environment
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('Enter your OpenAI API key:')

t.add_computed_column(
    vision=openai.vision(
        prompt="Describe what's in this image.",
        image=t.input_image,
        model='gpt-4o-mini'
    )
)

Pixeltable handles parallelization, rate limiting, and incremental processing automatically.
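As a rough illustration of what incremental processing and bounded parallelism mean, the following plain-Python sketch processes only inputs that have no stored result yet and runs them through a fixed-size thread pool. The names describe, results, and process_incrementally are hypothetical, and describe is a placeholder for a slow remote call. Capping the number of workers is only a crude stand-in for rate limiting, and none of this reflects how Pixeltable works internally.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def describe(image_name: str) -> str:
    # Placeholder for a slow remote call such as a vision model request.
    return f'description of {image_name}'


# Stored results, keyed by input. In a real system this would be persistent.
results: Dict[str, str] = {}


def process_incrementally(image_names: List[str], max_workers: int = 4) -> List[str]:
    """Run describe() only for names without a stored result; return the names processed."""
    pending = [name for name in image_names if name not in results]
    # A bounded pool runs calls in parallel while capping how many are in flight.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for name, text in zip(pending, pool.map(describe, pending)):
            results[name] = text
    return pending


first = process_incrementally(['a.jpg', 'b.jpg'])
second = process_incrementally(['a.jpg', 'b.jpg', 'c.jpg'])
print(first)   # both names are new, so both are processed
print(second)  # only 'c.jpg' is new, so only it is processed
```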

Use Your Application

# Insert an image
t.insert(input_image='https://raw.github.com/pixeltable/pixeltable/release/docs/resources/images/000000000025.jpg')

# Retrieve results
t.select(
  t.input_image,
  t.detections_text,
  t.vision
).collect()

The query engine uses lazy evaluation, only computing what’s needed.
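The idea of lazy evaluation can be sketched in a few lines of plain Python. LazyQuery below is a hypothetical toy, not Pixeltable's query engine: building a query only records which columns were requested, and nothing is computed until collect() is called, so columns that were not selected are never evaluated.

```python
from typing import Any, Callable, Dict, List


class LazyQuery:
    """Hypothetical toy query: records what to compute, runs nothing until collect()."""

    def __init__(self, rows: List[int], columns: Dict[str, Callable[[int], Any]]) -> None:
        self.rows = rows
        self.columns = columns  # column name -> function of a row

    def select(self, *names: str) -> 'LazyQuery':
        # Building the query is cheap: only the requested columns are remembered.
        return LazyQuery(self.rows, {name: self.columns[name] for name in names})

    def collect(self) -> List[Dict[str, Any]]:
        # Work happens here, and only for the columns that were selected.
        return [{name: fn(row) for name, fn in self.columns.items()} for row in self.rows]


calls: List[int] = []  # records every invocation of the expensive function


def expensive(row: int) -> int:
    calls.append(row)
    return row * 100


def cheap(row: int) -> int:
    return row + 1


query = LazyQuery([1, 2, 3], {'cheap': cheap, 'expensive': expensive})
selected = query.select('cheap')   # nothing has been computed yet
result = selected.collect()        # computes only the 'cheap' column
print(result)
print(len(calls))                  # the expensive function was never called
```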

Key Features

Persistent Storage

All data and computed results are automatically stored and versioned. Your app state persists between sessions.

Computed Columns

Define transformations once; they run automatically on new data. Perfect for AI orchestration.

Multimodal Support

Handle images, video, audio, and text seamlessly in one unified interface.

AI Integration

Built-in support for popular AI services and tools, including OpenAI, YOLOX, Hugging Face, Label Studio, Replicate, and Anthropic.

Custom Functions (UDFs)

Extend Pixeltable with your own functions using the @pxt.udf decorator:

@pxt.udf
def top_detection(detect: dict) -> str:
    """Return the label of the highest-scoring detection."""
    scores = detect['scores']
    label_text = detect['label_text']
    # Index of the highest confidence score
    i = scores.index(max(scores))
    return label_text[i]

# Use it in a computed column
t.add_computed_column(top=top_detection(t.detections))
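Because the body of a UDF is ordinary Python, its logic can be checked on a hand-written dictionary before it is registered. The sketch below repeats the same selection logic in an undecorated helper, top_detection_plain (a hypothetical name), and runs it on made-up sample values. The scores and label_text keys are the ones the UDF above reads; the values are invented for illustration.

```python
def top_detection_plain(detect: dict) -> str:
    """Same logic as the UDF above, without the decorator, for local testing."""
    scores = detect['scores']
    label_text = detect['label_text']
    # Position of the highest confidence score.
    # max() raises ValueError if scores is empty, i.e. when nothing was detected.
    i = scores.index(max(scores))
    return label_text[i]


# Made-up detection output, for illustration only.
sample = {
    'scores': [0.42, 0.97, 0.63],
    'label_text': ['tree', 'giraffe', 'zebra'],
}
print(top_detection_plain(sample))  # prints "giraffe"
```

Note that an image with no detections would yield an empty scores list, so a production version may want to handle that case explicitly.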

Next Steps

RAG Tutorial

Build a production-grade RAG system

Video Analysis

Process video with object detection

LLM Integration

Work with OpenAI and other LLM providers
