This guide takes you from zero to a working AI application in under 5 minutes. For a deeper walkthrough, see the tutorial on GitHub.

Create Your First Multimodal AI Application

Let’s build an image analysis application that combines object detection and GPT-4 Vision.

Installation

For full setup details, refer to the installation section.

pip install -qU torch transformers openai pixeltable

1. Create Table Structure

import pixeltable as pxt

# Create directory for our tables
pxt.drop_dir('demo', force=True)  
pxt.create_dir('demo')

# Create table with image column
t = pxt.create_table('demo.first', {'input_image': pxt.Image})

This creates a persistent, versioned table with a single image column.

2. Add Object Detection

from pixeltable.functions import huggingface

# Add DETR object detection (ResNet-50 backbone)
t.add_computed_column(
    detections=huggingface.detr_for_object_detection(
        t.input_image, 
        model_id='facebook/detr-resnet-50'
    )
)

# Extract just the labels
t.add_computed_column(detections_text=t.detections.label_text)

Computed columns are populated whenever new data is added to their input columns.
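To make the computed-column behavior concrete, here is a minimal plain-Python sketch of the idea. It is a conceptual model only, not Pixeltable's implementation: `ToyTable` and its methods are hypothetical names invented for this illustration, and it assumes that adding a computed column also fills in rows that already exist.

```python
from typing import Any, Callable

Row = dict[str, Any]


class ToyTable:
    """Conceptual stand-in for a table with computed columns (not Pixeltable code)."""

    def __init__(self) -> None:
        self.rows: list[Row] = []
        self.computed: dict[str, Callable[[Row], Any]] = {}

    def add_computed_column(self, name: str, fn: Callable[[Row], Any]) -> None:
        # Register the transformation and fill it in for rows that already exist.
        self.computed[name] = fn
        for row in self.rows:
            row[name] = fn(row)

    def insert(self, **values: Any) -> None:
        # Evaluate every registered computed column for the new row only.
        row = dict(values)
        for name, fn in self.computed.items():
            row[name] = fn(row)
        self.rows.append(row)


table = ToyTable()
table.insert(text='hello')
table.add_computed_column('length', lambda row: len(row['text']))
table.insert(text='pixeltable')
print([row['length'] for row in table.rows])  # prints [5, 10]
```

The transformation is declared once; every later insert picks it up without any extra call.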

3. Add GPT-4 Vision Analysis

import os
import getpass

from pixeltable.functions import openai    

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('Enter your OpenAI API key: ')

t.add_computed_column(
    vision=openai.vision(
        prompt="Describe what's in this image.",
        image=t.input_image,
        model='gpt-4o-mini'
    )
)

Pixeltable handles parallelization, rate limiting, and incremental processing automatically.
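For comparison, this is roughly the bookkeeping you would otherwise write by hand. The sketch uses only the standard library and is not how Pixeltable works internally; `describe`, `process_new`, and the `results` cache are hypothetical names, and `describe` is a stand-in for a remote model call. Capping the worker count only bounds concurrency; it is not real rate limiting.

```python
from concurrent.futures import ThreadPoolExecutor

# Cache of already-processed inputs (hypothetical; stands in for stored results).
results: dict[str, str] = {}


def describe(url: str) -> str:
    # Stand-in for a remote model call.
    return f'description of {url}'


def process_new(urls: list[str], max_workers: int = 4) -> list[str]:
    """Process only URLs not seen before, with at most max_workers running at once."""
    # dict.fromkeys removes duplicates while keeping the original order.
    pending = [u for u in dict.fromkeys(urls) if u not in results]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for url, text in zip(pending, pool.map(describe, pending)):
            results[url] = text
    return pending


print(process_new(['a.jpg', 'b.jpg']))  # prints ['a.jpg', 'b.jpg']
print(process_new(['b.jpg', 'c.jpg']))  # prints ['c.jpg']
```

The second call skips `b.jpg` because its result is already cached, which is the essence of incremental processing.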

4. Use Your Application

# Insert an image
t.insert(input_image='https://raw.github.com/pixeltable/pixeltable/release/docs/resources/images/000000000025.jpg')

# Retrieve results
t.select(
    t.input_image,
    t.detections_text,
    t.vision
).collect()

The query engine evaluates lazily: building a query computes nothing, and collect() computes only what that query needs.
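The general idea of lazy evaluation can be illustrated in a few lines of plain Python. This is a toy model under stated assumptions, not Pixeltable's query engine: `LazyQuery`, `cheap`, and `expensive` are hypothetical names, and the call counters exist only to show which expressions actually ran.

```python
from typing import Any, Callable

Row = dict[str, Any]
calls = {'cheap': 0, 'expensive': 0}


def cheap(row: Row) -> int:
    calls['cheap'] += 1
    return len(row['name'])


def expensive(row: Row) -> str:
    calls['expensive'] += 1
    return row['name'].upper()


class LazyQuery:
    """Records what was requested; evaluates nothing until collect() is called."""

    def __init__(self, rows: list[Row], exprs: dict[str, Callable[[Row], Any]],
                 selected: tuple[str, ...] = ()) -> None:
        self.rows = rows
        self.exprs = exprs
        self.selected = selected

    def select(self, *names: str) -> 'LazyQuery':
        # Only remember the selection; no expression runs here.
        return LazyQuery(self.rows, self.exprs, names)

    def collect(self) -> list[Row]:
        # Evaluate just the selected expressions, row by row.
        return [{n: self.exprs[n](row) for n in self.selected} for row in self.rows]


rows = [{'name': 'cat'}, {'name': 'giraffe'}]
query = LazyQuery(rows, {'cheap': cheap, 'expensive': expensive}).select('cheap')
print(calls)   # prints {'cheap': 0, 'expensive': 0}
result = query.collect()
print(result)  # prints [{'cheap': 3}, {'cheap': 7}]
print(calls)   # prints {'cheap': 2, 'expensive': 0}
```

The unselected `expensive` expression is never called, which is the behavior the term "lazy" refers to.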

Key Features

Persistent Storage

All data and computed results are automatically stored and versioned. Your app state persists between sessions.

Computed Columns

Define transformations once; they run automatically on new data. Perfect for AI orchestration.

Multimodal Support

Handle images, video, audio, and text seamlessly in one unified interface.

AI Integration

Built-in support for popular AI services and tools, including OpenAI, YOLOX, Hugging Face, Label Studio, Replicate, and Anthropic, among others.

Custom Functions (UDFs)

Extend Pixeltable with your own functions using the @pxt.udf decorator:

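@pxt.udf
def top_detection(detect: dict) -> str:
    """Return the label of the highest-scoring detection."""
    scores = detect['scores']
    label_text = detect['label_text']
    if not scores:
        return ''  # no detections in this image; max() would fail on an empty list
    # The index of the highest score selects the matching label
    i = scores.index(max(scores))
    return label_text[i]

# Use it in a computed column
t.add_computed_column(top=top_detection(t.detections))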
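Because a UDF body is ordinary Python, its logic can be checked on a hand-written input before it is attached to a table. The sketch below assumes the detection result is a dict with parallel `scores` and `label_text` lists, as the UDF above reads it. `pick_top_label` is a hypothetical undecorated copy of that logic, the sample values are made up for illustration, and returning an empty string when nothing was detected is a choice made for this sketch.

```python
def pick_top_label(detect: dict) -> str:
    """Plain-Python version of the selection logic: label with the highest score."""
    scores = detect['scores']
    label_text = detect['label_text']
    if not scores:
        return ''  # nothing detected; max() would raise on an empty list
    i = scores.index(max(scores))
    return label_text[i]


# Hypothetical detection result; these values are invented, not real model output.
sample = {
    'scores': [0.62, 0.97, 0.88],
    'label_text': ['umbrella', 'giraffe', 'giraffe'],
}
print(pick_top_label(sample))  # prints giraffe
```

Testing the plain function first keeps the feedback loop short, since no model call or table insert is involved.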

Next Steps