# FAQ

Frequently asked questions about Pixeltable.

## Core Concepts
Pixeltable is open-source AI data infrastructure that provides a declarative, incremental approach to multimodal workloads. It unifies data management, transformation, and AI model execution under a table-like interface. Key features:
- Unified Interface: Manages text, images, video, and audio in a single framework
- Declarative Design: Defines transformations and model inference as computed columns
- Incremental Processing: Automatically handles caching and selective recomputation
- Type System: Provides data validation for multimodal content types
```python
import pixeltable as pxt
from pixeltable.iterators import DocumentSplitter
from pixeltable.functions.huggingface import sentence_transformer

# Create a multimodal table for RAG
docs = pxt.create_table('chatbot.documents', {
    'document': pxt.Document,   # PDF/text files
    'video': pxt.Video,         # MP4 videos
    'audio': pxt.Audio,         # Audio files
    'timestamp': pxt.Timestamp
})

# Create a view for document chunking
chunks = pxt.create_view(
    'chatbot.chunks',
    docs,
    iterator=DocumentSplitter.create(
        document=docs.document,
        separators='sentence',
        metadata='title,heading'
    )
)

# Add an embedding index for similarity search
chunks.add_embedding_index(
    'text',
    string_embed=sentence_transformer
)
```
Pixeltable’s data management approach includes:
- Media Storage: References external files (videos, images, documents) in their original locations
- Incremental Computation: Recomputes only affected parts of the workflow when inputs change
- Type System: Handles various data types including tensors, embeddings, and structured data
- Computed Columns: Defines transformations as functions of other columns
- Built-in Functions: Provides pre-implemented operations for common AI tasks
```python
from pixeltable.iterators import FrameIterator
from pixeltable.functions.huggingface import clip_image, clip_text

# Example: video frame extraction
frames = pxt.create_view(
    'video_search.frames',
    videos,
    iterator=FrameIterator.create(
        video=videos.video,
        fps=1  # Extract 1 frame per second
    )
)

# Add a multimodal search index
frames.add_embedding_index(
    'frame',
    string_embed=clip_text,   # For text-to-image search
    image_embed=clip_image    # For image-to-image search
)
```
Pixeltable’s architecture includes views and computed columns:
### Views
- Virtual tables generated from base tables using iterators (e.g., DocumentSplitter, FrameIterator)
- Enable efficient chunking of documents or extraction of video frames
- Support embedding indexes for similarity search
```python
# Create a document-chunks view
chunks = pxt.create_view(
    'docs.chunks',
    docs,
    iterator=DocumentSplitter.create(
        document=docs.document,
        separators='sentence'
    )
)
```
### Computed Columns
- Columns defined as functions of other columns
- Update automatically when their dependencies change
- Can invoke external services (e.g., LLMs, embedding models)
- Implement custom logic via User-Defined Functions (UDFs)
```python
# Example computed column that generates embeddings
docs_table.add_computed_column(
    embeddings=openai.embeddings(
        docs_table.text,
        model='text-embedding-3-small'
    )
)

# Custom UDF example
@pxt.udf
def create_prompt(context: list[dict], question: str) -> str:
    context_text = "\n".join(item['text'] for item in context)
    return f"Context:\n{context_text}\n\nQuestion: {question}"

# Use the UDF in a computed column
docs_table.add_computed_column(
    prompt=create_prompt(docs_table.context, docs_table.question)
)
```
## Features & Capabilities

### Data Management
- Handles text, images, video, and audio in a unified framework
- Maintains data lineage and version history
- Provides caching mechanisms for efficiency
### RAG Implementation
- Supports document chunking with configurable strategies
- Manages embedding generation and indexing
- Enables similarity search for context retrieval
- Integrates with various LLM providers
### Media Processing
- Extracts and processes video frames
- Supports audio transcription and analysis
- Enables cross-modal search (e.g., searching videos with text)
### Development Features
- Implements computations declaratively
- Processes updates incrementally
- Provides type validation for data integrity
- Supports SQL-like queries for data selection
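The declarative model above can be sketched in plain Python. This is an illustrative toy, not Pixeltable's implementation: a "computed column" is registered as a function of other columns and is filled in automatically when a row is inserted.

```python
# Toy sketch of the computed-column idea (not Pixeltable internals):
# a column is declared as a function of other columns, and its value
# is computed automatically on insert.

class MiniTable:
    def __init__(self, columns):
        self.columns = set(columns)
        self.computed = {}          # name -> (fn, dependency column names)
        self.rows = []

    def add_computed_column(self, name, fn, deps):
        self.computed[name] = (fn, deps)

    def insert(self, row):
        # Evaluate each computed column from its dependencies.
        for name, (fn, deps) in self.computed.items():
            row[name] = fn(*(row[d] for d in deps))
        self.rows.append(row)

t = MiniTable(['question'])
t.add_computed_column('prompt', lambda q: f"Q: {q}", deps=['question'])
t.insert({'question': 'What is Pixeltable?'})
print(t.rows[0]['prompt'])  # -> Q: What is Pixeltable?
```

In Pixeltable itself, the same declaration is made once with `add_computed_column`, and the engine handles evaluation, storage, and recomputation.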
Pixeltable implements RAG workflows through:
```python
# Create a chunks view
chunks = pxt.create_view(
    'chatbot.chunks',
    docs,
    iterator=DocumentSplitter.create(
        document=docs.document,
        separators='sentence',
        metadata='title,heading'
    )
)

# Add an embedding index
chunks.add_embedding_index(
    'text',
    string_embed=sentence_transformer
)

# Define a context-retrieval query
@chunks.query
def get_context(query: str):
    sim = chunks.text.similarity(query)
    return chunks.order_by(sim, asc=False).limit(5)

# Generate a response with retrieved context
docs.add_computed_column(
    context=get_context(docs.question)
)
docs.add_computed_column(
    response=openai.chat_completions(
        messages=[{
            'role': 'user',
            'content': create_prompt(docs.context, docs.question)
        }],
        model='gpt-4o'
    )
)
```
Pixeltable supports video and image workflows:
```python
# Frame extraction
frames = pxt.create_view(
    'video_search.frames',
    videos,
    iterator=FrameIterator.create(
        video=videos.video,
        fps=1
    )
)

# Object detection
frames.add_computed_column(
    detections=yolox(
        frames.frame,
        model_id='yolox_tiny',
        threshold=0.25
    )
)

# Cross-modal search
frames.add_embedding_index(
    'frame',
    string_embed=clip_text,   # For text-to-image search
    image_embed=clip_image    # For image-to-image search
)

# Text query against video frames
sim = frames.frame.similarity("person walking on beach")
results = (
    frames
    .order_by(sim, asc=False)
    .limit(5)
    .collect()
)
```
## Integration & Deployment
Pixeltable provides integrations with:
```python
from pixeltable.functions import openai, anthropic
from pixeltable.functions.huggingface import (
    sentence_transformer,
    clip_image,
    clip_text
)

# OpenAI integration
table.add_computed_column(
    embeddings=openai.embeddings(
        table.text,
        model='text-embedding-3-small'
    )
)

# Anthropic integration
table.add_computed_column(
    analysis=anthropic.messages(
        model='claude-3-sonnet-20240229',
        messages=table.prompt
    )
)

# Hugging Face integration
table.add_computed_column(
    image_embeddings=clip_image(
        table.image,
        model_id='openai/clip-vit-base-patch32'
    )
)
```
Pixeltable also supports local model inference via Ollama, LlamaCPP, and other integrations.
Pixeltable integrates with web frameworks like FastAPI and Gradio:
```python
# FastAPI + Pixeltable example
@app.post("/chat")
async def chat(message: ChatMessage):
    # Insert the question; computed columns run the RAG pipeline
    chat_table.insert([{
        'question': message.message,
        'timestamp': datetime.now()
    }])

    # Retrieve the computed response
    result = chat_table.select(
        chat_table.response
    ).where(
        chat_table.question == message.message
    ).collect()

    return JSONResponse(
        status_code=200,
        content={"response": result['response'][0]}
    )
```
## Using Pixeltable
Pixeltable is designed for:
### Retrieval-Augmented Generation (RAG)
- Document processing, chunking, and embedding
- Context retrieval and relevance ranking
- LLM integration for question answering
- Multimodal RAG with support for text, video, audio sources
### Video and Image Analysis
- Frame extraction and processing
- Object detection and analysis
- Semantic search across video content
- Content transcription and analysis
### ML Workflow Management
- Data preparation and transformation
- Feature extraction and engineering
- Model inference orchestration
- Data versioning and lineage tracking
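Versioning can be sketched in plain Python. This is a conceptual toy, not Pixeltable's storage format: every mutation records a new table version, and a revert restores the previous one (Pixeltable exposes this via `revert()`).

```python
# Toy sketch of table version history (conceptual, not Pixeltable's
# storage format): each insert records a new version; revert() drops
# the latest version and restores the previous state.
import copy

class VersionedTable:
    def __init__(self):
        self.versions = [[]]        # version 0: empty table

    @property
    def rows(self):
        return self.versions[-1]    # current version

    def insert(self, row):
        snapshot = copy.deepcopy(self.rows) + [row]
        self.versions.append(snapshot)

    def revert(self):
        if len(self.versions) > 1:
            self.versions.pop()     # undo the most recent mutation

t = VersionedTable()
t.insert({'id': 1})
t.insert({'id': 2})
t.revert()                          # undo the second insert
print(len(t.rows))                  # -> 1
```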
Key technical characteristics:
- Declarative Computation Model
  - Defines data transformations as computed columns
  - Automatically manages dependency graphs
  - Uses SQL-like operations for data manipulation
  - Tracks data lineage at the column level
- Multimodal Data Support
  - Handles diverse data types with a consistent interface
  - Provides built-in transformations for different modalities
  - Supports cross-modal operations (e.g., text-to-image search)
  - Manages storage and processing efficiency
- Incremental Computation
  - Recomputes only what's necessary when data changes
  - Caches intermediate results
  - Versions data automatically
  - Optimizes computational resource usage
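The incremental-computation idea above can be sketched in plain Python. This is an illustrative toy, not Pixeltable's engine: computed values are cached by input, so when the data changes, only rows whose inputs changed trigger the expensive step again.

```python
# Toy sketch of incremental recomputation (not Pixeltable internals):
# cache each computed value by its input, so only changed rows are
# recomputed when the dataset is updated.

calls = 0

def embed(text):
    """Stand-in for an expensive step (e.g., an embedding model)."""
    global calls
    calls += 1
    return hash(text)

cache = {}

def recompute(rows):
    out = []
    for text in rows:
        if text not in cache:       # recompute only unseen inputs
            cache[text] = embed(text)
        out.append(cache[text])
    return out

recompute(['a', 'b', 'c'])   # 3 calls: everything is new
recompute(['a', 'b', 'd'])   # 1 more call: only 'd' changed
print(calls)                 # -> 4
```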
Pixeltable’s technical specifications:
- Python Version: 3.9 or higher
- Media Storage: References external files in local, remote, or cloud storage
- Memory Requirements: Varies based on dataset size and transformations
- GPU Support: Optional, beneficial for computer vision tasks and local LLM inference
- OS Support: Linux, macOS, Windows
Pixeltable can be installed via pip:

```bash
pip install pixeltable
```