Deployment Decision Guide

Pixeltable supports two production deployment patterns. Choose based on your constraints:
Question                                            Answer   Recommendation
Existing production DB that must stay?              Yes      Orchestration Layer
Building a new multimodal app?                      Yes      Full Backend
Need semantic search (RAG)?                         Yes      Full Backend
Only ETL/transformation?                            Yes      Orchestration Layer
Expose Pixeltable as an MCP server for LLM tools?   Yes      Full Backend + MCP Server

Technical Capabilities (Both)

Regardless of deployment mode, you get the same core engine: native multimodal types, declarative computed columns with incremental updates, UDFs, and iterators. The lists under each approach below call out how these capabilities apply.
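
As a minimal illustration of that shared core, the sketch below declares a computed column once and Pixeltable keeps it current as rows arrive. The directory, table, and UDF names are made up for the example.

# Sketch: the shared declarative core (all names here are illustrative)
import pixeltable as pxt

pxt.create_dir('demo', if_exists='ignore')
t = pxt.create_table('demo.notes', {'text': pxt.String}, if_exists='ignore')

@pxt.udf
def word_count(text: str) -> int:
    return len(text.split())

# Declared once; computed for every existing row and for each future insert
t.add_computed_column(n_words=word_count(t.text))

t.insert([{'text': 'hello multimodal world'}])
print(t.select(t.text, t.n_words).collect())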

Deployment Strategies

Approach 1: Pixeltable as Orchestration Layer

Use Pixeltable for multimodal data orchestration while retaining your existing data infrastructure.

Choose this approach when:
  • An existing RDBMS (PostgreSQL, MySQL) and blob storage (S3, GCS, Azure Blob) must remain
  • Your application already queries a separate data layer
  • You need incremental adoption with minimal changes to your stack

How it works:
  • Deploy Pixeltable in a Docker container or on a dedicated compute instance
  • Define tables, views, computed columns, and UDFs for multimodal processing
  • Process videos, documents, audio, and images within Pixeltable
  • Export structured outputs (embeddings, metadata, classifications) to your RDBMS
  • Export generated media to blob storage (see the S3 sketch after the example below)
  • Your application keeps querying the existing data layer, not Pixeltable

What you get:
  • Native multimodal type system (Video, Document, Audio, Image, JSON)
  • Declarative computed columns that eliminate orchestration boilerplate
  • Incremental computation that automatically handles new data
  • UDFs that encapsulate transformation logic
  • LLM call orchestration with automatic rate limiting
  • Iterators for chunking documents, extracting frames, and splitting audio (document chunking is sketched below; frame extraction appears in the full example)
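
The full example below covers frame extraction; for the document-chunking side, here is a minimal sketch of an iterator-backed view (the table and view names are illustrative):

# Sketch: chunking documents with an iterator-backed view (names are illustrative)
import pixeltable as pxt
from pixeltable.iterators import DocumentSplitter

pxt.create_dir('demo', if_exists='ignore')
docs = pxt.create_table('demo.docs', {'doc': pxt.Document}, if_exists='ignore')

# One chunk row per ~300 tokens of each document; new documents are chunked automatically
chunks = pxt.create_view(
    'demo.doc_chunks',
    docs,
    iterator=DocumentSplitter.create(document=docs.doc, separators='token_limit', limit=300)
)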
# Example: Orchestrate in Pixeltable, export to external systems
import pixeltable as pxt
from pixeltable.functions.video import extract_audio
from pixeltable.functions.openai import transcriptions
from pixeltable.iterators import FrameIterator
import psycopg2
import json
from datetime import datetime

# Setup: Define Pixeltable orchestration pipeline
pxt.create_dir('video_processing', if_exists='ignore')

videos = pxt.create_table(
    'video_processing.videos',
    {'video': pxt.Video, 'uploaded_at': pxt.Timestamp}
)

# Computed columns for orchestration
videos.add_computed_column(
    audio=extract_audio(videos.video, format='mp3')
)
videos.add_computed_column(
    transcript=transcriptions(audio=videos.audio, model='whisper-1')
)

# Optional: add an LLM-based summary
from pixeltable.functions.openai import chat_completions
from pixeltable.functions.string import format as pxt_format

videos.add_computed_column(
    summary=chat_completions(
        # A Python f-string would stringify the column expression at definition time;
        # build the per-row prompt with Pixeltable's string format function instead
        messages=[{'role': 'user', 'content': pxt_format('Summarize: {0}', videos.transcript.text)}],
        model='gpt-4o-mini'
    )
)

# Extract frames for analysis
frames = pxt.create_view(
    'video_processing.frames',
    videos,
    iterator=FrameIterator.create(video=videos.video, fps=1.0)
)

# Insert video for processing
videos.insert([{'video': 's3://bucket/video.mp4', 'uploaded_at': datetime.now()}])

# Export structured results to the external RDBMS
conn = psycopg2.connect("postgresql://...")
cursor = conn.cursor()

for row in videos.select(videos.video, videos.transcript).collect():
    cursor.execute(
        "INSERT INTO video_metadata (video_url, transcript_json) VALUES (%s, %s)",
        # psycopg2 can't adapt a dict directly; serialize the JSON explicitly
        (row['video'], json.dumps(row['transcript']))
    )
conn.commit()
cursor.close()
conn.close()
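
Generated media can be shipped out in the same pass. Below is a sketch using boto3, assuming collected media columns come back as local file paths; the bucket name and key layout are placeholders.

# Sketch: export generated audio to blob storage (bucket/key names are placeholders)
import os
import boto3

s3 = boto3.client('s3')
for row in videos.select(videos.audio).collect():
    audio_path = row['audio']  # assumed: media computed columns resolve to local file paths
    s3.upload_file(audio_path, 'my-media-bucket', f'audio/{os.path.basename(audio_path)}')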

Approach 2: Pixeltable as Full Backend

Use Pixeltable for both orchestration and storage as your primary data backend.

Choose this approach when:
  • You're building a new multimodal AI application
  • Semantic search and vector similarity are required
  • Storage and the ML pipeline need tight integration
  • You prefer stack consolidation over separate storage and orchestration layers

How it works:
  • Deploy Pixeltable on a persistent instance (EC2 with EBS, EKS with persistent volumes, a VM)
  • Build API endpoints (FastAPI, Flask, Django) that interact with Pixeltable tables
  • The frontend calls those endpoints to insert data and retrieve results
  • Query using Pixeltable's semantic search, filters, joins, and aggregations
  • All data lives in Pixeltable: metadata, media references, computed column results

What you get:
  • Unified storage, computation, and retrieval in a single system
  • Native semantic search via embedding indexes (backed by pgvector; setup sketched below)
  • No synchronization layer between storage and orchestration
  • Automatic versioning and lineage tracking
  • Incremental computation that propagates through views
  • LLM/agent orchestration
  • Data export to PyTorch, Parquet, and LanceDB
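
The FastAPI example below assumes a table provisioned once, roughly as sketched here. The names and embedding model are illustrative, and in a full setup the summary would be an LLM computed column, as in Approach 1.

# Sketch: one-time setup assumed by the API example below (names are illustrative)
import pixeltable as pxt
from pixeltable.functions.huggingface import sentence_transformer

pxt.create_dir('myapp', if_exists='ignore')
docs = pxt.create_table(
    'myapp.documents',
    {
        'doc_id': pxt.String,
        'document': pxt.Document,
        'summary': pxt.String,  # in practice an LLM computed column, as in Approach 1
        'uploaded_at': pxt.Timestamp,
    },
    if_exists='ignore'
)

# The embedding index on summary is what backs .similarity() in the search endpoint
docs.add_embedding_index(
    'summary',
    string_embed=sentence_transformer.using(model_id='all-MiniLM-L6-v2')
)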
# Example: FastAPI endpoints backed by Pixeltable
from fastapi import FastAPI, UploadFile
from datetime import datetime
from pathlib import Path
import shutil
import tempfile
import uuid
import pixeltable as pxt

app = FastAPI()
# Table provisioned as in the setup sketch above (summary column + embedding index)
docs_table = pxt.get_table('myapp.documents')

@app.post("/documents/upload")
async def upload_document(file: UploadFile):
    # Persist the upload to a local path; Pixeltable stores a reference to the file
    dest = Path(tempfile.gettempdir()) / file.filename
    with dest.open('wb') as f:
        shutil.copyfileobj(file.file, f)
    doc_id = str(uuid.uuid4())
    status = docs_table.insert([{
        'doc_id': doc_id,
        'document': str(dest),
        'uploaded_at': datetime.now()
    }])
    return {"doc_id": doc_id, "rows_inserted": status.num_rows}

@app.get("/documents/search")
async def search_documents(query: str, limit: int = 10):
    sim = docs_table.embedding.similarity(query)
    results = docs_table.select(
        docs_table.document,
        docs_table.summary,  # LLM-generated summary (computed column)
        similarity=sim
    ).order_by(sim, asc=False).limit(limit).collect()

    return {"results": list(results)}

@app.get("/documents/{doc_id}")
async def get_document(doc_id: int):
    result = docs_table.where(docs_table._rowid == doc_id).collect()
    return result[0] if len(result) > 0 else {"error": "Not found"}
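
Exercising the endpoints from a client might look like this; the host, port, and file name are assumptions.

# Sketch: calling the API (host/port and file name are assumptions)
import requests

with open('report.pdf', 'rb') as f:
    resp = requests.post('http://localhost:8000/documents/upload', files={'file': f})
doc_id = resp.json()['doc_id']

hits = requests.get(
    'http://localhost:8000/documents/search',
    params={'query': 'quarterly revenue', 'limit': 5},
).json()
print(hits['results'])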

Next Steps