When to Use Views

Views in Pixeltable are best used when you need to:

  1. Transform Data: When you need to process or reshape data from a base table (e.g., splitting documents into chunks, extracting features from images)
  2. Filter Data: When you frequently need to work with a specific subset of your data
  3. Create Virtual Tables: When you want to avoid storing redundant data and automatically keep derived data in sync
  4. Build Data Workflows: When you need to chain multiple data transformations together
  5. Save Storage: When you want to compute data on demand rather than storing it permanently

Choose a view over a table when the data is derived from a base table (or from another view) and must stay synchronized with its source. Use a regular table when you need to store original data, or when deriving the data on demand would cost too much compute.

Phase 1: Define your base table and view structure

import pixeltable as pxt
from pixeltable.iterators import DocumentSplitter

# Create a directory to organize data (optional).
# Note: force=True deletes any existing 'documents' directory and its contents.
pxt.drop_dir('documents', force=True)
pxt.create_dir('documents')

# Define your base table first
documents = pxt.create_table(
    "documents.collection",
    {"document": pxt.Document}
)

# Create a view that splits documents into chunks
chunks = pxt.create_view(
    'documents.chunks',
    documents,
    iterator=DocumentSplitter.create(
        document=documents.document,
        separators='token_limit',
        limit=300
    )
)

Phase 2: Use your application

import pixeltable as pxt

# Connect to your base table and view (views are also retrieved with get_table)
documents = pxt.get_table("documents.collection")
chunks = pxt.get_table("documents.chunks")

# Insert data into base table - view updates automatically
documents.insert([{
    "document": "path/to/document.pdf"
}])

# Query the view
print(chunks.collect())

View Types

View Operations

Query Operations

Query views like regular tables:

# Basic filtering on view
chunks.where(chunks.text.contains('specific topic')).collect()

# Select specific columns
chunks.select(chunks.text, chunks.pos).collect()

# Order results
chunks.order_by(chunks.pos).limit(5).collect()

Computed Columns

Add computed columns to views:

from pixeltable.functions.huggingface import sentence_transformer

# Add embeddings to chunks
chunks.add_computed_column(
    embedding=sentence_transformer.using(
        model_id='intfloat/e5-large-v2'
    )(chunks.text)
)

Chaining Views

Create views based on other views:

# Create a view on top of the chunks view, keeping only chunks longer than 100 characters
embedded_chunks = pxt.create_view(
    'documents.embedded_chunks',
    chunks.where(chunks.text.len() > 100)
)
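
To make the idea of chaining concrete, here is a minimal plain-Python sketch of two stacked stages: a splitter followed by a length filter. It does not use Pixeltable and is not how Pixeltable implements views. The function names, the sentence-based splitter, and the `min_len` threshold are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator

Row = dict


def split_into_chunks(rows: Iterable[Row]) -> Iterator[Row]:
    # Stage 1 (hypothetical splitter): one chunk per sentence-like piece.
    for row in rows:
        for pos, piece in enumerate(row["document"].split(". ")):
            yield {"pos": pos, "text": piece}


def keep_long_chunks(rows: Iterable[Row], min_len: int = 10) -> Iterator[Row]:
    # Stage 2: a filter playing the role of chunks.where(chunks.text.len() > 100).
    return (row for row in rows if len(row["text"]) > min_len)


base = [{"document": "Short. This sentence is long enough to keep"}]

# The second stage consumes the output of the first, as a view built on a view would.
long_chunks = list(keep_long_chunks(split_into_chunks(base)))
print(long_chunks)
```

Each stage only knows about the rows it receives, which is why stages can be stacked in any depth.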

Key Features

Automatic Updates

Views automatically update when base tables change.
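
As a rough mental model of automatic updates, the toy sketch below pushes newly inserted base rows to every dependent view. It is plain Python written for illustration, not Pixeltable's implementation. `ToyTable`, `ToyView`, and `split_words` are hypothetical names that do not exist in the library.

```python
from typing import Callable

Row = dict


class ToyTable:
    """Plain-Python stand-in for a base table."""

    def __init__(self) -> None:
        self.rows: list[Row] = []
        self._views: list["ToyView"] = []

    def insert(self, new_rows: list[Row]) -> None:
        self.rows.extend(new_rows)
        # Push only the newly inserted rows to every dependent view.
        for view in self._views:
            view._on_insert(new_rows)


class ToyView:
    """Derived rows that are refreshed whenever the base table changes."""

    def __init__(self, base: ToyTable, expand: Callable[[Row], list[Row]]) -> None:
        self.rows: list[Row] = []
        self._expand = expand
        base._views.append(self)
        self._on_insert(base.rows)  # pick up rows that already exist

    def _on_insert(self, new_rows: list[Row]) -> None:
        for row in new_rows:
            self.rows.extend(self._expand(row))


def split_words(row: Row) -> list[Row]:
    # Hypothetical splitter: one output row per whitespace-separated word.
    return [{"pos": i, "text": w} for i, w in enumerate(row["document"].split())]


docs = ToyTable()
chunks = ToyView(docs, split_words)

# Inserting into the base table is the only step; the view follows along.
docs.insert([{"document": "views stay in sync"}])
print(len(chunks.rows))
```

The caller never refreshes the view explicitly, which mirrors the insert-then-query flow in Phase 2.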

Virtual Storage

Views compute data on demand, saving storage.

Workflow Integration

Views can be chained with tables and other views to build larger data workflows.

Additional Resources