Views in Pixeltable are best used when you need to:
Transform Data: Process or reshape data from a base table (e.g., splitting documents into chunks, extracting features from images)
Filter Data: Work repeatedly with a specific subset of your data
Create Virtual Tables: Avoid storing redundant data and keep derived data automatically in sync
Build Data Workflows: Chain multiple data transformations together
Save Storage: Compute data on demand rather than storing it permanently
Choose views over tables when your data is derived from other base tables and needs to stay synchronized with its source. Use regular tables when you need to store original data or when the computation cost of deriving data on demand is too high.
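To make "derived data that stays synchronized with its source" concrete, here is a minimal pure-Python sketch. It does not use Pixeltable and is not how Pixeltable is implemented; `ToyTable`, `ToyView`, and `count_words` are hypothetical names invented for this illustration.

```python
from typing import Callable, Iterable

Row = dict[str, object]


class ToyTable:
    """Hypothetical stand-in for a base table: stores rows and notifies its views."""

    def __init__(self) -> None:
        self.rows: list[Row] = []
        self._views: list["ToyView"] = []

    def insert(self, new_rows: Iterable[Row]) -> None:
        batch = list(new_rows)
        self.rows.extend(batch)
        # Every registered view derives its rows from the new batch only.
        for view in self._views:
            view.refresh(batch)


class ToyView:
    """Hypothetical stand-in for a view: holds only rows derived from the base."""

    def __init__(self, base: ToyTable, derive: Callable[[Row], Row]) -> None:
        self.rows: list[Row] = []
        self._derive = derive
        base._views.append(self)
        self.refresh(base.rows)  # pick up rows that already exist in the base

    def refresh(self, new_rows: Iterable[Row]) -> None:
        self.rows.extend(self._derive(row) for row in new_rows)


def count_words(row: Row) -> Row:
    """Derive a summary row from a base row (illustrative transformation)."""
    return {"name": row["name"], "n_words": len(str(row["text"]).split())}


documents = ToyTable()
summary = ToyView(documents, count_words)

# Inserting into the base table is the only write; the view follows along.
documents.insert([{"name": "a.txt", "text": "views stay synchronized"}])
documents.insert([{"name": "b.txt", "text": "no manual refresh"}])
```

After the two inserts, `summary.rows` holds one derived row per document even though nothing was ever written to the view directly, which is the property that makes a view preferable to a hand-maintained second table.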
Phase 1: Define your base table and view structure
    import pixeltable as pxt
    from pixeltable.iterators import DocumentSplitter

    # Create a directory to organize data (optional)
    pxt.drop_dir('documents', force=True)
    pxt.create_dir('documents')

    # Define your base table first
    documents = pxt.create_table(
        "documents.collection",
        {"document": pxt.Document}
    )

    # Create a view that splits documents into chunks
    chunks = pxt.create_view(
        'documents.chunks',
        documents,
        iterator=DocumentSplitter.create(
            document=documents.document,
            separators='token_limit',
            limit=300
        )
    )
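As a rough intuition for what a limit-based split produces, the sketch below cuts a string into consecutive pieces of at most `limit` tokens and numbers them. It is not the `DocumentSplitter` algorithm: whitespace-separated words stand in for real tokenizer tokens, and `split_by_token_limit` is a hypothetical helper. Only the `text` and `pos` field names are taken from the view used on this page.

```python
def split_by_token_limit(text: str, limit: int) -> list[dict]:
    """Split `text` into chunks of at most `limit` whitespace-separated tokens.

    Assumption for illustration only: a "token" here is a word, not a
    tokenizer token as a real document splitter would use.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    tokens = text.split()
    pieces = []
    for pos, start in enumerate(range(0, len(tokens), limit)):
        pieces.append({"pos": pos, "text": " ".join(tokens[start:start + limit])})
    return pieces


sample = "one two three four five six seven"
toy_chunks = split_by_token_limit(sample, limit=3)
for chunk in toy_chunks:
    print(chunk["pos"], chunk["text"])
```

Each base row fans out into several view rows, one per chunk, which is why the chunk view typically has many more rows than the documents table.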
    import pixeltable as pxt

    # Connect to your base table and view
    documents = pxt.get_table("documents.collection")
    chunks = pxt.get_table("documents.chunks")

    # Insert data into base table - view updates automatically
    documents.insert([{
        "document": "path/to/document.pdf"
    }])

    # Query the view
    print(chunks.collect())
    # Basic filtering on view
    chunks.where(chunks.text.contains('specific topic')).collect()

    # Select specific columns
    chunks.select(chunks.text, chunks.pos).collect()

    # Order results
    chunks.order_by(chunks.pos).limit(5).collect()
Computed Columns
Add computed columns to views:
    from pixeltable.functions.huggingface import sentence_transformer

    # Add embeddings to chunks
    chunks.add_computed_column(
        embedding=sentence_transformer.using(
            model_id='intfloat/e5-large-v2'
        )(chunks.text)
    )
Chaining Views
Create views based on other views:
    # Create a view of embedded chunks, keeping only those longer than 100 characters
    embedded_chunks = pxt.create_view(
        'documents.embedded_chunks',
        chunks.where(chunks.text.len() > 100)
    )
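The idea of a view built on another view can be pictured with a small pure-Python sketch in which each level is a filter over the level beneath it and is re-evaluated whenever it is read. This is a conceptual toy under that assumption, not a description of Pixeltable's internals; `derive`, `base_rows`, and the predicates are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Row = dict[str, object]
Source = Callable[[], Iterable[Row]]


def derive(source: Source, keep: Callable[[Row], bool]) -> Callable[[], Iterator[Row]]:
    """Return a reader that re-reads `source` and filters it on every call."""

    def read() -> Iterator[Row]:
        return (row for row in source() if keep(row))

    return read


base_rows: list[Row] = [
    {"pos": 0, "text": "short"},
    {"pos": 1, "text": "views can be chained together"},
]


def all_chunks() -> Iterable[Row]:
    """Level 0: the underlying rows."""
    return base_rows


# Level 1 filters level 0; level 2 filters level 1.
long_chunks = derive(all_chunks, lambda row: len(str(row["text"])) > 10)
long_view_chunks = derive(long_chunks, lambda row: "view" in str(row["text"]))

before = [row["pos"] for row in long_view_chunks()]

# A new base row flows through both levels the next time the chain is read.
base_rows.append({"pos": 2, "text": "another view over a view"})
after = [row["pos"] for row in long_view_chunks()]
```

Because each level only refers to the one below it, a change at the bottom of the chain is visible at the top without any level being rebuilt by hand.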