Pixeltable provides a rich type system designed for multimodal AI applications. Every column and expression has an associated type that determines what data it can hold and what operations are available.
Type overview
| Pixeltable type | Python type | Description |
|---|---|---|
| pxt.String | str | Text data |
| pxt.Int | int | Integer numbers |
| pxt.Float | float | Floating-point numbers |
| pxt.Bool | bool | Boolean values |
| pxt.Timestamp | datetime.datetime | Date/time values |
| pxt.Json | dict, list, str, int, float, bool | Flexible JSON data |
| pxt.Array | np.ndarray | Numerical arrays (embeddings, tensors) |
| pxt.Image | PIL.Image.Image | Image data |
| pxt.Video | str (file path) | Video references |
| pxt.Audio | str (file path) | Audio files |
| pxt.Document | str (file path) | PDF/text documents |
pxt.Audio, pxt.Video, and pxt.Document return file paths when queried. Pixeltable automatically downloads and caches remote media locally. Use .fileurl to get the original URL.
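The sketch below is a plain-Python illustration of the correspondence in the table. It maps a value to the column type whose Python counterpart it matches. infer_type_name is a hypothetical helper and does not reflect how Pixeltable itself infers types. It skips the array and media rows: media values come back as plain path strings, so Python type alone cannot tell them apart from text.

```python
import datetime


def infer_type_name(value):
    """Map a Python value to a Pixeltable type name from the overview table.

    Hypothetical helper for illustration only; not part of Pixeltable's API.
    A str maps to pxt.String here, although pxt.Json can also hold strings.
    """
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "pxt.Bool"
    if isinstance(value, int):
        return "pxt.Int"
    if isinstance(value, float):
        return "pxt.Float"
    if isinstance(value, str):
        return "pxt.String"
    if isinstance(value, datetime.datetime):
        return "pxt.Timestamp"
    if isinstance(value, (dict, list)):
        return "pxt.Json"
    raise TypeError(f"no mapping for {type(value).__name__}")


print(infer_type_name(True), infer_type_name(3), infer_type_name({"a": 1}))
```

The order of the checks matters. Because True is also an int in Python, testing for int first would classify booleans as pxt.Int.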
Basic types
import pixeltable as pxt
# Tables live inside directories; create the 'example' directory first
pxt.create_dir('example', if_exists='ignore')
table = pxt.create_table('example.basic_types', {
    'text': pxt.String,        # Text data
    'count': pxt.Int,          # Integer numbers
    'score': pxt.Float,        # Floating-point numbers
    'active': pxt.Bool,        # Boolean values
    'created': pxt.Timestamp,  # Date/time values
})
Auto-generated UUIDs
Use uuid4() to create columns that auto-generate unique identifiers:
from pixeltable.functions.uuid import uuid4
# UUID as primary key - auto-generated for each row
products = pxt.create_table('example.products', {
    'id': uuid4(),  # Auto-generates a UUID
    'name': pxt.String,
    'price': pxt.Float,
}, primary_key=['id'])
# Insert without providing 'id'; it's generated automatically
products.insert([{'name': 'Laptop', 'price': 999.99}])
You can also add UUIDs to an existing table, such as an orders table:
# Add a UUID column to an existing table named orders
orders.add_computed_column(order_id=uuid4())
See the UUID cookbook for more examples of working with unique identifiers.
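A uuid4() column acts like a default that fills in an identifier whenever a row arrives without one. The plain-Python sketch below models that behavior with the standard library's uuid.uuid4. insert_rows is a hypothetical helper, not Pixeltable code, and it uses a list in place of a table.

```python
import uuid


def insert_rows(table_rows, new_rows):
    """Append rows, generating an 'id' for any row that lacks one.

    Hypothetical helper illustrating auto-generated keys; not Pixeltable code.
    """
    for row in new_rows:
        row = dict(row)  # copy so the caller's dict is not mutated
        if "id" not in row:
            row["id"] = uuid.uuid4()  # random (version 4) UUID
        table_rows.append(row)
    return table_rows


rows = insert_rows([], [
    {"name": "Laptop", "price": 999.99},
    {"name": "Mouse", "price": 19.99},
])
print([str(r["id"]) for r in rows])
```

Version 4 UUIDs are random, so the rows need no coordination to get distinct keys. This is why they suit a primary key that the caller never supplies.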
Media types
Pixeltable natively supports images, video, audio, and documents as first-class column types.
media = pxt.create_table('example.media', {
    'image': pxt.Image,        # Any image
    'video': pxt.Video,        # Video reference
    'audio': pxt.Audio,        # Audio file
    'document': pxt.Document,  # PDF/text document
})
Image specialization
Images can be constrained by resolution and/or color mode:
# Constrain by resolution
thumbnails = pxt.create_table('example.thumbnails', {
    'thumb': pxt.Image[(224, 224)]  # Width 224, height 224
})
# Constrain by color mode
grayscale = pxt.create_table('example.grayscale', {
    'img': pxt.Image['L']  # Grayscale (1-channel)
})
# Constrain both
rgb_fixed = pxt.create_table('example.rgb_fixed', {
    'img': pxt.Image[(300, 200), 'RGB']  # 300x200 RGB images
})
See the PIL Documentation for the full list of image modes ('RGB', 'RGBA', 'L', etc.).
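The sketch below shows in plain Python what a size-and-mode constraint checks. It assumes, as the comments above suggest, that the size is given as (width, height), the same order PIL's Image.size uses, and that a constrained column requires an exact match. satisfies_image_spec is hypothetical. The sketch does not model whether Pixeltable rejects or converts a non-matching image.

```python
def satisfies_image_spec(size, mode, required_size=None, required_mode=None):
    """Check an image's (width, height) and mode against optional constraints.

    Hypothetical helper mirroring the idea of pxt.Image[(w, h), mode];
    a constraint left as None is not checked.
    """
    if required_size is not None and tuple(size) != tuple(required_size):
        return False
    if required_mode is not None and mode != required_mode:
        return False
    return True


# A 300x200 RGB image against the rgb_fixed constraint
print(satisfies_image_spec((300, 200), "RGB", (300, 200), "RGB"))
```

Swapping the two numbers, (200, 300), describes a portrait image rather than a landscape one. A width-first check is therefore not interchangeable with a height-first one.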
Array types (embeddings & tensors)
Arrays are used for embeddings, feature vectors, and tensor data. They must always specify a shape and dtype.
ml_data = pxt.create_table('example.ml_features', {
    # Fixed-size embedding (e.g., a 768-dim CLIP ViT-L/14 embedding)
    'embedding': pxt.Array[(768,), pxt.Float],
    # Variable first dimension (batch of 512-dim vectors)
    'features': pxt.Array[(None, 512), pxt.Float],
    # 3D tensor: flexible height and width, exactly 3 channels
    'tensor': pxt.Array[(None, None, 3), pxt.Float],
})
Array shapes follow NumPy conventions. Use None for unconstrained dimensions:
(512,) — fixed 512-element vector
(None, 768) — variable-length sequence of 768-dim vectors
(64, 64, 3) — fixed 64×64×3 tensor
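The None-as-wildcard rule can be stated as a short check. This plain-Python sketch is a hypothetical helper, not Pixeltable's own validation code. A shape fits a spec when both have the same number of dimensions and every non-None entry matches exactly.

```python
def shape_matches(shape, spec):
    """Return True if `shape` fits `spec`, where None in `spec` matches any size.

    Hypothetical helper illustrating the convention used in pxt.Array shapes.
    """
    if len(shape) != len(spec):
        return False  # the number of dimensions must agree
    return all(want is None or have == want for have, want in zip(shape, spec))


print(shape_matches((10, 768), (None, 768)))  # a 10-step sequence of 768-dim vectors
```

None relaxes only the size of a dimension, not the number of dimensions. A 2-D (64, 64) array therefore does not satisfy a (64, 64, 3) spec.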
Working with arrays
# Arrays can be sliced like NumPy arrays
ml_data.select(
    ml_data.embedding[0],     # First element
    ml_data.embedding[5:10],  # Elements at indices 5 through 9
    ml_data.embedding[-3:],   # Last 3 elements
).collect()
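For a 1-D array, these index and slice expressions follow the same rules as Python sequences. The plain-Python sketch below uses a list as a stand-in for a 10-element embedding to make two details explicit: the end index of a slice is excluded, and negative indices count from the end. The list is illustrative data only.

```python
# Stand-in for a 10-element embedding; 1-D NumPy arrays index the same way
embedding = list(range(10))

first = embedding[0]         # first element
middle = embedding[5:10]     # indices 5..9; the end index 10 is excluded
last_three = embedding[-3:]  # the last three elements

print(first, middle, last_three)
```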
JSON type
The Json type stores flexible structured data—dictionaries, lists, or primitives.
logs = pxt.create_table('example.logs', {
    'event': pxt.Json
})
logs.insert([
    {'event': {'type': 'click', 'x': 100, 'y': 200}},
    {'event': {'type': 'scroll', 'delta': 50}},
    {'event': ['tag1', 'tag2', 'tag3']},
])
JSON path access
Access nested data using dictionary or attribute syntax:
# Dictionary syntax
logs.select(logs.event['type']).collect()
# Attribute syntax (JSONPath)
logs.select(logs.event.type).collect()
# List indexing (rows without a 'tags' key return None)
logs.select(logs.event.tags[0]).collect()
# Slicing
logs.select(logs.event.tags[:2]).collect()
Pixeltable handles missing keys gracefully—you’ll get None instead of an exception.
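This plain-Python sketch models the missing-key behavior over the three events inserted above. json_path_get is a hypothetical helper, not Pixeltable's implementation. It walks a sequence of keys and list indices and returns None as soon as a step does not apply, instead of raising KeyError, IndexError, or TypeError.

```python
def json_path_get(obj, *path):
    """Follow keys/indices through nested JSON; return None if any step is missing.

    Hypothetical helper modelling missing-key handling; not Pixeltable code.
    """
    current = obj
    for step in path:
        if isinstance(current, dict) and isinstance(step, str):
            current = current.get(step)  # missing key -> None
        elif (isinstance(current, list) and isinstance(step, int)
              and -len(current) <= step < len(current)):
            current = current[step]
        else:
            return None  # wrong container type or index out of range
    return current


events = [
    {"type": "click", "x": 100, "y": 200},
    {"type": "scroll", "delta": 50},
    ["tag1", "tag2", "tag3"],
]
print([json_path_get(e, "type") for e in events])
```

The list-valued event has no 'type' key, so it yields None rather than an error. This is the same reason heterogeneous JSON columns can be queried with a single path expression.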
JSON schema validation
Validate JSON columns against a schema to ensure data integrity:
# Define a JSON schema
movie_schema = {
    'type': 'object',
    'properties': {
        'title': {'type': 'string'},
        'year': {'type': 'integer'},
        'rating': {'type': 'number'},
    },
    'required': ['title', 'year'],
}
# Create table with validated JSON column
movies = pxt.create_table('example.validated_movies', {
    'data': pxt.Json[movie_schema]
})
# Valid insert
movies.insert(data={'title': 'Inception', 'year': 2010, 'rating': 8.8})
# Invalid insert raises an error (missing required 'year')
# movies.insert(data={'title': 'Movie'})  # Error!
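This deliberately small validator shows why the commented-out insert fails. It supports only the type, properties, and required keywords used in movie_schema. It assumes standard JSON Schema semantics and is not the validator Pixeltable uses. A full implementation, such as the jsonschema package, covers the entire specification.

```python
movie_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "year": {"type": "integer"},
        "rating": {"type": "number"},
    },
    "required": ["title", "year"],
}

# JSON Schema type names mapped to Python checks. bool is excluded from the
# numeric types because JSON Schema treats true/false as "boolean". This sketch
# also treats only Python int as "integer" (the spec also accepts e.g. 2.0).
_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
}


def validate(value, schema):
    """Return a list of error messages (empty if valid). Minimal subset only."""
    expected = schema.get("type")
    if expected and not _TYPE_CHECKS[expected](value):
        return [f"expected {expected}, got {type(value).__name__}"]
    errors = []
    for key in schema.get("required", []):
        if key not in value:
            errors.append(f"missing required property {key!r}")
    for key, subschema in schema.get("properties", {}).items():
        if key in value:
            errors.extend(f"{key}: {e}" for e in validate(value[key], subschema))
    return errors


print(validate({"title": "Movie"}, movie_schema))
```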
Using Pydantic models
from pydantic import BaseModel
class Movie(BaseModel):
    title: str
    year: int
    rating: float | None = None
# Use the model's JSON schema for validation
movies = pxt.create_table('example.pydantic_movies', {
    'data': pxt.Json[Movie.model_json_schema()]
})
Type conversion
Use astype() to convert between compatible types:
# Cast float to integer
table.select(int_score=table.score.astype(pxt.Int)).collect()
# Cast integer to string
table.select(str_count=table.count.astype(pxt.String)).collect()
# Cast JSON to String (assumes a Json column json_col whose values are strings)
table.select(text=table.json_col.astype(pxt.String)).collect()
Type conversion assumes the underlying value is compatible. Converting a dict to a string will raise an exception.
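The rule above says that a cast re-labels a value that already has the target shape. It does not serialize the value. The plain-Python sketch below models that rule. json_to_string is a hypothetical helper, not Pixeltable's astype implementation. When the text form of a dict is actually wanted, explicit serialization with json.dumps is the plain-Python route.

```python
import json


def json_to_string(value):
    """Cast a JSON value to a string only when it already is one.

    Hypothetical helper modelling the strict cast rule; dicts and lists are
    rejected rather than serialized.
    """
    if not isinstance(value, str):
        raise TypeError(f"cannot cast {type(value).__name__} to String")
    return value


print(json_to_string("hello"))
# If you do want text for a dict, serialize it explicitly instead of casting
as_text = json.dumps({"type": "click"})
print(as_text)
```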
Column properties
Media columns (Image, Video, Audio, Document) have special properties:
# Local file path (Pixeltable ensures this is on the local filesystem)
media.select(media.image.localpath).collect()
# Original URL where the media resides
media.select(media.image.fileurl).collect()
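The split between fileurl and localpath reflects the download-and-cache behavior described earlier. The sketch below is hypothetical and does not describe Pixeltable's actual cache layout. Local paths and file:// URLs resolve to themselves. A remote URL maps to a deterministic file in a cache directory, keyed by a hash of the URL, so repeated queries reuse one download. CACHE_DIR and local_path_for are illustrative names, and the download step is omitted.

```python
import hashlib
import os
import tempfile
from urllib.parse import urlparse

# Hypothetical cache location, for illustration only
CACHE_DIR = os.path.join(tempfile.gettempdir(), "media_cache")


def local_path_for(fileurl):
    """Map a media URL to the local path it would be cached at (no download)."""
    parsed = urlparse(fileurl)
    if parsed.scheme in ("", "file"):
        return parsed.path  # already local: use the path as-is
    digest = hashlib.sha256(fileurl.encode("utf-8")).hexdigest()
    extension = os.path.splitext(parsed.path)[1]  # keep e.g. '.mp4' for decoders
    return os.path.join(CACHE_DIR, digest + extension)


print(local_path_for("https://example.com/videos/clip.mp4"))
print(local_path_for("file:///data/a.jpg"))
```

Hashing the full URL makes the mapping stable across calls. Keeping the extension lets tools that sniff the file type by name open the cached copy.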
Error properties
Computed columns have errortype and errormsg properties for debugging:
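# Hypothetical UDF that fails on non-numeric input
@pxt.udf
def parse_int(s: str) -> int:
    return int(s)
t = pxt.create_table('example.errors', {'input': pxt.String})
t.insert([{'input': '42'}, {'input': 'abc'}])
# Create a computed column that might fail
t.add_computed_column(
    result=parse_int(t.input),
    on_error='ignore'  # Record the error and continue instead of aborting
)
# Query error information for failed rows
t.where(t.result == None).select(
    t.input,
    t.result.errortype,  # Exception class name
    t.result.errormsg    # Error message
).collect()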
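The plain-Python sketch below models what on_error='ignore' records. A failing row keeps a None result together with the exception's class name and message, which corresponds to errortype and errormsg. apply_with_errors is a hypothetical helper, not Pixeltable code, and Python's built-in int plays the role of parse_int.

```python
def apply_with_errors(func, inputs):
    """Apply `func` to each input, recording failures instead of aborting.

    Hypothetical helper modelling on_error='ignore'; not Pixeltable code.
    """
    rows = []
    for x in inputs:
        try:
            rows.append({"input": x, "result": func(x),
                         "errortype": None, "errormsg": None})
        except Exception as exc:
            rows.append({"input": x, "result": None,
                         "errortype": type(exc).__name__, "errormsg": str(exc)})
    return rows


rows = apply_with_errors(int, ["42", "7", "abc"])
# Filter on errortype rather than result: a function may legitimately return None
failed = [r for r in rows if r["errortype"] is not None]
print(failed)
```

In this sketch, filtering on a None result would also match rows where the function succeeded and returned None. Checking the recorded error type separates genuine failures from those rows.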
Best practices
Use specific types: Prefer pxt.Image[(224, 224), 'RGB'] over pxt.Image when you know the constraints. This enables optimizations and catches errors early.
Validate JSON: Use JSON schema validation or Pydantic models for structured data to ensure consistency across your pipeline.
Specify array shapes: Always specify array shapes and dtypes. Use None for variable dimensions: pxt.Array[(None, 768), pxt.Float].
Handle errors gracefully: Use on_error='ignore' in production pipelines, then query .errortype and .errormsg to debug failures.
See also