Pixeltable provides a rich type system designed for multimodal AI applications. Every column and expression has an associated type that determines what data it can hold and what operations are available.

Type overview

Pixeltable Type | Python Type                        | Description
pxt.String      | str                                | Text data
pxt.Int         | int                                | Integer numbers
pxt.Float       | float                              | Decimal numbers
pxt.Bool        | bool                               | Boolean values
pxt.Timestamp   | datetime.datetime                  | Timestamp values
pxt.Date        | datetime.date                      | Date values
pxt.UUID        | uuid.UUID                          | Unique identifiers
pxt.Array       | np.ndarray                         | Numerical arrays (embeddings, tensors)
pxt.Json        | dict, list, str, int, float, bool  | Flexible JSON data
pxt.Image       | PIL.Image.Image                    | Image data
pxt.Video       | str (file path)                    | Video files
pxt.Audio       | str (file path)                    | Audio files
pxt.Document    | str (file path)                    | Documents (PDFs, markdown, html, etc.)
pxt.Audio, pxt.Video, and pxt.Document return file paths when queried. Pixeltable automatically downloads and caches remote media locally. Use .fileurl to get the original URL.

Basic types

import pixeltable as pxt

table = pxt.create_table('example/basic_types', {
    'text': pxt.String,        # Text data
    'count': pxt.Int,          # Integer numbers
    'score': pxt.Float,        # Decimal numbers
    'active': pxt.Bool,        # Boolean values
    'created': pxt.Timestamp   # Date/time values
})

Auto-generated UUIDs

Use uuid7() to create columns that auto-generate unique identifiers:
from pixeltable.functions.uuid import uuid7

# UUID as primary key - auto-generated for each row
products = pxt.create_table('example/products', {
    'id': uuid7(),           # Auto-generates UUID
    'name': pxt.String,
    'price': pxt.Float
}, primary_key=['id'])

# Insert without providing 'id' - it's generated automatically
products.insert([{'name': 'Laptop', 'price': 999.99}])
You can also add UUIDs to existing tables:
# Add UUID column to existing table
orders.add_computed_column(order_id=uuid7())
By default, all computed columns use stored=True: values are computed once and persisted. For UUIDs, this keeps identifiers stable. Setting stored=False would regenerate the UUIDs on every query, which is almost never what you want.
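The difference between a stored and an unstored generated value can be sketched in plain Python, without Pixeltable. This is only an illustration of the idea: uuid.uuid4() from the standard library stands in for uuid7(), and the names make_id, read_stored, and read_unstored are hypothetical, not part of Pixeltable's API.

```python
import uuid

def make_id() -> uuid.UUID:
    # Stand-in generator (assumption): uuid4 is used only because it
    # ships with every Python version; Pixeltable's uuid7() is separate.
    return uuid.uuid4()

# "Stored" behavior: the value is computed once per row and kept.
stored_ids = {row: make_id() for row in ('laptop', 'phone')}

def read_stored(row: str) -> uuid.UUID:
    # Every read returns the value that was persisted for the row.
    return stored_ids[row]

def read_unstored(row: str) -> uuid.UUID:
    # "Unstored" behavior: the value is recomputed on every read.
    return make_id()

# Two reads of a stored value agree; two reads of an unstored value do not.
stable = read_stored('laptop') == read_stored('laptop')
unstable = read_unstored('laptop') == read_unstored('laptop')
```

Here stable is True and unstable is False, which is why an identifier column should keep the default.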
See the UUID cookbook for more examples of working with unique identifiers.

Media types

Pixeltable natively supports images, video, audio, and documents as first-class column types.
media = pxt.create_table('example/media', {
    'image': pxt.Image,      # Any image
    'video': pxt.Video,      # Video reference
    'audio': pxt.Audio,      # Audio file
    'document': pxt.Document # PDF/text document
})

Image specialization

Images can be constrained by resolution and/or color mode:
# Constrain by resolution
thumbnails = pxt.create_table('example/thumbnails', {
    'thumb': pxt.Image[(224, 224)]  # Width 224, height 224
})

# Constrain by color mode
grayscale = pxt.create_table('example/grayscale', {
    'img': pxt.Image['L']  # Grayscale (1-channel)
})

# Constrain both
rgb_fixed = pxt.create_table('example/rgb_fixed', {
    'img': pxt.Image[(300, 200), 'RGB']  # 300x200 RGB images
})
See the PIL Documentation for the full list of image modes ('RGB', 'RGBA', 'L', etc.).

Array types (embeddings & tensors)

Arrays are used for embeddings, feature vectors, and tensor data. They must always specify a shape and dtype.
ml_data = pxt.create_table('example/ml_features', {
    # Fixed-size embedding (e.g., from CLIP or OpenAI)
    'embedding': pxt.Array[(768,), pxt.Float],

    # Variable first dimension (batch of 512-dim vectors)
    'features': pxt.Array[(None, 512), pxt.Float],

    # 3D tensor with flexible dimensions
    'tensor': pxt.Array[(None, None, 3), pxt.Float]
})
Array shapes follow NumPy conventions. Use None for unconstrained dimensions:
  • (512,) — fixed 512-element vector
  • (None, 768) — variable-length sequence of 768-dim vectors
  • (64, 64, 3) — fixed 64×64×3 tensor
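To make the None convention concrete, here is a minimal sketch in plain Python of how a concrete array shape could be checked against a shape specification. The helper shape_matches is hypothetical and is not Pixeltable's own validation code; it assumes that the number of dimensions must agree and that None accepts any size.

```python
from typing import Optional, Sequence

def shape_matches(shape: Sequence[int], spec: Sequence[Optional[int]]) -> bool:
    """Return True if a concrete shape satisfies a spec where None is a wildcard."""
    if len(shape) != len(spec):
        return False  # the number of dimensions must agree
    # Each dimension must equal the spec, unless the spec leaves it open.
    return all(want is None or want == got for got, want in zip(shape, spec))

print(shape_matches((512,), (512,)))          # True
print(shape_matches((10, 768), (None, 768)))  # True
print(shape_matches((10, 512), (None, 768)))  # False
print(shape_matches((768,), (None, 768)))     # False
```

The last call shows that a single 768-element vector does not satisfy (None, 768): the spec describes a two-dimensional array.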

Working with arrays

# Arrays can be sliced like NumPy arrays
# (ml_data is the table created above)
ml_data.select(
    ml_data.embedding[0],      # First element
    ml_data.embedding[5:10],   # Slice
    ml_data.embedding[-3:]     # Last 3 elements
).collect()

JSON type

The Json type stores flexible structured data—dictionaries, lists, or primitives.
logs = pxt.create_table('example/logs', {
    'event': pxt.Json
})

logs.insert([
    {'event': {'type': 'click', 'x': 100, 'y': 200}},
    {'event': {'type': 'scroll', 'delta': 50}},
    {'event': ['tag1', 'tag2', 'tag3']}
])

JSON path access

Access nested data using dictionary or attribute syntax:
# Dictionary syntax (logs is the table created above)
logs.select(logs.event['type']).collect()

# Attribute syntax (JSONPath)
logs.select(logs.event.type).collect()

# List indexing (None for rows without a 'tags' key)
logs.select(logs.event.tags[0]).collect()

# Slicing
logs.select(logs.event.tags[:2]).collect()
Pixeltable handles missing keys gracefully—you’ll get None instead of an exception.
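The "None instead of an exception" behavior can be illustrated with a small plain-Python helper. This is a sketch of the semantics only; get_path is a hypothetical name and does not reflect how Pixeltable evaluates JSON paths internally.

```python
from typing import Any

def get_path(value: Any, *path: Any) -> Any:
    """Follow keys and indices into nested data; return None if any step is missing."""
    for step in path:
        try:
            value = value[step]
        except (KeyError, IndexError, TypeError):
            # Missing key, out-of-range index, or a value that cannot be indexed this way
            return None
    return value

# The same three shapes of event as in the insert example above
events = [
    {'type': 'click', 'x': 100, 'y': 200},
    {'type': 'scroll', 'delta': 50},
    ['tag1', 'tag2', 'tag3'],
]

types = [get_path(e, 'type') for e in events]
print(types)  # ['click', 'scroll', None]
```

The third event is a list, so looking up 'type' yields None rather than raising.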

JSON schema validation

Validate JSON columns against a schema to ensure data integrity:
# Define a JSON schema
movie_schema = {
    'type': 'object',
    'properties': {
        'title': {'type': 'string'},
        'year': {'type': 'integer'},
        'rating': {'type': 'number'}
    },
    'required': ['title', 'year']
}

# Create table with validated JSON column
movies = pxt.create_table('example/validated_movies', {
    'data': pxt.Json[movie_schema]
})

# Valid insert
movies.insert(data={'title': 'Inception', 'year': 2010, 'rating': 8.8})

# Invalid insert raises error (missing required 'year')
# movies.insert(data={'title': 'Movie'})  # Error!
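To show what such a schema accepts and rejects, the following is a deliberately small checker written with the standard library only. It is a sketch, not a JSON Schema implementation: it handles just a single string-valued 'type', 'required', and nested 'properties', and the function name check is hypothetical. Real validation should be left to Pixeltable or a full JSON Schema library.

```python
from typing import Any

# JSON Schema type names mapped to Python types (small subset only)
_JSON_TYPES = {
    'string': str,
    'integer': int,
    'number': (int, float),
    'boolean': bool,
    'object': dict,
    'array': list,
    'null': type(None),
}

def check(data: Any, schema: dict) -> list:
    """Return a list of error messages; an empty list means the data passed."""
    expected = schema.get('type')
    if isinstance(expected, str) and expected in _JSON_TYPES:
        # bool is a subclass of int in Python, but is not a JSON integer or number
        is_bool = isinstance(data, bool)
        ok = isinstance(data, _JSON_TYPES[expected]) and (expected == 'boolean' or not is_bool)
        if not ok:
            return [f'expected {expected}, got {type(data).__name__}']
    errors = []
    if isinstance(data, dict):
        for key in schema.get('required', []):
            if key not in data:
                errors.append(f'missing required property: {key}')
        for key, sub_schema in schema.get('properties', {}).items():
            if key in data:
                errors.extend(f'{key}: {e}' for e in check(data[key], sub_schema))
    return errors

movie_schema = {
    'type': 'object',
    'properties': {
        'title': {'type': 'string'},
        'year': {'type': 'integer'},
        'rating': {'type': 'number'},
    },
    'required': ['title', 'year'],
}

print(check({'title': 'Inception', 'year': 2010, 'rating': 8.8}, movie_schema))  # []
print(check({'title': 'Movie'}, movie_schema))  # ['missing required property: year']
```

Note that 'rating' is optional in this schema: it is type-checked only when present.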

Using Pydantic models

from pydantic import BaseModel

class Movie(BaseModel):
    title: str
    year: int
    rating: float | None = None

# Use the model's JSON schema for validation
movies = pxt.create_table('example/pydantic_movies', {
    'data': pxt.Json[Movie.model_json_schema()]
})

Type conversion

Use astype() to convert string file paths or URLs to media types:
# String file paths → Media types
media = pxt.create_table('media_table', {'path': pxt.String})
media.insert([{'path': '/path/to/image.jpg'}])

# Convert string path to Image
media.select(img=media.path.astype(pxt.Image)).collect()
Primary use case: Converting string columns containing file paths or URLs to media types (Image, Video, Audio, Document).
For other type conversions, use built-in functions from the string, json, or math modules. For example, use string.len() to get string length as an integer, or access JSON fields directly.

Column properties

Media column properties

Media columns (Image, Video, Audio, Document) have special properties:
# t is any table with an Image column named 'image'

# Local file path (Pixeltable ensures this is on local filesystem)
t.select(t.image.localpath).collect()

# Original URL where the media resides
t.select(t.image.fileurl).collect()

Error properties

Computed columns have errortype and errormsg properties for debugging:
# Create a computed column that might fail
# (some_function is a placeholder for any UDF; t is a table with an 'input' column)
t.add_computed_column(
    result=some_function(t.input),
    on_error='ignore'  # Continue on errors
)

# Query error information for failed rows
# (`== None`, not `is None`: the comparison builds a query expression)
t.where(t.result == None).select(
    t.input,
    t.result.errortype,   # Exception class name
    t.result.errormsg     # Error message
).collect()
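The pattern behind on_error='ignore' can be sketched in plain Python: apply a function row by row, and record the exception class name and message instead of aborting. The helper apply_ignoring_errors and the dictionary layout are hypothetical and only mirror the errortype and errormsg properties described above; this is not Pixeltable's implementation.

```python
from typing import Any, Callable

def apply_ignoring_errors(fn: Callable[[Any], Any], inputs: list) -> list:
    """Apply fn to each input, capturing failures instead of raising."""
    rows = []
    for value in inputs:
        row = {'input': value, 'result': None, 'errortype': None, 'errormsg': None}
        try:
            row['result'] = fn(value)
        except Exception as exc:
            row['errortype'] = type(exc).__name__  # exception class name
            row['errormsg'] = str(exc)             # error message
        rows.append(row)
    return rows

rows = apply_ignoring_errors(lambda x: 10 / x, [5, 0, 2])

# Keep only the rows whose computation failed
failed = [r for r in rows if r['errortype'] is not None]
print([(r['input'], r['errortype']) for r in failed])  # [(0, 'ZeroDivisionError')]
```

The sketch filters on errortype rather than on a None result, because a function may legitimately return None for a row that did not fail.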

Best practices

Use specific types

Prefer pxt.Image[(224,224), 'RGB'] over pxt.Image when you know the constraints. This enables optimizations and catches errors early.

Validate JSON

Use JSON schema validation or Pydantic models for structured data to ensure consistency across your pipeline.

Specify array shapes

Always specify array shapes and dtypes. Use None for variable dimensions: pxt.Array[(None, 768), pxt.Float].

Handle errors gracefully

Use on_error='ignore' in production pipelines, then query .errortype and .errormsg to debug failures.

Last modified on February 4, 2026