User-Defined Functions (UDFs) in Pixeltable allow you to extend the platform with custom Python code. They bridge the gap between Pixeltable’s built-in operations and your specific data processing needs, enabling you to create reusable components for transformations, analysis, and AI workflows.
Pixeltable UDFs offer several key advantages:
- Reusability: Define a function once and use it across multiple tables and operations
- Type Safety: Strong typing ensures data compatibility throughout your workflows
- Performance: Batch processing and caching capabilities optimize execution
- Integration: Seamlessly combine custom code with Pixeltable's query system
- Flexibility: Process any data type, including text, images, videos, and embeddings
UDFs can be as simple as a basic transformation or as complex as a multi-stage ML workflow. Pixeltable offers three types of custom functions to handle different scenarios:
```python
import pixeltable as pxt

# Basic UDF for text transformation
@pxt.udf
def clean_text(text: str) -> str:
    """Clean and normalize text data."""
    return text.lower().strip()

# Use in a computed column
documents = pxt.get_table('my_documents')
documents.add_computed_column(
    clean_content=clean_text(documents.content)
)
```
```python
# Defined directly in your code
@pxt.udf
def extract_year(date_str: str) -> int:
    return int(date_str.split('-')[0])

# Used immediately
table.add_computed_column(
    year=extract_year(table.date)
)
```
Local UDFs are serialized with their columns. Changes to the UDF only affect new columns.
```python
# In my_functions.py
@pxt.udf
def clean_text(text: str) -> str:
    return text.strip().lower()

# In your application
from my_functions import clean_text

table.add_computed_column(
    clean_content=clean_text(table.content)
)
```
Module UDFs are referenced by path. Changes to the UDF affect all uses after reload.
```python
from pixeltable.func import Batch

@pxt.udf(batch_size=32)
def process_batch(items: Batch[str]) -> Batch[str]:
    results = []
    for item in items:
        results.append(item.upper())
    return results

# Used like a regular UDF
table.add_computed_column(
    processed=process_batch(table.text)
)
```
Batched UDFs process multiple rows at once for better performance.
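Conceptually, batching groups incoming rows into chunks of at most `batch_size` and calls the function once per chunk. A minimal pure-Python sketch of that mechanic (an illustration only, not Pixeltable's actual execution engine):

```python
from typing import Callable, Iterator

def batched(items: list, batch_size: int) -> Iterator[list]:
    """Yield successive chunks of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def apply_batched(fn: Callable[[list], list], rows: list, batch_size: int) -> list:
    """Call fn once per chunk and flatten the results back into row order."""
    out: list = []
    for chunk in batched(rows, batch_size):
        out.extend(fn(chunk))
    return out

# Same logic as the process_batch example above, minus the decorator
def upper_batch(items: list) -> list:
    return [s.upper() for s in items]

print(apply_batched(upper_batch, ['a', 'b', 'c'], batch_size=2))  # ['A', 'B', 'C']
```

The per-chunk call is where the payoff comes from: a model that vectorizes over a batch does its expensive work once per chunk instead of once per row.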
```python
from pixeltable.func import Batch

@pxt.udf(batch_size=16)
def embed_texts(
    texts: Batch[str]
) -> Batch[pxt.Array]:
    # Process multiple texts at once
    # (model is assumed to be a preloaded embedding model)
    return model.encode(texts)
```
Caching
```python
@pxt.udf
def expensive_operation(text: str) -> str:
    # Cache model instance
    if not hasattr(expensive_operation, 'model'):
        expensive_operation.model = load_model()
    return expensive_operation.model(text)
```
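The attribute-caching pattern above runs the expensive load only once per process. A self-contained sketch of the same pattern, with a trivial stand-in for `load_model` that counts how many times it runs:

```python
def load_model():
    # Count invocations so we can verify the cache works;
    # str.upper stands in for a real model
    load_model.calls = getattr(load_model, 'calls', 0) + 1
    return str.upper

def expensive_operation(text: str) -> str:
    # Lazily create the "model" on first call, then reuse it
    if not hasattr(expensive_operation, 'model'):
        expensive_operation.model = load_model()
    return expensive_operation.model(text)

print(expensive_operation('a'), expensive_operation('b'))  # A B
print(load_model.calls)  # 1 -- loaded only once
```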
Async Support
```python
import json
from typing import Optional

@pxt.udf
async def chat_completions(
    messages: list,
    *,
    model: str,
    model_kwargs: Optional[dict] = None,
) -> dict:
    # Issue the API request asynchronously
    result = await openai_client.chat_completions(
        messages=messages,
        model=model,
        model_kwargs=model_kwargs
    )
    # Process response
    return json.loads(result.text)

# Example usage in a computed column
table.add_computed_column(
    response=chat_completions(
        [
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': table.prompt}
        ],
        model='gpt-4o-mini'
    )
)
```
Async UDFs are designed for I/O-bound work such as external API calls: LLM requests, database queries, or web-service interactions. They keep your Pixeltable workflows responsive by letting time-consuming operations run in the background. Don't use them for general computation or CPU-bound data processing.
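To see why this matters, here is a plain-`asyncio` sketch (no Pixeltable involved) in which three simulated API calls overlap instead of running back to back; `fake_api_call` is a made-up stand-in for a real network client:

```python
import asyncio
import time

async def fake_api_call(prompt: str) -> str:
    # Stand-in for a network-bound request (e.g. an LLM call)
    await asyncio.sleep(0.1)
    return f'response to {prompt}'

async def main() -> list[str]:
    # The three awaits overlap, so total wall time is ~0.1s, not ~0.3s
    return await asyncio.gather(*(fake_api_call(p) for p in ('a', 'b', 'c')))

start = time.perf_counter()
responses = asyncio.run(main())
elapsed = time.perf_counter() - start
print(responses)
print(elapsed < 0.25)  # True: the calls ran concurrently
```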
```python
@pxt.udf
def validate_score(score: float) -> float:
    if not 0 <= score <= 100:
        raise ValueError("Score must be between 0 and 100")
    return score
```
Performance
- Use batching for GPU operations
- Cache expensive resources
- Process data in chunks when possible
```python
@pxt.udf(batch_size=32)
def process_chunk(items: Batch[str]) -> Batch[str]:
    if not hasattr(process_chunk, 'model'):
        process_chunk.model = load_expensive_model()
    return process_chunk.model.process_batch(items)
```
Organization
- Keep related UDFs in modules
- Use clear, descriptive names
- Document complex operations
```python
import string

@pxt.udf
def normalize_text(
    text: str,
    lowercase: bool = True,
    remove_punctuation: bool = True
) -> str:
    """Normalize text by optionally lowercasing and removing punctuation."""
    if lowercase:
        text = text.lower()
    if remove_punctuation:
        text = text.translate(str.maketrans("", "", string.punctuation))
    return text
```
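For quick testing outside Pixeltable, the same logic works as plain Python (just drop the `@pxt.udf` decorator):

```python
import string

def normalize_text(text: str, lowercase: bool = True, remove_punctuation: bool = True) -> str:
    """Normalize text by optionally lowercasing and removing punctuation."""
    if lowercase:
        text = text.lower()
    if remove_punctuation:
        text = text.translate(str.maketrans("", "", string.punctuation))
    return text

print(normalize_text("Hello, World!"))                            # hello world
print(normalize_text("Hello, World!", remove_punctuation=False))  # hello, world!
```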
Table UDFs
- Define clear input and output columns for your table UDFs
- Implement cleanup routines for tables that grow large
- Strike a balance between many small tables and monolithic ones
- Use clear naming conventions for tables and their UDFs
- Document the purpose and expected inputs of each table UDF
```python
# Create a table with your workflow
finance_agent = pxt.create_table('directory.financial_analyst', {'prompt': pxt.String})

# Add computed columns for processing
finance_agent.add_computed_column(...)  # processing steps elided
```
```python
# Use like any other UDF
result_table.add_computed_column(
    result=finance_agent_udf(result_table.prompt)
)
```
Setup and Tools
```python
from typing import Optional

import yfinance as yf

import pixeltable as pxt
from pixeltable.functions.openai import chat_completions, invoke_tools

DIRECTORY = 'agent'
OPENAI_MODEL = 'gpt-4o-mini'

# Create a fresh directory
pxt.drop_dir(DIRECTORY, force=True)
pxt.create_dir(DIRECTORY, if_exists='ignore')

# yfinance tool for getting stock information
@pxt.udf
def stock_info(ticker: str) -> Optional[dict]:
    """Get stock info for a given ticker symbol."""
    stock = yf.Ticker(ticker)
    return stock.info

# Helper UDF to create a prompt with tool outputs
@pxt.udf
def create_prompt(question: str, tool_outputs: list[dict]) -> str:
    return f"""
    QUESTION:
    {question}

    RESULTS:
    {tool_outputs}
    """
```
Step 1: Create Agent Table
```python
# Create Financial Analyst Agent Table
finance_agent = pxt.create_table(
    f'{DIRECTORY}.financial_analyst',
    {'prompt': pxt.String},
    if_exists='ignore'
)

# Prepare initial messages for LLM
messages = [{'role': 'user', 'content': finance_agent.prompt}]

# Define available tools
tools = pxt.tools(stock_info)

# Get initial response with tool calls
finance_agent.add_computed_column(
    initial_response=chat_completions(
        model=OPENAI_MODEL,
        messages=messages,
        tools=tools,
        tool_choice=tools.choice(required=True)
    )
)

# Execute the requested tools
finance_agent.add_computed_column(
    tool_output=invoke_tools(tools, finance_agent.initial_response)
)

# Create prompt with tool results
finance_agent.add_computed_column(
    stock_response_prompt=create_prompt(
        finance_agent.prompt,
        finance_agent.tool_output
    )
)

# Generate final response using tool results
final_messages = [
    {'role': 'system', 'content': "Answer the user's question based on the results."},
    {'role': 'user', 'content': finance_agent.stock_response_prompt},
]
finance_agent.add_computed_column(
    final_response=chat_completions(
        model=OPENAI_MODEL,
        messages=final_messages
    )
)

# Extract answer text
finance_agent.add_computed_column(
    answer=finance_agent.final_response.choices[0].message.content
)
```
Step 2: Convert to UDF
```python
# Convert the finance_agent table to a UDF
finance_agent_udf = pxt.udf(
    finance_agent,
    return_value=finance_agent.answer
)
```
Step 3: Create Consumer Table
```python
# Create a Portfolio Manager table that uses the finance agent
portfolio_manager = pxt.create_table(
    f'{DIRECTORY}.portfolio_manager',
    {'prompt': pxt.String},
    if_exists='ignore'
)

# Add the finance agent UDF as a computed column
portfolio_manager.add_computed_column(
    result=finance_agent_udf(portfolio_manager.prompt)
)
```
Step 4: Test the Workflow
```python
# Get the portfolio manager table
portfolio_manager = pxt.get_table(f'{DIRECTORY}.portfolio_manager')

# Insert a test query
portfolio_manager.insert([
    {'prompt': 'What is the price of NVIDIA?'}
])

# View results
result = portfolio_manager.select(portfolio_manager.result).collect()
print(result)
```
You can create a workflow of table UDFs to handle complex multi-stage processing:
```python
# Create a chain of specialized agents
research_agent = pxt.udf(research_table, return_value=research_table.findings)
analysis_agent = pxt.udf(analysis_table, return_value=analysis_table.insights)
report_agent = pxt.udf(report_table, return_value=report_table.document)

# Use them in sequence
workflow.add_computed_column(research=research_agent(workflow.query))
workflow.add_computed_column(analysis=analysis_agent(workflow.research))
workflow.add_computed_column(report=report_agent(workflow.analysis))
```
Parallel Processing with Table UDFs
Execute multiple table UDFs in parallel and combine their results:
```python
# Define specialized agents for different tasks
stock_agent = pxt.udf(stock_table, return_value=stock_table.analysis)
news_agent = pxt.udf(news_table, return_value=news_table.summary)
sentiment_agent = pxt.udf(sentiment_table, return_value=sentiment_table.score)

# Process in parallel
portfolio.add_computed_column(stock_data=stock_agent(portfolio.ticker))
portfolio.add_computed_column(news_data=news_agent(portfolio.ticker))
portfolio.add_computed_column(sentiment=sentiment_agent(portfolio.ticker))

# Combine results
portfolio.add_computed_column(report=combine_insights(
    portfolio.stock_data,
    portfolio.news_data,
    portfolio.sentiment
))
```