What are User-Defined Functions?

User-Defined Functions (UDFs) in Pixeltable allow you to extend the platform with custom Python code. They bridge the gap between Pixeltable’s built-in operations and your specific data processing needs, enabling you to create reusable components for transformations, analysis, and AI workflows.

Pixeltable UDFs offer several key advantages:

  • Reusability: Define a function once and use it across multiple tables and operations
  • Type Safety: Strong typing ensures data compatibility throughout your pipelines
  • Performance: Batch processing and caching capabilities optimize execution
  • Integration: Seamlessly combine custom code with Pixeltable’s query system
  • Flexibility: Process any data type including text, images, videos, and embeddings

UDFs can be as simple as a basic transformation or as complex as a multi-stage ML pipeline. The simplest form is a decorated Python function used in a computed column:

import pixeltable as pxt

# Basic UDF for text transformation
@pxt.udf
def clean_text(text: str) -> str:
    """Clean and normalize text data."""
    return text.lower().strip()

# Use in a computed column
documents = pxt.get_table('my_documents')
documents.add_computed_column(
    clean_content=clean_text(documents.content)
)

This guide covers three types of custom functions in Pixeltable:

  1. Basic User-Defined Functions (UDFs)
  2. Tables as UDFs
  3. User-Defined Aggregates (UDAs)

1. Basic User-Defined Functions (UDFs)

Overview

UDFs allow you to:

  • Write custom Python functions for data processing
  • Integrate them into computed columns and queries
  • Optimize performance through batching
  • Create reusable components for your data workflow

All UDFs require type hints for parameters and return values. This enables Pixeltable to validate and optimize your data workflow before execution.
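
As a rough illustration of why type hints make up-front validation possible, the sketch below reads a function's annotations with the standard library before the function is ever called. It is plain Python, not Pixeltable's internal mechanism, and `describe_signature` is a hypothetical helper introduced only for this example.

```python
from typing import get_type_hints


def add_tax(price: float, rate: float = 0.1) -> float:
    return price * (1 + rate)


def describe_signature(fn):
    """Return (parameter types, return type) read from fn's annotations."""
    hints = get_type_hints(fn)
    return_type = hints.pop('return', None)
    return hints, return_type


# The types are known without running add_tax on any data.
param_types, return_type = describe_signature(add_tax)
print(param_types, return_type)
```

A function without annotations would yield an empty mapping here, which is the kind of gap a framework can only detect if hints are mandatory.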

Creating Basic UDFs

@pxt.udf
def add_tax(price: float, rate: float = 0.1) -> float:
    return price * (1 + rate)

# Use in computed column
table.add_computed_column(
    price_with_tax=add_tax(table.price)
)
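
Conceptually, a computed column applies the function to each row and stores the result. The plain-Python sketch below mimics that behavior on a list of dictionaries (a stand-in for a table, not Pixeltable's API) and shows that `rate` falls back to its default of 0.1 when only the price is passed.

```python
import math


def add_tax(price: float, rate: float = 0.1) -> float:
    return price * (1 + rate)


# Hypothetical rows standing in for a table with a `price` column.
rows = [{'price': 100.0}, {'price': 20.0}]

for row in rows:
    # Only price is supplied, so rate uses its default of 0.1.
    row['price_with_tax'] = add_tax(row['price'])

# The default can be overridden per call.
higher_rate = add_tax(100.0, rate=0.2)
```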

UDF Types

# Local UDF: defined directly in your code
@pxt.udf
def extract_year(date_str: str) -> int:
    return int(date_str.split('-')[0])
    
# Used immediately
table.add_computed_column(
    year=extract_year(table.date)
)

Local UDFs are serialized with their columns. Changes to the UDF only affect new columns.

Performance Optimization

Batching

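from pixeltable.func import Batch

# `model` is assumed to be an already-loaded embedding model with an encode() method
@pxt.udf(batch_size=16)
def embed_texts(
    texts: Batch[str]
) -> Batch[pxt.Array]:
    # Process multiple texts in a single model call
    return model.encode(texts)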
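
To make the effect of `batch_size` concrete, here is a plain-Python sketch of the general batching idea: inputs are split into chunks of at most 16 items and the function is called once per chunk rather than once per row. How Pixeltable actually schedules batches is not described in this guide; `iter_batches`, `run_batched`, and `fake_encode` are hypothetical names used only for illustration.

```python
from typing import Callable, Iterator


def iter_batches(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_encode(texts: list[str]) -> list[list[float]]:
    """Hypothetical stand-in for model.encode: one vector per input text."""
    return [[float(len(t))] for t in texts]


def run_batched(items: list[str],
                batch_fn: Callable[[list[str]], list[list[float]]],
                batch_size: int = 16) -> tuple[list[list[float]], int]:
    """Apply batch_fn chunk by chunk; return the results and the call count."""
    results: list[list[float]] = []
    calls = 0
    for batch in iter_batches(items, batch_size):
        results.extend(batch_fn(batch))
        calls += 1
    return results, calls


texts = [f'doc {i}' for i in range(40)]
vectors, n_calls = run_batched(texts, fake_encode, batch_size=16)
print(n_calls, len(vectors))
```

With 40 inputs and a batch size of 16, the function runs three times (16, 16, and 8 items) instead of 40, which is where the savings come from when each call has fixed overhead.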

Caching

@pxt.udf
def expensive_operation(text: str) -> str:
    # Load the model on the first call, then reuse the cached instance
    if not hasattr(expensive_operation, 'model'):
        expensive_operation.model = load_model()
    return expensive_operation.model(text)
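
The caching pattern above can be checked in isolation with plain Python. In this sketch the `@pxt.udf` decorator is left out, and `load_model` is a hypothetical stand-in that counts how often it runs, so the load-once behavior is visible.

```python
load_count = 0


def load_model():
    """Hypothetical stand-in for an expensive model load."""
    global load_count
    load_count += 1
    return str.upper  # the "model" simply upper-cases its input


def expensive_operation(text: str) -> str:
    # Store the model as a function attribute on first use.
    if not hasattr(expensive_operation, 'model'):
        expensive_operation.model = load_model()
    return expensive_operation.model(text)


outputs = [expensive_operation(t) for t in ['a', 'b', 'c']]
print(outputs, load_count)
```

Three calls produce three results, but the loader runs only once.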

2. Tables as UDFs

Overview

Tables as UDFs allow you to:

  • Convert entire tables into reusable functions
  • Create modular and complex data processing workflows
  • Encapsulate multi-step operations
  • Share workflows between different tables and applications

Tables as UDFs are particularly powerful for building AI agents and complex automation workflows that require multiple processing steps.

Creating Table UDFs

Step 1: Create a Specialized Table

# Create a table with your workflow
finance_agent = pxt.create_table('directory.financial_analyst',
                                {'prompt': pxt.String})
# Add computed columns for processing, ending with the `answer` column
# that Step 2 returns (arguments are elided in the source)
finance_agent.add_computed_column(...)

Step 2: Convert to UDF

# Convert table to UDF by specifying return column
finance_agent_udf = pxt.udf(finance_agent, 
                           return_value=finance_agent.answer)

Step 3: Use the Table UDF

# Use like any other UDF
result_table.add_computed_column(
    result=finance_agent_udf(result_table.prompt)
)
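
The three steps can be pictured with a small plain-Python analogy: a table is an input column plus an ordered list of computed columns, and converting it to a function means running those steps on one input and returning a single chosen column. `MiniTable` and `as_function` are hypothetical names for this toy model and are not part of Pixeltable.

```python
from typing import Any, Callable

Row = dict[str, Any]


class MiniTable:
    """Toy stand-in for a table with computed columns (not Pixeltable's API)."""

    def __init__(self, input_column: str) -> None:
        self.input_column = input_column
        self.steps: list[tuple[str, Callable[[Row], Any]]] = []

    def add_computed_column(self, name: str, fn: Callable[[Row], Any]) -> None:
        # Each step may read any column computed before it.
        self.steps.append((name, fn))

    def as_function(self, return_value: str) -> Callable[[Any], Any]:
        """Wrap the whole workflow as a function returning one column."""
        def run(value: Any) -> Any:
            row: Row = {self.input_column: value}
            for name, fn in self.steps:
                row[name] = fn(row)
            return row[return_value]
        return run


agent = MiniTable('prompt')
agent.add_computed_column('cleaned', lambda row: row['prompt'].strip().lower())
agent.add_computed_column('answer', lambda row: f"answer to: {row['cleaned']}")

agent_fn = agent.as_function(return_value='answer')
result = agent_fn('  What Is EBITDA? ')
print(result)
```

The caller sees only the input and the returned column; intermediate columns such as `cleaned` stay internal, which is the encapsulation this pattern is meant to provide.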

Flow Diagram

[Figure: Agent Table UDF Flow]

Key Benefits of Table UDFs

Modularity

Break complex workflows into reusable components that can be tested and maintained separately.

Encapsulation

Hide implementation details and expose only the necessary inputs and outputs through clean interfaces.

Composition

Combine multiple specialized agents to build more powerful workflows through function composition.

3. User-Defined Aggregates (UDAs)

Overview

UDAs enable you to:

  • Create custom aggregation functions
  • Process multiple rows into a single result
  • Use them in group_by operations
  • Build reusable aggregation logic

Creating UDAs

@pxt.uda
class sum_of_squares(pxt.Aggregator):
    def __init__(self):
        self.cur_sum = 0
        
    def update(self, val: int) -> None:
        self.cur_sum += val * val
        
    def value(self) -> int:
        return self.cur_sum

UDA Components

  1. Initialization (__init__)
    • Sets up initial state
    • Defines parameters
    • Called once at start

  2. Update Method (update)
    • Processes each input row
    • Updates internal state
    • Must handle all value types

  3. Value Method (value)
    • Returns final result
    • Called after all updates
    • Performs final calculations
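
The lifecycle of the three components can be traced with a plain-Python driver. This is a sketch of the call order only, without the `@pxt.uda` decorator or the `pxt.Aggregator` base class; `run_aggregate` is a hypothetical helper that plays the role of the query engine.

```python
class SumOfSquares:
    def __init__(self) -> None:
        self.cur_sum = 0          # initial state, set once

    def update(self, val: int) -> None:
        self.cur_sum += val * val  # called once per input row

    def value(self) -> int:
        return self.cur_sum        # called after all updates


def run_aggregate(agg_cls, values):
    """Create one aggregator, feed it every value, then read the result."""
    agg = agg_cls()
    for v in values:
        agg.update(v)
    return agg.value()


total = run_aggregate(SumOfSquares, [1, 2, 3])
empty_total = run_aggregate(SumOfSquares, [])
print(total, empty_total)
```

Because the state is initialized in `__init__`, an empty input still produces a well-defined result (0 here) rather than an error.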

Using UDAs

# Basic usage
table.select(sum_of_squares(table.value)).collect()

# With grouping
table.group_by(table.category).select(
    table.category,
    sum_of_squares(table.value)
).collect()
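
What grouping means for an aggregate can be sketched in plain Python: each distinct group key gets its own aggregator instance, so state never leaks between groups. The rows and the dictionary-based grouping below are illustrative assumptions, not Pixeltable's execution model.

```python
from collections import defaultdict


class SumOfSquares:
    def __init__(self) -> None:
        self.cur_sum = 0

    def update(self, val: int) -> None:
        self.cur_sum += val * val

    def value(self) -> int:
        return self.cur_sum


# Hypothetical rows standing in for a table with category and value columns.
rows = [
    {'category': 'a', 'value': 1},
    {'category': 'b', 'value': 2},
    {'category': 'a', 'value': 3},
]

# One fresh aggregator per category, created on first use.
aggregators: dict[str, SumOfSquares] = defaultdict(SumOfSquares)
for row in rows:
    aggregators[row['category']].update(row['value'])

result = {category: agg.value() for category, agg in aggregators.items()}
print(result)
```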

Best Practices for UDAs

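  • Manage state carefully: initialize every field in __init__ so each aggregation starts from a clean state
  • Handle edge cases and errors, such as empty input
  • Optimize for performance: keep update cheap, since it runs once per input row
  • Use appropriate type hints on update and value
  • Document expected behavior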
