What are User-Defined Functions?
User-Defined Functions (UDFs) in Pixeltable allow you to extend the platform with custom Python code. They bridge the gap between Pixeltable’s built-in operations and your specific data processing needs, enabling you to create reusable components for transformations, analysis, and AI workflows. Pixeltable UDFs offer several key advantages:
- Reusability: Define a function once and use it across multiple tables and operations
- Type Safety: Strong typing ensures data compatibility throughout your workflows
- Performance: Batch processing and caching capabilities optimize execution
- Integration: Seamlessly combine custom code with Pixeltable’s query system
- Flexibility: Process any data type including text, images, videos, and embeddings
User-Defined Functions in Pixeltable
Learn more about UDFs and UDAs with our in-depth guide.
- Basic User-Defined Functions (UDFs)
- Tables as UDFs
- User-Defined Aggregates (UDAs)
- MCP UDFs
1. Basic User-Defined Functions (UDFs)
Overview
UDFs allow you to:
- Write custom Python functions for data processing
- Integrate them into computed columns and queries
- Optimize performance through batching
- Create reusable components for your data workflow
All UDFs require type hints for parameters and return values. This enables Pixeltable to validate and optimize your data workflow before execution.
Creating Basic UDFs
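As a minimal sketch, assuming Pixeltable is installed and that `pxt.udf` is applied to a fully type-hinted function, a basic UDF could look like the following. The function name `add_tax`, the table `t`, and its `price` column are hypothetical; the `try`/`except` is only there so the plain function still runs where Pixeltable is absent.

```python
try:
    import pixeltable as pxt
except ImportError:
    pxt = None  # fallback so the plain function can still be run and tested


def add_tax(price: float, rate: float) -> float:
    """Return the price with tax applied; parameters and return value are type-hinted."""
    return price * (1 + rate)


# Equivalent to writing @pxt.udf directly above the function definition.
add_tax_udf = pxt.udf(add_tax) if pxt is not None else None

# Hypothetical usage on a table `t` with a float column `price`:
#   t.add_computed_column(price_with_tax=add_tax_udf(t.price, 0.2))
```

Keeping the undecorated function around is a convenience for unit testing, not a requirement.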
UDF Types
Local UDFs are serialized together with the columns that use them, so later changes to a UDF only affect new columns.
Supported Types
Basic Types
Native Python types supported in UDFs:
Complex Types
Pixeltable-specific types:
Performance Optimization
Batching
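A batched UDF receives several rows per call and returns one result per input. The sketch below assumes the `batch_size` argument of `pxt.udf` and the `Batch` type from `pixeltable.func`; the function names are hypothetical, and the text normalization merely stands in for work that benefits from batching, such as GPU inference.

```python
try:
    import pixeltable as pxt
    from pixeltable.func import Batch
except ImportError:
    pxt = None  # fallback so the batch logic can run without Pixeltable


def normalize_all(texts: list[str]) -> list[str]:
    """Process a whole batch in one call; returns one result per input, in order."""
    return [' '.join(t.split()).lower() for t in texts]


if pxt is not None:
    # batch_size caps how many rows are handed to the function per call.
    @pxt.udf(batch_size=16)
    def normalize(texts: Batch[str]) -> Batch[str]:
        return normalize_all(texts)
```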
Caching
Async Support
Async UDFs are specifically designed for handling external API calls, such as LLM calls, database queries, or web service interactions. They should not be used for general computation or data processing. They keep your Pixeltable workflows responsive by allowing background execution of time-consuming operations.
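The following sketch illustrates the idea under the assumption that `pxt.udf` accepts an `async def` function. The name `fetch_label` is hypothetical, and `asyncio.sleep` stands in for a real network call to an LLM or web service.

```python
import asyncio

try:
    import pixeltable as pxt
except ImportError:
    pxt = None  # fallback so the coroutine can run without Pixeltable


async def fetch_label(text: str) -> str:
    """Stand-in for an external API call; the sleep simulates network latency."""
    await asyncio.sleep(0.01)
    return f'label:{text[:10]}'


# Equivalent to writing @pxt.udf above the async function.
fetch_label_udf = pxt.udf(fetch_label) if pxt is not None else None


async def demo() -> list[str]:
    # Awaiting the calls together lets the waits overlap instead of running serially.
    return await asyncio.gather(*(fetch_label(t) for t in ['a', 'b', 'c']))
```

Running `asyncio.run(demo())` shows why this suits I/O-bound work: the three simulated waits overlap, which would not help CPU-bound computation.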
Best Practices for Basic UDFs
Type Safety
- Always provide complete type hints
- Use specific types over generic ones
- Validate input ranges where appropriate
Performance
- Use batching for GPU operations
- Cache expensive resources
- Process data in chunks when possible
Organization
- Keep related UDFs in modules
- Use clear, descriptive names
- Document complex operations
Table UDFs
- Define clear input and output columns for your table UDFs
- Implement cleanup routines for tables that grow large
- Strike a balance between many small tables and a few monolithic ones
- Use clear naming conventions for tables and their UDFs
- Document the purpose and expected inputs for each table UDF
2. Tables as UDFs
Overview
Tables as UDFs allow you to:
- Convert entire tables into reusable functions
- Create modular and complex data processing workflows
- Encapsulate multi-step operations
- Share workflows between different tables and applications
Tables as UDFs are particularly powerful for building AI agents and complex automation workflows that require multiple processing steps.
Creating Table UDFs
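A rough sketch of turning a table into a function, assuming the `pxt.udf(table, return_value=...)` form together with `pxt.create_table` and `add_computed_column`. The table name, column names, and helper functions are hypothetical. Pixeltable is imported lazily so the plain helper can be exercised without it.

```python
def count_words(text: str) -> int:
    """Plain helper that supplies the logic of the table's computed column."""
    return len(text.split())


def build_word_count_udf(table_name: str = 'word_counter'):
    """Create a small table and expose it as a function (requires Pixeltable)."""
    import pixeltable as pxt  # imported lazily so count_words works without it

    # Input column: becomes the parameter of the resulting table UDF.
    t = pxt.create_table(table_name, {'text': pxt.String})
    # Computed column: holds the multi-step logic the table encapsulates.
    t.add_computed_column(n_words=pxt.udf(count_words)(t.text))
    # The chosen column is what the table UDF returns to its caller.
    return pxt.udf(t, return_value=t.n_words)
```

The returned object could then be called from a computed column of another table, which is what makes the table reusable as a building block.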
Flow Diagram
Agent Table UDF Flow
Key Benefits of Table UDFs
Modularity
Break complex workflows into reusable components that can be tested and maintained separately.
Encapsulation
Hide implementation details and expose only the necessary inputs and outputs through clean interfaces.
Composition
Combine multiple specialized agents to build more powerful workflows through function composition.
Advanced Techniques
Chaining Multiple Table UDFs
You can create a workflow of table UDFs to handle complex multi-stage processing:
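The data flow of such a chain can be sketched with plain functions standing in for table UDFs. This is a conceptual model only: in Pixeltable each stage would presumably be a table UDF called from a computed column of the next table, and the names `extract_keywords`, `score_keywords`, and `chain` are hypothetical.

```python
from typing import Callable


def extract_keywords(text: str) -> list[str]:
    """Stage 1 stand-in: keep words longer than four characters."""
    return [w.strip('.,').lower() for w in text.split() if len(w) > 4]


def score_keywords(keywords: list[str]) -> dict[str, int]:
    """Stage 2 stand-in: assign each keyword a score (here, its length)."""
    return {k: len(k) for k in keywords}


def chain(*stages: Callable) -> Callable:
    """Compose stages so each one consumes the previous stage's output."""
    def run(value):
        for stage in stages:
            value = stage(value)
        return value
    return run


pipeline = chain(extract_keywords, score_keywords)
```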
Parallel Processing with Table UDFs
Execute multiple table UDFs in parallel and combine their results:
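The fan-out and combine shape can be illustrated with the standard library, again using plain functions as stand-ins for table UDFs. How Pixeltable itself schedules the calls is not shown here; `summarize`, `count_words`, and `run_parallel` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize(text: str) -> str:
    """Stand-in for one table UDF: return the first sentence."""
    return text.split('.')[0].strip()


def count_words(text: str) -> int:
    """Stand-in for a second, independent table UDF."""
    return len(text.split())


def run_parallel(text: str) -> dict:
    """Submit both stand-ins at once, then merge their results into one record."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        summary = pool.submit(summarize, text)
        words = pool.submit(count_words, text)
        return {'summary': summary.result(), 'words': words.result()}
```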
3. User-Defined Aggregates (UDAs)
Overview
UDAs enable you to:
- Create custom aggregation functions
- Process multiple rows into a single result
- Use them in group_by operations
- Build reusable aggregation logic
Creating UDAs
UDA Components
- Initialization (__init__)
  - Sets up initial state
  - Defines parameters
  - Called once at start
- Update Method (update)
  - Processes each input row
  - Updates internal state
  - Must handle all value types
- Value Method (value)
  - Returns final result
  - Called after all updates
  - Performs final calculations
Using UDAs
Best Practices for UDAs
- Manage state carefully
- Handle edge cases and errors
- Optimize for performance
- Use appropriate type hints
- Document expected behavior
4. MCP UDFs
Overview
MCP UDFs allow you to:
- Connect to a running MCP server.
- Use tools exposed by the MCP server as Pixeltable UDFs.
- Integrate external services and custom logic into your Pixeltable workflows.
Example
Here is a simple example of an MCP server running in a separate Python script:
Additional Resources
API Documentation
Complete API reference