Computed Columns
Computed columns combine automatic calculations with incremental updates: values are computed when data arrive and recomputed only where the underlying data change.
Learn more about computed columns with our in-depth guide.
What are Computed Columns?
Computed columns are persistent table columns whose values are calculated from an expression over other columns. Pixeltable stores the results and keeps them up to date as your data changes, so transformations never have to be rerun by hand.
Why use computed columns?
You would use computed columns when you want to:
- Compute a value based on the contents of other columns
- Automatically update the computed value when the source data changes
- Simplify queries by storing the result of a complex expression once instead of repeating the expression in every query
- Create reproducible data transformation pipelines
How to create a computed column
As soon as a computed column is added to a table, Pixeltable (by default) computes its value for every existing row and stores the results in the new column.
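The toy model below illustrates that backfill behavior in plain Python. ToyTable and its methods are hypothetical and exist only for this sketch; it is not Pixeltable's API and keeps rows in memory rather than persisting them. In Pixeltable itself you add the column to a real table (recent releases provide Table.add_computed_column()).

```python
from typing import Any, Callable

Row = dict[str, Any]


class ToyTable:
    """Hypothetical in-memory stand-in for a table; not Pixeltable's API."""

    def __init__(self) -> None:
        self.rows: list[Row] = []
        # computed column name -> function that derives its value from a row
        self.computed: dict[str, Callable[[Row], Any]] = {}

    def insert(self, *new_rows: Row) -> None:
        for row in new_rows:
            row = dict(row)
            # Every computed column is evaluated for each incoming row.
            for name, fn in self.computed.items():
                row[name] = fn(row)
            self.rows.append(row)

    def add_computed_column(self, name: str, fn: Callable[[Row], Any]) -> None:
        self.computed[name] = fn
        # Backfill: compute and store the value for all rows already present.
        for row in self.rows:
            row[name] = fn(row)


t = ToyTable()
t.insert({"a": 1, "b": 2}, {"a": 10, "b": 20})
t.add_computed_column("total", lambda r: r["a"] + r["b"])  # fills the two existing rows
t.insert({"a": 100, "b": 200})                             # computed on insert
print([r["total"] for r in t.rows])  # [3, 30, 300]
```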
In traditional data workflows, it is commonplace to recompute entire pipelines when the input dataset is changed or enlarged. In Pixeltable, by contrast, all updates are applied incrementally. When new data appear in a table or existing data are altered, Pixeltable will recompute only those rows that are dependent on the changed data.
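A toy model makes the saving concrete. IncrementalTable is hypothetical and does not reflect Pixeltable's internals; it counts how many cell values are computed, and it assumes computed columns read only base columns, so an update never cascades further.

```python
from typing import Any, Callable, Sequence


class IncrementalTable:
    """Hypothetical model of incremental recomputation; not Pixeltable internals."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []
        # computed column name -> (columns it reads, function of those values)
        self.computed: dict[str, tuple[tuple[str, ...], Callable[..., Any]]] = {}
        self.evaluations = 0  # number of cell values (re)computed so far

    def _compute(self, row: dict[str, Any], name: str) -> None:
        inputs, fn = self.computed[name]
        row[name] = fn(*(row[c] for c in inputs))
        self.evaluations += 1

    def add_computed_column(self, name: str, inputs: Sequence[str], fn: Callable[..., Any]) -> None:
        self.computed[name] = (tuple(inputs), fn)
        for row in self.rows:  # backfill existing rows
            self._compute(row, name)

    def insert(self, row: dict[str, Any]) -> None:
        row = dict(row)
        self.rows.append(row)
        for name in self.computed:  # a new row gets every computed column
            self._compute(row, name)

    def update(self, index: int, column: str, value: Any) -> None:
        row = self.rows[index]
        row[column] = value
        # Only this row, and only computed columns that read `column`, are recomputed.
        for name, (inputs, _) in self.computed.items():
            if column in inputs:
                self._compute(row, name)


t = IncrementalTable()
t.add_computed_column("doubled", ["x"], lambda x: x * 2)
t.add_computed_column("label", ["tag"], str.upper)
for i in range(1000):
    t.insert({"x": i, "tag": "row"})
before = t.evaluations           # 2000: 2 computed columns x 1000 rows
t.update(5, "x", 7)
print(t.evaluations - before)    # 1: only row 5's "doubled" was recomputed
print(t.rows[5]["doubled"])      # 14
```

A full recompute after the same edit would have evaluated all 2000 cells again; the incremental path touches one.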
Building workflows
Let’s explore another example that uses computed columns for image processing operations.
Once we insert data, Pixeltable automatically computes the values for the new columns.
Pixeltable manages the dependencies between the columns, so when the source image is updated, the rotated and rotated_transparent columns are recomputed automatically.
You don’t need to write any orchestration code: our DAG engine tracks these dependencies and runs the recomputations in the correct order.
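The idea of dependency-ordered recomputation can be sketched with Python's standard graphlib module. This is an illustration, not Pixeltable's engine: the dependency graph is hypothetical, with rotated_transparent assumed to read rotated, and thumbnail and caption invented for contrast. Given a changed column, the sketch finds every column that transitively depends on it and returns them in an order where inputs come before outputs.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: computed column -> columns it reads.
deps: dict[str, set[str]] = {
    "rotated": {"source"},
    "rotated_transparent": {"rotated"},
    "thumbnail": {"source"},
    "caption": {"other_input"},
}


def recompute_order(changed: str, deps: dict[str, set[str]]) -> list[str]:
    """Return the computed columns affected by `changed`, in a valid evaluation order."""
    # Walk the graph to collect every column that transitively depends on `changed`.
    affected: set[str] = set()
    frontier = [changed]
    while frontier:
        col = frontier.pop()
        for name, inputs in deps.items():
            if col in inputs and name not in affected:
                affected.add(name)
                frontier.append(name)
    # Topologically sort the affected subgraph so each column follows its inputs.
    sub = {name: deps[name] & affected for name in affected}
    return list(TopologicalSorter(sub).static_order())


order = recompute_order("source", deps)
print(order)  # rotated precedes rotated_transparent; caption is not recomputed
```

Columns that do not depend on the changed input (caption here) are left alone, which is what keeps the update incremental.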
Key Features
Incremental Updates
Only recomputes values for rows affected by changes in source columns, saving processing time and resources.
Automatic Dependencies
Tracks relationships between columns and handles the execution order of computations automatically.
Expression Support
Supports complex expressions combining multiple columns, Python functions, and built-in operations.
Type Safety
Ensures type consistency across computations and validates expressions at creation time.
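To show why validating at creation time helps, here is a deliberately tiny type checker. Col, Add, infer_type and add_computed_column are hypothetical names for this sketch only; Pixeltable's real type system is richer. The point is that a mismatched expression is rejected when the column is defined, before any row is processed.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Col:
    name: str
    dtype: type


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Add]


def infer_type(expr: Expr) -> type:
    """Infer the result type of an expression, raising TypeError on a mismatch."""
    if isinstance(expr, Col):
        return expr.dtype
    lt, rt = infer_type(expr.left), infer_type(expr.right)
    if lt is rt and lt in (int, float, str):
        return lt
    if {lt, rt} == {int, float}:
        return float  # numeric widening
    raise TypeError(f"cannot add {lt.__name__} and {rt.__name__}")


def add_computed_column(schema: dict[str, type], name: str, expr: Expr) -> None:
    # Validation happens once, at creation time, before any row is evaluated.
    schema[name] = infer_type(expr)


schema = {"price": float, "qty": int, "sku": str}
price, qty, sku = (Col(n, ty) for n, ty in schema.items())
add_computed_column(schema, "subtotal", Add(price, qty))  # accepted: float

error = None
try:
    add_computed_column(schema, "bad", Add(sku, qty))
except TypeError as exc:
    error = str(exc)
print(schema["subtotal"], error)  # <class 'float'> cannot add str and int
```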
Advanced Usage
Using Python Functions
You can use your own Python functions in computed columns by decorating them with @pxt.udf.
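Pixeltable uses a UDF's Python type hints to determine its parameter and return types. The decorator below is a hypothetical stand-in named udf that mimics only that requirement, so the sketch runs without Pixeltable installed; word_count is an invented example. With Pixeltable, you would decorate the same function with @pxt.udf and call it on a column expression such as word_count(t.text).

```python
import inspect
from typing import Callable, get_type_hints


def udf(fn: Callable) -> Callable:
    """Hypothetical stand-in for a UDF decorator: insists on complete type hints."""
    hints = get_type_hints(fn)
    params = inspect.signature(fn).parameters
    missing = [p for p in params if p not in hints]
    if missing or "return" not in hints:
        raise TypeError(f"{fn.__name__} needs type hints for all parameters and the return value")
    fn.return_type = hints["return"]  # recorded so a table could type the new column
    return fn


@udf
def word_count(text: str) -> int:
    # Ordinary Python logic; any library code could go here.
    return len(text.split())


rows = [{"text": "hello world"}, {"text": "computed columns are handy"}]
for row in rows:
    row["words"] = word_count(row["text"])
print([r["words"] for r in rows], word_count.return_type)  # [2, 4] <class 'int'>
```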
Chaining Computations
Computed columns can depend on other computed columns, so a multi-step transformation can be expressed as a chain of columns.
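A chain of computed columns can be modeled as an ordered list of definitions. Because each column may reference only columns that already exist, declaration order is always a valid evaluation order. The pipeline and column names here are hypothetical and plain Python, not Pixeltable calls.

```python
from typing import Any, Callable

# Hypothetical pipeline: each entry may read base columns or earlier computed ones.
pipeline: list[tuple[str, Callable[[dict[str, Any]], Any]]] = [
    ("celsius", lambda r: (r["fahrenheit"] - 32) * 5 / 9),
    ("kelvin", lambda r: r["celsius"] + 273.15),   # depends on a computed column
    ("is_freezing", lambda r: r["celsius"] <= 0),  # reuses the same intermediate
]


def evaluate(row: dict[str, Any]) -> dict[str, Any]:
    out = dict(row)
    for name, fn in pipeline:  # declaration order respects every dependency
        out[name] = fn(out)
    return out


print(evaluate({"fahrenheit": 32}))
# {'fahrenheit': 32, 'celsius': 0.0, 'kelvin': 273.15, 'is_freezing': True}
```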
Best Practices
- Break down complex operations: Split complex operations into multiple columns for better readability and easier debugging
- Handle missing values: Explicitly handle None/null values to prevent unexpected errors
- Consider performance: For large tables, minimize the use of computationally expensive operations
- Document your transformations: Add comments explaining the purpose and logic of your computed columns
- Reuse common calculations: Create intermediate computed columns for values used in multiple places
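Several of these practices fit in one small sketch: an intermediate value (area) is computed once and reused, and missing inputs are handled explicitly rather than left to fail inside arithmetic. The functions and column names are hypothetical, written as plain Python that could back a UDF.

```python
from typing import Optional


def area(width: Optional[float], height: Optional[float]) -> Optional[float]:
    # Handle missing values explicitly instead of letting arithmetic fail on None.
    if width is None or height is None:
        return None
    return width * height


def size_class(a: Optional[float]) -> str:
    # Builds on the intermediate "area" value; maps missing data to an explicit label.
    if a is None:
        return "unknown"
    return "large" if a >= 100 else "small"


rows = [{"w": 20.0, "h": 10.0}, {"w": 3.0, "h": None}]
for r in rows:
    r["area"] = area(r["w"], r["h"])   # intermediate computed column
    r["size"] = size_class(r["area"])  # reuses it instead of recomputing w * h
print([(r["area"], r["size"]) for r in rows])  # [(200.0, 'large'), (None, 'unknown')]
```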
Troubleshooting
Common Issues
- Type mismatches: Ensure input and output types are compatible
- Missing input values: A computed column will show as None if its inputs are None
- Performance issues: Very complex computations on large tables might become slow
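When a computed column unexpectedly contains None, the usual cause is a None input. The wrapper below is a hypothetical model of that propagation rule (skip the call and return None if any argument is None), followed by a quick way to locate the rows responsible. It is plain Python for illustration, not how Pixeltable implements the behavior.

```python
import functools
from typing import Any, Callable


def none_propagating(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical wrapper: return None without calling fn if any argument is None."""
    @functools.wraps(fn)
    def wrapper(*args: Any) -> Any:
        if any(a is None for a in args):
            return None
        return fn(*args)
    return wrapper


@none_propagating
def ratio(num: float, den: float) -> float:
    return num / den


rows = [{"num": 6.0, "den": 3.0}, {"num": None, "den": 3.0}]
# Find rows whose result is None, then inspect their inputs.
suspect = [i for i, r in enumerate(rows) if ratio(r["num"], r["den"]) is None]
print(ratio(6.0, 3.0), suspect)  # 2.0 [1]
```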