Section 2: Computed Columns
Welcome to Section 2 of the Pixeltable Fundamentals tutorial, Computed Columns. In the previous section, Tables and Data Operations, we learned how to create tables, populate them with data, and query and manipulate their contents. In this section, we’ll introduce one of Pixeltable’s most essential and powerful concepts: computed columns. We’ll learn how to:- Add computed columns to a table
- Use computed columns for complex operations such as image processing and model inference
Runtime -> Change runtime type menu item at the top, then select the
GPU radio button and click on Save.
Next, let’s ensure the Pixeltable library is installed in your
environment, along with the Huggingface transformers library, which
we’ll need for this tutorial section.
Computed Columns
Let’s start with a simple example that illustrates the basic concepts behind computed columns. We’ll use a table of world population data for our example. Remember that you can import datasets into a Pixeltable table by usingpxt.create_table() with the source parameter.
pop_t.head() returns the first few rows of a table,
and typing the table name pop_t by itself gives the schema.
| cca3 | country | continent | pop_2023 | pop_2022 | pop_2000 | area__km__ |
|---|---|---|---|---|---|---|
| IND | India | Asia | 1428627663 | 1417173173 | 1059633675 | 3287590. |
| CHN | China | Asia | 1425671352 | 1425887337 | 1264099069 | 9706961. |
| USA | United States | North America | 339996563 | 338289857 | 282398554 | 9372610. |
| IDN | Indonesia | Asia | 277534122 | 275501339 | 214072421 | 1904569. |
| PAK | Pakistan | Asia | 240485658 | 235824862 | 154369924 | 881912. |
| table 'fundamentals.population' |
| Column Name | Type | Computed With |
|---|---|---|
| cca3 | String | |
| country | String | |
| continent | String | |
| pop_2023 | Int | |
| pop_2022 | Int | |
| pop_2000 | Int | |
| area__km__ | Float |
select() such a quantity into a Pixeltable
DataFrame, giving it the name yoy_change (year-over-year change):
| country | yoy_change |
|---|---|
| India | 11454490 |
| China | -215985 |
| United States | 1706706 |
| Indonesia | 2032783 |
| Pakistan | 4660796 |
pop_t, we
see the new column and its definition.
| table 'fundamentals.population' |
| Column Name | Type | Computed With |
|---|---|---|
| cca3 | String | |
| country | String | |
| continent | String | |
| pop_2023 | Int | |
| pop_2022 | Int | |
| pop_2000 | Int | |
| area__km__ | Float | |
| yoy_change | Int | pop_2023 - pop_2022 |
| country | yoy_change |
|---|---|
| India | 11454490 |
| China | -215985 |
| United States | 1706706 |
| Indonesia | 2032783 |
| Pakistan | 4660796 |
| table 'fundamentals.population' |
| Column Name | Type | Computed With |
|---|---|---|
| cca3 | String | |
| country | String | |
| continent | String | |
| pop_2023 | Int | |
| pop_2022 | Int | |
| pop_2000 | Int | |
| area__km__ | Float | |
| yoy_change | Int | pop_2023 - pop_2022 |
| yoy_percent_change | Float | (100 * yoy_change) / pop_2022 |
| country | yoy_change | yoy_percent_change |
|---|---|---|
| India | 11454490 | 0.808 |
| China | -215985 | -0.015 |
| United States | 1706706 | 0.505 |
| Indonesia | 2032783 | 0.738 |
| Pakistan | 4660796 | 1.976 |
yoy_growth and yoy_percent_growth
have been automatically updated in response to the new data.
| cca3 | country | continent | pop_2023 | pop_2022 | pop_2000 | area__km__ | yoy_change | yoy_percent_change |
|---|---|---|---|---|---|---|---|---|
| FLK | Falkland Islands | South America | 3791 | 3780 | 3080.0 | 12173. | 11 | 0.291 |
| NIU | Niue | Oceania | 1935 | 1934 | 2074.0 | 261. | 1 | 0.052 |
| TKL | Tokelau | Oceania | 1893 | 1871 | 1666.0 | 12. | 22 | 1.176 |
| VAT | Vatican City | Europe | 518 | 510 | 651.0 | 0.44 | 8 | 1.569 |
| None | California | None | 39110000 | 39030000 | NaN | NaN | 80000 | 0.205 |
Recomputing Columns
From time to time you might need to recompute the data in an existing computed column. Perhaps the code for one of your UDFs has changed, and you want to recompute a column that uses that UDF in order to pick up the new logic. Or perhaps you want to re-run a nondeterministic computation such as model inference. The command to do this isrecompute_columns(). It won’t do much in the current example, because
all our computations are simple and deterministic, but for demonstration
purposes here’s what it looks like:
| cca3 | country | continent | pop_2023 | pop_2022 | pop_2000 | area__km__ | yoy_change | yoy_percent_change |
|---|---|---|---|---|---|---|---|---|
| FLK | Falkland Islands | South America | 3791 | 3780 | 3080.0 | 12173. | 11 | 0.291 |
| NIU | Niue | Oceania | 1935 | 1934 | 2074.0 | 261. | 1 | 0.052 |
| TKL | Tokelau | Oceania | 1893 | 1871 | 1666.0 | 12. | 22 | 1.176 |
| VAT | Vatican City | Europe | 518 | 510 | 651.0 | 0.44 | 8 | 1.569 |
| None | California | None | 39110000 | 39030000 | NaN | NaN | 80000 | 0.205 |
recompute_columns() is primarily useful when the input data
remains the same, but your UDF business logic changes.
A More Complex Example: Image Processing
In the Tables and Data Operations tutorial, we saw how media data such as images can be inserted into Pixeltable tables, alongside more traditional structured data. Let’s explore another example that uses computed columns for image processing operations. In this example, we’ll create the table directly by providing a schema, rather than importing it from a CSV like before.| source |
|---|
get_metadata(), which returns a dictionary with various metadata about
the image. Let’s go ahead and make this a computed column.
“UDF” is standard terminology in databases, meaning “User-Defined
Function”. Technically speaking, the get_metadata()
function isn’t user-defined, it’s built in to the Pixeltable library.
But we’ll consistently refer to Pixeltable functions as “UDFs” in order
to clearly distinguish them from ordinary Python functions. Later in
this tutorial, we’ll see how to turn (almost) any Python function into a
Pixeltable UDF.
| source | metadata |
|---|---|
| {"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 426, "palette": null} | |
| {"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 483, "palette": null} | |
| {"bits": 8, "mode": "RGB", "width": 621, "format": "JPEG", "height": 640, "palette": null} |
| source | metadata | rotated |
|---|---|---|
| {"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 426, "palette": null} | ||
| {"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 483, "palette": null} | ||
| {"bits": 8, "mode": "RGB", "width": 621, "format": "JPEG", "height": 640, "palette": null} |
| source | metadata | rotated | rotated_transparent |
|---|---|---|---|
| {"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 426, "palette": null} | |||
| {"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 483, "palette": null} | |||
| {"bits": 8, "mode": "RGB", "width": 621, "format": "JPEG", "height": 640, "palette": null} |
get_metadata(), convert(), and
rotate(), Pixeltable has a sizable library of other common
image operations that can be used as UDFs in computed columns. For the
most part, the image UDFs are analogs of the operations provided by the
Pillow library
(in fact, Pixeltable is just using Pillow under the covers). You can
read more about the provided image (and other) UDFs in the
Pixeltable
API Documentation.
Let’s have a look at our table schema.
| table 'fundamentals.image_ops' |
| Column Name | Type | Computed With |
|---|---|---|
| source | Image | |
| metadata | Json | source.get_metadata() |
| rotated | Image | source.rotate(10) |
| rotated_transparent | Image['RGBA'] | source.convert('RGBA').rotate(10) |
Image Detection
In addition to simple operations likerotate() and convert(), the
Pixeltable API includes UDFs for various off-the-shelf image models.
Let’s look at one example: object detection using the ResNet-50 model.
Model inference is a UDF too, and it can be inserted into a computed
column like any other.
This one may take a little more time to compute, since it involves first
downloading the ResNet-50 model (if it isn’t already cached), then
running inference on the images in our table.
| source | detections |
|---|---|
| {"boxes": [[243.329, 198.938, 252.491, 212.817], [411.603, 219.568, 442.399, 305.185], [339.933, 202.026, 347.892, 217.15], [512.94, 200.648, 530.061, 221.701], [494.712, 175.403, 518.906, 285.738], [342.964, 178.523, 367.205, 226.858], ..., [445.77, 170.045, 508.718, 291.668], [359.977, 212.132, 372.942, 229.529], [363.898, 220.541, 417.606, 315.758], [448.891, 121.4, 462.657, 142.467], [421.982, 158.211, 465.436, 300.659], [383.675, 172.826, 403.362, 211.302]], "labels": [86, 62, 86, 78, 82, 64, ..., 82, 86, 62, 85, 1, 1], "scores": [0.979, 0.987, 0.908, 0.915, 0.884, 0.926, ..., 0.987, 0.99, 0.997, 0.972, 0.999, 0.882], "label_text": ["vase", "chair", "vase", "microwave", "refrigerator", "potted plant", ..., "refrigerator", "vase", "chair", "clock", "person", "person"]} | |
| {"boxes": [[0.677, 280.959, 401.834, 477.803], [488.731, 51.887, 497.612, 82.074], [472.22, 251.556, 486.093, 287.805], [490.944, 249.437, 514.341, 288.037], [513.687, 48.824, 522.899, 80.729], [479.89, 251.301, 497.199, 288.15], ..., [341.577, 219.091, 429.855, 352.103], [468.799, 251.786, 481.076, 288.294], [461.138, 295.626, 470.704, 337.119], [503.576, 247.766, 535.297, 288.59], [440.375, 250.286, 449.473, 287.611], [499.56, 51.345, 508.956, 81.398]], "labels": [65, 84, 84, 84, 84, 84, ..., 64, 84, 84, 84, 84, 84], "scores": [0.995, 0.808, 0.819, 0.847, 0.841, 0.822, ..., 0.999, 0.81, 0.831, 0.844, 0.818, 0.81], "label_text": ["bed", "book", "book", "book", "book", "book", ..., "potted plant", "book", "book", "book", "book", "book"]} | |
| {"boxes": [[270.334, 353.472, 307.723, 404.887], [155.908, 101.902, 434.426, 551.656], [380.517, 160.025, 430.297, 190.625], [154.665, 128.519, 425.751, 607.173], [289.045, 98.536, 437.856, 549.117]], "labels": [40, 1, 37, 1, 1], "scores": [0.821, 0.897, 0.972, 0.98, 0.909], "label_text": ["baseball glove", "person", "sports ball", "person", "person"]} |
draw_bounding_boxes() API to superimpose
bounding boxes on the images, using different colors to distinguish
different object categories.
| source | image_with_bb |
|---|---|
| image_with_bb |
|---|
| table 'fundamentals.image_ops' |
| Column Name | Type | Computed With |
|---|---|---|
| source | Image | |
| metadata | Json | source.get_metadata() |
| rotated | Image | source.rotate(10) |
| rotated_transparent | Image['RGBA'] | source.convert('RGBA').rotate(10) |
| detections | Json | detr_for_object_detection(source, model_id='facebook/detr-resnet-50', threshold=0.8) |
| image_with_bb | Image | draw_bounding_boxes(source, detections.boxes, detections.label_text, fill=True) |
| source | image_with_bb | detections_labeltext | metadata |
|---|---|---|---|
| ["train"] | {"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 427, "palette": null} | ||
| ["person", "person", "tennis racket", "person", "person", "person", "person", "person", "person", "person", "person", "person", "person"] | {"bits": 8, "mode": "RGB", "width": 640, "format": "JPEG", "height": 427, "palette": null} |
t.detections, as well as generated images such as
t.image_with_bb. (Later we’ll see how to tune this behavior
in cases where it might be undesirable to store everything, but
the default behavior is that computed column output is always
persisted.)
Expressions
Let’s have a closer look at that call todraw_bounding_boxes() in the
last example.
draw_bounding_boxes() is, of
course, a UDF, and its first argument is a column reference of the sort
we’ve used many times now: t.source, the source image. The other two
arguments are more than simple column references, though: they’re
compound expressions that include the column reference t.detections
along with a suffix (.boxes or .label_text) that tells Pixeltable to
look inside the dictionary stored in t.detections.
These are all examples of Pixeltable expressions. In fact, we’ve
seen other types of Pixeltable expressions as well, without explicitly
calling them out:
- Calls to a UDF are expressions, such as
t.source.rotate(10), or thedraw_bounding_boxes()example above; - Arithmetic operations are expressions, such as year-over-year
calculation in our first example:
100 * pop_t.yoy_change / pop_t.pop_2022.