Queries and Expressions

Kaggle Colab Download Notebook

Pixeltable Fundamentals

Section 3: Queries and Expressions

Welcome to Section 3 of the Pixeltable Fundamentals tutorial, Queries and Expressions.

In the previous section of this tutorial, Computed Columns, we saw how to issue queries over Pixeltable tables, such as:

pop_t.select(yoy_change=(pop_t.pop_2023 - pop_t.pop_2022)).collect()

We also saw how to define computed columns that become part of the table and are updated automatically when new rows are inserted:

pop_t.add_column(yoy_change=(pop_t.pop_2023 - pop_t.pop_2022))

Both these examples use the Pixeltable expression pop_t.pop_2023 - pop_t.pop_2022. We've seen a number of other expressions as well, such as the chain of image operations

t.source.convert('RGBA').rotate(10)

and the model invocation

detr_for_object_detection(
    t.source,
    model_id='facebook/detr-resnet-50',
    threshold=0.8
)

Expressions are the basic building blocks of Pixeltable workloads. An expression can be included in a select() statement, which will cause it to be evaluated dynamically, or in an add_column() statement, which will add it to the table schema as a computed column. In this section, we'll dive deeper into the different kinds of Pixeltable expressions and their uses. We'll:

  • Understand the relationship between Pixeltable expressions and query execution
  • Survey the different types of expressions
  • Learn more about the Pixeltable type system

To get started, let's import the necessary libraries for this tutorial and set up a demo directory.

%pip install -qU pixeltable datasets
import pixeltable as pxt

pxt.drop_dir('demo', force=True)
pxt.create_dir('demo')
Connected to Pixeltable database at:
postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory `demo`.

<pixeltable.catalog.dir.Dir at 0x3274086a0>

In this tutorial we're going to work with a subset of the MNIST dataset, a classic reference database of hand-drawn digits. A copy of the MNIST dataset is hosted on the Hugging Face datasets repository, so wecan use Pixeltable's built-in Hugging Face data importer to load it into a Pixeltable table.

import datasets

# Download the first 50 images of the MNIST dataset
ds = datasets.load_dataset('ylecun/mnist', split='train[:50]')

# Import them into a Pixeltable table
t = pxt.io.import_huggingface_dataset('demo.mnist', ds)
Created table `mnist_tmp_69154325`.
Inserting rows into `mnist_tmp_69154325`: 50 rows [00:00, 15723.14 rows/s]
Inserted 50 rows with 0 errors.
t.head(5)

Column References

The most basic type of expression is a column reference: that's what you get when you type, say, t.image. An expression such as t.image by itself is just a Python object; it doesn't contain any actual data, and no data will be loaded until you use the expression in a select() query or add_column() statement. Here's what we get if we type t.image by itself:

t.image
Column Name Type Computed With
image Image

This is true of all Pixeltable expressions: we can freely create them and manipulate them in various ways, but no actual data will be loaded until we use them in a query.

JSON Collections (Dicts and Lists)

Data is commonly presented in JSON format: for example, API responses and model output often take the shape of JSON dictionaries or lists of dictionaries. Pixeltable has native support for JSON accessors. To demonstrate this, let's add a computed column that runs an image classification model against the images in our dataset.

from pixeltable.functions.huggingface import vit_for_image_classification

t.add_computed_column(classification=vit_for_image_classification(
    t.image,
    model_id='farleyknight-org-username/vit-base-mnist'
))
Computing cells: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 50/50 [00:01<00:00, 41.26 cells/s]
Added 50 column values with 0 errors.

UpdateStatus(num_rows=50, num_computed_values=50, num_excs=0, updated_cols=[], cols_with_excs=[])
t.select(t.image, t.classification).head(3)

We see that the output is returned as a dict containing three lists: the five most likely labels (classes) for the image, the corresponding text labels (in this case, just the string form of the class number), and the scores (confidences) of each prediction. The Pixeltable type of the classification column is pxt.Json:

t
Column Name Type Computed With
image Image
label String
classification Json vit_for_image_classification(image, model_id='farleyknight-org-username/vit-base-mnist')

Pixeltable provides a range of operators on Json-typed output that behave just as you'd expect. To look up a key in a dictionary, use the syntax t.classification['labels']:

t.select(t.classification['labels']).head(3)
classification_labels
[5, 3, 2, 8, 7]
[0, 6, 9, 8, 1]
[4, 1, 9, 7, 0]

You can also use a convenient "attribute" syntax for dictionary lookups. This follows the standard JSONPath expression syntax.

t.select(t.classification.labels).head(3)
classification_labels
[5, 3, 2, 8, 7]
[0, 6, 9, 8, 1]
[4, 1, 9, 7, 0]

The "attribute" syntax isn't fully general (it won't work for dictionary keys that are not valid Python identifiers), but it's handy when it works.

t.classification.labels is another Pixeltable expression; you can think of it as saying, "do the 'labels' lookup from every dictionary in the column t.classification, and return the result as a new column." As before, the expression by itself contains no data; it's the query that does the actual work of retrieving data. Here's what we see if we just give the expression by itself, without a query:

t.classification.labels
<pixeltable.exprs.json_path.JsonPath at 0x3c0730220>

Similarly, one can pull out a specific item in a list (for this model, we're probably mostly interested in the first item anyway):

t.select(t.classification.labels[0]).head(3)
classification_labels0
5
0
4

Or slice a list in the usual manner:

t.select(t.classification.labels[:2]).head(3)
classification_labels2
[5, 3]
[0, 6]
[4, 1]

Pixeltable is resilient against out-of-bounds indices or dictionary keys. If an index or key doesn't exist for a particular row, you'll get a None output for that row.

t.select(t.classification.not_a_key).head(3)
classification_notakey
None
None
None

As always, any expression can be used to create a computed column.

# Use label_text to be consistent with t.label, which was given
# to us as a string

t.add_computed_column(pred_label=t.classification.label_text[0])
t
Computing cells: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 50/50 [00:00<00:00, 6547.05 cells/s]
Added 50 column values with 0 errors.
Column Name Type Computed With
image Image
label String
classification Json vit_for_image_classification(image, model_id='farleyknight-org-username/vit-base-mnist')
pred_label Json classification.label_text[0]

Finally, just as it's possible to extract items from lists and dictionaries using Pixeltable expressions, you can also construct new lists and dictionaries: just package them up in the usual way.

custom_dict = {
    # Keys must be strings; values can be any expressions
    'ground_truth': t.label,
    'prediction': t.pred_label,
    'is_correct': t.label == t.pred_label,
    # You can also use constants as values
    'engine': 'pixeltable',
}

t.select(t.image, custom_dict).head(5)

UDF Calls

UDF calls are another common type of expression. We've seen them throughout the tutorial; in fact, we used one a moment ago when we added a model invocation to our workload:

vit_for_image_classification(
    t.image,
    model_id='farleyknight-org-username/vit-base-mnist'
)

This calls the vit_for_image_classification UDF in the pxt.functions.huggingface module. Note that vit_for_image_classification is a Pixeltable UDF, not an ordinary Python function. (UDFs were first discussed in the Tables and Data Operations section of this tutorial.) You can think of a Pixeltable UDF as a function that operates on columns of data, iteratively applying an underlying operation to each row in the column (or columns). In this case, vit_for_image_classification operates on t.image, running the model against every image in the column.

Notice that in addition to the column t.image, this call to vit_for_image_classification also takes a constant argument specifying the model_id. Any UDF call argument may be a constant, and the constant value simply means "use this value for every row being evaluated".

You can always compose Pixeltable expressions to form more complicated ones; here's an example that runs the model against a 90-degree rotation of every image in the sample and extracts the label. Not surprisingly, the model doesn't perform as well on the rotated images.

rot_model_result = vit_for_image_classification(
    t.image.rotate(90),
    model_id='farleyknight-org-username/vit-base-mnist'
)

t.select(t.image, rot_label=rot_model_result.labels[0]).head(5)   

Note that we employed a useful trick here: we assigned an expression to the variable rot_model_result for later reuse. Every Pixeltable expression is a Python object, so you can freely assign them to variables, reuse them, compose them, and so on. Remember that nothing actually happens until the expression is used in a query - so in this example, setting the variable rot_model_result doesn't itself result in any data being retrieved; that only happens later, when we actually use it in the select() query.

There are a large number of built-in UDFs that ship with Pixeltable; you can always refer back to the API Documentation for details.

Method Calls

Many built-in UDFs allow a convenient alternate syntax. The following two expressions are exactly equivalent:

a = t.image.rotate(90)
b = pxt.functions.image.rotate(t.image, 90)

a and b can always be used interchangeably in queries, with identical results. Just like in standard Python classes, whenever Pixeltable sees the method call t.image.rotate(90), it interprets it as a function call pxt.functions.image.rotate(self, 90), with (in this case) self equal to t.image.

Any method call can also be written as a function call, but (just like in standard Python) not every function call can be written as a method call. For example, the following won't work:

t.image.vit_for_image_classification(
    model_id='farleyknight-org-username/vit-base-mnist'
)

That's because vit_for_image_classification is part of the pxt.functions.huggingface module, not the core module pxt.functions.image. Most Pixeltable types have a corresponding core module of UDFs that can be used as method calls (pxt.functions.image for Image; pxt.functions.string for String; and so on), described fully in the API Documentation.

Arithmetic and Boolean Operations

Expressions can also be combined using standard arithmetic and boolean operators. As with everything else, arithmetic and boolean expressions are operations on columns that (when used in a query) are applied to every row.

t.select(
    t.image,
    t.label,
    t.label == '4',
    t.label < '5',
).head(5)

When you use a where clause in a query, you're giving it a Pixeltable expression, too (a boolean-valued one).

t.where(t.label == '4').select(t.image).show()

The following example shows how boolean expressions can be assigned to variables and used to form more complex expressions.

# Reuse `rot_model_result` from above, extracting
# the dominant label as a new expression

rot_label = rot_model_result.label_text[0]

# Select all the rows where the ground truth label is '5',
# and the "rotated" version of the model got it wrong
# (by returning something other than a '5')

t.where((t.label == '5') & (rot_label != '5')).select(
    t.image, t.label, rot_label=rot_label
).show()

Notice that to form a logical "and", we wrote

(t.label == '5') & (rot_label != '5')

using the operator & rather than and. Likewise, to form a logical "or", we'd use | rather than or:

(t.label == '5') | (rot_label != '5')

For logical negation:

~(t.label == '5')

This follows the convention used by other popular data-manipulation frameworks such as Pandas, and it's necessary because the Python language does not allow the meanings of and, or, and not to be customized. There is one more instance of this to be aware of: to check whether an expression is None, it's necessary to write (say)

t.label == None

rather than t.label is None, for the same reason.

Arrays

In addition to lists and dicts, Pixeltable also has built-in support for numerical arrays. A typical place where arrays show up is as the output of an embedding.

from pixeltable.functions.huggingface import clip_image

t.add_column(clip=clip_image(t.image, model_id='openai/clip-vit-base-patch32'))
t.select(t.image, t.clip).head(5)
Computing cells: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 50/50 [00:01<00:00, 41.41 cells/s]
Added 50 column values with 0 errors.

The underlying Python type of pxt.Array is an ordinary NumPy array (np.ndarray), so that an array-typed column is a column of NumPy arrays (in this example, representing the embedding output of each image in the table). As with lists, arrays can be sliced in all the usual ways.

t.select(t.clip[0], t.clip[5:10], t.clip[-3:]).head(5)
col_0 col_1 col_2
-0.052 [-0.032 0.07 1.116 -0.037 -0.158] [ 0.904 0.037 -0.161]
0.063 [-0.332 -0.201 1.023 -0.003 -0.094] [ 0.823 -0.149 -0.163]
-0.059 [-0.233 0.238 0.864 -0.135 -0.017] [ 0.765 -0.363 0.206]
0.098 [-0.107 0.038 1.081 -0.111 0.172] [ 1.023 -0.136 0.092]
-0.046 [-0.115 0.066 1.089 -0.017 0.105] [0.786 0.098 0.171]

Ad hoc UDFs with apply

We've now seen the most commonly encountered Pixeltable expression types. There are a few other less commonly encountered expressions that are occasionally useful.

You can use apply to map any Python function onto a column of data. You can think of apply as a quick way of constructing an "on-the-fly" UDF for one-off use.

import numpy as np

t.select(t.clip.apply(np.ndarray.dumps, col_type=pxt.String)).head(2)
dumps
b'\x80\x02cnumpy.core.multiarray\n_reconstruct\nq\x00cnumpy\nndarray\nq\x01K\x00\x85q\x02c_codecs\nencode\nq\x03X\x01\x00\x00\x00bq\x04X\x06\x00\x00\x00latin1q\x05\x86q\x06Rq\x07\x87q\x08Rq\t(K\x01M\x00\x02\x85q\ncnumpy\ndtype\nq\x0bX\x02\x00\x00\x00f4q\x0c\x89\x88\x87q\rRq\x0e(K\x03X\x01\x00\x00\x00<q\x0fNNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tq\x10b\x89h\x03X\xf4\x0b\x00\x00\xc3\xbf\xc3\xacV\xc2\xbdl\xc2\xbe\xc2\x84\xc2\xbe@w\xc3\xb6\xc2\xbe(\xc2\xb0 \xc2\xbf*{Y=\xc3\x96s\x02\xc2\xbd\x11\xc2\x9e\xc2\x8e=v\xc3\xa9\xc2\x8e?"(\x18\xc2\xbdr\x05"\xc2\xbe\xc3\xbf1\x0f?6,\xc3\x84>\xc2\x8a^\xc3\xb3>\xc2\xa6lD\xc2\xbef5 \xc2\xbe\xc3\xb8:\xc2\xa4=3\'\xc2\xb8>\xc3\xaf7w\xc2\xbb\xc3\x99=#\xc2\xbec\xc2\xb3@=+|\x19\xc2\xbe\xc3\xa8\xc3\xaf\xc2\x82>\xc3\x9c\xc3\xbeF\xc2\xbe{\x10\xc2\x80>\xc2\x81\xc3\xb2\xc2\x8f<\xc2\xb4\x18\xc3\x87=\xc2\xa9\xc3\x83\x14\xc2\xbe\xc3\xabo2=\x03\'\xc3\x97<\xc3\x91@\x16\xc2\xbe\xc2\xb9\xc2\x96U>\xc2\xab\x15\xc2\xac\xc2\xbe\xc3\x90\xc2\xb3\xc2\xab=\x00\xc2\xb10\xc2\xbd\xc3\xb8\xc2\x82P\xc2\xbea\xc2\x83[\xc2\xbeYf\xc2\x9b=]\xc2\x8c\xc3\xb4\xc2\xbd\xc3\x95\xc2\x87\xc2\x9b\xc2\xbe7S\xc2\x81\xc2\xbf\xc3\xa3\xc2\x8fO=\xc3\xa8!,\xc2\xbeP\xc3\x98\xc2\x98\xc2\xbe\xc2\xb1\x06\x04\xc2\xbf2\xc3\xbb\xc3\x9a\xc2\xbe\xc3\xae|@?p\x16]>\xc3\x8c}\x00>\xc2\x98\xc2\xb0\xc2\xa4=^\xc3\x94\xc3\x9e\xc2\xbe]\'L=\x0b\xc3' ...... b'\$\x01\x11>EFw>\xc2\xbc\xc2\x86G\xc2\xbe`\xc3\xac;\xc2\xbd\xc3\x86(%>\xc2\x84s\x0c>\xc3\x80\tx<\xc2\xb6\xc2\x8f\xc3\xbf=e\xc2\x9b\xc3\x87\xc2\xbd\xc2\xa4M6?\xc2\xb8\xc2\x8d\xc2\xb1>-iP=OJ1\xc2\xbcb\xc3\xa0\xc2\x9c>\xc3\x92a\xc3\x97\xc2\xbd\xc3\xbc\xc2\x88\xc2\xb9>\xc3\x9d\x01\x1d=<\xc3\x95:\xc2\xbe.\xc3\xb5*\xc2\xbf\x162z\xc2\xbe\xc2\x82.\xc2\xb3\xc2\xbe\xc2\xa5\xc2\xa9\x03>\xc2\x89[\xc2\xa0=\x1a[\xc3\xbc\xc2\xbd\xc3\xaf\xc2\x90\xc2\x89\xc2\xbe\x1a\xc2\x87%\xc2\xbeZ\xc2\xae\xc2\x9c\xc2\xbe\xc2\xa0\xc3\xbd\n\xc2\xbf2\xc2\xaa\xc2\x90\xc2\xbd\xc2\x8c\xc2\x86\xc3\xbd\xc2\xbd\xc2\x95\xc2\xaeE\xc2\xbe?\xc2\x80\xc3\x83\xc2\xbe\xc2\xb64\x04>\xc3\xa2\xc2\xa6\xc2\xb6\xc2\xbdK\x0f3?}}L>\xc3\x97D\xc2\x85\xc2\xbe\x10\x1d\xc2\x9e\xc2\xbe\x1b\xc3\xaa\xc3\xa8=\xc3\xba3\xc2\x8c>\xc2\xb6\xc3\xb6 >\xc3\x8f\xc2\xac@\xc2\xbd\n\xc3\x9ba\xc2\xbe\xc2\xb0x\xc3\x87\xc2\xbe\xc2\x95\x08\x1c\xc2\xbe\xc2\xa3\xc3\xb6\xc2\xbd\xc2\xbe\xc3\x9d\$\xc2\xb5<\xc3\x9e\xc2\x8f%\xc2\xbe\xc2\xa5O(>ZS\r=h\x02\xc2\x88\xc2\xbe\xc3\x99C\xc2\xb5>\xc3\x85q\x1a\xc2\xbc\xc3\x89!M>\xc2\xbfk\xc2\x8c>>\\\x02\xc2\xbe>\xc2\x89\xc3\x84=|\xc2\xa2?\xc2\xbf\xc3\xb5\xc2\xacS=\x17\xc2\x89s\xc2\xbe[\xc2\x83f\xc2\xbe\xc3\xb8\xc3\xa4\xc2\x84>\xc2\x92^\xc2\xae\xc2\xbd~p}<\xc3\x8a\xc3\xa3\xc2\xb9=\x13\$\xc3\xa9\xc2\xbd~\xc3\xab\xc3\x96\xc2\xbc\xc3\xa5]G\xc2\xbe\xc3\xb5\x1aV=\xc3\x97\xc2\x83X\xc2\xbe\xc2\x89:\x0b9lG\x15=~\xc3\xb5\xc3\xaf>e\xc2\xb1B\xc2\xbem\xc2\x9b\xc2\xa1=z\xc3\x8c\xc2\xb1=\xc3\x96\n\xc2\xb1\xc2\xbe\xc3\x97Lg?\xc3\xa9\xc2\xa5\x16=\xc2\xb6\xc3\xaa\$\xc2\xbeq\x11h\x05\x86q\x12Rq\x13tq\x14b.'
b'\x80\x02cnumpy.core.multiarray\n_reconstruct\nq\x00cnumpy\nndarray\nq\x01K\x00\x85q\x02c_codecs\nencode\nq\x03X\x01\x00\x00\x00bq\x04X\x06\x00\x00\x00latin1q\x05\x86q\x06Rq\x07\x87q\x08Rq\t(K\x01M\x00\x02\x85q\ncnumpy\ndtype\nq\x0bX\x02\x00\x00\x00f4q\x0c\x89\x88\x87q\rRq\x0e(K\x03X\x01\x00\x00\x00<q\x0fNNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tq\x10b\x89h\x03X\x1b\x0c\x00\x00\xc3\xaa!\xc2\x80=\xc3\x86v2\xc2\xbe}\xc2\x99\xc2\x86\xc2\xbe0\xc2\x9a\x18\xc2\xbf\x13O_>p,\xc2\xaa\xc2\xbeq\xc3\x87M\xc2\xbe\xc2\xa9\xc3\xaf\xc2\x82?^&+\xc2\xbb\xc2\xa1\x1f\xc3\x81\xc2\xbd\xc3\x9aA\xc3\x89>\xc2\xae\xc2\xa0\xc3\xa9=\r\xc3\x9c\xc3\x9f>\xc3\xa0\xc2\xa4\xc2\xb1\xc2\xbe\xc2\xba\xc3\xa7\xc3\xae\xc2\xbcT\xc2\xad\$=\xc3\x8426>\xc3\xac~\xc2\x9a\xc2\xbc\xc3\x86\xc3\x93\xc3\x8b<\xc3\xa8\r\xc3\x88=\xc3\xb30\xc2\x8f\xc2\xbd"\xc2\x8b+\xc2\xbdO\xc2\x86\xc2\xbc<\x17|`>\x07\xc2\x8a\xc2\x99\xc2\xbc\xc2\xbd\xc3\x86\xc3\x99=\ng\xc2\xa6\xc2\xbe1\x03\xc3\x9d\xc2\xbe\xc3\xae:\xc2\xa6=\xc3\x8aHp\xc2\xbev\xc2\x9c\xc3\x85<"\xc3\xb6\\\xc2\xbe&\xc3\x94\xc2\x8c=/\xc3\xa9\x11=\x00\xc2\xaa\x05\xc2\xbeyg\xc3\x9d\xc2\xbep\xc3\x9e\x12>\xc2\x9c\xc2\x8c\x00\xc2\xbe)\xc3\x9b\xc2\xaa\xc2\xbdT\xc2\xb1\xc2\x92\xc2\xbf\xc2\xa6\x06\xc3\x9a\xc2\xbd\xc3\x9b\xc2\x81\xc2\x94\xc2\xbe\xc2\x98oz\xc2\xbe\xc2\xa8\x01\xc2\x91\xc2\xbd\xc3\x8d\xc2\x84n\xc2\xbe!,\xc2\x8a?\x12\x18J\xc2\xbd\xc2\x9f,<>3V\xc3\x83=\xc3\xa7' ...... b"\xb8<;\xc2\xa1\x15\xc2\xbev=\xc2\x89=\xc3\xa4[G>/Qj>\xc3\x95\xc3\xaee>\xc3\xbc5\xc3\x9c\xc2\xbd.\xc2\x8b\xc2\x9a\xc2\xbe\xc2\xa8\xc2\x89\x11?\x1a]\xc2\xbe>\x06\xc2\x84\xc2\x83>\xc2\xacr\xc2\x84\xc2\xbe\xc2\x8c\xc3\xa4\xc2\x87>eQ\x14\xc2\xbeA\xc2\xa0.>\xc3\xb0X\xc2\xbf\xc2\xbdl\xc3\x8d\xc3\x93\xc2\xbew\xc3\xb2\xc3\xa2\xc2\xbe\t\xc2\x86\xc3\xbc\xc2\xbd\xc2\xba\xc3\xb0\t\xc2\xbf\xc3\x8a\xc3\xaa1\xc2\xbe\xc3\x93\xc3\x8b\xc2\x92\xc2\xbd\x06_'\xc2\xbe\xc2\x8c\xc2\xa8\xc2\xa3\xc2\xbeU\xc3\xab\xc3\x99\xc2\xbd\xc3\x8e\xc2\x84i\xc2\xbee\xc2\x8bF\xc2\xbf\xc2\xbe\xc3\x8dv>N\xc2\xb3\xc3\xb5<\xc2\x99a\xc2\x8e;\xc3\xb2H\xc2\x95\xc2\xbe1\xc2\x81J>\x1e@\xc2\xa5\xc2\xbd\xc3\x83,\xc2\x8b?\xc2\x92\x18\xc3\xbb<\x0c_[\xc2\xbeq\x1e\xc3\xa2\xc2\xbdU\xc3\x9a\x1d\xc2\xbc#\xc2\x9e\xc2\xbf>e2\xc2\xbf>\x1e 0\xc2\xbe\xc2\x88@\x0b\xc2\xbe\xc2\xb7-A\xc2\xbe\xc2\xa5\x03\xc2\x99\xc2\xbdr\xc3\x91\xc3\x85\xc2\xbe\xc3\x8a\xc3\x9ey=\xc3\x91\x02\xc2\x9f\xc2\xbd\xc3\xbe\xc3\xa52\xc2\xbe|-\xc2\x8e=\x7f\xc2\x87\xc2\xa7\xc2\xbd\xc2\xa2\n\xc2\x9d>\xc3\x9e\n\xc3\xa8={\xc3\x87S\xc2\xbd\xc3\xaa\xc3\x94H>~\x11F\xc2\xbe?D\x08>\xc2\xb2\x13\x0e\xc2\xbf\xc2\xaaa*>\xc3\x9a\xc2\x9c\xc2\xb5\xc2\xbe\xc3\x81\xc3\xa2H>l\xc3\xb7i>\xc3\xbd\xc2\xa2\xc3\xb6=&p\xc2\x86=\xc2\xb6\xc3\xbdt<\xc2\xb6R\xc3\x80\xc2\xbe'\xc2\xab\x00\xc2\xbe\xc2\x87;V\xc2\xbd\xc3\x86\xc2\x88\xc2\xa9=\x1f\xc2\xbf\xc3\x8a\xc2\xbe\xc3\x9e\xc3\x8c\xc3\xa8<\xc3\x91\xc3\xad\xc3\x88=\xc2\xa1\x07\xc2\xbd>\xc2\x85&N\xc2\xbe\xc3\x9f\xc3\x8d\xc2\x96>Ae\xc3\xa3<\xc2\x9a\xc3\xb8\xc2\x9e:&\xc2\x99R?\xc3\xa82\x18\xc2\xbe\xc2\xb1\x0e'\xc2\xbeq\x11h\x05\x86q\x12Rq\x13tq\x14b."

Note, however, that if the function you're applying doesn't have type hints (as in the example here), you'll need to specify the output column type explicitly.

Column Properties

Some ColumnRef expressions have additional useful properties. A media column (image, video, audio, or document) has the following two properties:

  • localpath: the media location on the local filesystem
  • fileurl: the original URL where the media resides (could be the same as localpath)
t.select(t.image, t.image.localpath).head(5)

Any computed column will have two additional properties, errortype and errormsg. These properties will usually be None. However, if the computed column was created with on_error='ignore' and an exception was encountered during column execution, then the properties will contain additional information about the exception.

To demonstrate this feature, we're going to deliberately trigger an exception in a computed column. The images in our example table are black and white, meaning they have only one color channel. If we try to extract a channel other than channel number 0, we'll get an exception. Ordinarily when we call add_column, the exception is raised and the add_column operation is aborted.

t.add_column(channel=t.image.getchannel(1))
Computing cells:  32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                            | 16/50 [00:00<00:00, 905.19 cells/s]



---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

File ~/Dropbox/workspace/pixeltable/pixeltable/pixeltable/exprs/row_builder.py:314, in RowBuilder.eval(self, data_row, ctx, profile, ignore_errors)
    313 start_time = time.perf_counter()
--> 314 expr.eval(data_row, self)
    315 if profile is not None:


File ~/Dropbox/workspace/pixeltable/pixeltable/pixeltable/exprs/function_call.py:429, in FunctionCall.eval(self, data_row, row_builder)
    427 if isinstance(self.fn, func.CallableFunction) and not self.fn.is_batched:
    428     # optimization: avoid additional level of indirection we'd get from calling Function.exec()
--> 429     data_row[self.slot_idx] = self.fn.py_fn(*args, **kwargs)
    430 elif self.is_window_fn_call:


File /opt/miniconda3/envs/pxt/lib/python3.9/site-packages/PIL/Image.py:2555, in Image.getchannel(self, channel)
   2553         raise ValueError(msg) from e
-> 2555 return self._new(self.im.getband(channel))


ValueError: band index out of range


The above exception was the direct cause of the following exception:


Error                                     Traceback (most recent call last)

Cell In[25], line 1
----> 1 t.add_column(channel=t.image.getchannel(1))


File ~/Dropbox/workspace/pixeltable/pixeltable/pixeltable/catalog/table.py:414, in Table.add_column(self, type, stored, print_stats, on_error, **kwargs)
    412 new_col = self._create_columns({col_name: col_schema})[0]
    413 self._verify_column(new_col, set(self._schema.keys()), set(self._query_names))
--> 414 status = self._tbl_version.add_column(new_col, print_stats=print_stats, on_error=on_error)
    415 FileCache.get().emit_eviction_warnings()
    416 return status


File ~/Dropbox/workspace/pixeltable/pixeltable/pixeltable/catalog/table_version.py:503, in TableVersion.add_column(self, col, print_stats, on_error)
    501 self.schema_version = self.version
    502 with Env.get().engine.begin() as conn:
--> 503     status = self._add_columns([col], conn, print_stats=print_stats, on_error=on_error)
    504     _ = self._add_default_index(col, conn)
    505     self._update_md(time.time(), conn, preceding_schema_version=preceding_schema_version)


File ~/Dropbox/workspace/pixeltable/pixeltable/pixeltable/catalog/table_version.py:578, in TableVersion._add_columns(self, cols, conn, print_stats, on_error)
    576     # we need to re-initialize the sqlalchemy schema
    577     self.store_tbl.create_sa_tbl()
--> 578     raise exc
    579 finally:
    580     plan.close()


File ~/Dropbox/workspace/pixeltable/pixeltable/pixeltable/catalog/table_version.py:561, in TableVersion._add_columns(self, cols, conn, print_stats, on_error)
    559 plan.open()
    560 try:
--> 561     num_excs = self.store_tbl.load_column(col, plan, value_expr_slot_idx, conn, on_error)
    562 except sql.exc.DBAPIError as exc:
    563     # Wrap the DBAPIError in an excs.Error to unify processing in the subsequent except block
    564     raise excs.Error(f'SQL error during execution of computed column `{col.name}`:\n{exc}') from exc


File ~/Dropbox/workspace/pixeltable/pixeltable/pixeltable/store.py:265, in StoreBase.load_column(self, col, exec_plan, value_expr_slot_idx, conn, on_error)
    263 value_exc = result_row.get_exc(value_expr_slot_idx)
    264 if on_error == 'abort':
--> 265     raise excs.Error(
    266         f'Error while evaluating computed column `{col.name}`:\n{value_exc}'
    267     ) from value_exc
    268 # we store a NULL value and record the exception/exc type
    269 error_type = type(value_exc).__name__


Error: Error while evaluating computed column `channel`:
band index out of range

But if we use on_error='ignore', the exception will be logged in the column properties instead.

t.add_column(channel=t.image.getchannel(1), on_error='ignore')
Computing cells: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 50/50 [00:00<00:00, 1718.16 cells/s]
Added 50 column values with 50 errors.

UpdateStatus(num_rows=50, num_computed_values=50, num_excs=50, updated_cols=[], cols_with_excs=['mnist_tmp_69154325.channel'])

Notice that the update status informs us that there were 50 errors. If we query the table, we see that the column contains only None values, but the errortype and errormsg fields contain details of the error.

t.select(t.image, t.channel, t.channel.errortype, t.channel.errormsg).head(5)

More details on Pixeltable's error handling can be found in the Working with External Files how-to guide.

The Pixeltable Type System

We've seen that every column and every expression in Pixeltable has an associated Pixeltable type. In this section, we'll briefly survey the various Pixeltable types and their uses.

Here are all the supported types and their corresponding Python types:

Pixeltable typePython type
pxt.Stringstr
pxt.Intint
pxt.Floatfloat
pxt.Boolbool
pxt.Timestampdatetime.datetime
pxt.Json**
pxt.Arraynp.ndarray
pxt.ImagePIL.Image.Image
pxt.Videostr
pxt.Audiostr
pxt.Documentstr

The Python type is what you'll get back if you query an expression of the given Pixeltable type. For pxt.Json, it can be any of str, int, float, bool, list, or dict.

pxt.Audio, pxt.Video, and pxt.Document all correspond to the Python type str. This is because those types are represented by file paths that reference the media in question. When you query for, say, t.select(t.video_col), you're guaranteed to get a file path on the local filesystem (Pixeltable will download and cache a local copy of the video if necessary to ensure this). If you want the original URL, use t.video_col.fileurl instead.

Several types can be specialized to constrain the allowable data in a column.

  • pxt.Image can be specialized with a resolution and/or an image mode:
    • pxt.Image[(300,200)] - images with width 300 and height 200
    • pxt.Image['RGB'] - images with mode 'RGB'; see the PIL Documentation for the full list
    • pxt.Image[(300,200), 'RGB'] - combines the above constraints
  • pxt.Array must be specialized with a shape and a dtype:
    • pxt.Array[(64,64,3), pxt.Float] - 3-dimensional arrays with dtype pxt.Float and 64x64x3 shape

If we look at the structure of our table now, we see examples of specialized image and array types.

t
Column Name Type Computed With
image Image
label String
classification Json vit_for_image_classification(image, model_id='farleyknight-org-username/vit-base-mnist')
pred_label Json classification.label_text[0]
clip Array[(512,), Float] clip_image(image, model_id='openai/clip-vit-base-patch32')
channel Image['L'] image.getchannel(1)

t.clip has type pxt.Array[(512,), pxt.Float], since the output of the embedding is always a 1x512 array. t.channel has type Image['L'], since it's always an 'L' mode (1-channel) image.

You can freely use pxt.Image by itself to mean "any image, without constraints", but numerical arrays must always specify a shape and a dtype; pxt.Array by itself will raise an error.
Array shapes follow standard numpy conventions: a shape is a tuple of integers, such as (512,) or (64,64,3). A None may be used in place of an integer to indicate an unconstrained size for that dimension, as in (None,None,3) (3-dimensional array with two unconstrained dimensions), or simply (None,) (unconstrained 1-dimensional array).