pixeltable.Table - Pixeltable Documentation

A handle to a table, view, or snapshot. This class is the primary interface through which table operations (queries, insertions, updates, etc.) are performed in Pixeltable. Every user-invoked operation that runs an ExecNode tree (directly or indirectly) needs to call FileCache.emit_eviction_warnings() at the end of the operation.

Methods

`add_column()`

Adds an ordinary (non-computed) column to the table. Signature:

add_column(
    *,
    if_exists: Literal['error', 'ignore', 'replace', 'replace_force'] = 'error',
    **kwargs: ts.ColumnType | builtins.type | _GenericAlias | exprs.Expr
)-> UpdateStatus

Parameters:

kwargs (ts.ColumnType | builtins.type | _GenericAlias | exprs.Expr): Exactly one keyword argument of the form col_name=col_type.
if_exists (Literal['error', 'ignore', 'replace', 'replace_force']) = error: Determines the behavior if the column already exists. Must be one of the following:
'error': an exception will be raised.
'ignore': do nothing and return.
'replace' or 'replace_force': drop the existing column and add the new column, if it has no dependents.

Returns:

UpdateStatus: Information about the execution status of the operation.

Example: Add an int column:

tbl.add_column(new_col=pxt.Int)

Alternatively, this can also be expressed as:

tbl.add_columns({'new_col': pxt.Int})

`add_embedding_index()`

Add an embedding index to the table. Once the index is created, it will be automatically kept up-to-date as new rows are inserted into the table. To add an embedding index, one must specify, at minimum, the column to be indexed and an embedding UDF. Only String and Image columns are currently supported. Here’s an example that uses a CLIP embedding (pixeltable.functions.huggingface.clip) to index an image column:

from pixeltable.functions.huggingface import clip
embedding_fn = clip.using(model_id='openai/clip-vit-base-patch32')
tbl.add_embedding_index(tbl.img, embedding=embedding_fn)

Once the index is created, similarity lookups can be performed using the similarity pseudo-function.

import PIL.Image
reference_img = PIL.Image.open('my_image.jpg')
sim = tbl.img.similarity(reference_img)
tbl.select(tbl.img, sim).order_by(sim, asc=False).limit(5)

If the embedding UDF is a multimodal embedding (supporting more than one data type), then lookups may be performed using any of its supported types. In our example, CLIP supports both text and images, so we can also search for images using a text description:

sim = tbl.img.similarity('a picture of a train')
tbl.select(tbl.img, sim).order_by(sim, asc=False).limit(5)
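To make the ranking step concrete, here is a small pure-Python sketch of what ordering by a similarity score and keeping the top results amounts to. It is not how Pixeltable evaluates the similarity pseudo-function (the index does that work); the cosine_similarity and top_k helpers and the toy vectors are assumptions made up for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query, rows, k=5):
    """Return the k (row_id, score) pairs most similar to the query, best first."""
    scored = [(row_id, cosine_similarity(query, vec)) for row_id, vec in rows.items()]
    # Mirrors order_by(sim, asc=False).limit(k) in the examples above.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical embeddings keyed by row id.
rows = {
    'train': [1.0, 0.0],
    'tram': [0.8, 0.6],
    'cat': [0.0, 1.0],
}
results = top_k([1.0, 0.0], rows, k=2)
```

With these toy vectors, the two rows closest in direction to the query come back first and the orthogonal one is cut off by the limit.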

Signature:

add_embedding_index(
    column: str | ColumnRef,
    *,
    idx_name: Optional[str] = None,
    embedding: Optional[pxt.Function] = None,
    string_embed: Optional[pxt.Function] = None,
    image_embed: Optional[pxt.Function] = None,
    metric: str = 'cosine',
    if_exists: Literal['error', 'ignore', 'replace', 'replace_force'] = 'error'
)-> None

Parameters:

column (str | ColumnRef): The name of, or reference to, the column to be indexed; must be a String or Image column.
idx_name (Optional[str]): An optional name for the index. If not specified, a name such as 'idx0' will be generated automatically. If specified, the name must be unique for this table and a valid pixeltable column name.
embedding (Optional[pxt.Function]): The UDF to use for the embedding. Must be a UDF that accepts a single argument of type String or Image (as appropriate for the column being indexed) and returns a fixed-size 1-dimensional array of floats.
string_embed (Optional[pxt.Function]): An optional UDF to use for the string embedding component of this index. Can be used in conjunction with image_embed to construct multimodal embeddings manually, by specifying different embedding functions for different data types.
image_embed (Optional[pxt.Function]): An optional UDF to use for the image embedding component of this index. Can be used in conjunction with string_embed to construct multimodal embeddings manually, by specifying different embedding functions for different data types.
metric (str) = cosine: Distance metric to use for the index; one of 'cosine', 'ip', or 'l2'. The default is 'cosine'.
if_exists (Literal['error', 'ignore', 'replace', 'replace_force']) = error: Directive for handling an existing index with the same name. Must be one of the following:
'error': raise an error if an index with the same name already exists.
'ignore': do nothing if an index with the same name already exists.
'replace' or 'replace_force': replace the existing index with the new one.
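For orientation, the three metric names correspond to standard vector comparisons. The sketch below computes each one in plain Python on a pair of toy vectors. The function names are illustrative and not part of the Pixeltable API, and how the index uses these scores internally is not covered here.

```python
import math

def inner_product(u, v):
    """'ip': sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))

def l2_distance(u, v):
    """'l2': Euclidean distance; smaller means closer."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    """'cosine': inner product of the two vectors after scaling each to unit length."""
    norm_u = math.sqrt(inner_product(u, u))
    norm_v = math.sqrt(inner_product(v, v))
    return inner_product(u, v) / (norm_u * norm_v)

u, v = [3.0, 4.0], [6.0, 8.0]  # same direction, different lengths
ip = inner_product(u, v)
l2 = l2_distance(u, v)
cos = cosine(u, v)
```

The two vectors point the same way, so the cosine score is at its maximum even though the inner product and the L2 distance both reflect the difference in length.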

Example: Add an index to the img column of the table my_table:

import pixeltable as pxt
from pixeltable.functions.huggingface import clip
tbl = pxt.get_table('my_table')
embedding_fn = clip.using(model_id='openai/clip-vit-base-patch32')
tbl.add_embedding_index(tbl.img, embedding=embedding_fn)

Alternatively, the img column may be specified by name:

tbl.add_embedding_index('img', embedding=embedding_fn)

Add a second index to the img column, using the inner product as the distance metric, and with a specific name:

tbl.add_embedding_index(
    tbl.img,
    idx_name='ip_idx',
    embedding=embedding_fn,
    metric='ip'
)

Add an index using separately specified string and image embeddings:

tbl.add_embedding_index(
    tbl.img,
    string_embed=string_embedding_fn,
    image_embed=image_embedding_fn
)

`delete()`

Delete rows in this table. Signature:

delete(where: Optional['exprs.Expr'] = None)-> UpdateStatus

Parameters:

where (Optional['exprs.Expr']): A predicate to filter rows to delete.

Example: Delete all rows in a table:

tbl.delete()

Delete all rows in a table where column a is greater than 5:

tbl.delete(tbl.a > 5)

`drop_column()`

Drop a column from the table. Signature:

drop_column(
    column: str | ColumnRef,
    if_not_exists: Literal['error', 'ignore'] = 'error'
)-> None

Parameters:

column (str | ColumnRef): The name or reference of the column to drop.
if_not_exists (Literal['error', 'ignore']) = error: Directive for handling a non-existent column. Must be one of the following:
'error': raise an error if the column does not exist.
'ignore': do nothing if the column does not exist.

Example: Drop the column col from the table my_table by column name:

tbl = pxt.get_table('my_table')
tbl.drop_column('col')

Drop the column col from the table my_table by column reference:

tbl = pxt.get_table('my_table')
tbl.drop_column(tbl.col)

Drop the column col from the table my_table if it exists, otherwise do nothing:

tbl = pxt.get_table('my_table')
tbl.drop_column(tbl.col, if_not_exists='ignore')

`drop_embedding_index()`

Drop an embedding index from the table. Either a column name or an index name (but not both) must be specified. If a column name or reference is specified, it must be a column containing exactly one embedding index; otherwise the specific index name must be provided instead. Signature:

drop_embedding_index(
    *,
    column: str | ColumnRef | None = None,
    idx_name: Optional[str] = None,
    if_not_exists: Literal['error', 'ignore'] = 'error'
)-> None

Parameters:

column (str | ColumnRef | None): The name of, or reference to, the column from which to drop the index. The column must have only one embedding index.
idx_name (Optional[str]): The name of the index to drop.
if_not_exists (Literal['error', 'ignore']) = error: Directive for handling a non-existent index. Must be one of the following:
'error': raise an error if the index does not exist.
'ignore': do nothing if the index does not exist.

Note that the if_not_exists parameter applies only when idx_name is specified and no index with that name exists, or when column is specified and it has no index. It does not apply to a non-existent column. Example: Drop the embedding index on the img column of the table my_table by column name:

tbl = pxt.get_table('my_table')
tbl.drop_embedding_index(column='img')

Drop the embedding index on the img column of the table my_table by column reference:

tbl = pxt.get_table('my_table')
tbl.drop_embedding_index(column=tbl.img)

Drop the embedding index idx1 of the table my_table by index name:

tbl = pxt.get_table('my_table')
tbl.drop_embedding_index(idx_name='idx1')

Drop the embedding index idx1 of the table my_table by index name, if it exists, otherwise do nothing:

tbl = pxt.get_table('my_table')
tbl.drop_embedding_index(idx_name='idx1', if_not_exists='ignore')
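The argument rules for drop_embedding_index() can be summarized as a small decision procedure. The sketch below is a toy model written from the description in this section, not Pixeltable's implementation; resolve_index, the indexes mapping, and the exception types are all hypothetical.

```python
def resolve_index(indexes, column=None, idx_name=None, if_not_exists='error'):
    """Pick the index name to drop, or return None when there is nothing to do.

    `indexes` is a hypothetical mapping: column name -> list of index names.
    """
    if (column is None) == (idx_name is None):
        raise ValueError('specify exactly one of column or idx_name')

    if idx_name is not None:
        known = {name for names in indexes.values() for name in names}
        if idx_name in known:
            return idx_name
    else:
        if column not in indexes:
            # if_not_exists does not cover a missing column.
            raise KeyError(f'unknown column: {column}')
        names = indexes[column]
        if len(names) > 1:
            raise ValueError('column has several indexes; pass idx_name instead')
        if len(names) == 1:
            return names[0]

    # Reaching here means the index itself does not exist.
    if if_not_exists == 'ignore':
        return None
    raise LookupError('index does not exist')

# Toy catalog: img has two indexes, caption has one, title has none.
indexes = {'img': ['idx0', 'ip_idx'], 'caption': ['idx1'], 'title': []}
```

In this model, a column with two indexes can only be addressed by index name, and if_not_exists='ignore' turns a missing index into a no-op while a missing column still raises.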

`drop_index()`

Drop an index from the table. Either a column name or an index name (but not both) must be specified. If a column name or reference is specified, it must be a column containing exactly one index; otherwise the specific index name must be provided instead. Signature:

drop_index(
    *,
    column: str | ColumnRef | None = None,
    idx_name: Optional[str] = None,
    if_not_exists: Literal['error', 'ignore'] = 'error'
)-> None

Parameters:

column (str | ColumnRef | None): The name of, or reference to, the column from which to drop the index. The column must have only one index.
idx_name (Optional[str]): The name of the index to drop.
if_not_exists (Literal['error', 'ignore']) = error: Directive for handling a non-existent index. Must be one of the following:
'error': raise an error if the index does not exist.
'ignore': do nothing if the index does not exist.

Note that the if_not_exists parameter applies only when idx_name is specified and no index with that name exists, or when column is specified and it has no index. It does not apply to a non-existent column. Example: Drop the index on the img column of the table my_table by column name:

tbl = pxt.get_table('my_table')
tbl.drop_index(column='img')

Drop the index on the img column of the table my_table by column reference:

tbl = pxt.get_table('my_table')
tbl.drop_index(column=tbl.img)

Drop the index idx1 of the table my_table by index name:

tbl = pxt.get_table('my_table')
tbl.drop_index(idx_name='idx1')

Drop the index idx1 of the table my_table by index name, if it exists, otherwise do nothing:

tbl = pxt.get_table('my_table')
tbl.drop_index(idx_name='idx1', if_not_exists='ignore')

`insert()`

Inserts rows into this table. There are two mutually exclusive call patterns.

To insert multiple rows at a time:

```python
insert(
    source: TableSourceDataType,
    /,
    *,
    on_error: Literal['abort', 'ignore'] = 'abort',
    print_stats: bool = False,
    **kwargs: Any,
)
```

To insert just a single row, you can use the more concise syntax:

```python
insert(
    *,
    on_error: Literal['abort', 'ignore'] = 'abort',
    print_stats: bool = False,
    **kwargs: Any
)
```

Signature:

insert(
    source: Optional[TableDataSource] = None,
    /,
    *,
    source_format: Optional[Literal['csv', 'excel', 'parquet', 'json']] = None,
    schema_overrides: Optional[dict[str, ts.ColumnType]] = None,
    on_error: Literal['abort', 'ignore'] = 'abort',
    print_stats: bool = False,
    **kwargs: Any
)-> UpdateStatus
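The `/` in the signature makes source positional-only, which is what lets one function serve both call patterns. The stand-in below has a reduced version of the same parameter layout (source_format and schema_overrides are left out) and only reports how Python binds the arguments. The rule it uses to tell the patterns apart (source is None) is an assumption for illustration, not a statement about Pixeltable's internals.

```python
def insert(source=None, /, *, on_error='abort', print_stats=False, **kwargs):
    """Stand-in with a reduced version of the parameter layout shown above.

    Returns a description of which call pattern was recognised.
    """
    if source is not None:
        # Multi-row pattern: remaining keywords would go to the data source.
        return ('multi', source, kwargs)
    # Single-row pattern: keywords are column names and values.
    return ('single', kwargs)

multi = insert([{'a': 1}, {'a': 2}])
single = insert(a=3, b=3)
# Because `source` is positional-only, a keyword named source is collected
# by **kwargs instead of binding to the first parameter.
by_keyword = insert(source='path/to/file.csv')
```

Under this layout, a data source has to be passed positionally; a keyword argument named source is indistinguishable from a column value.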

Parameters:

source (Optional[TableDataSource]): A data source from which data can be imported.
kwargs (Any): (if inserting a single row) Keyword-argument pairs representing column names and values. (if inserting multiple rows) Additional keyword arguments are passed to the data source.
source_format (Optional[Literal['csv', 'excel', 'parquet', 'json']]): A hint about the format of the source data.
schema_overrides (Optional[dict[str, ts.ColumnType]]): If specified, then columns in schema_overrides will be given the specified types.
on_error (Literal['abort', 'ignore']) = abort: Determines the behavior if an error occurs while evaluating a computed column or detecting an invalid media file (such as a corrupt image) for one of the inserted rows.
If on_error='abort', then an exception will be raised and the rows will not be inserted.
If on_error='ignore', then execution will continue and the rows will be inserted. Any cells with errors will have a None value for that cell, with information about the error stored in the corresponding tbl.col_name.errortype and tbl.col_name.errormsg fields.
print_stats (bool) = False: If True, print statistics about the cost of computed columns.

Returns:

UpdateStatus: An UpdateStatus object containing information about the update.
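The difference between the two on_error settings can be illustrated with a toy model in plain Python. This is a sketch of the documented behavior only: insert_rows, the list-of-dicts table, and the shape of the stored cell are hypothetical and do not reflect how Pixeltable stores data.

```python
def insert_rows(table, rows, compute, on_error='abort'):
    """Toy model: append rows plus one computed cell per row.

    `table` is a list of dicts; `compute` derives the computed value from a row.
    """
    staged = []
    for row in rows:
        try:
            cell = {'value': compute(row), 'errortype': None, 'errormsg': None}
        except Exception as exc:
            if on_error == 'abort':
                # Nothing staged so far is written to the table.
                raise
            # 'ignore': keep the row, store None plus the error details.
            cell = {'value': None, 'errortype': type(exc).__name__, 'errormsg': str(exc)}
        staged.append({**row, 'computed': cell})
    table.extend(staged)
    return len(staged)

def ratio(row):
    """Example computed column; fails when b is zero."""
    return row['a'] / row['b']

rows = [{'a': 4, 'b': 2}, {'a': 1, 'b': 0}]
table = []
inserted = insert_rows(table, rows, ratio, on_error='ignore')
```

With on_error='ignore' both rows are kept and the failing cell carries None plus the error type and message; with on_error='abort' the same input raises and the table stays empty.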

Example: Insert two rows into the table my_table with three int columns a, b, and c. Column c is nullable:

tbl = pxt.get_table('my_table')
tbl.insert([{'a': 1, 'b': 1, 'c': 1}, {'a': 2, 'b': 2}])

Insert a single row using the alternative syntax:

tbl.insert(a=3, b=3, c=3)

Insert rows from a CSV file:

tbl.insert('path/to/file.csv')

Insert Pydantic model instances into a table with two pxt.Int columns a and b:

import pydantic

class MyModel(pydantic.BaseModel):
    a: int
    b: int

models = [MyModel(a=1, b=2), MyModel(a=3, b=4)]
tbl.insert(models)

`recompute_columns()`

Recompute the values in one or more computed columns of this table. Signature:

recompute_columns(
    *columns: str | ColumnRef,
    where: exprs.Expr | None = None,
    errors_only: bool = False,
    cascade: bool = True
)-> UpdateStatus

Parameters:

columns (str | ColumnRef): The names or references of the computed columns to recompute.
where (exprs.Expr | None): A predicate to filter rows to recompute.
errors_only (bool) = False: If True, only run the recomputation for rows that have errors in the column (i.e., the column’s errortype property indicates that an error occurred). Only allowed for recomputing a single column.
cascade (bool) = True: If True, also update all computed columns that transitively depend on the recomputed columns.
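The phrase "transitively depend on" means following the dependency chain as far as it goes. The sketch below shows that idea with a breadth-first walk over a hypothetical dependency mapping; columns_to_recompute and the dependents dictionary are illustrative only and are not part of the Pixeltable API.

```python
from collections import deque

def columns_to_recompute(targets, dependents, cascade=True):
    """Return the set of columns touched by recomputing `targets`.

    `dependents` is a hypothetical mapping: column -> columns computed directly from it.
    """
    selected = set(targets)
    if not cascade:
        return selected
    queue = deque(targets)
    while queue:
        col = queue.popleft()
        for dep in dependents.get(col, ()):
            if dep not in selected:
                selected.add(dep)
                queue.append(dep)
    return selected

# c3 is computed from c1, and c4 from c3; c5 is unrelated.
dependents = {'c1': ['c3'], 'c3': ['c4'], 'c5': []}
with_cascade = columns_to_recompute(['c1'], dependents)
without_cascade = columns_to_recompute(['c1'], dependents, cascade=False)
```

Here recomputing c1 with cascade reaches c3 and then c4, while cascade=False stops at c1 itself.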

Example: Recompute computed columns c1 and c2 for all rows in this table, and everything that transitively depends on them:

tbl.recompute_columns('c1', 'c2')

Recompute computed columns c1 and c2 for all rows in this table, but don’t recompute other columns that depend on them:

tbl.recompute_columns(tbl.c1, tbl.c2, cascade=False)

Recompute column c1 and its dependents, but only for rows with c2 == 0:

tbl.recompute_columns('c1', where=tbl.c2 == 0)

Recompute column c1 and its dependents, but only for rows that have errors in it:

tbl.recompute_columns('c1', errors_only=True)

`rename_column()`

Rename a column. Signature:

rename_column(
    old_name: str,
    new_name: str
)-> None

Parameters:

old_name (str): The current name of the column.
new_name (str): The new name of the column.

Example: Rename the column col1 to col2 of the table my_table:

tbl = pxt.get_table('my_table')
tbl.rename_column('col1', 'col2')

`revert()`

Reverts the table to the previous version. Warning: this operation is irreversible. Signature:

revert()-> None

`update()`

Update rows in this table. Signature:

update(
    value_spec: dict[str, Any],
    where: Optional['exprs.Expr'] = None,
    cascade: bool = True
)-> UpdateStatus

Parameters:

value_spec (dict[str, Any]): A dictionary mapping column names to literal values or Pixeltable expressions.
where (Optional['exprs.Expr']): A predicate to filter rows to update.
cascade (bool) = True: If True, also update all computed columns that transitively depend on the updated columns.

Returns:

UpdateStatus: An UpdateStatus object containing information about the update.

Example: Set column int_col to 1 for all rows:

tbl.update({'int_col': 1})

Set column int_col to 1 for all rows where int_col is 0:

tbl.update({'int_col': 1}, where=tbl.int_col == 0)

Set int_col to the value of other_int_col + 1:

tbl.update({'int_col': tbl.other_int_col + 1})

Increment int_col by 1 for all rows where int_col is 0:

tbl.update({'int_col': tbl.int_col + 1}, where=tbl.int_col == 0)
