# Changelog
Source: https://docs.pixeltable.com/changelog/changelog
Release notes for Pixeltable covering new features, performance improvements, bug fixes, and breaking changes across SDK versions.
## Contributors
Pixeltable is built by a vibrant community of contributors. We're grateful for everyone who has helped make Pixeltable better!
**Want to contribute?** Check out our [Contributing Guide](https://github.com/pixeltable/pixeltable/tree/main?tab=contributing-ov-file#readme) to get started.
**Top Contributors:** View our top contributors on [GitHub](https://github.com/pixeltable/pixeltable/graphs/contributors).
***
## Release History
View the complete release history for Pixeltable below. Each release includes detailed information about new features, bug fixes, and improvements.
For the latest release information, visit our [GitHub Releases page](https://github.com/pixeltable/pixeltable/releases).
***
### v0.6.4
**Released:** June 03, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.6.4](https://github.com/pixeltable/pixeltable/releases/tag/v0.6.4)
#### What's Changed
* \[PXT-1158]\[PXT-1162] Fix retargeting to stop creating ghost columns by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1345](https://github.com/pixeltable/pixeltable/pull/1345)
* Catalog bug: create view failed to mark base table as modified by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1370](https://github.com/pixeltable/pixeltable/pull/1370)
* Rename one of two 'test\_uuid.py's by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1369](https://github.com/pixeltable/pixeltable/pull/1369)
* Fix an error in test\_finalize\_pending\_ops\_non\_retriable\_error that oc… by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1374](https://github.com/pixeltable/pixeltable/pull/1374)
* Reimplement Random ops stats collection and reporting by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1367](https://github.com/pixeltable/pixeltable/pull/1367)
* Fix tests that depend on non-deterministic order by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1363](https://github.com/pixeltable/pixeltable/pull/1363)
* Cross reference HTTP serving page from SDK page on FastAPIRouter by [@apreshill](https://github.com/apreshill) in [#1377](https://github.com/pixeltable/pixeltable/pull/1377)
* Fix future warning from huggingface by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1380](https://github.com/pixeltable/pixeltable/pull/1380)
* \[PXT-1172] Table load can raise AssertionError if the table is dropped concurrently by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1375](https://github.com/pixeltable/pixeltable/pull/1375)
* PXT-1153: FastAPIRouter prevents duplicate routes by [@mkornacker](https://github.com/mkornacker) in [#1371](https://github.com/pixeltable/pixeltable/pull/1371)
* PXT-1149: thread-safe Table by [@mkornacker](https://github.com/mkornacker) in [#1372](https://github.com/pixeltable/pixeltable/pull/1372)
* PXT-1068: recompile SELECT \* query by [@mkornacker](https://github.com/mkornacker) in [#1357](https://github.com/pixeltable/pixeltable/pull/1357)
* Typed JSON return values for audio.get\_metadata() and video.get\_metadata() by [@aaron-siegel](https://github.com/aaron-siegel) in [#1368](https://github.com/pixeltable/pixeltable/pull/1368)
* Add fp32 embedding to db dump for migration testing by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1383](https://github.com/pixeltable/pixeltable/pull/1383)
* Ollama updates by [@aaron-siegel](https://github.com/aaron-siegel) in [#1386](https://github.com/pixeltable/pixeltable/pull/1386)
* Lancedb updates by [@aaron-siegel](https://github.com/aaron-siegel) in [#1387](https://github.com/pixeltable/pixeltable/pull/1387)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.6.3...v0.6.4](https://github.com/pixeltable/pixeltable/compare/v0.6.3...v0.6.4)
***
### v0.6.3
**Released:** May 22, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.6.3](https://github.com/pixeltable/pixeltable/releases/tag/v0.6.3)
#### What's Changed
* \[PXT-1160] skip working-with-fireworks.ipynb in ci by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1350](https://github.com/pixeltable/pixeltable/pull/1350)
* docs: relocate submodules, README updates, and nav fixes by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1348](https://github.com/pixeltable/pixeltable/pull/1348)
* \[PXT-1159] simplify GroqTest::test\_tool\_invocations to make it more reliable by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1346](https://github.com/pixeltable/pixeltable/pull/1346)
* Improve run\_tool\_invocations\_test by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1347](https://github.com/pixeltable/pixeltable/pull/1347)
* Support UDFs defined in modules with `from __future__ import annotations` by [@aaron-siegel](https://github.com/aaron-siegel) in [#1344](https://github.com/pixeltable/pixeltable/pull/1344)
* Update model used as an example in working with fireworks by [@aaron-siegel](https://github.com/aaron-siegel) in [#1356](https://github.com/pixeltable/pixeltable/pull/1356)
* explicitly prohibit the unsupported pk col operations by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1349](https://github.com/pixeltable/pixeltable/pull/1349)
* \[PXT-1002] double the current random-ops duration in ci to 6 minutes by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1359](https://github.com/pixeltable/pixeltable/pull/1359)
* \[PXT-1147] Hugging Face datasets integration breaks with `datasets` 4.8.5 by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1361](https://github.com/pixeltable/pixeltable/pull/1361)
* docs: fix deployment staleness and add application templates by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1362](https://github.com/pixeltable/pixeltable/pull/1362)
* \[PXT-1155] random\_ops fix for: ValueError: sleep length must be non-n… by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1360](https://github.com/pixeltable/pixeltable/pull/1360)
* \[PXT-1138] Fault injection framework for testing by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1326](https://github.com/pixeltable/pixeltable/pull/1326)
* Twelvelabs async file uploads by [@aaron-siegel](https://github.com/aaron-siegel) in [#1358](https://github.com/pixeltable/pixeltable/pull/1358)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.6.2...v0.6.3](https://github.com/pixeltable/pixeltable/compare/v0.6.2...v0.6.3)
***
### v0.6.2
**Released:** May 15, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.6.2](https://github.com/pixeltable/pixeltable/releases/tag/v0.6.2)
#### What's Changed
* Update 'notebook' dependency to address security vulnerability by [@aaron-siegel](https://github.com/aaron-siegel) in [#1311](https://github.com/pixeltable/pixeltable/pull/1311)
* Add missing skip\_test directives to inlined objects tests by [@aaron-siegel](https://github.com/aaron-siegel) in [#1315](https://github.com/pixeltable/pixeltable/pull/1315)
* Disable MPS for CLIP embeddings on Apple Silicon by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1314](https://github.com/pixeltable/pixeltable/pull/1314)
* \[PXT-1130] fix a pendingtableops corruption bug in catalog by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1298](https://github.com/pixeltable/pixeltable/pull/1298)
* \[PXT-1135] Nano Banana image generation support by [@christopherpestano](https://github.com/christopherpestano) in [#1312](https://github.com/pixeltable/pixeltable/pull/1312)
* Test fixes by [@aaron-siegel](https://github.com/aaron-siegel) in [#1320](https://github.com/pixeltable/pixeltable/pull/1320)
* Replace deprecated openai models in tests and examples by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1310](https://github.com/pixeltable/pixeltable/pull/1310)
* Thread concurrency-related bug fixes by [@mkornacker](https://github.com/mkornacker) in [#1322](https://github.com/pixeltable/pixeltable/pull/1322)
* Add defensive guards to dashboard tree traversal by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1321](https://github.com/pixeltable/pixeltable/pull/1321)
* \[PXT-1002] Start enforcing random ops in ci by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1325](https://github.com/pixeltable/pixeltable/pull/1325)
* Docs & README refresh: serving integration, dependency fixes, style improvements by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1318](https://github.com/pixeltable/pixeltable/pull/1318)
* `pxt deploy` CLI command + add services to environment config by [@aaron-siegel](https://github.com/aaron-siegel) in [#1319](https://github.com/pixeltable/pixeltable/pull/1319)
* Get rid of later Column value expr initialization by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1323](https://github.com/pixeltable/pixeltable/pull/1323)
* \[PXT-1154] Invalidate cached TV version after acquiring write lock by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1329](https://github.com/pixeltable/pixeltable/pull/1329)
* \[PXT-1078] gemini 2.0 models are being discontinued by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1332](https://github.com/pixeltable/pixeltable/pull/1332)
* Allow deploying services defined in code by [@aaron-siegel](https://github.com/aaron-siegel) in [#1331](https://github.com/pixeltable/pixeltable/pull/1331)
* Add submodules, CLI docs, skill wiring, Local UI Doc, and README updates by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1330](https://github.com/pixeltable/pixeltable/pull/1330)
* Fix main by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1341](https://github.com/pixeltable/pixeltable/pull/1341)
* Replace more deprecated and decommissioned OpenAI models in tests and examples by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1334](https://github.com/pixeltable/pixeltable/pull/1334)
* \[PXT-952] Switch to read committed isolation by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1335](https://github.com/pixeltable/pixeltable/pull/1335)
* Serving config updates by [@aaron-siegel](https://github.com/aaron-siegel) in [#1339](https://github.com/pixeltable/pixeltable/pull/1339)
* PXT-1140: parameterized queries + plan caching + thread-safe catalog by [@mkornacker](https://github.com/mkornacker) in [#1328](https://github.com/pixeltable/pixeltable/pull/1328)
* Add `custom_metadata` and `comment` to add\_computed\_column() by [@aaron-siegel](https://github.com/aaron-siegel) in [#1343](https://github.com/pixeltable/pixeltable/pull/1343)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.6.1...v0.6.2](https://github.com/pixeltable/pixeltable/compare/v0.6.1...v0.6.2)
***
### v0.6.1
**Released:** May 06, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.6.1](https://github.com/pixeltable/pixeltable/releases/tag/v0.6.1)
#### What's Changed
* Update and fix insert docstring by [@aaron-siegel](https://github.com/aaron-siegel) in [#1307](https://github.com/pixeltable/pixeltable/pull/1307)
* PXT-1054 Allow R2 Home Bucket as Destination Blob Storage by [@amithadke](https://github.com/amithadke) in [#1213](https://github.com/pixeltable/pixeltable/pull/1213)
* Explicitly install pip in all conda environments in CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#1309](https://github.com/pixeltable/pixeltable/pull/1309)
* fixes for make install with python 3.12+ by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1303](https://github.com/pixeltable/pixeltable/pull/1303)
* \[PXT-1012] Fix iterator view update when base column is an iterator parameter by [@christopherpestano](https://github.com/christopherpestano) in [#1288](https://github.com/pixeltable/pixeltable/pull/1288)
* Add FAL, Jina, and Reve tests to CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#1316](https://github.com/pixeltable/pixeltable/pull/1316)
* Bump next from 15.0.2 to 15.5.15 in /docs/sample-apps/text-and-image-similarity-search-nextjs-fastapi/frontend by [@dependabot](https://github.com/dependabot)\[bot] in [#1317](https://github.com/pixeltable/pixeltable/pull/1317)
* Fixing LC\_CTYPE to use legacy locale on MacOS by [@christopherpestano](https://github.com/christopherpestano) in [#1313](https://github.com/pixeltable/pixeltable/pull/1313)
* Dashboard performance and correctness fixes by [@mkornacker](https://github.com/mkornacker) in [#1308](https://github.com/pixeltable/pixeltable/pull/1308)
#### New Contributors
* [@dependabot](https://github.com/dependabot)\[bot] made their first contribution in [#1317](https://github.com/pixeltable/pixeltable/pull/1317)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.6.0...v0.6.1](https://github.com/pixeltable/pixeltable/compare/v0.6.0...v0.6.1)
***
### v0.6.0
**Released:** May 01, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.6.0](https://github.com/pixeltable/pixeltable/releases/tag/v0.6.0)
#### What's Changed
* CI Updates by [@aaron-siegel](https://github.com/aaron-siegel) in [#1281](https://github.com/pixeltable/pixeltable/pull/1281)
* video.py:pan() as an expr\_udf by [@mkornacker](https://github.com/mkornacker) in [#1284](https://github.com/pixeltable/pixeltable/pull/1284)
* \[PXT-1072] Drop num\_retained\_versions by [@christopherpestano](https://github.com/christopherpestano) in [#1277](https://github.com/pixeltable/pixeltable/pull/1277)
* Quality: Module resolution can mask real import errors by [@tomaioo](https://github.com/tomaioo) in [#1265](https://github.com/pixeltable/pixeltable/pull/1265)
* Upgrade actions/checkout and actions/setup-python to v6 by [@aaron-siegel](https://github.com/aaron-siegel) in [#1279](https://github.com/pixeltable/pixeltable/pull/1279)
* Applying the new exception hierarchy by [@mkornacker](https://github.com/mkornacker) in [#1282](https://github.com/pixeltable/pixeltable/pull/1282)
* \[PXT-1096] Introduce unversioned tables by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1233](https://github.com/pixeltable/pixeltable/pull/1233)
* Initial pxt CLI by [@mkornacker](https://github.com/mkornacker) in [#1283](https://github.com/pixeltable/pixeltable/pull/1283)
* Delete the invalidating table version log statement by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1280](https://github.com/pixeltable/pixeltable/pull/1280)
* Change buckets used in bedrock tests. by [@amithadke](https://github.com/amithadke) in [#1290](https://github.com/pixeltable/pixeltable/pull/1290)
* \[PXT-1131] Update Catalog table locking protocol by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1247](https://github.com/pixeltable/pixeltable/pull/1247)
* Dependabot security patches, week of 2026-04-20 by [@aaron-siegel](https://github.com/aaron-siegel) in [#1289](https://github.com/pixeltable/pixeltable/pull/1289)
* \[PXT-1065 + PXT-1066] Add export\_csv and export\_json by [@christopherpestano](https://github.com/christopherpestano) in [#1212](https://github.com/pixeltable/pixeltable/pull/1212)
* Additional FastAPIRouter methods by [@mkornacker](https://github.com/mkornacker) in [#1292](https://github.com/pixeltable/pixeltable/pull/1292)
* PXT-1134: add CellMaterializationNode to add\_computed\_column plan by [@mkornacker](https://github.com/mkornacker) in [#1294](https://github.com/pixeltable/pixeltable/pull/1294)
* PXT-1118: test\_overlay\_text() fails in CI by [@mkornacker](https://github.com/mkornacker) in [#1287](https://github.com/pixeltable/pixeltable/pull/1287)
* Additional functionality for mix\_audio() and overlay\_text() by [@mkornacker](https://github.com/mkornacker) in [#1300](https://github.com/pixeltable/pixeltable/pull/1300)
* \[PXT-1002] Retry loop in another location where pending table ops err… by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1297](https://github.com/pixeltable/pixeltable/pull/1297)
* add .claude to gitignore by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1302](https://github.com/pixeltable/pixeltable/pull/1302)
* export\_sql parameter for FastAPIRouter insert/update routes by [@mkornacker](https://github.com/mkornacker) in [#1299](https://github.com/pixeltable/pixeltable/pull/1299)
* \[PXT-1125 + PXT-1133] Fix SQL/Python divergence on utf-8 string input by [@christopherpestano](https://github.com/christopherpestano) in [#1286](https://github.com/pixeltable/pixeltable/pull/1286)
* openai.responses() by [@aaron-siegel](https://github.com/aaron-siegel) in [#1293](https://github.com/pixeltable/pixeltable/pull/1293)
* Unified configuration system & deployment bundle creation by [@aaron-siegel](https://github.com/aaron-siegel) in [#1296](https://github.com/pixeltable/pixeltable/pull/1296)
* \[PXT-924] Improve error messages for incorrect udf call by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1291](https://github.com/pixeltable/pixeltable/pull/1291)
* add a missing type hint for `make check` with python 3.12 by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1304](https://github.com/pixeltable/pixeltable/pull/1304)
* Docstring formatting by [@aaron-siegel](https://github.com/aaron-siegel) in [#1305](https://github.com/pixeltable/pixeltable/pull/1305)
#### New Contributors
* [@tomaioo](https://github.com/tomaioo) made their first contribution in [#1265](https://github.com/pixeltable/pixeltable/pull/1265)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.28...v0.6.0](https://github.com/pixeltable/pixeltable/compare/v0.5.28...v0.6.0)
***
### v0.5.28
**Released:** April 17, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.28](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.28)
#### What's Changed
* \[PXT-970 + PXT-1074] Gemini video reference images, TTS, and transcription by [@christopherpestano](https://github.com/christopherpestano) in [#1221](https://github.com/pixeltable/pixeltable/pull/1221)
* Update Mintlify version and fix sidebar ordering in style.css by [@aaron-siegel](https://github.com/aaron-siegel) in [#1263](https://github.com/pixeltable/pixeltable/pull/1263)
* Add `timm` dependency to notebooks by [@aaron-siegel](https://github.com/aaron-siegel) in [#1266](https://github.com/pixeltable/pixeltable/pull/1266)
* Update README: dashboard, skill repo, restructured Quick Start by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1264](https://github.com/pixeltable/pixeltable/pull/1264)
* Upgrade dev versions of libraries based on dependabot warnings by [@aaron-siegel](https://github.com/aaron-siegel) in [#1267](https://github.com/pixeltable/pixeltable/pull/1267)
* More `nightly.yml` updates by [@aaron-siegel](https://github.com/aaron-siegel) in [#1271](https://github.com/pixeltable/pixeltable/pull/1271)
* `@pxt.iterator()` should be equivalent to `@pxt.iterator` by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1273](https://github.com/pixeltable/pixeltable/pull/1273)
* Hide progress tracker by default by [@aaron-siegel](https://github.com/aaron-siegel) in [#1275](https://github.com/pixeltable/pixeltable/pull/1275)
* \[PXT-477] Primary key enforcement by [@christopherpestano](https://github.com/christopherpestano) in [#1203](https://github.com/pixeltable/pixeltable/pull/1203)
* A few more dependabot version upgrades by [@aaron-siegel](https://github.com/aaron-siegel) in [#1274](https://github.com/pixeltable/pixeltable/pull/1274)
* SDK docs: shift vision and uuid to built-in functions section by [@apreshill](https://github.com/apreshill) in [#1270](https://github.com/pixeltable/pixeltable/pull/1270)
* very\_expensive pytest marker, and 'expensive notebooks' concept by [@aaron-siegel](https://github.com/aaron-siegel) in [#1272](https://github.com/pixeltable/pixeltable/pull/1272)
* \[PXT-1026] Streaming ResultSets via ResultCursor by [@christopherpestano](https://github.com/christopherpestano) in [#1259](https://github.com/pixeltable/pixeltable/pull/1259)
* serving.fastapi.FastAPIRouter by [@mkornacker](https://github.com/mkornacker) in [#1268](https://github.com/pixeltable/pixeltable/pull/1268)
* Proposal for new exception hierarchy. by [@mkornacker](https://github.com/mkornacker) in [#1278](https://github.com/pixeltable/pixeltable/pull/1278)
* PXT-1048 Expand Bedrock API to support media input and output by [@amithadke](https://github.com/amithadke) in [#1244](https://github.com/pixeltable/pixeltable/pull/1244)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.27...v0.5.28](https://github.com/pixeltable/pixeltable/compare/v0.5.27...v0.5.28)
***
### v0.5.27
**Released:** April 11, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.27](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.27)
#### What's Changed
* Don't use 'no\_timm' as default revision for detr models by [@aaron-siegel](https://github.com/aaron-siegel) in [#1256](https://github.com/pixeltable/pixeltable/pull/1256)
* Updates for WhisperX 3.8 compatibility by [@aaron-siegel](https://github.com/aaron-siegel) in [#1255](https://github.com/pixeltable/pixeltable/pull/1255)
* Cookbook: Create video slideshow from images by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1243](https://github.com/pixeltable/pixeltable/pull/1243)
* Documentation cleanup: deprecate openai.vision, add missing SDK entries, new provider notebooks by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1235](https://github.com/pixeltable/pixeltable/pull/1235)
* Consolidate notebook code cells containing only import statements by [@aaron-siegel](https://github.com/aaron-siegel) in [#1260](https://github.com/pixeltable/pixeltable/pull/1260)
* Updates for transformers 5 by [@aaron-siegel](https://github.com/aaron-siegel) in [#1257](https://github.com/pixeltable/pixeltable/pull/1257)
* Documentation for ResultSet, Expr, ColumnRef by [@aaron-siegel](https://github.com/aaron-siegel) in [#1262](https://github.com/pixeltable/pixeltable/pull/1262)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.26...v0.5.27](https://github.com/pixeltable/pixeltable/compare/v0.5.26...v0.5.27)
***
### v0.5.26
**Released:** April 09, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.26](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.26)
#### What's Changed
* Fix release bundle specs in pyproject.toml by [@aaron-siegel](https://github.com/aaron-siegel) in [#1254](https://github.com/pixeltable/pixeltable/pull/1254)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.25...v0.5.26](https://github.com/pixeltable/pixeltable/compare/v0.5.25...v0.5.26)
***
### v0.5.25
**Released:** April 09, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.25](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.25)
#### What's Changed
* Local UI by [@aaron-siegel](https://github.com/aaron-siegel) in [#1224](https://github.com/pixeltable/pixeltable/pull/1224)
* \[PXT-1088] Fix RateLimitsScheduler over-scheduling causing 429 cascades by [@christopherpestano](https://github.com/christopherpestano) in [#1176](https://github.com/pixeltable/pixeltable/pull/1176)
* \[PXT-1002] Catalog refactoring in begin\_xact by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1225](https://github.com/pixeltable/pixeltable/pull/1225)
* get rid of the corrupts\_db marker -- we've fixed all issues, and it's… by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1236](https://github.com/pixeltable/pixeltable/pull/1236)
* Fix for [https://github.com/pixeltable/pixeltable/issues/1239](https://github.com/pixeltable/pixeltable/issues/1239) by [@mkornacker](https://github.com/mkornacker) in [#1241](https://github.com/pixeltable/pixeltable/pull/1241)
* list\_iterator by [@aaron-siegel](https://github.com/aaron-siegel) in [#1231](https://github.com/pixeltable/pixeltable/pull/1231)
* Use miniforge in CI instead of miniconda3 by [@aaron-siegel](https://github.com/aaron-siegel) in [#1162](https://github.com/pixeltable/pixeltable/pull/1162)
* \[PXT-836] Decouple resource estimation from rate-limits pool by [@christopherpestano](https://github.com/christopherpestano) in [#1191](https://github.com/pixeltable/pixeltable/pull/1191)
* Clean up bridge.py by [@mkornacker](https://github.com/mkornacker) in [#1237](https://github.com/pixeltable/pixeltable/pull/1237)
* Use v2.x API in Mistral AI integration by [@aaron-siegel](https://github.com/aaron-siegel) in [#1251](https://github.com/pixeltable/pixeltable/pull/1251)
* More video udfs by [@mkornacker](https://github.com/mkornacker) in [#1240](https://github.com/pixeltable/pixeltable/pull/1240)
* \[PXT-1081] Exclude Microsoft Fabric notebook from make check by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1253](https://github.com/pixeltable/pixeltable/pull/1253)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.24...v0.5.25](https://github.com/pixeltable/pixeltable/compare/v0.5.24...v0.5.25)
***
### v0.5.24
**Released:** April 02, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.24](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.24)
#### What's Changed
* Remove a duplicate select query from Catalog.get\_tbl\_version by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1214](https://github.com/pixeltable/pixeltable/pull/1214)
* PXT-767 Support Array Column as input to add\_embedding\_index by [@amithadke](https://github.com/amithadke) in [#1096](https://github.com/pixeltable/pixeltable/pull/1096)
* Update release script for admin repo by [@aaron-siegel](https://github.com/aaron-siegel) in [#1220](https://github.com/pixeltable/pixeltable/pull/1220)
* \[PXT-565] Llama CPP tool call invocation integration by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1211](https://github.com/pixeltable/pixeltable/pull/1211)
* \[PXT-1063] Garbage Collection in dispatch() can stop gc'ing intermediate values on split dags by [@christopherpestano](https://github.com/christopherpestano) in [#1194](https://github.com/pixeltable/pixeltable/pull/1194)
* Remove check\_pending\_ops=False during similarity expression deserialization by [@amithadke](https://github.com/amithadke) in [#1222](https://github.com/pixeltable/pixeltable/pull/1222)
* Suppress pytest-benchmark/xdist warning in test runs by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1223](https://github.com/pixeltable/pixeltable/pull/1223)
* Improved type system for JsonType by [@aaron-siegel](https://github.com/aaron-siegel) in [#1215](https://github.com/pixeltable/pixeltable/pull/1215)
* remove an unused parameter dir\_id in view.\_create by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1229](https://github.com/pixeltable/pixeltable/pull/1229)
* \[PXT-1002] Some table data conduit cleanup by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1228](https://github.com/pixeltable/pixeltable/pull/1228)
* More video udfs by [@mkornacker](https://github.com/mkornacker) in [#1226](https://github.com/pixeltable/pixeltable/pull/1226)
* Fix convert\_41.py by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1230](https://github.com/pixeltable/pixeltable/pull/1230)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.23...v0.5.24](https://github.com/pixeltable/pixeltable/compare/v0.5.23...v0.5.24)
***
### v0.5.23
**Released:** March 23, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.23](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.23)
#### What's Changed
* Make \_runtime\_ctx an optional UDF parameter by [@aaron-siegel](https://github.com/aaron-siegel) in [#1208](https://github.com/pixeltable/pixeltable/pull/1208)
* PXT-968: More bounding box udfs by [@mkornacker](https://github.com/mkornacker) in [#1204](https://github.com/pixeltable/pixeltable/pull/1204)
* PXT-947 Add OpenAI gpt-image-\* support and image editing functions by [@amithadke](https://github.com/amithadke) in [#1202](https://github.com/pixeltable/pixeltable/pull/1202)
* resize(video: pxt.Video) udf by [@mkornacker](https://github.com/mkornacker) in [#1210](https://github.com/pixeltable/pixeltable/pull/1210)
* Python 3.14 compatibility by [@aaron-siegel](https://github.com/aaron-siegel) in [#1209](https://github.com/pixeltable/pixeltable/pull/1209)
* Gemini notebook fixes by [@aaron-siegel](https://github.com/aaron-siegel) in [#1216](https://github.com/pixeltable/pixeltable/pull/1216)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.22...v0.5.23](https://github.com/pixeltable/pixeltable/compare/v0.5.22...v0.5.23)
***
### v0.5.22
**Released:** March 16, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.22](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.22)
#### What's Changed
* fix: add missing \_runtime\_ctx to embed\_content multimodal overloads by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1207](https://github.com/pixeltable/pixeltable/pull/1207)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.21...v0.5.22](https://github.com/pixeltable/pixeltable/compare/v0.5.21...v0.5.22)
***
### v0.5.21
**Released:** March 15, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.21](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.21)
#### What's Changed
* \[PXT-1002] Do not convert SQL errors when loading a table version for… by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1170](https://github.com/pixeltable/pixeltable/pull/1170)
* Enable cockroach tests by [@amithadke](https://github.com/amithadke) in [#1118](https://github.com/pixeltable/pixeltable/pull/1118)
* \[PXT-1002] RandomTableOps should ignore the error when creating a vie… by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1172](https://github.com/pixeltable/pixeltable/pull/1172)
* Random-ops should exit with a non-zero code on KeyboardInterrupt by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1175](https://github.com/pixeltable/pixeltable/pull/1175)
* Refactor Sample Apps with New Pattern + Broken Links and new Generic Zapier-esque ETL Demo App by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1169](https://github.com/pixeltable/pixeltable/pull/1169)
* \[PXT-1002] Fix a broken self-validation in Catalog by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1174](https://github.com/pixeltable/pixeltable/pull/1174)
* Create \*.instructions.md by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1177](https://github.com/pixeltable/pixeltable/pull/1177)
* \[PXT-1002] Improve random\_ops logging + a bug fix in Catalog by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1178](https://github.com/pixeltable/pixeltable/pull/1178)
* \[PXT-969] Materialize stores\_cellmd as boolean property in ColumnMd by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1127](https://github.com/pixeltable/pixeltable/pull/1127)
* \[PXT-1052] Fix RateLimitsScheduler crash when functions share a pool with different signatures by [@christopherpestano](https://github.com/christopherpestano) in [#1181](https://github.com/pixeltable/pixeltable/pull/1181)
* PXT-920 + PXT-1046 Refactor execution dispatch/init\_rows to improve performance and adding perf testing by [@christopherpestano](https://github.com/christopherpestano) in [#1154](https://github.com/pixeltable/pixeltable/pull/1154)
* \[PXT-1024] Catch up Reve UDFs with the API updates by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1180](https://github.com/pixeltable/pixeltable/pull/1180)
* Enable cockroachdb tests for merge queue by [@amithadke](https://github.com/amithadke) in [#1183](https://github.com/pixeltable/pixeltable/pull/1183)
* PXT-1061 Handle non-path text strings in process\_media\_contents by [@amithadke](https://github.com/amithadke) in [#1173](https://github.com/pixeltable/pixeltable/pull/1173)
* Add BFL FLUX integration for image generation and editing by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1028](https://github.com/pixeltable/pixeltable/pull/1028)
* PXT-1044: parquet import of image urls is broken by [@mkornacker](https://github.com/mkornacker) in [#1186](https://github.com/pixeltable/pixeltable/pull/1186)
* PXT-883 Allow pixeltable http urls for referencing replicas by [@amithadke](https://github.com/amithadke) in [#1179](https://github.com/pixeltable/pixeltable/pull/1179)
* Add Vertex AI authentication to Gemini integration by [@mkornacker](https://github.com/mkornacker) in [#1193](https://github.com/pixeltable/pixeltable/pull/1193)
* Log thread name now that Pixeltable can be used in a multi-threaded f… by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1184](https://github.com/pixeltable/pixeltable/pull/1184)
* \[PXT-1002] Retry pending table ops on InFailedSqlTransaction by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1192](https://github.com/pixeltable/pixeltable/pull/1192)
* Fix flakiness in some sample tests by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1196](https://github.com/pixeltable/pixeltable/pull/1196)
* Updates to CLAUDE.md by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1197](https://github.com/pixeltable/pixeltable/pull/1197)
* \[PXT-1067] Disable sqlalchemy INFO logging by default by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1195](https://github.com/pixeltable/pixeltable/pull/1195)
* fix model\_dump() calls in udfs by [@mkornacker](https://github.com/mkornacker) in [#1198](https://github.com/pixeltable/pixeltable/pull/1198)
* concat\_videos\_agg() uda, equivalent to concat\_videos() udf by [@mkornacker](https://github.com/mkornacker) in [#1188](https://github.com/pixeltable/pixeltable/pull/1188)
* PXT-968: bounding box udfs by [@mkornacker](https://github.com/mkornacker) in [#1190](https://github.com/pixeltable/pixeltable/pull/1190)
* miniconda migration: update context for ai agents by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1200](https://github.com/pixeltable/pixeltable/pull/1200)
* Adjusting Copilot review instructions by [@mkornacker](https://github.com/mkornacker) in [#1205](https://github.com/pixeltable/pixeltable/pull/1205)
* fix: update transformers to 4.57.6 and fix huggingface.py mypy errors by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1189](https://github.com/pixeltable/pixeltable/pull/1189)
* Multimodal embeddings support in Gemini integration by [@aaron-siegel](https://github.com/aaron-siegel) in [#1206](https://github.com/pixeltable/pixeltable/pull/1206)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.20...v0.5.21](https://github.com/pixeltable/pixeltable/compare/v0.5.20...v0.5.21)
***
### v0.5.20
**Released:** March 03, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.20](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.20)
#### What's Changed
* Perftest to log if it thinks that it's running in CI by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1163](https://github.com/pixeltable/pixeltable/pull/1163)
* \[PXT-1002] re-enable force replace view in random ops by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1166](https://github.com/pixeltable/pixeltable/pull/1166)
* \[PXT-1002] Fix table md caching when an insert finalizes view creation by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1138](https://github.com/pixeltable/pixeltable/pull/1138)
* Add missing %pip install to custom-iterators.ipynb by [@aaron-siegel](https://github.com/aaron-siegel) in [#1171](https://github.com/pixeltable/pixeltable/pull/1171)
* Add migration guides for new users coming from common stacks by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1167](https://github.com/pixeltable/pixeltable/pull/1167)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.19...v0.5.20](https://github.com/pixeltable/pixeltable/compare/v0.5.19...v0.5.20)
***
### v0.5.19
**Released:** March 01, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.19](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.19)
#### What's Changed
* Add local docs serving instructions to contributing guide by [@apreshill](https://github.com/apreshill) in [#1054](https://github.com/pixeltable/pixeltable/pull/1054)
* TableOp refactoring so that TableVersion is not required for some ops by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1153](https://github.com/pixeltable/pixeltable/pull/1153)
* `@pxt.iterator` decorator by [@aaron-siegel](https://github.com/aaron-siegel) in [#1111](https://github.com/pixeltable/pixeltable/pull/1111)
* Docs: add missing integrations, SDK entries, and cookbook updates for v0.5.11–v0.5.18 by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1158](https://github.com/pixeltable/pixeltable/pull/1158)
* Quieter CI output by [@aaron-siegel](https://github.com/aaron-siegel) in [#1161](https://github.com/pixeltable/pixeltable/pull/1161)
* \[PXT-1002] Make non-transactional TableOps idempotent by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1139](https://github.com/pixeltable/pixeltable/pull/1139)
* \[PXT-1043] Support video embeddings in VoyageAI by [@aaron-siegel](https://github.com/aaron-siegel) in [#1160](https://github.com/pixeltable/pixeltable/pull/1160)
* PXT-877 Fixing if\_exists='replace' cannot be used to replace a Table with a View/Snapshot or vice-versa by [@christopherpestano](https://github.com/christopherpestano) in [#1150](https://github.com/pixeltable/pixeltable/pull/1150)
* PXT-1020: support for multi-threaded API calls by [@mkornacker](https://github.com/mkornacker) in [#1155](https://github.com/pixeltable/pixeltable/pull/1155)
* Fix TableVersion.is\_iterator\_column by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1159](https://github.com/pixeltable/pixeltable/pull/1159)
* PXT-933 Support videos in gemini generate\_content by [@amithadke](https://github.com/amithadke) in [#1152](https://github.com/pixeltable/pixeltable/pull/1152)
* \[PXT-1018] Add a "source" field to list of columns in t.describe() by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1135](https://github.com/pixeltable/pixeltable/pull/1135)
* uvloop compatibility by [@mkornacker](https://github.com/mkornacker) in [#1164](https://github.com/pixeltable/pixeltable/pull/1164)
* docs: update deployment guides for thread safety, sync endpoints, and uvloop by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1165](https://github.com/pixeltable/pixeltable/pull/1165)
* Add Bedrock API Key auth support and notebook outputs by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1146](https://github.com/pixeltable/pixeltable/pull/1146)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.18...v0.5.19](https://github.com/pixeltable/pixeltable/compare/v0.5.18...v0.5.19)
***
### v0.5.18
**Released:** February 24, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.18](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.18)
#### What's Changed
* misc improvements in the code by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1072](https://github.com/pixeltable/pixeltable/pull/1072)
* \[PXT-995] improve test migration coverage of literals of various types by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1128](https://github.com/pixeltable/pixeltable/pull/1128)
* Twelvelabs notebook update by [@apreshill](https://github.com/apreshill) in [#1117](https://github.com/pixeltable/pixeltable/pull/1117)
* \[PXT-1040] Temporarily disable twelvelabs nb test by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1140](https://github.com/pixeltable/pixeltable/pull/1140)
* Update contribution guidelines regarding AI-generated code by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1134](https://github.com/pixeltable/pixeltable/pull/1134)
* \[PXT-1007 + PXT-1010] Modifying add\_columns to support column metadata and introducing standard ColumnSpec by [@christopherpestano](https://github.com/christopherpestano) in [#1119](https://github.com/pixeltable/pixeltable/pull/1119)
* Adding negative\_prompt to img2img notebook by [@christopherpestano](https://github.com/christopherpestano) in [#1136](https://github.com/pixeltable/pixeltable/pull/1136)
* \[PXT-1040] disable all twelvelabs tests by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1142](https://github.com/pixeltable/pixeltable/pull/1142)
* PXT-1039: video\_splitter(mode='accurate') doesn't work by [@mkornacker](https://github.com/mkornacker) in [#1145](https://github.com/pixeltable/pixeltable/pull/1145)
* PXT-966: crop() udf for videos by [@mkornacker](https://github.com/mkornacker) in [#1144](https://github.com/pixeltable/pixeltable/pull/1144)
* dumps() udf for json by [@mkornacker](https://github.com/mkornacker) in [#1149](https://github.com/pixeltable/pixeltable/pull/1149)
* Fixes for recent versions of mintlify by [@aaron-siegel](https://github.com/aaron-siegel) in [#1151](https://github.com/pixeltable/pixeltable/pull/1151)
* \[PXT-1003] Add offset parameter to limit() queries for pagination by [@aaron-siegel](https://github.com/aaron-siegel) in [#1148](https://github.com/pixeltable/pixeltable/pull/1148)
* Add agentic patterns cookbook by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1141](https://github.com/pixeltable/pixeltable/pull/1141)
* PXT-985 + PXT-1041 - Adding custom\_metadata and comment for columns by [@christopherpestano](https://github.com/christopherpestano) in [#1132](https://github.com/pixeltable/pixeltable/pull/1132)
* Fix: Implement drop\_index() for BtreeIndex and EmbeddingIndex by [@KeeProMise](https://github.com/KeeProMise) in [#1133](https://github.com/pixeltable/pixeltable/pull/1133)
* Update OpenAI vision and image gen APIs to make proper use of images in dicts by [@aaron-siegel](https://github.com/aaron-siegel) in [#1147](https://github.com/pixeltable/pixeltable/pull/1147)
* \[PXT-995] Literal should serialize its entire type info by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1123](https://github.com/pixeltable/pixeltable/pull/1123)
#### New Contributors
* [@KeeProMise](https://github.com/KeeProMise) made their first contribution in [#1133](https://github.com/pixeltable/pixeltable/pull/1133)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.17...v0.5.18](https://github.com/pixeltable/pixeltable/compare/v0.5.17...v0.5.18)
***
### v0.5.17
**Released:** February 10, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.17](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.17)
#### What's Changed
* Standardize names for runner configs by [@aaron-siegel](https://github.com/aaron-siegel) in [#1122](https://github.com/pixeltable/pixeltable/pull/1122)
* Add Jina AI integration for embeddings and reranking by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1029](https://github.com/pixeltable/pixeltable/pull/1029)
* Add Microsoft Fabric Integration for Azure OpenAI by [@pawarbi](https://github.com/pawarbi) in [#1109](https://github.com/pixeltable/pixeltable/pull/1109)
* Switch away from gemini-2.0 models by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1115](https://github.com/pixeltable/pixeltable/pull/1115)
* PXT-985 Adding custom\_metadata and restricting comment field to string by [@christopherpestano](https://github.com/christopherpestano) in [#1102](https://github.com/pixeltable/pixeltable/pull/1102)
* Nightly CI fix by [@aaron-siegel](https://github.com/aaron-siegel) in [#1129](https://github.com/pixeltable/pixeltable/pull/1129)
* PXT-1033: handle min\_segment\_duration=None correctly in VideoSplitter by [@mkornacker](https://github.com/mkornacker) in [#1131](https://github.com/pixeltable/pixeltable/pull/1131)
* Apply ruff formatting to code snippets in docstrings by [@aaron-siegel](https://github.com/aaron-siegel) in [#1125](https://github.com/pixeltable/pixeltable/pull/1125)
* Improved treatment of stored UDFs by [@aaron-siegel](https://github.com/aaron-siegel) in [#1126](https://github.com/pixeltable/pixeltable/pull/1126)
* PXT-1023: Support for ragged arrays in export\_parquet() by [@mkornacker](https://github.com/mkornacker) in [#1124](https://github.com/pixeltable/pixeltable/pull/1124)
#### New Contributors
* [@pawarbi](https://github.com/pawarbi) made their first contribution in [#1109](https://github.com/pixeltable/pixeltable/pull/1109)
* [@christopherpestano](https://github.com/christopherpestano) made their first contribution in [#1102](https://github.com/pixeltable/pixeltable/pull/1102)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.16...v0.5.17](https://github.com/pixeltable/pixeltable/compare/v0.5.16...v0.5.17)
***
### v0.5.16
**Released:** February 04, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.16](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.16)
#### What's Changed
* PXT-898 Allow Pixeltable API key to change in the environment mid-stream in a Python session by [@amithadke](https://github.com/amithadke) in [#1060](https://github.com/pixeltable/pixeltable/pull/1060)
* various runwayml followups by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1095](https://github.com/pixeltable/pixeltable/pull/1095)
* Ensure progress bar stops on empty results and plan exit by [@amithadke](https://github.com/amithadke) in [#1097](https://github.com/pixeltable/pixeltable/pull/1097)
* Fix exception handling in catalog by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1101](https://github.com/pixeltable/pixeltable/pull/1101)
* Migrate docs to `uuid7()` UDF by [@apreshill](https://github.com/apreshill) in [#1093](https://github.com/pixeltable/pixeltable/pull/1093)
* Add retries to Python install in CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#1094](https://github.com/pixeltable/pixeltable/pull/1094)
* fix: Make notebook outputs visible in dark mode by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1107](https://github.com/pixeltable/pixeltable/pull/1107)
* various improvements to random-ops script by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1084](https://github.com/pixeltable/pixeltable/pull/1084)
* Prep work for iterator refactor: Add media types and iterators to migration test by [@aaron-siegel](https://github.com/aaron-siegel) in [#1103](https://github.com/pixeltable/pixeltable/pull/1103)
* Add export media to s3 to io cookbooks in docs by [@apreshill](https://github.com/apreshill) in [#1088](https://github.com/pixeltable/pixeltable/pull/1088)
* Include audio\_splitter and video\_splitter in db dumps by [@aaron-siegel](https://github.com/aaron-siegel) in [#1110](https://github.com/pixeltable/pixeltable/pull/1110)
* PXT-965 Support http url and blob store uri for creating json/parquet/csv tables by [@amithadke](https://github.com/amithadke) in [#1104](https://github.com/pixeltable/pixeltable/pull/1104)
* Fixes for Pandas 3.0 by [@aaron-siegel](https://github.com/aaron-siegel) in [#1112](https://github.com/pixeltable/pixeltable/pull/1112)
* Upgrade ruff to latest by [@aaron-siegel](https://github.com/aaron-siegel) in [#1114](https://github.com/pixeltable/pixeltable/pull/1114)
* Fixes for Transformers 5 by [@aaron-siegel](https://github.com/aaron-siegel) in [#1113](https://github.com/pixeltable/pixeltable/pull/1113)
* Use a larger runner in merge queue for full tests on Python 3.10 by [@aaron-siegel](https://github.com/aaron-siegel) in [#1120](https://github.com/pixeltable/pixeltable/pull/1120)
* \[PXT-944] speech2text\_for\_conditional\_generation declares return type… by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1116](https://github.com/pixeltable/pixeltable/pull/1116)
* \[PXT-875] Fix openai perftest on github by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1062](https://github.com/pixeltable/pixeltable/pull/1062)
* PXT-973: additional\_columns doesn't evaluate as expected when creating a view by [@mkornacker](https://github.com/mkornacker) in [#1087](https://github.com/pixeltable/pixeltable/pull/1087)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.15...v0.5.16](https://github.com/pixeltable/pixeltable/compare/v0.5.15...v0.5.16)
***
### v0.5.15
**Released:** January 29, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.15](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.15)
#### What's Changed
* docs: update overview description and callout/footer styling by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1086](https://github.com/pixeltable/pixeltable/pull/1086)
* Fix HF datasets rotten\_tomatoes references in tests & notebook by [@aaron-siegel](https://github.com/aaron-siegel) in [#1089](https://github.com/pixeltable/pixeltable/pull/1089)
* Gemini UDFs to use "rate limits" scheduler by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1092](https://github.com/pixeltable/pixeltable/pull/1092)
* Allow dict/list config params to be specified as environment variables by [@aaron-siegel](https://github.com/aaron-siegel) in [#1091](https://github.com/pixeltable/pixeltable/pull/1091)
* Minor Gemini UDF followup for safer get\_retry\_delay() by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1098](https://github.com/pixeltable/pixeltable/pull/1098)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.14...v0.5.15](https://github.com/pixeltable/pixeltable/compare/v0.5.14...v0.5.15)
***
### v0.5.14
**Released:** January 24, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.14](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.14)
#### What's Changed
* Add RunwayML integration with UDFs for image and video generation by [@tiennguyentony](https://github.com/tiennguyentony) in [#1019](https://github.com/pixeltable/pixeltable/pull/1019)
* Deployment and Use Cases Docs by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1043](https://github.com/pixeltable/pixeltable/pull/1043)
* Transaction rollback by [@mkornacker](https://github.com/mkornacker) in [#1075](https://github.com/pixeltable/pixeltable/pull/1075)
* \[PXT-972] Bugfix: FrameIterator.set\_pos() on videos with start\_time > 0 by [@aaron-siegel](https://github.com/aaron-siegel) in [#1082](https://github.com/pixeltable/pixeltable/pull/1082)
* to\_string() method on UUIDType by [@aaron-siegel](https://github.com/aaron-siegel) in [#1078](https://github.com/pixeltable/pixeltable/pull/1078)
* CI and Makefile step to ensure notebooks have >= 50% of their cells with outputs by [@aaron-siegel](https://github.com/aaron-siegel) in [#1073](https://github.com/pixeltable/pixeltable/pull/1073)
* Regenerate all outputs for Reve integration notebook by [@apreshill](https://github.com/apreshill) in [#1071](https://github.com/pixeltable/pixeltable/pull/1071)
* Apply ruff formatting to all notebooks by [@aaron-siegel](https://github.com/aaron-siegel) in [#1074](https://github.com/pixeltable/pixeltable/pull/1074)
#### New Contributors
* [@tiennguyentony](https://github.com/tiennguyentony) made their first contribution in [#1019](https://github.com/pixeltable/pixeltable/pull/1019)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.13...v0.5.14](https://github.com/pixeltable/pixeltable/compare/v0.5.13...v0.5.14)
***
### v0.5.13
**Released:** January 22, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.13](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.13)
#### What's Changed
* rename reset\_db fixture to uses\_db by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1067](https://github.com/pixeltable/pixeltable/pull/1067)
* Use '/' as path delimiter by [@amithadke](https://github.com/amithadke) in [#1055](https://github.com/pixeltable/pixeltable/pull/1055)
* Temporarily disable progress reporting when verbosity \< 2 by [@aaron-siegel](https://github.com/aaron-siegel) in [#1079](https://github.com/pixeltable/pixeltable/pull/1079)
* Follow up fixes for Path delimiter change by [@amithadke](https://github.com/amithadke) in [#1076](https://github.com/pixeltable/pixeltable/pull/1076)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.12...v0.5.13](https://github.com/pixeltable/pixeltable/compare/v0.5.12...v0.5.13)
***
### v0.5.12
**Released:** January 17, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.12](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.12)
#### What's Changed
* Lint markdown in notebooks by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1033](https://github.com/pixeltable/pixeltable/pull/1033)
* Adjust down max connections on OpenAI client by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1058](https://github.com/pixeltable/pixeltable/pull/1058)
* \[PXT-915] Gemini embedding UDFs by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#986](https://github.com/pixeltable/pixeltable/pull/986)
* PXT-866 Add validation for version in pixeltable uri by [@amithadke](https://github.com/amithadke) in [#1048](https://github.com/pixeltable/pixeltable/pull/1048)
* uuid7() udf by [@mkornacker](https://github.com/mkornacker) in [#1059](https://github.com/pixeltable/pixeltable/pull/1059)
* \[PXT-875] Disable performance test until it reliably passes by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1061](https://github.com/pixeltable/pixeltable/pull/1061)
* Daemonize pgserver on Windows by [@aaron-siegel](https://github.com/aaron-siegel) in [#1057](https://github.com/pixeltable/pixeltable/pull/1057)
* PXT-954: assertion in recompute\_columns() for view column by [@mkornacker](https://github.com/mkornacker) in [#1064](https://github.com/pixeltable/pixeltable/pull/1064)
* Remove obsolete mkdocs by [@aaron-siegel](https://github.com/aaron-siegel) in [#1056](https://github.com/pixeltable/pixeltable/pull/1056)
* Working with blob storage nb by [@apreshill](https://github.com/apreshill) in [#977](https://github.com/pixeltable/pixeltable/pull/977)
* PXT-961: correct support for alpha in draw\_bounding\_boxes() by [@mkornacker](https://github.com/mkornacker) in [#1068](https://github.com/pixeltable/pixeltable/pull/1068)
* Notebook CI tweaks by [@aaron-siegel](https://github.com/aaron-siegel) in [#1069](https://github.com/pixeltable/pixeltable/pull/1069)
* PXT-943: Rectify all indices in TableRestorer, not just embedding indices by [@aaron-siegel](https://github.com/aaron-siegel) in [#1066](https://github.com/pixeltable/pixeltable/pull/1066)
* \[PXT-955] Skip UDA evaluation if a required parameter is None by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1070](https://github.com/pixeltable/pixeltable/pull/1070)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.11...v0.5.12](https://github.com/pixeltable/pixeltable/compare/v0.5.11...v0.5.12)
***
### v0.5.11
**Released:** January 13, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.11](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.11)
#### What's Changed
* \[PXT-916] Store embedding indexes as halfvecs by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1007](https://github.com/pixeltable/pixeltable/pull/1007)
* Add a "read only random ops" stress-tests job by [@aaron-siegel](https://github.com/aaron-siegel) in [#1047](https://github.com/pixeltable/pixeltable/pull/1047)
* Streamline dev installation by [@aaron-siegel](https://github.com/aaron-siegel) in [#1046](https://github.com/pixeltable/pixeltable/pull/1046)
* Add reruns by default to all cockroach test failures by [@aaron-siegel](https://github.com/aaron-siegel) in [#1053](https://github.com/pixeltable/pixeltable/pull/1053)
* PXT-938: export\_sql() by [@mkornacker](https://github.com/mkornacker) in [#1037](https://github.com/pixeltable/pixeltable/pull/1037)
* Add cookbooks: SQL and Segmentation by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1038](https://github.com/pixeltable/pixeltable/pull/1038)
* \[PXT-629] Update plan is incomplete by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1044](https://github.com/pixeltable/pixeltable/pull/1044)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.10...v0.5.11](https://github.com/pixeltable/pixeltable/compare/v0.5.10...v0.5.11)
***
### v0.5.10
**Released:** January 10, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.10](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.10)
#### What's Changed
* Adding ipywidgets to dev dependencies by [@mkornacker](https://github.com/mkornacker) in [#1027](https://github.com/pixeltable/pixeltable/pull/1027)
* Add a seed to TestSample.test\_sample\_basic\_f by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1040](https://github.com/pixeltable/pixeltable/pull/1040)
* Twelvelabs notebook by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1013](https://github.com/pixeltable/pixeltable/pull/1013)
* Readme Updates by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1041](https://github.com/pixeltable/pixeltable/pull/1041)
* Proper configurability for spaCy models by [@aaron-siegel](https://github.com/aaron-siegel) in [#1039](https://github.com/pixeltable/pixeltable/pull/1039)
* Various import fixes by [@aaron-siegel](https://github.com/aaron-siegel) in [#1042](https://github.com/pixeltable/pixeltable/pull/1042)
* PXT-875 Run perf tests on a dedicated larger runner by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1024](https://github.com/pixeltable/pixeltable/pull/1024)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.9...v0.5.10](https://github.com/pixeltable/pixeltable/compare/v0.5.9...v0.5.10)
***
### v0.5.9
**Released:** December 30, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.9](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.9)
#### What's Changed
* Bedrock invoke\_model() udf by [@mkornacker](https://github.com/mkornacker) in [#1018](https://github.com/pixeltable/pixeltable/pull/1018)
* \[PXT-765] Support for Office Formats as part of Document Type through MarkdownIT by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#960](https://github.com/pixeltable/pixeltable/pull/960)
* HF DetrForSegmentation by [@mkornacker](https://github.com/mkornacker) in [#1020](https://github.com/pixeltable/pixeltable/pull/1020)
* Image2Image: Updated HF.py to use AutoPipelineForImage2Image and Cookbook by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1025](https://github.com/pixeltable/pixeltable/pull/1025)
* Fixed broken tutorial links. by [@joerg84](https://github.com/joerg84) in [#1026](https://github.com/pixeltable/pixeltable/pull/1026)
* Allow `similarity(image=...)` to accept a filename or URL instead of a PIL image object by [@aaron-siegel](https://github.com/aaron-siegel) in [#1023](https://github.com/pixeltable/pixeltable/pull/1023)
* docs(cookbook): add MCP tool calling section to LLM tool calling guide by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1021](https://github.com/pixeltable/pixeltable/pull/1021)
* PXT-928: Export Json columns to parquet as pa.struct by [@mkornacker](https://github.com/mkornacker) in [#1017](https://github.com/pixeltable/pixeltable/pull/1017)
* removing psutil by [@mkornacker](https://github.com/mkornacker) in [#1031](https://github.com/pixeltable/pixeltable/pull/1031)
* Use head() instead of collect() in test\_add\_column\_to\_view by [@aaron-siegel](https://github.com/aaron-siegel) in [#1022](https://github.com/pixeltable/pixeltable/pull/1022)
* disable progress reporting in Jupyter if ipywidgets is not installed by [@mkornacker](https://github.com/mkornacker) in [#1032](https://github.com/pixeltable/pixeltable/pull/1032)
#### New Contributors
* [@joerg84](https://github.com/joerg84) made their first contribution in [#1026](https://github.com/pixeltable/pixeltable/pull/1026)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.8...v0.5.9](https://github.com/pixeltable/pixeltable/compare/v0.5.8...v0.5.9)
***
### v0.5.8
**Released:** December 20, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.8](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.8)
#### What's Changed
* Use high performance endpoint for Tigris by [@apreshill](https://github.com/apreshill) in [#1011](https://github.com/pixeltable/pixeltable/pull/1011)
* Merge Table.add\_embedding\_index examples by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1014](https://github.com/pixeltable/pixeltable/pull/1014)
* Notebook fixes & some cleanup by [@aaron-siegel](https://github.com/aaron-siegel) in [#1010](https://github.com/pixeltable/pixeltable/pull/1010)
* Progress tracker by [@mkornacker](https://github.com/mkornacker) in [#956](https://github.com/pixeltable/pixeltable/pull/956)
* \[PXT-925] Fix spurious exception when `if_not_exists='ignore'` is used with a missing parent dir by [@aaron-siegel](https://github.com/aaron-siegel) in [#1015](https://github.com/pixeltable/pixeltable/pull/1015)
* Improve primary key error message by [@aaron-siegel](https://github.com/aaron-siegel) in [#1016](https://github.com/pixeltable/pixeltable/pull/1016)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.7...v0.5.8](https://github.com/pixeltable/pixeltable/compare/v0.5.7...v0.5.8)
***
### v0.5.7
**Released:** December 18, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.7](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.7)
#### What's Changed
* Fix a bug in rag-demo.ipynb by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#996](https://github.com/pixeltable/pixeltable/pull/996)
* Fixes the errant `/datastore/` url in the Reve docstrings by [@apreshill](https://github.com/apreshill) in [#999](https://github.com/pixeltable/pixeltable/pull/999)
* Remove custom-iterators.ipynb from docs for now, and clean up docs.json by [@aaron-siegel](https://github.com/aaron-siegel) in [#997](https://github.com/pixeltable/pixeltable/pull/997)
* \[PXT-921] Skip test\_create\_video\_table on cockroachdb by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1002](https://github.com/pixeltable/pixeltable/pull/1002)
* Add iterators cookbook with all 6 built-in iterators by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1000](https://github.com/pixeltable/pixeltable/pull/1000)
* PXT-910 Add rerun options to presigned url tests by [@amithadke](https://github.com/amithadke) in [#1006](https://github.com/pixeltable/pixeltable/pull/1006)
* docs: add presigned\_url to S3 cookbook and update SDK docs by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1004](https://github.com/pixeltable/pixeltable/pull/1004)
* docs(providers): add Tigris example notebook by [@Xe](https://github.com/Xe) in [#998](https://github.com/pixeltable/pixeltable/pull/998)
* docs: update Mintlify theme colors and styling by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1008](https://github.com/pixeltable/pixeltable/pull/1008)
* Add `pxt.Binary` type to type system; `bytes` support in JSON; working Gemini 3 Pro by [@aaron-siegel](https://github.com/aaron-siegel) in [#1001](https://github.com/pixeltable/pixeltable/pull/1001)
* Support audio and video embedding indices by [@aaron-siegel](https://github.com/aaron-siegel) in [#990](https://github.com/pixeltable/pixeltable/pull/990)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.6...v0.5.7](https://github.com/pixeltable/pixeltable/compare/v0.5.6...v0.5.7)
***
### v0.5.6
**Released:** December 15, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.6](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.6)
#### What's Changed
* \[PXT-892] Support variable framerate in FrameIterator by [@aaron-siegel](https://github.com/aaron-siegel) in [#961](https://github.com/pixeltable/pixeltable/pull/961)
* \[PXT-875] Define GRAFANA\_INSTANCE\_ID for the perf job by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#989](https://github.com/pixeltable/pixeltable/pull/989)
* \[PXT-399] Remove pymupdf as a dependency by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#981](https://github.com/pixeltable/pixeltable/pull/981)
* Docs Cleanup + Cookbooks + Versioning/Lineage + Production for Workshop by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#964](https://github.com/pixeltable/pixeltable/pull/964)
* Iterators Refactor Part 1 by [@aaron-siegel](https://github.com/aaron-siegel) in [#992](https://github.com/pixeltable/pixeltable/pull/992)
* Update documentation for iterators and aggregate functions by [@aaron-siegel](https://github.com/aaron-siegel) in [#995](https://github.com/pixeltable/pixeltable/pull/995)
* PXT-910 Add presigned\_url udf by [@amithadke](https://github.com/amithadke) in [#991](https://github.com/pixeltable/pixeltable/pull/991)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.5...v0.5.6](https://github.com/pixeltable/pixeltable/compare/v0.5.5...v0.5.6)
***
### v0.5.5
**Released:** December 11, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.5](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.5)
#### What's Changed
* Multimodal support for Gemini `generate_content()` by [@aaron-siegel](https://github.com/aaron-siegel) in [#983](https://github.com/pixeltable/pixeltable/pull/983)
* PXT-903 Add UUID in pixeltable types by [@amithadke](https://github.com/amithadke) in [#979](https://github.com/pixeltable/pixeltable/pull/979)
* PXT-905/907: clean up handling of Hugging Face datasets by [@mkornacker](https://github.com/mkornacker) in [#984](https://github.com/pixeltable/pixeltable/pull/984)
* Twelve Labs multimodal embeddings support by [@aaron-siegel](https://github.com/aaron-siegel) in [#987](https://github.com/pixeltable/pixeltable/pull/987)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.4...v0.5.5](https://github.com/pixeltable/pixeltable/compare/v0.5.4...v0.5.5)
***
### v0.5.4
**Released:** December 09, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.4](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.4)
#### What's Changed
* \[PXT-645] Support more numpy dtypes for Array by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#940](https://github.com/pixeltable/pixeltable/pull/940)
* Add working-with-voyageai tutorial notebook by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#978](https://github.com/pixeltable/pixeltable/pull/978)
* StringSplitter docstring fix plus test by [@mkornacker](https://github.com/mkornacker) in [#980](https://github.com/pixeltable/pixeltable/pull/980)
* \[PXT-875] performance test for openai endpoints by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#963](https://github.com/pixeltable/pixeltable/pull/963)
* Restructuring of docs site and repo by [@aaron-siegel](https://github.com/aaron-siegel) in [#982](https://github.com/pixeltable/pixeltable/pull/982)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.3...v0.5.4](https://github.com/pixeltable/pixeltable/compare/v0.5.3...v0.5.4)
***
### v0.5.3
**Released:** December 04, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.3](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.3)
#### What's Changed
* PXT-872 Support count() with sample and group by clause by [@amithadke](https://github.com/amithadke) in [#955](https://github.com/pixeltable/pixeltable/pull/955)
* Add VOYAGE\_API\_KEY to CI and configuration.mdx; update uv.lock doctools reference by [@aaron-siegel](https://github.com/aaron-siegel) in [#976](https://github.com/pixeltable/pixeltable/pull/976)
* Fal.ai Integration by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#959](https://github.com/pixeltable/pixeltable/pull/959)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.2...v0.5.3](https://github.com/pixeltable/pixeltable/compare/v0.5.2...v0.5.3)
***
### v0.5.2
**Released:** December 03, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.2](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.2)
#### What's Changed
* Use database schemas and search\_path for test isolation in parallel runs by [@amithadke](https://github.com/amithadke) in [#953](https://github.com/pixeltable/pixeltable/pull/953)
* Working CI for Cockroach by [@aaron-siegel](https://github.com/aaron-siegel) in [#906](https://github.com/pixeltable/pixeltable/pull/906)
* Fix internal documentation links by [@aaron-siegel](https://github.com/aaron-siegel) in [#954](https://github.com/pixeltable/pixeltable/pull/954)
* \[PXT-886] Fix a bug in RateLimitsScheduler's error handling by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#951](https://github.com/pixeltable/pixeltable/pull/951)
* \[PXT-786] Development Guide by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#958](https://github.com/pixeltable/pixeltable/pull/958)
* Add Reve integration notebook by [@apreshill](https://github.com/apreshill) in [#939](https://github.com/pixeltable/pixeltable/pull/939)
* Adds support for Voyage AI embeddings and rerankers by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#962](https://github.com/pixeltable/pixeltable/pull/962)
* Some rough-edges features/improvements by [@mkornacker](https://github.com/mkornacker) in [#967](https://github.com/pixeltable/pixeltable/pull/967)
* \[PXT-908] Ensure that generated Gemini videos have sound by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#973](https://github.com/pixeltable/pixeltable/pull/973)
* PXT-904: add MIME type for object uploads by [@mkornacker](https://github.com/mkornacker) in [#971](https://github.com/pixeltable/pixeltable/pull/971)
* Update uv.lock by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#974](https://github.com/pixeltable/pixeltable/pull/974)
* Add uv.lock validation to the pr tests by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#975](https://github.com/pixeltable/pixeltable/pull/975)
* Documentation and config updates by [@aaron-siegel](https://github.com/aaron-siegel) in [#972](https://github.com/pixeltable/pixeltable/pull/972)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.1...v0.5.2](https://github.com/pixeltable/pixeltable/compare/v0.5.1...v0.5.2)
***
### v0.5.1
**Released:** November 19, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.1](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.1)
#### What's Changed
* Add TableVersionMd.from\_dict and update publish to use objects instead of dicts by [@amithadke](https://github.com/amithadke) in [#944](https://github.com/pixeltable/pixeltable/pull/944)
* Publishing an existing version returns 201, since 204 does not allow any content to be sent back in the body by [@amithadke](https://github.com/amithadke) in [#948](https://github.com/pixeltable/pixeltable/pull/948)
* Replace StorageDestination with StorageTarget by [@amithadke](https://github.com/amithadke) in [#947](https://github.com/pixeltable/pixeltable/pull/947)
* Missing converter for schema change in PR 932 by [@mkornacker](https://github.com/mkornacker) in [#949](https://github.com/pixeltable/pixeltable/pull/949)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.0...v0.5.1](https://github.com/pixeltable/pixeltable/compare/v0.5.0...v0.5.1)
***
### v0.5.0
**Released:** November 18, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.0](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.0)
#### What's Changed
* Data sharing docs by [@apreshill](https://github.com/apreshill) in [#931](https://github.com/pixeltable/pixeltable/pull/931)
* Numerous documentation fixes by [@aaron-siegel](https://github.com/aaron-siegel) in [#933](https://github.com/pixeltable/pixeltable/pull/933)
* PXT-846: FrameIterator(keyframes\_only: bool) by [@mkornacker](https://github.com/mkornacker) in [#934](https://github.com/pixeltable/pixeltable/pull/934)
* \[PXT-809] Improve OpenAI rate limiting by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#912](https://github.com/pixeltable/pixeltable/pull/912)
* Multi-phase drop\_table() by [@mkornacker](https://github.com/mkornacker) in [#932](https://github.com/pixeltable/pixeltable/pull/932)
* Streamline Makefile by [@aaron-siegel](https://github.com/aaron-siegel) in [#937](https://github.com/pixeltable/pixeltable/pull/937)
* Changes to protocol to handle publishing existing version by [@amithadke](https://github.com/amithadke) in [#938](https://github.com/pixeltable/pixeltable/pull/938)
* PXT-871: == None filter doesn't work correctly on an array column by [@mkornacker](https://github.com/mkornacker) in [#941](https://github.com/pixeltable/pixeltable/pull/941)
* More documentation improvements by [@aaron-siegel](https://github.com/aaron-siegel) in [#936](https://github.com/pixeltable/pixeltable/pull/936)
* Circularity detection in view creation with if\_exists='replace' by [@aaron-siegel](https://github.com/aaron-siegel) in [#942](https://github.com/pixeltable/pixeltable/pull/942)
* Add Tigris integration by [@Xe](https://github.com/Xe) in [#935](https://github.com/pixeltable/pixeltable/pull/935)
* Improvements to notebook documentation by [@aaron-siegel](https://github.com/aaron-siegel) in [#943](https://github.com/pixeltable/pixeltable/pull/943)
* Improvements to retriable errors detection in RequestRateScheduler by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#922](https://github.com/pixeltable/pixeltable/pull/922)
* Rename `DataFrame` to `Query` and `DataFrameResultSet` to `ResultSet` by [@aaron-siegel](https://github.com/aaron-siegel) in [#902](https://github.com/pixeltable/pixeltable/pull/902)
* PXT-873: t.sample() fails on externalized array data by [@mkornacker](https://github.com/mkornacker) in [#945](https://github.com/pixeltable/pixeltable/pull/945)
#### New Contributors
* [@Xe](https://github.com/Xe) made their first contribution in [#935](https://github.com/pixeltable/pixeltable/pull/935)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.24...v0.5.0](https://github.com/pixeltable/pixeltable/compare/v0.4.24...v0.5.0)
***
### v0.4.24
**Released:** November 12, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.24](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.24)
#### What's Changed
* Update imagen model in tests and docs (3.0 is deprecated) by [@aaron-siegel](https://github.com/aaron-siegel) in [#929](https://github.com/pixeltable/pixeltable/pull/929)
* Allow hyphens in table and dir names by [@aaron-siegel](https://github.com/aaron-siegel) in [#926](https://github.com/pixeltable/pixeltable/pull/926)
* Skip download when replicating the same version of a table a second time by [@aaron-siegel](https://github.com/aaron-siegel) in [#927](https://github.com/pixeltable/pixeltable/pull/927)
* Several fixes and improvements for data sharing by [@aaron-siegel](https://github.com/aaron-siegel) in [#928](https://github.com/pixeltable/pixeltable/pull/928)
* PXT-862: bug fix for drop\_table() by [@mkornacker](https://github.com/mkornacker) in [#930](https://github.com/pixeltable/pixeltable/pull/930)
* Various docs updates by [@aaron-siegel](https://github.com/aaron-siegel) in [#923](https://github.com/pixeltable/pixeltable/pull/923)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.23...v0.4.24](https://github.com/pixeltable/pixeltable/compare/v0.4.23...v0.4.24)
***
### v0.4.23
**Released:** November 11, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.23](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.23)
#### What's Changed
* Add PIXELTABLE\_API\_KEY to CI environment by [@aaron-siegel](https://github.com/aaron-siegel) in [#914](https://github.com/pixeltable/pixeltable/pull/914)
* `create_store_tbls: bool` option in Catalog.create\_replica() by [@aaron-siegel](https://github.com/aaron-siegel) in [#916](https://github.com/pixeltable/pixeltable/pull/916)
* \[PXT-380] Remove NamedFunction object and related code in named\_function.py by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#911](https://github.com/pixeltable/pixeltable/pull/911)
* Switch to new random ops script in CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#909](https://github.com/pixeltable/pixeltable/pull/909)
* \[PXT-799] Allow setting `fps` greater than the framerate of the video in `FrameIterator` by [@aaron-siegel](https://github.com/aaron-siegel) in [#918](https://github.com/pixeltable/pixeltable/pull/918)
* Intelligible error message when replicating a view of an existing original base table by [@aaron-siegel](https://github.com/aaron-siegel) in [#897](https://github.com/pixeltable/pixeltable/pull/897)
* \[PXT-837] Support creating/inserting directly from an existing Table by [@aaron-siegel](https://github.com/aaron-siegel) in [#919](https://github.com/pixeltable/pixeltable/pull/919)
* Add parameters to `make stresstest` by [@aaron-siegel](https://github.com/aaron-siegel) in [#920](https://github.com/pixeltable/pixeltable/pull/920)
* Introduce "anchor tables" in TableVersion(Handle) for live replicas; working pull() by [@aaron-siegel](https://github.com/aaron-siegel) in [#917](https://github.com/pixeltable/pixeltable/pull/917)
* Time travel for view over snapshot; replicas of view over snapshot by [@aaron-siegel](https://github.com/aaron-siegel) in [#924](https://github.com/pixeltable/pixeltable/pull/924)
* Proper display of embeddings by [@aaron-siegel](https://github.com/aaron-siegel) in [#925](https://github.com/pixeltable/pixeltable/pull/925)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.22...v0.4.23](https://github.com/pixeltable/pixeltable/compare/v0.4.22...v0.4.23)
***
### v0.4.22
**Released:** November 04, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.22](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.22)
#### What's Changed
* Manage `additional_md` from Catalog, rather than TableVersion by [@aaron-siegel](https://github.com/aaron-siegel) in [#913](https://github.com/pixeltable/pixeltable/pull/913)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.21...v0.4.22](https://github.com/pixeltable/pixeltable/compare/v0.4.21...v0.4.22)
***
### v0.4.21
**Released:** November 03, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.21](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.21)
#### What's Changed
* Hotfix for bug when publishing older versions of a table by [@aaron-siegel](https://github.com/aaron-siegel) in [#910](https://github.com/pixeltable/pixeltable/pull/910)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.20...v0.4.21](https://github.com/pixeltable/pixeltable/compare/v0.4.20...v0.4.21)
***
### v0.4.20
**Released:** November 03, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.20](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.20)
#### What's Changed
* pyscenedetect udfs by [@mkornacker](https://github.com/mkornacker) in [#899](https://github.com/pixeltable/pixeltable/pull/899)
* CockroachDB fixes + CI target by [@aaron-siegel](https://github.com/aaron-siegel) in [#900](https://github.com/pixeltable/pixeltable/pull/900)
* Add protocol for replica operations by [@amithadke](https://github.com/amithadke) in [#819](https://github.com/pixeltable/pixeltable/pull/819)
* \[PXT-822, PXT-674] Fix for querying snapshots of tables with unstored columns by [@aaron-siegel](https://github.com/aaron-siegel) in [#895](https://github.com/pixeltable/pixeltable/pull/895)
* Switch to using random\_tbl\_ops\_2 in stress-tests by [@aaron-siegel](https://github.com/aaron-siegel) in [#898](https://github.com/pixeltable/pixeltable/pull/898)
* Fix nondeterminism in unit test by [@aaron-siegel](https://github.com/aaron-siegel) in [#905](https://github.com/pixeltable/pixeltable/pull/905)
* \[PXT-817] UDFs for reve.com by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#901](https://github.com/pixeltable/pixeltable/pull/901)
* \[PXT-826] Refactor index creation logic by [@aaron-siegel](https://github.com/aaron-siegel) in [#908](https://github.com/pixeltable/pixeltable/pull/908)
* UV\_OPTS in Makefile by [@aaron-siegel](https://github.com/aaron-siegel) in [#896](https://github.com/pixeltable/pixeltable/pull/896)
* Ignore additional\_mds when checking table or table version metadata by [@amithadke](https://github.com/amithadke) in [#903](https://github.com/pixeltable/pixeltable/pull/903)
* \[PXT-786] push() and pull() implementations by [@amithadke](https://github.com/amithadke) in [#907](https://github.com/pixeltable/pixeltable/pull/907)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.19...v0.4.20](https://github.com/pixeltable/pixeltable/compare/v0.4.19...v0.4.20)
***
### v0.4.19
**Released:** October 29, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.19](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.19)
#### What's Changed
* Add image recipes to cookbook by [@apreshill](https://github.com/apreshill) in [#857](https://github.com/pixeltable/pixeltable/pull/857)
* Add display-name to CI matrix (prep for testing global media destination) by [@aaron-siegel](https://github.com/aaron-siegel) in [#879](https://github.com/pixeltable/pixeltable/pull/879)
* Enable all media destinations in CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#876](https://github.com/pixeltable/pixeltable/pull/876)
* \[PXT-814] UDF to encode a numpy array to an audio file by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#881](https://github.com/pixeltable/pixeltable/pull/881)
* Convert notebooks to use YAML frontmatter and fix formatting issues by [@goodlux](https://github.com/goodlux) in [#880](https://github.com/pixeltable/pixeltable/pull/880)
* Rename a public constant by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#884](https://github.com/pixeltable/pixeltable/pull/884)
* Multi-phase create\_table() by [@mkornacker](https://github.com/mkornacker) in [#854](https://github.com/pixeltable/pixeltable/pull/854)
* Initial integration of TwelveLabs Embed API by [@mkornacker](https://github.com/mkornacker) in [#885](https://github.com/pixeltable/pixeltable/pull/885)
* Fix `pxt.__version__` by [@aaron-siegel](https://github.com/aaron-siegel) in [#887](https://github.com/pixeltable/pixeltable/pull/887)
* Update many error messages for consistency by [@aaron-siegel](https://github.com/aaron-siegel) in [#869](https://github.com/pixeltable/pixeltable/pull/869)
* Replace `Optional[T]` with `T | None` (Python 3.10 style) throughout the codebase by [@aaron-siegel](https://github.com/aaron-siegel) in [#888](https://github.com/pixeltable/pixeltable/pull/888)
* Docs-related updates to Makefile and pyproject by [@aaron-siegel](https://github.com/aaron-siegel) in [#889](https://github.com/pixeltable/pixeltable/pull/889)
* \[PXT-685] Add `recompute_columns()` to computed columns fundamentals notebook by [@aaron-siegel](https://github.com/aaron-siegel) in [#892](https://github.com/pixeltable/pixeltable/pull/892)
* \[PXT-811, PXT-812] Improve two error messages with helpful hints by [@aaron-siegel](https://github.com/aaron-siegel) in [#891](https://github.com/pixeltable/pixeltable/pull/891)
* Revert two uses of `Optional` in unit tests by [@aaron-siegel](https://github.com/aaron-siegel) in [#893](https://github.com/pixeltable/pixeltable/pull/893)
* Dependency updates for Python 3.14 by [@aaron-siegel](https://github.com/aaron-siegel) in [#894](https://github.com/pixeltable/pixeltable/pull/894)
* Azure support by [@aaron-siegel](https://github.com/aaron-siegel) in [#886](https://github.com/pixeltable/pixeltable/pull/886)
* Default media destination as configuration parameter by [@aaron-siegel](https://github.com/aaron-siegel) in [#883](https://github.com/pixeltable/pixeltable/pull/883)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.18...v0.4.19](https://github.com/pixeltable/pixeltable/compare/v0.4.18...v0.4.19)
***
### v0.4.18
**Released:** October 22, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.18](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.18)
#### What's Changed
* Updates to nightly.yml by [@aaron-siegel](https://github.com/aaron-siegel) in [#866](https://github.com/pixeltable/pixeltable/pull/866)
* Streamline CI configs on PRs by [@aaron-siegel](https://github.com/aaron-siegel) in [#858](https://github.com/pixeltable/pixeltable/pull/858)
* Update WhisperX to >=3.7 and enable for Python 3.13 by [@aaron-siegel](https://github.com/aaron-siegel) in [#860](https://github.com/pixeltable/pixeltable/pull/860)
* elements parameter for DocSplitter by [@mkornacker](https://github.com/mkornacker) in [#865](https://github.com/pixeltable/pixeltable/pull/865)
* Fix examples docstring for add\_embedding\_index() by [@aaron-siegel](https://github.com/aaron-siegel) in [#871](https://github.com/pixeltable/pixeltable/pull/871)
* Improvements to random\_tbl\_ops script by [@aaron-siegel](https://github.com/aaron-siegel) in [#868](https://github.com/pixeltable/pixeltable/pull/868)
* Enforce `numpy>=2.2` by [@aaron-siegel](https://github.com/aaron-siegel) in [#872](https://github.com/pixeltable/pixeltable/pull/872)
* Segmentation-related improvements by [@mkornacker](https://github.com/mkornacker) in [#873](https://github.com/pixeltable/pixeltable/pull/873)
* Randomize the behavior of `sample()` in the case `seed=None` by [@aaron-siegel](https://github.com/aaron-siegel) in [#828](https://github.com/pixeltable/pixeltable/pull/828)
* \[PXT-729] Documentation deploy scripts for Mintlify website and local development by [@goodlux](https://github.com/goodlux) in [#867](https://github.com/pixeltable/pixeltable/pull/867)
* Properly reconstruct btree and vector indices when a replica is restored by [@aaron-siegel](https://github.com/aaron-siegel) in [#875](https://github.com/pixeltable/pixeltable/pull/875)
* Fix various errors and typos in README and the notebooks by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#877](https://github.com/pixeltable/pixeltable/pull/877)
* UDFs for Hugging Face Auto model integrations by [@aaron-siegel](https://github.com/aaron-siegel) in [#870](https://github.com/pixeltable/pixeltable/pull/870)
#### New Contributors
* [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) made their first contribution in [#877](https://github.com/pixeltable/pixeltable/pull/877)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.17...v0.4.18](https://github.com/pixeltable/pixeltable/compare/v0.4.17...v0.4.18)
***
### v0.4.17
**Released:** October 16, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.17](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.17)
#### What's Changed
* Update model used by Together AI tests by [@aaron-siegel](https://github.com/aaron-siegel) in [#846](https://github.com/pixeltable/pixeltable/pull/846)
* Fix broken links at the bottom of basics notebook by [@apreshill](https://github.com/apreshill) in [#844](https://github.com/pixeltable/pixeltable/pull/844)
* Retry failed notebook tests once in CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#830](https://github.com/pixeltable/pixeltable/pull/830)
* feat(storage): add Backblaze B2 S3-compatible integration and tests by [@jeronimodeleon](https://github.com/jeronimodeleon) in [#840](https://github.com/pixeltable/pixeltable/pull/840)
* CockroachDB: Set null\_ordered\_last on session start by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#838](https://github.com/pixeltable/pixeltable/pull/838)
* CockroachDB: Explicit coercions for arithmetic ops by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#839](https://github.com/pixeltable/pixeltable/pull/839)
* Fix for isolated NB tests in CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#847](https://github.com/pixeltable/pixeltable/pull/847)
* Notebook updates & OpenRouter notebook by [@aaron-siegel](https://github.com/aaron-siegel) in [#851](https://github.com/pixeltable/pixeltable/pull/851)
* ffmpeg with libx264 by [@mkornacker](https://github.com/mkornacker) in [#855](https://github.com/pixeltable/pixeltable/pull/855)
* Fixed incorrect documentation links by [@metadaddy](https://github.com/metadaddy) in [#859](https://github.com/pixeltable/pixeltable/pull/859)
* Update pixeltable-pgserver dependency to 0.4.0 by [@aaron-siegel](https://github.com/aaron-siegel) in [#853](https://github.com/pixeltable/pixeltable/pull/853)
* Support packaging of tables with embedding indices for data sharing by [@aaron-siegel](https://github.com/aaron-siegel) in [#841](https://github.com/pixeltable/pixeltable/pull/841)
* mode 'accurate' for VideoSplitter and segment\_video() by [@mkornacker](https://github.com/mkornacker) in [#856](https://github.com/pixeltable/pixeltable/pull/856)
* Added PDF-Page-Chunk-Extractor for image extraction (Issue 703) (PR 705) by [@kamir](https://github.com/kamir) in [#850](https://github.com/pixeltable/pixeltable/pull/850)
* Formatting fixes by [@aaron-siegel](https://github.com/aaron-siegel) in [#862](https://github.com/pixeltable/pixeltable/pull/862)
* Fix pyproject and mypy config by [@aaron-siegel](https://github.com/aaron-siegel) in [#863](https://github.com/pixeltable/pixeltable/pull/863)
* Fixes for load\_replica\_md() with non-snapshot tables by [@aaron-siegel](https://github.com/aaron-siegel) in [#861](https://github.com/pixeltable/pixeltable/pull/861)
* Correctly process cellmd in package/restore by [@aaron-siegel](https://github.com/aaron-siegel) in [#864](https://github.com/pixeltable/pixeltable/pull/864)
#### New Contributors
* [@jeronimodeleon](https://github.com/jeronimodeleon) made their first contribution in [#840](https://github.com/pixeltable/pixeltable/pull/840)
* [@metadaddy](https://github.com/metadaddy) made their first contribution in [#859](https://github.com/pixeltable/pixeltable/pull/859)
* [@kamir](https://github.com/kamir) made their first contribution in [#850](https://github.com/pixeltable/pixeltable/pull/850)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.16...v0.4.17](https://github.com/pixeltable/pixeltable/compare/v0.4.16...v0.4.17)
***
### v0.4.16
**Released:** October 08, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.16](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.16)
#### What's Changed
* OpenRouter Integration by [@aaron-siegel](https://github.com/aaron-siegel) in [#825](https://github.com/pixeltable/pixeltable/pull/825)
* Concurrency fixes & random\_tbl\_ops v2 by [@aaron-siegel](https://github.com/aaron-siegel) in [#814](https://github.com/pixeltable/pixeltable/pull/814)
* Images and arrays in json structures, plus improved storage of array columns by [@mkornacker](https://github.com/mkornacker) in [#812](https://github.com/pixeltable/pixeltable/pull/812)
* Minimal edits to docstrings by [@goodlux](https://github.com/goodlux) in [#813](https://github.com/pixeltable/pixeltable/pull/813)
* Add SDK documentation for Mintlify by [@goodlux](https://github.com/goodlux) in [#835](https://github.com/pixeltable/pixeltable/pull/835)
* Fix for performance problem when importing HF datasets by [@mkornacker](https://github.com/mkornacker) in [#833](https://github.com/pixeltable/pixeltable/pull/833)
* CockroachDB: div, mod operations SQL changed; timestamp propagated through client stack by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#823](https://github.com/pixeltable/pixeltable/pull/823)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.15...v0.4.16](https://github.com/pixeltable/pixeltable/compare/v0.4.15...v0.4.16)
***
### v0.4.15
**Released:** October 01, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.15](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.15)
#### What's Changed
* Add a spot for the cookbook in docs/ by [@apreshill](https://github.com/apreshill) in [#815](https://github.com/pixeltable/pixeltable/pull/815)
* Fixes for notebook tests resource cleanup by [@aaron-siegel](https://github.com/aaron-siegel) in [#827](https://github.com/pixeltable/pixeltable/pull/827)
* Adding export\_lancedb() to API reference by [@mkornacker](https://github.com/mkornacker) in [#824](https://github.com/pixeltable/pixeltable/pull/824)
* Replace `create_replica()` with separate `publish()` and `replicate()` methods by [@aaron-siegel](https://github.com/aaron-siegel) in [#816](https://github.com/pixeltable/pixeltable/pull/816)
* PXT-638, PXT-675, PXT-682 Handle Keyboard exception by [@amithadke](https://github.com/amithadke) in [#803](https://github.com/pixeltable/pixeltable/pull/803)
* PXT-772 Filling in missing docstrings by [@goodlux](https://github.com/goodlux) in [#822](https://github.com/pixeltable/pixeltable/pull/822)
* with\_audio() udf by [@mkornacker](https://github.com/mkornacker) in [#826](https://github.com/pixeltable/pixeltable/pull/826)
#### New Contributors
* [@apreshill](https://github.com/apreshill) made their first contribution in [#815](https://github.com/pixeltable/pixeltable/pull/815)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.14...v0.4.15](https://github.com/pixeltable/pixeltable/compare/v0.4.14...v0.4.15)
***
### v0.4.14
**Released:** September 23, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.14](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.14)
#### What's Changed
* Proper implementation of package/restore for non-snapshot replicas by [@aaron-siegel](https://github.com/aaron-siegel) in [#797](https://github.com/pixeltable/pixeltable/pull/797)
* Set up pydoclint by [@aaron-siegel](https://github.com/aaron-siegel) in [#805](https://github.com/pixeltable/pixeltable/pull/805)
* upgrade mint.json -> docs.json by [@goodlux](https://github.com/goodlux) in [#809](https://github.com/pixeltable/pixeltable/pull/809)
* Enable a destination parameter on stored computed columns. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#766](https://github.com/pixeltable/pixeltable/pull/766)
* Add support for running tests with cockroachdb as backend by [@amithadke](https://github.com/amithadke) in [#811](https://github.com/pixeltable/pixeltable/pull/811)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.13...v0.4.14](https://github.com/pixeltable/pixeltable/compare/v0.4.13...v0.4.14)
***
### v0.4.13
**Released:** September 19, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.13](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.13)
#### What's Changed
* Added pxt.io.export\_lancedb() by [@mkornacker](https://github.com/mkornacker) in [#795](https://github.com/pixeltable/pixeltable/pull/795)
* Update README.md by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#801](https://github.com/pixeltable/pixeltable/pull/801)
* Use raw\.githubusercontent.com instead of raw\.github.com in tests by [@aaron-siegel](https://github.com/aaron-siegel) in [#806](https://github.com/pixeltable/pixeltable/pull/806)
* Simplify & generalize TableDataSource types by [@aaron-siegel](https://github.com/aaron-siegel) in [#804](https://github.com/pixeltable/pixeltable/pull/804)
* Short Sample App: CLI Media Toolkit for Multimodal Data Processing by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#802](https://github.com/pixeltable/pixeltable/pull/802)
* Table.get\_versions() by [@aaron-siegel](https://github.com/aaron-siegel) in [#800](https://github.com/pixeltable/pixeltable/pull/800)
* Fixes for nightly CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#807](https://github.com/pixeltable/pixeltable/pull/807)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.12...v0.4.13](https://github.com/pixeltable/pixeltable/compare/v0.4.12...v0.4.13)
***
### v0.4.12
**Released:** September 05, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.12](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.12)
#### What's Changed
* Update model used by groq tests and examples by [@aaron-siegel](https://github.com/aaron-siegel) in [#790](https://github.com/pixeltable/pixeltable/pull/790)
* Clear TempStore, MediaStore, and HF cache after each test in CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#792](https://github.com/pixeltable/pixeltable/pull/792)
* Explicitly install pixeltable in run-isolated-nb-tests.sh by [@aaron-siegel](https://github.com/aaron-siegel) in [#794](https://github.com/pixeltable/pixeltable/pull/794)
* Handle incomplete rate limit headers better by [@mkornacker](https://github.com/mkornacker) in [#788](https://github.com/pixeltable/pixeltable/pull/788)
* SDK changes/fixes for data sharing by [@aaron-siegel](https://github.com/aaron-siegel) in [#791](https://github.com/pixeltable/pixeltable/pull/791)
* Disable TestWhisperx on Linux w/ GPU by [@mkornacker](https://github.com/mkornacker) in [#789](https://github.com/pixeltable/pixeltable/pull/789)
* recompute\_columns(): added where parameter by [@mkornacker](https://github.com/mkornacker) in [#787](https://github.com/pixeltable/pixeltable/pull/787)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.11...v0.4.12](https://github.com/pixeltable/pixeltable/compare/v0.4.11...v0.4.12)
***
### v0.4.11
**Released:** August 29, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.11](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.11)
#### What's Changed
* missing .md for VideoSplitter by [@mkornacker](https://github.com/mkornacker) in [#784](https://github.com/pixeltable/pixeltable/pull/784)
* CI & dev environment enhancements by [@aaron-siegel](https://github.com/aaron-siegel) in [#785](https://github.com/pixeltable/pixeltable/pull/785)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.10...v0.4.11](https://github.com/pixeltable/pixeltable/compare/v0.4.10...v0.4.11)
***
### v0.4.10
**Released:** August 28, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.10](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.10)
#### What's Changed
* Fix local\_public\_names() to properly exclude private functions by [@goodlux](https://github.com/goodlux) in [#778](https://github.com/pixeltable/pixeltable/pull/778)
* Add .DS\_Store to .gitignore by [@goodlux](https://github.com/goodlux) in [#779](https://github.com/pixeltable/pixeltable/pull/779)
* More video built-ins by [@mkornacker](https://github.com/mkornacker) in [#768](https://github.com/pixeltable/pixeltable/pull/768)
* Add missing \_\_all\_\_ to gemini and whisper modules by [@aaron-siegel](https://github.com/aaron-siegel) in [#781](https://github.com/pixeltable/pixeltable/pull/781)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.9...v0.4.10](https://github.com/pixeltable/pixeltable/compare/v0.4.9...v0.4.10)
***
### v0.4.9
**Released:** August 27, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.9](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.9)
#### What's Changed
* WhisperX Speaker Diarization by [@aaron-siegel](https://github.com/aaron-siegel) in [#770](https://github.com/pixeltable/pixeltable/pull/770)
* Basic support for concurrent pixeltable metadata creation/upgrade by [@amithadke](https://github.com/amithadke) in [#769](https://github.com/pixeltable/pixeltable/pull/769)
* Support for pydantic models in Table.insert() by [@mkornacker](https://github.com/mkornacker) in [#760](https://github.com/pixeltable/pixeltable/pull/760)
* Add comments for concurrent pixeltable initialization changes by [@amithadke](https://github.com/amithadke) in [#772](https://github.com/pixeltable/pixeltable/pull/772)
* Disable notebook tests that are failing in CI for unknown reasons by [@aaron-siegel](https://github.com/aaron-siegel) in [#777](https://github.com/pixeltable/pixeltable/pull/777)
* Publish the existing mypy plugin under `pixeltable.mypy` module to make it accessible for external use. by [@amithadke](https://github.com/amithadke) in [#776](https://github.com/pixeltable/pixeltable/pull/776)
* Remove `ext` package and fold contents into `functions` by [@aaron-siegel](https://github.com/aaron-siegel) in [#775](https://github.com/pixeltable/pixeltable/pull/775)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.8...v0.4.9](https://github.com/pixeltable/pixeltable/compare/v0.4.8...v0.4.9)
***
# Agentic Patterns
Source: https://docs.pixeltable.com/howto/cookbooks/agents/agentic-patterns
Implement reflection, planning, tool use, and multi-agent collaboration patterns in Pixeltable to build cognitive agents on tabular pipelines.
Two popular taxonomies describe the building blocks of agentic AI
systems:
* **Cognitive / reasoning-oriented** (Taxonomy 1): Reflection, Tool Use,
ReAct, Planning, Multi-Agent — asks *“how does the agent think?”*
* **Architectural / system-design-oriented** (Taxonomy 2): Prompt
Chaining, Routing, Parallelization, Tool Use, Evaluator-Optimizer,
Orchestrator-Worker — asks *“how do you wire LLM calls together?”*
(See [OpenAI’s Practical Guide to Building
Agents](https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf),
[Anthropic’s multi-agent research
system](https://www.anthropic.com/engineering/multi-agent-research-system),
and [Pydantic AI’s multi-agent
delegation](https://ai.pydantic.dev/multi-agent-applications/#agent-delegation).)
Mapping the two taxonomies against each other suggests a cleaner framing:
**six architectural patterns** that describe how
you structure LLM calls, plus **two cross-cutting reasoning strategies**
(ReAct and Planning) that can be layered inside any of them.
This cookbook implements all eight in Pixeltable, where your agent *is*
a table.
## Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import openai
pxt.drop_dir('agentic_patterns', force=True)
pxt.create_dir('agentic_patterns')
```
Created directory 'agentic\_patterns'.
## Pattern 1: Prompt Chaining
Break a complex task into sequential steps, where each step’s output
feeds the next.
**Imperative approach:** a chain of function calls or an explicit
pipeline object. **Pixeltable approach:** each step is a computed
column. The engine resolves dependencies automatically.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table with a single input column
chain = pxt.create_table('agentic_patterns/chain', {'topic': pxt.String})
```
Created table 'chain'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Step 1: generate an outline
chain.add_computed_column(
outline_response=openai.chat_completions(
messages=[
{
'role': 'user',
'content': 'Create a 3-point outline for a short article about: '
+ chain.topic,
}
],
model='gpt-4o-mini',
)
)
chain.add_computed_column(
outline=chain.outline_response.choices[0].message.content.astype(
pxt.String
)
)
```
Added 0 column values with 0 errors in 0.00 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Step 2: write a draft from the outline
chain.add_computed_column(
draft_response=openai.chat_completions(
messages=[
{
'role': 'user',
'content': 'Write a short article (2-3 paragraphs) based on this outline:\n\n'
+ chain.outline,
}
],
model='gpt-4o-mini',
)
)
chain.add_computed_column(
draft=chain.draft_response.choices[0].message.content.astype(
pxt.String
)
)
```
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Step 3: polish the draft
chain.add_computed_column(
polish_response=openai.chat_completions(
messages=[
{
'role': 'user',
'content': 'Edit this article for clarity and conciseness. '
'Return only the improved text:\n\n' + chain.draft,
}
],
model='gpt-4o-mini',
)
)
chain.add_computed_column(
final_article=chain.polish_response.choices[0].message.content.astype(
pxt.String
)
)
```
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert a topic — all three steps execute automatically
chain.insert([{'topic': 'the benefits of declarative AI pipelines'}])
chain.select(
chain.topic, chain.outline, chain.draft, chain.final_article
).collect()
```
Inserted 1 row with 0 errors in 14.58 s (0.07 rows/s)
Every intermediate result (`outline`, `draft`, `final_article`) is
persisted in the table. Inserting another topic reuses the same pipeline
— no code changes needed. If the same topic is inserted again, cached
results are returned instantly.
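To make the dependency idea concrete, here is a minimal plain-Python sketch of ordering derived values from their declared inputs. It does not use or reproduce Pixeltable's engine: the `columns` mapping, the `compute_row` helper, and the stand-in functions are all hypothetical, and the ordering relies on the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-ins for the three LLM steps: name -> (inputs, function).
columns = {
    'outline': (['topic'], lambda topic: f'outline({topic})'),
    'draft': (['outline'], lambda outline: f'draft({outline})'),
    'final_article': (['draft'], lambda draft: f'polish({draft})'),
}


def compute_row(row: dict) -> dict:
    """Fill in every derived value for one row, in dependency order."""
    graph = {name: set(deps) for name, (deps, _) in columns.items()}
    result = dict(row)
    for name in TopologicalSorter(graph).static_order():
        if name in columns:  # plain inputs such as 'topic' are already present
            deps, fn = columns[name]
            result[name] = fn(*(result[d] for d in deps))
    return result


row = compute_row({'topic': 'declarative pipelines'})
print(row['final_article'])
```

The order in which the entries are declared does not matter here; only the declared inputs do, which is the property the computed-column approach relies on.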
## Pattern 2: Routing
Classify an input and route it to a specialized handler. This is the
agent equivalent of a switch/case statement.
**Imperative approach:** a triage agent that performs handoffs to
specialized agents. **Pixeltable approach:** one computed column
classifies; a UDF selects the prompt; a second LLM call generates the
response.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
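# Assumed table definition (the creating cell is not shown in the source);
# it mirrors Pattern 1, with a single `query` input column.
router = pxt.create_table('agentic_patterns/router', {'query': pxt.String})

# Step 1: classify the query intent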
router.add_computed_column(
classify_response=openai.chat_completions(
messages=[
{
'role': 'user',
'content': 'Classify this customer query into exactly one category: '
'technical, billing, or general. Reply with the single word only.\n\n'
'Query: ' + router.query,
}
],
model='gpt-4o-mini',
)
)
router.add_computed_column(
intent=router.classify_response.choices[0].message.content.astype(
pxt.String
)
)
```
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Step 2: route to a specialized system prompt based on the classification
@pxt.udf
def route_prompt(intent: str, query: str) -> list[dict]:
    """Select a system prompt based on the classified intent."""
    system_prompts = {
        'technical': 'You are a senior technical support engineer. '
        'Provide precise, step-by-step troubleshooting guidance.',
        'billing': 'You are a billing specialist. '
        'Be empathetic and clear about charges, refunds, and payment options.',
        'general': 'You are a friendly customer service representative. '
        'Answer helpfully and concisely.',
    }
    # Default to general if classification is unexpected
    system = system_prompts.get(
        intent.strip().lower(), system_prompts['general']
    )
    return [
        {'role': 'system', 'content': system},
        {'role': 'user', 'content': query},
    ]


router.add_computed_column(
    routed_messages=route_prompt(router.intent, router.query)
)

# Step 3 (assumed; this cell is not shown in the source): generate the
# response from the routed messages and extract its text. The column name
# `route_response` is hypothetical.
router.add_computed_column(
    route_response=openai.chat_completions(
        messages=router.routed_messages, model='gpt-4o-mini'
    )
)
router.add_computed_column(
    response=router.route_response.choices[0].message.content.astype(
        pxt.String
    )
)
```
Added 0 column values with 0 errors in 0.01 s
No rows affected.
Added 0 column values with 0 errors in 0.00 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert queries spanning different intents
router.insert(
[
{
'query': 'My API calls are returning 429 errors since this morning'
},
{'query': 'I was charged twice for my subscription last month'},
{'query': 'What programming languages do you support?'},
]
)
router.select(router.query, router.intent, router.response).collect()
```
Inserted 3 rows with 0 errors in 6.93 s (0.43 rows/s)
Each query was classified and then handled by a specialized system
prompt. The `intent` column is inspectable for every row, making it easy
to audit routing decisions.
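One caveat: the exact-match lookup in `route_prompt` falls back to `general` whenever the classifier adds punctuation or extra words (for example `Technical.`). A slightly more tolerant normalizer might look like the sketch below. `normalize_intent` and `VALID_INTENTS` are hypothetical names used for illustration, not part of Pixeltable.

```python
import re

VALID_INTENTS = ('technical', 'billing', 'general')


def normalize_intent(raw: str, default: str = 'general') -> str:
    """Map a free-form classifier reply onto one of the known intents."""
    words = re.findall(r'[a-z]+', raw.lower())
    matches = {w for w in words if w in VALID_INTENTS}
    # Accept the reply only when it names exactly one known category.
    if len(matches) == 1:
        return matches.pop()
    return default


print(normalize_intent('Technical.'))
print(normalize_intent('billing or general'))
```

Ambiguous replies that name more than one category fall back to the default instead of silently picking the first match.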
## Pattern 3: Parallelization
Run multiple independent LLM calls on the same input simultaneously,
then combine the results.
**Imperative approach:** `asyncio.gather` or thread pools. **Pixeltable
approach:** add independent computed columns. The engine parallelizes
them automatically because they share no dependencies.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
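# Assumed table definition (the creating cell is not shown in the source);
# a single `text` input column, as used by the prompts below.
parallel = pxt.create_table('agentic_patterns/parallel', {'text': pxt.String})

# Three independent LLM calls; Pixeltable runs them in parallel automatically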
parallel.add_computed_column(
sentiment_raw=openai.chat_completions(
messages=[
{
'role': 'user',
'content': 'Analyze the sentiment of this text. '
'Reply with: positive, negative, or neutral.\n\n'
+ parallel.text,
}
],
model='gpt-4o-mini',
)
)
parallel.add_computed_column(
sentiment=parallel.sentiment_raw.choices[0].message.content.astype(
pxt.String
)
)
parallel.add_computed_column(
entities_raw=openai.chat_completions(
messages=[
{
'role': 'user',
'content': 'Extract all named entities (people, companies, locations) '
'from this text. Return a comma-separated list.\n\n'
+ parallel.text,
}
],
model='gpt-4o-mini',
)
)
parallel.add_computed_column(
entities=parallel.entities_raw.choices[0].message.content.astype(
pxt.String
)
)
parallel.add_computed_column(
summary_raw=openai.chat_completions(
messages=[
{
'role': 'user',
'content': 'Summarize this text in one sentence.\n\n'
+ parallel.text,
}
],
model='gpt-4o-mini',
)
)
parallel.add_computed_column(
summary=parallel.summary_raw.choices[0].message.content.astype(
pxt.String
)
)
```
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.00 s
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.00 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Merge the parallel results into a single structured report
@pxt.udf
def merge_analysis(sentiment: str, entities: str, summary: str) -> dict:
    """Combine parallel analysis results into one report."""
    return {
        'sentiment': sentiment.strip(),
        'entities': entities.strip(),
        'summary': summary.strip(),
    }


parallel.add_computed_column(
    report=merge_analysis(
        parallel.sentiment, parallel.entities, parallel.summary
    )
)
```
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
parallel.insert(
[
{
'text': 'Apple announced record quarterly revenue of $124 billion, '
'driven by strong iPhone sales in Europe and Asia. CEO Tim Cook '
"expressed optimism about the company's AI initiatives, while "
'some analysts remain cautious about increased R&D spending.'
}
]
)
parallel.select(
parallel.text, parallel.sentiment, parallel.entities, parallel.summary
).collect()
```
The three LLM calls (`sentiment`, `entities`, `summary`) have no
dependency on each other, so Pixeltable dispatches them concurrently.
The `merge_analysis` UDF waits for all three before combining the
results. No async code required.
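For comparison, the imperative `asyncio.gather` version of the same fan-out and merge could be sketched as follows. `fake_llm` is a hypothetical stand-in that sleeps instead of calling an API, so the snippet runs without credentials.

```python
import asyncio


async def fake_llm(task: str, text: str) -> str:
    """Stand-in for an LLM call; sleeps briefly instead of calling an API."""
    await asyncio.sleep(0.01)
    return f'{task}: {len(text)} chars'


async def analyze(text: str) -> dict:
    # The three calls are independent, so they can run concurrently.
    sentiment, entities, summary = await asyncio.gather(
        fake_llm('sentiment', text),
        fake_llm('entities', text),
        fake_llm('summary', text),
    )
    # The merge step runs only after all three have finished.
    return {'sentiment': sentiment, 'entities': entities, 'summary': summary}


report = asyncio.run(analyze('Apple announced record revenue.'))
print(report)
```

In this version the concurrency and the merge ordering are written out by hand; in the table version both follow from the column dependencies.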
## Pattern 4: Tool Use
Give an LLM access to external functions it can call to gather
information or take action.
**Imperative approach:** `@function_tool` decorator, tool loop that
re-prompts until the LLM stops requesting tools. **Pixeltable
approach:** `pxt.tools()` bundles UDFs into tool definitions;
`invoke_tools()` executes the LLM’s choices — both as computed columns.
For a deeper walkthrough including MCP servers, see [Use tool calling
with
LLMs](/howto/cookbooks/agents/llm-tool-calling).
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Define tool functions as UDFs
@pxt.udf
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    weather_data = {
        'new york': 'Sunny, 72F',
        'london': 'Cloudy, 58F',
        'tokyo': 'Rainy, 65F',
        'paris': 'Partly cloudy, 68F',
    }
    return weather_data.get(
        city.lower(), f'Weather data not available for {city}'
    )


@pxt.udf
def get_stock_price(symbol: str) -> str:
    """Get the current stock price for a ticker symbol."""
    prices = {'AAPL': '$178.50', 'GOOGL': '$141.25', 'MSFT': '$378.90'}
    return prices.get(symbol.upper(), f'Price not available for {symbol}')

# Bundle into a Tools object
tools = pxt.tools(get_weather, get_stock_price)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create the tool-calling pipeline
tool_agent = pxt.create_table(
'agentic_patterns/tool_agent', {'query': pxt.String}
)
# LLM decides which tool(s) to call
tool_agent.add_computed_column(
response=openai.chat_completions(
messages=[{'role': 'user', 'content': tool_agent.query}],
model='gpt-4o-mini',
tools=tools,
)
)
# Execute the tool calls automatically
tool_agent.add_computed_column(
tool_output=openai.invoke_tools(tools, tool_agent.response)
)
```
Created table 'tool\_agent'.
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tool_agent.insert(
[
{'query': "What's the weather in Tokyo?"},
{'query': "What's Apple's stock price?"},
{
'query': "What's the weather in Paris and Microsoft's stock price?"
},
]
)
for row in tool_agent.select(
    tool_agent.query, tool_agent.tool_output
).collect():
    print(f'Query: {row["query"]}')
    for tool_name, results in (row['tool_output'] or {}).items():
        if results:
            print(f' -> {tool_name}: {results}')
    print()
```
The LLM chose which tools to invoke (including multiple tools for the
last query). `invoke_tools()` executed them and stored results. The full
LLM response is also persisted in the `response` column for debugging.
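If the tool results need to be fed into a later prompt, they can be flattened into plain text first. The sketch below assumes `tool_output` is a dict mapping each tool name to a list of results, or to `None` when the tool was not called, which is how the printing loop above treats it. `format_tool_output` and the sample value are hypothetical.

```python
from typing import List, Optional


def format_tool_output(tool_output: Optional[dict]) -> List[str]:
    """Flatten a {tool_name: [results] or None} mapping into text lines."""
    lines = []
    for tool_name, results in (tool_output or {}).items():
        for result in results or []:
            lines.append(f'{tool_name}: {result}')
    return lines


# Hypothetical value shaped like the third query's output.
sample = {
    'get_weather': ['Partly cloudy, 68F'],
    'get_stock_price': ['$378.90'],
}
print('\n'.join(format_tool_output(sample)))
```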
## Pattern 5: Evaluator-Optimizer
One LLM generates output, a second LLM evaluates it, and the results are
used to decide whether to refine. This is the architectural cousin of
the *Reflection* pattern from Taxonomy 1 — an agent critiques its own
output and iteratively improves it.
**Imperative approach:** a while-loop that re-prompts until a quality
threshold is met (see [Pixelagent’s reflection
example](https://github.com/pixeltable/pixelagent/tree/main/examples/reflection)).
**Pixeltable approach:** chained computed columns: generate, evaluate,
then refine. The example below refines every draft; the evaluation is
stored alongside the content so rows can be filtered by score afterwards.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
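# Assumed table definition (the creating cell is not shown in the source);
# a single `product_brief` input column, as used by the prompts below.
evaluator = pxt.create_table(
    'agentic_patterns/evaluator', {'product_brief': pxt.String}
)

# Step 1: generate initial marketing copy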
evaluator.add_computed_column(
gen_response=openai.chat_completions(
messages=[
{
'role': 'user',
'content': 'Write a short marketing tagline (one sentence) for this product:\n\n'
+ evaluator.product_brief,
}
],
model='gpt-4o-mini',
)
)
evaluator.add_computed_column(
first_draft=evaluator.gen_response.choices[0].message.content.astype(
pxt.String
)
)
```
Added 0 column values with 0 errors in 0.00 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Step 2: evaluate the draft with an LLM-as-judge
evaluator.add_computed_column(
eval_response=openai.chat_completions(
messages=[
{
'role': 'user',
'content': 'Rate this marketing tagline on a scale of 1-10 for clarity, '
'creativity, and persuasiveness. Then provide one sentence of feedback '
'for improvement.\n\n'
'Tagline: ' + evaluator.first_draft + '\n\n'
'Reply in this exact format:\n'
'Score: \nFeedback: ',
}
],
model='gpt-4o-mini',
)
)
evaluator.add_computed_column(
evaluation=evaluator.eval_response.choices[0].message.content.astype(
pxt.String
)
)
```
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Step 3: refine using the feedback
evaluator.add_computed_column(
refine_response=openai.chat_completions(
messages=[
{
'role': 'user',
'content': 'Improve this marketing tagline based on the feedback below. '
'Return only the improved tagline.\n\n'
'Original: ' + evaluator.first_draft + '\n\n'
'Feedback: ' + evaluator.evaluation,
}
],
model='gpt-4o-mini',
)
)
evaluator.add_computed_column(
refined=evaluator.refine_response.choices[0].message.content.astype(
pxt.String
)
)
```
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
evaluator.insert(
[
{
'product_brief': 'A noise-canceling headphone designed for open-plan offices, '
'with 30-hour battery life and a built-in microphone for calls.'
},
{
'product_brief': 'An AI-powered code review tool that catches bugs, suggests '
"improvements, and learns your team's coding style over time."
},
]
)
evaluator.select(
evaluator.product_brief,
evaluator.first_draft,
evaluator.evaluation,
evaluator.refined,
).collect()
```
Inserted 2 rows with 0 errors in 2.95 s (0.68 rows/s)
Both the first draft and the refined version are stored side-by-side
with the evaluation. This makes it straightforward to compare outputs,
audit the judge’s reasoning, or filter rows where the score fell below a
threshold.
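Because `evaluation` is free text, filtering by score first requires extracting the number. A minimal sketch, assuming the judge follows the `Score: ... Feedback: ...` format requested in the prompt; `parse_score`, `needs_refinement`, and the threshold of 7 are hypothetical choices for illustration:

```python
import re
from typing import Optional


def parse_score(evaluation: str) -> Optional[int]:
    """Extract the integer after 'Score:'; None if absent or outside 1-10."""
    match = re.search(r'Score:\s*(\d+)', evaluation, flags=re.IGNORECASE)
    if match is None:
        return None
    score = int(match.group(1))
    return score if 1 <= score <= 10 else None


def needs_refinement(evaluation: str, threshold: int = 7) -> bool:
    """Decide whether a draft should get another pass."""
    score = parse_score(evaluation)
    # Treat an unparseable evaluation as needing another pass.
    return score is None or score < threshold


print(needs_refinement('Score: 5\nFeedback: Make the benefit concrete.'))
```

The same function could be wrapped as a UDF to store the numeric score in its own column, so that threshold filters run against a number instead of a string.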
## Pattern 6: Orchestrator-Worker
A central agent decomposes a task, delegates sub-tasks to specialized
worker agents, and synthesizes the results. This is the architectural
cousin of the *Multi-Agent* pattern from Taxonomy 1, and the same
structure Anthropic uses in their [multi-agent research
system](https://www.anthropic.com/engineering/multi-agent-research-system)
— a lead agent coordinates parallel subagents, each with their own
context and tools.
**Imperative approach:** an orchestrator agent class that spawns worker
agent instances and collects their outputs. **Pixeltable approach:**
each worker is a table with computed columns, wrapped as a callable
function via `pxt.udf(table, return_value=...)`. The orchestrator table
calls these functions as computed columns.
input → decompose → worker A (summarizer) ─┐
→ worker B (fact-checker) ─┼→ synthesize → output
For more on table UDFs, see [Use a table pipeline as a reusable
function](/howto/cookbooks/agents/pattern-table-as-udf).
### Build worker agents as tables
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Worker A: summarizer
summarizer_tbl = pxt.create_table(
'agentic_patterns/summarizer', {'text': pxt.String}
)
summarizer_tbl.add_computed_column(
response=openai.chat_completions(
messages=[
{
'role': 'user',
'content': 'Summarize this text in 2-3 sentences:\n\n'
+ summarizer_tbl.text,
}
],
model='gpt-4o-mini',
)
)
summarizer_tbl.add_computed_column(
summary=summarizer_tbl.response.choices[0].message.content.astype(
pxt.String
)
)
# Wrap as a callable function
summarize = pxt.udf(summarizer_tbl, return_value=summarizer_tbl.summary)
```
Created table 'summarizer'.
Added 0 column values with 0 errors in 0.10 s
Added 0 column values with 0 errors in 0.06 s
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Worker B: fact-checker
checker_tbl = pxt.create_table(
'agentic_patterns/checker', {'claim': pxt.String}
)
checker_tbl.add_computed_column(
response=openai.chat_completions(
messages=[
{
'role': 'user',
'content': 'Assess whether this claim is plausible. '
'Reply with: PLAUSIBLE or DUBIOUS, followed by a one-sentence explanation.\n\n'
'Claim: ' + checker_tbl.claim,
}
],
model='gpt-4o-mini',
)
)
checker_tbl.add_computed_column(
assessment=checker_tbl.response.choices[0].message.content.astype(
pxt.String
)
)
# Wrap as a callable function
fact_check = pxt.udf(checker_tbl, return_value=checker_tbl.assessment)
```
Created table 'checker'.
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.02 s
### Build the orchestrator
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Orchestrator table: delegates to workers, then synthesizes
orchestrator = pxt.create_table(
'agentic_patterns/orchestrator', {'article': pxt.String}
)
# Dispatch to worker A (summarizer) and worker B (fact-checker) in parallel
orchestrator.add_computed_column(
summary=summarize(text=orchestrator.article)
)
orchestrator.add_computed_column(
fact_check_result=fact_check(claim=orchestrator.article)
)
```
Created table 'orchestrator'.
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Synthesize worker outputs into a final briefing
orchestrator.add_computed_column(
synth_response=openai.chat_completions(
messages=[
{
'role': 'user',
'content': 'Based on the summary and fact-check below, write a brief '
'editorial note (2-3 sentences) about this article.\n\n'
'Summary: ' + orchestrator.summary + '\n\n'
'Fact-check: ' + orchestrator.fact_check_result,
}
],
model='gpt-4o-mini',
)
)
orchestrator.add_computed_column(
    briefing=orchestrator.synth_response.choices[0].message.content.astype(
        pxt.String
    )
)
```
Added 0 column values with 0 errors in 0.02 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
orchestrator.insert(
[
{
'article': 'A recent study published in Nature found that global sea levels '
'rose by 4.5 mm per year over the last decade, nearly double the rate observed '
'in the 1990s. Researchers attribute the acceleration primarily to ice sheet '
'loss in Greenland and Antarctica, compounded by thermal expansion of ocean '
'water. The findings suggest coastal cities may face significant flooding risks '
'by 2050 without aggressive mitigation strategies.'
}
]
)
orchestrator.select(
orchestrator.summary,
orchestrator.fact_check_result,
orchestrator.briefing,
).collect()
```
Inserted 1 row with 0 errors in 4.69 s (0.21 rows/s)
The orchestrator table called two independent worker pipelines
(`summarize` and `fact_check`), each backed by their own table with full
intermediate-result persistence. The synthesis step consumed both
outputs to produce the final briefing. Adding a new worker (e.g., a tone
analyzer) requires only creating another table, wrapping it with
`pxt.udf()`, and adding one more computed column to the orchestrator.
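As a point of comparison, the imperative shape of this pattern can be sketched in plain Python with a worker registry and a thread pool. The workers here are trivial hypothetical stand-ins for the table-backed pipelines, and the "synthesis" is a simple string join, not an LLM call.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical workers; each stands in for a table-backed pipeline.
workers = {
    'summary': lambda article: article.split('.')[0] + '.',
    'fact_check': lambda article: 'PLAUSIBLE' if 'study' in article else 'DUBIOUS',
}


def orchestrate(article: str) -> dict:
    """Run every registered worker on the article, then synthesize."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, article) for name, fn in workers.items()}
        outputs = {name: future.result() for name, future in futures.items()}
    # Synthesis step: combine all worker outputs into one briefing.
    outputs['briefing'] = ' | '.join(f'{k}: {v}' for k, v in outputs.items())
    return outputs


# Adding a worker is one more registry entry.
workers['tone'] = lambda article: 'neutral'
result = orchestrate('A study found rising sea levels. Cities face risk.')
print(result['briefing'])
```

Unlike the table version, nothing here is persisted: the worker outputs exist only for the duration of the call.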
## Strategy A: ReAct
ReAct is not a wiring pattern — it is a **reasoning strategy** that can
be applied inside any of the six patterns above. The agent alternates
between reasoning about the next step and acting on it (typically via
tools), observing the result before deciding what to do next.
**Imperative approach:** a while-loop that parses the LLM’s
THOUGHT/ACTION output, calls tools, and feeds observations back (see
[Pixelagent’s ReAct
example](https://github.com/pixeltable/pixelagent/tree/main/examples/planning)).
**Pixeltable approach:** the reasoning loop is plain Python that inserts
rows into a tool-calling table and reads back results. The table stores
every thought-action-observation triple for full observability.
question → \[THOUGHT → ACTION → OBSERVATION] × N → final answer
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import re
# Define a tool for the ReAct agent
@pxt.udf
def lookup_population(country: str) -> str:
    """Look up the approximate population of a country."""
    populations = {
        'united states': '331 million',
        'china': '1.4 billion',
        'india': '1.4 billion',
        'germany': '84 million',
        'brazil': '214 million',
        'japan': '125 million',
    }
    return populations.get(
        country.lower(), f'Population data not available for {country}'
    )

react_tools = pxt.tools(lookup_population)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Build a tool-calling table that the ReAct loop will insert into
react_steps = pxt.create_table(
'agentic_patterns/react_steps',
{'step': pxt.Int, 'prompt': pxt.String, 'system_prompt': pxt.String},
)
react_steps.add_computed_column(
response=openai.chat_completions(
messages=[
{'role': 'system', 'content': react_steps.system_prompt},
{'role': 'user', 'content': react_steps.prompt},
],
model='gpt-4o-mini',
tools=react_tools,
)
)
react_steps.add_computed_column(
answer=react_steps.response.choices[0].message.content.astype(
pxt.String
)
)
react_steps.add_computed_column(
tool_output=openai.invoke_tools(react_tools, react_steps.response)
)
```
Created table 'react\_steps'.
Added 0 column values with 0 errors in 0.00 s
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.00 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# The ReAct loop: reason → act → observe, repeated until done
REACT_SYSTEM = (
"You are a research assistant. Answer the user's question step by step.\n"
'Available tools: lookup_population\n\n'
'On each turn, respond in this exact format:\n'
'THOUGHT: \n'
'ACTION: \n\n'
'When ACTION is FINAL, include your final answer after it.\n'
'Current step: {step} of {max_steps}.'
)
question = 'Which country has a larger population, Brazil or Germany?'
max_steps = 4
history = []
for step in range(1, max_steps + 1):
    # Build prompt with accumulated observations
    prompt = question
    if history:
        prompt += '\n\nPrevious observations:\n' + '\n'.join(history)
    system = REACT_SYSTEM.format(step=step, max_steps=max_steps)
    react_steps.insert(
        [{'step': step, 'prompt': prompt, 'system_prompt': system}]
    )
    # Read back the result for this step
    row = (
        react_steps.where(react_steps.step == step)
        .select(react_steps.answer, react_steps.tool_output)
        .collect()
    )
    answer_text = row['answer'][0] or ''
    tool_out = row['tool_output'][0]
    # Record observation from tool output (if any)
    if tool_out:
        history.append(f'Step {step} tool result: {tool_out}')
    # Check if the agent decided to finalize
    if 'FINAL' in answer_text.upper():
        break

print(f'Completed in {step} steps')
for row in react_steps.select(
    react_steps.step, react_steps.answer, react_steps.tool_output
).collect():
    print(f'Step {row["step"]}:')
    if row['answer']:
        print(f' {row["answer"][:200]}')
    for tool_name, results in (row['tool_output'] or {}).items():
        if results:
            print(f' -> {tool_name}: {results}')
    print()
```
Every thought, action, and observation is persisted as a row in the
`react_steps` table. The loop itself is plain Python; the LLM calls and
tool execution happen declaratively via computed columns. This makes the
reasoning trace fully queryable after the fact — useful for debugging or
evaluation.
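The control flow of the loop can be exercised without an API key by swapping the table insert and read-back for a stub. The sketch below is an illustration under stated assumptions: `run_react_loop` and `fake_step` are hypothetical names, and the stub stands in for the `react_steps` computed columns rather than reproducing them.

```python
from typing import Callable, Optional


def run_react_loop(
    question: str,
    step_fn: Callable[[int, str], tuple[Optional[str], Optional[dict]]],
    max_steps: int = 4,
) -> tuple[int, list[str]]:
    """Run reason -> act -> observe until the answer contains FINAL."""
    history: list[str] = []
    step = 0
    for step in range(1, max_steps + 1):
        # Same prompt construction as the table-backed loop
        prompt = question
        if history:
            prompt += '\n\nPrevious observations:\n' + '\n'.join(history)

        # step_fn replaces "insert a row, then read answer and tool_output"
        answer, tool_out = step_fn(step, prompt)

        if tool_out:
            history.append(f'Step {step} tool result: {tool_out}')
        if 'FINAL' in (answer or '').upper():
            break
    return step, history


def fake_step(step: int, prompt: str) -> tuple[Optional[str], Optional[dict]]:
    """Hypothetical stand-in for the LLM and tool-execution columns."""
    if 'Previous observations' not in prompt:
        # First turn: the "model" requests a tool and returns no text
        return None, {'lookup_population': ['<placeholder tool result>']}
    return 'THOUGHT: I have what I need.\nACTION: FINAL <answer>', None


steps_used, observations = run_react_loop('Which is larger?', fake_step)
print(steps_used, observations)
```

Because termination is a substring match on `FINAL`, an answer that merely contains a word such as "finally" would also stop the loop; a stricter check on the `ACTION:` line may be preferable in practice.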
## Strategy B: Planning
Planning is the second cross-cutting reasoning strategy. Instead of
acting step-by-step (ReAct), the agent first generates a complete plan,
then executes each step. This is especially effective for complex tasks
where the structure of the solution can be determined upfront.
**Imperative approach:** an LLM generates a plan as structured JSON,
then a loop executes each step (see [Pixelagent’s planning
example](https://github.com/pixeltable/pixelagent/tree/main/examples/planning)).
**Pixeltable approach:** a prompt-chaining pipeline where the first
column generates the plan and a UDF parses it into executable steps.
Each step then feeds into subsequent computed columns.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import json as json_mod
planner = pxt.create_table(
'agentic_patterns/planner', {'question': pxt.String}
)
# Step 1: generate a plan as structured JSON
planner.add_computed_column(
plan_response=openai.chat_completions(
messages=[
{
'role': 'user',
'content': 'Break this question into 2-3 research steps. '
'Return ONLY a JSON object like {"steps": ["sub-question 1", "sub-question 2"]}. '
'No other text.\n\n'
'Question: ' + planner.question,
}
],
model='gpt-4o-mini',
)
)
planner.add_computed_column(
plan_text=planner.plan_response.choices[0].message.content.astype(
pxt.String
)
)
```
Created table 'planner'.
Added 0 column values with 0 errors in 0.00 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Step 2: parse the plan JSON into a list of sub-questions
@pxt.udf
def execute_plan(plan_json: str, original_question: str) -> list[dict]:
    """Parse the plan JSON and return structured sub-questions."""
    try:
        data = json_mod.loads(plan_json)
        # Handle both {"steps": [...]} and direct [...]
        steps = (
            data
            if isinstance(data, list)
            else data.get('steps', data.get('questions', []))
        )
        return [
            {'step': i + 1, 'sub_question': q}
            for i, q in enumerate(steps)
        ]
    except (json_mod.JSONDecodeError, TypeError, AttributeError):
        # Fall back to the original question if the plan is not usable JSON
        return [{'step': 1, 'sub_question': original_question}]


planner.add_computed_column(
    plan_steps=execute_plan(planner.plan_text, planner.question)
)
```
Added 0 column values with 0 errors in 0.01 s
No rows affected.
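To see which model outputs a parser of this kind accepts, the same idea can be run as a plain function outside Pixeltable. `parse_plan` below is a hypothetical plain-Python version written for experimentation, not part of the recipe; the sample strings are made up.

```python
import json


def parse_plan(plan_json: str, original_question: str) -> list[dict]:
    """Parse a plan string; fall back to the original question on failure."""
    try:
        data = json.loads(plan_json)
        steps = (
            data
            if isinstance(data, list)
            else data.get('steps', data.get('questions', []))
        )
        return [
            {'step': i + 1, 'sub_question': q} for i, q in enumerate(steps)
        ]
    except (json.JSONDecodeError, TypeError, AttributeError):
        # AttributeError covers JSON scalars such as 42, which have no .get()
        return [{'step': 1, 'sub_question': original_question}]


question = 'Original question'
as_object = parse_plan('{"steps": ["a", "b"]}', question)
as_list = parse_plan('["a", "b"]', question)
chatty = parse_plan('Here is the plan: {"steps": ["a"]}', question)
scalar = parse_plan('42', question)
empty = parse_plan('{}', question)
```

Note that a JSON object without a `steps` or `questions` key yields an empty list rather than the fallback, so a downstream prompt would contain no sub-questions in that case.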
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Step 3: execute the plan by answering each sub-question, then synthesize
@pxt.udf
def format_plan_for_execution(
    plan_steps: list[dict], original_question: str
) -> str:
    """Format the plan steps into a single execution prompt."""
    step_list = '\n'.join(
        f'{s["step"]}. {s["sub_question"]}' for s in plan_steps
    )
    return (
        f'Answer each of these research sub-questions briefly, '
        f'then provide a final synthesis that answers the original question.\n\n'
        f'Original question: {original_question}\n\n'
        f'Sub-questions:\n{step_list}'
    )


planner.add_computed_column(
    exec_prompt=format_plan_for_execution(
        planner.plan_steps, planner.question
    )
)
planner.add_computed_column(
    exec_response=openai.chat_completions(
        messages=[{'role': 'user', 'content': planner.exec_prompt}],
        model='gpt-4o-mini',
    )
)
planner.add_computed_column(
    final_answer=planner.exec_response.choices[0].message.content.astype(
        pxt.String
    )
)
```
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
planner.insert(
[
{
'question': 'What are the economic and environmental trade-offs of electric vehicles vs hydrogen fuel cells?'
}
]
)
row = planner.select(
planner.question, planner.plan_text, planner.final_answer
).collect()
print('Plan:', row['plan_text'][0])
print()
print('Answer:', row['final_answer'][0][:500])
```
The plan (stored in `plan_steps`) is fully inspectable. The execution
step answers all sub-questions in a single LLM call, but this could also
use parallelization (Pattern 3) to answer each sub-question
independently and merge the results. Planning and ReAct compose
naturally with any of the six architectural patterns.
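The fan-out alternative (answer each sub-question independently, then merge) can be sketched in plain Python. This is not the Pixeltable implementation of Pattern 3; it only illustrates the shape of the computation. `answer_sub_question` is a hypothetical stub where a real LLM call would go, and the sample plan is made up.

```python
from concurrent.futures import ThreadPoolExecutor


def answer_sub_question(sub_question: str) -> str:
    """Hypothetical stub; replace with a real LLM call."""
    return f'(answer to: {sub_question})'


def answer_plan_in_parallel(plan_steps: list[dict]) -> str:
    """Answer each sub-question independently, then merge in plan order."""
    questions = [s['sub_question'] for s in plan_steps]
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map() preserves input order, so answers line up with plan steps
        answers = list(pool.map(answer_sub_question, questions))
    return '\n'.join(
        f'{s["step"]}. {s["sub_question"]}\n   {a}'
        for s, a in zip(plan_steps, answers)
    )


plan = [
    {'step': 1, 'sub_question': 'Cost of EVs?'},
    {'step': 2, 'sub_question': 'Cost of hydrogen?'},
]
merged = answer_plan_in_parallel(plan)
print(merged)
```

The merged text could then be passed to a final synthesis call, in the same way `exec_prompt` feeds `exec_response` in the single-call version.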
## Choosing a Pattern
### Six architectural patterns
### Two cross-cutting reasoning strategies
Patterns compose naturally. An orchestrator-worker system might use
routing in the orchestrator, tool use within a worker, and ReAct
reasoning inside the tool-calling loop. Because each pattern is just a
set of computed columns on a table, combining them requires no special
glue code.
## See Also
**Pixeltable cookbooks:**
* [Use tool calling with
LLMs](/howto/cookbooks/agents/llm-tool-calling)
— deep dive into `pxt.tools()`, `invoke_tools()`, and MCP server
integration
* [Build an agent with persistent
memory](/howto/cookbooks/agents/pattern-agent-memory)
— embedding indexes for semantic memory recall
* [Build a RAG
pipeline](/howto/cookbooks/agents/pattern-rag-pipeline)
— document chunking, embedding, and retrieval-augmented generation
* [Look up structured data with retrieval
UDFs](/howto/cookbooks/agents/pattern-data-lookup)
— `pxt.retrieval_udf()` for key-based lookups
* [Use a table pipeline as a reusable
function](/howto/cookbooks/agents/pattern-table-as-udf)
— `pxt.udf(table)` explained in depth
**Pixelagent examples** (imperative implementations of the same
patterns):
* [Reflection
loop](https://github.com/pixeltable/pixelagent/tree/main/examples/reflection)
— main agent + critic agent with iterative refinement
* [ReAct /
Planning](https://github.com/pixeltable/pixelagent/tree/main/examples/planning)
— step-by-step reasoning with tool calls
* [Tool
calling](https://github.com/pixeltable/pixelagent/tree/main/examples/tool-calling)
— OpenAI, Anthropic, and Bedrock tool integration
* [Memory](https://github.com/pixeltable/pixelagent/tree/main/examples/memory)
— persistent and semantic memory management
**External references:**
* [OpenAI’s Practical Guide to Building
Agents](https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf)
— the six architectural patterns
* [Anthropic: How we built our multi-agent research
system](https://www.anthropic.com/engineering/multi-agent-research-system)
— orchestrator-worker at scale
* [Pydantic AI: Multi-agent
applications](https://ai.pydantic.dev/multi-agent-applications/#agent-delegation)
— agent delegation patterns
# Use tool calling and MCP servers with LLMs
Source: https://docs.pixeltable.com/howto/cookbooks/agents/llm-tool-calling
Connect LLMs to Pixeltable UDFs, queries, and external MCP servers so agents can search data, call APIs, and execute typed Python tools.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Enable LLMs to call functions and tools, then execute the results
automatically.
## Problem
You want an LLM to decide which functions to call based on user
queries—for agents, chatbots, or automated workflows.
## Solution
**What’s in this recipe:**
* Define tools as Python functions
* Let LLMs decide which tool to call
* Automatically execute tool calls with `invoke_tools`
* Use MCP servers to load external tools
You define tools as Pixeltable UDFs, bundle them with `pxt.tools()`,
pass the bundle to the LLM, and use `invoke_tools` to execute the
function calls the model requests.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai mcp
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import openai
# Create a fresh directory
pxt.drop_dir('tools_demo', force=True)
pxt.create_dir('tools_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'tools\_demo'.
### Define tools as UDFs
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Define tool functions as Pixeltable UDFs
@pxt.udf
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # In production, call a real weather API
    weather_data = {
        'new york': 'Sunny, 72°F',
        'london': 'Cloudy, 58°F',
        'tokyo': 'Rainy, 65°F',
        'paris': 'Partly cloudy, 68°F',
    }
    return weather_data.get(
        city.lower(), f'Weather data not available for {city}'
    )


@pxt.udf
def get_stock_price(symbol: str) -> str:
    """Get the current stock price for a symbol."""
    # In production, call a real stock API
    prices = {
        'AAPL': '$178.50',
        'GOOGL': '$141.25',
        'MSFT': '$378.90',
        'AMZN': '$185.30',
    }
    return prices.get(symbol.upper(), f'Price not available for {symbol}')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a Tools object with our functions
tools = pxt.tools(get_weather, get_stock_price)
```
### Create tool-calling pipeline
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create table for queries
queries = pxt.create_table('tools_demo/queries', {'query': pxt.String})
```
Created table 'queries'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add LLM call with tools
queries.add_computed_column(
response=openai.chat_completions(
messages=[{'role': 'user', 'content': queries.query}],
model='gpt-4o-mini',
tools=tools, # Pass tools to the LLM
)
)
```
Added 0 column values with 0 errors in 0.00 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Automatically execute tool calls and get results
queries.add_computed_column(
tool_results=openai.invoke_tools(tools, queries.response)
)
```
Added 0 column values with 0 errors in 0.01 s
No rows affected.
### Run tool-enabled queries
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert queries that require tool calls
sample_queries = [
{'query': "What's the weather in Tokyo?"},
{'query': "What's the stock price of Apple?"},
{
'query': "What's the weather in Paris and the price of Microsoft stock?"
},
]
queries.insert(sample_queries)
```
Inserted 3 rows with 0 errors in 4.16 s (0.72 rows/s)
3 rows inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View results
queries.select(queries.query, queries.tool_results).collect()
```
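Conceptually, executing tool calls means reading the `tool_calls` the model returned, looking up each function by name, and calling it with the JSON-decoded arguments. The sketch below illustrates that idea in plain Python; it is not Pixeltable's `invoke_tools` implementation. The response dict follows the OpenAI chat-completions layout, `dispatch_tool_calls` and the registry are hypothetical names, and the result shape (tool name mapped to a list of outputs) is an assumption inferred from how tool results are indexed elsewhere in this guide.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> str:
    """Toy tool with canned data, like the recipe's UDF."""
    weather_data = {'tokyo': 'Rainy, 65°F', 'paris': 'Partly cloudy, 68°F'}
    return weather_data.get(
        city.lower(), f'Weather data not available for {city}'
    )


def dispatch_tool_calls(
    response: dict, registry: dict[str, Callable[..., Any]]
) -> dict[str, list]:
    """Call each requested tool and group the outputs by tool name."""
    results: dict[str, list] = {}
    message = response['choices'][0]['message']
    for call in message.get('tool_calls') or []:
        name = call['function']['name']
        if name not in registry:
            continue  # ignore tools that were never registered
        # The model sends arguments as a JSON-encoded string
        kwargs = json.loads(call['function']['arguments'])
        results.setdefault(name, []).append(registry[name](**kwargs))
    return results


# Hand-written response in the chat-completions shape (not real model output)
response = {
    'choices': [
        {
            'message': {
                'content': None,
                'tool_calls': [
                    {
                        'id': 'call_1',
                        'type': 'function',
                        'function': {
                            'name': 'get_weather',
                            'arguments': '{"city": "Tokyo"}',
                        },
                    }
                ],
            }
        }
    ]
}
tool_results = dispatch_tool_calls(response, {'get_weather': get_weather})
print(tool_results)
```

Running the dispatch as a computed column, as the recipe does, means this step happens automatically for every inserted row and its output is stored alongside the query.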
## Using MCP Servers as Tools
The [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) is
an open protocol that standardizes how applications provide context to
LLMs. Pixeltable can connect to MCP servers and use their exposed tools
as UDFs.
### Why MCP?
### Create an MCP Server
First, create an MCP server with tools you want to expose. Save this as
`mcp_server.py`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from mcp.server.fastmcp import FastMCP

mcp = FastMCP('PixeltableDemo', stateless_http=True)


@mcp.tool()
def calculate_discount(price: float, discount_percent: float) -> float:
    """Calculate the discounted price."""
    return price * (1 - discount_percent / 100)


@mcp.tool()
def check_inventory(product_id: str) -> str:
    """Check inventory status for a product."""
    # In production, query your inventory database
    inventory = {
        'SKU001': 'In stock (42 units)',
        'SKU002': 'Low stock (3 units)',
        'SKU003': 'Out of stock',
    }
    return inventory.get(product_id, f'Unknown product: {product_id}')


if __name__ == '__main__':
    mcp.run(transport='streamable-http')
```
Run the server with `python mcp_server.py`; it listens on
`http://localhost:8000/mcp`.
### Connect to MCP Server and Use Tools
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Connect to an MCP server using pxt.mcp_udfs()
# This creates a Pixeltable UDF for each tool exposed by the server
# (this cell uses the hosted Pixeltable docs server; to use the local
# server from the previous step, pass 'http://localhost:8000/mcp' instead)
# See: https://docs.pixeltable.com/platform/custom-functions#5-mcp-udfs
mcp_tools = pxt.mcp_udfs('https://docs.pixeltable.com/mcp')
# View available tools - each is now a callable Pixeltable function
for tool in mcp_tools:
    print(f'- {tool.name}: {tool.comment()}')
```
- SearchPixeltableDocumentation: Search across the Pixeltable Documentation knowledge base to find relevant information, code examples, API references, and guides. Use this tool when you need to answer questions about Pixeltable Documentation, find specific documentation, understand how features work, or locate implementation details. The search returns contextual content with titles and direct links to the documentation pages.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Bundle MCP tools for LLM use
mcp_toolset = pxt.tools(*mcp_tools)
# Create a table with MCP tool-calling pipeline
mcp_queries = pxt.create_table(
'tools_demo/mcp_queries', {'query': pxt.String}
)
# Add LLM call with MCP tools
mcp_queries.add_computed_column(
response=openai.chat_completions(
messages=[{'role': 'user', 'content': mcp_queries.query}],
model='gpt-4o-mini',
tools=mcp_toolset,
)
)
# Execute MCP tool calls
mcp_queries.add_computed_column(
tool_results=openai.invoke_tools(mcp_toolset, mcp_queries.response)
)
# View the schema - note that mcp_toolset is stored as persistent metadata
# Every subsequent insert will use these same tools automatically
mcp_queries.describe()
```
Created table 'mcp\_queries'.
Added 0 column values with 0 errors in 0.00 s
Added 0 column values with 0 errors in 0.01 s
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Test with documentation queries
mcp_queries.insert(
[
{'query': 'What is Pixeltable?'},
{'query': 'How to use OpenAI in Pixeltable?'},
]
)
mcp_queries.select(mcp_queries.query, mcp_queries.tool_results).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract the search result with a named column
mcp_queries.select(
search_result=mcp_queries.tool_results[
'SearchPixeltableDocumentation'
][0]
).collect()
```
## Explanation
**Tool calling flow:**
MCP Server → pxt.mcp\_udfs() → pxt.tools() → LLM tool calling
MCP servers expose tools via a standardized protocol. Pixeltable’s
`mcp_udfs()` connects to any MCP server and returns the tools as
callable UDFs that can be bundled with `pxt.tools()` for LLM use.
**Supported providers:**
## See also
* [Build a RAG
pipeline](/howto/cookbooks/agents/pattern-rag-pipeline) -
Retrieval-augmented generation
* [Run local
LLMs](/howto/providers/working-with-ollama) -
Local model inference
* [Multimodal MCP Servers](/libraries/mcp) -
Pixeltable’s MCP server collection
* [Custom
Functions](/platform/custom-functions) -
More about UDFs and MCP integration
# Build an agent with memory
Source: https://docs.pixeltable.com/howto/cookbooks/agents/pattern-agent-memory
Give LLM agents persistent short-term and long-term memory in Pixeltable using conversation tables, summarization, and embedding-based recall.
Create an AI agent that remembers important information across
conversations.
## Problem
You want to build an AI agent that can store and recall important
information—user preferences, key facts, or context from previous
conversations.
## Solution
**What’s in this recipe:**
* Store memories with embeddings for semantic search
* Retrieve relevant memories based on conversation context
* Use `@pxt.query` for retrieval functions
This pattern is inspired by
[Pixelbot](https://github.com/pixeltable/pixelbot) and
[Pixelmemory](https://github.com/pixeltable/pixelmemory).
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
from datetime import datetime

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.openai import chat_completions, embeddings
# Create a fresh directory
pxt.drop_dir('agent_demo', force=True)
pxt.create_dir('agent_demo')
```
Created directory 'agent\_demo'.
### Create memory bank
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create memory bank table
memories = pxt.create_table(
'agent_demo/memories',
{
'content': pxt.String, # The memory content
'category': pxt.String, # Optional category (preference, fact, etc.)
'created_at': pxt.Timestamp, # When the memory was stored
},
)
```
Created table 'memories'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add embedding index for semantic search on content
memories.add_embedding_index(
column='content',
string_embed=embeddings.using(model='text-embedding-3-small'),
)
```
### Define retrieval function
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Define a query function to retrieve relevant memories
@pxt.query
def recall_memories(context: str, top_k: int = 3):
    """Retrieve memories relevant to the current context."""
    sim = memories.content.similarity(string=context)
    return (
        memories.where(sim > 0.5)
        .order_by(sim, asc=False)
        .limit(top_k)
        .select(content=memories.content, category=memories.category)
    )
```
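The query combines a similarity threshold, a descending sort, and a row limit. The plain-Python sketch below shows that three-step selection on toy data. It assumes cosine similarity and hand-made 2-D vectors purely for illustration; the metric actually used depends on how the embedding index is configured, and `top_k_above_threshold` is a hypothetical helper.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_above_threshold(
    query_vec: list[float],
    rows: list[dict],
    threshold: float = 0.5,
    top_k: int = 3,
) -> list[str]:
    scored = [(cosine(query_vec, r['vec']), r) for r in rows]
    kept = [(s, r) for s, r in scored if s > threshold]  # where(sim > 0.5)
    kept.sort(key=lambda pair: pair[0], reverse=True)  # order_by(sim, asc=False)
    return [r['content'] for _, r in kept[:top_k]]  # limit(top_k)


# Toy "memory bank" with made-up embeddings
rows = [
    {'content': 'close match', 'vec': [1.0, 0.1]},
    {'content': 'partial match', 'vec': [1.0, 1.0]},
    {'content': 'unrelated', 'vec': [0.0, 1.0]},
]
hits = top_k_above_threshold([1.0, 0.0], rows)
print(hits)
```

Raising the threshold trades recall for precision: with `threshold=0.8`, only the closest row would survive in this toy data.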
### Store some memories
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Store some initial memories
initial_memories = [
{
'content': 'User prefers Python for data analysis',
'category': 'preference',
'created_at': datetime.now(),
},
{
'content': 'The project deadline is March 15, 2024',
'category': 'fact',
'created_at': datetime.now(),
},
{
'content': 'User works at a startup in San Francisco',
'category': 'fact',
'created_at': datetime.now(),
},
{
'content': 'Budget for the ML project is $50,000',
'category': 'fact',
'created_at': datetime.now(),
},
{
'content': 'User prefers concise explanations over detailed ones',
'category': 'preference',
'created_at': datetime.now(),
},
]
memories.insert(initial_memories)
```
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Build prompt with memories
@pxt.udf
def build_memory_prompt(
    user_message: str, relevant_memories: list[dict]
) -> str:
    memory_text = '\n'.join(
        [f'- {m["content"]}' for m in relevant_memories]
    )
    return f"""You are a helpful assistant with access to the following memories about the user:
{memory_text}
Use these memories to personalize your response when relevant.
User: {user_message}
Assistant:"""


# Assumes `conversations` already has `user_message` and
# `relevant_memories` columns
conversations.add_computed_column(
    prompt=build_memory_prompt(
        conversations.user_message, conversations.relevant_memories
    )
)
```
Added 0 column values with 0 errors.
No rows affected.
Added 0 column values with 0 errors.
Added 0 column values with 0 errors.
No rows affected.
### Chat with memory-aware agent
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Test the memory-aware agent
test_messages = [
{
'user_message': 'What programming language should I use for this project?'
},
{'user_message': 'When do I need to finish this?'},
{'user_message': 'How much can I spend on cloud resources?'},
]
conversations.insert(test_messages)
```
User Message → Retrieve Memories → Build Prompt → LLM Response
↓
Memory Bank (with embeddings)
**Key components:**
**Adding new memories:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
memories.insert([{
'content': 'New information to remember',
'category': 'fact',
'created_at': datetime.now()
}])
```
## See also
* [Build a RAG
pipeline](/howto/cookbooks/agents/pattern-rag-pipeline) -
Document retrieval
* [Use tool
calling](/howto/cookbooks/agents/llm-tool-calling) -
Function calling with LLMs
* [Pixelbot](https://github.com/pixeltable/pixelbot) - Full agent
implementation
# Look up structured data with retrieval UDFs
Source: https://docs.pixeltable.com/howto/cookbooks/agents/pattern-data-lookup
Expose Pixeltable tables as retrieval UDFs so LLM agents can look up structured rows by key, run filters, and return typed results as tools.
Create lookup functions that query tables by key—for customer records,
product catalogs, or financial data.
## Problem
You have structured data—customer records, product catalogs, financial
data—and need to look up rows by key values. Common scenarios:
## Solution
**What’s in this recipe:**
* Create lookup functions from tables with `retrieval_udf`
* Query by single or multiple keys
* Use lookups in computed columns for data enrichment
Use `pxt.retrieval_udf(table)` to automatically create a function that
queries the table by its columns.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Create a fresh directory
pxt.drop_dir('lookup_demo', force=True)
pxt.create_dir('lookup_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'lookup\_demo'.
Created table 'products'.
Inserting rows into \`products\`: 5 rows \[00:00, 502.31 rows/s]
Inserted 5 rows with 0 errors.
### Create a lookup function with retrieval\_udf
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a lookup function that searches by SKU
get_product = pxt.retrieval_udf(
products,
name='get_product',
description='Look up a product by its SKU code',
parameters=['sku'], # Only use SKU as the lookup key
limit=1, # Return at most 1 result
)
# Check the function signature
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Look up a product by SKU
result = products.select(get_product(sku='LAPTOP-001')).limit(1).collect()
```
### Look up by category (multiple results)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a category lookup (returns multiple products)
get_by_category = pxt.retrieval_udf(
products,
name='get_by_category',
description='Get all products in a category',
parameters=['category'],
limit=10, # Return up to 10 products
)
# Find all electronics
products.select(get_by_category(category='electronics')).limit(
1
).collect()
```
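As a mental model, a key-based lookup of this kind behaves like an equality filter over the chosen columns followed by a row limit. The sketch below is a plain-Python analogy, not how `retrieval_udf` is implemented; `make_lookup` is a hypothetical helper, and the rows are toy data (the `furniture` category in particular is made up).

```python
from typing import Any, Callable


def make_lookup(
    rows: list[dict], parameters: list[str], limit: int = 10
) -> Callable[..., list[dict]]:
    """Return a function that filters rows by equality on the given columns."""

    def lookup(**keys: Any) -> list[dict]:
        unknown = set(keys) - set(parameters)
        if unknown:
            raise TypeError(f'unexpected lookup keys: {sorted(unknown)}')
        matches = [
            r for r in rows if all(r.get(k) == v for k, v in keys.items())
        ]
        return matches[:limit]  # cap the number of returned rows

    return lookup


toy_products = [
    {'sku': 'LAPTOP-001', 'category': 'electronics'},
    {'sku': 'PHONE-001', 'category': 'electronics'},
    {'sku': 'CHAIR-001', 'category': 'furniture'},
]

by_sku = make_lookup(toy_products, ['sku'], limit=1)
by_category = make_lookup(toy_products, ['category'], limit=10)

one = by_sku(sku='LAPTOP-001')
many = by_category(category='electronics')
```

Restricting `parameters` to the key columns keeps the function signature small, which matters when the lookup is later exposed to an LLM as a tool.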
### Use lookups for data enrichment
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create an orders table
orders = pxt.create_table(
'lookup_demo/orders',
{
'order_id': pxt.String,
'product_sku': pxt.String,
'quantity': pxt.Int,
},
)
orders.insert(
[
{
'order_id': 'ORD-001',
'product_sku': 'LAPTOP-001',
'quantity': 2,
},
{
'order_id': 'ORD-002',
'product_sku': 'PHONE-001',
'quantity': 1,
},
{
'order_id': 'ORD-003',
'product_sku': 'CHAIR-001',
'quantity': 4,
},
]
)
```
Created table 'orders'.
Inserting rows into \`orders\`: 3 rows \[00:00, 1186.28 rows/s]
Inserted 3 rows with 0 errors.
3 rows inserted, 6 values computed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add a computed column that enriches orders with product details
orders.add_computed_column(
product_info=get_product(sku=orders.product_sku)
)
# View enriched orders
orders.select(
orders.order_id,
orders.product_sku,
orders.quantity,
orders.product_info,
).collect()
```
Added 3 column values with 0 errors.
## Explanation
**`retrieval_udf` parameters:**
**Use cases:**
**Tips:**
* Use `limit=1` for unique key lookups
* Specify only needed columns in `parameters` for cleaner APIs
* Add descriptions for LLM tool integration
## See also
* [Use tool calling with
LLMs](/howto/cookbooks/agents/llm-tool-calling) -
Use retrieval UDFs as LLM tools
* [Build a RAG
pipeline](/howto/cookbooks/agents/pattern-rag-pipeline) -
Semantic search with `@pxt.query`
# Build a RAG pipeline
Source: https://docs.pixeltable.com/howto/cookbooks/agents/pattern-rag-pipeline
Build a complete RAG pipeline in Pixeltable with document ingestion, chunking, embeddings, semantic search, and LLM answer generation.
Create a retrieval-augmented generation system that answers questions
using your documents as context.
## Problem
You want an LLM to answer questions using your specific documents—not
just its training data. You need to retrieve relevant context and
include it in the prompt.
## Solution
**What’s in this recipe:**
* Embed and index documents for retrieval
* Create a query function that retrieves context
* Generate answers grounded in your documents
You build a pipeline that: (1) embeds documents, (2) finds relevant
chunks for a query, and (3) generates an answer using those chunks as
context.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.openai import chat_completions, embeddings
# Create a fresh directory
pxt.drop_dir('rag_demo', force=True)
pxt.create_dir('rag_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'rag\_demo'.
### Step 1: create document store with embeddings
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create table for document chunks
chunks = pxt.create_table(
'rag_demo/chunks', {'doc_id': pxt.String, 'chunk_text': pxt.String}
)
```
Created table 'chunks'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add embedding index for semantic search
chunks.add_embedding_index(
column='chunk_text',
string_embed=embeddings.using(model='text-embedding-3-small'),
)
```
### Step 2: load documents
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Sample knowledge base (in production, load from files/database)
documents = [
{
'doc_id': 'password-reset',
'chunk_text': 'To reset your password, go to the login page and click "Forgot Password". Enter your email address and you will receive a reset link within 5 minutes. The link expires after 24 hours.',
},
{
'doc_id': 'password-reset',
'chunk_text': 'Password requirements: minimum 8 characters, at least one uppercase letter, one number, and one special character. Passwords expire every 90 days for security.',
},
{
'doc_id': 'account-settings',
'chunk_text': 'To update your profile, navigate to Settings > Account. You can change your display name, email address, and notification preferences. Changes take effect immediately.',
},
{
'doc_id': 'billing',
'chunk_text': 'Billing occurs on the first of each month. You can view invoices under Settings > Billing. To change your payment method, click "Update Payment" and enter your new card details.',
},
{
'doc_id': 'api-access',
'chunk_text': 'API keys can be generated in Settings > Developer. Each key has configurable permissions. Rate limits are 1000 requests per minute for standard plans, 10000 for enterprise.',
},
]
chunks.insert(documents)
```
### Step 3: create the RAG query function
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Define a query function that retrieves context
@pxt.query
def retrieve_context(query: str, top_k: int = 3):
    """Retrieve the most relevant chunks for a query."""
    sim = chunks.chunk_text.similarity(string=query)
    return (
        chunks.where(sim > 0.5)
        .order_by(sim, asc=False)
        .limit(top_k)
        .select(doc_id=chunks.doc_id, text=chunks.chunk_text)
    )
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View retrieved context for a query
query = 'What are the key features?'
context_chunks = retrieve_context(query)
context_chunks
```
retrieve\_context('What are the key features?')
### Step 4: generate answers with context
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table for questions/answers
qa = pxt.create_table('rag_demo/qa', {'question': pxt.String})
```
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Build the RAG prompt
@pxt.udf
def build_rag_prompt(question: str, context: list[dict]) -> str:
    context_text = '\n\n'.join(
        [f'[{c["doc_id"]}]: {c["text"]}' for c in context]
    )
    return f"""Answer the question based only on the provided context. If the context doesn't contain the answer, say "I don't have information about that."
Context:
{context_text}
Question: {question}
Answer:"""


# Assumes `qa` already has a `context` column holding the retrieved chunks
qa.add_computed_column(prompt=build_rag_prompt(qa.question, qa.context))
```
Added 0 column values with 0 errors.
No rows affected.
Added 0 column values with 0 errors.
Added 0 column values with 0 errors.
No rows affected.
### Ask questions
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert questions
questions = [
{'question': 'How do I reset my password?'},
{'question': 'What are the API rate limits?'},
{'question': 'When am I billed?'},
]
qa.insert(questions)
```
Question → Embed → Retrieve similar chunks → Build prompt with context → Generate answer
**Key components:**
**Scaling tips:**
* Use the `doc-chunk-for-rag` recipe to split long documents
* Adjust `top_k` to balance context size vs. relevance
* Consider metadata filtering for large knowledge bases
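To make the first tip concrete, here is a minimal sketch of splitting a long document into overlapping word windows before inserting the pieces as `chunk_text` rows. It is a generic illustration, not the method used by the `doc-chunk-for-rag` recipe; `chunk_words` and its parameters are hypothetical.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` words sharing `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError('need chunk_size > 0 and 0 <= overlap < chunk_size')
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(' '.join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = ' '.join(f'w{i}' for i in range(12))
pieces = chunk_words(sample, chunk_size=5, overlap=2)
for piece in pieces:
    print(piece)
```

Smaller chunks make each retrieved row more focused, so a given `top_k` covers less text; larger chunks do the opposite. The two settings are best tuned together.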
## See also
* [Chunk documents for
RAG](/howto/cookbooks/text/doc-chunk-for-rag) -
Split documents into chunks
* [Create text
embeddings](/howto/cookbooks/search/embed-text-openai) -
Embedding fundamentals
* [Semantic text
search](/howto/cookbooks/search/search-semantic-text) -
Search patterns
# Use a table pipeline as a reusable function
Source: https://docs.pixeltable.com/howto/cookbooks/agents/pattern-table-as-udf
Wrap an entire Pixeltable table pipeline as a reusable function that agents and other tables can call with typed inputs and outputs.
Convert a table with computed columns into a callable function for
multi-agent workflows and pipeline composition.
## Problem
You have a table that runs a complex pipeline—LLM calls, tool use,
post-processing—and you want to reuse that entire pipeline from other
tables. Copy-pasting computed column definitions is error-prone and hard
to maintain.
## Solution
**What’s in this recipe:**
* Create an “agent” table with computed columns
* Convert the table to a callable UDF with
`pxt.udf(table, return_value=...)`
* Use the table UDF in other tables’ computed columns
You wrap an entire table pipeline as a function. When you call this
function from another table, it inserts a row into the agent table, runs
all computed columns, and returns the specified output column.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.openai import chat_completions
# Create a fresh directory
pxt.drop_dir('table_udf_demo', force=True)
pxt.create_dir('table_udf_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'table\_udf\_demo'.
### Create an agent table with computed columns
You create a table that encapsulates a complete pipeline. This example
builds a summarization agent:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create the agent table with input column
summarizer = pxt.create_table(
    'table_udf_demo/summarizer', {'text': pxt.String}
)
```
Created table 'summarizer'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add the LLM call as a computed column
summarizer.add_computed_column(
    response=chat_completions(
        messages=[
            {
                'role': 'user',
                'content': 'Summarize this in one sentence:\n\n'
                + summarizer.text,
            }
        ],
        model='gpt-4o-mini',
    )
)
```
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract the summary text
summarizer.add_computed_column(
    summary=summarizer.response.choices[0].message.content
)
```
Added 0 column values with 0 errors.
No rows affected.
### Convert the table to a UDF
You use `pxt.udf(table, return_value=...)` to convert the table into a
callable function. The `return_value` specifies which column to return:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Convert the summarizer table into a callable UDF
summarize = pxt.udf(summarizer, return_value=summarizer.summary)
```
### Use the table UDF in another table
You can now use `summarize()` as a computed column in any other table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table that uses the summarizer
articles = pxt.create_table(
    'table_udf_demo/articles',
    {'title': pxt.String, 'content': pxt.String},
)
```
Created table 'articles'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add the table UDF as a computed column
articles.add_computed_column(summary=summarize(text=articles.content))
```
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert articles - summaries are generated automatically
articles.insert(
    [
        {
            'title': 'Climate Report',
            'content': 'Global temperatures rose by 1.2 degrees Celsius above pre-industrial levels last year, marking the hottest year on record. Scientists attribute this to continued greenhouse gas emissions and a strong El Nino pattern. The report calls for immediate action to reduce carbon emissions.',
        },
        {
            'title': 'Tech Merger',
            'content': 'Two major semiconductor companies announced a merger valued at $50 billion. The combined entity will control 30% of the global chip market. Regulators in multiple countries will review the deal over the next 18 months.',
        },
    ]
)
```
Consumer table row → Table UDF called → Agent table inserts row →
Computed columns run → Return value extracted → Consumer gets result
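The flow above can be emulated in plain Python to make the mechanics concrete. This is only a sketch of the pattern, not Pixeltable's implementation: `MiniTable`, `table_as_function`, and `summarize_sketch` are hypothetical names, and a first-sentence truncation stands in for the LLM call.

```python
from typing import Any, Callable


class MiniTable:
    """Toy stand-in for a table with computed columns (not Pixeltable's API)."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []
        self.computed: list[tuple[str, Callable[[dict[str, Any]], Any]]] = []

    def add_computed_column(
        self, name: str, fn: Callable[[dict[str, Any]], Any]
    ) -> None:
        self.computed.append((name, fn))

    def insert(self, row: dict[str, Any]) -> dict[str, Any]:
        stored = dict(row)
        # Computed columns run in definition order, so later ones can read earlier ones.
        for name, fn in self.computed:
            stored[name] = fn(stored)
        self.rows.append(stored)
        return stored


def table_as_function(table: MiniTable, return_value: str) -> Callable[..., Any]:
    """Wrap the table: each call inserts one row and returns one column."""

    def call(**inputs: Any) -> Any:
        return table.insert(inputs)[return_value]

    return call


agent = MiniTable()
# Stand-in for the LLM call (assumption): keep only the first sentence.
agent.add_computed_column(
    'response', lambda r: {'content': r['text'].split('. ')[0] + '.'}
)
agent.add_computed_column('summary', lambda r: r['response']['content'])

summarize_sketch = table_as_function(agent, return_value='summary')
result = summarize_sketch(text='Chips merged. Regulators will review.')
print(result)
print(agent.rows[0]['response'])  # The intermediate value stays in the agent table
```

The caller only sees the returned column, while the agent table keeps every intermediate value, which is what makes the pattern useful for debugging.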
**When to use table UDFs vs `@pxt.query`:**
**Key benefits:**
* **Encapsulation**: Hide complex pipeline details behind a simple
function call
* **Reusability**: Use the same agent from multiple consumer tables
* **Persistence**: All intermediate results are stored in the agent
table for debugging
* **Composition**: Chain agents together for multi-stage workflows
## See also
* [Look up structured
data](/howto/cookbooks/agents/pattern-data-lookup) -
Simple key-based lookups with `retrieval_udf`
* [Build a RAG
pipeline](/howto/cookbooks/agents/pattern-rag-pipeline) -
Retrieval with `@pxt.query`
* [Use tool calling with
LLMs](/howto/cookbooks/agents/llm-tool-calling) -
Add tools to agent tables
# Extract audio from video
Source: https://docs.pixeltable.com/howto/cookbooks/audio/audio-extract-from-video
Extract audio tracks from video files in Pixeltable using FFmpeg-backed UDFs for transcription, music analysis, and audio ML pipelines.
Extract the audio track from video files for transcription, analysis, or
processing.
## Problem
You have video files but need to work with just the audio track—for
transcription, speaker analysis, or audio processing. Extracting audio
manually with ffmpeg is tedious and doesn’t integrate with your data
pipeline.
## Solution
**What’s in this recipe:**
* Extract audio from video as a computed column
* Choose audio format (mp3, wav, flac)
* Chain with transcription for automatic video-to-text
You use the `extract_audio` function to create an audio column from
video. This integrates seamlessly with transcription and other audio
processing.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable boto3 'numpy<2.4'
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.video import extract_audio
# Create a fresh directory
pxt.drop_dir('audio_extract_demo', force=True)
pxt.create_dir('audio_extract_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'audio\_extract\_demo'.
### Extract audio from video
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create table for videos
videos = pxt.create_table(
    'audio_extract_demo/videos', {'title': pxt.String, 'video': pxt.Video}
)
```
Created table 'videos'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add computed column to extract audio as MP3
videos.add_computed_column(
    audio=extract_audio(videos.video, format='mp3')
)
```
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert a sample video (from multimedia-commons with audio)
video_url = 's3://multimedia-commons/data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4'
videos.insert([{'title': 'Sample Video', 'video': video_url}])
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View results
videos.select(videos.title, videos.audio).collect()
```
### Chain with transcription
Add transcription as a follow-up computed column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Install whisper for transcription
%pip install -qU openai-whisper
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import whisper
# Add transcription of the extracted audio
videos.add_computed_column(
    transcription=whisper.transcribe(videos.audio, model='base.en')
)
```
Added 1 column value with 0 errors.
1 row updated, 1 value computed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract the transcript text
videos.add_computed_column(transcript=videos.transcription.text)
```
Added 1 column value with 0 errors.
1 row updated, 1 value computed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View the full pipeline results
videos.select(videos.title, videos.transcript).collect()
```
## Explanation
**Audio format options:**
**Pipeline flow:**
Video → extract\_audio → Audio → whisper.transcribe → Transcript
Each step is a computed column. When you insert a new video:
1. Audio is extracted automatically
2. Whisper transcribes the audio
3. All results are cached for future queries
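The three steps above amount to "compute once on insert, read stored values afterwards". The plain-Python sketch below illustrates that idea under simplifying assumptions: `extract_audio_stub` and `transcribe_stub` are hypothetical stand-ins for the real functions, and a list of dicts stands in for the table.

```python
calls = {'extract_audio': 0, 'transcribe': 0}


def extract_audio_stub(video: str) -> str:
    """Hypothetical stand-in for audio extraction: swaps the file extension."""
    calls['extract_audio'] += 1
    return video.rsplit('.', 1)[0] + '.mp3'


def transcribe_stub(audio: str) -> str:
    """Hypothetical stand-in for transcription."""
    calls['transcribe'] += 1
    return f'transcript of {audio}'


# Column name -> function of the row, listed in dependency order.
pipeline = [
    ('audio', lambda row: extract_audio_stub(row['video'])),
    ('transcript', lambda row: transcribe_stub(row['audio'])),
]

stored_rows = []


def insert(row: dict) -> None:
    """Run every pipeline step once and store the results with the row."""
    row = dict(row)
    for name, fn in pipeline:
        row[name] = fn(row)
    stored_rows.append(row)


insert({'title': 'Sample Video', 'video': 'sample.mp4'})

# Reading results twice touches only stored values; nothing is recomputed.
first = [r['transcript'] for r in stored_rows]
second = [r['transcript'] for r in stored_rows]
print(first, calls)
```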
## See also
* [Transcribe
audio](/howto/cookbooks/audio/audio-transcribe) -
Audio-only transcription
* [Summarize
podcasts](/howto/cookbooks/audio/audio-summarize-podcast) -
Transcribe and summarize
* [Extract video
frames](/howto/cookbooks/video/video-extract-frames) -
Work with video frames
# Summarize podcasts and audio
Source: https://docs.pixeltable.com/howto/cookbooks/audio/audio-summarize-podcast
Build a podcast summarization pipeline in Pixeltable that transcribes audio with Whisper and generates structured summaries with an LLM.
Transcribe audio files and generate summaries automatically using
Whisper and LLMs.
## Problem
You have podcast episodes, meeting recordings, or interviews that need
both transcription and summarization. Doing this manually is
time-consuming and doesn’t scale.
## Solution
**What’s in this recipe:**
* Transcribe audio with Whisper (runs locally)
* Generate summaries with an LLM
* Chain transcription → summarization automatically
You create a pipeline where audio is transcribed first, then the
transcript is summarized. Both steps run automatically when you insert
new audio files.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai-whisper openai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import openai, whisper
# Create a fresh directory
pxt.drop_dir('podcast_demo', force=True)
pxt.create_dir('podcast_demo')
```
Created directory 'podcast\_demo'.
### Create the pipeline
Create a table with audio input, then add computed columns for
transcription and summarization:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create table for audio files
podcasts = pxt.create_table(
    'podcast_demo/episodes', {'title': pxt.String, 'audio': pxt.Audio}
)
```
Created table 'episodes'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Step 1: Transcribe with local Whisper (uses GPU if available)
podcasts.add_computed_column(
    transcription=whisper.transcribe(podcasts.audio, model='base.en')
)
```
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract the text from transcription result (cast to String for concatenation)
podcasts.add_computed_column(
    transcript_text=podcasts.transcription.text.astype(pxt.String)
)
```
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Step 2: Summarize the transcript with OpenAI
summary_prompt = (
    """Summarize this transcript in 2-3 sentences, then list 3 key points.
Transcript:
"""
    + podcasts.transcript_text
)
podcasts.add_computed_column(
    summary_response=openai.chat_completions(
        messages=[{'role': 'user', 'content': summary_prompt}],
        model='gpt-4o-mini',
    )
)
```
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract summary text from response
podcasts.add_computed_column(
    summary=podcasts.summary_response.choices[0].message.content
)
```
Added 0 column values with 0 errors.
No rows affected.
### Process audio files
Insert audio files and watch the pipeline run automatically:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert sample audio
audio_url = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/10-minute%20tour%20of%20Pixeltable.mp3'
podcasts.insert([{'title': 'Pixeltable Tour', 'audio': audio_url}])
```
Each step is a computed column that depends on the previous one. When
you insert a new audio file, all steps run automatically in sequence.
**Whisper model options:**
For production with varied audio quality, use `small.en` or larger.
## See also
* [Transcribe
audio](/howto/cookbooks/audio/audio-transcribe) -
Basic audio transcription
* [Summarize
text](/howto/cookbooks/text/text-summarize) -
Text summarization patterns
# Convert text to speech
Source: https://docs.pixeltable.com/howto/cookbooks/audio/audio-text-to-speech
Generate speech audio from text columns in Pixeltable using OpenAI, ElevenLabs, and other TTS providers via declarative computed columns.
Generate natural-sounding audio from text using OpenAI’s text-to-speech
models.
## Problem
You need to convert text content into spoken audio—for accessibility,
content repurposing, or voice applications.
## Solution
**What’s in this recipe:**
* Generate speech with OpenAI TTS
* Choose from multiple voice options
* Store text and audio together
You add a computed column that converts text to audio. The audio is
cached and only regenerated when the source text changes.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.openai import speech
# Create a fresh directory
pxt.drop_dir('tts_demo', force=True)
pxt.create_dir('tts_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'tts\_demo'.
Added 0 column values with 0 errors.
No rows affected.
### Generate audio
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert sample articles
sample_articles = [
    {
        'title': 'Welcome to AI',
        'content': 'Artificial intelligence is transforming how we work and live. From smart assistants to autonomous vehicles, AI is becoming part of our daily lives.',
    },
    {
        'title': 'Getting Started',
        'content': 'To begin your journey with machine learning, start by understanding the basics of data preparation and model training.',
    },
]
articles.insert(sample_articles)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View articles with generated audio
articles.select(
    articles.title, articles.content, articles.audio
).collect()
```
## Explanation
**OpenAI TTS models:**
**Voice options:**
**Tips:**
* Use `tts-1` for drafts and real-time applications
* Use `tts-1-hd` for final production audio
* Audio is cached—no regeneration on queries
## See also
* [Transcribe
audio](/howto/cookbooks/audio/audio-transcribe) -
Convert audio to text
* [Summarize
podcasts](/howto/cookbooks/audio/audio-summarize-podcast) -
Transcribe and summarize audio
# Transcribe audio files with Whisper
Source: https://docs.pixeltable.com/howto/cookbooks/audio/audio-transcribe
Transcribe audio and video files into searchable text columns in Pixeltable using Whisper, WhisperX, and other speech-to-text models.
Convert speech to text locally using OpenAI’s open-source Whisper
model—no API key needed.
## Problem
You have audio or video files that need transcription. Long files are
memory-intensive to process at once, so you need to split them into
manageable segments.
## Solution
**What’s in this recipe:**
* Transcribe audio files locally with Whisper (no API key)
* Automatically segment long files
* Extract and transcribe audio from videos
You create a view with `audio_splitter` to break long files into
segments, then add a computed column for transcription. Whisper runs
locally on your machine—no API calls needed.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai-whisper
```
### Load audio files
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import whisper
from pixeltable.functions.audio import audio_splitter
# Create a fresh directory
pxt.drop_dir('audio_demo', force=True)
pxt.create_dir('audio_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Converting metadata from version 45 to 46
Created directory 'audio\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert a sample audio file (video files also work - audio is extracted automatically)
audio.insert(
    [
        {
            'audio': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/audio-transcription-demo/Lex-Fridman-Podcast-430-Excerpt-0.mp4'
        }
    ]
)
```
Inserted 1 row with 0 errors in 1.05 s (0.95 rows/s)
1 row inserted.
### Split into segments
Create a view that splits audio into 30-second segments with overlap:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Split audio into segments for transcription
segments = pxt.create_view(
    'audio_demo/segments',
    audio,
    iterator=audio_splitter(
        audio.audio,
        duration=30.0,  # 30-second segments
        overlap=2.0,  # 2-second overlap for context
        min_segment_duration=5.0,  # Drop segments shorter than 5 seconds
    ),
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View the segments
segments.select(segments.segment_start, segments.segment_end).collect()
```
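To build intuition for how `duration`, `overlap`, and `min_segment_duration` interact, the sketch below computes segment boundaries in plain Python. The boundary rule is an assumption made for illustration (each new segment starts `overlap` seconds before the previous one ends, and a too-short final segment is dropped); it is not taken from the Pixeltable source, and `segment_bounds` and the 60-second clip length are hypothetical.

```python
def segment_bounds(total, duration, overlap, min_segment_duration):
    """Return (start, end) pairs in seconds under the assumed boundary rule."""
    if overlap >= duration:
        raise ValueError('overlap must be shorter than duration')
    bounds = []
    start = 0.0
    while start < total:
        end = min(start + duration, total)
        bounds.append((start, end))
        if end >= total:
            break
        # Assumed rule: step back by `overlap` so neighbours share some context.
        start = end - overlap
    # Assumed rule: drop a trailing segment that is too short to be useful.
    if bounds and bounds[-1][1] - bounds[-1][0] < min_segment_duration:
        bounds.pop()
    return bounds


# A hypothetical 60-second clip with the settings used in this recipe.
print(segment_bounds(60.0, duration=30.0, overlap=2.0, min_segment_duration=5.0))
```

Under these assumptions the trailing 4-second remainder falls below `min_segment_duration` and is discarded, leaving two segments.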
### Transcribe with Whisper
Add a computed column that transcribes each segment:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add transcription column (runs locally - no API key needed)
segments.add_computed_column(
    transcription=whisper.transcribe(
        audio=segments.audio_segment,
        model='base.en',  # Options: tiny.en, base.en, small.en, medium.en, large
    )
)
```
Added 2 column values with 0 errors in 3.35 s (0.60 rows/s)
2 rows updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract just the text
segments.add_computed_column(text=segments.transcription.text)
```
Added 2 column values with 0 errors in 0.06 s (31.82 rows/s)
2 rows updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View transcriptions with timestamps
segments.select(
    segments.segment_start, segments.segment_end, segments.text
).collect()
```
## Explanation
**Whisper models:**
Models ending in `.en` are English-only and faster. Remove `.en` for
multilingual support.
**audio\_splitter parameters:**
**Video files work too:**
When you insert a video file, Pixeltable automatically extracts the
audio track.
## See also
* [Iterators
documentation](/platform/iterators)
* [Whisper library](https://github.com/openai/whisper)
# Create custom aggregate functions (UDAs)
Source: https://docs.pixeltable.com/howto/cookbooks/core/custom-aggregates-uda
Define user-defined aggregate functions (UDAs) in Pixeltable to compute custom group-by statistics over rows with init, update, and value steps.
Build reusable aggregation logic for group-by queries and analytics.
## Problem
You need aggregations beyond the built-in `sum`, `count`, `mean`, `min`,
`max` — such as collecting values into a list, concatenating strings, or
computing custom statistics.
## Solution
**What’s in this recipe:**
* Define a UDA (User-Defined Aggregate) with the `@pxt.uda` decorator
* Use UDAs in `group_by` queries
* Create UDAs with multiple inputs
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
pxt.drop_dir('uda_demo', force=True)
pxt.create_dir('uda_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'uda\_demo'.
Created table 'sales'.
Inserted 6 rows with 0 errors.
### Variance UDA (not built-in)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# A UDA is a class that inherits from pxt.Aggregator
# It must implement: __init__, update, and value
@pxt.uda
class variance(pxt.Aggregator):
    """Compute population variance using Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # Sum of squared differences from mean

    def update(self, val: float) -> None:
        if val is not None:
            self.count += 1
            delta = val - self.mean
            self.mean += delta / self.count
            delta2 = val - self.mean
            self.m2 += delta * delta2

    def value(self) -> float:
        if self.count < 1:
            return 0.0
        return self.m2 / self.count  # Population variance
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Use like any built-in aggregate
sales.select(variance(sales.amount)).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Use in group_by queries
sales.group_by(sales.region).select(
    sales.region, amount_variance=variance(sales.amount)
).collect()
```
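As a sanity check on the update rule, the same Welford recurrence can be run outside Pixeltable and compared with the standard library. This is a minimal sketch; `welford_population_variance` is a hypothetical helper and the sample amounts are made up for illustration.

```python
import statistics


def welford_population_variance(values):
    """Single-pass population variance; None values are skipped."""
    count, mean, m2 = 0, 0.0, 0.0
    for val in values:
        if val is None:
            continue
        count += 1
        delta = val - mean
        mean += delta / count
        m2 += delta * (val - mean)  # Uses the mean *after* the update
    return m2 / count if count else 0.0


amounts = [100.0, 250.0, None, 175.0, 300.0]  # Made-up sample data
expected = statistics.pvariance([a for a in amounts if a is not None])
assert abs(welford_population_variance(amounts) - expected) < 1e-9
```

Because the recurrence keeps only `count`, `mean`, and `m2`, the aggregate never needs to hold all values in memory, which is why it suits the row-at-a-time `update()` interface.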
### String concatenation UDA
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.uda
class string_agg(pxt.Aggregator):
    """Concatenate strings with a comma separator."""

    def __init__(self):
        self.values = []

    def update(self, val: str) -> None:
        if val is not None:
            self.values.append(val)

    def value(self) -> str:
        return ', '.join(self.values)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# List all products sold in each region
sales.group_by(sales.region).select(
    sales.region, products=string_agg(sales.product)
).collect()
```
### Collect values into a list
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.uda
class collect_list(pxt.Aggregator):
    """Collect all values into a list."""

    def __init__(self):
        self.items = []

    def update(self, val: float) -> None:
        if val is not None:
            self.items.append(val)

    def value(self) -> list[float]:
        return self.items
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Get all amounts per region as a list
sales.group_by(sales.region).select(
    sales.region, amounts=collect_list(sales.amount)
).collect()
```
### Weighted average UDA
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.uda
class weighted_avg(pxt.Aggregator):
    """Compute weighted average: sum(value * weight) / sum(weight)."""

    def __init__(self):
        self.weighted_sum = 0.0
        self.weight_sum = 0.0

    def update(self, value: float, weight: float) -> None:
        if value is not None and weight is not None:
            self.weighted_sum += value * weight
            self.weight_sum += weight

    def value(self) -> float:
        if self.weight_sum == 0:
            return 0.0
        return self.weighted_sum / self.weight_sum
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Compute quantity-weighted average price per region
sales.group_by(sales.region).select(
    sales.region, avg_price=weighted_avg(sales.amount, sales.quantity)
).collect()
```
### Mode UDA (most frequent value)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from collections import Counter


@pxt.uda
class mode(pxt.Aggregator):
    """Find the most frequent value in a group."""

    def __init__(self):
        self.counts = Counter()

    def update(self, val: str) -> None:
        if val is not None:
            self.counts[val] += 1

    def value(self) -> str:
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Find most common product per region
sales.group_by(sales.region).select(
    sales.region, top_product=mode(sales.product)
).collect()
```
## Explanation
**UDA structure:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.uda
class my_aggregate(pxt.Aggregator):
    def __init__(self):  # Initialize state
        self.state = initial_value

    def update(self, val: InputType) -> None:  # Called for each row
        # Update internal state with val
        ...

    def value(self) -> OutputType:  # Called at the end
        return self.state
```
**Key points:**
* Always handle `None` values in `update()`
* Multiple parameters in `update()` enable multi-column aggregations
(like `weighted_avg`)
* Return type annotation on `value()` determines output column type
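
The init, update, value lifecycle can be traced by hand in plain Python. The sketch below is an emulation for illustration only, not how Pixeltable schedules aggregates internally: `WeightedAvg` mirrors the `weighted_avg` UDA without the decorator, `grouped_aggregate` is a hypothetical helper, and the rows are made up.

```python
from collections import defaultdict


class WeightedAvg:
    """Plain-Python mirror of the weighted_avg UDA (no Pixeltable involved)."""

    def __init__(self):
        self.weighted_sum = 0.0
        self.weight_sum = 0.0

    def update(self, value, weight):
        if value is not None and weight is not None:
            self.weighted_sum += value * weight
            self.weight_sum += weight

    def value(self):
        return self.weighted_sum / self.weight_sum if self.weight_sum else 0.0


def grouped_aggregate(rows, key, agg_cls, *cols):
    """One fresh aggregator per group; update per row; value at the end."""
    groups = defaultdict(agg_cls)
    for row in rows:
        groups[row[key]].update(*(row[c] for c in cols))
    return {k: agg.value() for k, agg in groups.items()}


rows = [  # Made-up sample data
    {'region': 'east', 'amount': 10.0, 'quantity': 1},
    {'region': 'east', 'amount': 20.0, 'quantity': 3},
    {'region': 'west', 'amount': 5.0, 'quantity': None},
]
result = grouped_aggregate(rows, 'region', WeightedAvg, 'amount', 'quantity')
print(result)
```

The `west` row shows why `update()` should guard against `None`: the row is skipped, the weight sum stays at zero, and `value()` falls back to `0.0` instead of dividing by zero.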
## See also
* [UDFs in Pixeltable](/platform/udfs-in-pixeltable) - Complete
  guide to custom functions
* [Join
tables](/howto/cookbooks/core/query-join-tables) -
Combine data before aggregating
# Custom Iterators
Source: https://docs.pixeltable.com/howto/cookbooks/core/custom-iterators
Build custom ComponentIterators in Pixeltable to split documents, videos, audio, or other media into rows for view-based processing.
An **iterator** in Pixeltable is a function that expands a single input
row into multiple output rows. Built-in Pixeltable iterators include
[frame\_iterator](/sdk/latest/video#iterator-frame_iterator),
which iterates over the frames of a video;
[tile\_iterator](/sdk/latest/image#iterator-tile_iterator),
which iterates over tiles of an image; and
[document\_splitter](/sdk/latest/document#iterator-document_splitter),
which iterates over chunks (such as sentences or pages) of a document.
These and other examples are discussed in the
[Iterators](/platform/iterators) platform
tutorial.
As with UDFs, Pixeltable provides a way for users to define their own
iterators from arbitrary Python code. Recall that custom UDFs are
created by decorating a Python function with the `@pxt.udf` decorator.
Similarly, custom iterators are created by decorating a Python generator
function with `@pxt.iterator`.
Custom iterators are a relatively advanced Pixeltable feature. This
guide will make the most sense if you’re already familiar with
Pixeltable’s built-in iterators, as well as the `pxt.udf` decorator. If
you haven’t encountered those concepts yet, it’s recommended to first
read the Iterators and UDFs tutorial sections.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
pxt.create_dir('iterators_demo', if_exists='replace_force')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'iterators\_demo'.
In this tutorial, we’ll be creating an iterator that takes an image as
input, and produces multiple images as output. The output images will be
variations of the input with different characteristics. To start, we’ll
create a base table to store our source images.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('iterators_demo/images', {'image': pxt.Image})
images = [
    'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000108.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000632.jpg',
]
t.insert({'image': image} for image in images)
t.head()
```
Now let’s define a custom iterator. Our iterator is going to turn each
image into `n` different grayscale images of varying brightness.
Creating a functioning iterator is as simple as defining a Python
generator function (a function that `yield`s its output) and then
decorating it with `@pxt.iterator`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from PIL.Image import Image
from PIL.ImageEnhance import Brightness
from typing import Iterator, TypedDict


class GrayscaleOutput(TypedDict):
    brightness: float
    grayscale_image: Image


@pxt.iterator
def grayscale_iterator(
    image: Image, *, n: int
) -> Iterator[GrayscaleOutput]:
    grayscale_image = image.convert('L')
    enhancer = Brightness(grayscale_image)
    for brightness in [0.5 * (i + 1) for i in range(n)]:
        enhanced_image = enhancer.enhance(brightness)
        yield {
            'grayscale_image': enhanced_image,
            'brightness': brightness,
        }
```
Notice that before defining our iterator, we first introduced a
`TypedDict` class describing the content of the iterator’s output.
Unlike UDFs, iterators can (and usually do) return multiple outputs.
They will *always* `yield` dictionaries, and you *must* annotate the
return type with a suitable `TypedDict`. This is how Pixeltable knows
what types to assign to the iterator’s output columns.
Defining a `TypedDict` for your iterator is not optional. Remember that
Pixeltable is a database system, and everything must be typed!
Now let’s see our iterator in action! We’ll create a view on top of the
`images` table and collect the results.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v = pxt.create_view(
    'iterators_demo/grayscale',
    t,
    iterator=grayscale_iterator(t.image, n=3),
)
v.head()
```
The iterator view has the columns `brightness` and `grayscale_image`,
which were defined in `GrayscaleOutput`. In addition, Pixeltable added a
third column `pos`. *Every* iterator will automatically output a `pos`
column, regardless of what shows up in the iterator’s `TypedDict`. The
`pos` column simply indicates the integer position of that row in the
original iteration order. If we look at the schema of our new view, we
can see that `pos` always has type `Int`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v
```
In addition, a column for the original input image is included for
reference. (Of course, the input image is *not* copied `n` times;
Pixeltable materializes it in the view by joining against the base
table.)
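
To see how one input row fans out into several output rows with a `pos` value each, here is a plain-Python emulation that needs neither Pixeltable nor PIL. It is a sketch under assumptions: `brightness_levels` and `expand` are hypothetical helpers, file names stand in for images, and `pos` is assumed to be zero-based.

```python
from typing import Iterator, TypedDict


class BrightnessRow(TypedDict):
    brightness: float


def brightness_levels(n: int) -> Iterator[BrightnessRow]:
    """Yield the same brightness factors as grayscale_iterator, minus the images."""
    for i in range(n):
        yield {'brightness': 0.5 * (i + 1)}


def expand(inputs: list[str], n: int) -> list[dict]:
    """Emulate an iterator view: every input yields n rows, each tagged with pos."""
    out = []
    for item in inputs:
        # pos restarts for every input row (assumed zero-based here).
        for pos, row in enumerate(brightness_levels(n)):
            out.append({'input': item, 'pos': pos, **row})
    return out


rows = expand(['a.jpg', 'b.jpg'], n=3)
for row in rows:
    print(row)
```

Two inputs with `n=3` produce six rows, and the pair of input and `pos` identifies each output row.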
## Parameterizing Iterators
Iterators often contain complex functionality; `document_splitter`, for
example, has 10 optional parameters to tune its behavior. Like UDFs,
iterators can involve any number of parameters. To illustrate this,
let’s add an optional `colorize` parameter to our iterator.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from PIL import ImageOps


@pxt.iterator
def grayscale_iterator(
    image: Image, *, n: int, colorize: str | None = None
) -> Iterator[GrayscaleOutput]:
    grayscale_image = image.convert('L')
    if colorize is not None:
        grayscale_image = ImageOps.colorize(
            grayscale_image, black='black', white=colorize
        )
    enhancer = Brightness(grayscale_image)
    for brightness in [0.5 * (i + 1) for i in range(n)]:
        enhanced_image = enhancer.enhance(brightness)
        yield {
            'grayscale_image': enhanced_image,
            'brightness': brightness,
        }
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v = pxt.create_view(
    'iterators_demo/grayscale',
    t,
    iterator=grayscale_iterator(t.image, n=3, colorize='red'),
    if_exists='replace',
)
v.head()
```
## Validation
Often it’s desirable to validate an iterator’s inputs as a sanity check.
Suppose we want to check that the `colorize` input is a valid PIL color
name. That’s already being done, in a sense: when `ImageOps.colorize` is
called in our iterator code, it will raise an exception if the color
name is not valid. The problem is that the iterator code isn’t executed
until our workflow actually runs. There’s nothing stopping us from
*instantiating* `grayscale_iterator` with broken inputs. To
appreciate this distinction, let’s set up an empty table
and define an invalid iterator view on it.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table(
    'iterators_demo/images',
    {'image': pxt.Image},
    if_exists='replace_force',
)
```
Created table 'images'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v = pxt.create_view(
    'iterators_demo/grayscale',
    t,
    iterator=grayscale_iterator(
        t.image, n=3, colorize='invalid_color_name'
    ),
)
```
The view gets created without any errors, because nothing has actually
run yet! Only when we go to insert data do we see an exception.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.insert({'image': image} for image in images)
```
ValueError: unknown color specifier: 'invalid\_color\_name'
It’s more useful to do *fail-fast validation*, in which the arguments
get checked at the time the iterator is first instantiated. This can be
done in Pixeltable by registering a validation function with the
iterator’s `validate` decorator (here, `@grayscale_iterator.validate`).
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from PIL import ImageColor


@grayscale_iterator.validate
def _(bound_args: dict):
    color = bound_args.get('colorize')
    if color is not None:
        try:
            ImageColor.getrgb(color)
        except ValueError as exc:
            raise ValueError(f'Invalid color name: {color}') from exc
```
Now if we try to create an invalid instance, we get an error right away.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table(
    'iterators_demo/images',
    {'input': pxt.Image},
    if_exists='replace_force',
)
v = pxt.create_view(
    'iterators_demo/grayscale',
    t,
    iterator=grayscale_iterator(t.input, n=3, colorize='invalid_color_name'),
)
```
ValueError: Invalid color name: invalid\_color\_name
The input to `validate()`, `bound_args`, is a dictionary that contains
all *constant* arguments for a particular instance of the iterator. In
the above example, it contains `colorize` (because it’s equal to the
constant value `'invalid_color_name'`), but not `image` (which depends
dynamically on the data in the `t.input` column).
`validate()` is actually called at two stages: once when the iterator is
instantiated, with just the constant arguments present in `bound_args`;
and again each time the iterator is evaluated on a row, this time with
*all* arguments present.
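The two validation stages can be illustrated with a small pure-Python model. This is only a sketch under stated assumptions, not Pixeltable’s internals: `ToyIterator`, `BoundCall`, `ColumnRef`, and `repeat` are hypothetical names invented for the example, and `ColumnRef` stands in for any argument whose value is only known per row.

```python
from typing import Any, Callable, Iterator, Optional


class ColumnRef:
    """Hypothetical placeholder for a value that is only known per row."""

    def __init__(self, name: str) -> None:
        self.name = name


class ToyIterator:
    """Toy model of fail-fast validation; not Pixeltable's implementation."""

    def __init__(self, fn: Callable[..., Iterator[Any]]) -> None:
        self.fn = fn
        self._validate: Optional[Callable[[dict], None]] = None

    def validate(self, validator: Callable[[dict], None]) -> Callable[[dict], None]:
        # Register the validator so both stages below can call it.
        self._validate = validator
        return validator

    def __call__(self, **kwargs: Any) -> 'BoundCall':
        # Stage 1: validate only the arguments that are already constants.
        literals = {
            k: v for k, v in kwargs.items() if not isinstance(v, ColumnRef)
        }
        if self._validate is not None:
            self._validate(literals)
        return BoundCall(self, kwargs)


class BoundCall:
    def __init__(self, it: ToyIterator, kwargs: dict) -> None:
        self.it = it
        self.kwargs = kwargs

    def evaluate(self, row: dict) -> list:
        # Stage 2: substitute row values, then validate *all* arguments.
        args = {
            k: row[v.name] if isinstance(v, ColumnRef) else v
            for k, v in self.kwargs.items()
        }
        if self.it._validate is not None:
            self.it._validate(args)
        return list(self.it.fn(**args))


@ToyIterator
def repeat(text: str, n: int) -> Iterator[str]:
    for _ in range(n):
        yield text


seen: list[set] = []  # records which argument names each validation call saw


@repeat.validate
def _(bound_args: dict) -> None:
    seen.append(set(bound_args))
    if bound_args.get('n', 0) < 0:
        raise ValueError('n must be non-negative')


call = repeat(text=ColumnRef('body'), n=2)  # validator sees only 'n'
result = call.evaluate({'body': 'hi'})      # validator sees 'text' and 'n'
```

Because the validator may run with only a subset of the arguments, it reads them with `bound_args.get(...)` rather than indexing, just as the `colorize` check above does.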
## Class-Based Iterators
For complex iterators that need to maintain a lot of state or provide
fine-grained control over their iteration mechanism, it can be
convenient to define a class rather than a generator function. This can
be done by writing a subclass of `PxtIterator` and decorating the class,
rather than decorating a function. Here’s what `grayscale_iterator`
looks like if written as a class; it is functionally identical to the
earlier implementation.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.iterator
class grayscale_iterator(pxt.PxtIterator[GrayscaleOutput]):
    # The parameters of __init__() determine the iterator arguments
    def __init__(
        self, image: Image, *, n: int, colorize: str | None = None
    ):
        self.image = image
        self.n = n
        self.colorize = colorize
        self.idx = 0
        grayscale_image = self.image.convert('L')
        if self.colorize is not None:
            grayscale_image = ImageOps.colorize(
                grayscale_image, black='black', white=self.colorize
            )
        self.enhancer = Brightness(grayscale_image)

    # Every class-based iterator *must* implement a __next__() method
    # whose return type is a `TypedDict`.
    def __next__(self) -> GrayscaleOutput:
        if self.idx >= self.n:
            raise StopIteration
        brightness = 0.5 * (self.idx + 1)
        enhanced_image = self.enhancer.enhance(brightness)
        self.idx += 1
        return {
            'grayscale_image': enhanced_image,
            'brightness': brightness,
        }

    # When defining a class-based iterator, validate() can optionally be specified
    # as a @classmethod rather than a standalone decorated function.
    @classmethod
    def validate(cls, bound_args: dict):
        color = bound_args.get('colorize')
        if color is not None:
            try:
                ImageColor.getrgb(color)
            except ValueError as exc:
                raise ValueError(f'Invalid color name: {color}') from exc
```
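The relationship between the generator form and the class form is the same one that holds for ordinary Python iterators. The following sketch uses plain Python only (no Pixeltable); `countdown_gen` and `CountdownIter` are hypothetical names chosen for illustration.

```python
from typing import Iterator


def countdown_gen(start: int) -> Iterator[int]:
    """Generator form: iteration state lives in local variables."""
    while start > 0:
        yield start
        start -= 1


class CountdownIter:
    """Class form: iteration state lives on the instance."""

    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self) -> 'CountdownIter':
        return self

    def __next__(self) -> int:
        # Raising StopIteration plays the role of the generator returning.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


from_generator = list(countdown_gen(3))
from_class = list(CountdownIter(3))
```

Both forms produce the same sequence; the class form simply makes the state (`current` here, `idx` in `grayscale_iterator`) explicit, which is what later makes it possible to reposition the iterator.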
## Unstored Columns
That’s all you need to know to implement fully functional iterators. But
sometimes, depending on the nature of the outputs, a little extra work
will help make them more performant.
In our example, every input image gets turned into `n` output images.
Moreover, recreating those output images doesn’t involve a lot of
computation: it’s just a simple color mask. If we store every output
image as a separate file, then when `n` is large we’ll be using up a lot
of storage without much benefit. Even at `n=3`, the outputs will consume
three times the storage of the inputs (maybe a little less since they’re
monochrome now, but you get the idea).
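As a back-of-the-envelope check, the storage cost can be estimated with a few lines of Python. The workload numbers are hypothetical, and the sketch assumes (for illustration only) that each stored output is about as large as the input it was derived from.

```python
def stored_output_bytes(num_inputs: int, avg_input_bytes: int, n: int) -> int:
    """Rough storage estimate if every iterator output were stored as a file.

    Assumes each output image is about as large as its input image.
    """
    return num_inputs * n * avg_input_bytes


# Hypothetical workload: 10,000 input images of roughly 500 KB each.
inputs_total = 10_000 * 500_000
outputs_total = stored_output_bytes(10_000, 500_000, n=3)
ratio = outputs_total / inputs_total
```

Under these assumptions the outputs take `n` times the space of the inputs, so the overhead grows linearly with `n`.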
Just as with computed columns, Pixeltable provides an option for
iterator outputs to be *unstored*, meaning the outputs won’t be saved
to disk, and they’ll instead be dynamically regenerated each time a
client queries them. Unstored columns don’t provide much benefit for
scalar columns (integers or strings, say), where the storage footprint
is small; or for expensive computations (such as generative model
outputs), where we actually *do* want to persist the output. But for
simple image operations, they can be a lifesaver.
In the Pixeltable library, `frame_iterator` and `tile_iterator` both use an
unstored column for the output images. In the case of `frame_iterator`,
the output is potentially *huge*, because video data is highly compressed
compared to individually stored frame images.
To mark an iterator output as unstored, use the `unstored_cols`
decorator parameter. Two requirements apply:
* If you use unstored columns, you *must* implement your iterator as a
class-based iterator; and
* You *must* implement a `seek()` method in your class, as in the
example below.
This is to ensure Pixeltable has efficient random access to the iterator
outputs, to facilitate downstream queries against the iterator view.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Mark `grayscale_image` as an unstored column.
@pxt.iterator(unstored_cols=['grayscale_image'])
class grayscale_iterator(pxt.PxtIterator[GrayscaleOutput]):
    def __init__(
        self, image: Image, *, n: int, colorize: str | None = None
    ):
        self.image = image
        self.n = n
        self.colorize = colorize
        self.idx = 0
        grayscale_image = self.image.convert('L')
        if self.colorize is not None:
            grayscale_image = ImageOps.colorize(
                grayscale_image, black='black', white=self.colorize
            )
        self.enhancer = Brightness(grayscale_image)

    def __next__(self) -> GrayscaleOutput:
        if self.idx >= self.n:
            raise StopIteration
        brightness = 0.5 * (self.idx + 1)
        enhanced_image = self.enhancer.enhance(brightness)
        self.idx += 1
        return {
            'grayscale_image': enhanced_image,
            'brightness': brightness,
        }

    # seek() will always receive the `pos` of the row being sought. It
    # will also receive the previously stored values of any *stored*
    # output columns in the target row, as keyword arguments.
    def seek(self, pos: int, **kwargs):
        assert 0 <= pos < self.n
        # 'brightness' is a stored column, so it should always be
        # present. We don't need it to implement seek(), but for
        # purposes of illustration let's check that it's here.
        assert 'brightness' in kwargs
        self.idx = pos  # Reset the iterator to the sought position.

    # When defining a class-based iterator, validate() can optionally
    # be a @classmethod rather than a standalone decorated function.
    @classmethod
    def validate(cls, bound_args: dict):
        color = bound_args.get('colorize')
        if color is not None:
            try:
                ImageColor.getrgb(color)
            except ValueError as exc:
                raise ValueError(f'Invalid color name: {color}') from exc
```
There it is: a complete, performant implementation of
`grayscale_iterator`. Let’s check one more time that it all works as
expected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table(
    'iterators_demo/images',
    {'image': pxt.Image},
    if_exists='replace_force',
)
t.insert({'image': image} for image in images)
```
Inserted 2 rows with 0 errors in 0.03 s (75.79 rows/s)
2 rows inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v = pxt.create_view(
    'iterators_demo/grayscale',
    t,
    iterator=grayscale_iterator(t.image, n=3),
)
v.head()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Check that we have random access to arbitrary rows in the view.
v.where(v.pos == 2).collect()
```
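The random-access query above depends on `seek()`. As a rough mental model, an unstored value can be regenerated by repositioning the iterator and pulling a single item. The sketch below is plain Python under that assumption; `Squares` and `regenerate` are hypothetical names, and the exact sequence of calls Pixeltable makes internally is not shown in this guide.

```python
class Squares:
    """Hypothetical iterator whose outputs are cheap to recompute."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.idx = 0

    def __iter__(self) -> 'Squares':
        return self

    def __next__(self) -> dict:
        if self.idx >= self.n:
            raise StopIteration
        out = {'value': self.idx ** 2}
        self.idx += 1
        return out

    def seek(self, pos: int, **kwargs) -> None:
        # Jump straight to `pos`; stored column values arrive in kwargs
        # but are not needed for this simple iterator.
        if not 0 <= pos < self.n:
            raise IndexError(pos)
        self.idx = pos


def regenerate(it: Squares, pos: int) -> dict:
    """Recompute the output at `pos` without replaying earlier positions."""
    it.seek(pos)
    return next(it)


it = Squares(5)
row_3 = regenerate(it, 3)
row_1 = regenerate(it, 1)
```

Because `seek()` can move both forward and backward, rows can be regenerated in any order, which is what a filter such as `v.pos == 2` needs.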
# Split data into multiple rows with iterators
Source: https://docs.pixeltable.com/howto/cookbooks/core/data-split-rows
Split a single row into many derived rows in Pixeltable using built-in component iterators for chunks, frames, video segments, and tiled data.
Transform a single document, video, image, or audio file into multiple
rows for granular processing.
**What’s in this recipe:**
* Split documents into text chunks for RAG
* Extract frames or segments from videos
* Tile images for high-resolution analysis
* Chunk audio files for transcription
## Problem
You have documents, videos, or text that you need to break into smaller
pieces for processing. A PDF needs to be split into chunks for
retrieval-augmented generation. A video needs individual frames for
analysis. Text needs to be divided into sentences or sliding windows.
You need a way to transform one source row into multiple output rows
automatically.
## Solution
You create views with iterator functions that split source data into
multiple rows. Pixeltable provides built-in iterators for documents,
videos, images, audio, and strings.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable spacy tiktoken click
!python -m spacy download en_core_web_sm -q
```
### Split documents into chunks
Use `document_splitter` to break documents (PDF, HTML, Markdown, TXT)
into text chunks.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.document import document_splitter
pxt.drop_dir('split_demo', force=True)
pxt.create_dir('split_demo')
docs = pxt.create_table('split_demo/docs', {'doc': pxt.Document})
docs.insert(
[
{
'doc': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/rag-demo/Jefferson-Amazon.pdf'
}
]
)
```
Inserted 1 row with 0 errors in 0.13 s (7.68 rows/s)
1 row inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
chunks = pxt.create_view(
'split_demo/doc_chunks',
docs,
iterator=document_splitter(
docs.doc, separators='sentence,token_limit', limit=300
),
)
chunks.select(chunks.text).limit(3).collect()
```
**Available separators:**
* `heading` — Split on HTML/Markdown headings
* `sentence` — Split on sentence boundaries (requires spacy)
* `token_limit` — Split by token count (requires tiktoken)
* `char_limit` — Split by character count
* `page` — Split by page (PDF only)
[SDK Reference:
document\_splitter](/sdk/latest/document)
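To make the combined `sentence,token_limit` setting concrete, the sketch below packs whole sentences into chunks under a size limit. It is an illustration only, not Pixeltable's implementation: the `pack_sentences` helper is hypothetical, and whitespace-separated words stand in for the tokens that tiktoken would count.

```python
def pack_sentences(sentences: list[str], limit: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `limit` words."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and current_len + n_words > limit:
            chunks.append(' '.join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(' '.join(current))
    return chunks


sentences = ['One two three.', 'Four five.', 'Six seven eight nine.']
chunks = pack_sentences(sentences, limit=5)
```

A sentence longer than the limit is kept whole in this sketch; how `document_splitter` treats that case is not covered here.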
### Extract frames from videos
Use `frame_iterator` to extract frames at specified intervals.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.video import frame_iterator
videos = pxt.create_table('split_demo/videos', {'video': pxt.Video})
videos.insert(
[
{
'video': 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/bangkok.mp4'
}
]
)
```
Inserted 1 row with 0 errors in 1.28 s (0.78 rows/s)
1 row inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
frames = pxt.create_view(
'split_demo/frames',
videos,
iterator=frame_iterator(videos.video, fps=1.0),
)
frames.select(frames.frame, frames.frame_attrs).limit(3).collect()
```
**frame\_iterator options:**
* `fps` — Frames per second to extract
* `num_frames` — Extract exact number of frames (evenly spaced)
* `keyframes_only` — Extract only keyframes
[SDK Reference:
frame\_iterator](/sdk/latest/video)
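The difference between `fps` and `num_frames` can be pictured as two ways of choosing timestamps. The following is a rough sketch with hypothetical helper names; it assumes sampling starts at time 0 and does not claim to reproduce the exact frames Pixeltable selects.

```python
def times_for_fps(duration: float, fps: float) -> list[float]:
    """Timestamps (seconds) when sampling at a fixed rate."""
    count = int(duration * fps)
    return [i / fps for i in range(count)]


def times_for_num_frames(duration: float, num_frames: int) -> list[float]:
    """Timestamps (seconds) for a fixed number of evenly spaced samples."""
    return [duration * i / num_frames for i in range(num_frames)]


# A 10-second clip: a fixed rate scales with length, a fixed count does not.
by_rate = times_for_fps(10.0, fps=1.0)
by_count = times_for_num_frames(10.0, num_frames=4)
```

With a fixed rate, longer videos produce more rows in the view; with a fixed count, every video produces the same number of rows.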
### Split videos into segments
Use `video_splitter` to divide videos into smaller clips.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.video import video_splitter
segments = pxt.create_view(
'split_demo/segments',
videos,
iterator=video_splitter(
videos.video, duration=5.0, min_segment_duration=1.0
),
)
segments.select(
segments.segment_start, segments.segment_end, segments.video_segment
).limit(3).collect()
```
**video\_splitter options:**
* `duration` — Duration of each segment in seconds
* `overlap` — Overlap between segments in seconds
* `min_segment_duration` — Drop last segment if shorter than this
[SDK Reference:
video\_splitter](/sdk/latest/video)
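How `duration`, `overlap`, and `min_segment_duration` interact can be sketched in plain Python. The `segment_bounds` helper is hypothetical and only approximates the behavior described above; it is not the library's code.

```python
def segment_bounds(
    total: float,
    duration: float,
    overlap: float = 0.0,
    min_segment_duration: float = 0.0,
) -> list[tuple[float, float]]:
    """Return (start, end) pairs, in seconds, covering `total` seconds."""
    if not 0 <= overlap < duration:
        raise ValueError('overlap must be non-negative and smaller than duration')
    step = duration - overlap  # each segment starts this far after the previous one
    bounds: list[tuple[float, float]] = []
    start = 0.0
    while start < total:
        end = min(start + duration, total)
        is_last = end >= total
        # Drop a trailing segment that is shorter than the minimum.
        if is_last and end - start < min_segment_duration:
            break
        bounds.append((start, end))
        if is_last:
            break
        start += step
    return bounds


# A 12-second clip cut into 5-second segments leaves a 2-second tail.
keep_tail = segment_bounds(12.0, duration=5.0, min_segment_duration=1.0)
drop_tail = segment_bounds(12.0, duration=5.0, min_segment_duration=3.0)
overlapped = segment_bounds(10.0, duration=5.0, overlap=1.0)
```

The same arithmetic applies to any time-based splitter that takes these three parameters.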
### Split strings into sentences
Use `string_splitter` to divide text into sentences.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.string import string_splitter
texts = pxt.create_table('split_demo/texts', {'content': pxt.String})
texts.insert(
[
{
'content': 'AI data infrastructure simplifies ML workflows. Declarative pipelines update incrementally. This makes development faster and more maintainable.'
}
]
)
```
Inserted 1 row with 0 errors in 0.03 s (38.38 rows/s)
1 row inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sentences = pxt.create_view(
'split_demo/sentences',
texts,
iterator=string_splitter(texts.content, separators='sentence'),
)
sentences.select(sentences.text).collect()
```
[SDK Reference:
string\_splitter](/sdk/latest/string)
### Tile images for analysis
Use `tile_iterator` to divide large images into a grid of smaller tiles.
This is useful for processing high-resolution images that are too large
to analyze at once, or for running object detection on different
regions.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.image import tile_iterator
images = pxt.create_table('split_demo/images', {'image': pxt.Image})
images.insert(
[
{
'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/pixeltable-logo-large.png'
}
]
)
```
Inserted 1 row with 0 errors in 0.09 s (11.69 rows/s)
1 row inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tiles = pxt.create_view(
'split_demo/tiles',
images,
iterator=tile_iterator(images.image, tile_size=(100, 100)),
)
```
**tile\_iterator options:**
* `tile_size` — Size of each tile as `(width, height)`
* `overlap` — Overlap between adjacent tiles as `(width, height)`
[SDK Reference:
tile\_iterator](/sdk/latest/image)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tiles.select(tiles.tile_coord, tiles.tile).sample(n=4).collect()
```
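The grid that `tile_size` and `overlap` imply can be sketched as a list of pixel boxes. This is a simplified model with hypothetical `tile_starts` and `tile_boxes` helpers; clipping edge tiles to the image border is an assumption made here, not documented `tile_iterator` behavior.

```python
def tile_starts(length: int, tile: int, step: int) -> list[int]:
    """Start offsets along one axis until the tiles reach the far edge."""
    starts = [0]
    while starts[-1] + tile < length:
        starts.append(starts[-1] + step)
    return starts


def tile_boxes(
    image_size: tuple[int, int],
    tile_size: tuple[int, int],
    overlap: tuple[int, int] = (0, 0),
) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) boxes in row-major order."""
    width, height = image_size
    tile_w, tile_h = tile_size
    step_x, step_y = tile_w - overlap[0], tile_h - overlap[1]
    if step_x <= 0 or step_y <= 0:
        raise ValueError('overlap must be smaller than tile_size')
    return [
        (left, top, min(left + tile_w, width), min(top + tile_h, height))
        for top in tile_starts(height, tile_h, step_y)
        for left in tile_starts(width, tile_w, step_x)
    ]


# A 250x100 image with 100x100 tiles yields three tiles in one row.
boxes = tile_boxes((250, 100), (100, 100))
overlapping = tile_boxes((180, 100), (100, 100), overlap=(20, 0))
```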
### Split audio into chunks
Use `audio_splitter` to divide audio files into time-based segments for
transcription or analysis.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.audio import audio_splitter
audio = pxt.create_table('split_demo/audio', {'audio': pxt.Audio})
audio.insert(
[
{
'audio': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/10-minute%20tour%20of%20Pixeltable.mp3'
}
]
)
```
Inserted 1 row with 0 errors in 0.67 s (1.50 rows/s)
1 row inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
audio_segments = pxt.create_view(
'split_demo/audio_chunks',
audio,
iterator=audio_splitter(audio.audio, duration=30.0, overlap=2.0),
)
audio_segments.select(
audio_segments.segment_start, audio_segments.segment_end
).limit(5).collect()
```
**audio\_splitter options:**
* `duration` — Duration of each chunk in seconds
* `overlap` — Overlap between chunks in seconds
* `min_segment_duration` — Drop last chunk if shorter than this
[SDK Reference:
audio\_splitter](/sdk/latest/audio)
## See also
* [Split documents for
RAG](/howto/cookbooks/text/doc-chunk-for-rag)
* [Extract frames from
videos](/howto/cookbooks/video/video-extract-frames)
* [Transcribe audio
files](/howto/cookbooks/audio/audio-transcribe)
# Get fast feedback on transformations
Source: https://docs.pixeltable.com/howto/cookbooks/core/dev-iterative-workflow
Develop Pixeltable pipelines iteratively with versioned tables, computed columns, and time-travel queries to refine logic without losing data.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
## Problem
You need to iterate on transformation logic before running it on your
entire dataset—especially for expensive operations like API calls or
model inference.
## Solution
**What’s in this recipe:**
* Test transformations on sample rows before applying to your full
dataset
* Save expressions as variables to guarantee consistent logic
* Apply the iterate-then-add workflow with built-in functions,
expressions, and custom UDFs
* Annotate columns with comments and custom metadata using `ColumnSpec`
You test transformation logic on sample rows before processing your
entire dataset using the iterate-then-add workflow. This lets you
validate logic on a few rows before committing to your full table.
You use `.select()` with `.collect()` to preview transformations—nothing
is stored in your table. If you want to collect only the first few rows,
use `.head(n)` instead of `.collect()`. Once you’re satisfied with the
results, use `.add_computed_column()` with the same expression to
persist the transformation across your full table.
This workflow applies to any data type in Pixeltable: images, videos,
audio files, documents, and structured tabular data. This recipe uses
text data and shows three examples:
1. Testing built-in functions on sample data
2. Saving expressions as variables to ensure consistency
3. Iterating with custom user-defined functions (UDFs)
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
### Create sample data
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Create a fresh directory (drop existing if present)
pxt.drop_dir('demo_project', force=True)
pxt.create_dir('demo_project')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'demo\_project'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('demo_project/lyrics', {'text': pxt.String})
```
Created table 'lyrics'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.insert(
[
{'text': 'Tumble out of bed and I stumble to the kitchen'},
{'text': 'Pour myself a cup of ambition'},
{'text': 'And yawn and stretch and try to come to life'},
{'text': "Jump in the shower and the blood starts pumpin'"},
{'text': "Out on the street, the traffic starts jumpin'"},
{'text': 'With folks like me on the job from nine to five'},
]
)
```
Inserted 6 rows with 0 errors in 0.01 s (916.65 rows/s)
6 rows inserted.
### Example 1: built-in functions
Iterate with built-in functions, then add to the table.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Test uppercase transformation on subset
t.select(t.text, uppercase=t.text.upper()).head(2)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Confirm the transformation was only in memory—table unchanged
t.head(2)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Apply to all rows (same expression)
t.add_computed_column(uppercase=t.text.upper())
```
Added 6 column values with 0 errors in 0.04 s (158.08 rows/s)
6 rows updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View text with uppercase column
t.collect()
```
### Example 2: save and reuse expressions
Save an expression as a variable to guarantee the same logic in both
iterate and add steps.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Define the expression once - no duplication
char_count_expr = t.text.len()
# Iterate: Test on subset
t.select(t.text, char_count=char_count_expr).head(2)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Confirm the transformation was only in memory—table unchanged
t.head(2)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add: Use the SAME expression to persist
t.add_computed_column(char_count=char_count_expr)
```
Added 6 column values with 0 errors in 0.02 s (348.64 rows/s)
6 rows updated.
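The same iterate-then-add pattern extends to custom UDFs, the third case listed in the overview. A minimal sketch, assuming a hypothetical `word_count` function: the logic is ordinary Python that you can check on its own, and the Pixeltable calls are shown as comments because they need the live `lyrics` table.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words in a string."""
    return len(text.split())


# Check the logic on a sample value before involving the table.
sample = 'Pour myself a cup of ambition'
sample_count = word_count(sample)

# With Pixeltable, decorate the function with @pxt.udf and follow the same steps:
#   t.select(t.text, word_count=word_count(t.text)).head(2)   # iterate
#   t.add_computed_column(word_count=word_count(t.text))      # add
```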
Added 6 column values with 0 errors in 0.02 s (312.11 rows/s)
6 rows updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View text with word_count column
t.collect()
```
### Example 4: annotate columns with metadata
Use `ColumnSpec` to attach a comment or custom metadata when adding
columns. Comments appear in `describe()` output, while `custom_metadata`
stores arbitrary data (tags, version info, config) that you can retrieve
with `get_metadata()`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.types import ColumnSpec
# Add a column with a comment and custom metadata
t.add_column(
source=ColumnSpec(
type=pxt.String,
comment='Original source URL or file path',
custom_metadata={'added_by': 'data_team', 'version': 2},
)
)
t.describe()
```
## Explanation
**How the iterate-then-add workflow works:**
Queries and computed columns serve different purposes. Queries let you
test transformations on sample rows without storing anything. Once
you’re satisfied with the results, you use the exact same expression
with `.add_computed_column()` to persist it across your entire table.
This workflow is especially valuable for expensive operations—API calls,
model inference, complex image processing—where you want to validate
logic before processing your full dataset. Test on 2-3 rows to catch
errors early, then commit once.
**To customize this workflow:**
* **Sample size**: Use `.head(n)` to collect only the first n
rows—`.head(1)` for single-row testing, `.head(10)` for broader
validation, or `.collect()` to collect all rows
* **Save expressions**: Store transformations as variables (Example 2)
to guarantee identical logic in both iterate and add steps
* **Chain transformations**: Test multiple operations
together—`.select(t.text.upper().split())` works just like single
operations
* **Use with any data type**: This pattern works with images, videos,
audio, documents—not just text. For multimodal data, visual inspection
during iteration is especially valuable
**The Pixeltable workflow:**
In traditional databases, `.select()` just picks which columns to view.
In Pixeltable, `.select()` also lets you compute new transformations on
the fly—define new columns without storing them. This makes `.select()`
perfect for testing transformations before you commit them.
When you use `.select()`, you’re creating a query. Queries are temporary
operations that retrieve and transform data from tables; they don’t store
anything. Queries use lazy evaluation: nothing executes until you call
`.collect()`, which runs the query and returns the results. `.head(n)` is
a convenience method that runs the query and returns only the first n
rows. Use `.head(n)` when iterating to get fast feedback without
processing your entire dataset.
Nothing is stored in your table when you run queries. You can test
different approaches quickly without affecting your data. You can store
query results in a Python variable to work with them in your session.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Store query results as a variable (in memory only)
results = t.select(
t.text,
uppercase=t.text.upper() # Label the transformed column
).head(3)
```
These results are stored in memory and will not persist across
sessions—only `.add_computed_column()` persists data to your table.
Once you’re satisfied, `.add_computed_column()` uses the same expression
but adds it as a persistent column in your table. Now the transformation
runs on all rows and results are stored permanently.
## See also
* [Transform images with PIL
operations](/howto/cookbooks/images/img-pil-transforms)
* [Convert RGB images to
grayscale](/howto/cookbooks/images/img-rgb-to-grayscale)
* [Apply filters to
images](/howto/cookbooks/images/img-apply-filters)
# Your Backend for Multimodal AI Applications
Source: https://docs.pixeltable.com/howto/cookbooks/core/multimodal_backend
Use Pixeltable as a unified backend for multimodal AI apps that store, transform, embed, and query images, video, audio, and documents.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
### Why Pixeltable
Every multimodal AI app needs the same five things: store media, run
models, index embeddings, serve endpoints, version everything. Most
teams glue together 5-8 services (Postgres + Pinecone + S3 + Airflow +
LangChain + …) and spend more time on infrastructure than on the
product.
**Pixeltable is a single system that handles all five.** One
`pip install`, one Python API, one place to store, transform, index,
retrieve, serve, version, observe, and debug.
**For developers and vibe coders:** Pixeltable’s declarative API means
AI assistants generate correct, production-grade code. No glue logic, no
orchestrator configs, no serialization code. Experimenting on multimodal
data (extract a frame, run a model, draw bounding boxes) is one
expression, not a pipeline. Install the [Pixeltable
Skill](https://github.com/pixeltable/pixeltable-skill) and prompt.
**For teams evaluating infrastructure:** Transaction integrity, async
execution, parallelization, caching, retries, and observability are
built in. One system to operate, monitor, and maintain. Schema changes
are one line. Model upgrades are zero-downtime.
Extensible via `@pxt.udf`, `@pxt.uda`, `@pxt.query`. [20+ AI
providers](/integrations/frameworks) built in.
[Skill](https://github.com/pixeltable/pixeltable-skill) | [MCP Server](https://github.com/pixeltable/mcp-server-pixeltable-developer) | [Starter Kit](https://github.com/pixeltable/pixeltable-starter-kit) | `llms.txt`: docs.pixeltable.com/llms.txt
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable google-genai 'fastapi[standard]'
%pip install -qU torch torchvision transformers # optional, for object detection
```
Note: you may need to restart the kernel to use updated packages.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import logging
import os
import warnings
warnings.filterwarnings('ignore')
logging.getLogger('asyncio').setLevel(logging.CRITICAL)
logging.getLogger('huggingface_hub').setLevel(logging.CRITICAL)
if (
'GEMINI_API_KEY' not in os.environ
and 'GOOGLE_API_KEY' not in os.environ
):
os.environ['GEMINI_API_KEY'] = getpass.getpass('Gemini API Key: ')
import pixeltable as pxt
from pixeltable.functions import gemini
BASE_URL = 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources'
```
## 1. Store: Multimodal Tables
Video, audio, images, and documents are first-class column types.
`pip install pixeltable` is all you need.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.uuid import uuid7
pxt.drop_dir('demo', force=True, if_not_exists='ignore')
pxt.create_dir('demo')
videos = pxt.create_table(
'demo/videos',
{'id': uuid7(), 'video': pxt.Video, 'title': pxt.String},
primary_key='id',
)
videos
```
Created directory 'demo'.
Created table 'videos'.
## 2. Orchestrate: AI as Computed Columns
Add a computed column; Pixeltable calls Gemini on every insert, caches
results, retries failures, keeps embeddings in sync.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
videos.add_computed_column(
response=gemini.generate_content(
[videos.video, 'Describe this video in detail.'],
model='gemini-3-flash-preview',
)
)
videos.add_computed_column(
description=videos.response.candidates[0]
.content.parts[0]
.text.astype(pxt.String)
)
videos.add_embedding_index(
'description',
embedding=gemini.embed_content.using(
model='gemini-embedding-2-preview'
),
)
```
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.03 s
## 3. Insert: One Call Triggers the Full Pipeline
`insert()` downloads videos, runs Gemini, extracts text, computes
embeddings. Open the **Dashboard** to watch in real time.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
videos.insert(
[
{
'video': f'{BASE_URL}/bangkok.mp4',
'title': 'Bangkok Street Tour',
},
{
'video': f'{BASE_URL}/The-Pursuit-of-Happiness-Video-Extract.mp4',
'title': 'The Pursuit of Happiness',
},
]
)
```
Inserted 2 rows with 0 errors in 22.85 s (0.09 rows/s)
2 rows inserted.
\{'rows': \[\{'title': 'The Pursuit of Happiness',
'description': 'In this clip from the movie "The Pursuit of Happyness," Chris Gardner (played by Will Smith) has just finished an interview for a competitive internship at a brokerage firm. Despite his disheveled appearance—wearing a grey work jacket and looking tired—he is approached by Jay Twistle, a senior manager at the firm.\n\n**Detailed Scene Breakdown:**\n\n\* **The Approach:** The scene opens with Chris looking down, appearing stressed or emotional. A voice calls out "Chris...", and he turns to see Mr. Twistle walking toward him with a wide, congratulatory smile. They are in a professional office lobby with people moving in the background and a reception desk visible.\n\* **The Interaction:** Mr. Twistle expresses his admiration, saying, "I don\'t know how you did it dressed as a garbage man, but you really pulled it off in there." This refers to Chris\'s impressive performance during the interview despite his unconventional attire (having come straight from a night in a jail cell due to unpaid parking tickets).\n\* **Building Rapport:** Chris politely thanks him, addressing him as "Mr. Twistle." In a sign of newfound respect and a positive result, Twistle insists, "Hey, now you can call me Jay. We\'ll talk to you soon." He gives Chris a friendly pat on the shoulder before walking away.\n\* **Gardner\'s Reaction:** Chris is left standing in the hallway, a look of immense relief and quiet triumph washing over his face. The scene highlights a pivotal moment where his intelligence and determination overcame his difficult circumstances.\n\nThe video features the "Binge Society" logo in the top left corner and copyright information at the bottom for Columbia Pictures Industries, Inc. and GH One LLC from 2006.',
'similarity': 0.4487778141613705},
\{'title': 'Bangkok Street Tour',
'description': "The video is a static, high-angle shot overlooking a busy multi-lane city street in what appears to be Bangkok, Thailand, indicated by the presence of tuk-tuks and brightly colored taxis. The scene captures the constant flow of traffic throughout the entire clip.\n\nIn the foreground on the left, a blue hatchback and a traditional three-wheeled tuk-tuk with a pink delivery bag on its back are either stationary or moving very slowly. Throughout the video, various vehicles, including white sedans, silver SUVs, motorcycles, and the city's signature pink and green-yellow taxis, navigate the lanes. \n\nThe road is divided by a narrow median with small green bushes. Traffic moves in both directions, with vehicles heading away from the camera and towards it. On the left side of the street, large multi-story buildings feature several prominent billboards, one of which displays a woman’s face. On the right, a row of trees lines the sidewalk, behind which several large, white-roofed structures with pink accents are visible. In the background, a pedestrian overpass crosses the busy road, and taller city buildings can be seen in the distance under a bright, overcast sky. The overall atmosphere is one of a typical, bustling urban afternoon.",
'similarity': 0.13178936719516832}]}
## 7. Version: Automatic History
Every insert, update, and delete is versioned. `history()` returns the
full changelog.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
videos.history()
```
## 8. Agents: Tool Calling and Memory as Computed Columns
An agent is just more computed columns. Define tools from `@pxt.query`
functions, wire tool calling and context assembly as a chain of columns,
and every insert triggers the full reasoning pipeline. Memory is a table
with an embedding index.
This pattern scales to production. See the [Starter
Kit](https://github.com/pixeltable/pixeltable-starter-kit) for a
complete implementation with documents, images, video, and cross-modal
search.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.anthropic import invoke_tools, messages
# 1. Define tools from existing @pxt.query functions
tools = pxt.tools(search_videos)
# 2. Memory: chat history with embedding index
chat = pxt.create_table(
'demo/chat',
{
'role': pxt.String,
'content': pxt.String,
'conversation_id': pxt.String,
},
if_exists='ignore',
)
chat.add_embedding_index(
'content',
string_embed=gemini.embed_content.using(
model='gemini-embedding-2-preview'
),
if_exists='ignore',
)
# 3. Agent pipeline: each step is a computed column
agent = pxt.create_table(
'demo/agent', {'prompt': pxt.String}, if_exists='ignore'
)
# Step 1: LLM decides which tools to call
agent.add_computed_column(
response=messages(
model='claude-sonnet-4-20250514',
messages=[{'role': 'user', 'content': agent.prompt}],
tools=tools,
tool_choice=tools.choice(required=True),
max_tokens=4096,
),
if_exists='ignore',
)
# Step 2: Pixeltable executes the tool calls (runs search_videos)
agent.add_computed_column(
tool_output=invoke_tools(tools, agent.response), if_exists='ignore'
)
# In production, add more steps: assemble context, call LLM again with results.
# See the Starter Kit for the full multi-step agent pipeline.
agent.describe()
```
Created table 'chat'.
Created table 'agent'.
Added 0 column values with 0 errors in 0.01 s
## Bonus: Cloud Storage (Optional)
Free managed bucket with Pixeltable Cloud. Set two config values;
computed media flows to cloud. See [Cloud
Services](/use-cases/services).
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# ~/.pixeltable/config.toml
[pixeltable]
api_key = "your-pixeltable-cloud-api-key"
output_media_dest = "pxtfs://yourorg:yourdb/home"
```
## Summary
### Links
* [10-Minute Tour](/overview/ten-minute-tour)
* [Starter Kit](https://github.com/pixeltable/pixeltable-starter-kit)
(FastAPI + React reference app)
* [Cookbooks](/howto/cookbooks) (50+ recipes)
* [Docs](/)
* [GitHub](https://github.com/pixeltable/pixeltable) (Apache 2.0)
# Join tables to combine data
Source: https://docs.pixeltable.com/howto/cookbooks/core/query-join-tables
Join multiple Pixeltable tables on shared keys to combine metadata, embeddings, and computed columns into unified, queryable result sets.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Combine data from multiple tables using inner, left, and cross joins.
## Problem
You have related data in separate tables and need to combine them for
analysis—customers with orders, products with inventory, or media with
metadata.
## Solution
**What’s in this recipe:**
* Inner join to match rows from both tables
* Left join to keep all rows from the first table
* Cross join for Cartesian product (all combinations)
* Join with filtering, aggregation, and saving results
* Paginate results with `limit()` and `offset`
Use `table1.join(table2, on=..., how=...)` to combine tables based on
matching columns.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
import pixeltable.functions as pxtf
# Create a fresh directory
pxt.drop_dir('join_demo', force=True)
pxt.create_dir('join_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'join\_demo'.
Created table 'orders'.
Inserted 4 rows with 0 errors in 0.01 s (657.81 rows/s)
### Inner join (matching rows only)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Inner join: only rows that match in both tables
customers.join(
orders, on=customers.customer_id == orders.customer_id, how='inner'
).select(customers.name, orders.product, orders.amount).collect()
```
### Left join (keep all from first table)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Left join: all customers, with order data where available
# Charlie has no orders, so product/amount will be null
customers.join(
orders, on=customers.customer_id == orders.customer_id, how='left'
).select(customers.name, orders.product, orders.amount).collect()
```
### Join with filtering
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Combine join with where clause to filter results
customers.join(
orders, on=customers.customer_id == orders.customer_id, how='inner'
).where(orders.amount > 50).select(
customers.name, customers.email, orders.product, orders.amount
).collect()
```
### Join with aggregation
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Join and aggregate: total spending per customer
customers.join(
orders, on=customers.customer_id == orders.customer_id, how='inner'
).group_by(customers.name).select(
customers.name,
total_spent=pxtf.sum(orders.amount),
order_count=pxtf.count(orders.order_id),
).collect()
```
### Cross join (all combinations)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Cross join: every customer paired with every product (no 'on' condition)
products = pxt.create_table(
'join_demo/products', {'product': pxt.String, 'price': pxt.Float}
)
products.insert(
[
{'product': 'Widget', 'price': 19.99},
{'product': 'Gadget', 'price': 29.99},
]
)
customers.join(products, how='cross').select(
customers.name, products.product, products.price
).collect()
```
Created table 'products'.
Inserted 2 rows with 0 errors in 0.00 s (422.52 rows/s)
### Save join results to a new table
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Build a join query and collect as DataFrame
customer_orders_df = (
customers.join(
orders,
on=customers.customer_id == orders.customer_id,
how='inner',
)
.select(
name=customers.name,
email=customers.email,
product=orders.product,
amount=orders.amount,
)
.collect()
.to_pandas()
)
customer_orders_df
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a new table from the DataFrame
orders_report = pxt.create_table(
'join_demo/orders_report', source=customer_orders_df
)
orders_report.collect()
```
Created table 'orders\_report'.
Inserted 3 rows with 0 errors in 0.01 s (500.32 rows/s)
### Paginate results with limit and offset
Use `limit(n, offset=k)` to retrieve results in pages. This is useful
for displaying results incrementally or building paginated APIs.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Page 1: first 2 rows
orders.order_by(orders.order_id).limit(2).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Page 2: next 2 rows (skip the first 2)
orders.order_by(orders.order_id).limit(2, offset=2).collect()
```
## Explanation
**Join types:**
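The three join types used in this recipe can be modeled with plain Python lists. The rows below are hypothetical and the comprehensions only mimic the semantics; they are not how Pixeltable executes joins.

```python
customers = [
    {'customer_id': 1, 'name': 'Alice'},
    {'customer_id': 2, 'name': 'Bob'},
    {'customer_id': 3, 'name': 'Charlie'},  # no orders
]
orders = [
    {'customer_id': 1, 'product': 'Laptop'},
    {'customer_id': 1, 'product': 'Mouse'},
    {'customer_id': 2, 'product': 'Keyboard'},
]

# Inner join: only pairs whose keys match.
inner = [
    (c['name'], o['product'])
    for c in customers
    for o in orders
    if c['customer_id'] == o['customer_id']
]

# Left join: every customer appears; unmatched ones get None.
left = []
for c in customers:
    matches = [o for o in orders if o['customer_id'] == c['customer_id']]
    if matches:
        left.extend((c['name'], o['product']) for o in matches)
    else:
        left.append((c['name'], None))

# Cross join: every customer paired with every order, no condition.
cross = [(c['name'], o['product']) for c in customers for o in orders]
```

The inner join returns 3 rows, the left join 4 (one with a null product), and the cross join 9.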
**Join syntax:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Simple: join on column by name
t1.join(t2, on=t1.id)
# Explicit predicate
t1.join(t2, on=t1.customer_id == t2.customer_id)
# Composite key
t1.join(t2, on=(t1.pk1 == t2.pk1) & (t1.pk2 == t2.pk2))
```
**Aggregation functions:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import sum, count, mean, min, max
# Use as functions, not methods
total=sum(t.amount)
num_rows=count(t.id)
```
**Saving join results:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Collect as DataFrame, then create table
df = query.select(name=t.col, ...).collect().to_pandas()
new_table = pxt.create_table('path', source=df)
```
**Pagination:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# limit(n) returns at most n rows
# limit(n, offset=k) skips the first k rows, then returns n
query.order_by(t.id).limit(10) # rows 0-9
query.order_by(t.id).limit(10, offset=10) # rows 10-19
```
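When building a paginated API, the page number has to be turned into a `limit` and `offset` pair. A minimal sketch of that arithmetic, using a hypothetical `page_window` helper and a plain list in place of an ordered query:

```python
def page_window(page: int, page_size: int) -> tuple[int, int]:
    """Return (limit, offset) for a 1-indexed page number."""
    if page < 1 or page_size < 1:
        raise ValueError('page and page_size must be positive')
    return page_size, (page - 1) * page_size


# Stand-in for an ordered result set of 25 rows.
rows = list(range(25))


def fetch_page(page: int, page_size: int) -> list[int]:
    """Slice the stand-in rows the way limit(n, offset=k) would."""
    limit, offset = page_window(page, page_size)
    return rows[offset:offset + limit]


second_page = fetch_page(2, 10)
last_page = fetch_page(3, 10)  # a final page may be shorter than page_size
```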
**Tips:**
* Use explicit predicates (`t1.col == t2.col`) for clarity
* Chain `.where()` after join to filter results
* Chain `.group_by()` for aggregations
* Use `'left'` join when the first table is your “main” table
* Use named columns in `.select(name=col)` for clean column names
* Always use `.order_by()` with pagination to get deterministic page
ordering
## See also
* [Look up structured
data](/howto/cookbooks/agents/pattern-data-lookup) -
Use retrieval UDFs for lookups
* [Sample data for
training](/howto/cookbooks/data/data-sampling) -
Sample from joined results
# Time Zones
Source: https://docs.pixeltable.com/howto/cookbooks/core/time-zones
Work with timezone-aware timestamps in Pixeltable: store UTC, convert between zones, and run accurate date queries across regions.
Because typical use cases involve datasets that span multiple time
zones, Pixeltable strives to be precise in how it handles time zone
arithmetic for datetimes.
Timestamps are always stored in the Pixeltable database in UTC, to
ensure consistency across datasets and deployments. Time zone
considerations therefore apply during insertion and retrieval of
timestamp data.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
### The default time zone
Every Pixeltable deployment has a **default time zone**. The default
time zone can be configured either by setting the `PIXELTABLE_TIME_ZONE`
environment variable, or by adding a `time-zone` entry to the
`[pixeltable]` section in `$PIXELTABLE_HOME/config.toml`. It must be a
valid [IANA Time
Zone](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones).
(See the [Pixeltable
Configuration](/platform/configuration) guide
for more details on configuration options.)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import os
os.environ['PIXELTABLE_TIME_ZONE'] = 'America/Los_Angeles'
```
If no time zone is configured, Pixeltable falls back on the system time
zone of the host on which it is running. **Because the system time zone
is deployment-dependent, production deployments should configure a
default time zone explicitly.**
As outlined in the [Python datetime
documentation](https://docs.python.org/3/library/datetime.html), a
Python `datetime` object may be either **naive** (no time zone) or
**aware** (equipped with an explicit time zone). Pixeltable will always
interpret naive `datetime` objects as belonging to the configured
default time zone.
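The naive/aware rule can be illustrated with the standard library alone. This is a sketch of the behavior described above, not Pixeltable's implementation: `to_utc` is a hypothetical helper, and fixed offsets stand in for the IANA zones (UTC-7 for `America/Los_Angeles` and UTC-4 for `America/New_York`, which hold in August).

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed offset stands in for the configured default time zone
DEFAULT_TZ = timezone(timedelta(hours=-7), 'PDT')


def to_utc(dt: datetime, default_tz: timezone = DEFAULT_TZ) -> datetime:
    """Interpret naive datetimes in default_tz, then normalize to UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=default_tz)  # naive: attach the default zone
    return dt.astimezone(timezone.utc)


naive = datetime(2024, 8, 9, 23, 0, 0)
aware = datetime(
    2024, 8, 9, 23, 0, 0, tzinfo=timezone(timedelta(hours=-4), 'EDT')
)

naive_utc = to_utc(naive)  # 23:00 at UTC-7 is 06:00 UTC the next day
aware_utc = to_utc(aware)  # 23:00 at UTC-4 is 03:00 UTC the next day
```

Real code would normally use `zoneinfo.ZoneInfo` rather than fixed offsets, so that daylight saving transitions are handled.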
### Insertion and retrieval
When a `datetime` is inserted into the database, it will be converted to
UTC and stored as an absolute timestamp. If the `datetime` has an
explicit time zone, Pixeltable will use that time zone for the
conversion; otherwise, Pixeltable will use the default time zone.
When a `datetime` is retrieved, it will always be retrieved in the
default time zone. To query in a different time zone, it is necessary to
do an explicit conversion; we’ll give an example of this in a moment.
Let’s first walk through a few examples that illustrate the default
behavior.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
pxt.drop_dir('tz_demo', force=True)
pxt.create_dir('tz_demo')
t = pxt.create_table(
    'tz_demo/example', {'dt': pxt.Timestamp, 'note': pxt.String}
)
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'tz\_demo'.
Created table 'example'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from datetime import datetime, timezone
from zoneinfo import ZoneInfo
naive_dt = datetime(2024, 8, 9, 23, 0, 0)
explicit_dt = datetime(
    2024, 8, 9, 23, 0, 0, tzinfo=ZoneInfo('America/Los_Angeles')
)
other_dt = datetime(
    2024, 8, 9, 23, 0, 0, tzinfo=ZoneInfo('America/New_York')
)
t.insert(
    [
        {'dt': naive_dt, 'note': 'No time zone specified (uses default)'},
        {
            'dt': explicit_dt,
            'note': 'Time zone America/Los_Angeles was specified explicitly',
        },
        {
            'dt': other_dt,
            'note': 'Time zone America/New_York was specified explicitly',
        },
    ]
)
```
Inserting rows into \`example\`: 0 rows \[00:00, ? rows/s]
Inserting rows into \`example\`: 3 rows \[00:00, 433.04 rows/s]
Inserted 3 rows with 0 errors.
3 rows inserted, 3 values computed.
On retrieval, all timestamps are normalized to the default time zone,
regardless of how they were specified during insertion.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.collect()
```
To represent timestamps in a different time zone, use the `astimezone`
method.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(
    t.dt, dt_new_york=t.dt.astimezone('America/New_York'), note=t.note
).collect()
```
### Timestamp methods and properties
The Pixeltable API exposes all the standard `datetime` methods and
properties from the Python library. Because retrieval uses the default
time zone, they are all relative to the default time zone unless
`astimezone` is used.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(
    t.dt,
    day_default=t.dt.day,
    day_eastern=t.dt.astimezone('America/New_York').day,
).collect()
```
Observe that the first two timestamps map to different dates depending
on the time zone, as expected.
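The date shift can be checked by hand with the standard library. A sketch assuming fixed offsets for August 2024 (UTC-7 for Los Angeles, UTC-4 for New York) in place of the IANA zones used above:

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7), 'PDT')  # stands in for America/Los_Angeles
EDT = timezone(timedelta(hours=-4), 'EDT')  # stands in for America/New_York

# The three inserted values, each with its effective time zone
inserted = [
    datetime(2024, 8, 9, 23, 0, tzinfo=PDT),  # naive, interpreted as default
    datetime(2024, 8, 9, 23, 0, tzinfo=PDT),  # explicit Los Angeles
    datetime(2024, 8, 9, 23, 0, tzinfo=EDT),  # explicit New York
]

day_default = [dt.astimezone(PDT).day for dt in inserted]
day_eastern = [dt.astimezone(EDT).day for dt in inserted]
```

Here `day_default` is `[9, 9, 9]` and `day_eastern` is `[10, 10, 9]`: 23:00 in Los Angeles is already 02:00 on the following day in New York.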
# Track changes and revert to previous versions
Source: https://docs.pixeltable.com/howto/cookbooks/core/version-control-history
Track every change to your Pixeltable tables, run time-travel queries against historical snapshots, and revert columns to previous versions.
Undo mistakes, audit changes, and create point-in-time snapshots of your
data.
## Problem
You need to track what changed in your data pipeline, undo accidental
modifications, or preserve a specific state for reproducibility.
## Solution
**What’s in this recipe:**
* View version history with `history()` and `get_versions()`
* Access specific versions with `pxt.get_table('table:N')`
* Undo changes with `revert()`
* Create point-in-time snapshots with `pxt.create_snapshot()`
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
pxt.drop_dir('version_demo', force=True)
pxt.create_dir('version_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'version\_demo'.
### Create a table and make some changes
Every data or schema change creates a new version.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create table (version 0)
products = pxt.create_table(
    'version_demo/products',
    {'name': pxt.String, 'price': pxt.Float, 'category': pxt.String},
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert more data (version 4)
products.insert(
    [{'name': 'Thingamajig', 'price': 49.99, 'category': 'Tools'}]
)
```
Inserting rows into \`products\`: 0 rows \[00:00, ? rows/s]
Inserting rows into \`products\`: 1 rows \[00:00, 661.46 rows/s]
Inserted 1 row with 0 errors.
1 row inserted, 3 values computed.
### View version history
Use `history()` for a human-readable summary of all changes.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View full history (most recent first)
products.history()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View only the last 3 versions
products.history(n=3)
```
### Programmatic access to version metadata
Use `get_versions()` to access version data programmatically.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Get version metadata as a list of dictionaries
versions = products.get_versions()
# Access specific version info
latest = versions[0]
latest['version'], latest['change_type'], latest['inserts']
```
(4, 'data', 1)
### Access a specific version
Use `pxt.get_table('table_name:version')` to get a read-only handle to a
specific version:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Get the table at version 1 (after initial insert, before computed column)
products_v1 = pxt.get_table('version_demo/products:1')
# This is a read-only view of the data at that point in time
products_v1.collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Compare data at version 2 (after computed column added) vs version 1
# Note: version 1 doesn't have the price_with_tax column yet
products_v2 = pxt.get_table('version_demo/products:2')
products_v2.collect()
```
### Revert to previous version
Use `revert()` to undo the most recent change. This is irreversible.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Current state: 4 products
products.count()
```
4
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Revert the last insert (removes Thingamajig)
products.revert()
products.count()
```
3
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# History now shows version 4 was reverted
products.history()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Can revert multiple times (back to before the update)
products.revert()
# Check the Widget price is back to original
products.where(products.name == 'Widget').select(
    products.name, products.price
).collect()
```
### Create point-in-time snapshots
Snapshots freeze a table’s state for reproducibility. Unlike `revert()`,
snapshots preserve the data indefinitely.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a snapshot of the current state
snapshot_v1 = pxt.create_snapshot('version_demo/products_v1', products)
snapshot_v1.collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Now make changes to the original table
products.insert(
    [{'name': 'Doohickey', 'price': 99.99, 'category': 'Premium'}]
)
products.update({'price': 29.99}, where=products.name == 'Gadget')
products.collect()
```
Inserting rows into \`products\`: 0 rows \[00:00, ? rows/s]
Inserting rows into \`products\`: 1 rows \[00:00, 535.67 rows/s]
Inserted 1 row with 0 errors.
Inserting rows into \`products\`: 0 rows \[00:00, ? rows/s]
Inserting rows into \`products\`: 1 rows \[00:00, 558.05 rows/s]
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Snapshot remains unchanged - still shows original data
snapshot_v1.collect()
```
## Explanation
**What creates a new version:**
* `insert()` - adding rows
* `update()` - modifying rows
* `delete()` - removing rows
* `add_column()` / `add_computed_column()` - schema changes
* `drop_column()` - schema changes
* `rename_column()` - schema changes
**Version history methods:**
* `history()` - Human-readable DataFrame showing all changes
* `get_versions()` - List of dictionaries for programmatic access
**Accessing specific versions:**
* `pxt.get_table('table_name:N')` - Get read-only handle to version N
* Useful for comparing data across versions, auditing changes, or
recovering specific values
* Version handles are read-only—you cannot modify historical versions
**Reverting:**
* `revert()` undoes the most recent version
* Can call multiple times to go back further
* Cannot revert past version 0
* Cannot revert if a snapshot references that version
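The versioning and revert rules can be pictured with a toy in-memory model. This is an illustrative sketch only and says nothing about how Pixeltable stores versions; the `ToyVersionedTable` class and its methods are hypothetical.

```python
from typing import Optional


class ToyVersionedTable:
    """Keeps one full copy of the rows per version (toy model only)."""

    def __init__(self) -> None:
        self._versions = [[]]  # version 0: empty table

    @property
    def version(self) -> int:
        return len(self._versions) - 1

    def rows(self, version: Optional[int] = None) -> list:
        """Return a copy of the rows at the current or a historical version."""
        idx = self.version if version is None else version
        return list(self._versions[idx])

    def insert(self, new_rows: list) -> None:
        # Every data change appends a new version
        self._versions.append(self._versions[-1] + list(new_rows))

    def revert(self) -> None:
        # Drops the latest version permanently; version 0 cannot be reverted
        if self.version == 0:
            raise RuntimeError('cannot revert past version 0')
        self._versions.pop()


t = ToyVersionedTable()
t.insert([{'name': 'Widget'}])       # creates version 1
t.insert([{'name': 'Thingamajig'}])  # creates version 2
t.revert()                           # back to version 1
```

After the final `revert()`, the rows inserted in version 2 are gone for good, which is why a snapshot is the better tool when a state must be preserved.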
**Snapshots vs revert:**
* Snapshots are persistent, named, point-in-time copies
* `revert()` permanently removes the latest version
* Use snapshots when you need to preserve state for reproducibility
* Use `revert()` to undo mistakes
## See also
* [Data sharing](/platform/data-sharing) - Share tables between
  environments
* [Iterative
development](/howto/cookbooks/core/dev-iterative-workflow) -
Fast feedback during development
# Configure API keys for AI services
Source: https://docs.pixeltable.com/howto/cookbooks/core/workflow-api-keys
Manage API keys for OpenAI, Anthropic, and other providers in Pixeltable using environment variables and config files for safe pipelines.
Set up API credentials for OpenAI, Anthropic, and other AI providers so
Pixeltable can access them.
## Problem
You need to call AI services (OpenAI, Anthropic, Gemini, etc.) from your
data pipeline. These services require API keys, but you don’t want to
hardcode credentials in your notebooks or scripts.
## Solution
**What’s in this recipe:**
* Set API keys using environment variables
* Store keys in a config file for all projects
* Use `getpass` for one-time session keys
You configure API keys using one of three methods, depending on your
needs. Pixeltable automatically discovers credentials from environment
variables or config files—no code changes needed.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
### Option 1: environment variables
**Use when:** CI/CD pipelines, Docker containers, production deployments
Set the environment variable in your shell before running Python:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# In your terminal (temporary, current session only)
export OPENAI_API_KEY="sk-..."
# Or add to ~/.bashrc or ~/.zshrc (permanent)
echo 'export OPENAI_API_KEY="sk-..."' >> ~/.zshrc
```
You can also set it in Python (useful for testing):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import os
import pixeltable as pxt
# Set in Python (current process only)
# os.environ['OPENAI_API_KEY'] = 'sk-...'
# Check if a key is set
'OPENAI_API_KEY' in os.environ
```
True
### Option 2: config file
**Use when:** Local development, or when you want credentials available
to all Pixeltable projects
Create `~/.pixeltable/config.toml`:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# ~/.pixeltable/config.toml
[openai]
api_key = "sk-..."
[anthropic]
api_key = "sk-ant-..."
[google]
api_key = "AIza..."
```
You can check if the config file exists:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Check config file location
home_dir = pxt.home() # Usually ~/.pixeltable
config_file = home_dir / 'config.toml'
print(config_file)
config_file.exists()
```
/Users/asiegel/.pixeltable/config.toml
True
### Option 3: getpass (interactive)
**Use when:** Shared notebooks, demos, one-time sessions
Prompt for the key at runtime; it won’t be saved anywhere:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
# Uncomment to use interactively:
# if 'OPENAI_API_KEY' not in os.environ:
#     os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```
### Verify your configuration
Check which API keys are set as environment variables. Keys stored only
in `config.toml` do not appear in this check:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Check which API keys are available
services = [
    'OPENAI_API_KEY',
    'ANTHROPIC_API_KEY',
    'GOOGLE_API_KEY',
    'MISTRAL_API_KEY',
]
for svc in services:
    status = '✓' if svc in os.environ else '✗'
    print(f'{status} {svc}')
```
## Explanation
**Discovery order:**
Pixeltable checks for API keys in this order:
1. Environment variable (e.g., `OPENAI_API_KEY`)
2. Config file (`~/.pixeltable/config.toml`)
3. Raises an error if not found
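The discovery order can be sketched as a small lookup function. This is a hedged illustration rather than Pixeltable's code: `resolve_api_key` is hypothetical, the config file is represented by an already-parsed dictionary, and the `<SERVICE>_API_KEY` naming is an assumption based on the examples in this recipe.

```python
from typing import Mapping


def resolve_api_key(
    service: str,
    env: Mapping[str, str],
    config: Mapping[str, Mapping[str, str]],
) -> str:
    """Look up a key: environment variable first, then config section."""
    env_name = f'{service.upper()}_API_KEY'  # assumed naming convention
    if env_name in env:
        return env[env_name]
    section = config.get(service, {})
    if 'api_key' in section:
        return section['api_key']
    raise KeyError(f'no API key found for {service!r}')


# Parsed form of a config.toml with [openai] and [anthropic] sections
config = {
    'openai': {'api_key': 'sk-from-config'},
    'anthropic': {'api_key': 'sk-ant-from-config'},
}
env = {'OPENAI_API_KEY': 'sk-from-env'}

openai_key = resolve_api_key('openai', env, config)        # environment wins
anthropic_key = resolve_api_key('anthropic', env, config)  # falls back to config
```

In a real session, `os.environ` would be passed as `env`.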
**Supported services:**
**Config file is global:** All Pixeltable projects on your machine share
the same config file.
**Getpass is per-session:** The key only exists in memory for the
current Python session.
## See also
* [Pixeltable configuration
reference](/platform/configuration)
* [Working with
OpenAI](/howto/providers/working-with-openai)
# Extract fields from LLM JSON responses
Source: https://docs.pixeltable.com/howto/cookbooks/core/workflow-json-extraction
Extract structured fields from LLM JSON responses in Pixeltable using path expressions, validation, and computed columns for downstream queries.
Parse and access specific fields from structured JSON responses returned
by language models.
## Problem
LLM APIs return nested JSON responses with metadata you don’t need. You
want to extract just the text content or specific fields for downstream
processing.
```json theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{
  "id": "chatcmpl-123",
  "choices": [{
    "message": {
      "content": "This is the actual response text" // ← You want this
    }
  }],
  "usage": {"tokens": 50}
}
```
## Solution
**What’s in this recipe:**
* Extract text content from chat completions
* Access nested JSON fields
* Create separate columns for different fields
You use JSON path notation to extract specific fields from API responses
and store them in computed columns.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```
### Create prompts table
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import openai
# Create a fresh directory
pxt.drop_dir('json_demo', force=True)
pxt.create_dir('json_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'json\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('json_demo/prompts', {'prompt': pxt.String})
```
Created table 'prompts'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.insert(
    [
        {'prompt': 'What is the capital of France?'},
        {'prompt': 'Write a haiku about coding'},
    ]
)
```
### Extract specific fields
Use dot notation to access nested JSON fields:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract just the text content
t.add_computed_column(text=t.response.choices[0].message.content)
# Extract token usage
t.add_computed_column(tokens=t.response.usage.total_tokens)
```
Added 2 column values with 0 errors.
Added 2 column values with 0 errors.
2 rows updated, 2 values computed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View clean results
t.select(t.prompt, t.text, t.tokens).collect()
```
## Explanation
**Common extraction patterns:**
**Accessing JSON fields:**
* Use dot notation for object properties: `response.usage`
* Use brackets for array elements: `choices[0]`
* Chain them: `response.choices[0].message.content`
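In plain Python terms, each path expression corresponds to a chain of dictionary and list lookups. A sketch using a hand-written response shaped like the example in the Problem section, with `usage.total_tokens` as used in the extraction code above; the values are placeholders, not real API output:

```python
# Hand-written stand-in for one chat completion response
response = {
    'id': 'chatcmpl-123',
    'choices': [{'message': {'content': 'This is the actual response text'}}],
    'usage': {'total_tokens': 50},
}

# Equivalent of t.response.choices[0].message.content
text = response['choices'][0]['message']['content']

# Equivalent of t.response.usage.total_tokens
tokens = response['usage']['total_tokens']
```

The difference is that Pixeltable evaluates the path expression for every row and stores the result in a computed column.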
**Extracted columns are computed:**
Changes to the source data automatically update all extracted fields.
## See also
* [Configure API
keys](/howto/cookbooks/core/workflow-api-keys)
* [Extract structured data from
images](/howto/cookbooks/images/vision-structured-output)
# Add unique identifiers to your tables
Source: https://docs.pixeltable.com/howto/cookbooks/core/workflow-uuid-identity
Add UUID identity columns to Pixeltable tables to give every row a stable, globally unique identifier for joins, exports, and external systems.
Generate UUIDs for automatic row identification.
## Problem
You need unique identifiers for rows in your data pipeline. Maybe you’re
building an API that returns specific records, tracking processing
status across systems, or joining data from multiple sources.
## Solution
**What’s in this recipe:**
* Create tables with auto-generated UUID primary keys
* Add UUID columns to existing tables
* Generate UUIDs with `uuid7()`
You use `uuid7()` to generate UUIDs for each row. Define it in the
schema with `{'column_name': uuid7()}` syntax, or add it to existing
tables with `add_computed_column()`.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.uuid import uuid7
# Create a fresh directory
pxt.drop_dir('uuid_demo', force=True)
pxt.create_dir('uuid_demo')
```
### Create a table with a UUID primary key
Use `uuid7()` in your schema to create a column that auto-generates
UUIDs:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create table with auto-generated UUID primary key
products = pxt.create_table(
    'uuid_demo/products',
    {
        'id': uuid7(),  # Auto-generates UUID for each row
        'name': pxt.String,
        'price': pxt.Float,
    },
    primary_key=['id'],
)
```
Created table 'products'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert data - no need to provide 'id', it's auto-generated
products.insert(
    [
        {'name': 'Laptop', 'price': 999.99},
        {'name': 'Mouse', 'price': 29.99},
        {'name': 'Keyboard', 'price': 79.99},
    ]
)
```
Inserted 3 rows with 0 errors in 0.02 s (191.21 rows/s)
3 rows inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View the data - each row has a unique UUID
products.collect()
```
### Add a UUID column to an existing table
You can add a UUID column to a table that already exists using
`add_computed_column()`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table without a UUID column
orders = pxt.create_table(
    'uuid_demo/orders', {'customer': pxt.String, 'amount': pxt.Float}
)
```
Created table 'orders'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert some data
orders.insert(
    [
        {'customer': 'Alice', 'amount': 150.00},
        {'customer': 'Bob', 'amount': 75.50},
    ]
)
```
Inserted 2 rows with 0 errors in 0.01 s (310.49 rows/s)
2 rows inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add a UUID column to existing table
orders.add_computed_column(order_id=uuid7())
```
Added 2 column values with 0 errors in 0.02 s (98.14 rows/s)
2 rows updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View orders with their UUID column
orders.collect()
```
## Explanation
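**Two ways to add UUIDs:**
* In the schema at table creation: `{'id': uuid7()}`
* On an existing table: `add_computed_column(order_id=uuid7())`
Both use `uuid7()`, which generates UUIDv7 (time-based) identifiers: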
* 128-bit values
* Formatted as 32 hex digits with hyphens:
`018e65c5-35e5-7c5d-8f37-f1c5b9c8a7b2`
* Time-ordered for better database performance
* Virtually guaranteed unique (collision probability is negligible)
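To make the layout concrete, the sketch below assembles a UUIDv7-shaped value with the standard library, following the field layout in RFC 9562 (48-bit millisecond timestamp, version, variant, random bits). `toy_uuid7` is a hypothetical helper for illustration and is not Pixeltable's `uuid7()`.

```python
import os
import time
import uuid


def toy_uuid7(unix_ms: int) -> uuid.UUID:
    """Build a UUIDv7-shaped value from a millisecond timestamp."""
    rand = int.from_bytes(os.urandom(10), 'big')  # 80 random bits
    rand_a = rand >> 68                 # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)     # low 62 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80  # timestamp in the top 48 bits
    value |= 0x7 << 76                  # version field: 7
    value |= rand_a << 64
    value |= 0b10 << 62                 # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


now_ms = time.time_ns() // 1_000_000
earlier = toy_uuid7(now_ms)
later = toy_uuid7(now_ms + 1)  # one millisecond later sorts after `earlier`
```

Because the timestamp occupies the most significant bits, values created at later milliseconds compare greater, which is the time-ordering property listed above.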
## See also
* [Tables and
operations](/tutorials/tables-and-data-operations)
* [Computed
columns](/tutorials/computed-columns)
# Export data for ML training
Source: https://docs.pixeltable.com/howto/cookbooks/data/data-export-pytorch
Export Pixeltable tables and views to PyTorch DataLoaders for training image, video, audio, and text models with streaming batches.
Convert Pixeltable data to PyTorch DataLoader format for model training.
## Problem
You have prepared training data—images with labels, text with
embeddings, or multimodal data—and need to export it for PyTorch model
training.
## Solution
**What’s in this recipe:**
* Convert query results to PyTorch Dataset
* Use with DataLoader for batch training
* Export to Parquet for external tools
You use `query.to_pytorch_dataset()` to create an iterable dataset
compatible with PyTorch DataLoader.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable torch torchvision
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
import torch
from torch.utils.data import DataLoader
# Create a fresh directory
pxt.drop_dir('pytorch_demo', force=True)
pxt.create_dir('pytorch_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'pytorch\_demo'.
### Create sample training data
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create table with images and labels
training_data = pxt.create_table(
    'pytorch_demo/training_data', {'image': pxt.Image, 'label': pxt.Int}
)
```
### Export to PyTorch dataset
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add a resize step to ensure all images have the same size
training_data.add_computed_column(
    image_resized=training_data.image.resize((224, 224))
)
# Convert to PyTorch dataset
# 'pt' format returns images as CxHxW tensors with values in [0,1]
pytorch_dataset = training_data.select(
    training_data.image_resized, training_data.label
).to_pytorch_dataset(image_format='pt')
```
Added 3 column values with 0 errors.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Use with PyTorch DataLoader
dataloader = DataLoader(pytorch_dataset, batch_size=2)
# Get first batch to verify the shape
batch = next(iter(dataloader))
# Should be (2, 3, 224, 224): batch_size x channels x height x width
batch['image_resized'].shape
```
torch.Size(\[2, 3, 224, 224])
### Export to Parquet for external tools
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import tempfile
from pathlib import Path
# Export to Parquet for use with other ML tools
export_path = Path(tempfile.mkdtemp()) / 'training_data'
pxt.io.export_parquet(
    training_data.select(training_data.label),  # Non-image columns
    parquet_path=export_path,
)
```
## Explanation
**Export methods:**
**Image format options:**
**DataLoader tips:**
* Data is cached to disk for efficient repeated loading
* Use `num_workers > 0` for parallel data loading
* Filter/transform data before export to reduce size
## See also
* [Sample data for
training](/howto/cookbooks/data/data-sampling) -
Stratified sampling
* [Import Parquet
files](/howto/cookbooks/data/data-import-parquet) -
Parquet import/export
# Upload media to S3 and other cloud storage
Source: https://docs.pixeltable.com/howto/cookbooks/data/data-export-s3
Upload media files generated by Pixeltable computed columns to Amazon S3 and other cloud storage providers for sharing and downstream use.
When Pixeltable generates media files (thumbnails, extracted frames,
processed images), by default it stores them locally. For production
workflows, you can configure Pixeltable to upload these files directly
to cloud blob storage including Amazon S3, Google Cloud Storage, Azure
Blob Storage, and S3-compatible services like Cloudflare R2, Backblaze
B2, and Tigris.
**Key features:**
* Computed media (AI-generated outputs) automatically uploads to your
bucket
* Input media can optionally be persisted for durability
* Files are cached locally and downloaded on-demand
**Configuration options:**
1. **Global defaults** in `config.toml`:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[pixeltable]
input_media_dest = "s3://my-bucket/input/"
output_media_dest = "s3://my-bucket/output/"
```
2. **Per-column destination** (computed columns only):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
    thumbnail=t.image.thumbnail((128, 128)),
    destination='s3://my-bucket/thumbnails/'
)
```
In this notebook, you’ll learn how to configure blob storage
destinations for your media files.
## What you’ll learn
* Where Pixeltable stores files by default
* How to specify destinations for individual columns
* How to configure global destinations for all columns
* How destination precedence works
## How it works
Pixeltable decides where to store media files using this priority:
1. **Column destination** (highest priority) — `destination` parameter
in `add_computed_column()`
2. **Global configuration** — `input_media_dest` / `output_media_dest`
in [config file](/platform/configuration)
3. **Pixeltable’s default local storage** — Used if nothing else is
configured
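The priority order can be written out as a short function. This is a sketch of the rule as stated, not Pixeltable's internal logic; `resolve_destination` and `DEFAULT_MEDIA_DIR` are hypothetical names, and the default path is the local media directory described under Default destinations.

```python
from typing import Optional

# Assumption: the default local media directory named in this guide
DEFAULT_MEDIA_DIR = '~/.pixeltable/media'


def resolve_destination(
    column_dest: Optional[str], global_dest: Optional[str]
) -> str:
    """Pick the first configured destination, in priority order."""
    if column_dest is not None:
        return column_dest    # 1. per-column `destination`
    if global_dest is not None:
        return global_dest    # 2. global input/output media destination
    return DEFAULT_MEDIA_DIR  # 3. default local storage


a = resolve_destination('s3://my-bucket/thumbnails/', 's3://my-bucket/output/')
b = resolve_destination(None, 's3://my-bucket/output/')
c = resolve_destination(None, None)
```

Here `a` is the column destination, `b` is the global destination, and `c` is the local default.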
## Prerequisites
For this notebook, you’ll need:
* `pixeltable` and `boto3` installed
* (Optional) Cloud storage credentials if you want to use a cloud
provider
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable boto3
```
## Setup
Let’s set up our demo environment. We’ll create a Pixeltable directory
for this demo, set up local destination paths, create a table, and
insert a sample image.
You can substitute cloud storage URIs (like `s3://my-bucket/path/`)
anywhere you see a local destination path.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pathlib import Path
# Clean slate for this demo
pxt.drop_dir('blob_storage_demo', force=True)
pxt.create_dir('blob_storage_demo')
```
Now we’ll create a table with an image column and insert a sample image
from the web.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create table
t = pxt.create_table(
    'blob_storage_demo/media',
    {'source_image': pxt.Image},
    if_exists='replace',
)
```
Created table 'media'.
We can inspect the schema before adding images to our table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t
```
Let’s insert a single sample image.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sample_image = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg'
t.insert(source_image=sample_image)
```
Inserted 1 row with 0 errors in 0.77 s (1.29 rows/s)
1 row inserted.
And we can see the image in our table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.collect()
```
## Default destinations
By default, Pixeltable stores all media files in local storage under
`~/.pixeltable/media`:
* **Input files** (files you insert) — If you insert a URL, Pixeltable
stores the URL and downloads it to cache on access. If you insert a
local file path, Pixeltable just stores the path reference (the file
stays where it is).
* **Output files** (files Pixeltable generates) — Stored in
`~/.pixeltable/media`
This works out of the box with no configuration. You can change these
defaults, which we’ll cover in the rest of this notebook.
Let’s check where the source image is stored. Since we inserted a URL
(not a local file), Pixeltable stores the URL reference and will
download it to cache when we access it.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Let's see where the source_image is stored by default
t.select(t.source_image.fileurl).collect()
```
Now let’s add a computed column without specifying a destination. This
will show us where Pixeltable stores **output** files by default.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add computed column with no destination specified - uses default
t.add_computed_column(
    flipped=t.source_image.transpose(0), if_exists='replace'
)
```
Added 1 column value with 0 errors in 0.02 s (45.44 rows/s)
1 row updated.
Check the file URL - it points to `~/.pixeltable/media`, the default
location for generated files.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.flipped, t.flipped.fileurl).collect()
```
## Per-column destinations
When you create a computed column, you can specify exactly where to
store generated files using the `destination=` parameter. This gives you
fine-grained control over outputs, which may be costly and/or difficult
to re-generate.
We’ll create a destination directory for storing one of our processed
images. For this demo, we’re using a local directory on your Desktop,
but you can replace this path with a cloud storage URI (like
`s3://my-bucket/rotated/`).
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a local destination directory
# For S3: dest_rotated = "s3://my-bucket/rotated/"
# For GCS: dest_rotated = "gs://my-bucket/rotated/"
base_path = Path.home() / 'Desktop' / 'pixeltable_outputs'
base_path.mkdir(parents=True, exist_ok=True)
dest_rotated = str(base_path / 'rotated')
# Create directory (only needed for local paths)
Path(dest_rotated).mkdir(exist_ok=True)
```
Now let’s add a computed column **with** an explicit destination to see
the difference from the default behavior.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add column WITH explicit destination
t.add_computed_column(
    rotated=t.source_image.rotate(90),
    destination=dest_rotated,
    if_exists='replace',
)
```
Added 1 column value with 0 errors in 0.02 s (48.98 rows/s)
1 row updated.
Compare the file URLs. The `rotated` image uses our explicit
destination, while `flipped` (created earlier) uses the default
`~/.pixeltable/media` location.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.rotated, t.rotated.fileurl).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.flipped, t.flipped.fileurl).collect()
```
## Changing global destinations
Instead of setting `destination=` on every column, you can change the
global default for ALL columns.
### Output and input destinations
You can configure two types of global destinations:
* **`output_media_dest`** — Changes the default for files Pixeltable
generates (computed columns)
* **`input_media_dest`** — Changes the default for files you insert into
tables
You can set them to the same bucket or different buckets depending on
your needs.
### How to configure
You have two options:
**Option 1: Configuration file** (`~/.pixeltable/config.toml`)
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[pixeltable]
# Where files Pixeltable generates are stored
output_media_dest = "s3://my-bucket/output/"
# Where files you insert are stored
input_media_dest = "s3://my-bucket/input/"
```
**Option 2: Environment variables**
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
export PIXELTABLE_OUTPUT_MEDIA_DEST="s3://my-bucket/output/"
export PIXELTABLE_INPUT_MEDIA_DEST="s3://my-bucket/input/"
```
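In a notebook, the same variables can be set from Python. This is a minimal sketch, assuming Pixeltable reads them when it initializes, so the safest place is before the first `import pixeltable`; the bucket paths are the placeholders used above.

```python
import os

# Assumption: these must be set before Pixeltable initializes
# (safest: before the first `import pixeltable` in the process).
media_destinations = {
    'PIXELTABLE_OUTPUT_MEDIA_DEST': 's3://my-bucket/output/',
    'PIXELTABLE_INPUT_MEDIA_DEST': 's3://my-bucket/input/',
}
os.environ.update(media_destinations)

# Confirm what the current process will see
for name in media_destinations:
    print(f'{name}={os.environ[name]}')
```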
### Supported providers and URI formats
For complete authentication and setup details, see the [Cloud Storage
documentation](/integrations/cloud-storage).
## Overriding global destinations
Even if you configure global destinations, you can still override them
for specific columns using the `destination=` parameter in
`add_computed_column()`.
Let’s create a new destination directory and add a thumbnail column that
uses it.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a different destination for thumbnails
dest_thumbnails = str(base_path / 'thumbnails')
Path(dest_thumbnails).mkdir(exist_ok=True)
# Add column with explicit destination (overrides any global default)
t.add_computed_column(
    thumbnail=t.source_image.thumbnail((128, 128)),
    destination=dest_thumbnails,
    if_exists='replace',
)
```
Added 1 column value with 0 errors in 0.02 s (47.89 rows/s)
1 row updated.
Let’s view the thumbnail and its file URL. The explicit `destination=`
parameter always wins, regardless of global configuration.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.thumbnail, t.thumbnail.fileurl).collect()
```
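The lookup order can be sketched in plain Python. This illustrates the precedence described in this guide, not Pixeltable's actual implementation; `resolve_destination` and its parameters are hypothetical names.

```python
from typing import Optional

DEFAULT_MEDIA_DIR = '~/.pixeltable/media'  # default location named in this guide


def resolve_destination(
    column_destination: Optional[str] = None,
    global_destination: Optional[str] = None,
) -> str:
    """Return the first configured location: column > global > default."""
    if column_destination is not None:
        return column_destination
    if global_destination is not None:
        return global_destination
    return DEFAULT_MEDIA_DIR


print(resolve_destination('s3://my-bucket/thumbnails/', 's3://my-bucket/processed/'))
print(resolve_destination(None, 's3://my-bucket/processed/'))
print(resolve_destination())
```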
## Getting URLs for your files
When your files are in blob storage, you can get URLs that point
directly to them. These URLs work in HTML, APIs, or any application you
need to serve media with.
The `.fileurl` property gives you direct URLs you can use anywhere.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(
    source=t.source_image.fileurl,
    rotated=t.rotated.fileurl,
    flipped=t.flipped.fileurl,
).collect()
```
## Generating presigned URLs
**Note:** This section only applies if you’re using cloud storage (S3,
GCS, Azure, R2, B2, Tigris). If you’re following along with local
destinations (as in the examples above), you can skip this section or
configure cloud storage to try it out.
When your files are in cloud storage, the `.fileurl` property returns
storage URIs like `s3://bucket/path/file.jpg`. These aren’t directly
accessible over HTTP.
For private buckets or when you need time-limited HTTP access, use
**presigned URLs**. These are temporary, authenticated URLs that allow
anyone to access your files for a limited time without needing
credentials.
Presigned URLs are particularly useful for:
* Sharing files from private buckets without making them public
* Creating temporary download links with expiration
* Serving media in web applications without exposing credentials
* Providing time-limited access to sensitive content
Use the `presigned_url` function from `pixeltable.functions.net`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import os
# Use HTTPS URL format for Backblaze B2
b2_region = 'us-east-005'
b2_bucket = 'pixeltable'
cloud_destination = (
    f'https://s3.{b2_region}.backblazeb2.com/{b2_bucket}/presigned-demo/'
)
# Add the computed column
t.add_computed_column(
    cloud_thumbnail=t.source_image.thumbnail((64, 64)),
    destination=cloud_destination,
    if_exists='replace',
)
```
Added 1 column value with 0 errors in 0.22 s (4.46 rows/s)
1 row updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Now generate presigned URLs for the cloud-stored files
from pixeltable.functions import net
t.select(
    cloud_thumbnail=t.cloud_thumbnail,
    storage_url=t.cloud_thumbnail.fileurl,
    presigned_url=net.presigned_url(
        t.cloud_thumbnail.fileurl, 3600
    ),  # 1-hour expiration
).collect()
```
The presigned URLs in the output are fully authenticated HTTP/HTTPS URLs
that can be accessed directly in a browser or used in APIs without any
credentials.
### Common expiration times
**Note:** Different storage providers have different maximum expiration
limits. For example, Google Cloud Storage has a maximum 7-day expiration
for presigned URLs.
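The earlier example passes `3600` for a one-hour expiration, so the value is given in seconds. Here is a small sketch for deriving other values; `expiration_seconds` is a hypothetical helper, not part of Pixeltable.

```python
from datetime import timedelta


def expiration_seconds(**duration) -> int:
    """Convert a duration such as hours=1 or days=7 into whole seconds."""
    return int(timedelta(**duration).total_seconds())


ONE_HOUR = expiration_seconds(hours=1)    # 3600
ONE_DAY = expiration_seconds(days=1)      # 86400
SEVEN_DAYS = expiration_seconds(days=7)   # 604800

print(ONE_HOUR, ONE_DAY, SEVEN_DAYS)
```

A constant like `SEVEN_DAYS` could then be passed in place of the literal `3600`, subject to the provider's maximum.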
### Troubleshooting presigned URLs
If `presigned_url()` isn’t working:
1. **Local files**: Presigned URLs only work with cloud storage (S3,
GCS, Azure, R2, B2, Tigris). If your files are stored locally
(default), you’ll get an error. Configure a cloud destination first.
2. **Already HTTP URLs**: If `.fileurl` returns an `http://` or
`https://` URL (not a storage URI like `s3://`), the file is already
publicly accessible and doesn’t need a presigned URL.
3. **Credentials**: Ensure your cloud storage credentials are properly
configured. See the [Cloud Storage
documentation](/integrations/cloud-storage)
for provider-specific setup.
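The first two checks depend only on the URL scheme, so a rough first diagnosis can be done with the standard library. This is a sketch under that assumption; `describe_fileurl` and its return labels are hypothetical.

```python
from urllib.parse import urlparse


def describe_fileurl(url: str) -> str:
    """Classify a file URL by scheme: 'local', 'http', or 'storage'."""
    scheme = urlparse(url).scheme.lower()
    if scheme in ('', 'file'):
        return 'local'    # no cloud destination configured for this file
    if scheme in ('http', 'https'):
        return 'http'     # already an HTTP(S) URL
    return 'storage'      # storage URI such as s3:// or gs://


for url in (
    '/Users/me/.pixeltable/media/image.jpg',
    'https://example.com/image.jpg',
    's3://bucket/path/file.jpg',
):
    print(url, '->', describe_fileurl(url))
```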
## Common patterns
Here are a few real-world patterns you might use:
### Pattern 1: All media in one place
If you want everything in the same bucket, configure both input and
output destinations in `~/.pixeltable/config.toml`:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[pixeltable]
input_media_dest = "s3://my-bucket/media/"
output_media_dest = "s3://my-bucket/media/"
```
Or set environment variables:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
export PIXELTABLE_INPUT_MEDIA_DEST="s3://my-bucket/media/"
export PIXELTABLE_OUTPUT_MEDIA_DEST="s3://my-bucket/media/"
```
### Pattern 2: Separate input and output
Keep source files separate from processed files in
`~/.pixeltable/config.toml`:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[pixeltable]
input_media_dest = "s3://my-bucket/uploads/"
output_media_dest = "s3://my-bucket/processed/"
```
### Pattern 3: Override for specific columns
Use a global default, but send some columns elsewhere. First, set a
global default in your config:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[pixeltable]
output_media_dest = "s3://my-bucket/processed/"
```
Then in your code, most columns use the global default, but you can
override specific ones:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Uses global default (s3://my-bucket/processed/)
t.add_computed_column(
    thumbnail=t.image.thumbnail((128, 128))
)
# Overrides global default - goes to different location
t.add_computed_column(
    large_thumbnail=t.image.thumbnail((512, 512)),
    destination='s3://my-bucket/thumbnails/'
)
```
## Where do my files go?
Understanding how Pixeltable handles different types of input files
helps you make better decisions about storage configuration.
When you configure a cloud destination, Pixeltable populates both the
destination and the local cache efficiently during `insert()`. For URLs,
this means downloading once and using that download for both the upload
and cache—avoiding wasteful upload→download cycles.
## What you learned
* Pixeltable uses local storage by default for all media files
* You can override the default for specific columns with the
`destination` parameter
* You can change the global default with `input_media_dest` and
`output_media_dest`
* Precedence: column destination > global config > Pixeltable’s
default local storage
* Use `.fileurl` to get URLs for your stored files
* Use `net.presigned_url()` to generate time-limited, authenticated HTTP
URLs for cloud storage files
* With a cloud destination configured, Pixeltable downloads each inserted
  URL once and reuses that download for both the upload and the local cache
## See also
* [Load from S3](../../../howto/cookbooks/data/data-import-s3) - Import
media from cloud storage
* [Cloud Storage Integration](../../../integrations/cloud-storage) -
Provider setup
## Next steps
* See the [Cloud Storage
documentation](/integrations/cloud-storage)
for complete provider setup and authentication details
* Check out [Pixeltable
Configuration](/platform/configuration) for
all config options
* Join our [Discord community](https://pixeltable.com/discord) if you
have questions
# Export data to SQL databases
Source: https://docs.pixeltable.com/howto/cookbooks/data/data-export-sql
Export Pixeltable tables to PostgreSQL, SQLite, and other SQL databases for BI tools, dashboards, and downstream analytics workflows.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Send your Pixeltable data to PostgreSQL, SQLite, MySQL, TigerData, or
Snowflake for use in external applications.
**What’s in this recipe:**
* Export entire tables or filtered queries to any SQL database
* Select specific columns for export
* Handle existing tables with replace or append options
* Connect to cloud PostgreSQL services (e.g. TigerData)
## Problem
You have processed data in your pipeline—cleaned text, generated
embeddings, extracted metadata—and need to send it to a SQL database for
use by other applications or teams.
## Solution
You use `export_sql()` to export tables or queries to any SQL database
via database connection strings. The function automatically maps
Pixeltable types to appropriate SQL types for each database dialect.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable snowflake-sqlalchemy
```
### Create sample data
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
import tempfile
from pathlib import Path
from pixeltable.io.sql import export_sql
# Create a fresh directory
pxt.drop_dir('sql_export_demo', force=True)
pxt.create_dir('sql_export_demo')
```
Created directory 'sql\_export\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table with product data
products = pxt.create_table(
    'sql_export_demo/products',
    {
        'name': pxt.String,
        'price': pxt.Float,
        'in_stock': pxt.Bool,
        'metadata': pxt.Json,
    },
)
```
Inserted 5 rows with 0 errors in 0.01 s (566.35 rows/s)
5 rows inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View the data
products.collect()
```
### Export an entire table
You pass a table and a SQLAlchemy connection string to export all rows
and columns.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a SQLite database for this demo
db_path = Path(tempfile.mkdtemp()) / 'products.db'
connection_string = f'sqlite:///{db_path}'
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Export the full table
export_sql(products, 'products', db_connect_str=connection_string)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Verify the export with SQLAlchemy
import sqlalchemy as sql
engine = sql.create_engine(connection_string)
with engine.connect() as conn:
    result = conn.execute(sql.text('SELECT * FROM products')).fetchall()
result
```
### Export a filtered query
You can export any query result—filter rows, select specific columns, or
apply transformations before export.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Export only in-stock products
export_sql(
    products.where(products.in_stock == True),
    'in_stock_products',
    db_connect_str=connection_string,
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Verify filtered export
with engine.connect() as conn:
    result = conn.execute(
        sql.text('SELECT name, price FROM in_stock_products')
    ).fetchall()
result
```
### Export specific columns
You select only the columns you need before exporting. You can also
rename columns in the output.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Export only name and price columns
export_sql(
    products.select(products.name, products.price),
    'price_list',
    db_connect_str=connection_string,
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Export with renamed columns
export_sql(
    products.select(
        product_name=products.name, unit_price=products.price
    ),
    'renamed_columns',
    db_connect_str=connection_string,
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Verify column selection
inspector = sql.inspect(engine)
columns = [col['name'] for col in inspector.get_columns('price_list')]
columns
```
\['name', 'price']
### Handle existing tables
You control what happens when the target table already exists using the
`if_exists` parameter:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Append new data to existing table
export_sql(
    products.where(products.price > 50),
    'products',
    db_connect_str=connection_string,
    if_exists='insert',
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Check row count after insert
with engine.connect() as conn:
    result = conn.execute(
        sql.text('SELECT COUNT(*) FROM products')
    ).fetchone()
f'Total rows after insert: {result[0]}'
```
'Total rows after insert: 7'
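In plain SQL terms, `if_exists='insert'` appends to the existing table, while `if_exists='replace'` (shown next) rebuilds it from the new result. The sketch below uses Python's built-in `sqlite3` with hypothetical rows to illustrate those semantics only; it is not how `export_sql()` is implemented.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE products (name TEXT, price REAL, in_stock INTEGER)')
rows = [('Widget', 19.99, 1), ('Gadget', 74.50, 0), ('Gizmo', 120.00, 1)]  # hypothetical data
conn.executemany('INSERT INTO products VALUES (?, ?, ?)', rows)

# "insert": keep the table and its rows, append the rows priced above 50
expensive = [r for r in rows if r[1] > 50]
conn.executemany('INSERT INTO products VALUES (?, ?, ?)', expensive)
count_after_insert = conn.execute('SELECT COUNT(*) FROM products').fetchone()[0]

# "replace": drop the table and recreate it from the new result (the schema may change)
conn.execute('DROP TABLE products')
conn.execute('CREATE TABLE products (name TEXT, price REAL)')
conn.executemany('INSERT INTO products VALUES (?, ?)', [r[:2] for r in rows])
columns = [row[1] for row in conn.execute('PRAGMA table_info(products)')]
count_after_replace = conn.execute('SELECT COUNT(*) FROM products').fetchone()[0]

print(count_after_insert, columns, count_after_replace)
```

Appending the same rows twice duplicates them, which is why the row count above grows after the `insert` export.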
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Replace with fresh data
export_sql(
    products.select(products.name, products.price),
    'products',
    db_connect_str=connection_string,
    if_exists='replace',
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Check that table was replaced
inspector = sql.inspect(engine)
columns = [col['name'] for col in inspector.get_columns('products')]
with engine.connect() as conn:
    row_count = conn.execute(
        sql.text('SELECT COUNT(*) FROM products')
    ).fetchone()[0]
f'Columns: {columns}, Row count: {row_count}'
```
"Columns: \['name', 'price'], Row count: 5"
### Export to cloud PostgreSQL (TigerData)
You can export directly to cloud-hosted PostgreSQL databases like
[TigerData](https://www.timescale.com/cloud) (Timescale Cloud). Get your
credentials from the TigerData dashboard after creating a service.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
# Skip interactive sections in CI environments
SKIP_CLOUD_TESTS = os.environ.get('CI') or os.environ.get(
    'GITHUB_ACTIONS'
)
if not SKIP_CLOUD_TESTS:
    # Enter your TigerData credentials interactively
    tigerdata_host = input(
        'TigerData host (e.g., abc123.tsdb.cloud.timescale.com): '
    )
    tigerdata_port = input('TigerData port (e.g., 38963): ')
    tigerdata_user = input('TigerData username (e.g., tsdbadmin): ')
    tigerdata_password = getpass.getpass('TigerData password: ')
    tigerdata_dbname = input('TigerData database name (e.g., tsdb): ')
    # Build the connection string (use postgresql+psycopg:// for SQLAlchemy compatibility)
    tigerdata_connection = f'postgresql+psycopg://{tigerdata_user}:{tigerdata_password}@{tigerdata_host}:{tigerdata_port}/{tigerdata_dbname}?sslmode=require'
else:
    print('Skipping TigerData section (running in CI)')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
if not SKIP_CLOUD_TESTS:
    # Export to TigerData
    export_sql(
        products,
        'pixeltable_products',
        db_connect_str=tigerdata_connection,
        if_exists='replace',
    )
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
if not SKIP_CLOUD_TESTS:
    # Verify the export in TigerData
    tigerdata_engine = sql.create_engine(tigerdata_connection)
    with tigerdata_engine.connect() as conn:
        result = conn.execute(
            sql.text('SELECT * FROM pixeltable_products')
        ).fetchall()
    # A bare expression inside an `if` block is not displayed, so print it
    print(result)
```
### Export to Snowflake
You can export directly to [Snowflake](https://www.snowflake.com/) data
warehouses. Get your account identifier from the Snowflake web interface
under **Admin → Accounts**.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
if not SKIP_CLOUD_TESTS:
    # Enter your Snowflake credentials interactively
    snowflake_account = input(
        'Snowflake account identifier (e.g., WEZMMGC-AIB20064): '
    )
    snowflake_user = input('Snowflake username: ')
    snowflake_password = getpass.getpass('Snowflake password: ')
    snowflake_warehouse = input(
        'Snowflake warehouse (e.g., COMPUTE_WH): '
    )
    snowflake_database = input('Snowflake database: ')
    snowflake_schema = input('Snowflake schema (e.g., PUBLIC): ')
    # Build the connection string
    snowflake_connection = f'snowflake://{snowflake_user}:{snowflake_password}@{snowflake_account}/{snowflake_database}/{snowflake_schema}?warehouse={snowflake_warehouse}'
else:
    print('Skipping Snowflake section (running in CI)')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
if not SKIP_CLOUD_TESTS:
    # Export to Snowflake (without JSON column)
    export_sql(
        products.select(products.name, products.price, products.in_stock),
        'PIXELTABLE_PRODUCTS',
        db_connect_str=snowflake_connection,
        if_exists='replace',
    )
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
if not SKIP_CLOUD_TESTS:
    # Verify the export in Snowflake
    snowflake_engine = sql.create_engine(snowflake_connection)
    with snowflake_engine.connect() as conn:
        result = conn.execute(
            sql.text('SELECT * FROM PIXELTABLE_PRODUCTS')
        ).fetchall()
    # A bare expression inside an `if` block is not displayed, so print it
    print(result)
```
### Exporting media data
For tables containing media types (`pxt.Image`, `pxt.Video`,
`pxt.Audio`), you have two options:
1. **Extract metadata before export** - Select only the columns you
need (paths, embeddings, extracted text, etc.) and export those to
SQL.
2. **Use Pixeltable destinations** - For syncing media files to cloud
storage, use Pixeltable’s built-in destination support with
providers like
[Tigris](/howto/providers/working-with-tigris).
**Example: Export image metadata to SQL**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table with images
images = pxt.create_table(
    'sql_export_demo/images', {'image': pxt.Image, 'label': pxt.String}
)
# Add computed columns for metadata
images.add_computed_column(width=images.image.width)
images.add_computed_column(height=images.image.height)
images.add_computed_column(mode=images.image.mode)
```
Created table 'images'.
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
Inserted 2 rows with 0 errors in 0.03 s (63.85 rows/s)
2 rows inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Export metadata (not the image itself) to SQL
export_sql(
    images.select(images.label, images.width, images.height, images.mode),
    'image_metadata',
    db_connect_str=connection_string,  # or tigerdata_connection for cloud
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Verify the metadata export
with engine.connect() as conn:
    result = conn.execute(
        sql.text('SELECT * FROM image_metadata')
    ).fetchall()
result
```
## Explanation
**Connection strings:**
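The function uses SQLAlchemy connection strings. The formats used in this
recipe are `sqlite:///<path>` for SQLite,
`postgresql+psycopg://<user>:<password>@<host>:<port>/<database>` for
PostgreSQL, and
`snowflake://<user>:<password>@<account>/<database>/<schema>?warehouse=<warehouse>`
for Snowflake.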
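Credentials that contain characters such as `@` or `/` need to be URL-encoded before they are placed in a connection string. A minimal sketch using the standard library; `postgres_url` is a hypothetical helper and the credentials are made up for illustration.

```python
from urllib.parse import quote_plus


def postgres_url(user: str, password: str, host: str, port: int, dbname: str) -> str:
    """Build a postgresql+psycopg:// URL with URL-encoded credentials."""
    return (
        f'postgresql+psycopg://{quote_plus(user)}:{quote_plus(password)}'
        f'@{host}:{port}/{dbname}?sslmode=require'
    )


# Hypothetical credentials; '@' and '/' in the password are escaped
url = postgres_url(
    'tsdbadmin', 'p@ss/word', 'abc123.tsdb.cloud.timescale.com', 38963, 'tsdb'
)
print(url)
```

The resulting string can be passed as `db_connect_str` in the same way as the hand-built strings in this recipe.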
**Type mapping:**
Pixeltable types are mapped automatically to appropriate SQL types for the
target database dialect.
**Unsupported types:**
Media types like `pxt.Image`, `pxt.Video`, and `pxt.Audio` cannot be
exported directly. Extract the data you need (paths, embeddings,
metadata) before export.
## See also
* [Working with
Tigris](/howto/providers/working-with-tigris) -
Sync media files to cloud storage
* [Cloud Storage
Integration](/integrations/cloud-storage) -
S3, GCS, and Azure Blob storage
* [Export to PyTorch](./data-export-pytorch) - Export for ML training
# Import data from CSV files
Source: https://docs.pixeltable.com/howto/cookbooks/data/data-import-csv
Import CSV files into Pixeltable tables with automatic type inference, column mapping, and incremental loading for tabular datasets.
Load data from CSV and Excel files into Pixeltable tables for processing
and analysis.
## Problem
You have data in CSV or Excel files that you want to process with AI
models, add computed columns to, or combine with other data sources.
## Solution
**What’s in this recipe:**
* Import CSV files directly into tables
* Import from Pandas DataFrames
* Handle different data types
You use `pxt.create_table()` with a `source` parameter to create a table
from a CSV file, or insert DataFrame rows into an existing table.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable pandas
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pandas as pd
import pixeltable as pxt
# Create a fresh directory
pxt.drop_dir('import_demo', force=True)
pxt.create_dir('import_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'import\_demo'.
### Import CSV directly
Use `create_table` with `source` to create a table from a CSV file:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Import CSV from URL
csv_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/world-population-data.csv'
population = pxt.create_table('import_demo/population', source=csv_url)
```
Created table 'population'.
Inserting rows into \`population\`: 234 rows \[00:00, 9032.63 rows/s]
Inserted 234 rows with 0 errors.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View the imported data
population.head(5)
```
### Import from Pandas DataFrame
You can also create a DataFrame first and insert it:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a DataFrame
df = pd.DataFrame(
    {
        'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'city': ['NYC', 'LA', 'Chicago'],
    }
)
# Create table and insert DataFrame
users = pxt.create_table(
    'import_demo/users',
    {'name': pxt.String, 'age': pxt.Int, 'city': pxt.String},
)
users.insert(df)
```
Created table 'users'.
Inserting rows into \`users\`: 3 rows \[00:00, 923.31 rows/s]
Inserted 3 rows with 0 errors.
3 rows inserted, 6 values computed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View the data
users.collect()
```
## Explanation
**Source types supported:**
**Type inference:**
Pixeltable automatically infers column types from CSV data. You can
override types using `schema_overrides`.
**Large files:**
For very large CSV files, consider:
* Using `create_table(source=...)` which streams data
* Importing in batches if memory is limited
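Batched importing can be sketched with the standard library alone. The example below reads a CSV lazily and groups rows into fixed-size batches; `iter_batches` is a hypothetical helper, the in-memory CSV stands in for a large file, and the `insert()` call is only indicated in a comment.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(rows: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size rows without loading everything at once."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for a large file on disk
csv_text = 'name,age\nAlice,25\nBob,30\nCharlie,35\n'
reader = csv.DictReader(io.StringIO(csv_text))

batch_sizes = []
inserted = []
for batch in iter_batches(reader, batch_size=2):
    # DictReader yields strings, so convert values to the column types first
    typed = [{'name': r['name'], 'age': int(r['age'])} for r in batch]
    # With a real table, this is where you might call `users.insert(typed)`
    inserted.extend(typed)
    batch_sizes.append(len(typed))

print(batch_sizes)
```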
## See also
* [Tables
documentation](/tutorials/tables-and-data-operations)
* [Bringing data
guide](/howto/cookbooks/data/data-import-csv)
# Import data from Excel files
Source: https://docs.pixeltable.com/howto/cookbooks/data/data-import-excel
Load XLSX and Excel spreadsheets into Pixeltable tables with sheet selection, header handling, and type inference for analysis pipelines.
Load data from Excel spreadsheets (.xlsx) into Pixeltable tables.
## Problem
You have data in Excel format that needs to be loaded for AI
processing—reports, inventory lists, or business data exported from
other systems.
## Solution
**What’s in this recipe:**
* Import Excel files directly into tables
* Handle multiple sheets
* Override column types when needed
You use `pxt.create_table()` with an Excel file path as the `source`
parameter. Pixeltable infers column types automatically.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openpyxl pandas
```
### Create sample Excel file
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pandas as pd
import pixeltable as pxt
import tempfile
from pathlib import Path
# Create sample Excel file for demo
sample_data = pd.DataFrame(
    {
        'order_id': [1001, 1002, 1003, 1004, 1005],
        'customer': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve'],
        'product': [
            'Widget A',
            'Gadget B',
            'Widget A',
            'Tool C',
            'Gadget B',
        ],
        'quantity': [2, 1, 5, 3, 2],
        'price': [29.99, 149.99, 29.99, 79.99, 149.99],
        'date': [
            '2024-01-15',
            '2024-01-16',
            '2024-01-16',
            '2024-01-17',
            '2024-01-18',
        ],
    }
)
# Save to temp Excel file
temp_dir = tempfile.mkdtemp()
excel_path = Path(temp_dir) / 'orders.xlsx'
sample_data.to_excel(excel_path, index=False)
sample_data
```
### Import Excel file
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a fresh directory
pxt.drop_dir('excel_demo', force=True)
pxt.create_dir('excel_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'excel\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Import Excel file directly
orders = pxt.create_table(
    'excel_demo/orders',
    source=str(excel_path),
    source_format='excel',  # Hint for Excel format
)
```
Created table 'orders'.
Inserting rows into \`orders\`: 5 rows \[00:00, 501.21 rows/s]
Inserted 5 rows with 0 errors.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View imported data
orders.collect()
```
### Add computed columns
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add computed column for order total
orders.add_computed_column(total=orders.quantity * orders.price)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View with computed total
orders.select(
    orders.order_id,
    orders.customer,
    orders.product,
    orders.quantity,
    orders.price,
    orders.total,
).collect()
```
## Explanation
**Import methods:**
**Excel-specific options:**
Pass Pandas `read_excel` arguments via `extra_args`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_table(
    'table_name',
    source='data.xlsx',
    source_format='excel',
    extra_args={'sheet_name': 'Sheet2', 'skiprows': 1}
)
```
**Common extra\_args:**
## See also
* [Import CSV
files](/howto/cookbooks/data/data-import-csv) -
CSV and tabular data
* [Import Parquet
files](/howto/cookbooks/data/data-import-parquet) -
Columnar data
# Import data from Hugging Face datasets
Source: https://docs.pixeltable.com/howto/cookbooks/data/data-import-huggingface
Import Hugging Face datasets directly into Pixeltable tables for vision, text, and multimodal ML training and evaluation workflows.
Load datasets from Hugging Face Hub into Pixeltable tables for
processing with AI models.
## Problem
You want to use a dataset from Hugging Face Hub—for fine-tuning,
evaluation, or analysis. You need to load it into a format where you can
add computed columns, embeddings, or AI transformations.
## Solution
**What’s in this recipe:**
* Import Hugging Face datasets directly into tables
* Handle datasets with multiple splits (train/test/validation)
* Work with image datasets
You use `pxt.create_table()` with a Hugging Face dataset as the `source`
parameter. Pixeltable automatically maps HF types to Pixeltable column
types.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable datasets
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from datasets import load_dataset
# Create a fresh directory
pxt.drop_dir('hf_demo', force=True)
pxt.create_dir('hf_demo')
```
Created directory 'hf\_demo'.
### Import a single split
Load a specific split from a dataset:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Load a small subset for demo (first 100 rows of rotten_tomatoes)
hf_dataset = load_dataset(
    'cornell-movie-review-data/rotten_tomatoes', split='train[:100]'
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Import into Pixeltable
reviews = pxt.create_table('hf_demo/reviews', source=hf_dataset)
```
Created table 'reviews'.
Inserting rows into \`reviews\`: 100 rows \[00:00, 14781.69 rows/s]
Inserted 100 rows with 0 errors.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View imported data
reviews.head(5)
```
### Import multiple splits
Load a DatasetDict with multiple splits and import each split into its
own table, so you always know which split a row came from:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Load dataset with multiple splits (small subset for demo)
hf_dataset_dict = load_dataset(
    'cornell-movie-review-data/rotten_tomatoes',
    split={'train': 'train[:50]', 'test': 'test[:50]'},
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Import each split separately for clarity
train_data = pxt.create_table(
    'hf_demo/reviews_train', source=hf_dataset_dict['train']
)
test_data = pxt.create_table(
    'hf_demo/reviews_test', source=hf_dataset_dict['test']
)
```
Created table 'reviews\_train'.
Inserting rows into \`reviews\_train\`: 50 rows \[00:00, 10150.29 rows/s]
Inserted 50 rows with 0 errors.
Created table 'reviews\_test'.
Inserting rows into \`reviews\_test\`: 50 rows \[00:00, 9883.37 rows/s]
Inserted 50 rows with 0 errors.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View training data
train_data.head(5)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View test data
test_data.head(3)
```
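If you would rather keep all splits in one table, one option is to tag each row with its split name before inserting. This is a minimal sketch with plain dictionaries standing in for dataset rows; the sample rows and the `split` column name are hypothetical.

```python
# Plain dictionaries stand in for the rows of each Hugging Face split
splits = {
    'train': [{'text': 'a fine film', 'label': 1}],
    'test': [{'text': 'dull and slow', 'label': 0}],
}

# Add a 'split' field to every row so the origin survives a single combined import
tagged = [
    {**row, 'split': split_name}
    for split_name, rows in splits.items()
    for row in rows
]

for row in tagged:
    print(row)
```

With a real `DatasetDict`, iterating over `hf_dataset_dict.items()` yields the same pairs of split name and rows.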
### Add AI-powered computed columns
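Enrich the dataset with computed columns. The example below uses a simple
text-length function; model-based columns follow the same pattern: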
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add a computed column for text length
reviews.add_computed_column(
    text_length=reviews.text.apply(len, col_type=pxt.Int)
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View with computed column
reviews.select(reviews.text, reviews.label, reviews.text_length).head(5)
```
### Type mapping
Pixeltable automatically maps Hugging Face types to Pixeltable types:
Use `schema_overrides` to customize type mapping when needed.
## Explanation
**Why import Hugging Face datasets into Pixeltable:**
1. **Add computed columns** - Enrich data with embeddings, AI analysis,
or transformations
2. **Incremental processing** - Add new rows without reprocessing
existing data
3. **Persistent storage** - Keep processed results across sessions
4. **Query capabilities** - Filter, aggregate, and join with other
tables
**Working with large datasets:**
For very large datasets, consider loading in batches or using streaming
mode in the `datasets` library before importing.
## See also
* [Import CSV
files](/howto/cookbooks/data/data-import-csv) -
For CSV and Excel imports
* [Semantic text
search](/howto/cookbooks/search/search-semantic-text) -
Add embeddings to text data
* [Hugging Face integration
notebook](/howto/providers/working-with-hugging-face) -
Full integration guide
# Import data from JSON files
Source: https://docs.pixeltable.com/howto/cookbooks/data/data-import-json
Load JSON and JSONL files into Pixeltable tables with nested object support, schema inference, and streaming ingestion of large datasets.
Load structured data from JSON files into Pixeltable tables for
processing and analysis.
## Problem
You have data in JSON format—from APIs, exports, or application logs.
You need to load this data for processing with AI models or combining
with other data sources.
## Solution
**What’s in this recipe:**
* Import JSON files directly into tables
* Import from URLs (APIs, remote files)
* Handle nested JSON structures
You use `pxt.create_table()` with a `source` parameter to create a table
from a JSON file or URL. The JSON must be an array of objects, where
each object becomes a row.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
### Create sample JSON file
First, create a sample JSON file to demonstrate the import process:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import json
import pixeltable as pxt
import tempfile
from pathlib import Path

# Create sample JSON data (array of objects)
sample_data = [
    {
        'id': 1,
        'title': 'Introduction to ML',
        'author': 'Alice',
        'tags': ['ml', 'intro'],
        'rating': 4.5,
    },
    {
        'id': 2,
        'title': 'Deep Learning Basics',
        'author': 'Bob',
        'tags': ['dl', 'neural'],
        'rating': 4.8,
    },
    {
        'id': 3,
        'title': 'NLP Fundamentals',
        'author': 'Carol',
        'tags': ['nlp', 'text'],
        'rating': 4.2,
    },
    {
        'id': 4,
        'title': 'Computer Vision',
        'author': 'Dave',
        'tags': ['cv', 'images'],
        'rating': 4.6,
    },
    {
        'id': 5,
        'title': 'Reinforcement Learning',
        'author': 'Eve',
        'tags': ['rl', 'agents'],
        'rating': 4.3,
    },
]

# Save to temporary JSON file
temp_dir = tempfile.mkdtemp()
json_path = Path(temp_dir) / 'articles.json'
with open(json_path, 'w') as f:
    json.dump(sample_data, f, indent=2)
```
### Import JSON file
Use `create_table` with `source` to create a table directly from a JSON
file:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a fresh directory
pxt.drop_dir('json_demo', force=True)
pxt.create_dir('json_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'json\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Import JSON file into a new table
articles = pxt.create_table(
    'json_demo/articles',
    source=str(json_path),
    source_format='json',  # Explicitly specify format when using local file paths
)
```
Created table 'articles'.
Inserting rows into \`articles\`: 0 rows \[00:00, ? rows/s]
Inserting rows into \`articles\`: 5 rows \[00:00, 538.52 rows/s]
Inserted 5 rows with 0 errors.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View imported data
articles.collect()
```
### Import from URL
You can import JSON directly from a URL—useful for APIs and remote data:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Import from a public JSON URL
# Using JSONPlaceholder API as an example
posts = pxt.create_table(
    'json_demo/posts',
    source='https://jsonplaceholder.typicode.com/posts',
    source_format='json',  # Required for URL sources
)
```
Created table 'posts'.
Inserting rows into \`posts\`: 0 rows \[00:00, ? rows/s]
Inserting rows into \`posts\`: 100 rows \[00:00, 15623.57 rows/s]
Inserted 100 rows with 0 errors.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View first few rows
posts.head(5)
```
### Import from Python dictionaries
Use `create_table` with a list of dictionaries as `source`—useful when
you have data in memory:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Import from a list of dictionaries
events = [
    {
        'event': 'page_view',
        'user_id': 101,
        'timestamp': '2024-01-15T10:30:00',
    },
    {
        'event': 'click',
        'user_id': 101,
        'timestamp': '2024-01-15T10:31:00',
    },
    {
        'event': 'purchase',
        'user_id': 102,
        'timestamp': '2024-01-15T10:32:00',
    },
]
event_table = pxt.create_table('json_demo/events', source=events)
```
Created table 'events'.
Inserting rows into \`events\`: 0 rows \[00:00, ? rows/s]
Inserting rows into \`events\`: 3 rows \[00:00, 988.06 rows/s]
Inserted 3 rows with 0 errors.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View imported events
event_table.collect()
```
### Add computed columns
Once imported, you can enrich the data with computed columns:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add a computed column combining title and author
articles.add_computed_column(
    summary=articles.title + ' by ' + articles.author
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View with computed column
articles.select(
    articles.title, articles.author, articles.summary
).collect()
```
## Explanation
**JSON format requirements:**
The JSON file must contain an array of objects at the top level:
```json theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[
{"col1": "value1", "col2": 123},
{"col1": "value2", "col2": 456}
]
```
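To check a file against this requirement before importing it, a small standard-library helper is enough. The sketch below is illustrative only; the function name `load_rows` is hypothetical and is not part of Pixeltable.

```python
import json


def load_rows(text: str) -> list[dict]:
    """Parse JSON text and check that it is a top-level array of objects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError('top-level JSON value must be an array')
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f'element {i} is not an object')
    return data


valid = load_rows('[{"col1": "value1", "col2": 123}, {"col1": "value2", "col2": 456}]')
print(len(valid))  # 2
```

A single top-level object, such as `{"col1": "value1"}`, is rejected with a `ValueError`; wrapping it in a list is one way to make it importable.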
**Source types supported:** this recipe covers local file paths, URLs, and in-memory lists of dictionaries.
**Nested JSON handling:**
Nested objects and arrays are stored as JSON columns. You can access
nested fields using Pixeltable’s JSON path syntax in computed columns.
## See also
* [Import CSV
files](/howto/cookbooks/data/data-import-csv) -
For CSV and Excel imports
* [Import Parquet
files](/howto/cookbooks/data/data-import-parquet) -
For Parquet data
* [Extract fields from
JSON](/howto/cookbooks/core/workflow-json-extraction) -
Parse LLM response fields
# Import data from Parquet files
Source: https://docs.pixeltable.com/howto/cookbooks/data/data-import-parquet
Ingest Apache Parquet files into Pixeltable tables for fast columnar loading of large analytics, ML training, and feature-store datasets.
Load columnar data from Parquet files into Pixeltable tables for
processing and analysis.
## Problem
You have data stored in Parquet format—a common format for analytics,
data lakes, and ML pipelines. You need to load this data for processing
with AI models or combining with other data sources.
## Solution
**What’s in this recipe:**
* Import Parquet files directly into tables
* Export tables to Parquet for external tools
* Handle schema type overrides
You use `pxt.create_table()` with a `source` parameter to create a table
from a Parquet file. Pixeltable infers column types from the Parquet
schema automatically.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable pyarrow pandas
```
### Create sample Parquet file
First, create a sample Parquet file to demonstrate the import process:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pandas as pd
import pixeltable as pxt
import tempfile
from pathlib import Path

# Create sample data
sample_data = pd.DataFrame(
    {
        'product_id': [1, 2, 3, 4, 5],
        'name': [
            'Widget A',
            'Widget B',
            'Gadget X',
            'Gadget Y',
            'Tool Z',
        ],
        'price': [29.99, 39.99, 149.99, 199.99, 79.99],
        'category': ['widgets', 'widgets', 'gadgets', 'gadgets', 'tools'],
        'in_stock': [True, False, True, True, False],
    }
)

# Save to temporary Parquet file
temp_dir = tempfile.mkdtemp()
parquet_path = Path(temp_dir) / 'products.parquet'
sample_data.to_parquet(parquet_path, index=False)
sample_data
```
### Import Parquet file
Use `create_table` with the `source` parameter to create a table
directly from the Parquet file:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a fresh directory
pxt.drop_dir('parquet_demo', force=True)
pxt.create_dir('parquet_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'parquet\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Import Parquet file into a new table
products = pxt.create_table(
    'parquet_demo/products', source=str(parquet_path)
)
```
Created table 'products'.
Inserting rows into \`products\`: 0 rows \[00:00, ? rows/s]
Inserting rows into \`products\`: 5 rows \[00:00, 653.18 rows/s]
Inserted 5 rows with 0 errors.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View imported data
products.collect()
```
### Add computed columns
Once imported, you can add computed columns like any other Pixeltable
table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add a computed column for discounted price
products.add_computed_column(sale_price=products.price * 0.9)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View with computed column
products.select(
    products.name, products.price, products.sale_price
).collect()
```
### Import with primary key
Specify a primary key when you need upsert behavior or unique
constraints:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Import with a primary key
products_pk = pxt.create_table(
    'parquet_demo/products_with_pk',
    source=str(parquet_path),
    primary_key='product_id',
)
```
Created table 'products\_with\_pk'.
Inserting rows into \`products\_with\_pk\`: 0 rows \[00:00, ? rows/s]
Inserting rows into \`products\_with\_pk\`: 5 rows \[00:00, 1548.97 rows/s]
Inserted 5 rows with 0 errors.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View the table
products_pk.collect()
```
### Export table to Parquet
Export your processed data back to Parquet for use with other tools:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Export to Parquet (note: image columns require inline_images=True)
export_path = Path(temp_dir) / 'exported_products'
pxt.io.export_parquet(
    products.select(products.name, products.price, products.sale_price),
    parquet_path=export_path,
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Verify export by reading back
import pyarrow.parquet as pq
exported_table = pq.read_table(export_path)
exported_table.to_pandas()
```
## Explanation
**When to use Parquet import:**
**Key features:**
* Automatic schema inference from Parquet metadata
* Support for partitioned datasets (directory of files)
* Export with `pxt.io.export_parquet` for interoperability
* Primary key support for upsert workflows
## See also
* [Import CSV
files](/howto/cookbooks/data/data-import-csv) -
For CSV and Excel imports
* [Import JSON
files](/howto/cookbooks/data/data-import-json) -
For JSON data
# Load media from S3 and other cloud storage
Source: https://docs.pixeltable.com/howto/cookbooks/data/data-import-s3
Load images, videos, audio, and documents from Amazon S3 and other cloud storage buckets into Pixeltable tables using URL references.
Import images, videos, and audio files from S3, GCS, HTTP URLs, or local
paths into Pixeltable tables.
## Problem
You have media files stored in cloud storage (S3, GCS) or accessible via
HTTP URLs. You need to process these files with AI models without
downloading them all upfront.
## Solution
**What’s in this recipe:**
* Reference media files by URL (S3, HTTP, local paths)
* Automatic caching of remote files on access
* Process files lazily without bulk downloads
You insert media URLs as references. Pixeltable stores the URLs and
automatically downloads/caches files when you access them through
queries or computed columns.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable boto3
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Create a fresh directory
pxt.drop_dir('cloud_demo', force=True)
pxt.create_dir('cloud_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'cloud\_demo'.
### Load images from HTTP URLs
Reference images by URL—Pixeltable downloads them on demand:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table with image column
images = pxt.create_table('cloud_demo/images', {'image': pxt.Image})
```
Created table 'images'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert images by URL (HTTP)
image_urls = [
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000106.jpg',
]
images.insert([{'image': url} for url in image_urls])
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View images - files are downloaded and cached on access
images.collect()
```
### Load videos from S3
Reference videos in S3 buckets (this example uses the public Multimedia Commons bucket):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table with video column
videos = pxt.create_table('cloud_demo/videos', {'video': pxt.Video})
```
Created table 'videos'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert videos by S3 URL (public bucket, no credentials needed)
s3_prefix = 's3://multimedia-commons/'
video_paths = [
    'data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4',
    'data/videos/mp4/ffe/feb/ffefebb41485539f964760e6115fbc44.mp4',
]
videos.insert([{'video': s3_prefix + path} for path in video_paths])
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View videos - downloaded and cached on access
videos.collect()
```
### Add computed columns on remote media
Process remote media with computed columns—files are fetched
automatically:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add computed columns for image properties
images.add_computed_column(width=images.image.width)
images.add_computed_column(height=images.image.height)
```
Added 3 column values with 0 errors.
Added 3 column values with 0 errors.
3 rows updated, 6 values computed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View with computed properties
images.select(images.image, images.width, images.height).collect()
```
### Generate presigned URLs for serving media
When you store media in private cloud storage, you need presigned URLs
to serve files over HTTP. The `presigned_url` function converts storage
URIs to time-limited, publicly accessible URLs:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable.functions as pxtf

# Generate presigned URLs for videos (1-hour expiration)
videos.select(
    videos.video,
    original_uri=videos.video.fileurl,
    http_url=pxtf.net.presigned_url(videos.video.fileurl, 3600),
).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Store presigned URLs as computed column for API responses
videos.add_computed_column(
    serving_url=pxtf.net.presigned_url(
        videos.video.fileurl, 86400
    )  # 24-hour expiration
)
```
**Use cases for presigned URLs:**
* Serve private media in web applications without exposing credentials
* Generate download links for end users
* Integrate with CDNs or video players that require HTTP URLs
**Provider limitations:**
Note: HTTP/HTTPS URLs pass through unchanged (already publicly
accessible).
### Supported URL formats
Pixeltable supports multiple URL schemes for media files, including S3
(`s3://`), GCS, HTTP/HTTPS, and local paths. For private buckets,
configure AWS/GCP credentials via environment variables or config files.
## Explanation
**How caching works:**
1. URLs are stored as references in the table
2. Files are downloaded on first access (query or computed column)
3. Downloaded files are cached in `~/.pixeltable/file_cache/`
4. Cache uses LRU eviction when space is needed
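The steps above can be pictured with a toy in-memory cache. This is a conceptual sketch only, not Pixeltable's implementation: the class `LRUFileCache`, the byte budget, and the `fake_fetch` downloader are all hypothetical.

```python
from collections import OrderedDict


class LRUFileCache:
    """Toy URL-keyed cache with a byte budget and least-recently-used eviction."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # url -> file contents, oldest first

    def get(self, url: str, fetch) -> bytes:
        if url in self._entries:
            self._entries.move_to_end(url)  # mark as most recently used
            return self._entries[url]
        data = fetch(url)  # first access: download the file
        self._entries[url] = data
        self.used_bytes += len(data)
        # Evict least recently used entries until the cache fits its budget
        while self.used_bytes > self.capacity_bytes and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)
        return data

    def cached_urls(self) -> list[str]:
        return list(self._entries)


downloads = []


def fake_fetch(url: str) -> bytes:
    """Hypothetical downloader that records each fetch."""
    downloads.append(url)
    return b'x' * 4


cache = LRUFileCache(capacity_bytes=8)
cache.get('s3://bucket/a.jpg', fake_fetch)
cache.get('s3://bucket/b.jpg', fake_fetch)
cache.get('s3://bucket/a.jpg', fake_fetch)  # cache hit, no download
cache.get('s3://bucket/c.jpg', fake_fetch)  # over budget: evicts b.jpg
print(cache.cached_urls())  # ['s3://bucket/a.jpg', 's3://bucket/c.jpg']
```

Each URL is downloaded once, and the entry that was touched least recently is the first to go when space runs out.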
**Benefits of URL-based storage:**
* **Lazy loading** - Only download files when needed
* **Deduplication** - Same URL is cached once
* **Incremental processing** - Add files without bulk downloads
* **Cloud-native** - Works directly with object storage
**For private S3 buckets:**
Configure AWS credentials using standard methods:
* Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
* AWS credentials file (`~/.aws/credentials`)
* IAM roles (when running on EC2/ECS)
## See also
* [Upload to S3](/howto/cookbooks/data/data-export-s3) - Store
generated media in S3/GCS
* [Import from CSV](/howto/cookbooks/data/data-import-csv) -
Load structured data
* [Extract frames from
videos](/howto/cookbooks/video/video-extract-frames) -
Process video files
* [Analyze images in
batch](/howto/cookbooks/images/vision-batch-analysis) -
AI vision on images
* [Configure API
keys](/howto/cookbooks/core/workflow-api-keys) -
Set up credentials
# Sample data for training and testing
Source: https://docs.pixeltable.com/howto/cookbooks/data/data-sampling
Create train, validation, and test splits in Pixeltable using reproducible row sampling, stratification, and seeded random shuffles.
Create training, validation, and test splits with random or stratified
sampling.
## Problem
You have a large dataset and need to create subsets for ML
training—random samples for quick experiments, stratified samples for
balanced classes, or reproducible splits for benchmarking.
## Solution
**What’s in this recipe:**
* Random sampling with `sample(n=...)`
* Percentage-based sampling with `sample(fraction=...)`
* Stratified sampling with `stratify_by=`
You use `query.sample()` to create random subsets, with optional
stratification for balanced class distribution.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Create a fresh directory
pxt.drop_dir('sampling_demo', force=True)
pxt.create_dir('sampling_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'sampling\_demo'.
Created table 'data'.
Inserting rows into \`data\`: 0 rows \[00:00, ? rows/s]
Inserting rows into \`data\`: 10 rows \[00:00, 857.13 rows/s]
Inserted 10 rows with 0 errors.
10 rows inserted, 20 values computed.
### Random sampling
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Sample exactly N rows
data.sample(n=5, seed=42).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Sample a percentage of rows
sample_50pct = data.sample(fraction=0.5, seed=42).collect()
```
### Stratified sampling
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Stratified sampling: 50% from each class
data.sample(fraction=0.5, stratify_by=data.label, seed=42).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Equal allocation: N rows from each class
data.sample(n_per_stratum=1, stratify_by=data.label, seed=42).collect()
```
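To make the idea of equal allocation concrete, here is a plain-Python sketch of drawing a fixed number of rows from each class. It illustrates the concept behind `n_per_stratum` under simple assumptions and is not Pixeltable's algorithm; the function `stratified_sample` and the sample rows are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(
    rows: list[dict], key: str, n_per_stratum: int, seed: int
) -> list[dict]:
    """Draw up to n_per_stratum rows from each distinct value of rows[key]."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    rng = random.Random(seed)  # seeded for reproducibility
    sample = []
    for label in sorted(groups):
        members = groups[label]
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


# Imbalanced toy data: 7 'pos' rows and 3 'neg' rows
rows = [{'id': i, 'label': 'pos' if i < 7 else 'neg'} for i in range(10)]
picked = stratified_sample(rows, key='label', n_per_stratum=2, seed=42)
```

The result contains two rows per class even though the classes are imbalanced in the input.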
### Sampling from filtered data
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Sample from filtered query (high-confidence predictions only)
data.where(data.score > 0.8).sample(n=3, seed=42).collect()
```
### Persist samples as tables
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a persistent table from a sample for dev/test
train_sample = data.sample(fraction=0.8, seed=42)
test_sample = data.sample(fraction=0.2, seed=43)
# Persist as new tables
train_table = pxt.create_table('sampling_demo/train', source=train_sample)
test_table = pxt.create_table('sampling_demo/test', source=test_sample)
```
Created table 'train'.
Inserting rows into \`train\`: 0 rows \[00:00, ? rows/s]
Inserting rows into \`train\`: 9 rows \[00:00, 3080.27 rows/s]
Created table 'test'.
Inserting rows into \`test\`: 0 rows \[00:00, ? rows/s]
Inserting rows into \`test\`: 3 rows \[00:00, 1333.92 rows/s]
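Note that the two samples above are drawn independently with different seeds, so the same row can land in both tables: the log shows 9 + 3 = 12 rows drawn from a 10-row table. When a strict hold-out set is needed, the split has to be disjoint. The sketch below shows one way to do that in plain Python by shuffling row identifiers once and cutting the list; the helper `split_ids` and the integer ids are hypothetical and not part of Pixeltable.

```python
import random


def split_ids(
    row_ids: list[int], train_fraction: float, seed: int
) -> tuple[list[int], list[int]]:
    """Shuffle ids once with a fixed seed, then cut into disjoint train/test lists."""
    shuffled = row_ids[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train_ids, test_ids = split_ids(list(range(10)), train_fraction=0.8, seed=42)
print(len(train_ids), len(test_ids))  # 8 2
```

Because both lists come from a single shuffle, no id appears in both, and the same seed always reproduces the same split.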
## Explanation
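**Sampling methods:** `sample(n=...)` returns a fixed number of rows, and `sample(fraction=...)` returns a share of the rows.
**Stratification options:** with `stratify_by=`, `fraction` samples the same share from each class, and `n_per_stratum` takes a fixed number of rows from each class.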
**Tips:**
* Always set `seed` for reproducible experiments
* Use stratified sampling for imbalanced datasets
* Combine with `.where()` to sample from subsets
## See also
* [Export for ML
training](/howto/cookbooks/data/data-export-pytorch) -
PyTorch DataLoader export
* [Import Hugging Face
datasets](/howto/cookbooks/data/data-import-huggingface) -
Load pre-split datasets
# Add watermarks to images
Source: https://docs.pixeltable.com/howto/cookbooks/images/img-add-watermarks
Add text or image watermarks to photos in Pixeltable using PIL-backed computed columns for branding, attribution, and rights protection.
## Problem
You need to add watermarks to hundreds of different images to protect
copyright, add branding, or mark drafts.
## Solution
**What’s in this recipe:**
* Create simple text watermarks
* Test transformations before applying
* Apply to multiple images automatically
You add watermarks to images using a custom UDF built on Pillow’s
`ImageDraw` module. This gives you full control over
watermark placement, font, transparency, and color.
You can iterate on transformations before adding them to your table. Use
`.select()` with `.collect()` to preview results on sample
images—nothing is stored in your table. If you want to collect only the
first few rows, use `.head(n)` instead of `.collect()`. Once you’re
satisfied, use `.add_computed_column()` to apply watermarks to all
images in your table.
For more on this workflow, see [Get fast feedback on
transformations](/howto/cookbooks/core/dev-iterative-workflow).
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
### Load images
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from PIL import Image, ImageDraw, ImageFont
# Create a fresh directory (drop existing if present)
pxt.drop_dir('image_demo', force=True)
pxt.create_dir('image_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'image\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('image_demo/watermarks', {'image': pxt.Image})
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View all results
t.collect()
```
## Explanation
**How the watermark technique works:**
The UDF creates a transparent overlay on top of each image. The overlay
is created with the same dimensions as the image
(`Image.new('RGBA', img.size, ...)`), so watermarks adapt automatically
whether you’re processing small thumbnails or large photos. The function
draws white text with semi-transparent fill (alpha=200, where 255 is
fully opaque), composites the overlay onto the original image using
`Image.alpha_composite()`, and converts back to RGB since most image
formats don’t support transparency.
**To customize the UDF:**
* Position: Change the `(x, y)` coordinates in the `position` variable
* Color: Modify the `(R, G, B, Alpha)` fill value (0-255 for each)
* Size: Adjust the font size parameter in
`ImageFont.load_default(size=40)`
* Font: Use `ImageFont.truetype('path/to/font.ttf', size)` for custom
fonts
**The Pixeltable workflow:**
In traditional databases, `.select()` just picks which columns to view.
In Pixeltable, `.select()` also lets you compute new transformations on
the fly—define new columns without storing them. This makes `.select()`
perfect for testing transformations before you commit them.
When you use `.select()`, you’re creating a query that doesn’t execute
until you call `.collect()`. You must use `.collect()` to execute the
query and return results—nothing is stored in your table. If you want to
collect only the first few rows, use `.head(n)` instead of `.collect()`
to test on a subset before processing your full dataset. Once satisfied,
use `.add_computed_column()` with the same expression to persist results
permanently.
For more on this workflow, see [Get fast feedback on
transformations](/howto/cookbooks/core/dev-iterative-workflow).
## See also
* [Test transformations with fast feedback
loops](/howto/cookbooks/core/dev-iterative-workflow)
* [Transform images with PIL
operations](/howto/cookbooks/images/img-pil-transforms)
* *Pillow techniques from [Real Python: Image Processing With the Python
Pillow
Library](https://realpython.com/image-processing-with-the-python-pillow-library/)*
# Adjust image opacity
Source: https://docs.pixeltable.com/howto/cookbooks/images/img-adjust-opacity
Adjust image transparency and alpha channels in Pixeltable with PIL operations for compositing layered images, overlays, and watermark effects.
## Problem
You need to make hundreds of images semi-transparent for backgrounds,
overlays, or watermarks.
## Solution
**What’s in this recipe:**
* Set image opacity (transparency level)
* Test transformations before applying
* Apply to multiple images automatically
You adjust image transparency using a custom UDF that modifies alpha
channels (relies on PIL/Pillow). This gives you precise control over
transparency levels.
You can iterate on transformations before adding them to your table. Use
`.select()` with `.collect()` to preview results on sample
images—nothing is stored in your table. If you want to collect only the
first few rows, use `.head(n)` instead of `.collect()`. Once you’re
satisfied, use `.add_computed_column()` to apply the opacity adjustment
to all images in your table.
For more on this workflow, see [Get fast feedback on
transformations](/howto/cookbooks/core/dev-iterative-workflow).
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
### Load images
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from PIL import Image
# Create a fresh directory (drop existing if present)
pxt.drop_dir('image_demo', force=True)
pxt.create_dir('image_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'image\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('image_demo/opacity', {'image': pxt.Image})
```
Inserting rows into \`opacity\`: 0 rows \[00:00, ? rows/s]
Inserting rows into \`opacity\`: 3 rows \[00:00, 545.05 rows/s]
Inserted 3 rows with 0 errors.
3 rows inserted, 6 values computed.
### Iterate: adjust opacity for a few images first
You define a custom function using the `@pxt.udf` decorator to make it
available in Pixeltable. Inside the function, you use standard PIL
(Pillow) operations to manipulate images. Pixeltable handles applying
your function to every row in your table.
**How it works:**
* All image manipulation (`.convert()`, `.split()`, `.point()`,
`.putalpha()`) comes from the PIL/Pillow library
* These are standard Python image operations—see [Pillow
docs](https://pillow.readthedocs.io/) for reference
* The `@pxt.udf` decorator lets Pixeltable apply your function to table
rows
* The opacity parameter (0.0 = fully transparent, 1.0 = fully opaque)
controls the alpha scaling
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
def set_opacity(img: Image.Image, opacity: float) -> Image.Image:
    """Set image opacity (0.0 = fully transparent, 1.0 = fully opaque)."""
    img = img.convert('RGBA')
    alpha = img.split()[3]  # Get alpha channel
    alpha = alpha.point(lambda p: int(p * opacity))  # Scale alpha values
    img.putalpha(alpha)
    return img
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Test 25%, 50%, and 75% opacity
t.select(
    t.image,
    alpha_25=set_opacity(t.image, 0.25),
    alpha_50=set_opacity(t.image, 0.5),
    alpha_75=set_opacity(t.image, 0.75),
).head(1)
```
### Add: adjust opacity for all images in your table
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create 50% opacity for backgrounds
t.add_computed_column(semi_transparent=set_opacity(t.image, 0.5))
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View original and semi-transparent side by side
t.collect()
```
## Explanation
**How the opacity technique works:**
The UDF modifies the alpha channel to control transparency. The function
converts the image to RGBA mode (which includes an alpha channel for
transparency), extracts the alpha channel with `.split()[3]`, scales all
values by the desired opacity factor using
`.point(lambda p: int(p * opacity))`, and applies it back with
`.putalpha()`. This preserves the original image while adjusting only
the transparency level.
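The scaling step is easy to check by hand. For an 8-bit alpha channel, the function passed to `.point()` maps each of the 256 possible values to a new one, so the whole operation amounts to a lookup table. The sketch below builds that table in plain Python; the helper name `alpha_lookup` is hypothetical.

```python
def alpha_lookup(opacity: float) -> list[int]:
    """Return the 256-entry table mapping each alpha value to its scaled value."""
    return [int(p * opacity) for p in range(256)]


table = alpha_lookup(0.5)
print(table[255], table[128], table[0])  # 127 64 0
```

Because `int()` truncates, a fully opaque pixel (255) at 50% opacity becomes 127 rather than 128.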
**To customize the UDF:**
* **Opacity levels**: Use 0.25 for very faint backgrounds, 0.5 for
standard transparency, 0.75 for subtle effects
* **Selective transparency**: Modify the lambda function in `.point()`
to apply different transparency to different pixel values
* **Preserve regions**: Add conditional logic to keep certain areas
fully opaque
**The Pixeltable workflow:**
In traditional databases, `.select()` just picks which columns to view.
In Pixeltable, `.select()` also lets you compute new transformations on
the fly—define new columns without storing them. This makes `.select()`
perfect for testing transformations before you commit them.
When you use `.select()`, you’re creating a query that doesn’t execute
until you call `.collect()`. You must use `.collect()` to execute the
query and return results—nothing is stored in your table. If you want to
collect only the first few rows, use `.head(n)` instead of `.collect()`
to test on a subset before processing your full dataset. Once satisfied,
use `.add_computed_column()` with the same expression to persist results
permanently.
For more on this workflow, see [Get fast feedback on
transformations](/howto/cookbooks/core/dev-iterative-workflow).
## See also
* [Test transformations with fast feedback
loops](/howto/cookbooks/core/dev-iterative-workflow)
* [Add watermarks to
images](/howto/cookbooks/images/img-add-watermarks)
* [Transform images with PIL
operations](/howto/cookbooks/images/img-pil-transforms)
# Apply image filters
Source: https://docs.pixeltable.com/howto/cookbooks/images/img-apply-filters
Apply blur, sharpen, edge detection, and other PIL image filters in Pixeltable using declarative computed columns to process datasets at scale.
## Problem
You need to apply filters to hundreds of images—blur, sharpen, edge
detection, and other enhancements.
## Solution
**What’s in this recipe:**
* Apply common image filters
* Test filters before applying
* Process multiple images in batch
You apply image filters (blur, sharpen, edge detection) to images in
your table using custom UDFs that wrap Pillow’s `ImageFilter` module.
Writing the UDFs yourself gives you control over filter parameters.
You can iterate on transformations before adding them to your table. Use
`.select()` with `.collect()` to preview results on sample
images—nothing is stored in your table. If you want to collect only the
first few rows, use `.head(n)` instead of `.collect()`. Once you’re
satisfied, use `.add_computed_column()` to apply the filter to all
images in your table.
For more on this workflow, see [Get fast feedback on
transformations](/howto/cookbooks/core/dev-iterative-workflow).
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
### Load images
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from PIL import ImageFilter
# Create a fresh directory (drop existing if present)
pxt.drop_dir('image_demo', force=True)
pxt.create_dir('image_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'image\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('image_demo/filters', {'image': pxt.Image})
```
Inserted 3 rows with 0 errors.
3 rows inserted, 6 values computed.
### Iterate: apply filters to a few images first
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
def apply_blur(img: pxt.Image) -> pxt.Image:
    """Apply blur filter."""
    return img.filter(ImageFilter.BLUR)


@pxt.udf
def apply_sharpen(img: pxt.Image) -> pxt.Image:
    """Apply sharpen filter."""
    return img.filter(ImageFilter.SHARPEN)


@pxt.udf
def apply_find_edges(img: pxt.Image) -> pxt.Image:
    """Apply edge detection filter."""
    return img.filter(ImageFilter.FIND_EDGES)


@pxt.udf
def apply_edge_enhance(img: pxt.Image) -> pxt.Image:
    """Apply edge enhancement filter."""
    return img.filter(ImageFilter.EDGE_ENHANCE)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Test blur and sharpen
t.select(t.image, apply_blur(t.image), apply_sharpen(t.image)).head(1)
```
### Add: apply filters to all images in your table
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add filter columns
t.add_computed_column(blurred=apply_blur(t.image))
t.add_computed_column(sharpened=apply_sharpen(t.image))
t.add_computed_column(edges=apply_find_edges(t.image))
t.add_computed_column(edge_enhanced=apply_edge_enhance(t.image))
```
Added 3 column values with 0 errors.
Added 3 column values with 0 errors.
Added 3 column values with 0 errors.
Added 3 column values with 0 errors.
3 rows updated, 3 values computed.
### View results
Compare original and filtered images.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Compare blur and sharpen
t.select(t.image, t.blurred, t.sharpened).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Compare edge detection filters
t.select(t.image, t.edges, t.edge_enhanced).collect()
```
## Explanation
**How the filter technique works:**
The UDFs wrap PIL’s `ImageFilter` module to apply convolution-based
filters to images. Each filter uses a predefined kernel that processes
pixel neighborhoods to achieve different effects. Blur averages
surrounding pixels to reduce detail, Sharpen enhances pixel differences
to increase detail, Find Edges detects boundaries between contrasting
regions, and Edge Enhance strengthens edges while preserving the full
image. You can apply multiple filters to the same image to create
different versions for analysis or visual effects.
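To make the kernel idea concrete, here is a minimal pure-Python sketch of a 3x3 convolution on a small grayscale grid. It is an illustration only, not Pillow's implementation: the helper `convolve3x3`, the border handling (border pixels are copied unchanged), and the two sample kernels are assumptions chosen for readability.

```python
def convolve3x3(pixels, kernel, scale=None):
    """Apply a 3x3 kernel to a 2D grid of grayscale values (0-255).

    Hypothetical helper for illustration, not part of Pillow or Pixeltable.
    Border pixels are copied unchanged to keep the sketch short.
    """
    if scale is None:
        # Default to the sum of the kernel weights (or 1 if they sum to zero).
        scale = sum(sum(row) for row in kernel) or 1
    height, width = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            total = 0
            for ky in range(3):
                for kx in range(3):
                    total += kernel[ky][kx] * pixels[y + ky - 1][x + kx - 1]
            # Clamp the result to the valid 0-255 range.
            out[y][x] = max(0, min(255, round(total / scale)))
    return out


# Averaging (box blur) kernel: every neighbor contributes equally.
BOX_BLUR = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
# Laplacian-style edge kernel: weights sum to zero, so flat regions map to 0.
EDGE = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]

# A single bright pixel on a dark background.
grid = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 90, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

blurred = convolve3x3(grid, BOX_BLUR)         # bright pixel is spread over its neighbors
edges = convolve3x3(grid, EDGE, scale=1)      # bright pixel stands out, neighbors go to 0
flat_edges = convolve3x3([[50] * 5 for _ in range(5)], EDGE, scale=1)
```

The blur kernel spreads the single bright value evenly across the interior, while the edge kernel responds only where neighboring values differ, which is why a uniform region produces no response.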
**To customize the UDFs:**
* **Blur intensity**: Use `ImageFilter.BoxBlur(radius)` or
`ImageFilter.GaussianBlur(radius)` for adjustable blur strength
* **Edge detection**: Combine with grayscale conversion for clearer edge
maps
* **Filter stacking**: Apply multiple filters in sequence for complex
effects
* **Custom kernels**: Use `ImageFilter.Kernel()` to define your own
convolution filters
**The Pixeltable workflow:**
In traditional databases, `.select()` just picks which columns to view.
In Pixeltable, `.select()` also lets you compute new transformations on
the fly—define new columns without storing them. This makes `.select()`
perfect for testing transformations before you commit them.
When you use `.select()`, you’re creating a query that doesn’t execute
until you call `.collect()`. You must use `.collect()` to execute the
query and return results—nothing is stored in your table. If you want to
collect only the first few rows, use `.head(n)` instead of `.collect()`
to test on a subset before processing your full dataset. Once satisfied,
use `.add_computed_column()` with the same expression to persist results
permanently.
For more on this workflow, see [Get fast feedback on
transformations](/howto/cookbooks/core/dev-iterative-workflow).
## See also
* [Test transformations with fast feedback
loops](/howto/cookbooks/core/dev-iterative-workflow)
* [Adjust image brightness and
contrast](/howto/cookbooks/images/img-brightness-contrast)
* *Pillow techniques from [Real Python: Image Processing With the Python
Pillow
Library](https://realpython.com/image-processing-with-the-python-pillow-library/)*
# Adjust image brightness and contrast
Source: https://docs.pixeltable.com/howto/cookbooks/images/img-brightness-contrast
Tune image brightness, contrast, saturation, and sharpness in Pixeltable using PIL ImageEnhance operations exposed as computed columns at scale.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
## Problem
You need to fix inconsistent lighting across hundreds of
images—adjusting brightness, contrast, and color saturation.
## Solution
**What’s in this recipe:**
* Adjust brightness, contrast, and saturation
* Test adjustments before applying
* Process multiple images in batch
You adjust brightness, contrast, and saturation for images in your table
using custom UDFs that wrap Pillow’s `ImageEnhance` module. Passing the
factor as a UDF argument lets you control enhancement levels to match
your needs.
You can iterate on transformations before adding them to your table. Use
`.select()` with `.collect()` to preview results on sample
images—nothing is stored in your table. If you want to collect only the
first few rows, use `.head(n)` instead of `.collect()`. Once you’re
satisfied, use `.add_computed_column()` to apply the adjustments to all
images in your table.
For more on this workflow, see [Get fast feedback on
transformations](/howto/cookbooks/core/dev-iterative-workflow).
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
### Load images
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from PIL import ImageEnhance
# Create a fresh directory (drop existing if present)
pxt.drop_dir('image_demo', force=True)
pxt.create_dir('image_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'image\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('image_demo/enhancements', {'image': pxt.Image})
```
Inserted 3 rows with 0 errors.
3 rows inserted, 6 values computed.
### Iterate: adjust brightness and contrast for a few images first
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
def adjust_brightness(img: pxt.Image, factor: float) -> pxt.Image:
    """Adjust brightness. factor < 1 = darker, > 1 = brighter."""
    return ImageEnhance.Brightness(img).enhance(factor)


@pxt.udf
def adjust_contrast(img: pxt.Image, factor: float) -> pxt.Image:
    """Adjust contrast. factor < 1 = lower, > 1 = higher."""
    return ImageEnhance.Contrast(img).enhance(factor)


@pxt.udf
def adjust_saturation(img: pxt.Image, factor: float) -> pxt.Image:
    """Adjust saturation. factor < 1 = less saturated, > 1 = more saturated."""
    return ImageEnhance.Color(img).enhance(factor)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Test brightness adjustments
t.select(
    t.image,
    adjust_brightness(t.image, 0.5),
    adjust_brightness(t.image, 1.5),
).head(1)
```
### Add: adjust brightness and contrast for all images in your table
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Brightness adjustments (1.0 = original)
t.add_computed_column(darker=adjust_brightness(t.image, 0.5))
t.add_computed_column(brighter=adjust_brightness(t.image, 1.5))
# Contrast adjustments
t.add_computed_column(low_contrast=adjust_contrast(t.image, 0.5))
t.add_computed_column(high_contrast=adjust_contrast(t.image, 2.0))
# Color saturation
t.add_computed_column(desaturated=adjust_saturation(t.image, 0.3))
t.add_computed_column(saturated=adjust_saturation(t.image, 2.0))
```
Added 3 column values with 0 errors.
Added 3 column values with 0 errors.
Added 3 column values with 0 errors.
Added 3 column values with 0 errors.
Added 3 column values with 0 errors.
Added 3 column values with 0 errors.
3 rows updated, 3 values computed.
### View results
Compare different enhancement levels side-by-side.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Compare brightness levels
t.select(t.image, t.darker, t.brighter).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Compare contrast levels
t.select(t.image, t.low_contrast, t.high_contrast).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Compare saturation levels
t.select(t.image, t.desaturated, t.saturated).collect()
```
## Explanation
**How the enhancement technique works:**
The UDFs wrap PIL’s `ImageEnhance` module to adjust visual properties of
images. Each enhancement type creates an enhancer object for the image,
then applies an enhancement factor. A factor of 1.0 leaves the image
unchanged, values below 1.0 decrease the property (darker, less
contrast, desaturated), and values above 1.0 increase it (brighter, more
contrast, saturated). You can apply different factors to the same image
to create multiple variations for comparison or different use cases.
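As a rough per-pixel picture of what the factor does: Pillow's enhancers blend between a "degenerate" version of the image and the original (black for brightness, a mean-gray image for contrast, a grayscale copy for color). The sketch below applies that blend to single channel values. The helper `enhance_value` and the sample numbers are hypothetical and only meant to illustrate the arithmetic.

```python
def enhance_value(original: float, degenerate: float, factor: float) -> int:
    """Blend one channel value between a degenerate value and the original.

    Hypothetical helper for illustration. factor=0.0 returns the degenerate
    value, factor=1.0 returns the original, and factors above 1.0 extrapolate
    past the original. The result is clamped to the 0-255 range.
    """
    value = degenerate + factor * (original - degenerate)
    return max(0, min(255, round(value)))


pixel = 100      # one channel value of the original image
black = 0        # degenerate value for brightness
mean_gray = 128  # assumed mean gray level of the image, used for contrast

darker = enhance_value(pixel, black, 0.5)             # half as bright
brighter = enhance_value(pixel, black, 1.5)           # 50% brighter
unchanged = enhance_value(pixel, black, 1.0)          # factor 1.0 is a no-op
low_contrast = enhance_value(pixel, mean_gray, 0.5)   # pulled toward the mean
high_contrast = enhance_value(pixel, mean_gray, 2.0)  # pushed away from the mean
```

For brightness the blend reduces to a plain multiplication, while for contrast the same factor moves values toward or away from the mean gray level.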
**To customize the UDFs:**
* **Brightness factors**: Use 0.5 for darker images, 1.5 for brighter,
or adjust to match your lighting needs
* **Contrast factors**: Use 0.5 for lower contrast, 2.0 for higher
contrast, or fine-tune for image clarity
* **Saturation factors**: Use 0.3 for desaturated/muted colors, 2.0 for
vibrant colors, or 0.0 for complete grayscale
* **Combine adjustments**: Apply multiple enhancements to create complex
transformations
**The Pixeltable workflow:**
In traditional databases, `.select()` just picks which columns to view.
In Pixeltable, `.select()` also lets you compute new transformations on
the fly—define new columns without storing them. This makes `.select()`
perfect for testing transformations before you commit them.
When you use `.select()`, you’re creating a query that doesn’t execute
until you call `.collect()`. You must use `.collect()` to execute the
query and return results—nothing is stored in your table. If you want to
collect only the first few rows, use `.head(n)` instead of `.collect()`
to test on a subset before processing your full dataset. Once satisfied,
use `.add_computed_column()` with the same expression to persist results
permanently.
For more on this workflow, see [Get fast feedback on
transformations](/howto/cookbooks/core/dev-iterative-workflow).
## See also
* [Test transformations with fast feedback
loops](/howto/cookbooks/core/dev-iterative-workflow)
* [Apply image
filters](/howto/cookbooks/images/img-apply-filters)
* *Pillow techniques from [Real Python: Image Processing With the Python
Pillow
Library](https://realpython.com/image-processing-with-the-python-pillow-library/)*
# Detect objects in images
Source: https://docs.pixeltable.com/howto/cookbooks/images/img-detect-objects
Detect objects in images at scale in Pixeltable using YOLOX, DETR, and other vision models with bounding box outputs and confidence scores.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Automatically identify and locate objects in images using YOLOX object
detection models.
## Problem
You have images that need object detection—identifying what objects are
present and where they’re located. Manual labeling is slow and
expensive.
## Solution
**What’s in this recipe:**
* Detect objects using YOLOX models (runs locally, no API needed)
* Get bounding boxes and class labels
* Filter detections by confidence threshold
You add a computed column that runs YOLOX on each image. Detection
happens automatically when you insert new images.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable pixeltable-yolox
```
### Load images
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.yolox import yolox
# Create a fresh directory
pxt.drop_dir('detection_demo', force=True)
pxt.create_dir('detection_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'detection\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View summary
images.select(
    images.image, images.object_count, images.object_classes
).collect()
```
## Explanation
**YOLOX model sizes:**
**Detection output format:**
The `detections` dictionary contains:
* `labels`: List of class names (e.g., “person”, “car”, “dog”)
* `boxes`: Bounding box coordinates \[x1, y1, x2, y2]
* `scores`: Confidence scores (0-1)
**Adjusting threshold:**
* Higher threshold (0.7-0.9): Fewer detections, higher confidence
* Lower threshold (0.3-0.5): More detections, may include false
positives
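The effect of the threshold can be sketched in plain Python, assuming a detections dictionary shaped as described above (`labels`, `boxes`, `scores`). The helper `filter_detections` and the sample values are made up for illustration; in the recipe the threshold is normally applied by the model call itself.

```python
def filter_detections(detections: dict, threshold: float) -> dict:
    """Keep only detections whose confidence score is at least `threshold`.

    Hypothetical helper for illustration; not part of Pixeltable.
    """
    keep = [i for i, score in enumerate(detections['scores']) if score >= threshold]
    return {
        'labels': [detections['labels'][i] for i in keep],
        'boxes': [detections['boxes'][i] for i in keep],
        'scores': [detections['scores'][i] for i in keep],
    }


# Made-up detections for a single image.
sample = {
    'labels': ['person', 'dog', 'car'],
    'boxes': [[10, 20, 110, 220], [150, 80, 260, 200], [300, 40, 420, 120]],
    'scores': [0.92, 0.55, 0.31],
}

strict = filter_detections(sample, 0.7)   # fewer detections, higher confidence
lenient = filter_detections(sample, 0.3)  # more detections, may include false positives
```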
## See also
* [Extract frames from
videos](/howto/cookbooks/video/video-extract-frames) -
Detect objects in video frames
* [Analyze images in
batch](/howto/cookbooks/images/vision-batch-analysis) -
AI vision analysis
* [Find similar
images](/howto/cookbooks/search/search-similar-images) -
Visual similarity search
# Compare object detection and panoptic segmentation
Source: https://docs.pixeltable.com/howto/cookbooks/images/img-detection-vs-segmentation
Compare object detection bounding boxes with panoptic segmentation masks in Pixeltable to pick the right vision approach for your task.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Understand when to use bounding boxes versus pixel-level masks for image
analysis.
**What’s in this recipe:**
* Run object detection to get bounding boxes and labels
* Run panoptic segmentation to get pixel-level masks
* Visualize and compare outputs side-by-side
## Problem
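You need to analyze objects in images, but there are two approaches:
object detection, which returns bounding boxes and labels, and panoptic
segmentation, which returns pixel-level masks. Which should you use?
Detection is faster but approximate. Segmentation is slower but precise.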
## Solution
Run both approaches on the same images using DETR models and compare the
results.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable torch transformers timm
```
### Load images
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import numpy as np
import pixeltable as pxt
from pixeltable.functions.huggingface import (
    detr_for_object_detection,
    detr_for_segmentation,
)
from pixeltable.functions.vision import bboxes_draw, overlay_segmentation
pxt.drop_dir('detection_vs_seg', force=True)
pxt.create_dir('detection_vs_seg')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'detection\_vs\_seg'.
Created table 'images'.
Inserted 2 rows with 0 errors in 0.22 s (9.21 rows/s)
2 rows inserted.
### Run object detection
The `detr_for_object_detection` function returns bounding boxes, labels,
and confidence scores.
**Parameters:**
* `model_id`: DETR variant (`facebook/detr-resnet-50` or
`facebook/detr-resnet-101`)
* `threshold`: Confidence threshold (0.0-1.0). Higher = fewer but more
confident detections
**Output:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{'boxes': [[x1, y1, x2, y2], ...], 'scores': [0.98, ...], 'label_text': ['person', ...]}
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
images.add_computed_column(
    detections=detr_for_object_detection(
        images.image, model_id='facebook/detr-resnet-50', threshold=0.8
    )
)
```
Added 2 column values with 0 errors in 4.09 s (0.49 rows/s)
2 rows updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View detection results
images.select(images.image, images.detections).collect()
```
### Visualize detections with bounding boxes
Use `bboxes_draw` to overlay the detection results on the original
image.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
images.add_computed_column(
    detection_viz=bboxes_draw(
        images.image,
        boxes=images.detections.boxes,
        labels=images.detections.label_text,
        fill=True,
        width=2,
    )
)
```
Added 2 column values with 0 errors in 0.03 s (58.89 rows/s)
2 rows updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
images.select(images.detection_viz).collect()
```
### Run panoptic segmentation
The `detr_for_segmentation` function returns pixel-level masks and
segment metadata.
**Parameters:**
* `model_id`: Segmentation model (`facebook/detr-resnet-50-panoptic`)
* `threshold`: Confidence threshold for filtering segments
**Output:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{
    'segmentation': np.ndarray,  # (H, W) array where each pixel = segment ID
    'segments_info': [{'id': 1, 'label_text': 'person', 'score': 0.98}, ...]
}
```
> **Note:** The full segmentation output contains a numpy array that
> can’t be stored as JSON. We store just the `segments_info` metadata
> and compute the pixel-level visualization inline.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Store just the segments_info (JSON-serializable) as a computed column
# The segmentation array will be computed inline for visualization
seg_expr = detr_for_segmentation(
    images.image,
    model_id='facebook/detr-resnet-50-panoptic',
    threshold=0.5,
)
images.add_computed_column(segments_info=seg_expr.segments_info)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View stored segmentation info
images.select(images.image, images.segments_info).collect()
```
### Visualize segmentation with colored overlay
Use `overlay_segmentation` to visualize the pixel masks with colored
regions and contours.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Compute segmentation visualization inline
# Cast the segmentation array to the proper type for overlay_segmentation
seg_expr = detr_for_segmentation(
    images.image,
    model_id='facebook/detr-resnet-50-panoptic',
    threshold=0.5,
)
segmentation_map = seg_expr.segmentation.astype(
    pxt.Array[(None, None), np.int32]
)
images.select(
    segmentation_viz=overlay_segmentation(
        images.image,
        segmentation_map,
        alpha=0.5,
        draw_contours=True,
        contour_thickness=2,
    )
).collect()
```
### Compare side-by-side
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Side-by-side comparison: original, detection, segmentation
seg_expr = detr_for_segmentation(
    images.image,
    model_id='facebook/detr-resnet-50-panoptic',
    threshold=0.5,
)
segmentation_map = seg_expr.segmentation.astype(
    pxt.Array[(None, None), np.int32]
)
images.select(
    images.image,
    images.detection_viz,
    segmentation_viz=overlay_segmentation(
        images.image,
        segmentation_map,
        alpha=0.5,
        draw_contours=True,
        contour_thickness=2,
    ),
).collect()
```
### Count objects per image
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Count objects per image (using stored columns)
images.select(
    images.image,
    num_detections=images.detections.boxes.apply(len, col_type=pxt.Int),
    num_segments=images.segments_info.apply(len, col_type=pxt.Int),
).collect()
```
## Explanation
Detection gives fast, approximate locations. Segmentation gives slower
but precise boundaries.
### Capability comparison
### Performance tradeoffs
### When to use each
**Choose detection when:**
* You need to know *what* objects are present and *where*
(approximately)
* Speed matters (detection is 2x faster)
* You need search, filtering, or counting
* Bounding boxes suffice for visualization
**Choose segmentation when:**
* You need *exact* object boundaries (pixel-perfect masks)
* You’re doing image editing, compositing, or AR
* You need to measure actual object area/coverage
* You want scene composition analysis (what % is sky vs buildings)
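The last two points can be illustrated with a toy example that compares the area of a bounding box with the area of the mask it encloses, and computes per-segment coverage of the whole image. This is a plain-Python sketch on a hand-written segmentation map. The helpers `bbox_area` and `segment_coverage` are hypothetical, and the box is assumed to use `[x1, y1, x2, y2]` coordinates with an exclusive lower-right corner.

```python
def bbox_area(box):
    """Area in pixels of an [x1, y1, x2, y2] box (lower-right corner exclusive)."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def segment_coverage(segmentation):
    """Map each segment ID to the fraction of image pixels it covers."""
    counts = {}
    total = 0
    for row in segmentation:
        for seg_id in row:
            counts[seg_id] = counts.get(seg_id, 0) + 1
            total += 1
    return {seg_id: n / total for seg_id, n in counts.items()}


# Toy 4x6 segmentation map: 0 = background, 1 = a triangular object.
segmentation = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
]

coverage = segment_coverage(segmentation)
mask_pixels = sum(row.count(1) for row in segmentation)  # exact object area
box_pixels = bbox_area([1, 1, 4, 4])                     # tightest enclosing box
```

Here the box reports 9 pixels for an object that actually covers 6, which is the kind of overestimate that makes masks preferable when you need true area or coverage.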
## See also
* [Detect objects in images](./img-detect-objects) - Object detection
with YOLOX
* [Visualize detections](./img-visualize-detections) - Draw bounding
boxes and labels
* [DETR
documentation](https://huggingface.co/docs/transformers/model_doc/detr) -
Hugging Face model docs
# Generate captions for images
Source: https://docs.pixeltable.com/howto/cookbooks/images/img-generate-captions
Generate natural language captions for images in Pixeltable using BLIP, GPT-4 Vision, and other multimodal models with computed columns.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Automatically create descriptive captions for images using AI vision
models.
## Problem
You have a collection of images that need captions—for accessibility,
SEO, content management, or searchability. Writing captions manually
doesn’t scale.
## Solution
**What’s in this recipe:**
* Generate captions using OpenAI’s vision models
* Customize caption style (short, detailed, SEO-focused)
* Process images in batch automatically
You add a computed column that sends each image to a vision model with a
captioning prompt. New images are captioned automatically on insert.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```
### Load images
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.openai import chat_completions
# Create a fresh directory
pxt.drop_dir('caption_demo', force=True)
pxt.create_dir('caption_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'caption\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert sample images
image_urls = [
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000106.jpg',
]
images.insert([{'image': url} for url in image_urls])
```
Inserted 3 rows with 0 errors in 0.12 s (25.17 rows/s)
3 rows inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View images
images.collect()
```
### Generate captions
Add a computed column that generates captions using the vision model:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add caption using OpenAI vision
messages = [
    {
        'role': 'user',
        'content': [
            {
                'type': 'text',
                'text': 'Write a concise, descriptive caption for this image in one sentence.',
            },
            {'type': 'image_url', 'image_url': images.image},
        ],
    }
]
images.add_computed_column(
    caption=chat_completions(messages, model='gpt-4o-mini')
)
```
Added 3 column values with 0 errors in 4.62 s (0.65 rows/s)
3 rows updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View images with captions
images.select(
    images.image, images.caption['choices'][0]['message']['content']
).collect()
```
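The JSON path in the query above follows the nesting of a chat completion response, where the generated text sits under the first entry of `choices`. A minimal sketch of the same lookup on a plain dictionary, using a made-up response and a hypothetical helper `extract_caption`:

```python
def extract_caption(response: dict) -> str:
    """Return the text of the first choice in a chat-completion-style response.

    Hypothetical helper for illustration; not part of Pixeltable or the OpenAI SDK.
    """
    return response['choices'][0]['message']['content'].strip()


# Made-up response with the same nesting as the JSON path used in the query.
sample_response = {
    'choices': [
        {
            'message': {
                'role': 'assistant',
                'content': ' A child holding an umbrella. ',
            }
        }
    ]
}

caption = extract_caption(sample_response)
```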
### Different caption styles
You can generate multiple caption styles for different uses:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add alt text for accessibility (brief)
messages = [
    {
        'role': 'user',
        'content': [
            {
                'type': 'text',
                'text': 'Write a brief alt text for this image (under 125 characters) for screen readers.',
            },
            {'type': 'image_url', 'image_url': images.image},
        ],
    }
]
images.add_computed_column(
    alt_text=chat_completions(messages, model='gpt-4o-mini')
)
```
Added 3 column values with 0 errors in 3.51 s (0.85 rows/s)
3 rows updated.
Added 3 column values with 0 errors in 11.28 s (0.27 rows/s)
3 rows updated.
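Because each style differs only in its instruction text, the message structure can be factored into a small helper. The sketch below is one possible way to do that. `build_caption_messages` and `CAPTION_PROMPTS` are hypothetical names, the two prompts are the ones used in this recipe, and a placeholder string stands in for the image column so the snippet runs without Pixeltable.

```python
def build_caption_messages(instruction: str, image_ref) -> list:
    """Build a single-turn user message with an instruction and an image.

    Hypothetical helper for illustration. In the recipe, `image_ref` would be
    the image column expression (images.image).
    """
    return [
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': instruction},
                {'type': 'image_url', 'image_url': image_ref},
            ],
        }
    ]


# Instruction text per caption style (both taken from this recipe).
CAPTION_PROMPTS = {
    'caption': 'Write a concise, descriptive caption for this image in one sentence.',
    'alt_text': 'Write a brief alt text for this image (under 125 characters) for screen readers.',
}

# A string stands in for the image column here.
messages_by_style = {
    name: build_caption_messages(prompt, '<image placeholder>')
    for name, prompt in CAPTION_PROMPTS.items()
}
```

Each entry in `messages_by_style` has the same shape as the `messages` lists passed to `chat_completions` in this recipe.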
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View all caption types
images.select(
    images.image,
    images.caption['choices'][0]['message']['content'],
    images.alt_text['choices'][0]['message']['content'],
    images.description['choices'][0]['message']['content'],
).collect()
```
## Explanation
**Caption prompt patterns:**
**Model selection:**
* `gpt-4o-mini`: Fast and affordable, good for most captioning tasks
* `gpt-4o`: Higher quality for complex images or detailed descriptions
## See also
* [Analyze images in
batch](/howto/cookbooks/images/vision-batch-analysis) -
Run custom prompts on images
* [Extract structured data from
images](/howto/cookbooks/images/vision-structured-output) -
Get JSON from images
* [Find similar
images](/howto/cookbooks/search/search-similar-images) -
Visual similarity search
# Transform images with AI-powered editing
Source: https://docs.pixeltable.com/howto/cookbooks/images/img-image-to-image
Run image-to-image transformations in Pixeltable with diffusion models, style transfer, and AI editing APIs through computed columns.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
## Problem
You have a batch of images that need AI-powered transformations—like
turning photos into paintings, adding stylistic effects, or modifying
content based on text prompts.
## Solution
**What’s in this recipe:**
* Transform images using text prompts with Hugging Face Stable Diffusion
models
* Control transformation strength and quality settings
* Process batches of images automatically
You can iterate on transformations before adding them to your table. Use
`.select()` with `.collect()` to preview results on sample
images—nothing is stored in your table. If you want to collect only the
first few rows, use `.head(n)` instead of `.collect()`. Once you’re
satisfied, use `.add_computed_column()` to apply the transformation to
all images in your table.
For more on this workflow, see [Get fast feedback on
transformations](/howto/cookbooks/core/dev-iterative-workflow).
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable torch transformers diffusers accelerate
```
### Load images
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.huggingface import image_to_image
# Create a fresh directory (drop existing if present)
pxt.drop_dir('img2img_demo', force=True)
pxt.create_dir('img2img_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/cpestano/.pixeltable/pgdata
Created directory 'img2img\_demo'.
Inserted 2 rows with 0 errors in 0.49 s (4.07 rows/s)
2 rows inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View original images and prompts
t.collect()
```
### Iterate: test transformation on a single image
Use `.select()` to define the transformation, then `.head(n)` to preview
results on a subset of images. Nothing is stored in your table.
The `image_to_image` function requires:
* `image`: The source image to transform
* `prompt`: Text describing the desired output
* `model_id`: A Hugging Face model ID that supports image-to-image
(e.g., `stable-diffusion-v1-5/stable-diffusion-v1-5`)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Preview transformation on first image
t.select(
    t.image,
    t.prompt,
    image_to_image(
        t.image,
        t.prompt,
        model_id='stable-diffusion-v1-5/stable-diffusion-v1-5',
    ),
).head(1)
```
### Iterate: adjust transformation strength
You control how much the model modifies the original image using
`strength` (0.0-1.0):
* **Lower values** (0.3-0.5): Subtle changes, preserves more of the
original
* **Higher values** (0.7-1.0): Dramatic changes, more creative freedom
You pass additional parameters through `model_kwargs`. For example,
`negative_prompt` is text describing what you don’t want in the output.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Preview with lower strength (more preservation of original)
t.select(
    t.image,
    t.prompt,
    t.negative_prompt,
    image_to_image(
        t.image,
        t.prompt,
        model_id='stable-diffusion-v1-5/stable-diffusion-v1-5',
        model_kwargs={
            'negative_prompt': t.negative_prompt,
            'strength': 0.5,
            'num_inference_steps': 30,
        },
    ),
).head(1)
```
### Add: apply transformation to all images
Once you’re satisfied with the results, use `.add_computed_column()`
with the same expression. This processes all rows and stores the results
permanently in your table.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Save as computed column
t.add_computed_column(
    transformed=image_to_image(
        t.image,
        t.prompt,
        model_id='stable-diffusion-v1-5/stable-diffusion-v1-5',
        model_kwargs={
            'strength': 0.5,
            'num_inference_steps': 25,
            'negative_prompt': t.negative_prompt,
        },
    )
)
```
Added 2 column values with 0 errors in 53.83 s (0.04 rows/s)
2 rows updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View original and transformed images side by side
t.select(t.image, t.prompt, t.negative_prompt, t.transformed).collect()
```
### Use reproducible results with seeds
Set the `seed` parameter to get the same output every time you run the
transformation.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add reproducible transformation
t.add_computed_column(
    transformed_seeded=image_to_image(
        t.image,
        t.prompt,
        model_id='stable-diffusion-v1-5/stable-diffusion-v1-5',
        seed=42,
        model_kwargs={
            'strength': 0.5,
            'negative_prompt': t.negative_prompt,
        },
    )
)
```
Added 2 column values with 0 errors in 96.24 s (0.02 rows/s)
2 rows updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View results
t.select(t.image, t.transformed_seeded).collect()
```
## Explanation
**How image-to-image works:**
Image-to-image diffusion models take an existing image and a text
prompt, then generate a new image that blends the structure of the
original with the guidance from the prompt. The `strength` parameter
controls the balance—lower values preserve more of the original, while
higher values allow more dramatic transformations.
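To make the effect of `strength` more concrete, the sketch below models how it may interact with `num_inference_steps`. It assumes that the underlying diffusers pipeline skips the first `(1 - strength)` fraction of the denoising schedule, so only part of the requested steps actually run. `effective_steps` is a hypothetical helper written for illustration; it is not part of Pixeltable or diffusers.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps an image-to-image run performs.

    Assumption: only int(num_inference_steps * strength) steps are executed,
    because the original image is noised part of the way, not fully.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError('strength must be between 0.0 and 1.0')
    return min(int(num_inference_steps * strength), num_inference_steps)


# Compare a few strengths for a 30-step schedule
for strength in (0.25, 0.5, 1.0):
    print(strength, effective_steps(30, strength))
```

Under this assumption, lowering `strength` both preserves more of the original image and shortens the run, which is worth keeping in mind when you tune the two parameters together.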
**Model compatibility:**
The `image_to_image` UDF uses `AutoPipelineForImage2Image` from the
diffusers library, which automatically detects the model type and
selects the appropriate pipeline. You can use any compatible model:
* `stable-diffusion-v1-5/stable-diffusion-v1-5` - General-purpose, runs
on most hardware
* `stabilityai/stable-diffusion-xl-base-1.0` - Higher quality, needs
more VRAM
**Key parameters:**
* `strength` (0.0-1.0): How much to transform the image
* `negative_prompt`: Text describing what to avoid in the generated
image (e.g., “blurry, low quality”)
* `num_inference_steps`: Quality vs speed tradeoff (more steps = better
quality)
* `guidance_scale`: How closely to follow the prompt (7-8 is typical)
* `seed`: For reproducible results
## See also
* [Apply filters to
images](/howto/cookbooks/images/img-apply-filters)
* [Generate captions for
images](/howto/cookbooks/images/img-generate-captions)
* [Hugging Face image-to-image
models](https://huggingface.co/models?pipeline_tag=image-to-image)
# Transform images with PIL operations
Source: https://docs.pixeltable.com/howto/cookbooks/images/img-pil-transforms
Resize, crop, rotate, and transform images at scale in Pixeltable using PIL operations exposed as built-in UDFs for computed columns.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
## Problem
You need to resize, rotate, crop, or convert hundreds of images—and keep
track of all the transformed versions.
## Solution
**What’s in this recipe:**
* Basic image operations (resize, rotate, flip, crop)
* Track image properties
* Iterate on transformations before adding to your table
You apply PIL transformations (resize, rotate, flip, crop) to images in
your table using Pixeltable’s built-in image functions—common operations
that work directly on image columns.
You can iterate on transformations before adding them to your table. Use
`.select()` with `.collect()` to preview results on sample
images—nothing is stored in your table. If you want to collect only the
first few rows, use `.head(n)` instead of `.collect()`. Once you’re
satisfied, use `.add_computed_column()` to apply the transformation to
all images in your table.
For more on this workflow, see [Get fast feedback on
transformations](/howto/cookbooks/core/dev-iterative-workflow).
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
### Load images
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Create a fresh directory (drop existing if present)
pxt.drop_dir('image_demo', force=True)
pxt.create_dir('image_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'image\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('image_demo/images', {'image': pxt.Image})
```
Inserting rows into \`images\`: 0 rows \[00:00, ? rows/s]
Inserting rows into \`images\`: 3 rows \[00:00, 708.38 rows/s]
Inserted 3 rows with 0 errors.
3 rows inserted, 6 values computed.
### Iterate: check image properties for a few images first
Use `.select()` to define the transformation, then `.collect()` to
execute and return results. If you want to collect only the first few
rows, use `.head(n)` instead of `.collect()`. Nothing is stored in your
table.
Pixeltable includes these built-in functions for image properties:
* `.height` - Get image height in pixels
* `.width` - Get image width in pixels
* `.mode` - Get color mode (RGB, RGBA, L for grayscale, etc.)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Preview the properties
t.select(t.image, t.image.height, t.image.width, t.image.mode).collect()
```
### Add: check image properties for all images in your table
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Save as computed columns
t.add_computed_column(height=t.image.height)
t.add_computed_column(width=t.image.width)
t.add_computed_column(mode=t.image.mode) # RGB, RGBA, L (grayscale), etc.
```
Added 3 column values with 0 errors.
Added 3 column values with 0 errors.
Added 3 column values with 0 errors.
3 rows updated, 6 values computed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View images with computed height, width, and mode columns
t.collect()
```
### Iterate: resize a few images first
Use `.select()` to define the transformation, then `.collect()` to
execute and return results. If you want to collect only the first few
rows, use `.head(n)` instead of `.collect()`. Nothing is stored in your
table.
Pixeltable includes a built-in function for resizing image files with
PIL:
* `.resize((width, height))` - Change image dimensions (pass the target size as a tuple)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Preview the resize operation
t.select(t.image, t.image.resize((224, 224))).head(1)
```
### Add: resize all images in your table
Once you’re satisfied with the results, use `.add_computed_column()`
with the same expression. This processes all rows and stores the results
permanently in your table.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Save as computed column
t.add_computed_column(resized=t.image.resize((224, 224)))
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View images with resized column
t.collect()
```
### Iterate: rotate a few images first
Use `.select()` to define the transformation, then `.collect()` to
execute and return results. If you want to collect only the first few
rows, use `.head(n)` instead of `.collect()`. Nothing is stored in your
table.
Pixeltable includes a built-in function for rotating image files with
PIL:
* `.rotate(degrees)` - Rotate image counterclockwise by the specified number of degrees
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Preview the rotation
t.select(t.image, t.image.rotate(90)).head(1)
```
### Add: rotate all images in your table
Once you’re satisfied with the results, use `.add_computed_column()`
with the same expression. This processes all rows and stores the results
permanently in your table.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Save as computed column
t.add_computed_column(rotated=t.image.rotate(90))
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View images with rotated column
t.collect()
```
### Iterate: flip a few images first
Use `.select()` to define the transformation, then `.collect()` to
execute and return results. If you want to collect only the first few
rows, use `.head(n)` instead of `.collect()`. Nothing is stored in your
table.
Pixeltable includes a built-in function for transposing image files with
PIL (note that for this transform you need to import `Image` from PIL to
access the `FLIP_*` constants):
* `.transpose(Image.FLIP_TOP_BOTTOM)` - Flip image vertically
* `.transpose(Image.FLIP_LEFT_RIGHT)` - Mirror image horizontally
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Import PIL Image to access flip constants
from PIL import Image
# Preview both flip operations
t.select(
t.image,
t.image.transpose(Image.FLIP_TOP_BOTTOM),
t.image.transpose(Image.FLIP_LEFT_RIGHT),
).head(1)
```
### Add: flip all images in your table
Once you’re satisfied with the results, use `.add_computed_column()`
with the same expression. This processes all rows and stores the results
permanently in your table.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Flip vertically (top to bottom)
t.add_computed_column(flip_v=t.image.transpose(Image.FLIP_TOP_BOTTOM))
# Flip horizontally (left to right, mirror effect)
t.add_computed_column(flip_h=t.image.transpose(Image.FLIP_LEFT_RIGHT))
```
Added 3 column values with 0 errors.
Added 3 column values with 0 errors.
3 rows updated, 3 values computed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View original and flipped versions side by side
t.select(t.image, t.flip_v, t.flip_h).collect()
```
### Iterate: crop a few images first
Use `.select()` to define the transformation, then `.collect()` to
execute and return results. If you want to collect only the first few
rows, use `.head(n)` instead of `.collect()`. Nothing is stored in your
table.
Pixeltable includes a built-in function for cropping image files with
PIL:
* `.crop(box)` - Extract a rectangular region from the image (box
format: `(left, top, right, bottom)`)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Preview the center crop
# Box format: (left, top, right, bottom)
t.select(
t.image,
t.image.crop(
(
t.image.width // 4,
t.image.height // 4,
3 * t.image.width // 4,
3 * t.image.height // 4,
)
),
).head(1)
```
### Add: crop all images in your table
Once you’re satisfied with the results, use `.add_computed_column()`
with the same expression. This processes all rows and stores the results
permanently in your table.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Save as computed column
t.add_computed_column(
center_crop=t.image.crop(
(
t.image.width // 4,
t.image.height // 4,
3 * t.image.width // 4,
3 * t.image.height // 4,
)
)
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View center-cropped images
t.select(t.center_crop).collect()
```
## Explanation
**How PIL transformations work in Pixeltable:**
Pixeltable provides built-in functions that wrap PIL (Pillow) operations
for image manipulation. These functions work directly on image columns
in your table—no need to write loops or manage file I/O. When you call
`.resize()`, `.rotate()`, or other methods on an image column,
Pixeltable handles applying the transformation to each image
automatically.
All these transformations use standard PIL operations under the hood.
For more details on PIL functionality, see the [Pillow
documentation](https://pillow.readthedocs.io/).
**To customize transformations:**
* **Resize**: Change dimensions with `.resize((width, height))` -
specify target size in pixels
* **Rotate**: Rotate counterclockwise with `.rotate(degrees)` - use
negative values for clockwise rotation
* **Flip**: Use `.transpose(Image.FLIP_LEFT_RIGHT)` for horizontal
mirror or `.transpose(Image.FLIP_TOP_BOTTOM)` for vertical flip
* **Crop**: Extract regions with `.crop((left, top, right, bottom))` -
coordinates are in pixels from top-left origin
* **Properties**: Access `.width`, `.height`, and `.mode` to get image
dimensions and color mode
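The center-crop expression used in this recipe is easier to check with concrete numbers. The sketch below repeats the same integer arithmetic in plain Python for a single image size; `center_crop_box` is a hypothetical helper for illustration, and the 640×480 size is an assumed example.

```python
def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) for the central half of an image."""
    return (width // 4, height // 4, 3 * width // 4, 3 * height // 4)


# Box for an assumed 640x480 image
box = center_crop_box(640, 480)
left, top, right, bottom = box
print(box)                        # (160, 120, 480, 360)
print(right - left, bottom - top)  # 320 240
```

The crop keeps half the width and half the height, centered on the image, so the result has one quarter of the original area.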
**The Pixeltable workflow:**
In traditional databases, `.select()` just picks which columns to view.
In Pixeltable, `.select()` also lets you compute new transformations on
the fly—define new columns without storing them. This makes `.select()`
perfect for testing transformations before you commit them.
When you use `.select()`, you’re creating a query that doesn’t execute
until you call `.collect()`. You must use `.collect()` to execute the
query and return results—nothing is stored in your table. If you want to
collect only the first few rows, use `.head(n)` instead of `.collect()`
to test on a subset before processing your full dataset. Once satisfied,
use `.add_computed_column()` with the same expression to persist results
permanently.
For more on this workflow, see [Get fast feedback on
transformations](/howto/cookbooks/core/dev-iterative-workflow).
## See also
* [Convert RGB images to
grayscale](/howto/cookbooks/images/img-rgb-to-grayscale)
* [Apply filters to
images](/howto/cookbooks/images/img-apply-filters)
* [Test transformations with fast feedback
loops](/howto/cookbooks/core/dev-iterative-workflow)
# Convert color images to grayscale
Source: https://docs.pixeltable.com/howto/cookbooks/images/img-rgb-to-grayscale
Convert RGB images to grayscale in Pixeltable using PIL mode conversion on computed columns for preprocessing, OCR, and machine learning pipelines.
## Problem
You need to convert color images to grayscale for analysis,
preprocessing, or model inputs that require single-channel images.
Different conversion methods produce different results—you need to
choose the right approach for your use case.
## Solution
**What’s in this recipe:**
* Simple conversion with PIL
* Perceptually accurate grayscale (weighted RGB channels)
* Custom UDF for advanced conversion
You convert RGB images to grayscale in your table using either
Pixeltable’s built-in `.convert()` method for standard conversion, or a
custom UDF (relies on NumPy and PIL/Pillow) for gamma-corrected
conversion when scientific accuracy matters.
You can iterate on transformations before adding them to your table. Use
`.select()` with `.collect()` to preview results on sample
images—nothing is stored in your table. If you want to collect only the
first few rows, use `.head(n)` instead of `.collect()`. Once you’re
satisfied, use `.add_computed_column()` to apply the conversion to all
images in your table.
For more on this workflow, see [Get fast feedback on
transformations](/howto/cookbooks/core/dev-iterative-workflow).
**Conversion methods:**
The simple method uses PIL’s built-in conversion. The gamma-corrected
method requires a custom UDF (not built into PIL) that applies
perceptual weighting in linear color space.
*For technical details on gamma correction and grayscale conversion, see
[Wikipedia: Grayscale](https://en.wikipedia.org/wiki/Grayscale).*
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable numpy
```
### Load images
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import numpy as np
import pixeltable as pxt
from PIL import Image
# Create a fresh directory (drop existing if present)
pxt.drop_dir('image_demo', force=True)
pxt.create_dir('image_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'image\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('image_demo/gray', {'image': pxt.Image})
```
Inserting rows into \`gray\`: 0 rows \[00:00, ? rows/s]
Inserting rows into \`gray\`: 3 rows \[00:00, 617.66 rows/s]
Inserted 3 rows with 0 errors.
3 rows inserted, 6 values computed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View loaded images
t.collect()
```
### Iterate: convert with linear approximation for a few images first
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Query: Preview the conversion
t.select(t.image, t.image.convert('L')).head(1)
```
### Add: convert with linear approximation for all images in your table
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Commit: Save as computed column (built-in PIL conversion - fast and good for most use cases)
t.add_computed_column(grayscale=t.image.convert('L'))
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View all results
t.collect()
```
## Explanation
**Two approaches:**
1. **Simple (`.convert('L')`):** PIL’s built-in. Fast, good for most
use cases (model preprocessing, general analysis).
2. **Gamma-corrected (custom UDF):** Not built into PIL. Slower, but
the most perceptually accurate; use it for scientific imaging and
professional photography. It requires a custom UDF that:
* Gamma-decompresses to linear space
* Applies perceptual weights: 0.2126 × R + 0.7152 × G + 0.0722 × B
* Gamma-compresses back for display
**Why gamma matters:** Displays aren’t linear—doubling a pixel value
doesn’t double perceived brightness. Gamma correction accounts for this.
For best results, convert to linear space before weighting, then convert
back.
*The gamma-corrected method is based on [Brandon Rohrer’s
explanation](https://brandonrohrer.com/convert_rgb_to_grayscale.html) of
perceptually accurate RGB to grayscale conversion.*
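The three steps of the gamma-corrected method can be sketched for a single pixel in plain Python. This is a minimal illustration, not the recipe's UDF: it assumes the standard sRGB transfer function for the gamma steps, and the function names are hypothetical. A real UDF would apply the same math to whole images, for example with NumPy arrays.

```python
def srgb_to_linear(c: float) -> float:
    """Gamma-decompress one sRGB channel value in [0, 1] (assumed sRGB curve)."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c: float) -> float:
    """Gamma-compress one linear-light value in [0, 1] (assumed sRGB curve)."""
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def gamma_corrected_gray(r: int, g: int, b: int) -> int:
    """Convert one 8-bit RGB pixel to an 8-bit gamma-corrected gray value."""
    # Step 1: gamma-decompress each channel to linear space
    r_lin, g_lin, b_lin = (srgb_to_linear(v / 255) for v in (r, g, b))
    # Step 2: apply the perceptual weights in linear space
    luminance = 0.2126 * r_lin + 0.7152 * g_lin + 0.0722 * b_lin
    # Step 3: gamma-compress back for display
    return round(255 * linear_to_srgb(luminance))


print(gamma_corrected_gray(128, 128, 128))  # neutral gray is unchanged: 128
print(gamma_corrected_gray(0, 255, 0))      # pure green maps to a bright gray
```

Because the weights sum to 1, neutral grays pass through unchanged; the difference from a simple weighted average shows up on saturated colors.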
**The Pixeltable workflow:**
In traditional databases, `.select()` just picks which columns to view.
In Pixeltable, `.select()` also lets you compute new transformations on
the fly—define new columns without storing them. This makes `.select()`
perfect for testing transformations before you commit them.
When you use `.select()`, you’re creating a query that doesn’t execute
until you call `.collect()`. You must use `.collect()` to execute the
query and return results—nothing is stored in your table. If you want to
collect only the first few rows, use `.head(n)` instead of `.collect()`
to test on a subset before processing your full dataset. Once satisfied,
use `.add_computed_column()` with the same expression to persist results
permanently.
For more on this workflow, see [Get fast feedback on
transformations](/howto/cookbooks/core/dev-iterative-workflow).
## See also
* [Transform images with PIL
operations](/howto/cookbooks/images/img-pil-transforms)
* [Test transformations with fast feedback
loops](/howto/cookbooks/core/dev-iterative-workflow)
# Visualize object detections
Source: https://docs.pixeltable.com/howto/cookbooks/images/img-visualize-detections
Draw bounding boxes, labels, and segmentation masks over images in Pixeltable to visualize object detection and vision model outputs.
Draw bounding boxes on images to visualize object detection results.
## Problem
You’ve run object detection on images but need to visualize the
results—see where objects were detected and verify the model’s accuracy.
## Solution
**What’s in this recipe:**
* Run object detection with YOLOX
* Draw bounding boxes on images
* Color-code by object class
You create a pipeline that detects objects and then draws the results on
the original image.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable pixeltable-yolox
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.vision import bboxes_draw
from pixeltable.functions.yolox import yolox
# Create a fresh directory
pxt.drop_dir('viz_demo', force=True)
pxt.create_dir('viz_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'viz\_demo'.
### Create detection and visualization pipeline
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
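# Create table for images
images = pxt.create_table('viz_demo/images', {'image': pxt.Image})
# Step 1: Run YOLOX detection (model_id and threshold values here are assumptions)
images.add_computed_column(detections=yolox(images.image, model_id='yolox_m', threshold=0.5))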
```
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Step 2: Draw bounding boxes on the image
# Note: bboxes_draw takes image, boxes, and labels (scores are not used for drawing)
images.add_computed_column(
annotated=bboxes_draw(
images.image,
images.detections.bboxes,
labels=images.detections.labels,
)
)
```
Added 0 column values with 0 errors.
No rows affected.
### Detect and visualize
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert sample images
base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images'
image_urls = [
f'{base_url}/000000000036.jpg', # cats
f'{base_url}/000000000139.jpg', # elephants
]
images.insert([{'image': url} for url in image_urls])
```
Inserting rows into \`images\`: 0 rows \[00:00, ? rows/s]
Inserting rows into \`images\`: 2 rows \[00:00, 236.29 rows/s]
Inserted 2 rows with 0 errors.
2 rows inserted, 8 values computed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View original vs annotated images side by side
images.select(images.image, images.annotated).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View detection details
images.select(images.detections).collect()
```
## Explanation
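**Pipeline flow:**
Each inserted image passes through `yolox`, which populates the
`detections` column. `bboxes_draw` then combines the image with
`detections.bboxes` and `detections.labels` to produce the `annotated`
column.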
**Detection output format:**
The `yolox` function returns a dict with:
* `bboxes` - List of \[x1, y1, x2, y2] coordinates
* `labels` - List of class names (e.g., “cat”, “dog”)
* `scores` - List of confidence scores (0-1)
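Because the three lists are aligned by index, you can post-process a detection result with ordinary Python. The sketch below keeps only detections above a confidence threshold; `filter_detections` is a hypothetical helper, and the sample dict uses made-up values in the format described above.

```python
def filter_detections(detections: dict, min_score: float) -> dict:
    """Keep only the detections whose confidence is at least min_score."""
    keep = [i for i, score in enumerate(detections['scores']) if score >= min_score]
    return {
        key: [detections[key][i] for i in keep]
        for key in ('bboxes', 'labels', 'scores')
    }


# Made-up detection result in the documented format
sample = {
    'bboxes': [[10, 20, 110, 220], [50, 60, 90, 100]],
    'labels': ['cat', 'dog'],
    'scores': [0.92, 0.41],
}
confident = filter_detections(sample, min_score=0.5)
print(confident['labels'])  # ['cat']
```

The same logic could be wrapped in a `@pxt.udf` and applied to the `detections` column before drawing, so that low-confidence boxes are left out of the annotated image.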
## See also
* [Detect objects in
images](/howto/cookbooks/images/img-detect-objects) -
Object detection basics
* [Extract video
frames](/howto/cookbooks/video/video-extract-frames) -
Detect objects in video
# Analyze images in batch with AI vision
Source: https://docs.pixeltable.com/howto/cookbooks/images/vision-batch-analysis
Run GPT-4 Vision, Claude, and Gemini over large image batches in Pixeltable with computed columns, retries, and structured outputs.
Run the same AI prompt against multiple images automatically, without
writing loops or managing API calls.
## Problem
You have a collection of images that all need the same analysis—like
“Describe this image”, “Is this product damaged?”, or “What objects are
visible?”.
Writing a loop to call an API for each image is tedious and error-prone.
You need to handle rate limits, retries, and track which images
succeeded or failed.
## Solution
**What’s in this recipe:**
* Analyze multiple images with a single prompt using
`openai.chat_completions()`
* Get all results at once, stored in your table
* No loops or manual API calls
You add a computed column that applies `openai.chat_completions()` with
multimodal messages to every image in your table. Pixeltable handles the
API calls, retries, and result storage automatically.
When you insert new images, the analysis runs automatically—no extra
code needed.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```
### Load images
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import openai
# Create a fresh directory
pxt.drop_dir('vision_demo', force=True)
pxt.create_dir('vision_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'vision\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('vision_demo/images', {'image': pxt.Image})
```
Inserted 3 rows with 0 errors in 0.03 s (88.80 rows/s)
3 rows inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View loaded images
t.collect()
```
### Analyze images with AI
Add a computed column using `openai.chat_completions()`. The prompt runs
automatically on all images:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Define the prompt
messages = [
{
'role': 'user',
'content': [
{
'type': 'text',
'text': 'Describe this image in one sentence.',
},
{'type': 'image_url', 'image_url': t.image},
],
}
]
# Add computed column for AI analysis using openai.chat_completions()
t.add_computed_column(
description=openai.chat_completions(messages, model='gpt-4o-mini')
)
```
Added 3 column values with 0 errors in 4.84 s (0.62 rows/s)
3 rows updated.
### View results
`openai.chat_completions()` returns a JSON structure containing the
output, which we can unpack in the usual way:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View results: image alongside its AI-generated description
t.select(
t.image,
t.description,
t.description['choices'][0]['message']['content'],
).collect()
```
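For reference, the path `['choices'][0]['message']['content']` walks a nested structure like the one sketched below. The dict is a heavily abbreviated, made-up example of a chat completion response (real responses carry more fields), and `first_message_content` is a hypothetical helper.

```python
from typing import Optional


def first_message_content(response: dict) -> Optional[str]:
    """Return the text of the first choice, or None if there are no choices."""
    choices = response.get('choices') or []
    if not choices:
        return None
    return choices[0]['message']['content']


# Abbreviated, made-up response for illustration
response = {
    'choices': [
        {
            'message': {
                'role': 'assistant',
                'content': 'Two cats sleeping on a couch.',
            }
        }
    ]
}
content = first_message_content(response)
print(content)
```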
### New images are analyzed automatically
When you insert more images, the analysis runs without any extra code:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert a new image - analysis happens automatically
t.insert(
[
{
'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000139.jpg'
}
]
)
# View all results including the new image
t.select(
t.image,
t.description,
t.description['choices'][0]['message']['content'],
).collect()
```
## Explanation
**How it works:**
1. Add images to your table
2. Define a computed column with `openai.chat_completions()`
3. Pixeltable executes the API call for each row automatically
4. Results are cached—rerunning won’t re-call the API
5. New rows trigger automatic computation
**Changing the prompt:**
To use a different prompt, add a new computed column with
`if_exists='replace'`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
messages = ...
t.add_computed_column(
description=openai.chat_completions(messages, model='gpt-4o-mini'),
if_exists='replace'
)
```
**Using other providers:**
Replace `openai.chat_completions` with:
* `anthropic.messages` for Claude
* `gemini.generate_content` for Gemini
* `together.chat_completions` for Together AI
## See also
* [Configure API
keys](/howto/cookbooks/core/workflow-api-keys)
* [Working with
OpenAI](/howto/providers/working-with-openai)
# Extract structured data from images
Source: https://docs.pixeltable.com/howto/cookbooks/images/vision-structured-output
Extract structured JSON from images in Pixeltable using vision LLMs with Pydantic schemas, validation, and typed computed columns.
Use AI vision to extract JSON data from receipts, forms, documents, and
other images.
## Problem
You have images containing structured information (receipts, forms, ID
cards) and need to extract specific fields as JSON for downstream
processing.
## Solution
**What’s in this recipe:**
* Extract structured JSON from images using GPT-4o
* Use `openai.chat_completions()` with multimodal messages
* Access individual fields from the extracted data
You use Pixeltable’s `openai.chat_completions()` function with
multimodal messages that include images directly. Request JSON output
via `response_format` in `model_kwargs`.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```
### Load images
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import openai
# Create a fresh directory
pxt.drop_dir('extraction_demo', force=True)
pxt.create_dir('extraction_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'extraction\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('extraction_demo/images', {'image': pxt.Image})
```
Inserted 2 rows with 0 errors in 0.03 s (60.43 rows/s)
2 rows inserted.
### Extract structured data
Use `openai.chat_completions()` to analyze images and get JSON output:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add extraction column using openai.chat_completions with multimodal messages
PROMPT = """Analyze this image and extract the following as JSON:
- description: A brief description of the image
- objects: List of objects visible in the image
- dominant_colors: List of dominant colors
- scene_type: Type of scene (indoor, outdoor, etc.)"""
messages = [
{
'role': 'user',
'content': [
{'type': 'text', 'text': PROMPT},
{'type': 'image_url', 'image_url': t.image},
],
}
]
t.add_computed_column(
data=openai.chat_completions(
messages,
model='gpt-4o-mini',
model_kwargs={'response_format': {'type': 'json_object'}},
)
)
```
Added 2 column values with 0 errors in 7.55 s (0.26 rows/s)
2 rows updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View extracted data
t.select(
t.image, t.data, t.data['choices'][0]['message']['content']
).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# You can also parse the JSON into individual columns if needed
import json
@pxt.udf
def parse_description(data: str) -> str:
return json.loads(data).get('description', '')
t.select(
t.image,
description=parse_description(
t.data['choices'][0]['message']['content']
),
).collect()
```
## Explanation
**Getting JSON output:**
Pass `model_kwargs={'response_format': {'type': 'json_object'}}` to get
structured JSON.
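Even in JSON mode, it is safer to parse the returned string defensively, since a field may be missing or the text may not decode. The sketch below pulls out the four fields requested in the prompt; `parse_extraction` is a hypothetical helper, and the sample string is made up.

```python
import json

# Field names taken from the prompt used in this recipe
FIELDS = ('description', 'objects', 'dominant_colors', 'scene_type')


def parse_extraction(content: str) -> dict:
    """Parse the model's JSON string; missing or invalid fields become None."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        return {field: None for field in FIELDS}
    return {field: data.get(field) for field in FIELDS}


# Made-up model output for illustration
sample = '{"description": "A cat on a sofa", "objects": ["cat", "sofa"], "scene_type": "indoor"}'
parsed = parse_extraction(sample)
print(parsed['objects'])          # ['cat', 'sofa']
print(parsed['dominant_colors'])  # None (field was missing)
```

Returning `None` for anything that cannot be parsed keeps one bad response from failing the whole column.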
## See also
* [Analyze images in
batch](/howto/cookbooks/images/vision-batch-analysis)
* [Configure API
keys](/howto/cookbooks/core/workflow-api-keys)
# Create text embeddings with OpenAI
Source: https://docs.pixeltable.com/howto/cookbooks/search/embed-text-openai
Embed text columns with OpenAI embedding models in Pixeltable to build vector indices for semantic search and retrieval-augmented generation.
Generate vector embeddings for text data to enable semantic search and
similarity matching.
## Problem
You need to convert text into vector embeddings for:
* Semantic search (find similar documents)
* RAG pipelines (retrieve relevant context)
* Clustering and classification
## Solution
**What’s in this recipe:**
* Generate embeddings with OpenAI’s models
* Store embeddings as computed columns
* Use embeddings for similarity queries
You add an embedding column that automatically generates vectors for new
rows. The embeddings are cached and only recomputed when the source text
changes.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.openai import embeddings
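# Create a fresh directory
pxt.drop_dir('embed_demo', force=True)
pxt.create_dir('embed_demo')
# Create the documents table (the table name 'embed_demo/docs' is an assumption)
docs = pxt.create_table('embed_demo/docs', {'title': pxt.String, 'content': pxt.String})
# Add a computed column that embeds `content` whenever rows are inserted
docs.add_computed_column(embedding=embeddings(docs.content, model='text-embedding-3-small'))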
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'embed\_demo'.
Added 0 column values with 0 errors.
No rows affected.
### Insert documents
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert sample documents
sample_docs = [
    {
        'title': 'Python Basics',
        'content': 'Python is a high-level programming language known for its clear syntax and readability.',
    },
    {
        'title': 'Machine Learning',
        'content': 'Machine learning is a subset of AI that enables systems to learn from data.',
    },
    {
        'title': 'Web Development',
        'content': 'Web development involves building websites and web applications using HTML, CSS, and JavaScript.',
    },
    {
        'title': 'Data Science',
        'content': 'Data science combines statistics, programming, and domain expertise to extract insights from data.',
    },
    {
        'title': 'Cloud Computing',
        'content': 'Cloud computing provides on-demand computing resources over the internet.',
    },
]
docs.insert(sample_docs)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View documents with embeddings (showing first 5 dimensions)
result = docs.select(docs.title, docs.embedding).collect()
```
### Query by similarity
Find documents similar to a query by creating an embedding index:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add embedding index for semantic search
docs.add_embedding_index(
    column='content',
    string_embed=embeddings.using(model='text-embedding-3-small'),
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Search for similar documents
sim = docs.content.similarity(
    string='artificial intelligence applications'
)
results = (
    docs.where(sim > 0.2)
    .order_by(sim, asc=False)
    .limit(3)
    .select(docs.title, docs.content, sim=sim)
)
results.collect()
```
## Explanation
**OpenAI embedding models:**
**Similarity metrics:**
**Key benefits of computed embedding columns:**
* Embeddings are generated automatically on insert
* Results are cached—no re-computation on subsequent queries
* Index enables fast similarity search at scale
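The caching behaviour described above can be pictured with a small model. This is a conceptual sketch only, not Pixeltable's implementation: `fake_embed` is a hypothetical stand-in for a paid embedding call, and the `cache` dictionary plays the role of the stored column.

```python
import hashlib

calls = 0  # counts how often the (pretend) embedding model is invoked


def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding API call."""
    global calls
    calls += 1
    digest = hashlib.sha256(text.encode('utf-8')).digest()
    # Turn the first 4 bytes into a tiny deterministic "vector"
    return [b / 255 for b in digest[:4]]


cache: dict[str, list[float]] = {}


def cached_embed(text: str) -> list[float]:
    """Compute the embedding once per distinct text and reuse it afterwards."""
    if text not in cache:
        cache[text] = fake_embed(text)
    return cache[text]


first = cached_embed('Python is a high-level programming language.')
second = cached_embed('Python is a high-level programming language.')  # cache hit
third = cached_embed('Python is a compiled language.')  # text changed: recompute
print(calls)  # the model was only invoked for the two distinct texts
```

Repeated reads of the same text cost nothing extra; only new or changed text triggers another call.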
## See also
* [Semantic text
search](/howto/cookbooks/search/search-semantic-text) -
Full semantic search patterns
* [Chunk documents for
RAG](/howto/cookbooks/text/doc-chunk-for-rag) -
Prepare documents for retrieval
# Build semantic search for text
Source: https://docs.pixeltable.com/howto/cookbooks/search/search-semantic-text
Build semantic text search in Pixeltable with embedding indices, similarity queries, and top-k retrieval over documents and chunks.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Create a searchable knowledge base that finds content by meaning, not
just keywords.
## Problem
You have a collection of text content (articles, notes, documentation)
and need to find relevant items based on meaning.
Keyword search fails when users phrase queries differently from the
source text.
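To make the failure concrete, here is a naive keyword matcher in plain Python. The document text is a hypothetical example chosen for illustration; the query is the one used later in this recipe.

```python
def keyword_overlap(query: str, text: str) -> set[str]:
    """Return the lowercase words shared by the query and the text."""
    return set(query.lower().split()) & set(text.lower().split())


# Hypothetical document, chosen only to illustrate the mismatch
document = 'Debugging techniques for tracking down software defects'
query = 'how to fix bugs'

shared = keyword_overlap(query, document)
print(shared)  # no word in common, although the topics clearly match
```

An embedding-based search compares meaning rather than surface words, which is why it can still relate the two.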
## Solution
**What’s in this recipe:**
* Create a text table with embeddings
* Search by semantic similarity
* Combine with metadata filters
You add an embedding index to your text column. Pixeltable automatically
generates embeddings for each row and enables similarity search.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable sentence-transformers
```
### Create knowledge base
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.huggingface import sentence_transformer
# Create a fresh directory
pxt.drop_dir('search_demo', force=True)
pxt.create_dir('search_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'search\_demo'.
### Add semantic search
Create an embedding index on the content column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add embedding index
kb.add_embedding_index(
    column='content',
    string_embed=sentence_transformer.using(model_id='all-MiniLM-L6-v2'),
)
```
### Search by meaning
Find content semantically similar to your query:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Search by meaning
query = 'how to fix bugs'
sim = kb.content.similarity(string=query)
results = (
    kb.order_by(sim, asc=False)
    .select(kb.title, kb.content, score=sim)
    .limit(2)
)
results.collect()
```
### Filter by metadata
Combine semantic search with metadata filters:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Search within a specific category
query = 'best practices'
sim = kb.content.similarity(string=query)
results = (
    kb.where(kb.category == 'engineering')  # Filter first
    .order_by(sim, asc=False)
    .select(kb.title, kb.category, score=sim)
    .limit(2)
)
results.collect()
```
## Explanation
**How similarity search works:**
1. Your query is converted to an embedding vector
2. Pixeltable finds the most similar vectors in the index
3. Results are ranked by cosine similarity (from -1 to 1; higher means more similar)
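The three steps can be sketched in plain Python. This is a brute-force illustration with made-up 3-dimensional vectors and titles; a real embedding index avoids scanning every row, and how Pixeltable does that is not shown here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors, in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=2):
    """Score every row against the query and keep the k best matches."""
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors; real embeddings have hundreds of dimensions
rows = [
    ('Debugging guide', [0.9, 0.1, 0.0]),
    ('Cooking tips', [0.0, 0.2, 0.9]),
    ('Code review checklist', [0.7, 0.6, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]  # step 1 would produce this from the query text
results = top_k(query_vec, rows, k=2)  # steps 2 and 3: compare and rank
```

Here `results` lists the debugging guide first and the code review checklist second, because their vectors point in nearly the same direction as the query vector.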
**Embedding models:**
**New content is indexed automatically:**
When you insert new rows, embeddings are generated without extra code.
## See also
* [Vector database
documentation](/platform/embedding-indexes)
* [Split documents for
RAG](/howto/cookbooks/text/doc-chunk-for-rag)
# Find similar images with CLIP
Source: https://docs.pixeltable.com/howto/cookbooks/search/search-similar-images
Find visually similar images in Pixeltable using CLIP and other vision embeddings with similarity search over indexed image columns.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Build visual similarity search to find images that look alike using
OpenAI’s CLIP model.
## Problem
You have a collection of images and need to find visually similar
ones—for duplicate detection, content recommendations, or visual search.
## Solution
**What’s in this recipe:**
* Create image embeddings with CLIP
* Search by image similarity
* Search by text description (cross-modal)
You add an embedding index using CLIP, which understands both images and
text. This enables finding similar images or searching images by text
description.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable sentence-transformers torch
```
### Load images
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.huggingface import clip
# Create a fresh directory
pxt.drop_dir('image_search_demo', force=True)
pxt.create_dir('image_search_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'image\_search\_demo'.
### Create CLIP embedding index
Add an embedding index using CLIP for cross-modal search:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add CLIP embedding index (supports both image and text queries)
images.add_embedding_index(
    'image', embedding=clip.using(model_id='openai/clip-vit-base-patch32')
)
```
### Search by text description
Find images matching a text query:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Search by text description
query = 'people eating food'
sim = images.image.similarity(string=query)
results = (
    images.order_by(sim, asc=False)
    .select(images.image, score=sim)
    .limit(2)
)
results.collect()
```
## Explanation
**Why CLIP:**
CLIP (Contrastive Language-Image Pre-training) understands both images
and text in the same embedding space. This enables:
* Image-to-image search (find similar photos)
* Text-to-image search (find photos matching a description)
**Index parameters:**
**Text and image embeddings must come from the same model** for cross-modal search to work.
**New images are indexed automatically:**
When you insert new images, embeddings are generated without extra code.
## See also
* [Semantic text
search](/howto/cookbooks/search/search-semantic-text)
* [Vector database
documentation](/platform/embedding-indexes)
# Split documents into chunks for RAG
Source: https://docs.pixeltable.com/howto/cookbooks/text/doc-chunk-for-rag
Split documents into RAG-ready chunks in Pixeltable using DocumentSplitter with overlap, token limits, and structural heading awareness.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Break PDFs and documents into searchable chunks for retrieval-augmented
generation (RAG) pipelines.
## Problem
You have PDF documents or text files that you want to use for
retrieval-augmented generation (RAG). Before you can search them, you
need to:
1. Split documents into smaller chunks
2. Generate embeddings for each chunk
3. Store everything in a searchable index
## Solution
**What’s in this recipe:**
* Split PDFs into sentence-based chunks
* Control chunk size with token limits
* Add embeddings for semantic search
You create a view with a `document_splitter` iterator that automatically
breaks documents into chunks. Then you add an embedding index for
semantic search.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable sentence-transformers spacy tiktoken
!python -m spacy download en_core_web_sm -q
```
### Load documents
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.document import document_splitter
from pixeltable.functions.huggingface import sentence_transformer
# Create a fresh directory
pxt.drop_dir('rag_demo', force=True)
pxt.create_dir('rag_demo')
```
### Split into chunks
Create a view that splits each document into sentences with a token
limit:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a view that splits documents into chunks
chunks = pxt.create_view(
    'rag_demo/chunks',
    docs,
    iterator=document_splitter(
        docs.document,
        separators='sentence,token_limit',  # Split by sentence with token limit
        limit=300,  # Max 300 tokens per chunk
    ),
)
```
Inserting rows into \`chunks\`: 217 rows \[00:00, 42111.88 rows/s]
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View the chunks
chunks.select(chunks.text).head(5)
```
### Add semantic search
Create an embedding index on the chunks for similarity search:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add embedding index for semantic search
chunks.add_embedding_index(
    column='text',
    string_embed=sentence_transformer.using(model_id='all-MiniLM-L6-v2'),
)
```
### Search your documents
Use similarity search to find relevant chunks:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Search for relevant chunks
query = 'market trends'
sim = chunks.text.similarity(string=query)
results = (
    chunks.order_by(sim, asc=False)
    .select(chunks.text, score=sim)
    .limit(3)
)
results.collect()
```
## Explanation
**Separator options:**
You can combine separators: `separators='sentence,token_limit'`
**Chunk sizing:**
* `limit`: Maximum tokens per chunk (default: 500)
* `overlap`: Tokens to overlap between chunks (default: 0)
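The interaction between `limit` and `overlap` can be illustrated with a sliding window. This is a simplified sketch, not the splitter's actual code: it treats whitespace-separated words as tokens (an assumption; a real tokenizer splits text differently), and `chunk_tokens` is a hypothetical helper.

```python
def chunk_tokens(tokens: list[str], limit: int, overlap: int = 0) -> list[list[str]]:
    """Cut tokens into windows of at most `limit`, sharing `overlap` tokens."""
    if limit <= 0:
        raise ValueError('limit must be positive')
    if not 0 <= overlap < limit:
        raise ValueError('overlap must satisfy 0 <= overlap < limit')
    step = limit - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + limit])
        if start + limit >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


words = 'one two three four five six seven eight nine ten'.split()
chunks = chunk_tokens(words, limit=4, overlap=1)
for chunk in chunks:
    print(' '.join(chunk))
```

With `limit=4` and `overlap=1`, each chunk starts with the last word of the previous one, so text near a boundary appears in both neighbours.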
**New documents are processed automatically:**
When you insert new documents, chunks and embeddings are generated
without extra code.
## See also
* [Iterators
documentation](/platform/iterators)
* [RAG demo
notebook](/howto/use-cases/rag-demo)
# Extract text from PowerPoint, Word, and Excel files
Source: https://docs.pixeltable.com/howto/cookbooks/text/doc-extract-text-from-office-files
Extract text from PowerPoint, Word, and Excel office documents in Pixeltable for indexing, search, and downstream LLM RAG workflows.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Transform office documents into searchable, analyzable text data.
**What’s in this recipe:**
* Extract text from PPTX, DOCX, and XLSX files
* Split documents by headings, paragraphs, or custom limits
* Preserve document structure and metadata for analysis
## Problem
You have office documents—presentations, reports, spreadsheets—that
contain valuable text data. You need to extract this text to analyze
content, search across documents, or feed into AI models.
Manual extraction means opening each file, copying text, and losing
structural information like headings and page boundaries. You need an
automated way to process hundreds or thousands of office files while
preserving their organization.
## Solution
You extract text from office documents using Pixeltable’s document type
with Microsoft’s MarkItDown library. This converts PowerPoint, Word, and
Excel files to structured text automatically.
You use `DocumentSplitter` to split documents by headings, paragraphs,
or token limits. Each split creates a view where each row represents a
chunk of the document with its metadata.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable 'markitdown[pptx,docx,xlsx]' mistune tiktoken
```
### Load office documents
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.iterators.document import DocumentSplitter
# Create a fresh directory (drop existing if present)
pxt.drop_dir('office_docs', force=True)
pxt.create_dir('office_docs')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'office\_docs'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Sample PowerPoint from Pixeltable repo
# Replace with your own PPTX, DOCX, or XLSX files
sample_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/calpy.pptx'
docs.insert([{'doc': sample_url}])
```
### Extract full document text
You create a view with `DocumentSplitter` to extract text. Setting
`separators=''` extracts the full document without splitting.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a view to extract full document text
full_text = pxt.create_view(
    'office_docs/full_text',
    docs,
    iterator=DocumentSplitter.create(
        document=docs.doc,
        separators='',  # No splitting - extract full document
    ),
)
```
Inserting rows into \`full\_text\`: 1 rows \[00:00, 196.50 rows/s]
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Preview extracted text
full_text.select(full_text.doc, full_text.text).head(1)
```
### Split documents by headings
You split documents by headings to preserve their logical structure.
Each section under a heading becomes a separate chunk.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create view that splits by headings
by_heading = pxt.create_view(
    'office_docs/by_heading',
    docs,
    iterator=DocumentSplitter.create(
        document=docs.doc,
        separators='heading',
        metadata='heading',  # Preserve heading structure
    ),
)
```
Inserting rows into \`by\_heading\`: 87 rows \[00:00, 10359.54 rows/s]
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View chunks with their headings
by_heading.select(by_heading.heading, by_heading.text).head(5)
```
### Split by token limit for AI models
You split documents by token count when feeding chunks to AI models. The
`overlap` parameter ensures chunks share context at boundaries.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create view with token-based splitting
by_tokens = pxt.create_view(
    'office_docs/by_tokens',
    docs,
    iterator=DocumentSplitter.create(
        document=docs.doc,
        separators='heading,token_limit',  # Split by heading first, then by tokens
        limit=512,  # Maximum tokens per chunk
        overlap=50,  # Overlap between chunks to preserve context
        metadata='heading',
    ),
)
```
Inserting rows into \`by\_tokens\`: 2369 rows \[00:00, 9212.05 rows/s]
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Preview chunks with token limits
by_tokens.select(by_tokens.doc, by_tokens.heading, by_tokens.text).head(3)
```
### Search across documents
You search across all document chunks using standard Pixeltable queries.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Find chunks containing specific keywords
by_tokens.where(by_tokens.text.contains('Python')).select(
    by_tokens.doc, by_tokens.text
).head(3)
```
## Explanation
**Supported formats:**
* PowerPoint: `.pptx`, `.ppt`
* Word: `.docx`, `.doc`
* Excel: `.xlsx`, `.xls`
**Separator options:**
* `heading` - Split by document headings (preserves structure)
* `paragraph` - Split by paragraphs
* `sentence` - Split by sentences
* `token_limit` - Split by token count (requires `limit` parameter)
* `char_limit` - Split by character count (requires `limit` parameter)
* Multiple separators work together: `'heading,token_limit'` splits by
heading first, then ensures no chunk exceeds token limit
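How two separators combine can be sketched on a small Markdown string. This is an illustration under stated assumptions, not the `DocumentSplitter` implementation: `split_by_heading` and `enforce_limit` are hypothetical helpers, and whitespace-separated words stand in for tokens.

```python
def split_by_heading(markdown: str) -> list[tuple[str, str]]:
    """Return (heading, body) pairs; text before the first heading gets ''."""
    sections = []
    heading, body = '', []
    for line in markdown.splitlines():
        if line.startswith('#'):
            if heading or body:
                sections.append((heading, ' '.join(body)))
            heading, body = line.lstrip('#').strip(), []
        elif line.strip():
            body.append(line.strip())
    if heading or body:
        sections.append((heading, ' '.join(body)))
    return sections


def enforce_limit(sections, limit):
    """Cut any section body longer than `limit` words into several chunks."""
    chunks = []
    for heading, body in sections:
        words = body.split()
        for start in range(0, max(len(words), 1), limit):
            chunks.append((heading, ' '.join(words[start:start + limit])))
    return chunks


doc = """# Intro
Short opening text.
# Details
one two three four five six seven
"""
chunks = enforce_limit(split_by_heading(doc), limit=4)
```

The short section stays whole, while the long one is cut in two; both pieces keep the `Details` heading as metadata, which is the point of splitting by heading first.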
**Metadata fields:**
* `heading` - Hierarchical heading structure (e.g.,
`{'h1': 'Introduction', 'h2': 'Overview'}`)
* `title` - Document title
* `sourceline` - Source line number (HTML and Markdown documents)
**Token overlap:** The `overlap` parameter makes consecutive chunks
share tokens at their boundaries, so text cut at the end of one chunk
reappears at the start of the next. This preserves context when feeding
chunks to AI models.
## See also
* [Get fast feedback on
transformations](/howto/cookbooks/core/dev-iterative-workflow)
* [Pixeltable Document
API](/sdk/latest/document)
# Extract named entities from text
Source: https://docs.pixeltable.com/howto/cookbooks/text/text-extract-entities
Extract named entities, relations, and structured fields from text in Pixeltable using LLMs with Pydantic schemas and typed outputs.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Identify and extract people, organizations, locations, dates, and other
entities from text using LLMs.
## Problem
You have unstructured text containing important information—names,
companies, dates, locations—that you need to extract and structure for
analysis, search, or integration with other systems.
## Solution
**What’s in this recipe:**
* Extract entities as structured JSON
* Use OpenAI’s structured output for reliable parsing
* Access extracted entities as queryable columns
You use structured output to get entities in a consistent JSON format.
The entities are stored as JSON columns that you can query and filter.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import json
import pixeltable as pxt
from pixeltable.functions.openai import chat_completions
# Create a fresh directory
pxt.drop_dir('entities_demo', force=True)
pxt.create_dir('entities_demo')
```
Added 0 column values with 0 errors.
No rows affected.
### Extract entities from text
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert sample articles
sample_articles = [
    {
        'title': 'Tech Acquisition',
        'content': 'Microsoft announced today that CEO Satya Nadella will lead the acquisition of a Seattle-based startup. The deal, expected to close in March 2024, is valued at $500 million.',
    },
    {
        'title': 'Sports Update',
        'content': "LeBron James led the Los Angeles Lakers to victory against the Boston Celtics on Tuesday night at Staples Center. Coach Darvin Ham praised the team's performance.",
    },
    {
        'title': 'Research Breakthrough',
        'content': 'Dr. Sarah Chen at Stanford University published groundbreaking research on renewable energy. The study, funded by the National Science Foundation, was conducted in Palo Alto, California.',
    },
]
articles.insert(sample_articles)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View extracted entities
articles.select(articles.title, articles.entities).collect()
```
## Explanation
**Structured output ensures reliable extraction:**
By using OpenAI’s structured output (`response_format`), the model is
constrained to return JSON that matches the schema, so you can read
fields directly instead of parsing free-form text.
**Common entity types:**
**Customizing the schema:**
Modify the `entity_schema` to extract domain-specific entities—product
SKUs, legal terms, medical conditions, etc.
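The `entity_schema` itself is not shown in this page, so the following is a hypothetical example of what such a schema could look like, together with a plain-Python check of a reply against it. The field names and the sample reply are assumptions made for illustration, not output from the recipe.

```python
import json

# Hypothetical schema; field names are illustrative, not taken from the recipe
entity_schema = {
    'type': 'object',
    'properties': {
        'people': {'type': 'array', 'items': {'type': 'string'}},
        'organizations': {'type': 'array', 'items': {'type': 'string'}},
        'locations': {'type': 'array', 'items': {'type': 'string'}},
        'dates': {'type': 'array', 'items': {'type': 'string'}},
    },
    'required': ['people', 'organizations', 'locations', 'dates'],
    'additionalProperties': False,
}


def check_entities(raw: str) -> dict:
    """Parse a JSON reply and verify it has exactly the required list fields."""
    data = json.loads(raw)
    required = set(entity_schema['required'])
    if set(data) != required:
        raise ValueError(f'unexpected fields: {sorted(set(data) ^ required)}')
    for key, value in data.items():
        if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
            raise ValueError(f'{key} must be a list of strings')
    return data


# Hand-written reply for illustration (not real model output)
reply = (
    '{"people": ["Satya Nadella"], "organizations": ["Microsoft"], '
    '"locations": ["Seattle"], "dates": ["March 2024"]}'
)
entities = check_entities(reply)
print(entities['organizations'])
```

To extract domain-specific entities, you would add or rename properties (for example a `product_skus` array) and list them under `required`.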
## See also
* [Extract structured data from
images](/howto/cookbooks/images/vision-structured-output) -
JSON extraction from images
* [Extract fields from
JSON](/howto/cookbooks/core/workflow-json-extraction) -
Parse LLM response fields
# Summarize text with LLMs
Source: https://docs.pixeltable.com/howto/cookbooks/text/text-summarize
Summarize long documents, articles, and transcripts in Pixeltable using LLMs with chunking, map-reduce, and structured output schemas.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Generate concise summaries of long text, articles, or documents using
large language models.
## Problem
You have long text content—articles, transcripts, documents—that needs
to be summarized. Processing each piece manually is time-consuming and
inconsistent.
## Solution
**What’s in this recipe:**
* Summarize text using OpenAI GPT models
* Customize summary style with prompts
* Process multiple documents automatically
You add a computed column that calls an LLM to generate summaries. When
you insert new text, summaries are generated automatically.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```
### Load sample text
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import openai
# Create a fresh directory
pxt.drop_dir('summarize_demo', force=True)
pxt.create_dir('summarize_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'summarize\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Sample articles to summarize
sample_articles = [
{
'title': 'The Rise of Electric Vehicles',
'content': """Electric vehicles (EVs) have seen unprecedented growth in recent years,
transforming the automotive industry. Sales increased by 60% globally in 2023,
with China leading the market followed by Europe and North America. Major automakers
like Tesla, BYD, and traditional manufacturers have invested billions in EV technology.
Battery costs have dropped significantly, making EVs more affordable for consumers.
Government incentives and stricter emissions regulations continue to drive adoption.
Charging infrastructure is expanding rapidly, with new fast-charging networks being
deployed across major highways. Despite challenges like range anxiety and charging
times, consumer acceptance is growing steadily.""",
},
{
'title': 'Advances in Renewable Energy',
'content': """Solar and wind power capacity reached record levels in 2023, accounting
for over 30% of global electricity generation. The cost of solar panels has fallen
by 90% over the past decade, making renewable energy competitive with fossil fuels.
Offshore wind farms are being built at scale, with turbines now reaching heights
of over 250 meters. Energy storage solutions, particularly lithium-ion batteries,
are addressing intermittency challenges. Countries like Denmark and Scotland have
achieved periods of 100% renewable electricity. Corporate power purchase agreements
are accelerating the transition, with tech giants committing to carbon-neutral operations.""",
},
]
articles.insert(sample_articles)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract the summary text from the response
articles.add_computed_column(
    summary=articles.response.choices[0].message.content
)
```
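The path `choices[0].message.content` follows the layout of an OpenAI chat completion response. A minimal sketch with a hand-written, abridged response as a plain dictionary (the values are illustrative, not real model output):

```python
# Abridged, hand-written example of a chat completion response
response = {
    'id': 'chatcmpl-example',
    'object': 'chat.completion',
    'choices': [
        {
            'index': 0,
            'message': {
                'role': 'assistant',
                'content': 'EV sales grew 60% in 2023.',
            },
            'finish_reason': 'stop',
        }
    ],
}

# Same path as the computed column: first choice, its message, the text
summary = response['choices'][0]['message']['content']
print(summary)
```

Storing the full response in one column and the extracted text in another keeps the raw output available if you later need other fields.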
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View titles and summaries
articles.select(articles.title, articles.summary).collect()
```
### Custom summary styles
You can customize the summary format by changing the prompt:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add bullet-point summary
bullet_prompt = (
    'List the 3 key points from this article as bullet points:\n\n'
    + articles.content
)
articles.add_computed_column(
    bullet_response=openai.chat_completions(
        messages=[{'role': 'user', 'content': bullet_prompt}],
        model='gpt-4o-mini',
    )
)
articles.add_computed_column(
    key_points=articles.bullet_response.choices[0].message.content
)
```
Added 2 column values with 0 errors.
Added 2 column values with 0 errors.
2 rows updated, 2 values computed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View bullet-point summaries
articles.select(articles.title, articles.key_points).collect()
```
### Automatic processing
New articles are automatically summarized when inserted:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert a new article - summaries are generated automatically
articles.insert(
[
{
'title': 'AI in Healthcare',
'content': """Artificial intelligence is revolutionizing healthcare diagnostics
and treatment planning. Machine learning models can now detect diseases from
medical images with accuracy matching or exceeding human specialists. AI-powered
drug discovery is accelerating the development of new treatments. Natural language
processing is being used to extract insights from clinical notes and research papers.""",
}
]
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View all summaries including the new article
articles.select(articles.title, articles.summary).collect()
```
## Explanation
**Prompt engineering for summaries:**
**Cost optimization:**
* Use `gpt-4o-mini` for most summarization tasks (fast and affordable)
* Use `gpt-4o` for complex documents requiring deeper understanding
* Summaries are cached, so you only pay once per article
## See also
* [Split documents for
RAG](/howto/cookbooks/text/doc-chunk-for-rag) -
Process long documents
* [Extract fields from
JSON](/howto/cookbooks/core/workflow-json-extraction) -
Parse structured LLM output
* [Configure API
keys](/howto/cookbooks/core/workflow-api-keys) -
Set up OpenAI credentials
# Translate text between languages
Source: https://docs.pixeltable.com/howto/cookbooks/text/text-translate
Translate text columns between languages in Pixeltable using OpenAI, Anthropic, and other LLM providers through declarative computed columns.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Automatically translate content into multiple languages using LLMs.
## Problem
You have content that needs to be available in multiple
languages—product descriptions, documentation, user-generated content.
Manual translation is slow and expensive.
## Solution
**What’s in this recipe:**
* Translate text using OpenAI models
* Create multiple language columns from one source
* Handle batch translation efficiently
You add computed columns for each target language. Translations are
generated automatically when you insert new content and cached for
future queries.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.openai import chat_completions
# Create a fresh directory
pxt.drop_dir('translate_demo', force=True)
pxt.create_dir('translate_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'translate\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add Spanish translation column
spanish_prompt = (
    'Translate the following text to Spanish. Return only the translation, no explanations:\n\n'
    + content.text_en
)
content.add_computed_column(
    response_es=chat_completions(
        messages=[{'role': 'user', 'content': spanish_prompt}],
        model='gpt-4o-mini',
    )
)
content.add_computed_column(
    text_es=content.response_es.choices[0].message.content
)
```
Added 0 column values with 0 errors.
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add French translation column
french_prompt = (
    'Translate the following text to French. Return only the translation, no explanations:\n\n'
    + content.text_en
)
content.add_computed_column(
    response_fr=chat_completions(
        messages=[{'role': 'user', 'content': french_prompt}],
        model='gpt-4o-mini',
    )
)
content.add_computed_column(
    text_fr=content.response_fr.choices[0].message.content
)
```
Added 0 column values with 0 errors.
Added 0 column values with 0 errors.
No rows affected.
### Translate content
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert sample content
sample_content = [
    {
        'title': 'Welcome Message',
        'text_en': 'Welcome to our platform! We are excited to have you here.',
    },
    {
        'title': 'Product Description',
        'text_en': 'This lightweight laptop features a 14-inch display and all-day battery life.',
    },
    {
        'title': 'Support Article',
        'text_en': 'To reset your password, click the forgot password link on the login page.',
    },
]
content.insert(sample_content)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View all translations
content.select(
    content.title, content.text_en, content.text_es, content.text_fr
).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Pretty print one example
row = content.where(content.title == 'Welcome Message').collect()[0]
```
## Explanation
**How it works:**
Each target language is a computed column with a translation prompt.
When you insert new content:
1. The English text is processed
2. Translation prompts are generated for each language
3. All translations run in parallel
4. Results are cached—no re-translation needed
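Step 2 can be pictured as filling one prompt template per language. The sketch below is plain Python on ordinary strings rather than Pixeltable column expressions; `build_prompts` and the `languages` mapping are hypothetical helpers, and the template wording is copied from the Spanish and French prompts above.

```python
PROMPT_TEMPLATE = (
    'Translate the following text to {language}. '
    'Return only the translation, no explanations:\n\n{text}'
)

# Column suffix -> language name; suffixes follow the recipe's text_es / text_fr naming
languages = {'es': 'Spanish', 'fr': 'French', 'de': 'German'}


def build_prompts(text: str) -> dict[str, str]:
    """Return one translation prompt per target language, keyed by column name."""
    return {
        f'text_{code}': PROMPT_TEMPLATE.format(language=name, text=text)
        for code, name in languages.items()
    }


prompts = build_prompts('Welcome to our platform!')
print(prompts['text_fr'])
```

Because each prompt depends only on the English text, the translations are independent of one another, which is what allows them to run in parallel.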
**Adding more languages:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add German translation
german_prompt = 'Translate to German:\n\n' + content.text_en
content.add_computed_column(
    response_de=chat_completions(
        messages=[{'role': 'user', 'content': german_prompt}],
        model='gpt-4o-mini',
    )
)
content.add_computed_column(text_de=content.response_de.choices[0].message.content)
```
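When several languages are added, the instruction prefix can be generated rather than retyped. The following is a minimal plain-Python sketch; `translation_instruction` and `LANGUAGES` are hypothetical names introduced here for illustration and are not part of Pixeltable.

```python
# Hypothetical helper (not part of Pixeltable): builds the instruction prefix
# that is concatenated with the English text column.
def translation_instruction(language: str) -> str:
    return (
        f'Translate the following text to {language}. '
        'Return only the translation, no explanations:\n\n'
    )


# Language code -> language name; the codes mirror the
# text_es / text_fr / text_de column suffixes used in this recipe.
LANGUAGES = {'es': 'Spanish', 'fr': 'French', 'de': 'German'}

instructions = {
    code: translation_instruction(name) for code, name in LANGUAGES.items()
}
print(instructions['de'])
```

The resulting string would be used the same way as in the examples above, for instance `translation_instruction('German') + content.text_en`.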
**Cost optimization:**
## See also
* [Summarize
text](/howto/cookbooks/text/text-summarize) -
Text summarization with LLMs
* [Extract structured
data](/howto/cookbooks/images/vision-structured-output) -
Get JSON from LLM responses
# Add text overlays to videos
Source: https://docs.pixeltable.com/howto/cookbooks/video/video-add-text-overlay
Overlay text, captions, and timestamps onto videos in Pixeltable using FFmpeg-backed computed columns and frame-level transformations.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Burn text, captions, or watermarks directly into video files.
## Problem
You need to add text to videos—captions, watermarks, titles, or dynamic
labels. Manual video editing doesn’t scale for batch processing.
## Solution
**What’s in this recipe:**
* Add simple text overlays
* Create styled captions with backgrounds
* Position text with alignment options
* Crop a rectangular region from a video
Use `video.overlay_text()` to burn text into videos with full control
over styling and position, and `video.crop()` to extract a rectangular
region.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Create a fresh directory
pxt.drop_dir('overlay_demo', force=True)
pxt.create_dir('overlay_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'overlay\_demo'.
### Load sample videos
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a video table
videos = pxt.create_table(
'overlay_demo/videos', {'video': pxt.Video, 'title': pxt.String}
)
# Insert a sample video
videos.insert(
[
{
'video': 's3://multimedia-commons/data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4',
'title': 'Sample Video',
}
]
)
```
Created table 'videos'.
Inserted 1 row with 0 errors in 3.21 s (0.31 rows/s)
1 row inserted.
### Add a simple text overlay
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add a simple watermark in the corner
videos.add_computed_column(
watermarked=videos.video.overlay_text(
'My Brand',
font_size=24,
color='white',
opacity=0.7,
horizontal_align='right',
horizontal_margin=20,
vertical_align='top',
vertical_margin=20,
)
)
```
Added 1 column value with 0 errors in 1.25 s (0.80 rows/s)
1 row updated.
### Add YouTube-style captions
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add a caption with a semi-transparent background box
videos.add_computed_column(
captioned=videos.video.overlay_text(
'This is a sample caption',
font_size=32,
color='white',
box=True, # Add background box
box_color='black',
box_opacity=0.8,
box_border=[6, 14], # Padding: [top/bottom, left/right]
horizontal_align='center',
vertical_align='bottom',
vertical_margin=70, # Distance from bottom
)
)
```
Added 1 column value with 0 errors in 1.08 s (0.92 rows/s)
1 row updated.
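To reason about where an overlay lands, it can help to work out the pixel offsets by hand. The sketch below assumes that a margin is measured from the edge named by the alignment (as the `# Distance from bottom` comment above suggests) and that `'center'` ignores the margin. `anchor_offset` is a hypothetical helper, and the frame and text sizes are illustrative; this is not how Pixeltable or FFmpeg computes positions internally.

```python
def anchor_offset(frame_size: int, text_size: int, align: str, margin: int) -> int:
    """Offset in pixels of the text's leading edge along one axis (assumed semantics)."""
    if align in ('left', 'top'):
        return margin
    if align in ('right', 'bottom'):
        return frame_size - text_size - margin
    if align == 'center':
        return (frame_size - text_size) // 2
    raise ValueError(f'unknown alignment: {align!r}')


# Illustrative numbers: a 1280x720 frame and a 300x40 text block
x = anchor_offset(1280, 300, 'center', 0)
y = anchor_offset(720, 40, 'bottom', 70)
print(x, y)  # 490 610
```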
### Add dynamic titles from table columns
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add video title as an overlay (dynamic per video)
videos.add_computed_column(
titled=videos.video.overlay_text(
videos.title, # Use the title column!
font_size=48,
color='yellow',
opacity=1.0,
horizontal_align='center',
vertical_align='top',
vertical_margin=30,
)
)
```
Added 1 column value with 0 errors in 1.15 s (0.87 rows/s)
1 row updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View all versions
videos.select(
videos.title,
videos.video,
videos.watermarked,
videos.captioned,
videos.titled,
).collect()
```
### Crop a region from a video
Use `video.crop()` to extract a rectangular region from a video. This is
useful for focusing on a specific area of interest, removing borders, or
preparing clips for object-specific analysis.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Crop using xywh format (default): [x, y, width, height]
videos.add_computed_column(cropped=videos.video.crop([100, 50, 320, 240]))
# Crop using xyxy format (common in object detection pipelines):
# videos.add_computed_column(
# cropped_xyxy=videos.video.crop([100, 50, 420, 290], bbox_format='xyxy')
# )
```
Added 1 column value with 0 errors in 0.56 s (1.78 rows/s)
1 row updated.
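The two bounding-box formats in the example describe the same region. A small plain-Python sketch of the conversion, using hypothetical helper names that are not part of Pixeltable:

```python
def xywh_to_xyxy(box: list[int]) -> list[int]:
    """[x, y, width, height] -> [x1, y1, x2, y2]"""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def xyxy_to_xywh(box: list[int]) -> list[int]:
    """[x1, y1, x2, y2] -> [x, y, width, height]"""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


print(xywh_to_xyxy([100, 50, 320, 240]))  # [100, 50, 420, 290]
print(xyxy_to_xywh([100, 50, 420, 290]))  # [100, 50, 320, 240]
```

This is handy when boxes come from an object detector that emits `xyxy` coordinates while the rest of a pipeline uses `xywh`.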
## Explanation
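**Positioning options:** `horizontal_align` (e.g. `'right'`, `'center'`), `vertical_align` (e.g. `'top'`, `'bottom'`), `horizontal_margin`, and `vertical_margin`, as used in the examples above.
**Styling options:** `font_size`, `color`, and `opacity`.
**Background box options:** `box`, `box_color`, `box_opacity`, and `box_border` (padding as `[top/bottom, left/right]`).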
**Requirements:**
* FFmpeg must be installed and in PATH
## See also
* [Generate
thumbnails](/howto/cookbooks/video/video-generate-thumbnails) -
Create preview images
* [Detect scene
changes](/howto/cookbooks/video/video-scene-detection) -
Find cuts and transitions
# Extract frames from videos
Source: https://docs.pixeltable.com/howto/cookbooks/video/video-extract-frames
Split videos into individual frames in Pixeltable with FrameIterator so you can run image models, vision LLMs, and analytics per frame.
Pull frames from video files at specified intervals for analysis,
thumbnails, or training data.
## Problem
You have video files and need to extract frames for:
* Object detection on video content
* Creating thumbnails or previews
* Building training datasets
* Scene analysis and classification
## Solution
**What’s in this recipe:**
* Extract frames at a fixed rate (FPS)
* Extract a specific number of frames
* Extract only keyframes for efficiency
You create a view with a `frame_iterator` that automatically extracts
frames from each video. New videos are processed without extra code.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
### Load videos
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.video import frame_iterator
# Create a fresh directory
pxt.drop_dir('video_demo', force=True)
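pxt.create_dir('video_demo')
# Create a video table and insert a sample video
# (assumed setup: the same sample video used in the other video recipes)
videos = pxt.create_table('video_demo/videos', {'video': pxt.Video})
videos.insert([{'video': 's3://multimedia-commons/data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4'}])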
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'video\_demo'.
### Extract frames at fixed rate
Create a view that extracts 1 frame per second:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract 1 frame per second
frames = pxt.create_view(
'video_demo/frames',
videos,
iterator=frame_iterator(
videos.video,
fps=1.0, # 1 frame per second
),
)
```
Inserting rows into \`frames\`: 19 rows \[00:00, 8687.65 rows/s]
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View extracted frames
frames.select(frames.frame, frames.pos).head(3)
```
### Extract keyframes only
For faster processing, extract only keyframes (I-frames):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract only keyframes (much faster for long videos)
keyframes = pxt.create_view(
'video_demo/keyframes',
videos,
iterator=frame_iterator(videos.video, keyframes_only=True),
)
keyframes.select(keyframes.frame).head(3)
```
Inserting rows into \`keyframes\`: 7 rows \[00:00, 3277.53 rows/s]
## Explanation
**Extraction options:**
Only one of `fps`, `num_frames`, or `keyframes_only` can be specified.
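The mutual-exclusion rule can be illustrated in plain Python. This is only a sketch of the rule as stated; `check_frame_options` is a hypothetical function and does not reflect how Pixeltable validates its arguments.

```python
def check_frame_options(fps=None, num_frames=None, keyframes_only=False):
    """Return the name of the single option that is set, or None if none is set."""
    chosen = [
        name
        for name, is_set in (
            ('fps', fps is not None),
            ('num_frames', num_frames is not None),
            ('keyframes_only', bool(keyframes_only)),
        )
        if is_set
    ]
    if len(chosen) > 1:
        raise ValueError(f'Only one option may be specified, got: {chosen}')
    return chosen[0] if chosen else None


print(check_frame_options(fps=1.0))  # fps
```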
**When to use keyframes:**
* Quick video scanning and thumbnails
* Initial content classification
* Processing very long videos
**Frame metadata:**
Each frame includes:
* `frame`: The extracted image
* `pos`: Frame position in the video
* `pts`: Presentation timestamp
## See also
* [Iterators
documentation](/platform/iterators)
* [Analyze images in
batch](/howto/cookbooks/images/vision-batch-analysis)
# Generate videos with AI
Source: https://docs.pixeltable.com/howto/cookbooks/video/video-generate-ai
Generate AI video clips in Pixeltable from text or image prompts using Runway, Replicate, and other generative video provider integrations.
Create videos from text prompts or animate images using Google’s Veo
model.
## Problem
You need to generate video content programmatically—for social media,
product demos, or creative applications.
## Solution
**What’s in this recipe:**
* Generate videos from text prompts
* Animate existing images into videos
* Store prompts and generated videos together
Use Google’s Veo model to generate videos. Videos are
cached—regeneration only happens if the prompt changes.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable google-genai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'GEMINI_API_KEY' not in os.environ:
    os.environ['GEMINI_API_KEY'] = getpass.getpass(
        'Google AI Studio API Key: '
    )
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import gemini
# Create a fresh directory
pxt.drop_dir('video_gen_demo', force=True)
pxt.create_dir('video_gen_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'video\_gen\_demo'.
### Generate videos from text prompts
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table for text-to-video generation
videos = pxt.create_table(
'video_gen_demo/text_to_video', {'prompt': pxt.String}
)
# Add computed column that generates videos
videos.add_computed_column(
video=gemini.generate_videos(
videos.prompt, model='veo-2.0-generate-001'
)
)
```
Created table 'text\_to\_video'.
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Generate a video from a text prompt
videos.insert(
[
{
'prompt': 'A serene mountain lake at sunrise with mist rising from the water'
}
]
)
# View the result
videos.select(videos.prompt, videos.video).collect()
```
Inserting rows into \`text\_to\_video\`: 1 rows \[00:00, 190.68 rows/s]
Inserted 1 row with 0 errors.
### Animate images into videos
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table for image-to-video generation
animated = pxt.create_table(
'video_gen_demo/image_to_video',
{'image': pxt.Image, 'description': pxt.String},
)
# Add computed column that animates images
animated.add_computed_column(
video=gemini.generate_videos(
image=animated.image, model='veo-2.0-generate-001'
)
)
```
Created table 'image\_to\_video'.
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Animate an image
base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images'
animated.insert(
[
{
'image': f'{base_url}/000000000030.jpg',
'description': 'Beach scene',
}
]
)
# View the animated result
animated.select(animated.image, animated.video).collect()
```
Inserting rows into \`image\_to\_video\`: 1 rows \[00:00, 291.88 rows/s]
Inserted 1 row with 0 errors.
## Explanation
**Generation modes:**
**Veo model options:**
**Tips:**
* Prompts work best when descriptive and specific
* Generated videos are cached - same prompt returns cached result
* Image-to-video preserves the composition of the input image
* New rows automatically generate videos on insert
**Requirements:**
* Google AI Studio API key (set `GEMINI_API_KEY`)
* `pip install google-genai`
## See also
* [Extract frames from
videos](/howto/cookbooks/video/video-extract-frames) -
Pull frames from generated videos
* [Add text
overlays](/howto/cookbooks/video/video-add-text-overlay) -
Add captions to videos
# Generate thumbnails from videos
Source: https://docs.pixeltable.com/howto/cookbooks/video/video-generate-thumbnails
Generate thumbnail images and preview frames for videos in Pixeltable using FrameIterator views and FFmpeg-backed computed columns at scale.
Automatically create preview thumbnails from video files at specific
timestamps or intervals.
## Problem
You have video files that need preview thumbnails for galleries, search
results, or video players. Manually extracting frames doesn’t scale.
## Solution
**What’s in this recipe:**
* Extract thumbnail at a specific timestamp
* Generate multiple thumbnails per video
* Resize thumbnails to standard dimensions
You add computed columns that extract frames from videos. Thumbnails are
generated automatically when you insert new videos.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
### Load videos
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
import pixeltable.functions as pxtf
# Create a fresh directory
pxt.drop_dir('thumbnail_demo', force=True)
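pxt.create_dir('thumbnail_demo')
# Create a video table and insert a sample video
# (assumed setup: the same sample video used in the other video recipes)
videos = pxt.create_table('thumbnail_demo/videos', {'video': pxt.Video})
videos.insert([{'video': 's3://multimedia-commons/data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4'}])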
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'thumbnail\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View videos
videos.collect()
```
### Extract thumbnail at timestamp
Extract a single frame at a specific time (e.g., 1 second into the
video):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract frame at 1 second as thumbnail
videos.add_computed_column(
thumbnail=pxtf.video.extract_frame(videos.video, timestamp=1.0)
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
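# Resize the thumbnail (assumed step; 320x180 is an illustrative size)
videos.add_computed_column(
    thumbnail_small=videos.thumbnail.resize((320, 180))
)
# View resized thumbnails with dimensions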
videos.select(
videos.thumbnail_small,
videos.thumbnail_small.width,
videos.thumbnail_small.height,
).collect()
```
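When choosing thumbnail dimensions, it is often useful to keep the source aspect ratio. The following plain-Python sketch computes the largest size that fits inside a bounding box; `fit_within` is a hypothetical helper, not a Pixeltable function, and the sizes are illustrative.

```python
def fit_within(width: int, height: int, max_width: int, max_height: int) -> tuple[int, int]:
    """Largest (width, height) with the source aspect ratio that fits in the box."""
    if width * max_height >= height * max_width:
        # Width is the limiting dimension
        new_w = max_width
        new_h = max(1, (height * max_width + width // 2) // width)
    else:
        # Height is the limiting dimension
        new_h = max_height
        new_w = max(1, (width * max_height + height // 2) // height)
    return new_w, new_h


print(fit_within(1920, 1080, 320, 320))  # (320, 180)
```

The resulting tuple could then be passed to an image resize step.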
### Multiple thumbnails with `frame_iterator`
For preview strips or timeline thumbnails, use `frame_iterator` to
extract multiple frames:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a view with frames extracted at 0.5 fps (one frame every 2 seconds)
frames = pxt.create_view(
'thumbnail_demo/frames',
videos,
iterator=pxtf.video.frame_iterator(videos.video, fps=0.5),
)
```
Inserting rows into \`frames\`: 17 rows \[00:00, 9736.88 rows/s]
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View extracted frames (multiple per video)
frames.select(frames.frame, frames.pos).head(10)
```
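As a rough guide to how many thumbnails a given `fps` produces, the arithmetic can be sketched in plain Python. This assumes evenly spaced sampling starting at time 0, which is a simplification; the actual iterator may select the nearest available frames. `sample_timestamps` is a hypothetical helper.

```python
import math


def sample_timestamps(duration_s: float, fps: float) -> list[float]:
    """Timestamps (seconds) of evenly spaced samples, starting at 0 (assumed model)."""
    if fps <= 0:
        raise ValueError('fps must be positive')
    count = math.ceil(duration_s * fps)
    return [i / fps for i in range(count)]


# 0.5 fps over a 10-second video: one sample every 2 seconds
print(sample_timestamps(10.0, 0.5))  # [0.0, 2.0, 4.0, 6.0, 8.0]
```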
## Explanation
**Thumbnail extraction methods:**
**Common thumbnail sizes:**
## See also
* [Extract frames from
videos](/howto/cookbooks/video/video-extract-frames) -
Detailed frame extraction guide
* [Load media from
S3](/howto/cookbooks/data/data-import-s3) -
Import videos from cloud storage
* [Transform images with
PIL](/howto/cookbooks/images/img-pil-transforms) -
Resize and crop images
# Create a video slideshow from images
Source: https://docs.pixeltable.com/howto/cookbooks/video/video-image-slideshow
Build slideshow videos from sequences of images in Pixeltable using FFmpeg-backed UDFs with configurable timing, transitions, and audio tracks.
Turn a collection of images into an animated video with Ken Burns
effects, text overlays, logos, and background music — using a fully
declarative Pixeltable pipeline.
## Problem
You have a set of images and want to produce a polished video — an ad, a
product reel, a social media clip. Typically this means a video editor
or a complex ffmpeg scripting pipeline.
## Solution
**What’s in this recipe:**
* Convert still images into animated video clips with pan effects
* Control pan direction per image from table data (no Python loops)
* Add per-image captions and a logo overlay
* Concatenate all clips and add background music
The key insight: store per-clip metadata (caption text, pan direction,
logo) as **table columns**. One chained computed column expression
handles the entire pipeline, and Pixeltable evaluates it per row.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
Note: you may need to restart the kernel to use updated packages.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.video import concat_videos_agg, with_audio
pxt.drop_dir('slideshow_demo', force=True)
pxt.create_dir('slideshow_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Pixeltable dashboard available at: [http://localhost:22089](http://localhost:22089)
Created directory 'slideshow\_demo'.
## Step 1: Define the data
Each row is one clip in the final video. Per-clip variation comes from
table columns:
* `caption`: text overlay for each clip
* `pan_sign`: `+1` for pan-right, `-1` for pan-left (stored as `pxt.Int`)
* `logo`: image to overlay in the corner
This is what makes the pipeline fully declarative — no Python loops, no
conditional logic.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
LOGO_URL = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/pixeltable-logo-large.png'
t = pxt.create_table(
'slideshow_demo/clips',
{
'image': pxt.Image,
'seq': pxt.Int,
'caption': pxt.String,
'logo': pxt.Image,
'pan_sign': pxt.Int,
},
)
t.insert(
[
{
'image': 'https://images.unsplash.com/photo-1506744038136-46273834b3fb?w=1920&q=80',
'seq': 0,
'caption': 'DISCOVER NATURE',
'logo': LOGO_URL,
'pan_sign': 1,
},
{
'image': 'https://images.unsplash.com/photo-1470071459604-3b5ec3a7fe05?w=1920&q=80',
'seq': 1,
'caption': 'WILD FORESTS',
'logo': LOGO_URL,
'pan_sign': -1,
},
{
'image': 'https://images.unsplash.com/photo-1441974231531-c6227db76b6e?w=1920&q=80',
'seq': 2,
'caption': 'SUNLIT CANOPY',
'logo': LOGO_URL,
'pan_sign': 1,
},
{
'image': 'https://images.unsplash.com/photo-1507525428034-b723cf961d3e?w=1920&q=80',
'seq': 3,
'caption': 'OCEAN BREEZE',
'logo': LOGO_URL,
'pan_sign': -1,
},
{
'image': 'https://images.unsplash.com/photo-1519681393784-d120267933ba?w=1920&q=80',
'seq': 4,
'caption': 'EXPLORE MORE',
'logo': LOGO_URL,
'pan_sign': 1,
},
]
)
```
Inserted 5 rows with 0 errors in 1.04 s (4.81 rows/s)
5 rows inserted.
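The rows above alternate pan direction from clip to clip. If the list of images grows, the input rows can be prepared from a shorter list of `(url, caption)` pairs. This is a plain-Python sketch for preparing data only (the pipeline itself stays loop-free); the `clips` list and the even/odd rule are illustrative choices.

```python
LOGO_URL = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/pixeltable-logo-large.png'

# (image URL, caption) pairs; two entries shown for brevity
clips = [
    ('https://images.unsplash.com/photo-1506744038136-46273834b3fb?w=1920&q=80', 'DISCOVER NATURE'),
    ('https://images.unsplash.com/photo-1470071459604-3b5ec3a7fe05?w=1920&q=80', 'WILD FORESTS'),
]

rows = [
    {
        'image': url,
        'seq': seq,
        'caption': caption,
        'logo': LOGO_URL,
        # Even clips pan right (+1), odd clips pan left (-1)
        'pan_sign': 1 if seq % 2 == 0 else -1,
    }
    for seq, (url, caption) in enumerate(clips)
]
print([(r['seq'], r['pan_sign']) for r in rows])  # [(0, 1), (1, -1)]
```

The resulting `rows` list has the same shape as the literal passed to `t.insert()` above.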
## Step 2: Build the video pipeline
One computed column chains the entire transformation:
image → static video → resize → pan effect → resize → logo overlay → text overlay
`pan()` is a convenience wrapper around `scroll()` that computes
viewport size, start position, and speed automatically. It accepts
column expressions for per-row direction:
* `pan_sign = +1` → pans right
* `pan_sign = -1` → pans left
For lower-level control (custom speed, diagonal pans, asymmetric crops),
use `scroll()` directly.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
W, H, DUR, CROP = 1280, 720, 4.0, 0.25
# Base: still image → video → uniform resolution
t.add_computed_column(
base=t.image.to_video(duration=DUR).resize(width=W, height=H)
)
# Full pipeline: pan → resize → logo → caption
# pan() accepts column expressions for per-row direction
t.add_computed_column(
clip=t.base.pan(x_sign=t.pan_sign, crop_pct=CROP)
.resize(width=W, height=H)
.overlay_image(
t.logo,
scale=0.10,
opacity=0.85,
horizontal_align='right',
vertical_align='top',
horizontal_margin=15,
vertical_margin=15,
)
.overlay_text(
t.caption,
font_size=44,
color='white',
horizontal_align='center',
vertical_align='bottom',
vertical_margin=50,
box=True,
box_color='black',
box_opacity=0.5,
box_border=[8, 16],
start_time=0.5,
end_time=3.0,
)
)
```
Added 5 column values with 0 errors in 6.43 s (0.78 rows/s)
5 rows updated.
## Step 3: Preview individual clips
Each row now has a fully rendered video clip. Let’s inspect them.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(
t.seq, t.caption, t.pan_sign, dur=t.clip.get_duration()
).order_by(t.seq).collect()
```
## Step 4: Concatenate into final video
`concat_videos_agg` merges all clips in `seq` order into a single video.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
video_path = (
t.group_by()
.select(v=concat_videos_agg(t.seq, t.clip))
.collect()[0]['v']
)
```
## Step 5: Add background music
`with_audio()` replaces (or adds) the audio track on a video. The audio
is trimmed to match the video duration automatically.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
MUSIC_URL = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/sample-background-music.m4a'
final = pxt.create_table(
'slideshow_demo/final', {'video': pxt.Video, 'music': pxt.Audio}
)
final.insert([{'video': video_path, 'music': MUSIC_URL}])
final.add_computed_column(out=with_audio(final.video, final.music))
final.select(final.out).collect()
```
## How it works
The entire pipeline is declarative: per-clip variation comes from
**data**, not code. One computed column expression handles the full
transformation chain. Pixeltable evaluates it per row, pulling caption,
logo, and pan direction from each row’s data.
### Alternative effects
Swap the pan for other built-in effects:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Each alternative uses its own column name, since `clip` already exists
# Zoom instead of pan
t.add_computed_column(clip_zoom=t.base.zoom(start_scale=1.0, end_scale=1.3))
# Fade-through-black transitions (add to each clip before concat)
t.add_computed_column(clip_fade=t.base.fade_in(duration=0.5).fade_out(duration=0.5))
# Combine: pan + fade
t.add_computed_column(
    clip_pan_fade=t.base.pan(x_sign=1)
    .resize(width=1280, height=720)
    .fade_in(duration=0.5)
    .fade_out(duration=0.5)
)
```
### Audio
This recipe uses `with_audio()` to replace the soundtrack. To **blend**
a second audio track into a video that already has audio (e.g. add
background music under narration), use `mix_audio()`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.video.mix_audio(t.music, audio_volume=0.3, original_volume=1.0)
```
# Detect scene changes in videos
Source: https://docs.pixeltable.com/howto/cookbooks/video/video-scene-detection
Detect scene cuts and shot boundaries in videos using Pixeltable to split footage into segments for indexing, summarization, and editing.
Automatically find scene cuts, transitions, and fades in video files.
## Problem
You have video files and need to identify where scenes begin and end, for example to split footage into segments for indexing, summarization, or editing.
## Solution
**What’s in this recipe:**
* Detect hard cuts with `scene_detect_content()`
* Find fade transitions with `scene_detect_threshold()`
* Use adaptive detection with `scene_detect_adaptive()`
Three built-in detection methods handle different transition types using
PySceneDetect.
### Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable scenedetect opencv-python
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Create a fresh directory
pxt.drop_dir('scene_demo', force=True)
pxt.create_dir('scene_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'scene\_demo'.
### Load sample videos
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a video table
videos = pxt.create_table(
'scene_demo/videos', {'video': pxt.Video, 'title': pxt.String}
)
# Insert sample videos from S3
videos.insert(
[
{
'video': 's3://multimedia-commons/data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4',
'title': 'Sample video 1',
}
]
)
```
Created table 'videos'.
Inserting rows into \`videos\`: 1 rows \[00:00, 200.53 rows/s]
Inserted 1 row with 0 errors.
1 row inserted, 3 values computed.
### Detect scenes with content-based detection
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Detect scenes using content-based detection (best for hard cuts)
videos.add_computed_column(
scenes_content=videos.video.scene_detect_content(
threshold=27.0, # Lower = more sensitive
min_scene_len=15, # Minimum frames between cuts
)
)
# View detected scenes
videos.select(videos.title, videos.scenes_content).collect()
```
### Adaptive detection for complex videos
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Adaptive detection adjusts to video content dynamically
videos.add_computed_column(
scenes_adaptive=videos.video.scene_detect_adaptive(
adaptive_threshold=3.0, # Lower = more scenes detected
min_scene_len=15,
fps=2.0, # Analyze at 2 FPS for speed
)
)
# View adaptively-detected scenes
videos.select(videos.title, videos.scenes_adaptive).collect()
```
Added 1 column value with 0 errors.
## Explanation
**Detection methods:**
**Output format:**
Each method returns a list of scene dictionaries:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{
'start_time': 5.2, # Scene start in seconds
'start_pts': 156, # Presentation timestamp
'duration': 3.8 # Scene duration in seconds
}
```
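Given that output shape, end times and simple statistics can be derived in plain Python. In this sketch the second scene reuses the values shown above and the first is made up for illustration; `with_end_time` and `longest_scene` are hypothetical helpers.

```python
scenes = [
    {'start_time': 0.0, 'start_pts': 0, 'duration': 5.2},    # illustrative
    {'start_time': 5.2, 'start_pts': 156, 'duration': 3.8},  # values from the example above
]


def with_end_time(scene: dict) -> dict:
    """Copy of the scene dict with an added 'end_time' in seconds."""
    return {**scene, 'end_time': scene['start_time'] + scene['duration']}


def longest_scene(scene_list: list[dict]) -> dict:
    """Scene with the greatest duration."""
    return max(scene_list, key=lambda s: s['duration'])


for scene in map(with_end_time, scenes):
    print(f"{scene['start_time']:.1f}s -> {scene['end_time']:.1f}s")
print(longest_scene(scenes)['start_pts'])
```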
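**Tuning tips:**
* Lower `threshold` values make content-based detection more sensitive.
* `min_scene_len` sets the minimum number of frames between cuts.
* Lower `adaptive_threshold` values detect more scenes.
* A lower analysis `fps` (e.g. `2.0`) speeds up detection.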
## See also
* [Extract frames from
videos](/howto/cookbooks/video/video-extract-frames) -
Get frames at scene boundaries
* [Generate
thumbnails](/howto/cookbooks/video/video-generate-thumbnails) -
Create preview images
# Infrastructure Setup
Source: https://docs.pixeltable.com/howto/deployment/infrastructure
Organize Pixeltable code, configure storage backends, and design infrastructure for production deployments of multimodal AI pipelines.
## Code Organization
Separate schema definition from router logic.
**Schema Definition (`schema.py`):**
* Tables, views, computed columns, indexes, and agent-internal `@pxt.query` functions
* Flat module with `if_exists='ignore'` for idempotency (no `setup()` wrapper, no `_initialized` flag)
* Run once before starting workers: `python schema.py`
**Router Files (`routers/data.py`, `routers/search.py`, etc.):**
* Call `pxt.get_table()` directly to get table handles
* Define router-facing `@pxt.query` functions next to the routes that use them
* No `import schema` needed; tables already exist from the init step
**Configuration (`config.py`):**
* Externalizes model IDs, API keys, thresholds, connection strings
* Uses environment variables (`.env` + `python-dotenv`) or secrets management
* Never hardcodes secrets
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# schema.py — creates tables (flat, idempotent, no queries for routers)
import pixeltable as pxt
import config
pxt.create_dir(config.APP_NAMESPACE, if_exists='ignore')
docs = pxt.create_table(
f'{config.APP_NAMESPACE}/documents',
{'document': pxt.Document, 'metadata': pxt.Json, 'timestamp': pxt.Timestamp},
if_exists='ignore',
)
# ---
# routers/data.py — queries live next to the routes that use them
import pixeltable as pxt
from pixeltable.serving import FastAPIRouter
import config
router = FastAPIRouter(prefix="/api/data", tags=["data"])
docs = pxt.get_table(f'{config.APP_NAMESPACE}/documents')
@pxt.query
def list_documents():
    return docs.select(docs.document, docs.timestamp).order_by(docs.timestamp)
router.add_query_route(path="/documents", query=list_documents, method="get")
```
## Project Structure
```
project/
├── config.py # Environment variables, model IDs, API keys
├── functions.py # Custom UDFs (imported as modules)
├── schema.py # Schema definition (tables, views, indexes)
├── main.py # FastAPI app, mounts routers
├── routers/
│ ├── data.py # CRUD routes + queries for data pipeline
│ ├── search.py # Search routes + queries
│ └── agent.py # Agent routes (declarative + hand-written)
├── pyproject.toml # Dependencies and pxt serve config
└── .env # Secrets (gitignored)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# config.py
import os
ENV = os.getenv('ENVIRONMENT', 'dev')
APP_NAMESPACE = f'{ENV}_myapp'
# Model Configuration
EMBEDDING_MODEL = os.getenv('EMBEDDING_MODEL', 'intfloat/e5-large-v2')
OPENAI_MODEL = os.getenv('OPENAI_MODEL', 'gpt-4o-mini')
# Storage
MEDIA_STORAGE_BUCKET = os.getenv('MEDIA_STORAGE_BUCKET')
# Prompts
RAG_SYSTEM_PROMPT = """You are a helpful assistant. Use the provided context to answer questions."""
```
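The configuration above falls back to defaults when a variable is unset. For settings that have no sensible default, a fail-fast lookup surfaces misconfiguration at startup instead of at first use. A minimal sketch; `require_env` is a hypothetical helper, not part of Pixeltable, and the placeholder value exists only so the example runs standalone.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f'Missing required environment variable: {name}')
    return value


# Demo only: set a placeholder so the example runs standalone
os.environ.setdefault('MEDIA_STORAGE_BUCKET', 'example-bucket')
MEDIA_STORAGE_BUCKET = require_env('MEDIA_STORAGE_BUCKET')
print(MEDIA_STORAGE_BUCKET)
```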
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# functions.py
import pixeltable as pxt
@pxt.udf
def format_prompt(context: list, question: str) -> str:
    """Format RAG prompt with context."""
    context_str = "\n".join([doc['text'] for doc in context])
    return f"Context:\n{context_str}\n\nQuestion: {question}"
@pxt.udf(resource_pool='request-rate:my_service')
async def call_custom_model(prompt: str) -> dict:
    """Call self-hosted model endpoint."""
    # Your custom logic here
    return {"response": "..."}
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# schema.py (full version with embedding column and index)
import pixeltable as pxt
from pixeltable.functions.huggingface import sentence_transformer
import config
pxt.create_dir(config.APP_NAMESPACE, if_exists='ignore')
docs = pxt.create_table(
f'{config.APP_NAMESPACE}/documents',
{'document': pxt.Document, 'metadata': pxt.Json, 'timestamp': pxt.Timestamp},
if_exists='ignore',
)
docs.add_computed_column(
embedding=sentence_transformer(docs.document, model_id=config.EMBEDDING_MODEL),
if_exists='ignore',
)
docs.add_embedding_index('embedding', idx_name='docs_embed', metric='cosine', if_exists='ignore')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# routers/search.py
import pixeltable as pxt
from pixeltable.serving import FastAPIRouter
import config
router = FastAPIRouter(prefix="/api/search", tags=["search"])
docs = pxt.get_table(f'{config.APP_NAMESPACE}/documents')
@pxt.query
def search_documents(query_text: str, limit: int = 5):
    sim = docs.embedding.similarity(string=query_text)
    return docs.order_by(sim, asc=False).limit(limit).select(docs.document, sim)
router.add_query_route(path="/documents", query=search_documents)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# main.py
from fastapi import FastAPI
from routers import data, search
app = FastAPI()
app.include_router(data.router)
app.include_router(search.router)
```
**Key Principles:**
* **Schema separate from routers:** `schema.py` defines tables/views/indexes. Router files define `@pxt.query` functions next to the routes that use them. No cross-imports needed.
* **Module UDFs** (`functions.py`): Defined in an importable module, so they pick up code changes when the module is updated and are easier to test. [Learn more](/platform/udfs-in-pixeltable)
* **Idempotency:** Use `if_exists='ignore'` to make `schema.py` safely re-runnable.
* **Built-in HTTP serving:** For standard endpoints, consider [`pxt serve`](/howto/deployment/serving) with a TOML config.
* **`return_rows=True`:** Pass to `insert()`/`update()` to get computed column values back without a follow-up query. See [HTTP Serving](/howto/deployment/serving#reading-back-computed-columns-after-insert).
* **Multi-worker deployments:** With `--workers N`, run `python schema.py` before `uvicorn` so schema creation happens once, not per worker (see [Starter Kit Dockerfile](https://github.com/pixeltable/pixeltable-starter-kit)).
See this structure in action in the [Pixeltable Starter Kit](https://github.com/pixeltable/pixeltable-starter-kit): a production-ready FastAPI + React app with schema definition, config, UDFs, and endpoint routers already wired up. It includes deployment configs for Docker, Helm, Terraform (EKS/GKE/AKS), and AWS CDK.
## Storage Architecture
Pixeltable is an OLTP database built on embedded PostgreSQL. It uses multiple storage mechanisms:
```mermaid theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
flowchart LR
subgraph Home[~/.pixeltable/]
direction TB
PG[(pgdata PostgreSQL)]
Media[media Generated Files]
Cache[file_cache LRU Cache]
Tmp[tmp Temporary]
end
Cloud[Cloud Storage S3/GCS]
Media -.->|Optional| Cloud
Cache <-->|Downloads| Cloud
```
**Important Concept:** Pixeltable directories (`pxt.create_dir`) are logical namespaces in the catalog, NOT filesystem directories.
**How Media is Stored:**
* PostgreSQL stores only file paths/URLs, never raw media data.
* Inserted local files: path stored, original file remains in place.
* Inserted URLs: URL stored, file downloaded to File Cache on first access.
* Generated media (computed columns): saved to Media Store (default: local, configurable to S3/GCS/Azure per-column).
* File Cache size: configure via `file_cache_size_g` in `~/.pixeltable/config.toml`. [See configuration guide](/platform/configuration)
For large datasets with remote media, consider increasing file cache size to avoid repeated downloads (default is 20% of available disk):
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# ~/.pixeltable/config.toml
file_cache_size_g = 50 # 50 GB cache
```
### References, Not Copies
Unlike vector databases that require ingesting data into their own storage format, Pixeltable stores **references** to external files. Your original media stays in S3/GCS/Azure; only computed results (embeddings, metadata, generated media) are stored locally or in configured cloud buckets.
```mermaid theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
flowchart LR
S3[S3 / GCS / Azure] -. reference .-> PXT[Pixeltable]
PXT --> Meta[Computed Results]
PXT -. lazy load .-> S3
```
This means:
* **No data duplication** — you don't pay for storage twice.
* **Schema changes don't require re-upload** — add a column, not a migration script.
* **Works with existing storage** — point Pixeltable at your current buckets.
**Deployment-Specific Storage Patterns:**
*Batch Processing:*
* Pixeltable storage can be ephemeral (re-computable).
* Export processing results to an external RDBMS via `export_sql`, and media to blob storage via `destination`.
* Reference input media from S3/GCS/Azure URIs.
*Full Backend:*
* Pixeltable IS the RDBMS (embedded PostgreSQL, not replaceable).
* Requires persistent volume at `~/.pixeltable` (pgdata, media, file\_cache).
* Media Store configurable to S3/GCS/Azure buckets for generated files.
*Declarative Serving (`pxt serve`):*
* Same persistent storage as Full Backend.
* API routes declared in `pyproject.toml`, no hand-written endpoint code.
All [Starter Kit](https://github.com/pixeltable/pixeltable-starter-kit) deployment configs set `PIXELTABLE_HOME=/data/pixeltable` pointing to persistent storage (Docker volumes, K8s PVCs, or EFS). For large media workloads, configure external blob storage:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
PIXELTABLE_INPUT_MEDIA_DEST=s3://your-bucket/input # or gs:// or az://
PIXELTABLE_OUTPUT_MEDIA_DEST=s3://your-bucket/output
```
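A startup check can catch a mistyped destination before any media is written. The sketch below is illustrative only and not part of Pixeltable: the helper name `check_media_dest` is hypothetical, and the accepted schemes are an assumption taken from the `s3://`, `gs://`, and `az://` prefixes mentioned above.

```python
from urllib.parse import urlparse

# Schemes named in the example above; an assumption, not an exhaustive list.
ALLOWED_SCHEMES = {"s3", "gs", "az"}

def check_media_dest(uri: str) -> str:
    """Return the URI unchanged if it looks like a bucket URI, else raise ValueError."""
    parsed = urlparse(uri)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported scheme in {uri!r}")
    if not parsed.netloc:
        raise ValueError(f"missing bucket name in {uri!r}")
    return uri

print(check_media_dest("s3://your-bucket/output"))
```

Calling such a helper on each environment variable at process start turns a silent misconfiguration into an immediate, readable error.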
## Dependency Management
**Virtual Environments:**
Use `venv`, `conda`, or `uv` to isolate dependencies.
**Dependencies (`pyproject.toml`):**
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[project]
name = "my-pixeltable-app"
requires-python = ">=3.10"
dependencies = [
"pixeltable>=0.6.2",
"fastapi[standard]>=0.115.0",
"python-dotenv>=1.0.1",
"sentence-transformers>=3.3.0", # If using embedding indexes
]
```
* Use `pyproject.toml` with `uv` or `pip` for dependency management
* Include integration packages (e.g., `openai`, `sentence-transformers`)
* Test updates in staging before production
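To make a missing integration package fail fast rather than at first use, an application can check its installed distributions at startup. This is a minimal sketch using only the standard library; the helper name `missing_packages` and the list of required names are assumptions for illustration.

```python
from importlib import metadata

def missing_packages(names: list[str]) -> list[str]:
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing

# Hypothetical requirement list mirroring the pyproject.toml above.
required = ["pixeltable", "fastapi", "sentence-transformers"]
print("missing:", missing_packages(required))
```

This only checks presence, not version constraints; the resolver (`uv` or `pip`) remains the source of truth for versions.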
## Data Interoperability
Pixeltable integrates with existing data pipelines via import/export capabilities. See the [Import/Export SDK reference](/sdk/latest/io) for full details.
**Import:**
* CSV, Excel, JSON: [`pxt.io.import_csv()`](/sdk/latest/io#func-import_csv), [`pxt.io.import_excel()`](/sdk/latest/io#func-import_excel), [`pxt.io.import_json()`](/sdk/latest/io#func-import_json)
* Parquet: [`pxt.io.import_parquet()`](/sdk/latest/io#func-import_parquet)
* Pandas DataFrames: [`table.insert(df)`](/sdk/latest/table#method-insert) or [`pxt.create_table(source=df)`](/sdk/latest/pixeltable#func-create_table)
* Hugging Face Datasets: [`pxt.io.import_huggingface_dataset()`](/sdk/latest/io#func-import_huggingface_dataset)
**Export:**
* CSV: [`pxt.io.export_csv(table, path)`](/sdk/latest/io#func-export_csv) for tabular data
* JSON: [`pxt.io.export_json(table, path)`](/sdk/latest/io#func-export_json) for structured data
* Parquet: [`pxt.io.export_parquet(table, path)`](/sdk/latest/io#func-export_parquet) for data warehousing
* LanceDB: [`pxt.io.export_lancedb(table, db_uri, table_name)`](/sdk/latest/io#func-export_lancedb) for vector databases
* PyTorch: [`table.to_pytorch_dataset()`](/sdk/latest/query#method-to_pytorch_dataset) for ML training pipelines
* COCO: [`table.to_coco_dataset()`](/sdk/latest/query#method-to_coco_dataset) for computer vision
* Pandas: [`table.collect().to_pandas()`](/sdk/latest/query#method-collect) for analysis
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Export query results to Parquet
import pixeltable as pxt
docs_table = pxt.get_table('myapp/documents')
results = docs_table.where(docs_table.timestamp > '2024-01-01')
pxt.io.export_parquet(results, '/data/exports/recent_docs.parquet')
```
# Monitoring & Performance
Source: https://docs.pixeltable.com/howto/deployment/monitoring
Monitor Pixeltable pipelines in production with structured logging, resource metrics, performance tuning, and provider rate-limit handling.
## Logging
* Implement Python logging in UDFs and application endpoints
* Track execution time, errors, API call latency
* Use structured logging (JSON) for log aggregation
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import logging
import time

import pixeltable as pxt

logger = logging.getLogger(__name__)

@pxt.udf
def process_video(video: pxt.Video) -> pxt.Json:
    start = time.time()
    try:
        # Your processing logic here
        result = {'processed': True}
        logger.info(f"Processed in {time.time() - start:.2f}s")
        return result
    except Exception as e:
        logger.error(f"Processing failed: {e}")
        raise
```
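The structured (JSON) logging recommended above can be done with the standard library alone. The following is one possible sketch, not a Pixeltable feature: the class name `JsonFormatter`, the logger name `myapp.udfs`, and the extra fields `udf` and `duration_s` are all hypothetical choices.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra={...}` become attributes on the record.
        for key in ("udf", "duration_s"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("myapp.udfs")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("processed video", extra={"udf": "process_video", "duration_s": 1.25})
```

One JSON object per line is easy for most log aggregators to ingest, and numeric fields such as `duration_s` can then be filtered or graphed without parsing message strings.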
## Resource Monitoring
* Monitor CPU, RAM, Disk I/O, Network on Pixeltable host
* Track UDF execution time and model inference latency
* Alert on resource exhaustion
**Key Metrics to Track:**
| Metric | What to Watch |
| -------- | ------------------------------------- |
| CPU | Sustained high usage during inference |
| Memory | Growth over time (potential leaks) |
| Disk I/O | Bottlenecks during media processing |
| Network | API call latency to external services |
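As a starting point for disk alerts, the standard library can report how full the volume holding Pixeltable's home directory is. This sketch assumes the home directory is given by `PIXELTABLE_HOME` with a fallback of `~/.pixeltable`; the 90% threshold and the function name `disk_usage_percent` are arbitrary illustrative choices.

```python
import os
import shutil
from pathlib import Path

def disk_usage_percent(path: Path) -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

home = Path(os.environ.get("PIXELTABLE_HOME", Path.home() / ".pixeltable"))
# Fall back to the nearest existing parent so the check also works before first run.
target = next(p for p in [home, *home.parents] if p.exists())

percent = disk_usage_percent(target)
if percent > 90.0:  # hypothetical alert threshold
    print(f"WARNING: disk {percent:.1f}% full at {target}")
```

In production the same number would typically be exported to whatever metrics system is already in place instead of printed.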
## Optimization
### Batch Operations
Use batch processing for better throughput:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Batch UDF execution for GPU models
@pxt.udf(batch_size=32)
def embed_batch(texts: pxt.Batch[str]) -> pxt.Batch[list[float]]:
    # Process multiple items at once
    return model.encode(texts)

# Batch inserts (more efficient than individual inserts)
table.insert([row1, row2, row3, ...])
```
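To see why batching helps, it can be useful to picture what a batch size of 32 means for the number of model calls. The plain-Python sketch below only illustrates the idea; it is not how Pixeltable schedules batches internally, and `batches`, `run_batched`, and `fake_model` are hypothetical names.

```python
from typing import Callable, Iterator, Sequence

def batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def run_batched(fn: Callable[[Sequence[str]], list], items: Sequence[str], batch_size: int) -> list:
    """Call `fn` once per batch and flatten the per-batch results."""
    results: list = []
    for batch in batches(items, batch_size):
        results.extend(fn(batch))
    return results

calls = []

def fake_model(texts: Sequence[str]) -> list:
    # Stand-in for model.encode(): records the batch size, returns one value per input.
    calls.append(len(texts))
    return [len(t) for t in texts]

out = run_batched(fake_model, [f"doc {i}" for i in range(70)], batch_size=32)
print(calls)  # [32, 32, 6]
```

Seventy inputs cost three model invocations instead of seventy, which is where the throughput gain on GPU-backed models comes from.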
### Performance Tips
* **Batch Operations:** Use `@pxt.udf(batch_size=32)` for GPU model inference
* **Batch Inserts:** Insert multiple rows at once: `table.insert([row1, row2, ...])`
* **Profile UDFs:** Add execution time logging to identify bottlenecks
* **Embedding Indexes:** Use pgvector for efficient similarity search
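For the profiling tip above, a small timing decorator keeps the measurement code out of the function body. This is a generic sketch; the names `timed` and `slow_step` are hypothetical, and whether the decorator composes cleanly with `@pxt.udf` has not been verified here, so timing inside the UDF body (as in the Logging section) remains the conservative option.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)

def timed(fn):
    """Log how long each call to `fn` takes, including calls that raise."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            logger.info("%s took %.3fs", fn.__name__, time.perf_counter() - start)
    return wrapper

@timed
def slow_step(n: int) -> int:
    time.sleep(0.01)  # stand-in for real work
    return n * 2

print(slow_step(21))
```

Using `try`/`finally` means slow failures are recorded too, which is often where bottlenecks hide.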
## Rate Limiting
### Built-In Provider Limits
Automatic rate limiting for OpenAI, Anthropic, Gemini, etc. is configured per-model in `config.toml`:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# ~/.pixeltable/config.toml
[openai]
requests_per_minute = 500
tokens_per_minute = 90000
```
### Custom API Rate Limiting
Use `resource_pool` to throttle calls to self-hosted models or custom endpoints:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Default: 600 requests per minute
@pxt.udf(resource_pool='request-rate:my_service')
async def call_custom_api(prompt: str) -> dict:
    # Your logic to call custom endpoint
    return await custom_api_call(prompt)

# Example: Custom rate-limited UDF for self-hosted model
@pxt.udf(resource_pool='request-rate:my_ray_cluster')
async def call_ray_model(prompt: str, model: str) -> dict:
    # Your logic to call FastAPI + Ray cluster
    return await custom_api_call(prompt, model)
```
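As a mental model for what a request-rate pool enforces, the sketch below implements a simple sliding-window limiter. It is a conceptual illustration only and makes no claim about how Pixeltable's scheduler works; the class `RequestRateLimiter` and its parameters are hypothetical.

```python
import time
from collections import deque

class RequestRateLimiter:
    """Allow at most `max_requests` calls per `window_s` seconds (sliding window)."""

    def __init__(self, max_requests: int, window_s: float = 60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._stamps = deque()  # timestamps of accepted requests, oldest first

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) < self.max_requests:
            self._stamps.append(now)
            return True
        return False

limiter = RequestRateLimiter(max_requests=600)  # mirrors the 600/minute default noted above
print(limiter.try_acquire())
```

A caller that receives `False` would wait and retry; the point of a shared pool is that every UDF naming the same pool draws from the same budget.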
## Advanced Features
* Build complex agent workflows as computed columns with tool calling, MCP integration, and persistent state.
* Publish and replicate tables across Pixeltable instances for team collaboration.
* Create immutable point-in-time copies for reproducible ML experiments.
* Sync tables with annotation projects for human-in-the-loop workflows.
# Production Operations
Source: https://docs.pixeltable.com/howto/deployment/operations
Operate Pixeltable in production with concurrency control, error handling, schema evolution, and zero-downtime deployment patterns.
## Concurrent Access & Scaling
| Aspect | Details |
| ----------------- | ---------------------------------------------------------------------------------- |
| **Thread Safety** | Each thread gets its own database connection and transaction context automatically |
| **Locking** | Automatic table-level locking for schema changes |
| **Isolation** | PostgreSQL `SERIALIZABLE` isolation prevents data race conditions |
| **Retries** | Built-in retry logic handles transient serialization failures |
| Scaling Dimension | Current Approach | Limitation |
| --------------------- | ----------------------------------------------- | ---------------------------------------- |
| **Metadata Storage** | Single embedded PostgreSQL instance | Vertical scaling (larger EC2/VM) |
| **Compute** | Multiple API workers connected to same instance | Shared access to storage volume required |
| **High Availability** | Single attached storage volume | Failover requires volume detach/reattach |
Multi-node HA and horizontal scaling are planned for Pixeltable Cloud (2026).
## Web Framework Concurrency
**For standard insert, query, and delete endpoints,** consider [built-in HTTP serving](/howto/deployment/serving) with `FastAPIRouter` or a TOML config before writing custom endpoint handlers. It handles request/response schemas, media serving, and background jobs automatically.
Pixeltable is thread-safe and works with FastAPI, Flask, Django, and other web frameworks out of the box. The key rule: **use sync (`def`) endpoint handlers**, not `async def`.
### Why Sync Endpoints
FastAPI (and Starlette) dispatches sync (`def`) handlers to a thread pool. Each concurrent request gets its own thread, and Pixeltable automatically creates an isolated database connection per thread. This gives you true parallel request handling with no extra configuration.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pydantic import BaseModel
from fastapi import FastAPI
import pixeltable as pxt

app = FastAPI()

class SearchResult(BaseModel):
    text: str
    score: float

@app.post("/ingest")
def ingest(text: str):
    t = pxt.get_table('myapp/documents')
    status = t.insert([{'text': text}])
    return {'inserted': status.num_rows}

@app.get("/search")
def search(query: str, limit: int = 10) -> list[SearchResult]:
    t = pxt.get_table('myapp/documents')
    sim = t.text.similarity(string=query)
    results = (
        t.order_by(sim, asc=False)
        .limit(limit)
        .select(t.text, score=sim)
        .collect()
    )
    return list(results.to_pydantic(SearchResult))
```
**Do not use `async def` for endpoints that call Pixeltable.** Pixeltable's API is synchronous. Inside an `async def` handler, Pixeltable calls block the event loop, serializing all requests and starving other coroutines. With `def` handlers, FastAPI's thread pool handles concurrency for you.
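The effect is easy to reproduce without any web framework. In this sketch a 0.2-second `time.sleep` stands in for a synchronous Pixeltable call, and `asyncio.to_thread` stands in for the thread-pool dispatch that sync handlers receive; the handler names are hypothetical.

```python
import asyncio
import time

def blocking_call() -> None:
    """Stand-in for a synchronous library call (e.g. a database query)."""
    time.sleep(0.2)

async def handler_blocking() -> None:
    # Runs on the event loop thread: nothing else can progress meanwhile.
    blocking_call()

async def handler_threaded() -> None:
    # Runs in a worker thread, similar in spirit to how sync handlers are dispatched.
    await asyncio.to_thread(blocking_call)

async def elapsed(handler, n: int = 3) -> float:
    """Run `n` concurrent 'requests' and return the wall-clock time taken."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start

blocking_s = asyncio.run(elapsed(handler_blocking))
threaded_s = asyncio.run(elapsed(handler_threaded))
print(f"on event loop: {blocking_s:.2f}s, in threads: {threaded_s:.2f}s")
```

Three concurrent requests take roughly three times as long when the blocking call runs on the event loop, while the threaded version finishes in about the time of a single call.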
### Returning Query Results
`table.select(...).collect()` returns a `ResultSet` object, which Pydantic cannot serialize directly. You have two options:
**Option 1: `to_pydantic()` (recommended for FastAPI)**
Define a Pydantic model and let Pixeltable validate and convert each row. FastAPI serializes these natively.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
class Item(BaseModel):
    name: str
    score: float

@app.get("/rows")
def get_rows() -> list[Item]:
    t = pxt.get_table('myapp/items')
    return list(t.select(t.name, t.score).collect().to_pydantic(Item))
```
**Option 2: `to_pandas()` + `to_dict()`**
Convert via pandas when you don't need a Pydantic model.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@app.get("/rows")
def get_rows():
    t = pxt.get_table('myapp/items')
    df = t.select(t.name, t.score).collect().to_pandas()
    return {'rows': df.to_dict(orient='records')}
```
### uvloop Compatibility
Pixeltable is compatible with [uvloop](https://github.com/MagicStack/uvloop), the high-performance event loop used by default in many production deployments. No special configuration is needed — sync endpoints work identically whether the server uses the default asyncio loop or uvloop.
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# uvicorn with uvloop (the default when uvloop is installed)
uvicorn app:app --host 0.0.0.0 --port 8000 --workers 1
```
## GPU Acceleration
* **Automatic GPU Detection:** Pixeltable uses CUDA GPUs for local models (Hugging Face, Ollama) when available.
* **CPU Fallback:** Models run on CPU if no GPU detected (functional but slower).
* **Configuration:** Control via `CUDA_VISIBLE_DEVICES` environment variable.
## Error Handling
| Error Type | Mode | Behavior |
| -------------------------- | --------------------------------------- | ------------------------------------------------------- |
| **Computed Column Errors** | `on_error='abort'` (default) | Fails entire operation if any row errors |
| | `on_error='ignore'` | Continues processing; stores `None` with error metadata |
| **Media Validation** | `media_validation='on_write'` (default) | Validates media during insert (catches errors early) |
| | `media_validation='on_read'` | Defers validation until media accessed (faster inserts) |
Access error details via `table.column.errortype` and `table.column.errormsg`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Example: Graceful error handling in production
table.add_computed_column(
    analysis=llm_analyze(table.document),
    on_error='ignore'  # Continue processing despite individual failures
)

# Query for errors
errors = table.where(table.analysis.errortype != None).collect()
```
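The difference between the two `on_error` modes can be pictured with a plain-Python analogue. This sketch only mimics the behavior described in the table above (abort on first failure versus storing `None` plus error metadata); it is not Pixeltable's implementation, and `apply_with_on_error` is a hypothetical name.

```python
from typing import Any, Callable

def apply_with_on_error(fn: Callable[[Any], Any], rows: list, on_error: str = "abort") -> list[dict]:
    """Apply `fn` to each row; with on_error='ignore', record failures instead of raising."""
    out = []
    for row in rows:
        try:
            out.append({"value": fn(row), "errortype": None, "errormsg": None})
        except Exception as exc:
            if on_error != "ignore":
                raise  # 'abort': the whole operation fails
            out.append({"value": None, "errortype": type(exc).__name__, "errormsg": str(exc)})
    return out

results = apply_with_on_error(lambda x: 10 / x, [1, 0, 5], on_error="ignore")
failed = [r for r in results if r["errortype"] is not None]
print(len(results), "rows,", len(failed), "failed")
```

With `'ignore'`, one bad row costs one `None` value rather than the whole batch, at the price of having to query for errors afterwards.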
## Testing Transformations Before Deployment
When you add a computed column, Pixeltable executes it immediately for all existing rows. For expensive operations (LLM calls, model inference), validate your logic on a sample first using `select()`; nothing is stored until you commit with `add_computed_column()`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# 1. Test transformation on sample rows (nothing stored)
table.select(
    table.text,
    summary=summarize_with_llm(table.text)
).head(3)  # Only processes 3 rows

# 2. Once satisfied, persist to table (processes all rows)
table.add_computed_column(summary=summarize_with_llm(table.text))
```
This "iterate-then-add" workflow lets you catch errors early without wasting API calls or compute on your full dataset.
**Pro tip:** Save expressions as variables to guarantee identical logic in both steps:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
summary_expr = summarize_with_llm(table.text)
table.select(table.text, summary=summary_expr).head(3) # Test
table.add_computed_column(summary=summary_expr) # Commit
```
## Schema Evolution
| Operation Type | Examples | Impact |
| --------------- | -------------------------------------------------------------------------- | ------------------------------- |
| **Safe** | Add columns, Add computed columns, Add indexes | Incremental computation only |
| **Destructive** | Modify computed columns (`if_exists='replace'`), Drop columns/tables/views | Full recomputation or data loss |
**Production Safety:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Use if_exists='ignore' for idempotent schema migrations
import pixeltable as pxt
import config

docs_table = pxt.get_table(f'{config.APP_NAMESPACE}/documents')
docs_table.add_computed_column(
    embedding=embed_model(docs_table.document),
    if_exists='ignore'  # No-op if column exists
)
```
* Version control `schema.py` like database migration scripts.
* Rollback via `table.revert()` (single operation) or Git revert (complex changes).
### Updating Models
The most common schema evolution is switching an embedding or LLM model. In a traditional stack this requires a migration script, a compute cluster, reprocessing every row, and a maintenance window. In Pixeltable it's one line — the old column keeps working while the new one backfills.
**Traditional approach:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# 1. Write migration script
# 2. Spin up compute to re-embed all rows (hours of downtime)
# 3. Swap the column in application code
# 4. Deploy during maintenance window
# 5. Monitor for consistency issues
data = db.query("SELECT id, content FROM documents")
for row in data:
    new_vec = new_model.encode(row["content"])
    db.execute("UPDATE documents SET embedding = %s WHERE id = %s", (new_vec, row["id"]))
```
**Pixeltable approach:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add a new computed column. Old column still serves queries: zero downtime.
docs.add_computed_column(
    embedding_v2=sentence_transformer(docs.text, model_id='intfloat/e5-large-v2'),
    if_exists='ignore'
)
# Pixeltable backfills in batches, rate-limited, with automatic retries.
# Switch your queries to embedding_v2 when ready.
```
Because both columns coexist, you can A/B test retrieval quality before cutting over — no rollback plan needed.
## Deployment Patterns
**Web Applications:**
* For standard endpoints, use [`pxt serve`](/howto/deployment/serving) with a TOML config or [`FastAPIRouter`](/howto/deployment/serving#quickstart-python)
* Run `python schema.py` once before starting workers to create tables
* Each router calls `pxt.get_table()` directly and defines its own `@pxt.query` functions
* Use sync (`def`) endpoint handlers for concurrent request support
The [Starter Kit](https://github.com/pixeltable/pixeltable-starter-kit) is a production-ready FastAPI + React app you can clone, with multimodal upload, search, and agent endpoints, plus deployment configs for Docker Compose, Helm, Terraform, and AWS CDK.
**Batch Processing:**
* Schedule via `cron`, Airflow, AWS EventBridge, GCP Cloud Scheduler, or webhooks
* Deploy to Cloud Run Jobs, Lambda, ECS Fargate, Kubernetes Jobs
* Isolate batch workloads from real-time serving (separate containers/instances)
* Use Pixeltable's incremental computation to process only new data
* The starter kit includes a [batch processing pipeline](https://github.com/pixeltable/pixeltable-starter-kit/tree/main/batch) with `export_sql` and the `destination` parameter, plus ready-to-use deploy configs for Lambda, Cloud Run, ECS Fargate, and K8s Jobs
**Containers:**
* Docker provides reproducible builds across environments
* **Full Backend:** Mount persistent volume at `~/.pixeltable` (or set `PIXELTABLE_HOME`)
* **Kubernetes:** Use `ReadWriteOnce` PVC (single-pod write access)
* Docker Compose or Kubernetes for multi-container deployments
* The starter kit includes a [multi-stage Dockerfile](https://github.com/pixeltable/pixeltable-starter-kit/blob/main/Dockerfile) and ready-to-use deployment configs:
| Method | Directory | Use case |
| ------------------- | -------------------------------------------------------------------------------------------------------------- | ------------------------ |
| **Docker Compose** | [root](https://github.com/pixeltable/pixeltable-starter-kit/blob/main/docker-compose.yml) | Local / single server |
| **Helm** | [`deploy/helm/`](https://github.com/pixeltable/pixeltable-starter-kit/tree/main/deploy/helm) | Any existing K8s cluster |
| **Terraform (EKS)** | [`deploy/terraform-k8s/`](https://github.com/pixeltable/pixeltable-starter-kit/tree/main/deploy/terraform-k8s) | AWS from scratch |
| **Terraform (GKE)** | [`deploy/terraform-gke/`](https://github.com/pixeltable/pixeltable-starter-kit/tree/main/deploy/terraform-gke) | GCP from scratch |
| **Terraform (AKS)** | [`deploy/terraform-aks/`](https://github.com/pixeltable/pixeltable-starter-kit/tree/main/deploy/terraform-aks) | Azure from scratch |
| **AWS CDK** | [`deploy/aws-cdk/`](https://github.com/pixeltable/pixeltable-starter-kit/tree/main/deploy/aws-cdk) | ECS Fargate + EFS |
```dockerfile theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Multi-stage Dockerfile (from the starter kit)
FROM node:20-slim AS frontend-build
WORKDIR /app/frontend
COPY frontend/package.json frontend/package-lock.json ./
RUN npm ci
COPY frontend/ ./
RUN npm run build
FROM python:3.12-slim AS runtime
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential libpq-dev curl && \
rm -rf /var/lib/apt/lists/*
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
COPY backend/pyproject.toml backend/uv.lock ./
RUN uv sync --frozen --no-dev --python /usr/local/bin/python3
COPY backend/ ./
COPY --from=frontend-build /app/backend/static ./static
ENV PIXELTABLE_HOME=/data/pixeltable
EXPOSE 8000
CMD ["sh", "-c", "uv run python schema.py && uv run uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4"]
```
## Environment Management
### Multi-Tenancy and Isolation
| Isolation Type | Implementation | Use Case | Overhead |
| -------------- | ------------------------------------------------------------------------------------------ | ------------------------------------------------ | -------- |
| **Logical** | Single Pixeltable instance with directory namespaces (`pxt.create_dir(f"user_{user_id}")`) | Dev/staging environments, simple multi-user apps | Low |
| **Physical** | Separate container instances per tenant | SaaS with strict data isolation | High |
**Logical Isolation Example:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Per-user isolation via namespaces
pxt.create_dir(f"user_{user_id}", if_exists='ignore')
user_table = pxt.create_table(f"user_{user_id}/chat_history", schema={...})
```
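Because the namespace is built by string interpolation, a user ID containing a path separator such as `/` would change which path is addressed. One defensive option is to validate the ID before use. The sketch below is an assumption-laden illustration: `tenant_dir` is a hypothetical helper, and restricting IDs to letters, digits, and underscores is a conservative choice rather than a documented Pixeltable rule.

```python
import re

# Conservative allow-list: letters, digits, underscore (an assumption, see above).
_SAFE_ID = re.compile(r"[A-Za-z0-9_]+")

def tenant_dir(user_id: str) -> str:
    """Build a per-user namespace name, rejecting IDs that could alter the path."""
    if not _SAFE_ID.fullmatch(user_id):
        raise ValueError(f"unsafe user id: {user_id!r}")
    return f"user_{user_id}"

print(tenant_dir("42"))  # user_42
```

The returned name can then be passed to `pxt.create_dir(...)` exactly as in the example above.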
### High Availability Constraints
| Configuration | Status | Details |
| ------------------------------------------- | --------------- | ----------------------------------------------------------------------------------------------------------------------- |
| **Single Pod + ReadWriteOnce PVC** | ✅ Supported | One active pod writes to dedicated volume. Failover requires volume detach/reattach. |
| **Multiple Pods + Shared Volume (NFS/EFS)** | ❌ Not Supported | **Will cause database corruption.** Do not mount same `pgdata` to multiple pods. |
| **Multi-Node HA** | 🔜 Coming 2026 | Available in Pixeltable Cloud (serverless scaling, API endpoints). [Join waitlist](https://www.pixeltable.com/waitlist) |
**Single-Writer Limitation:** Pixeltable's storage layer uses an embedded PostgreSQL instance. **Only one process can write to `~/.pixeltable/pgdata` at a time**.
## Troubleshooting
### Reset Database (Development Only)
To completely reset Pixeltable's local state during development:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Stop all Pixeltable processes first, then:
rm -rf ~/.pixeltable/pgdata ~/.pixeltable/media ~/.pixeltable/file_cache
```
**This deletes all data.** Only use in development. For production, use backups and `table.revert()` or snapshots instead.
### Common Issues
| Symptom | Cause | Solution |
| ---------------------------- | -------------------------- | ---------------------------------------------------------------------------- |
| "Cannot connect to database" | Stale lock file | Remove `~/.pixeltable/pgdata/postmaster.pid` if no process is running |
| Slow first query | File cache miss | Files download on first access; subsequent queries are fast |
| "Table not found" | Wrong namespace | Check `pxt.list_tables()` and verify `config.APP_NAMESPACE` |
| OOM on large media | Full file loaded to memory | Use iterators (`FrameIterator`, `DocumentSplitter`) to process incrementally |
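Before deleting a stale lock file, it is worth confirming that the process it names is really gone. PostgreSQL writes the server's PID on the first line of `postmaster.pid`, so a small check is possible with the standard library. This sketch is POSIX-only (`os.kill(pid, 0)` behaves differently on Windows), and the function name `postmaster_running` is hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional

def postmaster_running(pgdata: Path) -> Optional[bool]:
    """True/False if the PID in postmaster.pid is alive/dead; None if there is no lock file."""
    pid_file = pgdata / "postmaster.pid"
    if not pid_file.exists():
        return None
    pid = int(pid_file.read_text().splitlines()[0])  # first line holds the PID
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal (POSIX)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True

# Demonstration against a temporary directory rather than a real pgdata.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    no_lock = postmaster_running(demo)
    (demo / "postmaster.pid").write_text(f"{os.getpid()}\n")
    live_lock = postmaster_running(demo)
print(no_lock, live_lock)
```

Only when the function returns `False` for the real `pgdata` directory is the lock file a candidate for removal.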
### Environment Separation
Use environment-specific namespaces to manage dev/staging/prod configurations:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# config.py
import os
ENV = os.getenv('ENVIRONMENT', 'dev')
APP_NAMESPACE = f'{ENV}_myapp' # Creates: dev_myapp, staging_myapp, prod_myapp
# Model and API configuration
EMBEDDING_MODEL = os.getenv('EMBEDDING_MODEL', 'intfloat/e5-large-v2')
OPENAI_MODEL = os.getenv('OPENAI_MODEL', 'gpt-4o-mini')
# Optional: Cloud storage for generated media
MEDIA_STORAGE_BUCKET = os.getenv('MEDIA_STORAGE_BUCKET')
```
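A typo in `ENVIRONMENT` (for example `production` instead of `prod`) would silently create a new, empty namespace. A small guard can reject unknown values. This is a sketch under stated assumptions: the helper `app_namespace` and the allowed set `dev`/`staging`/`prod` are taken from the comment in the config above and are not enforced by Pixeltable.

```python
ALLOWED_ENVS = ("dev", "staging", "prod")

def app_namespace(env: str, app: str = "myapp") -> str:
    """Return the namespace for `env`, rejecting unknown environment names."""
    if env not in ALLOWED_ENVS:
        raise ValueError(f"ENVIRONMENT must be one of {ALLOWED_ENVS}, got {env!r}")
    return f"{env}_{app}"

# In config.py this would wrap os.getenv('ENVIRONMENT', 'dev').
print(app_namespace("staging"))  # staging_myapp
```

Failing at import time in `config.py` keeps a misconfigured deployment from writing to the wrong namespace.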
## Testing
**Staging Environment:**
* Mirror production configuration.
* Test schema changes, UDF updates, application code changes.
* Use representative data (anonymized or synthetic).
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Test environment with isolated namespace
import pixeltable as pxt
TEST_NS = 'test_myapp'
pxt.create_dir(TEST_NS, if_exists='replace')
# Run setup targeting test namespace
# Execute tests
# pxt.drop_dir(TEST_NS, force=True) # Cleanup
```
# Deployment Overview
Source: https://docs.pixeltable.com/howto/deployment/overview
Compare deployment options for Pixeltable applications across local, server, container, and managed cloud environments to pick the best fit.
## What Pixeltable Replaces
Most multimodal AI stacks look like this: blob storage for media, a relational database for metadata, a vector database for embeddings, an orchestrator for scheduling, and custom glue code holding it all together.
```mermaid theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
flowchart LR
S3[S3 / GCS] --> Orch[Airflow / Prefect]
Orch --> PG[(PostgreSQL)]
Orch --> VDB[(Vector DB)]
PG --- Cache[Redis]
PG --- Glue[Glue Code]
VDB --- Glue
```
**5+ services to deploy and maintain:** blob storage, orchestrator, relational DB, vector DB, cache — plus custom retry logic, rate limiting, sync scripts, and error handling to wire them together.
```mermaid theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
flowchart LR
Refs[S3 / GCS] -->|references| CC[Computed Columns]
CC --> Query[Query + Search]
```
**1 Python import.** Storage, orchestration, caching, vector indexing, rate limiting, and retry logic are built in. The infrastructure you don't deploy is infrastructure you don't maintain.
### Systems Pixeltable Replaces
You don't install, configure, or manage these — Pixeltable handles them natively.
| Instead of … | With Pixeltable … |
| --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **PostgreSQL / MySQL** | `pxt.create_table()` — schema is Python, versioned automatically |
| **Pinecone / Weaviate / Qdrant** | `add_embedding_index()` — one line, auto-maintained on insert/update/delete |
| **S3 / boto3 / blob storage** | `pxt.Image` / `Video` / `Audio` / `Document` types with transparent caching; `destination='s3://…'` for cloud routing |
| **Airflow / Prefect / Celery** | Computed columns trigger on insert — no orchestrator, no workers, no DAGs |
| **LangChain / LlamaIndex** (RAG) | `@pxt.query` + `.similarity()` + computed column chaining |
| **pandas / polars** (multimodal) | `.sample()`, ephemeral UDFs, then `add_computed_column()` to commit — [same code, prototype to production](/howto/cookbooks/core/dev-iterative-workflow) |
| **DVC / MLflow / W\&B** | Built-in [`history()`](/platform/version-control), [`revert()`](/platform/version-control), time travel (`table:N`), [snapshots](/platform/version-control) — zero config |
| **Custom retry / rate-limit / caching** | Built into every [AI integration](/integrations/frameworks); results cached, only new rows recomputed |
| **Custom ETL / glue code** | Declarative schema — Pixeltable handles execution, caching, incremental updates |
### Tools Pixeltable Abstracts
These tools run under the hood, but you interact through a cleaner interface. This is a sample — Pixeltable wraps [30+ AI providers](/integrations/frameworks), [dozens of built-in functions](/sdk/latest/image) for media and data processing, and supports any Python library via [`@pxt.udf`](/platform/udfs-in-pixeltable).
| Tool | Raw usage | Through Pixeltable |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **FFmpeg** | Install binary, subprocess calls, format conversion, frame seeking | [`extract_audio(video, format='mp3')`](/sdk/latest/video#udf-extract_audio) for audio; [`frame_iterator(video, fps=1)`](/sdk/latest/video#iterator-frame_iterator) for frame extraction via `pxt.create_view()` |
| **Pillow/PIL** | `Image.open()`, resize, convert, encode, save, handle formats | [`pixeltable.functions.image`](/sdk/latest/image) module: `resize()`, `crop()`, `thumbnail()`, `b64_encode()`, `rotate()`, `blend()`, plus `width()`, `height()`, `get_metadata()` |
| **spaCy** | `pip install spacy`, download model, load pipeline, parse documents | [`document_splitter(doc, separators='sentence')`](/sdk/latest/document#iterator-document_splitter) — spaCy runs under the hood (configurable via `spacy_model` parameter). Also supports `'heading'`, `'paragraph'`, `'page'`, `'token_limit'`, `'char_limit'` separators |
| **sentence-transformers** | Load model, tokenize, encode batches, normalize vectors | [`sentence_transformer.using(model_id='intfloat/e5-large-v2')`](/sdk/latest/huggingface) passed to `add_embedding_index()`. Pixeltable handles model loading, batching, and index maintenance |
| **OpenAI CLIP** | Load model, preprocess images/text differently, encode, handle multimodal alignment | [`clip.using(model_id='openai/clip-vit-base-patch32')`](/sdk/latest/huggingface) — multimodal embedding index that accepts both image and text queries for cross-modal search |
| **OpenAI Whisper** | API key setup, audio format handling, chunking long files, parsing responses | [`openai.transcriptions(audio=table.audio_col, model='whisper-1')`](/sdk/latest/openai#udf-transcriptions) as a computed column — automatic rate limiting, caching. Also supports local Whisper via [`whisper.transcribe()`](/sdk/latest/whisper) |
| **Anthropic Claude tool calling** | Construct messages, define tool schemas as JSON, parse tool\_use blocks, execute tools, re-call with results | [`anthropic.messages()`](/sdk/latest/anthropic) + [`anthropic.invoke_tools()`](/sdk/latest/anthropic) + [`pxt.tools()`](/howto/cookbooks/agents/llm-tool-calling) — all as chained computed columns. Tool schemas derived automatically from `@pxt.udf` function signatures |
| **+ many more** | | See the full [SDK Reference](/sdk/latest/pixeltable), [AI Integrations](/integrations/frameworks), and [Cookbooks](/howto/cookbooks/agents/pattern-rag-pipeline) |
### What Pixeltable Doesn't Replace
You still need these — Pixeltable is a data layer, not a full application framework.
| Tool | Why you still need it |
| ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| **FastAPI / Flask / Django** | Standard CRUD endpoints can use built-in [HTTP serving](/howto/deployment/serving); custom logic still needs a framework |
| **Pydantic** | Request/response validation for your API endpoints (Pixeltable's `.to_pydantic()` bridges the two) |
| **React / Vue / frontend** | UI layer — Pixeltable has no frontend |
| **Docker / Kubernetes / Terraform** | Deployment infrastructure — Pixeltable runs *inside* your containers, it doesn't provision them |
| **Authentication / authorization** | User management, API keys, OAuth — outside Pixeltable's scope |
| **Domain-specific UDFs** | Business logic you write as `@pxt.udf` functions (e.g., web search, custom scoring) — Pixeltable provides the framework, you provide the logic |
**Migrating from a specific stack?** See the step-by-step migration guides with side-by-side code comparisons:
* [From DIY Data Pipelines](/migrate/from-diy-data-pipeline) — replace custom scripts, DVC, Airflow, and manual processing
* [From RDBMS & Vector DBs](/migrate/from-rdbms-vectordbs) — replace Postgres + Pinecone + LangChain RAG stacks
* [From Agent Frameworks](/migrate/from-agent-frameworks) — replace LangGraph, CrewAI, and similar agent DSLs
## Deployment Decision Guide
Pixeltable supports three production deployment patterns. Choose based on your constraints:
| Question | Answer | Recommendation |
| -------------------------------------------------------------- | ------ | --------------------------------------------------------------------- |
| Building a web app with a frontend? | Yes | **Full Backend** (FastAPI + React) |
| Need an API with zero web code? | Yes | **Declarative Serving** (`pxt serve` from TOML) |
| Need batch/background processing (cron, queue, Cloud Run Job)? | Yes | **Batch Processing** (pure Python script, no HTTP server) |
| Existing production DB that must stay? | Yes | **Batch Processing** (process in Pixeltable, `export_sql` to your DB) |
| Need semantic search (RAG)? | Yes | **Full Backend** or **Declarative Serving** |
| Expose Pixeltable as MCP server for LLM tools? | Yes | **Full Backend** + [MCP Server](/libraries/mcp) |
### Technical Capabilities (Both)
Regardless of deployment mode, you get:
* **[Multimodal Types](/platform/type-system):** Native handling of Video, Document, Audio, Image, JSON.
* **[Computed Columns](/tutorials/computed-columns):** Automatic incremental updates and dependency tracking.
* **[Views & Iterators](/platform/views):** Built-in logic for chunking documents, extracting frames, etc.
* **[Model Orchestration](/integrations/frameworks):** Rate-limited API calls to OpenAI, Anthropic, Gemini, local models.
* **[Data Interoperability](/sdk/latest/io):** Import/export CSV, JSON, Parquet, PyTorch, LanceDB, pandas.
* **[Configurable Media Storage](/platform/configuration):** Per-column destination (local or cloud bucket).
### Use Case Comparison
| Capability | [ML Data Wrangling](/use-cases/ml-data-wrangling) | [AI Applications](/use-cases/ai-applications) |
| --------------------- | ------------------------------------------------- | --------------------------------------------- |
| **Multimodal Types** | ✅ Video, Audio, Image, Document | ✅ Video, Audio, Image, Document |
| **Computed Columns** | ✅ Enrichment & pre-annotation | ✅ Pipeline orchestration |
| **Embedding Indexes** | ✅ Curation & similarity search | ✅ RAG & retrieval |
| **Versioning** | ✅ Dataset snapshots | ✅ Data lineage |
| **Data Sharing** | ✅ Publish datasets | ✅ Team collaboration |
***
## Deployment Strategies
### Approach 1: Batch Processing
Use Pixeltable as a batch processing engine: a Python script that ingests data, lets computed columns process it, exports results to your existing serving database via `export_sql`, and exits. **No HTTP server, no FastAPI.** Run it as a Cloud Run Job, ECS Task, Kubernetes Job, Lambda, or a cron container.
```mermaid theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
flowchart TB
Trigger["Cron / Queue / Webhook"]
subgraph Container["Ephemeral Container"]
Schema["Create Schema"]
Ingest["Ingest Data"]
Process["Computed Columns"]
end
DB["Serving DB (export_sql)"]
Bucket["Cloud Bucket (destination)"]
Trigger --> Schema --> Ingest --> Process
Process -->|"structured data"| DB
Process -->|"generated media"| Bucket
```
**When to use:**

* Existing RDBMS (PostgreSQL, MySQL, Snowflake) and blob storage (S3, GCS, Azure Blob) must remain
* Long-running batch jobs (processing thousands of documents, hours of video)
* Background tasks triggered by a queue, cron, or webhook
* You don't need an HTTP API at all

**How it works:**

* Run Pixeltable in an ephemeral container (Cloud Run Job, ECS Fargate, K8s Job, Lambda)
* Define tables, views, computed columns in `schema.py` (idempotent)
* Insert data from queue, RDBMS, or cloud storage
* Computed columns process everything automatically (chunking, embeddings, LLM calls)
* `export_sql` pushes structured results to your serving database
* `destination` parameter routes generated media to cloud buckets
* Container exits when done

**What you get:**

* Native multimodal type system (Video, Document, Audio, Image, JSON)
* Declarative computed columns eliminate orchestration boilerplate
* Incremental computation automatically handles new data
* `export_sql` for any SQL database (PostgreSQL, MySQL, Snowflake, SQLite)
* `destination` parameter for routing media to S3/GCS/Azure Blob
* LLM call orchestration with automatic rate limiting
* Iterators for chunking documents, extracting frames, splitting audio

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# schema.py: declarative schema (idempotent, safe to re-run)
import pixeltable as pxt
from pixeltable.functions.huggingface import sentence_transformer
from pixeltable.functions.string import string_splitter
from pixeltable.functions.uuid import uuid7
pxt.create_dir('pipeline', if_exists='ignore')
embed_fn = sentence_transformer.using(model_id='all-MiniLM-L6-v2')
documents = pxt.create_table('pipeline.documents', {
    'title': pxt.String,
    'body': pxt.String,
    'source_id': pxt.String,
    'uuid': uuid7(),
    'timestamp': pxt.Timestamp,
}, primary_key=['uuid'], if_exists='ignore')
sentences = pxt.create_view(
    'pipeline.sentences', documents,
    iterator=string_splitter(text=documents.body, separators='sentence'),
    if_exists='ignore',
)
sentences.add_embedding_index(
    'text', idx_name='sentences_embed', string_embed=embed_fn, if_exists='ignore'
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# pipeline.py: ingest, compute, export, exit
import json
from datetime import datetime
from pixeltable.io.sql import export_sql
import schema
SERVING_DB_URL = 'postgresql+psycopg://user:pass@host/db'
with open('batch.json') as f:
    batch = json.load(f)
now = datetime.now()
for row in batch['documents']:
    row.setdefault('timestamp', now)
# Insert triggers computed columns: chunking, embeddings, etc.
schema.documents.insert(batch['documents'])
# Export structured results to serving DB
export_sql(
    schema.documents.select(
        schema.documents.source_id,
        schema.documents.title,
        schema.documents.body,
    ),
    'processed_documents',
    db_connect_str=SERVING_DB_URL,
    if_exists='replace',
)
```
### Approach 2: Pixeltable as Full Backend
Use Pixeltable for both orchestration and storage as your primary data backend.
```mermaid theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
flowchart TB
Frontend[Frontend App]
API[FastAPI / Flask / Django]
subgraph Pixeltable[Pixeltable Full Backend]
PG[(PostgreSQL Metadata & Data)]
Media[Media Storage S3/GCS/Local]
Compute[Computed Columns Embeddings & LLMs]
PG --- Media
PG --- Compute
end
Frontend --> API
API --> Pixeltable
```
**When to use:**

* Building a new multimodal AI application
* Semantic search and vector similarity are required
* Storage and ML pipeline need tight integration
* Stack consolidation preferred over separate storage/orchestration layers

**How it works:**

* Deploy Pixeltable on a persistent instance (EC2 with EBS, EKS with persistent volumes, VM)
* Build API endpoints (FastAPI, Flask, Django) that interact with Pixeltable tables
* Frontend calls endpoints to insert data and retrieve results
* Query using Pixeltable's semantic search, filters, joins, and aggregations
* All data stored in Pixeltable: metadata, media references, computed column results

**What you get:**

* Unified storage, computation, and retrieval in a single system
* Native semantic search via embedding indexes (pgvector)
* No synchronization layer between storage and orchestration
* Automatic versioning and lineage tracking
* Incremental computation propagates through views
* LLM/agent orchestration
* Data export to PyTorch, Parquet, LanceDB

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Example: FastAPIRouter endpoints backed by Pixeltable
import fastapi
import pixeltable as pxt
from pixeltable.serving import FastAPIRouter
app = fastapi.FastAPI()
router = FastAPIRouter(prefix="/api", tags=["documents"])
docs = pxt.get_table('myapp/documents')
# File upload with background processing (returns job handle)
router.add_insert_route(docs, path="/upload",
    uploadfile_inputs=["document"], inputs=["uploaded_at"],
    outputs=["uuid"], background=True)
# Search via @pxt.query
@pxt.query
def search_documents(query_text: str, limit: int = 10):
    sim = docs.embedding.similarity(string=query_text)
    return docs.order_by(sim, asc=False).limit(limit).select(
        docs.document, docs.summary, similarity=sim)
router.add_query_route(path="/search", query=search_documents)
# Delete by primary key
router.add_delete_route(docs, path="/delete")
app.include_router(router)
```
`FastAPIRouter` auto-generates request/response schemas from column types, handles file uploads via `uploadfile_inputs`, and supports `background=True` for long-running inserts. OpenAPI docs are available at `/docs`.
**When to keep hand-written endpoints:** Use `@router.post()` for multi-table operations, conditional logic, or custom response shapes. Since `FastAPIRouter` extends `APIRouter`, hand-written and declarative routes coexist on the same router. See the [migration guide](/migrate/from-hand-written-endpoints) for details.
**Use sync (`def`) endpoints, not `async def`.** FastAPI dispatches sync endpoints to a thread pool, giving each request its own thread. Pixeltable is thread-safe and handles concurrent requests automatically. Using `async def` would block the event loop and serialize all requests. See [Production Operations](/howto/deployment/operations) for details.
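The difference can be reproduced without FastAPI or Pixeltable. The following is a minimal standard-library sketch, not Pixeltable code: `blocking_query` is a hypothetical stand-in for any synchronous, blocking call, and `asyncio.to_thread` stands in for the thread pool that runs sync endpoints.

```python
import asyncio
import time


def blocking_query(delay: float = 0.05) -> str:
    """Hypothetical stand-in for a synchronous, blocking database call."""
    time.sleep(delay)
    return "ok"


async def handler_blocking_event_loop() -> str:
    # An `async def` endpoint that calls blocking code directly:
    # the event loop is stalled until the call returns.
    return blocking_query()


async def handler_in_thread_pool() -> str:
    # Roughly what happens for a sync `def` endpoint:
    # the blocking call runs in a worker thread.
    return await asyncio.to_thread(blocking_query)


async def time_concurrent(handler, n: int) -> float:
    """Run n handlers concurrently and return the elapsed wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


def compare(n: int = 4) -> tuple[float, float]:
    """Return (elapsed when blocking the loop, elapsed when using threads)."""
    serialized = asyncio.run(time_concurrent(handler_blocking_event_loop, n))
    threaded = asyncio.run(time_concurrent(handler_in_thread_pool, n))
    return serialized, threaded


if __name__ == "__main__":
    serialized, threaded = compare()
    print(f"blocking the event loop: {serialized:.2f}s, thread pool: {threaded:.2f}s")
```

With four simulated requests, the first variant takes about four times the single-call delay because the calls run one after another, while the threaded variant finishes in roughly one delay.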
### Approach 3: Declarative Serving (`pxt serve`)
Generate a complete REST API from a TOML config. No FastAPI code, no frontend, no hand-written endpoints. Define your schema in Python, declare routes in `pyproject.toml`, and run `pxt serve`.
```mermaid theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
flowchart LR
Client["REST Client / cURL"]
TOML["pyproject.toml"]
subgraph PXT["pxt serve"]
API["Auto-generated FastAPI"]
Tables["Pixeltable Tables"]
end
TOML -->|declares routes| PXT
Client --> API --> Tables
```
**When to use:**

* You need an API but not a frontend
* Endpoints are standard insert, query, delete, or `export_sql` operations
* Prototyping an API before building a full application
* You want zero Python web framework code

**How it works:**

* Define tables, views, computed columns, embedding indexes in `schema.py`
* Declare routes in `pyproject.toml` using `[[tool.pixeltable.service.routes]]`
* Run `pxt serve my-service` to generate and start a FastAPI app
* Supports insert, query, delete, and `export_sql` route types
* Auto-generates OpenAPI/Swagger docs

**What you get:**

* Complete REST API from configuration alone
* Auto-generated request/response schemas
* Background job support for long-running inserts
* `export_sql` routes for pushing data to external databases
* OpenAPI documentation out of the box
* Same Pixeltable capabilities (computed columns, embedding indexes, etc.)

```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# pyproject.toml
[project]
name = "my-api"
requires-python = ">=3.10"
[[tool.pixeltable.service]]
name = "my-service"
prefix = "/api"
modules = ["schema"]
[[tool.pixeltable.service.routes]]
type = "query"
path = "/search"
query = "schema:search_documents"
[[tool.pixeltable.service.routes]]
type = "insert"
table = "pipeline.documents"
path = "/ingest"
inputs = ["title", "body"]
outputs = ["title", "body"]
[[tool.pixeltable.service.routes]]
type = "export_sql"
path = "/export"
query = "schema:export_query"
db_url = "postgresql+psycopg://user:pass@host/db"
table_name = "exported_docs"
if_exists = "replace"
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve my-service
# curl -X POST localhost:8000/api/search -H 'Content-Type: application/json' -d '{"query_text": "machine learning"}'
```
## Get Started
Scaffold a project in one command, then customize:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
uvx pixeltable-new myapp # declarative serving (default)
uvx pixeltable-new myapp --backend # full FastAPI + React app
uvx pixeltable-new myapp --batch # batch processing script
```
Or scaffold a vertical application template:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
uvx pixeltable-new myapp --template knowledge-base # cross-modal search + RAG Q&A
uvx pixeltable-new myapp --template chat-agent # tool-calling agent with memory
uvx pixeltable-new myapp --template audio-transcription # audio transcription + search
uvx pixeltable-new myapp --template full-stack-showcase # FastAPI + React reference app
uvx pixeltable-new myapp --template video-search # video frame analysis + search
uvx pixeltable-new myapp --template media-indexing # enterprise media processing + export
uvx pixeltable-new myapp --template image-dataset # ML dataset auto-labeling + export
```
Or clone the full [Starter Kit](https://github.com/pixeltable/pixeltable-starter-kit) for reference implementations with Docker, Helm, Terraform, CDK, and cloud job runners.
The starter kit contains reference implementations for all three deployment patterns:
| Directory | Pattern | What it demonstrates |
| -------------------------------------------------------------------------------------------- | ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **[`backend/` + `frontend/`](https://github.com/pixeltable/pixeltable-starter-kit)** | **Full Backend** | FastAPI + React with persistent storage, multimodal upload, cross-modal search, tool-calling agent. Deployment via Docker, Helm, Terraform, CDK. |
| **[`batch/`](https://github.com/pixeltable/pixeltable-starter-kit/tree/main/batch)** | **Batch Processing** | Pure Python script: ingest, computed columns, `export_sql` to serving DB. Deploy to Cloud Run Jobs, Lambda, ECS Fargate, K8s Jobs. |
| **[`serving/`](https://github.com/pixeltable/pixeltable-starter-kit/tree/main/serving)** | **Declarative Serving** | `pxt serve` from TOML config: zero-code REST API with insert, query, search, and `export_sql` routes. |
| **[`templates/`](https://github.com/pixeltable/pixeltable-starter-kit/tree/main/templates)** | **Application Templates** | 7 vertical templates (knowledge base, chat agent, audio transcription, video search, media indexing, image dataset, full-stack showcase) scaffolded via `pixeltable-new --template`. |
## Next Steps
* Expose tables and queries as HTTP endpoints with TOML or Python
* Code organization and storage architecture
* Concurrency, error handling, and schema evolution
* Backup strategies and security best practices
# Security & Backup
Source: https://docs.pixeltable.com/howto/deployment/security
Secure Pixeltable deployments with backup strategies, disaster recovery procedures, access controls, and credential management best practices.
## Backup Strategies
| Deployment Approach | Backup Strategy | Recovery Method |
| -------------------- | ------------------------------------------------------- | ------------------------------- |
| **Batch Processing** | External RDBMS + Blob Storage backups | Re-run transformation pipelines |
| **Full Backend** | `pg_dump` of `~/.pixeltable/pgdata` + S3/GCS versioning | Restore `pgdata` + media files |
### Full Backend Backup
For deployments using Pixeltable as the full backend:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Backup PostgreSQL data
pg_dump -h ~/.pixeltable/pgdata -U postgres pixeltable > backup.sql
# Backup media files (if stored locally)
tar -czf media_backup.tar.gz ~/.pixeltable/media/
# For cloud media storage, ensure S3/GCS versioning is enabled
```
### Batch Processing Backup
For batch processing deployments:
* Primary data lives in your external RDBMS and blob storage
* Pixeltable state can be rebuilt by re-running transformation pipelines
* Back up your `schema.py` and UDF code in version control
## Recovery Procedures
### Full Backend Recovery
1. Stop the Pixeltable application
2. Restore PostgreSQL data: `psql -h ~/.pixeltable/pgdata -U postgres -d pixeltable -f backup.sql`
3. Restore media files to `~/.pixeltable/media/`
4. Restart the application
### Batch Processing Recovery
1. Deploy a fresh Pixeltable instance
2. Run `python schema.py` to recreate schema (idempotent with `if_exists='ignore'`)
3. Re-process data through computed columns (incremental)
## Security Best Practices
| Security Layer | Recommendation | Implementation |
| --------------------- | ---------------------------------- | ------------------------------------------------ |
| **Network** | Deploy within private VPC | Do not expose PostgreSQL port (5432) to internet |
| **Authentication** | Application layer (FastAPI/Django) | Pixeltable does not manage end-user accounts |
| **Cloud Credentials** | IAM Roles / Workload Identity | Avoid long-lived keys in `config.toml` |
### Network Security
```yaml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Example: Kubernetes NetworkPolicy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pixeltable-network-policy
spec:
  podSelector:
    matchLabels:
      app: pixeltable
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api-server
      ports:
        - protocol: TCP
          port: 8000
```
### Secrets Management
**Never hardcode secrets.** Use environment variables or secrets managers:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# config.py - Load from environment
import os
# Optional, for local development: python-dotenv loads a .env file.
# Call it before the lookups below so they also see values defined there.
from dotenv import load_dotenv
load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
AWS_ACCESS_KEY_ID = os.getenv('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.getenv('AWS_SECRET_ACCESS_KEY')
```
For production, use:
* **AWS:** Secrets Manager, Parameter Store
* **GCP:** Secret Manager
* **Kubernetes:** Secrets, External Secrets Operator
### Cloud Storage Credentials
For S3/GCS/Azure media storage:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Prefer IAM roles over long-lived credentials
# AWS: Use EC2 instance profile or EKS IRSA
# GCP: Use Workload Identity
# If credentials required, set via environment variables:
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
# GOOGLE_APPLICATION_CREDENTIALS
```
## Audit and Compliance
### Data Lineage
Pixeltable automatically tracks:
* Table versions and schema changes
* Computed column definitions and dependencies
* Insert/update/delete operations
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View table history
table.history()
# Get specific version
old_version = pxt.get_table('myapp/documents:5') # Version 5
```
### Access Logging
Implement application-level access logging:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
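import logging
from fastapi import FastAPI, Request
app = FastAPI()
logger = logging.getLogger("audit")
@app.middleware("http")
async def audit_log(request: Request, call_next):
    # request.user raises unless an authentication middleware is installed,
    # so read the user from the ASGI scope and also record the client address.
    user = request.scope.get("user")
    client = request.client.host if request.client else "unknown"
    logger.info("User: %s Client: %s Action: %s %s", user, client, request.method, request.url)
    response = await call_next(request)
    return response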
```
## Disaster Recovery
### Recovery Time Objectives
| Deployment | RTO | Strategy |
| ---------------- | ------- | -------------------------------------------- |
| Batch Processing | Minutes | Spin up new instance, re-run pipelines |
| Full Backend | Hours | Restore from backup, validate data integrity |
### Recommendations
1. **Regular backups:** Daily for production workloads
2. **Test recovery:** Quarterly disaster recovery drills
3. **Multi-region:** Store backups in a different region than the primary
4. **Immutable backups:** Use S3 Object Lock or GCS retention policies
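The "regular backups" recommendation can be scripted. The following is a minimal sketch that reuses the `pg_dump` arguments from the Full Backend backup command and writes to a timestamped file; the helper names, the file naming scheme, and the backup directory are assumptions for illustration, and scheduling (cron, Kubernetes CronJob) is left to your platform.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def backup_plan(backup_dir: Path, now: datetime) -> tuple[list[str], Path]:
    """Return the pg_dump argv and a timestamped output path for one backup run."""
    pgdata = Path.home() / ".pixeltable" / "pgdata"
    out_file = backup_dir / f"pixeltable-{now:%Y%m%dT%H%M%SZ}.sql"
    argv = ["pg_dump", "-h", str(pgdata), "-U", "postgres", "-f", str(out_file), "pixeltable"]
    return argv, out_file


def run_backup(backup_dir: Path) -> Path:
    """Run one backup; raises CalledProcessError if pg_dump fails."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    argv, out_file = backup_plan(backup_dir, datetime.now(timezone.utc))
    subprocess.run(argv, check=True)
    return out_file
```

Copying the resulting file to a bucket in another region, with object lock or a retention policy enabled, covers the multi-region and immutability recommendations.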
# Serving Tables and Queries over HTTP
Source: https://docs.pixeltable.com/howto/deployment/serving
Serve Pixeltable tables and queries as HTTP API endpoints using TOML service definitions or Python code with FastAPIRouter integration.
## Overview
Pixeltable can expose table operations and queries as HTTP endpoints via
`pixeltable.serving.FastAPIRouter`. You can configure endpoints either
programmatically in Python or declaratively with a TOML service file.
This page builds on the HTTP-based paths from the
[Deployment Strategies](/howto/deployment/overview#deployment-strategies)
overview:
* [Approach 2: Pixeltable as Full Backend](/howto/deployment/overview#approach-2-pixeltable-as-full-backend):
use the Pixeltable SDK's `FastAPIRouter` when you already have, or are
building, a FastAPI app.
* [Approach 3: Declarative Serving](/howto/deployment/overview#approach-3-declarative-serving-pxt-serve):
use the Pixeltable CLI command `pxt serve` when you want Pixeltable to create
and run the FastAPI app from TOML or CLI flags.
FastAPI and uvicorn are optional dependencies. Install them before using
`pxt serve` or the Python serving API:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pip install 'fastapi[standard]'
```
## Quickstart (TOML)
This is the primary config-file path for [Approach 3: Declarative Serving](/howto/deployment/overview#approach-3-declarative-serving-pxt-serve).
Add a `[[tool.pixeltable.service]]` entry to your `pyproject.toml`:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# pyproject.toml
[[tool.pixeltable.service]]
name = "my-service"
port = 8000
modules = ["schema"] # Python modules to import (creates tables/views)
[[tool.pixeltable.service.routes]]
type = "insert"
table = "my_dir.my_table"
path = "/insert"
inputs = ["prompt"]
outputs = ["prompt", "result"]
```
Start the service:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve my-service
```
You can also use a standalone TOML file (`pxt serve my-service --config service.toml`) with the same route syntax but `[[service]]` and `[[service.routes]]` as top-level keys. The `pyproject.toml` format is preferred for new projects.
The service is now running at `http://localhost:8000` with auto-generated
[OpenAPI docs](https://fastapi.tiangolo.com/features/#automatic-docs) at `/docs`.
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -X POST http://localhost:8000/insert \
-H 'Content-Type: application/json' \
-d '{"prompt": "a sunset over the ocean"}'
# {"prompt": "a sunset over the ocean", "result": "..."}
```
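The same request can be sent from Python. This is a minimal standard-library sketch of a client for the `/insert` route defined above; the helper names are hypothetical, and the response is assumed to be the JSON object shown in the curl example.

```python
import json
import urllib.request


def build_insert_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/insert",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def insert_row(base_url: str, prompt: str) -> dict:
    """Send the request and decode the JSON response (the service must be running)."""
    with urllib.request.urlopen(build_insert_request(base_url, prompt)) as resp:
        return json.load(resp)
```

Calling `insert_row('http://localhost:8000', 'a sunset over the ocean')` against a running service should return the columns listed in `outputs`.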
## Quickstart (Python)
This is the SDK path for [Approach 2: Pixeltable as Full Backend](/howto/deployment/overview#approach-2-pixeltable-as-full-backend).
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import fastapi
import uvicorn
import pixeltable as pxt
from pixeltable.serving import FastAPIRouter
t = pxt.get_table('my_dir.my_table')
app = fastapi.FastAPI()
router = FastAPIRouter()
router.add_insert_route(t, path='/insert', inputs=['prompt'], outputs=['prompt', 'result'])
router.add_update_route(t, path='/update', inputs=['prompt'], outputs=['id', 'prompt', 'result'])
app.include_router(router)
uvicorn.run(app, host='0.0.0.0', port=8000)
```
`@pxt.query` eagerly evaluates the function body at decoration time. Create tables
**before** defining queries that reference them. Use `if_exists='ignore'` to keep
table creation idempotent. See the
[migration guide](/migrate/from-hand-written-endpoints) for examples.
## Quickstart (single-endpoint CLI)
This is a single-endpoint CLI path for [Approach 3: Declarative Serving](/howto/deployment/overview#approach-3-declarative-serving-pxt-serve).
For quick experiments and one-off endpoints, you can skip the TOML file and
configure a single route directly on the command line:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# one insert endpoint
pxt serve insert --table my_dir.my_table --path /generate \
--inputs prompt --outputs prompt result --port 8000
# one update endpoint
pxt serve update --table my_dir.my_table --path /update \
--inputs prompt --outputs id prompt result
# one delete endpoint
pxt serve delete --table my_dir.my_table --path /delete
# one query endpoint
pxt serve query --query myapp.queries.search_docs --path /search
```
Single-endpoint mode accepts the same fields as the equivalent `[[service.routes]]`
TOML entry. It is meant for development and ad-hoc serving. For anything
beyond that, use the TOML file: with `pxt serve`, it is the only way to expose
more than one endpoint.
## Decorator-style routes
`add_insert_route()` / `add_update_route()` build a response model automatically
from the column schema. When you need a *different* response shape -- e.g. a
richer API envelope, derived fields, or a stripped-down payload -- use the
`@router.insert_route` / `@router.update_route` decorators. The decorated
function receives the requested output columns as keyword arguments and returns
a `pydantic.BaseModel` subclass that defines the HTTP response body.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pydantic
from pixeltable.serving import FastAPIRouter
router = FastAPIRouter()
class GenerateResponse(pydantic.BaseModel):
    caption: str
    score: float
@router.insert_route(t, path='/generate', inputs=['prompt'], outputs=['caption', 'score'])
def format_insert(*, caption: str, score: float) -> GenerateResponse:
    return GenerateResponse(caption=caption.strip(), score=round(score, 3))
@router.update_route(t, path='/update', inputs=['prompt'], outputs=['id', 'caption', 'score'])
def format_update(*, id: int, caption: str, score: float) -> GenerateResponse:
    return GenerateResponse(caption=caption.strip(), score=round(score, 3))
```
Rules the framework enforces at registration time:
* Every parameter must be keyword-only and have a type annotation.
* Every parameter name must appear in `outputs`, and every `outputs` entry must
be a parameter.
* Parameter annotations must match the column types (strict nullability: a
nullable column requires `T | None`). Media columns are delivered as URL
strings, so annotate them as `str`.
* The return annotation must be a `pydantic.BaseModel` subclass.
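To make the first two rules concrete, here is a minimal sketch of how such a check could be written with the standard `inspect` module. It is not Pixeltable's implementation; `check_handler` is a hypothetical helper that covers only the keyword-only, annotation, and name-matching rules.

```python
import inspect


def check_handler(fn, outputs: list[str]) -> None:
    """Raise TypeError if fn breaks the keyword-only, annotation, or naming rules."""
    params = inspect.signature(fn).parameters
    for name, param in params.items():
        if param.kind is not inspect.Parameter.KEYWORD_ONLY:
            raise TypeError(f"parameter {name!r} must be keyword-only")
        if param.annotation is inspect.Parameter.empty:
            raise TypeError(f"parameter {name!r} needs a type annotation")
    if set(params) != set(outputs):
        raise TypeError(f"parameters {sorted(params)} do not match outputs {sorted(outputs)}")


def good(*, caption: str, score: float): ...


def positional(caption: str, score: float): ...


check_handler(good, ["caption", "score"])  # passes silently
```

Passing `positional` instead, or listing an output that has no matching parameter, raises `TypeError` in this sketch.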
Decorator routes support the same `background=True` flag as the non-decorator
forms; when enabled, the decorated function's return value is delivered as the
background-job result.
The decorator forms are Python-only -- there is no equivalent in the TOML
service file.
## Exporting rows to an external database
Insert and update routes can export each successful request as a row in an
external SQL database. Configure it via the Python API, the TOML service file,
or the CLI; all three forms route through the same `SqlExport` specification.
Each request performs the Pixeltable insert/update first; only if it succeeds
does the row get written to the external table.
**Python API.** Pass an `export_sql=SqlExport(...)` argument to
`add_insert_route`, `add_update_route`, or the decorator forms.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.serving import FastAPIRouter, SqlExport
router = FastAPIRouter()
router.add_insert_route(
    t,
    path='/generate',
    inputs=['prompt'],
    outputs=['prompt', 'result'],
    export_sql=SqlExport(
        db_connect='postgresql+psycopg://user:pw@host/analytics',
        table='generations',
    ),
)
```
**TOML.** Add a nested `[routes.export_sql]` table under any insert or update
route. The same fields apply.
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[[routes]]
type = "insert"
table = "my_dir.my_table"
path = "/generate"
inputs = ["prompt"]
outputs = ["prompt", "result"]
[routes.export_sql]
db_connect = "postgresql+psycopg://user:pw@host/analytics"
table = "generations"
# db_schema = "public" # optional
# method = "insert" # default; alternatives: "update", "merge" (not yet supported)
```
**CLI.** Single-endpoint mode supports the same fields via `--export-sql-*`
flags on `pxt serve insert` and `pxt serve update`.
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve insert --table my_dir.my_table --path /generate \
--inputs prompt --outputs prompt result \
--export-sql-db-connect 'postgresql+psycopg://user:pw@host/analytics' \
--export-sql-table generations
```
The row written to the target is the response body: the same columns as
`outputs`, with media-typed columns rendered as URL strings (so the
corresponding target columns must be string-typed). Schema compatibility is
validated once at registration; the target table must already exist or
registration fails. The engine and connection pool are cached per
`db_connect`, so multiple routes pointing at the same database share one pool.
`SqlExport.method` controls how each row is written:
* `'insert'` (default): append the row via `INSERT ... VALUES`. The target
acts as an audit log -- replaying the same request produces a duplicate.
* `'update'`: update the row by primary-key match. The target is treated as a
current-state view keyed on its primary key. The response columns must
include all primary-key columns of the target plus at least one non-PK
column to set; the target table itself must declare a primary key. This is a
strict update, not an upsert -- if no row matches, the request fails with
HTTP 500.
* `'merge'` (upsert): not yet supported.
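The difference between the two supported methods can be illustrated with an in-memory SQLite database. This is a sketch of the semantics only, not of how Pixeltable performs the write; the table names and helper functions are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Audit-log style target: no primary key, rows are only appended.
    conn.execute("CREATE TABLE audit_log (id INTEGER, result TEXT)")
    # Current-state style target: keyed on its primary key.
    conn.execute("CREATE TABLE current_state (id INTEGER PRIMARY KEY, result TEXT)")
    return conn


def write_insert(conn: sqlite3.Connection, row: dict) -> None:
    """'insert' semantics: append the row; a replayed request adds a duplicate."""
    conn.execute("INSERT INTO audit_log (id, result) VALUES (:id, :result)", row)


def write_update(conn: sqlite3.Connection, row: dict) -> None:
    """'update' semantics: strict update by primary key, never an upsert."""
    cur = conn.execute("UPDATE current_state SET result = :result WHERE id = :id", row)
    if cur.rowcount == 0:
        # The serving layer reports this situation as HTTP 500.
        raise LookupError(f"no row with id={row['id']} in current_state")


conn = make_db()
write_insert(conn, {"id": 1, "result": "a"})
write_insert(conn, {"id": 1, "result": "a"})  # replay: a second copy is appended
conn.execute("INSERT INTO current_state VALUES (1, 'old')")
write_update(conn, {"id": 1, "result": "new"})  # a row matches, so it is overwritten
```

Calling `write_update` with an `id` that is not present raises instead of inserting, which mirrors the strict-update behavior described above.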
When paired with `add_insert_route` and `method='update'`, a Pixeltable insert
triggers a target-side update. This is intentional: it supports the pattern
where the Pixeltable table is append-only (e.g. an event log) but the target
is a deduplicated, current-state view.
If the external write fails after the Pixeltable insert/update has already
committed, the request returns HTTP 500; no rollback is performed.
`export_sql=` is mutually exclusive with `return_fileresponse=True` and is
compatible with `background=True` (the SQL write runs in the worker thread).
See [`SqlExport`](https://docs.pixeltable.com/sdk/latest/pixeltable/serving/SqlExport)
for the full target specification.
Connection strings with embedded passwords land in plaintext on disk (TOML)
or in process listings (CLI). Don't commit `service.toml`; consider a
connection string that pulls credentials from the environment or a
`.pgpass`-style file. Env-var substitution within the TOML is not yet
supported.
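When routes are registered through the Python API, one way to keep the password out of source files is to assemble the connection string from environment variables at startup. A minimal sketch, assuming hypothetical variable names (`EXPORT_DB_USER`, `EXPORT_DB_PASSWORD`, `EXPORT_DB_HOST`, `EXPORT_DB_NAME`):

```python
import os
from urllib.parse import quote_plus


def build_db_connect(env=os.environ) -> str:
    """Assemble a SQLAlchemy-style URL from environment variables (names are hypothetical)."""
    # Percent-encode credentials so characters such as '@' or '/' cannot break the URL.
    user = quote_plus(env["EXPORT_DB_USER"])
    password = quote_plus(env["EXPORT_DB_PASSWORD"])
    host = env.get("EXPORT_DB_HOST", "localhost")
    database = env["EXPORT_DB_NAME"]
    return f"postgresql+psycopg://{user}:{password}@{host}/{database}"
```

The returned string can then be passed as the `db_connect` value of `SqlExport`.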
## TOML Service File Reference
### `[service]` (optional)
Top-level settings for the FastAPI application and server.
| Field | Type | Default | Description |
| -------- | ------- | -------------- | ------------------------------------------------------------------------- |
| `title` | string | `"Pixeltable"` | Title shown in the OpenAPI docs |
| `prefix` | string | `""` | URL prefix prepended to all route paths (must be empty or start with `/`) |
| `host` | string | `"0.0.0.0"` | Server bind address |
| `port` | integer | `8000` | Server bind port |
### `modules` (optional)
A list of additional Python modules to import at startup, for their
registration side effects. The module that hosts a query route's dotted
path is imported automatically, so this is only needed when you depend
on `@pxt.query` / `@pxt.udf` definitions in *other* modules that wouldn't
otherwise be loaded.
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
modules = ["myapp.queries", "myapp.udfs"]
```
### `[[service.routes]]`
Each `[[service.routes]]` entry defines one HTTP endpoint. The `type` field determines
the route kind and which additional fields are valid.
#### Common fields
| Field | Type | Required | Default | Description |
| ------------ | ------------------------------------------------ | -------- | ------- | ---------------------------------------------------------------- |
| `type` | `"insert"` / `"update"` / `"delete"` / `"query"` | yes | -- | Route kind |
| `path` | string | yes | -- | URL path (e.g., `"/generate"`) |
| `background` | bool | no | `false` | Run the operation in a background thread and return a job handle |
When `background = true`, the endpoint returns immediately with:
```json theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{"id": "abc123", "job_url": "http://localhost:8000/jobs/abc123"}
```
Poll `job_url` until `status` is `"done"` or `"error"`.
***
### Insert routes
Insert a single row into a table and return the resulting column values.
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[[service.routes]]
type = "insert"
table = "my_dir/my_table"
path = "/generate"
inputs = ["prompt"]
outputs = ["prompt", "result"]
```
| Field | Type | Required | Default | Description |
| --------------------- | --------------- | -------- | ------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `table` | string | yes | -- | Pixeltable table path |
| `inputs` | list of strings | no | all non-computed columns | Columns to accept as request fields |
| `uploadfile_inputs` | list of strings | no | -- | Columns to accept as file uploads (multipart) |
| `outputs` | list of strings | no | all columns | Columns to include in the response |
| `return_fileresponse` | bool | no | `false` | Return the single media-typed output as a raw file download |
| `export_sql` | nested table | no | -- | Export each request as a row into an external SQL database. See [Exporting rows to an external database](#exporting-rows-to-an-external-database). |
**File uploads:** when `uploadfile_inputs` is set, the request uses `multipart/form-data`
instead of JSON. All other inputs become form fields.
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[[service.routes]]
type = "insert"
table = "my_dir/images"
path = "/resize"
inputs = ["width", "height"]
uploadfile_inputs = ["image"]
outputs = ["resized"]
return_fileresponse = true
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -X POST http://localhost:8000/resize \
-F image=@photo.jpg -F width=640 -F height=480 \
--output resized.jpg
```
***
### Update routes
Update a single row identified by its primary key values, and return the updated
columns (including any computed columns that depend on them).
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[[service.routes]]
type = "update"
table = "my_dir/my_table"
path = "/update"
inputs = ["prompt"]
outputs = ["id", "prompt", "result"]
```
| Field | Type | Required | Default | Description |
| --------------------- | --------------- | -------- | ------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `table` | string | yes | -- | Pixeltable table path (must have a primary key) |
| `inputs` | list of strings | no | all non-PK, non-computed, non-media columns | Columns to update (PK columns are always in the request body but cannot appear here) |
| `outputs` | list of strings | no | all columns | Columns to include in the response |
| `return_fileresponse` | bool | no | `false` | Return the single media-typed output as a raw file download |
| `export_sql` | nested table | no | -- | Export each request as a row into an external SQL database. See [Exporting rows to an external database](#exporting-rows-to-an-external-database). |
The request body carries the primary key values (to identify the row) plus the
values to update, as JSON fields:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -X POST http://localhost:8000/update \
-H 'Content-Type: application/json' \
-d '{"id": 42, "prompt": "updated text"}'
# {"id": 42, "prompt": "updated text", "result": "..."}
```
If the identified row does not exist, the endpoint returns HTTP 404. Computed
columns that depend on any updated column are automatically recomputed and
appear in the response.
Media-typed columns (image, video, audio, document) cannot currently be
updated. To replace a media value, delete the row and insert a new one.
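
The same request can be assembled from Python with only the standard library. This is a minimal sketch, assuming the `/update` route and the `id` / `prompt` columns from the example above; `build_update_request` and `send_update` are hypothetical helper names, not part of Pixeltable.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def build_update_request(base_url: str, pk: dict, updates: dict) -> urllib.request.Request:
    """Build a POST request whose JSON body carries the PK values plus the updates."""
    body = json.dumps({**pk, **updates}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/update",  # route path from the TOML example
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_update(req: urllib.request.Request) -> Optional[dict]:
    """Send the request; return the updated columns, or None on HTTP 404."""
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as exc:
        if exc.code == 404:  # no row with the given primary key
            return None
        raise


# Assumed local service address; nothing is sent until send_update(req) is called
req = build_update_request("http://localhost:8000", {"id": 42}, {"prompt": "updated text"})
```

Separating request construction from sending keeps the 404 case explicit and lets the request be inspected without a running server.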
***
Delete routes use HTTP **POST**, not the HTTP DELETE method. Update client-side fetch calls accordingly.
### Delete routes
Delete rows matching the given column values and return the count of rows affected.
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[[service.routes]]
type = "delete"
table = "my_dir/my_table"
path = "/delete"
```
| Field | Type | Required | Default | Description |
| --------------- | --------------- | -------- | ------------------- | ------------------------------------- |
| `table` | string | yes | -- | Pixeltable table path |
| `match_columns` | list of strings | no | primary key columns | Columns to match on (AND-ed equality) |
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -X POST http://localhost:8000/delete \
-H 'Content-Type: application/json' \
-d '{"id": 42}'
# {"num_rows": 1}
```
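
The matching rule can be pictured in plain Python: a row is deleted only if every match column equals the value sent in the request. The sketch below illustrates that behavior on an in-memory list; it is not Pixeltable's implementation, and `delete_matching` is a hypothetical helper.

```python
def delete_matching(rows: list, criteria: dict) -> tuple:
    """Return the surviving rows and the number of rows removed."""

    def is_match(row: dict) -> bool:
        # AND-ed equality: every match column must equal the requested value
        return all(row.get(col) == val for col, val in criteria.items())

    kept = [row for row in rows if not is_match(row)]
    return kept, len(rows) - len(kept)


rows = [
    {"id": 1, "tag": "cat"},
    {"id": 2, "tag": "dog"},
    {"id": 3, "tag": "cat"},
]

# Two match columns: both conditions must hold for a row to be removed
kept, num_rows = delete_matching(rows, {"tag": "cat", "id": 3})
print({"num_rows": num_rows})
```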
To delete by a non-primary-key column:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[[service.routes]]
type = "delete"
table = "my_dir/my_table"
path = "/delete-by-tag"
match_columns = ["tag"]
```
***
Media columns (`pxt.Image`, `pxt.Video`, `pxt.Audio`, `pxt.Document`) serialize as URL strings (e.g., `/media/path/to/file.pdf`). The client receives a string, not binary data.
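
When the string is a path such as the one above, a client would typically resolve it against the service's base URL before fetching the file. A minimal sketch using only the standard library; the base URL and the sample response row are assumptions for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "http://localhost:8000"  # assumed service address

# Hypothetical response row containing a media column named "doc"
row = {"id": 7, "doc": "/media/path/to/file.pdf"}

# Resolve the path-style string into an absolute URL that can be downloaded
media_url = urljoin(BASE_URL, row["doc"])
print(media_url)
```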
### Query routes
Execute a `@pxt.query` or `retrieval_udf` function and return the results.
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[[service.routes]]
type = "query"
path = "/search"
query = "myapp.queries.search_docs"
```
| Field | Type | Required | Default | Description |
| --------------------- | ------------------ | -------- | -------------------- | -------------------------------------------------------------------------- |
| `query` | string | yes | -- | Dotted Python path to a `@pxt.query` or `retrieval_udf` function |
| `inputs` | list of strings | no | all query parameters | Parameters to accept as request fields |
| `uploadfile_inputs` | list of strings | no | -- | Parameters to accept as file uploads (not supported with `method = "get"`) |
| `one_row` | bool | no | `false` | Expect exactly one result row (404 on 0, 409 on >1) |
| `return_fileresponse` | bool | no | `false` | Return the single media-typed result as a raw file |
| `method` | `"get"` / `"post"` | no | `"post"` | HTTP method for the endpoint |
The `query` field is a dotted Python path (e.g., `"myapp.queries.search_docs"`);
the module portion is imported automatically when the route is resolved.
**Multi-row response** (default):
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -X POST http://localhost:8000/search \
-H 'Content-Type: application/json' \
-d '{"query_text": "hello"}'
# {"rows": [{"id": 1, "text": "hello world", "score": 0.95}, ...]}
```
**Single-row lookup** (`one_row = true`):
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[[service.routes]]
type = "query"
path = "/lookup"
query = "myapp.queries.lookup_by_id"
one_row = true
method = "get"
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl 'http://localhost:8000/lookup?id=42'
# {"id": 42, "name": "Alice", "email": "alice@example.com"}
```
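
The `one_row` contract can be summarized as a mapping from the number of result rows to an HTTP status. The function below is an illustrative sketch of that rule, not code from Pixeltable; it assumes a successful lookup is reported as 200.

```python
def one_row_status(num_rows: int) -> int:
    """Map the number of result rows to the status a one_row route reports."""
    if num_rows == 0:
        return 404  # no matching row
    if num_rows > 1:
        return 409  # ambiguous: more than one row matched
    return 200  # exactly one row (assumed success status)


for n in (0, 1, 2):
    print(n, one_row_status(n))
```

Clients can branch on these codes to tell "not found" apart from "query is not selective enough".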
***
## Full example
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[[service]]
name = "image-processing-service"
port = 8080
modules = ["myapp.queries"]
# Insert an image and get back processed outputs
[[service.routes]]
type = "insert"
table = "myapp/images"
path = "/process"
inputs = ["width", "height"]
uploadfile_inputs = ["image"]
outputs = ["thumbnail", "embedding"]
# Insert with background processing for slow pipelines
[[service.routes]]
type = "insert"
table = "myapp/videos"
path = "/ingest"
background = true
# Update a row by primary key (returns the updated columns incl. computed ones)
[[service.routes]]
type = "update"
table = "myapp/images"
path = "/images/update"
inputs = ["tag"]
outputs = ["id", "tag", "thumbnail"]
# Delete by primary key
[[service.routes]]
type = "delete"
table = "myapp/images"
path = "/images/delete"
# Delete by a non-PK column
[[service.routes]]
type = "delete"
table = "myapp/images"
path = "/images/delete-by-tag"
match_columns = ["tag"]
# Search via a @pxt.query function
[[service.routes]]
type = "query"
path = "/search"
query = "myapp.queries.search_images"
# Single-row lookup via GET
[[service.routes]]
type = "query"
path = "/lookup"
query = "myapp.queries.lookup_by_id"
one_row = true
method = "get"
# Return a raw image file from a query
[[service.routes]]
type = "query"
path = "/thumbnail"
query = "myapp.queries.get_thumbnail"
return_fileresponse = true
```
## Reading Back Computed Columns After Insert
Use `return_rows=True` to get all column values (including computed columns)
directly from `insert()`, `update()`, or `batch_update()` without a follow-up
query.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Anti-pattern: insert then query
table.insert([row])
result = table.where(table.id == value).select(...).collect()
data = result[0]
# Correct: return_rows=True
status = table.insert([row], return_rows=True)
data = status.rows[0] # dict with ALL columns including computed
```
`status.rows` is a list of plain dicts. For typed access, use Pydantic's
`model_validate()` with `extra="ignore"` (row dicts contain every column;
ignore the ones you don't need):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from typing import Any

from pydantic import BaseModel

class AgentResult(BaseModel):
    model_config = {"extra": "ignore"}
    answer: str | None = None
    tool_output: Any = None

status = agent_table.insert([{"prompt": user_input}], return_rows=True)
result = AgentResult.model_validate(status.rows[0])
```
**When to use which:**
* `return_rows=True` -- any time you insert a row with computed columns and need the results back
* `to_pydantic()` -- when reading from a `ResultSet` (after `.collect()`)
* `model_validate()` -- when reading from `status.rows` (plain dicts from `return_rows=True`)
## Background jobs
Any route can set `background = true`. The endpoint returns immediately with a
job handle:
```json theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{"id": "abc123", "job_url": "http://localhost:8000/jobs/abc123"}
```
Poll the job URL:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl http://localhost:8000/jobs/abc123
# {"status": "pending"}
# ... later ...
# {"status": "done", "result": {...}}
# or
# {"status": "error", "error": "..."}
```
`background` is mutually exclusive with `return_fileresponse`.
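
A client-side polling loop might look like the following. This is a sketch under stated assumptions: `wait_for_job` and `fetch_json` are hypothetical helpers, and the polling interval and timeout are arbitrary defaults. Only the `status`, `result`, and `error` fields shown above are relied on.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_job(
    job_url: str,
    fetch: Callable[[str], dict] = fetch_json,
    interval: float = 1.0,
    timeout: float = 300.0,
) -> dict:
    """Poll job_url until the job finishes; raise on job error or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_url)
        if job["status"] == "done":
            return job["result"]
        if job["status"] == "error":
            raise RuntimeError(job["error"])
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job['status']!r} after {timeout} s")
        time.sleep(interval)
```

With a running service, `wait_for_job("http://localhost:8000/jobs/abc123")` would block until the job reports `"done"` and return its `result`. The `fetch` parameter exists so the loop can be exercised without a server.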
## CLI Reference
For the full `pxt` command-line reference (all subcommands, flags, and usage patterns), see the dedicated [CLI Reference](/platform/cli) page.
Quick examples:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve my-service --config service.toml # named service
pxt serve insert --table my_dir.my_table --path /gen --inputs prompt # single endpoint
pxt serve my-service --config service.toml --dry-run # validate without starting
```
# Working with Anthropic in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-anthropic
Call Claude models from Pixeltable for chat completions, tool calling, and structured outputs as computed columns over text and image data.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Pixeltable’s Anthropic integration enables you to access Anthropic’s
Claude LLM via the Anthropic API.
### Prerequisites
* An Anthropic account with an API key
([https://docs.anthropic.com/en/api/getting-started](https://docs.anthropic.com/en/api/getting-started))
### Important notes
* Anthropic usage may incur costs based on your Anthropic plan.
* Be mindful of sensitive data and consider security measures when
integrating with external services.
First you’ll need to install required libraries and enter an Anthropic
API key.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable anthropic
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'ANTHROPIC_API_KEY' not in os.environ:
    os.environ['ANTHROPIC_API_KEY'] = getpass.getpass(
        'Anthropic API Key:'
    )
```
Now let’s create a Pixeltable directory to hold the tables for our demo.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Remove the 'anthropic_demo' directory and its contents, if it exists
pxt.drop_dir('anthropic_demo', force=True)
pxt.create_dir('anthropic_demo')
```
Created directory 'anthropic\_demo'.
## Messages
Create a table with a column for your input data, then add computed
columns to hold the results from Anthropic.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import anthropic
# Create a table in Pixeltable and add a computed column that calls Anthropic
t = pxt.create_table('anthropic_demo/chat', {'input': pxt.String})
msgs = [{'role': 'user', 'content': t.input}]
t.add_computed_column(
output=anthropic.messages(
messages=msgs,
model='claude-haiku-4-5-20251001',
max_tokens=300,
model_kwargs={
# Optional dict with parameters for the Anthropic API
'system': 'Respond to the prompt with detailed historical information.',
'temperature': 0.7,
},
)
)
```
Created table 'chat'.
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Parse the response into a new column
t.add_computed_column(response=t.output.content[0].text)
```
Added 0 column values with 0 errors.
No rows affected.
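
The path `t.output.content[0].text` follows the JSON structure of a Messages API response. The dict below is an abbreviated, hand-written stand-in for such a response (the values are illustrative, not real model output) to show which field the path selects.

```python
# Abbreviated stand-in for a Messages API response; values are illustrative only
output = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Theodore Roosevelt won the 1904 election."}
    ],
    "stop_reason": "end_turn",
}

# Same path as t.output.content[0].text, written as plain dict/list access
response = output["content"][0]["text"]
print(response)
```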
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Start a conversation
t.insert(
[
{
'input': 'What was the outcome of the 1904 US Presidential election?'
}
]
)
t.select(t.input, t.response).show()
```
Inserting rows into \`chat\`: 1 rows \[00:00, 203.87 rows/s]
Inserted 1 row with 0 errors.
### Learn More
To learn more about advanced techniques like RAG operations in
Pixeltable, check out the [RAG Operations in
Pixeltable](/howto/use-cases/rag-operations)
tutorial.
If you have any questions, don’t hesitate to reach out.
# Working with Bedrock in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-bedrock
Run Claude, Titan, and other AWS Bedrock foundation models from Pixeltable computed columns for enterprise LLM and embedding workflows.
Pixeltable’s Bedrock integration enables you to access AWS Bedrock
foundation models directly from your tables.
### Prerequisites
* Activate Bedrock in your AWS account.
* Request access to your desired models (e.g. Claude 3.7 Sonnet, Amazon
  Nova Pro).
* Obtain a **Bedrock API Key** from the AWS console (under Bedrock >
API keys), or configure standard AWS IAM credentials.
### Important notes
* Bedrock usage may incur costs based on your Bedrock plan.
* Be mindful of sensitive data and consider security measures when
integrating with external services.
First you’ll need to install required libraries and configure your
Bedrock credentials.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable boto3
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'BEDROCK_API_KEY' not in os.environ:
    os.environ['BEDROCK_API_KEY'] = getpass.getpass(
        'Enter your Bedrock API Key: '
    )
# Optional: set the region if your Bedrock endpoint is not in us-east-1
# os.environ['BEDROCK_REGION_NAME'] = 'us-west-2'
```
Now let’s create a Pixeltable directory to hold the tables for our demo.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Remove the `bedrock_demo` directory and its contents, if it exists
pxt.drop_dir('bedrock_demo', force=True)
pxt.create_dir('bedrock_demo')
```
Created directory 'bedrock\_demo'.
## Messages
Create a table with a column for your input data, then add computed
columns to hold the results from Bedrock.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import bedrock
# Create a table in Pixeltable and add a computed column that calls Bedrock
t = pxt.create_table('bedrock_demo/chat', {'input': pxt.String})
t.add_computed_column(
output=bedrock.converse(
model_id='amazon.nova-pro-v1:0',
messages=[{'role': 'user', 'content': [{'text': t.input}]}],
)
)
```
Created table 'chat'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Parse the response into a new column
t.add_computed_column(response=t.output.output.message.content[0].text)
```
Added 0 column values with 0 errors in 0.01 s
No rows affected.
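
The repeated `output` in `t.output.output.message.content[0].text` is not a typo: the first `output` is the computed column, and the second is the top-level `output` key of the Converse response. The dict below is an abbreviated, hand-written stand-in for such a response (values are placeholders) that makes the nesting visible.

```python
# Abbreviated stand-in for a Converse response; values are placeholders
converse_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [{"text": "placeholder answer"}],
        }
    },
    "stopReason": "end_turn",
}

# Equivalent of <column>.output.message.content[0].text as plain dict/list access
response = converse_response["output"]["message"]["content"][0]["text"]
print(response)
```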
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Start a conversation
t.insert(
[
{
'input': 'What was the outcome of the 1904 US Presidential election?'
}
]
)
t.select(t.input, t.response).show()
```
Inserted 1 row with 0 errors in 2.75 s (0.36 rows/s)
### Learn more
To learn more about advanced techniques like RAG operations in
Pixeltable, check out the [RAG Operations in
Pixeltable](/howto/use-cases/rag-operations)
tutorial.
If you have any questions, don’t hesitate to reach out.
# Working with BFL FLUX in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-bfl
Generate and edit images with Black Forest Labs FLUX models from Pixeltable using text prompts, image inputs, and computed columns.
This notebook demonstrates how to use Black Forest Labs (BFL) FLUX
models for image generation and editing through Pixeltable.
[BFL FLUX](https://docs.bfl.ai/) offers state-of-the-art text-to-image
generation with models like FLUX.2 and FLUX 1.1, featuring:
* High-fidelity image generation with accurate hands, faces, and
textures
* Multi-reference image editing
* Precise hex color control
* Typography and text rendering
## Prerequisites
1. A BFL API key from [dashboard.bfl.ai](https://dashboard.bfl.ai)
2. Pixeltable installed
## Setup
First, install Pixeltable and set up your API key:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
# Set your BFL API key - get one from https://dashboard.bfl.ai
if 'BFL_API_KEY' not in os.environ:
    os.environ['BFL_API_KEY'] = getpass.getpass(
        'Enter your BFL API key: '
    )
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import bfl
# Create a directory for our examples
pxt.drop_dir('bfl_demo', force=True)
pxt.create_dir('bfl_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'bfl\_demo'.
## Text-to-Image Generation
Generate images from text prompts using FLUX models. FLUX 1.1 \[pro]
offers fast, reliable results.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table for image generation
prompts_t = pxt.create_table(
'bfl_demo.prompts',
{'prompt': pxt.String, 'style': pxt.String},
if_exists='replace',
)
# Add a computed column that generates images from prompts
prompts_t.add_computed_column(
image=bfl.generate(
prompts_t.prompt, model='flux-pro-1.1', width=1024, height=1024
)
)
```
Created table 'prompts'.
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert some prompts to generate images
prompts_t.insert(
[
{
'prompt': 'A majestic mountain landscape at golden hour, photorealistic',
'style': 'landscape',
},
{
'prompt': 'A futuristic city with flying cars and neon lights, cyberpunk style',
'style': 'sci-fi',
},
]
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View the generated images
prompts_t.select(prompts_t.prompt, prompts_t.image).show()
```
## Image Editing
Edit existing images with text prompts using FLUX models. This is
powerful for:
* Changing backgrounds
* Adding or removing objects
* Style transfer
* Multi-reference editing
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table for image editing
edit_t = pxt.create_table(
'bfl_demo.edits',
{'original': pxt.Image, 'edit_prompt': pxt.String},
if_exists='replace',
)
# Add computed column for edited images
edit_t.add_computed_column(
edited=bfl.edit(
edit_t.edit_prompt, edit_t.original, model='flux-2-pro'
)
)
```
Created table 'edits'.
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert an image and edit prompt
# Replace with your own image URL or path
edit_t.insert(
original='https://images.unsplash.com/photo-1506744038136-46273834b3fb?w=1024&h=768&fit=crop',
edit_prompt='Add a dramatic sunset with vibrant orange and purple colors',
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View original and edited images
edit_t.select(edit_t.original, edit_t.edited, edit_t.edit_prompt).show()
```
## Using Seeds for Reproducibility
Use the `seed` parameter to get reproducible results:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table with seed support
seed_t = pxt.create_table(
'bfl_demo.seeded',
{'prompt': pxt.String, 'seed': pxt.Int},
if_exists='replace',
)
seed_t.add_computed_column(
image=bfl.generate(
seed_t.prompt,
model='flux-pro-1.1',
width=512,
height=512,
seed=seed_t.seed,
)
)
# Same seed = same image
seed_t.insert(
[
{'prompt': 'A red rose in a crystal vase', 'seed': 42},
{
'prompt': 'A red rose in a crystal vase',
'seed': 42,
}, # Same result
{
'prompt': 'A red rose in a crystal vase',
'seed': 123,
}, # Different result
]
)
```
Created table 'seeded'.
Added 0 column values with 0 errors.
Inserting rows into \`seeded\`: 3 rows \[00:00, 1211.18 rows/s]
Inserted 3 rows with 0 errors.
3 rows inserted, 6 values computed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
seed_t.collect()
```
## Image Expansion (Outpainting)
Expand an image beyond its original boundaries using `bfl.expand()`.
This is perfect for:
* Making images wider or taller
* Adapting content for different aspect ratios
* Extending scenes naturally
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table for image expansion
expand_t = pxt.create_table(
'bfl_demo.expanded',
{'image': pxt.Image, 'expand_prompt': pxt.String},
if_exists='replace',
)
# Add computed column to expand images horizontally
expand_t.add_computed_column(
wide=bfl.expand(
expand_t.expand_prompt,
expand_t.image,
left=256, # Add 256 pixels to left
right=256, # Add 256 pixels to right
)
)
```
Created table 'expanded'.
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert an image to expand - using a known small image
expand_t.insert(
image='https://replicate.delivery/pbxt/HtGQBfA5TrqFYZBf0UL18NTqHrzt8UiSIsAkUuMHtjvFDO6p/overture-creations-5sI6fQgYIuo.png',
expand_prompt='Continue the room interior with similar decor and lighting',
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View original and expanded images
expand_t.select(expand_t.image, expand_t.wide).show()
```
## Inpainting with Fill
`bfl.fill()` allows you to inpaint specific regions of an image using a
mask:
* **Black areas** in the mask are preserved
* **White areas** in the mask are inpainted based on the prompt
Use cases: remove unwanted objects, replace backgrounds, edit text in
images, restore damaged areas.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table for inpainting
fill_t = pxt.create_table(
'bfl_demo.filled',
{'image': pxt.Image, 'mask': pxt.Image, 'fill_prompt': pxt.String},
if_exists='replace',
)
# Add computed column to fill masked regions
fill_t.add_computed_column(
filled=bfl.fill(
fill_t.fill_prompt,
fill_t.image,
fill_t.mask,
steps=50,
guidance=30,
)
)
```
Created table 'filled'.
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert image and mask - the white area in the mask will be inpainted
fill_t.insert(
image='https://replicate.delivery/pbxt/HtGQBfA5TrqFYZBf0UL18NTqHrzt8UiSIsAkUuMHtjvFDO6p/overture-creations-5sI6fQgYIuo.png',
mask='https://replicate.delivery/pbxt/HtGQBqO9MtVbPm0G0K43nsvvjBB0E0PaWOhuNRrRBBT4ttbf/mask.png',
fill_prompt='A fluffy golden retriever sitting on the couch',
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View original image, mask, and filled result
fill_t.select(fill_t.image, fill_t.mask, fill_t.filled).show()
```
## Best Practices
1. **Use specific prompts**: Include details about style, lighting,
composition, and subject
2. **Start with flux-pro-1.1**: Fast and reliable for most use cases
3. **Use seeds for reproducibility**: When you need consistent results
4. **Resolution**: Minimum 64x64, max 4MP (2048x2048), dimensions must
be multiples of 16
5. **Input image limits**: Max 20MP for editing/expand/fill operations
6. **Safety tolerance**: Default is 2; lower = stricter moderation (0-6
scale)
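
The resolution limits in item 4 can be checked before a prompt is inserted, which avoids spending an API call on a request that would be rejected. The helper below is a sketch, not part of Pixeltable or the BFL SDK: `check_flux_dimensions` is a hypothetical name, and it interprets "4MP" as a total pixel count of at most 2048 x 2048.

```python
MIN_SIDE = 64                # minimum width/height in pixels
MAX_PIXELS = 2048 * 2048     # assumed interpretation of the 4MP limit
STEP = 16                    # dimensions must be multiples of 16


def check_flux_dimensions(width: int, height: int) -> list:
    """Return a list of problems; an empty list means the size passes all checks."""
    problems = []
    for name, value in (("width", width), ("height", height)):
        if value < MIN_SIDE:
            problems.append(f"{name} {value} is below the {MIN_SIDE}-pixel minimum")
        if value % STEP != 0:
            problems.append(f"{name} {value} is not a multiple of {STEP}")
    if width * height > MAX_PIXELS:
        problems.append("total size exceeds 4MP (2048x2048)")
    return problems


print(check_flux_dimensions(1024, 1024))
print(check_flux_dimensions(1000, 1024))
```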
## Learn More
* [BFL Documentation](https://docs.bfl.ai/)
* [FLUX.2 Overview](https://docs.bfl.ai/flux_2/flux2_overview)
* [Text-to-Image API](https://docs.bfl.ai/flux_2/flux2_text_to_image)
* [Image Editing API](https://docs.bfl.ai/flux_2/flux2_image_editing)
* [Pricing](https://docs.bfl.ai/quick_start/pricing)
# Working with Deepseek in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-deepseek
Use DeepSeek chat and reasoning models from Pixeltable computed columns for code generation, math, structured outputs, and tool-using agents.
Pixeltable’s Deepseek integration enables you to access Deepseek’s LLM
via the Deepseek API.
### Prerequisites
* A Deepseek account with an API key ([https://api-docs.deepseek.com/](https://api-docs.deepseek.com/))
### Important notes
* Deepseek usage may incur costs based on your Deepseek plan.
* Be mindful of sensitive data and consider security measures when
integrating with external services.
First you’ll need to install the required libraries and enter a Deepseek
API key. Deepseek uses the OpenAI SDK as its Python API, so we need to
install it in addition to Pixeltable.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'DEEPSEEK_API_KEY' not in os.environ:
    os.environ['DEEPSEEK_API_KEY'] = getpass.getpass('Deepseek API Key:')
```
Now let’s create a Pixeltable directory to hold the tables for our demo.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Remove the 'deepseek_demo' directory and its contents, if it exists
pxt.drop_dir('deepseek_demo', force=True)
pxt.create_dir('deepseek_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'deepseek\_demo'.
## Messages
Create a table with a column for your input data, then add computed
columns to hold the results from Deepseek.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import deepseek
# Create a table in Pixeltable and add a computed column that calls Deepseek
t = pxt.create_table('deepseek_demo/chat', {'input': pxt.String})
msgs = [{'role': 'user', 'content': t.input}]
t.add_computed_column(
output=deepseek.chat_completions(messages=msgs, model='deepseek-chat')
)
```
Created table 'chat'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Parse the response into a new column
t.add_computed_column(response=t.output.choices[0].message.content)
```
Added 0 column values with 0 errors in 0.01 s
No rows affected.
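
The path `choices[0].message.content` mirrors the OpenAI-style chat completion format. The dict below is an abbreviated, hand-written stand-in for such a response (values are placeholders), showing which field the computed column extracts.

```python
# Abbreviated stand-in for an OpenAI-style chat completion; values are placeholders
chat_completion = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "placeholder answer"},
            "finish_reason": "stop",
        }
    ],
}

# Equivalent of t.output.choices[0].message.content as plain dict/list access
response = chat_completion["choices"][0]["message"]["content"]
print(response)
```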
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Start a conversation
t.insert(
[
{
'input': 'What was the outcome of the 1904 US Presidential election?'
}
]
)
t.select(t.input, t.response).show()
```
Inserted 1 row with 0 errors in 18.72 s (0.05 rows/s)
### Learn more
To learn more about advanced techniques like RAG operations in
Pixeltable, check out the [RAG Operations in
Pixeltable](/howto/use-cases/rag-operations)
tutorial.
If you have any questions, don’t hesitate to reach out.
# Working with Microsoft Fabric
Source: https://docs.pixeltable.com/howto/providers/working-with-fabric
Read from and write to Microsoft Fabric OneLake tables and datasets directly from Pixeltable for enterprise data and ML workflows.
Pixeltable’s Microsoft Fabric integration enables you to access Azure
OpenAI models within Microsoft Fabric notebook environments with
automatic authentication.
## Prerequisites
* A Microsoft Fabric workspace with access to AI services
* Running in a Microsoft Fabric notebook environment
## Important notes
* This integration only works within Microsoft Fabric notebook
environments
* Authentication is handled automatically - no API keys required
* Azure OpenAI usage in Fabric is subject to your organization’s Fabric
capacity and policies
For more information about Fabric AI services, see the [Microsoft Fabric
AI Services
documentation](https://learn.microsoft.com/en-us/fabric/data-science/ai-services/ai-services-overview).
First, install Pixeltable in your Fabric notebook:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
Now let’s create a Pixeltable directory to hold the tables for our demo.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Remove the 'fabric_demo' directory and its contents, if it exists
pxt.drop_dir('fabric_demo', force=True)
pxt.create_dir('fabric_demo')
```
## Chat Completions with Standard Models
Let’s start by using a standard chat model (gpt-4.1) for a simple Q\&A
application.
Create a table in Pixeltable with a computed column that calls Azure
OpenAI via Fabric:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import fabric

# Create a table for customer support tickets
tickets = pxt.create_table(
    'fabric_demo.support_tickets',
    {
        'ticket_id': pxt.Int,
        'customer_message': pxt.String,
        'priority': pxt.String,
    },
)

# Add a computed column that automatically generates AI responses
# No API keys needed - Fabric handles authentication!
messages = [
    {
        'role': 'system',
        'content': 'You are a helpful customer support agent. Be concise and professional.',
    },
    {'role': 'user', 'content': tickets.customer_message},
]
tickets.add_computed_column(
    ai_response=fabric.chat_completions(
        messages,
        model='gpt-4.1',
        model_kwargs={'max_tokens': 200, 'temperature': 0.7},
    )
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Parse the response to extract just the message content
tickets.add_computed_column(
response_text=tickets.ai_response.choices[0].message.content
)
```
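To make the path `choices[0].message.content` concrete, here is a minimal plain-Python sketch. The `sample_response` dict is a hypothetical, trimmed-down payload in the OpenAI chat completions shape, not real output from Fabric:

```python
# Hypothetical, trimmed-down chat completions payload (illustration only)
sample_response = {
    'choices': [
        {
            'index': 0,
            'message': {
                'role': 'assistant',
                'content': 'You can reset your password from the login page.',
            },
            'finish_reason': 'stop',
        }
    ],
    'usage': {'prompt_tokens': 25, 'completion_tokens': 11},
}

# Same traversal as tickets.ai_response.choices[0].message.content
response_text = sample_response['choices'][0]['message']['content']
print(response_text)
```

In Pixeltable the same path is written as an expression on the JSON column, so it is evaluated for every row rather than for a single dict.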
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert data - AI responses are generated automatically
tickets.insert(
[
{
'ticket_id': 1,
'customer_message': 'How do I reset my password?',
'priority': 'low',
},
{
'ticket_id': 2,
'customer_message': "My order hasn't arrived after 2 weeks",
'priority': 'high',
},
{
'ticket_id': 3,
'customer_message': 'Can I change my subscription plan?',
'priority': 'medium',
},
]
)
# Query results with AI-generated responses
tickets.select(
tickets.ticket_id, tickets.customer_message, tickets.response_text
).show()
```
## Chat Completions with Reasoning Models
Fabric also supports reasoning models like gpt-5, which are optimized
for complex reasoning tasks.
**Note:** Reasoning models have different parameter requirements:
* They use `max_completion_tokens` instead of `max_tokens`
* They do not support the `temperature` parameter
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table for complex reasoning tasks
reasoning_tasks = pxt.create_table(
    'fabric_demo.reasoning', {'task_id': pxt.Int, 'problem': pxt.String}
)
messages = [{'role': 'user', 'content': reasoning_tasks.problem}]
reasoning_tasks.add_computed_column(
    reasoning_output=fabric.chat_completions(
        messages,
        model='gpt-5',  # Reasoning model
        model_kwargs={
            'max_completion_tokens': 1000  # Note: max_completion_tokens, not max_tokens
        },
    )
)
reasoning_tasks.add_computed_column(
    solution=reasoning_tasks.reasoning_output.choices[0].message.content
)
```
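Because standard and reasoning models accept different parameters, it can help to build `model_kwargs` in one place. The sketch below is a hypothetical helper written for this guide, not part of Pixeltable; the `REASONING_MODELS` set is an assumption based on the models named in this guide.

```python
from typing import Optional

# Assumption: which model names count as reasoning models (adjust as needed)
REASONING_MODELS = {'gpt-5'}


def build_model_kwargs(
    model: str, max_output_tokens: int, temperature: Optional[float] = None
) -> dict:
    """Return model_kwargs suited to a standard or a reasoning model."""
    if model in REASONING_MODELS:
        # Reasoning models take max_completion_tokens and no temperature
        return {'max_completion_tokens': max_output_tokens}
    kwargs: dict = {'max_tokens': max_output_tokens}
    if temperature is not None:
        kwargs['temperature'] = temperature
    return kwargs


print(build_model_kwargs('gpt-4.1', 200, temperature=0.7))
print(build_model_kwargs('gpt-5', 1000, temperature=0.7))
```

The returned dict could then be passed as `model_kwargs=` to `fabric.chat_completions`.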
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert a complex reasoning task
reasoning_tasks.insert(
[
{
'task_id': 1,
'problem': 'Explain how to implement a binary search tree with self-balancing capabilities. Include time complexity analysis.',
}
]
)
reasoning_tasks.select(
reasoning_tasks.problem, reasoning_tasks.solution
).show()
```
## Embeddings for Semantic Search
Fabric also supports embedding models for semantic search and similarity
operations.
Let’s create a knowledge base with semantic search capabilities:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a knowledge base table
knowledge_base = pxt.create_table(
    'fabric_demo.knowledge',
    {'doc_id': pxt.Int, 'content': pxt.String, 'category': pxt.String},
)

# Add embeddings column
knowledge_base.add_computed_column(
    embedding=fabric.embeddings(
        knowledge_base.content, model='text-embedding-3-small'
    )
)

# Insert some documents
knowledge_base.insert(
    [
        {
            'doc_id': 1,
            'content': 'Pixeltable is a Python library for AI data workflows with built-in versioning.',
            'category': 'product',
        },
        {
            'doc_id': 2,
            'content': 'Microsoft Fabric provides a unified analytics platform for data engineering and AI.',
            'category': 'platform',
        },
        {
            'doc_id': 3,
            'content': 'Azure OpenAI Service offers powerful language models through REST APIs.',
            'category': 'service',
        },
    ]
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add an embedding index for fast similarity search
knowledge_base.add_embedding_index(
'content',
embedding=fabric.embeddings.using(model='text-embedding-3-small'),
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Perform similarity search
sim = knowledge_base.content.similarity('AI platform for data science')
knowledge_base.select(
knowledge_base.content, knowledge_base.category, sim=sim
).order_by(sim, asc=False).limit(2).show()
```
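Conceptually, the query above ranks rows by how close their embeddings are to the embedding of the query string. The following plain-Python sketch illustrates that ranking with cosine similarity on made-up 3-dimensional vectors. Real embeddings have far more dimensions, and the metric and index structure Pixeltable actually uses may differ.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for document embeddings
docs = {
    'doc about Pixeltable': [0.9, 0.1, 0.0],
    'doc about Fabric': [0.2, 0.9, 0.1],
    'doc about Azure OpenAI': [0.1, 0.3, 0.9],
}
query_vec = [0.3, 0.8, 0.2]  # made-up query embedding

# Plain-Python analogue of order_by(sim, asc=False).limit(2)
ranked = sorted(
    docs,
    key=lambda name: cosine_similarity(docs[name], query_vec),
    reverse=True,
)
top_2 = ranked[:2]
print(top_2)
```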
## Combining Chat and Embeddings: RAG Pattern
Let’s combine embeddings and chat completions to build a simple
Retrieval-Augmented Generation (RAG) system:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table for questions
questions = pxt.create_table(
    'fabric_demo.questions',
    {'question_id': pxt.Int, 'question': pxt.String},
)

# Find similar documents using similarity search.
# A query function returns the query itself; Pixeltable runs it for each row.
@pxt.query
def retrieve_context(question: str, top_k: int = 2):
    sim = knowledge_base.content.similarity(question)
    return (
        knowledge_base.select(knowledge_base.content)
        .order_by(sim, asc=False)
        .limit(top_k)
    )

# Add context retrieval
questions.add_computed_column(
    context=retrieve_context(questions.question, top_k=2)
)

# Format the prompt per row in a UDF; an f-string over column references
# would be evaluated only once, when the column is defined
@pxt.udf
def build_rag_prompt(context: list[dict], question: str) -> str:
    passages = '\n'.join(doc['content'] for doc in context)
    return f'Context: {passages}\n\nQuestion: {question}'

# Build RAG prompt with retrieved context
questions.add_computed_column(
    rag_messages=[
        {
            'role': 'system',
            'content': "Answer the question based on the provided context. If the context doesn't contain relevant information, say so.",
        },
        {
            'role': 'user',
            'content': build_rag_prompt(questions.context, questions.question),
        },
    ]
)

# Generate answer using gpt-4.1
questions.add_computed_column(
    answer_response=fabric.chat_completions(
        questions.rag_messages,
        model='gpt-4.1',
        model_kwargs={'max_tokens': 300},
    )
)
questions.add_computed_column(
    answer=questions.answer_response.choices[0].message.content
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Ask a question
questions.insert(
[{'question_id': 1, 'question': 'What is Microsoft Fabric used for?'}]
)
questions.select(
questions.question, questions.context, questions.answer
).show()
```
## Available Models in Fabric
The following models are currently available in Microsoft Fabric:
**Chat Models:**
* `gpt-5` (reasoning model)
* `gpt-4.1`
* `gpt-4.1-mini`
**Embedding Models:**
* `text-embedding-ada-002`
* `text-embedding-3-small`
* `text-embedding-3-large`
For the latest information on available models, see the [Fabric AI
Services
documentation](https://learn.microsoft.com/en-us/fabric/data-science/ai-services/ai-services-overview).
## Key Features
* **Automatic Authentication**: No API keys required - authentication is
handled by Fabric
* **Rate Limiting**: Pixeltable automatically handles rate limiting
based on Azure OpenAI response headers
* **Batching**: Embedding requests are automatically batched for
efficiency (up to 32 inputs per request)
* **Incremental Processing**: Computed columns only run on new or
updated data
* **Versioning**: All data and transformations are automatically
versioned
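As a rough illustration of what batching means here, the sketch below splits a list of inputs into groups of at most 32. It is plain Python written for this guide, not Pixeltable's internal implementation; the default `batch_size` of 32 comes from the limit stated above.

```python
from typing import Iterator


def batched(items: list, batch_size: int = 32) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


texts = [f'document {i}' for i in range(70)]
batch_sizes = [len(batch) for batch in batched(texts)]
print(batch_sizes)  # 70 inputs -> batches of 32, 32, and 6
```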
### Learn More
To learn more about advanced techniques in Pixeltable:
* [RAG Operations in
Pixeltable](/howto/use-cases/rag-operations)
* [Working with
Embeddings](/platform/embedding-indexes)
* [Microsoft Fabric AI
Services](https://learn.microsoft.com/en-us/fabric/data-science/ai-services/ai-services-overview)
If you have any questions, don’t hesitate to reach out on our [Discord
community](https://discord.gg/QPyqFYx2UN).
# Working with fal.ai in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-fal
Generate images, video, and audio and run open-source ML models from Pixeltable computed columns using the fal.ai serverless inference platform.
Pixeltable’s fal.ai integration enables you to access fal.ai’s fast
inference models via the fal.ai API.
### Prerequisites
* A fal.ai account with an API key ([https://fal.ai/dashboard/keys](https://fal.ai/dashboard/keys))
### Important notes
* fal.ai usage may incur costs based on your fal.ai plan.
* Be mindful of sensitive data and consider security measures when
integrating with external services.
First, you’ll need to install the required libraries and enter a fal.ai
API key.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable fal-client
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os

if 'FAL_API_KEY' not in os.environ:
    os.environ['FAL_API_KEY'] = getpass.getpass('fal.ai API Key: ')
```
Now let’s create a Pixeltable directory to hold the tables for our demo.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Remove the 'fal_demo' directory and its contents, if it exists
pxt.drop_dir('fal_demo', force=True)
pxt.create_dir('fal_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'fal\_demo'.
## Text-to-image generation with FLUX Schnell
Let’s start by using fal.ai’s FLUX Schnell model, which is optimized for
fast image generation. We’ll create a table to store prompts and
generated images.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import fal
# Create a table for image generation
t = pxt.create_table('fal_demo/images', {'prompt': pxt.String})
# Add a computed column that calls the FLUX Schnell model
t.add_computed_column(
response=fal.run(
input={'prompt': t.prompt}, app='fal-ai/flux/schnell'
)
)
```
Created table 'images'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
Now let’s insert some prompts and see the results:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert a few prompts
t.insert(
[
{
'prompt': 'A serene mountain landscape at sunset with a crystal clear lake'
},
{
'prompt': 'A friendly robot teaching a class of kittens to code'
},
{'prompt': 'An underwater city with bioluminescent architecture'},
]
)
```
Inserted 3 rows with 0 errors in 1.77 s (1.70 rows/s)
3 rows inserted.
Let’s examine the structure of the response:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.prompt, t.response).head(1)
```
We can see that fal.ai returns a JSON response with an `images` array.
Each image has a `url` field. Let’s extract and display the images:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add a computed column to extract the image URL and convert it to an Image type
t.add_computed_column(
image=t.response['images'][0]['url'].astype(pxt.Image)
)
# Display the prompts and images
t.select(t.prompt, t.image).head()
```
Added 3 column values with 0 errors in 0.04 s (85.38 rows/s)
## Advanced image generation with Fast SDXL
fal.ai also offers Fast SDXL, which provides more control over image
generation parameters. Let’s create a new table to explore these
capabilities.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table with more parameters
sdxl_t = pxt.create_table(
    'fal_demo/sdxl_images',
    {
        'prompt': pxt.String,
        'negative_prompt': pxt.String,
        'steps': pxt.Int,
    },
)

# Add a computed column with more parameters
sdxl_t.add_computed_column(
    response=fal.run(
        input={
            'prompt': sdxl_t.prompt,
            'negative_prompt': sdxl_t.negative_prompt,
            'image_size': 'square_hd',  # 1024x1024
            'num_inference_steps': sdxl_t.steps,
        },
        app='fal-ai/fast-sdxl',
    )
)

# Extract the image
sdxl_t.add_computed_column(
    image=sdxl_t.response['images'][0]['url'].astype(pxt.Image)
)
```
Created table 'sdxl\_images'.
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert prompts with different parameters
sdxl_t.insert(
[
{
'prompt': 'A majestic lion in a savanna at golden hour, photorealistic',
'negative_prompt': 'cartoon, illustration, drawing',
'steps': 25,
},
{
'prompt': 'A futuristic cityscape with flying cars and neon lights',
'negative_prompt': 'blurry, low quality',
'steps': 30,
},
]
)
```
Inserted 2 rows with 0 errors in 5.23 s (0.38 rows/s)
2 rows inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Display the results
sdxl_t.select(sdxl_t.prompt, sdxl_t.image).head()
```
## Generating multiple images per prompt
You can also generate multiple variations of the same prompt in a single
request:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table for multiple image generation
multi_t = pxt.create_table(
    'fal_demo/multi_images', {'prompt': pxt.String}
)

# Generate 3 variations of each prompt
multi_t.add_computed_column(
    response=fal.run(
        input={'prompt': multi_t.prompt, 'num_images': 3},
        app='fal-ai/flux/schnell',
    )
)

# Extract each of the three images into its own column
multi_t.add_computed_column(
    image_1=multi_t.response['images'][0]['url'].astype(pxt.Image)
)
multi_t.add_computed_column(
    image_2=multi_t.response['images'][1]['url'].astype(pxt.Image)
)
multi_t.add_computed_column(
    image_3=multi_t.response['images'][2]['url'].astype(pxt.Image)
)
```
Created table 'multi\_images'.
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
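When a request returns several images, each entry of the `images` array carries its own `url`. The plain-Python sketch below uses a hypothetical payload with placeholder URLs (only the fields this guide relies on) to show the indexing used by `image_1` through `image_3`, and how all URLs could be collected at once:

```python
# Hypothetical fal.ai-style payload with placeholder URLs (illustration only)
sample_response = {
    'images': [
        {'url': 'https://example.com/generated/variation-0.png'},
        {'url': 'https://example.com/generated/variation-1.png'},
        {'url': 'https://example.com/generated/variation-2.png'},
    ]
}

# Same indexing as response['images'][1]['url'] in the computed column
second_url = sample_response['images'][1]['url']

# Collect every URL, however many images were returned
all_urls = [image['url'] for image in sample_response['images']]
print(second_url)
print(len(all_urls))
```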
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert a prompt
multi_t.insert(
[{'prompt': 'A steampunk mechanical butterfly on a brass flower'}]
)
```
Inserted 1 row with 0 errors in 1.14 s (0.88 rows/s)
1 row inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Display all three variations
multi_t.select(multi_t.image_1, multi_t.image_2, multi_t.image_3).head()
```
## Using Higher Quality Models
For higher-quality generation, you can use models such as
`fal-ai/flux/dev`, which produce better results but take longer to run:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table using FLUX Dev
dev_t = pxt.create_table('fal_demo/flux_dev', {'prompt': pxt.String})
# Use FLUX Dev model for higher quality
dev_t.add_computed_column(
response=fal.run(
input={'prompt': dev_t.prompt}, app='fal-ai/flux/dev'
)
)
dev_t.add_computed_column(
image=dev_t.response['images'][0]['url'].astype(pxt.Image)
)
```
Created table 'flux\_dev'.
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert a prompt (note: FLUX Dev may take longer but produces higher quality results)
dev_t.insert(
[
{
'prompt': 'A highly detailed oil painting of a wizard casting a spell in an ancient library'
}
]
)
```
Inserted 1 row with 0 errors in 1.74 s (0.58 rows/s)
1 row inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Display the result
dev_t.select(dev_t.prompt, dev_t.image).head()
```
## Exploring Available Models
fal.ai offers a wide variety of models. Here are some popular ones you
can try:
### Image Generation Models
* `fal-ai/flux/schnell` - Fast FLUX model for quick image generation
* `fal-ai/flux/dev` - Higher quality FLUX model (slower)
* `fal-ai/fast-sdxl` - Fast Stable Diffusion XL
* `fal-ai/stable-diffusion-v3-medium` - Stable Diffusion 3 Medium
### Other Models
* `fal-ai/fast-lightning-sdxl` - Ultra-fast SDXL variant
* `fal-ai/recraft-v3` - Recraft V3 for design-focused generation
To use a different model, simply change the `app` parameter in your
`fal.run()` call.
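Since only the `app` string changes between models, one option is to keep the choice in a small lookup. The mapping below is a hypothetical convenience written for this guide, built from the app IDs listed above and the speed and quality notes in this guide; it is not part of Pixeltable or fal.ai.

```python
# Hypothetical lookup from a goal to one of the app IDs listed above
APP_BY_GOAL = {
    'fast': 'fal-ai/flux/schnell',
    'quality': 'fal-ai/flux/dev',
    'balanced': 'fal-ai/fast-sdxl',
}


def choose_app(goal: str) -> str:
    """Return the app ID for a goal, failing loudly on unknown goals."""
    try:
        return APP_BY_GOAL[goal]
    except KeyError:
        raise ValueError(
            f'Unknown goal {goal!r}; expected one of {sorted(APP_BY_GOAL)}'
        ) from None


print(choose_app('quality'))
```

The returned string can then be passed as `app=` in the `fal.run()` call.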
## Working with Batch Processing
Pixeltable’s computed columns make it easy to process multiple images in
batch. Let’s create a larger dataset:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a batch processing table
batch_t = pxt.create_table(
    'fal_demo/batch', {'category': pxt.String, 'description': pxt.String}
)

# Create a prompt by combining category and description
batch_t.add_computed_column(
    prompt=pxt.functions.string.format(
        'A {} that is {}', batch_t.category, batch_t.description
    )
)

# Generate images
batch_t.add_computed_column(
    response=fal.run(
        input={'prompt': batch_t.prompt}, app='fal-ai/flux/schnell'
    )
)
batch_t.add_computed_column(
    image=batch_t.response['images'][0]['url'].astype(pxt.Image)
)
```
Created table 'batch'.
Added 0 column values with 0 errors in 0.02 s
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
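The prompt template uses positional `{}` placeholders. Assuming `string.format` follows the semantics of Python's built-in `str.format` (which its call signature suggests, but which is an assumption here), the prompt produced for each row can be previewed in plain Python with a couple of sample rows:

```python
template = 'A {} that is {}'

# Sample rows shaped like the table's 'category' and 'description' columns
rows = [
    {'category': 'landscape', 'description': 'peaceful and zen-like'},
    {'category': 'portrait', 'description': 'mysterious and ethereal'},
]

prompts = [
    template.format(row['category'], row['description']) for row in rows
]
for prompt in prompts:
    print(prompt)
```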
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert a batch of prompts
batch_t.insert(
[
{'category': 'landscape', 'description': 'peaceful and zen-like'},
{
'category': 'portrait',
'description': 'mysterious and ethereal',
},
{
'category': 'abstract art',
'description': 'colorful and energetic',
},
{
'category': 'architecture',
'description': 'modern and minimalist',
},
{'category': 'animal', 'description': 'cute and fluffy'},
]
)
```
Inserted 5 rows with 0 errors in 1.69 s (2.96 rows/s)
5 rows inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View all results
batch_t.select(
batch_t.category, batch_t.description, batch_t.image
).show()
```
## Tips and Best Practices
1. **Rate Limiting**: fal.ai has rate limits. Pixeltable respects these
   limits by default. You can configure custom rate limits in your
   Pixeltable config.
2. **Model Selection**:
   * Use `flux/schnell` for fast prototyping and when speed is critical
   * Use `flux/dev` when you need higher quality and can afford longer
     generation times
   * Use `fast-sdxl` for a good balance of speed and quality
3. **Prompt Engineering**: Good prompts lead to better results. Be
   specific and descriptive.
4. **Negative Prompts**: Use negative prompts to exclude unwanted
   elements from your images.
5. **Caching**: Pixeltable stores the results of computed columns, so
   querying rows that have already been processed doesn’t call fal.ai
   again or incur additional costs.
### Learn more
* fal.ai Documentation: [https://fal.ai/docs](https://fal.ai/docs)
* Pixeltable Documentation: [https://docs.pixeltable.com](https://docs.pixeltable.com)
* To learn more about advanced techniques like RAG operations in
Pixeltable, check out the [RAG Operations in
Pixeltable](/howto/use-cases/rag-operations)
tutorial.
If you have any questions, don’t hesitate to reach out on our [Discord
community](https://pixeltable.com/discord)!
# Working with Fireworks AI in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-fireworks
Call Llama, Mixtral, and other open-source LLMs hosted on Fireworks AI from Pixeltable computed columns for fast chat and text generation.
Pixeltable’s Fireworks integration enables you to access LLMs hosted on
the Fireworks platform.
### Prerequisites
* A Fireworks account with an API key ([https://fireworks.ai/api-keys](https://fireworks.ai/api-keys))
### Important notes
* Fireworks usage may incur costs based on your Fireworks plan.
* Be mindful of sensitive data and consider security measures when
integrating with external services.
First you’ll need to install required libraries and enter a Fireworks
API key.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable fireworks-ai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os

if 'FIREWORKS_API_KEY' not in os.environ:
    os.environ['FIREWORKS_API_KEY'] = getpass.getpass(
        'Fireworks API Key:'
    )
```
Now let’s create a Pixeltable directory to hold the tables for our demo.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Remove the 'fireworks_demo' directory and its contents, if it exists
pxt.drop_dir('fireworks_demo', force=True)
pxt.create_dir('fireworks_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'fireworks\_demo'.
## Completions
First, create a table with a column for your input data, then add
computed columns to store the results from Fireworks.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.fireworks import chat_completions

# Create a table in Pixeltable and pick a model hosted on Fireworks with some parameters
t = pxt.create_table('fireworks_demo/chat', {'input': pxt.String})
messages = [{'role': 'user', 'content': t.input}]
t.add_computed_column(
    output=chat_completions(
        messages=messages,
        model='accounts/fireworks/models/gpt-oss-20b',
        model_kwargs={
            # Optional dict with parameters for the Fireworks API
            'max_tokens': 300,
            'top_k': 40,
            'top_p': 0.9,
            'temperature': 0.7,
        },
    )
)
```
Created table 'chat'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Parse the response text into a new column
t.add_computed_column(response=t.output.choices[0].message.content)
```
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Start a conversation
t.insert(
[{'input': 'Can you tell me who was President of the US in 1961?'}]
)
t.select(t.input, t.response).show()
```
Inserted 1 row with 0 errors in 2.15 s (0.47 rows/s)
### Learn more
To learn more about advanced techniques like RAG operations in
Pixeltable, check out the [RAG Operations in
Pixeltable](/howto/use-cases/rag-operations)
tutorial.
If you have any questions, don’t hesitate to reach out.
# Working with Gemini in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-gemini
Use Google Gemini multimodal models from Pixeltable for chat, vision, structured outputs, and embeddings over text, image, and video columns.
Pixeltable’s Gemini integration supports two authentication methods:
* Google AI Studio: requires an API key from
[https://aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey)
* Vertex AI: uses Application Default Credentials via the Google Cloud
SDK
### Prerequisites
For Google AI Studio:
* A Google AI Studio account with an API key
([https://aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey))
For Vertex AI:
* A Google Cloud project with the Vertex AI API enabled
* The Google Cloud SDK installed and configured
(`gcloud auth application-default login`)
### Important notes
* Usage may incur costs based on your plan.
* Be mindful of sensitive data and consider security measures when
integrating with external services.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable google-genai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import os
vertex_enabled = (
os.environ.get('GOOGLE_GENAI_USE_VERTEXAI', '').lower() == 'true'
)
# Option 1: Google AI Studio (API key)
# Set GEMINI_API_KEY or GOOGLE_API_KEY in your environment,
# or add api_key to the [gemini] section of $PIXELTABLE_HOME/config.toml
if (
    not vertex_enabled
    and 'GEMINI_API_KEY' not in os.environ
    and 'GOOGLE_API_KEY' not in os.environ
):
    import getpass

    os.environ['GEMINI_API_KEY'] = getpass.getpass(
        'Google AI Studio API Key:'
    )
# Option 2: Vertex AI (Application Default Credentials)
# Uncomment and set the following environment variables to use Vertex AI instead:
# os.environ['GOOGLE_GENAI_USE_VERTEXAI'] = 'true'
# os.environ['GOOGLE_CLOUD_PROJECT'] = 'your-project-id'
# os.environ['GOOGLE_CLOUD_LOCATION'] = 'us-central1' # optional, defaults to us-central1
# Then authenticate via: gcloud auth application-default login
```
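The setup cell effectively chooses between the two authentication modes based on environment variables. The helper below is a hypothetical sketch that mirrors the cell's checks, so you can see which mode a given environment would select. It does not reproduce Pixeltable's own configuration lookup; for example, it ignores `config.toml`.

```python
import os
from typing import Mapping


def resolve_gemini_auth(env: Mapping[str, str] = os.environ) -> str:
    """Return 'vertex', 'api_key', or 'missing' for the given environment."""
    if env.get('GOOGLE_GENAI_USE_VERTEXAI', '').lower() == 'true':
        return 'vertex'
    if 'GEMINI_API_KEY' in env or 'GOOGLE_API_KEY' in env:
        return 'api_key'
    return 'missing'


# Placeholder environments, for illustration only
print(resolve_gemini_auth({'GOOGLE_GENAI_USE_VERTEXAI': 'TRUE'}))
print(resolve_gemini_auth({'GOOGLE_API_KEY': 'placeholder'}))
print(resolve_gemini_auth({}))
```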
Now let’s create a Pixeltable directory to hold the tables for our demo.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Remove the 'gemini_demo' directory and its contents, if it exists
pxt.drop_dir('gemini_demo', force=True)
pxt.create_dir('gemini_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'gemini\_demo'.
## Generate content
First, create a table with a column for your input data, then add
computed columns to store the results from Gemini.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from google.genai.types import GenerateContentConfigDict
from pixeltable.functions import gemini
# Create a table in Pixeltable and pick a model hosted on Google AI Studio with some parameters
t = pxt.create_table('gemini_demo/text', {'input': pxt.String})
config = GenerateContentConfigDict(
max_output_tokens=300, temperature=1.0, top_p=0.95, top_k=40
)
t.add_computed_column(
output=gemini.generate_content(
t.input, model='gemini-2.5-flash', config=config
)
)
```
Created table 'text'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Ask Gemini to generate some content based on the input
t.insert(
[
{'input': 'Write a story about a magic backpack.'},
{'input': 'Tell me a science joke.'},
]
)
```
Inserted 2 rows with 0 errors in 4.96 s (0.40 rows/s)
2 rows inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Parse the response into a new column
t.add_computed_column(
response=t.output['candidates'][0]['content']['parts'][0]['text']
)
t.select(t.input, t.response).head()
```
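The path `candidates[0].content.parts[0].text` reads the first text part of the first candidate. The sketch below walks a hypothetical, trimmed-down response dict in plain Python. It also shows joining all text parts, which may be useful if a response is split across several parts (an assumption about possible responses, not something this guide demonstrates).

```python
# Hypothetical, trimmed-down generate_content payload (illustration only)
sample_response = {
    'candidates': [
        {
            'content': {
                'role': 'model',
                'parts': [
                    {'text': 'Why did the atom leave? '},
                    {'text': 'It needed space.'},
                ],
            }
        }
    ]
}

parts = sample_response['candidates'][0]['content']['parts']

# Same traversal as t.output['candidates'][0]['content']['parts'][0]['text']
first_text = parts[0]['text']

# Join every text part, skipping parts that carry no text
full_text = ''.join(part.get('text', '') for part in parts)
print(first_text)
print(full_text)
```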
## Generate images with Imagen
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from google.genai.types import GenerateImagesConfigDict
from pixeltable.functions import gemini
images_t = pxt.create_table('gemini_demo/images', {'prompt': pxt.String})
config = GenerateImagesConfigDict(aspect_ratio='16:9')
images_t.add_computed_column(
generated_image=gemini.generate_images(
images_t.prompt, model='imagen-4.0-generate-001', config=config
)
)
```
Created table 'images'.
Added 0 column values with 0 errors in 0.02 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
images_t.insert(
[{'prompt': 'A friendly dinosaur playing tennis in a cornfield'}]
)
```
Inserted 1 row with 0 errors in 12.31 s (0.08 rows/s)
1 row inserted.
## Generate videos with Veo
The cells below use a `videos_t` table. Judging by the output that
follows, it was presumably created as `gemini_demo/videos` with a
`prompt` column and a computed column that calls
`gemini.generate_videos`.
Created table 'videos'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
videos_t.insert(
[
{
'prompt': 'A giant pixel floating over the open ocean in a sea of data'
}
]
)
```
Inserted 1 row with 0 errors in 40.06 s (0.02 rows/s)
1 row inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
videos_t.head()
```
## Generate Video from an existing Image
We’ll add an additional computed column to our existing `images_t` to
animate the generated images.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
images_t.add_computed_column(
generated_video=gemini.generate_videos(
image=images_t.generated_image, model='veo-2.0-generate-001'
)
)
```
Added 1 column value with 0 errors in 52.22 s (0.02 rows/s)
1 row updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
images_t.head()
```
### Learn more
To learn more about advanced techniques like RAG operations in
Pixeltable, check out the [RAG Operations in
Pixeltable](/howto/use-cases/rag-operations)
tutorial.
If you have any questions, don’t hesitate to reach out.
# Working with Groq in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-groq
Run low-latency Llama, Mixtral, and other open LLMs hosted on Groq from Pixeltable computed columns for fast chat and streaming generation.
Pixeltable’s Groq integration enables you to access Groq models via the
Groq API.
### Prerequisites
* A Groq account with an API key
([https://console.groq.com/docs/quickstart](https://console.groq.com/docs/quickstart))
### Important notes
* Groq usage may incur costs based on your Groq plan.
* Be mindful of sensitive data and consider security measures when
integrating with external services.
First, you’ll need to install the required libraries and enter your Groq
API key.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable groq
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'GROQ_API_KEY' not in os.environ:
os.environ['GROQ_API_KEY'] = getpass.getpass(
'Enter your Groq API key:'
)
```
Now let’s create a Pixeltable directory to hold the tables for our demo.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Remove the 'groq_demo' directory and its contents, if it exists
pxt.drop_dir('groq_demo', force=True)
pxt.create_dir('groq_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'groq\_demo'.
## Chat Completions
First, create a table with a column for your input data, then add
computed columns to store the results from Groq.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import groq

# Create a table in Pixeltable and add a computed column that calls Groq
t = pxt.create_table('groq_demo/chat', {'input': pxt.String})
messages = [{'role': 'user', 'content': t.input}]
t.add_computed_column(
    output=groq.chat_completions(
        messages=messages,
        model='llama-3.3-70b-versatile',
        model_kwargs={
            # Optional dict with parameters for the Groq API
            'max_tokens': 300,
            'top_p': 0.9,
            'temperature': 0.7,
        },
    )
)
```
Created table 'chat'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Parse the response into a new column
t.add_computed_column(response=t.output.choices[0].message.content)
```
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Start a conversation
t.insert(
[{'input': 'How many islands are in the Aleutian island chain?'}]
)
t.select(t.input, t.response).head()
```
Inserted 1 row with 0 errors in 1.16 s (0.86 rows/s)
### Learn more
To learn more about advanced techniques like RAG operations in
Pixeltable, check out the [RAG Operations in
Pixeltable](/howto/use-cases/rag-operations)
tutorial.
If you have any questions, don’t hesitate to reach out.
# Working with Hugging Face
Source: https://docs.pixeltable.com/howto/providers/working-with-hugging-face
Run Hugging Face transformer, vision, and sentence-embedding models locally inside Pixeltable computed columns over text, image, and audio.
Pixeltable provides seamless integration with Hugging Face datasets and
models. This tutorial covers:
* Importing datasets directly into Pixeltable tables
* Working with dataset splits (train/test/validation)
* Streaming large datasets with `IterableDataset`
* Type mappings from Hugging Face to Pixeltable
* Using Hugging Face models for embeddings
## Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable datasets torch transformers sentence-transformers
```
## Import a Hugging Face Dataset
Use `pxt.create_table()` with the `source=` parameter to import a
Hugging Face dataset directly. Pixeltable automatically maps Hugging
Face feature types to Pixeltable column types.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import datasets
import pixeltable as pxt

pxt.drop_dir('hf_demo', force=True)
pxt.create_dir('hf_demo')

# Load a dataset with images
padoru = datasets.load_dataset(
    'not-lain/padoru', split='train'
).select_columns(['Image', 'ImageSize', 'Name', 'ImageSource'])

# Import into Pixeltable
images = pxt.create_table('hf_demo/images', source=padoru)
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'hf\_demo'.
Created table 'images'.
Inserting rows into \`images\`: 100 rows \[00:00, 310.24 rows/s]
Inserting rows into \`images\`: 100 rows \[00:00, 353.22 rows/s]
Inserting rows into \`images\`: 100 rows \[00:00, 368.40 rows/s]
Inserting rows into \`images\`: 82 rows \[00:00, 567.89 rows/s]
Inserted 382 rows with 0 errors.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
images.head(3)
```
## Working with Dataset Splits
When importing a `DatasetDict` (which contains multiple splits like
train/test), use `extra_args={'column_name_for_split': 'split'}` to
preserve split information in a column.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Load a dataset with multiple splits
imdb = datasets.load_dataset('stanfordnlp/imdb')
# Import all splits, storing split info in 'split' column
reviews = pxt.create_table(
'hf_demo/reviews',
source=imdb,
extra_args={'column_name_for_split': 'split'},
)
```
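Conceptually, `column_name_for_split` flattens the per-split datasets into one table and tags every row with the name of the split it came from. The plain-Python sketch below illustrates the resulting row layout with a tiny, made-up mapping (`toy_splits` and `flatten_splits` are hypothetical names); it is only an illustration, not Pixeltable's import code.

```python
from collections import Counter

# Hypothetical stand-in for a DatasetDict: split name -> list of records.
toy_splits = {
    'train': [
        {'text': 'great film', 'label': 1},
        {'text': 'dull plot', 'label': 0},
    ],
    'test': [{'text': 'loved it', 'label': 1}],
}


def flatten_splits(splits, split_column='split'):
    """Merge all splits into one list of rows, recording each row's split."""
    rows = []
    for split_name, records in splits.items():
        for record in records:
            rows.append({**record, split_column: split_name})
    return rows


rows = flatten_splits(toy_splits)

# Count rows per split, analogous to a group-by on the split column.
rows_per_split = Counter(row['split'] for row in rows)
print(rows_per_split)
```

Because the split name is stored as an ordinary column, it can be filtered and grouped like any other value.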
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Query by split
reviews.where(reviews.split == 'train').limit(3).select(
reviews.text, reviews.label, reviews.split
).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Count rows per split
reviews.group_by(reviews.split).select(
reviews.split, count=pxt.functions.count(reviews.text)
).collect()
```
## Using `schema_overrides` for Embeddings
When importing datasets with pre-computed embeddings (common in RAG),
use `schema_overrides` to specify the exact array shape:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Wikipedia with pre-computed embeddings - specify array shape
wiki_ds = (
datasets.load_dataset(
'Cohere/wikipedia-2023-11-embed-multilingual-v3',
'simple',
split='train',
streaming=True,
)
.select_columns(['url', 'title', 'text', 'emb'])
.take(50)
)
wiki = pxt.create_table(
'hf_demo/wiki_embeddings',
source=wiki_ds,
schema_overrides={'emb': pxt.Array[(1024,), pxt.Float]},
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
wiki.select(wiki.title, wiki.emb).limit(2).collect()
```
## Streaming Large Datasets
For very large datasets, use `streaming=True` to filter and sample
before importing:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Stream, filter, and sample before importing
streaming_ds = datasets.load_dataset(
'stanfordnlp/imdb', split='train', streaming=True
)
positive_stream = streaming_ds.filter(lambda x: x['label'] == 1).take(50)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
positive_samples = pxt.create_table(
'hf_demo/positive_samples', source=positive_stream
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
positive_samples.select(
positive_samples.text, positive_samples.label
).limit(2).collect()
```
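Streaming helps because filtering and sampling are lazy: records are pulled one at a time, and iteration stops as soon as enough matches have been found. The sketch below is a rough plain-Python analogy using generators; `fake_stream` is a hypothetical record source that only counts how many records were read, and it does not use the `datasets` library.

```python
from itertools import islice

records_read = 0


def fake_stream(n):
    """Hypothetical record source; lazily yields alternating labels."""
    global records_read
    for i in range(n):
        records_read += 1
        yield {'text': f'review {i}', 'label': i % 2}


# Lazily keep positive reviews, then stop after the first 5 matches.
positives = (r for r in fake_stream(1_000_000) if r['label'] == 1)
sample = list(islice(positives, 5))

# Only a handful of the million available records were ever produced.
print(len(sample), records_read)
```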
## Importing Audio Datasets
Audio datasets are imported the same way; Pixeltable stores the audio files locally:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Import a small audio dataset
audio_ds = datasets.load_dataset(
'hf-internal-testing/librispeech_asr_dummy',
'clean',
split='validation',
)
audio_table = pxt.create_table('hf_demo/audio_samples', source=audio_ds)
audio_table.select(audio_table.audio, audio_table.text).limit(2).collect()
```
Created table 'audio\_samples'.
Inserting rows into \`audio\_samples\`: 73 rows \[00:00, 3960.27 rows/s]
Inserted 73 rows with 0 errors.
## Inserting More Data
Use `table.insert()` to add more data from a Hugging Face dataset to an
existing table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert more data from the same or similar dataset
more_audio = datasets.load_dataset(
'hf-internal-testing/librispeech_asr_dummy',
'clean',
split='validation',
).select(range(5))
audio_table.insert(more_audio)
audio_table.count()
```
Inserting rows into \`audio\_samples\`: 5 rows \[00:00, 3186.68 rows/s]
Inserted 5 rows with 0 errors.
78
## Type Mappings Reference
## Using Hugging Face Models
Pixeltable integrates with Hugging Face models for embeddings and
inference, running locally without API keys.
### Image Embeddings with CLIP
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import clip
# Add CLIP embedding index for cross-modal image search
images.add_embedding_index(
'Image', embedding=clip.using(model_id='openai/clip-vit-base-patch32')
)
# Search images using text
sim = images.Image.similarity(string='anime character with red clothes')
images.order_by(sim, asc=False).limit(3).select(
images.Image, images.Name, sim=sim
).collect()
```
### Text Embeddings with Sentence Transformers
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import sentence_transformer
# Create table with text embedding index
sample_reviews = pxt.create_table(
'hf_demo/sample_reviews',
source=datasets.load_dataset('stanfordnlp/imdb', split='test').select(
range(100)
),
)
sample_reviews.add_embedding_index(
'text',
string_embed=sentence_transformer.using(model_id='all-MiniLM-L6-v2'),
)
# Semantic search
query = 'great acting and cinematography'
sim = sample_reviews.text.similarity(string=query)
sample_reviews.order_by(sim, asc=False).limit(3).select(
sample_reviews.text, sim=sim
).collect()
```
Created table 'sample\_reviews'.
Inserting rows into \`sample\_reviews\`: 100 rows \[00:00, 21625.70 rows/s]
Inserted 100 rows with 0 errors.
### More Hugging Face Models
Pixeltable supports many more Hugging Face models, including:
* **ASR**: `automatic_speech_recognition()` - transcribe audio
* **Translation**: `translation()` - translate between languages
* **Text Generation**: `text_generation()` - generate text completions
* **Image Classification**: `vit_for_image_classification()` - classify
images
* **Object Detection**: `detr_for_object_detection()` - detect objects
in images
See the SDK reference below for the complete list.
## See Also
* [Hugging Face SDK
Reference](/sdk/latest/huggingface) - Full
list of models: ASR, translation, text generation, image
classification, etc.
* [Working with embedding indexes](../../platform/embedding-indexes) -
Index configuration
# Working with Jina AI in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-jina
Use Jina AI embedding and reranker models from Pixeltable to build multilingual semantic search and RAG indexes over text and images.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Pixeltable’s Jina AI integration enables you to access state-of-the-art
embedding and reranker models via the Jina AI API.
### Prerequisites
* A Jina AI account with an API key ([https://jina.ai/](https://jina.ai/))
### Important notes
* Jina AI usage may incur costs based on your Jina AI plan.
* Be mindful of sensitive data and consider security measures when
integrating with external services.
First you’ll need to install Pixeltable and set up your Jina AI API key.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'JINA_API_KEY' not in os.environ:
os.environ['JINA_API_KEY'] = getpass.getpass(
'Enter your Jina AI API key: '
)
```
Now let’s create a Pixeltable directory to hold the tables for our demo.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Remove the 'jina_demo' directory and its contents, if it exists
pxt.drop_dir('jina_demo', force=True)
pxt.create_dir('jina_demo')
```
Created directory 'jina\_demo'.
## Text Embeddings
Jina AI provides frontier multilingual embedding models for semantic
search and RAG applications. The `jina-embeddings-v3` model supports 89+
languages and achieves state-of-the-art performance.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import jina
# Create a table for document embeddings
docs_t = pxt.create_table('jina_demo.documents', {'text': pxt.String})
# Add computed column with Jina embeddings
# task='retrieval.passage' optimizes embeddings for documents to be searched
docs_t.add_computed_column(
embedding=jina.embeddings(
docs_t.text, model='jina-embeddings-v3', task='retrieval.passage'
)
)
```
Created table 'documents'.
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert some sample documents
documents = [
'The Mediterranean diet emphasizes fish, olive oil, and vegetables, believed to reduce chronic diseases.',
'Photosynthesis in plants converts light energy into glucose and produces essential oxygen.',
'20th-century innovations, from radios to smartphones, centered on electronic advancements.',
'Rivers provide water, irrigation, and habitat for aquatic species, vital for ecosystems.',
"Apple's conference call to discuss fourth fiscal quarter results is scheduled for Thursday, November 2, 2023.",
"Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' endure in literature.",
]
docs_t.insert({'text': doc} for doc in documents)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View the embeddings
docs_t.select(docs_t.text, docs_t.embedding).head(3)
```
## Multilingual Embeddings
Jina AI models excel at multilingual text. The same model can embed text
in different languages into the same semantic space.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table for multilingual content
multilingual_t = pxt.create_table(
'jina_demo.multilingual', {'text': pxt.String, 'language': pxt.String}
)
multilingual_t.add_computed_column(
embedding=jina.embeddings(
multilingual_t.text,
model='jina-embeddings-v3',
task='text-matching',
)
)
# Insert texts in different languages (all about organic skincare)
multilingual_t.insert(
[
{
'text': 'Organic skincare for sensitive skin with aloe vera and chamomile.',
'language': 'English',
},
{
'text': 'Bio-Hautpflege für empfindliche Haut mit Aloe Vera und Kamille.',
'language': 'German',
},
{
'text': 'Cuidado de la piel orgánico para piel sensible con aloe vera y manzanilla.',
'language': 'Spanish',
},
{
'text': '针对敏感肌专门设计的天然有机护肤产品',
'language': 'Chinese',
},
]
)
multilingual_t.select(
multilingual_t.language, multilingual_t.text
).collect()
```
Created table 'multilingual'.
Added 0 column values with 0 errors.
Inserting rows into \`multilingual\`: 4 rows \[00:00, 736.23 rows/s]
Inserted 4 rows with 0 errors.
## Embedding Index for Similarity Search
You can use Jina AI embeddings with Pixeltable’s embedding index for
efficient similarity search.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table with an embedding index
search_t = pxt.create_table('jina_demo.search', {'text': pxt.String})
# Add embedding index for similarity search
embed_fn = jina.embeddings.using(
model='jina-embeddings-v3', task='retrieval.passage'
)
search_t.add_embedding_index('text', string_embed=embed_fn)
# Insert documents
search_t.insert({'text': doc} for doc in documents)
```
Created table 'search'.
Inserting rows into \`search\`: 6 rows \[00:00, 565.03 rows/s]
Inserted 6 rows with 0 errors.
6 rows inserted, 12 values computed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Perform similarity search
sim = search_t.text.similarity(
string='What are the health benefits of Mediterranean food?'
)
search_t.order_by(sim, asc=False).limit(3).select(
search_t.text, score=sim
).collect()
```
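Similarity search works by comparing the query embedding with each stored embedding and ranking rows by the resulting score. The sketch below shows cosine similarity on tiny made-up vectors. The three-dimensional embeddings and labels are purely illustrative, and the metric an index actually uses depends on how it was configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-d embeddings; real models produce far more dimensions.
toy_index = {
    'diet': [0.9, 0.1, 0.0],
    'rivers': [0.1, 0.9, 0.1],
    'shakespeare': [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

# Rank stored items from most to least similar to the query.
ranked = sorted(
    toy_index,
    key=lambda name: cosine_similarity(query_vec, toy_index[name]),
    reverse=True,
)
print(ranked)
```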
## Reranking
Jina AI’s reranker models can improve search relevance by reordering
results based on semantic similarity to the query.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table for reranking queries
rerank_t = pxt.create_table(
'jina_demo.rerank',
{'query': pxt.String, 'documents': pxt.Json},
if_exists='replace',
)
# Add computed column for reranking
rerank_t.add_computed_column(
reranked=jina.rerank(
rerank_t.query,
rerank_t.documents,
model='jina-reranker-v2-base-multilingual',
top_n=3,
return_documents=True,
)
)
# Insert a query with candidate documents
rerank_t.insert(
query="When is Apple's conference call scheduled?",
documents=documents,
)
```
Created table 'rerank'.
Added 0 column values with 0 errors.
Inserting rows into \`rerank\`: 1 rows \[00:00, 543.16 rows/s]
Inserted 1 row with 0 errors.
1 row inserted, 2 values computed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View the reranked results
result = rerank_t.select(rerank_t.reranked).collect()
result['reranked'][0]
```
\{'usage': \{'total\_tokens': 221},
'results': \[\{'index': 4,
'document': "Apple's conference call to discuss fourth fiscal quarter results is scheduled for Thursday, November 2, 2023.",
'relevance\_score': 0.64511991},
\{'index': 2,
'document': '20th-century innovations, from radios to smartphones, centered on electronic advancements.',
'relevance\_score': 0.03846619},
\{'index': 5,
'document': "Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' endure in literature.",
'relevance\_score': 0.02517884}]}
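Each entry in `results` carries the `index` of the document in the original candidate list, so a result can be traced back to its source document. A small sketch using an abbreviated copy of the output above (the `candidates` list holds shortened stand-ins for the six sample documents):

```python
# Shortened stand-ins for the six sample documents, in insertion order.
candidates = [
    'Mediterranean diet ...',
    'Photosynthesis ...',
    '20th-century innovations ...',
    'Rivers ...',
    "Apple's conference call ...",
    "Shakespeare's works ...",
]

# Abbreviated copy of the reranker response shown above.
reranked = {
    'usage': {'total_tokens': 221},
    'results': [
        {'index': 4, 'relevance_score': 0.64511991},
        {'index': 2, 'relevance_score': 0.03846619},
        {'index': 5, 'relevance_score': 0.02517884},
    ],
}

# Order by score (best first) and map each result back to its document.
ordered = sorted(
    reranked['results'], key=lambda res: res['relevance_score'], reverse=True
)
top_docs = [
    (candidates[res['index']], res['relevance_score']) for res in ordered
]
best_doc, best_score = top_docs[0]
print(best_doc, best_score)
```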
## Learn More
* [Jina AI Documentation](https://jina.ai/)
* [Jina Embeddings](https://jina.ai/embeddings/)
* [Jina Reranker](https://jina.ai/reranker/)
* [API Rate Limits](https://jina.ai/api-dashboard/rate-limit)
# Working with llama.cpp in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-llama-cpp
Run GGUF quantized LLMs locally with llama.cpp from Pixeltable computed columns for offline chat, generation, and embedding workflows.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
This tutorial demonstrates how to use Pixeltable’s built-in `llama.cpp`
integration to run local LLMs efficiently.
### Important notes
* Models are automatically downloaded from Hugging Face and cached
locally
* Different quantization levels are available for performance/quality
tradeoffs
* Consider memory usage when choosing models and quantizations
## Set up environment
First, let’s install Pixeltable with llama.cpp support:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable llama-cpp-python huggingface-hub
```
## Create a table for chat completions
Now let’s create a table that will contain our inputs and responses.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import llama_cpp
pxt.drop_dir('llama_demo', force=True)
pxt.create_dir('llama_demo')
t = pxt.create_table('llama_demo/chat', {'input': pxt.String})
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'llama\_demo'.
Created table 'chat'.
Next, we add a computed column that calls the Pixeltable
`create_chat_completion` UDF, which adapts the corresponding llama.cpp
API call. In our examples, we’ll use pretrained models from the Hugging
Face repository. llama.cpp makes it easy to do this by specifying a
`repo_id` (from the URL of the model) and a filename from the model repo;
the model will then be downloaded and cached automatically.
(If this is your first time using Pixeltable, the Pixeltable Fundamentals
tutorial contains more details about table creation, computed columns,
and UDFs.)
For this demo we’ll use `Qwen2.5-0.5B`, a very small (0.5-billion
parameter) model that still produces decent results. We’ll use `Q5_K_M`
(5-bit) quantization, which gives an excellent balance of quality and
efficiency.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add a computed column that uses llama.cpp for chat completion
# against the input.
messages = [
{'role': 'system', 'content': 'You are a helpful assistant.'},
{'role': 'user', 'content': t.input},
]
t.add_computed_column(
result=llama_cpp.create_chat_completion(
messages,
repo_id='Qwen/Qwen2.5-0.5B-Instruct-GGUF',
repo_filename='*q5_k_m.gguf',
)
)
# Extract the output content from the JSON structure returned
# by llama_cpp.
t.add_computed_column(output=t.result.choices[0].message.content)
```
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
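The `repo_filename` value above is a glob pattern rather than an exact name, which is how `'*q5_k_m.gguf'` can pick one quantization out of the many files in a model repo. The sketch below shows how such a pattern behaves with Python's `fnmatch` module. The file names are hypothetical, and the exact matching rules applied when the model is downloaded are an assumption here.

```python
from fnmatch import fnmatchcase

# Hypothetical file listing for a GGUF model repo.
repo_files = [
    'model-q2_k.gguf',
    'model-q4_k_m.gguf',
    'model-q5_k_m.gguf',
    'model-q8_0.gguf',
    'README.md',
]

# The wildcard absorbs the model-name prefix; the suffix picks the quantization.
pattern = '*q5_k_m.gguf'
matches = [name for name in repo_files if fnmatchcase(name, pattern)]
print(matches)

# With case-sensitive matching, an upper-case pattern finds nothing here.
upper_matches = [n for n in repo_files if fnmatchcase(n, '*Q5_K_M.gguf')]
print(upper_matches)
```

Since repos differ in how they capitalize quantization names, it is safest to write the pattern with the same capitalization as the file names in the repo.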
## Test chat completion
Let’s try a few simple queries:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Test with a few simple questions
t.insert(
[
{'input': 'What is the capital of France?'},
{'input': 'What are some edible species of fish?'},
{'input': 'Who are the most prominent classical composers?'},
]
)
```
Inserted 3 rows with 0 errors in 6.74 s (0.44 rows/s)
3 rows inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.input, t.output).collect()
```
## Comparing models
Local model frameworks like `llama.cpp` make it easy to compare the
output of different models. Let’s try comparing the output from `Qwen`
against a somewhat larger model, `Llama-3.2-1B`. As always, when we add
a new computed column to our table, it’s automatically evaluated against
the existing table rows.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
result_l3=llama_cpp.create_chat_completion(
messages,
repo_id='bartowski/Llama-3.2-1B-Instruct-GGUF',
repo_filename='*Q5_K_M.gguf',
)
)
t.add_computed_column(output_l3=t.result_l3.choices[0].message.content)
t.select(t.input, t.output, t.output_l3).collect()
```
Added 3 column values with 0 errors in 6.32 s (0.47 rows/s)
Added 3 column values with 0 errors in 0.03 s (113.79 rows/s)
Just for fun, let’s try running against a different system prompt with a
different persona.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
messages_teacher = [
{
'role': 'system',
'content': 'You are a patient school teacher. '
'Explain concepts simply and clearly.',
},
{'role': 'user', 'content': t.input},
]
t.add_computed_column(
result_teacher=llama_cpp.create_chat_completion(
messages_teacher,
repo_id='bartowski/Llama-3.2-1B-Instruct-GGUF',
repo_filename='*Q5_K_M.gguf',
)
)
t.add_computed_column(
output_teacher=t.result_teacher.choices[0].message.content
)
t.select(t.input, t.output_teacher).collect()
```
Added 3 column values with 0 errors in 7.70 s (0.39 rows/s)
Added 3 column values with 0 errors in 0.02 s (143.54 rows/s)
## Additional Resources
* [Pixeltable Documentation](https://docs.pixeltable.com/)
* [llama.cpp GitHub](https://github.com/ggerganov/llama.cpp)
# Working with Mistral AI in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-mistralai
Use Mistral chat, embedding, and code models from Pixeltable computed columns for European data residency and open-weight LLM pipelines.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Pixeltable’s Mistral AI integration enables you to access Mistral’s LLM
and other models via the Mistral AI API.
### Prerequisites
* A Mistral AI account with an API key
([https://console.mistral.ai/api-keys/](https://console.mistral.ai/api-keys/))
### Important notes
* Mistral AI usage may incur costs based on your Mistral AI plan.
* Be mindful of sensitive data and consider security measures when
integrating with external services.
First you’ll need to install required libraries and enter a Mistral AI
API key.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable mistralai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'MISTRAL_API_KEY' not in os.environ:
os.environ['MISTRAL_API_KEY'] = getpass.getpass('Mistral AI API Key:')
```
Now let’s create a Pixeltable directory to hold the tables for our demo.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Remove the 'mistralai_demo' directory and its contents, if it exists
pxt.drop_dir('mistralai_demo', force=True)
pxt.create_dir('mistralai_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'mistralai\_demo'.
## Messages
First, create a table with a column for your input data, then add
computed columns to hold the results from Mistral.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.mistralai import chat_completions
# Create a table in Pixeltable and add a computed column that calls Mistral AI
t = pxt.create_table('mistralai_demo/chat', {'input': pxt.String})
messages = [{'role': 'user', 'content': t.input}]
t.add_computed_column(
output=chat_completions(
messages=messages,
model='mistral-small-latest',
model_kwargs={
# Optional dict with parameters for the Mistral API
'max_tokens': 300,
'top_p': 0.9,
'temperature': 0.7,
},
)
)
```
Created table 'chat'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Parse the response into a new column
t.add_computed_column(response=t.output.choices[0].message.content)
```
Added 0 column values with 0 errors in 0.01 s
No rows affected.
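The expression `t.output.choices[0].message.content` walks the JSON returned by the chat endpoint. The same path on a plain Python dict looks like the sketch below. The payload is an abbreviated, hypothetical response in the common chat-completions shape, not real model output.

```python
# Abbreviated, hypothetical chat-completions payload.
output = {
    'model': 'mistral-small-latest',
    'choices': [
        {
            'index': 0,
            'message': {'role': 'assistant', 'content': 'Example answer.'},
            'finish_reason': 'stop',
        }
    ],
    'usage': {'prompt_tokens': 12, 'completion_tokens': 3},
}

# Equivalent of t.output.choices[0].message.content for a single row:
# first choice -> its message -> the text content.
response = output['choices'][0]['message']['content']
print(response)
```

Storing the full JSON in one column and the extracted text in another keeps metadata such as token usage available for later queries.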
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Start a conversation
t.insert(
[
{
'input': 'What three species of fish have the highest mercury content?'
}
]
)
t.select(t.input, t.response).show()
```
Inserted 1 row with 0 errors in 2.31 s (0.43 rows/s)
### Learn more
To learn more about advanced techniques like RAG operations in
Pixeltable, check out the [RAG Operations in
Pixeltable](/howto/use-cases/rag-operations)
tutorial.
If you have any questions, don’t hesitate to reach out.
# Working with Ollama in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-ollama
Run Llama, Mistral, and other open LLMs locally with Ollama from Pixeltable computed columns for private, offline chat and generation.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Ollama is a popular platform for local serving of LLMs. In this
tutorial, we’ll show how to integrate Ollama models into a Pixeltable
workflow.
## Install Ollama
You’ll need to have an Ollama server instance to query. There are
several ways to do this.
### Running on a local machine
If you’re running this notebook on your own machine (Windows, macOS, or
Linux), you can download Ollama from [https://ollama.com/download](https://ollama.com/download).
### Running on Google Colab
If you’re running on Colab, you can install Ollama by uncommenting
and running the following code.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# To install Ollama on colab, uncomment and run the following
# three lines (this will also work on a local Linux machine
# if you don't already have Ollama installed).
# !curl -fsSL https://ollama.com/install.sh | sh
# import subprocess
# ollama_process = subprocess.Popen(['ollama', 'serve'], stderr=subprocess.PIPE)
```
### Running on a remote Ollama server
If you have access to an Ollama server running remotely, you can
uncomment and run the following code, replacing the default URL with
the URL of your remote Ollama instance.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# To run the notebook against an instance of Ollama running on a
# remote server, uncomment the following lines and specify the URL.
# import os
# os.environ['OLLAMA_HOST'] = 'http://127.0.0.1:11434'
```
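For reference, the sketch below shows one way to resolve the server address from `OLLAMA_HOST`, falling back to a local default when the variable is unset. The `resolve_ollama_host` helper, the fallback URL, and the example remote address are assumptions for illustration. They are not part of Pixeltable or the Ollama client, and the sketch assumes the variable holds a full URL including the scheme.

```python
import os
from urllib.parse import urlparse

DEFAULT_OLLAMA_HOST = 'http://127.0.0.1:11434'  # assumed local default


def resolve_ollama_host(env=None):
    """Return (scheme, hostname, port) for the configured Ollama server."""
    env = os.environ if env is None else env
    url = urlparse(env.get('OLLAMA_HOST', DEFAULT_OLLAMA_HOST))
    return url.scheme, url.hostname, url.port


# Hypothetical remote server address, passed in explicitly for the demo.
remote = resolve_ollama_host({'OLLAMA_HOST': 'http://gpu-box.example:11434'})
local = resolve_ollama_host({})
print(remote)
print(local)
```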
Once you’ve completed the installation, run the following commands to
verify that it’s been successfully installed. This may result in an LLM
being downloaded, so it may take some time.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU ollama
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import ollama
ollama.pull('qwen3:0.6b')
ollama.generate('qwen3:0.6b', 'What is the capital of Missouri?')[
'response'
]
```
'The capital of Missouri is Jefferson City. Jefferson City was originally named after the French explorer Pierre-Jacques Houget and the American statesman Thomas Jefferson, who lived in this city from 1764 to 1805. It became the seat of government for most of Jefferson County when it was established in 1836. In more recent times, the name has changed several times due to various political changes and legal changes.'
## Install Pixeltable
Now, let’s install Pixeltable and create a table for the demo.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.ollama import chat
pxt.drop_dir('ollama_demo', force=True)
pxt.create_dir('ollama_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'ollama\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('ollama_demo/chat', {'input': pxt.String})
messages = [{'role': 'user', 'content': t.input}]
# Add a computed column that runs the model to generate responses
t.add_computed_column(
output=chat(
messages=messages,
model='qwen3:0.6b',
# These parameters are optional and can be used to tune model behavior:
options={'num_predict': 300, 'top_p': 0.9, 'temperature': 0.5},
)
)
```
Created table 'chat'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract the message content into a separate column
t.add_computed_column(response=t.output.message.content)
```
Added 0 column values with 0 errors in 0.01 s
No rows affected.
We can insert our input prompts into the table now. As always,
Pixeltable automatically updates the computed columns by calling the
relevant Ollama endpoint.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Start a conversation
t.insert(input='What are the most popular services for LLM inference?')
t.select(t.input, t.response).show()
```
Inserted 1 row with 0 errors in 1.28 s (0.78 rows/s)
### Learn More
To learn more about advanced techniques like RAG operations in
Pixeltable, check out the [RAG Operations in
Pixeltable](/howto/use-cases/rag-operations)
tutorial.
If you have any questions, don’t hesitate to reach out.
# Working with OpenAI in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-openai
Call GPT-4, embeddings, DALL-E, and Whisper from Pixeltable computed columns for chat, vision, image generation, and speech workflows.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Pixeltable’s OpenAI integration enables you to access OpenAI models via
the OpenAI API.
### Prerequisites
* An OpenAI account with an API key
([https://openai.com/index/openai-api/](https://openai.com/index/openai-api/))
### Important notes
* OpenAI usage may incur costs based on your OpenAI plan.
* Be mindful of sensitive data and consider security measures when
integrating with external services.
First you’ll need to install required libraries and enter your OpenAI
API key.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
os.environ['OPENAI_API_KEY'] = getpass.getpass(
'Enter your OpenAI API key:'
)
```
Now let’s create a Pixeltable directory to hold the tables for our demo.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Remove the 'openai_demo' directory and its contents, if it exists
pxt.drop_dir('openai_demo', force=True)
pxt.create_dir('openai_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Pixeltable dashboard available at: [http://localhost:22089](http://localhost:22089)
Created directory 'openai\_demo'.
## Chat completions
First, create a table with a column for your input data, then add
computed columns to hold the results from OpenAI.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import openai
# Create a table in Pixeltable and add a computed column that calls OpenAI
t = pxt.create_table('openai_demo/chat', {'input': pxt.String})
messages = [{'role': 'user', 'content': t.input}]
t.add_computed_column(
output=openai.chat_completions(
messages=messages,
model='gpt-4o-mini',
model_kwargs={
# Optional dict with parameters for the OpenAI API
'max_tokens': 300,
'top_p': 0.9,
'temperature': 0.7,
},
)
)
```
Created table 'chat'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Parse the response into a new column
t.add_computed_column(response=t.output.choices[0].message.content)
```
Added 0 column values with 0 errors in 0.02 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Start a conversation
t.insert(
[{'input': 'How many islands are in the Aleutian island chain?'}]
)
t.select(t.input, t.response).head()
```
Inserted 1 row with 0 errors in 2.17 s (0.46 rows/s)
## Responses API
The [Responses
API](https://developers.openai.com/api/docs/api-reference/responses) is
OpenAI’s recommended API for new projects. It offers built-in tools (web
search, file search, code interpreter), improved reasoning model
performance, lower costs, and cleaner semantics compared to Chat
Completions.
Key benefits:
* **`instructions` parameter**: Cleanly separates system-level guidance
from user input
* **`output_text` field**: Convenient access to the response text
* **Built-in tools**: Use `web_search`, `file_search`,
`code_interpreter` natively
* **Multi-turn via `previous_response_id`**: Simpler conversation state
management
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table and add a computed column using the Responses API
resp_t = pxt.create_table('openai_demo/responses', {'input': pxt.String})
messages = [{'role': 'user', 'content': resp_t.input}]
resp_t.add_computed_column(
output=openai.responses(
input=messages,
model='gpt-4o-mini',
model_kwargs={
'instructions': 'You are a helpful assistant. Be concise.',
'temperature': 0.7,
'max_output_tokens': 300,
},
)
)
# The Responses API provides a convenient 'output_text' field
resp_t.add_computed_column(response=resp_t.output.output_text)
```
Created table 'responses'.
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert data and view results
resp_t.insert([{'input': 'What are the three largest moons of Jupiter?'}])
resp_t.select(resp_t.input, resp_t.response).head()
```
Inserted 1 row with 0 errors in 1.22 s (0.82 rows/s)
## Embeddings
Note: The OpenAI Embeddings API is not available with free-tier API keys.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
emb_t = pxt.create_table('openai_demo/embeddings', {'input': pxt.String})
emb_t.add_computed_column(
embedding=openai.embeddings(
input=emb_t.input, model='text-embedding-3-small'
)
)
```
Created table 'embeddings'.
Added 0 column values with 0 errors in 0.00 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
emb_t.insert(
[{'input': 'OpenAI provides a variety of embeddings models.'}]
)
```
Inserted 1 row with 0 errors in 1.03 s (0.97 rows/s)
1 row inserted.
Created table 'images'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
image_t.insert(
[
{
'input': 'A giant Pixel floating in the open ocean in a sea of data'
}
]
)
```
Inserted 1 row with 0 errors in 12.21 s (0.08 rows/s)
1 row inserted.
'Allow me to illustrate. During the last 60 days, I have been at the task of constructing an administration. It has been a long and deliberate process. Some have counseled greater speed. Others have counseled more expedient tests. But I have been guided by the standard John Winthrop set before his shipmates on the flagship Arabella 331 years ago, as they too faced the task of building a new government on a perilous frontier. We must always consider, he said, that we shall be as a city upon a hill. The eyes of all peoples are upon us. Today the eyes of all people are truly upon us. And our governments, in every branch, at every level,'
### Learn more
To learn more about advanced techniques like RAG operations in
Pixeltable, check out the [RAG Operations in
Pixeltable](/howto/use-cases/rag-operations)
tutorial.
If you have any questions, don’t hesitate to reach out.
# Working with OpenRouter in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-openrouter
Access dozens of LLM providers through a single OpenRouter API from Pixeltable computed columns for chat, vision, and tool calling.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Pixeltable’s OpenRouter integration enables you to access multiple LLM
providers through a unified API via OpenRouter.
### Prerequisites
* An OpenRouter account with an API key ([https://openrouter.ai](https://openrouter.ai))
### Important notes
* OpenRouter usage may incur costs based on the models you use and your
usage volume.
* Be mindful of sensitive data and consider security measures when
integrating with external services.
First you’ll need to install required libraries and enter your
OpenRouter API key.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'OPENROUTER_API_KEY' not in os.environ:
    os.environ['OPENROUTER_API_KEY'] = getpass.getpass(
        'Enter your OpenRouter API key:'
    )
```
Now let’s create a Pixeltable directory to hold the tables for our demo.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Remove the 'openrouter_demo' directory and its contents, if it exists
pxt.drop_dir('openrouter_demo', force=True)
pxt.create_dir('openrouter_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'openrouter\_demo'.
## Chat completions
Create a table with columns for your input data and for the results
returned by OpenRouter.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import openrouter
# Create a table in Pixeltable and add a computed column that calls OpenRouter
t = pxt.create_table('openrouter_demo/chat', {'input': pxt.String})
messages = [{'role': 'user', 'content': t.input}]
t.add_computed_column(
output=openrouter.chat_completions(
messages=messages,
model='anthropic/claude-sonnet-4',
model_kwargs={
# Optional dict with parameters compatible with the model
'max_tokens': 300,
'temperature': 0.7,
},
)
)
```
Created table 'chat'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Parse the response into a new column
t.add_computed_column(response=t.output.choices[0].message.content)
```
Added 0 column values with 0 errors in 0.01 s
No rows affected.
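To make the JSON path in the cell above concrete, here is a plain-Python sketch of the same lookup. The `sample_output` dict is hand-written for illustration (its field names follow the chat completions response layout, and its values are made up), and `extract_content` is a hypothetical helper, not part of Pixeltable.

```python
# Hand-written stand-in for one value of the `output` column.
# The id and the answer text are invented for this sketch.
sample_output = {
    'id': 'chatcmpl-example',
    'choices': [
        {
            'index': 0,
            'message': {'role': 'assistant', 'content': 'Example answer text.'},
            'finish_reason': 'stop',
        }
    ],
}


def extract_content(output: dict) -> str:
    """Plain-Python equivalent of `t.output.choices[0].message.content`."""
    return output['choices'][0]['message']['content']


print(extract_content(sample_output))
```

In the table itself, the same path is written as a column expression, so it is evaluated for every row rather than for a single dict.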
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Start a conversation
t.insert(
[
{'input': 'How many species of felids have been classified?'},
{'input': 'Can you make me a coffee?'},
]
)
t.select(t.input, t.response).head()
```
Inserted 2 rows with 0 errors in 7.59 s (0.26 rows/s)
## Using different models
One of OpenRouter’s key benefits is easy access to models from multiple
providers. Let’s create a table that compares responses from Anthropic
Claude, OpenAI GPT-4o mini, and Meta Llama.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table to compare different models
compare_t = pxt.create_table(
'openrouter_demo/compare_models', {'prompt': pxt.String}
)
messages = [{'role': 'user', 'content': compare_t.prompt}]
# Add responses from different models
compare_t.add_computed_column(
claude=openrouter.chat_completions(
messages=messages,
model='anthropic/claude-sonnet-4',
model_kwargs={'max_tokens': 150},
)
.choices[0]
.message.content
)
compare_t.add_computed_column(
gpt4=openrouter.chat_completions(
messages=messages,
model='openai/gpt-4o-mini',
model_kwargs={'max_tokens': 150},
)
.choices[0]
.message.content
)
compare_t.add_computed_column(
llama=openrouter.chat_completions(
messages=messages,
model='meta-llama/llama-3.3-70b-instruct',
model_kwargs={'max_tokens': 150},
)
.choices[0]
.message.content
)
```
Created table 'compare\_models'.
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert a prompt and compare responses
compare_t.insert(
[{'prompt': 'Explain quantum entanglement in one sentence.'}]
)
compare_t.select(
compare_t.prompt, compare_t.claude, compare_t.gpt4, compare_t.llama
).head()
```
Inserted 1 row with 0 errors in 1.27 s (0.79 rows/s)
Created table 'routing'.
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
routing_t.insert([{'input': 'What are the primary colors?'}])
routing_t.select(routing_t.input, routing_t.response).head()
```
Inserted 1 row with 0 errors in 3.97 s (0.25 rows/s)
## Advanced Features: Context Window Optimization
OpenRouter supports transforms like ‘middle-out’ to optimize handling of
long contexts.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table with transforms for long context optimization
transform_t = pxt.create_table(
'openrouter_demo/transforms', {'long_context': pxt.String}
)
messages = [{'role': 'user', 'content': transform_t.long_context}]
transform_t.add_computed_column(
output=openrouter.chat_completions(
messages=messages,
model='openai/gpt-4o-mini',
model_kwargs={'max_tokens': 200},
# Apply middle-out transform for better long context handling
transforms=['middle-out'],
)
)
transform_t.add_computed_column(
response=transform_t.output.choices[0].message.content
)
```
Created table 'transforms'.
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Example with longer context
long_text = """
Artificial intelligence has transformed many industries. Machine learning algorithms
can now detect patterns in data that humans might miss. Deep learning has revolutionized
computer vision and natural language processing. The future of AI looks promising with
developments in areas like reinforcement learning and generative models.
Question: What are the main AI developments mentioned?
"""
transform_t.insert([{'long_context': long_text}])
transform_t.select(transform_t.response).head()
```
Inserted 1 row with 0 errors in 1.82 s (0.55 rows/s)
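As a rough intuition for what a "middle-out" style transform does, the sketch below drops entries from the middle of a message history until it fits a size budget, keeping the beginning and the end. This is only a conceptual illustration under simplifying assumptions (a character budget instead of tokens, whole messages dropped); it is not OpenRouter's actual algorithm, and `middle_out` is a hypothetical helper name.

```python
def middle_out(messages: list[str], max_chars: int) -> list[str]:
    """Drop messages from the middle until the total length fits max_chars.

    The first and last messages are always kept.
    """
    kept = list(messages)
    while len(kept) > 2 and sum(len(m) for m in kept) > max_chars:
        # Remove the entry closest to the middle of what is left
        del kept[len(kept) // 2]
    return kept


history = ['intro', 'detail-1', 'detail-2', 'detail-3', 'question']
print(middle_out(history, max_chars=25))
```

With Pixeltable you do not implement this yourself: passing `transforms=['middle-out']` asks OpenRouter to apply its own version on the server side.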
### Learn more
To learn more about advanced techniques like RAG operations in
Pixeltable, check out the [RAG Operations in
Pixeltable](/howto/use-cases/rag-operations)
tutorial.
For more information about OpenRouter’s features and available models,
visit:
* [OpenRouter Documentation](https://openrouter.ai/docs)
* [Available Models](https://openrouter.ai/models)
If you have any questions, don’t hesitate to reach out.
# Working with Pydantic in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-pydantic
Define Pydantic schemas in Pixeltable to extract validated structured outputs from LLMs with type checking, defaults, and nested object models.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Pixeltable’s Pydantic integration enables type-safe data insertion using
Pydantic models. Instead of inserting raw dictionaries, you can define
structured models with validation and insert them directly into
Pixeltable tables.
### Benefits
* **Type Safety**: Pydantic validates data before insertion
* **IDE Support**: Autocomplete and type hints for your data
* **Self-Documenting**: Models serve as schema documentation
* **Validation**: Built-in data validation via Pydantic
### Important notes
* Pydantic model fields map to Pixeltable columns by name
* Computed columns are automatically skipped during insertion
* Nested Pydantic models map to JSON columns
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable pydantic
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
pxt.drop_dir('pydantic_demo', force=True)
pxt.create_dir('pydantic_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'pydantic\_demo'.
## Basic usage: scalar types
Define a Pydantic model with fields that match your table columns.
Pixeltable automatically maps Python types to Pixeltable types:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import datetime
import pydantic
from enum import Enum
from typing import Literal
# Define an enum for product categories
class Category(Enum):
    ELECTRONICS = 1
    CLOTHING = 2
    BOOKS = 3


# Define a Pydantic model
class Product(pydantic.BaseModel):
    name: str
    price: float
    in_stock: bool
    category: Category
    rating: Literal['poor', 'average', 'good', 'excellent']
    created_at: datetime.datetime
    description: str | None = None  # Optional field
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table with matching schema
products = pxt.create_table(
'pydantic_demo/products',
{
'name': pxt.Required[pxt.String],
'price': pxt.Required[pxt.Float],
'in_stock': pxt.Required[pxt.Bool],
'category': pxt.Required[pxt.Int], # Enum values are integers
'rating': pxt.Required[pxt.String], # Literal values
'created_at': pxt.Required[pxt.Timestamp],
'description': pxt.String, # Nullable
},
)
```
Inserted 2 rows with 0 errors in 0.01 s (227.55 rows/s)
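The comments in the `products` schema note that enum values are integers and that literal values are strings. A small standard-library sketch shows where those column types come from; it reuses the `Category` enum and the rating literal from the model above, and the `Rating` alias is introduced here only for illustration.

```python
from enum import Enum
from typing import Literal, get_args


class Category(Enum):
    ELECTRONICS = 1
    CLOTHING = 2
    BOOKS = 3


# Alias introduced for this sketch; the model above inlines the Literal
Rating = Literal['poor', 'average', 'good', 'excellent']

# Enum members carry integer values, matching the pxt.Int column
category_values = [c.value for c in Category]

# Literal options are plain strings, matching the pxt.String column
rating_options = get_args(Rating)

print(category_values)
print(rating_options)
```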
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Query nested JSON fields using Pixeltable's JSON path syntax
customers.select(
customers.name,
email=customers.contact.email,
city=customers.contact.address.city,
).collect()
```
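The JSON path expressions in the query above behave much like chained dictionary lookups. Here is a plain-Python sketch on a made-up row; the field names `contact`, `email`, `address`, and `city` come from the query, while the values and the `get_path` helper are hypothetical. Returning `None` for a missing key is a choice made for this sketch.

```python
# Made-up row: `contact` holds the dict form of a nested Pydantic model
row = {
    'name': 'Ada',
    'contact': {
        'email': 'ada@example.com',
        'address': {'city': 'London'},
    },
}


def get_path(value, *keys):
    """Follow a chain of keys, returning None if any step is missing."""
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


print(get_path(row, 'contact', 'email'))
print(get_path(row, 'contact', 'address', 'city'))
print(get_path(row, 'contact', 'phone'))
```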
## Media files with Pydantic
For media columns (Image, Video, Audio, Document), use `str` or `Path`
fields in your Pydantic model to specify file paths or URLs.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pathlib import Path
class ImageRecord(pydantic.BaseModel):
    title: str
    image_url: str  # URLs or file paths as strings
    tags: list[str]


# Create table with Image column
images = pxt.create_table(
'pydantic_demo/images',
{
'title': pxt.Required[pxt.String],
'image_url': pxt.Required[pxt.Image], # Media column
'tags': pxt.Required[pxt.Json],
},
)
```
Inserted 1 row with 0 errors in 0.27 s (3.74 rows/s)
## Working with Computed Columns
Pydantic models work seamlessly with computed columns. Leave the
computed columns out of your model; Pixeltable fills them in
automatically during insertion.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Model only includes input columns
class Article(pydantic.BaseModel):
    title: str
    content: str


# Create table with computed column
articles = pxt.create_table(
'pydantic_demo/articles',
{
'title': pxt.Required[pxt.String],
'content': pxt.Required[pxt.String],
},
)
# Add a computed column
articles.add_computed_column(
word_count=articles.content.apply(
lambda x: len(x.split()), col_type=pxt.Int
)
)
```
Created table 'articles'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert data - computed columns are automatically calculated
article_data = [
Article(
title='Getting Started with Pixeltable',
content='Pixeltable is a powerful tool for building AI applications. It provides automatic versioning and incremental computation.',
),
Article(
title='Type Safety in Python',
content='Using Pydantic with Pixeltable provides type safety and validation for your data pipelines.',
),
]
articles.insert(article_data)
articles.select(articles.title, articles.word_count).collect()
```
Inserted 2 rows with 0 errors in 0.01 s (186.43 rows/s)
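To check what the `word_count` column should contain, the same expression can be run directly in Python on the two article bodies inserted above. This is just the lambda from the computed column wrapped in a named function (`word_count` here is a local helper, not a Pixeltable API).

```python
def word_count(text: str) -> int:
    """Same logic as the lambda in the computed column: split on whitespace."""
    return len(text.split())


contents = [
    'Pixeltable is a powerful tool for building AI applications. '
    'It provides automatic versioning and incremental computation.',
    'Using Pydantic with Pixeltable provides type safety and validation '
    'for your data pipelines.',
]

print([word_count(c) for c in contents])
```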
## Optional Fields and Defaults
Pydantic’s optional fields with defaults work naturally with
Pixeltable’s nullable columns.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
class Task(pydantic.BaseModel):
    title: str
    priority: int = 1  # Default value
    due_date: datetime.datetime | None = None  # Optional
    notes: str | None = None  # Optional


tasks = pxt.create_table(
'pydantic_demo/tasks',
{
'title': pxt.Required[pxt.String],
'priority': pxt.Required[pxt.Int],
'due_date': pxt.Timestamp, # Nullable
'notes': pxt.String, # Nullable
},
)
# Insert with and without optional fields
tasks.insert(
[
Task(
title='Complete project',
priority=3,
due_date=datetime.datetime(2025, 12, 31),
),
Task(
title='Review code'
), # Uses default priority=1, None for optionals
Task(title='Write docs', notes='Include examples'),
]
)
tasks.collect()
```
Created table 'tasks'.
Inserted 3 rows with 0 errors in 0.01 s (408.88 rows/s)
## Type Mapping Reference
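The examples on this page use the following mappings between
Pydantic/Python types and Pixeltable types:

| Pydantic/Python type | Pixeltable type |
| --- | --- |
| `str` | `pxt.String` |
| `int` | `pxt.Int` |
| `float` | `pxt.Float` |
| `bool` | `pxt.Bool` |
| `datetime.datetime` | `pxt.Timestamp` |
| `Enum` with integer values | `pxt.Int` |
| `Literal[...]` of strings | `pxt.String` |
| `list[str]` | `pxt.Json` |
| Nested Pydantic model | `pxt.Json` |
| `str` or `Path` holding a file path or URL | Media types such as `pxt.Image` |
| Optional field (`None` allowed) | Nullable column (no `pxt.Required`) |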
## Learn More
For more information about working with Pydantic in Pixeltable:
* [Pixeltable Documentation](https://docs.pixeltable.com)
* [Pydantic Documentation](https://docs.pydantic.dev)
* [Type Safety Blog
Post](https://www.pixeltable.com/blog/pydantic-integration-type-safety)
If you have any questions, don’t hesitate to reach out on
[Discord](https://discord.com/invite/QPyqFYx2UN).
# Working with Replicate in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-replicate
Run thousands of open-source models hosted on Replicate from Pixeltable computed columns for image, video, audio, and language tasks.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Pixeltable’s Replicate integration enables you to access Replicate’s
models via the Replicate API.
### Prerequisites
* A Replicate account with an API token.
### Important notes
* Replicate usage may incur costs based on your Replicate plan.
* Be mindful of sensitive data and consider security measures when
integrating with external services.
First you’ll need to install required libraries and enter a Replicate
API token.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable replicate
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'REPLICATE_API_TOKEN' not in os.environ:
    os.environ['REPLICATE_API_TOKEN'] = getpass.getpass(
        'Replicate API Token:'
    )
```
Now let’s create a Pixeltable directory to hold the tables for our demo.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Remove the `replicate_demo` directory and its contents, if it exists
pxt.drop_dir('replicate_demo', force=True)
pxt.create_dir('replicate_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'replicate\_demo'.
## Chat completions
Create a table with columns for your input data and for the results
returned by Replicate.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.replicate import run
# Create a table in Pixeltable and pick a model hosted on Replicate with some parameters
t = pxt.create_table('replicate_demo/chat', {'prompt': pxt.String})
input = {
'system_prompt': 'You are a helpful assistant.',
'prompt': t.prompt,
# These parameters are optional and can be used to tune model behavior:
'max_tokens': 300,
'top_p': 0.9,
'temperature': 0.8,
}
t.add_computed_column(
output=run(input, ref='meta/meta-llama-3-8b-instruct')
)
```
Created table 'chat'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Parse the response into a new column
t.add_computed_column(response=pxt.functions.string.join('', t.output))
```
Added 0 column values with 0 errors in 0.02 s
No rows affected.
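The `join` call above is needed because the model output is treated as a list of text fragments rather than a single string. A minimal plain-Python sketch of that step, assuming the output is a list of strings (the fragments below are made up):

```python
# Assumed shape of one `output` value: a list of text fragments
chunks = ['Brazil', ' nuts', ' are', ' one', ' example', '.']

# Equivalent of joining with an empty separator in the computed column
response = ''.join(chunks)

print(response)
```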
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Start a conversation
t.insert([{'prompt': 'What foods are rich in selenium?'}])
t.select(t.prompt, t.response).show()
```
Inserted 1 row with 0 errors in 4.45 s (0.22 rows/s)
## Image generation
Here’s an example that shows how to use Replicate’s image generation
models with Pixeltable. We’ll use the FLUX Schnell model.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('replicate_demo/images', {'prompt': pxt.String})
input = {'prompt': t.prompt, 'go_fast': True, 'megapixels': '1'}
t.add_computed_column(
output=run(input, ref='black-forest-labs/flux-schnell')
)
```
Created table 'images'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.insert(
[
{
'prompt': 'Draw a pencil sketch of a friendly dinosaur playing tennis in a cornfield.'
}
]
)
```
Inserted 1 row with 0 errors in 0.99 s (1.01 rows/s)
1 row inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.prompt, t.output).collect()
```
We see that Replicate returns our image as an array containing a single
URL. To turn it into an actual image, we cast the string to type
`pxt.Image` in a new computed column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(image=t.output[0].astype(pxt.Image))
t.select(t.image).collect()
```
Added 1 column value with 0 errors in 0.02 s (53.36 rows/s)
### Learn more
To learn more about advanced techniques like RAG operations in
Pixeltable, check out the [RAG Operations in
Pixeltable](/howto/use-cases/rag-operations)
tutorial.
If you have any questions, don’t hesitate to reach out.
# Working with Reve in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-reve
Generate images with Reve from Pixeltable using text and reference prompts through declarative computed columns and batch inference.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Pixeltable’s Reve integration lets you call Reve’s `create`, `edit`, and
`remix` endpoints directly from tables so you can iterate on visuals
without leaving your data workflows.
## What is Reve?
Reve is an image generation/editing service with three API endpoints:
* **`create`**: Generate new images from text prompts
* **`edit`**: Edit existing images with natural language instructions
* **`remix`**: Blend multiple images together
### Documentation
* [Pixeltable Reve
Functions](/sdk/latest/reve#module-pixeltable-functions-reve)
* [Reve API Reference](https://api.reve.com/console/docs)
## Prerequisites
* A Reve account with an API key (see
[https://api.reve.com/](https://api.reve.com/) for instructions)
**Important:** Reve API calls consume credits based on your plan—monitor
your usage to avoid unexpected charges. Images sent to Reve are
processed on Reve’s servers outside your environment, so do not upload
sensitive, private, or confidential images.
We’ll start by installing Pixeltable, configuring your API key, creating
a directory, and setting up a table. Then we’ll walk through each Reve
endpoint—`create`, `edit`, and `remix`—one at a time.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'REVE_API_KEY' not in os.environ:
    os.environ['REVE_API_KEY'] = getpass.getpass('Reve API Key: ')
```
To read more about working with API keys in Pixeltable, see
[Configuration](/platform/configuration).
## Setup
Create a Pixeltable directory to keep the tables for this demo separate
from anything else you’re working on.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
pxt.create_dir('reve_demo', if_exists='replace_force')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/alison-pxt/.pixeltable/pgdata
Created directory 'reve\_demo'.
We’ll create a Pixeltable table that starts with a prompt and a source
image, and ends with a final scene. The table we’ll build up to will
require two inputs per row:
1. A prompt for creating a background scene image. We’ll use this
prompt for Reve to create a scene with `reve.create()`.
2. An existing source image. We’ll ask Reve to edit this image with
`reve.edit()`, and then it will be ready as the foreground image.
Finally, we’ll remix the background scene image we made in step 1 by
combining it with the foreground image we made in step 2 with
`reve.remix()`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t = pxt.create_table(
'reve_demo/solarpunk_scenes',
{'prompt': pxt.String, 'source_image': pxt.Image},
)
```
Created table 'solarpunk\_scenes'.
To read more about creating tables, see [Tables and Data
Operations](/tutorials/tables-and-data-operations).
You can look at the schema for this table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.describe()
```
Now, we’ll insert values for our first row. We need to provide a text
prompt for the `reve.create()` function and a source image for the
`reve.edit()` function.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
scene_prompt = (
    'Create a scene of lush solarpunk metropolis in the desert '
    # Trailing space keeps the two sentences from running together
    'with urban agriculture and an oasis theme. '
    'It should not look like an office park, corporate campus, or an outdoor mall.'
)
image_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg'
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.insert([{'prompt': scene_prompt, 'source_image': image_url}])
```
Inserted 1 row with 0 errors in 0.03 s (39.10 rows/s)
1 row inserted.
To read more about inserting data, see [Bringing
Data](/howto/cookbooks/data/data-import-csv).
And we can peek at our starter table with a single row:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.collect()
```
## Generate new imagery with Reve Create
Use `reve.create()` when you want Reve to synthesize an entirely new
image from a prompt. In Pixeltable, we place this function call inside a
computed column. We’ll generate fresh imagery from the prompt first in
this section. Feel free to change the prompt. Here we ask for a
solarpunk oasis city.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import reve
spunk_t.add_computed_column(
new_image=reve.create(spunk_t.prompt), if_exists='replace'
)
```
Added 1 column value with 0 errors in 6.16 s (0.16 rows/s)
1 row updated.
To read more about computed columns in Pixeltable, see [Computed
Columns](/tutorials/computed-columns).
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.select(spunk_t.prompt, spunk_t.new_image).collect()
```
By default, Pixeltable saves all generated media outputs to a media
directory. We can see the file path by using the `fileurl` property.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.select(spunk_t.new_image.fileurl).collect()
```
### Add Reve parameters
All Reve functions accept optional parameters to customize the output:
* `aspect_ratio`: desired image aspect ratio, e.g. ‘3:2’, ‘16:9’, ‘1:1’,
etc. (available for `reve.create()` and `reve.remix()`)
* `version`: specific model version to use (optional; defaults to latest
if not specified). Available for all Reve functions (`reve.create()`,
`reve.edit()`, and `reve.remix()`)
This adds a second image column using the same prompt that renders in a
square frame.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.add_computed_column(
new_image_sq=reve.create(spunk_t.prompt, aspect_ratio='1:1'),
if_exists='replace',
)
```
Added 1 column value with 0 errors in 6.22 s (0.16 rows/s)
1 row updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.select(
spunk_t.prompt, spunk_t.new_image, spunk_t.new_image_sq
).collect()
```
To read more about `reve.create()`, see [reve.create
UDF](/sdk/latest/reve#udf-create).
## Edit an existing photo with Reve Edit
`reve.edit()` takes an existing image plus natural-language instructions
and returns an edited version. We already have a `source_image` column
in our table from the initial setup.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.select(spunk_t.source_image).collect()
```
We can now add a computed column that calls `reve.edit()` to modify the
source image. To read more about `reve.edit()`, see [reve.edit
UDF](/sdk/latest/reve#udf-edit).
Unlike the `reve.create()` example, where the prompt is stored in its
own column, this editing prompt is written directly into the computed
column definition. The same prompt is therefore applied to every new row
inserted into this table. We phrase it to reflect this table’s solarpunk
theme but otherwise keep it general, so we don’t need to supply a
specific prompt for each new row.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Uncomment the below line to use a Reve function, if you have not already done so
# from pixeltable.functions import reve
spunk_t.add_computed_column(
edited_subject=reve.edit(
spunk_t.source_image,
'Remove any existing background. Focus on the closest person in the foreground. '
'Keep the person and props, but make the lighting and colors vibrant and fit with a solarpunk theme. '
'Make the background behind the person blank.',
),
if_exists='replace',
)
```
Added 1 column value with 0 errors in 16.54 s (0.06 rows/s)
1 row updated.
We can use `collect()` to see the new image:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.select(spunk_t.source_image, spunk_t.edited_subject).collect()
```
## Remix multiple references with Reve Remix
`reve.remix()` blends multiple reference images. Inside the prompt
string, you reference each image with a numbered `<img>` tag:
* `<img>0</img>` refers to `images[0]`
* `<img>1</img>` refers to `images[1]`
* etc.
You can optionally specify `aspect_ratio` and `version` parameters (both
default to latest/auto if not specified).
In the next cell we place the edited subject from `<img>0</img>` (the
first entry in the images list) into the scene from `<img>1</img>` (the
second entry).
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Uncomment the below line to use a Reve function, if you have not already done so
# from pixeltable.functions import reve
spunk_t.add_computed_column(
    solarpunk_remix=reve.remix(
        'Place the person in <img>0</img> in the foreground of the scene from <img>1</img>. '
        'Make the background clear and detailed so it feels like a complete "day in the life" in solarpunk city scene.',
        images=[spunk_t.edited_subject, spunk_t.new_image],
        aspect_ratio='16:9',
    ),
    if_exists='replace',
)
```
Added 1 column value with 0 errors in 18.58 s (0.05 rows/s)
1 row updated.
To read more about `reve.remix()`, see [reve.remix
UDF](/sdk/latest/reve#udf-remix).
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.select(spunk_t.solarpunk_remix).collect()
```
## Insert a new row
So far, we have been building up our table schema with a single row. Now
we’ll insert a new row, with two fresh input values:
1. A text prompt to create the scene image with `reve.create()` and
2. A source image to edit with `reve.edit()` and remix into that scene
with `reve.remix()`.
Pixeltable will then automatically make the desired Reve API calls and
populate the computed columns.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.insert(
[
{
'prompt': 'Create an indoor tennis court scene, with clay courts inside a lush solarpunk greenhouse filled with bougainvillea, terraced gardens, and an oasis theme.',
'source_image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000885.jpg',
}
]
)
```
Inserted 1 row with 0 errors in 34.30 s (0.03 rows/s)
1 row inserted.
Because Pixeltable updates incrementally, this `insert()` computes
values only for the computed columns of the new row; the images we
already generated for the first row are not regenerated. For example,
here are both source images alongside their edited versions:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.select(spunk_t.source_image, spunk_t.edited_subject).collect()
```
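The incremental behavior described above can be pictured with a small plain-Python model: when rows are added, a derived value is computed only for rows that do not have one yet. This is a conceptual sketch, not how Pixeltable is implemented; `insert_rows`, `fake_edit`, and the file names are all hypothetical.

```python
from typing import Callable


def insert_rows(
    table: list[dict], new_rows: list[dict], compute: Callable[[dict], str]
) -> int:
    """Append rows, then fill `edited` only where it is missing.

    Returns how many times `compute` was called.
    """
    table.extend(dict(r) for r in new_rows)
    calls = 0
    for row in table:
        if 'edited' not in row:
            row['edited'] = compute(row)
            calls += 1
    return calls


calls_log: list[str] = []


def fake_edit(row: dict) -> str:
    """Stand-in for an expensive API call; records which rows it saw."""
    calls_log.append(row['source_image'])
    return f"edited:{row['source_image']}"


table: list[dict] = []
first = insert_rows(table, [{'source_image': 'a.jpg'}], fake_edit)
second = insert_rows(table, [{'source_image': 'b.jpg'}], fake_edit)

# The second insert only processes the new row
print(first, second, calls_log)
```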
Here are our two remixed images created by Reve:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.select(spunk_t.solarpunk_remix).collect()
```
All together, we created a new scene image, edited an existing image of
a person, then remixed both together to reimagine an existing person in
our new scene.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.select(
spunk_t.new_image, spunk_t.edited_subject, spunk_t.solarpunk_remix
).collect()
```
## Review Reve in Pixeltable
Below is a quick recap of how each Reve function maps inputs to outputs
inside Pixeltable tables. Each function reads input parameters and
writes its results into computed columns.
### Reve Create
* **Input parameter:** A prompt inserted as a row inside a Pixeltable table
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.select(
spunk_t.prompt, spunk_t.new_image, spunk_t.new_image_sq
).collect()
```
### Reve Edit
* **Input parameter:** A source image of type `pxt.Image`
* **Usage reminder:** The edit instructions live inline inside the
`add_computed_column()` call
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.select(spunk_t.source_image, spunk_t.edited_subject).collect()
```
### Reve Remix
* **Input parameters:** We started with two image columns
* **How the prompt references them:**
  * `images=[my_table.image00, my_table.image01]`
  * Inside the prompt, `<img>0</img>` points at `images[0]` and
    `<img>1</img>` points at `images[1]`
* **Usage reminder:** Always keep the placeholders and the order of the
  `images` list in sync; add more `<img>n</img>` tags if you pass more
  reference images.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
spunk_t.select(
spunk_t.new_image, spunk_t.edited_subject, spunk_t.solarpunk_remix
).collect()
```
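One way to keep the prompt placeholders and the `images` list in sync is to check the prompt before using it. The sketch below assumes the placeholders are written as `<img>n</img>` tags; `check_placeholders` is a hypothetical helper and is not part of Pixeltable or Reve.

```python
import re


def check_placeholders(prompt: str, num_images: int) -> list[int]:
    """Return the image indices referenced in the prompt.

    Raises ValueError if the prompt references an index that has no
    corresponding entry in the images list.
    """
    indices = [int(m) for m in re.findall(r'<img>(\d+)</img>', prompt)]
    out_of_range = [i for i in indices if i >= num_images]
    if out_of_range:
        raise ValueError(f'prompt references missing images: {out_of_range}')
    return indices


prompt = 'Place the person from <img>1</img> into the scene from <img>0</img>.'
referenced = check_placeholders(prompt, num_images=2)
print(referenced)
```

Running the same check with `num_images=1` raises a `ValueError`, which flags the mismatch before any API call is made.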
## Learn more
* Reve API reference:
[https://api.reve.com/console/docs](https://api.reve.com/console/docs)
* Pixeltable documentation:
[https://docs.pixeltable.com/sdk/latest/reve#module-pixeltable-functions-reve](/sdk/latest/reve#module-pixeltable-functions-reve)
If you build something with Reve, let us know!
# Working with RunwayML in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-runwayml
Generate and edit AI videos with Runway from Pixeltable using text-to-video, image-to-video, and motion control through computed columns.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Pixeltable’s RunwayML integration lets you generate and transform images
and videos using RunwayML’s Gen-4.5, Veo, and other models, directly
from your Pixeltable tables.
## What is RunwayML?
[RunwayML](https://runwayml.com/) is an AI creative platform offering
state-of-the-art generative models for images and video. Pixeltable
wraps four endpoints:
* **`text_to_image`**: Generate images from text prompts with optional
reference images
* **`text_to_video`**: Generate videos from text prompts
* **`image_to_video`**: Animate a still image into a video
* **`video_to_video`**: Transform an existing video with text guidance
### Documentation
* [Pixeltable RunwayML SDK
Reference](/sdk/latest/runwayml)
* [RunwayML API Reference](https://docs.dev.runwayml.com/api/)
## Prerequisites
1. A RunwayML account with an API secret from
[RunwayML](https://app.runwayml.com/)
2. `pip install runwayml`
**Important:** RunwayML API calls consume credits based on your plan.
Image and video generation can be expensive — monitor your usage to
avoid unexpected charges.
## Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable runwayml
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'RUNWAYML_API_SECRET' not in os.environ:
    os.environ['RUNWAYML_API_SECRET'] = getpass.getpass(
        'RunwayML API Secret: '
    )
```
To read more about working with API keys in Pixeltable, see
[Configuration](/platform/configuration).
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import runwayml
pxt.drop_dir('runwayml_demo', force=True)
pxt.create_dir('runwayml_demo')
```
Created directory 'runwayml\_demo'.
## Text-to-Video Generation
Generate videos from text prompts. The `text_to_video` function returns
a JSON response containing the output video URL.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t2v = pxt.create_table(
'runwayml_demo.text_to_video', {'prompt': pxt.String}
)
t2v.add_computed_column(
response=runwayml.text_to_video(
t2v.prompt, model='gen4.5', ratio='1280:720', duration=5
)
)
# Extract the video URL from the response
t2v.add_computed_column(video=t2v.response['output'][0].astype(pxt.Video))
```
Created table 'text\_to\_video'.
Added 0 column values with 0 errors in 0.00 s
Added 0 column values with 0 errors in 0.00 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t2v.insert(
[
{
'prompt': 'A cinematic aerial shot of a coastal city at golden hour, waves crashing against the shore'
}
]
)
```
Inserted 1 row with 0 errors in 105.52 s (0.01 rows/s)
1 row inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t2v.select(t2v.prompt, t2v.video).collect()
```
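The `response['output'][0]` expression used above indexes into the JSON response. A plain-Python sketch of the same lookup follows; the response shown is hypothetical (the `id` field and the URL are made up, and only the `output` list is implied by the example above), and `first_output_url` is an illustrative helper.

```python
# Hypothetical response; only the 'output' list is implied by the example above.
response = {
    'id': 'task-123',  # made-up field, for illustration only
    'output': ['https://example.com/generated/video.mp4'],  # placeholder URL
}


def first_output_url(resp: dict) -> str:
    """Return the first output URL, or raise if the response has none."""
    outputs = resp.get('output') or []
    if not outputs:
        raise ValueError('response contains no output URLs')
    return outputs[0]


video_url = first_output_url(response)
print(video_url)
```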
## Image-to-Video Generation
Animate a still image into a video. You can optionally add a text prompt
to guide the motion.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
i2v = pxt.create_table(
'runwayml_demo.image_to_video',
{'image': pxt.Image, 'prompt': pxt.String},
)
i2v.add_computed_column(
response=runwayml.image_to_video(
i2v.image,
model='gen4.5',
ratio='1280:720',
prompt_text=i2v.prompt,
duration=5,
)
)
i2v.add_computed_column(video=i2v.response['output'][0].astype(pxt.Video))
```
Created table 'image\_to\_video'.
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
i2v.insert(
[
{
'image': 'https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/640px-Camponotus_flavomarginatus_ant.jpg',
'prompt': 'The ant slowly walks across the leaf',
}
]
)
```
Inserted 1 row with 0 errors in 145.73 s (0.01 rows/s)
1 row inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
i2v.select(i2v.image, i2v.video).collect()
```
## Text-to-Image Generation
Generate images from text prompts with reference images. The
`text_to_image` function requires at least one reference image.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t2i = pxt.create_table(
'runwayml_demo.text_to_image',
{'prompt': pxt.String, 'ref_image': pxt.Image},
)
t2i.add_computed_column(
response=runwayml.text_to_image(
t2i.prompt, [t2i.ref_image], model='gen4_image', ratio='1280:720'
)
)
# The response contains a list of output image URLs
t2i.add_computed_column(image=t2i.response['output'][0].astype(pxt.Image))
```
Created table 'text\_to\_image'.
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t2i.insert(
[
{
'prompt': 'A photorealistic painting in the same style as the reference',
'ref_image': 'https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa%2C_by_Leonardo_da_Vinci%2C_from_C2RMF_retouched.jpg/800px-Mona_Lisa%2C_by_Leonardo_da_Vinci%2C_from_C2RMF_retouched.jpg',
}
]
)
```
Inserted 1 row with 0 errors in 21.70 s (0.05 rows/s)
1 row inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t2i.select(t2i.ref_image, t2i.image).collect()
```
## Video-to-Video Transformation
Transform an existing video with text guidance. Note that
`video_to_video` requires a publicly accessible HTTPS URL for the input
video (not a local file).
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v2v = pxt.create_table(
'runwayml_demo.video_to_video',
{'video_url': pxt.String, 'style_prompt': pxt.String},
)
v2v.add_computed_column(
response=runwayml.video_to_video(
v2v.video_url,
v2v.style_prompt,
model='gen4_aleph',
ratio='1280:720',
)
)
v2v.add_computed_column(video=v2v.response['output'][0].astype(pxt.Video))
```
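Because the input must be an HTTPS URL rather than a local path, it can help to validate values before inserting them. This is a minimal sketch using only the standard library; `is_https_url` is a hypothetical helper, and it checks the URL's form only, not whether the video is actually publicly reachable.

```python
from urllib.parse import urlparse


def is_https_url(value: str) -> bool:
    """True if the value is an absolute URL with the https scheme and a host."""
    parsed = urlparse(value)
    return parsed.scheme == 'https' and bool(parsed.netloc)


candidates = [
    'https://example.com/clip.mp4',  # placeholder URL
    'http://example.com/clip.mp4',
    '/tmp/clip.mp4',
]
valid = [c for c in candidates if is_https_url(c)]
print(valid)
```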
## Using `model_kwargs` for Advanced Parameters
All RunwayML functions accept an optional `model_kwargs` parameter for
passing additional API parameters not exposed as explicit arguments.
Refer to the [RunwayML API docs](https://docs.dev.runwayml.com/api/) for
the full list of supported parameters per endpoint.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Example: pass additional parameters via model_kwargs
advanced = pxt.create_table(
'runwayml_demo.advanced', {'prompt': pxt.String}
)
advanced.add_computed_column(
response=runwayml.text_to_video(
advanced.prompt,
model='gen4.5',
ratio='1280:720',
duration=5,
model_kwargs={'audio': True, 'seed': 42},
)
)
```
## Learn More
* [RunwayML SDK
Reference](/sdk/latest/runwayml) — Full API
details for all RunwayML functions
* [RunwayML API Documentation](https://docs.dev.runwayml.com/api/) —
Official RunwayML API reference
* [Working with
fal.ai](/howto/providers/working-with-fal)
— Another image/video generation integration
* [Working with
Reve](/howto/providers/working-with-reve) —
AI image generation and editing
# Working with Tigris in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-tigris
Store and serve Pixeltable media files from Tigris globally distributed S3-compatible object storage for low-latency multimodal apps.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
This tutorial demonstrates how to configure Pixeltable to use
[Tigris](https://tigrisdata.com) for storage. This lets you store
any number of images in Tigris’ global data plane, so that they
load quickly from anywhere.
## Prerequisites
* A Tigris account, bucket, and access keypair
([https://storage.new](https://storage.new))
## Important notes
* Tigris usage may incur costs based on your plan.
* Be mindful of sensitive data and consider security measures when
integrating with external services.
First you need to install required libraries and enter a Tigris access
keypair obtained via the Tigris Admin Console.
## Set up environment
Install Pixeltable along with `boto3` and `datasets`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable boto3 datasets
```
## Configure authentication
Enter your Tigris credentials and bucket name when prompted:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import os
from getpass import getpass
os.environ['AWS_ACCESS_KEY_ID'] = getpass('Tigris access key ID')
os.environ['AWS_SECRET_ACCESS_KEY'] = getpass('Tigris secret access key')
bucket_name = getpass('Tigris bucket name')
os.environ['AWS_ENDPOINT_URL_S3'] = 'https://t3.storage.dev'
os.environ['AWS_REGION'] = 'auto'
os.environ['PIXELTABLE_INPUT_MEDIA_DEST'] = f's3://{bucket_name}/input/'
os.environ['PIXELTABLE_OUTPUT_MEDIA_DEST'] = f's3://{bucket_name}/output/'
```
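Before creating any tables, it can be worth confirming that every variable set above is present. The sketch below is one way to do that with the standard library; `missing_settings` is a hypothetical helper, and the example environment is made up so the check can be shown without real credentials.

```python
import os

# The variables configured in the block above.
REQUIRED = [
    'AWS_ACCESS_KEY_ID',
    'AWS_SECRET_ACCESS_KEY',
    'AWS_ENDPOINT_URL_S3',
    'AWS_REGION',
    'PIXELTABLE_INPUT_MEDIA_DEST',
    'PIXELTABLE_OUTPUT_MEDIA_DEST',
]


def missing_settings(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


# Made-up environment with one variable left out, for illustration.
example_env = {name: 'x' for name in REQUIRED if name != 'AWS_REGION'}
missing = missing_settings(example_env)
print(missing)
```

Calling `missing_settings()` with no argument checks the real process environment instead.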
## Create a table for images
Now let’s create a table that will contain images from the
[XeIaso/botw-screenshots-captioned](https://huggingface.co/datasets/XeIaso/botw-screenshots-captioned)
dataset:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from datasets import load_dataset
# Create directory for this demo
pxt.drop_dir('tigris', force=True)
pxt.create_dir('tigris', if_exists='replace')
# Load the dataset
ds = load_dataset('XeIaso/botw-screenshots-captioned')
# Import it to pixeltable with the name screenshots
pxt.drop_table('tigris/screenshots', force=True)
screenshots = pxt.create_table(
'tigris/screenshots', source=ds, if_exists='replace'
)
```
Created directory 'tigris'.
Created table 'screenshots'.
Inserting rows into \`screenshots\`: 100 rows \[00:01, 51.72 rows/s]
Inserting rows into \`screenshots\`: 100 rows \[00:01, 55.57 rows/s]
Inserting rows into \`screenshots\`: 100 rows \[00:01, 52.74 rows/s]
Inserting rows into \`screenshots\`: 100 rows \[00:02, 33.96 rows/s]
Inserting rows into \`screenshots\`: 100 rows \[00:02, 42.64 rows/s]
Inserting rows into \`screenshots\`: 100 rows \[00:02, 39.65 rows/s]
Inserting rows into \`screenshots\`: 100 rows \[00:02, 47.36 rows/s]
Inserting rows into \`screenshots\`: 28 rows \[00:00, 6786.12 rows/s]
Inserted 728 rows with 0 errors.
Once the import is done, you can create thumbnails with a [computed
column](/tutorials/computed-columns):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add a computed column for thumbnails
# Uses output_media_dest by default, or specify a custom destination
screenshots.add_computed_column(
thumbnail=screenshots.image.resize((256, 256)),
destination=f's3://{bucket_name}/botw-screenshots/thumbnails/',
)
```
And then inspect that with the `collect` method:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
results = screenshots.limit(1).collect()
results
```
## Getting URLs for your files
When your files are in object storage, you can get URLs that point
directly to them. These URLs work in HTML, APIs, or any application you
need to serve media with. Fetch them with the `.fileurl` property:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
screenshots.select(
image=screenshots.image,
image_url=screenshots.image.fileurl,
thumbnail=screenshots.thumbnail,
thumbnail_url=screenshots.thumbnail.fileurl,
).limit(1).collect()
```
## Generating Presigned URLs
For private buckets or when you need time-limited access to files, use
presigned URLs. These are temporary, authenticated URLs that allow
anyone to access your files for a limited time without needing
credentials.
Use the `presigned_url` function from `pixeltable.functions.net`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import net
# Generate presigned URLs with 1-hour expiration (3600 seconds)
screenshots.select(
image=screenshots.image,
image_url=screenshots.image.fileurl,
image_presigned=net.presigned_url(screenshots.image.fileurl, 3600),
thumbnail=screenshots.thumbnail,
thumbnail_url=screenshots.thumbnail.fileurl,
thumbnail_presigned=net.presigned_url(
screenshots.thumbnail.fileurl, 3600
),
).limit(1).collect()
```
### Common expiration times
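The expiration is given in seconds (3600 for one hour, as in the example above). Here is a minimal sketch of a few common lifetimes, computed with the standard library. The dictionary name and the choice of durations are illustrative, and the maximum lifetime a given provider accepts is not covered here.

```python
from datetime import timedelta

# Common lifetimes converted to the number of seconds expected by the
# expiration argument.
EXPIRATIONS = {
    '15 minutes': int(timedelta(minutes=15).total_seconds()),
    '1 hour': int(timedelta(hours=1).total_seconds()),
    '1 day': int(timedelta(days=1).total_seconds()),
    '7 days': int(timedelta(days=7).total_seconds()),
}

for label, seconds in EXPIRATIONS.items():
    print(f'{label}: {seconds}')
```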
## What you learned
* When you configure Pixeltable to use Tigris to store images, adding
images transparently uploads them into Tigris for global distribution.
* You can override where images are stored in Tigris using the
`destination=` kwarg when creating computed columns.
* Use the `.fileurl` property in queries to get URLs for your stored
files.
* Use `net.presigned_url()` to generate time-limited, authenticated URLs
for private bucket access.
Pixeltable handles everything else for you.
## Next steps
* See the [Cloud Storage
documentation](/integrations/cloud-storage)
for complete provider setup and authentication details.
* Check out [Pixeltable
Configuration](/platform/configuration) for
all config options.
* Join our [Discord community](https://pixeltable.com/discord) if you
have questions.
## Additional Resources
* [Pixeltable Documentation](/)
* [Tigris Documentation](https://www.tigrisdata.com/docs/)
# Working with Together AI in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-together
Run Llama, Qwen, and other open LLMs hosted on Together AI from Pixeltable computed columns for chat, code, and embedding workflows.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
### Prerequisites
* A Together AI account with an API key
([https://api.together.ai/settings/api-keys](https://api.together.ai/settings/api-keys))
### Important notes
* Together AI usage may incur costs based on your Together AI plan.
* Be mindful of sensitive data and consider security measures when
integrating with external services.
First you’ll need to install required libraries and enter your Together
API key.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable together
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'TOGETHER_API_KEY' not in os.environ:
    os.environ['TOGETHER_API_KEY'] = getpass.getpass('Together API Key: ')
```
Now let’s create a Pixeltable directory to hold the tables for our demo.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Remove the 'together_demo' directory and its contents, if it exists
pxt.drop_dir('together_demo', force=True)
pxt.create_dir('together_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'together\_demo'.
## Chat completions
Create a table with a column for your input data and computed columns
that store the results from Together AI.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import together
chat_t = pxt.create_table('together_demo/chat', {'input': pxt.String})
messages = [{'role': 'user', 'content': chat_t.input}]
chat_t.add_computed_column(
output=together.chat_completions(
messages=messages,
model='Qwen/Qwen3.5-9B',
model_kwargs={
# Optional dict with parameters for the Together API
'max_tokens': 300,
'stop': ['\n'],
'temperature': 0.7,
'top_p': 0.9,
},
)
)
chat_t.add_computed_column(
response=chat_t.output.choices[0].message.content
)
```
Created table 'chat'.
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Start a conversation
chat_t.insert(
[
{'input': 'How many species of felids have been classified?'},
{'input': 'Can you make me a coffee?'},
]
)
chat_t.select(chat_t.input, chat_t.response).head()
```
Inserted 2 rows with 0 errors in 1.58 s (1.27 rows/s)
Created table 'embeddings'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
emb_t.insert(
[{'input': 'Together AI provides a variety of embeddings models.'}]
)
```
Inserted 1 row with 0 errors in 0.54 s (1.86 rows/s)
1 row inserted.
Created table 'images'.
Added 0 column values with 0 errors in 0.01 s
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
image_t.insert(
[{'input': 'A friendly dinosaur playing tennis in a cornfield'}]
)
```
Inserted 1 row with 0 errors in 1.35 s (0.74 rows/s)
1 row inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
image_t
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
image_t.head()
```
### Learn more
To learn more about advanced techniques like RAG operations in
Pixeltable, check out the [RAG Operations in
Pixeltable](/howto/use-cases/rag-operations)
tutorial.
If you have any questions, don’t hesitate to reach out.
# Working with Twelve Labs in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-twelvelabs
Index and search videos by visual content, speech, and text using Twelve Labs models from Pixeltable computed columns and embeddings.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Twelve Labs provides multimodal embeddings that project text, images,
audio, and video into the **same semantic space**. This enables true
**cross-modal search** - the most powerful feature of this integration.
**What makes this special?** You can search a video index using *any*
modality: text, images, audio, or other videos.
This notebook demonstrates this cross-modal capability with video, then
shows how to apply the same embeddings to other modalities.
### Prerequisites
* A Twelve Labs account with an API key
([playground.twelvelabs.io](https://playground.twelvelabs.io/))
* Audio and video must be at least 4 seconds long
## Setup
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable twelvelabs
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'TWELVELABS_API_KEY' not in os.environ:
    os.environ['TWELVELABS_API_KEY'] = getpass.getpass(
        'Enter your Twelve Labs API key: '
    )
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
import pixeltable.functions as pxtf
# Create a fresh directory for our demo
pxt.drop_dir('twelvelabs_demo', force=True)
pxt.create_dir('twelvelabs_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'twelvelabs\_demo'.
## Cross-Modal Video Search
Let’s index a video and search it using text, images, audio, and other
videos - all against the same index.
### Create Video Table and Index
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table for videos
video_t = pxt.create_table('twelvelabs_demo/videos', {'video': pxt.Video})
# Insert a sample video
video_url = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/The-Pursuit-of-Happiness.mp4'
video_t.insert([{'video': video_url}])
```
Created table 'videos'.
Inserted 1 row with 0 errors in 1.60 s (0.63 rows/s)
1 row inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a view that segments the video into searchable chunks
# Twelve Labs requires minimum 4 second segments
video_chunks = pxt.create_view(
'twelvelabs_demo/video_chunks',
video_t,
iterator=pxtf.video.video_splitter(
video=video_t.video, duration=5.0, min_segment_duration=4.0
),
)
# Add embedding index for cross-modal search
video_chunks.add_embedding_index(
'video_segment',
embedding=pxtf.twelvelabs.embed.using(model_name='marengo3.0'),
)
```
Let’s look at the index we just added in the table metadata:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
video_chunks
```
The iterator created a larger table from our single video:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
video_chunks.count()
```
51
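To see how `duration` and `min_segment_duration` might interact, here is a toy calculation in plain Python. It assumes that a trailing segment shorter than the minimum is dropped, which is an assumption about the splitter's behavior rather than something shown above; `split_points` is a hypothetical helper and the 23-second length is made up.

```python
def split_points(total: float, duration: float, min_segment: float):
    """Return (start, end) pairs, skipping any segment shorter than min_segment."""
    segments = []
    start = 0.0
    while start < total:
        end = min(start + duration, total)
        if end - start >= min_segment:
            segments.append((start, end))
        start += duration
    return segments


# A made-up 23-second video: the final 3-second remainder is dropped.
segments = split_points(23.0, duration=5.0, min_segment=4.0)
print(segments)
```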
### Text to Video Search
Find video segments matching a text description.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sim = video_chunks.video_segment.similarity(string='pink')
video_chunks.order_by(sim, asc=False).limit(3).select(
video_chunks.video_segment, score=sim
).collect()
```
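Conceptually, a similarity search ranks segments by how close each segment's embedding is to the query's embedding. The toy sketch below uses made-up three-dimensional vectors and cosine similarity. Real embeddings are much larger, and treating cosine similarity as the metric is an assumption made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings for three segments and one query.
segment_vectors = {
    'seg_a': [1.0, 0.0, 0.0],
    'seg_b': [0.6, 0.8, 0.0],
    'seg_c': [0.0, 0.0, 1.0],
}
query_vector = [1.0, 0.0, 0.0]

# Highest similarity first, mirroring order_by(sim, asc=False).
ranked = sorted(
    segment_vectors,
    key=lambda name: cosine(query_vector, segment_vectors[name]),
    reverse=True,
)
print(ranked)
```

Because the query and the segments live in the same vector space, the same ranking step applies whether the query vector came from text, an image, audio, or another video.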
### Image to Video Search
Find video segments similar to an image.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
image_query = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/The-Pursuit-of-Happiness-Screenshot.png'
sim = video_chunks.video_segment.similarity(image=image_query)
video_chunks.order_by(sim, asc=False).limit(2).select(
video_chunks.video_segment, score=sim
).collect()
```
### Video to Video Search
Find video segments similar to another video clip.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
video_query = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/The-Pursuit-of-Happiness-Video-Extract.mp4'
sim = video_chunks.video_segment.similarity(video=video_query)
video_chunks.order_by(sim, asc=False).limit(2).select(
video_chunks.video_segment, score=sim
).collect()
```
### Audio to Video Search
Find video segments with similar audio/speech content.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
audio_query = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/The-Pursuit-of-Happiness-Audio-Extract.m4a'
sim = video_chunks.video_segment.similarity(audio=audio_query)
video_chunks.order_by(sim, asc=False).limit(2).select(
video_chunks.video_segment, score=sim
).collect()
```
## Embedding Options
For video embeddings, you can focus on specific aspects:
* `'visual'` - Focus on what you see
* `'audio'` - Focus on what you hear
* `'transcription'` - Focus on what is said
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add a visual-only embedding column
video_chunks.add_computed_column(
visual_embedding=pxtf.twelvelabs.embed(
video_chunks.video_segment,
model_name='marengo3.0',
embedding_option=['visual'],
)
)
video_chunks.select(
video_chunks.video_segment, video_chunks.visual_embedding
).limit(2).collect()
```
Added 51 column values with 0 errors in 19.81 s (2.57 rows/s)
## Other Modalities: Text, Images, and Documents
Twelve Labs embeddings also work for text, images, and documents. Here’s
a compact example showing **multiple embedding indexes on a single
table**.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a multimodal content table
content_t = pxt.create_table(
'twelvelabs_demo/content',
{
'title': pxt.String,
'description': pxt.String,
'thumbnail': pxt.Image,
},
)
# Add computed column combining title and description
content_t.add_computed_column(
text_content=content_t.title + '. ' + content_t.description
)
# Add embedding index on combined text column
content_t.add_embedding_index(
'text_content',
embedding=pxtf.twelvelabs.embed.using(model_name='marengo3.0'),
)
# Add embedding index on image column
content_t.add_embedding_index(
'thumbnail',
embedding=pxtf.twelvelabs.embed.using(model_name='marengo3.0'),
)
```
Created table 'content'.
Added 0 column values with 0 errors in 0.01 s
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert sample content
content_t.insert(
[
{
'title': 'Beach Sunset',
'description': 'A beautiful sunset over the ocean with palm trees.',
'thumbnail': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000025.jpg',
},
{
'title': 'Mountain Hiking',
'description': 'Hikers climbing a steep mountain trail with scenic views.',
'thumbnail': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000139.jpg',
},
{
'title': 'City Street',
'description': 'Busy urban street with cars and pedestrians.',
'thumbnail': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000042.jpg',
},
{
'title': 'Wildlife Safari',
'description': 'Elephants and zebras on the African savanna.',
'thumbnail': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000061.jpg',
},
]
)
```
Inserted 4 rows with 0 errors in 1.97 s (2.03 rows/s)
4 rows inserted.
We can see the two indexes we added in the schema:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
content_t
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Search by text description
sim = content_t.text_content.similarity(string='outdoor nature adventure')
content_t.order_by(sim, asc=False).limit(2).select(
content_t.title, content_t.text_content, score=sim
).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Search by image similarity
query_image = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000001.jpg'
sim = content_t.thumbnail.similarity(image=query_image)
content_t.order_by(sim, asc=False).limit(2).select(
content_t.title, content_t.thumbnail, score=sim
).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Cross-modal: Search images using text!
sim = content_t.thumbnail.similarity(string='shoe rack')
content_t.order_by(sim, asc=False).limit(2).select(
content_t.title, content_t.thumbnail, score=sim
).collect()
```
## Summary
**Twelve Labs + Pixeltable enables:**
* **Cross-modal search**: Query video with text, images, audio, or other
videos
* **Multiple indexes per table**: Add embedding indexes on different
columns
* **Embedding options**: Focus on visual, audio, or transcription
aspects
* **All modalities**: Text, images, audio, video, and documents
### Learn More
* [Twelve Labs Documentation](https://docs.twelvelabs.io/)
* [Pixeltable Documentation](/)
# Working with Voyage AI in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-voyageai
Use Voyage AI embedding and reranker models from Pixeltable to build high-quality retrieval indices for RAG over documents and code.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Pixeltable’s Voyage AI integration enables you to access
state-of-the-art embedding and reranker models via the Voyage AI API.
### Prerequisites
* A Voyage AI account with an API key ([https://www.voyageai.com/](https://www.voyageai.com/))
### Important notes
* Voyage AI usage may incur costs based on your Voyage AI plan.
* Be mindful of sensitive data and consider security measures when
integrating with external services.
First you’ll need to install required libraries and enter your Voyage AI
API key.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable voyageai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'VOYAGE_API_KEY' not in os.environ:
    os.environ['VOYAGE_API_KEY'] = getpass.getpass(
        'Enter your Voyage AI API key:'
    )
```
Now let’s create a Pixeltable directory to hold the tables for our demo.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Remove the 'voyageai_demo' directory and its contents, if it exists
pxt.drop_dir('voyageai_demo', force=True)
pxt.create_dir('voyageai_demo')
```
Created directory 'voyageai\_demo'.
## Text embeddings
Voyage AI provides state-of-the-art embedding models for semantic search
and RAG applications.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import voyageai
# Create a table for document embeddings
docs_t = pxt.create_table('voyageai_demo/documents', {'text': pxt.String})
# Add computed column with Voyage embeddings
docs_t.add_computed_column(
embedding=voyageai.embeddings(
docs_t.text, model='voyage-3.5', input_type='document'
)
)
```
Created table 'documents'.
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert some sample documents
documents = [
'The Mediterranean diet emphasizes fish, olive oil, and vegetables, believed to reduce chronic diseases.',
'Photosynthesis in plants converts light energy into glucose and produces essential oxygen.',
'20th-century innovations, from radios to smartphones, centered on electronic advancements.',
'Rivers provide water, irrigation, and habitat for aquatic species, vital for ecosystems.',
"Apple's conference call to discuss fourth fiscal quarter results is scheduled for Thursday, November 2, 2023.",
"Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' endure in literature.",
]
docs_t.insert({'text': doc} for doc in documents)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View the embeddings
docs_t.select(docs_t.text, docs_t.embedding).head(3)
```
## Embedding index for similarity search
You can use Voyage AI embeddings with Pixeltable’s embedding index for
efficient similarity search.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table with an embedding index
search_t = pxt.create_table('voyageai_demo/search', {'text': pxt.String})
# Add embedding index for similarity search
embed_fn = voyageai.embeddings.using(
model='voyage-3.5', input_type='document'
)
search_t.add_embedding_index('text', string_embed=embed_fn)
```
Created table 'search'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert documents
search_t.insert({'text': doc} for doc in documents)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add computed column to extract top results using JSON path
rerank_t.add_computed_column(top_results=rerank_t.reranked['results'])
```
Added 1 column value with 0 errors.
1 row updated, 1 value computed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract the top result's document and score
rerank_t.select(
rerank_t.query,
top_document=rerank_t.top_results[0]['document'],
top_score=rerank_t.top_results[0]['relevance_score'],
).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View reranking results
rerank_t.select(rerank_t.query, rerank_t.top_results).collect()
```
## Multimodal Embeddings
Voyage AI’s multimodal models (this example uses `voyage-multimodal-3.5`)
can embed both images and text into the same vector space, enabling
cross-modal similarity search.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a table for multimodal embeddings
mm_t = pxt.create_table(
    'voyageai_demo/multimodal',
    {'image': pxt.Image, 'caption': pxt.String},
    if_exists='replace',
)
# Add computed columns for image and text embeddings
# multimodal_embed can embed either images or text independently
mm_t.add_computed_column(
    image_embedding=voyageai.multimodal_embed(
        mm_t.image, model='voyage-multimodal-3.5', input_type='document'
    )
)
mm_t.add_computed_column(
    text_embedding=voyageai.multimodal_embed(
        mm_t.caption, model='voyage-multimodal-3.5', input_type='document'
    )
)
```
Created table 'multimodal'.
Added 0 column values with 0 errors.
Added 0 column values with 0 errors.
No rows affected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert a sample image with caption
mm_t.insert(
    [
        {
            'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000139.jpg',
            'caption': 'A person standing next to an elephant',
        }
    ]
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View the multimodal embeddings
mm_t.select(
mm_t.image, mm_t.caption, mm_t.image_embedding, mm_t.text_embedding
).head()
```
### Learn more
To learn more about RAG operations in Pixeltable, check out the
[RAG Operations in Pixeltable](/howto/use-cases/rag-operations) tutorial.
For more information about Voyage AI models and features, visit:
* [Voyage AI Documentation](https://docs.voyageai.com/)
* [Text Embeddings](https://docs.voyageai.com/docs/embeddings)
* [Multimodal Embeddings](https://docs.voyageai.com/docs/multimodal-embeddings)
* [Rerankers](https://docs.voyageai.com/docs/reranker)
If you have any questions, don’t hesitate to reach out.
# Transcribing and Indexing Audio and Video in Pixeltable
Source: https://docs.pixeltable.com/howto/use-cases/audio-transcriptions
End-to-end audio transcription pipeline in Pixeltable using Whisper, speaker diarization, and indexed transcripts for search and analysis.
In this tutorial, we’ll build an end-to-end workflow for creating and
indexing audio transcriptions of video data. We’ll demonstrate how
Pixeltable can be used to:
1. Extract audio data from video files;
2. Transcribe the audio using OpenAI Whisper;
3. Build a semantic index of the transcriptions, using the Huggingface
sentence\_transformers models;
4. Search this index.
The tutorial assumes you’re already somewhat familiar with Pixeltable.
If this is your first time using Pixeltable, the [10-Minute
Tour](/overview/ten-minute-tour) tutorial is
a great place to start.
## Create a Table for Video Data
Let’s first install the Python packages we’ll need for the demo. We’re
going to use the popular Whisper library, running locally. Later in the
demo, we’ll see how to use the OpenAI API endpoints as an alternative.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -q pixeltable openai openai-whisper sentence-transformers spacy
!python -m spacy download en_core_web_sm -q
```
Now we create a Pixeltable table to hold our videos.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
pxt.drop_dir(
    'transcription_demo', force=True
)  # Ensure a clean slate for the demo
pxt.create_dir('transcription_demo')
# Create a table to store our videos and workflow
video_table = pxt.create_table(
    'transcription_demo/video_table', {'video': pxt.Video}
)
video_table
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'transcription\_demo'.
Created table 'video\_table'.
Next let’s insert some video files into the table. In this demo, we’ll
be using one-minute excerpts from a Lex Fridman podcast, and we’ll begin
by inserting two of them into our new table. The videos are given as
`https` links, but Pixeltable also accepts local files and S3 URLs as
input.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
videos = [
    'https://github.com/pixeltable/pixeltable/raw/release/docs/resources/audio-transcription-demo/'
    f'Lex-Fridman-Podcast-430-Excerpt-{n}.mp4'
    for n in range(3)
]
video_table.insert({'video': video} for video in videos[:2])
video_table.show()
```
Inserted 2 rows with 0 errors in 2.04 s (0.98 rows/s)
Now we’ll add another column to hold extracted audio from our videos.
The new column is an example of a *computed column*: it’s updated
automatically based on the contents of another column (or columns). In
this case, the value of the `audio` column is defined to be the audio
track extracted from whatever’s in the `video` column.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.video import extract_audio
video_table.add_computed_column(
audio=extract_audio(video_table.video, format='mp3')
)
video_table.show()
```
Added 2 column values with 0 errors in 0.91 s (2.19 rows/s)
If we look at the structure of the video table, we see that the new
column is a computed column.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
video_table
```
We can also add another computed column to extract metadata from the
audio streams.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.audio import get_metadata
video_table.add_computed_column(metadata=get_metadata(video_table.audio))
video_table.show()
```
Added 2 column values with 0 errors in 0.02 s (95.47 rows/s)
## Create Transcriptions
Now we’ll add a step to create transcriptions of our videos. As
mentioned above, we’re going to use the Whisper library for this,
running locally. Pixeltable has a built-in function,
`whisper.transcribe`, that serves as an adapter for the Whisper
library’s transcription capability. All we have to do is add a computed
column that calls this function:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import whisper
video_table.add_computed_column(
    transcription=whisper.transcribe(
        audio=video_table.audio, model='base.en'
    )
)
video_table.select(
    video_table.video, video_table.transcription.text
).show()
```
Added 2 column values with 0 errors in 4.63 s (0.43 rows/s)
In order to index the transcriptions, we’ll first need to split them
into sentences. We can do this using Pixeltable’s built-in
`string_splitter` iterator.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.string import string_splitter
sentences_view = pxt.create_view(
    'transcription_demo/sentences_view',
    video_table,
    iterator=string_splitter(
        video_table.transcription.text, separators='sentence'
    ),
)
```
The `string_splitter` creates a new view, with the audio transcriptions
broken into individual, one-sentence chunks.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sentences_view.select(sentences_view.pos, sentences_view.text).show(8)
```
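To make the shape of the resulting view concrete, here is a rough stand-in for sentence splitting in plain Python. It uses a simple regular expression rather than whatever sentence segmentation `string_splitter` performs internally, so the exact boundaries may differ; the helper name `split_sentences` is hypothetical. Each output row pairs a position with one sentence, mirroring the `pos` and `text` columns selected above.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace; return (pos, sentence) rows."""
    parts = [p for p in re.split(r'(?<=[.!?])\s+', text.strip()) if p]
    return list(enumerate(parts))


rows = split_sentences('Of every kind. What is it? Happiness is a process.')
```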
## Add an Embedding Index
Next, let’s use the Huggingface `sentence_transformers` library to
create an embedding index of our sentences, attaching it to the `text`
column of our `sentences_view`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import sentence_transformer
sentences_view.add_embedding_index(
'text',
embedding=sentence_transformer.using(model_id='intfloat/e5-large-v2'),
)
```
We can do a simple lookup to test our new index. The following snippet
returns the results of a nearest-neighbor search on the input “What is
happiness?”
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sim = sentences_view.text.similarity(string='What is happiness?')
(
    sentences_view.order_by(sim, asc=False)
    .limit(10)
    .select(sentences_view.text, similarity=sim)
    .collect()
)
```
## Incremental Updates
*Incremental updates* are a key feature of Pixeltable. Whenever a new
video is added to the original table, all of its downstream computed
columns are updated automatically. Let’s demonstrate this by adding a
third video to the table and seeing how the updates propagate through to
the index.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
video_table.insert([{'video': videos[2]}])
```
Inserted 10 rows with 0 errors in 4.20 s (2.38 rows/s)
10 rows inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
video_table.select(
video_table.video,
video_table.metadata,
video_table.transcription.text,
).show()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sim = sentences_view.text.similarity(string='What is happiness?')
(
sentences_view.order_by(sim, asc=False)
.limit(20)
.select(sentences_view.text, similarity=sim)
.collect()
)
```
We can see the new results showing up in `sentences_view`.
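The idea behind incremental updates can be sketched with a toy table in plain Python. This is only an illustration of the principle (existing rows are backfilled once, and later inserts compute values only for the new row); the class `ToyTable` is hypothetical and says nothing about how Pixeltable is implemented.

```python
class ToyTable:
    """A list of rows plus computed columns that are filled in as rows arrive."""

    def __init__(self):
        self.rows = []
        self.computed = {}  # column name -> function of the row
        self.calls = 0      # how many times any computed function has run

    def add_computed_column(self, name, fn):
        self.computed[name] = fn
        for row in self.rows:  # backfill existing rows exactly once
            row[name] = self._run(fn, row)

    def insert(self, row):
        row = dict(row)
        for name, fn in self.computed.items():  # only the new row is processed
            row[name] = self._run(fn, row)
        self.rows.append(row)

    def _run(self, fn, row):
        self.calls += 1
        return fn(row)


table = ToyTable()
table.insert({'video': 'a.mp4'})
table.insert({'video': 'b.mp4'})
table.add_computed_column('stem', lambda row: row['video'].rsplit('.', 1)[0])
table.insert({'video': 'c.mp4'})  # triggers one additional computation
```

After the third insert, the computed function has run three times in total: twice for the backfill and once for the new row.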
## Using the OpenAI API
This concludes our tutorial using the locally installed Whisper library.
Sometimes, it may be preferable to use the OpenAI API rather than a
locally installed library. In this section we’ll show how this can be
done in Pixeltable, simply by using a different function to construct
our computed columns.
Since this section relies on calling out to the OpenAI API, you’ll need
to have an API key, which you can enter below.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import openai
video_table.add_computed_column(
transcription_from_api=openai.transcriptions(
video_table.audio, model='whisper-1'
)
)
```
Added 3 column values with 0 errors in 6.49 s (0.46 rows/s)
3 rows updated.
Now let’s compare the results from the local model and the API
side-by-side.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
video_table.select(
video_table.video,
video_table.transcription.text,
video_table.transcription_from_api.text,
).show()
```
They look pretty similar, which isn’t surprising, since the OpenAI
transcriptions endpoint runs on Whisper.
One difference is that the local library spits out a lot more
information about the internal behavior of the model. Note that we’ve
been selecting `video_table.transcription.text` in the preceding
queries, which pulls out just the `text` field of the transcription
results. The actual results are a sizable JSON structure that includes a
lot of metadata. To see the full output, we can select
`video_table.transcription` instead, to get the full JSON struct. Here’s
what it looks like (we’ll select just one row, since it’s a lot of
output):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
video_table.select(
video_table.transcription, video_table.transcription_from_api
).show(1)
```
# Object Detection in Videos
Source: https://docs.pixeltable.com/howto/use-cases/object-detection-in-videos
Detect, track, and visualize objects across video frames in Pixeltable using YOLOX and other vision models with FrameIterator-backed views.
In this tutorial, we’ll demonstrate how to use Pixeltable to do
frame-by-frame object detection, made simple through Pixeltable’s
video-related functionality:
* automatic frame extraction
* running complex functions against frames (in this case, the YOLOX
object detection models)
* reassembling frames back into videos
We’ll be working with a single video file from Pixeltable’s test data
repository.
This tutorial assumes you’re at least somewhat familiar with Pixeltable;
a good place to learn more is the [Pixeltable
Documentation](/overview/pixeltable).
## Creating a tutorial directory and table
First, let’s make sure the packages we need for this tutorial are
installed: Pixeltable itself, PyTorch, and the YOLOX object detection
library.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable pixeltable-yolox
```
All data in Pixeltable is stored in tables, which in turn reside in
directories. We’ll begin by creating a `detection_demo` directory and a
table to hold our videos, with a single column of type `pxt.Video`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
pxt.create_dir('detection_demo', if_exists='replace_force')
videos_table = pxt.create_table(
'detection_demo/videos', {'video': pxt.Video}
)
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'detection\_demo'.
Created table 'videos'.
In order to interact with the frames, we take advantage of Pixeltable’s
component view concept: we create a “view” of our video table that
contains one row for each frame of each video in the table. Pixeltable
provides the built-in `frame_iterator` for this.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.video import frame_iterator
frames_view = pxt.create_view(
'detection_demo/frames',
videos_table,
iterator=frame_iterator(videos_table.video),
)
```
You’ll see that neither the `videos` table nor the `frames` view has any
actual data yet, because we haven’t yet added any videos to the table.
However, the `frames` view is now configured to automatically track the
`videos` table as new data shows up.
The new view is automatically configured with six columns:
* `pos` - a system column that is part of every component view
* `video` - the column inherited from our base table (all base table
columns are visible in any of its views)
* `frame_idx`, `pos_msec`, `pos_frame`, `frame` - these four columns are
created by the `frame_iterator`.
Let’s have a look at the new view:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
frames_view
```
We’ll now insert a single row into the videos table, containing a video
of a busy intersection in Bangkok.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
videos_table.insert(
    [
        {
            'video': 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/bangkok.mp4'
        }
    ]
)
```
Inserted 462 rows with 0 errors in 4.35 s (106.25 rows/s)
462 rows inserted.
Notice that both the `videos` table and `frames` view were automatically
updated, expanding the single video into 461 rows in the view. Let’s
have a look at `videos` first.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
videos_table.show()
```
Now let’s peek at the first five rows of `frames`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
frames_view.select(
frames_view.pos,
frames_view.frame,
frames_view.frame.width,
frames_view.frame.height,
).show(5)
```
One advantage of using Pixeltable’s component view mechanism is that
Pixeltable does not physically store the frames. Instead, Pixeltable
re-extracts the frames on retrieval using the frame index, which can be
done very efficiently and avoids any storage overhead (which can be
quite substantial for video frames).
## Object Detection with Pixeltable
Now let’s apply an object detection model to our frames. Pixeltable
includes built-in support for a number of models; we’re going to use the
YOLOX family of models, which are lightweight models with solid
performance. We first import the `yolox` Pixeltable function.
Pixeltable functions operate on columns and expressions using standard
Python function call syntax. Here’s an example that shows how we might
experiment with applying one of the YOLOX models to the first few frames
in our video, using Pixeltable’s powerful `select` comprehension.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.yolox import yolox
# Show the results of applying the `yolox_tiny` model
# to the first few frames in the table.
frames_view.select(
frames_view.frame, yolox(frames_view.frame, model_id='yolox_tiny')
).head(3)
```
It may appear that we just ran the YOLOX inference over the entire view
of 461 frames, but remember that Pixeltable evaluates expressions
lazily: in this case, it only ran inference over the 3 frames that we
actually displayed.
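Lazy evaluation of this kind can be illustrated with a plain Python generator. In the sketch below, `fake_detector` is a hypothetical stand-in for an expensive model call; the pipeline is defined over all 461 frame ids, but only the results that are actually pulled get computed.

```python
from itertools import islice

calls = []


def fake_detector(frame_id):
    """Stand-in for an expensive model call; records which frames it saw."""
    calls.append(frame_id)
    return {'frame': frame_id, 'labels': []}


# The generator expression describes work over all 461 frames; nothing runs yet.
pipeline = (fake_detector(frame_id) for frame_id in range(461))

# Pulling three results triggers exactly three detector calls.
first_three = list(islice(pipeline, 3))
```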
The inference output looks like what we’d expect, so let’s add a
*computed column* that runs inference over the entire view (computed
columns are discussed in detail in the [Computed
Columns](https://github.com/pixeltable/pixeltable/blob/release/docs/tutorials/computed-columns.ipynb)
tutorial). Remember that once a computed column is created, Pixeltable
will update it incrementally any time new rows are added to the view.
This is a convenient way to incorporate inference (and other operations)
into data workflows.
This *will* cause Pixeltable to run inference over all 461 frames, so
please be patient.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a computed column to compute detections using the `yolox_tiny`
# model.
# We'll adjust the confidence threshold down a bit (the default is 0.5)
# to pick up even more bounding boxes.
frames_view.add_computed_column(
    detections_tiny=yolox(
        frames_view.frame, model_id='yolox_tiny', threshold=0.25
    )
)
```
Added 461 column values with 0 errors in 15.09 s (30.55 rows/s)
461 rows updated.
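To see why lowering the confidence threshold picks up more bounding boxes, consider this small sketch. It assumes detections are a dictionary of parallel lists keyed by `bboxes`, `labels`, and `scores` (the field names used later in this tutorial) and that a detection is kept when its score is at least the threshold; the sample values and the helper `filter_by_threshold` are made up.

```python
def filter_by_threshold(detections, threshold):
    """Keep only the detections whose score is at least `threshold`."""
    keep = [i for i, score in enumerate(detections['scores']) if score >= threshold]
    return {key: [values[i] for i in keep] for key, values in detections.items()}


# Hypothetical raw detections for a single frame.
raw = {
    'bboxes': [[0, 0, 10, 10], [5, 5, 20, 20], [30, 30, 40, 40]],
    'labels': [2, 2, 0],
    'scores': [0.9, 0.4, 0.2],
}
strict = filter_by_threshold(raw, 0.5)   # keeps one box
loose = filter_by_threshold(raw, 0.25)   # keeps two boxes
```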
The new column is now part of the schema of the `frames` view:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
frames_view
```
The data in the computed column is now stored for fast retrieval.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
frames_view.select(frames_view.frame, frames_view.detections_tiny).show(3)
```
Now let’s create a new set of images, in which we superimpose the
detected bounding boxes on top of the original images. We’ll use the
handy built-in `bboxes_draw` UDF for this. We could create a new
computed column to hold the superimposed images, but we don’t have to;
sometimes it’s easier just to use a `select` comprehension, as we did
when we were first experimenting with the detection model.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable.functions as pxtf
frames_view.select(
frames_view.frame,
pxtf.vision.bboxes_draw(
frames_view.frame, frames_view.detections_tiny.bboxes, width=4
),
).show(1)
```
Our `select` comprehension ranged over the entire table, but just as
before, Pixeltable computes the output lazily: image operations are
performed at retrieval time, so in this case, Pixeltable drew the
annotations just for the one frame that we actually displayed.
Looking at individual frames gives us some idea of how well our
detection algorithm works, but it would be more instructive to turn the
visualization output back into a video.
We do that with the built-in function `make_video()`, which is an
aggregation function that takes a frame index (actually: any expression
that can be used to order the frames; a timestamp would also work) and
an image, and then assembles the sequence of images into a video.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
frames_view.group_by(videos_table).select(
    pxt.functions.video.make_video(
        frames_view.pos,
        pxtf.vision.bboxes_draw(
            frames_view.frame, frames_view.detections_tiny.bboxes, width=4
        ),
    )
).show(1)
```
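Conceptually, an ordered aggregation like this groups the rows by their source video, sorts each group by the ordering expression, and then combines the frames in that order. The following pure-Python sketch shows just that grouping and ordering logic, with strings standing in for images; `assemble_by_group` is a hypothetical helper and performs no video encoding.

```python
from collections import defaultdict


def assemble_by_group(rows):
    """Group rows by video, then return each group's frames ordered by position."""
    groups = defaultdict(list)
    for row in rows:
        groups[row['video']].append((row['pos'], row['frame']))
    return {
        video: [frame for _, frame in sorted(items, key=lambda item: item[0])]
        for video, items in groups.items()
    }


# Rows may arrive in any order; the ordering expression restores the sequence.
rows = [
    {'video': 'v1', 'pos': 2, 'frame': 'f2'},
    {'video': 'v1', 'pos': 0, 'frame': 'f0'},
    {'video': 'v2', 'pos': 0, 'frame': 'g0'},
    {'video': 'v1', 'pos': 1, 'frame': 'f1'},
]
sequences = assemble_by_group(rows)
```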
## Comparing Object Detection Models
The detections that we get out of `yolox_tiny` are passable, but a
little choppy. Suppose we want to experiment with a more powerful object
detection model, to see if there is any improvement in detection
quality. We can create an additional column to hold the new inferences.
The larger model takes longer to download and run, so please be patient.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Here we use the larger `yolox_m` (medium) model.
frames_view.add_computed_column(
detections_m=yolox(
frames_view.frame, model_id='yolox_m', threshold=0.25
)
)
```
Added 461 column values with 0 errors in 65.94 s (6.99 rows/s)
461 rows updated.
Let’s see the results of the two models side-by-side.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
frames_view.group_by(videos_table).select(
    pxt.functions.video.make_video(
        frames_view.pos,
        pxtf.vision.bboxes_draw(
            frames_view.frame, frames_view.detections_tiny.bboxes, width=4
        ),
    ),
    pxt.functions.video.make_video(
        frames_view.pos,
        pxtf.vision.bboxes_draw(
            frames_view.frame, frames_view.detections_m.bboxes, width=4
        ),
    ),
).show(1)
```
Running the videos side-by-side, we can see that the larger model is
higher in quality: less flickering, with more stable boxes from frame to
frame.
## Evaluating Models Against a Ground Truth
In order to do a quantitative evaluation of model performance, we need a
ground truth to compare them against. Let’s generate some (synthetic)
“ground truth” data by running against the largest YOLOX model
available. It will take even longer to cache and evaluate this model.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
frames_view.add_computed_column(
detections_x=yolox(
frames_view.frame, model_id='yolox_x', threshold=0.25
)
)
```
Added 461 column values with 0 errors in 156.55 s (2.94 rows/s)
461 rows updated.
Let’s have a look at our enlarged view, now with three `detections`
columns.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
frames_view
```
We’re going to be evaluating the generated detections with the
commonly-used [mean average
precision](https://learnopencv.com/mean-average-precision-map-object-detection-model-evaluation-metric/)
metric (mAP).
The mAP metric is based on per-frame metrics, such as true and false
positives per detected class, which are then aggregated into a single
(per-class) number. In Pixeltable, this functionality is available via the
`eval_detections()` and `mean_ap()` built-in functions.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.vision import eval_detections, mean_ap
frames_view.add_computed_column(
    eval_yolox_tiny=eval_detections(
        pred_bboxes=frames_view.detections_tiny.bboxes,
        pred_labels=frames_view.detections_tiny.labels,
        pred_scores=frames_view.detections_tiny.scores,
        gt_bboxes=frames_view.detections_x.bboxes,
        gt_labels=frames_view.detections_x.labels,
    )
)
frames_view.add_computed_column(
    eval_yolox_m=eval_detections(
        pred_bboxes=frames_view.detections_m.bboxes,
        pred_labels=frames_view.detections_m.labels,
        pred_scores=frames_view.detections_m.scores,
        gt_bboxes=frames_view.detections_x.bboxes,
        gt_labels=frames_view.detections_x.labels,
    )
)
```
Added 461 column values with 0 errors in 0.29 s (1589.38 rows/s)
Added 461 column values with 0 errors in 0.31 s (1475.98 rows/s)
461 rows updated.
Let’s take a look at the output.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
frames_view.select(
frames_view.eval_yolox_tiny, frames_view.eval_yolox_m
).show(1)
```
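For intuition about what a per-frame evaluation involves, here is a simplified sketch of intersection-over-union (IoU) matching. It is not the implementation of `eval_detections()`: it ignores class labels, assumes boxes in `(x1, y1, x2, y2)` format, and uses a greedy match with an assumed IoU cutoff of 0.5. The names `iou` and `match_predictions` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_predictions(pred_boxes, pred_scores, gt_boxes, min_iou=0.5):
    """Greedily match predictions (highest score first) to unused ground-truth boxes.

    Returns a list of (score, is_true_positive) pairs.
    """
    used = set()
    outcomes = []
    order = sorted(range(len(pred_boxes)), key=lambda i: pred_scores[i], reverse=True)
    for i in order:
        best_j, best_iou = None, min_iou
        for j, gt_box in enumerate(gt_boxes):
            if j in used:
                continue
            overlap = iou(pred_boxes[i], gt_box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            used.add(best_j)  # each ground-truth box can be matched only once
        outcomes.append((pred_scores[i], best_j is not None))
    return outcomes


outcomes = match_predictions(
    pred_boxes=[(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)],
    pred_scores=[0.9, 0.8, 0.3],
    gt_boxes=[(0, 0, 10, 10)],
)
```

In this toy frame, the highest-scoring prediction claims the only ground-truth box, so the other two predictions count as false positives.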
The computation of the mAP metric is now simply a query over the
evaluation output, aggregated with the `mean_ap()` function.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
frames_view.select(
mean_ap(frames_view.eval_yolox_tiny),
mean_ap(frames_view.eval_yolox_m),
).show()
```
This two-step process allows you to compute mAP at any granularity:
over your entire dataset, only for specific videos, only for videos that
pass a certain filter, etc. Moreover, you can compute this metric any
time, not just during training, and use it to guide your understanding
of your dataset and how it affects the quality of your models.
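As a rough illustration of the aggregation step, the sketch below computes average precision for a single class from a list of `(score, is_true_positive)` pairs, using all-point interpolation of the precision-recall curve. This is one common definition and is not necessarily the exact formula used by `mean_ap()`; the function name `average_precision` is hypothetical. Mean average precision would then be the mean of this value over all classes.

```python
def average_precision(outcomes, num_ground_truth):
    """Area under the precision-recall curve for one class (all-point interpolation)."""
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(outcomes, key=lambda outcome: outcome[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing as recall grows (the interpolation step).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_ground_truth=2)
```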
# Document Indexing and RAG
Source: https://docs.pixeltable.com/howto/use-cases/rag-demo
Complete RAG demo in Pixeltable that ingests PDFs, chunks text, builds embeddings, runs semantic search, and generates grounded LLM answers.
In this tutorial, we’ll demonstrate how RAG operations can be
implemented in Pixeltable. In particular, we’ll develop a RAG
application that summarizes a collection of PDF documents and uses
ChatGPT to answer questions about them.
In a traditional RAG workflow, such operations might be implemented as a
Python script that runs on a periodic schedule or in response to certain
events. In Pixeltable, they are implemented as persistent tables that
are updated automatically and incrementally as new data becomes
available.
We first set up our OpenAI API key:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')
```
We then install the packages we need for this tutorial and set up our
environment.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -q pixeltable sentence-transformers tiktoken openai openpyxl
```
Note: you may need to restart the kernel to use updated packages.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Ensure a clean slate for the demo
pxt.drop_dir('rag_demo', force=True)
pxt.create_dir('rag_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/sergeymkhitaryan/.pixeltable/pgdata
Created directory 'rag\_demo'.
Next we’ll create a table containing the sample questions we want to
answer. The questions are stored in an Excel spreadsheet, along with a
set of “ground truth” answers to help evaluate our model pipeline. We
can use `create_table()` with the `source` parameter to load them. Note
that we can pass the URL of the spreadsheet directly.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
base = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/rag-demo/'
qa_url = base + 'Q-A-Rag.xlsx'
queries_t = pxt.create_table('rag_demo/queries', source=qa_url)
```
Created table 'queries'.
Inserting rows into \`queries\`: 8 rows \[00:00, 2469.96 rows/s]
Inserted 8 rows with 0 errors.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
queries_t.head()
```
## Outline
There are two major parts to our RAG application:
1. Document Indexing: Load the documents, split them into chunks, and
index them using a vector embedding.
2. Querying: For each question on our list, do a top-k lookup for the
most relevant chunks, use them to construct a ChatGPT prompt, and
send the enriched prompt to an LLM.
We’ll implement both parts in Pixeltable.
## Document Indexing
All data in Pixeltable, including documents, resides in tables.
Tables are persistent containers that can serve as the store of record
for your data. Since we are starting from scratch, we will start with an
empty table `rag_demo/documents` with a single column, `document`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
documents_t = pxt.create_table(
'rag_demo/documents', {'document': pxt.Document}
)
documents_t
```
Created table 'documents'.
Next, we’ll insert our first few source documents into the new table.
We’ll leave the rest for later, in order to show how to update the
indexed document base incrementally.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
document_urls = [
base + 'Argus-Market-Digest-June-2024.pdf',
base + 'Argus-Market-Watch-June-2024.pdf',
base + 'Company-Research-Alphabet.pdf',
base + 'Jefferson-Amazon.pdf',
base + 'Mclean-Equity-Alphabet.pdf',
base + 'Zacks-Nvidia-Report.pdf',
]
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
documents_t.insert({'document': url} for url in document_urls[:3])
documents_t.show()
```
Inserting rows into \`documents\`: 3 rows \[00:00, 491.31 rows/s]
Inserted 3 rows with 0 errors.
In RAG applications, we often decompose documents into smaller units, or
chunks, rather than treating each document as a single entity. In this
example, we’ll use Pixeltable’s built-in `document_splitter`, but in
general the chunking methodology is highly customizable.
`document_splitter` has a variety of options for controlling the
chunking behavior, and it’s also possible to replace it entirely with a
user-defined iterator (or an adapter for a third-party document
splitter).
In Pixeltable, operations such as chunking can be automated by creating
**views** of the base `documents` table. A view is a virtual derived
table: rather than adding data directly to the view, we define it via a
computation over the base table. In this example, the view is defined by
iteration over the chunks of a `document_splitter`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.document import document_splitter
chunks_t = pxt.create_view(
    'rag_demo/chunks',
    documents_t,
    iterator=document_splitter(
        documents_t.document, separators='token_limit', limit=300
    ),
)
```
Inserting rows into \`chunks\`: 41 rows \[00:00, 20799.04 rows/s]
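To give a feel for what token-limit chunking does, here is a simplified sketch that treats whitespace-separated words as tokens. The real `document_splitter` works on model tokens and may pick boundaries differently, so this is an assumption-laden stand-in; the helper `chunk_by_token_limit` is hypothetical. Each chunk is paired with a position, similar to the `pos` column of a view.

```python
def chunk_by_token_limit(text, limit):
    """Split text into (pos, chunk) rows of at most `limit` whitespace tokens each."""
    tokens = text.split()
    chunks = [
        ' '.join(tokens[start:start + limit])
        for start in range(0, len(tokens), limit)
    ]
    return list(enumerate(chunks))


chunks = chunk_by_token_limit('a b c d e', limit=2)
```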
Our `chunks` view now has 3 columns:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
chunks_t
```
* `text` is the chunk text produced by the `document_splitter`
* `pos` is a system-generated integer column, starting at 0, that
provides a sequence number for each row
* `document`, which is simply the `document` column from the base table
`documents`. We won’t need it here, but having access to the base
table’s columns (in effect a parent-child join) can be quite useful.
Notice that as soon as we created it, `chunks` was automatically
populated with data from the existing documents in our base table. We
can select the first 2 chunks from each document using common query
operations, in order to get a feel for what was extracted:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
chunks_t.where(chunks_t.pos < 2).show()
```
Now let’s compute vector embeddings for the document chunks and store
them in a vector index. Pixeltable has built-in support for vector
indexing using a variety of embedding model families, and it’s easy for
users to add new ones via UDFs. In this demo, we’re going to use the E5
model from the Huggingface `sentence_transformers` library, which runs
locally.
The following command creates a vector index on the `text` column in the
`chunks` table, using the E5 embedding model. (For details on index
creation, see the [Embedding and Vector
Indices](https://github.com/pixeltable/pixeltable/blob/release/docs/platform/embedding-indexes.ipynb)
guide.) Note that defining the index is sufficient in order to load it
with the existing data (and also to update it when the underlying data
changes, as we’ll see later).
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import sentence_transformer
chunks_t.add_embedding_index(
'text',
embedding=sentence_transformer.using(model_id='intfloat/e5-large-v2'),
)
```
This completes the first part of our application, creating an indexed
document base. Next, we’ll use it to run some queries.
## Querying
In order to express a top-k lookup against our index, we use
Pixeltable’s `similarity` operator in combination with the standard
`order_by` and `limit` operations. Before building this into our
application, let’s run a sample query to make sure it works.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query_text = 'What is the expected EPS for Nvidia in Q1 2026?'
sim = chunks_t.text.similarity(string=query_text)
nvidia_eps_query = (
    chunks_t.order_by(sim, asc=False)
    .select(similarity=sim, text=chunks_t.text)
    .limit(5)
)
nvidia_eps_query.collect()
```
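Conceptually, the `order_by(sim, asc=False)` plus `limit(5)` pattern is a top-k nearest-neighbor lookup. The following pure-Python sketch illustrates the idea on toy 2-D vectors, assuming cosine similarity as the metric. It is not how Pixeltable executes the query (the vector index does that work), and the helper names `cosine_similarity` and `top_k_chunks` as well as the sample vectors are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunks, k=5):
    """Score every chunk, sort by descending similarity, keep the first k."""
    scored = [
        {'text': text, 'sim': cosine_similarity(query_vec, vec)}
        for text, vec in chunks
    ]
    scored.sort(key=lambda row: row['sim'], reverse=True)
    return scored[:k]


# Toy "embeddings": (text, vector) pairs invented for this example
toy_chunks = [
    ('chunk about EPS guidance', [0.9, 0.1]),
    ('chunk about weather', [0.0, 1.0]),
    ('chunk about revenue', [0.7, 0.3]),
]
results = top_k_chunks([1.0, 0.0], toy_chunks, k=2)
print([row['text'] for row in results])
```

A brute-force scan like this is linear in the number of chunks; the point of the embedding index is to avoid scoring every row on each query.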
We perform this context retrieval for each row of our `queries` table by
adding it as a computed column. In this case, the operation is a top-k
similarity lookup against the data in the `chunks` table. To implement
this operation, we’ll use Pixeltable’s `@query` decorator to enhance the
capabilities of the `chunks` table.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# A @query is essentially a reusable, parameterized query that is attached to a table (or view),
# which is a modular way of getting data from that table.
@pxt.query
def top_k(query_text: str):
    sim = chunks_t.text.similarity(string=query_text)
    return (
        chunks_t.order_by(sim, asc=False)
        .select(chunks_t.text, sim=sim)
        .limit(5)
    )
# Now add a computed column to `queries_t`, calling the query
# `top_k` that we just defined.
queries_t.add_computed_column(question_context=top_k(queries_t.Question))
```
Our `queries` table now looks like this:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
queries_t
```
The new column `question_context` now contains the result of executing
the query for each row, formatted as a list of dictionaries:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
queries_t.select(queries_t.question_context).head(1)
```
### Asking the LLM
Now it’s time for the final step in our application: feeding the
document chunks and questions to an LLM for resolution. In this demo,
we’ll use OpenAI for this, but any other inference cloud or local model
could be used instead.
We start by defining a UDF that takes a top-k list of context chunks and
a question and turns them into a ChatGPT prompt.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Define a UDF to create an LLM prompt given a top-k list of
# context chunks and a question.
@pxt.udf
def create_prompt(top_k_list: list[dict], question: str) -> str:
    concat_top_k = '\n\n'.join(
        elt['text'] for elt in reversed(top_k_list)
    )
    return f"""
PASSAGES:
{concat_top_k}
QUESTION:
{question}"""
```
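To see what this produces, here is the same string logic as a plain Python function (no `@pxt.udf` decorator) applied to a hand-written context list. The sample rows and the name `build_prompt` are hypothetical; the dict keys `text` and `sim` mirror the `select` in `top_k`. Note the effect of `reversed(...)`: results arrive most-similar-first, so the best match ends up last, closest to the question.

```python
def build_prompt(top_k_list: list[dict], question: str) -> str:
    # Results arrive most-similar-first; reversing puts the best match last.
    passages = '\n\n'.join(elt['text'] for elt in reversed(top_k_list))
    return f'\nPASSAGES:\n{passages}\nQUESTION:\n{question}'


# Hand-written stand-in for one row of `question_context`
sample_context = [
    {'text': 'Most relevant passage.', 'sim': 0.86},
    {'text': 'Less relevant passage.', 'sim': 0.79},
]
prompt = build_prompt(sample_context, 'What is the expected EPS?')
print(prompt)
```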
We then add that again as a computed column to `queries`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
queries_t.add_computed_column(
    prompt=create_prompt(queries_t.question_context, queries_t.Question)
)
```
We now have a new string column containing the prompt:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
queries_t
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
queries_t.select(queries_t.prompt).head(1)
```
We now add another computed column to call OpenAI. For the
`chat_completions()` call, we need to construct two messages, containing
the instructions to the model and the prompt. For the latter, we can
simply reference the `prompt` column we just added.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import openai
# Assemble the prompt and instructions into OpenAI's message format
messages = [
    {
        'role': 'system',
        'content': 'Please read the following passages and answer the question based on their contents.',
    },
    {'role': 'user', 'content': queries_t.prompt},
]
# Add a computed column that calls OpenAI
queries_t.add_computed_column(
    response=openai.chat_completions(
        model='gpt-4o-mini', messages=messages
    )
)
```
Our `queries` table now contains a JSON-structured column `response`,
which holds the entire API response structure. At the moment, we’re only
interested in the response content, which we can extract easily into
another computed column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
queries_t.add_computed_column(
    answer=queries_t.response.choices[0].message.content
)
```
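The expression `queries_t.response.choices[0].message.content` is a path into the JSON response. On a plain Python dict, the same traversal looks like the sketch below. The sample payload is a hand-written, heavily abbreviated stand-in for a chat completion response, not actual API output.

```python
# Abbreviated, hand-written stand-in for a chat completion response
sample_response = {
    'id': 'chatcmpl-example',  # hypothetical ID
    'choices': [
        {
            'index': 0,
            'message': {'role': 'assistant', 'content': 'An example answer.'},
            'finish_reason': 'stop',
        }
    ],
}

# Same path as the computed column: choices[0].message.content
answer = sample_response['choices'][0]['message']['content']
print(answer)
```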
We now have the following `queries` schema:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
queries_t
```
Let’s take a look at what we got back:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
queries_t.select(
    queries_t.Question, queries_t.correct_answer, queries_t.answer
).show()
```
The application works, but, as expected, a few questions couldn’t be
answered due to the missing documents. As a final step, let’s add the
remaining documents to our document base, and run the queries again.
## Incremental Updates
Pixeltable’s views and computed columns update automatically in response
to new data. We can see this when we add the remaining documents to our
`documents` table. Watch how the `chunks` view is updated to stay in
sync with `documents`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
documents_t.insert({'document': p} for p in document_urls[3:])
```
Inserting rows into \`documents\`: 3 rows \[00:00, 569.05 rows/s]
Inserting rows into \`chunks\`: 67 rows \[00:00, 325.91 rows/s]
Inserted 70 rows with 0 errors.
70 rows inserted, 6 values computed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
documents_t.show()
```
(Note: although Pixeltable updates `documents` and `chunks`, it **does
not** automatically update the `queries` table. This is by design: we
don’t want all rows in `queries` to get automatically re-executed every
time a single new document is added to the document base. However,
newly-added rows will be run over the new, incrementally-updated index.)
To confirm that the `chunks` index got updated, we’ll re-run the chunks
retrieval query for the question
`What is the expected EPS for Nvidia in Q1 2026?`
Previously, our most similar chunk had a similarity score of \~0.8. Let’s
see what we get now:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
nvidia_eps_query.collect()
```
Our most similar chunk now has a score of \~0.855 and pulls in more
relevant chunks from the newly-inserted documents.
Let’s recompute the `question_context` column of the `queries_t` table,
which will automatically recompute the `answer` column as well.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
queries_t.recompute_columns('question_context')
```
As a final step, let’s confirm that all the queries now have answers:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
queries_t.select(
    queries_t.Question, queries_t.correct_answer, queries_t.answer
).show()
```
# RAG Operations in Pixeltable
Source: https://docs.pixeltable.com/howto/use-cases/rag-operations
Operate production RAG pipelines in Pixeltable with incremental indexing, versioning, evaluation, and observability over document collections.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
In this tutorial, we’ll explore Pixeltable’s flexible handling of RAG
operations on unstructured text. In a traditional AI workflow, such
operations might be implemented as a Python script that runs on a
periodic schedule or in response to certain events. In Pixeltable, as
with everything else, they are implemented as persistent table
operations that update incrementally as new data becomes available. In
our tutorial workflow, we’ll chunk PDF documents in various ways with a
document splitter, then apply several kinds of embeddings to the chunks.
## Set Up the Table Structure
We start by installing the necessary dependencies, creating a Pixeltable
directory `rag_ops_demo` (if it doesn’t already exist), and setting up
the table structure for our new workflow.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable sentence-transformers spacy tiktoken
!python -m spacy download en_core_web_sm -q
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Ensure a clean slate for the demo
pxt.drop_dir('rag_ops_demo', force=True)
# Create the Pixeltable workspace
pxt.create_dir('rag_ops_demo')
```
## Creating Tables and Views
Now we’ll create the tables that represent our workflow, starting with a
table to hold references to source documents. The table contains a
single column `source_doc` whose elements have type `pxt.Document`,
representing a general document instance. In this tutorial, we’ll be
working with PDF documents, but Pixeltable supports a range of other
document types, such as Markdown and HTML.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
docs = pxt.create_table('rag_ops_demo/docs', {'source_doc': pxt.Document})
```
Created table 'docs'.
If we take a peek at the `docs` table, we see its very simple structure.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
docs
```
Next we create a view to represent chunks of our PDF documents. A
Pixeltable view is a virtual table, which is dynamically derived from a
source table by applying a transformation and/or selecting a subset of
data. In this case, our view represents a one-to-many transformation
from source documents into individual sentences. This is achieved using
Pixeltable’s built-in `document_splitter` iterator.
Note that the `docs` table is currently empty, so creating this view
doesn’t actually *do* anything yet: it simply defines an operation that
we want Pixeltable to execute when it sees new data.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.document import document_splitter
sentences = pxt.create_view(
    'rag_ops_demo/sentences',  # Name of the view
    docs,  # Table from which the view is derived
    iterator=document_splitter(
        docs.source_doc,
        separators='sentence',  # Chunk docs into sentences
        metadata='title,heading,sourceline',
    ),
)
```
Let’s take a peek at the new `sentences` view.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sentences
```
We see that `sentences` inherits the `source_doc` column from `docs`,
together with some new fields:
* `pos`: The position in the source document where the sentence appears.
* `text`: The text of the sentence.
* `title`, `heading`, and `sourceline`: The metadata we requested when
we set up the view.
## Data Ingestion
OK, now it’s time to insert some data into our workflow. A document in
Pixeltable is just a URL; the following command inserts a single row
into the `docs` table with the `source_doc` field set to the specified
URL:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
docs.insert(
    [
        {
            'source_doc': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/rag-demo/Argus-Market-Digest-June-2024.pdf'
        }
    ]
)
```
Inserting rows into \`docs\`: 1 rows \[00:00, 292.76 rows/s]
Inserting rows into \`sentences\`: 217 rows \[00:00, 42910.00 rows/s]
Inserted 218 rows with 0 errors.
218 rows inserted, 2 values computed.
We can see that two things happened. First, a single row was inserted
into `docs`, containing the URL representing our source PDF. Then, the
view `sentences` was incrementally updated by applying the
`document_splitter` according to the definition of the view. This
illustrates an important principle in Pixeltable: by default, anytime
Pixeltable sees new data, the update is incrementally propagated to any
downstream views or computed columns.
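The principle can be pictured with a toy, pure-Python model: a derived "view" that processes only the documents passed to each insert and leaves previously derived rows alone. This is a conceptual sketch only; `ToySentenceView` and its naive regex sentence splitter are invented for illustration and say nothing about how Pixeltable is implemented internally.

```python
import re


class ToySentenceView:
    """Toy stand-in for a view that derives sentence rows from documents."""

    def __init__(self):
        self.rows = []
        self.docs_processed = 0

    def on_insert(self, new_docs: list[str]) -> int:
        # Only the newly inserted documents are split; existing rows are untouched.
        added = 0
        for doc in new_docs:
            sentences = re.split(r'(?<=[.!?])\s+', doc.strip())
            for pos, sentence in enumerate(sentences):
                self.rows.append({'pos': pos, 'text': sentence})
                added += 1
        self.docs_processed += len(new_docs)
        return added


view = ToySentenceView()
first = view.on_insert(['One. Two. Three.'])
second = view.on_insert(['Four. Five.'])
print(first, second, len(view.rows))
```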
We can see the effect of the insertion with the `select` command.
There’s a single row in `docs`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
docs.select(docs.source_doc.fileurl).show()
```
And here are the first 20 rows in `sentences`. The content of the PDF is
broken into individual sentences, as expected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sentences.select(sentences.text, sentences.heading).show(20)
```
## Experimenting with Chunking
Of course, chunking into sentences isn’t the only way to split a
document. Perhaps we want to experiment with different chunking
methodologies, in order to see which one performs best in a particular
application. Pixeltable makes it easy to do this, by creating several
views of the same source table. Here are a few examples. Notice that as
each new view is created, it is initially populated from the data
already in `docs`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
chunks = pxt.create_view(
    'rag_ops_demo/chunks',
    docs,
    iterator=document_splitter(
        docs.source_doc,
        separators='sentence,token_limit',
        limit=2048,
        overlap=0,
        metadata='title,heading,sourceline',
    ),
)
```
Inserting rows into \`chunks\`: 217 rows \[00:00, 47827.85 rows/s]
Inserting rows into \`short\_char\_chunks\`: 459 rows \[00:00, 63241.10 rows/s]
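The row counts hint at how the limit interacts with sentence splitting: with `limit=2048`, `chunks` has the same 217 rows as `sentences`, which suggests the limit only comes into play when a sentence exceeds it, while the character-limited view has more rows. The toy sketch below illustrates that reading in plain Python. It is an interpretation, not Pixeltable's implementation: it counts whitespace-separated words instead of real tokens, uses a naive regex sentence splitter, and all function names are invented.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def limit_by_words(sentence: str, limit: int) -> list[str]:
    """Break a sentence into pieces of at most `limit` whitespace-separated words."""
    words = sentence.split()
    return [' '.join(words[i:i + limit]) for i in range(0, len(words), limit)]


def limit_by_chars(sentence: str, limit: int) -> list[str]:
    """Break a sentence into pieces of at most `limit` characters."""
    return [sentence[i:i + limit] for i in range(0, len(sentence), limit)]


text = 'Stocks rose. Bonds fell sharply today.'
sentences = split_sentences(text)

# A generous limit leaves every sentence intact
generous = [p for s in sentences for p in limit_by_words(s, 2048)]
# Tight limits split long sentences into several rows
short_words = [p for s in sentences for p in limit_by_words(s, 3)]
short_chars = [p for s in sentences for p in limit_by_chars(s, 10)]

print(generous)
print(short_words)
print(short_chars)
```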
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
chunks.select(chunks.text, chunks.heading).show(20)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
short_chunks.select(short_chunks.text, short_chunks.heading).show(20)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
short_char_chunks.select(
    short_char_chunks.text, short_char_chunks.heading
).show(20)
```
Now let’s add a few more documents to our workflow. Notice how all of
the downstream views are updated incrementally, processing just the new
documents as they are inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
urls = [
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/rag-demo/Argus-Market-Watch-June-2024.pdf',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/rag-demo/Company-Research-Alphabet.pdf',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/rag-demo/Zacks-Nvidia-Report.pdf',
]
docs.insert({'source_doc': url} for url in urls)
```
Inserting rows into \`docs\`: 3 rows \[00:00, 1969.77 rows/s]
Inserting rows into \`chunks\`: 742 rows \[00:00, 61926.41 rows/s]
Inserting rows into \`short\_chunks\`: 747 rows \[00:00, 67743.68 rows/s]
Inserting rows into \`sentences\`: 742 rows \[00:00, 67949.90 rows/s]
Inserting rows into \`short\_char\_chunks\`: 1165 rows \[00:00, 3603.41 rows/s]
Inserted 3399 rows with 0 errors.
3399 rows inserted, 6 values computed.
## Further Experiments
This is a good time to mention another important guiding principle of
Pixeltable. The preceding examples all used the built-in
`document_splitter` iterator with various configurations. That’s probably
fine as a first cut or to prototype an application quickly, and it might
be sufficient for some applications. But other applications might want
to do more sophisticated kinds of chunking, implementing their own
specialized logic or leveraging third-party tools. Pixeltable imposes no
constraints on the AI or RAG operations a workflow uses: the iterator
interface is highly general, and it’s easy to implement new operations
or adapt existing code or third-party tools into the Pixeltable
workflow.
## Computing Embeddings
Next, let’s look at how embedding indices can be added seamlessly to
existing Pixeltable workflows. To compute our embeddings, we’ll use the
Hugging Face `sentence_transformers` package, running it over the `chunks`
view that broke our documents up into sentence-based chunks. Pixeltable
has a built-in `sentence_transformer` adapter, and all we have to do is
add a new column that leverages it. Pixeltable takes care of the rest,
applying the new column to all existing data in the view.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import sentence_transformer
chunks.add_computed_column(
    minilm_embed=sentence_transformer(
        chunks.text, model_id='paraphrase-MiniLM-L6-v2'
    )
)
```
The new column is a *computed column*: it is defined as a function on
top of existing data and updated incrementally as new data are added to
the workflow. Let’s have a look at how the new column affected the
`chunks` view.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
chunks
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
chunks.select(chunks.text, chunks.heading, chunks.minilm_embed).head()
```
Similarly, we might want to add a CLIP embedding to our workflow; once
again, it’s just another computed column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import clip
chunks.add_computed_column(
    clip_embed=clip(chunks.text, model_id='openai/clip-vit-base-patch32')
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
chunks
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
chunks.select(chunks.text, chunks.heading, chunks.clip_embed).head()
```
# Using Label Studio for Annotations with Pixeltable
Source: https://docs.pixeltable.com/howto/using-label-studio-with-pixeltable
Send Pixeltable video and image data to Label Studio for human annotation, then sync labeled tasks back into your tables for ML training.
This tutorial demonstrates how to integrate Pixeltable with Label
Studio, in order to provide seamless management of annotation data
across the annotation workflow. We’ll assume that you’re at least
somewhat familiar with Pixeltable and have read the [10-Minute
Tour](/overview/ten-minute-tour) tutorial.
**This tutorial can only be run in a local Pixeltable installation, not
in Colab or Kaggle**, since it relies on spinning up a locally running
Label Studio instance. See the [Quick
Start](/overview/quick-start) guide for
instructions on how to set up a local Pixeltable instance.
To begin, let’s ensure the requisite dependencies are installed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable label-studio label-studio-sdk torch transformers
```
## Set Up Label Studio
Now let’s spin up a Label Studio server process. (If you’re already
running Label Studio, you can choose to skip this step, and instead
enter your existing Label Studio URL and access token in the subsequent
step.) Be patient, as it may take a minute or two to start.
This will open a new browser window containing the Label Studio
interface. If you’ve never run Label Studio before, you’ll need to
create an account; a link to create one will appear in the Label Studio
browser window. **Everything is running locally in this tutorial, so the
account will exist only on your local system.**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import subprocess
ls_process = subprocess.Popen(['label-studio'], stderr=subprocess.PIPE)
```
January 23, 2026 - 01:41:50
Django version 5.1.15, using settings 'label\_studio.core.settings.label\_studio'
Starting development server at [http://0.0.0.0:8080/](http://0.0.0.0:8080/)
Quit the server with CONTROL-C.
If for some reason the Label Studio browser window failed to open, you
can always access it at: [http://localhost:8080/](http://localhost:8080/)
Once you’ve created an account in Label Studio, you’ll need to locate
your API key. In the Label Studio browser window, log in, click
“Organization”, “API Tokens Settings”, and enable “Legacy Tokens”. Then
click on “Account & Settings” in the top right, click “Legacy Token”,
and copy the Access Token from the interface.
## Configure Pixeltable
Next, we configure Pixeltable to communicate with Label Studio. Run the
following command, pasting in the API key that you copied from the Label
Studio interface.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'LABEL_STUDIO_URL' not in os.environ:
    os.environ['LABEL_STUDIO_URL'] = 'http://localhost:8080/'
if 'LABEL_STUDIO_API_KEY' not in os.environ:
    os.environ['LABEL_STUDIO_API_KEY'] = getpass.getpass(
        'Label Studio API key: '
    )
```
## Create a Table to Store Videos
Now we create the master table that will hold our videos to be
annotated. This only needs to be done once, when we initially set up the
workflow.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
schema = {'video': pxt.Video, 'date': pxt.Timestamp}
# Before creating the table, we drop the `ls_demo` dir and all its contents,
# in order to ensure a clean environment for the demo.
pxt.drop_dir('ls_demo', force=True)
pxt.create_dir('ls_demo')
videos_table = pxt.create_table('ls_demo/videos', schema)
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'ls\_demo'.
Created table 'videos'.
## Populate It with Data
Now let’s add some videos to the table to populate it. For this
tutorial, we’ll use some randomly selected videos from the Multimedia
Commons archive. The table also contains a `date` field, for which we’ll
use a fixed date (but in a production setting, it would typically be the
date on which the video was imported).
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from datetime import datetime
url_prefix = 'http://multimedia-commons.s3-website-us-west-2.amazonaws.com/data/videos/mp4/'
files = [
    '122/8ff/1228ff94bf742242ee7c88e4769ad5d5.mp4',
    '2cf/a20/2cfa205eae979b31b1144abd9fa4e521.mp4',
    'ffe/ff3/ffeff3c6bf57504e7a6cecaff6aefbc9.mp4',
]
today = datetime(2024, 4, 22)
videos_table.insert(
    {'video': url_prefix + file, 'date': today} for file in files
)
```
Inserted 3 rows with 0 errors in 1.07 s (2.81 rows/s)
3 rows inserted.
Let’s have a look at the table now.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
videos_table.head()
```
## Create a Label Studio Project
Next we’ll create a new Label Studio project and link it to a new view
on the Pixeltable table. You can link a Label Studio project to either a
table or a view. For tables that are expecting a lot of input data, it’s
often easier to link to views. In this example, we’ll create a view that
filters the table down by date.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a view to filter on the specified date
v = pxt.create_view(
    'ls_demo/videos_2024_04_22',
    videos_table.where(videos_table.date == today),
)
# Create a new Label Studio project and link it to the view. The
# configuration uses Label Studio's standard XML format. This only
# needs to be done once: after the view and project are linked,
# the relationship is stored indefinitely in Pixeltable's metadata.
label_config = """
"""
pxt.io.create_label_studio_project(v, label_config)
```
Added 3 column values with 0 errors in 0.01 s (355.10 rows/s)
Added 3 column values with 0 errors in 0.02 s (146.19 rows/s)
Linked external store 'ls\_project\_0' to table 'videos\_2024\_04\_22'.
Created 3 new task(s) in LabelStudioProject \`videos\_2024\_04\_22\`.
No rows affected.
If you look in the Label Studio UI now, you’ll see that there’s a new
project with the name `videos_2024_04_22`, with three tasks, one for
each of the videos in the view. If you want to create the project
without populating it with tasks (yet), you can set
`sync_immediately=False` in the call to `create_label_studio_project()`.
You can always sync the table and project by calling `v.sync()`.
Note also that we didn’t have to specify an explicit mapping between
Pixeltable columns and Label Studio data fields. This is because, by
default, Pixeltable assumes the Pixeltable and Label Studio field names
coincide. The data field in the Label Studio project has the name
`$video`, which Pixeltable maps, by default, to the column in
`ls_demo/videos_2024_04_22` that is also called `video`. If you want to
override this behavior to specify an explicit mapping of columns to
fields, you can do that with the `col_mapping` parameter of
`create_label_studio_project()`.
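The default name-matching rule can be pictured as follows: collect the `$name` variables referenced in the labeling configuration, then pair each with the column of the same name. The sketch below does this with the standard library's XML parser. The configuration shown is a hypothetical stand-in that is merely consistent with the description in this tutorial (a `$video` data field and three categories), and `data_fields` and `default_col_mapping` are illustrative helpers, not Pixeltable APIs.

```python
import xml.etree.ElementTree as ET

# Hypothetical labeling config: one video data field plus a three-way choice
example_config = """
<View>
  <Video name="video" value="$video"/>
  <Choices name="video-category" toName="video">
    <Choice value="city"/>
    <Choice value="food"/>
    <Choice value="sports"/>
  </Choices>
</View>
"""


def data_fields(config_xml: str) -> list[str]:
    """Return the data field names referenced as `$name` in `value` attributes."""
    root = ET.fromstring(config_xml.strip())
    return [
        elem.attrib['value'][1:]
        for elem in root.iter()
        if elem.attrib.get('value', '').startswith('$')
    ]


def default_col_mapping(config_xml: str, columns: list[str]) -> dict[str, str]:
    """Map each column to the identically named data field, if one exists."""
    fields = set(data_fields(config_xml))
    return {col: col for col in columns if col in fields}


print(default_col_mapping(example_config, ['video', 'date']))
```

Here the `date` column is left out of the mapping because no data field in the configuration refers to it.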
Inspecting the view, we also see that Pixeltable created an additional
column on the view, `annotations`, which will hold the output of our
annotations workflow. The name of the output column can also be
overridden by specifying a dict entry in `col_mapping` of the form
`{'my_col_name': 'annotations'}`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v
```
## Add Some Annotations
Now, let’s add some annotations to our Label Studio project to simulate
a human-in-the-loop workflow. In the Label Studio UI, click on the new
`videos_2024_04_22` project, and click on any of the three tasks. Select
the appropriate category (“city”, “food”, or “sports”), and click
“Submit”.
## Import the Annotations Back to Pixeltable
Now let’s try importing annotations from Label Studio back to our view.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v = pxt.get_table('ls_demo/videos_2024_04_22')
v.sync()
```
Created 0 new task(s) in LabelStudioProject \`videos\_2024\_04\_22\`.
Updated annotation(s) from 3 task(s) in LabelStudioProject \`videos\_2024\_04\_22\`.
3 rows updated.
Let’s see what effect that had. You’ll see that any videos that you
annotated now have their `annotations` field populated in the view.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v.select(v.video, v.annotations).head()
```
## Parse Annotations with a Computed Column
Pixeltable pulls in all sorts of metadata from Label Studio during a
sync: everything that Label Studio reports back about the annotations,
including things like the user account that created the annotations.
Let’s say that all we care about is the annotation value. We can add a
computed column to our table to pull it out.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v.add_computed_column(
    video_category=v.annotations[0].result[0].value.choices[0]
)
v.select(v.video, v.annotations, v.video_category).head()
```
Added 3 column values with 0 errors in 0.02 s (143.55 rows/s)
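For reference, this is the shape of data that the path `annotations[0].result[0].value.choices[0]` walks through. The sample below is a hand-written, abbreviated stand-in for one row's `annotations` value (real payloads carry many more fields), and `first_choice` is a hypothetical helper. In plain Python, the lookup has to guard explicitly against rows that have not been annotated yet.

```python
# Hand-written, abbreviated stand-in for one row's `annotations` value
sample_annotations = [
    {
        'id': 1,
        'result': [
            {
                'from_name': 'video-category',
                'to_name': 'video',
                'type': 'choices',
                'value': {'choices': ['sports']},
            }
        ],
    }
]


def first_choice(annotations):
    """Return the first selected choice, or None if the row is unannotated."""
    try:
        return annotations[0]['result'][0]['value']['choices'][0]
    except (IndexError, KeyError, TypeError):
        return None


print(first_choice(sample_annotations))
print(first_choice(None))
```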
Another useful operation is the `get_metadata` function, which returns
information about the video itself, such as the resolution and codec
(independent of Label Studio). Let’s add another computed column to hold
such metadata.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.video import get_metadata
v.add_computed_column(video_metadata=get_metadata(v.video))
v.select(
    v.video, v.annotations, v.video_category, v.video_metadata
).head()
```
Added 3 column values with 0 errors in 0.03 s (115.36 rows/s)
## Preannotations with Pixeltable and Label Studio
Frame extraction is another common operation in labeling workflows. In
this example, we’ll extract frames from our videos into a view, then use
an object detection model to generate preannotations for each frame. The
following code uses a Pixeltable `frame_iterator` to automatically
extract frames into a new view, which we’ll call `frames_2024_04_22`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from datetime import datetime
from pixeltable.functions.video import frame_iterator
today = datetime(2024, 4, 22)
videos_table = pxt.get_table('ls_demo/videos')
# Create the view, using a `frame_iterator` to extract frames with a sample rate
# of `fps=0.25`, or 1 frame per 4 seconds of video. Setting `fps=0` would use the
# native framerate of the video, extracting every frame.
frames = pxt.create_view(
    'ls_demo/frames_2024_04_22',
    videos_table.where(videos_table.date == today),
    iterator=frame_iterator(videos_table.video, fps=0.25),
)
```
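As a rough guide to how many frames a given sampling rate yields, the arithmetic can be sketched in plain Python. This is an idealized model that assumes sampling starts at time 0 and proceeds at exact intervals; the actual extractor may differ at boundaries, and `sample_timestamps` is a hypothetical helper rather than a Pixeltable function.

```python
def sample_timestamps(duration_s: float, fps: float) -> list[float]:
    """Timestamps (in seconds) of frames sampled at a fixed rate, starting at 0."""
    if fps <= 0:
        raise ValueError('this sketch only handles a positive sampling rate')
    interval = 1.0 / fps  # fps=0.25 means one frame every 4 seconds
    timestamps = []
    n = 0
    while n * interval < duration_s:
        timestamps.append(n * interval)
        n += 1
    return timestamps


# A 15-second clip sampled at fps=0.25
print(sample_timestamps(15.0, 0.25))
```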
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Show just the first 3 frames in the table, to avoid cluttering the notebook
frames.select(frames.frame).head(3)
```
Now we’ll use the DETR object detection model (with a ResNet-50
backbone) to generate preannotations. We do this by creating a new
computed column.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import detr_for_object_detection
# Run the DETR (ResNet-50 backbone) object detection model against each frame to generate bounding boxes
frames.add_computed_column(
    detections=detr_for_object_detection(
        frames.frame, model_id='facebook/detr-resnet-50', threshold=0.95
    )
)
frames.select(frames.frame, frames.detections).head(3)
```
Added 11 column values with 0 errors in 9.71 s (1.13 rows/s)
We’d like to send these detections to Label Studio as preannotations,
but they’re not quite ready. Label Studio expects preannotations in
standard COCO format, but the Hugging Face library outputs them in its
own custom format. We can use Pixeltable’s handy `detr_to_coco` function
to do the conversion, using another computed column.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import detr_to_coco
frames.add_computed_column(
    preannotations=detr_to_coco(frames.frame, frames.detections)
)
frames.select(
    frames.frame, frames.detections, frames.preannotations
).head(3)
```
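The core of such a conversion is a change of bounding-box convention. The sketch below assumes the detector reports boxes as corner coordinates `[x_min, y_min, x_max, y_max]` and that the target is COCO's `[x, y, width, height]`. The field names in the sample dict and the output layout are assumptions made for illustration; the exact schema produced by `detr_to_coco` is not shown here, and `corners_to_xywh` is a hypothetical helper.

```python
def corners_to_xywh(box: list[float]) -> list[float]:
    """Convert [x_min, y_min, x_max, y_max] to COCO-style [x, y, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


# Hypothetical detection output for one frame (field names are assumptions)
sample_detections = {
    'label_text': ['car', 'person'],
    'boxes': [[10.0, 20.0, 110.0, 70.0], [200.0, 40.0, 230.0, 120.0]],
}

coco_style = [
    {'bbox': corners_to_xywh(box), 'category': label}
    for box, label in zip(
        sample_detections['boxes'], sample_detections['label_text']
    )
]
print(coco_style)
```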
## Create a Label Studio Project for Frames
With our data workflow set up and the COCO preannotations prepared, all
that’s left is to create a corresponding Label Studio project. Note how
Pixeltable automatically maps `RectangleLabels` preannotation fields to
columns, just like it does with data fields. Here, Pixeltable interprets
the `name="preannotations"` attribute in `RectangleLabels` to mean, “map
these rectangle labels to the `preannotations` column in my linked table
or view”.
The Label values `car`, `person`, and `train` are standard COCO object
identifiers used by many off-the-shelf object detection models. You can
find the complete list of them here, and include as many as you wish:
[https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/coco-categories.csv](https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/coco-categories.csv)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
frames_config = """
"""
pxt.io.create_label_studio_project(frames, frames_config)
```
If you go into Label Studio and open up the new project, you can see the
effect of adding the preannotations from the DETR model to our workflow.
## Incremental Updates
As we saw in the [10-Minute
Tour](/overview/ten-minute-tour) tutorial,
adding new data to Pixeltable results in incremental updates of
everything downstream. We can see this by inserting a new video into our
base videos table: all of the downstream views and computed columns are
updated automatically, including the video metadata, frames, and
preannotations.
The update may take some time, so please be patient (it involves a
sequence of operations, including frame extraction and object
detection).
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
videos_table.insert(
    video=url_prefix + '22a/948/22a9487a92956ac453a9c15e0fc4dd4.mp4',
    date=today,
)
```
Note that the incremental updates do *not* automatically sync the
`Table` with the remote Label Studio projects. To issue a sync, we have
to call the `sync()` methods separately. Note that tasks will be created
only for the *newly added* rows in the videos and frames views, not the
existing ones.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v.sync()
frames.sync()
```
## Deleting a Project
To remove a Label Studio project from a table or view, use
`unlink_external_stores()`, as demonstrated by the following example. If
you specify `delete_external_data=True`, then the Label Studio project
will also be deleted, along with all existing data and annotations (be
careful!). If `delete_external_data=False`, then the Label Studio project
will be unlinked from Pixeltable, but the project and data will remain
in Label Studio (so you’ll need to delete the project manually if you
later want to get rid of it).
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v.external_stores # Get a list of all external stores for `v`
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v.unlink_external_stores('ls_project_0', delete_external_data=True)
```
## Configuring `media_import_method`
All of the examples so far in this tutorial use HTTP file uploads to
send media data to Label Studio. This is the simplest method and the
easiest to configure, but it’s undesirable for complex projects or
projects with a lot of data. In fact, the Label Studio documentation
includes this specific warning: “Uploading data works fine for proof of
concept projects, but it is not recommended for larger projects.”
In Pixeltable, you can configure linked Label Studio projects to use
URLs for media data (instead of file uploads) by specifying the
`media_import_method='url'` argument in `create_label_studio_project`.
This is recommended for all production applications, and is mandatory
for projects whose input configuration is more complex than a single
media file (in the Label Studio parlance, projects with more than one
“data key”).
If `media_import_method='url'`, then Pixeltable will simply pass the
media data URLs directly to Label Studio. If the URLs are `http://` or
`https://` URLs, then nothing more needs to be done.
Label Studio also supports `s3://` URLs with credentialed access. To use
them, you’ll need to configure access to your bucket in the project
configuration. The simplest way to do this is by specifying an
`s3_configuration` in `create_label_studio_project`. Here’s an example,
though it won’t work directly in this demo notebook, since it relies on
having an access key. (If your AWS credentials are stored in
`~/.aws/credentials`, then you can omit the access key and secret, and
Pixeltable will fill them in automatically.)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.io.create_label_studio_project(
    v,
    label_config,
    media_import_method='url',
    s3_configuration={
        'bucket': 'pxt-test',
        'aws_access_key_id': my_key,
        'aws_secret_access_key': my_secret,
    },
)
```
Before you can set up credentialed S3 access, you’ll need to configure
your S3 bucket to work with Label Studio; the details on how to do this
are described here:
* [Label Studio Docs: Amazon
S3](https://labelstud.io/guide/storage.html#Amazon-S3)
For the full documentation on `create_label_studio_project` usage, see:
* [Pixeltable SDK Docs:
create\_label\_studio\_project()](/sdk/latest/io#func-create_label_studio_project)
## Notebook Cleanup
That’s the end of the tutorial! To conclude, let’s terminate the running
Label Studio process. (Of course, feel free to leave it running if you
want to play around with it some more.)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
ls_process.kill()
```
# Working with Voxel51 for Visualization in Pixeltable
Source: https://docs.pixeltable.com/howto/working-with-fiftyone
Visualize Pixeltable images, detections, and embeddings interactively in the Voxel51 FiftyOne app for dataset exploration and quality review.
This documentation page is also available as an interactive notebook that can be launched in
Kaggle or Colab, or downloaded for use with an IDE or a local Jupyter installation.
Pixeltable can export data directly from tables and views to the popular
[Voxel51](https://voxel51.com/) frontend, providing a way to visualize
and explore image and video datasets. In this tutorial, we’ll learn how
to:
* Export data from Pixeltable to Voxel51
* Apply labels from image classification and object detection models to
exported data
We begin by installing the necessary libraries for this tutorial.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable fiftyone torch transformers timm
```
## Example 1: An Image Dataset
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import fiftyone as fo
import pixeltable as pxt
# Create a Pixeltable directory for the demo. We first drop the directory if it
# exists, in order to ensure a clean environment.
pxt.drop_dir('fo_demo', force=True)
pxt.create_dir('fo_demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'fo\_demo'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a Pixeltable table for our dataset and insert some sample images.
url_prefix = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images'
urls = [
    f'{url_prefix}/000000000019.jpg',
    f'{url_prefix}/000000000025.jpg',
    f'{url_prefix}/000000000030.jpg',
    f'{url_prefix}/000000000034.jpg',
]
t = pxt.create_table('fo_demo/images', {'image': pxt.Image})
t.insert({'image': url} for url in urls)
t.head()
```
Created table 'images'.
Inserted 4 rows with 0 errors in 0.71 s (5.60 rows/s)
Now we export our new table to a Voxel51 dataset and load it into a new
Voxel51 session within our demo notebook. Once it’s been loaded, the
images can be interactively navigated as with any other Voxel51 dataset.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
fo_dataset = pxt.io.export_images_as_fo_dataset(t, t.image)
session = fo.launch_app(fo_dataset)
```
You are running the oldest supported major version of MongoDB. Please refer to https://deprecation.voxel51.com for deprecation notices. You can suppress this exception by setting your `database_validation` config parameter to `False`. See https://docs.voxel51.com/user_guide/config.html#configuring-a-mongodb-connection for more information
28 \[31.4ms elapsed, ? remaining, 890.5 samples/s]
## Adding Labels
We’ll now show how Voxel51 labels can be attached to the exported
dataset. Currently, Pixeltable supports only classification and
detection labels; other Voxel51 label types may be added in the future.
First, let’s generate some labels by applying two models from the
Hugging Face `transformers` library: a ViT model for image classification
and a DETR model for object detection.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import (
    detr_for_object_detection,
    vit_for_image_classification,
)

t.add_computed_column(
    classifications=vit_for_image_classification(
        t.image, model_id='google/vit-base-patch16-224'
    )
)
t.add_computed_column(
    detections=detr_for_object_detection(
        t.image, model_id='facebook/detr-resnet-50'
    )
)
```
Added 4 column values with 0 errors in 4.17 s (0.96 rows/s)
Added 4 column values with 0 errors in 2.72 s (1.47 rows/s)
4 rows updated.
Both models output JSON containing the model results. Let’s peek at the
contents of our table now:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.head()
```
Now we need to transform our model data into the format the Voxel51 API
expects (see the Pixeltable documentation for
[pxt.io.export\_images\_as\_fo\_dataset](/sdk/latest/io#func-export_images_as_fo_dataset)
for details). We’ll use Pixeltable UDFs to do the appropriate
conversions.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
def vit_to_fo(vit_labels: dict) -> list:
    # Pair each predicted label with its score.
    return [
        {'label': label, 'confidence': score}
        for label, score in zip(
            vit_labels['label_text'], vit_labels['scores']
        )
    ]


@pxt.udf
def detr_to_fo(img: pxt.Image, detr_labels: dict) -> list:
    result = []
    for label, box, score in zip(
        detr_labels['label_text'],
        detr_labels['boxes'],
        detr_labels['scores'],
    ):
        # DETR gives us bounding boxes in (x1,y1,x2,y2) absolute (pixel) coordinates.
        # Voxel51 expects (x,y,w,h) relative (fractional) coordinates.
        # So we need to do a conversion.
        fo_box = [
            box[0] / img.width,
            box[1] / img.height,
            (box[2] - box[0]) / img.width,
            (box[3] - box[1]) / img.height,
        ]
        result.append(
            {'label': label, 'bounding_box': fo_box, 'confidence': score}
        )
    return result
```
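To make the coordinate conversion in `detr_to_fo` concrete, here is a standalone sketch of the same arithmetic in plain Python, with no Pixeltable or Voxel51 dependency. The function name `xyxy_to_relative_xywh` and the sample image size and box are illustrative only; they are not part of either library.

```python
def xyxy_to_relative_xywh(box, width, height):
    """Convert an absolute (x1, y1, x2, y2) box to relative (x, y, w, h)."""
    x1, y1, x2, y2 = box
    return [
        x1 / width,          # left edge as a fraction of image width
        y1 / height,         # top edge as a fraction of image height
        (x2 - x1) / width,   # box width as a fraction of image width
        (y2 - y1) / height,  # box height as a fraction of image height
    ]


# A hypothetical 640x480 image with a box spanning (160, 120) to (480, 360).
print(xyxy_to_relative_xywh([160, 120, 480, 360], width=640, height=480))
# [0.25, 0.25, 0.5, 0.5]
```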
We can test that our UDFs are working as expected with a `select()`
statement.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(
    t.image,
    t.classifications,
    vit_to_fo(t.classifications),
    t.detections,
    detr_to_fo(t.image, t.detections),
).head()
```
Now we pass the modified structures to `export_images_as_fo_dataset`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
fo_dataset = pxt.io.export_images_as_fo_dataset(
    t,
    t.image,
    classifications=vit_to_fo(t.classifications),
    detections=detr_to_fo(t.image, t.detections),
)
session = fo.launch_app(fo_dataset)
```
## Adding Multiple Label Sets
You can include multiple label sets of the same type in the same dataset
by passing a `list` or `dict` of expressions to the `classifications`
and/or `detections` parameters. If a `list` is specified, default names
will be assigned to the label sets; if a `dict` is specified, the label
sets will be named according to its keys.
As an example, let’s recompute our detections using the more powerful
DETR ResNet-101 model, and then load them into the same Voxel51 dataset
as the earlier detections in order to compare them side by side.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
    detections_101=detr_for_object_detection(
        t.image, model_id='facebook/detr-resnet-101'
    )
)
```
Added 4 column values with 0 errors in 21.91 s (0.18 rows/s)
4 rows updated.
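To view both sets of detections side by side, the two columns can be exported as named label sets by passing a `dict` to `detections`, as described above. The following is a sketch under assumptions: the helper name `export_both_detection_sets` and the label-set names `detr_resnet_50` and `detr_resnet_101` are illustrative choices, and `t`, `vit_to_fo`, and `detr_to_fo` are the table and UDFs defined earlier in this tutorial.

```python
def export_both_detection_sets(t, vit_to_fo, detr_to_fo):
    """Export `t` with one classification set and two named detection sets."""
    import pixeltable as pxt  # imported lazily so the helper can be defined anywhere

    return pxt.io.export_images_as_fo_dataset(
        t,
        t.image,
        classifications=vit_to_fo(t.classifications),
        # The dict keys become the label-set names (chosen here for illustration).
        detections={
            'detr_resnet_50': detr_to_fo(t.image, t.detections),
            'detr_resnet_101': detr_to_fo(t.image, t.detections_101),
        },
    )
```

In the notebook, this could be called as `fo_dataset = export_both_detection_sets(t, vit_to_fo, detr_to_fo)`, followed by `fo.launch_app(fo_dataset)` as before.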
Exploring the resulting images, we can see that the results are not much
different between the two models, at least on our small sample dataset.
# Cloud Storage
Source: https://docs.pixeltable.com/integrations/cloud-storage
Connect Pixeltable to S3, Google Cloud Storage, Azure Blob, and other cloud storage backends to manage media files and external references.
Pixeltable supports storing media files (images, videos, audio, documents) in external cloud storage providers instead of local disk. This is essential for production deployments, enabling scalable storage, team collaboration, and integration with existing data infrastructure.
## Supported providers
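* **Pixeltable Cloud**: Free managed storage, no bucket setup required
* **Amazon S3**: Native S3 storage with full feature support
* **Google Cloud Storage**: GCS buckets with gs\:// URI scheme
* **Azure Blob Storage**: Azure containers with wasb:// or abfs\:// schemes
* **Cloudflare R2**: S3-compatible storage with zero egress fees
* **Backblaze B2**: Cost-effective S3-compatible storage
* **Tigris**: Globally distributed S3-compatible storage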
## How it works
When you configure a storage destination, Pixeltable automatically:
1. **Uploads computed media** — AI-generated images, extracted video frames, and other computed media files are stored in your bucket
2. **Copies input media** — Optionally persists referenced media files for durability
3. **Manages file lifecycle** — Cleans up files when table data is deleted
4. **Handles caching** — Downloads files on-demand with intelligent local caching
## Configuration
There are two ways to configure cloud storage destinations:
### Global default destinations
Set default destinations for all media columns in your `config.toml` (see [Configuration](/platform/configuration) for details):
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[pixeltable]
# For input media (inserted/referenced files)
input_media_dest = "s3://my-bucket/input/"
# For computed media (AI-generated outputs)
output_media_dest = "s3://my-bucket/output/"
```
Or via environment variables:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
export PIXELTABLE_INPUT_MEDIA_DEST="s3://my-bucket/input/"
export PIXELTABLE_OUTPUT_MEDIA_DEST="s3://my-bucket/output/"
```
Configure these before creating tables. All media columns will automatically use the configured destinations.
### Per-column destination (computed columns only)
For **computed columns**, you can override the default with a specific destination:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Create a table with input media column
# (uses global input_media_dest if configured)
t = pxt.create_table('my_app/images', {'image': pxt.Image})
# Add computed column with explicit destination
t.add_computed_column(
    thumbnail=t.image.resize((128, 128)),
    destination='s3://my-bucket/thumbnails/'
)
```
The `destination` parameter only applies to **stored computed columns**. For input columns, use the global `input_media_dest` configuration.
### Precedence rules
Destinations are resolved in this order:
1. **Explicit column destination** — highest priority (computed columns only)
2. **Global default** — `input_media_dest` for input columns, `output_media_dest` for computed columns
3. **Local storage** — fallback if no destination is configured
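The resolution order above can be sketched in plain Python. This illustrates the documented rules only and is not Pixeltable's actual implementation; the function `resolve_destination` and its parameter names are hypothetical.

```python
from typing import Optional


def resolve_destination(
    is_computed: bool,
    column_destination: Optional[str] = None,
    input_media_dest: Optional[str] = None,
    output_media_dest: Optional[str] = None,
) -> Optional[str]:
    """Return the destination URI for a media column, or None for local storage."""
    # 1. An explicit per-column destination wins (computed columns only).
    if is_computed and column_destination is not None:
        return column_destination
    # 2. Otherwise fall back to the global default for the column kind.
    global_default = output_media_dest if is_computed else input_media_dest
    if global_default is not None:
        return global_default
    # 3. No destination configured: media stays in local storage.
    return None


print(resolve_destination(True, 's3://my-bucket/thumbnails/', output_media_dest='s3://my-bucket/output/'))
# s3://my-bucket/thumbnails/
print(resolve_destination(False, input_media_dest='s3://my-bucket/input/'))
# s3://my-bucket/input/
print(resolve_destination(True))
# None
```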
## Provider configuration
### Pixeltable Cloud (home bucket)
Every Pixeltable Cloud account includes a free managed storage bucket. No bucket creation, no credentials file, no cloud provider account needed.
```
pxtfs://org-slug:db-slug/home
```
Replace `org-slug` and `db-slug` with your Pixeltable Cloud organization and database names.
Set your Pixeltable API key:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[pixeltable]
api_key = "your-pixeltable-api-key"
```
Or via environment variable:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
export PIXELTABLE_API_KEY="your-pixeltable-api-key"
```
Pixeltable automatically fetches and refreshes temporary credentials from the Cloud control plane.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
t = pxt.create_table('app/images', {'photo': pxt.Image})
t.add_computed_column(
thumbnail=t.photo.resize((256, 256)),
destination='pxtfs://myorg:mydb/home'
)
```
Or set it as your global default in `config.toml`:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[pixeltable]
output_media_dest = "pxtfs://myorg:mydb/home"
```
Or as an environment variable:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
export PIXELTABLE_OUTPUT_MEDIA_DEST="pxtfs://myorg:mydb/home"
```
This is the fastest way to get cloud storage working. No AWS/GCP/Azure account required.
You can browse, search, and preview the contents of your home bucket directly from the [Pixeltable Cloud dashboard](https://www.pixeltable.com/dashboard). Navigate to **Storage & Buckets** in the sidebar, then select your **home** bucket to explore files by type (images, docs, video, audio), view metadata, and inspect individual objects.
```
https://www.pixeltable.com/dashboard/{org-slug}/{db-slug}/storage/home/browse
```
### Amazon S3
```
s3://bucket-name/optional/prefix/
```
Uses the standard AWS credential chain:
* Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
* AWS credentials file (`~/.aws/credentials`)
* IAM role (when running on AWS)
Optionally specify a profile in `config.toml`:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[pixeltable]
s3_profile = "my-aws-profile"
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# With global config: output_media_dest = "s3://my-bucket/output/"
t = pxt.create_table('app/images', {'photo': pxt.Image})
# Or set destination per computed column
t.add_computed_column(
thumbnail=t.photo.resize((256, 256)),
destination='s3://my-production-bucket/thumbnails/'
)
```
### Google Cloud Storage
```
gs://bucket-name/optional/prefix/
```
Uses Google Cloud Application Default Credentials:
* Service account key file (`GOOGLE_APPLICATION_CREDENTIALS`)
* gcloud CLI authentication
* GCE metadata service (when running on GCP)
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pip install google-cloud-storage
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# With global config: output_media_dest = "gs://my-gcs-bucket/output/"
t = pxt.create_table('app/videos', {'video': pxt.Video})
# Or set destination per computed column
t.add_computed_column(
frames=pxt.functions.video.frame_iterator(t.video, fps=1),
destination='gs://my-gcs-bucket/frames/'
)
```
### Azure Blob Storage
Azure supports multiple URI schemes:
```
wasbs://container@account.blob.core.windows.net/prefix/
abfss://container@account.dfs.core.windows.net/prefix/
```
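To show how the parts of these URIs fit together, here is a small sketch that splits one with the Python standard library. It is for illustration only: the helper `parse_azure_uri` is hypothetical, and Pixeltable's own parsing of these URIs may differ.

```python
from urllib.parse import urlparse


def parse_azure_uri(uri: str) -> dict:
    """Split a wasbs:// or abfss:// URI into scheme, container, host, and prefix."""
    parts = urlparse(uri)
    return {
        'scheme': parts.scheme,
        'container': parts.username,      # the part before '@'
        'account_host': parts.hostname,   # <account>.blob.core.windows.net or similar
        'prefix': parts.path.lstrip('/'),
    }


print(parse_azure_uri('wasbs://mycontainer@myaccount.blob.core.windows.net/output/'))
# {'scheme': 'wasbs', 'container': 'mycontainer', 'account_host': 'myaccount.blob.core.windows.net', 'prefix': 'output/'}
```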
Configure in `config.toml`:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[azure]
storage_account_name = "myaccount"
storage_account_key = "your-key-here"
```
Or via environment variables:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
export AZURE_STORAGE_ACCOUNT_NAME="myaccount"
export AZURE_STORAGE_ACCOUNT_KEY="your-key-here"
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pip install azure-storage-blob
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# With global config: output_media_dest = "wasbs://mycontainer@myaccount.blob.core.windows.net/output/"
t = pxt.create_table('app/docs', {'document': pxt.Document})
# Or set destination per computed column
t.add_computed_column(
    chunks=pxt.functions.document.document_splitter(t.document),
    destination='wasbs://mycontainer@myaccount.blob.core.windows.net/chunks/'
)
```
### Cloudflare R2
```
https://account-id.r2.cloudflarestorage.com/bucket-name/prefix/
```
Create an R2 API token and configure AWS-style credentials.
In `~/.aws/credentials`:
```ini theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[r2]
aws_access_key_id = your-r2-access-key
aws_secret_access_key = your-r2-secret-key
```
In `config.toml`:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[pixeltable]
r2_profile = "r2"
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('app/images', {'image': pxt.Image})
t.add_computed_column(
rotated=t.image.rotate(90),
destination='https://abc123.r2.cloudflarestorage.com/my-bucket/processed/'
)
```
### Backblaze B2
```
https://s3.region.backblazeb2.com/bucket-name/prefix/
```
Create B2 application keys and configure AWS-style credentials.
In `~/.aws/credentials`:
```ini theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[b2]
aws_access_key_id = your-b2-key-id
aws_secret_access_key = your-b2-application-key
```
In `config.toml`:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[pixeltable]
b2_profile = "b2"
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('app/audio', {'audio': pxt.Audio})
t.add_computed_column(
    segments=pxt.functions.audio.audio_splitter(t.audio, duration=30),
    destination='https://s3.us-west-004.backblazeb2.com/my-bucket/segments/'
)
```
### Tigris
```
https://t3.storage.dev/bucket-name/prefix/
```
Configure AWS-style credentials for Tigris.
In `~/.aws/credentials`:
```ini theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[tigris]
aws_access_key_id = your-tigris-access-key
aws_secret_access_key = your-tigris-secret-key
```
In `config.toml`:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[pixeltable]
tigris_profile = "tigris"
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('app/media', {'file': pxt.Image})
t.add_computed_column(
thumbnail=t.file.resize((128, 128)),
destination='https://t3.storage.dev/my-bucket/thumbnails/'
)
```
## Complete example
Here's a full example using S3 for both input and computed media.
First, configure your global destinations in `~/.pixeltable/config.toml`:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[pixeltable]
input_media_dest = "s3://my-app-bucket/uploads/"
output_media_dest = "s3://my-app-bucket/generated/"
s3_profile = "my-aws-profile" # optional, uses default credentials if not set
```
Then create your table and add computed columns:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import openai

# Create a table; input media automatically goes to input_media_dest
t = pxt.create_table('production/photos', {'photo': pxt.Image})

# Add a computed column for thumbnails
# Uses output_media_dest by default, or specify a custom destination
t.add_computed_column(
    thumbnail=t.photo.resize((256, 256)),
    destination='s3://my-app-bucket/thumbnails/'  # override default
)

# Add AI-generated descriptions (uses output_media_dest)
messages = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'Describe this image briefly.'},
            {'type': 'image_url', 'image_url': t.photo},
        ],
    }
]
t.add_computed_column(
    description=openai.chat_completions(messages, model='gpt-4o-mini')
)

# Insert data; Pixeltable handles all uploads automatically
t.insert([
    {'photo': 'https://example.com/image1.jpg'},
    {'photo': '/local/path/to/image2.png'},
])

# Query as usual; files are streamed/cached as needed
t.select(t.photo, t.thumbnail, t.description).collect()
```
## Best practices
Structure your bucket with prefixes that reflect your application:
```
s3://my-bucket/
├── production/
│   ├── uploads/
│   └── generated/
└── staging/
    ├── uploads/
    └── generated/
```
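One way to keep this layout consistent is to derive both destinations from the bucket and environment name. The helper below is a minimal sketch; `media_destinations` is a hypothetical name, and the returned keys simply mirror the `input_media_dest` and `output_media_dest` settings described earlier.

```python
def media_destinations(bucket: str, environment: str) -> dict:
    """Build input/output destination URIs under a per-environment prefix."""
    base = f's3://{bucket}/{environment}'
    return {
        'input_media_dest': f'{base}/uploads/',
        'output_media_dest': f'{base}/generated/',
    }


print(media_destinations('my-bucket', 'staging'))
# {'input_media_dest': 's3://my-bucket/staging/uploads/', 'output_media_dest': 's3://my-bucket/staging/generated/'}
```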
Use different prefixes or buckets for input vs computed media:
* Easier to set different retention policies
* Clearer cost attribution
* Simpler backup strategies
Set up bucket lifecycle policies to automatically:
* Transition old data to cheaper storage tiers
* Delete temporary/staging data after a period
* Enable versioning for critical data
When running on cloud infrastructure, use IAM roles instead of access keys:
* More secure (no key rotation needed)
* Automatic credential refresh
* Better audit trails
## Troubleshooting
If requests fail with permission errors, verify that your credentials have the necessary permissions:
* `s3:GetObject`, `s3:PutObject`, `s3:DeleteObject`
* `s3:ListBucket` for the bucket
For GCS: `storage.objects.create`, `storage.objects.get`, `storage.objects.delete`
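For S3, the permissions listed above could be granted with an IAM policy along the following lines. This is a sketch, not an official Pixeltable policy: the helper `s3_policy` and the bucket name `my-bucket` are placeholders, and you may want to narrow the resources to a specific prefix.

```python
import json


def s3_policy(bucket: str) -> dict:
    """Build a minimal IAM policy covering the S3 permissions listed above."""
    return {
        'Version': '2012-10-17',
        'Statement': [
            {
                # Object-level actions apply to keys inside the bucket.
                'Effect': 'Allow',
                'Action': ['s3:GetObject', 's3:PutObject', 's3:DeleteObject'],
                'Resource': f'arn:aws:s3:::{bucket}/*',
            },
            {
                # ListBucket applies to the bucket itself.
                'Effect': 'Allow',
                'Action': ['s3:ListBucket'],
                'Resource': f'arn:aws:s3:::{bucket}',
            },
        ],
    }


# Print the policy as JSON, ready to paste into the IAM console or a template.
print(json.dumps(s3_policy('my-bucket'), indent=2))
```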
If the bucket cannot be found:
* Ensure the bucket exists and the name is spelled correctly
* Check the region matches your credential configuration
* For S3-compatible providers, verify the endpoint URL is correct
If transfers are slow:
* Pixeltable uses connection pooling and parallel uploads automatically
* Consider using a bucket in the same region as your compute
* Check your network bandwidth and latency
See [Configuration](/platform/configuration) for the complete list of storage configuration options, including profiles for S3, R2, B2, Tigris, and Azure.
Need help setting up cloud storage? Join our [Discord community](https://discord.com/invite/QPyqFYx2UN) for support.
# Embedding Models
Source: https://docs.pixeltable.com/integrations/embedding-model
Plug custom embedding models into Pixeltable for vector indices, semantic search, and retrieval-augmented generation over your own data.
Pixeltable provides extensive built-in support for popular embedding models, but you can also easily integrate your own custom embedding models. This guide shows you how to create and use custom embedding functions for any model architecture.
## Quick start
Here's a simple example using a custom BERT model:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import tensorflow as tf
import tensorflow_hub as hub
import pixeltable as pxt


@pxt.udf
def custom_bert_embed(text: str) -> pxt.Array[(512,), pxt.Float]:
    """Basic BERT embedding function"""
    preprocessor = hub.load('https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3')
    model = hub.load('https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/2')
    tensor = tf.constant([text])
    result = model(preprocessor(tensor))['pooled_output']
    return result.numpy()[0, :]


# Create table and add embedding index
docs = pxt.create_table('documents', {'text': pxt.String})
docs.add_embedding_index('text', string_embed=custom_bert_embed)
```
## Production best practices
The quick start example works but isn't production-ready. Below we'll cover how to optimize your custom embedding UDFs.
### Model caching
Always cache your model instances to avoid reloading on every call:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
def optimized_bert_embed(text: str) -> pxt.Array[(512,), pxt.Float]:
    """BERT embedding function with model caching"""
    if not hasattr(optimized_bert_embed, 'model'):
        # Load models once
        optimized_bert_embed.preprocessor = hub.load(
            'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3'
        )
        optimized_bert_embed.model = hub.load(
            'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/2'
        )
    tensor = tf.constant([text])
    result = optimized_bert_embed.model(
        optimized_bert_embed.preprocessor(tensor)
    )['pooled_output']
    return result.numpy()[0, :]
```
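The caching idea is independent of TensorFlow. As a minimal sketch of the same pattern in plain Python, the example below stores a "model" on the function object the first time it is called; `load_model`, `cached_embed`, and the toy upper-casing model are stand-ins invented for illustration.

```python
load_count = 0


def load_model():
    """Stand-in for an expensive model load (e.g. hub.load)."""
    global load_count
    load_count += 1
    return str.upper  # a trivial "model" that upper-cases its input


def cached_embed(text: str) -> str:
    # Load the model only on the first call and keep it on the function object.
    if not hasattr(cached_embed, 'model'):
        cached_embed.model = load_model()
    return cached_embed.model(text)


print(cached_embed('a'), cached_embed('b'), load_count)
# A B 1
```

The load counter stays at 1 no matter how many times `cached_embed` is called, which is the property the UDF above relies on.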
### Batch processing
Use Pixeltable's batching capabilities for better performance:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.func import Batch


@pxt.udf(batch_size=32)
def batched_bert_embed(texts: Batch[str]) -> Batch[pxt.Array[(512,), pxt.Float]]:
    """BERT embedding function with batching"""
    if not hasattr(batched_bert_embed, 'model'):
        batched_bert_embed.preprocessor = hub.load(
            'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3'
        )
        batched_bert_embed.model = hub.load(
            'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/2'
        )
    # Process entire batch at once
    tensor = tf.constant(list(texts))
    results = batched_bert_embed.model(
        batched_bert_embed.preprocessor(tensor)
    )['pooled_output']
    return [r for r in results.numpy()]
```
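A batched function receives a list of inputs and must return one result per input, in the same order. The sketch below illustrates that contract in plain Python; `run_in_batches` and `batched_length` are hypothetical helpers, and the loop is not a description of how Pixeltable schedules batches internally.

```python
from typing import Callable, List


def run_in_batches(
    fn: Callable[[List[str]], List[int]], inputs: List[str], batch_size: int
) -> List[int]:
    """Call `fn` on consecutive slices of `inputs` and concatenate the results."""
    outputs: List[int] = []
    for start in range(0, len(inputs), batch_size):
        batch = inputs[start:start + batch_size]
        results = fn(batch)
        # A batched function must return exactly one result per input, in order.
        assert len(results) == len(batch)
        outputs.extend(results)
    return outputs


def batched_length(texts: List[str]) -> List[int]:
    """Toy batched function: one output per input text."""
    return [len(t) for t in texts]


print(run_in_batches(batched_length, ['a', 'bb', 'ccc', 'dddd', 'eeeee'], batch_size=2))
# [1, 2, 3, 4, 5]
```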
## Error handling
Always implement proper error handling in production UDFs:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import logging

logger = logging.getLogger(__name__)


@pxt.udf
def robust_bert_embed(text: str) -> pxt.Array[(512,), pxt.Float]:
    """BERT embedding with error handling"""
    try:
        if not text or not text.strip():
            raise ValueError('Empty text input')
        if not hasattr(robust_bert_embed, 'model'):
            # Model initialization (load the preprocessor and model here,
            # as in the model caching example)...
            pass
        tensor = tf.constant([text])
        result = robust_bert_embed.model(
            robust_bert_embed.preprocessor(tensor)
        )['pooled_output']
        return result.numpy()[0, :]
    except Exception as e:
        # Log the failure, then re-raise so the error is not silently swallowed.
        logger.error(f'Embedding failed: {e}')
        raise
```
## Additional resources
* Complete UDF documentation
* More embedding examples
* Find embedding models
# Ecosystem
Source: https://docs.pixeltable.com/integrations/frameworks
Browse Pixeltable integrations with LangChain, FastAPI, PyTorch, Hugging Face, and other AI and ML frameworks for end-to-end pipelines.
From language models to computer vision frameworks, Pixeltable integrates with the entire ecosystem. All integrations are available out of the box with a Pixeltable installation; no additional setup is required unless specified.
If there is a framework you'd like us to integrate with, please reach out. You can also use Pixeltable's [UDFs](/platform/udfs-in-pixeltable) to build your own integration.
## Cloud LLM providers
* Integrate Claude models for advanced language understanding and generation with multimodal capabilities
* Access Google's Gemini models via Google AI Studio or Vertex AI for state-of-the-art multimodal AI capabilities
* Leverage GPT models for text generation, embeddings, and image analysis
* Use OpenAI models via Azure with enterprise security and compliance
* Use Mistral's efficient language models for various NLP tasks
* Access a variety of open-source models through Together AI's platform
* Use Fireworks.ai's optimized model inference infrastructure
* Leverage DeepSeek's powerful language and code models for text and code generation
* Access a variety of AI models through AWS Bedrock's unified API
* Access Groq's models for text generation
* Unified access to 100+ LLMs from various providers through a single API
## Embeddings & Reranking
* High-quality embeddings and reranking for text, images, and video
* Embeddings and reranking optimized for search and RAG pipelines
## Video Understanding
Multimodal video understanding, search, and analysis with state-of-the-art foundation models
## Media Generation
* Image generation, editing, fill, and expansion with FLUX models from Black Forest Labs
* Fast image generation with Flux, Stable Diffusion, and other models
* AI-powered video generation and editing capabilities
* AI video generation with Gen-4 and other Runway models
## Local LLM runtimes
* High-performance C++ implementation for running LLMs on CPU and GPU
* Easy-to-use toolkit for running and managing open-source models locally
## Computer vision
* State-of-the-art object detection with YOLOX models
* Advanced video and image dataset management with Voxel51
## Annotation tools
Comprehensive platform for data annotation and labeling workflows
## Audio processing
High-quality speech recognition and transcription using OpenAI's Whisper models
## Enterprise Platforms
Azure OpenAI integration through Microsoft Fabric for enterprise AI workloads
## Data Wrangling
Import and export from and to Pandas DataFrames
## Usage examples
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import openai
# Create a table with computed column for OpenAI completion
table = pxt.create_table('responses', {'prompt': pxt.String})
table.add_computed_column(
response=openai.chat_completions(
messages=[{'role': 'user', 'content': table.prompt}],
model='gpt-4'
)
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.yolox import yolox
# Add object detection to video frames
frames_view.add_computed_column(
detections=yolox(
frames_view.frame,
model_id='yolox_l'
)
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import openai
# Transcribe audio files
audio_table.add_computed_column(
transcription=openai.transcriptions(
audio=audio_table.file,
model='whisper-1'
)
)
```
## Integration features
* Most integrations work out of the box with simple API configuration
* Use integrations directly in computed columns for automated processing
* Efficient handling of batch operations with automatic optimization
Check the [provider notebooks](https://github.com/pixeltable/pixeltable/tree/main/docs/release/howto/providers) for detailed usage instructions for each integration.
Need help setting up integrations? Join our [Discord community](https://discord.gg/QPyqFYx2UN) for support.
# Model Hub & Repositories
Source: https://docs.pixeltable.com/integrations/models
Browse pre-trained models built into Pixeltable for vision, language, speech, and embeddings across OpenAI, Anthropic, Hugging Face, and more.
## Model hubs
* Access thousands of pre-trained models across vision, text, and audio domains
* Deploy and run ML models through Replicate's cloud infrastructure
## Hugging Face models
Pixeltable provides seamless integration with Hugging Face's transformers library through built-in UDFs. These functions allow you to use state-of-the-art models directly in your data workflows.
Requirements: Install required dependencies with `pip install transformers`. Some models may require additional packages like `sentence-transformers` or `torch`.
### CLIP models
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import clip
# For text embedding
t.add_computed_column(
text_embedding=clip(
t.text_column,
model_id='openai/clip-vit-base-patch32'
)
)
# For image embedding
t.add_computed_column(
image_embedding=clip(
t.image_column,
model_id='openai/clip-vit-base-patch32'
)
)
```
Perfect for multimodal applications combining text and image understanding.
### Cross-encoders
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import cross_encoder
t.add_computed_column(
similarity_score=cross_encoder(
t.sentence1,
t.sentence2,
model_id='cross-encoder/ms-marco-MiniLM-L-4-v2'
)
)
```
Ideal for semantic similarity tasks and sentence pair classification.
### DETR object detection
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import (
    detr_for_object_detection,
    detr_to_coco,
)

t.add_computed_column(
    detections=detr_for_object_detection(
        t.image,
        model_id='facebook/detr-resnet-50',
        threshold=0.8
    )
)

# Convert to COCO format if needed
t.add_computed_column(
    coco_format=detr_to_coco(t.image, t.detections)
)
```
Powerful object detection with end-to-end transformer architecture.
### Sentence transformers
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import sentence_transformer
t.add_computed_column(
embeddings=sentence_transformer(
t.text,
model_id='sentence-transformers/all-mpnet-base-v2',
normalize_embeddings=True
)
)
```
State-of-the-art sentence and document embeddings for semantic search and similarity.
### Speech2Text models
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import speech2text_for_conditional_generation
# Basic transcription
t.add_computed_column(
transcript=speech2text_for_conditional_generation(
t.audio,
model_id='facebook/s2t-small-librispeech-asr'
)
)
# Multilingual translation
t.add_computed_column(
translation=speech2text_for_conditional_generation(
t.audio,
model_id='facebook/s2t-medium-mustc-multilingual-st',
language='fr'
)
)
```
Support for both transcription and translation of audio content.
### Vision Transformer (ViT)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import vit_for_image_classification
t.add_computed_column(
classifications=vit_for_image_classification(
t.image,
model_id='google/vit-base-patch16-224',
top_k=5
)
)
```
Modern image classification using transformer architecture.
## Integration features
All models can be used directly in computed columns for automated processing:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Example: Combine CLIP embeddings with ViT classification
t.add_computed_column(
image_features=clip(t.image, model_id='openai/clip-vit-base-patch32')
)
t.add_computed_column(
classifications=vit_for_image_classification(t.image, model_id='google/vit-base-patch16-224')
)
```
Pixeltable automatically handles batch processing and optimization:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Pixeltable efficiently processes large datasets
t.add_computed_column(
embeddings=sentence_transformer(
t.text,
model_id='sentence-transformers/all-mpnet-base-v2'
)
)
```
Detection and classification UDFs return dictionaries with the following structure:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Object Detection Output
{
'scores': [0.99, 0.98], # confidence scores
'labels': [25, 30], # class labels
'label_text': ['cat', 'dog'], # human-readable labels
'boxes': [[x1, y1, x2, y2], ...] # bounding boxes
}
# Image Classification Output
{
'scores': [0.8, 0.15], # class probabilities
'labels': [340, 353], # class IDs
'label_text': ['zebra', 'gazelle'] # class names
}
```
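Once a detection result is stored in a column, a common next step is to keep only the confident detections. The sketch below assumes the output is a dict of parallel lists, as in the object detection example above. `filter_detections` and the sample values are hypothetical and not part of Pixeltable.

```python
def filter_detections(detections: dict, min_score: float) -> dict:
    """Keep only the entries whose confidence score is at least min_score."""
    keep = [i for i, score in enumerate(detections['scores']) if score >= min_score]
    # Every key holds a list aligned with 'scores', so the same indices apply to all
    return {key: [values[i] for i in keep] for key, values in detections.items()}


# Made-up detection result in the shape shown above
sample = {
    'scores': [0.99, 0.42],
    'labels': [25, 30],
    'label_text': ['cat', 'dog'],
    'boxes': [[10, 20, 110, 220], [5, 5, 50, 50]],
}

confident = filter_detections(sample, min_score=0.8)
print(confident['label_text'])
```

The same logic could be wrapped in a UDF and added as a computed column, so the filtered result is stored next to the raw detections.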
## Model selection guide
Select the appropriate model family based on your task:
* Text/Image Similarity → CLIP
* Object Detection → DETR
* Text Embeddings → Sentence Transformers
* Speech Processing → Speech2Text
* Image Classification → ViT
Install necessary dependencies:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pip install transformers torch sentence-transformers
```
Import and use the model in your Pixeltable workflow:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import clip, sentence_transformer
```
Need help choosing the right model? Check our [provider notebooks](https://docs.pixeltable.com/integrations/frameworks) or join our [Discord community](https://discord.gg/QPyqFYx2UN).
# Agent Frameworks
Source: https://docs.pixeltable.com/migrate/from-agent-frameworks
Map LangGraph, CrewAI, LangChain, and AutoGen agent concepts to Pixeltable tables, computed columns, and tool-calling UDFs for migration.
If you've been building AI agents with LangGraph or CrewAI — defining state graphs, tool nodes, conditional edges, and bolting on separate memory stores — this guide shows how Pixeltable replaces the graph DSL with declarative tables.
**Related use case:** [Agents & MCP](/use-cases/agents-mcp)
***
## Concept Mapping
| Agent Framework | Pixeltable Equivalent |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| `StateGraph` / `AgentExecutor` | [`pxt.create_table()`](/tutorials/tables-and-data-operations) with [computed columns](/tutorials/computed-columns) |
| Graph nodes (functions) | Computed columns — dependencies resolved automatically |
| Graph edges / conditional routing | Column references — Pixeltable infers the DAG |
| `ToolNode` / `@tool` | [`pxt.tools()` + `invoke_tools()`](/howto/cookbooks/agents/llm-tool-calling) |
| `MemorySaver` / checkpointer | Tables are persistent by default |
| Separate vector DB for RAG | [`add_embedding_index()`](/platform/embedding-indexes) + [`@pxt.query`](/platform/udfs-in-pixeltable) |
| LangSmith for observability | `t.select()` on any column — every step is [queryable](/tutorials/queries-and-expressions) |
***
## Side by Side: Tool-Calling Agent
An agent that picks tools, calls them, and answers based on the results.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from typing import Annotated, Sequence, TypedDict

from langchain_core.messages import BaseMessage, HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END, add_messages
from langgraph.prebuilt import ToolNode
from langchain_core.tools import tool

class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]

@tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f'Weather in {city}: 72°F, sunny'

@tool
def search_docs(query: str) -> str:
    """Search internal documents."""
    return f'Results for: {query}'

tools = [get_weather, search_docs]
model = ChatOpenAI(model='gpt-4o-mini').bind_tools(tools)

def call_model(state):
    return {'messages': [model.invoke(state['messages'])]}

def should_continue(state):
    last = state['messages'][-1]
    return 'tools' if last.tool_calls else END

workflow = StateGraph(AgentState)
workflow.add_node('agent', call_model)
workflow.add_node('tools', ToolNode(tools))
workflow.set_entry_point('agent')
workflow.add_conditional_edges(
    'agent', should_continue, {'tools': 'tools', END: END})
workflow.add_edge('tools', 'agent')
graph = workflow.compile()

result = graph.invoke(
    {'messages': [HumanMessage(content='Weather in SF?')]})
print(result['messages'][-1].content)
```
**Packages:** `langgraph`, `langchain-openai`, `langchain-core`, plus a vector DB client for RAG
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.openai import chat_completions, invoke_tools

@pxt.udf
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f'Weather in {city}: 72°F, sunny'

@pxt.udf
def search_docs(query: str) -> str:
    """Search internal documents."""
    return f'Results for: {query}'

tools = pxt.tools(get_weather, search_docs)

# The parent directory must exist before a table can be created in it
pxt.create_dir('agents', if_exists='ignore')
agent = pxt.create_table('agents.assistant', {'message': pxt.String})

# Step 1: let the model choose which tools to call
agent.add_computed_column(response=chat_completions(
    messages=[{'role': 'user', 'content': agent.message}],
    model='gpt-4o-mini', tools=tools))

# Step 2: run the tools the model selected
agent.add_computed_column(
    tool_output=invoke_tools(tools, agent.response))

@pxt.udf
def build_followup(message: str, tool_output: dict) -> list[dict]:
    # Flatten every tool's results into a list of strings
    results = [
        str(r) for vals in (tool_output or {}).values()
        if vals for r in vals
    ]
    return [
        {'role': 'user', 'content': message},
        {'role': 'assistant', 'content': '\n'.join(results)},
        {'role': 'user', 'content':
            'Answer my original question using that information.'},
    ]

# Step 3: ask the model for a final answer based on the tool results
agent.add_computed_column(
    followup=build_followup(agent.message, agent.tool_output))
agent.add_computed_column(
    final=chat_completions(messages=agent.followup, model='gpt-4o-mini'))
agent.add_computed_column(
    answer=agent.final.choices[0].message.content)

agent.insert([{'message': 'What is the weather in SF?'}])
agent.select(agent.message, agent.answer).collect()
```
**Packages:** `pixeltable`, `openai`
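To see what `build_followup` does with the tool results, the same flattening logic can be run as plain Python. This is a sketch that assumes `tool_output` maps each tool name to a list of results, or to `None` when the tool was not called, which is what the comprehension in the UDF implies. The helper name `flatten_tool_output` and the sample values are made up for illustration.

```python
from typing import Optional


def flatten_tool_output(tool_output: Optional[dict]) -> list:
    """Collect every tool result as a string, skipping tools with no results."""
    return [
        str(r) for vals in (tool_output or {}).values()
        if vals for r in vals
    ]


# Hypothetical tool output: one tool returned a result, the other was not invoked
sample = {'get_weather': ['Weather in SF: 72°F, sunny'], 'search_docs': None}

results = flatten_tool_output(sample)
empty = flatten_tool_output(None)  # a missing tool_output yields an empty list
print(results)
```

Handling `None` at both levels keeps the follow-up message well-formed even when the model answers without calling any tool.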
### What Changes
| | LangGraph / CrewAI | Pixeltable |
| -------------------- | -------------------------------------- | --------------------------------------------------------- |
| **State** | Ephemeral — lost when the process ends | Persistent — every row survives restarts |
| **Caching** | No built-in caching of tool results | Same input returns cached result |
| **Observability** | LangSmith (separate service + API key) | `agent.select(agent.tool_output).collect()` |
| **Adding RAG** | Separate vector DB integration | `add_embedding_index()` + `@pxt.query` — no extra service |
| **Graph definition** | Nodes, edges, conditional routing DSL | Computed columns — Pixeltable infers the DAG |
| **MCP tools** | Custom integration | `pxt.mcp_udfs()` loads tools from any MCP server |
***
## Common Patterns
### Adding persistent memory
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from langgraph.checkpoint.memory import MemorySaver
checkpointer = MemorySaver()
graph = workflow.compile(checkpointer=checkpointer)
# In-process only — lost on restart
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.openai import embeddings

memories = pxt.create_table('agents.memories', {
    'content': pxt.String, 'timestamp': pxt.Timestamp})
memories.add_embedding_index('content',
    string_embed=embeddings.using(model='text-embedding-3-small'))

@pxt.query
def recall(query: str, top_k: int = 5) -> pxt.Query:
    sim = memories.content.similarity(string=query)
    return memories.order_by(sim, asc=False) \
        .limit(top_k).select(memories.content)
```
### Adding RAG to an agent
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from langchain_pinecone import PineconeVectorStore

vector_store = PineconeVectorStore(
    index_name='docs', embedding=embeddings)

@tool
def search_kb(query: str) -> str:
    """Search the knowledge base."""
    docs = vector_store.as_retriever() \
        .get_relevant_documents(query)
    return '\n'.join(d.page_content for d in docs)

# Must add tool to graph, re-compile...
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.query
def search_kb(query: str) -> pxt.Query:
    """Search the knowledge base."""
    sim = chunks.text.similarity(string=query)
    return chunks.order_by(sim, asc=False) \
        .limit(5).select(chunks.text)

tools = pxt.tools(get_weather, search_kb)
```
### Inspecting agent behavior
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Requires LangSmith: set LANGSMITH_API_KEY,
# LANGSMITH_PROJECT, then view traces in dashboard
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
agent.select(
agent.message,
agent.tool_output,
agent.answer
).collect()
```
***
## Next Steps
* Full use case walkthrough
* All 8 agentic patterns as Pixeltable tables
* Register UDFs and queries as LLM tools
* Lightweight agent framework built on Pixeltable
# DIY Data Pipeline
Source: https://docs.pixeltable.com/migrate/from-diy-data-pipeline
Replace custom Python scripts, DVC, Airflow, and manual ETL with declarative Pixeltable tables, views, and incremental computed columns.
If you've been wrangling multimodal data with custom Python scripts, DVC for versioning, Airflow for scheduling, and manual processing loops — this guide shows how Pixeltable replaces that plumbing with declarative tables.
**Related use case:** [Data Wrangling for ML](/use-cases/ml-data-wrangling)
***
## Concept Mapping
| Your DIY Stack | Pixeltable Equivalent |
| -------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| S3 buckets for media files | [`pxt.Image`, `pxt.Video`, `pxt.Audio`](/platform/type-system) columns — can still [read from S3](/integrations/cloud-storage) |
| DVC for data versioning | Built-in [`history()`, `revert()`, `create_snapshot()`](/platform/version-control) |
| Airflow / cron for scheduling | [Computed columns](/tutorials/computed-columns) — run automatically on insert |
| Custom scripts with OpenCV / PIL | [`@pxt.udf`](/platform/udfs-in-pixeltable) functions as computed columns |
| `cv2.VideoCapture()` + frame loops | [`frame_iterator`](/platform/iterators) via `create_view()` |
| Manual retry logic (`tenacity`) | Automatic retries with result caching |
| Embeddings as numpy / Parquet | [`add_embedding_index()`](/platform/embedding-indexes) with HNSW search |
| `torch.utils.data.Dataset` boilerplate | [`to_pytorch_dataset()`](/howto/cookbooks/data/data-export-pytorch) — one line |
| Re-run pipeline when data changes | Incremental — only new rows are processed |
***
## Side by Side: Image Processing Pipeline
Process images: generate thumbnails, caption with an LLM, embed for search, version everything.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pandas as pd
import numpy as np
from PIL import Image
from openai import OpenAI
from pathlib import Path
import base64, time

client = OpenAI()

# Load metadata
image_dir = Path('dataset/images/')
df = pd.DataFrame([
    {'filename': f.name, 'path': str(f), 'category': 'unknown'}
    for f in image_dir.glob('*.jpg')
])

# Generate thumbnails (manual loop)
thumb_dir = Path('dataset/thumbnails/')
thumb_dir.mkdir(exist_ok=True)
for idx, row in df.iterrows():
    img = Image.open(row['path'])
    img.thumbnail((256, 256))
    img.save(thumb_dir / row['filename'])
    df.at[idx, 'thumbnail'] = str(thumb_dir / row['filename'])

# Caption images (manual retry, one at a time)
def caption_image(path, max_retries=3):
    with open(path, 'rb') as f:
        b64 = base64.b64encode(f.read()).decode()
    for attempt in range(max_retries):
        try:
            resp = client.chat.completions.create(
                model='gpt-4o-mini',
                messages=[{'role': 'user', 'content': [
                    {'type': 'text', 'text': 'Describe this image in one sentence.'},
                    {'type': 'image_url', 'image_url': {
                        'url': f'data:image/jpeg;base64,{b64}'}}
                ]}],
            )
            return resp.choices[0].message.content
        except Exception:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)
            else:
                return None

df['caption'] = [caption_image(row['path']) for _, row in df.iterrows()]

# Generate embeddings (batch manually, store as numpy)
valid = df.dropna(subset=['caption'])
resp = client.embeddings.create(
    input=valid['caption'].tolist(), model='text-embedding-3-small')
np.save('dataset/embeddings.npy', [e.embedding for e in resp.data])

# Persist and version
df.to_csv('dataset/metadata.csv', index=False)
# Then: dvc add dataset/ && dvc push && git add && git commit
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.openai import chat_completions, embeddings
from pathlib import Path
pxt.create_dir('ml', if_exists='ignore')  # parent directory must exist first
images = pxt.create_table('ml.images', {
    'image': pxt.Image, 'category': pxt.String})
images.add_computed_column(thumbnail=images.image.resize((256, 256)))
messages = [{'role': 'user', 'content': [
{'type': 'text', 'text': 'Describe this image in one sentence.'},
{'type': 'image_url', 'image_url': images.image},
]}]
images.add_computed_column(response=chat_completions(
messages=messages, model='gpt-4o-mini'))
images.add_computed_column(
caption=images.response.choices[0].message.content)
images.add_embedding_index('caption',
string_embed=embeddings.using(model='text-embedding-3-small'))
images.insert([{'image': str(f), 'category': 'unknown'}
for f in Path('dataset/images/').glob('*.jpg')])
sim = images.caption.similarity(string='a dog playing in the park')
images.order_by(sim, asc=False).limit(5) \
.select(images.image, images.caption).collect()
```
### What Changes
| | Custom Scripts | Pixeltable |
| ------------------ | ------------------------------------------------------- | -------------------------------------------------------- |
| **New images** | Re-run the entire pipeline | `images.insert([...])` — everything downstream runs |
| **Change model** | Re-run everything; DVC tracks snapshots, not transforms | Drop and re-add the column — only that column recomputes |
| **Versioning** | `dvc add` + `git commit` ceremony | Automatic — `images.history()`, `pxt.create_snapshot()` |
| **Scheduling** | Airflow, cron, or manual re-runs | Not needed — computed columns run on insert |
| **Retries** | `try/except` with backoff in every function | Built-in; successful results are cached |
| **Search** | Brute-force numpy, or set up a vector DB | `add_embedding_index()` with HNSW |
| **PyTorch export** | Custom `Dataset` class | `images.to_pytorch_dataset()` |
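The "New images" row is easiest to picture with a toy model. The sketch below is a mental model only and says nothing about how Pixeltable is implemented: `ToyTable` is a hypothetical in-memory class with one computed column, and it counts how often the compute function runs to show that a second insert touches only the new rows.

```python
from typing import Callable


class ToyTable:
    """Toy in-memory table with a single computed column (illustrative only)."""

    def __init__(self, compute: Callable[[str], str]) -> None:
        self.compute = compute
        self.rows = []
        self.calls = 0  # how many times compute() has run

    def insert(self, values: list) -> None:
        # Only the rows being inserted are computed; existing rows are left alone
        for value in values:
            self.calls += 1
            self.rows.append({'input': value, 'output': self.compute(value)})


table = ToyTable(str.upper)
table.insert(['a', 'b'])   # two computations
table.insert(['c'])        # one more, not three
print(table.calls)
```

A script that re-runs the whole pipeline would have performed five computations for the same two batches (two, then three).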
***
## Common Patterns
### Video frame extraction
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import cv2
from PIL import Image

cap = cv2.VideoCapture('demo.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)
frames, idx = [], 0
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    if idx % int(fps) == 0:
        frames.append(Image.fromarray(
            cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    idx += 1
cap.release()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.video import frame_iterator
videos = pxt.create_table('ml.videos', {'video': pxt.Video})
frames = pxt.create_view('ml.frames', videos,
iterator=frame_iterator(videos.video, fps=1))
videos.insert([{'video': 'demo.mp4'}])
frames.select(frames.frame).head(10)
```
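The sampling rule in the OpenCV loop, `idx % int(fps) == 0`, can be checked in isolation. This sketch reproduces only that index arithmetic. `sampled_frame_indices` is a hypothetical helper, and the `max(1, ...)` guard is an addition that the loop above does not have: without it, a frame rate below 1 would truncate to 0 and raise `ZeroDivisionError`.

```python
def sampled_frame_indices(total_frames: int, fps: float) -> list:
    """Return the frame indices kept when sampling roughly one frame per second."""
    step = max(1, int(fps))  # int() truncates, so 29.97 fps becomes a step of 29
    return [idx for idx in range(total_frames) if idx % step == 0]


indices = sampled_frame_indices(total_frames=90, fps=29.97)
print(indices)
```

Because the frame rate is truncated, the samples drift slightly from true one-second intervals on videos with fractional frame rates.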
### Data versioning
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
dvc add dataset/
git add dataset.dvc .gitignore
git commit -m "update dataset v3"
dvc push
# Revert
git checkout HEAD~1 -- dataset.dvc
dvc checkout
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
images.history()
pxt.create_snapshot('ml.images_before_relabeling', images)
images.revert()
```
### PyTorch export
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class ImageDataset(Dataset):
    def __init__(self, df, transform=None):
        self.df = df.reset_index(drop=True)
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        img = Image.open(self.df.at[idx, 'path'])
        if self.transform:
            img = self.transform(img)
        return img, self.df.at[idx, 'category']

loader = DataLoader(ImageDataset(df, transforms.Compose([
    transforms.Resize((224, 224)), transforms.ToTensor()])),
    batch_size=32)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from torch.utils.data import DataLoader
ds = images.select(images.image, images.category) \
.to_pytorch_dataset()
loader = DataLoader(ds, batch_size=32)
```
***
## Next Steps
* Full use case walkthrough
* Frame extraction with FPS control
* Convert tables to DataLoaders
* S3, GCS, Azure, R2, Tigris
# Hand-Written FastAPI Endpoints
Source: https://docs.pixeltable.com/migrate/from-hand-written-endpoints
Migrate hand-written FastAPI endpoints into declarative Pixeltable FastAPIRouter routes that serve tables and queries with typed schemas.
If your Pixeltable app has hand-written FastAPI endpoints that call `pxt.get_table()`, run queries, and manually serialize results, you can replace most of them with [`FastAPIRouter`](/howto/deployment/serving) routes. The result: fewer lines of code, automatic request/response schemas, built-in media serving, and background job support.
**Related guide:** [Serving Tables and Queries over HTTP](/howto/deployment/serving)
***
## Concept Mapping
| Hand-Written Endpoints | `FastAPIRouter` Routes |
| -------------------------------------------------------------- | ------------------------------------------------ |
| Parse request, call `table.insert()`, serialize response | `add_insert_route(table, path, inputs, outputs)` |
| Parse params, run query, call `to_pydantic()` or `to_pandas()` | `add_query_route(path, query)` |
| Parse body, call `table.delete_where()` | `add_delete_route(table, path)` |
| One Pydantic model per endpoint | Auto-generated from column schemas |
| Manual file responses for media | Built-in `/media/...` handler |
| Custom task queue for background work | `background=True` on any route |
***
## Side by Side: Query Endpoint
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter(prefix="/api/data", tags=["data"])

class DocumentItem(BaseModel):
    title: str
    document: str

class DocumentsResponse(BaseModel):
    items: list[DocumentItem]

@router.get("/documents", response_model=DocumentsResponse)
def list_documents():
    t = pxt.get_table('myapp.documents')
    results = t.select(t.title, t.document).collect()
    return DocumentsResponse(
        items=[DocumentItem(**row) for row in results.to_dicts()]
    )
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.serving import FastAPIRouter

router = FastAPIRouter(prefix="/api/data", tags=["data"])
docs = pxt.get_table('myapp/documents')

@pxt.query
def list_documents():
    return docs.select(docs.title, docs.document).order_by(docs.title)

router.add_query_route(path="/documents", query=list_documents, method="get")
```
### What Changes
| | Hand-Written | `FastAPIRouter` |
| ------------------------- | --------------------------------------------- | ------------------------------- |
| **New endpoint** | Write handler, Pydantic model, serialization | One `add_*_route()` call |
| **File uploads** | Parse `UploadFile`, save, pass path to insert | `uploadfile_inputs=["image"]` |
| **Media responses** | Manual `FileResponse` or base64 encoding | Built-in `/media/...` serving |
| **Background processing** | Custom task queue (Celery, RQ, etc.) | `background=True` on any route |
| **OpenAPI docs** | Manual schema definitions | Auto-generated from columns |
| **Delete** | Parse body, call `delete_where()` | `add_delete_route(table, path)` |
***
## Step-by-Step Migration
### 1. Define queries in router files
Every read pattern becomes a `@pxt.query` function defined in the router file where it's used. Each one maps to a single `add_query_route` call.
`@pxt.query` eagerly evaluates the function body at decoration time. Tables must exist before the router module is imported. Run `python schema.py` before starting the app.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# routers/data.py: queries live next to the routes that use them
import pixeltable as pxt
from pixeltable.serving import FastAPIRouter

router = FastAPIRouter(prefix="/api/data", tags=["data"])
docs = pxt.get_table('myapp/documents')

@pxt.query
def list_documents():
    return docs.select(docs.title, docs.document).order_by(docs.title)

@pxt.query
def search_documents(query_text: str, limit: int = 10):
    sim = docs.text.similarity(string=query_text)
    return docs.order_by(sim, asc=False).limit(limit).select(docs.title, sim)

router.add_query_route(path="/documents", query=list_documents, method="get")
router.add_query_route(path="/search", query=search_documents)
router.add_insert_route(docs, path="/upload", uploadfile_inputs=["document"], outputs=["uuid"])
```
### 2. Replace `APIRouter` with `FastAPIRouter`
`FastAPIRouter` is a subclass of `APIRouter`, so hand-written `@router.post()` endpoints coexist on the same instance. Mount routers in `main.py`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# main.py
from fastapi import FastAPI
from routers import data, search
app = FastAPI()
app.include_router(data.router)
app.include_router(search.router)
```
### 4. Convert uploads
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@router.post("/upload")
def upload(file: UploadFile):
    t = pxt.get_table('myapp.images')
    t.insert([{'image': file.filename, 'timestamp': datetime.now()}])
    return {"status": "ok"}
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
images_table = pxt.get_table('myapp/images')
router.add_insert_route(
images_table, path="/upload",
uploadfile_inputs=["image"], inputs=["timestamp"],
outputs=["uuid", "thumbnail"],
)
```
The multipart form field name must match the column name.
### 5. Convert deletes
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
router.add_delete_route(docs_table, path="/delete")
# Client sends: POST {"uuid": "..."} (matches primary key)
# Response: {"num_rows": 1}
```
### 6. Delete old Pydantic models
`FastAPIRouter` auto-generates request/response schemas from column types. Delete hand-written Pydantic models for any endpoint that is now declarative. Keep only models for remaining hand-written endpoints.
***
## Eliminating Insert-Then-Query with `return_rows=True`
A common hand-written pattern is inserting a row, then immediately querying to read back computed columns. Use `return_rows=True` instead:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@router.post("/query")
def agent_query(request: QueryRequest):
    table.insert([{"prompt": request.prompt}])
    result = table.where(table.prompt == request.prompt).select(
        table.answer, table.tool_output
    ).collect()
    return result[0]
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from typing import Any

from pydantic import BaseModel

class AgentResult(BaseModel):
    model_config = {"extra": "ignore"}
    answer: str | None = None
    tool_output: Any = None

@router.post("/query")
def agent_query(request: QueryRequest):
    status = table.insert(
        [{"prompt": request.prompt}], return_rows=True
    )
    return AgentResult.model_validate(status.rows[0])
```
`status.rows` contains one dict per inserted row with all columns (including computed). Use `model_validate()` with `extra="ignore"` for typed access to the subset you need.
`return_rows=True` also works with `update()` and `batch_update()`.
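The effect of validating a full row dict against a model with `extra="ignore"` can be pictured with a standard-library analogue. This sketch assumes a returned row is a plain dict keyed by column name, as described above. The dataclass, its `from_row` helper, and the sample row are hypothetical stand-ins, not Pydantic or Pixeltable APIs.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class AgentResult:
    answer: Optional[str] = None
    tool_output: Any = None

    @classmethod
    def from_row(cls, row: dict) -> 'AgentResult':
        """Build a result from a row dict, silently dropping unknown columns."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in row.items() if k in known})


# Made-up row containing one column the response model does not declare
row = {
    'prompt': 'Weather in SF?',
    'answer': 'Sunny, 72°F',
    'tool_output': {'get_weather': ['Weather in SF: 72°F, sunny']},
}

result = AgentResult.from_row(row)
print(result.answer)
```

Only the declared fields survive, so adding computed columns to the table later does not change the response shape.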
***
## When to Keep a Hand-Written Endpoint
Not everything fits the declarative model. Keep `@router.post()` when:
* **Multi-table operations:** inserting into one table then conditionally writing to another
* **Conditional logic:** different behavior based on intermediate results
* **Custom response shapes:** aggregating across multiple tables into a single response
* **Side effects:** sending emails, webhooks, or other non-Pixeltable actions
Since `FastAPIRouter` extends `APIRouter`, hand-written and declarative routes coexist on the same router instance.
***
## Gotchas
### `@pxt.query` runs at decoration time
The function body executes when Python hits the `@pxt.query` decorator, not when you call the function. If the tables don't exist yet, you get an error. Run `python schema.py` before starting the app so tables exist when router modules are imported.
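The failure mode is easier to recognize with a toy decorator that behaves the same way. This is a simplified sketch, not Pixeltable's implementation: `eager_query` is a hypothetical decorator that calls the function once while decorating it, and the `catalog` dict stands in for the set of existing tables.

```python
catalog = {}  # empty catalog: the schema script has not run yet


def eager_query(fn):
    """Hypothetical decorator that evaluates the function body immediately."""
    fn()  # runs at decoration time, i.e. while the module is being imported
    return fn


try:
    @eager_query
    def list_documents():
        return catalog['documents']
    outcome = 'decorated'
except KeyError as exc:
    outcome = f'failed at import time: missing table {exc}'

print(outcome)

# After the "schema script" creates the table, the same definition succeeds
catalog['documents'] = ['row']


@eager_query
def list_documents():
    return catalog['documents']
```

The error surfaces on import rather than on the first request, which is why the schema has to exist before the router modules are loaded.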
### UUID parameters need `uuid.UUID` annotations
Pixeltable UUID columns require `uuid.UUID` objects. Use the type annotation and let Pydantic parse strings automatically:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import uuid as _uuid

@pxt.query
def get_chunks(file_uuid: _uuid.UUID):
    return view.where(view.uuid == file_uuid).select(...)
```
Do not call `UUID()` inside the query body. The parameter is an expression proxy, not a string.
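At the HTTP boundary, the string-to-UUID conversion amounts to something like the standard-library call below. This is a sketch of the coercion that happens before the query runs, assuming the client sends the UUID in its usual hyphenated string form; `parse_uuid_param` and the sample value are hypothetical.

```python
import uuid


def parse_uuid_param(raw: str) -> uuid.UUID:
    """Convert an incoming string into a uuid.UUID, raising ValueError if malformed."""
    return uuid.UUID(raw)


parsed = parse_uuid_param('12345678-1234-5678-1234-567812345678')

try:
    parse_uuid_param('not-a-uuid')
    rejected = False
except ValueError:
    rejected = True  # malformed input never reaches the query

print(type(parsed).__name__, rejected)
```

With the annotation in place this happens for you, so the query body only ever compares the column against the typed parameter.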
### Media columns serialize as URL strings
`FastAPIRouter` serializes `pxt.Document`, `pxt.Image`, `pxt.Video` columns as URL paths (e.g., `/api/data/media/path/to/file.pdf`). The client receives a string, not binary data.
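A client that needs the file itself has to resolve that path against the server's origin and fetch it separately. A minimal sketch, assuming the API is served from the hypothetical origin `https://example.com` and returns the example path shown above:

```python
from urllib.parse import urljoin

API_ORIGIN = 'https://example.com'  # hypothetical deployment origin


def media_url(path: str) -> str:
    """Turn a media path from a JSON response into an absolute URL."""
    return urljoin(API_ORIGIN, path)


url = media_url('/api/data/media/path/to/file.pdf')
print(url)
```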
### Query routes default to POST
`add_query_route` defaults to `method="post"` (JSON body). For parameter-free list endpoints, set `method="get"`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
router.add_query_route(path="/list", query=list_all, method="get")
```
### Delete routes use POST, not HTTP DELETE
`add_delete_route` uses POST with a JSON body (`{"uuid": "..."}`), not the HTTP DELETE method. Update client fetch calls accordingly.
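On the client side, that means building a POST request with a JSON body. The sketch below only constructs the request and does not send it. The base URL, the `/delete` path, and the UUID value are assumptions chosen to match the earlier examples, and `build_delete_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_delete_request(base_url: str, row_uuid: str) -> urllib.request.Request:
    """Build (but do not send) the POST request a delete route expects."""
    body = json.dumps({'uuid': row_uuid}).encode('utf-8')
    return urllib.request.Request(
        url=f'{base_url}/delete',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',  # not DELETE
    )


req = build_delete_request(
    'http://localhost:8000/api/data',
    '12345678-1234-5678-1234-567812345678',
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it to a running server.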
### Query responses wrap results in `{ "rows": [...] }`
All `add_query_route` responses return `{"rows": [...]}`, not a flat array. Client code must unwrap `.rows`.
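A client written against a flat-array endpoint needs a one-line change to unwrap the envelope. A minimal sketch using a made-up response body in the shape described above; `unwrap_rows` is a hypothetical helper:

```python
import json


def unwrap_rows(response_body: str) -> list:
    """Extract the list of rows from a query route's JSON response."""
    payload = json.loads(response_body)
    return payload['rows']


# Made-up response body for a query that selects title and document
body = '{"rows": [{"title": "Report", "document": "/api/data/media/report.pdf"}]}'

rows = unwrap_rows(body)
print(rows[0]['title'])
```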
### Name your embedding indexes explicitly
`add_embedding_index(if_exists="ignore")` can create duplicates without an explicit name. Always pass `idx_name`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
view.add_embedding_index(
'text', idx_name='chunks_text_embed',
string_embed=e5_embed, if_exists='ignore',
)
```
***
## Best Practices
1. **Start with `@pxt.query`** for all read patterns. Each one becomes a single `add_query_route`.
2. **One table per route.** Let the client merge results from granular endpoints.
3. **Use `FastAPIRouter` as your base router.** You get media serving and background jobs for free.
4. **Keep complex orchestration hand-written.** Multi-table inserts and conditional logic stay as `@router.post()`.
5. **Type your `@pxt.query` parameters** (`uuid.UUID`, `int`, `float`, `str`). Pydantic coerces the incoming JSON; Pixeltable handles the rest.
6. **Prototype with TOML first.** [`pxt serve`](/howto/deployment/serving) with `[[tool.pixeltable.service.routes]]` in `pyproject.toml` is the fastest way to test a new route before wiring it into your app.
***
## Next Steps
* TOML config, CLI, Python API, background jobs
* Full backend, batch processing, and declarative serving
* Concurrency, error handling, sync endpoints
* End-to-end multimodal app patterns
# RDBMS & Vector DBs
Source: https://docs.pixeltable.com/migrate/from-rdbms-vectordbs
Replace Postgres plus Pinecone plus LangChain stacks with a single Pixeltable system that handles structured data, embeddings, and RAG queries.
If you're running a RAG application with Postgres for metadata, a vector database like Pinecone or Weaviate for embeddings, and LangChain for orchestration — this guide shows how Pixeltable unifies all three.
**Related use case:** [Backend for AI Apps](/use-cases/ai-applications)
***
## Concept Mapping
| Your Database Stack | Pixeltable Equivalent |
| ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| Postgres / MySQL for metadata | [`pxt.create_table()`](/tutorials/tables-and-data-operations) with typed columns |
| Pinecone / Weaviate / Chroma for embeddings | [`add_embedding_index()`](/platform/embedding-indexes) — built-in HNSW search |
| S3 for media files (referenced by URL) | [`pxt.Image`, `pxt.Video`, `pxt.Document`](/platform/type-system) native types |
| ORM (SQLAlchemy, Prisma) | [`.select()`, `.where()`, `.order_by()`](/tutorials/queries-and-expressions) |
| LangChain `DocumentLoader` | `insert()`, [`import_csv()`](/howto/cookbooks/data/data-import-csv), [import from S3](/integrations/cloud-storage) |
| `RecursiveCharacterTextSplitter` | [`document_splitter`](/platform/iterators) iterator via `create_view()` |
| `retriever.get_relevant_documents()` | [`.similarity()`](/platform/embedding-indexes) + `.order_by()` |
| `create_retrieval_chain()` | [Computed column](/tutorials/computed-columns) with LLM call |
| Keeping Postgres and Pinecone in sync | Automatic — derived columns can't go stale |
***
## Side by Side: RAG Pipeline
Load documents, chunk, embed, retrieve, and generate answers.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_pinecone import PineconeVectorStore
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import PromptTemplate
# Load and chunk
documents = PyPDFLoader('report.pdf').load()
chunks = RecursiveCharacterTextSplitter(
chunk_size=1000, chunk_overlap=200
).split_documents(documents)
# Embed and store in Pinecone
embeddings = OpenAIEmbeddings(model='text-embedding-3-small')
vector_store = PineconeVectorStore.from_documents(
chunks, embeddings, index_name='my-index')
retriever = vector_store.as_retriever(search_kwargs={'k': 5})
# Build chain
prompt = PromptTemplate.from_template(
'Answer based on context:\n{context}\n\nQuestion: {input}')
llm = ChatOpenAI(model='gpt-4o-mini', temperature=0)
rag_chain = create_retrieval_chain(
retriever,
create_stuff_documents_chain(llm, prompt))
result = rag_chain.invoke({'input': 'What were the key findings?'})
print(result['answer'])
```
**Packages:** `langchain`, `langchain-openai`, `langchain-pinecone`, `pinecone-client`, `sqlalchemy`
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.openai import chat_completions, embeddings
from pixeltable.functions.document import document_splitter
docs = pxt.create_table('rag.docs', {
'pdf': pxt.Document, 'source': pxt.String})
chunks = pxt.create_view('rag.chunks', docs,
iterator=document_splitter(
docs.pdf, separators='sentence,token_limit', limit=300))
chunks.add_embedding_index('text',
string_embed=embeddings.using(model='text-embedding-3-small'))
@pxt.query
def retrieve(question: str, top_k: int = 5) -> pxt.Query:
    sim = chunks.text.similarity(string=question)
    return chunks.order_by(sim, asc=False) \
        .limit(top_k).select(chunks.text)
qa = pxt.create_table('rag.qa', {'question': pxt.String})
qa.add_computed_column(context=retrieve(qa.question))
@pxt.udf
def build_prompt(question: str, context: list[dict]) -> str:
    ctx = '\n\n'.join(c['text'] for c in context)
    return f'Answer based on context:\n{ctx}\n\nQuestion: {question}'
qa.add_computed_column(prompt=build_prompt(qa.question, qa.context))
qa.add_computed_column(response=chat_completions(
messages=[{'role': 'user', 'content': qa.prompt}],
model='gpt-4o-mini'))
qa.add_computed_column(
answer=qa.response.choices[0].message.content)
docs.insert([{'pdf': 'report.pdf', 'source': 'annual_report'}])
qa.insert([{'question': 'What were the key findings?'}])
qa.select(qa.question, qa.answer).collect()
```
**Packages:** `pixeltable`, `openai`
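Because `build_prompt` is ordinary Python, its logic can be checked outside Pixeltable. The sketch below repeats the function body without the `@pxt.udf` decorator and feeds it a hand-written `context` list. The sample chunk texts are invented for illustration, and the list-of-dicts shape is inferred from the `c['text']` access in the UDF above rather than taken from the API reference.

```python
def build_prompt(question: str, context: list[dict]) -> str:
    # Join the retrieved chunk texts, separated by blank lines
    ctx = '\n\n'.join(c['text'] for c in context)
    return f'Answer based on context:\n{ctx}\n\nQuestion: {question}'


# Hypothetical retrieval result: one dict per chunk, keyed by the selected column
sample_context = [
    {'text': 'Revenue grew 12% year over year.'},
    {'text': 'Operating costs were flat.'},
]
prompt = build_prompt('What were the key findings?', sample_context)
print(prompt)
```

Testing the prompt template this way costs nothing and catches formatting mistakes before any LLM call is made.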
### What Changes
| | LangChain + Pinecone | Pixeltable |
| ------------------------ | ------------------------------------------------ | ------------------------------------------------------------------------- |
| **New documents** | Re-run chunking, embedding, and Pinecone upsert | `docs.insert([...])` — chunks, embeddings, and index update automatically |
| **Infrastructure** | Postgres + Pinecone account + API keys | Single local system, no external services |
| **Sync issues** | Postgres metadata and Pinecone vectors can drift | Impossible — derived columns are always consistent |
| **Intermediate results** | Ephemeral unless you add logging | Every column is stored and queryable: `qa.select(qa.context).collect()` |
| **Versioning** | Not built-in | `t.history()`, `pxt.create_snapshot()` |
| **Swap providers** | Rewrite chain with new provider classes | Change the model string — same pipeline |
***
## Common Patterns
### Adding new documents
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
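# Reuse the same splitter settings as the original pipeline
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
new_docs = PyPDFLoader('new_report.pdf').load()
new_chunks = splitter.split_documents(new_docs)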
vector_store.add_documents(new_chunks)
# Also update Postgres metadata...
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
docs.insert([{'pdf': 'new_report.pdf', 'source': 'quarterly'}])
```
### Filtering by metadata
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
retriever = vector_store.as_retriever(
search_kwargs={'k': 5, 'filter': {'source': 'annual_report'}})
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query = 'What were the key findings?'  # example question
sim = chunks.text.similarity(string=query)
results = (chunks
.where((chunks.source == 'annual_report') & (sim > 0.3))
.order_by(sim, asc=False).limit(5).collect())
```
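To make the semantics of the Pixeltable query above concrete, here is a plain-Python sketch of the same four steps: filter on metadata, apply a similarity threshold, sort in descending order, and keep the top results. It is a conceptual model only, not how Pixeltable executes the query. The helper names, the 2-D vectors, and the use of cosine similarity are all assumptions made for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two non-zero vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def filtered_top_k(rows, query_vec, source, min_sim=0.3, k=5):
    # 1. metadata filter, 2. similarity threshold, 3. descending sort, 4. limit
    scored = [(cosine(r['vec'], query_vec), r) for r in rows if r['source'] == source]
    kept = [(s, r) for s, r in scored if s > min_sim]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r['text'] for _, r in kept[:k]]


# Toy rows with made-up embeddings
rows = [
    {'text': 'findings summary', 'source': 'annual_report', 'vec': [1.0, 0.0]},
    {'text': 'related appendix', 'source': 'annual_report', 'vec': [1.0, 1.0]},
    {'text': 'unrelated note', 'source': 'annual_report', 'vec': [0.0, 1.0]},
    {'text': 'quarterly update', 'source': 'quarterly', 'vec': [1.0, 0.0]},
]
top = filtered_top_k(rows, query_vec=[1.0, 0.0], source='annual_report')
print(top)
```

The "unrelated note" row is dropped by the threshold and the "quarterly update" row by the metadata filter, which mirrors the combined `.where(...)` condition.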
### Inspecting what was retrieved
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
result = rag_chain.invoke({'input': query})
print(result['context']) # if available
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
qa.select(qa.question, qa.context, qa.answer).collect()
```
***
## Next Steps
Full use case walkthrough
Complete RAG system with chunking and retrieval
Control chunk size, overlap, and splitting strategies
Search patterns and similarity queries
# Building with LLMs
Source: https://docs.pixeltable.com/overview/building-pixeltable-with-llms
Build Pixeltable applications faster with AI coding tools like Cursor and Claude using bundled context, examples, and llms.txt resources.
## Why Pixeltable Is Easy to Vibe-Code
Pixeltable's API is declarative: you say *what* you want, not *how* to wire it up. That leaves less plumbing for an LLM to get wrong, so generated code tends to work on the first try. Ask your AI tool to "summarize articles with GPT-4o-mini" and you get working code:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.openai import chat_completions
t = pxt.create_table('app.articles', {'title': pxt.String, 'body': pxt.String})
t.add_computed_column(response=chat_completions(
messages=[{'role': 'user', 'content': t.body}], model='gpt-4o-mini'))
t.add_computed_column(summary=t.response.choices[0].message.content)
t.insert([{'title': 'Climate Report', 'body': 'Global temperatures rose 1.2°C ...'}])
t.select(t.title, t.summary).collect()
```
Ten lines of code, and the result is **persistent**, **versioned**, **traceable**, and **incrementally optimized**. Every output is stored, every transformation is replayable, and new rows only recompute what changed. The same pattern scales to [RAG pipelines](/howto/cookbooks/agents/pattern-rag-pipeline), [video frame extraction](/howto/cookbooks/video/video-extract-frames), [tool-calling agents](/howto/cookbooks/agents/llm-tool-calling), and [semantic search](/howto/cookbooks/search/search-semantic-text).
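The "new rows only recompute what changed" idea can be modeled in a few lines of plain Python. The `ToyTable` class below is a hypothetical teaching aid, not Pixeltable's implementation: it stores each computed value at insert time and counts how often the expensive function runs. `fake_summarize` stands in for an LLM call.

```python
from typing import Callable


class ToyTable:
    """Toy model of a table with one stored computed column."""

    def __init__(self, compute: Callable[[str], str]) -> None:
        self.compute = compute
        self.rows: list[dict] = []
        self.calls = 0  # number of times the expensive function has run

    def insert(self, bodies: list[str]) -> None:
        # Only the newly inserted rows are computed
        for body in bodies:
            self.calls += 1
            self.rows.append({'body': body, 'summary': self.compute(body)})

    def select_summaries(self) -> list[str]:
        # Reads stored results; nothing is recomputed at query time
        return [row['summary'] for row in self.rows]


def fake_summarize(body: str) -> str:
    # Stand-in for an LLM call: keep the first ten characters
    return body[:10]


toy = ToyTable(fake_summarize)
toy.insert(['first article body', 'second article body'])
toy.select_summaries()
toy.insert(['third article body'])
print(toy.calls)
```

After three inserted rows and a query, the function has run exactly three times: once per row, never again for rows that already have a stored result.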
***
## Set Up Your AI Tool
Pick the setup that matches your editor. These aren't mutually exclusive; use whichever combination helps.
### Agent Skill (recommended, works everywhere)
The [Pixeltable Agent Skill](https://github.com/pixeltable/pixeltable-skill) teaches AI coding assistants to write correct Pixeltable code on the first attempt. It provides anti-pattern deflection (no LangChain, no pandas-as-store, no for-loops calling AI), correct patterns for 25+ providers, and production recipes for agents, RAG, and multimodal pipelines.
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
npx skills add pixeltable/pixeltable-skill
```
Works with any tool that supports the [Agent Skills specification](https://agentskills.io/specification): Cursor, Claude Code, Windsurf, Cline, OpenCode, Codex CLI, and 40+ more.
The `npx skills add` command above is the recommended setup. It installs the full skill with anti-pattern deflection, provider coverage, and progressive reference loading.
Alternatively, drop our [AGENTS.md](https://github.com/pixeltable/pixeltable/blob/main/AGENTS.md) into your project root for contributor-focused context:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -o AGENTS.md https://raw.githubusercontent.com/pixeltable/pixeltable/main/AGENTS.md
```
The `npx skills add` command works with Claude Code. You can also install as a Claude Code plugin for auto-updates:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
/plugin marketplace add pixeltable/pixeltable-skill
/plugin install pixeltable-skill@pixeltable-skill
```
The skill loads a concise `SKILL.md` first (\~480 lines), then pulls in reference files on demand only when the task requires them.
Append `.md` to any docs URL to get a plain-text version optimized for LLMs. Paste it straight into your chat.
| Resource | URL |
| --------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
| Any docs page as markdown   | `https://docs.pixeltable.com/<page-path>.md`, e.g., [this page](https://docs.pixeltable.com/overview/building-pixeltable-with-llms.md) |
| Site index for LLMs | [llms.txt](https://docs.pixeltable.com/llms.txt) ([standard](https://llmstxt.org/)) |
| Full site map with metadata | [llms-full.txt](https://docs.pixeltable.com/llms-full.txt) |
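If you collect page links programmatically before pasting them into a chat, the `.md` rule is easy to script. The helper below is a hypothetical convenience function, not part of Pixeltable. It assumes the URL has no query string or fragment.

```python
def docs_markdown_url(page_url: str) -> str:
    """Return the plain-text (.md) variant of a docs page URL."""
    base = page_url.rstrip('/')
    if base.endswith('.md'):
        return base  # already the markdown variant
    return base + '.md'


print(docs_markdown_url('https://docs.pixeltable.com/overview/building-pixeltable-with-llms'))
```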
***
## MCP Servers
Connect your AI tool to Pixeltable directly via the [Model Context Protocol](https://modelcontextprotocol.io). We ship two servers, or you can build your own using [`pxt.mcp_udfs()`](/libraries/mcp).
Search the full documentation from Claude Desktop, Cursor, or Windsurf:
```
https://docs.pixeltable.com/mcp
```
Exposes a `SearchPixeltableDocumentation` tool that returns relevant content, code examples, and direct links.
32 tools for creating tables, running queries, managing dependencies, and executing Python, all from your AI editor. Experimental; great for prototyping.
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Install
uv tool install --from git+https://github.com/pixeltable/mcp-server-pixeltable-developer.git mcp-server-pixeltable-developer
# Add to Claude Code
claude mcp add pixeltable mcp-server-pixeltable-developer
```
See [configuration for Cursor, Claude Desktop, and more](https://github.com/pixeltable/mcp-server-pixeltable-developer) in the repo README.
Any Pixeltable UDF or query function can be exposed as an MCP tool with a single call:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
@pxt.udf
def lookup_customer(name: str) -> str:
    """Look up customer info by name."""
    t = pxt.get_table('app.customers')
    return t.where(t.name == name).select(t.info).collect()[0]['info']
tools = pxt.tools(lookup_customer)
```
`pxt.tools()` wraps your functions so any MCP-compatible client can call them. See the [MCP integration guide](/libraries/mcp) for the full setup.
***
## Start Building
Scaffold a full project in one command with `pixeltable-new`. It generates a working Pixeltable project (serving, backend, or batch) with schema, configuration, and deployment configs already wired up. Ask your AI tool to customize it from there.
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
uvx pixeltable-new myapp # declarative serving (default)
uvx pixeltable-new myapp --backend # full FastAPI + React app
uvx pixeltable-new myapp --template knowledge-base my-kb # vertical template
```
Reference implementations for all three deployment patterns, plus 7 vertical application templates
***
## Next Steps
Install and run your first pipeline in 5 minutes
The core pattern LLMs generate. Learn how it works
Build agents with UDFs, queries, and MCP tools
Full use case walkthrough for AI agents
# What is Pixeltable?
Source: https://docs.pixeltable.com/overview/pixeltable
Pixeltable is declarative AI data infrastructure that provides incremental computed columns, multimodal storage, and versioning in one Python API.
**The only open source Python library providing declarative data infrastructure for building multimodal AI applications, enabling incremental storage, transformation, indexing, retrieval, and orchestration of data.**
With Pixeltable, you define your entire data processing and AI workflow declaratively using computed columns on tables. Focus on your application logic, not the data plumbing.
## Before Pixeltable
AI teams are building on images, video, audio, and text, but the infrastructure is broken:
Data lives across object stores, vector DBs, SQL, and ad-hoc pipelines. No single source of truth.
Every model change requires reprocessing. Pipelines are brittle and hard to reproduce.
This creates high engineering cost, slow iteration, and production risk.
**Pixeltable solves this.** One system for storage, orchestration, and retrieval. Transactions, incremental updates, and automatic dependency tracking built in.
## With Pixeltable
All data and computed results are automatically stored and versioned.
Data transformations run automatically on new data. No orchestration code needed.
Images, video, audio, and documents integrate seamlessly with structured data.
Built-in support for OpenAI, Anthropic, Gemini, Hugging Face, and dozens more.
## Get started
Install Pixeltable and run your first pipeline in 5 minutes.
See Pixeltable in action with a hands-on image workflow.
Learn about tables, computed columns, views, and the type system.
Complete API reference for the Pixeltable Python SDK.
Many documentation pages are interactive notebooks (marked in the sidebar). Open them in Colab, Kaggle, or locally to follow along.
## Core Primitives
Pixeltable provides a small set of primitives that compose into any multimodal AI workflow:
**Create tables with native multimodal types**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('myapp.media', {
'video': pxt.Video,
'image': pxt.Image,
'audio': pxt.Audio,
'document': pxt.Document,
'metadata': pxt.Json
})
```
Create, insert, update, delete
All supported types
**Declarative computed columns: API calls, LLM inference, local models, vision**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# LLM API call
t.add_computed_column(summary=openai.chat_completions(
    messages=[{'role': 'user', 'content': 'Summarize: ' + t.text}],
    model='gpt-4o-mini'
))
# Local model inference
t.add_computed_column(objects=yolox(t.image, model_id='yolox_s'))
# Vision analysis (multimodal)
t.add_computed_column(desc=openai.chat_completions(
messages=[{'role': 'user', 'content': [
{'type': 'text', 'text': 'Describe this image'},
{'type': 'image_url', 'image_url': t.image},
]}],
model='gpt-4o-mini'
))
```
Incremental transforms
OpenAI, Anthropic, Gemini, HuggingFace...
**Explode rows: video→frames, doc→chunks, audio→segments**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract frames from video at 1 fps
frames = pxt.create_view('myapp.frames', t, iterator=frame_iterator(t.video, fps=1))
# Chunk documents for RAG
chunks = pxt.create_view('myapp.chunks', t, iterator=document_splitter(t.document))
```
Virtual tables
Frame, Document, Audio splitters
**Add embedding indexes for semantic search**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_embedding_index('text', embedding=openai.embeddings.using(model='text-embedding-3-small'))
# Search by similarity
results = t.order_by(t.text.similarity(string='find relevant docs'), asc=False).limit(10).collect()
```
Vector search with automatic maintenance
**Write custom functions with `@pxt.udf` and `@pxt.query`**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
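@pxt.udf
def extract_entities(text: str) -> list[str]:
    # Placeholder logic: treat capitalized words as entities; replace with your own
    entities = [word for word in text.split() if word.istitle()]
    return entities
@pxt.query
def search_by_topic(topic: str):
    return t.where(t.category == topic).select(t.title, t.summary)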
```
Custom Python functions
**Tool calling for AI agents and MCP integration**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Load tools from MCP server, UDFs, and queries
mcp_tools = pxt.mcp_udfs('http://localhost:8000/mcp')
tools = pxt.tools(search_by_topic, extract_entities, *mcp_tools)
# LLM decides which tool to call; Pixeltable executes it
t.add_computed_column(response=openai.chat_completions(
    messages=[{'role': 'user', 'content': t.question}],
    model='gpt-4o-mini',
    tools=tools
))
t.add_computed_column(result=openai.invoke_tools(tools, t.response))
```
Build agents with tools
MCP servers, memory, Pixelbot
**Expose tables and queries as HTTP endpoints**
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# pyproject.toml
[[tool.pixeltable.service]]
name = "my-service"
prefix = "/api"
modules = ["schema"]
[[tool.pixeltable.service.routes]]
type = "insert"
table = "myapp.docs"
path = "/ingest"
inputs = ["document"]
outputs = ["document", "summary"]
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve my-service
```
TOML config, CLI, Python API, background jobs
**SQL-like queries + test transformations before committing**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Query data with familiar syntax
results = t.where(t.score > 0.8).order_by(t.timestamp).limit(10).collect()
# Test transformations on sample rows BEFORE adding to table
t.select(t.text, summary=summarize(t.text)).head(3) # Nothing stored yet
t.add_computed_column(summary=summarize(t.text)) # Now commit to all rows
```
Select, filter, aggregate
Test before commit
**Time travel and automatic versioning**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.history() # View all versions
t.revert(version=5) # Rollback changes
old_data = pxt.get_table('myapp.media:3') # Query past version
```
History, snapshots, lineage
**Load from any source, export to ML formats**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Import from files, URLs, S3, Hugging Face
t.insert(pxt.io.import_csv('data.csv'))
t.insert(pxt.io.import_huggingface_dataset(dataset))
# Export to files or ML formats
pxt.io.export_csv(t, 'output.csv')
pxt.io.export_json(t, 'output.json')
pxt.io.export_parquet(t, 'output.parquet')
loader = DataLoader(t.to_pytorch_dataset(), batch_size=32)
```
CSV, JSON, Parquet, S3, HF
CSV, JSON, Parquet, PyTorch, COCO, LanceDB
**Publish and replicate datasets via Pixeltable Cloud**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.publish(t, 'my-dataset') # Share publicly
pxt.replicate('user/dataset', 'local') # Pull to local
```
Publish, replicate, collaborate
## Use Cases
Pixeltable's primitives are **use-case agnostic**. They compose into any multimodal AI workflow:
Curate, augment, export training datasets. Pre-annotate with models, integrate Label Studio, export PyTorch.
Build RAG systems, semantic search, and multimodal APIs. Pixeltable handles storage, retrieval, and orchestration.
Tool-calling agents with persistent memory, MCP server integration, and automatic conversation history.
Start with the **[Quick Start](/overview/quick-start)** to get running in 5 minutes, or explore **[Cookbooks](/howto/cookbooks/agents/pattern-rag-pipeline)** for hands-on examples covering RAG, video analysis, audio transcription, and more.
## Choose How You Run Pixeltable
Open-source Python library. Install with `pip install pixeltable` and run locally. Same APIs scale to production.
Data sharing available now. Managed endpoints and live tables coming soon.
Schedule a call to discuss your use case and see how Pixeltable can help.
## Next steps
Get help, share projects, and connect with other developers
Star the repo, report issues, and contribute
# Quick Start
Source: https://docs.pixeltable.com/overview/quick-start
Install Pixeltable, create your first table, add a computed column powered by an LLM, and run your first query in just a few minutes of setup.
## System requirements
Before installing, ensure your system meets these requirements:
* Python 3.10 or higher
* Linux, macOS, or Windows
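A quick way to confirm the interpreter requirement before installing is a short version check. This is a minimal sketch. The function name `python_is_supported` is made up for illustration, and only the 3.10 minimum comes from the list above.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements above


def python_is_supported(version_info=sys.version_info) -> bool:
    # Compare only (major, minor) against the minimum
    return tuple(version_info[:2]) >= MIN_PYTHON


if not python_is_supported():
    print(f'Python {sys.version_info.major}.{sys.version_info.minor} is older than 3.10')
```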
## Installation
It is recommended that you install Pixeltable in a virtual environment.
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
python -m venv .venv
```
```bash Linux/MacOS theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
source .venv/bin/activate
```
```bash Windows theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
.venv\Scripts\activate
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pip install pixeltable
```
Install uv from the [Installing uv](https://docs.astral.sh/uv/getting-started/installation/) guide.
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
uv venv --python 3.12
```
```bash Linux/MacOS theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
source .venv/bin/activate
```
```bash Windows theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
.venv\Scripts\activate
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
uv pip install pixeltable
```
Download and install from the [Miniconda Installation](https://www.anaconda.com/docs/getting-started/miniconda/main) guide.
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
conda create --name pxt python=3.12
conda activate pxt
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pip install pixeltable
```
## Getting help
* Join our [Discord Community](https://discord.com/invite/QPyqFYx2UN)
* Report issues on [GitHub](https://github.com/pixeltable/pixeltable/issues)
* Contact [support@pixeltable.com](mailto:support@pixeltable.com)
## Build an image analysis app
This guide will help you spin up a functioning AI workload in 5 minutes.
Pixeltable requires only a minimal set of Python packages by default. To use AI models, you'll need to install
additional dependencies.
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pip install torch transformers openai
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Create a namespace and table
pxt.create_dir('quickstart', if_exists='replace_force')
t = pxt.create_table('quickstart/images', {'image': pxt.Image})
```
Tables are persistent: your data survives restarts and can be queried anytime.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import huggingface
# Add DETR object detection as a computed column
t.add_computed_column(
detections=huggingface.detr_for_object_detection(
t.image,
model_id='facebook/detr-resnet-50'
)
)
# Extract labels from detections
t.add_computed_column(labels=t.detections.label_text)
```
Computed columns run automatically whenever new data is inserted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert a few images
t.insert([
{'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000001.jpg'},
{'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000025.jpg'}
])
```
You can insert images from URLs and/or local paths in any combination.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Query results
t.select(t.image, t.labels).collect()
```
**Expected output:**
| image | labels |
| -------- | --------------------------------------------- |
| \[Image] | \[car, parking meter, truck, car, car, truck] |
| \[Image] | \[giraffe, giraffe] |
You'll need an OpenAI API key to use this step. If you don't have one, you can
safely skip this step.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import os
from pixeltable.functions import openai
# Set your API key
os.environ['OPENAI_API_KEY'] = 'your-key-here'
messages = [
{
'role': 'user',
'content': [
{'type': 'text', 'text': 'Describe this image in one sentence.'},
{'type': 'image_url', 'image_url': t.image},
],
}
]
t.add_computed_column(
response=openai.chat_completions(messages, model='gpt-4o-mini')
)
t.select(t.image, t.labels, t.response.choices[0].message.content).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# See the full text of the response in row 0
t.select(t.response.choices[0].message.content).collect()[0]
```
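The expression `t.response.choices[0].message.content` walks the JSON stored in the `response` column. On a plain Python dict the same path is written with subscripts, as in the sketch below. The response shown is a hypothetical, heavily abbreviated example; real responses carry more fields.

```python
# Hypothetical, abbreviated chat-completion response
response = {
    'id': 'chatcmpl-example',
    'choices': [
        {
            'index': 0,
            'message': {'role': 'assistant', 'content': 'Two giraffes stand near a tree.'},
            'finish_reason': 'stop',
        }
    ],
}

# Same path as t.response.choices[0].message.content, using dict and list subscripts
content = response['choices'][0]['message']['content']
print(content)
```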
Pixeltable orchestrates LLM calls for optimized throughput, handling
rate limiting, retries, and caching automatically.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.insert([
{'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000034.jpg'},
{'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000057.jpg'}
])
t.select(t.image, t.labels).collect()
```
When new data is inserted into tables, Pixeltable incrementally runs all
computed columns against the new data, ensuring the table is up to date.
If you completed the optional LLM Vision step, the descriptions will also
be generated automatically for these new images.
Pixeltable automatically:
1. Created a persistent multimodal table
2. Downloaded and cached the DETR model
3. Ran inference on your images
4. Stored all results (including computed columns) for instant retrieval
5. Will incrementally process any new images you insert
## Next Steps
A deeper walkthrough with video, embeddings, and similarity search.
Three production patterns: Full Backend (FastAPI + React), Batch Processing (export to your DB), and Declarative Serving (API from TOML).
Expose tables and queries as HTTP endpoints with `pxt serve` or `FastAPIRouter`.
`uvx pixeltable-new myapp` — scaffold a full project (serving, backend, or batch) in one command.
# 10-Minute Tour
Source: https://docs.pixeltable.com/overview/ten-minute-tour
Hands-on ten-minute walkthrough of Pixeltable that covers tables, computed columns, views, embedding indices, and multimodal pipelines.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation.
Welcome to Pixeltable! In this tutorial, we’ll survey how to create
tables, populate them with data, and enhance them with built-in and
user-defined transformations and AI operations.
## Install Python packages
First run the following command to install Pixeltable and related
libraries needed for this tutorial.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU torch transformers timm openai pixeltable
```
## Creating a table
Let’s begin by creating a `demo` directory (if it doesn’t already exist)
and a table that can hold image data, `demo/first`. The table will
initially have just a single column to hold our input images, which
we’ll call `input_image`. We also need to specify a type for the column:
`pxt.Image`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Create the directory `demo`, dropping it first (if it exists)
# to ensure a clean environment.
pxt.drop_dir('demo', force=True)
pxt.create_dir('demo')
# Create the table `demo/first` with a single column `input_image`
t = pxt.create_table('demo/first', {'input_image': pxt.Image})
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'demo'.
Created table 'first'.
We can use `t.describe()` to examine the table schema. We see that it
now contains a single column, as expected.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.describe()
```
The new table is initially empty, with no rows:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.count()
```
0
Now let’s put an image into it! We can add images simply by giving
Pixeltable their URLs. The example images in this demo come from the
[COCO dataset](https://cocodataset.org/), and we’ll be referencing
copies of them in the Pixeltable GitHub repo. But in practice, the
images can come from anywhere: an S3 bucket, say, or the local file
system.
When we add the image, we see that Pixeltable gives us some useful
status updates indicating that the operation was successful.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.insert(
[
{
'input_image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000025.jpg'
}
]
)
```
Inserted 1 row with 0 errors in 0.21 s (4.86 rows/s)
1 row inserted.
We can use `t.head()` to examine the contents of the table.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.head()
```
## Adding computed columns
Great! Now we have a table containing some data. Let’s add an object
detection model to our workflow. Specifically, we’re going to use DETR
(“DEtection TRansformer”) with a ResNet-50 backbone, loaded through the
Hugging Face model class. Pixeltable contains a built-in adapter for
this model family, so all we have to do is call the
`detr_for_object_detection` Pixeltable function. A nice thing about the
Hugging Face models is that they run locally, so you don’t need an
account with a service provider in order to use them.
This is our first example of a **computed column**, a key concept in
Pixeltable. Recall that when we created the `input_image` column, we
specified a type, `pxt.Image`, indicating our intent to populate it with
data in the future. When we create a *computed* column, we instead
specify a function that operates on other columns of the table. By
default, when we add the new computed column, Pixeltable immediately
evaluates it against all existing data in the table - in this case, by
calling the `detr_for_object_detection` function on the image.
Depending on your setup, it may take a minute for the function to
execute. In the background, Pixeltable is downloading the model from
Hugging Face (if necessary), instantiating it, and caching it for later
use.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import huggingface
t.add_computed_column(
detections=huggingface.detr_for_object_detection(
t.input_image, model_id='facebook/detr-resnet-50'
)
)
```
Added 1 column value with 0 errors in 3.26 s (0.31 rows/s)
1 row updated.
Let’s examine the results.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.head()
```
We see that the model returned a JSON structure containing a lot of
information. In particular, it has the following fields:
* `label_text`: Descriptions of the objects detected
* `boxes`: Bounding boxes for each detected object
* `scores`: Confidence scores for each detection
* `labels`: The DETR model’s internal IDs for the detected objects
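As a rough picture of that structure, the sketch below builds a plain dict with the four fields and walks it. The values are illustrative only (they are not real model output), and treating the four lists as parallel, with index `i` describing the `i`-th object, is an assumption.

```python
from collections import Counter

# Illustrative values only; the field names follow the list above
detections = {
    'label_text': ['giraffe', 'giraffe'],
    'boxes': [[50.0, 40.0, 210.0, 380.0], [230.0, 60.0, 400.0, 390.0]],
    'scores': [0.99, 0.98],
    'labels': [25, 25],
}

# Walk the lists in step: one label, box, and score per detected object
for name, box, score in zip(detections['label_text'], detections['boxes'], detections['scores']):
    print(f'{name}: score={score:.2f}, box={box}')

# Count how many objects of each kind were found
label_counts = Counter(detections['label_text'])
```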
Perhaps this is more than we need, and all we really want are the text
labels. We could add another computed column to extract `label_text`
from the JSON struct:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(detections_text=t.detections.label_text)
t.head()
```
If we inspect the table schema now, we see how Pixeltable distinguishes
between ordinary and computed columns.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.describe()
```
Now let’s add some more images to our table. This demonstrates another
important feature of computed columns: by default, they update
incrementally any time new data shows up on their inputs. In this case,
Pixeltable will run the DETR model against each new image that is
added, then extract the labels into the `detections_text` column. Pixeltable
will orchestrate the execution of any sequence (or DAG) of computed
columns.
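To see what "orchestrating a DAG of computed columns" means in practice, the sketch below orders this table's columns by their dependencies using the standard library's `graphlib`. It illustrates the concept only and makes no claim about how Pixeltable schedules work internally.

```python
from graphlib import TopologicalSorter

# Map each computed column to the columns it reads from
dependencies = {
    'detections': {'input_image'},
    'detections_text': {'detections'},
}

# A valid evaluation order computes every column after its inputs
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Any order in which `input_image` precedes `detections`, and `detections` precedes `detections_text`, is valid; for a simple chain like this one there is only one such order.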
Note how we can pass multiple rows to `t.insert` with a single
statement, which will insert them more efficiently.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
more_images = [
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000030.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000034.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000042.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000061.jpg',
]
t.insert({'input_image': image} for image in more_images)
```
Inserted 4 rows with 0 errors in 1.51 s (2.65 rows/s)
4 rows inserted.
Let’s see what the model came up with. We’ll use `t.select` to suppress
the display of the `detections` column, since right now we’re only
interested in the text labels.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.input_image, t.detections_text).head()
```
## Pixeltable is persistent
An important feature of Pixeltable is that *everything is persistent*.
Unlike in-memory Python libraries such as Pandas, Pixeltable is a
database: all your data, transformations, and computed columns are
stored and preserved between sessions. To see this, let’s clear all the
variables in our notebook and start fresh. You can optionally restart
your notebook kernel at this point, to demonstrate how Pixeltable data
persists across sessions.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Clear all variables in the notebook
%reset -f
# Re-import Pixeltable and get a fresh handle to the existing table
import pixeltable as pxt
t = pxt.get_table('demo/first')
# Display just the first two rows, to avoid cluttering the tutorial
t.select(t.input_image, t.detections_text).head(2)
```
## GPT-4o
For comparison, let’s try running our examples through a generative
model, OpenAI’s `gpt-4o-mini`. For this section, you’ll need an OpenAI
account with an API key. You can use the following command to add your
API key to the environment (just enter your API key when prompted):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass(
        'Enter your OpenAI API key:'
    )
```
Now we can connect to OpenAI through Pixeltable. This may take some
time, depending on how long OpenAI takes to process the query.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import openai
# Construct a message dict for OpenAI. It follows the same pattern
# as the OpenAI SDK, except that in place of an image URL, we can
# put a reference to our image column, and Pixeltable will do the
# substitution once for each row of the table.
messages = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': "What's in this image?"},
            {'type': 'image_url', 'image_url': t.input_image},
        ],
    }
]
t.add_computed_column(
    vision=openai.chat_completions(messages, model='gpt-4o-mini')
)
```
Added 5 column values with 0 errors in 6.98 s (0.72 rows/s)
5 rows updated.
Let’s see how `gpt-4o-mini`’s responses compare to the traditional
discriminative (DETR) model.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.input_image, t.detections_text, t.vision).head()
```
It looks like OpenAI returned a whole range of context information along
with the image descriptions. Let’s pluck out just the response content
from inside those JSON structures, so that it’s easier to see in the
table. Note that we can unpack JSON columns in Pixeltable the same way
we would with ordinary Python dicts and lists.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(
    t.input_image,
    t.detections_text,
    t.vision['choices'][0]['message']['content'],
).head()
```
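To make the path expression concrete, here is the same lookup on a plain Python dict. This is only a sketch: `sample_response` is a hand-written, trimmed stand-in for one value of the `vision` column, not real model output.

```python
# Hand-written stand-in for one chat completion payload (hypothetical content).
sample_response = {
    'model': 'gpt-4o-mini',
    'choices': [
        {
            'index': 0,
            'message': {
                'role': 'assistant',
                'content': 'A giraffe standing in a field.',
            },
        }
    ],
}

# Same path as in the select() expression: dict key, list index, dict keys.
content = sample_response['choices'][0]['message']['content']
print(content)
```

In Pixeltable the expression is evaluated once per row, whereas here it is applied to a single dict.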
In addition to adapters for local models and inference APIs, Pixeltable
can perform a range of more basic image operations. These image
operations can be seamlessly chained with API calls, and Pixeltable will
keep track of the sequence of operations, constructing new images and
caching when necessary to keep things running smoothly. Just for fun
(and to demonstrate the power of computed columns), let’s see what
OpenAI thinks of our sample images when we rotate them by 180 degrees.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(rot_image=t.input_image.rotate(180))
# This is identical to the preceding messages dict, but with
# `t.rot_image` in place of `t.input_image`.
messages = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': "What's in this image?"},
            {'type': 'image_url', 'image_url': t.rot_image},
        ],
    }
]
t.add_computed_column(
    rot_vision=openai.chat_completions(messages, model='gpt-4o-mini')
)
```
Added 5 column values with 0 errors in 6.19 s (0.81 rows/s)
5 rows updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(
    t.rot_image, t.rot_vision['choices'][0]['message']['content']
).head()
```
## UDFs: Enhancing Pixeltable’s capabilities
Another important principle of Pixeltable is that, although Pixeltable
has a built-in library of useful operations and adapters, it will never
prescribe a particular way of doing things. Pixeltable is built from the
ground up to be extensible.
Let’s take a specific example. Recall our use of the ResNet-50 detection
model, in which the `detections` column contains a JSON blob with bounding
boxes, scores, and labels. Suppose we want to create a column containing
the single label with the highest confidence score. There’s no built-in
Pixeltable function to do this, but it’s easy to write our own. In fact,
all we have to do is write a Python function that does the thing we
want, and mark it with the `@pxt.udf` decorator.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
def top_detection(detect: dict) -> str:
    scores = detect['scores']
    label_text = detect['label_text']
    # Get the index of the object with the highest confidence
    i = scores.index(max(scores))
    # Return the corresponding label
    return label_text[i]
```
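Because the body of a UDF is ordinary Python, its logic can be tried out on a hand-written dict before it is attached to a table. The sketch below assumes the same `scores` and `label_text` keys; the sample values are made up, and `top_label` is a hypothetical plain-Python twin of the UDF. The empty-list guard is an addition that the tutorial's version does not have (there, `max()` raises `ValueError` when the list of scores is empty).

```python
from typing import Optional


def top_label(detect: dict) -> Optional[str]:
    """Return the label with the highest score, or None if there are no detections."""
    scores = detect['scores']
    if not scores:
        # Guard added for this sketch: max() of an empty list raises ValueError.
        return None
    i = scores.index(max(scores))
    return detect['label_text'][i]


# Made-up detection output with three objects.
sample = {'scores': [0.52, 0.97, 0.81], 'label_text': ['cup', 'cat', 'couch']}
best = top_label(sample)
empty = top_label({'scores': [], 'label_text': []})
print(best, empty)
```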
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(top=top_detection(t.detections))
```
Added 5 column values with 0 errors in 0.11 s (45.52 rows/s)
5 rows updated.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.detections_text, t.top).show()
```
Congratulations! You’ve reached the end of the tutorial. Hopefully, this
gives a good overview of the capabilities of Pixeltable, but there’s
much more to explore. As a next step, you might check out one of the
other tutorials, depending on your interests:
* [Object Detection in Videos](/howto/use-cases/object-detection-in-videos)
* [RAG Operations in Pixeltable](/howto/use-cases/rag-operations)
* [Working with OpenAI in Pixeltable](/howto/providers/working-with-openai)
# CLI Reference
Source: https://docs.pixeltable.com/platform/cli
Serve Pixeltable tables and queries as HTTP endpoints directly from the terminal using the pxt command-line interface for fast iteration.
The `pxt` CLI ships with the `pixeltable` package. It turns tables, computed columns, and `@pxt.query` functions into HTTP endpoints, no Python application code required.
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pip install pixeltable 'fastapi[standard]'
```
Verify the installation:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt --version
```
`pxt serve` generates a full FastAPI application with auto-generated [OpenAPI docs](https://fastapi.tiangolo.com/features/#automatic-docs) at `/docs`. For programmatic control over the same endpoints, see the [Python serving API](/howto/deployment/serving#quickstart-python) using `FastAPIRouter`.
## Command structure
```text theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt [subcommand] [flags]
```
| Command | Description |
| -------------------------- | ------------------------------------------------------------------------------------------------------- |
| `pxt --version` | Print the installed Pixeltable version |
| `pxt --help` | List available commands |
| `pxt serve <service>` | Start a named service defined in a [TOML config](/howto/deployment/serving#toml-service-file-reference) |
| `pxt serve insert` | Start a single insert endpoint |
| `pxt serve update` | Start a single update endpoint |
| `pxt serve delete` | Start a single delete endpoint |
| `pxt serve query` | Start a single query endpoint |
| `pxt deploy <env>` | Build a deploy bundle for the named environment |
## Quick start
### Named service (TOML config)
Define your routes in a TOML file and start everything with one command:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# service.toml
[[service]]
name = "my-service"
port = 8000
[[service.routes]]
type = "insert"
table = "my_dir/my_table"
path = "/generate"
inputs = ["prompt"]
outputs = ["prompt", "result"]
[[service.routes]]
type = "query"
path = "/search"
query = "myapp.queries.search_docs"
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve my-service --config service.toml
```
```text Output theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
Starting Pixeltable service: my-service
Bound to 0.0.0.0:8000
Listening on http://localhost:8000
API docs at http://localhost:8000/docs
Routes: 2
```
### Single-endpoint mode
For quick experiments, skip the TOML file and configure one route directly:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve insert --table my_dir.my_table --path /generate \
--inputs prompt --outputs prompt result --port 8000
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve query --query myapp.queries.search_docs --path /search
```
Single-endpoint mode is meant for development; for production or multi-route services, use the TOML config.
## Global flags
Every `pxt serve` subcommand accepts these flags:
| Flag | Type | Default | Description |
| ----------- | ------- | --------- | -------------------------------------------------------------------- |
| `--host` | string | `0.0.0.0` | Bind address (overrides TOML `host`) |
| `--port` | integer | `8000` | Bind port (overrides TOML `port`) |
| `--prefix` | string | `""` | URL prefix prepended to all routes (must start with `/` or be empty) |
| `--config` | string | | Path to an additional TOML config file to merge |
| `--dry-run` | flag | | Print the resolved config and exit without starting the server |
| `--json` | flag | | Emit machine-readable JSON on stdout (startup) or stderr (errors) |
### Machine-readable output
When `--json` is set, a successful start emits:
```json theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{"status": "starting", "host": "0.0.0.0", "port": 8000, "url": "http://localhost:8000", "docs_url": "http://localhost:8000/docs", "routes": 2}
```
Errors (including port conflicts) emit to stderr:
```json theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{"status": "error", "code": "EADDRINUSE", "port": 8000, "message": "port 8000 is already in use"}
```
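A wrapper script can consume these lines with a few lines of Python. The following is a sketch, not part of the `pxt` CLI: `parse_serve_event` is a hypothetical helper that handles only the two payload shapes shown above and rejects anything else, since other `status` values are not documented here.

```python
import json


def parse_serve_event(line: str) -> dict:
    """Classify one JSON line emitted by `pxt serve --json`."""
    event = json.loads(line)
    status = event.get('status')
    if status == 'starting':
        return {
            'ok': True,
            'url': event['url'],
            'docs_url': event['docs_url'],
            'routes': event['routes'],
        }
    if status == 'error':
        return {'ok': False, 'code': event.get('code'), 'message': event.get('message')}
    # Only the two shapes documented above are known; fail loudly on anything else.
    raise ValueError(f'unrecognized status: {status!r}')


started = parse_serve_event(
    '{"status": "starting", "host": "0.0.0.0", "port": 8000, '
    '"url": "http://localhost:8000", "docs_url": "http://localhost:8000/docs", "routes": 2}'
)
failed = parse_serve_event(
    '{"status": "error", "code": "EADDRINUSE", "port": 8000, '
    '"message": "port 8000 is already in use"}'
)
print(started['ok'], failed['code'])
```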
Combine `--dry-run` and `--json` to validate a config in CI without starting a server:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve my-service --config service.toml --dry-run --json
```
## Serve subcommands
### `pxt serve <service>`
Load a named service from the [TOML config](/howto/deployment/serving#toml-service-file-reference) and start the server. This is the primary way to run multi-route services in production.
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve my-service --config service.toml
pxt serve my-service --config service.toml --port 9000
```
### `pxt serve insert`
Start a service with a single insert endpoint.
| Flag | Type | Required | Description |
| ----------------------- | ------- | -------- | ----------------------------------------------------- |
| `--table` | string | yes | Pixeltable table path (e.g. `my_dir.my_table`) |
| `--path` | string | yes | URL path (e.g. `/generate`) |
| `--inputs` | strings | no | Columns accepted from the request body |
| `--uploadfile-inputs` | strings | no | Columns accepted as multipart file uploads |
| `--outputs` | strings | no | Columns returned in the response |
| `--return-fileresponse` | flag | no | Return the single media output as a raw file download |
| `--background` | flag | no | Run the insert in a background thread |
SQL export flags are also available on insert and update routes. See [SQL export flags](#sql-export-flags).
`--background` and `--return-fileresponse` are mutually exclusive. Similarly, `--export-sql-*` flags cannot be combined with `--return-fileresponse`. These constraints apply to all serve subcommands that support these flags.
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve insert --table my_dir.my_table --path /generate \
--inputs prompt --outputs prompt result --port 8000
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -X POST http://localhost:8000/generate \
-H 'Content-Type: application/json' \
-d '{"prompt": "a sunset over the ocean"}'
```
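The same request can be issued from Python with only the standard library. This is a minimal sketch that assumes the service from the previous command is listening on `localhost:8000`; `build_insert_request` is a hypothetical helper, and the shape of the response body is not assumed.

```python
import json
import urllib.request


def build_insert_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl call above."""
    body = json.dumps(payload).encode('utf-8')
    return urllib.request.Request(
        url=base_url.rstrip('/') + path,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


req = build_insert_request(
    'http://localhost:8000', '/generate', {'prompt': 'a sunset over the ocean'}
)

# Sending the request requires a running service, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Separating request construction from sending keeps the helper easy to check without a live server.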
### `pxt serve update`
Start a service with a single update endpoint. The table must have a primary key.
| Flag | Type | Required | Description |
| ----------------------- | ------- | -------- | --------------------------------------------------------- |
| `--table` | string | yes | Pixeltable table path |
| `--path` | string | yes | URL path |
| `--inputs` | strings | no | Non-PK columns to update (PK columns are always accepted) |
| `--outputs` | strings | no | Columns returned in the response |
| `--return-fileresponse` | flag | no | Return the single media output as a raw file download |
| `--background` | flag | no | Run the update in a background thread |
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve update --table my_dir.my_table --path /update \
--inputs prompt --outputs id prompt result
```
### `pxt serve delete`
Start a service with a single delete endpoint.
| Flag | Type | Required | Description |
| ----------------- | ------- | -------- | --------------------------------------------- |
| `--table` | string | yes | Pixeltable table path |
| `--path` | string | yes | URL path |
| `--match-columns` | strings | no | Columns to match on (defaults to primary key) |
| `--background` | flag | no | Run the delete in a background thread |
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve delete --table my_dir.my_table --path /delete
```
### `pxt serve query`
Start a service with a single query endpoint.
| Flag | Type | Required | Description |
| ----------------------- | -------------- | -------- | ------------------------------------------------------- |
| `--query` | string | yes | Dotted Python path to a `@pxt.query` or `retrieval_udf` |
| `--path` | string | yes | URL path |
| `--inputs` | strings | no | Parameters accepted from the request |
| `--uploadfile-inputs` | strings | no | Parameters accepted as multipart file uploads |
| `--one-row` | flag | no | Expect exactly one result row (404 on 0, 409 on >1) |
| `--return-fileresponse` | flag | no | Return the single media result as a raw file |
| `--background` | flag | no | Run the query in a background thread |
| `--method` | `get` / `post` | no | HTTP method (default: `post`) |
The dotted path is resolved at startup; the module is imported automatically.
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve query --query myapp.queries.search_docs --path /search
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve query --query myapp.queries.lookup_by_id --path /lookup \
--one-row --method get
```
## SQL export flags
Insert and update routes can export each successful request as a row in an external SQL database. These flags mirror the [`export_sql` TOML config](/howto/deployment/serving#exporting-rows-to-an-external-database):
| Flag | Type | Description |
| ------------------------- | ----------------------------- | ------------------------------------------------------------------ |
| `--export-sql-db-connect` | string | SQLAlchemy connection string for the external database |
| `--export-sql-table` | string | Target table name (required when `--export-sql-db-connect` is set) |
| `--export-sql-db-schema` | string | Optional database schema qualifier |
| `--export-sql-method` | `insert` / `update` / `merge` | How to write each row (default: `insert`) |
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve insert --table my_dir.my_table --path /generate \
--inputs prompt --outputs prompt result \
--export-sql-db-connect 'postgresql+psycopg://user:pw@host/analytics' \
--export-sql-table generations
```
## Deploy
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt deploy <env>
```
| Argument / flag | Description |
| --------------- | ----------------------------------------------------- |
| `<env>` | Target environment name (from your Pixeltable config) |
| `--json` | Emit machine-readable JSON on errors |
Builds a deploy bundle for the specified environment. See [Deployment Overview](/howto/deployment/overview) for how environments are configured.
## Patterns
### Validate a config without starting a server
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve my-service --config service.toml --dry-run
```
```text Output theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
Service: my-service
Host: 0.0.0.0
Port: 8000
Routes (2):
[insert] /generate
[query] /search
```
### Override port for local development
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve my-service --config service.toml --port 9000
```
### File upload endpoint
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve insert --table my_dir.images --path /resize \
--inputs width height --uploadfile-inputs image \
--outputs resized --return-fileresponse
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -X POST http://localhost:8000/resize \
-F image=@photo.jpg -F width=640 -F height=480 \
--output resized.jpg
```
### Background processing for slow pipelines
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve insert --table my_dir.videos --path /ingest --background
```
The endpoint returns immediately with a job handle:
```json theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{"id": "abc123", "job_url": "http://localhost:8000/jobs/abc123"}
```
Poll `job_url` until `status` is `"done"` or `"error"`.
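A polling loop along these lines is one way to wait for the job. It is a sketch under stated assumptions: the job endpoint is taken to return a JSON object with a `status` field, any status other than `"done"` or `"error"` is treated as still in progress, and `poll_job` and `fetch_json` are hypothetical helper names rather than part of Pixeltable.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def poll_job(
    job_url: str,
    fetch: Callable[[str], dict] = fetch_json,
    interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll job_url until its status is 'done' or 'error', or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_url)
        if job.get('status') in ('done', 'error'):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f'job at {job_url} did not finish within {timeout} s')
        sleep(interval)


# Example usage against a running service:
# result = poll_job('http://localhost:8000/jobs/abc123')
```

The `fetch` and `sleep` parameters exist so the loop can be exercised with stand-ins instead of a live server.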
## What's next
* [HTTP Serving Guide](/howto/deployment/serving): TOML config reference, Python `FastAPIRouter` API, decorator routes
* [Deployment Overview](/howto/deployment/overview): production architecture and deployment strategies
* [Configuration](/platform/configuration): API keys, storage paths, and environment settings
# Configuration
Source: https://docs.pixeltable.com/platform/configuration
Configure Pixeltable storage paths, providers, API keys, logging, and runtime options through environment variables and config files.
## Configuration options
Pixeltable can be configured through:
* Environment variables
* System configuration file (`~/.pixeltable/config.toml` on Linux/macOS or `C:\Users\<username>\.pixeltable\config.toml` on Windows)
Example `config.toml`:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[pixeltable]
file_cache_size_g = 250
time_zone = "America/Los_Angeles"
hide_warnings = true
verbosity = 2
[openai]
api_key = 'my-openai-api-key'
[openai.rate_limits]
tts-1 = 500 # OpenAI uses a per-model rate limit configuration (see below for details)
[mistral]
api_key = 'my-mistral-api-key'
rate_limit = 600 # Mistral uses a single rate limit for all models
[label_studio]
url = 'http://localhost:8080/'
api_key = 'my-label-studio-api-key'
```
## System settings
| Environment Variable | Config File | Meaning |
| -------------------------------- | --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| PIXELTABLE\_HOME | | (string) Pixeltable user directory; default is \~/.pixeltable |
| PIXELTABLE\_CONFIG | | (string) Pixeltable config file; default is \$PIXELTABLE\_HOME/config.toml |
| PIXELTABLE\_PGDATA | | (string) Directory where Pixeltable DB is stored; default is \$PIXELTABLE\_HOME/pgdata |
| PIXELTABLE\_DB | | (string) Pixeltable database name; default is pixeltable |
| PIXELTABLE\_FILE\_CACHE\_SIZE\_G | \[pixeltable] file\_cache\_size\_g | (float) Maximum size of the Pixeltable file cache, in GiB; required |
| PIXELTABLE\_TIME\_ZONE | \[pixeltable] time\_zone | (string) Default time zone in [IANA format](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones); defaults to the system time zone |
| PIXELTABLE\_HIDE\_WARNINGS | \[pixeltable] hide\_warnings | (bool) Suppress warnings generated by various libraries used by Pixeltable; default is false |
| PIXELTABLE\_VERBOSITY | \[pixeltable] verbosity | (int) Verbosity for Pixeltable console logging (0: minimum, 1: normal, 2: maximum); default is 1 |
| PIXELTABLE\_START\_DASHBOARD | \[pixeltable] start\_dashboard | (bool) Whether to automatically start the Pixeltable Dashboard server; default is true |
| PIXELTABLE\_DASHBOARD\_PORT | \[pixeltable] dashboard\_port | (int) Port number for the Pixeltable Dashboard server; default is 22089 |
| PIXELTABLE\_API\_KEY | \[pixeltable] api\_key | (string) API key for Pixeltable Cloud |
| PIXELTABLE\_INPUT\_MEDIA\_DEST | \[pixeltable] input\_media\_dest | (string) Default destination URI for media files that are inserted into tables |
| PIXELTABLE\_OUTPUT\_MEDIA\_DEST | \[pixeltable] output\_media\_dest | (string) Default destination URI for media files that are generated by Pixeltable operations |
| PIXELTABLE\_R2\_PROFILE | \[pixeltable] r2\_profile | (string) Name of AWS config profile to use when accessing Cloudflare R2 resources. If not specified, default AWS credentials will be used. |
| PIXELTABLE\_S3\_PROFILE | \[pixeltable] s3\_profile | (string) Name of AWS config profile to use when accessing Amazon S3 resources. If not specified, default AWS credentials will be used. |
| PIXELTABLE\_B2\_PROFILE | \[pixeltable] b2\_profile | (string) Name of an S3-compatible profile for accessing Backblaze B2. Defaults to the standard AWS credential chain if not set. |
| PIXELTABLE\_TIGRIS\_PROFILE | \[pixeltable] tigris\_profile | (string) Name of an S3-compatible profile for accessing Tigris. Defaults to the standard AWS credential chain if not set. |
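The defaults for `PIXELTABLE_HOME` and `PIXELTABLE_CONFIG` in the table imply a simple lookup order for the config file. The sketch below spells that order out; it is not Pixeltable's own code, `resolve_config_path` is a hypothetical helper, and expanding `~` in the variable values is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_config_path(env: Mapping[str, str] = os.environ) -> Path:
    """Return the config file path implied by the documented defaults."""
    # An explicit PIXELTABLE_CONFIG wins.
    if env.get('PIXELTABLE_CONFIG'):
        return Path(env['PIXELTABLE_CONFIG']).expanduser()
    # Otherwise fall back to $PIXELTABLE_HOME/config.toml,
    # with PIXELTABLE_HOME defaulting to ~/.pixeltable.
    home = Path(env.get('PIXELTABLE_HOME') or '~/.pixeltable').expanduser()
    return home / 'config.toml'


print(resolve_config_path({'PIXELTABLE_HOME': '/data/pxt'}))
```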
## API configuration
| Environment Variable | Config File | Meaning |
| ----------------------------- | ------------------------------------ | --------------------------------------------------------------------------------- |
| ANTHROPIC\_API\_KEY | \[anthropic] api\_key | (string) API key to use for Anthropic services |
| AZURE\_STORAGE\_ACCOUNT\_NAME | \[azure] storage\_account\_name | (string) Azure Storage account name for use with Azure Blob Storage |
| AZURE\_STORAGE\_ACCOUNT\_KEY | \[azure] storage\_account\_key | (string) Azure Storage account key for use with Azure Blob Storage |
| BEDROCK\_API\_KEY | \[bedrock] api\_key | (string) API key to use for AWS Bedrock services |
| DEEPSEEK\_API\_KEY | \[deepseek] api\_key | (string) API key to use for Deepseek services |
| FAL\_API\_KEY | \[fal] api\_key | (string) API key to use for fal.ai services |
| FIREWORKS\_API\_KEY | \[fireworks] api\_key | (string) API key to use for Fireworks AI services |
| GEMINI\_API\_KEY | \[gemini] api\_key | (string) API key for Google AI Studio (not used for Vertex AI) |
| GOOGLE\_API\_KEY | | (string) Alternative API key for Google AI Studio (not used for Vertex AI) |
| GOOGLE\_CLOUD\_LOCATION | | (string) Google Cloud region for Vertex AI |
| GOOGLE\_CLOUD\_PROJECT | | (string) Google Cloud project ID for Vertex AI |
| GOOGLE\_GENAI\_USE\_VERTEXAI | | (bool) Set to `true` to use Vertex AI instead of Google AI Studio |
| GROQ\_API\_KEY | \[groq] api\_key | (string) API key to use for Groq AI services |
| HF\_AUTH\_TOKEN | \[hf] auth\_token | (string) Hugging Face auth token for use with Hugging Face services |
| LABEL\_STUDIO\_API\_KEY | \[label\_studio] api\_key | (string) API key to use for Label Studio |
| LABEL\_STUDIO\_URL | \[label\_studio] url | (string) URL of the Label Studio server to use |
| MISTRAL\_API\_KEY | \[mistral] api\_key | (string) API key to use for Mistral AI services |
| OPENAI\_API\_KEY | \[openai] api\_key | (string) API key to use for OpenAI services |
| OPENAI\_BASE\_URL | \[openai] base\_url | (string, optional) Base URL to use for OpenAI services |
| OPENAI\_API\_VERSION | \[openai] api\_version | (string) API version for use with Azure OpenAI; must be `'latest'` or `'preview'` |
| OPENROUTER\_API\_KEY | \[openrouter] api\_key | (string) API key to use for OpenRouter services |
| OPENROUTER\_SITE\_URL | \[openrouter] site\_url | (string) Application URL (optional, for OpenRouter analytics) |
| OPENROUTER\_APP\_NAME | \[openrouter] app\_name | (string) Application name (optional, for OpenRouter analytics) |
| REPLICATE\_API\_TOKEN | \[replicate] api\_token | (string) API token to use for Replicate services |
| REVE\_API\_KEY | \[reve] api\_key | (string) API key to use for Reve Image services |
| TOGETHER\_API\_KEY | \[together] api\_key | (string) API key to use for Together AI services |
| TWELVELABS\_API\_KEY | \[twelvelabs] api\_key | (string) API key to use for TwelveLabs services |
| VOYAGE\_API\_KEY | \[voyage] api\_key | (string) API key to use for Voyage AI services |
## Rate limit configuration
Pixeltable supports two patterns for configuring API rate limits in `config.toml`. Refer to the docstring of the
relevant UDF in the [SDK Reference](/sdk/latest) for details on the rate limiting pattern used by that UDF.
### Single rate limit per provider
For providers with a single rate limit across all models, add a `rate_limit` key to the provider's config section:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[mistral]
api_key = 'my-mistral-api-key'
rate_limit = 600 # requests per minute
[fireworks]
api_key = 'my-fireworks-api-key'
rate_limit = 300
```
### Per-model rate limits
For providers that support different rate limits for different models, add a `.rate_limits` section and list the rate limits for each model:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[openai]
api_key = 'my-openai-api-key'
[openai.rate_limits]
gpt-4o = 500
gpt-4o-mini = 1000
tts-1 = 50
dall-e-3 = 10
[gemini.rate_limits]
"gemini-2.5-flash" = 600 # quoted, because a bare dot in a TOML key denotes nesting
"gemini-2.5-pro" = 300
```
If no rate limit is configured, Pixeltable uses a default of 600 requests per minute.
## Configuration best practices
### Security considerations
When configuring API keys and sensitive information:
* Avoid hardcoding API keys in your code
* Use environment variables for temporary access
* Use the config file for persistent configuration
* Ensure your config.toml file has appropriate permissions (readable only by you)
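On Linux and macOS, restricting the file to its owner can be done from Python as well as with `chmod 600`. This is a small sketch: `restrict_to_owner` is a hypothetical helper, the demo runs against a temporary file rather than your real config, and Windows permissions work differently, so the resulting mode is not meaningful there.

```python
import os
import stat
import tempfile
from pathlib import Path


def restrict_to_owner(path: Path) -> int:
    """Set owner-only read/write permissions (0o600) and return the resulting mode bits."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(path.stat().st_mode)


# Demo on a throwaway file; point this at ~/.pixeltable/config.toml for real use.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / 'config.toml'
    demo.write_text('[pixeltable]\nverbosity = 1\n')
    mode = restrict_to_owner(demo)

print(oct(mode))
```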
### Performance tuning
* Adjust `file_cache_size_g` based on your available system memory
* For large datasets, increase the cache size to improve performance
* Set appropriate verbosity level based on your debugging needs
## Applying configuration changes
Configuration changes take effect when you restart your Python session.
# Local Dashboard
Source: https://docs.pixeltable.com/platform/dashboard
Browse Pixeltable tables, inspect pipelines, preview multimedia, and debug computed columns from a local web dashboard that starts automatically.
The Pixeltable dashboard is a local web UI for exploring your data. It starts automatically when you use Pixeltable and runs at [http://localhost:22089](http://localhost:22089).
No extra install is needed. The dashboard ships inside the `pixeltable` package.
## Opening the dashboard
The dashboard server launches in the background the first time your script or notebook calls any Pixeltable API. To open it in a browser explicitly:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
pxt.dashboard.serve() # opens http://localhost:22089 in your default browser
pxt.dashboard.serve(open_browser=False) # start the server without opening a browser
```
The dashboard binds to `127.0.0.1` only and is not accessible from other machines on the network. For remote access to tables and queries, see [HTTP Serving](/howto/deployment/serving).
## What you can do
The dashboard is a **read-only inspector**. You browse and debug your data visually; all mutations (inserts, schema changes, etc.) happen through the Python SDK or [CLI](/platform/cli).
### Browse the catalog
The left sidebar shows your full directory tree: directories, tables, views, and snapshots. Each entry displays its kind and error count. Click any table to open it, or click a directory to see a summary with aggregate stats and a list of child tables.
A global search panel (**Cmd/Ctrl + K**) lets you find directories, tables, and columns by name.
### Inspect table data
The main table view has three tabs:
* **Data** - paginated rows with typed columns, sortable by indexed columns, with configurable page sizes. Cells render inline previews for images, video, audio, and documents.
* **Lineage** - shows the table's position in the pipeline: its base table, derived views, and column-level dependency graphs.
* **History** - version list with change types (inserts, updates, deletes) and timestamps.
Above the data grid, a collapsible **Schema** panel shows every column with its type, whether it is computed (with the Python expression), stored vs. unstored, and any embedding indexes.
### Filter and search rows
A filter panel lets you narrow down the current page:
* **Free-text search** across all columns (debounced)
* **Faceted filters**: text contains, numeric ranges, datetime ranges, boolean/enum checklists
* **Errors-only mode**: server-side filter that shows only rows with computation errors, useful for debugging failed computed columns
### Preview media
Images and videos render inline in the data grid. Click any media cell to open a **lightbox** with keyboard navigation (arrow keys for prev/next). For image-heavy tables, a **gallery view** displays thumbnails in a grid with a detail overlay per row.
### Explore the pipeline
The **Lineage** page (accessible from the sidebar) shows a full-instance pipeline graph built with React Flow. Every table and view appears as a node, with edges showing relationships (base table, iterator, query dependencies).
Click any node to open a detail drawer with:
* Row count and version info
* Column breakdown (insertable, computed, stored)
* Embedding index definitions
* Computed column expressions with dependency graphs
* Version history
A **Find table** search on the canvas lets you jump to any node and fit it into view.
### Export data
Download the current table as **CSV** (up to 100,000 rows) from the table view.
### Copy SDK snippets
The table view includes a **Copy** button that generates the Python SDK code for opening the current table, making it easy to switch from visual inspection to scripted work.
## Configuration
| Setting | Default | Environment variable | Description |
| ----------------- | ------- | ---------------------------- | ------------------------------------ |
| `start_dashboard` | `true` | `PIXELTABLE_START_DASHBOARD` | Set to `false` to prevent auto-start |
| `dashboard_port` | `22089` | `PIXELTABLE_DASHBOARD_PORT` | Port the dashboard listens on |
You can also set these in your [Pixeltable config file](/platform/configuration):
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[pixeltable]
start_dashboard = false
dashboard_port = 3000
```
## Keyboard shortcuts
| Shortcut | Action |
| ---------------- | --------------------------------- |
| **Cmd/Ctrl + K** | Open global search |
| **Cmd/Ctrl + F** | Focus in-table row filter |
| **Esc** | Close search, modals, or lightbox |
| **Arrow keys** | Navigate media in lightbox |
## Troubleshooting
**Port already in use**: If another process occupies port 22089, set a different port via `PIXELTABLE_DASHBOARD_PORT=3000` or the config file. The dashboard detects port conflicts and prints the alternative in the console.
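To check whether a port is actually taken before changing the setting, a quick bind test is usually enough. This is a sketch using only the standard library; `is_port_free` is a hypothetical helper, and it tests `127.0.0.1` because that is the address the dashboard binds to.

```python
import socket


def is_port_free(port: int, host: str = '127.0.0.1') -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
    return True


for candidate in (22089, 3000):
    print(candidate, 'free' if is_port_free(candidate) else 'in use')
```

The result is only a snapshot, since another process may claim the port between the check and the dashboard start.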
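**Dashboard not starting**: Verify that the dashboard is not explicitly disabled (`start_dashboard = false` in the config file or `PIXELTABLE_START_DASHBOARD=false` in the environment). You can also force-start it from Python: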
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
pxt.dashboard.serve() # force-start and open browser
```
**Static files missing**: The React app ships as pre-built static files inside the `pixeltable` package. If you installed from source, ensure the dashboard was built (`npm run build` in the `dashboard/` directory).
# Data Sharing
Source: https://docs.pixeltable.com/platform/data-sharing
Share Pixeltable tables, views, and snapshots across users and projects with read-only access, lineage tracking, and reproducible references.
Learn how to publish datasets to Pixeltable Cloud and replicate datasets
from the cloud to your local environment.
## Overview
Pixeltable Cloud enables you to:
* **Publish** your datasets for sharing with teams or the public
* **Replicate** datasets from the cloud to your local environment
* Share multimodal AI datasets (images, videos, audio, documents)
without managing infrastructure
This guide demonstrates both publishing and replicating datasets.
## Setup
Data sharing functionality requires Pixeltable version 0.4.24 or later.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
## Replicating datasets
You can replicate any public dataset from Pixeltable Cloud to your local
environment without needing an account or API key.
### Replicate a public dataset
Let’s replicate a mini-version of the COCO-2017 dataset from Pixeltable
Cloud. You can find this dataset at
[pixeltable.com/t/pixeltable:fiftyone/coco\_mini\_2017](https://www.pixeltable.com/t/pixeltable:fiftyone/coco_mini_2017),
or browse for other [public
datasets](https://www.pixeltable.com/data-products).
When calling `replicate()`:
* **`remote_uri`** (required): The URI of the cloud dataset you want to
replicate
* **`local_path`** (your choice): The local directory/table name where
you want to store the replica
* **Variable name** (your choice): The Python variable in your
session/script to reference the table (e.g., `coco_copy`)
See the [replicate() SDK
reference](/sdk/latest/pixeltable#func-replicate)
for full documentation.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
pxt.drop_dir('sharing-demo', force=True)
pxt.create_dir('sharing-demo')
# The remote_uri is the specific cloud dataset you want to replicate
# The local_path and variable name are yours to choose
coco_copy = pxt.replicate(
remote_uri='pxt://pixeltable:fiftyone/coco_mini_2017',
local_path='sharing-demo.coco-copy',
)
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'sharing-demo'.
Extracting table data into: /Users/asiegel/.pixeltable/tmp/acad78b1-4a62-483e-a0b1-728ccb5603cf
Created directory '\_system'.
Created local replica 'sharing-demo/coco-copy' from URI: pxt://pixeltable:fiftyone/coco\_mini\_2017
You can check that the replica exists at the local path with
`list_tables()`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.list_tables('sharing-demo')
```
\['sharing-demo/coco-copy']
To see the structure of the replicated table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
coco_copy
```
### Working with replicas
Replicated datasets are read-only locally, but you can still query them,
run similarity searches against them, and copy them into new tables:
**1. Query and explore the data**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View the replicated data
coco_copy.limit(3).collect()
```
**2. Perform similarity searches**
Replicas include embedding indexes, so you can immediately perform
similarity searches:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Get a sample image to search with
sample_img = (
coco_copy.select(coco_copy.image).limit(1).collect()[0]['image']
)
sample_img
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Perform image-based similarity search
sim = coco_copy.image.similarity(image=sample_img)
results = (
coco_copy.order_by(sim, asc=False)
.limit(5)
.select(coco_copy.image, sim)
.collect()
)
results
```
Because the COCO dataset uses CLIP embeddings (which are multimodal),
you can also search using text queries:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Perform text-based similarity search
sim = coco_copy.image.similarity(string='surfing')
results = (
coco_copy.order_by(sim, asc=False)
.limit(4)
.select(coco_copy.image, sim)
.collect()
)
results
```
**3. Access replicas in new sessions**
In a new Python session, use `list_tables()` and `get_table()` to access
your replicas:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# List all tables to see your replica
pxt.list_tables('sharing-demo')
```
\['sharing-demo/coco-copy']
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Assign a handle to the replica
coco_copy = pxt.get_table('sharing-demo.coco-copy')
```
**4. Create an independent copy**
To work with the data in new ways, create an independent table with the
replica as the source:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a fresh table with values only
my_coco = pxt.create_table('sharing-demo.my-coco-table', source=coco_copy)
```
Created table 'my-coco-table'.
This copies the values from the source but drops the computational
definitions, and the new table does not receive updates if the source
table changes.
### Updating replicas with pull
If the upstream table changes, you can update your local replica using
`pull()`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Update your local replica with changes from the cloud
coco_copy.pull()
```
Replica 'sharing-demo/coco-copy' is already up to date with source: pxt://pixeltable:fiftyone/d699317b-23a4-404b-8f71-6531fd8dc462
This synchronizes your local replica with any updates made to the source
dataset.
## Publishing datasets
**Requirements:**
* A Pixeltable Cloud account (Community Edition includes 1TB storage -
see [pricing](https://www.pixeltable.com/pricing))
* Your API key from the [account
dashboard](https://pixeltable.com/dashboard)
Publishing allows you to share your datasets with your team or make them
publicly available.
### Configure your API key
Pixeltable looks for your API key in the `PIXELTABLE_API_KEY`
environment variable. Choose one of these methods:
**Option 1: In your notebook (secure and convenient)**
Run this cell to securely enter your API key (get it from
[pixeltable.com/dashboard](https://pixeltable.com/dashboard)):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import os
from getpass import getpass
os.environ['PIXELTABLE_API_KEY'] = getpass('Pixeltable API Key:')
```
**Option 2: Environment variable**
Add to your `~/.zshrc` or `~/.bashrc`:
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
export PIXELTABLE_API_KEY='your-api-key-here'
```
**Option 3: Config file**
Add to `~/.pixeltable/config.toml`:
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
[pixeltable]
api_key = 'your-api-key-here'
```
See the [Configuration
Guide](/platform/configuration) for details.
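If publishing fails with an authentication error, a quick first check is whether the key is visible to the current Python process. This is a small sketch using only the standard library; `api_key_configured` is a hypothetical helper, and it only covers Options 1 and 2 (it does not read `config.toml`).

```python
import os


def api_key_configured(env=os.environ) -> bool:
    """Return True if PIXELTABLE_API_KEY is set to a non-empty value."""
    return bool(env.get('PIXELTABLE_API_KEY', '').strip())


if not api_key_configured():
    print('PIXELTABLE_API_KEY is not set in this process environment')
```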
### Create a sample dataset
Let’s create a table with images from this repository to publish. The
`comment` parameter provides a description that will be visible on
Pixeltable Cloud:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table(
'sharing-demo.photos',
schema={'image': pxt.Image, 'description': pxt.String},
comment='Sample image dataset for demonstrating Pixeltable Cloud publishing',
)
```
Inserted 3 rows with 0 errors in 0.02 s (169.05 rows/s)
3 rows inserted.
### Publish your dataset
Publish your table to Pixeltable Cloud. When calling `publish()`:
* **`source`** (required): An existing local table, given either as a
  table path string (e.g., `'sharing-demo.photos'`) or as a table handle
  (e.g., `t`)
  * If you use a local table path string, it must match a table in your
    local database (you can verify with `pxt.list_tables()`)
* **`destination_uri`** (required): The cloud URI where you want to
  publish, in the format `pxt://orgname/dataset`
  * Pixeltable automatically creates any directory structure in the
    cloud based on this URI
  * Your local directory structure doesn’t need to match the cloud
    structure
See the [publish() SDK
reference](/sdk/latest/pixeltable#func-publish)
for full documentation.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Option 1: Publish using table path (string)
pxt.publish(
source='sharing-demo.photos', # Table path from list_tables()
destination_uri='pxt://your-orgname/sample-images',
)
# Option 2: Publish using table handle
# pxt.publish(
# source=t, # Table handle you assigned
# destination_uri='pxt://your-orgname/sample-images'
# )
```
### Understanding destination URIs
The `destination_uri` in `publish()` uses the format:
`pxt://org:database/path`
**URI components:**
* **`org`** (required): Your organization name
* **`database`** (optional): Database name - defaults to `main` if
omitted
* **`path`** (required): Directory and table path in the cloud
**Examples:**
* `pxt://orgname/my-dataset` → Uses the default `main` database
* `pxt://orgname:main/my-dataset` → Explicitly specifies the `main`
database
* `pxt://orgname:analytics/my-dataset` → Uses the `analytics` database
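To make the URI rules concrete, here is a minimal sketch that splits a `pxt://` URI into its three components and applies the `main` default. `PxtUri` and `parse_pxt_uri` are hypothetical names used for illustration only; this is not Pixeltable's own parser, and it does no validation beyond the format described above.

```python
from typing import NamedTuple


class PxtUri(NamedTuple):
    org: str
    database: str
    path: str


def parse_pxt_uri(uri: str) -> PxtUri:
    """Split a pxt:// URI into org, database, and path (illustration only)."""
    scheme = 'pxt://'
    if not uri.startswith(scheme):
        raise ValueError(f'not a pxt:// URI: {uri!r}')
    location, sep, path = uri[len(scheme):].partition('/')
    if not sep or not path:
        raise ValueError(f'missing path in URI: {uri!r}')
    org, _, database = location.partition(':')
    if not org:
        raise ValueError(f'missing organization in URI: {uri!r}')
    # The database component is optional and falls back to 'main'
    return PxtUri(org, database or 'main', path)


print(parse_pxt_uri('pxt://orgname/my-dataset'))
print(parse_pxt_uri('pxt://orgname:analytics/my-dataset'))
```

Read this way, the public dataset URI `pxt://pixeltable:fiftyone/coco_mini_2017` used earlier has the organization `pixeltable`, the database `fiftyone`, and the path `coco_mini_2017`.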
**About databases:**
* Every Pixeltable Cloud account includes a `main` database by default
* Each database has its own storage bucket
* You can create additional databases in your [Pixeltable
dashboard](https://pixeltable.com/dashboard)
### Updating published datasets with push
After you’ve published a dataset, you can update the cloud replica with
local changes using `push()`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
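# Assumption: base_url points at the sample images used elsewhere in these docs
base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images'
# Make some changes to your local table
t.insert(
    [
        {
            'image': f'{base_url}/000000000049.jpg',
            'description': 'Outdoor scene',
        }
    ]
)
# Push the changes to your published dataset
t.push()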
```
This updates the published dataset on Pixeltable Cloud with your local
changes.
Your dataset is now published and can be replicated by others using:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
sample_images = pxt.replicate(
remote_uri='pxt://your-orgname/sample-images',
local_path='sample-images-copy'
)
```
**Note:** If you are the owner of a published table, you cannot use
`replicate()` to create a replica of your own table. This is because the
table already exists in your Pixeltable database. The `replicate()`
function is intended for pulling datasets published by others into your
environment.
### Access control
The `access` parameter in `publish()` controls who can replicate your
dataset:
* **`access='private'`** (default): Only your team members can access
the dataset
* **`access='public'`**: Anyone can replicate your dataset
You can set access control either at the time of publish using the
`access` parameter, or change it later in the [Pixeltable Cloud
UI](https://pixeltable.com/dashboard). You can also manage team members
and permissions in your dashboard.
### Deleting published tables
If you want to delete a published table, you have two options:
**Option 1: Using the Pixeltable SDK**
Use `drop_table()` with your table’s destination URI (the same `pxt://`
URI you used when publishing):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.drop_table('pxt://your-orgname/sample-images')
```
**Option 2: Using the Pixeltable Cloud dashboard**
Navigate to your [Pixeltable Cloud
dashboard](https://pixeltable.com/dashboard) and delete the table from
the UI.
## Get help
Have questions or need support? Join our community:
* **[Discord Community](https://discord.com/invite/QPyqFYx2UN)**: Ask
questions, get community support, and share what you build with
Pixeltable
* **[YouTube](https://www.youtube.com/@PixeltableHQ)**: Watch tutorials,
demos, and feature walkthroughs
* **[GitHub Issues](https://github.com/pixeltable/pixeltable/issues)**:
Report bugs or request features
## Resources
* [Pixeltable Cloud Dashboard](https://www.pixeltable.com/dashboard)
* [Pixeltable Public Datasets](https://www.pixeltable.com/data-products)
* [Pixeltable SDK Reference](/sdk/latest/)
# Embedding Indices
Source: https://docs.pixeltable.com/platform/embedding-indexes
Create and query embedding indices in Pixeltable to power semantic search, similarity lookup, and retrieval-augmented generation pipelines.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Main takeaways:
* Indexing in Pixeltable is declarative
  * you create an index on a column and supply the embedding functions
    you want to use (for inserting data into the index as well as
    lookups)
  * Pixeltable maintains the index in response to any kind of update of
    the indexed table (i.e., `insert()`/`update()`/`delete()`)
* Perform index lookups with the `similarity()` pseudo-function, in
  combination with the `order_by()` and `limit()` clauses
To make this concrete, let’s create a table of images with the
[`create_table()`](/sdk/latest/pixeltable#func-create_table)
function. We’re also going to add some columns to demonstrate combining
similarity search with other predicates.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable transformers sentence-transformers
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Delete the `indices_demo` directory and its contents, if it exists
pxt.drop_dir('indices_demo', force=True)
# Create the directory and table to use for the demo
pxt.create_dir('indices_demo')
schema = {'id': pxt.Int, 'img': pxt.Image}
imgs = pxt.create_table('indices_demo/img_tbl', schema)
```
We start out by inserting 10 rows:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
img_urls = [
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000030.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000034.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000042.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000049.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000057.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000061.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000063.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000064.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000069.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000071.jpg',
]
imgs.insert({'id': i, 'img': url} for i, url in enumerate(img_urls))
```
Inserted 10 rows with 0 errors in 2.53 s (3.96 rows/s)
10 rows inserted.
For the sake of convenience, we’re storing the images as external URLs,
which are cached transparently by Pixeltable. For details on working
with external media files, see [Working with External
Files](/platform/external-files).
## Creating an index
To create and populate an index, we call
[`Table.add_embedding_index()`](/sdk/latest/table#method-add_embedding_index)
and tell it which UDF or UDFs to use to create embeddings. That
definition is persisted as part of the table’s metadata, which allows
Pixeltable to maintain the index in response to updates to the table.
Any embedding UDF can be used for the index. For this example, we’re
going to use a
[CLIP](https://huggingface.co/docs/transformers/en/model_doc/clip)
model, which has built-in support in Pixeltable under the
[`pixeltable.functions.huggingface`](/sdk/latest/huggingface)
package. As an alternative, you could use an online service such as
OpenAI (see
[`pixeltable.functions.openai`](/sdk/latest/openai)),
or create your own embedding UDF with custom code (we’ll see how to do
this below).
Because we’re adding an index to an image column, the UDF we specify
*must* be able to handle images. In fact, CLIP models are multimodal:
they can handle both text and images, which is useful for doing lookups
against the index.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import PIL.Image
from pixeltable.functions.huggingface import clip
# create embedding index on the 'img' column
imgs.add_embedding_index(
'img', embedding=clip.using(model_id='openai/clip-vit-base-patch32')
)
```
The first parameter of `add_embedding_index()` is the name of the column
being indexed; the `embedding` parameter specifies the relevant embedding.
Notice the notation we used:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
clip.using(model_id='openai/clip-vit-base-patch32')
```
`clip` is a general-purpose UDF that can accept any CLIP model available
in the Hugging Face model repository. To define an embedding, however,
we need to provide a specific embedding function to
`add_embedding_index()`: a function that is *not* parameterized on
`model_id`. The `.using(model_id=...)` syntax tells Pixeltable to
specialize the `clip` UDF by fixing the `model_id` parameter to the
specific value `'openai/clip-vit-base-patch32'`.
If you’re familiar with functional programming concepts, you might
recognize `.using()` as a partial function operator. It’s a general
operator that can be applied to any UDF (not just embedding functions),
transforming a UDF with n parameters into one with k parameters by
fixing the values of n-k of its arguments. Python has something similar
in the `functools` package: the `functools.partial()` operator.
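For readers who have not used `functools.partial()`, the following plain-Python sketch shows the same idea outside of Pixeltable. The `embed` function is a made-up stand-in that returns a label rather than a vector; only the parameter-fixing pattern is the point.

```python
from functools import partial


def embed(text: str, model_id: str) -> str:
    """Stand-in for a parameterized embedding function (returns a label, not a vector)."""
    return f'{model_id}:{text}'


# Fix model_id, leaving a one-parameter function of the text alone
embed_minilm = partial(embed, model_id='all-MiniLM-L12-v2')

print(embed_minilm('cubism'))  # all-MiniLM-L12-v2:cubism
```

Here a two-parameter function becomes a one-parameter function, which mirrors how `.using(model_id=...)` turns the general `clip` UDF into a specific embedding function.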
`add_embedding_index()` provides a few other optional parameters:
* `idx_name`: optional name for the index, which needs to be unique for
  the table; a default name is created if this isn’t provided explicitly
* `metric`: the metric to use to compute the similarity of two embedding
  vectors; one of:
  * `'cosine'`: cosine distance (default)
  * `'ip'`: inner product
  * `'l2'`: L2 distance
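As a reminder of what the three metrics measure, here is a small sketch of their textbook definitions in plain Python. It is not Pixeltable code, and it makes no claim about how Pixeltable scales or orients the score returned by `similarity()`; the two vectors are made up for illustration.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Inner product of the two vectors after normalizing each to unit length."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance between the two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


u, v = [1.0, 0.0], [1.0, 1.0]
print(inner_product(u, v))
print(cosine_similarity(u, v))
print(l2_distance(u, v))
```

Cosine similarity ignores vector length, while inner product and L2 distance are sensitive to it, which is one reason the choice of metric usually follows the recommendation of the embedding model's authors.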
If desired, you can create multiple indexes on the same column, using
different embedding functions. This can be useful to evaluate the
effectiveness of different embedding functions side-by-side, or to use
embedding functions tailored to specific use cases. In that case, you
can provide explicit names for those indexes and then reference them
during queries. We’ll illustrate that later with an example.
## Using the index in queries
To take advantage of an embedding index when querying a table, we use
the `similarity()` pseudo-function, which is invoked as a method on the
indexed column, in combination with the
[`order_by()`](/sdk/latest/query#method-order_by)
and
[`limit()`](/sdk/latest/query#method-limit)
clauses. First, we’ll get a sample image from the table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# retrieve the 'img' column of some row as a PIL.Image.Image
sample_img = imgs.select(imgs.img).collect()[6]['img']
sample_img
```
We then call the `similarity()` pseudo-function as a method on the
indexed column and apply `order_by()` and `limit()`. We used the default
cosine distance when we created the index, so we’re going to order by
descending similarity (`order_by(..., asc=False)`):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sim = imgs.img.similarity(image=sample_img)
res = (
imgs.order_by(sim, asc=False) # Order by descending similarity
.limit(2) # Limit number of results to 2
.select(imgs.id, imgs.img, sim)
.collect() # Retrieve results now
)
res
```
We can combine nearest-neighbor/similarity search with standard
predicates. Here’s the same query, but filtering out the selected
`sample_img` (which we already know has perfect similarity with itself):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
res = (
imgs.order_by(sim, asc=False)
.where(imgs.id != 6) # Additional clause
.limit(2)
.select(imgs.id, imgs.img, sim)
.collect()
)
res
```
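Conceptually, the `where(...)`, `order_by(sim, asc=False)`, `limit(k)` pattern is a filtered top-k selection by similarity. The brute-force sketch below shows that logic in plain Python on made-up two-dimensional vectors; it says nothing about how Pixeltable's index accelerates the lookup.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Toy rows: (id, embedding); the vectors are made up for illustration
rows = [(0, [1.0, 0.0]), (1, [0.8, 0.6]), (2, [0.0, 1.0]), (3, [0.6, 0.8])]
query = [1.0, 0.0]

# where(id != 0) -> order_by(sim, asc=False) -> limit(2)
candidates = [
    (cosine(vec, query), row_id) for row_id, vec in rows if row_id != 0
]
top2 = heapq.nlargest(2, candidates)
print([row_id for _, row_id in top2])  # [1, 3]
```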
## Index updates
In Pixeltable, each index is kept up-to-date automatically in response
to changes to the indexed table.
To illustrate this, let’s insert a few more rows:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
more_img_urls = [
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000080.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000090.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000106.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000108.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000139.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000285.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000632.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000724.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000776.jpg',
'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000785.jpg',
]
imgs.insert(
{'id': 10 + i, 'img': url} for i, url in enumerate(more_img_urls)
)
```
Inserted 10 rows with 0 errors in 1.29 s (7.75 rows/s)
10 rows inserted.
When we now re-run the initial similarity query, we get a different
result:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sim = imgs.img.similarity(image=sample_img)
res = (
imgs.order_by(sim, asc=False)
.limit(2)
.select(imgs.id, imgs.img, sim)
.collect()
)
res
```
## Similarity search on different types
Because CLIP models are multimodal, we can also do lookups by text.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sim = imgs.img.similarity(string='train') # String lookup
res = (
imgs.order_by(sim, asc=False)
.limit(2)
.select(imgs.id, imgs.img, sim)
.collect()
)
res
```
## Creating multiple indexes on a single column
We can create multiple embedding indexes on the same column, utilizing
different embedding models. In order to use a specific index in a query,
we need to assign it a name and then use that name in the query.
To illustrate this, let’s create a table with text (taken from the
Wikipedia article on [Pablo
Picasso](https://en.wikipedia.org/wiki/Pablo_Picasso)):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
txts = pxt.create_table('indices_demo/text_tbl', {'text': pxt.String})
sentences = [
'Pablo Ruiz Picasso (25 October 1881 – 8 April 1973) was a Spanish painter, sculptor, printmaker, ceramicist, and theatre designer who spent most of his adult life in France.',
'One of the most influential artists of the 20th century, he is known for co-founding the Cubist movement, the invention of constructed sculpture,[8][9] the co-invention of collage, and for the wide variety of styles that he helped develop and explore.',
"Among his most famous works are the proto-Cubist Les Demoiselles d'Avignon (1907) and the anti-war painting Guernica (1937), a dramatic portrayal of the bombing of Guernica by German and Italian air forces during the Spanish Civil War.",
'Picasso demonstrated extraordinary artistic talent in his early years, painting in a naturalistic manner through his childhood and adolescence.',
'During the first decade of the 20th century, his style changed as he experimented with different theories, techniques, and ideas.',
'After 1906, the Fauvist work of the older artist Henri Matisse motivated Picasso to explore more radical styles, beginning a fruitful rivalry between the two artists, who subsequently were often paired by critics as the leaders of modern art.',
"Picasso's output, especially in his early career, is often periodized.",
'While the names of many of his later periods are debated, the most commonly accepted periods in his work are the Blue Period (1901–1904), the Rose Period (1904–1906), the African-influenced Period (1907–1909), Analytic Cubism (1909–1912), and Synthetic Cubism (1912–1919), also referred to as the Crystal period.',
"Much of Picasso's work of the late 1910s and early 1920s is in a neoclassical style, and his work in the mid-1920s often has characteristics of Surrealism.",
'His later work often combines elements of his earlier styles.',
]
txts.insert({'text': s} for s in sentences)
```
Inserted 10 rows with 0 errors in 0.03 s (301.25 rows/s)
10 rows inserted.
When calling
[`add_embedding_index()`](/sdk/latest/table#method-add_embedding_index),
we now specify the index name (`idx_name`) directly. If it is not
specified, Pixeltable will assign a name (such as `idx0`).
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import sentence_transformer
txts.add_embedding_index(
'text',
idx_name='minilm_idx',
embedding=sentence_transformer.using(
model_id='sentence-transformers/all-MiniLM-L12-v2'
),
)
txts.add_embedding_index(
'text',
idx_name='e5_idx',
embedding=sentence_transformer.using(model_id='intfloat/e5-large-v2'),
)
```
To do a similarity query, we now call `similarity()` with the `idx`
parameter:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sim = txts.text.similarity(string='cubism', idx='minilm_idx')
res = (
    txts.order_by(sim, asc=False)
    .limit(2)
    .select(txts.text, sim)
    .collect()
)
res
```
## Indexing precomputed embeddings (Array columns)
If your data already contains precomputed embedding vectors — for
example, embeddings exported from another system or generated by a
custom pipeline — you can index them directly without specifying an
embedding function.
`add_embedding_index` supports `Array` columns in addition to `String`,
`Image`, `Video`, and `Audio` columns. When the indexed column is an
`Array`, Pixeltable treats its values as precomputed embeddings and
builds the vector index directly from them.
An embedding function is **optional** for `Array` columns. If provided,
it will be used only to convert query values into embeddings during
similarity search.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import numpy as np
import pixeltable as pxt
pxt.drop_table('indices_demo.precomputed', force=True)
# Table with a precomputed embedding column
precomputed = pxt.create_table(
'indices_demo.precomputed',
{'text': pxt.String, 'embedding': pxt.Array[(384,), pxt.Float]},
)
# Insert rows with precomputed vectors
rng = np.random.default_rng(42)
precomputed.insert(
[
{
'text': 'sample sentence',
'embedding': rng.random(384).tolist(),
},
{
'text': 'another sentence',
'embedding': rng.random(384).tolist(),
},
]
)
# Index the Array column directly — no embedding function needed
precomputed.add_embedding_index('embedding', metric='cosine')
```
To run a similarity search, pass a raw vector to the `similarity`
method:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query_vector = rng.random(384)
sim = precomputed.embedding.similarity(vector=query_vector)
precomputed.order_by(sim, asc=False).limit(2).select(
precomputed.text, sim
).collect()
```
If you want to search by **text** (or another modality) rather than a
raw vector, provide an embedding function that converts the query value
into the same vector space. The function is only used at query time — it
does not re-embed the stored vectors.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
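# `my_text_embedding_fn` is a placeholder: any embedding UDF that maps a
# string to a 384-dimensional vector in the same space as the stored vectors
precomputed.add_embedding_index(
    'embedding',
    idx_name='text_query_idx',  # name it so queries can select this index
    string_embed=my_text_embedding_fn,  # converts query strings → vectors
    metric='cosine',
)
sim = precomputed.embedding.similarity(
    string='search query', idx='text_query_idx'
)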
```
## Using a UDF for a custom embedding
The above examples show how to use any model in the Hugging Face `CLIP`
or `sentence_transformer` model families, and essentially the same
pattern can be used for any other embedding with built-in Pixeltable
support, such as OpenAI embeddings. But what if you want to adapt a new
model family that doesn’t have built-in support in Pixeltable? This can
be done by writing a custom Pixeltable UDF.
In the following example, we’ll write a simple UDF to use the
[BERT](https://www.kaggle.com/models/tensorflow/bert/tensorFlow2/en-uncased-preprocess/3)
model built on TensorFlow. First we install the necessary dependencies.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU tensorflow tensorflow-hub tensorflow-text
```
Text embedding UDFs must always take a string as input, and return a
1-dimensional numpy array of fixed dimension (512 in the case of
`small_bert`, the variant we’ll be using). If we were writing an image
embedding UDF, the `input` would have type `PIL.Image.Image` rather than
`str`. The UDF is straightforward, loading the model and evaluating it
against the input, with a minor data conversion on either side of the
model invocation.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text # Necessary to ensure BERT dependencies are loaded
@pxt.udf
def bert(input: str) -> pxt.Array[(512,), pxt.Float]:
    """Computes text embeddings using the small_bert model."""
    preprocessor = hub.load(
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3'
    )
    bert_model = hub.load(
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/2'
    )
    tensor = tf.constant([input])  # Convert the string to a tensor
    result = bert_model(preprocessor(tensor))['pooled_output']
    return result.numpy()[0, :]
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
txts.add_embedding_index('text', idx_name='bert_idx', embedding=bert)
```
Here’s the output of our sample query run against `bert_idx`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sim = txts.text.similarity(string='cubism', idx='bert_idx')
res = (
    txts.order_by(sim, asc=False)
    .limit(2)
    .select(txts.text, sim)
    .collect()
)
res
```
Our example UDF is very simple, but it would perform poorly in a
production setting. To make our UDF production-ready, we’d want to do
two things:
* Cache the model: the current version calls `hub.load()` on every UDF
invocation. In a real application, we’d want to instantiate the model
just once, then reuse it on subsequent UDF calls.
* Batch our inputs: we’d use Pixeltable’s batching capability to ensure
we’re making efficient use of the model. Batched UDFs are described in
depth in the [User-Defined
Functions](/platform/udfs-in-pixeltable)
how-to guide.
You might have noticed that the updates to `bert_idx` seem sluggish;
that’s why!
## Deleting an index
To delete an index, call
[`Table.drop_embedding_index()`](/sdk/latest/table#method-drop_embedding_index):
* specify the `idx_name` parameter if you have multiple indices
* otherwise the `column_name` parameter is sufficient
Given that we have several embedding indices, we’ll specify which index
to drop:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
txts.drop_embedding_index(idx_name='e5_idx')
```
# External Files
Source: https://docs.pixeltable.com/platform/external-files
Reference media stored in Amazon S3, GCS, Azure, and other external locations from Pixeltable tables without copying files locally.
This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
In Pixeltable, all media data (videos, images, audio) resides in
external files, and Pixeltable stores references to those. The files can
be local or remote (e.g., in S3). For the latter, Pixeltable
automatically caches the files locally on access.
When interacting with media data via Pixeltable, either through queries
or UDFs, the user sees the following Python types:
* `pxt.Image`: `PIL.Image.Image`
* `pxt.Video`: `str` (local path)
* `pxt.Audio`: `str` (local path)
Let’s create a table and load some data to see what that looks like:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable boto3
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
import random
import shutil
import tempfile
# First drop the `external_data` directory if it exists, to ensure
# a clean environment for the demo
pxt.drop_dir('external_data', force=True)
pxt.create_dir('external_data')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory \`external\_data\`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v = pxt.create_table('external_data/videos', {'video': pxt.Video})
prefix = 's3://multimedia-commons/'
paths = [
    'data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4',
    'data/videos/mp4/ffe/feb/ffefebb41485539f964760e6115fbc44.mp4',
    'data/videos/mp4/ffe/f73/ffef7384d698b5f70d411c696247169.mp4',
]
v.insert({'video': prefix + p} for p in paths)
```
We just inserted 3 rows with video files residing in S3. When we now
query these, we are presented with their locally cached counterparts.
(Note: we don’t simply display the output of `collect()` here, because
that is formatted as an HTML table with a media player and so would
obscure the file path.)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
rows = list(v.select(v.video).collect())
rows[0]
```
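Let’s make a local copy of the first file and insert that separately:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
with tempfile.NamedTemporaryFile(suffix='.mp4', delete=False) as f:
    local_path = f.name  # reserve a temporary file name
shutil.copyfile(rows[0]['video'], local_path)
v.insert([{'video': local_path}])  # insert the local copy as a new row
local_path
```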
When we query this again, we see that local paths are preserved:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
rows = list(v.select(v.video).collect())
rows
```
## Dealing with errors
When interacting with media data in Pixeltable, the user can assume that
the underlying files exist, are local and are valid for their respective
data type. In other words, the user doesn’t need to consider error
conditions.
To that end, Pixeltable validates media data on ingest. The default
behavior is to reject invalid media files:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v.insert([{'video': prefix + 'bad_path.mp4'}])
```
Error: Failed to download s3://multimedia-commons/bad\_path.mp4: An error occurred (404) when calling the HeadObject operation: Not Found
The same happens for corrupted files:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create an invalid .mp4 file filled with random bytes
with tempfile.NamedTemporaryFile(
    mode='wb', suffix='.mp4', delete=False
) as temp_file:
    temp_file.write(random.randbytes(1024))
    corrupted_path = temp_file.name
v.insert([{'video': corrupted_path}])
```
Alternatively, Pixeltable can be instructed to record error
conditions and proceed with the ingest, via the `on_error` parameter
(default: `'abort'`):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v.insert(
[{'video': prefix + 'bad_path.mp4'}, {'video': corrupted_path}],
on_error='ignore',
)
```
Every media column has properties `errortype` and `errormsg` (both
containing `string` data) that indicate whether the column value is
valid. Invalid values show up as `None` and have non-null
`errortype`/`errormsg`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v.select(v.video == None, v.video.errortype, v.video.errormsg).collect()
```
Errors can now be inspected (and corrected) after the ingest:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v.where(v.video.errortype != None).select(v.video.errormsg).collect()
```
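The difference between the two `on_error` modes can be pictured with a small, Pixeltable-free model. This is only a sketch of the behavior described above, not Pixeltable's implementation: `ingest`, `validate`, and the row layout are hypothetical names chosen for illustration.

```python
from typing import Callable, Optional


def ingest(paths: list[str], validate: Callable[[str], None],
           on_error: str = 'abort') -> list[dict[str, Optional[str]]]:
    """Toy ingest loop: validate each path, then abort or record the failure."""
    rows: list[dict[str, Optional[str]]] = []
    for path in paths:
        try:
            validate(path)
            rows.append({'video': path, 'errortype': None, 'errormsg': None})
        except Exception as exc:
            if on_error == 'abort':
                raise  # reject the whole insert
            # 'ignore': keep the row, null out the value, remember why it failed
            rows.append({'video': None,
                         'errortype': type(exc).__name__,
                         'errormsg': str(exc)})
    return rows


def validate(path: str) -> None:
    # Hypothetical validity check standing in for real media validation
    if not path.endswith('.mp4'):
        raise ValueError(f'not a video file: {path}')


rows = ingest(['a.mp4', 'notes.txt'], validate, on_error='ignore')
failed = [r for r in rows if r['errortype'] is not None]
print(failed)
```

With `on_error='abort'` the same input would raise on the first invalid path and no rows would be returned.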
## Accessing the original file paths
In some cases, it will be necessary to access file paths (not, say, the
`PIL.Image.Image`), and Pixeltable provides the column properties
`fileurl` and `localpath` for that purpose:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
v.select(v.video.fileurl, v.video.localpath).collect()
```
Note that for local media files, the `fileurl` property still returns a
parsable URL.
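A short standard-library sketch of what "parsable URL" means in practice: both a remote `s3://` URL and a `file://` URL for a local file can be taken apart with `urllib.parse`. The object key and local path below are made-up examples, and the exact string Pixeltable returns for `fileurl` is not shown here.

```python
from urllib.parse import urlparse

# A remote URL in the style of the S3-hosted videos above (illustrative key)
remote = urlparse('s3://multimedia-commons/data/videos/mp4/example.mp4')

# A local file expressed as a file:// URL (illustrative path)
local = urlparse('file:///tmp/example.mp4')

# The scheme distinguishes remote from local; the path is available either way
print(remote.scheme, remote.netloc, remote.path)
print(local.scheme, local.path)
```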
# Iterators
Source: https://docs.pixeltable.com/platform/iterators
Use Pixeltable iterators to split documents, video, audio, and images into row-level components for view-based downstream processing.
## What are iterators?
Iterators in Pixeltable are specialized tools for processing and transforming media content. They efficiently break down large files into manageable chunks, enabling analysis at different granularities. Iterators work seamlessly with views to create virtual derived tables without duplicating storage.
In Pixeltable, iterators:
* Process media files incrementally to manage memory efficiently
* Transform single records into multiple output records
* Support various media types including documents, videos, images, and audio
* Integrate with the view system for automated processing pipelines
* Provide configurable parameters for fine-tuning output
Iterators are particularly useful when:
* Working with large media files that can't be processed at once
* Building retrieval systems that require chunked content
* Creating analysis pipelines for multimedia data
* Implementing feature extraction workflows
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.document import document_splitter
# Create a view using an iterator
chunks = pxt.create_view(
'docs/chunks',
documents_table,
iterator=document_splitter(
document=documents_table.document,
separators='sentence,token_limit',
limit=300
)
)
```
## Core concepts
* Documents: split into chunks by headings, sentences, or token limits
* Videos: extract frames at specified intervals or counts
* Images: divide into overlapping or non-overlapping tiles
* Audio: split files into time-based chunks with configurable overlap
Iterators are powerful tools for processing large media files. They work seamlessly with Pixeltable's computed columns and versioning system.
## Available iterators
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.document import document_splitter
# Create view with document chunks
chunks_view = pxt.create_view(
'docs/chunks',
docs_table,
iterator=document_splitter(
document=docs_table.document,
separators='sentence,token_limit',
limit=500,
metadata='title,heading'
)
)
```
### Parameters
* `separators`: Choose from 'heading', 'sentence', 'token\_limit', 'char\_limit', 'page'
* `limit`: Maximum tokens/characters per chunk
* `metadata`: Optional fields like 'title', 'heading', 'sourceline', 'page', 'bounding\_box'
* `overlap`: Optional overlap between chunks
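To make `limit` and `overlap` concrete, here is a plain-Python sketch of a sliding-window chunker over a token list. It illustrates the general idea only: `chunk_tokens` is a hypothetical helper, whitespace splitting stands in for a real tokenizer, and `document_splitter` may choose its boundaries differently.

```python
def chunk_tokens(tokens: list[str], limit: int, overlap: int = 0) -> list[list[str]]:
    """Split tokens into windows of at most `limit`, sharing `overlap` tokens."""
    if limit <= 0 or not 0 <= overlap < limit:
        raise ValueError('need limit > 0 and 0 <= overlap < limit')
    step = limit - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + limit])
        if start + limit >= len(tokens):
            break  # this window already reaches the end
    return chunks


tokens = 'a b c d e f g h'.split()
chunks = chunk_tokens(tokens, limit=4, overlap=1)
print(chunks)
```

Each chunk after the first repeats the last token of its predecessor, which is the context-preserving effect an overlap is meant to provide.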
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.video import frame_iterator
# Extract frames at 1 FPS
frames_view = pxt.create_view(
'videos/frames',
videos_table,
iterator=frame_iterator(
video=videos_table.video,
fps=1.0
)
)
# Extract exact number of frames (evenly spaced)
frames_view = pxt.create_view(
'videos/sampled',
videos_table,
iterator=frame_iterator(
video=videos_table.video,
num_frames=10 # Extract 10 evenly-spaced frames
)
)
# Extract only keyframes (I-frames) for efficient processing
keyframes_view = pxt.create_view(
'videos/keyframes',
videos_table,
iterator=frame_iterator(
video=videos_table.video,
keyframes_only=True
)
)
```
### Parameters
* `fps`: Frames per second to extract (can be fractional)
* `num_frames`: Exact number of frames to extract
* `keyframes_only`: Extract only keyframes (I-frames) - efficient for quick video scanning
* Only one of `fps`, `num_frames`, or `keyframes_only` can be specified
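The two sampling modes can be compared with a little arithmetic. The sketch below computes which frame indices would be picked from a clip with a known native frame rate. The helper names are hypothetical, and the exact frames `frame_iterator` selects may differ in rounding.

```python
def indices_for_fps(total_frames: int, native_fps: float, fps: float) -> list[int]:
    """Pick roughly one frame every 1/fps seconds."""
    step = native_fps / fps  # source frames per extracted frame
    count = int(total_frames / step)
    return [round(i * step) for i in range(count)]


def indices_for_num_frames(total_frames: int, num_frames: int) -> list[int]:
    """Pick num_frames indices spread evenly across the clip."""
    n = min(num_frames, total_frames)
    return [i * total_frames // n for i in range(n)]


# A 10-second clip recorded at 30 frames per second
by_fps = indices_for_fps(total_frames=300, native_fps=30.0, fps=1.0)
by_count = indices_for_num_frames(total_frames=300, num_frames=10)
print(by_fps)
print(by_count)
```

For this clip both modes land on the same frames, because 1 frame per second over 10 seconds is the same as 10 evenly spaced frames. On clips of other lengths the two diverge: `fps` scales the output with duration, while `num_frames` keeps it fixed.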
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.video import video_splitter
# Split video into 10-second segments
segments_view = pxt.create_view(
'videos/segments',
videos_table,
iterator=video_splitter(
video=videos_table.video,
duration=10.0,
min_segment_duration=1.0
)
)
```
### Parameters
* `duration`: Duration of each segment in seconds
* `overlap`: Overlap between segments in seconds
* `min_segment_duration`: Drop last segment if shorter than this value
### Returns
For each segment, yields:
* `segment_start`: Start time of the segment in seconds
* `segment_end`: End time of the segment in seconds
* `video_segment`: The video segment file
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.string import string_splitter
# Split text into sentences
sentences_view = pxt.create_view(
'texts/sentences',
texts_table,
iterator=string_splitter(
text=texts_table.content,
separators='sentence'
)
)
```
### Parameters
* `separators`: Set to 'sentence' (requires spaCy)
### Returns
For each chunk, yields:
* `text`: The text chunk
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.image import tile_iterator
# Create tiles with overlap
tiles_view = pxt.create_view(
'images/tiles',
images_table,
iterator=tile_iterator(
image=images_table.image,
tile_size=(224, 224), # Width, Height
overlap=(32, 32) # Horizontal, Vertical overlap
)
)
```
### Parameters
* `tile_size`: Tuple of (width, height) for each tile
* `overlap`: Optional tuple for overlap between tiles
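The interaction between `tile_size` and `overlap` is easiest to see as a list of crop boxes. This is a sketch under stated assumptions, not the `tile_iterator` implementation: `tile_starts` and `tile_boxes` are hypothetical helpers, and clipping edge tiles to the image bounds is an assumption about edge handling.

```python
def tile_starts(length: int, tile: int, step: int) -> list[int]:
    """Offsets along one axis; stop once a tile reaches the far edge."""
    starts = [0]
    while starts[-1] + tile < length:
        starts.append(starts[-1] + step)
    return starts


def tile_boxes(image_size: tuple[int, int], tile_size: tuple[int, int],
               overlap: tuple[int, int] = (0, 0)) -> list[tuple[int, int, int, int]]:
    """Return (left, upper, right, lower) boxes that cover the image."""
    width, height = image_size
    tile_w, tile_h = tile_size
    step_x, step_y = tile_w - overlap[0], tile_h - overlap[1]
    if step_x <= 0 or step_y <= 0:
        raise ValueError('overlap must be smaller than tile_size')
    return [
        # Assumption: tiles at the right/bottom edge are clipped to the image
        (left, top, min(left + tile_w, width), min(top + tile_h, height))
        for top in tile_starts(height, tile_h, step_y)
        for left in tile_starts(width, tile_w, step_x)
    ]


boxes = tile_boxes((448, 224), tile_size=(224, 224), overlap=(32, 32))
print(boxes)
```

Neighboring boxes share a 32-pixel strip, so an object that straddles a tile boundary still appears whole in at least one tile when it is smaller than the overlap.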
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.audio import audio_splitter
# Split audio into chunks
chunks_view = pxt.create_view(
'audio/chunks',
audio_table,
iterator=audio_splitter(
audio=audio_table.audio,
duration=30.0, # Split into 30-second chunks
overlap=2.0, # 2-second overlap between chunks
min_segment_duration=5.0 # Drop last chunk if < 5 seconds
)
)
```
### Parameters
* `duration` (float): Duration of each audio chunk in seconds
* `overlap` (float, default: 0.0): Overlap duration between consecutive chunks in seconds
* `min_segment_duration` (float, default: 0.0): Minimum duration threshold - the last chunk will be dropped if it's shorter than this value
### Returns
For each chunk, yields:
* `start_time_sec`: Start time of the chunk in seconds
* `end_time_sec`: End time of the chunk in seconds
* `audio_chunk`: The audio chunk as pxt.Audio type
### Notes
* If the input contains no audio, no chunks are yielded
* The audio file is processed efficiently with proper codec handling
* Supports various audio formats including MP3, AAC, Vorbis, Opus, FLAC
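The three parameters combine as shown in the following sketch, which computes chunk boundaries for a file of known length. `segment_bounds` is a hypothetical helper, and advancing each chunk by `duration - overlap` is an assumption about how the overlap is applied.

```python
def segment_bounds(total: float, duration: float, overlap: float = 0.0,
                   min_segment_duration: float = 0.0) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for each chunk."""
    if duration <= 0 or not 0 <= overlap < duration:
        raise ValueError('need duration > 0 and 0 <= overlap < duration')
    bounds = []
    start = 0.0
    while start < total:
        end = min(start + duration, total)
        bounds.append((start, end))
        if end >= total:
            break
        start += duration - overlap  # next chunk begins inside the previous one
    # Drop the final chunk if it is shorter than the threshold
    if bounds and bounds[-1][1] - bounds[-1][0] < min_segment_duration:
        bounds.pop()
    return bounds


# A 65-second file with the parameters from the example above
kept = segment_bounds(65.0, duration=30.0, overlap=2.0, min_segment_duration=5.0)
dropped = segment_bounds(65.0, duration=30.0, overlap=2.0, min_segment_duration=10.0)
print(kept)
print(dropped)
```

The trailing chunk here is 9 seconds long, so it survives a 5-second threshold but is discarded with a 10-second one.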
## Common use cases
Split documents for:
* RAG systems
* Text analysis
* Content extraction
Extract frames for:
* Object detection
* Scene classification
* Activity recognition
Create tiles for:
* High-resolution analysis
* Object detection
* Segmentation tasks
Split audio for:
* Speech recognition
* Sound classification
* Audio feature extraction
## Example workflows
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import sentence_transformer
# Create document chunks
chunks = pxt.create_view(
'rag/chunks',
docs_table,
iterator=document_splitter(
document=docs_table.document,
separators='sentence,token_limit',
limit=500
)
)
# Add embeddings
chunks.add_embedding_index(
'text',
string_embed=sentence_transformer.using(
model_id='all-mpnet-base-v2'
)
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract frames at 1 FPS
frames = pxt.create_view(
'detection/frames',
videos_table,
iterator=frame_iterator(
video=videos_table.video,
fps=1.0
)
)
# Add object detection (`detect_objects` is not defined here; substitute your own detection UDF)
frames.add_computed_column(detections=detect_objects(frames.frame))
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Split long audio files
chunks = pxt.create_view(
'audio/chunks',
audio_table,
iterator=audio_splitter(
audio=audio_table.audio,
duration=30.0
)
)
# Add transcription (`whisper_transcribe` is not defined here; substitute your own transcription UDF)
chunks.add_computed_column(text=whisper_transcribe(chunks.audio_chunk))
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.video import make_video
# Extract frames at 1 FPS
frames = pxt.create_view(
'video/frames',
videos_table,
iterator=frame_iterator(
video=videos_table.video,
fps=1.0
)
)
# Process frames (e.g., apply a filter)
frames.add_computed_column(processed=frames.frame.filter('BLUR'))
# Create new videos from processed frames
processed_videos = frames.select(
frames.video_id,
make_video(frames.pos, frames.processed) # Default fps is 25
).group_by(frames.video_id).collect()
```
## Best practices
* Use appropriate chunk sizes
* Consider overlap requirements
* Monitor memory usage with large files
* Balance chunk size vs. processing time
* Use batch processing when possible
* Cache intermediate results
## Tips & tricks
When using `token_limit` with `document_splitter`, ensure the limit accounts for any model context windows in your pipeline.
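As a rough sanity check, the token budget can be worked out before choosing a limit. The numbers and the helper name below are hypothetical. The point is only that prompt text and the space reserved for the model's answer reduce how many chunks fit.

```python
def max_chunks_in_context(context_window: int, chunk_limit: int,
                          prompt_tokens: int, reserved_for_output: int) -> int:
    """How many chunks of `chunk_limit` tokens fit alongside the prompt and answer."""
    available = context_window - prompt_tokens - reserved_for_output
    return max(available // chunk_limit, 0)


# Hypothetical model with an 8192-token window and 500-token chunks
n = max_chunks_in_context(context_window=8192, chunk_limit=500,
                          prompt_tokens=200, reserved_for_output=1000)
print(n)
```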
## Custom iterators with `@pxt.iterator`
You can create your own iterators using the `@pxt.iterator` decorator on a Python generator function. This is the simplest way to define a custom iterator that splits one row into many.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from typing import Iterator, TypedDict
import pixeltable as pxt
class WordRow(TypedDict):
    word: str
    position: int

@pxt.iterator
def word_iterator(text: str) -> Iterator[WordRow]:
    for i, word in enumerate(text.split()):
        yield WordRow(word=word, position=i)

# Use as a view iterator
words_view = pxt.create_view(
    'text/words',
    text_table,
    iterator=word_iterator(text_table.content)
)
```
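Because the iterator body is an ordinary generator, its logic can be checked on its own before it is attached to a view. The sketch below repeats the same body without the `@pxt.iterator` decorator, under the hypothetical name `split_words`, and shows the rows produced for one input string.

```python
from typing import Iterator, TypedDict


class WordRow(TypedDict):
    word: str
    position: int


def split_words(text: str) -> Iterator[WordRow]:
    # Same body as word_iterator above, minus the @pxt.iterator decorator
    for i, word in enumerate(text.split()):
        yield WordRow(word=word, position=i)


# One input string becomes one output row per word
rows = list(split_words('Hello, world!'))
print(rows)
```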
Use `unstored_cols` to mark columns that should not be persisted:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from typing import Iterator, TypedDict
import pixeltable as pxt
class FrameRow(TypedDict):
    frame: pxt.Image
    timestamp: float

@pxt.iterator(unstored_cols=['frame'])
def my_frame_extractor(video: pxt.Video) -> Iterator[FrameRow]:
    # Custom frame extraction logic
    ...
```
# Multimodal Type System
Source: https://docs.pixeltable.com/platform/type-system
Pixeltable type system covering scalars, JSON, arrays, image, video, audio, document, and embedding types for structured and ML pipelines.
Pixeltable provides a rich type system designed for multimodal AI applications. Every column and expression has an associated type that determines what data it can hold and what operations are available.
## Type overview
| Pixeltable Type | Python Type | Description |
| --------------- | --------------------------------------------- | -------------------------------------- |
| `pxt.String` | `str` | Text data |
| `pxt.Int` | `int` | Integer numbers |
| `pxt.Float`     | `float`                                       | Floating-point numbers                 |
| `pxt.Bool` | `bool` | Boolean values |
| `pxt.Timestamp` | `datetime.datetime` | Timestamp values |
| `pxt.Date` | `datetime.date` | Date values |
| `pxt.UUID` | `uuid.UUID` | Unique identifiers |
| `pxt.Array` | `np.ndarray` | Numerical arrays (embeddings, tensors) |
| `pxt.Json` | `dict`, `list`, `str`, `int`, `float`, `bool` | Flexible JSON data |
| `pxt.Image` | `PIL.Image.Image` | Image data |
| `pxt.Video` | `str` (file path) | Video files |
| `pxt.Audio` | `str` (file path) | Audio files |
| `pxt.Document` | `str` (file path) | Documents (PDFs, markdown, html, etc.) |
`pxt.Audio`, `pxt.Video`, and `pxt.Document` return file paths when queried. Pixeltable automatically downloads and caches remote media locally. Use `.fileurl` to get the original URL.
## Basic types
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
table = pxt.create_table('example/basic_types', {
    'text': pxt.String,        # Text data
    'count': pxt.Int,          # Integer numbers
    'score': pxt.Float,        # Floating-point numbers
    'active': pxt.Bool,        # Boolean values
    'created': pxt.Timestamp   # Date/time values
})
```
### Auto-generated UUIDs
Use `uuid7()` to create columns that auto-generate unique identifiers:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.uuid import uuid7
# UUID as primary key - auto-generated for each row
products = pxt.create_table('example/products', {
'id': uuid7(), # Auto-generates UUID
'name': pxt.String,
'price': pxt.Float
}, primary_key=['id'])
# Insert without providing 'id' - it's generated automatically
products.insert([{'name': 'Laptop', 'price': 999.99}])
```
You can also add UUIDs to existing tables:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add UUID column to existing table
orders.add_computed_column(order_id=uuid7())
```
By default, `stored=True` for all computed columns—values compute once and persist. For UUIDs, this ensures stable identifiers. Setting `stored=False` would regenerate UUIDs on every query (almost never what you want).
See the [UUID cookbook](/howto/cookbooks/core/workflow-uuid-identity) for more examples of working with unique identifiers.
## Media types
Pixeltable natively supports images, video, audio, and documents as first-class column types.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
media = pxt.create_table('example/media', {
'image': pxt.Image, # Any image
'video': pxt.Video, # Video reference
'audio': pxt.Audio, # Audio file
'document': pxt.Document # PDF/text document
})
```
### Image specialization
Images can be constrained by resolution and/or color mode:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Constrain by resolution
thumbnails = pxt.create_table('example/thumbnails', {
'thumb': pxt.Image[(224, 224)] # Width 224, height 224
})
# Constrain by color mode
grayscale = pxt.create_table('example/grayscale', {
'img': pxt.Image['L'] # Grayscale (1-channel)
})
# Constrain both
rgb_fixed = pxt.create_table('example/rgb_fixed', {
'img': pxt.Image[(300, 200), 'RGB'] # 300x200 RGB images
})
```
See the [PIL Documentation](https://pillow.readthedocs.io/en/stable/handbook/concepts.html) for the full list of image modes (`'RGB'`, `'RGBA'`, `'L'`, etc.).
## Array types (embeddings & tensors)
Arrays are used for embeddings, feature vectors, and tensor data. They must always specify a shape and dtype.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
ml_data = pxt.create_table('example/ml_features', {
# Fixed-size embedding (e.g., from CLIP or OpenAI)
'embedding': pxt.Array[(768,), pxt.Float],
# Variable first dimension (batch of 512-dim vectors)
'features': pxt.Array[(None, 512), pxt.Float],
# 3D tensor with flexible dimensions
'tensor': pxt.Array[(None, None, 3), pxt.Float]
})
```
Array shapes follow NumPy conventions. Use `None` for unconstrained dimensions:
* `(512,)` — fixed 512-element vector
* `(None, 768)` — variable-length sequence of 768-dim vectors
* `(64, 64, 3)` — fixed 64×64×3 tensor
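One way to read these shape specifications is as a pattern in which `None` matches any size. The helper below is a hypothetical illustration of that rule, not Pixeltable's validation code.

```python
from typing import Optional, Sequence


def shape_matches(spec: Sequence[Optional[int]], shape: Sequence[int]) -> bool:
    """True if `shape` fits `spec`, where None in `spec` matches any size."""
    if len(spec) != len(shape):
        return False  # the number of dimensions must agree
    return all(s is None or s == d for s, d in zip(spec, shape))


print(shape_matches((512,), (512,)))          # fixed-size vector
print(shape_matches((None, 768), (12, 768)))  # any number of 768-dim rows
print(shape_matches((None, 768), (12, 512)))  # wrong inner dimension
```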
### Working with arrays
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Arrays can be sliced like NumPy arrays
ml_data.select(
    ml_data.embedding[0],     # First element
    ml_data.embedding[5:10],  # Slice
    ml_data.embedding[-3:]    # Last 3 elements
).collect()
```
## JSON type
The `Json` type stores flexible structured data—dictionaries, lists, or primitives.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
logs = pxt.create_table('example/logs', {
'event': pxt.Json
})
logs.insert([
{'event': {'type': 'click', 'x': 100, 'y': 200}},
{'event': {'type': 'scroll', 'delta': 50}},
{'event': ['tag1', 'tag2', 'tag3']}
])
```
### JSON path access
Access nested data using dictionary or attribute syntax:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Dictionary syntax
logs.select(logs.event['type']).collect()
# Attribute syntax (JSONPath)
logs.select(logs.event.type).collect()
# List indexing
logs.select(logs.event.tags[0]).collect()
# Slicing
logs.select(logs.event.tags[:2]).collect()
```
Pixeltable handles missing keys gracefully—you'll get `None` instead of an exception.
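The "missing key yields `None`" behavior can be modeled in a few lines of plain Python. This is a toy model of the semantics described above, with a hypothetical `get_path` helper. It does not reflect how Pixeltable evaluates JSON paths internally.

```python
from typing import Any, Union


def get_path(value: Any, *path: Union[str, int]) -> Any:
    """Follow keys/indices into nested JSON data; return None if a step is missing."""
    for step in path:
        try:
            value = value[step]
        except (KeyError, IndexError, TypeError):
            return None  # missing key, out-of-range index, or wrong container type
    return value


click = {'type': 'click', 'x': 100, 'y': 200}
tags = ['tag1', 'tag2', 'tag3']

print(get_path(click, 'type'))     # key is present
print(get_path(click, 'tags', 0))  # no 'tags' key in this event
print(get_path(tags, 'type'))      # a list has no string keys
```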
### JSON schema validation
Validate JSON columns against a schema to ensure data integrity:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Define a JSON schema
movie_schema = {
'type': 'object',
'properties': {
'title': {'type': 'string'},
'year': {'type': 'integer'},
'rating': {'type': 'number'}
},
'required': ['title', 'year']
}
# Create table with validated JSON column
movies = pxt.create_table('example/validated_movies', {
'data': pxt.Json[movie_schema]
})
# Valid insert
movies.insert(data={'title': 'Inception', 'year': 2010, 'rating': 8.8})
# Invalid insert raises error (missing required 'year')
# movies.insert(data={'title': 'Movie'}) # Error!
```
### Using Pydantic models
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pydantic import BaseModel

class Movie(BaseModel):
    title: str
    year: int
    rating: float | None = None

# Use the model's JSON schema for validation
movies = pxt.create_table('example/pydantic_movies', {
    'data': pxt.Json[Movie.model_json_schema()]
})
```
## Type conversion
Use `astype()` to convert string file paths or URLs to media types:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# String file paths → Media types
media = pxt.create_table('media_table', {'path': pxt.String})
media.insert([{'path': '/path/to/image.jpg'}])
# Convert string path to Image
media.select(img=media.path.astype(pxt.Image)).collect()
```
**Primary use case:** Converting string columns containing file paths or URLs to media types (`Image`, `Video`, `Audio`, `Document`).
For other type conversions, use built-in functions from the [`string`](/sdk/latest/string), [`json`](/sdk/latest/json), or [`math`](/sdk/latest/math) modules. For example, use `string.len()` to get string length as an integer, or access JSON fields directly.
## Column properties
### Media column properties
Media columns (`Image`, `Video`, `Audio`, `Document`) have special properties:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Local file path (Pixeltable ensures this is on local filesystem)
t.select(t.image.localpath).collect()
# Original URL where the media resides
t.select(t.image.fileurl).collect()
```
### Error properties
Computed columns have `errortype` and `errormsg` properties for debugging:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a computed column that might fail
t.add_computed_column(
result=some_function(t.input),
on_error='ignore' # Continue on errors
)
# Query error information for failed rows
t.where(t.result == None).select(
t.input,
t.result.errortype, # Exception class name
t.result.errormsg # Error message
).collect()
```
## Best practices
Prefer `pxt.Image[(224,224), 'RGB']` over `pxt.Image` when you know the constraints. This enables optimizations and catches errors early.
Use JSON schema validation or Pydantic models for structured data to ensure consistency across your pipeline.
Always specify array shapes and dtypes. Use `None` for variable dimensions: `pxt.Array[(None, 768), pxt.Float]`.
Use `on_error='ignore'` in production pipelines, then query `.errortype` and `.errormsg` to debug failures.
# UDFs in Pixeltable
Source: https://docs.pixeltable.com/platform/udfs-in-pixeltable
Write Python user-defined functions in Pixeltable to extend tables with custom logic, model inference, and API calls as typed computed columns.
Pixeltable comes with a library of built-in functions and integrations,
but sooner or later, you’ll want to introduce some customized logic into
your workflow. This is where Pixeltable’s rich UDF (User-Defined
Function) capability comes in. Pixeltable UDFs let you write code in
Python, then directly insert your custom logic into Pixeltable
expressions and computed columns. In this how-to guide, we’ll show how
to define UDFs, extend their capabilities, and use them in computed
columns.
To start, we’ll install the necessary dependencies, create a Pixeltable
directory and table to experiment with, and add some sample data.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Create the directory and table
pxt.drop_dir('udf_demo', force=True) # Ensure a clean slate for the demo
pxt.create_dir('udf_demo')
t = pxt.create_table('udf_demo/strings', {'input': pxt.String})
# Add some sample data
t.insert(
    [
        {'input': 'Hello, world!'},
        {'input': 'You can do a lot with Pixeltable UDFs.'},
    ]
)
t.show()
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory \`udf\_demo\`.
Created table \`strings\`.
Inserting rows into \`strings\`: 2 rows \[00:00, 763.99 rows/s]
Inserted 2 rows with 0 errors.
## What is a UDF?
A Pixeltable UDF is just a Python function that is marked with the
`@pxt.udf` decorator.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
def add_one(n: int) -> int:
    return n + 1
```
It’s as simple as that! Without the decorator, `add_one` would be an
ordinary Python function that operates on integers. Adding `@pxt.udf`
converts it into a Pixeltable function that operates on *columns* of
integers. The decorated function can then be used directly to define
computed columns; Pixeltable will orchestrate its execution across all
the input data.
For our first working example, let’s do something slightly more
interesting: write a function to extract the longest word from a
sentence. (If there are ties for the longest word, we choose the first
word among those ties.) In Python, that might look something like this:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import numpy as np

def longest_word(sentence: str, strip_punctuation: bool = False) -> str:
    words = sentence.split()
    if strip_punctuation:
        # Remove non-alphanumeric characters from each word
        words = [''.join(filter(str.isalnum, word)) for word in words]
    i = np.argmax([len(word) for word in words])
    return words[i]
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
longest_word("Let's check that it works.", strip_punctuation=True)
```
'check'
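The tie-breaking rule can be checked without NumPy. The following is a minimal sketch rather than part of the tutorial: `longest_word_plain` is a hypothetical name, and it relies on Python's built-in `max`, which (like `np.argmax`) returns the first of several equally long candidates. It also returns an empty string for blank input, a case where `np.argmax` raises `ValueError` because the list of lengths is empty.

```python
def longest_word_plain(sentence: str, strip_punctuation: bool = False) -> str:
    """Hypothetical NumPy-free variant of `longest_word` (illustration only)."""
    words = sentence.split()
    if strip_punctuation:
        # Remove non-alphanumeric characters from each word
        words = [''.join(filter(str.isalnum, word)) for word in words]
    # `max` returns the first maximal element, so ties go to the earliest word;
    # `default=''` covers sentences that contain no words at all.
    return max(words, key=len, default='')


print(longest_word_plain("Let's check that it works.", strip_punctuation=True))
print(longest_word_plain("Let's check that it works."))
```

With punctuation stripped, `check` and `works` tie at five characters and the earlier word wins; without stripping, `works.` is the longest token because of its trailing period.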
The `longest_word` Python function isn’t a Pixeltable UDF (yet); it
operates on individual strings, not columns of strings. Adding the
decorator turns it into a UDF:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
def longest_word(sentence: str, strip_punctuation: bool = False) -> str:
    words = sentence.split()
    if strip_punctuation:
        # Remove non-alphanumeric characters from each word
        words = [''.join(filter(str.isalnum, word)) for word in words]
    i = np.argmax([len(word) for word in words])
    return words[i]
```
Now we can use it to create a computed column. Pixeltable orchestrates
the computation like it does with any other function, applying the UDF
in turn to each existing row of the table, then updating incrementally
each time a new row is added.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(longest_word=longest_word(t.input))
t.show()
```
Oops, those trailing punctuation marks are kind of annoying. Let’s add
another column, this time using the handy `strip_punctuation` parameter
from our UDF. (We could alternatively drop the first column before
adding the new one, but for purposes of this tutorial it’s convenient to
see how Pixeltable executes both variants side-by-side.) Note how
*columns* such as `t.input` and *constants* such as `True` can be freely
intermixed as arguments to the UDF.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
    longest_word_2=longest_word(t.input, strip_punctuation=True)
)
t.show()
```
## Types in UDFs
You might have noticed that the `longest_word` UDF has *type hints* in
its signature.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
def longest_word(sentence: str, strip_punctuation: bool = False) -> str: ...
```
The `sentence` parameter, `strip_punctuation` parameter, and return
value all have explicit types (`str`, `bool`, and `str` respectively).
In general Python code, type hints are usually optional. But Pixeltable
is a database system: *everything* in Pixeltable must have a type. And
since Pixeltable is also an orchestrator - meaning it sets up workflows
and computed columns *before* executing them - these types need to be
known in advance. That’s the reasoning behind a fundamental principle of
Pixeltable UDFs:
* Type hints are *required*.
You can turn almost any Python function into a Pixeltable UDF, provided
that it has type hints, and provided that Pixeltable supports the types
that it uses. The most familiar types that you’ll use in UDFs are:
* `int`
* `float`
* `str`
* `list` (can optionally be parameterized, e.g., `list[str]`)
* `dict` (can optionally be parameterized, e.g., `dict[str, int]`)
* `PIL.Image.Image`
In addition to these standard Python types, Pixeltable also recognizes
various kinds of arrays, audio and video media, and documents.
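One reason type hints work well for this purpose is that they can be read from a function's signature without ever calling it. The sketch below illustrates that idea using only the standard library; `missing_type_hints` is a hypothetical helper written for this example and is not how Pixeltable itself validates UDFs.

```python
import inspect
from typing import get_type_hints


def missing_type_hints(fn) -> list[str]:
    """Hypothetical helper: names of parameters (or 'return') lacking a type hint."""
    hints = get_type_hints(fn)
    missing = [name for name in inspect.signature(fn).parameters if name not in hints]
    if 'return' not in hints:
        missing.append('return')
    return missing


def typed(sentence: str, strip_punctuation: bool = False) -> str:
    return sentence


def untyped(sentence, strip_punctuation=False):
    return sentence


print(missing_type_hints(typed))    # []
print(missing_type_hints(untyped))  # ['sentence', 'strip_punctuation', 'return']
```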
## Local and module UDFs
The `longest_word` UDF that we defined above is a *local* UDF: it was
defined directly in our notebook, rather than in a module that we
imported. Many other UDFs, including all of Pixeltable’s built-in
functions, are defined in modules. We encountered a few of these in the
10-Minute Tour tutorial: the `huggingface.detr_for_object_detection` and
`openai.chat_completions` functions. (Although these are built-in
functions, they behave the same way as UDFs, and in fact they’re defined
the same way under the covers.)
There is an important difference between the two. When you add a module
UDF such as `openai.chat_completions` to a table, Pixeltable stores a
*reference* to the corresponding Python function in the module. If you
later restart your Python runtime and reload Pixeltable, then Pixeltable
will re-import the module UDF when it loads the computed column. This
means that any code changes made to the UDF will be picked up at that
time, and the new version of the UDF will be used in any future
execution.
Conversely, when you add a local UDF to a table, the *entire code* for
the UDF is serialized and stored in the table. This ensures that if you
restart your notebook kernel (say), or even delete the notebook
entirely, the UDF will continue to function. However, it also means that
if you modify the UDF code, the updated logic will not be reflected in
any existing Pixeltable columns.
To see how this works in practice, let’s modify our `longest_word` UDF
so that if `strip_punctuation` is `True`, then we remove only a single
punctuation mark from the *end* of each word.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
def longest_word(sentence: str, strip_punctuation: bool = False) -> str:
    words = sentence.split()
    if strip_punctuation:
        words = [
            word if word[-1].isalnum() else word[:-1] for word in words
        ]
    i = np.argmax([len(word) for word in words])
    return words[i]
```
Now we see that Pixeltable continues to use the *old* definition, even
as new rows are added to the table.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.insert([{'input': "Let's check that it still works."}])
t.show()
```
But if we add a new *column* that references the `longest_word` UDF,
Pixeltable will use the updated version.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
    longest_word_3=longest_word(t.input, strip_punctuation=True)
)
t.show()
```
The general rule is: changes to module UDFs will affect any future
execution; changes to local UDFs will only affect *new columns* that are
defined using the new version of the UDF.
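The difference between storing a reference and storing the code itself can be mimicked in a few lines of plain Python. This is a toy model only: the dictionary `module_namespace` stands in for an importable module, and none of the names below come from Pixeltable.

```python
# Toy model only; Pixeltable's real storage mechanism is more involved.
module_namespace = {}  # stands in for an importable Python module


def strip_all(word: str) -> str:
    return ''.join(filter(str.isalnum, word))


def strip_trailing(word: str) -> str:
    return word if word[-1].isalnum() else word[:-1]


module_namespace['clean'] = strip_all


# "Module" column: stores only the name and resolves it on every call.
def module_column(word: str) -> str:
    return module_namespace['clean'](word)


# "Local" column: captures the function object as it existed at creation time.
local_column = module_namespace['clean']

# The function is later redefined.
module_namespace['clean'] = strip_trailing

print(module_column("it's!"))  # it's  (picks up the new definition)
print(local_column("it's!"))   # its   (still uses the original definition)
```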
## Batching
Pixeltable provides several ways to optimize UDFs for better
performance. One of the most common is *batching*, which is particularly
important for UDFs that involve GPU operations.
Ordinary UDFs process one row at a time, meaning the UDF will be invoked
exactly once per row processed. Conversely, a batched UDF processes
several rows at a time; the specific number is user-configurable. As an
example, let’s modify our `longest_word` UDF to take a batched
parameter. Here’s what it looks like:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.func import Batch

@pxt.udf(batch_size=16)
def longest_word(
    sentences: Batch[str], strip_punctuation: bool = False
) -> Batch[str]:
    results = []
    for sentence in sentences:
        words = sentence.split()
        if strip_punctuation:
            words = [
                word if word[-1].isalnum() else word[:-1]
                for word in words
            ]
        i = np.argmax([len(word) for word in words])
        results.append(words[i])
    return results
```
There are several changes:
* The parameter `batch_size=16` has been added to the `@pxt.udf`
decorator, specifying the batch size;
* The `sentences` parameter has changed from `str` to `Batch[str]`;
* The return type has also changed from `str` to `Batch[str]`; and
* Instead of processing a single sentence, the UDF is processing a
`Batch` of sentences and returning the result `Batch`.
What exactly is a `Batch[str]`? Functionally, it’s simply a `list[str]`,
and you can use it exactly like a `list[str]` in any Python code. The
only difference is in the type hint; a type hint of `Batch[str]` tells
Pixeltable, “My data consists of individual strings that I want you to
process in batches”. Conversely, a type hint of `list[str]` would mean,
“My data consists of *lists* of strings that I want you to process one
at a time”.
Notice that the `strip_punctuation` parameter is *not* wrapped in a
`Batch` type. This is because `strip_punctuation` controls the behavior of
the UDF, rather than being part of the input data. When we use the
batched `longest_word` UDF, the `strip_punctuation` parameter will
always be a constant, not a column.
Let’s put the new, batched UDF to work.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
    longest_word_3_batched=longest_word(t.input, strip_punctuation=True)
)
t.show()
```
As expected, the output of the `longest_word_3_batched` column is
identical to the `longest_word_3` column. Under the covers, though,
Pixeltable is orchestrating execution in batches of 16. That probably
won’t have much performance impact on our toy example, but for GPU-bound
computations such as text or image embeddings, it can make a substantial
difference.
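To make the call pattern concrete, here is a rough plain-Python sketch of what dispatching rows in batches looks like. It assumes batches are simple consecutive slices of at most `batch_size` rows; `run_in_batches` and `longest_words` are hypothetical names, and Pixeltable's actual scheduler may group rows differently.

```python
from typing import Callable


def run_in_batches(
    rows: list[str], batch_fn: Callable[[list[str]], list[str]], batch_size: int
) -> tuple[list[str], int]:
    """Hypothetical driver: call `batch_fn` on consecutive slices of `rows`."""
    results: list[str] = []
    num_calls = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start : start + batch_size]
        out = batch_fn(batch)
        assert len(out) == len(batch)  # a batched function returns one output per input
        results.extend(out)
        num_calls += 1
    return results, num_calls


def longest_words(sentences: list[str]) -> list[str]:
    # Batched counterpart of the longest-word logic, without NumPy
    return [max(s.split(), key=len, default='') for s in sentences]


rows = [f'row number {i}' for i in range(50)]
results, num_calls = run_in_batches(rows, longest_words, batch_size=16)
print(num_calls)     # 4 calls: 16 + 16 + 16 + 2 rows
print(len(results))  # 50
```

An unbatched function would be called 50 times on the same input; the batched version is called 4 times, which is where the savings come from when each call carries fixed overhead.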
## UDAs (aggregate UDFs)
Ordinary UDFs are always one-to-one on rows: each row of input generates
one UDF output value. Functions that aggregate data, conversely, are
many-to-one, and in Pixeltable they are represented by a related
abstraction, the UDA (User-Defined Aggregate).
Pixeltable has a number of built-in UDAs; if you’ve worked through the
Fundamentals tutorial, you’ll have already encountered a few of them,
such as `sum` and `count`. In this section, we’ll show how to define
your own custom UDAs. For demonstration purposes, let’s start by
creating a table containing all the integers from 0 to 49.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
t = pxt.create_table('udf_demo/values', {'val': pxt.Int})
t.insert({'val': n} for n in range(50))
```
Created table \`values\`.
Inserting rows into \`values\`: 50 rows \[00:00, 9267.95 rows/s]
Inserted 50 rows with 0 errors.
UpdateStatus(num\_rows=50, num\_computed\_values=0, num\_excs=0, updated\_cols=\[], cols\_with\_excs=\[])
If we wanted to compute their sum using the built-in `sum` aggregate,
we’d do it like this:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable.functions as pxtf
t.select(pxtf.sum(t.val)).collect()
```
Or perhaps we want to group them by `n // 10` (corresponding to the tens
digit of each integer) and sum each group:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.group_by(t.val // 10).order_by(t.val // 10).select(
    t.val // 10, pxtf.sum(t.val)
).collect()
```
Now let’s define a new aggregate to compute the sum of squares of a set
of numbers. To define an aggregate, we implement a subclass of the
`pxt.Aggregator` Python class and decorate it with the `@pxt.uda`
decorator, similar to what we did for UDFs. The subclass must implement
three methods:
* `__init__()` - initializes the aggregator; can be used to parameterize
aggregator behavior
* `update()` - updates the internal state of the aggregator with a new
value
* `value()` - retrieves the current value held by the aggregator
In our example, the class will have a single member `cur_sum`, which
holds a running total of the squares of all the values we’ve seen.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.uda
class sum_of_squares(pxt.Aggregator):
    def __init__(self):
        # No data yet; initialize `cur_sum` to 0
        self.cur_sum = 0

    def update(self, val: int) -> None:
        # Update the value of `cur_sum` with the new datapoint
        self.cur_sum += val * val

    def value(self) -> int:
        # Retrieve the current value of `cur_sum`
        return self.cur_sum
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(sum_of_squares(t.val)).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.group_by(t.val // 10).order_by(t.val // 10).select(
    t.val // 10, sum_of_squares(t.val)
).collect()
```
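To see how the three methods fit together, the following plain-Python sketch drives an equivalent class by hand over the integers 0 to 49. It assumes one aggregator instance per group, which is a natural reading of the protocol but is not spelled out above; `SumOfSquares` is a stand-in class with no Pixeltable dependency.

```python
from collections import defaultdict


class SumOfSquares:
    """Plain-Python stand-in for the `sum_of_squares` aggregator."""

    def __init__(self) -> None:
        self.cur_sum = 0

    def update(self, val: int) -> None:
        self.cur_sum += val * val

    def value(self) -> int:
        return self.cur_sum


values = range(50)

# Ungrouped: a single aggregator instance sees every row.
agg = SumOfSquares()
for v in values:
    agg.update(v)
total = agg.value()

# Grouped: one aggregator instance per distinct group key (assumption).
groups: dict[int, SumOfSquares] = defaultdict(SumOfSquares)
for v in values:
    groups[v // 10].update(v)
per_group = {key: g.value() for key, g in sorted(groups.items())}

print(total)      # 40425
print(per_group)  # {0: 285, 1: 2185, 2: 6085, 3: 11985, 4: 19885}
```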
# Version Control and Lineage
Source: https://docs.pixeltable.com/platform/version-control
Pixeltable automatically versions every table and column, supports time-travel queries, and tracks full data lineage across pipeline changes.
Pixeltable automatically tracks every change to your tables—data insertions, updates, deletions, and schema modifications. Query any point in history, undo mistakes, and maintain full reproducibility without manual version management.
## How it works
Every operation that modifies a table creates a new version:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Version 0: Table created
products = pxt.create_table('demo/products', {
    'name': pxt.String,
    'price': pxt.Float
})
# Version 1: Data inserted
products.insert([
    {'name': 'Widget', 'price': 9.99},
    {'name': 'Gadget', 'price': 24.99}
])
# Version 2: Schema changed
products.add_computed_column(price_with_tax=products.price * 1.08)
# Version 3: Data updated
products.update({'price': 19.99}, where=products.name == 'Widget')
```
No configuration required—versioning is always on.
## Viewing history
### Human-readable history
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
products.history()
```
Returns a DataFrame showing all versions with timestamps, change types, and row counts:
| version | created\_at | change\_type | inserts | updates | deletes | schema\_change |
| ------- | ------------------- | ------------ | ------- | ------- | ------- | ----------------------- |
| 3 | 2025-01-15 10:30:00 | data | 0 | 1 | 0 | None |
| 2 | 2025-01-15 10:29:00 | schema | 0 | 2 | 0 | Added: price\_with\_tax |
| 1 | 2025-01-15 10:28:00 | data | 2 | 0 | 0 | None |
| 0 | 2025-01-15 10:27:00 | schema | 0 | 0 | 0 | Initial Version |
### Programmatic access
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
versions = products.get_versions() # List of dictionaries
latest = versions[0]
print(f"Version {latest['version']}: {latest['inserts']} inserts")
```
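Because the result is a plain list of dictionaries, ordinary Python is enough to summarize it. The sketch below uses sample data copied from the history table shown earlier; the `version` and `inserts` keys appear in the snippet above, while `change_type`, `updates`, and `deletes` are assumed to mirror the table's column names.

```python
# Sample data copied from the history table; the exact keys returned by
# `get_versions()` are an assumption here.
versions = [
    {'version': 3, 'change_type': 'data', 'inserts': 0, 'updates': 1, 'deletes': 0},
    {'version': 2, 'change_type': 'schema', 'inserts': 0, 'updates': 2, 'deletes': 0},
    {'version': 1, 'change_type': 'data', 'inserts': 2, 'updates': 0, 'deletes': 0},
    {'version': 0, 'change_type': 'schema', 'inserts': 0, 'updates': 0, 'deletes': 0},
]

total_inserts = sum(v['inserts'] for v in versions)
schema_versions = [v['version'] for v in versions if v['change_type'] == 'schema']
# The list is ordered newest first, so the first match is the latest data change.
latest_data_change = next(v['version'] for v in versions if v['change_type'] == 'data')

print(total_inserts)       # 2
print(schema_versions)     # [2, 0]
print(latest_data_change)  # 3
```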
## Time travel queries
Query any historical version using the `table_name:version` syntax:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Get the table at version 1 (before computed column)
products_v1 = pxt.get_table('demo/products:1')
products_v1.collect() # Returns data as it was at version 1
# Compare with current state
products.collect() # Returns current data
```
Version handles are **read-only**—you cannot modify historical data.
### Use cases
* **Debugging**: Compare data before and after a problematic update
* **Auditing**: Track who changed what and when
* **Recovery**: Find and extract accidentally deleted or modified data
* **Reproducibility**: Query exact data used for a specific model training run
## Reverting changes
Undo the most recent change with `revert()`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Oops, wrong update
products.update({'price': 0.00}, where=products.name == 'Widget')
# Undo it
products.revert() # Removes version N, table is now at version N-1
```
`revert()` permanently removes the latest version. This cannot be undone.
You can call `revert()` multiple times to go back further, but cannot revert past version 0 or past a version referenced by a snapshot.
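The two limits can be pictured with a toy model. This is only an illustration of one reading of the rules (a version cannot be removed while a snapshot references it); `ToyVersionedTable` and its methods are hypothetical and say nothing about Pixeltable's internals or its actual error types.

```python
class ToyVersionedTable:
    """Toy model of the revert rules; not Pixeltable's implementation."""

    def __init__(self) -> None:
        self.versions = ['created']  # index in this list == version number
        self.snapshot_versions: set[int] = set()

    @property
    def version(self) -> int:
        return len(self.versions) - 1

    def change(self, description: str) -> None:
        self.versions.append(description)

    def snapshot(self) -> None:
        self.snapshot_versions.add(self.version)

    def revert(self) -> None:
        if self.version == 0:
            raise RuntimeError('cannot revert past version 0')
        if self.version in self.snapshot_versions:
            raise RuntimeError('current version is referenced by a snapshot')
        self.versions.pop()  # the latest version is removed for good


t = ToyVersionedTable()
t.change('insert 2 rows')  # version 1
t.snapshot()               # a snapshot now references version 1
t.change('bad update')     # version 2
t.revert()                 # back to version 1
print(t.version)           # 1
```

Calling `t.revert()` again in this model raises, because version 1 is pinned by the snapshot.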
## Snapshots
Create named, persistent point-in-time copies for long-term preservation:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Freeze current state before a major data update
baseline = pxt.create_snapshot('demo/products_baseline', products)
# Later: source table changes, but snapshot remains unchanged
products.insert([{'name': 'NewItem', 'price': 99.99}])
products.count() # 3 rows (updated)
baseline.count() # 2 rows (frozen)
```
**Snapshots vs Time Travel:**
* Time travel (`pxt.get_table('table:N')`) queries historical versions in place
* Snapshots create a named, independent copy that persists even if the source table is modified or deleted
## Data lineage
Pixeltable tracks the complete lineage of your data:
### Schema lineage
Every computed column records its dependencies:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
products.add_computed_column(
    discounted=products.price * 0.9
)
products.add_computed_column(
    discounted_with_tax=products.discounted * 1.08
)
# Pixeltable knows: discounted_with_tax → discounted → price
```
### View lineage
Views automatically track their source tables:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
expensive = pxt.create_view(
    'demo/expensive_products',
    products.where(products.price > 20)
)
# View lineage: expensive_products → products
```
### What's tracked
| Change Type | Tracked Information |
| ----------------------- | --------------------------------------------------------- |
| `insert()` | Row count, timestamp, computed values generated |
| `update()` | Rows affected, old vs new values (via version comparison) |
| `delete()` | Row count removed |
| `add_column()` | Column name, type, dependencies |
| `add_computed_column()` | Column name, expression, dependencies |
| `drop_column()` | Column removed |
| `rename_column()` | Old name → new name |
## Best practices
Create snapshots before major data loads, model training runs, or production deployments.
Log table version numbers alongside model artifacts: `products.get_versions()[0]['version']`
Use `revert()` immediately after mistakes. For older issues, use time travel to identify the problem.
Use directories like `dev/products`, `staging/products` to isolate versioning across environments.
## Comparison with other systems
| Feature | Pixeltable | Git | Delta Lake |
| ----------------------- | --------------------- | --------------- | ------------------ |
| Automatic versioning | ✅ Every operation | Manual commits | ✅ Every operation |
| Time travel queries | ✅ `table:N` syntax | Checkout commit | ✅ `VERSION AS OF` |
| Schema versioning | ✅ Tracked | File-based | ✅ Schema evolution |
| Computed column lineage | ✅ Automatic | N/A | N/A |
| Revert | ✅ `revert()` | `git revert` | `RESTORE` |
| Named snapshots | ✅ `create_snapshot()` | Tags/branches | N/A |
## Next steps
Step-by-step cookbook with runnable examples
Publish and replicate tables across environments
# Views
Source: https://docs.pixeltable.com/platform/views
Create virtual derived tables in Pixeltable with views that filter, transform, or expand rows without copying underlying data or computations.
# When to Use Views
Views in Pixeltable are best used when you need to:
1. **Transform Data**: When you need to process or reshape data from a base table (e.g., splitting documents into chunks, extracting features from images)
2. **Filter Data**: When you frequently need to work with a specific subset of your data
3. **Create Virtual Tables**: When you want to avoid storing redundant data and automatically keep derived data in sync
4. **Build Data Workflows**: When you need to chain multiple data transformations together
5. **Save Storage**: When you want to compute data on demand rather than storing it permanently
Choose views over tables when your data is derived from other base tables and needs to stay synchronized with its source. Use regular tables when you need to store original data or when the computation cost of deriving data on demand is too high.
## Phase 1: Define your base table and view structure
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.document import document_splitter
# Create a directory to organize data (optional)
pxt.drop_dir('documents', force=True)
pxt.create_dir('documents')
# Define your base table first
documents = pxt.create_table(
    "documents/collection",
    {"document": pxt.Document}
)
# Create a view that splits documents into chunks
chunks = pxt.create_view(
    'documents/chunks',
    documents,
    iterator=document_splitter(
        document=documents.document,
        separators='token_limit',
        limit=300
    )
)
```
## Phase 2: Use your application
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Connect to your base table and view
documents = pxt.get_table("documents/collection")
chunks = pxt.get_table("documents/chunks")
# Insert data into base table - view updates automatically
documents.insert([{
    "document": "path/to/document.pdf"
}])
# Query the view
print(chunks.collect())
```
## View types
Views created using iterators to transform data:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Document splitting view
chunks = pxt.create_view(
    'docs/chunks',
    documents,
    iterator=document_splitter(
        document=documents.document
    )
)
```
Views created from query operations:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Filtered view of high-budget movies
blockbusters = pxt.create_view(
    'movies/blockbusters',
    movies.where(movies.budget >= 100.0)
)
```
## View operations
Query views like regular tables:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Basic filtering on view
chunks.where(chunks.text.contains('specific topic')).collect()
# Select specific columns
chunks.select(chunks.text, chunks.pos).collect()
# Order results
chunks.order_by(chunks.pos).limit(5).collect()
```
Add computed columns to views:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import sentence_transformer

# Add embeddings to chunks
chunks.add_computed_column(
    embedding=sentence_transformer.using(
        model_id='intfloat/e5-large-v2'
    )(chunks.text)
)
```
Create views based on other views:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create a view of embedded chunks
embedded_chunks = pxt.create_view(
    'docs/embedded_chunks',
    chunks.where(chunks.text.len() > 100)
)
```
## Key features
Views automatically update when base tables change
Views compute data on demand, saving storage
Views can be part of larger data workflows
# anthropic
Source: https://docs.pixeltable.com/sdk/latest/anthropic
# module pixeltable.functions.anthropic
Pixeltable UDFs
that wrap various endpoints from the Anthropic API. In order to use them, you must
first `pip install anthropic` and configure your Anthropic credentials, as described in
the [Working with Anthropic](https://docs.pixeltable.com/notebooks/integrations/working-with-anthropic) tutorial.
## func invoke\_tools()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
invoke_tools(
    tools: pixeltable.func.tools.Tools,
    response: pixeltable.exprs.expr.Expr
) -> pixeltable.exprs.inline_expr.InlineDict
```
Converts an Anthropic response dict to Pixeltable tool invocation format and calls `tools._invoke()`.
## udf messages()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
messages(
    messages: pxt.Json[(Json, ...)],
    *,
    model: pxt.String,
    max_tokens: pxt.Int,
    model_kwargs: pxt.Json | None = None,
    tools: pxt.Json[(Json, ...)] | None = None,
    tool_choice: pxt.Json | None = None
) -> pxt.Json
```
Create a Message.
Equivalent to the Anthropic `messages` API endpoint.
For additional details, see: [https://docs.anthropic.com/en/api/messages](https://docs.anthropic.com/en/api/messages)
Request throttling:
Uses the rate limit-related headers returned by the API to throttle requests adaptively, based on available
request and token capacity. No configuration is necessary.
**Requirements:**
* `pip install anthropic`
**Parameters:**
* **`messages`** (`pxt.Json[(Json, ...)]`): Input messages.
* **`model`** (`Any`): The model that will complete your prompt.
* **`model_kwargs`** (`Any`): Additional keyword args for the Anthropic `messages` API.
For details on the available parameters, see: [https://docs.anthropic.com/en/api/messages](https://docs.anthropic.com/en/api/messages)
* **`tools`** (`Any`): An optional list of Pixeltable tools to use for the request.
* **`tool_choice`** (`Any`): An optional tool choice configuration.
**Returns:**
* `pxt.Json`: A dictionary containing the response and other metadata.
**Examples:**
Add a computed column that applies the model `claude-3-5-sonnet-20241022`
to an existing Pixeltable column `tbl.prompt` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
msgs = [{'role': 'user', 'content': tbl.prompt}]
tbl.add_computed_column(
    response=messages(
        msgs,
        model='claude-3-5-sonnet-20241022',
        max_tokens=1024,  # required by the signature; 1024 is an arbitrary example value
    )
)
```
# audio
Source: https://docs.pixeltable.com/sdk/latest/audio
# module pixeltable.functions.audio
Pixeltable UDFs for `AudioType`.
## iterator audio\_splitter()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.iterator
audio_splitter(
    audio: pxt.Audio,
    duration: pxt.Float,
    *,
    overlap: pxt.Float = 0.0,
    min_segment_duration: pxt.Float = 0.0
)
```
Iterator over segments of an audio file. The audio file is split into smaller segments,
where the duration of each segment is determined by `duration`.
If the input contains no audio, no segments are yielded.
**Outputs**:
One row per audio segment, with the following columns:
* `segment_start` (`pxt.Float`): Start time of the audio segment in seconds
* `segment_end` (`pxt.Float`): End time of the audio segment in seconds
* `audio_segment` (`pxt.Audio | None`): The audio content of the segment
**Parameters:**
* **`duration`** (`pxt.Float`): Audio segment duration in seconds
* **`overlap`** (`pxt.Float`): Overlap between consecutive segments in seconds
* **`min_segment_duration`** (`pxt.Float`): Drop the last segment if it is shorter than `min_segment_duration` seconds
**Examples:**
This example assumes an existing table `tbl` with a column `audio` of type `pxt.Audio`.
Create a view that splits all audio files into segments of 30 seconds with 5 seconds overlap:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_view(
    'audio_segments',
    tbl,
    iterator=audio_splitter(tbl.audio, duration=30.0, overlap=5.0),
)
```
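The interaction between `duration`, `overlap`, and `min_segment_duration` is easier to see with numbers. The sketch below computes segment boundaries under one plausible interpretation: each segment starts `duration - overlap` seconds after the previous one, and the final segment is clipped to the end of the file. `segment_bounds` is a hypothetical helper; the boundaries Pixeltable actually produces may differ in detail.

```python
def segment_bounds(
    total: float, duration: float, overlap: float = 0.0, min_segment_duration: float = 0.0
) -> list[tuple[float, float]]:
    """Hypothetical sketch of (start, end) times in seconds for a clip of length `total`."""
    if duration <= overlap:
        raise ValueError('duration must be greater than overlap')
    step = duration - overlap
    bounds = []
    start = 0.0
    while start < total:
        end = min(start + duration, total)
        bounds.append((start, end))
        if end >= total:
            break
        start += step
    # Drop the trailing segment if it is too short
    if bounds and bounds[-1][1] - bounds[-1][0] < min_segment_duration:
        bounds.pop()
    return bounds


print(segment_bounds(70.0, duration=30.0, overlap=5.0))
# [(0.0, 30.0), (25.0, 55.0), (50.0, 70.0)]
print(segment_bounds(60.0, duration=30.0, overlap=5.0, min_segment_duration=15.0))
# [(0.0, 30.0), (25.0, 55.0)]
```

In the second call the last candidate segment would cover only 50 to 60 seconds, so it falls below the 15-second minimum and is dropped. A clip of length 0 yields no segments at all.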
## udf encode\_audio()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
encode_audio(
    audio_data: pxt.Array[float32],
    *,
    input_sample_rate: pxt.Int,
    format: pxt.String,
    output_sample_rate: pxt.Int | None = None
) -> pxt.Audio
```
Encodes an audio clip represented as an array into a specified audio format.
**Parameters:**
* **`audio_data`** (`pxt.Array[float32]`): An array of sampled amplitudes. The accepted array shapes are `(N,)` or `(1, N)` for mono audio
or `(2, N)` for stereo.
* **`input_sample_rate`** (`pxt.Int`): The sample rate of the input audio data.
* **`format`** (`pxt.String`): The desired output audio format. The supported formats are 'wav', 'mp3', 'flac', and 'mp4'.
* **`output_sample_rate`** (`pxt.Int | None`): The desired sample rate for the output audio. Defaults to the input sample rate if
unspecified.
**Examples:**
Add a computed column with encoded FLAC audio files to a table with audio data (as arrays of floats) and sample
rates:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
    audio_file=encode_audio(
        t.audio_data, input_sample_rate=t.sample_rate, format='flac'
    )
)
```
## udf get\_metadata()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
get_metadata(audio: pxt.Audio) -> ContainerMetadata
```
Gets various metadata associated with an audio file and returns it as
a [`ContainerMetadata`](./containermetadata) dictionary.
**Parameters:**
* **`audio`** (`pxt.Audio`): The audio to get metadata for.
**Returns:**
* `ContainerMetadata`: A [`ContainerMetadata`](./containermetadata) with typical structure:
```json theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{
    'size': 2568827,
    'streams': [
        {
            'type': 'audio',
            'frames': 0,
            'duration': 2646000,
            'metadata': {},
            'time_base': 2.2675736961451248e-05,
            'codec_context': {
                'name': 'flac',
                'profile': None,
                'channels': 1,
                'codec_tag': '\x00\x00\x00\x00',
            },
            'duration_seconds': 60.0,
        }
    ],
    'bit_rate': 342510,
    'metadata': {'encoder': 'Lavf61.1.100'},
    'bit_exact': False,
}
```
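The sample values above are consistent with the stream's `duration` being counted in units of `time_base` seconds. The short check below works on a trimmed copy of that sample dictionary (no audio file or Pixeltable call is involved); the variable names are chosen for this example.

```python
import math

# Trimmed copy of the sample metadata shown above.
metadata = {
    'size': 2568827,
    'streams': [
        {
            'type': 'audio',
            'duration': 2646000,
            'time_base': 2.2675736961451248e-05,
            'duration_seconds': 60.0,
        }
    ],
    'bit_rate': 342510,
}

stream = metadata['streams'][0]
ticks_per_second = round(1 / stream['time_base'])
derived_seconds = stream['duration'] * stream['time_base']

print(ticks_per_second)  # 44100
print(math.isclose(derived_seconds, stream['duration_seconds']))  # True
```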
**Examples:**
Extract metadata for files in the `audio_col` column of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.audio_col.get_metadata()).collect()
```
# bedrock
Source: https://docs.pixeltable.com/sdk/latest/bedrock
# module pixeltable.functions.bedrock
Pixeltable UDFs for AWS Bedrock AI models.
In order to use them, you must
first `pip install boto3` and configure your AWS credentials, as described in
the [Working with Bedrock](https://docs.pixeltable.com/howto/providers/working-with-bedrock) tutorial.
## func invoke\_tools()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
invoke_tools(
    tools: pixeltable.func.tools.Tools,
    response: pixeltable.exprs.expr.Expr
) -> pixeltable.exprs.inline_expr.InlineDict
```
Converts a Bedrock response dict to Pixeltable tool invocation format and calls `tools._invoke()`.
## udf converse()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
converse(
    messages: pxt.Json[(Json, ...)],
    *,
    model_id: pxt.String,
    system: pxt.Json[(Json, ...)] | None = None,
    inference_config: pxt.Json | None = None,
    additional_model_request_fields: pxt.Json | None = None,
    tool_config: pxt.Json[(Json, ...)] | None = None
) -> pxt.Json
```
Generate a conversation response.
Equivalent to the AWS Bedrock `converse` API endpoint.
For additional details, see:
[https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime/client/converse.html](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime/client/converse.html)
PIL images and media file paths in `messages[*].content[*].(image|video|audio).source.bytes`
are converted to raw bytes automatically.
**Requirements:**
* `pip install boto3`
**Parameters:**
* **`messages`** (`pxt.Json[(Json, ...)]`): Input messages.
* **`model_id`** (`pxt.String`): The model that will complete your prompt.
* **`system`** (`pxt.Json[(Json, ...)] | None`): An optional system prompt.
* **`inference_config`** (`pxt.Json | None`): Base inference parameters to use.
* **`additional_model_request_fields`** (`pxt.Json | None`): Additional inference parameters to use.
* **`tool_config`** (`pxt.Json[(Json, ...)] | None`): An optional list of Pixeltable tools to use.
**Returns:**
* `pxt.Json`: A dictionary containing the response and other metadata.
**Examples:**
Add a computed column that applies the model `anthropic.claude-3-haiku-20240307-v1:0`
to an existing Pixeltable column `tbl.prompt` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
msgs = [{'role': 'user', 'content': [{'text': tbl.prompt}]}]
tbl.add_computed_column(
response=converse(
msgs, model_id='anthropic.claude-3-haiku-20240307-v1:0'
)
)
```
Pass an image via the Converse API:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
msgs = [
{
'role': 'user',
'content': [
{'image': {'format': 'jpeg', 'source': {'bytes': tbl.image}}},
{'text': "What's in this image?"},
],
}
]
tbl.add_computed_column(
response=converse(msgs, model_id='amazon.nova-lite-v1:0')
)
```
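The return value is described above only as a dictionary containing the response and other metadata. A minimal sketch of pulling the generated text out of such a dictionary, assuming it mirrors the boto3 Converse response layout (`output` → `message` → `content[*].text`); the helper name `converse_text` and the sample dictionary are hypothetical:

```python
def converse_text(response: dict) -> str:
    """Concatenate the text blocks of a Converse-style response dict."""
    content = response.get('output', {}).get('message', {}).get('content', [])
    # Content blocks may carry other payloads (e.g. tool use); keep only text.
    return ''.join(block['text'] for block in content if 'text' in block)


# Hand-written sample in the assumed layout, not real model output.
sample = {
    'output': {
        'message': {
            'role': 'assistant',
            'content': [{'text': 'Hello'}, {'text': ', world'}],
        }
    },
    'stopReason': 'end_turn',
}
print(converse_text(sample))
```

Inside a table, the same path would presumably be expressed by indexing the computed column, along the lines of `tbl.response['output']['message']['content'][0]['text']`.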
## udf embed()
```python Signatures theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Signature 1:
@pxt.udf
embed(
text: pxt.String,
model_id: pxt.String,
dimensions: pxt.Int | None
) -> pxt.Array[(None,), float32]
# Signature 2:
@pxt.udf
embed(
image: pxt.Image,
model_id: pxt.String,
dimensions: pxt.Int | None
) -> pxt.Array[(None,), float32]
```
Generate text or image embeddings using Amazon Titan, Amazon Nova, or Cohere embedding models.
Calls the AWS Bedrock `invoke_model` API for embedding models.
For additional details, see:
[https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html)
[https://docs.aws.amazon.com/nova/latest/userguide/modality-embedding.html](https://docs.aws.amazon.com/nova/latest/userguide/modality-embedding.html)
[https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-embed.html](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-embed.html)
**Requirements:**
* `pip install boto3`
**Parameters:**
* **`text`** (`String`): Input text to embed.
* **`model_id`** (`String`): The embedding model identifier. Supported models:
* `amazon.titan-embed-text-v1`
* `amazon.titan-embed-text-v2:0` (supports `dimensions`: 256, 512, 1024)
* `amazon.nova-2-multimodal-embeddings-v1:0` (supports `dimensions`: 256, 512, 1024, 3072)
* `cohere.embed-english-v3`
* `cohere.embed-multilingual-v3`
* `cohere.embed-v4:0` (supports `dimensions`: 256, 512, 1024, 1536)
* **`dimensions`** (`Int | None`, default: `None`): Output embedding dimensions (model-dependent, optional).
**Returns:**
* `pxt.Array[(None,), float32]`: Embedding vector.
**Examples:**
Create an embedding index on a column `description` with Nova embeddings and custom dimensions:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_embedding_index(
tbl.description,
string_embed=embed.using(
model_id='amazon.nova-2-multimodal-embeddings-v1:0',
dimensions=1024,
),
)
```
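Because the accepted `dimensions` values differ per model, it can help to check a combination before creating an index. The sketch below copies the supported values from the parameter list above; `SUPPORTED_DIMENSIONS` and `check_dimensions` are hypothetical names, not part of the Pixeltable API, and models listed without `dimensions` support are mapped to an empty set:

```python
from typing import Optional

# Supported `dimensions` values, copied from the parameter list above.
# An empty set means no values are listed for that model.
SUPPORTED_DIMENSIONS = {
    'amazon.titan-embed-text-v1': frozenset(),
    'amazon.titan-embed-text-v2:0': frozenset({256, 512, 1024}),
    'amazon.nova-2-multimodal-embeddings-v1:0': frozenset({256, 512, 1024, 3072}),
    'cohere.embed-english-v3': frozenset(),
    'cohere.embed-multilingual-v3': frozenset(),
    'cohere.embed-v4:0': frozenset({256, 512, 1024, 1536}),
}


def check_dimensions(model_id: str, dimensions: Optional[int]) -> None:
    """Raise ValueError if `dimensions` is not listed for `model_id`."""
    if model_id not in SUPPORTED_DIMENSIONS:
        raise ValueError(f'unknown model_id: {model_id!r}')
    if dimensions is None:
        return  # nothing to check; the model default applies
    if dimensions not in SUPPORTED_DIMENSIONS[model_id]:
        raise ValueError(
            f'{model_id!r} does not list dimensions={dimensions} as supported'
        )


check_dimensions('amazon.nova-2-multimodal-embeddings-v1:0', 1024)  # passes
```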
## udf invoke\_model()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
invoke_model(body: pxt.Json, *, model_id: pxt.String) -> pxt.Json
```
Invoke a Bedrock model.
Equivalent to the AWS Bedrock `invoke_model` API endpoint, with automatic routing to
`StartAsyncInvoke` for models that require it (e.g. video generation, audio/video embeddings).
For additional details, see:
[https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime/client/invoke\_model.html](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime/client/invoke_model.html)
PIL images and media file paths anywhere in the request body are converted automatically
to the encoding expected by the target model. For image-generation models, base64-encoded
images in the response are automatically decoded into `PIL.Image` objects.
For models that require async invocation, `bedrock.temp_location` must be configured
(set environment variable `BEDROCK_TEMP_LOCATION` or add `temp_location` to the `[bedrock]` section of
your Pixeltable configuration file).
**Requirements:**
* `pip install boto3`
**Parameters:**
* **`body`** (`pxt.Json`): The prompt and inference parameters as a dictionary.
* **`model_id`** (`pxt.String`): The model identifier to invoke.
**Returns:**
* `pxt.Json`: A dictionary containing the model response. For image-generation models,
image fields are decoded to `PIL.Image` objects. For video-generation models,
returns a `pxt.Video` path.
**Examples:**
Invoke Amazon Titan text embeddings:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
body = {'inputText': tbl.text, 'dimensions': 512, 'normalize': True}
tbl.add_computed_column(
response=invoke_model(body, model_id='amazon.titan-embed-text-v2:0')
)
```
Invoke TwelveLabs Marengo with an image column (note that the image can be included directly
in the invoke body, and will be automatically base64-encoded by Pixeltable):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
body = {
'inputType': 'image',
'image': {'mediaSource': {'base64String': tbl.image}},
}
tbl.add_computed_column(
response=invoke_model(
body, model_id='twelvelabs.marengo-embed-3-0-v1:0'
)
)
```
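For comparison, this is roughly what the manual encoding step looks like when a request body is assembled outside Pixeltable. It is a sketch using only the standard library; `to_base64_string` is a hypothetical helper, and the three-byte payload merely stands in for real image data:

```python
import base64


def to_base64_string(raw: bytes) -> str:
    """Encode raw bytes as an ASCII base64 string."""
    return base64.b64encode(raw).decode('ascii')


# The first three bytes of a JPEG file, used as a tiny stand-in payload.
jpeg_magic = b'\xff\xd8\xff'
encoded = to_base64_string(jpeg_magic)
print(encoded)  # /9j/

# Decoding restores the original bytes.
assert base64.b64decode(encoded) == jpeg_magic
```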
Invoke TwelveLabs Marengo with audio (auto-routes to async via StartAsyncInvoke;
requires `bedrock.temp_location` to be configured):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
body = {
'inputType': 'audio',
'audio': {'mediaSource': {'base64String': tbl.audio}},
}
tbl.add_computed_column(
response=invoke_model(
body, model_id='twelvelabs.marengo-embed-3-0-v1:0'
)
)
```
Invoke Anthropic Claude with an image:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
body = {
'anthropic_version': 'bedrock-2023-05-31',
'max_tokens': 1024,
'messages': [
{
'role': 'user',
'content': [
{
'type': 'image',
'source': {
'type': 'base64',
'media_type': 'image/jpeg',
'data': tbl.image,
},
},
{'type': 'text', 'text': "What's in this image?"},
],
}
],
}
tbl.add_computed_column(
response=invoke_model(
body, model_id='anthropic.claude-3-haiku-20240307-v1:0'
)
)
```
Invoke Amazon Nova Lite with a video column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
body = {
'messages': [
{
'role': 'user',
'content': [
{
'video': {
'format': 'mp4',
'source': {'bytes': tbl.video},
}
},
{'text': 'What happens in this video?'},
],
}
]
}
tbl.add_computed_column(
response=invoke_model(body, model_id='amazon.nova-lite-v1:0')
)
```
Invoke Stability AI for image generation:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
body = {
'prompt': tbl.prompt,
'mode': 'text-to-image',
'aspect_ratio': '1:1',
'output_format': 'jpeg',
}
tbl.add_computed_column(
response=invoke_model(body, model_id='stability.sd3-5-large-v1:0')
)
tbl.add_computed_column(image=tbl.response['images'][0])
```
Invoke Amazon Nova Reel for video generation (auto-routes to async; requires `bedrock.temp_location`):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
body = {
'taskType': 'TEXT_VIDEO',
'textToVideoParams': {'text': tbl.prompt},
'videoGenerationConfig': {'durationSeconds': 6, 'fps': 24},
}
tbl.add_computed_column(
video=invoke_model(body, model_id='amazon.nova-reel-v1:1')
)
```
# bfl
Source: https://docs.pixeltable.com/sdk/latest/bfl
# module pixeltable.functions.bfl
Pixeltable [UDFs](https://docs.pixeltable.com/platform/udfs-in-pixeltable) that wrap the
[Black Forest Labs (BFL)](https://docs.bfl.ai/) FLUX image generation API. To use them,
specify the API key either with the `BFL_API_KEY` environment variable or as `api_key`
in the `bfl` section of the Pixeltable config file.
For more information on FLUX models, see the [BFL documentation](https://docs.bfl.ai/).
## udf edit()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
edit(
prompt: pxt.String,
image: pxt.Image,
*,
model: pxt.String,
reference_images: pxt.Json[(Image, ...)] | None = None,
width: pxt.Int | None = None,
height: pxt.Int | None = None,
seed: pxt.Int | None = None,
safety_tolerance: pxt.Int | None = None,
output_format: pxt.String | None = None,
steps: pxt.Int | None = None,
guidance: pxt.Float | None = None
) -> pxt.Image
```
Edit an image using FLUX models with text prompts and optional reference images.
This UDF wraps the BFL FLUX image editing API. For more information, refer to the official
[API documentation](https://docs.bfl.ai/flux_2/flux2_image_editing).
**Parameters:**
* **`prompt`** (`pxt.String`): Text description of the edit to apply.
* **`image`** (`pxt.Image`): The base image to edit.
* **`model`** (`pxt.String`): The FLUX model to use for editing. See available models at
[https://docs.bfl.ai/](https://docs.bfl.ai/).
* **`reference_images`** (`pxt.Json[(Image, ...)] | None`): Additional reference images (up to 7) for multi-reference editing.
* **`width`** (`pxt.Int | None`): Output width in pixels (multiple of 16). Matches input if not specified.
* **`height`** (`pxt.Int | None`): Output height in pixels (multiple of 16). Matches input if not specified.
* **`seed`** (`pxt.Int | None`): Random seed for reproducible results.
* **`safety_tolerance`** (`pxt.Int | None`): Moderation level from 0 (strict) to 6 (permissive). Default 2.
* **`output_format`** (`pxt.String | None`): Image format, 'jpeg' or 'png'. Default 'jpeg'.
* **`steps`** (`pxt.Int | None`): Number of inference steps (flux-2-flex only, max 50).
* **`guidance`** (`pxt.Float | None`): Guidance scale 1.5-10 (flux-2-flex only). Default 4.5.
**Returns:**
* `pxt.Image`: An edited PIL Image.
**Examples:**
Edit an image to change its background:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
edited=bfl.edit(
'Change the background to a sunset beach',
t.original_image,
model='flux-2-pro',
)
)
```
Multi-reference editing with additional images:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
edited=bfl.edit(
'Combine the person from the first image with the background from the second',
t.person_image,
model='flux-kontext-pro',
reference_images=[t.background_image],
)
)
```
## udf expand()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
expand(
prompt: pxt.String,
image: pxt.Image,
*,
model: pxt.String,
top: pxt.Int = 0,
bottom: pxt.Int = 0,
left: pxt.Int = 0,
right: pxt.Int = 0,
seed: pxt.Int | None = None,
safety_tolerance: pxt.Int | None = None,
output_format: pxt.String | None = None
) -> pxt.Image
```
Expand an image by adding pixels on any side using FLUX Expand models.
Outpaint an image by specifying how many pixels to add to each edge.
The expansion maintains context from the original image.
This UDF wraps the BFL FLUX Expand API. For more information, refer to the official
[API documentation](https://docs.bfl.ai/flux_tools/flux_1_expand).
**Parameters:**
* **`prompt`** (`pxt.String`): Text description to guide the expansion.
* **`image`** (`pxt.Image`): The base image to expand.
* **`model`** (`pxt.String`): The FLUX Expand model to use. See available models at
[https://docs.bfl.ai/](https://docs.bfl.ai/).
* **`top`** (`pxt.Int`): Pixels to add to the top edge.
* **`bottom`** (`pxt.Int`): Pixels to add to the bottom edge.
* **`left`** (`pxt.Int`): Pixels to add to the left edge.
* **`right`** (`pxt.Int`): Pixels to add to the right edge.
* **`seed`** (`pxt.Int | None`): Random seed for reproducible results.
* **`safety_tolerance`** (`pxt.Int | None`): Moderation level from 0 (strict) to 6 (permissive). Default 2.
* **`output_format`** (`pxt.String | None`): Image format, 'jpeg' or 'png'. Default 'jpeg'.
**Returns:**
* `pxt.Image`: An expanded PIL Image.
**Examples:**
Expand an image to create a wider landscape:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
wide=bfl.expand(
'Continue the landscape scenery',
t.original_image,
model='flux-pro-1.0-expand',
left=256,
right=256,
)
)
```
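Since `top`, `bottom`, `left`, and `right` are pixel counts added to each edge, the size of the result can be estimated ahead of time. A small sketch of that arithmetic, assuming the output is exactly the input plus the requested padding; `expanded_size` is a hypothetical helper:

```python
def expanded_size(width, height, *, top=0, bottom=0, left=0, right=0):
    """Return (width, height) after adding the given padding to each edge."""
    for name, value in (('top', top), ('bottom', bottom), ('left', left), ('right', right)):
        if value < 0:
            raise ValueError(f'{name} must be non-negative, got {value}')
    return (width + left + right, height + top + bottom)


# A 1024x768 input padded as in the example above (256 pixels left and right).
print(expanded_size(1024, 768, left=256, right=256))  # (1536, 768)
```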
## udf fill()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
fill(
prompt: pxt.String,
image: pxt.Image,
mask: pxt.Image,
*,
model: pxt.String,
steps: pxt.Int | None = None,
guidance: pxt.Float | None = None,
seed: pxt.Int | None = None,
safety_tolerance: pxt.Int | None = None,
output_format: pxt.String | None = None
) -> pxt.Image
```
Inpaint an image using FLUX Fill models.
Fill specified areas of an image based on a mask and text prompt. The mask can be
a separate image or applied to the alpha channel of the input image.
This UDF wraps the BFL FLUX Fill API. For more information, refer to the official
[API documentation](https://docs.bfl.ai/flux_tools/flux_1_fill).
**Parameters:**
* **`prompt`** (`pxt.String`): Text description of what to fill in the masked area.
* **`image`** (`pxt.Image`): The base image to inpaint.
* **`mask`** (`pxt.Image`): Mask image where white areas indicate regions to fill (black areas preserved).
* **`model`** (`pxt.String`): The FLUX Fill model to use. See available models at
[https://docs.bfl.ai/](https://docs.bfl.ai/).
* **`steps`** (`pxt.Int | None`): Number of inference steps (max 50). Default 50.
* **`guidance`** (`pxt.Float | None`): Guidance scale for generation. Default 30.
* **`seed`** (`pxt.Int | None`): Random seed for reproducible results.
* **`safety_tolerance`** (`pxt.Int | None`): Moderation level from 0 (strict) to 6 (permissive). Default 2.
* **`output_format`** (`pxt.String | None`): Image format, 'jpeg' or 'png'. Default 'jpeg'.
**Returns:**
* `pxt.Image`: An inpainted PIL Image.
**Examples:**
Fill a masked region with generated content:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
filled=bfl.fill(
'A beautiful garden with flowers',
t.original_image,
t.mask_image,
model='flux-pro-1.0-fill',
)
)
```
## udf generate()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
generate(
prompt: pxt.String,
*,
model: pxt.String,
width: pxt.Int | None = None,
height: pxt.Int | None = None,
seed: pxt.Int | None = None,
safety_tolerance: pxt.Int | None = None,
output_format: pxt.String | None = None,
steps: pxt.Int | None = None,
guidance: pxt.Float | None = None
) -> pxt.Image
```
Generate an image from a text prompt using FLUX models.
This UDF wraps the BFL FLUX API endpoints. For more information, refer to the official
[API documentation](https://docs.bfl.ai/flux_2/flux2_text_to_image).
**Parameters:**
* **`prompt`** (`pxt.String`): Text description of the image to generate.
* **`model`** (`pxt.String`): The FLUX model to use. See available models at
[https://docs.bfl.ai/](https://docs.bfl.ai/).
* **`width`** (`pxt.Int | None`): Output width in pixels (multiple of 16). Default 1024.
* **`height`** (`pxt.Int | None`): Output height in pixels (multiple of 16). Default 1024.
* **`seed`** (`pxt.Int | None`): Random seed for reproducible results.
* **`safety_tolerance`** (`pxt.Int | None`): Moderation level from 0 (strict) to 6 (permissive). Default 2.
* **`output_format`** (`pxt.String | None`): Image format, 'jpeg' or 'png'. Default 'jpeg'.
* **`steps`** (`pxt.Int | None`): Number of inference steps (flux-2-flex only, max 50).
* **`guidance`** (`pxt.Float | None`): Guidance scale 1.5-10 (flux-2-flex only). Default 4.5.
**Returns:**
* `pxt.Image`: A generated PIL Image.
**Examples:**
Generate images using default dimensions:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(image=bfl.generate(t.prompt, model='flux-2-pro'))
```
Generate with custom dimensions:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
image=bfl.generate(
t.prompt, model='flux-2-pro', width=1920, height=1080
)
)
```
Generate with specific seed for reproducibility:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
image=bfl.generate(t.prompt, model='flux-2-pro', seed=42)
)
```
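The `width` and `height` parameters are documented as multiples of 16. Values such as 1080 are not, and how the API treats them is not stated here. One cautious option is to snap dimensions before passing them in. The helper below is a hypothetical sketch that rounds down to the nearest multiple of 16:

```python
def snap_down_to_16(value: int) -> int:
    """Round `value` down to the nearest multiple of 16."""
    if value < 16:
        raise ValueError('value must be at least 16')
    return (value // 16) * 16


print(snap_down_to_16(1920))  # 1920 (already a multiple of 16)
print(snap_down_to_16(1080))  # 1072
```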
# CodecContextMetadata
Source: https://docs.pixeltable.com/sdk/latest/codeccontextmetadata
# class pixeltable.functions.CodecContextMetadata
Metadata about a stream's codec.
## attr channels
```
channels: int | None
```
Number of audio channels. Present only for audio streams.
## attr codec\_tag
```
codec_tag: str
```
Four-character codec tag, unicode-escaped.
## attr name
```
name: str
```
Codec name (e.g. `'h264'`, `'aac'`).
## attr pix\_fmt
```
pix_fmt: str | None
```
Pixel format (e.g. `'yuv420p'`). Present only for video streams.
## attr profile
```
profile: str | None
```
Codec profile (e.g. `'High'`, `'LC'`), or `None` if unavailable.
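To illustrate what a unicode-escaped `codec_tag` can look like, the sketch below assumes the escaping corresponds to Python's `unicode_escape` codec (an assumption; the exact mechanism is not documented here). `escape_tag` and `has_codec_tag` are hypothetical helpers:

```python
def escape_tag(tag: str) -> str:
    """Escape non-printable characters in a four-character codec tag."""
    return tag.encode('unicode_escape').decode('ascii')


def has_codec_tag(tag: str) -> bool:
    """Return True if the raw tag contains anything other than NUL characters."""
    return any(ch != '\x00' for ch in tag)


print(escape_tag('avc1'))              # avc1
print(escape_tag('\x00\x00\x00\x00'))  # \x00\x00\x00\x00
print(has_codec_tag('\x00\x00\x00\x00'))  # False
```

A printable tag passes through unchanged, while an all-NUL tag becomes a 16-character string of literal `\x00` sequences.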
# ColumnMetadata
Source: https://docs.pixeltable.com/sdk/latest/columnmetadata
# class pixeltable.ColumnMetadata
Metadata for a column of a Pixeltable table.
## attr comment
```
comment: str | None
```
User-provided column comment.
## attr computed\_with
```
computed_with: str | None
```
Expression used to compute this column; `None` if this is not a computed column.
## attr custom\_metadata
```
custom_metadata: Any
```
User-defined JSON metadata for this column, if any.
## attr defined\_in
```
defined_in: str | None
```
Name of the table where this column was originally defined.
If the current table is a view, then `defined_in` may differ from the current table name.
## attr depends\_on
```
depends_on: list[tuple[str, str]] | None
```
List of dependencies (table name, column name) if this is a computed column, else `None`.
## attr destination
```
destination: str | None
```
An object store reference for computed files, if one is configured.
## attr is\_builtin
```
is_builtin: bool | None
```
`False` if this computed column makes calls to custom UDFs; `None` if this is not a computed column.
## attr is\_computed
```
is_computed: bool
```
`True` if this column is a computed column.
## attr is\_iterator\_col
```
is_iterator_col: bool
```
`True` if this column is produced by an iterator (only applicable to views).
## attr is\_primary\_key
```
is_primary_key: bool
```
`True` if this column is part of the table's primary key.
## attr is\_stored
```
is_stored: bool
```
`True` if this is a stored column; `False` if it is dynamically computed.
## attr media\_validation
```
media_validation: Literal['on_read', 'on_write'] | None
```
The media validation policy for this column. `None` if the type of this column is not a media type.
## attr name
```
name: str
```
The name of the column.
## attr type\_
```
type_: str
```
The type specifier of the column.
## attr version\_added
```
version_added: int
```
The table version when this column was added.
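As an illustration of how these attributes fit together, the sketch below builds a dependency listing from column metadata. It assumes the metadata is available as a mapping from column name to a dictionary with the attributes above; how that mapping is obtained is not shown, and both `dependency_map` and the sample data are hypothetical:

```python
def dependency_map(columns: dict) -> dict:
    """Map each computed column to the 'table.column' names it depends on."""
    result = {}
    for name, md in columns.items():
        if not md['is_computed']:
            continue  # depends_on is None for non-computed columns
        deps = md.get('depends_on') or []
        result[name] = [f'{tbl}.{col}' for tbl, col in deps]
    return result


# Hand-written sample using a subset of the attributes documented above.
columns = {
    'prompt': {'name': 'prompt', 'is_computed': False, 'depends_on': None},
    'response': {
        'name': 'response',
        'is_computed': True,
        'depends_on': [('tbl', 'prompt')],
    },
}
print(dependency_map(columns))  # {'response': ['tbl.prompt']}
```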
# ColumnRef
Source: https://docs.pixeltable.com/sdk/latest/columnref
# class pixeltable.exprs.ColumnRef
A Pixeltable expression that references a column of a table. A `ColumnRef` is created by column access
on a [`Table`](./table), such as `t.col`.
Not thread-safe.
## method embedding()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
embedding(*, idx: str | None = None) -> ColumnRef
```
Return a reference to the values of an embedding index on this column.
If an embedding index is defined on a column, the usual way to use that index is via a
[`similarity()`](./columnref#method-similarity) lookup. Sometimes it is also useful to directly access
the index values (i.e., the embedding vectors themselves). Calling `embedding()` returns a new `ColumnRef`
expression of type `pxt.Array[(dim,), prec]`, where `dim` and `prec` are the dimensionality and precision
of the column's embedding index.
If there is more than one embedding index defined on this column, then the `idx` parameter must be provided to
specify which index to reference. If there is only one index, then `idx` is optional.
**Parameters:**
* **`idx`** (`str | None`): An optional embedding index name. *Required* if there is more than one embedding index defined on this column.
**Returns:**
* `ColumnRef`: A new `ColumnRef` referencing the values of the specified embedding index on this column.
**Raises:**
* `pxt.Error`: If there is no embedding index defined on this column, if `idx` is not provided when there are multiple embedding indices, or if `idx` does not match any embedding index defined on this column.
**Examples:**
All of these examples assume that `t` is a table with an image column `t.image`.
Add an embedding index to `t.image` using the `clip()`
embedding (this only needs to be done once):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import clip
t.add_embedding_index(
t.image, clip.using(model_id='openai/clip-vit-base-patch32')
)
```
Reference the embedding index values directly:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.image, t.image.embedding())
```
## method similarity()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
similarity(
item: Any = None,
*,
string: str | None = None,
image: str | PIL.Image.Image | None = None,
audio: str | None = None,
video: str | None = None,
document: str | None = None,
vector: np.ndarray | None = None,
idx: str | None = None
) -> Expr
```
Return a new expression representing the similarity score between the values of this column and the given
(constant) item. In order for this to work, there must be an embedding index defined on this column that
supports the modality of the given item (string, image, audio, video, document). Similarity will be scored
according to the metric defined by the embedding index.
Exactly one of `string`, `image`, `audio`, `video`, `document`, or `vector` must be provided. The `item`
parameter is deprecated and exists for backward compatibility only.
If `string`, `image`, `audio`, `video`, or `document` is provided, then an embedding vector will be computed
for the given input as defined by the embedding index and used to determine similarity. If `vector` is
provided, then it must be a 1-dimensional array of the same dimensionality as the index, and similarity will
be determined directly against the vector.
The optional `idx` parameter specifies the name of the embedding index to use. If there is more than one
embedding index defined on this column, then `idx` *must* be provided.
**Parameters:**
* **`string`** (`str | None`): A string to compare against the values of this column.
* **`image`** (`str | PIL.Image.Image | None`): An image to compare against the values of this column (either a local file path, a URL, or an
in-memory `PIL.Image.Image`).
* **`audio`** (`str | None`): An audio file to compare against the values of this column (a local file path or a URL).
* **`video`** (`str | None`): A video file to compare against the values of this column (a local file path or a URL).
* **`document`** (`str | None`): A document file to compare against the values of this column (a local file path or a URL).
* **`vector`** (`np.ndarray | None`): A 1-dimensional NumPy array to compare against the values of this column.
* **`idx`** (`str | None`): An optional embedding index name. *Required* if there is more than one embedding index defined on
this column.
* **`item`** (`Any`): **Deprecated** as of version 0.5.7.
**Returns:**
* `Expr`: A new expression representing the similarity score between the values of this column and the given item.
**Examples:**
All of these examples assume that `t` is a table with an image column `t.image`.
Add an embedding index to `t.image` using the `clip()`
embedding (this only needs to be done once):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import clip
t.add_embedding_index(
t.image, clip.using(model_id='openai/clip-vit-base-patch32')
)
```
Do a nearest neighbor search against a string (with `k=5`):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sim = t.image.similarity(string='a photograph of a cat')
t.select(t.image, sim).order_by(sim, asc=False).head(5)
```
Do a nearest neighbor search against an image:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sim = t.image.similarity(image='https://example.com/reference-cat.jpg')
t.select(t.image, sim).order_by(sim, asc=False).head(5)
```
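Conceptually, the `order_by(sim, asc=False).head(5)` pattern above ranks rows by their similarity score and keeps the best five. The following plain-Python sketch shows that ranking on toy vectors. It assumes cosine similarity as the metric (the actual metric is whatever the embedding index defines) and does not reflect how Pixeltable executes the query; `cosine_similarity` and `top_k` are hypothetical helpers:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same dimensionality')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=5):
    """Return the k (key, score) pairs with the highest similarity to `query`."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in rows.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy two-dimensional "embeddings"; real index vectors are much larger.
rows = {'a': [1.0, 0.0], 'b': [0.0, 1.0], 'c': [1.0, 1.0]}
print(top_k([1.0, 0.0], rows, k=2))
```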
# ColumnSpec
Source: https://docs.pixeltable.com/sdk/latest/columnspec
# class pixeltable.types.ColumnSpec
Column specification, a dictionary representation of a column's schema.
Exactly one of `type` or `value` must be included in the dictionary.
## attr comment
```
comment: str
```
Optional comment for the column. Displayed in `.describe()` output.
## attr custom\_metadata
```
custom_metadata: Any
```
User-defined metadata to associate with the column.
## attr destination
```
destination: str | Path
```
Destination for storing computed output files. Only applicable for computed columns.
Can be:
* A local pathname (such as `path/to/outputs/`), or
* The URI of an object store (such as `s3://my-bucket/outputs/`).
## attr media\_validation
```
media_validation: Literal['on_read', 'on_write']
```
When to validate media; `'on_read'` or `'on_write'`.
## attr primary\_key
```
primary_key: bool
```
Whether this column is part of the primary key. Defaults to `False`.
## attr stored
```
stored: bool
```
Whether to store the column data. Defaults to `True`.
## attr type
```
type: type
```
The column type (e.g., `pxt.Image`, `str`). Required unless `value` is specified.
## attr value
```
value: 'exprs.Expr'
```
A Pixeltable expression for computed columns. Mutually exclusive with `type`.
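The rule that exactly one of `type` or `value` must be present can be checked before a schema is used. The sketch below works on plain dictionaries; `check_column_spec` is a hypothetical helper, and a string stands in for the Pixeltable expression that `value` would normally hold:

```python
def check_column_spec(name: str, spec: dict) -> None:
    """Raise ValueError unless exactly one of 'type' or 'value' is present."""
    has_type = 'type' in spec
    has_value = 'value' in spec
    if has_type == has_value:  # both present or both missing
        raise ValueError(
            f"column {name!r}: exactly one of 'type' or 'value' is required"
        )


schema = {
    'title': {'type': str, 'primary_key': True, 'comment': 'Document title'},
    # Placeholder string; a real spec would hold a Pixeltable expression here.
    'title_upper': {'value': '<expression>', 'stored': False},
}
for column_name, column_spec in schema.items():
    check_column_spec(column_name, column_spec)
```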
# ContainerMetadata
Source: https://docs.pixeltable.com/sdk/latest/containermetadata
# class pixeltable.functions.ContainerMetadata
Metadata for a media container, as returned by
[`audio.get_metadata()`](./audio#func-get_metadata)
or [`video.get_metadata()`](./video#func-get_metadata).
## attr bit\_exact
```
bit_exact: bool
```
Whether the container was opened in bit-exact mode.
## attr bit\_rate
```
bit_rate: int | None
```
Overall bit rate of the container in bits per second, or `None` if unknown.
## attr metadata
```
metadata: dict[str, str]
```
Additional container-level metadata tags (e.g. title, encoder).
## attr size
```
size: int | None
```
Size of the container in bytes, or `None` if unknown.
## attr streams
```
streams: list[StreamMetadata]
```
Per-stream metadata for each stream in the container.
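A short sketch of reading such a structure once it has been collected as a plain dictionary. The per-stream keys `codec_context` and `duration_seconds` are taken from the sample `get_metadata()` output rather than from a documented schema, `summarize` is a hypothetical helper, and the size figure is only a rough estimate derived from `bit_rate`:

```python
def summarize(container: dict) -> dict:
    """Summarize codecs, duration, and an estimated size for a container dict."""
    streams = container['streams']
    # Use the longest stream as the container duration.
    duration = max((s['duration_seconds'] for s in streams), default=0.0)
    bit_rate = container.get('bit_rate')
    # bits per second * seconds / 8 = bytes (an estimate, not the file size).
    estimated = None if bit_rate is None else int(bit_rate * duration / 8)
    return {
        'codecs': [s['codec_context']['name'] for s in streams],
        'duration_seconds': duration,
        'estimated_bytes': estimated,
    }


# Values mirror the sample audio metadata output; the structure is abbreviated.
sample = {
    'streams': [
        {
            'codec_context': {'name': 'flac', 'profile': None, 'channels': 1},
            'duration_seconds': 60.0,
        }
    ],
    'bit_rate': 342510,
    'metadata': {'encoder': 'Lavf61.1.100'},
    'bit_exact': False,
}
print(summarize(sample))
```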
# date
Source: https://docs.pixeltable.com/sdk/latest/date
# module pixeltable.functions.date
Pixeltable UDFs for `DateType`.
Usage example:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
t = pxt.get_table(...)
t.select(t.date_col.year, t.date_col.weekday()).collect()
```
## udf add\_days()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
add_days(self: pxt.Date, n: pxt.Int) -> pxt.Date
```
Add `n` days to the date.
Equivalent to [`date + timedelta(days=n)`](https://docs.python.org/3/library/datetime.html#datetime.timedelta).
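The standard-library behavior this UDF is described as matching can be checked directly. In the sketch below, `add_days` is a local stand-in written with `datetime`, not the Pixeltable UDF itself:

```python
from datetime import date, timedelta


def add_days(d: date, n: int) -> date:
    """Plain-Python equivalent: shift a date by n days (n may be negative)."""
    return d + timedelta(days=n)


print(add_days(date(2024, 2, 28), 2))   # 2024-03-01 (2024 is a leap year)
print(add_days(date(2024, 1, 1), -1))   # 2023-12-31
```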
## udf day()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
day(self: pxt.Date) -> pxt.Int
```
Return the day of the month, between 1 and the number of days in the given month of the given year.
Equivalent to [`date.day`](https://docs.python.org/3/library/datetime.html#datetime.date.day).
## udf isocalendar()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
isocalendar(self: pxt.Date) -> pxt.Json
```
Return a dictionary with three entries: `'year'`, `'week'`, and `'weekday'`.
Equivalent to
[`date.isocalendar()`](https://docs.python.org/3/library/datetime.html#datetime.date.isocalendar).
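A sketch of the dictionary shape described above, built from the standard-library call. `isocalendar_dict` is a hypothetical local helper, not the UDF:

```python
from datetime import date


def isocalendar_dict(d: date) -> dict:
    """Return the ISO year, week number, and weekday as a dictionary."""
    iso_year, iso_week, iso_weekday = d.isocalendar()
    return {'year': iso_year, 'week': iso_week, 'weekday': iso_weekday}


# The ISO year can differ from the calendar year near year boundaries.
print(isocalendar_dict(date(2021, 1, 3)))  # {'year': 2020, 'week': 53, 'weekday': 7}
```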
## udf isoformat()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
isoformat(
self: pxt.Date,
sep: pxt.String = 'T',
timespec: pxt.String = 'auto'
) -> pxt.String
```
Return a string representing the date and time in ISO 8601 format.
Equivalent to [`date.isoformat()`](https://docs.python.org/3/library/datetime.html#datetime.date.isoformat).
**Parameters:**
* **`sep`** (`pxt.String`): Separator between date and time.
* **`timespec`** (`pxt.String`): The number of additional terms in the output. See the
[`date.isoformat()`](https://docs.python.org/3/library/datetime.html#datetime.date.isoformat)
documentation for more details.
## udf isoweekday()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
isoweekday(self: pxt.Date) -> pxt.Int
```
Return the day of the week as an integer, where Monday is 1 and Sunday is 7.
Equivalent to [`date.isoweekday()`](https://docs.python.org/3/library/datetime.html#datetime.date.isoweekday).
## udf make\_date()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
make_date(year: pxt.Int, month: pxt.Int, day: pxt.Int) -> pxt.Date
```
Create a date.
Equivalent to [`date()`](https://docs.python.org/3/library/datetime.html#datetime.date).
## udf month()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
month(self: pxt.Date) -> pxt.Int
```
Return the month, between 1 and 12 inclusive.
Equivalent to [`date.month`](https://docs.python.org/3/library/datetime.html#datetime.date.month).
## udf strftime()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
strftime(self: pxt.Date, format: pxt.String) -> pxt.String
```
Return a string representing the date and time, controlled by an explicit format string.
Equivalent to [`date.strftime()`](https://docs.python.org/3/library/datetime.html#datetime.date.strftime).
**Parameters:**
* **`format`** (`pxt.String`): The format string to control the output. For a complete list of formatting directives, see
[`strftime()` and `strptime()` Behavior](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior).
## udf toordinal()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
toordinal(self: pxt.Date) -> pxt.Int
```
Return the proleptic Gregorian ordinal of the date, where January 1 of year 1 has ordinal 1.
Equivalent to [`date.toordinal()`](https://docs.python.org/3/library/datetime.html#datetime.date.toordinal).
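Ordinals are convenient for day arithmetic because consecutive dates map to consecutive integers. A standard-library sketch, with `days_between` as a hypothetical helper:

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    """Number of days from `start` to `end`, computed via ordinals."""
    return end.toordinal() - start.toordinal()


print(date(1, 1, 1).toordinal())                         # 1
print(days_between(date(2024, 2, 1), date(2024, 3, 1)))  # 29 (leap-year February)
```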
## udf weekday()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
weekday(self: pxt.Date) -> pxt.Int
```
Return the day of the week as an integer, where Monday is 0 and Sunday is 6.
Equivalent to [`date.weekday()`](https://docs.python.org/3/library/datetime.html#datetime.date.weekday).
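The two weekday conventions differ only in their starting offset, which the standard-library equivalents make easy to see. In this sketch, `is_weekend` is a hypothetical helper built on the zero-based convention:

```python
from datetime import date


def is_weekend(d: date) -> bool:
    """True for Saturday (5) and Sunday (6) in the zero-based convention."""
    return d.weekday() >= 5


sunday = date(2024, 1, 7)
print(sunday.weekday())     # 6
print(sunday.isoweekday())  # 7
print(is_weekend(sunday))   # True
```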
## udf year()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
year(self: pxt.Date) -> pxt.Int
```
Return the year, between 1 and 9999 inclusive (that is, between
[`MINYEAR`](https://docs.python.org/3/library/datetime.html#datetime.MINYEAR) and
[`MAXYEAR`](https://docs.python.org/3/library/datetime.html#datetime.MAXYEAR) as defined by the Python `datetime`
library).
Equivalent to [`date.year`](https://docs.python.org/3/library/datetime.html#datetime.date.year).
# deepseek
Source: https://docs.pixeltable.com/sdk/latest/deepseek
# module pixeltable.functions.deepseek
Pixeltable UDFs for Deepseek AI models.
Provides integration with Deepseek's language models for chat completions
and other AI capabilities.
## udf chat\_completions()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
chat_completions(
messages: pxt.Json[(Json, ...)],
*,
model: pxt.String,
model_kwargs: pxt.Json | None = None,
tools: pxt.Json[(Json, ...)] | None = None,
tool_choice: pxt.Json | None = None
) -> pxt.Json
```
Creates a model response for the given chat conversation.
Equivalent to the Deepseek `chat/completions` API endpoint.
For additional details, see: [https://api-docs.deepseek.com/api/create-chat-completion](https://api-docs.deepseek.com/api/create-chat-completion)
Deepseek uses the OpenAI SDK, so you will need to install the `openai` package to use this UDF.
Request throttling:
Applies the rate limit set in the config (section `deepseek`, key `rate_limit`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install openai`
**Parameters:**
* **`messages`** (`pxt.Json[(Json, ...)]`): A list of messages to use for chat completion, as described in the Deepseek API documentation.
* **`model`** (`pxt.String`): The model to use for chat completion.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the Deepseek `chat/completions` API.
For details on the available parameters, see: [https://api-docs.deepseek.com/api/create-chat-completion](https://api-docs.deepseek.com/api/create-chat-completion)
* **`tools`** (`pxt.Json[(Json, ...)] | None`): An optional list of Pixeltable tools to use for the request.
* **`tool_choice`** (`pxt.Json | None`): An optional tool choice configuration.
**Returns:**
* `pxt.Json`: A dictionary containing the response and other metadata.
**Examples:**
Add a computed column that applies the model `deepseek-chat` to an existing Pixeltable column `tbl.prompt`
of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
messages = [
{'role': 'system', 'content': 'You are a helpful assistant.'},
{'role': 'user', 'content': tbl.prompt},
]
tbl.add_computed_column(
response=chat_completions(messages, model='deepseek-chat')
)
```
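The computed column holds the whole response as JSON. Assuming the response follows the OpenAI-compatible layout described in the Deepseek API documentation (a `choices` list whose entries carry a `message`), the reply text can be extracted as sketched below. `sample_response` is a hand-written placeholder and `first_reply` is a hypothetical helper; neither is part of Pixeltable.

```python
from typing import Any, Optional

# Hand-written placeholder that mimics an OpenAI-compatible chat completion.
# Real responses contain additional fields.
sample_response: dict[str, Any] = {
    'choices': [
        {
            'index': 0,
            'message': {'role': 'assistant', 'content': 'Hello!'},
            'finish_reason': 'stop',
        }
    ],
}


def first_reply(response: dict[str, Any]) -> Optional[str]:
    """Return the text of the first choice, or None if there are no choices."""
    choices = response.get('choices') or []
    if not choices:
        return None
    return choices[0].get('message', {}).get('content')


reply = first_reply(sample_response)
```

Under the same assumption, the equivalent JSON path on the computed column would be `tbl.response['choices'][0]['message']['content']`.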
# DirContents
Source: https://docs.pixeltable.com/sdk/latest/dircontents
# class pixeltable.DirContents
Represents the contents of a Pixeltable directory.
## attr dirs
```
dirs: list[str]
```
List of directory paths contained in this directory.
## attr tables
```
tables: list[str]
```
List of table paths contained in this directory.
# DirectoryNode
Source: https://docs.pixeltable.com/sdk/latest/directorynode
# class pixeltable.DirectoryNode
A directory entry in a [`TreeNode`](./treenode) tree.
## attr entries
```
entries: list['TreeNode']
```
⚠️ **No documentation**
## attr kind
```
kind: Literal['directory']
```
⚠️ **No documentation**
## attr name
```
name: str
```
⚠️ **No documentation**
## attr path
```
path: str
```
⚠️ **No documentation**
# document
Source: https://docs.pixeltable.com/sdk/latest/document
# module pixeltable.functions.document
Pixeltable UDFs for `DocumentType`.
## iterator document\_splitter()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.iterator
document_splitter(
document: pxt.Document,
separators: pxt.String,
*,
elements: pxt.Json[(String, ...)] | None = None,
limit: pxt.Int | None = None,
overlap: pxt.Int | None = None,
metadata: pxt.String = '',
skip_tags: pxt.Json[(String, ...)] | None = None,
spacy_model: pxt.String = 'en_core_web_sm',
tiktoken_encoding: pxt.String | None = 'cl100k_base',
tiktoken_target_model: pxt.String | None = None,
image_dpi: pxt.Int = 300,
image_format: pxt.String = 'png'
)
```
Iterator over chunks of a document. The document is chunked according to the specified `separators`.
Chunked text will be cleaned with `ftfy.fix_text` to fix up common problems with unicode sequences.
**Outputs**:
One row per chunk, with the following columns, depending on the specified `elements` and `metadata`:
* `text` (`pxt.String`): The text of the chunk. Present if `'text'` is specified in `elements`.
* `image` (`pxt.Image`): The image extracted from the chunk. Present if `'image'` is specified in `elements`.
* `title` (`pxt.String | None`): The document title. Present if `'title'` is specified in `metadata`.
* `heading` (`pxt.Json | None`): The heading hierarchy at the start of the chunk (HTML and Markdown only).
Present if `'heading'` is specified in `metadata`.
* `sourceline` (`pxt.Int | None`): The source line number of the start of the chunk (HTML only).
Present if `'sourceline'` is specified in `metadata`.
* `page` (`pxt.Int | None`): The page number of the chunk (PDF only). Present if `'page'` is specified in
`metadata`.
* `bounding_box` (`pxt.Json | None`): The bounding box of the chunk on the page, as an `{x1, y1, x2, y2}`
dictionary (PDF only). Present if `'bounding_box'` is specified in `metadata`.
**Parameters:**
* **`separators`** (`pxt.String`): separators to use to chunk the document. Options are:
`'heading'`, `'paragraph'`, `'sentence'`, `'token_limit'`, `'char_limit'`, `'page'`.
This may be a comma-separated string, e.g., `'heading,token_limit'`.
* **`elements`** (`pxt.Json[(String, ...)] | None`): list of elements to extract from the document. Options are:
`'text'`, `'image'`. Defaults to `['text']` if not specified. The `'image'` element is only supported
for the `'page'` separator on PDF documents.
* **`limit`** (`pxt.Int | None`): the maximum number of tokens or characters in each chunk, if `'token_limit'`
or `'char_limit'` is specified.
* **`metadata`** (`pxt.String`): additional metadata fields to include in the output. Options are:
`'title'`, `'heading'` (HTML and Markdown), `'sourceline'` (HTML), `'page'` (PDF), `'bounding_box'`
(PDF). The input may be a comma-separated string, e.g., `'title,heading,sourceline'`.
* **`skip_tags`** (`pxt.Json[(String, ...)] | None`): list of HTML tags to skip when processing HTML documents.
* **`spacy_model`** (`pxt.String`): Name of the spaCy model to use for sentence segmentation. This parameter is ignored unless
the `'sentence'` separator is specified.
* **`tiktoken_encoding`** (`pxt.String | None`): Name of the tiktoken encoding to use when counting tokens. This parameter is ignored
unless the `'token_limit'` separator is specified.
* **`tiktoken_target_model`** (`pxt.String | None`): Name of the target model to use when counting tokens with tiktoken. If specified,
this parameter overrides `tiktoken_encoding`. This parameter is ignored unless the `'token_limit'`
separator is specified.
* **`image_dpi`** (`pxt.Int`): DPI to use when extracting images from PDFs. Defaults to 300.
* **`image_format`** (`pxt.String`): format to use when extracting images from PDFs. Defaults to `'png'`.
**Examples:**
All these examples assume an existing table `tbl` with a column `doc` of type `pxt.Document`.
Create a view that splits all documents into chunks of up to 300 tokens:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_view(
'chunks',
tbl,
iterator=document_splitter(
tbl.doc, separators='token_limit', limit=300
),
)
```
Create a view that splits all documents along sentence boundaries, including title and heading metadata:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_view(
'sentence_chunks',
tbl,
iterator=document_splitter(
tbl.doc, separators='sentence', metadata='title,heading'
),
)
```
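For intuition about what a character limit with overlapping chunks means, here is a minimal plain-Python sketch of fixed-size chunking. It is not Pixeltable's implementation: `chunk_by_chars` is a hypothetical helper, and treating `overlap` as the number of characters shared by consecutive chunks is an assumption, since that parameter is not described in this section.

```python
def chunk_by_chars(text: str, limit: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Consecutive chunks share `overlap` characters (assumed semantics).
    """
    if limit <= 0:
        raise ValueError('limit must be positive')
    if not 0 <= overlap < limit:
        raise ValueError('overlap must satisfy 0 <= overlap < limit')
    step = limit - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + limit])
        if start + limit >= len(text):
            break  # the current chunk already reaches the end of the text
        start += step
    return chunks


chunks = chunk_by_chars('abcdefghij', limit=4, overlap=1)
```

With these inputs each chunk starts on the last character of the previous one, which is the usual reason for using an overlap: text near a chunk boundary appears in both neighbours.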
# Expr
Source: https://docs.pixeltable.com/sdk/latest/expr
# class pixeltable.exprs.Expr
A Pixeltable expression.
All Pixeltable expressions, including [column references](./columnref) (such as `t.col`),
UDF calls (`t.my_string.lower()`), and compound expressions (`t.col + 5`) are instances of this class.
Not thread-safe: Expr and subclasses contain execution state and are never thread-safe.
## method astype()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
astype(new_type: ts.ColumnType | type | _AnnotatedAlias) -> exprs.TypeCast
```
Return a new expression that casts this expression to a different type.
This represents a type *cast*, not a type *coercion*, so it will not mutate the underlying data; it simply
changes the static type given by the expression.
**Parameters:**
* **`new_type`** (`ts.ColumnType | type | _AnnotatedAlias`): The type to cast to.
**Examples:**
Given an existing column `t.json_col` of type `pxt.Json`, cast that column to type `pxt.String`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.json_col.astype(pxt.String)
```
This will assume that all values in `t.json_col` are *actually* strings, and it will result in an error
at runtime if there are values in `t.json_col` that are not strings. It will *not* convert those values to
a string representation. (For that, use the `dumps()` UDF from `pixeltable.functions.json` instead.)
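The difference between a cast and a conversion has a loose analogue in plain Python, sketched below with `typing.cast` and `json.dumps`. This is only an analogy: unlike `astype()`, `typing.cast` performs no runtime check at all, so it illustrates the "no conversion happens" half of the behavior and nothing more.

```python
import json
from typing import cast

value = {'a': 1}  # stand-in for a JSON value that is not a string

# A cast changes only the declared type; the object itself is untouched.
casted = cast(str, value)

# A conversion builds a new string representation of the value.
converted = json.dumps(value)
```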
## method isin()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
isin(value_set: Any) -> exprs.InPredicate
```
Return a new, boolean-valued expression that is `True` whenever this expression is in `value_set`.
**Parameters:**
* **`value_set`** (`Any`): Either another expression that evaluates to a set of values, or a constant collection of
values. If the latter, can be any `Iterable`.
**Examples:**
These examples assume that `t` is a table with a column `int_col` of type `pxt.Int`, and another column
`list_col` of type `pxt.Json`, containing lists of integers.
Select all rows where `int_col` is in the constant set `{1, 3, 22}`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.where(t.int_col.isin({1, 3, 22})).select()
```
Select all rows where `int_col` is in the set of values in that row's `list_col`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.where(t.int_col.isin(t.list_col)).select()
```
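The two forms of `isin()` can be modelled in plain Python as a per-row membership test. The sketch below uses a list of dictionaries as a stand-in for table rows; it illustrates the semantics only and says nothing about how Pixeltable evaluates the predicate.

```python
# Stand-in rows for a table with columns int_col and list_col.
rows = [
    {'int_col': 1, 'list_col': [1, 2]},
    {'int_col': 3, 'list_col': [4, 5]},
    {'int_col': 7, 'list_col': [7]},
]

# Constant set: the same collection is tested for every row.
in_constant_set = [r['int_col'] for r in rows if r['int_col'] in {1, 3, 22}]

# Expression: each row is tested against its own list_col value.
in_own_list = [r['int_col'] for r in rows if r['int_col'] in r['list_col']]
```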
# fabric
Source: https://docs.pixeltable.com/sdk/latest/fabric
# module pixeltable.functions.fabric
Pixeltable UDFs that wrap Azure OpenAI endpoints via Microsoft Fabric.
These functions provide seamless access to Azure OpenAI models within Microsoft Fabric
notebook environments. Authentication and endpoint discovery are handled automatically
using Fabric's built-in service discovery and token utilities.
**Note:** These functions only work within Microsoft Fabric notebook environments.
For more information on Fabric AI services, see:
[https://learn.microsoft.com/en-us/fabric/data-science/ai-services/ai-services-overview](https://learn.microsoft.com/en-us/fabric/data-science/ai-services/ai-services-overview)
## udf chat\_completions()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
chat_completions(
messages: pxt.Json[(Json, ...)],
*,
model: pxt.String,
api_version: pxt.String | None = None,
model_kwargs: pxt.Json | None = None
) -> pxt.Json
```
Creates a model response for the given chat conversation using Azure OpenAI in Fabric.
Equivalent to the Azure OpenAI `chat/completions` API endpoint.
For additional details, see: [https://learn.microsoft.com/en-us/azure/ai-services/openai/reference](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference)
**Automatic authentication:** Authentication is handled automatically in Fabric notebooks using
token-based authentication. No API keys are required.
**Supported models in Fabric:**
* `gpt-5` (reasoning model)
* `gpt-4.1`
* `gpt-4.1-mini`
Request throttling:
Applies the rate limit set in the config (section `fabric.rate_limits`, key `chat`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* Microsoft Fabric notebook environment
* `synapse-ml-fabric` package (pre-installed in Fabric)
**Parameters:**
* **`messages`** (`pxt.Json[(Json, ...)]`): A list of message dicts with `'role'` and `'content'` keys, as described in the
Azure OpenAI API documentation.
* **`model`** (`pxt.String`): The deployment name to use (e.g., `'gpt-5'`, `'gpt-4.1'`, `'gpt-4.1-mini'`).
* **`api_version`** (`pxt.String | None`): Optional API version override. If not specified, defaults to `'2025-04-01-preview'`
for reasoning models (gpt-5) and `'2024-02-15-preview'` for standard models.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the Azure OpenAI `chat/completions` API.
For details on available parameters, see:
[https://learn.microsoft.com/en-us/azure/ai-services/openai/reference](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference)
**Note:** Reasoning models (gpt-5) use `max_completion_tokens` instead of `max_tokens`
and do not support the `temperature` parameter.
**Returns:**
* `pxt.Json`: A dictionary containing the response and other metadata.
**Examples:**
Add a computed column that applies the model `gpt-4.1` to an existing Pixeltable column
`tbl.prompt` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import fabric
messages = [
{'role': 'system', 'content': 'You are a helpful assistant.'},
{'role': 'user', 'content': tbl.prompt},
]
tbl.add_computed_column(
response=fabric.chat_completions(messages, model='gpt-4.1')
)
```
Using a reasoning model (gpt-5):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
reasoning_response=fabric.chat_completions(
messages,
model='gpt-5',
model_kwargs={'max_completion_tokens': 5000},
)
)
```
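The rules for reasoning models described above (a different default API version, `max_completion_tokens` in place of `max_tokens`, and no `temperature`) can be captured in a small helper when the model name is chosen at runtime. Both functions below are hypothetical conveniences written for this sketch, not part of `pixeltable.functions.fabric`, and `REASONING_MODELS` lists only the reasoning model named in this section.

```python
from typing import Any, Optional

# Assumption: gpt-5 is the only reasoning model, as listed in this section.
REASONING_MODELS = {'gpt-5'}


def default_api_version(model: str) -> str:
    """Mirror the documented defaults for the api_version parameter."""
    if model in REASONING_MODELS:
        return '2025-04-01-preview'
    return '2024-02-15-preview'


def build_model_kwargs(
    model: str, max_output_tokens: int, temperature: Optional[float] = None
) -> dict[str, Any]:
    """Build model_kwargs that respect the reasoning-model restrictions."""
    if model in REASONING_MODELS:
        # Reasoning models take max_completion_tokens and no temperature.
        return {'max_completion_tokens': max_output_tokens}
    kwargs: dict[str, Any] = {'max_tokens': max_output_tokens}
    if temperature is not None:
        kwargs['temperature'] = temperature
    return kwargs


reasoning_kwargs = build_model_kwargs('gpt-5', 5000, temperature=0.2)
standard_kwargs = build_model_kwargs('gpt-4.1', 1000, temperature=0.2)
```

The resulting dictionary would then be passed as `model_kwargs=` in the calls shown in the examples.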
## udf embeddings()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
embeddings(
input: pxt.String,
*,
model: pxt.String,
api_version: pxt.String = '2024-02-15-preview',
model_kwargs: pxt.Json | None = None
) -> pxt.Array[(None,), float32]
```
Creates an embedding vector representing the input text using Azure OpenAI in Fabric.
Equivalent to the Azure OpenAI `embeddings` API endpoint.
For additional details, see: [https://learn.microsoft.com/en-us/azure/ai-services/openai/reference](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference)
**Automatic authentication:** Authentication is handled automatically in Fabric notebooks using
token-based authentication. No API keys are required.
**Supported models in Fabric:**
* `text-embedding-ada-002`
* `text-embedding-3-small`
* `text-embedding-3-large`
Request throttling:
Applies the rate limit set in the config (section `fabric.rate_limits`, key `embeddings`). If no rate
limit is configured, uses a default of 600 RPM. Batches up to 32 inputs per request for efficiency.
**Requirements:**
* Microsoft Fabric notebook environment
* `synapse-ml-fabric` package (pre-installed in Fabric)
**Parameters:**
* **`input`** (`pxt.String`): The text to embed (automatically batched).
* **`model`** (`pxt.String`): The embedding model deployment name
* **`api_version`** (`pxt.String`): The API version to use (default: '2024-02-15-preview').
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the Azure OpenAI `embeddings` API.
For details on available parameters, see:
[https://learn.microsoft.com/en-us/azure/ai-services/openai/reference](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference)
**Returns:**
* `pxt.Array[(None,), float32]`: An array representing the embedding vector for the input text.
**Examples:**
Add a computed column that applies the model `text-embedding-3-small` to an existing
Pixeltable column `tbl.text` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions import fabric
tbl.add_computed_column(
embed=fabric.embeddings(tbl.text, model='text-embedding-3-small')
)
```
Add an embedding index to an existing column `text`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_embedding_index(
'text',
embedding=fabric.embeddings.using(model='text-embedding-3-large'),
)
```
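Batching happens inside the UDF, so callers pass one string per row and never build batches themselves. To make the "up to 32 inputs per request" figure concrete, the sketch below shows how a list of inputs divides into requests of that size. `batched` is a hypothetical helper for illustration, not Pixeltable's batching code.

```python
from typing import Iterator, Sequence


def batched(inputs: Sequence[str], batch_size: int = 32) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size inputs."""
    for start in range(0, len(inputs), batch_size):
        yield inputs[start:start + batch_size]


texts = [f'text {i}' for i in range(70)]

# 70 inputs split into two full batches and one partial batch.
batch_sizes = [len(batch) for batch in batched(texts)]
```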
# fal
Source: https://docs.pixeltable.com/sdk/latest/fal
# module pixeltable.functions.fal
Pixeltable UDFs
that wrap various endpoints from the fal.ai API. In order to use them, you must
first `pip install fal-client` and configure your fal.ai credentials, as described in
the [Working with fal.ai](https://docs.pixeltable.com/notebooks/integrations/working-with-fal) tutorial.
## udf run()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
run(input: pxt.Json, *, app: pxt.String) -> pxt.Json
```
Run a model on fal.ai.
Uses fal's queue-based subscribe mechanism for reliable execution.
For additional details, see: [https://fal.ai/docs](https://fal.ai/docs)
Request throttling:
Applies the rate limit set in the config (section `fal`, key `rate_limit`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install fal-client`
**Parameters:**
* **`input`** (`pxt.Json`): The input parameters for the model.
* **`app`** (`pxt.String`): The name or ID of the fal.ai application to run (e.g., 'fal-ai/flux/schnell').
**Returns:**
* `pxt.Json`: The output of the model as a JSON object.
**Examples:**
Add a computed column that applies the model `fal-ai/flux/schnell`
to an existing Pixeltable column `tbl.prompt` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
input = {'prompt': tbl.prompt}
tbl.add_computed_column(response=run(input, app='fal-ai/flux/schnell'))
```
Add a computed column that uses the model `fal-ai/fast-sdxl`
to generate images from an existing Pixeltable column `tbl.prompt`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
input = {
'prompt': tbl.prompt,
'image_size': 'square',
'num_inference_steps': 25,
}
tbl.add_computed_column(response=run(input, app='fal-ai/fast-sdxl'))
tbl.add_computed_column(
image=tbl.response['images'][0]['url'].astype(pxt.Image)
)
```
# FastAPIRouter
Source: https://docs.pixeltable.com/sdk/latest/fastapirouter
# class pixeltable.serving.FastAPIRouter
A FastAPI `APIRouter` that exposes Pixeltable table operations as HTTP endpoints.
`FastAPIRouter` is for apps that already have a FastAPI server. If you do
not have one, use `pxt serve` from the CLI; Pixeltable creates and runs the
FastAPI app for you. Learn more here: [HTTP Serving](https://docs.pixeltable.com/howto/deployment/serving).
## method add\_delete\_route()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
add_delete_route(
t: pxt.Table,
*,
path: str,
match_columns: list[str] | None = None,
background: bool = False
) -> None
```
Add a POST endpoint that deletes rows from `t` matching the given match column values.
The request body contains the match column values as JSON fields. The endpoint deletes every row
where each match column equals the provided value, and returns the number of rows affected.
**Parameters:**
* **`t`** (`pxt.Table`): The table to delete from.
* **`path`** (`str`): The URL path for the endpoint.
* **`match_columns`** (`list[str] | None`): Columns to match on (AND-ed equality). Defaults to the table's primary key.
Must be non-empty.
* **`background`** (`bool`, default: `False`): If True, return immediately with `{"id": ..., "job_url": ...}` and run the
operation in a background thread. Poll `job_url` for the result.
**Examples:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
router.add_delete_route(t, path='/delete')
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -X POST http://localhost:8000/delete \
-H 'Content-Type: application/json' \
-d '{"id": 42}'
# {"num_rows": 1}
```
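The same request can be issued from Python. The sketch below builds it with the standard library and stops short of sending it, so it runs without a server. `build_delete_request` is a hypothetical helper, and the base URL and `id` match column are carried over from the `curl` example.

```python
import json
import urllib.request


def build_delete_request(base_url: str, match_values: dict) -> urllib.request.Request:
    """Build a POST request whose JSON body holds the match column values."""
    body = json.dumps(match_values).encode('utf-8')
    return urllib.request.Request(
        f'{base_url}/delete',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


req = build_delete_request('http://localhost:8000', {'id': 42})

# To send it against a running server:
#     with urllib.request.urlopen(req) as resp:
#         print(json.load(resp))
```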
## method add\_insert\_route()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
add_insert_route(
t: pxt.Table,
*,
path: str,
inputs: list[str] | None = None,
uploadfile_inputs: list[str] | None = None,
outputs: list[str] | None = None,
return_fileresponse: bool = False,
export_sql: SqlExport | None = None,
background: bool = False
) -> None
```
Add a POST endpoint that inserts a single row into `t` and returns the resulting row.
The request body contains the input column values as JSON fields (or as
[multipart form data](https://fastapi.tiangolo.com/tutorial/request-files/) when
`uploadfile_inputs` is used). The response is a JSON object with the output column values,
or a [`FileResponse`](https://fastapi.tiangolo.com/advanced/custom-response/#fileresponse)
when `return_fileresponse=True`.
**Parameters:**
* **`t`** (`pxt.Table`): The table to insert into.
* **`path`** (`str`): The URL path for the endpoint.
* **`inputs`** (`list[str] | None`): Columns to accept as request fields. Defaults to all non-computed columns.
* **`uploadfile_inputs`** (`list[str] | None`): Columns to accept as
[`UploadFile`](https://fastapi.tiangolo.com/tutorial/request-files/) fields
(must be media-typed). These are sent as multipart form data; all other inputs
become [`Form`](https://fastapi.tiangolo.com/tutorial/request-forms/) fields.
* **`outputs`** (`list[str] | None`): Columns to include in the response. Defaults to all columns (including inputs).
* **`return_fileresponse`** (`bool`, default: `False`): If True, return the single media-typed output column as a
[`FileResponse`](https://fastapi.tiangolo.com/advanced/custom-response/#fileresponse).
Requires exactly one media-typed output column.
* **`export_sql`** (`SqlExport | None`): If set, export each inserted row into an external RDBMS table after the
Pixeltable insert succeeds. See [`SqlExport`](./sqlexport) for
the target specification and supported `method` values.
The row written is the response body: same columns as `outputs`, with media-typed
columns rendered as URL strings (so the corresponding target columns must be
string-typed).
Schema compatibility against the response columns is validated once at
registration time; the target table must already exist or registration fails.
Mutually exclusive with `return_fileresponse`. Compatible with `background=True`
(the SQL write runs in the worker thread).
Note: when paired with `method='update'`, a Pixeltable insert triggers a
target-side update -- this is intentional, supporting the append-only-source /
current-state-view pattern.
If the external write fails after the Pixeltable insert has already succeeded,
the request fails with HTTP 500; no rollback is performed.
* **`background`** (`bool`, default: `False`): If True, return immediately with `{"id": ..., "job_url": ...}` and run
the insert in a background thread. Poll `job_url` for the result. Mutually
exclusive with `return_fileresponse`.
**Examples:**
JSON request/response:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
router.add_insert_route(
t, path='/generate', inputs=['prompt'], outputs=['result']
)
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -X POST http://localhost:8000/generate \
-H 'Content-Type: application/json' \
-d '{"prompt": "a sunset over the ocean"}'
# {"prompt": "a sunset over the ocean", "result": "..."}
```
File upload with `FileResponse`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
router.add_insert_route(
t,
path='/resize',
inputs=['width', 'height'],
uploadfile_inputs=['image'],
outputs=['resized'],
return_fileresponse=True,
)
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -X POST http://localhost:8000/resize \
-F image=@photo.jpg -F width=640 -F height=480 \
--output resized.jpg
# saves the resized image to resized.jpg
```
Export each inserted row into an external RDBMS table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
router.add_insert_route(
t,
path='/generate',
inputs=['prompt'],
outputs=['prompt', 'result'],
export_sql=SqlExport(
db_connect='postgresql+psycopg://user:pw@host/analytics',
table='generations',
db_schema='public',
),
)
```
Each successful POST inserts a row into the Pixeltable table and then writes the
same row (columns: `prompt`, `result`) to `public.generations` in the target database.
Background processing:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
router.add_insert_route(t, path='/slow', background=True)
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# submit
curl -X POST http://localhost:8000/slow -d '{"prompt": "hello"}'
# {"id": "abc123", "job_url": "http://localhost:8000/jobs/abc123"}
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# poll
curl http://localhost:8000/jobs/abc123
# {"status": "done", "result": {...}}
```
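A client usually wraps the submit-then-poll sequence in a loop. The sketch below is one way to do that under stated assumptions: `wait_for_job` is a hypothetical helper, the only status value taken from the example above is `"done"`, and the `"pending"` status used in the demonstration is invented. The HTTP call is injected as `fetch_status` so the loop can be exercised without a server. Failure statuses are not handled because their format is not shown here.

```python
import time
from typing import Any, Callable


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_url: str,
    interval: float = 0.5,
    timeout: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll job_url until its status is 'done', then return the result."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status(job_url)
        if payload.get('status') == 'done':
            return payload.get('result')
        if time.monotonic() >= deadline:
            raise TimeoutError(f'job at {job_url} did not finish in {timeout}s')
        sleep(interval)


# Demonstration with canned responses instead of real HTTP calls.
_responses = iter([
    {'status': 'pending'},  # invented intermediate status
    {'status': 'done', 'result': {'prompt': 'hello'}},
])
result = wait_for_job(
    lambda url: next(_responses),
    'http://localhost:8000/jobs/abc123',
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
```

In real use, `fetch_status` would perform a GET on `job_url` and decode the JSON body.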
## method add\_query\_route()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
add_query_route(
*,
path: str,
query: pxt.Function,
inputs: list[str] | None = None,
uploadfile_inputs: list[str] | None = None,
one_row: bool = False,
return_fileresponse: bool = False,
background: bool = False,
method: Literal['get', 'post'] = 'post'
) -> None
```
Add an endpoint that executes a `@pxt.query` or `pxt.retrieval_udf` and returns the results.
By default the endpoint accepts POST requests with a JSON
[`Body`](https://fastapi.tiangolo.com/tutorial/body/) and returns `{"rows": [{...}, ...]}`.
Use `method='get'` for
[`Query`](https://fastapi.tiangolo.com/tutorial/query-params/) parameters instead.
**Parameters:**
* **`path`** (`str`): The URL path for the endpoint.
* **`query`** (`pxt.Function`): The query to execute, created with `@pxt.query` or `pxt.retrieval_udf()`.
* **`inputs`** (`list[str] | None`): Parameters to accept as request fields. Defaults to all query parameters.
* **`uploadfile_inputs`** (`list[str] | None`): Parameters to accept as
[`UploadFile`](https://fastapi.tiangolo.com/tutorial/request-files/) fields
(must be media-typed).
* **`one_row`** (`bool`, default: `False`): If True, expect exactly one result row and return it as a plain JSON object
(not wrapped in `{"rows": [...]}`). 0 rows produces a 404, >1 rows a 409.
* **`return_fileresponse`** (`bool`, default: `False`): If True, return the single media-typed result column as a
[`FileResponse`](https://fastapi.tiangolo.com/advanced/custom-response/#fileresponse).
Requires `one_row` semantics (0 rows -> 404, >1 rows -> 409).
Mutually exclusive with `background`.
* **`background`** (`bool`, default: `False`): If True, return immediately with `{"id": ..., "job_url": ...}` and run
the query in a background thread. Poll `job_url` for the result. Mutually
exclusive with `return_fileresponse`.
* **`method`** (`Literal['get', 'post']`, default: `'post'`): HTTP method for the endpoint (`'get'` or `'post'`).
**Examples:**
Multi-row JSON response:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
router.add_query_route(path='/search', query=search_docs)
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -X POST http://localhost:8000/search \
-H 'Content-Type: application/json' \
-d '{"query_text": "hello"}'
# {"rows": [{"id": 1, "text": "hello world", "score": 0.95}, ...]}
```
Single-row lookup:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
router.add_query_route(path='/lookup', query=lookup_by_id, one_row=True)
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -X POST http://localhost:8000/lookup -d '{"id": 42}'
# {"id": 42, "name": "Alice", "email": "alice@example.com"}
```
GET with query-string parameters:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
router.add_query_route(path='/lookup', query=lookup_by_id, method='get')
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl 'http://localhost:8000/lookup?id=42'
# {"id": 42, "name": "Alice", "email": "alice@example.com"}
```
`FileResponse`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
router.add_query_route(
path='/thumbnail', query=get_thumbnail, return_fileresponse=True
)
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -X POST http://localhost:8000/thumbnail -d '{"id": 1}' --output thumb.jpg
# saves the thumbnail image to thumb.jpg
```
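The `one_row` rules can be summarised as a mapping from the number of result rows to an HTTP status. The sketch below models that mapping in plain Python. Both functions are hypothetical, the 200 status for success is an assumption, and error bodies are left as `None` because their format is not specified here.

```python
from typing import Any


def one_row_response(rows: list[dict[str, Any]]) -> tuple[int, Any]:
    """Model one_row=True: exactly one row is returned as a plain object."""
    if len(rows) == 0:
        return 404, None  # no matching row
    if len(rows) > 1:
        return 409, None  # ambiguous: more than one row
    return 200, rows[0]


def multi_row_response(rows: list[dict[str, Any]]) -> tuple[int, Any]:
    """Model the default: all rows wrapped in a 'rows' list."""
    return 200, {'rows': rows}


alice = {'id': 42, 'name': 'Alice'}
single = one_row_response([alice])
missing = one_row_response([])
ambiguous = one_row_response([alice, alice])
wrapped = multi_row_response([alice])
```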
## method add\_update\_route()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
add_update_route(
t: pxt.Table,
*,
path: str,
inputs: list[str] | None = None,
outputs: list[str] | None = None,
return_fileresponse: bool = False,
export_sql: SqlExport | None = None,
background: bool = False
) -> None
```
Add a POST endpoint that updates a single row in `t` and returns the updated row.
The row to update is identified by its primary key, which must be included in the
request body alongside the input column values. The update is performed via a
single-row [`batch_update()`](./table#method-batch_update) call, using the primary
key columns to identify the row and the columns referenced in `inputs` as the values
to set.
The request body contains values for the primary key columns plus the input columns
as JSON fields. The response is a JSON object with the output column values, or a
[`FileResponse`](https://fastapi.tiangolo.com/advanced/custom-response/#fileresponse)
when `return_fileresponse=True`.
Note: media-typed columns (image, video, audio, document) are excluded from `inputs`
and from the default input set.
**Parameters:**
* **`t`** (`pxt.Table`): The table to update.
* **`path`** (`str`): The URL path for the endpoint.
* **`inputs`** (`list[str] | None`): Columns to accept as request fields, excluding primary key and media-typed
columns (which cannot be updated). Defaults to all non-computed, non-primary-key,
non-media columns.
* **`outputs`** (`list[str] | None`): Columns to include in the response. Defaults to all columns (including
inputs).
* **`return_fileresponse`** (`bool`, default: `False`): If True, return the single media-typed output column as a
[`FileResponse`](https://fastapi.tiangolo.com/advanced/custom-response/#fileresponse).
Requires exactly one media-typed output column.
* **`export_sql`** (`SqlExport | None`): If set, export each updated row into an external RDBMS table after the
Pixeltable update succeeds. See [`SqlExport`](./sqlexport) for
the target specification and supported `method` values.
The row written is the response body: same columns as `outputs`, with media-typed
columns rendered as URL strings (so the corresponding target columns must be
string-typed).
Schema compatibility is validated once at registration time; the target table
must already exist or registration fails. Mutually exclusive with
`return_fileresponse`. Compatible with `background=True`.
Note: with `method='insert'` (the default), every update appends a new row to the
target table -- the target acts as an audit log, not a current-state view. Use
`method='update'` to keep the target as a current-state view keyed on the
target's primary key.
If the external write fails after the Pixeltable update has already succeeded,
the request fails with HTTP 500; no rollback is performed.
* **`background`** (`bool`, default: `False`): If True, return immediately with `{"id": ..., "job_url": ...}` and run
the update in a background thread. Poll `job_url` for the result. Mutually
exclusive with `return_fileresponse`.
**Examples:**
JSON request/response (table has primary key `id`):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
router.add_update_route(
t, path='/update', inputs=['prompt'], outputs=['prompt', 'result']
)
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -X POST http://localhost:8000/update \
-H 'Content-Type: application/json' \
-d '{"id": 1, "prompt": "a sunset over the ocean"}'
# {"prompt": "a sunset over the ocean", "result": "..."}
```
Append every update to an external audit table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
router.add_update_route(
t,
path='/update',
inputs=['prompt'],
outputs=['id', 'prompt', 'result'],
export_sql=SqlExport(
db_connect='postgresql+psycopg://user:pw@host/analytics',
table='update_log',
),
)
```
Background processing:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
router.add_update_route(t, path='/slow-update', background=True)
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# submit
curl -X POST http://localhost:8000/slow-update \
-H 'Content-Type: application/json' \
-d '{"id": 1, "prompt": "hello"}'
# {"id": "abc123", "job_url": "http://localhost:8000/jobs/abc123"}
# poll
curl http://localhost:8000/jobs/abc123
# {"status": "done", "result": {...}}
```
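The difference between the two `export_sql` methods described above, an audit log versus a current-state view, can be modelled with ordinary Python containers. This is a conceptual sketch only: it does not use `SqlExport` or a database, and the target is assumed to already hold a row for every key being updated.

```python
# Rows exported by three successive update requests.
exported = [
    {'id': 1, 'prompt': 'first draft'},
    {'id': 1, 'prompt': 'second draft'},
    {'id': 2, 'prompt': 'other row'},
]

# method='insert': every exported row is appended, so history is kept.
audit_log: list[dict] = []
for row in exported:
    audit_log.append(dict(row))

# method='update': rows are keyed on the target's primary key, so each
# key keeps only its latest values.
current_state = {
    1: {'id': 1, 'prompt': 'original'},
    2: {'id': 2, 'prompt': 'original'},
}
for row in exported:
    current_state[row['id']] = dict(row)
```

After the three updates the audit log has three rows, while the current-state view still has two.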
## method insert\_route()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
insert_route(
t: pxt.Table,
*,
path: str,
inputs: list[str] | None = None,
uploadfile_inputs: list[str] | None = None,
outputs: list[str] | None = None,
export_sql: SqlExport | None = None,
background: bool = False
) -> Callable[[Callable[..., pydantic.BaseModel]], Callable[..., pydantic.BaseModel]]
```
Decorator that registers a POST endpoint performing a `Table.insert()` followed by user-defined post-processing.
The request body carries the input column values (JSON, or multipart form data when `uploadfile_inputs` is
used). After inserting the row, the decorated function is called with the requested output columns as
keyword arguments (parameter names and Pixeltable types must match `outputs`). Its return value must be a
Pydantic model and is returned as the HTTP response body.
Media-typed outputs (image, video, audio, document) are delivered to the function as `/media/` URL
strings -- annotate those parameters as `str` (or `str | None` if the column is nullable), not as
`pxt.Image` / `pxt.Video` / etc.
**Parameters:**
* **`t`** (`pxt.Table`): The table to insert into.
* **`path`** (`str`): The URL path for the endpoint.
* **`inputs`** (`list[str] | None`): Columns to accept as request fields. Defaults to all non-computed columns.
* **`uploadfile_inputs`** (`list[str] | None`): Columns to accept as
[`UploadFile`](https://fastapi.tiangolo.com/tutorial/request-files/) fields
(must be media-typed). These are sent as multipart form data; all other inputs
become [`Form`](https://fastapi.tiangolo.com/tutorial/request-forms/) fields.
* **`outputs`** (`list[str] | None`): Columns from the inserted row to pass to the decorated function as keyword
arguments. Defaults to all columns.
* **`export_sql`** (`SqlExport | None`): If set, export the decorated function's return value into an external
RDBMS table after the Pixeltable insert succeeds. See
[`SqlExport`](./sqlexport) for the target specification and
supported `method` values.
The row written is the user function's pydantic return value (its fields, not the
source columns), so the target table schema must match those fields. Media-typed
fields are modeled as strings (URL form).
Schema compatibility is validated once at registration time; the target table
must already exist or registration fails. Compatible with `background=True` (the
SQL write runs in the worker thread).
If the external write fails after the Pixeltable insert has already succeeded,
the request fails with HTTP 500; no rollback is performed.
* **`background`** (`bool`, default: `False`): If True, return immediately with `{"id": ..., "job_url": ...}` and run
the insert plus post-processing in a background thread. Poll `job_url` for the
result; the decorated function's return value is delivered as the job result.
**Examples:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
class GenerateResponse(pydantic.BaseModel):
    caption: str
    score: float


@router.insert_route(
    t,
    path='/generate',
    inputs=['prompt'],
    outputs=['caption', 'score'],
    background=False,
)
def format_response(*, caption: str, score: float) -> GenerateResponse:
    return GenerateResponse(
        caption=caption.strip(), score=round(score, 3)
    )
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -X POST http://localhost:8000/generate \
-H 'Content-Type: application/json' \
-d '{"prompt": "a sunset over the ocean"}'
# {"caption": "orange sky above calm water", "score": 0.932}
```
Export the post-processed response into an external RDBMS table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@router.insert_route(
    t,
    path='/generate',
    inputs=['prompt'],
    outputs=['caption', 'score'],
    export_sql=SqlExport(
        db_connect='postgresql+psycopg://user:pw@host/analytics',
        table='captions',
    ),
)
def format_response(*, caption: str, score: float) -> GenerateResponse:
    return GenerateResponse(
        caption=caption.strip(), score=round(score, 3)
    )
```
Each successful POST inserts a row into the Pixeltable table and then appends a row
with columns `caption`, `score` (the response model fields) to `captions`.
## method update\_route()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
update_route(
t: pxt.Table,
*,
path: str,
inputs: list[str] | None = None,
outputs: list[str] | None = None,
export_sql: SqlExport | None = None,
background: bool = False
) -> Callable[[Callable[..., pydantic.BaseModel]], Callable[..., pydantic.BaseModel]]
```
Decorator that registers a POST endpoint performing a `Table.batch_update()` followed by
user-defined post-processing.
The request body carries values for the primary key columns (to identify the row) plus the
input column values as JSON fields. After updating the row, the decorated function is called
with the requested output columns as keyword arguments (parameter names and Pixeltable types
must match `outputs`). Its return value must be a Pydantic model and is returned as the HTTP
response body.
Media-typed outputs (image, video, audio, document) are delivered to the function as `/media/` URL
strings -- annotate those parameters as `str` (or `str | None` if the column is nullable), not as
`pxt.Image` / `pxt.Video` / etc.
If the row does not exist, the endpoint returns HTTP 404 without calling the decorated
function.
Note: media-typed columns (image, video, audio, document) and primary key columns cannot be
used as `inputs`. Primary key columns are always part of the request body for row
identification.
**Parameters:**
* **`t`** (`pxt.Table`): The table to update.
* **`path`** (`str`): The URL path for the endpoint.
* **`inputs`** (`list[str] | None`): Columns to accept as update fields. Defaults to all non-computed, non-primary-key,
non-media columns.
* **`outputs`** (`list[str] | None`): Columns from the updated row to pass to the decorated function as keyword
arguments. Defaults to all columns.
* **`export_sql`** (`SqlExport | None`): If set, export the decorated function's return value into an external
RDBMS table after the Pixeltable update succeeds. See
[`SqlExport`](./sqlexport) for the target specification and
supported `method` values.
The row written is the user function's pydantic return value (its fields, not the
source columns), so the target table schema must match those fields. Media-typed
fields are modeled as strings (URL form).
Schema compatibility is validated once at registration time; the target table
must already exist or registration fails. Compatible with `background=True`.
Note: with `method='insert'` (the default), every update appends a new row to the
target table -- the target acts as an audit log, not a current-state view. Use
`method='update'` to keep the target as a current-state view keyed on the
target's primary key.
If the external write fails after the Pixeltable update has already succeeded,
the request fails with HTTP 500; no rollback is performed.
* **`background`** (`bool`, default: `False`): If True, return immediately with `{"id": ..., "job_url": ...}` and run the
update plus post-processing in a background thread. Poll `job_url` for the result;
the decorated function's return value is delivered as the job result.
**Examples:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
class ItemResponse(pydantic.BaseModel):
    id: int
    summary: str
    score: float


@router.update_route(
    t, path='/update', inputs=['text'], outputs=['id', 'text', 'score']
)
def format_response(*, id: int, text: str, score: float) -> ItemResponse:
    return ItemResponse(id=id, summary=text[:100], score=round(score, 3))
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
curl -X POST http://localhost:8000/update \
-H 'Content-Type: application/json' \
-d '{"id": 42, "text": "new content"}'
# {"id": 42, "summary": "new content", "score": 0.871}
```
Append every post-processed update into an external audit table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@router.update_route(
    t,
    path='/update',
    inputs=['text'],
    outputs=['id', 'text', 'score'],
    export_sql=SqlExport(
        db_connect='postgresql+psycopg://user:pw@host/analytics',
        table='item_log',
    ),
)
def format_response(*, id: int, text: str, score: float) -> ItemResponse:
    return ItemResponse(id=id, summary=text[:100], score=round(score, 3))
```
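The difference between an audit log (`method='insert'`) and a current-state view (`method='update'`) can be pictured with plain SQL. The sketch below uses an in-memory SQLite database and hand-written statements; it does not call `SqlExport`, and the `item_state` table is a hypothetical name. The state table is seeded first because how `SqlExport` treats a key that is not yet present in the target is not covered here.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE item_log (id INTEGER, summary TEXT, score REAL)')
conn.execute('CREATE TABLE item_state (id INTEGER PRIMARY KEY, summary TEXT, score REAL)')
# Seed the current-state table so that a plain UPDATE has a row to hit.
conn.execute("INSERT INTO item_state VALUES (42, 'initial', 0.0)")

# Two successive responses for the same item, as the update endpoint might return them.
responses = [
    {'id': 42, 'summary': 'first edit', 'score': 0.5},
    {'id': 42, 'summary': 'new content', 'score': 0.871},
]
for r in responses:
    # Audit-log behaviour: every response appends a row.
    conn.execute('INSERT INTO item_log VALUES (:id, :summary, :score)', r)
    # Current-state behaviour: the row keyed on the primary key is overwritten.
    conn.execute('UPDATE item_state SET summary = :summary, score = :score WHERE id = :id', r)

log_rows = conn.execute('SELECT COUNT(*) FROM item_log').fetchone()[0]
state_rows = conn.execute('SELECT id, summary, score FROM item_state').fetchall()
print(log_rows, state_rows)  # 2 [(42, 'new content', 0.871)]
```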
Background processing:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# MyResponse is assumed to be a user-defined pydantic model with fields `id` and `result`.
@router.update_route(t, path='/slow-update', background=True)
def format_response(*, id: int, result: str) -> MyResponse:
    return MyResponse(id=id, result=result.strip())
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# submit
curl -X POST http://localhost:8000/slow-update \
-H 'Content-Type: application/json' \
-d '{"id": 1, "text": "hello"}'
# {"id": "abc123", "job_url": "http://localhost:8000/jobs/abc123"}
# poll
curl http://localhost:8000/jobs/abc123
# {"status": "done", "result": {"id": 1, "result": "hello"}}
```
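A client for a background route typically wraps the submit-then-poll sequence in a small helper. The following is a minimal sketch using only the standard library. It relies on the `{"status": "done", "result": ...}` shape shown above; any other status value (such as the `'pending'` used in the stub) is an assumption, and `wait_for_job` is a hypothetical helper, not part of Pixeltable.

```python
import json
import time
import urllib.request
from typing import Any, Callable


def http_get_json(url: str) -> dict[str, Any]:
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_job(
    job_url: str,
    fetch: Callable[[str], dict[str, Any]] = http_get_json,
    interval: float = 0.5,
    timeout: float = 60.0,
) -> Any:
    """Poll job_url until the job reports status 'done', then return its result."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_url)
        if job.get('status') == 'done':
            return job.get('result')
        if time.monotonic() >= deadline:
            raise TimeoutError(f'job at {job_url} not done after {timeout}s')
        time.sleep(interval)


# Offline demonstration with a stubbed fetch; 'pending' is a made-up status value.
replies = iter([
    {'status': 'pending'},
    {'status': 'done', 'result': {'id': 1, 'result': 'hello'}},
])
result = wait_for_job(
    'http://localhost:8000/jobs/abc123', fetch=lambda _url: next(replies), interval=0.0
)
print(result)
```

Against a running server, calling `wait_for_job(job_url)` with the `job_url` from the submit response would use the real HTTP fetch. A production client would also want to recognize a failed job, whose status value is not shown in this reference.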
# fireworks
Source: https://docs.pixeltable.com/sdk/latest/fireworks
# module pixeltable.functions.fireworks
Pixeltable UDFs
that wrap various endpoints from the Fireworks AI API. In order to use them, you must
first `pip install fireworks-ai` and configure your Fireworks AI credentials, as described in
the [Working with Fireworks](https://docs.pixeltable.com/notebooks/integrations/working-with-fireworks) tutorial.
## udf chat\_completions()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
chat_completions(
messages: pxt.Json[(Json, ...)],
*,
model: pxt.String,
model_kwargs: pxt.Json | None = None
) -> pxt.Json
```
Creates a model response for the given chat conversation.
Equivalent to the Fireworks AI `chat/completions` API endpoint.
For additional details, see: [https://docs.fireworks.ai/api-reference/post-chatcompletions](https://docs.fireworks.ai/api-reference/post-chatcompletions)
Request throttling:
Applies the rate limit set in the config (section `fireworks`, key `rate_limit`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install fireworks-ai`
**Parameters:**
* **`messages`** (`pxt.Json[(Json, ...)]`): A list of messages comprising the conversation so far.
* **`model`** (`pxt.String`): The name of the model to use.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the Fireworks `chat/completions` API. For details on the available
parameters, see: [https://docs.fireworks.ai/api-reference/post-chatcompletions](https://docs.fireworks.ai/api-reference/post-chatcompletions)
**Returns:**
* `pxt.Json`: A dictionary containing the response and other metadata.
**Examples:**
Add a computed column that applies the model `accounts/fireworks/models/mixtral-8x22b-instruct`
to an existing Pixeltable column `tbl.prompt` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
messages = [{'role': 'user', 'content': tbl.prompt}]
tbl.add_computed_column(
    response=chat_completions(
        messages, model='accounts/fireworks/models/mixtral-8x22b-instruct'
    )
)
```
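The `response` column holds the full JSON returned by the endpoint, so the generated text usually has to be pulled out in a second step. The sketch below assumes the OpenAI-style chat completion layout (`choices[0].message.content`); the sample dictionary is hand-written for illustration, not captured from the API.

```python
# Hand-written sample in the assumed OpenAI-style layout.
response = {
    'choices': [
        {
            'index': 0,
            'message': {'role': 'assistant', 'content': 'Paris is the capital of France.'},
            'finish_reason': 'stop',
        }
    ],
    'usage': {'prompt_tokens': 12, 'completion_tokens': 8, 'total_tokens': 20},
}

# Plain-Python lookup of the assistant's reply.
answer = response['choices'][0]['message']['content']
print(answer)

# On a Pixeltable JSON column, the same path could be written as an expression, e.g.:
#   tbl.add_computed_column(answer=tbl.response.choices[0].message.content)
```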
# functions
Source: https://docs.pixeltable.com/sdk/latest/functions
# module pixeltable.functions
General Pixeltable UDFs.
This parent module contains general-purpose UDFs that apply to multiple data types.
## func map()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
map(
expr: pixeltable.exprs.expr.Expr,
fn: Callable[[pixeltable.exprs.expr.Expr], Any]
) -> pixeltable.exprs.expr.Expr
```
Applies a mapping function to each element of a list.
**Parameters:**
* **`expr`** (`pixeltable.exprs.expr.Expr`): The list expression to map over; must be an expression of type `pxt.Json`.
* **`fn`** (`typing.Callable[[pixeltable.exprs.expr.Expr], typing.Any]`): An operation on Pixeltable expressions that will be applied to each element of the JSON array.
**Examples:**
Given a table `tbl` with a column `data` of type `pxt.Json` containing lists of integers, add a computed
column that produces new lists with each integer doubled:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
    doubled=pxt.functions.map(tbl.data, lambda x: x * 2)
)
```
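In plain Python terms, the computed column above holds the result of a per-row list comprehension. The sketch below shows the rough equivalent on hypothetical sample rows. Pixeltable applies `fn` to expressions rather than to Python values, so this illustrates only the expected output, not the mechanism.

```python
# Hypothetical sample rows with a JSON list in `data`.
rows = [{'data': [1, 2, 3]}, {'data': []}, {'data': [10]}]

# One output list per row, with the lambda applied to each element.
doubled = [[x * 2 for x in row['data']] for row in rows]
print(doubled)  # [[2, 4, 6], [], [20]]
```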
## uda count()
```python Signatures theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Signature 1:
@pxt.uda
count(val: pxt.String | None) -> pxt.Int
# Signature 2:
@pxt.uda
count(val: pxt.Bool | None) -> pxt.Int
# Signature 3:
@pxt.uda
count(val: pxt.Int | None) -> pxt.Int
# Signature 4:
@pxt.uda
count(val: pxt.Float | None) -> pxt.Int
# Signature 5:
@pxt.uda
count(val: pxt.Timestamp | None) -> pxt.Int
# Signature 6:
@pxt.uda
count(val: pxt.Json | None) -> pxt.Int
# Signature 7:
@pxt.uda
count(val: pxt.Array | None) -> pxt.Int
# Signature 8:
@pxt.uda
count(val: pxt.Image | None) -> pxt.Int
# Signature 9:
@pxt.uda
count(val: pxt.Video | None) -> pxt.Int
# Signature 10:
@pxt.uda
count(val: pxt.Audio | None) -> pxt.Int
# Signature 11:
@pxt.uda
count(val: pxt.Document | None) -> pxt.Int
# Signature 12:
@pxt.uda
count(val: pxt.Date | None) -> pxt.Int
# Signature 13:
@pxt.uda
count(val: pxt.UUID | None) -> pxt.Int
# Signature 14:
@pxt.uda
count(val: pxt.Binary | None) -> pxt.Int
```
Aggregate function that counts the number of non-null values in a column or grouping.
**Parameters:**
* **`val`** (`String | None`): The value to count.
**Returns:**
* `pxt.Int`: The count of non-null values.
**Examples:**
Count the number of non-null values in the `value` column of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(pxt.functions.count(tbl.value)).collect()
```
Group by the `category` column and compute the count of non-null values in the `value` column
for each category, assigning the name `'category_count'` to the new column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.group_by(tbl.category).select(
    tbl.category, category_count=pxt.functions.count(tbl.value)
).collect()
```
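As a reference for what these queries compute, here is a pure-Python sketch of non-null counting, with and without grouping. The sample rows are hypothetical, and the zero count for an all-null group is inferred from the non-nullable `pxt.Int` return type.

```python
from collections import defaultdict

# Hypothetical rows; None stands for a null value.
rows = [
    {'category': 'a', 'value': 1},
    {'category': 'a', 'value': None},
    {'category': 'b', 'value': 3},
    {'category': 'b', 'value': 4},
    {'category': 'c', 'value': None},
]

# Ungrouped: count every non-null value.
total = sum(1 for r in rows if r['value'] is not None)

# Grouped: one counter per category; null values contribute nothing.
by_category: defaultdict[str, int] = defaultdict(int)
for r in rows:
    by_category[r['category']] += int(r['value'] is not None)

print(total, dict(by_category))  # 3 {'a': 1, 'b': 2, 'c': 0}
```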
## uda max()
```python Signatures theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Signature 1:
@pxt.uda
max(val: pxt.String | None) -> pxt.String | None
# Signature 2:
@pxt.uda
max(val: pxt.Int | None) -> pxt.Int | None
# Signature 3:
@pxt.uda
max(val: pxt.Float | None) -> pxt.Float | None
# Signature 4:
@pxt.uda
max(val: pxt.Bool | None) -> pxt.Bool | None
# Signature 5:
@pxt.uda
max(val: pxt.Timestamp | None) -> pxt.Timestamp | None
```
Aggregate function that computes the maximum value in a column or grouping.
**Parameters:**
* **`val`** (`String | None`): The value to compare.
**Returns:**
* `pxt.String | None`: The maximum value, or `None` if there are no non-null values.
**Examples:**
Compute the maximum value in the `value` column of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(pxt.functions.max(tbl.value)).collect()
```
Group by the `category` column and compute the maximum value in the `value` column for each category,
assigning the name `'category_max'` to the new column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.group_by(tbl.category).select(
    tbl.category, category_max=pxt.functions.max(tbl.value)
).collect()
```
## uda mean()
```python Signatures theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Signature 1:
@pxt.uda
mean(val: pxt.Int | None) -> pxt.Float | None
# Signature 2:
@pxt.uda
mean(val: pxt.Float | None) -> pxt.Float | None
```
Aggregate function that computes the mean (average) of non-null values of a numeric column or grouping.
**Parameters:**
* **`val`** (`Int | None`): The numeric value to include in the mean.
**Returns:**
* `pxt.Float | None`: The mean of the non-null values, or `None` if there are no non-null values.
**Examples:**
Compute the mean of the values in the `value` column of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(pxt.functions.mean(tbl.value)).collect()
```
Group by the `category` column and compute the mean of the `value` column for each category,
assigning the name `'category_mean'` to the new column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.group_by(tbl.category).select(
    tbl.category, category_mean=pxt.functions.mean(tbl.value)
).collect()
```
## uda min()
```python Signatures theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Signature 1:
@pxt.uda
min(val: pxt.String | None) -> pxt.String | None
# Signature 2:
@pxt.uda
min(val: pxt.Int | None) -> pxt.Int | None
# Signature 3:
@pxt.uda
min(val: pxt.Float | None) -> pxt.Float | None
# Signature 4:
@pxt.uda
min(val: pxt.Bool | None) -> pxt.Bool | None
# Signature 5:
@pxt.uda
min(val: pxt.Timestamp | None) -> pxt.Timestamp | None
```
Aggregate function that computes the minimum value in a column or grouping.
**Parameters:**
* **`val`** (`String | None`): The value to compare.
**Returns:**
* `pxt.String | None`: The minimum value, or `None` if there are no non-null values.
**Examples:**
Compute the minimum value in the `value` column of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(pxt.functions.min(tbl.value)).collect()
```
Group by the `category` column and compute the minimum value in the `value` column for each category,
assigning the name `'category_min'` to the new column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.group_by(tbl.category).select(
    tbl.category, category_min=pxt.functions.min(tbl.value)
).collect()
```
## uda sum()
```python Signatures theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Signature 1:
@pxt.uda
sum(val: pxt.Int | None) -> pxt.Int | None
# Signature 2:
@pxt.uda
sum(val: pxt.Float | None) -> pxt.Float | None
```
Aggregate function that computes the sum of non-null values of a numeric column or grouping.
**Parameters:**
* **`val`** (`Int | None`): The numeric value to add to the sum.
**Returns:**
* `pxt.Int | None`: The sum of the non-null values, or `None` if there are no non-null values.
**Examples:**
Sum the values in the `value` column of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(pxt.functions.sum(tbl.value)).collect()
```
Group by the `category` column and compute the sum of the `value` column for each category,
assigning the name `'category_total'` to the new column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.group_by(tbl.category).select(
    tbl.category, category_total=pxt.functions.sum(tbl.value)
).collect()
```
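`sum`, `mean`, `min`, and `max` share one null-handling rule: nulls are skipped, and the result is `None` when no non-null values remain. A small pure-Python reference for that rule follows; `aggregate_non_null` is a hypothetical helper written for this sketch, not a Pixeltable function.

```python
from statistics import fmean
from typing import Any, Callable, Iterable


def aggregate_non_null(values: Iterable[Any], fn: Callable[[list[Any]], Any]) -> Any:
    """Apply fn to the non-null values, or return None if there are none."""
    present = [v for v in values if v is not None]
    return fn(present) if present else None


total = aggregate_non_null([1, None, 3], sum)  # nulls are skipped
empty = aggregate_non_null([None, None], sum)  # no non-null values
average = aggregate_non_null([1, None, 4], fmean)
largest = aggregate_non_null([2, None, 7], max)
print(total, empty, average, largest)  # 4 None 2.5 7
```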
# gemini
Source: https://docs.pixeltable.com/sdk/latest/gemini
# module pixeltable.functions.gemini
Pixeltable UDFs
that wrap various endpoints from the Google Gemini API. In order to use them, you must
first `pip install google-genai` and configure your Gemini credentials, as described in
the [Working with Gemini](https://docs.pixeltable.com/howto/providers/working-with-gemini) tutorial.
Supports two authentication methods:
* Google AI Studio: set `GOOGLE_API_KEY` or `GEMINI_API_KEY` (or put `api_key` in the `gemini` section of
the Pixeltable config file).
* Vertex AI: set `GOOGLE_GENAI_USE_VERTEXAI=true` and `GOOGLE_CLOUD_PROJECT` (and optionally
`GOOGLE_CLOUD_LOCATION`), then authenticate via Application Default Credentials
(e.g. `gcloud auth application-default login`).
## func invoke\_tools()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
invoke_tools(
tools: pixeltable.func.tools.Tools,
response: pixeltable.exprs.expr.Expr
) -> pixeltable.exprs.inline_expr.InlineDict
```
Converts a Gemini response dict to Pixeltable tool invocation format and calls `tools._invoke()`.
## udf embed\_content()
```python Signatures theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Signature 1:
@pxt.udf
embed_content(
contents: pxt.String,
model: pxt.String,
config: pxt.Json | None
) -> pxt.Array[(None,), float32]
# Signature 2:
@pxt.udf
embed_content(
contents: pxt.Image,
model: pxt.String,
config: pxt.Json | None
) -> pxt.Array[(None,), float32]
# Signature 3:
@pxt.udf
embed_content(
contents: pxt.Audio,
model: pxt.String,
config: pxt.Json | None
) -> pxt.Array[(None,), float32]
# Signature 4:
@pxt.udf
embed_content(
contents: pxt.Video,
model: pxt.String,
config: pxt.Json | None
) -> pxt.Array[(None,), float32]
# Signature 5:
@pxt.udf
embed_content(
contents: pxt.Document,
model: pxt.String,
config: pxt.Json | None
) -> pxt.Array[(None,), float32]
```
Generate embeddings for text, images, video, and other content. For more information on the Gemini embeddings API, see:
[https://ai.google.dev/gemini-api/docs/embeddings](https://ai.google.dev/gemini-api/docs/embeddings)
**Requirements:**
* `pip install google-genai`
**Parameters:**
* **`contents`** (`String`): The string, image, audio, video, or document to embed.
* **`model`** (`String`): The Gemini model to use.
* **`config`** (`Json | None`, default: `None`): Configuration for embedding generation, corresponding to keyword arguments of
`genai.types.EmbedContentConfig`. For details on the parameters, see:
[https://googleapis.github.io/python-genai/genai.html#genai.types.EmbedContentConfig](https://googleapis.github.io/python-genai/genai.html#genai.types.EmbedContentConfig)
**Returns:**
* `pxt.Array[(None,), float32]`: The corresponding embedding vector.
**Examples:**
Add a computed column with embeddings to an existing table with a `text` column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
    embedding=embed_content(t.text, model='gemini-embedding-2')
)
```
Add an embedding index on the `text` column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_embedding_index(
    t.text, embedding=embed_content.using(model='gemini-embedding-2')
)
```
## udf generate\_content()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
generate_content(
contents: pxt.Json,
*,
model: pxt.String,
config: pxt.Json | None = None,
tools: pxt.Json[(Json, ...)] | None = None
) -> pxt.Json
```
Generate content from the specified model.
Request throttling:
Applies the rate limit set in the config (section `gemini.rate_limits`; use the model id as the key). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install google-genai`
**Parameters:**
* **`contents`** (`pxt.Json`): The input content to generate from. Can be a prompt, or a list containing images and text
prompts, as described in: [https://ai.google.dev/gemini-api/docs/text-generation](https://ai.google.dev/gemini-api/docs/text-generation) and
[https://ai.google.dev/gemini-api/docs/image-generation](https://ai.google.dev/gemini-api/docs/image-generation) for image generation.
* **`model`** (`pxt.String`): The name of the model to use.
* **`config`** (`pxt.Json | None`): Configuration for generation, corresponding to keyword arguments of
`genai.types.GenerateContentConfig`. For details on the parameters, see:
[https://googleapis.github.io/python-genai/genai.html#genai.types.GenerateContentConfig](https://googleapis.github.io/python-genai/genai.html#genai.types.GenerateContentConfig)
* **`tools`** (`pxt.Json[(Json, ...)] | None`): An optional list of Pixeltable tools to use. It is also possible to specify tools manually via the
`config['tools']` parameter, but at most one of `config['tools']` or `tools` may be used.
**Returns:**
* `pxt.Json`: A dictionary containing the response and other metadata.
**Examples:**
Add a computed column that applies the model `gemini-2.5-flash`
to an existing Pixeltable column `tbl.prompt` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
    response=generate_content(tbl.prompt, model='gemini-2.5-flash')
)
```
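For text generation, the reply is nested inside the response JSON and may be split across several parts. The helper below is a sketch that assumes the `candidates[0].content.parts` layout used elsewhere in this section, with text stored under a `text` key. `extract_text` and the sample dictionary are illustrative, not part of Pixeltable.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Concatenate the text of all text parts in the first candidate."""
    parts = response['candidates'][0]['content']['parts']
    return ''.join(p['text'] for p in parts if p.get('text'))


# Hand-written sample in the assumed layout.
sample = {
    'candidates': [
        {
            'content': {
                'role': 'model',
                'parts': [{'text': 'A sunset '}, {'text': 'over the ocean.'}],
            },
            'finish_reason': 'STOP',
        }
    ]
}
text = extract_text(sample)
print(text)  # A sunset over the ocean.

# For single-part replies, JSON subscripting on the column would also work, e.g.:
#   tbl.add_computed_column(text=tbl.response.candidates[0].content.parts[0].text)
```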
Generate an image with a Nano Banana model (Gemini image-generation models such as
`gemini-3.1-flash-image-preview`) and extract the PIL image from the response using JSON
subscripting. Image bytes in `inline_data.data` are decoded into PIL images
automatically. Pass `response_modalities=['IMAGE']` so the response contains a
single image part:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
    response=generate_content(
        tbl.prompt,
        model='gemini-3.1-flash-image-preview',
        config={'response_modalities': ['IMAGE']},
    )
)
tbl.add_computed_column(
    image=tbl.response.candidates[0].content.parts[0].inline_data.data
)
```
## udf generate\_images()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
generate_images(
prompt: pxt.String,
*,
model: pxt.String,
config: pxt.Json | None = None
) -> pxt.Image
```
Generates images based on a text description and configuration. For additional details, see:
[https://ai.google.dev/gemini-api/docs/imagen](https://ai.google.dev/gemini-api/docs/imagen)
Note: This function is for Imagen models only. For Gemini image-generation models (Nano Banana,
e.g. `gemini-3.1-flash-image-preview`), use [`generate_content`](./gemini#func-generate_content)
instead.
Request throttling:
Applies the rate limit set in the config (section `imagen.rate_limits`; use the model id as the key). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install google-genai`
**Parameters:**
* **`prompt`** (`pxt.String`): A text description of the images to generate.
* **`model`** (`pxt.String`): The model to use.
* **`config`** (`pxt.Json | None`): Configuration for generation, corresponding to keyword arguments of
`genai.types.GenerateImagesConfig`. For details on the parameters, see:
[https://googleapis.github.io/python-genai/genai.html#genai.types.GenerateImagesConfig](https://googleapis.github.io/python-genai/genai.html#genai.types.GenerateImagesConfig)
**Returns:**
* `pxt.Image`: The generated image.
**Examples:**
Add a computed column that applies the model `imagen-4.0-generate-001`
to an existing Pixeltable column `tbl.prompt` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
    response=generate_images(tbl.prompt, model='imagen-4.0-generate-001')
)
```
## udf generate\_speech()
```python Signatures theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Signature 1:
@pxt.udf
generate_speech(
text: pxt.String,
model: pxt.String,
voice: pxt.String,
config: pxt.Json | None
) -> pxt.Audio
# Signature 2:
@pxt.udf
generate_speech(
text: pxt.String,
model: pxt.String,
voices: pxt.Json,
config: pxt.Json | None
) -> pxt.Audio
```
Generates speech audio from text using Gemini's text-to-speech capability. For additional details, see:
[https://ai.google.dev/gemini-api/docs/speech-generation](https://ai.google.dev/gemini-api/docs/speech-generation)
Request throttling:
Applies the rate limit set in the config (section `gemini.rate_limits`; use the model id as the key). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install google-genai`
**Parameters:**
* **`text`** (`String`): The text to synthesize into speech.
* **`model`** (`String`): The model to use (e.g. `'gemini-2.5-flash-preview-tts'`).
* **`voice`** (`String`): The voice profile to use. Supported voices include `'Kore'`, `'Puck'`, `'Charon'`,
`'Fenrir'`, `'Aoede'`, `'Leda'`, `'Orus'`, `'Zephyr'`, and others. See the
[speech generation docs](https://ai.google.dev/gemini-api/docs/speech-generation) for the full list.
Mutually exclusive with `voices`.
* **`voices`** (`Json`): A mapping from speaker alias (as used in the text) to voice name. For example,
`{'Alice': 'Kore', 'Bob': 'Puck'}`. Mutually exclusive with `voice`.
* **`config`** (`Json | None`, default: `None`): Additional configuration, corresponding to keyword arguments of
`genai.types.GenerateContentConfig`. Keys such as `response_modalities` and `speech_config`
are set automatically and should not be included.
**Returns:**
* `pxt.Audio`: An audio file (WAV, 24 kHz mono 16-bit) containing the synthesized speech.
**Examples:**
Add a computed column that generates speech from text:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
    audio=generate_speech(
        tbl.text, model='gemini-2.5-flash-preview-tts', voice='Kore'
    )
)
```
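For multi-speaker synthesis, the `voices` mapping has to cover every speaker alias that appears in the text. The sketch below checks that before any API call is made. It assumes a `Speaker: line` script layout, and `speakers_in` is a hypothetical helper written for this example.

```python
import re

voices = {'Alice': 'Kore', 'Bob': 'Puck'}
script = 'Alice: Did you see the launch?\nBob: I did, it was spectacular.'


def speakers_in(text: str) -> set[str]:
    """Collect the aliases that start a line as 'Alias:' (assumed script layout)."""
    return {m.group(1) for m in re.finditer(r'^(\w+):', text, flags=re.MULTILINE)}


# Aliases used in the script that have no voice assigned.
missing = speakers_in(script) - set(voices)
print(sorted(speakers_in(script)), sorted(missing))  # ['Alice', 'Bob'] []

# With every alias covered, the second signature could be used like this
# (`tbl.script` is a hypothetical column holding such scripts):
#   tbl.add_computed_column(
#       audio=generate_speech(
#           tbl.script, model='gemini-2.5-flash-preview-tts', voices=voices
#       )
#   )
```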
## udf generate\_videos()
```python Signatures theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Signature 1:
@pxt.udf
generate_videos(
prompt: pxt.String | None,
image: pxt.Image | None,
model: pxt.String,
config: pxt.Json | None
) -> pxt.Video
# Signature 2:
@pxt.udf
generate_videos(
prompt: pxt.String | None,
images: pxt.Json[(Image, ...)] | None,
model: pxt.String,
config: pxt.Json | None,
reference_types: pxt.Json[(String, ...)] | None
) -> pxt.Video
```
Generates videos based on a text description and configuration. For additional details, see:
[https://ai.google.dev/gemini-api/docs/video](https://ai.google.dev/gemini-api/docs/video)
At least one of `prompt` or `image` must be provided. When a single `image` is given, it is used as the first
frame of the generated video. When a list of `images` is given (second signature), the images are used as reference
images to guide the style or asset appearance throughout the video (Veo 3.1+).
Request throttling:
Applies the rate limit set in the config (section `veo.rate_limits`; use the model id as the key). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install google-genai`
**Parameters:**
* **`prompt`** (`String | None`, default: `None`): A text description of the videos to generate.
* **`image`** (`Image | None`, default: `None`): A single image to use as the first frame of the video. To pass a list of up to 3 reference images
for Veo 3.1, use `images` instead (see the second signature).
* **`model`** (`String`): The model to use.
* **`config`** (`Json | None`, default: `None`): Configuration for generation, corresponding to keyword arguments of
`genai.types.GenerateVideosConfig`. For details on the parameters, see:
[https://googleapis.github.io/python-genai/genai.html#genai.types.GenerateVideosConfig](https://googleapis.github.io/python-genai/genai.html#genai.types.GenerateVideosConfig)
**Returns:**
* `pxt.Video`: The generated video.
**Examples:**
Add a computed column that applies the model `veo-3.0-generate-001`
to an existing Pixeltable column `tbl.prompt` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
    response=generate_videos(tbl.prompt, model='veo-3.0-generate-001')
)
```
Use reference images with Veo 3.1 to guide video generation:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
    response=generate_videos(
        tbl.prompt,
        images=[tbl.ref_img1, tbl.ref_img2],
        reference_types=['asset', 'asset'],
        model='veo-3.1-generate-preview',
    )
)
```
## udf transcribe()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
transcribe(
audio: pxt.Audio,
*,
model: pxt.String,
prompt: pxt.String,
config: pxt.Json | None = None
) -> pxt.String
```
Transcribes audio to text using Gemini's audio understanding capability. For additional details, see:
[https://ai.google.dev/gemini-api/docs/audio](https://ai.google.dev/gemini-api/docs/audio)
Request throttling:
Applies the rate limit set in the config (section `gemini.rate_limits`; use the model id as the key). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install google-genai`
**Parameters:**
* **`audio`** (`pxt.Audio`): The audio file to transcribe.
* **`model`** (`pxt.String`): The model to use (e.g. `'gemini-2.5-flash'`).
* **`prompt`** (`pxt.String`): The instruction prompt sent alongside the audio. For example,
`'Generate a transcript of the speech.'` or `'Summarize the audio content.'`.
* **`config`** (`pxt.Json | None`): Additional configuration, corresponding to keyword arguments of
`genai.types.GenerateContentConfig`.
**Returns:**
* `pxt.String`: The transcribed text.
**Examples:**
Add a computed column that transcribes audio:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
    transcript=transcribe(
        tbl.audio, model='gemini-2.5-flash', prompt='Generate a transcript of the speech.'
    )
)
```
# groq
Source: https://docs.pixeltable.com/sdk/latest/groq
# module pixeltable.functions.groq
Pixeltable UDFs
that wrap various endpoints from the Groq API. In order to use them, you must
first `pip install groq` and configure your Groq credentials, as described in
the [Working with Groq](https://docs.pixeltable.com/notebooks/integrations/working-with-groq) tutorial.
## func invoke\_tools()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
invoke_tools(
tools: pixeltable.func.tools.Tools,
response: pixeltable.exprs.expr.Expr
) -> pixeltable.exprs.inline_expr.InlineDict
```
Converts an OpenAI response dict to Pixeltable tool invocation format and calls `tools._invoke()`.
## udf chat\_completions()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
chat_completions(
messages: pxt.Json[(Json, ...)],
*,
model: pxt.String,
model_kwargs: pxt.Json | None = None,
tools: pxt.Json[(Json, ...)] | None = None,
tool_choice: pxt.Json | None = None
) -> pxt.Json
```
Chat Completion API.
Equivalent to the Groq `chat/completions` API endpoint.
For additional details, see: [https://console.groq.com/docs/api-reference#chat-create](https://console.groq.com/docs/api-reference#chat-create)
Request throttling:
Applies the rate limit set in the config (section `groq`, key `rate_limit`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install groq`
**Parameters:**
* **`messages`** (`pxt.Json[(Json, ...)]`): A list of messages comprising the conversation so far.
* **`model`** (`pxt.String`): ID of the model to use. (See overview here: [https://console.groq.com/docs/models](https://console.groq.com/docs/models))
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the Groq `chat/completions` API.
For details on the available parameters, see: [https://console.groq.com/docs/api-reference#chat-create](https://console.groq.com/docs/api-reference#chat-create)
**Returns:**
* `pxt.Json`: A dictionary containing the response and other metadata.
**Examples:**
Add a computed column that applies the model `llama-3.1-8b-instant`
to an existing Pixeltable column `tbl.prompt` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
messages = [{'role': 'user', 'content': tbl.prompt}]
tbl.add_computed_column(
response=chat_completions(messages, model='llama-3.1-8b-instant')
)
```
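The `response` column holds the raw JSON returned by the endpoint. As a minimal sketch, assuming the response follows the OpenAI-compatible layout that Groq documents for `chat/completions` (a `choices` list whose entries carry a `message` with `content`), the generated text can be pulled out as shown here. The `extract_content` helper and the `sample_response` dict are hypothetical and exist only for illustration.

```python
from typing import Any, Optional


def extract_content(response: dict[str, Any]) -> Optional[str]:
    """Return the text of the first choice, or None if there are no choices."""
    choices = response.get('choices') or []
    if not choices:
        return None
    return choices[0].get('message', {}).get('content')


# Hypothetical, hand-written response used only to exercise the helper.
sample_response = {
    'model': 'llama-3.1-8b-instant',
    'choices': [
        {
            'index': 0,
            'message': {'role': 'assistant', 'content': 'Hello!'},
            'finish_reason': 'stop',
        }
    ],
}

print(extract_content(sample_response))
```

Within Pixeltable, the same path can usually be written directly on the JSON column, for example `tbl.response['choices'][0]['message']['content']`, so that the text lands in its own computed column.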
# huggingface
Source: https://docs.pixeltable.com/sdk/latest/huggingface
# module pixeltable.functions.huggingface
Pixeltable UDFs
that wrap various models from the Hugging Face `transformers` package.
These UDFs will cause Pixeltable to invoke the relevant models locally. In order to use them, you must
first `pip install transformers` (or in some cases, `sentence-transformers`, as noted in the specific
UDFs).
## UDFs
## udf automatic\_speech\_recognition()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
automatic_speech_recognition(
audio: pxt.Audio,
*,
model_id: pxt.String,
language: pxt.String | None = None,
chunk_length_s: pxt.Int | None = None,
return_timestamps: pxt.Bool = False
) -> pxt.String
```
Transcribes speech to text using a pretrained ASR model. `model_id` should be a reference to a
pretrained [automatic-speech-recognition model](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition).
This is a **generic function** that works with many ASR model families. For production use with
specific models, consider specialized functions like `whisper.transcribe()` or
`speech2text_for_conditional_generation()`.
**Requirements:**
* `pip install torch transformers torchaudio torchcodec`
**Recommended Models:**
* **OpenAI Whisper**: `openai/whisper-tiny.en`, `openai/whisper-small`, `openai/whisper-base`
* **Facebook Wav2Vec2**: `facebook/wav2vec2-base-960h`, `facebook/wav2vec2-large-960h-lv60-self`
* **Microsoft SpeechT5**: `microsoft/speecht5_asr`
* **Meta MMS (Multilingual)**: `facebook/mms-1b-all`
**Parameters:**
* **`audio`** (`pxt.Audio`): The audio file(s) to transcribe.
* **`model_id`** (`pxt.String`): The pretrained ASR model to use.
* **`language`** (`pxt.String | None`): Language code for multilingual models (e.g., 'en', 'es', 'fr').
* **`chunk_length_s`** (`pxt.Int | None`): Maximum length of audio chunks in seconds for long audio processing.
* **`return_timestamps`** (`pxt.Bool`): Whether to return word-level timestamps (model dependent).
**Returns:**
* `pxt.String`: The transcribed text.
**Examples:**
Add a computed column that transcribes audio files:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
transcription=automatic_speech_recognition(
tbl.audio_file,
model_id='openai/whisper-tiny.en', # Recommended
)
)
```
Transcribe with language specification:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
transcription=automatic_speech_recognition(
tbl.audio_file, model_id='facebook/mms-1b-all', language='en'
)
)
```
## udf clip()
```python Signatures theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Signature 1:
@pxt.udf
clip(
text: pxt.String,
model_id: pxt.String
) -> pxt.Array[(None,), float32]
# Signature 2:
@pxt.udf
clip(
image: pxt.Image,
model_id: pxt.String
) -> pxt.Array[(None,), float32]
```
Computes a CLIP embedding for the specified text or image. `model_id` should be a reference to a pretrained
[CLIP Model](https://huggingface.co/docs/transformers/model_doc/clip).
**Requirements:**
* `pip install torch transformers`
**Parameters:**
* **`text`** (`pxt.String`): The string to embed (Signature 1).
* **`image`** (`pxt.Image`): The image to embed (Signature 2).
* **`model_id`** (`pxt.String`): The pretrained model to use for the embedding.
**Returns:**
* `pxt.Array[(None,), float32]`: An array containing the output of the embedding model.
**Examples:**
Add a computed column that applies the model `openai/clip-vit-base-patch32` to an existing
Pixeltable column `tbl.text` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
result=clip(tbl.text, model_id='openai/clip-vit-base-patch32')
)
```
The same would work with an image column `tbl.image` in place of `tbl.text`.
## udf cross\_encoder()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
cross_encoder(
sentences1: pxt.String,
sentences2: pxt.String,
*,
model_id: pxt.String
) -> pxt.Float
```
Computes a prediction score for the given sentence pair.
`model_id` should be a pretrained Cross-Encoder model, as described in the
[Cross-Encoder Pretrained Models](https://www.sbert.net/docs/cross_encoder/pretrained_models.html)
documentation.
**Requirements:**
* `pip install torch sentence-transformers`
**Parameters:**
* **`sentences1`** (`pxt.String`): The first sentence to be paired.
* **`sentences2`** (`pxt.String`): The second sentence to be paired.
* **`model_id`** (`pxt.String`): The identifier of the cross-encoder model to use.
**Returns:**
* `pxt.Float`: The similarity score between the inputs.
**Examples:**
Add a computed column that applies the model `ms-marco-MiniLM-L-4-v2` to the sentences in
columns `tbl.sentence1` and `tbl.sentence2`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
result=cross_encoder(
tbl.sentence1, tbl.sentence2, model_id='ms-marco-MiniLM-L-4-v2'
)
)
```
## udf detr\_for\_object\_detection()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
detr_for_object_detection(
image: pxt.Image,
*,
model_id: pxt.String,
threshold: pxt.Float = 0.5,
revision: pxt.String | None = None
) -> DetrForObjectDetectionResponse
```
Computes DETR object detections for the specified image. `model_id` should be a reference to a pretrained
[DETR Model](https://huggingface.co/docs/transformers/model_doc/detr).
**Requirements:**
* `pip install torch transformers`
**Parameters:**
* **`image`** (`pxt.Image`): The image in which to detect objects.
* **`model_id`** (`pxt.String`): The pretrained model to use for object detection.
* **`threshold`** (`pxt.Float`): Confidence threshold for filtering detections.
* **`revision`** (`pxt.String | None`): The specific model revision to use (e.g., a branch, tag, or git identifier). If not specified,
uses the default revision for the model (typically `'main'`).
**Returns:**
* `DetrForObjectDetectionResponse`: A dictionary containing the output of the object detection model, in the following format:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{
# list of confidence scores for each detected object
'scores': [0.99, 0.999],
# list of COCO class labels for each detected object
'labels': [25, 25],
# corresponding text names of class labels
'label_text': ['giraffe', 'giraffe'],
# list of bounding boxes for each detected object, as [x1, y1, x2, y2]
'boxes': [
[51.942, 356.174, 181.481, 413.975],
[383.225, 58.66, 605.64, 361.346],
],
}
```
**Examples:**
Add a computed column that applies the model `facebook/detr-resnet-50` to an existing
Pixeltable column `image` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
detections=detr_for_object_detection(
tbl.image, model_id='facebook/detr-resnet-50', threshold=0.8
)
)
```
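The response stores its results as parallel lists, so entry `i` of `scores`, `label_text`, and `boxes` all describe the same detection. The sketch below pairs them up and keeps only the detections above a stricter score cutoff. It operates on a plain Python dict copied from the response format shown above; `filter_detections` is a hypothetical helper, not part of Pixeltable.

```python
from typing import Any


def filter_detections(
    detections: dict[str, Any], min_score: float
) -> list[dict[str, Any]]:
    """Pair up the parallel lists and keep detections scoring at least min_score."""
    rows = zip(
        detections['scores'], detections['label_text'], detections['boxes']
    )
    return [
        {'score': score, 'label': label, 'box': box}
        for score, label, box in rows
        if score >= min_score
    ]


# Sample value copied from the documented response format.
sample = {
    'scores': [0.99, 0.999],
    'labels': [25, 25],
    'label_text': ['giraffe', 'giraffe'],
    'boxes': [
        [51.942, 356.174, 181.481, 413.975],
        [383.225, 58.66, 605.64, 361.346],
    ],
}

for det in filter_detections(sample, min_score=0.995):
    print(det['label'], det['score'], det['box'])
```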
## udf detr\_for\_segmentation()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
detr_for_segmentation(
image: pxt.Image,
*,
model_id: pxt.String,
threshold: pxt.Float = 0.5
) -> pxt.Json
```
Computes DETR panoptic segmentation for the specified image. `model_id` should be a reference to a pretrained
[DETR Model](https://huggingface.co/docs/transformers/model_doc/detr) with a segmentation head.
**Requirements:**
* `pip install torch transformers timm`
**Parameters:**
* **`image`** (`pxt.Image`): The image to segment.
* **`model_id`** (`pxt.String`): The pretrained model to use for segmentation (e.g., 'facebook/detr-resnet-50-panoptic').
* **`threshold`** (`pxt.Float`): Confidence threshold for filtering segments.
**Returns:**
* `pxt.Json`: A dictionary containing the output of the segmentation model, in the following format:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{
'segmentation': np.ndarray, # (H, W) array where each pixel value is a segment ID
'segments_info': [
{
'id': 1, # segment ID (matches pixel values in segmentation array)
'label_id': 0, # class label index
'label_text': 'person', # human-readable class name
'score': 0.98, # confidence score
'was_fused': False, # whether segment was fused from multiple instances
},
...,
],
}
```
**Examples:**
Add a computed column that applies the model `facebook/detr-resnet-50-panoptic` to an existing
Pixeltable column `image` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
segmentation=detr_for_segmentation(
tbl.image,
model_id='facebook/detr-resnet-50-panoptic',
threshold=0.5,
)
)
```
## udf detr\_to\_coco()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
detr_to_coco(image: pxt.Image, detr_info: pxt.Json) -> pxt.Json
```
Converts the output of a DETR object detection model to COCO format.
**Parameters:**
* **`image`** (`pxt.Image`): The image for which detections were computed.
* **`detr_info`** (`pxt.Json`): The output of a DETR object detection model, as returned by `detr_for_object_detection`.
**Returns:**
* `pxt.Json`: A dictionary containing the data from `detr_info`, converted to COCO format.
**Examples:**
Add a computed column that converts the output `tbl.detections` to COCO format, where `tbl.image`
is the image for which detections were computed:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
detections_coco=detr_to_coco(tbl.image, tbl.detections)
)
```
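COCO annotations conventionally store a bounding box as `[x, y, width, height]`, whereas `detr_for_object_detection` reports corner coordinates `[x1, y1, x2, y2]`. The sketch below shows only that coordinate change for a single box. The helper name `xyxy_to_xywh` is hypothetical, and the snippet makes no claim about the other keys in the dictionary that `detr_to_coco` returns.

```python
def xyxy_to_xywh(box: list[float]) -> list[float]:
    """Convert corner coordinates [x1, y1, x2, y2] to [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


print(xyxy_to_xywh([10, 20, 110, 220]))
```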
## udf image\_captioning()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
image_captioning(
image: pxt.Image,
*,
model_id: pxt.String,
model_kwargs: pxt.Json | None = None
) -> pxt.String
```
Generates captions for images using a pretrained image captioning model. `model_id` should be a reference to a
pretrained [image-to-text model](https://huggingface.co/models?pipeline_tag=image-to-text) such as BLIP,
GIT, or LLaVA.
**Requirements:**
* `pip install torch transformers`
**Parameters:**
* **`image`** (`pxt.Image`): The image to caption.
* **`model_id`** (`pxt.String`): The pretrained model to use for captioning.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword arguments to pass to the model's `generate` method, such as `max_length`.
**Returns:**
* `pxt.String`: The generated caption text.
**Examples:**
Add a computed column `caption` to an existing table `tbl` that generates captions using the
`Salesforce/blip-image-captioning-base` model:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
caption=image_captioning(
tbl.image,
model_id='Salesforce/blip-image-captioning-base',
model_kwargs={'max_length': 30},
)
)
```
## udf image\_to\_image()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
image_to_image(
image: pxt.Image,
prompt: pxt.String,
*,
model_id: pxt.String,
seed: pxt.Int | None = None,
model_kwargs: pxt.Json | None = None
) -> pxt.Image
```
Transforms input images based on text prompts using a pretrained image-to-image model.
`model_id` should be a reference to a pretrained
[image-to-image model](https://huggingface.co/models?pipeline_tag=image-to-image) such as
Stable Diffusion.
**Requirements:**
* `pip install torch transformers diffusers accelerate`
**Parameters:**
* **`image`** (`pxt.Image`): The input image to transform.
* **`prompt`** (`pxt.String`): The text prompt describing the desired transformation.
* **`model_id`** (`pxt.String`): The pretrained image-to-image model to use.
* **`seed`** (`pxt.Int | None`): Random seed for reproducibility.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword arguments to pass to the model, such as `strength`,
`guidance_scale`, or `num_inference_steps`.
**Returns:**
* `pxt.Image`: The transformed image.
**Examples:**
Add a computed column that transforms images based on prompts:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
transformed=image_to_image(
tbl.source_image,
tbl.transformation_prompt,
model_id='stable-diffusion-v1-5/stable-diffusion-v1-5',
)
)
```
With custom transformation strength:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
transformed=image_to_image(
tbl.source_image,
tbl.transformation_prompt,
model_id='stable-diffusion-v1-5/stable-diffusion-v1-5',
model_kwargs={'strength': 0.75, 'num_inference_steps': 50},
)
)
```
## udf image\_to\_video()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
image_to_video(
image: pxt.Image,
*,
model_id: pxt.String,
num_frames: pxt.Int = 25,
fps: pxt.Int = 6,
seed: pxt.Int | None = None,
model_kwargs: pxt.Json | None = None
) -> pxt.Video
```
Generates videos from input images using a pretrained image-to-video model.
`model_id` should be a reference to a pretrained
[image-to-video model](https://huggingface.co/models?pipeline_tag=image-to-video).
**Requirements:**
* `pip install torch transformers diffusers accelerate`
**Parameters:**
* **`image`** (`pxt.Image`): The input image to animate into a video.
* **`model_id`** (`pxt.String`): The pretrained image-to-video model to use.
* **`num_frames`** (`pxt.Int`): Number of video frames to generate.
* **`fps`** (`pxt.Int`): Frames per second for the output video.
* **`seed`** (`pxt.Int | None`): Random seed for reproducibility.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword arguments to pass to the model, such as `num_inference_steps`,
`motion_bucket_id`, or `guidance_scale`.
**Returns:**
* `pxt.Video`: The generated video file.
**Examples:**
Add a computed column that creates videos from images:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
video=image_to_video(
tbl.input_image,
model_id='stabilityai/stable-video-diffusion-img2vid-xt',
num_frames=25,
fps=7,
)
)
```
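Together, `num_frames` and `fps` determine how long the generated clip plays. As a small arithmetic sketch, assuming every generated frame is written to the output at the requested rate (the helper `clip_duration_seconds` is hypothetical):

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Playback length of a clip with num_frames frames shown at fps frames per second."""
    if fps <= 0:
        raise ValueError('fps must be positive')
    return num_frames / fps


# The example above requests 25 frames at 7 fps; the defaults are 25 frames at 6 fps.
print(round(clip_duration_seconds(25, 7), 2))
print(round(clip_duration_seconds(25, 6), 2))
```

Lowering `fps` stretches the same frames over a longer clip, while raising `num_frames` increases generation cost.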
## udf question\_answering()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
question_answering(
context: pxt.String,
question: pxt.String,
*,
model_id: pxt.String
) -> pxt.Json
```
Answers questions based on provided context using a pretrained QA model. `model_id` should be a reference to a
pretrained [question answering model](https://huggingface.co/models?pipeline_tag=question-answering) such as
BERT or RoBERTa.
**Requirements:**
* `pip install torch transformers`
**Parameters:**
* **`context`** (`pxt.String`): The context text containing the answer.
* **`question`** (`pxt.String`): The question to answer.
* **`model_id`** (`pxt.String`): The pretrained QA model to use.
**Returns:**
* `pxt.Json`: A dictionary containing the answer, confidence score, and start/end positions.
**Examples:**
Add a computed column that answers questions based on document context:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
answer=question_answering(
tbl.document_text,
tbl.question,
model_id='deepset/roberta-base-squad2',
)
)
```
## udf sentence\_transformer()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
sentence_transformer(
sentence: pxt.String,
*,
model_id: pxt.String,
normalize_embeddings: pxt.Bool = False
) -> pxt.Array[(None,), float32]
```
Computes sentence embeddings. `model_id` should be a pretrained Sentence Transformers model, as described
in the [Sentence Transformers Pretrained Models](https://sbert.net/docs/sentence_transformer/pretrained_models.html)
documentation.
**Requirements:**
* `pip install torch sentence-transformers`
**Parameters:**
* **`sentence`** (`pxt.String`): The sentence to embed.
* **`model_id`** (`pxt.String`): The pretrained model to use for the encoding.
* **`normalize_embeddings`** (`pxt.Bool`): If `True`, normalizes embeddings to length 1; see the
[Sentence Transformers API Docs](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html)
for more details.
**Returns:**
* `pxt.Array[(None,), float32]`: An array containing the output of the embedding model.
**Examples:**
Add a computed column that applies the model `all-mpnet-base-v2` to an existing Pixeltable column `tbl.sentence`
of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
result=sentence_transformer(
tbl.sentence, model_id='all-mpnet-base-v2'
)
)
```
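To illustrate what `normalize_embeddings=True` buys you: once vectors are scaled to length 1, their dot product equals their cosine similarity, which simplifies downstream similarity comparisons. The sketch below demonstrates this with plain Python lists standing in for embeddings; `l2_normalize` and `dot` are hypothetical helpers, not Pixeltable or Sentence Transformers functions.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale vec to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy two-dimensional "embeddings" chosen so the arithmetic is easy to follow.
a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

# For unit-length vectors, the dot product is the cosine similarity.
print(dot(a, b))
```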
## udf speech2text\_for\_conditional\_generation()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
speech2text_for_conditional_generation(
audio: pxt.Audio,
*,
model_id: pxt.String,
language: pxt.String | None = None
) -> pxt.String
```
Transcribes or translates speech to text using a Speech2Text model. `model_id` should be a reference to a
pretrained [Speech2Text](https://huggingface.co/docs/transformers/en/model_doc/speech_to_text) model.
**Requirements:**
* `pip install torch torchaudio sentencepiece transformers`
**Parameters:**
* **`audio`** (`pxt.Audio`): The audio clip to transcribe or translate.
* **`model_id`** (`pxt.String`): The pretrained model to use for the transcription or translation.
* **`language`** (`pxt.String | None`): If using a multilingual translation model, the language code to translate to. If not provided,
the model's default language will be used. If the model is not a translation model, is not a
multilingual model, or does not support the specified language, an error will be raised.
**Returns:**
* `pxt.String`: The transcribed or translated text.
**Examples:**
Add a computed column that applies the model `facebook/s2t-small-librispeech-asr` to an existing
Pixeltable column `audio` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
transcription=speech2text_for_conditional_generation(
tbl.audio, model_id='facebook/s2t-small-librispeech-asr'
)
)
```
Add a computed column that applies the model `facebook/s2t-medium-mustc-multilingual-st` to an existing
Pixeltable column `audio` of the table `tbl`, translating the audio to French:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
translation=speech2text_for_conditional_generation(
tbl.audio,
model_id='facebook/s2t-medium-mustc-multilingual-st',
language='fr',
)
)
```
## udf summarization()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
summarization(
text: pxt.String,
*,
model_id: pxt.String,
model_kwargs: pxt.Json | None = None
) -> pxt.String
```
Summarizes text using a pretrained summarization model. `model_id` should be a reference to a pretrained
[summarization model](https://huggingface.co/models?pipeline_tag=summarization) such as BART, T5, or Pegasus.
**Requirements:**
* `pip install torch transformers`
**Parameters:**
* **`text`** (`pxt.String`): The text to summarize.
* **`model_id`** (`pxt.String`): The pretrained model to use for summarization.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword arguments to pass to the model's `generate` method, such as `max_length`.
**Returns:**
* `pxt.String`: The generated summary text.
**Examples:**
Add a computed column that summarizes documents:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
summary=summarization(
tbl.document_text,
model_id='facebook/bart-large-cnn',
model_kwargs={'max_length': 100},
)
)
```
## udf text\_classification()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
text_classification(
text: pxt.String,
*,
model_id: pxt.String,
top_k: pxt.Int = 5
) -> pxt.Json[(Json, ...)]
```
Classifies text using a pretrained classification model. `model_id` should be a reference to a pretrained
[text classification model](https://huggingface.co/models?pipeline_tag=text-classification)
such as BERT, RoBERTa, or DistilBERT.
**Requirements:**
* `pip install torch transformers`
**Parameters:**
* **`text`** (`pxt.String`): The text to classify.
* **`model_id`** (`pxt.String`): The pretrained model to use for classification.
* **`top_k`** (`pxt.Int`): The number of top predictions to return.
**Returns:**
* `pxt.Json[(Json, ...)]`: A list of dictionaries containing classification results with scores, labels, and label text.
**Examples:**
Add a computed column for sentiment analysis:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
sentiment=text_classification(
tbl.review_text,
model_id='cardiffnlp/twitter-roberta-base-sentiment-latest',
)
)
```
## udf text\_generation()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
text_generation(
text: pxt.String,
*,
model_id: pxt.String,
model_kwargs: pxt.Json | None = None
) -> pxt.String
```
Generates text using a pretrained language model. `model_id` should be a reference to a pretrained
[text generation model](https://huggingface.co/models?pipeline_tag=text-generation).
**Requirements:**
* `pip install torch transformers`
**Parameters:**
* **`text`** (`pxt.String`): The input text to continue/complete.
* **`model_id`** (`pxt.String`): The pretrained model to use for text generation.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword arguments to pass to the model's `generate` method, such as `max_length`,
`temperature`, etc. See the
[Hugging Face text\_generation documentation](https://huggingface.co/docs/inference-providers/en/tasks/text-generation)
for details.
**Returns:**
* `pxt.String`: The generated text completion.
**Examples:**
Add a computed column that generates text completions using the `Qwen/Qwen3-0.6B` model:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
completion=text_generation(
tbl.prompt,
model_id='Qwen/Qwen3-0.6B',
model_kwargs={'temperature': 0.5, 'max_length': 150},
)
)
```
## udf text\_to\_image()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
text_to_image(
prompt: pxt.String,
*,
model_id: pxt.String,
height: pxt.Int = 512,
width: pxt.Int = 512,
seed: pxt.Int | None = None,
model_kwargs: pxt.Json | None = None
) -> pxt.Image
```
Generates images from text prompts using a pretrained text-to-image model. `model_id` should be a reference to a
pretrained [text-to-image model](https://huggingface.co/models?pipeline_tag=text-to-image) such as
Stable Diffusion.
**Requirements:**
* `pip install torch transformers diffusers accelerate`
**Parameters:**
* **`prompt`** (`pxt.String`): The text prompt describing the desired image.
* **`model_id`** (`pxt.String`): The pretrained text-to-image model to use.
* **`height`** (`pxt.Int`): Height of the generated image in pixels.
* **`width`** (`pxt.Int`): Width of the generated image in pixels.
* **`seed`** (`pxt.Int | None`): Optional random seed for reproducibility.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword arguments to pass to the model, such as `num_inference_steps`,
`guidance_scale`, or `negative_prompt`.
**Returns:**
* `pxt.Image`: The generated image.
**Examples:**
Add a computed column that generates images from text prompts:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
generated_image=text_to_image(
tbl.prompt,
model_id='stable-diffusion-v1-5/stable-diffusion-v1-5',
height=512,
width=512,
model_kwargs={'num_inference_steps': 25},
)
)
```
## udf text\_to\_speech()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
text_to_speech(
text: pxt.String,
*,
model_id: pxt.String,
speaker_id: pxt.Int | None = None,
vocoder: pxt.String | None = None
) -> pxt.Audio
```
Converts text to speech using a pretrained TTS model. `model_id` should be a reference to a
pretrained [text-to-speech model](https://huggingface.co/models?pipeline_tag=text-to-speech).
**Requirements:**
* `pip install torch transformers datasets soundfile`
**Parameters:**
* **`text`** (`pxt.String`): The text to convert to speech.
* **`model_id`** (`pxt.String`): The pretrained TTS model to use.
* **`speaker_id`** (`pxt.Int | None`): Speaker ID for multi-speaker models.
* **`vocoder`** (`pxt.String | None`): Optional vocoder model for higher quality audio.
**Returns:**
* `pxt.Audio`: The generated audio file.
**Examples:**
Add a computed column that converts text to speech:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
audio=text_to_speech(
tbl.text_content, model_id='microsoft/speecht5_tts', speaker_id=0
)
)
```
## udf token\_classification()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
token_classification(
text: pxt.String,
*,
model_id: pxt.String,
aggregation_strategy: pxt.String = 'simple'
) -> pxt.Json[(Json, ...)]
```
Extracts named entities from text using a pretrained named entity recognition (NER) model.
`model_id` should be a reference to a pretrained
[token classification model](https://huggingface.co/models?pipeline_tag=token-classification) for NER.
**Requirements:**
* `pip install torch transformers`
**Parameters:**
* **`text`** (`pxt.String`): The text to analyze for named entities.
* **`model_id`** (`pxt.String`): The pretrained model to use.
* **`aggregation_strategy`** (`pxt.String`): Method used to aggregate tokens.
**Returns:**
* `pxt.Json[(Json, ...)]`: A list of dictionaries containing entity information (text, label, confidence, start, end).
**Examples:**
Add a computed column that extracts named entities:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
entities=token_classification(
tbl.text,
model_id='dbmdz/bert-large-cased-finetuned-conll03-english',
)
)
```
## udf translation()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
translation(
text: pxt.String,
*,
model_id: pxt.String,
src_lang: pxt.String | None = None,
target_lang: pxt.String | None = None
) -> pxt.String
```
Translates text using a pretrained translation model. `model_id` should be a reference to a pretrained
[translation model](https://huggingface.co/models?pipeline_tag=translation) such as MarianMT or T5.
**Requirements:**
* `pip install torch transformers sentencepiece`
**Parameters:**
* **`text`** (`pxt.String`): The text to translate.
* **`model_id`** (`pxt.String`): The pretrained translation model to use.
* **`src_lang`** (`pxt.String | None`): Source language code (optional, can be inferred from model).
* **`target_lang`** (`pxt.String | None`): Target language code (optional, can be inferred from model).
**Returns:**
* `pxt.String`: The translated text.
**Examples:**
Add a computed column that translates text:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
french_text=translation(
tbl.english_text,
model_id='Helsinki-NLP/opus-mt-en-fr',
src_lang='en',
target_lang='fr',
)
)
```
## udf vit\_for\_image\_classification()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
vit_for_image_classification(
image: pxt.Image,
*,
model_id: pxt.String,
top_k: pxt.Int = 5
) -> pxt.Json
```
Computes image classifications for the specified image using a Vision Transformer (ViT) model.
`model_id` should be a reference to a pretrained [ViT Model](https://huggingface.co/docs/transformers/en/model_doc/vit).
**Note:** Be sure the model is a ViT model that is trained for image classification (that is, a model designed for
use with the
[ViTForImageClassification](https://huggingface.co/docs/transformers/en/model_doc/vit#transformers.ViTForImageClassification)
class), such as `google/vit-base-patch16-224`. General feature-extraction models such as
`google/vit-base-patch16-224-in21k` will not produce the desired results.
**Requirements:**
* `pip install torch transformers`
**Parameters:**
* **`image`** (`pxt.Image`): The image to classify.
* **`model_id`** (`pxt.String`): The pretrained model to use for the classification.
* **`top_k`** (`pxt.Int`): The number of classes to return.
**Returns:**
* `pxt.Json`: A dictionary containing the output of the image classification model, in the following format:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{
# list of probabilities of the top-k most likely classes
'scores': [0.325, 0.198, 0.105],
# list of class IDs for the top-k most likely classes
'labels': [340, 353, 386],
# corresponding text names of the top-k most likely classes
'label_text': [
'zebra',
'gazelle',
'African elephant, Loxodonta africana',
],
}
```
**Examples:**
Add a computed column that applies the model `google/vit-base-patch16-224` to an existing
Pixeltable column `image` of the table `tbl`, returning the 10 most likely classes for each image:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
image_class=vit_for_image_classification(
tbl.image, model_id='google/vit-base-patch16-224', top_k=10
)
)
```
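Because `scores`, `labels`, and `label_text` are parallel lists, picking the most likely class means finding the index of the highest score and reading the other lists at that position. A minimal sketch on a plain dict copied from the documented format (`top_prediction` is a hypothetical helper):

```python
from typing import Any


def top_prediction(result: dict[str, Any]) -> tuple[str, float]:
    """Return (label_text, score) for the highest-scoring class."""
    scores = result['scores']
    best = max(range(len(scores)), key=scores.__getitem__)
    return result['label_text'][best], scores[best]


# Sample value copied from the documented response format.
sample = {
    'scores': [0.325, 0.198, 0.105],
    'labels': [340, 353, 386],
    'label_text': [
        'zebra',
        'gazelle',
        'African elephant, Loxodonta africana',
    ],
}

print(top_prediction(sample))
```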
# image
Source: https://docs.pixeltable.com/sdk/latest/image
# module pixeltable.functions.image
Pixeltable UDFs for `ImageType`.
Example:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
t = pxt.get_table(...)
t.select(t.img_col.convert('L')).collect()
```
## iterator tile\_iterator()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.iterator
tile_iterator(
image: pxt.Image,
tile_size: pxt.Json[(Int, Int)],
*,
overlap: pxt.Json[(Int, Int)] = (0, 0)
)
```
Iterator over tiles of an image. Each image will be divided into tiles of size `tile_size`, and the tiles will be
iterated over in row-major order (left-to-right, then top-to-bottom). An optional `overlap` parameter may be
specified. If the tiles do not exactly cover the image, then the rightmost and bottommost tiles will be padded with
black pixels, so that the output images all have the exact size `tile_size`.
**Outputs**:
One row per tile, with the following columns:
* `tile` (`pxt.Image`): The image tile
* `tile_coord` (`pxt.Json`): The (x, y) coordinates of the tile in the grid of tiles
* `tile_box` (`pxt.Json`): The (x1, y1, x2, y2) pixel coordinates of the tile in the original image
**Parameters:**
* **`image`** (`pxt.Image`): Image to split into tiles.
* **`tile_size`** (`pxt.Json[(Int, Int)]`): Size of each tile, as a pair of integers `(width, height)`.
* **`overlap`** (`pxt.Json[(Int, Int)]`, default: `(0, 0)`): Amount of overlap between adjacent tiles, as a pair of integers `(width, height)`.
**Examples:**
This example assumes an existing table `tbl` with a column `img` of type `pxt.Image`.
Create a view that splits all images into 256x256 tiles with 32 pixels of overlap:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_view(
    'image_tiles',
    tbl,
    iterator=tile_iterator(
        tbl.img, tile_size=(256, 256), overlap=(32, 32)
    ),
)
```
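To make the tiling geometry concrete, the following pure-Python sketch computes the `tile_coord` and `tile_box` values for a given image size. It assumes that adjacent tiles are offset by `tile_size - overlap` along each axis; `tile_boxes` is a hypothetical helper written for illustration, and Pixeltable's actual implementation may differ in details.

```python
import math


def tile_boxes(width, height, tile_size, overlap=(0, 0)):
    """Return (tile_coord, tile_box) pairs in row-major order.

    Assumption: consecutive tiles are offset by (tile_size - overlap) per axis.
    """
    tile_w, tile_h = tile_size
    stride_x, stride_y = tile_w - overlap[0], tile_h - overlap[1]
    if stride_x <= 0 or stride_y <= 0:
        raise ValueError('overlap must be smaller than tile_size')
    # Number of tiles needed to cover the image along each axis
    nx = max(1, math.ceil((width - overlap[0]) / stride_x))
    ny = max(1, math.ceil((height - overlap[1]) / stride_y))
    tiles = []
    for j in range(ny):          # rows: top to bottom
        for i in range(nx):      # columns: left to right
            x1, y1 = i * stride_x, j * stride_y
            # The box may extend past the image edge; that region is padding
            tiles.append(((i, j), (x1, y1, x1 + tile_w, y1 + tile_h)))
    return tiles


tiles = tile_boxes(600, 300, tile_size=(256, 256), overlap=(32, 32))
for coord, box in tiles:
    print(coord, box)
```

For a 600x300 image this yields a 3x2 grid. The last tile's box extends beyond the image bounds, which is the region that would be padded.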
## udf alpha\_composite()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
alpha_composite(im1: pxt.Image, im2: pxt.Image) -> pxt.Image
```
Alpha composite `im2` over `im1`.
Equivalent to [`PIL.Image.alpha_composite()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.alpha_composite)
## udf b64\_encode()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
b64_encode(
img: pxt.Image,
image_format: pxt.String = 'png'
) -> pxt.String
```
Convert image to a base64-encoded string.
**Parameters:**
* **`img`** (`pxt.Image`): The image to encode.
* **`image_format`** (`pxt.String`, default: `'png'`): The image format to encode with; must be a format [supported by PIL](https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#fully-supported-formats).
## udf blend()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
blend(
im1: pxt.Image,
im2: pxt.Image,
alpha: pxt.Float
) -> pxt.Image
```
Return a new image by interpolating between two input images, using a constant alpha.
Equivalent to [`PIL.Image.blend()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.blend)
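As a rough illustration of what interpolating with a constant alpha means, Pillow documents the formula `out = im1 * (1.0 - alpha) + im2 * alpha`. The sketch below applies it to individual RGB pixels in plain Python; `blend_pixel` is a hypothetical helper (not part of Pixeltable or Pillow), and real implementations may round differently.

```python
def blend_pixel(p1, p2, alpha):
    """Interpolate two same-length pixel tuples channel by channel."""
    return tuple(round(c1 * (1.0 - alpha) + c2 * alpha) for c1, c2 in zip(p1, p2))


red, blue = (255, 0, 0), (0, 0, 255)
print(blend_pixel(red, blue, 0.0))   # alpha=0 keeps the first pixel
print(blend_pixel(red, blue, 1.0))   # alpha=1 keeps the second pixel
print(blend_pixel(red, blue, 0.25))  # mostly red, with some blue mixed in
```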
## udf composite()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
composite(
image1: pxt.Image,
image2: pxt.Image,
mask: pxt.Image
) -> pxt.Image
```
Return a composite image by blending two images using a mask.
Equivalent to [`PIL.Image.composite()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.composite)
## udf convert()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
convert(self: pxt.Image, mode: pxt.String) -> pxt.Image
```
Convert the image to a different mode.
Equivalent to
[`PIL.Image.Image.convert()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.convert).
**Parameters:**
* **`mode`** (`pxt.String`): The mode to convert to. See the
[Pillow documentation](https://pillow.readthedocs.io/en/stable/handbook/concepts.html#concept-modes)
for a list of supported modes.
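For intuition about the `'L'` (grayscale) mode used in the module example: Pillow documents that converting a color image to `'L'` uses the ITU-R 601-2 luma transform, `L = R * 299/1000 + G * 587/1000 + B * 114/1000`. The sketch below evaluates that formula for single pixels; `luma` is a hypothetical helper, and Pillow's internal rounding may differ slightly from Python's `round`.

```python
def luma(rgb):
    """Approximate the 'L' value of an RGB pixel using the ITU-R 601-2 weights."""
    r, g, b = rgb
    return round((r * 299 + g * 587 + b * 114) / 1000)


for name, pixel in [('white', (255, 255, 255)), ('red', (255, 0, 0)), ('black', (0, 0, 0))]:
    print(name, luma(pixel))
```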
## udf crop()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
crop(
self: pxt.Image,
box: pxt.Json[(Int, Int, Int, Int)]
) -> pxt.Image
```
Return a rectangular region from the image. The box is a 4-tuple defining the left, upper, right, and lower pixel
coordinates.
Equivalent to
[`PIL.Image.Image.crop()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.crop)
## udf effect\_spread()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
effect_spread(self: pxt.Image, distance: pxt.Int) -> pxt.Image
```
Randomly spread pixels in an image.
Equivalent to
[`PIL.Image.Image.effect_spread()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.effect_spread)
**Parameters:**
* **`distance`** (`pxt.Int`): The distance to spread pixels.
## udf entropy()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
entropy(
self: pxt.Image,
mask: pxt.Image | None = None,
extrema: pxt.Json[(Json, ...)] | None = None
) -> pxt.Float
```
Returns the entropy of the image, optionally using a mask and extrema.
Equivalent to
[`PIL.Image.Image.entropy()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.entropy)
**Parameters:**
* **`mask`** (`pxt.Image | None`): An optional mask image.
* **`extrema`** (`pxt.Json[(Json, ...)] | None`): An optional list of extrema.
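As a sketch of the quantity involved, and assuming the entropy is the Shannon entropy (in bits) of the image histogram, the calculation can be written in a few lines of plain Python. `histogram_entropy` is a hypothetical helper for illustration only.

```python
import math


def histogram_entropy(hist):
    """Shannon entropy, in bits, of a histogram given as a list of bin counts."""
    total = sum(hist)
    # Empty bins contribute nothing to the sum
    return -sum((n / total) * math.log2(n / total) for n in hist if n > 0)


print(histogram_entropy([10, 0, 0, 0]))  # a single occupied bin: no uncertainty
print(histogram_entropy([5, 5]))         # two equally likely values
print(histogram_entropy([1, 1, 1, 1]))   # four equally likely values
```

A uniform histogram maximizes the entropy, while an image with a single pixel value has entropy zero.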
## udf get\_metadata()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
get_metadata(self: pxt.Image) -> pxt.Json
```
Return metadata for the image.
## udf getbands()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
getbands(self: pxt.Image) -> pxt.Json[(String, ...)]
```
Return a tuple containing the names of the image bands.
Equivalent to
[`PIL.Image.Image.getbands()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.getbands)
## udf getbbox()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
getbbox(
self: pxt.Image,
*,
alpha_only: pxt.Bool = True
) -> pxt.Json[(Int, Int, Int, Int)] | None
```
Return a bounding box for the non-zero regions of the image.
Equivalent to [`PIL.Image.Image.getbbox()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.getbbox)
**Parameters:**
* **`alpha_only`** (`pxt.Bool`): If `True`, and the image has an alpha channel, trim transparent pixels. Otherwise,
trim pixels when all channels are zero.
## udf getchannel()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
getchannel(self: pxt.Image, channel: pxt.Int) -> pxt.Image
```
Return an L-mode image containing a single channel of the original image.
Equivalent to
[`PIL.Image.Image.getchannel()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.getchannel)
**Parameters:**
* **`channel`** (`pxt.Int`): The channel to extract. This is a 0-based index.
## udf getcolors()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
getcolors(
self: pxt.Image,
maxcolors: pxt.Int = 256
) -> pxt.Json[(Json[(Int, Int)], ...)]
```
Return a list of colors used in the image, up to a maximum of `maxcolors`.
Equivalent to
[`PIL.Image.Image.getcolors()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.getcolors)
**Parameters:**
* **`maxcolors`** (`pxt.Int`): The maximum number of colors to return.
## udf getextrema()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
getextrema(self: pxt.Image) -> pxt.Json[(Int, Int)]
```
Return a 2-tuple containing the minimum and maximum pixel values of the image.
Equivalent to
[`PIL.Image.Image.getextrema()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.getextrema)
## udf getpalette()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
getpalette(
self: pxt.Image,
mode: pxt.String | None = None
) -> pxt.Json[(Int, ...)] | None
```
Return the palette of the image, optionally converting it to a different mode.
Equivalent to
[`PIL.Image.Image.getpalette()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.getpalette)
**Parameters:**
* **`mode`** (`pxt.String | None`): The mode to convert the palette to.
## udf getpixel()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
getpixel(
self: pxt.Image,
xy: pxt.Json[(Json, ...)]
) -> pxt.Json[(Int,)]
```
Return the pixel value at the given position. The position `xy` is a tuple containing the x and y coordinates.
Equivalent to
[`PIL.Image.Image.getpixel()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.getpixel)
**Parameters:**
* **`xy`** (`pxt.Json[(Json, ...)]`): The coordinates, given as (x, y).
## udf getprojection()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
getprojection(self: pxt.Image) -> pxt.Json[(Json[(Int, ...)], Json[(Int, ...)])]
```
Return two sequences representing the horizontal and vertical projection of the image.
Equivalent to
[`PIL.Image.Image.getprojection()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.getprojection)
## udf height()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
height(self: pxt.Image) -> pxt.Int
```
Return the height of the image.
## udf histogram()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
histogram(
self: pxt.Image,
mask: pxt.Image | None = None,
extrema: pxt.Json[(Json, ...)] | None = None
) -> pxt.Json[(Int, ...)]
```
Return a histogram for the image.
Equivalent to
[`PIL.Image.Image.histogram()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.histogram)
**Parameters:**
* **`mask`** (`pxt.Image | None`): An optional mask image.
* **`extrema`** (`pxt.Json[(Json, ...)] | None`): An optional list of extrema.
## udf mode()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
mode(self: pxt.Image) -> pxt.String
```
Return the image mode.
## udf point()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
point(
self: pxt.Image,
lut: pxt.Json[(Int, ...)],
mode: pxt.String | None = None
) -> pxt.Image
```
Map image pixels through a lookup table.
Equivalent to
[`PIL.Image.Image.point()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.point)
**Parameters:**
* **`lut`** (`pxt.Json[(Int, ...)]`): A lookup table.
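To illustrate what a lookup table does, the sketch below builds a 256-entry table that inverts 8-bit values and applies it to a few sample pixel values in plain Python. The sample values are made up; for multi-band images, Pillow expects 256 entries per band.

```python
# A 256-entry lookup table that inverts 8-bit values: 0 -> 255, 255 -> 0
invert_lut = [255 - i for i in range(256)]

# Hypothetical single-band pixel values
pixels = [0, 64, 128, 255]

# Mapping a pixel through the table is a plain index lookup
mapped = [invert_lut[p] for p in pixels]
print(mapped)
```

A list like `invert_lut` is the kind of value that would be passed as the `lut` argument.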
## udf quantize()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
quantize(
self: pxt.Image,
colors: pxt.Int = 256,
method: pxt.Int | None = None,
kmeans: pxt.Int = 0,
palette: pxt.Image | None = None,
dither: pxt.Int =
) -> pxt.Image
```
Convert the image to 'P' mode with the specified number of colors.
Equivalent to
[`PIL.Image.Image.quantize()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.quantize)
**Parameters:**
* **`colors`** (`pxt.Int`): The number of colors to quantize to.
* **`method`** (`pxt.Int | None`): The quantization method. See the
[Pillow documentation](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.quantize)
for a list of supported methods.
* **`kmeans`** (`pxt.Int`): The number of k-means clusters to use.
* **`palette`** (`pxt.Image | None`): The palette to use.
* **`dither`** (`pxt.Int`): The dithering method. See the
[Pillow documentation](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.quantize)
for a list of supported methods.
## udf reduce()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
reduce(
self: pxt.Image,
factor: pxt.Int,
box: pxt.Json[(Int, Int, Int, Int)] | None = None
) -> pxt.Image
```
Reduce the image by the given factor.
Equivalent to
[`PIL.Image.Image.reduce()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.reduce)
**Parameters:**
* **`factor`** (`pxt.Int`): The reduction factor.
* **`box`** (`pxt.Json[(Int, Int, Int, Int)] | None`): An optional 4-tuple of ints providing the source image region to be reduced. The values must be within
the `(0, 0, width, height)` rectangle. If omitted or `None`, the entire source is used.
## udf resize()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
resize(self: pxt.Image, size: pxt.Json[(Int, Int)]) -> pxt.Image
```
Return a resized copy of the image. The size parameter is a tuple containing the width and height of the new image.
Equivalent to
[`PIL.Image.Image.resize()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.resize)
## udf rotate()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
rotate(self: pxt.Image, angle: pxt.Int) -> pxt.Image
```
Return a copy of the image rotated by the given angle.
Equivalent to
[`PIL.Image.Image.rotate()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.rotate)
**Parameters:**
* **`angle`** (`pxt.Int`): The angle to rotate the image, in degrees. Positive angles are counter-clockwise.
## udf thumbnail()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
thumbnail(
self: pxt.Image,
size: pxt.Json[(Int, Int)],
resample: pxt.Int = ,
reducing_gap: pxt.Float | None = 2.0
) -> pxt.Image
```
Create a thumbnail of the image.
Equivalent to
[`PIL.Image.Image.thumbnail()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.thumbnail)
**Parameters:**
* **`size`** (`pxt.Json[(Int, Int)]`): The size of the thumbnail, as a tuple of (width, height).
* **`resample`** (`pxt.Int`): The resampling filter to use. See the
[Pillow documentation](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.thumbnail)
for a list of supported filters.
* **`reducing_gap`** (`pxt.Float | None`, default: `2.0`): The reducing gap to use.
## udf to\_video()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
to_video(
image: pxt.Image,
*,
duration: pxt.Float,
fps: pxt.Int = 24,
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video
```
Convert a still image into a video of a specified duration with ffmpeg's `-loop` option.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`image`** (`pxt.Image`): Input image to convert to video.
* **`duration`** (`pxt.Float`): Duration of the output video in seconds.
* **`fps`** (`pxt.Int`): Frames per second for the output video.
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video`: A video displaying the input image for the specified duration.
**Examples:**
Create a 5-second video from an image:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.image.to_video(duration=5.0)).collect()
```
Create a 10-second video at 30 fps from a rotated image:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
    tbl.image.rotate(180).to_video(duration=10.0, fps=30)
).collect()
```
## udf transpose()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
transpose(self: pxt.Image, method: pxt.Int) -> pxt.Image
```
Transpose the image.
Equivalent to
[`PIL.Image.Image.transpose()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.transpose)
**Parameters:**
* **`method`** (`pxt.Int`): The transpose method. See the
[Pillow documentation](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.transpose)
for a list of supported methods.
## udf width()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
width(self: pxt.Image) -> pxt.Int
```
Return the width of the image.
# IndexMetadata
Source: https://docs.pixeltable.com/sdk/latest/indexmetadata
# class pixeltable.IndexMetadata
Metadata for an index on a Pixeltable table.
## attr columns
```
columns: list[str]
```
The table columns that are indexed.
## attr index\_type
```
index_type: Literal['embedding', 'btree']
```
The type of index. New types may be added in the future.
## attr name
```
name: str
```
The name of the index.
## attr parameters
```
parameters: EmbeddingIndexParams | None
```
Parameters specific to the index type. `None` for index types without parameters.
# io
Source: https://docs.pixeltable.com/sdk/latest/io
# module pixeltable.io
Functions for importing and exporting Pixeltable data.
## func create\_label\_studio\_project()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
create_label_studio_project(
t: Table,
label_config: str,
name: str | None = None,
title: str | None = None,
media_import_method: Literal['post', 'file', 'url'] = 'post',
col_mapping: dict[str, str] | None = None,
sync_immediately: bool = True,
s3_configuration: dict[str, Any] | None = None,
**kwargs: Any
) -> UpdateStatus
```
Create a new Label Studio project and link it to the specified [`Table`](./table).
* A tutorial notebook with fully worked examples can be found here:
[Using Label Studio for Annotations with Pixeltable](https://docs.pixeltable.com/notebooks/integrations/using-label-studio-with-pixeltable)
The required parameter `label_config` specifies the Label Studio project configuration,
in XML format, as described in the Label Studio documentation. The linked project will
have one column for each data field in the configuration; for example, if the
configuration has an entry
```
<Image name="image_obj" value="$image"/>
```
then the linked project will have a column named `image`. In addition, the linked project
will always have a JSON-typed column `annotations` representing the output.
By default, Pixeltable will link each of these columns to a column of the specified [`Table`](./table)
with the same name. If any of the data fields are missing, an exception will be raised. If
the `annotations` column is missing, it will be created. The default names can be overridden
by specifying an optional `col_mapping`, with Pixeltable column names as keys and Label
Studio field names as values. In all cases, the Pixeltable columns must have types that are
consistent with their corresponding Label Studio fields; otherwise, an exception will be raised.
The API key and URL for a valid Label Studio server must be specified in Pixeltable config. Either:
* Set the `LABEL_STUDIO_API_KEY` and `LABEL_STUDIO_URL` environment variables; or
* Specify `api_key` and `url` fields in the `label-studio` section of `$PIXELTABLE_HOME/config.toml`.
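A minimal sketch of the environment-variable option, set from Python before Pixeltable is used. Both values below are placeholders (assumptions), not real credentials; substitute the URL and API key of your own Label Studio server.

```python
import os

# Placeholder values (assumptions): replace with your own server URL and API key.
# setdefault() leaves any value that is already present in the environment untouched.
os.environ.setdefault('LABEL_STUDIO_URL', 'http://localhost:8080')
os.environ.setdefault('LABEL_STUDIO_API_KEY', 'replace-with-your-api-key')

print(os.environ['LABEL_STUDIO_URL'])
```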
**Requirements:**
* `pip install label-studio-sdk`
* `pip install boto3` (if using S3 import storage)
**Parameters:**
* **`t`** (`Table`): The table to link to.
* **`label_config`** (`str`): The Label Studio project configuration, in XML format.
* **`name`** (`str | None`): An optional name for the new project in Pixeltable. If specified, must be a valid
Pixeltable identifier and must not be the name of any other external data store
linked to `t`. If not specified, a default name will be used of the form
`ls_project_0`, `ls_project_1`, etc.
* **`title`** (`str | None`): An optional title for the Label Studio project. This is the title that annotators
will see inside Label Studio. Unlike `name`, it does not need to be an identifier and
does not need to be unique. If not specified, the table name `t.name` will be used.
* **`media_import_method`** (`Literal['post', 'file', 'url']`, default: `'post'`): The method to use when transferring media files to Label Studio:
* `post`: Media will be sent to Label Studio via HTTP post. This should generally only be used for
prototyping; due to restrictions in Label Studio, it can only be used with projects that have
just one data field, and does not scale well.
* `file`: Media will be sent to Label Studio as a file on the local filesystem. This method can be
used if Pixeltable and Label Studio are running on the same host.
* `url`: Media will be sent to Label Studio as externally accessible URLs. This method cannot be
used with local media files or with media generated by computed columns.
The default is `post`.
* **`col_mapping`** (`dict[str, str] | None`): An optional mapping of local column names to Label Studio fields.
* **`sync_immediately`** (`bool`, default: `True`): If `True`, immediately perform an initial synchronization by
exporting all rows of the table as Label Studio tasks.
* **`s3_configuration`** (`dict[str, Any] | None`): If specified, S3 import storage will be configured for the new project. This can only
be used with `media_import_method='url'`, and if `media_import_method='url'` and any of the media data is
referenced by `s3://` URLs, then it must be specified in order for such media to display correctly
in the Label Studio interface.
The items in the `s3_configuration` dictionary correspond to kwarg
parameters of the Label Studio `connect_s3_import_storage` method, as described in the
[Label Studio connect\_s3\_import\_storage docs](https://labelstud.io/sdk/project.html#label_studio_sdk.project.Project.connect_s3_import_storage).
`bucket` must be specified; all other parameters are optional. If credentials are not specified explicitly,
Pixeltable will attempt to retrieve them from the environment (such as from `~/.aws/credentials`).
If a title is not specified, Pixeltable will use the default `'Pixeltable-S3-Import-Storage'`.
All other parameters use their Label Studio defaults.
* **`kwargs`** (`Any`): Additional keyword arguments are passed to the `start_project` method in the Label
Studio SDK, as described in the
[Label Studio start\_project docs](https://labelstud.io/sdk/project.html#label_studio_sdk.project.Project.start_project).
**Returns:**
* `UpdateStatus`: An `UpdateStatus` representing the status of any synchronization operations that occurred.
**Examples:**
Create a Label Studio project whose tasks correspond to videos stored in the `video_col`
column of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
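# Illustrative config (an assumption): a video field bound to `video_col`, plus a choice control
config = """
<View><Video name="video_obj" value="$video_col"/>
<Choices name="category" toName="video_obj"><Choice value="city"/><Choice value="food"/></Choices></View>
"""
create_label_studio_project(tbl, config)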
```
Create a Label Studio project with the same configuration, using `media_import_method='url'`,
whose media are stored in an S3 bucket:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
create_label_studio_project(
    tbl,
    config,
    media_import_method='url',
    s3_configuration={'bucket': 'my-bucket', 'region_name': 'us-east-2'},
)
```
## func export\_csv()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
export_csv(
table_or_query: pxt.Table | pxt.Query,
file_path: str | Path,
*,
delimiter: str = ',',
quoting: int = 0
) -> None
```
Exports a query result or table to a CSV file.
Pixeltable column types are mapped to CSV values as follows:
* String, Int, Float, Bool: native CSV representation
* Timestamp, Date: ISO 8601 string representation
* UUID: string representation
* Json: JSON-encoded string
* Array: JSON-encoded string (via `tolist()`)
* Binary: excluded from export (not representable in CSV)
* Image, Video, Audio, Document: file path or URL string
**Parameters:**
* **`table_or_query`** (`pxt.Table | pxt.Query`): Table or Query to export.
* **`file_path`** (`str | Path`): Path to the output CSV file.
* **`delimiter`** (`str`, default: `','`): Field delimiter character. Default `','`.
* **`quoting`** (`int`, default: `0`): CSV quoting style (a `csv.QUOTE_*` constant). Default `csv.QUOTE_MINIMAL`.
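The `quoting` default of `0` is the value of `csv.QUOTE_MINIMAL` in Python's standard library. The sketch below uses only the standard `csv` module (not Pixeltable) to show how that setting affects output, and how a JSON-encoded string survives a round trip. The rows and column names are made up for illustration.

```python
import csv
import io
import json

# Hypothetical rows; the 'tags' values are JSON-encoded strings
rows = [
    {'id': 1, 'title': 'plain text', 'tags': json.dumps(['a', 'b'])},
    {'id': 2, 'title': 'needs, quoting', 'tags': json.dumps({'k': 'v'})},
]


def to_csv(rows, quoting):
    """Serialize rows to a CSV string with the given csv.QUOTE_* constant."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=['id', 'title', 'tags'], delimiter=',', quoting=quoting, lineterminator='\n'
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


minimal = to_csv(rows, csv.QUOTE_MINIMAL)  # quotes only fields that need it
quoted = to_csv(rows, csv.QUOTE_ALL)       # quotes every field
print(minimal)
print(quoted)

# Reading the file back recovers the JSON-encoded string intact
first = next(csv.DictReader(io.StringIO(minimal)))
print(json.loads(first['tags']))
```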
## func export\_images\_as\_fo\_dataset()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
export_images_as_fo_dataset(
tbl: pxt.Table,
images: exprs.Expr,
image_format: str = 'webp',
classifications: exprs.Expr | list[exprs.Expr] | dict[str, exprs.Expr] | None = None,
detections: exprs.Expr | list[exprs.Expr] | dict[str, exprs.Expr] | None = None
) -> fo.Dataset
```
Export images from a Pixeltable table as a Voxel51 dataset. The data must consist of a single column
(or expression) containing image data, along with optional additional columns containing labels. Currently, only
classification and detection labels are supported.
The [Working with Voxel51 in Pixeltable](https://docs.pixeltable.com/examples/vision/voxel51) tutorial contains a
fully worked example showing how to export data from a Pixeltable table and load it into Voxel51.
Images in the dataset that already exist on disk will be exported directly, in whatever format they
are stored in. Images that are not already on disk (such as frames extracted using a
[`frame_iterator`](./video#iterator-frame_iterator)) will first be written to disk in the specified
`image_format`.
The label parameters accept one or more sets of labels of each type. If a single `Expr` is provided, then it will
be exported as a single set of labels with a default name such as `classifications`.
(The single set of labels may still contain multiple individual labels; see below.)
If a list of `Expr`s is provided, then each one will be exported as a separate set of labels with a default name
such as `classifications`, `classifications_1`, etc. If a dictionary of `Expr`s is provided, then each entry will
be exported as a set of labels with the specified name.
**Requirements:**
* `pip install fiftyone`
**Parameters:**
* **`tbl`** (`pxt.Table`): The table from which to export data.
* **`images`** (`exprs.Expr`): A column or expression that contains the images to export.
* **`image_format`** (`str`, default: `'webp'`): The format to use when writing out images for export.
* **`classifications`** (`exprs.Expr | list[exprs.Expr] | dict[str, exprs.Expr] | None`): Optional image classification labels. If a single `Expr` is provided, it must be a table
column or an expression that evaluates to a list of dictionaries. Each dictionary in the list corresponds
to an image class and must have the following structure:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{'label': 'zebra', 'confidence': 0.325}
```
If multiple `Expr`s are provided, each one must evaluate to a list of such dictionaries.
* **`detections`** (`exprs.Expr | list[exprs.Expr] | dict[str, exprs.Expr] | None`): Optional image detection labels. If a single `Expr` is provided, it must be a table column or an
expression that evaluates to a list of dictionaries. Each dictionary in the list corresponds to an image
detection, and must have the following structure:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{
    'label': 'giraffe',
    'confidence': 0.99,
    # [x, y, w, h], fractional coordinates
    'bounding_box': [0.081, 0.836, 0.202, 0.136],
}
```
If multiple `Expr`s are provided, each one must evaluate to a list of such dictionaries.
**Returns:**
* `fo.Dataset`: A Voxel51 dataset.
**Examples:**
Export the images in the `image` column of the table `tbl` as a Voxel51 dataset, using classification
labels from `tbl.classifications`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
export_images_as_fo_dataset(
    tbl, tbl.image, classifications=tbl.classifications
)
```
See the [Working with Voxel51 in Pixeltable](https://docs.pixeltable.com/examples/vision/voxel51) tutorial
for a fully worked example.
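Model outputs often need reshaping before they match the label structures described above. The following plain-Python sketch converts a dictionary with `scores` and `label_text` lists (the shape shown for `vit_for_image_classification`) into the list-of-dictionaries form expected for `classifications`, and converts a pixel-coordinate box into the fractional `[x, y, w, h]` form expected for `detections`. Both helper names are hypothetical; in a Pixeltable pipeline, logic like this would typically be wrapped in a UDF.

```python
def to_fo_classifications(model_output):
    """Convert {'scores': [...], 'label_text': [...]} into label dictionaries."""
    return [
        {'label': text, 'confidence': score}
        for text, score in zip(model_output['label_text'], model_output['scores'])
    ]


def to_fractional_box(box, image_width, image_height):
    """Convert a pixel box (x1, y1, x2, y2) into fractional [x, y, w, h]."""
    x1, y1, x2, y2 = box
    return [
        x1 / image_width,
        y1 / image_height,
        (x2 - x1) / image_width,
        (y2 - y1) / image_height,
    ]


model_output = {
    'scores': [0.325, 0.198, 0.105],
    'labels': [340, 353, 386],
    'label_text': ['zebra', 'gazelle', 'African elephant, Loxodonta africana'],
}
classifications = to_fo_classifications(model_output)
print(classifications[0])

# A hypothetical 200x200 pixel box inside a 400x500 image
print(to_fractional_box((100, 50, 300, 250), image_width=400, image_height=500))
```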
## func export\_json()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
export_json(
table_or_query: pxt.Table | pxt.Query,
file_path: str | Path
) -> None
```
Exports a query result or table to a JSONL file.
Pixeltable column types are mapped to JSON values as follows:
* String: string
* Int: number
* Float: number
* Bool: boolean
* Timestamp: ISO 8601 string
* Date: ISO 8601 string
* UUID: string
* Json: native JSON value (object, array, etc.)
* Array: nested JSON array (via `tolist()`)
* Binary: excluded from export (not representable in JSON)
* Image, Video, Audio, Document: file path or URL string
**Parameters:**
* **`table_or_query`** (`pxt.Table | pxt.Query`): Table or Query to export.
* **`file_path`** (`str | Path`): Path to the output JSONL file.
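For readers unfamiliar with JSONL: the format stores one JSON object per line. The sketch below uses only the standard library (not Pixeltable) to write and re-read two made-up rows, rendering timestamps as ISO 8601 strings as in the mapping above. The exact strings that Pixeltable emits may differ.

```python
import io
import json
from datetime import datetime, timezone

# Hypothetical rows with a timestamp and a nested JSON value
rows = [
    {'id': 1, 'created': datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc), 'meta': {'tags': ['a', 'b']}},
    {'id': 2, 'created': datetime(2024, 6, 7, 8, 9, 10, tzinfo=timezone.utc), 'meta': None},
]


def encode(value):
    """Fallback encoder: render timestamps as ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f'Cannot encode {type(value).__name__}')


# Write one JSON object per line
buf = io.StringIO()
for row in rows:
    buf.write(json.dumps(row, default=encode) + '\n')

# Each line parses independently, so large files can be processed as a stream
lines = buf.getvalue().splitlines()
parsed = [json.loads(line) for line in lines]
print(parsed[0])
```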
## func export\_lancedb()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
export_lancedb(
table_or_query: pxt.Table | pxt.Query,
db_uri: Path,
table_name: str,
batch_size_bytes: int = 134217728,
if_exists: Literal['error', 'overwrite', 'append'] = 'error'
) -> None
```
Exports the data of a table or query to a LanceDB table.
The export uses LanceDB's streaming interface for efficient table creation, via a sequence of in-memory pyarrow
`RecordBatch` objects whose size can be controlled with the `batch_size_bytes` parameter.
**Requirements:**
* `pip install lancedb pylance`
**Parameters:**
* **`table_or_query`** (`pxt.Table | pxt.Query`): Table or Query to export.
* **`db_uri`** (`Path`): Local path to the LanceDB database.
* **`table_name`** (`str`): Name of the table in the LanceDB database.
* **`batch_size_bytes`** (`int`, default: `134217728`): Maximum size in bytes for each batch.
* **`if_exists`** (`Literal['error', 'overwrite', 'append']`, default: `'error'`): Determines the behavior if the table already exists. Must be one of the following:
* `'error'`: raise an error
* `'overwrite'`: overwrite the existing table
* `'append'`: append to the existing table
## func export\_parquet()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
export_parquet(
table_or_query: pxt.Table | pxt.Query,
parquet_path: Path,
partition_size_bytes: int = 100000000,
inline_images: bool = False,
_write_md: bool = False
) -> None
```
Exports a query result or table to one or more Parquet files. Requires pyarrow to be installed.
Pixeltable column types are mapped to Parquet types as follows:
* String: string
* Int: int64
* Float: float32
* Bool: bool
* Timestamp: timestamp\[us, tz=UTC]
* Date: date32
* UUID: uuid
* Binary: binary
* Image: binary (when `inline_images=True`)
* Audio, Video, Document: string (file paths)
* Array (requires shape to be known):
* fixed\_shape\_tensor for fixed-shape arrays
* list for ragged arrays (one or more dimensions are None)
* Json: struct
* Schema is inferred from data via `pyarrow.infer_type()`
* Fields that contain empty dicts cannot be mapped to a Parquet type and will result in an exception
**Parameters:**
* **`table_or_query`** (`pxt.Table | pxt.Query`): Table or Query to export.
* **`parquet_path`** (`Path`): Path to the directory to write the Parquet files to.
* **`partition_size_bytes`** (`int`, default: `100000000`): The maximum target size for each chunk. Default 100\_000\_000 bytes.
* **`inline_images`** (`bool`, default: `False`): If `True`, images are stored inline in the Parquet file. This is useful
for small images that will be imported as a PyTorch dataset, but it can be inefficient
for large images, and the resulting files cannot be imported into Pixeltable.
If `False`, an error is raised if the Query has any image column.
Default `False`.
## func import\_csv()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import_csv(
tbl_name: str,
filepath_or_buffer: str | os.PathLike,
schema_overrides: dict[str, Any] | None = None,
primary_key: str | list[str] | None = None,
comment: str | None = None,
**kwargs: Any
) -> pxt.Table
```
Creates a new base table from a CSV file. This is a convenience method and is equivalent
to calling `import_pandas(tbl_name, pd.read_csv(filepath_or_buffer, **kwargs), schema_overrides=schema_overrides)`.
See the Pandas documentation for [`read_csv`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html)
for more details.
**Returns:**
* `pxt.Table`: A handle to the newly created [`Table`](./table).
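A hedged usage sketch based on the signature above. The table name `'people'`, the column names `id` and `age`, and the `sep` argument are assumptions chosen for illustration; the function is only defined here, not called, and it requires Pixeltable to be installed when it is called.

```python
def import_people_csv(csv_path):
    """Create a Pixeltable table from a CSV file (all names here are hypothetical)."""
    import pixeltable as pxt  # imported lazily; requires Pixeltable to be installed

    return pxt.io.import_csv(
        'people',                           # tbl_name (hypothetical)
        csv_path,                           # filepath_or_buffer
        schema_overrides={'age': pxt.Int},  # assumes the file has an 'age' column
        primary_key='id',                   # assumes the file has an 'id' column
        sep=';',                            # extra keyword arguments go to pandas.read_csv
    )
```

Calling `import_people_csv('people.csv')` would then return a handle to the new table.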
## func import\_excel()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import_excel(
tbl_name: str,
io: str | os.PathLike,
*,
schema_overrides: dict[str, Any] | None = None,
primary_key: str | list[str] | None = None,
comment: str = '',
**kwargs: Any
) -> pxt.Table
```
Creates a new base table from an Excel (.xlsx) file. This is a convenience method and is
equivalent to calling `import_pandas(tbl_name, pd.read_excel(io, **kwargs), schema_overrides=schema_overrides)`.
See the Pandas documentation for [`read_excel`](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html)
for more details.
**Returns:**
* `pxt.Table`: A handle to the newly created [`Table`](./table).
## func import\_huggingface\_dataset()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import_huggingface_dataset(
table_path: str,
dataset: datasets.Dataset | datasets.DatasetDict | datasets.IterableDataset | datasets.IterableDatasetDict,
*,
schema_overrides: dict[str, Any] | None = None,
primary_key: str | list[str] | None = None,
**kwargs: Any
) -> pxt.Table
```
Create a new base table from a Hugging Face dataset, or dataset dict with multiple splits.
Requires the `datasets` library to be installed.
Hugging Face feature types are mapped to Pixeltable column types as follows:
* `Value(bool)`: `Bool`
* `Value(int*/uint*)`: `Int`
* `Value(float*)`: `Float`
* `Value(string/large_string)`: `String`
* `Value(timestamp*)`: `Timestamp`
* `Value(date*)`: `Date`
* `ClassLabel`: `String` (converted to label names)
* `Sequence`/`LargeList` of numeric types: `Array`
* `Sequence`/`LargeList` of string: `Json`
* `Sequence`/`LargeList` of dicts: `Json`
* `Array2D`-`Array5D`: `Array` (preserves shape)
* `Image`: `Image`
* `Audio`: `Audio`
* `Video`: `Video`
* `Translation`/`TranslationVariableLanguages`: `Json`
**Parameters:**
* **`table_path`** (`str`): Path to the table.
* **`dataset`** (`datasets.Dataset | datasets.DatasetDict | datasets.IterableDataset | datasets.IterableDatasetDict`): An instance of any of the Hugging Face dataset classes:
[`datasets.Dataset`](https://huggingface.co/docs/datasets/en/package_reference/main_classes#datasets.Dataset),
[`datasets.DatasetDict`](https://huggingface.co/docs/datasets/en/package_reference/main_classes#datasets.DatasetDict),
[`datasets.IterableDataset`](https://huggingface.co/docs/datasets/en/package_reference/main_classes#datasets.IterableDataset),
[`datasets.IterableDatasetDict`](https://huggingface.co/docs/datasets/en/package_reference/main_classes#datasets.IterableDatasetDict)
* **`schema_overrides`** (`dict[str, Any] | None`): If specified, then for each (name, type) pair in `schema_overrides`, the column with
name `name` will be given type `type`, instead of being inferred from the `Dataset` or `DatasetDict`.
The keys in `schema_overrides` should be the column names of the `Dataset` or `DatasetDict` (whether or not
they are valid Pixeltable identifiers).
* **`primary_key`** (`str | list[str] | None`): The primary key of the table (see [`create_table()`](./pixeltable#func-create_table)).
* **`kwargs`** (`Any`): Additional arguments to pass to `create_table`.
A `column_name_for_split` argument must be provided if the source is a `DatasetDict`; the column with this name
will store the split information. If it is `None`, no split information is stored.
**Returns:**
* `pxt.Table`: A handle to the newly created [`Table`](./table).
## func import\_json()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import_json(
tbl_path: str,
filepath_or_url: str,
*,
schema_overrides: dict[str, Any] | None = None,
primary_key: str | list[str] | None = None,
comment: str | None = None,
**kwargs: Any
) -> pxt.Table
```
Creates a new base table from a JSON file. This is a convenience method and is
equivalent to calling [`create_table()`](./pixeltable#func-create_table) with
`pxt.create_table(tbl_path, source=filepath_or_url, extra_args=kwargs, ...)`.
The contents of `filepath_or_url` are read and parsed as JSON internally (using `json.loads(**kwargs)`).
**Parameters:**
* **`tbl_path`** (`str`): The name of the table to create.
* **`filepath_or_url`** (`str`): The path or URL of the JSON file.
* **`schema_overrides`** (`dict[str, Any] | None`): If specified, then columns in `schema_overrides` will be given the specified types
(see [`import_rows()`](./io#func-import_rows)).
* **`primary_key`** (`str | list[str] | None`): The primary key of the table (see [`create_table()`](./pixeltable#func-create_table)).
* **`comment`** (`str | None`): A comment to attach to the table (see [`create_table()`](./pixeltable#func-create_table)).
* **`kwargs`** (`Any`): Additional keyword arguments to pass to `json.loads`.
**Returns:**
* `pxt.Table`: A handle to the newly created [`Table`](./table).
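Because `kwargs` are forwarded to `json.loads`, its standard keyword arguments can shape the parsed values before any schema is inferred. The sketch below uses only the standard library to show the effect of one such argument, `parse_int=float`. The payload is made up for illustration, and whether this option suits a given table is something to check against your own data.

```python
import json

# Hypothetical payload: the same field holds an integer in one record
# and a float in another.
payload = '[{"x": 1}, {"x": 2.5}]'

# Default parsing keeps the int/float distinction.
default_rows = json.loads(payload)

# parse_int=float converts every JSON integer literal to a Python float.
float_rows = json.loads(payload, parse_int=float)

print(type(default_rows[0]['x']).__name__)
print(type(float_rows[0]['x']).__name__)
```

With `import_json()`, the same keyword would be passed as an extra argument, e.g. `parse_int=float`, alongside `tbl_path` and `filepath_or_url`.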
## func import\_pandas()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import_pandas(
tbl_name: str,
df: pd.DataFrame,
*,
schema_overrides: dict[str, Any] | None = None,
primary_key: str | list[str] | None = None,
comment: str = ''
) -> pxt.Table
```
Creates a new base table from a Pandas
[`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), with the
specified name. The schema of the table will be inferred from the DataFrame.
The column names of the new table will be identical to those in the DataFrame, as long as they are valid
Pixeltable identifiers. If a column name is not a valid Pixeltable identifier, it will be normalized according to
the following procedure:
* first replace any non-alphanumeric characters with underscores;
* then, prefix the result with the letter 'c' if it begins with a number or an underscore;
* then, if there are any duplicate column names, suffix the duplicates with '\_2', '\_3', etc., in column order.
**Parameters:**
* **`tbl_name`** (`str`): The name of the table to create.
* **`df`** (`pd.DataFrame`): The Pandas `DataFrame`.
* **`schema_overrides`** (`dict[str, Any] | None`): If specified, then for each (name, type) pair in `schema_overrides`, the column with
name `name` will be given type `type`, instead of being inferred from the `DataFrame`. The keys in
`schema_overrides` should be the column names of the `DataFrame` (whether or not they are valid
Pixeltable identifiers).
**Returns:**
* `pxt.Table`: A handle to the newly created [`Table`](./table).
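The column-name normalization described above for `import_pandas()` can be sketched in plain Python. This is an illustration of the three documented steps, not Pixeltable's own implementation: the helper name `normalize_column_names` is hypothetical, and the validity check (a letter followed by letters, digits, or underscores) is an assumption about what counts as a valid identifier.

```python
import re


def normalize_column_names(names):
    """Apply the three documented normalization steps to a list of column names."""
    normalized = []
    for name in names:
        # Assumption: a valid identifier starts with a letter and contains
        # only letters, digits, and underscores. Valid names are kept as-is.
        if re.fullmatch(r'[A-Za-z][A-Za-z0-9_]*', name):
            normalized.append(name)
            continue
        # Step 1: replace non-alphanumeric characters with underscores.
        candidate = re.sub(r'[^A-Za-z0-9]', '_', name)
        # Step 2: prefix with 'c' if the result begins with a digit or underscore.
        if candidate[:1].isdigit() or candidate.startswith('_'):
            candidate = 'c' + candidate
        normalized.append(candidate)

    # Step 3: suffix duplicates with _2, _3, ... in column order.
    counts = {}
    result = []
    for name in normalized:
        counts[name] = counts.get(name, 0) + 1
        result.append(name if counts[name] == 1 else f'{name}_{counts[name]}')
    return result


print(normalize_column_names(['price ($)', '2023', 'name', 'na me', 'na-me']))
```

Running a helper like this on a DataFrame's column names before import is one way to preview the names the new table is likely to have.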
## func import\_parquet()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import_parquet(
table: str,
*,
parquet_path: str,
schema_overrides: dict[str, Any] | None = None,
primary_key: str | list[str] | None = None,
**kwargs: Any
) -> pxt.Table
```
Creates a new base table from a Parquet file or set of files. Requires pyarrow to be installed.
**Parameters:**
* **`table`** (`str`): Fully qualified name of the table to import the data into.
* **`parquet_path`** (`str`): Path to an individual Parquet file or directory of Parquet files.
* **`schema_overrides`** (`dict[str, Any] | None`): If specified, then for each (name, type) pair in `schema_overrides`, the column with
name `name` will be given type `type`, instead of being inferred from the Parquet dataset. The keys in
`schema_overrides` should be the column names of the Parquet dataset (whether or not they are valid
Pixeltable identifiers).
* **`primary_key`** (`str | list[str] | None`): The primary key of the table (see [`create_table()`](./pixeltable#func-create_table)).
* **`kwargs`** (`Any`): Additional arguments to pass to `create_table`.
**Returns:**
* `pxt.Table`: A handle to the newly created [`Table`](./table).
## func import\_rows()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import_rows(
tbl_path: str,
rows: list[dict[str, Any]],
*,
schema_overrides: dict[str, Any] | None = None,
primary_key: str | list[str] | None = None,
comment: str = ''
) -> pxt.Table
```
Creates a new base table from a list of dictionaries. The dictionaries must be of the
form `{column_name: value, ...}`. Pixeltable will attempt to infer the schema of the table from the
supplied data, using the most specific type that can represent all the values in a column.
If `schema_overrides` is specified, then for each entry `(column_name, type)` in `schema_overrides`,
Pixeltable will force the specified column to the specified type (and will not attempt any type inference
for that column).
All column types of the new table will be nullable unless explicitly specified as non-nullable in
`schema_overrides`.
**Parameters:**
* **`tbl_path`** (`str`): The qualified name of the table to create.
* **`rows`** (`list[dict[str, Any]]`): The list of dictionaries to import.
* **`schema_overrides`** (`dict[str, Any] | None`): If specified, then columns in `schema_overrides` will be given the specified types
as described above.
* **`primary_key`** (`str | list[str] | None`): The primary key of the table (see [`create_table()`](./pixeltable#func-create_table)).
* **`comment`** (`str`, default: `''`): A comment to attach to the table (see [`create_table()`](./pixeltable#func-create_table)).
**Returns:**
* `pxt.Table`: A handle to the newly created [`Table`](./table).
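To make "the most specific type that can represent all the values in a column" more concrete, here is a deliberately simplified sketch in plain Python. It is not Pixeltable's inference code: the helper `infer_column_types` and its rules (widen `Int` plus `Float` to `Float`, fall back to `Json` for any other mix, ignore `None` because columns are nullable) are assumptions chosen for illustration, and the real rules also cover types such as timestamps, arrays, and media.

```python
def _type_name(value):
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return 'Bool'
    if isinstance(value, int):
        return 'Int'
    if isinstance(value, float):
        return 'Float'
    if isinstance(value, str):
        return 'String'
    return 'Json'


def infer_column_types(rows):
    """Return {column_name: type_name} for a list of {column_name: value} dicts."""
    seen = {}
    for row in rows:
        for name, value in row.items():
            if value is None:
                continue  # None only affects nullability in this sketch
            seen.setdefault(name, set()).add(_type_name(value))

    inferred = {}
    for name, types in seen.items():
        if len(types) == 1:
            inferred[name] = next(iter(types))
        elif types == {'Int', 'Float'}:
            inferred[name] = 'Float'  # Float can represent both values
        else:
            inferred[name] = 'Json'  # assumed fallback for mixed values
    return inferred


rows = [
    {'id': 1, 'score': 1, 'name': 'a'},
    {'id': 2, 'score': 2.5, 'name': None},
]
print(infer_column_types(rows))
```

When the inferred type is not the one you want, `schema_overrides` bypasses inference for that column entirely.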
# jina
Source: https://docs.pixeltable.com/sdk/latest/jina
# module pixeltable.functions.jina
Pixeltable [UDFs](https://docs.pixeltable.com/platform/udfs-in-pixeltable) that wrap [Jina AI](https://jina.ai/) APIs
for embeddings and reranking. To use them, specify the API key either with the `JINA_API_KEY`
environment variable or as `api_key` in the `jina` section of the Pixeltable config file.
## udf embeddings()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
embeddings(
input: pxt.String,
*,
model: pxt.String,
task: pxt.String | None = None,
dimensions: pxt.Int | None = None,
late_chunking: pxt.Bool | None = None
) -> pxt.Array[(None,), float32]
```
Creates embedding vectors for the input text using Jina AI embedding models.
Equivalent to the Jina AI embeddings API endpoint.
For additional details, see: [https://jina.ai/embeddings/](https://jina.ai/embeddings/)
Request throttling:
Applies the rate limit set in the config (section `jina`, key `rate_limit`). If no rate
limit is configured, uses a default of 600 RPM.
**Parameters:**
* **`input`** (`pxt.String`): The text to embed.
* **`model`** (`pxt.String`): The Jina embedding model to use. See available models at
[https://jina.ai/embeddings/](https://jina.ai/embeddings/).
* **`task`** (`pxt.String | None`): Task-specific embedding optimization. Options:
* `retrieval.query`: For search queries
* `retrieval.passage`: For documents/passages to be searched
* `separation`: For clustering/separation tasks
* `classification`: For classification tasks
* `text-matching`: For semantic similarity
* **`dimensions`** (`pxt.Int | None`): Output embedding dimensions (optional). If not specified, uses
the model's default dimension.
* **`late_chunking`** (`pxt.Bool | None`): Enable late chunking for long documents.
**Returns:**
* `pxt.Array[(None,), float32]`: An array representing the embedding of `input`.
**Examples:**
Add a computed column that applies jina-embeddings-v3 to an existing column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
embed=jina.embeddings(
tbl.text, model='jina-embeddings-v3', task='retrieval.passage'
)
)
```
Add an embedding index:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_embedding_index(
'text', string_embed=jina.embeddings.using(model='jina-embeddings-v3')
)
```
## udf rerank()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
rerank(
query: pxt.String,
documents: pxt.Json[(String, ...)],
*,
model: pxt.String,
top_n: pxt.Int | None = None,
return_documents: pxt.Bool | None = None
) -> pxt.Json
```
Reranks documents based on their relevance to a query using Jina AI reranker models.
Equivalent to the Jina AI rerank API endpoint.
For additional details, see: [https://jina.ai/reranker/](https://jina.ai/reranker/)
Request throttling:
Applies the rate limit set in the config (section `jina`, key `rate_limit`). If no rate
limit is configured, uses a default of 600 RPM.
**Parameters:**
* **`query`** (`pxt.String`): The query string to rank documents against.
* **`documents`** (`pxt.Json[(String, ...)]`): The list of documents to rerank.
* **`model`** (`pxt.String`): The Jina reranker model to use. See available models at
[https://jina.ai/reranker/](https://jina.ai/reranker/).
* **`top_n`** (`pxt.Int | None`): Number of top results to return. If not specified, returns all documents.
* **`return_documents`** (`pxt.Bool | None`): Whether to include the original document text in results.
**Returns:**
* `pxt.Json`: A dictionary containing:
* `results`: List of reranking results with `index` and `relevance_score`
(and `document` if `return_documents=True`)
* `usage`: Token usage information
**Examples:**
Rerank search results for better relevance:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
reranked=jina.rerank(
tbl.query,
tbl.candidate_docs,
model='jina-reranker-v2-base-multilingual',
top_n=5,
)
)
```
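The dictionary returned by `rerank()` refers to documents by position, so a common follow-up step is to map each `index` back to the original text. The sketch below does this in plain Python. The helper `order_by_relevance` is hypothetical, and `sample_response` is a hand-written stand-in shaped like the documented return value (`results` with `index` and `relevance_score`, plus `usage`), not real API output.

```python
def order_by_relevance(documents, rerank_response):
    """Return (document, score) pairs sorted from most to least relevant."""
    results = sorted(
        rerank_response['results'],
        key=lambda r: r['relevance_score'],
        reverse=True,
    )
    # Each result's `index` is assumed to point into the input document list.
    return [(documents[r['index']], r['relevance_score']) for r in results]


docs = ['apples', 'databases', 'vector search']

# Hand-written sample; the scores and usage numbers are illustrative only.
sample_response = {
    'results': [
        {'index': 1, 'relevance_score': 0.42},
        {'index': 2, 'relevance_score': 0.91},
    ],
    'usage': {'total_tokens': 25},
}

ranked = order_by_relevance(docs, sample_response)
print(ranked)
```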
# json
Source: https://docs.pixeltable.com/sdk/latest/json
# module pixeltable.functions.json
Pixeltable UDFs for `JsonType`.
Example:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
import pixeltable.functions as pxtf
t = pxt.get_table(...)
t.select(pxtf.json.make_list(t.json_col)).collect()
```
## iterator list\_iterator()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.iterator
list_iterator(
elements: pxt.Json[(Json, ...)] | None = None,
*,
mode: pxt.String = 'strict',
**kwargs
)
```
Iterator over elements of a list or lists. There are two distinct call patterns: either a single positional
argument; or one or more keyword arguments.
* If a single positional argument is specified, as in `list_iterator(t.col)`, then the elements of `t.col` must
contain lists of dictionaries with matching signatures (identical keys and compatible value types). The
iterator will yield one new column for each key in the dictionaries, and one output row per element in the
lists.
* If multiple keyword arguments are specified, as in `list_iterator(val_1=t.col_1, val_2=t.col_2)`, then the
elements of each input column must contain lists, but not necessarily lists of dictionaries. The iterator
will yield one new column for each keyword argument, zipping together the individual lists.
All of the inputs must be *typed* `Json` expressions. Untyped Json will be rejected (the type schema is
necessary in order for Pixeltable to determine the types of the output columns).
**Parameters:**
* **`elements`** (`pxt.Json[(Json, ...)] | None`): A list of dictionaries to iterate over. The dictionary keys will be used as column names in the
output. Cannot be specified together with keyword arguments.
* **`mode`** (`pxt.String`, default: `'strict'`): Only applies when called with keyword arguments. Determines how to handle lists of different lengths:
* `'strict'`: Raises an error if the input lists have different lengths.
* `'truncated'`: Iterates until the shortest input list is exhausted, ignoring any remaining elements in
longer lists.
* `'padded'`: Iterates until the longest input list is exhausted, yielding `None` for any missing
elements from shorter lists.
* **`**kwargs`** (`Any`): One or more lists to iterate over. The kwarg names will be used as column names in the output.
Cannot be specified together with `elements`.
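The three `mode` values correspond to familiar ways of zipping lists. The following plain-Python sketch mimics the documented behavior for the keyword-argument call pattern; `zip_lists` is a hypothetical helper written for illustration and is not part of Pixeltable.

```python
from itertools import zip_longest


def zip_lists(mode='strict', **lists):
    """Zip named lists into row dicts, following the documented mode semantics."""
    names = list(lists)
    columns = list(lists.values())
    if mode == 'strict':
        # Reject inputs whose lengths differ.
        if len({len(column) for column in columns}) > 1:
            raise ValueError('input lists have different lengths')
        rows = zip(*columns)
    elif mode == 'truncated':
        # Stop at the shortest list.
        rows = zip(*columns)
    elif mode == 'padded':
        # Continue to the longest list, filling gaps with None.
        rows = zip_longest(*columns, fillvalue=None)
    else:
        raise ValueError(f'unknown mode: {mode!r}')
    return [dict(zip(names, row)) for row in rows]


truncated = zip_lists('truncated', a=[1, 2, 3], b=['x', 'y'])
padded = zip_lists('padded', a=[1, 2, 3], b=['x', 'y'])
print(truncated)
print(padded)
```

Each dictionary in the result plays the role of one output row, with one key per keyword argument.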
## uda make\_list()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.uda
make_list(*args, **kwargs) -> pxt.Json[(Json, ...)]
```
Collects arguments into a list.
## udf dumps()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
dumps(obj: pxt.Json) -> pxt.String
```
Serialize a JSON object to a string.
Equivalent to [`json.dumps()`](https://docs.python.org/3/library/json.html#json.dumps).
**Parameters:**
* **`obj`** (`pxt.Json`): A JSON-serializable object (dict, list, or scalar).
**Returns:**
* `pxt.String`: A JSON-formatted string.
# llama_cpp
Source: https://docs.pixeltable.com/sdk/latest/llama_cpp
# module pixeltable.functions.llama\_cpp
Pixeltable UDFs for llama.cpp models.
Provides integration with llama.cpp for running quantized language models locally,
supporting chat completions and embeddings with GGUF format models.
## udf create\_chat\_completion()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
create_chat_completion(
messages: pxt.Json[(Json, ...)],
*,
model_path: pxt.String | None = None,
repo_id: pxt.String | None = None,
repo_filename: pxt.String | None = None,
chat_format: pxt.String | None = None,
tools: pxt.Json[(Json, ...)] | None = None,
tool_choice: pxt.Json | None = None,
model_kwargs: pxt.Json | None = None
) -> pxt.Json
```
Generate a chat completion from a list of messages.
The model can be specified either as a local path, or as a repo\_id and repo\_filename that reference a pretrained
model on the Hugging Face model hub. Exactly one of `model_path` or `repo_id` must be provided; if `repo_id`
is provided, then an optional `repo_filename` can also be specified.
For additional details, see the
[llama\_cpp create\_chat\_completion documentation](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.create_chat_completion).
**Parameters:**
* **`messages`** (`pxt.Json[(Json, ...)]`): A list of messages to generate a response for.
* **`model_path`** (`pxt.String | None`): Path to the model (if using a local model).
* **`repo_id`** (`pxt.String | None`): The Hugging Face model repo id (if using a pretrained model).
* **`repo_filename`** (`pxt.String | None`): A filename or glob pattern to match the model file in the repo (optional, if using a
pretrained model).
* **`chat_format`** (`pxt.String | None`): An optional string specifying the chat format to use with the model.
* **`tools`** (`pxt.Json[(Json, ...)] | None`): An optional list of tools (functions) the model may call, specified as `pxt.func.tools.Tools`.
* **`tool_choice`** (`pxt.Json | None`): An optional `pxt.func.tools.ToolChoice` controlling which tool(s) the model should use.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the llama\_cpp `create_chat_completion` API, such as `max_tokens`,
`temperature`, `top_p`, and `top_k`. For details, see the
[llama\_cpp create\_chat\_completion documentation](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.create_chat_completion).
# math
Source: https://docs.pixeltable.com/sdk/latest/math
# module pixeltable.functions.math
Pixeltable UDFs for mathematical operations.
Example:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
t = pxt.get_table(...)
t.select(t.float_col.floor()).collect()
```
## udf abs()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
abs(self: pxt.Float) -> pxt.Float
```
Return the absolute value of the given number.
Equivalent to Python [`builtins.abs()`](https://docs.python.org/3/library/functions.html#abs).
## udf bitwise\_and()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
bitwise_and(self: pxt.Int, other: pxt.Int) -> pxt.Int
```
Bitwise AND of two integers.
Equivalent to Python
[`self & other`](https://docs.python.org/3/library/stdtypes.html#bitwise-operations-on-integer-types).
## udf bitwise\_or()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
bitwise_or(self: pxt.Int, other: pxt.Int) -> pxt.Int
```
Bitwise OR of two integers.
Equivalent to Python
[`self | other`](https://docs.python.org/3/library/stdtypes.html#bitwise-operations-on-integer-types).
## udf bitwise\_xor()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
bitwise_xor(self: pxt.Int, other: pxt.Int) -> pxt.Int
```
Bitwise XOR of two integers.
Equivalent to Python
[`self ^ other`](https://docs.python.org/3/library/stdtypes.html#bitwise-operations-on-integer-types).
## udf ceil()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
ceil(self: pxt.Float) -> pxt.Float
```
Return the ceiling of the given number.
Equivalent to Python [`float(math.ceil(self))`](https://docs.python.org/3/library/math.html#math.ceil) if `self`
is finite, or `self` itself if `self` is infinite. (This is slightly different from the default behavior of
`math.ceil(self)`, which always returns an `int` and raises an error if `self` is infinite. The behavior in
Pixeltable generalizes the Python operator and is chosen to align with the SQL standard.)
## udf floor()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
floor(self: pxt.Float) -> pxt.Float
```
Return the floor of the given number.
Equivalent to Python [`float(math.floor(self))`](https://docs.python.org/3/library/math.html#math.floor) if `self`
is finite, or `self` itself if `self` is infinite. (This is slightly different from the default behavior of
`math.floor(self)`, which always returns an `int` and raises an error if `self` is infinite. The behavior of
Pixeltable generalizes the Python operator and is chosen to align with the SQL standard.)
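The finite/infinite rule described for `ceil()` and `floor()` can be reproduced in plain Python as follows. The function names are hypothetical and the code is a sketch of the documented semantics, not Pixeltable's implementation; passing NaN through unchanged is an assumption, since the text only mentions infinities.

```python
import math


def float_floor(x: float) -> float:
    # math.floor(inf) raises OverflowError, so non-finite inputs are
    # returned unchanged (NaN pass-through is an assumption of this sketch).
    return float(math.floor(x)) if math.isfinite(x) else x


def float_ceil(x: float) -> float:
    return float(math.ceil(x)) if math.isfinite(x) else x


print(float_floor(2.7), float_ceil(2.1))
print(float_floor(math.inf), float_ceil(-math.inf))
```

Unlike `math.floor` and `math.ceil`, both helpers always return a `float`, which mirrors the `pxt.Float` return type in the signatures.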
## udf pow()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
pow(self: pxt.Int, other: pxt.Int) -> pxt.Float
```
Raise `self` to the power of `other`.
Equivalent to Python [`self ** other`](https://docs.python.org/3/library/functions.html#pow).
## udf round()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
round(
self: pxt.Float,
digits: pxt.Int | None = None
) -> pxt.Float
```
Round a number to a given precision in decimal digits.
Equivalent to Python [`builtins.round(self, digits or 0)`](https://docs.python.org/3/library/functions.html#round).
Note that if `digits` is not specified, the behavior matches `builtins.round(self, 0)` rather than
`builtins.round(self)`; this ensures that the return type is always `float` (as in SQL) rather than `int`.
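The difference between `builtins.round(self)` and `builtins.round(self, 0)` is easy to check in plain Python. The wrapper name `float_round` below is hypothetical; it simply applies the `digits or 0` expression quoted above.

```python
def float_round(x: float, digits=None) -> float:
    # `digits or 0` turns a missing precision into 0, so the result stays a float.
    return round(x, digits or 0)


print(type(round(2.5)).__name__)        # one-argument round returns an int
print(type(float_round(2.5)).__name__)  # the wrapper returns a float
print(float_round(3.14159, 2))
```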
# mistralai
Source: https://docs.pixeltable.com/sdk/latest/mistralai
# module pixeltable.functions.mistralai
Pixeltable UDFs
that wrap various endpoints from the Mistral AI API. In order to use them, you must
first `pip install mistralai` and configure your Mistral AI credentials, as described in
the [Working with Mistral AI](https://docs.pixeltable.com/notebooks/integrations/working-with-mistralai) tutorial.
## udf chat\_completions()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
chat_completions(
messages: pxt.Json[(Json, ...)],
*,
model: pxt.String,
model_kwargs: pxt.Json | None = None
) -> pxt.Json
```
Chat Completion API.
Equivalent to the Mistral AI `chat/completions` API endpoint.
For additional details, see: [https://docs.mistral.ai/api/#tag/chat](https://docs.mistral.ai/api/#tag/chat)
Request throttling:
Applies the rate limit set in the config (section `mistral`, key `rate_limit`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install mistralai`
**Parameters:**
* **`messages`** (`pxt.Json[(Json, ...)]`): The prompt(s) to generate completions for.
* **`model`** (`pxt.String`): ID of the model to use. (See overview here: [https://docs.mistral.ai/getting-started/models/](https://docs.mistral.ai/getting-started/models/))
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the Mistral `chat/completions` API.
For details on the available parameters, see: [https://docs.mistral.ai/api/#tag/chat](https://docs.mistral.ai/api/#tag/chat)
**Returns:**
* `pxt.Json`: A dictionary containing the response and other metadata.
**Examples:**
Add a computed column that applies the model `mistral-small-latest`
to an existing Pixeltable column `tbl.prompt` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
messages = [{'role': 'user', 'content': tbl.prompt}]
tbl.add_computed_column(
    response=chat_completions(messages, model='mistral-small-latest')
)
```
## udf embeddings()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
embeddings(
input: pxt.String,
*,
model: pxt.String
) -> pxt.Array[(None,), float32]
```
Embeddings API.
Equivalent to the Mistral AI `embeddings` API endpoint.
For additional details, see: [https://docs.mistral.ai/api/#tag/embeddings](https://docs.mistral.ai/api/#tag/embeddings)
Request throttling:
Applies the rate limit set in the config (section `mistral`, key `rate_limit`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install mistralai`
**Parameters:**
* **`input`** (`pxt.String`): Text to embed.
* **`model`** (`pxt.String`): ID of the model to use. (See overview here: [https://docs.mistral.ai/getting-started/models/](https://docs.mistral.ai/getting-started/models/))
**Returns:**
* `pxt.Array[(None,), float32]`: An array representing the application of the given embedding to `input`.
## udf fim\_completions()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
fim_completions(
prompt: pxt.String,
*,
model: pxt.String,
model_kwargs: pxt.Json | None = None
) -> pxt.Json
```
Fill-in-the-middle Completion API.
Equivalent to the Mistral AI `fim/completions` API endpoint.
For additional details, see: [https://docs.mistral.ai/api/#tag/fim](https://docs.mistral.ai/api/#tag/fim)
Request throttling:
Applies the rate limit set in the config (section `mistral`, key `rate_limit`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install mistralai`
**Parameters:**
* **`prompt`** (`pxt.String`): The text/code to complete.
* **`model`** (`pxt.String`): ID of the model to use. (See overview here: [https://docs.mistral.ai/getting-started/models/](https://docs.mistral.ai/getting-started/models/))
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the Mistral `fim/completions` API.
For details on the available parameters, see: [https://docs.mistral.ai/api/#tag/fim](https://docs.mistral.ai/api/#tag/fim)
**Returns:**
* `pxt.Json`: A dictionary containing the response and other metadata.
**Examples:**
Add a computed column that applies the model `codestral-latest`
to an existing Pixeltable column `tbl.prompt` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
    response=fim_completions(tbl.prompt, model='codestral-latest')
)
```
# net
Source: https://docs.pixeltable.com/sdk/latest/net
# module pixeltable.functions.net
Pixeltable UDF for converting media file URIs to presigned HTTP URLs.
## udf presigned\_url()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
presigned_url(uri: pxt.String, expiration_seconds: pxt.Int) -> pxt.String
```
Convert a blob storage URI to a presigned HTTP URL for direct access.
Generates a time-limited, publicly accessible URL from cloud storage URIs
(S3, GCS, Azure, etc.) that can be used to serve media files over HTTP.
Note:
This function uses presigned URLs from storage providers. Provider-specific
limitations apply:
* Google Cloud Storage: maximum 7-day expiration
* AWS S3: requires proper region configuration
* Azure: subject to storage account access policies
**Parameters:**
* **`uri`** (`pxt.String`): The media file URI (e.g., `s3://bucket/path`, `gs://bucket/path`, `azure://container/path`)
* **`expiration_seconds`** (`pxt.Int`): How long the URL remains valid
**Returns:**
* `pxt.String`: A presigned HTTP URL for accessing the file
**Examples:**
Generate a presigned URL for a video column with 1-hour expiration:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
original_url=tbl.video.fileurl,
presigned_url=pxtf.net.presigned_url(tbl.video.fileurl, 3600),
).collect()
```
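Since `expiration_seconds` is a plain number of seconds, it can help to derive it from a `timedelta` and check it against the provider limit mentioned in the note above. The helper below is a hypothetical convenience written for this page, not part of Pixeltable; the only limit it encodes is the 7-day maximum stated for Google Cloud Storage.

```python
from datetime import timedelta

# Limit taken from the note above: Google Cloud Storage allows at most 7 days.
GCS_MAX_EXPIRATION = timedelta(days=7)


def to_expiration_seconds(lifetime: timedelta, uri: str) -> int:
    """Convert a lifetime to whole seconds, rejecting values GCS would refuse."""
    seconds = int(lifetime.total_seconds())
    if seconds <= 0:
        raise ValueError('expiration must be positive')
    if uri.startswith('gs://') and lifetime > GCS_MAX_EXPIRATION:
        raise ValueError('Google Cloud Storage allows at most a 7-day expiration')
    return seconds


print(to_expiration_seconds(timedelta(hours=1), 's3://bucket/path'))
print(to_expiration_seconds(timedelta(days=7), 'gs://bucket/path'))
```

The resulting integer can then be passed as the second argument of `presigned_url()`.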
# ollama
Source: https://docs.pixeltable.com/sdk/latest/ollama
# module pixeltable.functions.ollama
Pixeltable UDFs for Ollama local models.
Provides integration with Ollama for running large language models locally,
including chat completions and embeddings.
## udf chat()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
chat(
messages: pxt.Json[(Json, ...)],
*,
model: pxt.String,
tools: pxt.Json[(Json, ...)] | None = None,
format: pxt.String | None = None,
options: pxt.Json | None = None
) -> pxt.Json
```
Generate the next message in a chat with a provided model.
**Parameters:**
* **`messages`** (`pxt.Json[(Json, ...)]`): The messages of the chat.
* **`model`** (`pxt.String`): The model name.
* **`tools`** (`pxt.Json[(Json, ...)] | None`): Tools for the model to use.
* **`format`** (`pxt.String | None`): The format of the response; must be one of `'json'` or `None`.
* **`options`** (`pxt.Json | None`): Additional options to pass to the `chat` call, such as `max_tokens`, `temperature`, `top_p`, and
`top_k`. For details, see the
[Valid Parameters and Values](https://github.com/ollama/ollama/blob/main/docs/modelfile.mdx#valid-parameters-and-values)
section of the Ollama documentation.
## udf embed()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
embed(
input: pxt.String,
*,
model: pxt.String,
truncate: pxt.Bool = True,
options: pxt.Json | None = None
) -> pxt.Array[(None,), float32]
```
Generate embeddings from a model.
**Parameters:**
* **`input`** (`pxt.String`): The input text to generate embeddings for.
* **`model`** (`pxt.String`): The model name.
* **`truncate`** (`pxt.Bool`): Truncates the end of each input to fit within context length.
Returns error if false and context length is exceeded.
* **`options`** (`pxt.Json | None`): Additional options to pass to the `embed` call.
For details, see the
[Valid Parameters and Values](https://github.com/ollama/ollama/blob/main/docs/modelfile.mdx#valid-parameters-and-values)
section of the Ollama documentation.
## udf generate()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
generate(
prompt: pxt.String,
*,
model: pxt.String,
suffix: pxt.String = '',
system: pxt.String = '',
template: pxt.String = '',
context: pxt.Json[(Int, ...)] | None = None,
raw: pxt.Bool = False,
format: pxt.String | None = None,
options: pxt.Json | None = None
) -> pxt.Json
```
Generate a response for a given prompt with a provided model.
**Parameters:**
* **`prompt`** (`pxt.String`): The prompt to generate a response for.
* **`model`** (`pxt.String`): The model name.
* **`suffix`** (`pxt.String`): The text after the model response.
* **`format`** (`pxt.String | None`): The format of the response; must be one of `'json'` or `None`.
* **`system`** (`pxt.String`): System message.
* **`template`** (`pxt.String`): Prompt template to use.
* **`context`** (`pxt.Json[(Int, ...)] | None`): The context parameter returned from a previous call to `generate()`.
* **`raw`** (`pxt.Bool`): If `True`, no formatting will be applied to the prompt.
* **`options`** (`pxt.Json | None`): Additional options for the Ollama `generate` call, such as `max_tokens`, `temperature`, `top_p`, and
`top_k`. For details, see the
[Valid Parameters and Values](https://github.com/ollama/ollama/blob/main/docs/modelfile.mdx#valid-parameters-and-values)
section of the Ollama documentation.
# openai
Source: https://docs.pixeltable.com/sdk/latest/openai
# module pixeltable.functions.openai
Pixeltable UDFs
that wrap various endpoints from the OpenAI API. In order to use them, you must
first `pip install openai` and configure your OpenAI credentials, as described in
the [Working with OpenAI](https://docs.pixeltable.com/notebooks/integrations/working-with-openai) tutorial.
## func invoke\_tools()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
invoke_tools(
tools: pixeltable.func.tools.Tools,
response: pixeltable.exprs.expr.Expr
) -> pixeltable.exprs.inline_expr.InlineDict
```
Converts an OpenAI response dict to Pixeltable tool invocation format and calls `tools._invoke()`.
## udf chat\_completions()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
chat_completions(
messages: pxt.Json[(Json, ...)],
*,
model: pxt.String,
model_kwargs: pxt.Json | None = None,
tools: pxt.Json[(Json, ...)] | None = None,
tool_choice: pxt.Json | None = None
) -> pxt.Json
```
Creates a model response for the given chat conversation.
Equivalent to the OpenAI `chat/completions` API endpoint.
For additional details, see: [https://platform.openai.com/docs/guides/chat-completions](https://platform.openai.com/docs/guides/chat-completions)
Request throttling:
Uses the rate limit-related headers returned by the API to throttle requests adaptively, based on available
request and token capacity. No configuration is necessary.
**Requirements:**
* `pip install openai`
**Parameters:**
* **`messages`** (`pxt.Json[(Json, ...)]`): A list of messages to use for chat completion, as described in the OpenAI API documentation.
* **`model`** (`pxt.String`): The model to use for chat completion.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the OpenAI `chat/completions` API. For details on the available
parameters, see: [https://platform.openai.com/docs/api-reference/chat/create](https://platform.openai.com/docs/api-reference/chat/create)
**Returns:**
* `pxt.Json`: A dictionary containing the response and other metadata.
**Examples:**
Add a computed column that applies the model `gpt-4o-mini` to an existing Pixeltable column `tbl.prompt`
of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
messages = [
{'role': 'system', 'content': 'You are a helpful assistant.'},
{'role': 'user', 'content': tbl.prompt},
]
tbl.add_computed_column(
response=chat_completions(messages, model='gpt-4o-mini')
)
```
You can also include images in the messages list by passing image data directly in the input dictionary, in
the `'image_url'` field of the message content, as in this example:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
messages = [
{
'role': 'user',
'content': [
{'type': 'text', 'text': "What's in this image?"},
{'type': 'image_url', 'image_url': tbl.image},
],
}
]
tbl.add_computed_column(
response=chat_completions(messages, model='gpt-4o-mini')
)
```
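The returned dictionary follows the OpenAI chat completion format, so the generated text sits under `choices[0].message.content`. The sketch below shows that lookup on a plain Python dictionary; the sample values are made up and `first_message_text` is a hypothetical helper:

```python
from typing import Optional

# Sample response shaped like an OpenAI chat completion (values are made up).
sample_response = {
    'id': 'chatcmpl-123',
    'model': 'gpt-4o-mini',
    'choices': [
        {
            'index': 0,
            'message': {'role': 'assistant', 'content': 'Hello!'},
            'finish_reason': 'stop',
        }
    ],
    'usage': {'prompt_tokens': 12, 'completion_tokens': 2, 'total_tokens': 14},
}


def first_message_text(response: dict) -> Optional[str]:
    """Return the text of the first choice, or None if there are no choices."""
    choices = response.get('choices') or []
    if not choices:
        return None
    return choices[0]['message'].get('content')
```

Inside Pixeltable, the same lookup is typically written as a JSON path expression on the computed column, for example `tbl.response.choices[0].message.content`.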
## udf embeddings()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
embeddings(
input: pxt.String,
*,
model: pxt.String,
model_kwargs: pxt.Json | None = None
) -> pxt.Array[(None,), float32]
```
Creates an embedding vector representing the input text.
Equivalent to the OpenAI `embeddings` API endpoint.
For additional details, see: [https://platform.openai.com/docs/guides/embeddings](https://platform.openai.com/docs/guides/embeddings)
Request throttling:
Uses the rate limit-related headers returned by the API to throttle requests adaptively, based on available
request and token capacity. No configuration is necessary.
**Requirements:**
* `pip install openai`
**Parameters:**
* **`input`** (`pxt.String`): The text to embed.
* **`model`** (`pxt.String`): The model to use for the embedding.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the OpenAI `embeddings` API. For details on the available
parameters, see: [https://platform.openai.com/docs/api-reference/embeddings](https://platform.openai.com/docs/api-reference/embeddings)
**Returns:**
* `pxt.Array[(None,), float32]`: An array representing the application of the given embedding to `input`.
**Examples:**
Add a computed column that applies the model `text-embedding-3-small` to an existing
Pixeltable column `tbl.text` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
embed=embeddings(tbl.text, model='text-embedding-3-small')
)
```
Add an embedding index to an existing column `text`, using the model `text-embedding-3-small`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_embedding_index(
tbl.text, embedding=embeddings.using(model='text-embedding-3-small')
)
```
## udf image\_edits()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
image_edits(
image: pxt.Image,
*,
prompt: pxt.String,
model: pxt.String,
mask: pxt.Image | None = None,
model_kwargs: pxt.Json | None = None
) -> pxt.Json
```
Creates an edited or extended image given a source image and a prompt.
Equivalent to the OpenAI `images/edits` API endpoint.
For additional details, see: [https://developers.openai.com/api/docs/guides/image-generation#edit-images](https://developers.openai.com/api/docs/guides/image-generation#edit-images)
Request throttling:
Applies the rate limit set in the config (section `openai.rate_limits`; use the model id as the key). If no rate
limit is configured, uses a default of 600 RPM.
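To make the lookup concrete, here is a minimal sketch of resolving a per-model limit with the 600 RPM fallback. It assumes requests are spaced evenly across each minute, which may differ from how Pixeltable actually schedules them; `min_request_interval` and the sample `rate_limits` mapping are hypothetical:

```python
DEFAULT_RPM = 600  # fallback stated in the documentation above


def min_request_interval(model: str, rate_limits: dict) -> float:
    """Seconds between requests for `model` under an evenly spaced RPM budget."""
    rpm = rate_limits.get(model, DEFAULT_RPM)
    return 60.0 / rpm


# Hypothetical contents of the `openai.rate_limits` config section,
# keyed by model id.
rate_limits = {'gpt-image-2': 120}
```

A model with a configured limit of 120 RPM would be called at most once every 0.5 seconds, and a model without an entry falls back to one request every 0.1 seconds.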
**Requirements:**
* `pip install openai`
**Parameters:**
* **`image`** (`pxt.Image`): The source image to edit.
* **`prompt`** (`pxt.String`): A text description of the desired edit.
* **`model`** (`pxt.String`): The model to use for image editing.
* **`mask`** (`pxt.Image | None`): An optional mask image. See: [https://developers.openai.com/api/reference/resources/images/methods/edit](https://developers.openai.com/api/reference/resources/images/methods/edit)
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the OpenAI `images/edits` API. For details on the available
parameters, see: [https://developers.openai.com/api/reference/resources/images/methods/edit](https://developers.openai.com/api/reference/resources/images/methods/edit)
**Returns:**
* `pxt.Json`: A dictionary containing the edited image data. Images will be deserialized into `PIL.Image.Image` objects,
and the result dictionary will have the following form:
```json theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{
"created": 1234567890,
"data": [
PIL.Image.Image(...),
...
],
"usage":
}
```
**Examples:**
Edit an image with a text prompt:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
edited=image_edits(
tbl.source_image,
prompt='Add a sunset background',
model='gpt-image-2',
)
)
```
Edit an image with a mask to specify the edit area:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
edited=image_edits(
tbl.source_image,
mask=tbl.mask_image,
prompt='Replace with a beach scene',
model='gpt-image-2',
)
)
```
## udf image\_generations()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
image_generations(
prompt: pxt.String,
*,
model: pxt.String,
model_kwargs: pxt.Json | None = None
) -> pxt.Json
```
Creates an image given a prompt.
Equivalent to the OpenAI `images/generations` API endpoint.
For additional details, see: [https://platform.openai.com/docs/guides/images](https://platform.openai.com/docs/guides/images)
Request throttling:
Applies the rate limit set in the config (section `openai.rate_limits`; use the model id as the key). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install openai`
**Parameters:**
* **`prompt`** (`pxt.String`): Prompt for the image.
* **`model`** (`pxt.String`): The model to use for the generations. See the OpenAI docs for supported models.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the OpenAI `images/generations` API. For details on the available
parameters, see: [https://developers.openai.com/api/reference/resources/images/methods/generate](https://developers.openai.com/api/reference/resources/images/methods/generate)
**Returns:**
* `pxt.Json`: A dictionary containing the generated image data. Images will be deserialized into `PIL.Image.Image` objects,
and the result dictionary will have the following form:
```json theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{
"created": 1234567890,
"data": [
PIL.Image.Image(...),
PIL.Image.Image(...),
...
],
"usage":
}
```
**Examples:**
Add a computed column that applies the model `dall-e-2` to an existing
Pixeltable column `tbl.text` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
gen_image=image_generations(tbl.text, model='dall-e-2')
)
```
Generate an image using the `gpt-image-2` model:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
gen_image=image_generations(tbl.text, model='gpt-image-2')
)
```
## udf image\_variations()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
image_variations(
image: pxt.Image,
*,
model: pxt.String,
model_kwargs: pxt.Json | None = None
) -> pxt.Json
```
Creates a variation of a given image.
Equivalent to the OpenAI `images/variations` API endpoint.
For additional details, see: [https://developers.openai.com/api/docs/guides/image-generation#image-variations](https://developers.openai.com/api/docs/guides/image-generation#image-variations)
Request throttling:
Applies the rate limit set in the config (section `openai.rate_limits`; use the model id as the key). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install openai`
**Parameters:**
* **`image`** (`pxt.Image`): The source image to create a variation of.
* **`model`** (`pxt.String`): The model to use for creating variations.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the OpenAI `images/variations` API. For details on the available
parameters, see: [https://developers.openai.com/api/reference/resources/images/methods/create\_variation](https://developers.openai.com/api/reference/resources/images/methods/create_variation)
**Returns:**
* `pxt.Json`: A dictionary containing the variation image data. Images will be deserialized into `PIL.Image.Image` objects,
and the result dictionary will have the following form:
```json theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{
"created": 1234567890,
"data": [
PIL.Image.Image(...),
...
]
}
```
**Examples:**
Generate a variation of an existing image:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
variation=image_variations(tbl.source_image, model='dall-e-2')
)
```
## udf moderations()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
moderations(
input: pxt.String,
*,
model: pxt.String = 'omni-moderation-latest'
) -> pxt.Json
```
Classifies whether text is potentially harmful.
Equivalent to the OpenAI `moderations` API endpoint.
For additional details, see: [https://platform.openai.com/docs/guides/moderation](https://platform.openai.com/docs/guides/moderation)
Request throttling:
Applies the rate limit set in the config (section `openai.rate_limits`; use the model id as the key). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install openai`
**Parameters:**
* **`input`** (`pxt.String`): Text to analyze with the moderations model.
* **`model`** (`pxt.String`): The model to use for moderations.
**Returns:**
* `pxt.Json`: Details of the moderations results.
**Examples:**
Add a computed column that applies the default moderation model to an existing Pixeltable column `tbl.input`
of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(moderations=moderations(tbl.input))
```
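The result follows the OpenAI moderation response format, with one entry per input under `results`. The following sketch lists the categories that were flagged; the sample values are made up and `flagged_categories` is a hypothetical helper:

```python
# Sample result shaped like an OpenAI moderation response (values are made up).
sample_result = {
    'id': 'modr-123',
    'model': 'omni-moderation-latest',
    'results': [
        {
            'flagged': True,
            'categories': {'harassment': True, 'violence': False},
            'category_scores': {'harassment': 0.91, 'violence': 0.02},
        }
    ],
}


def flagged_categories(result: dict) -> list:
    """Return the sorted names of all categories flagged for the first input."""
    first = result['results'][0]
    return sorted(name for name, hit in first['categories'].items() if hit)
```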
## udf responses()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
responses(
input: pxt.Json[(Json, ...)],
*,
model: pxt.String,
model_kwargs: pxt.Json | None = None,
tools: pxt.Json[(Json, ...)] | None = None,
tool_choice: pxt.Json | None = None
) -> pxt.Json
```
Creates a model response for the given input.
Equivalent to the OpenAI `responses` API endpoint.
For additional details, see: [https://developers.openai.com/api/docs/guides/migrate-to-responses](https://developers.openai.com/api/docs/guides/migrate-to-responses)
Request throttling:
Uses the rate limit-related headers returned by the API to throttle requests adaptively, based on available
request and token capacity. No configuration is necessary.
**Requirements:**
* `pip install openai`
**Parameters:**
* **`input`** (`pxt.Json[(Json, ...)]`): A list of input items for the model, as described in the OpenAI API documentation.
* **`model`** (`pxt.String`): The model to use for generating a response.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the OpenAI `responses` API. For details on the available
parameters, see: [https://developers.openai.com/api/docs/api-reference/responses/create](https://developers.openai.com/api/docs/api-reference/responses/create)
Common options include `instructions`, `temperature`, `max_output_tokens`, `reasoning`, `text`
(for structured outputs), `previous_response_id` (for multi-turn), and `store`.
**Returns:**
* `pxt.Json`: A dictionary containing the response and other metadata. The response text is accessible via
the `output_text` field for simple cases, or through the `output` list for structured access.
**Examples:**
Add a computed column that applies the model `gpt-4o-mini` to an existing Pixeltable column `tbl.prompt`
of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
messages = [{'role': 'user', 'content': tbl.prompt}]
tbl.add_computed_column(
response=responses(
messages,
model='gpt-4o-mini',
model_kwargs={'instructions': 'You are a helpful assistant.'},
)
)
```
You can also include images in the input list by passing image data directly, using the Responses API
`input_image` content type:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
messages = [
{
'role': 'user',
'content': [
{'type': 'input_text', 'text': "What's in this image?"},
{'type': 'input_image', 'image_url': tbl.image},
],
}
]
tbl.add_computed_column(response=responses(messages, model='gpt-4o-mini'))
```
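For structured access, the text can be collected by walking the `output` list and keeping the `output_text` parts of each `message` item. The sketch below does this on a plain dictionary; the sample values are made up and `collect_output_text` is a hypothetical helper:

```python
# Sample response shaped like an OpenAI Responses API result (values are made up).
sample_response = {
    'id': 'resp_123',
    'model': 'gpt-4o-mini',
    'output': [
        {'type': 'reasoning', 'id': 'rs_1', 'summary': []},
        {
            'type': 'message',
            'role': 'assistant',
            'content': [
                {'type': 'output_text', 'text': 'A cat ', 'annotations': []},
                {'type': 'output_text', 'text': 'on a sofa.', 'annotations': []},
            ],
        },
    ],
}


def collect_output_text(response: dict) -> str:
    """Concatenate all `output_text` parts found in `message` output items."""
    parts = []
    for item in response.get('output', []):
        if item.get('type') != 'message':
            continue  # skip reasoning items, tool calls, etc.
        for block in item.get('content', []):
            if block.get('type') == 'output_text':
                parts.append(block['text'])
    return ''.join(parts)
```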
## udf speech()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
speech(
input: pxt.String,
*,
model: pxt.String,
voice: pxt.String,
model_kwargs: pxt.Json | None = None
) -> pxt.Audio
```
Generates audio from the input text.
Equivalent to the OpenAI `audio/speech` API endpoint.
For additional details, see: [https://platform.openai.com/docs/guides/text-to-speech](https://platform.openai.com/docs/guides/text-to-speech)
Request throttling:
Applies the rate limit set in the config (section `openai.rate_limits`; use the model id as the key). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install openai`
**Parameters:**
* **`input`** (`pxt.String`): The text to synthesize into speech.
* **`model`** (`pxt.String`): The model to use for speech synthesis.
* **`voice`** (`pxt.String`): The voice profile to use for speech synthesis. Supported options include:
`alloy`, `echo`, `fable`, `onyx`, `nova`, and `shimmer`.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the OpenAI `audio/speech` API. For details on the available
parameters, see: [https://platform.openai.com/docs/api-reference/audio/createSpeech](https://platform.openai.com/docs/api-reference/audio/createSpeech)
**Returns:**
* `pxt.Audio`: An audio file containing the synthesized speech.
**Examples:**
Add a computed column that applies the model `tts-1` to an existing Pixeltable column `tbl.text`
of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
audio=speech(tbl.text, model='tts-1', voice='nova')
)
```
## udf transcriptions()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
transcriptions(
audio: pxt.Audio,
*,
model: pxt.String,
model_kwargs: pxt.Json | None = None
) -> pxt.Json
```
Transcribes audio into the input language.
Equivalent to the OpenAI `audio/transcriptions` API endpoint.
For additional details, see: [https://platform.openai.com/docs/guides/speech-to-text](https://platform.openai.com/docs/guides/speech-to-text)
Request throttling:
Applies the rate limit set in the config (section `openai.rate_limits`; use the model id as the key). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install openai`
**Parameters:**
* **`audio`** (`pxt.Audio`): The audio to transcribe.
* **`model`** (`pxt.String`): The model to use for speech transcription.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the OpenAI `audio/transcriptions` API. For details on the available
parameters, see: [https://platform.openai.com/docs/api-reference/audio/createTranscription](https://platform.openai.com/docs/api-reference/audio/createTranscription)
**Returns:**
* `pxt.Json`: A dictionary containing the transcription and other metadata.
**Examples:**
Add a computed column that applies the model `whisper-1` to an existing Pixeltable column `tbl.audio`
of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
transcription=transcriptions(
tbl.audio, model='whisper-1', model_kwargs={'language': 'en'}
)
)
```
## udf translations()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
translations(
audio: pxt.Audio,
*,
model: pxt.String,
model_kwargs: pxt.Json | None = None
) -> pxt.Json
```
Translates audio into English.
Equivalent to the OpenAI `audio/translations` API endpoint.
For additional details, see: [https://platform.openai.com/docs/guides/speech-to-text](https://platform.openai.com/docs/guides/speech-to-text)
Request throttling:
Applies the rate limit set in the config (section `openai.rate_limits`; use the model id as the key). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install openai`
**Parameters:**
* **`audio`** (`pxt.Audio`): The audio to translate.
* **`model`** (`pxt.String`): The model to use for speech transcription and translation.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword args for the OpenAI `audio/translations` API. For details on the available
parameters, see: [https://platform.openai.com/docs/api-reference/audio/createTranslation](https://platform.openai.com/docs/api-reference/audio/createTranslation)
**Returns:**
* `pxt.Json`: A dictionary containing the translation and other metadata.
**Examples:**
Add a computed column that applies the model `whisper-1` to an existing Pixeltable column `tbl.audio`
of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
translation=translations(tbl.audio, model='whisper-1')
)
```
# openrouter
Source: https://docs.pixeltable.com/sdk/latest/openrouter
# module pixeltable.functions.openrouter
Pixeltable UDFs that wrap the OpenRouter API.
OpenRouter provides a unified interface to multiple LLM providers. In order to use it,
you must first sign up at [https://openrouter.ai](https://openrouter.ai), create an API key, and configure it
as described in the Working with OpenRouter tutorial.
## udf chat\_completions()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
chat_completions(
messages: pxt.Json[(Json, ...)],
*,
model: pxt.String,
model_kwargs: pxt.Json | None = None,
tools: pxt.Json[(Json, ...)] | None = None,
tool_choice: pxt.Json | None = None,
provider: pxt.Json | None = None,
transforms: pxt.Json[(String, ...)] | None = None
) -> pxt.Json
```
Chat Completion API via OpenRouter.
OpenRouter provides access to multiple LLM providers through a unified API.
For additional details, see: [https://openrouter.ai/docs](https://openrouter.ai/docs)
Supported models can be found at: [https://openrouter.ai/models](https://openrouter.ai/models)
Request throttling:
Applies the rate limit set in the config (section `openrouter`, key `rate_limit`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install openai`
**Parameters:**
* **`messages`** (`pxt.Json[(Json, ...)]`): A list of messages comprising the conversation so far.
* **`model`** (`pxt.String`): ID of the model to use (e.g., `'anthropic/claude-3.5-sonnet'`, `'openai/gpt-4'`).
* **`model_kwargs`** (`pxt.Json | None`): Additional OpenAI-compatible parameters.
* **`tools`** (`pxt.Json[(Json, ...)] | None`): List of tools available to the model.
* **`tool_choice`** (`pxt.Json | None`): Controls which (if any) tool is called by the model.
* **`provider`** (`pxt.Json | None`): OpenRouter-specific provider preferences (e.g., `{'order': ['Anthropic', 'OpenAI']}`).
* **`transforms`** (`pxt.Json[(String, ...)] | None`): List of message transforms to apply (e.g., `['middle-out']`).
**Returns:**
* `pxt.Json`: A dictionary containing the response in OpenAI format.
**Examples:**
Basic chat completion:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
messages = [{'role': 'user', 'content': tbl.prompt}]
tbl.add_computed_column(
response=chat_completions(
messages, model='anthropic/claude-3.5-sonnet'
)
)
```
With provider routing:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
response=chat_completions(
messages,
model='anthropic/claude-3.5-sonnet',
provider={'require_parameters': True, 'order': ['Anthropic']},
)
)
```
With transforms:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
response=chat_completions(
messages,
model='openai/gpt-4',
transforms=['middle-out'], # Optimize for long contexts
)
)
```
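OpenRouter accepts `provider` and `transforms` as fields of the request body alongside the usual OpenAI-compatible fields. The sketch below shows one way such a body could be assembled; how Pixeltable forwards these values internally is an assumption, and `build_request_body` is a hypothetical helper:

```python
def build_request_body(
    messages, model, provider=None, transforms=None, model_kwargs=None
) -> dict:
    """Assemble a chat completion request body, omitting unset optional fields."""
    body = {'model': model, 'messages': messages, **(model_kwargs or {})}
    if provider is not None:
        body['provider'] = provider
    if transforms is not None:
        body['transforms'] = transforms
    return body


body = build_request_body(
    [{'role': 'user', 'content': 'Hi'}],
    'openai/gpt-4',
    transforms=['middle-out'],
    model_kwargs={'temperature': 0.2},
)
```

Leaving `provider` unset keeps the key out of the body entirely, so OpenRouter's default routing applies.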
# pixeltable
Source: https://docs.pixeltable.com/sdk/latest/pixeltable
# module pixeltable
Core Pixeltable API for table operations, data processing, and UDF management.
## func array()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
array(elements: Iterable) -> exprs.Expr
```
## func configure\_logging()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
configure_logging(
*,
to_stdout: bool | None = None,
level: int | None = None,
add: str | None = None,
remove: str | None = None
) -> None
```
Configure logging.
**Parameters:**
* **`to_stdout`** (`bool | None`): If `True`, also log to stdout.
* **`level`** (`int | None`): Default log level.
* **`add`** (`str | None`): Comma-separated list of `module name:log level` pairs, e.g. `add='video:10'`.
* **`remove`** (`str | None`): Comma-separated list of module names.
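The `add` string pairs each module name with a numeric level from Python's `logging` module (10 is `logging.DEBUG`). The following sketch shows how such a string breaks down; `parse_module_levels` is a hypothetical helper, not part of Pixeltable, and `catalog` is only an illustrative module name:

```python
import logging


def parse_module_levels(spec: str) -> dict:
    """Parse 'module:level' pairs separated by commas into a dict."""
    levels = {}
    for pair in spec.split(','):
        module, _, level = pair.strip().partition(':')
        levels[module] = int(level)
    return levels


parsed = parse_module_levels('video:10,catalog:20')
```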
## func create\_dir()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
create_dir(
path: str,
*,
if_exists: Literal['error', 'ignore', 'replace', 'replace_force'] = 'error',
parents: bool = False
) -> catalog.Dir | None
```
Create a directory.
**Parameters:**
* **`path`** (`str`): Path to the directory.
* **`if_exists`** (`Literal['error', 'ignore', 'replace', 'replace_force']`, default: `'error'`): Directive regarding how to handle if the path already exists.
Must be one of the following:
* `'error'`: raise an error
* `'ignore'`: do nothing and return the existing directory handle
* `'replace'`: if the existing directory is empty, drop it and create a new one
* `'replace_force'`: drop the existing directory and all its children, and create a new one
* **`parents`** (`bool`, default: `False`): Create missing parent directories.
**Returns:**
* `catalog.Dir | None`: A handle to the newly created directory, or to an already existing directory at the path when
`if_exists='ignore'`. Please note the existing directory may not be empty.
**Examples:**
Create a directory `my_dir`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_dir('my_dir')
```
Create a subdirectory:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_dir('my_dir/sub_dir')
```
Create a subdirectory only if it does not already exist, otherwise do nothing:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_dir('my_dir/sub_dir', if_exists='ignore')
```
Create a directory, replacing any existing directory (and all its children) at that path:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_dir('my_dir', if_exists='replace_force')
```
Create a subdirectory along with its ancestors:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_dir('parent1/parent2/sub_dir', parents=True)
```
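The effect of `parents=True` can be pictured as expanding the path into every ancestor that does not exist yet. This is a simplified model rather than Pixeltable's implementation; `dirs_to_create` is a hypothetical helper, and the `ValueError` stands in for whatever error Pixeltable raises:

```python
def dirs_to_create(path: str, existing: set, parents: bool) -> list:
    """Return the directories that would be created, outermost first."""
    parts = path.split('/')
    ancestors = ['/'.join(parts[:i]) for i in range(1, len(parts))]
    missing = [p for p in ancestors if p not in existing]
    if missing and not parents:
        # Placeholder error type for this sketch.
        raise ValueError(f'missing parent directories: {missing}')
    return missing + [path]


created = dirs_to_create('parent1/parent2/sub_dir', existing=set(), parents=True)
```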
## func create\_snapshot()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
create_snapshot(
path_str: str,
base: catalog.Table | Query,
*,
additional_columns: Mapping[str, type | ColumnSpec | exprs.Expr] | None = None,
iterator: func.GeneratingFunctionCall | None = None,
comment: str | None = None,
custom_metadata: Any = None,
media_validation: Literal['on_read', 'on_write'] = 'on_write',
if_exists: Literal['error', 'ignore', 'replace', 'replace_force'] = 'error'
) -> catalog.Table | None
```
Create a snapshot of an existing table object (which itself can be a view or a snapshot or a base table).
**Parameters:**
* **`path_str`** (`str`): A name for the snapshot; can be either a simple name such as `my_snapshot`, or a pathname such as
`dir1/my_snapshot`.
* **`base`** (`catalog.Table | Query`): [`Table`](./table) (i.e., table or view or snapshot) or [`Query`](./query) to
base the snapshot on.
* **`additional_columns`** (`Mapping[str, type | ColumnSpec | exprs.Expr] | None`): If specified, will add these columns to the snapshot once it is created. The format
of the `additional_columns` parameter is identical to the format of the `schema` parameter in
[`create_table`](./pixeltable#func-create_table).
* **`iterator`** (`func.GeneratingFunctionCall | None`): The iterator to use for this snapshot. If specified, then this snapshot will be a one-to-many view of
the base table.
* **`comment`** (`str | None`): Optional comment for the snapshot.
* **`custom_metadata`** (`Any`): Optional user-defined JSON metadata to associate with the snapshot.
* **`media_validation`** (`Literal['on_read', 'on_write']`, default: `'on_write'`): Media validation policy for the snapshot.
* `'on_read'`: validate media files at query time
* `'on_write'`: validate media files during insert/update operations
* **`if_exists`** (`Literal['error', 'ignore', 'replace', 'replace_force']`, default: `'error'`): Directive regarding how to handle if the path already exists.
Must be one of the following:
* `'error'`: raise an error
* `'ignore'`: do nothing and return the existing snapshot handle
* `'replace'`: if the existing snapshot has no dependents, drop and replace it with a new one
* `'replace_force'`: drop the existing snapshot and all its dependents, and create a new one
**Returns:**
* `catalog.Table | None`: A handle to the [`Table`](./table) representing the newly created snapshot, or to an already existing snapshot at the path when `if_exists='ignore'`.
Please note the schema or base of the existing snapshot may not match those provided in the call.
**Examples:**
Create a snapshot `my_snapshot` of a table `my_table`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
snapshot = pxt.create_snapshot('my_snapshot', tbl)
```
Create a snapshot `my_snapshot` of a view `my_view` with additional int column `col3`,
if `my_snapshot` does not already exist:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
view = pxt.get_table('my_view')
snapshot = pxt.create_snapshot(
'my_snapshot',
view,
additional_columns={'col3': pxt.Int},
if_exists='ignore',
)
```
Create a snapshot `my_snapshot` on a table `my_table`, and replace any existing snapshot named `my_snapshot`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
snapshot = pxt.create_snapshot(
'my_snapshot', tbl, if_exists='replace_force'
)
```
## func create\_table()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
create_table(
path: str,
schema: Mapping[str, type | ColumnSpec | exprs.Expr] | None = None,
*,
source: TableDataSource | None = None,
source_format: Literal['csv', 'excel', 'parquet', 'json'] | None = None,
schema_overrides: dict[str, Any] | None = None,
create_default_idxs: bool = True,
on_error: Literal['abort', 'ignore'] = 'abort',
primary_key: str | list[str] | None = None,
comment: str | None = None,
custom_metadata: Any = None,
media_validation: Literal['on_read', 'on_write'] = 'on_write',
if_exists: Literal['error', 'ignore', 'replace', 'replace_force'] = 'error',
extra_args: dict[str, Any] | None = None,
_is_versioned: bool = True
) -> catalog.Table
```
Create a new base table. Exactly one of `schema` or `source` must be provided.
If a `schema` is provided, then an empty table will be created with the specified schema.
If a `source` is provided, then Pixeltable will attempt to infer a data source format and table schema from the
contents of the specified data, and the data will be imported from the specified source into the new table. The
source format and/or schema can be specified directly via the `source_format` and `schema_overrides` parameters.
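The rules above can be summarized in a small model: exactly one of `schema` or `source` is given, and `schema_overrides` replaces inferred types for the named columns. This is a simplified sketch, not Pixeltable's implementation. `resolve_schema` is hypothetical, `inferred` stands in for the types Pixeltable would infer from a source, and the type names are plain strings used for illustration only:

```python
def resolve_schema(schema=None, inferred=None, schema_overrides=None) -> dict:
    """Return the final column-to-type mapping under the documented rules."""
    if (schema is None) == (inferred is None):
        raise ValueError('exactly one of schema or source must be provided')
    if schema is not None:
        if schema_overrides is not None:
            raise ValueError('schema_overrides requires a source')
        return dict(schema)
    # Overrides win; all other columns keep their inferred types.
    return {**inferred, **(schema_overrides or {})}


final = resolve_schema(
    inferred={'id': 'Int', 'price': 'Int', 'name': 'String'},
    schema_overrides={'price': 'Float'},
)
```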
**Parameters:**
* **`path`** (`str`): Pixeltable path (qualified name) of the table, such as `'my_table'` or `'my_dir/my_subdir/my_table'`.
* **`schema`** (`Mapping[str, type | ColumnSpec | exprs.Expr] | None`): Schema for the new table, mapping column names to Pixeltable types.
* **`source`** (`TableDataSource | None`): A data source (file, URL, Table, Query, or list of rows) to import from.
* **`source_format`** (`Literal['csv', 'excel', 'parquet', 'json'] | None`): Must be used in conjunction with a `source`.
If specified, then the given format will be used to read the source data. (Otherwise,
Pixeltable will attempt to infer the format from the source data.)
* **`schema_overrides`** (`dict[str, Any] | None`): Must be used in conjunction with a `source`.
If specified, then columns in `schema_overrides` will be given the specified types.
(Pixeltable will attempt to infer the types of any columns not specified.)
* **`create_default_idxs`** (`bool`, default: `True`): If True, creates a B-tree index on every scalar and media column that is not computed,
except for boolean columns.
* **`on_error`** (`Literal['abort', 'ignore']`, default: `'abort'`): Determines the behavior if an error occurs while evaluating a computed column or detecting an
invalid media file (such as a corrupt image) for one of the inserted rows.
* If `on_error='abort'`, then an exception will be raised and the rows will not be inserted.
* If `on_error='ignore'`, then execution will continue and the rows will be inserted. Any cells
with errors will have a `None` value for that cell, with information about the error stored in the
corresponding `tbl.col_name.errortype` and `tbl.col_name.errormsg` fields.
* **`primary_key`** (`str | list[str] | None`): An optional column name or list of column names to use as the primary key(s) of the
table.
* **`comment`** (`str | None`): An optional comment; its meaning is user-defined.
* **`custom_metadata`** (`Any`): Optional user-defined metadata to associate with the table. Must be a valid JSON-serializable
object \[str, int, float, bool, dict, list].
* **`media_validation`** (`Literal['on_read', 'on_write']`, default: `'on_write'`): Media validation policy for the table.
* `'on_read'`: validate media files at query time
* `'on_write'`: validate media files during insert/update operations
* **`if_exists`** (`Literal['error', 'ignore', 'replace', 'replace_force']`, default: `'error'`): Determines the behavior if a table already exists at the specified path location.
* `'error'`: raise an error
* `'ignore'`: do nothing and return the existing table handle
* `'replace'`: if the existing table has no views or snapshots, drop and replace it with a new one;
raise an error if the existing table has views or snapshots
* `'replace_force'`: drop the existing table and all its views and snapshots, and create a new one
* **`extra_args`** (`dict[str, Any] | None`): Must be used in conjunction with a `source`. If specified, then additional arguments will be
passed along to the source data provider.
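The four `if_exists` directives described above amount to a small decision table. The sketch below models it with plain strings; `resolve_if_exists` and the action names are hypothetical and only mirror the documented behavior:

```python
def resolve_if_exists(if_exists: str, exists: bool, has_dependents: bool) -> str:
    """Return the action taken for a given directive and catalog state."""
    if not exists:
        return 'create'
    if if_exists == 'error':
        return 'raise'
    if if_exists == 'ignore':
        return 'return_existing'
    if if_exists == 'replace':
        # Replacing is refused when views or snapshots depend on the table.
        return 'raise' if has_dependents else 'drop_and_create'
    if if_exists == 'replace_force':
        return 'drop_with_dependents_and_create'
    raise ValueError(f'unknown if_exists value: {if_exists!r}')
```

Only `'replace_force'` removes dependent views and snapshots, which is why it is the riskier choice on shared catalogs.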
**Returns:**
* `catalog.Table`: A handle to the newly created table, or to an already existing table at the path when `if_exists='ignore'`.
Please note the schema of the existing table may not match the schema provided in the call.
**Examples:**
Create a table with an int and a string column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.create_table(
'my_table', schema={'col1': pxt.Int, 'col2': pxt.String}
)
```
Create a table from a select statement over an existing table `orig_table` (this will create a new table
containing the exact contents of the query):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl1 = pxt.get_table('orig_table')
tbl2 = pxt.create_table(
'new_table', source=tbl1.where(tbl1.col1 < 10).select(tbl1.col2)
)
```
Create a table if it does not already exist, otherwise get the existing table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.create_table(
'my_table',
schema={'col1': pxt.Int, 'col2': pxt.String},
if_exists='ignore',
)
```
Create a table with an int and a float column, replacing an existing table at that path if it has no views or snapshots:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.create_table(
'my_table',
schema={'col1': pxt.Int, 'col2': pxt.Float},
if_exists='replace',
)
```
Create a table from a CSV file:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.create_table('my_table', source='data.csv')
```
Create a table with an auto-generated UUID primary key:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.create_table(
'my_table',
schema={'id': pxt.functions.uuid.uuid4(), 'data': pxt.String},
primary_key=['id'],
)
```
## func create\_view()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
create_view(
path: str,
base: catalog.Table | Query,
*,
additional_columns: Mapping[str, type | ColumnSpec | exprs.Expr] | None = None,
is_snapshot: bool = False,
create_default_idxs: bool = False,
iterator: func.GeneratingFunctionCall | None = None,
comment: str | None = None,
custom_metadata: Any = None,
media_validation: Literal['on_read', 'on_write'] = 'on_write',
if_exists: Literal['error', 'ignore', 'replace', 'replace_force'] = 'error'
) -> catalog.Table | None
```
Create a view of an existing table object (which itself can be a view or a snapshot or a base table).
**Parameters:**
* **`path`** (`str`): A name for the view; can be either a simple name such as `my_view`, or a pathname such as
`dir1/my_view`.
* **`base`** (`catalog.Table | Query`): [`Table`](./table) (i.e., table or view or snapshot) or [`Query`](./query) to
base the view on.
* **`additional_columns`** (`Mapping[str, type | ColumnSpec | exprs.Expr] | None`): If specified, will add these columns to the view once it is created. The format
of the `additional_columns` parameter is identical to the format of the `schema` parameter in
[`create_table`](./pixeltable#func-create_table).
* **`is_snapshot`** (`bool`, default: `False`): Whether the view is a snapshot. Setting this to `True` is equivalent to calling
[`create_snapshot`](./pixeltable#func-create_snapshot).
* **`create_default_idxs`** (`bool`, default: `False`): Whether to create default indexes on the view's columns (the base's columns are excluded).
Cannot be `True` for snapshots.
* **`iterator`** (`func.GeneratingFunctionCall | None`): The iterator to use for this view. If specified, then this view will be a one-to-many view of
the base table.
* **`comment`** (`str | None`): Optional comment for the view.
* **`custom_metadata`** (`Any`): Optional user-defined JSON metadata to associate with the view.
* **`media_validation`** (`Literal['on_read', 'on_write']`, default: `'on_write'`): Media validation policy for the view.
* `'on_read'`: validate media files at query time
* `'on_write'`: validate media files during insert/update operations
* **`if_exists`** (`Literal['error', 'ignore', 'replace', 'replace_force']`, default: `'error'`): Directive regarding how to handle if the path already exists.
Must be one of the following:
* `'error'`: raise an error
* `'ignore'`: do nothing and return the existing view handle
* `'replace'`: if the existing view has no dependents, drop and replace it with a new one
* `'replace_force'`: drop the existing view and all its dependents, and create a new one
**Returns:**
* `catalog.Table | None`: A handle to the [`Table`](./table) representing the newly created view. If the path already
exists and `if_exists='ignore'`, returns a handle to the existing view. Please note the schema
or the base of the existing view may not match those provided in the call.
**Examples:**
Create a view `my_view` of an existing table `my_table`, filtering on rows where `col1` is greater than 10:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
view = pxt.create_view('my_view', tbl.where(tbl.col1 > 10))
```
Create a view `my_view` of an existing table `my_table`, filtering on rows where `col1` is greater than 10,
if it does not already exist. Otherwise, get the existing view named `my_view`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
view = pxt.create_view(
'my_view', tbl.where(tbl.col1 > 10), if_exists='ignore'
)
```
Create a view `my_view` of an existing table `my_table`, filtering on rows where `col1` is greater than 100,
and replace any existing view named `my_view`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
view = pxt.create_view(
'my_view', tbl.where(tbl.col1 > 100), if_exists='replace_force'
)
```
## func drop\_dir()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
drop_dir(
path: str,
force: bool = False,
if_not_exists: Literal['error', 'ignore'] = 'error'
) -> None
```
Remove a directory.
**Parameters:**
* **`path`** (`str`): Name or path of the directory.
* **`force`** (`bool`, default: `False`): If `True`, will also drop all tables and subdirectories of this directory, recursively, along
with any views or snapshots that depend on any of the dropped tables.
* **`if_not_exists`** (`Literal['error', 'ignore']`, default: `'error'`): Directive regarding how to handle if the path does not exist.
Must be one of the following:
* `'error'`: raise an error
* `'ignore'`: do nothing and return
**Examples:**
Remove a directory, if it exists and is empty:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.drop_dir('my_dir')
```
Remove a subdirectory:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.drop_dir('my_dir/sub_dir')
```
Remove an existing directory if it is empty, but do nothing if it does not exist:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.drop_dir('my_dir/sub_dir', if_not_exists='ignore')
```
Remove an existing directory and all its contents:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.drop_dir('my_dir', force=True)
```
## func drop\_table()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
drop_table(
table: str | catalog.Table,
force: bool = False,
if_not_exists: Literal['error', 'ignore'] = 'error'
) -> None
```
Drop a table, view, snapshot, or replica.
**Parameters:**
* **`table`** (`str | catalog.Table`): Fully qualified name or table handle of the table to be dropped; or a remote URI of a cloud replica to
be deleted.
* **`force`** (`bool`, default: `False`): If `True`, will also drop all views and sub-views of this table.
* **`if_not_exists`** (`Literal['error', 'ignore']`, default: `'error'`): Directive regarding how to handle if the path does not exist.
Must be one of the following:
* `'error'`: raise an error
* `'ignore'`: do nothing and return
**Examples:**
Drop a table by its fully qualified name:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.drop_table('subdir/my_table')
```
Drop a table by its handle:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.get_table('subdir/my_table')
pxt.drop_table(t)
```
Drop a table if it exists, otherwise do nothing:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.drop_table('subdir/my_table', if_not_exists='ignore')
```
Drop a table and all its dependents:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.drop_table('subdir/my_table', force=True)
```
## func get\_dir\_contents()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
get_dir_contents(dir_path: str = '', recursive: bool = True) -> DirContents
```
Get the contents of a Pixeltable directory.
**Parameters:**
* **`dir_path`** (`str`, default: `''`): Path to the directory. Defaults to the root directory.
* **`recursive`** (`bool`, default: `True`): If `False`, returns only those tables and directories that are directly contained in the specified
directory; if `True`, returns all tables and directories that are descendants of the specified directory,
recursively.
**Returns:**
* `DirContents`: A [`DirContents`](./dircontents) object representing the contents of the specified directory.
**Examples:**
Get contents of top-level directory:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.get_dir_contents()
```
Get contents of 'dir1':
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.get_dir_contents('dir1')
```
## func get\_dir\_tree()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
get_dir_tree() -> list['TreeNode']
```
Get a tree representation of the Pixeltable directory structure.
**Returns:**
* `list['TreeNode']`: A list of [`TreeNode`](./treenode) dicts. Each node is either a `DirectoryNode` or a `TableNode`.
## func get\_table()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
get_table(
path: str,
if_not_exists: Literal['error', 'ignore'] = 'error'
) -> catalog.Table | None
```
Get a handle to an existing table, view, or snapshot.
**Parameters:**
* **`path`** (`str`): Path to the table.
* **`if_not_exists`** (`Literal['error', 'ignore']`, default: `'error'`): Directive regarding how to handle if the path does not exist.
Must be one of the following:
* `'error'`: raise an error
* `'ignore'`: do nothing and return `None`
**Returns:**
* `catalog.Table | None`: A handle to the [`Table`](./table).
**Examples:**
Get handle for a table in the top-level directory:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
```
For a table in a subdirectory:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('subdir/my_table')
```
Handles to views and snapshots are retrieved in the same way:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_snapshot')
```
Get a handle to a specific version of a table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table:722')
```
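The `'my_table:722'` form above appends a version number to the table path after a colon. When paths come from user input, it can help to split them before calling `get_table()`. The helper below is a plain-Python sketch: `split_versioned_path` is a hypothetical name, not part of Pixeltable, and it assumes the version is a non-negative integer following the last colon.

```python
from typing import Optional


def split_versioned_path(path: str) -> tuple[str, Optional[int]]:
    """Split 'dir/tbl:722' into ('dir/tbl', 722). Hypothetical helper, not a Pixeltable API."""
    base, sep, version = path.rpartition(':')
    if not sep:
        # No colon: the path carries no explicit version.
        return path, None
    if not version.isdecimal():
        raise ValueError(f'invalid version suffix in {path!r}')
    return base, int(version)


print(split_versioned_path('my_table:722'))     # ('my_table', 722)
print(split_versioned_path('subdir/my_table'))  # ('subdir/my_table', None)
```

The original string, unchanged, is still what would be passed to `pxt.get_table()`; the split result is only useful for validation or logging.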
## func home()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
home() -> Path
```
Get the path to the user's home directory in Pixeltable.
**Returns:**
* `Path`: The path to the user's home directory.
## func init()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
init(
config_overrides: dict[str, Any] | None = None,
additional_config_files: list[str] | None = None
) -> None
```
Initializes the Pixeltable environment.
**Parameters:**
* **`config_overrides`** (`dict[str, Any] | None`): Optional dictionary of configuration overrides.
* **`additional_config_files`** (`list[str] | None`): Optional list of additional TOML config file paths to load.
## func iterator()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
iterator(*args, **kwargs)
```
## func list\_dirs()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
list_dirs(path: str = '', recursive: bool = True) -> list[str]
```
List the directories in a directory.
**Parameters:**
* **`path`** (`str`, default: `''`): Name or path of the directory.
* **`recursive`** (`bool`, default: `True`): If `True`, lists all descendants of this directory recursively.
**Returns:**
* `list[str]`: List of directory paths.
**Examples:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.list_dirs('my_dir', recursive=True)
```
\['my\_dir', 'my\_dir/sub\_dir1']
## func list\_functions()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
list_functions() -> Styler
```
Returns information about all registered functions.
**Returns:**
* `Styler`: Pandas DataFrame with columns 'Path', 'Name', 'Parameters', 'Return Type', 'Is Agg', 'Library'
## func list\_tables()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
list_tables(dir_path: str = '', recursive: bool = True) -> list[str]
```
List the [`Table`](./table)s in a directory.
**Parameters:**
* **`dir_path`** (`str`, default: `''`): Path to the directory. Defaults to the root directory.
* **`recursive`** (`bool`, default: `True`): If `False`, returns only those tables that are directly contained in the specified directory; if
`True`, returns all tables that are descendants of the specified directory, recursively.
**Returns:**
* `list[str]`: A list of [`Table`](./table) paths.
**Examples:**
List tables in top-level directory:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.list_tables()
```
List tables in 'dir1':
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.list_tables('dir1')
```
## func ls()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
ls(path: str = '') -> pd.DataFrame
```
List the contents of a Pixeltable directory.
This function returns a Pandas DataFrame representing a human-readable listing of the specified directory,
including various attributes such as version and base table, as appropriate.
To get a programmatic list of the directory's contents, use [get\_dir\_contents()](./pixeltable#func-get_dir_contents)
instead.
## func mcp\_udfs()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
mcp_udfs(url: str) -> list['pxt.func.Function']
```
## func move()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
move(
path: str,
new_path: str,
*,
if_exists: Literal['error', 'ignore'] = 'error',
if_not_exists: Literal['error', 'ignore'] = 'error'
) -> None
```
Move a schema object to a new directory and/or rename a schema object.
**Parameters:**
* **`path`** (`str`): absolute path to the existing schema object.
* **`new_path`** (`str`): absolute new path for the schema object.
* **`if_exists`** (`Literal['error', 'ignore']`, default: `'error'`): Directive regarding how to handle if a schema object already exists at the new path.
Must be one of the following:
* `'error'`: raise an error
* `'ignore'`: do nothing and return
* **`if_not_exists`** (`Literal['error', 'ignore']`, default: `'error'`): Directive regarding how to handle if the source path does not exist.
Must be one of the following:
* `'error'`: raise an error
* `'ignore'`: do nothing and return
**Examples:**
Move a table to a different directory:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.move('dir1/my_table', 'dir2/my_table')
```
Rename a table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.move('dir1/my_table', 'dir1/new_name')
```
## func publish()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
publish(
source: str | catalog.Table,
destination_uri: str,
bucket_name: str | None = None,
access: Literal['public', 'private'] = 'private'
) -> None
```
Publishes a replica of a local Pixeltable table to Pixeltable cloud. A given table can be published to at most one
URI per Pixeltable cloud database.
**Parameters:**
* **`source`** (`str | catalog.Table`): Path or table handle of the local table to be published.
* **`destination_uri`** (`str`): Remote URI where the replica will be published, such as `'pxt://org_name/my_dir/my_table'`.
* **`bucket_name`** (`str | None`): The name of the bucket to use to store the replica's data. The bucket must be registered with
Pixeltable cloud. If no `bucket_name` is provided, the default storage bucket for the destination
database will be used.
* **`access`** (`Literal['public', 'private']`, default: `'private'`): Access control for the replica.
* `'public'`: Anyone can access this replica.
* `'private'`: Only the host organization can access.
## func query()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query(*args: Any, **kwargs: Any) -> Any
```
## func replicate()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
replicate(remote_uri: str, local_path: str) -> catalog.Table
```
Retrieve a replica from Pixeltable cloud as a local table. This will create a full local copy of the replica in a
way that preserves the table structure of the original source data. Once replicated, the local table can be
queried offline just as any other Pixeltable table.
**Parameters:**
* **`remote_uri`** (`str`): Remote URI of the table to be replicated, such as `'pxt://org_name/my_dir/my_table'` or
`'pxt://org_name/my_dir/my_table:5'` (with version 5).
* **`local_path`** (`str`): Local table path where the replica will be created, such as `'my_new_dir/my_new_tbl'`. It can be
the same or different from the cloud table name.
**Returns:**
* `catalog.Table`: A handle to the newly created local replica table.
## func retrieval\_udf()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
retrieval_udf(
table: catalog.Table,
name: str | None = None,
description: str | None = None,
parameters: Iterable[str | exprs.ColumnRef] | None = None,
limit: int | None = 10
) -> func.QueryTemplateFunction
```
Constructs a retrieval UDF for the given table. The retrieval UDF is a UDF whose parameters are
columns of the table and whose return value is a list of rows from the table. The return value of
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
f(col1=x, col2=y, ...)
```
will be a list of all rows from the table that match the specified arguments.
**Parameters:**
* **`table`** (`catalog.Table`): The table to use as the dataset for the retrieval tool.
* **`name`** (`str | None`): The name of the tool. If not specified, then the name of the table will be used by default.
* **`description`** (`str | None`): The description of the tool. If not specified, then a default description will be generated.
* **`parameters`** (`Iterable[str | exprs.ColumnRef] | None`): The columns of the table to use as parameters. If not specified, all data columns
(non-computed columns) will be used as parameters.
All of the specified parameters will be required parameters of the tool, regardless of their status
as columns.
* **`limit`** (`int | None`, default: `10`): The maximum number of rows to return. If `None`, then all matching rows will be returned.
**Returns:**
* `func.QueryTemplateFunction`: A retrieval UDF. When called, it returns a list of dictionaries containing data from the table, one per row that matches the input arguments.
If there are no matching rows, it returns an empty list.
## func tool()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tool(
fn: func.Function,
name: str | None = None,
description: str | None = None
) -> func.tools.Tool
```
Specifies a Pixeltable UDF to be used as an LLM tool with customizable metadata. See the documentation for
[pxt.tools()](./pixeltable#func-tools) for more details.
**Parameters:**
* **`fn`** (`func.Function`): The UDF to use as a tool.
* **`name`** (`str | None`): The name of the tool. If not specified, then the unqualified name of the UDF will be used by default.
* **`description`** (`str | None`): The description of the tool. If not specified, then the entire contents of the UDF docstring
will be used by default.
**Returns:**
* `func.tools.Tool`: A `Tool` instance that can be passed to an LLM tool-calling API.
## func tools()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tools(*args: func.Function | func.tools.Tool) -> func.tools.Tools
```
Specifies a collection of UDFs to be used as LLM tools. Pixeltable allows any UDF to be used as an input into an
LLM tool-calling API. To use one or more UDFs as tools, wrap them in a `pxt.tools` call and pass the return value
to an LLM API.
The UDFs can be specified directly or wrapped inside a [pxt.tool()](./pixeltable#func-tool) invocation. If a UDF is
specified directly, the tool name will be the (unqualified) UDF name, and the tool description will consist of the
entire contents of the UDF docstring. If a UDF is wrapped in a `pxt.tool()` invocation, then the name and/or
description may be customized.
**Parameters:**
* **`args`** (`func.Function | func.tools.Tool`): The UDFs to use as tools.
**Returns:**
* `func.tools.Tools`: A `Tools` instance that can be passed to an LLM tool-calling API or invoked to generate tool results.
**Examples:**
Create a tools instance with a single UDF:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tools = pxt.tools(stock_price)
```
Create a tools instance with several UDFs:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tools = pxt.tools(stock_price, weather_quote)
```
Create a tools instance, some of whose UDFs have customized metadata:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tools = pxt.tools(
stock_price,
pxt.tool(
weather_quote,
description='Returns information about the weather in a particular location.',
),
pxt.tool(traffic_quote, name='traffic_conditions'),
)
```
## func uda()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
uda(*args, **kwargs)
```
Decorator for user-defined aggregate functions.
The decorated class must inherit from Aggregator and implement the following methods:
* `__init__(self, ...)` to initialize the aggregator
* `update(self, ...)` to update the aggregator with a new value
* `value(self)` to return the final result
The decorator creates an AggregateFunction instance from the class and adds it
to the module where the class is defined.
**Parameters:**
* **`requires_order_by`**: if `True`, the first parameter to the function is the order-by expression
* **`allows_std_agg`**: if `True`, the function can be used as a standard aggregate function without a window
* **`allows_window`**: if `True`, the function can be used with a window
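The three-method protocol described above can be illustrated without Pixeltable installed. The sketch below is plain Python: the class `SumOfSquares` and the driver loop are hypothetical, and the `Aggregator` base class and the `@pxt.uda` decorator that a real UDA requires are deliberately left out so the block runs on its own.

```python
class SumOfSquares:
    """Plain-Python stand-in for an aggregator.

    A real UDA would inherit from Aggregator and be decorated with @pxt.uda;
    both are omitted here (assumption: only the method protocol is being shown).
    """

    def __init__(self) -> None:
        # Initialize the aggregator state.
        self.total = 0

    def update(self, val: int) -> None:
        # Fold one new value into the running state.
        self.total += val * val

    def value(self) -> int:
        # Return the final result.
        return self.total


# Drive the protocol by hand: one update() call per input value, then value().
agg = SumOfSquares()
for v in [1, 2, 3]:
    agg.update(v)
print(agg.value())  # 14
```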
## func udf()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
udf(*args, **kwargs)
```
A decorator to create a Function from a function definition.
**Examples:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
def my_function(x: int) -> int:
return x + 1
```
# Query
Source: https://docs.pixeltable.com/sdk/latest/query
# class pixeltable.Query
Represents a query for retrieving and transforming data from Pixeltable tables.
Thread-safe.
## method collect()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
collect() -> ResultSet
```
## method cursor()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
cursor() -> ResultCursor
```
Return a [`ResultCursor`](./resultcursor) that iterates over the query results row by row.
See [`ResultCursor`](./resultcursor) for usage examples and lifecycle details.
## method distinct()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
distinct() -> Query
```
Remove duplicate rows from this Query.
Note that grouping will be applied to the rows based on the select clause of this Query.
In the absence of a select clause, by default, all columns are selected in the grouping.
**Examples:**
Select unique addresses from table `addresses`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
results = addresses.distinct()
```
Select unique cities in table `addresses`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
results = addresses.city.distinct()
```
Select unique locations (street, city) in the state of `CA`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
results = (
addresses.select(addresses.street, addresses.city)
.where(addresses.state == 'CA')
.distinct()
)
```
## method group\_by()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
group_by(*grouping_items: Any) -> Query
```
Add a group-by clause to this Query.
Variants:
* group\_by(base\_tbl): group the rows of a component view by their respective base table rows
* group\_by(expr1, expr2, expr3): group by the given expressions
Note that grouping will be applied to the rows and take effect when
used with an aggregation function like sum(), count() etc.
**Parameters:**
* **`grouping_items`** (`Any`): expressions to group by
**Returns:**
* `Query`: A new Query with the specified group-by clause.
**Examples:**
Given the Query book from a table t with all its columns and rows:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
book = t.select()
```
Group the above Query book by the 'genre' column (referenced in table t):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query = book.group_by(t.genre)
```
Use the above Query grouped by genre to count the number of
books for each 'genre':
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query = (
book.group_by(t.genre).select(t.genre, count=count(t.genre)).show()
)
```
Use the above Query grouped by genre to compute the total price of
books for each 'genre':
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query = book.group_by(t.genre).select(t.genre, total=sum(t.price)).show()
```
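To make the effect of grouping plus aggregation concrete, the sketch below reproduces the two results above (count and total price per genre) in plain Python. It is only a model of the result shape, not how Pixeltable evaluates the query; the sample rows in `books` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for table t.
books = [
    {'genre': 'fiction', 'price': 10.0},
    {'genre': 'fiction', 'price': 12.5},
    {'genre': 'science', 'price': 30.0},
]

counts = defaultdict(int)
totals = defaultdict(float)
for row in books:
    # Each distinct genre value forms one group; the aggregates accumulate per group.
    counts[row['genre']] += 1
    totals[row['genre']] += row['price']

result = [{'genre': g, 'count': counts[g], 'total': totals[g]} for g in sorted(counts)]
print(result)
```

Each group yields exactly one output row, which is why the grouping only becomes visible once an aggregate is part of the select list.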
## method head()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
head(n: int = 10) -> ResultSet
```
Return the first n rows of the Query, in insertion order of the underlying Table.
head() is not supported for joins.
**Parameters:**
* **`n`** (`int`, default: `10`): Number of rows to select. Default is 10.
**Returns:**
* `ResultSet`: A ResultSet with the first n rows of the Query.
## method join()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
join(
other: catalog.Table,
on: exprs.Expr | Sequence[exprs.ColumnRef] | None = None,
how: plan.JoinType.LiteralType = 'inner'
) -> Query
```
Join this Query with a table.
**Parameters:**
* **`other`** (`catalog.Table`): the table to join with
* **`on`** (`exprs.Expr | Sequence[exprs.ColumnRef] | None`): the join condition, which can be either a) references to one or more columns or b) a boolean
expression.
* column references: implies an equality predicate that matches columns in both this
Query and `other` by name.
* column in `other`: A column with that same name must be present in this Query, and **it must
be unique** (otherwise the join is ambiguous).
* column in this Query: A column with that same name must be present in `other`.
* boolean expression: The expressions must be valid in the context of the joined tables.
* **`how`** (`plan.JoinType.LiteralType`, default: `'inner'`): the type of join to perform.
* `'inner'`: only keep rows that have a match in both
* `'left'`: keep all rows from this Query and only matching rows from the other table
* `'right'`: keep all rows from the other table and only matching rows from this Query
* `'full_outer'`: keep all rows from both this Query and the other table
* `'cross'`: Cartesian product; no `on` condition allowed
**Returns:**
* `Query`: A new Query.
**Examples:**
Perform an inner join between t1 and t2 on the column id:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
join1 = t1.join(t2, on=t2.id)
```
Perform a left outer join of join1 with t3, also on id (note that we can't specify `on=t3.id` here,
because that would be ambiguous, since both t1 and t2 have a column named id):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
join2 = join1.join(t3, on=t2.id, how='left')
```
Do the same, but now with an explicit join predicate:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
join2 = join1.join(t3, on=t2.id == t3.id, how='left')
```
Join t with d, which has a composite primary key (columns pk1 and pk2, with corresponding foreign
key columns d1 and d2 in t):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query = t.join(d, on=(t.d1 == d.pk1) & (t.d2 == d.pk2), how='left')
```
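The `how` options can be compared side by side with a small plain-Python model. This is a sketch under simplifying assumptions, not Pixeltable's join implementation: `join_rows` is a hypothetical helper, rows are dicts, the join condition is equality on a single shared column name, and missing values are filled with `None`.

```python
from itertools import product


def join_rows(left, right, key=None, how='inner'):
    """Model the `how` options on lists of dicts joined by equality on one shared column."""
    if how not in ('inner', 'left', 'right', 'full_outer', 'cross'):
        raise ValueError(f'unknown join type: {how!r}')
    if how == 'cross':
        if key is not None:
            raise ValueError("a 'cross' join takes no join condition")
        return [{**lrow, **rrow} for lrow, rrow in product(left, right)]
    if key is None:
        raise ValueError(f'a {how!r} join needs a key column')

    left_cols = {c for row in left for c in row}
    right_cols = {c for row in right for c in row}
    out, matched_right = [], set()
    for lrow in left:
        hits = [i for i, rrow in enumerate(right) if rrow[key] == lrow[key]]
        matched_right.update(hits)
        out.extend({**lrow, **right[i]} for i in hits)
        if not hits and how in ('left', 'full_outer'):
            # Keep the unmatched left row; fill the right-hand columns with None.
            out.append({**dict.fromkeys(right_cols), **lrow})
    if how in ('right', 'full_outer'):
        for i, rrow in enumerate(right):
            if i not in matched_right:
                # Keep the unmatched right row; fill the left-hand columns with None.
                out.append({**dict.fromkeys(left_cols), **rrow})
    return out


t1 = [{'id': 1, 'a': 'x'}, {'id': 2, 'a': 'y'}]
t2 = [{'id': 2, 'b': 'p'}, {'id': 3, 'b': 'q'}]

for how in ('inner', 'left', 'right', 'full_outer'):
    print(how, len(join_rows(t1, t2, key='id', how=how)))
print('cross', len(join_rows(t1, t2, how='cross')))
```

With these two-row inputs sharing only `id == 2`, the row counts differ for each join type, which makes the keep/drop rules in the parameter list easy to check by hand.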
## method limit()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
limit(n: int, offset: int | None = None) -> Query
```
Limit the number of rows in the Query, optionally skipping rows for pagination.
**Parameters:**
* **`n`** (`int`): Number of rows to select.
* **`offset`** (`int | None`): Number of rows to skip before returning results. Default is None (no offset).
**Returns:**
* `Query`: A new Query limited to the specified rows.
**Examples:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query = t.select()
```
Get the first 10 rows:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query.limit(10).collect()
```
Get rows 21-30 (skip first 20, return next 10):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query.limit(10, offset=20).collect()
```
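For page-based navigation, the `n` and `offset` arguments follow directly from a page number and page size. The helper below is a plain-Python sketch; `page_args` is a hypothetical name, and pages are assumed to be numbered from 1. A list slice stands in for the query so the arithmetic can be checked without a table.

```python
def page_args(page: int, page_size: int) -> tuple[int, int]:
    """Return (n, offset) for a 1-based page number. Hypothetical helper."""
    if page < 1 or page_size < 1:
        raise ValueError('page and page_size must be >= 1')
    return page_size, (page - 1) * page_size


n, offset = page_args(page=3, page_size=10)
print(n, offset)  # 10 20

# A list of row numbers 1..100 stands in for the query results.
rows = list(range(1, 101))
print(rows[offset:offset + n])  # rows 21 through 30
```

The resulting pair would then be passed as `query.limit(n, offset=offset)`.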
## method order\_by()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
order_by(*expr_list: exprs.Expr, asc: bool = True) -> Query
```
Add an order-by clause to this Query.
**Parameters:**
* **`expr_list`** (`exprs.Expr`): expressions to order by
* **`asc`** (`bool`, default: `True`): whether to order in ascending order (True) or descending order (False).
Default is True.
**Returns:**
* `Query`: A new Query with the specified order-by clause.
**Examples:**
Given the Query book from a table t with all its columns and rows:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
book = t.select()
```
Order the above Query book by two columns (price, pages) in descending order:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query = book.order_by(t.price, t.pages, asc=False)
```
Order the above Query book by price in descending order, but order the pages
in ascending order:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query = book.order_by(t.price, asc=False).order_by(t.pages)
```
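The ordering described in the last example (price descending, pages ascending among equal prices) can be modeled in plain Python with two stable sorts. This is an illustration of the resulting row order only, not of how Pixeltable executes the query; the `books` rows are invented sample data.

```python
from operator import itemgetter

# Hypothetical sample rows standing in for table t.
books = [
    {'title': 'A', 'price': 20, 'pages': 300},
    {'title': 'B', 'price': 35, 'pages': 150},
    {'title': 'C', 'price': 20, 'pages': 120},
]

# Stable sorts compose: sort by the secondary key first, then by the primary key.
ordered = sorted(books, key=itemgetter('pages'))
ordered = sorted(ordered, key=itemgetter('price'), reverse=True)
print([b['title'] for b in ordered])  # ['B', 'C', 'A']
```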
## method sample()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sample(
n: int | None = None,
n_per_stratum: int | None = None,
fraction: float | None = None,
seed: int | None = None,
stratify_by: Any = None
) -> Query
```
Return a new Query specifying a sample of rows from the Query, considered in a shuffled order.
The size of the sample can be specified in three ways:
* `n`: the total number of rows to produce as a sample
* `n_per_stratum`: the number of rows to produce per stratum as a sample
* `fraction`: the fraction of available rows to produce as a sample
The sample can be stratified by one or more columns, which means that the sample will
be selected from each stratum separately.
The data is shuffled before creating the sample.
**Parameters:**
* **`n`** (`int | None`): Total number of rows to produce as a sample.
* **`n_per_stratum`** (`int | None`): Number of rows to produce per stratum as a sample. This parameter is only valid if
`stratify_by` is specified. Only one of `n` or `n_per_stratum` can be specified.
* **`fraction`** (`float | None`): Fraction of available rows to produce as a sample. This parameter is not usable with `n` or
`n_per_stratum`. The fraction must be between 0.0 and 1.0.
* **`seed`** (`int | None`): Random seed for reproducible shuffling
* **`stratify_by`** (`Any`): If specified, the sample will be stratified by these values.
**Returns:**
* `Query`: A new Query which specifies the sampled rows
**Examples:**
Given the Table `person` containing the field 'age', we can create samples of the table in various ways:
Sample 100 rows from the above Table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query = person.sample(n=100)
```
Sample 10% of the rows from the above Table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query = person.sample(fraction=0.1)
```
Sample 10% of the rows from the above Table, stratified by the column 'age':
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query = person.sample(fraction=0.1, stratify_by=person.age)
```
Equal allocation sampling: Sample 2 rows from each age present in the above Table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query = person.sample(n_per_stratum=2, stratify_by=person.age)
```
Sampling is compatible with the where clause, so we can also sample from a filtered Query:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query = person.where(person.age > 30).sample(n=100)
```
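Equal allocation sampling can be pictured with a small plain-Python model: partition the rows by the stratification value, shuffle each partition, and take the first `n_per_stratum` rows of each. This sketch is not Pixeltable's sampling algorithm and will not pick the same rows Pixeltable would for a given seed; `sample_per_stratum` and the `people` rows are hypothetical.

```python
import random
from collections import defaultdict


def sample_per_stratum(rows, key, n_per_stratum, seed=None):
    """Take up to n_per_stratum shuffled rows from each distinct value of `key`."""
    rng = random.Random(seed)  # a seeded generator makes the shuffle reproducible
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    picked = []
    for value in sorted(strata):
        group = strata[value]
        rng.shuffle(group)
        picked.extend(group[:n_per_stratum])
    return picked


# Twelve hypothetical people, four each of ages 20, 21 and 22.
people = [{'name': f'p{i}', 'age': 20 + i % 3} for i in range(12)]
sample = sample_per_stratum(people, 'age', n_per_stratum=2, seed=0)
print(len(sample))  # 6
```

A stratum with fewer than `n_per_stratum` rows simply contributes all of its rows in this model.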
## method select()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
select(*items: Any, **named_items: Any) -> Query
```
Select columns or expressions from the Query.
**Parameters:**
* **`items`** (`Any`): expressions to be selected
* **`named_items`** (`Any`): named expressions to be selected
**Returns:**
* `Query`: A new Query with the specified select list.
**Examples:**
Given the Query person from a table t with all its columns and rows:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
person = t.select()
```
Select the columns 'name' and 'age' (referenced in table t) from the Query person:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query = person.select(t.name, t.age)
```
Select the column 'name' (referenced in table t) from the Query person,
and a named column 'is\_adult' from the expression `age >= 18` where 'age' is
another column in table t:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query = person.select(t.name, is_adult=(t.age >= 18))
```
## method show()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
show(n: int = 20) -> ResultSet
```
## method tail()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tail(n: int = 10) -> ResultSet
```
Return the last n rows of the Query, in insertion order of the underlying Table.
tail() is not supported for joins.
**Parameters:**
* **`n`** (`int`, default: `10`): Number of rows to select. Default is 10.
**Returns:**
* `ResultSet`: A ResultSet with the last n rows of the Query.
## method to\_coco\_dataset()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
to_coco_dataset() -> Path
```
Convert the Query to a COCO dataset.
This Query must return a single json-typed output column in the following format:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{
'image': PIL.Image.Image,
'annotations': [
{
'bbox': [x: int, y: int, w: int, h: int],
'category': str | int,
},
...
],
}
```
**Returns:**
* `Path`: Path to the COCO dataset file.
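The `bbox` entries above use an `[x, y, w, h]` layout. If your boxes are stored as corner coordinates instead, a small conversion is needed first. The sketch below assumes a corner format `(x1, y1, x2, y2)`; the helper name `to_coco_annotation` is hypothetical and the `'image'` entry is left out for brevity.

```python
def to_coco_annotation(x1, y1, x2, y2, category):
    """Convert a corner-format box (x1, y1, x2, y2) to the [x, y, w, h] layout."""
    if x2 < x1 or y2 < y1:
        raise ValueError('expected x1 <= x2 and y1 <= y2')
    return {'bbox': [x1, y1, x2 - x1, y2 - y1], 'category': category}


# Categories may be strings or ints, matching the format shown above.
annotations = [
    to_coco_annotation(10, 20, 110, 70, 'cat'),
    to_coco_annotation(0, 0, 32, 32, 3),
]
```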
## method to\_pytorch\_dataset()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
to_pytorch_dataset(image_format: str = 'pt') -> torch.utils.data.IterableDataset
```
Convert the Query to a pytorch IterableDataset suitable for parallel loading
with torch.utils.data.DataLoader.
This method requires pyarrow >= 13, torch and torchvision to work.
This method serializes data so it can be read from disk efficiently and repeatedly without
re-executing the query. This data is cached to disk for future re-use.
**Parameters:**
* **`image_format`** (`str`, default: `'pt'`): format of the images. Can be 'pt' (pytorch tensor) or 'np' (numpy array).
'np' means image columns return as an RGB uint8 array of shape HxWxC.
'pt' means image columns return as a CxHxW tensor with values in \[0,1] and type torch.float32
(the format output by torchvision.transforms.ToTensor()).
**Returns:**
* `torch.utils.data.IterableDataset`: A pytorch IterableDataset: Columns become fields of the dataset, where rows are returned as a dictionary
compatible with torch.utils.data.DataLoader default collation.
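The difference between the `'np'` and `'pt'` layouts can be illustrated without torch or numpy. The following is a dependency-free sketch using nested lists; the helper `hwc_uint8_to_chw_float` is hypothetical and only mirrors the axis order and value scaling described above.

```python
def hwc_uint8_to_chw_float(image):
    """Rearrange an HxWxC nested list of 0-255 ints into CxHxW floats in [0, 1]."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# A 1x2 RGB image in 'np'-style layout: one pure-red pixel, one white pixel.
np_style = [[[255, 0, 0], [255, 255, 255]]]
# 'pt'-style layout: three channel planes, each of shape 1x2.
pt_style = hwc_uint8_to_chw_float(np_style)
```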
## method where()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
where(pred: exprs.Expr) -> Query
```
Filter rows based on a predicate.
**Parameters:**
* **`pred`** (`exprs.Expr`): the predicate to filter rows
**Returns:**
* `Query`: A new Query with the specified predicates replacing the where-clause.
**Examples:**
Given the Query person from a table t with all its columns and rows:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
person = t.select()
```
Filter the above Query person to only include rows where the column 'age'
(referenced in table t) is greater than 30:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
query = person.where(t.age > 30)
```
# replicate
Source: https://docs.pixeltable.com/sdk/latest/replicate
# module pixeltable.functions.replicate
Pixeltable UDFs
that wrap various endpoints from the Replicate API. In order to use them, you must
first `pip install replicate` and configure your Replicate credentials, as described in
the [Working with Replicate](https://docs.pixeltable.com/notebooks/integrations/working-with-replicate) tutorial.
## udf run()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
run(input: pxt.Json, *, ref: pxt.String) -> pxt.Json
```
Run a model on Replicate.
For additional details, see: [https://replicate.com/docs/topics/models/run-a-model](https://replicate.com/docs/topics/models/run-a-model)
Request throttling:
Applies the rate limit set in the config (section `replicate`, key `rate_limit`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install replicate`
**Parameters:**
* **`input`** (`pxt.Json`): The input parameters for the model.
* **`ref`** (`pxt.String`): The name of the model to run.
**Returns:**
* `pxt.Json`: The output of the model.
**Examples:**
Add a computed column that applies the model `meta/meta-llama-3-8b-instruct`
to an existing Pixeltable column `tbl.prompt` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
input = {
'system_prompt': 'You are a helpful assistant.',
'prompt': tbl.prompt,
}
tbl.add_computed_column(
response=run(input, ref='meta/meta-llama-3-8b-instruct')
)
```
Add a computed column that uses the model `black-forest-labs/flux-schnell`
to generate images from an existing Pixeltable column `tbl.prompt`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
input = {'prompt': tbl.prompt, 'go_fast': True, 'megapixels': '1'}
tbl.add_computed_column(
response=run(input, ref='black-forest-labs/flux-schnell')
)
tbl.add_computed_column(image=tbl.response.output[0].astype(pxt.Image))
```
# ResultCursor
Source: https://docs.pixeltable.com/sdk/latest/resultcursor
# class pixeltable.ResultCursor
Cursor that iterates over query results.
Wraps a Query and yields Row objects one at a time,
avoiding materializing all results into memory.
A cursor transitions through three states: pending (created but not yet started), open (actively
iterating), and closed (resources released). Iteration auto-opens and auto-closes the cursor, or you can
use it as a context manager for explicit lifecycle control.
## method close()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
close() -> None
```
Release the underlying database transaction and query resources.
Safe to call multiple times. Once closed, the cursor cannot be reopened.
Also called automatically via the context manager protocol and on garbage collection.
## method open()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
open() -> None
```
Start the underlying query and prepare the cursor for iteration.
Raises an error if the cursor is already open or has been closed.
Called automatically when iterating if not already open.
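The pending, open, and closed states described above can be modeled with a small toy class. This is a sketch of the documented lifecycle only, not Pixeltable's implementation; `ToyCursor`, its `state` attribute, and the choice of `RuntimeError` are all assumptions made for illustration.

```python
class ToyCursor:
    """Toy model of the pending -> open -> closed lifecycle (not Pixeltable code)."""

    def __init__(self, rows):
        self._rows = rows
        self.state = 'pending'

    def open(self):
        if self.state != 'pending':
            raise RuntimeError(f'cannot open a cursor that is {self.state}')
        self.state = 'open'

    def close(self):
        self.state = 'closed'  # idempotent: closing twice is harmless

    def __enter__(self):
        if self.state == 'pending':
            self.open()
        return self

    def __exit__(self, *exc_info):
        self.close()

    def __iter__(self):
        if self.state == 'pending':
            self.open()  # iteration auto-opens
        elif self.state == 'closed':
            raise RuntimeError('cursor is closed')
        try:
            yield from self._rows
        finally:
            self.close()  # and auto-closes when exhausted or abandoned


cursor = ToyCursor([{'id': 1}, {'id': 2}])
ids = [row['id'] for row in cursor]
```

After the loop finishes, the toy cursor is closed and a further `open()` call fails, which mirrors the "cannot be reopened" rule.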
# ResultSet
Source: https://docs.pixeltable.com/sdk/latest/resultset
# class pixeltable.ResultSet
A dataset obtained by executing a [`Query`](./query). Returned by
[`Query.collect()`](./query#method-collect), [`Query.head()`](./query#method-head),
[`Query.tail()`](./query#method-tail), and the equivalent methods on class [`Table`](./table).
A `ResultSet` is structured as a table with rows (indexed by integers) and columns (indexed by strings).
The column names correspond to the expressions in the query's select list. The values in a `ResultSet` can
be accessed in various ways:
* `len(result)` returns the number of rows
* `result[i]` returns the `i`th row as a `dict` mapping column names to values
* `result['col']` returns a `list` of all values in the column named `'col'`
* `result[i, 'col']` returns the specific value in the `i`th row and column `'col'`
`ResultSet` implements the Sequence protocol, so it can be iterated over and converted to other sequence
types in the usual fashion; for example:
* `for row in result` (iterates over rows)
* `list(result)` (converts to a list of rows)
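The access forms listed above can be tried out on a small stand-in class. `ToyResultSet` below is hypothetical and exists only to mimic the documented indexing behaviour; a real `ResultSet` comes from executing a query.

```python
class ToyResultSet:
    """Toy stand-in that mimics the documented indexing forms (not Pixeltable code)."""

    def __init__(self, columns, rows):
        self._columns = list(columns)
        self._rows = [tuple(r) for r in rows]

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, key):
        if isinstance(key, int):   # result[i] -> dict for row i
            return dict(zip(self._columns, self._rows[key]))
        if isinstance(key, str):   # result['col'] -> list of column values
            idx = self._columns.index(key)
            return [row[idx] for row in self._rows]
        i, col = key               # result[i, 'col'] -> a single value
        return self._rows[i][self._columns.index(col)]


result = ToyResultSet(['name', 'age'], [('Ada', 36), ('Alan', 41)])
first_row = result[0]
ages = result['age']
second_name = result[1, 'name']
all_rows = list(result)  # integer indexing plus IndexError makes it iterable
```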
## method to\_pandas()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
to_pandas() -> pd.DataFrame
```
Convert the `ResultSet` to a Pandas `DataFrame`.
**Returns:**
* `pd.DataFrame`: A `DataFrame` with one column per column in the `ResultSet`.
## method to\_pydantic()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
to_pydantic(model: type[BaseModelT]) -> Iterator[BaseModelT]
```
Convert the `ResultSet` to Pydantic model instances.
**Parameters:**
* **`model`** (`type[BaseModelT]`): A Pydantic model class.
**Returns:**
* `Iterator[BaseModelT]`: An iterator over Pydantic model instances, one for each row in the result set.
# reve
Source: https://docs.pixeltable.com/sdk/latest/reve
# module pixeltable.functions.reve
Pixeltable [UDFs](https://docs.pixeltable.com/platform/udfs-in-pixeltable) that wrap the [Reve](https://app.reve.com/) image
generation API. In order to use them, the API key must be specified either with the `REVE_API_KEY` environment variable,
or as `api_key` in the `reve` section of the Pixeltable config file.
## udf create()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
create(
prompt: pxt.String,
*,
aspect_ratio: pxt.String | None = None,
version: pxt.String | None = None,
model_kwargs: pxt.Json | None = None
) -> pxt.Image
```
Creates an image from a text prompt.
This UDF wraps the `https://api.reve.com/v1/image/create` endpoint. For more information, refer to the official
[API documentation](https://api.reve.com/console/docs/create).
**Parameters:**
* **`prompt`** (`pxt.String`): prompt describing the desired image
* **`aspect_ratio`** (`pxt.String | None`): desired image aspect ratio, e.g. '3:2', '16:9', '1:1', etc.
* **`version`** (`pxt.String | None`): specific model version to use. Latest if not specified.
* **`model_kwargs`** (`pxt.Json | None`): additional keyword arguments to pass to the Reve API.
**Returns:**
* `pxt.Image`: A generated image
**Examples:**
Add a computed column with generated square images to a table with text prompts:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(img=reve.create(t.prompt, aspect_ratio='1:1'))
```
## udf edit()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
edit(
image: pxt.Image,
edit_instruction: pxt.String,
*,
version: pxt.String | None = None,
model_kwargs: pxt.Json | None = None
) -> pxt.Image
```
Edits images based on a text prompt.
This UDF wraps the `https://api.reve.com/v1/image/edit` endpoint. For more information, refer to the official
[API documentation](https://api.reve.com/console/docs/edit).
**Parameters:**
* **`image`** (`pxt.Image`): image to edit
* **`edit_instruction`** (`pxt.String`): text prompt describing the desired edit
* **`version`** (`pxt.String | None`): specific model version to use. Latest if not specified.
* **`model_kwargs`** (`pxt.Json | None`): additional keyword arguments to pass to the Reve API.
**Returns:**
* `pxt.Image`: A generated image
**Examples:**
Add a computed column with catalog-ready images to the table with product pictures:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
catalog_img=reve.edit(
t.product_img,
'Remove background and distractions from the product picture, improve lighting.',
)
)
```
## udf remix()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
remix(
prompt: pxt.String,
images: pxt.Json[(Image, ...)],
*,
aspect_ratio: pxt.String | None = None,
version: pxt.String | None = None,
model_kwargs: pxt.Json | None = None
) -> pxt.Image
```
Creates images based on a text prompt and reference images.
The prompt may include `0`, `1`, etc. tags to refer to the images in the `images` argument.
This UDF wraps the `https://api.reve.com/v1/image/remix` endpoint. For more information, refer to the official
[API documentation](https://api.reve.com/console/docs/remix).
**Parameters:**
* **`prompt`** (`pxt.String`): prompt describing the desired image
* **`images`** (`pxt.Json[(Image, ...)]`): list of reference images
* **`aspect_ratio`** (`pxt.String | None`): desired image aspect ratio, e.g. '3:2', '16:9', '1:1', etc.
* **`version`** (`pxt.String | None`): specific model version to use. Latest by default.
* **`model_kwargs`** (`pxt.Json | None`): additional keyword arguments to pass to the Reve API.
**Returns:**
* `pxt.Image`: A generated image
**Examples:**
Add a computed column with promotional collages to a table with original images:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
promo_img=(
reve.remix(
'Generate a product promotional image by combining the image of the product'
' from 0 with the landmark scene from 1',
images=[t.product_img, t.local_landmark_img],
aspect_ratio='16:9',
)
)
)
```
# Row
Source: https://docs.pixeltable.com/sdk/latest/row
# class pixeltable.Row
A dict-like wrapper over a single result row.
Supports key access (`row['col']`), membership (`'col' in row`),
iteration over keys, and the standard `get`, `keys`, `values`,
and `items` methods.
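A read-only, dict-like wrapper of this kind can be sketched with `collections.abc.Mapping`, which derives `get`, `keys`, `values`, `items`, and membership tests from three methods. `ToyRow` is a hypothetical illustration and says nothing about how `Row` is actually implemented.

```python
from collections.abc import Mapping


class ToyRow(Mapping):
    """Read-only dict-like wrapper; Mapping supplies get/keys/values/items."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


row = ToyRow({'name': 'Ada', 'age': 36})
name = row['name']          # key access
has_age = 'age' in row      # membership
columns = list(row)         # iteration yields keys
fallback = row.get('email', 'n/a')
```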
# runwayml
Source: https://docs.pixeltable.com/sdk/latest/runwayml
# module pixeltable.functions.runwayml
Pixeltable UDFs
that wrap various endpoints from the RunwayML API. In order to use them, you must
first `pip install runwayml` and configure your RunwayML credentials by setting the `RUNWAYML_API_SECRET` environment
variable.
## udf image\_to\_video()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
image_to_video(
prompt_image: pxt.Image,
model: pxt.String,
ratio: pxt.String,
*,
prompt_text: pxt.String | None = None,
duration: pxt.Int | None = None,
seed: pxt.Int | None = None,
audio: pxt.Bool | None = None,
model_kwargs: pxt.Json | None = None
) -> pxt.Json
```
Generate videos from images.
For additional details, see: [Image to video](https://docs.dev.runwayml.com/api/#tag/Start-generating/paths/~1v1~1image_to_video/post)
**Requirements:**
* `pip install runwayml`
**Parameters:**
* **`prompt_image`** (`pxt.Image`): Input image to use as the first frame.
* **`model`** (`pxt.String`): The model to use.
* **`ratio`** (`pxt.String`): Aspect ratio of the generated video.
* **`prompt_text`** (`pxt.String | None`): Text description to guide generation.
* **`duration`** (`pxt.Int | None`): Duration in seconds.
* **`seed`** (`pxt.Int | None`): Seed for reproducibility.
* **`audio`** (`pxt.Bool | None`): Whether to generate audio.
* **`model_kwargs`** (`pxt.Json | None`): Additional API parameters.
**Returns:**
* `pxt.Json`: A dictionary containing the response and metadata.
**Examples:**
Add a computed column that generates videos from images:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
response=image_to_video(
tbl.image,
model='gen4',
ratio='16:9',
prompt_text='Slow motion',
duration=5,
)
)
tbl.add_computed_column(video=tbl.response['output'].astype(pxt.Video))
```
## udf text\_to\_image()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
text_to_image(
prompt_text: pxt.String,
reference_images: pxt.Json[(Image, ...)],
model: pxt.String,
ratio: pxt.String,
*,
seed: pxt.Int | None = None,
model_kwargs: pxt.Json | None = None
) -> pxt.Json
```
Generate images from text prompts and reference images.
For additional details, see: [Text/Image to Image](https://docs.dev.runwayml.com/api/#tag/Start-generating/paths/~1v1~1text_to_image/post)
**Requirements:**
* `pip install runwayml`
**Parameters:**
* **`prompt_text`** (`pxt.String`): Text description of the image to generate.
* **`reference_images`** (`pxt.Json[(Image, ...)]`): List of 1-3 reference images.
* **`model`** (`pxt.String`): The model to use.
* **`ratio`** (`pxt.String`): Aspect ratio of the generated image.
* **`seed`** (`pxt.Int | None`): Seed for reproducibility.
* **`model_kwargs`** (`pxt.Json | None`): Additional API parameters.
**Returns:**
* `pxt.Json`: A dictionary containing the response and metadata.
**Examples:**
Add a computed column that generates images from prompts:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
response=text_to_image(
tbl.prompt, [tbl.ref_image], model='gen4_image', ratio='16:9'
)
)
tbl.add_computed_column(image=tbl.response['output'][0].astype(pxt.Image))
```
## udf text\_to\_video()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
text_to_video(
prompt_text: pxt.String,
model: pxt.String,
ratio: pxt.String,
*,
duration: pxt.Int | None = None,
audio: pxt.Bool | None = None,
model_kwargs: pxt.Json | None = None
) -> pxt.Json
```
Generate videos from text prompts.
For additional details, see: [Text to video](https://docs.dev.runwayml.com/api/#tag/Start-generating/paths/~1v1~1text_to_video/post)
**Requirements:**
* `pip install runwayml`
**Parameters:**
* **`prompt_text`** (`pxt.String`): Text description of the video to generate.
* **`model`** (`pxt.String`): The model to use.
* **`ratio`** (`pxt.String`): Aspect ratio of the generated video.
* **`duration`** (`pxt.Int | None`): Duration in seconds.
* **`audio`** (`pxt.Bool | None`): Whether to generate audio.
* **`model_kwargs`** (`pxt.Json | None`): Additional API parameters.
**Returns:**
* `pxt.Json`: A dictionary containing the response and metadata.
**Examples:**
Add a computed column that generates videos from prompts:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
response=text_to_video(
tbl.prompt, model='veo3.1', ratio='16:9', duration=4
)
)
tbl.add_computed_column(video=tbl.response['output'].astype(pxt.Video))
```
## udf video\_to\_video()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
video_to_video(
video_uri: pxt.String,
prompt_text: pxt.String,
model: pxt.String,
ratio: pxt.String,
*,
seed: pxt.Int | None = None,
model_kwargs: pxt.Json | None = None
) -> pxt.Json
```
Transform videos with text guidance.
For additional details, see: [Video to video](https://docs.dev.runwayml.com/api/#tag/Start-generating/paths/~1v1~1video_to_video/post)
**Requirements:**
* `pip install runwayml`
**Parameters:**
* **`video_uri`** (`pxt.String`): HTTPS URL to the input video.
* **`prompt_text`** (`pxt.String`): Text description of the transformation.
* **`model`** (`pxt.String`): The model to use.
* **`ratio`** (`pxt.String`): Aspect ratio of the output video.
* **`seed`** (`pxt.Int | None`): Seed for reproducibility.
* **`model_kwargs`** (`pxt.Json | None`): Additional API parameters.
**Returns:**
* `pxt.Json`: A dictionary containing the response and metadata.
**Examples:**
Add a computed column that transforms videos:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
response=video_to_video(
tbl.video_url, 'Anime style', model='gen4_aleph', ratio='16:9'
)
)
tbl.add_computed_column(video=tbl.response['output'].astype(pxt.Video))
```
# serving
Source: https://docs.pixeltable.com/sdk/latest/serving
# module pixeltable.serving
Adapters for web serving frameworks.
# SqlExport
Source: https://docs.pixeltable.com/sdk/latest/sqlexport
# class pixeltable.serving.SqlExport
Specification of an external RDBMS target for SQL export.
**Parameters:**
* **`db_connect`** (`Any`): SQLAlchemy connection string for the target database (e.g.
`'postgresql+psycopg://user:pw@host/db'`, `'sqlite:///path/to.db'`).
* **`table`** (`Any`): Name of the target table. It must already exist; resolution fails
if the table is missing.
* **`db_schema`** (`Any`): Optional database schema qualifier (e.g. `'analytics'`); leave `None` to
use the connection's default schema.
* **`method`** (`Any`): How to write each row into the target table.
* `'insert'`: append the row via `INSERT ... VALUES`.
* `'update'`: update the row by primary-key match
(`UPDATE ... SET ... WHERE pk=...`). Requires that the target table has a
primary key whose metadata is exposed by the dialect. The exported columns
must include all primary-key columns of the target plus at least one non-PK
column to set. This is a strict update, **not** an upsert: if the WHERE
clause matches zero rows, the export fails. Useful when the source is
append-only but the target is a deduplicated current-state view.
* `'merge'`: upsert via the target table's primary key.
**Currently not supported.**
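The difference between a strict update and an upsert can be seen with the standard-library `sqlite3` module. This sketch does not use Pixeltable or `SqlExport`; the `users` table, the `strict_update` helper, and the use of `LookupError` are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)')
conn.execute("INSERT INTO users VALUES (1, 'old@example.com')")


def strict_update(conn, user_id, email):
    """Update by primary key and fail if no row matched (no upsert fallback)."""
    cur = conn.execute('UPDATE users SET email = ? WHERE id = ?', (email, user_id))
    if cur.rowcount == 0:
        raise LookupError(f'no row with id={user_id}')


strict_update(conn, 1, 'new@example.com')  # matches one row and succeeds
try:
    strict_update(conn, 2, 'ghost@example.com')  # matches nothing
    missing_failed = False
except LookupError:
    missing_failed = True
```

An upsert would have inserted a row with `id=2` in the second call; the strict update leaves the table with its single original row.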
# StreamMetadata
Source: https://docs.pixeltable.com/sdk/latest/streammetadata
# class pixeltable.functions.StreamMetadata
Metadata for a stream within a media container.
## attr average\_rate
```
average_rate: float | None
```
Average frame rate in FPS (frames per second). Present only for video streams.
## attr base\_rate
```
base_rate: float | None
```
Base (constant) frame rate in FPS. Present only for video streams.
## attr codec\_context
```
codec_context: CodecContextMetadata
```
Codec information for this stream.
## attr duration
```
duration: int | None
```
Stream duration in `time_base` units, or `None` if unknown.
## attr duration\_seconds
```
duration_seconds: float | None
```
Stream duration in seconds, computed from `duration` and `time_base`.
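Since `time_base` is expressed in seconds per tick, the conversion is presumably a simple product. The sketch below assumes exactly that; the helper `duration_in_seconds` is hypothetical, and returning `None` when either input is unknown is an assumption.

```python
def duration_in_seconds(duration, time_base):
    """Return duration * time_base, or None when either value is unknown."""
    if duration is None or time_base is None:
        return None
    return duration * time_base


# 450,000 ticks at a 1/90,000 s time base is about five seconds.
seconds = duration_in_seconds(450_000, 1 / 90_000)
unknown = duration_in_seconds(None, 1 / 90_000)
```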
## attr frames
```
frames: int
```
Number of frames in the stream (may be 0 if unknown).
## attr guessed\_rate
```
guessed_rate: float | None
```
Guessed frame rate in FPS. Present only for video streams.
## attr height
```
height: int
```
Frame height in pixels. Present only for video streams.
## attr metadata
```
metadata: dict[str, str]
```
Additional stream-specific metadata tags (e.g. language, title).
## attr time\_base
```
time_base: float | None
```
Time base of the stream as a float (seconds per tick), or `None` if unknown.
## attr type
```
type: str
```
Stream type: typically `'audio'` or `'video'`. Other stream types (e.g. subtitles) will have a
`StreamMetadata` entry, but with no metadata other than `type`.
## attr width
```
width: int
```
Frame width in pixels. Present only for video streams.
# string
Source: https://docs.pixeltable.com/sdk/latest/string
# module pixeltable.functions.string
Pixeltable UDFs for `StringType`.
This module closely follows the Pandas `pandas.Series.str` API.
Example:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
t = pxt.get_table(...)
t.select(t.str_col.capitalize()).collect()
```
## iterator string\_splitter()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.iterator
string_splitter(
text: pxt.String,
separators: pxt.String,
*,
spacy_model: pxt.String = 'en_core_web_sm'
)
```
Iterator over chunks of a string. The string is chunked according to the specified `separators`.
**Outputs**:
One row per chunk, with the following columns:
* `text` (`pxt.String`): The text of the chunk.
**Parameters:**
* **`separators`** (`pxt.String`): Separators to use to chunk the document. Currently the only supported option is `'sentence'`.
* **`spacy_model`** (`pxt.String`): Name of the spaCy model to use for sentence segmentation.
**Examples:**
This example assumes an existing table `tbl` with a column `text` of type `pxt.String`.
Create a view that splits all strings on sentence boundaries:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_view(
'sentence_chunks',
tbl,
iterator=string_splitter(tbl.text, separators='sentence'),
)
```
## udf capitalize()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
capitalize(self: pxt.String) -> pxt.String
```
Return string with its first character capitalized and the rest lowercased.
Equivalent to [`str.capitalize()`](https://docs.python.org/3/library/stdtypes.html#str.capitalize).
## udf casefold()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
casefold(self: pxt.String) -> pxt.String
```
Return a casefolded copy of string.
Equivalent to [`str.casefold()`](https://docs.python.org/3/library/stdtypes.html#str.casefold).
## udf center()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
center(
self: pxt.String,
width: pxt.Int,
fillchar: pxt.String = ' '
) -> pxt.String
```
Return a centered string of length `width`.
Equivalent to [`str.center()`](https://docs.python.org/3/library/stdtypes.html#str.center).
**Parameters:**
* **`width`** (`pxt.Int`): Total width of the resulting string.
* **`fillchar`** (`pxt.String`): Character used for padding.
## udf contains()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
contains(
self: pxt.String,
substr: pxt.String,
case: pxt.Bool = True
) -> pxt.Bool
```
Test if string contains a substring.
**Parameters:**
* **`substr`** (`pxt.String`): string literal or regular expression
* **`case`** (`pxt.Bool`): if False, ignore case
## udf contains\_re()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
contains_re(
self: pxt.String,
pattern: pxt.String,
flags: pxt.Int = 0
) -> pxt.Bool
```
Test if string contains a regular expression pattern.
**Parameters:**
* **`pattern`** (`pxt.String`): regular expression pattern
* **`flags`** (`pxt.Int`): [flags](https://docs.python.org/3/library/re.html#flags) for the `re` module
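Because `flags` is an integer, `re` flags can be combined with `|` and passed as a plain `int`. The stand-in below assumes the UDF behaves like `re.search`, which the documentation does not state explicitly; the local `contains_re` function is a hypothetical plain-Python equivalent, not the UDF itself.

```python
import re


def contains_re(text, pattern, flags=0):
    """Plain-Python stand-in: True if the pattern matches anywhere in text."""
    return re.search(pattern, text, flags) is not None


# re flags are IntFlag members, so they combine with | and convert to int.
flags = int(re.IGNORECASE | re.MULTILINE)

strict = contains_re('Hello\nWorld', r'^world')          # case-sensitive, ^ only at start
relaxed = contains_re('Hello\nWorld', r'^world', flags)  # ignore case, ^ at each line
```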
## udf count()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
count(
self: pxt.String,
pattern: pxt.String,
flags: pxt.Int = 0
) -> pxt.Int
```
Count occurrences of pattern or regex.
**Parameters:**
* **`pattern`** (`pxt.String`): string literal or regular expression
* **`flags`** (`pxt.Int`): [flags](https://docs.python.org/3/library/re.html#flags) for the `re` module
## udf endswith()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
endswith(self: pxt.String, substr: pxt.String) -> pxt.Bool
```
Return `True` if the string ends with the specified suffix, otherwise return `False`.
Equivalent to [`str.endswith()`](https://docs.python.org/3/library/stdtypes.html#str.endswith).
**Parameters:**
* **`substr`** (`pxt.String`): string literal
## udf fill()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
fill(self: pxt.String, width: pxt.Int, **kwargs) -> pxt.String
```
Wraps the single paragraph in string, and returns a single string containing the wrapped paragraph.
Equivalent to [`textwrap.fill()`](https://docs.python.org/3/library/textwrap.html#textwrap.fill).
**Parameters:**
* **`width`** (`pxt.Int`): Maximum line width.
* **`kwargs`** (`Any`): Additional keyword arguments to pass to `textwrap.fill()`.
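Since the UDF is documented as equivalent to `textwrap.fill()`, the standard-library function shows what to expect. The sample text is arbitrary; the extra keyword arguments used here are standard `textwrap.TextWrapper` options.

```python
import textwrap

text = 'The quick brown fox jumps over the lazy dog'

# Wrap so that no line is longer than 15 characters.
wrapped = textwrap.fill(text, width=15)

# Extra keyword arguments map to TextWrapper options, such as indents.
indented = textwrap.fill(text, width=20, initial_indent='> ', subsequent_indent='  ')
```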
## udf find()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
find(
self: pxt.String,
substr: pxt.String,
start: pxt.Int = 0,
end: pxt.Int | None = None
) -> pxt.Int
```
Return the lowest index in string where `substr` is found within the slice `s[start:end]`.
Equivalent to [`str.find()`](https://docs.python.org/3/library/stdtypes.html#str.find).
**Parameters:**
* **`substr`** (`pxt.String`): substring to search for
* **`start`** (`pxt.Int`): slice start
* **`end`** (`pxt.Int | None`): slice end
## udf findall()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
findall(
self: pxt.String,
pattern: pxt.String,
flags: pxt.Int = 0
) -> pxt.Json[(Json, ...)]
```
Find all occurrences of a regular expression pattern in string.
Equivalent to [`re.findall()`](https://docs.python.org/3/library/re.html#re.findall).
**Parameters:**
* **`pattern`** (`pxt.String`): regular expression pattern
* **`flags`** (`pxt.Int`): [flags](https://docs.python.org/3/library/re.html#flags) for the `re` module
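The shape of the result depends on how many capture groups the pattern has, which follows from `re.findall()` itself. The sample string below is made up for illustration.

```python
import re

log = 'GET /a 200, POST /b 404, GET /c 500'

no_group = re.findall(r'\d{3}', log)             # list of whole matches
one_group = re.findall(r'(GET|POST) /\w+', log)  # list of the group's contents
two_groups = re.findall(r'(\w+) (/\w+)', log)    # list of tuples, one per match
```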
## udf format()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
format(self: pxt.String, *args, **kwargs) -> pxt.String
```
Perform string formatting.
Equivalent to [`str.format()`](https://docs.python.org/3/library/stdtypes.html#str.format).
## udf fullmatch()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
fullmatch(
self: pxt.String,
pattern: pxt.String,
case: pxt.Bool = True,
flags: pxt.Int = 0
) -> pxt.Bool
```
Determine if string fully matches a regular expression.
Equivalent to [`re.fullmatch()`](https://docs.python.org/3/library/re.html#re.fullmatch).
**Parameters:**
* **`pattern`** (`pxt.String`): regular expression pattern
* **`case`** (`pxt.Bool`): if False, ignore case
* **`flags`** (`pxt.Int`): [flags](https://docs.python.org/3/library/re.html#flags) for the `re` module
## udf index()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
index(
self: pxt.String,
substr: pxt.String,
start: pxt.Int = 0,
end: pxt.Int | None = None
) -> pxt.Int
```
Return the lowest index in string where `substr` is found within the slice `s[start:end]`.
Raises ValueError if `substr` is not found.
Equivalent to [`str.index()`](https://docs.python.org/3/library/stdtypes.html#str.index).
**Parameters:**
* **`substr`** (`pxt.String`): substring to search for
* **`start`** (`pxt.Int`): slice start
* **`end`** (`pxt.Int | None`): slice end
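`find()` and `index()` differ only in how they report a missing substring. The plain-Python equivalents below show this on an arbitrary sample string.

```python
s = 'pixeltable'

found = s.find('table')        # lowest index where 'table' starts
missing = s.find('xyz')        # find() signals "not found" with -1
windowed = s.find('e', 4, 10)  # search restricted to s[4:10]

try:
    s.index('xyz')             # index() raises instead of returning -1
    index_raised = False
except ValueError:
    index_raised = True
```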
## udf isalnum()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
isalnum(self: pxt.String) -> pxt.Bool
```
Return `True` if all characters in the string are alphanumeric and there is at least one character, `False`
otherwise.
Equivalent to [`str.isalnum()`](https://docs.python.org/3/library/stdtypes.html#str.isalnum).
## udf isalpha()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
isalpha(self: pxt.String) -> pxt.Bool
```
Return `True` if all characters in the string are alphabetic and there is at least one character, `False` otherwise.
Equivalent to [`str.isalpha()`](https://docs.python.org/3/library/stdtypes.html#str.isalpha).
## udf isascii()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
isascii(self: pxt.String) -> pxt.Bool
```
Return `True` if the string is empty or all characters in the string are ASCII, `False` otherwise.
Equivalent to [`str.isascii()`](https://docs.python.org/3/library/stdtypes.html#str.isascii).
## udf isdecimal()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
isdecimal(self: pxt.String) -> pxt.Bool
```
Return `True` if all characters in the string are decimal characters and there is at least one character, `False`
otherwise.
Equivalent to [`str.isdecimal()`](https://docs.python.org/3/library/stdtypes.html#str.isdecimal).
## udf isdigit()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
isdigit(self: pxt.String) -> pxt.Bool
```
Return `True` if all characters in the string are digits and there is at least one character, `False` otherwise.
Equivalent to [`str.isdigit()`](https://docs.python.org/3/library/stdtypes.html#str.isdigit).
## udf isidentifier()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
isidentifier(self: pxt.String) -> pxt.Bool
```
Return `True` if the string is a valid identifier according to the language definition, `False` otherwise.
Equivalent to [`str.isidentifier()`](https://docs.python.org/3/library/stdtypes.html#str.isidentifier).
## udf islower()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
islower(self: pxt.String) -> pxt.Bool
```
Return `True` if all cased characters in the string are lowercase and there is at least one cased character,
`False` otherwise.
Equivalent to [`str.islower()`](https://docs.python.org/3/library/stdtypes.html#str.islower).
## udf isnumeric()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
isnumeric(self: pxt.String) -> pxt.Bool
```
Return `True` if all characters in the string are numeric characters and there is at least one character,
`False` otherwise.
Equivalent to [`str.isnumeric()`](https://docs.python.org/3/library/stdtypes.html#str.isnumeric).
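`isdecimal()`, `isdigit()`, and `isnumeric()` are easy to confuse. The underlying Python string methods agree on ordinary digits but differ on characters such as superscripts and fractions, as this small comparison shows.

```python
# '7' is a decimal digit, '\u00b2' is SUPERSCRIPT TWO, '\u00bd' is the
# fraction one half, and '' is the empty string.
samples = ['7', '\u00b2', '\u00bd', '']

# Map each sample to (isdecimal, isdigit, isnumeric).
table = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}
```

Each predicate is progressively more permissive, and all three return `False` for the empty string.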
## udf isspace()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
isspace(self: pxt.String) -> pxt.Bool
```
Return `True` if there are only whitespace characters in the string and there is at least one character,
`False` otherwise.
Equivalent to [`str.isspace()`](https://docs.python.org/3/library/stdtypes.html#str.isspace).
## udf istitle()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
istitle(self: pxt.String) -> pxt.Bool
```
Return `True` if the string is a titlecased string and there is at least one character, `False` otherwise.
Equivalent to [`str.istitle()`](https://docs.python.org/3/library/stdtypes.html#str.istitle).
## udf isupper()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
isupper(self: pxt.String) -> pxt.Bool
```
Return `True` if all cased characters in the string are uppercase and there is at least one cased character,
`False` otherwise.
Equivalent to [`str.isupper()`](https://docs.python.org/3/library/stdtypes.html#str.isupper)
## udf join()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
join(
sep: pxt.String,
elements: pxt.Json[(Json, ...)]
) -> pxt.String
```
Return a string which is the concatenation of the strings in `elements`.
Equivalent to [`str.join()`](https://docs.python.org/3/library/stdtypes.html#str.join)
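As a quick reference for the semantics, here is the equivalent built-in call in plain Python. The variable names mirror the `sep` and `elements` parameters; the values are illustrative only.

```python
# Plain-Python equivalent: the separator is placed between consecutive elements.
sep = ', '
elements = ['red', 'green', 'blue']

joined = sep.join(elements)
print(joined)  # red, green, blue
```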
## udf len()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
len(self: pxt.String) -> pxt.Int
```
Return the number of characters in the string.
Equivalent to [`len(str)`](https://docs.python.org/3/library/functions.html#len)
## udf ljust()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
ljust(
self: pxt.String,
width: pxt.Int,
fillchar: pxt.String = ' '
) -> pxt.String
```
Return the string left-justified in a string of length `width`.
Equivalent to [`str.ljust()`](https://docs.python.org/3/library/stdtypes.html#str.ljust)
**Parameters:**
* **`width`** (`pxt.Int`): Minimum width of resulting string; additional characters will be filled with character defined in
`fillchar`.
* **`fillchar`** (`pxt.String`): Additional character for filling.
## udf lower()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
lower(self: pxt.String) -> pxt.String
```
Return a copy of the string with all the cased characters converted to lowercase.
Equivalent to [`str.lower()`](https://docs.python.org/3/library/stdtypes.html#str.lower)
## udf lstrip()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
lstrip(
self: pxt.String,
chars: pxt.String | None = None
) -> pxt.String
```
Return a copy of the string with leading characters removed. The `chars` argument is a string specifying the set of
characters to be removed. If omitted or `None`, whitespace characters are removed.
Equivalent to [`str.lstrip()`](https://docs.python.org/3/library/stdtypes.html#str.lstrip)
**Parameters:**
* **`chars`** (`pxt.String | None`): The set of characters to be removed.
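Note that `chars` is treated as a set of characters, not as a prefix. The plain-Python sketch below (using the built-in `str.lstrip()` and `str.removeprefix()`, Python 3.9+, rather than Pixeltable calls) shows how the two differ on an illustrative input.

```python
# Plain-Python illustration: lstrip() strips any leading characters found in the set.
url = 'www.web.com'

stripped = url.lstrip('w.')               # removes every leading 'w' or '.'
prefix_removed = url.removeprefix('www.')  # removes the exact prefix once
default_strip = '  hi '.lstrip()           # no chars given: leading whitespace only

print(stripped)        # eb.com
print(prefix_removed)  # web.com
```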
## udf match()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
match(
self: pxt.String,
pattern: pxt.String,
case: pxt.Bool = True,
flags: pxt.Int = 0
) -> pxt.Bool
```
Determine whether the string starts with a match of a regular expression pattern.
**Parameters:**
* **`pattern`** (`pxt.String`): regular expression pattern
* **`case`** (`pxt.Bool`): if False, ignore case
* **`flags`** (`pxt.Int`): [flags](https://docs.python.org/3/library/re.html#flags) for the `re` module
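The behavior described above can be approximated with the standard `re` module. This is a minimal sketch, not Pixeltable's implementation: `match_sketch` is a hypothetical helper, and mapping `case=False` to `re.IGNORECASE` is an assumption based on the parameter description.

```python
import re

def match_sketch(s: str, pattern: str, case: bool = True, flags: int = 0) -> bool:
    """Hypothetical stand-in: True if `pattern` matches at the start of `s`."""
    if not case:
        flags |= re.IGNORECASE  # assumption: case=False means ignore case
    # re.match only matches at the beginning of the string.
    return re.match(pattern, s, flags) is not None

print(match_sketch('Pixeltable', 'pixel', case=False))  # True
print(match_sketch('Pixeltable', 'table'))              # False
```

The second call returns `False` because the pattern occurs in the string but not at its start.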
## udf normalize()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
normalize(self: pxt.String, form: pxt.String) -> pxt.String
```
Return the Unicode normal form.
Equivalent to [`unicodedata.normalize()`](https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize)
**Parameters:**
* **`form`** (`pxt.String`): Unicode normal form (`'NFC'`, `'NFKC'`, `'NFD'`, `'NFKD'`)
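To make the normal forms concrete, the sketch below calls `unicodedata.normalize()` directly in plain Python (not through Pixeltable). The sample strings are illustrative.

```python
import unicodedata

decomposed = 'e\u0301'  # 'e' followed by a combining acute accent
ligature = '\ufb01'     # single-character 'fi' ligature

composed = unicodedata.normalize('NFC', decomposed)      # one precomposed character
round_trip = unicodedata.normalize('NFD', composed)      # back to two code points
compat = unicodedata.normalize('NFKC', ligature)         # compatibility mapping to 'fi'

print(len(decomposed), len(composed))  # 2 1
print(compat)                          # fi
```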
## udf pad()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
pad(
self: pxt.String,
width: pxt.Int,
side: pxt.String = 'left',
fillchar: pxt.String = ' '
) -> pxt.String
```
Pad the string with `fillchar` up to a total length of `width`.
**Parameters:**
* **`width`** (`pxt.Int`): Minimum width of resulting string; additional characters will be filled with character defined in
`fillchar`.
* **`side`** (`pxt.String`): Side from which to fill resulting string (`'left'`, `'right'`, `'both'`)
* **`fillchar`** (`pxt.String`): Additional character for filling
## udf partition()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
partition(
self: pxt.String,
sep: pxt.String = ' '
) -> pxt.Json[(Json, ...)]
```
Splits string at the first occurrence of `sep`, and returns 3 elements containing the part before the
separator, the separator itself, and the part after the separator. If the separator is not found, return 3 elements
containing string itself, followed by two empty strings.
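The described behavior matches Python's built-in `str.partition()`, shown here in plain Python for illustration. One difference to keep in mind: the built-in returns a tuple, whereas the UDF's return type is a JSON list.

```python
# Plain-Python illustration using the built-in str.partition().
found = 'key=value=more'.partition('=')    # splits at the FIRST '='
missing = 'no separator'.partition('=')    # separator absent

print(found)    # ('key', '=', 'value=more')
print(missing)  # ('no separator', '', '')
```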
## udf removeprefix()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
removeprefix(self: pxt.String, prefix: pxt.String) -> pxt.String
```
Remove `prefix` from the start of the string. If the prefix is not present, return the original string.
## udf removesuffix()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
removesuffix(self: pxt.String, suffix: pxt.String) -> pxt.String
```
Remove `suffix` from the end of the string. If the suffix is not present, return the original string.
## udf repeat()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
repeat(self: pxt.String, n: pxt.Int) -> pxt.String
```
Repeat string `n` times.
## udf replace()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
replace(
self: pxt.String,
substr: pxt.String,
repl: pxt.String,
n: pxt.Int | None = None
) -> pxt.String
```
Replace occurrences of `substr` with `repl`.
Equivalent to [`str.replace()`](https://docs.python.org/3/library/stdtypes.html#str.replace).
**Parameters:**
* **`substr`** (`pxt.String`): string literal
* **`repl`** (`pxt.String`): replacement string
* **`n`** (`pxt.Int | None`): number of replacements to make (if `None`, replace all occurrences)
## udf replace\_re()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
replace_re(
self: pxt.String,
pattern: pxt.String,
repl: pxt.String,
n: pxt.Int | None = None,
flags: pxt.Int = 0
) -> pxt.String
```
Replace occurrences of a regular expression pattern with `repl`.
Equivalent to [`re.sub()`](https://docs.python.org/3/library/re.html#re.sub).
**Parameters:**
* **`pattern`** (`pxt.String`): regular expression pattern
* **`repl`** (`pxt.String`): replacement string
* **`n`** (`pxt.Int | None`): number of replacements to make (if `None`, replace all occurrences)
* **`flags`** (`pxt.Int`): [flags](https://docs.python.org/3/library/re.html#flags) for the `re` module
## udf reverse()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
reverse(self: pxt.String) -> pxt.String
```
Return a reversed copy of the string.
Equivalent to `str[::-1]`.
## udf rfind()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
rfind(
self: pxt.String,
substr: pxt.String,
start: pxt.Int | None = 0,
end: pxt.Int | None = None
) -> pxt.Int
```
Return the highest index where `substr` is found, such that `substr` is contained within `[start:end]`.
Equivalent to [`str.rfind()`](https://docs.python.org/3/library/stdtypes.html#str.rfind).
**Parameters:**
* **`substr`** (`pxt.String`): substring to search for
* **`start`** (`pxt.Int | None`): slice start
* **`end`** (`pxt.Int | None`): slice end
## udf rindex()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
rindex(
self: pxt.String,
substr: pxt.String,
start: pxt.Int | None = 0,
end: pxt.Int | None = None
) -> pxt.Int
```
Return the highest index where `substr` is found, such that `substr` is contained within `[start:end]`.
Raises ValueError if `substr` is not found.
Equivalent to [`str.rindex()`](https://docs.python.org/3/library/stdtypes.html#str.rindex).
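The difference between `rfind()` and `rindex()` only shows up when the substring is missing. The plain-Python sketch below uses the built-in `str` methods (not Pixeltable calls) on an illustrative string.

```python
# Plain-Python illustration of the built-in str.rfind() and str.rindex().
s = 'banana'

last = s.rfind('an')           # highest index where 'an' starts
bounded = s.rfind('an', 0, 4)  # search restricted to s[0:4] == 'bana'
absent = s.rfind('xy')         # rfind signals "not found" with -1

try:
    s.rindex('xy')             # rindex raises instead of returning -1
    raised = False
except ValueError:
    raised = True

print(last, bounded, absent, raised)  # 3 1 -1 True
```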
## udf rjust()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
rjust(
self: pxt.String,
width: pxt.Int,
fillchar: pxt.String = ' '
) -> pxt.String
```
Return the string right-justified in a string of length `width`.
Equivalent to [`str.rjust()`](https://docs.python.org/3/library/stdtypes.html#str.rjust).
**Parameters:**
* **`width`** (`pxt.Int`): Minimum width of resulting string.
* **`fillchar`** (`pxt.String`): Additional character for filling.
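A short plain-Python comparison of the built-in `str.ljust()` and `str.rjust()` (not Pixeltable calls) shows where the fill characters land. The inputs are illustrative.

```python
# Plain-Python illustration of the built-in justification methods.
left = 'abc'.ljust(6, '.')       # fill characters are appended on the right
right = 'abc'.rjust(6, '.')      # fill characters are prepended on the left
too_long = 'abcdefgh'.rjust(6)   # already longer than width: returned unchanged

print(left)      # abc...
print(right)     # ...abc
print(too_long)  # abcdefgh
```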
## udf rpartition()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
rpartition(
self: pxt.String,
sep: pxt.String = ' '
) -> pxt.Json[(Json, ...)]
```
Splits the string at the last occurrence of `sep`, and returns a list containing the part before the
separator, the separator itself, and the part after the separator.
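For reference, Python's built-in `str.rpartition()` behaves as follows. This is a plain-Python illustration; the text above does not state what the UDF returns when the separator is missing, so the second case shows only what the built-in does.

```python
# Plain-Python illustration using the built-in str.rpartition().
found = 'key=value=more'.rpartition('=')   # splits at the LAST '='
missing = 'no separator'.rpartition('=')   # built-in behavior when sep is absent

print(found)    # ('key=value', '=', 'more')
print(missing)  # ('', '', 'no separator')
```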
## udf rstrip()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
rstrip(
self: pxt.String,
chars: pxt.String | None = None
) -> pxt.String
```
Return a copy of string with trailing characters removed.
Equivalent to [`str.rstrip()`](https://docs.python.org/3/library/stdtypes.html#str.rstrip).
**Parameters:**
* **`chars`** (`pxt.String | None`): The set of characters to be removed. If omitted or `None`, whitespace characters are removed.
## udf slice()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
slice(
self: pxt.String,
start: pxt.Int | None = None,
stop: pxt.Int | None = None,
step: pxt.Int | None = None
) -> pxt.String
```
Return a slice of the string, following Python slice semantics (`str[start:stop:step]`).
**Parameters:**
* **`start`** (`pxt.Int | None`): slice start
* **`stop`** (`pxt.Int | None`): slice end
* **`step`** (`pxt.Int | None`): slice step
## udf slice\_replace()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
slice_replace(
self: pxt.String,
start: pxt.Int | None = None,
stop: pxt.Int | None = None,
repl: pxt.String | None = None
) -> pxt.String
```
Replace a positional slice with another value.
**Parameters:**
* **`start`** (`pxt.Int | None`): slice start
* **`stop`** (`pxt.Int | None`): slice end
* **`repl`** (`pxt.String | None`): replacement value
## udf startswith()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
startswith(self: pxt.String, substr: pxt.String) -> pxt.Int
```
Return `True` if string starts with `substr`, otherwise return `False`.
Equivalent to [`str.startswith()`](https://docs.python.org/3/library/stdtypes.html#str.startswith).
**Parameters:**
* **`substr`** (`pxt.String`): string literal
## udf strip()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
strip(
self: pxt.String,
chars: pxt.String | None = None
) -> pxt.String
```
Return a copy of string with leading and trailing characters removed.
Equivalent to [`str.strip()`](https://docs.python.org/3/library/stdtypes.html#str.strip).
**Parameters:**
* **`chars`** (`pxt.String | None`): The set of characters to be removed. If omitted or `None`, whitespace characters are removed.
## udf swapcase()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
swapcase(self: pxt.String) -> pxt.String
```
Return a copy of string with uppercase characters converted to lowercase and vice versa.
Equivalent to [`str.swapcase()`](https://docs.python.org/3/library/stdtypes.html#str.swapcase).
## udf title()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
title(self: pxt.String) -> pxt.String
```
Return a titlecased version of string, i.e. words start with uppercase characters, all remaining cased characters
are lowercase.
Equivalent to [`str.title()`](https://docs.python.org/3/library/stdtypes.html#str.title).
## udf upper()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
upper(self: pxt.String) -> pxt.String
```
Return a copy of string converted to uppercase.
Equivalent to [`str.upper()`](https://docs.python.org/3/library/stdtypes.html#str.upper).
## udf wrap()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
wrap(
self: pxt.String,
width: pxt.Int,
**kwargs
) -> pxt.Json[(String, ...)]
```
Wraps the single paragraph in string so every line is at most `width` characters long.
Returns a list of output lines, without final newlines.
Equivalent to [`textwrap.wrap()`](https://docs.python.org/3/library/textwrap.html#textwrap.wrap).
**Parameters:**
* **`width`** (`pxt.Int`): Maximum line width.
* **`kwargs`** (`Any`): Additional keyword arguments to pass to `textwrap.wrap()`.
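In the Python standard library, a list of lines without trailing newlines is what `textwrap.wrap()` returns, while `textwrap.fill()` joins those same lines into a single string. The plain-Python sketch below shows both on an illustrative sentence.

```python
import textwrap

text = 'Pixeltable stores and indexes multimodal data.'

lines = textwrap.wrap(text, width=20)   # list of lines, each at most 20 characters
filled = textwrap.fill(text, width=20)  # the same lines joined with newlines

print(lines)  # ['Pixeltable stores', 'and indexes', 'multimodal data.']
```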
## udf zfill()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
zfill(self: pxt.String, width: pxt.Int) -> pxt.String
```
Pad a numeric string with ASCII `0` on the left to a total length of `width`.
Equivalent to [`str.zfill()`](https://docs.python.org/3/library/stdtypes.html#str.zfill).
**Parameters:**
* **`width`** (`pxt.Int`): Minimum width of resulting string.
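The built-in `str.zfill()` handles a leading sign specially, which is easy to overlook. A plain-Python illustration (not a Pixeltable call):

```python
# Plain-Python illustration of the built-in str.zfill().
padded = '42'.zfill(5)        # zeros are added on the left
signed = '-42'.zfill(5)       # zeros are inserted after the sign
unchanged = '12345'.zfill(3)  # already at least `width` long: returned as-is

print(padded, signed, unchanged)  # 00042 -0042 12345
```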
# Table
Source: https://docs.pixeltable.com/sdk/latest/table
# class pixeltable.Table
A handle to a table, view, or snapshot. This class is the primary interface through which table operations
(queries, insertions, updates, etc.) are performed in Pixeltable.
Thread-safe.
## method add\_column()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
add_column(
*,
if_exists: Literal['error', 'ignore', 'replace', 'replace_force'] = 'error',
**kwargs: type | ColumnSpec
) -> UpdateStatus
```
Adds an ordinary (non-computed) column to the table.
**Parameters:**
* **`kwargs`** (`type | ColumnSpec`): Exactly one keyword argument of the form `col_name=type` or `col_name=col_spec_dict`,
where `col_spec_dict` is a [`ColumnSpec`](./columnspec) dict.
* **`if_exists`** (`Literal['error', 'ignore', 'replace', 'replace_force']`, default: `'error'`): Determines the behavior if the column already exists. Must be one of the following:
* `'error'`: an exception will be raised.
* `'ignore'`: do nothing and return.
* `'replace'` or `'replace_force'`: drop the existing column and add the new column, if it has
no dependents.
**Returns:**
* `UpdateStatus`: Information about the execution status of the operation.
**Examples:**
Add an int column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_column(new_col=pxt.Int)
```
Add a column with column metadata using a dict:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_column(
img_col={
'type': pxt.Image,
'stored': True,
'media_validation': 'on_write',
}
)
```
Alternatively, adding a column can also be expressed using `add_columns`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_columns({'new_col': pxt.Int})
```
As well as with column metadata:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_columns(
{
'img_col': {
'type': pxt.Image,
'stored': True,
'media_validation': 'on_write',
}
}
)
```
## method add\_columns()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
add_columns(
schema: Mapping[str, type | ColumnSpec],
if_exists: Literal['error', 'ignore', 'replace', 'replace_force'] = 'error'
) -> UpdateStatus
```
Adds multiple columns to the table. The columns must be concrete (non-computed) columns; to add computed
columns, use [`add_computed_column()`](./table#method-add_computed_column) instead.
The format of the `schema` argument is a dict mapping column names to their types.
**Parameters:**
* **`schema`** (`Mapping[str, type | ColumnSpec]`): A dictionary mapping column names to a `type` or a [`ColumnSpec`](./columnspec) dict.
* **`if_exists`** (`Literal['error', 'ignore', 'replace', 'replace_force']`, default: `'error'`): Determines the behavior if a column already exists. Must be one of the following:
* `'error'`: an exception will be raised.
* `'ignore'`: do nothing and return.
* `'replace'` or `'replace_force'`: drop the existing column and add the new column, if it has no
dependents.
Note that the `if_exists` parameter is applied to all columns in the schema.
To apply different behaviors to different columns, please use
[`add_column()`](./table#method-add_column) for each column.
**Returns:**
* `UpdateStatus`: Information about the execution status of the operation.
**Examples:**
Add multiple columns to the table `my_table`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
schema = {'new_col_1': pxt.Int, 'new_col_2': pxt.String}
tbl.add_columns(schema)
```
It is also possible to specify column metadata using a dict:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
schema = {
'new_col_1': {
'type': pxt.Image,
'stored': True,
'media_validation': 'on_write',
},
'new_col_2': pxt.String,
}
tbl.add_columns(schema)
```
## method add\_computed\_column()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
add_computed_column(
*,
stored: bool | None = None,
destination: str | Path | None = None,
custom_metadata: Any = None,
comment: str = '',
print_stats: bool = False,
on_error: Literal['abort', 'ignore'] = 'abort',
if_exists: Literal['error', 'ignore', 'replace'] = 'error',
**kwargs: exprs.Expr
) -> UpdateStatus
```
Adds a computed column to the table.
**Parameters:**
* **`kwargs`** (`exprs.Expr`): Exactly one keyword argument of the form `col_name=expression`.
* **`stored`** (`bool | None`): Whether the column is materialized and stored or computed on demand.
* **`destination`** (`str | Path | None`): An object store reference for persisting computed files.
* **`custom_metadata`** (`Any`): Optional user-defined metadata to associate with the column. Must be a valid
JSON-serializable object.
* **`comment`** (`str`, default: `''`): An optional comment; its meaning is user-defined.
* **`print_stats`** (`bool`, default: `False`): If `True`, print execution metrics during evaluation.
* **`on_error`** (`Literal['abort', 'ignore']`, default: `'abort'`): Determines the behavior if an error occurs while evaluating the column expression for at least one
row.
* `'abort'`: an exception will be raised and the column will not be added.
* `'ignore'`: execution will continue and the column will be added. Any rows
with errors will have a `None` value for the column, with information about the error stored in the
corresponding `tbl.col_name.errormsg` and `tbl.col_name.errortype` fields.
* **`if_exists`** (`Literal['error', 'ignore', 'replace']`, default: `'error'`): Determines the behavior if the column already exists. Must be one of the following:
* `'error'`: an exception will be raised.
* `'ignore'`: do nothing and return.
* `'replace'`: drop the existing column and add the new column, if it has
no dependents.
**Returns:**
* `UpdateStatus`: Information about the execution status of the operation.
**Examples:**
For a table with an image column `frame`, add an image column `rotated` that rotates the image by
90 degrees:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(rotated=tbl.frame.rotate(90))
```
Do the same, but now the column is unstored:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(rotated=tbl.frame.rotate(90), stored=False)
```
## method add\_embedding\_index()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
add_embedding_index(
column: str | ColumnRef,
*,
idx_name: str | None = None,
embedding: pxt.Function | None = None,
string_embed: pxt.Function | None = None,
image_embed: pxt.Function | None = None,
metric: Literal['cosine', 'ip', 'l2'] = 'cosine',
precision: Literal['fp16', 'fp32'] = 'fp16',
if_exists: Literal['error', 'ignore', 'replace', 'replace_force'] = 'error'
) -> None
```
Add an embedding index to the table. Once the index is created, it will be automatically kept up-to-date as new
rows are inserted into the table.
To add an embedding index, specify the column to be indexed and, if the column is not an `Array` column, an
embedding UDF. `String`, `Image`, `Video`, `Audio` and `Array` columns are currently supported.
For `Array` columns, which are assumed to contain precomputed embeddings, an embedding function is optional;
if provided, it will be used to convert query values into embeddings for similarity search.
**Parameters:**
* **`column`** (`str | ColumnRef`): The name of, or reference to, the column to be indexed; must be a `String`, `Image` or
`Array` column.
* **`idx_name`** (`str | None`): An optional name for the index. If not specified, a name such as `'idx0'` will be generated
automatically. If specified, the name must be unique for this table and a valid pixeltable column name.
* **`embedding`** (`pxt.Function | None`): The UDF to use for the embedding. Must be a UDF that accepts a single argument of type `String`
or `Image` (as appropriate for the column being indexed) and returns a fixed-size 1-dimensional
array of floats.
* **`string_embed`** (`pxt.Function | None`): An optional UDF to use for the string embedding component of this index.
Can be used in conjunction with `image_embed` to construct multimodal embeddings manually, by
specifying different embedding functions for different data types.
* **`image_embed`** (`pxt.Function | None`): An optional UDF to use for the image embedding component of this index.
Can be used in conjunction with `string_embed` to construct multimodal embeddings manually, by
specifying different embedding functions for different data types.
* **`metric`** (`Literal['cosine', 'ip', 'l2']`, default: `'cosine'`): Distance metric to use for the index; one of `'cosine'`, `'ip'`, or `'l2'`.
The default is `'cosine'`.
* **`precision`** (`Literal['fp16', 'fp32']`, default: `'fp16'`): level of precision for the embeddings; one of `'fp16'` or `'fp32'`.
* **`if_exists`** (`Literal['error', 'ignore', 'replace', 'replace_force']`, default: `'error'`): Directive for handling an existing index with the same name. Must be one of the following:
* `'error'`: raise an error if an index with the same name already exists.
* `'ignore'`: do nothing if an index with the same name already exists.
* `'replace'` or `'replace_force'`: replace the existing index with the new one.
**Examples:**
Add an index to the `img` column of the table `my_table`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import clip
tbl = pxt.get_table('my_table')
embedding_fn = clip.using(model_id='openai/clip-vit-base-patch32')
tbl.add_embedding_index(tbl.img, embedding=embedding_fn)
```
Alternatively, the `img` column may be specified by name:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_embedding_index('img', embedding=embedding_fn)
```
Once the index is created, similarity lookups can be performed using the `similarity` pseudo-function:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sim = tbl.img.similarity(
image='/path/to/my-image.jpg' # can also be a URL or a PIL image
)
tbl.select(tbl.img, sim).order_by(sim, asc=False).limit(5)
```
If the embedding UDF is a multimodal embedding (supporting more than one data type), then lookups may be
performed using any of its supported modalities. In our example, CLIP supports both text and images, so we
can also search for images using a text description:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sim = tbl.img.similarity(string='a picture of a train')
tbl.select(tbl.img, sim).order_by(sim, asc=False).limit(5)
```
Audio and video lookups would look like this:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sim = tbl.img.similarity(audio='/path/to/audio.flac')
sim = tbl.img.similarity(video='/path/to/video.mp4')
```
Multiple indexes can be defined on each column. Add a second index to the `img` column, using the inner
product as the distance metric, and with a specific name:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_embedding_index(
tbl.img, idx_name='ip_idx', embedding=embedding_fn, metric='ip'
)
```
Add an index using separately specified string and image embeddings:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_embedding_index(
tbl.img,
string_embed=string_embedding_fn,
image_embed=image_embedding_fn,
)
```
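For readers unfamiliar with the three `metric` options, the sketch below computes the underlying quantities for two small vectors in plain Python. It illustrates the standard mathematical definitions only; how Pixeltable converts them into the scores returned by `similarity` is not described here, and the function names are hypothetical.

```python
import math

def inner_product(a: list[float], b: list[float]) -> float:
    """'ip': sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """'cosine': inner product of the vectors after length normalization."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

def l2_distance(a: list[float], b: list[float]) -> float:
    """'l2': Euclidean distance between the vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 0.0]
b = [1.0, 1.0]
ip = inner_product(a, b)
cos = cosine_similarity(a, b)
l2 = l2_distance(a, b)
print(ip, round(cos, 4), l2)  # 1.0 0.7071 1.0
```

Cosine ignores vector length, whereas the inner product and L2 distance are sensitive to it, so the choice matters mainly when embeddings are not normalized.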
## method batch\_update()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
batch_update(
rows: Iterable[dict[str, Any]],
cascade: bool = True,
if_not_exists: Literal['error', 'ignore', 'insert'] = 'error',
return_rows: bool = False
) -> UpdateStatus
```
Update rows in this table.
**Parameters:**
* **`rows`** (`Iterable[dict[str, Any]]`): an Iterable of dictionaries containing values for the updated columns plus values for the primary key
columns.
* **`cascade`** (`bool`, default: `True`): if True, also update all computed columns that transitively depend on the updated columns.
* **`if_not_exists`** (`Literal['error', 'ignore', 'insert']`, default: `'error'`): Specifies the behavior if a row to update does not exist:
* `'error'`: Raise an error.
* `'ignore'`: Skip the row silently.
* `'insert'`: Insert the row.
* **`return_rows`** (`bool`, default: `False`): If `True`, populate `UpdateStatus.rows` with one dict per affected row, mapping column
names to their new stored values. Rows newly inserted via `if_not_exists='insert'` are included.
If `False` (default), `UpdateStatus.rows` is `None`.
**Examples:**
Update the `name` and `age` columns for the rows with ids 1 and 2 (assuming `id` is the primary key).
If either row does not exist, this raises an error:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.batch_update(
[
{'id': 1, 'name': 'Alice', 'age': 30},
{'id': 2, 'name': 'Bob', 'age': 40},
]
)
```
Update the `name` and `age` columns for the row with `id` 1 (assuming `id` is the primary key) and insert
the row with new `id` 3 (assuming this key does not exist):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.batch_update(
[
{'id': 1, 'name': 'Alice', 'age': 30},
{'id': 3, 'name': 'Bob', 'age': 40},
],
if_not_exists='insert',
)
```
## method collect()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
collect() -> pxt._query.ResultSet
```
Return rows from this table.
## method columns()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
columns() -> list[str]
```
Return the names of the columns in this table.
## method count()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
count() -> int
```
Return the number of rows in this table.
## method cursor()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
cursor() -> pxt._query.ResultCursor
```
Return a [`ResultCursor`](./resultcursor) that iterates over this table's rows.
See [`ResultCursor`](./resultcursor) for usage examples and lifecycle details.
## method delete()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
delete(where: exprs.Expr | None = None) -> UpdateStatus
```
Delete rows in this table.
**Parameters:**
* **`where`** (`exprs.Expr | None`): a predicate to filter rows to delete.
**Examples:**
Delete all rows in a table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.delete()
```
Delete all rows in a table where column `a` is greater than 5:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.delete(tbl.a > 5)
```
## method describe()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
describe() -> None
```
Print the table schema.
## method distinct()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
distinct() -> pxt.Query
```
Return a query over this table with duplicate rows removed.
## method drop\_column()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
drop_column(
column: str | ColumnRef,
if_not_exists: Literal['error', 'ignore'] = 'error'
) -> None
```
Drop a column from the table.
**Parameters:**
* **`column`** (`str | ColumnRef`): The name or reference of the column to drop.
* **`if_not_exists`** (`Literal['error', 'ignore']`, default: `'error'`): Directive for handling a non-existent column. Must be one of the following:
* `'error'`: raise an error if the column does not exist.
* `'ignore'`: do nothing if the column does not exist.
**Examples:**
Drop the column `col` from the table `my_table` by column name:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
tbl.drop_column('col')
```
Drop the column `col` from the table `my_table` by column reference:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
tbl.drop_column(tbl.col)
```
Drop the column `col` from the table `my_table` if it exists, otherwise do nothing:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
tbl.drop_column(tbl.col, if_not_exists='ignore')
```
## method drop\_embedding\_index()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
drop_embedding_index(
*,
column: str | ColumnRef | None = None,
idx_name: str | None = None,
if_not_exists: Literal['error', 'ignore'] = 'error'
) -> None
```
Drop an embedding index from the table. Either a column name or an index name (but not both) must be
specified. If a column name or reference is specified, it must be a column containing exactly one
embedding index; otherwise the specific index name must be provided instead.
**Parameters:**
* **`column`** (`str | ColumnRef | None`): The name of, or reference to, the column from which to drop the index.
The column must have only one embedding index.
* **`idx_name`** (`str | None`): The name of the index to drop.
* **`if_not_exists`** (`Literal['error', 'ignore']`, default: `'error'`): Directive for handling a non-existent index. Must be one of the following:
* `'error'`: raise an error if the index does not exist.
* `'ignore'`: do nothing if the index does not exist.
Note that the `if_not_exists` parameter only applies when an `idx_name` is specified
and it does not exist, or when `column` is specified and it has no index.
`if_not_exists` does not apply to a non-existent column.
**Examples:**
Drop the embedding index on the `img` column of the table `my_table` by column name:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
tbl.drop_embedding_index(column='img')
```
Drop the embedding index on the `img` column of the table `my_table` by column reference:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
tbl.drop_embedding_index(column=tbl.img)
```
Drop the embedding index `idx1` of the table `my_table` by index name:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
tbl.drop_embedding_index(idx_name='idx1')
```
Drop the embedding index `idx1` of the table `my_table` by index name, if it exists, otherwise do nothing:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
tbl.drop_embedding_index(idx_name='idx1', if_not_exists='ignore')
```
## method drop\_index()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
drop_index(
*,
column: str | ColumnRef | None = None,
idx_name: str | None = None,
if_not_exists: Literal['error', 'ignore'] = 'error'
) -> None
```
Drop an index from the table. Either a column name or an index name (but not both) must be
specified. If a column name or reference is specified, it must be a column containing exactly one index;
otherwise the specific index name must be provided instead.
**Parameters:**
* **`column`** (`str | ColumnRef | None`): The name of, or reference to, the column from which to drop the index.
The column must have only one index.
* **`idx_name`** (`str | None`): The name of the index to drop.
* **`if_not_exists`** (`Literal['error', 'ignore']`, default: `'error'`): Directive for handling a non-existent index. Must be one of the following:
* `'error'`: raise an error if the index does not exist.
* `'ignore'`: do nothing if the index does not exist.
Note that the `if_not_exists` parameter applies only when `idx_name` is specified
and no such index exists, or when `column` is specified and the column has no index.
`if_not_exists` does not apply to a non-existent column.
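The argument rules above can be modeled in plain Python. This is only an illustrative sketch, not Pixeltable's implementation: `resolve_index_to_drop`, the `indexes` mapping (index name to column name), and the `columns` set are hypothetical stand-ins, and the exception type is an assumption.

```python
from __future__ import annotations

from typing import Literal


def resolve_index_to_drop(
    indexes: dict[str, str],  # hypothetical catalog: index name -> column name
    columns: set[str],        # hypothetical set of existing column names
    *,
    column: str | None = None,
    idx_name: str | None = None,
    if_not_exists: Literal['error', 'ignore'] = 'error',
) -> str | None:
    """Return the name of the index to drop, or None if there is nothing to drop."""
    # Exactly one of `column` / `idx_name` must be given.
    if (column is None) == (idx_name is None):
        raise ValueError("Specify exactly one of 'column' or 'idx_name'")

    if idx_name is not None:
        if idx_name in indexes:
            return idx_name
    else:
        # `if_not_exists` does not apply to a non-existent column.
        if column not in columns:
            raise ValueError(f'Unknown column: {column!r}')
        matches = [name for name, col in indexes.items() if col == column]
        if len(matches) > 1:
            raise ValueError(f'Column {column!r} has several indexes; use idx_name')
        if len(matches) == 1:
            return matches[0]

    # Reaching this point means the index does not exist.
    if if_not_exists == 'ignore':
        return None
    raise ValueError('Index does not exist')


indexes = {'idx1': 'img', 'idx2': 'txt', 'idx3': 'txt'}
columns = {'img', 'txt', 'other'}
print(resolve_index_to_drop(indexes, columns, column='img'))
print(resolve_index_to_drop(indexes, columns, idx_name='nope', if_not_exists='ignore'))
```

In this model, a column with two indexes (`txt`) can only be targeted through `idx_name`, which mirrors the "exactly one index" requirement on `column`.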
**Examples:**
Drop the index on the `img` column of the table `my_table` by column name:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
tbl.drop_index(column='img')
```
Drop the index on the `img` column of the table `my_table` by column reference:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
tbl.drop_index(column=tbl.img)
```
Drop the index `idx1` of the table `my_table` by index name:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
tbl.drop_index(idx_name='idx1')
```
Drop the index `idx1` of the table `my_table` by index name, if it exists, otherwise do nothing:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
tbl.drop_index(idx_name='idx1', if_not_exists='ignore')
```
## method get\_metadata()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
get_metadata() -> TableMetadata
```
Retrieves metadata associated with this table.
**Returns:**
* `'TableMetadata'`: A [TableMetadata](./tablemetadata) instance containing this table's metadata.
## method get\_versions()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
get_versions(n: int | None = None) -> list[VersionMetadata]
```
Returns information about versions of this table, most recent first.
`get_versions()` is intended for programmatic access to version metadata; for human-readable
output, use [`history()`](./table#method-history) instead.
**Parameters:**
* **`n`** (`int | None`): If specified, return at most `n` versions.
**Returns:**
* `list[VersionMetadata]`: A list of [VersionMetadata](./versionmetadata) dictionaries, one per version retrieved, most
recent first.
**Examples:**
Retrieve metadata about all versions of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.get_versions()
```
Retrieve metadata about the most recent 5 versions of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.get_versions(n=5)
```
## method group\_by()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
group_by(*items: exprs.Expr) -> pxt.Query
```
Group the rows of this table based on the expression.
See [`Query.group_by`](./query#method-group_by) for more details.
## method head()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
head(*args: Any, **kwargs: Any) -> pxt._query.ResultSet
```
Return the first n rows inserted into this table.
## method history()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
history(n: int | None = None) -> pd.DataFrame
```
Returns a human-readable report about versions of this table.
`history()` is intended for human-readable output of version metadata; for programmatic access,
use [`get_versions()`](./table#method-get_versions) instead.
**Parameters:**
* **`n`** (`int | None`): If specified, return at most `n` versions.
**Returns:**
* `pd.DataFrame`: A report with information about each version, one per row, most recent first.
**Examples:**
Report all versions of the table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.history()
```
Report only the most recent 5 changes to the table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.history(n=5)
```
## method insert()
```python Signatures theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Signature 1:
insert(
source: TableDataSource,
/,
*,
source_format: Literal['csv', 'excel', 'parquet', 'json'] | None = None,
schema_overrides: dict[str, ts.ColumnType] | None = None,
on_error: Literal['abort', 'ignore'] = 'abort',
print_stats: bool = False,
return_rows: bool = False,
**kwargs: Any
) -> UpdateStatus
# Signature 2:
insert(
*,
on_error: Literal['abort', 'ignore'] = 'abort',
print_stats: bool = False,
return_rows: bool = False,
**kwargs: Any
) -> UpdateStatus
```
Inserts rows into this table.
You can insert rows directly by providing a list of dictionaries as the `source`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.insert([{'col1': 1, 'col2': 'egg'}, {'col1': 2, 'col2': 'fish'}])
```
You can also insert data from any recognized data source by providing a file path or URL.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.insert('path/to/file.csv')
tbl.insert('https://example.com/data.xlsx')
tbl.insert('s3://my-bucket/data.parquet')
```
Pixeltable will attempt to infer the format of the source data, unless the optional `source_format`
parameter is specified. Pixeltable will also attempt to infer the schema of the source data; you can
override the inferred schema by providing a `schema_overrides` dictionary (which may include all
columns or just a subset of columns).
The `source` can also be another table or a [`Query`](./query):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.insert(
other_tbl.select(
col1=other_tbl.other_col, col2=other_tbl.yet_another_col
)
)
```
For inserting just a single row, there is a convenient shorthand key/value syntax:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.insert(col1=1, col2='egg')
```
**Parameters:**
* **`source`** (`TableDataSource | None`): A data source from which data can be imported. Can be any of the following:
* A list of dictionaries
* A list of Pydantic model instances
* A file path or URI of a recognized data source
* A Pandas `DataFrame`
* Another Pixeltable table or a `Query`
* A Hugging Face dataset
* **`kwargs`** (`Any`): (if inserting a single row) Keyword-argument pairs representing column names and values.
(if inserting multiple rows) Additional keyword arguments are passed to the data source.
* **`source_format`** (`Literal['csv', 'excel', 'parquet', 'json'] | None`): A hint about the format of the source data. If not specified, Pixeltable will attempt
to infer the format.
* **`schema_overrides`** (`dict[str, ts.ColumnType] | None`): If specified, then columns in `schema_overrides` will be given the specified types.
Any columns not included in `schema_overrides` will have their types inferred as usual.
* **`on_error`** (`Literal['abort', 'ignore']`, default: `'abort'`): Determines the behavior if an error occurs while evaluating a computed column or detecting an
invalid media file (such as a corrupt image) for one of the inserted rows.
* If `on_error='abort'`, then an exception will be raised and the rows will not be inserted.
* If `on_error='ignore'`, then execution will continue and the rows will be inserted. Any cell
with an error will have a `None` value, with information about the error stored in the
corresponding `tbl.col_name.errortype` and `tbl.col_name.errormsg` fields.
* **`print_stats`** (`bool`, default: `False`): If `True`, print statistics about the cost of computed columns.
* **`return_rows`** (`bool`, default: `False`): If `True`, populate `UpdateStatus.rows` with one dict per inserted row, mapping column names
to their stored or computed values. If `False` (default), `UpdateStatus.rows` is `None`.
**Returns:**
* `UpdateStatus`: An [`UpdateStatus`](./updatestatus) object containing information about the update.
**Examples:**
Insert two rows into the table `my_table` with three int columns `a`, `b`, and `c`.
Column `c` is nullable:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
tbl.insert([{'a': 1, 'b': 1, 'c': 1}, {'a': 2, 'b': 2}])
```
Insert a single row using the alternative syntax:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.insert(a=3, b=3, c=3)
```
Insert rows from a CSV file:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.insert('path/to/file.csv')
```
Insert Pydantic model instances into a table with two `pxt.Int` columns `a` and `b`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pydantic

class MyModel(pydantic.BaseModel):
    a: int
    b: int

models = [MyModel(a=1, b=2), MyModel(a=3, b=4)]
tbl.insert(models)
```
## method join()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
join(
other: Table,
*,
on: exprs.Expr | None = None,
how: pixeltable.plan.JoinType.LiteralType = 'inner'
) -> pxt.Query
```
Join this table with another table.
## method limit()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
limit(n: int, offset: int | None = None) -> pxt.Query
```
Select a limited number of rows from the Table, optionally skipping rows for pagination.
**Parameters:**
* **`n`** (`int`): Number of rows to select.
* **`offset`** (`int | None`): Number of rows to skip before returning results. Default is None (no offset).
**Returns:**
* `'pxt.Query'`: A Query with the specified limited rows.
**Examples:**
Get the first 10 rows:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.limit(10).collect()
```
Get rows 21-30 (skip first 20, return next 10):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.limit(10, offset=20).collect()
```
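The relationship between `n` and `offset` can be modeled with list slicing. The sketch below is plain Python and does not call Pixeltable; `page` is a hypothetical helper and the list of integers stands in for table rows.

```python
from __future__ import annotations


def page(rows: list, n: int, offset: int | None = None) -> list:
    """Model of limit(n, offset): skip `offset` rows, then return at most `n` rows."""
    start = offset or 0
    return rows[start:start + n]


rows = list(range(1, 101))  # stand-in for 100 rows, numbered 1..100

first_page = page(rows, 10)             # rows 1-10
third_page = page(rows, 10, offset=20)  # rows 21-30
last_page = page(rows, 10, offset=95)   # only 5 rows remain
print(first_page, third_page, last_page, sep='\n')
```

For page-based pagination with a fixed page size, the offset for zero-based page `k` is `k * page_size` under this model.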
## method list\_views()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
list_views(*, recursive: bool = True) -> list[str]
```
Returns a list of all views and snapshots of this `Table`.
**Parameters:**
* **`recursive`** (`bool`, default: `True`): If `False`, returns only the immediate successor views of this `Table`. If `True`, returns
all sub-views (including views of views, etc.)
**Returns:**
* `list[str]`: A list of view paths.
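The difference between `recursive=False` and `recursive=True` can be pictured as a tree walk. The following is a plain-Python model, not the library's code: the `children` mapping, the table names, and the depth-first ordering of the result are all assumptions made for illustration.

```python
def list_views_model(
    children: dict[str, list[str]], table: str, *, recursive: bool = True
) -> list[str]:
    """Return views of `table`: direct ones only, or all descendants (depth-first)."""
    direct = children.get(table, [])
    if not recursive:
        return list(direct)
    result: list[str] = []
    for view in direct:
        result.append(view)
        # Views of views are included when recursive=True.
        result.extend(list_views_model(children, view, recursive=True))
    return result


# Hypothetical hierarchy: `films` has two views, one of which has its own view.
children = {
    'films': ['films_view', 'films_snap'],
    'films_view': ['films_view_frames'],
}

immediate = list_views_model(children, 'films', recursive=False)
everything = list_views_model(children, 'films')
print(immediate)
print(everything)
```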
## method order\_by()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
order_by(*items: exprs.Expr, asc: bool = True) -> pxt.Query
```
Order the rows of this table based on the expression.
See [`Query.order_by`](./query#method-order_by) for more details.
## method pull()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pull() -> None
```
## method push()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
push() -> None
```
## method recompute\_columns()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
recompute_columns(
*columns: str | ColumnRef,
where: exprs.Expr | None = None,
errors_only: bool = False,
cascade: bool = True
) -> UpdateStatus
```
Recompute the values in one or more computed columns of this table.
**Parameters:**
* **`columns`** (`str | ColumnRef`): The names or references of the computed columns to recompute.
* **`where`** (`'exprs.Expr' | None`): A predicate to filter rows to recompute.
* **`errors_only`** (`bool`, default: `False`): If `True`, only run the recomputation for rows that have errors in the column (i.e., the column's
`errortype` property indicates that an error occurred). Only allowed for recomputing a single column.
* **`cascade`** (`bool`, default: `True`): If `True`, also update all computed columns that transitively depend on the recomputed columns.
**Examples:**
Recompute computed columns `c1` and `c2` for all rows in this table, and everything that transitively
depends on them:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.recompute_columns('c1', 'c2')
```
Recompute computed column `c1` for all rows in this table, but don't recompute other columns that depend on
it:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.recompute_columns(tbl.c1, cascade=False)
```
Recompute column `c1` and its dependents, but only for rows with `c2` == 0:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.recompute_columns('c1', where=tbl.c2 == 0)
```
Recompute column `c1` and its dependents, but only for rows that have errors in it:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.recompute_columns('c1', errors_only=True)
```
## method rename\_column()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
rename_column(old_name: str, new_name: str) -> None
```
Rename a column.
**Parameters:**
* **`old_name`** (`str`): The current name of the column.
* **`new_name`** (`str`): The new name of the column.
**Examples:**
Rename the column `col1` to `col2` of the table `my_table`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl = pxt.get_table('my_table')
tbl.rename_column('col1', 'col2')
```
## method revert()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
revert() -> None
```
Reverts the table to the previous version.
**Warning:** This operation is irreversible.
## method sample()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sample(
n: int | None = None,
n_per_stratum: int | None = None,
fraction: float | None = None,
seed: int | None = None,
stratify_by: Any = None
) -> pxt.Query
```
Choose a shuffled sample of rows.
See [`Query.sample`](./query#method-sample) for more details.
## method select()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
select(*items: Any, **named_items: Any) -> pxt.Query
```
Select columns or expressions from this table.
See [`Query.select`](./query#method-select) for more details.
## method show()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
show(*args: Any, **kwargs: Any) -> pxt._query.ResultSet
```
Return rows from this table.
## method sync()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sync(
stores: str | list[str] | None = None,
*,
export_data: bool = True,
import_data: bool = True
) -> UpdateStatus
```
Synchronizes this table with its linked external stores.
**Parameters:**
* **`stores`** (`str | list[str] | None`): If specified, will synchronize only the specified named store or list of stores. If not specified,
will synchronize all of this table's external stores.
* **`export_data`** (`bool`, default: `True`): If `True`, data from this table will be exported to the external stores during synchronization.
* **`import_data`** (`bool`, default: `True`): If `True`, data from the external stores will be imported to this table during synchronization.
## method tail()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tail(*args: Any, **kwargs: Any) -> pxt._query.ResultSet
```
Return the last n rows inserted into this table.
## method unlink\_external\_stores()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
unlink_external_stores(
stores: str | list[str] | None = None,
*,
delete_external_data: bool = False,
ignore_errors: bool = False
) -> None
```
Unlinks this table's external stores.
**Parameters:**
* **`stores`** (`str | list[str] | None`): If specified, will unlink only the specified named store or list of stores. If not specified,
will unlink all of this table's external stores.
* **`ignore_errors`** (`bool`, default: `False`): If `True`, no exception will be thrown if a specified store is not linked
to this table.
* **`delete_external_data`** (`bool`, default: `False`): If `True`, then the external data store will also be deleted. WARNING: This
is a destructive operation that will delete data outside Pixeltable, and cannot be undone.
## method update()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
update(
value_spec: dict[str, Any],
where: exprs.Expr | None = None,
cascade: bool = True,
return_rows: bool = False
) -> UpdateStatus
```
Update rows in this table.
**Parameters:**
* **`value_spec`** (`dict[str, Any]`): A dictionary mapping column names to literal values or Pixeltable expressions.
* **`where`** (`'exprs.Expr' | None`): A predicate to filter rows to update.
* **`cascade`** (`bool`, default: `True`): If `True`, also update all computed columns that transitively depend on the updated columns.
* **`return_rows`** (`bool`, default: `False`): If `True`, populate `UpdateStatus.rows` with one dict per updated row, mapping column
names to their new stored values. If `False` (default), `UpdateStatus.rows` is `None`.
**Returns:**
* `UpdateStatus`: An [`UpdateStatus`](./updatestatus) object containing information about the update.
**Examples:**
Set column `int_col` to 1 for all rows:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.update({'int_col': 1})
```
Set column `int_col` to 1 for all rows where `int_col` is 0:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.update({'int_col': 1}, where=tbl.int_col == 0)
```
Set `int_col` to the value of `other_int_col` + 1:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.update({'int_col': tbl.other_int_col + 1})
```
Increment `int_col` by 1 for all rows where `int_col` is 0:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.update({'int_col': tbl.int_col + 1}, where=tbl.int_col == 0)
```
## method where()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
where(pred: exprs.Expr) -> pxt.Query
```
Filter rows from this table based on the expression.
See [`Query.where`](./query#method-where) for more details.
# TableMetadata
Source: https://docs.pixeltable.com/sdk/latest/tablemetadata
# class pixeltable.TableMetadata
Metadata for a Pixeltable table.
## attr base
```
base: str | None
```
If this table is a view or snapshot, the full path of its base table; otherwise `None`.
## attr columns
```
columns: dict[str, ColumnMetadata]
```
Column metadata for all of the visible columns of the table.
## attr comment
```
comment: str | None
```
User-provided table comment, if one exists.
## attr custom\_metadata
```
custom_metadata: Any
```
User-defined JSON metadata for this table, if any.
## attr id
```
id: uuid.UUID
```
The stable UUID of the table. Useful for detecting drop-and-recreate across time.
## attr indices
```
indices: dict[str, IndexMetadata]
```
Index metadata for all of the indices of the table.
## attr is\_replica
```
is_replica: bool
```
`True` if this table is a replica of another (shared) table.
## attr is\_snapshot
```
is_snapshot: bool
```
`True` if this table is a snapshot.
## attr is\_versioned
```
is_versioned: bool
```
`True` if this is a versioned table.
## attr is\_view
```
is_view: bool
```
`True` if this table is a view.
## attr iterator\_call
```
iterator_call: str | None
```
The iterator call for views that use an iterator; otherwise `None`.
## attr kind
```
kind: Literal['table', 'view', 'snapshot', 'replica']
```
The kind of table: `'table'`, `'view'`, `'snapshot'`, or `'replica'`.
## attr media\_validation
```
media_validation: Literal['on_read', 'on_write']
```
The media validation policy for this table.
## attr name
```
name: str
```
The name of the table (ex: `'my_table'`).
## attr path
```
path: str
```
The full path of the table (ex: `'my_dir.my_subdir.my_table'`).
## attr primary\_key
```
primary_key: list[str] | None
```
List of primary key column names, or `None` if this table has no primary key.
## attr schema\_version
```
schema_version: int
```
The current schema version of the table.
## attr version
```
version: int | None
```
The current version of the table, or `None` if it's not versioned.
## attr version\_created
```
version_created: datetime.datetime
```
The timestamp when this table version was created.
# TableNode
Source: https://docs.pixeltable.com/sdk/latest/tablenode
# class pixeltable.TableNode
A table/view/snapshot/replica entry in a [`TreeNode`](./treenode) tree.
## attr base
```
base: str | None
```
Path of the immediate base table for views/snapshots; None for plain tables.
## attr error\_count
```
error_count: int
```
Cumulative error count as recorded in table's history.
## attr kind
```
kind: TableKind
```
⚠️ **No documentation**
## attr name
```
name: str
```
⚠️ **No documentation**
## attr path
```
path: str
```
⚠️ **No documentation**
## attr version
```
version: int | None
```
⚠️ **No documentation**
# timestamp
Source: https://docs.pixeltable.com/sdk/latest/timestamp
# module pixeltable.functions.timestamp
Pixeltable UDFs for `TimestampType`.
Usage example:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
t = pxt.get_table(...)
t.select(t.timestamp_col.year, t.timestamp_col.weekday()).collect()
```
## udf astimezone()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
astimezone(self: pxt.Timestamp, tz: pxt.String) -> pxt.Timestamp
```
Convert the datetime to the given time zone.
**Parameters:**
* **`tz`** (`pxt.String`): The time zone to convert to. Must be a valid time zone name from the
[IANA Time Zone Database](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones).
## udf day()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
day(self: pxt.Timestamp) -> pxt.Int
```
Between 1 and the number of days in the given month of the given year.
Equivalent to [`datetime.day`](https://docs.python.org/3/library/datetime.html#datetime.datetime.day).
## udf hour()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
hour(self: pxt.Timestamp) -> pxt.Int
```
Between 0 and 23 inclusive.
Equivalent to [`datetime.hour`](https://docs.python.org/3/library/datetime.html#datetime.datetime.hour).
## udf isocalendar()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
isocalendar(self: pxt.Timestamp) -> pxt.Json
```
Return a dictionary with three entries: `'year'`, `'week'`, and `'weekday'`.
Equivalent to
[`datetime.isocalendar()`](https://docs.python.org/3/library/datetime.html#datetime.datetime.isocalendar).
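For reference, this is how the standard-library call behaves near a year boundary. The sketch uses only Python's `datetime`; the `isocalendar_dict` helper is a hypothetical illustration of the three-entry dictionary shape described above, not the UDF itself.

```python
from datetime import datetime


def isocalendar_dict(ts: datetime) -> dict[str, int]:
    """Convert datetime.isocalendar() output into a dict with the three named entries."""
    iso = ts.isocalendar()  # (ISO year, ISO week number, ISO weekday)
    return {'year': iso[0], 'week': iso[1], 'weekday': iso[2]}


# Sunday, 3 January 2021 still belongs to ISO week 53 of ISO year 2020.
result = isocalendar_dict(datetime(2021, 1, 3))
print(result)  # {'year': 2020, 'week': 53, 'weekday': 7}
```

Note that the ISO year can differ from the calendar year for dates in the first or last days of a year.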
## udf isoformat()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
isoformat(
self: pxt.Timestamp,
sep: pxt.String = 'T',
timespec: pxt.String = 'auto'
) -> pxt.String
```
Return a string representing the date and time in ISO 8601 format.
Equivalent to [`datetime.isoformat()`](https://docs.python.org/3/library/datetime.html#datetime.datetime.isoformat).
**Parameters:**
* **`sep`** (`pxt.String`): Separator between date and time.
* **`timespec`** (`pxt.String`): The number of additional time components to include in the output. See the
[`datetime.isoformat()`](https://docs.python.org/3/library/datetime.html#datetime.datetime.isoformat)
documentation for more details.
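The effect of `sep` and `timespec` is easiest to see on the standard-library method that this UDF is documented as equivalent to. The example below uses Python's `datetime` directly rather than a Pixeltable column.

```python
from datetime import datetime

ts = datetime(2024, 5, 17, 13, 45, 30, 123456)

default_form = ts.isoformat()                             # sep='T', timespec='auto'
seconds_form = ts.isoformat(sep=' ', timespec='seconds')  # drop the fractional part
millis_form = ts.isoformat(timespec='milliseconds')       # truncate to 3 digits
minutes_form = ts.isoformat(timespec='minutes')           # hours and minutes only

print(default_form)  # 2024-05-17T13:45:30.123456
print(seconds_form)  # 2024-05-17 13:45:30
print(millis_form)   # 2024-05-17T13:45:30.123
print(minutes_form)  # 2024-05-17T13:45
```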
## udf isoweekday()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
isoweekday(self: pxt.Timestamp) -> pxt.Int
```
Return the day of the week as an integer, where Monday is 1 and Sunday is 7.
Equivalent to [`datetime.isoweekday()`](https://docs.python.org/3/library/datetime.html#datetime.datetime.isoweekday).
## udf make\_timestamp()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
make_timestamp(
year: pxt.Int,
month: pxt.Int,
day: pxt.Int,
hour: pxt.Int = 0,
minute: pxt.Int = 0,
second: pxt.Int = 0,
microsecond: pxt.Int = 0
) -> pxt.Timestamp
```
Create a timestamp.
Equivalent to [`datetime()`](https://docs.python.org/3/library/datetime.html#datetime.datetime).
## udf microsecond()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
microsecond(self: pxt.Timestamp) -> pxt.Int
```
Between 0 and 999999 inclusive.
Equivalent to [`datetime.microsecond`](https://docs.python.org/3/library/datetime.html#datetime.datetime.microsecond).
## udf minute()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
minute(self: pxt.Timestamp) -> pxt.Int
```
Between 0 and 59 inclusive.
Equivalent to [`datetime.minute`](https://docs.python.org/3/library/datetime.html#datetime.datetime.minute).
## udf month()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
month(self: pxt.Timestamp) -> pxt.Int
```
Between 1 and 12 inclusive.
Equivalent to [`datetime.month`](https://docs.python.org/3/library/datetime.html#datetime.datetime.month).
## udf posix\_timestamp()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
posix_timestamp(self: pxt.Timestamp) -> pxt.Float
```
Return POSIX timestamp corresponding to the datetime instance.
Equivalent to [`datetime.timestamp()`](https://docs.python.org/3/library/datetime.html#datetime.datetime.timestamp).
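As a reference point for the equivalent standard-library call, the sketch below uses timezone-aware values, because Python interprets naive datetimes as local time in `datetime.timestamp()`. How Pixeltable assigns a time zone to stored timestamps is not covered here and is not assumed.

```python
from datetime import datetime, timezone

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
new_year = datetime(2024, 1, 1, tzinfo=timezone.utc)

epoch_seconds = epoch.timestamp()
new_year_seconds = new_year.timestamp()
print(epoch_seconds)     # 0.0
print(new_year_seconds)  # 1704067200.0

# The conversion round-trips when the same time zone is supplied.
restored = datetime.fromtimestamp(new_year_seconds, tz=timezone.utc)
print(restored == new_year)
```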
## udf replace()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
replace(
self: pxt.Timestamp,
year: pxt.Int | None = None,
month: pxt.Int | None = None,
day: pxt.Int | None = None,
hour: pxt.Int | None = None,
minute: pxt.Int | None = None,
second: pxt.Int | None = None,
microsecond: pxt.Int | None = None
) -> pxt.Timestamp
```
Return a datetime with the same attributes, except for those attributes given new values by whichever keyword
arguments are specified.
Equivalent to [`datetime.replace()`](https://docs.python.org/3/library/datetime.html#datetime.datetime.replace).
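Two common uses of the equivalent standard-library method are truncating to the start of the day and moving to the first of the month. The example uses Python's `datetime` directly; the variable names are illustrative.

```python
from datetime import datetime

ts = datetime(2024, 5, 17, 13, 45, 30)

# Only the named fields change; everything else is carried over.
start_of_day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
first_of_month = ts.replace(day=1)

print(start_of_day)    # 2024-05-17 00:00:00
print(first_of_month)  # 2024-05-01 13:45:30
```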
## udf second()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
second(self: pxt.Timestamp) -> pxt.Int
```
Between 0 and 59 inclusive.
Equivalent to [`datetime.second`](https://docs.python.org/3/library/datetime.html#datetime.datetime.second).
## udf strftime()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
strftime(self: pxt.Timestamp, format: pxt.String) -> pxt.String
```
Return a string representing the date and time, controlled by an explicit format string.
Equivalent to [`datetime.strftime()`](https://docs.python.org/3/library/datetime.html#datetime.datetime.strftime).
**Parameters:**
* **`format`** (`pxt.String`): The format string to control the output. For a complete list of formatting directives, see
[`strftime()` and `strptime()` Behavior](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior).
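A few format strings applied with the equivalent standard-library method are shown below. Only locale-independent directives are used, so the output should not vary between systems.

```python
from datetime import datetime

ts = datetime(2024, 5, 17, 13, 45, 30)

date_and_time = ts.strftime('%Y-%m-%d %H:%M')  # zero-padded date and 24-hour time
compact = ts.strftime('%Y%m%dT%H%M%S')         # compact, sortable form
day_of_year = ts.strftime('%j')                # day of the year, zero-padded to 3 digits

print(date_and_time)  # 2024-05-17 13:45
print(compact)        # 20240517T134530
print(day_of_year)    # 138
```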
## udf toordinal()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
toordinal(self: pxt.Timestamp) -> pxt.Int
```
Return the proleptic Gregorian ordinal of the date, where January 1 of year 1 has ordinal 1.
Equivalent to [`datetime.toordinal()`](https://docs.python.org/3/library/datetime.html#datetime.datetime.toordinal).
## udf weekday()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
weekday(self: pxt.Timestamp) -> pxt.Int
```
Between 0 (Monday) and 6 (Sunday) inclusive.
Equivalent to [`datetime.weekday()`](https://docs.python.org/3/library/datetime.html#datetime.datetime.weekday).
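Because `weekday()` and `isoweekday()` differ only in their numbering, it is easy to mix them up. The comparison below uses the standard-library methods that the two UDFs are documented as equivalent to.

```python
from datetime import datetime

monday = datetime(2024, 5, 20)
friday = datetime(2024, 5, 17)
sunday = datetime(2024, 5, 19)

# weekday(): Monday=0 ... Sunday=6; isoweekday(): Monday=1 ... Sunday=7
pairs = [(d.weekday(), d.isoweekday()) for d in (monday, friday, sunday)]
print(pairs)  # [(0, 1), (4, 5), (6, 7)]
```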
## udf year()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
year(self: pxt.Timestamp) -> pxt.Int
```
Between [`MINYEAR`](https://docs.python.org/3/library/datetime.html#datetime.MINYEAR) and
[`MAXYEAR`](https://docs.python.org/3/library/datetime.html#datetime.MAXYEAR) inclusive.
Equivalent to [`datetime.year`](https://docs.python.org/3/library/datetime.html#datetime.datetime.year).
# together
Source: https://docs.pixeltable.com/sdk/latest/together
# module pixeltable.functions.together
Pixeltable UDFs
that wrap various endpoints from the Together AI API. In order to use them, you must
first `pip install together` and configure your Together AI credentials, as described in
the [Working with Together AI](https://docs.pixeltable.com/notebooks/integrations/working-with-together-ai) tutorial.
## udf chat\_completions()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
chat_completions(
messages: pxt.Json[(Json, ...)],
*,
model: pxt.String,
model_kwargs: pxt.Json | None = None
) -> pxt.Json
```
Generate chat completions based on a given prompt using a specified model.
Equivalent to the Together AI `chat/completions` API endpoint.
For additional details, see: [https://docs.together.ai/reference/chat-completions-1](https://docs.together.ai/reference/chat-completions-1)
Request throttling:
Applies the rate limit set in the config (section `together.rate_limits`, key `chat`). If no rate
limit is configured, uses a default of 600 RPM.
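To put the default in perspective, the arithmetic below converts a requests-per-minute limit into a per-second rate and an average spacing between requests. It assumes evenly spaced requests, which is an assumption for illustration only; the throttling algorithm itself is not described here, and `min_interval_seconds` is a hypothetical helper.

```python
def min_interval_seconds(requests_per_minute: int) -> float:
    """Average spacing between requests if they are spread evenly over a minute."""
    return 60 / requests_per_minute


default_rpm = 600
per_second = default_rpm / 60
interval = min_interval_seconds(default_rpm)
print(per_second)  # 10.0 requests per second
print(interval)    # 0.1 seconds between requests
```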
**Requirements:**
* `pip install together`
**Parameters:**
* **`messages`** (`pxt.Json[(Json, ...)]`): A list of messages comprising the conversation so far.
* **`model`** (`pxt.String`): The name of the model to query.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword arguments for the Together `chat/completions` API.
For details on the available parameters, see: [https://docs.together.ai/reference/chat-completions-1](https://docs.together.ai/reference/chat-completions-1)
**Returns:**
* `pxt.Json`: A dictionary containing the response and other metadata.
**Examples:**
Add a computed column that applies the model `openai/gpt-oss-20b` to an existing Pixeltable column
`tbl.prompt` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
messages = [{'role': 'user', 'content': tbl.prompt}]
tbl.add_computed_column(
response=chat_completions(messages, model='openai/gpt-oss-20b')
)
```
## udf completions()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
completions(
prompt: pxt.String,
*,
model: pxt.String,
model_kwargs: pxt.Json | None = None
) -> pxt.Json
```
Generate completions based on a given prompt using a specified model.
Equivalent to the Together AI `completions` API endpoint.
For additional details, see: [https://docs.together.ai/reference/completions-1](https://docs.together.ai/reference/completions-1)
Request throttling:
Applies the rate limit set in the config (section `together.rate_limits`, key `chat`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install together`
**Parameters:**
* **`prompt`** (`pxt.String`): A string providing context for the model to complete.
* **`model`** (`pxt.String`): The name of the model to query.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword arguments for the Together `completions` API.
For details on the available parameters, see: [https://docs.together.ai/reference/completions-1](https://docs.together.ai/reference/completions-1)
**Returns:**
* `pxt.Json`: A dictionary containing the response and other metadata.
**Examples:**
Add a computed column that applies the model `Qwen/Qwen3.5-9B` to an existing Pixeltable column
`tbl.prompt` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
response=completions(tbl.prompt, model='Qwen/Qwen3.5-9B')
)
```
## udf embeddings()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
embeddings(
input: pxt.String,
*,
model: pxt.String
) -> pxt.Array[(None,), float32]
```
Query an embedding model for a given string of text.
Equivalent to the Together AI `embeddings` API endpoint.
For additional details, see: [https://docs.together.ai/reference/embeddings-2](https://docs.together.ai/reference/embeddings-2)
Request throttling:
Applies the rate limit set in the config (section `together.rate_limits`, key `embeddings`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install together`
**Parameters:**
* **`input`** (`pxt.String`): A string providing the text for the model to embed.
* **`model`** (`pxt.String`): The name of the embedding model to use.
**Returns:**
* `pxt.Array[(None,), float32]`: An array representing the application of the given embedding to `input`.
**Examples:**
Add a computed column that applies the model `intfloat/multilingual-e5-large-instruct`
to an existing Pixeltable column `tbl.text` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
response=embeddings(
tbl.text, model='intfloat/multilingual-e5-large-instruct'
)
)
```
## udf image\_generations()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
image_generations(
prompt: pxt.String,
*,
model: pxt.String,
model_kwargs: pxt.Json | None = None
) -> pxt.Image
```
Generate images based on a given prompt using a specified model.
Equivalent to the Together AI `images/generations` API endpoint.
For additional details, see: [https://docs.together.ai/reference/post\_images-generations](https://docs.together.ai/reference/post_images-generations)
Request throttling:
Applies the rate limit set in the config (section `together.rate_limits`, key `images`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install together`
**Parameters:**
* **`prompt`** (`pxt.String`): A description of the desired images.
* **`model`** (`pxt.String`): The model to use for image generation.
* **`model_kwargs`** (`pxt.Json | None`): Additional keyword arguments for the Together `images/generations` API.
For details on the available parameters, see: [https://docs.together.ai/reference/post\_images-generations](https://docs.together.ai/reference/post_images-generations)
**Returns:**
* `pxt.Image`: The generated image.
**Examples:**
Add a computed column that applies the model `black-forest-labs/FLUX.1-schnell`
to an existing Pixeltable column `tbl.prompt` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
response=image_generations(
tbl.prompt, model='black-forest-labs/FLUX.1-schnell'
)
)
```
# twelvelabs
Source: https://docs.pixeltable.com/sdk/latest/twelvelabs
# module pixeltable.functions.twelvelabs
Pixeltable UDFs that wrap various endpoints from the TwelveLabs API. In order to use them, you must
first `pip install twelvelabs` and configure your TwelveLabs credentials, as described in
the [Working with TwelveLabs](https://docs.pixeltable.com/howto/providers/working-with-twelvelabs) tutorial.
## udf embed()
```python Signatures theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Signature 1:
@pxt.udf
embed(
text: pxt.String,
image: pxt.Image | None,
model_name: pxt.String
) -> pxt.Array[float32] | None
# Signature 2:
@pxt.udf
embed(
image: pxt.Image,
model_name: pxt.String
) -> pxt.Array[float32] | None
# Signature 3:
@pxt.udf
embed(
audio: pxt.Audio,
model_name: pxt.String,
start_sec: pxt.Float | None,
end_sec: pxt.Float | None,
embedding_option: pxt.Json[(String, ...)] | None
) -> pxt.Array[float32] | None
# Signature 4:
@pxt.udf
embed(
video: pxt.Video,
model_name: pxt.String,
start_sec: pxt.Float | None,
end_sec: pxt.Float | None,
embedding_option: pxt.Json[(String, ...)] | None
) -> pxt.Array[float32] | None
```
Creates an embedding vector for the given text, audio, image, or video input.
Each UDF signature corresponds to one of the four supported input types. If text is specified, it is possible to
specify an image as well, corresponding to the `text_image` embedding type in the TwelveLabs API. This is
(currently) the only way to include more than one input type at a time.
Equivalent to the TwelveLabs Embed API:
[https://docs.twelvelabs.io/v1.3/docs/guides/create-embeddings](https://docs.twelvelabs.io/v1.3/docs/guides/create-embeddings)
Request throttling:
Applies the rate limit set in the config (section `twelvelabs`, key `rate_limit`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install twelvelabs`
**Parameters:**
* **`model_name`** (`pxt.String`): The name of the model to use. Check
[the TwelveLabs documentation](https://docs.twelvelabs.io/v1.3/sdk-reference/python/create-embeddings-v-1/create-text-image-and-audio-embeddings)
for available models.
* **`text`** (`pxt.String`): The text to embed.
* **`image`** (`pxt.Image | None`, default: `None`): If specified, the embedding will be created from both the text and the image.
**Returns:**
* `pxt.Array[float32] | None`: The embedding.
**Examples:**
Add a computed column `embed` for an embedding of a string column `input`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
embed=embed(model_name='marengo3.0', text=tbl.input)
)
```
# UpdateStatus
Source: https://docs.pixeltable.com/sdk/latest/updatestatus
# class pixeltable.UpdateStatus
Information about changes to table data or table schema.
## attr ext\_num\_rows
```
ext_num_rows: int
```
Total number of rows affected in an external store.
## attr external\_rows\_created
```
external_rows_created: int
```
Number of rows created in an external store.
## attr external\_rows\_deleted
```
external_rows_deleted: int
```
Number of rows deleted from an external store.
## attr external\_rows\_updated
```
external_rows_updated: int
```
Number of rows updated in an external store.
## attr num\_computed\_values
```
num_computed_values: int
```
Total number of computed values affected (including cascaded changes).
## attr num\_excs
```
num_excs: int
```
Total number of exceptions encountered (including cascaded changes).
## attr num\_rows
```
num_rows: int
```
Total number of rows affected (including cascaded changes).
## attr pxt\_rows\_updated
```
pxt_rows_updated: int
```
Number of Pixeltable rows that were updated as a result of the operation.
# uuid
Source: https://docs.pixeltable.com/sdk/latest/uuid
# module pixeltable.functions.uuid
Pixeltable UDFs for `UUID`.
## udf to\_string()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
to_string(u: pxt.UUID) -> pxt.String
```
Convert a UUID to its string representation.
**Parameters:**
* **`u`** (`pxt.UUID`): The UUID to convert.
**Returns:**
* `pxt.String`: The string representation of the UUID, in the form `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`.
**Examples:**
Convert the UUID column `id` in an existing table `tbl` to a string:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(id_string=to_string(tbl.id))
```
## udf uuid4()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
uuid4() -> pxt.UUID
```
Generate a random UUID (version 4).
Equivalent to [`uuid.uuid4()`](https://docs.python.org/3/library/uuid.html#uuid.uuid4).
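Because `uuid4()` is described as equivalent to the standard-library call, a plain-Python sketch can show what the generated values look like. This uses only Python's `uuid` module, not Pixeltable, and the variable names are illustrative:

```python
import uuid

# Generate a random (version 4) UUID with the standard library.
u = uuid.uuid4()

# str() yields the canonical 8-4-4-4-12 hexadecimal form.
s = str(u)
groups = [len(part) for part in s.split('-')]

print(u.version)  # 4
print(groups)     # [8, 4, 4, 4, 12]
```

This is the same textual form that `to_string()` is documented to return.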
## udf uuid7()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
uuid7() -> pxt.UUID
```
Generate a time-based UUID.
Equivalent to [`uuid.uuid7()`](https://docs.python.org/3/library/uuid.html#uuid.uuid7).
# VersionMetadata
Source: https://docs.pixeltable.com/sdk/latest/versionmetadata
# class pixeltable.VersionMetadata
Metadata for a specific version of a Pixeltable table.
## attr change\_type
```
change_type: Literal['data', 'schema']
```
The type of table transformation that this version represents (`'data'` or `'schema'`).
## attr created\_at
```
created_at: datetime.datetime
```
The timestamp when this version was created.
## attr deletes
```
deletes: int
```
The number of rows deleted in this version.
## attr errors
```
errors: int
```
The number of errors encountered during this version.
## attr inserts
```
inserts: int
```
The number of rows inserted in this version.
## attr schema\_change
```
schema_change: str | None
```
A description of the schema change that occurred in this version, if any.
## attr updates
```
updates: int
```
The number of rows updated in this version.
## attr user
```
user: str | None
```
The user who created this version, if defined.
## attr version
```
version: int
```
The version number.
# video
Source: https://docs.pixeltable.com/sdk/latest/video
# module pixeltable.functions.video
Pixeltable UDFs for `VideoType`.
## iterator frame\_iterator()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.iterator
frame_iterator(
video: pxt.Video,
*,
fps: pxt.Float | None = None,
num_frames: pxt.Int | None = None,
keyframes_only: pxt.Bool = False
)
```
Iterator over frames of a video. At most one of `fps`, `num_frames`, or `keyframes_only` may be specified. If `fps`
is specified, frames are extracted at that rate (frames per second). If `num_frames` is specified,
exactly that many frames are extracted. If `keyframes_only` is `True`, only keyframes are extracted. If none of the three is specified, all frames are extracted.
**Outputs:**
One row per extracted frame, with the following columns:
* `frame` (`pxt.Image`): The extracted video frame
* `frame_attrs` (`pxt.Json`): A dictionary containing the following attributes (for more information,
see `pyav`'s documentation on
[VideoFrame](https://pyav.org/docs/develop/api/video.html#module-av.video.frame) and
[Frame](https://pyav.org/docs/develop/api/frame.html)):
* `index` (`int`): The index of the frame in the video stream
* `pts` (`int | None`): The presentation timestamp of the frame
* `dts` (`int | None`): The decoding timestamp of the frame
* `time` (`float | None`): The timestamp of the frame in seconds
* `is_corrupt` (`bool`): `True` if the frame is corrupt
* `key_frame` (`bool`): `True` if the frame is a keyframe
* `pict_type` (`int`): The picture type of the frame
* `interlaced_frame` (`bool`): `True` if the frame is interlaced
**Parameters:**
* **`fps`** (`pxt.Float | None`): Number of frames to extract per second of video. This may be a fractional value, such as 0.5.
If omitted, or if greater than the native framerate of the video,
then the framerate of the video will be used (all frames will be extracted).
* **`num_frames`** (`pxt.Int | None`): Exact number of frames to extract. The frames will be spaced as evenly as possible. If
`num_frames` is greater than the number of frames in the video, all frames will be extracted.
* **`keyframes_only`** (`pxt.Bool`): If True, only extract keyframes.
**Examples:**
All these examples assume an existing table `tbl` with a column `video` of type `pxt.Video`.
Create a view that extracts all frames from all videos:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_view('all_frames', tbl, iterator=frame_iterator(tbl.video))
```
Create a view that extracts only keyframes from all videos:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_view(
'keyframes',
tbl,
iterator=frame_iterator(tbl.video, keyframes_only=True),
)
```
Create a view that extracts frames from all videos at a rate of 1 frame per second:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_view(
'one_fps_frames', tbl, iterator=frame_iterator(tbl.video, fps=1.0)
)
```
Create a view that extracts exactly 10 frames from each video:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_view(
'ten_frames', tbl, iterator=frame_iterator(tbl.video, num_frames=10)
)
```
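How `fps` and `num_frames` interact with the native frame rate can be sketched in plain Python. The helper `select_frame_indices` is hypothetical and not part of Pixeltable. The even-spacing formula is one reasonable reading of "spaced as evenly as possible", not the library's actual implementation, and `keyframes_only` is left out because it depends on the encoded stream:

```python
def select_frame_indices(total_frames, native_fps, fps=None, num_frames=None):
    """Return the indices of the frames to extract (illustrative only)."""
    if fps is not None and num_frames is not None:
        raise ValueError('at most one of fps and num_frames may be specified')

    if num_frames is not None:
        # Asking for more frames than exist yields every frame.
        if num_frames >= total_frames:
            return list(range(total_frames))
        # Assumed spacing rule: spread the picks evenly over the stream.
        return [i * total_frames // num_frames for i in range(num_frames)]

    # No fps, or an fps above the native rate: extract all frames.
    if fps is None or fps >= native_fps:
        return list(range(total_frames))

    # Otherwise keep one frame every (native_fps / fps) frames.
    step = native_fps / fps
    indices = []
    k = 0
    while int(k * step) < total_frames:
        indices.append(int(k * step))
        k += 1
    return indices


# A 4-second clip at 25 fps has 100 frames.
print(select_frame_indices(100, 25.0, fps=1.0))  # [0, 25, 50, 75]
print(len(select_frame_indices(100, 25.0, num_frames=10)))  # 10
```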
## iterator video\_splitter()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.iterator
video_splitter(
video: pxt.Video,
*,
duration: pxt.Float | None = None,
overlap: pxt.Float | None = None,
min_segment_duration: pxt.Float | None = None,
segment_times: pxt.Json[(Float, ...)] | None = None,
mode: pxt.String = 'accurate',
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
)
```
Iterator over segments of a video file. The segments are specified either via a
fixed duration or via a list of split points.
**Parameters:**
* **`duration`** (`pxt.Float | None`): Video segment duration in seconds
* **`overlap`** (`pxt.Float | None`): Overlap between consecutive segments in seconds. Only available for `mode='fast'`.
* **`min_segment_duration`** (`pxt.Float | None`): Drop the last segment if it is shorter than `min_segment_duration`.
* **`segment_times`** (`pxt.Json[(Float, ...)] | None`): List of timestamps (in seconds) in the video where segments should be split. Note that these are not
segment durations. If all segment times are less than the duration of the video, produces exactly
`len(segment_times) + 1` segments. An argument of `[]` will produce a single segment containing the
entire video.
* **`mode`** (`pxt.String`): Segmentation mode:
* `'fast'`: Quick segmentation using stream copy (splits only at keyframes, approximate durations)
* `'accurate'`: Precise segmentation with re-encoding (exact durations, slower)
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder for the current platform.
Only available for `mode='accurate'`.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder. Only available for `mode='accurate'`.
**Examples:**
All these examples assume an existing table `tbl` with a column `video` of type `pxt.Video`.
Create a view that splits each video into 10-second segments:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_view(
'ten_second_segments',
tbl,
iterator=video_splitter(tbl.video, duration=10.0),
)
```
Create a view that splits each video into segments at specified fixed times:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
split_times = [5.0, 15.0, 30.0]
pxt.create_view(
'custom_segments',
tbl,
iterator=video_splitter(tbl.video, segment_times=split_times),
)
```
Create a view that splits each video into segments at times specified by a column `split_times` of type
`pxt.Json`, containing a list of timestamps in seconds:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_view(
'custom_segments',
tbl,
iterator=video_splitter(tbl.video, segment_times=tbl.split_times),
)
```
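The difference between split points and segment durations can be sketched in plain Python. Both helpers below are hypothetical and are not Pixeltable functions. Ignoring split points outside the video is an assumption, and `overlap` is not modeled:

```python
def split_points_to_segments(video_duration, segment_times):
    """Turn split timestamps into (start, end) pairs (illustrative only)."""
    # Assumption: split points outside the video are ignored.
    cuts = [t for t in sorted(segment_times) if 0.0 < t < video_duration]
    bounds = [0.0, *cuts, video_duration]
    return list(zip(bounds[:-1], bounds[1:]))


def fixed_duration_segments(video_duration, duration, min_segment_duration=0.0):
    """Cut a video into consecutive segments of a fixed length."""
    segments = []
    start = 0.0
    while start < video_duration:
        segments.append((start, min(start + duration, video_duration)))
        start += duration
    # Drop the last segment if it is shorter than the minimum.
    if segments and segments[-1][1] - segments[-1][0] < min_segment_duration:
        segments.pop()
    return segments


# Three split points in a 40-second video give len(segment_times) + 1 segments.
print(split_points_to_segments(40.0, [5.0, 15.0, 30.0]))
# [(0.0, 5.0), (5.0, 15.0), (15.0, 30.0), (30.0, 40.0)]

# An empty list yields one segment covering the whole video.
print(split_points_to_segments(40.0, []))  # [(0.0, 40.0)]
```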
## uda concat\_videos\_agg()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.uda
concat_videos_agg(*args, **kwargs) -> pxt.Video | None
```
Aggregate function that merges videos into a single video.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
* All videos must have the same resolution
**Parameters:**
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder for the current platform.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video | None`: A new video containing all input videos concatenated in order, or None if all inputs are None.
**Examples:**
Concatenate all videos in a table, ordered by timestamp:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(concat_videos_agg(tbl.timestamp, tbl.video)).collect()
```
## uda make\_video()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.uda
make_video(*args, **kwargs) -> pxt.Video
```
Aggregate function that creates a video from a sequence of images, using the default video encoder and
yuv420p pixel format.
**Parameters:**
* **`fps`** (`pxt.Int`): Frames per second for the output video.
**Returns:**
* `pxt.Video`: The video obtained by combining the input frames at the specified `fps`.
**Examples:**
Combine the images in the `img` column of the table `tbl` into a video:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(make_video(tbl.img, fps=30)).collect()
```
Combine a sequence of rotated images into a video:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(make_video(tbl.img.rotate(45), fps=30)).collect()
```
For a more extensive example, see the
[Object Detection in Videos](https://docs.pixeltable.com/howto/cookbooks/video/object-detection-in-videos)
cookbook.
## udf clip()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
clip(
video: pxt.Video,
*,
start_time: pxt.Float,
end_time: pxt.Float | None = None,
duration: pxt.Float | None = None,
mode: pxt.String = 'accurate',
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video | None
```
Extract a clip from a video, specified by `start_time` and either `end_time` or `duration` (in seconds).
If `start_time` is beyond the end of the video, returns None. Only one of `end_time` and `duration` may be specified.
If both `end_time` and `duration` are None, the clip goes to the end of the video.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video file
* **`start_time`** (`pxt.Float`): Start time in seconds
* **`end_time`** (`pxt.Float | None`): End time in seconds
* **`duration`** (`pxt.Float | None`): Duration of the clip in seconds
* **`mode`** (`pxt.String`): Clip mode:
* `'fast'`: avoids re-encoding, but starts the clip at the nearest keyframe; as a result, the clip
duration will be slightly longer than requested
* `'accurate'`: extracts a frame-accurate clip, but requires re-encoding
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder for the current platform.
Only available for `mode='accurate'`.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder. Only available for `mode='accurate'`.
**Returns:**
* `pxt.Video | None`: New video containing only the specified time range, or None if `start_time` is beyond the end of the video.
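The rules for combining `start_time`, `end_time`, and `duration` can be sketched in plain Python. `resolve_clip_range` is a hypothetical helper, not Pixeltable API. Treating a start exactly at the end of the video as "beyond the end" and clamping the end to the video length are both assumptions:

```python
def resolve_clip_range(video_duration, start_time, end_time=None, duration=None):
    """Return the (start, end) range of the clip in seconds, or None."""
    if end_time is not None and duration is not None:
        raise ValueError('only one of end_time and duration may be specified')
    # Assumption: a start at or past the end of the video produces no clip.
    if start_time >= video_duration:
        return None
    if duration is not None:
        end_time = start_time + duration
    if end_time is None:
        # Neither given: the clip runs to the end of the video.
        end_time = video_duration
    # Assumption: an end past the video is clamped to the video length.
    return (start_time, min(end_time, video_duration))


print(resolve_clip_range(60.0, 10.0, end_time=20.0))  # (10.0, 20.0)
print(resolve_clip_range(60.0, 10.0, duration=5.0))   # (10.0, 15.0)
print(resolve_clip_range(60.0, 10.0))                 # (10.0, 60.0)
print(resolve_clip_range(60.0, 75.0))                 # None
```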
## udf concat\_videos()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
concat_videos(
videos: pxt.Json[(Video, ...)],
*,
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video | None
```
Merge multiple videos into a single video.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`videos`** (`pxt.Json[(Video, ...)]`): List of videos to merge.
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder for the current platform.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video | None`: A new video containing the merged videos, or None if the input list is empty.
## udf crop()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
crop(
video: pxt.Video,
bbox: pxt.Json[(Int, ...)],
*,
bbox_format: pxt.String = 'xywh',
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video
```
Crop a rectangular region from a video using ffmpeg's crop filter.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video.
* **`bbox`** (`pxt.Json[(Int, ...)]`): Crop region as a list of 4 integers.
* **`bbox_format`** (`pxt.String`): Format of the `bbox` coordinates:
* `'xyxy'`: `[x1, y1, x2, y2]` where (x1, y1) is top-left and (x2, y2) is bottom-right
* `'xywh'`: `[x, y, width, height]` where (x, y) is top-left corner
* `'cxcywh'`: `[cx, cy, width, height]` where (cx, cy) is the center
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video`: Video containing the cropped region.
**Examples:**
Crop using default xywh format:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.crop([100, 50, 320, 240])).collect()
```
Crop using xyxy format (common in object detection):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.crop([100, 50, 420, 290], bbox_format='xyxy')
).collect()
```
Crop using center format:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.crop([260, 170, 320, 240], bbox_format='cxcywh')
).collect()
```
Use with yolox object detection output:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
cropped=tbl.video.crop(tbl.detections.bboxes[0], bbox_format='xyxy')
)
```
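The three `bbox_format` conventions describe the same rectangle in different coordinates. A minimal plain-Python sketch of the conversion follows. `to_xywh` is a hypothetical helper, not part of Pixeltable, and the integer division for the center format is an assumption:

```python
def to_xywh(bbox, bbox_format='xywh'):
    """Convert a 4-integer bounding box to [x, y, width, height]."""
    a, b, c, d = bbox
    if bbox_format == 'xywh':
        # Already top-left corner plus size.
        return [a, b, c, d]
    if bbox_format == 'xyxy':
        # Top-left and bottom-right corners: size is the difference.
        return [a, b, c - a, d - b]
    if bbox_format == 'cxcywh':
        # Center plus size: shift by half the size to reach the top-left corner.
        return [a - c // 2, b - d // 2, c, d]
    raise ValueError(f'unknown bbox_format: {bbox_format!r}')


# The three boxes used in the examples above all describe the same region.
print(to_xywh([100, 50, 320, 240]))            # [100, 50, 320, 240]
print(to_xywh([100, 50, 420, 290], 'xyxy'))    # [100, 50, 320, 240]
print(to_xywh([260, 170, 320, 240], 'cxcywh')) # [100, 50, 320, 240]
```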
## udf extract\_audio()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
extract_audio(
video_path: pxt.Video,
stream_idx: pxt.Int = 0,
format: pxt.String = 'wav',
codec: pxt.String | None = None
) -> pxt.Audio
```
Extract an audio stream from a video.
**Parameters:**
* **`stream_idx`** (`pxt.Int`): Index of the audio stream to extract.
* **`format`** (`pxt.String`): The target audio format (`'wav'`, `'mp3'`, or `'flac'`).
* **`codec`** (`pxt.String | None`): The codec to use for the audio stream. If not provided, a default codec will be used.
**Returns:**
* `pxt.Audio`: The extracted audio.
**Examples:**
Add a computed column to a table `tbl` that extracts audio from an existing column `video_col`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
extracted_audio=tbl.video_col.extract_audio(format='flac')
)
```
## udf extract\_frame()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
extract_frame(
video: pxt.Video,
*,
timestamp: pxt.Float
) -> pxt.Image | None
```
Extract a single frame from a video at a specific timestamp.
**Parameters:**
* **`video`** (`pxt.Video`): The video from which to extract the frame.
* **`timestamp`** (`pxt.Float`): Extract frame at this timestamp (in seconds).
**Returns:**
* `pxt.Image | None`: The extracted frame as a PIL Image, or None if the timestamp is beyond the video duration.
**Examples:**
Extract the first frame from each video in the `video` column of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.extract_frame(timestamp=0.0)).collect()
```
Extract a frame close to the end of each video in the `video` column of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.extract_frame(
timestamp=tbl.video.get_metadata().streams[0].duration_seconds - 0.1
)
).collect()
```
## udf fade\_in()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
fade_in(
video: pxt.Video,
*,
duration: pxt.Float = 1.0,
color: pxt.String = 'black',
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video
```
Apply a fade-in effect from a solid color at the start of a video using ffmpeg's fade filter.
The video transitions from a solid `color` to the full video content over `duration` seconds.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video.
* **`duration`** (`pxt.Float`): Duration of the fade-in effect in seconds.
* **`color`** (`pxt.String`): Color to fade from (e.g., `'black'`, `'white'`, `'#FF0000'`).
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder for the current platform.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video`: A new video with the fade-in effect applied.
**Examples:**
Apply a 1-second fade from black (default):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.fade_in()).collect()
```
Apply a 2-second fade from white:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.fade_in(duration=2.0, color='white')).collect()
```
## udf fade\_out()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
fade_out(
video: pxt.Video,
*,
duration: pxt.Float = 1.0,
color: pxt.String = 'black',
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video
```
Apply a fade-out effect to a solid color at the end of a video using ffmpeg's fade filter.
The video transitions from the full video content to a solid `color` over the final `duration` seconds.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video.
* **`duration`** (`pxt.Float`): Duration of the fade-out effect in seconds.
* **`color`** (`pxt.String`): Color to fade to (e.g., `'black'`, `'white'`, `'#FF0000'`).
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder for the current platform.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video`: A new video with the fade-out effect applied.
**Examples:**
Apply a 1-second fade to black (default):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.fade_out()).collect()
```
Apply a 3-second fade to white:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.fade_out(duration=3.0, color='white')).collect()
```
## udf get\_duration()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
get_duration(video: pxt.Video) -> pxt.Float | None
```
Get video duration in seconds.
**Parameters:**
* **`video`** (`pxt.Video`): The video for which to get the duration.
**Returns:**
* `pxt.Float | None`: The duration in seconds, or None if the duration cannot be determined.
## udf get\_metadata()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
get_metadata(video: pxt.Video) -> ContainerMetadata
```
Gets various metadata associated with a video file and returns it as
a [`ContainerMetadata`](./containermetadata) dictionary.
**Parameters:**
* **`video`** (`pxt.Video`): The video for which to get metadata.
**Returns:**
* `ContainerMetadata`: A [`ContainerMetadata`](./containermetadata) with typical structure:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{
'bit_exact': False,
'bit_rate': 967260,
'size': 2234371,
'metadata': {
'encoder': 'Lavf60.16.100',
'major_brand': 'isom',
'minor_version': '512',
'compatible_brands': 'isomiso2avc1mp41',
},
'streams': [
{
'type': 'video',
'width': 640,
'height': 360,
'frames': 462,
'time_base': 1.0 / 12800,
'duration': 236544,
'duration_seconds': 236544.0 / 12800,
'average_rate': 25.0,
'base_rate': 25.0,
'guessed_rate': 25.0,
'metadata': {
'language': 'und',
'handler_name': 'L-SMASH Video Handler',
'vendor_id': '[0][0][0][0]',
'encoder': 'Lavc60.31.102 libx264',
},
'codec_context': {'name': 'h264', 'codec_tag': 'avc1', 'profile': 'High', 'pix_fmt': 'yuv420p'},
}
],
}
```
**Examples:**
Extract metadata for files in the `video_col` column of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video_col.get_metadata()).collect()
```
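The relationship between `duration`, `time_base`, and `duration_seconds` in the sample structure above can be checked with a little arithmetic. The sketch below copies the values from that sample and uses `Fraction` so the comparison is exact. It runs on a plain dictionary, not on a Pixeltable result:

```python
from fractions import Fraction

# Values copied from the sample video stream above.
stream = {
    'frames': 462,
    'time_base': Fraction(1, 12800),  # seconds per tick, kept exact here
    'duration': 236544,               # stream duration in time_base ticks
    'average_rate': 25.0,             # frames per second
}

# duration_seconds is the tick count multiplied by the time base.
duration_seconds = stream['duration'] * stream['time_base']

# The same value follows from the frame count and the frame rate.
seconds_from_frames = Fraction(stream['frames']) / Fraction(stream['average_rate'])

print(float(duration_seconds))                  # 18.48
print(duration_seconds == seconds_from_frames)  # True
```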
## udf grayscale()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
grayscale(
video: pxt.Video,
*,
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video
```
Convert a video to grayscale.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video.
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder for the current platform.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video`: A grayscale version of the video.
**Examples:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.grayscale()).collect()
```
## udf mirror\_x()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
mirror_x(
video: pxt.Video,
*,
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video
```
Flip a video horizontally using ffmpeg's hflip filter.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video.
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder for the current platform.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video`: A horizontally flipped video.
**Examples:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.mirror_x()).collect()
```
## udf mirror\_y()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
mirror_y(
video: pxt.Video,
*,
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video
```
Flip a video vertically using ffmpeg's vflip filter.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video.
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder for the current platform.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video`: A vertically flipped video.
**Examples:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.mirror_y()).collect()
```
## udf mix\_audio()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
mix_audio(
video: pxt.Video,
audio: pxt.Audio,
*,
audio_volume: pxt.Float = 1.0,
original_volume: pxt.Float = 1.0,
audio_start_time: pxt.Float = 0.0,
mix_duration: pxt.String = 'longest',
normalize: pxt.Bool = False,
dropout_transition: pxt.Float = 2.0,
align_to_video: pxt.String = 'trim',
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video
```
Mix an audio track into a video's existing audio, blending both tracks together. Volume levels for each track can
be controlled independently.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video (must have an existing audio stream).
* **`audio`** (`pxt.Audio`): Audio track to mix in.
* **`audio_volume`** (`pxt.Float`): Volume multiplier for the added audio track. 1.0 is original volume.
* **`original_volume`** (`pxt.Float`): Volume multiplier for the video's existing audio track. 1.0 is original volume.
* **`audio_start_time`** (`pxt.Float`): Time in seconds at which the added audio begins playing in the output.
* **`mix_duration`** (`pxt.String`): Controls which input determines the length of the mixed audio stream.
* `"longest"`: the mix runs until the longer of the two audio inputs ends. Use
this when the added audio (e.g. a music bed) is longer than the video's original audio.
* `"first"`: the mix ends when the video's original audio track ends. Useful when the
original audio and video streams are the same length.
* `"shortest"`: the mix ends when the shorter input ends, truncating whichever track is longer.
* **`normalize`** (`pxt.Bool`): If `True`, ffmpeg scales the mixed output to prevent clipping by dividing each track's
contribution by the number of inputs. Defaults to `False` so that `audio_volume` and
`original_volume` are applied exactly as given; enable it if you are not setting volumes explicitly and
want automatic clip protection.
* **`dropout_transition`** (`pxt.Float`): Duration in seconds over which a track's contribution fades to zero after
it ends, preventing audible clicks at hard boundaries. Defaults to 2.0 seconds, matching
ffmpeg's own default. Set to 0.0 to disable. Relevant whenever one input ends before
the mixed output ends, regardless of which `mix_duration` mode is selected.
* **`align_to_video`** (`pxt.String`): Post-mix adjustment to align the output audio stream with the video stream duration.
Applied after `amix`, so it is independent of `mix_duration`.
* `"trim"`: if the mixed audio is longer than the video stream, truncate it to
match. Pairs naturally with `mix_duration="longest"` for music-bed workflows.
* `"none"`: no adjustment; output audio duration is whatever `amix` produces.
* `"pad"`: if the mixed audio is shorter than the video stream, extend it with silence. Also
trims to the video duration if the mix runs long.
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video`: A new video with both audio tracks mixed together.
**Examples:**
Add background music at 30% volume. With the defaults, this lays the music as a bed under the
video: `mix_duration="longest"` keeps the music playing past the end of the original audio,
`align_to_video="trim"` caps the result at the video stream duration, and `normalize=False`
means the `audio_volume=0.3` setting is taken at face value rather than halved by `amix`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.mix_audio(tbl.music, audio_volume=0.3)).collect()
```
Mix audio starting at second 5, with the original audio reduced:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.mix_audio(
tbl.music,
audio_volume=0.5,
original_volume=0.7,
audio_start_time=5.0,
)
).collect()
```
Pad a short ambient track with silence so the audio stream matches the full video length:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.mix_audio(
tbl.ambient, audio_volume=0.6, align_to_video='pad'
)
).collect()
```
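The interaction between `mix_duration` and `align_to_video` can be sketched in plain Python. This is only an illustration of the rules described above, not Pixeltable's implementation: the helper name `mixed_audio_duration` is hypothetical, and it assumes the added track occupies the interval from `audio_start_time` to `audio_start_time + added_len`.

```python
def mixed_audio_duration(
    original_len: float,
    added_len: float,
    video_len: float,
    *,
    audio_start_time: float = 0.0,
    mix_duration: str = "longest",
    align_to_video: str = "trim",
) -> float:
    """Estimate the output audio duration in seconds (illustrative only)."""
    # Assumption: the added track ends at audio_start_time + added_len.
    added_end = audio_start_time + added_len

    # Step 1: length of the mix, as selected by mix_duration.
    if mix_duration == "longest":
        mixed = max(original_len, added_end)
    elif mix_duration == "first":
        mixed = original_len
    elif mix_duration == "shortest":
        mixed = min(original_len, added_end)
    else:
        raise ValueError(f"unknown mix_duration: {mix_duration!r}")

    # Step 2: post-mix alignment, independent of mix_duration.
    if align_to_video == "none":
        return mixed
    if align_to_video == "trim":
        return min(mixed, video_len)
    if align_to_video == "pad":
        # Pads short mixes with silence and also trims long ones.
        return video_len
    raise ValueError(f"unknown align_to_video: {align_to_video!r}")


# A 10-second video with a 30-second music bed, using the defaults.
print(mixed_audio_duration(10.0, 30.0, 10.0))
```

With the defaults, the 30-second music bed is mixed in full and then trimmed to the 10-second video, which is the music-bed workflow described in the first example.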
## udf overlay\_image()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
overlay_image(
video: pxt.Video,
image: pxt.Image,
*,
horizontal_align: pxt.String = 'center',
horizontal_margin: pxt.Int = 0,
vertical_align: pxt.String = 'center',
vertical_margin: pxt.Int = 0,
scale: pxt.Float | None = None,
opacity: pxt.Float = 1.0,
start_time: pxt.Float | None = None,
end_time: pxt.Float | None = None,
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video
```
Overlay an image on a video with customizable positioning, scaling, opacity, and timing.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video to overlay the image on.
* **`image`** (`pxt.Image`): Image to overlay.
* **`horizontal_align`** (`pxt.String`): Horizontal alignment of the overlay (`'left'`, `'center'`, `'right'`).
* **`horizontal_margin`** (`pxt.Int`): Horizontal margin in pixels from the alignment edge.
* **`vertical_align`** (`pxt.String`): Vertical alignment of the overlay (`'top'`, `'center'`, `'bottom'`).
* **`vertical_margin`** (`pxt.Int`): Vertical margin in pixels from the alignment edge.
* **`scale`** (`pxt.Float | None`): Scale factor for the overlay image relative to the video height. For example, 0.1 scales the
image to 10% of the video height while preserving aspect ratio. If None, uses the original size.
* **`opacity`** (`pxt.Float`): Overlay opacity from 0.0 (transparent) to 1.0 (opaque).
* **`start_time`** (`pxt.Float | None`): Time in seconds when the overlay appears. If None, the overlay is visible from the start.
* **`end_time`** (`pxt.Float | None`): Time in seconds when the overlay disappears. If None, the overlay is visible until the end.
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video`: A new video with the image overlay applied.
**Examples:**
Add a logo to the top-right corner:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.overlay_image(
tbl.logo_img, horizontal_align='right', vertical_align='top'
)
).collect()
```
Add a watermark at 50% opacity, scaled to 10% of video height:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.overlay_image(tbl.watermark_img, scale=0.1, opacity=0.5)
).collect()
```
Show an image only between seconds 2 and 8:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.overlay_image(
tbl.img, start_time=2.0, end_time=8.0, horizontal_align='right'
)
).collect()
```
## udf overlay\_text()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
overlay_text(
video: pxt.Video,
text: pxt.String,
*,
font: pxt.String | None = None,
font_size: pxt.Int = 24,
line_spacing: pxt.Int = 0,
color: pxt.String = 'white',
opacity: pxt.Float = 1.0,
horizontal_align: pxt.String = 'center',
horizontal_margin: pxt.Int = 0,
vertical_align: pxt.String = 'center',
vertical_margin: pxt.Int = 0,
box: pxt.Bool = False,
box_color: pxt.String = 'black',
box_opacity: pxt.Float = 1.0,
box_border: pxt.Json[(Int, ...)] | None = None,
start_time: pxt.Float | None = None,
end_time: pxt.Float | None = None,
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video
```
Overlay text on a video with customizable positioning and styling.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video to overlay text on.
* **`text`** (`pxt.String`): The text string to overlay on the video.
* **`font`** (`pxt.String | None`): Font family or path to font file. If None, uses the system default.
* **`font_size`** (`pxt.Int`): Size of the text in points.
* **`line_spacing`** (`pxt.Int`): Pixels of vertical space added between lines of multi-line text (text containing
`\n`). Defaults to 0. Negative values pull lines closer together for tighter packing.
Ignored for single-line text.
* **`color`** (`pxt.String`): Text color (e.g., `'white'`, `'red'`, `'#FF0000'`).
* **`opacity`** (`pxt.Float`): Text opacity from 0.0 (transparent) to 1.0 (opaque).
* **`horizontal_align`** (`pxt.String`): Horizontal text alignment (`'left'`, `'center'`, `'right'`).
* **`horizontal_margin`** (`pxt.Int`): Horizontal margin in pixels from the alignment edge.
* **`vertical_align`** (`pxt.String`): Vertical text alignment (`'top'`, `'center'`, `'bottom'`).
* **`vertical_margin`** (`pxt.Int`): Vertical margin in pixels from the alignment edge.
* **`box`** (`pxt.Bool`): Whether to draw a background box behind the text.
* **`box_color`** (`pxt.String`): Background box color as a string.
* **`box_opacity`** (`pxt.Float`): Background box opacity from 0.0 to 1.0.
* **`box_border`** (`pxt.Json[(Int, ...)] | None`): Padding around text in the box in pixels.
* `[10]`: 10 pixels on all sides
* `[10, 20]`: 10 pixels on top/bottom, 20 on left/right
* `[10, 20, 30]`: 10 pixels on top, 20 on left/right, 30 on bottom
* `[10, 20, 30, 40]`: 10 pixels on top, 20 on right, 30 on bottom, 40 on left
* **`start_time`** (`pxt.Float | None`): Time in seconds when the text appears. If None, the text is visible from the start.
* **`end_time`** (`pxt.Float | None`): Time in seconds when the text disappears. If None, the text is visible until the end.
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video`: A new video with the text overlay applied.
**Examples:**
Add a simple text overlay to videos in a table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.overlay_text('Sample Text')).collect()
```
Add a YouTube-style caption:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.overlay_text(
'Caption text',
font_size=32,
color='white',
opacity=1.0,
box=True,
box_color='black',
box_opacity=0.8,
box_border=[6, 14],
horizontal_margin=10,
vertical_align='bottom',
vertical_margin=70,
)
).collect()
```
Add text with a semi-transparent background box:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.overlay_text(
'Important Message',
font_size=32,
color='yellow',
box=True,
box_color='black',
box_opacity=0.6,
box_border=[20, 10],
)
).collect()
```
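The `box_border` shorthand follows a one-to-four value pattern. As a rough illustration, the hypothetical helper below (not part of Pixeltable) expands each documented form into explicit `(top, right, bottom, left)` padding:

```python
from typing import Sequence


def expand_box_border(border: Sequence[int]) -> tuple[int, int, int, int]:
    """Expand a 1- to 4-element box_border list into (top, right, bottom, left)."""
    if len(border) == 1:
        top = right = bottom = left = border[0]
    elif len(border) == 2:
        top = bottom = border[0]
        right = left = border[1]
    elif len(border) == 3:
        top, right, bottom = border
        left = right
    elif len(border) == 4:
        top, right, bottom, left = border
    else:
        raise ValueError("box_border must contain between 1 and 4 values")
    return top, right, bottom, left


for spec in ([10], [10, 20], [10, 20, 30], [10, 20, 30, 40]):
    print(spec, "->", expand_box_border(spec))
```

For instance, the `box_border=[6, 14]` used in the caption example corresponds to 6 pixels of padding above and below the text and 14 pixels on each side.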
## udf pan()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
pan(*args, **kwargs) -> pxt.Video | None
```
A parameterized expression from which an executable Expr is created with a function call.
## udf resize()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
resize(
video: pxt.Video,
*,
width: pxt.Int | None = None,
height: pxt.Int | None = None,
scale: pxt.Float | None = None,
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video
```
Resize a video using ffmpeg's scale filter.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video.
* **`width`** (`pxt.Int | None`): Width of the output video. Maintains the existing aspect ratio if no `height` is provided.
* **`height`** (`pxt.Int | None`): Height of the output video. Maintains the existing aspect ratio if no `width` is provided.
* **`scale`** (`pxt.Float | None`): Scale factor. Mutually exclusive with `width` and `height`.
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder for the current platform.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video`: The resized video.
**Examples:**
Resize to a specific width, preserving aspect ratio:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.resize(width=640)).collect()
```
Resize to exact dimensions:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.resize(width=1280, height=720)).collect()
```
Scale down by half:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.resize(scale=0.5)).collect()
```
## udf reverse()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
reverse(
video: pxt.Video,
audio: pxt.String = 'drop',
*,
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video
```
Reverse a video using ffmpeg's reverse filter.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video.
* **`audio`** (`pxt.String`): Specifies what to do with audio streams:
* `'drop'`: drop the audio streams
* `'reverse'`: also reverse the audio streams
* `'keep'`: keep the audio streams
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder for the current platform.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video`: The reversed video.
**Examples:**
Reverse a video, dropping audio:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.reverse()).collect()
```
Reverse a video along with its audio:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.reverse(audio='reverse')).collect()
```
## udf rotate()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
rotate(
video: pxt.Video,
*,
angle: pxt.Float,
unit: pxt.String = 'deg',
expand: pxt.Bool = False,
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video
```
Rotate a video by a fixed angle using ffmpeg's rotate filter.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video.
* **`angle`** (`pxt.Float`): Rotation angle. Positive values rotate counter-clockwise.
* **`unit`** (`pxt.String`): Unit of the angle: `'deg'` for degrees or `'rad'` for radians.
* **`expand`** (`pxt.Bool`): If True, the output frame is enlarged to contain the entire rotated frame (no cropping).
If False (default), the output frame keeps the original dimensions, cropping corners.
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder for the current platform.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video`: A new video rotated by the specified angle.
**Examples:**
Rotate 90 degrees counter-clockwise:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.rotate(angle=90)).collect()
```
Rotate 45 degrees with frame expansion to avoid cropping:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.rotate(angle=45, expand=True)).collect()
```
Rotate by pi/2 radians:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.rotate(angle=1.5708, unit='rad')).collect()
```
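Two small calculations sit behind the `unit` and `expand` parameters: converting degrees to radians, and finding the frame size needed to contain the whole rotated frame. The sketch below uses standard bounding-box geometry; the helper names are hypothetical, and ffmpeg's exact rounding of the expanded frame may differ.

```python
import math


def to_radians(angle: float, unit: str = "deg") -> float:
    """Convert an angle in the given unit to radians."""
    if unit == "deg":
        return math.radians(angle)
    if unit == "rad":
        return float(angle)
    raise ValueError(f"unknown unit: {unit!r}")


def expanded_frame_size(width: int, height: int, angle: float, unit: str = "deg") -> tuple[int, int]:
    """Smallest frame that contains a width x height frame rotated by angle."""
    theta = to_radians(angle, unit)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    # Bounding box of a rotated rectangle. The small epsilon keeps floating-point
    # noise (e.g. cos(90 deg) not being exactly 0) from adding an extra pixel.
    eps = 1e-9
    return math.ceil(width * c + height * s - eps), math.ceil(width * s + height * c - eps)


print(to_radians(90))                      # approximately 1.5708
print(expanded_frame_size(1920, 1080, 90))
print(expanded_frame_size(1920, 1080, 45))
```

The sign of the angle does not affect the bounding box, so the same sizes apply to clockwise and counter-clockwise rotations.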
## udf scene\_detect\_adaptive()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
scene_detect_adaptive(
video: pxt.Video,
*,
fps: pxt.Float | None = None,
adaptive_threshold: pxt.Float = 3.0,
min_scene_len: pxt.Int = 15,
window_width: pxt.Int = 2,
min_content_val: pxt.Float = 15.0,
delta_hue: pxt.Float = 1.0,
delta_sat: pxt.Float = 1.0,
delta_lum: pxt.Float = 1.0,
delta_edges: pxt.Float = 0.0,
luma_only: pxt.Bool = False,
kernel_size: pxt.Int | None = None
) -> pxt.Json[(Json, ...)]
```
Detect scene cuts in a video using PySceneDetect's
[AdaptiveDetector](https://www.scenedetect.com/docs/latest/api/detectors.html#scenedetect.detectors.adaptive_detector.AdaptiveDetector).
**Requirements:**
* `pip install scenedetect`
**Parameters:**
* **`video`** (`pxt.Video`): The video to analyze for scene cuts.
* **`fps`** (`pxt.Float | None`): Number of frames to extract per second for analysis. If None or 0, analyzes all frames.
Lower values process faster but may miss exact scene cuts.
* **`adaptive_threshold`** (`pxt.Float`): Threshold that the score ratio must exceed to trigger a new scene cut.
Lower values will detect more scenes (more sensitive), higher values will detect fewer scenes.
* **`min_scene_len`** (`pxt.Int`): Once a cut is detected, this many frames must pass before a new one can be added to the scene
list.
* **`window_width`** (`pxt.Int`): Size of window (number of frames) before and after each frame to average together in order to
detect deviations from the mean. Must be at least 1.
* **`min_content_val`** (`pxt.Float`): Minimum threshold (float) that the content\_val must exceed in order to register as a new scene.
This is calculated the same way that `scene_detect_content()` calculates frame
score based on weights/luma\_only/kernel\_size.
* **`delta_hue`** (`pxt.Float`): Weight for hue component changes. Higher values make hue changes more important.
* **`delta_sat`** (`pxt.Float`): Weight for saturation component changes. Higher values make saturation changes more important.
* **`delta_lum`** (`pxt.Float`): Weight for luminance component changes. Higher values make brightness changes more important.
* **`delta_edges`** (`pxt.Float`): Weight for edge detection changes. Higher values make edge changes more important.
Edge detection can help detect cuts in scenes with similar colors but different content.
* **`luma_only`** (`pxt.Bool`): If True, only analyzes changes in the luminance (brightness) channel of the video,
ignoring color information. This can be faster and may work better for grayscale content.
* **`kernel_size`** (`pxt.Int | None`): Size of kernel to use for post edge detection filtering. If None, automatically set based on video
resolution.
**Returns:**
* `pxt.Json[(Json, ...)]`: A list of dictionaries, one for each detected scene, with the following keys:
* `start_time` (float): The start time of the scene in seconds.
* `start_pts` (int): The pts of the start of the scene.
* `duration` (float): The duration of the scene in seconds.
The list is ordered chronologically. Returns the full duration of the video if no scenes are detected.
**Examples:**
Detect scene cuts with default parameters:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scene_detect_adaptive()).collect()
```
Detect more scenes by lowering the threshold:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.scene_detect_adaptive(adaptive_threshold=1.5)
).collect()
```
Use luminance-only detection with a longer minimum scene length:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.scene_detect_adaptive(luma_only=True, min_scene_len=30)
).collect()
```
Add scene cuts as a computed column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
scene_cuts=tbl.video.scene_detect_adaptive(adaptive_threshold=2.0)
)
```
Analyze at a lower frame rate for faster processing:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scene_detect_adaptive(fps=2.0)).collect()
```
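The scene list returned by the detectors is plain JSON, so it can be post-processed with ordinary Python. The sketch below works on a hand-written sample in the documented shape (`start_time`, `start_pts`, `duration`); the values are hypothetical and the helper names are not part of Pixeltable.

```python
# Hypothetical detector output for a 12-second video with two cuts.
scenes = [
    {"start_time": 0.0, "start_pts": 0, "duration": 4.0},
    {"start_time": 4.0, "start_pts": 4000, "duration": 5.5},
    {"start_time": 9.5, "start_pts": 9500, "duration": 2.5},
]


def scene_intervals(scenes: list[dict]) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for each scene."""
    return [(s["start_time"], s["start_time"] + s["duration"]) for s in scenes]


def cut_times(scenes: list[dict]) -> list[float]:
    """Timestamps where a new scene begins, excluding the start of the video."""
    return [s["start_time"] for s in scenes[1:]]


def longest_scene(scenes: list[dict]) -> dict:
    """Scene with the greatest duration."""
    return max(scenes, key=lambda s: s["duration"])


print(scene_intervals(scenes))
print(cut_times(scenes))
print(longest_scene(scenes)["start_time"])
```

The same helpers apply to the output of the other `scene_detect_*` functions, since they document the same return structure.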
## udf scene\_detect\_content()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
scene_detect_content(
video: pxt.Video,
*,
fps: pxt.Float | None = None,
threshold: pxt.Float = 27.0,
min_scene_len: pxt.Int = 15,
delta_hue: pxt.Float = 1.0,
delta_sat: pxt.Float = 1.0,
delta_lum: pxt.Float = 1.0,
delta_edges: pxt.Float = 0.0,
luma_only: pxt.Bool = False,
kernel_size: pxt.Int | None = None,
filter_mode: pxt.String = 'merge'
) -> pxt.Json[(Json, ...)]
```
Detect scene cuts in a video using PySceneDetect's
[ContentDetector](https://www.scenedetect.com/docs/latest/api/detectors.html#scenedetect.detectors.content_detector.ContentDetector).
**Requirements:**
* `pip install scenedetect`
**Parameters:**
* **`video`** (`pxt.Video`): The video to analyze for scene cuts.
* **`fps`** (`pxt.Float | None`): Number of frames to extract per second for analysis. If None, analyzes all frames.
Lower values process faster but may miss exact scene cuts.
* **`threshold`** (`pxt.Float`): Threshold that the weighted sum of component changes must exceed to trigger a scene cut.
Lower values detect more scenes (more sensitive), higher values detect fewer scenes.
* **`min_scene_len`** (`pxt.Int`): Once a cut is detected, this many frames must pass before a new one can be added to the scene
list.
* **`delta_hue`** (`pxt.Float`): Weight for hue component changes. Higher values make hue changes more important.
* **`delta_sat`** (`pxt.Float`): Weight for saturation component changes. Higher values make saturation changes more important.
* **`delta_lum`** (`pxt.Float`): Weight for luminance component changes. Higher values make brightness changes more important.
* **`delta_edges`** (`pxt.Float`): Weight for edge detection changes. Higher values make edge changes more important.
Edge detection can help detect cuts in scenes with similar colors but different content.
* **`luma_only`** (`pxt.Bool`): If True, only analyzes changes in the luminance (brightness) channel,
ignoring color information. This can be faster and may work better for grayscale content.
* **`kernel_size`** (`pxt.Int | None`): Size of kernel for expanding detected edges. Must be an odd integer greater than or equal to 3. If
None, automatically set using video resolution.
* **`filter_mode`** (`pxt.String`): How to handle fast cuts/flashes. `'merge'` combines quick cuts; `'suppress'` filters them out.
**Returns:**
* `pxt.Json[(Json, ...)]`: A list of dictionaries, one for each detected scene, with the following keys:
* `start_time` (float): The start time of the scene in seconds.
* `start_pts` (int): The pts of the start of the scene.
* `duration` (float): The duration of the scene in seconds.
The list is ordered chronologically. Returns the full duration of the video if no scenes are detected.
**Examples:**
Detect scene cuts with default parameters:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scene_detect_content()).collect()
```
Detect more scenes by lowering the threshold:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scene_detect_content(threshold=15.0)).collect()
```
Use luminance-only detection:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scene_detect_content(luma_only=True)).collect()
```
Emphasize edge detection for scenes with similar colors:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.scene_detect_content(
delta_edges=1.0, delta_hue=0.5, delta_sat=0.5
)
).collect()
```
Add scene cuts as a computed column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
scene_cuts=tbl.video.scene_detect_content(threshold=20.0)
)
```
## udf scene\_detect\_hash()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
scene_detect_hash(
video: pxt.Video,
*,
fps: pxt.Float | None = None,
threshold: pxt.Float = 0.395,
size: pxt.Int = 16,
lowpass: pxt.Int = 2,
min_scene_len: pxt.Int = 15
) -> pxt.Json[(Json, ...)]
```
Detect scene cuts in a video using PySceneDetect's
[HashDetector](https://www.scenedetect.com/docs/latest/api/detectors.html#scenedetect.detectors.hash_detector.HashDetector).
HashDetector uses perceptual hashing for very fast scene detection. It computes a hash of each
frame at reduced resolution and compares hash distances.
**Requirements:**
* `pip install scenedetect`
**Parameters:**
* **`video`** (`pxt.Video`): The video to analyze for scene cuts.
* **`fps`** (`pxt.Float | None`): Number of frames to extract per second for analysis. If None, analyzes all frames.
Lower values process faster but may miss exact scene cuts.
* **`threshold`** (`pxt.Float`): Value between 0.0 and 1.0 representing the relative Hamming distance between the perceptual hashes of
adjacent frames. A distance of 0 means the image is the same, and 1 means no correlation. Smaller threshold
values thus require more correlation, making the detector more sensitive. The Hamming distance is divided
by size x size before comparing to threshold for normalization.
Lower values detect more scenes (more sensitive), higher values detect fewer scenes.
* **`size`** (`pxt.Int`): Size of square of low frequency data to use for the DCT. Larger values are more precise but slower.
Common values are 8, 16, or 32.
* **`lowpass`** (`pxt.Int`): How much high frequency information to filter from the DCT. A value of 2 means keep lower 1/2 of the
frequency data, 4 means only keep 1/4, etc. Larger values make the
detector less sensitive to high-frequency details and noise.
* **`min_scene_len`** (`pxt.Int`): Once a cut is detected, this many frames must pass before a new one can be added to the scene
list.
**Returns:**
* `pxt.Json[(Json, ...)]`: A list of dictionaries, one for each detected scene, with the following keys:
* `start_time` (float): The start time of the scene in seconds.
* `start_pts` (int): The pts of the start of the scene.
* `duration` (float): The duration of the scene in seconds.
The list is ordered chronologically. Returns the full duration of the video if no scenes are detected.
**Examples:**
Detect scene cuts with default parameters:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scene_detect_hash()).collect()
```
Detect more scenes by lowering the threshold:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scene_detect_hash(threshold=0.3)).collect()
```
Use larger hash size for more precision:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scene_detect_hash(size=32)).collect()
```
Analyze at a lower frame rate for faster processing:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scene_detect_hash(fps=1.0, threshold=0.4)).collect()
```
Add scene cuts as a computed column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(scene_cuts=tbl.video.scene_detect_hash())
```
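The normalization described for `threshold` (Hamming distance divided by `size` x `size`) can be illustrated with a toy example. This is a sketch of the comparison only: the hashes are hand-written bit lists rather than real perceptual hashes, the helper names are hypothetical, and treating a distance equal to the threshold as a cut is an assumption.

```python
def normalized_hamming(hash_a: list[int], hash_b: list[int], size: int) -> float:
    """Hamming distance between two size*size bit hashes, scaled to [0, 1]."""
    n_bits = size * size
    if len(hash_a) != n_bits or len(hash_b) != n_bits:
        raise ValueError(f"expected {n_bits} bits per hash")
    distance = sum(a != b for a, b in zip(hash_a, hash_b))
    return distance / n_bits


def is_cut(hash_a: list[int], hash_b: list[int], size: int, threshold: float = 0.395) -> bool:
    """True if adjacent frame hashes differ by at least the threshold (assumed >=)."""
    return normalized_hamming(hash_a, hash_b, size) >= threshold


size = 4  # toy 4x4 hash; the UDF default is size=16
frame_a = [0] * 16
frame_b = [1] * 8 + [0] * 8   # half of the bits differ
frame_c = [1] * 4 + [0] * 12  # a quarter of the bits differ

print(normalized_hamming(frame_a, frame_b, size), is_cut(frame_a, frame_b, size))
print(normalized_hamming(frame_a, frame_c, size), is_cut(frame_a, frame_c, size))
```

Lowering `threshold` to 0.25 would make the second pair register as a cut as well, which is why lower values are more sensitive.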
## udf scene\_detect\_histogram()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
scene_detect_histogram(
video: pxt.Video,
*,
fps: pxt.Float | None = None,
threshold: pxt.Float = 0.05,
bins: pxt.Int = 256,
min_scene_len: pxt.Int = 15
) -> pxt.Json[(Json, ...)]
```
Detect scene cuts in a video using PySceneDetect's
[HistogramDetector](https://www.scenedetect.com/docs/latest/api/detectors.html#scenedetect.detectors.histogram_detector.HistogramDetector).
HistogramDetector compares frame histograms on the Y (luminance) channel after YUV conversion.
It detects scenes based on relative histogram differences and is more robust to gradual lighting
changes than content-based detection.
**Requirements:**
* `pip install scenedetect`
**Parameters:**
* **`video`** (`pxt.Video`): The video to analyze for scene cuts.
* **`fps`** (`pxt.Float | None`): Number of frames to extract per second for analysis. If None or 0, analyzes all frames.
Lower values process faster but may miss exact scene cuts.
* **`threshold`** (`pxt.Float`): Maximum relative difference, between 0.0 and 1.0, by which the histograms of adjacent frames may differ before a cut is triggered. Histograms are
calculated on the Y channel after converting the frame to YUV, and normalized based on the number of bins.
Higher differences imply greater change in content, so larger threshold values are less sensitive to cuts.
Lower values detect more scenes (more sensitive), higher values detect fewer scenes.
* **`bins`** (`pxt.Int`): Number of bins to use for histogram calculation (typically 16-256). More bins provide
finer granularity but may be more sensitive to noise.
* **`min_scene_len`** (`pxt.Int`): Once a cut is detected, this many frames must pass before a new one can be added to the scene
list.
**Returns:**
* `pxt.Json[(Json, ...)]`: A list of dictionaries, one for each detected scene, with the following keys:
* `start_time` (float): The start time of the scene in seconds.
* `start_pts` (int): The pts of the start of the scene.
* `duration` (float): The duration of the scene in seconds.
The list is ordered chronologically. Returns the full duration of the video if no scenes are detected.
**Examples:**
Detect scene cuts with default parameters:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scene_detect_histogram()).collect()
```
Detect more scenes by lowering the threshold:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scene_detect_histogram(threshold=0.03)).collect()
```
Use fewer bins for faster processing:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scene_detect_histogram(bins=64)).collect()
```
Use with a longer minimum scene length:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scene_detect_histogram(min_scene_len=30)).collect()
```
Add scene cuts as a computed column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
scene_cuts=tbl.video.scene_detect_histogram(threshold=0.04)
)
```
## udf scene\_detect\_threshold()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
scene_detect_threshold(
video: pxt.Video,
*,
fps: pxt.Float | None = None,
threshold: pxt.Float = 12.0,
min_scene_len: pxt.Int = 15,
fade_bias: pxt.Float = 0.0,
add_final_scene: pxt.Bool = False,
method: pxt.String = 'floor'
) -> pxt.Json[(Json, ...)]
```
Detect fade-in and fade-out transitions in a video using PySceneDetect's
[ThresholdDetector](https://www.scenedetect.com/docs/latest/api/detectors.html#scenedetect.detectors.threshold_detector.ThresholdDetector).
ThresholdDetector identifies scenes by detecting when pixel brightness falls below or rises above
a threshold value, suitable for detecting fade-to-black, fade-to-white, and similar transitions.
**Requirements:**
* `pip install scenedetect`
**Parameters:**
* **`video`** (`pxt.Video`): The video to analyze for fade transitions.
* **`fps`** (`pxt.Float | None`): Number of frames to extract per second for analysis. If None or 0, analyzes all frames.
Lower values process faster but may miss exact transition points.
* **`threshold`** (`pxt.Float`): 8-bit intensity value that each pixel value (R, G, and B) must be less than or equal to in order
to trigger a fade in/out.
* **`min_scene_len`** (`pxt.Int`): Once a cut is detected, this many frames must pass before a new one can be added to the scene
list.
* **`fade_bias`** (`pxt.Float`): Float between -1.0 and +1.0 representing the percentage of timecode skew for the start of a scene
(-1.0 causing a cut at the fade-to-black, 0.0 in the middle, and +1.0 causing the cut to be right at the
position where the threshold is passed).
* **`add_final_scene`** (`pxt.Bool`): If True and the video ends on a fade-out, an additional scene is generated at that
timecode.
* **`method`** (`pxt.String`): How to treat `threshold` when detecting fade events:
* `'ceiling'`: Fade out happens when frame brightness rises above threshold.
* `'floor'`: Fade out happens when frame brightness falls below threshold.
**Returns:**
* `pxt.Json[(Json, ...)]`: A list of dictionaries, one for each detected scene, with the following keys:
* `start_time` (float): The start time of the scene in seconds.
* `start_pts` (int): The pts of the start of the scene.
* `duration` (float): The duration of the scene in seconds.
The list is ordered chronologically. Returns the full duration of the video if no scenes are detected.
**Examples:**
Detect fade-to-black transitions with default parameters:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scene_detect_threshold()).collect()
```
Use a lower threshold to detect darker fades:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scene_detect_threshold(threshold=8.0)).collect()
```
Detect fade-to-white transitions by treating the threshold as a ceiling:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.scene_detect_threshold(threshold=240.0, method='ceiling')
).collect()
```
Add final scene boundary:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.scene_detect_threshold(add_final_scene=True)
).collect()
```
Add fade transitions as a computed column:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
fade_cuts=tbl.video.scene_detect_threshold(threshold=15.0)
)
```
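The effect of `fade_bias` on cut placement can be pictured as interpolating between the fade-out point and the point where brightness passes the threshold again. The sketch below assumes a simple linear interpolation that matches the three documented reference points (-1.0, 0.0, +1.0); the helper `fade_cut_position` is hypothetical, and PySceneDetect's internal frame arithmetic may round differently.

```python
def fade_cut_position(fade_out: float, fade_in: float, fade_bias: float = 0.0) -> float:
    """Place a cut between a fade-out and the following fade-in.

    fade_bias = -1.0 puts the cut at the fade-out, 0.0 in the middle,
    and +1.0 at the fade-in (where the threshold is passed again).
    """
    if not -1.0 <= fade_bias <= 1.0:
        raise ValueError("fade_bias must be between -1.0 and +1.0")
    # Map [-1, +1] onto [0, 1] and interpolate linearly (assumed).
    fraction = (1.0 + fade_bias) / 2.0
    return fade_out + fraction * (fade_in - fade_out)


# Fade to black at 10 s, back above the threshold at 14 s.
for bias in (-1.0, 0.0, 1.0):
    print(bias, fade_cut_position(10.0, 14.0, bias))
```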
## udf scroll()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
scroll(
video: pxt.Video,
*,
w: pxt.Int | None = None,
h: pxt.Int | None = None,
x_speed: pxt.Float = 0,
y_speed: pxt.Float = 0,
x_start: pxt.Int = 0,
y_start: pxt.Int = 0,
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video
```
Apply a scrolling viewport effect to a video using ffmpeg's crop filter.
Extracts a viewport of size `w` x `h` from each frame, starting at position (`x_start`, `y_start`) and moving
at (`x_speed`, `y_speed`) pixels per second. The viewport clamps at the frame edges: once it reaches a boundary,
it stops moving and the remaining frames show a static crop.
At least one of `w` or `h` must be smaller than the input dimensions for the effect to be visible.
The clip duration is unchanged. To pan across the full available range, set
`x_speed = (input_width - w) / duration`.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video.
* **`w`** (`pxt.Int | None`): Width of the output viewport in pixels. If None, uses the input width.
* **`h`** (`pxt.Int | None`): Height of the output viewport in pixels. If None, uses the input height.
* **`x_speed`** (`pxt.Float`): Horizontal scroll speed in pixels per second. Positive values scroll rightward (the viewport moves
right, revealing content to the right). Negative values scroll leftward.
* **`y_speed`** (`pxt.Float`): Vertical scroll speed in pixels per second. Positive values scroll downward. Negative values scroll
upward.
* **`x_start`** (`pxt.Int`): Initial horizontal offset of the viewport in pixels.
* **`y_start`** (`pxt.Int`): Initial vertical offset of the viewport in pixels.
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder for the current platform.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video`: A new video with the scrolling effect applied. Output dimensions are `w` x `h`.
**Examples:**
Pan rightward across a 1920x1080 video using a 1280-pixel-wide viewport, scrolling at 50 px/s:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scroll(w=1280, x_speed=50)).collect()
```
Pan rightward across the full range of a 1920x1080 video in exactly its duration. The viewport is
1280 px wide, so the pan range is 1920 - 1280 = 640 px. For a 10-second video, set
`x_speed = 640 / 10 = 64`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scroll(w=1280, x_speed=64)).collect()
```
Pan leftward across a 1920x1080 video, starting from the right edge:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.scroll(w=1280, x_start=640, x_speed=-64)).collect()
```
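The edge clamping and the full-range speed formula described for `scroll()` can be sketched in plain Python. This is an illustrative model only, not Pixeltable's implementation; the helper names `full_pan_speed` and `viewport_x` are hypothetical.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
def full_pan_speed(input_width: int, w: int, duration: float) -> float:
    """Speed (px/s) that pans across the whole available range in `duration` seconds."""
    return (input_width - w) / duration


def viewport_x(t: float, x_start: int, x_speed: float, input_width: int, w: int) -> float:
    """Horizontal viewport offset at time `t` (seconds), clamped to the frame edges."""
    max_x = float(input_width - w)  # right-most offset at which the viewport still fits
    return min(max(x_start + x_speed * t, 0.0), max_x)


# 1920-px-wide input, 1280-px viewport, 10-second clip
speed = full_pan_speed(1920, 1280, 10.0)
halfway = viewport_x(5.0, 0, speed, 1920, 1280)
past_end = viewport_x(20.0, 0, speed, 1920, 1280)  # clamped at the right edge
```

The same model applies to the vertical axis with `h`, `y_start`, and `y_speed`.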
## udf segment\_video()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
segment_video(
video: pxt.Video,
*,
duration: pxt.Float | None = None,
segment_times: pxt.Json[(Float, ...)] | None = None,
mode: pxt.String = 'accurate',
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Json[(String, ...)]
```
Split a video into segments.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video file to segment
* **`duration`** (`pxt.Float | None`): Duration of each segment (in seconds). For `mode='fast'`, this is approximate;
for `mode='accurate'`, segments will have exact durations. Cannot be specified together with
`segment_times`.
* **`segment_times`** (`pxt.Json[(Float, ...)] | None`): List of timestamps (in seconds) in video where segments should be split. Note that these are not
segment durations. If all segment times are less than the duration of the video, produces exactly
`len(segment_times) + 1` segments. Cannot be empty or be specified together with `duration`.
* **`mode`** (`pxt.String`): Segmentation mode:
* `'fast'`: Quick segmentation using stream copy (splits only at keyframes, approximate durations)
* `'accurate'`: Precise segmentation with re-encoding (exact durations, slower)
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder for the current platform.
Only available for `mode='accurate'`.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder. Only available for `mode='accurate'`.
**Returns:**
* `pxt.Json[(String, ...)]`: List of file paths for the generated video segments.
**Examples:**
Split a video at 1 minute intervals using fast mode:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
segment_paths=tbl.video.segment_video(
duration=60, mode='fast'
)
).collect()
```
Split video into exact 10-second segments with default accurate mode, using the libx264 encoder with a CRF of 23
and slow preset (for smaller output files):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
segment_paths=tbl.video.segment_video(
duration=10,
video_encoder='libx264',
video_encoder_args={'crf': 23, 'preset': 'slow'},
)
).collect()
```
Split video into two parts at the midpoint:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
duration = tbl.video.get_duration()
tbl.select(
segment_paths=tbl.video.segment_video(segment_times=[duration / 2])
).collect()
```
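To see why `segment_times` yields `len(segment_times) + 1` segments, the split points can be turned into `(start, end)` ranges. This is a plain-Python sketch under the assumption that every split point lies strictly inside the video; `segment_ranges` is a hypothetical helper, not part of Pixeltable.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
def segment_ranges(video_duration: float, segment_times: list[float]) -> list[tuple[float, float]]:
    """Turn split timestamps into (start, end) ranges, one per resulting segment."""
    if not segment_times:
        raise ValueError('segment_times cannot be empty')
    if any(t <= 0 or t >= video_duration for t in segment_times):
        # Assumption of this sketch: all split points fall inside the video
        raise ValueError('segment times must lie strictly inside the video')
    cuts = [0.0, *sorted(segment_times), video_duration]
    return list(zip(cuts[:-1], cuts[1:]))


two_halves = segment_ranges(60.0, [30.0])
four_parts = segment_ranges(100.0, [10.0, 40.0, 70.0])
```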
## udf speed()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
speed(
video: pxt.Video,
*,
factor: pxt.Float,
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video
```
Change the playback speed of a video using ffmpeg's setpts filter.
A factor of 2.0 doubles the speed (halves the duration); a factor of 0.5 halves the speed (doubles the duration).
Audio pitch is preserved using ffmpeg's `atempo` filter.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video.
* **`factor`** (`pxt.Float`): Speed multiplier. Must be positive. Values > 1.0 speed up, values \< 1.0 slow down.
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder for the current platform.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video`: A new video with the adjusted playback speed.
**Examples:**
Double the speed:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.speed(factor=2.0)).collect()
```
Half speed (slow motion):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.speed(factor=0.5)).collect()
```
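The relationship between `factor` and the resulting clip length is a simple division. A minimal sketch, with `output_duration` as a hypothetical helper used only for illustration:

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
def output_duration(duration: float, factor: float) -> float:
    """Expected duration (seconds) of a clip after changing its speed by `factor`."""
    if factor <= 0:
        raise ValueError('factor must be positive')
    return duration / factor


fast = output_duration(10.0, 2.0)  # sped up: shorter clip
slow = output_duration(10.0, 0.5)  # slowed down: longer clip
```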
## udf with\_audio()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
with_audio(
video: pxt.Video,
audio: pxt.Audio,
*,
video_start_time: pxt.Float = 0.0,
video_duration: pxt.Float | None = None,
audio_start_time: pxt.Float = 0.0,
audio_duration: pxt.Float | None = None
) -> pxt.Video
```
Creates a new video that combines the video stream from `video` and the audio stream from `audio`.
The `start_time` and `duration` parameters can be used to select a specific time range from each input.
If the audio input (or selected time range) is longer than the video, the audio will be truncated.
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video.
* **`audio`** (`pxt.Audio`): Input audio.
* **`video_start_time`** (`pxt.Float`): Start time in the video input (in seconds).
* **`video_duration`** (`pxt.Float | None`): Duration of video segment (in seconds). If None, uses the remainder of the video after
`video_start_time`. `video_duration` determines the duration of the output video.
* **`audio_start_time`** (`pxt.Float`): Start time in the audio input (in seconds).
* **`audio_duration`** (`pxt.Float | None`): Duration of audio segment (in seconds). If None, uses the remainder of the audio after
`audio_start_time`. If the audio is longer than the output video, it will be truncated.
**Returns:**
* `pxt.Video`: A new video file with the audio track added.
**Examples:**
Add background music to a video:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.with_audio(tbl.music_track)).collect()
```
Add audio starting 5 seconds into both files:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.with_audio(
tbl.music_track, video_start_time=5.0, audio_start_time=5.0
)
).collect()
```
Use a 10-second clip from the middle of both files:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(
tbl.video.with_audio(
tbl.music_track,
video_start_time=30.0,
video_duration=10.0,
audio_start_time=15.0,
audio_duration=10.0,
)
).collect()
```
## udf zoom()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
zoom(
video: pxt.Video,
*,
start_scale: pxt.Float = 1.0,
end_scale: pxt.Float = 1.3,
center: pxt.Json[(Float, ...)] | None = None,
video_encoder: pxt.String | None = None,
video_encoder_args: pxt.Json | None = None
) -> pxt.Video
```
Apply a smooth zoom effect over the duration of a video using ffmpeg's zoompan filter.
The zoom factor interpolates linearly from `start_scale` to `end_scale`. The effect works by computing a crop
region at each frame (centered on `center`) and scaling it back to the original resolution. Output dimensions
match the input.
* `start_scale < end_scale`: zoom in (frame progressively tightens)
* `start_scale > end_scale`: zoom out (frame progressively widens)
* `start_scale == end_scale`: static zoom (constant crop, no animation)
**Requirements:**
* `ffmpeg` needs to be installed and in PATH
**Parameters:**
* **`video`** (`pxt.Video`): Input video.
* **`start_scale`** (`pxt.Float`): Zoom factor at the start of the video. Must be >= 1.0.
* **`end_scale`** (`pxt.Float`): Zoom factor at the end of the video. Must be >= 1.0.
* **`center`** (`pxt.Json[(Float, ...)] | None`): Zoom center as `[x, y]` in normalized coordinates (0.0 to 1.0), where `[0.5, 0.5]` is the frame
center. If None, defaults to `[0.5, 0.5]`.
* **`video_encoder`** (`pxt.String | None`): Video encoder to use. If not specified, uses the default encoder for the current platform.
* **`video_encoder_args`** (`pxt.Json | None`): Additional arguments to pass to the video encoder.
**Returns:**
* `pxt.Video`: A new video with the zoom effect applied. Output resolution matches the input.
**Examples:**
Zoom in (default, 1.0x to 1.3x centered):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.zoom()).collect()
```
Zoom out from 2x to 1x:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.zoom(start_scale=2.0, end_scale=1.0)).collect()
```
Zoom in toward the upper-left quadrant:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.zoom(end_scale=1.5, center=[0.25, 0.25])).collect()
```
Static 1.5x zoom (no animation):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.select(tbl.video.zoom(start_scale=1.5, end_scale=1.5)).collect()
```
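The crop-and-rescale idea behind `zoom()` can be modeled per frame: interpolate the scale linearly, then derive the crop rectangle around `center`. The sketch below is a plain-Python illustration, not the ffmpeg filter expression Pixeltable uses; `zoom_crop` is a hypothetical helper, and clamping the crop to the frame is an assumption of this sketch.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
def zoom_crop(
    progress: float,
    width: int,
    height: int,
    start_scale: float = 1.0,
    end_scale: float = 1.3,
    center: tuple[float, float] = (0.5, 0.5),
) -> tuple[float, float, float, float]:
    """Return (x, y, crop_w, crop_h) for a frame at `progress` in [0, 1]."""
    # Linear interpolation between the two zoom factors
    scale = start_scale + (end_scale - start_scale) * progress
    crop_w, crop_h = width / scale, height / scale
    # Center the crop on `center`, then keep it inside the frame (assumption)
    x = min(max(center[0] * width - crop_w / 2, 0.0), width - crop_w)
    y = min(max(center[1] * height - crop_h / 2, 0.0), height - crop_h)
    return x, y, crop_w, crop_h


first_frame = zoom_crop(0.0, 1920, 1080)
static_2x = zoom_crop(1.0, 1920, 1080, start_scale=2.0, end_scale=2.0)
```

Scaling each crop back to `width` x `height` gives output frames with the same dimensions as the input.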
# vision
Source: https://docs.pixeltable.com/sdk/latest/vision
# module pixeltable.functions.vision
Pixeltable UDFs for Computer Vision.
Example:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import vision as pxtv
t = pxt.get_table(...)
t.select(
pxtv.bboxes_draw(t.img, boxes=t.boxes, labels=t.labels)
).collect()
```
## udf bboxes\_clip\_to\_canvas()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
bboxes_clip_to_canvas(
bboxes: pxt.Json[(Json, ...)],
format: pxt.String,
*,
width: pxt.Int | None = None,
height: pxt.Int | None = None,
min_visibility: pxt.Float = 0.0,
min_area: pxt.Float = 0.0
) -> pxt.Json[(Json, ...)]
```
Clip a list of bounding boxes to a canvas of specified size.
**Parameters:**
* **`bboxes`** (`pxt.Json[(Json, ...)]`): List of bounding boxes, each either specified with absolute pixel coordinates (`int`) or relative
coordinates (`float`).
* **`format`** (`pxt.String`): Format of the bounding box coordinates, one of 'xyxy', 'xywh', 'cxcywh'.
* **`width`** (`pxt.Int | None`): Canvas width in absolute pixels. Required for absolute coordinates, must not be specified for relative.
* **`height`** (`pxt.Int | None`): Canvas height in absolute pixels. Required for absolute coordinates, must not be specified for relative.
* **`min_visibility`** (`pxt.Float`): Minimum fraction of the bounding box that must be visible after clipping. If the visibility
is less than this value, returns None.
* **`min_area`** (`pxt.Float`): Minimum area of the bounding box after clipping. If the area is less than this value, returns None.
**Returns:**
* `pxt.Json[(Json, ...)]`: List of clipped bounding boxes in the same format as the input. Boxes that don't meet the
min\_visibility or min\_area thresholds are replaced with None.
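The clipping rule can be sketched for a single box in 'xyxy' format with absolute pixel coordinates. This is a plain-Python illustration, not Pixeltable's implementation; `clip_xyxy` is a hypothetical helper, and treating a zero-area input box as not visible is an assumption of this sketch.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
def clip_xyxy(box, width, height, min_visibility=0.0, min_area=0.0):
    """Clip one [xmin, ymin, xmax, ymax] box to a width x height canvas."""
    x1, y1, x2, y2 = box
    cx1, cy1 = max(x1, 0), max(y1, 0)
    cx2, cy2 = min(x2, width), min(y2, height)
    orig_area = (x2 - x1) * (y2 - y1)
    clipped_area = max(cx2 - cx1, 0) * max(cy2 - cy1, 0)
    if orig_area <= 0:
        return None  # assumption: degenerate input boxes are dropped
    if clipped_area / orig_area < min_visibility or clipped_area < min_area:
        return None
    return [cx1, cy1, cx2, cy2]


clipped = clip_xyxy([-10, 20, 50, 120], 100, 100)
dropped = clip_xyxy([-10, 20, 50, 120], 100, 100, min_visibility=0.9)
```

In the second call only 4000 of the original 6000 square pixels remain visible (about 67%), which is below the 0.9 threshold.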
## udf bboxes\_convert()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
bboxes_convert(
bboxes: pxt.Json[(Json, ...)],
*,
src_format: pxt.String,
dst_format: pxt.String
) -> pxt.Json[(Json, ...)]
```
Convert a list of bounding boxes from src\_format to dst\_format.
**Parameters:**
* **`bboxes`** (`pxt.Json[(Json, ...)]`): List of bounding boxes, each either specified with absolute pixel coordinates or relative
coordinates in \[0, 1].
* **`src_format`** (`pxt.String`): Source format, one of 'xyxy', 'xywh', 'cxcywh'.
* **`dst_format`** (`pxt.String`): Destination format, one of 'xyxy', 'xywh', 'cxcywh'.
**Returns:**
* `pxt.Json[(Json, ...)]`: List of bounding boxes in dst\_format.
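The three formats relate to each other as follows. The sketch assumes the common conventions that 'xyxy' means `[xmin, ymin, xmax, ymax]`, 'xywh' means `[xmin, ymin, width, height]`, and 'cxcywh' means `[center_x, center_y, width, height]`; the helper names are hypothetical and this is not Pixeltable's implementation.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
def to_xyxy(box, fmt):
    """Convert one box from `fmt` to [xmin, ymin, xmax, ymax]."""
    a, b, c, d = box
    if fmt == 'xyxy':
        return [a, b, c, d]
    if fmt == 'xywh':
        return [a, b, a + c, b + d]
    if fmt == 'cxcywh':
        return [a - c / 2, b - d / 2, a + c / 2, b + d / 2]
    raise ValueError(f'unknown format: {fmt}')


def from_xyxy(box, fmt):
    """Convert one [xmin, ymin, xmax, ymax] box to `fmt`."""
    x1, y1, x2, y2 = box
    if fmt == 'xyxy':
        return [x1, y1, x2, y2]
    if fmt == 'xywh':
        return [x1, y1, x2 - x1, y2 - y1]
    if fmt == 'cxcywh':
        return [(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]
    raise ValueError(f'unknown format: {fmt}')


def convert(box, src_format, dst_format):
    """Convert via 'xyxy' as the intermediate representation."""
    return from_xyxy(to_xyxy(box, src_format), dst_format)


as_xywh = convert([10, 20, 30, 60], 'xyxy', 'xywh')
as_cxcywh = convert([10, 20, 20, 40], 'xywh', 'cxcywh')
```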
## udf bboxes\_crop\_canvas()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
bboxes_crop_canvas(
bboxes: pxt.Json[(Json, ...)],
format: pxt.String,
*,
canvas_region: pxt.Json[(Json, ...)],
canvas_region_format: pxt.String,
canvas_width: pxt.Int | None = None,
canvas_height: pxt.Int | None = None
) -> pxt.Json[(Json, ...)]
```
Adjust a list of bounding boxes to account for a canvas crop.
**Parameters:**
* **`bboxes`** (`pxt.Json[(Json, ...)]`): List of bounding boxes, each either specified with absolute pixel coordinates or relative coordinates.
* **`format`** (`pxt.String`): Format of the bounding box coordinates, one of 'xyxy', 'xywh', 'cxcywh'.
* **`canvas_width`** (`pxt.Int | None`): Canvas width.
* **`canvas_height`** (`pxt.Int | None`): Canvas height.
* **`canvas_region`** (`pxt.Json[(Json, ...)]`): Canvas region that was cropped, either specified with absolute pixel coordinates or relative
coordinates, in the format specified by `canvas_region_format`.
* **`canvas_region_format`** (`pxt.String`): Format of the `canvas_region` coordinates, one of 'xyxy', 'xywh', 'cxcywh'.
**Returns:**
* `pxt.Json[(Json, ...)]`: List of adjusted bounding boxes in the same format as the input. They can extend beyond the canvas boundaries.
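For absolute 'xyxy' coordinates, accounting for a crop amounts to shifting each box by the crop region's top-left corner. A minimal plain-Python sketch under that assumption (`crop_adjust_xyxy` is a hypothetical helper, not Pixeltable's implementation):

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
def crop_adjust_xyxy(box, canvas_region):
    """Express a box in the coordinate system of the cropped canvas.

    Both arguments are [xmin, ymin, xmax, ymax] in absolute pixels.
    """
    rx1, ry1, _, _ = canvas_region
    x1, y1, x2, y2 = box
    return [x1 - rx1, y1 - ry1, x2 - rx1, y2 - ry1]


inside = crop_adjust_xyxy([120, 80, 220, 180], [100, 50, 500, 350])
# A box that starts left of the crop region ends up with a negative xmin,
# i.e. it extends beyond the new canvas
partly_outside = crop_adjust_xyxy([50, 60, 150, 100], [100, 50, 500, 350])
```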
## udf bboxes\_draw()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
bboxes_draw(
img: pxt.Image,
boxes: pxt.Json[(Json[(Int, ...)], ...)],
*,
labels: pxt.Json[(Json, ...)] | None = None,
color: pxt.String | None = None,
box_colors: pxt.Json[(String, ...)] | None = None,
alpha: pxt.Float | None = None,
fill: pxt.Bool = False,
fill_alpha: pxt.Float | None = None,
width: pxt.Int = 1,
font: pxt.String | None = None,
font_size: pxt.Int | None = None
) -> pxt.Image
```
Draws bounding boxes on the given image.
Labels can be any type that supports `str()` and is hashable (e.g., strings, ints, etc.).
Colors can be specified as common HTML color names (e.g., 'red') supported by PIL's
[`ImageColor`](https://pillow.readthedocs.io/en/stable/reference/ImageColor.html#imagecolor-module) module or as
RGB/RGBA hex codes (e.g., '#FF0000', '#FF0000FF'). If opacity isn't specified in the color string and
`alpha`/`fill_alpha` is `None`, defaults to 1.0 for box borders and 0.5 for filled boxes.
If no colors are specified, this function randomly assigns each label a specific color based on a hash of the label.
**Parameters:**
* **`img`** (`pxt.Image`): The image on which to draw the bounding boxes.
* **`boxes`** (`pxt.Json[(Json[(Int, ...)], ...)]`): List of bounding boxes, each represented as \[xmin, ymin, xmax, ymax].
* **`labels`** (`pxt.Json[(Json, ...)] | None`): List of labels for each bounding box.
* **`color`** (`pxt.String | None`): Single color to be used for all bounding boxes and labels.
* **`box_colors`** (`pxt.Json[(String, ...)] | None`): List of colors, one per bounding box.
* **`alpha`** (`pxt.Float | None`): Opacity (0-1) of the bounding box borders and labels. If non-`None`, overrides any alpha in
`color`/`box_colors`.
* **`fill`** (`pxt.Bool`): Whether to fill the bounding boxes with color.
* **`fill_alpha`** (`pxt.Float | None`): Opacity (0-1) of the bounding box fill. If non-`None`, overrides any alpha in
`color`/`box_colors`.
* **`width`** (`pxt.Int`): Width of the bounding box borders.
* **`font`** (`pxt.String | None`): Name of a system font or path to a TrueType font file, as required by
[`PIL.ImageFont.truetype()`](https://pillow.readthedocs.io/en/stable/reference/ImageFont.html#PIL.ImageFont.truetype).
If `None`, uses the default provided by
[`PIL.ImageFont.load_default()`](https://pillow.readthedocs.io/en/stable/reference/ImageFont.html#PIL.ImageFont.load_default).
* **`font_size`** (`pxt.Int | None`): Size of the font used for labels in points. Only used in conjunction with non-`None` `font` argument.
**Returns:**
* `pxt.Image`: The image with bounding boxes drawn on it.
## udf bboxes\_pad()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
bboxes_pad(
bboxes: pxt.Json[(Json, ...)],
format: pxt.String,
*,
top: pxt.Int | None = None,
bottom: pxt.Int | None = None,
left: pxt.Int | None = None,
right: pxt.Int | None = None,
x: pxt.Int | None = None,
y: pxt.Int | None = None
) -> pxt.Json[(Json, ...)]
```
Pad a list of bounding boxes.
**Parameters:**
* **`bboxes`** (`pxt.Json[(Json, ...)]`): List of bounding boxes in absolute pixel coordinates.
* **`format`** (`pxt.String`): Format of the bounding box coordinates, one of 'xyxy', 'xywh', 'cxcywh'.
* **`top`** (`pxt.Int | None`): Amount to pad at the top, in absolute pixels.
* **`bottom`** (`pxt.Int | None`): Amount to pad at the bottom, in absolute pixels.
* **`left`** (`pxt.Int | None`): Amount to pad at the left, in absolute pixels.
* **`right`** (`pxt.Int | None`): Amount to pad at the right, in absolute pixels.
* **`x`** (`pxt.Int | None`): Amount to pad at the left and right, in absolute pixels.
* **`y`** (`pxt.Int | None`): Amount to pad at the top and bottom, in absolute pixels.
**Returns:**
* `pxt.Json[(Json, ...)]`: List of padded bounding boxes in the same format as the input.
## udf bboxes\_resize()
```python Signatures theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Signature 1:
@pxt.udf
bboxes_resize(
bboxes: pxt.Json[(Json, ...)],
format: pxt.String,
width: pxt.Int | None,
height: pxt.Int | None,
aspect: pxt.String | None,
aspect_mode: pxt.String | None
) -> pxt.Json[(Json, ...)]
# Signature 2:
@pxt.udf
bboxes_resize(
bboxes: pxt.Json[(Json, ...)],
format: pxt.String,
width: pxt.Float | None,
height: pxt.Float | None,
aspect: pxt.Float | None,
aspect_mode: pxt.String | None
) -> pxt.Json[(Json, ...)]
```
Resize a list of bounding boxes (center-anchored):
* to a specified width or height (the other dimension is scaled to maintain the aspect ratio)
* to a specified aspect ratio
Only one of `width`, `height`, or `aspect` can be specified.
**Parameters:**
* **`bboxes`** (`pxt.Json[(Json, ...)]`): List of bounding boxes, each either specified with absolute pixel coordinates or relative
coordinates in \[0, 1].
* **`format`** (`pxt.String`): Format of the bounding box coordinates, one of 'xyxy', 'xywh', 'cxcywh'.
* **`width`** (`pxt.Int | pxt.Float | None`, default: `None`): Target width. Pass an `int` for absolute pixels or a `float` for relative coordinates.
* **`height`** (`pxt.Int | pxt.Float | None`, default: `None`): Target height. Pass an `int` for absolute pixels or a `float` for relative coordinates.
* **`aspect`** (`pxt.String | pxt.Float | None`, default: `None`): Target aspect ratio as a string 'W:H' (e.g., '16:9') or a `float` (e.g., 1.78). Resizes either the width
or height to match the specified aspect ratio, maintaining the other dimension. Requires `aspect_mode`.
* **`aspect_mode`** (`pxt.String | None`, default: `None`): Either 'crop' or 'pad'. Required when `aspect` is specified. If `crop`, reduces the oversized
dimension to match the aspect ratio. If `pad`, extends the undersized dimension to match the aspect ratio.
**Returns:**
* `pxt.Json[(Json, ...)]`: List of resized bounding boxes in the same format as the input.
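The difference between `aspect_mode='crop'` and `aspect_mode='pad'` is easiest to see on one box. The sketch below works on a single 'xyxy' box and keeps its center fixed; `resize_to_aspect` is a hypothetical helper written for illustration, not Pixeltable's implementation.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from typing import Union


def resize_to_aspect(box, aspect: Union[str, float], aspect_mode: str):
    """Resize one [xmin, ymin, xmax, ymax] box to a target aspect ratio (width / height)."""
    if aspect_mode not in ('crop', 'pad'):
        raise ValueError("aspect_mode must be 'crop' or 'pad'")
    if isinstance(aspect, str):
        aw, ah = aspect.split(':')
        target = float(aw) / float(ah)
    else:
        target = float(aspect)
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    too_wide = w / h > target
    if (aspect_mode == 'crop') == too_wide:
        # crop a box that is too wide, or pad a box that is too tall: change the width
        w = h * target
    else:
        # crop a box that is too tall, or pad a box that is too wide: change the height
        h = w / target
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


cropped = resize_to_aspect([0, 0, 200, 100], '1:1', 'crop')
padded = resize_to_aspect([0, 0, 200, 100], '1:1', 'pad')
```

Cropping shrinks the 200 x 100 box to 100 x 100, while padding grows it to 200 x 200; both results share the original center.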
## udf bboxes\_resize\_canvas()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
bboxes_resize_canvas(
bboxes: pxt.Json[(Json, ...)],
format: pxt.String,
*,
canvas_width: pxt.Int | None = None,
canvas_height: pxt.Int | None = None,
new_canvas_width: pxt.Int | None = None,
new_canvas_height: pxt.Int | None = None,
canvas_scale: pxt.Float | None = None,
canvas_scale_x: pxt.Float | None = None,
canvas_scale_y: pxt.Float | None = None
) -> pxt.Json[(Json, ...)]
```
Adjust a list of bounding boxes to account for a canvas resize. The resize operation can be expressed:
* as absolute pixel dimensions (requires canvas\_width, canvas\_height, new\_canvas\_width, new\_canvas\_height)
* as relative dimensions (requires at least one of canvas\_scale, canvas\_scale\_x, canvas\_scale\_y)
**Parameters:**
* **`bboxes`** (`pxt.Json[(Json, ...)]`): List of bounding boxes in absolute pixel coordinates.
* **`format`** (`pxt.String`): Format of the bounding box coordinates, one of 'xyxy', 'xywh', 'cxcywh'.
* **`canvas_width`** (`pxt.Int | None`): Original canvas width in absolute pixels.
* **`canvas_height`** (`pxt.Int | None`): Original canvas height in absolute pixels.
* **`new_canvas_width`** (`pxt.Int | None`): New canvas width in absolute pixels. Requires canvas\_width/canvas\_height to be specified.
* **`new_canvas_height`** (`pxt.Int | None`): New canvas height in absolute pixels. Requires canvas\_width/canvas\_height to be specified.
* **`canvas_scale`** (`pxt.Float | None`): Scale factor to apply to both canvas dimensions.
* **`canvas_scale_x`** (`pxt.Float | None`): Scale factor to apply to the canvas width.
* **`canvas_scale_y`** (`pxt.Float | None`): Scale factor to apply to the canvas height.
**Returns:**
* `pxt.Json[(Json, ...)]`: List of adjusted bounding boxes in the same format as the input.
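When the resize is given as absolute pixel dimensions, each coordinate is multiplied by the ratio of new to old canvas size along its axis. A plain-Python sketch for one 'xyxy' box (`resize_canvas_xyxy` is a hypothetical helper, and returning unrounded floats is an assumption of this sketch):

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
def resize_canvas_xyxy(box, canvas_width, canvas_height, new_canvas_width, new_canvas_height):
    """Map one [xmin, ymin, xmax, ymax] box from the old canvas to the resized canvas."""
    sx = new_canvas_width / canvas_width    # horizontal scale factor
    sy = new_canvas_height / canvas_height  # vertical scale factor
    x1, y1, x2, y2 = box
    return [x1 * sx, y1 * sy, x2 * sx, y2 * sy]


# Downscale a 1000x500 canvas to 500x250
halved = resize_canvas_xyxy([100, 50, 200, 150], 1000, 500, 500, 250)
```

The factors `sx` and `sy` play the same role as `canvas_scale_x` and `canvas_scale_y` when the resize is expressed in relative terms.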
## udf bboxes\_scale()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
bboxes_scale(
bboxes: pxt.Json[(Json, ...)],
format: pxt.String,
*,
factor: pxt.Float | None = None,
x_factor: pxt.Float | None = None,
y_factor: pxt.Float | None = None
) -> pxt.Json[(Json, ...)]
```
Re-scale a list of bounding boxes (center-anchored).
**Parameters:**
* **`bboxes`** (`pxt.Json[(Json, ...)]`): List of bounding boxes, each either specified with absolute pixel coordinates or relative
coordinates in \[0, 1].
* **`format`** (`pxt.String`): Format of the bounding box coordinates, one of 'xyxy', 'xywh', 'cxcywh'.
* **`factor`** (`pxt.Float | None`): Scale factor to apply to both box dimensions.
* **`x_factor`** (`pxt.Float | None`): Scale factor to apply to the box width.
* **`y_factor`** (`pxt.Float | None`): Scale factor to apply to the box height.
**Returns:**
* `pxt.Json[(Json, ...)]`: List of scaled bounding boxes in the same format as the input.
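Center-anchored scaling keeps the box center fixed and multiplies its width and height. A plain-Python sketch for one 'xyxy' box; `scale_xyxy` is a hypothetical helper, and letting `x_factor`/`y_factor` take precedence over `factor` is an assumption of this sketch.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
def scale_xyxy(box, factor=None, x_factor=None, y_factor=None):
    """Scale one [xmin, ymin, xmax, ymax] box around its center."""
    fx = x_factor if x_factor is not None else (factor if factor is not None else 1.0)
    fy = y_factor if y_factor is not None else (factor if factor is not None else 1.0)
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * fx / 2, (y2 - y1) * fy / 2
    return [cx - half_w, cy - half_h, cx + half_w, cy + half_h]


doubled = scale_xyxy([10, 10, 30, 50], factor=2.0)
narrower = scale_xyxy([10, 10, 30, 50], x_factor=0.5)
```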
## udf eval\_detections()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
eval_detections(
pred_bboxes: pxt.Json[(Json[(Int, ...)], ...)],
pred_labels: pxt.Json[(Int, ...)],
pred_scores: pxt.Json[(Float, ...)],
gt_bboxes: pxt.Json[(Json[(Int, ...)], ...)],
gt_labels: pxt.Json[(Int, ...)],
min_iou: pxt.Float = 0.5
) -> pxt.Json[(Json, ...)]
```
Evaluates the performance of a set of predicted bounding boxes against a set of ground truth bounding boxes.
**Parameters:**
* **`pred_bboxes`** (`pxt.Json[(Json[(Int, ...)], ...)]`): List of predicted bounding boxes, each represented as \[xmin, ymin, xmax, ymax].
* **`pred_labels`** (`pxt.Json[(Int, ...)]`): List of predicted labels.
* **`pred_scores`** (`pxt.Json[(Float, ...)]`): List of predicted scores.
* **`gt_bboxes`** (`pxt.Json[(Json[(Int, ...)], ...)]`): List of ground truth bounding boxes, each represented as \[xmin, ymin, xmax, ymax].
* **`gt_labels`** (`pxt.Json[(Int, ...)]`): List of ground truth labels.
* **`min_iou`** (`pxt.Float`): Minimum intersection-over-union (IoU) threshold for a predicted bounding box to be
considered a true positive.
**Returns:**
* `pxt.Json[(Json, ...)]`: A list of dictionaries, one per label class, with the following structure:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{
'min_iou': float, # The value of `min_iou` used for the detections
'class': int, # The label class
# List of 1's and 0's indicating true positives for each
# predicted bounding box of this class
'tp': list[int],
# List of 1's and 0's indicating false positives for each
# predicted bounding box of this class; `fp[n] == 1 - tp[n]`
'fp': list[int],
# List of predicted scores for each bounding box of this class
'scores': list[float],
'num_gts': int, # Number of ground truth bounding boxes of this class
}
```
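Each per-class dictionary carries enough information to derive precision and recall. The sketch below is plain Python operating on one such dictionary; `precision_recall` is a hypothetical helper and the sample values are made up for illustration.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
def precision_recall(class_result: dict) -> tuple[float, float]:
    """Compute (precision, recall) from one per-class entry of the result list."""
    tp = sum(class_result['tp'])
    fp = sum(class_result['fp'])
    num_gts = class_result['num_gts']
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / num_gts if num_gts else 0.0
    return precision, recall


# Made-up entry: 4 predictions for class 1, 3 of them correct, 5 ground truth boxes
sample = {
    'min_iou': 0.5,
    'class': 1,
    'tp': [1, 0, 1, 1],
    'fp': [0, 1, 0, 0],
    'scores': [0.9, 0.8, 0.7, 0.6],
    'num_gts': 5,
}
precision, recall = precision_recall(sample)
```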
## udf overlay\_segmentation()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
overlay_segmentation(
img: pxt.Image,
segmentation: pxt.Array[(None, None), int32],
*,
alpha: pxt.Float = 0.5,
background: pxt.Int = 0,
segment_colors: pxt.Json[(String, ...)] | None = None,
draw_contours: pxt.Bool = True,
contour_thickness: pxt.Int = 1
) -> pxt.Image
```
Overlays a colored segmentation map on an image.
Colors can be specified as common HTML color names (e.g., 'red') supported by PIL's
[`ImageColor`](https://pillow.readthedocs.io/en/stable/reference/ImageColor.html#imagecolor-module) module or as
RGB hex codes (e.g., '#FF0000').
If no colors are specified, this function randomly assigns each segment a specific color based on a hash of its id.
**Parameters:**
* **`img`** (`pxt.Image`): Input image.
* **`segmentation`** (`pxt.Array[(None, None), int32]`): 2D array of the same shape as `img` where each pixel value is a segment id.
* **`alpha`** (`pxt.Float`): Blend factor for the overlay (0.0 = only original image, 1.0 = only segmentation colors).
* **`background`** (`pxt.Int`): Segment id to treat as background (not overlaid with color, showing the original
image through).
* **`segment_colors`** (`pxt.Json[(String, ...)] | None`): List of colors, one per segment id. If the list is shorter than the number of segments, the
remaining segments will be assigned colors automatically.
* **`draw_contours`** (`pxt.Bool`): If True, draw contours around each segment with full opacity.
* **`contour_thickness`** (`pxt.Int`): Thickness of the contour lines in pixels.
**Returns:**
* `pxt.Image`: The image with the colored segmentation overlay.
# voyageai
Source: https://docs.pixeltable.com/sdk/latest/voyageai
# module pixeltable.functions.voyageai
Pixeltable UDFs
that wrap various endpoints from the Voyage AI API. In order to use them, you must
first `pip install voyageai` and configure your Voyage AI credentials, as described in
the [Working with Voyage AI](https://docs.pixeltable.com/notebooks/integrations/working-with-voyageai) tutorial.
## udf embeddings()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
embeddings(
input: pxt.String,
*,
model: pxt.String,
input_type: pxt.String | None = None,
truncation: pxt.Bool | None = None,
output_dimension: pxt.Int | None = None,
output_dtype: pxt.String | None = None
) -> pxt.Array[(None,), float32]
```
Creates an embedding vector representing the input text.
Equivalent to the Voyage AI `embeddings` API endpoint.
For additional details, see: [https://docs.voyageai.com/docs/embeddings](https://docs.voyageai.com/docs/embeddings)
Request throttling:
Applies the rate limit set in the config (section `voyageai`, key `rate_limit`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install voyageai`
**Parameters:**
* **`input`** (`pxt.String`): The text to embed.
* **`model`** (`pxt.String`): The model to use for the embedding. Recommended options: `voyage-3-large`, `voyage-3.5`,
`voyage-3.5-lite`, `voyage-code-3`, `voyage-finance-2`, `voyage-law-2`.
* **`input_type`** (`pxt.String | None`): Type of the input text. Options: `None`, `query`, `document`.
When `input_type` is `None`, the embedding model directly converts the inputs into numerical vectors.
For retrieval/search purposes, we recommend setting this to `query` or `document` as appropriate.
* **`truncation`** (`pxt.Bool | None`): Whether to truncate the input texts to fit within the context length. Defaults to `True`.
* **`output_dimension`** (`pxt.Int | None`): The number of dimensions for resulting output embeddings.
Most models only support a single default dimension. Models `voyage-3-large`, `voyage-3.5`,
`voyage-3.5-lite`, and `voyage-code-3` support: 256, 512, 1024 (default), and 2048.
* **`output_dtype`** (`pxt.String | None`): The data type for the embeddings to be returned. Options: `float`, `int8`, `uint8`,
`binary`, `ubinary`. Only `float` is currently supported in Pixeltable.
**Returns:**
* `pxt.Array[(None,), float32]`: An array representing the application of the given embedding to `input`.
**Examples:**
Add a computed column that applies the model `voyage-3.5` to an existing
Pixeltable column `tbl.text` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
embed=embeddings(tbl.text, model='voyage-3.5', input_type='document')
)
```
Add an embedding index to an existing column `text`, using the model `voyage-3.5`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_embedding_index(
'text', string_embed=embeddings.using(model='voyage-3.5')
)
```
## udf multimodal\_embed()
```python Signatures theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Signature 1:
@pxt.udf
multimodal_embed(
text: pxt.String,
model: pxt.String,
input_type: pxt.String | None,
truncation: pxt.Bool
) -> pxt.Array[(1024,), float32]
# Signature 2:
@pxt.udf
multimodal_embed(
image: pxt.Image,
model: pxt.String,
input_type: pxt.String | None,
truncation: pxt.Bool
) -> pxt.Array[(1024,), float32]
# Signature 3:
@pxt.udf
multimodal_embed(
video: pxt.Video,
model: pxt.String,
input_type: pxt.String | None,
truncation: pxt.Bool
) -> pxt.Array[(1024,), float32]
```
Creates an embedding vector for text, images, or video using Voyage AI's multimodal model.
Equivalent to the Voyage AI `multimodal_embed` API endpoint.
For additional details, see: [https://docs.voyageai.com/docs/multimodal-embeddings](https://docs.voyageai.com/docs/multimodal-embeddings)
Request throttling:
Applies the rate limit set in the config (section `voyageai`, key `rate_limit`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install voyageai`
**Parameters:**
* **`text`** (`pxt.String`): The text to embed.
* **`image`** (`pxt.Image`): The image to embed.
* **`video`** (`pxt.Video`): The video to embed.
* **`model`** (`pxt.String`): The model to use. Currently only `voyage-multimodal-3` is supported.
* **`input_type`** (`pxt.String | None`, default: `None`): Type of the input. Options: `None`, `query`, `document`.
For retrieval/search, set to `query` or `document` as appropriate.
* **`truncation`** (`pxt.Bool`, default: `True`): Whether to truncate inputs to fit within context length.
**Returns:**
* `pxt.Array[(1024,), float32]`: An array of 1024 floats representing the embedding.
**Examples:**
Embed a text column `description`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
embed=multimodal_embed(tbl.description, input_type='document')
)
```
Add an embedding index for column `description`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_embedding_index(
'description',
embed=multimodal_embed.using(model='voyage-multimodal-3'),
)
```
## udf rerank()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
rerank(
query: pxt.String,
documents: pxt.Json[(String, ...)],
*,
model: pxt.String,
top_k: pxt.Int | None = None,
truncation: pxt.Bool = True
) -> pxt.Json
```
Reranks documents based on their relevance to a query.
Equivalent to the Voyage AI `rerank` API endpoint.
For additional details, see: [https://docs.voyageai.com/docs/reranker](https://docs.voyageai.com/docs/reranker)
Request throttling:
Applies the rate limit set in the config (section `voyageai`, key `rate_limit`). If no rate
limit is configured, uses a default of 600 RPM.
**Requirements:**
* `pip install voyageai`
**Parameters:**
* **`query`** (`pxt.String`): The query as a string.
* **`documents`** (`pxt.Json[(String, ...)]`): The documents to be reranked as a list of strings.
* **`model`** (`pxt.String`): The model to use for reranking. Recommended options: `rerank-2.5`, `rerank-2.5-lite`.
* **`top_k`** (`pxt.Int | None`): The number of most relevant documents to return. If not specified, all documents
will be reranked and returned.
* **`truncation`** (`pxt.Bool`): Whether to truncate the input to satisfy context length limits. Defaults to `True`.
**Returns:**
* `pxt.Json`: A dictionary containing:
* `results`: List of reranking results with `index`, `document`, and `relevance_score`
* `total_tokens`: The total number of tokens used
**Examples:**
Rerank similarity search results for better relevance. First, create a table with
an embedding index, then use a query function to retrieve candidates and rerank them:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
docs = pxt.create_table('docs', {'text': pxt.String})
docs.add_computed_column(embed=embeddings(docs.text, model='voyage-3.5'))
docs.add_embedding_index('text', embed=docs.embed)
@pxt.query
def get_candidates(query_text: str):
sim = docs.text.similarity(
query_text, embed=embeddings.using(model='voyage-3.5')
)
return docs.order_by(sim, asc=False).limit(20).select(docs.text)
queries = pxt.create_table('queries', {'query': pxt.String})
queries.add_computed_column(candidates=get_candidates(queries.query))
queries.add_computed_column(
reranked=rerank(
queries.query,
queries.candidates.text,
model='rerank-2.5',
top_k=5,
)
)
```
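Once the `reranked` column is populated, each cell holds a dictionary in the shape described under **Returns**. The following is a minimal sketch of post-processing such a value in plain Python. The sample data and the helper `top_documents` are hypothetical and exist only for illustration; they are not part of Pixeltable or the Voyage AI client.

```python
# Hypothetical rerank output, shaped as described under "Returns" above.
# The documents, scores, and token count are made up for illustration.
reranked = {
    'results': [
        {'index': 2, 'document': 'Voyage AI offers reranker models.', 'relevance_score': 0.91},
        {'index': 0, 'document': 'Pixeltable stores multimodal data.', 'relevance_score': 0.47},
        {'index': 1, 'document': 'Bananas are rich in potassium.', 'relevance_score': 0.03},
    ],
    'total_tokens': 42,
}


def top_documents(rerank_output: dict, min_score: float = 0.0) -> list:
    """Return documents ordered by descending relevance, dropping low scores."""
    ranked = sorted(
        rerank_output['results'],
        key=lambda r: r['relevance_score'],
        reverse=True,
    )
    return [r['document'] for r in ranked if r['relevance_score'] >= min_score]


best = top_documents(reranked, min_score=0.4)
print(best)
```

Sorting explicitly keeps the helper correct even if the results arrive in a different order.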
# whisper
Source: https://docs.pixeltable.com/sdk/latest/whisper
# module pixeltable.functions.whisper
Pixeltable UDF
that wraps the OpenAI Whisper library.
This UDF will cause Pixeltable to invoke the relevant model locally. In order to use it, you must
first `pip install openai-whisper`.
## udf transcribe()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
transcribe(
audio: pxt.Audio,
*,
model: pxt.String,
temperature: pxt.Json[(Float, ...)] | None = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
compression_ratio_threshold: pxt.Float | None = 2.4,
logprob_threshold: pxt.Float | None = -1.0,
no_speech_threshold: pxt.Float | None = 0.6,
condition_on_previous_text: pxt.Bool = True,
initial_prompt: pxt.String | None = None,
word_timestamps: pxt.Bool = False,
prepend_punctuations: pxt.String = '"\'“¿([{-',
append_punctuations: pxt.String = '"\'.。,,!!??::”)]}、',
decode_options: pxt.Json | None = None
) -> pxt.Json
```
Transcribe an audio file using Whisper.
This UDF runs a transcription model *locally* using the Whisper library,
equivalent to the Whisper `transcribe` function, as described in the
[Whisper library documentation](https://github.com/openai/whisper).
**Requirements:**
* `pip install openai-whisper`
**Parameters:**
* **`audio`** (`pxt.Audio`): The audio file to transcribe.
* **`model`** (`pxt.String`): The name of the model to use for transcription.
**Returns:**
* `pxt.Json`: A dictionary containing the transcription and various other metadata.
**Examples:**
Add a computed column that applies the model `base.en` to an existing Pixeltable column `tbl.audio`
of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(result=transcribe(tbl.audio, model='base.en'))
```
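The stored result can then be unpacked in ordinary Python. The sketch below assumes the dictionary follows the Whisper library's usual result layout (`text`, `language`, and a `segments` list with `start`, `end`, and `text`). The sample values and the helper `format_segments` are hypothetical.

```python
# Hypothetical transcription result; the values are made up for illustration.
result = {
    'text': ' Hello there. How are you?',
    'language': 'en',
    'segments': [
        {'start': 0.0, 'end': 1.2, 'text': ' Hello there.'},
        {'start': 1.2, 'end': 2.5, 'text': ' How are you?'},
    ],
}


def format_segments(transcription: dict) -> list:
    """Render each segment as '[start-end] text', with times in seconds."""
    return [
        f"[{seg['start']:.1f}-{seg['end']:.1f}] {seg['text'].strip()}"
        for seg in transcription['segments']
    ]


lines = format_segments(result)
print('\n'.join(lines))
```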
# whisperx
Source: https://docs.pixeltable.com/sdk/latest/whisperx
# module pixeltable.functions.whisperx
WhisperX audio transcription and diarization functions.
## udf transcribe()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
transcribe(
audio: pxt.Audio,
*,
model: pxt.String,
diarize: pxt.Bool = False,
compute_type: pxt.String | None = None,
language: pxt.String | None = None,
task: pxt.String | None = None,
chunk_size: pxt.Int | None = None,
alignment_model_name: pxt.String | None = None,
interpolate_method: pxt.String | None = None,
return_char_alignments: pxt.Bool | None = None,
diarization_model_name: pxt.String | None = None,
num_speakers: pxt.Int | None = None,
min_speakers: pxt.Int | None = None,
max_speakers: pxt.Int | None = None
) -> pxt.Json
```
Transcribe an audio file using WhisperX.
This UDF runs a transcription model *locally* using the WhisperX library,
equivalent to the WhisperX `transcribe` function, as described in the
[WhisperX library documentation](https://github.com/m-bain/whisperX).
If `diarize=True`, then speaker diarization will also be performed. Several of the UDF parameters are only valid if
`diarize=True`, as documented in the parameters list below.
**Requirements:**
* `pip install whisperx`
**Parameters:**
* **`audio`** (`pxt.Audio`): The audio file to transcribe.
* **`model`** (`pxt.String`): The name of the model to use for transcription.
* **`diarize`** (`pxt.Bool`): Whether to perform speaker diarization.
* **`compute_type`** (`pxt.String | None`): The compute type to use for the model (e.g., `'int8'`, `'float16'`). If `None`,
defaults to `'float16'` on CUDA devices and `'int8'` otherwise.
* **`language`** (`pxt.String | None`): The language code for the transcription (e.g., `'en'` for English).
* **`task`** (`pxt.String | None`): The task to perform (e.g., `'transcribe'` or `'translate'`). Defaults to `'transcribe'`.
* **`chunk_size`** (`pxt.Int | None`): The size of the audio chunks to process, in seconds. Defaults to `30`.
* **`alignment_model_name`** (`pxt.String | None`): The name of the alignment model to use. If `None`, uses the default model for the given
language. Only valid if `diarize=True`.
* **`interpolate_method`** (`pxt.String | None`): The method to use for interpolation of the alignment results. If not specified, uses the
WhisperX default (`'nearest'`). Only valid if `diarize=True`.
* **`return_char_alignments`** (`pxt.Bool | None`): Whether to return character-level alignments. Defaults to `False`.
Only valid if `diarize=True`.
* **`diarization_model_name`** (`pxt.String | None`): The name of the diarization model to use. Defaults to
`pyannote/speaker-diarization-3.1`. Only valid if `diarize=True`.
* **`num_speakers`** (`pxt.Int | None`): The number of speakers to expect in the audio. By default, the model will try to detect the
number of speakers. Only valid if `diarize=True`.
* **`min_speakers`** (`pxt.Int | None`): If specified, the minimum number of speakers to expect in the audio.
Only valid if `diarize=True`.
* **`max_speakers`** (`pxt.Int | None`): If specified, the maximum number of speakers to expect in the audio.
Only valid if `diarize=True`.
**Returns:**
* `pxt.Json`: A dictionary containing the audio transcription, diarization (if enabled), and various other metadata.
**Examples:**
Add a computed column that applies the model `tiny.en` to an existing Pixeltable column `tbl.audio`
of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(result=transcribe(tbl.audio, model='tiny.en'))
```
Add a computed column that applies the model `tiny.en` to an existing Pixeltable column `tbl.audio`
of the table `tbl`, with speaker diarization enabled, expecting at least 2 speakers:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
result=transcribe(
tbl.audio, model='tiny.en', diarize=True, min_speakers=2
)
)
```
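Because several parameters are documented as valid only when `diarize=True`, it can help to check a set of arguments before building the column. The helper below is a hypothetical pre-flight check written for this page, not part of Pixeltable or WhisperX; the parameter list is copied from the parameter descriptions above.

```python
# Parameters documented above as "Only valid if diarize=True".
DIARIZE_ONLY = (
    'alignment_model_name',
    'interpolate_method',
    'return_char_alignments',
    'diarization_model_name',
    'num_speakers',
    'min_speakers',
    'max_speakers',
)


def check_transcribe_kwargs(diarize: bool = False, **kwargs) -> list:
    """Return the names of diarization-only parameters set while diarize=False."""
    if diarize:
        return []
    return [name for name in DIARIZE_ONLY if kwargs.get(name) is not None]


ok = check_transcribe_kwargs(diarize=True, min_speakers=2)
bad = check_transcribe_kwargs(diarize=False, min_speakers=2, language='en')
print(ok, bad)
```

An empty list means the arguments are consistent with the documented constraints.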
# yolox
Source: https://docs.pixeltable.com/sdk/latest/yolox
# module pixeltable.functions.yolox
YOLOX object detection functions.
## udf yolo\_to\_coco()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
yolo_to_coco(detections: pxt.Json) -> pxt.Json[(Json, ...)]
```
Converts the output of a YOLOX object detection model to COCO format.
**Parameters:**
* **`detections`** (`pxt.Json`): The output of a YOLOX object detection model, as returned by `yolox`.
**Returns:**
* `pxt.Json[(Json, ...)]`: A dictionary containing the data from `detections`, converted to COCO format.
**Examples:**
Add a computed column that converts the output `tbl.detections` to COCO format, where `tbl.image`
is the image for which detections were computed:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
detections=yolox(tbl.image, model_id='yolox_m', threshold=0.8)
)
tbl.add_computed_column(detections_coco=yolo_to_coco(tbl.detections))
```
## udf yolox()
```python Signature theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.udf
yolox(
images: pxt.Image,
*,
model_id: pxt.String,
threshold: pxt.Float = 0.5
) -> pxt.Json
```
Computes YOLOX object detections for the specified image. `model_id` should reference one of the models
defined in the [YOLOX documentation](https://github.com/Megvii-BaseDetection/YOLOX).
**Requirements:**
* `pip install pixeltable-yolox`
**Parameters:**
* **`model_id`** (`pxt.String`): one of: `yolox_nano`, `yolox_tiny`, `yolox_s`, `yolox_m`, `yolox_l`, `yolox_x`
* **`threshold`** (`pxt.Float`): the threshold for object detection
**Returns:**
* `pxt.Json`: A dictionary containing the output of the object detection model.
**Examples:**
Add a computed column that applies the model `yolox_m` to an existing
Pixeltable column `tbl.image` of the table `tbl`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tbl.add_computed_column(
detections=yolox(tbl.image, model_id='yolox_m', threshold=0.8)
)
```
# Computed Columns
Source: https://docs.pixeltable.com/tutorials/computed-columns
Learn how Pixeltable computed columns turn Python functions into incremental, cached transformations over tables, views, and media data.
This documentation page is also available as an interactive notebook, which can be launched in
Kaggle or Colab, or downloaded for use with an IDE or local Jupyter installation.
This guide introduces one of Pixeltable’s most essential and powerful
concepts: computed columns. You’ll learn how to:
* Add computed columns to a table
* Use computed columns for complex operations such as image processing
and model inference
## Prerequisites
This guide assumes you’re familiar with:
* Creating and managing tables
* Inserting and querying data
* Basic table operations
If you’re new to Pixeltable, start with the [Tables and Data
Operations](/tutorials/tables-and-data-operations)
guide.
First, let’s ensure the Pixeltable library is installed in your
environment, along with the Hugging Face `transformers` library and its dependencies.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable torch transformers timm
```
### Computed Columns
Let’s start with a simple example that illustrates the basic concepts
behind computed columns. We’ll use a table of world population data for
our example. Remember that you can import datasets into a Pixeltable
table by using `pxt.create_table()` with the `source` parameter.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
pxt.create_dir('fundamentals', if_exists='ignore')
pop_t = pxt.create_table(
'fundamentals/population',
source='https://github.com/pixeltable/pixeltable/raw/release/docs/resources/world-population-data.csv',
if_exists='replace',
)
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'fundamentals'.
Created table 'population'.
Inserting rows into \`population\`: 234 rows \[00:00, 6850.71 rows/s]
Inserted 234 rows with 0 errors.
Also recall that `pop_t.head()` returns the first few rows of a table,
and typing the table name `pop_t` by itself gives the schema.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pop_t.head(5)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pop_t
```
Now let’s suppose we want to add a new column for the year-over-year
population change from 2022 to 2023. You can `select()` such a quantity
into a Pixeltable `Query`, giving it the name `yoy_change`
(year-over-year change):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pop_t.select(
pop_t.country, yoy_change=(pop_t.pop_2023 - pop_t.pop_2022)
).head(5)
```
A **computed column** is a way of turning such a selection into a new,
permanent column of the table. Here’s how it works:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pop_t.add_computed_column(yoy_change=(pop_t.pop_2023 - pop_t.pop_2022))
```
As soon as the column is added, Pixeltable will (by default)
automatically compute its value for all rows in the table, storing the
results in the new column. If we now inspect the schema of `pop_t`, we
see the new column and its definition.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pop_t
```
The new column can be queried in the usual manner.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pop_t.select(pop_t.country, pop_t.yoy_change).head(5)
```
The output is identical to the previous example, but now we’re
retrieving the computed output from the database, instead of computing
it on-the-fly.
Computed columns can be “chained” with other computed columns. Here’s an
example that expresses population change as a percentage:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pop_t.add_computed_column(
yoy_percent_change=(100 * pop_t.yoy_change / pop_t.pop_2022)
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pop_t
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pop_t.select(
pop_t.country, pop_t.yoy_change, pop_t.yoy_percent_change
).head(5)
```
Although computed columns appear superficially similar to Queries, there
is a key difference. Because computed columns are a permanent part of
the table, they will be automatically updated any time new data is added
to the table. These updates will propagate through any other computed
columns that are “downstream” of the new data, ensuring that the state
of the entire table is kept up-to-date.
In traditional data workflows, it is commonplace
to recompute entire pipelines when the input dataset is changed or
enlarged. In Pixeltable, by contrast, all updates are applied
incrementally. When new data appear in a table or existing data are
altered, Pixeltable will recompute only those rows that are dependent on
the changed data.
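The idea of incremental updates can be sketched with a toy model in plain Python. `ToyTable` and its `evaluations` counter are hypothetical names invented for this illustration. This is not how Pixeltable is implemented, only a picture of the behavior described above:

```python
class ToyTable:
    """Toy model of incremental computed columns; not Pixeltable's implementation."""

    def __init__(self):
        self.rows = []        # each row is a dict of column name -> value
        self.computed = {}    # computed column name -> function of a row
        self.evaluations = 0  # how many computed values have been calculated

    def add_computed_column(self, name, fn):
        self.computed[name] = fn
        for row in self.rows:  # backfill existing rows once
            self._fill(row, name)

    def insert(self, **values):
        row = dict(values)
        for name in self.computed:  # only the new row is computed
            self._fill(row, name)
        self.rows.append(row)

    def _fill(self, row, name):
        row[name] = self.computed[name](row)
        self.evaluations += 1


toy = ToyTable()
toy.insert(pop_2022=100, pop_2023=110)
toy.insert(pop_2022=200, pop_2023=190)
toy.add_computed_column('yoy_change', lambda r: r['pop_2023'] - r['pop_2022'])
before = toy.evaluations  # one evaluation per existing row
toy.insert(pop_2022=50, pop_2023=55)
after = toy.evaluations   # only the new row was evaluated
print(before, after, toy.rows[-1]['yoy_change'])
```

The counter grows by one on the final insert rather than by the table size, which is the essence of an incremental update.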
Let’s see how this works in practice. For purposes of illustration,
we’ll add an entry for California to the table, as if it were a country.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pop_t.insert(country='California', pop_2023=39110000, pop_2022=39030000)
```
Observe that the computed columns `yoy_change` and `yoy_percent_change`
have been automatically updated in response to the new data.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pop_t.tail(5)
```
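As a quick sanity check, the values for the California row can be reproduced with the same arithmetic in plain Python. This only checks the formulas used in the column definitions; it does not call into Pixeltable:

```python
pop_2022 = 39_030_000
pop_2023 = 39_110_000

# Same expressions as the two computed columns defined earlier
yoy_change = pop_2023 - pop_2022
yoy_percent_change = 100 * yoy_change / pop_2022

print(yoy_change)                    # 80000
print(round(yoy_percent_change, 3))  # 0.205
```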
Remember that all tables in Pixeltable are
persistent. This includes computed columns: when you create a
computed column, its definition is stored in the database. You can think
of computed columns as setting up a persistent compute workflow: if you
close your notebook or restart your Python instance, computed columns
(along with the relationships between them, and any data contained in
them) will be preserved.
### Recomputing Columns
From time to time you might need to recompute the data in an existing
computed column. Perhaps the *code* for one of your UDFs has changed,
and you want to recompute a column that uses that UDF in order to pick
up the new logic. Or perhaps you want to re-run a nondeterministic
computation such as model inference. The command to do this is
`recompute_columns()`. It won’t do much in the current example, because
all our computations are simple and deterministic, but for demonstration
purposes here’s what it looks like:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pop_t.recompute_columns(pop_t.yoy_change, pop_t.yoy_percent_change)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pop_t.tail(5)
```
As expected, it looks the same.
If you modify the data that a computed
column depends on, Pixeltable will recompute automatically; so
`recompute_columns()` is primarily useful when the input data
remains the same, but your UDF business logic changes.
### A More Complex Example: Image Processing
Pixeltable supports media data such as images alongside traditional
structured data. Let’s explore an example that uses computed columns for
image processing operations.
In this example, we’ll create the table directly by providing a schema,
rather than importing it from a CSV.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('fundamentals/image_ops', {'source': pxt.Image})
```
Created table 'image\_ops'.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
url_prefix = 'https://github.com/pixeltable/pixeltable/raw/release/docs/resources/images'
images = ['000000000139.jpg', '000000000632.jpg', '000000000872.jpg']
t.insert({'source': f'{url_prefix}/{image}'} for image in images)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.collect()
```
What are some things we might want to do with these images? A fairly
basic one is to extract metadata. Pixeltable provides the built-in UDF
`get_metadata()`, which returns a dictionary with various metadata about
the image. Let’s go ahead and make this a computed column.
“UDF” is standard terminology in databases,
meaning “User-Defined Function”. Technically speaking, the
`get_metadata()` function isn’t user-defined, it’s built in
to the Pixeltable library. But we’ll consistently refer to Pixeltable
functions as “UDFs” in order to clearly distinguish them from ordinary
Python functions. Later in this guide, we’ll see how to turn (almost)
any Python function into a Pixeltable UDF.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(metadata=t.source.get_metadata())
t.collect()
```
Added 3 column values with 0 errors.
Image operations, of course, can also return new images.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(rotated=t.source.rotate(10))
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.collect()
```
Or, perhaps we want to rotate our images and fill them in with a
transparent background rather than black. We can do this by chaining
image operations, adding a transparency layer before doing the rotation.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(
rotated_transparent=t.source.convert('RGBA').rotate(10)
)
t.collect()
```
Added 3 column values with 0 errors.
In addition to `get_metadata()`, `convert()`, and `rotate()`, Pixeltable has a
sizable library of other common image operations that can be used as
UDFs in computed columns. For the most part, the image UDFs are analogs
of the operations provided by the Pillow library
(in fact, Pixeltable is just using Pillow under the covers). You can
read more about the provided image (and other) UDFs in the
Pixeltable SDK Documentation.
Let’s have a look at our table schema.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t
```
### Image Detection
In addition to simple operations like `rotate()` and `convert()`, the
Pixeltable API includes UDFs for various off-the-shelf image models.
Let’s look at one example: object detection using the DETR model with a
ResNet-50 backbone.
Model inference is a UDF too, and it can be inserted into a computed
column like any other.
This one may take a little more time to compute, since it involves first
downloading the model weights (if they aren’t already cached), then
running inference on the images in our table.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import detr_for_object_detection
t.add_computed_column(
detections=detr_for_object_detection(
t.source, model_id='facebook/detr-resnet-50', threshold=0.8
)
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.source, t.detections).collect()
```
It’s great that the DETR model gave us so much information about the
images, but it’s not exactly in human-readable form. Those are JSON
structures that encode bounding boxes, confidence scores, and categories
for each detected object. Let’s do something more useful with them:
we’ll use Pixeltable’s `bboxes_draw()` API to superimpose bounding boxes
on the images, using different colors to distinguish different object
categories.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.vision import bboxes_draw
t.add_computed_column(
image_with_bb=bboxes_draw(
t.source,
t.detections.boxes,
labels=t.detections.label_text,
fill=True,
)
)
t.select(t.source, t.image_with_bb).collect()
```
Added 3 column values with 0 errors.
It can be a little hard to see what’s going on, so let’s zoom in on just
one image. If you select a single image in a notebook, Pixeltable will
enlarge its display:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.image_with_bb).head(1)
```
Let’s check in on our schema. We now have five computed columns, all
derived from the single source column.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t
```
And as always, when we add new data to the table, its computed columns
are updated automatically. Let’s try this on a few more images.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
more_images = ['000000000108.jpg', '000000000885.jpg']
t.insert({'source': f'{url_prefix}/{image}'} for image in more_images)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(
t.source, t.image_with_bb, t.detections.label_text, t.metadata
).tail(2)
```
It bears repeating that Pixeltable is
persistent! Anything you put into a table, including computed
columns, will be saved in persistent storage. This includes inference
outputs such as `t.detections`, as well as generated images
such as `t.image_with_bb`. (Later we’ll see how to tune this
behavior in cases where it might be undesirable to store
everything, but the default behavior is that computed column
output is always persisted.)
### Expressions
Let’s have a closer look at that call to `bboxes_draw()` in the last
example.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
bboxes_draw(t.source, t.detections.boxes, labels=t.detections.label_text, fill=True)
```
There are a couple of things going on. `bboxes_draw()` is, of course, a
UDF, and its first argument is a column reference of the sort we’ve used
many times now: `t.source`, the source image. The next two arguments
are more than simple column references, though: they’re compound
expressions that include the column reference `t.detections` along with
a suffix (`.boxes` or `.label_text`) that tells Pixeltable to look
inside the dictionary stored in `t.detections`.
These are all examples of Pixeltable expressions. In fact, we’ve
seen other types of Pixeltable expressions as well, without explicitly
calling them out:
* Calls to a UDF are expressions, such as `t.source.rotate(10)`, or the
`bboxes_draw()` example above;
* Arithmetic operations are expressions, such as the year-over-year
  percentage calculation in our first example:
`100 * pop_t.yoy_change / pop_t.pop_2022`.
## Next Steps
Learn more about working with Pixeltable:
* [Queries and
Expressions](/tutorials/queries-and-expressions)
* [Tables and Data
Operations](/tutorials/tables-and-data-operations)
# Queries and Expressions
Source: https://docs.pixeltable.com/tutorials/queries-and-expressions
Tutorial on Pixeltable queries, expressions, filters, joins, and aggregations for retrieving structured and multimodal data with Python syntax.
This documentation page is also available as an interactive notebook, which can be launched in
Kaggle or Colab, or downloaded for use with an IDE or local Jupyter installation.
Expressions are the basic building blocks of Pixeltable. This guide
explores how to use queries and expressions, including:
* Different types of Pixeltable expressions
* Column references and arithmetic operations
* Function calls and media operations
* The Pixeltable type system
## Prerequisites
This guide assumes you’re familiar with:
* Creating and managing tables
* Basic table operations and queries
* Computed columns
If you’re new to these concepts, start with:
* [Tables and Data
Operations](/tutorials/tables-and-data-operations)
* [Computed
Columns](/tutorials/computed-columns)
## Understanding Expressions
You can use Pixeltable expressions in queries:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pop_t.select(yoy_change=(pop_t.pop_2023 - pop_t.pop_2022)).collect()
```
Or as computed columns that update automatically:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pop_t.add_computed_column(yoy_change=(pop_t.pop_2023 - pop_t.pop_2022))
```
Both examples use the expression `pop_t.pop_2023 - pop_t.pop_2022`. You
can also chain operations:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.source.convert('RGBA').rotate(10)
```
Or invoke models:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
detr_for_object_detection(
t.source,
model_id='facebook/detr-resnet-50',
threshold=0.8
)
```
You can include an expression in a `select()` statement to evaluate it
dynamically, or in an `add_computed_column()` statement to add it to the table
schema as a computed column.
To get started, let’s import the necessary libraries and set up a demo
directory.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable datasets torch transformers
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
pxt.drop_dir('demo', force=True)
pxt.create_dir('demo')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'demo'.
In this guide we’ll work with a subset of the MNIST dataset, a classic
reference database of hand-drawn digits. A copy of the MNIST dataset is
hosted on the Hugging Face datasets repository, so we can use
`create_table()` with the `source` parameter to load it into a
Pixeltable table.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import datasets
# Download the first 50 images of the MNIST dataset
ds = datasets.load_dataset('ylecun/mnist', split='train[:50]')
# Import them into a Pixeltable table
t = pxt.create_table('demo/mnist', source=ds)
```
Created table 'mnist'.
Inserting rows into \`mnist\`: 50 rows \[00:00, 7516.67 rows/s]
Inserted 50 rows with 0 errors.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.head(5)
```
### Column References
The most basic type of expression is a **column reference**: that’s what
you get when you type, say, `t.image`. An expression such as `t.image`
by itself is just a Python object; it doesn’t contain any actual data,
and no data will be loaded until you use the expression in a `select()`
query or `add_computed_column()` statement. Here’s what we get if we type
`t.image` by itself:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.image
```
This is true of all Pixeltable expressions: we can freely create them
and manipulate them in various ways, but no actual data will be loaded
until we use them in a query.
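One way to picture this deferred behavior is a small expression tree that is built by operator overloading and only evaluated when a query runs over rows. The classes `Expr`, `Col`, and `BinOp` and the function `select` below are hypothetical, written for this illustration; they are a sketch of the general technique, not Pixeltable's implementation.

```python
class Expr:
    """Toy deferred expression; a sketch, not Pixeltable's implementation."""

    def __sub__(self, other):
        # Subtraction builds a new tree node instead of computing a value
        return BinOp('-', self, other)


class Col(Expr):
    def __init__(self, name):
        self.name = name

    def evaluate(self, row):
        return row[self.name]

    def __repr__(self):
        return self.name


class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def evaluate(self, row):
        if self.op == '-':
            return self.left.evaluate(row) - self.right.evaluate(row)
        raise ValueError(f'unsupported operator: {self.op}')

    def __repr__(self):
        return f'({self.left!r} {self.op} {self.right!r})'


def select(rows, expr):
    """The 'query': the only place where row data is actually read."""
    return [expr.evaluate(row) for row in rows]


expr = Col('pop_2023') - Col('pop_2022')  # builds a tree, reads no data
rows = [{'pop_2022': 100, 'pop_2023': 110}, {'pop_2022': 200, 'pop_2023': 190}]
print(repr(expr))
print(select(rows, expr))
```

Printing the expression shows only its structure; values appear only once `select` walks the rows.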
### JSON Collections (Dicts and Lists)
Data is commonly presented in JSON format: for example, API responses
and model output often take the shape of JSON dictionaries or lists of
dictionaries. Pixeltable has native support for JSON accessors. To
demonstrate this, let’s add a computed column that runs an image
classification model against the images in our dataset.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import vit_for_image_classification
t.add_computed_column(
classification=vit_for_image_classification(
t.image, model_id='farleyknight-org-username/vit-base-mnist'
)
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.image, t.classification).head(3)
```
We see that the output is returned as a dict containing three lists: the
five most likely labels (classes) for the image, the corresponding text
labels (in this case, just the string form of the class number), and the
scores (confidences) of each prediction. The Pixeltable type of the
`classification` column is `pxt.Json`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t
```
Pixeltable provides a range of operators on `Json`-typed output that
behave just as you’d expect. To look up a key in a dictionary, use the
syntax `t.classification['labels']`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.classification['labels']).head(3)
```
You can also use a convenient “attribute” syntax for dictionary lookups.
This follows the standard
[JSONPath](https://en.wikipedia.org/wiki/JSONPath) expression syntax.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.classification.labels).head(3)
```
The “attribute” syntax isn’t fully general (it won’t work for dictionary
keys that are not valid Python identifiers), but it’s handy when it
works.
`t.classification.labels` is another Pixeltable expression; you can
think of it as saying, “do the `'labels'` lookup from every dictionary
in the column `t.classification`, and return the result as a new
column.” As before, the expression by itself contains no data; it’s the
query that does the actual work of retrieving data. Here’s what we see
if we just give the expression by itself, without a query:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.classification.labels
```
classification.labels
Similarly, one can pull out a specific item in a list (for this model,
we’re probably mostly interested in the first item anyway):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.classification.labels[0]).head(3)
```
Or slice a list in the usual manner:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.classification.labels[:2]).head(3)
```
Pixeltable is resilient against out-of-bounds indices or dictionary
keys. If an index or key doesn’t exist for a particular row, you’ll get
a `None` output for that row.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.classification.not_a_key).head(3)
```
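The same "missing means `None`" behavior can be mimicked in plain Python. The helper `lookup` and the sample dictionary below are hypothetical and serve only to illustrate the semantics described above; Pixeltable's own accessors are the expressions shown in the preceding examples.

```python
def lookup(value, *path):
    """Follow keys and indices into nested dicts and lists; return None on any miss."""
    for step in path:
        try:
            value = value[step]
        except (KeyError, IndexError, TypeError):
            return None
    return value


# Made-up classification output for a single row
classification = {'labels': [5, 3, 8], 'label_text': ['5', '3', '8']}

first_label = lookup(classification, 'labels', 0)
out_of_range = lookup(classification, 'labels', 10)
missing_key = lookup(classification, 'not_a_key')
print(first_label, out_of_range, missing_key)
```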
As always, any expression can be used to create a computed column.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Use label_text to be consistent with t.label, which was given
# to us as a string
t.add_computed_column(pred_label=t.classification.label_text[0])
t
```
Added 50 column values with 0 errors.
Finally, just as it’s possible to extract items from lists and
dictionaries using Pixeltable expressions, you can also construct new
lists and dictionaries: just package them up in the usual way.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
custom_dict = {
# Keys must be strings; values can be any expressions
'ground_truth': t.label,
'prediction': t.pred_label,
'is_correct': t.label == t.pred_label,
# You can also use constants as values
'engine': 'pixeltable',
}
t.select(t.image, custom_dict).head(5)
```
### UDF Calls
UDF calls are another common type of expression. For example, we used
one earlier when we added a model invocation to our workload:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
vit_for_image_classification(
t.image,
model_id='farleyknight-org-username/vit-base-mnist'
)
```
This calls the `vit_for_image_classification` UDF in the
`pxt.functions.huggingface` module. Note that
`vit_for_image_classification` is a Pixeltable UDF, not an ordinary
Python function. You can think of a Pixeltable UDF as a function that
operates on columns of data, iteratively applying an underlying
operation to each row in the column (or columns). In this case,
`vit_for_image_classification` operates on `t.image`, running the model
against every image in the column.
Notice that in addition to the column `t.image`, this call to
`vit_for_image_classification` also takes a constant argument specifying
the `model_id`. Any UDF call argument may be a constant, and the
constant value simply means “use this value for every row being
evaluated”.
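As a rough mental model, a UDF call with constant arguments behaves like mapping a function over a column while passing the same keyword arguments on every call. The names `apply_udf` and `scale` below are hypothetical and used only for this sketch; real UDFs are evaluated by Pixeltable, not by a Python list comprehension.

```python
def apply_udf(fn, column, **constants):
    """Call fn once per row of `column`, passing the same constants every time."""
    return [fn(value, **constants) for value in column]


def scale(value, *, factor):
    """Stand-in for a per-row operation that takes one constant argument."""
    return value * factor


column = [1, 2, 3]
scaled = apply_udf(scale, column, factor=10)
print(scaled)
```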
You can always compose Pixeltable expressions to form more complicated
ones; here’s an example that runs the model against a 90-degree rotation
of every image in the sample and extracts the label. Not surprisingly,
the model doesn’t perform as well on the rotated images.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
rot_model_result = vit_for_image_classification(
t.image.rotate(90),
model_id='farleyknight-org-username/vit-base-mnist',
)
t.select(t.image, rot_label=rot_model_result.labels[0]).head(5)
```
Note that we employed a useful trick here: we
assigned an expression to the variable `rot_model_result` for
later reuse. Every Pixeltable expression is a Python object, so you can
freely assign them to variables, reuse them, compose them, and so on.
Remember that nothing actually happens until the expression is used in a
query, so in this example, setting the variable
`rot_model_result` doesn’t itself result in any data being
retrieved; that only happens later, when we actually use it in the
`select()` query.
There are a large number of built-in UDFs that ship with Pixeltable; you
can always refer back to the [SDK
Documentation](/sdk/latest/) for details.
### Method Calls
Many built-in UDFs allow a convenient alternate syntax. The following
two expressions are exactly equivalent:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
a = t.image.rotate(90)
b = pxt.functions.image.rotate(t.image, 90)
```
`a` and `b` can always be used interchangeably in queries, with
identical results. Just like in standard Python classes, whenever
Pixeltable sees the **method call** `t.image.rotate(90)`, it interprets
it as a **function call** `pxt.functions.image.rotate(self, 90)`, with
(in this case) `self` equal to `t.image`.
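The same equivalence holds in plain Python, independent of Pixeltable. In the sketch below, the `Angle` class is a made-up example used only to show that a method call is shorthand for a function call whose first argument is the object itself.

```python
class Angle:
    def __init__(self, degrees: int) -> None:
        self.degrees = degrees

    def rotate(self, by: int) -> "Angle":
        # Return a new Angle rotated by `by` degrees, wrapped to [0, 360)
        return Angle((self.degrees + by) % 360)


start = Angle(300)
a = start.rotate(90)         # method call
b = Angle.rotate(start, 90)  # equivalent function call, with self = start
print(a.degrees, b.degrees)
```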
Any method call can also be written as a function call, but (just like
in standard Python) not every function call can be written as a method
call. For example, the following won’t work:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.image.vit_for_image_classification(
model_id='farleyknight-org-username/vit-base-mnist'
)
```
That’s because `vit_for_image_classification` is part of the
`pxt.functions.huggingface` module, not the core module
`pxt.functions.image`. Most Pixeltable types have a corresponding **core
module** of UDFs that can be used as method calls (`pxt.functions.image`
for `Image`; `pxt.functions.string` for `String`; and so on), described
fully in the [SDK
Documentation](/sdk/latest/).
### Arithmetic and Boolean Operations
Expressions can also be combined using standard arithmetic and boolean
operators. As with everything else, arithmetic and boolean expressions
are operations on columns that (when used in a query) are applied to
every row.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.image, t.label, t.label == '4', t.label < '5').head(5)
```
When you use a `where` clause in a query, you’re giving it a Pixeltable
expression, too (a boolean-valued one).
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.where(t.label == '4').select(t.image).show()
```
The following example shows how boolean expressions can be assigned to
variables and used to form more complex expressions.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Reuse `rot_model_result` from above, extracting
# the dominant label as a new expression
rot_label = rot_model_result.label_text[0]
# Select all the rows where the ground truth label is '5',
# and the "rotated" version of the model got it wrong
# (by returning something other than a '5')
t.where((t.label == '5') & (rot_label != '5')).select(
t.image, t.label, rot_label=rot_label
).show()
```
Notice that to form a logical “and”, we wrote
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
(t.label == '5') & (rot_label != '5')
```
using the operator `&` rather than `and`. Likewise, to form a logical
“or”, we’d use `|` rather than `or`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
(t.label == '5') | (rot_label != '5')
```
For logical negation:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
~(t.label == '5')
```
This follows the convention used by other popular data-manipulation
frameworks such as Pandas, and it’s necessary because the Python
language does not allow the meanings of `and`, `or`, and `not` to be
customized. There is one more instance of this to be aware of: to check
whether an expression is `None`, it’s necessary to write (say)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.label == None
```
rather than `t.label is None`, for the same reason.
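To see why these operators are needed, it can help to look at how a library might build expressions through operator overloading. The `Expr` class below is a toy written for this explanation and is not Pixeltable’s implementation. Python lets a class redefine `==`, `!=`, `&`, `|`, and `~`, but offers no hook for `and`, `or`, `not`, or `is`.

```python
class Expr:
    """Toy expression node that records an operation instead of evaluating it."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __eq__(self, other):
        return Expr(f"({self.text} == {other!r})")

    def __ne__(self, other):
        return Expr(f"({self.text} != {other!r})")

    def __and__(self, other):
        return Expr(f"({self.text} AND {other.text})")

    def __or__(self, other):
        return Expr(f"({self.text} OR {other.text})")

    def __invert__(self):
        return Expr(f"(NOT {self.text})")

    # Defining __eq__ removes the default __hash__; restore identity hashing
    __hash__ = object.__hash__


label = Expr("label")
combined = (label == '5') & ~(label == '4')
is_null = label == None    # builds a new expression via __eq__
identity = label is None   # plain identity test; cannot be overloaded

print(combined.text)
print(is_null.text)
print(identity)
```

In this toy, `label is None` is evaluated immediately by Python and yields a plain `bool`, so no expression ever reaches the library. `label == None` goes through `__eq__` and produces an expression that can be evaluated later, row by row.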
### Arrays
In addition to lists and dicts, Pixeltable also has built-in support for
numerical arrays. A typical place where arrays show up is as the output
of an embedding.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import clip
# Add a computed column that computes a CLIP embedding for each image
t.add_computed_column(
clip=clip(t.image, model_id='openai/clip-vit-base-patch32')
)
t.select(t.image, t.clip).head(5)
```
Added 50 column values with 0 errors.
The underlying Python type of `pxt.Array` is an ordinary NumPy array
(`np.ndarray`), so that an array-typed column is a column of NumPy
arrays (in this example, representing the embedding output of each image
in the table). As with lists, arrays can be sliced in all the usual
ways.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.clip[0], t.clip[5:10], t.clip[-3:]).head(5)
```
### Ad hoc UDFs with `apply`
We’ve now seen the most commonly encountered Pixeltable expression
types. There are a few other less commonly encountered expressions that
are occasionally useful.
You can use `apply` to map any Python function onto a column of data.
You can think of `apply` as a quick way of constructing an “on-the-fly”
UDF for one-off use.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import numpy as np
t.select(t.clip.apply(np.ndarray.dumps, col_type=pxt.String)).head(2)
```
Note, however, that if the function you’re `apply`ing doesn’t have type
hints (as in the example here), you’ll need to specify the output column
type explicitly.
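As a rough illustration of why type hints matter here, the standard library can read a function’s annotated return type, and an unannotated function gives it nothing to work with. The helper `declared_return_type` is a hypothetical name for this sketch; whether Pixeltable uses this exact mechanism internally is an assumption.

```python
import typing


def hinted(x: bytes) -> str:
    return x.hex()


def unhinted(x):
    return x.hex()


def declared_return_type(fn):
    """Return the annotated return type of `fn`, or None if it has none."""
    return typing.get_type_hints(fn).get('return')


print(declared_return_type(hinted))
print(declared_return_type(unhinted))
```

Both functions behave identically at runtime, but only the first one declares what it returns. For the second, the caller has to supply the output type by some other means.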
### Type Conversion with `astype`
Sometimes it’s useful to transform an expression of one type into a
different type. For example, you can use `astype` to turn an expression
of type `pxt.Json` into one of type `pxt.String`. This assumes that the
value being converted is actually a string; otherwise, you’ll get an
exception. Here’s an example:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Select the text in position 0 of `t.classification.label_text`; since
# `t.classification.label_text` has type `pxt.Json`, so does
# `t.classification.label_text[0]`
t.classification.label_text[0].col_type
```
Optional\[Json]
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Select the text in position 0 of `t.classification.label_text`, this time
# cast as a `pxt.String`
t.classification.label_text[0].astype(pxt.String).col_type
```
Optional\[String]
### Column Properties
Some `ColumnRef` expressions have additional useful properties. A media
column (image, video, audio, or document) has the following two
properties:
* `localpath`: the media location on the local filesystem
* `fileurl`: the original URL where the media resides (could be the same
as `localpath`)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(t.image, t.image.localpath).head(5)
```
Any computed column will have two additional properties, `errortype` and
`errormsg`. These properties will usually be `None`. However, if the
computed column was created with `on_error='ignore'` and an exception
was encountered during column execution, then the properties will
contain additional information about the exception.
To demonstrate this feature, we’re going to deliberately trigger an
exception in a computed column. The images in our example table are
black and white, meaning they have only one color channel. If we try to
extract a channel other than channel number `0`, we’ll get an exception.
Ordinarily when we call `add_computed_column`, the exception is raised
and the `add_computed_column` operation is aborted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(channel=t.image.getchannel(1))
```
Error: Error while evaluating computed column 'channel':
band index out of range
But if we use `on_error='ignore'`, the exception will be logged in the
column properties instead.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.add_computed_column(channel=t.image.getchannel(1), on_error='ignore')
```
Notice that the update status informs us that there were 50 errors. If
we query the table, we see that the column contains only `None` values,
but the `errortype` and `errormsg` fields contain details of the error.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.select(
t.image, t.channel, t.channel.errortype, t.channel.errormsg
).head(5)
```
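The difference between the two modes can be sketched in a few lines of plain Python. The function `compute_column` and its dictionary layout are invented for this illustration and only mimic the behavior described above; they say nothing about how Pixeltable stores errors internally.

```python
from typing import Any, Callable, Optional


def compute_column(
    fn: Callable[[Any], Any], values: list[Any], on_error: str = 'abort'
) -> list[dict[str, Optional[Any]]]:
    """Apply `fn` to each value, either aborting or recording failures per row."""
    cells = []
    for value in values:
        try:
            cells.append({'value': fn(value), 'errortype': None, 'errormsg': None})
        except Exception as exc:
            if on_error == 'abort':
                raise
            # 'ignore': store None and keep the details alongside the cell
            cells.append(
                {'value': None, 'errortype': type(exc).__name__, 'errormsg': str(exc)}
            )
    return cells


def reciprocal(x: float) -> float:
    return 1 / x


cells = compute_column(reciprocal, [2.0, 0.0], on_error='ignore')
print(cells)
```

With `on_error='abort'`, the same call would raise `ZeroDivisionError` on the second row and produce no result at all.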
More details on Pixeltable’s error handling can be found in the
[External Files](/platform/external-files)
guide.
## The Pixeltable Type System
We’ve seen that every column and every expression in Pixeltable has an
associated **Pixeltable type**. In this section, we’ll briefly survey
the various Pixeltable types and their uses.
Here are all the supported types and their corresponding Python types:
The Python type is what you’ll get back if you query an expression of
the given Pixeltable type. For `pxt.Json`, it can be any of `str`,
`int`, `float`, `bool`, `list`, or `dict`.
`pxt.Audio`, `pxt.Video`,
and `pxt.Document` all correspond to the Python type
`str`. This is because those types are represented by file
paths that reference the media in question. When you query for, say,
`t.select(t.video_col)`, you’re guaranteed to get a file path
on the local filesystem (Pixeltable will download and cache a
local copy of the video if necessary to ensure this). If you want the
original URL, use `t.video_col.fileurl` instead.
Several types can be **specialized** to constrain the allowable data in
a column.
* `pxt.Image` can be specialized with a resolution and/or an image mode:
* `pxt.Image[(300,200)]` - images with width 300 and height 200
* `pxt.Image['RGB']` - images with mode `'RGB'`; see the [PIL
Documentation](https://pillow.readthedocs.io/en/stable/handbook/concepts.html)
for the full list
* `pxt.Image[(300,200), 'RGB']` - combines the above constraints
* `pxt.Array` can be specialized with a shape and/or a dtype:
* `pxt.Array[pxt.Float]` - arrays with dtype `pxt.Float`
* `pxt.Array[(64,64,3), pxt.Float]` - 3-dimensional arrays with dtype
`pxt.Float` and 64x64x3 shape
If we look at the structure of our table now, we see examples of
specialized image and array types.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t
```
`t.clip` has type `pxt.Array[(512,), pxt.Float]`, since the output of
the embedding is always a 1-dimensional array of length 512. `t.channel`
has type `pxt.Image['L']`, since it’s always an `'L'` mode (1-channel) image.
You can freely use `pxt.Image` by
itself to mean “any image, without constraints”, but numerical arrays
must always specify a shape and a dtype; `pxt.Array` by
itself will raise an error.
Array shapes follow standard NumPy conventions: a
shape is a tuple of integers, such as `(512,)` or
`(64,64,3)`. A `None` may be used in place of an
integer to indicate an unconstrained size for that dimension, as in
`(None,None,3)` (3-dimensional array with two unconstrained
dimensions), or simply `(None,)` (unconstrained 1-dimensional
array).
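The matching rule for shapes with `None` entries can be written down in a few lines. The helper `shape_matches` is a hypothetical function for illustration only. It captures the convention described above, not Pixeltable’s actual validation code.

```python
from typing import Optional


def shape_matches(shape: tuple[int, ...], spec: tuple[Optional[int], ...]) -> bool:
    """Check a concrete array shape against a spec in which None matches any size."""
    if len(shape) != len(spec):
        return False  # the number of dimensions must agree
    return all(want is None or want == got for got, want in zip(shape, spec))


print(shape_matches((480, 640, 3), (None, None, 3)))
print(shape_matches((480, 640, 4), (None, None, 3)))
print(shape_matches((512,), (None,)))
```

Note that `None` relaxes the size of a dimension but never the number of dimensions: a 2-dimensional array does not match `(None,)`.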
# Tables and Data Operations
Source: https://docs.pixeltable.com/tutorials/tables-and-data-operations
Tutorial on creating Pixeltable tables, inserting rows, updating columns, and using views to build versioned multimodal data pipelines.
This guide shows you how to:
* Create and manage tables: Understand Pixeltable’s table structure,
create and modify tables, and work with table schemas
* Manipulate data: Insert, update, and delete data within tables, and
retrieve data from tables into Python variables
* Filter and select data: Use `where()`, `select()`, and `order_by()` to
query for specific rows and columns
* Import data from CSV files and other file types
First, let’s ensure the Pixeltable library is installed in your
environment.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable
```
### Tables
All data in Pixeltable is stored in tables. At a high level, a
Pixeltable table behaves similarly to an ordinary SQL database table,
but with many additional capabilities to support complex AI workflows.
We’ll introduce those advanced capabilities gradually throughout this
tutorial; in this section, the focus is on basic table and data
operations.
Tables in Pixeltable are grouped into **directories**, which are simply
user-defined namespaces. The following command creates a new directory,
`fundamentals`, which we’ll use to store the tables in our tutorial.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# First we delete the `fundamentals` directory and all its contents (if
# it exists), in order to ensure a clean environment for the tutorial.
pxt.drop_dir('fundamentals', force=True)
# Now we create the directory.
pxt.create_dir('fundamentals')
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'fundamentals'.
Now let’s create our first table. To create a table, we must give it a
name and a **schema** that describes the table structure. Note that
prefacing the name with `fundamentals` causes it to be placed in our
newly-created directory.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
films_t = pxt.create_table(
'fundamentals/films',
{'film_name': pxt.String, 'year': pxt.Int, 'revenue': pxt.Float},
)
```
Created table 'films'.
To insert data into a table, we use the `insert()` method, passing it a
list of Python dicts.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
films_t.insert(
[
{'film_name': 'Jurassic Park', 'year': 1993, 'revenue': 1037.5},
{'film_name': 'Titanic', 'year': 1997, 'revenue': 2257.8},
{
'film_name': 'Avengers: Endgame',
'year': 2019,
'revenue': 2797.5,
},
]
)
```
If you’re inserting just a single row, you can use an alternate syntax
that is sometimes more convenient.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
films_t.insert(film_name='Inside Out 2', year=2024, revenue=1462.7)
```
Inserting rows into \`films\`: 1 rows \[00:00, 318.76 rows/s]
Inserted 1 row with 0 errors.
1 row inserted, 1 value computed.
We can peek at the data in our table with the `collect()` method, which
retrieves all the rows in the table.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
films_t.collect()
```
Pixeltable also provides `update()` and `delete()` methods for modifying
and removing data from a table; we’ll see examples of them shortly.
### Filtering and Selecting Data
Often you want to select only certain rows and/or certain columns in a
table. You can do this with the `where()` and `select()` methods.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
films_t.where(films_t.revenue >= 2000.0).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
films_t.select(films_t.film_name, films_t.year).collect()
```
Note the expressions that appear inside the calls to `where()` and
`select()`, such as `films_t.year`. These are **column references** that
point to specific columns within a table. In place of `films_t.year`,
you can also use dictionary syntax and type `films_t['year']`, which
means exactly the same thing but is sometimes more convenient.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
films_t.select(films_t['film_name'], films_t['year']).collect()
```
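One situation where the dictionary syntax is convenient is when the column name is held in a variable. The toy class below is a made-up stand-in, not Pixeltable’s `Table`. It shows how a Python object can expose the same lookup through both syntaxes.

```python
class ToyTable:
    """Toy stand-in for a table that exposes columns by attribute or by key."""

    def __init__(self, columns: list[str]) -> None:
        self._columns = set(columns)

    def __getitem__(self, name: str) -> str:
        if name not in self._columns:
            raise KeyError(name)
        return f'ColumnRef({name})'

    def __getattr__(self, name: str) -> str:
        # Called only when normal attribute lookup fails
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


films = ToyTable(['film_name', 'year', 'revenue'])
print(films.year == films['year'])

# Key syntax is handy when the column name is held in a variable
for col in ('film_name', 'revenue'):
    print(films[col])
```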
In addition to selecting columns directly, you can use column references
inside various kinds of expressions. For example, our `revenue` numbers
are given in millions of dollars. Let’s say we wanted to select revenue
in thousands of dollars instead; we could do that as follows:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
films_t.select(films_t.film_name, films_t.revenue * 1000).collect()
```
Note that since we selected an abstract expression rather than a
specific column, Pixeltable gave it the generic name `col_1`. You can
assign it a more informative name with Python keyword syntax:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
films_t.select(
films_t.film_name, revenue_thousands=films_t.revenue * 1000
).collect()
```
### Tables are Persistent
This is a good time to mention a few key differences between Pixeltable
tables and other familiar data structures, such as Python dicts or Pandas
dataframes.
First, **Pixeltable is persistent. Unlike in-memory Python libraries
such as Pandas, Pixeltable is a database**. When you reset a notebook
kernel or start a new Python session, you’ll have access to all the data
you’ve stored previously in Pixeltable. Let’s demonstrate this by using
the IPython `%reset -f` command to clear out all our notebook variables,
so that `films_t` is no longer defined.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%reset -f
films_t.collect() # Throws an exception now
```
NameError: name 'films\_t' is not defined
The `films_t` variable (along with all other variables in our Python
session) has been cleared out - but that’s ok, because it wasn’t the
source of record for our data. The `films_t` variable is just a
reference to the underlying database table. We can recover it with the
`get_table` command, referencing the `films` table by name.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
films_t = pxt.get_table('fundamentals/films')
films_t.collect()
```
You can always get a list of existing tables with the Pixeltable
`pxt.ls()` command. Let’s use it to see the contents of the
`fundamentals` directory.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.ls(path='fundamentals')
```
Note that if you’re running Pixeltable on Colab
or Kaggle, the database will persist only for as long as your
Colab/Kaggle session remains active. If you’re running it locally or on
your own server, then your database will persist indefinitely (until you
actively delete it).
### Tables are Typed
The second major difference is that **Pixeltable is strongly typed**.
Because Pixeltable is a database, every column has a data type: that’s
why we specified `String`, `Int`, and `Float` for the three columns when
we created the table. These **type specifiers** are *mandatory* when
creating tables, and they become part of the table schema. You can
always see the table schema with the `describe()` method.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
films_t.describe()
```
In a notebook, you can also just type `films_t` to see the schema; its
output is identical to `films_t.describe()`.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
films_t
```
In addition to `String`, `Int`, and `Float`, Pixeltable provides several
additional data types:
* `Bool`, whose values are `True` or `False`;
* `Array`, for numerical arrays;
* `Json`, for lists or dicts that correspond to valid JSON structures;
  and
* the media types `Image`, `Video`, `Audio`, and `Document`.
We’ll see examples of each of these types later in this guide.
Besides the column names and types, there’s a third element to the
schema, `Computed With`. To learn more about this, see the [Computed
Columns](/tutorials/computed-columns) guide.
All of the methods we’ve discussed so far, such as `insert()` and
`get_table()`, are documented in the [Pixeltable
SDK](/sdk/latest/) Documentation. The
following pages are particularly relevant:
* [pixeltable](/sdk/latest/pixeltable)
package reference
* [pxt.Table](/sdk/latest/table) class
reference
### A Real-World Example: Earthquake Data
Now let’s dive a little deeper into Pixeltable’s data operations. To
showcase all the features, it’ll be helpful to have a real-world
dataset, rather than our toy dataset with four movies. The dataset we’ll
be using consists of earthquake data drawn from the US Geological
Survey: all recorded earthquakes that occurred within 100 km of Seattle,
Washington, between January 1, 2023 and June 30, 2024.
The dataset is in CSV format, and we can load it into Pixeltable by
using `create_table()` with the `source` parameter, which creates a new
Pixeltable table from the contents of a CSV file.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t = pxt.create_table(
'fundamentals/earthquakes', # Name for the new table
source='https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/earthquakes.csv',
primary_key='id', # Column 'id' is the primary key
schema_overrides={
'timestamp': pxt.Timestamp
}, # Interpret column 3 as a timestamp
)
```
Created table 'earthquakes'.
Inserting rows into \`earthquakes\`: 1823 rows \[00:00, 19554.24 rows/s]
Inserted 1823 rows with 0 errors.
In Pixeltable, you can always import external
data by giving a URL instead of a local file path. This applies to CSV
datasets, media files (such as images and video), and other types of
content. The URL will often be an `http://` URL, but it can
also be an `s3://` URL referencing an S3 bucket.
Pixeltable’s `create_table()` function
with the `source` parameter can import data from various
formats including CSV, Excel, and Hugging Face datasets. You can also
use `source` to import from a Pandas dataframe. For more
details, see the `pixeltable.io` package reference.
Let’s have a peek at our new dataset. The dataset contains 1823 rows,
and we probably don’t want to display them all at once. We can limit our
query to fewer rows with the `limit()` method.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.limit(5).collect()
```
A different way of achieving something similar is to use the `head()`
and `tail()` methods. Pixeltable keeps track of the insertion order of
all its data, and `head()` and `tail()` will always return the *earliest
inserted* and *most recently inserted* rows in a table, respectively.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.head(5)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.tail(5)
```
`head(n)` and
`limit(n).collect()` appear similar in this example. But
`head()` always returns the earliest rows in a table,
whereas `limit()` makes no promises about the ordering of its
results (unless you specify an `order_by()` clause - more on
this below).
Let’s also peek at the schema:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.describe()
```
Note that while specifying a schema is mandatory when *creating* a
table, it’s not always required when *importing* data. This is because
Pixeltable uses the structure of the imported data to infer the column
types, when feasible. You can always override the inferred column types
with the `schema_overrides` parameter of `create_table()`.
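The interplay between inference and explicit overrides can be sketched with the standard library. Everything in this block is an illustrative assumption: the helpers `infer_type` and `infer_schema` are hypothetical, the sample rows are invented, and Pixeltable’s real inference logic is certainly more thorough.

```python
import csv
import io
from datetime import datetime
from typing import Optional


def infer_type(values: list[str]) -> type:
    """Pick the narrowest of int, float, str that parses every value."""
    for candidate in (int, float):
        try:
            for v in values:
                candidate(v)
            return candidate
        except ValueError:
            continue
    return str


def infer_schema(csv_text: str, overrides: Optional[dict[str, type]] = None) -> dict[str, type]:
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    schema = {name: infer_type([row[name] for row in rows]) for name in rows[0]}
    schema.update(overrides or {})  # explicit overrides win over inference
    return schema


# Invented sample rows, shaped loosely like the columns used in this guide
sample = "id,magnitude,timestamp\n0,1.15,2023-01-01 08:10:37\n1,0.29,2023-01-02 01:02:43\n"
print(infer_schema(sample))
print(infer_schema(sample, overrides={'timestamp': datetime}))
```

Without an override, the toy inference falls back to `str` for the `timestamp` column, which is why an explicit override is useful for values that look like plain text.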
The following examples showcase some common data operations.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.count() # Number of rows in the table
```
1823
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# 5 highest-magnitude earthquakes
eq_t.order_by(eq_t.magnitude, asc=False).limit(5).collect()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from datetime import datetime
# 5 highest-magnitude earthquakes from June through September 2023
eq_t.where(
(eq_t.timestamp >= datetime(2023, 6, 1))
& (eq_t.timestamp < datetime(2023, 10, 1))
).order_by(eq_t.magnitude, asc=False).limit(5).collect()
```
Note that Pixeltable uses Pandas-like operators for filtering data: the
expression
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
(eq_t.timestamp >= datetime(2023, 6, 1)) & (eq_t.timestamp < datetime(2023, 10, 1))
```
means *both* conditions must be true; similarly (say),
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
(eq_t.timestamp < datetime(2023, 6, 1)) | (eq_t.timestamp >= datetime(2023, 10, 1))
```
would mean *either* condition must be true.
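For comparison, here is the same pair of conditions written with ordinary Python `and` / `or` over a plain list of made-up timestamps. This is only a sketch of the logic: in Pixeltable the filter is expressed with `&` and `|` on column expressions, as shown above.

```python
from datetime import datetime

start, end = datetime(2023, 6, 1), datetime(2023, 10, 1)

# Invented sample values chosen to sit on and around the boundaries
timestamps = [
    datetime(2023, 5, 31, 23, 59),
    datetime(2023, 6, 1),
    datetime(2023, 9, 30, 12, 0),
    datetime(2023, 10, 1),
]

# Half-open range: the start is included, the end is excluded
inside = [ts for ts in timestamps if ts >= start and ts < end]
# The "or" condition is the exact complement of the range above
outside = [ts for ts in timestamps if ts < start or ts >= end]
print(len(inside), len(outside))
```

Because the two conditions are complementary, every timestamp lands in exactly one of the two lists.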
You can also use the special `isin` operator to select just those values
that appear within a particular list:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Earthquakes with specific ids
eq_t.where(eq_t.id.isin([123, 456, 789])).collect()
```
In addition to basic operators like `>=` and `isin`, a Pixeltable
`where` clause can also contain more complex operations. For example,
the `location` column in our dataset is a string that contains a lot of
information, but in a relatively unstructured way. Suppose we wanted to
see all earthquakes in the vicinity of Rainier, Washington; one way to
do this is with the `contains()` method:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# All earthquakes in the vicinity of Rainier
eq_t.where(eq_t.location.contains('Rainier')).collect()
```
Pixeltable also supports various **aggregators**; here’s an example
showcasing two fairly simple ones, `max()` and `min()`:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Min and max ids
eq_t.select(
min=pxt.functions.min(eq_t.id), max=pxt.functions.max(eq_t.id)
).collect()
```
To learn more about Pixeltable functions and expressions, see the
[Computed
Columns](/tutorials/computed-columns) guide.
They’re also exhaustively documented in the [Pixeltable SDK
Documentation](/sdk/latest).
### Extracting Data from Tables into Python/Pandas
Sometimes it’s handy to pull out data from a table into a Python object.
We’ve actually already done this; the call to `collect()` returns an
in-memory result set, which we can then dereference in various ways. For
example:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
result = eq_t.limit(5).collect()
result[0] # Get the first row of the results as a dict
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
result[
'timestamp'
] # Get a list of the `timestamp` field of all the rows that were queried
```
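The two indexing styles above can be pictured with a small stand-in class: an integer selects one row as a dict, and a string selects one column as a list. `MiniResultSet` is hypothetical and written only for this illustration; it is not Pixeltable's actual result class.

```python
from typing import Any, Union


class MiniResultSet:
    """Hypothetical stand-in for a query result; not Pixeltable's implementation."""

    def __init__(self, rows: list[dict[str, Any]]) -> None:
        self._rows = rows

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, key: Union[int, str]) -> Any:
        if isinstance(key, int):
            return self._rows[key]  # integer index: one row, as a dict
        return [row[key] for row in self._rows]  # column name: one column, as a list


# Made-up values for illustration
result = MiniResultSet([
    {'id': 0, 'magnitude': 1.15},
    {'id': 1, 'magnitude': 0.29},
])
first_row = result[0]
magnitudes = result['magnitude']
```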
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
df = result.to_pandas() # Convert the result set into a Pandas dataframe
df['magnitude'].describe()
```
count 5.000000
mean 0.744000
std 0.587988
min 0.200000
25% 0.290000
50% 0.520000
75% 1.150000
max 1.560000
Name: magnitude, dtype: float64
`collect()` without a preceding `limit()` returns the entire contents of
a query or table. Be careful! For very large tables, this could result
in out-of-memory errors. In this example, the 1823 rows in the table fit
comfortably into a dataframe.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
df = eq_t.collect().to_pandas()
df['magnitude'].describe()
```
count 1823.000000
mean 0.900378
std 0.625492
min -0.830000
25% 0.420000
50% 0.850000
75% 1.310000
max 4.300000
Name: magnitude, dtype: float64
### Adding Columns
Like other database tables, Pixeltable tables aren’t fixed entities:
they’re meant to evolve over time. Suppose we want to add a new column
to hold user-specified comments about particular earthquake events. We
can do this with the `add_column()` method:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.add_column(note=pxt.String)
```
Here, `note` is the column name, and `pxt.String` specifies the type of
the new column.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.add_column(contact_email=pxt.String)
```
Let’s have a look at the revised schema.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.describe()
```
### Updating Rows in a Table
Table rows can be modified and deleted with the SQL-like `update()` and
`delete()` commands.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add a note and a contact email to the records with IDs 121 and 123
(
eq_t.where(eq_t.id.isin([121, 123])).update(
{
'note': 'Still investigating.',
'contact_email': 'contact@pixeltable.com',
}
)
)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.where(eq_t.id >= 120).select(
eq_t.id, eq_t.magnitude, eq_t.note, eq_t.contact_email
).head(5)
```
`update()` can also accept an expression, rather than a constant value.
For example, suppose we wanted to shorten the location strings by
replacing every occurrence of `Washington` with `WA`. One way to do this
is with an `update()` clause, using a Pixeltable expression with the
`replace()` method.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.update({'location': eq_t.location.replace('Washington', 'WA')})
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.head(5)
```
Notice that in all cases, the `update()` clause takes a Python
dictionary, but its values can be either constants such as
`'contact@pixeltable.com'`, or more complex expressions such as
`eq_t.location.replace('Washington', 'WA')`. Also notice that if
`update()` appears without a `where()` clause, then every row in the
table will be updated, as in the preceding example.
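The rules just described can be modeled in a few lines of plain Python. This is only a sketch of the semantics, using made-up rows and a hypothetical `update()` helper in which a callable plays the role of a Pixeltable expression; it is not how Pixeltable implements updates.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]


def update(rows: list[Row], changes: dict[str, Any],
           where: Optional[Callable[[Row], bool]] = None) -> int:
    """Apply `changes` to each row matching `where`; with no `where`, to every row."""
    updated = 0
    for row in rows:
        if where is None or where(row):
            # A callable stands in for an expression: it is evaluated per row.
            new_values = {col: value(row) if callable(value) else value
                          for col, value in changes.items()}
            row.update(new_values)
            updated += 1
    return updated


rows = [
    {'id': 121, 'location': 'Rainier, Washington', 'note': None},
    {'id': 122, 'location': 'Seattle, Washington', 'note': None},
]

# Constant value, restricted by a predicate: only one row changes
n_filtered = update(rows, {'note': 'Still investigating.'}, where=lambda r: r['id'] == 121)

# Expression with no predicate: every row is rewritten
n_all = update(rows, {'location': lambda r: r['location'].replace('Washington', 'WA')})
```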
### Batch Updates
The `batch_update()` method provides an alternative way to update
multiple rows with different values. With a `batch_update()`, the
contents of each row are specified by individual `dict`s, rather than
according to a formula. Here’s a toy example that shows `batch_update()`
in action.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
updates = [
{'id': 500, 'note': 'This is an example note.'},
{'id': 501, 'note': 'This is a different note.'},
{'id': 502, 'note': 'A third note, unrelated to the others.'},
]
eq_t.batch_update(updates)
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.where(eq_t.id >= 500).select(
eq_t.id, eq_t.magnitude, eq_t.note, eq_t.contact_email
).head(5)
```
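Conceptually, a batch update merges each dict into the row it identifies. The sketch below models that in plain Python, assuming the `id` field is what identifies a row (the text above does not spell out how rows are matched). The `batch_update()` helper here is hypothetical and silently skips ids that do not exist, which may differ from Pixeltable's behavior.

```python
from typing import Any


def batch_update(rows: list[dict[str, Any]], updates: list[dict[str, Any]],
                 key: str = 'id') -> int:
    """Merge each update dict into the row that has the same `key` value."""
    by_key = {row[key]: row for row in rows}
    updated = 0
    for upd in updates:
        target = by_key.get(upd[key])
        if target is not None:  # this sketch skips keys with no matching row
            target.update(upd)
            updated += 1
    return updated


rows = [{'id': 500, 'note': None}, {'id': 501, 'note': None}]
n_updated = batch_update(rows, [
    {'id': 500, 'note': 'This is an example note.'},
    {'id': 999, 'note': 'No such row.'},
])
```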
### Deleting Rows
To delete rows from a table, use the `delete()` method.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Delete all rows with a timestamp on or after January 1, 2024
eq_t.where(eq_t.timestamp >= datetime(2024, 1, 1)).delete()
```
587 rows deleted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.count() # How many are left after deleting?
```
1236
Don’t forget to specify a `where()` clause when using `delete()`! If you
run `delete()` without a `where()` clause, the entire contents of the
table will be deleted.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.delete()
```
### Table Versioning
Every table in Pixeltable is versioned: some or all of its modification
history is preserved. We’ve seen a reference to this already; `pxt.ls()`
will show the most recent version along with each table it lists.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.ls('fundamentals')
```
To see the version history of a particular table:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.history()
```
If you ever make a mistake, you can always call `revert()` to undo the
most recent change to a table and roll back to the previous version.
Let’s try it out. Each call undoes a single change, so the `revert()`
below undoes only the last of the `delete()` calls that we just executed.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.revert()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.count()
```
Be aware: calling `revert()` cannot be undone!
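One way to picture this is as a stack of versions: every change pushes a new version, and `revert()` pops the newest one and discards it. The toy class below is a hypothetical model of that idea, not Pixeltable's storage design; it shows that two changes need two reverts and that a reverted version cannot be restored.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]


class VersionedRows:
    """Toy model: every change appends a new version; revert() drops the newest."""

    def __init__(self, rows: list[Row]) -> None:
        self._versions: list[list[Row]] = [list(rows)]

    @property
    def version(self) -> int:
        return len(self._versions) - 1

    def count(self) -> int:
        return len(self._versions[-1])

    def delete(self, where: Optional[Callable[[Row], bool]] = None) -> int:
        current = self._versions[-1]
        # No predicate means every row is removed
        kept = [] if where is None else [r for r in current if not where(r)]
        self._versions.append(kept)
        return len(current) - len(kept)

    def revert(self) -> None:
        if self.version == 0:
            raise RuntimeError('nothing to revert')
        self._versions.pop()  # the discarded version is gone for good


# Made-up rows: ids 0..5, alternating between 2023 and 2024
t = VersionedRows([{'id': i, 'year': 2023 + i % 2} for i in range(6)])
t.delete(where=lambda r: r['year'] == 2024)  # version 1
t.delete()                                   # version 2: table is empty
t.revert()                                   # undoes only the second delete
after_one_revert = t.count()
t.revert()                                   # undoes the first delete
after_two_reverts = t.count()
```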
### Multimodal Data
In addition to the structured data we’ve been exploring so far,
Pixeltable has native support for **media types**: images, video, audio,
and unstructured documents such as pdfs. Media support is one of
Pixeltable’s core capabilities. Here’s an example showing how media data
lives side-by-side with structured data in Pixeltable.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add a new column of type `Image`
eq_t.add_column(map_image=pxt.Image)
eq_t.describe()
```
Added 1823 column values with 0 errors.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Update the row with id == 1002, adding an image to the `map_image` column
eq_t.where(eq_t.id == 1002).update(
{
'map_image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/port-townsend-map.jpeg'
}
)
```
1 row updated, 1 value computed.
Note that in Pixeltable, you can always insert
images into a table by giving the file path or URL of the image (as a
string). It’s not necessary to load the image first; Pixeltable will
manage the loading and caching of images in the background. The same
applies to other media data such as documents and videos.
Pixeltable will also embed image thumbnails in your notebook when you do
a query:
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.where(eq_t.id >= 1000).select(
eq_t.id, eq_t.magnitude, eq_t.location, eq_t.map_image
).head(5)
```
### Directory Hierarchies
So far we’ve only seen an example of a single directory with a table
inside it, but one can also put directories inside other directories, in
whatever fashion makes the most sense for a given application.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.create_dir('fundamentals/subdir')
pxt.create_dir('fundamentals/subdir/subsubdir')
pxt.create_table(
'fundamentals/subdir/subsubdir/my_table', {'my_col': pxt.String}
)
```
Created directory 'fundamentals/subdir'.
Created directory 'fundamentals/subdir/subsubdir'.
Created table 'my\_table'.
### Deleting Columns, Tables, and Directories
`drop_column()`, `drop_table()`, and `drop_dir()` are used to delete
columns, tables, and directories, respectively.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Delete the `contact_email` column
eq_t.drop_column('contact_email')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
eq_t.describe()
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Delete the entire table (cannot be reverted!)
pxt.drop_table('fundamentals/earthquakes')
```
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Delete the entire directory and all its contents, including any nested
# subdirectories (cannot be reverted)
pxt.drop_dir('fundamentals', force=True)
```
## Next Steps
Learn more about working with Pixeltable:
* [Computed
Columns](/tutorials/computed-columns)
* [Queries and
Expressions](/tutorials/queries-and-expressions)
# Agents & MCP
Source: https://docs.pixeltable.com/use-cases/agents-mcp
Build LLM agents in Pixeltable with declarative tool calling, persistent memory tables, and Model Context Protocol server integration.
**Who:** Agent Builders, AI Engineers\
**Output:** Autonomous AI agents with memory and tool use
Build AI agents that can call tools, remember context, and integrate with MCP servers—all backed by Pixeltable's persistent storage and orchestration.
**Declarative Agents:** Instead of imperative control flow, define your agent as a table with computed columns. Each row is a user query; computed columns define the reasoning chain (tool selection → execution → context retrieval → response). Pixeltable handles orchestration, caching, and persistence automatically.
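To make the idea concrete, here is a toy model in plain Python. `MiniTable` and the three step functions are hypothetical stand-ins (no LLM or Pixeltable call is made); the sketch only shows how inserting a row can drive a chain of computed columns, each reading the ones defined before it.

```python
from typing import Any, Callable

Row = dict[str, Any]


class MiniTable:
    """Toy table: computed columns are functions of the row, run in definition order."""

    def __init__(self) -> None:
        self.rows: list[Row] = []
        self._computed: dict[str, Callable[[Row], Any]] = {}

    def add_computed_column(self, name: str, fn: Callable[[Row], Any]) -> None:
        self._computed[name] = fn
        for row in self.rows:  # backfill rows that already exist
            row[name] = fn(row)

    def insert(self, row: Row) -> Row:
        stored = dict(row)
        for name, fn in self._computed.items():  # later columns can read earlier ones
            stored[name] = fn(stored)
        self.rows.append(stored)
        return stored


# Hypothetical stand-ins for tool selection, tool execution, and the final LLM call
def pick_tool(row: Row) -> str:
    return 'news' if 'news' in row['prompt'].lower() else 'stock'


def run_tool(row: Row) -> str:
    return f"[{row['tool']} result for: {row['prompt']}]"


def write_answer(row: Row) -> str:
    return f"Answer based on {row['tool_output']}"


agent = MiniTable()
agent.add_computed_column('tool', pick_tool)
agent.add_computed_column('tool_output', run_tool)
agent.add_computed_column('answer', write_answer)

row = agent.insert({'prompt': 'Any news on solar power?'})
```

The caller only inserts a prompt; the order of the steps is fixed by the column definitions rather than by control flow in application code.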
***
## Agent Capabilities
* Register UDFs and queries as tools that LLMs can invoke
* Store conversation history and retrieved context in tables
* Connect to Model Context Protocol servers for external tools
* Semantic search over documents, images, and more
***
## Data Lifecycle
Wrap any Python code as `@pxt.udf` tools—API calls, web scraping, database queries
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import os
import pixeltable as pxt
import requests
import yfinance as yf

@pxt.udf
def get_latest_news(topic: str) -> str:
    """Fetch latest news using NewsAPI."""
    response = requests.get(
        "https://newsapi.org/v2/everything",
        params={"q": topic, "apiKey": os.environ["NEWS_API_KEY"]}
    )
    articles = response.json().get("articles", [])[:3]
    return "\n".join(f"- {a['title']}" for a in articles)

@pxt.udf
def fetch_financial_data(ticker: str) -> str:
    """Fetch stock data using yfinance."""
    stock = yf.Ticker(ticker)
    info = stock.info
    return f"{info['shortName']}: ${info['currentPrice']}"
```
Writing custom functions
Turn semantic search into callable tools with `@pxt.query`
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.query
def search_documents(query_text: str, user_id: str):
    """Search documents by semantic similarity."""
    sim = chunks.text.similarity(query_text)
    return (
        chunks.where((chunks.user_id == user_id) & (sim > 0.5))
        .order_by(sim, asc=False)
        .select(chunks.text, source_doc=chunks.document, sim=sim)
        .limit(20)
    )

@pxt.query
def search_video_transcripts(query_text: str):
    """Search video transcripts by text."""
    sim = transcript_sentences.text.similarity(query_text)
    return (
        transcript_sentences.where(sim > 0.7)
        .order_by(sim, asc=False)
        .select(transcript_sentences.text, source_video=transcript_sentences.video)
        .limit(20)
    )
```
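Both queries above follow the same pattern: score every candidate, keep those above a threshold, sort by score, and cap the result count. The sketch below reproduces that pattern in plain Python. It assumes cosine similarity as the metric and uses tiny hand-written vectors in place of real embeddings; the `search()` helper and the sample rows are hypothetical.

```python
import math
from typing import Any


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(chunks: list[dict[str, Any]], query_vec: list[float], user_id: str,
           threshold: float = 0.5, limit: int = 20) -> list[dict[str, Any]]:
    # Filter by owner, then score each remaining chunk against the query
    scored = [(cosine(c['embedding'], query_vec), c)
              for c in chunks if c['user_id'] == user_id]
    hits = [(s, c) for s, c in scored if s > threshold]  # similarity cutoff
    hits.sort(key=lambda pair: pair[0], reverse=True)    # best match first
    return [{'text': c['text'], 'sim': s} for s, c in hits[:limit]]


# Made-up chunks with 2-d "embeddings" for illustration
chunks = [
    {'text': 'a', 'user_id': 'u1', 'embedding': [1.0, 0.0]},
    {'text': 'b', 'user_id': 'u1', 'embedding': [1.0, 1.0]},
    {'text': 'c', 'user_id': 'u1', 'embedding': [0.0, 1.0]},
    {'text': 'd', 'user_id': 'u2', 'embedding': [1.0, 0.0]},
]
results = search(chunks, [1.0, 0.0], 'u1')
```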
Combine UDFs, queries, and MCP tools into a single registry
[`pxt.tools()`](/howto/cookbooks/agents/llm-tool-calling)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Register tools from multiple sources
tools = pxt.tools(
# UDFs - External API Calls
get_latest_news,
fetch_financial_data,
# Query Functions - Agentic RAG
search_documents,
search_video_transcripts,
)
```
Complete tool calling walkthrough
Define the workflow as a table with computed columns
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Main workflow table - rows trigger the agent pipeline
agent = pxt.create_table('agents.workflow', {
'prompt': pxt.String,
'timestamp': pxt.Timestamp,
'user_id': pxt.String,
'system_prompt': pxt.String,
'max_tokens': pxt.Int,
'temperature': pxt.Float,
})
```
First LLM call decides which tool to use
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.anthropic import messages, invoke_tools
# Step 1: LLM selects which tool to call
agent.add_computed_column(
initial_response=messages(
model='claude-sonnet-4-20250514',
messages=[{'role': 'user', 'content': agent.prompt}],
max_tokens=agent.max_tokens,
tools=tools, # Available tools
tool_choice=tools.choice(required=True), # Force tool selection
model_kwargs={'system': agent.system_prompt}
)
)
```
Pixeltable executes the selected tool automatically
[`invoke_tools()`](/howto/cookbooks/agents/llm-tool-calling)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Step 2: Execute the tool the LLM chose
agent.add_computed_column(
tool_output=invoke_tools(tools, agent.initial_response)
)
```
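The dispatch step can be pictured as a lookup from tool name to function. The sketch below is a plain-Python approximation: the response shape is modeled on the `tool_use` content blocks of the Anthropic Messages API, while the `invoke()` helper, the two example tools, and the shape of the returned dict are assumptions made for illustration and do not describe the internals of `invoke_tools()`.

```python
from typing import Any, Callable


def get_weather(city: str) -> str:  # hypothetical tool
    return f'Sunny in {city}'


def add(a: int, b: int) -> int:  # hypothetical tool
    return a + b


# Registry mapping each tool's name to its function
TOOLS: dict[str, Callable[..., Any]] = {fn.__name__: fn for fn in (get_weather, add)}


def invoke(tools: dict[str, Callable[..., Any]],
           response: dict[str, Any]) -> dict[str, list[Any]]:
    """Run every tool_use block in `response` and group the results by tool name."""
    results: dict[str, list[Any]] = {}
    for block in response.get('content', []):
        if block.get('type') != 'tool_use':
            continue  # plain text blocks carry no tool call
        fn = tools[block['name']]
        results.setdefault(block['name'], []).append(fn(**block['input']))
    return results


# A hand-written response containing one text block and one tool call
response = {'content': [
    {'type': 'text', 'text': 'Let me check.'},
    {'type': 'tool_use', 'id': 'toolu_01', 'name': 'add', 'input': {'a': 2, 'b': 3}},
]}
output = invoke(TOOLS, response)
```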
Combine tool output with retrieved context
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Parallel context retrieval (Pixeltable handles this).
# `search_images` and `assemble_context` are user-defined helpers not shown here.
agent.add_computed_column(doc_context=search_documents(agent.prompt, agent.user_id))
agent.add_computed_column(image_context=search_images(agent.prompt, agent.user_id))
agent.add_computed_column(memory_context=search_memory(agent.prompt, agent.user_id))
# Assemble everything into final context
agent.add_computed_column(
final_context=assemble_context(
agent.prompt,
agent.tool_output,
agent.doc_context,
agent.memory_context,
)
)
```
Second LLM call generates the answer with full context
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Step 3: Generate final answer with all context
agent.add_computed_column(
final_response=messages(
model='claude-sonnet-4-20250514',
messages=agent.final_context,
max_tokens=agent.max_tokens,
model_kwargs={'system': agent.system_prompt}
)
)
# Extract answer text
agent.add_computed_column(
answer=agent.final_response.content[0].text
)
```
Complete walkthrough
Load tools from any MCP-compatible server
[`pxt.mcp_udfs()`](/howto/cookbooks/agents/llm-tool-calling)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Load tools from MCP server
mcp_tools = pxt.mcp_udfs('http://localhost:8000/mcp')
# Combine with local tools
all_tools = pxt.tools(
get_latest_news,
fetch_financial_data,
search_documents,
*mcp_tools # Add MCP tools
)
```
MCP server for Claude, Cursor, and AI IDEs
Expose Pixeltable tables as MCP tools for AI IDEs
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Example: JFK Files MCP Server
# Exposes document search to Claude Desktop, Cursor, etc.
from mcp.server.fastmcp import FastMCP
import pixeltable as pxt

server = FastMCP("jfk-files")

@server.tool()
def search_jfk_documents(query: str) -> str:
    """Search declassified JFK documents."""
    docs = pxt.get_table('jfk.documents')
    sim = docs.content.similarity(query)
    results = docs.order_by(sim, asc=False).limit(5).collect()
    return "\n".join(r['content'] for r in results)
```
Example MCP server with document search
Store conversation turns with semantic search
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.huggingface import sentence_transformer

# Chat history with embedding index
chat_history = pxt.create_table('agents.chat_history', {
'role': pxt.String, # 'user' or 'assistant'
'content': pxt.String,
'timestamp': pxt.Timestamp,
'user_id': pxt.String
})
chat_history.add_embedding_index(
'content',
string_embed=sentence_transformer.using(model_id='all-MiniLM-L6-v2')
)
# Recent history query
@pxt.query
def get_recent_chat_history(user_id: str, limit: int = 4):
    return (
        chat_history.where(chat_history.user_id == user_id)
        .order_by(chat_history.timestamp, asc=False)
        .select(role=chat_history.role, content=chat_history.content)
        .limit(limit)
    )

# Semantic search over all history
@pxt.query
def search_chat_history(query_text: str, user_id: str):
    sim = chat_history.content.similarity(query_text)
    return (
        chat_history.where((chat_history.user_id == user_id) & (sim > 0.8))
        .order_by(sim, asc=False)
        .select(role=chat_history.role, content=chat_history.content, sim=sim)
        .limit(10)
    )
```
Persistent conversation context
Store user-saved snippets (code, text, facts) for recall
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Selective memory - things the user explicitly saves
memory_bank = pxt.create_table('agents.memory_bank', {
'content': pxt.String,
'type': pxt.String, # 'code', 'text', 'fact'
'language': pxt.String, # For code: 'python', 'javascript', etc.
'context_query': pxt.String, # What triggered this save
'timestamp': pxt.Timestamp,
'user_id': pxt.String
})
memory_bank.add_embedding_index('content', string_embed=embed_fn)
@pxt.query
def search_memory(query_text: str, user_id: str):
    sim = memory_bank.content.similarity(query_text)
    return (
        memory_bank.where((memory_bank.user_id == user_id) & (sim > 0.8))
        .order_by(sim, asc=False)
        .select(
            content=memory_bank.content,
            type=memory_bank.type,
            language=memory_bank.language,
            context_query=memory_bank.context_query,
        )
        .limit(10)
    )
```
Index documents, images, video, and audio for retrieval
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Documents with chunking
documents = pxt.create_table('agents.collection', {
'document': pxt.Document,
'uuid': pxt.String,
'user_id': pxt.String
})
chunks = pxt.create_view('agents.chunks', documents,
iterator=document_splitter(
document=documents.document,
separators='paragraph',
metadata='title, heading, page'
)
)
chunks.add_embedding_index('text', string_embed=embed_fn)
# Images with CLIP
images = pxt.create_table('agents.images', {
'image': pxt.Image,
'user_id': pxt.String
})
images.add_embedding_index('image', embedding=clip.using(model_id='openai/clip-vit-large-patch14'))
# Video frames
videos = pxt.create_table('agents.videos', {'video': pxt.Video, 'user_id': pxt.String})
video_frames = pxt.create_view('agents.video_frames', videos,
iterator=frame_iterator(video=videos.video, fps=1)
)
video_frames.add_embedding_index('frame', embedding=clip.using(model_id='openai/clip-vit-large-patch14'))
```
Document retrieval patterns
Expose your agent via HTTP API
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from flask import Flask, request
from datetime import datetime
import pixeltable as pxt
app = Flask(__name__)
agent = pxt.get_table('agents.workflow')
chat_history = pxt.get_table('agents.chat_history')
@app.route("/chat", methods=["POST"])
def chat():
    data = request.json
    user_id = data["user_id"]
    prompt = data["message"]

    # Store user message
    chat_history.insert([{
        "role": "user",
        "content": prompt,
        "timestamp": datetime.now(),
        "user_id": user_id
    }])

    # Trigger agent workflow (computed columns run automatically)
    agent.insert([{
        "prompt": prompt,
        "timestamp": datetime.now(),
        "user_id": user_id,
        "system_prompt": "You are a helpful assistant.",
        "max_tokens": 1024,
        "temperature": 0.7,
    }])

    # Get the answer (already computed)
    result = agent.order_by(agent.timestamp, asc=False).limit(1).collect()
    answer = result[0]["answer"]

    # Store assistant response
    chat_history.insert([{
        "role": "assistant",
        "content": answer,
        "timestamp": datetime.now(),
        "user_id": user_id
    }])
    return {"response": answer}
```
Production deployment patterns
Serve your agent via `pxt serve` (available now) or deploy to Pixeltable Cloud with `pxt deploy` (coming soon).
Expose tables and queries as HTTP endpoints
***
## Built with Pixeltable
* Multimodal AI agent with infinite memory, file search, and image generation
* Lightweight agent framework with built-in memory and tool orchestration
* Persistent memory layer for AI applications
* Model Context Protocol server for Claude, Cursor, and AI IDEs
***
## Related Cookbooks
* Complete guide to `pxt.tools()` and `invoke_tools()`
* Persistent conversation context patterns
* Retrieval-augmented generation workflow
* Use tables as callable functions
# Backend for AI Apps
Source: https://docs.pixeltable.com/use-cases/ai-applications
Build multimodal AI applications on Pixeltable with declarative pipelines that combine images, video, audio, documents, and language data.
**Who:** AI/App Developers\
**Output:** AI-powered application
Add multimodal intelligence to applications with two deployment patterns.
**Same foundation, different intent:** This workflow uses the same Pixeltable capabilities as [Data Wrangling for ML](/use-cases/ml-data-wrangling) — tables, multimodal types, computed columns, iterators. The difference is the output: training datasets vs. live application intelligence.
***
## Data Lifecycle
Define schema with native multimodal types — Pixeltable handles storage and references
[`create_table()`](/tutorials/tables-and-data-operations), [`pxt.Image`](/platform/type-system), [`pxt.Video`](/platform/type-system), [`pxt.Audio`](/platform/type-system), [`pxt.Document`](/platform/type-system), [`pxt.Json`](/platform/type-system)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
# Native multimodal types
t = pxt.create_table('app.docs', {
'pdf': pxt.Document,
'metadata': pxt.Json
})
```
Create tables and manage data
Image, Video, Audio, Document, JSON & more
Load from any source — local files, URLs, cloud storage, or databases
[`insert()`](/tutorials/tables-and-data-operations), [`import_csv()`](/sdk/latest/io), [S3/GCS/Azure](/integrations/cloud-storage)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert with URLs, local paths, or direct upload
t.insert([
{'pdf': 'https://example.com/report.pdf'},
{'pdf': '/local/path/to/doc.pdf'},
{'pdf': 's3://bucket/documents/spec.pdf'}
])
```
Load from cloud storage
S3, GCS, Azure, R2 configuration
Create UDFs and computed columns — they auto-update when data changes
[`@pxt.udf`](/platform/udfs-in-pixeltable), [`@pxt.query`](/platform/udfs-in-pixeltable), [`add_computed_column()`](/tutorials/computed-columns)
Write custom functions in Python
Auto-update derived data
Extract frames, transcribe audio, chunk documents
[`frame_iterator()`](/platform/iterators), [`document_splitter()`](/platform/iterators), [`AudioSplitter`](/platform/iterators)
Process video into searchable frames
Audio to text with Whisper
Add embedding indexes with **incremental sync** — only new/changed rows are embedded
[`add_embedding_index()`](/platform/embedding-indexes)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add index once — auto-updates on insert
docs.add_embedding_index('content', string_embed=e5_embed)
```
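Incremental sync can be pictured as a cache keyed by row: a row is embedded only when it is new or its text has changed. The toy index below illustrates that bookkeeping in plain Python. `IncrementalIndex` and `toy_embed` are hypothetical and stand in for a real embedding model; this is a sketch of the idea, not Pixeltable's index implementation.

```python
from typing import Callable


class IncrementalIndex:
    """Toy index that re-embeds a row only when it is new or its text changed."""

    def __init__(self, embed: Callable[[str], list[float]]) -> None:
        self._embed = embed
        self._entries: dict[int, tuple[str, list[float]]] = {}
        self.embed_calls = 0  # counts how often the (expensive) model is invoked

    def sync(self, rows: dict[int, str]) -> None:
        for row_id, text in rows.items():
            entry = self._entries.get(row_id)
            if entry is None or entry[0] != text:
                self._entries[row_id] = (text, self._embed(text))
                self.embed_calls += 1
        for row_id in list(self._entries):
            if row_id not in rows:  # drop entries for deleted rows
                del self._entries[row_id]


def toy_embed(text: str) -> list[float]:
    # Placeholder "embedding": text length and number of spaces
    return [float(len(text)), float(text.count(' '))]


index = IncrementalIndex(toy_embed)
rows = {1: 'first doc', 2: 'second doc'}
index.sync(rows)
calls_initial = index.embed_calls      # both rows embedded

rows[3] = 'third doc'
index.sync(rows)
calls_after_insert = index.embed_calls  # only the new row embedded

rows[1] = 'first doc, edited'
index.sync(rows)
calls_after_edit = index.embed_calls    # only the changed row re-embedded
```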
Configure and query indexes
Use OpenAI embedding models
Define `@pxt.query` functions that return data from your tables
[`@pxt.query`](/platform/udfs-in-pixeltable)
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
@pxt.query
def get_image(image_id: str):
    return (
        images.where(images.uuid == image_id)
        .select(images.image)
        .limit(1)
    )
# Use in computed columns or API endpoints
t.add_computed_column(thumbnail=get_image(t.image_id))
```
Reusable parameterized queries
Find relevant content by meaning, not keywords
[`.similarity()`](/platform/embedding-indexes), `.order_by()`, `.where()`, `.collect()`
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sim = images.image.similarity(query)
results = images.order_by(sim, asc=False).select(
uuid=images.uuid,
url=images.image.fileurl
).limit(10).collect()
```
Search documents by meaning
Find visually similar images
Expose Pixeltable functions as LLM tools for agents
[`pxt.tools()`](/howto/cookbooks/agents/llm-tool-calling), [`invoke_tools()`](/howto/cookbooks/agents/llm-tool-calling)
LLM agents with function calling
Persistent conversation context
Expose tables and queries as HTTP endpoints with a TOML config or a single CLI command
[`pxt serve`](/howto/deployment/serving), [`FastAPIRouter`](/howto/deployment/serving#quickstart-python)
```toml theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# service.toml
[[service]]
name = "image-service"
port = 8000
[[service.routes]]
type = "insert"
table = "app/images"
path = "/upload"
uploadfile_inputs = ["image"]
outputs = ["image", "caption"]
[[service.routes]]
type = "query"
path = "/search"
query = "app.queries.search_images"
```
```bash theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt serve image-service --config service.toml
```
TOML config, CLI, Python API, background jobs
Full backend vs. orchestration layer
For custom logic, middleware, or authentication, use Flask, FastAPI, or any Python web framework
`pxt.get_table()`, `.insert()`, `.select()`, `.collect()`
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from flask import Flask, request
import pixeltable as pxt
app = Flask(__name__)
images = pxt.get_table("app.images")
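@app.route("/api/search", methods=["POST"])
def search():
    query = request.form.get("q")
    sim = images.image.similarity(query)
    # Select JSON-serializable fields; a raw result set is not a valid Flask response
    results = images.order_by(sim, asc=False).select(
        uuid=images.uuid, url=images.image.fileurl
    ).limit(10).collect()
    return {"results": list(results)}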
```
Concurrency, error handling, sync endpoints
Full Flask app with file upload & search
Get pre-signed URLs for media files stored in cloud storage
`.fileurl`, pre-signed URLs for S3/GCS/Tigris
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
url = row["image"].fileurl
presigned = s3.generate_presigned_url(
"get_object",
Params={"Bucket": bucket, "Key": key},
ExpiresIn=3600,
)
```
S3, GCS, Azure, R2, Tigris configuration
***
## Deployment Patterns
**When:** Keep existing RDBMS + blob storage
Pixeltable processes media, runs models, then exports results to your existing systems.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.io.sql import export_sql
# Process in Pixeltable with media stored directly to S3/GCS/Azure
videos.add_computed_column(
thumbnail=videos.frame.resize((256, 256)),
destination='s3://my-bucket/thumbnails/'
)
# Export structured results to serving DB
export_sql(
videos.select(videos.video, videos.transcript),
'video_metadata',
db_connect_str='postgresql+psycopg://...',
if_exists='replace',
)
```
Process with computed columns, export with `export_sql`
**When:** Need versioning, lineage, and retrieval (RAG) from same system
Pixeltable persists everything—use it as your primary data backend with automatic versioning.
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Everything in one place: storage + compute + retrieval
docs.add_computed_column(chunks=document_splitter(docs.pdf))
docs.add_embedding_index('chunks', string_embed=e5_embed)
# Query with full lineage
sim = docs.chunks.similarity(query)
results = docs.order_by(sim, asc=False).limit(10).collect()
```
Versioning, lineage, and retrieval in one system
***
## End-to-End Examples
* Multimodal AI agent with memory, file search, and image generation
* Next.js + FastAPI app for text & image search
* Retrieval-augmented generation workflow
**More sample apps:** Check out the [sample-apps directory](https://github.com/pixeltable/pixeltable/tree/main/docs/sample-apps) for chat applications, multimodal search, and more.
# Get Started with Data Sharing
Source: https://docs.pixeltable.com/use-cases/get-started
Get started with Pixeltable Cloud to explore, share, and collaborate on multimodal AI datasets and tabular pipelines in a hosted workspace.
## Overview
Build and share multimodal AI datasets without managing infrastructure. Work with your images, videos, audio, and documents through a unified Python API - process them with AI models, create embeddings, and publish your results for team collaboration or public research.
## Quick Start
**Requirements:** Pixeltable >= 0.6.2
**Replicate a dataset:**
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
coco_copy = pxt.replicate(
remote_uri='pxt://pixeltable:fiftyone/coco_mini_2017',
local_path='coco-copy'
)
```
Replicas are read-only locally, but you can query them, perform similarity searches, update them with `pull()`, or create independent copies.
**Publish your datasets** (requires account and API key from [pixeltable.com](https://pixeltable.com/)):
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
pxt.publish(
source='my-table',
destination_uri='pxt://username/my-dataset'
)
```
After publishing, use `push()` to update the cloud replica with local changes. Access defaults to private; add `access='public'` to make it publicly accessible.
Learn more in the [Data Sharing Guide](/platform/data-sharing).
## Resources
* Get real-time help from our community
* Report issues and contribute code
* Browse our documentation
* Schedule time with our team
# Data Wrangling for ML
Source: https://docs.pixeltable.com/use-cases/ml-data-wrangling
Wrangle video, audio, documents, and images into ML-ready datasets with Pixeltable computed columns, iterators, and embedding indices.
**Who:** ML Engineers, Data Scientists\
**Output:** Training/evaluation datasets
**Pixeltable is your system of record**—all data, cached results, and references stay in sync.
***
## Data Lifecycle
Load from any source: [`import_csv()`](/sdk/latest/io#func-import_csv), [`import_parquet()`](/sdk/latest/io#func-import_parquet), [HuggingFace](/howto/cookbooks/data/data-import-huggingface), [S3/GCS/Azure](/integrations/cloud-storage), RDBMS via Python DB API
Load images/videos from cloud storage
Load datasets from HuggingFace Hub
Statistics & sampling: [`select()`](/tutorials/queries-and-expressions), [`.sample()`](/howto/cookbooks/data/data-sampling), `.head()`
Sample and filter large datasets efficiently
Transform & extract: [`add_computed_column()`](/tutorials/computed-columns), [`FrameIterator`](/platform/iterators), [`DocumentSplitter`](/platform/iterators)
Process video into frame-level data
Audio to text with Whisper
**Model-in-the-loop:** Auto-generate labels with AI models
* **Object Detection:** [`yolox.yolox()`](/sdk/latest/yolox), [`huggingface.detr_for_object_detection()`](/sdk/latest/huggingface)
* **Vision LLMs:** [`openai.chat_completions()`](/sdk/latest/openai), [`anthropic.messages()`](/sdk/latest/anthropic), [`gemini.generate_content()`](/sdk/latest/gemini)
* **Classification:** [`huggingface.image_classification()`](/sdk/latest/huggingface)
Run YOLOX detection on images
Analyze images with GPT-4o
**Human-in-the-loop:** Refine labels with human annotators
[Label Studio](/howto/using-label-studio-with-pixeltable) sync, [FiftyOne](/howto/working-with-fiftyone) export, [`add_embedding_index()`](/platform/embedding-indexes) for curation search
Sync annotations bidirectionally
Visualize and curate datasets
**Model-in-the-loop vs Human-in-the-loop:** Use pre-annotation to generate initial labels with AI models, then refine with human annotators. Pixeltable keeps both in sync—model outputs and human corrections live in the same table.
Find similar examples with embedding search, filter by quality metrics
[`add_embedding_index()`](/platform/embedding-indexes), [`.similarity()`](/platform/embedding-indexes), `.where()`, `.order_by()`
Find visually similar samples
Search by meaning, not keywords
**Test transformations before committing:** Run a `select()` query to preview results on a sample of rows before adding computed columns
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Test on 5 rows first (no storage cost)
t.select(t.image, new_label=my_classifier(t.image)).head(5)
# Happy? Commit to full dataset
t.add_computed_column(new_label=my_classifier(t.image))
```
Test UDFs and expressions before committing
Version control: [`create_snapshot()`](/platform/version-control), [`create_view()`](/platform/views), [`history()`](/platform/version-control), lineage tracking
Track changes and revert to previous states
**Why curate?** ML models are only as good as their training data. Use Pixeltable's search and filtering to find edge cases, remove duplicates, balance classes, and iterate on your data quality before export.
Publish to cloud: [`publish()`](/platform/data-sharing), [`replicate()`](/platform/data-sharing), `push()`, `pull()`
Collaborate with your team via cloud replicas
Training and data formats: [`export_csv()`](/sdk/latest/io#func-export_csv), [`export_json()`](/sdk/latest/io#func-export_json), [`export_parquet()`](/sdk/latest/io#func-export_parquet), [`to_pytorch_dataset()`](/sdk/latest/query#method-to_pytorch_dataset), [`to_coco_dataset()`](/sdk/latest/query#method-to_coco_dataset), [`export_lancedb()`](/sdk/latest/io#func-export_lancedb)
Convert to PyTorch DataLoader format
All import/export formats
***
## End-to-End Examples
Complete workflow: ingest video → extract frames → detect objects → export
Transcribe and analyze audio at scale
Extract structured data from images with GPT-4o
Auto-generate image descriptions
# Cloud Offering
Source: https://docs.pixeltable.com/use-cases/services
Share data, serve endpoints, and collaborate on multimodal AI workflows with Pixeltable Cloud services for teams and production deployments.
Pixeltable Cloud extends the local SDK with team collaboration and production deployment capabilities.
***
## 1. Publish & Replicate ✅ Available Now
| Feature | API |
| ------------------ | ----------------------------------------------------------------- |
| Publish datasets | [`pxt.publish(source, destination_uri)`](/platform/data-sharing) |
| Replicate datasets | [`pxt.replicate(remote_uri, local_path)`](/platform/data-sharing) |
| Sync updates | `push()`, `pull()` |
| Access control | `access='public'` or `'private'` |
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Publish your curated dataset
pxt.publish(source='my-table', destination_uri='pxt://myorg/my-dataset')
# Anyone can replicate public datasets (no account required)
coco = pxt.replicate(remote_uri='pxt://pixeltable:fiftyone/coco_mini_2017', local_path='coco-copy')
```
Full documentation on publish, replicate, push, and pull
***
## 2. Endpoints
Self-hosted HTTP serving is available now. Cloud deployment with managed hosting is coming soon.
| Feature | Status | What |
| -------------------------------------------------------------- | -------------- | ---------------------------------------------------------- |
| [`pxt serve`](/howto/deployment/serving) | ✅ Available | TOML-configured HTTP endpoints with CLI |
| [`FastAPIRouter`](/howto/deployment/serving#quickstart-python) | ✅ Available | Declarative routes in Python, drop-in `APIRouter` subclass |
| Background jobs | ✅ Available | `background=True` on any route |
| `pxt deploy` | 🔜 Coming Soon | Cloud deployment with auto-scaling |
| Pre-signed URLs | 🔜 Coming Soon | Media access without proxying |
TOML config, CLI, Python API, and background jobs for self-hosted deployments
[Join the waitlist](https://www.pixeltable.com/waitlist) to get early access to cloud-managed Endpoints.
***
## 3. Cloud Storage ✅ Available Now
Every Pixeltable Cloud account includes a free managed storage bucket for media files. No cloud provider account or bucket configuration required.
| Feature | What |
| ---------------- | --------------------------------------------------------- |
| Home bucket | Free R2-backed storage for computed and input media |
| Auto credentials | Temporary credentials fetched and refreshed automatically |
| Quota management | Built-in storage limits with clear error messages |
```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Use your free cloud bucket as a media destination
t.add_computed_column(
    thumbnail=t.photo.resize((256, 256)),
    destination='pxtfs://myorg:mydb/home'
)
```
Configuration guide for the free managed bucket
***
## 4. Live Tables 🔜 Coming Soon
Multi-writer collaboration and serverless compute.
| Feature | What |
| ------------------ | -------------------------------------- |
| Multi-writer | Team collaboration on shared tables |
| Serverless compute | Auto-scaling without infrastructure |
| UDF versioning | Safe experimentation with code changes |
| RBAC + audit | Governance and compliance |
***
## Unified: Where It's All Going
When all of these cloud services are available, the two use cases converge:
* **Data wrangling + AI pipelines + endpoints = one system**
* **Orchestration + storage + retrieval unified**
* **Table becomes the endpoint**
Your training datasets and production APIs share the same infrastructure—versioning, lineage, and retrieval in the serving path.
Schedule time with our team to discuss your use case