# Changelog
Source: https://docs.pixeltable.com/changelog/changelog
Release history and updates for Pixeltable
## Contributors
Pixeltable is built by a vibrant community of contributors. We're grateful for everyone who has helped make Pixeltable better!
**Want to contribute?** Check out our [Contributing Guide](https://github.com/pixeltable/pixeltable/tree/main?tab=contributing-ov-file#readme) to get started.
**Top Contributors:** View our top contributors on [GitHub](https://github.com/pixeltable/pixeltable/graphs/contributors).
***
## Release History
View the complete release history for Pixeltable below. Each release includes detailed information about new features, bug fixes, and improvements.
For the latest release information, visit our [GitHub Releases page](https://github.com/pixeltable/pixeltable/releases).
***
### v0.5.20
**Released:** March 03, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.20](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.20)
#### What's Changed
* Perftest to log if it thinks that it's running in CI by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1163](https://github.com/pixeltable/pixeltable/pull/1163)
* \[PXT-1002] re-enable force replace view in random ops by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1166](https://github.com/pixeltable/pixeltable/pull/1166)
* \[PXT-1002] Fix table md caching when an insert finalizes view creation by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1138](https://github.com/pixeltable/pixeltable/pull/1138)
* Add missing %pip install to custom-iterators.ipynb by [@aaron-siegel](https://github.com/aaron-siegel) in [#1171](https://github.com/pixeltable/pixeltable/pull/1171)
* Add migration guides for new users coming from common stacks by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1167](https://github.com/pixeltable/pixeltable/pull/1167)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.19...v0.5.20](https://github.com/pixeltable/pixeltable/compare/v0.5.19...v0.5.20)
***
### v0.5.19
**Released:** March 01, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.19](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.19)
#### What's Changed
* Add local docs serving instructions to contributing guide by [@apreshill](https://github.com/apreshill) in [#1054](https://github.com/pixeltable/pixeltable/pull/1054)
* TableOp refactoring so that TableVersion is not required for some ops by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1153](https://github.com/pixeltable/pixeltable/pull/1153)
* `@pxt.iterator` decorator by [@aaron-siegel](https://github.com/aaron-siegel) in [#1111](https://github.com/pixeltable/pixeltable/pull/1111)
* Docs: add missing integrations, SDK entries, and cookbook updates for v0.5.11–v0.5.18 by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1158](https://github.com/pixeltable/pixeltable/pull/1158)
* Quieter CI output by [@aaron-siegel](https://github.com/aaron-siegel) in [#1161](https://github.com/pixeltable/pixeltable/pull/1161)
* \[PXT-1002] Make non-transactional TableOps idempotent by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1139](https://github.com/pixeltable/pixeltable/pull/1139)
* \[PXT-1043] Support video embeddings in VoyageAI by [@aaron-siegel](https://github.com/aaron-siegel) in [#1160](https://github.com/pixeltable/pixeltable/pull/1160)
* PXT-877 Fix: if\_exists='replace' could not be used to replace a Table with a View/Snapshot or vice versa by [@christopherpestano](https://github.com/christopherpestano) in [#1150](https://github.com/pixeltable/pixeltable/pull/1150)
* PXT-1020: support for multi-threaded API calls by [@mkornacker](https://github.com/mkornacker) in [#1155](https://github.com/pixeltable/pixeltable/pull/1155)
* Fix TableVersion.is\_iterator\_column by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1159](https://github.com/pixeltable/pixeltable/pull/1159)
* PXT-933 Support videos in gemini generate\_content by [@amithadke](https://github.com/amithadke) in [#1152](https://github.com/pixeltable/pixeltable/pull/1152)
* \[PXT-1018] Add a "source" field to list of columns in t.describe() by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1135](https://github.com/pixeltable/pixeltable/pull/1135)
* uvloop compatibility by [@mkornacker](https://github.com/mkornacker) in [#1164](https://github.com/pixeltable/pixeltable/pull/1164)
* docs: update deployment guides for thread safety, sync endpoints, and uvloop by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1165](https://github.com/pixeltable/pixeltable/pull/1165)
* Add Bedrock API Key auth support and notebook outputs by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1146](https://github.com/pixeltable/pixeltable/pull/1146)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.18...v0.5.19](https://github.com/pixeltable/pixeltable/compare/v0.5.18...v0.5.19)
***
### v0.5.18
**Released:** February 24, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.18](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.18)
#### What's Changed
* misc improvements in the code by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1072](https://github.com/pixeltable/pixeltable/pull/1072)
* \[PXT-995] improve test migration coverage of literals of various types by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1128](https://github.com/pixeltable/pixeltable/pull/1128)
* Twelvelabs notebook update by [@apreshill](https://github.com/apreshill) in [#1117](https://github.com/pixeltable/pixeltable/pull/1117)
* \[PXT-1040] Temporarily disable twelvelabs nb test by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1140](https://github.com/pixeltable/pixeltable/pull/1140)
* Update contribution guidelines regarding AI-generated code by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1134](https://github.com/pixeltable/pixeltable/pull/1134)
* \[PXT-1007 + PXT-1010] Modifying add\_columns to support column metadata and introducing standard ColumnSpec by [@christopherpestano](https://github.com/christopherpestano) in [#1119](https://github.com/pixeltable/pixeltable/pull/1119)
* Adding negative\_prompt to img2img notebook by [@christopherpestano](https://github.com/christopherpestano) in [#1136](https://github.com/pixeltable/pixeltable/pull/1136)
* \[PXT-1040] disable all twelvelabs tests by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1142](https://github.com/pixeltable/pixeltable/pull/1142)
* PXT-1039: video\_splitter(mode='accurate') doesn't work by [@mkornacker](https://github.com/mkornacker) in [#1145](https://github.com/pixeltable/pixeltable/pull/1145)
* PXT-966: crop() udf for videos by [@mkornacker](https://github.com/mkornacker) in [#1144](https://github.com/pixeltable/pixeltable/pull/1144)
* dumps() udf for json by [@mkornacker](https://github.com/mkornacker) in [#1149](https://github.com/pixeltable/pixeltable/pull/1149)
* Fixes for recent versions of mintlify by [@aaron-siegel](https://github.com/aaron-siegel) in [#1151](https://github.com/pixeltable/pixeltable/pull/1151)
* \[PXT-1003] Add offset parameter to limit() queries for pagination by [@aaron-siegel](https://github.com/aaron-siegel) in [#1148](https://github.com/pixeltable/pixeltable/pull/1148)
* Add agentic patterns cookbook by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1141](https://github.com/pixeltable/pixeltable/pull/1141)
* PXT-985 + PXT-1041 - Adding custom\_metadata and comment for columns by [@christopherpestano](https://github.com/christopherpestano) in [#1132](https://github.com/pixeltable/pixeltable/pull/1132)
* Fix: Implement drop\_index() for BtreeIndex and EmbeddingIndex by [@KeeProMise](https://github.com/KeeProMise) in [#1133](https://github.com/pixeltable/pixeltable/pull/1133)
* Update OpenAI vision and image gen APIs to make proper use of images in dicts by [@aaron-siegel](https://github.com/aaron-siegel) in [#1147](https://github.com/pixeltable/pixeltable/pull/1147)
* \[PXT-995] Literal should serialize its entire type info by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1123](https://github.com/pixeltable/pixeltable/pull/1123)
#### New Contributors
* [@KeeProMise](https://github.com/KeeProMise) made their first contribution in [#1133](https://github.com/pixeltable/pixeltable/pull/1133)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.17...v0.5.18](https://github.com/pixeltable/pixeltable/compare/v0.5.17...v0.5.18)
***
### v0.5.17
**Released:** February 10, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.17](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.17)
#### What's Changed
* Standardize names for runner configs by [@aaron-siegel](https://github.com/aaron-siegel) in [#1122](https://github.com/pixeltable/pixeltable/pull/1122)
* Add Jina AI integration for embeddings and reranking by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1029](https://github.com/pixeltable/pixeltable/pull/1029)
* Add Microsoft Fabric Integration for Azure OpenAI by [@pawarbi](https://github.com/pawarbi) in [#1109](https://github.com/pixeltable/pixeltable/pull/1109)
* Switch away from gemini-2.0 models by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1115](https://github.com/pixeltable/pixeltable/pull/1115)
* PXT-985 Adding custom\_metadata and restricting comment field to string by [@christopherpestano](https://github.com/christopherpestano) in [#1102](https://github.com/pixeltable/pixeltable/pull/1102)
* Nightly CI fix by [@aaron-siegel](https://github.com/aaron-siegel) in [#1129](https://github.com/pixeltable/pixeltable/pull/1129)
* PXT-1033: handle min\_segment\_duration=None correctly in VideoSplitter by [@mkornacker](https://github.com/mkornacker) in [#1131](https://github.com/pixeltable/pixeltable/pull/1131)
* Apply ruff formatting to code snippets in docstrings by [@aaron-siegel](https://github.com/aaron-siegel) in [#1125](https://github.com/pixeltable/pixeltable/pull/1125)
* Improved treatment of stored UDFs by [@aaron-siegel](https://github.com/aaron-siegel) in [#1126](https://github.com/pixeltable/pixeltable/pull/1126)
* PXT-1023: Support for ragged arrays in export\_parquet() by [@mkornacker](https://github.com/mkornacker) in [#1124](https://github.com/pixeltable/pixeltable/pull/1124)
#### New Contributors
* [@pawarbi](https://github.com/pawarbi) made their first contribution in [#1109](https://github.com/pixeltable/pixeltable/pull/1109)
* [@christopherpestano](https://github.com/christopherpestano) made their first contribution in [#1102](https://github.com/pixeltable/pixeltable/pull/1102)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.16...v0.5.17](https://github.com/pixeltable/pixeltable/compare/v0.5.16...v0.5.17)
***
### v0.5.16
**Released:** February 04, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.16](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.16)
#### What's Changed
* PXT-898 Allow Pixeltable API key to change in the environment mid-stream in a Python session by [@amithadke](https://github.com/amithadke) in [#1060](https://github.com/pixeltable/pixeltable/pull/1060)
* various runwayml followups by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1095](https://github.com/pixeltable/pixeltable/pull/1095)
* Ensure progress bar stops on empty results and plan exit by [@amithadke](https://github.com/amithadke) in [#1097](https://github.com/pixeltable/pixeltable/pull/1097)
* Fix exception handling in catalog by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1101](https://github.com/pixeltable/pixeltable/pull/1101)
* Migrate docs to `uuid7()` UDF by [@apreshill](https://github.com/apreshill) in [#1093](https://github.com/pixeltable/pixeltable/pull/1093)
* Add retries to Python install in CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#1094](https://github.com/pixeltable/pixeltable/pull/1094)
* fix: Make notebook outputs visible in dark mode by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1107](https://github.com/pixeltable/pixeltable/pull/1107)
* various improvements to random-ops script by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1084](https://github.com/pixeltable/pixeltable/pull/1084)
* Prep work for iterator refactor: Add media types and iterators to migration test by [@aaron-siegel](https://github.com/aaron-siegel) in [#1103](https://github.com/pixeltable/pixeltable/pull/1103)
* Add export media to s3 to io cookbooks in docs by [@apreshill](https://github.com/apreshill) in [#1088](https://github.com/pixeltable/pixeltable/pull/1088)
* Include audio\_splitter and video\_splitter in db dumps by [@aaron-siegel](https://github.com/aaron-siegel) in [#1110](https://github.com/pixeltable/pixeltable/pull/1110)
* PXT-965 Support http url and blob store uri for creating json/parquet/csv tables by [@amithadke](https://github.com/amithadke) in [#1104](https://github.com/pixeltable/pixeltable/pull/1104)
* Fixes for Pandas 3.0 by [@aaron-siegel](https://github.com/aaron-siegel) in [#1112](https://github.com/pixeltable/pixeltable/pull/1112)
* Upgrade ruff to latest by [@aaron-siegel](https://github.com/aaron-siegel) in [#1114](https://github.com/pixeltable/pixeltable/pull/1114)
* Fixes for Transformers 5 by [@aaron-siegel](https://github.com/aaron-siegel) in [#1113](https://github.com/pixeltable/pixeltable/pull/1113)
* Use a larger runner in merge queue for full tests on Python 3.10 by [@aaron-siegel](https://github.com/aaron-siegel) in [#1120](https://github.com/pixeltable/pixeltable/pull/1120)
* \[PXT-944] speech2text\_for\_conditional\_generation declares return type… by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1116](https://github.com/pixeltable/pixeltable/pull/1116)
* \[PXT-875] Fix openai perftest on github by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1062](https://github.com/pixeltable/pixeltable/pull/1062)
* PXT-973: additional\_columns doesn't evaluate as expected when creating a view by [@mkornacker](https://github.com/mkornacker) in [#1087](https://github.com/pixeltable/pixeltable/pull/1087)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.15...v0.5.16](https://github.com/pixeltable/pixeltable/compare/v0.5.15...v0.5.16)
***
### v0.5.15
**Released:** January 29, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.15](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.15)
#### What's Changed
* docs: update overview description and callout/footer styling by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1086](https://github.com/pixeltable/pixeltable/pull/1086)
* Fix HF datasets rotten\_tomatoes references in tests & notebook by [@aaron-siegel](https://github.com/aaron-siegel) in [#1089](https://github.com/pixeltable/pixeltable/pull/1089)
* Gemini UDFs to use "rate limits" scheduler by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1092](https://github.com/pixeltable/pixeltable/pull/1092)
* Allow dict/list config params to be specified as environment variables by [@aaron-siegel](https://github.com/aaron-siegel) in [#1091](https://github.com/pixeltable/pixeltable/pull/1091)
* Minor Gemini UDF followup for safer get\_retry\_delay() by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1098](https://github.com/pixeltable/pixeltable/pull/1098)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.14...v0.5.15](https://github.com/pixeltable/pixeltable/compare/v0.5.14...v0.5.15)
***
### v0.5.14
**Released:** January 24, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.14](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.14)
#### What's Changed
* Add RunwayML integration with UDFs for image and video generation by [@tiennguyentony](https://github.com/tiennguyentony) in [#1019](https://github.com/pixeltable/pixeltable/pull/1019)
* Deployment and Use Cases Docs by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1043](https://github.com/pixeltable/pixeltable/pull/1043)
* Transaction rollback by [@mkornacker](https://github.com/mkornacker) in [#1075](https://github.com/pixeltable/pixeltable/pull/1075)
* \[PXT-972] Bugfix: FrameIterator.set\_pos() on videos with start\_time > 0 by [@aaron-siegel](https://github.com/aaron-siegel) in [#1082](https://github.com/pixeltable/pixeltable/pull/1082)
* to\_string() method on UUIDType by [@aaron-siegel](https://github.com/aaron-siegel) in [#1078](https://github.com/pixeltable/pixeltable/pull/1078)
* CI and Makefile step to ensure notebooks have >= 50% of their cells with outputs by [@aaron-siegel](https://github.com/aaron-siegel) in [#1073](https://github.com/pixeltable/pixeltable/pull/1073)
* Regenerate all outputs for Reve integration notebook by [@apreshill](https://github.com/apreshill) in [#1071](https://github.com/pixeltable/pixeltable/pull/1071)
* Apply ruff formatting to all notebooks by [@aaron-siegel](https://github.com/aaron-siegel) in [#1074](https://github.com/pixeltable/pixeltable/pull/1074)
#### New Contributors
* [@tiennguyentony](https://github.com/tiennguyentony) made their first contribution in [#1019](https://github.com/pixeltable/pixeltable/pull/1019)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.13...v0.5.14](https://github.com/pixeltable/pixeltable/compare/v0.5.13...v0.5.14)
***
### v0.5.13
**Released:** January 22, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.13](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.13)
#### What's Changed
* rename reset\_db fixture to uses\_db by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1067](https://github.com/pixeltable/pixeltable/pull/1067)
* Use '/' as path delimiter by [@amithadke](https://github.com/amithadke) in [#1055](https://github.com/pixeltable/pixeltable/pull/1055)
* Temporarily disable progress reporting when verbosity \< 2 by [@aaron-siegel](https://github.com/aaron-siegel) in [#1079](https://github.com/pixeltable/pixeltable/pull/1079)
* Follow up fixes for Path delimiter change by [@amithadke](https://github.com/amithadke) in [#1076](https://github.com/pixeltable/pixeltable/pull/1076)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.12...v0.5.13](https://github.com/pixeltable/pixeltable/compare/v0.5.12...v0.5.13)
***
### v0.5.12
**Released:** January 17, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.12](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.12)
#### What's Changed
* Lint markdown in notebooks by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1033](https://github.com/pixeltable/pixeltable/pull/1033)
* Adjust down max connections on OpenAI client by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1058](https://github.com/pixeltable/pixeltable/pull/1058)
* \[PXT-915] Gemini embedding UDFs by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#986](https://github.com/pixeltable/pixeltable/pull/986)
* PXT-866 Add validation for version in pixeltable uri by [@amithadke](https://github.com/amithadke) in [#1048](https://github.com/pixeltable/pixeltable/pull/1048)
* uuid7() udf by [@mkornacker](https://github.com/mkornacker) in [#1059](https://github.com/pixeltable/pixeltable/pull/1059)
* \[PXT-875] Disable performance test until it reliably passes by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1061](https://github.com/pixeltable/pixeltable/pull/1061)
* Daemonize pgserver on Windows by [@aaron-siegel](https://github.com/aaron-siegel) in [#1057](https://github.com/pixeltable/pixeltable/pull/1057)
* PXT-954: assertion in recompute\_columns() for view column by [@mkornacker](https://github.com/mkornacker) in [#1064](https://github.com/pixeltable/pixeltable/pull/1064)
* Remove obsolete mkdocs by [@aaron-siegel](https://github.com/aaron-siegel) in [#1056](https://github.com/pixeltable/pixeltable/pull/1056)
* Working with blob storage nb by [@apreshill](https://github.com/apreshill) in [#977](https://github.com/pixeltable/pixeltable/pull/977)
* PXT-961: correct support for alpha in draw\_bounding\_boxes() by [@mkornacker](https://github.com/mkornacker) in [#1068](https://github.com/pixeltable/pixeltable/pull/1068)
* Notebook CI tweaks by [@aaron-siegel](https://github.com/aaron-siegel) in [#1069](https://github.com/pixeltable/pixeltable/pull/1069)
* PXT-943: Rectify all indices in TableRestorer, not just embedding indices by [@aaron-siegel](https://github.com/aaron-siegel) in [#1066](https://github.com/pixeltable/pixeltable/pull/1066)
* \[PXT-955] Skip UDA evaluation if a required parameter is None by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1070](https://github.com/pixeltable/pixeltable/pull/1070)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.11...v0.5.12](https://github.com/pixeltable/pixeltable/compare/v0.5.11...v0.5.12)
***
### v0.5.11
**Released:** January 13, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.11](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.11)
#### What's Changed
* \[PXT-916] Store embedding indexes as halfvecs by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1007](https://github.com/pixeltable/pixeltable/pull/1007)
* Add a "read only random ops" stress-tests job by [@aaron-siegel](https://github.com/aaron-siegel) in [#1047](https://github.com/pixeltable/pixeltable/pull/1047)
* Streamline dev installation by [@aaron-siegel](https://github.com/aaron-siegel) in [#1046](https://github.com/pixeltable/pixeltable/pull/1046)
* Add reruns by default to all cockroach test failures by [@aaron-siegel](https://github.com/aaron-siegel) in [#1053](https://github.com/pixeltable/pixeltable/pull/1053)
* PXT-938: export\_sql() by [@mkornacker](https://github.com/mkornacker) in [#1037](https://github.com/pixeltable/pixeltable/pull/1037)
* Add cookbooks: SQL and Segmentation by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1038](https://github.com/pixeltable/pixeltable/pull/1038)
* \[PXT-629] Update plan is incomplete by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1044](https://github.com/pixeltable/pixeltable/pull/1044)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.10...v0.5.11](https://github.com/pixeltable/pixeltable/compare/v0.5.10...v0.5.11)
***
### v0.5.10
**Released:** January 10, 2026\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.10](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.10)
#### What's Changed
* Adding ipywidgets to dev dependencies by [@mkornacker](https://github.com/mkornacker) in [#1027](https://github.com/pixeltable/pixeltable/pull/1027)
* Add a seed to TestSample.test\_sample\_basic\_f by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1040](https://github.com/pixeltable/pixeltable/pull/1040)
* Twelvelabs notebook by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1013](https://github.com/pixeltable/pixeltable/pull/1013)
* Readme Updates by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1041](https://github.com/pixeltable/pixeltable/pull/1041)
* Proper configurability for spaCy models by [@aaron-siegel](https://github.com/aaron-siegel) in [#1039](https://github.com/pixeltable/pixeltable/pull/1039)
* Various import fixes by [@aaron-siegel](https://github.com/aaron-siegel) in [#1042](https://github.com/pixeltable/pixeltable/pull/1042)
* PXT-875 Run perf tests on a dedicated larger runner by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1024](https://github.com/pixeltable/pixeltable/pull/1024)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.9...v0.5.10](https://github.com/pixeltable/pixeltable/compare/v0.5.9...v0.5.10)
***
### v0.5.9
**Released:** December 30, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.9](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.9)
#### What's Changed
* Bedrock invoke\_model() udf by [@mkornacker](https://github.com/mkornacker) in [#1018](https://github.com/pixeltable/pixeltable/pull/1018)
* \[PXT-765] Support for Office Formats as part of Document Type through MarkdownIT by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#960](https://github.com/pixeltable/pixeltable/pull/960)
* HF DetrForSegmentation by [@mkornacker](https://github.com/mkornacker) in [#1020](https://github.com/pixeltable/pixeltable/pull/1020)
* Image2Image: Updated HF.py to use AutoPipelineForImage2Image and Cookbook by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1025](https://github.com/pixeltable/pixeltable/pull/1025)
* Fixed broken tutorial links. by [@joerg84](https://github.com/joerg84) in [#1026](https://github.com/pixeltable/pixeltable/pull/1026)
* Allow `similarity(image=...)` to accept a filename or URL instead of a PIL image object by [@aaron-siegel](https://github.com/aaron-siegel) in [#1023](https://github.com/pixeltable/pixeltable/pull/1023) (see the sketch after this list)
* docs(cookbook): add MCP tool calling section to LLM tool calling guide by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1021](https://github.com/pixeltable/pixeltable/pull/1021)
* PXT-928: Export Json columns to parquet as pa.struct by [@mkornacker](https://github.com/mkornacker) in [#1017](https://github.com/pixeltable/pixeltable/pull/1017)
* removing psutil by [@mkornacker](https://github.com/mkornacker) in [#1031](https://github.com/pixeltable/pixeltable/pull/1031)
* Use head() instead of collect() in test\_add\_column\_to\_view by [@aaron-siegel](https://github.com/aaron-siegel) in [#1022](https://github.com/pixeltable/pixeltable/pull/1022)
* disable progress reporting in Jupyter if ipywidgets is not installed by [@mkornacker](https://github.com/mkornacker) in [#1032](https://github.com/pixeltable/pixeltable/pull/1032)
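For context on the `similarity()` change in [#1023](https://github.com/pixeltable/pixeltable/pull/1023), here is a minimal sketch of querying an image embedding index by URL. The table name `demo.imgs` and column name `img` are made up for illustration, and an embedding index on that column is assumed to already exist:

```python
import pixeltable as pxt

# Hypothetical table with an image column 'img' that already has an embedding index
imgs = pxt.get_table('demo.imgs')

# As of #1023, the similarity query can be a file path or URL
# rather than a pre-loaded PIL.Image object.
sim = imgs.img.similarity('https://example.com/query.jpg')
imgs.order_by(sim, asc=False).limit(3).collect()
```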
#### New Contributors
* [@joerg84](https://github.com/joerg84) made their first contribution in [#1026](https://github.com/pixeltable/pixeltable/pull/1026)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.8...v0.5.9](https://github.com/pixeltable/pixeltable/compare/v0.5.8...v0.5.9)
***
### v0.5.8
**Released:** December 20, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.8](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.8)
#### What's Changed
* Use high performance endpoint for Tigris by [@apreshill](https://github.com/apreshill) in [#1011](https://github.com/pixeltable/pixeltable/pull/1011)
* Merge Table.add\_embedding\_index examples by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1014](https://github.com/pixeltable/pixeltable/pull/1014)
* Notebook fixes & some cleanup by [@aaron-siegel](https://github.com/aaron-siegel) in [#1010](https://github.com/pixeltable/pixeltable/pull/1010)
* Progress tracker by [@mkornacker](https://github.com/mkornacker) in [#956](https://github.com/pixeltable/pixeltable/pull/956)
* \[PXT-925] Fix spurious exception when `if_not_exists='ignore'` is used with a missing parent dir by [@aaron-siegel](https://github.com/aaron-siegel) in [#1015](https://github.com/pixeltable/pixeltable/pull/1015)
* Improve primary key error message by [@aaron-siegel](https://github.com/aaron-siegel) in [#1016](https://github.com/pixeltable/pixeltable/pull/1016)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.7...v0.5.8](https://github.com/pixeltable/pixeltable/compare/v0.5.7...v0.5.8)
***
### v0.5.7
**Released:** December 18, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.7](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.7)
#### What's Changed
* Fix a bug in rag-demo.ipynb by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#996](https://github.com/pixeltable/pixeltable/pull/996)
* Fixes the errant `/datastore/` url in the Reve docstrings by [@apreshill](https://github.com/apreshill) in [#999](https://github.com/pixeltable/pixeltable/pull/999)
* Remove custom-iterators.ipynb from docs for now, and clean up docs.json by [@aaron-siegel](https://github.com/aaron-siegel) in [#997](https://github.com/pixeltable/pixeltable/pull/997)
* \[PXT-921] Skip test\_create\_video\_table on cockroachdb by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#1002](https://github.com/pixeltable/pixeltable/pull/1002)
* Add iterators cookbook with all 6 built-in iterators by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1000](https://github.com/pixeltable/pixeltable/pull/1000)
* PXT 910 Add rerun options to presigned url tests by [@amithadke](https://github.com/amithadke) in [#1006](https://github.com/pixeltable/pixeltable/pull/1006)
* docs: add presigned\_url to S3 cookbook and update SDK docs by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1004](https://github.com/pixeltable/pixeltable/pull/1004)
* docs(providers): add Tigris example notebook by [@Xe](https://github.com/Xe) in [#998](https://github.com/pixeltable/pixeltable/pull/998)
* docs: update Mintlify theme colors and styling by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#1008](https://github.com/pixeltable/pixeltable/pull/1008)
* Add `pxt.Binary` type to type system; `bytes` support in JSON; working Gemini 3 Pro by [@aaron-siegel](https://github.com/aaron-siegel) in [#1001](https://github.com/pixeltable/pixeltable/pull/1001)
* Support audio and video embedding indices by [@aaron-siegel](https://github.com/aaron-siegel) in [#990](https://github.com/pixeltable/pixeltable/pull/990)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.6...v0.5.7](https://github.com/pixeltable/pixeltable/compare/v0.5.6...v0.5.7)
***
### v0.5.6
**Released:** December 15, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.6](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.6)
#### What's Changed
* \[PXT-892] Support variable framerate in FrameIterator by [@aaron-siegel](https://github.com/aaron-siegel) in [#961](https://github.com/pixeltable/pixeltable/pull/961)
* \[PXT-875] Define GRAFANA\_INSTANCE\_ID for the perf job by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#989](https://github.com/pixeltable/pixeltable/pull/989)
* \[PXT-399] Remove pymupdf as a dependency by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#981](https://github.com/pixeltable/pixeltable/pull/981)
* Docs Cleanup + Cookbooks + Versioning/Lineage + Production for Workshop by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#964](https://github.com/pixeltable/pixeltable/pull/964)
* Iterators Refactor Part 1 by [@aaron-siegel](https://github.com/aaron-siegel) in [#992](https://github.com/pixeltable/pixeltable/pull/992)
* Update documentation for iterators and aggregate functions by [@aaron-siegel](https://github.com/aaron-siegel) in [#995](https://github.com/pixeltable/pixeltable/pull/995)
* PXT-910 Add presigned\_url udf by [@amithadke](https://github.com/amithadke) in [#991](https://github.com/pixeltable/pixeltable/pull/991)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.5...v0.5.6](https://github.com/pixeltable/pixeltable/compare/v0.5.5...v0.5.6)
***
### v0.5.5
**Released:** December 11, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.5](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.5)
#### What's Changed
* Multimodal support for Gemini `generate_content()` by [@aaron-siegel](https://github.com/aaron-siegel) in [#983](https://github.com/pixeltable/pixeltable/pull/983)
* PXT-903 Add UUID in pixeltable types by [@amithadke](https://github.com/amithadke) in [#979](https://github.com/pixeltable/pixeltable/pull/979)
* PXT-905/907: clean up handling of Huggingface datasets by [@mkornacker](https://github.com/mkornacker) in [#984](https://github.com/pixeltable/pixeltable/pull/984)
* Twelve Labs multimodal embeddings support by [@aaron-siegel](https://github.com/aaron-siegel) in [#987](https://github.com/pixeltable/pixeltable/pull/987)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.4...v0.5.5](https://github.com/pixeltable/pixeltable/compare/v0.5.4...v0.5.5)
***
### v0.5.4
**Released:** December 09, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.4](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.4)
#### What's Changed
* \[PXT-645] Support more numpy dtypes for Array by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#940](https://github.com/pixeltable/pixeltable/pull/940)
* Add working-with-voyageai tutorial notebook by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#978](https://github.com/pixeltable/pixeltable/pull/978)
* StringSplitter docstring fix plus test by [@mkornacker](https://github.com/mkornacker) in [#980](https://github.com/pixeltable/pixeltable/pull/980)
* \[PXT-875] performance test for openai endpoints by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#963](https://github.com/pixeltable/pixeltable/pull/963)
* Restructuring of docs site and repo by [@aaron-siegel](https://github.com/aaron-siegel) in [#982](https://github.com/pixeltable/pixeltable/pull/982)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.3...v0.5.4](https://github.com/pixeltable/pixeltable/compare/v0.5.3...v0.5.4)
***
### v0.5.3
**Released:** December 04, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.3](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.3)
#### What's Changed
* PXT-872 Support count() with sample and group by clause. by [@amithadke](https://github.com/amithadke) in [#955](https://github.com/pixeltable/pixeltable/pull/955)
* Add VOYAGE\_API\_KEY to CI and configuration.mdx; update uv.lock doctools reference by [@aaron-siegel](https://github.com/aaron-siegel) in [#976](https://github.com/pixeltable/pixeltable/pull/976)
* Fal.ai Integration by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#959](https://github.com/pixeltable/pixeltable/pull/959)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.2...v0.5.3](https://github.com/pixeltable/pixeltable/compare/v0.5.2...v0.5.3)
***
### v0.5.2
**Released:** December 03, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.2](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.2)
#### What's Changed
* Use database schemas and search\_path for test isolation in parallel runs by [@amithadke](https://github.com/amithadke) in [#953](https://github.com/pixeltable/pixeltable/pull/953)
* Working CI for Cockroach by [@aaron-siegel](https://github.com/aaron-siegel) in [#906](https://github.com/pixeltable/pixeltable/pull/906)
* Fix internal documentation links by [@aaron-siegel](https://github.com/aaron-siegel) in [#954](https://github.com/pixeltable/pixeltable/pull/954)
* \[PXT-886] Fix a bug in RateLimitsScheduler's error handling by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#951](https://github.com/pixeltable/pixeltable/pull/951)
* \[PXT-786] Development Guide by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#958](https://github.com/pixeltable/pixeltable/pull/958)
* Add Reve integration notebook by [@apreshill](https://github.com/apreshill) in [#939](https://github.com/pixeltable/pixeltable/pull/939)
* Adds support for Voyage AI embeddings and rerankers. by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#962](https://github.com/pixeltable/pixeltable/pull/962)
* Some rough-edges features/improvements by [@mkornacker](https://github.com/mkornacker) in [#967](https://github.com/pixeltable/pixeltable/pull/967)
* \[PXT-908] Ensure that generated Gemini videos have sound by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#973](https://github.com/pixeltable/pixeltable/pull/973)
* PXT-904: add MIME type for object uploads by [@mkornacker](https://github.com/mkornacker) in [#971](https://github.com/pixeltable/pixeltable/pull/971)
* Update uv.lock by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#974](https://github.com/pixeltable/pixeltable/pull/974)
* Add uv.lock validation to the pr tests by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#975](https://github.com/pixeltable/pixeltable/pull/975)
* Documentation and config updates by [@aaron-siegel](https://github.com/aaron-siegel) in [#972](https://github.com/pixeltable/pixeltable/pull/972)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.1...v0.5.2](https://github.com/pixeltable/pixeltable/compare/v0.5.1...v0.5.2)
***
### v0.5.1
**Released:** November 19, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.1](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.1)
#### What's Changed
* Add TableVersionMd.from\_dict and update publish to use objects instead of dicts by [@amithadke](https://github.com/amithadke) in [#944](https://github.com/pixeltable/pixeltable/pull/944)
* Publishing an existing version now returns 201, since 204 does not allow any content to be sent back in the body by [@amithadke](https://github.com/amithadke) in [#948](https://github.com/pixeltable/pixeltable/pull/948)
* Replace StorageDestination with StorageTarget by [@amithadke](https://github.com/amithadke) in [#947](https://github.com/pixeltable/pixeltable/pull/947)
* Missing converter for schema change in PR 932 by [@mkornacker](https://github.com/mkornacker) in [#949](https://github.com/pixeltable/pixeltable/pull/949)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.5.0...v0.5.1](https://github.com/pixeltable/pixeltable/compare/v0.5.0...v0.5.1)
***
### v0.5.0
**Released:** November 18, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.5.0](https://github.com/pixeltable/pixeltable/releases/tag/v0.5.0)
#### What's Changed
* Data sharing docs by [@apreshill](https://github.com/apreshill) in [#931](https://github.com/pixeltable/pixeltable/pull/931)
* Numerous documentation fixes by [@aaron-siegel](https://github.com/aaron-siegel) in [#933](https://github.com/pixeltable/pixeltable/pull/933)
* PXT-846: FrameIterator(keyframes\_only: bool) by [@mkornacker](https://github.com/mkornacker) in [#934](https://github.com/pixeltable/pixeltable/pull/934)
* \[PXT-809] Improve OpenAI rate limiting by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#912](https://github.com/pixeltable/pixeltable/pull/912)
* Multi-phase drop\_table() by [@mkornacker](https://github.com/mkornacker) in [#932](https://github.com/pixeltable/pixeltable/pull/932)
* Streamline Makefile by [@aaron-siegel](https://github.com/aaron-siegel) in [#937](https://github.com/pixeltable/pixeltable/pull/937)
* Changes to protocol to handle publishing existing version by [@amithadke](https://github.com/amithadke) in [#938](https://github.com/pixeltable/pixeltable/pull/938)
* PXT-871: == None filter doesn't work correctly on an array column by [@mkornacker](https://github.com/mkornacker) in [#941](https://github.com/pixeltable/pixeltable/pull/941)
* More documentation improvements by [@aaron-siegel](https://github.com/aaron-siegel) in [#936](https://github.com/pixeltable/pixeltable/pull/936)
* Circularity detection in view creation with if\_exists='replace' by [@aaron-siegel](https://github.com/aaron-siegel) in [#942](https://github.com/pixeltable/pixeltable/pull/942)
* Add Tigris integration by [@Xe](https://github.com/Xe) in [#935](https://github.com/pixeltable/pixeltable/pull/935)
* Improvements to notebook documentation by [@aaron-siegel](https://github.com/aaron-siegel) in [#943](https://github.com/pixeltable/pixeltable/pull/943)
* Improvements to retriable errors detection in RequestRateScheduler by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#922](https://github.com/pixeltable/pixeltable/pull/922)
* Rename `DataFrame` to `Query` and `DataFrameResultSet` to `ResultSet` by [@aaron-siegel](https://github.com/aaron-siegel) in [#902](https://github.com/pixeltable/pixeltable/pull/902)
* PXT-873: t.sample() fails on externalized array data by [@mkornacker](https://github.com/mkornacker) in [#945](https://github.com/pixeltable/pixeltable/pull/945)
#### New Contributors
* [@Xe](https://github.com/Xe) made their first contribution in [#935](https://github.com/pixeltable/pixeltable/pull/935)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.24...v0.5.0](https://github.com/pixeltable/pixeltable/compare/v0.4.24...v0.5.0)
***
### v0.4.24
**Released:** November 12, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.24](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.24)
#### What's Changed
* Update imagen model in tests and docs (3.0 is deprecated) by [@aaron-siegel](https://github.com/aaron-siegel) in [#929](https://github.com/pixeltable/pixeltable/pull/929)
* Allow hyphens in table and dir names by [@aaron-siegel](https://github.com/aaron-siegel) in [#926](https://github.com/pixeltable/pixeltable/pull/926)
* Skip download when replicating the same version of a table a second time by [@aaron-siegel](https://github.com/aaron-siegel) in [#927](https://github.com/pixeltable/pixeltable/pull/927)
* Several fixes and improvements for data sharing by [@aaron-siegel](https://github.com/aaron-siegel) in [#928](https://github.com/pixeltable/pixeltable/pull/928)
* PXT-862: bug fix for drop\_table() by [@mkornacker](https://github.com/mkornacker) in [#930](https://github.com/pixeltable/pixeltable/pull/930)
* Various docs updates by [@aaron-siegel](https://github.com/aaron-siegel) in [#923](https://github.com/pixeltable/pixeltable/pull/923)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.23...v0.4.24](https://github.com/pixeltable/pixeltable/compare/v0.4.23...v0.4.24)
***
### v0.4.23
**Released:** November 11, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.23](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.23)
#### What's Changed
* Add PIXELTABLE\_API\_KEY to CI environment by [@aaron-siegel](https://github.com/aaron-siegel) in [#914](https://github.com/pixeltable/pixeltable/pull/914)
* `create_store_tbls: bool` option in Catalog.create\_replica() by [@aaron-siegel](https://github.com/aaron-siegel) in [#916](https://github.com/pixeltable/pixeltable/pull/916)
* \[PXT-380] Remove NamedFunction object and related code in named\_function.py by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#911](https://github.com/pixeltable/pixeltable/pull/911)
* Switch to new random ops script in CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#909](https://github.com/pixeltable/pixeltable/pull/909)
* \[PXT-799] Allow setting `fps` greater than the framerate of the video in `FrameIterator` by [@aaron-siegel](https://github.com/aaron-siegel) in [#918](https://github.com/pixeltable/pixeltable/pull/918) (see the sketch after this list)
* Intelligible error message when replicating a view of an existing original base table by [@aaron-siegel](https://github.com/aaron-siegel) in [#897](https://github.com/pixeltable/pixeltable/pull/897)
* \[PXT-837] Support creating/inserting directly from an existing Table by [@aaron-siegel](https://github.com/aaron-siegel) in [#919](https://github.com/pixeltable/pixeltable/pull/919)
* Add parameters to `make stresstest` by [@aaron-siegel](https://github.com/aaron-siegel) in [#920](https://github.com/pixeltable/pixeltable/pull/920)
* Introduce "anchor tables" in TableVersion(Handle) for live replicas; working pull() by [@aaron-siegel](https://github.com/aaron-siegel) in [#917](https://github.com/pixeltable/pixeltable/pull/917)
* Time travel for view over snapshot; replicas of view over snapshot by [@aaron-siegel](https://github.com/aaron-siegel) in [#924](https://github.com/pixeltable/pixeltable/pull/924)
* Proper display of embeddings by [@aaron-siegel](https://github.com/aaron-siegel) in [#925](https://github.com/pixeltable/pixeltable/pull/925)
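For context on the `FrameIterator` change in [#918](https://github.com/pixeltable/pixeltable/pull/918), here is a minimal sketch of a frame-extraction view where `fps` exceeds the source video's native framerate, which previously was not allowed. The table and view names are made up for illustration:

```python
import pixeltable as pxt
from pixeltable.iterators import FrameIterator

# Hypothetical table holding source videos
videos = pxt.create_table('demo.videos', {'video': pxt.Video})

# As of #918, fps may be set higher than the video's own framerate.
frames = pxt.create_view(
    'demo.frames',
    videos,
    iterator=FrameIterator.create(video=videos.video, fps=60),
)
```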
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.22...v0.4.23](https://github.com/pixeltable/pixeltable/compare/v0.4.22...v0.4.23)
***
### v0.4.22
**Released:** November 04, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.22](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.22)
#### What's Changed
* Manage `additional_md` from Catalog, rather than TableVersion by [@aaron-siegel](https://github.com/aaron-siegel) in [#913](https://github.com/pixeltable/pixeltable/pull/913)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.21...v0.4.22](https://github.com/pixeltable/pixeltable/compare/v0.4.21...v0.4.22)
***
### v0.4.21
**Released:** November 03, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.21](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.21)
#### What's Changed
* Hotfix for bug when publishing older versions of a table by [@aaron-siegel](https://github.com/aaron-siegel) in [#910](https://github.com/pixeltable/pixeltable/pull/910)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.20...v0.4.21](https://github.com/pixeltable/pixeltable/compare/v0.4.20...v0.4.21)
***
### v0.4.20
**Released:** November 03, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.20](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.20)
#### What's Changed
* pyscenedetect udfs by [@mkornacker](https://github.com/mkornacker) in [#899](https://github.com/pixeltable/pixeltable/pull/899)
* CockroachDB fixes + CI target by [@aaron-siegel](https://github.com/aaron-siegel) in [#900](https://github.com/pixeltable/pixeltable/pull/900)
* Add protocol for replica operations. by [@amithadke](https://github.com/amithadke) in [#819](https://github.com/pixeltable/pixeltable/pull/819)
* \[PXT-822, PXT-674] Fix for querying snapshots of tables with unstored columns by [@aaron-siegel](https://github.com/aaron-siegel) in [#895](https://github.com/pixeltable/pixeltable/pull/895)
* Switch to using random\_tbl\_ops\_2 in stress-tests by [@aaron-siegel](https://github.com/aaron-siegel) in [#898](https://github.com/pixeltable/pixeltable/pull/898)
* Fix nondeterminism in unit test by [@aaron-siegel](https://github.com/aaron-siegel) in [#905](https://github.com/pixeltable/pixeltable/pull/905)
* \[PXT-817] UDFs for reve.com by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#901](https://github.com/pixeltable/pixeltable/pull/901)
* \[PXT-826] Refactor index creation logic by [@aaron-siegel](https://github.com/aaron-siegel) in [#908](https://github.com/pixeltable/pixeltable/pull/908)
* UV\_OPTS in Makefile by [@aaron-siegel](https://github.com/aaron-siegel) in [#896](https://github.com/pixeltable/pixeltable/pull/896)
* Ignore additional\_mds when checking table or table version metadata by [@amithadke](https://github.com/amithadke) in [#903](https://github.com/pixeltable/pixeltable/pull/903)
* \[PXT-786] push() and pull() implementations by [@amithadke](https://github.com/amithadke) in [#907](https://github.com/pixeltable/pixeltable/pull/907)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.19...v0.4.20](https://github.com/pixeltable/pixeltable/compare/v0.4.19...v0.4.20)
***
### v0.4.19
**Released:** October 29, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.19](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.19)
#### What's Changed
* Add image recipes to cookbook by [@apreshill](https://github.com/apreshill) in [#857](https://github.com/pixeltable/pixeltable/pull/857)
* Add display-name to CI matrix (prep for testing global media destination) by [@aaron-siegel](https://github.com/aaron-siegel) in [#879](https://github.com/pixeltable/pixeltable/pull/879)
* Enable all media destinations in CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#876](https://github.com/pixeltable/pixeltable/pull/876)
* \[PXT-814] UDF to encode a numpy array to an audio file by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#881](https://github.com/pixeltable/pixeltable/pull/881)
* Convert notebooks to use YAML frontmatter and fix formatting issues by [@goodlux](https://github.com/goodlux) in [#880](https://github.com/pixeltable/pixeltable/pull/880)
* Rename a public constant by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#884](https://github.com/pixeltable/pixeltable/pull/884)
* Multi-phase create\_table() by [@mkornacker](https://github.com/mkornacker) in [#854](https://github.com/pixeltable/pixeltable/pull/854)
* Initial integration of TwelveLabs Embed API by [@mkornacker](https://github.com/mkornacker) in [#885](https://github.com/pixeltable/pixeltable/pull/885)
* Fix `pxt.__version__` by [@aaron-siegel](https://github.com/aaron-siegel) in [#887](https://github.com/pixeltable/pixeltable/pull/887)
* Update many error messages for consistency by [@aaron-siegel](https://github.com/aaron-siegel) in [#869](https://github.com/pixeltable/pixeltable/pull/869)
* Replace `Optional[T]` with `T | None` (Python 3.10 style) throughout the codebase by [@aaron-siegel](https://github.com/aaron-siegel) in [#888](https://github.com/pixeltable/pixeltable/pull/888)
* Docs-related updates to Makefile and pyproject by [@aaron-siegel](https://github.com/aaron-siegel) in [#889](https://github.com/pixeltable/pixeltable/pull/889)
* \[PXT-685] Add `recompute_columns()` to computed columns fundamentals notebook by [@aaron-siegel](https://github.com/aaron-siegel) in [#892](https://github.com/pixeltable/pixeltable/pull/892)
* \[PXT-811, PXT-812] Improve two error messages with helpful hints by [@aaron-siegel](https://github.com/aaron-siegel) in [#891](https://github.com/pixeltable/pixeltable/pull/891)
* Revert two uses of `Optional` in unit tests by [@aaron-siegel](https://github.com/aaron-siegel) in [#893](https://github.com/pixeltable/pixeltable/pull/893)
* Dependency updates for Python 3.14 by [@aaron-siegel](https://github.com/aaron-siegel) in [#894](https://github.com/pixeltable/pixeltable/pull/894)
* Azure support by [@aaron-siegel](https://github.com/aaron-siegel) in [#886](https://github.com/pixeltable/pixeltable/pull/886)
* Default media destination as configuration parameter by [@aaron-siegel](https://github.com/aaron-siegel) in [#883](https://github.com/pixeltable/pixeltable/pull/883)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.18...v0.4.19](https://github.com/pixeltable/pixeltable/compare/v0.4.18...v0.4.19)
***
### v0.4.18
**Released:** October 22, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.18](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.18)
#### What's Changed
* Updates to nightly.yml by [@aaron-siegel](https://github.com/aaron-siegel) in [#866](https://github.com/pixeltable/pixeltable/pull/866)
* Streamline CI configs on PRs by [@aaron-siegel](https://github.com/aaron-siegel) in [#858](https://github.com/pixeltable/pixeltable/pull/858)
* Update WhisperX to >=3.7 and enable for Python 3.13 by [@aaron-siegel](https://github.com/aaron-siegel) in [#860](https://github.com/pixeltable/pixeltable/pull/860)
* elements parameter for DocSplitter by [@mkornacker](https://github.com/mkornacker) in [#865](https://github.com/pixeltable/pixeltable/pull/865)
* Fix examples docstring for add\_embedding\_index() by [@aaron-siegel](https://github.com/aaron-siegel) in [#871](https://github.com/pixeltable/pixeltable/pull/871)
* Improvements to random\_tbl\_ops script by [@aaron-siegel](https://github.com/aaron-siegel) in [#868](https://github.com/pixeltable/pixeltable/pull/868)
* Enforce `numpy>=2.2` by [@aaron-siegel](https://github.com/aaron-siegel) in [#872](https://github.com/pixeltable/pixeltable/pull/872)
* Segmentation-related improvements by [@mkornacker](https://github.com/mkornacker) in [#873](https://github.com/pixeltable/pixeltable/pull/873)
* Randomize the behavior of `sample()` in the case `seed=None` by [@aaron-siegel](https://github.com/aaron-siegel) in [#828](https://github.com/pixeltable/pixeltable/pull/828)
* \[PXT-729] Documentation deploy scripts for Mintlify website and local development by [@goodlux](https://github.com/goodlux) in [#867](https://github.com/pixeltable/pixeltable/pull/867)
* Properly reconstruct btree and vector indices when a replica is restored by [@aaron-siegel](https://github.com/aaron-siegel) in [#875](https://github.com/pixeltable/pixeltable/pull/875)
* Fix various errors and typos in README and the notebooks by [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) in [#877](https://github.com/pixeltable/pixeltable/pull/877)
* UDFs for Hugging Face Auto model integrations by [@aaron-siegel](https://github.com/aaron-siegel) in [#870](https://github.com/pixeltable/pixeltable/pull/870)
#### New Contributors
* [@sergey-mkhitaryan](https://github.com/sergey-mkhitaryan) made their first contribution in [#877](https://github.com/pixeltable/pixeltable/pull/877)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.17...v0.4.18](https://github.com/pixeltable/pixeltable/compare/v0.4.17...v0.4.18)
***
### v0.4.17
**Released:** October 16, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.17](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.17)
#### What's Changed
* Update model used by Together AI tests by [@aaron-siegel](https://github.com/aaron-siegel) in [#846](https://github.com/pixeltable/pixeltable/pull/846)
* Fix broken links at the bottom of basics notebook by [@apreshill](https://github.com/apreshill) in [#844](https://github.com/pixeltable/pixeltable/pull/844)
* Retry failed notebook tests once in CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#830](https://github.com/pixeltable/pixeltable/pull/830)
* feat(storage): add Backblaze B2 S3-compatible integration and tests by [@jeronimodeleon](https://github.com/jeronimodeleon) in [#840](https://github.com/pixeltable/pixeltable/pull/840)
* cockroachDB: Set null\_ordered\_last on session start. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#838](https://github.com/pixeltable/pixeltable/pull/838)
* cockroachDB: Explicit coercions for arithmetic ops. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#839](https://github.com/pixeltable/pixeltable/pull/839)
* Fix for isolated NB tests in CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#847](https://github.com/pixeltable/pixeltable/pull/847)
* Notebook updates & OpenRouter notebook by [@aaron-siegel](https://github.com/aaron-siegel) in [#851](https://github.com/pixeltable/pixeltable/pull/851)
* ffmpeg with libx264 by [@mkornacker](https://github.com/mkornacker) in [#855](https://github.com/pixeltable/pixeltable/pull/855)
* Fixed incorrect documentation links by [@metadaddy](https://github.com/metadaddy) in [#859](https://github.com/pixeltable/pixeltable/pull/859)
* Update pixeltable-pgserver dependency to 0.4.0 by [@aaron-siegel](https://github.com/aaron-siegel) in [#853](https://github.com/pixeltable/pixeltable/pull/853)
* Support packaging of tables with embedding indices for data sharing by [@aaron-siegel](https://github.com/aaron-siegel) in [#841](https://github.com/pixeltable/pixeltable/pull/841)
* mode 'accurate' for VideoSplitter and segment\_video() by [@mkornacker](https://github.com/mkornacker) in [#856](https://github.com/pixeltable/pixeltable/pull/856)
* Added PDF-Page-Chunk-Extractor for image extraction (Issue 703) (PR 705) by [@kamir](https://github.com/kamir) in [#850](https://github.com/pixeltable/pixeltable/pull/850)
* Formatting fixes by [@aaron-siegel](https://github.com/aaron-siegel) in [#862](https://github.com/pixeltable/pixeltable/pull/862)
* Fix pyproject and mypy config by [@aaron-siegel](https://github.com/aaron-siegel) in [#863](https://github.com/pixeltable/pixeltable/pull/863)
* Fixes for load\_replica\_md() with non-snapshot tables by [@aaron-siegel](https://github.com/aaron-siegel) in [#861](https://github.com/pixeltable/pixeltable/pull/861)
* Correctly process cellmd in package/restore by [@aaron-siegel](https://github.com/aaron-siegel) in [#864](https://github.com/pixeltable/pixeltable/pull/864)
#### New Contributors
* [@jeronimodeleon](https://github.com/jeronimodeleon) made their first contribution in [#840](https://github.com/pixeltable/pixeltable/pull/840)
* [@metadaddy](https://github.com/metadaddy) made their first contribution in [#859](https://github.com/pixeltable/pixeltable/pull/859)
* [@kamir](https://github.com/kamir) made their first contribution in [#850](https://github.com/pixeltable/pixeltable/pull/850)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.16...v0.4.17](https://github.com/pixeltable/pixeltable/compare/v0.4.16...v0.4.17)
***
### v0.4.16
**Released:** October 08, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.16](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.16)
#### What's Changed
* Openrouter Integration by [@aaron-siegel](https://github.com/aaron-siegel) in [#825](https://github.com/pixeltable/pixeltable/pull/825)
* Concurrency fixes & random\_tbl\_ops v2 by [@aaron-siegel](https://github.com/aaron-siegel) in [#814](https://github.com/pixeltable/pixeltable/pull/814)
* Images and arrays in json structures, plus improved storage of array columns by [@mkornacker](https://github.com/mkornacker) in [#812](https://github.com/pixeltable/pixeltable/pull/812)
* Minimal edits to docstrings. by [@goodlux](https://github.com/goodlux) in [#813](https://github.com/pixeltable/pixeltable/pull/813)
* Add SDK documentation for Mintlify by [@goodlux](https://github.com/goodlux) in [#835](https://github.com/pixeltable/pixeltable/pull/835)
* Fix for performance problem when importing HF datasets by [@mkornacker](https://github.com/mkornacker) in [#833](https://github.com/pixeltable/pixeltable/pull/833)
* cockroachDB: div, mod operations SQL changed. Timestamp propagated through client stack by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#823](https://github.com/pixeltable/pixeltable/pull/823)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.15...v0.4.16](https://github.com/pixeltable/pixeltable/compare/v0.4.15...v0.4.16)
***
### v0.4.15
**Released:** October 01, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.15](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.15)
#### What's Changed
* Add a spot for the cookbook in docs/ by [@apreshill](https://github.com/apreshill) in [#815](https://github.com/pixeltable/pixeltable/pull/815)
* Fixes for notebook tests resource cleanup by [@aaron-siegel](https://github.com/aaron-siegel) in [#827](https://github.com/pixeltable/pixeltable/pull/827)
* Adding export\_lancedb() to API reference by [@mkornacker](https://github.com/mkornacker) in [#824](https://github.com/pixeltable/pixeltable/pull/824)
* Replace `create_replica()` with separate `publish()` and `replicate()` methods by [@aaron-siegel](https://github.com/aaron-siegel) in [#816](https://github.com/pixeltable/pixeltable/pull/816)
* PXT-638, PXT-675, PXT-682 Handle Keyboard exception by [@amithadke](https://github.com/amithadke) in [#803](https://github.com/pixeltable/pixeltable/pull/803)
* PXT-772 Filling in missing docstrings by [@goodlux](https://github.com/goodlux) in [#822](https://github.com/pixeltable/pixeltable/pull/822)
* with\_audio() udf by [@mkornacker](https://github.com/mkornacker) in [#826](https://github.com/pixeltable/pixeltable/pull/826)
#### New Contributors
* [@apreshill](https://github.com/apreshill) made their first contribution in [#815](https://github.com/pixeltable/pixeltable/pull/815)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.14...v0.4.15](https://github.com/pixeltable/pixeltable/compare/v0.4.14...v0.4.15)
***
### v0.4.14
**Released:** September 23, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.14](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.14)
#### What's Changed
* Proper implementation of package/restore for non-snapshot replicas by [@aaron-siegel](https://github.com/aaron-siegel) in [#797](https://github.com/pixeltable/pixeltable/pull/797)
* Set up pydoclint by [@aaron-siegel](https://github.com/aaron-siegel) in [#805](https://github.com/pixeltable/pixeltable/pull/805)
* upgrade mint.json -> docs.json by [@goodlux](https://github.com/goodlux) in [#809](https://github.com/pixeltable/pixeltable/pull/809)
* Enable a destination parameter on stored computed columns. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#766](https://github.com/pixeltable/pixeltable/pull/766)
* Add support for running tests with cockroachdb as backend by [@amithadke](https://github.com/amithadke) in [#811](https://github.com/pixeltable/pixeltable/pull/811)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.13...v0.4.14](https://github.com/pixeltable/pixeltable/compare/v0.4.13...v0.4.14)
***
### v0.4.13
**Released:** September 19, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.13](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.13)
#### What's Changed
* Added pxt.io.export\_lancedb() by [@mkornacker](https://github.com/mkornacker) in [#795](https://github.com/pixeltable/pixeltable/pull/795)
* Update README.md by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#801](https://github.com/pixeltable/pixeltable/pull/801)
* Use raw\.githubusercontent.com instead of raw\.github.com in tests by [@aaron-siegel](https://github.com/aaron-siegel) in [#806](https://github.com/pixeltable/pixeltable/pull/806)
* Simplify & generalize TableDataSource types by [@aaron-siegel](https://github.com/aaron-siegel) in [#804](https://github.com/pixeltable/pixeltable/pull/804)
* Short Sample App: CLI Media Toolkit for Multimodal Data Processing by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#802](https://github.com/pixeltable/pixeltable/pull/802)
* Table.get\_versions() by [@aaron-siegel](https://github.com/aaron-siegel) in [#800](https://github.com/pixeltable/pixeltable/pull/800)
* Fixes for nightly CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#807](https://github.com/pixeltable/pixeltable/pull/807)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.12...v0.4.13](https://github.com/pixeltable/pixeltable/compare/v0.4.12...v0.4.13)
***
### v0.4.12
**Released:** September 05, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.12](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.12)
#### What's Changed
* Update model used by groq tests and examples by [@aaron-siegel](https://github.com/aaron-siegel) in [#790](https://github.com/pixeltable/pixeltable/pull/790)
* Clear TempStore, MediaStore, and HF cache after each test in CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#792](https://github.com/pixeltable/pixeltable/pull/792)
* Explicitly install pixeltable in run-isolated-nb-tests.sh by [@aaron-siegel](https://github.com/aaron-siegel) in [#794](https://github.com/pixeltable/pixeltable/pull/794)
* Handle incomplete rate limit headers better by [@mkornacker](https://github.com/mkornacker) in [#788](https://github.com/pixeltable/pixeltable/pull/788)
* SDK changes/fixes for data sharing by [@aaron-siegel](https://github.com/aaron-siegel) in [#791](https://github.com/pixeltable/pixeltable/pull/791)
* Disable TestWhisperx on Linux w/ GPU by [@mkornacker](https://github.com/mkornacker) in [#789](https://github.com/pixeltable/pixeltable/pull/789)
* recompute\_columns(): added where parameter by [@mkornacker](https://github.com/mkornacker) in [#787](https://github.com/pixeltable/pixeltable/pull/787)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.11...v0.4.12](https://github.com/pixeltable/pixeltable/compare/v0.4.11...v0.4.12)
***
### v0.4.11
**Released:** August 29, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.11](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.11)
#### What's Changed
* missing .md for VideoSplitter by [@mkornacker](https://github.com/mkornacker) in [#784](https://github.com/pixeltable/pixeltable/pull/784)
* CI & dev environment enhancements by [@aaron-siegel](https://github.com/aaron-siegel) in [#785](https://github.com/pixeltable/pixeltable/pull/785)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.10...v0.4.11](https://github.com/pixeltable/pixeltable/compare/v0.4.10...v0.4.11)
***
### v0.4.10
**Released:** August 28, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.10](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.10)
#### What's Changed
* Fix local\_public\_names() to properly exclude private functions by [@goodlux](https://github.com/goodlux) in [#778](https://github.com/pixeltable/pixeltable/pull/778)
* Add .DS\_Store to .gitignore by [@goodlux](https://github.com/goodlux) in [#779](https://github.com/pixeltable/pixeltable/pull/779)
* More video built-ins by [@mkornacker](https://github.com/mkornacker) in [#768](https://github.com/pixeltable/pixeltable/pull/768)
* Add missing `__all__` to gemini and whisper modules by [@aaron-siegel](https://github.com/aaron-siegel) in [#781](https://github.com/pixeltable/pixeltable/pull/781)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.9...v0.4.10](https://github.com/pixeltable/pixeltable/compare/v0.4.9...v0.4.10)
***
### v0.4.9
**Released:** August 27, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.9](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.9)
#### What's Changed
* WhisperX Speaker Diarization by [@aaron-siegel](https://github.com/aaron-siegel) in [#770](https://github.com/pixeltable/pixeltable/pull/770)
* Basic support for concurrent pixeltable metadata creation/upgrade by [@amithadke](https://github.com/amithadke) in [#769](https://github.com/pixeltable/pixeltable/pull/769)
* Support for pydantic models in Table.insert() by [@mkornacker](https://github.com/mkornacker) in [#760](https://github.com/pixeltable/pixeltable/pull/760)
* Add comments for concurrent pixeltable initialization changes by [@amithadke](https://github.com/amithadke) in [#772](https://github.com/pixeltable/pixeltable/pull/772)
* Disable notebook tests that are failing in CI for unknown reasons by [@aaron-siegel](https://github.com/aaron-siegel) in [#777](https://github.com/pixeltable/pixeltable/pull/777)
* Publish the existing mypy plugin under `pixeltable.mypy` module to make it accessible for external use. by [@amithadke](https://github.com/amithadke) in [#776](https://github.com/pixeltable/pixeltable/pull/776)
* Remove `ext` package and fold contents into `functions` by [@aaron-siegel](https://github.com/aaron-siegel) in [#775](https://github.com/pixeltable/pixeltable/pull/775)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.8...v0.4.9](https://github.com/pixeltable/pixeltable/compare/v0.4.8...v0.4.9)
***
### v0.4.8
**Released:** August 20, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.8](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.8)
#### What's Changed
* Performance test for chat completion integrations by [@mkornacker](https://github.com/mkornacker) in [#746](https://github.com/pixeltable/pixeltable/pull/746)
* Bugfixes related to missing dependencies by [@aaron-siegel](https://github.com/aaron-siegel) in [#747](https://github.com/pixeltable/pixeltable/pull/747)
* Makefile and pytest improvements by [@aaron-siegel](https://github.com/aaron-siegel) in [#753](https://github.com/pixeltable/pixeltable/pull/753)
* Update dev version of onnx by [@aaron-siegel](https://github.com/aaron-siegel) in [#755](https://github.com/pixeltable/pixeltable/pull/755)
* Pytest configuration fix by [@aaron-siegel](https://github.com/aaron-siegel) in [#756](https://github.com/pixeltable/pixeltable/pull/756)
* RequestRateScheduler improvements by [@mkornacker](https://github.com/mkornacker) in [#752](https://github.com/pixeltable/pixeltable/pull/752)
* Update README.md by [@aaron-siegel](https://github.com/aaron-siegel) in [#754](https://github.com/pixeltable/pixeltable/pull/754)
* Updating tutorial notebook to use Table.recompute\_columns(). by [@mkornacker](https://github.com/mkornacker) in [#757](https://github.com/pixeltable/pixeltable/pull/757)
* Changes to pixeltable shared client for R2 support. by [@amithadke](https://github.com/amithadke) in [#653](https://github.com/pixeltable/pixeltable/pull/653)
* Fix README spacing and linting issues by [@aaron-siegel](https://github.com/aaron-siegel) in [#759](https://github.com/pixeltable/pixeltable/pull/759)
* Move stored\_img\_cols from ExecNode To RowBuilder, add stored\_media\_cols by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#749](https://github.com/pixeltable/pixeltable/pull/749)
* Group local media file operations into a MediaStore or TempStore class by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#748](https://github.com/pixeltable/pixeltable/pull/748)
* Correct construction of two row\_builder members. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#761](https://github.com/pixeltable/pixeltable/pull/761)
* PXT-661 PXT-662 Adding checks for dropping column used by view predicates by [@amithadke](https://github.com/amithadke) in [#751](https://github.com/pixeltable/pixeltable/pull/751)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.7...v0.4.8](https://github.com/pixeltable/pixeltable/compare/v0.4.7...v0.4.8)
***
### v0.4.7
**Released:** August 04, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.7](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.7)
#### What's Changed
* Consolidate ColumnMd operations into from\_md() and to\_md(). by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#715](https://github.com/pixeltable/pixeltable/pull/715)
* Update README.md + Changelog by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#727](https://github.com/pixeltable/pixeltable/pull/727)
* Consolidate all store\_table row prep into DataRow\.create\_store\_table\_row. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#723](https://github.com/pixeltable/pixeltable/pull/723)
* More rigor in UDF evolution tests by [@aaron-siegel](https://github.com/aaron-siegel) in [#728](https://github.com/pixeltable/pixeltable/pull/728)
* Rerun tests that fail due to concurrency conflicts by [@aaron-siegel](https://github.com/aaron-siegel) in [#737](https://github.com/pixeltable/pixeltable/pull/737)
* Replace most uses of `Union[]` with Python 3.10-style unions by [@aaron-siegel](https://github.com/aaron-siegel) in [#735](https://github.com/pixeltable/pixeltable/pull/735)
* Extend FrameIterator to output all available frame attributes by [@mkornacker](https://github.com/mkornacker) in [#716](https://github.com/pixeltable/pixeltable/pull/716)
* Clean up pytest output by [@aaron-siegel](https://github.com/aaron-siegel) in [#740](https://github.com/pixeltable/pixeltable/pull/740)
* Introduce `TypedDict`s for user-facing table, dir, column, and index metadata by [@aaron-siegel](https://github.com/aaron-siegel) in [#739](https://github.com/pixeltable/pixeltable/pull/739)
* get\_dir\_contents(), a more structured replacement for list\_tables() / list\_dirs() by [@aaron-siegel](https://github.com/aaron-siegel) in [#742](https://github.com/pixeltable/pixeltable/pull/742)
* Test cleanup by [@aaron-siegel](https://github.com/aaron-siegel) in [#743](https://github.com/pixeltable/pixeltable/pull/743)
* Prefer public API in tests by [@aaron-siegel](https://github.com/aaron-siegel) in [#744](https://github.com/pixeltable/pixeltable/pull/744)
* Catching missing sqlalchemy transaction-related exceptions by [@mkornacker](https://github.com/mkornacker) in [#745](https://github.com/pixeltable/pixeltable/pull/745)
* PXT-668: Remove unneeded test\_sample\_md5\_fraction. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#750](https://github.com/pixeltable/pixeltable/pull/750)
* PXT-671: fixes to RateLimitsScheduler by [@mkornacker](https://github.com/mkornacker) in [#741](https://github.com/pixeltable/pixeltable/pull/741)
* make\_video API Doc by [@pierrebrunelle](https://github.com/pierrebrunelle) in [#736](https://github.com/pixeltable/pixeltable/pull/736)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.6...v0.4.7](https://github.com/pixeltable/pixeltable/compare/v0.4.6...v0.4.7)
***
### v0.4.6
**Released:** July 24, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.6](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.6)
#### What's Changed
* Migrate from `poetry` to `uv` by [@aaron-siegel](https://github.com/aaron-siegel) in [#722](https://github.com/pixeltable/pixeltable/pull/722)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.5...v0.4.6](https://github.com/pixeltable/pixeltable/compare/v0.4.5...v0.4.6)
***
### v0.4.5
**Released:** July 24, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.5](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.5)
#### What's Changed
* Consolidate more MediaStore operations - part 3 by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#701](https://github.com/pixeltable/pixeltable/pull/701)
* Working Python 3.13 dev installation by [@aaron-siegel](https://github.com/aaron-siegel) in [#695](https://github.com/pixeltable/pixeltable/pull/695)
* Replace uses of sql.text() in catalog.py with idiomatic SQLAlchemy by [@aaron-siegel](https://github.com/aaron-siegel) in [#707](https://github.com/pixeltable/pixeltable/pull/707)
* Move some column summary information into RowBuilder. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#711](https://github.com/pixeltable/pixeltable/pull/711)
* DataFrameResultSet.to\_pydantic() by [@mkornacker](https://github.com/mkornacker) in [#713](https://github.com/pixeltable/pixeltable/pull/713)
* PXT-667: Write media files to MediaStore with correct version. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#714](https://github.com/pixeltable/pixeltable/pull/714)
* Time travel by [@aaron-siegel](https://github.com/aaron-siegel) in [#710](https://github.com/pixeltable/pixeltable/pull/710)
* Correct the table.history status report for newly created views. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#719](https://github.com/pixeltable/pixeltable/pull/719)
* Include all columns in packager data preview by [@aaron-siegel](https://github.com/aaron-siegel) in [#720](https://github.com/pixeltable/pixeltable/pull/720)
* Communicate Column spec for all MediaStore save and move operations by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#718](https://github.com/pixeltable/pixeltable/pull/718)
* Further simplify DataRowBatch. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#724](https://github.com/pixeltable/pixeltable/pull/724)
* Use the method plan.\_insert\_prefetch\_node everywhere. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#721](https://github.com/pixeltable/pixeltable/pull/721)
* Support Python 3.10 style union types by [@aaron-siegel](https://github.com/aaron-siegel) in [#726](https://github.com/pixeltable/pixeltable/pull/726)
* Additional config parameters + more flexible rate limit parsing for Azure OpenAI support by [@aaron-siegel](https://github.com/aaron-siegel) in [#725](https://github.com/pixeltable/pixeltable/pull/725)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.4...v0.4.5](https://github.com/pixeltable/pixeltable/compare/v0.4.4...v0.4.5)
***
### v0.4.4
**Released:** July 16, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.4](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.4)
#### What's Changed
* Consolidate MediaStore file operations, including temp file name creation by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#694](https://github.com/pixeltable/pixeltable/pull/694)
* Update google-genai dev dependency by [@aaron-siegel](https://github.com/aaron-siegel) in [#699](https://github.com/pixeltable/pixeltable/pull/699)
* CI changes for random-tbl-ops by [@aaron-siegel](https://github.com/aaron-siegel) in [#697](https://github.com/pixeltable/pixeltable/pull/697)
* schema\_overrides bugfixes by [@aaron-siegel](https://github.com/aaron-siegel) in [#700](https://github.com/pixeltable/pixeltable/pull/700)
* Load replicas as views by [@aaron-siegel](https://github.com/aaron-siegel) in [#696](https://github.com/pixeltable/pixeltable/pull/696)
* Multi-phase transactions by [@mkornacker](https://github.com/mkornacker) in [#692](https://github.com/pixeltable/pixeltable/pull/692)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.3...v0.4.4](https://github.com/pixeltable/pixeltable/compare/v0.4.3...v0.4.4)
***
### v0.4.3
**Released:** July 10, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.3](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.3)
#### What's Changed
* Allow config parameters to be specified in `pxt.init()` by [@aaron-siegel](https://github.com/aaron-siegel) in [#680](https://github.com/pixeltable/pixeltable/pull/680)
* Prepare to report more status in table.history() by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#682](https://github.com/pixeltable/pixeltable/pull/682)
* `pxt.ls()` command for pretty-printing all contents of a Pixeltable dir by [@aaron-siegel](https://github.com/aaron-siegel) in [#681](https://github.com/pixeltable/pixeltable/pull/681)
* Handle 429 errors in RateLimitScheduler by [@mkornacker](https://github.com/mkornacker) in [#670](https://github.com/pixeltable/pixeltable/pull/670)
* Support dicts and Sequences of dicts in HF datasets \[rough-edges PXT-640] by [@aaron-siegel](https://github.com/aaron-siegel) in [#684](https://github.com/pixeltable/pixeltable/pull/684)
* Allow packaging of non-snapshot tables in TablePackager by [@aaron-siegel](https://github.com/aaron-siegel) in [#688](https://github.com/pixeltable/pixeltable/pull/688)
* Use a JSON field xxx\_cellmd in place of xxx\_errortype and xxx\_errormsg by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#685](https://github.com/pixeltable/pixeltable/pull/685)
* Consolidate media operations in the MediaStore module by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#691](https://github.com/pixeltable/pixeltable/pull/691)
* Enhance UpdateStatus to subsume SyncStatus. Save user and UpdateStatus in a field in TableVersionMd. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#689](https://github.com/pixeltable/pixeltable/pull/689)
* Refactor create\_replica to conform to concurrency protocol by [@aaron-siegel](https://github.com/aaron-siegel) in [#690](https://github.com/pixeltable/pixeltable/pull/690)
* Add additional packages & task configurations to nightly.yml by [@aaron-siegel](https://github.com/aaron-siegel) in [#693](https://github.com/pixeltable/pixeltable/pull/693)
* Doc fixes for audio and video UDFs by [@aaron-siegel](https://github.com/aaron-siegel) in [#698](https://github.com/pixeltable/pixeltable/pull/698)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.2...v0.4.3](https://github.com/pixeltable/pixeltable/compare/v0.4.2...v0.4.3)
***
### v0.4.2
**Released:** June 27, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.2](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.2)
#### What's Changed
* Revert various accumulated workarounds in CI by [@aaron-siegel](https://github.com/aaron-siegel) in [#669](https://github.com/pixeltable/pixeltable/pull/669)
* Use ColumnHandles in external stores by [@aaron-siegel](https://github.com/aaron-siegel) in [#664](https://github.com/pixeltable/pixeltable/pull/664)
* Update versions of a few more libraries by [@aaron-siegel](https://github.com/aaron-siegel) in [#668](https://github.com/pixeltable/pixeltable/pull/668)
* First part of additional status collection for table.history reporting. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#654](https://github.com/pixeltable/pixeltable/pull/654)
* Add table.history() method to return a user-readable list of known changes to a table. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#640](https://github.com/pixeltable/pixeltable/pull/640)
* Added Table.recompute\_columns() by [@mkornacker](https://github.com/mkornacker) in [#667](https://github.com/pixeltable/pixeltable/pull/667)
* Collect more information on ins, del, upd operations. Freeze UpdateStatus. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#673](https://github.com/pixeltable/pixeltable/pull/673)
* Refactor SyncStatus for merge with UpdateStatus. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#674](https://github.com/pixeltable/pixeltable/pull/674)
* Adding recompute\_columns() to overview in table.md by [@mkornacker](https://github.com/mkornacker) in [#675](https://github.com/pixeltable/pixeltable/pull/675)
* CI workflow for random table ops by [@aaron-siegel](https://github.com/aaron-siegel) in [#676](https://github.com/pixeltable/pixeltable/pull/676)
* \~40% improvement in insert performance by [@aaron-siegel](https://github.com/aaron-siegel) in [#658](https://github.com/pixeltable/pixeltable/pull/658)
* Skip whisperx on t4 instances by [@aaron-siegel](https://github.com/aaron-siegel) in [#678](https://github.com/pixeltable/pixeltable/pull/678)
* Pretty-print update status in notebooks or IPython shells by [@aaron-siegel](https://github.com/aaron-siegel) in [#677](https://github.com/pixeltable/pixeltable/pull/677)
* Performance improvements in add\_computed\_column by [@aaron-siegel](https://github.com/aaron-siegel) in [#679](https://github.com/pixeltable/pixeltable/pull/679)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.1...v0.4.2](https://github.com/pixeltable/pixeltable/compare/v0.4.1...v0.4.2)
***
### v0.4.1
**Released:** June 19, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.1](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.1)
#### What's Changed
* Docs/update model kwargs by [@jacobweiss2305](https://github.com/jacobweiss2305) in [#662](https://github.com/pixeltable/pixeltable/pull/662)
* Fixes and improvements for nightly CI job by [@aaron-siegel](https://github.com/aaron-siegel) in [#665](https://github.com/pixeltable/pixeltable/pull/665)
* Docs/changelog v0.4.0 by [@jacobweiss2305](https://github.com/jacobweiss2305) in [#663](https://github.com/pixeltable/pixeltable/pull/663)
* Update dev versions of many libraries used by Pixeltable by [@aaron-siegel](https://github.com/aaron-siegel) in [#666](https://github.com/pixeltable/pixeltable/pull/666)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.0...v0.4.1](https://github.com/pixeltable/pixeltable/compare/v0.4.0...v0.4.1)
***
### v0.4.0
**Released:** June 16, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.0](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.0)
#### Highlights
* Support for concurrent insert/query and table/view operations
* `sample()` operator for deterministic, pseudo-random samples of tables and data frames
* More flexible API for optional LLM parameters (see the sketch after this list)
* Groq integration
* MCP integration
* HEIC image support
* Numerous bugfixes
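As a rough illustration of the optional-parameter change from [#641](https://github.com/pixeltable/pixeltable/pull/641), provider-specific settings now travel in a single `model_kwargs` dict rather than as separate keyword arguments. The snippet below is only a sketch: the table `demo.chat`, its `prompt` column, and the chosen keys are placeholders, not part of the release notes.

```python theme={null}
import pixeltable as pxt
from pixeltable.functions import openai

# Illustrative sketch (assumed table/column names): optional provider settings
# are passed in one model_kwargs dict instead of individual keyword arguments.
t = pxt.get_table('demo.chat')
t.add_computed_column(
    response=openai.chat_completions(
        messages=[{'role': 'user', 'content': t.prompt}],
        model='gpt-4o-mini',
        model_kwargs={'temperature': 0.7, 'max_tokens': 256},
    )
)
```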
#### All Changes
* Support for concurrent table operations by [@mkornacker](https://github.com/mkornacker) in [#611](https://github.com/pixeltable/pixeltable/pull/611)
* New Deepseek notebook by [@aaron-siegel](https://github.com/aaron-siegel) in [#634](https://github.com/pixeltable/pixeltable/pull/634)
* Re-enable 3 of the 4 disabled Labelstudio tests by [@mkornacker](https://github.com/mkornacker) in [#635](https://github.com/pixeltable/pixeltable/pull/635)
* Implement `to_sql` for many string methods by [@aaron-siegel](https://github.com/aaron-siegel) in [#636](https://github.com/pixeltable/pixeltable/pull/636)
* Remove extraneous reload\_catalog() in test\_packager by [@aaron-siegel](https://github.com/aaron-siegel) in [#637](https://github.com/pixeltable/pixeltable/pull/637)
* fix building with llm link by [@jacobweiss2305](https://github.com/jacobweiss2305) in [#638](https://github.com/pixeltable/pixeltable/pull/638)
* Allow HEIC images by [@aaron-siegel](https://github.com/aaron-siegel) in [#639](https://github.com/pixeltable/pixeltable/pull/639)
* Include preview data in request when publishing a table by [@aaron-siegel](https://github.com/aaron-siegel) in [#631](https://github.com/pixeltable/pixeltable/pull/631)
* WIP: stratified sampling operation on DataFrame by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#591](https://github.com/pixeltable/pixeltable/pull/591)
* remove main reference and replace with release by [@jacobweiss2305](https://github.com/jacobweiss2305) in [#646](https://github.com/pixeltable/pixeltable/pull/646)
* docs: add product updates changelog with version history and release notes by [@jacobweiss2305](https://github.com/jacobweiss2305) in [#645](https://github.com/pixeltable/pixeltable/pull/645)
* remove print statement in gemini tool calls by [@jacobweiss2305](https://github.com/jacobweiss2305) in [#651](https://github.com/pixeltable/pixeltable/pull/651)
* PXT-595: Raise error if attempting to access metadata from a future v… by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#642](https://github.com/pixeltable/pixeltable/pull/642)
* Make TableVersion timestamps consistent across propagated changes. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#643](https://github.com/pixeltable/pixeltable/pull/643)
* Update RowBuilder.create\_table\_raw to save PIL image with the jpeg extension by [@Yann-CV](https://github.com/Yann-CV) in [#648](https://github.com/pixeltable/pixeltable/pull/648)
* Fix bug in handling "nullary" JsonMapper expressions by [@aaron-siegel](https://github.com/aaron-siegel) in [#655](https://github.com/pixeltable/pixeltable/pull/655)
* Update release.sh to handle pre-releases by [@aaron-siegel](https://github.com/aaron-siegel) in [#656](https://github.com/pixeltable/pixeltable/pull/656)
* Refactor inference API integrations to use `model_kwargs` dicts instead of explicit parameters by [@aaron-siegel](https://github.com/aaron-siegel) in [#641](https://github.com/pixeltable/pixeltable/pull/641)
* Refactor tool invocation unit tests \[techdebt] by [@aaron-siegel](https://github.com/aaron-siegel) in [#657](https://github.com/pixeltable/pixeltable/pull/657)
* Concurrent view interactions by [@mkornacker](https://github.com/mkornacker) in [#652](https://github.com/pixeltable/pixeltable/pull/652)
* Consolidate all SQL generation related to sampling inside of SqlSampleNode by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#649](https://github.com/pixeltable/pixeltable/pull/649)
* Suppressing asyncio slow callback warnings by [@mkornacker](https://github.com/mkornacker) in [#660](https://github.com/pixeltable/pixeltable/pull/660)
* Groq integration by [@aaron-siegel](https://github.com/aaron-siegel) in [#659](https://github.com/pixeltable/pixeltable/pull/659)
* First cut at MCP integration by [@aaron-siegel](https://github.com/aaron-siegel) in [#661](https://github.com/pixeltable/pixeltable/pull/661)
#### New Contributors
* [@Yann-CV](https://github.com/Yann-CV) made their first contribution in [#648](https://github.com/pixeltable/pixeltable/pull/648)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.3.15...v0.4.0](https://github.com/pixeltable/pixeltable/compare/v0.3.15...v0.4.0)
***
### v0.4.0-pre.3
**Released:** June 10, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.0-pre.3](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.0-pre.3)
#### What's Changed
* Update release.sh to handle pre-releases by [@aaron-siegel](https://github.com/aaron-siegel) in [#656](https://github.com/pixeltable/pixeltable/pull/656)
* Refactor inference API integrations to use `model_kwargs` dicts instead of explicit parameters by [@aaron-siegel](https://github.com/aaron-siegel) in [#641](https://github.com/pixeltable/pixeltable/pull/641)
* Refactor tool invocation unit tests \[techdebt] by [@aaron-siegel](https://github.com/aaron-siegel) in [#657](https://github.com/pixeltable/pixeltable/pull/657)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.4.0-pre.2...v0.4.0-pre.3](https://github.com/pixeltable/pixeltable/compare/v0.4.0-pre.2...v0.4.0-pre.3)
***
### v0.4.0-pre.2
**Released:** June 07, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.0-pre.2](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.0-pre.2)
#### What's Changed
* fix building with llm link by [@jacobweiss2305](https://github.com/jacobweiss2305) in [#638](https://github.com/pixeltable/pixeltable/pull/638)
* Allow HEIC images by [@aaron-siegel](https://github.com/aaron-siegel) in [#639](https://github.com/pixeltable/pixeltable/pull/639)
* Include preview data in request when publishing a table by [@aaron-siegel](https://github.com/aaron-siegel) in [#631](https://github.com/pixeltable/pixeltable/pull/631)
* WIP: stratified sampling operation on DataFrame by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#591](https://github.com/pixeltable/pixeltable/pull/591)
* remove main reference and replace with release by [@jacobweiss2305](https://github.com/jacobweiss2305) in [#646](https://github.com/pixeltable/pixeltable/pull/646)
* docs: add product updates changelog with version history and release notes by [@jacobweiss2305](https://github.com/jacobweiss2305) in [#645](https://github.com/pixeltable/pixeltable/pull/645)
* remove print statement in gemini tool calls by [@jacobweiss2305](https://github.com/jacobweiss2305) in [#651](https://github.com/pixeltable/pixeltable/pull/651)
* PXT-595: Raise error if attempting to access metadata from a future v… by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#642](https://github.com/pixeltable/pixeltable/pull/642)
* Make TableVersion timestamps consistent across propagated changes. by [@jpeterson-pxt](https://github.com/jpeterson-pxt) in [#643](https://github.com/pixeltable/pixeltable/pull/643)
* Update RowBuilder.create\_table\_raw to save PIL image with the jpeg extension by [@Yann-CV](https://github.com/Yann-CV) in [#648](https://github.com/pixeltable/pixeltable/pull/648)
* Fix bug in handling "nullary" JsonMapper expressions by [@aaron-siegel](https://github.com/aaron-siegel) in [#655](https://github.com/pixeltable/pixeltable/pull/655)
#### New Contributors
* [@Yann-CV](https://github.com/Yann-CV) made their first contribution in [#648](https://github.com/pixeltable/pixeltable/pull/648)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.3.15...v0.4.0-pre.2](https://github.com/pixeltable/pixeltable/compare/v0.3.15...v0.4.0-pre.2)
***
### v0.4.0-pre.1
**Released:** May 28, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.4.0-pre.1](https://github.com/pixeltable/pixeltable/releases/tag/v0.4.0-pre.1)
#### What's Changed
* Support for concurrent table operations by [@mkornacker](https://github.com/mkornacker) in [#611](https://github.com/pixeltable/pixeltable/pull/611)
* New Deepseek notebook by [@aaron-siegel](https://github.com/aaron-siegel) in [#634](https://github.com/pixeltable/pixeltable/pull/634)
* Re-enable 3 of the 4 disabled Labelstudio tests by [@mkornacker](https://github.com/mkornacker) in [#635](https://github.com/pixeltable/pixeltable/pull/635)
* Implement `to_sql` for many string methods by [@aaron-siegel](https://github.com/aaron-siegel) in [#636](https://github.com/pixeltable/pixeltable/pull/636)
* Remove extraneous reload\_catalog() in test\_packager by [@aaron-siegel](https://github.com/aaron-siegel) in [#637](https://github.com/pixeltable/pixeltable/pull/637)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.3.15...v0.4.0-pre.1](https://github.com/pixeltable/pixeltable/compare/v0.3.15...v0.4.0-pre.1)
***
### v0.3.15
**Released:** May 25, 2025\
**Author:** [@aaron-siegel](https://github.com/aaron-siegel)\
**View on GitHub:** [v0.3.15](https://github.com/pixeltable/pixeltable/releases/tag/v0.3.15)
#### What's Changed
* Rename blueprint links to guides in pixelagent documentation by [@jacobweiss2305](https://github.com/jacobweiss2305) in [#628](https://github.com/pixeltable/pixeltable/pull/628)
* Add documentation for embedding\_access feature by [@jacobweiss2305](https://github.com/jacobweiss2305) in [#626](https://github.com/pixeltable/pixeltable/pull/626)
* Improve import documentation by [@jacobweiss2305](https://github.com/jacobweiss2305) in [#624](https://github.com/pixeltable/pixeltable/pull/624)
* Update mint.json to use Kandinsky color theme by [@jacobweiss2305](https://github.com/jacobweiss2305) in [#633](https://github.com/pixeltable/pixeltable/pull/633)
* Merge different versions of base tables consistently when pulling replicas by [@aaron-siegel](https://github.com/aaron-siegel) in [#625](https://github.com/pixeltable/pixeltable/pull/625)
* Add UDFs for Google Imagen and Veo; Support Tool Calling in Gemini by [@aaron-siegel](https://github.com/aaron-siegel) in [#632](https://github.com/pixeltable/pixeltable/pull/632)
**Full Changelog**: [https://github.com/pixeltable/pixeltable/compare/v0.3.14...v0.3.15](https://github.com/pixeltable/pixeltable/compare/v0.3.14...v0.3.15)
***
# Agentic Patterns
Source: https://docs.pixeltable.com/howto/cookbooks/agents/agentic-patterns
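The walkthrough below assumes a standard Pixeltable setup along these lines. This is a sketch, not the page's verbatim setup: it assumes the OpenAI integration is configured (an `OPENAI_API_KEY` in your environment) and that all tables live in an `agentic_patterns` directory.

```python theme={null}
# Setup sketch (assumed): the patterns below rely on pxt.create_table, pxt.udf,
# pxt.tools, and the OpenAI integration from pixeltable.functions.
import pixeltable as pxt
from pixeltable.functions import openai

# Directory that holds every table created in this guide
pxt.create_dir('agentic_patterns')
```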
Created directory 'agentic\_patterns'.

## Pattern 1: Prompt Chaining

Break a complex task into sequential steps, where each step’s output feeds the next.

**Imperative approach:** a chain of function calls or an explicit pipeline object.

**Pixeltable approach:** each step is a computed column. The engine resolves dependencies automatically.
input → step 1 (outline) → step 2 (draft) → step 3 (polish) → output

```python theme={null}
# Create a table with a single input column
chain = pxt.create_table('agentic_patterns/chain', {'topic': pxt.String})
```
Created table 'chain'.

```python theme={null}
# Step 1: generate an outline
chain.add_computed_column(
    outline_response=openai.chat_completions(
        messages=[
            {
                'role': 'user',
                'content': 'Create a 3-point outline for a short article about: '
                + chain.topic,
            }
        ],
        model='gpt-4o-mini',
    )
)
chain.add_computed_column(
    outline=chain.outline_response.choices[0].message.content.astype(pxt.String)
)
```
Added 0 column values with 0 errors in 0.00 s Added 0 column values with 0 errors in 0.01 s No rows affected.

```python theme={null}
# Step 2: write a draft from the outline
chain.add_computed_column(
    draft_response=openai.chat_completions(
        messages=[
            {
                'role': 'user',
                'content': 'Write a short article (2-3 paragraphs) based on this outline:\n\n'
                + chain.outline,
            }
        ],
        model='gpt-4o-mini',
    )
)
chain.add_computed_column(
    draft=chain.draft_response.choices[0].message.content.astype(pxt.String)
)
```
Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.

```python theme={null}
# Step 3: polish the draft
chain.add_computed_column(
    polish_response=openai.chat_completions(
        messages=[
            {
                'role': 'user',
                'content': 'Edit this article for clarity and conciseness. '
                'Return only the improved text:\n\n' + chain.draft,
            }
        ],
        model='gpt-4o-mini',
    )
)
chain.add_computed_column(
    final_article=chain.polish_response.choices[0].message.content.astype(pxt.String)
)
```
Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.

```python theme={null}
# Insert a topic — all three steps execute automatically
chain.insert([{'topic': 'the benefits of declarative AI pipelines'}])
chain.select(
    chain.topic, chain.outline, chain.draft, chain.final_article
).collect()
```
Inserted 1 row with 0 errors in 14.58 s (0.07 rows/s)

Every intermediate result (`outline`, `draft`, `final_article`) is persisted in the table. Inserting another topic reuses the same pipeline — no code changes needed. If the same topic is inserted again, cached results are returned instantly.

## Pattern 2: Routing

Classify an input and route it to a specialized handler. This is the agent equivalent of a switch/case statement.

**Imperative approach:** a triage agent that performs handoffs to specialized agents.

**Pixeltable approach:** one computed column classifies; a UDF selects the prompt; a second LLM call generates the response.
input → classify intent → select specialized prompt → generate response

```python theme={null}
router = pxt.create_table(
    'agentic_patterns/router', {'query': pxt.String}
)
```
Created table 'router'.

```python theme={null}
# Step 1: classify the query intent
router.add_computed_column(
    classify_response=openai.chat_completions(
        messages=[
            {
                'role': 'user',
                'content': 'Classify this customer query into exactly one category: '
                'technical, billing, or general. Reply with the single word only.\n\n'
                'Query: ' + router.query,
            }
        ],
        model='gpt-4o-mini',
    )
)
router.add_computed_column(
    intent=router.classify_response.choices[0].message.content.astype(pxt.String)
)
```
Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.

```python theme={null}
# Step 2: route to a specialized system prompt based on the classification
@pxt.udf
def route_prompt(intent: str, query: str) -> list[dict]:
    """Select a system prompt based on the classified intent."""
    system_prompts = {
        'technical': 'You are a senior technical support engineer. '
        'Provide precise, step-by-step troubleshooting guidance.',
        'billing': 'You are a billing specialist. '
        'Be empathetic and clear about charges, refunds, and payment options.',
        'general': 'You are a friendly customer service representative. '
        'Answer helpfully and concisely.',
    }
    # Default to general if classification is unexpected
    system = system_prompts.get(intent.strip().lower(), system_prompts['general'])
    return [
        {'role': 'system', 'content': system},
        {'role': 'user', 'content': query},
    ]

router.add_computed_column(
    routed_messages=route_prompt(router.intent, router.query)
)
```
Added 0 column values with 0 errors in 0.01 s No rows affected.

```python theme={null}
# Step 3: generate the specialized response
router.add_computed_column(
    response_raw=openai.chat_completions(
        messages=router.routed_messages, model='gpt-4o-mini'
    )
)
router.add_computed_column(
    response=router.response_raw.choices[0].message.content.astype(pxt.String)
)
```
Added 0 column values with 0 errors in 0.00 s Added 0 column values with 0 errors in 0.01 s No rows affected.

```python theme={null}
# Insert queries spanning different intents
router.insert(
    [
        {'query': 'My API calls are returning 429 errors since this morning'},
        {'query': 'I was charged twice for my subscription last month'},
        {'query': 'What programming languages do you support?'},
    ]
)
router.select(router.query, router.intent, router.response).collect()
```
Inserted 3 rows with 0 errors in 6.93 s (0.43 rows/s)

Each query was classified and then handled by a specialized system prompt. The `intent` column is inspectable for every row, making it easy to audit routing decisions.

## Pattern 3: Parallelization

Run multiple independent LLM calls on the same input simultaneously, then combine the results.

**Imperative approach:** `asyncio.gather` or thread pools.

**Pixeltable approach:** add independent computed columns. The engine parallelizes them automatically because they share no dependencies.
```
          ┌→ sentiment ─┐
input ──┼→ entities  ──┼→ merge → combined output
          └→ summary  ─┘
```

```python theme={null}
parallel = pxt.create_table(
    'agentic_patterns/parallel', {'text': pxt.String}
)
```
Created table 'parallel'.

```python theme={null}
# Three independent LLM calls — Pixeltable runs them in parallel automatically
parallel.add_computed_column(
    sentiment_raw=openai.chat_completions(
        messages=[
            {
                'role': 'user',
                'content': 'Analyze the sentiment of this text. '
                'Reply with: positive, negative, or neutral.\n\n' + parallel.text,
            }
        ],
        model='gpt-4o-mini',
    )
)
parallel.add_computed_column(
    sentiment=parallel.sentiment_raw.choices[0].message.content.astype(pxt.String)
)

parallel.add_computed_column(
    entities_raw=openai.chat_completions(
        messages=[
            {
                'role': 'user',
                'content': 'Extract all named entities (people, companies, locations) '
                'from this text. Return a comma-separated list.\n\n' + parallel.text,
            }
        ],
        model='gpt-4o-mini',
    )
)
parallel.add_computed_column(
    entities=parallel.entities_raw.choices[0].message.content.astype(pxt.String)
)

parallel.add_computed_column(
    summary_raw=openai.chat_completions(
        messages=[
            {
                'role': 'user',
                'content': 'Summarize this text in one sentence.\n\n' + parallel.text,
            }
        ],
        model='gpt-4o-mini',
    )
)
parallel.add_computed_column(
    summary=parallel.summary_raw.choices[0].message.content.astype(pxt.String)
)
```
Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.00 s Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.00 s Added 0 column values with 0 errors in 0.01 s No rows affected.

```python theme={null}
# Merge the parallel results into a single structured report
@pxt.udf
def merge_analysis(sentiment: str, entities: str, summary: str) -> dict:
    """Combine parallel analysis results into one report."""
    return {
        'sentiment': sentiment.strip(),
        'entities': entities.strip(),
        'summary': summary.strip(),
    }

parallel.add_computed_column(
    report=merge_analysis(parallel.sentiment, parallel.entities, parallel.summary)
)
```
Added 0 column values with 0 errors in 0.01 s No rows affected.

```python theme={null}
parallel.insert(
    [
        {
            'text': 'Apple announced record quarterly revenue of $124 billion, '
            'driven by strong iPhone sales in Europe and Asia. CEO Tim Cook '
            "expressed optimism about the company's AI initiatives, while "
            'some analysts remain cautious about increased R&D spending.'
        }
    ]
)
parallel.select(
    parallel.text, parallel.sentiment, parallel.entities, parallel.summary
).collect()
```

The three LLM calls (`sentiment`, `entities`, `summary`) have no dependency on each other, so Pixeltable dispatches them concurrently. The `merge_analysis` UDF waits for all three before combining the results. No async code required.

## Pattern 4: Tool Use

Give an LLM access to external functions it can call to gather information or take action.

**Imperative approach:** `@function_tool` decorator, tool loop that re-prompts until the LLM stops requesting tools.

**Pixeltable approach:** `pxt.tools()` bundles UDFs into tool definitions; `invoke_tools()` executes the LLM’s choices — both as computed columns.
input → LLM (with tools) → invoke\_tools() → results

For a deeper walkthrough including MCP servers, see [Use tool calling with LLMs](/howto/cookbooks/agents/llm-tool-calling).

```python theme={null}
# Define tool functions as UDFs
@pxt.udf
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    weather_data = {
        'new york': 'Sunny, 72F',
        'london': 'Cloudy, 58F',
        'tokyo': 'Rainy, 65F',
        'paris': 'Partly cloudy, 68F',
    }
    return weather_data.get(city.lower(), f'Weather data not available for {city}')

@pxt.udf
def get_stock_price(symbol: str) -> str:
    """Get the current stock price for a ticker symbol."""
    prices = {'AAPL': '$178.50', 'GOOGL': '$141.25', 'MSFT': '$378.90'}
    return prices.get(symbol.upper(), f'Price not available for {symbol}')

# Bundle into a Tools object
tools = pxt.tools(get_weather, get_stock_price)
```

```python theme={null}
# Create the tool-calling pipeline
tool_agent = pxt.create_table(
    'agentic_patterns/tool_agent', {'query': pxt.String}
)

# LLM decides which tool(s) to call
tool_agent.add_computed_column(
    response=openai.chat_completions(
        messages=[{'role': 'user', 'content': tool_agent.query}],
        model='gpt-4o-mini',
        tools=tools,
    )
)

# Execute the tool calls automatically
tool_agent.add_computed_column(
    tool_output=openai.invoke_tools(tools, tool_agent.response)
)
```
Created table 'tool\_agent'. Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.

```python theme={null}
tool_agent.insert(
    [
        {'query': "What's the weather in Tokyo?"},
        {'query': "What's Apple's stock price?"},
        {'query': "What's the weather in Paris and Microsoft's stock price?"},
    ]
)

for row in tool_agent.select(tool_agent.query, tool_agent.tool_output).collect():
    print(f'Query: {row["query"]}')
    for tool_name, results in (row['tool_output'] or {}).items():
        if results:
            print(f'  -> {tool_name}: {results}')
    print()
```

The LLM chose which tools to invoke (including multiple tools for the last query). `invoke_tools()` executed them and stored results. The full LLM response is also persisted in the `response` column for debugging.

## Pattern 5: Evaluator-Optimizer

One LLM generates output, a second LLM evaluates it, and the results are used to decide whether to refine. This is the architectural cousin of the *Reflection* pattern from Taxonomy 1 — an agent critiques its own output and iteratively improves it.

**Imperative approach:** a while-loop that re-prompts until a quality threshold is met (see [Pixelagent’s reflection example](https://github.com/pixeltable/pixelagent/tree/main/examples/reflection)).

**Pixeltable approach:** chained computed columns — generate, evaluate, then conditionally refine. The evaluation score is stored alongside the content for analysis.
input → generate → evaluate (score + feedback) → refine if needed → output

```python theme={null}
evaluator = pxt.create_table(
    'agentic_patterns/evaluator', {'product_brief': pxt.String}
)
```
Created table 'evaluator'.

```python theme={null}
# Step 1: generate initial marketing copy
evaluator.add_computed_column(
    gen_response=openai.chat_completions(
        messages=[
            {
                'role': 'user',
                'content': 'Write a short marketing tagline (one sentence) for this product:\n\n'
                + evaluator.product_brief,
            }
        ],
        model='gpt-4o-mini',
    )
)
evaluator.add_computed_column(
    first_draft=evaluator.gen_response.choices[0].message.content.astype(pxt.String)
)
```
Added 0 column values with 0 errors in 0.00 s Added 0 column values with 0 errors in 0.01 s No rows affected.

```python theme={null}
# Step 2: evaluate the draft with an LLM-as-judge
evaluator.add_computed_column(
    eval_response=openai.chat_completions(
        messages=[
            {
                'role': 'user',
                'content': 'Rate this marketing tagline on a scale of 1-10 for clarity, '
                'creativity, and persuasiveness. Then provide one sentence of feedback '
                'for improvement.\n\n'
                'Tagline: ' + evaluator.first_draft + '\n\n'
                'Reply in this exact format:\n'
                'Score: <score>\n'
                'Feedback: <feedback>',
            }
        ],
        model='gpt-4o-mini',
    )
)
evaluator.add_computed_column(
    evaluation=evaluator.eval_response.choices[0].message.content.astype(pxt.String)
)
```
Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.

```python theme={null}
# Step 3: refine using the feedback
evaluator.add_computed_column(
    refine_response=openai.chat_completions(
        messages=[
            {
                'role': 'user',
                'content': 'Improve this marketing tagline based on the feedback below. '
                'Return only the improved tagline.\n\n'
                'Original: ' + evaluator.first_draft + '\n\n'
                'Feedback: ' + evaluator.evaluation,
            }
        ],
        model='gpt-4o-mini',
    )
)
evaluator.add_computed_column(
    refined=evaluator.refine_response.choices[0].message.content.astype(pxt.String)
)
```
Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.

```python theme={null}
evaluator.insert(
    [
        {
            'product_brief': 'A noise-canceling headphone designed for open-plan offices, '
            'with 30-hour battery life and a built-in microphone for calls.'
        },
        {
            'product_brief': 'An AI-powered code review tool that catches bugs, suggests '
            "improvements, and learns your team's coding style over time."
        },
    ]
)
evaluator.select(
    evaluator.product_brief,
    evaluator.first_draft,
    evaluator.evaluation,
    evaluator.refined,
).collect()
```
Inserted 2 rows with 0 errors in 2.95 s (0.68 rows/s)

Both the first draft and the refined version are stored side-by-side with the evaluation. This makes it straightforward to compare outputs, audit the judge’s reasoning, or filter rows where the score fell below a threshold.

## Pattern 6: Orchestrator-Worker

A central agent decomposes a task, delegates sub-tasks to specialized worker agents, and synthesizes the results. This is the architectural cousin of the *Multi-Agent* pattern from Taxonomy 1, and the same structure Anthropic uses in their [multi-agent research system](https://www.anthropic.com/engineering/multi-agent-research-system) — a lead agent coordinates parallel subagents, each with their own context and tools.

**Imperative approach:** an orchestrator agent class that spawns worker agent instances and collects their outputs.

**Pixeltable approach:** each worker is a table with computed columns, wrapped as a callable function via `pxt.udf(table, return_value=...)`. The orchestrator table calls these functions as computed columns.
```
input → decompose → worker A (summarizer)   ─┐
                    → worker B (fact-checker) ─┼→ synthesize → output
```

For more on table UDFs, see [Use a table pipeline as a reusable function](/howto/cookbooks/agents/pattern-table-as-udf).

### Build worker agents as tables

```python theme={null}
# Worker A: summarizer
summarizer_tbl = pxt.create_table(
    'agentic_patterns/summarizer', {'text': pxt.String}
)
summarizer_tbl.add_computed_column(
    response=openai.chat_completions(
        messages=[
            {
                'role': 'user',
                'content': 'Summarize this text in 2-3 sentences:\n\n'
                + summarizer_tbl.text,
            }
        ],
        model='gpt-4o-mini',
    )
)
summarizer_tbl.add_computed_column(
    summary=summarizer_tbl.response.choices[0].message.content.astype(pxt.String)
)

# Wrap as a callable function
summarize = pxt.udf(summarizer_tbl, return_value=summarizer_tbl.summary)
```
Created table 'summarizer'. Added 0 column values with 0 errors in 0.10 s Added 0 column values with 0 errors in 0.06 s

```python theme={null}
# Worker B: fact-checker
checker_tbl = pxt.create_table(
    'agentic_patterns/checker', {'claim': pxt.String}
)
checker_tbl.add_computed_column(
    response=openai.chat_completions(
        messages=[
            {
                'role': 'user',
                'content': 'Assess whether this claim is plausible. '
                'Reply with: PLAUSIBLE or DUBIOUS, followed by a one-sentence explanation.\n\n'
                'Claim: ' + checker_tbl.claim,
            }
        ],
        model='gpt-4o-mini',
    )
)
checker_tbl.add_computed_column(
    assessment=checker_tbl.response.choices[0].message.content.astype(pxt.String)
)

# Wrap as a callable function
fact_check = pxt.udf(checker_tbl, return_value=checker_tbl.assessment)
```
Created table 'checker'. Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.02 s

### Build the orchestrator

```python theme={null}
# Orchestrator table: delegates to workers, then synthesizes
orchestrator = pxt.create_table(
    'agentic_patterns/orchestrator', {'article': pxt.String}
)

# Dispatch to worker A (summarizer) and worker B (fact-checker) in parallel
orchestrator.add_computed_column(
    summary=summarize(text=orchestrator.article)
)
orchestrator.add_computed_column(
    fact_check_result=fact_check(claim=orchestrator.article)
)
```
Created table 'orchestrator'. Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.

```python theme={null}
# Synthesize worker outputs into a final briefing
orchestrator.add_computed_column(
    synth_response=openai.chat_completions(
        messages=[
            {
                'role': 'user',
                'content': 'Based on the summary and fact-check below, write a brief '
                'editorial note (2-3 sentences) about this article.\n\n'
                'Summary: ' + orchestrator.summary + '\n\n'
                'Fact-check: ' + orchestrator.fact_check_result,
            }
        ],
        model='gpt-4o-mini',
    )
)
orchestrator.add_computed_column(
    briefing=orchestrator.synth_response.choices[0].message.content.astype(pxt.String)
)
```
Added 0 column values with 0 errors in 0.02 s Added 0 column values with 0 errors in 0.01 s No rows affected.

```python theme={null}
orchestrator.insert(
    [
        {
            'article': 'A recent study published in Nature found that global sea levels '
            'rose by 4.5 mm per year over the last decade, nearly double the rate observed '
            'in the 1990s. Researchers attribute the acceleration primarily to ice sheet '
            'loss in Greenland and Antarctica, compounded by thermal expansion of ocean '
            'water. The findings suggest coastal cities may face significant flooding risks '
            'by 2050 without aggressive mitigation strategies.'
        }
    ]
)
orchestrator.select(
    orchestrator.summary,
    orchestrator.fact_check_result,
    orchestrator.briefing,
).collect()
```
Inserted 1 row with 0 errors in 4.69 s (0.21 rows/s)

The orchestrator table called two independent worker pipelines (`summarize` and `fact_check`), each backed by its own table with full intermediate-result persistence. The synthesis step consumed both outputs to produce the final briefing. Adding a new worker (e.g., a tone analyzer) requires only creating another table, wrapping it with `pxt.udf()`, and adding one more computed column to the orchestrator.

## Strategy A: ReAct

ReAct is not a wiring pattern; it is a **reasoning strategy** that can be applied inside any of the six patterns above. The agent alternates between reasoning about the next step and acting on it (typically via tools), observing the result before deciding what to do next.

**Imperative approach:** a while-loop that parses the LLM’s THOUGHT/ACTION output, calls tools, and feeds observations back (see [Pixelagent’s ReAct example](https://github.com/pixeltable/pixelagent/tree/main/examples/planning)).

**Pixeltable approach:** the reasoning loop lives in a UDF that inserts rows into a tool-calling table and reads back results. The table stores every thought-action-observation triple for full observability.
question → \[THOUGHT → ACTION → OBSERVATION] × N → final answer

```python theme={null}
import re

# Define a tool for the ReAct agent
@pxt.udf
def lookup_population(country: str) -> str:
    """Look up the approximate population of a country."""
    populations = {
        'united states': '331 million',
        'china': '1.4 billion',
        'india': '1.4 billion',
        'germany': '84 million',
        'brazil': '214 million',
        'japan': '125 million',
    }
    return populations.get(
        country.lower(), f'Population data not available for {country}'
    )

react_tools = pxt.tools(lookup_population)
```

```python theme={null}
# Build a tool-calling table that the ReAct loop will insert into
react_steps = pxt.create_table(
    'agentic_patterns/react_steps',
    {'step': pxt.Int, 'prompt': pxt.String, 'system_prompt': pxt.String},
)
react_steps.add_computed_column(
    response=openai.chat_completions(
        messages=[
            {'role': 'system', 'content': react_steps.system_prompt},
            {'role': 'user', 'content': react_steps.prompt},
        ],
        model='gpt-4o-mini',
        tools=react_tools,
    )
)
react_steps.add_computed_column(
    answer=react_steps.response.choices[0].message.content.astype(
        pxt.String
    )
)
react_steps.add_computed_column(
    tool_output=openai.invoke_tools(react_tools, react_steps.response)
)
```
Created table 'react\_steps'. Added 0 column values with 0 errors in 0.00 s Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.00 s No rows affected.```python theme={null} # The ReAct loop: reason → act → observe, repeated until done REACT_SYSTEM = ( "You are a research assistant. Answer the user's question step by step.\n" 'Available tools: lookup_population\n\n' 'On each turn, respond in this exact format:\n' 'THOUGHT:
## Strategy B: Planning

question → generate plan → execute step 1 → execute step 2 → ... → synthesize

```python theme={null}
import json as json_mod

planner = pxt.create_table(
    'agentic_patterns/planner', {'question': pxt.String}
)

# Step 1: generate a plan as structured JSON
planner.add_computed_column(
    plan_response=openai.chat_completions(
        messages=[
            {
                'role': 'user',
                'content': 'Break this question into 2-3 research steps. '
                'Return ONLY a JSON object like {"steps": ["sub-question 1", "sub-question 2"]}. '
                'No other text.\n\n'
                'Question: ' + planner.question,
            }
        ],
        model='gpt-4o-mini',
    )
)
planner.add_computed_column(
    plan_text=planner.plan_response.choices[0].message.content.astype(
        pxt.String
    )
)
```
Created table 'planner'.
Added 0 column values with 0 errors in 0.00 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.

```python theme={null}
# Step 2: parse the plan into structured sub-questions
@pxt.udf
def execute_plan(plan_json: str, original_question: str) -> list[dict]:
    """Parse the plan JSON and return structured sub-questions."""
    try:
        data = json_mod.loads(plan_json)
        # Handle both {"steps": [...]} and direct [...]
        steps = (
            data
            if isinstance(data, list)
            else data.get('steps', data.get('questions', []))
        )
        return [
            {'step': i + 1, 'sub_question': q} for i, q in enumerate(steps)
        ]
    except (json_mod.JSONDecodeError, TypeError):
        return [{'step': 1, 'sub_question': original_question}]

planner.add_computed_column(
    plan_steps=execute_plan(planner.plan_text, planner.question)
)
```
Added 0 column values with 0 errors in 0.01 s
No rows affected.

```python theme={null}
# Step 3: execute the plan — answer each sub-question, then synthesize
@pxt.udf
def format_plan_for_execution(
    plan_steps: list[dict], original_question: str
) -> str:
    """Format the plan steps into a single execution prompt."""
    step_list = '\n'.join(
        f'{s["step"]}. {s["sub_question"]}' for s in plan_steps
    )
    return (
        f'Answer each of these research sub-questions briefly, '
        f'then provide a final synthesis that answers the original question.\n\n'
        f'Original question: {original_question}\n\n'
        f'Sub-questions:\n{step_list}'
    )

planner.add_computed_column(
    exec_prompt=format_plan_for_execution(
        planner.plan_steps, planner.question
    )
)
planner.add_computed_column(
    exec_response=openai.chat_completions(
        messages=[{'role': 'user', 'content': planner.exec_prompt}],
        model='gpt-4o-mini',
    )
)
planner.add_computed_column(
    final_answer=planner.exec_response.choices[0].message.content.astype(
        pxt.String
    )
)
```
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.

```python theme={null}
planner.insert(
    [
        {
            'question': 'What are the economic and environmental trade-offs of electric vehicles vs hydrogen fuel cells?'
        }
    ]
)
row = planner.select(
    planner.question, planner.plan_text, planner.final_answer
).collect()
print('Plan:', row['plan_text'][0])
print()
print('Answer:', row['final_answer'][0][:500])
```

The plan (stored in `plan_steps`) is fully inspectable. The execution step answers all sub-questions in a single LLM call, but this could also use parallelization (Pattern 3) to answer each sub-question independently and merge the results. Planning and ReAct compose naturally with any of the six architectural patterns.

## Choosing a Pattern

### Six architectural patterns

### Two cross-cutting reasoning strategies

Patterns compose naturally. An orchestrator-worker system might use routing in the orchestrator, tool use within a worker, and ReAct reasoning inside the tool-calling loop. Because each pattern is just a set of computed columns on a table, combining them requires no special glue code.

## See Also

**Pixeltable cookbooks:**

* [Use tool calling with LLMs](/howto/cookbooks/agents/llm-tool-calling) — deep dive into `pxt.tools()`, `invoke_tools()`, and MCP server integration
* [Build an agent with persistent memory](/howto/cookbooks/agents/pattern-agent-memory) — embedding indexes for semantic memory recall
* [Build a RAG pipeline](/howto/cookbooks/agents/pattern-rag-pipeline) — document chunking, embedding, and retrieval-augmented generation
* [Look up structured data with retrieval UDFs](/howto/cookbooks/agents/pattern-data-lookup) — `pxt.retrieval_udf()` for key-based lookups
* [Use a table pipeline as a reusable function](/howto/cookbooks/agents/pattern-table-as-udf) — `pxt.udf(table)` explained in depth

**Pixelagent examples** (imperative implementations of the same patterns):

* [Reflection loop](https://github.com/pixeltable/pixelagent/tree/main/examples/reflection) — main agent + critic agent with iterative refinement
* [ReAct / Planning](https://github.com/pixeltable/pixelagent/tree/main/examples/planning) — step-by-step reasoning with tool calls
* [Tool calling](https://github.com/pixeltable/pixelagent/tree/main/examples/tool-calling) — OpenAI, Anthropic, and Bedrock tool integration
* [Memory](https://github.com/pixeltable/pixelagent/tree/main/examples/memory) — persistent and semantic memory management

**External references:**

* [OpenAI’s Practical Guide to Building Agents](https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf) — the six architectural patterns
* [Anthropic: How we built our multi-agent research system](https://www.anthropic.com/engineering/multi-agent-research-system) — orchestrator-worker at scale
* [Pydantic AI: Multi-agent applications](https://ai.pydantic.dev/multi-agent-applications/#agent-delegation) — agent delegation patterns

# Use tool calling and MCP servers with LLMs

Source: https://docs.pixeltable.com/howto/cookbooks/agents/llm-tool-calling
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'tools\_demo'. \### Define tools as UDFs ```python theme={null} # Define tool functions as Pixeltable UDFs @pxt.udf def get_weather(city: str) -> str: """Get the current weather for a city.""" # In production, call a real weather API weather_data = { 'new york': 'Sunny, 72°F', 'london': 'Cloudy, 58°F', 'tokyo': 'Rainy, 65°F', 'paris': 'Partly cloudy, 68°F', } return weather_data.get( city.lower(), f'Weather data not available for {city}' ) @pxt.udf def get_stock_price(symbol: str) -> str: """Get the current stock price for a symbol.""" # In production, call a real stock API prices = { 'AAPL': '$178.50', 'GOOGL': '$141.25', 'MSFT': '$378.90', 'AMZN': '$185.30', } return prices.get(symbol.upper(), f'Price not available for {symbol}') ``` ```python theme={null} # Create a Tools object with our functions tools = pxt.tools(get_weather, get_stock_price) ``` ### Create tool-calling pipeline ```python theme={null} # Create table for queries queries = pxt.create_table('tools_demo/queries', {'query': pxt.String}) ```
Created table 'queries'.```python theme={null} # Add LLM call with tools queries.add_computed_column( response=openai.chat_completions( messages=[{'role': 'user', 'content': queries.query}], model='gpt-4o-mini', tools=tools, # Pass tools to the LLM ) ) ```
Added 0 column values with 0 errors in 0.00 s No rows affected.```python theme={null} # Automatically execute tool calls and get results queries.add_computed_column( tool_results=openai.invoke_tools(tools, queries.response) ) ```
Added 0 column values with 0 errors in 0.01 s No rows affected.### Run tool-enabled queries ```python theme={null} # Insert queries that require tool calls sample_queries = [ {'query': "What's the weather in Tokyo?"}, {'query': "What's the stock price of Apple?"}, { 'query': "What's the weather in Paris and the price of Microsoft stock?" }, ] queries.insert(sample_queries) ```
Inserted 3 rows with 0 errors in 4.16 s (0.72 rows/s) 3 rows inserted.```python theme={null} # View results queries.select(queries.query, queries.tool_results).collect() ``` ## Using MCP Servers as Tools The [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) is an open protocol that standardizes how applications provide context to LLMs. Pixeltable can connect to MCP servers and use their exposed tools as UDFs. ### Why MCP? ### Create an MCP Server First, create an MCP server with tools you want to expose. Save this as `mcp_server.py`: ```python theme={null} from mcp.server.fastmcp import FastMCP mcp = FastMCP('PixeltableDemo', stateless_http=True) @mcp.tool() def calculate_discount(price: float, discount_percent: float) -> float: """Calculate the discounted price.""" return price * (1 - discount_percent / 100) @mcp.tool() def check_inventory(product_id: str) -> str: """Check inventory status for a product.""" # In production, query your inventory database inventory = { 'SKU001': 'In stock (42 units)', 'SKU002': 'Low stock (3 units)', 'SKU003': 'Out of stock', } return inventory.get(product_id, f'Unknown product: {product_id}') if __name__ == '__main__': mcp.run(transport='streamable-http') ``` Run the server: `python mcp_server.py` (it will listen on `http://localhost:8000/mcp`) ### Connect to MCP Server and Use Tools ```python theme={null} # Connect to the MCP server using pxt.mcp_udfs() # This creates a Pixeltable UDF for each tool exposed by the server # See: https://docs.pixeltable.com/platform/custom-functions#5-mcp-udfs mcp_tools = pxt.mcp_udfs('https://docs.pixeltable.com/mcp') # View available tools - each is now a callable Pixeltable function for tool in mcp_tools: print(f'- {tool.name}: {tool.comment()}') ```
- SearchPixeltableDocumentation: Search across the Pixeltable Documentation knowledge base to find relevant information, code examples, API references, and guides. Use this tool when you need to answer questions about Pixeltable Documentation, find specific documentation, understand how features work, or locate implementation details. The search returns contextual content with titles and direct links to the documentation pages.```python theme={null} # Bundle MCP tools for LLM use mcp_toolset = pxt.tools(*mcp_tools) # Create a table with MCP tool-calling pipeline mcp_queries = pxt.create_table( 'tools_demo/mcp_queries', {'query': pxt.String} ) # Add LLM call with MCP tools mcp_queries.add_computed_column( response=openai.chat_completions( messages=[{'role': 'user', 'content': mcp_queries.query}], model='gpt-4o-mini', tools=mcp_toolset, ) ) # Execute MCP tool calls mcp_queries.add_computed_column( tool_results=openai.invoke_tools(mcp_toolset, mcp_queries.response) ) # View the schema - note that mcp_toolset is stored as persistent metadata # Every subsequent insert will use these same tools automatically mcp_queries.describe() ```
Created table 'mcp\_queries'.
Added 0 column values with 0 errors in 0.00 s
Added 0 column values with 0 errors in 0.01 s

```python theme={null}
# Test with documentation queries
mcp_queries.insert(
    [
        {'query': 'What is Pixeltable?'},
        {'query': 'How to use OpenAI in Pixeltable?'},
    ]
)

mcp_queries.select(mcp_queries.query, mcp_queries.tool_results).collect()
```

```python theme={null}
# Extract the search result with a named column
mcp_queries.select(
    search_result=mcp_queries.tool_results[
        'SearchPixeltableDocumentation'
    ][0]
).collect()
```
Query → LLM decides tool → invoke\_tools executes → Results

**Key components:**

**MCP integration:**
MCP Server → pxt.mcp\_udfs() → pxt.tools() → LLM tool calling

MCP servers expose tools via a standardized protocol. Pixeltable’s `mcp_udfs()` connects to any MCP server and returns the tools as callable UDFs that can be bundled with `pxt.tools()` for LLM use.

**Supported providers:**

## See also

* [Build a RAG pipeline](/howto/cookbooks/agents/pattern-rag-pipeline) - Retrieval-augmented generation
* [Run local LLMs](/howto/providers/working-with-ollama) - Local model inference
* [Multimodal MCP Servers](/libraries/mcp) - Pixeltable’s MCP server collection
* [Custom Functions](/platform/custom-functions) - More about UDFs and MCP integration

# Build an agent with memory

Source: https://docs.pixeltable.com/howto/cookbooks/agents/pattern-agent-memory
Created directory 'agent\_demo'. \### Create memory bank ```python theme={null} # Create memory bank table memories = pxt.create_table( 'agent_demo/memories', { 'content': pxt.String, # The memory content 'category': pxt.String, # Optional category (preference, fact, etc.) 'created_at': pxt.Timestamp, # When the memory was stored }, ) ```
Created table 'memories'.```python theme={null} # Add embedding index for semantic search on content memories.add_embedding_index( column='content', string_embed=embeddings.using(model='text-embedding-3-small'), ) ``` ### Define retrieval function ```python theme={null} # Define a query function to retrieve relevant memories @pxt.query def recall_memories(context: str, top_k: int = 3): """Retrieve memories relevant to the current context.""" sim = memories.content.similarity(string=context) return ( memories.where(sim > 0.5) .order_by(sim, asc=False) .limit(top_k) .select(content=memories.content, category=memories.category) ) ``` ### Store some memories ```python theme={null} # Store some initial memories initial_memories = [ { 'content': 'User prefers Python for data analysis', 'category': 'preference', 'created_at': datetime.now(), }, { 'content': 'The project deadline is March 15, 2024', 'category': 'fact', 'created_at': datetime.now(), }, { 'content': 'User works at a startup in San Francisco', 'category': 'fact', 'created_at': datetime.now(), }, { 'content': 'Budget for the ML project is $50,000', 'category': 'fact', 'created_at': datetime.now(), }, { 'content': 'User prefers concise explanations over detailed ones', 'category': 'preference', 'created_at': datetime.now(), }, ] memories.insert(initial_memories) ```
Inserting rows into \`memories\`: 5 rows \[00:00, 590.53 rows/s] Inserted 5 rows with 0 errors. 5 rows inserted, 15 values computed.### Create conversation table with memory retrieval ```python theme={null} # Create conversation table conversations = pxt.create_table( 'agent_demo/conversations', {'user_message': pxt.String} ) ```
Created table 'conversations'.```python theme={null} # Add memory retrieval step conversations.add_computed_column( relevant_memories=recall_memories(conversations.user_message, top_k=3) ) ```
Added 0 column values with 0 errors. No rows affected.```python theme={null} # Build prompt with memories @pxt.udf def build_memory_prompt( user_message: str, relevant_memories: list[dict] ) -> str: memory_text = '\n'.join( [f'- {m["content"]}' for m in relevant_memories] ) return f"""You are a helpful assistant with access to the following memories about the user: {memory_text} Use these memories to personalize your response when relevant. User: {user_message} Assistant:""" conversations.add_computed_column( prompt=build_memory_prompt( conversations.user_message, conversations.relevant_memories ) ) ```
Added 0 column values with 0 errors. No rows affected.```python theme={null} # Generate response with memory context conversations.add_computed_column( response=chat_completions( messages=[{'role': 'user', 'content': conversations.prompt}], model='gpt-4o-mini', ) ) conversations.add_computed_column( assistant_reply=conversations.response.choices[0].message.content ) ```
Added 0 column values with 0 errors. Added 0 column values with 0 errors. No rows affected.### Chat with memory-aware agent ```python theme={null} # Test the memory-aware agent test_messages = [ { 'user_message': 'What programming language should I use for this project?' }, {'user_message': 'When do I need to finish this?'}, {'user_message': 'How much can I spend on cloud resources?'}, ] conversations.insert(test_messages) ```
Inserting rows into \`conversations\`: 3 rows \[00:00, 1047.88 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 18 values computed.```python theme={null} # View conversations with memory conversations.select( conversations.user_message, conversations.relevant_memories, conversations.assistant_reply, ).collect() ``` ## Explanation **Memory-aware agent architecture:**
User Message → Retrieve Memories → Build Prompt → LLM Response
                     ↓
        Memory Bank (with embeddings)

**Key components:**

**Adding new memories:**

```python theme={null}
memories.insert([{
    'content': 'New information to remember',
    'category': 'fact',
    'created_at': datetime.now()
}])
```

## See also

* [Build a RAG pipeline](/howto/cookbooks/agents/pattern-rag-pipeline) - Document retrieval
* [Use tool calling](/howto/cookbooks/agents/llm-tool-calling) - Function calling with LLMs
* [Pixelbot](https://github.com/pixeltable/pixelbot) - Full agent implementation

# Look up structured data with retrieval UDFs

Source: https://docs.pixeltable.com/howto/cookbooks/agents/pattern-data-lookup
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'lookup\_demo'. \### Create a product catalog table ```python theme={null} # Create a product catalog products = pxt.create_table( 'lookup_demo/products', { 'sku': pxt.String, 'name': pxt.String, 'price': pxt.Float, 'category': pxt.String, }, ) products.insert( [ { 'sku': 'LAPTOP-001', 'name': 'MacBook Pro 14"', 'price': 1999.00, 'category': 'electronics', }, { 'sku': 'LAPTOP-002', 'name': 'ThinkPad X1', 'price': 1499.00, 'category': 'electronics', }, { 'sku': 'PHONE-001', 'name': 'iPhone 15 Pro', 'price': 999.00, 'category': 'electronics', }, { 'sku': 'CHAIR-001', 'name': 'Ergonomic Office Chair', 'price': 449.00, 'category': 'furniture', }, { 'sku': 'DESK-001', 'name': 'Standing Desk', 'price': 699.00, 'category': 'furniture', }, ] ) products.collect() ```
Created table 'products'. Inserting rows into \`products\`: 5 rows \[00:00, 502.31 rows/s] Inserted 5 rows with 0 errors.### Create a lookup function with retrieval\_udf ```python theme={null} # Create a lookup function that searches by SKU get_product = pxt.retrieval_udf( products, name='get_product', description='Look up a product by its SKU code', parameters=['sku'], # Only use SKU as the lookup key limit=1, # Return at most 1 result ) # Check the function signature ``` ```python theme={null} # Look up a product by SKU result = products.select(get_product(sku='LAPTOP-001')).limit(1).collect() ``` ### Look up by category (multiple results) ```python theme={null} # Create a category lookup (returns multiple products) get_by_category = pxt.retrieval_udf( products, name='get_by_category', description='Get all products in a category', parameters=['category'], limit=10, # Return up to 10 products ) # Find all electronics products.select(get_by_category(category='electronics')).limit( 1 ).collect() ``` ### Use lookups for data enrichment ```python theme={null} # Create an orders table orders = pxt.create_table( 'lookup_demo/orders', { 'order_id': pxt.String, 'product_sku': pxt.String, 'quantity': pxt.Int, }, ) orders.insert( [ { 'order_id': 'ORD-001', 'product_sku': 'LAPTOP-001', 'quantity': 2, }, { 'order_id': 'ORD-002', 'product_sku': 'PHONE-001', 'quantity': 1, }, { 'order_id': 'ORD-003', 'product_sku': 'CHAIR-001', 'quantity': 4, }, ] ) ```
Created table 'orders'. Inserting rows into \`orders\`: 3 rows \[00:00, 1186.28 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 6 values computed.```python theme={null} # Add a computed column that enriches orders with product details orders.add_computed_column( product_info=get_product(sku=orders.product_sku) ) # View enriched orders orders.select( orders.order_id, orders.product_sku, orders.quantity, orders.product_info, ).collect() ```
Added 3 column values with 0 errors.

## Explanation

**`retrieval_udf` parameters:**

**Use cases:**

**Tips:**

* Use `limit=1` for unique key lookups
* Specify only needed columns in `parameters` for cleaner APIs
* Add descriptions for LLM tool integration

## See also

* [Use tool calling with LLMs](/howto/cookbooks/agents/llm-tool-calling) - Use retrieval UDFs as LLM tools
* [Build a RAG pipeline](/howto/cookbooks/agents/pattern-rag-pipeline) - Semantic search with `@pxt.query`

# Build a RAG pipeline

Source: https://docs.pixeltable.com/howto/cookbooks/agents/pattern-rag-pipeline
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'rag\_demo'. \### Step 1: create document store with embeddings ```python theme={null} # Create table for document chunks chunks = pxt.create_table( 'rag_demo/chunks', {'doc_id': pxt.String, 'chunk_text': pxt.String} ) ```
Created table 'chunks'.```python theme={null} # Add embedding index for semantic search chunks.add_embedding_index( column='chunk_text', string_embed=embeddings.using(model='text-embedding-3-small'), ) ``` ### Step 2: load documents ```python theme={null} # Sample knowledge base (in production, load from files/database) documents = [ { 'doc_id': 'password-reset', 'chunk_text': 'To reset your password, go to the login page and click "Forgot Password". Enter your email address and you will receive a reset link within 5 minutes. The link expires after 24 hours.', }, { 'doc_id': 'password-reset', 'chunk_text': 'Password requirements: minimum 8 characters, at least one uppercase letter, one number, and one special character. Passwords expire every 90 days for security.', }, { 'doc_id': 'account-settings', 'chunk_text': 'To update your profile, navigate to Settings > Account. You can change your display name, email address, and notification preferences. Changes take effect immediately.', }, { 'doc_id': 'billing', 'chunk_text': 'Billing occurs on the first of each month. You can view invoices under Settings > Billing. To change your payment method, click "Update Payment" and enter your new card details.', }, { 'doc_id': 'api-access', 'chunk_text': 'API keys can be generated in Settings > Developer. Each key has configurable permissions. Rate limits are 1000 requests per minute for standard plans, 10000 for enterprise.', }, ] chunks.insert(documents) ```
Inserting rows into \`chunks\`: 5 rows \[00:00, 345.31 rows/s] Inserted 5 rows with 0 errors. 5 rows inserted, 15 values computed.### Step 3: create the RAG query function ```python theme={null} # Define a query function that retrieves context @pxt.query def retrieve_context(query: str, top_k: int = 3): """Retrieve the most relevant chunks for a query.""" sim = chunks.chunk_text.similarity(string=query) return ( chunks.where(sim > 0.5) .order_by(sim, asc=False) .limit(top_k) .select(doc_id=chunks.doc_id, text=chunks.chunk_text) ) ``` ```python theme={null} # View retrieved context for a query query = 'What are the key features?' context_chunks = retrieve_context(query) context_chunks ```
retrieve\_context('What are the key features?')
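If you want to sanity-check retrieval against the sample knowledge base before adding the answer-generation steps, you can call the query function with a question the documents can actually answer. This is a suggested check (the query string and variable name below are hypothetical), mirroring the call above:

```python theme={null}
# Hypothetical sanity check: a question that matches the sample knowledge base
password_context = retrieve_context('How do I reset my password?', top_k=2)
password_context
```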
### Step 4: generate answers with context
```python theme={null}
# Create a table for questions/answers
qa = pxt.create_table('rag_demo/qa', {'question': pxt.String})
```
Created table 'qa'.```python theme={null} # Add retrieval step qa.add_computed_column(context=retrieve_context(qa.question, top_k=3)) ```
Added 0 column values with 0 errors. No rows affected.```python theme={null} # Build the RAG prompt @pxt.udf def build_rag_prompt(question: str, context: list[dict]) -> str: context_text = '\n\n'.join( [f'[{c["doc_id"]}]: {c["text"]}' for c in context] ) return f"""Answer the question based only on the provided context. If the context doesn't contain the answer, say "I don't have information about that." Context: {context_text} Question: {question} Answer:""" qa.add_computed_column(prompt=build_rag_prompt(qa.question, qa.context)) ```
Added 0 column values with 0 errors. No rows affected.```python theme={null} # Generate answer qa.add_computed_column( response=chat_completions( messages=[{'role': 'user', 'content': qa.prompt}], model='gpt-4o-mini', ) ) qa.add_computed_column(answer=qa.response.choices[0].message.content) ```
Added 0 column values with 0 errors. Added 0 column values with 0 errors. No rows affected.### Ask questions ```python theme={null} # Insert questions questions = [ {'question': 'How do I reset my password?'}, {'question': 'What are the API rate limits?'}, {'question': 'When am I billed?'}, ] qa.insert(questions) ```
Inserting rows into \`qa\`: 3 rows \[00:00, 872.12 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 18 values computed.```python theme={null} # View answers qa.select(qa.question, qa.answer).collect() ``` ## Explanation **RAG pipeline flow:**
Question → Embed → Retrieve similar chunks → Build prompt with context → Generate answer

**Key components:**

**Scaling tips:**

* Use `doc-chunk-for-rag` recipe to split long documents
* Adjust `top_k` to balance context size vs. relevance
* Consider metadata filtering for large knowledge bases

## See also

* [Chunk documents for RAG](/howto/cookbooks/text/doc-chunk-for-rag) - Split documents into chunks
* [Create text embeddings](/howto/cookbooks/search/embed-text-openai) - Embedding fundamentals
* [Semantic text search](/howto/cookbooks/search/search-semantic-text) - Search patterns

# Use a table pipeline as a reusable function

Source: https://docs.pixeltable.com/howto/cookbooks/agents/pattern-table-as-udf
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'table\_udf\_demo'. \### Create an agent table with computed columns You create a table that encapsulates a complete pipeline. This example builds a summarization agent: ```python theme={null} # Create the agent table with input column summarizer = pxt.create_table( 'table_udf_demo/summarizer', {'text': pxt.String} ) ```
Created table 'summarizer'.```python theme={null} # Add the LLM call as a computed column summarizer.add_computed_column( response=chat_completions( messages=[ { 'role': 'user', 'content': 'Summarize this in one sentence:\n\n' + summarizer.text, } ], model='gpt-4o-mini', ) ) ```
Added 0 column values with 0 errors. No rows affected.```python theme={null} # Extract the summary text summarizer.add_computed_column( summary=summarizer.response.choices[0].message.content ) ```
Added 0 column values with 0 errors. No rows affected.### Convert the table to a UDF You use `pxt.udf(table, return_value=...)` to convert the table into a callable function. The `return_value` specifies which column to return: ```python theme={null} # Convert the summarizer table into a callable UDF summarize = pxt.udf(summarizer, return_value=summarizer.summary) ``` ### Use the table UDF in another table You can now use `summarize()` as a computed column in any other table: ```python theme={null} # Create a table that uses the summarizer articles = pxt.create_table( 'table_udf_demo/articles', {'title': pxt.String, 'content': pxt.String}, ) ```
Created table 'articles'.```python theme={null} # Add the table UDF as a computed column articles.add_computed_column(summary=summarize(text=articles.content)) ```
Added 0 column values with 0 errors. No rows affected.```python theme={null} # Insert articles - summaries are generated automatically articles.insert( [ { 'title': 'Climate Report', 'content': 'Global temperatures rose by 1.2 degrees Celsius above pre-industrial levels last year, marking the hottest year on record. Scientists attribute this to continued greenhouse gas emissions and a strong El Nino pattern. The report calls for immediate action to reduce carbon emissions.', }, { 'title': 'Tech Merger', 'content': 'Two major semiconductor companies announced a merger valued at $50 billion. The combined entity will control 30% of the global chip market. Regulators in multiple countries will review the deal over the next 18 months.', }, ] ) ```
Inserting rows into \`articles\`: 2 rows \[00:00, 196.58 rows/s] Inserted 2 rows with 0 errors. 2 rows inserted, 6 values computed.```python theme={null} # View results articles.select(articles.title, articles.summary).collect() ``` ## Explanation **How table UDFs work:**
Consumer table row → Table UDF called → Agent table inserts row → Computed columns run → Return value extracted → Consumer gets result

**When to use table UDFs vs `@pxt.query`:**

**Key benefits:**

* **Encapsulation**: Hide complex pipeline details behind a simple function call
* **Reusability**: Use the same agent from multiple consumer tables
* **Persistence**: All intermediate results are stored in the agent table for debugging
* **Composition**: Chain agents together for multi-stage workflows

## See also

* [Look up structured data](/howto/cookbooks/agents/pattern-data-lookup) - Simple key-based lookups with `retrieval_udf`
* [Build a RAG pipeline](/howto/cookbooks/agents/pattern-rag-pipeline) - Retrieval with `@pxt.query`
* [Use tool calling with LLMs](/howto/cookbooks/agents/llm-tool-calling) - Add tools to agent tables

# Extract audio from video

Source: https://docs.pixeltable.com/howto/cookbooks/audio/audio-extract-from-video
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'audio\_extract\_demo'. \### Extract audio from video ```python theme={null} # Create table for videos videos = pxt.create_table( 'audio_extract_demo/videos', {'title': pxt.String, 'video': pxt.Video} ) ```
Created table 'videos'.```python theme={null} # Add computed column to extract audio as MP3 videos.add_computed_column( audio=extract_audio(videos.video, format='mp3') ) ```
Added 0 column values with 0 errors. No rows affected.```python theme={null} # Insert a sample video (from multimedia-commons with audio) video_url = 's3://multimedia-commons/data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4' videos.insert([{'title': 'Sample Video', 'video': video_url}]) ```
Inserting rows into \`videos\`: 1 rows \[00:00, 207.52 rows/s] Inserted 1 row with 0 errors. 1 row inserted, 4 values computed.```python theme={null} # View results videos.select(videos.title, videos.audio).collect() ``` ### Chain with transcription Add transcription as a follow-up computed column: ```python theme={null} # Install whisper for transcription %pip install -qU openai-whisper ``` ```python theme={null} from pixeltable.functions import whisper # Add transcription of the extracted audio videos.add_computed_column( transcription=whisper.transcribe(videos.audio, model='base.en') ) ```
Added 1 column value with 0 errors. 1 row updated, 1 value computed.```python theme={null} # Extract the transcript text videos.add_computed_column(transcript=videos.transcription.text) ```
Added 1 column value with 0 errors. 1 row updated, 1 value computed.```python theme={null} # View the full pipeline results videos.select(videos.title, videos.transcript).collect() ``` ## Explanation **Audio format options:** **Pipeline flow:**
Video → extract\_audio → Audio → whisper.transcribe → Transcript

Each step is a computed column. When you insert a new video:

1. Audio is extracted automatically
2. Whisper transcribes the audio
3. All results are cached for future queries

## See also

* [Transcribe audio](/howto/cookbooks/audio/audio-transcribe) - Audio-only transcription
* [Summarize podcasts](/howto/cookbooks/audio/audio-summarize-podcast) - Transcribe and summarize
* [Extract video frames](/howto/cookbooks/video/video-extract-frames) - Work with video frames

# Summarize podcasts and audio

Source: https://docs.pixeltable.com/howto/cookbooks/audio/audio-summarize-podcast
Created directory 'podcast\_demo'. \### Create the pipeline Create a table with audio input, then add computed columns for transcription and summarization: ```python theme={null} # Create table for audio files podcasts = pxt.create_table( 'podcast_demo/episodes', {'title': pxt.String, 'audio': pxt.Audio} ) ```
Created table 'episodes'.```python theme={null} # Step 1: Transcribe with local Whisper (uses GPU if available) podcasts.add_computed_column( transcription=whisper.transcribe(podcasts.audio, model='base.en') ) ```
Added 0 column values with 0 errors. No rows affected.```python theme={null} # Extract the text from transcription result (cast to String for concatenation) podcasts.add_computed_column( transcript_text=podcasts.transcription.text.astype(pxt.String) ) ```
Added 0 column values with 0 errors. No rows affected.```python theme={null} # Step 2: Summarize the transcript with OpenAI summary_prompt = ( """Summarize this transcript in 2-3 sentences, then list 3 key points. Transcript: """ + podcasts.transcript_text ) podcasts.add_computed_column( summary_response=openai.chat_completions( messages=[{'role': 'user', 'content': summary_prompt}], model='gpt-4o-mini', ) ) ```
Added 0 column values with 0 errors. No rows affected.```python theme={null} # Extract summary text from response podcasts.add_computed_column( summary=podcasts.summary_response.choices[0].message.content ) ```
Added 0 column values with 0 errors. No rows affected.### Process audio files Insert audio files and watch the pipeline run automatically: ```python theme={null} # Insert sample audio audio_url = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/10-minute%20tour%20of%20Pixeltable.mp3' podcasts.insert([{'title': 'Pixeltable Tour', 'audio': audio_url}]) ```
Inserting rows into \`episodes\`: 1 rows \[00:00, 185.18 rows/s] Inserted 1 row with 0 errors. 1 row inserted, 8 values computed.```python theme={null} # View transcript podcasts.select(podcasts.title, podcasts.transcript_text).collect() ``` ```python theme={null} # View summary podcasts.select(podcasts.title, podcasts.summary).collect() ``` ## Explanation **Pipeline architecture:**
Audio → Whisper transcription → Transcript text → LLM summarization → Summary

Each step is a computed column that depends on the previous one. When you insert a new audio file, all steps run automatically in sequence.

**Whisper model options:**

For production with varied audio quality, use `small.en` or larger.

## See also

* [Transcribe audio](/howto/cookbooks/audio/audio-transcribe) - Basic audio transcription
* [Summarize text](/howto/cookbooks/text/text-summarize) - Text summarization patterns

# Convert text to speech

Source: https://docs.pixeltable.com/howto/cookbooks/audio/audio-text-to-speech
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'tts\_demo'. \### Create text-to-speech pipeline ```python theme={null} # Create table for articles articles = pxt.create_table( 'tts_demo/articles', {'title': pxt.String, 'content': pxt.String} ) ```
Created table 'articles'.```python theme={null} # Add audio generation column articles.add_computed_column( audio=speech(articles.content, model='tts-1', voice='alloy') ) ```
Added 0 column values with 0 errors. No rows affected.### Generate audio ```python theme={null} # Insert sample articles sample_articles = [ { 'title': 'Welcome to AI', 'content': 'Artificial intelligence is transforming how we work and live. From smart assistants to autonomous vehicles, AI is becoming part of our daily lives.', }, { 'title': 'Getting Started', 'content': 'To begin your journey with machine learning, start by understanding the basics of data preparation and model training.', }, ] articles.insert(sample_articles) ```
Inserting rows into \`articles\`: 2 rows \[00:00, 423.90 rows/s] Inserted 2 rows with 0 errors. 2 rows inserted, 6 values computed.```python theme={null} # View articles with generated audio articles.select( articles.title, articles.content, articles.audio ).collect() ``` ## Explanation **OpenAI TTS models:** **Voice options:** **Tips:** * Use `tts-1` for drafts and real-time applications * Use `tts-1-hd` for final production audio * Audio is cached—no regeneration on queries ## See also * [Transcribe audio](/howto/cookbooks/audio/audio-transcribe) - Convert audio to text * [Summarize podcasts](/howto/cookbooks/audio/audio-summarize-podcast) - Transcribe and summarize audio # Transcribe audio files with Whisper Source: https://docs.pixeltable.com/howto/cookbooks/audio/audio-transcribe
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Converting metadata from version 45 to 46 Created directory 'audio\_demo'. \```python theme={null} # Create table for audio files audio = pxt.create_table('audio_demo/files', {'audio': pxt.Audio}) ```
Created table 'files'.```python theme={null} # Insert a sample audio file (video files also work - audio is extracted automatically) audio.insert( [ { 'audio': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/audio-transcription-demo/Lex-Fridman-Podcast-430-Excerpt-0.mp4' } ] ) ```
Inserted 1 row with 0 errors in 1.05 s (0.95 rows/s) 1 row inserted.### Split into segments Create a view that splits audio into 30-second segments with overlap: ```python theme={null} # Split audio into segments for transcription segments = pxt.create_view( 'audio_demo/segments', audio, iterator=audio_splitter( audio.audio, duration=30.0, # 30-second segments overlap=2.0, # 2-second overlap for context min_segment_duration=5.0, # Drop segments shorter than 5 seconds ), ) ``` ```python theme={null} # View the segments segments.select(segments.segment_start, segments.segment_end).collect() ``` ### Transcribe with Whisper Add a computed column that transcribes each segment: ```python theme={null} # Add transcription column (runs locally - no API key needed) segments.add_computed_column( transcription=whisper.transcribe( audio=segments.audio_segment, model='base.en', # Options: tiny.en, base.en, small.en, medium.en, large ) ) ```
Added 2 column values with 0 errors in 3.35 s (0.60 rows/s) 2 rows updated.```python theme={null} # Extract just the text segments.add_computed_column(text=segments.transcription.text) ```
Added 2 column values with 0 errors in 0.06 s (31.82 rows/s) 2 rows updated.```python theme={null} # View transcriptions with timestamps segments.select( segments.segment_start, segments.segment_end, segments.text ).collect() ``` ## Explanation **Whisper models:** Models ending in `.en` are English-only and faster. Remove `.en` for multilingual support. **audio\_splitter parameters:** **Video files work too:** When you insert a video file, Pixeltable automatically extracts the audio track. ## See also * [Iterators documentation](/platform/iterators) * [Whisper library](https://github.com/openai/whisper) # Create custom aggregate functions (UDAs) Source: https://docs.pixeltable.com/howto/cookbooks/core/custom-aggregates-uda
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'uda\_demo'. \### Create sample data ```python theme={null} sales = pxt.create_table( 'uda_demo/sales', { 'region': pxt.String, 'product': pxt.String, 'amount': pxt.Float, 'quantity': pxt.Int, }, ) sales.insert( [ { 'region': 'North', 'product': 'Widget', 'amount': 100.0, 'quantity': 5, }, { 'region': 'North', 'product': 'Gadget', 'amount': 250.0, 'quantity': 2, }, { 'region': 'North', 'product': 'Widget', 'amount': 150.0, 'quantity': 8, }, { 'region': 'South', 'product': 'Widget', 'amount': 200.0, 'quantity': 10, }, { 'region': 'South', 'product': 'Gadget', 'amount': 175.0, 'quantity': 3, }, { 'region': 'East', 'product': 'Widget', 'amount': 125.0, 'quantity': 6, }, ] ) sales.collect() ```
Created table 'sales'. Inserting rows into \`sales\`: 0 rows \[00:00, ? rows/s] Inserting rows into \`sales\`: 6 rows \[00:00, 609.56 rows/s] Inserted 6 rows with 0 errors.### Variance UDA (not built-in) ```python theme={null} # A UDA is a class that inherits from pxt.Aggregator # It must implement: __init__, update, and value @pxt.uda class variance(pxt.Aggregator): """Compute population variance using Welford's online algorithm.""" def __init__(self): self.count = 0 self.mean = 0.0 self.m2 = 0.0 # Sum of squared differences from mean def update(self, val: float) -> None: if val is not None: self.count += 1 delta = val - self.mean self.mean += delta / self.count delta2 = val - self.mean self.m2 += delta * delta2 def value(self) -> float: if self.count < 1: return 0.0 return self.m2 / self.count # Population variance ``` ```python theme={null} # Use like any built-in aggregate sales.select(variance(sales.amount)).collect() ``` ```python theme={null} # Use in group_by queries sales.group_by(sales.region).select( sales.region, amount_variance=variance(sales.amount) ).collect() ``` ### String concatenation UDA ```python theme={null} @pxt.uda class string_agg(pxt.Aggregator): """Concatenate strings with a comma separator.""" def __init__(self): self.values = [] def update(self, val: str) -> None: if val is not None: self.values.append(val) def value(self) -> str: return ', '.join(self.values) ``` ```python theme={null} # List all products sold in each region sales.group_by(sales.region).select( sales.region, products=string_agg(sales.product) ).collect() ``` ### Collect values into a list ```python theme={null} @pxt.uda class collect_list(pxt.Aggregator): """Collect all values into a list.""" def __init__(self): self.items = [] def update(self, val: float) -> None: if val is not None: self.items.append(val) def value(self) -> list[float]: return self.items ``` ```python theme={null} # Get all amounts per region as a list sales.group_by(sales.region).select( sales.region, amounts=collect_list(sales.amount) ).collect() ``` ### Weighted average UDA ```python theme={null} @pxt.uda class weighted_avg(pxt.Aggregator): """Compute weighted average: sum(value * weight) / sum(weight).""" def __init__(self): self.weighted_sum = 0.0 self.weight_sum = 0.0 def update(self, value: float, weight: float) -> None: if value is not None and weight is not None: self.weighted_sum += value * weight self.weight_sum += weight def value(self) -> float: if self.weight_sum == 0: return 0.0 return self.weighted_sum / self.weight_sum ``` ```python theme={null} # Compute quantity-weighted average price per region sales.group_by(sales.region).select( sales.region, avg_price=weighted_avg(sales.amount, sales.quantity) ).collect() ``` ### Mode UDA (most frequent value) ```python theme={null} from collections import Counter @pxt.uda class mode(pxt.Aggregator): """Find the most frequent value in a group.""" def __init__(self): self.counts = Counter() def update(self, val: str) -> None: if val is not None: self.counts[val] += 1 def value(self) -> str: if not self.counts: return None return self.counts.most_common(1)[0][0] ``` ```python theme={null} # Find most common product per region sales.group_by(sales.region).select( sales.region, top_product=mode(sales.product) ).collect() ``` ## Explanation **UDA structure:** ```python theme={null} @pxt.uda class my_aggregate(pxt.Aggregator): def __init__(self): # Initialize state self.state = initial_value def update(self, val: InputType) -> None: # Called for each row # Update internal 
state with val def value(self) -> OutputType: # Called at the end return self.state ``` **Key points:** * Always handle `None` values in `update()` * Multiple parameters in `update()` enable multi-column aggregations (like `weighted_avg`) * Return type annotation on `value()` determines output column type ## See also * [UDFs in Pixeltable](../../../platform/udfs-in-pixeltable) - Complete guide to custom functions * [Join tables](/howto/cookbooks/core/query-join-tables) - Combine data before aggregating # Split data into multiple rows with iterators Source: https://docs.pixeltable.com/howto/cookbooks/core/data-split-rows
Inserted 1 row with 0 errors in 0.13 s (7.68 rows/s) 1 row inserted.```python theme={null} chunks = pxt.create_view( 'split_demo/doc_chunks', docs, iterator=document_splitter( docs.doc, separators='sentence,token_limit', limit=300 ), ) chunks.select(chunks.text).limit(3).collect() ``` **Available separators:** * `heading` — Split on HTML/Markdown headings * `sentence` — Split on sentence boundaries (requires spacy) * `token_limit` — Split by token count (requires tiktoken) * `char_limit` — Split by character count * `page` — Split by page (PDF only) [SDK Reference: document\_splitter](/sdk/latest/document) ### Extract frames from videos Use `frame_iterator` to extract frames at specified intervals. ```python theme={null} from pixeltable.functions.video import frame_iterator videos = pxt.create_table('split_demo/videos', {'video': pxt.Video}) videos.insert( [ { 'video': 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/bangkok.mp4' } ] ) ```
Inserted 1 row with 0 errors in 1.28 s (0.78 rows/s) 1 row inserted.```python theme={null} frames = pxt.create_view( 'split_demo/frames', videos, iterator=frame_iterator(videos.video, fps=1.0), ) frames.select(frames.frame, frames.frame_attrs).limit(3).collect() ``` **frame\_iterator options:** * `fps` — Frames per second to extract * `num_frames` — Extract exact number of frames (evenly spaced) * `keyframes_only` — Extract only keyframes [SDK Reference: frame\_iterator](/sdk/latest/video) ### Split videos into segments Use `video_splitter` to divide videos into smaller clips. ```python theme={null} from pixeltable.functions.video import video_splitter segments = pxt.create_view( 'split_demo/segments', videos, iterator=video_splitter( videos.video, duration=5.0, min_segment_duration=1.0 ), ) segments.select( segments.segment_start, segments.segment_end, segments.video_segment ).limit(3).collect() ``` **video\_splitter options:** * `duration` — Duration of each segment in seconds * `overlap` — Overlap between segments in seconds * `min_segment_duration` — Drop last segment if shorter than this [SDK Reference: video\_splitter](/sdk/latest/video) ### Split strings into sentences Use `string_splitter` to divide text into sentences. ```python theme={null} from pixeltable.functions.string import string_splitter texts = pxt.create_table('split_demo/texts', {'content': pxt.String}) texts.insert( [ { 'content': 'AI data infrastructure simplifies ML workflows. Declarative pipelines update incrementally. This makes development faster and more maintainable.' } ] ) ```
Inserted 1 row with 0 errors in 0.03 s (38.38 rows/s) 1 row inserted.```python theme={null} sentences = pxt.create_view( 'split_demo/sentences', texts, iterator=string_splitter(texts.content, separators='sentence'), ) sentences.select(sentences.text).collect() ``` [SDK Reference: string\_splitter](/sdk/latest/string) ### Tile images for analysis Use `tile_iterator` to divide large images into a grid of smaller tiles. This is useful for processing high-resolution images that are too large to analyze at once, or for running object detection on different regions. ```python theme={null} from pixeltable.functions.image import tile_iterator images = pxt.create_table('split_demo/images', {'image': pxt.Image}) images.insert( [ { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/pixeltable-logo-large.png' } ] ) ```
Inserted 1 row with 0 errors in 0.09 s (11.69 rows/s) 1 row inserted.```python theme={null} tiles = pxt.create_view( 'split_demo/tiles', images, iterator=tile_iterator(images.image, tile_size=(100, 100)), ) ``` **tile\_iterator options:** * `tile_size` — Size of each tile as `(width, height)` * `overlap` — Overlap between adjacent tiles as `(width, height)` [SDK Reference: tile\_iterator](/sdk/latest/image) ```python theme={null} tiles.select(tiles.tile_coord, tiles.tile).sample(n=4).collect() ``` ### Split audio into chunks Use `audio_splitter` to divide audio files into time-based segments for transcription or analysis. ```python theme={null} from pixeltable.functions.audio import audio_splitter audio = pxt.create_table('split_demo/audio', {'audio': pxt.Audio}) audio.insert( [ { 'audio': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/10-minute%20tour%20of%20Pixeltable.mp3' } ] ) ```
Inserted 1 row with 0 errors in 0.67 s (1.50 rows/s) 1 row inserted.```python theme={null} audio_segments = pxt.create_view( 'split_demo/audio_chunks', audio, iterator=audio_splitter(audio.audio, duration=30.0, overlap=2.0), ) audio_segments.select( audio_segments.segment_start, audio_segments.segment_end ).limit(5).collect() ``` **audio\_splitter options:** * `duration` — Duration of each chunk in seconds * `overlap` — Overlap between chunks in seconds * `min_segment_duration` — Drop last chunk if shorter than this [SDK Reference: audio\_splitter](/sdk/latest/audio) ## See also * [Split documents for RAG](/howto/cookbooks/text/doc-chunk-for-rag) * [Extract frames from videos](/howto/cookbooks/video/video-extract-frames) * [Transcribe audio files](/howto/cookbooks/audio/audio-transcribe) # Get fast feedback on transformations Source: https://docs.pixeltable.com/howto/cookbooks/core/dev-iterative-workflow
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'demo\_project'. \```python theme={null} t = pxt.create_table('demo_project/lyrics', {'text': pxt.String}) ```
Created table 'lyrics'.```python theme={null} t.insert( [ {'text': 'Tumble out of bed and I stumble to the kitchen'}, {'text': 'Pour myself a cup of ambition'}, {'text': 'And yawn and stretch and try to come to life'}, {'text': "Jump in the shower and the blood starts pumpin'"}, {'text': "Out on the street, the traffic starts jumpin'"}, {'text': 'With folks like me on the job from nine to five'}, ] ) ```
Inserted 6 rows with 0 errors in 0.01 s (916.65 rows/s) 6 rows inserted.### Example 1: built-in functions Iterate with built-in functions, then add to the table. ```python theme={null} # Test uppercase transformation on subset t.select(t.text, uppercase=t.text.upper()).head(2) ``` ```python theme={null} # Confirm the transformation was only in memory—table unchanged t.head(2) ``` ```python theme={null} # Apply to all rows (same expression) t.add_computed_column(uppercase=t.text.upper()) ```
Added 6 column values with 0 errors in 0.04 s (158.08 rows/s) 6 rows updated.```python theme={null} # View text with uppercase column t.collect() ``` ### Example 2: save and reuse expressions Save an expression as a variable to guarantee the same logic in both iterate and add steps. ```python theme={null} # Define the expression once - no duplication char_count_expr = t.text.len() # Iterate: Test on subset t.select(t.text, char_count=char_count_expr).head(2) ``` ```python theme={null} # Confirm the transformation was only in memory—table unchanged t.head(2) ``` ```python theme={null} # Add: Use the SAME expression to persist t.add_computed_column(char_count=char_count_expr) ```
Added 6 column values with 0 errors in 0.02 s (348.64 rows/s) 6 rows updated.```python theme={null} # View text with char_count column t.collect() ``` This pattern works with any expression: * Built-in functions: `resize_expr = t.image.resize((224, 224))` * UDFs: `watermark_expr = add_watermark(t.image, '© 2024')` * Chained operations: `processed_expr = t.image.resize((224, 224)).rotate(90)` Benefits: * Write the expression once, use it twice * No copy-paste—reuse the same logic * Easy to iterate: change in one place, test again ### Example 3: custom UDF Iterate with a user-defined function, then add to the table. ```python theme={null} # Define a custom transformation @pxt.udf def word_count(text: str) -> int: return len(text.split()) ``` ```python theme={null} # Iterate: Test UDF on subset t.select(t.text, word_count=word_count(t.text)).head(2) ``` ```python theme={null} # Confirm the transformation was only in memory—table unchanged t.head(2) ``` ```python theme={null} # Add: Apply to all rows (same expression) t.add_computed_column(word_count=word_count(t.text)) ```
Added 6 column values with 0 errors in 0.02 s (312.11 rows/s) 6 rows updated.```python theme={null} # View text with word_count column t.collect() ``` ### Example 4: annotate columns with metadata Use `ColumnSpec` to attach a comment or custom metadata when adding columns. Comments appear in `describe()` output, while `custom_metadata` stores arbitrary data (tags, version info, config) that you can retrieve with `get_metadata()`. ```python theme={null} from pixeltable.types import ColumnSpec # Add a column with a comment and custom metadata t.add_column( source=ColumnSpec( type=pxt.String, comment='Original source URL or file path', custom_metadata={'added_by': 'data_team', 'version': 2}, ) ) t.describe() ``` ## Explanation **How the iterate-then-add workflow works:** Queries and computed columns serve different purposes. Queries let you test transformations on sample rows without storing anything. Once you’re satisfied with the results, you use the exact same expression with `.add_computed_column()` to persist it across your entire table. This workflow is especially valuable for expensive operations—API calls, model inference, complex image processing—where you want to validate logic before processing your full dataset. Test on 2-3 rows to catch errors early, then commit once. **To customize this workflow:** * **Sample size**: Use `.head(n)` to collect only the first n rows—`.head(1)` for single-row testing, `.head(10)` for broader validation, or `.collect()` to collect all rows * **Save expressions**: Store transformations as variables (Example 2) to guarantee identical logic in both iterate and add steps * **Chain transformations**: Test multiple operations together—`.select(t.text.upper().split())` works just like single operations * **Use with any data type**: This pattern works with images, videos, audio, documents—not just text. For multimodal data, visual inspection during iteration is especially valuable **The Pixeltable workflow:** In traditional databases, `.select()` just picks which columns to view. In Pixeltable, `.select()` also lets you compute new transformations on the fly—define new columns without storing them. This makes `.select()` perfect for testing transformations before you commit them. When you use `.select()`, you’re creating a query. Queries are temporary operations that retrieve and transform data from tables—they don’t store anything. Queries use lazy evaluation, meaning they don’t execute until you call `.collect()`. You must use `.collect()` to execute the query and return results. `.head(n)` is a convenience method that collects only the first n rows instead of all rows. Use `.head(n)` when iterating to get fast feedback without processing your entire dataset. Nothing is stored in your table when you run queries. You can test different approaches quickly without affecting your data. You can store query results in a Python variable to work with them in your session. ```python theme={null} # Store query results as a variable (in memory only) results = t.select( t.text, uppercase=t.text.upper() # Label the transformed column ).head(3) ``` These results are stored in memory and will not persist across sessions—only `.add_computed_column()` persists data to your table. Once you’re satisfied, `.add_computed_column()` uses the same expression but adds it as a persistent column in your table. Now the transformation runs on all rows and results are stored permanently. 
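To see the two halves of the workflow side by side, here is a minimal sketch of the same iterate-then-add cycle with a fresh expression. The column name and expression are hypothetical and assume `.lower()` is available alongside the `.upper()` used above:

```python theme={null}
# Define the expression once (hypothetical example)
lowercase_expr = t.text.lower()

# Iterate: preview on a couple of rows (in memory only, nothing stored)
t.select(t.text, lowercase=lowercase_expr).head(2)

# Add: persist the identical expression as a computed column
t.add_computed_column(lowercase=lowercase_expr)
```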
## See also * [Transform images with PIL operations](/howto/cookbooks/images/img-pil-transforms) * [Convert RGB images to grayscale](/howto/cookbooks/images/img-rgb-to-grayscale) * [Apply filters to images](/howto/cookbooks/images/img-apply-filters) # Join tables to combine data Source: https://docs.pixeltable.com/howto/cookbooks/core/query-join-tables
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'join\_demo'. \### Create sample tables ```python theme={null} # Create a customers table customers = pxt.create_table( 'join_demo/customers', {'customer_id': pxt.Int, 'name': pxt.String, 'email': pxt.String}, ) customers.insert( [ {'customer_id': 1, 'name': 'Alice', 'email': 'alice@example.com'}, {'customer_id': 2, 'name': 'Bob', 'email': 'bob@example.com'}, { 'customer_id': 3, 'name': 'Charlie', 'email': 'charlie@example.com', }, ] ) customers.collect() ```
Created table 'customers'. Inserted 3 rows with 0 errors in 0.01 s (385.68 rows/s)```python theme={null} # Create an orders table orders = pxt.create_table( 'join_demo/orders', { 'order_id': pxt.Int, 'customer_id': pxt.Int, 'product': pxt.String, 'amount': pxt.Float, }, ) orders.insert( [ { 'order_id': 101, 'customer_id': 1, 'product': 'Laptop', 'amount': 999.00, }, { 'order_id': 102, 'customer_id': 1, 'product': 'Mouse', 'amount': 29.00, }, { 'order_id': 103, 'customer_id': 2, 'product': 'Keyboard', 'amount': 79.00, }, { 'order_id': 104, 'customer_id': 4, 'product': 'Monitor', 'amount': 299.00, }, # No matching customer ] ) orders.collect() ```
Created table 'orders'. Inserted 4 rows with 0 errors in 0.01 s (657.81 rows/s)### Inner join (matching rows only) ```python theme={null} # Inner join: only rows that match in both tables customers.join( orders, on=customers.customer_id == orders.customer_id, how='inner' ).select(customers.name, orders.product, orders.amount).collect() ``` ### Left join (keep all from first table) ```python theme={null} # Left join: all customers, with order data where available # Charlie has no orders, so product/amount will be null customers.join( orders, on=customers.customer_id == orders.customer_id, how='left' ).select(customers.name, orders.product, orders.amount).collect() ``` ### Join with filtering ```python theme={null} # Combine join with where clause to filter results customers.join( orders, on=customers.customer_id == orders.customer_id, how='inner' ).where(orders.amount > 50).select( customers.name, customers.email, orders.product, orders.amount ).collect() ``` ### Join with aggregation ```python theme={null} # Join and aggregate: total spending per customer customers.join( orders, on=customers.customer_id == orders.customer_id, how='inner' ).group_by(customers.name).select( customers.name, total_spent=pxtf.sum(orders.amount), order_count=pxtf.count(orders.order_id), ).collect() ``` ### Cross join (all combinations) ```python theme={null} # Cross join: every customer paired with every product (no 'on' condition) products = pxt.create_table( 'join_demo/products', {'product': pxt.String, 'price': pxt.Float} ) products.insert( [ {'product': 'Widget', 'price': 19.99}, {'product': 'Gadget', 'price': 29.99}, ] ) customers.join(products, how='cross').select( customers.name, products.product, products.price ).collect() ```
Created table 'products'. Inserted 2 rows with 0 errors in 0.00 s (422.52 rows/s)### Save join results to a new table ```python theme={null} # Build a join query and collect as DataFrame customer_orders_df = ( customers.join( orders, on=customers.customer_id == orders.customer_id, how='inner', ) .select( name=customers.name, email=customers.email, product=orders.product, amount=orders.amount, ) .collect() .to_pandas() ) customer_orders_df ``` ```python theme={null} # Create a new table from the DataFrame orders_report = pxt.create_table( 'join_demo/orders_report', source=customer_orders_df ) orders_report.collect() ```
Created table 'orders\_report'. Inserted 3 rows with 0 errors in 0.01 s (500.32 rows/s)### Paginate results with limit and offset Use `limit(n, offset=k)` to retrieve results in pages. This is useful for displaying results incrementally or building paginated APIs. ```python theme={null} # Page 1: first 2 rows orders.order_by(orders.order_id).limit(2).collect() ``` ```python theme={null} # Page 2: next 2 rows (skip the first 2) orders.order_by(orders.order_id).limit(2, offset=2).collect() ``` ## Explanation **Join types:** **Join syntax:** ```python theme={null} # Simple: join on column by name t1.join(t2, on=t1.id) # Explicit predicate t1.join(t2, on=t1.customer_id == t2.customer_id) # Composite key t1.join(t2, on=(t1.pk1 == t2.pk1) & (t1.pk2 == t2.pk2)) ``` **Aggregation functions:** ```python theme={null} from pixeltable.functions import sum, count, mean, min, max # Use as functions, not methods total=sum(t.amount) num_rows=count(t.id) ``` **Saving join results:** ```python theme={null} # Collect as DataFrame, then create table df = query.select(name=t.col, ...).collect().to_pandas() new_table = pxt.create_table('path', source=df) ``` **Pagination:** ```python theme={null} # limit(n) returns at most n rows # limit(n, offset=k) skips the first k rows, then returns n query.order_by(t.id).limit(10) # rows 0-9 query.order_by(t.id).limit(10, offset=10) # rows 10-19 ``` **Tips:** * Use explicit predicates (`t1.col == t2.col`) for clarity * Chain `.where()` after join to filter results * Chain `.group_by()` for aggregations * Use `'left'` join when the first table is your “main” table * Use named columns in `.select(name=col)` for clean column names * Always use `.order_by()` with pagination to get deterministic page ordering ## See also * [Look up structured data](/howto/cookbooks/agents/pattern-data-lookup) - Use retrieval UDFs for lookups * [Sample data for training](/howto/cookbooks/data/data-sampling) - Sample from joined results # Time Zones Source: https://docs.pixeltable.com/howto/cookbooks/core/time-zones
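The setup cell for this page runs before the output shown below. A minimal sketch of what it likely contains, inferred from the directory and table names in the output and the `dt`/`note` columns used in the inserts that follow:

```python theme={null}
import pixeltable as pxt

# A directory for the demo plus a table with a timestamp column and a note column
pxt.create_dir('tz_demo')
t = pxt.create_table('tz_demo/example', {'dt': pxt.Timestamp, 'note': pxt.String})
```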
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'tz\_demo'. Created table 'example'.```python theme={null} from datetime import datetime, timezone from zoneinfo import ZoneInfo naive_dt = datetime(2024, 8, 9, 23, 0, 0) explicit_dt = datetime( 2024, 8, 9, 23, 0, 0, tzinfo=ZoneInfo('America/Los_Angeles') ) other_dt = datetime( 2024, 8, 9, 23, 0, 0, tzinfo=ZoneInfo('America/New_York') ) t.insert( [ {'dt': naive_dt, 'note': 'No time zone specified (uses default)'}, { 'dt': explicit_dt, 'note': 'Time zone America/Los_Angeles was specified explicitly', }, { 'dt': other_dt, 'note': 'Time zone America/New_York was specified explicitly', }, ] ) ```
Inserting rows into \`example\`: 0 rows \[00:00, ? rows/s]Inserting rows into \`example\`: 3 rows \[00:00, 433.04 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 3 values computed.On retrieval, all timestamps are normalized to the default time zone, regardless of how they were specified during insertion. ```python theme={null} t.collect() ``` To represent timestamps in a different time zone, use the `astimezone` method. ```python theme={null} t.select( t.dt, dt_new_york=t.dt.astimezone('America/New_York'), note=t.note ).collect() ``` ### Timestamp methods and properties The Pixeltable API exposes all the standard `datetime` methods and properties from the Python library. Because retrieval uses the default time zone, they are all relative to the default time zone unless `astimezone` is used. ```python theme={null} t.select( t.dt, day_default=t.dt.day, day_eastern=t.dt.astimezone('America/New_York').day, ).collect() ``` Observe that the first two timestamps map to different dates depending on the time zone, as expected. # Track changes and revert to previous versions Source: https://docs.pixeltable.com/howto/cookbooks/core/version-control-history
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'version\_demo'. \### Create a table and make some changes Every data or schema change creates a new version. ```python theme={null} # Create table (version 0) products = pxt.create_table( 'version_demo/products', {'name': pxt.String, 'price': pxt.Float, 'category': pxt.String}, ) ```
Created table 'products'.```python theme={null} # Insert data (version 1) products.insert( [ {'name': 'Widget', 'price': 9.99, 'category': 'Tools'}, {'name': 'Gadget', 'price': 24.99, 'category': 'Electronics'}, {'name': 'Gizmo', 'price': 14.99, 'category': 'Electronics'}, ] ) ```
Inserting rows into \`products\`: 0 rows \[00:00, ? rows/s]Inserting rows into \`products\`: 3 rows \[00:00, 432.95 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 6 values computed.```python theme={null} # Add a computed column (version 2 - schema change) products.add_computed_column(price_with_tax=products.price * 1.08) ```
Added 3 column values with 0 errors. 3 rows updated, 6 values computed.```python theme={null} # Update some data (version 3) products.update({'price': 19.99}, where=products.name == 'Widget') ```
Inserting rows into \`products\`: 0 rows \[00:00, ? rows/s]Inserting rows into \`products\`: 1 rows \[00:00, 297.47 rows/s] 1 row updated, 3 values computed.```python theme={null} # Insert more data (version 4) products.insert( [{'name': 'Thingamajig', 'price': 49.99, 'category': 'Tools'}] ) ```
Inserting rows into \`products\`: 0 rows \[00:00, ? rows/s]Inserting rows into \`products\`: 1 rows \[00:00, 661.46 rows/s] Inserted 1 row with 0 errors. 1 row inserted, 3 values computed.### View version history Use `history()` for a human-readable summary of all changes. ```python theme={null} # View full history (most recent first) products.history() ``` ```python theme={null} # View only the last 3 versions products.history(n=3) ``` ### Programmatic access to version metadata Use `get_versions()` to access version data programmatically. ```python theme={null} # Get version metadata as a list of dictionaries versions = products.get_versions() # Access specific version info latest = versions[0] latest['version'], latest['change_type'], latest['inserts'] ```
(4, 'data', 1)### Access a specific version Use `pxt.get_table('table_name:version')` to get a read-only handle to a specific version: ```python theme={null} # Get the table at version 1 (after initial insert, before computed column) products_v1 = pxt.get_table('version_demo/products:1') # This is a read-only view of the data at that point in time products_v1.collect() ``` ```python theme={null} # Compare data at version 2 (after computed column added) vs version 1 # Note: version 1 doesn't have the price_with_tax column yet products_v2 = pxt.get_table('version_demo/products:2') products_v2.collect() ``` ### Revert to previous version Use `revert()` to undo the most recent change. This is irreversible. ```python theme={null} # Current state: 4 products products.count() ```
4```python theme={null} # Revert the last insert (removes Thingamajig) products.revert() products.count() ```
3```python theme={null} # History now shows version 4 was reverted products.history() ``` ```python theme={null} # Can revert multiple times (back to before the update) products.revert() # Check the Widget price is back to original products.where(products.name == 'Widget').select( products.name, products.price ).collect() ``` ### Create point-in-time snapshots Snapshots freeze a table’s state for reproducibility. Unlike `revert()`, snapshots preserve the data indefinitely. ```python theme={null} # Create a snapshot of the current state snapshot_v1 = pxt.create_snapshot('version_demo/products_v1', products) snapshot_v1.collect() ``` ```python theme={null} # Now make changes to the original table products.insert( [{'name': 'Doohickey', 'price': 99.99, 'category': 'Premium'}] ) products.update({'price': 29.99}, where=products.name == 'Gadget') products.collect() ```
Inserting rows into \`products\`: 0 rows \[00:00, ? rows/s]Inserting rows into \`products\`: 1 rows \[00:00, 535.67 rows/s] Inserted 1 row with 0 errors. Inserting rows into \`products\`: 0 rows \[00:00, ? rows/s]Inserting rows into \`products\`: 1 rows \[00:00, 558.05 rows/s]```python theme={null} # Snapshot remains unchanged - still shows original data snapshot_v1.collect() ``` ## Explanation **What creates a new version:** * `insert()` - adding rows * `update()` - modifying rows * `delete()` - removing rows * `add_column()` / `add_computed_column()` - schema changes * `drop_column()` - schema changes * `rename_column()` - schema changes **Version history methods:** * `history()` - Human-readable DataFrame showing all changes * `get_versions()` - List of dictionaries for programmatic access **Accessing specific versions:** * `pxt.get_table('table_name:N')` - Get read-only handle to version N * Useful for comparing data across versions, auditing changes, or recovering specific values * Version handles are read-only—you cannot modify historical versions **Reverting:** * `revert()` undoes the most recent version * Can call multiple times to go back further * Cannot revert past version 0 * Cannot revert if a snapshot references that version **Snapshots vs revert:** * Snapshots are persistent, named, point-in-time copies * `revert()` permanently removes the latest version * Use snapshots when you need to preserve state for reproducibility * Use `revert()` to undo mistakes ## See also * [Data sharing](../../../platform/data-sharing) - Share tables between environments * [Iterative development](/howto/cookbooks/core/dev-iterative-workflow) - Fast feedback during development # Configure API keys for AI services Source: https://docs.pixeltable.com/howto/cookbooks/core/workflow-api-keys
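### Option 1: environment variables

**Use when:** CI pipelines, production deployments, or any environment where secrets are injected for you.

The original cell for this option isn't shown; the sketch below follows the discovery order described in the Explanation section (environment variables are checked first). The placeholder key value is illustrative only.

```python theme={null}
import os

# Set the key for the current process (or export it in your shell / CI secrets)
os.environ['OPENAI_API_KEY'] = 'sk-...'

# Check that Pixeltable will be able to find it
'OPENAI_API_KEY' in os.environ
```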
True### Option 2: config file **Use when:** Local development, want credentials available to all Pixeltable projects Create `~/.pixeltable/config.toml`: ```toml theme={null} # ~/.pixeltable/config.toml [openai] api_key = "sk-..." [anthropic] api_key = "sk-ant-..." [google] api_key = "AIza..." ``` You can check if the config file exists: ```python theme={null} # Check config file location home_dir = pxt.home() # Usually ~/.pixeltable config_file = home_dir / 'config.toml' print(config_file) config_file.exists() ```
/Users/asiegel/.pixeltable/config.toml True### Option 3: getpass (interactive) **Use when:** Shared notebooks, demos, one-time sessions Prompt for the key at runtime—it won’t be saved anywhere: ```python theme={null} import getpass # Uncomment to use interactively: # if 'OPENAI_API_KEY' not in os.environ: # os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ') ``` ### Verify your configuration Test that Pixeltable can access your credentials by checking the config: ```python theme={null} # Check which API keys are available services = [ 'OPENAI_API_KEY', 'ANTHROPIC_API_KEY', 'GOOGLE_API_KEY', 'MISTRAL_API_KEY', ] for svc in services: status = '✓' if svc in os.environ else '✗' print(f'{status} {svc}') ```
✓ OPENAI\_API\_KEY ✓ ANTHROPIC\_API\_KEY ✓ GOOGLE\_API\_KEY ✓ MISTRAL\_API\_KEY## Explanation **Discovery order:** Pixeltable checks for API keys in this order: 1. Environment variable (e.g., `OPENAI_API_KEY`) 2. Config file (`~/.pixeltable/config.toml`) 3. Raises an error if not found **Supported services:** **Config file is global:** All Pixeltable projects on your machine share the same config file. **Getpass is per-session:** The key only exists in memory for the current Python session. ## See also * [Pixeltable configuration reference](/platform/configuration) * [Working with OpenAI](/howto/providers/working-with-openai) # Extract fields from LLM JSON responses Source: https://docs.pixeltable.com/howto/cookbooks/core/workflow-json-extraction
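The setup for this page isn't shown. A minimal sketch of the imports and directory creation it likely uses, grounded in the `openai.chat_completions` call and the `json_demo` directory that appear below:

```python theme={null}
import pixeltable as pxt
from pixeltable.functions import openai  # provides openai.chat_completions used below

pxt.create_dir('json_demo')
```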
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'json\_demo'. \```python theme={null} t = pxt.create_table('json_demo/prompts', {'prompt': pxt.String}) ```
Created table 'prompts'.```python theme={null} t.insert( [ {'prompt': 'What is the capital of France?'}, {'prompt': 'Write a haiku about coding'}, ] ) ```
Inserting rows into \`prompts\`: 2 rows \[00:00, 325.83 rows/s] Inserted 2 rows with 0 errors. 2 rows inserted, 2 values computed.### Get LLM responses ```python theme={null} # Add computed column for API response (returns full JSON) t.add_computed_column( response=openai.chat_completions( messages=[{'role': 'user', 'content': t.prompt}], model='gpt-4o-mini', ) ) ```
Added 2 column values with 0 errors. 2 rows updated, 2 values computed.### Extract specific fields Use dot notation to access nested JSON fields: ```python theme={null} # Extract just the text content t.add_computed_column(text=t.response.choices[0].message.content) # Extract token usage t.add_computed_column(tokens=t.response.usage.total_tokens) ```
Added 2 column values with 0 errors. Added 2 column values with 0 errors. 2 rows updated, 2 values computed.```python theme={null} # View clean results t.select(t.prompt, t.text, t.tokens).collect() ``` ## Explanation **Common extraction patterns:** **Accessing JSON fields:** * Use dot notation for object properties: `response.usage` * Use brackets for array elements: `choices[0]` * Chain them: `response.choices[0].message.content` **Extracted columns are computed:** Changes to the source data automatically update all extracted fields. ## See also * [Configure API keys](/howto/cookbooks/core/workflow-api-keys) * [Extract structured data from images](/howto/cookbooks/images/vision-structured-output) # Add unique identifiers to your tables Source: https://docs.pixeltable.com/howto/cookbooks/core/workflow-uuid-identity
Created table 'products'.```python theme={null} # Insert data - no need to provide 'id', it's auto-generated products.insert( [ {'name': 'Laptop', 'price': 999.99}, {'name': 'Mouse', 'price': 29.99}, {'name': 'Keyboard', 'price': 79.99}, ] ) ```
Inserted 3 rows with 0 errors in 0.02 s (191.21 rows/s) 3 rows inserted.```python theme={null} # View the data - each row has a unique UUID products.collect() ``` ### Add a UUID column to an existing table You can add a UUID column to a table that already exists using `add_computed_column()`: ```python theme={null} # Create a table without a UUID column orders = pxt.create_table( 'uuid_demo/orders', {'customer': pxt.String, 'amount': pxt.Float} ) ```
Created table 'orders'.```python theme={null} # Insert some data orders.insert( [ {'customer': 'Alice', 'amount': 150.00}, {'customer': 'Bob', 'amount': 75.50}, ] ) ```
Inserted 2 rows with 0 errors in 0.01 s (310.49 rows/s) 2 rows inserted.```python theme={null} # Add a UUID column to existing table orders.add_computed_column(order_id=uuid7()) ```
Added 2 column values with 0 errors in 0.02 s (98.14 rows/s) 2 rows updated.```python theme={null} # View orders with their UUID column orders.collect() ``` ## Explanation **Two ways to add UUIDs:** Both use `uuid7()` which generates UUIDv7 (time-based) identifiers: * 128-bit values * Formatted as 32 hex digits with hyphens: `018e65c5-35e5-7c5d-8f37-f1c5b9c8a7b2` * Time-ordered for better database performance * Virtually guaranteed unique (collision probability is negligible) ## See also * [Tables and operations](/tutorials/tables-and-data-operations) * [Computed columns](/tutorials/computed-columns) # Export data for ML training Source: https://docs.pixeltable.com/howto/cookbooks/data/data-export-pytorch
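The hidden setup for this page presumably looks like the sketch below; `DataLoader` is used later without an explicit import, and the `pytorch_demo` directory appears in the output that follows:

```python theme={null}
import pixeltable as pxt
from torch.utils.data import DataLoader  # used with the exported dataset below

pxt.create_dir('pytorch_demo')
```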
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'pytorch\_demo'. \### Create sample training data ```python theme={null} # Create table with images and labels training_data = pxt.create_table( 'pytorch_demo/training_data', {'image': pxt.Image, 'label': pxt.Int} ) ```
Created table 'training\_data'.```python theme={null} # Insert sample images with labels base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images' samples = [ {'image': f'{base_url}/000000000036.jpg', 'label': 0}, # cat {'image': f'{base_url}/000000000090.jpg', 'label': 1}, # other {'image': f'{base_url}/000000000139.jpg', 'label': 1}, # other ] training_data.insert(samples) ```
Inserting rows into \`training\_data\`: 3 rows \[00:00, 659.03 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 6 values computed.### Export to PyTorch dataset ```python theme={null} # Add a resize step to ensure all images have the same size training_data.add_computed_column( image_resized=training_data.image.resize((224, 224)) ) # Convert to PyTorch dataset # 'pt' format returns images as CxHxW tensors with values in [0,1] pytorch_dataset = training_data.select( training_data.image_resized, training_data.label ).to_pytorch_dataset(image_format='pt') ```
Added 3 column values with 0 errors.```python theme={null} # Use with PyTorch DataLoader dataloader = DataLoader(pytorch_dataset, batch_size=2) # Get first batch to verify the shape batch = next(iter(dataloader)) batch[ 'image_resized' ].shape # Should be (2, 3, 224, 224) - batch_size x channels x height x width ```
torch.Size(\[2, 3, 224, 224])### Export to Parquet for external tools ```python theme={null} import tempfile from pathlib import Path # Export to Parquet for use with other ML tools export_path = Path(tempfile.mkdtemp()) / 'training_data' pxt.io.export_parquet( training_data.select(training_data.label), # Non-image columns parquet_path=export_path, ) ``` ## Explanation **Export methods:** **Image format options:** **DataLoader tips:** * Data is cached to disk for efficient repeated loading * Use `num_workers > 0` for parallel data loading * Filter/transform data before export to reduce size ## See also * [Sample data for training](/howto/cookbooks/data/data-sampling) - Stratified sampling * [Import Parquet files](/howto/cookbooks/data/data-import-parquet) - Parquet import/export # Upload media to S3 and other cloud storage Source: https://docs.pixeltable.com/howto/cookbooks/data/data-export-s3
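The table used on this page is created in a hidden setup cell. A minimal sketch, assuming a single-image-column table named `media` (the exact path isn't shown in the output below) and the `Path` import used later for local destination directories:

```python theme={null}
import pixeltable as pxt
from pathlib import Path  # used below to build local destination directories

# Assumption: the table has a single image column, matching the insert further down
t = pxt.create_table('media', {'source_image': pxt.Image})
```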
Created table 'media'.We can inspect the schema before adding images to our table: ```python theme={null} t ``` Let’s insert a single sample image. ```python theme={null} sample_image = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg' t.insert(source_image=sample_image) ```
Inserted 1 row with 0 errors in 0.77 s (1.29 rows/s) 1 row inserted.And we can see the image in our table: ```python theme={null} t.collect() ``` ## Default destinations By default, Pixeltable stores all media files in local storage under `~/.pixeltable/media`: * **Input files** (files you insert) — If you insert a URL, Pixeltable stores the URL and downloads it to cache on access. If you insert a local file path, Pixeltable just stores the path reference (the file stays where it is). * **Output files** (files Pixeltable generates) — Stored in `~/.pixeltable/media` This works out of the box with no configuration. You can change these defaults, which we’ll cover in the rest of this notebook. Let’s check where the source image is stored. Since we inserted a URL (not a local file), Pixeltable stores the URL reference and will download it to cache when we access it. ```python theme={null} # Let's see where the source_image is stored by default t.select(t.source_image.fileurl).collect() ``` Now let’s add a computed column without specifying a destination. This will show us where Pixeltable stores **output** files by default. ```python theme={null} # Add computed column with no destination specified - uses default t.add_computed_column( flipped=t.source_image.transpose(0), if_exists='replace' ) ```
Added 1 column value with 0 errors in 0.02 s (45.44 rows/s) 1 row updated.Check the file URL - it points to `~/.pixeltable/media`, the default location for generated files. ```python theme={null} t.select(t.flipped, t.flipped.fileurl).collect() ``` ## Per-column destinations When you create a computed column, you can specify exactly where to store generated files using the `destination=` parameter. This gives you fine-grained control over outputs, which may be costly and/or difficult to re-generate. We’ll create a destination directory for storing one of our processed images. For this demo, we’re using a local directory on your Desktop, but you can replace this path with a cloud storage URI (like `s3://my-bucket/rotated/`). ```python theme={null} # Create a local destination directory # For S3: dest_rotated = "s3://my-bucket/rotated/" # For GCS: dest_rotated = "gs://my-bucket/rotated/" base_path = Path.home() / 'Desktop' / 'pixeltable_outputs' base_path.mkdir(parents=True, exist_ok=True) dest_rotated = str(base_path / 'rotated') # Create directory (only needed for local paths) Path(dest_rotated).mkdir(exist_ok=True) ``` Now let’s add a computed column **with** an explicit destination to see the difference from the default behavior. ```python theme={null} # Add column WITH explicit destination t.add_computed_column( rotated=t.source_image.rotate(90), destination=dest_rotated, if_exists='replace', ) ```
Added 1 column value with 0 errors in 0.02 s (48.98 rows/s) 1 row updated.Compare the file URLs. The `rotated` image uses our explicit destination, while `flipped` (created earlier) uses the default `~/.pixeltable/media` location. ```python theme={null} t.select(t.rotated, t.rotated.fileurl).collect() ``` ```python theme={null} t.select(t.flipped, t.flipped.fileurl).collect() ``` ## Changing global destinations Instead of setting `destination=` on every column, you can change the global default for ALL columns. ### Output and input destinations You can configure two types of global destinations: * **`output_media_dest`** — Changes the default for files Pixeltable generates (computed columns) * **`input_media_dest`** — Changes the default for files you insert into tables You can set them to the same bucket or different buckets depending on your needs. ### How to configure You have two options: **Option 1: Configuration file** (`~/.pixeltable/config.toml`) ```toml theme={null} [pixeltable] # Where files Pixeltable generates are stored output_media_dest = "s3://my-bucket/output/" # Where files you insert are stored input_media_dest = "s3://my-bucket/input/" ``` **Option 2: Environment variables** ```bash theme={null} export PIXELTABLE_OUTPUT_MEDIA_DEST="s3://my-bucket/output/" export PIXELTABLE_INPUT_MEDIA_DEST="s3://my-bucket/input/" ``` ### Supported providers and URI formats For complete authentication and setup details, see the [Cloud Storage documentation](/integrations/cloud-storage). ## Overriding global destinations Even if you configure global destinations, you can still override them for specific columns using the `destination=` parameter in `add_computed_column()`. Let’s create a new destination directory and add a thumbnail column that uses it. ```python theme={null} # Create a different destination for thumbnails dest_thumbnails = str(base_path / 'thumbnails') Path(dest_thumbnails).mkdir(exist_ok=True) # Add column with explicit destination (overrides any global default) t.add_computed_column( thumbnail=t.source_image.thumbnail((128, 128)), destination=dest_thumbnails, if_exists='replace', ) ```
Added 1 column value with 0 errors in 0.02 s (47.89 rows/s) 1 row updated.Let’s view the thumbnail and its file URL. The explicit `destination=` parameter always wins, regardless of global configuration. ```python theme={null} t.select(t.thumbnail, t.thumbnail.fileurl).collect() ``` ## Getting URLs for your files When your files are in blob storage, you can get URLs that point directly to them. These URLs work in HTML, APIs, or any application you need to serve media with. The `.fileurl` property gives you direct URLs you can use anywhere. ```python theme={null} t.select( source=t.source_image.fileurl, rotated=t.rotated.fileurl, flipped=t.flipped.fileurl, ).collect() ``` ## Generating presigned URLs **Note:** This section only applies if you’re using cloud storage (S3, GCS, Azure, R2, B2, Tigris). If you’re following along with local destinations (as in the examples above), you can skip this section or configure cloud storage to try it out. When your files are in cloud storage, the `.fileurl` property returns storage URIs like `s3://bucket/path/file.jpg`. These aren’t directly accessible over HTTP. For private buckets or when you need time-limited HTTP access, use **presigned URLs**. These are temporary, authenticated URLs that allow anyone to access your files for a limited time without needing credentials. Presigned URLs are particularly useful for: * Sharing files from private buckets without making them public * Creating temporary download links with expiration * Serving media in web applications without exposing credentials * Providing time-limited access to sensitive content Use the `presigned_url` function from `pixeltable.functions.net`: ```python theme={null} import os # Use HTTPS URL format for Backblaze B2 b2_region = 'us-east-005' b2_bucket = 'pixeltable' cloud_destination = ( f'https://s3.{b2_region}.backblazeb2.com/{b2_bucket}/presigned-demo/' ) # Add the computed column t.add_computed_column( cloud_thumbnail=t.source_image.thumbnail((64, 64)), destination=cloud_destination, if_exists='replace', ) ```
Added 1 column value with 0 errors in 0.22 s (4.46 rows/s) 1 row updated.```python theme={null} # Now generate presigned URLs for the cloud-stored files from pixeltable.functions import net t.select( cloud_thumbnail=t.cloud_thumbnail, storage_url=t.cloud_thumbnail.fileurl, presigned_url=net.presigned_url( t.cloud_thumbnail.fileurl, 3600 ), # 1-hour expiration ).collect() ``` The presigned URLs in the output are fully authenticated HTTP/HTTPS URLs that can be accessed directly in a browser or used in APIs without any credentials. ### Common expiration times **Note:** Different storage providers have different maximum expiration limits. For example, Google Cloud Storage has a maximum 7-day expiration for presigned URLs. ### Troubleshooting presigned URLs If `presigned_url()` isn’t working: 1. **Local files**: Presigned URLs only work with cloud storage (S3, GCS, Azure, R2, B2, Tigris). If your files are stored locally (default), you’ll get an error. Configure a cloud destination first. 2. **Already HTTP URLs**: If `.fileurl` returns an `http://` or `https://` URL (not a storage URI like `s3://`), the file is already publicly accessible and doesn’t need a presigned URL. 3. **Credentials**: Ensure your cloud storage credentials are properly configured. See the [Cloud Storage documentation](/integrations/cloud-storage) for provider-specific setup. ## Common patterns Here are a few real-world patterns you might use: ### Pattern 1: All media in one place If you want everything in the same bucket, configure both input and output destinations in `~/.pixeltable/config.toml`: ```toml theme={null} [pixeltable] input_media_dest = "s3://my-bucket/media/" output_media_dest = "s3://my-bucket/media/" ``` Or set environment variables: ```bash theme={null} export PIXELTABLE_INPUT_MEDIA_DEST="s3://my-bucket/media/" export PIXELTABLE_OUTPUT_MEDIA_DEST="s3://my-bucket/media/" ``` ### Pattern 2: Separate input and output Keep source files separate from processed files in `~/.pixeltable/config.toml`: ```toml theme={null} [pixeltable] input_media_dest = "s3://my-bucket/uploads/" output_media_dest = "s3://my-bucket/processed/" ``` ### Pattern 3: Override for specific columns Use a global default, but send some columns elsewhere. First, set a global default in your config: ```toml theme={null} [pixeltable] output_media_dest = "s3://my-bucket/processed/" ``` Then in your code, most columns use the global default, but you can override specific ones: ```python theme={null} # Uses global default (s3://my-bucket/processed/) t.add_computed_column( thumbnail=t.image.thumbnail((128, 128)) ) # Overrides global default - goes to different location t.add_computed_column( large_thumbnail=t.image.thumbnail((512, 512)), destination='s3://my-bucket/thumbnails/' ) ``` ## Where do my files go? Understanding how Pixeltable handles different types of input files helps you make better decisions about storage configuration. When you configure a cloud destination, Pixeltable populates both the destination and the local cache efficiently during `insert()`. For URLs, this means downloading once and using that download for both the upload and cache—avoiding wasteful upload→download cycles. 
## What you learned * Pixeltable uses local storage by default for all media files * You can override the default for specific columns with the `destination` parameter * You can change the global default with `input_media_dest` and `output_media_dest` * Precedence: column destination > global config > Pixeltable’s default local storage * Use `.fileurl` to get URLs for your stored files * Use `net.presigned_url()` to generate time-limited, authenticated HTTP URLs for cloud storage files * Pixeltable handles caching intelligently to avoid wasteful operations ## See also * [Load from S3](../../../howto/cookbooks/data/data-import-s3) - Import media from cloud storage * [Cloud Storage Integration](../../../integrations/cloud-storage) - Provider setup ## Next steps * See the [Cloud Storage documentation](/integrations/cloud-storage) for complete provider setup and authentication details * Check out [Pixeltable Configuration](/platform/configuration) for all config options * Join our [Discord community](https://pixeltable.com/discord) if you have questions # Export data to SQL databases Source: https://docs.pixeltable.com/howto/cookbooks/data/data-export-sql
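The imports for this page are hidden. A sketch of what they likely include; note that the import path for `export_sql` is an assumption (the function is called bare throughout the page), while `tempfile`, `Path`, and the `sql_export_demo` directory are grounded in the code and output below:

```python theme={null}
import tempfile
from pathlib import Path

import pixeltable as pxt
from pixeltable.io import export_sql  # assumption: the import path isn't shown on this page

pxt.create_dir('sql_export_demo')
```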
Created directory 'sql\_export\_demo'. \```python theme={null} # Create a table with product data products = pxt.create_table( 'sql_export_demo/products', { 'name': pxt.String, 'price': pxt.Float, 'in_stock': pxt.Bool, 'metadata': pxt.Json, }, ) ```
Created table 'products'.```python theme={null} # Insert sample products products.insert( [ { 'name': 'Wireless Mouse', 'price': 29.99, 'in_stock': True, 'metadata': {'category': 'electronics', 'rating': 4.5}, }, { 'name': 'USB-C Hub', 'price': 49.99, 'in_stock': False, 'metadata': {'category': 'electronics', 'rating': 4.2}, }, { 'name': 'Mechanical Keyboard', 'price': 89.99, 'in_stock': True, 'metadata': {'category': 'electronics', 'rating': 4.8}, }, { 'name': 'Monitor Stand', 'price': 39.99, 'in_stock': True, 'metadata': {'category': 'accessories', 'rating': 4.0}, }, { 'name': 'Webcam', 'price': 59.99, 'in_stock': False, 'metadata': {'category': 'electronics', 'rating': 3.9}, }, ] ) ```
Inserted 5 rows with 0 errors in 0.01 s (566.35 rows/s) 5 rows inserted.```python theme={null} # View the data products.collect() ``` ### Export an entire table You pass a table and a SQLAlchemy connection string to export all rows and columns. ```python theme={null} # Create a SQLite database for this demo db_path = Path(tempfile.mkdtemp()) / 'products.db' connection_string = f'sqlite:///{db_path}' ``` ```python theme={null} # Export the full table export_sql(products, 'products', db_connect_str=connection_string) ``` ```python theme={null} # Verify the export with SQLAlchemy import sqlalchemy as sql engine = sql.create_engine(connection_string) with engine.connect() as conn: result = conn.execute(sql.text('SELECT * FROM products')).fetchall() result ```
\[('Wireless Mouse', 29.99, 1, '\{"rating": 4.5, "category": "electronics"}'),
('USB-C Hub', 49.99, 0, '\{"rating": 4.2, "category": "electronics"}'),
('Mechanical Keyboard', 89.99, 1, '\{"rating": 4.8, "category": "electronics"}'),
('Monitor Stand', 39.99, 1, '\{"rating": 4.0, "category": "accessories"}'),
('Webcam', 59.99, 0, '\{"rating": 3.9, "category": "electronics"}')]
### Export a filtered query
You can export any query result—filter rows, select specific columns, or
apply transformations before export.
```python theme={null}
# Export only in-stock products
export_sql(
    products.where(products.in_stock == True),
    'in_stock_products',
    db_connect_str=connection_string,
)
```
```python theme={null}
# Verify filtered export
with engine.connect() as conn:
    result = conn.execute(
        sql.text('SELECT name, price FROM in_stock_products')
    ).fetchall()
result
```
\[('Wireless Mouse', 29.99),
('Mechanical Keyboard', 89.99),
('Monitor Stand', 39.99)]
### Export specific columns
You select only the columns you need before exporting. You can also
rename columns in the output.
```python theme={null}
# Export only name and price columns
export_sql(
    products.select(products.name, products.price),
    'price_list',
    db_connect_str=connection_string,
)
```
```python theme={null}
# Export with renamed columns
export_sql(
    products.select(
        product_name=products.name, unit_price=products.price
    ),
    'renamed_columns',
    db_connect_str=connection_string,
)
```
```python theme={null}
# Verify column selection
inspector = sql.inspect(engine)
columns = [col['name'] for col in inspector.get_columns('price_list')]
columns
```
\['name', 'price']### Handle existing tables You control what happens when the target table already exists using the `if_exists` parameter: ```python theme={null} # Append new data to existing table export_sql( products.where(products.price > 50), 'products', db_connect_str=connection_string, if_exists='insert', ) ``` ```python theme={null} # Check row count after insert with engine.connect() as conn: result = conn.execute( sql.text('SELECT COUNT(*) FROM products') ).fetchone() f'Total rows after insert: {result[0]}' ```
'Total rows after insert: 7'```python theme={null} # Replace with fresh data export_sql( products.select(products.name, products.price), 'products', db_connect_str=connection_string, if_exists='replace', ) ``` ```python theme={null} # Check that table was replaced inspector = sql.inspect(engine) columns = [col['name'] for col in inspector.get_columns('products')] with engine.connect() as conn: row_count = conn.execute( sql.text('SELECT COUNT(*) FROM products') ).fetchone()[0] f'Columns: {columns}, Row count: {row_count}' ```
"Columns: \['name', 'price'], Row count: 5"### Export to cloud PostgreSQL (TigerData) You can export directly to cloud-hosted PostgreSQL databases like [TigerData](https://www.timescale.com/cloud) (Timescale Cloud). Get your credentials from the TigerData dashboard after creating a service. ```python theme={null} import getpass import os # Skip interactive sections in CI environments SKIP_CLOUD_TESTS = os.environ.get('CI') or os.environ.get( 'GITHUB_ACTIONS' ) if not SKIP_CLOUD_TESTS: # Enter your TigerData credentials interactively tigerdata_host = input( 'TigerData host (e.g., abc123.tsdb.cloud.timescale.com): ' ) tigerdata_port = input('TigerData port (e.g., 38963): ') tigerdata_user = input('TigerData username (e.g., tsdbadmin): ') tigerdata_password = getpass.getpass('TigerData password: ') tigerdata_dbname = input('TigerData database name (e.g., tsdb): ') # Build the connection string (use postgresql+psycopg:// for SQLAlchemy compatibility) tigerdata_connection = f'postgresql+psycopg://{tigerdata_user}:{tigerdata_password}@{tigerdata_host}:{tigerdata_port}/{tigerdata_dbname}?sslmode=require' else: print('Skipping TigerData section (running in CI)') ``` ```python theme={null} if not SKIP_CLOUD_TESTS: # Export to TigerData export_sql( products, 'pixeltable_products', db_connect_str=tigerdata_connection, if_exists='replace', ) ``` ```python theme={null} if not SKIP_CLOUD_TESTS: # Verify the export in TigerData tigerdata_engine = sql.create_engine(tigerdata_connection) with tigerdata_engine.connect() as conn: result = conn.execute( sql.text('SELECT * FROM pixeltable_products') ).fetchall() result ```
\[('Wireless Mouse', 29.99, True, \{'rating': 4.5, 'category': 'electronics'}),
('USB-C Hub', 49.99, False, \{'rating': 4.2, 'category': 'electronics'}),
('Mechanical Keyboard', 89.99, True, \{'rating': 4.8, 'category': 'electronics'}),
('Monitor Stand', 39.99, True, \{'rating': 4.0, 'category': 'accessories'}),
('Webcam', 59.99, False, \{'rating': 3.9, 'category': 'electronics'})]
### Export to Snowflake
You can export directly to [Snowflake](https://www.snowflake.com/) data
warehouses. Get your account identifier from the Snowflake web interface
under **Admin → Accounts**.
```python theme={null}
if not SKIP_CLOUD_TESTS:
    # Enter your Snowflake credentials interactively
    snowflake_account = input(
        'Snowflake account identifier (e.g., WEZMMGC-AIB20064): '
    )
    snowflake_user = input('Snowflake username: ')
    snowflake_password = getpass.getpass('Snowflake password: ')
    snowflake_warehouse = input(
        'Snowflake warehouse (e.g., COMPUTE_WH): '
    )
    snowflake_database = input('Snowflake database: ')
    snowflake_schema = input('Snowflake schema (e.g., PUBLIC): ')
    # Build the connection string
    snowflake_connection = f'snowflake://{snowflake_user}:{snowflake_password}@{snowflake_account}/{snowflake_database}/{snowflake_schema}?warehouse={snowflake_warehouse}'
else:
    print('Skipping Snowflake section (running in CI)')
```
```python theme={null}
if not SKIP_CLOUD_TESTS:
    # Export to Snowflake (without JSON column)
    export_sql(
        products.select(products.name, products.price, products.in_stock),
        'PIXELTABLE_PRODUCTS',
        db_connect_str=snowflake_connection,
        if_exists='replace',
    )
```
```python theme={null}
if not SKIP_CLOUD_TESTS:
    # Verify the export in Snowflake
    snowflake_engine = sql.create_engine(snowflake_connection)
    with snowflake_engine.connect() as conn:
        result = conn.execute(
            sql.text('SELECT * FROM PIXELTABLE_PRODUCTS')
        ).fetchall()
    result
```
\[('Wireless Mouse', 29.99, True, None),
('USB-C Hub', 49.99, False, None),
('Mechanical Keyboard', 89.99, True, None),
('Monitor Stand', 39.99, True, None),
('Webcam', 59.99, False, None)]
### Exporting media data
For tables containing media types (`pxt.Image`, `pxt.Video`,
`pxt.Audio`), you have two options:
1. **Extract metadata before export** - Select only the columns you
need (paths, embeddings, extracted text, etc.) and export those to
SQL.
2. **Use Pixeltable destinations** - For syncing media files to cloud
storage, use Pixeltable’s built-in destination support with
providers like
[Tigris](/howto/providers/working-with-tigris).
**Example: Export image metadata to SQL**
```python theme={null}
# Create a table with images
images = pxt.create_table(
    'sql_export_demo/images', {'image': pxt.Image, 'label': pxt.String}
)
# Add computed columns for metadata
images.add_computed_column(width=images.image.width)
images.add_computed_column(height=images.image.height)
images.add_computed_column(mode=images.image.mode)
```
Created table 'images'. Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Insert sample images base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images' images.insert( [ {'image': f'{base_url}/000000000036.jpg', 'label': 'cat'}, {'image': f'{base_url}/000000000090.jpg', 'label': 'scene'}, ] ) ```
Inserted 2 rows with 0 errors in 0.03 s (63.85 rows/s) 2 rows inserted.```python theme={null} # Export metadata (not the image itself) to SQL export_sql( images.select(images.label, images.width, images.height, images.mode), 'image_metadata', db_connect_str=connection_string, # or tigerdata_connection for cloud ) ``` ```python theme={null} # Verify the metadata export with engine.connect() as conn: result = conn.execute( sql.text('SELECT * FROM image_metadata') ).fetchall() result ```
\[('cat', 481, 640, 'RGB'), ('scene', 640, 429, 'RGB')]
## Explanation
**Connection strings:**
The function uses SQLAlchemy connection strings. Common formats:
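The original formats table isn't reproduced here. As a sketch, the connection strings used elsewhere on this page follow these shapes (credentials, hosts, and names are placeholders):

```python theme={null}
# SQLite (local file)
sqlite_url = 'sqlite:///path/to/products.db'

# PostgreSQL (including cloud-hosted services such as TigerData), via psycopg
postgres_url = 'postgresql+psycopg://user:password@host:5432/dbname?sslmode=require'

# Snowflake
snowflake_url = 'snowflake://user:password@account/database/schema?warehouse=COMPUTE_WH'
```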
**Type mapping:**
Pixeltable types map to SQL types automatically:
**Unsupported types:**
Media types like `pxt.Image`, `pxt.Video`, and `pxt.Audio` cannot be
exported directly. Extract the data you need (paths, embeddings,
metadata) before export.
## See also
* [Working with Tigris](/howto/providers/working-with-tigris) - Sync media files to cloud storage
* [Cloud Storage Integration](/integrations/cloud-storage) - S3, GCS, and Azure Blob storage
* [Export to PyTorch](./data-export-pytorch) - Export for ML training
# Import data from CSV files
Source: https://docs.pixeltable.com/howto/cookbooks/data/data-import-csv
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'import\_demo'. \### Import CSV directly Use `create_table` with `source` to create a table from a CSV file: ```python theme={null} # Import CSV from URL csv_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/world-population-data.csv' population = pxt.create_table('import_demo/population', source=csv_url) ```
Created table 'population'. Inserting rows into \`population\`: 0 rows \[00:00, ? rows/s] Inserting rows into \`population\`: 234 rows \[00:00, 9032.63 rows/s] Inserted 234 rows with 0 errors.```python theme={null} # View the imported data population.head(5) ``` ### Import from Pandas DataFrame You can also create a DataFrame first and insert it: ```python theme={null} # Create a DataFrame df = pd.DataFrame( { 'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35], 'city': ['NYC', 'LA', 'Chicago'], } ) # Create table and insert DataFrame users = pxt.create_table( 'import_demo/users', {'name': pxt.String, 'age': pxt.Int, 'city': pxt.String}, ) users.insert(df) ```
Created table 'users'. Inserting rows into \`users\`: 0 rows \[00:00, ? rows/s] Inserting rows into \`users\`: 3 rows \[00:00, 923.31 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 6 values computed.```python theme={null} # View the data users.collect() ``` ## Explanation **Source types supported:** **Type inference:** Pixeltable automatically infers column types from CSV data. You can override types using `schema_overrides`. **Large files:** For very large CSV files, consider: * Using `create_table(source=...)` which streams data * Importing in batches if memory is limited ## See also * [Tables documentation](/tutorials/tables-and-data-operations) * [Bringing data guide](/howto/cookbooks/data/data-import-csv) # Import data from Excel files Source: https://docs.pixeltable.com/howto/cookbooks/data/data-import-excel
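The Excel file imported on this page is created in a hidden setup cell. A minimal sketch of that setup; the column names match the page below, while the row values are illustrative (writing `.xlsx` requires `openpyxl`):

```python theme={null}
import tempfile
from pathlib import Path

import pandas as pd
import pixeltable as pxt

# Write a small Excel file to import
excel_path = Path(tempfile.mkdtemp()) / 'orders.xlsx'
pd.DataFrame(
    {
        'order_id': [1, 2, 3, 4, 5],
        'customer': ['Alice', 'Bob', 'Carol', 'Dan', 'Eve'],
        'product': ['Widget', 'Gadget', 'Gizmo', 'Widget', 'Gadget'],
        'quantity': [2, 1, 3, 1, 2],
        'price': [9.99, 24.99, 14.99, 9.99, 24.99],
    }
).to_excel(excel_path, index=False)

pxt.create_dir('excel_demo')
```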
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'excel\_demo'. \```python theme={null} # Import Excel file directly orders = pxt.create_table( 'excel_demo/orders', source=str(excel_path), source_format='excel', # Hint for Excel format ) ```
Created table 'orders'. Inserting rows into \`orders\`: 0 rows \[00:00, ? rows/s] Inserting rows into \`orders\`: 5 rows \[00:00, 501.21 rows/s] Inserted 5 rows with 0 errors.```python theme={null} # View imported data orders.collect() ``` ### Add computed columns ```python theme={null} # Add computed column for order total orders.add_computed_column(total=orders.quantity * orders.price) ```
Added 5 column values with 0 errors. 5 rows updated, 10 values computed.```python theme={null} # View with computed total orders.select( orders.order_id, orders.customer, orders.product, orders.quantity, orders.price, orders.total, ).collect() ``` ## Explanation **Import methods:** **Excel-specific options:** Pass Pandas `read_excel` arguments via `extra_args`: ```python theme={null} pxt.create_table( 'table_name', source='data.xlsx', source_format='excel', extra_args={'sheet_name': 'Sheet2', 'skiprows': 1} ) ``` **Common extra\_args:** ## See also * [Import CSV files](/howto/cookbooks/data/data-import-csv) - CSV and tabular data * [Import Parquet files](/howto/cookbooks/data/data-import-parquet) - Columnar data # Import data from Hugging Face datasets Source: https://docs.pixeltable.com/howto/cookbooks/data/data-import-huggingface
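The hidden setup for this page is presumably just the imports and directory creation sketched below (`load_dataset` is used in the cells that follow without an explicit import):

```python theme={null}
import pixeltable as pxt
from datasets import load_dataset  # Hugging Face `datasets` library

pxt.create_dir('hf_demo')
```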
Created directory 'hf\_demo'. \### Import a single split Load a specific split from a dataset: ```python theme={null} # Load a small subset for demo (first 100 rows of rotten_tomatoes) hf_dataset = load_dataset( 'cornell-movie-review-data/rotten_tomatoes', split='train[:100]' ) ``` ```python theme={null} # Import into Pixeltable reviews = pxt.create_table('hf_demo/reviews', source=hf_dataset) ```
Created table 'reviews'. Inserting rows into \`reviews\`: 100 rows \[00:00, 14781.69 rows/s] Inserted 100 rows with 0 errors.```python theme={null} # View imported data reviews.head(5) ``` ### Import multiple splits Load a DatasetDict with multiple splits and track which split each row came from: ```python theme={null} # Load dataset with multiple splits (small subset for demo) hf_dataset_dict = load_dataset( 'cornell-movie-review-data/rotten_tomatoes', split={'train': 'train[:50]', 'test': 'test[:50]'}, ) ``` ```python theme={null} # Import each split separately for clarity train_data = pxt.create_table( 'hf_demo/reviews_train', source=hf_dataset_dict['train'] ) test_data = pxt.create_table( 'hf_demo/reviews_test', source=hf_dataset_dict['test'] ) ```
Created table 'reviews\_train'. Inserting rows into \`reviews\_train\`: 50 rows \[00:00, 10150.29 rows/s] Inserted 50 rows with 0 errors. Created table 'reviews\_test'. Inserting rows into \`reviews\_test\`: 50 rows \[00:00, 9883.37 rows/s] Inserted 50 rows with 0 errors.```python theme={null} # View training data train_data.head(5) ``` ```python theme={null} # View test data test_data.head(3) ``` ### Add AI-powered computed columns Enrich the dataset with AI models: ```python theme={null} # Add a computed column for text length reviews.add_computed_column( text_length=reviews.text.apply(len, col_type=pxt.Int) ) ```
Added 100 column values with 0 errors. 100 rows updated, 200 values computed.```python theme={null} # View with computed column reviews.select(reviews.text, reviews.label, reviews.text_length).head(5) ``` ### Type mapping Pixeltable automatically maps Hugging Face types to Pixeltable types: Use `schema_overrides` to customize type mapping when needed. ## Explanation **Why import Hugging Face datasets into Pixeltable:** 1. **Add computed columns** - Enrich data with embeddings, AI analysis, or transformations 2. **Incremental processing** - Add new rows without reprocessing existing data 3. **Persistent storage** - Keep processed results across sessions 4. **Query capabilities** - Filter, aggregate, and join with other tables **Working with large datasets:** For very large datasets, consider loading in batches or using streaming mode in the `datasets` library before importing. ## See also * [Import CSV files](/howto/cookbooks/data/data-import-csv) - For CSV and Excel imports * [Semantic text search](/howto/cookbooks/search/search-semantic-text) - Add embeddings to text data * [Hugging Face integration notebook](/howto/providers/working-with-hugging-face) - Full integration guide # Import data from JSON files Source: https://docs.pixeltable.com/howto/cookbooks/data/data-import-json
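The JSON file imported on this page comes from a hidden setup cell. A minimal sketch of that setup; the `title` and `author` fields are used later on this page, and everything else is illustrative:

```python theme={null}
import json
import tempfile
from pathlib import Path

import pixeltable as pxt

# Write a small JSON file of articles to import
articles_data = [
    {'title': 'Intro to Pixeltable', 'author': 'Alice', 'views': 120},
    {'title': 'Computed columns', 'author': 'Bob', 'views': 95},
    {'title': 'Working with images', 'author': 'Carol', 'views': 210},
    {'title': 'Embedding indexes', 'author': 'Dan', 'views': 80},
    {'title': 'Importing data', 'author': 'Eve', 'views': 150},
]
json_path = Path(tempfile.mkdtemp()) / 'articles.json'
json_path.write_text(json.dumps(articles_data))

pxt.create_dir('json_demo')
```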
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'json\_demo'. \```python theme={null} # Import JSON file into a new table articles = pxt.create_table( 'json_demo/articles', source=str(json_path), source_format='json', # Explicitly specify format when using local file paths ) ```
Created table 'articles'. Inserting rows into \`articles\`: 0 rows \[00:00, ? rows/s] Inserting rows into \`articles\`: 5 rows \[00:00, 538.52 rows/s] Inserted 5 rows with 0 errors.```python theme={null} # View imported data articles.collect() ``` ### Import from URL You can import JSON directly from a URL—useful for APIs and remote data: ```python theme={null} # Import from a public JSON URL # Using JSONPlaceholder API as an example posts = pxt.create_table( 'json_demo/posts', source='https://jsonplaceholder.typicode.com/posts', source_format='json', # Required for URL sources ) ```
Created table 'posts'. Inserting rows into \`posts\`: 0 rows \[00:00, ? rows/s] Inserting rows into \`posts\`: 100 rows \[00:00, 15623.57 rows/s] Inserted 100 rows with 0 errors.```python theme={null} # View first few rows posts.head(5) ``` ### Import from Python dictionaries Use `create_table` with a list of dictionaries as `source`—useful when you have data in memory: ```python theme={null} # Import from a list of dictionaries events = [ { 'event': 'page_view', 'user_id': 101, 'timestamp': '2024-01-15T10:30:00', }, { 'event': 'click', 'user_id': 101, 'timestamp': '2024-01-15T10:31:00', }, { 'event': 'purchase', 'user_id': 102, 'timestamp': '2024-01-15T10:32:00', }, ] event_table = pxt.create_table('json_demo/events', source=events) ```
Created table 'events'. Inserting rows into \`events\`: 0 rows \[00:00, ? rows/s] Inserting rows into \`events\`: 3 rows \[00:00, 988.06 rows/s] Inserted 3 rows with 0 errors.```python theme={null} # View imported events event_table.collect() ``` ### Add computed columns Once imported, you can enrich the data with computed columns: ```python theme={null} # Add a computed column combining title and author articles.add_computed_column( summary=articles.title + ' by ' + articles.author ) ```
Added 5 column values with 0 errors. 5 rows updated, 10 values computed.```python theme={null} # View with computed column articles.select( articles.title, articles.author, articles.summary ).collect() ``` ## Explanation **JSON format requirements:** The JSON file must contain an array of objects at the top level: ```json theme={null} [ {"col1": "value1", "col2": 123}, {"col1": "value2", "col2": 456} ] ``` **Source types supported:** **Nested JSON handling:** Nested objects and arrays are stored as JSON columns. You can access nested fields using Pixeltable’s JSON path syntax in computed columns. ## See also * [Import CSV files](/howto/cookbooks/data/data-import-csv) - For CSV and Excel imports * [Import Parquet files](/howto/cookbooks/data/data-import-parquet) - For Parquet data * [Extract fields from JSON](/howto/cookbooks/core/workflow-json-extraction) - Parse LLM response fields # Import data from Parquet files Source: https://docs.pixeltable.com/howto/cookbooks/data/data-import-parquet
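The Parquet file imported on this page is created in a hidden setup cell. A minimal sketch of that setup; `product_id`, `name`, and `price` are the columns used on this page, while the values are illustrative (writing Parquet requires `pyarrow`):

```python theme={null}
import tempfile
from pathlib import Path

import pandas as pd
import pixeltable as pxt

# Write a small Parquet file to import
temp_dir = tempfile.mkdtemp()
parquet_path = Path(temp_dir) / 'products.parquet'
pd.DataFrame(
    {
        'product_id': [1, 2, 3, 4, 5],
        'name': ['Widget', 'Gadget', 'Gizmo', 'Doohickey', 'Thingamajig'],
        'price': [9.99, 24.99, 14.99, 49.99, 99.99],
    }
).to_parquet(parquet_path)

pxt.create_dir('parquet_demo')
```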
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'parquet\_demo'. \```python theme={null} # Import Parquet file into a new table products = pxt.create_table( 'parquet_demo/products', source=str(parquet_path) ) ```
Created table 'products'. Inserting rows into \`products\`: 0 rows \[00:00, ? rows/s] Inserting rows into \`products\`: 5 rows \[00:00, 653.18 rows/s] Inserted 5 rows with 0 errors.```python theme={null} # View imported data products.collect() ``` ### Add computed columns Once imported, you can add computed columns like any other Pixeltable table: ```python theme={null} # Add a computed column for discounted price products.add_computed_column(sale_price=products.price * 0.9) ```
Added 5 column values with 0 errors. 5 rows updated, 10 values computed.```python theme={null} # View with computed column products.select( products.name, products.price, products.sale_price ).collect() ``` ### Import with primary key Specify a primary key when you need upsert behavior or unique constraints: ```python theme={null} # Import with a primary key products_pk = pxt.create_table( 'parquet_demo/products_with_pk', source=str(parquet_path), primary_key='product_id', ) ```
Created table 'products\_with\_pk'. Inserting rows into \`products\_with\_pk\`: 0 rows \[00:00, ? rows/s] Inserting rows into \`products\_with\_pk\`: 5 rows \[00:00, 1548.97 rows/s] Inserted 5 rows with 0 errors.```python theme={null} # View the table products_pk.collect() ``` ### Export table to Parquet Export your processed data back to Parquet for use with other tools: ```python theme={null} # Export to Parquet (note: image columns require inline_images=True) export_path = Path(temp_dir) / 'exported_products' pxt.io.export_parquet( products.select(products.name, products.price, products.sale_price), parquet_path=export_path, ) ``` ```python theme={null} # Verify export by reading back import pyarrow.parquet as pq exported_table = pq.read_table(export_path) exported_table.to_pandas() ``` ## Explanation **When to use Parquet import:** **Key features:** * Automatic schema inference from Parquet metadata * Support for partitioned datasets (directory of files) * Export with `pxt.io.export_parquet` for interoperability * Primary key support for upsert workflows ## See also * [Import CSV files](/howto/cookbooks/data/data-import-csv) - For CSV and Excel imports * [Import JSON files](/howto/cookbooks/data/data-import-json) - For JSON data # Load media from S3 and other cloud storage Source: https://docs.pixeltable.com/howto/cookbooks/data/data-import-s3
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'cloud\_demo'. \### Load images from HTTP URLs Reference images by URL—Pixeltable downloads them on demand: ```python theme={null} # Create a table with image column images = pxt.create_table('cloud_demo/images', {'image': pxt.Image}) ```
Created table 'images'.```python theme={null} # Insert images by URL (HTTP) image_urls = [ 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg', 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg', 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000106.jpg', ] images.insert([{'image': url} for url in image_urls]) ```
Inserting rows into \`images\`: 3 rows \[00:00, 767.91 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 6 values computed.```python theme={null} # View images - files are downloaded and cached on access images.collect() ``` ### Load videos from S3 Reference videos in S3 buckets (using public Multimedia Commons bucket): ```python theme={null} # Create a table with video column videos = pxt.create_table('cloud_demo/videos', {'video': pxt.Video}) ```
Created table 'videos'.```python theme={null} # Insert videos by S3 URL (public bucket, no credentials needed) s3_prefix = 's3://multimedia-commons/' video_paths = [ 'data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4', 'data/videos/mp4/ffe/feb/ffefebb41485539f964760e6115fbc44.mp4', ] videos.insert([{'video': s3_prefix + path} for path in video_paths]) ```
Inserting rows into \`videos\`: 2 rows \[00:00, 1477.13 rows/s] Inserted 2 rows with 0 errors. 2 rows inserted, 4 values computed.```python theme={null} # View videos - downloaded and cached on access videos.collect() ``` ### Add computed columns on remote media Process remote media with computed columns—files are fetched automatically: ```python theme={null} # Add computed columns for image properties images.add_computed_column(width=images.image.width) images.add_computed_column(height=images.image.height) ```
Added 3 column values with 0 errors. Added 3 column values with 0 errors. 3 rows updated, 6 values computed.```python theme={null} # View with computed properties images.select(images.image, images.width, images.height).collect() ``` ### Generate presigned URLs for serving media When you store media in private cloud storage, you need presigned URLs to serve files over HTTP. The `presigned_url` function converts storage URIs to time-limited, publicly accessible URLs: ```python theme={null} import pixeltable.functions as pxtf # Generate presigned URLs for videos (1-hour expiration) videos.select( videos.video, original_uri=videos.video.fileurl, http_url=pxtf.net.presigned_url(videos.video.fileurl, 3600), ).collect() ``` ```python theme={null} # Store presigned URLs as computed column for API responses videos.add_computed_column( serving_url=pxtf.net.presigned_url( videos.video.fileurl, 86400 ) # 24-hour expiration ) ```
Added 2 column values with 0 errors. 2 rows updated, 4 values computed.**Use cases for presigned URLs:** * Serve private media in web applications without exposing credentials * Generate download links for end users * Integrate with CDNs or video players that require HTTP URLs **Provider limitations:** Note: HTTP/HTTPS URLs pass through unchanged (already publicly accessible). ### Supported URL formats Pixeltable supports multiple URL schemes for media files: \*Configure AWS/GCP credentials via environment variables or config files. ## Explanation **How caching works:** 1. URLs are stored as references in the table 2. Files are downloaded on first access (query or computed column) 3. Downloaded files are cached in `~/.pixeltable/file_cache/` 4. Cache uses LRU eviction when space is needed **Benefits of URL-based storage:** * **Lazy loading** - Only download files when needed * **Deduplication** - Same URL is cached once * **Incremental processing** - Add files without bulk downloads * **Cloud-native** - Works directly with object storage **For private S3 buckets:** Configure AWS credentials using standard methods: * Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) * AWS credentials file (`~/.aws/credentials`) * IAM roles (when running on EC2/ECS) ## See also * [Upload to S3](../../../howto/cookbooks/data/data-export-s3) - Store generated media in S3/GCS * [Import from CSV](../../../howto/cookbooks/data/data-import-csv) - Load structured data * [Extract frames from videos](/howto/cookbooks/video/video-extract-frames) - Process video files * [Analyze images in batch](/howto/cookbooks/images/vision-batch-analysis) - AI vision on images * [Configure API keys](/howto/cookbooks/core/workflow-api-keys) - Set up credentials # Sample data for training and testing Source: https://docs.pixeltable.com/howto/cookbooks/data/data-sampling
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'sampling\_demo'. \### Create sample dataset ```python theme={null} # Create a dataset with labels data = pxt.create_table( 'sampling_demo/data', {'text': pxt.String, 'label': pxt.String, 'score': pxt.Float}, ) # Insert sample data with imbalanced classes samples = [ {'text': 'Great product!', 'label': 'positive', 'score': 0.9}, {'text': 'Love it', 'label': 'positive', 'score': 0.85}, {'text': 'Amazing quality', 'label': 'positive', 'score': 0.95}, {'text': 'Best purchase ever', 'label': 'positive', 'score': 0.88}, {'text': 'Highly recommend', 'label': 'positive', 'score': 0.92}, {'text': 'Fantastic!', 'label': 'positive', 'score': 0.91}, {'text': 'Terrible', 'label': 'negative', 'score': 0.1}, {'text': 'Waste of money', 'label': 'negative', 'score': 0.15}, {'text': 'It is okay', 'label': 'neutral', 'score': 0.5}, {'text': 'Average product', 'label': 'neutral', 'score': 0.55}, ] data.insert(samples) ```
Created table 'data'. Inserting rows into \`data\`: 0 rows \[00:00, ? rows/s] Inserting rows into \`data\`: 10 rows \[00:00, 857.13 rows/s] Inserted 10 rows with 0 errors. 10 rows inserted, 20 values computed.### Random sampling ```python theme={null} # Sample exactly N rows data.sample(n=5, seed=42).collect() ``` ```python theme={null} # Sample a percentage of rows sample_50pct = data.sample(fraction=0.5, seed=42).collect() ``` ### Stratified sampling ```python theme={null} # Stratified sampling: 50% from each class data.sample(fraction=0.5, stratify_by=data.label, seed=42).collect() ``` ```python theme={null} # Equal allocation: N rows from each class data.sample(n_per_stratum=1, stratify_by=data.label, seed=42).collect() ``` ### Sampling from filtered data ```python theme={null} # Sample from filtered query (high-confidence predictions only) data.where(data.score > 0.8).sample(n=3, seed=42).collect() ``` ### Persist samples as tables ```python theme={null} # Create a persistent table from a sample for dev/test train_sample = data.sample(fraction=0.8, seed=42) test_sample = data.sample(fraction=0.2, seed=43) # Persist as new tables train_table = pxt.create_table('sampling_demo/train', source=train_sample) test_table = pxt.create_table('sampling_demo/test', source=test_sample) ```
Created table 'train'. Inserting rows into \`train\`: 0 rows \[00:00, ? rows/s] Inserting rows into \`train\`: 9 rows \[00:00, 3080.27 rows/s] Created table 'test'. Inserting rows into \`test\`: 0 rows \[00:00, ? rows/s] Inserting rows into \`test\`: 3 rows \[00:00, 1333.92 rows/s]## Explanation **Sampling methods:** **Stratification options:** **Tips:** * Always set `seed` for reproducible experiments * Use stratified sampling for imbalanced datasets * Combine with `.where()` to sample from subsets ## See also * [Export for ML training](/howto/cookbooks/data/data-export-pytorch) - PyTorch DataLoader export * [Import Hugging Face datasets](/howto/cookbooks/data/data-import-huggingface) - Load pre-split datasets # Add watermarks to images Source: https://docs.pixeltable.com/howto/cookbooks/images/img-add-watermarks
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'image\_demo'. \```python theme={null} t = pxt.create_table('image_demo/watermarks', {'image': pxt.Image}) ```
Created table 'watermarks'.```python theme={null} t.insert( [ { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000049.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg' }, ] ) ```
Inserting rows into \`watermarks\`: 0 rows \[00:00, ? rows/s]Inserting rows into \`watermarks\`: 3 rows \[00:00, 532.86 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 6 values computed.### Iterate: add watermarks to a few images first ```python theme={null} @pxt.udf def add_watermark(img: Image.Image, text: str) -> Image.Image: """Add a watermark to bottom-right corner.""" img = img.copy().convert('RGBA') overlay = Image.new('RGBA', img.size, (0, 0, 0, 0)) draw = ImageDraw.Draw(overlay) # Draw white text in bottom-right corner font = ImageFont.load_default(size=40) position = (img.width - 150, img.height - 60) draw.text(position, text, font=font, fill=(255, 255, 255, 200)) result = Image.alpha_composite(img, overlay) return result.convert('RGB') ``` ```python theme={null} # Test on first image t.select(t.image, add_watermark(t.image, '© 2024')).head(1) ``` ### Add: add watermarks to all images in your table ```python theme={null} # Add watermark to all images t.add_computed_column(watermarked=add_watermark(t.image, '© 2024')) ```
Added 3 column values with 0 errors. 3 rows updated, 3 values computed.```python theme={null} # View all results t.collect() ``` ## Explanation **How the watermark technique works:** The UDF creates a transparent overlay on top of each image. The overlay is created with the same dimensions as the image (`Image.new('RGBA', img.size, ...)`), so watermarks adapt automatically whether you’re processing small thumbnails or large photos. The function draws white text with semi-transparent fill (alpha=200, where 255 is fully opaque), composites the overlay onto the original image using `Image.alpha_composite()`, and converts back to RGB since most image formats don’t support transparency. **To customize the UDF:** * Position: Change the `(x, y)` coordinates in the `position` variable * Color: Modify the `(R, G, B, Alpha)` fill value (0-255 for each) * Size: Adjust the font size parameter in `ImageFont.load_default(size=40)` * Font: Use `ImageFont.truetype('path/to/font.ttf', size)` for custom fonts **The Pixeltable workflow:** In traditional databases, `.select()` just picks which columns to view. In Pixeltable, `.select()` also lets you compute new transformations on the fly—define new columns without storing them. This makes `.select()` perfect for testing transformations before you commit them. When you use `.select()`, you’re creating a query that doesn’t execute until you call `.collect()`. You must use `.collect()` to execute the query and return results—nothing is stored in your table. If you want to collect only the first few rows, use `.head(n)` instead of `.collect()` to test on a subset before processing your full dataset. Once satisfied, use `.add_computed_column()` with the same expression to persist results permanently. For more on this workflow, see [Get fast feedback on transformations](/howto/cookbooks/core/dev-iterative-workflow). ## See also * [Test transformations with fast feedback loops](/howto/cookbooks/core/dev-iterative-workflow) * [Transform images with PIL operations](/howto/cookbooks/images/img-pil-transforms) * *Pillow techniques from [Real Python: Image Processing With the Python Pillow Library](https://realpython.com/image-processing-with-the-python-pillow-library/)* # Adjust image opacity Source: https://docs.pixeltable.com/howto/cookbooks/images/img-adjust-opacity
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'image\_demo'. \```python theme={null} t = pxt.create_table('image_demo/opacity', {'image': pxt.Image}) ```
Created table 'opacity'.```python theme={null} t.insert( [ { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000776.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000885.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000016.jpg' }, ] ) ```
Inserting rows into \`opacity\`: 0 rows \[00:00, ? rows/s]Inserting rows into \`opacity\`: 3 rows \[00:00, 545.05 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 6 values computed.### Iterate: adjust opacity for a few images first You define a custom function using the `@pxt.udf` decorator to make it available in Pixeltable. Inside the function, you use standard PIL (Pillow) operations to manipulate images. Pixeltable handles applying your function to every row in your table. **How it works:** * All image manipulation (`.convert()`, `.split()`, `.point()`, `.putalpha()`) comes from the PIL/Pillow library * These are standard Python image operations—see [Pillow docs](https://pillow.readthedocs.io/) for reference * The `@pxt.udf` decorator lets Pixeltable apply your function to table rows * The opacity parameter (0.0 = fully transparent, 1.0 = fully opaque) controls the alpha scaling ```python theme={null} @pxt.udf def set_opacity(img: Image.Image, opacity: float) -> Image.Image: """Set image opacity (0.0 = fully transparent, 1.0 = fully opaque).""" img = img.convert('RGBA') alpha = img.split()[3] # Get alpha channel alpha = alpha.point(lambda p: int(p * opacity)) # Scale alpha values img.putalpha(alpha) return img ``` ```python theme={null} # Test 25%, 50%, and 75% opacity t.select( t.image, alpha_25=set_opacity(t.image, 0.25), alpha_50=set_opacity(t.image, 0.5), alpha_75=set_opacity(t.image, 0.75), ).head(1) ``` ### Add: adjust opacity for all images in your table ```python theme={null} # Create 50% opacity for backgrounds t.add_computed_column(semi_transparent=set_opacity(t.image, 0.5)) ```
Added 3 column values with 0 errors. 3 rows updated, 3 values computed.```python theme={null} # View original and semi-transparent side by side t.collect() ``` ## Explanation **How the opacity technique works:** The UDF modifies the alpha channel to control transparency. The function converts the image to RGBA mode (which includes an alpha channel for transparency), extracts the alpha channel with `.split()[3]`, scales all values by the desired opacity factor using `.point(lambda p: int(p * opacity))`, and applies it back with `.putalpha()`. This preserves the original image while adjusting only the transparency level. **To customize the UDF:** * **Opacity levels**: Use 0.25 for very faint backgrounds, 0.5 for standard transparency, 0.75 for subtle effects * **Selective transparency**: Modify the lambda function in `.point()` to apply different transparency to different pixel values * **Preserve regions**: Add conditional logic to keep certain areas fully opaque **The Pixeltable workflow:** In traditional databases, `.select()` just picks which columns to view. In Pixeltable, `.select()` also lets you compute new transformations on the fly—define new columns without storing them. This makes `.select()` perfect for testing transformations before you commit them. When you use `.select()`, you’re creating a query that doesn’t execute until you call `.collect()`. You must use `.collect()` to execute the query and return results—nothing is stored in your table. If you want to collect only the first few rows, use `.head(n)` instead of `.collect()` to test on a subset before processing your full dataset. Once satisfied, use `.add_computed_column()` with the same expression to persist results permanently. For more on this workflow, see [Get fast feedback on transformations](/howto/cookbooks/core/dev-iterative-workflow). ## See also * [Test transformations with fast feedback loops](/howto/cookbooks/core/dev-iterative-workflow) * [Add watermarks to images](/howto/cookbooks/images/img-add-watermarks) * [Transform images with PIL operations](/howto/cookbooks/images/img-pil-transforms) # Apply image filters Source: https://docs.pixeltable.com/howto/cookbooks/images/img-apply-filters
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'image\_demo'. \```python theme={null} t = pxt.create_table('image_demo/filters', {'image': pxt.Image}) ```
Created table 'filters'.```python theme={null} t.insert( [ { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000106.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000285.jpg' }, ] ) ```
Inserting rows into \`filters\`: 0 rows \[00:00, ? rows/s]Inserting rows into \`filters\`: 3 rows \[00:00, 538.79 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 6 values computed.### Iterate: apply filters to a few images first ```python theme={null} @pxt.udf def apply_blur(img: pxt.Image) -> pxt.Image: """Apply blur filter.""" return img.filter(ImageFilter.BLUR) @pxt.udf def apply_sharpen(img: pxt.Image) -> pxt.Image: """Apply sharpen filter.""" return img.filter(ImageFilter.SHARPEN) @pxt.udf def apply_find_edges(img: pxt.Image) -> pxt.Image: """Apply edge detection filter.""" return img.filter(ImageFilter.FIND_EDGES) @pxt.udf def apply_edge_enhance(img: pxt.Image) -> pxt.Image: """Apply edge enhancement filter.""" return img.filter(ImageFilter.EDGE_ENHANCE) ``` ```python theme={null} # Test blur and sharpen t.select(t.image, apply_blur(t.image), apply_sharpen(t.image)).head(1) ``` ### Add: apply filters to all images in your table ```python theme={null} # Add filter columns t.add_computed_column(blurred=apply_blur(t.image)) t.add_computed_column(sharpened=apply_sharpen(t.image)) t.add_computed_column(edges=apply_find_edges(t.image)) t.add_computed_column(edge_enhanced=apply_edge_enhance(t.image)) ```
Added 3 column values with 0 errors. Added 3 column values with 0 errors. Added 3 column values with 0 errors. Added 3 column values with 0 errors. 3 rows updated, 3 values computed.### View results Compare original and filtered images. ```python theme={null} # Compare blur and sharpen t.select(t.image, t.blurred, t.sharpened).collect() ``` ```python theme={null} # Compare edge detection filters t.select(t.image, t.edges, t.edge_enhanced).collect() ``` ## Explanation **How the filter technique works:** The UDFs wrap PIL’s `ImageFilter` module to apply convolution-based filters to images. Each filter uses a predefined kernel that processes pixel neighborhoods to achieve different effects. Blur averages surrounding pixels to reduce detail, Sharpen enhances pixel differences to increase detail, Find Edges detects boundaries between contrasting regions, and Edge Enhance strengthens edges while preserving the full image. You can apply multiple filters to the same image to create different versions for analysis or visual effects. **To customize the UDFs:** * **Blur intensity**: Use `ImageFilter.BoxBlur(radius)` or `ImageFilter.GaussianBlur(radius)` for adjustable blur strength * **Edge detection**: Combine with grayscale conversion for clearer edge maps * **Filter stacking**: Apply multiple filters in sequence for complex effects * **Custom kernels**: Use `ImageFilter.Kernel()` to define your own convolution filters **The Pixeltable workflow:** In traditional databases, `.select()` just picks which columns to view. In Pixeltable, `.select()` also lets you compute new transformations on the fly—define new columns without storing them. This makes `.select()` perfect for testing transformations before you commit them. When you use `.select()`, you’re creating a query that doesn’t execute until you call `.collect()`. You must use `.collect()` to execute the query and return results—nothing is stored in your table. If you want to collect only the first few rows, use `.head(n)` instead of `.collect()` to test on a subset before processing your full dataset. Once satisfied, use `.add_computed_column()` with the same expression to persist results permanently. For more on this workflow, see [Get fast feedback on transformations](/howto/cookbooks/core/dev-iterative-workflow). ## See also * [Test transformations with fast feedback loops](/howto/cookbooks/core/dev-iterative-workflow) * [Adjust image brightness and contrast](/howto/cookbooks/images/img-brightness-contrast) * *Pillow techniques from [Real Python: Image Processing With the Python Pillow Library](https://realpython.com/image-processing-with-the-python-pillow-library/)* # Adjust image brightness and contrast Source: https://docs.pixeltable.com/howto/cookbooks/images/img-brightness-contrast
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'image\_demo'. \```python theme={null} t = pxt.create_table('image_demo/enhancements', {'image': pxt.Image}) ```
Created table 'enhancements'.```python theme={null} t.insert( [ { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000016.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000049.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg' }, ] ) ```
Inserting rows into \`enhancements\`: 0 rows \[00:00, ? rows/s]Inserting rows into \`enhancements\`: 3 rows \[00:00, 601.16 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 6 values computed.### Iterate: adjust brightness and contrast for a few images first ```python theme={null} @pxt.udf def adjust_brightness(img: pxt.Image, factor: float) -> pxt.Image: """Adjust brightness. factor < 1 = darker, > 1 = brighter.""" return ImageEnhance.Brightness(img).enhance(factor) @pxt.udf def adjust_contrast(img: pxt.Image, factor: float) -> pxt.Image: """Adjust contrast. factor < 1 = lower, > 1 = higher.""" return ImageEnhance.Contrast(img).enhance(factor) @pxt.udf def adjust_saturation(img: pxt.Image, factor: float) -> pxt.Image: """Adjust saturation. factor < 1 = less saturated, > 1 = more saturated.""" return ImageEnhance.Color(img).enhance(factor) ``` ```python theme={null} # Test brightness adjustments t.select( t.image, adjust_brightness(t.image, 0.5), adjust_brightness(t.image, 1.5), ).head(1) ``` ### Add: adjust brightness and contrast for all images in your table ```python theme={null} # Brightness adjustments (1.0 = original) t.add_computed_column(darker=adjust_brightness(t.image, 0.5)) t.add_computed_column(brighter=adjust_brightness(t.image, 1.5)) # Contrast adjustments t.add_computed_column(low_contrast=adjust_contrast(t.image, 0.5)) t.add_computed_column(high_contrast=adjust_contrast(t.image, 2.0)) # Color saturation t.add_computed_column(desaturated=adjust_saturation(t.image, 0.3)) t.add_computed_column(saturated=adjust_saturation(t.image, 2.0)) ```
Added 3 column values with 0 errors. Added 3 column values with 0 errors. Added 3 column values with 0 errors. Added 3 column values with 0 errors. Added 3 column values with 0 errors. Added 3 column values with 0 errors. 3 rows updated, 3 values computed.### View results Compare different enhancement levels side-by-side. ```python theme={null} # Compare brightness levels t.select(t.image, t.darker, t.brighter).collect() ``` ```python theme={null} # Compare contrast levels t.select(t.image, t.low_contrast, t.high_contrast).collect() ``` ```python theme={null} # Compare saturation levels t.select(t.image, t.desaturated, t.saturated).collect() ``` ## Explanation **How the enhancement technique works:** The UDFs wrap PIL’s `ImageEnhance` module to adjust visual properties of images. Each enhancement type creates an enhancer object for the image, then applies a multiplication factor. A factor of 1.0 leaves the image unchanged, values below 1.0 decrease the property (darker, less contrast, desaturated), and values above 1.0 increase it (brighter, more contrast, saturated). You can apply different factors to the same image to create multiple variations for comparison or different use cases. **To customize the UDFs:** * **Brightness factors**: Use 0.5 for darker images, 1.5 for brighter, or adjust to match your lighting needs * **Contrast factors**: Use 0.5 for lower contrast, 2.0 for higher contrast, or fine-tune for image clarity * **Saturation factors**: Use 0.3 for desaturated/muted colors, 2.0 for vibrant colors, or 0.0 for complete grayscale * **Combine adjustments**: Apply multiple enhancements to create complex transformations **The Pixeltable workflow:** In traditional databases, `.select()` just picks which columns to view. In Pixeltable, `.select()` also lets you compute new transformations on the fly—define new columns without storing them. This makes `.select()` perfect for testing transformations before you commit them. When you use `.select()`, you’re creating a query that doesn’t execute until you call `.collect()`. You must use `.collect()` to execute the query and return results—nothing is stored in your table. If you want to collect only the first few rows, use `.head(n)` instead of `.collect()` to test on a subset before processing your full dataset. Once satisfied, use `.add_computed_column()` with the same expression to persist results permanently. For more on this workflow, see [Get fast feedback on transformations](/howto/cookbooks/core/dev-iterative-workflow). ## See also * [Test transformations with fast feedback loops](/howto/cookbooks/core/dev-iterative-workflow) * [Apply image filters](/howto/cookbooks/images/img-apply-filters) * *Pillow techniques from [Real Python: Image Processing With the Python Pillow Library](https://realpython.com/image-processing-with-the-python-pillow-library/)* # Detect objects in images Source: https://docs.pixeltable.com/howto/cookbooks/images/img-detect-objects
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'detection\_demo'. \```python theme={null} # Create table for images images = pxt.create_table('detection_demo/images', {'image': pxt.Image}) ```
Created table 'images'.```python theme={null} # Insert sample images (COCO dataset samples with common objects) image_urls = [ 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg', 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg', 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000106.jpg', ] images.insert([{'image': url} for url in image_urls]) ```
Inserting rows into \`images\`: 0 rows \[00:00, ? rows/s] Inserting rows into \`images\`: 3 rows \[00:00, 523.85 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 6 values computed.```python theme={null} # View images images.collect() ``` ### Run object detection Add a computed column that runs YOLOX on each image: ```python theme={null} # Run YOLOX object detection # model_id options: yolox_nano, yolox_tiny, yolox_s, yolox_m, yolox_l, yolox_x images.add_computed_column( detections=yolox(images.image, model_id='yolox_m', threshold=0.5) ) ```
Added 3 column values with 0 errors. 3 rows updated, 3 values computed.```python theme={null} # View detection results images.select(images.image, images.detections).collect() ``` ### Extract detection details Parse the detection output to get object counts and classes: ```python theme={null} # Extract number of detections @pxt.udf def count_objects(detections: dict) -> int: """Count the number of detected objects.""" return len(detections.get('labels', [])) images.add_computed_column(object_count=count_objects(images.detections)) ```
Added 3 column values with 0 errors. 3 rows updated, 6 values computed.```python theme={null} # Extract unique object classes @pxt.udf def get_classes(detections: dict) -> list: """Get list of detected object classes.""" return list(set(detections.get('labels', []))) images.add_computed_column(object_classes=get_classes(images.detections)) ```
Added 3 column values with 0 errors. 3 rows updated, 3 values computed.```python theme={null} # View summary images.select( images.image, images.object_count, images.object_classes ).collect() ``` ## Explanation **YOLOX model sizes:** **Detection output format:** The `detections` dictionary contains: * `labels`: List of class names (e.g., “person”, “car”, “dog”) * `bboxes`: Bounding box coordinates \[x1, y1, x2, y2] * `scores`: Confidence scores (0-1) **Adjusting threshold:** * Higher threshold (0.7-0.9): Fewer detections, higher confidence * Lower threshold (0.3-0.5): More detections, may include false positives ## See also * [Extract frames from videos](/howto/cookbooks/video/video-extract-frames) - Detect objects in video frames * [Analyze images in batch](/howto/cookbooks/images/vision-batch-analysis) - AI vision analysis * [Find similar images](/howto/cookbooks/search/search-similar-images) - Visual similarity search # Compare object detection and panoptic segmentation Source: https://docs.pixeltable.com/howto/cookbooks/images/img-detection-vs-segmentation
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'detection\_vs\_seg'. \```python theme={null} images = pxt.create_table('detection_vs_seg/images', {'image': pxt.Image}) base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images' images.insert( [ {'image': f'{base_url}/000000000034.jpg'}, {'image': f'{base_url}/000000000049.jpg'}, ] ) ```
Created table 'images'. Inserted 2 rows with 0 errors in 0.22 s (9.21 rows/s) 2 rows inserted.### Run object detection The `detr_for_object_detection` function returns bounding boxes, labels, and confidence scores. **Parameters:** * `model_id`: DETR variant (`facebook/detr-resnet-50` or `facebook/detr-resnet-101`) * `threshold`: Confidence threshold (0.0-1.0). Higher = fewer but more confident detections **Output:** ```python theme={null} {'boxes': [[x1, y1, x2, y2], ...], 'scores': [0.98, ...], 'label_text': ['person', ...]} ``` ```python theme={null} images.add_computed_column( detections=detr_for_object_detection( images.image, model_id='facebook/detr-resnet-50', threshold=0.8 ) ) ```
Added 2 column values with 0 errors in 4.09 s (0.49 rows/s) 2 rows updated.```python theme={null} # View detection results images.select(images.image, images.detections).collect() ``` ### Visualize detections with bounding boxes Use `draw_bounding_boxes` to overlay the detection results on the original image. ```python theme={null} images.add_computed_column( detection_viz=draw_bounding_boxes( images.image, boxes=images.detections.boxes, labels=images.detections.label_text, fill=True, width=2, ) ) ```
Added 2 column values with 0 errors in 0.03 s (58.89 rows/s) 2 rows updated.```python theme={null} images.select(images.detection_viz).collect() ``` ### Run panoptic segmentation The `detr_for_segmentation` function returns pixel-level masks and segment metadata. **Parameters:** * `model_id`: Segmentation model (`facebook/detr-resnet-50-panoptic`) * `threshold`: Confidence threshold for filtering segments **Output:** ```python theme={null} { 'segmentation': np.ndarray, # (H, W) array where each pixel = segment ID 'segments_info': [{'id': 1, 'label_text': 'person', 'score': 0.98}, ...] } ``` > **Note:** The full segmentation output contains a numpy array that > can’t be stored as JSON. We store just the `segments_info` metadata > and compute the pixel-level visualization inline. ```python theme={null} # Store just the segments_info (JSON-serializable) as a computed column # The segmentation array will be computed inline for visualization seg_expr = detr_for_segmentation( images.image, model_id='facebook/detr-resnet-50-panoptic', threshold=0.5, ) images.add_computed_column(segments_info=seg_expr.segments_info) ``` ```python theme={null} # View stored segmentation info images.select(images.image, images.segments_info).collect() ``` ### Visualize segmentation with colored overlay Use `overlay_segmentation` to visualize the pixel masks with colored regions and contours. ```python theme={null} # Compute segmentation visualization inline # Cast the segmentation array to the proper type for overlay_segmentation seg_expr = detr_for_segmentation( images.image, model_id='facebook/detr-resnet-50-panoptic', threshold=0.5, ) segmentation_map = seg_expr.segmentation.astype( pxt.Array[(None, None), np.int32] ) images.select( segmentation_viz=overlay_segmentation( images.image, segmentation_map, alpha=0.5, draw_contours=True, contour_thickness=2, ) ).collect() ``` ### Compare side-by-side ```python theme={null} # Side-by-side comparison: original, detection, segmentation seg_expr = detr_for_segmentation( images.image, model_id='facebook/detr-resnet-50-panoptic', threshold=0.5, ) segmentation_map = seg_expr.segmentation.astype( pxt.Array[(None, None), np.int32] ) images.select( images.image, images.detection_viz, segmentation_viz=overlay_segmentation( images.image, segmentation_map, alpha=0.5, draw_contours=True, contour_thickness=2, ), ).collect() ``` ### Count objects per image ```python theme={null} # Count objects per image (using stored columns) images.select( images.image, num_detections=images.detections.boxes.apply(len, col_type=pxt.Int), num_segments=images.segments_info.apply(len, col_type=pxt.Int), ).collect() ``` ## Explanation Detection gives fast, approximate locations. Segmentation gives slower but precise boundaries. 
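For example, the pixel-level map makes area measurements straightforward. The sketch below is a minimal, hypothetical helper (not part of the cookbook or the Pixeltable API) that computes the fraction of the image each label covers, assuming the output format shown above (`segmentation` as an array of segment IDs and `segments_info` entries with `id` and `label_text` keys):

```python theme={null}
# Hypothetical helper (not a Pixeltable function): estimate per-label pixel
# coverage from a panoptic segmentation map and its segments_info metadata.
import numpy as np


def label_coverage(segmentation: np.ndarray, segments_info: list[dict]) -> dict[str, float]:
    """Return the fraction of image pixels covered by each label."""
    total = segmentation.size
    coverage: dict[str, float] = {}
    for seg in segments_info:
        # Pixels belonging to this segment ID, as a fraction of the whole image
        frac = float((segmentation == seg['id']).sum()) / total
        coverage[seg['label_text']] = coverage.get(seg['label_text'], 0.0) + frac
    return coverage
```

A coverage summary like this (e.g., roughly what share of the frame is “sky” versus “building”) is the kind of scene-composition analysis that bounding boxes alone can’t give you.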
### Capability comparison ### Performance tradeoffs ### When to use each **Choose detection when:** * You need to know *what* objects are present and *where* (approximately) * Speed matters (detection is 2x faster) * You need search, filtering, or counting * Bounding boxes suffice for visualization **Choose segmentation when:** * You need *exact* object boundaries (pixel-perfect masks) * You’re doing image editing, compositing, or AR * You need to measure actual object area/coverage * You want scene composition analysis (what % is sky vs buildings) ## See also * [Detect objects in images](./img-detect-objects) - Object detection with YOLOX * [Visualize detections](./img-visualize-detections) - Draw bounding boxes and labels * [DETR documentation](https://huggingface.co/docs/transformers/model_doc/detr) - Hugging Face model docs # Generate captions for images Source: https://docs.pixeltable.com/howto/cookbooks/images/img-generate-captions
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'caption\_demo'. \```python theme={null} # Create table for images images = pxt.create_table('caption_demo/images', {'image': pxt.Image}) ```
Created table 'images'.```python theme={null} # Insert sample images image_urls = [ 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg', 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg', 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000106.jpg', ] images.insert([{'image': url} for url in image_urls]) ```
Inserted 3 rows with 0 errors in 0.12 s (25.17 rows/s) 3 rows inserted.```python theme={null} # View images images.collect() ``` ### Generate captions Add a computed column that generates captions using the vision model: ```python theme={null} # Add caption using OpenAI vision messages = [ { 'role': 'user', 'content': [ { 'type': 'text', 'text': 'Write a concise, descriptive caption for this image in one sentence.', }, {'type': 'image_url', 'image_url': images.image}, ], } ] images.add_computed_column( caption=chat_completions(messages, model='gpt-4o-mini') ) ```
Added 3 column values with 0 errors in 4.62 s (0.65 rows/s) 3 rows updated.```python theme={null} # View images with captions images.select( images.image, images.caption['choices'][0]['message']['content'] ).collect() ``` ### Different caption styles You can generate multiple caption styles for different uses: ```python theme={null} # Add alt text for accessibility (brief) messages = [ { 'role': 'user', 'content': [ { 'type': 'text', 'text': 'Write a brief alt text for this image (under 125 characters) for screen readers.', }, {'type': 'image_url', 'image_url': images.image}, ], } ] images.add_computed_column( alt_text=chat_completions(messages, model='gpt-4o-mini') ) ```
Added 3 column values with 0 errors in 3.51 s (0.85 rows/s) 3 rows updated.```python theme={null} # Add detailed description messages = [ { 'role': 'user', 'content': [ { 'type': 'text', 'text': 'Describe this image in detail, including objects, colors, setting, and mood.', }, {'type': 'image_url', 'image_url': images.image}, ], } ] images.add_computed_column( description=chat_completions(messages, model='gpt-4o-mini') ) ```
Added 3 column values with 0 errors in 11.28 s (0.27 rows/s) 3 rows updated.```python theme={null} # View all caption types images.select( images.image, images.caption['choices'][0]['message']['content'], images.alt_text['choices'][0]['message']['content'], images.description['choices'][0]['message']['content'], ).collect() ``` ## Explanation **Caption prompt patterns:** **Model selection:** * `gpt-4o-mini`: Fast and affordable, good for most captioning tasks * `gpt-4o`: Higher quality for complex images or detailed descriptions ## See also * [Analyze images in batch](/howto/cookbooks/images/vision-batch-analysis) - Run custom prompts on images * [Extract structured data from images](/howto/cookbooks/images/vision-structured-output) - Get JSON from images * [Find similar images](/howto/cookbooks/search/search-similar-images) - Visual similarity search # Transform images with AI-powered editing Source: https://docs.pixeltable.com/howto/cookbooks/images/img-image-to-image
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/cpestano/.pixeltable/pgdata Created directory 'img2img\_demo'. \```python theme={null} t = pxt.create_table( 'img2img_demo/images', { 'image': pxt.Image, 'prompt': pxt.String, 'negative_prompt': pxt.String, }, ) ```
Created table 'images'.```python theme={null} t.insert( [ { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000285.jpg', 'prompt': 'oil painting style, vibrant colors, brushstrokes visible', 'negative_prompt': 'blurry, low quality, bad anatomy', }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000776.jpg', 'prompt': 'watercolor painting, soft edges, artistic', 'negative_prompt': 'blurry, low quality, bad anatomy', }, ] ) ```
Inserted 2 rows with 0 errors in 0.49 s (4.07 rows/s) 2 rows inserted.```python theme={null} # View original images and prompts t.collect() ``` ### Iterate: test transformation on a single image Use `.select()` to define the transformation, then `.head(n)` to preview results on a subset of images. Nothing is stored in your table. The `image_to_image` function requires: * `image`: The source image to transform * `prompt`: Text describing the desired output * `model_id`: A Hugging Face model ID that supports image-to-image (e.g., `stable-diffusion-v1-5/stable-diffusion-v1-5`) ```python theme={null} # Preview transformation on first image t.select( t.image, t.prompt, image_to_image( t.image, t.prompt, model_id='stable-diffusion-v1-5/stable-diffusion-v1-5', ), ).head(1) ``` ### Iterate: adjust transformation strength You control how much the model modifies the original image using `strength` (0.0-1.0): * **Lower values** (0.3-0.5): Subtle changes, preserves more of the original * **Higher values** (0.7-1.0): Dramatic changes, more creative freedom You pass additional parameters through `model_kwargs`. For example, you can pass `negative_prompt` text describing what you don’t want the output to be. ```python theme={null} # Preview with lower strength (more preservation of original) t.select( t.image, t.prompt, t.negative_prompt, image_to_image( t.image, t.prompt, model_id='stable-diffusion-v1-5/stable-diffusion-v1-5', model_kwargs={ 'negative_prompt': t.negative_prompt, 'strength': 0.5, 'num_inference_steps': 30, }, ), ).head(1) ``` ### Add: apply transformation to all images Once you’re satisfied with the results, use `.add_computed_column()` with the same expression. This processes all rows and stores the results permanently in your table. ```python theme={null} # Save as computed column t.add_computed_column( transformed=image_to_image( t.image, t.prompt, model_id='stable-diffusion-v1-5/stable-diffusion-v1-5', model_kwargs={ 'strength': 0.5, 'num_inference_steps': 25, 'negative_prompt': t.negative_prompt, }, ) ) ```
Added 2 column values with 0 errors in 53.83 s (0.04 rows/s) 2 rows updated.```python theme={null} # View original and transformed images side by side t.select(t.image, t.prompt, t.negative_prompt, t.transformed).collect() ``` ### Use reproducible results with seeds You set a `seed` parameter to get the same output every time you run the transformation. ```python theme={null} # Add reproducible transformation t.add_computed_column( transformed_seeded=image_to_image( t.image, t.prompt, model_id='stable-diffusion-v1-5/stable-diffusion-v1-5', seed=42, model_kwargs={ 'strength': 0.5, 'negative_prompt': t.negative_prompt, }, ) ) ```
Added 2 column values with 0 errors in 96.24 s (0.02 rows/s) 2 rows updated.```python theme={null} # View results t.select(t.image, t.transformed_seeded).collect() ``` ## Explanation **How image-to-image works:** Image-to-image diffusion models take an existing image and a text prompt, then generate a new image that blends the structure of the original with the guidance from the prompt. The `strength` parameter controls the balance—lower values preserve more of the original, while higher values allow more dramatic transformations. **Model compatibility:** The `image_to_image` UDF uses `AutoPipelineForImage2Image` from the diffusers library, which automatically detects the model type and selects the appropriate pipeline. You use any compatible model: * `stable-diffusion-v1-5/stable-diffusion-v1-5` - General-purpose, runs on most hardware * `stabilityai/stable-diffusion-xl-base-1.0` - Higher quality, needs more VRAM **Key parameters:** * `strength` (0.0-1.0): How much to transform the image * `negative_prompt`: Text describing what to avoid in the generated image (e.g., “blurry, low quality”). * `num_inference_steps`: Quality vs speed tradeoff (more steps = better quality) * `guidance_scale`: How closely to follow the prompt (7-8 is typical) * `seed`: For reproducible results ## See also * [Apply filters to images](/howto/cookbooks/images/img-apply-filters) * [Generate captions for images](/howto/cookbooks/images/img-generate-captions) * [Hugging Face image-to-image models](https://huggingface.co/models?pipeline_tag=image-to-image) # Transform images with PIL operations Source: https://docs.pixeltable.com/howto/cookbooks/images/img-pil-transforms
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'image\_demo'. \```python theme={null} t = pxt.create_table('image_demo/images', {'image': pxt.Image}) ```
Created table 'images'.```python theme={null} t.insert( [ { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000285.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000776.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000885.jpg' }, ] ) ```
Inserting rows into \`images\`: 0 rows \[00:00, ? rows/s] Inserting rows into \`images\`: 3 rows \[00:00, 708.38 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 6 values computed.### Iterate: check image properties for a few images first Use `.select()` to define the transformation, then `.collect()` to execute and return results. If you want to collect only the first few rows, use `.head(n)` instead of `.collect()`. Nothing is stored in your table. Pixeltable includes these built-in functions for image properties: * `.height` - Get image height in pixels * `.width` - Get image width in pixels * `.mode` - Get color mode (RGB, RGBA, L for grayscale, etc.) ```python theme={null} # Preview the properties t.select(t.image, t.image.height, t.image.width, t.image.mode).collect() ``` ### Add: check image properties for all images in your table ```python theme={null} # Save as computed columns t.add_computed_column(height=t.image.height) t.add_computed_column(width=t.image.width) t.add_computed_column(mode=t.image.mode) # RGB, RGBA, L (grayscale), etc. ```
Added 3 column values with 0 errors. Added 3 column values with 0 errors. Added 3 column values with 0 errors. 3 rows updated, 6 values computed.```python theme={null} # View images with computed height, width, and mode columns t.collect() ``` ### Iterate: resize a few images first Use `.select()` to define the transformation, then `.collect()` to execute and return results. If you want to collect only the first few rows, use `.head(n)` instead of `.collect()`. Nothing is stored in your table. Pixeltable includes a built-in function for resizing image files with PIL: * `.resize(width, height)` - Change image dimensions ```python theme={null} # Preview the resize operation t.select(t.image, t.image.resize((224, 224))).head(1) ``` ### Add: resize all images in your table Once you’re satisfied with the results, use `.add_computed_column()` with the same expression. This processes all rows and stores the results permanently in your table. ```python theme={null} # Save as computed column t.add_computed_column(resized=t.image.resize((224, 224))) ```
Added 3 column values with 0 errors. 3 rows updated, 3 values computed.```python theme={null} # View images with resized column t.collect() ``` ### Iterate: rotate a few images first Use `.select()` to define the transformation, then `.collect()` to execute and return results. If you want to collect only the first few rows, use `.head(n)` instead of `.collect()`. Nothing is stored in your table. Pixeltable includes a built-in function for rotating image files with PIL: * `.rotate(degrees)` - Rotate image by specified degrees ```python theme={null} # Preview the rotation t.select(t.image, t.image.rotate(90)).head(1) ``` ### Add: rotate all images in your table Once you’re satisfied with the results, use `.add_computed_column()` with the same expression. This processes all rows and stores the results permanently in your table. ```python theme={null} # Save as computed column t.add_computed_column(rotated=t.image.rotate(90)) ```
Added 3 column values with 0 errors. 3 rows updated, 3 values computed.```python theme={null} # View images with rotated column t.collect() ``` ### Iterate: flip a few images first Use `.select()` to define the transformation, then `.collect()` to execute and return results. If you want to collect only the first few rows, use `.head(n)` instead of `.collect()`. Nothing is stored in your table. Pixeltable includes a built-in function for transposing image files with PIL (note that for this transform you will need to import PIL to access the `FLIP_*` constants): * `.transpose(Image.FLIP_TOP_BOTTOM)` - Flip image vertically * `.transpose(Image.FLIP_LEFT_RIGHT)` - Mirror image horizontally ```python theme={null} # Import PIL Image to access flip constants from PIL import Image # Preview both flip operations t.select( t.image, t.image.transpose(Image.FLIP_TOP_BOTTOM), t.image.transpose(Image.FLIP_LEFT_RIGHT), ).head(1) ``` ### Add: flip all images in your table Once you’re satisfied with the results, use `.add_computed_column()` with the same expression. This processes all rows and stores the results permanently in your table. ```python theme={null} # Flip vertically (top to bottom) t.add_computed_column(flip_v=t.image.transpose(Image.FLIP_TOP_BOTTOM)) # Flip horizontally (left to right, mirror effect) t.add_computed_column(flip_h=t.image.transpose(Image.FLIP_LEFT_RIGHT)) ```
Added 3 column values with 0 errors. Added 3 column values with 0 errors. 3 rows updated, 3 values computed.```python theme={null} # View original and flipped versions side by side t.select(t.image, t.flip_v, t.flip_h).collect() ``` ### Iterate: crop a few images first Use `.select()` to define the transformation, then `.collect()` to execute and return results. If you want to collect only the first few rows, use `.head(n)` instead of `.collect()`. Nothing is stored in your table. Pixeltable includes a built-in function for cropping image files with PIL: * `.crop(box)` - Extract a rectangular region from the image (box format: `(left, top, right, bottom)`) ```python theme={null} # Preview the center crop # Box format: (left, top, right, bottom) t.select( t.image, t.image.crop( ( t.image.width // 4, t.image.height // 4, 3 * t.image.width // 4, 3 * t.image.height // 4, ) ), ).head(1) ``` ### Add: crop all images in your table Once you’re satisfied with the results, use `.add_computed_column()` with the same expression. This processes all rows and stores the results permanently in your table. ```python theme={null} # Save as computed column t.add_computed_column( center_crop=t.image.crop( ( t.image.width // 4, t.image.height // 4, 3 * t.image.width // 4, 3 * t.image.height // 4, ) ) ) ```
Added 3 column values with 0 errors. 3 rows updated, 3 values computed.```python theme={null} # View center-cropped images t.select(t.center_crop).collect() ``` ## Explanation **How PIL transformations work in Pixeltable:** Pixeltable provides built-in functions that wrap PIL (Pillow) operations for image manipulation. These functions work directly on image columns in your table—no need to write loops or manage file I/O. When you call `.resize()`, `.rotate()`, or other methods on an image column, Pixeltable handles applying the transformation to each image automatically. All these transformations use standard PIL operations under the hood. For more details on PIL functionality, see the [Pillow documentation](https://pillow.readthedocs.io/). **To customize transformations:** * **Resize**: Change dimensions with `.resize((width, height))` - specify target size in pixels * **Rotate**: Rotate counterclockwise with `.rotate(degrees)` - use negative values for clockwise rotation * **Flip**: Use `.transpose(Image.FLIP_LEFT_RIGHT)` for horizontal mirror or `.transpose(Image.FLIP_TOP_BOTTOM)` for vertical flip * **Crop**: Extract regions with `.crop((left, top, right, bottom))` - coordinates are in pixels from top-left origin * **Properties**: Access `.width`, `.height`, and `.mode` to get image dimensions and color mode **The Pixeltable workflow:** In traditional databases, `.select()` just picks which columns to view. In Pixeltable, `.select()` also lets you compute new transformations on the fly—define new columns without storing them. This makes `.select()` perfect for testing transformations before you commit them. When you use `.select()`, you’re creating a query that doesn’t execute until you call `.collect()`. You must use `.collect()` to execute the query and return results—nothing is stored in your table. If you want to collect only the first few rows, use `.head(n)` instead of `.collect()` to test on a subset before processing your full dataset. Once satisfied, use `.add_computed_column()` with the same expression to persist results permanently. For more on this workflow, see [Get fast feedback on transformations](/howto/cookbooks/core/dev-iterative-workflow). ## See also * [Convert RGB images to grayscale](/howto/cookbooks/images/img-rgb-to-grayscale) * [Apply filters to images](/howto/cookbooks/images/img-apply-filters) * [Test transformations with fast feedback loops](/howto/cookbooks/core/dev-iterative-workflow) # Convert color images to grayscale Source: https://docs.pixeltable.com/howto/cookbooks/images/img-rgb-to-grayscale
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'image\_demo'. \```python theme={null} t = pxt.create_table('image_demo/gray', {'image': pxt.Image}) ```
Created table 'gray'.```python theme={null} t.insert( [ { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000106.jpg' }, ] ) ```
Inserting rows into \`gray\`: 0 rows \[00:00, ? rows/s]Inserting rows into \`gray\`: 3 rows \[00:00, 617.66 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 6 values computed.```python theme={null} # View loaded images t.collect() ``` ### Iterate: convert with linear approximation for a few images first ```python theme={null} # Query: Preview the conversion t.select(t.image, t.image.convert('L')).head(1) ``` ### Add: convert with linear approximation for all images in your table ```python theme={null} # Commit: Save as computed column (built-in PIL conversion - fast and good for most use cases) t.add_computed_column(grayscale=t.image.convert('L')) ```
Added 3 column values with 0 errors. 3 rows updated, 3 values computed.```python theme={null} # View images with grayscale column t.collect() ``` ### Iterate: convert with gamma decompression for a few images first ```python theme={null} @pxt.udf def rgb_to_gray_accurate(img: Image.Image) -> Image.Image: """Convert RGB to grayscale with full gamma correction. Most accurate but slower. Gamma-decompresses, applies perceptual weights in linear space, then re-compresses for display. """ rgb = np.array(img).astype(np.float32) / 255.0 # Gamma decompress: make pixel values perceptually linear rgb_lin = ((rgb + 0.055) / 1.055) ** 2.4 rgb_lin = np.where(rgb <= 0.04045, rgb / 12.92, rgb_lin) # Apply perceptual weights in linear space gray_lin = ( 0.2126 * rgb_lin[:, :, 0] + 0.7152 * rgb_lin[:, :, 1] + 0.0722 * rgb_lin[:, :, 2] ) # Gamma compress: make values display-ready gray = 1.055 * gray_lin ** (1 / 2.4) - 0.055 gray = np.where(gray_lin <= 0.0031308, 12.92 * gray_lin, gray) gray = (gray * 255).astype(np.uint8) return Image.fromarray(gray) ``` ```python theme={null} # Compare both methods on first image t.select(t.image, t.grayscale, rgb_to_gray_accurate(t.image)).head(1) ``` ### Add: convert with gamma decompression for all images in your table ```python theme={null} t.add_computed_column(accurate=rgb_to_gray_accurate(t.image)) ```
Added 3 column values with 0 errors. 3 rows updated, 3 values computed.```python theme={null} # View all results t.collect() ``` ## Explanation **Two approaches:** 1. **Simple (`.convert('L')`):** PIL’s built-in. Fast, good for most use cases (model preprocessing, general analysis). 2. **Gamma-corrected (custom UDF):** Not built into PIL. Requires a custom UDF that: * Gamma-decompresses to linear space * Applies perceptual weights: 0.2126 × R + 0.7152 × G + 0.0722 × B * Gamma-compresses back for display * Slower but most perceptually accurate * Use for scientific imaging, professional photography **Why gamma matters:** Displays aren’t linear—doubling a pixel value doesn’t double perceived brightness. Gamma correction accounts for this. For best results, convert to linear space before weighting, then convert back. *The gamma-corrected method is based on [Brandon Rohrer’s explanation](https://brandonrohrer.com/convert_rgb_to_grayscale.html) of perceptually accurate RGB to grayscale conversion.* **The Pixeltable workflow:** In traditional databases, `.select()` just picks which columns to view. In Pixeltable, `.select()` also lets you compute new transformations on the fly—define new columns without storing them. This makes `.select()` perfect for testing transformations before you commit them. When you use `.select()`, you’re creating a query that doesn’t execute until you call `.collect()`. You must use `.collect()` to execute the query and return results—nothing is stored in your table. If you want to collect only the first few rows, use `.head(n)` instead of `.collect()` to test on a subset before processing your full dataset. Once satisfied, use `.add_computed_column()` with the same expression to persist results permanently. For more on this workflow, see [Get fast feedback on transformations](/howto/cookbooks/core/dev-iterative-workflow). ## See also * [Transform images with PIL operations](/howto/cookbooks/images/img-pil-transforms) * [Test transformations with fast feedback loops](/howto/cookbooks/core/dev-iterative-workflow) # Visualize object detections Source: https://docs.pixeltable.com/howto/cookbooks/images/img-visualize-detections
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'viz\_demo'. \### Create detection and visualization pipeline ```python theme={null} # Create table for images images = pxt.create_table('viz_demo/images', {'image': pxt.Image}) ```
Created table 'images'.```python theme={null} # Step 1: Run object detection images.add_computed_column( detections=yolox(images.image, model_id='yolox_m', threshold=0.5) ) ```
Added 0 column values with 0 errors. No rows affected.```python theme={null} # Step 2: Draw bounding boxes on the image # Note: draw_bounding_boxes takes image, boxes, and labels (scores are not used for drawing) images.add_computed_column( annotated=draw_bounding_boxes( images.image, images.detections.bboxes, labels=images.detections.labels, ) ) ```
Added 0 column values with 0 errors. No rows affected.### Detect and visualize ```python theme={null} # Insert sample images base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images' image_urls = [ f'{base_url}/000000000036.jpg', # cats f'{base_url}/000000000139.jpg', # elephants ] images.insert([{'image': url} for url in image_urls]) ```
Inserting rows into \`images\`: 0 rows \[00:00, ? rows/s] Inserting rows into \`images\`: 2 rows \[00:00, 236.29 rows/s] Inserted 2 rows with 0 errors. 2 rows inserted, 8 values computed.```python theme={null} # View original vs annotated images side by side images.select(images.image, images.annotated).collect() ``` ```python theme={null} # View detection details images.select(images.detections).collect() ``` ## Explanation **Pipeline flow:**
Image → YOLOX detection → Bounding boxes + labels → draw\_bounding\_boxes → Annotated image**Detection output format:** The `yolox` function returns a dict with: * `bboxes` - List of \[x1, y1, x2, y2] coordinates * `labels` - List of class names (e.g., “cat”, “dog”) * `scores` - List of confidence scores (0-1) **YOLOX model options:** ## See also * [Detect objects in images](/howto/cookbooks/images/img-detect-objects) - Object detection basics * [Extract video frames](/howto/cookbooks/video/video-extract-frames) - Detect objects in video # Analyze images in batch with AI vision Source: https://docs.pixeltable.com/howto/cookbooks/images/vision-batch-analysis
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'vision\_demo'. \```python theme={null} t = pxt.create_table('vision_demo/images', {'image': pxt.Image}) ```
Created table 'images'.```python theme={null} # Insert sample images t.insert( [ { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000106.jpg' }, ] ) ```
Inserted 3 rows with 0 errors in 0.03 s (88.80 rows/s) 3 rows inserted.```python theme={null} # View loaded images t.collect() ``` ### Analyze images with AI Add a computed column using `openai.chat_completions()`. The prompt runs automatically on all images: ```python theme={null} # Define the prompt messages = [ { 'role': 'user', 'content': [ { 'type': 'text', 'text': 'Describe this image in one sentence.', }, {'type': 'image_url', 'image_url': t.image}, ], } ] # Add computed column for AI analysis using openai.chat_completions() t.add_computed_column( description=openai.chat_completions(messages, model='gpt-4o-mini') ) ```
Added 3 column values with 0 errors in 4.84 s (0.62 rows/s) 3 rows updated.### View results `openai.chat_completions()` returns a JSON structure containing the output, which we can unpack in the usual way: ```python theme={null} # View results: image alongside its AI-generated description t.select( t.image, t.description, t.description['choices'][0]['message']['content'], ).collect() ``` ### New images are analyzed automatically When you insert more images, the analysis runs without any extra code: ```python theme={null} # Insert a new image - analysis happens automatically t.insert( [ { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000139.jpg' } ] ) # View all results including the new image t.select( t.image, t.description, t.description['choices'][0]['message']['content'], ).collect() ``` ## Explanation **How it works:** 1. Add images to your table 2. Define a computed column with `openai.chat_completions()` 3. Pixeltable executes the API call for each row automatically 4. Results are cached—rerunning won’t re-call the API 5. New rows trigger automatic computation **Changing the prompt:** To use a different prompt, add a new computed column with `if_exists='replace'`: ```python theme={null} messages = ... t.add_computed_column( description=openai.chat_completions(messages, model='gpt-4o-mini'), if_exists='replace' ) ``` **Using other providers:** Replace `openai.chat_completions` with: * `anthropic.messages` for Claude * `google.generate_content` for Gemini * `together.chat_completions` for Together AI ## See also * [Configure API keys](/howto/cookbooks/core/workflow-api-keys) * [Working with OpenAI](/howto/providers/working-with-openai) # Extract structured data from images Source: https://docs.pixeltable.com/howto/cookbooks/images/vision-structured-output
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'extraction\_demo'. \```python theme={null} t = pxt.create_table('extraction_demo/images', {'image': pxt.Image}) ```
Created table 'images'.```python theme={null} # Insert sample images t.insert( [ { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg' }, ] ) ```
Inserted 2 rows with 0 errors in 0.03 s (60.43 rows/s) 2 rows inserted.### Extract structured data Use `openai.chat_completions()` to analyze images and get JSON output: ```python theme={null} # Add extraction column using openai.chat_completions (the image is passed in the message) PROMPT = """Analyze this image and extract the following as JSON: - description: A brief description of the image - objects: List of objects visible in the image - dominant_colors: List of dominant colors - scene_type: Type of scene (indoor, outdoor, etc.)""" messages = [ { 'role': 'user', 'content': [ {'type': 'text', 'text': PROMPT}, {'type': 'image_url', 'image_url': t.image}, ], } ] t.add_computed_column( data=openai.chat_completions( messages, model='gpt-4o-mini', model_kwargs={'response_format': {'type': 'json_object'}}, ) ) ```
Added 2 column values with 0 errors in 7.55 s (0.26 rows/s) 2 rows updated.```python theme={null} # View extracted data t.select( t.image, t.data, t.data['choices'][0]['message']['content'] ).collect() ``` ```python theme={null} # You can also parse the JSON into individual columns if needed import json @pxt.udf def parse_description(data: str) -> str: return json.loads(data).get('description', '') t.select( t.image, description=parse_description( t.data['choices'][0]['message']['content'] ), ).collect() ``` ## Explanation **Getting JSON output:** Pass `model_kwargs={'response_format': {'type': 'json_object'}}` to get structured JSON. **Other extraction use cases:** ## See also * [Analyze images in batch](/howto/cookbooks/images/vision-batch-analysis) * [Configure API keys](/howto/cookbooks/core/workflow-api-keys) # Create text embeddings with OpenAI Source: https://docs.pixeltable.com/howto/cookbooks/search/embed-text-openai
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'embed\_demo'. \### Create table with embedding column ```python theme={null} # Create table for documents docs = pxt.create_table( 'embed_demo/documents', {'title': pxt.String, 'content': pxt.String} ) ```
Created table 'documents'.```python theme={null} # Add embedding column using OpenAI's text-embedding-3-small docs.add_computed_column( embedding=embeddings(docs.content, model='text-embedding-3-small') ) ```
Added 0 column values with 0 errors. No rows affected.### Insert documents ```python theme={null} # Insert sample documents sample_docs = [ { 'title': 'Python Basics', 'content': 'Python is a high-level programming language known for its clear syntax and readability.', }, { 'title': 'Machine Learning', 'content': 'Machine learning is a subset of AI that enables systems to learn from data.', }, { 'title': 'Web Development', 'content': 'Web development involves building websites and web applications using HTML, CSS, and JavaScript.', }, { 'title': 'Data Science', 'content': 'Data science combines statistics, programming, and domain expertise to extract insights from data.', }, { 'title': 'Cloud Computing', 'content': 'Cloud computing provides on-demand computing resources over the internet.', }, ] docs.insert(sample_docs) ```
Inserting rows into \`documents\`: 5 rows \[00:00, 553.22 rows/s] Inserted 5 rows with 0 errors. 5 rows inserted, 15 values computed.```python theme={null} # View documents with embeddings (showing first 5 dimensions) result = docs.select(docs.title, docs.embedding).collect() ``` ### Query by similarity Find documents similar to a query by creating an embedding index: ```python theme={null} # Add embedding index for semantic search docs.add_embedding_index( column='content', string_embed=embeddings.using(model='text-embedding-3-small'), ) ``` ```python theme={null} # Search for similar documents sim = docs.content.similarity( string='artificial intelligence applications' ) results = ( docs.where(sim > 0.2) .order_by(sim, asc=False) .limit(3) .select(docs.title, docs.content, sim=sim) ) results.collect() ``` ## Explanation **OpenAI embedding models:** **Similarity metrics:** **Key benefits of computed embedding columns:** * Embeddings are generated automatically on insert * Results are cached—no re-computation on subsequent queries * Index enables fast similarity search at scale ## See also * [Semantic text search](/howto/cookbooks/search/search-semantic-text) - Full semantic search patterns * [Chunk documents for RAG](/howto/cookbooks/text/doc-chunk-for-rag) - Prepare documents for retrieval # Build semantic search for text Source: https://docs.pixeltable.com/howto/cookbooks/search/search-semantic-text
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'search\_demo'. \```python theme={null} # Create table with content and metadata kb = pxt.create_table( 'search_demo/articles', {'title': pxt.String, 'content': pxt.String, 'category': pxt.String}, ) ```
Created table 'articles'.```python theme={null} # Insert sample content kb.insert( [ { 'title': 'Debugging best practices', 'content': 'Use logging, breakpoints, and unit tests to identify and fix issues in your code.', 'category': 'engineering', }, { 'title': 'Machine learning model optimization', 'content': 'Improve training efficiency with batch normalization, learning rate schedules, and early stopping.', 'category': 'ml', }, { 'title': 'Production infrastructure setup', 'content': 'Deploy applications using containers, load balancers, and automated scaling.', 'category': 'devops', }, { 'title': 'API design principles', 'content': 'Create RESTful endpoints with proper versioning, authentication, and error handling.', 'category': 'engineering', }, ] ) ```
Inserting rows into \`articles\`: 4 rows \[00:00, 577.69 rows/s] Inserted 4 rows with 0 errors. 4 rows inserted, 12 values computed.### Add semantic search Create an embedding index on the content column: ```python theme={null} # Add embedding index kb.add_embedding_index( column='content', string_embed=sentence_transformer.using(model_id='all-MiniLM-L6-v2'), ) ``` ### Search by meaning Find content semantically similar to your query: ```python theme={null} # Search by meaning query = 'how to fix bugs' sim = kb.content.similarity(string=query) results = ( kb.order_by(sim, asc=False) .select(kb.title, kb.content, score=sim) .limit(2) ) results.collect() ``` ### Filter by metadata Combine semantic search with metadata filters: ```python theme={null} # Search within a specific category query = 'best practices' sim = kb.content.similarity(string=query) results = ( kb.where(kb.category == 'engineering') # Filter first .order_by(sim, asc=False) .select(kb.title, kb.category, score=sim) .limit(2) ) results.collect() ``` ## Explanation **How similarity search works:** 1. Your query is converted to an embedding vector 2. Pixeltable finds the most similar vectors in the index 3. Results are ranked by cosine similarity (0 to 1) **Embedding models:** **New content is indexed automatically:** When you insert new rows, embeddings are generated without extra code. ## See also * [Vector database documentation](/platform/embedding-indexes) * [Split documents for RAG](/howto/cookbooks/text/doc-chunk-for-rag) # Find similar images with CLIP Source: https://docs.pixeltable.com/howto/cookbooks/search/search-similar-images
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'image\_search\_demo'. \```python theme={null} images = pxt.create_table( 'image_search_demo/images', {'image': pxt.Image} ) ```
Created table 'images'.```python theme={null} # Insert sample images images.insert( [ { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000106.jpg' }, { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000139.jpg' }, ] ) ```
Inserting rows into \`images\`: 4 rows \[00:00, 973.44 rows/s] Inserted 4 rows with 0 errors. 4 rows inserted, 8 values computed.### Create CLIP embedding index Add an embedding index using CLIP for cross-modal search: ```python theme={null} # Add CLIP embedding index (supports both image and text queries) images.add_embedding_index( 'image', embedding=clip.using(model_id='openai/clip-vit-base-patch32') ) ``` ### Search by text description Find images matching a text query: ```python theme={null} # Search by text description query = 'people eating food' sim = images.image.similarity(string=query) results = ( images.order_by(sim, asc=False) .select(images.image, score=sim) .limit(2) ) results.collect() ``` ## Explanation **Why CLIP:** CLIP (Contrastive Language-Image Pre-training) understands both images and text in the same embedding space. This enables: * Image-to-image search (find similar photos) * Text-to-image search (find photos matching a description) **Index parameters:** The image index and text queries must use the same CLIP model for cross-modal search to work. **New images are indexed automatically:** When you insert new images, embeddings are generated without extra code. ## See also * [Semantic text search](/howto/cookbooks/search/search-semantic-text) * [Vector database documentation](/platform/embedding-indexes) # Split documents into chunks for RAG Source: https://docs.pixeltable.com/howto/cookbooks/text/doc-chunk-for-rag
Created directory 'rag\_demo'. \```python theme={null} # Create table for documents docs = pxt.create_table('rag_demo/documents', {'document': pxt.Document}) ```
Created table 'documents'.```python theme={null} # Insert a sample PDF docs.insert( [ { 'document': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/rag-demo/Argus-Market-Digest-June-2024.pdf' } ] ) ```
Inserting rows into \`documents\`: 1 rows \[00:00, 775.86 rows/s] Inserted 1 row with 0 errors. 1 row inserted, 2 values computed.### Split into chunks Create a view that splits each document into sentences with a token limit: ```python theme={null} # Create a view that splits documents into chunks chunks = pxt.create_view( 'rag_demo/chunks', docs, iterator=document_splitter( docs.document, separators='sentence,token_limit', # Split by sentence with token limit limit=300, # Max 300 tokens per chunk ), ) ```
Inserting rows into \`chunks\`: 217 rows \[00:00, 42111.88 rows/s]```python theme={null} # View the chunks chunks.select(chunks.text).head(5) ``` ### Add semantic search Create an embedding index on the chunks for similarity search: ```python theme={null} # Add embedding index for semantic search chunks.add_embedding_index( column='text', string_embed=sentence_transformer.using(model_id='all-MiniLM-L6-v2'), ) ``` ### Search your documents Use similarity search to find relevant chunks: ```python theme={null} # Search for relevant chunks query = 'market trends' sim = chunks.text.similarity(string=query) results = ( chunks.order_by(sim, asc=False) .select(chunks.text, score=sim) .limit(3) ) results.collect() ``` ## Explanation **Separator options:** You can combine separators: `separators='sentence,token_limit'` **Chunk sizing:** * `limit`: Maximum tokens per chunk (default: 500) * `overlap`: Tokens to overlap between chunks (default: 0) **New documents are processed automatically:** When you insert new documents, chunks and embeddings are generated without extra code. ## See also * [Iterators documentation](/platform/iterators) * [RAG demo notebook](/howto/use-cases/rag-demo) # Extract text from PowerPoint, Word, and Excel files Source: https://docs.pixeltable.com/howto/cookbooks/text/doc-extract-text-from-office-files
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'office\_docs'. \```python theme={null} # Create table for office documents docs = pxt.create_table('office_docs/documents', {'doc': pxt.Document}) ```
Created table 'documents'.```python theme={null} # Sample PowerPoint from Pixeltable repo # Replace with your own PPTX, DOCX, or XLSX files sample_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/calpy.pptx' docs.insert([{'doc': sample_url}]) ```
Inserting rows into \`documents\`: 1 rows \[00:00, 57.40 rows/s] Inserted 1 row with 0 errors. 1 row inserted, 2 values computed.### Extract full document text You create a view with `DocumentSplitter` to extract text. Setting `separators=''` extracts the full document without splitting. ```python theme={null} # Create a view to extract full document text full_text = pxt.create_view( 'office_docs/full_text', docs, iterator=DocumentSplitter.create( document=docs.doc, separators='', # No splitting - extract full document ), ) ```
Inserting rows into \`full\_text\`: 1 rows \[00:00, 196.50 rows/s]```python theme={null} # Preview extracted text full_text.select(full_text.doc, full_text.text).head(1) ``` ### Split documents by headings You split documents by headings to preserve their logical structure. Each section under a heading becomes a separate chunk. ```python theme={null} # Create view that splits by headings by_heading = pxt.create_view( 'office_docs/by_heading', docs, iterator=DocumentSplitter.create( document=docs.doc, separators='heading', metadata='heading', # Preserve heading structure ), ) ```
Inserting rows into \`by\_heading\`: 87 rows \[00:00, 10359.54 rows/s]```python theme={null} # View chunks with their headings by_heading.select(by_heading.heading, by_heading.text).head(5) ``` ### Split by token limit for AI models You split documents by token count when feeding chunks to AI models. The `overlap` parameter ensures chunks share context at boundaries. ```python theme={null} # Create view with token-based splitting by_tokens = pxt.create_view( 'office_docs/by_tokens', docs, iterator=DocumentSplitter.create( document=docs.doc, separators='heading,token_limit', # Split by heading first, then by tokens limit=512, # Maximum tokens per chunk overlap=50, # Overlap between chunks to preserve context metadata='heading', ), ) ```
Inserting rows into \`by\_tokens\`: 2369 rows \[00:00, 9212.05 rows/s]```python theme={null} # Preview chunks with token limits by_tokens.select(by_tokens.doc, by_tokens.heading, by_tokens.text).head(3) ``` ### Search across documents You search across all document chunks using standard Pixeltable queries. ```python theme={null} # Find chunks containing specific keywords by_tokens.where(by_tokens.text.contains('Python')).select( by_tokens.doc, by_tokens.text ).head(3) ``` ## Explanation **Supported formats:** * PowerPoint: `.pptx`, `.ppt` * Word: `.docx`, `.doc` * Excel: `.xlsx`, `.xls` **Separator options:** * `heading` - Split by document headings (preserves structure) * `paragraph` - Split by paragraphs * `sentence` - Split by sentences * `token_limit` - Split by token count (requires `limit` parameter) * `char_limit` - Split by character count (requires `limit` parameter) * Multiple separators work together: `'heading,token_limit'` splits by heading first, then ensures no chunk exceeds token limit **Metadata fields:** * `heading` - Hierarchical heading structure (e.g., `{'h1': 'Introduction', 'h2': 'Overview'}`) * `title` - Document title * `sourceline` - Source line number (HTML and Markdown documents) **Token overlap:** The `overlap` parameter ensures chunks share context at boundaries. This prevents sentences from being split mid-thought when feeding chunks to AI models. ## See also * [Get fast feedback on transformations](/howto/cookbooks/core/dev-iterative-workflow) * [Pixeltable Document API](/sdk/latest/document) # Extract named entities from text Source: https://docs.pixeltable.com/howto/cookbooks/text/text-extract-entities
```python theme={null} import getpass import os if 'OPENAI_API_KEY' not in os.environ: os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ') ``` ```python theme={null} import json import pixeltable as pxt from pixeltable.functions.openai import chat_completions ``` ```python theme={null} # Create a fresh directory pxt.drop_dir('entities_demo', force=True) pxt.create_dir('entities_demo') ```
Created directory 'entities\_demo'. \### Define entity extraction schema ```python theme={null} # Define the JSON schema for entity extraction entity_schema = { 'type': 'json_schema', 'json_schema': { 'name': 'entities', 'strict': True, 'schema': { 'type': 'object', 'properties': { 'people': { 'type': 'array', 'items': {'type': 'string'}, 'description': 'Names of people mentioned', }, 'organizations': { 'type': 'array', 'items': {'type': 'string'}, 'description': 'Names of companies, institutions, or groups', }, 'locations': { 'type': 'array', 'items': {'type': 'string'}, 'description': 'Geographic locations (cities, countries, addresses)', }, 'dates': { 'type': 'array', 'items': {'type': 'string'}, 'description': 'Dates or time references', }, }, 'required': ['people', 'organizations', 'locations', 'dates'], 'additionalProperties': False, }, }, } ``` ### Create extraction pipeline ```python theme={null} # Create table for articles articles = pxt.create_table( 'entities_demo/articles', {'title': pxt.String, 'content': pxt.String} ) ```
Created table 'articles'.```python theme={null} # Add entity extraction column extraction_prompt = ( 'Extract all named entities from the following text:\n\n' + articles.content ) articles.add_computed_column( extraction_response=chat_completions( messages=[{'role': 'user', 'content': extraction_prompt}], model='gpt-4o-mini', model_kwargs={'response_format': entity_schema}, ) ) ```
Added 0 column values with 0 errors. No rows affected.```python theme={null} # Extract the entities JSON articles.add_computed_column( entities=articles.extraction_response.choices[0].message.content ) ```
Added 0 column values with 0 errors. No rows affected.### Extract entities from text ```python theme={null} # Insert sample articles sample_articles = [ { 'title': 'Tech Acquisition', 'content': 'Microsoft announced today that CEO Satya Nadella will lead the acquisition of a Seattle-based startup. The deal, expected to close in March 2024, is valued at $500 million.', }, { 'title': 'Sports Update', 'content': 'LeBron James led the Los Angeles Lakers to victory against the Boston Celtics on Tuesday night at Staples Center. Coach Darvin Ham praised the teams performance.', }, { 'title': 'Research Breakthrough', 'content': 'Dr. Sarah Chen at Stanford University published groundbreaking research on renewable energy. The study, funded by the National Science Foundation, was conducted in Palo Alto, California.', }, ] articles.insert(sample_articles) ```
Inserting rows into \`articles\`: 3 rows \[00:00, 404.21 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 12 values computed.```python theme={null} # View extracted entities articles.select(articles.title, articles.entities).collect() ``` ## Explanation **Structured output ensures reliable extraction:** By using OpenAI’s structured output (`response_format`), the model always returns valid JSON matching the schema. No post-processing or error handling needed. **Common entity types:** **Customizing the schema:** Modify the `entity_schema` to extract domain-specific entities—product SKUs, legal terms, medical conditions, etc. ## See also * [Extract structured data from images](/howto/cookbooks/images/vision-structured-output) - JSON extraction from images * [Extract fields from JSON](/howto/cookbooks/core/workflow-json-extraction) - Parse LLM response fields # Summarize text with LLMs Source: https://docs.pixeltable.com/howto/cookbooks/text/text-summarize
```python theme={null} import getpass import os if 'OPENAI_API_KEY' not in os.environ: os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ') ``` ```python theme={null} import pixeltable as pxt from pixeltable.functions import openai ``` ### Load sample text ```python theme={null} # Create a fresh directory pxt.drop_dir('summarize_demo', force=True) pxt.create_dir('summarize_demo') ```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'summarize\_demo'. \```python theme={null} # Create table for articles articles = pxt.create_table( 'summarize_demo/articles', {'title': pxt.String, 'content': pxt.String}, ) ```
Created table 'articles'.```python theme={null} # Sample articles to summarize sample_articles = [ { 'title': 'The Rise of Electric Vehicles', 'content': """Electric vehicles (EVs) have seen unprecedented growth in recent years, transforming the automotive industry. Sales increased by 60% globally in 2023, with China leading the market followed by Europe and North America. Major automakers like Tesla, BYD, and traditional manufacturers have invested billions in EV technology. Battery costs have dropped significantly, making EVs more affordable for consumers. Government incentives and stricter emissions regulations continue to drive adoption. Charging infrastructure is expanding rapidly, with new fast-charging networks being deployed across major highways. Despite challenges like range anxiety and charging times, consumer acceptance is growing steadily.""", }, { 'title': 'Advances in Renewable Energy', 'content': """Solar and wind power capacity reached record levels in 2023, accounting for over 30% of global electricity generation. The cost of solar panels has fallen by 90% over the past decade, making renewable energy competitive with fossil fuels. Offshore wind farms are being built at scale, with turbines now reaching heights of over 250 meters. Energy storage solutions, particularly lithium-ion batteries, are addressing intermittency challenges. Countries like Denmark and Scotland have achieved periods of 100% renewable electricity. Corporate power purchase agreements are accelerating the transition, with tech giants committing to carbon-neutral operations.""", }, ] articles.insert(sample_articles) ```
Inserting rows into \`articles\`: 2 rows \[00:00, 316.21 rows/s] Inserted 2 rows with 0 errors. 2 rows inserted, 4 values computed.```python theme={null} # View articles articles.select(articles.title, articles.content).collect() ``` ### Generate summaries Add a computed column that generates summaries using GPT: ```python theme={null} # Create prompt template for summarization prompt = ( 'Summarize the following article in 2-3 sentences:\n\n' + articles.content ) # Add computed column for LLM response articles.add_computed_column( response=openai.chat_completions( messages=[{'role': 'user', 'content': prompt}], model='gpt-4o-mini', ) ) ```
Added 2 column values with 0 errors. 2 rows updated, 2 values computed.```python theme={null} # Extract the summary text from the response articles.add_computed_column( summary=articles.response.choices[0].message.content ) ```
Added 2 column values with 0 errors. 2 rows updated, 2 values computed.```python theme={null} # View titles and summaries articles.select(articles.title, articles.summary).collect() ``` ### Custom summary styles You can customize the summary format by changing the prompt: ```python theme={null} # Add bullet-point summary bullet_prompt = ( 'List the 3 key points from this article as bullet points:\n\n' + articles.content ) articles.add_computed_column( bullet_response=openai.chat_completions( messages=[{'role': 'user', 'content': bullet_prompt}], model='gpt-4o-mini', ) ) articles.add_computed_column( key_points=articles.bullet_response.choices[0].message.content ) ```
Added 2 column values with 0 errors. Added 2 column values with 0 errors. 2 rows updated, 2 values computed.```python theme={null} # View bullet-point summaries articles.select(articles.title, articles.key_points).collect() ``` ### Automatic processing New articles are automatically summarized when inserted: ```python theme={null} # Insert a new article - summaries are generated automatically articles.insert( [ { 'title': 'AI in Healthcare', 'content': """Artificial intelligence is revolutionizing healthcare diagnostics and treatment planning. Machine learning models can now detect diseases from medical images with accuracy matching or exceeding human specialists. AI-powered drug discovery is accelerating the development of new treatments. Natural language processing is being used to extract insights from clinical notes and research papers.""", } ] ) ```
Inserting rows into \`articles\`: 1 rows \[00:00, 411.57 rows/s] Inserted 1 row with 0 errors. 1 row inserted, 6 values computed.```python theme={null} # View all summaries including the new article articles.select(articles.title, articles.summary).collect() ``` ## Explanation **Prompt engineering for summaries:** **Cost optimization:** * Use `gpt-4o-mini` for most summarization tasks (fast and affordable) * Use `gpt-4o` for complex documents requiring deeper understanding * Summaries are cached—you only pay once per article ## See also * [Split documents for RAG](/howto/cookbooks/text/doc-chunk-for-rag) - Process long documents * [Extract fields from JSON](/howto/cookbooks/core/workflow-json-extraction) - Parse structured LLM output * [Configure API keys](/howto/cookbooks/core/workflow-api-keys) - Set up OpenAI credentials # Translate text between languages Source: https://docs.pixeltable.com/howto/cookbooks/text/text-translate
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'translate\_demo'. \### Create translation pipeline ```python theme={null} # Create table for content content = pxt.create_table( 'translate_demo/content', {'title': pxt.String, 'text_en': pxt.String} ) ```
Created table 'content'.```python theme={null} # Add Spanish translation column spanish_prompt = ( 'Translate the following text to Spanish. Return only the translation, no explanations:\n\n' + content.text_en ) content.add_computed_column( response_es=chat_completions( messages=[{'role': 'user', 'content': spanish_prompt}], model='gpt-4o-mini', ) ) content.add_computed_column( text_es=content.response_es.choices[0].message.content ) ```
Added 0 column values with 0 errors. Added 0 column values with 0 errors. No rows affected.```python theme={null} # Add French translation column french_prompt = ( 'Translate the following text to French. Return only the translation, no explanations:\n\n' + content.text_en ) content.add_computed_column( response_fr=chat_completions( messages=[{'role': 'user', 'content': french_prompt}], model='gpt-4o-mini', ) ) content.add_computed_column( text_fr=content.response_fr.choices[0].message.content ) ```
Added 0 column values with 0 errors. Added 0 column values with 0 errors. No rows affected.### Translate content ```python theme={null} # Insert sample content sample_content = [ { 'title': 'Welcome Message', 'text_en': 'Welcome to our platform! We are excited to have you here.', }, { 'title': 'Product Description', 'text_en': 'This lightweight laptop features a 14-inch display and all-day battery life.', }, { 'title': 'Support Article', 'text_en': 'To reset your password, click the forgot password link on the login page.', }, ] content.insert(sample_content) ```
Inserting rows into \`content\`: 3 rows \[00:00, 198.43 rows/s] Inserted 3 rows with 0 errors. 3 rows inserted, 18 values computed.```python theme={null} # View all translations content.select( content.title, content.text_en, content.text_es, content.text_fr ).collect() ``` ```python theme={null} # Pretty print one example row = content.where(content.title == 'Welcome Message').collect()[0] ``` ## Explanation **How it works:** Each target language is a computed column with a translation prompt. When you insert new content: 1. The English text is processed 2. Translation prompts are generated for each language 3. All translations run in parallel 4. Results are cached—no re-translation needed **Adding more languages:** ```python theme={null} # Add German translation german_prompt = 'Translate to German:\n\n' + content.text_en content.add_computed_column( response_de=chat_completions(messages=[{'role': 'user', 'content': german_prompt}], model='gpt-4o-mini') ) content.add_computed_column(text_de=content.response_de.choices[0].message.content) ``` **Cost optimization:** ## See also * [Summarize text](/howto/cookbooks/text/text-summarize) - Text summarization with LLMs * [Extract structured data](/howto/cookbooks/images/vision-structured-output) - Get JSON from LLM responses # Add text overlays to videos Source: https://docs.pixeltable.com/howto/cookbooks/video/video-add-text-overlay
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'overlay\_demo'. \### Load sample videos ```python theme={null} # Create a video table videos = pxt.create_table( 'overlay_demo/videos', {'video': pxt.Video, 'title': pxt.String} ) # Insert a sample video videos.insert( [ { 'video': 's3://multimedia-commons/data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4', 'title': 'Sample Video', } ] ) ```
Created table 'videos'. Inserted 1 row with 0 errors in 3.21 s (0.31 rows/s) 1 row inserted.### Add a simple text overlay ```python theme={null} # Add a simple watermark in the corner videos.add_computed_column( watermarked=videos.video.overlay_text( 'My Brand', font_size=24, color='white', opacity=0.7, horizontal_align='right', horizontal_margin=20, vertical_align='top', vertical_margin=20, ) ) ```
Added 1 column value with 0 errors in 1.25 s (0.80 rows/s) 1 row updated.### Add YouTube-style captions ```python theme={null} # Add a caption with a semi-transparent background box videos.add_computed_column( captioned=videos.video.overlay_text( 'This is a sample caption', font_size=32, color='white', box=True, # Add background box box_color='black', box_opacity=0.8, box_border=[6, 14], # Padding: [top/bottom, left/right] horizontal_align='center', vertical_align='bottom', vertical_margin=70, # Distance from bottom ) ) ```
Added 1 column value with 0 errors in 1.08 s (0.92 rows/s) 1 row updated.### Add dynamic titles from table columns ```python theme={null} # Add video title as an overlay (dynamic per video) videos.add_computed_column( titled=videos.video.overlay_text( videos.title, # Use the title column! font_size=48, color='yellow', opacity=1.0, horizontal_align='center', vertical_align='top', vertical_margin=30, ) ) ```
Added 1 column value with 0 errors in 1.15 s (0.87 rows/s) 1 row updated.```python theme={null} # View all versions videos.select( videos.title, videos.video, videos.watermarked, videos.captioned, videos.titled, ).collect() ``` ### Crop a region from a video Use `video.crop()` to extract a rectangular region from a video. This is useful for focusing on a specific area of interest, removing borders, or preparing clips for object-specific analysis. ```python theme={null} # Crop using xywh format (default): [x, y, width, height] videos.add_computed_column(cropped=videos.video.crop([100, 50, 320, 240])) # Crop using xyxy format (common in object detection pipelines): # videos.add_computed_column( # cropped_xyxy=videos.video.crop([100, 50, 420, 290], bbox_format='xyxy') # ) ```
Added 1 column value with 0 errors in 0.56 s (1.78 rows/s) 1 row updated.## Explanation **Positioning options:** **Styling options:** **Background box options:** **Requirements:** * FFmpeg must be installed and in PATH ## See also * [Generate thumbnails](/howto/cookbooks/video/video-generate-thumbnails) - Create preview images * [Detect scene changes](/howto/cookbooks/video/video-scene-detection) - Find cuts and transitions # Extract frames from videos Source: https://docs.pixeltable.com/howto/cookbooks/video/video-extract-frames
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'video\_demo'. \```python theme={null} # Create table for videos videos = pxt.create_table('video_demo/videos', {'video': pxt.Video}) ```
Created table 'videos'.```python theme={null} # Insert a sample video videos.insert( [ { 'video': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/bangkok.mp4' } ] ) ```
Inserting rows into \`videos\`: 1 rows \[00:00, 212.90 rows/s] Inserted 1 row with 0 errors. 1 row inserted, 2 values computed.### Extract frames at fixed rate Create a view that extracts 1 frame per second: ```python theme={null} # Extract 1 frame per second frames = pxt.create_view( 'video_demo/frames', videos, iterator=frame_iterator( videos.video, fps=1.0, # 1 frame per second ), ) ```
Inserting rows into \`frames\`: 19 rows \[00:00, 8687.65 rows/s]```python theme={null} # View extracted frames frames.select(frames.frame, frames.pos).head(3) ``` ### Extract keyframes only For faster processing, extract only keyframes (I-frames): ```python theme={null} # Extract only keyframes (much faster for long videos) keyframes = pxt.create_view( 'video_demo/keyframes', videos, iterator=frame_iterator(videos.video, keyframes_only=True), ) keyframes.select(keyframes.frame).head(3) ```
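If you need a fixed number of evenly spaced frames per video (for example, a five-image preview strip), the extraction options described below also include a `num_frames` setting. A minimal sketch, assuming the same table and iterator used above; the view name is illustrative:

```python theme={null}
# Extract a fixed number of evenly spaced frames per video
# (only one of fps, num_frames, or keyframes_only may be specified)
preview = pxt.create_view(
    'video_demo/preview_frames',  # illustrative view name
    videos,
    iterator=frame_iterator(videos.video, num_frames=5),
)
preview.select(preview.frame, preview.pos).collect()
```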
Inserting rows into \`keyframes\`: 7 rows \[00:00, 3277.53 rows/s]## Explanation **Extraction options:** Only one of `fps`, `num_frames`, or `keyframes_only` can be specified. **When to use keyframes:** * Quick video scanning and thumbnails * Initial content classification * Processing very long videos **Frame metadata:** Each frame includes: * `frame`: The extracted image * `pos`: Frame position in the video * `pts`: Presentation timestamp ## See also * [Iterators documentation](/platform/iterators) * [Analyze images in batch](/howto/cookbooks/images/vision-batch-analysis) # Generate videos with AI Source: https://docs.pixeltable.com/howto/cookbooks/video/video-generate-ai
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'video\_gen\_demo'. \### Generate videos from text prompts ```python theme={null} # Create a table for text-to-video generation videos = pxt.create_table( 'video_gen_demo/text_to_video', {'prompt': pxt.String} ) # Add computed column that generates videos videos.add_computed_column( video=gemini.generate_videos( videos.prompt, model='veo-2.0-generate-001' ) ) ```
Created table 'text\_to\_video'. Added 0 column values with 0 errors. No rows affected.```python theme={null} # Generate a video from a text prompt videos.insert( [ { 'prompt': 'A serene mountain lake at sunrise with mist rising from the water' } ] ) # View the result videos.select(videos.prompt, videos.video).collect() ```
Inserting rows into \`text\_to\_video\`: 1 rows \[00:00, 190.68 rows/s] Inserted 1 row with 0 errors.### Animate images into videos ```python theme={null} # Create a table for image-to-video generation animated = pxt.create_table( 'video_gen_demo/image_to_video', {'image': pxt.Image, 'description': pxt.String}, ) # Add computed column that animates images animated.add_computed_column( video=gemini.generate_videos( image=animated.image, model='veo-2.0-generate-001' ) ) ```
Created table 'image\_to\_video'. Added 0 column values with 0 errors. No rows affected.```python theme={null} # Animate an image base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images' animated.insert( [ { 'image': f'{base_url}/000000000030.jpg', 'description': 'Beach scene', } ] ) # View the animated result animated.select(animated.image, animated.video).collect() ```
Inserting rows into \`image\_to\_video\`: 1 rows \[00:00, 291.88 rows/s] Inserted 1 row with 0 errors.## Explanation **Generation modes:** **Veo model options:** **Tips:** * Prompts work best when descriptive and specific * Generated videos are cached - same prompt returns cached result * Image-to-video preserves the composition of the input image * New rows automatically generate videos on insert **Requirements:** * Google AI Studio API key (set `GEMINI_API_KEY`) * `pip install google-genai` ## See also * [Extract frames from videos](/howto/cookbooks/video/video-extract-frames) - Pull frames from generated videos * [Add text overlays](/howto/cookbooks/video/video-add-text-overlay) - Add captions to videos # Generate thumbnails from videos Source: https://docs.pixeltable.com/howto/cookbooks/video/video-generate-thumbnails
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'thumbnail\_demo'. \```python theme={null} # Create table for videos videos = pxt.create_table('thumbnail_demo/videos', {'video': pxt.Video}) ```
Created table 'videos'.```python theme={null} # Insert sample videos from public S3 bucket s3_prefix = 's3://multimedia-commons/' video_paths = [ 'data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4', 'data/videos/mp4/ffe/feb/ffefebb41485539f964760e6115fbc44.mp4', ] videos.insert([{'video': s3_prefix + path} for path in video_paths]) ```
Inserting rows into \`videos\`: 2 rows \[00:00, 382.20 rows/s] Inserted 2 rows with 0 errors. 2 rows inserted, 4 values computed.```python theme={null} # View videos videos.collect() ``` ### Extract thumbnail at timestamp Extract a single frame at a specific time (e.g., 1 second into the video): ```python theme={null} # Extract frame at 1 second as thumbnail videos.add_computed_column( thumbnail=pxtf.video.extract_frame(videos.video, timestamp=1.0) ) ```
Added 2 column values with 0 errors. 2 rows updated, 2 values computed.```python theme={null} # View thumbnails videos.select(videos.video, videos.thumbnail).collect() ``` ### Resize thumbnails Create standard-sized thumbnails for consistent display: ```python theme={null} # Resize thumbnail to 320x180 (16:9 aspect ratio) videos.add_computed_column( thumbnail_small=videos.thumbnail.resize((320, 180)) ) ```
Added 2 column values with 0 errors. 2 rows updated, 2 values computed.```python theme={null} # View resized thumbnails with dimensions videos.select( videos.thumbnail_small, videos.thumbnail_small.width, videos.thumbnail_small.height, ).collect() ``` ### Multiple thumbnails with `frame_iterator` For preview strips or timeline thumbnails, use `frame_iterator` to extract multiple frames: ```python theme={null} # Create a view with frames extracted at 0.5 fps (one frame every 2 seconds) frames = pxt.create_view( 'thumbnail_demo/frames', videos, iterator=pxtf.video.frame_iterator(videos.video, fps=0.5), ) ```
Inserting rows into \`frames\`: 17 rows \[00:00, 9736.88 rows/s]```python theme={null} # View extracted frames (multiple per video) frames.select(frames.frame, frames.pos).head(10) ``` ## Explanation **Thumbnail extraction methods:** **Common thumbnail sizes:** ## See also * [Extract frames from videos](/howto/cookbooks/video/video-extract-frames) - Detailed frame extraction guide * [Load media from S3](/howto/cookbooks/data/data-import-s3) - Import videos from cloud storage * [Transform images with PIL](/howto/cookbooks/images/img-pil-transforms) - Resize and crop images # Detect scene changes in videos Source: https://docs.pixeltable.com/howto/cookbooks/video/video-scene-detection
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'scene\_demo'. \### Load sample videos ```python theme={null} # Create a video table videos = pxt.create_table( 'scene_demo/videos', {'video': pxt.Video, 'title': pxt.String} ) # Insert sample videos from S3 videos.insert( [ { 'video': 's3://multimedia-commons/data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4', 'title': 'Sample video 1', } ] ) ```
Created table 'videos'. Inserting rows into \`videos\`: 1 rows \[00:00, 200.53 rows/s] Inserted 1 row with 0 errors. 1 row inserted, 3 values computed.### Detect scenes with content-based detection ```python theme={null} # Detect scenes using content-based detection (best for hard cuts) videos.add_computed_column( scenes_content=videos.video.scene_detect_content( threshold=27.0, # Lower = more sensitive min_scene_len=15, # Minimum frames between cuts ) ) # View detected scenes videos.select(videos.title, videos.scenes_content).collect() ```
Added 1 column value with 0 errors.### Detect fade transitions ```python theme={null} # Detect fade-to-black/white transitions videos.add_computed_column( scenes_fade=videos.video.scene_detect_threshold( threshold=12.0, # Brightness threshold for fades min_scene_len=15, ) ) # View fade-detected scenes videos.select(videos.title, videos.scenes_fade).collect() ```
Added 1 column value with 0 errors.### Adaptive detection for complex videos ```python theme={null} # Adaptive detection adjusts to video content dynamically videos.add_computed_column( scenes_adaptive=videos.video.scene_detect_adaptive( adaptive_threshold=3.0, # Lower = more scenes detected min_scene_len=15, fps=2.0, # Analyze at 2 FPS for speed ) ) # View adaptively-detected scenes videos.select(videos.title, videos.scenes_adaptive).collect() ```
Added 1 column value with 0 errors.## Explanation **Detection methods:** **Output format:** Each method returns a list of scene dictionaries: ```python theme={null} { 'start_time': 5.2, # Scene start in seconds 'start_pts': 156, # Presentation timestamp 'duration': 3.8 # Scene duration in seconds } ``` **Tuning tips:** ## See also * [Extract frames from videos](/howto/cookbooks/video/video-extract-frames) - Get frames at scene boundaries * [Generate thumbnails](/howto/cookbooks/video/video-generate-thumbnails) - Create preview images # Infrastructure Setup Source: https://docs.pixeltable.com/howto/deployment/infrastructure Code organization and storage architecture for Pixeltable deployments ## Code Organization Both deployment strategies require separating schema definition from application code. **Schema Definition (`setup_pixeltable.py`):** * Defines directories, tables, views, computed columns, indexes * Acts as Infrastructure-as-Code for Pixeltable entities * Version controlled in Git * Executed during initial deployment and schema migrations **Application Code (`app.py`, `endpoints.py`, `functions.py`):** * Assumes Pixeltable infrastructure exists * Interacts with tables via `pxt.get_table()` and `@pxt.udf` * Handles missing tables/views gracefully **Configuration (`config.py`):** * Externalizes model IDs, API keys, thresholds, connection strings * Uses environment variables (`.env` + `python-dotenv`) or secrets management * Never hardcodes secrets ```python theme={null} # setup_pixeltable.py import pixeltable as pxt import config pxt.create_dir(config.APP_NAMESPACE, if_exists='ignore') pxt.create_table( f'{config.APP_NAMESPACE}/documents', { 'document': pxt.Document, 'metadata': pxt.Json, 'timestamp': pxt.Timestamp }, if_exists='ignore' # Idempotent: safe for repeated execution ) # --- # app.py import pixeltable as pxt import config docs_table = pxt.get_table(f'{config.APP_NAMESPACE}/documents') if docs_table is None: raise RuntimeError( f"Table '{config.APP_NAMESPACE}/documents' not found. " "Run setup_pixeltable.py first." ) ``` ## Project Structure
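One layout consistent with the files described above (illustrative; adapt the names to your project):

```
my-app/
├── setup_pixeltable.py   # schema definition: directories, tables, views, computed columns, indexes
├── app.py                # application code; assumes the schema exists
├── endpoints.py          # API endpoints
├── functions.py          # shared @pxt.udf definitions
├── config.py             # externalized configuration
├── .env                  # local secrets, never committed
└── requirements.txt
```

The `config.py` referenced above can be a handful of environment lookups. A minimal sketch, assuming `python-dotenv` and the `APP_NAMESPACE` setting used in `setup_pixeltable.py`; the other settings are placeholders:

```python theme={null}
# config.py (illustrative sketch)
import os

from dotenv import load_dotenv  # python-dotenv, as suggested above

load_dotenv()  # reads .env in development; production supplies real env vars

APP_NAMESPACE = os.environ.get('APP_NAMESPACE', 'myapp')
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']  # fail fast if the secret is missing
DETECTION_THRESHOLD = float(os.environ.get('DETECTION_THRESHOLD', '0.5'))
```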
Created directory 'anthropic\_demo'. \## Messages Create a Table: In Pixeltable, create a table with columns to represent your input data and the columns where you want to store the results from Anthropic. ```python theme={null} from pixeltable.functions import anthropic # Create a table in Pixeltable and add a computed column that calls Anthropic t = pxt.create_table('anthropic_demo/chat', {'input': pxt.String}) msgs = [{'role': 'user', 'content': t.input}] t.add_computed_column( output=anthropic.messages( messages=msgs, model='claude-haiku-4-5-20251001', max_tokens=300, model_kwargs={ # Optional dict with parameters for the Anthropic API 'system': 'Respond to the prompt with detailed historical information.', 'temperature': 0.7, }, ) ) ```
Created table 'chat'. Added 0 column values with 0 errors. No rows affected.```python theme={null} # Parse the response into a new column t.add_computed_column(response=t.output.content[0].text) ```
Added 0 column values with 0 errors. No rows affected.```python theme={null} # Start a conversation t.insert( [ { 'input': 'What was the outcome of the 1904 US Presidential election?' } ] ) t.select(t.input, t.response).show() ```
Inserting rows into \`chat\`: 1 rows \[00:00, 203.87 rows/s] Inserted 1 row with 0 errors.### Learn More To learn more about advanced techniques like RAG operations in Pixeltable, check out the [RAG Operations in Pixeltable](/howto/use-cases/rag-operations) tutorial. If you have any questions, don’t hesitate to reach out. # Working with Bedrock in Pixeltable Source: https://docs.pixeltable.com/howto/providers/working-with-bedrock
Created directory 'bedrock\_demo'. \## Messages Create a Table: In Pixeltable, create a table with columns to represent your input data and the columns where you want to store the results from Bedrock. ```python theme={null} from pixeltable.functions import bedrock # Create a table in Pixeltable and add a computed column that calls Bedrock t = pxt.create_table('bedrock_demo/chat', {'input': pxt.String}) t.add_computed_column( output=bedrock.converse( model_id='amazon.nova-pro-v1:0', messages=[{'role': 'user', 'content': [{'text': t.input}]}], ) ) ```
Created table 'chat'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Parse the response into a new column t.add_computed_column(response=t.output.output.message.content[0].text) ```
Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Start a conversation t.insert( [ { 'input': 'What was the outcome of the 1904 US Presidential election?' } ] ) t.select(t.input, t.response).show() ```
Inserted 1 row with 0 errors in 2.75 s (0.36 rows/s)### Learn more To learn more about advanced techniques like RAG operations in Pixeltable, check out the [RAG Operations in Pixeltable](/howto/use-cases/rag-operations) tutorial. If you have any questions, don’t hesitate to reach out. # Working with Deepseek in Pixeltable Source: https://docs.pixeltable.com/howto/providers/working-with-deepseek
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'deepseek\_demo'. \## Messages Create a Table: In Pixeltable, create a table with columns to represent your input data and the columns where you want to store the results from Deepseek. ```python theme={null} from pixeltable.functions import deepseek # Create a table in Pixeltable and add a computed column that calls Deepseek t = pxt.create_table('deepseek_demo/chat', {'input': pxt.String}) msgs = [{'role': 'user', 'content': t.input}] t.add_computed_column( output=deepseek.chat_completions(messages=msgs, model='deepseek-chat') ) ```
Created table 'chat'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Parse the response into a new column t.add_computed_column(response=t.output.choices[0].message.content) ```
Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Start a conversation t.insert( [ { 'input': 'What was the outcome of the 1904 US Presidential election?' } ] ) t.select(t.input, t.response).show() ```
Inserted 1 row with 0 errors in 18.72 s (0.05 rows/s)### Learn more To learn more about advanced techniques like RAG operations in Pixeltable, check out the [RAG Operations in Pixeltable](/howto/use-cases/rag-operations) tutorial. If you have any questions, don’t hesitate to reach out. # Working with Microsoft Fabric Source: https://docs.pixeltable.com/howto/providers/working-with-fabric
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'fal\_demo'. \## Text-to-image generation with FLUX Schnell Let’s start by using fal.ai’s FLUX Schnell model, which is optimized for fast image generation. We’ll create a table to store prompts and generated images. ```python theme={null} from pixeltable.functions import fal # Create a table for image generation t = pxt.create_table('fal_demo/images', {'prompt': pxt.String}) # Add a computed column that calls the FLUX Schnell model t.add_computed_column( response=fal.run( input={'prompt': t.prompt}, app='fal-ai/flux/schnell' ) ) ```
Created table 'images'. Added 0 column values with 0 errors in 0.01 s No rows affected.Now let’s insert some prompts and see the results: ```python theme={null} # Insert a few prompts t.insert( [ { 'prompt': 'A serene mountain landscape at sunset with a crystal clear lake' }, { 'prompt': 'A friendly robot teaching a class of kittens to code' }, {'prompt': 'An underwater city with bioluminescent architecture'}, ] ) ```
Inserted 3 rows with 0 errors in 1.77 s (1.70 rows/s) 3 rows inserted.Let’s examine the structure of the response: ```python theme={null} t.select(t.prompt, t.response).head(1) ``` We can see that fal.ai returns a JSON response with an `images` array. Each image has a `url` field. Let’s extract and display the images: ```python theme={null} # Add a computed column to extract the image URL and convert it to an Image type t.add_computed_column( image=t.response['images'][0]['url'].astype(pxt.Image) ) # Display the prompts and images t.select(t.prompt, t.image).head() ```
Added 3 column values with 0 errors in 0.04 s (85.38 rows/s)## Advanced image generation with Fast SDXL fal.ai also offers Fast SDXL, which provides more control over image generation parameters. Let’s create a new table to explore these capabilities. ```python theme={null} # Create a table with more parameters sdxl_t = pxt.create_table( 'fal_demo/sdxl_images', { 'prompt': pxt.String, 'negative_prompt': pxt.String, 'steps': pxt.Int, }, ) # Add a computed column with more parameters sdxl_t.add_computed_column( response=fal.run( input={ 'prompt': sdxl_t.prompt, 'negative_prompt': sdxl_t.negative_prompt, 'image_size': 'square_hd', # 1024x1024 'num_inference_steps': sdxl_t.steps, }, app='fal-ai/fast-sdxl', ) ) # Extract the image sdxl_t.add_computed_column( image=sdxl_t.response['images'][0]['url'].astype(pxt.Image) ) ```
Created table 'sdxl\_images'. Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Insert prompts with different parameters sdxl_t.insert( [ { 'prompt': 'A majestic lion in a savanna at golden hour, photorealistic', 'negative_prompt': 'cartoon, illustration, drawing', 'steps': 25, }, { 'prompt': 'A futuristic cityscape with flying cars and neon lights', 'negative_prompt': 'blurry, low quality', 'steps': 30, }, ] ) ```
Inserted 2 rows with 0 errors in 5.23 s (0.38 rows/s) 2 rows inserted.```python theme={null} # Display the results sdxl_t.select(sdxl_t.prompt, sdxl_t.image).head() ``` ## Generating multiple images per prompt You can also generate multiple variations of the same prompt in a single request: ```python theme={null} # Create a table for multiple image generation multi_t = pxt.create_table( 'fal_demo/multi_images', {'prompt': pxt.String} ) # Generate 3 variations of each prompt multi_t.add_computed_column( response=fal.run( input={'prompt': multi_t.prompt, 'num_images': 3}, app='fal-ai/flux/schnell', ) ) # Extract the first image (you could create columns for all three) multi_t.add_computed_column( image_1=multi_t.response['images'][0]['url'].astype(pxt.Image) ) multi_t.add_computed_column( image_2=multi_t.response['images'][1]['url'].astype(pxt.Image) ) multi_t.add_computed_column( image_3=multi_t.response['images'][2]['url'].astype(pxt.Image) ) ```
Created table 'multi\_images'. Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Insert a prompt multi_t.insert( [{'prompt': 'A steampunk mechanical butterfly on a brass flower'}] ) ```
Inserted 1 row with 0 errors in 1.14 s (0.88 rows/s) 1 row inserted.```python theme={null} # Display all three variations multi_t.select(multi_t.image_1, multi_t.image_2, multi_t.image_3).head() ``` ## Using Higher Quality Models For higher quality generation, you can use models like `fal-ai/flux/dev` which produce better results but take more time: ```python theme={null} # Create a table using FLUX Dev dev_t = pxt.create_table('fal_demo/flux_dev', {'prompt': pxt.String}) # Use FLUX Dev model for higher quality dev_t.add_computed_column( response=fal.run( input={'prompt': dev_t.prompt}, app='fal-ai/flux/dev' ) ) dev_t.add_computed_column( image=dev_t.response['images'][0]['url'].astype(pxt.Image) ) ```
Created table 'flux\_dev'. Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Insert a prompt (note: FLUX Dev may take longer but produces higher quality results) dev_t.insert( [ { 'prompt': 'A highly detailed oil painting of a wizard casting a spell in an ancient library' } ] ) ```
Inserted 1 row with 0 errors in 1.74 s (0.58 rows/s) 1 row inserted.```python theme={null} # Display the result dev_t.select(dev_t.prompt, dev_t.image).head() ``` ## Exploring Available Models fal.ai offers a wide variety of models. Here are some popular ones you can try: ### Image Generation Models * `fal-ai/flux/schnell` - Fast FLUX model for quick image generation * `fal-ai/flux/dev` - Higher quality FLUX model (slower) * `fal-ai/fast-sdxl` - Fast Stable Diffusion XL * `fal-ai/stable-diffusion-v3-medium` - Stable Diffusion 3 Medium ### Other Models * `fal-ai/fast-lightning-sdxl` - Ultra-fast SDXL variant * `fal-ai/recraft-v3` - Recraft V3 for design-focused generation To use a different model, simply change the `app` parameter in your `fal.run()` call. ## Working with Batch Processing Pixeltable’s computed columns make it easy to process multiple images in batch. Let’s create a larger dataset: ```python theme={null} # Create a batch processing table batch_t = pxt.create_table( 'fal_demo/batch', {'category': pxt.String, 'description': pxt.String} ) # Create a prompt by combining category and description batch_t.add_computed_column( prompt=pxt.functions.string.format( 'A {} that is {}', batch_t.category, batch_t.description ) ) # Generate images batch_t.add_computed_column( response=fal.run( input={'prompt': batch_t.prompt}, app='fal-ai/flux/schnell' ) ) batch_t.add_computed_column( image=batch_t.response['images'][0]['url'].astype(pxt.Image) ) ```
Created table 'batch'. Added 0 column values with 0 errors in 0.02 s Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Insert a batch of prompts batch_t.insert( [ {'category': 'landscape', 'description': 'peaceful and zen-like'}, { 'category': 'portrait', 'description': 'mysterious and ethereal', }, { 'category': 'abstract art', 'description': 'colorful and energetic', }, { 'category': 'architecture', 'description': 'modern and minimalist', }, {'category': 'animal', 'description': 'cute and fluffy'}, ] ) ```
Inserted 5 rows with 0 errors in 1.69 s (2.96 rows/s) 5 rows inserted.

```python theme={null}
# View all results
batch_t.select(
    batch_t.category, batch_t.description, batch_t.image
).show()
```

## Tips and Best Practices

1. **Rate Limiting**: fal.ai has rate limits. Pixeltable respects these limits by default. You can configure custom rate limits in your Pixeltable config.
2. **Model Selection**:
   * Use `flux/schnell` for fast prototyping and when speed is critical
   * Use `flux/dev` when you need higher quality and can afford longer generation times
   * Use `fast-sdxl` for a good balance of speed and quality
3. **Prompt Engineering**: Good prompts lead to better results. Be specific and descriptive.
4. **Negative Prompts**: Use negative prompts to exclude unwanted elements from your images.
5. **Caching**: Pixeltable automatically caches results, so re-running the same prompt won’t incur additional costs.

### Learn more

* fal.ai Documentation: [https://fal.ai/docs](https://fal.ai/docs)
* Pixeltable Documentation: [https://docs.pixeltable.com](https://docs.pixeltable.com)
* To learn more about advanced techniques like RAG operations in Pixeltable, check out the [RAG Operations in Pixeltable](/howto/use-cases/rag-operations) tutorial. If you have any questions, don’t hesitate to reach out on our [Discord community](https://pixeltable.com/discord)!

# Working with Fireworks AI in Pixeltable

Source: https://docs.pixeltable.com/howto/providers/working-with-fireworks
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'fireworks\_demo'. \## Completions Create a Table: In Pixeltable, create a table with columns to represent your input data and the columns where you want to store the results from Fireworks. ```python theme={null} from pixeltable.functions.fireworks import chat_completions # Create a table in Pixeltable and pick a model hosted on Fireworks with some parameters t = pxt.create_table('fireworks_demo/chat', {'input': pxt.String}) messages = [{'role': 'user', 'content': t.input}] t.add_computed_column( output=chat_completions( messages=messages, model='accounts/fireworks/models/llama-v3p3-70b-instruct', model_kwargs={ # Optional dict with parameters for the Fireworks API 'max_tokens': 300, 'top_k': 40, 'top_p': 0.9, 'temperature': 0.7, }, ) ) ```
Created table 'chat'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Parse the bot_response into a new column t.add_computed_column(response=t.output.choices[0].message.content) ```
Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Start a conversation t.insert( [{'input': 'Can you tell me who was President of the US in 1961?'}] ) t.select(t.input, t.response).show() ```
Inserted 1 row with 0 errors in 2.15 s (0.47 rows/s)

### Learn more

To learn more about advanced techniques like RAG operations in Pixeltable, check out the [RAG Operations in Pixeltable](/howto/use-cases/rag-operations) tutorial. If you have any questions, don’t hesitate to reach out.

# Working with Gemini in Pixeltable

Source: https://docs.pixeltable.com/howto/providers/working-with-gemini
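The setup cell for this section is not shown in the extract. A minimal sketch, assuming the `google-genai` package (which provides the `google.genai.types` import used below) and a Google AI Studio key exposed via the `GEMINI_API_KEY` environment variable (the variable name is an assumption):

```python theme={null}
# Minimal setup sketch (the GEMINI_API_KEY variable name is an assumption)
%pip install -qU pixeltable google-genai

import os
os.environ['GEMINI_API_KEY'] = 'your-google-ai-studio-key'

import pixeltable as pxt

pxt.drop_dir('gemini_demo', force=True)
pxt.create_dir('gemini_demo')
```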
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'gemini\_demo'. \## Generate content Create a Table: In Pixeltable, create a table with columns to represent your input data and the columns where you want to store the results from Gemini. ```python theme={null} from google.genai.types import GenerateContentConfigDict from pixeltable.functions import gemini # Create a table in Pixeltable and pick a model hosted on Google AI Studio with some parameters t = pxt.create_table('gemini_demo/text', {'input': pxt.String}) config = GenerateContentConfigDict( stop_sequences=['\n'], max_output_tokens=300, temperature=1.0, top_p=0.95, top_k=40, ) t.add_computed_column( output=gemini.generate_content( t.input, model='gemini-2.5-flash', config=config ) ) ```
Created table 'text'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Ask Gemini to generate some content based on the input t.insert( [ {'input': 'Write a story about a magic backpack.'}, {'input': 'Tell me a science joke.'}, ] ) ```
Inserted 2 rows with 0 errors in 1.43 s (1.39 rows/s) 2 rows inserted.```python theme={null} # Parse the response into a new column t.add_computed_column( response=t.output['candidates'][0]['content']['parts'][0]['text'] ) t.select(t.input, t.response).head() ```
Added 2 column values with 0 errors in 0.03 s (62.79 rows/s)## Generate images with Imagen ```python theme={null} from google.genai.types import GenerateImagesConfigDict images_t = pxt.create_table('gemini_demo/images', {'prompt': pxt.String}) config = GenerateImagesConfigDict(aspect_ratio='16:9') images_t.add_computed_column( generated_image=gemini.generate_images( images_t.prompt, model='imagen-4.0-generate-001', config=config ) ) ```
Created table 'images'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} images_t.insert( [{'prompt': 'A friendly dinosaur playing tennis in a cornfield'}] ) ```
Inserted 1 row with 0 errors in 9.41 s (0.11 rows/s) 1 row inserted.```python theme={null} images_t.head() ``` ## Generate video with Veo ```python theme={null} videos_t = pxt.create_table('gemini_demo/videos', {'prompt': pxt.String}) videos_t.add_computed_column( generated_video=gemini.generate_videos( videos_t.prompt, model='veo-2.0-generate-001' ) ) ```
Created table 'videos'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} videos_t.insert( [ { 'prompt': 'A giant pixel floating over the open ocean in a sea of data' } ] ) ```
Inserted 1 row with 0 errors in 46.23 s (0.02 rows/s) 1 row inserted.```python theme={null} videos_t.head() ``` ## Generate Video from an existing Image We’ll add an additional computed column to our existing `images_t` to animate the generated images. ```python theme={null} images_t.add_computed_column( generated_video=gemini.generate_videos( image=images_t.generated_image, model='veo-2.0-generate-001' ) ) ```
Added 1 column value with 0 errors in 40.00 s (0.03 rows/s) 1 row updated.```python theme={null} images_t.head() ``` ### Learn more To learn more about advanced techniques like RAG operations in Pixeltable, check out the [RAG Operations in Pixeltable](/howto/use-cases/rag-operations) tutorial. If you have any questions, don’t hesitate to reach out. # Working with Groq in Pixeltable Source: https://docs.pixeltable.com/howto/providers/working-with-groq
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'groq_demo'.

## Chat Completions

Create a Table: In Pixeltable, create a table with columns to represent your input data and the columns where you want to store the results from Groq.

```python theme={null}
from pixeltable.functions import groq

# Create a table in Pixeltable and add a computed column that calls Groq
t = pxt.create_table('groq_demo/chat', {'input': pxt.String})
messages = [{'role': 'user', 'content': t.input}]
t.add_computed_column(
    output=groq.chat_completions(
        messages=messages,
        model='llama-3.3-70b-versatile',
        model_kwargs={
            # Optional dict with parameters for the Groq API
            'max_tokens': 300,
            'top_p': 0.9,
            'temperature': 0.7,
        },
    )
)
```
Created table 'chat'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Parse the response into a new column t.add_computed_column(response=t.output.choices[0].message.content) ```
Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Start a conversation t.insert( [{'input': 'How many islands are in the Aleutian island chain?'}] ) t.select(t.input, t.response).head() ```
Inserted 1 row with 0 errors in 1.16 s (0.86 rows/s)

### Learn more

To learn more about advanced techniques like RAG operations in Pixeltable, check out the [RAG Operations in Pixeltable](/howto/use-cases/rag-operations) tutorial. If you have any questions, don’t hesitate to reach out.

# Working with Hugging Face

Source: https://docs.pixeltable.com/howto/providers/working-with-hugging-face
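The cell that imported the 382-row `images` table below is not included in this extract, so the specific dataset is not reproduced here. A minimal setup sketch for this section, assuming only the two packages the examples below rely on:

```python theme={null}
# Minimal setup sketch for the Hugging Face examples below
%pip install -qU pixeltable datasets

import datasets
import pixeltable as pxt

pxt.drop_dir('hf_demo', force=True)
pxt.create_dir('hf_demo')
```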
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'hf\_demo'. Created table 'images'. Inserting rows into \`images\`: 100 rows \[00:00, 310.24 rows/s] Inserting rows into \`images\`: 100 rows \[00:00, 353.22 rows/s] Inserting rows into \`images\`: 100 rows \[00:00, 368.40 rows/s] Inserting rows into \`images\`: 82 rows \[00:00, 567.89 rows/s] Inserted 382 rows with 0 errors.```python theme={null} images.head(3) ``` ## Working with Dataset Splits When importing a `DatasetDict` (which contains multiple splits like train/test), use `extra_args={'column_name_for_split': 'split'}` to preserve split information in a column. ```python theme={null} # Load a dataset with multiple splits imdb = datasets.load_dataset('stanfordnlp/imdb') # Import all splits, storing split info in 'split' column reviews = pxt.create_table( 'hf_demo/reviews', source=imdb, extra_args={'column_name_for_split': 'split'}, ) ``` ```python theme={null} # Query by split reviews.where(reviews.split == 'train').limit(3).select( reviews.text, reviews.label, reviews.split ).collect() ``` ```python theme={null} # Count rows per split reviews.group_by(reviews.split).select( reviews.split, count=pxt.functions.count(reviews.text) ).collect() ``` Using `schema_overrides` for Embeddings When importing datasets with pre-computed embeddings (common in RAG), use `schema_overrides` to specify the exact array shape: ```python theme={null} # Wikipedia with pre-computed embeddings - specify array shape wiki_ds = ( datasets.load_dataset( 'Cohere/wikipedia-2023-11-embed-multilingual-v3', 'simple', split='train', streaming=True, ) .select_columns(['url', 'title', 'text', 'emb']) .take(50) ) wiki = pxt.create_table( 'hf_demo/wiki_embeddings', source=wiki_ds, schema_overrides={'emb': pxt.Array[(1024,), pxt.Float]}, ) ``` ```python theme={null} wiki.select(wiki.title, wiki.emb).limit(2).collect() ``` ## Streaming Large Datasets For very large datasets, use `streaming=True` to filter and sample before importing: ```python theme={null} # Stream, filter, and sample before importing streaming_ds = datasets.load_dataset( 'stanfordnlp/imdb', split='train', streaming=True ) positive_stream = streaming_ds.filter(lambda x: x['label'] == 1).take(50) ``` ```python theme={null} positive_samples = pxt.create_table( 'hf_demo/positive_samples', source=positive_stream ) ``` ```python theme={null} positive_samples.select( positive_samples.text, positive_samples.label ).limit(2).collect() ``` ## Importing Audio Datasets Audio datasets work seamlessly - Pixeltable stores audio files locally: ```python theme={null} # Import a small audio dataset audio_ds = datasets.load_dataset( 'hf-internal-testing/librispeech_asr_dummy', 'clean', split='validation', ) audio_table = pxt.create_table('hf_demo/audio_samples', source=audio_ds) audio_table.select(audio_table.audio, audio_table.text).limit(2).collect() ```
Created table 'audio\_samples'. Inserting rows into \`audio\_samples\`: 73 rows \[00:00, 3960.27 rows/s] Inserted 73 rows with 0 errors.## Inserting More Data Use `table.insert()` to add more data from a HuggingFace dataset to an existing table: ```python theme={null} # Insert more data from the same or similar dataset more_audio = datasets.load_dataset( 'hf-internal-testing/librispeech_asr_dummy', 'clean', split='validation', ).select(range(5)) audio_table.insert(more_audio) audio_table.count() ```
Inserting rows into \`audio\_samples\`: 5 rows \[00:00, 3186.68 rows/s] Inserted 5 rows with 0 errors. 78## Type Mappings Reference ## Using Hugging Face Models Pixeltable integrates with Hugging Face models for embeddings and inference, running locally without API keys. ### Image Embeddings with CLIP ```python theme={null} from pixeltable.functions.huggingface import clip # Add CLIP embedding index for cross-modal image search images.add_embedding_index( 'Image', embedding=clip.using(model_id='openai/clip-vit-base-patch32') ) # Search images using text sim = images.Image.similarity(string='anime character with red clothes') images.order_by(sim, asc=False).limit(3).select( images.Image, images.Name, sim=sim ).collect() ``` ### Text Embeddings with Sentence Transformers ```python theme={null} from pixeltable.functions.huggingface import sentence_transformer # Create table with text embedding index sample_reviews = pxt.create_table( 'hf_demo/sample_reviews', source=datasets.load_dataset('stanfordnlp/imdb', split='test').select( range(100) ), ) sample_reviews.add_embedding_index( 'text', string_embed=sentence_transformer.using(model_id='all-MiniLM-L6-v2'), ) # Semantic search query = 'great acting and cinematography' sim = sample_reviews.text.similarity(string=query) sample_reviews.order_by(sim, asc=False).limit(3).select( sample_reviews.text, sim=sim ).collect() ```
Created table 'sample\_reviews'. Inserting rows into \`sample\_reviews\`: 100 rows \[00:00, 21625.70 rows/s] Inserted 100 rows with 0 errors.### More Hugging Face Models Pixeltable supports many more HuggingFace models including: * **ASR**: `automatic_speech_recognition()` - transcribe audio * **Translation**: `translation()` - translate between languages * **Text Generation**: `text_generation()` - generate text completions * **Image Classification**: `vit_for_image_classification()` - classify images * **Object Detection**: `detr_for_object_detection()` - detect objects in images See the SDK reference below for the complete list. ## See Also * [HuggingFace SDK Reference](/sdk/latest/huggingface) - Full list of models: ASR, translation, text generation, image classification, etc. * [Working with embedding indexes](../../platform/embedding-indexes) - Index configuration # Working with Jina AI in Pixeltable Source: https://docs.pixeltable.com/howto/providers/working-with-jina
Created directory 'jina\_demo'. \## Text Embeddings Jina AI provides frontier multilingual embedding models for semantic search and RAG applications. The `jina-embeddings-v3` model supports 89+ languages and achieves state-of-the-art performance. ```python theme={null} from pixeltable.functions import jina # Create a table for document embeddings docs_t = pxt.create_table('jina_demo.documents', {'text': pxt.String}) # Add computed column with Jina embeddings # task='retrieval.passage' optimizes embeddings for documents to be searched docs_t.add_computed_column( embedding=jina.embeddings( docs_t.text, model='jina-embeddings-v3', task='retrieval.passage' ) ) ```
Created table 'documents'. Added 0 column values with 0 errors. No rows affected.```python theme={null} # Insert some sample documents documents = [ 'The Mediterranean diet emphasizes fish, olive oil, and vegetables, believed to reduce chronic diseases.', 'Photosynthesis in plants converts light energy into glucose and produces essential oxygen.', '20th-century innovations, from radios to smartphones, centered on electronic advancements.', 'Rivers provide water, irrigation, and habitat for aquatic species, vital for ecosystems.', "Apple's conference call to discuss fourth fiscal quarter results is scheduled for Thursday, November 2, 2023.", "Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' endure in literature.", ] docs_t.insert({'text': doc} for doc in documents) ```
Inserting rows into \`documents\`: 6 rows \[00:00, 1394.00 rows/s] Inserted 6 rows with 0 errors. 6 rows inserted, 12 values computed.```python theme={null} # View the embeddings docs_t.select(docs_t.text, docs_t.embedding).head(3) ``` ## Multilingual Embeddings Jina AI models excel at multilingual text. The same model can embed text in different languages into the same semantic space. ```python theme={null} # Create a table for multilingual content multilingual_t = pxt.create_table( 'jina_demo.multilingual', {'text': pxt.String, 'language': pxt.String} ) multilingual_t.add_computed_column( embedding=jina.embeddings( multilingual_t.text, model='jina-embeddings-v3', task='text-matching', ) ) # Insert texts in different languages (all about organic skincare) multilingual_t.insert( [ { 'text': 'Organic skincare for sensitive skin with aloe vera and chamomile.', 'language': 'English', }, { 'text': 'Bio-Hautpflege für empfindliche Haut mit Aloe Vera und Kamille.', 'language': 'German', }, { 'text': 'Cuidado de la piel orgánico para piel sensible con aloe vera y manzanilla.', 'language': 'Spanish', }, { 'text': '针对敏感肌专门设计的天然有机护肤产品', 'language': 'Chinese', }, ] ) multilingual_t.select( multilingual_t.language, multilingual_t.text ).collect() ```
Created table 'multilingual'. Added 0 column values with 0 errors. Inserting rows into \`multilingual\`: 4 rows \[00:00, 736.23 rows/s] Inserted 4 rows with 0 errors.## Embedding Index for Similarity Search You can use Jina AI embeddings with Pixeltable’s embedding index for efficient similarity search. ```python theme={null} # Create a table with an embedding index search_t = pxt.create_table('jina_demo.search', {'text': pxt.String}) # Add embedding index for similarity search embed_fn = jina.embeddings.using( model='jina-embeddings-v3', task='retrieval.passage' ) search_t.add_embedding_index('text', string_embed=embed_fn) # Insert documents search_t.insert({'text': doc} for doc in documents) ```
Created table 'search'. Inserting rows into \`search\`: 6 rows \[00:00, 565.03 rows/s] Inserted 6 rows with 0 errors. 6 rows inserted, 12 values computed.```python theme={null} # Perform similarity search sim = search_t.text.similarity( string='What are the health benefits of Mediterranean food?' ) search_t.order_by(sim, asc=False).limit(3).select( search_t.text, score=sim ).collect() ``` ## Reranking Jina AI’s reranker models can improve search relevance by reordering results based on semantic similarity to the query. ```python theme={null} # Create a table for reranking queries rerank_t = pxt.create_table( 'jina_demo.rerank', {'query': pxt.String, 'documents': pxt.Json}, if_exists='replace', ) # Add computed column for reranking rerank_t.add_computed_column( reranked=jina.rerank( rerank_t.query, rerank_t.documents, model='jina-reranker-v2-base-multilingual', top_n=3, return_documents=True, ) ) # Insert a query with candidate documents rerank_t.insert( query="When is Apple's conference call scheduled?", documents=documents, ) ```
Created table 'rerank'. Added 0 column values with 0 errors. Inserting rows into \`rerank\`: 1 rows \[00:00, 543.16 rows/s] Inserted 1 row with 0 errors. 1 row inserted, 2 values computed.```python theme={null} # View the reranked results result = rerank_t.select(rerank_t.reranked).collect() result['reranked'][0] ```
{'usage': {'total_tokens': 221},
 'results': [{'index': 4,
   'document': "Apple's conference call to discuss fourth fiscal quarter results is scheduled for Thursday, November 2, 2023.",
   'relevance_score': 0.64511991},
  {'index': 2,
   'document': '20th-century innovations, from radios to smartphones, centered on electronic advancements.',
   'relevance_score': 0.03846619},
  {'index': 5,
   'document': "Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' endure in literature.",
   'relevance_score': 0.02517884}]}
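If you want the top-ranked document available as its own column rather than nested inside the JSON above, you can pull it out with Pixeltable’s JSON path syntax, following the same pattern used elsewhere in these guides. A small sketch (the `top_document` and `top_score` column names are purely illustrative):

```python theme={null}
# Extract the best-ranked document and its score into separate computed columns
# (illustrative column names; builds on the `reranked` column defined above)
rerank_t.add_computed_column(top_document=rerank_t.reranked.results[0].document)
rerank_t.add_computed_column(top_score=rerank_t.reranked.results[0].relevance_score)

rerank_t.select(rerank_t.query, rerank_t.top_document, rerank_t.top_score).collect()
```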
## Learn More
* [Jina AI Documentation](https://jina.ai/)
* [Jina Embeddings](https://jina.ai/embeddings/)
* [Jina Reranker](https://jina.ai/reranker/)
* [API Rate Limits](https://jina.ai/api-dashboard/rate-limit)
# Working with llama.cpp in Pixeltable
Source: https://docs.pixeltable.com/howto/providers/working-with-llama-cpp
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'llama\_demo'. Created table 'chat'.Next, we add a computed column that calls the Pixeltable `create_chat_completion` UDF, which adapts the corresponding llama.cpp API call. In our examples, we’ll use pretrained models from the Hugging Face repository. llama.cpp makes it easy to do this by specifying a repo\_id (from the URL of the model) and filename from the model repo; the model will then be downloaded and cached automatically. (If this is your first time using Pixeltable, the Pixeltable Fundamentals tutorial contains more details about table creation, computed columns, and UDFs.) For this demo we’ll use `Qwen2.5-0.5B`, a very small (0.5-billion parameter) model that still produces decent results. We’ll use `Q5_K_M` (5-bit) quantization, which gives an excellent balance of quality and efficiency. ```python theme={null} # Add a computed column that uses llama.cpp for chat completion # against the input. messages = [ {'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': t.input}, ] t.add_computed_column( result=llama_cpp.create_chat_completion( messages, repo_id='Qwen/Qwen2.5-0.5B-Instruct-GGUF', repo_filename='*q5_k_m.gguf', ) ) # Extract the output content from the JSON structure returned # by llama_cpp. t.add_computed_column(output=t.result.choices[0].message.content) ```
Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.## Test chat completion Let’s try a simple query: ```python theme={null} # Test with a simple question t.insert( [ {'input': 'What is the capital of France?'}, {'input': 'What are some edible species of fish?'}, {'input': 'Who are the most prominent classical composers?'}, ] ) ```
Inserted 3 rows with 0 errors in 6.74 s (0.44 rows/s) 3 rows inserted.```python theme={null} t.select(t.input, t.output).collect() ``` ## Comparing models Local model frameworks like `llama.cpp` make it easy to compare the output of different models. Let’s try comparing the output from `Qwen` against a somewhat larger model, `Llama-3.2-1B`. As always, when we add a new computed column to our table, it’s automatically evaluated against the existing table rows. ```python theme={null} t.add_computed_column( result_l3=llama_cpp.create_chat_completion( messages, repo_id='bartowski/Llama-3.2-1B-Instruct-GGUF', repo_filename='*Q5_K_M.gguf', ) ) t.add_computed_column(output_l3=t.result_l3.choices[0].message.content) t.select(t.input, t.output, t.output_l3).collect() ```
Added 3 column values with 0 errors in 6.32 s (0.47 rows/s) Added 3 column values with 0 errors in 0.03 s (113.79 rows/s)Just for fun, let’s try running against a different system prompt with a different persona. ```python theme={null} messages_teacher = [ { 'role': 'system', 'content': 'You are a patient school teacher. ' 'Explain concepts simply and clearly.', }, {'role': 'user', 'content': t.input}, ] t.add_computed_column( result_teacher=llama_cpp.create_chat_completion( messages_teacher, repo_id='bartowski/Llama-3.2-1B-Instruct-GGUF', repo_filename='*Q5_K_M.gguf', ) ) t.add_computed_column( output_teacher=t.result_teacher.choices[0].message.content ) t.select(t.input, t.output_teacher).collect() ```
Added 3 column values with 0 errors in 7.70 s (0.39 rows/s) Added 3 column values with 0 errors in 0.02 s (143.54 rows/s)

## Additional Resources

* [Pixeltable Documentation](https://docs.pixeltable.com/)
* [llama.cpp GitHub](https://github.com/ggerganov/llama.cpp)

# Working with Mistral AI in Pixeltable

Source: https://docs.pixeltable.com/howto/providers/working-with-mistralai
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'mistralai\_demo'. \## Messages Create a Table: In Pixeltable, create a table with columns to represent your input data and the columns where you want to store the results from Mistral. ```python theme={null} from pixeltable.functions.mistralai import chat_completions # Create a table in Pixeltable and add a computed column that calls Mistral AI t = pxt.create_table('mistralai_demo/chat', {'input': pxt.String}) messages = [{'role': 'user', 'content': t.input}] t.add_computed_column( output=chat_completions( messages=messages, model='mistral-small-latest', model_kwargs={ # Optional dict with parameters for the Mistral API 'max_tokens': 300, 'top_p': 0.9, 'temperature': 0.7, }, ) ) ```
Created table 'chat'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Parse the response into a new column t.add_computed_column(response=t.output.choices[0].message.content) ```
Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Start a conversation t.insert( [ { 'input': 'What three species of fish have the highest mercury content?' } ] ) t.select(t.input, t.response).show() ```
Inserted 1 row with 0 errors in 2.31 s (0.43 rows/s)

### Learn more

To learn more about advanced techniques like RAG operations in Pixeltable, check out the [RAG Operations in Pixeltable](/howto/use-cases/rag-operations) tutorial. If you have any questions, don’t hesitate to reach out.

# Working with Ollama in Pixeltable

Source: https://docs.pixeltable.com/howto/providers/working-with-ollama
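The quoted model response at the start of the next block comes from a direct call to a locally running Ollama server, made before Pixeltable is involved; that cell is missing from this extract. A minimal sketch of such a call, assuming the `ollama` Python client and a locally pulled `qwen2.5:0.5b` model (both the prompt and the client usage are reconstructions, not the original cell):

```python theme={null}
# Hypothetical direct Ollama call (requires a running Ollama server)
import ollama

ollama.pull('qwen2.5:0.5b')  # download the model if it isn't available locally
resp = ollama.chat(
    model='qwen2.5:0.5b',
    messages=[{'role': 'user', 'content': 'What is the capital of Missouri?'}],
)
resp['message']['content']
```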
'The capital of Missouri is Jefferson City. Jefferson City was originally named after the French explorer Pierre-Jacques Houget and the American statesman Thomas Jefferson, who lived in this city from 1764 to 1805. It became the seat of government for most of Jefferson County when it was established in 1836. In more recent times, the name has changed several times due to various political changes and legal changes.'## Install Pixeltable Now, let’s install Pixeltable and create a table for the demo. ```python theme={null} %pip install -qU pixeltable ``` ```python theme={null} import pixeltable as pxt from pixeltable.functions.ollama import chat pxt.drop_dir('ollama_demo', force=True) pxt.create_dir('ollama_demo') ```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'ollama\_demo'. \```python theme={null} t = pxt.create_table('ollama_demo/chat', {'input': pxt.String}) messages = [{'role': 'user', 'content': t.input}] # Add a computed column that runs the model to generate responses t.add_computed_column( output=chat( messages=messages, model='qwen2.5:0.5b', # These parameters are optional and can be used to tune model behavior: options={'max_tokens': 300, 'top_p': 0.9, 'temperature': 0.5}, ) ) ```
Created table 'chat'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Extract the message content into a separate column t.add_computed_column(response=t.output.message.content) ```
Added 0 column values with 0 errors in 0.01 s No rows affected.We can insert our input prompts into the table now. As always, Pixeltable automatically updates the computed columns by calling the relevant Ollama endpoint. ```python theme={null} # Start a conversation t.insert(input='What are the most popular services for LLM inference?') t.select(t.input, t.response).show() ```
Inserted 1 row with 0 errors in 1.28 s (0.78 rows/s)### Learn More To learn more about advanced techniques like RAG operations in Pixeltable, check out the [RAG Operations in Pixeltable](/howto/use-cases/rag-operations) tutorial. If you have any questions, don’t hesitate to reach out. # Working with OpenAI in Pixeltable Source: https://docs.pixeltable.com/howto/providers/working-with-openai
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'openai\_demo'. \## Chat completions Create a Table: In Pixeltable, create a table with columns to represent your input data and the columns where you want to store the results from OpenAI. ```python theme={null} from pixeltable.functions import openai # Create a table in Pixeltable and add a computed column that calls OpenAI t = pxt.create_table('openai_demo/chat', {'input': pxt.String}) messages = [{'role': 'user', 'content': t.input}] t.add_computed_column( output=openai.chat_completions( messages=messages, model='gpt-4o-mini', model_kwargs={ # Optional dict with parameters for the OpenAI API 'max_tokens': 300, 'top_p': 0.9, 'temperature': 0.7, }, ) ) ```
Created table 'chat'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Parse the response into a new column t.add_computed_column(response=t.output.choices[0].message.content) ```
Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Start a conversation t.insert( [{'input': 'How many islands are in the Aleutian island chain?'}] ) t.select(t.input, t.response).head() ```
Inserted 1 row with 0 errors in 3.39 s (0.29 rows/s)## Embeddings Note: OpenAI Embeddings API is not available with free tier API keys ```python theme={null} emb_t = pxt.create_table('openai_demo/embeddings', {'input': pxt.String}) emb_t.add_computed_column( embedding=openai.embeddings( input=emb_t.input, model='text-embedding-3-small' ) ) ```
Created table 'embeddings'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} emb_t.insert( [{'input': 'OpenAI provides a variety of embeddings models.'}] ) ```
Inserted 1 row with 0 errors in 1.03 s (0.97 rows/s) 1 row inserted.```python theme={null} emb_t.head() ``` ## Image generations ```python theme={null} image_t = pxt.create_table('openai_demo/images', {'input': pxt.String}) image_t.add_computed_column( img=openai.image_generations(image_t.input, model='dall-e-2') ) ```
Created table 'images'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} image_t.insert( [ { 'input': 'A giant Pixel floating in the open ocean in a sea of data' } ] ) ```
Inserted 1 row with 0 errors in 11.59 s (0.09 rows/s) 1 row inserted.```python theme={null} image_t ``` ```python theme={null} image_t.head() ``` ## Audio Transcription ```python theme={null} audio_t = pxt.create_table('openai_demo/audio', {'input': pxt.Audio}) audio_t.add_computed_column( result=openai.transcriptions( audio_t.input, model='whisper-1', model_kwargs={ 'language': 'en', 'prompt': 'Transcribe the contents of this recording.', }, ) ) ```
Created table 'audio'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} url = ( 'https://github.com/pixeltable/pixeltable/raw/release/tests/data/audio/' 'jfk_1961_0109_cityuponahill-excerpt.flac' ) audio_t.insert([{'input': url}]) ```
Inserted 1 row with 0 errors in 5.42 s (0.18 rows/s) 1 row inserted.```python theme={null} audio_t.head() ``` ```python theme={null} audio_t.head()[0]['result']['text'] ```
'Allow me to illustrate. During the last 60 days, I have been at the task of constructing an administration. It has been a long and deliberate process. Some have counseled greater speed. Others have counseled more expedient tests. But I have been guided by the standard John Winthrop set before his shipmates on the flagship Arabella 331 years ago, as they too faced the task of building a new government on a perilous frontier. We must always consider, he said, that we shall be as a city upon a hill. The eyes of all peoples are upon us. Today the eyes of all people are truly upon us. And our governments, in every branch, at every level,'### Learn more To learn more about advanced techniques like RAG operations in Pixeltable, check out the [RAG Operations in Pixeltable](/howto/use-cases/rag-operations) tutorial. If you have any questions, don’t hesitate to reach out. # Working with OpenRouter in Pixeltable Source: https://docs.pixeltable.com/howto/providers/working-with-openrouter
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'openrouter\_demo'. \## Chat completions Create a Table: In Pixeltable, create a table with columns to represent your input data and the columns where you want to store the results from OpenRouter. ```python theme={null} from pixeltable.functions import openrouter # Create a table in Pixeltable and add a computed column that calls OpenRouter t = pxt.create_table('openrouter_demo/chat', {'input': pxt.String}) messages = [{'role': 'user', 'content': t.input}] t.add_computed_column( output=openrouter.chat_completions( messages=messages, model='anthropic/claude-sonnet-4', model_kwargs={ # Optional dict with parameters compatible with the model 'max_tokens': 300, 'temperature': 0.7, }, ) ) ```
Created table 'chat'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Parse the response into a new column t.add_computed_column(response=t.output.choices[0].message.content) ```
Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Start a conversation t.insert( [ {'input': 'How many species of felids have been classified?'}, {'input': 'Can you make me a coffee?'}, ] ) t.select(t.input, t.response).head() ```
Inserted 2 rows with 0 errors in 7.59 s (0.26 rows/s)## Using different models One of OpenRouter’s key benefits is easy access to models from multiple providers. Let’s create a table that compares responses from Anthropic Claude, OpenAI GPT-4, and Meta Llama. ```python theme={null} # Create a table to compare different models compare_t = pxt.create_table( 'openrouter_demo/compare_models', {'prompt': pxt.String} ) messages = [{'role': 'user', 'content': compare_t.prompt}] # Add responses from different models compare_t.add_computed_column( claude=openrouter.chat_completions( messages=messages, model='anthropic/claude-sonnet-4', model_kwargs={'max_tokens': 150}, ) .choices[0] .message.content ) compare_t.add_computed_column( gpt4=openrouter.chat_completions( messages=messages, model='openai/gpt-4o-mini', model_kwargs={'max_tokens': 150}, ) .choices[0] .message.content ) compare_t.add_computed_column( llama=openrouter.chat_completions( messages=messages, model='meta-llama/llama-3.3-70b-instruct', model_kwargs={'max_tokens': 150}, ) .choices[0] .message.content ) ```
Created table 'compare\_models'. Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Insert a prompt and compare responses compare_t.insert( [{'prompt': 'Explain quantum entanglement in one sentence.'}] ) compare_t.select( compare_t.prompt, compare_t.claude, compare_t.gpt4, compare_t.llama ).head() ```
Inserted 1 row with 0 errors in 1.27 s (0.79 rows/s)## Advanced features: provider routing OpenRouter allows you to specify provider preferences for fallback behavior and cost optimization. ```python theme={null} # Create a table with provider routing routing_t = pxt.create_table( 'openrouter_demo/routing', {'input': pxt.String} ) messages = [{'role': 'user', 'content': routing_t.input}] routing_t.add_computed_column( output=openrouter.chat_completions( messages=messages, model='anthropic/claude-sonnet-4', model_kwargs={'max_tokens': 300}, # Specify provider preferences provider={ 'order': [ 'Anthropic', 'OpenAI', ], # Try Anthropic first, then OpenAI 'allow_fallbacks': True, }, ) ) routing_t.add_computed_column( response=routing_t.output.choices[0].message.content ) ```
Created table 'routing'. Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} routing_t.insert([{'input': 'What are the primary colors?'}]) routing_t.select(routing_t.input, routing_t.response).head() ```
Inserted 1 row with 0 errors in 3.97 s (0.25 rows/s)## Advanced Features: Context Window Optimization OpenRouter supports transforms like ‘middle-out’ to optimize handling of long contexts. ```python theme={null} # Create a table with transforms for long context optimization transform_t = pxt.create_table( 'openrouter_demo/transforms', {'long_context': pxt.String} ) messages = [{'role': 'user', 'content': transform_t.long_context}] transform_t.add_computed_column( output=openrouter.chat_completions( messages=messages, model='openai/gpt-4o-mini', model_kwargs={'max_tokens': 200}, # Apply middle-out transform for better long context handling transforms=['middle-out'], ) ) transform_t.add_computed_column( response=transform_t.output.choices[0].message.content ) ```
Created table 'transforms'. Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Example with longer context long_text = """ Artificial intelligence has transformed many industries. Machine learning algorithms can now detect patterns in data that humans might miss. Deep learning has revolutionized computer vision and natural language processing. The future of AI looks promising with developments in areas like reinforcement learning and generative models. Question: What are the main AI developments mentioned? """ transform_t.insert([{'long_context': long_text}]) transform_t.select(transform_t.response).head() ```
Inserted 1 row with 0 errors in 1.82 s (0.55 rows/s)

### Learn more

To learn more about advanced techniques like RAG operations in Pixeltable, check out the [RAG Operations in Pixeltable](/howto/use-cases/rag-operations) tutorial.

For more information about OpenRouter’s features and available models, visit:

* [OpenRouter Documentation](https://openrouter.ai/docs)
* [Available Models](https://openrouter.ai/models)

If you have any questions, don’t hesitate to reach out.

# Working with Pydantic in Pixeltable

Source: https://docs.pixeltable.com/howto/providers/working-with-pydantic
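The setup cell for this section is not shown in the extract. A minimal sketch (no API keys are needed here, since everything in this guide runs locally):

```python theme={null}
# Minimal setup sketch for the Pydantic examples below
%pip install -qU pixeltable pydantic

import pixeltable as pxt

pxt.drop_dir('pydantic_demo', force=True)
pxt.create_dir('pydantic_demo')
```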
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'pydantic\_demo'. \## Basic usage: scalar types Define a Pydantic model with fields that match your table columns. Pixeltable automatically maps Python types to Pixeltable types: ```python theme={null} import datetime import pydantic from enum import Enum from typing import Literal # Define an enum for product categories class Category(Enum): ELECTRONICS = 1 CLOTHING = 2 BOOKS = 3 # Define a Pydantic model class Product(pydantic.BaseModel): name: str price: float in_stock: bool category: Category rating: Literal['poor', 'average', 'good', 'excellent'] created_at: datetime.datetime description: str | None = None # Optional field ``` ```python theme={null} # Create a table with matching schema products = pxt.create_table( 'pydantic_demo/products', { 'name': pxt.Required[pxt.String], 'price': pxt.Required[pxt.Float], 'in_stock': pxt.Required[pxt.Bool], 'category': pxt.Required[pxt.Int], # Enum values are integers 'rating': pxt.Required[pxt.String], # Literal values 'created_at': pxt.Required[pxt.Timestamp], 'description': pxt.String, # Nullable }, ) ```
Created table 'products'.```python theme={null} # Create Pydantic model instances now = datetime.datetime.now() product_data = [ Product( name='Wireless Headphones', price=79.99, in_stock=True, category=Category.ELECTRONICS, rating='excellent', created_at=now, description='High-quality wireless headphones with noise cancellation', ), Product( name='Python Cookbook', price=49.99, in_stock=True, category=Category.BOOKS, rating='good', created_at=now, ), Product( name='Running Shoes', price=129.99, in_stock=False, category=Category.CLOTHING, rating='average', created_at=now, description='Lightweight running shoes', ), ] # Insert Pydantic models directly products.insert(product_data) products.collect() ```
Inserted 3 rows with 0 errors in 0.02 s (146.18 rows/s)## Nested models and JSON columns Nested Pydantic models automatically map to Pixeltable JSON columns. This is useful for storing structured metadata. ```python theme={null} # Define nested models class Address(pydantic.BaseModel): street: str city: str country: str zip_code: str class ContactInfo(pydantic.BaseModel): email: str phone: str | None = None address: Address class Customer(pydantic.BaseModel): customer_id: str name: str contact: ContactInfo # Nested model → JSON column ``` ```python theme={null} # Create table with JSON column for nested data customers = pxt.create_table( 'pydantic_demo/customers', { 'customer_id': pxt.Required[pxt.String], 'name': pxt.Required[pxt.String], 'contact': pxt.Required[pxt.Json], # Nested model stored as JSON }, ) ```
Created table 'customers'.```python theme={null} # Insert nested data customer_data = [ Customer( customer_id='C001', name='Alice Johnson', contact=ContactInfo( email='alice@example.com', phone='+1-555-0101', address=Address( street='123 Main St', city='San Francisco', country='USA', zip_code='94102', ), ), ), Customer( customer_id='C002', name='Bob Smith', contact=ContactInfo( email='bob@example.com', address=Address( street='456 Oak Ave', city='New York', country='USA', zip_code='10001', ), ), ), ] customers.insert(customer_data) customers.collect() ```
Inserted 2 rows with 0 errors in 0.01 s (227.55 rows/s)```python theme={null} # Query nested JSON fields using Pixeltable's JSON path syntax customers.select( customers.name, email=customers.contact.email, city=customers.contact.address.city, ).collect() ``` ## Media files with Pydantic For media columns (Image, Video, Audio, Document), use `str` or `Path` fields in your Pydantic model to specify file paths or URLs. ```python theme={null} from pathlib import Path class ImageRecord(pydantic.BaseModel): title: str image_url: str # URLs or file paths as strings tags: list[str] # Create table with Image column images = pxt.create_table( 'pydantic_demo/images', { 'title': pxt.Required[pxt.String], 'image_url': pxt.Required[pxt.Image], # Media column 'tags': pxt.Required[pxt.Json], }, ) ```
Created table 'images'.```python theme={null} # Insert image records with URLs base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images' image_data = [ ImageRecord( title='Sample Image', image_url=f'{base_url}/000000000036.jpg', tags=['sample', 'test', 'image'], ) ] images.insert(image_data) images.select(images.title, images.image_url, images.tags).collect() ```
Inserted 1 row with 0 errors in 0.27 s (3.74 rows/s)## Working with Computed Columns Pydantic models work seamlessly with computed columns. Simply omit computed column fields from your model - Pixeltable will skip them during insertion. ```python theme={null} # Model only includes input columns class Article(pydantic.BaseModel): title: str content: str # Create table with computed column articles = pxt.create_table( 'pydantic_demo/articles', { 'title': pxt.Required[pxt.String], 'content': pxt.Required[pxt.String], }, ) # Add a computed column articles.add_computed_column( word_count=articles.content.apply( lambda x: len(x.split()), col_type=pxt.Int ) ) ```
Created table 'articles'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Insert data - computed columns are automatically calculated article_data = [ Article( title='Getting Started with Pixeltable', content='Pixeltable is a powerful tool for building AI applications. It provides automatic versioning and incremental computation.', ), Article( title='Type Safety in Python', content='Using Pydantic with Pixeltable provides type safety and validation for your data pipelines.', ), ] articles.insert(article_data) articles.select(articles.title, articles.word_count).collect() ```
Inserted 2 rows with 0 errors in 0.01 s (186.43 rows/s)## Optional Fields and Defaults Pydantic’s optional fields with defaults work naturally with Pixeltable’s nullable columns. ```python theme={null} class Task(pydantic.BaseModel): title: str priority: int = 1 # Default value due_date: datetime.datetime | None = None # Optional notes: str | None = None # Optional tasks = pxt.create_table( 'pydantic_demo/tasks', { 'title': pxt.Required[pxt.String], 'priority': pxt.Required[pxt.Int], 'due_date': pxt.Timestamp, # Nullable 'notes': pxt.String, # Nullable }, ) # Insert with and without optional fields tasks.insert( [ Task( title='Complete project', priority=3, due_date=datetime.datetime(2025, 12, 31), ), Task( title='Review code' ), # Uses default priority=1, None for optionals Task(title='Write docs', notes='Include examples'), ] ) tasks.collect() ```
Created table 'tasks'. Inserted 3 rows with 0 errors in 0.01 s (408.88 rows/s)## Type Mapping Reference Here’s the complete mapping between Pydantic/Python types and Pixeltable types: ## Learn More For more information about working with Pydantic in Pixeltable: * [Pixeltable Documentation](https://docs.pixeltable.com) * [Pydantic Documentation](https://docs.pydantic.dev) * [Type Safety Blog Post](https://www.pixeltable.com/blog/pydantic-integration-type-safety) If you have any questions, don’t hesitate to reach out on [Discord](https://discord.com/invite/QPyqFYx2UN). # Working with Replicate in Pixeltable Source: https://docs.pixeltable.com/howto/providers/working-with-replicate
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'replicate\_demo'. \## Chat completions Create a Table: In Pixeltable, create a table with columns to represent your input data and the columns where you want to store the results from Replicate. ```python theme={null} from pixeltable.functions.replicate import run # Create a table in Pixeltable and pick a model hosted on Replicate with some parameters t = pxt.create_table('replicate_demo/chat', {'prompt': pxt.String}) input = { 'system_prompt': 'You are a helpful assistant.', 'prompt': t.prompt, # These parameters are optional and can be used to tune model behavior: 'max_tokens': 300, 'top_p': 0.9, 'temperature': 0.8, } t.add_computed_column( output=run(input, ref='meta/meta-llama-3-8b-instruct') ) ```
Created table 'chat'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Parse the response into a new column t.add_computed_column(response=pxt.functions.string.join('', t.output)) ```
Added 0 column values with 0 errors in 0.02 s No rows affected.```python theme={null} # Start a conversation t.insert([{'prompt': 'What foods are rich in selenium?'}]) t.select(t.prompt, t.response).show() ```
Inserted 1 row with 0 errors in 4.45 s (0.22 rows/s)## Image generation Here’s an example that shows how to use Replicate’s image generation models with Pixeltable. We’ll use the FLUX Schnell model. ```python theme={null} t = pxt.create_table('replicate_demo/images', {'prompt': pxt.String}) input = {'prompt': t.prompt, 'go_fast': True, 'megapixels': '1'} t.add_computed_column( output=run(input, ref='black-forest-labs/flux-schnell') ) ```
Created table 'images'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} t.insert( [ { 'prompt': 'Draw a pencil sketch of a friendly dinosaur playing tennis in a cornfield.' } ] ) ```
Inserted 1 row with 0 errors in 0.99 s (1.01 rows/s) 1 row inserted.```python theme={null} t.select(t.prompt, t.output).collect() ``` We see that Replicate returns our image as an array containing a single URL. To turn it into an actual image, we cast the string to type `pxt.Image` in a new computed column: ```python theme={null} t.add_computed_column(image=t.output[0].astype(pxt.Image)) t.select(t.image).collect() ```
Added 1 column value with 0 errors in 0.02 s (53.36 rows/s)### Learn more To learn more about advanced techniques like RAG operations in Pixeltable, check out the [RAG Operations in Pixeltable](/howto/use-cases/rag-operations) tutorial. If you have any questions, don’t hesitate to reach out. # Working with Reve in Pixeltable Source: https://docs.pixeltable.com/howto/providers/working-with-reve
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/alison-pxt/.pixeltable/pgdata Created directory 'reve\_demo'. \We’ll create a Pixeltable table that starts with a prompt and a source image, and ends with a final scene. The table we’ll build up to will require two inputs per row: 1. A prompt for creating a background scene image. We’ll use this prompt for Reve to create a scene with `reve.create()`. 2. An existing source image. We’ll ask Reve to edit this image with `reve.edit()`, and then it will be ready as the foreground image. Finally, we’ll remix the background scene image we made in step 1 by combining it with the foreground image we made in step 2 with `reve.remix()`. ```python theme={null} spunk_t = pxt.create_table( 'reve_demo/solarpunk_scenes', {'prompt': pxt.String, 'source_image': pxt.Image}, ) ```
Created table 'solarpunk\_scenes'.To read more about creating tables, see [Tables and Data Operations](/tutorials/tables-and-data-operations). You can look at the schema for this table: ```python theme={null} spunk_t.describe() ``` Now, we’ll insert values for our first row. We need to provide a text prompt for the `reve.create()` function and a source image for the `reve.edit()` function. ```python theme={null} scene_prompt = ( 'Create a scene of lush solarpunk metropolis in the desert ' 'with urban agriculture and an oasis theme.' 'It should not look like an office park, corporate campus, or an outdoor mall.' ) image_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg' ``` ```python theme={null} spunk_t.insert([{'prompt': scene_prompt, 'source_image': image_url}]) ```
Inserted 1 row with 0 errors in 0.03 s (39.10 rows/s) 1 row inserted.To read more about inserting data, see [Bringing Data](/howto/cookbooks/data/data-import-csv). And we can peek at our starter table with a single row: ```python theme={null} spunk_t.collect() ``` ## Generate new imagery with Reve Create Use `reve.create()` when you want Reve to synthesize an entirely new image from a prompt. In Pixeltable, we place this function call inside a computed column. We’ll generate fresh imagery from the prompt first in this section. Feel free to change the prompt. Here we ask for a solarpunk oasis city. ```python theme={null} from pixeltable.functions import reve spunk_t.add_computed_column( new_image=reve.create(spunk_t.prompt), if_exists='replace' ) ```
Added 1 column value with 0 errors in 6.16 s (0.16 rows/s) 1 row updated.To read more about computed columns in Pixeltable, see [Computed Columns](/tutorials/computed-columns). ```python theme={null} spunk_t.select(spunk_t.prompt, spunk_t.new_image).collect() ``` By default, Pixeltable saves all generated media outputs to a media directory. We can see the file path by using the `fileurl` property. ```python theme={null} spunk_t.select(spunk_t.new_image.fileurl).collect() ``` ### Add Reve parameters All Reve functions accept optional parameters to customize the output: * `aspect_ratio`: desired image aspect ratio, e.g. ‘3:2’, ‘16:9’, ‘1:1’, etc. (available for `reve.create()` and `reve.remix()`) * `version`: specific model version to use (optional; defaults to latest if not specified). Available for all Reve functions (`reve.create()`, `reve.edit()`, and `reve.remix()`) This adds a second image column using the same prompt that renders in a square frame. ```python theme={null} spunk_t.add_computed_column( new_image_sq=reve.create(spunk_t.prompt, aspect_ratio='1:1'), if_exists='replace', ) ```
Added 1 column value with 0 errors in 6.22 s (0.16 rows/s) 1 row updated. ```python theme={null} spunk_t.select( spunk_t.prompt, spunk_t.new_image, spunk_t.new_image_sq ).collect() ``` To read more about `reve.create()`, see [reve.create UDF](/sdk/latest/reve#udf-create). ## Edit an existing photo with Reve Edit `reve.edit()` takes an existing image plus natural-language instructions and returns an edited version. We already have a `source_image` column in our table from the initial setup. ```python theme={null} spunk_t.select(spunk_t.source_image).collect() ``` We can now add a computed column that calls `reve.edit()` to modify the source image. To read more about `reve.edit()`, see [reve.edit UDF](/sdk/latest/reve#udf-edit). This editing prompt is embedded directly in our computed-column logic, unlike the `reve.create()` example, where we stored the prompt in its own column. This means that the same prompt will be applied to any new rows that we insert into this table. We will phrase the editing prompt to reflect this table’s solarpunk theme, but otherwise keep it general. This way, we don’t need to provide a specific prompt for every new table row. ```python theme={null} # Uncomment the import below if you have not already done so # from pixeltable.functions import reve spunk_t.add_computed_column( edited_subject=reve.edit( spunk_t.source_image, 'Remove any existing background. Focus on the closest person in the foreground. ' 'Keep the person and props, but make the lighting and colors vibrant and fit with a solarpunk theme. ' 'Make the background behind the person blank.', ), if_exists='replace', ) ```
Added 1 column value with 0 errors in 16.54 s (0.06 rows/s) 1 row updated. We can use `collect()` to see the new image: ```python theme={null} spunk_t.select(spunk_t.source_image, spunk_t.edited_subject).collect() ``` ## Remix multiple references with Reve Remix `reve.remix()` blends multiple reference images: you pass the images as a list, and inside the prompt string you reference each one with a numbered placeholder.
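The remix call itself is just another computed column, like the create and edit steps. A minimal sketch of what it can look like, assuming `reve.remix()` takes a prompt plus an `images` list of reference columns (as in the recap below); the prompt wording here is illustrative:

```python theme={null}
# Sketch of the remix step: combine the generated scene (background) with the
# edited subject (foreground). The prompt text below is illustrative.
spunk_t.add_computed_column(
    solarpunk_remix=reve.remix(
        'Place the person from the second image into the scene from the first image, '
        'keeping the vibrant solarpunk lighting consistent.',
        images=[spunk_t.new_image, spunk_t.edited_subject],
    ),
    if_exists='replace',
)
```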
Added 1 column value with 0 errors in 18.58 s (0.05 rows/s) 1 row updated. To read more about `reve.remix()`, see [reve.remix UDF](/sdk/latest/reve#udf-remix). ```python theme={null} spunk_t.select(spunk_t.solarpunk_remix).collect() ``` ## Insert a new row So far, we have been building up our table schema with a single row. Now we’ll insert a new row, with two fresh input values: 1. A text prompt to create the scene image with `reve.create()` and 2. A source image to edit with `reve.edit()` and remix into that scene with `reve.remix()`. Pixeltable will then automatically make the desired Reve API calls and populate the computed columns. ```python theme={null} spunk_t.insert( [ { 'prompt': 'Create an indoor tennis court scene, with clay courts inside a lush solarpunk greenhouse filled with bougainvillea, terraced gardens, and an oasis theme.', 'source_image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000885.jpg', } ] ) ```
Inserted 1 row with 0 errors in 34.30 s (0.03 rows/s) 1 row inserted. Now we can inspect both outputs: `insert()` triggers Pixeltable to compute values for any computed columns that are missing them (images we already generated are left unchanged, because Pixeltable performs incremental updates). For example, here is our inserted image and our edited image: ```python theme={null} spunk_t.select(spunk_t.source_image, spunk_t.edited_subject).collect() ``` Here are our two remixed images created by Reve: ```python theme={null} spunk_t.select(spunk_t.solarpunk_remix).collect() ``` All together, we created a new scene image, edited an existing image of a person, and then remixed the two to reimagine that person in our new scene. ```python theme={null} spunk_t.select( spunk_t.new_image, spunk_t.edited_subject, spunk_t.solarpunk_remix ).collect() ``` ## Review Reve in Pixeltable Below is a quick recap of how each Reve function maps inputs to outputs inside Pixeltable tables. Each function reads input parameters and writes its results into computed columns. ### Reve Create * **Input parameter:** A prompt inserted as a row into a Pixeltable table ```python theme={null} spunk_t.select( spunk_t.prompt, spunk_t.new_image, spunk_t.new_image_sq ).collect() ``` ### Reve Edit * **Input parameter:** A source image of type `pxt.Image` * **Usage reminder:** The edit instructions live inline inside the `add_computed_column()` call ```python theme={null} spunk_t.select(spunk_t.source_image, spunk_t.edited_subject).collect() ``` ### Reve Remix * **Input parameters:** We started with two image columns * **How the prompt references them:** * The reference images are passed as a list, e.g. `images=[my_table.image00, my_table.image01]` * Inside the prompt, each image is referred to by its numbered placeholder
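As with the other recaps, we can view the remix output alongside its inputs; a parallel `select` might look like:

```python theme={null}
# Show the generated scene, the edited subject, and the remixed result together
spunk_t.select(
    spunk_t.new_image, spunk_t.edited_subject, spunk_t.solarpunk_remix
).collect()
```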
Created directory 'tigris'. Created table 'screenshots'. Inserting rows into \`screenshots\`: 100 rows \[00:01, 51.72 rows/s] Inserting rows into \`screenshots\`: 100 rows \[00:01, 55.57 rows/s] Inserting rows into \`screenshots\`: 100 rows \[00:01, 52.74 rows/s] Inserting rows into \`screenshots\`: 100 rows \[00:02, 33.96 rows/s] Inserting rows into \`screenshots\`: 100 rows \[00:02, 42.64 rows/s] Inserting rows into \`screenshots\`: 100 rows \[00:02, 39.65 rows/s] Inserting rows into \`screenshots\`: 100 rows \[00:02, 47.36 rows/s] Inserting rows into \`screenshots\`: 28 rows \[00:00, 6786.12 rows/s] Inserted 728 rows with 0 errors.Once the import is done, you can create thumbnails with a [computed column](/tutorials/computed-columns): ```python theme={null} # Add a computed column for thumbnails # Uses output_media_dest by default, or specify a custom destination screenshots.add_computed_column( thumbnail=screenshots.image.resize((256, 256)), destination=f's3://{bucket_name}/botw-screenshots/thumbnails/', ) ```
Added 728 column values with 0 errors. 728 rows updated, 728 values computed. Then inspect the result with the `collect` method: ```python theme={null} results = screenshots.limit(1).collect() results ``` ## Getting URLs for your files When your files are in object storage, you can get URLs that point directly to them. These URLs can be used in HTML, APIs, or any application where you need to serve media. Fetch them with the `.fileurl` property: ```python theme={null} screenshots.select( image=screenshots.image, image_url=screenshots.image.fileurl, thumbnail=screenshots.thumbnail, thumbnail_url=screenshots.thumbnail.fileurl, ).limit(1).collect() ``` ## Generating Presigned URLs For private buckets or when you need time-limited access to files, use presigned URLs. These are temporary, authenticated URLs that allow anyone to access your files for a limited time without needing credentials. Use the `presigned_url` function from `pixeltable.functions.net`: ```python theme={null} from pixeltable.functions import net # Generate presigned URLs with 1-hour expiration (3600 seconds) screenshots.select( image=screenshots.image, image_url=screenshots.image.fileurl, image_presigned=net.presigned_url(screenshots.image.fileurl, 3600), thumbnail=screenshots.thumbnail, thumbnail_url=screenshots.thumbnail.fileurl, thumbnail_presigned=net.presigned_url( screenshots.thumbnail.fileurl, 3600 ), ).limit(1).collect() ``` ### Common expiration times Typical values for the expiration argument (in seconds) are 900 (15 minutes), 3600 (1 hour), and 86400 (24 hours); choose the shortest window that works for your use case. ## What you learned * When you configure Pixeltable to use Tigris to store images, adding images transparently uploads them into Tigris for global distribution. * You can override where images are stored in Tigris using the `destination=` kwarg when creating computed columns. * Use the `.fileurl` property in queries to get URLs for your stored files. * Use `net.presigned_url()` to generate time-limited, authenticated URLs for private bucket access. Pixeltable handles everything else for you. ## Next steps * See the [Cloud Storage documentation](/integrations/cloud-storage) for complete provider setup and authentication details. * Check out [Pixeltable Configuration](/platform/configuration) for all config options. * Join our [Discord community](https://pixeltable.com/discord) if you have questions. ## Additional Resources * [Pixeltable Documentation](/) * [Tigris Documentation](https://www.tigrisdata.com/docs/) # Working with Together AI in Pixeltable Source: https://docs.pixeltable.com/howto/providers/working-with-together
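This guide assumes a Pixeltable working directory for the demo and a Together AI API key. A minimal setup sketch, assuming the same pattern as the other provider guides and the standard `TOGETHER_API_KEY` environment variable:

```python theme={null}
# Setup sketch (assumed): Together AI credentials plus a fresh demo directory
import getpass
import os

import pixeltable as pxt

if 'TOGETHER_API_KEY' not in os.environ:
    os.environ['TOGETHER_API_KEY'] = getpass.getpass('Together AI API Key:')

pxt.drop_dir('together_demo', force=True)
pxt.create_dir('together_demo')
```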
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'together\_demo'. ## Chat completions Create a Table: In Pixeltable, create a table with columns to represent your input data and the columns where you want to store the results from Together AI. ```python theme={null} from pixeltable.functions import together chat_t = pxt.create_table('together_demo/chat', {'input': pxt.String}) messages = [{'role': 'user', 'content': chat_t.input}] chat_t.add_computed_column( output=together.chat_completions( messages=messages, model='meta-llama/Llama-3.3-70B-Instruct-Turbo', model_kwargs={ # Optional dict with parameters for the Together API 'max_tokens': 300, 'stop': ['\n'], 'temperature': 0.7, 'top_p': 0.9, }, ) ) chat_t.add_computed_column( response=chat_t.output.choices[0].message.content ) ```
Created table 'chat'. Added 0 column values with 0 errors in 0.01 s Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} # Start a conversation chat_t.insert( [ {'input': 'How many species of felids have been classified?'}, {'input': 'Can you make me a coffee?'}, ] ) chat_t.select(chat_t.input, chat_t.response).head() ```
Inserted 2 rows with 0 errors in 1.58 s (1.27 rows/s)## Embeddings ```python theme={null} emb_t = pxt.create_table( 'together_demo/embeddings', {'input': pxt.String} ) emb_t.add_computed_column( embedding=together.embeddings( input=emb_t.input, model='BAAI/bge-base-en-v1.5' ) ) ```
Created table 'embeddings'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} emb_t.insert( [{'input': 'Together AI provides a variety of embeddings models.'}] ) ```
Inserted 1 row with 0 errors in 0.54 s (1.86 rows/s) 1 row inserted.```python theme={null} emb_t.head() ``` ## Image generations ```python theme={null} image_t = pxt.create_table('together_demo/images', {'input': pxt.String}) image_t.add_computed_column( img=together.image_generations( image_t.input, model='black-forest-labs/FLUX.1-schnell', model_kwargs={'steps': 5}, ) ) ```
Created table 'images'. Added 0 column values with 0 errors in 0.01 s No rows affected.```python theme={null} image_t.insert( [{'input': 'A friendly dinosaur playing tennis in a cornfield'}] ) ```
Inserted 1 row with 0 errors in 1.35 s (0.74 rows/s) 1 row inserted.```python theme={null} image_t ``` ```python theme={null} image_t.head() ``` ### Learn more To learn more about advanced techniques like RAG operations in Pixeltable, check out the [RAG Operations in Pixeltable](/howto/use-cases/rag-operations) tutorial. If you have any questions, don’t hesitate to reach out. # Working with Twelve Labs in Pixeltable Source: https://docs.pixeltable.com/howto/providers/working-with-twelvelabs
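The cells below use `pxt` and the `pxtf` alias for `pixeltable.functions`. A minimal setup sketch, assuming the same directory-per-demo pattern as the other guides (you will also need Twelve Labs credentials configured; see [Pixeltable Configuration](/platform/configuration)):

```python theme={null}
# Setup sketch (assumed): imports and a fresh working directory for the demo
import pixeltable as pxt
import pixeltable.functions as pxtf

pxt.drop_dir('twelvelabs_demo', force=True)
pxt.create_dir('twelvelabs_demo')
```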
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'twelvelabs\_demo'. ## Cross-Modal Video Search Let’s index a video and search it using text, images, audio, and other videos - all against the same index. ### Create Video Table and Index ```python theme={null} # Create a table for videos video_t = pxt.create_table('twelvelabs_demo/videos', {'video': pxt.Video}) # Insert a sample video video_url = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/The-Pursuit-of-Happiness.mp4' video_t.insert([{'video': video_url}]) ```
Created table 'videos'. Inserted 1 row with 0 errors in 1.60 s (0.63 rows/s) 1 row inserted.```python theme={null} # Create a view that segments the video into searchable chunks # Twelve Labs requires minimum 4 second segments video_chunks = pxt.create_view( 'twelvelabs_demo/video_chunks', video_t, iterator=pxtf.video.video_splitter( video=video_t.video, duration=5.0, min_segment_duration=4.0 ), ) # Add embedding index for cross-modal search video_chunks.add_embedding_index( 'video_segment', embedding=pxtf.twelvelabs.embed.using(model_name='marengo3.0'), ) ``` Let’s look at the index we just added in the table metadata: ```python theme={null} video_chunks ``` The iterator created a larger table from our single video: ```python theme={null} video_chunks.count() ```
51### Text to Video Search Find video segments matching a text description. ```python theme={null} sim = video_chunks.video_segment.similarity(string='pink') video_chunks.order_by(sim, asc=False).limit(3).select( video_chunks.video_segment, score=sim ).collect() ``` ### Image to Video Search Find video segments similar to an image. ```python theme={null} image_query = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/The-Pursuit-of-Happiness-Screenshot.png' sim = video_chunks.video_segment.similarity(image=image_query) video_chunks.order_by(sim, asc=False).limit(2).select( video_chunks.video_segment, score=sim ).collect() ``` ### Video to Video Search Find video segments similar to another video clip. ```python theme={null} video_query = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/The-Pursuit-of-Happiness-Video-Extract.mp4' sim = video_chunks.video_segment.similarity(video=video_query) video_chunks.order_by(sim, asc=False).limit(2).select( video_chunks.video_segment, score=sim ).collect() ``` ### Audio to Video Search Find video segments with similar audio/speech content. ```python theme={null} audio_query = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/The-Pursuit-of-Happiness-Audio-Extract.m4a' sim = video_chunks.video_segment.similarity(audio=audio_query) video_chunks.order_by(sim, asc=False).limit(2).select( video_chunks.video_segment, score=sim ).collect() ``` ## Embedding Options For video embeddings, you can focus on specific aspects: * `'visual'` - Focus on what you see * `'audio'` - Focus on what you hear * `'transcription'` - Focus on what is said ```python theme={null} # Add a visual-only embedding column video_chunks.add_computed_column( visual_embedding=pxtf.twelvelabs.embed( video_chunks.video_segment, model_name='marengo3.0', embedding_option=['visual'], ) ) video_chunks.select( video_chunks.video_segment, video_chunks.visual_embedding ).limit(2).collect() ```
Added 51 column values with 0 errors in 19.81 s (2.57 rows/s)## Other Modalities: Text, Images, and Documents Twelve Labs embeddings also work for text, images, and documents. Here’s a compact example showing **multiple embedding indexes on a single table**. ```python theme={null} # Create a multimodal content table content_t = pxt.create_table( 'twelvelabs_demo/content', { 'title': pxt.String, 'description': pxt.String, 'thumbnail': pxt.Image, }, ) # Add computed column combining title and description content_t.add_computed_column( text_content=content_t.title + '. ' + content_t.description ) # Add embedding index on combined text column content_t.add_embedding_index( 'text_content', embedding=pxtf.twelvelabs.embed.using(model_name='marengo3.0'), ) # Add embedding index on image column content_t.add_embedding_index( 'thumbnail', embedding=pxtf.twelvelabs.embed.using(model_name='marengo3.0'), ) ```
Created table 'content'. Added 0 column values with 0 errors in 0.01 s```python theme={null} # Insert sample content content_t.insert( [ { 'title': 'Beach Sunset', 'description': 'A beautiful sunset over the ocean with palm trees.', 'thumbnail': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000025.jpg', }, { 'title': 'Mountain Hiking', 'description': 'Hikers climbing a steep mountain trail with scenic views.', 'thumbnail': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000139.jpg', }, { 'title': 'City Street', 'description': 'Busy urban street with cars and pedestrians.', 'thumbnail': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000042.jpg', }, { 'title': 'Wildlife Safari', 'description': 'Elephants and zebras on the African savanna.', 'thumbnail': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000061.jpg', }, ] ) ```
Inserted 4 rows with 0 errors in 1.97 s (2.03 rows/s) 4 rows inserted.We can see the two indexes we added in the schema: ```python theme={null} content_t ``` ```python theme={null} # Search by text description sim = content_t.text_content.similarity(string='outdoor nature adventure') content_t.order_by(sim, asc=False).limit(2).select( content_t.title, content_t.text_content, score=sim ).collect() ``` ```python theme={null} # Search by image similarity query_image = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000001.jpg' sim = content_t.thumbnail.similarity(image=query_image) content_t.order_by(sim, asc=False).limit(2).select( content_t.title, content_t.thumbnail, score=sim ).collect() ``` ```python theme={null} # Cross-modal: Search images using text! sim = content_t.thumbnail.similarity(string='shoe rack') content_t.order_by(sim, asc=False).limit(2).select( content_t.title, content_t.thumbnail, score=sim ).collect() ``` ## Summary **Twelve Labs + Pixeltable enables:** * **Cross-modal search**: Query video with text, images, audio, or other videos * **Multiple indexes per table**: Add embedding indexes on different columns * **Embedding options**: Focus on visual, audio, or transcription aspects * **All modalities**: Text, images, audio, video, and documents ### Learn More * [Twelve Labs Documentation](https://docs.twelvelabs.io/) * [Pixeltable Documentation](/) # Working with Voyage AI in Pixeltable Source: https://docs.pixeltable.com/howto/providers/working-with-voyageai
Created directory 'voyageai\_demo'. ## Text embeddings Voyage AI provides state-of-the-art embedding models for semantic search and RAG applications. ```python theme={null} from pixeltable.functions import voyageai # Create a table for document embeddings docs_t = pxt.create_table('voyageai_demo/documents', {'text': pxt.String}) # Add computed column with Voyage embeddings docs_t.add_computed_column( embedding=voyageai.embeddings( docs_t.text, model='voyage-3.5', input_type='document' ) ) ```
Created table 'documents'. Added 0 column values with 0 errors. No rows affected.```python theme={null} # Insert some sample documents documents = [ 'The Mediterranean diet emphasizes fish, olive oil, and vegetables, believed to reduce chronic diseases.', 'Photosynthesis in plants converts light energy into glucose and produces essential oxygen.', '20th-century innovations, from radios to smartphones, centered on electronic advancements.', 'Rivers provide water, irrigation, and habitat for aquatic species, vital for ecosystems.', "Apple's conference call to discuss fourth fiscal quarter results is scheduled for Thursday, November 2, 2023.", "Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' endure in literature.", ] docs_t.insert({'text': doc} for doc in documents) ```
Inserting rows into \`documents\`: 6 rows \[00:00, 2561.67 rows/s] Inserted 6 rows with 0 errors. 6 rows inserted, 12 values computed.```python theme={null} # View the embeddings docs_t.select(docs_t.text, docs_t.embedding).head(3) ``` ## Embedding index for similarity search You can use Voyage AI embeddings with Pixeltable’s embedding index for efficient similarity search. ```python theme={null} # Create a table with an embedding index search_t = pxt.create_table('voyageai_demo/search', {'text': pxt.String}) # Add embedding index for similarity search embed_fn = voyageai.embeddings.using( model='voyage-3.5', input_type='document' ) search_t.add_embedding_index('text', string_embed=embed_fn) ```
Created table 'search'.```python theme={null} # Insert documents search_t.insert({'text': doc} for doc in documents) ```
Inserting rows into \`search\`: 6 rows \[00:00, 973.68 rows/s] Inserted 6 rows with 0 errors. 6 rows inserted, 12 values computed.```python theme={null} # Perform similarity search sim = search_t.text.similarity( string='What are the health benefits of Mediterranean food?' ) search_t.order_by(sim, asc=False).limit(3).select( search_t.text, score=sim ).collect() ``` ## Reranking Voyage AI’s rerankers can refine search results by providing more accurate relevance scores. ```python theme={null} # Create a table for reranking rerank_t = pxt.create_table( 'voyageai_demo/rerank', {'query': pxt.String, 'documents': pxt.Json} ) # Add computed column with reranking results rerank_t.add_computed_column( reranked=voyageai.rerank( rerank_t.query, rerank_t.documents, model='rerank-2.5', top_k=3 ) ) ```
Created table 'rerank'. Added 0 column values with 0 errors. No rows affected.```python theme={null} # Insert query and documents to rerank rerank_t.insert( [ { 'query': "When is Apple's conference call scheduled?", 'documents': documents, } ] ) ```
Inserting rows into \`rerank\`: 1 rows \[00:00, 343.65 rows/s] Inserted 1 row with 0 errors. 1 row inserted, 2 values computed.```python theme={null} # Add computed column to extract top results using JSON path rerank_t.add_computed_column(top_results=rerank_t.reranked['results']) ```
Added 1 column value with 0 errors. 1 row updated. ```python theme={null} # Extract the top result's document and score rerank_t.select( rerank_t.query, top_document=rerank_t.top_results[0]['document'], top_score=rerank_t.top_results[0]['relevance_score'], ).collect() ``` ```python theme={null} # View reranking results rerank_t.select(rerank_t.query, rerank_t.top_results).collect() ``` ## Multimodal Embeddings Voyage AI’s multimodal embedding model can embed both images and text into the same vector space, enabling cross-modal similarity search. ```python theme={null} # Create a table for multimodal embeddings mm_t = pxt.create_table( 'voyageai_demo/multimodal', {'image': pxt.Image, 'caption': pxt.String}, if_exists='replace', ) # Add computed columns for image and text embeddings # multimodal_embed can embed either images or text independently mm_t.add_computed_column( image_embedding=voyageai.multimodal_embed( mm_t.image, model='voyage-multimodal-3.5', input_type='document' ) ) mm_t.add_computed_column( text_embedding=voyageai.multimodal_embed( mm_t.caption, model='voyage-multimodal-3.5', input_type='document' ) ) ```
Created table 'multimodal'. Added 0 column values with 0 errors. Added 0 column values with 0 errors. No rows affected.```python theme={null} # Insert a sample image with caption mm_t.insert( [ { 'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000139.jpg', 'caption': 'A person standing next to an elephant', } ] ) ```
Inserting rows into \`multimodal\`: 1 rows \[00:00, 520.00 rows/s] Inserted 1 row with 0 errors. 1 row inserted, 5 values computed.```python theme={null} # View the multimodal embeddings mm_t.select( mm_t.image, mm_t.caption, mm_t.image_embedding, mm_t.text_embedding ).head() ``` ### Learn more To learn more about RAG operations in Pixeltable, check out the [RAG Operations in Pixeltable](/howto/use-cases/rag-operations) tutorial. For more information about Voyage AI models and features, visit: * [Voyage AI Documentation](https://docs.voyageai.com/) * [Text Embeddings](https://docs.voyageai.com/docs/embeddings) * [Multimodal Embeddings](https://docs.voyageai.com/docs/multimodal-embeddings) * [Rerankers](https://docs.voyageai.com/docs/reranker) If you have any questions, don’t hesitate to reach out. # Transcribing and Indexing Audio and Video in Pixeltable Source: https://docs.pixeltable.com/howto/use-cases/audio-transcriptions
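The guide works from a `video_table` with a single video column. A minimal setup sketch, assuming the same directory-per-demo pattern used elsewhere in these guides:

```python theme={null}
# Setup sketch (assumed): fresh demo directory and a one-column video table
import pixeltable as pxt

pxt.drop_dir('transcription_demo', force=True)
pxt.create_dir('transcription_demo')

video_table = pxt.create_table(
    'transcription_demo/video_table', {'video': pxt.Video}
)
```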
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'transcription\_demo'. Created table 'video\_table'. Next let’s insert some video files into the table. In this demo, we’ll be using one-minute excerpts from a Lex Fridman podcast. We’ll begin by inserting two of them into our new table. Here, our videos are given as `https` links, but Pixeltable also accepts local files and S3 URLs as input. ```python theme={null} videos = [ 'https://github.com/pixeltable/pixeltable/raw/release/docs/resources/audio-transcription-demo/' f'Lex-Fridman-Podcast-430-Excerpt-{n}.mp4' for n in range(3) ] video_table.insert({'video': video} for video in videos[:2]) video_table.show() ```
Inserted 2 rows with 0 errors in 2.04 s (0.98 rows/s)Now we’ll add another column to hold extracted audio from our videos. The new column is an example of a *computed column*: it’s updated automatically based on the contents of another column (or columns). In this case, the value of the `audio` column is defined to be the audio track extracted from whatever’s in the `video` column. ```python theme={null} from pixeltable.functions.video import extract_audio video_table.add_computed_column( audio=extract_audio(video_table.video, format='mp3') ) video_table.show() ```
Added 2 column values with 0 errors in 0.91 s (2.19 rows/s)If we look at the structure of the video table, we see that the new column is a computed column. ```python theme={null} video_table ``` We can also add another computed column to extract metadata from the audio streams. ```python theme={null} from pixeltable.functions.audio import get_metadata video_table.add_computed_column(metadata=get_metadata(video_table.audio)) video_table.show() ```
Added 2 column values with 0 errors in 0.02 s (95.47 rows/s)## Create Transcriptions Now we’ll add a step to create transcriptions of our videos. As mentioned above, we’re going to use the Whisper library for this, running locally. Pixeltable has a built-in function, `whisper.transcribe`, that serves as an adapter for the Whisper library’s transcription capability. All we have to do is add a computed column that calls this function: ```python theme={null} from pixeltable.functions import whisper video_table.add_computed_column( transcription=whisper.transcribe( audio=video_table.audio, model='base.en' ) ) video_table.select( video_table.video, video_table.transcription.text ).show() ```
Added 2 column values with 0 errors in 4.63 s (0.43 rows/s)In order to index the transcriptions, we’ll first need to split them into sentences. We can do this using Pixeltable’s built-in `string_splitter` iterator. ```python theme={null} from pixeltable.functions.string import string_splitter sentences_view = pxt.create_view( 'transcription_demo/sentences_view', video_table, iterator=string_splitter( video_table.transcription.text, separators='sentence' ), ) ``` The `string_splitter` creates a new view, with the audio transcriptions broken into individual, one-sentence chunks. ```python theme={null} sentences_view.select(sentences_view.pos, sentences_view.text).show(8) ``` ## Add an Embedding Index Next, let’s use the Huggingface `sentence_transformers` library to create an embedding index of our sentences, attaching it to the `text` column of our `sentences_view`. ```python theme={null} from pixeltable.functions.huggingface import sentence_transformer sentences_view.add_embedding_index( 'text', embedding=sentence_transformer.using(model_id='intfloat/e5-large-v2'), ) ```
We can do a simple lookup to test our new index. The following snippet returns the results of a nearest-neighbor search on the input “What is happiness?” ```python theme={null} sim = sentences_view.text.similarity(string='What is happiness?') ( sentences_view.order_by(sim, asc=False) .limit(10) .select(sentences_view.text, similarity=sim) .collect() ) ``` ## Incremental Updates *Incremental updates* are a key feature of Pixeltable. Whenever a new video is added to the original table, all of its downstream computed columns are updated automatically. Let’s demonstrate this by adding a third video to the table and seeing how the updates propagate through to the index. ```python theme={null} video_table.insert([{'video': videos[2]}]) ```
Inserted 10 rows with 0 errors in 4.20 s (2.38 rows/s) 10 rows inserted.```python theme={null} video_table.select( video_table.video, video_table.metadata, video_table.transcription.text, ).show() ``` ```python theme={null} sim = sentences_view.text.similarity(string='What is happiness?') ( sentences_view.order_by(sim, asc=False) .limit(20) .select(sentences_view.text, similarity=sim) .collect() ) ``` We can see the new results showing up in `sentences_view`. ## Using the OpenAI API This concludes our tutorial using the locally installed Whisper library. Sometimes, it may be preferable to use the OpenAI API rather than a locally installed library. In this section we’ll show how this can be done in Pixeltable, simply by using a different function to construct our computed columns. Since this section relies on calling out to the OpenAI API, you’ll need to have an API key, which you can enter below. ```python theme={null} import getpass import os if 'OPENAI_API_KEY' not in os.environ: os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:') ``` ```python theme={null} from pixeltable.functions import openai video_table.add_computed_column( transcription_from_api=openai.transcriptions( video_table.audio, model='whisper-1' ) ) ```
Added 3 column values with 0 errors in 6.49 s (0.46 rows/s) 3 rows updated.Now let’s compare the results from the local model and the API side-by-side. ```python theme={null} video_table.select( video_table.video, video_table.transcription.text, video_table.transcription_from_api.text, ).show() ``` They look pretty similar, which isn’t surprising, since the OpenAI transcriptions endpoint runs on Whisper. One difference is that the local library spits out a lot more information about the internal behavior of the model. Note that we’ve been selecting `video_table.transcription.text` in the preceding queries, which pulls out just the `text` field of the transcription results. The actual results are a sizable JSON structure that includes a lot of metadata. To see the full output, we can select `video_table.transcription` instead, to get the full JSON struct. Here’s what it looks like (we’ll select just one row, since it’s a lot of output): ```python theme={null} video_table.select( video_table.transcription, video_table.transcription_from_api ).show(1) ``` # Object Detection in Videos Source: https://docs.pixeltable.com/howto/use-cases/object-detection-in-videos
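The detection workflow below starts from a `videos_table` that holds the raw videos. A minimal setup sketch, assuming the same directory-per-demo pattern as the other tutorials:

```python theme={null}
# Setup sketch (assumed): fresh demo directory and a table for the source videos
import pixeltable as pxt

pxt.drop_dir('detection_demo', force=True)
pxt.create_dir('detection_demo')

videos_table = pxt.create_table('detection_demo/videos', {'video': pxt.Video})
```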
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'detection\_demo'. Created table 'videos'.In order to interact with the frames, we take advantage of Pixeltable’s component view concept: we create a “view” of our video table that contains one row for each frame of each video in the table. Pixeltable provides the built-in `frame_iterator` for this. ```python theme={null} from pixeltable.functions.video import frame_iterator frames_view = pxt.create_view( 'detection_demo/frames', videos_table, iterator=frame_iterator(videos_table.video), ) ``` You’ll see that neither the `videos` table nor the `frames` view has any actual data yet, because we haven’t yet added any videos to the table. However, the `frames` view is now configured to automatically track the `videos` table as new data shows up. The new view is automatically configured with six columns: * `pos` - a system column that is part of every component view * `video` - the column inherited from our base table (all base table columns are visible in any of its views) * `frame_idx`, `pos_msec`, `pos_frame`, `frame` - these four columns are created by the `frame_iterator`. Let’s have a look at the new view: ```python theme={null} frames_view ``` We’ll now insert a single row into the videos table, containing a video of a busy intersection in Bangkok. ```python theme={null} videos_table.insert( [ { 'video': 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/bangkok.mp4' } ] ) ```
Inserted 462 rows with 0 errors in 4.35 s (106.25 rows/s) 462 rows inserted.Notice that both the `videos` table and `frames` view were automatically updated, expanding the single video into 461 rows in the view. Let’s have a look at `videos` first. ```python theme={null} videos_table.show() ``` Now let’s peek at the first five rows of `frames`: ```python theme={null} frames_view.select( frames_view.pos, frames_view.frame, frames_view.frame.width, frames_view.frame.height, ).show(5) ``` One advantage of using Pixeltable’s component view mechanism is that Pixeltable does not physically store the frames. Instead, Pixeltable re-extracts the frames on retrieval using the frame index, which can be done very efficiently and avoids any storage overhead (which can be quite substantial for video frames). ## Object Detection with Pixeltable Now let’s apply an object detection model to our frames. Pixeltable includes built-in support for a number of models; we’re going to use the YOLOX family of models, which are lightweight models with solid performance. We first import the `yolox` Pixeltable function. ```python theme={null} from pixeltable.functions.yolox import yolox ``` Pixeltable functions operate on columns and expressions using standard Python function call syntax. Here’s an example that shows how we might experiment with applying one of the YOLOX models to the first few frames in our video, using Pixeltable’s powerful `select` comprehension. ```python theme={null} # Show the results of applying the `yolox_tiny` model # to the first few frames in the table. frames_view.select( frames_view.frame, yolox(frames_view.frame, model_id='yolox_tiny') ).head(3) ``` It may appear that we just ran the YOLOX inference over the entire view of 461 frames, but remember that Pixeltable evaluates expressions lazily: in this case, it only ran inference over the 3 frames that we actually displayed. The inference output looks like what we’d expect, so let’s add a *computed column* that runs inference over the entire view (computed columns are discussed in detail in the [Computed Columns](https://github.com/pixeltable/pixeltable/blob/release/docs/tutorials/computed-columns.ipynb) tutorial). Remember that once a computed column is created, Pixeltable will update it incrementally any time new rows are added to the view. This is a convenient way to incorporate inference (and other operations) into data workflows. This *will* cause Pixeltable to run inference over all 461 frames, so please be patient. ```python theme={null} # Create a computed column to compute detections using the `yolox_tiny` # model. # We'll adjust the confidence threshold down a bit (the default is 0.5) # to pick up even more bounding boxes. frames_view.add_computed_column( detections_tiny=yolox( frames_view.frame, model_id='yolox_tiny', threshold=0.25 ) ) ```
Added 461 column values with 0 errors in 15.09 s (30.55 rows/s) 461 rows updated.The new column is now part of the schema of the `frames` view: ```python theme={null} frames_view ``` The data in the computed column is now stored for fast retrieval. ```python theme={null} frames_view.select(frames_view.frame, frames_view.detections_tiny).show(3) ``` Now let’s create a new set of images, in which we superimpose the detected bounding boxes on top of the original images. We’ll use the handy built-in `draw_bounding_boxes` UDF for this. We could create a new computed column to hold the superimposed images, but we don’t have to; sometimes it’s easier just to use a `select` comprehension, as we did when we were first experimenting with the detection model. ```python theme={null} import pixeltable.functions as pxtf frames_view.select( frames_view.frame, pxtf.vision.draw_bounding_boxes( frames_view.frame, frames_view.detections_tiny.bboxes, width=4 ), ).show(1) ``` Our `select` comprehension ranged over the entire table, but just as before, Pixeltable computes the output lazily: image operations are performed at retrieval time, so in this case, Pixeltable drew the annotations just for the one frame that we actually displayed. Looking at individual frames gives us some idea of how well our detection algorithm works, but it would be more instructive to turn the visualization output back into a video. We do that with the built-in function `make_video()`, which is an aggregation function that takes a frame index (actually: any expression that can be used to order the frames; a timestamp would also work) and an image, and then assembles the sequence of images into a video. ```python theme={null} frames_view.group_by(videos_table).select( pxt.functions.video.make_video( frames_view.pos, pxtf.vision.draw_bounding_boxes( frames_view.frame, frames_view.detections_tiny.bboxes, width=4 ), ) ).show(1) ``` ## Comparing Object Detection Models The detections that we get out of `yolox_tiny` are passable, but a little choppy. Suppose we want to experiment with a more powerful object detection model, to see if there is any improvement in detection quality. We can create an additional column to hold the new inferences. The larger model takes longer to download and run, so please be patient. ```python theme={null} # Here we use the larger `yolox_m` (medium) model. frames_view.add_computed_column( detections_m=yolox( frames_view.frame, model_id='yolox_m', threshold=0.25 ) ) ```
Added 461 column values with 0 errors in 65.94 s (6.99 rows/s) 461 rows updated.Let’s see the results of the two models side-by-side. ```python theme={null} frames_view.group_by(videos_table).select( pxt.functions.video.make_video( frames_view.pos, pxtf.vision.draw_bounding_boxes( frames_view.frame, frames_view.detections_tiny.bboxes, width=4 ), ), pxt.functions.video.make_video( frames_view.pos, pxtf.vision.draw_bounding_boxes( frames_view.frame, frames_view.detections_m.bboxes, width=4 ), ), ).show(1) ``` Running the videos side-by-side, we can see that the larger model is higher in quality: less flickering, with more stable boxes from frame to frame. ## Evaluating Models Against a Ground Truth In order to do a quantitative evaluation of model performance, we need a ground truth to compare them against. Let’s generate some (synthetic) “ground truth” data by running against the largest YOLOX model available. It will take even longer to cache and evaluate this model. ```python theme={null} frames_view.add_computed_column( detections_x=yolox( frames_view.frame, model_id='yolox_x', threshold=0.25 ) ) ```
Added 461 column values with 0 errors in 156.55 s (2.94 rows/s) 461 rows updated. Let’s have a look at our enlarged view, now with three `detections` columns. ```python theme={null} frames_view ``` We’re going to evaluate the generated detections with the commonly used [mean average precision](https://learnopencv.com/mean-average-precision-map-object-detection-model-evaluation-metric/) metric (mAP). The mAP metric is based on per-frame metrics, such as true and false positives per detected class, which are then aggregated into a single (per-class) number. In Pixeltable, this functionality is available via the `eval_detections()` and `mean_ap()` built-in functions. ```python theme={null} from pixeltable.functions.vision import eval_detections, mean_ap frames_view.add_computed_column( eval_yolox_tiny=eval_detections( pred_bboxes=frames_view.detections_tiny.bboxes, pred_labels=frames_view.detections_tiny.labels, pred_scores=frames_view.detections_tiny.scores, gt_bboxes=frames_view.detections_x.bboxes, gt_labels=frames_view.detections_x.labels, ) ) frames_view.add_computed_column( eval_yolox_m=eval_detections( pred_bboxes=frames_view.detections_m.bboxes, pred_labels=frames_view.detections_m.labels, pred_scores=frames_view.detections_m.scores, gt_bboxes=frames_view.detections_x.bboxes, gt_labels=frames_view.detections_x.labels, ) ) ```
Added 461 column values with 0 errors in 0.29 s (1589.38 rows/s) Added 461 column values with 0 errors in 0.31 s (1475.98 rows/s) 461 rows updated.Let’s take a look at the output. ```python theme={null} frames_view.select( frames_view.eval_yolox_tiny, frames_view.eval_yolox_m ).show(1) ``` The computation of the mAP metric is now simply a query over the evaluation output, aggregated with the `mean_ap()` function. ```python theme={null} frames_view.select( mean_ap(frames_view.eval_yolox_tiny), mean_ap(frames_view.eval_yolox_m), ).show() ``` This two-step process allows you to compute mAP at every granularity: over your entire dataset, only for specific videos, only for videos that pass a certain filter, etc. Moreover, you can compute this metric any time, not just during training, and use it to guide your understanding of your dataset and how it affects the quality of your models. # Document Indexing and RAG Source: https://docs.pixeltable.com/howto/use-cases/rag-demo
Note: you may need to restart the kernel to use updated packages. ```python theme={null} import pixeltable as pxt # Ensure a clean slate for the demo pxt.drop_dir('rag_demo', force=True) pxt.create_dir('rag_demo') ```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/sergeymkhitaryan/.pixeltable/pgdata Created directory 'rag\_demo'. Next we’ll create a table containing the sample questions we want to answer. The questions are stored in an Excel spreadsheet, along with a set of “ground truth” answers to help evaluate our model pipeline. We can use `create_table()` with the `source` parameter to load them. Note that we can pass the URL of the spreadsheet directly. ```python theme={null} base = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/rag-demo/' qa_url = base + 'Q-A-Rag.xlsx' queries_t = pxt.create_table('rag_demo/queries', source=qa_url) ```
Created table 'queries'. Inserting rows into \`queries\`: 8 rows \[00:00, 2469.96 rows/s] Inserted 8 rows with 0 errors.```python theme={null} queries_t.head() ``` ## Outline There are two major parts to our RAG application: 1. Document Indexing: Load the documents, split them into chunks, and index them using a vector embedding. 2. Querying: For each question on our list, do a top-k lookup for the most relevant chunks, use them to construct a ChatGPT prompt, and send the enriched prompt to an LLM. We’ll implement both parts in Pixeltable. ## Document Indexing All data in Pixeltable, including documents, resides in tables. Tables are persistent containers that can serve as the store of record for your data. Since we are starting from scratch, we will start with an empty table `rag_demo.documents` with a single column, `document`. ```python theme={null} documents_t = pxt.create_table( 'rag_demo/documents', {'document': pxt.Document} ) documents_t ```
Created table 'documents'.Next, we’ll insert our first few source documents into the new table. We’ll leave the rest for later, in order to show how to update the indexed document base incrementally. ```python theme={null} document_urls = [ base + 'Argus-Market-Digest-June-2024.pdf', base + 'Argus-Market-Watch-June-2024.pdf', base + 'Company-Research-Alphabet.pdf', base + 'Jefferson-Amazon.pdf', base + 'Mclean-Equity-Alphabet.pdf', base + 'Zacks-Nvidia-Report.pdf', ] ``` ```python theme={null} documents_t.insert({'document': url} for url in document_urls[:3]) documents_t.show() ```
Inserting rows into \`documents\`: 3 rows \[00:00, 491.31 rows/s] Inserted 3 rows with 0 errors.In RAG applications, we often decompose documents into smaller units, or chunks, rather than treating each document as a single entity. In this example, we’ll use Pixeltable’s built-in `document_splitter`, but in general the chunking methodology is highly customizable. `document_splitter` has a variety of options for controlling the chunking behavior, and it’s also possible to replace it entirely with a user-defined iterator (or an adapter for a third-party document splitter). In Pixeltable, operations such as chunking can be automated by creating **views** of the base `documents` table. A view is a virtual derived table: rather than adding data directly to the view, we define it via a computation over the base table. In this example, the view is defined by iteration over the chunks of a `document_splitter`. ```python theme={null} from pixeltable.functions.document import document_splitter chunks_t = pxt.create_view( 'rag_demo/chunks', documents_t, iterator=document_splitter( documents_t.document, separators='token_limit', limit=300 ), ) ```
Inserting rows into \`chunks\`: 41 rows \[00:00, 20799.04 rows/s]Our `chunks` view now has 3 columns: ```python theme={null} chunks_t ``` * `text` is the chunk text produced by the `document_splitter` * `pos` is a system-generated integer column, starting at 0, that provides a sequence number for each row * `document`, which is simply the `document` column from the base table `documents`. We won’t need it here, but having access to the base table’s columns (in effect a parent-child join) can be quite useful. Notice that as soon as we created it, `chunks` was automatically populated with data from the existing documents in our base table. We can select the first 2 chunks from each document using common query operations, in order to get a feel for what was extracted: ```python theme={null} chunks_t.where(chunks_t.pos < 2).show() ``` Now let’s compute vector embeddings for the document chunks and store them in a vector index. Pixeltable has built-in support for vector indexing using a variety of embedding model families, and it’s easy for users to add new ones via UDFs. In this demo, we’re going to use the E5 model from the Huggingface `sentence_transformers` library, which runs locally. The following command creates a vector index on the `text` column in the `chunks` table, using the E5 embedding model. (For details on index creation, see the [Embedding and Vector Indices](https://github.com/pixeltable/pixeltable/blob/release/docs/platform/embedding-indexes.ipynb) guide.) Note that defining the index is sufficient in order to load it with the existing data (and also to update it when the underlying data changes, as we’ll see later). ```python theme={null} from pixeltable.functions.huggingface import sentence_transformer chunks_t.add_embedding_index( 'text', embedding=sentence_transformer.using(model_id='intfloat/e5-large-v2'), ) ``` This completes the first part of our application, creating an indexed document base. Next, we’ll use it to run some queries. ## Querying In order to express a top-k lookup against our index, we use Pixeltable’s `similarity` operator in combination with the standard `order_by` and `limit` operations. Before building this into our application, let’s run a sample query to make sure it works. ```python theme={null} query_text = 'What is the expected EPS for Nvidia in Q1 2026?' sim = chunks_t.text.similarity(string=query_text) nvidia_eps_query = ( chunks_t.order_by(sim, asc=False) .select(similarity=sim, text=chunks_t.text) .limit(5) ) nvidia_eps_query.collect() ``` We perform this context retrieval for each row of our `queries` table by adding it as a computed column. In this case, the operation is a top-k similarity lookup against the data in the `chunks` table. To implement this operation, we’ll use Pixeltable’s `@query` decorator to enhance the capabilities of the `chunks` table. ```python theme={null} # A @query is essentially a reusable, parameterized query that is attached to a table (or view), # which is a modular way of getting data from that table. @pxt.query def top_k(query_text: str): sim = chunks_t.text.similarity(string=query_text) return ( chunks_t.order_by(sim, asc=False) .select(chunks_t.text, sim=sim) .limit(5) ) # Now add a computed column to `queries_t`, calling the query # `top_k` that we just defined. queries_t.add_computed_column(question_context=top_k(queries_t.Question)) ```
Added 8 column values with 0 errors. 8 rows updated, 8 values computed.Our `queries` table now looks like this: ```python theme={null} queries_t ``` The new column `question_context` now contains the result of executing the query for each row, formatted as a list of dictionaries: ```python theme={null} queries_t.select(queries_t.question_context).head(1) ``` ### Asking the LLM Now it’s time for the final step in our application: feeding the document chunks and questions to an LLM for resolution. In this demo, we’ll use OpenAI for this, but any other inference cloud or local model could be used instead. We start by defining a UDF that takes a top-k list of context chunks and a question and turns them into a ChatGPT prompt. ```python theme={null} # Define a UDF to create an LLM prompt given a top-k list of # context chunks and a question. @pxt.udf def create_prompt(top_k_list: list[dict], question: str) -> str: concat_top_k = '\n\n'.join( elt['text'] for elt in reversed(top_k_list) ) return f""" PASSAGES: {concat_top_k} QUESTION: {question}""" ``` We then add that again as a computed column to `queries`: ```python theme={null} queries_t.add_computed_column( prompt=create_prompt(queries_t.question_context, queries_t.Question) ) ```
Added 8 column values with 0 errors. 8 rows updated, 16 values computed.We now have a new string column containing the prompt: ```python theme={null} queries_t ``` ```python theme={null} queries_t.select(queries_t.prompt).head(1) ``` We now add another computed column to call OpenAI. For the `chat_completions()` call, we need to construct two messages, containing the instructions to the model and the prompt. For the latter, we can simply reference the `prompt` column we just added. ```python theme={null} from pixeltable.functions import openai # Assemble the prompt and instructions into OpenAI's message format messages = [ { 'role': 'system', 'content': 'Please read the following passages and answer the question based on their contents.', }, {'role': 'user', 'content': queries_t.prompt}, ] # Add a computed column that calls OpenAI queries_t.add_computed_column( response=openai.chat_completions( model='gpt-4o-mini', messages=messages ) ) ```
Added 8 column values with 0 errors. 8 rows updated, 8 values computed.Our `queries` table now contains a JSON-structured column `response`, which holds the entire API response structure. At the moment, we’re only interested in the response content, which we can extract easily into another computed column: ```python theme={null} queries_t.add_computed_column( answer=queries_t.response.choices[0].message.content ) ```
Added 8 column values with 0 errors. 8 rows updated, 8 values computed.We now have the following `queries` schema: ```python theme={null} queries_t ``` Let’s take a look at what we got back: ```python theme={null} queries_t.select( queries_t.Question, queries_t.correct_answer, queries_t.answer ).show() ``` The application works, but, as expected, a few questions couldn’t be answered due to the missing documents. As a final step, let’s add the remaining documents to our document base, and run the queries again. ## Incremental Updates Pixeltable’s views and computed columns update automatically in response to new data. We can see this when we add the remaining documents to our `documents` table. Watch how the `chunks` view is updated to stay in sync with `documents`: ```python theme={null} documents_t.insert({'document': p} for p in document_urls[3:]) ```
Inserting rows into \`documents\`: 3 rows \[00:00, 569.05 rows/s] Inserting rows into \`chunks\`: 67 rows \[00:00, 325.91 rows/s] Inserted 70 rows with 0 errors. 70 rows inserted, 6 values computed.```python theme={null} documents_t.show() ``` (Note: although Pixeltable updates `documents` and `chunks`, it **does not** automatically update the `queries` table. This is by design: we don’t want all rows in `queries` to get automatically re-executed every time a single new document is added to the document base. However, newly-added rows will be run over the new, incrementally-updated index.) To confirm that the `chunks` index got updated, we’ll re-run the chunks retrieval query for the question `What is the expected EPS for Nvidia in Q1 2026?` Previously, our most similar chunk had a similarity score of \~0.8. Let’s see what we get now: ```python theme={null} nvidia_eps_query.collect() ``` Our most similar chunk now has a score of \~0.855 and pulls in more relevant chunks from the newly-inserted documents. Let’s recompute the `question_context` column of the `queries_t` table, which will automatically recompute the `answer` column as well. ```python theme={null} queries_t.recompute_columns('question_context') ```
Inserting rows into \`queries\`: 8 rows \[00:00, 580.60 rows/s] 8 rows updated, 40 values computed.As a final step, let’s confirm that all the queries now have answers: ```python theme={null} queries_t.select( queries_t.Question, queries_t.correct_answer, queries_t.answer ).show() ``` # RAG Operations in Pixeltable Source: https://docs.pixeltable.com/howto/use-cases/rag-operations
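The source table used throughout this guide holds a single `pxt.Document` column named `source_doc`. A minimal setup sketch, assuming the same directory-per-demo pattern as the other tutorials:

```python theme={null}
# Setup sketch (assumed): fresh demo directory and the source documents table
import pixeltable as pxt

pxt.drop_dir('rag_ops_demo', force=True)
pxt.create_dir('rag_ops_demo')

docs = pxt.create_table('rag_ops_demo/docs', {'source_doc': pxt.Document})
```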
Created table 'docs'.If we take a peek at the `docs` table, we see its very simple structure. ```python theme={null} docs ``` Next we create a view to represent chunks of our PDF documents. A Pixeltable view is a virtual table, which is dynamically derived from a source table by applying a transformation and/or selecting a subset of data. In this case, our view represents a one-to-many transformation from source documents into individual sentences. This is achieved using Pixeltable’s built-in `document_splitter` class. Note that the `docs` table is currently empty, so creating this view doesn’t actually *do* anything yet: it simply defines an operation that we want Pixeltable to execute when it sees new data. ```python theme={null} from pixeltable.functions.document import document_splitter sentences = pxt.create_view( 'rag_ops_demo/sentences', # Name of the view docs, # Table from which the view is derived iterator=document_splitter( docs.source_doc, separators='sentence', # Chunk docs into sentences metadata='title,heading,sourceline', ), ) ``` Let’s take a peek at the new `sentences` view. ```python theme={null} sentences ``` We see that `sentences` inherits the `source_doc` column from `docs`, together with some new fields: * `pos`: The position in the source document where the sentence appears. * `text`: The text of the sentence. * `title`, `heading`, and `sourceline`: The metadata we requested when we set up the view. ## Data Ingestion Ok, now it’s time to insert some data into our workflow. A document in Pixeltable is just a URL; the following command inserts a single row into the `docs` table with the `source_doc` field set to the specified URL: ```python theme={null} docs.insert( [ { 'source_doc': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/rag-demo/Argus-Market-Digest-June-2024.pdf' } ] ) ```
Inserting rows into \`docs\`: 1 rows \[00:00, 292.76 rows/s] Inserting rows into \`sentences\`: 217 rows \[00:00, 42910.00 rows/s] Inserted 218 rows with 0 errors. 218 rows inserted, 2 values computed.We can see that two things happened. First, a single row was inserted into `docs`, containing the URL representing our source PDF. Then, the view `sentences` was incrementally updated by applying the `document_splitter` according to the definition of the view. This illustrates an important principle in Pixeltable: by default, anytime Pixeltable sees new data, the update is incrementally propagated to any downstream views or computed columns. We can see the effect of the insertion with the `select` command. There’s a single row in `docs`: ```python theme={null} docs.select(docs.source_doc.fileurl).show() ``` And here are the first 20 rows in `sentences`. The content of the PDF is broken into individual sentences, as expected. ```python theme={null} sentences.select(sentences.text, sentences.heading).show(20) ``` ## Experimenting with Chunking Of course, chunking into sentences isn’t the only way to split a document. Perhaps we want to experiment with different chunking methodologies, in order to see which one performs best in a particular application. Pixeltable makes it easy to do this, by creating several views of the same source table. Here are a few examples. Notice that as each new view is created, it is initially populated from the data already in `docs`. ```python theme={null} chunks = pxt.create_view( 'rag_ops_demo/chunks', docs, iterator=document_splitter( docs.source_doc, separators='sentence,token_limit', limit=2048, overlap=0, metadata='title,heading,sourceline', ), ) ```
Inserting rows into \`chunks\`: 217 rows \[00:00, 47827.85 rows/s]```python theme={null} short_chunks = pxt.create_view( 'rag_ops_demo/short_chunks', docs, iterator=document_splitter( docs.source_doc, separators='sentence,token_limit', limit=72, overlap=0, metadata='title,heading,sourceline', ), ) ```
Inserting rows into \`short\_chunks\`: 219 rows \[00:00, 49104.70 rows/s]```python theme={null} short_char_chunks = pxt.create_view( 'rag_ops_demo/short_char_chunks', docs, iterator=document_splitter( docs.source_doc, separators='sentence,char_limit', limit=72, overlap=0, metadata='title,heading,sourceline', ), ) ```
Inserting rows into \`short\_char\_chunks\`: 459 rows \[00:00, 63241.10 rows/s]```python theme={null} chunks.select(chunks.text, chunks.heading).show(20) ``` ```python theme={null} short_chunks.select(short_chunks.text, short_chunks.heading).show(20) ``` ```python theme={null} short_char_chunks.select( short_char_chunks.text, short_char_chunks.heading ).show(20) ``` Now let’s add a few more documents to our workflow. Notice how all of the downstream views are updated incrementally, processing just the new documents as they are inserted. ```python theme={null} urls = [ 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/rag-demo/Argus-Market-Watch-June-2024.pdf', 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/rag-demo/Company-Research-Alphabet.pdf', 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/rag-demo/Zacks-Nvidia-Report.pdf', ] docs.insert({'source_doc': url} for url in urls) ```
Inserting rows into \`docs\`: 3 rows \[00:00, 1969.77 rows/s] Inserting rows into \`chunks\`: 742 rows \[00:00, 61926.41 rows/s] Inserting rows into \`short\_chunks\`: 747 rows \[00:00, 67743.68 rows/s] Inserting rows into \`sentences\`: 742 rows \[00:00, 67949.90 rows/s] Inserting rows into \`short\_char\_chunks\`: 1165 rows \[00:00, 3603.41 rows/s] Inserted 3399 rows with 0 errors. 3399 rows inserted, 6 values computed.## Further Experiments This is a good time to mention another important guiding principle of Pixeltable. The preceding examples all used the built-in `document_splitter` class with various configurations. That’s probably fine as a first cut or to prototype an application quickly, and it might be sufficient for some applications. But other applications might want to do more sophisticated kinds of chunking, implementing their own specialized logic or leveraging third-party tools. Pixeltable imposes no constraints on the AI or RAG operations a workflow uses: the iterator interface is highly general, and it’s easy to implement new operations or adapt existing code or third-party tools into the Pixeltable workflow. ## Computing Embeddings Next, let’s look at how embedding indices can be added seamlessly to existing Pixeltable workflows. To compute our embeddings, we’ll use the Huggingface `sentence_transformer` package, running it over the `chunks` view that broke our documents up into sentence-based chunks. Pixeltable has a built-in `sentence_transformer` adapter, and all we have to do is add a new column that leverages it. Pixeltable takes care of the rest, applying the new column to all existing data in the view. ```python theme={null} from pixeltable.functions.huggingface import sentence_transformer chunks.add_computed_column( minilm_embed=sentence_transformer( chunks.text, model_id='paraphrase-MiniLM-L6-v2' ) ) ```
Added 959 column values with 0 errors. 959 rows updated, 959 values computed.The new column is a *computed column*: it is defined as a function on top of existing data and updated incrementally as new data are added to the workflow. Let’s have a look at how the new column affected the `chunks` view. ```python theme={null} chunks ``` ```python theme={null} chunks.select(chunks.text, chunks.heading, chunks.minilm_embed).head() ``` Similarly, we might want to add a CLIP embedding to our workflow; once again, it’s just another computed column: ```python theme={null} from pixeltable.functions.huggingface import clip chunks.add_computed_column( clip_embed=clip(chunks.text, model_id='openai/clip-vit-base-patch32') ) ```
Added 959 column values with 0 errors. 959 rows updated, 959 values computed.```python theme={null} chunks ``` ```python theme={null} chunks.select(chunks.text, chunks.heading, chunks.clip_embed).head() ``` # Using Label Studio for Annotations with Pixeltable Source: https://docs.pixeltable.com/howto/using-label-studio-with-pixeltable
January 23, 2026 - 01:41:50 Django version 5.1.15, using settings 'label\_studio.core.settings.label\_studio' Starting development server at [http://0.0.0.0:8080/](http://0.0.0.0:8080/) Quit the server with CONTROL-C.If for some reason the Label Studio browser window failed to open, you can always access it at: [http://localhost:8080/](http://localhost:8080/) Once you’ve created an account in Label Studio, you’ll need to locate your API key. In the Label Studio browser window, log in, click “Organization”, “API Tokens Settings”, and enable “Legacy Tokens”. Then click on “Account & Settings” in the top right, click “Legacy Token”, and copy the Access Token from the interface. ## Configure Pixeltable Next, we configure Pixeltable to communicate with Label Studio. Run the following command, pasting in the API key that you copied from the Label Studio interface. ```python theme={null} import getpass import os if 'LABEL_STUDIO_URL' not in os.environ: os.environ['LABEL_STUDIO_URL'] = 'http://localhost:8080/' if 'LABEL_STUDIO_API_KEY' not in os.environ: os.environ['LABEL_STUDIO_API_KEY'] = getpass.getpass( 'Label Studio API key: ' ) ``` ## Create a Table to Store Videos Now we create the master table that will hold our videos to be annotated. This only needs to be done once, when we initially set up the workflow. ```python theme={null} import pixeltable as pxt schema = {'video': pxt.Video, 'date': pxt.Timestamp} # Before creating the table, we drop the `ls_demo` dir and all its contents, # in order to ensure a clean environment for the demo. pxt.drop_dir('ls_demo', force=True) pxt.create_dir('ls_demo') videos_table = pxt.create_table('ls_demo/videos', schema) ```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'ls\_demo'. Created table 'videos'.## Populate It with Data Now let’s add some videos to the table to populate it. For this tutorial, we’ll use some randomly selected videos from the Multimedia Commons archive. The table also contains a `date` field, for which we’ll use a fixed date (but in a production setting, it would typically be the date on which the video was imported). ```python theme={null} from datetime import datetime url_prefix = 'http://multimedia-commons.s3-website-us-west-2.amazonaws.com/data/videos/mp4/' files = [ '122/8ff/1228ff94bf742242ee7c88e4769ad5d5.mp4', '2cf/a20/2cfa205eae979b31b1144abd9fa4e521.mp4', 'ffe/ff3/ffeff3c6bf57504e7a6cecaff6aefbc9.mp4', ] today = datetime(2024, 4, 22) videos_table.insert( {'video': url_prefix + file, 'date': today} for file in files ) ```
Inserted 3 rows with 0 errors in 1.07 s (2.81 rows/s) 3 rows inserted.Let’s have a look at the table now. ```python theme={null} videos_table.head() ``` ## Create a Label Studio project Next we’ll create a new Label Studio project and link it to a new view on the Pixeltable table. You can link a Label Studio project to either a table or a view. For tables that are expecting a lot of input data, it’s often easier to link to views. In this example, we’ll create a view that filters the table down by date. ```python theme={null} # Create a view to filter on the specified date v = pxt.create_view( 'ls_demo/videos_2024_04_22', videos_table.where(videos_table.date == today), ) # Create a new Label Studio project and link it to the view. The # configuration uses Label Studio's standard XML format. This only # needs to be done once: after the view and project are linked, # the relationship is stored indefinitely in Pixeltable's metadata. label_config = """
Added 3 column values with 0 errors in 0.01 s (355.10 rows/s) Added 3 column values with 0 errors in 0.02 s (146.19 rows/s) Linked external store 'ls_project_0' to table 'videos_2024_04_22'. Created 3 new task(s) in LabelStudioProject `videos_2024_04_22`. No rows affected.

If you look in the Label Studio UI now, you’ll see that there’s a new project with the name `videos_2024_04_22`, with three tasks, one for each of the videos in the view. If you want to create the project without populating it with tasks (yet), you can set `sync_immediately=False` in the call to `create_label_studio_project()`. You can always sync the table and project by calling `v.sync()`.

Note also that we didn’t have to specify an explicit mapping between Pixeltable columns and Label Studio data fields. This is because, by default, Pixeltable assumes the Pixeltable and Label Studio field names coincide. The data field in the Label Studio project has the name `$video`, which Pixeltable maps, by default, to the column in `ls_demo.videos_2024_04_22` that is also called `video`. If you want to override this behavior and specify an explicit mapping of columns to fields, you can do that with the `col_mapping` parameter of `create_label_studio_project()`.

Inspecting the view, we also see that Pixeltable created an additional column on the view, `annotations`, which will hold the output of our annotations workflow. The name of the output column can also be overridden by specifying a dict entry in `col_mapping` of the form `{'my_col_name': 'annotations'}`.

```python theme={null}
v
```

## Add Some Annotations

Now, let’s add some annotations to our Label Studio project to simulate a human-in-the-loop workflow. In the Label Studio UI, click on the new `videos_2024_04_22` project, and click on any of the three tasks. Select the appropriate category (“city”, “food”, or “sports”), and click “Submit”.

## Import the Annotations Back To Pixeltable

Now let’s try importing annotations from Label Studio back to our view.

```python theme={null}
v = pxt.get_table('ls_demo/videos_2024_04_22')
v.sync()
```
Created 0 new task(s) in LabelStudioProject \`videos\_2024\_04\_22\`. Updated annotation(s) from 3 task(s) in LabelStudioProject \`videos\_2024\_04\_22\`. 3 rows updated.Let’s see what effect that had. You’ll see that any videos that you annotated now have their `annotations` field populated in the view. ```python theme={null} v.select(v.video, v.annotations).head() ``` ## Parse Annotations with a Computed Column Pixeltable pulls in all sorts of metadata from Label Studio during a sync: everything that Label Studio reports back about the annotations, including things like the user account that created the annotations. Let’s say that all we care about is the annotation value. We can add a computed column to our table to pull it out. ```python theme={null} v.add_computed_column( video_category=v.annotations[0].result[0].value.choices[0] ) v.select(v.video, v.annotations, v.video_category).head() ```
Added 3 column values with 0 errors in 0.02 s (143.55 rows/s)Another useful operation is the `get_metadata` function, which returns information about the video itself, such as the resolution and codec (independent of Label Studio). Let’s add another computed column to hold such metadata. ```python theme={null} from pixeltable.functions.video import get_metadata v.add_computed_column(video_metadata=get_metadata(v.video)) v.select( v.video, v.annotations, v.video_category, v.video_metadata ).head() ```
Added 3 column values with 0 errors in 0.03 s (115.36 rows/s)## Preannotations with Pixeltable and Label Studio Frame extraction is another common operation in labeling workflows. In this example, we’ll extract frames from our videos into a view, then use an object detection model to generate preannotations for each frame. The following code uses a Pixeltable `frame_iterator` to automatically extract frames into a new view, which we’ll call `frames_2024_04_22`. ```python theme={null} from datetime import datetime from pixeltable.functions.video import frame_iterator today = datetime(2024, 4, 22) videos_table = pxt.get_table('ls_demo/videos') # Create the view, using a `frame_iterator` to extract frames with a sample rate # of `fps=0.25`, or 1 frame per 4 seconds of video. Setting `fps=0` would use the # native framerate of the video, extracting every frame. frames = pxt.create_view( 'ls_demo/frames_2024_04_22', videos_table.where(videos_table.date == today), iterator=frame_iterator(videos_table.video, fps=0.25), ) ``` ```python theme={null} # Show just the first 3 frames in the table, to avoid cluttering the notebook frames.select(frames.frame).head(3) ``` Now we’ll use the Resnet-50 object detection model to generate preannotations. We do this by creating a new computed column. ```python theme={null} from pixeltable.functions.huggingface import detr_for_object_detection # Run the Resnet-50 object detection model against each frame to generate bounding boxes frames.add_computed_column( detections=detr_for_object_detection( frames.frame, model_id='facebook/detr-resnet-50', threshold=0.95 ) ) frames.select(frames.frame, frames.detections).head(3) ```
Added 11 column values with 0 errors in 9.71 s (1.13 rows/s)We’d like to send these detections to Label Studio as preannotations, but they’re not quite ready. Label Studio expects preannotations in standard COCO format, but the Huggingface library outputs them in its own custom format. We can use Pixeltable’s handy `detr_to_coco` function to do the conversion, using another computed column. ```python theme={null} from pixeltable.functions.huggingface import detr_to_coco frames.add_computed_column( preannotations=detr_to_coco(frames.frame, frames.detections) ) frames.select( frames.frame, frames.detections, frames.preannotations ).head(3) ``` ## Create a Label Studio Project for Frames With our data workflow set up and the COCO preannotations prepared, all that’s left is to create a corresponding Label Studio project. Note how Pixeltable automatically maps `RectangleLabels` preannotation fields to columns, just like it does with data fields. Here, Pixeltable interprets the `name="preannotations"` attribute in `RectangleLabels` to mean, “map these rectangle labels to the `preannotations` column in my linked table or view”. The Label values `car`, `person`, and `train` are standard COCO object identifiers used by many off-the-shelf object detection models. You can find the complete list of them here, and include as many as you wish: [https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/coco-categories.csv](https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/coco-categories.csv) ```python theme={null} frames_config = """
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata Created directory 'fo_demo'.

```python theme={null}
# Create a Pixeltable table for our dataset and insert some sample images.
url_prefix = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images'
urls = [
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000019.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000025.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000030.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000034.jpg',
]
t = pxt.create_table('fo_demo/images', {'image': pxt.Image})
t.insert({'image': url} for url in urls)
t.head()
```
Created table 'images'. Inserted 4 rows with 0 errors in 0.71 s (5.60 rows/s)Now we export our new table to a Voxel51 dataset and load it into a new Voxel51 session within our demo notebook. Once it’s been loaded, the images can be interactively navigated as with any other Voxel51 dataset. ```python theme={null} fo_dataset = pxt.io.export_images_as_fo_dataset(t, t.image) session = fo.launch_app(fo_dataset) ```
You are running the oldest supported major version of MongoDB. Please refer to https://deprecation.voxel51.com for deprecation notices. You can suppress this exception by setting your `database_validation` config parameter to `False`. See https://docs.voxel51.com/user_guide/config.html#configuring-a-mongodb-connection for more information. 28 [31.4ms elapsed, ? remaining, 890.5 samples/s]

## Adding Labels

We’ll now show how Voxel51 labels can be attached to the exported dataset. Currently, Pixeltable supports only classification and detection labels; other Voxel51 label types may be added in the future. First, let’s generate some labels by applying two models from the Huggingface `transformers` library: a ViT model for image classification and a DETR model for object detection.

```python theme={null}
from pixeltable.functions.huggingface import (
    detr_for_object_detection,
    vit_for_image_classification,
)

t.add_computed_column(
    classifications=vit_for_image_classification(
        t.image, model_id='google/vit-base-patch16-224'
    )
)
t.add_computed_column(
    detections=detr_for_object_detection(
        t.image, model_id='facebook/detr-resnet-50'
    )
)
```
Added 4 column values with 0 errors in 4.17 s (0.96 rows/s) Added 4 column values with 0 errors in 2.72 s (1.47 rows/s) 4 rows updated.Both models output JSON containing the model results. Let’s peek at the contents of our table now: ```python theme={null} t.head() ``` Now we need to transform our model data into the format the Voxel51 API expects (see the Pixeltable documentation for [pxt.io.export\_images\_as\_fo\_dataset](/sdk/latest/io#func-export_images_as_fo_dataset) for details). We’ll use Pixeltable UDFs to do the appropriate conversions. ```python theme={null} @pxt.udf def vit_to_fo(vit_labels: list) -> list: return [ {'label': label, 'confidence': score} for label, score in zip( vit_labels['label_text'], vit_labels['scores'] ) ] @pxt.udf def detr_to_fo(img: pxt.Image, detr_labels: dict) -> list: result = [] for label, box, score in zip( detr_labels['label_text'], detr_labels['boxes'], detr_labels['scores'], ): # DETR gives us bounding boxes in (x1,y1,x2,y2) absolute (pixel) coordinates. # Voxel51 expects (x,y,w,h) relative (fractional) coordinates. # So we need to do a conversion. fo_box = [ box[0] / img.width, box[1] / img.height, (box[2] - box[0]) / img.width, (box[3] - box[1]) / img.height, ] result.append( {'label': label, 'bounding_box': fo_box, 'confidence': score} ) return result ``` We can test that our UDFs are working as expected with a `select()` statement. ```python theme={null} t.select( t.image, t.classifications, vit_to_fo(t.classifications), t.detections, detr_to_fo(t.image, t.detections), ).head() ``` Now we pass the modified structures to `export_images_as_fo_dataset`. ```python theme={null} fo_dataset = pxt.io.export_images_as_fo_dataset( t, t.image, classifications=vit_to_fo(t.classifications), detections=detr_to_fo(t.image, t.detections), ) session = fo.launch_app(fo_dataset) ```
28 \[41.8ms elapsed, ? remaining, 669.2 samples/s]## Adding Multiple Label Sets You can include multiple label sets of the same type in the same dataset by passing a `list` or `dict` of expressions to the `classifications` and/or `detections` parameters. If a `list` is specified, default names will be assigned to the label sets; if a `dict` is specified, the label sets will be named according to its keys. As an example, let’s try recomputing our detections using the more powerful DETR model ResNet-101, and then load them into the same Voxel51 dataset as the earlier detections in order to compare them side-by-side. ```python theme={null} t.add_computed_column( detections_101=detr_for_object_detection( t.image, model_id='facebook/detr-resnet-101' ) ) ```
Added 4 column values with 0 errors in 21.91 s (0.18 rows/s) 4 rows updated.```python theme={null} fo_dataset = pxt.io.export_images_as_fo_dataset( t, t.image, classifications=vit_to_fo(t.classifications), detections={ 'detections_50': detr_to_fo(t.image, t.detections), 'detections_101': detr_to_fo(t.image, t.detections_101), }, ) session = fo.launch_app(fo_dataset) ```
28 \[44.2ms elapsed, ? remaining, 633.4 samples/s]Exploring the resulting images, we can see that the results are not much different between the two models, at least on our small sample dataset. # Cloud Storage Source: https://docs.pixeltable.com/integrations/cloud-storage Store and manage media files in cloud storage providers like S3, GCS, Azure, and more Pixeltable supports storing media files (images, videos, audio, documents) in external cloud storage providers instead of local disk. This is essential for production deployments, enabling scalable storage, team collaboration, and integration with existing data infrastructure. ## Supported providers
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'demo'. Created table 'first'.We can use `t.describe()` to examine the table schema. We see that it now contains a single column, as expected. ```python theme={null} t.describe() ``` The new table is initially empty, with no rows: ```python theme={null} t.count() ```
0Now let’s put an image into it! We can add images simply by giving Pixeltable their URLs. The example images in this demo come from the [COCO dataset](https://cocodataset.org/), and we’ll be referencing copies of them in the Pixeltable github repo. But in practice, the images can come from anywhere: an S3 bucket, say, or the local file system. When we add the image, we see that Pixeltable gives us some useful status updates indicating that the operation was successful. ```python theme={null} t.insert( [ { 'input_image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000025.jpg' } ] ) ```
Inserted 1 row with 0 errors in 0.21 s (4.86 rows/s) 1 row inserted.We can use `t.head()` to examine the contents of the table. ```python theme={null} t.head() ``` ## Adding computed columns Great! Now we have a table containing some data. Let’s add an object detection model to our workflow. Specifically, we’re going to use the ResNet-50 object detection model, which runs using the Huggingface DETR (“DEtection TRansformer”) model class. Pixeltable contains a built-in adapter for this model family, so all we have to do is call the `detr_for_object_detection` Pixeltable function. A nice thing about the Huggingface models is that they run locally, so you don’t need an account with a service provider in order to use them. This is our first example of a **computed column**, a key concept in Pixeltable. Recall that when we created the `input_image` column, we specified a type, `ImageType`, indicating our intent to populate it with data in the future. When we create a *computed* column, we instead specify a function that operates on other columns of the table. By default, when we add the new computed column, Pixeltable immediately evaluates it against all existing data in the table - in this case, by calling the `detr_for_object_detection` function on the image. Depending on your setup, it may take a minute for the function to execute. In the background, Pixeltable is downloading the model from Huggingface (if necessary), instantiating it, and caching it for later use. ```python theme={null} from pixeltable.functions import huggingface t.add_computed_column( detections=huggingface.detr_for_object_detection( t.input_image, model_id='facebook/detr-resnet-50' ) ) ```
Added 1 column value with 0 errors in 3.26 s (0.31 rows/s) 1 row updated.

Let’s examine the results.

```python theme={null}
t.head()
```

We see that the model returned a JSON structure containing a lot of information. In particular, it has the following fields:

* `label_text`: Descriptions of the objects detected
* `boxes`: Bounding boxes for each detected object
* `scores`: Confidence scores for each detection
* `labels`: The DETR model’s internal IDs for the detected objects

Perhaps this is more than we need, and all we really want are the text labels. We could add another computed column to extract `label_text` from the JSON struct:

```python theme={null}
t.add_computed_column(detections_text=t.detections.label_text)
t.head()
```

If we inspect the table schema now, we see how Pixeltable distinguishes between ordinary and computed columns.

```python theme={null}
t.describe()
```

Now let’s add some more images to our table. This demonstrates another important feature of computed columns: by default, they update incrementally any time new data shows up on their inputs. In this case, Pixeltable will run the ResNet-50 model against each new image that is added, then extract the labels into the `detections_text` column. Pixeltable will orchestrate the execution of any sequence (or DAG) of computed columns. Note how we can pass multiple rows to `t.insert` with a single statement, which will insert them more efficiently.

```python theme={null}
more_images = [
    'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000030.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000034.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000042.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000061.jpg',
]
t.insert({'input_image': image} for image in more_images)
```
Inserted 4 rows with 0 errors in 1.51 s (2.65 rows/s) 4 rows inserted.

Let’s see what the model came up with. We’ll use `t.select` to suppress the display of the `detections` column, since right now we’re only interested in the text labels.

```python theme={null}
t.select(t.input_image, t.detections_text).head()
```

## Pixeltable is persistent

An important feature of Pixeltable is that *everything is persistent*. Unlike in-memory Python libraries such as Pandas, Pixeltable is a database: all your data, transformations, and computed columns are stored and preserved between sessions. To see this, let’s clear all the variables in our notebook and start fresh. You can optionally restart your notebook kernel at this point, to demonstrate how Pixeltable data persists across sessions.

```python theme={null}
# Clear all variables in the notebook
%reset -f

# Re-import Pixeltable and get a fresh handle to the table
import pixeltable as pxt

t = pxt.get_table('demo/first')

# Display just the first two rows, to avoid cluttering the tutorial
t.select(t.input_image, t.detections_text).head(2)
```

## GPT-4o

For comparison, let’s try running our examples through a generative model, OpenAI’s `gpt-4o-mini`. For this section, you’ll need an OpenAI account with an API key. You can use the following command to add your API key to the environment (just enter your API key when prompted):

```python theme={null}
import getpass
import os

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass(
        'Enter your OpenAI API key:'
    )
```

Now we can connect to OpenAI through Pixeltable. This may take some time, depending on how long OpenAI takes to process the query.

```python theme={null}
from pixeltable.functions import openai

# Construct a message dict for OpenAI. It follows the same pattern
# as the OpenAI SDK, except that in place of an image URL, we can
# put a reference to our image column, and Pixeltable will do the
# substitution once for each row of the table.
messages = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': "What's in this image?"},
            {'type': 'image_url', 'image_url': t.input_image},
        ],
    }
]
t.add_computed_column(
    vision=openai.chat_completions(messages, model='gpt-4o-mini')
)
```
Added 5 column values with 0 errors in 6.98 s (0.72 rows/s) 5 rows updated.Let’s see how GPT-4’s responses compare to the traditional discriminative (DETR) model. ```python theme={null} t.select(t.input_image, t.detections_text, t.vision).head() ``` It looks like OpenAI returned a whole range of context information along with the image descriptions. Let’s pluck out just the response content from inside those JSON structures, so that it’s easier to see in the table. Note that we can unpack JSON columns in Pixeltable the same way we would with ordinary Python dicts and lists. ```python theme={null} t.select( t.input_image, t.detections_text, t.vision['choices'][0]['message']['content'], ).head() ``` In addition to adapters for local models and inference APIs, Pixeltable can perform a range of more basic image operations. These image operations can be seamlessly chained with API calls, and Pixeltable will keep track of the sequence of operations, constructing new images and caching when necessary to keep things running smoothly. Just for fun (and to demonstrate the power of computed columns), let’s see what OpenAI thinks of our sample images when we rotate them by 180 degrees. ```python theme={null} t.add_computed_column(rot_image=t.input_image.rotate(180)) # This is identical to the preceding messages dict, but with # `t.rot_image` in place of `t.input_image`. messages = [ { 'role': 'user', 'content': [ {'type': 'text', 'text': "What's in this image?"}, {'type': 'image_url', 'image_url': t.rot_image}, ], } ] t.add_computed_column( rot_vision=openai.chat_completions(messages, model='gpt-4o-mini') ) ```
Added 5 column values with 0 errors in 6.19 s (0.81 rows/s) 5 rows updated.

```python theme={null}
t.select(
    t.rot_image, t.rot_vision['choices'][0]['message']['content']
).head()
```

## UDFs: Enhancing Pixeltable’s capabilities

Another important principle of Pixeltable is that, although Pixeltable has a built-in library of useful operations and adapters, it will never prescribe a particular way of doing things. Pixeltable is built from the ground up to be extensible.

Let’s take a specific example. Recall our use of the ResNet-50 detection model, in which the `detections` column contains a JSON blob with bounding boxes, scores, and labels. Suppose we want to create a column containing the single label with the highest confidence score. There’s no built-in Pixeltable function to do this, but it’s easy to write our own. In fact, all we have to do is write a Python function that does the thing we want, and mark it with the `@pxt.udf` decorator.

```python theme={null}
@pxt.udf
def top_detection(detect: dict) -> str:
    scores = detect['scores']
    label_text = detect['label_text']
    # Get the index of the object with the highest confidence
    i = scores.index(max(scores))
    # Return the corresponding label
    return label_text[i]
```

```python theme={null}
t.add_computed_column(top=top_detection(t.detections))
```
Added 5 column values with 0 errors in 0.11 s (45.52 rows/s) 5 rows updated.```python theme={null} t.select(t.detections_text, t.top).show() ``` Congratulations! You’ve reached the end of the tutorial. Hopefully, this gives a good overview of the capabilities of Pixeltable, but there’s much more to explore. As a next step, you might check out one of the other tutorials, depending on your interests: * [Object Detection in Videos](/howto/use-cases/object-detection-in-videos) * [RAG Operations in Pixeltable](/howto/use-cases/rag-operations) * [Working with OpenAI in Pixeltable](/howto/providers/working-with-openai) # Configuration Source: https://docs.pixeltable.com/platform/configuration Complete guide to configuring Pixeltable ## Configuration options Pixeltable can be configured through: * Environment variables * System configuration file (`~/.pixeltable/config.toml` on Linux/macOS or `C:\Users\
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory 'sharing-demo'.
Extracting table data into: /Users/asiegel/.pixeltable/tmp/acad78b1-4a62-483e-a0b1-728ccb5603cf Created directory '\_system'. Created local replica 'sharing-demo/coco-copy' from URI: pxt://pixeltable:fiftyone/coco\_mini\_2017You can check that the replica exists at the local path with `list_tables()`. ```python theme={null} pxt.list_tables('sharing-demo') ```
\['sharing-demo/coco-copy']To see the structure of the replicated table: ```python theme={null} coco_copy ``` ### Working with replicas Replicated datasets are read-only locally, but you can query, explore, and use them in powerful ways: **1. Query and explore the data** ```python theme={null} # View the replicated data coco_copy.limit(3).collect() ``` **2. Perform similarity searches** Replicas include embedding indexes, so you can immediately perform similarity searches: ```python theme={null} # Get a sample image to search with sample_img = ( coco_copy.select(coco_copy.image).limit(1).collect()[0]['image'] ) sample_img ```
**3. Get a handle to the replica**

In a later session, you can re-acquire a handle to the replica with `get_table()`:

```python theme={null}
# Assign a handle to the replica
coco_copy = pxt.get_table('sharing-demo.coco-copy')
```

**4. Create an independent copy**

To work with the data in new ways, create an independent table with the replica as the source:

```python theme={null}
# Create a fresh table with values only
my_coco = pxt.create_table('sharing-demo.my-coco-table', source=coco_copy)
```
Created table 'my-coco-table'.This copies the values in the source, but drops the computational definitions and cannot be updated if the source table changes. ### Updating replicas with pull If the upstream table changes, you can update your local replica using `pull()`: ```python theme={null} # Update your local replica with changes from the cloud coco_copy.pull() ```
Replica 'sharing-demo/coco-copy' is already up to date with source: pxt://pixeltable:fiftyone/d699317b-23a4-404b-8f71-6531fd8dc462This synchronizes your local replica with any updates made to the source dataset. ## Publishing datasets **Requirements:** * A Pixeltable Cloud account (Community Edition includes 1TB storage - see [pricing](https://www.pixeltable.com/pricing)) * Your API key from the [account dashboard](https://pixeltable.com/dashboard) Publishing allows you to share your datasets with your team or make them publicly available. ### Configure your API key Pixeltable looks for your API key in the `PIXELTABLE_API_KEY` environment variable. Choose one of these methods: **Option 1: In your notebook (secure and convenient)** Run this cell to securely enter your API key (get it from [pixeltable.com/dashboard](https://pixeltable.com/dashboard)): ```python theme={null} import os from getpass import getpass os.environ['PIXELTABLE_API_KEY'] = getpass('Pixeltable API Key:') ``` **Option 2: Environment variable** Add to your `~/.zshrc` or `~/.bashrc`: ```bash theme={null} export PIXELTABLE_API_KEY='your-api-key-here' ``` **Option 3: Config file** Add to `~/.pixeltable/config.toml`: ```toml theme={null} [pixeltable] api_key = 'your-api-key-here' ``` See the [Configuration Guide](/platform/configuration) for details. ### Create a sample dataset Let’s create a table with images from this repository to publish. The `comment` parameter provides a description that will be visible on Pixeltable Cloud: ```python theme={null} t = pxt.create_table( 'sharing-demo.photos', schema={'image': pxt.Image, 'description': pxt.String}, comment='Sample image dataset for demonstrating Pixeltable Cloud publishing', ) ```
Created table 'photos'.```python theme={null} base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images' t.insert( [ { 'image': f'{base_url}/000000000009.jpg', 'description': 'Kitchen scene', }, { 'image': f'{base_url}/000000000025.jpg', 'description': 'Street view', }, { 'image': f'{base_url}/000000000042.jpg', 'description': 'Indoor setting', }, ] ) ```
Inserted 3 rows with 0 errors in 0.02 s (169.05 rows/s) 3 rows inserted.### Publish your dataset Publish your table to Pixeltable Cloud. When calling `publish()`: * **`source`** (required): An existing local table - either a table path string (e.g., `'sample-images.photos'`) or table handle (e.g., `t`) * If you use a local table path string, it must match a table in your local database (you can verify with `pxt.list_tables()`) * **`destination_uri`** (required): The cloud URI where you want to publish, in the format `pxt://orgname/dataset` * Pixeltable automatically creates any directory structure in the cloud based on this URI * Your local directory structure doesn’t need to match the cloud structure See the [publish() SDK reference](/sdk/latest/pixeltable#func-publish) for full documentation. ```python theme={null} # Option 1: Publish using table path (string) pxt.publish( source='sharing-demo.photos', # Table path from list_tables() destination_uri='pxt://your-orgname/sample-images', ) # Option 2: Publish using table handle # pxt.publish( # source=t, # Table handle you assigned # destination_uri='pxt://your-orgname/sample-images' # ) ``` ### Understanding destination URIs The `destination_uri` in `publish()` uses the format: `pxt://org:database/path` **URI components:** * **`org`** (required): Your organization name * **`database`** (optional): Database name - defaults to `main` if omitted * **`path`** (required): Directory and table path in the cloud **Examples:** * `pxt://orgname/my-dataset` → Uses the default `main` database * `pxt://orgname:main/my-dataset` → Explicitly specifies the `main` database * `pxt://orgname:analytics/my-dataset` → Uses the `analytics` database **About databases:** * Every Pixeltable Cloud account includes a `main` database by default * Each database has its own storage bucket * You can create additional databases in your [Pixeltable dashboard](https://pixeltable.com/dashboard) ### Updating published datasets with push After you’ve published a dataset, you can update the cloud replica with local changes using `push()`: ```python theme={null} # Make some changes to your local table t.insert( [ { 'image': f'{base_url}/000000000049.jpg', 'description': 'Outdoor scene', } ] ) # Push the changes to your published dataset t.push() ``` This updates the published dataset on Pixeltable Cloud with your local changes. Your dataset is now published and can be replicated by others using: ```python theme={null} import pixeltable as pxt sample_images = pxt.replicate( remote_uri='pxt://your-orgname/sample-images', local_path='sample-images-copy' ) ``` **Note:** If you are the owner of a published table, you cannot use `replicate()` to create a replica of your own table. This is because the table already exists in your Pixeltable database. The `replicate()` function is intended for pulling datasets published by others into your environment. ### Access control The `access` parameter in `publish()` controls who can replicate your dataset: * **`access='private'`** (default): Only your team members can access the dataset * **`access='public'`**: Anyone can replicate your dataset You can set access control either at the time of publish using the `access` parameter, or change it later in the [Pixeltable Cloud UI](https://pixeltable.com/dashboard). You can also manage team members and permissions in your dashboard. 
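For example, to make the dataset publicly replicable at publish time, you can pass `access='public'` directly. This sketch reuses the hypothetical `your-orgname` organization from the examples above:

```python theme={null}
# Publish the table and allow anyone to replicate it
pxt.publish(
    source='sharing-demo.photos',
    destination_uri='pxt://your-orgname/sample-images',
    access='public',
)
```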
### Deleting published tables If you want to delete a published table, you have two options: **Option 1: Using the Pixeltable SDK** Use `drop_table()` with your table’s destination URI (the same `pxt://` URI you used when publishing): ```python theme={null} pxt.drop_table('pxt://your-orgname/sample-images') ``` **Option 2: Using the Pixeltable Cloud dashboard** Navigate to your [Pixeltable Cloud dashboard](https://pixeltable.com/dashboard) and delete the table from the UI. ## Get help Have questions or need support? Join our community: * **[Discord Community](https://discord.com/invite/QPyqFYx2UN)**: Ask questions, get community support, and share what you build with Pixeltable * **[YouTube](https://www.youtube.com/@PixeltableHQ)**: Watch tutorials, demos, and feature walkthroughs * **[GitHub Issues](https://github.com/pixeltable/pixeltable/issues)**: Report bugs or request features ## Resources * [Pixeltable Cloud Dashboard](https://www.pixeltable.com/dashboard) * [Pixeltable Public Datasets](https://www.pixeltable.com/data-products) * [Pixeltable SDK Reference](/sdk/latest/) # Embedding Indices Source: https://docs.pixeltable.com/platform/embedding-indexes
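As a reminder of the setup, here is a minimal sketch that creates the directory and table used in this walkthrough (assuming the `id`/`img` schema implied by the insert that follows):

```python theme={null}
import pixeltable as pxt

# Start from a clean 'indices_demo' directory, then create the images table.
pxt.drop_dir('indices_demo', force=True)
pxt.create_dir('indices_demo')
imgs = pxt.create_table('indices_demo/img_tbl', {'id': pxt.Int, 'img': pxt.Image})
```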
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory \`indices\_demo\`. Created table \`img\_tbl\`.We start out by inserting 10 rows: ```python theme={null} img_urls = [ 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000030.jpg', 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000034.jpg', 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000042.jpg', 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000049.jpg', 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000057.jpg', 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000061.jpg', 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000063.jpg', 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000064.jpg', 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000069.jpg', 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000071.jpg', ] imgs.insert({'id': i, 'img': url} for i, url in enumerate(img_urls)) ```
Computing cells: 80%|█████████████████████████████████▌ | 16/20 \[00:01\<00:00, 14.67 cells/s] Inserting rows into \`img\_tbl\`: 10 rows \[00:00, 3589.17 rows/s] Computing cells: 100%|██████████████████████████████████████████| 20/20 \[00:01\<00:00, 18.16 cells/s] Inserted 10 rows with 0 errors. UpdateStatus(num\_rows=10, num\_computed\_values=20, num\_excs=0, updated\_cols=\[], cols\_with\_excs=\[])For the sake of convenience, we’re storing the images as external URLs, which are cached transparently by Pixeltable. For details on working with external media files, see [Working with External Files](/platform/external-files). ## Creating an index To create and populate an index, we call [`Table.add_embedding_index()`](/sdk/latest/table#method-add_embedding_index) and tell it which UDF or UDFs to use to create embeddings. That definition is persisted as part of the table’s metadata, which allows Pixeltable to maintain the index in response to updates to the table. Any embedding UDF can be used for the index. For this example, we’re going to use a [CLIP](https://huggingface.co/docs/transformers/en/model_doc/clip) model, which has built-in support in Pixeltable under the [`pixeltable.functions.huggingface`](/sdk/latest/huggingface) package. As an alternative, you could use an online service such as OpenAI (see [`pixeltable.functions.openai`](/sdk/latest/openai)), or create your own embedding UDF with custom code (we’ll see how to do this below). Because we’re adding an index to an image column, the UDF we specify *must* be able to handle images. In fact, CLIP models are multimodal: they can handle both text and images, which is useful for doing lookups against the index. ```python theme={null} import PIL.Image from pixeltable.functions.huggingface import clip # create embedding index on the 'img' column imgs.add_embedding_index( 'img', embedding=clip.using(model_id='openai/clip-vit-base-patch32') ) ```
Computing cells: 100%|██████████████████████████████████████████| 10/10 [00:04<00:00, 2.50 cells/s]

The first parameter of `add_embedding_index()` is the name of the column being indexed; the `embedding` parameter specifies the embedding function to use. Notice the notation we used:

```python theme={null}
clip.using(model_id='openai/clip-vit-base-patch32')
```

`clip` is a general-purpose UDF that can accept any CLIP model available in the Hugging Face model repository. To define an embedding, however, we need to provide a specific embedding function to `add_embedding_index()`: a function that is *not* parameterized on `model_id`. The `.using(model_id=...)` syntax tells Pixeltable to specialize the `clip` UDF by fixing the `model_id` parameter to the specific value `'openai/clip-vit-base-patch32'`.
You can think of `.using()` as a partial function operator. It’s a general operator that can be applied to any UDF (not just embedding functions), transforming a UDF with *n* parameters into one with *k* parameters by fixing the values of *n - k* of its arguments. Python has something similar in the `functools` package: the `functools.partial()` operator.
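To make the analogy concrete, here is a small plain-Python sketch (with a hypothetical `power` function) showing how `functools.partial()` fixes one parameter of a two-parameter function, just as `.using()` fixes `model_id`:

```python theme={null}
from functools import partial

def power(base: float, exponent: float) -> float:
    return base ** exponent

# Fix the 'exponent' argument, leaving a one-parameter function.
square = partial(power, exponent=2)
print(square(7))  # 49
```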
Computing cells: 33%|██████████████ | 10/30 \[00:01\<00:02, 8.90 cells/s] Inserting rows into \`img\_tbl\`: 10 rows \[00:00, 1337.60 rows/s] Computing cells: 100%|██████████████████████████████████████████| 30/30 \[00:01\<00:00, 24.55 cells/s] Inserted 10 rows with 0 errors. UpdateStatus(num\_rows=10, num\_computed\_values=30, num\_excs=0, updated\_cols=\[], cols\_with\_excs=\[])When we now re-run the initial similarity query, we get a different result: ```python theme={null} sim = imgs.img.similarity(image=sample_img) res = ( imgs.order_by(sim, asc=False) .limit(2) .select(imgs.id, imgs.img, sim) .collect() ) res ``` ## Similarity search on different types Because CLIP models are multimodal, we can also do lookups by text. ```python theme={null} sim = imgs.img.similarity(string='train') # String lookup res = ( imgs.order_by(sim, asc=False) .limit(2) .select(imgs.id, imgs.img, sim) .collect() ) res ``` ## Creating multiple indexes on a single column We can create multiple embedding indexes on the same column, utilizing different embedding models. In order to use a specific index in a query, we need to assign it a name and then use that name in the query. To illustrate this, let’s create a table with text (taken from the Wikipedia article on [Pablo Picasso](https://en.wikipedia.org/wiki/Pablo_Picasso)): ```python theme={null} txts = pxt.create_table('indices_demo/text_tbl', {'text': pxt.String}) sentences = [ 'Pablo Ruiz Picasso (25 October 1881 – 8 April 1973) was a Spanish painter, sculptor, printmaker, ceramicist, and theatre designer who spent most of his adult life in France.', 'One of the most influential artists of the 20th century, he is known for co-founding the Cubist movement, the invention of constructed sculpture,[8][9] the co-invention of collage, and for the wide variety of styles that he helped develop and explore.', "Among his most famous works are the proto-Cubist Les Demoiselles d'Avignon (1907) and the anti-war painting Guernica (1937), a dramatic portrayal of the bombing of Guernica by German and Italian air forces during the Spanish Civil War.", 'Picasso demonstrated extraordinary artistic talent in his early years, painting in a naturalistic manner through his childhood and adolescence.', 'During the first decade of the 20th century, his style changed as he experimented with different theories, techniques, and ideas.', 'After 1906, the Fauvist work of the older artist Henri Matisse motivated Picasso to explore more radical styles, beginning a fruitful rivalry between the two artists, who subsequently were often paired by critics as the leaders of modern art.', "Picasso's output, especially in his early career, is often periodized.", 'While the names of many of his later periods are debated, the most commonly accepted periods in his work are the Blue Period (1901–1904), the Rose Period (1904–1906), the African-influenced Period (1907–1909), Analytic Cubism (1909–1912), and Synthetic Cubism (1912–1919), also referred to as the Crystal period.', "Much of Picasso's work of the late 1910s and early 1920s is in a neoclassical style, and his work in the mid-1920s often has characteristics of Surrealism.", 'His later work often combines elements of his earlier styles.', ] txts.insert({'text': s} for s in sentences) ```
Created table \`text\_tbl\`. Inserting rows into \`text\_tbl\`: 10 rows \[00:00, 3599.64 rows/s] Inserted 10 rows with 0 errors. UpdateStatus(num\_rows=10, num\_computed\_values=10, num\_excs=0, updated\_cols=\[], cols\_with\_excs=\[])When calling [`add_embedding_index()`](/sdk/latest/table#method-add_embedding_index), we now specify the index name (`idx_name`) directly. If it is not specified, Pixeltable will assign a name (such as `idx0`). ```python theme={null} from pixeltable.functions.huggingface import sentence_transformer txts.add_embedding_index( 'text', idx_name='minilm_idx', embedding=sentence_transformer.using( model_id='sentence-transformers/all-MiniLM-L12-v2' ), ) txts.add_embedding_index( 'text', idx_name='e5_idx', embedding=sentence_transformer.using(model_id='intfloat/e5-large-v2'), ) ```
Computing cells: 100%|██████████████████████████████████████████| 10/10 \[00:01\<00:00, 6.86 cells/s] Computing cells: 100%|██████████████████████████████████████████| 10/10 \[00:01\<00:00, 6.35 cells/s]To do a similarity query, we now call `similarity()` with the `idx` parameter: ```python theme={null} sim = txts.text.similarity('cubism', idx='minilm_idx') res = ( txts.order_by(sim, asc=False) .limit(2) .select(txts.text, sim) .collect() ) res ``` ## Using a UDF for a custom embedding The above examples show how to use any model in the Hugging Face `CLIP` or `sentence_transformer` model families, and essentially the same pattern can be used for any other embedding with built-in Pixeltable support, such as OpenAI embeddings. But what if you want to adapt a new model family that doesn’t have built-in support in Pixeltable? This can be done by writing a custom Pixeltable UDF. In the following example, we’ll write a simple UDF to use the [BERT](https://www.kaggle.com/models/tensorflow/bert/tensorFlow2/en-uncased-preprocess/3) model built on TensorFlow. First we install the necessary dependencies. ```python theme={null} %pip install -qU tensorflow tensorflow-hub tensorflow-text ``` Text embedding UDFs must always take a string as input, and return a 1-dimensional numpy array of fixed dimension (512 in the case of `small_bert`, the variant we’ll be using). If we were writing an image embedding UDF, the `input` would have type `PIL.Image.Image` rather than `str`. The UDF is straightforward, loading the model and evaluating it against the input, with a minor data conversion on either side of the model invocation. ```python theme={null} import pixeltable as pxt import tensorflow as tf import tensorflow_hub as hub import tensorflow_text # Necessary to ensure BERT dependencies are loaded @pxt.udf def bert(input: str) -> pxt.Array[(512,), pxt.Float]: """Computes text embeddings using the small_bert model.""" preprocessor = hub.load( 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3' ) bert_model = hub.load( 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/2' ) tensor = tf.constant([input]) # Convert the string to a tensor result = bert_model(preprocessor(tensor))['pooled_output'] return result.numpy()[0, :] ``` ```python theme={null} txts.add_embedding_index('text', idx_name='bert_idx', embedding=bert) ```
Computing cells: 100%|██████████████████████████████████████████| 10/10 \[00:17\<00:00, 1.72s/ cells]Here’s the output of our sample query run against `bert_idx`. ```python theme={null} sim = txts.text.similarity('cubism', idx='bert_idx') res = ( txts.order_by(sim, asc=False) .limit(2) .select(txts.text, sim) .collect() ) res ``` Our example UDF is very simple, but it would perform poorly in a production setting. To make our UDF production-ready, we’d want to do two things: * Cache the model: the current version calls `hub.load()` on every UDF invocation. In a real application, we’d want to instantiate the model just once, then reuse it on subsequent UDF calls. * Batch our inputs: we’d use Pixeltable’s batching capability to ensure we’re making efficient use of the model. Batched UDFs are described in depth in the [User-Defined Functions](/platform/udfs-in-pixeltable) how-to guide. You might have noticed that the updates to `bert_idx` seem sluggish; that’s why! ## Deleting an index To delete an index, call [`Table.drop_embedding_index()`](/sdk/latest/table#method-drop_embedding_index): * specify the `idx_name` parameter if you have multiple indices * otherwise the `column_name` parameter is sufficient Given that we have several embedding indices, we’ll specify which index to drop: ```python theme={null} txts.drop_embedding_index(idx_name='e5_idx') ``` # External Files Source: https://docs.pixeltable.com/platform/external-files
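As with the other walkthroughs, this demo starts from a clean directory. A minimal setup sketch (assuming the `external_data` directory name that appears in the output below) would be:

```python theme={null}
import pixeltable as pxt

# Ensure a clean 'external_data' directory for the demo.
pxt.drop_dir('external_data', force=True)
pxt.create_dir('external_data')
```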
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory `external_data`.

```python theme={null}
v = pxt.create_table('external_data/videos', {'video': pxt.Video})
prefix = 's3://multimedia-commons/'
paths = [
    'data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4',
    'data/videos/mp4/ffe/feb/ffefebb41485539f964760e6115fbc44.mp4',
    'data/videos/mp4/ffe/f73/ffef7384d698b5f70d411c696247169.mp4',
]
v.insert({'video': prefix + p} for p in paths)
```
Created table \`videos\`. Computing cells: 0%| | 0/6 \[00:00\, ? cells/s] Inserting rows into \`videos\`: 3 rows \[00:00, 1004.62 rows/s] Computing cells: 100%|████████████████████████████████████████████| 6/6 \[00:00\<00:00, 79.14 cells/s] Inserted 3 rows with 0 errors. UpdateStatus(num\_rows=3, num\_computed\_values=6, num\_excs=0, updated\_cols=\[], cols\_with\_excs=\[]) UpdateStatus(num\_rows=3, num\_computed\_values=0, num\_excs=0, updated\_cols=\[], cols\_with\_excs=\[])We just inserted 3 rows with video files residing in S3. When we now query these, we are presented with their locally cached counterparts. (Note: we don’t simply display the output of `collect()` here, because that is formatted as an HTML table with a media player and so would obscure the file path.) ```python theme={null} rows = list(v.select(v.video).collect()) rows[0] ```
\{'video': '/Users/asiegel/.pixeltable/file\_cache/682f022a704d4459adb2f29f7fe9577c\_0\_1fcfcb221263cff76a2853250fbbb2e90375dd495454c0007bc6ff4430c9a4a7.mp4'}
Let’s make a local copy of the first file and insert that separately.
First, the copy:
```python theme={null}
import shutil
import tempfile

# Copy the cached file to a temporary local path.
local_path = tempfile.mktemp(suffix='.mp4')
shutil.copyfile(rows[0]['video'], local_path)
local_path
```
'/var/folders/hb/qd0dztsj43j\_mdb6hbl1gzyc0000gn/T/tmp1jo4a7ca.mp4'Now the insert: ```python theme={null} v.insert([{'video': local_path}]) ```
Computing cells: 0%| | 0/2 \[00:00\, ? cells/s] Inserting rows into \`videos\`: 1 rows \[00:00, 725.78 rows/s] Computing cells: 100%|████████████████████████████████████████████| 2/2 \[00:00\<00:00, 53.23 cells/s] Inserted 1 row with 0 errors. UpdateStatus(num\_rows=1, num\_computed\_values=2, num\_excs=0, updated\_cols=\[], cols\_with\_excs=\[])When we query this again, we see that local paths are preserved: ```python theme={null} rows = list(v.select(v.video).collect()) rows ```
\[\{'video': '/Users/asiegel/.pixeltable/file\_cache/682f022a704d4459adb2f29f7fe9577c\_0\_1fcfcb221263cff76a2853250fbbb2e90375dd495454c0007bc6ff4430c9a4a7.mp4'},
\{'video': '/Users/asiegel/.pixeltable/file\_cache/682f022a704d4459adb2f29f7fe9577c\_0\_fc11428b32768ae782193a57ebcbad706f45bbd9fa13354471e0bcd798fee3ea.mp4'},
\{'video': '/Users/asiegel/.pixeltable/file\_cache/682f022a704d4459adb2f29f7fe9577c\_0\_b9fb0d9411bc9cd183a36866911baa7a8834f22f665bce47608566b38485c16a.mp4'},
\{'video': '/var/folders/hb/qd0dztsj43j\_mdb6hbl1gzyc0000gn/T/tmp1jo4a7ca.mp4'}]
UDFs also see local paths:
```python theme={null}
@pxt.udf
def f(v: pxt.Video) -> int:
print(f'{type(v)}: {v}')
return 1
```
```python theme={null}
v.select(f(v.video)).show()
```
<class 'str'>: /Users/asiegel/.pixeltable/file_cache/682f022a704d4459adb2f29f7fe9577c_0_1fcfcb221263cff76a2853250fbbb2e90375dd495454c0007bc6ff4430c9a4a7.mp4
<class 'str'>: /Users/asiegel/.pixeltable/file_cache/682f022a704d4459adb2f29f7fe9577c_0_fc11428b32768ae782193a57ebcbad706f45bbd9fa13354471e0bcd798fee3ea.mp4
<class 'str'>: /Users/asiegel/.pixeltable/file_cache/682f022a704d4459adb2f29f7fe9577c_0_b9fb0d9411bc9cd183a36866911baa7a8834f22f665bce47608566b38485c16a.mp4
<class 'str'>: /var/folders/hb/qd0dztsj43j_mdb6hbl1gzyc0000gn/T/tmp1jo4a7ca.mp4

## Dealing with errors

When interacting with media data in Pixeltable, the user can assume that the underlying files exist, are local, and are valid for their respective data type. In other words, the user doesn’t need to consider error conditions. To that end, Pixeltable validates media data on ingest. The default behavior is to reject invalid media files:

```python theme={null}
v.insert([{'video': prefix + 'bad_path.mp4'}])
```
Computing cells:   0%|          | 0/2 [00:01<?, ? cells/s]
Error: Failed to download s3://multimedia-commons/bad_path.mp4: An error occurred (404) when calling the HeadObject operation: Not Found

The same happens for corrupted files:

```python theme={null}
import random

# create an invalid .mp4
with tempfile.NamedTemporaryFile(
    mode='wb', suffix='.mp4', delete=False
) as temp_file:
    temp_file.write(random.randbytes(1024))
    corrupted_path = temp_file.name

v.insert([{'video': corrupted_path}])
```
Computing cells: 100%|██████████████████████████████████████████| 2/2 [00:00<00:00, 1084.64 cells/s]
Error: Not a valid video: /var/folders/hb/qd0dztsj43j_mdb6hbl1gzyc0000gn/T/tmp3djgfyjp.mp4
Alternatively, Pixeltable can also be instructed to record error
conditions and proceed with the ingest, via the `on_error` flag
(default: `'abort'`):
```python theme={null}
v.insert(
[{'video': prefix + 'bad_path.mp4'}, {'video': corrupted_path}],
on_error='ignore',
)
```
Computing cells: 100%|████████████████████████████████████████████| 4/4 [00:00<00:00, 20.98 cells/s]
Inserting rows into `videos`: 2 rows [00:00, 671.63 rows/s]
Computing cells: 100%|████████████████████████████████████████████| 4/4 [00:00<00:00, 20.13 cells/s]
Inserted 2 rows with 4 errors across 2 columns (videos.video, videos.None).
UpdateStatus(num_rows=2, num_computed_values=4, num_excs=4, updated_cols=[], cols_with_excs=['videos.video', 'videos.None'])

Every media column has properties `errortype` and `errormsg` (both containing `string` data) that indicate whether the column value is valid. Invalid values show up as `None` and have non-null `errortype`/`errormsg`:

```python theme={null}
v.select(v.video == None, v.video.errortype, v.video.errormsg).collect()
```

Errors can now be inspected (and corrected) after the ingest:

```python theme={null}
v.where(v.video.errortype != None).select(v.video.errormsg).collect()
```

## Accessing the original file paths

In some cases, it will be necessary to access file paths (not, say, the `PIL.Image.Image`), and Pixeltable provides the column properties `fileurl` and `localpath` for that purpose:

```python theme={null}
v.select(v.video.fileurl, v.video.localpath).collect()
```

Note that for local media files, the `fileurl` property still returns a parsable URL.

# Iterators

Source: https://docs.pixeltable.com/platform/iterators

Learn about iterators for processing documents, videos, audio, and images

## What are iterators?

Iterators in Pixeltable are specialized tools for processing and transforming media content. They efficiently break down large files into manageable chunks, enabling analysis at different granularities. Iterators work seamlessly with views to create virtual derived tables without duplicating storage.

In Pixeltable, iterators:

* Process media files incrementally to manage memory efficiently
* Transform single records into multiple output records
* Support various media types including documents, videos, images, and audio
* Integrate with the view system for automated processing pipelines
* Provide configurable parameters for fine-tuning output

Iterators are particularly useful when:

* Working with large media files that can't be processed at once
* Building retrieval systems that require chunked content
* Creating analysis pipelines for multimedia data
* Implementing feature extraction workflows

```python theme={null}
import pixeltable as pxt
from pixeltable.functions.document import document_splitter

# Create a view using an iterator
chunks = pxt.create_view(
    'docs/chunks',
    documents_table,
    iterator=document_splitter(
        document=documents_table.document,
        separators='sentence,token_limit',
        limit=300
    )
)
```

## Core concepts
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory `udf_demo`.
Created table `strings`.
Inserting rows into `strings`: 2 rows [00:00, 763.99 rows/s]
Inserted 2 rows with 0 errors.

## What is a UDF?

A Pixeltable UDF is just a Python function that is marked with the `@pxt.udf` decorator.

```python theme={null}
@pxt.udf
def add_one(n: int) -> int:
    return n + 1
```

It's as simple as that! Without the decorator, `add_one` would be an ordinary Python function that operates on integers. Adding `@pxt.udf` converts it into a Pixeltable function that operates on *columns* of integers. The decorated function can then be used directly to define computed columns; Pixeltable will orchestrate its execution across all the input data.

For our first working example, let's do something slightly more interesting: write a function to extract the longest word from a sentence. (If there are ties for the longest word, we choose the first word among those ties.) In Python, that might look something like this:

```python theme={null}
import numpy as np


def longest_word(sentence: str, strip_punctuation: bool = False) -> str:
    words = sentence.split()
    if strip_punctuation:
        # Remove non-alphanumeric characters from each word
        words = [''.join(filter(str.isalnum, word)) for word in words]
    i = np.argmax([len(word) for word in words])
    return words[i]
```

```python theme={null}
longest_word("Let's check that it works.", strip_punctuation=True)
```
'check'

The `longest_word` Python function isn't a Pixeltable UDF (yet); it operates on individual strings, not columns of strings. Adding the decorator turns it into a UDF:

```python theme={null}
@pxt.udf
def longest_word(sentence: str, strip_punctuation: bool = False) -> str:
    words = sentence.split()
    if strip_punctuation:
        # Remove non-alphanumeric characters from each word
        words = [''.join(filter(str.isalnum, word)) for word in words]
    i = np.argmax([len(word) for word in words])
    return words[i]
```

Now we can use it to create a computed column. Pixeltable orchestrates the computation like it does with any other function, applying the UDF in turn to each existing row of the table, then updating incrementally each time a new row is added.

```python theme={null}
t.add_computed_column(longest_word=longest_word(t.input))
t.show()
```
Computing cells: 100%|███████████████████████████████████████████| 2/2 [00:00<00:00, 138.28 cells/s]
Added 2 column values with 0 errors.

```python theme={null}
t.insert([{'input': 'Pixeltable updates tables incrementally.'}])
t.show()
```
Inserting rows into `strings`: 1 rows [00:00, 255.24 rows/s]
Computing cells: 100%|███████████████████████████████████████████| 3/3 [00:00<00:00, 364.69 cells/s]
Inserted 1 row with 0 errors.

Oops, those trailing punctuation marks are kind of annoying. Let's add another column, this time using the handy `strip_punctuation` parameter from our UDF. (We could alternatively drop the first column before adding the new one, but for purposes of this tutorial it's convenient to see how Pixeltable executes both variants side-by-side.) Note how *columns* such as `t.input` and *constants* such as `True` can be freely intermixed as arguments to the UDF.

```python theme={null}
t.add_computed_column(
    longest_word_2=longest_word(t.input, strip_punctuation=True)
)
t.show()
```
Computing cells: 100%|███████████████████████████████████████████| 3/3 [00:00<00:00, 252.91 cells/s]
Added 3 column values with 0 errors.

## Types in UDFs

You might have noticed that the `longest_word` UDF has *type hints* in its signature.

```python theme={null}
def longest_word(sentence: str, strip_punctuation: bool = False) -> str: ...
```

The `sentence` parameter, `strip_punctuation` parameter, and return value all have explicit types (`str`, `bool`, and `str` respectively). In general Python code, type hints are usually optional. But Pixeltable is a database system: *everything* in Pixeltable must have a type. And since Pixeltable is also an orchestrator - meaning it sets up workflows and computed columns *before* executing them - these types need to be known in advance. That's the reasoning behind a fundamental principle of Pixeltable UDFs:

* Type hints are *required*.

You can turn almost any Python function into a Pixeltable UDF, provided that it has type hints, and provided that Pixeltable supports the types that it uses. The most familiar types that you'll use in UDFs are:

* `int`
* `float`
* `str`
* `list` (can optionally be parameterized, e.g., `list[str]`)
* `dict` (can optionally be parameterized, e.g., `dict[str, int]`)
* `PIL.Image.Image`

In addition to these standard Python types, Pixeltable also recognizes various kinds of arrays, audio and video media, and documents.

## Local and module UDFs

The `longest_word` UDF that we defined above is a *local* UDF: it was defined directly in our notebook, rather than in a module that we imported. Many other UDFs, including all of Pixeltable's built-in functions, are defined in modules. We encountered a few of these in the 10-Minute Tour tutorial: the `huggingface.detr_for_object_detection` and `openai.vision` functions. (Although these are built-in functions, they behave the same way as UDFs, and in fact they're defined the same way under the covers.)

There is an important difference between the two. When you add a module UDF such as `openai.vision` to a table, Pixeltable stores a *reference* to the corresponding Python function in the module. If you later restart your Python runtime and reload Pixeltable, then Pixeltable will re-import the module UDF when it loads the computed column. This means that any code changes made to the UDF will be picked up at that time, and the new version of the UDF will be used in any future execution.

Conversely, when you add a local UDF to a table, the *entire code* for the UDF is serialized and stored in the table. This ensures that if you restart your notebook kernel (say), or even delete the notebook entirely, the UDF will continue to function. However, it also means that if you modify the UDF code, the updated logic will not be reflected in any existing Pixeltable columns.

To see how this works in practice, let's modify our `longest_word` UDF so that if `strip_punctuation` is `True`, then we remove only a single punctuation mark from the *end* of each word.

```python theme={null}
@pxt.udf
def longest_word(sentence: str, strip_punctuation: bool = False) -> str:
    words = sentence.split()
    if strip_punctuation:
        words = [
            word if word[-1].isalnum() else word[:-1] for word in words
        ]
    i = np.argmax([len(word) for word in words])
    return words[i]
```

Now we see that Pixeltable continues to use the *old* definition, even as new rows are added to the table.

```python theme={null}
t.insert([{'input': "Let's check that it still works."}])
t.show()
```
Inserting rows into `strings`: 1 rows [00:00, 242.01 rows/s]
Computing cells: 100%|███████████████████████████████████████████| 5/5 [00:00<00:00, 623.99 cells/s]
Inserted 1 row with 0 errors.

But if we add a new *column* that references the `longest_word` UDF, Pixeltable will use the updated version.

```python theme={null}
t.add_computed_column(
    longest_word_3=longest_word(t.input, strip_punctuation=True)
)
t.show()
```
Computing cells: 100%|███████████████████████████████████████████| 4/4 [00:00<00:00, 348.89 cells/s]
Added 4 column values with 0 errors.

The general rule is: changes to module UDFs will affect any future execution; changes to local UDFs will only affect *new columns* that are defined using the new version of the UDF.

## Batching

Pixeltable provides several ways to optimize UDFs for better performance. One of the most common is *batching*, which is particularly important for UDFs that involve GPU operations.

Ordinary UDFs process one row at a time, meaning the UDF will be invoked exactly once per row processed. Conversely, a batched UDF processes several rows at a time; the specific number is user-configurable.

As an example, let's modify our `longest_word` UDF to take a batched parameter. Here's what it looks like:

```python theme={null}
from pixeltable.func import Batch


@pxt.udf(batch_size=16)
def longest_word(
    sentences: Batch[str], strip_punctuation: bool = False
) -> Batch[str]:
    results = []
    for sentence in sentences:
        words = sentence.split()
        if strip_punctuation:
            words = [
                word if word[-1].isalnum() else word[:-1] for word in words
            ]
        i = np.argmax([len(word) for word in words])
        results.append(words[i])
    return results
```

There are several changes:

* The parameter `batch_size=16` has been added to the `@pxt.udf` decorator, specifying the batch size;
* The `sentences` parameter has changed from `str` to `Batch[str]`;
* The return type has also changed from `str` to `Batch[str]`; and
* Instead of processing a single sentence, the UDF is processing a `Batch` of sentences and returning the result `Batch`.

What exactly is a `Batch[str]`? Functionally, it's simply a `list[str]`, and you can use it exactly like a `list[str]` in any Python code. The only difference is in the type hint; a type hint of `Batch[str]` tells Pixeltable, "My data consists of individual strings that I want you to process in batches". Conversely, a type hint of `list[str]` would mean, "My data consists of *lists* of strings that I want you to process one at a time".

Notice that the `strip_punctuation` parameter is *not* wrapped in a `Batch` type. This is because `strip_punctuation` controls the behavior of the UDF, rather than being part of the input data. When we use the batched `longest_word` UDF, the `strip_punctuation` parameter will always be a constant, not a column.

Let's put the new, batched UDF to work.

```python theme={null}
t.add_computed_column(
    longest_word_3_batched=longest_word(t.input, strip_punctuation=True)
)
t.show()
```
Computing cells: 100%|███████████████████████████████████████████| 4/4 [00:00<00:00, 353.90 cells/s]
Added 4 column values with 0 errors.

As expected, the output of the `longest_word_3_batched` column is identical to the `longest_word_3` column. Under the covers, though, Pixeltable is orchestrating execution in batches of 16. That probably won't have much performance impact on our toy example, but for GPU-bound computations such as text or image embeddings, it can make a substantial difference.

## UDAs (aggregate UDFs)

Ordinary UDFs are always one-to-one on rows: each row of input generates one UDF output value. Functions that aggregate data, conversely, are many-to-one, and in Pixeltable they are represented by a related abstraction, the UDA (User-Defined Aggregate). Pixeltable has a number of built-in UDAs; if you've worked through the Fundamentals tutorial, you'll have already encountered a few of them, such as `sum` and `count`. In this section, we'll show how to define your own custom UDAs.

For demonstration purposes, let's start by creating a table containing all the integers from 0 to 49.

```python theme={null}
import pixeltable as pxt

t = pxt.create_table('udf_demo/values', {'val': pxt.Int})
t.insert({'val': n} for n in range(50))
```
Created table `values`.
Inserting rows into `values`: 50 rows [00:00, 9267.95 rows/s]
Inserted 50 rows with 0 errors.
UpdateStatus(num_rows=50, num_computed_values=0, num_excs=0, updated_cols=[], cols_with_excs=[])

If we wanted to compute their sum using the built-in `sum` aggregate, we'd do it like this:

```python theme={null}
import pixeltable.functions as pxtf

t.select(pxtf.sum(t.val)).collect()
```

Or perhaps we want to group them by `n // 10` (corresponding to the tens digit of each integer) and sum each group:

```python theme={null}
t.group_by(t.val // 10).order_by(t.val // 10).select(
    t.val // 10, pxtf.sum(t.val)
).collect()
```

Now let's define a new aggregate to compute the sum of squares of a set of numbers. To define an aggregate, we implement a subclass of the `pxt.Aggregator` Python class and decorate it with the `@pxt.uda` decorator, similar to what we did for UDFs. The subclass must implement three methods:

* `__init__()` - initializes the aggregator; can be used to parameterize aggregator behavior
* `update()` - updates the internal state of the aggregator with a new value
* `value()` - retrieves the current value held by the aggregator

In our example, the class will have a single member `cur_sum`, which holds a running total of the squares of all the values we've seen.

```python theme={null}
@pxt.uda
class sum_of_squares(pxt.Aggregator):
    def __init__(self):
        # No data yet; initialize `cur_sum` to 0
        self.cur_sum = 0

    def update(self, val: int) -> None:
        # Update the value of `cur_sum` with the new datapoint
        self.cur_sum += val * val

    def value(self) -> int:
        # Retrieve the current value of `cur_sum`
        return self.cur_sum
```

```python theme={null}
t.select(sum_of_squares(t.val)).collect()
```

```python theme={null}
t.group_by(t.val // 10).order_by(t.val // 10).select(
    t.val // 10, sum_of_squares(t.val)
).collect()
```

# Version Control and Lineage

Source: https://docs.pixeltable.com/platform/version-control

Automatic versioning, time travel queries, and full data lineage tracking

Pixeltable automatically tracks every change to your tables—data insertions, updates, deletions, and schema modifications. Query any point in history, undo mistakes, and maintain full reproducibility without manual version management.

## How it works

Every operation that modifies a table creates a new version:

```python theme={null}
import pixeltable as pxt

# Version 0: Table created
products = pxt.create_table('demo/products', {
    'name': pxt.String,
    'price': pxt.Float
})

# Version 1: Data inserted
products.insert([
    {'name': 'Widget', 'price': 9.99},
    {'name': 'Gadget', 'price': 24.99}
])

# Version 2: Schema changed
products.add_computed_column(price_with_tax=products.price * 1.08)

# Version 3: Data updated
products.update({'price': 19.99}, where=products.name == 'Widget')
```

No configuration required—versioning is always on.
## Viewing history

### Human-readable history

```python theme={null}
products.history()
```

Returns a DataFrame showing all versions with timestamps, change types, and row counts:

| version | created_at          | change_type | inserts | updates | deletes | schema_change         |
| ------- | ------------------- | ----------- | ------- | ------- | ------- | --------------------- |
| 3       | 2025-01-15 10:30:00 | data        | 0       | 1       | 0       | None                  |
| 2       | 2025-01-15 10:29:00 | schema      | 0       | 2       | 0       | Added: price_with_tax |
| 1       | 2025-01-15 10:28:00 | data        | 2       | 0       | 0       | None                  |
| 0       | 2025-01-15 10:27:00 | schema      | 0       | 0       | 0       | Initial Version       |

### Programmatic access

```python theme={null}
versions = products.get_versions()  # List of dictionaries
latest = versions[0]
print(f"Version {latest['version']}: {latest['inserts']} inserts")
```

## Time travel queries

Query any historical version using the `table_name:version` syntax:

```python theme={null}
# Get the table at version 1 (before computed column)
products_v1 = pxt.get_table('demo/products:1')
products_v1.collect()  # Returns data as it was at version 1

# Compare with current state
products.collect()  # Returns current data
```

Version handles are **read-only**—you cannot modify historical data.

### Use cases

* **Debugging**: Compare data before and after a problematic update
* **Auditing**: Track who changed what and when
* **Recovery**: Find and extract accidentally deleted or modified data
* **Reproducibility**: Query exact data used for a specific model training run

## Reverting changes

Undo the most recent change with `revert()`:

```python theme={null}
# Oops, wrong update
products.update({'price': 0.00}, where=products.name == 'Widget')

# Undo it
products.revert()  # Removes version N, table is now at version N-1
```
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'fundamentals'.
Created table 'population'.
Inserting rows into `population`: 234 rows [00:00, 6850.71 rows/s]
Inserted 234 rows with 0 errors.

Also recall that `pop_t.head()` returns the first few rows of a table, and typing the table name `pop_t` by itself gives the schema.

```python theme={null}
pop_t.head(5)
```

```python theme={null}
pop_t
```

Now let's suppose we want to add a new column for the year-over-year population change from 2022 to 2023. You can `select()` such a quantity into a Pixeltable `Query`, giving it the name `yoy_change` (year-over-year change):

```python theme={null}
pop_t.select(
    pop_t.country, yoy_change=(pop_t.pop_2023 - pop_t.pop_2022)
).head(5)
```

A **computed column** is a way of turning such a selection into a new, permanent column of the table. Here's how it works:

```python theme={null}
pop_t.add_computed_column(yoy_change=(pop_t.pop_2023 - pop_t.pop_2022))
```
Added 234 column values with 0 errors.
234 rows updated, 468 values computed.

As soon as the column is added, Pixeltable will (by default) automatically compute its value for all rows in the table, storing the results in the new column. If we now inspect the schema of `pop_t`, we see the new column and its definition.

```python theme={null}
pop_t
```

The new column can be queried in the usual manner.

```python theme={null}
pop_t.select(pop_t.country, pop_t.yoy_change).head(5)
```

The output is identical to the previous example, but now we're retrieving the computed output from the database, instead of computing it on-the-fly.

Computed columns can be "chained" with other computed columns. Here's an example that expresses population change as a percentage:

```python theme={null}
pop_t.add_computed_column(
    yoy_percent_change=(100 * pop_t.yoy_change / pop_t.pop_2022)
)
```
Added 234 column values with 0 errors.
234 rows updated, 468 values computed.

```python theme={null}
pop_t
```

```python theme={null}
pop_t.select(
    pop_t.country, pop_t.yoy_change, pop_t.yoy_percent_change
).head(5)
```

Although computed columns appear superficially similar to Queries, there is a key difference. Because computed columns are a permanent part of the table, they will be automatically updated any time new data is added to the table. These updates will propagate through any other computed columns that are "downstream" of the new data, ensuring that the state of the entire table is kept up-to-date.
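The cell that performs the next insert isn't reproduced above; a minimal sketch of what it might look like, assuming the table's data columns are `country`, `pop_2022`, and `pop_2023` (the row values below are made up for illustration):

```python theme={null}
# Hypothetical insert; the real row values aren't shown in this guide.
# Inserting a row automatically triggers computation of the downstream
# computed columns (yoy_change and yoy_percent_change).
pop_t.insert(
    [{'country': 'Atlantis', 'pop_2022': 1_000_000, 'pop_2023': 1_010_000}]
)
```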
Inserting rows into `population`: 1 rows [00:00, 228.35 rows/s]
Inserted 1 row with 0 errors.
1 row inserted, 5 values computed.

Observe that the computed columns `yoy_change` and `yoy_percent_change` have been automatically updated in response to the new data.

```python theme={null}
pop_t.tail(5)
```
Inserting rows into `population`: 235 rows [00:00, 8795.92 rows/s]
235 rows updated, 940 values computed.

```python theme={null}
pop_t.tail(5)
```

As expected, it looks the same.
`recompute_columns()` is primarily useful when the input data remains the same, but your UDF business logic changes.
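As a rough illustration, a recompute after a logic change might look like the following sketch. The function name comes from the note above, but the exact signature and the use of column references here are assumptions:

```python theme={null}
# Hypothetical sketch: re-run the stored computations for two computed
# columns after their defining logic has changed.
pop_t.recompute_columns(pop_t.yoy_change, pop_t.yoy_percent_change)
```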
Created table 'image_ops'.

```python theme={null}
url_prefix = 'https://github.com/pixeltable/pixeltable/raw/release/docs/resources/images'
images = ['000000000139.jpg', '000000000632.jpg', '000000000872.jpg']

t.insert({'source': f'{url_prefix}/{image}'} for image in images)
```
Inserting rows into `image_ops`: 3 rows [00:00, 1133.39 rows/s]
Inserted 3 rows with 0 errors.
3 rows inserted, 6 values computed.

```python theme={null}
t.collect()
```

What are some things we might want to do with these images? A fairly basic one is to extract metadata. Pixeltable provides the built-in UDF `get_metadata()`, which returns a dictionary with various metadata about the image. Let's go ahead and make this a computed column.
The `get_metadata()` function isn't user-defined; it's built in to the Pixeltable library. But we'll consistently refer to Pixeltable functions as "UDFs" in order to clearly distinguish them from ordinary Python functions. Later in this guide, we'll see how to turn (almost) any Python function into a Pixeltable UDF.
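The cell that adds this column isn't shown above; a minimal sketch, assuming `get_metadata()` can be called as a method on the image column (as with the other image UDFs in this guide), would be:

```python theme={null}
# Store each image's metadata (format, size, mode, etc.) as a JSON column.
t.add_computed_column(metadata=t.source.get_metadata())
```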
Added 3 column values with 0 errors.

Image operations, of course, can also return new images.

```python theme={null}
t.add_computed_column(rotated=t.source.rotate(10))
```
Added 3 column values with 0 errors.
3 rows updated, 3 values computed.

```python theme={null}
t.collect()
```

Or perhaps we want to rotate our images and fill them in with a transparent background rather than black. We can do this by chaining image operations, adding a transparency layer before doing the rotation.

```python theme={null}
t.add_computed_column(
    rotated_transparent=t.source.convert('RGBA').rotate(10)
)
t.collect()
```
Added 3 column values with 0 errors.
In addition to `get_metadata()`, `convert()`, and `rotate()`, Pixeltable has a sizable library of other common image operations that can be used as UDFs in computed columns. For the most part, the image UDFs are analogs of the operations provided by the Pillow library (in fact, Pixeltable is just using Pillow under the covers). You can read more about the provided image (and other) UDFs in the Pixeltable SDK Documentation.
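The cell that created the `detections` column referenced below doesn't appear above. A sketch of how such a column might be added with the `detr_for_object_detection` UDF mentioned earlier (the `model_id` below is an assumption, not taken from the original notebook):

```python theme={null}
from pixeltable.functions.huggingface import detr_for_object_detection

# Run a DETR object-detection model over each source image and store the raw
# output (bounding boxes, scores, labels) in a computed column.
t.add_computed_column(
    detections=detr_for_object_detection(
        t.source, model_id='facebook/detr-resnet-50'
    )
)
```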
Added 3 column values with 0 errors.
3 rows updated, 3 values computed.

```python theme={null}
t.select(t.source, t.detections).collect()
```

It's great that the DETR model gave us so much information about the images, but it's not exactly in human-readable form. Those are JSON structures that encode bounding boxes, confidence scores, and categories for each detected object. Let's do something more useful with them: we'll use Pixeltable's `draw_bounding_boxes()` API to superimpose bounding boxes on the images, using different colors to distinguish different object categories.

```python theme={null}
from pixeltable.functions.vision import draw_bounding_boxes

t.add_computed_column(
    image_with_bb=draw_bounding_boxes(
        t.source,
        t.detections.boxes,
        labels=t.detections.label_text,
        fill=True,
    )
)
t.select(t.source, t.image_with_bb).collect()
```
Added 3 column values with 0 errors.

It can be a little hard to see what's going on, so let's zoom in on just one image. If you select a single image in a notebook, Pixeltable will enlarge its display:

```python theme={null}
t.select(t.image_with_bb).head(1)
```

Let's check in on our schema. We now have five computed columns, all derived from the single source column.

```python theme={null}
t
```

And as always, when we add new data to the table, its computed columns are updated automatically. Let's try this on a few more images.

```python theme={null}
more_images = ['000000000108.jpg', '000000000885.jpg']
t.insert({'source': f'{url_prefix}/{image}'} for image in more_images)
```
Inserting rows into `image_ops`: 2 rows [00:00, 944.77 rows/s]
Inserted 2 rows with 0 errors.
2 rows inserted, 14 values computed.

```python theme={null}
t.select(
    t.source, t.image_with_bb, t.detections.label_text, t.metadata
).tail(2)
```
Pixeltable persists the full output of every computed column, including JSON structures such as `t.detections`, as well as generated images such as `t.image_with_bb`. (Later we'll see how to tune this behavior in cases where it might be undesirable to store everything, but the default behavior is that computed column output is always persisted.)
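As a preview, a hedged sketch of what that tuning might look like; the `stored` parameter of `add_computed_column` is assumed here to control persistence, and the column name is illustrative:

```python theme={null}
# Hypothetical: define a computed column whose values are not persisted,
# so they are recomputed on demand when queried.
t.add_computed_column(rotated_preview=t.source.rotate(10), stored=False)
```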
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'demo'.

In this guide we'll work with a subset of the MNIST dataset, a classic reference database of hand-drawn digits. A copy of the MNIST dataset is hosted on the Hugging Face datasets repository, so we can use `create_table()` with the `source` parameter to load it into a Pixeltable table.

```python theme={null}
import datasets

# Download the first 50 images of the MNIST dataset
ds = datasets.load_dataset('ylecun/mnist', split='train[:50]')

# Import them into a Pixeltable table
t = pxt.create_table('demo/mnist', source=ds)
```
Created table 'mnist'.
Inserting rows into `mnist`: 50 rows [00:00, 7516.67 rows/s]
Inserted 50 rows with 0 errors.

```python theme={null}
t.head(5)
```

### Column References

The most basic type of expression is a **column reference**: that's what you get when you type, say, `t.image`. An expression such as `t.image` by itself is just a Python object; it doesn't contain any actual data, and no data will be loaded until you use the expression in a `select()` query or `add_column()` statement. Here's what we get if we type `t.image` by itself:

```python theme={null}
t.image
```

This is true of all Pixeltable expressions: we can freely create them and manipulate them in various ways, but no actual data will be loaded until we use them in a query.

### JSON Collections (Dicts and Lists)

Data is commonly presented in JSON format: for example, API responses and model output often take the shape of JSON dictionaries or lists of dictionaries. Pixeltable has native support for JSON accessors. To demonstrate this, let's add a computed column that runs an image classification model against the images in our dataset.

```python theme={null}
from pixeltable.functions.huggingface import vit_for_image_classification

t.add_computed_column(
    classification=vit_for_image_classification(
        t.image, model_id='farleyknight-org-username/vit-base-mnist'
    )
)
```
Added 50 column values with 0 errors.
50 rows updated, 50 values computed.

```python theme={null}
t.select(t.image, t.classification).head(3)
```

We see that the output is returned as a dict containing three lists: the five most likely labels (classes) for the image, the corresponding text labels (in this case, just the string form of the class number), and the scores (confidences) of each prediction. The Pixeltable type of the `classification` column is `pxt.Json`:

```python theme={null}
t
```

Pixeltable provides a range of operators on `Json`-typed output that behave just as you'd expect. To look up a key in a dictionary, use the syntax `t.classification['labels']`:

```python theme={null}
t.select(t.classification['labels']).head(3)
```

You can also use a convenient "attribute" syntax for dictionary lookups. This follows the standard [JSONPath](https://en.wikipedia.org/wiki/JSONPath) expression syntax.

```python theme={null}
t.select(t.classification.labels).head(3)
```

The "attribute" syntax isn't fully general (it won't work for dictionary keys that are not valid Python identifiers), but it's handy when it works.

`t.classification.labels` is another Pixeltable expression; you can think of it as saying, "do the `'labels'` lookup from every dictionary in the column `t.classification`, and return the result as a new column." As before, the expression by itself contains no data; it's the query that does the actual work of retrieving data. Here's what we see if we just give the expression by itself, without a query:

```python theme={null}
t.classification.labels
```
classification.labels

Similarly, one can pull out a specific item in a list (for this model, we're probably mostly interested in the first item anyway):

```python theme={null}
t.select(t.classification.labels[0]).head(3)
```

Or slice a list in the usual manner:

```python theme={null}
t.select(t.classification.labels[:2]).head(3)
```

Pixeltable is resilient against out-of-bounds indices or dictionary keys. If an index or key doesn't exist for a particular row, you'll get a `None` output for that row.

```python theme={null}
t.select(t.classification.not_a_key).head(3)
```

As always, any expression can be used to create a computed column.

```python theme={null}
# Use label_text to be consistent with t.label, which was given
# to us as a string
t.add_computed_column(pred_label=t.classification.label_text[0])
t
```
Added 50 column values with 0 errors.

Finally, just as it's possible to extract items from lists and dictionaries using Pixeltable expressions, you can also construct new lists and dictionaries: just package them up in the usual way.

```python theme={null}
custom_dict = {
    # Keys must be strings; values can be any expressions
    'ground_truth': t.label,
    'prediction': t.pred_label,
    'is_correct': t.label == t.pred_label,
    # You can also use constants as values
    'engine': 'pixeltable',
}
t.select(t.image, custom_dict).head(5)
```

### UDF Calls

UDF calls are another common type of expression. For example, we used one earlier when we added a model invocation to our workload:

```python theme={null}
vit_for_image_classification(
    t.image, model_id='farleyknight-org-username/vit-base-mnist'
)
```

This calls the `vit_for_image_classification` UDF in the `pxt.functions.huggingface` module. Note that `vit_for_image_classification` is a Pixeltable UDF, not an ordinary Python function. You can think of a Pixeltable UDF as a function that operates on columns of data, iteratively applying an underlying operation to each row in the column (or columns). In this case, `vit_for_image_classification` operates on `t.image`, running the model against every image in the column.

Notice that in addition to the column `t.image`, this call to `vit_for_image_classification` also takes a constant argument specifying the `model_id`. Any UDF call argument may be a constant, and the constant value simply means "use this value for every row being evaluated".

You can always compose Pixeltable expressions to form more complicated ones; here's an example that runs the model against a 90-degree rotation of every image in the sample and extracts the label. Not surprisingly, the model doesn't perform as well on the rotated images.

```python theme={null}
rot_model_result = vit_for_image_classification(
    t.image.rotate(90),
    model_id='farleyknight-org-username/vit-base-mnist',
)
t.select(t.image, rot_label=rot_model_result.labels[0]).head(5)
```
Here we assign the model invocation to the variable `rot_model_result` for later reuse. Every Pixeltable expression is a Python object, so you can freely assign them to variables, reuse them, compose them, and so on. Remember that nothing actually happens until the expression is used in a query - so in this example, setting the variable `rot_model_result` doesn't itself result in any data being retrieved; that only happens later, when we actually use it in the `select()` query.
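The cell that created the `clip` column used below isn't shown here. A rough sketch of how an image-embedding column like it might be added, using a Hugging Face CLIP UDF (the function name and `model_id` are assumptions, not taken from the original notebook):

```python theme={null}
from pixeltable.functions.huggingface import clip

# Assumed sketch: compute a CLIP image embedding for every row and store it
# in an array-typed column named `clip`.
t.add_computed_column(
    clip=clip(t.image, model_id='openai/clip-vit-base-patch32')
)
```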
Added 50 column values with 0 errors.

The underlying Python type of `pxt.Array` is an ordinary NumPy array (`np.ndarray`), so that an array-typed column is a column of NumPy arrays (in this example, representing the embedding output of each image in the table). As with lists, arrays can be sliced in all the usual ways.

```python theme={null}
t.select(t.clip[0], t.clip[5:10], t.clip[-3:]).head(5)
```

### Ad hoc UDFs with `apply`

We've now seen the most commonly encountered Pixeltable expression types. There are a few other less commonly encountered expressions that are occasionally useful. You can use `apply` to map any Python function onto a column of data. You can think of `apply` as a quick way of constructing an "on-the-fly" UDF for one-off use.

```python theme={null}
import numpy as np

t.select(t.clip.apply(np.ndarray.dumps, col_type=pxt.String)).head(2)
```

Note, however, that if the function you're `apply`ing doesn't have type hints (as in the example here), you'll need to specify the output column type explicitly.

### Type Conversion with `astype`

Sometimes it's useful to transform an expression of one type into a different type. For example, you can use `astype` to turn an expression of type `pxt.Json` into one of type `pxt.String`. This assumes that the value being converted is actually a string; otherwise, you'll get an exception. Here's an example:

```python theme={null}
# Select the text in position 0 of `t.classification.label_text`; since
# `t.classification.label_text` has type `pxt.Json`, so does
# `t.classification.label_text[0]`
t.classification.label_text[0].col_type
```
Optional[Json]

```python theme={null}
# Select the text in position 0 of `t.classification.label_text`, this time
# cast as a `pxt.String`
t.classification.label_text[0].astype(pxt.String).col_type
```
Optional[String]

### Column Properties

Some `ColumnRef` expressions have additional useful properties. A media column (image, video, audio, or document) has the following two properties:

* `localpath`: the media location on the local filesystem
* `fileurl`: the original URL where the media resides (could be the same as `localpath`)

```python theme={null}
t.select(t.image, t.image.localpath).head(5)
```

Any computed column will have two additional properties, `errortype` and `errormsg`. These properties will usually be `None`. However, if the computed column was created with `on_error='ignore'` and an exception was encountered during column execution, then the properties will contain additional information about the exception.

To demonstrate this feature, we're going to deliberately trigger an exception in a computed column. The images in our example table are black and white, meaning they have only one color channel. If we try to extract a channel other than channel number `0`, we'll get an exception. Ordinarily when we call `add_computed_column`, the exception is raised and the `add_computed_column` operation is aborted.

```python theme={null}
t.add_computed_column(channel=t.image.getchannel(1))
```
Error: Error while evaluating computed column 'channel':
band index out of range
But if we use `on_error='ignore'`, the exception will be logged in the
column properties instead.
```python theme={null}
t.add_computed_column(channel=t.image.getchannel(1), on_error='ignore')
```
Added 50 column values with 50 errors. 50 rows updated, 50 values computed, 50 exceptions.

Notice that the update status informs us that there were 50 errors. If we query the table, we see that the column contains only `None` values, but the `errortype` and `errormsg` fields contain details of the error.

```python theme={null}
t.select(
    t.image, t.channel, t.channel.errortype, t.channel.errormsg
).head(5)
```

More details on Pixeltable’s error handling can be found in the [External Files](/platform/external-files) guide.

## The Pixeltable Type System

We’ve seen that every column and every expression in Pixeltable has an associated **Pixeltable type**. In this section, we’ll briefly survey the various Pixeltable types and their uses. Here are all the supported types and their corresponding Python types:

The Python type is what you’ll get back if you query an expression of the given Pixeltable type. For `pxt.Json`, it can be any of `str`, `int`, `float`, `bool`, `list`, or `dict`.
`pxt.Audio`, `pxt.Video`, and `pxt.Document` all correspond to the Python type `str`. This is because those types are represented by file paths that reference the media in question. When you query for, say, `t.select(t.video_col)`, you’re guaranteed to get a file path on the local filesystem (Pixeltable will download and cache a local copy of the video if necessary to ensure this). If you want the original URL, use `t.video_col.fileurl` instead.
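For instance, here is a minimal sketch of the difference, assuming a table `t` with a video column named `video_col` (the same hypothetical column used in the paragraph above):

```python theme={null}
# Querying the column itself returns a local file path
# (Pixeltable downloads and caches the file if needed)
t.select(t.video_col).collect()

# Querying .fileurl returns the original URL the media was inserted from
t.select(t.video_col.fileurl).collect()
```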
You can use `pxt.Image` by itself to mean “any image, without constraints”, but numerical arrays must always specify a shape and a dtype; `pxt.Array` by itself will raise an error.

The shape is given as a tuple, such as `(512,)` or `(64,64,3)`. A `None` may be used in place of an integer to indicate an unconstrained size for that dimension, as in `(None,None,3)` (3-dimensional array with two unconstrained dimensions), or simply `(None,)` (unconstrained 1-dimensional array).
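As an illustration, here is a minimal sketch of a schema using these types. The table and column names are hypothetical, and the exact `pxt.Array[...]` parameterization shown is an assumption based on the shape/dtype rules described above:

```python theme={null}
# Hypothetical table with an unconstrained image column and two array columns
arrays_t = pxt.create_table(
    'fundamentals/arrays_demo',
    {
        'img': pxt.Image,  # any image, without constraints
        'embedding': pxt.Array[(512,), pxt.Float],  # 1-D float array of length 512
        'frame': pxt.Array[(None, None, 3), pxt.Float],  # H x W x 3, with H and W unconstrained
    },
)
```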
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'fundamentals'.

Now let’s create our first table. To create a table, we must give it a name and a **schema** that describes the table structure. Note that prefacing the name with `fundamentals` causes it to be placed in our newly-created directory.

```python theme={null}
films_t = pxt.create_table(
    'fundamentals/films',
    {'film_name': pxt.String, 'year': pxt.Int, 'revenue': pxt.Float},
)
```
Created table 'films'.

To insert data into a table, we use the `insert()` method, passing it a list of Python dicts.

```python theme={null}
films_t.insert(
    [
        {'film_name': 'Jurassic Park', 'year': 1993, 'revenue': 1037.5},
        {'film_name': 'Titanic', 'year': 1997, 'revenue': 2257.8},
        {
            'film_name': 'Avengers: Endgame',
            'year': 2019,
            'revenue': 2797.5,
        },
    ]
)
```
Inserting rows into `films`: 3 rows [00:00, 572.84 rows/s]
Inserted 3 rows with 0 errors. 3 rows inserted, 3 values computed.

If you’re inserting just a single row, you can use an alternate syntax that is sometimes more convenient, passing the column values as keyword arguments.

```python theme={null}
films_t.insert(film_name='Inside Out 2', year=2024, revenue=1462.7)
```
Inserting rows into `films`: 1 rows [00:00, 318.76 rows/s]
Inserted 1 row with 0 errors. 1 row inserted, 1 value computed.

We can peek at the data in our table with the `collect()` method, which retrieves all the rows in the table.

```python theme={null}
films_t.collect()
```

Pixeltable also provides `update()` and `delete()` methods for modifying and removing data from a table; we’ll see examples of them shortly.

### Filtering and Selecting Data

Often you want to select only certain rows and/or certain columns in a table. You can do this with the `where()` and `select()` methods.

```python theme={null}
films_t.where(films_t.revenue >= 2000.0).collect()
```

```python theme={null}
films_t.select(films_t.film_name, films_t.year).collect()
```

Note the expressions that appear inside the calls to `where()` and `select()`, such as `films_t.year`. These are **column references** that point to specific columns within a table. In place of `films_t.year`, you can also use dictionary syntax and type `films_t['year']`, which means exactly the same thing but is sometimes more convenient.

```python theme={null}
films_t.select(films_t['film_name'], films_t['year']).collect()
```

In addition to selecting columns directly, you can use column references inside various kinds of expressions. For example, our `revenue` numbers are given in millions of dollars. Let’s say we wanted to select revenue in thousands of dollars instead; we could do that as follows:

```python theme={null}
films_t.select(films_t.film_name, films_t.revenue * 1000).collect()
```

Note that since we selected an abstract expression rather than a specific column, Pixeltable gave it the generic name `col_1`. You can assign it a more informative name with Python keyword syntax:

```python theme={null}
films_t.select(
    films_t.film_name, revenue_thousands=films_t.revenue * 1000
).collect()
```

### Tables are Persistent

This is a good time to mention a few key differences between Pixeltable tables and other familiar data structures, such as Python dicts or Pandas dataframes. First, **Pixeltable is persistent. Unlike in-memory Python libraries such as Pandas, Pixeltable is a database**. When you reset a notebook kernel or start a new Python session, you’ll have access to all the data you’ve stored previously in Pixeltable. Let’s demonstrate this by using the IPython `%reset -f` command to clear out all our notebook variables, so that `films_t` is no longer defined.

```python theme={null}
%reset -f
films_t.collect()  # Throws an exception now
```
```
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[11], line 2
      1 get_ipython().run_line_magic('reset', '-f')
----> 2 films_t.collect()  # Throws an exception now

NameError: name 'films_t' is not defined
```

The `films_t` variable (along with all other variables in our Python session) has been cleared out - but that’s ok, because it wasn’t the source of record for our data. The `films_t` variable is just a reference to the underlying database table. We can recover it with the `get_table` command, referencing the `films` table by name.

```python theme={null}
import pixeltable as pxt

films_t = pxt.get_table('fundamentals/films')
films_t.collect()
```

You can always get a list of existing tables with the Pixeltable `pxt.ls()` command. Let’s use it to see the contents of the `fundamentals` directory.

```python theme={null}
pxt.ls(path='fundamentals')
```
In addition to `String`, `Int`, and `Float`, Pixeltable provides several additional data types: `Bool`, whose values are `True` or `False`; `Array`, for numerical arrays; `Json`, for lists or dicts that correspond to valid JSON structures; and `Image`, `Video`, `Audio`, and `Document`.

Created table 'earthquakes'.
Inserting rows into `earthquakes`: 1823 rows [00:00, 19554.24 rows/s]
Inserted 1823 rows with 0 errors.
Here the data source is an `http://` URL, but it can also be an `s3://` URL referencing an S3 bucket.
The `create_table()` function with the `source` parameter can import data from various formats including CSV, Excel, and Hugging Face datasets. You can also use `source` to import from a Pandas dataframe. For more details, see the `pixeltable.io` package reference.
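For example, here is a minimal sketch of importing a local CSV file this way; the file name and table path are hypothetical:

```python theme={null}
# Create a new table whose contents are loaded from a CSV file
imported_t = pxt.create_table('fundamentals/imported_quakes', source='earthquakes.csv')
imported_t.head(5)
```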
`head(n)` and `limit(n).collect()` appear similar in this example. But `head()` always returns the earliest rows in a table, whereas `limit()` makes no promises about the ordering of its results (unless you specify an `order_by()` clause - more on this below).
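As a quick sketch of the distinction, using the `eq_t` table from this section:

```python theme={null}
# head() returns the 5 earliest rows in the table
eq_t.head(5)

# limit() caps the number of rows returned, but makes no ordering guarantees ...
eq_t.limit(5).collect()

# ... unless you add an explicit order_by()
eq_t.order_by(eq_t.timestamp).limit(5).collect()
```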
1823

```python theme={null}
# 5 highest-magnitude earthquakes
eq_t.order_by(eq_t.magnitude, asc=False).limit(5).collect()
```

```python theme={null}
from datetime import datetime

# 5 highest-magnitude earthquakes from June through September 2023
eq_t.where(
    (eq_t.timestamp >= datetime(2023, 6, 1))
    & (eq_t.timestamp < datetime(2023, 10, 1))
).order_by(eq_t.magnitude, asc=False).limit(5).collect()
```

Note that Pixeltable uses Pandas-like operators for filtering data: the expression

```python theme={null}
(eq_t.timestamp >= datetime(2023, 6, 1)) & (eq_t.timestamp < datetime(2023, 10, 1))
```

means *both* conditions must be true; similarly (say),

```python theme={null}
(eq_t.timestamp < datetime(2023, 6, 1)) | (eq_t.timestamp >= datetime(2023, 10, 1))
```

would mean *either* condition must be true. You can also use the special `isin` operator to select just those values that appear within a particular list:

```python theme={null}
# Earthquakes with specific ids
eq_t.where(eq_t.id.isin([123, 456, 789])).collect()
```

In addition to basic operators like `>=` and `isin`, a Pixeltable `where` clause can also contain more complex operations. For example, the `location` column in our dataset is a string that contains a lot of information, but in a relatively unstructured way. Suppose we wanted to see all earthquakes in the vicinity of Rainier, Washington; one way to do this is with the `contains()` method:

```python theme={null}
# All earthquakes in the vicinity of Rainier
eq_t.where(eq_t.location.contains('Rainier')).collect()
```

Pixeltable also supports various **aggregators**; here’s an example showcasing two fairly simple ones, `max()` and `min()`:

```python theme={null}
# Min and max ids
eq_t.select(
    min=pxt.functions.min(eq_t.id), max=pxt.functions.max(eq_t.id)
).collect()
```

To learn more about Pixeltable functions and expressions, see the [Computed Columns](/tutorials/computed-columns) guide. They’re also exhaustively documented in the [Pixeltable SDK Documentation](/sdk/latest).

### Extracting Data from Tables into Python/Pandas

Sometimes it’s handy to pull out data from a table into a Python object. We’ve actually already done this; the call to `collect()` returns an in-memory result set, which we can then dereference in various ways. For example:

```python theme={null}
result = eq_t.limit(5).collect()
result[0]  # Get the first row of the results as a dict
```
```
{'id': 0,
 'magnitude': 1.15,
 'location': '10 km NW of Belfair, Washington',
 'timestamp': datetime.datetime(2023, 1, 1, 8, 10, 37, 50000, tzinfo=zoneinfo.ZoneInfo(key='America/Los_Angeles')),
 'longitude': -122.93,
 'latitude': 47.51}
```
```python theme={null}
result['timestamp']  # Get a list of the `timestamp` field of all the rows that were queried
```
```
[datetime.datetime(2023, 1, 1, 8, 10, 37, 50000, tzinfo=zoneinfo.ZoneInfo(key='America/Los_Angeles')),
 datetime.datetime(2023, 1, 2, 1, 2, 43, 950000, tzinfo=zoneinfo.ZoneInfo(key='America/Los_Angeles')),
 datetime.datetime(2023, 1, 2, 12, 5, 1, 420000, tzinfo=zoneinfo.ZoneInfo(key='America/Los_Angeles')),
 datetime.datetime(2023, 1, 2, 12, 45, 14, 220000, tzinfo=zoneinfo.ZoneInfo(key='America/Los_Angeles')),
 datetime.datetime(2023, 1, 2, 13, 19, 27, 200000, tzinfo=zoneinfo.ZoneInfo(key='America/Los_Angeles'))]
```

```python theme={null}
df = result.to_pandas()  # Convert the result set into a Pandas dataframe
df['magnitude'].describe()
```
```
count    5.000000
mean     0.744000
std      0.587988
min      0.200000
25%      0.290000
50%      0.520000
75%      1.150000
max      1.560000
Name: magnitude, dtype: float64
```

`collect()` without a preceding `limit()` returns the entire contents of a query or table. Be careful! For very large tables, this could result in out-of-memory errors. In this example, the 1823 rows in the table fit comfortably into a dataframe.

```python theme={null}
df = eq_t.collect().to_pandas()
df['magnitude'].describe()
```
```
count    1823.000000
mean        0.900378
std         0.625492
min        -0.830000
25%         0.420000
50%         0.850000
75%         1.310000
max         4.300000
Name: magnitude, dtype: float64
```

### Adding Columns

Like other database tables, Pixeltable tables aren’t fixed entities: they’re meant to evolve over time. Suppose we want to add a new column to hold user-specified comments about particular earthquake events. We can do this with the `add_column()` method:

```python theme={null}
eq_t.add_column(note=pxt.String)
```
Added 1823 column values with 0 errors. 1823 rows updated, 1823 values computed.

Here, `note` is the column name, and `pxt.String` specifies the type of the new column.

```python theme={null}
eq_t.add_column(contact_email=pxt.String)
```
Added 1823 column values with 0 errors. 1823 rows updated, 1823 values computed.

Let’s have a look at the revised schema.

```python theme={null}
eq_t.describe()
```

### Updating Rows in a Table

Table rows can be modified and deleted with the SQL-like `update()` and `delete()` commands.

```python theme={null}
# Add a comment to records with IDs 121 and 123
(
    eq_t.where(eq_t.id.isin([121, 123])).update(
        {
            'note': 'Still investigating.',
            'contact_email': 'contact@pixeltable.com',
        }
    )
)
```
Inserting rows into `earthquakes`: 2 rows [00:00, 366.84 rows/s]
2 rows updated, 4 values computed.

```python theme={null}
eq_t.where(eq_t.id >= 120).select(
    eq_t.id, eq_t.magnitude, eq_t.note, eq_t.contact_email
).head(5)
```

`update()` can also accept an expression, rather than a constant value. For example, suppose we wanted to shorten the location strings by replacing every occurrence of `Washington` with `WA`. One way to do this is with an `update()` clause, using a Pixeltable expression with the `replace()` method.

```python theme={null}
eq_t.update({'location': eq_t.location.replace('Washington', 'WA')})
```
Inserting rows into `earthquakes`: 1823 rows [00:00, 21494.07 rows/s]
1823 rows updated, 1823 values computed.

```python theme={null}
eq_t.head(5)
```

Notice that in all cases, the `update()` clause takes a Python dictionary, but its values can be either constants such as `'contact@pixeltable.com'`, or more complex expressions such as `eq_t.location.replace('Washington', 'WA')`. Also notice that if `update()` appears without a `where()` clause, then every row in the table will be updated, as in the preceding example.

### Batch Updates

The `batch_update()` method provides an alternative way to update multiple rows with different values. With a `batch_update()`, the contents of each row are specified by individual `dict`s, rather than according to a formula. Here’s a toy example that shows `batch_update()` in action.

```python theme={null}
updates = [
    {'id': 500, 'note': 'This is an example note.'},
    {'id': 501, 'note': 'This is a different note.'},
    {'id': 502, 'note': 'A third note, unrelated to the others.'},
]
eq_t.batch_update(updates)
```
Inserting rows into `earthquakes`: 3 rows [00:00, 984.58 rows/s]
3 rows updated, 3 values computed.

```python theme={null}
eq_t.where(eq_t.id >= 500).select(
    eq_t.id, eq_t.magnitude, eq_t.note, eq_t.contact_email
).head(5)
```

### Deleting Rows

To delete rows from a table, use the `delete()` method.

```python theme={null}
# Delete all rows in 2024
eq_t.where(eq_t.timestamp >= datetime(2024, 1, 1)).delete()
```
587 rows deleted.

```python theme={null}
eq_t.count()  # How many are left after deleting?
```
1236

Don’t forget to specify a `where()` clause when using `delete()`! If you run `delete()` without a `where()` clause, the entire contents of the table will be deleted.

```python theme={null}
eq_t.delete()
```
1236 rows deleted.

```python theme={null}
eq_t.count()
```
0

### Table Versioning

Every table in Pixeltable is versioned: some or all of its modification history is preserved. We’ve seen a reference to this already; `pxt.ls()` will show the most recent version along with each table it lists.

```python theme={null}
pxt.ls('fundamentals')
```

To see the version history of a particular table:

```python theme={null}
eq_t.history()
```

If you ever make a mistake, you can always call `revert()` to undo the most recent change to a table and roll back to the previous version. Let’s try it out: we’ll use it to revert the successive `delete()` calls that we just executed.

```python theme={null}
eq_t.revert()
```

```python theme={null}
eq_t.count()
```
1236

```python theme={null}
eq_t.revert()
```

```python theme={null}
eq_t.count()
```
1823
`revert()` cannot be undone!
Added 1823 column values with 0 errors.

```python theme={null}
# Update the row with id == 1002, adding an image to the `map_image` column
eq_t.where(eq_t.id == 1002).update(
    {
        'map_image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/port-townsend-map.jpeg'
    }
)
```
Inserting rows into `earthquakes`: 1 rows [00:00, 192.79 rows/s]
1 row updated, 1 value computed.
Created directory 'fundamentals/subdir'.
Created directory 'fundamentals/subdir/subsubdir'.
Created table 'my_table'.

### Deleting Columns, Tables, and Directories

`drop_column()`, `drop_table()`, and `drop_dir()` are used to delete columns, tables, and directories, respectively.

```python theme={null}
# Delete the `contact_email` column
eq_t.drop_column('contact_email')
```

```python theme={null}
eq_t.describe()
```

```python theme={null}
# Delete the entire table (cannot be reverted!)
pxt.drop_table('fundamentals/earthquakes')
```

```python theme={null}
# Delete the entire directory and all its contents, including any nested
# subdirectories (cannot be reverted)
pxt.drop_dir('fundamentals', force=True)
```

## Next Steps

Learn more about working with Pixeltable:

* [Computed Columns](/tutorials/computed-columns)
* [Queries and Expressions](/tutorials/queries-and-expressions)

# Agents & MCP
Source: https://docs.pixeltable.com/use-cases/agents-mcp
Build AI agents with tool calling, persistent memory, and MCP server integration

**Who:** Agent Builders, AI Engineers\
**Output:** Autonomous AI agents with memory and tool use

Build AI agents that can call tools, remember context, and integrate with MCP servers—all backed by Pixeltable's persistent storage and orchestration.