Overview
Pixeltable unifies data storage, versioning, and indexing with orchestration and model versioning under a declarative table interface, with transformations, model inference, and custom logic represented as computed columns.
Data, models, and orchestration in a unified declarative interface
Pixeltable is a Python library providing a declarative interface for multimodal data (text, images, audio, video). It features built-in versioning, lineage tracking, and incremental updates, enabling users to store, transform, index, and iterate on data for their ML workflows.
Installation
Pixeltable is persistent. Unlike in-memory Python libraries such as Pandas, Pixeltable is a database.
Pixeltable supports Python 3.9, 3.10, 3.11, and 3.12 on Linux, macOS, and Windows.
pip install pixeltable
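Before installing, it can help to confirm that the interpreter falls within the supported range listed above. The following is a minimal sketch; the helper `is_supported_python` and the constant `SUPPORTED_MINORS` are illustrative names and are not part of Pixeltable.

```python
import sys

# Minor versions of Python 3 listed as supported in this section.
SUPPORTED_MINORS = (9, 10, 11, 12)


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version is in the listed range."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor in SUPPORTED_MINORS


if __name__ == "__main__":
    status = "supported" if is_supported_python() else "not listed as supported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```

If the check reports an unlisted version, creating a virtual environment with one of the listed interpreters before running the install command is the usual workaround.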
See the Getting Started with Pixeltable guide for more detailed installation instructions. In these tutorials, we'll see how to create tables, populate them with data, and enhance them with built-in and user-defined transformations and AI operations.
- User-Defined Functions (UDFs)
- Comparing Object Detection Models
- Experimenting with Chunking (RAG)
- Working with External Files
- Document Indexing and RAG
- Transcribing and Indexing Audio and Video
Why should you use Pixeltable?
- It gives you transparency and reproducibility
- All generated data is automatically recorded and versioned
- You will never need to re-run a workload because you lost track of the input data
- It saves you money
- All data changes are automatically incremental
- You never need to re-run pipelines from scratch because you're adding data
- It integrates with any existing Python code or libraries
- Bring your ever-changing code and workloads
- You choose the models, tools, and AI practices (e.g., your embedding model for a vector index); Pixeltable orchestrates the data
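The idea behind incremental updates can be sketched in plain Python. This toy class is not Pixeltable's API; `ToyTable`, `add_computed`, and the `calls` counter are hypothetical names used only to show that a computed column is evaluated once per existing row when it is added, and afterwards only for newly inserted rows.

```python
from typing import Any, Callable, Dict, List


class ToyTable:
    """Toy in-memory table with computed columns (illustration only)."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []
        self.computed: Dict[str, Callable[[Dict[str, Any]], Any]] = {}
        self.calls = 0  # how many times any computed-column function ran

    def add_computed(self, name: str, fn: Callable[[Dict[str, Any]], Any]) -> None:
        """Register fn and backfill it once for rows that already exist."""
        self.computed[name] = fn
        for row in self.rows:
            row[name] = self._run(fn, row)

    def insert(self, new_rows: List[Dict[str, Any]]) -> None:
        """Append rows, evaluating computed columns for the new rows only."""
        for values in new_rows:
            row = dict(values)
            for name, fn in self.computed.items():
                row[name] = self._run(fn, row)
            self.rows.append(row)

    def _run(self, fn: Callable[[Dict[str, Any]], Any], row: Dict[str, Any]) -> Any:
        self.calls += 1
        return fn(row)


t = ToyTable()
t.insert([{"text": "hello world"}, {"text": "declarative tables"}])
t.add_computed("n_words", lambda r: len(r["text"].split()))
calls_after_backfill = t.calls  # one call per existing row

t.insert([{"text": "one more row here"}])
calls_after_insert = t.calls  # only the new row triggered a call
```

Existing rows are never recomputed when data is added, which is where the cost saving comes from when the column function is an expensive model call rather than a word count.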
Examples of Specific Data Problems Pixeltable Addresses
Pixeltable Use Case | Added Value |
---|---|
Interact with video data at the frame level | Eliminates manual frame extraction and storage management, along with concerns about storage overload. |
Augment data incrementally and interactively | Simplifies data augmentation through built-in functions and custom functions (UDFs) without needing complex data pipelines or manual incremental updates. |
Interact with diverse data types (video, images, documents, etc.) | Provides a unified interface for interacting with various data types in a dataframe-style, streamlining data manipulation and analysis across different modalities. |
Perform text and image similarity search at the video frame level | Enables advanced search capabilities within video data without requiring frame storage. |
Access Pixeltable data as a PyTorch and COCO dataset | Offers direct integration with PyTorch for seamless data loading and preprocessing in machine learning workflows. |
Rely on versioning and snapshot functionality | Ensures data reproducibility and safeguards against unintended changes or errors through built-in versioning and snapshotting capabilities. |
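As a rough illustration of the versioning and snapshot row above, the sketch below keeps an append-only log of table states in plain Python. It is a conceptual toy, not Pixeltable's implementation or API; `VersionedRows` and its methods are hypothetical names, and it stores full copies of each version for simplicity, which a real system would likely avoid.

```python
import copy


class VersionedRows:
    """Toy append-only version log over a list of rows (illustration only)."""

    def __init__(self):
        self._versions = [[]]  # version 0 is the empty table

    @property
    def version(self):
        """Number of the latest version."""
        return len(self._versions) - 1

    def rows(self, version=None):
        """Return a copy of the rows at a given version (default: latest)."""
        v = self.version if version is None else version
        return copy.deepcopy(self._versions[v])

    def insert(self, new_rows):
        """Record a new version holding the old rows plus new_rows."""
        self._versions.append(self.rows() + copy.deepcopy(new_rows))
        return self.version

    def revert(self):
        """Drop the latest version, restoring the previous state."""
        if self.version == 0:
            raise ValueError("nothing to revert")
        self._versions.pop()
        return self.version


log = VersionedRows()
v1 = log.insert([{"label": "cat"}])
v2 = log.insert([{"label": "dgo"}])  # a mistaken insert
snapshot_v1 = log.rows(version=v1)   # read an older version without changing anything
log.revert()                         # undo the mistaken insert
```

Because every change produces a new version instead of overwriting the old one, an earlier state can be read for reproducibility or restored after an unintended change.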
Examples of High-Level Use Cases
Computer Vision
- Object Detection and Classification: Efficiently manage massive image and video datasets, pre-labeling them within Pixeltable and integrating with external labeling tools to keep all your curated datasets, features, and labels in sync.
- Text and Image Search: Build datasets and multiple embedding indices to create a searchable image catalog for products, enabling users to find similar items based on image similarity. Track the impact of different feature engineering and embedding choices on search performance.
- Image Analysis: Pixeltable's lineage tracking ensures traceability and reproducibility, crucial for regulatory compliance.
- Defect Detection in Model Performance: Reference images from your production pipeline directly in Pixeltable. Apply transformations and train models to quickly identify and classify defects.
Natural Language Processing (NLP)
- Sentiment Analysis: Import or reference text data from social media, survey responses, etc. Experiment with different preprocessing and modeling approaches. Within Pixeltable's table structure, track how changes to datasets, models, and logic affect your performance metrics.
- Text Summarization: Load research papers, articles, and documents into Pixeltable. Utilize computed columns for text summarization, comparing the performance of various techniques. Leverage lineage to understand the reasoning behind each summary.
- Entity Recognition: Extract critical information from financial statements, contracts, or news articles. Track the performance of your models on different document types. Use Pixeltable to create a centralized repository for extracted entities.
Retrieval Augmented Generation (RAG)
- Knowledge Base Q&A Systems: Integrate Pixeltable with your company's documentation or knowledge base articles. Build a powerful database that can power chatbots and answer questions accurately and with explainability, thanks to Pixeltable's data operation tracing and built-in lineage.
- Content Generation: Automate marketing copy, product descriptions, or social media posts. Pixeltable allows experimenting with different prompts and LLM parameters while tracking the impact of each variation, simply using multiple computed columns and views.
- Code Generation: Use Pixeltable to build a RAG system that leverages your codebase to generate code snippets, summaries, or explanations on-demand.
Contributions & Feedback
Are you experiencing issues or bugs with Pixeltable? File an Issue.
Do you want to contribute? Feel free to open a PR.
License
This library is licensed under the Apache 2.0 License.