Bringing Data
A comprehensive guide to inserting, referencing, and importing data in Pixeltable
Working with Data in Pixeltable
Pixeltable provides a unified interface for working with diverse data types - from structured tables to unstructured media files. This guide covers everything you need to know about bringing your data into Pixeltable.
Direct Import with Table Creation and Insertion
Pixeltable supports importing data directly during table creation and insertion operations. This streamlines the process of loading data into Pixeltable tables from external sources.
Pixeltable supports importing from a variety of data sources:
- CSV files (.csv)
- Excel files (.xls, .xlsx)
- Parquet files (.parquet, .pq, .parq)
- JSON files (.json)
- Pandas DataFrames
- Pixeltable DataFrames
- Hugging Face datasets
- Row data structures or Iterators
Creating Tables from External Sources
You can create a table directly from an external data source using the source parameter in the create_table function. Pixeltable will automatically infer the schema from the source data.
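A minimal sketch of creating a table from a CSV file, assuming a recent Pixeltable release in which create_table accepts source as a keyword argument. The file name, column names, and the helper functions write_sample_csv and create_table_from_csv are made up for illustration. The pixeltable import sits inside the function so the sample data can be prepared even where the package is not installed.

```python
import csv
import tempfile
from pathlib import Path


def write_sample_csv(directory: Path) -> Path:
    """Write a tiny CSV file to use as an import source (illustrative data)."""
    csv_path = directory / "films.csv"
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "year", "rating"])
        writer.writerows([["Alpha", 2019, 7.5], ["Beta", 2021, 8.1]])
    return csv_path


def create_table_from_csv(table_name: str, csv_path: Path):
    """Create a Pixeltable table whose schema is inferred from the CSV file."""
    import pixeltable as pxt  # lazy import; requires `pip install pixeltable`

    # No schema is passed: column names and types are inferred from the source.
    return pxt.create_table(table_name, source=str(csv_path))


# Prepare a sample source file in a temporary directory.
sample_path = write_sample_csv(Path(tempfile.mkdtemp()))
```

Calling create_table_from_csv("films", sample_path) would then create the table; the same source argument is where a pandas DataFrame or one of the other listed sources would go.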
You can also provide schema overrides for more control:
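As a hedged illustration, overrides can be passed as a mapping from column name to Pixeltable type. The sketch below assumes the schema_overrides keyword of create_table and a hypothetical CSV with poster and released columns. The helper missing_override_columns is not part of Pixeltable; it only catches typos in override keys before the import runs.

```python
import csv
from pathlib import Path
from typing import Iterable

# Hypothetical columns whose inferred types should be replaced.
OVERRIDE_COLUMNS = ("poster", "released")


def missing_override_columns(header: Iterable[str]) -> set[str]:
    """Return the override columns that do not appear in the source header."""
    return set(OVERRIDE_COLUMNS) - set(header)


def create_table_with_overrides(table_name: str, csv_path: Path):
    """Create a table from a CSV file, forcing the types of selected columns."""
    import pixeltable as pxt  # lazy import; requires `pip install pixeltable`

    with csv_path.open(newline="") as f:
        header = next(csv.reader(f))
    missing = missing_override_columns(header)
    if missing:
        raise ValueError(f"override columns not in source: {sorted(missing)}")

    return pxt.create_table(
        table_name,
        source=str(csv_path),
        # Without overrides, both columns would likely be inferred as strings.
        schema_overrides={"poster": pxt.Image, "released": pxt.Timestamp},
    )
```

Columns not named in the mapping keep their inferred types.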
Inserting Data from External Sources
You can also insert data from external sources into existing tables using the insert method.
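A short sketch of appending a file to an existing table, assuming insert accepts a source (here a CSV path) as its first argument and that the table is looked up with get_table. The table name, file name, and helper functions are illustrative only.

```python
import csv
import tempfile
from pathlib import Path


def write_new_rows(directory: Path) -> Path:
    """Write a CSV file of additional rows (illustrative data)."""
    csv_path = directory / "new_films.csv"
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "year", "rating"])
        writer.writerows([["Gamma", 2022, 6.9], ["Delta", 2023, 7.8]])
    return csv_path


def append_csv(table_name: str, csv_path: Path):
    """Insert the rows of a CSV file into an existing Pixeltable table."""
    import pixeltable as pxt  # lazy import; requires `pip install pixeltable`

    table = pxt.get_table(table_name)
    # The source columns are expected to match the table's existing columns.
    return table.insert(str(csv_path))


new_rows_path = write_new_rows(Path(tempfile.mkdtemp()))
```

With a table named "films" already in place, append_csv("films", new_rows_path) would add the two rows.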
Supported Data Types & Formats
Import Functions (Alternative Approach)
Key Points
- All media types (Image, Video, Audio, Document) support local files, URLs, and cloud storage paths
- Array types require explicit shape and dtype specifications
- JSON type can store any valid JSON data structure
- Basic types (Int, Float, Bool, String, Timestamp) match their Python equivalents
- Import functions support schema overrides to ensure correct type assignment
- Use batch inserts for better performance when adding multiple rows
- Cloud storage paths (s3://) require appropriate credentials to be configured
- Tables can be created directly from CSV, Excel, Parquet files, and pandas DataFrames using the source parameter
- Existing tables can import data directly from external sources using the insert method
- Schema inference is automatic when importing from external sources, with optional schema overrides
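The batch-insert point above can be sketched as follows. This is one possible approach, not a prescribed pattern: rows from any iterator are grouped into lists, and each list is passed to a single insert call. The helpers batched and insert_in_batches and the batch size of 1000 are assumptions chosen for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def insert_in_batches(table_name: str, rows: Iterable[dict], size: int = 1000) -> None:
    """Insert rows into an existing table one batch at a time."""
    import pixeltable as pxt  # lazy import; requires `pip install pixeltable`

    table = pxt.get_table(table_name)
    for chunk in batched(rows, size):
        # One insert call per batch instead of one call per row.
        table.insert(chunk)
```

Chunking also keeps memory use bounded when the rows come from a generator rather than an in-memory list.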