Bringing Data
A comprehensive guide to inserting, referencing, and importing data in Pixeltable
Working with Data in Pixeltable
Pixeltable provides a unified interface for working with diverse data types - from structured tables to unstructured media files. This guide covers everything you need to know about bringing your data into Pixeltable.
Direct Import with Table Creation and Insertion
Pixeltable supports importing data directly during table creation and insertion operations. This streamlines the process of loading data into Pixeltable tables from external sources.
Pixeltable supports importing from a variety of data sources:
- CSV files (.csv)
- Excel files (.xls, .xlsx)
- Parquet files (.parquet, .pq, .parq)
- JSON files (.json)
- Pandas DataFrames
- Pixeltable DataFrames
- Hugging Face datasets
- Row data structures or Iterators
Creating Tables from External Sources
You can create a table directly from an external data source using the source parameter of the create_table function. Pixeltable automatically infers the schema from the source data.
You can also provide schema overrides when you need more control over the column types.
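The two calls described above can be sketched as follows. This is a minimal illustration, assuming create_table accepts a file path through source and a column-to-type mapping through schema_overrides; the table names, file contents, and column names are made up for the example.

```python
import csv
from pathlib import Path


def write_sample_csv(path: Path) -> Path:
    """Write a small, made-up CSV file to use as an import source."""
    rows = [
        {'title': 'Alpha', 'year': 2021, 'rating': 4.5},
        {'title': 'Beta', 'year': 2023, 'rating': 3.0},
    ]
    with path.open('w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['title', 'year', 'rating'])
        writer.writeheader()
        writer.writerows(rows)
    return path


def create_tables_from_csv(csv_path: Path):
    """Create two tables from one CSV: one fully inferred, one with an override."""
    # Imported lazily so the helper above works without Pixeltable installed.
    import pixeltable as pxt

    # Schema inferred entirely from the file contents.
    inferred = pxt.create_table('films_inferred', source=str(csv_path))

    # Same source, but the (hypothetical) rating column is forced to Float.
    overridden = pxt.create_table(
        'films_overridden',
        source=str(csv_path),
        schema_overrides={'rating': pxt.Float},
    )
    return inferred, overridden
```

An override is mostly useful when inference would pick a type you do not want, for example an all-integer sample of a column that should hold floats.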
Inserting Data from External Sources
You can also insert data from external sources into existing tables using the insert method.
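A small sketch of appending to an existing table, assuming insert takes the source (a file path or a list of row dicts) as its first positional argument. The helper name append_data and the file and column names are hypothetical.

```python
from pathlib import Path
from typing import Any, Optional


def append_data(table: Any, file_path: Optional[Path] = None,
                rows: Optional[list] = None) -> int:
    """Append a file and/or in-memory rows to an existing table.

    `table` is expected to be a Pixeltable table handle, for example the
    value returned by pxt.get_table('films') (table name hypothetical).
    Returns the number of insert calls issued.
    """
    calls = 0
    if file_path is not None:
        # External source: pass the path and let Pixeltable read the file.
        table.insert(str(file_path))
        calls += 1
    if rows:
        # Row data: a list of dicts keyed by column name, sent as one batch.
        table.insert(rows)
        calls += 1
    return calls
```

The source data has to line up with the columns of the existing table, since no new table schema is created at this point.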
Supported Data Types & Formats
Basic Types (Int, Float, Bool, String, Timestamp)
Array Type
JSON Type
Image Type
Video Type
Audio Type
Document Type
Import Functions (Alternative Approach)
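As a rough sketch of this alternative, the snippet below dispatches on the file extension to a dedicated import function. It assumes the helpers are exposed as pxt.io.import_csv and pxt.io.import_json, and that each takes the new table's name, then the file path, plus an optional schema_overrides keyword. Only CSV and JSON are covered, because the argument lists for the Excel, Parquet, and Hugging Face helpers are not assumed here. The names IMPORTERS, importer_name, and import_file are hypothetical.

```python
from pathlib import Path

# File suffix -> name of the pxt.io helper assumed to handle it.
IMPORTERS = {'.csv': 'import_csv', '.json': 'import_json'}


def importer_name(path: Path) -> str:
    """Return the name of the import function for this file's extension."""
    suffix = path.suffix.lower()
    if suffix not in IMPORTERS:
        raise ValueError(f'no importer configured for {suffix!r}')
    return IMPORTERS[suffix]


def import_file(table_name: str, path: Path, schema_overrides=None):
    """Create a new table from a CSV or JSON file via the matching helper."""
    # Imported lazily so importer_name() works without Pixeltable installed.
    import pixeltable as pxt

    importer = getattr(pxt.io, importer_name(path))
    return importer(table_name, str(path), schema_overrides=schema_overrides)
```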
CSV Import
Excel Import
Parquet Import
JSON Import
Hugging Face Dataset Import
Key Points
- All media types (Image, Video, Audio, Document) support local files, URLs, and cloud storage paths
- Array types require explicit shape and dtype specifications
- JSON type can store any valid JSON data structure
- Basic types (Int, Float, Bool, String, Timestamp) match their Python equivalents
- Import functions support schema overrides to ensure correct type assignment
- Use batch inserts for better performance when adding multiple rows
- Cloud storage paths (s3://) require appropriate credentials to be configured
- Tables can be created directly from CSV, Excel, and Parquet files and from pandas DataFrames using the source parameter
- Existing tables can import data directly from external sources using the insert method
- Schema inference is automatic when importing from external sources, with optional schema overrides
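Several of these points can be seen together in one short sketch: an explicit schema that mixes basic, JSON, array, and media types, followed by batched inserts. It assumes the type names are exposed as pxt.String, pxt.Int, pxt.Timestamp, pxt.Json, and pxt.Image, and that an array column is declared as pxt.Array[shape, dtype]. The table name, column names, batch size, and helper names are hypothetical.

```python
from typing import Any, Iterable, Iterator


def batched(rows: Iterable[dict], size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` rows."""
    if size < 1:
        raise ValueError('size must be at least 1')
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def create_catalog_table():
    """Create a table with an explicit schema (all names are illustrative)."""
    # Imported lazily so the batching helpers work without Pixeltable installed.
    import pixeltable as pxt

    return pxt.create_table('media_catalog', schema={
        'title': pxt.String,
        'views': pxt.Int,
        'created_at': pxt.Timestamp,
        'metadata': pxt.Json,                      # any valid JSON structure
        'embedding': pxt.Array[(3,), pxt.Float],   # explicit shape and dtype
        'thumbnail': pxt.Image,                    # local path, URL, or cloud path
    })


def insert_in_batches(table: Any, rows: Iterable[dict], size: int = 500) -> int:
    """Insert rows one batch at a time; returns the number of rows sent."""
    total = 0
    for batch in batched(rows, size):
        table.insert(batch)
        total += len(batch)
    return total
```

Sending rows in batches keeps the number of insert calls small, which is the performance point made in the list above.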