This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Load columnar data from Parquet files into Pixeltable tables for
processing and analysis.
Problem
You have data stored in Parquet format—a common format for analytics,
data lakes, and ML pipelines. You need to load this data for processing
with AI models or combining with other data sources.
Solution
What’s in this recipe:
- Import Parquet files directly into tables
- Export tables to Parquet for external tools
- Handle schema type overrides
You use pxt.create_table() with a source parameter to create a table
from a Parquet file. Pixeltable infers column types from the Parquet
schema automatically.
Setup
%pip install -qU pixeltable pyarrow pandas
import pixeltable as pxt
import pandas as pd
import tempfile
from pathlib import Path
Create sample Parquet file
First, create a sample Parquet file to demonstrate the import process:
# Create sample data
sample_data = pd.DataFrame({
'product_id': [1, 2, 3, 4, 5],
'name': ['Widget A', 'Widget B', 'Gadget X', 'Gadget Y', 'Tool Z'],
'price': [29.99, 39.99, 149.99, 199.99, 79.99],
'category': ['widgets', 'widgets', 'gadgets', 'gadgets', 'tools'],
'in_stock': [True, False, True, True, False]
})
# Save to temporary Parquet file
temp_dir = tempfile.mkdtemp()
parquet_path = Path(temp_dir) / 'products.parquet'
sample_data.to_parquet(parquet_path, index=False)
sample_data
Import Parquet file
Use create_table with the source parameter to create a table
directly from the Parquet file:
# Create a fresh directory
pxt.drop_dir('parquet_demo', force=True)
pxt.create_dir('parquet_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘parquet_demo’.
<pixeltable.catalog.dir.Dir at 0x17f0ca920>
# Import Parquet file into a new table
products = pxt.create_table(
'parquet_demo.products',
source=str(parquet_path)
)
Created table ‘products’.
Inserting rows into `products`: 0 rows [00:00, ? rows/s]
Inserting rows into `products`: 5 rows [00:00, 653.18 rows/s]
Inserted 5 rows with 0 errors.
# View imported data
products.collect()
Add computed columns
Once imported, you can add computed columns like any other Pixeltable
table:
# Add a computed column for discounted price
products.add_computed_column(sale_price=products.price * 0.9)
Added 5 column values with 0 errors.
5 rows updated, 10 values computed.
# View with computed column
products.select(products.name, products.price, products.sale_price).collect()
Import with primary key
Specify a primary key when you need upsert behavior or unique
constraints:
# Import with a primary key
products_pk = pxt.create_table(
'parquet_demo.products_with_pk',
source=str(parquet_path),
primary_key='product_id'
)
Created table ‘products_with_pk’.
Inserting rows into `products_with_pk`: 0 rows [00:00, ? rows/s]
Inserting rows into `products_with_pk`: 5 rows [00:00, 1548.97 rows/s]
Inserted 5 rows with 0 errors.
# View the table
products_pk.collect()
Export table to Parquet
Export your processed data back to Parquet for use with other tools:
# Export to Parquet (note: image columns require inline_images=True)
export_path = Path(temp_dir) / 'exported_products'
pxt.io.export_parquet(
products.select(products.name, products.price, products.sale_price),
parquet_path=export_path
)
# Verify export by reading back
import pyarrow.parquet as pq
exported_table = pq.read_table(export_path)
exported_table.to_pandas()
Explanation
When to use Parquet import: whenever your data already lives in Parquet-based analytics stores, data lakes, or ML pipelines and you want to bring it into Pixeltable for processing with AI models or for combining with other data sources.
Key features:
- Automatic schema inference from Parquet metadata
- Support for partitioned datasets (directory of files)
- Export with pxt.io.export_parquet for interoperability
- Primary key support for upsert workflows
See also