This documentation page is also available as an interactive notebook, which you can launch in Kaggle or Colab or download for use with an IDE or a local Jupyter installation.
Load data from Excel spreadsheets (.xlsx) into Pixeltable tables.

Problem

You have data in Excel format, such as reports, inventory lists, or business data exported from other systems, that needs to be loaded for AI processing.

Solution

What’s in this recipe:
  • Import Excel files directly into tables
  • Handle multiple sheets
  • Override column types when needed
You use pxt.create_table() with an Excel file path as the source parameter. Pixeltable infers column types automatically.

Setup

%pip install -qU pixeltable openpyxl pandas
import pixeltable as pxt
import pandas as pd
import tempfile
from pathlib import Path

Create sample Excel file

# Create sample Excel file for demo
sample_data = pd.DataFrame({
    'order_id': [1001, 1002, 1003, 1004, 1005],
    'customer': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve'],
    'product': ['Widget A', 'Gadget B', 'Widget A', 'Tool C', 'Gadget B'],
    'quantity': [2, 1, 5, 3, 2],
    'price': [29.99, 149.99, 29.99, 79.99, 149.99],
    'date': ['2024-01-15', '2024-01-16', '2024-01-16', '2024-01-17', '2024-01-18']
})

# Save to temp Excel file
temp_dir = tempfile.mkdtemp()
excel_path = Path(temp_dir) / 'orders.xlsx'
sample_data.to_excel(excel_path, index=False)
sample_data
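Before importing, it can help to see what pandas reads back from a workbook like this one. Assuming the Excel import is built on pandas read_excel (the Explanation's note that extra_args are passed to read_excel suggests it is), the pandas dtypes are a rough preview of the column types Pixeltable will infer. The sketch below is standalone, needs only pandas and openpyxl, and uses a shortened copy of the sample data; the file name orders_preview.xlsx is illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Shortened copy of the sample orders data, so this check runs on its own
df = pd.DataFrame({
    'order_id': [1001, 1002, 1003],
    'customer': ['Alice', 'Bob', 'Carol'],
    'quantity': [2, 1, 5],
    'price': [29.99, 149.99, 29.99],
    'date': ['2024-01-15', '2024-01-16', '2024-01-16'],
})

# Round-trip through an .xlsx file; openpyxl is used explicitly for writing and reading
path = Path(tempfile.mkdtemp()) / 'orders_preview.xlsx'
df.to_excel(path, index=False, engine='openpyxl')
preview = pd.read_excel(path, engine='openpyxl')

# Inspect the dtype pandas inferred for each column
print(preview.dtypes)
```

Note that the date column comes back as text, because the sample stores dates as strings. If you want it treated as a date, converting it with pd.to_datetime before saving, or passing parse_dates through extra_args, are options worth trying; this recipe does not show how Pixeltable types the result.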

Import Excel file

# Create a fresh directory
pxt.drop_dir('excel_demo', force=True)
pxt.create_dir('excel_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘excel_demo’.
# Import Excel file directly
orders = pxt.create_table(
    'excel_demo.orders',
    source=str(excel_path),
    source_format='excel'  # Hint for Excel format
)
Created table ‘orders’.
Inserted 5 rows with 0 errors.
# View imported data
orders.collect()

Add computed columns

# Add computed column for order total
orders.add_computed_column(
    total=orders.quantity * orders.price
)
Added 5 column values with 0 errors.
5 rows updated, 10 values computed.
# View with computed total
orders.select(
    orders.order_id,
    orders.customer,
    orders.product,
    orders.quantity,
    orders.price,
    orders.total
).collect()
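As a sanity check on the computed column, the same quantity * price arithmetic can be reproduced with pandas on the sample values. This is only a cross-check of the expected numbers, not how Pixeltable evaluates the expression; rounding to two decimals hides floating-point noise in the products.

```python
import pandas as pd

# Quantity and price values from the sample orders
orders_df = pd.DataFrame({
    'order_id': [1001, 1002, 1003, 1004, 1005],
    'quantity': [2, 1, 5, 3, 2],
    'price': [29.99, 149.99, 29.99, 79.99, 149.99],
})

# Same expression as the computed column, rounded to cents
orders_df['total'] = (orders_df['quantity'] * orders_df['price']).round(2)
print(orders_df)
```

The totals come out to 59.98, 149.99, 149.95, 239.97, and 299.98, which is what the total column should show.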

Explanation

Import methods:
Excel-specific options: pass pandas read_excel() arguments through the extra_args parameter:
pxt.create_table(
    'table_name',
    source='data.xlsx',
    source_format='excel',
    extra_args={'sheet_name': 'Sheet2', 'skiprows': 1}
)
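Because extra_args is forwarded to pandas read_excel, you can try the same options with pandas directly to check which rows they select before creating the table. The sketch below builds a hypothetical two-sheet workbook in which Sheet2 starts with a title row, lists its sheet names, and then reads Sheet2 with the same sheet_name and skiprows values as the example above. The file name, sheet contents, and title text are illustrative assumptions; pandas and openpyxl are required.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical workbook: Sheet1 holds orders; Sheet2 has a title row above its header
path = Path(tempfile.mkdtemp()) / 'workbook.xlsx'
with pd.ExcelWriter(path, engine='openpyxl') as writer:
    pd.DataFrame({'order_id': [1001, 1002]}).to_excel(writer, sheet_name='Sheet1', index=False)
    # startrow=1 leaves the first Excel row free for a title cell
    pd.DataFrame({'sku': ['W-A', 'G-B'], 'stock': [40, 12]}).to_excel(
        writer, sheet_name='Sheet2', index=False, startrow=1
    )
    # Write the title into cell A1 via the underlying openpyxl worksheet
    writer.sheets['Sheet2'].cell(row=1, column=1, value='Inventory report')

# List the sheets in the workbook
with pd.ExcelFile(path, engine='openpyxl') as xls:
    sheet_names = xls.sheet_names
print(sheet_names)

# Same options as extra_args={'sheet_name': 'Sheet2', 'skiprows': 1}
inventory = pd.read_excel(path, sheet_name='Sheet2', skiprows=1, engine='openpyxl')
print(inventory)
```

Without skiprows=1, pandas would treat the title row as the header. To load several sheets, one option is a separate create_table call per sheet, each with its own sheet_name in extra_args; passing sheet_name=None to read_excel returns a dict of all sheets, which is handy for inspection but is probably not what a single table import expects.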
Common extra_args:

See also