Open in Kaggle  Open in Colab  Download Notebook
This documentation page is also available as an interactive notebook. You can launch the notebook in Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the above links.
Load data from CSV and Excel files into Pixeltable tables for processing and analysis.

Problem

You have data in CSV or Excel files that you want to process with AI models, add computed columns to, or combine with other data sources.

Solution

What’s in this recipe:
  • Import CSV files directly into tables
  • Import from Pandas DataFrames
  • Handle different data types
Use pxt.create_table() with the source parameter to create a table directly from a CSV file, or call insert() to add DataFrame rows to an existing table.

Setup

%pip install -qU pixeltable pandas
import pixeltable as pxt
import pandas as pd
# Create a fresh directory
pxt.drop_dir('import_demo', force=True)
pxt.create_dir('import_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘import_demo’.
<pixeltable.catalog.dir.Dir at 0x141eca110>

Import CSV directly

Use create_table with source to create a table from a CSV file:
# Import CSV from URL
csv_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/world-population-data.csv'

population = pxt.create_table(
    'import_demo.population',
    source=csv_url
)
Created table ‘population’.
Inserting rows into `population`: 0 rows [00:00, ? rows/s]
Inserting rows into `population`: 234 rows [00:00, 9032.63 rows/s]
Inserted 234 rows with 0 errors.
# View the imported data
population.head(5)
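
When an inferred column type is not the one you want, the schema_overrides parameter mentioned under Explanation can force it. The sketch below is illustrative only: preview_inferred_dtypes and create_table_with_overrides are hypothetical helper names, the sample data is made up, and the pandas preview is only a rough proxy for what Pixeltable infers (an assumption, not confirmed by this page).

```python
import io
import pandas as pd


def preview_inferred_dtypes(csv_source, nrows=100):
    """Return {column: pandas dtype name} for the first `nrows` rows of a CSV."""
    sample = pd.read_csv(csv_source, nrows=nrows)
    return {name: str(dtype) for name, dtype in sample.dtypes.items()}


def create_table_with_overrides(table_path, csv_source, overrides):
    """Create a table from a CSV, forcing the column types given in `overrides`."""
    # Imported here so the preview helper above also works without Pixeltable.
    import pixeltable as pxt
    return pxt.create_table(table_path, source=csv_source, schema_overrides=overrides)


# Made-up sample: `score` holds only whole numbers, so pandas reads it as an integer.
sample_csv = io.StringIO('id,score\n1,90\n2,85\n')
print(preview_inferred_dtypes(sample_csv))
```

If score should be stored as a float, a call such as create_table_with_overrides('import_demo.scores', 'scores.csv', {'score': pxt.Float}) would request that type; the table and file names here are placeholders, and whether a given override is accepted depends on whether the values can be converted.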

Import from Pandas DataFrame

You can also create a DataFrame first, then insert it into a table whose schema matches its columns:
# Create a DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['NYC', 'LA', 'Chicago']
})

# Create table and insert DataFrame
users = pxt.create_table('import_demo.users', {
    'name': pxt.String,
    'age': pxt.Int,
    'city': pxt.String
})
users.insert(df)
Created table ‘users’.
Inserting rows into `users`: 0 rows [00:00, ? rows/s]
Inserting rows into `users`: 3 rows [00:00, 923.31 rows/s]
Inserted 3 rows with 0 errors.
3 rows inserted, 6 values computed.
# View the data
users.collect()
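
DataFrames loaded from real files often carry the wrong dtypes, for example ages stored as text. The following is a minimal sketch of cleaning a DataFrame so that it matches the users schema above (name: String, age: Int, city: String) before inserting it. The helper name prepare_for_insert and the specific cleanup rules are assumptions made for illustration, not part of Pixeltable.

```python
import pandas as pd


def prepare_for_insert(df):
    """Return a copy of `df` whose dtypes match the users schema."""
    out = df.copy()
    # Text columns: force to str and trim stray whitespace.
    out['name'] = out['name'].astype(str).str.strip()
    out['city'] = out['city'].astype(str).str.strip()
    # Ages stored as text ("25") become integers; non-numeric text raises an error.
    out['age'] = pd.to_numeric(out['age'], errors='raise').astype('int64')
    return out


# Made-up raw data with text ages and untrimmed names.
raw = pd.DataFrame({
    'name': [' Alice', 'Bob '],
    'age': ['25', '30'],
    'city': ['NYC', 'LA'],
})
clean = prepare_for_insert(raw)
print(clean.dtypes)
```

The cleaned frame can then be passed to users.insert(clean) in the same way as df above.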

Explanation

Source types supported:
Type inference: Pixeltable automatically infers column types from CSV data. You can override types using schema_overrides.

Large files: For very large CSV files, consider:
  • Using create_table(source=...), which streams data
  • Importing in batches if memory is limited
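
The batch option in the list above can be sketched with pandas' chunked CSV reader, which yields one DataFrame per chunk so that only one batch is held in memory at a time. The helper name insert_csv_in_batches and the default batch size are assumptions; the only table method relied on is insert(), used the same way as users.insert(df) earlier.

```python
import pandas as pd


def insert_csv_in_batches(table, csv_source, batch_size=10_000):
    """Insert a CSV into `table` one chunk at a time; return the row count."""
    total = 0
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of loading the whole file at once.
    with pd.read_csv(csv_source, chunksize=batch_size) as reader:
        for batch in reader:
            table.insert(batch)
            total += len(batch)
    return total
```

For example, insert_csv_in_batches(users, 'users.csv', batch_size=5_000) would load a hypothetical users.csv in batches of 5,000 rows, assuming its columns match the table's schema.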

See also