Import data from JSON files

This documentation page is also available as an interactive notebook. You can launch the notebook in Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the above links.

Load structured data from JSON files into Pixeltable tables for processing and analysis.

Problem

You have data in JSON format—from APIs, exports, or application logs. You need to load this data for processing with AI models or combining with other data sources.

Solution

What’s in this recipe:

Import JSON files directly into tables
Import from URLs (APIs, remote files)
Handle nested JSON structures

You use pxt.create_table() with a source parameter to create a table from a JSON file or URL. The JSON must be an array of objects, where each object becomes a row.

Setup

%pip install -qU pixeltable

import json
import pixeltable as pxt
import tempfile
from pathlib import Path

Create sample JSON file

First, create a sample JSON file to demonstrate the import process:

# Create sample JSON data (array of objects)
sample_data = [
    {
        'id': 1,
        'title': 'Introduction to ML',
        'author': 'Alice',
        'tags': ['ml', 'intro'],
        'rating': 4.5,
    },
    {
        'id': 2,
        'title': 'Deep Learning Basics',
        'author': 'Bob',
        'tags': ['dl', 'neural'],
        'rating': 4.8,
    },
    {
        'id': 3,
        'title': 'NLP Fundamentals',
        'author': 'Carol',
        'tags': ['nlp', 'text'],
        'rating': 4.2,
    },
    {
        'id': 4,
        'title': 'Computer Vision',
        'author': 'Dave',
        'tags': ['cv', 'images'],
        'rating': 4.6,
    },
    {
        'id': 5,
        'title': 'Reinforcement Learning',
        'author': 'Eve',
        'tags': ['rl', 'agents'],
        'rating': 4.3,
    },
]

# Save to temporary JSON file
temp_dir = tempfile.mkdtemp()
json_path = Path(temp_dir) / 'articles.json'

with open(json_path, 'w') as f:
    json.dump(sample_data, f, indent=2)

Import JSON file

Use create_table with source to create a table directly from a JSON file:

# Create a fresh directory
pxt.drop_dir('json_demo', force=True)
pxt.create_dir('json_demo')

Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘json_demo’.
<pixeltable.catalog.dir.Dir at 0x1556b2800>

# Import JSON file into a new table
articles = pxt.create_table(
    'json_demo/articles',
    source=str(json_path),
    source_format='json',  # Explicitly specify format when using local file paths
)

Created table ‘articles’.Inserting rows into `articles`: 0 rows [00:00, ? rows/s]
Inserting rows into `articles`: 5 rows [00:00, 538.52 rows/s]
Inserted 5 rows with 0 errors.

# View imported data
articles.collect()

Import from URL

You can import JSON directly from a URL—useful for APIs and remote data:

# Import from a public JSON URL
# Using JSONPlaceholder API as an example
posts = pxt.create_table(
    'json_demo/posts',
    source='https://jsonplaceholder.typicode.com/posts',
    source_format='json',  # Required for URL sources
)

Created table ‘posts’.Inserting rows into `posts`: 0 rows [00:00, ? rows/s]
Inserting rows into `posts`: 100 rows [00:00, 15623.57 rows/s]
Inserted 100 rows with 0 errors.

# View first few rows
posts.head(5)

Import from Python dictionaries

Use create_table with a list of dictionaries as source—useful when you have data in memory:

# Import from a list of dictionaries
events = [
    {
        'event': 'page_view',
        'user_id': 101,
        'timestamp': '2024-01-15T10:30:00',
    },
    {
        'event': 'click',
        'user_id': 101,
        'timestamp': '2024-01-15T10:31:00',
    },
    {
        'event': 'purchase',
        'user_id': 102,
        'timestamp': '2024-01-15T10:32:00',
    },
]

event_table = pxt.create_table('json_demo/events', source=events)

Created table ‘events’.Inserting rows into `events`: 0 rows [00:00, ? rows/s]
Inserting rows into `events`: 3 rows [00:00, 988.06 rows/s]
Inserted 3 rows with 0 errors.

# View imported events
event_table.collect()

Add computed columns

Once imported, you can enrich the data with computed columns:

# Add a computed column combining title and author
articles.add_computed_column(
    summary=articles.title + ' by ' + articles.author
)

Added 5 column values with 0 errors.
5 rows updated, 10 values computed.

# View with computed column
articles.select(
    articles.title, articles.author, articles.summary
).collect()

Explanation

JSON format requirements: The JSON file must contain an array of objects at the top level:

[
  {"col1": "value1", "col2": 123},
  {"col1": "value2", "col2": 456}
]

Source types supported:

Nested JSON handling: Nested objects and arrays are stored as JSON columns. You can access nested fields using Pixeltable’s JSON path syntax in computed columns.

Welcome to Pixeltable

Core Concepts

How-To

Problem

Solution

Setup

Create sample JSON file

Import JSON file

Import from URL

Import from Python dictionaries

Add computed columns

Explanation

See also

Welcome to Pixeltable

Core Concepts

How-To

​Problem

​Solution

​Setup

​Create sample JSON file

​Import JSON file

​Import from URL

​Import from Python dictionaries

​Add computed columns

​Explanation

​See also

Problem

Solution

Setup

Create sample JSON file

Import JSON file

Import from URL

Import from Python dictionaries

Add computed columns

Explanation

See also