This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Load structured data from JSON files into Pixeltable tables for
processing and analysis.
Problem
You have data in JSON format—from APIs, exports, or application logs.
You need to load this data for processing with AI models or combining
with other data sources.
Solution
What’s in this recipe:
- Import JSON files directly into tables
- Import from URLs (APIs, remote files)
- Handle nested JSON structures
You use pxt.create_table() with a source parameter to create a table
from a JSON file or URL. The JSON must be an array of objects, where
each object becomes a row.
Setup
%pip install -qU pixeltable
import pixeltable as pxt
import json
import tempfile
from pathlib import Path
Create sample JSON file
First, create a sample JSON file to demonstrate the import process:
# Create sample JSON data (array of objects)
sample_data = [
{'id': 1, 'title': 'Introduction to ML', 'author': 'Alice', 'tags': ['ml', 'intro'], 'rating': 4.5},
{'id': 2, 'title': 'Deep Learning Basics', 'author': 'Bob', 'tags': ['dl', 'neural'], 'rating': 4.8},
{'id': 3, 'title': 'NLP Fundamentals', 'author': 'Carol', 'tags': ['nlp', 'text'], 'rating': 4.2},
{'id': 4, 'title': 'Computer Vision', 'author': 'Dave', 'tags': ['cv', 'images'], 'rating': 4.6},
{'id': 5, 'title': 'Reinforcement Learning', 'author': 'Eve', 'tags': ['rl', 'agents'], 'rating': 4.3}
]
# Save to temporary JSON file
temp_dir = tempfile.mkdtemp()
json_path = Path(temp_dir) / 'articles.json'
with open(json_path, 'w') as f:
json.dump(sample_data, f, indent=2)
Import JSON file
Use create_table with source to create a table directly from a JSON
file:
# Create a fresh directory
pxt.drop_dir('json_demo', force=True)
pxt.create_dir('json_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘json_demo’.
<pixeltable.catalog.dir.Dir at 0x1556b2800>
# Import JSON file into a new table
articles = pxt.create_table(
'json_demo.articles',
source=str(json_path),
source_format='json' # Explicitly specify format when using local file paths
)
Created table ‘articles’.Inserting rows into `articles`: 0 rows [00:00, ? rows/s]
Inserting rows into `articles`: 5 rows [00:00, 538.52 rows/s]
Inserted 5 rows with 0 errors.
# View imported data
articles.collect()
Import from URL
You can import JSON directly from a URL—useful for APIs and remote data:
# Import from a public JSON URL
# Using JSONPlaceholder API as an example
posts = pxt.create_table(
'json_demo.posts',
source='https://jsonplaceholder.typicode.com/posts',
source_format='json' # Required for URL sources
)
Created table ‘posts’.Inserting rows into `posts`: 0 rows [00:00, ? rows/s]
Inserting rows into `posts`: 100 rows [00:00, 15623.57 rows/s]
Inserted 100 rows with 0 errors.
# View first few rows
posts.head(5)
Import from Python dictionaries
Use create_table with a list of dictionaries as source—useful when
you have data in memory:
# Import from a list of dictionaries
events = [
{'event': 'page_view', 'user_id': 101, 'timestamp': '2024-01-15T10:30:00'},
{'event': 'click', 'user_id': 101, 'timestamp': '2024-01-15T10:31:00'},
{'event': 'purchase', 'user_id': 102, 'timestamp': '2024-01-15T10:32:00'},
]
event_table = pxt.create_table('json_demo.events', source=events)
Created table ‘events’.Inserting rows into `events`: 0 rows [00:00, ? rows/s]
Inserting rows into `events`: 3 rows [00:00, 988.06 rows/s]
Inserted 3 rows with 0 errors.
# View imported events
event_table.collect()
Add computed columns
Once imported, you can enrich the data with computed columns:
# Add a computed column combining title and author
articles.add_computed_column(
summary=articles.title + ' by ' + articles.author
)
Added 5 column values with 0 errors.
5 rows updated, 10 values computed.
# View with computed column
articles.select(articles.title, articles.author, articles.summary).collect()
Explanation
JSON format requirements:
The JSON file must contain an array of objects at the top level:
[
{"col1": "value1", "col2": 123},
{"col1": "value2", "col2": 456}
]
Source types supported:
Nested JSON handling:
Nested objects and arrays are stored as JSON columns. You can access
nested fields using Pixeltable’s JSON path syntax in computed columns.
See also