This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.
Parse and access specific fields from structured JSON responses returned
by language models.
Problem
LLM APIs return nested JSON responses with metadata you don’t need. You
want to extract just the text content or specific fields for downstream
processing.
{
"id": "chatcmpl-123",
"choices": [{
"message": {
"content": "This is the actual response text" // ← You want this
}
}],
"usage": {"tokens": 50}
}
Solution
What’s in this recipe:
- Extract text content from chat completions
- Access nested JSON fields
- Create separate columns for different fields
You use JSON path notation to extract specific fields from API responses
and store them in computed columns.
Setup
%pip install -qU pixeltable openai
import os
import getpass
if 'OPENAI_API_KEY' not in os.environ:
os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
import pixeltable as pxt
from pixeltable.functions import openai
Create prompts table
# Create a fresh directory
pxt.drop_dir('json_demo', force=True)
pxt.create_dir('json_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘json_demo’.
<pixeltable.catalog.dir.Dir at 0x313d7a150>
t = pxt.create_table('json_demo.prompts', {'prompt': pxt.String})
Created table ‘prompts’.
t.insert([
{'prompt': 'What is the capital of France?'},
{'prompt': 'Write a haiku about coding'},
])
Inserting rows into `prompts`: 2 rows [00:00, 325.83 rows/s]
Inserted 2 rows with 0 errors.
2 rows inserted, 2 values computed.
Get LLM responses
# Add computed column for API response (returns full JSON)
t.add_computed_column(
response=openai.chat_completions(
messages=[{'role': 'user', 'content': t.prompt}],
model='gpt-4o-mini'
)
)
Added 2 column values with 0 errors.
2 rows updated, 2 values computed.
Use dot notation to access nested JSON fields:
# Extract just the text content
t.add_computed_column(
text=t.response.choices[0].message.content
)
# Extract token usage
t.add_computed_column(
tokens=t.response.usage.total_tokens
)
Added 2 column values with 0 errors.
Added 2 column values with 0 errors.
2 rows updated, 2 values computed.
# View clean results
t.select(t.prompt, t.text, t.tokens).collect()
Explanation
Common extraction patterns:
Accessing JSON fields:
- Use dot notation for object properties:
response.usage
- Use brackets for array elements:
choices[0]
- Chain them:
response.choices[0].message.content
Extracted columns are computed:
Changes to the source data automatically update all extracted fields.
See also