Skip to main content
Open in Kaggle  Open in Colab  Download Notebook
This documentation page is also available as an interactive notebook. You can launch the notebook in Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the above links.
Parse and access specific fields from structured JSON responses returned by language models.

Problem

LLM APIs return nested JSON responses with metadata you don’t need. You want to extract just the text content or specific fields for downstream processing.
{
  "id": "chatcmpl-123",
  "choices": [{
    "message": {
      "content": "This is the actual response text"  // ← You want this
    }
  }],
  "usage": {"tokens": 50}
}

Solution

What’s in this recipe:
  • Extract text content from chat completions
  • Access nested JSON fields
  • Create separate columns for different fields
You use JSON path notation to extract specific fields from API responses and store them in computed columns.

Setup

%pip install -qU pixeltable openai

import os
import getpass

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
import pixeltable as pxt
from pixeltable.functions import openai

Create prompts table

# Create a fresh directory
pxt.drop_dir('json_demo', force=True)
pxt.create_dir('json_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘json_demo’.
<pixeltable.catalog.dir.Dir at 0x313d7a150>
t = pxt.create_table('json_demo.prompts', {'prompt': pxt.String})
Created table ‘prompts’.
t.insert([
    {'prompt': 'What is the capital of France?'},
    {'prompt': 'Write a haiku about coding'},
])
Inserting rows into `prompts`: 2 rows [00:00, 325.83 rows/s]
Inserted 2 rows with 0 errors.
2 rows inserted, 2 values computed.

Get LLM responses

# Add computed column for API response (returns full JSON)
t.add_computed_column(
    response=openai.chat_completions(
        messages=[{'role': 'user', 'content': t.prompt}],
        model='gpt-4o-mini'
    )
)
Added 2 column values with 0 errors.
2 rows updated, 2 values computed.

Extract specific fields

Use dot notation to access nested JSON fields:
# Extract just the text content
t.add_computed_column(
    text=t.response.choices[0].message.content
)

# Extract token usage
t.add_computed_column(
    tokens=t.response.usage.total_tokens
)
Added 2 column values with 0 errors.
Added 2 column values with 0 errors.
2 rows updated, 2 values computed.
# View clean results
t.select(t.prompt, t.text, t.tokens).collect()

Explanation

Common extraction patterns:
Accessing JSON fields:
  • Use dot notation for object properties: response.usage
  • Use brackets for array elements: choices[0]
  • Chain them: response.choices[0].message.content
Extracted columns are computed: Changes to the source data automatically update all extracted fields.

See also