Extract fields from LLM JSON responses

This documentation page is also available as an interactive notebook. You can launch the notebook in Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the above links.

Parse and access specific fields from structured JSON responses returned by language models.

Problem

LLM APIs return nested JSON responses with metadata you don’t need. You want to extract just the text content or specific fields for downstream processing.

{
  "id": "chatcmpl-123",
  "choices": [{
    "message": {
      "content": "This is the actual response text"  // ← You want this
    }
  }],
  "usage": {"tokens": 50}
}

Solution

What’s in this recipe:

Extract text content from chat completions
Access nested JSON fields
Create separate columns for different fields

You use JSON path notation to extract specific fields from API responses and store them in computed columns.

Setup

%pip install -qU pixeltable openai

import os
import getpass

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')

import pixeltable as pxt
from pixeltable.functions import openai

Create prompts table

# Create a fresh directory
pxt.drop_dir('json_demo', force=True)
pxt.create_dir('json_demo')

Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘json_demo’.
<pixeltable.catalog.dir.Dir at 0x313d7a150>

t = pxt.create_table('json_demo.prompts', {'prompt': pxt.String})

Created table ‘prompts’.

t.insert([
    {'prompt': 'What is the capital of France?'},
    {'prompt': 'Write a haiku about coding'},
])

Inserting rows into `prompts`: 2 rows [00:00, 325.83 rows/s]
Inserted 2 rows with 0 errors.
2 rows inserted, 2 values computed.

Get LLM responses

# Add computed column for API response (returns full JSON)
t.add_computed_column(
    response=openai.chat_completions(
        messages=[{'role': 'user', 'content': t.prompt}],
        model='gpt-4o-mini'
    )
)

Added 2 column values with 0 errors.
2 rows updated, 2 values computed.

Extract specific fields

Use dot notation to access nested JSON fields:

# Extract just the text content
t.add_computed_column(
    text=t.response.choices[0].message.content
)

# Extract token usage
t.add_computed_column(
    tokens=t.response.usage.total_tokens
)

Added 2 column values with 0 errors.
Added 2 column values with 0 errors.
2 rows updated, 2 values computed.

# View clean results
t.select(t.prompt, t.text, t.tokens).collect()

Explanation

Common extraction patterns:

Accessing JSON fields:

Use dot notation for object properties: response.usage
Use brackets for array elements: choices[0]
Chain them: response.choices[0].message.content

Extracted columns are computed: Changes to the source data automatically update all extracted fields.

Welcome to Pixeltable

Core Concepts

How-To

Problem

Solution

Setup

Create prompts table

Get LLM responses

Extract specific fields

Explanation

See also

Welcome to Pixeltable

Core Concepts

How-To

​Problem

​Solution

​Setup

​Create prompts table

​Get LLM responses

​Extract specific fields

​Explanation

​See also

Problem

Solution

Setup

Create prompts table

Get LLM responses

Extract specific fields

Explanation

See also