> ## Documentation Index
> Fetch the complete documentation index at: https://docs.pixeltable.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Extract fields from LLM JSON responses

> Extract structured fields from LLM JSON responses in Pixeltable using path expressions, validation, and computed columns for downstream queries.

<a href="https://kaggle.com/kernels/welcome?src=https://github.com/pixeltable/pixeltable/blob/release/docs/release/howto/cookbooks/core/workflow-json-extraction.ipynb" id="openKaggle" target="_blank" rel="noopener noreferrer"><img src="https://kaggle.com/static/images/open-in-kaggle.svg" alt="Open in Kaggle" style={{ display: 'inline', margin: '0px' }} noZoom /></a>  <a href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/release/howto/cookbooks/core/workflow-json-extraction.ipynb" id="openColab" target="_blank" rel="noopener noreferrer"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" style={{ display: 'inline', margin: '0px' }} noZoom /></a>  <a href="https://raw.githubusercontent.com/pixeltable/pixeltable/refs/tags/release/docs/release/howto/cookbooks/core/workflow-json-extraction.ipynb" id="downloadNotebook" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/%E2%AC%87-Download%20Notebook-blue" alt="Download Notebook" style={{ display: 'inline', margin: '0px' }} noZoom /></a>

<Tip>This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.</Tip>

export const quartoRawHtml = [`
<table class="dataframe" data-quarto-postprocess="true" data-border="1">
<thead>
<tr style="text-align: right;">
<th data-quarto-table-cell-role="th">prompt</th>
<th data-quarto-table-cell-role="th">text</th>
<th data-quarto-table-cell-role="th">tokens</th>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: middle;">What is the capital of France?</td>
<td style="vertical-align: middle;">The capital of France is Paris.</td>
<td style="vertical-align: middle;">21</td>
</tr>
<tr>
<td style="vertical-align: middle;">Write a haiku about coding</td>
<td style="vertical-align: middle;">Fingers dance on keys, Logic flows like whispered winds, Worlds born
line by line.</td>
<td style="vertical-align: middle;">34</td>
</tr>
</tbody>
</table>
`, `
<table>
<thead>
<tr>
<th>API</th>
<th>Text content path</th>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: middle;">OpenAI</td>
<td style="vertical-align: middle;"><code>response.choices[0].message.content</code></td>
</tr>
<tr>
<td style="vertical-align: middle;">Anthropic</td>
<td style="vertical-align: middle;"><code>response.content[0].text</code></td>
</tr>
<tr>
<td style="vertical-align: middle;">OpenAI Whisper</td>
<td style="vertical-align: middle;"><code>response.text</code></td>
</tr>
</tbody>
</table>
`];

Parse and access specific fields from structured JSON responses returned
by language models.

## Problem

LLM APIs return nested JSON responses with metadata you don’t need. You
want to extract just the text content or specific fields for downstream
processing.

```json theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
{
  "id": "chatcmpl-123",
  "choices": [{
    "message": {
      "content": "This is the actual response text"  // ← You want this
    }
  }],
  "usage": {"tokens": 50}
}
```

## Solution

**What’s in this recipe:**

* Extract text content from chat completions
* Access nested JSON fields
* Create separate columns for different fields

You use JSON path notation to extract specific fields from API responses
and store them in computed columns.

### Setup

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai

import getpass
import os

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```

### Create prompts table

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import openai

# Create a fresh directory
pxt.drop_dir('json_demo', force=True)
pxt.create_dir('json_demo')
```

<pre style={{ 'margin': '-20px 20px 0px 20px', 'padding': '0px', 'background-color': 'transparent', 'color': 'black' }}>
  Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
  Created directory 'json\_demo'.
  \<pixeltable.catalog.dir.Dir at 0x313d7a150>
</pre>

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t = pxt.create_table('json_demo/prompts', {'prompt': pxt.String})
```

<pre style={{ 'margin': '-20px 20px 0px 20px', 'padding': '0px', 'background-color': 'transparent', 'color': 'black' }}>
  Created table 'prompts'.
</pre>

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
t.insert(
    [
        {'prompt': 'What is the capital of France?'},
        {'prompt': 'Write a haiku about coding'},
    ]
)
```

<pre style={{ 'margin': '-20px 20px 0px 20px', 'padding': '0px', 'background-color': 'transparent', 'color': 'black' }}>
  Inserting rows into \`prompts\`: 2 rows \[00:00, 325.83 rows/s]
  Inserted 2 rows with 0 errors.
  2 rows inserted, 2 values computed.
</pre>

### Get LLM responses

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add computed column for API response (returns full JSON)
t.add_computed_column(
    response=openai.chat_completions(
        messages=[{'role': 'user', 'content': t.prompt}],
        model='gpt-4o-mini',
    )
)
```

<pre style={{ 'margin': '-20px 20px 0px 20px', 'padding': '0px', 'background-color': 'transparent', 'color': 'black' }}>
  Added 2 column values with 0 errors.
  2 rows updated, 2 values computed.
</pre>

### Extract specific fields

Use dot notation to access nested JSON fields:

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract just the text content
t.add_computed_column(text=t.response.choices[0].message.content)

# Extract token usage
t.add_computed_column(tokens=t.response.usage.total_tokens)
```

<pre style={{ 'margin': '-20px 20px 0px 20px', 'padding': '0px', 'background-color': 'transparent', 'color': 'black' }}>
  Added 2 column values with 0 errors.
  Added 2 column values with 0 errors.
  2 rows updated, 2 values computed.
</pre>

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View clean results
t.select(t.prompt, t.text, t.tokens).collect()
```

<div style={{ 'margin': '0px 20px 0px 20px' }} dangerouslySetInnerHTML={{ __html: quartoRawHtml[0] }} />

## Explanation

**Common extraction patterns:**

<div style={{ 'margin': '0px 20px 0px 20px' }} dangerouslySetInnerHTML={{ __html: quartoRawHtml[1] }} />

**Accessing JSON fields:**

* Use dot notation for object properties: `response.usage`
* Use brackets for array elements: `choices[0]`
* Chain them: `response.choices[0].message.content`

**Extracted columns are computed:**

Changes to the source data automatically update all extracted fields.

## See also

* [Configure API
  keys](/howto/cookbooks/core/workflow-api-keys)
* [Extract structured data from
  images](/howto/cookbooks/images/vision-structured-output)
