Open in Kaggle  Open in Colab  Download Notebook
This documentation page is also available as an interactive notebook. You can launch the notebook in Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the above links.
Use AI vision to extract JSON data from receipts, forms, documents, and other images.

Problem

You have images containing structured information (receipts, forms, ID cards) and need to extract specific fields as JSON for downstream processing.

Solution

What’s in this recipe:
  • Extract structured JSON from images using GPT-4o mini
  • Use openai.chat_completions(), passing the image column directly in the message content
  • Access individual fields from the extracted data
You use Pixeltable’s openai.chat_completions() function and pass the image column in the message content; Pixeltable handles the image encoding automatically. Request JSON output via response_format in model_kwargs.
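For comparison, calling a vision model without this convenience usually means encoding each image yourself, for example as a base64 data URL. The sketch below shows that manual step in plain Python. It illustrates what you would otherwise write and is not a description of Pixeltable's internals; to_data_url is a hypothetical helper, and the bytes are a placeholder rather than a real image.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = 'image/jpeg') -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode('ascii')
    return f'data:{mime_type};base64,{encoded}'


# Placeholder bytes stand in for the contents of a real image file.
url = to_data_url(b'\xff\xd8\xff\xe0')
print(url)
```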

Setup

%pip install -qU pixeltable openai

import getpass
import os

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
import pixeltable as pxt
from pixeltable.functions import openai

Load images

# Create a fresh directory
pxt.drop_dir('extraction_demo', force=True)
pxt.create_dir('extraction_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory ‘extraction_demo’.
<pixeltable.catalog.dir.Dir at 0x1232b4f70>
t = pxt.create_table('extraction_demo/images', {'image': pxt.Image})
Created table ‘images’.
# Insert sample images
t.insert(
    [
        {
            'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg'
        },
        {
            'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg'
        },
    ]
)
Inserted 2 rows with 0 errors in 0.03 s (60.43 rows/s)
2 rows inserted.

Extract structured data

Use openai.chat_completions() to analyze images and get JSON output:
# Build the prompt and messages, then add an extraction column using openai.chat_completions
PROMPT = """Analyze this image and extract the following as JSON:
- description: A brief description of the image
- objects: List of objects visible in the image
- dominant_colors: List of dominant colors
- scene_type: Type of scene (indoor, outdoor, etc.)"""

messages = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': PROMPT},
            {'type': 'image_url', 'image_url': t.image},
        ],
    }
]

t.add_computed_column(
    data=openai.chat_completions(
        messages,
        model='gpt-4o-mini',
        model_kwargs={'response_format': {'type': 'json_object'}},
    )
)
Added 2 column values with 0 errors in 7.55 s (0.26 rows/s)
2 rows updated.
# View extracted data
t.select(
    t.image, t.data, t.data['choices'][0]['message']['content']
).collect()
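
The expression t.data['choices'][0]['message']['content'] follows the shape of a chat completion response, where the model's reply is a string nested under the first choice. A minimal sketch in plain Python, using a hand-written response dict (the values are illustrative placeholders, not real model output):

```python
import json

# Hypothetical response, trimmed to the keys used in this recipe.
response = {
    'choices': [
        {
            'index': 0,
            'message': {
                'role': 'assistant',
                'content': '{"description": "An example scene", "scene_type": "outdoor"}',
            },
        }
    ]
}

# Same path as the Pixeltable expression above.
content = response['choices'][0]['message']['content']

# content is still a JSON-encoded string; parse it to get a dict.
fields = json.loads(content)
print(type(content).__name__, fields['scene_type'])
```

Because the content is a string rather than a dict, it has to be parsed before individual fields can be read.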
# You can also parse the JSON into individual columns if needed
import json


@pxt.udf
def parse_description(data: str) -> str:
    return json.loads(data).get('description', '')


t.select(
    t.image,
    description=parse_description(
        t.data['choices'][0]['message']['content']
    ),
).collect()
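
JSON mode returns syntactically valid JSON but does not enforce a schema, so a requested key can be missing from a given response. The sketch below is one way to parse all four fields defensively in plain Python; parse_fields and FIELDS are hypothetical names, and returning None for missing keys is an assumption about what downstream code prefers.

```python
import json
from typing import Any

# Field names requested in PROMPT.
FIELDS = ('description', 'objects', 'dominant_colors', 'scene_type')


def parse_fields(content: str) -> dict[str, Any]:
    """Return the requested fields, with None for anything missing."""
    try:
        parsed = json.loads(content)
    except (json.JSONDecodeError, TypeError):
        # Malformed or non-string input: treat as having no fields.
        parsed = {}
    if not isinstance(parsed, dict):
        # Valid JSON that is not an object (e.g. a list) has no fields either.
        parsed = {}
    return {name: parsed.get(name) for name in FIELDS}


print(parse_fields('{"description": "An example scene", "objects": ["chair"]}'))
```

The same logic could be wrapped in a @pxt.udf, in the way parse_description is above, if you want the parsed fields available inside a query.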

Explanation

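Getting JSON output: Pass model_kwargs={'response_format': {'type': 'json_object'}} to get structured JSON. This mode makes the model return syntactically valid JSON, but the field names come only from the prompt, so name each field explicitly, as PROMPT does.

Other extraction use cases: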
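
For document types such as receipts, the same prompt shape can be generated from a list of fields. The sketch below is a hedged illustration: build_extraction_prompt and the receipt field names are hypothetical, chosen only to mirror the structure of PROMPT. Note that the word "JSON" stays in the prompt, which OpenAI's JSON mode requires.

```python
def build_extraction_prompt(subject: str, fields: dict[str, str]) -> str:
    """Build a prompt in the same shape as PROMPT (hypothetical helper)."""
    lines = [f'Analyze this {subject} and extract the following as JSON:']
    lines += [f'- {name}: {description}' for name, description in fields.items()]
    return '\n'.join(lines)


# Illustrative receipt fields; adjust these to your own documents.
RECEIPT_FIELDS = {
    'merchant': 'Name of the merchant',
    'date': 'Purchase date as printed on the receipt',
    'total': 'Total amount paid',
    'line_items': 'List of purchased items with prices',
}

receipt_prompt = build_extraction_prompt('receipt', RECEIPT_FIELDS)
print(receipt_prompt)
```

The resulting string can take the place of PROMPT in the messages list; the rest of the recipe stays the same.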

See also

Last modified on March 3, 2026