Open in Kaggle  Open in Colab  Download Notebook
This documentation page is also available as an interactive notebook. You can launch the notebook in Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the above links.
Use AI vision to extract JSON data from receipts, forms, documents, and other images.

Problem

You have images containing structured information (receipts, forms, ID cards) and need to extract specific fields as JSON for downstream processing.

Solution

What’s in this recipe:
  • Extract structured JSON from images using GPT-4o-mini
  • Use openai.vision() which handles images directly
  • Access individual fields from the extracted data
Use Pixeltable’s openai.vision() function, which handles image encoding automatically. Request JSON output by passing response_format in model_kwargs.

Setup

%pip install -qU pixeltable openai

import os
import getpass

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
import pixeltable as pxt
from pixeltable.functions import openai

Load images

# Create a fresh directory
pxt.drop_dir('extraction_demo', force=True)
pxt.create_dir('extraction_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘extraction_demo’.
<pixeltable.catalog.dir.Dir at 0x145771ed0>
t = pxt.create_table('extraction_demo.images', {'image': pxt.Image})
Created table ‘images’.
# Insert sample images
t.insert([
    {'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg'},
    {'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg'},
])
Inserting rows into `images`: 2 rows [00:00, 365.50 rows/s]
Inserted 2 rows with 0 errors.
2 rows inserted, 4 values computed.

Extract structured data

Use openai.vision() to analyze images and get JSON output:
# Add extraction column using openai.vision (handles images directly)
PROMPT = '''Analyze this image and extract the following as JSON:
- description: A brief description of the image
- objects: List of objects visible in the image
- dominant_colors: List of dominant colors
- scene_type: Type of scene (indoor, outdoor, etc.)'''

t.add_computed_column(
    data=openai.vision(
        prompt=PROMPT,
        image=t.image,
        model='gpt-4o-mini',
        model_kwargs={'response_format': {'type': 'json_object'}}
    )
)
Added 2 column values with 0 errors.
2 rows updated, 4 values computed.
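With the prompt above and response_format set to json_object, each row's data cell should contain a JSON string with the four requested keys. The values below are illustrative only, not actual model output:

```python
import json

# Illustrative example of the shape requested by PROMPT (invented values):
example = json.loads('''{
  "description": "Two cats resting on a sofa",
  "objects": ["cat", "sofa", "blanket"],
  "dominant_colors": ["brown", "gray"],
  "scene_type": "indoor"
}''')

sorted(example)  # ['description', 'dominant_colors', 'objects', 'scene_type']
```

Field names and value types come entirely from your prompt, so keep the prompt explicit about the structure you expect.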
# View extracted data
t.select(t.image, t.data).collect()
# You can also parse the JSON into individual columns if needed
import json

@pxt.udf
def parse_description(data: str) -> str:
    return json.loads(data).get('description', '')

t.select(t.image, description=parse_description(t.data)).collect()
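The UDF above assumes the model always returns valid JSON containing the field. A slightly more defensive variant is sketched below: a plain helper (usable inside a @pxt.udf the same way) that falls back to a default when the output is malformed. The function name and defaults are illustrative, not part of Pixeltable's API:

```python
import json

def parse_field(data: str, field: str, default: str = '') -> str:
    """Safely pull one field from the model's JSON output.

    Returns `default` if the string is not valid JSON or the key is absent.
    """
    try:
        return json.loads(data).get(field, default)
    except (json.JSONDecodeError, TypeError):
        return default

sample = '{"description": "a cat", "objects": ["cat"], "scene_type": "indoor"}'
parse_field(sample, 'scene_type')      # 'indoor'
parse_field('not json', 'vendor', '?')  # '?' (graceful fallback)
```

Wrapping each field access like this keeps a single malformed model response from failing the whole computed column.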

Explanation

Why use openai.vision():
  • Handles PIL Images directly (no manual base64 encoding or URL storage)
  • Simpler API than constructing chat messages manually
  • Returns the content string directly
Getting JSON output: Pass model_kwargs={'response_format': {'type': 'json_object'}} to instruct the model to return valid JSON.
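The same pattern applies to receipts, forms, or ID cards: only the prompt changes. One way to keep prompts consistent is a small builder that turns a field-name-to-description map into the extraction prompt. This is a sketch; the function name and the receipt fields are assumptions, not part of this recipe's API:

```python
def extraction_prompt(fields: dict[str, str]) -> str:
    """Build a JSON-extraction prompt from a field-name -> description map."""
    lines = [f'- {name}: {desc}' for name, desc in fields.items()]
    return 'Analyze this image and extract the following as JSON:\n' + '\n'.join(lines)

# Hypothetical fields for a receipt-extraction use case:
RECEIPT_FIELDS = {
    'vendor': 'Name of the merchant',
    'date': 'Purchase date in YYYY-MM-DD format',
    'total': 'Total amount as a number',
    'line_items': 'List of {name, price} objects',
}

print(extraction_prompt(RECEIPT_FIELDS))
```

The resulting string can be passed as the prompt argument to openai.vision() exactly as PROMPT was above.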

See also