This documentation page is also available as an interactive notebook. You can launch it in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation.
Use AI vision to extract JSON data from receipts, forms, documents, and
other images.
Problem
You have images containing structured information (receipts, forms, ID
cards) and need to extract specific fields as JSON for downstream
processing.
Solution
What’s in this recipe:
- Extract structured JSON from images using GPT-4o
- Use openai.chat_completions() with multimodal messages
- Access individual fields from the extracted data
You use Pixeltable’s openai.chat_completions() function with
multimodal messages that include images directly. Request JSON output
via response_format in model_kwargs.
Setup
%pip install -qU pixeltable openai
import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
Load images
import pixeltable as pxt
from pixeltable.functions import openai
# Create a fresh directory
pxt.drop_dir('extraction_demo', force=True)
pxt.create_dir('extraction_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory ‘extraction_demo’.
<pixeltable.catalog.dir.Dir at 0x1232b4f70>
t = pxt.create_table('extraction_demo/images', {'image': pxt.Image})
Created table ‘images’.
# Insert sample images
t.insert(
    [
        {
            'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg'
        },
        {
            'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg'
        },
    ]
)
Inserted 2 rows with 0 errors in 0.03 s (60.43 rows/s)
2 rows inserted.
Use openai.chat_completions() to analyze images and get JSON output:
# Add extraction column using openai.chat_completions with multimodal messages
PROMPT = """Analyze this image and extract the following as JSON:
- description: A brief description of the image
- objects: List of objects visible in the image
- dominant_colors: List of dominant colors
- scene_type: Type of scene (indoor, outdoor, etc.)"""
messages = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': PROMPT},
            {'type': 'image_url', 'image_url': t.image},
        ],
    }
]
t.add_computed_column(
    data=openai.chat_completions(
        messages,
        model='gpt-4o-mini',
        model_kwargs={'response_format': {'type': 'json_object'}},
    )
)
Added 2 column values with 0 errors in 7.55 s (0.26 rows/s)
2 rows updated.
# View extracted data
t.select(
    t.image, t.data, t.data['choices'][0]['message']['content']
).collect()
# You can also parse the JSON into individual columns if needed
import json
@pxt.udf
def parse_description(data: str) -> str:
    return json.loads(data).get('description', '')

t.select(
    t.image,
    description=parse_description(
        t.data['choices'][0]['message']['content']
    ),
).collect()
Explanation
Getting JSON output:
Pass model_kwargs={'response_format': {'type': 'json_object'}} to have the model return valid JSON. Note that OpenAI's JSON mode requires the word "JSON" to appear somewhere in the messages; the prompt above satisfies this.
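The model's JSON arrives as a string inside choices[0]['message']['content'], so standard json parsing recovers the individual fields. A minimal sketch, using a hypothetical response payload shaped like the data column above (only the fields used here are shown):

```python
import json

# Hypothetical chat_completions response payload; only the parts this
# recipe reads are included.
response = {
    'choices': [
        {
            'message': {
                'content': (
                    '{"description": "A kitchen scene", '
                    '"objects": ["sink", "window"], '
                    '"dominant_colors": ["white", "gray"], '
                    '"scene_type": "indoor"}'
                )
            }
        }
    ]
}

# The content field is a JSON string; json.loads turns it into a dict.
content = response['choices'][0]['message']['content']
fields = json.loads(content)

print(fields['description'])  # -> A kitchen scene
print(fields['scene_type'])   # -> indoor
```

This is the same traversal the parse_description UDF performs, just outside of Pixeltable for illustration.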
Other extraction use cases:
See also