Use AI vision to extract JSON data from receipts, forms, documents, and
other images.
Problem
You have images containing structured information (receipts, forms, ID
cards) and need to extract specific fields as JSON for downstream
processing.
Solution
What’s in this recipe:
- Extract structured JSON from images using gpt-4o-mini
- Use openai.vision(), which handles images directly
- Access individual fields from the extracted data
Use Pixeltable's openai.vision() function, which handles image encoding automatically. Request JSON output by passing response_format in model_kwargs.
Setup
%pip install -qU pixeltable openai
import os
import getpass
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
import pixeltable as pxt
from pixeltable.functions import openai
Load images
# Create a fresh directory
pxt.drop_dir('extraction_demo', force=True)
pxt.create_dir('extraction_demo')
Created directory ‘extraction_demo’.
t = pxt.create_table('extraction_demo.images', {'image': pxt.Image})
Created table ‘images’.
# Insert sample images
t.insert([
    {'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg'},
    {'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg'},
])
Inserting rows into `images`: 2 rows [00:00, 365.50 rows/s]
Inserted 2 rows with 0 errors.
2 rows inserted, 4 values computed.
Use openai.vision() to analyze images and get JSON output:
# Add extraction column using openai.vision (handles images directly)
PROMPT = '''Analyze this image and extract the following as JSON:
- description: A brief description of the image
- objects: List of objects visible in the image
- dominant_colors: List of dominant colors
- scene_type: Type of scene (indoor, outdoor, etc.)'''
t.add_computed_column(
    data=openai.vision(
        prompt=PROMPT,
        image=t.image,
        model='gpt-4o-mini',
        model_kwargs={'response_format': {'type': 'json_object'}},
    )
)
Added 2 column values with 0 errors.
2 rows updated, 4 values computed.
# View extracted data
t.select(t.image, t.data).collect()
# You can also parse the JSON into individual columns if needed
import json
@pxt.udf
def parse_description(data: str) -> str:
    return json.loads(data).get('description', '')
t.select(t.image, description=parse_description(t.data)).collect()
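If you need several fields, a single reusable helper can replace one UDF per field. The sketch below uses a hypothetical sample response (the actual contents of t.data will vary per image) and tolerates malformed output; inside a Pixeltable pipeline you would wrap the helper with @pxt.udf as above.

```python
import json

# Hypothetical example of the kind of JSON string gpt-4o-mini might return.
sample = '''{
  "description": "A kitchen counter with fresh produce",
  "objects": ["banana", "bowl", "knife"],
  "dominant_colors": ["yellow", "white"],
  "scene_type": "indoor"
}'''

def parse_field(data: str, field: str, default=None):
    """Pull a single field out of the model's JSON response,
    falling back to a default if the response isn't valid JSON."""
    try:
        return json.loads(data).get(field, default)
    except json.JSONDecodeError:
        return default

parse_field(sample, 'scene_type')      # 'indoor'
parse_field(sample, 'objects')         # ['banana', 'bowl', 'knife']
parse_field('not json', 'scene_type')  # None (graceful fallback)
```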
Explanation
Why use openai.vision():
- Handles PIL Images directly (no manual base64 encoding or URL
storage)
- Simpler API than constructing chat messages manually
- Returns the content string directly
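For contrast, here is roughly what the manual route looks like: base64-encoding the image yourself and building a chat payload with a data URL. The image bytes below are a stand-in, and the payload shape follows the OpenAI chat-completions format for image input.

```python
import base64

# Stand-in for real JPEG bytes; in practice you would read an image file.
image_bytes = b'\xff\xd8\xff\xe0fake-jpeg-bytes'
b64 = base64.b64encode(image_bytes).decode('ascii')

# The message structure openai.vision() builds for you.
messages = [{
    'role': 'user',
    'content': [
        {'type': 'text', 'text': 'Describe this image as JSON.'},
        {'type': 'image_url',
         'image_url': {'url': f'data:image/jpeg;base64,{b64}'}},
    ],
}]
```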
Getting JSON output:
Pass model_kwargs={'response_format': {'type': 'json_object'}} to get
structured JSON.
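If the model supports OpenAI's structured outputs, a json_schema response format constrains the output more tightly than json_object, guaranteeing the field names and types you asked for. A sketch (the schema name and strict flag are illustrative; check that your chosen model supports this mode):

```python
# JSON Schema response format mirroring the fields in PROMPT above.
response_format = {
    'type': 'json_schema',
    'json_schema': {
        'name': 'image_analysis',
        'schema': {
            'type': 'object',
            'properties': {
                'description': {'type': 'string'},
                'objects': {'type': 'array', 'items': {'type': 'string'}},
                'dominant_colors': {'type': 'array', 'items': {'type': 'string'}},
                'scene_type': {'type': 'string'},
            },
            'required': ['description', 'objects', 'dominant_colors', 'scene_type'],
            'additionalProperties': False,
        },
        'strict': True,
    },
}
```

Pass it the same way as before, via model_kwargs={'response_format': response_format}.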
Other extraction use cases:
See also