Use AI vision to extract JSON data from receipts, forms, documents, and
other images.
Problem
You have images containing structured information (receipts, forms, ID
cards) and need to extract specific fields as JSON for downstream
processing.
Solution
What’s in this recipe:
- Extract structured JSON from images using gpt-4o-mini
- Use openai.vision(), which handles images directly
- Access individual fields from the extracted data
Use Pixeltable's openai.vision() function, which handles image encoding automatically. Request JSON output by passing response_format in model_kwargs.
Setup
%pip install -qU pixeltable openai
import os
import getpass
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
import pixeltable as pxt
from pixeltable.functions import openai
Load images
# Create a fresh directory
pxt.drop_dir('extraction_demo', force=True)
pxt.create_dir('extraction_demo')
Created directory ‘extraction_demo’.
t = pxt.create_table('extraction_demo.images', {'image': pxt.Image})
Created table ‘images’.
# Insert sample images
t.insert([
    {'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg'},
    {'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg'},
])
Inserting rows into `images`: 2 rows [00:00, 365.50 rows/s]
Inserted 2 rows with 0 errors.
2 rows inserted, 4 values computed.
Use openai.vision() to analyze images and get JSON output:
# Add extraction column using openai.vision (handles images directly)
PROMPT = '''Analyze this image and extract the following as JSON:
- description: A brief description of the image
- objects: List of objects visible in the image
- dominant_colors: List of dominant colors
- scene_type: Type of scene (indoor, outdoor, etc.)'''
t.add_computed_column(
    data=openai.vision(
        prompt=PROMPT,
        image=t.image,
        model='gpt-4o-mini',
        model_kwargs={'response_format': {'type': 'json_object'}},
    )
)
Added 2 column values with 0 errors.
2 rows updated, 4 values computed.
# View extracted data
t.select(t.image, t.data).collect()
# You can also parse the JSON into individual columns if needed
import json
@pxt.udf
def parse_description(data: str) -> str:
    return json.loads(data).get('description', '')
t.select(t.image, description=parse_description(t.data)).collect()
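If you need several fields, a single reusable helper can replace one UDF per field. The sketch below uses a hypothetical sample response (the actual contents of t.data will vary per image) and tolerates malformed output; inside a Pixeltable pipeline you would wrap the helper with @pxt.udf as above.

```python
import json

# Hypothetical example of the kind of JSON string gpt-4o-mini might return.
sample = '''{
  "description": "A kitchen counter with fresh produce",
  "objects": ["banana", "bowl", "knife"],
  "dominant_colors": ["yellow", "white"],
  "scene_type": "indoor"
}'''

def parse_field(data: str, field: str, default=None):
    """Pull a single field out of the model's JSON response,
    falling back to a default if the response isn't valid JSON."""
    try:
        return json.loads(data).get(field, default)
    except json.JSONDecodeError:
        return default

parse_field(sample, 'scene_type')      # 'indoor'
parse_field(sample, 'objects')         # ['banana', 'bowl', 'knife']
parse_field('not json', 'scene_type')  # None (graceful fallback)
```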
Explanation
Why use openai.vision():
- Handles PIL Images directly (no manual base64 encoding or URL
storage)
- Simpler API than constructing chat messages manually
- Returns the content string directly
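For contrast, here is roughly what the manual route looks like: base64-encoding the image yourself and building a chat payload with a data URL. The image bytes below are a stand-in, and the payload shape follows the OpenAI chat-completions format for image input.

```python
import base64

# Stand-in for real JPEG bytes; in practice you would read an image file.
image_bytes = b'\xff\xd8\xff\xe0fake-jpeg-bytes'
b64 = base64.b64encode(image_bytes).decode('ascii')

# The message structure openai.vision() builds for you.
messages = [{
    'role': 'user',
    'content': [
        {'type': 'text', 'text': 'Describe this image as JSON.'},
        {'type': 'image_url',
         'image_url': {'url': f'data:image/jpeg;base64,{b64}'}},
    ],
}]
```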
Getting JSON output:
Pass model_kwargs={'response_format': {'type': 'json_object'}} to get
structured JSON.
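If the model supports OpenAI's structured outputs, a json_schema response format constrains the output more tightly than json_object, guaranteeing the field names and types you asked for. A sketch (the schema name and strict flag are illustrative; check that your chosen model supports this mode):

```python
# JSON Schema response format mirroring the fields in PROMPT above.
response_format = {
    'type': 'json_schema',
    'json_schema': {
        'name': 'image_analysis',
        'schema': {
            'type': 'object',
            'properties': {
                'description': {'type': 'string'},
                'objects': {'type': 'array', 'items': {'type': 'string'}},
                'dominant_colors': {'type': 'array', 'items': {'type': 'string'}},
                'scene_type': {'type': 'string'},
            },
            'required': ['description', 'objects', 'dominant_colors', 'scene_type'],
            'additionalProperties': False,
        },
        'strict': True,
    },
}
```

Pass it the same way as before, via model_kwargs={'response_format': response_format}.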
Other extraction use cases:
See also