Automatically create descriptive captions for images using AI vision
models.
Problem
You have a collection of images that need captions—for accessibility,
SEO, content management, or searchability. Writing captions manually
doesn’t scale.
Solution
What’s in this recipe:
- Generate captions using OpenAI’s vision models
- Customize caption style (short, detailed, SEO-focused)
- Process images in batch automatically
You add a computed column that sends each image to a vision model with a
captioning prompt. New images are captioned automatically on insert.
Setup
%pip install -qU pixeltable openai
import os
import getpass
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
import pixeltable as pxt
from pixeltable.functions.openai import vision
Load images
# Create a fresh directory
pxt.drop_dir('caption_demo', force=True)
pxt.create_dir('caption_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘caption_demo’.
<pixeltable.catalog.dir.Dir at 0x141561010>
# Create table for images
images = pxt.create_table('caption_demo.images', {'image': pxt.Image})
Created table ‘images’.
# Insert sample images
image_urls = [
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000106.jpg',
]
images.insert([{'image': url} for url in image_urls])
Inserting rows into `images`: 3 rows [00:00, 412.20 rows/s]
Inserted 3 rows with 0 errors.
3 rows inserted, 6 values computed.
# View images
images.collect()
Generate captions
Add a computed column that generates captions using the vision model:
# Add caption using OpenAI vision
images.add_computed_column(
    caption=vision(
        'Write a concise, descriptive caption for this image in one sentence.',
        images.image,
        model='gpt-4o-mini'
    )
)
Added 3 column values with 0 errors.
3 rows updated, 6 values computed.
# View images with captions
images.select(images.image, images.caption).collect()
Different caption styles
You can generate multiple caption styles for different uses:
# Add alt text for accessibility (brief)
images.add_computed_column(
    alt_text=vision(
        'Write a brief alt text for this image (under 125 characters) for screen readers.',
        images.image,
        model='gpt-4o-mini'
    )
)
Added 3 column values with 0 errors.
3 rows updated, 6 values computed.
# Add detailed description
images.add_computed_column(
    description=vision(
        'Describe this image in detail, including objects, colors, setting, and mood.',
        images.image,
        model='gpt-4o-mini'
    )
)
Added 3 column values with 0 errors.
3 rows updated, 6 values computed.
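The Solution section also mentions SEO-focused captions. That style follows the same pattern; the sketch below is illustrative, and the seo_caption column name and prompt wording are not part of the recipe above.
# Optional sketch: an SEO-oriented caption style (column name and prompt are illustrative)
images.add_computed_column(
    seo_caption=vision(
        'Write a keyword-rich, search-friendly caption for this image in under 160 characters.',
        images.image,
        model='gpt-4o-mini'
    )
)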
# View all caption types
images.select(images.image, images.caption, images.alt_text, images.description).collect()
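Because caption, alt_text, and description are computed columns, they also run automatically for any image inserted later; no extra calls are needed. A minimal sketch, reusing one of the sample URLs as a stand-in for a new image of your own:
# New rows are captioned automatically: all computed columns run on insert
new_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg'  # stand-in for your own image URL
images.insert([{'image': new_url}])
images.select(images.image, images.caption, images.alt_text).collect()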
Explanation
Caption prompt patterns used in this recipe:
- One-sentence caption: ‘Write a concise, descriptive caption for this image in one sentence.’
- Alt text for screen readers: ‘Write a brief alt text for this image (under 125 characters) for screen readers.’
- Detailed description: ‘Describe this image in detail, including objects, colors, setting, and mood.’
Model selection:
- gpt-4o-mini: Fast and affordable, good for most captioning tasks
- gpt-4o: Higher quality for complex images or detailed descriptions
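To trade cost for quality, pass a different model name to the same vision call. For example, a sketch that re-runs the detailed description prompt with the larger model; the detailed_description_4o column name is illustrative:
# Sketch: same pattern, larger model for complex images
images.add_computed_column(
    detailed_description_4o=vision(
        'Describe this image in detail, including objects, colors, setting, and mood.',
        images.image,
        model='gpt-4o'
    )
)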
See also