Automatically create descriptive captions for images using AI vision
models.
Problem
You have a collection of images that need captions—for accessibility,
SEO, content management, or searchability. Writing captions manually
doesn’t scale.
Solution
What’s in this recipe:
- Generate captions using OpenAI’s vision models
- Customize caption style (short, detailed, SEO-focused)
- Process images in batch automatically
You add a computed column that sends each image to a vision model with a
captioning prompt. New images are captioned automatically on insert.
Setup
%pip install -qU pixeltable openai
import getpass
import os

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
import pixeltable as pxt
from pixeltable.functions.openai import chat_completions
Load images
# Create a fresh directory
pxt.drop_dir('caption_demo', force=True)
pxt.create_dir('caption_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'caption_demo'.
<pixeltable.catalog.dir.Dir at 0x11fba5840>
# Create table for images
images = pxt.create_table('caption_demo/images', {'image': pxt.Image})
Created table 'images'.
# Insert sample images
image_urls = [
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000106.jpg',
]

images.insert([{'image': url} for url in image_urls])
Inserted 3 rows with 0 errors in 0.12 s (25.17 rows/s)
3 rows inserted.
# View images
images.collect()
Generate captions
Add a computed column that generates captions using the vision model:
# Add caption using OpenAI vision
messages = [
    {
        'role': 'user',
        'content': [
            {
                'type': 'text',
                'text': 'Write a concise, descriptive caption for this image in one sentence.',
            },
            {'type': 'image_url', 'image_url': images.image},
        ],
    }
]

images.add_computed_column(
    caption=chat_completions(messages, model='gpt-4o-mini')
)
Added 3 column values with 0 errors in 4.62 s (0.65 rows/s)
3 rows updated.
# View images with captions
images.select(
    images.image, images.caption['choices'][0]['message']['content']
).collect()
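The `caption` column stores the full chat-completion response as JSON, which is why the caption text is extracted with the path `['choices'][0]['message']['content']`. A plain-Python sketch of that navigation, using a placeholder response dict (the caption string below is illustrative, not real model output):

```python
# Shape of a chat-completion response; the content value is a placeholder.
response = {
    'choices': [
        {
            'message': {
                'role': 'assistant',
                'content': 'A dog lying on a wooden floor next to a red ball.',
            }
        }
    ]
}

# Same path as the Pixeltable expression: caption['choices'][0]['message']['content']
caption_text = response['choices'][0]['message']['content']
print(caption_text)
```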
Different caption styles
You can generate multiple caption styles for different uses:
# Add alt text for accessibility (brief)
messages = [
    {
        'role': 'user',
        'content': [
            {
                'type': 'text',
                'text': 'Write a brief alt text for this image (under 125 characters) for screen readers.',
            },
            {'type': 'image_url', 'image_url': images.image},
        ],
    }
]

images.add_computed_column(
    alt_text=chat_completions(messages, model='gpt-4o-mini')
)
Added 3 column values with 0 errors in 3.51 s (0.85 rows/s)
3 rows updated.
# Add detailed description
messages = [
    {
        'role': 'user',
        'content': [
            {
                'type': 'text',
                'text': 'Describe this image in detail, including objects, colors, setting, and mood.',
            },
            {'type': 'image_url', 'image_url': images.image},
        ],
    }
]

images.add_computed_column(
    description=chat_completions(messages, model='gpt-4o-mini')
)
Added 3 column values with 0 errors in 11.28 s (0.27 rows/s)
3 rows updated.
# View all caption types
images.select(
    images.image,
    images.caption['choices'][0]['message']['content'],
    images.alt_text['choices'][0]['message']['content'],
    images.description['choices'][0]['message']['content'],
).collect()
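Models don't always respect length constraints, so it can be worth checking generated alt text against the 125-character limit used in the prompt above before publishing. A small hypothetical helper (not part of Pixeltable):

```python
def check_alt_text(text: str, limit: int = 125) -> bool:
    """Return True if the alt text fits within the character limit."""
    return len(text.strip()) <= limit

# Example usage with placeholder strings:
print(check_alt_text('A dog resting on a wooden floor.'))  # True
print(check_alt_text('x' * 200))                           # False
```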
Explanation
Caption prompt patterns:
- Concise caption: one descriptive sentence, suitable as a default caption
- Alt text: brief (under 125 characters), written for screen readers
- Detailed description: objects, colors, setting, and mood, useful for search and content management
Model selection:
- gpt-4o-mini: fast and affordable, good for most captioning tasks
- gpt-4o: higher quality for complex images or detailed descriptions
See also