Open in Kaggle  Open in Colab  Download Notebook
This documentation page is also available as an interactive notebook. You can launch the notebook in Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the above links.
Automatically create descriptive captions for images using AI vision models.

Problem

You have a collection of images that need captions—for accessibility, SEO, content management, or searchability. Writing captions manually doesn’t scale.

Solution

What’s in this recipe:
  • Generate captions using OpenAI’s vision models
  • Customize caption style (short, detailed, SEO-focused)
  • Process images in batch automatically
You add a computed column that sends each image to a vision model with a captioning prompt. New images are captioned automatically on insert.

Setup

%pip install -qU pixeltable openai
import os
import getpass

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
import pixeltable as pxt
from pixeltable.functions.openai import vision

Load images

# Create a fresh directory
pxt.drop_dir('caption_demo', force=True)
pxt.create_dir('caption_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘caption_demo’.
<pixeltable.catalog.dir.Dir at 0x141561010>
# Create table for images
images = pxt.create_table('caption_demo.images', {'image': pxt.Image})
Created table ‘images’.
# Insert sample images
image_urls = [
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000106.jpg',
]

images.insert([{'image': url} for url in image_urls])
Inserting rows into `images`: 3 rows [00:00, 412.20 rows/s]
Inserted 3 rows with 0 errors.
3 rows inserted, 6 values computed.
# View images
images.collect()

Generate captions

Add a computed column that generates captions using the vision model:
# Add caption using OpenAI vision
images.add_computed_column(
    caption=vision(
        'Write a concise, descriptive caption for this image in one sentence.',
        images.image,
        model='gpt-4o-mini'
    )
)
Added 3 column values with 0 errors.
3 rows updated, 6 values computed.
# View images with captions
images.select(images.image, images.caption).collect()

Different caption styles

You can generate multiple caption styles for different uses:
# Add alt text for accessibility (brief)
images.add_computed_column(
    alt_text=vision(
        'Write a brief alt text for this image (under 125 characters) for screen readers.',
        images.image,
        model='gpt-4o-mini'
    )
)
Added 3 column values with 0 errors.
3 rows updated, 6 values computed.
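Models do not always honor a length limit stated in the prompt, so it can be worth checking the result. The following is a minimal sketch and not part of Pixeltable: `clamp_alt_text` and its `limit` parameter are hypothetical names chosen for illustration. It collapses whitespace, then trims over-long text at a word boundary and appends an ellipsis.

```python
def clamp_alt_text(text: str, limit: int = 125) -> str:
    """Return `text` trimmed to at most `limit` characters (hypothetical helper)."""
    ellipsis = '...'
    if limit < len(ellipsis):
        raise ValueError('limit is too small to hold the ellipsis')

    # Collapse newlines and repeated spaces that models sometimes emit.
    text = ' '.join(text.split())
    if len(text) <= limit:
        return text

    # Reserve room for the ellipsis, then back up to the last full word.
    cut = text[: limit - len(ellipsis)]
    if ' ' in cut:
        cut = cut[: cut.rfind(' ')]
    return cut.rstrip(' ,;:') + ellipsis
```

A function like this could be wrapped with `@pxt.udf` and applied to `images.alt_text` in a further computed column; that wiring is left out here because it is an assumption about how you would want to store the result.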
# Add detailed description
images.add_computed_column(
    description=vision(
        'Describe this image in detail, including objects, colors, setting, and mood.',
        images.image,
        model='gpt-4o-mini'
    )
)
Added 3 column values with 0 errors.
3 rows updated, 6 values computed.
# View all caption types
images.select(images.image, images.caption, images.alt_text, images.description).collect()

Explanation

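Caption prompt patterns used in this recipe:
  • Caption: 'Write a concise, descriptive caption for this image in one sentence.'
  • Alt text: 'Write a brief alt text for this image (under 125 characters) for screen readers.'
  • Detailed description: 'Describe this image in detail, including objects, colors, setting, and mood.'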
Model selection:
  • gpt-4o-mini: Fast and affordable, good for most captioning tasks
  • gpt-4o: Higher quality for complex images or detailed descriptions
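Prompt and model choices can be kept in one place and applied in a loop. This is a sketch under stated assumptions: `CAPTION_STYLES`, `add_caption_columns`, the `seo_caption` column, and the SEO prompt wording are hypothetical and do not come from Pixeltable. `table` is expected to behave like the `images` table in this recipe, and `vision_fn` like `pixeltable.functions.openai.vision`. Assigning `gpt-4o` to the detailed description is an illustrative choice; the recipe itself uses `gpt-4o-mini` throughout.

```python
# Hypothetical registry: column name -> (prompt, model).
CAPTION_STYLES = {
    'caption': (
        'Write a concise, descriptive caption for this image in one sentence.',
        'gpt-4o-mini',
    ),
    'alt_text': (
        'Write a brief alt text for this image (under 125 characters) for screen readers.',
        'gpt-4o-mini',
    ),
    'description': (
        'Describe this image in detail, including objects, colors, setting, and mood.',
        'gpt-4o',  # assumption: spend more on the detailed style
    ),
    'seo_caption': (
        # Assumed wording; the recipe names an SEO-focused style but gives no prompt.
        'Write a search-friendly caption for this image that names its main subject.',
        'gpt-4o-mini',
    ),
}


def add_caption_columns(table, vision_fn, styles=CAPTION_STYLES):
    """Add one computed column per style and return the column names added."""
    added = []
    for column_name, (prompt, model) in styles.items():
        # add_computed_column takes the new column as a keyword argument.
        table.add_computed_column(
            **{column_name: vision_fn(prompt, table.image, model=model)}
        )
        added.append(column_name)
    return added


# Usage (on a table that does not yet have these columns):
#   add_caption_columns(images, vision)
```

Passing `vision_fn` in as a parameter keeps the helper independent of the OpenAI import, so the same loop could drive a different vision function with the same call shape.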

See also