Build visual similarity search to find images that look alike using
OpenAI’s CLIP model.
Problem
You have a collection of images and need to find visually similar
ones—for duplicate detection, content recommendations, or visual search.
Solution
What’s in this recipe:
- Create image embeddings with CLIP
- Search by image similarity
- Search by text description (cross-modal)
You add an embedding index using CLIP, which understands both images and
text. This enables finding similar images or searching images by text
description.
Setup
%pip install -qU pixeltable sentence-transformers torch
import pixeltable as pxt
from pixeltable.functions.huggingface import clip
Load images
# Create a fresh directory
pxt.drop_dir('image_search_demo', force=True)
pxt.create_dir('image_search_demo')
Created directory ‘image_search_demo’.
images = pxt.create_table('image_search_demo.images', {'image': pxt.Image})
Created table ‘images’.
# Insert sample images
images.insert([
{'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg'},
{'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg'},
{'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000106.jpg'},
{'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000139.jpg'},
])
Inserted 4 rows with 0 errors.
4 rows inserted, 8 values computed.
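To verify the load, retrieve the rows; in a notebook, `collect()` displays the images inline.
# Preview the inserted images
images.collect()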
Create CLIP embedding index
Add an embedding index using CLIP for cross-modal search:
# Add CLIP embedding index (supports both image and text queries)
images.add_embedding_index(
'image',
embedding=clip.using(model_id='openai/clip-vit-base-patch32')
)
Search by text description
Find images matching a text query:
# Search by text description
query = "people eating food"
sim = images.image.similarity(query)
results = (
images
.order_by(sim, asc=False)
.select(images.image, score=sim)
.limit(2)
)
results.collect()
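Search by image similarity
The same index supports image-to-image queries. This is a minimal sketch that assumes `similarity()` also accepts a PIL image as the query when the index was built with CLIP; here the query is simply the first image already in the table, so it will come back as its own top match.
# Use an existing image as the query (image-to-image search)
query_image = images.select(images.image).limit(1).collect()[0]['image']

sim = images.image.similarity(query_image)
results = (
    images
    .order_by(sim, asc=False)
    .select(images.image, score=sim)
    .limit(2)
)
results.collect()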
Explanation
Why CLIP:
CLIP (Contrastive Language-Image Pre-training) understands both images
and text in the same embedding space. This enables:
- Image-to-image search (find similar photos)
- Text-to-image search (find photos matching a description)
Index parameters:
A single `embedding` function (here `clip.using(model_id='openai/clip-vit-base-patch32')`) encodes both the stored images and incoming text queries; both sides must use the same model for cross-modal search to work.
New images are indexed automatically:
When you insert new images, embeddings are generated without extra code.
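For example, inserting one more row is all that's needed; its embedding is computed during the insert and the image becomes searchable right away. The URL below is a placeholder, swap in any image you like.
# Insert another image; its CLIP embedding is generated automatically
images.insert([
    {'image': 'https://example.com/new_photo.jpg'},  # placeholder URL
])

# The new image is immediately included in similarity searches
sim = images.image.similarity('people eating food')
images.order_by(sim, asc=False).select(images.image, score=sim).limit(3).collect()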
See also