Build visual similarity search to find images that look alike using
OpenAI’s CLIP model.
Problem
You have a collection of images and need to find visually similar
ones—for duplicate detection, content recommendations, or visual search.
Solution
What’s in this recipe:
- Create image embeddings with CLIP
- Search by image similarity
- Search by text description (cross-modal)
You add an embedding index using CLIP, which understands both images and
text. This enables finding similar images or searching images by text
description.
Setup
%pip install -qU pixeltable sentence-transformers torch
import pixeltable as pxt
from pixeltable.functions.huggingface import clip
Load images
# Create a fresh directory
pxt.drop_dir('image_search_demo', force=True)
pxt.create_dir('image_search_demo')
Created directory ‘image_search_demo’.
images = pxt.create_table('image_search_demo.images', {'image': pxt.Image})
Created table ‘images’.
# Insert sample images
images.insert([
{'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg'},
{'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg'},
{'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000106.jpg'},
{'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000139.jpg'},
])
Inserted 4 rows with 0 errors.
4 rows inserted, 8 values computed.
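To verify the load, retrieve the rows; in a notebook, `collect()` displays the images inline.
# Preview the inserted images
images.collect()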
Create CLIP embedding index
Add an embedding index using CLIP for cross-modal search:
# Add CLIP embedding index (supports both image and text queries)
images.add_embedding_index(
'image',
embedding=clip.using(model_id='openai/clip-vit-base-patch32')
)
Search by text description
Find images matching a text query:
# Search by text description
query = "people eating food"
sim = images.image.similarity(query)
results = (
images
.order_by(sim, asc=False)
.select(images.image, score=sim)
.limit(2)
)
results.collect()
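Search by image similarity
The same index supports image-to-image queries. This is a minimal sketch that assumes `similarity()` also accepts a PIL image as the query when the index was built with CLIP; here the query is simply the first image already in the table, so it will come back as its own top match.
# Use an existing image as the query (image-to-image search)
query_image = images.select(images.image).limit(1).collect()[0]['image']

sim = images.image.similarity(query_image)
results = (
    images
    .order_by(sim, asc=False)
    .select(images.image, score=sim)
    .limit(2)
)
results.collect()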
Explanation
Why CLIP:
CLIP (Contrastive Language-Image Pre-training) understands both images
and text in the same embedding space. This enables:
- Image-to-image search (find similar photos)
- Text-to-image search (find photos matching a description)
Index parameters:
A single `embedding` function (here `clip.using(model_id='openai/clip-vit-base-patch32')`) encodes both the stored images and incoming text queries; both sides must use the same model for cross-modal search to work.
New images are indexed automatically:
When you insert new images, embeddings are generated without extra code.
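For example, inserting one more row is all that's needed; its embedding is computed during the insert and the image becomes searchable right away. The URL below is a placeholder, swap in any image you like.
# Insert another image; its CLIP embedding is generated automatically
images.insert([
    {'image': 'https://example.com/new_photo.jpg'},  # placeholder URL
])

# The new image is immediately included in similarity searches
sim = images.image.similarity('people eating food')
images.order_by(sim, asc=False).select(images.image, score=sim).limit(3).collect()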
See also