Build semantic search for text

This documentation page is also available as an interactive notebook. You can launch the notebook in Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the above links.

Create a searchable knowledge base that finds content by meaning, not just keywords.

Problem

You have a collection of text content (articles, notes, documentation) and need to find relevant items based on meaning. Keyword search fails when users phrase queries differently from the source text:

Solution

What’s in this recipe:

Create a text table with embeddings
Search by semantic similarity
Combine with metadata filters

You add an embedding index to your text column. Pixeltable automatically generates embeddings for each row and enables similarity search.

Setup

%pip install -qU pixeltable sentence-transformers

import pixeltable as pxt
from pixeltable.functions.huggingface import sentence_transformer

Create knowledge base

# Create a fresh directory
pxt.drop_dir('search_demo', force=True)
pxt.create_dir('search_demo')

Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘search_demo’.
<pixeltable.catalog.dir.Dir at 0x14208ca10>

# Create table with content and metadata
kb = pxt.create_table(
    'search_demo/articles',
    {'title': pxt.String, 'content': pxt.String, 'category': pxt.String},
)

Created table ‘articles’.

# Insert sample content
kb.insert(
    [
        {
            'title': 'Debugging best practices',
            'content': 'Use logging, breakpoints, and unit tests to identify and fix issues in your code.',
            'category': 'engineering',
        },
        {
            'title': 'Machine learning model optimization',
            'content': 'Improve training efficiency with batch normalization, learning rate schedules, and early stopping.',
            'category': 'ml',
        },
        {
            'title': 'Production infrastructure setup',
            'content': 'Deploy applications using containers, load balancers, and automated scaling.',
            'category': 'devops',
        },
        {
            'title': 'API design principles',
            'content': 'Create RESTful endpoints with proper versioning, authentication, and error handling.',
            'category': 'engineering',
        },
    ]
)

Inserting rows into `articles`: 4 rows [00:00, 577.69 rows/s]
Inserted 4 rows with 0 errors.
4 rows inserted, 12 values computed.

Add semantic search

Create an embedding index on the content column:

# Add embedding index
kb.add_embedding_index(
    column='content',
    string_embed=sentence_transformer.using(model_id='all-MiniLM-L6-v2'),
)

Search by meaning

Find content semantically similar to your query:

# Search by meaning
query = 'how to fix bugs'
sim = kb.content.similarity(string=query)

results = (
    kb.order_by(sim, asc=False)
    .select(kb.title, kb.content, score=sim)
    .limit(2)
)
results.collect()

Filter by metadata

Combine semantic search with metadata filters:

# Search within a specific category
query = 'best practices'
sim = kb.content.similarity(string=query)

results = (
    kb.where(kb.category == 'engineering')  # Filter first
    .order_by(sim, asc=False)
    .select(kb.title, kb.category, score=sim)
    .limit(2)
)
results.collect()

Explanation

How similarity search works:

Your query is converted to an embedding vector
Pixeltable finds the most similar vectors in the index
Results are ranked by cosine similarity (0 to 1)

Embedding models:

New content is indexed automatically: When you insert new rows, embeddings are generated without extra code.

Welcome to Pixeltable

Core Concepts

How-To

Problem

Solution

Setup

Create knowledge base

Add semantic search

Search by meaning

Filter by metadata

Explanation

See also

Welcome to Pixeltable

Core Concepts

How-To

​Problem

​Solution

​Setup

​Create knowledge base

​Add semantic search

​Search by meaning

​Filter by metadata

​Explanation

​See also

Problem

Solution

Setup

Create knowledge base

Add semantic search

Search by meaning

Filter by metadata

Explanation

See also