| Query | Keyword match | Semantic match |
| --- | --- | --- |
| “how to fix bugs” | ❌ No results | ✓ “Debugging best practices” |
| “ML training” | ❌ No results | ✓ “Machine learning model optimization” |
| “deploy to cloud” | ❌ No results | ✓ “Production infrastructure setup” |
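
The "No results" column can be reproduced with a naive keyword matcher. The following is a minimal sketch, assuming "keyword match" means that every word of the query must appear in the article's title or content; the table does not state the exact matching rule, and the `tokenize` and `keyword_search` helpers are hypothetical names used only for illustration. The article text is copied from the sample rows used in this recipe.

```python
import re

# Titles and content copied from the sample articles in this recipe.
ARTICLES = [
    ("Debugging best practices",
     "Use logging, breakpoints, and unit tests to identify and fix issues in your code."),
    ("Machine learning model optimization",
     "Improve training efficiency with batch normalization, learning rate schedules, and early stopping."),
    ("Production infrastructure setup",
     "Deploy applications using containers, load balancers, and automated scaling."),
    ("API design principles",
     "Create RESTful endpoints with proper versioning, authentication, and error handling."),
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query):
    """Return titles of articles that contain every word of the query (assumed rule)."""
    terms = tokenize(query)
    return [
        title
        for title, content in ARTICLES
        if terms <= tokenize(title + " " + content)
    ]


for q in ("how to fix bugs", "ML training", "deploy to cloud"):
    print(q, "->", keyword_search(q))
```

None of the three queries shares all of its words with any article, so exact-word matching finds nothing even though a relevant article exists for each one.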

| title | content | score |
| --- | --- | --- |
| Debugging best practices | Use logging, breakpoints, and unit tests to identify and fix issues in your code. | 0.391 |
| API design principles | Create RESTful endpoints with proper versioning, authentication, and error handling. | 0.186 |

| title | category | score |
| --- | --- | --- |
| API design principles | engineering | 0.238 |
| Debugging best practices | engineering | 0.157 |

| Model | Speed | Quality | Use case |
| --- | --- | --- | --- |
| `all-MiniLM-L6-v2` | Fast | Good | General text |
| `all-mpnet-base-v2` | Medium | Better | Higher accuracy |
| OpenAI `text-embedding-3-small` | API | Best | Production apps |

## Solution

**What’s in this recipe:**

* Create a text table with embeddings
* Search by semantic similarity
* Combine with metadata filters

You add an embedding index to your text column. Pixeltable automatically generates embeddings for each row and enables similarity search.

### Setup

```python
%pip install -qU pixeltable sentence-transformers
```

### Create knowledge base

```python
import pixeltable as pxt
from pixeltable.functions.huggingface import sentence_transformer

# Create a fresh directory
pxt.drop_dir('search_demo', force=True)
pxt.create_dir('search_demo')
```

  Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
  Created directory 'search\_demo'.

```python
# Create table with content and metadata
kb = pxt.create_table(
    'search_demo/articles',
    {'title': pxt.String, 'content': pxt.String, 'category': pxt.String},
)
```

  Created table 'articles'.

```python
# Insert sample content
kb.insert(
    [
        {
            'title': 'Debugging best practices',
            'content': 'Use logging, breakpoints, and unit tests to identify and fix issues in your code.',
            'category': 'engineering',
        },
        {
            'title': 'Machine learning model optimization',
            'content': 'Improve training efficiency with batch normalization, learning rate schedules, and early stopping.',
            'category': 'ml',
        },
        {
            'title': 'Production infrastructure setup',
            'content': 'Deploy applications using containers, load balancers, and automated scaling.',
            'category': 'devops',
        },
        {
            'title': 'API design principles',
            'content': 'Create RESTful endpoints with proper versioning, authentication, and error handling.',
            'category': 'engineering',
        },
    ]
)
```

  Inserting rows into \`articles\`: 4 rows \[00:00, 577.69 rows/s]
  Inserted 4 rows with 0 errors.
  4 rows inserted, 12 values computed.

### Add semantic search

Create an embedding index on the content column:

```python
# Add embedding index
kb.add_embedding_index(
    column='content',
    string_embed=sentence_transformer.using(model_id='all-MiniLM-L6-v2'),
)
```

### Search by meaning

Find content semantically similar to your query:

```python
# Search by meaning
query = 'how to fix bugs'
sim = kb.content.similarity(string=query)

results = (
    kb.order_by(sim, asc=False)
    .select(kb.title, kb.content, score=sim)
    .limit(2)
)
results.collect()
```
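
To build intuition for the `score` column, here is a minimal sketch of ranking by cosine similarity in plain Python. It is not Pixeltable's implementation: the 3-dimensional vectors are invented for illustration (real models such as `all-MiniLM-L6-v2` produce much longer vectors), and `cosine_similarity`, `top_k`, `toy_index`, and `toy_query` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only (not real embeddings).
toy_index = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.2],
    "doc_c": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]


def top_k(query_vec, index, k=2):
    """Score every vector against the query and keep the k best, highest first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


for name, score in top_k(toy_query, toy_index):
    print(f"{name}: {score:.3f}")
```

Sorting in descending order and truncating to `k` mirrors the `order_by(sim, asc=False)` and `limit(2)` calls in the query above; a real index avoids scoring every row, which this brute-force sketch does not attempt.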

### Filter by metadata

Combine semantic search with metadata filters:

```python
# Search within a specific category
query = 'best practices'
sim = kb.content.similarity(string=query)

results = (
    kb.where(kb.category == 'engineering')  # Filter first
    .order_by(sim, asc=False)
    .select(kb.title, kb.category, score=sim)
    .limit(2)
)
results.collect()
```
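
The effect of filtering before ranking can be sketched in plain Python. This is only an illustration of the query's semantics, not of how Pixeltable executes it: the rows, titles, and scores below are invented, and `filtered_top_k` is a hypothetical helper.

```python
# Hypothetical rows as (title, category, score); all values are invented.
rows = [
    ("Article A", "engineering", 0.24),
    ("Article B", "ml", 0.61),
    ("Article C", "engineering", 0.16),
    ("Article D", "devops", 0.33),
    ("Article E", "engineering", 0.05),
]


def filtered_top_k(rows, category, k=2):
    """Keep rows in `category`, then return the k highest-scoring ones."""
    matching = [row for row in rows if row[1] == category]
    return sorted(matching, key=lambda row: row[2], reverse=True)[:k]


for title, category, score in filtered_top_k(rows, "engineering"):
    print(title, category, score)
```

Because the filter runs before the limit, a higher-scoring row from another category (Article B in this toy data) cannot take one of the two result slots.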

## Explanation

**How similarity search works:**

1. Your query is converted to an embedding vector
2. Pixeltable finds the most similar vectors in the index
3. Results are ranked by cosine similarity, highest first (scores for text embeddings typically fall between 0 and 1)

**Embedding models:**

**New content is indexed automatically:**

When you insert new rows, embeddings are generated without extra code.

## See also

* [Vector database documentation](/platform/embedding-indexes)
* [Split documents for RAG](/howto/cookbooks/text/doc-chunk-for-rag)