| Use case | Input | Output |
|---|---|---|
| Document search | Articles | Find related articles |
| Product matching | Descriptions | Find similar products |
| FAQ retrieval | Questions | Match to answers |

| title | content | sim |
|---|---|---|
| Machine Learning | Machine learning is a subset of AI that enables systems to learn from data. | 0.415 |
| Data Science | Data science combines statistics, programming, and domain expertise to extract insights from data. | 0.256 |
| Web Development | Web development involves building websites and web applications using HTML, CSS, and JavaScript. | 0.205 |

| Model | Dimensions | Use case |
|---|---|---|
| `text-embedding-3-small` | 1536 | Cost-effective, good quality |
| `text-embedding-3-large` | 3072 | Higher accuracy |
| `text-embedding-ada-002` | 1536 | Legacy model |
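
One practical consequence of the dimension counts in the table above is storage size. The sketch below estimates raw vector storage per model; it assumes each component is stored as a 4-byte float32, which is an assumption for illustration and not something this recipe states about Pixeltable's storage format. The helper name `raw_vector_bytes` is hypothetical.

```python
# Dimensions as listed in the model table.
MODEL_DIMENSIONS = {
    'text-embedding-3-small': 1536,
    'text-embedding-3-large': 3072,
    'text-embedding-ada-002': 1536,
}

# Assumption: each vector component is stored as a 4-byte float32.
BYTES_PER_COMPONENT = 4


def raw_vector_bytes(model: str, num_rows: int) -> int:
    """Estimate raw vector storage for num_rows embeddings of the given model."""
    return MODEL_DIMENSIONS[model] * BYTES_PER_COMPONENT * num_rows


# Five rows, matching the number of sample documents in this recipe.
small_bytes = raw_vector_bytes('text-embedding-3-small', 5)
large_bytes = raw_vector_bytes('text-embedding-3-large', 5)
print(small_bytes, large_bytes)
```

Under that assumption, switching from the small to the large model doubles the raw vector size, which is worth weighing against the accuracy gain for large tables.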

| Metric | Best for |
|---|---|
| `cosine` | Text similarity (default) |
| `ip` | Inner product |
| `l2` | Euclidean distance |
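
To make the three metric names concrete, here is a minimal plain-Python sketch of the textbook definitions on toy two-dimensional vectors. It only illustrates the math; how Pixeltable converts each metric into the score it returns is not covered here, and the vectors are made up for the example.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products (the 'ip' metric)."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance between two vectors (the 'l2' metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Inner product divided by the product of the vector lengths."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


query = [1.0, 0.0]
doc_a = [2.0, 0.0]  # same direction as the query, but twice as long
doc_b = [0.0, 1.0]  # orthogonal to the query

cos_a, ip_a, l2_a = cosine_similarity(query, doc_a), inner_product(query, doc_a), l2_distance(query, doc_a)
cos_b, ip_b, l2_b = cosine_similarity(query, doc_b), inner_product(query, doc_b), l2_distance(query, doc_b)
print(cos_a, ip_a, l2_a)
print(cos_b, ip_b, l2_b)
```

Cosine similarity ignores vector length, so `doc_a` scores a perfect match with the query even though it is longer. Inner product and Euclidean distance are both sensitive to length, which is one reason cosine is a common default for text.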

## Solution

**What’s in this recipe:**

* Generate embeddings with OpenAI’s models
* Store embeddings as computed columns
* Use embeddings for similarity queries

You add an embedding column that automatically generates vectors for new rows. The embeddings are cached and only recomputed when the source text changes.

### Setup

```python
%pip install -qU pixeltable openai
```

```python
import getpass
import os

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```

```python
import pixeltable as pxt
from pixeltable.functions.openai import embeddings

# Create a fresh directory
pxt.drop_dir('embed_demo', force=True)
pxt.create_dir('embed_demo')
```

  Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
  Created directory 'embed\_demo'.

### Create table with embedding column

```python
# Create table for documents
docs = pxt.create_table(
    'embed_demo/documents', {'title': pxt.String, 'content': pxt.String}
)
```

  Created table 'documents'.

```python
# Add embedding column using OpenAI's text-embedding-3-small
docs.add_computed_column(
    embedding=embeddings(docs.content, model='text-embedding-3-small')
)
```

  Added 0 column values with 0 errors.
  No rows affected.

### Insert documents

```python
# Insert sample documents
sample_docs = [
    {
        'title': 'Python Basics',
        'content': 'Python is a high-level programming language known for its clear syntax and readability.',
    },
    {
        'title': 'Machine Learning',
        'content': 'Machine learning is a subset of AI that enables systems to learn from data.',
    },
    {
        'title': 'Web Development',
        'content': 'Web development involves building websites and web applications using HTML, CSS, and JavaScript.',
    },
    {
        'title': 'Data Science',
        'content': 'Data science combines statistics, programming, and domain expertise to extract insights from data.',
    },
    {
        'title': 'Cloud Computing',
        'content': 'Cloud computing provides on-demand computing resources over the internet.',
    },
]
docs.insert(sample_docs)
```

  Inserting rows into \`documents\`: 5 rows \[00:00, 553.22 rows/s]
  Inserted 5 rows with 0 errors.
  5 rows inserted, 15 values computed.
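
The behavior described in this recipe (embeddings generated on insert and recomputed only when the source text changes) can be pictured with a small plain-Python model. This is a conceptual sketch only: `TinyTable` and `fake_embed` are hypothetical stand-ins invented for illustration, not Pixeltable classes and not an OpenAI call.

```python
def fake_embed(text):
    """Hypothetical stand-in for an embedding model: returns a tiny 'vector'."""
    return [float(len(text)), float(text.count(' '))]


class TinyTable:
    """Toy table that stores an embedding next to each row's content."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.rows = []
        self.embed_calls = 0  # counts how often the embedding function runs

    def _embed(self, content):
        self.embed_calls += 1
        return self.embed_fn(content)

    def insert(self, content):
        # The embedding is computed once, at insert time, and stored.
        self.rows.append({'content': content, 'embedding': self._embed(content)})

    def update(self, index, content):
        # Recompute only if the source text actually changed.
        row = self.rows[index]
        if row['content'] != content:
            row['content'] = content
            row['embedding'] = self._embed(content)

    def select_embeddings(self):
        # Reads return stored values and never call the embedding function.
        return [row['embedding'] for row in self.rows]


table = TinyTable(fake_embed)
table.insert('Python is a high-level programming language.')
table.insert('Cloud computing provides on-demand computing resources.')
table.select_embeddings()
table.update(0, 'Python is a high-level programming language.')  # unchanged text
calls_before_edit = table.embed_calls
table.update(1, 'Cloud computing provides on-demand resources.')  # changed text
calls_after_edit = table.embed_calls
print(calls_before_edit, calls_after_edit)
```

The counter only advances on inserts and on updates that change the text, which is the property that keeps repeated queries from triggering new API calls.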

```python
# Select titles together with their full embedding vectors
result = docs.select(docs.title, docs.embedding).collect()
```

### Query by similarity

Find documents similar to a query by creating an embedding index:

```python
# Add embedding index for semantic search
docs.add_embedding_index(
    column='content',
    string_embed=embeddings.using(model='text-embedding-3-small'),
)
```

```python
# Search for similar documents
sim = docs.content.similarity(
    string='artificial intelligence applications'
)

results = (
    docs.where(sim > 0.2)
    .order_by(sim, asc=False)
    .limit(3)
    .select(docs.title, docs.content, sim=sim)
)
results.collect()
```
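
The `where` / `order_by` / `limit` chain in the query above amounts to three steps: drop rows at or below the threshold, sort by score in descending order, and keep the first few. The plain-Python sketch below walks through those steps. The scores for Machine Learning, Data Science, and Web Development are the ones in this page's result table; the other two scores are hypothetical placeholders, since the real values for those documents are not shown. The helper `top_matches` is likewise hypothetical.

```python
# First three scores come from the result table on this page.
# The last two are HYPOTHETICAL placeholders; their real scores are not shown.
scores = {
    'Machine Learning': 0.415,
    'Data Science': 0.256,
    'Web Development': 0.205,
    'Python Basics': 0.150,    # hypothetical
    'Cloud Computing': 0.120,  # hypothetical
}


def top_matches(scores, threshold, k):
    """Keep scores above threshold, sort descending, return the first k."""
    kept = [(title, s) for title, s in scores.items() if s > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


top = top_matches(scores, threshold=0.2, k=3)
for title, score in top:
    print(f'{title}: {score}')
```

Raising the threshold trades recall for precision: with `threshold=0.3`, only the Machine Learning document would remain in this example.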

## Explanation

**OpenAI embedding models:**

**Similarity metrics:**

**Key benefits of computed embedding columns:**

* Embeddings are generated automatically on insert
* Results are cached, so subsequent queries do not recompute them
* Index enables fast similarity search at scale

## See also

* [Semantic text search](/howto/cookbooks/search/search-semantic-text) - Full semantic search patterns
* [Chunk documents for RAG](/howto/cookbooks/text/doc-chunk-for-rag) - Prepare documents for retrieval