Create text embeddings with OpenAI

Generate vector embeddings for text data to enable semantic search and similarity matching.

Problem

You need to convert text into vector embeddings for:

Semantic search (find similar documents)
RAG pipelines (retrieve relevant context)
Clustering and classification

Solution

What’s in this recipe:

Generate embeddings with OpenAI’s models
Store embeddings as computed columns
Use embeddings for similarity queries

You add an embedding column that automatically generates vectors for new rows. The embeddings are cached and only recomputed when the source text changes.
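
The caching behavior described above can be pictured with a small standalone sketch. It is not Pixeltable's implementation: `fake_embed` and `EmbeddingColumn` are hypothetical names invented for this illustration, and the "embedding" is just a few bytes of a hash so the example runs without an API key. It only shows the idea that a vector is recomputed when, and only when, its source text changes.

```python
import hashlib


def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding API call (not a real model)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Turn the first four hash bytes into floats in [0, 1].
    return [b / 255.0 for b in digest[:4]]


class EmbeddingColumn:
    """Toy model of a computed column: one cached vector per row."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.rows = {}   # row_id -> (source_text, vector)
        self.calls = 0   # how many times embed_fn actually ran

    def upsert(self, row_id, text):
        cached = self.rows.get(row_id)
        if cached is not None and cached[0] == text:
            return cached[1]  # source text unchanged: reuse the cached vector
        self.calls += 1
        vector = self.embed_fn(text)
        self.rows[row_id] = (text, vector)
        return vector

    def read(self, row_id):
        # Reading never triggers a new embedding call.
        return self.rows[row_id][1]


column = EmbeddingColumn(fake_embed)
column.upsert(1, "Python is a high-level programming language.")
column.read(1)
column.upsert(1, "Python is a high-level programming language.")  # unchanged
calls_after_unchanged = column.calls
column.upsert(1, "Python is a general-purpose language.")  # text changed
calls_after_change = column.calls
print(calls_after_unchanged, calls_after_change)  # prints: 1 2
```

In the recipe itself this bookkeeping is handled by the computed column, so none of this code is needed in practice.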

Setup

%pip install -qU pixeltable openai

import getpass
import os

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')

import pixeltable as pxt
from pixeltable.functions.openai import embeddings

# Create a fresh directory
pxt.drop_dir('embed_demo', force=True)
pxt.create_dir('embed_demo')

Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘embed_demo’.
<pixeltable.catalog.dir.Dir at 0x14ee4fcd0>

Create table with embedding column

# Create table for documents
docs = pxt.create_table(
    'embed_demo/documents', {'title': pxt.String, 'content': pxt.String}
)

Created table ‘documents’.

# Add embedding column using OpenAI's text-embedding-3-small
docs.add_computed_column(
    embedding=embeddings(docs.content, model='text-embedding-3-small')
)

Added 0 column values with 0 errors.
No rows affected.

Insert documents

# Insert sample documents
sample_docs = [
    {
        'title': 'Python Basics',
        'content': 'Python is a high-level programming language known for its clear syntax and readability.',
    },
    {
        'title': 'Machine Learning',
        'content': 'Machine learning is a subset of AI that enables systems to learn from data.',
    },
    {
        'title': 'Web Development',
        'content': 'Web development involves building websites and web applications using HTML, CSS, and JavaScript.',
    },
    {
        'title': 'Data Science',
        'content': 'Data science combines statistics, programming, and domain expertise to extract insights from data.',
    },
    {
        'title': 'Cloud Computing',
        'content': 'Cloud computing provides on-demand computing resources over the internet.',
    },
]

docs.insert(sample_docs)

Inserting rows into `documents`: 5 rows [00:00, 553.22 rows/s]
Inserted 5 rows with 0 errors.
5 rows inserted, 15 values computed.

# View documents alongside their full embedding vectors
result = docs.select(docs.title, docs.embedding).collect()

Query by similarity

Find documents similar to a query by creating an embedding index:

# Add embedding index for semantic search
docs.add_embedding_index(
    column='content',
    string_embed=embeddings.using(model='text-embedding-3-small'),
)

# Search for similar documents
sim = docs.content.similarity(
    string='artificial intelligence applications'
)
results = (
    docs.where(sim > 0.2)
    .order_by(sim, asc=False)
    .limit(3)
    .select(docs.title, docs.content, sim=sim)
)
results.collect()
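
To make the shape of that query concrete, here is a plain-Python sketch of the same three steps: keep scores above a threshold, sort in descending order, and take the top few. The scores are made-up numbers chosen for illustration, not output from the query above, and `top_matches` is a hypothetical helper rather than part of Pixeltable.

```python
# Made-up similarity scores keyed by document title (illustration only).
scores = {
    "Python Basics": 0.18,
    "Machine Learning": 0.55,
    "Web Development": 0.12,
    "Data Science": 0.41,
    "Cloud Computing": 0.27,
}


def top_matches(scores, threshold, k):
    """Mirror where(sim > threshold).order_by(sim, asc=False).limit(k)."""
    kept = [(title, s) for title, s in scores.items() if s > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return kept[:k]


matches = top_matches(scores, threshold=0.2, k=3)
for title, score in matches:
    print(f"{title}: {score:.2f}")
```

With these invented scores, two documents fall below the 0.2 threshold and the remaining three are returned best first. Raising the threshold trades recall for precision; the right value depends on the embedding model and the data.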

Explanation

OpenAI embedding models:

Similarity metrics:
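
The recipe above does not set a metric explicitly. As a general illustration (not a description of Pixeltable's internals), the sketch below computes three metrics commonly used with embedding vectors: cosine similarity, inner product, and L2 distance. The two-dimensional vectors are toy values; real embeddings have hundreds or thousands of dimensions.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Inner product of the vectors after scaling each to unit length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2_distance(a, b):
    """Euclidean distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # same direction as the query, larger magnitude
orthogonal = [0.0, 2.0]      # perpendicular to the query

# Cosine ignores magnitude, so vectors pointing the same way score 1.0.
print(cosine_similarity(query, same_direction))  # 1.0
print(cosine_similarity(query, orthogonal))      # 0.0

# Inner product and L2 distance are both sensitive to magnitude.
print(dot(query, same_direction))                # 3.0
print(l2_distance(query, same_direction))        # 2.0
```

When vectors are normalized to unit length, cosine similarity and inner product produce the same ranking, which is why the choice of metric often matters less than it first appears.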

Key benefits of computed embedding columns:

Embeddings are generated automatically on insert
Results are cached, so subsequent queries do not recompute them
The index enables fast similarity search at scale
