Create a retrieval-augmented generation system that answers questions
using your documents as context.
Problem
You want an LLM to answer questions using your specific documents—not
just its training data. You need to retrieve relevant context and
include it in the prompt.
Solution
What’s in this recipe:
- Embed and index documents for retrieval
- Create a query function that retrieves context
- Generate answers grounded in your documents
You build a pipeline that: (1) embeds documents, (2) finds relevant
chunks for a query, and (3) generates an answer using those chunks as
context.
Setup
%pip install -qU pixeltable openai
import os
import getpass
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
import pixeltable as pxt
from pixeltable.functions.openai import embeddings, chat_completions
# Create a fresh directory
pxt.drop_dir('rag_demo', force=True)
pxt.create_dir('rag_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘rag_demo’.
<pixeltable.catalog.dir.Dir at 0x17c878c10>
Step 1: create document store with embeddings
# Create table for document chunks
chunks = pxt.create_table(
    'rag_demo.chunks',
    {'doc_id': pxt.String, 'chunk_text': pxt.String}
)
Created table ‘chunks’.
# Add embedding index for semantic search
chunks.add_embedding_index(
    column="chunk_text",
    string_embed=embeddings.using(model="text-embedding-3-small")
)
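Under the hood, the index compares the query's embedding vector against each chunk's embedding, typically by cosine similarity, and ranks chunks by that score. A minimal plain-Python sketch of the scoring (toy 3-dimensional vectors stand in for real embeddings; `text-embedding-3-small` actually produces 1536-dimensional vectors):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for a query embedding and two chunk embeddings
query_vec = [0.1, 0.9, 0.2]
chunk_vecs = {
    'password-reset': [0.1, 0.8, 0.3],
    'billing': [0.9, 0.1, 0.0],
}

# Rank chunks by similarity to the query, as the embedding index does
ranked = sorted(
    chunk_vecs,
    key=lambda k: cosine_similarity(query_vec, chunk_vecs[k]),
    reverse=True,
)
print(ranked[0])  # -> password-reset
```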
Step 2: load documents
# Sample knowledge base (in production, load from files/database)
documents = [
    {
        'doc_id': 'password-reset',
        'chunk_text': 'To reset your password, go to the login page and click "Forgot Password". Enter your email address and you will receive a reset link within 5 minutes. The link expires after 24 hours.'
    },
    {
        'doc_id': 'password-reset',
        'chunk_text': 'Password requirements: minimum 8 characters, at least one uppercase letter, one number, and one special character. Passwords expire every 90 days for security.'
    },
    {
        'doc_id': 'account-settings',
        'chunk_text': 'To update your profile, navigate to Settings > Account. You can change your display name, email address, and notification preferences. Changes take effect immediately.'
    },
    {
        'doc_id': 'billing',
        'chunk_text': 'Billing occurs on the first of each month. You can view invoices under Settings > Billing. To change your payment method, click "Update Payment" and enter your new card details.'
    },
    {
        'doc_id': 'api-access',
        'chunk_text': 'API keys can be generated in Settings > Developer. Each key has configurable permissions. Rate limits are 1000 requests per minute for standard plans, 10000 for enterprise.'
    },
]
chunks.insert(documents)
Inserting rows into `chunks`: 5 rows [00:00, 345.31 rows/s]
Inserted 5 rows with 0 errors.
5 rows inserted, 15 values computed.
Step 3: create the RAG query function
# Define a query function that retrieves context
@pxt.query
def retrieve_context(query: str, top_k: int = 3):
    """Retrieve the most relevant chunks for a query."""
    sim = chunks.chunk_text.similarity(query)
    return (
        chunks.where(sim > 0.5)
        .order_by(sim, asc=False)
        .limit(top_k)
        .select(
            doc_id=chunks.doc_id,
            text=chunks.chunk_text
        )
    )
# View retrieved context for a query
query = "What are the key features?"
context_chunks = retrieve_context(query)
context_chunks
retrieve_context(‘What are the key features?’)
Step 4: generate answers with context
# Create a table for questions/answers
qa = pxt.create_table(
    'rag_demo.qa',
    {'question': pxt.String}
)
Created table ‘qa’.
# Add retrieval step
qa.add_computed_column(context=retrieve_context(qa.question, top_k=3))
Added 0 column values with 0 errors.
No rows affected.
# Build the RAG prompt
@pxt.udf
def build_rag_prompt(question: str, context: list[dict]) -> str:
    context_text = '\n\n'.join([f"[{c['doc_id']}]: {c['text']}" for c in context])
    return f"""Answer the question based only on the provided context. If the context doesn't contain the answer, say "I don't have information about that."

Context:
{context_text}

Question: {question}

Answer:"""
qa.add_computed_column(prompt=build_rag_prompt(qa.question, qa.context))
Added 0 column values with 0 errors.
No rows affected.
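Because `build_rag_prompt` is ordinary Python under the `@pxt.udf` decorator, you can sanity-check the prompt format outside Pixeltable before wiring it into the table. A quick standalone check (the context dict mirrors the `doc_id`/`text` fields the query function selects; the sample text is taken from the billing chunk above):

```python
def build_rag_prompt(question: str, context: list[dict]) -> str:
    # Same logic as the UDF above, without the @pxt.udf decorator
    context_text = '\n\n'.join([f"[{c['doc_id']}]: {c['text']}" for c in context])
    return f"""Answer the question based only on the provided context. If the context doesn't contain the answer, say "I don't have information about that."

Context:
{context_text}

Question: {question}

Answer:"""

context = [{'doc_id': 'billing', 'text': 'Billing occurs on the first of each month.'}]
prompt = build_rag_prompt('When am I billed?', context)
print(prompt)
```

The `[doc_id]` tags in the context let the model (and you, when debugging) see which document each passage came from.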
# Generate answer
qa.add_computed_column(
    response=chat_completions(
        messages=[{'role': 'user', 'content': qa.prompt}],
        model='gpt-4o-mini'
    )
)
qa.add_computed_column(answer=qa.response.choices[0].message.content)
Added 0 column values with 0 errors.
Added 0 column values with 0 errors.
No rows affected.
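The expression `qa.response.choices[0].message.content` walks the JSON payload stored in the `response` column. The same path on a plain dict looks like this (a trimmed stand-in payload, not a real API response):

```python
# A trimmed stand-in for a chat-completions response payload
response = {
    'choices': [
        {'message': {'role': 'assistant', 'content': 'Billing occurs on the first of each month.'}}
    ]
}

# Same path as the computed column: response.choices[0].message.content
answer = response['choices'][0]['message']['content']
print(answer)
```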
Ask questions
# Insert questions
questions = [
    {'question': 'How do I reset my password?'},
    {'question': 'What are the API rate limits?'},
    {'question': 'When am I billed?'},
]
qa.insert(questions)
Inserting rows into `qa`: 3 rows [00:00, 872.12 rows/s]
Inserted 3 rows with 0 errors.
3 rows inserted, 18 values computed.
# View answers
qa.select(qa.question, qa.answer).collect()
Explanation
RAG pipeline flow:
Question → Embed → Retrieve similar chunks → Build prompt with context → Generate answer
Key components: an embedding index over the chunks, a `@pxt.query` function that retrieves the most similar chunks, and computed columns that chain retrieval, prompt construction, and generation.
Scaling tips:
- Use the doc-chunk-for-rag recipe to split long documents
- Adjust top_k to balance context size vs. relevance
- Consider metadata filtering for large knowledge bases
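On the first tip: a chunker only needs to produce `doc_id`/`chunk_text` rows shaped like the ones inserted above. An illustrative word-window splitter with overlap (the window and overlap sizes here are arbitrary; the doc-chunk-for-rag recipe covers more robust strategies):

```python
def chunk_document(doc_id: str, text: str, chunk_size: int = 50, overlap: int = 10) -> list[dict]:
    """Split text into overlapping word windows, shaped for chunks.insert()."""
    words = text.split()
    step = chunk_size - overlap
    return [
        {'doc_id': doc_id, 'chunk_text': ' '.join(words[i:i + chunk_size])}
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

# A 120-word document yields three 50-word windows with 10 words of overlap
rows = chunk_document('handbook', 'word ' * 120, chunk_size=50, overlap=10)
print(len(rows))  # -> 3
```

Overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk.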
See also