Create a retrieval-augmented generation system that answers questions
using your documents as context.
Problem
You want an LLM to answer questions using your specific documents—not
just its training data. You need to retrieve relevant context and
include it in the prompt.
Solution
What’s in this recipe:
- Embed and index documents for retrieval
- Create a query function that retrieves context
- Generate answers grounded in your documents
You build a pipeline that: (1) embeds documents, (2) finds relevant
chunks for a query, and (3) generates an answer using those chunks as
context.
Setup
%pip install -qU pixeltable openai
import os
import getpass
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
import pixeltable as pxt
from pixeltable.functions.openai import embeddings, chat_completions
# Create a fresh directory
pxt.drop_dir('rag_demo', force=True)
pxt.create_dir('rag_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘rag_demo’.
<pixeltable.catalog.dir.Dir at 0x17c878c10>
Step 1: create document store with embeddings
# Create table for document chunks
chunks = pxt.create_table(
    'rag_demo.chunks',
    {'doc_id': pxt.String, 'chunk_text': pxt.String}
)
Created table ‘chunks’.
# Add embedding index for semantic search
chunks.add_embedding_index(
    column="chunk_text",
    string_embed=embeddings.using(model="text-embedding-3-small")
)
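Under the hood, the index compares the query's embedding vector against each chunk's embedding, typically by cosine similarity, and ranks chunks by that score. A minimal plain-Python sketch of the scoring (toy 3-dimensional vectors stand in for real embeddings; `text-embedding-3-small` actually produces 1536-dimensional vectors):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for a query embedding and two chunk embeddings
query_vec = [0.1, 0.9, 0.2]
chunk_vecs = {
    'password-reset': [0.1, 0.8, 0.3],
    'billing': [0.9, 0.1, 0.0],
}

# Rank chunks by similarity to the query, as the embedding index does
ranked = sorted(
    chunk_vecs,
    key=lambda k: cosine_similarity(query_vec, chunk_vecs[k]),
    reverse=True,
)
print(ranked[0])  # -> password-reset
```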
Step 2: load documents
# Sample knowledge base (in production, load from files/database)
documents = [
    {
        'doc_id': 'password-reset',
        'chunk_text': 'To reset your password, go to the login page and click "Forgot Password". Enter your email address and you will receive a reset link within 5 minutes. The link expires after 24 hours.'
    },
    {
        'doc_id': 'password-reset',
        'chunk_text': 'Password requirements: minimum 8 characters, at least one uppercase letter, one number, and one special character. Passwords expire every 90 days for security.'
    },
    {
        'doc_id': 'account-settings',
        'chunk_text': 'To update your profile, navigate to Settings > Account. You can change your display name, email address, and notification preferences. Changes take effect immediately.'
    },
    {
        'doc_id': 'billing',
        'chunk_text': 'Billing occurs on the first of each month. You can view invoices under Settings > Billing. To change your payment method, click "Update Payment" and enter your new card details.'
    },
    {
        'doc_id': 'api-access',
        'chunk_text': 'API keys can be generated in Settings > Developer. Each key has configurable permissions. Rate limits are 1000 requests per minute for standard plans, 10000 for enterprise.'
    },
]
chunks.insert(documents)
Inserting rows into `chunks`: 5 rows [00:00, 345.31 rows/s]
Inserted 5 rows with 0 errors.
5 rows inserted, 15 values computed.
Step 3: create the RAG query function
# Define a query function that retrieves context
@pxt.query
def retrieve_context(query: str, top_k: int = 3):
    """Retrieve the most relevant chunks for a query."""
    sim = chunks.chunk_text.similarity(query)
    return (
        chunks.where(sim > 0.5)
        .order_by(sim, asc=False)
        .limit(top_k)
        .select(
            doc_id=chunks.doc_id,
            text=chunks.chunk_text
        )
    )
# View retrieved context for a query
query = "What are the key features?"
context_chunks = retrieve_context(query)
context_chunks
retrieve_context(‘What are the key features?’)
Step 4: generate answers with context
# Create a table for questions/answers
qa = pxt.create_table(
    'rag_demo.qa',
    {'question': pxt.String}
)
Created table ‘qa’.
# Add retrieval step
qa.add_computed_column(context=retrieve_context(qa.question, top_k=3))
Added 0 column values with 0 errors.
No rows affected.
# Build the RAG prompt
@pxt.udf
def build_rag_prompt(question: str, context: list[dict]) -> str:
    context_text = '\n\n'.join([f"[{c['doc_id']}]: {c['text']}" for c in context])
    return f"""Answer the question based only on the provided context. If the context doesn't contain the answer, say "I don't have information about that."

Context:
{context_text}

Question: {question}

Answer:"""
qa.add_computed_column(prompt=build_rag_prompt(qa.question, qa.context))
Added 0 column values with 0 errors.
No rows affected.
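Because `build_rag_prompt` is ordinary Python under the `@pxt.udf` decorator, you can sanity-check the prompt format outside Pixeltable before wiring it into the table. A quick standalone check (the context dict mirrors the `doc_id`/`text` fields the query function selects; the sample text is taken from the billing chunk above):

```python
def build_rag_prompt(question: str, context: list[dict]) -> str:
    # Same logic as the UDF above, without the @pxt.udf decorator
    context_text = '\n\n'.join([f"[{c['doc_id']}]: {c['text']}" for c in context])
    return f"""Answer the question based only on the provided context. If the context doesn't contain the answer, say "I don't have information about that."

Context:
{context_text}

Question: {question}

Answer:"""

context = [{'doc_id': 'billing', 'text': 'Billing occurs on the first of each month.'}]
prompt = build_rag_prompt('When am I billed?', context)
print(prompt)
```

The `[doc_id]` tags in the context let the model (and you, when debugging) see which document each passage came from.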
# Generate answer
qa.add_computed_column(
    response=chat_completions(
        messages=[{'role': 'user', 'content': qa.prompt}],
        model='gpt-4o-mini'
    )
)
qa.add_computed_column(answer=qa.response.choices[0].message.content)
Added 0 column values with 0 errors.
Added 0 column values with 0 errors.
No rows affected.
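The expression `qa.response.choices[0].message.content` walks the JSON payload stored in the `response` column. The same path on a plain dict looks like this (a trimmed stand-in payload, not a real API response):

```python
# A trimmed stand-in for a chat-completions response payload
response = {
    'choices': [
        {'message': {'role': 'assistant', 'content': 'Billing occurs on the first of each month.'}}
    ]
}

# Same path as the computed column: response.choices[0].message.content
answer = response['choices'][0]['message']['content']
print(answer)
```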
Ask questions
# Insert questions
questions = [
    {'question': 'How do I reset my password?'},
    {'question': 'What are the API rate limits?'},
    {'question': 'When am I billed?'},
]
qa.insert(questions)
Inserting rows into `qa`: 3 rows [00:00, 872.12 rows/s]
Inserted 3 rows with 0 errors.
3 rows inserted, 18 values computed.
# View answers
qa.select(qa.question, qa.answer).collect()
Explanation
RAG pipeline flow:
Question → Embed → Retrieve similar chunks → Build prompt with context → Generate answer
Key components: an embedding index over the chunks, a `@pxt.query` function that retrieves the most similar chunks, and computed columns that chain retrieval, prompt construction, and generation.
Scaling tips:
- Use the doc-chunk-for-rag recipe to split long documents
- Adjust top_k to balance context size vs. relevance
- Consider metadata filtering for large knowledge bases
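On the first tip: a chunker only needs to produce `doc_id`/`chunk_text` rows shaped like the ones inserted above. An illustrative word-window splitter with overlap (the window and overlap sizes here are arbitrary; the doc-chunk-for-rag recipe covers more robust strategies):

```python
def chunk_document(doc_id: str, text: str, chunk_size: int = 50, overlap: int = 10) -> list[dict]:
    """Split text into overlapping word windows, shaped for chunks.insert()."""
    words = text.split()
    step = chunk_size - overlap
    return [
        {'doc_id': doc_id, 'chunk_text': ' '.join(words[i:i + chunk_size])}
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

# A 120-word document yields three 50-word windows with 10 words of overlap
rows = chunk_document('handbook', 'word ' * 120, chunk_size=50, overlap=10)
print(len(rows))  # -> 3
```

Overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk.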
See also