Fetch the complete documentation index at https://docs.pixeltable.com/llms.txt to discover all available pages before exploring further.
If you're building a RAG application with Postgres for metadata, a vector database like Pinecone or Weaviate for embeddings, and LangChain for orchestration, this guide shows how Pixeltable unifies all three.
## Concept Mapping

| Your Database Stack | Pixeltable Equivalent |
|---|---|
| Postgres / MySQL for metadata | `pxt.create_table()` with typed columns |
| Pinecone / Weaviate / Chroma for embeddings | `add_embedding_index()` with built-in HNSW search |
| S3 for media files (referenced by URL) | Native `pxt.Image`, `pxt.Video`, `pxt.Document` types |
| ORM (SQLAlchemy, Prisma) | `.select()`, `.where()`, `.order_by()` |
| LangChain `DocumentLoader` | `insert()`, `import_csv()`, import from S3 |
| `RecursiveCharacterTextSplitter` | `document_splitter` iterator via `create_view()` |
| `retriever.get_relevant_documents()` | `.similarity()` + `.order_by()` |
| `create_retrieval_chain()` | Computed column with LLM call |
| Keeping Postgres and Pinecone in sync | Automatic; derived columns can't go stale |
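To make the first few rows concrete: a metadata table that would take a `CREATE TABLE` statement plus an ORM model collapses into one typed declaration. A minimal sketch (the `demo.files` name and columns are illustrative, not part of the pipeline below):

```python
import pixeltable as pxt

pxt.create_dir('demo')

# One typed declaration replaces CREATE TABLE + an ORM model
files = pxt.create_table('demo.files', {
    'video': pxt.Video,          # media column: stores a reference, e.g. an S3 URL
    'title': pxt.String,
    'uploaded_at': pxt.Timestamp,
})

# ORM-style query methods work directly on the table
files.order_by(files.uploaded_at).select(files.title).collect()
```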
## Side by Side: RAG Pipeline

Load documents, chunk, embed, retrieve, and generate answers.

### LangChain + Pinecone

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_pinecone import PineconeVectorStore
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import PromptTemplate

# Load and chunk
documents = PyPDFLoader('report.pdf').load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(documents)

# Embed and store in Pinecone
embeddings = OpenAIEmbeddings(model='text-embedding-3-small')
vector_store = PineconeVectorStore.from_documents(
    chunks, embeddings, index_name='my-index')
retriever = vector_store.as_retriever(search_kwargs={'k': 5})

# Build chain
prompt = PromptTemplate.from_template(
    'Answer based on context:\n{context}\n\nQuestion: {input}')
llm = ChatOpenAI(model='gpt-4o-mini', temperature=0)
rag_chain = create_retrieval_chain(
    retriever,
    create_stuff_documents_chain(llm, prompt))

result = rag_chain.invoke({'input': 'What were the key findings?'})
print(result['answer'])
```

**Packages:** `langchain`, `langchain-openai`, `langchain-pinecone`, `pinecone-client`, `sqlalchemy`

### Pixeltable

```python
import pixeltable as pxt
from pixeltable.functions.openai import chat_completions, embeddings
from pixeltable.functions.document import document_splitter

pxt.create_dir('rag')  # tables and views live in a directory

docs = pxt.create_table('rag.docs', {
    'pdf': pxt.Document, 'source': pxt.String})

# Chunking is a view: new documents are chunked automatically
chunks = pxt.create_view('rag.chunks', docs,
    iterator=document_splitter(
        docs.pdf, separators='sentence,token_limit', limit=300))

chunks.add_embedding_index('text',
    string_embed=embeddings.using(model='text-embedding-3-small'))

@pxt.query
def retrieve(question: str, top_k: int = 5) -> pxt.Query:
    sim = chunks.text.similarity(string=question)
    return (chunks.order_by(sim, asc=False)
            .limit(top_k).select(chunks.text))

qa = pxt.create_table('rag.qa', {'question': pxt.String})
qa.add_computed_column(context=retrieve(qa.question))

@pxt.udf
def build_prompt(question: str, context: list[dict]) -> str:
    ctx = '\n\n'.join(c['text'] for c in context)
    return f'Answer based on context:\n{ctx}\n\nQuestion: {question}'

qa.add_computed_column(prompt=build_prompt(qa.question, qa.context))
qa.add_computed_column(response=chat_completions(
    messages=[{'role': 'user', 'content': qa.prompt}],
    model='gpt-4o-mini'))
qa.add_computed_column(
    answer=qa.response.choices[0].message.content)

docs.insert([{'pdf': 'report.pdf', 'source': 'annual_report'}])
qa.insert([{'question': 'What were the key findings?'}])
qa.select(qa.question, qa.answer).collect()
```

**Packages:** `pixeltable`, `openai`
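Because `answer` is a computed column, follow-up questions reuse the same pipeline: each insert triggers retrieval, prompt construction, and the LLM call. A short usage sketch (the question text is illustrative):

```python
# Each new row flows through retrieve -> build_prompt -> chat_completions
qa.insert([{'question': 'What risks were identified?'}])
qa.select(qa.question, qa.answer).collect()
```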
## What Changes

| | LangChain + Pinecone | Pixeltable |
|---|---|---|
| New documents | Re-run chunking, embedding, and Pinecone upsert | `docs.insert([...])`; chunks, embeddings, and index update automatically |
| Infrastructure | Postgres + Pinecone account + API keys | Single local system, no external services |
| Sync issues | Postgres metadata and Pinecone vectors can drift | Impossible; derived columns are always consistent |
| Intermediate results | Ephemeral unless you add logging | Every column is stored and queryable: `qa.select(qa.context).collect()` |
| Versioning | Not built-in | `t.history()`, `pxt.create_snapshot()` |
| Swap providers | Rewrite chain with new provider classes | Change the model string; same pipeline |
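To make the versioning row concrete, here is a minimal sketch using the `history()` and `create_snapshot()` calls named above, applied to the `rag.qa` table from the pipeline (the snapshot name is illustrative):

```python
import pixeltable as pxt

qa = pxt.get_table('rag.qa')

# Review the table's version history (inserts, schema changes)
qa.history()

# Freeze the current state; later inserts into 'rag.qa' don't affect it
pxt.create_snapshot('rag.qa_v1', qa)
```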
## Common Patterns

### Adding new documents

**LangChain + Pinecone**

```python
# assumes splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
new_docs = PyPDFLoader('new_report.pdf').load()
new_chunks = splitter.split_documents(new_docs)
vector_store.add_documents(new_chunks)
# Also update Postgres metadata...
```

**Pixeltable**

```python
docs.insert([{'pdf': 'new_report.pdf', 'source': 'quarterly'}])
```
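Because `chunks` is a view over `docs`, that single insert also materializes the new chunks and their embeddings. A quick sanity check (a sketch assuming the schema from the pipeline above):

```python
# New chunks appear in the view without a separate upsert step
chunks.where(chunks.source == 'quarterly').count()
```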
### Filtered retrieval

**LangChain + Pinecone**

```python
retriever = vector_store.as_retriever(
    search_kwargs={'k': 5, 'filter': {'source': 'annual_report'}})
```

**Pixeltable**

```python
sim = chunks.text.similarity(string=query)
results = (chunks
    .where((chunks.source == 'annual_report') & (sim > 0.3))
    .order_by(sim, asc=False).limit(5).collect())
```
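To inspect why a chunk matched, the similarity expression can be returned alongside the text. A sketch, assuming `select()` accepts keyword aliases for computed outputs:

```python
# Surface the similarity score next to each retrieved chunk
(chunks
    .where(chunks.source == 'annual_report')
    .order_by(sim, asc=False).limit(5)
    .select(chunks.text, score=sim)
    .collect())
```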
### Inspecting what was retrieved

**LangChain + Pinecone**

```python
result = rag_chain.invoke({'input': query})
print(result['context'])  # if available
```

**Pixeltable**

```python
qa.select(qa.question, qa.context, qa.answer).collect()
```
## Next Steps

- **Backend for AI Apps**: full use case walkthrough
- **RAG Pipeline**: complete RAG system with chunking and retrieval
- **Chunk Documents**: control chunk size, overlap, and splitting strategies
- **Semantic Search**: search patterns and similarity queries