If you’re running a RAG application with Postgres for metadata, a vector database like Pinecone or Weaviate for embeddings, and LangChain for orchestration — this guide shows how Pixeltable unifies all three.
Related use case: Backend for AI Apps

Concept Mapping

| Your Database Stack | Pixeltable Equivalent |
| --- | --- |
| Postgres / MySQL for metadata | pxt.create_table() with typed columns |
| Pinecone / Weaviate / Chroma for embeddings | add_embedding_index() — built-in HNSW search |
| S3 for media files (referenced by URL) | pxt.Image, pxt.Video, pxt.Document native types |
| ORM (SQLAlchemy, Prisma) | .select(), .where(), .order_by() |
| LangChain DocumentLoader | insert(), import_csv(), import from S3 |
| RecursiveCharacterTextSplitter | DocumentSplitter iterator via create_view() |
| retriever.get_relevant_documents() | .similarity() + .order_by() |
| create_retrieval_chain() | Computed column with LLM call |
| Keeping Postgres and Pinecone in sync | Automatic — derived columns can’t go stale |
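
To ground the first two rows, a minimal sketch. The table name is a placeholder, and the embedding= keyword on add_embedding_index follows Pixeltable's current examples, so treat the exact signature as an assumption.

import pixeltable as pxt
from pixeltable.functions.openai import embeddings

# Typed columns replace the Postgres metadata schema
kb = pxt.create_table('kb_chunks', {'text': pxt.String, 'source': pxt.String})

# A built-in HNSW index replaces the external vector database
kb.add_embedding_index(
    'text', embedding=embeddings.using(model='text-embedding-3-small'))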

Side by Side: RAG Pipeline

Load documents, chunk, embed, retrieve, and generate answers. First, the LangChain + Pinecone version:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_pinecone import PineconeVectorStore
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import PromptTemplate

# Load and chunk
documents = PyPDFLoader('report.pdf').load()
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

# Embed and store in Pinecone
embeddings = OpenAIEmbeddings(model='text-embedding-3-small')
vector_store = PineconeVectorStore.from_documents(
    chunks, embeddings, index_name='my-index')
retriever = vector_store.as_retriever(search_kwargs={'k': 5})

# Build chain
prompt = PromptTemplate.from_template(
    'Answer based on context:\n{context}\n\nQuestion: {input}')
llm = ChatOpenAI(model='gpt-4o-mini', temperature=0)
rag_chain = create_retrieval_chain(
    retriever,
    create_stuff_documents_chain(llm, prompt))

result = rag_chain.invoke({'input': 'What were the key findings?'})
print(result['answer'])
Packages: langchain, langchain-community, langchain-text-splitters, langchain-openai, langchain-pinecone, pinecone-client, pypdf, sqlalchemy
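
A hedged sketch of the Pixeltable side, built from the mapping table above. The table and view names are placeholders, and the exact keyword arguments (embedding= on add_embedding_index, the DocumentSplitter parameters) follow Pixeltable's published examples, so verify them against the version you have installed.

import pixeltable as pxt
from pixeltable.iterators import DocumentSplitter
from pixeltable.functions.openai import embeddings, chat_completions

# One table holds documents and metadata; no separate Postgres
docs = pxt.create_table('docs', {'doc': pxt.Document, 'source': pxt.String})
docs.insert([{'doc': 'report.pdf', 'source': 'annual_report'}])

# Chunking is a view over the table; new documents are chunked automatically
chunks = pxt.create_view(
    'chunks', docs,
    iterator=DocumentSplitter.create(
        document=docs.doc, separators='token_limit', limit=300))

# The embedding index replaces Pinecone and cannot drift out of sync
chunks.add_embedding_index(
    'text', embedding=embeddings.using(model='text-embedding-3-small'))

# Retrieval: .similarity() + .order_by() replaces the retriever
question = 'What were the key findings?'
sim = chunks.text.similarity(question)
top_chunks = chunks.order_by(sim, asc=False).limit(5).select(chunks.text).collect()
context = '\n\n'.join(row['text'] for row in top_chunks)

# Generation: computed columns with an LLM call replace the chain
@pxt.udf
def build_prompt(context: str, question: str) -> str:
    return f'Answer based on context:\n{context}\n\nQuestion: {question}'

qa = pxt.create_table('qa', {'question': pxt.String, 'context': pxt.String})
qa.add_computed_column(response=chat_completions(
    model='gpt-4o-mini',
    messages=[{'role': 'user', 'content': build_prompt(qa.context, qa.question)}]))
qa.add_computed_column(answer=qa.response.choices[0].message.content)

qa.insert([{'question': question, 'context': context}])
print(qa.select(qa.answer).collect())

Packages: pixeltable, openai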

What Changes

| Aspect | LangChain + Pinecone | Pixeltable |
| --- | --- | --- |
| New documents | Re-run chunking, embedding, and Pinecone upsert | docs.insert([...]) — chunks, embeddings, and index update automatically |
| Infrastructure | Postgres + Pinecone account + API keys | Single local system, no external services |
| Sync issues | Postgres metadata and Pinecone vectors can drift | Impossible — derived columns are always consistent |
| Intermediate results | Ephemeral unless you add logging | Every column is stored and queryable: qa.select(qa.context).collect() |
| Versioning | Not built-in | t.history(), pxt.create_snapshot() |
| Swap providers | Rewrite chain with new provider classes | Change the model string — same pipeline |
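
The versioning row in code, as a short sketch; the snapshot name is a placeholder.

# List prior versions of the table (inserts, schema changes)
docs.history()

# Freeze the current state under a name for reproducible queries
pxt.create_snapshot('docs_v1', docs)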

Common Patterns

Adding new documents

new_docs = PyPDFLoader('new_report.pdf').load()
new_chunks = splitter.split_documents(new_docs)
vector_store.add_documents(new_chunks)
# Also update Postgres metadata...
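
The Pixeltable equivalent is a single insert into the docs table from the sketch above; the chunk view, embeddings, and index are derived from it, so there is nothing else to upsert.

docs.insert([{'doc': 'new_report.pdf', 'source': 'new_report'}])
# Chunks, embeddings, and the index update automatically; no metadata table to sync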

Filtering by metadata

retriever = vector_store.as_retriever(
    search_kwargs={'k': 5, 'filter': {'source': 'annual_report'}})
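
The Pixeltable analogue combines .where() with the similarity expression, assuming the source column and chunks view from the sketch above.

sim = chunks.text.similarity('What were the key findings?')
results = (
    chunks.where(chunks.source == 'annual_report')
    .order_by(sim, asc=False)
    .limit(5)
    .select(chunks.text, score=sim)
    .collect()
)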

Inspecting what was retrieved

result = rag_chain.invoke({'input': query})
print(result['context'])  # list of retrieved Documents
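
In Pixeltable the retrieved context is a stored column on the qa table from the sketch above, so inspection is an ordinary query rather than a logging afterthought.

# Question, context, and answer are all persisted columns
qa.select(qa.question, qa.context, qa.answer).collect()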

Next Steps

Backend for AI Apps: Full use case walkthrough
RAG Pipeline: Complete RAG system with chunking and retrieval
Chunk Documents: Control chunk size, overlap, and splitting strategies
Semantic Search: Search patterns and similarity queries