If you’re running a RAG application with Postgres for metadata, a vector database like Pinecone or Weaviate for embeddings, and LangChain for orchestration — this guide shows how Pixeltable unifies all three.
Related use case: Backend for AI Apps

Concept Mapping

| Your Database Stack | Pixeltable Equivalent |
| --- | --- |
| Postgres / MySQL for metadata | `pxt.create_table()` with typed columns |
| Pinecone / Weaviate / Chroma for embeddings | `add_embedding_index()` — built-in HNSW search |
| S3 for media files (referenced by URL) | `pxt.Image`, `pxt.Video`, `pxt.Document` native types |
| ORM (SQLAlchemy, Prisma) | `.select()`, `.where()`, `.order_by()` |
| LangChain `DocumentLoader` | `insert()`, `import_csv()`, import from S3 |
| `RecursiveCharacterTextSplitter` | `DocumentSplitter` iterator via `create_view()` |
| `retriever.get_relevant_documents()` | `.similarity()` + `.order_by()` |
| `create_retrieval_chain()` | Computed column with LLM call |
| Keeping Postgres and Pinecone in sync | Automatic — derived columns can't go stale |
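The `.similarity()` + `.order_by()` row is worth unpacking: vector retrieval is, semantically, top-k ranking by a similarity metric, which an HNSW index approximates at scale. A pure-Python sketch of the exact (brute-force) semantics, using toy vectors and helper names of our own:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embedding" vectors for three chunks.
chunks = {
    'chunk-a': [1.0, 0.0, 0.0],
    'chunk-b': [0.9, 0.1, 0.0],
    'chunk-c': [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]

# Top-k retrieval = order by similarity, descending, then take k rows.
top2 = sorted(chunks, key=lambda c: cosine(chunks[c], query), reverse=True)[:2]
# top2 -> ['chunk-a', 'chunk-b']
```

An index replaces the full sort with an approximate nearest-neighbor search, but the query shape (score, order, limit) stays the same.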

Side by Side: RAG Pipeline

Load documents, chunk, embed, retrieve, and generate answers.
```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_pinecone import PineconeVectorStore
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import PromptTemplate

# Load and chunk
documents = PyPDFLoader('report.pdf').load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(documents)

# Embed and store in Pinecone
embeddings = OpenAIEmbeddings(model='text-embedding-3-small')
vector_store = PineconeVectorStore.from_documents(
    chunks, embeddings, index_name='my-index')
retriever = vector_store.as_retriever(search_kwargs={'k': 5})

# Build chain
prompt = PromptTemplate.from_template(
    'Answer based on context:\n{context}\n\nQuestion: {input}')
llm = ChatOpenAI(model='gpt-4o-mini', temperature=0)
rag_chain = create_retrieval_chain(
    retriever,
    create_stuff_documents_chain(llm, prompt))

result = rag_chain.invoke({'input': 'What were the key findings?'})
print(result['answer'])
```
Packages: langchain, langchain-community, langchain-openai, langchain-pinecone, langchain-text-splitters, pypdf, pinecone-client, sqlalchemy
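For comparison, here is the same pipeline sketched in Pixeltable. This is a non-authoritative sketch based on Pixeltable's documented API (`create_table`, `create_view` with `DocumentSplitter`, `add_embedding_index`, computed columns); the `rag.*` names and chunking parameters are our own choices, it requires the `pixeltable` package plus an OpenAI API key, and exact parameter names should be checked against the current docs:

```python
import pixeltable as pxt
from pixeltable.iterators import DocumentSplitter
from pixeltable.functions.openai import chat_completions, embeddings

pxt.create_dir('rag')

# Documents are rows; pxt.Document is a native column type.
docs = pxt.create_table('rag.docs', {'doc': pxt.Document})

# Chunking is a view over the document table: new inserts are
# chunked automatically, with no manual re-run.
chunks = pxt.create_view(
    'rag.chunks', docs,
    iterator=DocumentSplitter.create(
        document=docs.doc, separators='token_limit', limit=300))

# Embedding index on the chunk text, kept in sync on every insert.
chunks.add_embedding_index(
    'text', string_embed=embeddings.using(model='text-embedding-3-small'))

docs.insert([{'doc': 'report.pdf'}])

# Retrieve: similarity + order_by instead of a retriever object.
sim = chunks.text.similarity('What were the key findings?')
context = chunks.order_by(sim, asc=False).limit(5).select(chunks.text).collect()

# Generate: the LLM call is a computed column on a QA table.
# (A full pipeline would interpolate the retrieved context into the
# prompt; this sketch sends the question alone for brevity.)
qa = pxt.create_table('rag.qa', {'prompt': pxt.String})
qa.add_computed_column(response=chat_completions(
    messages=[{'role': 'user', 'content': qa.prompt}],
    model='gpt-4o-mini'))
qa.add_computed_column(answer=qa.response.choices[0].message.content)

qa.insert([{'prompt': 'What were the key findings?'}])
print(qa.select(qa.answer).collect())
```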

What Changes

| | LangChain + Pinecone | Pixeltable |
| --- | --- | --- |
| New documents | Re-run chunking, embedding, and Pinecone upsert | `docs.insert([...])` — chunks, embeddings, and index update automatically |
| Infrastructure | Postgres + Pinecone account + API keys | Single local system, no external services |
| Sync issues | Postgres metadata and Pinecone vectors can drift | Impossible — derived columns are always consistent |
| Intermediate results | Ephemeral unless you add logging | Every column is stored and queryable: `qa.select(qa.context).collect()` |
| Versioning | Not built-in | `t.history()`, `pxt.create_snapshot()` |
| Swap providers | Rewrite chain with new provider classes | Change the model string — same pipeline |
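The sync-issues row can be made concrete. With two stores, metadata and vectors are written separately, so a failed or forgotten second write leaves them divergent; with derived columns, the embedding is computed from the row itself when the row is written. A toy pure-Python model of that idea (our own class and a stand-in "embedding" function, not Pixeltable code):

```python
class Table:
    """Toy table where derived columns are functions of stored columns."""
    def __init__(self, derived):
        self.rows = []
        self.derived = derived  # column name -> function(row)

    def insert(self, row):
        # Derived values are computed at write time from the same row,
        # so there is no second store that can drift out of sync.
        for name, fn in self.derived.items():
            row[name] = fn(row)
        self.rows.append(row)

# Stand-in for a real embedding model: a length-based "vector".
def fake_embed(row):
    return [float(len(row['text']))]

docs = Table(derived={'embedding': fake_embed})
docs.insert({'text': 'hello world'})
docs.insert({'text': 'key findings'})
```

The two-store design has no equivalent of this invariant: nothing forces the Pinecone upsert to happen whenever the Postgres row changes.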

Common Patterns

Adding new documents

```python
new_docs = PyPDFLoader('new_report.pdf').load()
new_chunks = splitter.split_documents(new_docs)
vector_store.add_documents(new_chunks)
# Also update Postgres metadata...
```
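For comparison, the Pixeltable version of this pattern is a single insert. A hedged fragment, assuming a `docs` table with a chunk view and embedding index already defined on it, as sketched in Pixeltable's docs:

```python
# Assumes a `docs` table with a pxt.Document column named `doc`,
# plus a chunk view and embedding index derived from it.
docs.insert([{'doc': 'new_report.pdf'}])
# No separate re-chunking, re-embedding, or index upsert step:
# the view and index are derived from `docs` and stay consistent.
```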

Filtering by metadata

```python
retriever = vector_store.as_retriever(
    search_kwargs={'k': 5, 'filter': {'source': 'annual_report'}})
```
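The Pixeltable counterpart expresses the metadata filter as an ordinary `.where()` clause combined with similarity ranking. A fragment assuming a `chunks` view with a hypothetical `source` metadata column:

```python
# `chunks` is a chunk view; `source` is a hypothetical metadata column.
sim = chunks.text.similarity('key findings')
results = (
    chunks.where(chunks.source == 'annual_report')  # metadata filter
    .order_by(sim, asc=False)
    .limit(5)
    .select(chunks.text, score=sim)
    .collect()
)
```

Because metadata and vectors live in the same table, the filter and the similarity search run against one consistent store.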

Inspecting what was retrieved

```python
result = rag_chain.invoke({'input': query})
print(result['context'])  # list of retrieved Documents
```
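In Pixeltable, retrieval results need no special inspection hooks, because intermediate values persist as stored columns. A fragment assuming a `qa` table with `prompt`, `response`, and `answer` columns, as in Pixeltable's examples:

```python
# Every intermediate value is a stored column, so the inputs and
# outputs of each step can be queried after the fact:
qa.select(qa.prompt, qa.response, qa.answer).collect()
```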

Next Steps

Last modified on March 3, 2026