Building a PDF Search Workflow
Pixeltable PDF search works in two phases:
- Define your workflow structure (once)
- Query your document database (anytime)
Install Dependencies
Define Your Workflow
Create table.py:
Use Your Workflow
Create app.py:
What Makes This Different?
Smart Chunking
Token-aware document splitting:
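To make the idea concrete, here is a minimal, library-independent sketch of token-limited splitting. It is not Pixeltable's implementation: the function name `chunk_by_tokens` and its `limit` parameter are hypothetical, and whitespace-separated words stand in for the tokens a real tokenizer would produce.

```python
from typing import List


def chunk_by_tokens(text: str, limit: int = 300) -> List[str]:
    """Split text into chunks of at most `limit` tokens.

    Assumption: whitespace-separated words approximate tokenizer tokens.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    tokens = text.split()
    # Walk the token list in strides of `limit` and rejoin each slice
    return [
        " ".join(tokens[start:start + limit])
        for start in range(0, len(tokens), limit)
    ]


chunks = chunk_by_tokens("one two three four five six seven", limit=3)
print(chunks)  # ['one two three', 'four five six', 'seven']
```

Counting tokens rather than characters keeps each chunk within the input limit of the embedding model that processes it later.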
Vector Search
Natural language document search:
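The sketch below illustrates the general mechanism behind similarity search: embed the query, score every chunk by cosine similarity, and return the best matches. Everything here is an assumption for illustration. The `embed` function is a toy bag-of-words stand-in for a real embedding model such as E5, and `search`, `top_k`, and `min_score` are hypothetical names, not Pixeltable API.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query: str, chunks: List[str], top_k: int = 3, min_score: float = 0.0
) -> List[Tuple[float, str]]:
    """Return up to `top_k` (score, chunk) pairs scoring at least `min_score`."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]


chunks = [
    "invoices are due in thirty days",
    "the warranty covers parts and labor",
    "shipping takes five business days",
]
best_score, best_chunk = search("warranty parts", chunks, top_k=1)[0]
print(best_chunk)  # the warranty covers parts and labor
```

A real embedding model matches on meaning rather than shared words, but the ranking and thresholding steps follow the same shape.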
Auto-updating
Self-maintaining document database:
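As a rough illustration of what "self-maintaining" means, the toy class below derives chunks at insert time, so the searchable data never lags behind the stored documents. This is a conceptual sketch only: `ChunkIndex` and its methods are hypothetical names and do not reflect how Pixeltable is implemented.

```python
from typing import Dict, List


class ChunkIndex:
    """Toy document store whose chunks are derived data, refreshed on insert."""

    def __init__(self, chunk_size: int = 50) -> None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size
        self.documents: Dict[str, str] = {}
        self.chunks: List[Dict[str, str]] = []

    def insert(self, doc_id: str, text: str) -> None:
        """Store a document and derive its chunks in the same step."""
        self.documents[doc_id] = text
        # Drop stale chunks if this document replaces an earlier version
        self.chunks = [c for c in self.chunks if c["doc_id"] != doc_id]
        words = text.split()
        for start in range(0, len(words), self.chunk_size):
            piece = " ".join(words[start:start + self.chunk_size])
            self.chunks.append({"doc_id": doc_id, "text": piece})


index = ChunkIndex(chunk_size=2)
index.insert("report", "one two three")
print(len(index.chunks))  # 2
```

The caller only ever inserts documents; chunking happens as a side effect, so there is no separate re-indexing step to forget.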
Workflow Components
PDF Processing
Advanced document handling:
- Automatic text extraction
- PDF parsing and cleaning
- Structure preservation
- Support for multiple PDF formats
Text Chunking
Intelligent text splitting:
- Token-aware chunking
- Configurable chunk sizes
- Context preservation
- Multiple chunking strategies
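One common way to combine configurable chunk sizes with context preservation is a sliding window in which consecutive chunks share a few tokens. The sketch below assumes that approach for illustration; the source does not say which strategies are used, and `chunk_with_overlap`, `size`, and `overlap` are hypothetical names. Whitespace-separated words again stand in for tokenizer tokens.

```python
from typing import List


def chunk_with_overlap(text: str, size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into windows of `size` tokens that share `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    tokens = text.split()
    step = size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        # Stop once a window has reached the end of the text
        if start + size >= len(tokens):
            break
    return chunks


print(chunk_with_overlap("a b c d e f g", size=4, overlap=2))
# ['a b c d', 'c d e f', 'e f g']
```

The shared tokens mean a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of some duplicated storage.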
Vector Search
High-quality search:
- E5 text embeddings
- Fast similarity search
- Natural language queries
- Configurable similarity thresholds