Iterators
Learn about iterators for processing documents, videos, audio, and images
What are Iterators?
Iterators in Pixeltable are specialized tools for processing and transforming media content. They efficiently break down large files into manageable chunks, enabling analysis at different granularities. Iterators work seamlessly with views to create virtual derived tables without duplicating storage.
In Pixeltable, iterators:
- Process media files incrementally to manage memory efficiently
- Transform single records into multiple output records
- Support various media types including documents, videos, images, and audio
- Integrate with the view system for automated processing pipelines
- Provide configurable parameters for fine-tuning output
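The one-to-many behavior described above can be pictured with a plain Python generator. This is a conceptual sketch only, not Pixeltable's implementation: `split_record`, the `id` and `text` keys, and the `chunk_size` parameter are all hypothetical names chosen for illustration.

```python
from typing import Dict, Iterator


def split_record(record: Dict, chunk_size: int) -> Iterator[Dict]:
    """Lazily yield one output row per fixed-size slice of record['text'].

    Hypothetical helper: it only illustrates how a single input record
    can expand into several output records without building them all
    in memory first.
    """
    text = record["text"]
    for pos, start in enumerate(range(0, len(text), chunk_size)):
        yield {
            "source_id": record["id"],  # link back to the input record
            "pos": pos,                 # position of the chunk in the source
            "text": text[start:start + chunk_size],
        }


# One input record becomes three output rows.
rows = list(split_record({"id": 7, "text": "abcdefghij"}, chunk_size=4))
```

Because the function is a generator, each row is produced only when the consumer asks for it, which is the same incremental idea the list above describes.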
Iterators are particularly useful when:
- Working with large media files that can’t be processed at once
- Building retrieval systems that require chunked content
- Creating analysis pipelines for multimedia data
- Implementing feature extraction workflows
Core Concepts
Document Splitting
Split documents into chunks by headings, paragraphs, or sentences
Video Processing
Extract frames at specified intervals or counts
Image Tiling
Divide images into overlapping or non-overlapping tiles
Audio Chunking
Split audio files into time-based chunks with configurable overlap
Iterators also work with Pixeltable’s computed columns and versioning system.
Available Iterators
Document Splitting
Parameters
- separators: Choose from ‘heading’, ‘paragraph’, ‘sentence’, ‘token_limit’, ‘char_limit’, ‘page’
- limit: Maximum tokens/characters per chunk
- metadata: Optional fields like ‘title’, ‘heading’, ‘sourceline’, ‘page’, ‘bounding_box’
- overlap: Optional overlap between chunks
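To make the interaction between a chunk limit and an overlap concrete, here is a minimal pure-Python sketch of character-limited chunking. It is not how the document splitter is implemented; `chunk_by_char_limit` is a hypothetical helper, and the assumption that each chunk repeats the last `overlap` characters of the previous one is ours, since the text above does not spell out the exact overlap semantics.

```python
from typing import List


def chunk_by_char_limit(text: str, limit: int, overlap: int = 0) -> List[str]:
    """Split text into chunks of at most `limit` characters.

    Assumption: consecutive chunks share `overlap` characters, so the
    window advances by `limit - overlap` characters each step.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if not 0 <= overlap < limit:
        raise ValueError("overlap must satisfy 0 <= overlap < limit")

    step = limit - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + limit])
        if start + limit >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


no_overlap = chunk_by_char_limit("abcdefghij", limit=4)
with_overlap = chunk_by_char_limit("abcdefghij", limit=4, overlap=1)
```

A larger overlap preserves more context across chunk boundaries at the cost of producing more chunks for the same input.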
Video Processing
Parameters
- fps: Frames per second to extract (can be fractional)
- num_frames: Exact number of frames to extract
- Only one of fps or num_frames can be specified
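The difference between the two sampling modes can be sketched by computing the timestamps at which frames would be taken. This is an illustration under stated assumptions, not the library's frame extraction logic: `frame_timestamps` is a hypothetical helper, `num_frames` is assumed to mean evenly spaced frames, and the sketch requires one of the two arguments because it has no native frame rate to fall back on.

```python
from typing import List, Optional


def frame_timestamps(duration_sec: float,
                     fps: Optional[float] = None,
                     num_frames: Optional[int] = None) -> List[float]:
    """Return sample times in seconds for a video of the given duration."""
    if fps is not None and num_frames is not None:
        raise ValueError("specify only one of fps or num_frames")
    if fps is None and num_frames is None:
        raise ValueError("this sketch needs either fps or num_frames")

    if num_frames is not None:
        if num_frames <= 0:
            raise ValueError("num_frames must be positive")
        # Assumption: frames are spread evenly over the whole duration.
        return [i * duration_sec / num_frames for i in range(num_frames)]

    if fps <= 0:
        raise ValueError("fps must be positive")
    step = 1.0 / fps  # a fractional fps gives a step longer than one second
    times = []
    i = 0
    while i * step < duration_sec:
        times.append(i * step)
        i += 1
    return times


by_rate = frame_timestamps(4.0, fps=0.5)        # one frame every two seconds
by_count = frame_timestamps(4.0, num_frames=4)  # four evenly spaced frames
```

Sampling by rate keeps the spacing constant across videos of different lengths, while sampling by count keeps the number of output rows constant.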
Image Tiling
Parameters
- tile_size: Tuple of (width, height) for each tile
- overlap: Optional tuple for overlap between tiles
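The effect of a tile size and an overlap can be seen by computing the tile bounding boxes directly. The sketch below is illustrative only: `tile_boxes` is a hypothetical helper, and clipping edge tiles to the image bounds is an assumption rather than documented behavior.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def _starts(length: int, tile: int, step: int) -> List[int]:
    """Start offsets along one axis, stopping once a tile reaches the edge."""
    starts = []
    pos = 0
    while True:
        starts.append(pos)
        if pos + tile >= length:
            break
        pos += step
    return starts


def tile_boxes(image_size: Tuple[int, int],
               tile_size: Tuple[int, int],
               overlap: Tuple[int, int] = (0, 0)) -> List[Box]:
    """Return tile boxes in row-major order for an image of image_size."""
    img_w, img_h = image_size
    tile_w, tile_h = tile_size
    ov_w, ov_h = overlap
    if not (0 <= ov_w < tile_w and 0 <= ov_h < tile_h):
        raise ValueError("overlap must be smaller than the tile size")

    boxes = []
    for top in _starts(img_h, tile_h, tile_h - ov_h):
        for left in _starts(img_w, tile_w, tile_w - ov_w):
            # Assumption: tiles at the right and bottom edges are clipped.
            boxes.append((left, top,
                          min(left + tile_w, img_w),
                          min(top + tile_h, img_h)))
    return boxes


plain = tile_boxes((100, 100), (50, 50))
overlapped = tile_boxes((90, 50), (50, 50), overlap=(10, 0))
```

Overlapping tiles help when objects may straddle a tile boundary, since each boundary region then appears whole in at least one tile.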
Audio Chunking
Parameters
- chunk_duration_sec (float): Duration of each audio chunk in seconds
- overlap_sec (float, default: 0.0): Overlap duration between consecutive chunks in seconds
- min_chunk_duration_sec (float, default: 0.0): Minimum duration threshold; the last chunk will be dropped if it’s shorter than this value
Returns
For each chunk, yields:
- start_time_sec: Start time of the chunk in seconds
- end_time_sec: End time of the chunk in seconds
- audio_chunk: The audio chunk as pxt.Audio type
Notes
- If the input contains no audio, no chunks are yielded
- The audio file is processed efficiently with proper codec handling
- Supports various audio formats including MP3, AAC, Vorbis, Opus, FLAC
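The three audio parameters interact in a way that is easiest to see with numbers. The following sketch computes only the chunk time windows, with no audio decoding, and is not the library's implementation: `audio_chunk_windows` and `total_sec` are hypothetical names, and the windows are assumed to advance by `chunk_duration_sec - overlap_sec`.

```python
from typing import List, Tuple


def audio_chunk_windows(total_sec: float,
                        chunk_duration_sec: float,
                        overlap_sec: float = 0.0,
                        min_chunk_duration_sec: float = 0.0
                        ) -> List[Tuple[float, float]]:
    """Return (start_time_sec, end_time_sec) pairs covering the audio."""
    if chunk_duration_sec <= 0:
        raise ValueError("chunk_duration_sec must be positive")
    if not 0 <= overlap_sec < chunk_duration_sec:
        raise ValueError("overlap_sec must be smaller than chunk_duration_sec")

    step = chunk_duration_sec - overlap_sec
    windows = []
    i = 0
    while i * step < total_sec:
        start = i * step
        end = min(start + chunk_duration_sec, total_sec)
        windows.append((start, end))
        if end >= total_sec:
            break  # the audio is fully covered
        i += 1

    # Drop the final chunk if it is shorter than the minimum duration.
    if windows and windows[-1][1] - windows[-1][0] < min_chunk_duration_sec:
        windows.pop()
    return windows


full = audio_chunk_windows(10.0, chunk_duration_sec=4.0)
trimmed = audio_chunk_windows(10.0, chunk_duration_sec=4.0,
                              min_chunk_duration_sec=3.0)
overlapping = audio_chunk_windows(10.0, chunk_duration_sec=4.0,
                                  overlap_sec=1.0)
```

With a 10-second input and 4-second chunks, the last window is only 2 seconds long, so setting `min_chunk_duration_sec=3.0` removes it. An input of zero duration produces no windows, matching the note that empty audio yields no chunks.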
Common Use Cases
Document Processing
Split documents for:
- RAG systems
- Text analysis
- Content extraction
Video Analysis
Extract frames for:
- Object detection
- Scene classification
- Activity recognition
Image Processing
Create tiles for:
- High-resolution analysis
- Object detection
- Segmentation tasks
Audio Analysis
Split audio for:
- Speech recognition
- Sound classification
- Audio feature extraction
Example Workflows
Best Practices
Memory Management
- Use appropriate chunk sizes
- Consider overlap requirements
- Monitor memory usage with large files
Performance
- Balance chunk size vs. processing time
- Use batch processing when possible
- Cache intermediate results
Tips & Tricks
When using token_limit with DocumentSplitter, set the limit so that each chunk fits within the context window of every model in your pipeline.
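One way to pick such a limit is to work backwards from the model's context window. The sketch below is simple budgeting arithmetic, not a Pixeltable feature: `chunk_token_limit` and all of its parameters are hypothetical, and the numbers in the example are made up for illustration.

```python
def chunk_token_limit(context_window: int,
                      prompt_tokens: int,
                      response_tokens: int,
                      chunks_per_prompt: int) -> int:
    """Largest per-chunk token limit that still fits the context window.

    Reserves room for the fixed prompt text and the model's response,
    then divides what is left among the chunks sent in one request.
    """
    if chunks_per_prompt <= 0:
        raise ValueError("chunks_per_prompt must be positive")
    available = context_window - prompt_tokens - response_tokens
    if available <= 0:
        raise ValueError("prompt and response already exceed the context window")
    return available // chunks_per_prompt


# Example: an 8192-token window, 500 tokens of instructions, 1000 tokens
# reserved for the answer, and 5 retrieved chunks per request.
limit = chunk_token_limit(8192, prompt_tokens=500,
                          response_tokens=1000, chunks_per_prompt=5)
```

If the tokenizer used for splitting differs from the model's tokenizer, token counts may not match exactly, so leaving some margin below the computed value is a reasonable precaution.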