Build a multimodal video search workflow with Pixeltable
Pixeltable lets you build video search workflows that index both the audio and the visual content of your videos:
Install Dependencies
Define Your Workflow
Create table.py:
Use Your Workflow
Create app.py:
Process both audio and visual content from the same videos:
Automatic image description using vision models:
Use the same embedding model for both audio transcripts and image descriptions:
Search independently across audio or visual content:
Video Processing
Extracts both audio and visual content:
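To make the visual half of this step concrete, here is a minimal, library-independent sketch of frame sampling: given a video duration and a sampling rate, it lists the timestamps at which frames would be pulled for later analysis. The function name `frame_timestamps` and the 1 fps rate are assumptions for illustration only and are not part of Pixeltable's API.

```python
def frame_timestamps(duration_sec: float, fps: float = 1.0) -> list[float]:
    """Return the timestamps (in seconds) at which frames would be sampled.

    Hypothetical helper: sampling starts at t=0 and advances by 1/fps,
    stopping before the end of the video.
    """
    if duration_sec < 0:
        raise ValueError("duration_sec must be non-negative")
    if fps <= 0:
        raise ValueError("fps must be positive")
    step = 1.0 / fps
    count = int(duration_sec * fps)  # number of whole sampling steps that fit
    return [round(i * step, 6) for i in range(count)]


# A 5-second clip sampled once per second
one_fps = frame_timestamps(5, fps=1)
# A 2-second clip sampled twice per second
two_fps = frame_timestamps(2, fps=2)
```

A lower sampling rate keeps the number of frames sent to a vision model small, at the cost of possibly missing short-lived visual events.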
Visual Content Analysis
Analyzes video frames with AI:
Audio Processing
Handles audio for efficient transcription:
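One common way to prepare audio for transcription is to split it into fixed-length, slightly overlapping chunks so that each request stays small and words at chunk boundaries are not lost. The sketch below only computes chunk boundaries in plain Python; the name `chunk_audio` and the default values (30 s chunks, 2 s overlap, 5 s minimum) are assumptions chosen for illustration, not settings confirmed by this guide.

```python
def chunk_audio(
    duration_sec: float,
    chunk_sec: float = 30.0,     # assumed chunk length
    overlap_sec: float = 2.0,    # assumed overlap between neighbours
    min_chunk_sec: float = 5.0,  # assumed minimum length of a trailing chunk
) -> list[tuple[float, float]]:
    """Return (start, end) boundaries in seconds covering the whole audio track."""
    if chunk_sec <= 0 or not 0 <= overlap_sec < chunk_sec:
        raise ValueError("need chunk_sec > 0 and 0 <= overlap_sec < chunk_sec")
    chunks: list[tuple[float, float]] = []
    stride = chunk_sec - overlap_sec
    start = 0.0
    while start < duration_sec:
        end = min(start + chunk_sec, duration_sec)
        if chunks and end - start < min_chunk_sec:
            # Tail is too short to transcribe on its own: extend the previous chunk.
            chunks[-1] = (chunks[-1][0], end)
            break
        chunks.append((start, end))
        if end >= duration_sec:
            break
        start += stride
    return chunks


long_clip = chunk_audio(70)   # three overlapping chunks
short_tail = chunk_audio(60)  # 4-second tail is merged into the second chunk
```

Each `(start, end)` pair can then be cut from the audio track and sent to the transcription model independently.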
Speech-to-Text
Uses OpenAI’s Whisper for transcription:
Vector Search
Implements unified embedding space:
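The idea of a unified embedding space can be sketched without any particular library: the same embedding function is applied to audio transcripts and to frame descriptions, each set is kept in its own index, and a text query is embedded once and compared against either index. In the sketch below, `embed` is a toy bag-of-words stand-in for a real embedding model, and all names and sample data are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(items: dict[str, str]) -> dict[str, Counter]:
    """Embed every item with the shared embedding function."""
    return {key: embed(text) for key, text in items.items()}


def search(index: dict[str, Counter], query: str, top_k: int = 1) -> list[str]:
    """Return the keys of the top_k most similar items with a non-zero score."""
    q = embed(query)
    scored = sorted(((cosine(q, vec), key) for key, vec in index.items()), reverse=True)
    return [key for score, key in scored[:top_k] if score > 0]


# Hypothetical sample data: transcript chunks and frame descriptions
transcripts = {
    "clip1@00:00": "welcome to the cooking show today we bake bread",
    "clip2@00:30": "the quarterly revenue grew by ten percent",
}
frame_descriptions = {
    "clip1#frame12": "a person kneading bread dough on a wooden table",
    "clip2#frame3": "a bar chart showing revenue on a slide",
}

audio_index = build_index(transcripts)          # searchable audio content
visual_index = build_index(frame_descriptions)  # searchable visual content

audio_hits = search(audio_index, "bread")
visual_hits = search(visual_index, "revenue chart")
```

Because both indexes share one embedding function, the same query string can be run against the audio index, the visual index, or both, and the scores remain comparable.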