Extract the audio track from video files for transcription, analysis, or
processing.
Problem
You have video files but need to work with just the audio track—for
transcription, speaker analysis, or audio processing. Extracting audio
manually with ffmpeg is tedious and doesn’t integrate with your data
pipeline.
Solution
What’s in this recipe:
- Extract audio from video as a computed column
- Choose audio format (mp3, wav, flac)
- Chain with transcription for automatic video-to-text
Use the extract_audio function to create a computed audio column from a
video column. The extracted audio plugs directly into transcription and
other audio processing steps.
Setup
%pip install -qU pixeltable
import pixeltable as pxt
from pixeltable.functions.video import extract_audio
# Create a fresh directory
pxt.drop_dir('audio_extract_demo', force=True)
pxt.create_dir('audio_extract_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘audio_extract_demo’.
<pixeltable.catalog.dir.Dir at 0x1061fc510>
# Create table for videos
videos = pxt.create_table(
    'audio_extract_demo.videos',
    {'title': pxt.String, 'video': pxt.Video}
)
Created table ‘videos’.
# Add computed column to extract audio as MP3
videos.add_computed_column(
    audio=extract_audio(videos.video, format='mp3')
)
Added 0 column values with 0 errors.
No rows affected.
# Insert a sample video (from multimedia-commons with audio)
video_url = 's3://multimedia-commons/data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4'
videos.insert([{
    'title': 'Sample Video',
    'video': video_url
}])
Inserting rows into `videos`: 1 rows [00:00, 207.52 rows/s]
Inserted 1 row with 0 errors.
1 row inserted, 4 values computed.
# View results
videos.select(videos.title, videos.audio).collect()
Chain with transcription
Add transcription as a follow-up computed column:
# Install whisper for transcription
%pip install -qU openai-whisper
from pixeltable.functions import whisper
# Add transcription of the extracted audio
videos.add_computed_column(
    transcription=whisper.transcribe(videos.audio, model='base.en')
)
Added 1 column value with 0 errors.
1 row updated, 1 value computed.
# Extract the transcript text
videos.add_computed_column(
    transcript=videos.transcription.text
)
Added 1 column value with 0 errors.
1 row updated, 1 value computed.
# View the full pipeline results
videos.select(videos.title, videos.transcript).collect()
Explanation
Audio format options:
- mp3: lossy compression; small files, widely supported (a good default)
- wav: uncompressed PCM; larger files, no decoding needed downstream
- flac: lossless compression; smaller than wav with no quality loss
Pipeline flow:
Video → extract_audio → Audio → whisper.transcribe → Transcript
Each step is a computed column. When you insert a new video:
1. Audio is extracted automatically
2. Whisper transcribes the audio
3. All results are cached for future queries
See also