Open in Kaggle  Open in Colab  Download Notebook
This documentation page is also available as an interactive notebook. You can launch the notebook in Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the above links.
Extract the audio track from video files for transcription, analysis, or processing.

Problem

You have video files but need to work with just the audio track—for transcription, speaker analysis, or audio processing. Extracting audio manually with ffmpeg is tedious and doesn’t integrate with your data pipeline.

Solution

What’s in this recipe:
  • Extract audio from video as a computed column
  • Choose audio format (mp3, wav, flac)
  • Chain with transcription for automatic video-to-text
Use the extract_audio function to add a computed audio column derived from the video column. Because the result is an ordinary Pixeltable column, it plugs directly into transcription and other audio processing steps.

Setup

%pip install -qU pixeltable
import pixeltable as pxt
from pixeltable.functions.video import extract_audio
# Create a fresh directory
pxt.drop_dir('audio_extract_demo', force=True)
pxt.create_dir('audio_extract_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘audio_extract_demo’.
<pixeltable.catalog.dir.Dir at 0x1061fc510>

Extract audio from video

# Create table for videos
videos = pxt.create_table(
    'audio_extract_demo.videos',
    {'title': pxt.String, 'video': pxt.Video}
)
Created table ‘videos’.
# Add computed column to extract audio as MP3
videos.add_computed_column(
    audio=extract_audio(videos.video, format='mp3')
)
Added 0 column values with 0 errors.
No rows affected.
# Insert a sample video (from multimedia-commons with audio)
video_url = 's3://multimedia-commons/data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4'

videos.insert([{
    'title': 'Sample Video',
    'video': video_url
}])
Inserting rows into `videos`: 1 rows [00:00, 207.52 rows/s]
Inserted 1 row with 0 errors.
1 row inserted, 4 values computed.
# View results
videos.select(videos.title, videos.audio).collect()
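Each entry in the audio column references a media file managed by Pixeltable. If you need the file's location, media columns expose fileurl and localpath properties in select expressions. A minimal sketch, assuming these properties are available in your Pixeltable version:
# Show where the extracted audio files live
videos.select(videos.title, videos.audio.fileurl, videos.audio.localpath).collect()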

Chain with transcription

Add transcription as a follow-up computed column:
# Install whisper for transcription
%pip install -qU openai-whisper
from pixeltable.functions import whisper

# Add transcription of the extracted audio
videos.add_computed_column(
    transcription=whisper.transcribe(videos.audio, model='base.en')
)
Added 1 column value with 0 errors.
1 row updated, 1 value computed.
# Extract the transcript text
videos.add_computed_column(
    transcript=videos.transcription.text
)
Added 1 column value with 0 errors.
1 row updated, 1 value computed.
# View the full pipeline results
videos.select(videos.title, videos.transcript).collect()
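whisper.transcribe returns a JSON object; .text was pulled out above, and other fields can be accessed the same way. A sketch, assuming the output follows Whisper's usual structure with a segments list of timestamped chunks:
# Inspect per-segment timestamps from the raw transcription JSON
videos.select(videos.title, videos.transcription.segments).collect()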

Explanation

Audio format options: the format argument of extract_audio controls the output encoding. This recipe uses mp3; wav and flac work the same way.
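A minimal sketch of choosing a different format; the audio_wav column name is just for illustration, and format support may vary with your ffmpeg build:
# Extract a lossless WAV track alongside the MP3 column
videos.add_computed_column(
    audio_wav=extract_audio(videos.video, format='wav')
)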
Pipeline flow:
Video → extract_audio → Audio → whisper.transcribe → Transcript
Each step is a computed column. When you insert a new video:
  1. Audio is extracted automatically
  2. Whisper transcribes the audio
  3. All results are cached for future queries
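Because every step is a computed column, adding more videos requires no extra code; each insert runs the full pipeline. A sketch with a placeholder URL (substitute a real video that has an audio track):
# New rows are extracted and transcribed automatically
videos.insert([{
    'title': 'Another Video',
    'video': 'https://example.com/path/to/another_video.mp4'
}])
videos.select(videos.title, videos.transcript).collect()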

See also