Open in Kaggle  Open in Colab  Download Notebook
This documentation page is also available as an interactive notebook. You can launch the notebook in Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the above links.
Extract the audio track from video files for transcription, analysis, or processing.

Problem

You have video files but need to work with just the audio track—for transcription, speaker analysis, or audio processing. Extracting audio manually with ffmpeg is tedious and doesn’t integrate with your data pipeline.

Solution

What’s in this recipe:
  • Extract audio from video as a computed column
  • Choose audio format (mp3, wav, flac)
  • Chain with transcription for automatic video-to-text
Use the extract_audio function to add a computed audio column derived from the video column. Because the result is an ordinary Pixeltable column, it plugs directly into transcription and other audio processing steps.

Setup

%pip install -qU pixeltable
import pixeltable as pxt
from pixeltable.functions.video import extract_audio
# Create a fresh directory
pxt.drop_dir('audio_extract_demo', force=True)
pxt.create_dir('audio_extract_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘audio_extract_demo’.
<pixeltable.catalog.dir.Dir at 0x1061fc510>

Extract audio from video

# Create table for videos
videos = pxt.create_table(
    'audio_extract_demo.videos',
    {'title': pxt.String, 'video': pxt.Video}
)
Created table ‘videos’.
# Add computed column to extract audio as MP3
videos.add_computed_column(
    audio=extract_audio(videos.video, format='mp3')
)
Added 0 column values with 0 errors.
No rows affected.
# Insert a sample video (from multimedia-commons with audio)
video_url = 's3://multimedia-commons/data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4'

videos.insert([{
    'title': 'Sample Video',
    'video': video_url
}])
Inserting rows into `videos`: 1 rows [00:00, 207.52 rows/s]
Inserted 1 row with 0 errors.
1 row inserted, 4 values computed.
# View results
videos.select(videos.title, videos.audio).collect()
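Each entry in the audio column references a media file managed by Pixeltable. If you need the file's location, media columns expose fileurl and localpath properties in select expressions. A minimal sketch, assuming these properties are available in your Pixeltable version:
# Show where the extracted audio files live
videos.select(videos.title, videos.audio.fileurl, videos.audio.localpath).collect()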

Chain with transcription

Add transcription as a follow-up computed column:
# Install whisper for transcription
%pip install -qU openai-whisper
from pixeltable.functions import whisper

# Add transcription of the extracted audio
videos.add_computed_column(
    transcription=whisper.transcribe(videos.audio, model='base.en')
)
Added 1 column value with 0 errors.
1 row updated, 1 value computed.
# Extract the transcript text
videos.add_computed_column(
    transcript=videos.transcription.text
)
Added 1 column value with 0 errors.
1 row updated, 1 value computed.
# View the full pipeline results
videos.select(videos.title, videos.transcript).collect()
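whisper.transcribe returns a JSON object; .text was pulled out above, and other fields can be accessed the same way. A sketch, assuming the output follows Whisper's usual structure with a segments list of timestamped chunks:
# Inspect per-segment timestamps from the raw transcription JSON
videos.select(videos.title, videos.transcription.segments).collect()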

Explanation

Audio format options: the format argument of extract_audio controls the output encoding. This recipe uses mp3; wav and flac work the same way.
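A minimal sketch of choosing a different format; the audio_wav column name is just for illustration, and format support may vary with your ffmpeg build:
# Extract a lossless WAV track alongside the MP3 column
videos.add_computed_column(
    audio_wav=extract_audio(videos.video, format='wav')
)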
Pipeline flow:
Video → extract_audio → Audio → whisper.transcribe → Transcript
Each step is a computed column. When you insert a new video:
  1. Audio is extracted automatically
  2. Whisper transcribes the audio
  3. All results are cached for future queries
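Because every step is a computed column, adding more videos requires no extra code; each insert runs the full pipeline. A sketch with a placeholder URL (substitute a real video that has an audio track):
# New rows are extracted and transcribed automatically
videos.insert([{
    'title': 'Another Video',
    'video': 'https://example.com/path/to/another_video.mp4'
}])
videos.select(videos.title, videos.transcript).collect()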

See also