Transcribe audio files and generate summaries automatically using
Whisper and LLMs.
Problem
You have podcast episodes, meeting recordings, or interviews that need
both transcription and summarization. Doing this manually is
time-consuming and doesn’t scale.
Solution
What’s in this recipe:
- Transcribe audio with Whisper (runs locally)
- Generate summaries with an LLM
- Chain transcription → summarization automatically
You create a pipeline where audio is transcribed first, then the
transcript is summarized. Both steps run automatically when you insert
new audio files.
Setup
%pip install -qU pixeltable openai-whisper openai
import os
import getpass
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
import pixeltable as pxt
from pixeltable.functions import whisper, openai
# Create a fresh directory
pxt.drop_dir('podcast_demo', force=True)
pxt.create_dir('podcast_demo')
Created directory ‘podcast_demo’.
<pixeltable.catalog.dir.Dir at 0x30c117650>
Create the pipeline
Create a table with audio input, then add computed columns for
transcription and summarization:
# Create table for audio files
podcasts = pxt.create_table(
    'podcast_demo.episodes',
    {'title': pxt.String, 'audio': pxt.Audio}
)
Created table ‘episodes’.
# Step 1: Transcribe with local Whisper (uses GPU if available)
podcasts.add_computed_column(
    transcription=whisper.transcribe(podcasts.audio, model='base.en')
)
Added 0 column values with 0 errors.
No rows affected.
# Extract the text from transcription result (cast to String for concatenation)
podcasts.add_computed_column(
    transcript_text=podcasts.transcription.text.astype(pxt.String)
)
Added 0 column values with 0 errors.
No rows affected.
# Step 2: Summarize the transcript with OpenAI
summary_prompt = '''Summarize this transcript in 2-3 sentences, then list 3 key points.
Transcript:
''' + podcasts.transcript_text
podcasts.add_computed_column(
    summary_response=openai.chat_completions(
        messages=[{'role': 'user', 'content': summary_prompt}],
        model='gpt-4o-mini'
    )
)
Added 0 column values with 0 errors.
No rows affected.
# Extract summary text from response
podcasts.add_computed_column(
    summary=podcasts.summary_response.choices[0].message.content
)
Added 0 column values with 0 errors.
No rows affected.
Process audio files
Insert audio files and watch the pipeline run automatically:
# Insert sample audio
audio_url = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/10-minute%20tour%20of%20Pixeltable.mp3'
podcasts.insert([{
    'title': 'Pixeltable Tour',
    'audio': audio_url
}])
Inserting rows into `episodes`: 1 rows [00:00, 185.18 rows/s]
Inserted 1 row with 0 errors.
1 row inserted, 8 values computed.
# View transcript
podcasts.select(podcasts.title, podcasts.transcript_text).collect()
# View summary
podcasts.select(podcasts.title, podcasts.summary).collect()
Explanation
Pipeline architecture:
Audio → Whisper transcription → Transcript text → LLM summarization → Summary
Each step is a computed column that depends on the previous one. When
you insert a new audio file, all steps run automatically in sequence.
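For example, inserting a second episode recomputes every column in the chain for just that row. The title and file path below are placeholders; point them at your own audio:
# Insert another episode: transcription, transcript_text, summary_response,
# and summary are all computed for the new row as part of the insert.
podcasts.insert([{
    'title': 'Team Meeting',
    'audio': '/path/to/meeting_recording.mp3'
}])

# Both the original and the new episode now have summaries.
podcasts.select(podcasts.title, podcasts.summary).collect()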
Whisper model options:
The example uses base.en, which keeps transcription fast on modest hardware. For production use with varied audio quality, switch to small.en or a larger model, as sketched below.
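A minimal sketch of the same transcription step with a larger model; it assumes you define the column this way instead of the base.en version above (a computed column is defined once, so choose the model before adding it):
# Same pipeline step as above, but with the larger, more accurate small.en model.
podcasts.add_computed_column(
    transcription=whisper.transcribe(podcasts.audio, model='small.en')
)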
See also