Open in Kaggle  Open in Colab  Download Notebook
This documentation page is also available as an interactive notebook. You can launch the notebook in Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the above links.
Transcribe audio files and generate summaries automatically using Whisper and LLMs.

Problem

You have podcast episodes, meeting recordings, or interviews that need both transcription and summarization. Doing this manually is time-consuming and doesn’t scale.

Solution

What’s in this recipe:
  • Transcribe audio with Whisper (runs locally)
  • Generate summaries with an LLM
  • Chain transcription → summarization automatically
You create a pipeline where audio is transcribed first, then the transcript is summarized. Both steps run automatically when you insert new audio files.

Setup

%pip install -qU pixeltable openai-whisper openai
import os
import getpass

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
import pixeltable as pxt
from pixeltable.functions import whisper, openai
# Create a fresh directory
pxt.drop_dir('podcast_demo', force=True)
pxt.create_dir('podcast_demo')
Created directory `podcast_demo`.
<pixeltable.catalog.dir.Dir at 0x30c117650>

Create the pipeline

Create a table with audio input, then add computed columns for transcription and summarization:
# Create table for audio files
podcasts = pxt.create_table(
    'podcast_demo.episodes',
    {'title': pxt.String, 'audio': pxt.Audio}
)
Created table `episodes`.
# Step 1: Transcribe with local Whisper (uses GPU if available)
podcasts.add_computed_column(
    transcription=whisper.transcribe(podcasts.audio, model='base.en')
)
Added 0 column values with 0 errors.
No rows affected.
# Extract the text from transcription result (cast to String for concatenation)
podcasts.add_computed_column(
    transcript_text=podcasts.transcription.text.astype(pxt.String)
)
Added 0 column values with 0 errors.
No rows affected.
# Step 2: Summarize the transcript with OpenAI
summary_prompt = '''Summarize this transcript in 2-3 sentences, then list 3 key points.

Transcript:
''' + podcasts.transcript_text

podcasts.add_computed_column(
    summary_response=openai.chat_completions(
        messages=[{'role': 'user', 'content': summary_prompt}],
        model='gpt-4o-mini'
    )
)
Added 0 column values with 0 errors.
No rows affected.
# Extract summary text from response
podcasts.add_computed_column(
    summary=podcasts.summary_response.choices[0].message.content
)
Added 0 column values with 0 errors.
No rows affected.
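The transcription column stores Whisper's full output as JSON, so you can extract more than the plain text. As a sketch (the segments field comes from openai-whisper's standard output format and is an assumption here, not something this recipe relies on), per-segment timestamps could be exposed the same way:
# Optional: expose Whisper's per-segment timestamps (start/end/text) as a column.
# 'segments' is part of openai-whisper's usual output dict; verify it exists in your version.
podcasts.add_computed_column(
    segments=podcasts.transcription.segments
)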

Process audio files

Insert audio files and watch the pipeline run automatically:
# Insert sample audio
audio_url = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/10-minute%20tour%20of%20Pixeltable.mp3'

podcasts.insert([{
    'title': 'Pixeltable Tour',
    'audio': audio_url
}])
Inserting rows into `episodes`: 1 rows [00:00, 185.18 rows/s]
Inserted 1 row with 0 errors.
1 row inserted, 8 values computed.
# View transcript
podcasts.select(podcasts.title, podcasts.transcript_text).collect()
# View summary
podcasts.select(podcasts.title, podcasts.summary).collect()
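Because every step is a computed column, adding more episodes needs no extra code; each new row is transcribed and summarized on insert. A minimal sketch (the title and file path below are placeholders, not files shipped with this recipe):
# Insert another episode; the pipeline runs for the new row only
podcasts.insert([{
    'title': 'Weekly Standup',            # placeholder title
    'audio': '/path/to/your/episode.mp3'  # placeholder: local path or URL
}])

# Query summaries across all episodes
podcasts.select(podcasts.title, podcasts.summary).collect()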

Explanation

Pipeline architecture:
Audio → Whisper transcription → Transcript text → LLM summarization → Summary
Each step is a computed column that depends on the previous one. When you insert a new audio file, all steps run automatically in sequence.

Whisper model options: this recipe uses base.en, the small English-only model, which is fast and accurate enough for clear speech. For production use with varied audio quality, use small.en or larger.
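As a sketch, the Step 1 cell above would look like this with a larger model (use one definition or the other; the table has a single transcription column):
# Alternative Step 1: slower but more robust on noisy or varied audio
podcasts.add_computed_column(
    transcription=whisper.transcribe(podcasts.audio, model='small.en')
)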

See also