Transcribe audio files and generate summaries automatically using
Whisper and LLMs.
Problem
You have podcast episodes, meeting recordings, or interviews that need
both transcription and summarization. Doing this manually is
time-consuming and doesn’t scale.
Solution
What’s in this recipe:
- Transcribe audio with Whisper (runs locally)
- Generate summaries with an LLM
- Chain transcription → summarization automatically
You create a pipeline where audio is transcribed first, then the
transcript is summarized. Both steps run automatically when you insert
new audio files.
Setup
%pip install -qU pixeltable openai-whisper openai
import os
import getpass
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
import pixeltable as pxt
from pixeltable.functions import whisper, openai
# Create a fresh directory
pxt.drop_dir('podcast_demo', force=True)
pxt.create_dir('podcast_demo')
Created directory ‘podcast_demo’.
<pixeltable.catalog.dir.Dir at 0x30c117650>
Create the pipeline
Create a table with audio input, then add computed columns for
transcription and summarization:
# Create table for audio files
podcasts = pxt.create_table(
    'podcast_demo.episodes',
    {'title': pxt.String, 'audio': pxt.Audio}
)
Created table ‘episodes’.
# Step 1: Transcribe with local Whisper (uses GPU if available)
podcasts.add_computed_column(
    transcription=whisper.transcribe(podcasts.audio, model='base.en')
)
Added 0 column values with 0 errors.
No rows affected.
# Extract the text from transcription result (cast to String for concatenation)
podcasts.add_computed_column(
    transcript_text=podcasts.transcription.text.astype(pxt.String)
)
Added 0 column values with 0 errors.
No rows affected.
# Step 2: Summarize the transcript with OpenAI
summary_prompt = '''Summarize this transcript in 2-3 sentences, then list 3 key points.
Transcript:
''' + podcasts.transcript_text
podcasts.add_computed_column(
    summary_response=openai.chat_completions(
        messages=[{'role': 'user', 'content': summary_prompt}],
        model='gpt-4o-mini'
    )
)
Added 0 column values with 0 errors.
No rows affected.
# Extract summary text from response
podcasts.add_computed_column(
    summary=podcasts.summary_response.choices[0].message.content
)
Added 0 column values with 0 errors.
No rows affected.
Process audio files
Insert audio files and watch the pipeline run automatically:
# Insert sample audio
audio_url = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/10-minute%20tour%20of%20Pixeltable.mp3'
podcasts.insert([{
    'title': 'Pixeltable Tour',
    'audio': audio_url
}])
Inserting rows into `episodes`: 1 rows [00:00, 185.18 rows/s]
Inserted 1 row with 0 errors.
1 row inserted, 8 values computed.
# View transcript
podcasts.select(podcasts.title, podcasts.transcript_text).collect()
# View summary
podcasts.select(podcasts.title, podcasts.summary).collect()
Explanation
Pipeline architecture:
Audio → Whisper transcription → Transcript text → LLM summarization → Summary
Each step is a computed column that depends on the previous one. When
you insert a new audio file, all steps run automatically in sequence.
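For example, inserting a second episode recomputes every column in the chain for just that row. The title and file path below are placeholders; point them at your own audio:
# Insert another episode: transcription, transcript_text, summary_response,
# and summary are all computed for the new row as part of the insert.
podcasts.insert([{
    'title': 'Team Meeting',
    'audio': '/path/to/meeting_recording.mp3'
}])

# Both the original and the new episode now have summaries.
podcasts.select(podcasts.title, podcasts.summary).collect()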
Whisper model options:
The example uses base.en, which keeps transcription fast on modest hardware. For production use with varied audio quality, switch to small.en or a larger model, as sketched below.
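A minimal sketch of the same transcription step with a larger model; it assumes you define the column this way instead of the base.en version above (a computed column is defined once, so choose the model before adding it):
# Same pipeline step as above, but with the larger, more accurate small.en model.
podcasts.add_computed_column(
    transcription=whisper.transcribe(podcasts.audio, model='small.en')
)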
See also