| File | Duration | Challenge |
| --- | --- | --- |
| podcast.mp3 | 60 min | Too long to process at once |
| interview.mp4 | 30 min | Need to extract audio first |
| meeting.wav | 2 hours | Must segment for memory efficiency |

| segment_start | segment_end |
| --- | --- |
| 0. | 30. |
| 28.003 | 58.003 |

| segment_start | segment_end | text |
| --- | --- | --- |
| 0. | 30. | of experiencing self versus remembering self. I was hoping you can give a simple answer of how we should live life. Based on the fact that our memories could be a source of happiness or could be the primary source of happiness, that an event when experienced bears its fruits the most when it's remembered over and over and over and over. |
| 28.003 | 58.003 | over and over and over and over and maybe there is some wisdom in the fact that we can control to some degree how we remember how we evolve our memory of it such that it can maximize the long-term happiness of that repeated experience. Okay, well first I'll say I wish I could take you on the road with me. That was such a great description. Can I be your opening ax? Oh my God, no, I'm going to open for you dude. Otherwise it's like, you know, everybody leaves. |

| Model | Speed | Quality | Best for |
| --- | --- | --- | --- |
| `tiny.en` | Fastest | Basic | Quick tests |
| `base.en` | Fast | Good | General use |
| `small.en` | Medium | Better | Higher accuracy |
| `medium.en` | Slow | Great | Professional quality |
| `large` | Slowest | Best | Maximum accuracy |
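
The naming pattern in the model table can be captured in a small helper. This is only an illustrative sketch: `whisper_model_name` is a hypothetical function, not part of Pixeltable or Whisper, and it assumes the sizes listed above, where every size except `large` has an English-only `.en` variant.

```python
# Hypothetical helper (not a Pixeltable or Whisper API): builds a model name
# from a size and a language preference, following the table above.
ENGLISH_ONLY_SIZES = ("tiny", "base", "small", "medium")


def whisper_model_name(size: str, english_only: bool = True) -> str:
    """Return a Whisper model name such as 'base.en' or 'base'."""
    if size == "large":
        # 'large' has no '.en' variant; it is always multilingual.
        return "large"
    if size not in ENGLISH_ONLY_SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    return f"{size}.en" if english_only else size


print(whisper_model_name("base"))                      # base.en
print(whisper_model_name("small", english_only=False)) # small
```

The returned string is what you would pass as the `model` argument of `whisper.transcribe`.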

| Parameter | Description |
| --- | --- |
| `duration` | Duration of each segment in seconds |
| `overlap` | Overlap between segments (helps with word boundaries) |
| `min_segment_duration` | Drop the last segment if shorter than this |

## Solution

**What’s in this recipe:**

* Transcribe audio files locally with Whisper (no API key)
* Automatically segment long files
* Extract and transcribe audio from videos

You create a view with `audio_splitter` to break long files into segments, then add a computed column for transcription. Whisper runs locally on your machine; no API calls needed.

### Setup

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai-whisper
```

### Load audio files

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions import whisper
from pixeltable.functions.audio import audio_splitter

# Create a fresh directory
pxt.drop_dir('audio_demo', force=True)
pxt.create_dir('audio_demo')
```

  Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
  Converting metadata from version 45 to 46
  Created directory 'audio\_demo'.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create table for audio files
audio = pxt.create_table('audio_demo/files', {'audio': pxt.Audio})
```

  Created table 'files'.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert a sample audio file (video files also work - audio is extracted automatically)
audio.insert(
    [
        {
            'audio': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/audio-transcription-demo/Lex-Fridman-Podcast-430-Excerpt-0.mp4'
        }
    ]
)
```

  Inserted 1 row with 0 errors in 1.05 s (0.95 rows/s)
  1 row inserted.

### Split into segments

Create a view that splits audio into 30-second segments with overlap:

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Split audio into segments for transcription
segments = pxt.create_view(
    'audio_demo/segments',
    audio,
    iterator=audio_splitter(
        audio.audio,
        duration=30.0,  # 30-second segments
        overlap=2.0,  # 2-second overlap for context
        min_segment_duration=5.0,  # Drop segments shorter than 5 seconds
    ),
)
```

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View the segments
segments.select(segments.segment_start, segments.segment_end).collect()
```
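
To build intuition for how `duration`, `overlap`, and `min_segment_duration` interact, here is a minimal pure-Python sketch of overlapping window arithmetic. It is not Pixeltable's implementation: `segment_bounds` is a hypothetical helper, the 60-second total length is an assumed value, and real boundaries can differ slightly (the recipe's output shows 28.003 rather than 28.0).

```python
# Illustrative only: idealized window arithmetic, not audio_splitter itself.
def segment_bounds(total_seconds, duration=30.0, overlap=2.0, min_segment_duration=5.0):
    """Return (start, end) pairs for overlapping fixed-length windows."""
    if not 0 <= overlap < duration:
        raise ValueError("overlap must be non-negative and smaller than duration")
    step = duration - overlap  # each window starts this far after the previous one
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + duration, total_seconds)
        if end - start < min_segment_duration:
            break  # trailing segment is too short, so drop it
        bounds.append((start, end))
        if end >= total_seconds:
            break  # reached the end of the audio
        start += step
    return bounds


# Assumed 60-second file: the third window (56 s to 60 s) is only 4 s long
# and is dropped because it is shorter than min_segment_duration.
print(segment_bounds(60.0))  # [(0.0, 30.0), (28.0, 58.0)]
```

Each window begins `duration - overlap` seconds after the previous one, which is why the second segment starts near 28 seconds rather than 30.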

### Transcribe with Whisper

Add a computed column that transcribes each segment:

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add transcription column (runs locally - no API key needed)
segments.add_computed_column(
    transcription=whisper.transcribe(
        audio=segments.audio_segment,
        model='base.en',  # Options: tiny.en, base.en, small.en, medium.en, large
    )
)
```

  Added 2 column values with 0 errors in 3.35 s (0.60 rows/s)
  2 rows updated.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract just the text
segments.add_computed_column(text=segments.transcription.text)
```

  Added 2 column values with 0 errors in 0.06 s (31.82 rows/s)
  2 rows updated.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View transcriptions with timestamps
segments.select(
    segments.segment_start, segments.segment_end, segments.text
).collect()
```
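
Because consecutive segments overlap by 2 seconds, the end of one transcript tends to reappear at the start of the next (the repeated "over and over" in the sample output). If you want a single continuous transcript, one possible approach is to trim the longest run of matching words at each boundary. The sketch below is a heuristic in plain Python, not a Pixeltable feature; `stitch_transcripts` and `max_overlap_words` are hypothetical names, and it can over-trim when a speaker genuinely repeats a phrase.

```python
import string


def _normalize(word):
    # Compare words case-insensitively and without surrounding punctuation.
    return word.strip(string.punctuation).lower()


def stitch_transcripts(texts, max_overlap_words=20):
    """Join segment texts, skipping words duplicated at segment boundaries."""
    merged = []
    for text in texts:
        words = text.split()
        limit = min(max_overlap_words, len(merged), len(words))
        tail = [_normalize(w) for w in merged[-limit:]] if limit else []
        head = [_normalize(w) for w in words[:limit]]
        skip = 0
        # Look for the longest suffix of the merged text that matches
        # a prefix of the new segment.
        for k in range(limit, 0, -1):
            if tail[-k:] == head[:k]:
                skip = k
                break
        merged.extend(words[skip:])
    return " ".join(merged)


print(stitch_transcripts([
    "it's remembered over and over and over and over.",
    "over and over and over and over and maybe there is",
]))
# it's remembered over and over and over and over. and maybe there is
```

You could feed it the `text` values collected from the view, ordered by `segment_start`.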

## Explanation

**Whisper models:**

Models ending in `.en` are English-only and faster. Drop the `.en` suffix (for example, `base` instead of `base.en`) for multilingual support.

**audio\_splitter parameters:**

**Video files work too:**

When you insert a video file, Pixeltable automatically extracts the audio track.

## See also

* [Iterators documentation](/platform/iterators)
* [Whisper library](https://github.com/openai/whisper)