UDFs
transcribe()
udf
Transcribe an audio file using WhisperX.
This UDF runs a transcription model locally using the WhisperX library, equivalent to the WhisperX `transcribe` function, as described in the WhisperX library documentation.
If `diarize=True`, then speaker diarization will also be performed. Several of the UDF parameters are only valid if `diarize=True`, as documented in the parameters list below.
Requirements:
pip install whisperx
Parameters:
- `audio` (Audio): The audio file to transcribe.
- `model` (String): The name of the model to use for transcription.
- `diarize` (Bool): Whether to perform speaker diarization.
- `compute_type` (Optional[String]): The compute type to use for the model (e.g., `'int8'`, `'float16'`). If `None`, defaults to `'float16'` on CUDA devices and `'int8'` otherwise.
- `language` (Optional[String]): The language code for the transcription (e.g., `'en'` for English).
- `task` (Optional[String]): The task to perform (e.g., `'transcribe'` or `'translate'`). Defaults to `'transcribe'`.
- `chunk_size` (Optional[Int]): The size of the audio chunks to process, in seconds. Defaults to `30`.
- `alignment_model_name` (Optional[String]): The name of the alignment model to use. If `None`, uses the default model for the given language. Only valid if `diarize=True`.
- `interpolate_method` (Optional[String]): The method to use for interpolation of the alignment results. If not specified, uses the WhisperX default (`'nearest'`). Only valid if `diarize=True`.
- `return_char_alignments` (Optional[Bool]): Whether to return character-level alignments. Defaults to `False`. Only valid if `diarize=True`.
- `diarization_model_name` (Optional[String]): The name of the diarization model to use. Defaults to `pyannote/speaker-diarization-3.1`. Only valid if `diarize=True`.
- `num_speakers` (Optional[Int]): The number of speakers to expect in the audio. By default, the model will try to detect the number of speakers. Only valid if `diarize=True`.
- `min_speakers` (Optional[Int]): If specified, the minimum number of speakers to expect in the audio. Only valid if `diarize=True`.
- `max_speakers` (Optional[Int]): If specified, the maximum number of speakers to expect in the audio. Only valid if `diarize=True`.
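The rule that some parameters are only valid with `diarize=True` can be checked before a call is built. The following is a minimal client-side sketch, not the library's own validation: the helper name `check_transcribe_kwargs` and the use of `ValueError` are assumptions made for illustration, and only the parameter names come from the list above.

```python
# Parameters documented above as "Only valid if diarize=True".
DIARIZATION_ONLY = (
    'alignment_model_name',
    'interpolate_method',
    'return_char_alignments',
    'diarization_model_name',
    'num_speakers',
    'min_speakers',
    'max_speakers',
)


def check_transcribe_kwargs(diarize=False, **kwargs):
    """Hypothetical helper: validate keyword arguments intended for transcribe().

    Raises ValueError if a diarization-only parameter is set while diarize is
    False; otherwise returns the keyword arguments, including diarize, as a dict.
    """
    if not diarize:
        misused = sorted(name for name in DIARIZATION_ONLY if kwargs.get(name) is not None)
        if misused:
            raise ValueError(f"only valid if diarize=True: {', '.join(misused)}")
    return {'diarize': diarize, **kwargs}
```

The returned dictionary could then be unpacked into the UDF call, for example `transcribe(tbl.audio, **check_transcribe_kwargs(model='tiny.en'))`.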
Returns:
- Json: A dictionary containing the audio transcription, diarization (if enabled), and various other metadata.
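The exact layout of the returned dictionary is not specified here. The sketch below assumes it follows the usual WhisperX output shape, with a `segments` list whose entries carry `start`, `end`, and `text`, plus a `speaker` label when diarization is enabled. The helper name `format_segments` is hypothetical, and the keys should be verified against an actual result before relying on them.

```python
def format_segments(result):
    """Hypothetical helper: render one line per segment as '[start-end] speaker: text'.

    Assumes a WhisperX-style dictionary with a 'segments' list; the 'speaker'
    key is assumed to be present only when diarization was enabled.
    """
    lines = []
    for seg in result.get('segments', []):
        speaker = seg.get('speaker', 'unknown')
        text = seg.get('text', '').strip()
        lines.append(f"[{seg['start']:.2f}-{seg['end']:.2f}] {speaker}: {text}")
    return lines
```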
Examples:
Add a computed column that applies the model `tiny.en` to an existing Pixeltable column `tbl.audio` of the table `tbl`:
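A minimal sketch of this call, assuming `tbl` is an existing Pixeltable table with an `audio` column and that the UDF is importable from `pixeltable.functions.whisperx`. The wrapper function and the output column name `result` are illustrative choices, not part of the documented API.

```python
def add_transcription_column(tbl):
    """Add a computed column 'result' holding the transcription of tbl.audio."""
    # Imported lazily so that defining this helper does not itself require
    # pixeltable and whisperx to be installed.
    from pixeltable.functions.whisperx import transcribe

    # 'result' is an illustrative column name; any valid column name works.
    tbl.add_computed_column(result=transcribe(tbl.audio, model='tiny.en'))
```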
Add a computed column that applies the model `tiny.en` to an existing Pixeltable column `tbl.audio` of the table `tbl`, with speaker diarization enabled, expecting at least 2 speakers:
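A minimal sketch of the diarized variant, under the same assumptions: `tbl` is an existing Pixeltable table with an `audio` column, and the UDF is importable from `pixeltable.functions.whisperx`. The wrapper function and the column name `result` are illustrative. `min_speakers` is one of the parameters that is only valid together with `diarize=True`.

```python
def add_diarized_transcription_column(tbl):
    """Add a computed column 'result' with transcription plus speaker diarization."""
    # Imported lazily so that defining this helper does not itself require
    # pixeltable and whisperx to be installed.
    from pixeltable.functions.whisperx import transcribe

    tbl.add_computed_column(
        result=transcribe(
            tbl.audio,
            model='tiny.en',
            diarize=True,    # enables speaker diarization
            min_speakers=2,  # expect at least 2 speakers
        )
    )
```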