UDFs
transcribe() udf
Transcribe an audio file using WhisperX.
This UDF runs a transcription model locally using the WhisperX library, equivalent to the WhisperX transcribe function, as described in the WhisperX library documentation.
If diarize=True, then speaker diarization will also be performed. Several of the UDF parameters are only valid if diarize=True, as documented in the parameters list below.
Requirements:
pip install whisperx
Parameters:
- audio (Audio): The audio file to transcribe.
- model (String): The name of the model to use for transcription.
- diarize (Bool): Whether to perform speaker diarization.
- compute_type (Optional[String]): The compute type to use for the model (e.g., 'int8', 'float16'). If None, defaults to 'float16' on CUDA devices and 'int8' otherwise.
- language (Optional[String]): The language code for the transcription (e.g., 'en' for English).
- task (Optional[String]): The task to perform (e.g., 'transcribe' or 'translate'). Defaults to 'transcribe'.
- chunk_size (Optional[Int]): The size of the audio chunks to process, in seconds. Defaults to 30.
- alignment_model_name (Optional[String]): The name of the alignment model to use. If None, uses the default model for the given language. Only valid if diarize=True.
- interpolate_method (Optional[String]): The method to use for interpolation of the alignment results. If not specified, uses the WhisperX default ('nearest'). Only valid if diarize=True.
- return_char_alignments (Optional[Bool]): Whether to return character-level alignments. Defaults to False. Only valid if diarize=True.
- diarization_model_name (Optional[String]): The name of the diarization model to use. Defaults to pyannote/speaker-diarization-3.1. Only valid if diarize=True.
- num_speakers (Optional[Int]): The number of speakers to expect in the audio. By default, the model will try to detect the number of speakers. Only valid if diarize=True.
- min_speakers (Optional[Int]): If specified, the minimum number of speakers to expect in the audio. Only valid if diarize=True.
- max_speakers (Optional[Int]): If specified, the maximum number of speakers to expect in the audio. Only valid if diarize=True.
Returns:
- Json: A dictionary containing the audio transcription, diarization (if enabled), and various other metadata.
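Because several parameters are only valid if diarize=True, it can help to check a set of arguments before passing them to the UDF. The following is a minimal sketch of such a check. The helper name check_transcribe_kwargs and the constant DIARIZE_ONLY_PARAMS are hypothetical and not part of Pixeltable or WhisperX; only the parameter names come from the list above. How the UDF itself reacts to such arguments is not stated here.

```python
# Parameter names are taken from the parameter list above.
# The helper itself is a hypothetical pre-flight check, not a Pixeltable API.
DIARIZE_ONLY_PARAMS = (
    'alignment_model_name',
    'interpolate_method',
    'return_char_alignments',
    'diarization_model_name',
    'num_speakers',
    'min_speakers',
    'max_speakers',
)


def check_transcribe_kwargs(diarize=False, **kwargs):
    """Return the diarize-only parameters that were set while diarize is False."""
    if diarize:
        # With diarization enabled, every parameter in the list is allowed.
        return []
    # Keep the documented order so the result is deterministic.
    return [name for name in DIARIZE_ONLY_PARAMS if kwargs.get(name) is not None]
```

For instance, check_transcribe_kwargs(model='tiny.en', min_speakers=2) reports min_speakers, since diarize was left at False, while the same call with diarize=True reports nothing.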
Add a computed column that applies the model tiny.en to an existing Pixeltable column tbl.audio of the table tbl:
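A minimal sketch of that call, assuming the UDF is importable from pixeltable.functions.whisperx and that tbl is an existing Pixeltable table with an Audio column named audio. The wrapper function and the column name transcript are illustrative choices, not taken from the original text.

```python
def add_transcript_column(tbl):
    """Add a computed column holding the WhisperX transcription of tbl.audio."""
    # Imported lazily so this sketch can be defined without the packages installed.
    # Assumed import path; requires `pip install pixeltable whisperx`.
    from pixeltable.functions import whisperx

    # 'transcript' is an illustrative column name.
    tbl.add_computed_column(
        transcript=whisperx.transcribe(tbl.audio, model='tiny.en')
    )
```

The computed column is then populated for existing rows and for rows inserted later, with one Json value per audio file.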
Add a computed column that applies the model tiny.en to an existing Pixeltable column tbl.audio of the table tbl, with speaker diarization enabled, expecting at least 2 speakers:
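The diarization variant can be sketched the same way, under the same assumptions: the import path pixeltable.functions.whisperx, an existing table tbl with an Audio column named audio, and an illustrative column name diarized_transcript. The parameters diarize and min_speakers are the ones documented in the parameter list.

```python
def add_diarized_transcript_column(tbl):
    """Add a computed column with transcription plus speaker diarization."""
    # Imported lazily so this sketch can be defined without the packages installed.
    # Assumed import path; requires `pip install pixeltable whisperx`.
    from pixeltable.functions import whisperx

    # 'diarized_transcript' is an illustrative column name.
    tbl.add_computed_column(
        diarized_transcript=whisperx.transcribe(
            tbl.audio,
            model='tiny.en',
            diarize=True,     # enables the diarize-only parameters
            min_speakers=2,   # expect at least 2 speakers
        )
    )
```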