udf transcribe()
Transcribes an audio file with the WhisperX transcribe function, as described in the WhisperX library documentation.
If diarize=True, then speaker diarization will also be performed. Several of the UDF parameters are only valid if diarize=True, as documented in the parameters list below.
Requirements:
pip install whisperx
Parameters:
audio (Audio): The audio file to transcribe.
model (String): The name of the model to use for transcription.
diarize (Bool): Whether to perform speaker diarization.
compute_type (String | None): The compute type to use for the model (e.g., 'int8', 'float16'). If None, defaults to 'float16' on CUDA devices and 'int8' otherwise.
language (String | None): The language code for the transcription (e.g., 'en' for English).
task (String | None): The task to perform (e.g., 'transcribe' or 'translate'). Defaults to 'transcribe'.
chunk_size (Int | None): The size of the audio chunks to process, in seconds. Defaults to 30.
alignment_model_name (String | None): The name of the alignment model to use. If None, uses the default model for the given language. Only valid if diarize=True.
interpolate_method (String | None): The method to use for interpolation of the alignment results. If not specified, uses the WhisperX default ('nearest'). Only valid if diarize=True.
return_char_alignments (Bool | None): Whether to return character-level alignments. Defaults to False. Only valid if diarize=True.
diarization_model_name (String | None): The name of the diarization model to use. Defaults to pyannote/speaker-diarization-3.1. Only valid if diarize=True.
num_speakers (Int | None): The number of speakers to expect in the audio. By default, the model will try to detect the number of speakers. Only valid if diarize=True.
min_speakers (Int | None): If specified, the minimum number of speakers to expect in the audio. Only valid if diarize=True.
max_speakers (Int | None): If specified, the maximum number of speakers to expect in the audio. Only valid if diarize=True.
Returns:
Json: A dictionary containing the audio transcription, diarization (if enabled), and various other metadata.
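The parameter rules above (several arguments are only valid if diarize=True, and compute_type falls back to a device-dependent default) can be sketched in plain Python. This is an illustration only: `check_transcribe_kwargs` and `resolve_compute_type` are hypothetical helpers, not part of Pixeltable or WhisperX, and this page does not say how the UDF itself reacts to an invalid combination.

```python
from typing import Any, Optional

# Parameters the documentation marks as "Only valid if diarize=True".
DIARIZE_ONLY_PARAMS = (
    'alignment_model_name',
    'interpolate_method',
    'return_char_alignments',
    'diarization_model_name',
    'num_speakers',
    'min_speakers',
    'max_speakers',
)


def check_transcribe_kwargs(diarize: bool = False, **kwargs: Any) -> None:
    """Hypothetical pre-flight check: reject diarization-only arguments
    when diarize is False. Arguments left as None are treated as unset."""
    if diarize:
        return
    offending = sorted(k for k in DIARIZE_ONLY_PARAMS if kwargs.get(k) is not None)
    if offending:
        raise ValueError(f'only valid if diarize=True: {", ".join(offending)}')


def resolve_compute_type(compute_type: Optional[str], on_cuda: bool) -> str:
    """Mirror the documented default: 'float16' on CUDA devices, 'int8' otherwise."""
    if compute_type is not None:
        return compute_type
    return 'float16' if on_cuda else 'int8'
```

Running a check like this before building the column expression surfaces a misplaced argument such as min_speakers early, rather than during transcription.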
Add a computed column that applies the model tiny.en to an existing Pixeltable column tbl.audio of the table tbl:
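A minimal sketch of that call, wrapped in a function so it can be read on its own. It assumes the UDF is importable from `pixeltable.functions.whisperx` and that `tbl` is a Pixeltable table with an `audio` column; the function name and the output column name `transcription` are hypothetical.

```python
def add_transcription_column(tbl):
    """Add a computed column holding the WhisperX transcription of tbl.audio."""
    # Imported lazily so the sketch can be loaded without whisperx installed.
    # Assumption: the UDF lives in pixeltable.functions.whisperx.
    from pixeltable.functions.whisperx import transcribe

    # 'transcription' is a hypothetical column name; any valid name works.
    tbl.add_computed_column(transcription=transcribe(tbl.audio, model='tiny.en'))
```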
Add a computed column that applies the model tiny.en to an existing Pixeltable column tbl.audio of the table tbl, with speaker diarization enabled, expecting at least 2 speakers:
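A minimal sketch of the diarized variant, under the same assumptions: the UDF is importable from `pixeltable.functions.whisperx`, `tbl` is a Pixeltable table with an `audio` column, and the function name and the column name `diarized_transcription` are hypothetical.

```python
def add_diarized_transcription_column(tbl):
    """Add a computed column with transcription plus speaker diarization."""
    # Imported lazily so the sketch can be loaded without whisperx installed.
    # Assumption: the UDF lives in pixeltable.functions.whisperx.
    from pixeltable.functions.whisperx import transcribe

    tbl.add_computed_column(
        # 'diarized_transcription' is a hypothetical column name.
        diarized_transcription=transcribe(
            tbl.audio,
            model='tiny.en',
            diarize=True,      # enables the diarization-only parameters
            min_speakers=2,    # expect at least 2 speakers
        )
    )
```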