module pixeltable.functions.whisperx
WhisperX audio transcription and diarization functions.

udf transcribe()
Signature
Transcribes an audio file using the WhisperX transcribe function, as described in the
WhisperX library documentation.
If diarize=True, then speaker diarization will also be performed. Several of the UDF parameters are only valid if
diarize=True, as documented in the parameters list below.
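Because several parameters only apply when diarization is enabled, it can help to check keyword arguments before building a computed column. The sketch below is a hypothetical helper, check_transcribe_kwargs, that is not part of Pixeltable; the set of diarization-only names is copied from the parameter list on this page, and a value of None is treated as "not set".

```python
# Parameter names that the transcribe() parameter list marks as
# "Only valid if diarize=True".
DIARIZE_ONLY_PARAMS = frozenset({
    'alignment_model_name',
    'interpolate_method',
    'return_char_alignments',
    'diarization_model_name',
    'num_speakers',
    'min_speakers',
    'max_speakers',
})


def check_transcribe_kwargs(diarize: bool, **kwargs: object) -> None:
    """Hypothetical pre-flight check, not part of Pixeltable.

    Raises ValueError if any diarization-only parameter has a non-None value
    while diarize is False.
    """
    if diarize:
        return
    misused = sorted(
        name for name, value in kwargs.items()
        if name in DIARIZE_ONLY_PARAMS and value is not None
    )
    if misused:
        raise ValueError(
            'these parameters are only valid if diarize=True: ' + ', '.join(misused)
        )


# Passes: diarization is enabled, so min_speakers is allowed.
check_transcribe_kwargs(True, model='tiny.en', min_speakers=2)

# Raises ValueError naming min_speakers, because diarize is False.
try:
    check_transcribe_kwargs(False, model='tiny.en', min_speakers=2)
except ValueError as exc:
    print(exc)
```

Treating None as unset mirrors the "pxt.X | None" parameter types; an explicit False for return_char_alignments is still flagged, which is a deliberate choice of this sketch rather than documented Pixeltable behavior.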
Requirements:
pip install whisperx
Parameters:
- audio (pxt.Audio): The audio file to transcribe.
- model (pxt.String): The name of the model to use for transcription.
- diarize (pxt.Bool): Whether to perform speaker diarization.
- compute_type (pxt.String | None): The compute type to use for the model (e.g., 'int8', 'float16'). If None, defaults to 'float16' on CUDA devices and 'int8' otherwise.
- language (pxt.String | None): The language code for the transcription (e.g., 'en' for English).
- task (pxt.String | None): The task to perform (e.g., 'transcribe' or 'translate'). Defaults to 'transcribe'.
- chunk_size (pxt.Int | None): The size of the audio chunks to process, in seconds. Defaults to 30.
- alignment_model_name (pxt.String | None): The name of the alignment model to use. If None, uses the default model for the given language. Only valid if diarize=True.
- interpolate_method (pxt.String | None): The method to use for interpolating the alignment results. If not specified, uses the WhisperX default ('nearest'). Only valid if diarize=True.
- return_char_alignments (pxt.Bool | None): Whether to return character-level alignments. Defaults to False. Only valid if diarize=True.
- diarization_model_name (pxt.String | None): The name of the diarization model to use. Defaults to pyannote/speaker-diarization-3.1. Only valid if diarize=True.
- num_speakers (pxt.Int | None): The number of speakers to expect in the audio. By default, the model will try to detect the number of speakers. Only valid if diarize=True.
- min_speakers (pxt.Int | None): If specified, the minimum number of speakers to expect in the audio. Only valid if diarize=True.
- max_speakers (pxt.Int | None): If specified, the maximum number of speakers to expect in the audio. Only valid if diarize=True.
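The defaults stated for compute_type, task, and chunk_size can be summarized in a few lines. This is a hypothetical helper, resolve_transcribe_defaults, written only to restate the documented defaults; Pixeltable's own default handling may be implemented differently, and the device string (such as 'cuda', 'cuda:0', or 'cpu') is an assumed input rather than something the UDF exposes.

```python
from __future__ import annotations


def resolve_transcribe_defaults(
    device: str,
    compute_type: str | None = None,
    task: str | None = None,
    chunk_size: int | None = None,
) -> dict[str, object]:
    """Fill in None arguments with the defaults listed for transcribe().

    Hypothetical helper for illustration only; `device` is a torch-style
    device string such as 'cuda', 'cuda:0', or 'cpu'.
    """
    if compute_type is None:
        # 'float16' on CUDA devices, 'int8' otherwise.
        compute_type = 'float16' if device.startswith('cuda') else 'int8'
    return {
        'compute_type': compute_type,
        'task': 'transcribe' if task is None else task,
        'chunk_size': 30 if chunk_size is None else chunk_size,
    }


print(resolve_transcribe_defaults('cpu'))
# prints {'compute_type': 'int8', 'task': 'transcribe', 'chunk_size': 30}
```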
Returns:
pxt.Json: A dictionary containing the audio transcription, diarization (if enabled), and various other metadata.
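This page does not spell out the exact layout of the returned dictionary. The sketch below assumes it follows WhisperX's usual output shape: a 'segments' list whose items carry 'start', 'end', and 'text', plus a 'speaker' label when diarization ran. Inspect a real result before relying on these keys. The function name format_transcript and the sample dictionary are hypothetical and hand-written for illustration, not actual model output.

```python
from __future__ import annotations

from typing import Any


def format_transcript(result: dict[str, Any]) -> str:
    """Render a transcribe() result as one line per segment.

    Assumes WhisperX-style keys: a 'segments' list whose items have 'start',
    'end', and 'text', plus 'speaker' when diarization ran. Segments without
    a 'speaker' key are labeled 'UNKNOWN'.
    """
    lines = []
    for seg in result.get('segments', []):
        speaker = seg.get('speaker', 'UNKNOWN')
        text = seg['text'].strip()
        lines.append(f"[{seg['start']:.2f}-{seg['end']:.2f}] {speaker}: {text}")
    return '\n'.join(lines)


# Hand-written sample in the assumed shape; not actual WhisperX output.
sample = {
    'segments': [
        {'start': 0.0, 'end': 1.5, 'text': ' Hello there.', 'speaker': 'SPEAKER_00'},
        {'start': 1.5, 'end': 3.25, 'text': ' Hi!', 'speaker': 'SPEAKER_01'},
    ],
}
print(format_transcript(sample))
```

Because missing speaker labels fall back to 'UNKNOWN', the same helper works on results computed with or without diarize=True.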
Examples:
Add a computed column that applies the model tiny.en to an existing Pixeltable column tbl.audio of the table tbl:
Add a computed column that applies the model tiny.en to an existing Pixeltable column tbl.audio of the table tbl, with speaker diarization enabled, expecting at least 2 speakers: