# whisperx

> <a href="https://github.com/pixeltable/pixeltable/blob/main/pixeltable/functions/whisperx.py#L0" id="viewSource" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/View%20Source%20on%20Github-blue?logo=github&labelColor=gray" alt="View Source on GitHub" style={{ display: 'inline', margin: '0px' }} noZoom /></a>

# <span style={{ 'color': 'gray' }}>module</span>  pixeltable.functions.whisperx

WhisperX audio transcription and diarization functions.

## <span style={{ 'color': 'gray' }}>udf</span>  transcribe()

```python Signature theme={null}
@pxt.udf
transcribe(
    audio: pxt.Audio,
    *,
    model: pxt.String,
    diarize: pxt.Bool = False,
    compute_type: pxt.String | None = None,
    language: pxt.String | None = None,
    task: pxt.String | None = None,
    chunk_size: pxt.Int | None = None,
    alignment_model_name: pxt.String | None = None,
    interpolate_method: pxt.String | None = None,
    return_char_alignments: pxt.Bool | None = None,
    diarization_model_name: pxt.String | None = None,
    num_speakers: pxt.Int | None = None,
    min_speakers: pxt.Int | None = None,
    max_speakers: pxt.Int | None = None
) -> pxt.Json
```

Transcribe an audio file using WhisperX.

This UDF runs a transcription model *locally* using the WhisperX library,
equivalent to the WhisperX `transcribe` function, as described in the
[WhisperX library documentation](https://github.com/m-bain/whisperX).

If `diarize=True`, then speaker diarization will also be performed. Several of the UDF parameters are only valid if
`diarize=True`, as documented in the parameters list below.

**Requirements:**

* `pip install whisperx`

**Parameters:**

* **`audio`** (`pxt.Audio`): The audio file to transcribe.
* **`model`** (`pxt.String`): The name of the model to use for transcription.
* **`diarize`** (`pxt.Bool`): Whether to perform speaker diarization.
* **`compute_type`** (`pxt.String | None`): The compute type to use for the model (e.g., `'int8'`, `'float16'`). If `None`,
  defaults to `'float16'` on CUDA devices and `'int8'` otherwise.
* **`language`** (`pxt.String | None`): The language code for the transcription (e.g., `'en'` for English).
* **`task`** (`pxt.String | None`): The task to perform (e.g., `'transcribe'` or `'translate'`). Defaults to `'transcribe'`.
* **`chunk_size`** (`pxt.Int | None`): The size of the audio chunks to process, in seconds. Defaults to `30`.
* **`alignment_model_name`** (`pxt.String | None`): The name of the alignment model to use. If `None`, uses the default model for the given
  language. Only valid if `diarize=True`.
* **`interpolate_method`** (`pxt.String | None`): The method to use for interpolation of the alignment results. If not specified, uses the
  WhisperX default (`'nearest'`). Only valid if `diarize=True`.
* **`return_char_alignments`** (`pxt.Bool | None`): Whether to return character-level alignments. Defaults to `False`.
  Only valid if `diarize=True`.
* **`diarization_model_name`** (`pxt.String | None`): The name of the diarization model to use. Defaults to
  `pyannote/speaker-diarization-3.1`. Only valid if `diarize=True`.
* **`num_speakers`** (`pxt.Int | None`): The number of speakers to expect in the audio. By default, the model will try to detect the
  number of speakers. Only valid if `diarize=True`.
* **`min_speakers`** (`pxt.Int | None`): If specified, the minimum number of speakers to expect in the audio.
  Only valid if `diarize=True`.
* **`max_speakers`** (`pxt.Int | None`): If specified, the maximum number of speakers to expect in the audio.
  Only valid if `diarize=True`.

**Returns:**

* `pxt.Json`: A dictionary containing the audio transcription, diarization (if enabled), and various other metadata.

**Examples:**

Add a computed column that applies the model `tiny.en` to an existing Pixeltable column `tbl.audio` of the table `tbl`:

```python  theme={null}
tbl.add_computed_column(result=transcribe(tbl.audio, model='tiny.en'))
```

Add a computed column that applies the same model with speaker diarization enabled, expecting at least 2 speakers:

```python  theme={null}
tbl.add_computed_column(
    result=transcribe(
        tbl.audio, model='tiny.en', diarize=True, min_speakers=2
    )
)
```
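Once the `result` column is populated, its Json values can be post-processed in plain Python. A minimal sketch, assuming the WhisperX-style output shape (a `segments` list of dicts with `text`, plus a `speaker` label per segment when `diarize=True`); the helper names are illustrative, and the exact shape may vary across WhisperX versions:

```python  theme={null}
# Assumed result shape (verify against your WhisperX version):
# {'language': ..., 'segments': [{'start', 'end', 'text', ... 'speaker'?}, ...]}

def full_text(result: dict) -> str:
    """Join the per-segment texts into a single transcript string."""
    return ' '.join(seg['text'].strip() for seg in result.get('segments', []))

def speakers(result: dict) -> set:
    """Collect the speaker labels present in a diarized result."""
    return {seg['speaker'] for seg in result.get('segments', []) if 'speaker' in seg}

sample = {
    'language': 'en',
    'segments': [
        {'start': 0.0, 'end': 2.1, 'text': ' Hello.', 'speaker': 'SPEAKER_00'},
        {'start': 2.1, 'end': 4.0, 'text': ' Hi there.', 'speaker': 'SPEAKER_01'},
    ],
}
print(full_text(sample))         # -> Hello. Hi there.
print(sorted(speakers(sample)))  # -> ['SPEAKER_00', 'SPEAKER_01']
```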
