module pixeltable.functions.audio

Pixeltable UDFs and iterators for pxt.Audio.

iterator audio_splitter()

Signature

@pxt.iterator
audio_splitter(
    audio: pxt.Audio,
    duration: pxt.Float,
    *,
    overlap: pxt.Float = 0.0,
    min_segment_duration: pxt.Float = 0.0
)

Iterator over segments of an audio file. The audio file is split into smaller segments, each of which is duration seconds long (the final segment may be shorter). If the input contains no audio, no segments are yielded.

Outputs: One row per audio segment, with the following columns:

segment_start (pxt.Float): Start time of the audio segment in seconds
segment_end (pxt.Float): End time of the audio segment in seconds
audio_segment (pxt.Audio | None): The audio content of the segment

Parameters:

audio (pxt.Audio): The audio file to split
duration (pxt.Float): Audio segment duration in seconds
overlap (pxt.Float): Overlap between consecutive segments in seconds
min_segment_duration (pxt.Float): Drop the last segment if it is shorter than min_segment_duration seconds
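
To make the interaction of duration, overlap, and min_segment_duration concrete, here is a minimal pure-Python model of the segment boundaries. It assumes that each segment starts duration - overlap seconds after the previous one and that the final segment is clipped at the end of the audio. segment_bounds and Segment are hypothetical names, not Pixeltable code, and the real iterator may handle boundaries differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # corresponds to segment_start, in seconds
    end: float    # corresponds to segment_end, in seconds


def segment_bounds(
    total_duration: float,
    duration: float,
    overlap: float = 0.0,
    min_segment_duration: float = 0.0,
) -> list[Segment]:
    """Hypothetical model of audio_splitter's segment boundaries (not Pixeltable code)."""
    if duration <= 0:
        raise ValueError('duration must be positive')
    if not 0 <= overlap < duration:  # assumed constraint for this sketch
        raise ValueError('overlap must be in [0, duration)')
    step = duration - overlap  # assumed offset between consecutive segment starts
    segments: list[Segment] = []
    start = 0.0
    while start < total_duration:  # no audio -> no segments
        end = min(start + duration, total_duration)  # clip the last segment at the end
        segments.append(Segment(start, end))
        if end >= total_duration:
            break
        start += step
    # Drop the last segment if it is shorter than min_segment_duration.
    if segments and segments[-1].end - segments[-1].start < min_segment_duration:
        segments.pop()
    return segments


for seg in segment_bounds(100.0, duration=30.0, overlap=5.0):
    print(seg)
```

Under these assumptions, a 100-second file split with duration=30.0 and overlap=5.0 yields segments starting at 0, 25, 50, and 75 seconds; the last one is only 25 seconds long, so min_segment_duration=30.0 would drop it.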

Examples:

This example assumes an existing table tbl with a column audio of type pxt.Audio. Create a view that splits all audio files into segments of 30 seconds with 5 seconds of overlap:

import pixeltable as pxt
from pixeltable.functions.audio import audio_splitter

pxt.create_view(
    'audio_segments',
    tbl,
    iterator=audio_splitter(tbl.audio, duration=30.0, overlap=5.0),
)

udf encode_audio()

Signature

@pxt.udf
encode_audio(
    audio_data: pxt.Array[float32],
    *,
    input_sample_rate: pxt.Int,
    format: pxt.String,
    output_sample_rate: pxt.Int | None = None
) -> pxt.Audio

Encodes an audio clip represented as an array into a specified audio format.

Parameters:

audio_data (pxt.Array[float32]): An array of sampled amplitudes. Accepted shapes are (N,) or (1, N) for mono audio and (2, N) for stereo audio.
input_sample_rate (pxt.Int): The sample rate of the input audio data.
format (pxt.String): The desired output audio format. The supported formats are ‘wav’, ‘mp3’, ‘flac’, and ‘mp4’.
output_sample_rate (pxt.Int | None): The desired sample rate for the output audio. Defaults to the input sample rate if unspecified.
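
The shape and sample-rate rules above can be checked with a small helper before calling encode_audio. This is a hypothetical sketch (describe_encoding is not part of Pixeltable) that works on a shape tuple so it needs no array library; the output sample count is plain proportional arithmetic and assumes that resampling preserves the clip's duration.

```python
from __future__ import annotations


def describe_encoding(
    shape: tuple[int, ...],
    input_sample_rate: int,
    output_sample_rate: int | None = None,
) -> dict[str, float | int]:
    """Hypothetical pre-check for encode_audio inputs (not part of Pixeltable)."""
    if len(shape) == 1:
        channels, num_samples = 1, shape[0]  # (N,): mono
    elif len(shape) == 2 and shape[0] in (1, 2):
        channels, num_samples = shape  # (1, N): mono, (2, N): stereo
    else:
        raise ValueError(f'unsupported audio_data shape: {shape}')
    # output_sample_rate falls back to the input sample rate when omitted.
    out_rate = input_sample_rate if output_sample_rate is None else output_sample_rate
    return {
        'channels': channels,
        'duration_seconds': num_samples / input_sample_rate,
        'output_sample_rate': out_rate,
        # Samples per channel, assuming resampling preserves the clip's duration.
        'output_samples': round(num_samples * out_rate / input_sample_rate),
    }


print(describe_encoding((2, 96000), input_sample_rate=48000))
print(describe_encoding((44100,), input_sample_rate=44100, output_sample_rate=16000))
```

A stereo array of shape (2, 96000) at 48000 Hz is two seconds of audio; a mono array of 44100 samples resampled from 44100 Hz to 16000 Hz keeps its one-second length with 16000 samples.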

Examples:

Add a computed column with encoded FLAC audio files to a table t whose audio_data column holds audio as arrays of floats and whose sample_rate column holds the corresponding sample rates:

from pixeltable.functions.audio import encode_audio

t.add_computed_column(
    audio_file=encode_audio(
        t.audio_data, input_sample_rate=t.sample_rate, format='flac'
    )
)

udf get_metadata()

Signature

@pxt.udf
get_metadata(audio: pxt.Audio) -> pxt.Json

Gets metadata associated with an audio file and returns it as a dictionary.

Parameters:

audio (pxt.Audio): The audio to get metadata for.

Returns:

pxt.Json: A dict such as the following:

{
    'size': 2568827,
    'streams': [
        {
            'type': 'audio',
            'frames': 0,
            'duration': 2646000,
            'metadata': {},
            'time_base': 2.2675736961451248e-05,
            'codec_context': {
                'name': 'flac',
                'profile': None,
                'channels': 1,
                'codec_tag': '\x00\x00\x00\x00',
            },
            'duration_seconds': 60.0,
        }
    ],
    'bit_rate': 342510,
    'metadata': {'encoder': 'Lavf61.1.100'},
    'bit_exact': False,
}
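
In the example above, duration is expressed in units of time_base, so duration * time_base gives the stream length in seconds (2646000 * 1/44100 = 60.0, matching duration_seconds). The hypothetical helper below, which assumes a dictionary shaped like that example, pulls out the codec name, channel count, and length of each audio stream.

```python
from __future__ import annotations

from typing import Any


def summarize_audio_streams(metadata: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical helper: summarize each audio stream in a get_metadata() result."""
    summaries = []
    for stream in metadata.get('streams', []):
        if stream.get('type') != 'audio':
            continue  # skip non-audio streams, if any
        codec = stream.get('codec_context') or {}
        duration = stream.get('duration')
        time_base = stream.get('time_base')
        # As in the example above, duration is counted in time_base units.
        seconds = duration * time_base if duration is not None and time_base else None
        summaries.append({
            'codec': codec.get('name'),
            'channels': codec.get('channels'),
            'seconds': seconds,
        })
    return summaries


example = {
    'streams': [
        {
            'type': 'audio',
            'duration': 2646000,
            'time_base': 2.2675736961451248e-05,
            'codec_context': {'name': 'flac', 'channels': 1},
        }
    ],
}
print(summarize_audio_streams(example))
```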

Examples:

Extract metadata for files in the audio_col column of the table tbl:

tbl.select(tbl.audio_col.get_metadata()).collect()
