Convert text to speech


Generate natural-sounding audio from text using OpenAI’s text-to-speech models.

Problem

You need to convert text content into spoken audio—for accessibility, content repurposing, or voice applications.

Solution

What’s in this recipe:

Generate speech with OpenAI TTS
Choose from multiple voice options
Store text and audio together

You add a computed column that converts text to audio. The audio is cached and only regenerated when the source text changes.
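
The caching behaviour described above can be pictured with a small, self-contained sketch. This is a conceptual toy, not Pixeltable's implementation: `CachedAudioColumn` and `fake_tts` are hypothetical names invented for illustration, and the "audio" is just the encoded text.

```python
from typing import Callable, Dict, List, Tuple


class CachedAudioColumn:
    """Toy model of a computed column: one stored output per row (illustrative only)."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        self._synthesize = synthesize
        # row id -> (source text, generated audio)
        self._cache: Dict[int, Tuple[str, bytes]] = {}

    def get(self, row_id: int, text: str) -> bytes:
        cached = self._cache.get(row_id)
        if cached is not None and cached[0] == text:
            return cached[1]  # source text unchanged: reuse the stored audio
        audio = self._synthesize(text)  # new or changed text: regenerate
        self._cache[row_id] = (text, audio)
        return audio


calls: List[str] = []


def fake_tts(text: str) -> bytes:
    """Hypothetical stand-in for a real TTS call; records each invocation."""
    calls.append(text)
    return text.encode('utf-8')


column = CachedAudioColumn(fake_tts)
column.get(1, 'Hello')        # computed
column.get(1, 'Hello')        # served from the cache
column.get(1, 'Hello again')  # text changed, so it is recomputed
print(len(calls))  # prints 2
```

Only two synthesis calls happen for three reads, which is the property the recipe relies on: reading the audio is cheap, and only new or changed text costs an API call.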

Setup

%pip install -qU pixeltable openai

import getpass
import os

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')

import pixeltable as pxt
from pixeltable.functions.openai import speech

# Create a fresh directory
pxt.drop_dir('tts_demo', force=True)
pxt.create_dir('tts_demo')

Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘tts_demo’.
<pixeltable.catalog.dir.Dir at 0x17f0d5bd0>

Create text-to-speech pipeline

# Create table for articles
articles = pxt.create_table(
    'tts_demo/articles', {'title': pxt.String, 'content': pxt.String}
)

Created table ‘articles’.

# Add audio generation column
articles.add_computed_column(
    audio=speech(articles.content, model='tts-1', voice='alloy')
)

Added 0 column values with 0 errors.
No rows affected.
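
To compare several voices side by side, one option is to add one computed column per voice. The helper below is a minimal sketch under stated assumptions: `add_voice_columns` and the `audio_<voice>` column naming are hypothetical, not part of Pixeltable. It relies only on `add_computed_column` accepting the new column as a single keyword argument, as in the cell above, and takes the table, text column, and speech function as parameters.

```python
from typing import Any, Callable, Iterable, List


def add_voice_columns(
    table: Any,
    text_column: Any,
    speech_fn: Callable[..., Any],
    voices: Iterable[str],
    model: str = 'tts-1',
) -> List[str]:
    """Add one computed audio column per voice and return the new column names."""
    names: List[str] = []
    for voice in voices:
        name = f'audio_{voice}'  # hypothetical naming scheme, e.g. 'audio_nova'
        # Pass the column as a single keyword argument: name=expression
        table.add_computed_column(
            **{name: speech_fn(text_column, model=model, voice=voice)}
        )
        names.append(name)
    return names
```

With the objects defined in this recipe, the call would look like `add_voice_columns(articles, articles.content, speech, ['nova', 'onyx'])`. Each extra voice column means one more TTS request per row.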

Generate audio

# Insert sample articles
sample_articles = [
    {
        'title': 'Welcome to AI',
        'content': 'Artificial intelligence is transforming how we work and live. From smart assistants to autonomous vehicles, AI is becoming part of our daily lives.',
    },
    {
        'title': 'Getting Started',
        'content': 'To begin your journey with machine learning, start by understanding the basics of data preparation and model training.',
    },
]

articles.insert(sample_articles)

Inserting rows into `articles`: 2 rows [00:00, 423.90 rows/s]
Inserted 2 rows with 0 errors.
2 rows inserted, 6 values computed.

# View articles with generated audio
articles.select(
    articles.title, articles.content, articles.audio
).collect()

Explanation

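OpenAI TTS models: tts-1 and tts-1-hd (see Tips below).

Voice options: this recipe uses alloy; pass a different value for the voice parameter of speech to change it.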

Tips:

Use tts-1 for drafts and real-time applications
Use tts-1-hd for final production audio
Audio is cached: queries return the stored audio without regenerating it
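
The model tips above can be captured in a tiny helper so the choice is made in one place. This is only a sketch: `choose_tts_model` is a hypothetical name, and the rule it encodes is just the two tips listed here.

```python
def choose_tts_model(final_production: bool) -> str:
    """Return 'tts-1-hd' for final production audio, 'tts-1' for drafts and real-time use."""
    return 'tts-1-hd' if final_production else 'tts-1'


print(choose_tts_model(False))  # prints tts-1
print(choose_tts_model(True))   # prints tts-1-hd
```

The returned string could then be passed as the model argument of speech when defining the computed column.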
