Text-to-Speech (TTS) API Guide

Overview

The Audio API provides a speech endpoint, built on TTS models, that supports the following use cases:

📝 Blog post narration

🌍 Multi-language audio generation

🎵 Real-time audio streaming output

Important Note: You must tell end users that the audio they hear is AI-generated speech, not a human voice.

Basic Usage

Basic Example

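import os
from pathlib import Path
from openai import OpenAI

# Read the API key from the environment instead of hard-coding it.
# The variable name OPENAI_API_KEY is an assumption; use whatever name you configured.
client = OpenAI(
    base_url="https://www.kkiai.com/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)

# Save the audio next to this script.
speech_file_path = Path(__file__).parent / "speech.mp3"
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Today is a wonderful day to build something people love!",
)

# Write the returned audio bytes to disk.
response.write_to_file(speech_file_path)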

Features

Audio Quality Options

tts-1: Low latency, suitable for real-time applications

tts-1-hd: Higher quality; output may contain less static noise than tts-1
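The trade-off above can be applied per request. The following is a minimal sketch, not a prescribed pattern: `pick_model` and `narrate` are hypothetical helper names, and `client` is assumed to be an `OpenAI` client configured as in the Basic Example.

```python
def pick_model(realtime: bool) -> str:
    """Return the low-latency model for real-time use, the HD model otherwise."""
    return "tts-1" if realtime else "tts-1-hd"


def narrate(client, text: str, out_path: str, realtime: bool = False) -> None:
    """Generate speech for `text` and save it to `out_path`.

    `client` is assumed to be an OpenAI client pointed at the speech endpoint.
    """
    with client.audio.speech.with_streaming_response.create(
        model=pick_model(realtime),
        voice="alloy",
        input=text,
    ) as response:
        response.stream_to_file(out_path)
```

For example, `narrate(client, "Chapter one.", "chapter1.mp3")` would use tts-1-hd, while passing `realtime=True` switches to tts-1.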

Available Voices

alloy

echo

fable

nova

shimmer

onyx
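Because the voices differ noticeably in tone, it can help to render the same sentence with each one and compare. The sketch below assumes a `client` configured as in the Basic Example; `sample_filename` and `render_voice_samples` are hypothetical helpers, and the output file naming is arbitrary.

```python
# Voice names as listed in this guide.
VOICES = ("alloy", "echo", "fable", "nova", "shimmer", "onyx")


def sample_filename(voice: str) -> str:
    """Build an output file name for a voice, rejecting names not listed above."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    return f"sample_{voice}.mp3"


def render_voice_samples(client, text: str) -> list[str]:
    """Generate one MP3 per voice for the same text and return the file paths."""
    paths = []
    for voice in VOICES:
        path = sample_filename(voice)
        with client.audio.speech.with_streaming_response.create(
            model="tts-1",
            voice=voice,
            input=text,
        ) as response:
            response.stream_to_file(path)
        paths.append(path)
    return paths
```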

Supported Output Formats

Format	Characteristics	Use Cases
MP3	Default format	General use
Opus	Low latency	Web streaming and communication
AAC	Efficient compression	Mobile device playback
FLAC	Lossless compression	Audio archiving
WAV	Uncompressed	Low-latency applications
PCM	Raw samples, 24 kHz, 16-bit signed	Direct processing of raw audio
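The output format is selected with the `response_format` parameter of the speech request. Since PCM output is headerless, most players cannot open it directly; the sketch below wraps raw samples in a WAV container using Python's standard `wave` module. It assumes mono audio (the table above specifies only the sample rate and bit depth) and that this endpoint accepts `response_format="pcm"`; `synthesize_pcm` and `pcm_to_wav` are hypothetical helper names.

```python
import wave


def pcm_to_wav(pcm_bytes: bytes, wav_path: str,
               sample_rate: int = 24_000, channels: int = 1) -> None:
    """Wrap raw 16-bit signed PCM samples in a WAV container."""
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(channels)     # assumption: mono output
        wav.setsampwidth(2)            # 16-bit samples = 2 bytes each
        wav.setframerate(sample_rate)  # 24 kHz, per the table above
        wav.writeframes(pcm_bytes)


def synthesize_pcm(client, text: str) -> bytes:
    """Request raw PCM audio; `client` is an OpenAI client as in the Basic Example."""
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=text,
        response_format="pcm",
    )
    return response.content
```

A typical use would be `pcm_to_wav(synthesize_pcm(client, "Hello"), "hello.wav")`.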

Real-time Audio Streaming

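import os
from openai import OpenAI

client = OpenAI(
    base_url="https://www.kkiai.com/v1",
    api_key=os.environ["OPENAI_API_KEY"],  # variable name is an assumption
)

# with_streaming_response delivers audio as it is generated instead of
# buffering the whole response in memory first.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="Hello world! This is a streaming test.",
) as response:
    # Write chunks to disk as they arrive.
    response.stream_to_file("output.mp3")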
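To hand audio to a player or a network socket as it arrives, rather than saving it to a file, the response can be consumed chunk by chunk. This is a sketch under stated assumptions: `forward_chunks` and `stream_speech` are hypothetical helpers, `sink` is any callable that accepts bytes (for example a file's `write` method), and the chunk size of 4096 bytes is arbitrary.

```python
from typing import Callable, Iterable


def forward_chunks(chunks: Iterable[bytes], sink: Callable[[bytes], object]) -> int:
    """Pass each non-empty chunk to `sink` and return the total bytes forwarded."""
    total = 0
    for chunk in chunks:
        if chunk:
            sink(chunk)
            total += len(chunk)
    return total


def stream_speech(client, text: str, sink: Callable[[bytes], object],
                  chunk_size: int = 4096) -> int:
    """Stream generated speech into `sink` as it is produced.

    `client` is assumed to be an OpenAI client configured for this endpoint.
    """
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",
        voice="alloy",
        input=text,
    ) as response:
        return forward_chunks(response.iter_bytes(chunk_size), sink)
```

For instance, opening a file with `open("out.mp3", "wb")` and passing its `write` method as `sink` reproduces the file-based example while exposing each chunk to your own code.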

Supported Languages

Multiple languages are supported, including:

Asian languages: Chinese, Japanese, Korean, etc.

European languages: English, French, German, etc.

Other languages: Arabic, Hindi, etc.

Note: The current voices are primarily optimized for English.

Frequently Asked Questions

Q: How do I control the emotion of generated audio?

A: There is currently no direct control mechanism. Capitalization and grammar in the input text may influence the output, but the effect is not predictable.

Q: Can I create custom voices?

A: Custom voice creation is not supported.

Q: Who owns the generated audio?

A: The person who generated the audio owns it, but you must tell end users that it is AI-generated.
