Text-to-Speech (TTS) API Guide

Overview

The Audio API provides a speech endpoint, backed by TTS models, that supports the following use cases:

📝 Blog post narration

🌍 Multi-language audio generation

🎵 Real-time audio streaming output

Important Note: You must inform users that the audio they hear is AI-generated speech, not a human voice.

Basic Usage

Basic Example

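import os
from pathlib import Path
from openai import OpenAI

# TTS_API_KEY is an assumed variable name; set it to your API key.
key = os.environ["TTS_API_KEY"]

client = OpenAI(
    base_url="https://www.kkiai.com/v1",
    api_key=key
)

speech_file_path = Path(__file__).parent / "speech.mp3"

# Request the full audio for the given text.
response = client.audio.speech.create(
  model="tts-1",
  voice="alloy",
  input="Today is a wonderful day to build something people love!"
)

# Save the returned audio bytes to disk.
response.write_to_file(speech_file_path)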
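
For the blog post narration use case, a long article may need to be split across several requests. The following is a minimal sketch of one way to do that, not part of the API itself: `split_for_narration` and its `max_chars` default are hypothetical, and the real per-request limit should be checked with your provider.

```python
import re


def split_for_narration(text: str, max_chars: int = 4000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    max_chars is an assumed limit, not one stated in this guide.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed as `input` to `client.audio.speech.create` and saved to its own file.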

Features

Audio Quality Options

tts-1: Low latency, suitable for real-time applications

tts-1-hd: Higher quality; output may contain less static noise than tts-1

Available Voices

alloy

echo

fable

nova

shimmer

onyx
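
To compare the voices listed above, one option is to render the same sentence once per voice. The sketch below assumes an already configured `openai.OpenAI` client is passed in; the helper names and the `sample-<voice>.mp3` file naming are illustrative choices, not part of the API.

```python
from pathlib import Path

# Voice names as listed in this guide.
VOICES = ["alloy", "echo", "fable", "nova", "shimmer", "onyx"]


def voice_sample_plan(out_dir: str, model: str = "tts-1") -> list[dict]:
    """Return one request description per voice, without calling the API."""
    return [
        {"model": model, "voice": voice, "path": Path(out_dir) / f"sample-{voice}.mp3"}
        for voice in VOICES
    ]


def render_samples(client, text: str, out_dir: str) -> None:
    """Generate the same text in each voice; `client` is an openai.OpenAI instance."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for job in voice_sample_plan(out_dir):
        response = client.audio.speech.create(
            model=job["model"], voice=job["voice"], input=text
        )
        response.write_to_file(job["path"])
```

Calling `render_samples(client, "Hello from every voice.", "samples")` would write six files, one per voice, into the `samples` directory.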

Supported Output Formats

| Format | Characteristics | Use Cases |
| --- | --- | --- |
| MP3 | Default format | General use |
| Opus | Low latency | Web streaming and communication |
| AAC | Efficient compression | Mobile device playback |
| FLAC | Lossless compression | Audio archiving |
| WAV | Uncompressed | Low-latency applications |
| PCM | Raw samples | 24kHz, 16-bit signed |
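
Because PCM output is headerless, most players cannot open it directly. As a rough sketch, the raw samples can be wrapped in a WAV container with Python's standard `wave` module. The helper name `pcm_to_wav` is illustrative, and mono, little-endian samples are assumptions here: the table only states 24kHz and 16-bit signed.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit signed PCM samples in a WAV container.

    channels=1 and little-endian byte order are assumptions; the format
    table only states 24kHz and 16-bit signed.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16 bits = 2 bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

In the `openai` Python client, the output format is selected with the `response_format` argument of `client.audio.speech.create`, for example `response_format="pcm"`.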

Real-time Audio Streaming

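import os
from openai import OpenAI

# TTS_API_KEY is an assumed variable name; set it to your API key.
key = os.environ["TTS_API_KEY"]

client = OpenAI(
    base_url="https://www.kkiai.com/v1",
    api_key=key
)

# with_streaming_response delivers the body as it arrives instead of
# buffering the whole file in memory first.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="Hello world! This is a streaming test.",
) as response:
    response.stream_to_file("output.mp3")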
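
To process audio chunk by chunk, for example to forward it to a player or a socket rather than a file, the streamed response can be iterated directly. This is a sketch under assumptions: `write_chunks` and `stream_speech` are hypothetical helper names, the 4096-byte chunk size is arbitrary, and `client` is an already configured `openai.OpenAI` instance. `with_streaming_response` and `iter_bytes` come from the `openai` Python library.

```python
from typing import BinaryIO, Iterable


def write_chunks(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Copy audio chunks to sink as they arrive; return total bytes written."""
    total = 0
    for chunk in chunks:
        if chunk:  # skip empty chunks
            sink.write(chunk)
            total += len(chunk)
    return total


def stream_speech(client, text: str, path: str, chunk_size: int = 4096) -> int:
    """Stream synthesized speech to path; `client` is an openai.OpenAI instance."""
    with client.audio.speech.with_streaming_response.create(
        model="tts-1", voice="alloy", input=text
    ) as response, open(path, "wb") as sink:
        return write_chunks(response.iter_bytes(chunk_size), sink)
```

Keeping `write_chunks` independent of the client means the sink can be any writable binary object, such as a file, an in-memory buffer, or a pipe to an audio player.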

Supported Languages

Multiple languages are supported, including:

Asian languages: Chinese, Japanese, Korean, etc.

European languages: English, French, German, etc.

Other languages: Arabic, Hindi, etc.

Note: The current voices are primarily optimized for English.

Frequently Asked Questions

Q: How do I control the emotion of generated audio?

A: There is currently no direct control mechanism. Capitalization or grammar may influence the output, but the effect is not guaranteed.

Q: Can I create custom voices?

A: Custom voice creation is not supported.

Q: Who owns the generated audio?

A: The audio is owned by the creator, but you must inform users that it is AI-generated audio.