Text-to-Speech (TTS) API Guide
Overview
The Audio API provides a speech endpoint, built on TTS models, that supports the following use cases:
📝 Blog post narration
🌍 Multi-language audio generation
🎵 Real-time audio streaming output
Important Note: You must inform users that the audio they hear is AI-generated speech, not a human voice.
Basic Usage
Basic Example
```python
import os
from pathlib import Path

from openai import OpenAI

# Read the API key from the environment; the variable name is an assumption.
client = OpenAI(
    base_url="https://www.kkiai.com/v1",
    api_key=os.environ["KKIAI_API_KEY"],
)

# Save the generated audio next to this script.
speech_file_path = Path(__file__).parent / "speech.mp3"
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Today is a wonderful day to build something people love!",
)
response.stream_to_file(speech_file_path)
```
Features
Audio Quality Options
tts-1: Low latency, suitable for real-time applications
tts-1-hd: Higher quality, may have less static noise
Available Voices
alloy
echo
fable
nova
shimmer
onyx
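As a small illustration, the model and voice names listed above can be checked locally before a request is sent. The sketch below is not part of the API: `build_speech_params` is a hypothetical helper, and the keyword names it returns assume the `model`, `voice`, and `input` arguments used in the basic example.

```python
# Model and voice names copied from the lists above.
MODELS = ("tts-1", "tts-1-hd")
VOICES = ("alloy", "echo", "fable", "nova", "shimmer", "onyx")


def build_speech_params(text, model="tts-1", voice="alloy"):
    """Hypothetical helper: validate the choices and return keyword arguments."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {VOICES}")
    if not text.strip():
        raise ValueError("input text must not be empty")
    return {"model": model, "voice": voice, "input": text}
```

The returned dictionary could then be unpacked into the request, for example `client.audio.speech.create(**build_speech_params("Hello", voice="nova"))`.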
Supported Output Formats
| Format | Characteristics | Use Cases |
|---|---|---|
| MP3 | Default format | General use |
| Opus | Low latency | Web streaming and communication |
| AAC | Efficient compression | Mobile device playback |
| FLAC | Lossless compression | Audio archiving |
| WAV | Uncompressed | Low-latency applications |
| PCM | Raw samples (24 kHz, 16-bit signed) | Raw audio processing |
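PCM output consists of raw samples, so most players cannot open it directly. A minimal sketch of wrapping such data in a WAV container with Python's standard `wave` module follows. It assumes the 24 kHz, 16-bit signed layout from the table. Mono audio is an additional assumption not stated here, and `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave

SAMPLE_RATE = 24_000   # 24 kHz, from the table above
SAMPLE_WIDTH = 2       # 16-bit signed samples are 2 bytes each
CHANNELS = 1           # assumption: mono output


def pcm_to_wav(pcm_bytes):
    """Wrap raw PCM samples in a WAV container and return the WAV bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```

Under these assumptions, one second of audio is 24,000 samples, or 48,000 bytes of PCM data.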
Real-time Audio Streaming
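```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://www.kkiai.com/v1",
    api_key=os.environ["KKIAI_API_KEY"],  # variable name is an assumption
)

# with_streaming_response writes audio to disk as chunks arrive,
# instead of buffering the whole response in memory first.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="Hello world! This is a streaming test.",
) as response:
    response.stream_to_file("output.mp3")
```
Supported Languages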
Multiple languages are supported, including:
Asian languages: Chinese, Japanese, Korean, etc.
European languages: English, French, German, etc.
Other languages: Arabic, Hindi, etc.
Note: The current voices are primarily optimized for English.
Frequently Asked Questions
Q: How do I control the emotion of generated audio?
A: There is currently no direct control mechanism. Capitalization or grammar may influence the output, but the effect is inconsistent.
Q: Can I create custom voices?
A: Custom voice creation is not supported.
Q: Who owns the generated audio?
A: The audio is owned by the creator, but you must inform users that it is AI-generated audio.