Speech-to-Text API Guide
Overview
The Audio API provides two main endpoints:
📝 transcriptions: Convert audio to text
🔄 translations: Translate audio to English
Supported Formats
📁 File size: Maximum 25 MB
🎵 Supported formats: mp3, mp4, mpeg, mpg, m4a, wav, webm
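These limits can be checked locally before uploading. The following is a minimal sketch rather than part of the API: the helper name `check_audio_file` is hypothetical, and it assumes that "25 MB" means 25 × 1024 × 1024 bytes and that the format can be judged from the file extension alone.

```python
import os

# Limits taken from the list above; reading "25 MB" as 25 * 1024 * 1024 bytes is an assumption.
MAX_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpg", "m4a", "wav", "webm"}


def check_audio_file(path):
    """Return a list of problems found; an empty list means the file looks uploadable."""
    problems = []
    # Compare the extension case-insensitively and without the leading dot.
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) > MAX_BYTES:
        problems.append("file larger than 25 MB")
    return problems
```

A file that passes returns an empty list, so a call such as `if not check_audio_file(path): ...` can guard the upload.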
Usage
1. Transcription
Convert audio to text in the original language
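from openai import OpenAI

key = "YOUR_API_KEY"  # replace with your API key

client = OpenAI(
    base_url="https://www.kkiai.com/v1",
    api_key=key
)

# Basic transcription
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )
print(transcription.text)

# Specify output format (the file is reopened so it is read from the start)
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )
# With response_format="text" the result is a plain string
print(transcription)

2. Translation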
Convert audio in any language to English text
from openai import OpenAI

key = "YOUR_API_KEY"  # replace with your API key

client = OpenAI(
    base_url="https://www.kkiai.com/v1",
    api_key=key
)

# Translate German speech into English text
with open("/path/to/file/german.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file
    )
print(translation.text)

3. Timestamp Feature
from openai import OpenAI

key = "YOUR_API_KEY"  # replace with your API key

client = OpenAI(
    base_url="https://www.kkiai.com/v1",
    api_key=key
)

# Word-level timestamps require the verbose_json response format
with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-1",
        response_format="verbose_json",
        timestamp_granularities=["word"]
    )
print(transcript.words)

4. Handling Large Files
Use PyDub to split files larger than 25 MB into smaller pieces:
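from pydub import AudioSegment

song = AudioSegment.from_mp3("good_morning.mp3")

# PyDub handles time in milliseconds
ten_minutes = 10 * 60 * 1000

# Export the first 10 minutes as a separate file;
# slice further ranges, e.g. song[ten_minutes:2 * ten_minutes], for the rest
first_10_minutes = song[:ten_minutes]
first_10_minutes.export("good_morning_10.mp3", format="mp3")

Optimization Tips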
Prompt Usage Tips
A prompt passed with the request can be used to:
🔍 Correct the recognition of specific vocabulary
📜 Maintain contextual coherence (for example, across segments of a split file)
✍️ Control punctuation output
🗣️ Preserve filler words
📝 Control output text style (e.g., simplified vs. traditional Chinese)
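As a sketch of how a prompt might be used when a long recording has been split into segments, the text of the previous segment can be passed as the prompt for the next one, together with any vocabulary that should be spelled correctly. The helper names (`build_prompt`, `transcribe_segments`), the 200-character tail, and the injected `transcribe` callable are illustrative assumptions, not part of the API.

```python
def build_prompt(vocabulary, previous_text="", tail_chars=200):
    """Combine known vocabulary with the tail of the previous transcript."""
    parts = []
    if vocabulary:
        # A comma-separated word list nudges the model toward these spellings.
        parts.append(", ".join(vocabulary))
    if previous_text:
        # Only the end of the previous transcript is kept as context.
        parts.append(previous_text[-tail_chars:])
    return " ".join(parts)


def transcribe_segments(paths, transcribe, vocabulary=()):
    """Transcribe segments in order, feeding each result into the next prompt.

    `transcribe(path, prompt)` is a hypothetical callable that returns text.
    """
    texts = []
    previous = ""
    for path in paths:
        previous = transcribe(path, build_prompt(vocabulary, previous))
        texts.append(previous)
    return " ".join(texts)
```

With the client from the Usage section, `transcribe` could open the file and return `client.audio.transcriptions.create(model="whisper-1", file=f, prompt=prompt).text`. Injecting it as a callable keeps the chaining logic testable without network access.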
Supported Languages
Supports 98 languages, including:
Major Asian languages: Chinese, Japanese, Korean, etc.
European languages: English, French, German, etc.
Other regional languages: Arabic, Hindi, etc.
Note: Only languages with a word error rate (WER) below 50% are listed. Other languages are supported, but transcription quality may be lower.
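For reference, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognized text, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name `word_error_rate` and the whitespace tokenization are assumptions, and it is not the evaluation procedure behind the threshold quoted above.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c d")` returns 0.25, since one of the four reference words was substituted. A WER below 50% corresponds to a return value under 0.5.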