Speech-to-Text API Guide

Overview

The Audio API provides two main endpoints:

📝 transcriptions: Convert audio to text

🔄 translations: Translate audio to English

Supported Formats

📁 File size: Maximum 25 MB

🎵 Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
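The API rejects uploads that exceed the size limit or use an unlisted format, so it can help to check a file locally before sending it. The sketch below is a hypothetical helper (check_audio_file is not part of any SDK). It assumes that 25 MB means 25 * 1024 * 1024 bytes and uses the extension list from OpenAI's Whisper documentation. Adjust both if your endpoint behaves differently.

```python
from pathlib import Path

# Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024

# Extension list as published in OpenAI's Whisper documentation.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def check_audio_file(path: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    # Compare the extension case-insensitively, without the leading dot
    ext = Path(path).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")
    return problems
```

Call it as check_audio_file(path, os.path.getsize(path)) and upload only when the returned list is empty.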

Usage

1. Transcription

Convert audio to text in the original language

from openai import OpenAI

client = OpenAI(
    base_url="https://www.kkiai.com/v1",
    api_key="YOUR_API_KEY"  # replace with your own API key
)

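# Basic transcription (the with-block closes the file afterwards)
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )
print(transcription.text)

# Specify output format; reopen the file, since the first upload consumed it.
# With response_format="text" the result is a plain string.
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )
print(transcription)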
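The type of the returned value depends on response_format. The hypothetical helper below normalizes it. It assumes, as in recent versions of the openai Python package, that "text", "srt" and "vtt" return a plain str, while "json" and "verbose_json" return an object with a .text attribute.

```python
def transcription_text(result) -> str:
    """Return the transcript text from a transcriptions.create() result.

    Assumption: text-like formats come back as str; JSON formats come back
    as an object exposing a .text attribute.
    """
    if isinstance(result, str):
        return result
    return result.text
```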

2. Translation

Convert audio in any language to English text

from openai import OpenAI

client = OpenAI(
    base_url="https://www.kkiai.com/v1",
    api_key="YOUR_API_KEY"  # replace with your own API key
)

# The with-block closes the file after the upload
with open("/path/to/file/german.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file
    )
print(translation.text)
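The two endpoints take the same model and file arguments and differ only in the language of the output, so one helper can switch between them. This is a sketch. speech_to_text and endpoint_name are hypothetical names, and client is assumed to be an OpenAI instance configured as in the examples above.

```python
def endpoint_name(to_english: bool) -> str:
    """Pick the Audio API endpoint: translations always produce English text."""
    return "translations" if to_english else "transcriptions"


def speech_to_text(client, path: str, to_english: bool = False, model: str = "whisper-1") -> str:
    """Transcribe audio in its original language, or translate it into English."""
    # Resolves to client.audio.translations or client.audio.transcriptions
    endpoint = getattr(client.audio, endpoint_name(to_english))
    with open(path, "rb") as audio_file:
        result = endpoint.create(model=model, file=audio_file)
    return result.text
```

For example, speech_to_text(client, "/path/to/file/german.mp3", to_english=True) returns English text. The same call with to_english=False returns German text.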

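3. Timestamp Feature

Request word-level timestamps with timestamp_granularities. This parameter only takes effect when response_format is set to "verbose_json".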

from openai import OpenAI

client = OpenAI(
    base_url="https://www.kkiai.com/v1",
    api_key="YOUR_API_KEY"  # replace with your own API key
)

with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-1",
        response_format="verbose_json",
        timestamp_granularities=["word"]
    )

# Each entry holds the word plus its start and end time in seconds
print(transcript.words)
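A minimal sketch for printing the word timestamps in a readable HH:MM:SS.mmm form. It assumes each entry exposes .word, .start and .end attributes (with times in seconds), as the entries of transcript.words do in the openai Python SDK. format_timestamp and word_lines are hypothetical helper names.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def word_lines(words) -> list[str]:
    """Render one line per word: start --> end, then the word itself."""
    return [
        f"{format_timestamp(w.start)} --> {format_timestamp(w.end)}  {w.word}"
        for w in words
    ]
```

Usage: print("\n".join(word_lines(transcript.words))).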

4. Handling Large Files

Use PyDub to split files larger than 25 MB into smaller segments:

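from pydub import AudioSegment

# Load the source file (PyDub needs ffmpeg or libav installed to decode MP3)
song = AudioSegment.from_mp3("good_morning.mp3")

# Split into 10-minute segments; PyDub slices by milliseconds
ten_minutes = 10 * 60 * 1000
for i, start in enumerate(range(0, len(song), ten_minutes)):
    segment = song[start:start + ten_minutes]
    segment.export(f"good_morning_{i}.mp3", format="mp3")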
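Safe chunk length depends on the bitrate, because file size is roughly bitrate times duration. At a constant 128 kbps (128 bits per millisecond), 25 MB (taken here as 25 * 1024 * 1024 bytes = 209,715,200 bits) holds 209,715,200 / 128 = 1,638,400 ms, about 27 minutes. Ten-minute segments therefore leave a comfortable margin. The sketch below computes that bound and the segment boundaries. Both helpers are hypothetical, and real files need extra headroom for headers and variable bitrates.

```python
# Assumption: the 25 MB upload limit means 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def max_chunk_ms(bitrate_kbps: int, limit_bytes: int = MAX_UPLOAD_BYTES) -> int:
    """Longest duration (ms) that fits in limit_bytes at a constant bitrate.

    1 kbps = 1000 bits/s = 1 bit/ms, so the bitrate in kbps is also bits per ms.
    """
    return (limit_bytes * 8) // bitrate_kbps


def chunk_bounds(duration_ms: int, chunk_ms: int) -> list[tuple[int, int]]:
    """Split [0, duration_ms) into consecutive (start, end) windows of at most chunk_ms."""
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]
```

With PyDub, len(song) is the duration in milliseconds. You can therefore write for start, end in chunk_bounds(len(song), ten_minutes): song[start:end].export(...).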

Optimization Tips

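Prompts Usage Tips

Pass a short piece of text through the prompt parameter to:

🔍 Improve recognition of specific words, names, or acronyms
📜 Keep context consistent, for example across segments of a split file
✍️ Control punctuation in the output
🗣️ Preserve filler words such as "um" and "uh"
📝 Control the writing style of the output (e.g., Simplified vs. Traditional Chinese)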
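The prompt parameter of transcriptions.create accepts plain text. One way to apply two of the tips above is to list hard-to-recognize terms and then append the end of the previous segment's transcript. This helps when a long file has been split. This is a sketch under assumptions. build_prompt and transcribe_with_prompt are hypothetical helpers, and max_words is an arbitrary rough cap. OpenAI's Whisper guide states that the model only considers the final 224 tokens of a prompt.

```python
def build_prompt(vocabulary=(), previous_text: str = "", max_words: int = 150) -> str:
    """Combine a glossary of tricky terms with the tail of the previous transcript.

    max_words is a rough, hypothetical stand-in for the model's prompt limit.
    """
    parts = []
    if vocabulary:
        parts.append(", ".join(vocabulary) + ".")
    if previous_text:
        # Keep only the last max_words words of the previous segment
        parts.append(" ".join(previous_text.split()[-max_words:]))
    return " ".join(parts)


def transcribe_with_prompt(client, path: str, prompt: str) -> str:
    """Transcribe one file, passing the prompt text to the API."""
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file, prompt=prompt
        )
    return result.text
```

When processing split segments in order, feed each segment's text back in as previous_text for the next call.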

Supported Languages

Supports 98 languages, including:

Major Asian languages: Chinese, Japanese, Korean, etc.

European languages: English, French, German, etc.

Other regional languages: Arabic, Hindi, etc.

Note: Only languages with a word error rate (WER) below 50% are listed. Other languages also work, but output quality may be lower.
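When the spoken language is known in advance, transcriptions.create also accepts a language argument as an ISO-639-1 code (for example "de" for German). OpenAI's documentation notes that supplying it can improve accuracy and latency. The helper below is a hypothetical sketch that checks the code's shape (two letters) before building the request arguments. It does not verify that the language is actually supported.

```python
from typing import Optional


def transcription_kwargs(language: Optional[str] = None, prompt: Optional[str] = None) -> dict:
    """Build keyword arguments for client.audio.transcriptions.create()."""
    kwargs = {"model": "whisper-1"}
    if language is not None:
        code = language.strip().lower()
        # ISO-639-1 codes are exactly two letters, e.g. "en", "zh", "ja"
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"expected a two-letter ISO-639-1 code, got {language!r}")
        kwargs["language"] = code
    if prompt:
        kwargs["prompt"] = prompt
    return kwargs
```

Usage: client.audio.transcriptions.create(file=audio_file, **transcription_kwargs(language="de")).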
