Text-to-Speech

POST https://www.kkiai.com/ent/v2/audio-tts

Official Documentation: https://platform.vidu.cn/docs/text-to-speech

Request Parameters

Authorization

Add an Authorization header to the request. Its value is the word Bearer, followed by a space and your API token.

Example: Authorization: Bearer ********************

Header Parameters

| Parameter Name | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| Authorization | string | Optional | | Bearer {{YOUR_API_KEY}} |
| Content-Type | string | Optional | | application/json |

Body Parameters (application/json)

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Required | Text to be synthesized into speech. (1) Length limit: fewer than 10000 characters. (2) Paragraph breaks are marked with line breaks. (3) Pause control: insert a <#x#> marker between two pronounceable text segments, where x is the pause duration in seconds, in the range [0.01, 99.99] with at most two decimal places. Consecutive pause markers are not allowed. Example: Hello<#2#>I am vidu<#2#>nice to meet you! |
| voice_setting_voice_id | string | Required | Voice ID for the synthesized audio. See the voice list for all available voices: https://shengshu.feishu.cn/sheets/EgFvs6DShhiEBStmjzccr5gonOg |
| voice_setting_speed | string | Optional | Speech speed. Range [0.5, 2], default 1.0 (normal speed). 0.5 is the slowest, 2 is the fastest. |
| voice_setting_volume | string | Optional | Volume level. Range [0, 10], default 0 (normal volume). Higher values are louder. |
| voice_setting_pitch | string | Optional | Pitch of the synthesized audio. Range [-12, 12], default 0 (the original voice output). |
| voice_setting_emotion | string | Optional | Controls the emotion of the synthesized speech. (1) Allowed values: ["happy", "sad", "angry", "fearful", "disgusted", "surprised", "calm"], corresponding to 7 emotions: happy, sad, angry, fearful, disgusted, surprised, neutral. (2) The model automatically matches an appropriate emotion to the input text, so manual specification is generally not needed. |
| pronunciation_dict_tone | string | Optional | Defines the pronunciation of polyphonic characters, as phonetic annotations or replacement rules for specific characters or symbols. For polyphonic characters in Chinese text, tones are written as numbers: first tone is 1, second tone is 2, third tone is 3, fourth tone is 4, neutral tone is 5. Examples: ["燕少飞/(yan4)(shao3)(fei1)", "达菲/(da2)(fei1)", "omg/oh my god"] |
| payload | string | Optional | Pass-through parameter. It is transmitted as-is with no processing. Maximum 1048576 characters. |
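
The constraints on the text parameter can be checked on the client before a request is sent. The following is a minimal sketch of such a check, not part of the API: the function name validate_tts_text and the regular expressions are illustrative assumptions, and the server may apply additional or slightly different rules.

```python
import re

# A well-formed pause marker such as <#2#> or <#0.5#> (at most two decimals).
PAUSE_MARKER = re.compile(r"<#(\d+(?:\.\d{1,2})?)#>")
# Anything that looks like a marker, whether well-formed or not.
MARKER_LIKE = re.compile(r"<#.*?#>")


def validate_tts_text(text: str) -> list:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    if len(text) >= 10000:
        problems.append("text must be shorter than 10000 characters")

    for found in MARKER_LIKE.finditer(text):
        marker = found.group(0)
        match = PAUSE_MARKER.fullmatch(marker)
        if match is None:
            problems.append(f"malformed pause marker: {marker}")
        elif not 0.01 <= float(match.group(1)) <= 99.99:
            problems.append(f"pause duration out of range: {marker}")

    # Two markers with nothing but whitespace between them count as consecutive.
    if re.search(r"#>\s*<#", text):
        problems.append("pause markers must not be consecutive")

    # A marker must sit between two text segments, not at either end.
    stripped = text.strip()
    if stripped.startswith("<#") or stripped.endswith("#>"):
        problems.append("pause markers must sit between two text segments")

    return problems


print(validate_tts_text("Hello<#2#>I am vidu<#2#>nice to meet you!"))
```

The function reports every problem it finds rather than stopping at the first one, which makes it easier to show a user all issues in a single pass.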

Request Example

```json
{
    "text": "Artificial intelligence is changing the way we live, from smart homes to autonomous driving. Advances in technology are making the world more convenient.",
    "voice_setting_voice_id": "male-qn-daxuesheng"
}
```
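
The request example uses only the two required fields. As a rough sketch of how the optional voice settings might be added, the helper below builds a request body and rejects values outside the documented ranges. The function name build_tts_body is hypothetical. Because the parameter table lists every field as a string, the sketch serializes numbers as strings; whether the server also accepts JSON numbers is not stated in this document.

```python
import json

# Allowed values for voice_setting_emotion, as listed in the parameter table.
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "calm"}


def build_tts_body(text, voice_id, speed=None, volume=None, pitch=None, emotion=None):
    """Build a request body dict, validating optional settings (hypothetical helper)."""
    body = {"text": text, "voice_setting_voice_id": voice_id}

    def add_number(name, value, low, high):
        # Skip unset options so the server-side defaults apply.
        if value is None:
            return
        if not low <= value <= high:
            raise ValueError(f"{name} must be in [{low}, {high}], got {value}")
        # Assumption: numbers are sent as strings, following the Type column.
        body[name] = str(value)

    add_number("voice_setting_speed", speed, 0.5, 2)
    add_number("voice_setting_volume", volume, 0, 10)
    add_number("voice_setting_pitch", pitch, -12, 12)

    if emotion is not None:
        if emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {emotion}")
        body["voice_setting_emotion"] = emotion

    return body


body = build_tts_body("Hello<#2#>I am vidu", "male-qn-daxuesheng", speed=1.5)
print(json.dumps(body, ensure_ascii=False, indent=4))
```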

cURL Example

```bash
curl --location --request POST 'https://www.kkiai.com/ent/v2/audio-tts' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "text": "Artificial intelligence is changing the way we live, from smart homes to autonomous driving. Advances in technology are making the world more convenient.",
    "voice_setting_voice_id": "male-qn-daxuesheng"
}'
```
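
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the cURL call: same URL, same two headers, JSON body. The function names make_tts_request and submit_tts are illustrative, and the 30-second timeout is an arbitrary assumption rather than a documented value.

```python
import json
import urllib.request

TTS_URL = "https://www.kkiai.com/ent/v2/audio-tts"


def make_tts_request(token, body):
    """Build (but do not send) the POST request shown in the cURL example."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def submit_tts(token, body, timeout=30):
    """Send the request and return the decoded JSON response body."""
    request = make_tts_request(token, body)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    example_body = {
        "text": "Advances in technology are making the world more convenient.",
        "voice_setting_voice_id": "male-qn-daxuesheng",
    }
    # Replace <token> with a real API token before running.
    print(submit_tts("<token>", example_body))
```

Splitting request construction from sending keeps the network call in one place, so the headers and body can be inspected or logged without contacting the server.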

Response

🟢 200 Success

Response Body

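| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| task_id | string | Required | |
| state | string | Required | |
| model | string | Required | |
| prompt | string | Required | |
| duration | integer | Required | |
| seed | integer | Required | |
| created_at | string | Required | |
| credits | integer | Required | |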

Response Example

```json
{
    "task_id": "911094612548939776",
    "state": "created",
    "model": "audio1.0",
    "prompt": "The sound of raindrops falling on a window, accompanied by soft thunder.",
    "duration": 5,
    "seed": 0,
    "created_at": "2026-01-20T07:16:38.094635957Z",
    "credits": 10
}
```
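
A client will usually want the response as a typed object. The sketch below is one possible way to parse it. The class name TtsTask and both function names are hypothetical. The field meanings are not described in this document, so the code only copies them through. It assumes created_at is always a UTC timestamp ending in Z, as in the example, and truncates the nanosecond fraction to microseconds because Python's datetime stores at most six fractional digits.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TtsTask:
    task_id: str
    state: str
    model: str
    prompt: str
    duration: int
    seed: int
    created_at: datetime
    credits: int


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp like 2026-01-20T07:16:38.094635957Z (assumed UTC)."""
    base, _, fraction = value.rstrip("Z").partition(".")
    # Keep the first six fractional digits; pad if fewer were given.
    microsecond = int(fraction[:6].ljust(6, "0")) if fraction else 0
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=microsecond, tzinfo=timezone.utc)


def parse_tts_response(raw: str) -> TtsTask:
    """Turn the JSON response body into a TtsTask; raises KeyError if a field is missing."""
    data = json.loads(raw)
    return TtsTask(
        task_id=data["task_id"],
        state=data["state"],
        model=data["model"],
        prompt=data["prompt"],
        duration=data["duration"],
        seed=data["seed"],
        created_at=parse_timestamp(data["created_at"]),
        credits=data["credits"],
    )


raw = """{
    "task_id": "911094612548939776",
    "state": "created",
    "model": "audio1.0",
    "prompt": "The sound of raindrops falling on a window, accompanied by soft thunder.",
    "duration": 5,
    "seed": 0,
    "created_at": "2026-01-20T07:16:38.094635957Z",
    "credits": 10
}"""
task = parse_tts_response(raw)
print(task.task_id, task.state, task.created_at.isoformat())
```

Note that task_id stays a string: it is an 18-digit number in the example, and keeping it as text avoids precision loss in languages or tools that treat JSON numbers as floating point.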