Single-sentence Voice Recognition Interface Documentation
Interface Description
The Single-sentence Voice Recognition (Flash ASR) interface quickly transcribes short audio clips into text. It is designed for short-utterance scenarios such as voice commands and short voice messages.
Request
HTTP Request
POST /v1/audio/asr/flash
Request Headers
| Parameter | Type | Required | Default Value | Description |
|---|---|---|---|---|
| format | string | No | wav | Audio format; supports wav, mp3, pcm, and other common formats |
| sample_rate | integer | No | 16000 | Audio sampling rate, in Hz |
| max_sentence_silence | integer | No | 3000 | Maximum silence duration between sentences, in milliseconds |
| model | string | No | - | ASR model to use; if omitted, the default model is used |
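As a minimal sketch (Python, standard library only), the optional headers can be assembled with the defaults listed in the table. The helper name build_asr_headers is hypothetical; values are converted to strings because HTTP header values are text, and the model header is simply left out when no model is chosen so the server default applies.

```python
from typing import Dict, Optional


def build_asr_headers(
    audio_format: str = "wav",
    sample_rate: int = 16000,
    max_sentence_silence: int = 3000,
    model: Optional[str] = None,
) -> Dict[str, str]:
    """Return the Flash ASR request headers as strings.

    Defaults mirror the documented defaults; the model header is omitted
    when model is None so the server falls back to its default model.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be a positive number of Hz")
    headers = {
        "format": audio_format,
        "sample_rate": str(sample_rate),  # Hz
        "max_sentence_silence": str(max_sentence_silence),  # milliseconds
    }
    if model is not None:
        headers["model"] = model
    return headers


# Example: 8 kHz PCM audio with a shorter silence limit.
print(build_asr_headers("pcm", sample_rate=8000, max_sentence_silence=800))
```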
Request Body
The request body is the raw binary audio data: the audio file's contents are sent directly as the body.
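One way to send such a request is sketched below with Python's urllib.request. The function names, the base URL, and the Content-Type of application/octet-stream are assumptions, since this section specifies only the path, the headers, and that the body is raw audio; any authentication the service may require is not shown, and the 30-second timeout is an arbitrary choice.

```python
import urllib.request
from typing import Dict


def build_flash_asr_request(
    base_url: str, audio: bytes, headers: Dict[str, str]
) -> urllib.request.Request:
    """Build a POST to /v1/audio/asr/flash whose body is the raw audio bytes."""
    url = base_url.rstrip("/") + "/v1/audio/asr/flash"
    # Assumption: the docs only say the body is binary audio, so a generic
    # binary Content-Type is used; caller-supplied headers may override it.
    all_headers = {"Content-Type": "application/octet-stream", **headers}
    return urllib.request.Request(url, data=audio, headers=all_headers, method="POST")


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the response body decoded as UTF-8."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage (hypothetical host and file name):
# with open("command.wav", "rb") as f:
#     req = build_flash_asr_request(
#         "https://asr.example.com", f.read(), {"format": "wav", "sample_rate": "16000"}
#     )
# print(send(req))
```

Note that urllib.request normalizes header names with str.capitalize(), so sample_rate goes out as Sample_rate; HTTP header names are case-insensitive, so a conforming server should treat them the same.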
Supported audio formats:
- WAV
- MP3
- PCM
- Other common audio formats (specific support depends on the selected model)
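For WAV input, the sample_rate header presumably should describe the audio actually being sent. The minimal sketch below reads the rate from the file itself using Python's standard wave module, which handles uncompressed PCM WAV files. describe_wav is a hypothetical helper, and checking channel count or sample width against model requirements is left to the caller, since this section does not state those requirements.

```python
import wave
from typing import BinaryIO, Dict, Union


def describe_wav(source: Union[str, BinaryIO]) -> Dict[str, int]:
    """Return basic properties of an uncompressed PCM WAV file.

    source may be a path or a binary file-like object; wave.open accepts both.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,  # candidate value for the sample_rate header, in Hz
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_ms": round(1000 * wav.getnframes() / rate),
        }


# Usage (hypothetical file name):
# info = describe_wav("command.wav")
# headers = {"format": "wav", "sample_rate": str(info["sample_rate"])}
```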
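Response
The response is a JSON object such as the following. In this example the audio contains two sentences, 今天天气真不错 ("The weather is really nice today") and 我很开心 ("I'm very happy"); each entry in flash_result.sentences gives the recognized text together with its begin_time and end_time.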
```json
{
  "task_id": "asr-task-123456",
  "user": "user-123",
  "flash_result": {
    "duration": 5600,
    "sentences": [
      {
        "text": "今天天气真不错",
        "begin_time": 0,
        "end_time": 2500
      },
      {
        "text": "我很开心",
        "begin_time": 3000,
        "end_time": 5600
      }
    ]
  }
}
```
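A response like this one can be turned into typed records with a few lines of Python. This is a sketch that assumes only the field names shown in the example response; the Sentence class and the helper functions are hypothetical. Joining with an empty separator suits Chinese text (pass " " for languages that separate words with spaces), and the gaps between sentences come out in whatever units begin_time and end_time use.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    text: str
    begin_time: int
    end_time: int


def parse_flash_result(body: str) -> List[Sentence]:
    """Extract the recognized sentences from a Flash ASR JSON response body."""
    data = json.loads(body)
    return [
        Sentence(s["text"], s["begin_time"], s["end_time"])
        for s in data["flash_result"]["sentences"]
    ]


def transcript(sentences: List[Sentence], sep: str = "") -> str:
    """Join the sentence texts into one transcript string."""
    return sep.join(s.text for s in sentences)


def gaps(sentences: List[Sentence]) -> List[int]:
    """Time between the end of each sentence and the start of the next."""
    return [b.begin_time - a.end_time for a, b in zip(sentences, sentences[1:])]
```

With the example response above, transcript() yields 今天天气真不错我很开心 and gaps() yields [500].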