POST /v1/audio/transcriptions
Audio transcriptions
curl --request POST \
  --url http://localhost:8000/v1/audio/transcriptions \
  --form 'file=@/path/to/file/audio.mp3'
{
  "text": "Hello, how are you?",
  "usage": {
    "type": "tokens",
    "prompt_tokens": 20,
    "completion_tokens": 10,
    "total_tokens": 30,
    "input_token_details": {
      "audio_tokens": 10,
      "text_tokens": 10
    }
  }
}
Given an audio file, the model transcribes it into text.
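
The request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the file is uploaded as multipart/form-data under the field name file and that the server runs at the localhost address used in the curl example. The helper names encode_multipart and transcribe are illustrative and not part of the API.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# Endpoint taken from the curl example; adjust host and port as needed.
URL = "http://localhost:8000/v1/audio/transcriptions"


def encode_multipart(fields, filename, file_bytes, boundary=None):
    """Encode text fields plus one `file` part as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, url=URL, **fields):
    """POST the audio file and return the decoded JSON response."""
    audio = Path(path)
    body, content_type = encode_multipart(fields, audio.name, audio.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running, transcribe("/path/to/file/audio.mp3", language="en")["text"] would return the transcript string.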

Body

multipart/form-data
file
file
required

The audio file object (not the file name) to transcribe. Supported formats include mp3, wav, flac, ogg, and many other standard audio formats.

model
string | null

Routes the request to a specific adapter.

Examples:

"(adapter-route)"

chunking_strategy

Controls how the audio is split into chunks. When set to "auto", the server first normalizes loudness and then uses voice activity detection (VAD) to choose chunk boundaries. Alternatively, a server_vad object can be provided to tune the VAD parameters manually. If unset, the audio is transcribed as a single block.

Allowed value: "auto"
language
string | null

The language of the input audio. Supplying the input language in ISO 639-1 format (e.g. en) improves accuracy and reduces latency.

temperature
number | null

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
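
The optional fields above can be checked on the client before a request is sent. The sketch below is one possible way to do that, using the constraints stated on this page (a two-letter ISO 639-1 language code, a temperature between 0 and 1, and "auto" as the chunking strategy). The function name build_form_fields is hypothetical, and the server_vad object form is left out because its parameters are not listed here.

```python
def build_form_fields(model=None, language=None, temperature=None, chunking_strategy=None):
    """Return the optional text fields to send, skipping any left as None."""
    fields = {}
    if model is not None:
        fields["model"] = model
    if language is not None:
        # ISO 639-1 codes are two ASCII letters, e.g. "en".
        if not (len(language) == 2 and language.isascii() and language.isalpha()):
            raise ValueError(f"expected an ISO 639-1 code, got {language!r}")
        fields["language"] = language.lower()
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        fields["temperature"] = str(temperature)
    if chunking_strategy is not None:
        # Only the documented string value is handled in this sketch.
        if chunking_strategy != "auto":
            raise ValueError('only "auto" is handled in this sketch')
        fields["chunking_strategy"] = chunking_strategy
    return fields
```

Rejecting out-of-range values locally avoids a round trip for requests the server would most likely refuse anyway.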

Response

Successfully transcribed the audio file.

text
string
required

The transcribed text.

usage
object
required
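
To illustrate the response shape, here is a small sketch that parses the sample response shown at the top of this page into the transcript text and a typed usage record. The Usage dataclass and parse_transcription are illustrative names, and treating input_token_details as optional is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    audio_tokens: int
    text_tokens: int


def parse_transcription(payload):
    """Split a response body into the transcript text and its usage counts."""
    data = json.loads(payload)
    usage = data["usage"]
    # Assumption: the detail breakdown may be absent, so default to 0.
    details = usage.get("input_token_details", {})
    return data["text"], Usage(
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
        total_tokens=usage["total_tokens"],
        audio_tokens=details.get("audio_tokens", 0),
        text_tokens=details.get("text_tokens", 0),
    )


# Sample response copied from this page.
SAMPLE = """
{
  "text": "Hello, how are you?",
  "usage": {
    "type": "tokens",
    "prompt_tokens": 20,
    "completion_tokens": 10,
    "total_tokens": 30,
    "input_token_details": {"audio_tokens": 10, "text_tokens": 10}
  }
}
"""

text, usage = parse_transcription(SAMPLE)
print(text)                # Hello, how are you?
print(usage.total_tokens)  # 30
```

In the sample, prompt_tokens plus completion_tokens equals total_tokens, and audio_tokens plus text_tokens equals prompt_tokens.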