
Typhoon ASR Documentation

Supported audio formats: .wav, .mp3, .flac, .ogg, .opus

Option 1: Hosted API

Typhoon’s hosted API is OpenAI-compatible. This is the fastest way to integrate ASR without setting up infrastructure.

You will need a Typhoon API key, which you can get for free at our web playground.

For request and response details, see OpenAI’s API documentation for Transcription.

from openai import OpenAI

client = OpenAI(
    api_key="<YOUR_API_KEY>",
    base_url="https://api.opentyphoon.ai/v1"
)

def transcribe_audio_file(audio_file_path):
    """
    Transcribe an audio file using Typhoon ASR API
    """
    try:
        with open(audio_file_path, 'rb') as audio_file:
            transcription = client.audio.transcriptions.create(
                file=audio_file,
                model="typhoon-asr-realtime"
            )
        return transcription
    except Exception as e:
        print(f"Error transcribing audio: {e}")
        return None

audio_file_path = "path/to/your/audio.wav"
transcription = transcribe_audio_file(audio_file_path)
if transcription:
    print(f"Transcription: {transcription.text}")
    print(f"Usage: {transcription.usage}")
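
Before uploading, it can help to reject files whose extension is not among those listed at the top of this page. The following is a minimal sketch: `SUPPORTED_EXTENSIONS` and `is_supported_audio` are hypothetical names, not part of the Typhoon API, and the check looks only at the file name, not at the file contents.

```python
from pathlib import Path

# Extensions listed as supported at the top of this page.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".opus"}


def is_supported_audio(path):
    """Return True if the file name ends in a supported extension (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

One way to use it is to call `is_supported_audio(audio_file_path)` before `transcribe_audio_file(audio_file_path)` and skip or convert files that fail the check.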
| Model ID | Size | Description | Rate Limits | Release Date |
|---|---|---|---|---|
| typhoon-asr-realtime | 114M | Streaming ASR | 100 requests/minute | 2025-09-08 |
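
When transcribing many files, a client-side limiter can help stay under the 100 requests/minute limit. The sketch below is one possible approach (a sliding-window limiter), not something the Typhoon API or the `openai` package provides. The class name `SlidingWindowLimiter` and its parameters are hypothetical, and the server may still reject requests, for example if several clients share one API key.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls in any window of `period` seconds."""

    def __init__(self, max_calls=100, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._calls = deque()  # start times of recent calls

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Sleep until the oldest recorded call expires.
            self._sleep(self.period - (now - self._calls[0]))
```

A typical use would be to create one `SlidingWindowLimiter()` and call its `wait()` method immediately before each `client.audio.transcriptions.create(...)` call.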

🖥️ Option 2: Self-Hosting with Python Package

For developers who want to run the model locally (CPU or GPU). No API key required.

pip install typhoon-asr

from typhoon_asr import transcribe

# Basic transcription
result = transcribe("audio.wav")
print(result['text'])

# With word timestamps
result = transcribe("audio.wav", with_timestamps=True)
for ts in result['timestamps']:
    print(f"[{ts['start']:.2f}s - {ts['end']:.2f}s] {ts['word']}")

# Specify device (CPU/GPU/auto)
result = transcribe("audio.wav", device="cuda")
print(result['text'])

transcribe(
    input_file,
    model_name="scb10x/typhoon-asr-realtime",
    with_timestamps=False,
    device="auto"
)

Parameters:

  • input_file (str) – Path to the audio file

  • model_name (str) – Hugging Face model identifier (default: scb10x/typhoon-asr-realtime)

  • with_timestamps (bool) – Return word timestamps (default: False)

  • device (str) – Device to run on: “auto”, “cpu”, or “cuda” (default: “auto”)
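
With `with_timestamps=True`, the usage example above reads each timestamp entry as a dict with `start`, `end`, and `word` keys. Assuming that shape, the sketch below merges word-level entries into longer segments, for example as a first step toward subtitles. `group_words`, `max_gap`, and `sep` are hypothetical names and not part of `typhoon_asr`; the 0.5 s default gap is an arbitrary illustration.

```python
def group_words(timestamps, max_gap=0.5, sep=" "):
    """Merge word-level entries into segments.

    A new segment starts whenever the silence between two consecutive
    words is longer than max_gap seconds.
    """
    segments = []
    for ts in timestamps:
        if segments and ts["start"] - segments[-1]["end"] <= max_gap:
            # Close enough to the previous word: extend the current segment.
            segments[-1]["end"] = ts["end"]
            segments[-1]["text"] += sep + ts["word"]
        else:
            # First word, or a long pause: start a new segment.
            segments.append(
                {"start": ts["start"], "end": ts["end"], "text": ts["word"]}
            )
    return segments
```

For example, `group_words(result['timestamps'])` returns a list of dicts with `start`, `end`, and `text`. For languages written without spaces between words, passing `sep=""` is probably the better choice.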

Returns a dictionary with:

  • text – Transcribed text

  • timestamps – Word timestamps (if enabled)

  • processing_time – Processing duration in seconds

  • audio_duration – Input audio length in seconds
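
The `processing_time` and `audio_duration` fields can be combined into a real-time factor (processing time divided by audio length), a common way to judge whether a device is fast enough for a workload. This is a minimal sketch that assumes the result behaves like a plain dict, as the `result['text']` usage above suggests; `real_time_factor` is a hypothetical helper, not part of `typhoon_asr`.

```python
def real_time_factor(result):
    """Return processing_time / audio_duration.

    Returns None when audio_duration is missing or zero, so callers
    do not hit a division by zero on empty audio.
    """
    duration = result.get("audio_duration")
    if not duration:
        return None
    return result["processing_time"] / duration
```

A value below 1.0 means the audio was transcribed faster than its playback length. Typical use would be `real_time_factor(transcribe("audio.wav"))`.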

Requirements:

  • Python ≥ 3.8

  • CUDA (optional, for GPU acceleration)

See our GitHub repository for more code examples, including a fine-tuning example: https://github.com/scb-10x/typhoon-asr

License: Apache Software License 2.0