Typhoon ASR Documentation

About The Model

Typhoon ASR Real-Time:
- Full release announcement
- Model Fact Sheet
- Web Playground - try the model instantly in your browser; suited to general users

🎧 Supported Audio Formats

.wav, .mp3, .flac, .ogg, .opus
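Checking the extension before uploading or transcribing a file can catch unsupported inputs early. The sketch below is a minimal illustration that uses only the formats listed above; the helper name `is_supported_audio` is hypothetical and is not part of any Typhoon package.

```python
from pathlib import Path

# Extensions listed in this document as supported.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".opus"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one of the listed formats.

    The comparison is case-insensitive and looks only at the file name;
    it does not inspect the actual audio content.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

This only checks the file name, so a mislabeled file would still pass.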

🔌 Option 1: Use the Typhoon API

The hosted Typhoon API is OpenAI-compatible, which makes it the fastest way to integrate ASR without setting up your own infrastructure.

You need a Typhoon API key, which you can request for free from our Web Playground.

For request and response details, see the OpenAI API documentation for transcription.

Example:

from openai import OpenAI

# Connect to the Typhoon API
client = OpenAI(
    base_url="https://api.opentyphoon.ai/v1",
    api_key="your_api_key_here"
)

# Send an audio file for transcription
with open("audio.wav", "rb") as f:
    response = client.audio.transcriptions.create(
        model="typhoon-asr-realtime",
        file=f
    )

print(response.text)
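Rather than hard-coding the key in source files, it can be read from the environment. This is a minimal sketch; the variable name `TYPHOON_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not names defined by the Typhoon API.

```python
import os


def load_api_key(env_var: str = "TYPHOON_API_KEY") -> str:
    """Read the API key from an environment variable.

    `TYPHOON_API_KEY` is an assumed name; use whatever variable your
    deployment defines. Raises RuntimeError if the variable is unset or empty.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Typhoon API key."
        )
    return key
```

The returned string can be passed as `api_key` when constructing the `OpenAI` client shown above.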

Reference

Model ID	Size	Description	Rate Limits	Release Date
`typhoon-asr-realtime`	114M	Streaming ASR	100 reqs/minute	2025-09-08
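When transcribing many files in a batch, a client-side pacer can help stay under the listed limit of 100 requests per minute. The sketch below spaces calls evenly (one every 0.6 seconds at the default). How the service actually counts its window is not stated here, so even spacing is a conservative assumption, and the class name `RequestPacer` is hypothetical.

```python
import time


class RequestPacer:
    """Space calls at least 60 / max_per_minute seconds apart."""

    def __init__(self, max_per_minute=100, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot relative to when this call is released.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Call `pacer.wait()` immediately before each transcription request. The pacer does not retry failed requests; handling rate-limit errors from the server would need separate logic.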

🖥️ Option 2: Self-Hosting with the Python Package

For developers who want to run the model on their own machine (CPU or GPU) without an API key.

Install the package

pip install typhoon-asr

Example usage (local)

from typhoon_asr import transcribe

# Basic transcription
result = transcribe("audio.wav")
print(result['text'])

# With word timestamps
result = transcribe("audio.wav", with_timestamps=True)
for ts in result['timestamps']:
    print(f"[{ts['start']:.2f}s - {ts['end']:.2f}s] {ts['word']}")

# Specify device (CPU/GPU/auto)
result = transcribe("audio.wav", device="cuda")
print(result['text'])
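Word timestamps are often turned into subtitles. The following sketch groups the `start`, `end`, and `word` fields shown above into SubRip (SRT) cues. The function names, the fixed `words_per_cue` grouping, and the `sep` joiner are illustrative assumptions; an empty separator is the default on the assumption that Thai text is usually written without spaces.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def timestamps_to_srt(timestamps, words_per_cue=8, sep=""):
    """Build SRT text from a list of {'start', 'end', 'word'} dicts.

    Each cue spans `words_per_cue` consecutive words: it starts at the
    first word's start time and ends at the last word's end time.
    """
    cues = []
    for i in range(0, len(timestamps), words_per_cue):
        chunk = timestamps[i:i + words_per_cue]
        start = format_srt_time(chunk[0]["start"])
        end = format_srt_time(chunk[-1]["end"])
        text = sep.join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

For example, `timestamps_to_srt(result['timestamps'])` could be written to a `.srt` file after calling `transcribe` with `with_timestamps=True`.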

API Reference (Self-Hosted Mode)

transcribe(
    input_file,
    model_name="scb10x/typhoon-asr-realtime",
    with_timestamps=False,
    device="auto"
)

Parameters:

input_file (str) – path to the audio file
model_name (str) – Hugging Face model identifier (default: scb10x/typhoon-asr-realtime)
with_timestamps (bool) – return word-level timestamps (default: False)
device (str) – "auto", "cpu", or "cuda" (default: "auto")

Returns (dict):

text – the transcribed text
timestamps – word-level timestamps (only when with_timestamps is enabled)
processing_time – time spent processing (seconds)
audio_duration – length of the audio file (seconds)
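The `processing_time` and `audio_duration` fields can be combined into a real-time factor (RTF), a common way to judge whether a device keeps up with incoming audio. This is a small sketch; the helper name `real_time_factor` is hypothetical and not part of the package.

```python
def real_time_factor(result: dict) -> float:
    """Return processing_time / audio_duration for a transcribe() result.

    Values below 1.0 mean the audio was processed faster than its
    playback length.
    """
    duration = result["audio_duration"]
    if duration <= 0:
        raise ValueError("audio_duration must be positive")
    return result["processing_time"] / duration
```

A result with 2 seconds of processing for 10 seconds of audio gives an RTF of 0.2.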

System Requirements

Python ≥ 3.8
CUDA (optional, for GPU acceleration)

Additional Links

More code examples, including code for fine-tuning the model, are available at https://github.com/scb-10x/typhoon-asr

License

Apache Software License 2.0