Tokens and Usage
Understanding how tokens work is essential for effectively using the OpenTyphoon.ai API. This page explains tokens, context windows, and provides tips for optimizing your token usage.
What are Tokens?
Tokens are the fundamental units of text processing in language models. The OpenTyphoon.ai API processes text by breaking it down into tokens before sending it to the model. A token can be as short as a single character or as long as a word, depending on the language and specific text.
For Thai text:
- A word is typically 1-3 tokens
- Space characters count as tokens
- Punctuation marks are separate tokens
- Numbers are typically broken down into individual digits
For example, the Thai phrase “สวัสดีครับ” might be tokenized into approximately 3-4 tokens.
Context Window
The context window refers to the maximum number of tokens that a model can process in a single request, including both the input (prompt) and the generated output. Each OpenTyphoon.ai model has a specific context window size:
| Model | Total Context Window | Input Token Limit | Output Token Limit |
|---|---|---|---|
| All Typhoon models | 8K tokens | Depends on output | Depends on input |
The total of input + output tokens cannot exceed the context window size. For example, with an 8K context window, if your input prompt uses 7K tokens, the model can only generate up to 1K tokens in response.
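To make the budgeting concrete, here is a small sketch of the arithmetic (8192 is the 8K window used throughout this page; the function name is illustrative):

```python
CONTEXT_WINDOW = 8192  # 8K tokens, shared between input and output

def max_output_budget(input_tokens, requested_output=1024):
    """Largest max_tokens value that still fits inside the context window."""
    remaining = CONTEXT_WINDOW - input_tokens
    if remaining <= 0:
        raise ValueError("Prompt already fills the context window.")
    return min(requested_output, remaining)

# A 7,000-token prompt leaves at most 1,192 tokens for the response.
print(max_output_budget(7000, requested_output=2000))  # 1192
```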
Token Counting
To estimate the number of tokens in your text, you can use this rough approximation:
- For Thai text: approximately 2-3 tokens per word
- For English text: approximately 1.3 tokens per word
For more precise token counting, especially in production applications, you should use a proper tokenizer. Unfortunately, the exact tokenizer used by OpenTyphoon.ai is not publicly available, but you can use similar tokenizers for estimation.
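As one option, the sketch below uses tiktoken's `cl100k_base` encoding purely as a stand-in; it is not OpenTyphoon.ai's tokenizer, so treat the counts as estimates:

```python
# pip install tiktoken
import tiktoken

# cl100k_base is a stand-in encoding, NOT the actual Typhoon tokenizer;
# exact counts will differ, but the order of magnitude is usually close.
encoding = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(encoding.encode(text))

print(count_tokens("Hello, how are you?"))  # English text
print(count_tokens("สวัสดีครับ"))            # Thai text often yields more tokens per character
```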
Token Usage in API Responses
Each API response includes a `usage` field that provides information about the tokens used in your request:
{ "id": "cmpl-abc123", "object": "chat.completion", "created": 1677858242, "model": "typhoon-v2-70b-instruct", "usage": { "prompt_tokens": 25, "completion_tokens": 12, "total_tokens": 37 }, "choices": [...]}
- `prompt_tokens`: The number of tokens in your input messages
- `completion_tokens`: The number of tokens in the generated response
- `total_tokens`: The total number of tokens used in the request (prompt + completion)
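Since the API is OpenAI-compatible (the other examples on this page use `client.chat.completions.create`), these fields can be read directly off the response object; a minimal sketch, assuming `client` is already configured:

```python
response = client.chat.completions.create(
    model="typhoon-v2-70b-instruct",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    max_tokens=100,
)

usage = response.usage
print(f"Prompt tokens:     {usage.prompt_tokens}")
print(f"Completion tokens: {usage.completion_tokens}")
print(f"Total tokens:      {usage.total_tokens}")
```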
Best Practices for Token Usage
Optimize Prompts
Keep your prompts concise and focused:
- Remove unnecessary context: Include only the information that's directly relevant to your query.
- Use system messages efficiently: System messages can help set behavior without lengthy explanations in each user message.

  ```python
  # Efficient use of system message
  messages = [
      {"role": "system", "content": "You are a Thai language translator. Always translate English to Thai."},
      {"role": "user", "content": "Hello, how are you?"}
  ]
  ```

- Truncate conversation history: For long conversations, consider keeping only the most recent and relevant messages (one approach is sketched below).
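For the last point, a minimal sketch of a history-truncation helper; the `keep_last` value is an arbitrary choice for illustration:

```python
def truncate_history(messages, keep_last=6):
    """Keep the system message (if any) plus the last `keep_last` turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```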
Handling Long Inputs
When working with long documents or conversations that might exceed token limits:
- Chunk the text: Split long text into smaller chunks and process them separately.

  ```python
  def process_long_document(document, chunk_size=1000, overlap=100):
      # This is a simplified example - in practice, you'd want to split on sentence boundaries
      words = document.split()
      chunks = []
      for i in range(0, len(words), chunk_size - overlap):
          chunk = ' '.join(words[i:i + chunk_size])
          chunks.append(chunk)
      return chunks
  ```

- Summarize previous context: Instead of sending the entire conversation history, summarize earlier turns.

  ```python
  # Example summarization approach
  summary_messages = [
      {"role": "system", "content": "Summarize the following conversation in 2-3 sentences."},
      {"role": "user", "content": full_conversation_history}
  ]
  summary_response = client.chat.completions.create(
      model="typhoon-v2-70b-instruct",
      messages=summary_messages,
      max_tokens=100
  )
  summary = summary_response.choices[0].message.content

  # Now use the summary + recent messages
  new_conversation = [
      {"role": "system", "content": "Previous conversation summary: " + summary},
      # Add recent message exchanges...
  ]
  ```

- Use retrieval-based approaches: For question answering with large documents, use retrieval to fetch only the most relevant portions (see the sketch below).
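A bare-bones sketch of the retrieval idea, using a hypothetical `embed()` callable (any embedding model that returns a vector would do) and cosine similarity to rank chunks:

```python
import numpy as np

def top_k_chunks(question, chunks, embed, k=3):
    """Return the k chunks most similar to the question.

    `embed` is a hypothetical callable mapping text to a 1-D numpy array;
    plug in whichever embedding model you actually use.
    """
    q = embed(question)
    scored = []
    for chunk in chunks:
        c = embed(chunk)
        similarity = float(np.dot(q, c) / (np.linalg.norm(q) * np.linalg.norm(c)))
        scored.append((similarity, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]
```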
Managing Token Limits
When working within token limits:
- Set appropriate `max_tokens`: Specify a reasonable value for the `max_tokens` parameter based on how long you expect the response to be.
- Monitor token usage: Keep track of token usage to ensure you stay within limits and optimize where necessary.
- Implement truncation strategies: Have a plan for handling cases where content exceeds token limits (one such strategy is sketched below).
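For the truncation strategy, here is a minimal sketch that reuses the `estimate_tokens` helper defined in the example below; the +4 per-message overhead mirrors that example and is itself an approximation:

```python
def fit_to_budget(messages, max_input_tokens):
    """Drop the oldest non-system messages until the estimate fits the budget."""
    trimmed = list(messages)

    def total(msgs):
        # +4 per message mirrors the formatting overhead used in the estimator below
        return sum(estimate_tokens(m.get("content", "")) + 4 for m in msgs)

    while total(trimmed) > max_input_tokens and len(trimmed) > 1:
        # Keep the system message at index 0 (if present); drop the oldest turn after it
        drop_index = 1 if trimmed[0].get("role") == "system" else 0
        trimmed.pop(drop_index)
    return trimmed
```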
Example: Token Usage Estimator
```python
def estimate_tokens(text, lang="thai"):
    # Very rough estimation - use a proper tokenizer for production
    words = len(text.split())
    if lang.lower() == "thai":
        # Thai text: estimate 2.5 tokens per word (rough average).
        # NOTE: Thai is written without spaces, so split() undercounts words;
        # treat this as a floor rather than an accurate estimate.
        return int(words * 2.5)
    else:
        # English text: estimate 1.3 tokens per word
        return int(words * 1.3)


def check_token_limits(messages, model="typhoon-v2-70b-instruct", max_output_tokens=500):
    # Estimate total input tokens
    input_tokens = 0
    for message in messages:
        content = message.get("content", "")
        lang = "english" if all(ord(c) < 128 for c in content) else "thai"
        # Add 4 tokens per message to account for chat formatting overhead
        input_tokens += estimate_tokens(content, lang) + 4

    # Check against the model's context limit
    context_limit = 8192  # 8K tokens for all Typhoon models
    remaining_tokens = context_limit - input_tokens

    if remaining_tokens <= 0:
        print(f"Warning: Input exceeds context window of {context_limit} tokens.")
        return False

    if remaining_tokens < max_output_tokens:
        print(f"Warning: Only {remaining_tokens} tokens remaining for output "
              f"(requested {max_output_tokens}).")
        print("Consider reducing input length or requested output tokens.")
        return False

    print(f"Estimated input tokens: {input_tokens}")
    print(f"Remaining tokens for output: {remaining_tokens}")
    return True
```
Thai Language Considerations
Thai language has some specific characteristics that affect tokenization:
- No spaces between words: Thai doesn't use spaces between words, which affects tokenization differently than space-delimited languages like English.
- Character-level tokens: Many Thai characters or combinations of characters become individual tokens.
- Tone marks and vowels: These are often separate tokens from the consonants they modify.
These characteristics mean that Thai text may use more tokens than you might expect compared to English text of similar meaning. Keep this in mind when designing prompts and estimating token usage.
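To see the effect, compare rough counts for sentences of similar meaning using the stand-in encoding from earlier (again, an estimate, not the model's own tokenizer):

```python
import tiktoken

# Stand-in encoding, NOT the actual Typhoon tokenizer; counts are estimates only.
encoding = tiktoken.get_encoding("cl100k_base")

english = "Hello, how are you today?"
thai = "สวัสดีครับ วันนี้เป็นอย่างไรบ้าง"

print(len(encoding.encode(english)))  # comparatively few tokens
print(len(encoding.encode(thai)))     # typically more tokens for similar meaning
```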