Rate limits
Routic applies rate limits per API Key to ensure fair usage and system stability.
Default limits
| Limit | Default | Max (upper bound) |
|---|---|---|
| RPM (requests per minute) | 100 | 1,000 |
| TPM (tokens per minute) | 10,000 | 100,000 |
| Max budget | 100 USD | 1,000 USD |
| Budget duration | 30 days | 90 days |
Limits are set per key when the key is generated. Your dashboard shows the current limits for each key.
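Each request counts against both RPM and TPM, so which limit you hit first depends on how large your requests are. The sketch below works this out. It assumes TPM counts prompt plus completion tokens, and `effective_requests_per_minute` is a hypothetical helper, not part of any Routic SDK.

```python
def effective_requests_per_minute(rpm: int, tpm: int, tokens_per_request: int) -> int:
    """Return how many requests per minute a key can sustain.

    Every request counts against both limits, so the lower cap wins.
    tokens_per_request is an assumed average (prompt + completion tokens).
    """
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return min(rpm, tpm // tokens_per_request)


# Default limits from the table: 100 RPM, 10,000 TPM.
print(effective_requests_per_minute(100, 10_000, 300))  # 33: TPM is the binding limit
print(effective_requests_per_minute(100, 10_000, 50))   # 100: RPM is the binding limit
```

With the defaults and about 300 tokens per request (the size of the `usage` example on this page), TPM caps a key at roughly 33 requests per minute, well under the 100 RPM limit.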
How limits are applied
- Per-key: Limits apply to each API Key independently, not per account.
- Per-minute: RPM and TPM are calculated on a rolling 60-second window.
- No queuing: When a limit is exceeded, the request is rejected immediately with a 429 status. Requests are not queued.
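Because requests over the limit are rejected rather than queued, one option is to pace requests on the client so they stay under the key's RPM. The following is a minimal sketch of a rolling-window throttle, assuming a single thread sends all requests for the key. `RequestThrottle` is a hypothetical helper, not part of any Routic SDK, and the server's own window accounting may differ slightly.

```python
import time
from collections import deque


class RequestThrottle:
    """Client-side rolling-window request throttle (hypothetical helper).

    Blocks until one more request keeps the count within `rpm` over the
    last `window` seconds, then records that request's timestamp.
    """

    def __init__(self, rpm, window=60.0, clock=time.monotonic, sleep=time.sleep):
        if rpm < 1:
            raise ValueError("rpm must be at least 1")
        self.rpm = rpm
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of requests still inside the window

    def acquire(self):
        while True:
            now = self._clock()
            # Drop timestamps that have slid out of the rolling window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.rpm:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest request slides out.
            self._sleep(self.window - (now - self._sent[0]))
```

Create one throttle per key (for example `RequestThrottle(rpm=100)` for the default limit) and call `acquire()` before each request. It does not track TPM, so a 429 is still possible and a retry path is still needed.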
429 response
When you exceed a rate limit, you receive a 429 response:
```json
{
  "error": {
    "message": "RPM limit exceeded. Try again in 5 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
```
The response may include a `Retry-After` header indicating the number of seconds to wait.
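A retry loop can prefer this header over a computed backoff. The helper below is a sketch: `retry_delay` is a hypothetical name, it assumes `Retry-After` carries a number of seconds as described above, and it falls back to exponential backoff with jitter when the header is missing or not numeric. `headers` can be any mapping; a plain `dict` is case-sensitive, whereas header objects from HTTP libraries such as `httpx` are not.

```python
import math
import random
from typing import Mapping, Optional


def retry_delay(headers: Optional[Mapping[str, str]], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Return how many seconds to wait before retrying a 429 response."""
    value = headers.get("Retry-After") if headers else None
    if value is not None:
        try:
            seconds = float(value)
        except ValueError:
            seconds = None  # e.g. the HTTP-date form of Retry-After
        if seconds is not None and math.isfinite(seconds) and seconds >= 0:
            return seconds  # honor the server's hint as-is
    # No usable header: exponential backoff (1s, 2s, 4s, ...) plus jitter, capped.
    return min(base * (2 ** attempt) + random.uniform(0, 0.5), cap)
```

If you use the OpenAI Python SDK against Routic, the headers of a 429 should be reachable as `err.response.headers` on the raised `RateLimitError`, so `time.sleep(retry_delay(err.response.headers, attempt))` can replace a fixed backoff computation.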
How to handle rate limits
Retry with exponential backoff
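```python
import random
import time

from openai import RateLimitError  # assumes the OpenAI Python SDK (v1+)

# `client` is assumed to be an OpenAI-compatible client configured for Routic.
def call_api_with_retry(messages, model, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:  # raised on HTTP 429 instead of returning a response
            if attempt == max_retries - 1:
                break  # no attempts left; don't sleep again
            # 1s, 2s, 4s, ... plus up to 0.5s of jitter, capped at 60s
            delay = min(1 * (2 ** attempt) + random.uniform(0, 0.5), 60)
            time.sleep(delay)
    raise Exception("Max retries exceeded")
```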
Monitor your usage
Check the `usage` object in each response to track your token consumption:
```json
{
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 200,
    "total_tokens": 300
  }
}
```
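To compare consumption against TPM, the `usage` values can be summed over the same rolling 60-second window. This is a client-side sketch: `TokenUsageTracker` is a hypothetical helper, it assumes `usage` is available as a dict with the shape shown above, and it assumes TPM counts `total_tokens` (prompt plus completion).

```python
import time
from collections import deque


class TokenUsageTracker:
    """Sum usage["total_tokens"] over a rolling window (hypothetical helper)."""

    def __init__(self, window=60.0, clock=time.monotonic):
        self.window = window
        self._clock = clock
        self._events = deque()  # (timestamp, total_tokens) per response
        self._total = 0

    def record(self, usage: dict) -> None:
        tokens = usage["total_tokens"]
        self._events.append((self._clock(), tokens))
        self._total += tokens

    def tokens_in_window(self) -> int:
        now = self._clock()
        # Forget responses that have slid out of the rolling window.
        while self._events and now - self._events[0][0] >= self.window:
            _, tokens = self._events.popleft()
            self._total -= tokens
        return self._total


tracker = TokenUsageTracker()
tracker.record({"prompt_tokens": 100, "completion_tokens": 200, "total_tokens": 300})
print(tracker.tokens_in_window())  # 300
```

Checking `tokens_in_window()` against the key's TPM before sending a large request gives an early warning, though the server's count is the one that decides.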
Reduce token consumption
- Use shorter prompts
- Truncate old conversation turns in multi-turn scenarios
- Use `max_tokens` to limit output length
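Truncating old turns can be done before each request. The following is a minimal sketch, assuming OpenAI-style message dicts with `role` and string `content`. `truncate_history` and `estimate_tokens` are hypothetical helpers, and the 4-characters-per-token rule is only a rough heuristic, not how Routic counts tokens; for exact figures, rely on `usage.prompt_tokens` from earlier responses or your model's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: about 4 characters per token for English text.
    return max(1, len(text) // 4)


def truncate_history(messages: list[dict], max_prompt_tokens: int) -> list[dict]:
    """Drop the oldest non-system turns until the estimated prompt fits.

    System messages are always kept (and placed first); the newest turn
    is never dropped, even if it alone exceeds the budget.
    """
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(estimate_tokens(m["content"]) for m in msgs)

    while len(turns) > 1 and total(system + turns) > max_prompt_tokens:
        turns.pop(0)  # remove the oldest turn first
    return system + turns
```

Dropping whole turns from the front keeps the most recent context intact; summarizing older turns instead is an alternative when early context matters.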
Requesting higher limits
If you need higher rate limits, contact support with:
- Your API Key ID
- Current limits
- Desired limits
- Expected usage pattern
On request, limits can be raised up to the upper bound (1,000 RPM / 100,000 TPM). Limits above the upper bound require special approval.