Rate limits

Routic applies rate limits per API Key to ensure fair usage and system stability.

Default limits

Limit                      Default    Max (upper bound)
RPM (requests per minute)  100        1,000
TPM (tokens per minute)    10,000     100,000
Max budget                 100 USD    1,000 USD
Budget duration            30 days    90 days

Limits are set per key when the key is generated. Your dashboard shows the current limits for each key.

How limits are applied

  • Per-key: Limits apply to each API Key independently, not per account.
  • Per-minute: RPM and TPM are calculated on a rolling 60-second window.
  • No queuing: When a limit is exceeded, the request is rejected immediately with a 429 status. Requests are not queued.
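
Because rejected requests are not queued, it can help to throttle on the client side before sending. The following is a minimal sketch of a local rolling-window counter. It only approximates the server's accounting, and the class and method names (RollingWindowLimiter, try_acquire, seconds_until_free) are hypothetical, not part of the Routic API.

```python
import time
from collections import deque


class RollingWindowLimiter:
    """Client-side guard that mirrors a rolling 60-second RPM window."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = deque()  # timestamps of requests still inside the window

    def _evict(self, now):
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()

    def try_acquire(self):
        """Record a request and return True if it fits; otherwise return False."""
        now = self.clock()
        self._evict(now)
        if len(self._sent) >= self.max_requests:
            return False
        self._sent.append(now)
        return True

    def seconds_until_free(self):
        """Seconds until the oldest request leaves the window (0.0 if a slot is free)."""
        now = self.clock()
        self._evict(now)
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])
```

A caller would check try_acquire() before each request and, when it returns False, sleep for seconds_until_free() instead of sending a request that is likely to be rejected.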

429 response

When you exceed a rate limit, you receive a 429 response:

{
  "error": {
    "message": "RPM limit exceeded. Try again in 5 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}

The response may include a Retry-After header indicating the number of seconds to wait.
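
As an illustration, the error body and the optional Retry-After header can be read as follows. This is a sketch using only the standard library: the function name parse_rate_limit and the default_wait fallback are assumptions for this example, and it takes the status code, headers, and body as plain values so it does not depend on any particular HTTP client.

```python
import json


def parse_rate_limit(status_code, headers, body, default_wait=1.0):
    """Return (message, wait_seconds) for a 429 response, or None otherwise."""
    if status_code != 429:
        return None
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        payload = {}  # body was missing or not valid JSON
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    message = error.get("message", "rate limit exceeded")
    # Header names are case-insensitive, so normalize before looking up.
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        wait = float(lowered["retry-after"])
    except (KeyError, ValueError):
        wait = default_wait  # the header is optional; fall back to a default
    return message, wait
```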

How to handle rate limits

Retry with exponential backoff

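import random
import time

import openai

# Assumes `client` is an already-configured OpenAI-compatible SDK client
# (openai>=1.0), which raises openai.RateLimitError on an HTTP 429
# instead of returning a response object with a status code.

def call_api_with_retry(messages, model, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except openai.RateLimitError as err:
            if attempt == max_retries - 1:
                break  # out of retries; do not sleep again
            # Prefer the server's Retry-After hint; otherwise back off
            # exponentially (1s, 2s, 4s, ...) with jitter, capped at 60s.
            retry_after = err.response.headers.get("retry-after")
            backoff = min(2 ** attempt + random.uniform(0, 0.5), 60)
            time.sleep(float(retry_after) if retry_after else backoff)
    raise RuntimeError("Max retries exceeded")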

Monitor your usage

Check the usage object in each response to track your token consumption:

{
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 200,
    "total_tokens": 300
  }
}
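
To compare consumption against a TPM limit, the usage values can be summed over the last 60 seconds. The sketch below assumes that TPM counts total_tokens (prompt plus completion), which this page does not state explicitly. TokenUsageTracker is a hypothetical helper, not part of any SDK.

```python
import time
from collections import deque


class TokenUsageTracker:
    """Sums `total_tokens` from recent responses over a rolling window."""

    def __init__(self, tpm_limit, window_seconds=60.0, clock=time.monotonic):
        self.tpm_limit = tpm_limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._events = deque()  # (timestamp, total_tokens) pairs

    def record(self, usage):
        """Record the parsed `usage` object from one response."""
        self._events.append((self.clock(), usage["total_tokens"]))

    def tokens_in_window(self):
        now = self.clock()
        # Discard entries older than the window before summing.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            self._events.popleft()
        return sum(tokens for _, tokens in self._events)

    def remaining(self):
        """Tokens left before the assumed TPM limit is reached (never negative)."""
        return max(self.tpm_limit - self.tokens_in_window(), 0)
```

Calling record(...) with the usage object of every response and checking remaining() before large requests gives an early warning before the server returns a 429.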

Reduce token consumption

  • Use shorter prompts
  • Truncate old conversation turns in multi-turn scenarios
  • Use max_tokens to limit output length

Requesting higher limits

If you need higher rate limits, contact support with:

  1. Your API Key ID
  2. Current limits
  3. Desired limits
  4. Expected usage pattern

On request, limits can be raised up to the upper bound (1,000 RPM / 100,000 TPM). Limits above the upper bound require special approval.
