Error handling & retries
This guide covers how to handle errors and retries when calling the Routic API. For the full list of error codes, see HTTP semantics & error payloads.
Core principles
- Always log request_id: every error response includes this field, and you'll need it when contacting support.
- Distinguish retryable from non-retryable: 429, 500, and 503 are safe to retry; 400, 401, 402, and 422 require a fix first.
- Use exponential backoff — never retry immediately, never retry at fixed intervals.
Retryable vs non-retryable
| Status code | Retry? | Why |
|---|---|---|
| 400 | ❌ | Your request is malformed — retrying won't help |
| 401 | ❌ | Invalid API key — retrying won't help |
| 402 | ❌ | Insufficient balance — top up first |
| 422 | ❌ | Invalid parameter combination — fix and resend |
| 429 | ✅ | Rate limit — just wait a bit |
| 500 | ✅ | Temporary server issue |
| 503 | ✅ | Model provider temporarily unavailable |
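The table above can be reduced to a small lookup helper. This is a minimal sketch: the names `RETRYABLE_STATUSES` and `is_retryable` are illustrative, and treating any status not listed in the table as non-retryable is an assumption, not documented Routic behavior.

```python
# Status codes the table marks as safe to retry.
RETRYABLE_STATUSES = frozenset({429, 500, 503})


def is_retryable(status_code: int) -> bool:
    """Return True only for statuses the table marks as retryable.

    Assumption: anything not listed in the table is treated as
    non-retryable, which is the conservative default.
    """
    return status_code in RETRYABLE_STATUSES


for code in (400, 429, 503):
    print(code, "retry" if is_retryable(code) else "fix first")
```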
How to retry
Use exponential backoff with jitter:
delay = min(1s × 2^attempt + random_jitter, 60s)
- Base delay: 1 second
- Max retries: 3
- Max delay: 60 seconds
- Jitter: 0–500 ms (prevents thundering herd)
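The formula and parameters above might be expressed as a standalone function. This is a sketch only: `backoff_delay` and the constant names are hypothetical, and `attempt` is assumed to start at 0 for the first retry.

```python
import random

BASE_DELAY = 1.0   # seconds
MAX_DELAY = 60.0   # seconds
MAX_JITTER = 0.5   # seconds (0 to 500 ms)


def backoff_delay(attempt: int) -> float:
    """Seconds to wait before retry number `attempt` (0-indexed)."""
    jitter = random.uniform(0.0, MAX_JITTER)
    return min(BASE_DELAY * (2 ** attempt) + jitter, MAX_DELAY)


for attempt in range(3):
    print(f"retry {attempt + 1}: wait {backoff_delay(attempt):.2f}s")
```

With three retries, the waits fall in the ranges 1.0 to 1.5 s, 2.0 to 2.5 s, and 4.0 to 4.5 s, so the 60 s cap only comes into play if you raise the retry count to seven or more.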
Python (OpenAI SDK built-in retry)
The OpenAI SDK has automatic retry built in — just configure it:
from openai import OpenAI
client = OpenAI(
    base_url="https://api.routic.ai/v1",
    api_key="sk-xxxxxxxx",
    max_retries=3,  # retry up to 3 times
    timeout=30.0,  # 30s timeout per request
)
# 429/500/503 are retried automatically; no retry code needed
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Hello"}],
)
Python (manual retry)
If you need more control:
import time
import random
from openai import OpenAI
client = OpenAI(
    base_url="https://api.routic.ai/v1",
    api_key="sk-xxxxxxxx",
    max_retries=0,  # disable SDK auto-retry
)
def call_with_retry(messages, max_retries=3):
    for attempt in range(max_retries + 1):
        try:
            return client.chat.completions.create(
                model="deepseek-r1",
                messages=messages,
            )
        except Exception as e:
            # Non-retryable: raise immediately
            if getattr(e, "status_code", None) in (400, 401, 402, 422):
                raise
            # Out of attempts: surface the last error
            if attempt >= max_retries:
                raise
            # Retryable: back off (1s, 2s, 4s, ... plus up to 500 ms of jitter)
            delay = min(1 * (2 ** attempt) + random.uniform(0, 0.5), 60)
            print(f"Attempt {attempt + 1} failed. Retrying in {delay:.1f}s...")
            time.sleep(delay)
Node.js (OpenAI SDK)
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.routic.ai/v1",
  apiKey: "sk-xxxxxxxx",
  maxRetries: 3,
});
Idempotency
The chat completions endpoint is not idempotent — sending the same request twice consumes tokens twice. So:
- Don't retry the same request blindly — unless it's a 429/500/503 (the first request likely wasn't processed)
- Don't retry on 400/401/402/422 — the first request was received and rejected; retrying just wastes effort
Handling 429 rate limits specifically
429 responses may include a Retry-After header telling you how many seconds to wait:
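import time
from openai import RateLimitError
try:
    response = client.chat.completions.create(model="deepseek-r1", messages=messages)
except RateLimitError as e:
    # Retry-After is optional, so fall back to a default wait
    retry_after = e.response.headers.get("retry-after")
    if retry_after:
        wait = int(retry_after)
    else:
        wait = 5  # default wait
    time.sleep(wait)  # then resend the request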
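The snippet above assumes the header always carries a whole number of seconds. The HTTP specification also allows Retry-After to be an HTTP date, so a more defensive parser could look like the sketch below. The helper name `parse_retry_after` is hypothetical, and whether Routic ever sends the date form is not stated in this guide.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

DEFAULT_WAIT = 5.0  # same fallback as the snippet above


def parse_retry_after(value, now=None):
    """Convert a Retry-After header value into seconds to wait."""
    if not value:
        return DEFAULT_WAIT
    value = value.strip()
    # Form 1: delay in seconds, e.g. "7"
    try:
        return max(0.0, float(int(value)))
    except ValueError:
        pass
    # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return DEFAULT_WAIT  # unparseable: fall back to the default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

In the handler above, this would replace the `int(retry_after)` branch, for example `time.sleep(parse_retry_after(e.response.headers.get("retry-after")))`.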
If your application is latency-sensitive:
- Rotate across multiple API keys (each key has independent rate limits)
- Use routing aliases when enabled (e.g., auto/chat): the gateway picks an available model for that capability
- Contact support to request higher limits
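Key rotation from the list above could be as simple as a round-robin iterator. The sketch below is illustrative only: `KeyRotator` is a hypothetical helper, the keys are placeholders, and how you attach the chosen key to a request depends on your client setup.

```python
import itertools
import threading


class KeyRotator:
    """Hand out API keys in round-robin order (thread-safe)."""

    def __init__(self, keys):
        keys = list(keys)
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle = itertools.cycle(keys)
        self._lock = threading.Lock()

    def next_key(self):
        # The lock keeps concurrent callers from advancing the cycle at once.
        with self._lock:
            return next(self._cycle)


rotator = KeyRotator(["sk-key-a", "sk-key-b"])  # placeholder keys
print(rotator.next_key())
```

One way to use it is to build one client per key up front and pick the client matching `next_key()` for each request, so a 429 on one key does not block traffic on the others.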
Common pitfalls
| Pitfall | Cause | Fix |
|---|---|---|
| Retries get slower over time | No jitter — all clients retry at once | Add random jitter 0–500ms |
| Repeated 401 retries | Key is expired but still retrying | Don't retry 401 — get a new key |
| 429 thundering herd | Immediate retry after rate limit | Wait for Retry-After or at least 5 seconds |
| Streaming interruption | Network hiccup, not a server error | Use SDK stream reconnection, or fall back to non-streaming retry |
See also
- HTTP semantics & error payloads — full error code reference
- Rate limits — rate limit details
- Observability & request ID — how to trace issues