Error handling & retries

This guide covers how to handle errors and retries when calling the Routic API. For the full list of error codes, see HTTP semantics & error payloads.

Core principles

Always log request_id — every error response includes this field. You'll need it when contacting support.
Distinguish retryable from non-retryable — 429, 500, 503 are safe to retry; 400, 401, 402, 422 require a fix first.
Use exponential backoff — never retry immediately, never retry at fixed intervals.
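
As a minimal sketch of the first principle, the helper below pulls request_id out of a decoded error body and logs it with the status code. The payload layout is an assumption: this guide only says the field is present, so the helper checks both the top level and a nested "error" object. The function names and the logger name are hypothetical.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("routic.client")  # hypothetical logger name


def extract_request_id(payload: Dict[str, Any]) -> Optional[str]:
    """Return request_id from a decoded error body, or None if absent.

    Assumption: the field sits at the top level or inside a nested
    "error" object. Check the error payload reference for the real layout.
    """
    if "request_id" in payload:
        return str(payload["request_id"])
    nested = payload.get("error")
    if isinstance(nested, dict) and "request_id" in nested:
        return str(nested["request_id"])
    return None


def log_api_error(status_code: int, payload: Dict[str, Any]) -> Optional[str]:
    """Log the status code together with request_id and return the id."""
    request_id = extract_request_id(payload)
    logger.error("Routic API error status=%s request_id=%s", status_code, request_id)
    return request_id
```

Returning the id from log_api_error makes it easy to attach the same value to a support ticket or a metrics event.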

Retryable vs non-retryable

Status code	Retry?	Why
400	❌	Your request is malformed — retrying won't help
401	❌	Invalid API key — retrying won't help
402	❌	Insufficient balance — top up first
422	❌	Invalid parameter combination — fix and resend
429	✅	Rate limit: wait for Retry-After (or back off), then resend
500	✅	Temporary server issue
503	✅	Model provider temporarily unavailable
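
The table can be encoded as a small predicate so the retry decision lives in one place. This is a sketch; the names are hypothetical, and treating status codes the table does not list as non-retryable is an assumption (the conservative choice).

```python
RETRYABLE_STATUS = frozenset({429, 500, 503})


def is_retryable(status_code: int) -> bool:
    """True only for the codes the table marks as safe to retry.

    Assumption: anything not listed in the table (for example 404) is
    treated as non-retryable.
    """
    return status_code in RETRYABLE_STATUS
```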

How to retry

Use exponential backoff with jitter:

delay = min(1s × 2^attempt + random_jitter, 60s)

Base delay: 1 second
Max retries: 3
Max delay: 60 seconds
Jitter: 0–500 ms (prevents thundering herd)
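
The formula and parameters above can be written as a small function. This sketch assumes attempt is counted from 0 (the first retry waits about 1 second); the function and constant names are hypothetical. The jitter argument exists only so the schedule can be shown without randomness.

```python
import random
from typing import Optional

BASE_DELAY = 1.0   # seconds
MAX_DELAY = 60.0   # seconds
MAX_JITTER = 0.5   # seconds


def backoff_delay(attempt: int, jitter: Optional[float] = None) -> float:
    """Delay in seconds before retry number `attempt` (0-based)."""
    if jitter is None:
        jitter = random.uniform(0.0, MAX_JITTER)
    return min(BASE_DELAY * (2 ** attempt) + jitter, MAX_DELAY)


# Jitter pinned to 0 to show the deterministic part of the schedule.
schedule = [backoff_delay(n, jitter=0.0) for n in range(3)]
print(schedule)  # [1.0, 2.0, 4.0]
```

With 3 retries the 60-second cap is never reached; it only matters if you raise the retry count (attempt 6 would give 64 seconds uncapped).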

Python (OpenAI SDK built-in retry)

The OpenAI SDK has automatic retry built in — just configure it:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.routic.ai/v1",
    api_key="sk-xxxxxxxx",
    max_retries=3,      # retry up to 3 times
    timeout=30.0,       # 30s timeout per request
)

# The SDK retries 429/500/503 automatically (along with connection errors,
# 408, 409, and other 5xx), so no retry code is needed
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Hello"}],
)

Python (manual retry)

If you need more control:

import random
import time

from openai import APIConnectionError, APIStatusError, OpenAI

client = OpenAI(
    base_url="https://api.routic.ai/v1",
    api_key="sk-xxxxxxxx",
    max_retries=0,  # disable SDK auto-retry
)

def call_with_retry(messages, max_retries=3):
    for attempt in range(max_retries + 1):
        try:
            return client.chat.completions.create(
                model="deepseek-r1",
                messages=messages,
            )
        except (APIConnectionError, APIStatusError) as e:
            # Connection errors and timeouts carry no status code; retry them
            status = getattr(e, "status_code", None)

            # Non-retryable (400, 401, 402, 422, ...): raise immediately
            if status is not None and status not in (429, 500, 503):
                raise

            # Retryable: back off, or give up after the last attempt
            if attempt < max_retries:
                delay = min(1 * (2 ** attempt) + random.uniform(0, 0.5), 60)
                print(f"Attempt {attempt + 1} failed. Retrying in {delay:.1f}s...")
                time.sleep(delay)
            else:
                raise
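
To unit-test a retry policy without network calls, the same loop can be written against a plain callable with an injectable sleep function. The sketch below is independent of the OpenAI SDK: ApiError, retry_call, and flaky are hypothetical names, and ApiError only stands in for an SDK error that exposes status_code.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class ApiError(Exception):
    """Hypothetical stand-in for an SDK error that carries an HTTP status."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def retry_call(
    fn: Callable[[], T],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying ApiError 429/500/503 with backoff and jitter."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        try:
            return fn()
        except ApiError as err:
            # Give up on non-retryable codes or once retries are exhausted.
            if err.status_code not in (429, 500, 503) or attempt >= max_retries:
                raise
        sleep(min(2 ** attempt + random.uniform(0, 0.5), 60))
        attempt += 1


# Demo: fail twice with retryable codes, then succeed. Delays are recorded
# in a list instead of actually sleeping.
outcomes = iter([ApiError(503), ApiError(429), "ok"])


def flaky() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


recorded_delays: List[float] = []
result = retry_call(flaky, sleep=recorded_delays.append)
```

Passing sleep as a parameter keeps tests fast and lets them assert on the exact backoff schedule.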

Node.js (OpenAI SDK)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.routic.ai/v1",
  apiKey: "sk-xxxxxxxx",
  maxRetries: 3,
});

Idempotency

The chat completions endpoint is not idempotent — sending the same request twice consumes tokens twice. So:

Retry only on 429/500/503, where the first request likely wasn't processed; don't resend the same request blindly in any other case
Don't retry on 400/401/402/422: the first request was received and rejected, so resending it unchanged fails the same way

Handling 429 rate limits specifically

429 responses may include a Retry-After header telling you how many seconds to wait:

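import time
from openai import RateLimitError

# Assumes `client` from the examples above and a `messages` list
try:
    response = client.chat.completions.create(model="deepseek-r1", messages=messages)
except RateLimitError as e:
    # Retry-After is optional; fall back to 5 s if it is missing or not a number
    retry_after = e.response.headers.get("retry-after")
    try:
        wait = float(retry_after)
    except (TypeError, ValueError):
        wait = 5.0  # default wait
    time.sleep(wait)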
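
The HTTP specification allows Retry-After to be either a number of seconds or an HTTP date. This guide only mentions the seconds form, so handling the date form is a defensive assumption. A minimal parser that accepts both and falls back to the 5-second default might look like this (parse_retry_after is a hypothetical helper name):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(
    value: Optional[str],
    default: float = 5.0,
    now: Optional[datetime] = None,
) -> float:
    """Convert a Retry-After header value to a non-negative wait in seconds."""
    if not value:
        return default
    try:
        return max(0.0, float(value))        # delay-seconds form, e.g. "12"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default                       # unparseable: use the default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())
```

The now parameter exists only so the date branch can be tested against a fixed clock.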

If your application is latency-sensitive:

Rotate across multiple API keys (each key has independent rate limits)
Use routing aliases when enabled (e.g., auto/chat) — the gateway picks an available model for that capability
Contact support to request higher limits
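
Key rotation from the first option can be as simple as a round-robin iterator. The sketch below only shows the selection logic; the key values are placeholders, key_cycle is a hypothetical helper, and how the chosen key is passed to your client is left to your setup.

```python
import itertools
from typing import Iterator, Sequence


def key_cycle(api_keys: Sequence[str]) -> Iterator[str]:
    """Yield API keys in round-robin order, forever."""
    if not api_keys:
        raise ValueError("at least one API key is required")
    return itertools.cycle(api_keys)


keys = key_cycle(["sk-key-a", "sk-key-b"])  # placeholder keys
first_three = [next(keys) for _ in range(3)]
print(first_three)  # ['sk-key-a', 'sk-key-b', 'sk-key-a']
```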

Common pitfalls

Pitfall	Cause	Fix
Retries keep failing under load	No jitter, so all clients retry at the same moments	Add random jitter (0–500 ms)
Repeated 401 retries	Key is expired but still retrying	Don't retry 401 — get a new key
429 thundering herd	Immediate retry after rate limit	Wait for `Retry-After` or at least 5 seconds
Streaming interruption	Network hiccup, not a server error	Use SDK stream reconnection, or fall back to non-streaming retry