Error handling & retries

This guide covers how to handle errors and retries when calling the Routic API. For the full list of error codes, see HTTP semantics & error payloads.

Core principles

  1. Always log request_id — every error response includes this field. You'll need it when contacting support.
  2. Distinguish retryable from non-retryable — 429, 500, 503 are safe to retry; 400, 401, 402, 422 require a fix first.
  3. Use exponential backoff — never retry immediately, never retry at fixed intervals.
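A minimal sketch of principle 1. It assumes the error body is JSON and that request_id sits either at the top level or under an "error" object; the exact payload shape is not specified here, and the helper names extract_request_id and log_api_error are hypothetical.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger("routic")


def extract_request_id(payload: Any) -> Optional[str]:
    """Return request_id from a parsed error body, or None if absent.

    Assumption: the field is at the top level or nested under "error".
    """
    if not isinstance(payload, dict):
        return None
    if "request_id" in payload:
        return str(payload["request_id"])
    nested = payload.get("error")
    if isinstance(nested, dict) and "request_id" in nested:
        return str(nested["request_id"])
    return None


def log_api_error(status_code: int, payload: Any) -> Optional[str]:
    """Log the status code together with request_id so support can trace it."""
    request_id = extract_request_id(payload)
    logger.error("Routic API error: status=%s request_id=%s", status_code, request_id)
    return request_id
```

Logging the ID at the point of failure means it is already in your logs when you open a support ticket.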

Retryable vs non-retryable

Status code | Retry? | Why
400 | No | Your request is malformed; retrying won't help
401 | No | Invalid API key; retrying won't help
402 | No | Insufficient balance; top up first
422 | No | Invalid parameter combination; fix and resend
429 | Yes | Rate limit; wait before retrying
500 | Yes | Temporary server issue
503 | Yes | Model provider temporarily unavailable
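The classification above can be expressed as a small lookup. This is a sketch: the function name should_retry is hypothetical, and treating unlisted status codes (404, for example) as non-retryable is an assumption rather than documented behavior.

```python
RETRYABLE_STATUS = frozenset({429, 500, 503})
NON_RETRYABLE_STATUS = frozenset({400, 401, 402, 422})


def should_retry(status_code: int) -> bool:
    """Return True only for status codes listed as retryable.

    Assumption: any code not listed in either set is treated as
    non-retryable, so unknown failures surface instead of looping.
    """
    return status_code in RETRYABLE_STATUS
```

Defaulting to "do not retry" keeps an unexpected status from being resent, which matters because each resend can consume tokens.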

How to retry

Use exponential backoff with jitter:

delay = min(1s × 2^attempt + random_jitter, 60s)
  • Base delay: 1 second
  • Max retries: 3
  • Max delay: 60 seconds
  • Jitter: 0–500 ms (prevents thundering herd)
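The formula and parameters above can be sketched as a single function. The name backoff_delay and the optional rng parameter (useful for deterministic tests) are illustrative choices, not part of any SDK.

```python
import random
from typing import Optional

BASE_DELAY_S = 1.0   # base delay: 1 second
MAX_DELAY_S = 60.0   # max delay: 60 seconds
MAX_JITTER_S = 0.5   # jitter: 0-500 ms


def backoff_delay(attempt: int, rng: Optional[random.Random] = None) -> float:
    """Return the wait in seconds before retry number `attempt` (0-based).

    Implements delay = min(base * 2^attempt + jitter, max_delay).
    """
    source = rng if rng is not None else random
    jitter = source.uniform(0.0, MAX_JITTER_S)
    return min(BASE_DELAY_S * (2 ** attempt) + jitter, MAX_DELAY_S)
```

With three retries the waits come out at roughly 1 s, 2 s, and 4 s plus jitter. The 60-second cap only takes effect from attempt 6 onward, where 2^6 = 64 exceeds it.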

Python (OpenAI SDK built-in retry)

The OpenAI SDK has automatic retry built in — just configure it:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.routic.ai/v1",
    api_key="sk-xxxxxxxx",
    max_retries=3,      # retry up to 3 times
    timeout=30.0,       # 30s timeout per request
)

# 429/500/503 are retried automatically — no retry code needed
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Hello"}],
)

Python (manual retry)

If you need more control:

import random
import time

from openai import APIConnectionError, APIStatusError, OpenAI

client = OpenAI(
    base_url="https://api.routic.ai/v1",
    api_key="sk-xxxxxxxx",
    max_retries=0,  # disable SDK auto-retry so retries are not doubled
)

RETRYABLE_STATUS = (429, 500, 503)

def call_with_retry(messages, max_retries=3):
    for attempt in range(max_retries + 1):
        try:
            return client.chat.completions.create(
                model="deepseek-r1",
                messages=messages,
            )
        except APIStatusError as e:
            # Non-retryable status (400, 401, 402, 422, ...): raise immediately
            if e.status_code not in RETRYABLE_STATUS or attempt == max_retries:
                raise
        except APIConnectionError:
            # Network failure or timeout: retryable until attempts run out
            if attempt == max_retries:
                raise

        # Retryable: exponential backoff with 0-500 ms jitter, capped at 60 s
        delay = min(1 * (2 ** attempt) + random.uniform(0, 0.5), 60)
        print(f"Attempt {attempt + 1} failed. Retrying in {delay:.1f}s...")
        time.sleep(delay)

Node.js (OpenAI SDK)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.routic.ai/v1",
  apiKey: "sk-xxxxxxxx",
  maxRetries: 3,
});

Idempotency

The chat completions endpoint is not idempotent — sending the same request twice consumes tokens twice. So:

  • Retry only on 429/500/503, where the first request most likely wasn't processed
  • Don't retry on 400/401/402/422: the first request was received and rejected, so resending it unchanged just wastes effort

Handling 429 rate limits specifically

429 responses may include a Retry-After header telling you how many seconds to wait:

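import time
from openai import RateLimitError

try:
    response = client.chat.completions.create(model="deepseek-r1", messages=messages)
except RateLimitError as e:
    # Retry-After may be missing or not a plain number of seconds
    retry_after = e.response.headers.get("retry-after")
    try:
        wait = float(retry_after) if retry_after else 5.0  # default wait: 5s
    except ValueError:
        wait = 5.0
    time.sleep(wait)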
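Under the HTTP specification, Retry-After can carry either a number of seconds or an HTTP date. Whether Routic ever sends the date form is not stated here, so the following is a defensive sketch; parse_retry_after is a hypothetical helper, and the 5-second default mirrors the fallback used above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(
    value: Optional[str],
    default: float = 5.0,
    now: Optional[datetime] = None,
) -> float:
    """Convert a Retry-After header value into seconds to wait."""
    if not value:
        return default
    value = value.strip()
    # Form 1: delay in seconds, e.g. "7"
    try:
        return max(0.0, float(value))
    except ValueError:
        pass
    # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable: fall back to the default wait
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())
```

The result can be passed straight to time.sleep(); a date in the past yields 0, so the caller retries at once.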

If your application is latency-sensitive:

  • Rotate across multiple API keys (each key has independent rate limits)
  • Use routing aliases when enabled (e.g., auto/chat) — the gateway picks an available model for that capability
  • Contact support to request higher limits
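Key rotation from the first option can be sketched as a round-robin over your keys. KeyRotator is a hypothetical helper, and the keys in the usage note are placeholders; how you store and load real keys is up to your deployment.

```python
import itertools
from typing import Iterator, Sequence


class KeyRotator:
    """Hand out API keys in round-robin order."""

    def __init__(self, keys: Sequence[str]) -> None:
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle: Iterator[str] = itertools.cycle(list(keys))

    def next_key(self) -> str:
        """Return the next key, wrapping around after the last one."""
        return next(self._cycle)
```

One way to use it: create rotator = KeyRotator(["sk-aaaa", "sk-bbbb"]) once at startup, then build the client for each request (or after each 429) with api_key=rotator.next_key().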

Common pitfalls

Pitfall | Cause | Fix
Retries get slower over time | No jitter: all clients retry at once | Add random jitter 0–500 ms
Repeated 401 retries | Key is expired but still retrying | Don't retry 401; get a new key
429 thundering herd | Immediate retry after rate limit | Wait for Retry-After or at least 5 seconds
Streaming interruption | Network hiccup, not a server error | Use SDK stream reconnection, or fall back to non-streaming retry
