Model catalog

Routic offers a curated catalog of AI models through a single API endpoint. All models are accessible via the same Base URL; you select which model to use with the model parameter in your request.

Base URL

https://api.routic.ai/v1

Authentication

All requests require an API Key (sk-*) in the Authorization header:

Authorization: Bearer sk-xxxxxxxx

How to select a model

Use the model parameter in the request body. Two naming styles are supported:

Style 1: Canonical model name (recommended)

Use the industry-standard model identifier. This is the primary way to call a specific model:

{
  "model": "deepseek-r1",
  "messages": [{ "role": "user", "content": "Hello" }]
}

Style 2: Smart routing name

Use a Routic-managed routing identifier. Routic automatically picks the best available model for that capability:

{
  "model": "auto/reasoning",
  "messages": [{ "role": "user", "content": "Hello" }]
}

Both styles work on the same endpoint (/v1/chat/completions). Canonical names give you precise control; smart routing names let Routic optimize routing behind the scenes.
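As a minimal sketch of the two naming styles, the snippet below builds (but does not send) the same request twice, once with a canonical name and once with a smart routing name. The helper name `build_chat_request` and the `Content-Type` header are assumptions made for illustration. The base URL, path, `Authorization` header, and body fields come from this page.

```python
import json
import urllib.request

BASE_URL = "https://api.routic.ai/v1"  # from the "Base URL" section


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /chat/completions."""
    body = {
        "model": model,  # canonical name or smart routing name
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON body with this content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Only the "model" value differs between the two styles.
canonical = build_chat_request("sk-xxxxxxxx", "deepseek-r1", "Hello")
routed = build_chat_request("sk-xxxxxxxx", "auto/reasoning", "Hello")
```

Passing either object to `urllib.request.urlopen` would send it. The response format is not described on this page, so parsing is left out.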


Available models

Reasoning models

Reasoning models use extended chain-of-thought processing before producing a response. They excel at complex problem-solving, mathematical reasoning, and logic-heavy tasks.

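| Canonical name | Type | Context length | Max output | Reasoning | Function calling | Price (input) | Price (output) | Cache (input) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| deepseek-r1 | Reasoning | 64K | 8K | Yes | Yes | $0.55/M | $2.00/M | $0.055/M |
| deepseek-r1-0528 | Reasoning | 64K | 8K | Yes | Yes | $0.35/M | $1.70/M | $0.035/M |
| qwq-32b | Reasoning | 128K | 8K | Yes | Yes | $0.12/M | $0.45/M | $0.012/M |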

General chat models

General-purpose conversation models optimized for speed and cost. Suitable for everyday Q&A, summarization, and content generation.

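| Canonical name | Type | Context length | Max output | Reasoning | Function calling | Price (input) | Price (output) | Cache (input) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| deepseek-v3 | Chat | 64K | 8K | No | Yes | $0.25/M | $0.70/M | $0.025/M |
| deepseek-v3-0324 | Chat | 64K | 8K | No | Yes | $0.16/M | $0.60/M | $0.016/M |
| deepseek-v3.1 | Chat | 128K | 8K | No | Yes | $0.12/M | $0.60/M | $0.012/M |
| deepseek-v3.2 | Chat | 128K | 8K | No | Yes | $0.20/M | $0.30/M | $0.020/M |
| minimax-m2.5 | Chat / Code | 1M | 8K | No | Yes | $0.10/M | $0.80/M | $0.010/M |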

Code models

Models optimized for code generation, programming assistance, and software engineering tasks.

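| Canonical name | Type | Context length | Max output | Reasoning | Function calling | Price (input) | Price (output) | Cache (input) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3-coder-plus | Code | 1M | 65K | No | Yes | $0.65/M | $3.25/M | — |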

Distill models (cost-effective)

Distilled versions of top models, offering good performance at lower cost. Ideal for high-volume, latency-sensitive workloads. These models are trained on reasoning outputs but do not support the thinking API parameter.

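| Canonical name | Type | Context length | Max output | Reasoning | Function calling | Price (input) | Price (output) | Cache (input) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| deepseek-r1-distill-qwen-32b | Chat | 64K | 8K | No | Yes | $0.23/M | $0.23/M | $0.023/M |
| deepseek-r1-distill-qwen-14b | Chat | 64K | 8K | No | Yes | $0.15/M | $0.15/M | $0.015/M |
| deepseek-r1-distill-llama-70b | Chat | 128K | 8K | No | Yes | $0.55/M | $0.65/M | $0.055/M |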

Partner models

Models hosted through partner infrastructure with specific routing configurations.

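| Canonical name | Type | Context length | Max output | Reasoning | Function calling | Price (input) | Price (output) | Cache (input) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| beijing-unicom-qwen3-32b | Chat | 32K | 8K | No | Yes | Contact sales | Contact sales | — |
| beijing-unicom-qwen3.5-397b | Reasoning | 128K | 8K | Yes | Yes | Contact sales | Contact sales | — |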

Availability note: deepseek-r1-0528 and deepseek-v3-0324 are listed as purchasable SKUs but may not have a live upstream deployment at all times. If you need guaranteed availability, use deepseek-r1 or deepseek-v3 instead, or call a smart routing name so that Routic fails over automatically.
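If you prefer to stay on canonical names, a client-side fallback is one way to handle a dated SKU that is temporarily offline. The sketch below is only an illustration: this page does not describe how an unavailable model is reported, so `ModelUnavailable` is a hypothetical exception, and `send` is a placeholder for whatever function performs the actual request.

```python
from typing import Callable, Optional, Sequence, Tuple


class ModelUnavailable(Exception):
    """Hypothetical error raised by `send` when a model has no live upstream."""


def call_with_fallback(send: Callable[[str], str],
                       models: Sequence[str]) -> Tuple[str, str]:
    """Try each model in order and return (model_used, reply)."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, send(model)
        except ModelUnavailable as exc:
            last_error = exc  # remember the failure, then try the next model
    raise RuntimeError(f"no model available among {list(models)}") from last_error


def fake_send(model: str) -> str:
    """Stand-in for a real request; pretends the dated SKU is offline."""
    if model == "deepseek-r1-0528":
        raise ModelUnavailable(model)
    return f"reply from {model}"


used, reply = call_with_fallback(fake_send, ["deepseek-r1-0528", "deepseek-r1"])
```

Listing the dated SKU first and the always-available model second mirrors the recommendation in the note above.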

Cache pricing: The "Cache (input)" column shows the per-model price for input tokens served from the cache (cache hits). A dash (—) means cache hit pricing is not yet published for that model; check the console for the latest price. See Context caching for details.


Per-model parameter reference

Thinking Mode (reasoning)

| Model | Thinking support | How to enable | temperature / top_p |
| --- | --- | --- | --- |
| deepseek-r1 | Auto (on) | Automatic; or thinking: { "type": "enabled" } | Fixed; setting has no effect |
| deepseek-r1-0528 | Auto (on) | Automatic; or thinking: { "type": "enabled" } | Fixed; setting has no effect |
| qwq-32b | Manual | thinking: { "type": "enabled" } | Normal (0–2) |
| beijing-unicom-qwen3.5-397b | Manual | thinking: { "type": "enabled" } | Normal (0–2) |
| All other models | Not supported | — | Normal (0–2) |

For details, see Thinking Mode.
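The rules in the table can be encoded in a small request-body builder. This is a sketch under stated assumptions: `build_body` and `THINKING_SUPPORT` are illustrative names, and `thinking` is assumed to be a top-level field of the request body. Smart routing names are not covered by the table, so this helper treats them as "not supported".

```python
from typing import Any, Dict, Optional

# Thinking support per model, copied from the table above.
THINKING_SUPPORT = {
    "deepseek-r1": "auto",
    "deepseek-r1-0528": "auto",
    "qwq-32b": "manual",
    "beijing-unicom-qwen3.5-397b": "manual",
}


def build_body(model: str, prompt: str, thinking: bool = False,
               temperature: Optional[float] = None) -> Dict[str, Any]:
    """Assemble a chat request body that respects the thinking table."""
    body: Dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    mode = THINKING_SUPPORT.get(model)  # None means "not supported"
    if thinking:
        if mode is None:
            raise ValueError(f"{model} does not support Thinking Mode")
        # Assumption: `thinking` is a top-level field of the request body.
        body["thinking"] = {"type": "enabled"}
    if temperature is not None and mode != "auto":
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be within 0-2")
        body["temperature"] = temperature
    # For "auto" models, temperature is omitted because it has no effect.
    return body


manual = build_body("qwq-32b", "Prove that 17 is prime.", thinking=True, temperature=0.6)
auto = build_body("deepseek-r1", "Prove that 17 is prime.", temperature=0.6)
```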

Context caching

Context caching is enabled by default on all models. Input tokens served from the cache are billed at a lower price, roughly one tenth of the standard input price for the models listed above. The "Cache (input)" column in the model tables shows the per-model cache hit price.

See Context caching for details and best practices.
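To see what the cache price means for a bill, the following sketch estimates the cost of one request from the published per-million-token prices. It assumes that cached tokens are a subset of the input tokens and are billed at the cache price instead of the input price. The function name `estimate_cost` and the token counts are illustrative, and actual billing may differ.

```python
from decimal import Decimal

# (input, output, cache-hit input) prices in USD per 1M tokens, from the tables above.
PRICES = {
    "deepseek-r1": (Decimal("0.55"), Decimal("2.00"), Decimal("0.055")),
    "deepseek-v3.2": (Decimal("0.20"), Decimal("0.30"), Decimal("0.020")),
}


def estimate_cost(model: str, input_tokens: int, cached_tokens: int,
                  output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (assumed billing model, see above)."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    input_price, output_price, cache_price = PRICES[model]
    uncached_tokens = input_tokens - cached_tokens
    total = (uncached_tokens * input_price
             + cached_tokens * cache_price
             + output_tokens * output_price)
    return total / Decimal(1_000_000)


# 100,000 input tokens (80,000 of them cache hits) and 5,000 output tokens.
with_cache = estimate_cost("deepseek-r1", 100_000, 80_000, 5_000)   # 0.0254 USD
without_cache = estimate_cost("deepseek-r1", 100_000, 0, 5_000)     # 0.065 USD
```

In this example the cache hits cut the estimated cost from $0.065 to $0.0254, because 80,000 input tokens are billed at $0.055/M rather than $0.55/M.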

API endpoints

All models share the same API endpoints:

| Endpoint | Path | Method |
| --- | --- | --- |
| Chat completions | /v1/chat/completions | POST |
| Model list | /v1/models | GET |

Capability matrix

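| Model | Reasoning | Function calling | Streaming | JSON mode | Context caching |
| --- | --- | --- | --- | --- | --- |
| deepseek-r1 | Yes | Yes | Yes | Yes | Yes |
| deepseek-r1-0528 | Yes | Yes | Yes | Yes | Yes |
| deepseek-v3 | No | Yes | Yes | Yes | Yes |
| deepseek-v3-0324 | No | Yes | Yes | Yes | Yes |
| deepseek-v3.1 | No | Yes | Yes | Yes | Yes |
| deepseek-v3.2 | No | Yes | Yes | Yes | Yes |
| qwq-32b | Yes | Yes | Yes | Yes | Yes |
| minimax-m2.5 | No | Yes | Yes | Limited | Yes |
| qwen3-coder-plus | No | Yes | Yes | Yes | Yes |
| deepseek-r1-distill-qwen-32b | No | Yes | Yes | Yes | Yes |
| deepseek-r1-distill-qwen-14b | No | Yes | Yes | Yes | Yes |
| deepseek-r1-distill-llama-70b | No | Yes | Yes | Yes | Yes |
| beijing-unicom-qwen3-32b | No | Yes | Yes | Yes | Yes |
| beijing-unicom-qwen3.5-397b | Yes | Yes | Yes | Yes | Yes |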

Note: "Reasoning" refers to extended chain-of-thought processing (Thinking Mode). Models marked "Yes" support the thinking parameter. "Function calling" refers to support for the tools / tool_choice parameters.

Rate limits

Each API Key is subject to RPM (requests per minute) and TPM (tokens per minute) limits. The default values and upper bounds are:

| Limit | Value |
| --- | --- |
| Default RPM | 100 |
| Default TPM | 10,000 |
| Max RPM (upper bound) | 1,000 |
| Max TPM (upper bound) | 100,000 |
| Max budget duration | 90 days |

To request higher limits, contact support.
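A client can stay under these limits by throttling itself before sending. The sketch below tracks requests and tokens over the trailing 60 seconds, using the default RPM and TPM values from the table. How the server measures its windows and counts tokens is not described on this page, so treat this as a client-side approximation. The class name `SlidingWindowLimiter` is illustrative.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class SlidingWindowLimiter:
    """Track requests and tokens over the trailing 60 seconds."""

    def __init__(self, rpm: int = 100, tpm: int = 10_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock  # injectable so the limiter can be tested
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)

    def try_acquire(self, tokens: int) -> bool:
        """Record the request and return True if it fits within both limits."""
        now = self.clock()
        # Drop events that have left the 60-second window.
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()
        used_tokens = sum(count for _, count in self.events)
        if len(self.events) >= self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True


limiter = SlidingWindowLimiter()  # defaults: 100 RPM, 10,000 TPM
if limiter.try_acquire(tokens=1_200):
    pass  # safe to send the request here
```

When `try_acquire` returns False, the caller can wait and retry, or queue the request.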

Model selection best practices

  1. Use canonical names for predictable behavior and reproducible results.
  2. Use reasoning models (deepseek-r1, qwq-32b) for math, logic, and complex analysis.
  3. Use general chat models (deepseek-v3.*) for Q&A, summarization, and content generation.
  4. Use code models (qwen3-coder-plus) for code generation, refactoring, and programming assistance.
  5. Use distill models for high-volume, cost-sensitive workloads where latency matters.
  6. Use minimax-m2.5 when you need ultra-long context (up to 1M tokens).
  7. Keep context caching on (the default) for multi-turn conversations with repeated context.
  8. Use smart routing for automatic failover when model availability changes.
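The guidelines above can be sketched as a small lookup helper. The task labels, the specific default chosen for each category, and the reading of "64K" as 64,000 tokens are all assumptions made for illustration. Adjust them to your own workload.

```python
# Illustrative defaults, one model per category from the guidelines above.
DEFAULT_MODEL = {
    "reasoning": "deepseek-r1",
    "chat": "deepseek-v3",
    "code": "qwen3-coder-plus",
    "bulk": "deepseek-r1-distill-qwen-14b",
}

# Context lengths from the model tables (assumption: 1K = 1,000 tokens).
CONTEXT_TOKENS = {
    "deepseek-r1": 64_000,
    "deepseek-v3": 64_000,
    "qwen3-coder-plus": 1_000_000,
    "deepseek-r1-distill-qwen-14b": 64_000,
    "minimax-m2.5": 1_000_000,
}


def pick_model(task: str, context_tokens: int = 0) -> str:
    """Pick a canonical model name for a task and an expected prompt size."""
    model = DEFAULT_MODEL[task]  # raises KeyError for an unknown task label
    if context_tokens > CONTEXT_TOKENS[model]:
        model = "minimax-m2.5"  # guideline 6: ultra-long context
    if context_tokens > CONTEXT_TOKENS[model]:
        raise ValueError("prompt exceeds the largest context length listed")
    return model


short_math = pick_model("reasoning")               # "deepseek-r1"
long_summary = pick_model("chat", 500_000)         # "minimax-m2.5"
big_repo = pick_model("code", 500_000)             # "qwen3-coder-plus"
```

Note that falling back to minimax-m2.5 for a long reasoning prompt trades Thinking Mode for context length, since that model is not a reasoning model.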

Coming soon

The following capabilities are planned but not yet available:

  • Vision models (image understanding)
  • Video generation
  • Embedding / vector models
  • FIM (Fill-In-the-Middle) completion for code

See also