Model catalog
Routic offers a curated catalog of AI models through a single API endpoint. All models are accessible via the same Base URL; you select which model to use with the model parameter in your request.
Base URL
https://api.routic.ai/v1
Authentication
All requests require an API Key (sk-*) in the Authorization header:
Authorization: Bearer sk-xxxxxxxx
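The following is a minimal Python sketch of building that header. It reads the key from a `ROUTIC_API_KEY` environment variable. That is a hypothetical name chosen for this example; Routic does not prescribe one. The sketch also rejects values without the `sk-` prefix before any request is sent.

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the Authorization header for a Routic request.

    If no key is passed, fall back to ROUTIC_API_KEY (a hypothetical
    variable name used only in this sketch).
    """
    key = api_key if api_key is not None else os.environ.get("ROUTIC_API_KEY", "")
    # Routic keys have the form sk-*; catch an obviously wrong value locally.
    if not key.startswith("sk-"):
        raise ValueError("expected a Routic API key of the form sk-...")
    return {"Authorization": f"Bearer {key}"}
```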
How to select a model
Use the model parameter in the request body. Two naming styles are supported:
Style 1: Canonical model name (recommended)
Use the industry-standard model identifier. This is the primary way to call a specific model:
```json
{
  "model": "deepseek-r1",
  "messages": [{ "role": "user", "content": "Hello" }]
}
```
Style 2: Smart routing name
Use a Routic-managed routing identifier. Routic automatically picks the best available model for that capability:
```json
{
  "model": "auto/reasoning",
  "messages": [{ "role": "user", "content": "Hello" }]
}
```
Both styles work on the same endpoint (/v1/chat/completions). Canonical names give you precise control; smart routing names let Routic optimize routing behind the scenes.
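To illustrate that only the `model` value changes between the two styles, here is a Python sketch using the standard library. The helper names `chat_payload` and `post_chat` are illustrative and not part of any Routic SDK. `post_chat` makes a real network call, and this page does not document the response schema. The sketch therefore returns the decoded JSON as-is.

```python
import json
import urllib.request

BASE_URL = "https://api.routic.ai/v1"


def chat_payload(model: str, user_text: str) -> dict:
    """Build a /v1/chat/completions body; `model` is a canonical or smart routing name."""
    return {"model": model, "messages": [{"role": "user", "content": user_text}]}


def post_chat(payload: dict, api_key: str, timeout: float = 60.0) -> dict:
    """POST the payload to the shared endpoint and decode the JSON reply (network call)."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Only the "model" value differs between the two naming styles.
precise = chat_payload("deepseek-r1", "Hello")
routed = chat_payload("auto/reasoning", "Hello")
```

Calling `post_chat(precise, api_key)` sends a real request. Inspect the returned dictionary before relying on specific fields.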
Available models
Reasoning models
Reasoning models use extended chain-of-thought processing before producing a response. They excel at complex problem-solving, mathematical reasoning, and logic-heavy tasks.
| Canonical name | Type | Context length | Max output | Reasoning | Function calling | Price (input) | Price (output) | Cache (input) |
|---|---|---|---|---|---|---|---|---|
| deepseek-r1 | Reasoning | 64K | 8K | Yes | Yes | $0.55/M | $2.00/M | $0.055/M |
| deepseek-r1-0528 | Reasoning | 64K | 8K | Yes | Yes | $0.35/M | $1.70/M | $0.035/M |
| qwq-32b | Reasoning | 128K | 8K | Yes | Yes | $0.12/M | $0.45/M | $0.012/M |
General chat models
General-purpose conversation models optimized for speed and cost. Suitable for everyday Q&A, summarization, and content generation.
| Canonical name | Type | Context length | Max output | Reasoning | Function calling | Price (input) | Price (output) | Cache (input) |
|---|---|---|---|---|---|---|---|---|
| deepseek-v3 | Chat | 64K | 8K | No | Yes | $0.25/M | $0.70/M | $0.025/M |
| deepseek-v3-0324 | Chat | 64K | 8K | No | Yes | $0.16/M | $0.60/M | $0.016/M |
| deepseek-v3.1 | Chat | 128K | 8K | No | Yes | $0.12/M | $0.60/M | $0.012/M |
| deepseek-v3.2 | Chat | 128K | 8K | No | Yes | $0.20/M | $0.30/M | $0.020/M |
| minimax-m2.5 | Chat / Code | 1M | 8K | No | Yes | $0.10/M | $0.80/M | $0.010/M |
Code models
Models optimized for code generation, programming assistance, and software engineering tasks.
| Canonical name | Type | Context length | Max output | Reasoning | Function calling | Price (input) | Price (output) | Cache (input) |
|---|---|---|---|---|---|---|---|---|
| qwen3-coder-plus | Code | 1M | 65K | No | Yes | $0.65/M | $3.25/M | — |
Distill models (cost-effective)
Distilled versions of top models, offering good performance at lower cost. Ideal for high-volume, latency-sensitive workloads. These models are trained on reasoning outputs but do not support the thinking API parameter.
| Canonical name | Type | Context length | Max output | Reasoning | Function calling | Price (input) | Price (output) | Cache (input) |
|---|---|---|---|---|---|---|---|---|
| deepseek-r1-distill-qwen-32b | Chat | 64K | 8K | No | Yes | $0.23/M | $0.23/M | $0.023/M |
| deepseek-r1-distill-qwen-14b | Chat | 64K | 8K | No | Yes | $0.15/M | $0.15/M | $0.015/M |
| deepseek-r1-distill-llama-70b | Chat | 128K | 8K | No | Yes | $0.55/M | $0.65/M | $0.055/M |
Partner models
Models hosted through partner infrastructure with specific routing configurations.
| Canonical name | Type | Context length | Max output | Reasoning | Function calling | Price (input) | Price (output) | Cache (input) |
|---|---|---|---|---|---|---|---|---|
| beijing-unicom-qwen3-32b | Chat | 32K | 8K | No | Yes | Contact sales | Contact sales | — |
| beijing-unicom-qwen3.5-397b | Reasoning | 128K | 8K | Yes | Yes | Contact sales | Contact sales | — |
Availability note: `deepseek-r1-0528` and `deepseek-v3-0324` are listed as purchasable SKUs, but they may not have a live upstream deployment at all times. If you need guaranteed availability, use `deepseek-r1` or `deepseek-v3` instead, or use a smart routing name for automatic failover.
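If you prefer canonical names but want to tolerate a missing deployment, one option is a client-side fallback from the dated snapshot to the stable name. The sketch below assumes that an unavailable model surfaces as an HTTP or connection error. This page does not say which status code Routic returns in that case. `send` is a hypothetical callable that performs one request for a given model name. Smart routing remains the server-side alternative.

```python
import urllib.error
from typing import Callable, Sequence, Tuple


def complete_with_fallback(
    models: Sequence[str], send: Callable[[str], dict]
) -> Tuple[str, dict]:
    """Try each model name in order and return (model_used, reply) for the first success.

    `send` must raise urllib.error.URLError (HTTPError is a subclass) or
    TimeoutError when a request fails.
    """
    last_error: Exception = RuntimeError("no models given")
    for model in models:
        try:
            return model, send(model)
        except (urllib.error.URLError, TimeoutError) as exc:
            last_error = exc  # remember why this model failed, then try the next one
    raise RuntimeError(f"all models failed: {list(models)}") from last_error
```

For example, `complete_with_fallback(["deepseek-r1-0528", "deepseek-r1"], send)` prefers the snapshot and falls back to the stable name.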
Cache pricing: The "Cache (input)" column shows each model's price per million (M) cache-hit input tokens. A dash (—) means cache-hit pricing has not yet been published for that model; check the console for the latest rate. See Context caching for details.
Per-model parameter reference
Thinking Mode (reasoning)
| Model | Thinking support | How to enable | temperature / top_p |
|---|---|---|---|
| deepseek-r1 | Auto (on) | Automatic; or `thinking: { "type": "enabled" }` | Fixed; setting has no effect |
| deepseek-r1-0528 | Auto (on) | Automatic; or `thinking: { "type": "enabled" }` | Fixed; setting has no effect |
| qwq-32b | Manual | `thinking: { "type": "enabled" }` | Normal (0–2) |
| beijing-unicom-qwen3.5-397b | Manual | `thinking: { "type": "enabled" }` | Normal (0–2) |
| All other models | Not supported | — | Normal (0–2) |
For details, see Thinking Mode.
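The table above translates into a small payload adjustment. The Python sketch below is one way to apply it, under these assumptions. For auto models, it drops `temperature` and `top_p` because they have no effect. For manual models, it adds `thinking` only when requested. For every other name, including smart routing names that the table does not cover, it refuses a thinking request. The model sets are copied from the table; `apply_thinking` is a hypothetical helper.

```python
from typing import Any, Dict

# Copied from the Thinking Mode table; every other name is "Not supported".
AUTO_THINKING = {"deepseek-r1", "deepseek-r1-0528"}
MANUAL_THINKING = {"qwq-32b", "beijing-unicom-qwen3.5-397b"}


def apply_thinking(payload: Dict[str, Any], want_thinking: bool) -> Dict[str, Any]:
    """Return a copy of a chat payload adjusted to the model's Thinking Mode rules."""
    body = dict(payload)
    model = body["model"]
    if model in AUTO_THINKING:
        # Thinking is always on and temperature / top_p are fixed, so drop them
        # instead of sending values that have no effect.
        body.pop("temperature", None)
        body.pop("top_p", None)
    elif model in MANUAL_THINKING:
        if want_thinking:
            body["thinking"] = {"type": "enabled"}
    elif want_thinking:
        # Includes smart routing names, whose thinking support the table does not list.
        raise ValueError(f"{model} does not support the thinking parameter")
    return body
```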
Context caching
Context caching is enabled by default on all models. Input tokens served from cache (cache hits) are billed at a reduced rate, about one tenth of the standard input price. The "Cache (input)" column in the model tables above lists each model's cache-hit price.
See Context caching for details and best practices.
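To see what caching saves, you can estimate a request's cost from the per-million-token prices in the tables. This Python sketch assumes that cache-hit tokens are a subset of the input tokens and are billed at the cache rate instead of the input rate. Only a few models are included, with prices copied from the tables above; `estimate_cost` is a hypothetical helper.

```python
from typing import Dict, Tuple

# $ per 1M tokens: (input, output, cache-hit input), copied from the tables above.
PRICES: Dict[str, Tuple[float, float, float]] = {
    "deepseek-r1": (0.55, 2.00, 0.055),
    "deepseek-v3.2": (0.20, 0.30, 0.020),
    "qwq-32b": (0.12, 0.45, 0.012),
}


def estimate_cost(model: str, input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost, billing cache-hit tokens at the cache rate."""
    input_price, output_price, cache_price = PRICES[model]
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = uncached * input_price + cached_tokens * cache_price + output_tokens * output_price
    return total / 1_000_000
```

Consider a `deepseek-r1` request with 100,000 input tokens, 80,000 of them cache hits, plus 2,000 output tokens. It costs (20,000 × 0.55 + 80,000 × 0.055 + 2,000 × 2.00) / 1,000,000 = $0.0194. The same request with no cache hits costs $0.059.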
API endpoints
All models share the same API endpoints:
| Endpoint | Path | Method |
|---|---|---|
| Chat completions | /v1/chat/completions | POST |
| Model list | /v1/models | GET |
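The following Python sketch calls the model list endpoint. This page does not document the response body. The parser therefore assumes an OpenAI-style `{"data": [{"id": ...}, ...]}` shape; adjust `parse_model_ids` if Routic's schema differs. Note that a model appearing in the list does not by itself confirm a live upstream deployment.

```python
import json
import urllib.request
from typing import Any, Dict, List


def parse_model_ids(body: Dict[str, Any]) -> List[str]:
    """Extract model IDs, assuming an OpenAI-style {"data": [{"id": ...}, ...]} body."""
    return sorted(item["id"] for item in body.get("data", []))


def list_model_ids(api_key: str, base_url: str = "https://api.routic.ai/v1") -> List[str]:
    """GET /v1/models and return the model IDs (performs a network call)."""
    request = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_model_ids(json.load(response))
```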
Capability matrix
| Model | Reasoning | Function calling | Streaming | JSON mode | Context caching |
|---|---|---|---|---|---|
| deepseek-r1 | Yes | Yes | Yes | Yes | Yes |
| deepseek-r1-0528 | Yes | Yes | Yes | Yes | Yes |
| deepseek-v3 | No | Yes | Yes | Yes | Yes |
| deepseek-v3-0324 | No | Yes | Yes | Yes | Yes |
| deepseek-v3.1 | No | Yes | Yes | Yes | Yes |
| deepseek-v3.2 | No | Yes | Yes | Yes | Yes |
| qwq-32b | Yes | Yes | Yes | Yes | Yes |
| minimax-m2.5 | No | Yes | Yes | Limited | Yes |
| qwen3-coder-plus | No | Yes | Yes | Yes | Yes |
| deepseek-r1-distill-qwen-32b | No | Yes | Yes | Yes | Yes |
| deepseek-r1-distill-qwen-14b | No | Yes | Yes | Yes | Yes |
| deepseek-r1-distill-llama-70b | No | Yes | Yes | Yes | Yes |
| beijing-unicom-qwen3-32b | No | Yes | Yes | Yes | Yes |
| beijing-unicom-qwen3.5-397b | Yes | Yes | Yes | Yes | Yes |
Note: "Reasoning" refers to extended chain-of-thought processing (Thinking Mode). Models marked "Yes" support the `thinking` parameter. "Function calling" refers to support for the `tools` / `tool_choice` parameters.
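To pick models programmatically, you can filter a transcription of the capability matrix. In this Python sketch, each model maps to a five-character string with one flag per column: Y for Yes, N for No, L for Limited. Limited entries, such as JSON mode on `minimax-m2.5`, are excluded unless you opt in. The encoding and the helper name are illustrative choices, not a Routic format.

```python
from typing import List

COLUMNS = ("reasoning", "function_calling", "streaming", "json_mode", "context_caching")

# One flag per column above: Y = Yes, N = No, L = Limited (transcribed from the matrix).
MATRIX = {
    "deepseek-r1": "YYYYY",
    "deepseek-r1-0528": "YYYYY",
    "deepseek-v3": "NYYYY",
    "deepseek-v3-0324": "NYYYY",
    "deepseek-v3.1": "NYYYY",
    "deepseek-v3.2": "NYYYY",
    "qwq-32b": "YYYYY",
    "minimax-m2.5": "NYYLY",
    "qwen3-coder-plus": "NYYYY",
    "deepseek-r1-distill-qwen-32b": "NYYYY",
    "deepseek-r1-distill-qwen-14b": "NYYYY",
    "deepseek-r1-distill-llama-70b": "NYYYY",
    "beijing-unicom-qwen3-32b": "NYYYY",
    "beijing-unicom-qwen3.5-397b": "YYYYY",
}


def models_supporting(*features: str, allow_limited: bool = False) -> List[str]:
    """Return the models that have every requested feature, sorted by name."""
    positions = [COLUMNS.index(f) for f in features]  # raises ValueError on an unknown feature
    accepted = {"Y", "L"} if allow_limited else {"Y"}
    return sorted(
        model for model, flags in MATRIX.items()
        if all(flags[i] in accepted for i in positions)
    )
```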
Rate limits
Each API Key has default RPM (requests per minute) and TPM (tokens per minute) limits. Default values:
| Limit | Value |
|---|---|
| Default RPM | 100 |
| Default TPM | 10,000 |
| Max RPM (upper bound) | 1,000 |
| Max TPM (upper bound) | 100,000 |
| Max budget duration | 90 days |
To request higher limits, contact support.
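To avoid hitting these limits, a client can throttle itself. The Python sketch below keeps one key under its RPM and TPM budgets using a sliding 60-second window, with defaults taken from the table. It tracks only what the current process sends. This page does not say how Routic counts tokens toward TPM (input only, or input plus output) or how its windows are aligned. Pass a conservative token estimate and treat server-side rejections as authoritative. The class name and the injectable clock are choices made for this example.

```python
import collections
import time
from typing import Callable, Deque, Tuple


class MinuteLimiter:
    """Client-side guard that keeps one API key under an RPM and a TPM budget.

    Uses a sliding 60-second window over requests sent by this process only;
    Routic's server-side accounting may differ.
    """

    def __init__(self, rpm: int = 100, tpm: int = 10_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock
        self.events: Deque[Tuple[float, int]] = collections.deque()  # (timestamp, tokens)

    def _prune(self, now: float) -> None:
        # Forget requests that have left the 60-second window.
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Record the request and return True if it fits both budgets; otherwise return False."""
        now = self.clock()
        self._prune(now)
        used = sum(t for _, t in self.events)
        if len(self.events) >= self.rpm or used + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True
```

When `try_acquire` returns `False`, wait briefly and retry rather than sending the request.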
Model selection best practices
- Use canonical names for predictable behavior and reproducible results.
- Use reasoning models (`deepseek-r1`, `qwq-32b`) for math, logic, and complex analysis.
- Use general chat models (`deepseek-v3.*`) for Q&A, summarization, and content generation.
- Use code models (`qwen3-coder-plus`) for code generation, refactoring, and programming assistance.
- Use distill models for high-volume, cost-sensitive workloads where latency matters.
- Use `minimax-m2.5` when you need ultra-long context (up to 1M tokens).
- Rely on context caching (on by default) to reduce costs in multi-turn conversations that repeat the same context.
- Use smart routing names (such as `auto/reasoning`) for automatic failover when model availability changes.
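The guidance above can be combined with the context lengths from the model tables into a simple picker. In this Python sketch, the per-task preference lists are hypothetical and only loosely follow the bullets. Context lengths are read conservatively as K = 1,000 tokens. The sketch also assumes the context window must hold the prompt plus the requested output. `pick_model` returns the first preferred model whose window fits.

```python
from typing import Dict, List

# Hypothetical preference order per task, loosely following the list above.
CANDIDATES: Dict[str, List[str]] = {
    "reasoning": ["deepseek-r1", "qwq-32b"],
    "chat": ["deepseek-v3.2", "minimax-m2.5"],
    "code": ["qwen3-coder-plus"],
    "bulk": ["deepseek-r1-distill-qwen-14b", "deepseek-r1-distill-llama-70b"],
}

# Context lengths from the model tables, read conservatively as K = 1,000 tokens.
CONTEXT_TOKENS: Dict[str, int] = {
    "deepseek-r1": 64_000,
    "qwq-32b": 128_000,
    "deepseek-v3.2": 128_000,
    "minimax-m2.5": 1_000_000,
    "qwen3-coder-plus": 1_000_000,
    "deepseek-r1-distill-qwen-14b": 64_000,
    "deepseek-r1-distill-llama-70b": 128_000,
}


def pick_model(task: str, prompt_tokens: int, max_output_tokens: int = 0) -> str:
    """Return the first candidate whose context window fits the prompt plus output."""
    needed = prompt_tokens + max_output_tokens
    for model in CANDIDATES[task]:
        if needed <= CONTEXT_TOKENS[model]:
            return model
    raise ValueError(f"no {task} model fits {needed} tokens")
```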
Coming soon
The following capabilities are planned but not yet available:
- Vision models (image understanding)
- Video generation
- Embedding / vector models
- FIM (Fill-In-the-Middle) completion for code