Model catalog

Routic offers a curated catalog of AI models through a single API endpoint. All models are accessible via the same Base URL; you select which model to use with the model parameter in your request.

Base URL

https://api.routic.ai/v1

Authentication

All requests require an API Key (sk-*) in the Authorization header:

Authorization: Bearer sk-xxxxxxxx

How to select a model

Use the model parameter in the request body. Two naming styles are supported:

Style 1: Canonical model name (recommended)

Use the industry-standard model identifier. This is the primary way to call a specific model:

{
  "model": "deepseek-r1",
  "messages": [{ "role": "user", "content": "Hello" }]
}

Style 2: Smart routing name

Use a Routic-managed routing identifier. Routic automatically picks the best available model for that capability:

{
  "model": "auto/reasoning",
  "messages": [{ "role": "user", "content": "Hello" }]
}

Both styles work on the same endpoint (/v1/chat/completions). Canonical names give you precise control; smart routing names let Routic optimize routing behind the scenes.
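As a rough illustration of the two naming styles, the sketch below builds (but does not send) a chat completions request using only the Python standard library. The helper name `build_chat_request` is hypothetical, and the `Content-Type: application/json` header is an assumption, since this page only documents the Authorization header.

```python
import json
import urllib.request

BASE_URL = "https://api.routic.ai/v1"  # from the "Base URL" section


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /chat/completions."""
    body = {
        "model": model,  # canonical name or smart routing name
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: JSON bodies are declared with this header.
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The same helper covers both styles; only the model string differs.
canonical = build_chat_request("sk-xxxxxxxx", "deepseek-r1", "Hello")
routed = build_chat_request("sk-xxxxxxxx", "auto/reasoning", "Hello")
```

Passing either object to `urllib.request.urlopen` would send it. The response format is not described on this page, so the sketch stops at request construction.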

Available models

Reasoning models

Reasoning models use extended chain-of-thought processing before producing a response. They excel at complex problem-solving, mathematical reasoning, and logic-heavy tasks.

Canonical name	Type	Context length	Max output	Reasoning	Function calling	Price (input)	Price (output)	Cache (input)
`deepseek-r1`	Reasoning	64K	8K	Yes	Yes	$0.55/M	$2.00/M	$0.055/M
`deepseek-r1-0528`	Reasoning	64K	8K	Yes	Yes	$0.35/M	$1.70/M	$0.035/M
`qwq-32b`	Reasoning	128K	8K	Yes	Yes	$0.12/M	$0.45/M	$0.012/M

General chat models

General-purpose conversation models optimized for speed and cost. Suitable for everyday Q&A, summarization, and content generation.

Canonical name	Type	Context length	Max output	Reasoning	Function calling	Price (input)	Price (output)	Cache (input)
`deepseek-v3`	Chat	64K	8K	No	Yes	$0.25/M	$0.70/M	$0.025/M
`deepseek-v3-0324`	Chat	64K	8K	No	Yes	$0.16/M	$0.60/M	$0.016/M
`deepseek-v3.1`	Chat	128K	8K	No	Yes	$0.12/M	$0.60/M	$0.012/M
`deepseek-v3.2`	Chat	128K	8K	No	Yes	$0.20/M	$0.30/M	$0.020/M
`minimax-m2.5`	Chat / Code	1M	8K	No	Yes	$0.10/M	$0.80/M	$0.010/M

Code models

Models optimized for code generation, programming assistance, and software engineering tasks.

Canonical name	Type	Context length	Max output	Reasoning	Function calling	Price (input)	Price (output)	Cache (input)
`qwen3-coder-plus`	Code	1M	65K	No	Yes	$0.65/M	$3.25/M	—

Distill models (cost-effective)

Distilled versions of top models, offering good performance at lower cost. Ideal for high-volume, latency-sensitive workloads. These models are trained on reasoning outputs but do not support the thinking API parameter.

Canonical name	Type	Context length	Max output	Reasoning	Function calling	Price (input)	Price (output)	Cache (input)
`deepseek-r1-distill-qwen-32b`	Chat	64K	8K	No	Yes	$0.23/M	$0.23/M	$0.023/M
`deepseek-r1-distill-qwen-14b`	Chat	64K	8K	No	Yes	$0.15/M	$0.15/M	$0.015/M
`deepseek-r1-distill-llama-70b`	Chat	128K	8K	No	Yes	$0.55/M	$0.65/M	$0.055/M

Partner models

Models hosted through partner infrastructure with specific routing configurations.

Canonical name	Type	Context length	Max output	Reasoning	Function calling	Price (input)	Price (output)	Cache (input)
`beijing-unicom-qwen3-32b`	Chat	32K	8K	No	Yes	Contact sales	Contact sales	—
`beijing-unicom-qwen3.5-397b`	Reasoning	128K	8K	Yes	Yes	Contact sales	Contact sales	—

Availability note: deepseek-r1-0528 and deepseek-v3-0324 are sold as separate SKUs but may not have a live upstream deployment at all times. If you need guaranteed availability, use deepseek-r1 or deepseek-v3 instead, or call a smart routing name (for example auto/reasoning) so that Routic fails over automatically.
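If you prefer to keep canonical names and handle unavailability on the client side, one possible approach is a small fallback wrapper. This is only a sketch: `ModelUnavailable` is a hypothetical exception, and `send` stands for whatever function your client uses to issue a request. How Routic actually reports a missing deployment is not specified here.

```python
# Dated SKUs mapped to the always-available names suggested in the note above.
FALLBACKS = {
    "deepseek-r1-0528": "deepseek-r1",
    "deepseek-v3-0324": "deepseek-v3",
}


class ModelUnavailable(Exception):
    """Hypothetical error raised by `send` when a model has no live deployment."""


def call_with_fallback(model, send):
    """Call send(model); on ModelUnavailable, retry once with the fallback model."""
    try:
        return send(model)
    except ModelUnavailable:
        fallback = FALLBACKS.get(model)
        if fallback is None:
            raise  # no documented substitute for this model
        return send(fallback)
```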

Cache pricing: The "Cache (input)" column shows each model's price per million input tokens served from cache (cache hits). A dash (—) means the cache-hit price is not yet published for that model; check the console for the latest price. See Context caching for details.

Per-model parameter reference

Thinking Mode (reasoning)

Model	Thinking support	How to enable	`temperature` / `top_p`
`deepseek-r1`	Auto (on)	Automatic; or `thinking: { "type": "enabled" }`	Fixed; setting has no effect
`deepseek-r1-0528`	Auto (on)	Automatic; or `thinking: { "type": "enabled" }`	Fixed; setting has no effect
`qwq-32b`	Manual	`thinking: { "type": "enabled" }`	Normal (0–2)
`beijing-unicom-qwen3.5-397b`	Manual	`thinking: { "type": "enabled" }`	Normal (0–2)
All other models	Not supported	—	Normal (0–2)

For details, see Thinking Mode.
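The table above can be turned into a small payload builder. The sketch below is one way to do it, not an official client: the function name `build_payload` and the `want_thinking` flag are hypothetical, and dropping `temperature` for the auto-thinking models is a client-side choice based on the table's statement that the setting has no effect.

```python
AUTO_THINKING = {"deepseek-r1", "deepseek-r1-0528"}
MANUAL_THINKING = {"qwq-32b", "beijing-unicom-qwen3.5-397b"}


def build_payload(model, messages, want_thinking=False, temperature=None):
    """Build a request body that follows the per-model rules in the table."""
    payload = {"model": model, "messages": messages}

    if want_thinking:
        if model in MANUAL_THINKING:
            # Manual models need the parameter to turn thinking on.
            payload["thinking"] = {"type": "enabled"}
        elif model not in AUTO_THINKING:
            raise ValueError(f"{model} does not support Thinking Mode")
        # Auto models think by default, so nothing is added for them.

    if temperature is not None and model not in AUTO_THINKING:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        payload["temperature"] = temperature

    return payload
```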

Context caching

Context caching is enabled by default on all models. Input tokens served from cache are billed at a lower price: for every model with a published cache price, it is one-tenth of the standard input price. The "Cache (input)" column in the model tables above shows each model's cache-hit price.

See Context caching for details and best practices.
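To see what the cache price means for a bill, here is a worked estimate using the listed prices for deepseek-r1 and deepseek-v3. It assumes that cached input tokens are charged at the "Cache (input)" price and all other input tokens at the standard input price; the exact billing rules are on the Context caching page, so treat this as an approximation. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# (input, output, cache-hit input) prices per million tokens, from the tables.
PRICES = {
    "deepseek-r1": (Decimal("0.55"), Decimal("2.00"), Decimal("0.055")),
    "deepseek-v3": (Decimal("0.25"), Decimal("0.70"), Decimal("0.025")),
}
MILLION = Decimal(1_000_000)


def estimate_cost(model, input_tokens, cached_tokens, output_tokens):
    """Estimate the cost in dollars of one workload under the stated assumption."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    input_price, output_price, cache_price = PRICES[model]
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * input_price
        + cached_tokens * cache_price
        + output_tokens * output_price
    )
    return total / MILLION


# 1M input tokens (800K of them cache hits) plus 100K output tokens.
with_cache = estimate_cost("deepseek-r1", 1_000_000, 800_000, 100_000)
without_cache = estimate_cost("deepseek-r1", 1_000_000, 0, 100_000)
```

Under these assumptions, `with_cache` comes to $0.354 and `without_cache` to $0.75.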

API endpoints

All models share the same API endpoints:

Endpoint	Path	Method
Chat completions	`/v1/chat/completions`	POST
Model list	`/v1/models`	GET
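Note that the paths in the table already include the /v1 prefix, which is also part of the Base URL. The sketch below reads them as absolute paths on the api.routic.ai host (an interpretation, not something this page states outright) and uses `urllib.parse.urljoin` so the prefix is not doubled. The helper `endpoint_request` is hypothetical.

```python
import urllib.request
from urllib.parse import urljoin

BASE_URL = "https://api.routic.ai/v1"

# Method and path for each endpoint, copied from the table above.
ENDPOINTS = {
    "chat_completions": ("POST", "/v1/chat/completions"),
    "model_list": ("GET", "/v1/models"),
}


def endpoint_request(name, api_key, data=None):
    """Build (but do not send) an authenticated request for a named endpoint."""
    method, path = ENDPOINTS[name]
    # An absolute path replaces the base URL's path, so "/v1" appears once.
    url = urljoin(BASE_URL, path)
    return urllib.request.Request(
        url,
        data=data,
        headers={"Authorization": f"Bearer {api_key}"},
        method=method,
    )


models_request = endpoint_request("model_list", "sk-xxxxxxxx")
```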

Capability matrix

Model	Reasoning	Function calling	Streaming	JSON mode	Context caching
`deepseek-r1`	Yes	Yes	Yes	Yes	Yes
`deepseek-r1-0528`	Yes	Yes	Yes	Yes	Yes
`deepseek-v3`	No	Yes	Yes	Yes	Yes
`deepseek-v3-0324`	No	Yes	Yes	Yes	Yes
`deepseek-v3.1`	No	Yes	Yes	Yes	Yes
`deepseek-v3.2`	No	Yes	Yes	Yes	Yes
`qwq-32b`	Yes	Yes	Yes	Yes	Yes
`minimax-m2.5`	No	Yes	Yes	Limited	Yes
`qwen3-coder-plus`	No	Yes	Yes	Yes	Yes
`deepseek-r1-distill-qwen-32b`	No	Yes	Yes	Yes	Yes
`deepseek-r1-distill-qwen-14b`	No	Yes	Yes	Yes	Yes
`deepseek-r1-distill-llama-70b`	No	Yes	Yes	Yes	Yes
`beijing-unicom-qwen3-32b`	No	Yes	Yes	Yes	Yes
`beijing-unicom-qwen3.5-397b`	Yes	Yes	Yes	Yes	Yes

Note: "Reasoning" refers to extended chain-of-thought processing (Thinking Mode); models marked "Yes" support the thinking parameter. "Function calling" refers to support for the tools and tool_choice parameters.

Rate limits

Each API Key has RPM (requests per minute) and TPM (tokens per minute) limits. The default values and their upper bounds are:

Limit	Value
Default RPM	100
Default TPM	10,000
Max RPM (upper bound)	1,000
Max TPM (upper bound)	100,000
Max budget duration	90 days

To request higher limits, contact support.
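To stay under these limits on the client side, one option is a sliding-window guard like the sketch below. It is an illustration only: this page does not say whether Routic measures a fixed or a sliding minute, how tokens are counted, or what happens when a limit is exceeded, so the class and its 60-second window are assumptions.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard for per-minute request and token budgets."""

    def __init__(self, rpm=100, tpm=10_000, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) within the last 60 s

    def _prune(self, now):
        # Drop requests that are at least 60 seconds old.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()

    def try_acquire(self, tokens):
        """Return True and record the request if it fits both budgets."""
        now = self.clock()
        self._prune(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) >= self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True
```

A caller would check `limiter.try_acquire(estimated_tokens)` before each request and wait or queue the request when it returns False.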

Model selection best practices

Use canonical names for predictable behavior and reproducible results.
Use reasoning models (deepseek-r1, qwq-32b) for math, logic, and complex analysis.
Use general chat models (the deepseek-v3 family) for Q&A, summarization, and content generation.
Use code models (qwen3-coder-plus) for code generation, refactoring, and programming assistance.
Use distill models for high-volume, cost-sensitive workloads where latency matters.
Use minimax-m2.5 when you need ultra-long context (up to 1M tokens).
Leave context caching enabled (it is on by default) for multi-turn conversations with repeated context.
Use smart routing for automatic failover when model availability changes.
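The guidelines above could be encoded as a small selection helper. The sketch below makes several assumptions: the task labels and the choice of one default model per family are illustrative, "64K" and "1M" are read as 64,000 and 1,000,000 tokens, and only the prompt size is compared against the context length (output tokens are ignored).

```python
# Context lengths from the model tables, read as round token counts.
CONTEXT = {
    "deepseek-r1": 64_000,
    "deepseek-v3": 64_000,
    "qwen3-coder-plus": 1_000_000,
    "deepseek-r1-distill-qwen-14b": 64_000,
    "minimax-m2.5": 1_000_000,
}

# One illustrative default per task; the pick within each family is ours.
DEFAULT_BY_TASK = {
    "reasoning": "deepseek-r1",
    "chat": "deepseek-v3",
    "code": "qwen3-coder-plus",
    "bulk": "deepseek-r1-distill-qwen-14b",
}

LONG_CONTEXT_MODEL = "minimax-m2.5"


def pick_model(task, prompt_tokens=0):
    """Return a canonical model name for the task and prompt size."""
    model = DEFAULT_BY_TASK[task]
    if prompt_tokens <= CONTEXT[model]:
        return model
    if prompt_tokens > CONTEXT[LONG_CONTEXT_MODEL]:
        raise ValueError("prompt exceeds the largest listed context length")
    # The prompt does not fit the default, so switch to the long-context model.
    return LONG_CONTEXT_MODEL
```

Switching a reasoning task to minimax-m2.5 gives up Thinking Mode, so a real selector would probably weigh that trade-off rather than fall back silently.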

Coming soon

The following capabilities are planned but not yet available:

Vision models (image understanding)
Video generation
Embedding / vector models
FIM (Fill-In-the-Middle) completion for code