Streaming & parameter support
Routic supports streaming output and a broad set of OpenAI-compatible parameters. This page describes what is supported and what is not.
Streaming
Set stream: true in your request to receive incremental responses via Server-Sent Events (SSE).
from openai import OpenAI

# Point the OpenAI SDK at the Routic gateway.
client = OpenAI(
    base_url="https://api.routic.ai/v1",
    api_key="sk-xxxxxxxx",
)

stream = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

for chunk in stream:
    # Skip chunks without choices or without text in the delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Streaming format
- Content-Type: text/event-stream
- Each chunk follows the OpenAI chat.completion.chunk schema
- The final event is data: [DONE]
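For clients that read the raw event stream rather than using an SDK, the format above can be parsed with a few lines of code. The following is a minimal sketch, not Routic's reference client: `iter_sse_chunks` is a hypothetical helper, and the sample lines assume payloads shaped like OpenAI `chat.completion.chunk` objects.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from SSE lines until `data: [DONE]`."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separators, SSE comments, and any non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)


# Hand-written sample lines (an assumption about the payload shape).
sample = [
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]

text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in iter_sse_chunks(sample)
)
print(text)  # Hello
```

In a real client the `lines` argument would come from the HTTP response body, read line by line.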
Thinking model streaming
Reasoning models (e.g., deepseek-r1) return reasoning content in the delta.reasoning_content field when streaming:
data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"reasoning_content":"Let me think..."}}]}
data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"The answer is..."}}]}
data: [DONE]
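Because reasoning and answer text arrive in different delta fields, a client usually wants to accumulate them separately. The sketch below shows one way to do that. `split_reasoning` and `_fake_chunk` are hypothetical names, and the code assumes the SDK exposes the non-standard `reasoning_content` field as an attribute on the delta, which is why it is read with `getattr`.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def split_reasoning(chunks: Iterable) -> Tuple[str, str]:
    """Return (reasoning_text, answer_text) accumulated from streamed chunks."""
    reasoning, answer = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # reasoning_content is an extension field, so read it defensively.
        thought = getattr(delta, "reasoning_content", None)
        if thought:
            reasoning.append(thought)
        content = getattr(delta, "content", None)
        if content:
            answer.append(content)
    return "".join(reasoning), "".join(answer)


def _fake_chunk(**delta):
    """Build a stand-in for an SDK chunk object (for demonstration only)."""
    return SimpleNamespace(
        choices=[SimpleNamespace(index=0, delta=SimpleNamespace(**delta))]
    )


demo = [
    _fake_chunk(reasoning_content="Let me think..."),
    _fake_chunk(content="The answer is..."),
]
reasoning, answer = split_reasoning(demo)
print("reasoning:", reasoning)
print("answer:", answer)
```

Passing the `stream` object returned by `client.chat.completions.create(..., stream=True)` in place of `demo` should work the same way under the assumption above.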
Stream options
Use the stream_options parameter to control streaming behavior:
| Option | Type | Default | Description |
|---|---|---|---|
| include_usage | boolean | false | Include token usage in the final chunk. |
{
  "model": "deepseek-r1",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true,
  "stream_options": {"include_usage": true}
}
When include_usage is true, the last chunk before [DONE] contains the usage object.
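On the client side, the usage object can be captured while the text is being collected. This is a rough sketch under two assumptions: `collect_text_and_usage` is a hypothetical helper, and the usage-bearing chunk may carry an empty `choices` list (as it does in the OpenAI API), so the code never indexes `choices` without checking it first.

```python
from types import SimpleNamespace
from typing import Iterable, Optional, Tuple


def collect_text_and_usage(chunks: Iterable) -> Tuple[str, Optional[object]]:
    """Accumulate streamed text and keep the last usage object seen, if any."""
    parts = []
    usage = None
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        # The usage chunk may have no choices, so guard before indexing.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts), usage


# Stand-in chunks for demonstration; with the SDK, pass the stream returned by
# client.chat.completions.create(..., stream=True,
#                                stream_options={"include_usage": True}).
demo = [
    SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content="Hi"))],
        usage=None,
    ),
    SimpleNamespace(
        choices=[],
        usage=SimpleNamespace(prompt_tokens=5, completion_tokens=2, total_tokens=7),
    ),
]
text, usage = collect_text_and_usage(demo)
print(text, usage.total_tokens)
```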
Parameter reference
Supported parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | — | Model name. Supports canonical names and smart routing names. |
| stream | boolean | false | Enable SSE streaming. |
| temperature | number | 1 | Sampling randomness. Range: 0–2. |
| max_tokens | integer | — | Maximum tokens to generate. |
| top_p | number | 1 | Nucleus sampling threshold. |
| frequency_penalty | number | 0 | Penalize repeated tokens. Range: -2 to 2. |
| presence_penalty | number | 0 | Encourage new topics. Range: -2 to 2. |
| stop | array / string | null | Up to 4 stop sequences. |
| tools | array | — | Function calling definitions. |
| tool_choice | string / object | — | "none", "auto", or a specific tool. |
| response_format | object | — | Enforce JSON mode or schema. |
| logprobs | boolean | false | Return log probabilities. |
| top_logprobs | integer | — | Number of top logprobs to return. Range: 0–20. |
| stream_options | object | — | Streaming options (only when stream: true). |
| thinking | object | — | Routic extension: enables Thinking Mode. |
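The documented ranges can be checked on the client before a request is sent. The helper below is a minimal sketch of such a check. `check_params` is a hypothetical function, not part of any SDK, and it covers only the limits stated in the table above.

```python
from typing import Any, Dict, List


def check_params(params: Dict[str, Any]) -> List[str]:
    """Return human-readable problems for values outside the documented limits."""
    problems: List[str] = []

    def in_range(name: str, low: float, high: float) -> None:
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}, got {value}")

    in_range("temperature", 0, 2)
    in_range("frequency_penalty", -2, 2)
    in_range("presence_penalty", -2, 2)
    in_range("top_logprobs", 0, 20)

    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append(f"stop accepts up to 4 sequences, got {len(stop)}")

    # stream_options is documented as valid only when stream is true.
    if "stream_options" in params and not params.get("stream"):
        problems.append("stream_options requires stream to be true")

    return problems


ok = check_params({"model": "deepseek-r1", "temperature": 0.7, "stop": ["\n"]})
bad = check_params({"temperature": 3, "stream_options": {"include_usage": True}})
print(ok)
print(bad)
```

An empty list means that none of the documented limits were violated. The gateway may still apply additional validation of its own.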
Not supported
| Parameter | Type | Note |
|---|---|---|
| n | integer | Always returns 1 choice. |
| seed | integer | Not supported yet. |
| logit_bias | object | Not supported yet. |
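Code ported from another OpenAI-compatible endpoint may still pass these parameters. This page does not say whether the gateway ignores or rejects them, so one cautious option is to strip them on the client and log what was removed. The sketch below assumes that approach, and `drop_unsupported` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple

# Parameters listed as not supported on this page.
UNSUPPORTED = ("n", "seed", "logit_bias")


def drop_unsupported(params: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (params without unsupported keys, names of the keys removed)."""
    kept = {key: value for key, value in params.items() if key not in UNSUPPORTED}
    dropped = [key for key in params if key in UNSUPPORTED]
    return kept, dropped


request = {"model": "deepseek-r1", "temperature": 0.2, "seed": 42, "n": 2}
kept, dropped = drop_unsupported(request)
print(kept)
print(dropped)
```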
Gateway-level controls
The following controls are configured at the gateway level and cannot be set per request:
| Control | Scope | Description |
|---|---|---|
| RPM limit | Per key | Requests per minute. Default: 100. Max: 1,000. |
| TPM limit | Per key | Tokens per minute. Default: 10,000. Max: 100,000. |
| Max budget | Per key | Spending cap. Default: $100. Max: $1,000. |
| Retries | Per route | Gateway auto-retries on model provider errors. Default: 3. |
| Timeout | Per route | Request timeout. Default: 60 seconds. |
To change key-level limits, contact support or adjust them in your dashboard. See Rate limits.
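Since the RPM limit cannot be raised per request, a client that sends many requests may want to pace itself. The class below is a rough sketch of a client-side throttle. `RpmThrottle` is a hypothetical name, the default of 100 mirrors the table above, and the 60-second sliding window is an assumption, since the way the gateway counts its window is not described here.

```python
import time
from collections import deque
from typing import Callable, Deque


class RpmThrottle:
    """Track request timestamps and report how long to wait to stay under an RPM cap."""

    def __init__(self, rpm: int = 100, clock: Callable[[], float] = time.monotonic):
        self.rpm = rpm
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.sent: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before another request fits in the 60-second window."""
        now = self.clock()
        # Forget requests that have left the window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            return 0.0
        return 60 - (now - self.sent[0])

    def record(self) -> None:
        """Call once per request actually sent."""
        self.sent.append(self.clock())


throttle = RpmThrottle(rpm=100)
delay = throttle.wait_time()
if delay > 0:
    time.sleep(delay)
throttle.record()  # then send the request
```

A throttle like this only reduces the chance of hitting the limit. Requests can still be rejected if other clients share the same key.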