Streaming & parameter support

Routic supports streaming output and a broad set of OpenAI-compatible parameters. This page describes what is supported and what is not.

Streaming

Set stream: true in your request to receive incremental responses via Server-Sent Events (SSE).

from openai import OpenAI

client = OpenAI(
    base_url="https://api.routic.ai/v1",
    api_key="sk-xxxxxxxx",
)

stream = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

for chunk in stream:
    # Skip chunks that carry no choices or no text delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Streaming format

Content-Type: text/event-stream
Each chunk follows the OpenAI chat.completion.chunk schema
The final event is data: [DONE]
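If you read the stream without an SDK, the format above can be handled with a few lines of parsing. The following is a minimal sketch, not Routic's reference client: the helper name `iter_sse_chunks` and the sample lines are hypothetical, and it assumes each event carries its JSON on a single `data:` line.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream
        yield json.loads(payload)


# Hypothetical sample lines, shaped like the events described above.
sample = [
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]

text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in iter_sse_chunks(sample)
)
print(text)  # prints "Hello"
```

In practice the `lines` argument would come from whatever HTTP client you use to read the response body line by line.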

Thinking model streaming

Reasoning models (e.g., deepseek-r1) return reasoning content in the delta.reasoning_content field when streaming:

data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"reasoning_content":"Let me think..."}}]}

data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"The answer is..."}}]}

data: [DONE]
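To keep the reasoning trace separate from the final answer, accumulate the two fields independently. This is a sketch under assumptions: `split_reasoning` is a hypothetical helper, the chunks are already decoded into dictionaries, and each delta sits under `choices`, as in the OpenAI chunk schema and the Python example earlier on this page.

```python
from typing import Iterable, Tuple


def split_reasoning(chunks: Iterable[dict]) -> Tuple[str, str]:
    """Return (reasoning_text, answer_text) accumulated across chunk dicts."""
    reasoning, answer = [], []
    for chunk in chunks:
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            # Either field may be absent or null on any given chunk.
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            if delta.get("content"):
                answer.append(delta["content"])
    return "".join(reasoning), "".join(answer)


# Hypothetical chunks mirroring the two events shown above.
chunks = [
    {
        "object": "chat.completion.chunk",
        "choices": [{"index": 0, "delta": {"reasoning_content": "Let me think..."}}],
    },
    {
        "object": "chat.completion.chunk",
        "choices": [{"index": 0, "delta": {"content": "The answer is..."}}],
    },
]

reasoning_text, answer_text = split_reasoning(chunks)
```

With the OpenAI Python SDK the same idea applies to `chunk.choices[0].delta`; because `reasoning_content` is not part of the standard schema, reading it with `getattr(delta, "reasoning_content", None)` is the safer choice.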

Stream options

Use the stream_options parameter to control streaming behavior:

Option	Type	Default	Description
`include_usage`	boolean	false	Include token usage in the final chunk

{
  "model": "deepseek-r1",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true,
  "stream_options": {"include_usage": true}
}

When include_usage is true, the last chunk before [DONE] contains the usage object.
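A consumer can pick up the usage object while it assembles the text. The sketch below makes two assumptions that this page does not confirm: the usage chunk may arrive with an empty `choices` array (as it does in OpenAI's API), and the token counts use OpenAI's field names. The helper name and the sample values are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def collect_text_and_usage(chunks: Iterable[dict]) -> Tuple[str, Optional[dict]]:
    """Join all content deltas and return the usage object if one was sent."""
    parts = []
    usage = None
    for chunk in chunks:
        # Remember the usage object whenever a chunk carries one.
        if chunk.get("usage"):
            usage = chunk["usage"]
        # Tolerate chunks whose choices list is empty or missing.
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts), usage


# Hypothetical decoded chunks; the token counts are made up for illustration.
usage_chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Hi"}}]},
    {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 1, "total_tokens": 6}},
]

reply, usage = collect_text_and_usage(usage_chunks)
```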

Parameter reference

Supported parameters

Parameter	Type	Default	Description
`model`	string	—	Model name. Supports canonical names and smart routing names.
`stream`	boolean	false	Enable SSE streaming.
`temperature`	number	1	Sampling randomness. Range: 0–2.
`max_tokens`	integer	—	Maximum tokens to generate.
`top_p`	number	1	Nucleus sampling threshold.
`frequency_penalty`	number	0	Penalize repeated tokens. Range: -2 to 2.
`presence_penalty`	number	0	Encourage new topics. Range: -2 to 2.
`stop`	array / string	null	Up to 4 stop sequences.
`tools`	array	—	Function calling definitions.
`tool_choice`	string / object	—	`"none"`, `"auto"`, or a specific tool.
`response_format`	object	—	Enforce JSON mode or schema.
`logprobs`	boolean	false	Return log probabilities.
`top_logprobs`	integer	—	Number of top logprobs to return. Range: 0–20.
`stream_options`	object	—	Streaming options (only when `stream: true`).
`thinking`	object	—	Routic extension — enable Thinking Mode.
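The ranges in this table can be checked on the client before a request is sent. The following is an illustrative sketch only: `check_params` is a hypothetical helper, it covers just the limits stated in the table, and the gateway's own validation may differ.

```python
from typing import Any, Dict, List


def check_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the documented limits hold."""
    problems: List[str] = []

    def in_range(name: str, low: float, high: float) -> None:
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}")

    # Ranges taken from the table above.
    in_range("temperature", 0, 2)
    in_range("frequency_penalty", -2, 2)
    in_range("presence_penalty", -2, 2)
    in_range("top_logprobs", 0, 20)

    # stop may be a single string or a list of up to 4 strings.
    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")

    # stream_options only applies to streaming requests.
    if "stream_options" in params and not params.get("stream"):
        problems.append("stream_options requires stream: true")

    return problems


request = {
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,
    "stop": ["\n\n"],
    "stream": True,
    "stream_options": {"include_usage": True},
}
issues = check_params(request)
```

An empty `issues` list means the request stays within the documented ranges; anything else is a human-readable description of what to fix.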

Not supported

Parameter	Type	Note
`n`	integer	Always returns 1 choice.
`seed`	integer	Not supported yet.
`logit_bias`	object	Not supported yet.
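This page does not say whether the gateway rejects or silently ignores these parameters, so code shared with other OpenAI-compatible backends may want to remove them explicitly. A minimal sketch, with `drop_unsupported` as a hypothetical helper name:

```python
from typing import Any, Dict, List, Tuple

# Parameters listed as not supported in the table above.
UNSUPPORTED = ("n", "seed", "logit_bias")


def drop_unsupported(params: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy of params without unsupported keys, plus the names removed."""
    kept = {key: value for key, value in params.items() if key not in UNSUPPORTED}
    dropped = sorted(key for key in params if key in UNSUPPORTED)
    return kept, dropped


cleaned, removed = drop_unsupported(
    {"model": "deepseek-r1", "temperature": 0.2, "seed": 42, "n": 2}
)
```

Returning the removed names lets the caller log a warning instead of dropping settings silently.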

Gateway-level controls

The following controls are configured at the gateway level and cannot be set per request:

Control	Scope	Description
RPM limit	Per key	Requests per minute. Default: 100. Max: 1,000.
TPM limit	Per key	Tokens per minute. Default: 10,000. Max: 100,000.
Max budget	Per key	Spending cap. Default: $100. Max: $1,000.
Retries	Per route	Gateway auto-retries on model provider errors. Default: 3.
Timeout	Per route	Request timeout. Default: 60 seconds.

To change key-level limits, contact support or adjust them in your dashboard. See Rate limits.
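Because the RPM limit is enforced per key at the gateway, a client that sends many requests may want to pace itself. The sketch below is one way to do that with a sliding 60-second window. `RpmLimiter` is a hypothetical class, the default of 100 comes from the table above, and the gateway's actual counting window is not documented here and may differ.

```python
import time
from collections import deque
from typing import Callable, Deque


class RpmLimiter:
    """Sliding-window limiter: at most `rpm` requests in any 60-second window."""

    def __init__(self, rpm: int = 100, clock: Callable[[], float] = time.monotonic):
        self.rpm = rpm
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.sent: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next request may be sent (0.0 if none)."""
        now = self.clock()
        # Forget requests that have left the 60-second window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            return 0.0
        # The oldest request in the window decides when a slot frees up.
        return 60.0 - (now - self.sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self.sent.append(self.clock())
```

Before each request, call `time.sleep(limiter.wait_time())` and then `limiter.record()`. This only reduces the chance of hitting the limit; it does not replace handling the gateway's rate-limit responses.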