Streaming & parameter support

Routic supports streaming output and a broad set of OpenAI-compatible parameters. This page describes what is supported and what is not.

Streaming

Set stream: true in your request to receive incremental responses via Server-Sent Events (SSE).

from openai import OpenAI

client = OpenAI(
    base_url="https://api.routic.ai/v1",
    api_key="sk-xxxxxxxx",
)

stream = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

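for chunk in stream:
    # Skip chunks that carry no choices (for example, a usage-only chunk).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)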

Streaming format

  • Content-Type: text/event-stream
  • Each chunk follows the OpenAI chat.completion.chunk schema
  • The final event is data: [DONE]
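If you are not using an SDK, the format above can be consumed with a small line parser. The following is a minimal sketch, not Routic's reference client: the function name iter_sse_chunks is hypothetical, and the sample payload assumes the usual OpenAI chunk layout with a choices array.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON payloads from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream; later lines are ignored
        yield json.loads(payload)


# Sample input shaped like the events described above (assumed layout).
sample = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Hi"}}]}',
    "",
    "data: [DONE]",
    'data: {"ignored": true}',
]
chunks = list(iter_sse_chunks(sample))
```

In practice the lines would come from a streaming HTTP response body rather than a list.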

Thinking model streaming

Reasoning models (e.g., deepseek-r1) return reasoning content in the delta.reasoning_content field when streaming:

data: {"object":"chat.completion.chunk","delta":{"reasoning_content":"Let me think..."},"index":0}

data: {"object":"chat.completion.chunk","delta":{"content":"The answer is..."},"index":0}

data: [DONE]
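To keep reasoning text apart from the final answer, a client can accumulate the two fields separately. This sketch assumes SDK-style chunk objects with choices[0].delta, as in the Python example earlier on this page; split_reasoning and _fake_chunk are hypothetical helpers, and the fake chunks stand in for a real stream.

```python
from types import SimpleNamespace


def split_reasoning(chunks):
    """Accumulate reasoning text and answer text separately from stream chunks."""
    reasoning, answer = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # reasoning_content is an extension field, so read both fields defensively.
        reasoning_piece = getattr(delta, "reasoning_content", None)
        answer_piece = getattr(delta, "content", None)
        if reasoning_piece:
            reasoning.append(reasoning_piece)
        if answer_piece:
            answer.append(answer_piece)
    return "".join(reasoning), "".join(answer)


def _fake_chunk(**delta):
    # Stand-in for an SDK chunk object; used only for this demonstration.
    return SimpleNamespace(choices=[SimpleNamespace(index=0, delta=SimpleNamespace(**delta))])


demo = [
    _fake_chunk(reasoning_content="Let me think..."),
    _fake_chunk(content="The answer is..."),
]
thinking, final = split_reasoning(demo)
```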

Stream options

Use the stream_options parameter to control streaming behavior:

| Option | Type | Default | Description |
|---|---|---|---|
| include_usage | boolean | false | Include token usage in the final chunk |

{
  "model": "deepseek-r1",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true,
  "stream_options": {"include_usage": true}
}

When include_usage is true, the last chunk before [DONE] contains the usage object.


Parameter reference

Supported parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | | Model name. Supports canonical names and smart routing names. |
| stream | boolean | false | Enable SSE streaming. |
| temperature | number | 1 | Sampling randomness. Range: 0–2. |
| max_tokens | integer | | Maximum tokens to generate. |
| top_p | number | 1 | Nucleus sampling threshold. |
| frequency_penalty | number | 0 | Penalize repeated tokens. Range: -2 to 2. |
| presence_penalty | number | 0 | Encourage new topics. Range: -2 to 2. |
| stop | array / string | null | Up to 4 stop sequences. |
| tools | array | | Function calling definitions. |
| tool_choice | string / object | | "none", "auto", or a specific tool. |
| response_format | object | | Enforce JSON mode or schema. |
| logprobs | boolean | false | Return log probabilities. |
| top_logprobs | integer | | Number of top logprobs to return. Range: 0–20. |
| stream_options | object | | Streaming options (only when stream: true). |
| thinking | object | | Routic extension: enables Thinking Mode. |
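The documented ranges can be checked on the client before a request is sent. This is an illustrative sketch only: check_params is a hypothetical helper, it covers just the constraints listed in the table, and the gateway's own validation and error messages may differ.

```python
from typing import List


def check_params(params: dict) -> List[str]:
    """Return human-readable problems for values outside the documented ranges."""
    problems = []

    def in_range(name, low, high):
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}")

    in_range("temperature", 0, 2)
    in_range("frequency_penalty", -2, 2)
    in_range("presence_penalty", -2, 2)
    in_range("top_logprobs", 0, 20)

    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")

    # stream_options is documented as valid only when stream is true.
    if params.get("stream_options") is not None and not params.get("stream"):
        problems.append("stream_options requires stream: true")

    return problems


ok = check_params({"model": "deepseek-r1", "temperature": 0.7, "stop": ["END"]})
bad = check_params({"temperature": 3, "stream_options": {"include_usage": True}})
```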

Not supported

| Parameter | Type | Note |
|---|---|---|
| n | integer | Always returns 1 choice. |
| seed | integer | Not supported yet. |
| logit_bias | object | Not supported yet. |
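Code ported from another OpenAI-compatible endpoint may still pass these parameters. One option is to strip them on the client and log what was removed. This page does not say whether the gateway rejects or silently ignores them, so treat the following as a defensive sketch; drop_unsupported is a hypothetical helper.

```python
# Parameters listed as not supported in the table above.
UNSUPPORTED = ("n", "seed", "logit_bias")


def drop_unsupported(params: dict):
    """Return a copy of params without unsupported keys, plus the keys removed."""
    cleaned = {key: value for key, value in params.items() if key not in UNSUPPORTED}
    dropped = [key for key in params if key in UNSUPPORTED]
    return cleaned, dropped


request = {"model": "deepseek-r1", "temperature": 0.2, "seed": 42, "n": 2}
cleaned, dropped = drop_unsupported(request)
```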

Gateway-level controls

The following controls are configured at the gateway level and cannot be set per request:

| Control | Scope | Description |
|---|---|---|
| RPM limit | Per key | Requests per minute. Default: 100. Max: 1,000. |
| TPM limit | Per key | Tokens per minute. Default: 10,000. Max: 100,000. |
| Max budget | Per key | Spending cap. Default: $100. Max: $1,000. |
| Retries | Per route | Gateway auto-retries on model provider errors. Default: 3. |
| Timeout | Per route | Request timeout. Default: 60 seconds. |

To change key-level limits, contact support or adjust them in your dashboard. See Rate limits.
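Because the RPM limit is enforced per key, a client that sends many requests may want to pace itself. The sketch below is a simple sliding-window limiter. It assumes the gateway counts requests over a rolling 60-second window, which this page does not confirm, and MinuteWindowLimiter is a hypothetical class.

```python
import time
from collections import deque


class MinuteWindowLimiter:
    """Track request times and report how long to wait to stay under a per-minute cap."""

    def __init__(self, max_per_minute=100, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.sent = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next request is within the cap (0.0 if none)."""
        now = self.clock()
        # Forget requests that are at least 60 seconds old.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.max_per_minute:
            return 0.0
        return 60 - (now - self.sent[0])

    def record(self) -> None:
        """Call once for every request actually sent."""
        self.sent.append(self.clock())


limiter = MinuteWindowLimiter(max_per_minute=100)  # matches the default RPM above
```

Before each request, sleep for limiter.wait_time() seconds, then call limiter.record() once the request is sent.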


See also