Chat completions

The Chat Completions endpoint is the primary way to generate text responses. Send a list of messages, get a model-generated response back.

Endpoint

POST https://api.routic.ai/v1/chat/completions
Content-Type: application/json
Authorization: Bearer sk-xxxxxxxx

Replace the domain with your assigned Base URL.
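
As a rough illustration of the request line and headers above, the sketch below builds (but does not send) the request using only Python's standard library. The helper name build_request is hypothetical, and the sketch assumes the assigned Base URL already includes the /v1 prefix, as in the SDK examples on this page.

```python
import json
import urllib.request


def build_request(base_url, api_key, payload):
    """Build a POST request for the chat completions endpoint (hypothetical helper)."""
    # Assumption: base_url already ends with /v1, e.g. "https://api.routic.ai/v1".
    url = base_url.rstrip("/") + "/chat/completions"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_request(
    "https://api.routic.ai/v1",
    "sk-xxxxxxxx",  # placeholder key, as used elsewhere on this page
    {"model": "deepseek-r1", "messages": [{"role": "user", "content": "Hello"}]},
)
# Sending is intentionally left out; urllib.request.urlopen(req) would perform the call.
```

Keeping the Base URL as a parameter means only one value changes when you switch to the domain assigned to your account.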

Request body

Required parameters

| Parameter | Type | Description |
| --- | --- | --- |
| model | string | The model name or enabled routing alias (e.g., deepseek-r1, auto/reasoning). See the Model catalog. |
| messages | array | A list of messages comprising the conversation. Each message must have a role (system, user, or assistant) and content (string). |

Optional parameters

| Parameter | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| stream | boolean | false | - | If true, the response is streamed as server-sent events. |
| temperature | number | 1.0 | 0 to 2 | Sampling temperature. Lower values make output more predictable; 0 for most predictable, 2 for most random. |
| max_tokens | integer | Model default | - | Maximum number of tokens to generate. |
| top_p | number | 1.0 | 0 to 1 | Nucleus sampling probability threshold. |
| frequency_penalty | number | 0 | -2 to 2 | Penalize tokens based on their frequency in the text so far. |
| presence_penalty | number | 0 | -2 to 2 | Penalize tokens based on whether they appear in the text so far. |
| stop | array / string / null | null | - | Up to 4 sequences where the API will stop generating further tokens. |
| tools | array | null | - | A list of tool definitions for function calling. See Tool calls. |
| tool_choice | string / object | "none" | - | Controls which (if any) tool is called. "none" = no tool, "auto" = model decides, or specify a tool. |
| response_format | object | { "type": "text" } | - | Set { "type": "json_object" } to enable JSON output mode. See JSON output. |
| thinking | object | null | - | Set { "type": "enabled" } to enable Thinking Mode on reasoning models. See Thinking Mode. |
| logprobs | boolean | false | - | Whether to return log probabilities of the output tokens. |
| top_logprobs | integer | null | 0 to 20 | Number of most likely tokens to return probabilities for at each token position. |
| stream_options | object | null | - | Options for streaming response. Only set when stream: true. |
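
The ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch of such a check; validate_options is a hypothetical helper and not part of any SDK, and the server's own validation may differ.

```python
def validate_options(options):
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    # Numeric ranges taken from the optional parameters table.
    ranges = {
        "temperature": (0, 2),
        "top_p": (0, 1),
        "frequency_penalty": (-2, 2),
        "presence_penalty": (-2, 2),
        "top_logprobs": (0, 20),
    }
    for name, (low, high) in ranges.items():
        value = options.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}, got {value}")
    # stop accepts a string, or a list of at most 4 sequences.
    stop = options.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")
    # stream_options is only meaningful together with stream: true.
    if options.get("stream_options") is not None and not options.get("stream"):
        problems.append("stream_options should only be set when stream is true")
    return problems


ok = validate_options({"temperature": 0.7, "max_tokens": 500})
bad = validate_options({"temperature": 3, "stop": ["a", "b", "c", "d", "e"]})
```

Here ok is an empty list, while bad contains one message for the out-of-range temperature and one for the oversized stop list.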

Messages format

Each message in the messages array must have:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| role | string | Yes | One of system, user, or assistant. |
| content | string | Yes | The text content of the message. |

System message example (optional, guides behavior):

{ "role": "system", "content": "You are a helpful coding assistant." }

User message example:

{ "role": "user", "content": "Explain how JWT authentication works." }

Assistant message example (from previous responses):

{ "role": "assistant", "content": "JWT authentication works by..." }
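
To continue a conversation, earlier assistant replies are sent back alongside the new user message. The sketch below shows one way to assemble such a history; add_turn is a hypothetical helper, and the follow-up question is made up for illustration.

```python
def add_turn(messages, assistant_reply, next_user_message):
    """Return a new messages list extended with one assistant/user exchange."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]


conversation = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Explain how JWT authentication works."},
]

# Append the previous response and the follow-up question before the next request.
conversation = add_turn(
    conversation,
    "JWT authentication works by...",
    "How do refresh tokens fit in?",
)
```

The resulting list can be passed as the messages parameter of the next request.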

Response body (non-streaming)

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1713593400,
  "model": "deepseek-r1",
  "system_fingerprint": "fp_xxxxx",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a detailed explanation...",
        "reasoning_content": "..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 120,
    "total_tokens": 145,
    "prompt_cache_hit_tokens": 0,
    "prompt_cache_miss_tokens": 25
  }
}

Response fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for this completion. |
| object | string | Always "chat.completion". |
| created | integer | Unix timestamp of when the completion was created. |
| model | string | The model that generated the response. |
| system_fingerprint | string | Represents the backend configuration fingerprint. |
| choices | array | List of completion choices. Usually contains one item. |
| choices[].index | integer | Index of the choice in the list. |
| choices[].message | object | The generated message with role and content. |
| choices[].message.reasoning_content | string | Chain-of-thought content (only when thinking mode is enabled). |
| choices[].finish_reason | string | Reason the model stopped: "stop", "length", "tool_calls", or "content_filter". |
| usage | object | Token usage statistics. |
| usage.prompt_tokens | integer | Number of tokens in the prompt. |
| usage.completion_tokens | integer | Number of tokens in the generated response. |
| usage.total_tokens | integer | Total tokens (prompt + completion). |
| usage.prompt_cache_hit_tokens | integer | Tokens served from cache (billed at lower rate). |
| usage.prompt_cache_miss_tokens | integer | Tokens not served from cache (billed at normal rate). |
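
The fields above can be read from the parsed JSON as shown in this sketch. summarize_completion is a hypothetical helper; it treats reasoning_content as optional and computes a cache hit ratio over hit plus miss tokens, which is an assumption about how those two counters relate to the prompt.

```python
def summarize_completion(response):
    """Pull the commonly used fields out of a non-streaming response dict."""
    choice = response["choices"][0]
    message = choice["message"]
    usage = response.get("usage", {})
    hit = usage.get("prompt_cache_hit_tokens", 0)
    miss = usage.get("prompt_cache_miss_tokens", 0)
    prompt_total = hit + miss
    return {
        "content": message.get("content"),
        # Only present when thinking mode is enabled, so default to None.
        "reasoning": message.get("reasoning_content"),
        # "length" means generation stopped at the token limit.
        "truncated": choice.get("finish_reason") == "length",
        "cache_hit_ratio": hit / prompt_total if prompt_total else 0.0,
    }


sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Here's a detailed explanation..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {
        "prompt_tokens": 25,
        "completion_tokens": 120,
        "total_tokens": 145,
        "prompt_cache_hit_tokens": 0,
        "prompt_cache_miss_tokens": 25,
    },
}
summary = summarize_completion(sample)
```

Checking finish_reason before using the content helps catch responses that were cut off by max_tokens.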

Streaming response

When stream is true, the server sends server-sent events (text/event-stream) as the model generates tokens. Each event is a data: line carrying one JSON chunk, in the following format:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1713593400,"model":"deepseek-r1","choices":[{"index":0,"delta":{"role":"assistant","content":"Here"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1713593400,"model":"deepseek-r1","choices":[{"index":0,"delta":{"content":"'s"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1713593400,"model":"deepseek-r1","choices":[{"index":0,"delta":{"content":" a"},"finish_reason":null}]}

data: [DONE]

Streaming delta format

Each chunk contains:

| Field | Description |
| --- | --- |
| delta.role | The role of the message (only in the first chunk). |
| delta.content | The incremental text generated. Concatenate all chunks to get the full response. |
| delta.reasoning_content | Incremental thinking content (when thinking mode is enabled). |
| finish_reason | null while generating, then "stop" / "length" / "tool_calls" on the final chunk. |

The stream ends with data: [DONE].
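
The concatenation described above can be sketched as a small parser over the raw event lines. collect_stream is a hypothetical helper that works on any iterable of text lines; the chunks in the usage example are shortened versions of the ones shown earlier.

```python
import json


def collect_stream(lines):
    """Accumulate streamed deltas from an iterable of server-sent event lines."""
    content, reasoning, finish_reason = [], [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            content.append(delta.get("content") or "")
            reasoning.append(delta.get("reasoning_content") or "")
            # Stays None until a chunk reports why generation stopped.
            finish_reason = choice.get("finish_reason") or finish_reason
    return {
        "content": "".join(content),
        "reasoning": "".join(reasoning),
        "finish_reason": finish_reason,
    }


events = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Here"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"\'s"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" a"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
result = collect_stream(events)
```

For these three chunks, result["content"] is the text "Here's a", and finish_reason remains None because no chunk reported one.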


Code examples

cURL (basic)

curl -X POST "https://api.routic.ai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxxxxxxx" \
  -d '{
    "model": "deepseek-r1",
    "messages": [
      { "role": "user", "content": "Explain the difference between REST and GraphQL." }
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'

cURL (streaming)

curl -X POST "https://api.routic.ai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxxxxxxx" \
  -d '{
    "model": "deepseek-v3",
    "messages": [
      { "role": "user", "content": "Write a haiku about coding." }
    ],
    "stream": true
  }'

cURL (smart routing name)

curl -X POST "https://api.routic.ai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxxxxxxx" \
  -d '{
    "model": "auto/reasoning",
    "messages": [
      { "role": "user", "content": "Analyze this business requirement and suggest next steps." }
    ]
  }'

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.routic.ai/v1",
    api_key="sk-xxxxxxxx",
)

response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[
        {"role": "user", "content": "Explain the difference between REST and GraphQL."}
    ],
    temperature=0.7,
    max_tokens=500,
)

print(response.choices[0].message.content)

Node.js (OpenAI SDK)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.routic.ai/v1",
  apiKey: "sk-xxxxxxxx",
});

const response = await client.chat.completions.create({
  model: "deepseek-r1",
  messages: [
    {
      role: "user",
      content: "Explain the difference between REST and GraphQL.",
    },
  ],
  temperature: 0.7,
  max_tokens: 500,
});

console.log(response.choices[0].message.content);

Two calling styles

Style 1: Canonical model name (recommended)

Use the industry-standard model name for precise control:

{ "model": "deepseek-r1", "messages": [...] }

Style 2: Smart routing name

Use a Routic-managed routing identifier, and Routic automatically picks the best model for that capability:

{ "model": "auto/reasoning", "messages": [...] }

Both work on the same endpoint. See the Model catalog for details.


Migration from other platforms

If you're already using OpenAI, OpenRouter, or another compatible gateway, migration requires only 3 steps:

  1. Change base_url to your Routic endpoint.
  2. Change the API key to your Routic API Key.
  3. Update the model parameter to a Routic-supported canonical name or an enabled routing alias.

No changes to request structure, message format, or response parsing are required.
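
The three steps can be sketched as a small settings rewrite. Everything here is illustrative: migrate_settings, the settings keys, and the model mapping are hypothetical, and the mapping from your previous model name to a Routic model is your own choice.

```python
def migrate_settings(old, routic_base_url, routic_api_key, model_map):
    """Return a copy of client settings pointed at Routic (hypothetical helper)."""
    new = dict(old)
    new["base_url"] = routic_base_url      # step 1: change the base URL
    new["api_key"] = routic_api_key        # step 2: change the API key
    new["model"] = model_map[old["model"]]  # step 3: update the model name
    return new


previous = {
    "base_url": "https://api.openai.com/v1",
    "api_key": "sk-old-key",
    "model": "previous-model-name",  # placeholder for the model you used before
    "temperature": 0.7,
}

current = migrate_settings(
    previous,
    "https://api.routic.ai/v1",
    "sk-xxxxxxxx",
    {"previous-model-name": "deepseek-r1"},
)
```

Request options such as temperature carry over unchanged, which reflects the point that the request structure stays the same.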


See also