Thinking Mode

Thinking Mode (also called extended chain-of-thought) lets a reasoning model work through a problem step by step before it produces a final answer. It is best suited to complex problem-solving, mathematical reasoning, logic-heavy tasks, and multi-step analysis.

Supported models

Thinking Mode is available on reasoning models only:

Model	Auto-enabled	Manual enable
`deepseek-r1`	Yes (default)	Yes
`deepseek-r1-0528`	Yes (default)	Yes
`qwq-32b`	No	Yes
`beijing-unicom-qwen3.5-397b`	No	Yes

How to enable

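Method 1: Use a model that enables Thinking Mode by default

Set the model parameter to `deepseek-r1` or `deepseek-r1-0528`. Thinking Mode is enabled by default on these models, so no extra parameter is needed. The other supported models do not enable it automatically and need Method 2.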

{
  "model": "deepseek-r1",
  "messages": [{ "role": "user", "content": "Solve this math problem..." }]
}

Method 2: Explicit enable with `thinking` parameter

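Set the `thinking` parameter to enable Thinking Mode explicitly. Models that do not enable it by default (`qwq-32b`, `beijing-unicom-qwen3.5-397b`) need this parameter; on the `deepseek-r1` models it is optional: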

{
  "model": "deepseek-r1",
  "messages": [{ "role": "user", "content": "Analyze this logic..." }],
  "thinking": { "type": "enabled" }
}
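
The two methods can be combined in a small request builder. The following is a minimal sketch, not an official client: the function name `build_request` and the `force_thinking` flag are hypothetical, and the model set is copied from the table under Supported models. It only builds the JSON body and does not send it anywhere.

```python
import json

# Models listed under "Supported models" with Auto-enabled = No.
MANUAL_ONLY_MODELS = {"qwq-32b", "beijing-unicom-qwen3.5-397b"}


def build_request(model: str, prompt: str, force_thinking: bool = False) -> dict:
    """Build a chat request body (hypothetical helper, for illustration only)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Method 2: add the explicit flag when the model does not auto-enable
    # Thinking Mode, or when the caller asks for it.
    if force_thinking or model in MANUAL_ONLY_MODELS:
        body["thinking"] = {"type": "enabled"}
    return body


if __name__ == "__main__":
    print(json.dumps(build_request("qwq-32b", "Analyze this logic..."), indent=2))
```

Keeping the model set in one place means a change to the supported-models table only needs to be mirrored in `MANUAL_ONLY_MODELS`.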

Response format

When Thinking Mode is enabled, the response includes both the reasoning process and the final answer:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The answer is 42.",
        "reasoning_content": "First, let me break down the problem...\nStep 1: ...\nStep 2: ...\n..."
      }
    }
  ]
}

Field	Description
`content`	The final answer provided to the user.
`reasoning_content`	The extended chain-of-thought process. This is the model's internal reasoning and may include self-correction, verification, and multi-step analysis.
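
Client code usually wants the two fields separately, for example to show the answer to the user and log the reasoning. A minimal sketch, assuming the response has already been parsed into a Python dict with the shape shown above (the helper name `split_response` is hypothetical):

```python
def split_response(response: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from the first choice of a parsed response."""
    message = response["choices"][0]["message"]
    # reasoning_content may be absent when Thinking Mode is off, so fall
    # back to an empty string instead of raising.
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    return reasoning, answer


if __name__ == "__main__":
    sample = {
        "choices": [
            {
                "message": {
                    "role": "assistant",
                    "content": "The answer is 42.",
                    "reasoning_content": "First, let me break down the problem...",
                }
            }
        ]
    }
    reasoning, answer = split_response(sample)
    print("Answer:", answer)
    print("Reasoning length:", len(reasoning))
```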

Parameter restrictions

When using Thinking Mode, the following parameters have special behavior:

Parameter	Behavior
`max_tokens`	Supported. Default 32K, max 64K for reasoning models.
`temperature`	Not applicable. Setting it has no effect.
`top_p`	Not applicable.
`presence_penalty`	Not applicable.
`frequency_penalty`	Not applicable.
`logprobs`	Not supported (returns 400 error).
`top_logprobs`	Not supported.
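
One way to respect these restrictions is to clean the request body before sending it. The sketch below is an assumption about how a client might do this, not part of the API: `sanitize_for_thinking` is a hypothetical helper that drops the parameters listed as not applicable and fails locally on `logprobs` and `top_logprobs`. Rejecting `top_logprobs` on the client is a design choice here; the table only states that `logprobs` returns a 400 error.

```python
# Parameters the table lists as "Not applicable" in Thinking Mode.
IGNORED_PARAMS = ("temperature", "top_p", "presence_penalty", "frequency_penalty")
# Parameters the table lists as "Not supported".
UNSUPPORTED_PARAMS = ("logprobs", "top_logprobs")


def sanitize_for_thinking(body: dict) -> dict:
    """Return a copy of a request body that is safe to send in Thinking Mode."""
    rejected = [name for name in UNSUPPORTED_PARAMS if name in body]
    if rejected:
        # Fail early on the client instead of waiting for an API error.
        raise ValueError(f"not supported in Thinking Mode: {', '.join(rejected)}")
    # Drop parameters that would be ignored anyway; keep everything else.
    return {key: value for key, value in body.items() if key not in IGNORED_PARAMS}


if __name__ == "__main__":
    request = {"model": "deepseek-r1", "temperature": 0.7, "max_tokens": 4096}
    print(sanitize_for_thinking(request))
```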

Multi-turn conversations with Thinking Mode

When using Thinking Mode in multi-turn conversations:

Each turn returns both `reasoning_content` and `content`.
Do not include `reasoning_content` from previous turns in the next request's messages; include only `content` (the final answers).
This saves bandwidth and keeps the model from re-processing its own earlier reasoning.

Example:

{
  "model": "deepseek-r1",
  "messages": [
    { "role": "user", "content": "What is the square root of 144?" },
    { "role": "assistant", "content": "The square root of 144 is 12." },
    { "role": "user", "content": "What about 169?" }
  ]
}

Note: The previous assistant message includes only `content`, not `reasoning_content`.
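
If the client stores assistant messages exactly as they were returned, the reasoning has to be stripped when building the next request. A minimal sketch, assuming the history is a list of message dicts (the helper name `next_turn_messages` is hypothetical):

```python
def next_turn_messages(history: list[dict], user_input: str) -> list[dict]:
    """Build the messages list for the next turn without earlier reasoning."""
    # Copy each message, leaving out reasoning_content if it is present.
    cleaned = [
        {key: value for key, value in message.items() if key != "reasoning_content"}
        for message in history
    ]
    cleaned.append({"role": "user", "content": user_input})
    return cleaned


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "What is the square root of 144?"},
        {
            "role": "assistant",
            "content": "The square root of 144 is 12.",
            "reasoning_content": "12 * 12 = 144, so the root is 12.",
        },
    ]
    for message in next_turn_messages(history, "What about 169?"):
        print(message)
```

The function returns new dicts, so the stored history keeps its `reasoning_content` for logging or display.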

Tool calls in Thinking Mode

When Thinking Mode is combined with tool calls:

The model may perform multiple rounds of reasoning and tool calls before producing a final answer.
While a tool-call round is in progress, you must pass the model's `reasoning_content` back to the API so that it can continue its reasoning chain. This is the one exception to the multi-turn rule of omitting `reasoning_content`.
When the user starts a new question, remove all previous `reasoning_content` from the conversation.
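
The two cases can be kept apart with two small helpers. This is a sketch under stated assumptions: the helper names are hypothetical, and the shapes of the assistant `tool_calls` entry and the `tool` message in the usage example follow the common OpenAI-compatible layout, which this page does not specify. Check the tool call documentation for the exact format.

```python
def without_reasoning(message: dict) -> dict:
    """Copy a message, leaving out reasoning_content."""
    return {key: value for key, value in message.items() if key != "reasoning_content"}


def continue_tool_round(messages: list[dict], assistant_message: dict,
                        tool_messages: list[dict]) -> list[dict]:
    """Same question, next round: keep reasoning_content on the assistant message."""
    return [*messages, dict(assistant_message), *tool_messages]


def start_new_question(messages: list[dict], question: str) -> list[dict]:
    """New question: strip reasoning_content from every earlier message."""
    cleaned = [without_reasoning(message) for message in messages]
    return [*cleaned, {"role": "user", "content": question}]


if __name__ == "__main__":
    history = [{"role": "user", "content": "What is the weather in Paris?"}]
    # Assumed message shapes, for illustration only.
    assistant = {
        "role": "assistant",
        "content": "",
        "reasoning_content": "I need the forecast before answering.",
        "tool_calls": [{"id": "call_1"}],
    }
    tool_result = {"role": "tool", "tool_call_id": "call_1", "content": "18 C, cloudy"}

    in_round = continue_tool_round(history, assistant, [tool_result])
    fresh = start_new_question(in_round, "And in Rome?")
    print(len(in_round), len(fresh))
```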

See Tool calls for detailed tool call documentation.

Temperature recommendations

In Thinking Mode, `temperature` has no effect (see Parameter restrictions). For models without Thinking Mode, the following values are recommended:

Scenario	Recommended temperature
Programming / Math	0.0
Data cleaning / Analysis	1.0
General conversation	1.3
Translation	1.3
Creative writing / Poetry	1.5
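
A client can encode the table as a lookup and skip the parameter entirely in Thinking Mode. A minimal sketch: the scenario keys and the `pick_temperature` helper are hypothetical names chosen for illustration, while the values are taken from the table above.

```python
from typing import Optional

# Values copied from the recommendations table; the keys are made-up labels.
RECOMMENDED_TEMPERATURE = {
    "programming_math": 0.0,
    "data_analysis": 1.0,
    "general_conversation": 1.3,
    "translation": 1.3,
    "creative_writing": 1.5,
}


def pick_temperature(scenario: str, thinking_mode: bool = False) -> Optional[float]:
    """Return the recommended temperature, or None when it would have no effect."""
    if thinking_mode:
        # temperature is not applicable in Thinking Mode, so omit it.
        return None
    return RECOMMENDED_TEMPERATURE[scenario]


if __name__ == "__main__":
    print(pick_temperature("translation"))
    print(pick_temperature("programming_math", thinking_mode=True))
```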