Thinking Mode

Thinking Mode (also called extended chain-of-thought) lets a reasoning model work through a problem step by step before it produces a final answer. It suits complex problem-solving, mathematical reasoning, logic-heavy tasks, and multi-step analysis.

Supported models

Thinking Mode is available on reasoning models only:

Model                          Auto-enabled     Manual enable
deepseek-r1                    Yes (default)    Yes
deepseek-r1-0528               Yes (default)    Yes
qwq-32b                        No               Yes
beijing-unicom-qwen3.5-397b    No               Yes

How to enable

Method 1: Use a reasoning model (auto-enabled)

Set the model parameter to a reasoning model on which Thinking Mode is auto-enabled (deepseek-r1 or deepseek-r1-0528). No other parameter is needed:

{
  "model": "deepseek-r1",
  "messages": [{ "role": "user", "content": "Solve this math problem..." }]
}

Method 2: Explicit enable with thinking parameter

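Set the thinking parameter to { "type": "enabled" } in the request body. This is how Thinking Mode is turned on for models where it is not auto-enabled (qwq-32b and beijing-unicom-qwen3.5-397b); it is also accepted on the auto-enabled models: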

{
  "model": "deepseek-r1",
  "messages": [{ "role": "user", "content": "Analyze this logic..." }],
  "thinking": { "type": "enabled" }
}
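
The two methods can be combined in a small request builder. The following is a minimal sketch, not an official client: build_request and the two model sets are hypothetical names, with the sets copied from the supported-models table. It only constructs the JSON body and leaves sending it to whatever HTTP client you use.

```python
import json

# Copied from the supported-models table; update if the table changes.
AUTO_ENABLED_MODELS = {"deepseek-r1", "deepseek-r1-0528"}
MANUAL_ONLY_MODELS = {"qwq-32b", "beijing-unicom-qwen3.5-397b"}


def build_request(model: str, prompt: str, force_thinking: bool = False) -> dict:
    """Build a chat request body (hypothetical helper, not part of the API)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Method 2: add the explicit switch when the caller asks for it, or when
    # the model does not enable Thinking Mode on its own.
    if force_thinking or model in MANUAL_ONLY_MODELS:
        body["thinking"] = {"type": "enabled"}
    return body


if __name__ == "__main__":
    # Method 1: auto-enabled model, no thinking parameter needed.
    print(json.dumps(build_request("deepseek-r1", "Solve this math problem...")))
    # Method 2: manual-only model, thinking parameter added.
    print(json.dumps(build_request("qwq-32b", "Analyze this logic...")))
```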

Response format

When Thinking Mode is enabled, the response includes both the reasoning process and the final answer:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The answer is 42.",
        "reasoning_content": "First, let me break down the problem...\nStep 1: ...\nStep 2: ...\n..."
      }
    }
  ]
}

Field                Description
content              The final answer provided to the user.
reasoning_content    The extended chain-of-thought process. This is the model's internal reasoning and may include self-correction, verification, and multi-step analysis.
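
As an illustration of reading the two fields, here is a short sketch that splits a parsed response into the final answer and the reasoning. ThinkingReply and split_reply are hypothetical names; the response shape is the one shown above, and reasoning_content is treated as optional on the assumption that it is absent when Thinking Mode is off.

```python
from typing import NamedTuple, Optional


class ThinkingReply(NamedTuple):
    answer: str               # value of "content"
    reasoning: Optional[str]  # value of "reasoning_content", if present


def split_reply(response: dict) -> ThinkingReply:
    """Extract the final answer and the reasoning from the first choice."""
    message = response["choices"][0]["message"]
    return ThinkingReply(
        answer=message.get("content") or "",
        reasoning=message.get("reasoning_content"),
    )


if __name__ == "__main__":
    sample = {
        "choices": [
            {
                "message": {
                    "role": "assistant",
                    "content": "The answer is 42.",
                    "reasoning_content": "First, let me break down the problem...",
                }
            }
        ]
    }
    reply = split_reply(sample)
    print("Answer:", reply.answer)
    print("Reasoning:", reply.reasoning)
```

Showing only the answer to end users and logging the reasoning separately is one reasonable way to use this split.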

Parameter restrictions

When using Thinking Mode, the following parameters have special behavior:

Parameter            Behavior
max_tokens           Supported. Default 32K, max 64K for reasoning models.
temperature          Not applicable. Setting it has no effect.
top_p                Not applicable.
presence_penalty     Not applicable.
frequency_penalty    Not applicable.
logprobs             Not supported (returns 400 error).
top_logprobs         Not supported.
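
A client can apply these rules before sending a request. The sketch below is one possible approach, not documented behavior: prepare_thinking_params is a hypothetical helper, raising locally for both unsupported parameters is a client-side choice, and reading "64K" as 64 * 1024 tokens is an assumption.

```python
# Parameters the table lists as having no effect in Thinking Mode.
IGNORED_PARAMS = ("temperature", "top_p", "presence_penalty", "frequency_penalty")
# Parameters the table lists as not supported.
UNSUPPORTED_PARAMS = ("logprobs", "top_logprobs")
# Assumption: "64K" in the table means 64 * 1024 tokens.
MAX_TOKENS_LIMIT = 64 * 1024


def prepare_thinking_params(params: dict) -> dict:
    """Return a copy of params that is safe to send with Thinking Mode on."""
    for name in UNSUPPORTED_PARAMS:
        if name in params:
            # Fail early instead of waiting for an error from the API.
            raise ValueError(f"{name} is not supported in Thinking Mode")
    # Drop parameters that would be ignored anyway.
    cleaned = {k: v for k, v in params.items() if k not in IGNORED_PARAMS}
    # Cap max_tokens at the documented maximum; leave it unset otherwise
    # so that the server-side default applies.
    if "max_tokens" in cleaned:
        cleaned["max_tokens"] = min(cleaned["max_tokens"], MAX_TOKENS_LIMIT)
    return cleaned


if __name__ == "__main__":
    print(prepare_thinking_params({"temperature": 0.7, "max_tokens": 100_000}))
```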

Multi-turn conversations with Thinking Mode

When using Thinking Mode in multi-turn conversations:

  1. Each turn returns both reasoning_content and content.
  2. Do NOT include previous turns' reasoning_content in the next turn's messages. Include only the content (final answers). This saves bandwidth and prevents the model from re-processing its own reasoning.

Example:

{
  "model": "deepseek-r1",
  "messages": [
    { "role": "user", "content": "What is the square root of 144?" },
    { "role": "assistant", "content": "The square root of 144 is 12." },
    { "role": "user", "content": "What about 169?" }
  ]
}

Note: The assistant's previous response only includes content, not reasoning_content.
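
One way to follow this rule is to strip reasoning_content from stored messages before building the next request. This is a minimal sketch with hypothetical helper names (strip_reasoning, next_turn); it assumes the history is kept as a list of message dictionaries in the shape shown above.

```python
from typing import Dict, List

Message = Dict[str, object]


def strip_reasoning(messages: List[Message]) -> List[Message]:
    """Return copies of the messages without their reasoning_content field."""
    return [
        {key: value for key, value in msg.items() if key != "reasoning_content"}
        for msg in messages
    ]


def next_turn(history: List[Message], user_text: str) -> List[Message]:
    """Build the messages list for the next request."""
    return strip_reasoning(history) + [{"role": "user", "content": user_text}]


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "What is the square root of 144?"},
        {
            "role": "assistant",
            "content": "The square root of 144 is 12.",
            "reasoning_content": "12 * 12 = 144, so the root is 12.",
        },
    ]
    for message in next_turn(history, "What about 169?"):
        print(message)
```

The helpers return new lists and leave the stored history untouched, so the reasoning stays available for logging.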

Tool calls in Thinking Mode

When Thinking Mode is combined with tool calls:

  1. The model may perform multiple rounds of reasoning + tool calls before producing a final answer.
  2. During tool calls, you must pass the reasoning_content back to the API to let the model continue its reasoning chain.
  3. When the user starts a new question, clear previous reasoning_content from the conversation.
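
The history handling described in points 2 and 3 can be sketched as follows. The helper names are hypothetical, and the tool_calls field, the tool role, and tool_call_id are assumptions based on the common OpenAI-style message format; check the Tool calls documentation for the actual shapes.

```python
from typing import Dict, List

Message = Dict[str, object]


def record_tool_round(
    history: List[Message], assistant_msg: Message, tool_results: List[Message]
) -> List[Message]:
    """Append an in-progress tool-call round, keeping reasoning_content intact."""
    return history + [dict(assistant_msg)] + [dict(result) for result in tool_results]


def start_new_question(history: List[Message], user_text: str) -> List[Message]:
    """Clear all earlier reasoning_content, then append the new user message."""
    cleaned = [
        {key: value for key, value in msg.items() if key != "reasoning_content"}
        for msg in history
    ]
    return cleaned + [{"role": "user", "content": user_text}]


if __name__ == "__main__":
    history = [{"role": "user", "content": "What is the weather in Paris?"}]
    assistant_msg = {
        "role": "assistant",
        "content": "",
        "reasoning_content": "I need the weather tool for Paris.",
        "tool_calls": [{"id": "call_1"}],  # assumed field name and shape
    }
    tool_results = [{"role": "tool", "tool_call_id": "call_1", "content": "18 C"}]

    # Same question, tool call in progress: reasoning_content is sent back.
    in_progress = record_tool_round(history, assistant_msg, tool_results)
    # New question: earlier reasoning_content is cleared.
    fresh = start_new_question(in_progress, "And in Rome?")
    print(len(in_progress), len(fresh))
```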

See Tool calls for detailed tool call documentation.

Temperature recommendations

Reasoning models in Thinking Mode ignore temperature (see Parameter restrictions), so the values below do not apply to them. For other models:

Scenario                    Recommended temperature
Programming / Math          0.0
Data cleaning / Analysis    1.0
General conversation        1.3
Translation                 1.3
Creative writing / Poetry   1.5
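
These recommendations can be kept in a lookup table in client code. The sketch below is illustrative only: the scenario keys and pick_temperature are hypothetical names, and returning None for Thinking Mode reflects the statement that temperature has no effect there.

```python
from typing import Optional

# Values copied from the table above; the keys are made-up identifiers.
RECOMMENDED_TEMPERATURE = {
    "programming_math": 0.0,
    "data_analysis": 1.0,
    "general_conversation": 1.3,
    "translation": 1.3,
    "creative_writing": 1.5,
}


def pick_temperature(scenario: str, thinking_mode: bool = False) -> Optional[float]:
    """Return the recommended temperature, or None when it should be omitted."""
    if thinking_mode:
        # Temperature has no effect in Thinking Mode, so leave it out.
        return None
    return RECOMMENDED_TEMPERATURE[scenario]


if __name__ == "__main__":
    print(pick_temperature("translation"))
    print(pick_temperature("translation", thinking_mode=True))
```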

See also