Thinking Mode
Thinking Mode (also called extended chain-of-thought) enables reasoning models to perform deep analysis before producing a final answer. It is ideal for complex problem-solving, mathematical reasoning, logic-heavy tasks, and multi-step analysis.
Supported models
Thinking Mode is available on reasoning models only:
| Model | Auto-enabled | Manual enable |
|---|---|---|
| deepseek-r1 | Yes (default) | Yes |
| deepseek-r1-0528 | Yes (default) | Yes |
| qwq-32b | No | Yes |
| beijing-unicom-qwen3.5-397b | No | Yes |
How to enable
Method 1: Use a model with Thinking Mode enabled by default
Set the model parameter to a model that enables Thinking Mode by default (deepseek-r1 or deepseek-r1-0528). No extra parameter is needed:
```json
{
  "model": "deepseek-r1",
  "messages": [{ "role": "user", "content": "Solve this math problem..." }]
}
```
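Method 2: Enable explicitly with the thinking parameter
Set the thinking parameter to enable Thinking Mode explicitly. This is required for models that do not enable it by default (qwq-32b, beijing-unicom-qwen3.5-397b) and is also accepted by the auto-enabled models: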
```json
{
  "model": "deepseek-r1",
  "messages": [{ "role": "user", "content": "Analyze this logic..." }],
  "thinking": { "type": "enabled" }
}
```
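The choice between the two methods can be captured in a small request builder. This is a minimal sketch, assuming an OpenAI-compatible request body as in the examples; the helper `build_thinking_request` and the `AUTO_ENABLED` / `THINKING_MODELS` sets are hypothetical names that simply mirror the Supported models table. It adds the thinking parameter only for models that do not enable Thinking Mode by default.

```python
import json
from typing import List

# Mirrors the "Auto-enabled" column of the Supported models table.
AUTO_ENABLED = {"deepseek-r1", "deepseek-r1-0528"}
# Every model in the table accepts explicit enabling ("Manual enable" column).
THINKING_MODELS = AUTO_ENABLED | {"qwq-32b", "beijing-unicom-qwen3.5-397b"}


def build_thinking_request(model: str, messages: List[dict]) -> dict:
    """Build a request body with Thinking Mode on (hypothetical helper)."""
    if model not in THINKING_MODELS:
        raise ValueError(f"{model} does not support Thinking Mode")
    body = {"model": model, "messages": messages}
    # Method 1: auto-enabled models need nothing extra.
    # Method 2: other reasoning models need the explicit thinking parameter.
    if model not in AUTO_ENABLED:
        body["thinking"] = {"type": "enabled"}
    return body


if __name__ == "__main__":
    msgs = [{"role": "user", "content": "Analyze this logic..."}]
    print(json.dumps(build_thinking_request("qwq-32b", msgs), indent=2))
```

Sending the explicit parameter to auto-enabled models would also work per the table; the sketch omits it there only to keep the body minimal.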
Response format
When Thinking Mode is enabled, the response includes both the reasoning process and the final answer:
```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The answer is 42.",
        "reasoning_content": "First, let me break down the problem...\nStep 1: ...\nStep 2: ...\n..."
      }
    }
  ]
}
```
| Field | Description |
|---|---|
| content | The final answer provided to the user. |
| reasoning_content | The extended chain-of-thought process. This is the model's internal reasoning and may include self-correction, verification, and multi-step analysis. |
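A client typically separates the two fields, showing content to the user and logging or discarding reasoning_content. The sketch below assumes the response has already been parsed into a dict with the shape shown in the example; `split_response` is a hypothetical name, and treating a missing reasoning_content as None is an assumption, since this page does not say whether the field is omitted or empty when Thinking Mode is off.

```python
from typing import Optional, Tuple


def split_response(response: dict) -> Tuple[str, Optional[str]]:
    """Return (content, reasoning_content) from the first choice.

    Assumption: reasoning_content may be absent, in which case None is returned.
    """
    message = response["choices"][0]["message"]
    return message.get("content") or "", message.get("reasoning_content")


example = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "The answer is 42.",
                "reasoning_content": "First, let me break down the problem...",
            }
        }
    ]
}

answer, reasoning = split_response(example)
print("Final answer:", answer)
print("Reasoning length:", len(reasoning or ""), "characters")
```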
Parameter restrictions
When using Thinking Mode, the following parameters have special behavior:
| Parameter | Behavior |
|---|---|
| max_tokens | Supported. Default 32K tokens; maximum 64K tokens for reasoning models. |
| temperature | Not applicable; setting it has no effect. |
| top_p | Not applicable. |
| presence_penalty | Not applicable. |
| frequency_penalty | Not applicable. |
| logprobs | Not supported (returns a 400 error). |
| top_logprobs | Not supported. |
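These restrictions can be enforced on the client before a request is sent, so unsupported parameters fail early instead of producing a 400 error. This is a minimal sketch; `sanitize_thinking_params` and the constants are hypothetical, and reading "64K" as 64 * 1024 tokens is an assumption (the provider may mean 64,000).

```python
# Listed as "Not applicable": accepted but without effect in Thinking Mode.
IGNORED_IN_THINKING = {"temperature", "top_p", "presence_penalty", "frequency_penalty"}
# Listed as "Not supported": logprobs is documented to return a 400 error.
UNSUPPORTED_IN_THINKING = {"logprobs", "top_logprobs"}
# Assumption: "64K" interpreted as 64 * 1024 tokens; adjust if the limit is 64,000.
MAX_TOKENS_LIMIT = 64 * 1024


def sanitize_thinking_params(params: dict) -> dict:
    """Return a copy of params suitable for a Thinking Mode request (hypothetical helper)."""
    bad = UNSUPPORTED_IN_THINKING.intersection(params)
    if bad:
        raise ValueError(f"Not supported in Thinking Mode: {sorted(bad)}")
    max_tokens = params.get("max_tokens")
    if max_tokens is not None and max_tokens > MAX_TOKENS_LIMIT:
        raise ValueError(f"max_tokens {max_tokens} exceeds {MAX_TOKENS_LIMIT}")
    # Drop parameters that would be silently ignored.
    return {k: v for k, v in params.items() if k not in IGNORED_IN_THINKING}
```

Dropping the ignored parameters is a design choice rather than a requirement: the table says they have no effect, so sending them is harmless, but removing them keeps the request honest about what actually applies.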
Multi-turn conversations with Thinking Mode
When using Thinking Mode in multi-turn conversations:
- Each turn returns both reasoning_content and content.
- Do NOT include previous turns' reasoning_content in the next turn's messages; include only the content (the final answers).
- This saves bandwidth and prevents the model from re-processing its own reasoning.
Example:
```json
{
  "model": "deepseek-r1",
  "messages": [
    { "role": "user", "content": "What is the square root of 144?" },
    { "role": "assistant", "content": "The square root of 144 is 12." },
    { "role": "user", "content": "What about 169?" }
  ]
}
```
Note: The assistant's previous response only includes content, not reasoning_content.
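The history in the example can be built by copying only the content of each assistant reply. A minimal sketch, assuming parsed response dicts in the shape shown under Response format; `append_assistant_turn` is a hypothetical helper name.

```python
from typing import List


def append_assistant_turn(history: List[dict], response: dict) -> List[dict]:
    """Append the assistant reply, keeping content and dropping reasoning_content."""
    message = response["choices"][0]["message"]
    # Only the final answer is sent back on the next user turn.
    history.append({"role": "assistant", "content": message["content"]})
    return history


history = [{"role": "user", "content": "What is the square root of 144?"}]
reply = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "The square root of 144 is 12.",
                "reasoning_content": "12 * 12 = 144, so the root is 12.",
            }
        }
    ]
}
append_assistant_turn(history, reply)
history.append({"role": "user", "content": "What about 169?"})
```

After these calls, history matches the messages array in the request example.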
Tool calls in Thinking Mode
When Thinking Mode is combined with tool calls:
- The model may perform multiple rounds of reasoning and tool calls before producing a final answer.
- During tool calls, you must pass the reasoning_content back to the API so the model can continue its reasoning chain.
- When the user starts a new question, clear previous reasoning_content from the conversation.
See Tool calls for detailed tool call documentation.
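The two rules (keep reasoning_content within one question, clear it when a new question starts) can be sketched as a loop. This is a sketch under assumptions: the assistant tool_calls entries and the "tool" role message follow the OpenAI-compatible chat format, which this page does not show; `call_api` is a hypothetical callable you supply that sends the messages and returns the parsed response; `run_tool_loop` and `start_new_question` are hypothetical names.

```python
import json
from typing import Callable, Dict, List


def run_tool_loop(call_api: Callable[[List[dict]], dict],
                  tools: Dict[str, Callable[..., object]],
                  messages: List[dict],
                  max_rounds: int = 8) -> str:
    """Run reasoning + tool-call rounds until the model returns a final answer."""
    for _ in range(max_rounds):
        message = call_api(messages)["choices"][0]["message"]
        tool_calls = message.get("tool_calls") or []
        entry = {"role": "assistant", "content": message.get("content")}
        # Within one question, pass reasoning_content back so the chain continues.
        if message.get("reasoning_content") is not None:
            entry["reasoning_content"] = message["reasoning_content"]
        if tool_calls:
            entry["tool_calls"] = tool_calls
        messages.append(entry)
        if not tool_calls:
            return message.get("content") or ""
        # Assumed OpenAI-compatible shape: function name plus JSON-encoded arguments.
        for call in tool_calls:
            fn = call["function"]
            result = tools[fn["name"]](**json.loads(fn["arguments"] or "{}"))
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": json.dumps(result)})
    raise RuntimeError("no final answer within max_rounds")


def start_new_question(messages: List[dict], question: str) -> List[dict]:
    """Clear reasoning_content from earlier turns, then add the new user question."""
    cleaned = [{k: v for k, v in m.items() if k != "reasoning_content"} for m in messages]
    cleaned.append({"role": "user", "content": question})
    return cleaned
```

Injecting `call_api` keeps the sketch independent of any particular HTTP client or SDK, and makes the loop testable with a stub.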
Temperature recommendations
In Thinking Mode, temperature has no effect (see Parameter restrictions). For other models, the following values are recommended:
| Scenario | Recommended temperature |
|---|---|
| Programming / Math | 0.0 |
| Data cleaning / Analysis | 1.0 |
| General conversation | 1.3 |
| Translation | 1.3 |
| Creative writing / Poetry | 1.5 |
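A client can look up the temperature by scenario and skip it entirely for Thinking Mode requests. A minimal sketch: the scenario keys, the model name "your-chat-model", and `apply_temperature` are hypothetical; the values are taken from the table.

```python
# Values from the Temperature recommendations table (non-Thinking-Mode requests).
RECOMMENDED_TEMPERATURE = {
    "programming": 0.0,
    "math": 0.0,
    "data_cleaning": 1.0,
    "analysis": 1.0,
    "conversation": 1.3,
    "translation": 1.3,
    "creative_writing": 1.5,
    "poetry": 1.5,
}


def apply_temperature(body: dict, scenario: str, thinking: bool) -> dict:
    """Set the recommended temperature, or remove it in Thinking Mode where it has no effect."""
    if thinking:
        body.pop("temperature", None)
        return body
    body["temperature"] = RECOMMENDED_TEMPERATURE[scenario]
    return body


request = apply_temperature({"model": "your-chat-model"}, "translation", thinking=False)
```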