Multi-turn conversations
Multi-turn conversations maintain context across multiple exchanges. The model uses the conversation history to provide coherent and contextually relevant responses.
How it works
Build the messages array by appending each turn's messages:
{
  "model": "deepseek-v3",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is Python?" },
    {
      "role": "assistant",
      "content": "Python is a high-level programming language..."
    },
    { "role": "user", "content": "What are its main use cases?" }
  ]
}
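The model uses the full conversation history to understand context. Here the final question refers to "its", which the model can resolve to Python only because the earlier turns are included in the request.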
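The request body above can be assembled programmatically. The following is a minimal sketch, not part of any SDK: `build_request` is a hypothetical helper name, and it assumes the history is kept as a plain list of dicts.

```python
import json


def build_request(model, history, user_input):
    """Return a request body: the existing history plus the new user turn."""
    messages = list(history)  # copy so the caller's history is not mutated
    messages.append({"role": "user", "content": user_input})
    return {"model": model, "messages": messages}


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language..."},
]

body = build_request("deepseek-v3", history, "What are its main use cases?")
print(json.dumps(body, indent=2))
```

Copying the list keeps the stored history unchanged until the reply arrives, which makes it easier to discard a turn that fails.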
Message roles
| Role | Description |
|---|---|
| `system` | Optional. Sets the behavior and tone. Usually placed at the beginning. |
| `user` | The user's input or question. |
| `assistant` | The model's previous response. |
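A quick local check can catch a misspelled role before a request is sent. This sketch covers only the three roles listed in the table; `check_roles` is a hypothetical helper, and the API may accept roles that this page does not describe.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}  # only the roles from the table above


def check_roles(messages):
    """Return a list of human-readable notes; an empty list means nothing was flagged."""
    notes = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            notes.append(f"message {i}: unknown role {role!r}")
        # The system message is usually first; flag it elsewhere as a note, not an error.
        if role == "system" and i != 0:
            notes.append(f"message {i}: system message is not at the beginning")
    return notes
```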
Building a multi-turn conversation
Example: Code review
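# Assumes `client` is an OpenAI-compatible SDK client created elsewhere,
# for example: from openai import OpenAI; client = OpenAI(...)
messages = [
    {"role": "system", "content": "You are a senior code reviewer."},
]

# Turn 1: send the user's request, then record the assistant's reply
messages.append({"role": "user", "content": "Review this Python function:\n\ndef add(a, b):\n    return a + b"})
response = client.chat.completions.create(model="deepseek-v3", messages=messages)
messages.append({"role": "assistant", "content": response.choices[0].message.content})

# Turn 2: the follow-up is answered with the full history in view
messages.append({"role": "user", "content": "How would you add type hints?"})
response = client.chat.completions.create(model="deepseek-v3", messages=messages)
messages.append({"role": "assistant", "content": response.choices[0].message.content})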
Each turn appends two messages to the array: the user's input and the assistant's reply. Because the whole array is sent with every request, the model sees the full history.
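The append-send-append pattern can be wrapped in a small class so that each turn is a single call. This is a sketch under stated assumptions: `Conversation` and its `send` parameter are hypothetical names, and `send` is any callable that takes the messages list and returns the assistant's reply as a string.

```python
class Conversation:
    """Keeps the messages list and appends both sides of every turn."""

    def __init__(self, send, system_prompt=None):
        # `send` is a hypothetical callable: messages list in, reply text out.
        self.send = send
        self.messages = []
        if system_prompt is not None:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, user_input):
        self.messages.append({"role": "user", "content": user_input})
        try:
            reply = self.send(self.messages)
        except Exception:
            self.messages.pop()  # leave the history unchanged if the request failed
            raise
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

With the `client` used in the example above, `send` could be `lambda msgs: client.chat.completions.create(model="deepseek-v3", messages=msgs).choices[0].message.content`. Passing the transport in as a callable also lets the class be exercised without network access.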
System prompt best practices
- Be specific: Instead of "Be helpful", use "You are a Python expert. Explain code clearly with examples."
- Set constraints: "Keep answers under 200 words" or "Use bullet points for lists."
- Define the format: "Always respond in JSON when asked for structured data."
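The three practices can be combined into one system message. The sketch below is illustrative only: `build_system_prompt` is a hypothetical helper, and the prompt strings are taken from the bullets above.

```python
def build_system_prompt(persona, constraints=(), output_format=None):
    """Join a specific persona, constraints, and a format rule into one system message."""
    parts = [persona]
    parts.extend(constraints)
    if output_format:
        parts.append(output_format)
    return {"role": "system", "content": " ".join(parts)}


system_message = build_system_prompt(
    "You are a Python expert. Explain code clearly with examples.",
    constraints=["Keep answers under 200 words.", "Use bullet points for lists."],
    output_format="Always respond in JSON when asked for structured data.",
)
```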
Token budget management
Every request resends the full history, so token usage grows with each turn. Strategies to consider:
| Strategy | Description |
|---|---|
| Truncate old messages | Remove the oldest user/assistant pairs when approaching context limits. |
| Summarize history | Replace early turns with a summary. |
| Use system prompt for persistent context | Move stable instructions (e.g., "You are a coding assistant") to the system message. |
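The first two strategies might look like the sketch below. Several things here are assumptions rather than documented behavior: the helper names are hypothetical, the token count is a rough estimate of four characters per token, the `summarize` callable is supplied by the caller (for example, a separate model request), and the summary is stored as an extra system message.

```python
def estimate_tokens(messages):
    # Rough assumption: about 4 characters per token. Prefer the model's real
    # tokenizer or the usage figures returned by the API when accuracy matters.
    return sum(len(m["content"]) for m in messages) // 4


def truncate_history(messages, max_tokens, count=estimate_tokens):
    """Drop the oldest user/assistant pairs until the estimate fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    # Stop once at most one pair (or a single pending user message) remains.
    while len(turns) > 2 and count(system + turns) > max_tokens:
        turns = turns[2:]  # remove one user/assistant pair from the front
    return system + turns


def summarize_early_turns(messages, keep_last, summarize):
    """Replace all but the last `keep_last` non-system messages with one summary."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    split = len(turns) - keep_last
    if split <= 0:
        return list(messages)  # nothing old enough to summarize
    early, recent = turns[:split], turns[split:]
    summary = {
        "role": "system",  # assumption: the summary is carried as a system message
        "content": "Summary of earlier conversation: " + summarize(early),
    }
    return system + [summary] + recent
```

Both functions return a new list and keep system messages in place, so stable instructions survive either strategy.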
Thinking Mode in multi-turn
When using Thinking Mode (reasoning models):
- Do NOT include `reasoning_content` from previous turns in the next turn's messages.
- Only include the `content` (final answer) from previous assistant responses.
- This saves bandwidth and prevents the model from re-processing its own reasoning.
See Thinking Mode for details.
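A small filter can enforce this rule before each request. The sketch assumes assistant messages are held as plain dicts with `content` and, for reasoning models, `reasoning_content` keys; both function names are hypothetical. With SDK response objects, read the `content` attribute instead.

```python
def to_history_message(assistant_message):
    """Keep only the role and the final answer from an assistant message dict."""
    return {"role": "assistant", "content": assistant_message["content"]}


def strip_reasoning(messages):
    """Return a copy of the history with any reasoning_content fields removed."""
    return [
        {key: value for key, value in m.items() if key != "reasoning_content"}
        for m in messages
    ]
```

`to_history_message` suits the moment a reply is recorded, while `strip_reasoning` is a safety net for a history that may already contain the field.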
Error handling in multi-turn
If a turn fails:
- Log the `request_id` from the error response.
- Retry only the failed turn, keeping the messages array as it was; earlier turns do not need to be replayed.
- If retry fails, consider starting a new conversation with a summary of the previous context.
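The logging and retry steps could be sketched as follows. The shape of the error is an assumption: this code looks for a `request_id` attribute on the raised exception, which may not match the actual SDK, and the exponential backoff is an added choice rather than something this page prescribes. `send_with_retry` and `send` are hypothetical names.

```python
import logging
import time

logger = logging.getLogger("chat")


def send_with_retry(send, messages, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send(messages), retrying the same turn on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(messages)
        except Exception as exc:  # narrow this to the SDK's error types in real code
            # Assumption: the error exposes the request ID as an attribute.
            request_id = getattr(exc, "request_id", None)
            logger.warning(
                "turn failed (attempt %d/%d, request_id=%s): %s",
                attempt, max_attempts, request_id, exc,
            )
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # wait 1s, 2s, 4s, ...
```

When the final attempt also fails, the exception propagates to the caller, which can then decide whether to start a new conversation seeded with a summary.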