Multi-turn conversations

Multi-turn conversations maintain context across multiple exchanges. The model uses the conversation history to provide coherent and contextually relevant responses.

How it works

Build the messages array by appending each turn's messages:

{
  "model": "deepseek-v3",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is Python?" },
    {
      "role": "assistant",
      "content": "Python is a high-level programming language..."
    },
    { "role": "user", "content": "What are its main use cases?" }
  ]
}

Because the request carries the full history, the model can resolve "its" in the final user message to Python.

Message roles

Role | Description
system | Optional. Sets the behavior and tone. Usually placed at the beginning.
user | The user's input or question.
assistant | The model's previous response.
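As a rough illustration of the roles above, the following sketch checks a history before it is sent. The helper name validate_messages and the rule that content must be a string are assumptions made for this example, not part of the API.

```python
from typing import Dict, List

# The three roles described in the table above.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_messages(messages: List[Dict[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means no issue was found.

    Hypothetical helper: it only checks role names and that content is a string.
    """
    problems = []
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {index}: unknown role {role!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {index}: content must be a string")
    return problems


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
]
print(validate_messages(history))
```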

Building a multi-turn conversation

Example: Code review

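from openai import OpenAI

# Assumes an OpenAI-compatible SDK; set base_url and api_key for your provider.
client = OpenAI()

messages = [
    {"role": "system", "content": "You are a senior code reviewer."},
]

# Turn 1: send the question, then store the reply so turn 2 can refer to it.
messages.append({"role": "user", "content": "Review this Python function:\n\ndef add(a, b):\n    return a + b"})
response = client.chat.completions.create(model="deepseek-v3", messages=messages)
messages.append({"role": "assistant", "content": response.choices[0].message.content})

# Turn 2: the follow-up relies on the function quoted in turn 1.
messages.append({"role": "user", "content": "How would you add type hints?"})
response = client.chat.completions.create(model="deepseek-v3", messages=messages)
messages.append({"role": "assistant", "content": response.choices[0].message.content})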

Each turn appends the new messages to the array. The model sees the full history.

System prompt best practices

  1. Be specific: Instead of "Be helpful", use "You are a Python expert. Explain code clearly with examples."
  2. Set constraints: "Keep answers under 200 words" or "Use bullet points for lists."
  3. Define the format: "Always respond in JSON when asked for structured data."
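One way to apply the three practices together is to assemble the system prompt from separate parts. The sketch below is only an illustration; build_system_prompt and its parameters are hypothetical names, and the prompt text is taken from the examples above.

```python
from typing import Iterable, Optional


def build_system_prompt(
    persona: str,
    constraints: Iterable[str] = (),
    output_format: Optional[str] = None,
) -> str:
    """Join a specific persona, constraints, and a format rule into one prompt."""
    parts = [persona.strip()]
    parts.extend(constraint.strip() for constraint in constraints)
    if output_format:
        parts.append(output_format.strip())
    return " ".join(parts)


system_message = {
    "role": "system",
    "content": build_system_prompt(
        "You are a Python expert. Explain code clearly with examples.",
        constraints=["Keep answers under 200 words.", "Use bullet points for lists."],
        output_format="Always respond in JSON when asked for structured data.",
    ),
}
print(system_message["content"])
```

Keeping the parts separate makes it easier to change one constraint without rewriting the whole prompt.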

Token budget management

Each request resends the whole history, so token usage grows with every turn. Consider:

Strategy | Description
Truncate old messages | Remove the oldest user/assistant pairs when approaching context limits.
Summarize history | Replace early turns with a summary.
Use system prompt for persistent context | Move stable instructions (e.g., "You are a coding assistant") to the system message.
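The truncation strategy could be sketched as follows. This is a minimal example under stated assumptions: estimate_tokens is a crude placeholder (about four characters per token) rather than a real tokenizer, truncate_history is a hypothetical helper, and the history is assumed to alternate user and assistant messages after the system message.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def estimate_tokens(message: Message) -> int:
    """Placeholder estimate, not a real tokenizer: roughly 4 characters per token."""
    return max(1, len(message["content"]) // 4)


def truncate_history(
    messages: List[Message],
    max_tokens: int,
    count: Callable[[Message], int] = estimate_tokens,
) -> List[Message]:
    """Drop the oldest user/assistant pairs until the estimate fits the budget.

    System messages are always kept and are moved to the front of the result.
    """
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    def total() -> int:
        return sum(count(m) for m in system + turns)

    # Stop at three or more turns so the most recent message always survives.
    while len(turns) > 2 and total() > max_tokens:
        del turns[:2]  # remove the oldest user/assistant pair
    return system + turns
```

If the budget is still exceeded after truncation, the remaining messages are returned as they are; a real application would need to decide how to handle that case, for example by summarizing.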

Thinking Mode in multi-turn

When using Thinking Mode (reasoning models):

  • Do NOT include reasoning_content from previous turns in the next turn's messages.
  • Only include the content (final answer) from previous assistant responses.
  • This saves bandwidth and prevents the model from re-processing its own reasoning.

See Thinking Mode for details.
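A small sketch of the rule above: before sending the next turn, drop reasoning_content from stored assistant messages and keep only content. The helper name strip_reasoning and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def strip_reasoning(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a copy of the history without reasoning_content on assistant messages."""
    cleaned = []
    for message in messages:
        if message.get("role") == "assistant":
            # Keep every field except the reasoning from the previous turn.
            message = {k: v for k, v in message.items() if k != "reasoning_content"}
        cleaned.append(message)
    return cleaned


stored = [
    {"role": "user", "content": "What is 2 + 2?"},
    {
        "role": "assistant",
        "content": "4",
        "reasoning_content": "Adding 2 and 2 gives 4.",  # illustrative value
    },
    {"role": "user", "content": "And doubled?"},
]
next_turn_messages = strip_reasoning(stored)
print(next_turn_messages)
```

The function returns new dictionaries for assistant messages, so the stored history keeps its reasoning for logging if you need it.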

Error handling in multi-turn

If a turn fails:

  1. Log the request_id from the error response.
  2. Retry only the failed request, resending the same messages array. Earlier turns do not need to be re-run.
  3. If retry fails, consider starting a new conversation with a summary of the previous context.
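The first two steps might look like the sketch below. It retries only the failed call; earlier turns are not re-run. Several things here are assumptions: send stands for any callable that performs the request, the request_id is assumed to be available as an attribute on the raised error, and catching every Exception is a simplification.

```python
import time
from typing import Any, Callable, Dict, List


def send_with_retry(
    send: Callable[[List[Dict[str, str]]], Any],
    messages: List[Dict[str, str]],
    attempts: int = 3,
    delay_seconds: float = 1.0,
) -> Any:
    """Call send(messages), logging and retrying when it raises."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return send(messages)
        except Exception as error:  # simplification: narrow this in real code
            # Assumption: the error object exposes request_id; adjust to your client.
            request_id = getattr(error, "request_id", None)
            print(f"attempt {attempt} failed (request_id={request_id}): {error}")
            if attempt == attempts:
                raise  # out of attempts; the caller decides what to do next
            time.sleep(delay_seconds)
```

When the final attempt also fails, the exception propagates to the caller, which is the point where step 3 (starting a new conversation from a summary) would apply.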

See also