Context caching

Context caching (KV cache) automatically caches prompt prefixes across multiple requests. When subsequent requests share the same prefix, the cached context is reused, reducing latency and cost.

How it works

Default: on — no code changes required.
Automatic — the cache is managed by the gateway. You do not need to enable or configure it.
Prefix matching — if two requests share a common prompt prefix, the shared portion is served from cache.

Pricing

Cache hits are billed at a lower rate than cache misses:

Item	Rate
Cache hit	$0.028 / 1M tokens (10x cheaper)
Cache miss	$0.28 / 1M tokens (standard input rate)

The exact rates for each model are listed in the Model catalog.

Monitoring cache effectiveness

The response usage object includes cache hit/miss statistics:

{
  "usage": {
    "prompt_tokens": 1000,
    "completion_tokens": 200,
    "total_tokens": 1200,
    "prompt_cache_hit_tokens": 800,
    "prompt_cache_miss_tokens": 200
  }
}

Field	Description
`prompt_cache_hit_tokens`	Number of prompt tokens served from cache (billed at lower rate).
`prompt_cache_miss_tokens`	Number of prompt tokens not served from cache (billed at standard rate).

Cache hit rate

You can calculate the cache hit rate for each request:

hit_rate = prompt_cache_hit_tokens / prompt_tokens

A high hit rate means effective caching and lower costs.

Best practices for maximizing cache hits

Use consistent prompt prefixes: If you have a system message or few-shot examples that stay the same across requests, keep them at the beginning of the prompt.
Batch similar requests: Requests with similar prefixes within a short time window are more likely to hit the cache.
Avoid randomizing the prompt: Random IDs, timestamps, or varying system messages reduce cache effectiveness.

Cache behavior

Behavior	Description
Minimum cache unit	64 tokens
Cache build time	Seconds
Cache expiration	Hours to days (automatic)
Cache invalidation	Automatic when the cache is stale

Cost example

A 10,000 token prompt with 8,000 cache hits and 2,000 cache misses:

Item	Tokens	Rate	Cost
Cache hit	8,000	$0.028/M	$0.000224
Cache miss	2,000	$0.28/M	$0.000560
Total	10,000	—	$0.000784

Without caching, the same prompt would cost $0.002800 — a 72% savings with an 80% cache hit rate.