Streaming

Stream chat completions over Server-Sent Events (SSE).

Set stream: true in the request body to receive the response as incremental deltas as they are generated, instead of a single final payload. Streaming reduces perceived latency and enables real-time UIs.

How to enable

POST /api/v1/chat/completions

{
  "model": "provider/model",
  "messages": [{"role":"user","content":"Stream this response."}],
  "stream": true
}
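As a rough illustration, the request above could be assembled in Python with only the standard library. This is a sketch under stated assumptions: BASE_URL is a placeholder host, the Bearer authorization header is an assumed auth scheme, and build_stream_request is a hypothetical helper name; none of these come from this page.

```python
import json
import urllib.request

# Placeholder host (assumption); substitute the API's real base URL.
BASE_URL = "https://api.example.com"


def build_stream_request(
    api_key: str, prompt: str, model: str = "provider/model"
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with streaming enabled."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # ask for incremental deltas instead of one final body
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
            # Assumed auth scheme; check the API's authentication docs.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen(req, timeout=...), and the response can then be read line by line as the stream arrives.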

SSE format

Events
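The response is an SSE stream. Each event is a line that begins with data: followed by a JSON chunk, and events are separated by a blank line. The stream terminates with the sentinel data: [DONE], which is not JSON and must be checked for before parsing.

data: {"id":"...","choices":[{"index":0,"delta":{"content":"Hel"}}]}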

data: {"id":"...","choices":[{"index":0,"delta":{"content":"lo"}}]}

data: [DONE]
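The event format above can be consumed with a small line-based parser. The following is a minimal sketch, not an official client: iter_content_deltas is a hypothetical helper name, and it assumes the input has already been decoded to text and split into lines.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield each delta's content string until the [DONE] sentinel is seen."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel; it is not JSON
        event = json.loads(data)
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


SAMPLE = [
    'data: {"id":"...","choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"id":"...","choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_content_deltas(SAMPLE)))  # prints "Hello"
```

Writing the parser as a generator lets a UI render each fragment as soon as it is parsed instead of waiting for the full response.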

Client-side tips

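  • Prefer a generous read (idle) timeout over a short total-request timeout; streaming responses can be long-lived.
  • A multi-byte UTF-8 character can be split across network chunks; buffer bytes and decode incrementally (most SDKs already do).
  • Reconnect at the request level, not the chunk level: if the connection drops mid-stream, reissue the whole request rather than trying to resume from the last chunk.
  • On HTTP 429 (Too Many Requests), apply exponential backoff and consider lowering concurrency.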
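Two of the tips above, safe UTF-8 handling and exponential backoff, can be sketched in Python with the standard library. The helper names and the base and cap values are illustrative assumptions, not settings recommended by this API.

```python
import codecs
import random
from typing import Iterable, Iterator


def decode_utf8_chunks(chunks: Iterable[bytes]) -> Iterator[str]:
    """Decode byte chunks to text, holding back incomplete multi-byte sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)  # returns only fully decoded characters
        if text:
            yield text
    # Flush; raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter.

    Returns a random delay in seconds between 0 and min(cap, base * 2**attempt).
    The base and cap defaults are assumed values for illustration.
    """
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

For example, decode_utf8_chunks([b"caf\xc3", b"\xa9"]) yields "caf" and then "é", even though the two bytes of "é" arrive in separate chunks. On a 429, a client could sleep for backoff_delay(attempt) before reissuing the whole request, incrementing attempt on each failure.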