Rate limits

429s, retries, and resilient clients.

Rate limits can come from lmchat or from upstream providers. Treat 429s as a normal part of production operation: back off, reduce concurrency, and use routing fallbacks.

Best practices

  • Use exponential backoff with jitter for 429 and transient 5xx errors.
  • Bound retries: infinite retries turn outages into cascading failures.
  • Cap concurrency per key, per user, and per tenant.
  • Prefer streaming for long generations to reduce the risk of timeouts.
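
The first three practices above can be sketched roughly as follows. This is a minimal illustration, not lmchat's client API: `HTTPStatusError`, `backoff_delay`, `call_with_retries`, and `ConcurrencyCap` are hypothetical names, and the defaults (0.5 s base delay, 30 s cap, 5 attempts) are assumptions to tune for your workload. The `send` argument stands in for whatever function actually issues the request.

```python
import random
import threading
import time

# Statuses treated as retryable here: 429 plus common transient 5xx codes.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


class HTTPStatusError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Exponential backoff with full jitter.

    The ceiling doubles on each attempt (base * 2**attempt) up to `cap`,
    and the actual delay is a random fraction of that ceiling.
    """
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retries(send, max_attempts=5, sleep=time.sleep, rng=random.random):
    """Call send() with a bounded number of attempts.

    Non-retryable errors and the final failed attempt are re-raised
    instead of being retried forever.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except HTTPStatusError as err:
            last_attempt = attempt == max_attempts - 1
            if err.status not in RETRYABLE_STATUSES or last_attempt:
                raise
            sleep(backoff_delay(attempt, rng=rng))


class ConcurrencyCap:
    """Hands out one bounded semaphore per key (API key, user, or tenant)."""

    def __init__(self, limit):
        self._limit = limit
        self._lock = threading.Lock()
        self._slots = {}

    def slot(self, key):
        # Create the semaphore lazily, under a lock so two threads
        # never create different semaphores for the same key.
        with self._lock:
            if key not in self._slots:
                self._slots[key] = threading.BoundedSemaphore(self._limit)
            return self._slots[key]
```

A caller might wrap each request as `with cap.slot(tenant_id): call_with_retries(send)`, so that retries for one tenant never exceed that tenant's concurrency budget. Passing `sleep` and `rng` as parameters is only a convenience for testing without real delays.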

Operational strategy

Combine with routing
If a provider is returning 429s, fall back to another provider, or sort the available providers by latency and route to the fastest one. For strict cost controls, disable fallbacks and fail fast.
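
As a rough client-side sketch of this strategy, the function below tries a primary provider, then the remaining providers in order of latency, and skips the fallbacks entirely when they are disabled. `Provider`, `RateLimitedError`, `send_with_fallback`, and the `latency_ms` field are hypothetical names chosen for illustration; they are not part of lmchat's routing configuration, and the latency figures are assumed to come from your own measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """Hypothetical provider record with a measured latency."""
    name: str
    latency_ms: float


class RateLimitedError(Exception):
    """Hypothetical error raised by send() when a provider returns 429."""


def send_with_fallback(primary, fallbacks, send, allow_fallbacks=True):
    """Try the primary provider, then fallbacks from fastest to slowest.

    With allow_fallbacks=False only the primary is tried, so a 429
    surfaces immediately (fail fast).
    """
    candidates = [primary]
    if allow_fallbacks:
        candidates += sorted(fallbacks, key=lambda p: p.latency_ms)

    last_error = None
    for provider in candidates:
        try:
            return send(provider)
        except RateLimitedError as err:
            last_error = err  # remember the failure and try the next candidate
    # Every candidate was rate limited; re-raise the most recent error.
    raise last_error
```

Only rate-limit errors trigger a fallback in this sketch; any other exception propagates from the first provider that raises it.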