Rate limits
429s, retries, and resilient clients.
Rate limits can come from lmchat or from upstream providers. Treat 429s as a normal part of production operation: back off, reduce concurrency, and use routing fallbacks.
Best practices
- Use exponential backoff with jitter for 429 and transient 5xx errors.
- Bound retries: infinite retries turn outages into cascading failures.
- Cap concurrency per key, per user, and per tenant.
- Prefer streaming for long generations: output arrives incrementally, so requests are less likely to hit client or proxy timeouts.
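The first three practices can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not lmchat client code: `HTTPStatusError`, `call_with_retries`, `backoff_delay`, and `KeyedLimiter` are hypothetical names, and the base delay, cap, and attempt count are arbitrary starting values. It uses "full jitter" (a random delay between zero and the exponential ceiling), gives up after a fixed number of attempts, and caps in-flight calls per key with a semaphore.

```python
import random
import threading
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, TypeVar

T = TypeVar("T")

# 429 plus the 5xx codes commonly treated as transient (an assumption).
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


class HTTPStatusError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a random fraction of min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retries(fn: Callable[[], T], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> T:
    """Call fn, retrying retryable statuses at most max_attempts times in total."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except HTTPStatusError as err:
            out_of_attempts = attempt == max_attempts - 1
            if err.status not in RETRYABLE_STATUSES or out_of_attempts:
                raise  # bounded: surface the error instead of looping forever
            sleep(backoff_delay(attempt, rng=rng))
    raise AssertionError("unreachable")


class KeyedLimiter:
    """Caps in-flight calls per key (an API key, user ID, or tenant ID)."""

    def __init__(self, limit: int):
        self._limit = limit
        self._lock = threading.Lock()
        self._sems: Dict[str, threading.BoundedSemaphore] = {}

    @contextmanager
    def slot(self, key: str) -> Iterator[None]:
        with self._lock:  # guard lazy creation of the per-key semaphore
            if key not in self._sems:
                self._sems[key] = threading.BoundedSemaphore(self._limit)
            sem = self._sems[key]
        with sem:  # blocks while `limit` calls for this key are in flight
            yield
```

A caller would wrap each request in both, for example `with limiter.slot(tenant_id): call_with_retries(send_request)`, where `send_request` is whatever function issues the HTTP call and raises on a non-2xx status.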
Operational strategy
Combine with routing
If a provider is returning 429s, fall back to another provider, or sort the available providers by latency and use the fastest one. For strict cost controls, disable fallbacks and fail fast.
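The sketch below shows one way this choice could look in client code. It is illustrative only and does not describe lmchat's routing API: `Provider`, `RateLimited`, `route`, and the `allow_fallbacks` and `sort_by_latency` parameters are hypothetical names, and the latency figures are assumed to come from your own measurements.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


class RateLimited(Exception):
    """Hypothetical error raised when a provider answers with HTTP 429."""


@dataclass
class Provider:
    name: str
    latency_ms: float            # assumed to be measured by the caller
    send: Callable[[str], str]   # issues the request, raises RateLimited on 429


def route(prompt: str, providers: Sequence[Provider],
          allow_fallbacks: bool = True, sort_by_latency: bool = False) -> str:
    """Try providers in order, moving to the next one on a 429."""
    if not providers:
        raise ValueError("no providers configured")
    ordered = list(providers)
    if sort_by_latency:
        ordered.sort(key=lambda p: p.latency_ms)
    # With fallbacks disabled, only the first candidate is tried (fail fast).
    candidates = ordered if allow_fallbacks else ordered[:1]
    last_error: Optional[RateLimited] = None
    for provider in candidates:
        try:
            return provider.send(prompt)
        except RateLimited as err:
            last_error = err
    assert last_error is not None  # candidates is never empty here
    raise last_error
```

With `allow_fallbacks=False`, a 429 from the preferred provider surfaces immediately, which keeps spend predictable at the cost of availability.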