Routing
Provider selection, fallbacks, and control.
Routing is what makes a “unified API” production-grade: you can prioritize cost, latency, or specific providers, and fall back to another provider when one is degraded. This section documents the intended controls and best practices.
Why routing exists
Reliability and leverage
Providers can be rate-limited, degraded, or temporarily unavailable. Routing lets you define what happens next without changing your application code.
Provider preferences (example shape)
Request-level routing
The following is an example of a routing object you can attach to a request. The exact feature set may evolve, but the principles should remain stable.
{
  "model": "anthropic/claude",
  "messages": [{"role": "user", "content": "Hello"}],
  "lmchat": {
    "routing": {
      "prefer": ["bedrock", "vertex", "direct"],
      "allow_fallbacks": true,
      "max_provider_tries": 2
    }
  }
}
Sorting strategies
Cost vs latency vs quality
Common strategies:
- pricing-low-to-high: minimize cost.
- latency-low-to-high: minimize time-to-first-token.
- quality-high-to-low: prefer the most capable options.
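To make the three strategies concrete, here is a minimal client-side sketch of how candidate providers could be ordered under each one. It is not the service's implementation. The `Provider` fields, the `order_providers` helper, and every number in `candidates` are hypothetical values chosen for illustration; only the strategy names come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    price_per_mtok: float  # hypothetical: USD per million tokens
    ttft_ms: float         # hypothetical: observed time-to-first-token in ms
    quality: float         # hypothetical: capability score, higher is better


# Strategy name -> sort key. sorted() is ascending, so quality is negated
# to put the highest score first.
SORT_KEYS = {
    "pricing-low-to-high": lambda p: p.price_per_mtok,
    "latency-low-to-high": lambda p: p.ttft_ms,
    "quality-high-to-low": lambda p: -p.quality,
}


def order_providers(providers, strategy):
    """Return providers ordered according to the named strategy."""
    try:
        key = SORT_KEYS[strategy]
    except KeyError:
        raise ValueError(f"unknown strategy: {strategy!r}") from None
    return sorted(providers, key=key)


# Illustrative numbers only; they are not real prices or measurements.
candidates = [
    Provider("bedrock", price_per_mtok=3.0, ttft_ms=420.0, quality=0.90),
    Provider("vertex", price_per_mtok=2.5, ttft_ms=610.0, quality=0.88),
    Provider("direct", price_per_mtok=3.2, ttft_ms=350.0, quality=0.93),
]

for strategy in SORT_KEYS:
    ordered = [p.name for p in order_providers(candidates, strategy)]
    print(strategy, ordered)
```

Keeping each strategy as a plain sort key makes the trade-off explicit: the same candidate set produces a different first choice depending on which metric you rank by.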
Fallback best practices
- Enable fallbacks by default for read-only workloads (summaries, extraction).
- For deterministic tasks, cap the number of retries and enforce idempotency on your side, so a retried request cannot apply the same effect twice.
- Record the provider used in your logs so you can debug regressions and cost spikes.
- Combine routing with data policies if prompts contain sensitive data.
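The practices above can be sketched as a small client-side loop. This is an illustration under stated assumptions, not how the service routes internally: `call_with_fallbacks`, `ProviderError`, and the `send` callable are hypothetical names, and the sketch assumes `max_provider_tries` means the number of distinct providers attempted. It reads a routing object of the shape shown earlier, caps the attempts, reuses one request ID across retries as an idempotency key, and logs which provider served the request.

```python
import logging
import uuid

logger = logging.getLogger("routing")


class ProviderError(Exception):
    """Hypothetical error type for a retryable provider failure."""


def call_with_fallbacks(routing, send, request_id=None):
    """Try providers in preference order, honoring the routing limits.

    `send(provider, request_id)` is a hypothetical callable that performs
    the request against one provider and raises ProviderError on failure.
    """
    # One ID is reused for every attempt so the caller can deduplicate.
    request_id = request_id or str(uuid.uuid4())

    providers = list(routing["prefer"])
    if not routing.get("allow_fallbacks", True):
        providers = providers[:1]  # no fallbacks: only the first choice

    # Assumption: the cap counts distinct providers tried.
    max_tries = routing.get("max_provider_tries", len(providers))

    last_error = None
    for provider in providers[:max_tries]:
        try:
            result = send(provider, request_id)
        except ProviderError as exc:
            logger.warning(
                "provider=%s request_id=%s failed: %s", provider, request_id, exc
            )
            last_error = exc
            continue
        # Record the provider that actually served the request.
        logger.info("provider=%s request_id=%s succeeded", provider, request_id)
        return provider, result

    raise RuntimeError(
        f"all providers failed for request {request_id}"
    ) from last_error
```

Returning the provider name alongside the result makes it easy to attach it to your own request logs and cost reports.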