Routing

Provider selection, fallbacks, and control.

Routing is what makes a “unified API” production-grade: you can prioritize cost, latency, or specific providers, and fall back to another provider when one is degraded. This section describes the available routing options and recommended practices.

Why routing exists

Reliability and leverage
Providers can be rate-limited, degraded, or temporarily unavailable. Routing lets you define what happens next without changing your application code.

Provider preferences (example shape)

Request-level routing
The example below attaches a routing object to a chat request under the lmchat key. The exact set of fields may evolve, but the principles remain stable.
{
  "model": "anthropic/claude",
  "messages": [{"role":"user","content":"Hello"}],
  "lmchat": {
    "routing": {
      "prefer": ["bedrock", "vertex", "direct"],
      "allow_fallbacks": true,
      "max_provider_tries": 2
    }
  }
}
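To make the three routing fields concrete, here is a minimal Python sketch of how a router might turn them into an ordered list of attempts. It assumes that providers in prefer are tried in order, that "allow_fallbacks": false limits the request to the first preferred provider, and that max_provider_tries caps the total number of providers attempted. These are illustrative assumptions, not documented behavior, and plan_attempts is a hypothetical helper.

```python
from typing import List


def plan_attempts(prefer: List[str], allow_fallbacks: bool, max_provider_tries: int) -> List[str]:
    """Return the providers a router would try, in order (illustrative only).

    Assumptions (not documented lmchat behavior):
    - providers are tried in the order given by `prefer`;
    - without fallbacks, only the first preferred provider is tried;
    - `max_provider_tries` caps the total number of providers attempted.
    """
    if max_provider_tries < 1:
        raise ValueError("max_provider_tries must be at least 1")
    # Drop duplicate entries while keeping the caller's preference order.
    ordered = list(dict.fromkeys(prefer))
    if not allow_fallbacks:
        return ordered[:1]
    return ordered[:max_provider_tries]


# Same values as the "routing" object in the request above.
routing = {
    "prefer": ["bedrock", "vertex", "direct"],
    "allow_fallbacks": True,
    "max_provider_tries": 2,
}
print(plan_attempts(**routing))  # ['bedrock', 'vertex']
```

Under these assumptions, direct is never reached in the example request because the cap of two tries is used up by bedrock and vertex.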

Sorting strategies

Cost vs latency vs quality
Common strategies for ordering candidate providers:
  • pricing-low-to-high: minimize cost.
  • latency-low-to-high: minimize time-to-first-token.
  • quality-high-to-low: prefer the most capable options.
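A sketch of the three sort orders applied to a list of candidate providers. The Candidate fields (price_per_mtok, ttft_ms, quality) and all numbers are hypothetical and chosen only for illustration; a real router would rely on its own pricing data and latency measurements.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Candidate:
    # Hypothetical provider metadata; the values used below are made up.
    name: str
    price_per_mtok: float  # USD per million tokens
    ttft_ms: float         # median time-to-first-token, in milliseconds
    quality: float         # arbitrary 0-1 score, higher is better


# Each strategy maps to a sort key; sorting is ascending, so quality is negated.
STRATEGIES: Dict[str, Callable[[Candidate], float]] = {
    "pricing-low-to-high": lambda c: c.price_per_mtok,
    "latency-low-to-high": lambda c: c.ttft_ms,
    "quality-high-to-low": lambda c: -c.quality,
}


def sort_candidates(candidates: List[Candidate], strategy: str) -> List[str]:
    """Return provider names ordered according to the named strategy."""
    try:
        key = STRATEGIES[strategy]
    except KeyError:
        raise ValueError(f"unknown strategy: {strategy!r}") from None
    return [c.name for c in sorted(candidates, key=key)]


candidates = [
    Candidate("bedrock", price_per_mtok=3.0, ttft_ms=450, quality=0.90),
    Candidate("vertex", price_per_mtok=2.5, ttft_ms=600, quality=0.90),
    Candidate("direct", price_per_mtok=3.5, ttft_ms=300, quality=0.95),
]
print(sort_candidates(candidates, "pricing-low-to-high"))  # ['vertex', 'bedrock', 'direct']
print(sort_candidates(candidates, "latency-low-to-high"))  # ['direct', 'bedrock', 'vertex']
print(sort_candidates(candidates, "quality-high-to-low"))  # ['direct', 'bedrock', 'vertex']
```

Because Python's sorted is stable, ties (here bedrock and vertex on quality) keep their original order, which is one simple way to let a preference list act as a tie-breaker.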

Fallback best practices

  • Enable fallbacks by default for read-only workloads such as summaries and extraction.
  • For tasks that must not run twice or that need consistent output, cap retries and enforce idempotency on your side.
  • Record which provider served each request in your logs, so you can debug quality regressions and cost spikes.
  • Combine routing with data policies if prompts contain sensitive data.
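The practices above can be combined in a small client-side wrapper. This sketch assumes a hypothetical send(provider, payload, idempotency_key) callable that raises a hypothetical ProviderError when a provider is rate-limited or degraded. It caps the number of attempts, reuses one idempotency key across every attempt so a downstream system can deduplicate retried requests, and logs which provider served the request. None of these names are part of the lmchat API.

```python
import logging
import uuid
from typing import Callable, Dict, List, Optional, Tuple

logger = logging.getLogger("routing")


class ProviderError(Exception):
    """Hypothetical error raised when a provider is rate-limited or degraded."""


# Hypothetical transport: (provider, payload, idempotency_key) -> response dict.
Sender = Callable[[str, dict, str], dict]


def call_with_fallbacks(
    send: Sender,
    providers: List[str],
    payload: dict,
    max_tries: int = 2,
    idempotency_key: Optional[str] = None,
) -> Tuple[str, dict]:
    """Try providers in order, at most `max_tries` of them; return (provider, response)."""
    # One key for every attempt, so retries of the same logical request can be deduplicated.
    key = idempotency_key or str(uuid.uuid4())
    errors: Dict[str, str] = {}
    for provider in providers[:max_tries]:
        try:
            response = send(provider, payload, key)
        except ProviderError as exc:
            errors[provider] = str(exc)
            logger.warning("provider=%s failed key=%s error=%s", provider, key, exc)
            continue
        # Record which provider actually served the request.
        logger.info("provider=%s served key=%s", provider, key)
        return provider, response
    raise RuntimeError(f"all {len(errors)} provider attempts failed: {errors}")


# Usage with a fake sender: the first provider is degraded, the second succeeds.
def fake_send(provider: str, payload: dict, key: str) -> dict:
    if provider == "bedrock":
        raise ProviderError("503 Service Unavailable")
    return {"provider": provider, "key": key, "text": "Hi"}


logging.basicConfig(level=logging.INFO)
used, resp = call_with_fallbacks(
    fake_send, ["bedrock", "vertex", "direct"], {"messages": []}, idempotency_key="req-123"
)
print(used, resp["key"])  # vertex req-123
```

Keeping the attempt cap on the client side as well as in the routing object gives you a hard upper bound on cost and duplicate work even if server-side routing behavior changes.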