Routing

Provider selection, fallbacks, and control.

Routing is how a “unified API” becomes production-grade: you can prioritize cost, latency, or provider preferences, and fall back when a provider is degraded. This section documents the intended knobs and best practices.

Why routing exists

Reliability and leverage

Providers can be rate-limited, degraded, or temporarily unavailable. Routing lets you define what happens next without changing your application code.

Provider preferences (example shape)

Request-level routing

The following is an example of a routing object you can attach to a request. The exact feature set may evolve, but the principles remain stable.

{
  "model": "anthropic/claude",
  "messages": [{"role":"user","content":"Hello"}],
  "lmchat": {
    "routing": {
      "prefer": ["bedrock", "vertex", "direct"],
      "allow_fallbacks": true,
      "max_provider_tries": 2
    }
  }
}
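
As a minimal sketch, the request body above could be assembled in Python as shown here. The helper name with_routing and its defaults are hypothetical and chosen only for illustration; the field names (lmchat, routing, prefer, allow_fallbacks, max_provider_tries) are copied from the example shape, and how the body is actually sent is left out.

```python
import json


def with_routing(payload, prefer, allow_fallbacks=True, max_provider_tries=2):
    """Return a copy of payload with a routing object under "lmchat".

    Hypothetical helper: it only mirrors the example shape shown above.
    """
    if max_provider_tries < 1:
        raise ValueError("max_provider_tries must be at least 1")
    routed = dict(payload)  # shallow copy so the caller's dict is not mutated
    lmchat = dict(routed.get("lmchat", {}))  # keep any other lmchat settings
    lmchat["routing"] = {
        "prefer": list(prefer),
        "allow_fallbacks": allow_fallbacks,
        "max_provider_tries": max_provider_tries,
    }
    routed["lmchat"] = lmchat
    return routed


base_request = {
    "model": "anthropic/claude",
    "messages": [{"role": "user", "content": "Hello"}],
}
request_body = with_routing(base_request, prefer=["bedrock", "vertex", "direct"])
print(json.dumps(request_body, indent=2))
```

Keeping the routing object in one helper makes it easier to adjust preferences in a single place if the feature set changes.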

Sorting strategies

Cost vs latency vs quality

Common strategies:

pricing-low-to-high: minimize cost.
latency-low-to-high: minimize time-to-first-token.
quality-high-to-low: prefer the most capable options.
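
To make the three strategies concrete, here is a small sketch of how each one could order a set of candidate providers. This is not the service's implementation: the ProviderStats fields and every number in the sample data are invented for illustration, and only the strategy names come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderStats:
    """Hypothetical per-provider measurements (all values are made up)."""
    name: str
    price_per_million_tokens: float  # lower is cheaper
    ttft_ms: float                   # time-to-first-token, lower is faster
    quality_score: float             # higher is more capable


# Each strategy maps to a sort key; quality is negated so higher sorts first.
SORT_KEYS = {
    "pricing-low-to-high": lambda p: p.price_per_million_tokens,
    "latency-low-to-high": lambda p: p.ttft_ms,
    "quality-high-to-low": lambda p: -p.quality_score,
}


def order_providers(providers, strategy):
    """Return provider names in the order the given strategy would try them."""
    if strategy not in SORT_KEYS:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [p.name for p in sorted(providers, key=SORT_KEYS[strategy])]


candidates = [
    ProviderStats("bedrock", 3.0, 420.0, 0.90),
    ProviderStats("vertex", 2.5, 610.0, 0.88),
    ProviderStats("direct", 3.2, 350.0, 0.93),
]

for strategy in SORT_KEYS:
    print(strategy, order_providers(candidates, strategy))
```

With this sample data, each strategy puts a different provider first, which is why the choice of strategy is worth making deliberately.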

Fallback best practices

Use fallbacks for read-only workloads (summaries, extraction) by default.
For deterministic tasks, cap retries and enforce idempotency on your side.
Record the provider used in your logs so you can debug regressions and cost spikes.
Combine routing with data policies if prompts contain sensitive data.
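
Two of the practices above, capping retries and recording the provider used, can be sketched as a client-side loop. This assumes a hypothetical send(provider) callable and a hypothetical ProviderError exception; neither is part of any documented API, and real routing may happen server-side instead.

```python
import logging

logger = logging.getLogger("routing")


class ProviderError(Exception):
    """Hypothetical error for a rate-limited, degraded, or unavailable provider."""


def call_with_fallbacks(send, providers, allow_fallbacks=True, max_provider_tries=2):
    """Try providers in order, stopping at the first success.

    Returns (provider_name, result). At most max_provider_tries providers
    are attempted, or only the first one when fallbacks are disabled.
    """
    if max_provider_tries < 1:
        raise ValueError("max_provider_tries must be at least 1")
    limit = max_provider_tries if allow_fallbacks else 1
    last_error = None
    for provider in providers[:limit]:
        try:
            result = send(provider)
        except ProviderError as exc:
            logger.warning("provider %s failed: %s", provider, exc)
            last_error = exc
            continue
        # Log the provider that actually served the request.
        logger.info("served by provider %s", provider)
        return provider, result
    raise RuntimeError("all attempted providers failed") from last_error


def fake_send(provider):
    """Stand-in for a real request: the first provider is rate-limited."""
    if provider == "bedrock":
        raise ProviderError("rate limited")
    return f"response from {provider}"


used, reply = call_with_fallbacks(fake_send, ["bedrock", "vertex", "direct"])
print(used, reply)
```

Returning the provider name alongside the result lets the caller attach it to its own request logs, which helps when tracing a regression or a cost spike back to a specific provider.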