Chat Completions

Messages, tools, streaming, structured outputs.

This is the primary endpoint for modern applications. It supports multi-turn conversations, tool calling, streaming, and (where supported by the model) structured JSON outputs.

Endpoint

POST /api/v1/chat/completions

Minimal request

{
  "model": "anthropic/claude",
  "messages": [
    {"role":"system","content":"You are concise."},
    {"role":"user","content":"Write a 1-sentence tagline."}
  ]
}
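In practice you send this payload as an authenticated POST. The sketch below is a minimal Python example; the base URL, the `LMCHAT_BASE_URL` env var, and the helper names (`build_chat_request`, `post_chat`) are illustrative assumptions, not part of the API.

```python
import json
import os
import urllib.request

# Assumed base URL and env var name -- adjust for your deployment.
BASE_URL = os.environ.get("LMCHAT_BASE_URL", "https://api.example.com")

def build_chat_request(model, messages, **params):
    """Assemble a /api/v1/chat/completions payload."""
    return {"model": model, "messages": messages, **params}

def post_chat(payload, api_key):
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The minimal request shown above, built programmatically:
payload = build_chat_request(
    "anthropic/claude",
    [
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Write a 1-sentence tagline."},
    ],
)
```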

Core parameters

model
A provider/model identifier. See Models.
messages
An array of chat messages. Each message has a role and content.
temperature / top_p
Control sampling randomness. Tune one of the two (temperature or top_p), not both.
max_tokens
Caps the output length. If omitted, defaults may depend on the model/provider.
stream
When true, the response is streamed as server-sent events (SSE). See Streaming.
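A streamed response arrives as SSE lines of the form `data: {...}`, terminated by a `data: [DONE]` sentinel. A minimal parser sketch, assuming the OpenAI-style delta-chunk event shape (the `iter_sse_chunks` helper is hypothetical):

```python
import json

def iter_sse_chunks(lines):
    """Parse SSE lines ("data: {...}") into chunk dicts; stop at [DONE].

    `lines` is any iterable of decoded text lines, e.g. an HTTP
    response body read line by line.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)

# Accumulate the assistant text from delta chunks (simulated stream).
fake_stream = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo."}}]}',
    "data: [DONE]",
]
text = "".join(
    c["choices"][0]["delta"].get("content", "")
    for c in iter_sse_chunks(fake_stream)
)
```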

Extended parameters (reference)

Common knobs
Different providers expose different subsets of these parameters. The safest approach is to send a parameter only when the target provider supports it, and to ignore unknown fields on responses.
{
  "temperature": 0.7,
  "top_p": 1,
  "max_tokens": 512,
  "stop": ["\n\nUser:"],
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "n": 1,
  "seed": 1234,
  "user": "end_user_id",
  "stream_options": { "include_usage": true }
}
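Because support varies by provider, it can help to strip unsupported knobs before sending. A sketch under that assumption; `filter_params` and the `KNOWN_PARAMS` set are hypothetical helpers, not part of the API:

```python
# Hypothetical helper: keep only the knobs the target provider supports.
KNOWN_PARAMS = {
    "temperature", "top_p", "max_tokens", "stop",
    "presence_penalty", "frequency_penalty", "n",
    "seed", "user", "stream_options",
}

def filter_params(params, supported=KNOWN_PARAMS):
    """Drop parameters the target provider does not accept."""
    return {k: v for k, v in params.items() if k in supported}

request_extras = filter_params({
    "temperature": 0.7,
    "max_tokens": 512,
    "logit_bias": {},   # not in the supported set above -> dropped
})
```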

Structured outputs (JSON mode)

response_format
For models that support it, you can request strict JSON output. If a model does not support structured output, enforce it via prompting and validation.
{
  "model": "provider/model",
  "messages": [{"role":"user","content":"Return JSON with keys: title, bullets[]"}],
  "response_format": { "type": "json_object" }
}
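For the prompting-and-validation fallback mentioned above, the client parses the model's text as JSON and checks the expected keys itself. A minimal sketch; `parse_structured` is an illustrative helper name:

```python
import json

def parse_structured(content, required_keys):
    """Validate model output as JSON with the expected keys.

    Fallback for models without native response_format support:
    prompt for JSON, then parse and check the keys yourself.
    Returns the parsed object, or None if parsing/validation fails.
    """
    try:
        obj = json.loads(content)
    except json.JSONDecodeError:
        return None
    if not all(k in obj for k in required_keys):
        return None
    return obj

ok = parse_structured('{"title": "Hi", "bullets": ["a", "b"]}',
                      ["title", "bullets"])
bad = parse_structured("not json at all", ["title"])
```

On a `None` result, a common pattern is to retry the request once with an error-correcting follow-up message.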

Vision / multimodal content

Message content parts
Some models accept mixed content arrays (text + image). The exact shape follows the OpenAI-style “content parts” pattern.
{
  "model": "provider/model",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this screenshot."},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}
      ]
    }
  ]
}
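Content parts can also carry inline images as base64 `data:` URLs instead of a hosted URL. A sketch of building such a message; the helper names (`image_part_from_bytes`, `vision_message`) are assumptions:

```python
import base64

def image_part_from_bytes(data, mime="image/png"):
    """Inline raw image bytes as a data: URL content part
    (an alternative to hosting the image at a public URL)."""
    b64 = base64.b64encode(data).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{b64}"}}

def vision_message(prompt, image_part):
    """Build a user message mixing a text part with an image part."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": prompt}, image_part],
    }

msg = vision_message("Describe this screenshot.",
                     image_part_from_bytes(b"\x89PNG..."))  # placeholder bytes
```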

Routing overrides (advanced)

Per-request routing
If you need strict preferences or fallbacks for a specific request, attach a routing directive object (see the full routing guide).
{
  "model": "anthropic/claude",
  "messages": [{"role":"user","content":"Hello"}],
  "lmchat": {
    "routing": {
      "prefer": ["vertex", "bedrock", "direct"],
      "allow_fallbacks": true,
      "max_provider_tries": 2
    }
  }
}
Learn more in Routing & fallbacks.
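The routing directive is just an extra object merged into the request body, so it composes with any payload. A small sketch of attaching it programmatically; `with_routing` is a hypothetical helper, and the directive shape follows the example above:

```python
def with_routing(payload, prefer, allow_fallbacks=True, max_provider_tries=2):
    """Return a copy of the payload with a per-request routing directive
    attached (shape as documented in the routing guide)."""
    return {
        **payload,
        "lmchat": {
            "routing": {
                "prefer": list(prefer),
                "allow_fallbacks": allow_fallbacks,
                "max_provider_tries": max_provider_tries,
            }
        },
    }

routed = with_routing(
    {"model": "anthropic/claude",
     "messages": [{"role": "user", "content": "Hello"}]},
    prefer=["vertex", "bedrock", "direct"],
)
```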

Response

{
  "id": "chatcmpl_...",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "provider/model",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello."},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 3,
    "total_tokens": 15
  }
}
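Most clients only need the first choice's text and the finish reason. A sketch of pulling those out of the response shape above; `first_text` is an illustrative helper name:

```python
def first_text(response):
    """Extract the first choice's text and its finish reason."""
    choice = response["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]

# The documented response shape, abbreviated:
resp = {
    "id": "chatcmpl_abc",
    "object": "chat.completion",
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Hello."},
         "finish_reason": "stop"},
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}
text, reason = first_text(resp)
```

Check `finish_reason` before trusting the content: `"length"` means the output was cut off by `max_tokens`.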