Chat Completions

Messages, tools, streaming, structured outputs.

This is the primary endpoint for modern applications. It supports multi-turn conversations, tool calling, streaming, and (where supported by the model) structured JSON outputs.

Endpoint

POST /api/v1/chat/completions

Minimal request

{
  "model": "anthropic/claude",
  "messages": [
    {"role":"system","content":"You are concise."},
    {"role":"user","content":"Write a 1-sentence tagline."}
  ]
}
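In practice you send this payload as an authenticated POST. The sketch below is a minimal Python example; the base URL, the `LMCHAT_BASE_URL` env var, and the helper names (`build_chat_request`, `post_chat`) are illustrative assumptions, not part of the API.

```python
import json
import os
import urllib.request

# Assumed base URL and env var name -- adjust for your deployment.
BASE_URL = os.environ.get("LMCHAT_BASE_URL", "https://api.example.com")

def build_chat_request(model, messages, **params):
    """Assemble a /api/v1/chat/completions payload."""
    return {"model": model, "messages": messages, **params}

def post_chat(payload, api_key):
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The minimal request shown above, built programmatically:
payload = build_chat_request(
    "anthropic/claude",
    [
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Write a 1-sentence tagline."},
    ],
)
```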

Core parameters

model
A provider/model identifier. See Models.
messages
An array of chat messages. Each message has a role and content.
temperature / top_p
Control sampling randomness. Tune one of the two (temperature or top_p), not both.
max_tokens
Caps the output length. If omitted, defaults may depend on the model/provider.
stream
When true, the response is streamed as server-sent events (SSE). See Streaming.
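A streamed response arrives as SSE lines of the form `data: {...}`, terminated by a `data: [DONE]` sentinel. A minimal parser sketch, assuming the OpenAI-style delta-chunk event shape (the `iter_sse_chunks` helper is hypothetical):

```python
import json

def iter_sse_chunks(lines):
    """Parse SSE lines ("data: {...}") into chunk dicts; stop at [DONE].

    `lines` is any iterable of decoded text lines, e.g. an HTTP
    response body read line by line.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)

# Accumulate the assistant text from delta chunks (simulated stream).
fake_stream = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo."}}]}',
    "data: [DONE]",
]
text = "".join(
    c["choices"][0]["delta"].get("content", "")
    for c in iter_sse_chunks(fake_stream)
)
```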

Extended parameters (reference)

Common knobs
Different providers expose different subsets of these parameters. The safest approach is to send a parameter only when the target provider supports it, and to ignore unknown fields on responses.
{
  "temperature": 0.7,
  "top_p": 1,
  "max_tokens": 512,
  "stop": ["\n\nUser:"],
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "n": 1,
  "seed": 1234,
  "user": "end_user_id",
  "stream_options": { "include_usage": true }
}
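Because support varies by provider, it can help to strip unsupported knobs before sending. A sketch under that assumption; `filter_params` and the `KNOWN_PARAMS` set are hypothetical helpers, not part of the API:

```python
# Hypothetical helper: keep only the knobs the target provider supports.
KNOWN_PARAMS = {
    "temperature", "top_p", "max_tokens", "stop",
    "presence_penalty", "frequency_penalty", "n",
    "seed", "user", "stream_options",
}

def filter_params(params, supported=KNOWN_PARAMS):
    """Drop parameters the target provider does not accept."""
    return {k: v for k, v in params.items() if k in supported}

request_extras = filter_params({
    "temperature": 0.7,
    "max_tokens": 512,
    "logit_bias": {},   # not in the supported set above -> dropped
})
```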

Structured outputs (JSON mode)

response_format
For models that support it, you can request strict JSON output. If a model does not support structured output, enforce it via prompting and validation.
{
  "model": "provider/model",
  "messages": [{"role":"user","content":"Return JSON with keys: title, bullets[]"}],
  "response_format": { "type": "json_object" }
}
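For the prompting-and-validation fallback mentioned above, the client parses the model's text as JSON and checks the expected keys itself. A minimal sketch; `parse_structured` is an illustrative helper name:

```python
import json

def parse_structured(content, required_keys):
    """Validate model output as JSON with the expected keys.

    Fallback for models without native response_format support:
    prompt for JSON, then parse and check the keys yourself.
    Returns the parsed object, or None if parsing/validation fails.
    """
    try:
        obj = json.loads(content)
    except json.JSONDecodeError:
        return None
    if not all(k in obj for k in required_keys):
        return None
    return obj

ok = parse_structured('{"title": "Hi", "bullets": ["a", "b"]}',
                      ["title", "bullets"])
bad = parse_structured("not json at all", ["title"])
```

On a `None` result, a common pattern is to retry the request once with an error-correcting follow-up message.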

Vision / multimodal content

Message content parts
Some models accept mixed content arrays (text + image). The exact shape follows the OpenAI-style “content parts” pattern.
{
  "model": "provider/model",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this screenshot."},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}
      ]
    }
  ]
}
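Content parts can also carry inline images as base64 `data:` URLs instead of a hosted URL. A sketch of building such a message; the helper names (`image_part_from_bytes`, `vision_message`) are assumptions:

```python
import base64

def image_part_from_bytes(data, mime="image/png"):
    """Inline raw image bytes as a data: URL content part
    (an alternative to hosting the image at a public URL)."""
    b64 = base64.b64encode(data).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{b64}"}}

def vision_message(prompt, image_part):
    """Build a user message mixing a text part with an image part."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": prompt}, image_part],
    }

msg = vision_message("Describe this screenshot.",
                     image_part_from_bytes(b"\x89PNG..."))  # placeholder bytes
```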

Routing overrides (advanced)

Per-request routing
If you need strict preferences or fallbacks for a specific request, attach a routing directive object (see the full routing guide).
{
  "model": "anthropic/claude",
  "messages": [{"role":"user","content":"Hello"}],
  "lmchat": {
    "routing": {
      "prefer": ["vertex", "bedrock", "direct"],
      "allow_fallbacks": true,
      "max_provider_tries": 2
    }
  }
}
Learn more in Routing & fallbacks.
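The routing directive is just an extra object merged into the request body, so it composes with any payload. A small sketch of attaching it programmatically; `with_routing` is a hypothetical helper, and the directive shape follows the example above:

```python
def with_routing(payload, prefer, allow_fallbacks=True, max_provider_tries=2):
    """Return a copy of the payload with a per-request routing directive
    attached (shape as documented in the routing guide)."""
    return {
        **payload,
        "lmchat": {
            "routing": {
                "prefer": list(prefer),
                "allow_fallbacks": allow_fallbacks,
                "max_provider_tries": max_provider_tries,
            }
        },
    }

routed = with_routing(
    {"model": "anthropic/claude",
     "messages": [{"role": "user", "content": "Hello"}]},
    prefer=["vertex", "bedrock", "direct"],
)
```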

Response

{
  "id": "chatcmpl_...",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "provider/model",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello."},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 3,
    "total_tokens": 15
  }
}
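Most clients only need the first choice's text and the finish reason. A sketch of pulling those out of the response shape above; `first_text` is an illustrative helper name:

```python
def first_text(response):
    """Extract the first choice's text and its finish reason."""
    choice = response["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]

# The documented response shape, abbreviated:
resp = {
    "id": "chatcmpl_abc",
    "object": "chat.completion",
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Hello."},
         "finish_reason": "stop"},
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}
text, reason = first_text(resp)
```

Check `finish_reason` before trusting the content: `"length"` means the output was cut off by `max_tokens`.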