Luxeno Docs

API Reference

Luxeno exposes an OpenAI-compatible endpoint (/v1/chat/completions) and an Anthropic-compatible endpoint (/v1/messages). Both accept standard JSON request bodies and return standard JSON responses.

Authentication

Every request must include your Luxeno API key. Two header formats are accepted — use whichever your client library supports.

Bearer token (recommended)

curl https://api.luxeno.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-silk-xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"model":"luxeno/glm-4-flash","messages":[{"role":"user","content":"Hello"}]}'

x-api-key header

curl https://api.luxeno.ai/v1/chat/completions \
  -H "x-api-key: sk-silk-xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"model":"luxeno/glm-4-flash","messages":[{"role":"user","content":"Hello"}]}'

Get your API key

Go to /keys in your dashboard to create and manage API keys. Keys begin with sk-silk-.

Models & Pricing

All prices are in USD per 1 million tokens. Input and output tokens are billed separately where rates differ.

OpenAI-compatible endpoint

Standard Models

Model               Input     Output    Endpoint
luxeno/glm-4-flash  $0.30/1M  $0.30/1M  /v1/chat/completions
luxeno/glm-4        $0.50/1M  $0.50/1M  /v1/chat/completions

Standard models charge the same rate for all input token types, including cached tokens.
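As a concrete example, the cost of a single request follows directly from the usage object and the per-1M rates above (a minimal sketch; the helper name is our own):

```python
def request_cost_usd(prompt_tokens: int, completion_tokens: int,
                     input_rate: float, output_rate: float) -> float:
    """Cost of one request in USD; rates are USD per 1 million tokens."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000

# luxeno/glm-4-flash charges $0.30/1M for both input and output, so a
# request with 31 prompt tokens and 38 completion tokens costs:
cost = request_cost_usd(31, 38, input_rate=0.30, output_rate=0.30)
print(f"${cost:.7f}")  # (31 + 38) * 0.30 / 1e6 = $0.0000207
```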

Cache-Aware Models

Model                 Input     Output     Cached Read  Cache Write  Cache Storage  Endpoint
luxeno/claude-sonnet  $3.00/1M  $15.00/1M  $0.30/1M     $3.75/1M     —              /v1/chat/completions

Anthropic-compatible endpoint

Standard Models

Model                     Input     Output    Endpoint
claude-sonnet-4-20250514  $0.30/1M  $0.30/1M  /v1/messages
glm-4-flash               $0.30/1M  $0.30/1M  /v1/messages
glm-4.7                   $0.50/1M  $0.50/1M  /v1/messages
glm-5                     $0.50/1M  $0.50/1M  /v1/messages

Standard models charge the same rate for all input token types, including cached tokens.

claude-sonnet-4-20250514, glm-4-flash, and other Anthropic-format models are routed through Zhipu's Anthropic-compatible endpoint at the same $0.30/1M rate — significantly cheaper than native Anthropic pricing.

POST /v1/chat/completions

OpenAI-compatible chat endpoint. Accepts the same request schema as openai.chat.completions.create() — drop-in replacement by changing the base URL and model name.

Request

POST /v1/chat/completions
Content-Type: application/json
Authorization: Bearer sk-silk-xxxxxxxxxxxx

{
  "model": "luxeno/glm-4-flash",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user",   "content": "Explain quantum entanglement in one sentence." }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}

Response

200 OK
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1748000000,
  "model": "luxeno/glm-4-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum entanglement is a phenomenon where two particles become correlated such that measuring one instantly determines the state of the other, regardless of distance."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 31,
    "completion_tokens": 38,
    "total_tokens": 69
  }
}
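The request/response pair above can be reproduced from Python with only the standard library (a minimal sketch; the key is a placeholder and the helper names are our own):

```python
import json
import urllib.request

API_BASE = "https://api.luxeno.ai/v1"

def build_chat_request(model: str, messages: list, api_key: str,
                       **options) -> urllib.request.Request:
    """Build a POST /v1/chat/completions request with a Bearer token."""
    body = json.dumps({"model": model, "messages": messages, **options}).encode()
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def chat_completion(model: str, messages: list, api_key: str, **options) -> dict:
    """Send the request and return the parsed JSON response."""
    req = build_chat_request(model, messages, api_key, **options)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (performs a network call, so it is not executed here):
# reply = chat_completion("luxeno/glm-4-flash",
#                         [{"role": "user", "content": "Hello"}],
#                         "sk-silk-xxxxxxxxxxxx",
#                         temperature=0.7, max_tokens=256)
# print(reply["choices"][0]["message"]["content"])
```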

Streaming (SSE)

Set "stream": true to receive server-sent events. The final chunk before [DONE] includes the usage object for billing transparency.

Streaming example
POST /v1/chat/completions
Content-Type: application/json
Authorization: Bearer sk-silk-xxxxxxxxxxxx

{
  "model": "luxeno/glm-4-flash",
  "messages": [{ "role": "user", "content": "Count to five." }],
  "stream": true
}

--- SSE response ---
data: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":"1"},"index":0}]}

data: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","choices":[{"delta":{"content":", 2"},"index":0}]}

data: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop","index":0}],"usage":{"prompt_tokens":12,"completion_tokens":10}}

data: [DONE]
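Consuming this stream amounts to reading each `data:` line as a JSON chunk until the literal `[DONE]` sentinel. A minimal sketch with the standard library (the helper name is our own); the field names match the chunks above:

```python
import json

def consume_sse(lines):
    """Accumulate assistant text and the usage object from OpenAI-style SSE lines."""
    text, usage = [], None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):  # final chunk carries usage for billing
            usage = chunk["usage"]
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            text.append(delta["content"])
    return "".join(text), usage
```

Feeding the three chunks above through this function yields the text "1, 2" and the usage object from the final chunk.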

POST /v1/messages

Anthropic Messages API-compatible endpoint. Accepts the same request schema as the official Anthropic SDK. Use this endpoint with tools that call the Anthropic API directly (e.g. Claude Code, Cursor in Anthropic mode).

anthropic-version header

Include anthropic-version: 2023-06-01 in your request. If omitted, the gateway defaults to that version automatically.

Request

POST /v1/messages
Content-Type: application/json
Authorization: Bearer sk-silk-xxxxxxxxxxxx
anthropic-version: 2023-06-01

{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "system": "You are a concise technical writer.",
  "messages": [
    { "role": "user", "content": "What is a B-tree index?" }
  ]
}

Response

200 OK
{
  "id": "msg_01AbCdEf",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "A B-tree index is a balanced tree data structure that maintains sorted data and allows searches, insertions, deletions, and range queries in O(log n) time."
    }
  ],
  "model": "claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 24,
    "output_tokens": 42
  }
}

Streaming SSE event types

Anthropic SSE uses named event types rather than a single data: stream. Key events:

  • message_start contains input token count in message.usage.input_tokens
  • content_block_delta carries text chunks in delta.text
  • message_delta contains output token count in usage.output_tokens
  • message_stop signals end of stream
Anthropic SSE stream
data: {"type":"message_start","message":{"id":"msg_01AbCd","type":"message","role":"assistant","usage":{"input_tokens":24,"output_tokens":0}}}

data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"A B-tree index"}}

data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" is a balanced tree..."}}

data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":42}}

data: {"type":"message_stop"}
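The four event types above can be handled with a small dispatcher (a sketch with the standard library; the helper name is our own). Each `data:` payload carries its own `type` field, so dispatching on that field works even without parsing the `event:` lines:

```python
import json

def consume_anthropic_sse(data_lines):
    """Collect text and token counts from Anthropic-style SSE data payloads."""
    text, input_tokens, output_tokens = [], 0, 0
    for line in data_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        kind = event["type"]
        if kind == "message_start":
            input_tokens = event["message"]["usage"]["input_tokens"]
        elif kind == "content_block_delta" and event["delta"]["type"] == "text_delta":
            text.append(event["delta"]["text"])
        elif kind == "message_delta":
            output_tokens = event["usage"]["output_tokens"]
        elif kind == "message_stop":
            break
    return "".join(text), input_tokens, output_tokens
```

Run against the stream above, this returns the concatenated text plus input_tokens=24 and output_tokens=42.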

Rate Limits

Each user account is limited to 60 requests per minute across both the /v1/chat/completions and /v1/messages endpoints combined (sliding window).

429 response
HTTP/1.1 429 Too Many Requests
Retry-After: 12
Content-Type: application/json

{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": 429
  }
}

Retry-After

Always read the Retry-After header (value in seconds) before retrying. Immediate retries will be rejected and count against your limit.
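A retry loop that honors Retry-After might look like the following sketch (the exponential fall-back values are our own assumption, not documented gateway behavior):

```python
import time

def retry_delay_seconds(retry_after_header, attempt, base=1.0, cap=60.0):
    """Prefer the server's Retry-After value; otherwise use capped exponential back-off."""
    if retry_after_header is not None:
        return float(retry_after_header)
    return min(cap, base * (2 ** attempt))

# Usage inside a request loop (send_request is a hypothetical callable
# returning the HTTP status and response headers):
# for attempt in range(5):
#     status, headers = send_request()
#     if status != 429:
#         break
#     time.sleep(retry_delay_seconds(headers.get("Retry-After"), attempt))
```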

Error Codes

All errors follow OpenAI-format envelopes: { error: { message, type, code } }. Anthropic-format errors on the /v1/messages endpoint use { type: 'error', error: { type, message } }.
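Both envelopes put the type and message inside the nested error object, so a client that talks to both endpoints can normalize them with one small helper (a sketch; the function name is our own):

```python
def parse_error(payload: dict):
    """Return (type, message, code) from either error envelope format."""
    err = payload.get("error", {})
    kind = err.get("type", "unknown")
    message = err.get("message", "")
    code = err.get("code")  # present in OpenAI-format envelopes only
    return kind, message, code
```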

Status  Type                   Cause                             Resolution
400     invalid_request_error  Bad model name or malformed body  Check the model name and request schema
401     authentication_error   Missing or invalid API key        Verify your key in the dashboard
402     payment_required       Account balance is zero           Top up your balance at /billing
429     rate_limit_error       60 RPM limit exceeded             Respect the Retry-After header
500     api_error              Unexpected server error           Retry with exponential back-off
503     api_error              Upstream gateway unavailable      Retry; check the status page

Playground

Try the API directly in your browser. The playground calls /api/v1/chat/completions using your first active API key. Requests are billed to your account at normal rates.
