API Reference
Luxeno exposes an OpenAI-compatible endpoint (/v1/chat/completions) and an Anthropic-compatible endpoint (/v1/messages). Both accept standard JSON request bodies and return standard JSON responses.
Authentication
Every request must include your Luxeno API key. Two header formats are accepted — use whichever your client library supports.
Bearer token (recommended)
curl https://api.luxeno.ai/v1/chat/completions \
-H "Authorization: Bearer sk-silk-xxxxxxxxxxxx" \
-H "Content-Type: application/json" \
-d '{"model":"luxeno/glm-4-flash","messages":[{"role":"user","content":"Hello"}]}'
x-api-key header
curl https://api.luxeno.ai/v1/chat/completions \
-H "x-api-key: sk-silk-xxxxxxxxxxxx" \
-H "Content-Type: application/json" \
-d '{"model":"luxeno/glm-4-flash","messages":[{"role":"user","content":"Hello"}]}'
Get your API key
API keys begin with the sk-silk- prefix; create and manage them from the dashboard.
Models & Pricing
All prices are in USD per 1 million tokens. Input and output tokens are billed separately where rates differ.
OpenAI-compatible endpoint
Standard Models
| Model | Input | Output | Endpoint |
|---|---|---|---|
| luxeno/glm-4-flash | $0.30/1M | $0.30/1M | /v1/chat/completions |
| luxeno/glm-4 | $0.50/1M | $0.50/1M | /v1/chat/completions |
Standard models charge the same rate for all input token types, including cached tokens.
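As an illustration, per-request billing for a standard model can be computed directly from the response's usage object. The rate table and helper below are a sketch (the function name `estimate_cost` is illustrative, not part of the API):

```python
# Per-million-token rates (USD) for the standard models listed above.
RATES = {
    "luxeno/glm-4-flash": {"input": 0.30, "output": 0.30},
    "luxeno/glm-4": {"input": 0.50, "output": 0.50},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request from its usage counts."""
    r = RATES[model]
    return (prompt_tokens * r["input"] + completion_tokens * r["output"]) / 1_000_000
```

For example, a request that consumed 31 prompt tokens and 38 completion tokens on luxeno/glm-4-flash costs roughly $0.0000207.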
Cache-Aware Models
| Model | Input | Output | Cached Read | Cache Write | Cache Storage | Endpoint |
|---|---|---|---|---|---|---|
| luxeno/claude-sonnet | $3.00/1M | $15.00/1M | $0.30/1M | $3.75/1M | — | /v1/chat/completions |
Anthropic-compatible endpoint
Standard Models
| Model | Input | Output | Endpoint |
|---|---|---|---|
| claude-sonnet-4-20250514 | $0.30/1M | $0.30/1M | /v1/messages |
| glm-4-flash | $0.30/1M | $0.30/1M | /v1/messages |
| glm-4.7 | $0.50/1M | $0.50/1M | /v1/messages |
| glm-5 | $0.50/1M | $0.50/1M | /v1/messages |
Standard models charge the same rate for all input token types, including cached tokens.
claude-sonnet-4-20250514, glm-4-flash, and other Anthropic-format models are routed through Zhipu's Anthropic-compatible endpoint at the same $0.30/1M rate, significantly cheaper than native Anthropic pricing.
POST /v1/chat/completions
OpenAI-compatible chat endpoint. Accepts the same request schema as openai.chat.completions.create() — drop-in replacement by changing the base URL and model name.
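For clients that don't use an SDK, the same request can be issued with nothing but the standard library. A minimal sketch (the key is the placeholder from the examples above; `build_payload` and `chat` are illustrative names, and `chat()` performs a live call):

```python
import json
from urllib import request

API_KEY = "sk-silk-xxxxxxxxxxxx"  # placeholder key

def build_payload(messages, model="luxeno/glm-4-flash", **params):
    """Assemble the JSON body accepted by /v1/chat/completions."""
    return {"model": model, "messages": messages, **params}

def chat(messages, **params):
    """POST the payload to Luxeno and return the parsed completion."""
    req = request.Request(
        "https://api.luxeno.ai/v1/chat/completions",
        data=json.dumps(build_payload(messages, **params)).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```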
Request
POST /v1/chat/completions
Content-Type: application/json
Authorization: Bearer sk-silk-xxxxxxxxxxxx
{
"model": "luxeno/glm-4-flash",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Explain quantum entanglement in one sentence." }
],
"temperature": 0.7,
"max_tokens": 256
}
Response
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1748000000,
"model": "luxeno/glm-4-flash",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum entanglement is a phenomenon where two particles become correlated such that measuring one instantly determines the state of the other, regardless of distance."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 31,
"completion_tokens": 38,
"total_tokens": 69
}
}
Streaming (SSE)
Set "stream": true to receive server-sent events. The final chunk before [DONE] includes the usage object for billing transparency.
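Client-side, the chunks can be folded back into the full reply. A sketch of a parser for these data lines (`parse_stream` is an illustrative name, not an SDK function):

```python
import json

def parse_stream(lines):
    """Accumulate text and usage from chat.completion.chunk SSE data lines."""
    text, usage = [], None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments / blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                text.append(piece)
        usage = chunk.get("usage", usage)  # final chunk carries usage
    return "".join(text), usage
```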
POST /v1/chat/completions
Content-Type: application/json
Authorization: Bearer sk-silk-xxxxxxxxxxxx
{
"model": "luxeno/glm-4-flash",
"messages": [{ "role": "user", "content": "Count to five." }],
"stream": true
}
--- SSE response ---
data: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":"1"},"index":0}]}
data: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","choices":[{"delta":{"content":", 2"},"index":0}]}
data: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop","index":0}],"usage":{"prompt_tokens":12,"completion_tokens":10}}
data: [DONE]
POST /v1/messages
Anthropic Messages API-compatible endpoint. Accepts the same request schema as the official Anthropic SDK. Use this endpoint with tools that call the Anthropic API directly (e.g. Claude Code, Cursor in Anthropic mode).
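A corresponding standard-library sketch for this endpoint, including the anthropic-version header (placeholder key; `build_message_payload` and `create_message` are illustrative names, and `create_message()` performs a live call):

```python
import json
from urllib import request

API_KEY = "sk-silk-xxxxxxxxxxxx"  # placeholder key

def build_message_payload(messages, model="claude-sonnet-4-20250514",
                          max_tokens=1024, system=None):
    """Assemble the JSON body accepted by /v1/messages."""
    payload = {"model": model, "max_tokens": max_tokens, "messages": messages}
    if system is not None:
        payload["system"] = system
    return payload

def create_message(messages, **params):
    """POST the payload to Luxeno's Anthropic-compatible endpoint."""
    req = request.Request(
        "https://api.luxeno.ai/v1/messages",
        data=json.dumps(build_message_payload(messages, **params)).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            "anthropic-version": "2023-06-01",
        },
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```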
Requests should include the standard anthropic-version header, as shown below.
Request
POST /v1/messages
Content-Type: application/json
Authorization: Bearer sk-silk-xxxxxxxxxxxx
anthropic-version: 2023-06-01
{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"system": "You are a concise technical writer.",
"messages": [
{ "role": "user", "content": "What is a B-tree index?" }
]
}
Response
{
"id": "msg_01AbCdEf",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "A B-tree index is a balanced tree data structure that maintains sorted data and allows searches, insertions, deletions, and range queries in O(log n) time."
}
],
"model": "claude-sonnet-4-20250514",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 24,
"output_tokens": 42
}
}
Streaming SSE event types
Anthropic SSE uses named event types rather than a single data: stream. Key events:
- message_start: contains the input token count in message.usage.input_tokens
- content_block_delta: carries text chunks in delta.text
- message_delta: contains the output token count in usage.output_tokens
- message_stop: signals the end of the stream
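These events can be folded into the final text and token counts. A minimal sketch over the JSON payloads of each event (`collect_message` is an illustrative name, not an SDK function):

```python
import json

def collect_message(event_payloads):
    """Rebuild assistant text and usage from Anthropic-style SSE event payloads."""
    text, input_tokens, output_tokens = [], 0, 0
    for payload in event_payloads:
        event = json.loads(payload)
        kind = event.get("type")
        if kind == "message_start":
            input_tokens = event["message"]["usage"]["input_tokens"]
        elif kind == "content_block_delta":
            delta = event["delta"]
            if delta.get("type") == "text_delta":
                text.append(delta["text"])
        elif kind == "message_delta":
            output_tokens = event["usage"]["output_tokens"]
        elif kind == "message_stop":
            break
    return "".join(text), input_tokens, output_tokens
```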
data: {"type":"message_start","message":{"id":"msg_01AbCd","type":"message","role":"assistant","usage":{"input_tokens":24,"output_tokens":0}}}
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"A B-tree index"}}
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" is a balanced tree..."}}
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":42}}
data: {"type":"message_stop"}
Rate Limits
Each user account is limited to 60 requests per minute across both the /v1/chat/completions and /v1/messages endpoints combined (sliding window).
HTTP/1.1 429 Too Many Requests
Retry-After: 12
Content-Type: application/json
{
"error": {
"message": "Rate limit exceeded",
"type": "rate_limit_error",
"code": 429
}
}
Honor the Retry-After header: wait the indicated number of seconds before retrying.
Error Codes
All errors follow OpenAI-format envelopes: { error: { message, type, code } }. Anthropic-format errors on the /v1/messages endpoint use { type: 'error', error: { type, message } }.
| Status | Type | Cause | Resolution |
|---|---|---|---|
| 400 | invalid_request_error | Bad model name or malformed body | Check the model name and request schema |
| 401 | authentication_error | Missing or invalid API key | Verify your key in the dashboard |
| 402 | payment_required | Account balance is zero | Top up your balance at /billing |
| 429 | rate_limit_error | 60 RPM limit exceeded | Respect the Retry-After header |
| 500 | api_error | Unexpected server error | Retry with exponential back-off |
| 503 | api_error | Upstream gateway unavailable | Retry; check status page |
Playground
Try the API directly in your browser. The playground calls /api/v1/chat/completions using your first active API key. Requests are billed to your account at normal rates.