Documentation
Leman.zero API Reference
OpenAI-compatible chat completions API. No API key required — just point your existing OpenAI SDK at our endpoint and start building.
Quickstart
Leman.zero is fully compatible with the OpenAI SDK. Install it, point it at our base URL, and make your first request in under a minute. No API key needed.
1. Install the OpenAI SDK
npm install openai

2. Make your first request
import OpenAI from 'openai';
const client = new OpenAI({
  baseURL: 'https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1',
  apiKey: 'unused', // no key needed, but the SDK requires a value
});
const response = await client.chat.completions.create({
  model: 'leman0-1.7b',
  messages: [
    { role: 'user', content: 'Hello!' }
  ],
});

console.log(response.choices[0].message.content);

Or use curl directly
curl -X POST https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "leman0-1.7b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Base URL
All API requests should be made to the following base URL. The API is open — no authentication required.
https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1

List Models
Returns a list of available models. Currently serves leman0-1.7b.
GET /v1/models

Example request
curl https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1/models

Response
{
  "object": "list",
  "data": [
    {
      "id": "leman0-1.7b",
      "object": "model",
      "created": 1738965600,
      "owned_by": "lemanlabs"
    }
  ]
}

Chat Completions
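Before sending a chat request, you can confirm that the model you want is actually served by parsing the models list response above. A minimal Python sketch; the `served_model_ids` helper is illustrative (not part of the API), and `payload` inlines the example response rather than fetching it over the network:

```python
def served_model_ids(models_response: dict) -> list[str]:
    """Return the model IDs from a /v1/models list response."""
    return [m["id"] for m in models_response.get("data", [])]

# Inlined example response; in practice this would be the JSON body
# of a GET request to /v1/models.
payload = {
    "object": "list",
    "data": [
        {"id": "leman0-1.7b", "object": "model",
         "created": 1738965600, "owned_by": "lemanlabs"}
    ],
}

print(served_model_ids(payload))  # ['leman0-1.7b']
```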
Creates a chat completion. Send a list of messages and receive a model-generated response. Works as a drop-in replacement for the OpenAI chat completions endpoint.
POST /v1/chat/completions

Request body
messages (array, required)
A list of messages comprising the conversation. Each message is an object with 'role' (string) and 'content' (string) fields. Roles can be 'system', 'user', or 'assistant'.

model (string, default "leman0-1.7b")
The model to use. Currently only 'leman0-1.7b' is available.

max_tokens (integer, default 128)
Maximum number of tokens to generate. Capped at 1024.

temperature (number, default 0.7)
Sampling temperature between 0 and 2. Higher values make output more random; lower values make it more deterministic.

top_p (number, default 0.95)
Nucleus sampling parameter. The model considers only the tokens within the top_p cumulative probability mass.

stream (boolean, default false)
Whether to stream partial responses. Not yet supported; the API returns an error if set to true.
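Since an invalid request costs a round trip, the documented constraints can be checked client-side before sending. A minimal Python sketch; `validate_params` is our name, not an API call, and it assumes the server rejects out-of-range values rather than silently clamping them:

```python
def validate_params(max_tokens: int = 128, temperature: float = 0.7,
                    top_p: float = 0.95, stream: bool = False) -> dict:
    """Check sampling parameters against the documented limits."""
    if not 1 <= max_tokens <= 1024:
        raise ValueError("max_tokens must be between 1 and 1024")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    if stream:
        raise ValueError("stream=true is not yet supported")
    return {"max_tokens": max_tokens, "temperature": temperature,
            "top_p": top_p, "stream": stream}

# Valid parameters pass through unchanged, ready to send.
params = validate_params(max_tokens=256, temperature=0.7)
```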
Example request
curl -X POST https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "leman0-1.7b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is linear attention?"}
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'

Response
{
  "id": "chatcmpl-1738965600000",
  "object": "chat.completion",
  "created": 1738965600,
  "model": "leman0-1.7b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Linear attention is a mechanism that..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 42,
    "total_tokens": 66
  }
}

Python example
from openai import OpenAI

client = OpenAI(
    base_url="https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1",
    api_key="unused",  # no key needed, but SDK requires a value
)

response = client.chat.completions.create(
    model="leman0-1.7b",
    messages=[
        {"role": "user", "content": "Explain linear attention in one sentence."}
    ],
    max_tokens=128,
)

print(response.choices[0].message.content)

Health Check
Simple health check endpoint to verify the API is running.
GET /health

curl https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/health

Response
{"ok": true}

Errors
The API uses standard HTTP status codes. Errors are returned as JSON with a detail field describing the issue.
200  Success
400  Bad request (e.g., stream=true is not yet supported)
422  Validation error (missing or malformed request parameters)
500  Internal server error (something went wrong on our end)
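Putting the table together, error responses can be handled uniformly by reading the status code and the 'detail' field. A minimal Python sketch; `explain_error` is illustrative, and note that for 422 validation errors FastAPI-style backends often return 'detail' as a list of per-field issues rather than a single string:

```python
def explain_error(status_code: int, body: dict) -> str:
    """Build a readable message from an error's status code and JSON body."""
    reasons = {
        400: "bad request",
        422: "validation error",
        500: "internal server error",
    }
    reason = reasons.get(status_code, "unexpected status")
    # 'detail' is a string for most errors, but 422 validation errors
    # may carry a list of issues; str() covers both.
    detail = str(body.get("detail", "no detail provided"))
    return f"{status_code} ({reason}): {detail}"

print(explain_error(400, {"detail": "stream=true is not yet supported"}))
# 400 (bad request): stream=true is not yet supported
```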