Documentation

Leman.zero API Reference

OpenAI-compatible chat completions API. No API key required — just point your existing OpenAI SDK at our endpoint and start building.

Quickstart

Leman.zero is fully compatible with the OpenAI SDK. Install it, point it at our base URL, and make your first request in under a minute. No API key needed.

1. Install the OpenAI SDK

terminal
npm install openai

2. Make your first request

example.ts
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1',
  apiKey: 'unused', // no key needed, but the SDK requires a value
});

const response = await client.chat.completions.create({
  model: 'leman0-1.7b',
  messages: [
    { role: 'user', content: 'Hello!' }
  ],
});

console.log(response.choices[0].message.content);

Or use curl directly

terminal
curl -X POST https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "leman0-1.7b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Base URL

All API requests should be made to the following base URL. The API is open — no authentication required.

https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1

List Models

Returns a list of available models. Currently serves leman0-1.7b.

GET /v1/models

Example request

terminal
curl https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1/models

Response

response.json
{
  "object": "list",
  "data": [
    {
      "id": "leman0-1.7b",
      "object": "model",
      "created": 1738965600,
      "owned_by": "lemanlabs"
    }
  ]
}

Chat Completions

Creates a chat completion. Send a list of messages and receive a model-generated response. Works as a drop-in replacement for the OpenAI chat completions endpoint.

POST /v1/chat/completions

Request body

messages
array (required)

A list of messages comprising the conversation. Each message is an object with 'role' (string) and 'content' (string) fields. Roles can be 'system', 'user', or 'assistant'.

model
string (optional, default: "leman0-1.7b")

The model to use. Currently only 'leman0-1.7b' is available.

max_tokens
integer (optional, default: 128)

Maximum number of tokens to generate. Capped at 1024.

temperature
float (optional, default: 0.7)

Sampling temperature between 0 and 2. Higher values make the output more random; lower values make it more deterministic.

top_p
float (optional, default: 0.95)

Nucleus sampling parameter. The model considers only the tokens comprising the top_p cumulative probability mass.

stream
boolean (optional, default: false)

Whether to stream partial responses. Not yet supported — will return an error if set to true.
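
Putting the parameters above together, here is a small Python sketch that assembles a request body using the documented defaults. The build_request helper is ours for illustration, not part of the API; in practice the server applies the same defaults, so you only need to send the fields you want to override.

```python
# Assemble a chat completions request body with the documented defaults.
# build_request is an illustrative helper, not part of the API or SDK.

def build_request(messages, model="leman0-1.7b", max_tokens=128,
                  temperature=0.7, top_p=0.95, stream=False):
    if stream:
        # Streaming is not yet supported; the server returns HTTP 400.
        raise ValueError("stream=true is not yet supported")
    return {
        "model": model,
        "messages": messages,
        "max_tokens": min(max_tokens, 1024),  # server caps max_tokens at 1024
        "temperature": temperature,
        "top_p": top_p,
    }

body = build_request([{"role": "user", "content": "Hello!"}], max_tokens=4096)
print(body["max_tokens"])  # 1024 (clamped to the documented cap)
```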

Example request

terminal
curl -X POST https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "leman0-1.7b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is linear attention?"}
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'

Response

response.json
{
  "id": "chatcmpl-1738965600000",
  "object": "chat.completion",
  "created": 1738965600,
  "model": "leman0-1.7b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Linear attention is a mechanism that..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 42,
    "total_tokens": 66
  }
}

Python example

example.py
from openai import OpenAI

client = OpenAI(
    base_url="https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1",
    api_key="unused",  # no key needed, but SDK requires a value
)

response = client.chat.completions.create(
    model="leman0-1.7b",
    messages=[
        {"role": "user", "content": "Explain linear attention in one sentence."}
    ],
    max_tokens=128,
)

print(response.choices[0].message.content)

Health Check

Simple health check endpoint to verify the API is running.

GET /health

Example request

terminal
curl https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/health

Response

response.json
{"ok": true}

Errors

The API uses standard HTTP status codes. Errors are returned as JSON with a 'detail' field describing the issue.

Status | Description
-------|------------
200    | Success
400    | Bad request — e.g., stream=true is not yet supported.
422    | Validation error — missing or malformed request parameters.
500    | Internal server error — something went wrong on our end.
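
As a sketch of how a client might surface these errors, the snippet below formats a status code and error body into a readable message. The describe_error helper and its message table are ours for illustration, not part of the API:

```python
# Human-readable labels for the documented status codes.
STATUS_MESSAGES = {
    200: "Success",
    400: "Bad request",
    422: "Validation error",
    500: "Internal server error",
}

def describe_error(status, body):
    """Render an API error using the JSON 'detail' field, when present."""
    label = STATUS_MESSAGES.get(status, "Unexpected status")
    detail = body.get("detail") if isinstance(body, dict) else None
    return f"{label} ({status}): {detail}" if detail else f"{label} ({status})"

print(describe_error(400, {"detail": "Streaming is not yet supported"}))
# Bad request (400): Streaming is not yet supported
```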