Documentation
Leman.zero API Reference
OpenAI-compatible chat completions API. No API key required — just point your existing OpenAI SDK at our endpoint and start building.
Quickstart
Leman.zero is fully compatible with the OpenAI SDK. Install it, point it at our base URL, and make your first request in under a minute. No API key needed.
1. Install the OpenAI SDK
npm install openai

2. Make your first request
import OpenAI from 'openai';
const client = new OpenAI({
  baseURL: 'https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1',
  apiKey: 'unused', // no key needed, but the SDK requires a value
});
const response = await client.chat.completions.create({
  model: 'leman0-1.7b',
  messages: [
    { role: 'user', content: 'Hello!' }
  ],
});

console.log(response.choices[0].message.content);

Or use curl directly
curl -X POST https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "leman0-1.7b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Base URL
All API requests should be made to the following base URL. The API is open — no authentication required.
https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1

List Models
Returns a list of available models. Currently serves leman0-1.7b.
GET /v1/models

Example request
curl https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1/models

Response
{
  "object": "list",
  "data": [
    {
      "id": "leman0-1.7b",
      "object": "model",
      "created": 1738965600,
      "owned_by": "lemanlabs"
    }
  ]
}

Chat Completions
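Before sending a chat request, you can confirm that the model you want is actually served by parsing the models list response above. A minimal Python sketch; the `served_model_ids` helper is illustrative (not part of the API), and `payload` inlines the example response rather than fetching it over the network:

```python
def served_model_ids(models_response: dict) -> list[str]:
    """Return the model IDs from a /v1/models list response."""
    return [m["id"] for m in models_response.get("data", [])]

# Inlined example response; in practice this would be the JSON body
# of a GET request to /v1/models.
payload = {
    "object": "list",
    "data": [
        {"id": "leman0-1.7b", "object": "model",
         "created": 1738965600, "owned_by": "lemanlabs"}
    ],
}

print(served_model_ids(payload))  # ['leman0-1.7b']
```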
Creates a chat completion. Send a list of messages and receive a model-generated response. Works as a drop-in replacement for the OpenAI chat completions endpoint.
POST /v1/chat/completions

Request body
messages (array, required)
A list of messages comprising the conversation. Each message is an object with 'role' (string) and 'content' (string) fields. Roles can be 'system', 'user', or 'assistant'.

model (string, default "leman0-1.7b")
The model to use. Currently only 'leman0-1.7b' is available.

max_tokens (integer, default 128)
Maximum number of tokens to generate. Capped at 1024.

temperature (number, default 0.7)
Sampling temperature between 0 and 2. Higher values make output more random; lower values make it more deterministic.

top_p (number, default 0.95)
Nucleus sampling parameter. The model considers only the tokens within the top_p cumulative probability mass.

stream (boolean, default false)
Whether to stream partial responses. Not yet supported; the API returns an error if set to true.
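Since an invalid request costs a round trip, the documented constraints can be checked client-side before sending. A minimal Python sketch; `validate_params` is our name, not an API call, and it assumes the server rejects out-of-range values rather than silently clamping them:

```python
def validate_params(max_tokens: int = 128, temperature: float = 0.7,
                    top_p: float = 0.95, stream: bool = False) -> dict:
    """Check sampling parameters against the documented limits."""
    if not 1 <= max_tokens <= 1024:
        raise ValueError("max_tokens must be between 1 and 1024")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    if stream:
        raise ValueError("stream=true is not yet supported")
    return {"max_tokens": max_tokens, "temperature": temperature,
            "top_p": top_p, "stream": stream}

# Valid parameters pass through unchanged, ready to send.
params = validate_params(max_tokens=256, temperature=0.7)
```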
Example request
curl -X POST https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "leman0-1.7b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is linear attention?"}
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'

Response
{
  "id": "chatcmpl-1738965600000",
  "object": "chat.completion",
  "created": 1738965600,
  "model": "leman0-1.7b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Linear attention is a mechanism that..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 42,
    "total_tokens": 66
  }
}

Python example
from openai import OpenAI

client = OpenAI(
    base_url="https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/v1",
    api_key="unused",  # no key needed, but SDK requires a value
)

response = client.chat.completions.create(
    model="leman0-1.7b",
    messages=[
        {"role": "user", "content": "Explain linear attention in one sentence."}
    ],
    max_tokens=128,
)

print(response.choices[0].message.content)

Health Check
Simple health check endpoint to verify the API is running.
GET /health

curl https://carloshurtadocomin--lemanlabs-openai-api-fastapi-app.modal.run/health

Response
{"ok": true}

Errors
The API uses standard HTTP status codes. Errors are returned as JSON with a detail field describing the issue.
200  Success
400  Bad request (e.g., stream=true is not yet supported)
422  Validation error (missing or malformed request parameters)
500  Internal server error (something went wrong on our end)
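Putting the table together, error responses can be handled uniformly by reading the status code and the 'detail' field. A minimal Python sketch; `explain_error` is illustrative, and note that for 422 validation errors FastAPI-style backends often return 'detail' as a list of per-field issues rather than a single string:

```python
def explain_error(status_code: int, body: dict) -> str:
    """Build a readable message from an error's status code and JSON body."""
    reasons = {
        400: "bad request",
        422: "validation error",
        500: "internal server error",
    }
    reason = reasons.get(status_code, "unexpected status")
    # 'detail' is a string for most errors, but 422 validation errors
    # may carry a list of issues; str() covers both.
    detail = str(body.get("detail", "no detail provided"))
    return f"{status_code} ({reason}): {detail}"

print(explain_error(400, {"detail": "stream=true is not yet supported"}))
# 400 (bad request): stream=true is not yet supported
```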