Chat Completions API

v1.0.x · updated 2026-06-26

AiHummer exposes a single OpenAI-compatible HTTP endpoint for text turns: POST /v1/chat/completions. Any client or SDK that already speaks the OpenAI Chat Completions format can talk to AiHummer by changing the base URL and the API key — no AiHummer-specific code is required.

Authentication

Requests are authenticated with a personal API key as a Bearer token. AiHummer keys are prefixed with ah- and are issued from the web admin UI.

POST /v1/chat/completions HTTP/1.1
Host: your-aihummer.example
Authorization: Bearer ah-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Content-Type: application/json

[!TIP] The base URL is your gateway address. In a default install the gateway listens on :8765, so a local call goes to http://localhost:8765/v1/chat/completions. In production, terminate TLS at a reverse proxy placed in front of the gateway.

A basic request

Send a JSON body with messages, exactly as you would to OpenAI. The model field selects the model (or agent) configured on your instance.

curl https://your-aihummer.example/v1/chat/completions \
  -H "Authorization: Bearer ah-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Summarise our refund policy in two sentences." }
    ],
    "temperature": 0.3
  }'
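
The same request can be assembled without an SDK. The following is a minimal Python sketch using only the standard library; the helper name `build_chat_request` and the local base URL are illustrative assumptions, not part of AiHummer. It only builds the request object, so nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, messages, model="default", **params):
    """Return a urllib Request for POST {base_url}/v1/chat/completions.

    Hypothetical helper for illustration; extra keyword arguments
    (e.g. temperature) are copied into the JSON body unchanged.
    """
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "http://localhost:8765",  # assumed default local gateway address
    "ah-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    [{"role": "user", "content": "Summarise our refund policy in two sentences."}],
    temperature=0.3,
)
# To send it: with urllib.request.urlopen(req) as resp: data = json.load(resp)
```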

A non-streaming response follows the familiar Chat Completions shape:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1750000000,
  "model": "default",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Refunds are issued within 14 days..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0 }
}
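
Reading the reply out of that shape takes only a few lines. This is a rough sketch that assumes the response has at least one choice, as in the sample above; `extract_reply` is a hypothetical helper name.

```python
import json


def extract_reply(response_body):
    """Return (assistant_text, finish_reason, usage) from a non-streaming response.

    Assumes the Chat Completions shape shown above with at least one choice.
    """
    data = json.loads(response_body)
    choice = data["choices"][0]
    return (
        choice["message"]["content"],
        choice.get("finish_reason"),
        data.get("usage", {}),
    )


# Sample body mirroring the response shown above.
sample = """
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1750000000,
  "model": "default",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Refunds are issued within 14 days..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
}
"""
text, finish_reason, usage = extract_reply(sample)
```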

Streaming responses (SSE)

Set "stream": true to receive the answer incrementally as Server-Sent Events. Each event carries a chat.completion.chunk delta, and the stream terminates with a final data: [DONE] line.

curl -N https://your-aihummer.example/v1/chat/completions \
  -H "Authorization: Bearer ah-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a one-line greeting." }
    ]
  }'

The response is a text/event-stream:

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}]}

data: [DONE]
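
A client consuming this stream needs to strip the `data:` prefix, stop at `[DONE]`, and concatenate the `delta.content` fragments. The sketch below shows one way to do that in Python over already-decoded text lines; `collect_stream` is a hypothetical helper, and it assumes every non-`[DONE]` data line holds one complete JSON chunk, as in the sample above.

```python
import json


def collect_stream(lines):
    """Join delta content from an iterable of SSE text lines.

    Returns (full_text, finish_reason). Lines that do not start with
    "data:" (blank separators, comments, other SSE fields) are skipped.
    """
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first chunk carries only the role, so content may be absent.
            parts.append(choice.get("delta", {}).get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason


# The sample stream shown above, as text lines.
sample_lines = [
    'data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}',
    "",
    'data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
text, finish_reason = collect_stream(sample_lines)
```

When reading from a live connection with `urllib.request.urlopen`, the response object yields bytes lines, so decode each one as UTF-8 before passing it in.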

[!WARNING] /v1/chat/completions is the only endpoint on the OpenAI-compatible surface. AiHummer does not expose /v1/models and does not expose /v1/embeddings — embeddings are an internal subsystem and are not reachable over HTTP. Do not rely on those routes; they return 404.

Discovery & schema surfaces

While there is no /v1/models listing, AiHummer ships several discovery surfaces so humans and tools can explore the API:

| Path | What it serves |
| --- | --- |
| GET /docs | Human-readable documentation entry point |
| GET /docs/api | Interactive API Explorer |
| GET /docs/openapi.json | OpenAPI 3.x specification |
| GET /openapi.json | OpenAPI 3.x specification (root alias) |
| GET /docs/llm.json | Machine-readable API summary for LLM tooling |
| GET /llms.txt | llms.txt index for LLM agents |

# Fetch the OpenAPI spec
curl https://your-aihummer.example/openapi.json
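
Once the spec is downloaded, a tool can enumerate the documented routes from its standard `paths` object. The sketch below relies only on the general OpenAPI 3.x layout; the `example_spec` dictionary is a hand-written stand-in for illustration, not the actual contents of AiHummer's spec.

```python
# HTTP methods that OpenAPI 3.x allows as keys of a path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec):
    """Return sorted (METHOD, path) pairs found in an OpenAPI 3.x document."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations, key=lambda op: (op[1], op[0]))


# Hypothetical, hand-written fragment used only to exercise the helper.
example_spec = {
    "openapi": "3.0.3",
    "paths": {
        "/v1/ping": {"get": {"summary": "Liveness"}},
        "/v1/chat/completions": {"post": {"summary": "Chat"}, "parameters": []},
    },
}
operations = list_operations(example_spec)
```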

System endpoints

Two lightweight, unauthenticated system endpoints help with liveness and clock checks:

| Method & path | Purpose |
| --- | --- |
| GET /v1/ping | Returns a simple liveness response |
| GET /v1/time | Returns the gateway’s current server time |

curl https://your-aihummer.example/v1/ping
curl https://your-aihummer.example/v1/time
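
A health check can be built on top of /v1/ping. The sketch below is an assumption-laden example: the response body format is not described here, so it treats any HTTP 200 as "alive" and anything else, including connection errors, as "down". The `is_alive` name and the injectable `opener` parameter are hypothetical and exist only to make the function easy to test.

```python
import urllib.error
import urllib.request


def is_alive(base_url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if GET {base_url}/v1/ping answers with HTTP 200.

    Assumption: a 200 status means the gateway is up; the body is ignored.
    """
    url = base_url.rstrip("/") + "/v1/ping"
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False
```

A call such as `is_alive("http://localhost:8765")` would probe a default local install.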

Where to next