Gateway & turn engine — AiHummer Docs

The heart of AiHummer is a single gateway service. It is at the same time the control-plane (admin API, settings, channel wiring, marketplace) and the turn engine (the function-calling loop that produces an answer). A typical deployment is therefore just that one service plus PostgreSQL — there is no separate worker tier you are required to run.

One service, two roles

The gateway listens on :8765 by default, controlled by AIHUMMER_GATEWAY_ADDR. The installer may pick a different free port, but the process is always the same single service.

# gateway.env — the only required setting
AIHUMMER_DATABASE_URL=postgres://user:pass@localhost:5432/aihummer?sslmode=disable
# Admin UI after start at http://localhost:8765/admin/

The only hard dependency is PostgreSQL. Postgres is the single source of truth: agents, settings, conversations, memory, the outbox and audit all live there. Everything else — sidecars, a vector store, a message bus, model providers — is optional and wired in only when you configure it.

[!NOTE] Without a database the gateway starts in health-only mode: it answers GET /healthz so an orchestrator or load balancer can see the process is alive, but it will not serve turns. GET /readyz checks PostgreSQL and returns 503 while the database is unreachable.

What happens at startup

On startup the gateway performs a few steps in a strict order:

opens the database connection pool,
applies any pending migrations under a Postgres advisory lock,
resolves configuration (database value → environment variable → built-in default), and
wires the services — router, orchestrator, channels, tools, memory, delivery — into a running gateway.

The important consequence is that most features are opt-in via a settings key. A capability that is not configured is simply not activated, which keeps the default runtime small and predictable. You turn things on from the web admin UI or with an AIHUMMER_* variable, and the gateway resolves them on the next start (or hot, for the knobs that support it).

The turn engine

When a message reaches the gateway, the turn engine takes over. It runs a function-calling loop: the model is given the system prompt and the conversation, it may call tools (or spawn sub-agents), each tool result is fed back, and the loop continues until the model produces a final answer. That answer is then handed to the delivery layer.

inbound message
   └─▶ turn engine
         ├─ assemble layered system prompt
         ├─ call model ──▶ tool calls / sub-agents ──▶ tool results ─┐
         │       ▲                                                    │
         │       └────────────────────────────────────────────────-─┘
         └─ final answer ─▶ outbox ─▶ originating channel

Because the loop is deterministic about where each input comes from, answers are resolved from the conversation history and from tool results — never by injecting untrusted text into the instructions. That property is what makes the prompt layering below safe as well as fast.

The layered, cache-friendly system prompt

The system prompt is not one blob. It is assembled in layers, deliberately ordered so that the stable parts come first and the volatile parts come last. This matters because model providers cache a prompt by its prefix: as long as the beginning of the prompt is byte-for-byte identical, the cached prefix is reused and only the tail is re-processed.

Zone	Layers (in order)	Changes…
Stable prefix (cacheable)	base identity + tool/memory guide → tenant → project → persona → skills	rarely — per agent/tenant/project
Volatile tail (appended last)	onboarding state → memory hydration → the live date	every turn

The stable prefix carries everything that defines who the agent is: the built-in identity and the guide to how tools and memory work, then the tenant layer, the project layer, the agent’s persona, and the rendered skills block. None of that changes between two consecutive turns of the same agent, so it forms a reusable cached prefix.

The volatile tail is appended after the stable prefix precisely so it never invalidates the cache: onboarding state, the memory hydrated for this specific conversation, and the current date all change turn to turn, but because they sit at the end they only cost what they add.

[!TIP] This ordering is the reason live data such as today’s date can be present in every turn without paying to re-encode the whole identity each time. Keep custom per-agent content in the stable layers (persona, skills) and let the engine own the volatile tail.

Where to next

See how tenants are isolated in Multitenancy & idempotency.
Learn how replies are returned in Delivery, outbox & recovery.
Optional capabilities run as Sidecars.