Gateway & turn engine
The heart of AiHummer is a single gateway service. It is at the same time the control-plane (admin API, settings, channel wiring, marketplace) and the turn engine (the function-calling loop that produces an answer). A typical deployment is therefore just that one service plus PostgreSQL — there is no separate worker tier you are required to run.
One service, two roles
The gateway listens on :8765 by default, controlled by
AIHUMMER_GATEWAY_ADDR. The installer may pick a different free port, but the
process is always the same single service.
# gateway.env — the only required setting
AIHUMMER_DATABASE_URL=postgres://user:pass@localhost:5432/aihummer?sslmode=disable
# Admin UI after start at http://localhost:8765/admin/
The only hard dependency is PostgreSQL. Postgres is the single source of truth: agents, settings, conversations, memory, the outbox and audit all live there. Everything else — sidecars, a vector store, a message bus, model providers — is optional and wired in only when you configure it.
[!NOTE] Without a database the gateway starts in health-only mode: it answers
GET /healthzso an orchestrator or load balancer can see the process is alive, but it will not serve turns.GET /readyzchecks PostgreSQL and returns503while the database is unreachable.
What happens at startup
On startup the gateway performs a few steps in a strict order:
- opens the database connection pool,
- applies any pending migrations under a Postgres advisory lock,
- resolves configuration (database value → environment variable → built-in default), and
- wires the services — router, orchestrator, channels, tools, memory, delivery — into a running gateway.
The important consequence is that most features are opt-in via a settings
key. A capability that is not configured is simply not activated, which keeps
the default runtime small and predictable. You turn things on from the web admin
UI or with an AIHUMMER_* variable, and the gateway resolves them on the next
start (or hot, for the knobs that support it).
The turn engine
When a message reaches the gateway, the turn engine takes over. It runs a function-calling loop: the model is given the system prompt and the conversation, it may call tools (or spawn sub-agents), each tool result is fed back, and the loop continues until the model produces a final answer. That answer is then handed to the delivery layer.
inbound message
└─▶ turn engine
├─ assemble layered system prompt
├─ call model ──▶ tool calls / sub-agents ──▶ tool results ─┐
│ ▲ │
│ └────────────────────────────────────────────────-─┘
└─ final answer ─▶ outbox ─▶ originating channel
Because the loop is deterministic about where each input comes from, answers are resolved from the conversation history and from tool results — never by injecting untrusted text into the instructions. That property is what makes the prompt layering below safe as well as fast.
The layered, cache-friendly system prompt
The system prompt is not one blob. It is assembled in layers, deliberately ordered so that the stable parts come first and the volatile parts come last. This matters because model providers cache a prompt by its prefix: as long as the beginning of the prompt is byte-for-byte identical, the cached prefix is reused and only the tail is re-processed.
| Zone | Layers (in order) | Changes… |
|---|---|---|
| Stable prefix (cacheable) | base identity + tool/memory guide → tenant → project → persona → skills | rarely — per agent/tenant/project |
| Volatile tail (appended last) | onboarding state → memory hydration → the live date | every turn |
The stable prefix carries everything that defines who the agent is: the built-in identity and the guide to how tools and memory work, then the tenant layer, the project layer, the agent’s persona, and the rendered skills block. None of that changes between two consecutive turns of the same agent, so it forms a reusable cached prefix.
The volatile tail is appended after the stable prefix precisely so it never invalidates the cache: onboarding state, the memory hydrated for this specific conversation, and the current date all change turn to turn, but because they sit at the end they only cost what they add.
[!TIP] This ordering is the reason live data such as today’s date can be present in every turn without paying to re-encode the whole identity each time. Keep custom per-agent content in the stable layers (persona, skills) and let the engine own the volatile tail.
Where to next
- See how tenants are isolated in Multitenancy & idempotency.
- Learn how replies are returned in Delivery, outbox & recovery.
- Optional capabilities run as Sidecars.