AiHummer docs

Delivery, outbox & recovery

v1.0.x · updated 2026-06-26

When a turn finishes, the answer still has to reach the channel it came from — reliably, and exactly once. AiHummer handles this with a transactional outbox, idempotent side-effects, and turn recovery after a restart. An optional NATS bus can carry delivery for larger deployments.

Guaranteed delivery

The reply is not sent inline at the end of a turn. Instead it is enqueued in the outbox as part of the same work that produced it, and a delivery worker picks it up and sends it to the originating channel. Because the reply is durably recorded before it leaves, a crash between “answer produced” and “answer sent” cannot lose it — on restart the outbox entry is still there and delivery resumes.

turn produces reply
   └─▶ enqueue in outbox (durable)
         └─▶ delivery worker ─▶ originating channel
               └─ acknowledged ─▶ marked delivered
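
The flow above can be sketched in a few lines. This is a minimal illustration, not AiHummer's actual schema or worker: it uses an in-memory SQLite database in place of PostgreSQL, and the table names, column names, and the `finish_turn` / `deliver_pending` helpers are all hypothetical. The point it shows is that the reply and its outbox row commit in one transaction, and that a failed send leaves the row in place for a later pass.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # Hypothetical schema, for illustration only.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE turns  (id INTEGER PRIMARY KEY, reply TEXT NOT NULL);
        CREATE TABLE outbox (id INTEGER PRIMARY KEY,
                             turn_id INTEGER NOT NULL REFERENCES turns(id),
                             channel TEXT NOT NULL,
                             body TEXT NOT NULL,
                             delivered INTEGER NOT NULL DEFAULT 0);
    """)
    return db


def finish_turn(db: sqlite3.Connection, channel: str, reply: str) -> int:
    # One transaction: the turn result and its outbox row are
    # committed together, or neither is.
    with db:
        cur = db.execute("INSERT INTO turns (reply) VALUES (?)", (reply,))
        turn_id = cur.lastrowid
        db.execute(
            "INSERT INTO outbox (turn_id, channel, body) VALUES (?, ?, ?)",
            (turn_id, channel, reply),
        )
    return turn_id


def deliver_pending(db: sqlite3.Connection, send) -> int:
    # Delivery worker pass: try every undelivered row, mark it only
    # after the channel acknowledged (send returned without raising).
    rows = db.execute(
        "SELECT id, channel, body FROM outbox WHERE delivered = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, channel, body in rows:
        try:
            send(channel, body)
        except Exception:
            continue  # row stays undelivered; a later pass retries it
        with db:
            db.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent
```

Calling `deliver_pending` again after a failed send picks up the same row, which is the at-least-once behaviour the outbox provides on its own.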

Delivery is paired with idempotency so that this at-least-once worker yields exactly-once external behaviour: a resume-stable ledger key and a side-effect barrier ensure a redelivered or replayed reply is sent to the channel only once (see Multitenancy & idempotency).

[!NOTE] “Exactly-once” here describes the externally visible outcome: the user gets the reply once. Internally the worker may attempt delivery more than once; the idempotency layer collapses those attempts to a single visible send.
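
To make the idea concrete, here is a small sketch of a ledger key and a side-effect barrier. It is an assumption-laden illustration: the key format, the `ledger_key` function, and the `SideEffectBarrier` class are invented for this example, and the ledger is an in-memory set where a real deployment would need durable storage.

```python
import hashlib


def ledger_key(turn_id: str, step: str) -> str:
    # Built only from stable identifiers, so a retry or a replay
    # after a restart computes exactly the same key.
    return hashlib.sha256(f"{turn_id}:{step}".encode()).hexdigest()


class SideEffectBarrier:
    def __init__(self) -> None:
        # Hypothetical in-memory ledger; durable storage is assumed
        # in any real system.
        self._done: set[str] = set()

    def run_once(self, key: str, effect) -> bool:
        """Run effect unless this key is already recorded.

        Returns True if the effect ran, False if it was skipped.
        """
        if key in self._done:
            return False
        effect()
        self._done.add(key)
        return True
```

With this shape, a second delivery attempt for the same turn and step calls `run_once` with the same key and is skipped. Note that recording the key after the effect leaves a window if the process dies between the two; how AiHummer closes that window is not described here.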

Turn recovery after a restart

If the gateway restarts mid-turn — a deploy, a crash, a host reboot — the work is not abandoned. Turn recovery replays the interrupted turn so it can complete. This is only safe because side-effects are idempotent: any mail or channel-send that already happened before the restart is recognised by its ledger key and skipped, while the parts that did not happen run to completion. The user sees a single, complete answer rather than a half-finished one or a duplicate.
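
The replay described above can be sketched as follows. This is a toy model under stated assumptions: a turn is represented as an ordered list of named side-effects, the ledger is a plain set, and `replay_turn` is a hypothetical helper, not an AiHummer function.

```python
def replay_turn(turn_id: str, steps, ledger: set) -> list:
    """Run a turn's side-effects, skipping any already in the ledger.

    steps is a list of (name, effect) pairs; ledger holds the keys of
    side-effects that completed before the restart.
    Returns the names of the steps that actually ran this time.
    """
    executed = []
    for name, effect in steps:
        key = f"{turn_id}:{name}"  # hypothetical key format
        if key in ledger:
            continue  # happened before the restart: skip it
        effect()
        ledger.add(key)
        executed.append(name)
    return executed
```

If the first run sends a mail and then the gateway dies before the channel send, replaying the same turn with the same ledger skips the mail and performs only the channel send.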

The optional NATS bus

By default delivery works entirely through PostgreSQL, which keeps a minimal deployment to one gateway plus a database. For larger or multi-gateway setups you can put delivery on a NATS bus by configuring its URL:

# gateway.env
AIHUMMER_NATS_URL=nats://127.0.0.1:4222

The bus is genuinely optional — leave AIHUMMER_NATS_URL unset and the outbox runs without it. When present, NATS becomes the transport over which delivery work flows, which suits horizontally scaled gateways behind a proxy.
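
As a rough illustration of that switch, the sketch below picks a transport from the environment. Only the variable name `AIHUMMER_NATS_URL` comes from this page; the `delivery_transport` function, the accepted URL schemes, and the choice to treat an empty value as unset are assumptions made for the example.

```python
import os
from urllib.parse import urlparse


def delivery_transport(env=None) -> str:
    """Return "nats" when AIHUMMER_NATS_URL is set, else "postgres"."""
    env = os.environ if env is None else env
    url = env.get("AIHUMMER_NATS_URL", "").strip()
    if not url:
        return "postgres"  # default: outbox runs on PostgreSQL alone
    parsed = urlparse(url)
    # Assumed validation: a nats:// or tls:// URL with a host part.
    if parsed.scheme not in ("nats", "tls") or not parsed.hostname:
        raise ValueError(f"not a NATS URL: {url!r}")
    return "nats"
```

Failing fast on a malformed URL is a design choice for the sketch: a typo in `gateway.env` is easier to diagnose at startup than as silently stalled delivery.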

[!TIP] For zero-downtime upgrades, run two or more gateways behind a reverse proxy. The scheduler stays single-leader via a Postgres advisory lock, and idempotent delivery means the extra gateways can share the load without sending anything twice.
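
For readers unfamiliar with advisory locks, here is a minimal sketch of leader election with one. `pg_try_advisory_lock` is a standard PostgreSQL function; everything else is an assumption: the lock id, the `try_become_leader` helper, and the use of a DB-API cursor with `%s` placeholders (as in psycopg) are illustrative, and AiHummer's own lock id and code are not shown on this page.

```python
SCHEDULER_LOCK_ID = 727001  # hypothetical id, chosen for this example


def try_become_leader(cursor, lock_id: int = SCHEDULER_LOCK_ID) -> bool:
    """Try to take a session-level advisory lock without blocking.

    Returns True if this connection now holds the lock (it is the
    leader), False if another session already holds it. PostgreSQL
    releases the lock when the session ends, so a crashed leader
    frees it automatically.
    """
    cursor.execute("SELECT pg_try_advisory_lock(%s)", (lock_id,))
    return bool(cursor.fetchone()[0])
```

Each gateway would call this on a long-lived connection and run the scheduler only while it returns True; the others keep serving turns and delivery.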

What to watch

Delivery is where reliability problems surface first, so it is the thing to monitor. AiHummer pushes telemetry over OTLP (AIHUMMER_OTEL_ENDPOINT) and ships Grafana dashboards; there is no Prometheus /metrics endpoint to scrape.

  • Delivery dispositions — the outcome of each delivery attempt. A rising rate of failures points at a channel or downstream problem.
  • Outbox depth — how many replies are waiting to be delivered. Sustained growth means delivery is falling behind.
  • DLQ depth — entries that exhausted their retries and landed in the dead-letter queue. Any DLQ growth deserves investigation.

[!WARNING] Watch these alongside turn latency and error rate. A growing outbox or DLQ is an early warning that downstream channels are degraded even when turns themselves still succeed.
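
The two depth signals above translate into simple rules. The sketch below is one hedged way to express them over a series of sampled values; the function names, the five-sample window, and the alert wording are all invented for illustration and are not part of AiHummer's dashboards.

```python
def sustained_growth(samples: list, window: int = 5) -> bool:
    # True when the last `window` samples are strictly increasing.
    recent = samples[-window:]
    if len(recent) < window:
        return False
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


def delivery_alerts(outbox_depth: list, dlq_depth: list, window: int = 5) -> list:
    alerts = []
    # Outbox: a single spike is normal; sustained growth is not.
    if sustained_growth(outbox_depth, window):
        alerts.append("outbox depth growing: delivery is falling behind")
    # DLQ: any growth at all deserves a look.
    if len(dlq_depth) >= 2 and dlq_depth[-1] > dlq_depth[-2]:
        alerts.append("DLQ grew: investigate exhausted retries")
    return alerts
```

The asymmetry is deliberate: the outbox rule tolerates short bursts, while the DLQ rule fires on a single new entry.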

Where to next