AiHummer docs

SIP telephony

v1.0.x · updated 2026-06-26

SIP telephony lets an AiHummer agent answer and place real phone calls. The agent listens to the caller, runs a normal turn, and speaks the reply back — a fully voice-driven conversation over the public telephone network.

The channel ships as a connector in the in-product Marketplace and runs natively on the host. Like any other channel, it is configured from the Admin UI.

Operator-neutral by design

The SIP channel is operator-neutral: AiHummer connects to any standard SIP trunk, so there is no carrier lock-in. Bring the provider you already use, or run your own PBX.

Ports and runtime

Port   Purpose
8830   Connector health endpoint
5062   SIP listen
4444   baresip control
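A quick way to see whether the three ports are reachable on a host is a plain TCP probe. The sketch below is illustrative only: the function names are hypothetical, and it assumes each service accepts TCP connections. SIP is often carried over UDP, in which case a closed result for 5062 says nothing about the listener.

```python
import socket

# Port numbers are taken from the table above; the labels are descriptive only.
CONNECTOR_PORTS = {
    8830: "connector health endpoint",
    5062: "SIP listen",
    4444: "baresip control",
}


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe_ports(host: str = "127.0.0.1", ports=CONNECTOR_PORTS) -> dict:
    """Map each port number to True/False depending on TCP reachability."""
    return {port: tcp_port_open(host, port) for port in ports}


if __name__ == "__main__":
    for port, ok in probe_ports().items():
        state = "open" if ok else "closed"
        print(f"{port:>5}  {CONNECTOR_PORTS[port]:<28} {state}")
```

This only confirms that something is listening. It does not check that the health endpoint reports a healthy state, because the endpoint's path and response format are not described here.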

The voice stack is built from baresip for SIP signalling and media, an ALSA loopback device for audio routing, and a Python bridge that connects the call audio to the agent turn.
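To make the bridge's role more concrete, here is a minimal sketch of how a Python process could frame commands for baresip. It assumes that the control port is served by baresip's `ctrl_tcp` module, which exchanges JSON messages framed as netstrings; this page does not confirm which control module the connector uses, and the helper names are hypothetical.

```python
import json


def encode_netstring(payload: bytes) -> bytes:
    """Frame payload as a netstring: b'<length>:<payload>,'."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstring(frame: bytes) -> bytes:
    """Extract the payload from exactly one complete netstring frame."""
    length_text, sep, rest = frame.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("missing or invalid netstring length")
    length = int(length_text)
    if len(rest) != length + 1 or rest[-1:] != b",":
        raise ValueError("length mismatch or missing trailing comma")
    return rest[:length]


def build_command(command: str, params: str = "", token: str = "") -> bytes:
    """Build a JSON command in the shape ctrl_tcp expects (assumed here)."""
    body = {"command": command, "params": params, "token": token}
    return encode_netstring(json.dumps(body).encode("utf-8"))


if __name__ == "__main__":
    # Example frame only; the address is a placeholder and nothing is sent.
    print(build_command("dial", "sip:alice@example.com", token="demo"))
```

The frame would then be written to a TCP socket connected to the control port. Replies arrive in the same netstring framing, so a real client also needs to buffer reads until a complete frame is available.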

In-call capabilities

While a call is live, the agent has call-specific tools and behaviours:

  • ask_assistant — run an agent turn on what the caller just said.
  • send_dtmf — emit DTMF tones (for IVR navigation or keypad input).
  • Recording of the call.
  • A post-call summary generated once the call ends.
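When driving an IVR with send_dtmf, it can help to validate the digit string before it reaches the tool. The sketch below is a hypothetical pre-check, not part of the connector: the argument format that send_dtmf accepts is not specified here, so the helper only enforces the standard 16-symbol DTMF alphabet (0-9, *, #, A-D).

```python
# The 16 symbols of the standard DTMF keypad.
VALID_DTMF = frozenset("0123456789*#ABCD")


def normalize_dtmf(digits: str) -> str:
    """Strip spaces and dashes, upper-case letters, and reject other symbols."""
    cleaned = "".join(
        ch for ch in digits.upper() if not ch.isspace() and ch != "-"
    )
    if not cleaned:
        raise ValueError("empty DTMF sequence")
    bad = sorted(set(cleaned) - VALID_DTMF)
    if bad:
        raise ValueError(f"invalid DTMF symbols: {''.join(bad)}")
    return cleaned


if __name__ == "__main__":
    print(normalize_dtmf("1 2-3#"))
```

Rejecting bad input early keeps a mistyped menu choice from being partially sent to the remote IVR.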

[!NOTE] On this channel the agent supports STT, TTS, barge-in, DTMF and recording only. Speaker diarization, translation and voice cloning are core/sidecar voice features — they are not part of the SIP connector.

Speech engines

Two speech engines are available; choose between them based on your latency and data-sovereignty requirements:

  • yandex — a realtime cloud engine.
  • local — fully self-hosted, using faster-whisper for speech-to-text and edge-tts for text-to-speech.

[!TIP] The local engine keeps voice entirely on your own infrastructure, in line with AiHummer’s no-mandatory-paid-models principle.
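For orientation, the two libraries behind the local engine can be exercised on their own as follows. This is a standalone sketch, not the connector's bridge code: the file names, the model size, the `int8` compute type and the voice name are all assumptions chosen for illustration, and both packages must be installed separately.

```python
import asyncio


def join_segments(segments) -> str:
    """Concatenate the text of transcription segments into one string."""
    return " ".join(seg.text.strip() for seg in segments if seg.text.strip())


def transcribe(wav_path: str, model_size: str = "small") -> str:
    """Speech-to-text with faster-whisper (imported lazily)."""
    from faster_whisper import WhisperModel

    # CPU with int8 quantisation is an assumed, conservative default.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(wav_path)
    return join_segments(segments)


async def synthesize(text: str, out_path: str,
                     voice: str = "en-US-AriaNeural") -> None:
    """Text-to-speech with edge-tts (imported lazily); writes an audio file."""
    import edge_tts

    await edge_tts.Communicate(text, voice).save(out_path)


if __name__ == "__main__":
    # Hypothetical file names for a single recorded utterance and its reply.
    heard = transcribe("caller.wav")
    asyncio.run(synthesize(f"You said: {heard}", "reply.mp3"))
```

In a live call the audio would come from the loopback device as a stream, not from files, so a real bridge would feed audio in chunks and handle barge-in between the two steps.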

Where to next

  • For installing connectors, see Marketplace.
  • For diarization, translation and voice cloning, see the Voice section.
  • For text channels, see Telegram and MAX.