**SIP telephony lets an AiHummer agent answer and place real phone calls.** The
agent listens to the caller, runs a normal turn, and speaks the reply back — a
fully voice-driven conversation over the public telephone network.
The channel ships as a **connector from the in-product
[Marketplace](/en/v1.0/marketplace/overview-tiers)** and runs host-native. It is
configured from the Admin UI, like any other channel.
## Operator-neutral by design
SIP is **operator-neutral**: AiHummer connects to **any standard SIP trunk**,
with **no carrier lock-in**. Bring the provider you already use, or run your own
PBX — the connector does not depend on any single telephony vendor.
## Ports and runtime
| Port | Purpose |
|---|---|
| 8830 | Connector health endpoint |
| 5062 | SIP listen |
| 4444 | baresip control |

The voice stack is built from **baresip** for SIP signalling and media, an
**ALSA loopback** device for audio routing, and a **Python bridge** that connects
the call audio to the agent turn.
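
As a quick sanity check after installing the connector, the ports in the table can be probed from the host. The following is a minimal sketch, not part of the connector: the helper names (`tcp_port_open`, `check_connector_ports`) are hypothetical, and it assumes every port accepts TCP connections. SIP listeners often use UDP, so a failed probe on 5062 does not by itself mean the listener is down.

```python
import socket

# Port numbers are taken from the table above.
# Assumption: each service accepts TCP connections on its port.
CONNECTOR_PORTS = {
    8830: "Connector health endpoint",
    5062: "SIP listen",
    4444: "baresip control",
}


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_connector_ports(host: str = "127.0.0.1", ports=None) -> dict:
    """Map each port number to True (reachable) or False (not reachable)."""
    ports = CONNECTOR_PORTS if ports is None else ports
    return {port: tcp_port_open(host, port) for port in ports}
```

Calling `check_connector_ports()` on the connector host returns a dictionary keyed by port number, which can be printed next to the labels in `CONNECTOR_PORTS` to see which services answer.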
## In-call capabilities
While a call is live, the agent has call-specific tools and behaviours:
- **`ask_assistant`** — run an agent turn on what the caller just said.
- **`send_dtmf`** — emit DTMF tones (for IVR navigation or keypad input).
- **Recording** of the call.
- A **post-call summary** generated once the call ends.
> [!NOTE]
> On this channel the agent supports **STT, TTS, barge-in, DTMF and recording**
> only. Speaker diarization, translation and voice cloning are **core/sidecar**
> voice features — they are **not** part of the SIP connector.
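
Since `send_dtmf` emits keypad tones, it can help to validate a sequence before handing it to the tool. The sketch below is illustrative only: this page does not document the tool's signature, so `normalize_dtmf` is a hypothetical helper that checks a string against the 16 standard DTMF symbols (`0`-`9`, `*`, `#`, `A`-`D`).

```python
import re

# The 16 standard DTMF symbols: digits, star, hash and the A-D column.
_DTMF_PATTERN = re.compile(r"[0-9A-D*#]+")


def normalize_dtmf(sequence: str) -> str:
    """Strip spaces and dashes, upper-case letters, and validate the result.

    Raises ValueError if the cleaned sequence is empty or contains a
    character that is not a DTMF symbol.
    """
    cleaned = "".join(sequence.split()).replace("-", "").upper()
    if not _DTMF_PATTERN.fullmatch(cleaned):
        raise ValueError(f"not a valid DTMF sequence: {sequence!r}")
    return cleaned
```

For example, `normalize_dtmf("1 2-3#")` yields `"123#"`, while `normalize_dtmf("12x")` raises `ValueError` instead of sending an unsupported tone into an IVR menu.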
## Speech engines
Two speech engines are available; choose based on your latency and data-sovereignty requirements:
- **`yandex`** — a realtime cloud engine.
- **`local`** — fully self-hosted, using **faster-whisper** for speech-to-text
and **edge-tts** for text-to-speech.
> [!TIP]
> The `local` engine keeps voice entirely on your own infrastructure, in line
> with AiHummer's no-mandatory-paid-models principle.
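
The trade-off between the two engines can be written down as a small lookup. This is a sketch for illustration, not the connector's configuration format: the `SpeechEngine` type and `get_engine` function are hypothetical, and only the engine names and the `local` components come from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpeechEngine:
    name: str
    self_hosted: bool
    stt: Optional[str]  # speech-to-text component, None if not stated here
    tts: Optional[str]  # text-to-speech component, None if not stated here


# Engine names and the `local` components are from this page;
# the structure itself is an assumption made for this example.
ENGINES = {
    "yandex": SpeechEngine("yandex", self_hosted=False, stt=None, tts=None),
    "local": SpeechEngine(
        "local", self_hosted=True, stt="faster-whisper", tts="edge-tts"
    ),
}


def get_engine(name: str) -> SpeechEngine:
    """Look up an engine by name, rejecting anything other than the two above."""
    try:
        return ENGINES[name]
    except KeyError:
        known = ", ".join(sorted(ENGINES))
        raise ValueError(f"unknown speech engine {name!r}; expected one of: {known}")
```

Rejecting unknown names early keeps a typo in the engine setting from silently falling back to a different engine.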
## Where to next
- To install connectors, see [Marketplace](/en/v1.0/marketplace/overview-tiers).
- For diarization, translation and voice cloning, see the Voice section.
- For text channels, see [Telegram](/en/v1.0/channels/telegram) and
[MAX](/en/v1.0/channels/max).