<p class="doc-plugin-logo"><span class="pl-tile"><img src="/pl/sip.svg" alt="SIP" width="34" height="34" /></span></p>
**SIP Voice puts your agent on the phone.** It answers and places real calls over
any standard SIP trunk, runs a live speech turn during the call, and hands you a
recording and a post-call summary when the call ends. It is **operator-neutral**:
it speaks plain SIP, so there is no carrier lock-in — point it at whatever trunk
your telephony provider gives you.
The plugin runs host-native and stitches together `baresip` (the SIP/RTP
endpoint), an **ALSA loopback** audio path, and a small Python bridge that moves
audio between the call and the gateway's voice turn. It talks to the gateway over
the contract. End-user channel behaviour is documented on the
[SIP](/en/v1.0/channels/sip) channel page; this page covers the plugin itself.
## Facts
| Field | Value |
|---|---|
| Version | 25.1.0 |
| Health port | 8830 |
| SIP listen | 5062 |
| baresip control | 4444 |
| Runtime | baresip + ALSA loopback + Python bridge, host-native |
## What it is
A bridge between a phone call and an AiHummer turn. When a call connects, audio
flows from `baresip` through the ALSA loopback into the Python bridge, which runs
a **speech-to-text → agent turn → text-to-speech** cycle and plays the reply back
into the call. Two speech engines are available:
- **`yandex`** — a realtime cloud speech engine.
- **`local`** — fully self-hosted, using `faster-whisper` for STT and `edge-tts`
for TTS.
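The cycle described above can be sketched roughly as follows. This is a minimal illustration, not the bridge's actual code: `SpeechEngine`, `run_turn`, `fake_engine` and `echo_agent` are hypothetical names, and the stand-in engine simply treats audio bytes as UTF-8 text so the example runs without `faster-whisper`, `edge-tts` or a cloud account.

```python
from dataclasses import dataclass
from typing import Callable

# All names here are hypothetical; they are not the plugin's real API.
Stt = Callable[[bytes], str]    # caller audio -> transcript
Agent = Callable[[str], str]    # transcript -> reply text
Tts = Callable[[str], bytes]    # reply text -> audio to play back


@dataclass
class SpeechEngine:
    """A pluggable STT/TTS pair, e.g. one for `yandex` and one for `local`."""
    name: str
    stt: Stt
    tts: Tts


def run_turn(audio_in: bytes, engine: SpeechEngine, agent: Agent) -> bytes:
    """Run one speech-to-text -> agent turn -> text-to-speech cycle."""
    transcript = engine.stt(audio_in)
    if not transcript.strip():
        return b""  # silence or noise: nothing to answer
    reply = agent(transcript)
    return engine.tts(reply)


# Stand-in engine so the sketch runs without any speech backend.
fake_engine = SpeechEngine(
    name="fake",
    stt=lambda audio: audio.decode("utf-8"),
    tts=lambda text: text.encode("utf-8"),
)


def echo_agent(transcript: str) -> str:
    """Placeholder for the in-call agent turn."""
    return f"You said: {transcript}"
```

Keeping the engine behind two plain callables is one way to let the same turn loop serve both engines; the real bridge may be structured differently.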
## How it is used
During a call the agent can do more than just talk:
- **`ask_assistant`** — run an in-call agent turn to answer the caller.
- **`send_dtmf`** — emit DTMF tones (e.g. to navigate an IVR or enter a code).
- **recording** — capture the call audio.
- **barge-in** — let the caller interrupt the agent's speech naturally.
- **post-call summary** — produce a summary once the call ends.
```text
caller ─▶ SIP trunk ─▶ baresip ─▶ ALSA loopback ─▶ bridge ─▶ STT ─▶ agent turn ─▶ TTS ─▶ caller
```
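Because `send_dtmf` takes input chosen by the agent, it may be worth validating the sequence before any tones are emitted. The sketch below assumes the tool receives a string of symbols; `normalize_dtmf` and `DTMF_SYMBOLS` are hypothetical helpers, not part of the plugin. The 16-symbol alphabet (digits, `*`, `#`, `A` to `D`) is the standard DTMF set.

```python
# Hypothetical helper: the real send_dtmf tool's signature is not documented here.
DTMF_SYMBOLS = frozenset("0123456789*#ABCD")


def normalize_dtmf(digits: str) -> str:
    """Return an upper-cased DTMF sequence, or raise ValueError on invalid input."""
    cleaned = "".join(digits.split()).upper()  # drop all whitespace
    if not cleaned:
        raise ValueError("empty DTMF sequence")
    bad = sorted(set(cleaned) - DTMF_SYMBOLS)
    if bad:
        raise ValueError(f"invalid DTMF symbols: {''.join(bad)}")
    return cleaned
```

For example, `normalize_dtmf("1 2 3#")` yields `"123#"`, while `normalize_dtmf("12x")` raises `ValueError` instead of sending a partial sequence into an IVR.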
## Feature scope
> [!WARNING]
> The SIP plugin supports **STT, TTS, barge-in, DTMF and recording only.** It
> does **not** provide speaker diarization, speech translation or voice cloning.
> Those are separate core/sidecar capabilities (see
> [Diarization, translation & cloning](/en/v1.0/voice/diarization-translation-clone))
> and are not part of the SIP call path. Do not assume them on a phone call.
## Installation
Install SIP Voice in one click from the marketplace in the admin UI. The
host-native deployer downloads the plugin, runs its install step, renders a
sandboxed systemd unit and waits for the health endpoint before marking it ready
— see [Install & updates](/en/v1.0/marketplace/install-updates). You then
configure your SIP trunk credentials and choose the `yandex` or `local` engine.
> [!TIP]
> For a fully self-hosted, no-paid-models setup, choose the **`local`** engine:
> `faster-whisper` and `edge-tts` keep the entire voice path on your own host.
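If you want to reproduce the deployer's readiness check by hand, a polling loop along these lines is one option. This is a sketch under assumptions: the port 8830 comes from the Facts table, but the `/health` path, the retry counts and the function names are guesses for illustration, not the deployer's actual behaviour.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable

# Port from the Facts table; the "/health" path is an assumption.
HEALTH_URL = "http://127.0.0.1:8830/health"


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False  # refused, timed out, or non-2xx response


def wait_until_healthy(
    url: str = HEALTH_URL,
    attempts: int = 30,
    interval: float = 1.0,
    probe: Callable[[str], bool] = http_probe,
) -> bool:
    """Poll the probe until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(interval)  # no sleep after the final attempt
    return False
```

Calling `wait_until_healthy()` after an install returns `True` once the endpoint responds. The `probe` parameter is injectable so the loop can be exercised without a running plugin.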
## Security and limits
- **Operator-neutral trunk.** Standard SIP, no carrier lock-in.
- **Scope is STT/TTS/DTMF/recording/barge-in.** No diarization, translation or
voice cloning on the call path.
- **Host-native.** `baresip`, the loopback and the bridge run under systemd, not
in a container.
- **Recording is explicit.** Recording and post-call summaries are call features
you enable — handle call audio per your local consent/retention rules.
## Where to next
- Channel: [SIP](/en/v1.0/channels/sip)
- [Speech in/out (STT/TTS)](/en/v1.0/voice/stt-tts)
- [Install & updates](/en/v1.0/marketplace/install-updates)