AiHummer docs

SIP Voice plugin

v1.0.x · updated 2026-06-26

SIP Voice puts your agent on the phone. It answers and places real calls over any standard SIP trunk, runs a live speech turn during the call, and hands you a recording and a post-call summary when the call ends. It is operator-neutral: it speaks plain SIP, so there is no carrier lock-in — point it at whatever trunk your telephony provider gives you.

The plugin runs host-native and stitches together baresip (the SIP/RTP endpoint), an ALSA loopback audio path, and a small Python bridge that moves audio between the call and the gateway’s voice turn. It talks to the gateway over the contract. End-user channel behaviour is documented on the SIP channel page; this page covers the plugin itself.

Facts

Field            Value
Version          25.1.0
Health port      8830
SIP listen       5062
baresip control  4444
Runtime          baresip + ALSA loopback + Python bridge, host-native
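The control port above matches the default of baresip's ctrl_tcp module, which exchanges JSON commands framed as netstrings (`<length>:<payload>,`). Whether the bridge actually drives baresip through ctrl_tcp is an assumption, as are the helper names below; the sketch only illustrates the framing, not the plugin's own code.

```python
import json
from typing import Optional


def encode_netstring(payload: bytes) -> bytes:
    """Frame raw bytes as a netstring: b"<len>:<payload>,"."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def encode_command(command: str, params: Optional[str] = None,
                   token: Optional[str] = None) -> bytes:
    """Build a JSON command in the shape ctrl_tcp expects and frame it.

    Hypothetical helper: the field names follow baresip's ctrl_tcp
    convention (command / params / token), not anything stated on this page.
    """
    message = {"command": command}
    if params is not None:
        message["params"] = params
    if token is not None:
        message["token"] = token
    return encode_netstring(json.dumps(message).encode("utf-8"))


def decode_netstring(data: bytes) -> tuple[bytes, bytes]:
    """Split one netstring off the front of data; return (payload, rest)."""
    length_str, sep, rest = data.partition(b":")
    if not sep or not length_str.isdigit():
        raise ValueError("missing netstring length prefix")
    length = int(length_str)
    if len(rest) < length + 1 or rest[length:length + 1] != b",":
        raise ValueError("truncated or malformed netstring")
    return rest[:length], rest[length + 1:]
```

A frame built with `encode_command("hangup", token="t1")` could then be written to a TCP socket connected to `127.0.0.1:4444`, assuming the control interface is bound to localhost.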

What it is

A bridge between a phone call and an AiHummer turn. When a call connects, audio flows from baresip through the ALSA loopback into the Python bridge, which runs a speech-to-text → agent turn → text-to-speech cycle and plays the reply back into the call. Two speech engines are available:

  • yandex — a realtime cloud speech engine.
  • local — fully self-hosted, using faster-whisper for STT and edge-tts for TTS.
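The speech-to-text → agent turn → text-to-speech cycle can be pictured as three pluggable callables, so that swapping yandex for local only changes which functions are wired in. This is a minimal sketch under that assumption; `VoiceTurn` and its field names are hypothetical and do not describe the bridge's real interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """One STT -> agent -> TTS cycle with interchangeable engines (hypothetical)."""
    transcribe: Callable[[bytes], str]   # STT: caller audio -> text
    ask_agent: Callable[[str], str]      # agent turn: text -> reply text
    synthesize: Callable[[str], bytes]   # TTS: reply text -> audio

    def run(self, caller_audio: bytes) -> bytes:
        text = self.transcribe(caller_audio)
        if not text.strip():
            # Nothing intelligible was heard, so there is nothing to answer.
            return b""
        reply = self.ask_agent(text)
        return self.synthesize(reply)


# Usage with stand-in engines; real engines would wrap an STT/TTS backend.
turn = VoiceTurn(
    transcribe=lambda audio: audio.decode("utf-8"),
    ask_agent=lambda text: f"You said: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)
reply_audio = turn.run(b"hello")
```

Keeping the engines behind plain callables means the call-handling code does not need to know whether speech is processed in the cloud or on the host.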

How it is used

During a call the agent can do more than just talk:

  • ask_assistant — run an in-call agent turn to answer the caller.
  • send_dtmf — emit DTMF tones (e.g. to navigate an IVR or enter a code).
  • recording — capture the call audio.
  • barge-in — let the caller interrupt the agent’s speech naturally.
  • post-call summary — produce a summary once the call ends.

caller ─▶ SIP trunk ─▶ baresip ─▶ ALSA loopback ─▶ bridge ─▶ STT ─▶ agent turn ─▶ TTS ─▶ caller
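For readers unfamiliar with DTMF, each key is the sum of one low-group and one high-group sine tone. The sketch below synthesizes such a tone as 16-bit samples. How send_dtmf actually emits digits is not stated on this page (SIP endpoints often send out-of-band telephone events instead of in-band audio), so `dtmf_samples` and its defaults are purely illustrative.

```python
import math

# Standard DTMF keypad: key -> (low-group Hz, high-group Hz).
DTMF_FREQS = {
    "1": (697, 1209), "2": (697, 1336), "3": (697, 1477), "A": (697, 1633),
    "4": (770, 1209), "5": (770, 1336), "6": (770, 1477), "B": (770, 1633),
    "7": (852, 1209), "8": (852, 1336), "9": (852, 1477), "C": (852, 1633),
    "*": (941, 1209), "0": (941, 1336), "#": (941, 1477), "D": (941, 1633),
}


def dtmf_samples(digit: str, duration_ms: int = 100,
                 sample_rate: int = 8000, amplitude: float = 0.4) -> list[int]:
    """Return signed 16-bit PCM samples for one DTMF key.

    amplitude is per tone; two tones at 0.4 peak at 0.8 of full scale,
    so the sum never clips.
    """
    low, high = DTMF_FREQS[digit.upper()]  # KeyError for non-DTMF keys
    count = sample_rate * duration_ms // 1000
    samples = []
    for i in range(count):
        t = i / sample_rate
        value = amplitude * (math.sin(2 * math.pi * low * t)
                             + math.sin(2 * math.pi * high * t))
        samples.append(int(value * 32767))
    return samples


tone = dtmf_samples("5")
```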

Feature scope

[!WARNING] The SIP plugin supports STT, TTS, barge-in, DTMF and recording only. It does not provide speaker diarization, speech translation or voice cloning. Those are separate core/sidecar capabilities (see Diarization, translation & cloning) and are not part of the SIP call path. Do not assume them on a phone call.

Installation

Install SIP Voice in one click from the marketplace in the admin UI. The host-native deployer downloads the plugin, runs its install step, renders a sandboxed systemd unit and waits for the health endpoint before marking it ready — see Install & updates. You then configure your SIP trunk credentials and choose the yandex or local engine.
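The "wait for the health endpoint" step is essentially a poll with a deadline. The following is a rough sketch of that pattern, not the deployer's code: `wait_until_ready` and `tcp_probe` are hypothetical names, and probing with a bare TCP connect is an assumption made because the health endpoint's protocol and path are not given here.

```python
import socket
import time
from typing import Callable


def tcp_probe(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def wait_until_ready(probe: Callable[[], bool],
                     timeout_s: float = 30.0,
                     interval_s: float = 0.5,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe until it returns True or timeout_s elapses.

    The probe is always tried at least once; clock and sleep are
    injectable so the loop can be tested without real delays.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

With the port from the Facts table, a readiness check might look like `wait_until_ready(lambda: tcp_probe("127.0.0.1", 8830))`, assuming the health port listens on localhost.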

[!TIP] For a fully self-hosted, no-paid-models setup, choose the local engine: faster-whisper and edge-tts keep the entire voice path on your own host.

Security and limits

  • Operator-neutral trunk. Standard SIP, no carrier lock-in.
  • Scope is STT/TTS/DTMF/recording/barge-in. No diarization, translation or voice cloning on the call path.
  • Host-native. baresip, the loopback and the bridge run under systemd, not in a container.
  • Recording is explicit. Recording and post-call summaries are call features you enable — handle call audio per your local consent/retention rules.
