feat(voice): DAVE E2E + full voice UX (squash of voice/dave-recv)

Squashed branch: voice/dave-recv → master. Closes Step 12 (DAVE E2E) and lands
voice-mode UX polish + verbal voice control on top of the Steps 1-10 scaffolding
already on master.

## DAVE E2E receive-side decrypt (e4f3177)

Vendored fork: discord-ext-voice-recv 0.5.3a+echo.dave1. Patches the receive
pipeline to handle Discord's mandatory DAVE encryption on voice gateway v=8.
- `_maybe_dave_decrypt`: uses davey.can_passthrough(user_id) as primary gate,
  falls through to dave.decrypt for DAVE-epoch peers, drops on decrypt failure
  without killing the reader thread.
- VAD fix: silero-vad v5+ requires exactly 512 samples per call; our 100ms
  window (1600 samples at 16 kHz) raised a ValueError that was swallowed, so
  STT never fired. The window is now sliced into 512-sample chunks.
- Whisper: bumped beam_size 1→5 and added a Romanian (RO) initial_prompt.
- Tests: 9 DAVE unit tests + 2 callback integration tests + contract test
  with fork-version guard.
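
The gate order described above can be sketched as a standalone function. This is a minimal sketch inferred from the bullet list and the unit tests in this commit, not the fork's actual code: the function name, its flat argument list, and the `has_dave` flag are hypothetical stand-ins for the reader method and its module-level state.

```python
from typing import Any, Dict, Optional


def maybe_dave_decrypt_sketch(
    connection: Any,
    ssrc_to_id: Dict[int, int],
    ssrc: int,
    payload: bytes,
    media_type: Any,
    has_dave: bool = True,
) -> Optional[bytes]:
    """Return the bytes to route onward, or None to drop the packet."""
    session = getattr(connection, "dave_session", None)
    version = getattr(connection, "dave_protocol_version", 0)

    # DAVE unavailable or inactive: hand back the transport-decrypted payload.
    if not has_dave or version == 0 or session is None or not session.ready:
        return payload

    # No user id for this SSRC yet, so no per-user key can be selected: drop.
    user_id = ssrc_to_id.get(ssrc)
    if user_id is None:
        return None

    # Primary gate: passthrough peers send packets without the DAVE wrapper.
    try:
        if session.can_passthrough(user_id):
            return payload
    except Exception:
        pass  # fall through and let decrypt decide

    try:
        return session.decrypt(user_id, media_type, payload)
    except Exception:
        # Drop this packet only; the caller keeps its reader thread alive.
        return None
```

Returning `None` for "drop" and the untouched payload for "bypass" keeps the caller's decision to a single `is None` check.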

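The VAD fix comes down to simple arithmetic: a 1600-sample window holds three 512-sample chunks with 64 samples left over. A minimal sketch of one way to slice it follows; the `VadChunker` name and the choice to carry the remainder into the next window are assumptions, since the commit only states that the window is sliced.

```python
from typing import List, Sequence

VAD_CHUNK_SAMPLES = 512  # chunk size the VAD model accepts per call


class VadChunker:
    """Re-slices arbitrary-length sample windows into fixed-size chunks."""

    def __init__(self, chunk_samples: int = VAD_CHUNK_SAMPLES) -> None:
        self._chunk = chunk_samples
        self._pending: List[float] = []

    def feed(self, samples: Sequence[float]) -> List[List[float]]:
        """Add a window and return every complete chunk now available."""
        self._pending.extend(samples)
        chunks = []
        while len(self._pending) >= self._chunk:
            chunks.append(self._pending[: self._chunk])
            del self._pending[: self._chunk]
        return chunks

    @property
    def pending(self) -> int:
        """Samples held back until the next window completes a chunk."""
        return len(self._pending)


chunker = VadChunker()
first = chunker.feed([0.0] * 1600)   # one 100 ms window
second = chunker.feed([0.0] * 1600)  # the leftover is carried into this one
```

Each returned chunk can then be passed to the VAD call on its own, so the model never sees a window of the wrong length.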
## Voice UX polish (d1bc77e)

- Removed the 3s "mă gândesc" ("I'm thinking") filler; it always collided with Claude's reply (p50 4-7s).
- Barge-in via `ttsq.clear()` at top of `on_segment_done`.
- DTX silence-flush poller (200ms tick) — Discord stops sending RTP packets
  when silent, so the inline silence-check in sink.write() never fired for
  trailing audio; background thread handles it.
- `EchoStreamingAudioSource.read()` is now non-blocking: the old
  `get_frame(timeout=0.1)` broke Discord's 20ms frame cadence, and the client
  rendered the resulting bursts as stuttering (Marius heard "4 de minute"
  instead of the full sentence).
- RO time expansion: 23:09 → "douăzeci și trei și nouă minute".
- Supertonic Unicode sanitize centralized in tools/tts.py.
- Whisper local_files_only=True — no HF metadata GET on each startup.
- Diagnostic logging through sink → VAD → Claude stream → TTS chain.
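
The non-blocking `read()` change can be illustrated with a queue-backed source. This is a minimal sketch, not the actual `EchoStreamingAudioSource`: the class name is hypothetical, and returning a silence frame on underrun is an assumption about how the gap is filled. The 3840-byte frame is 20ms of 48 kHz stereo 16-bit PCM.

```python
import queue

FRAME_BYTES = 3840  # 48000 Hz * 0.02 s * 2 channels * 2 bytes per sample
SILENCE_FRAME = b"\x00" * FRAME_BYTES


class NonBlockingFrameSource:
    """Hands out one frame per call without ever waiting on the producer."""

    def __init__(self) -> None:
        self._frames: "queue.Queue[bytes]" = queue.Queue()

    def push(self, frame: bytes) -> None:
        """Called by the TTS producer as frames become available."""
        self._frames.put(frame)

    def read(self) -> bytes:
        """Called by the player every 20 ms; returns immediately."""
        try:
            return self._frames.get_nowait()
        except queue.Empty:
            # Assumption: fill the gap with silence so the cadence holds.
            return SILENCE_FRAME
```

Because `read()` never waits, a slow producer costs one silent frame instead of delaying every frame queued behind it.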

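The DTX silence-flush poller can be sketched as a small class that the sink notifies on every packet and a background thread ticks every 200ms. This is an illustrative sketch only: the class name, the 0.8s silence threshold, and the injectable clock are assumptions, not values taken from the commit.

```python
import threading
import time
from typing import Callable, Optional


class SilenceFlushPoller:
    """Flushes a pending segment once no packet has arrived for a while."""

    def __init__(
        self,
        on_flush: Callable[[], None],
        silence_s: float = 0.8,   # hypothetical threshold
        tick_s: float = 0.2,      # 200 ms tick, as in the commit message
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._on_flush = on_flush
        self._silence_s = silence_s
        self._tick_s = tick_s
        self._clock = clock
        self._last_packet_at: Optional[float] = None
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def note_packet(self) -> None:
        """Called from the sink's write path for every received packet."""
        with self._lock:
            self._last_packet_at = self._clock()

    def tick(self) -> bool:
        """Flush if the silence threshold has passed; True when flushed."""
        with self._lock:
            last = self._last_packet_at
            if last is None or self._clock() - last < self._silence_s:
                return False
            self._last_packet_at = None  # flush once per segment
        self._on_flush()
        return True

    def start(self) -> threading.Thread:
        """Run tick() on a daemon thread until stop() is called."""
        def loop() -> None:
            while not self._stop.wait(self._tick_s):
                self.tick()
        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._stop.set()
```

Keeping the decision in `tick()` and the timing in `start()` lets the logic be tested with a fake clock, without sleeping.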
## Voice mode iteration (e589e48)

- `personality/VOICE_MODE.md` — voice-tailored system prompt (short, no
  markdown, no abbreviations, time without seconds, distances in
  "mii"/"milioane"); plumbed via build_system_prompt(voice_mode=True).
- Isolated voice session key `voice:<channel_id>` — voice doesn't share
  context with text adapter on the same channel; auto-applied without
  /clear ceremony. /clear drops both keys.
- Metric units + Romanian thousands (normalize.py): "384.000 km" →
  "trei sute optzeci și patru de mii de kilometri" with feminine-correct
  pluralization and "de" particle for ≥20.
- `/voice setvoice <M1-F5>` slash command with native autocomplete; swaps
  live + persists voice.default_voice to config.json.
- Verbal voice change (src/voice/voice_commands.py + 29 tests) — "schimbă
  vocea pe M5", "voce em cinci", with permissive substring fallback for
  Whisper-mangled forms like "Mâcinci"=M5 and "unul cinci"=M5. Whisper
  initial_prompt now lists voice vocabulary to bias STT toward clean
  outputs.
- Fast barge-in: when VAD fires on ≥2 consecutive windows (~200ms) for
  Marius's user while Echo still has pending TTS frames, Echo's playback is
  cut mid-sentence so the user doesn't wait out the full silence + STT
  cycle. Acoustic echo bleed-through still requires headphones (no AEC).
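
The fast barge-in rule above reduces to a streak counter over VAD windows. A minimal sketch, assuming one boolean VAD result per ~100ms window; the `BargeInDetector` name and its interface are hypothetical.

```python
class BargeInDetector:
    """Signals a barge-in after N consecutive speech windows during TTS."""

    def __init__(self, required_windows: int = 2) -> None:
        self._required = required_windows
        self._streak = 0

    def observe(self, is_speech: bool, tts_pending: bool) -> bool:
        """Feed one VAD window; True means playback should be cut now."""
        if not (is_speech and tts_pending):
            self._streak = 0  # any gap resets the count
            return False
        self._streak += 1
        if self._streak >= self._required:
            self._streak = 0  # fire once, then start counting again
            return True
        return False
```

Requiring two windows in a row filters out single-window VAD blips, at the cost of roughly 200ms of extra latency before playback is cut.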

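The verbal voice change can be approximated with two regular expressions: one for a trigger word and one for a gender token followed by a number from 1 to 5. This is a simplified sketch, not the code in `src/voice/voice_commands.py`; the function name and patterns are assumptions, and the permissive fallback for Whisper-mangled forms such as "Mâcinci" is deliberately left out.

```python
import re
from typing import Optional

# Hypothetical trigger list, chosen to cover the phrasings quoted above.
_TRIGGER = re.compile(r"\b(?:voce|vocea|voice|treci pe|schimbă)\b")

_NUMBER_WORDS = {"unu": 1, "una": 1, "doi": 2, "două": 2,
                 "trei": 3, "patru": 4, "cinci": 5}

# Gender token (letter, spelled letter, or adjective) then a digit or word.
_VOICE_ID = re.compile(
    r"\b(m|f|em|ef|masculin\w*|feminin\w*)\s*"
    r"(\d+|unu|una|doi|două|trei|patru|cinci)\b"
)


def detect_voice_change_sketch(text: str) -> Optional[str]:
    """Return a voice id such as 'M5', or None when no command is found."""
    lowered = text.lower()
    if not _TRIGGER.search(lowered):
        return None
    for match in _VOICE_ID.finditer(lowered):
        gender_token, number_token = match.groups()
        if number_token.isdigit():
            number = int(number_token)
        else:
            number = _NUMBER_WORDS[number_token]
        if not 1 <= number <= 5:
            continue  # ids outside M1-M5 / F1-F5 are ignored
        is_male = gender_token in ("m", "em") or gender_token.startswith("masculin")
        return ("M" if is_male else "F") + str(number)
    return None
```

Requiring a trigger word before looking for an id is what keeps a bare "M5" or "sunt în M3" from switching voices.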
## Test suite

130 voice + router tests pass (test_voice_recv_dave, test_voice_session_cleanup,
test_voice_adapter_contract, test_voice_normalize, test_voice_commands,
test_router).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 21:00:27 +00:00
parent 13931db953
commit 4be70440e8
18 changed files with 1118 additions and 140 deletions


@@ -30,7 +30,10 @@ class TestClearCommand:
         response, is_cmd = route_message("ch-1", "user-1", "/clear")
         assert response == "Session cleared. Model reset to sonnet."
         assert is_cmd is True
-        mock_clear.assert_called_once_with("ch-1")
+        # /clear drops both the text-adapter session and the isolated voice
+        # session for the same Discord channel.
+        mock_clear.assert_any_call("ch-1")
+        mock_clear.assert_any_call("voice:ch-1")

     @patch("src.router._get_config")
     @patch("src.router.clear_session")
@@ -191,7 +194,7 @@ class TestRegularMessage:
         response, is_cmd = route_message("ch-1", "user-1", "hello")
         assert response == "Hello from Claude!"
         assert is_cmd is False
-        mock_send.assert_called_once_with("ch-1", "hello", model="sonnet", on_text=None)
+        mock_send.assert_called_once_with("ch-1", "hello", model="sonnet", on_text=None, voice_mode=False)

     @patch("src.router.send_message")
     def test_model_override(self, mock_send):
@@ -199,7 +202,7 @@ class TestRegularMessage:
         response, is_cmd = route_message("ch-1", "user-1", "hello", model="opus")
         assert response == "Response"
         assert is_cmd is False
-        mock_send.assert_called_once_with("ch-1", "hello", model="opus", on_text=None)
+        mock_send.assert_called_once_with("ch-1", "hello", model="opus", on_text=None, voice_mode=False)

     @patch("src.router._get_channel_config")
     @patch("src.router._get_config")
@@ -227,7 +230,7 @@ class TestRegularMessage:
         cb = lambda t: None
         route_message("ch-1", "user-1", "hello", on_text=cb)
-        mock_send.assert_called_once_with("ch-1", "hello", model="sonnet", on_text=cb)
+        mock_send.assert_called_once_with("ch-1", "hello", model="sonnet", on_text=cb, voice_mode=False)

 # --- _get_channel_config ---
@@ -269,7 +272,7 @@ class TestModelResolution:
         mock_chan_cfg.return_value = {"id": "ch-1", "default_model": "haiku"}
         route_message("ch-1", "user-1", "hello")
-        mock_send.assert_called_once_with("ch-1", "hello", model="haiku", on_text=None)
+        mock_send.assert_called_once_with("ch-1", "hello", model="haiku", on_text=None, voice_mode=False)

     @patch("src.router._get_channel_config")
     @patch("src.router._get_config")
@@ -283,7 +286,7 @@ class TestModelResolution:
         mock_get_config.return_value = mock_cfg
         route_message("ch-1", "user-1", "hello")
-        mock_send.assert_called_once_with("ch-1", "hello", model="opus", on_text=None)
+        mock_send.assert_called_once_with("ch-1", "hello", model="opus", on_text=None, voice_mode=False)

     @patch("src.router._get_channel_config")
     @patch("src.router._get_config")
@@ -297,7 +300,7 @@ class TestModelResolution:
         mock_get_config.return_value = mock_cfg
         route_message("ch-1", "user-1", "hello")
-        mock_send.assert_called_once_with("ch-1", "hello", model="sonnet", on_text=None)
+        mock_send.assert_called_once_with("ch-1", "hello", model="sonnet", on_text=None, voice_mode=False)

     @patch("src.router.get_active_session")
     @patch("src.router.send_message")
@@ -307,4 +310,4 @@ class TestModelResolution:
         mock_get_session.return_value = {"model": "opus", "session_id": "abc"}
         route_message("ch-1", "user-1", "hello")
-        mock_send.assert_called_once_with("ch-1", "hello", model="opus", on_text=None)
+        mock_send.assert_called_once_with("ch-1", "hello", model="opus", on_text=None, voice_mode=False)


@@ -169,3 +169,54 @@ def test_voice_data_has_opus_property():
    opus_attr = inspect.getattr_static(VoiceData, "opus", None)
    assert isinstance(opus_attr, property), "VoiceData.opus must be a property"


# --- Echo-core DAVE-decrypt fork guards -------------------------------------
#
# Two contract tests pinned by the DAVE receive-side decrypt patch.
# See plan: /home/moltbot/.claude/plans/wiggly-exploring-glade.md
#
# These fail fast on either:
#   1. An upstream voice-recv re-install wiping the fork's version marker
#      (i.e. our patch is gone), OR
#   2. A discord.py upgrade renaming the connection-level DAVE attrs the
#      patch reads (`dave_session`, `dave_protocol_version`).


def test_voice_recv_fork_version():
    """Echo-core fork tag for the DAVE-decrypt patch.

    Lane A bumps `voice_recv.__version__` to `'0.5.3a+echo.dave1'` (PEP 440
    local segment). If this assertion fails after a vendor reinstall, the
    fork patch has been lost: re-apply `_maybe_dave_decrypt` + the
    `callback()` hook before deploying, or live voice will regress to the
    `opus_decode: corrupted stream` error chain.
    """
    from discord.ext import voice_recv

    assert voice_recv.__version__ == "0.5.3a+echo.dave1", (
        f"voice_recv.__version__ is {voice_recv.__version__!r}; expected "
        "'0.5.3a+echo.dave1'. The DAVE-decrypt fork patch has been "
        "overwritten - re-apply before reinstalling the vendored package."
    )


def test_voice_connection_state_has_dave_attrs():
    """`_maybe_dave_decrypt` reads `dave_session` and `dave_protocol_version`
    off the discord.py `VoiceConnectionState`. If a future discord.py upgrade
    renames either attr, fail loudly here rather than in a live voice call
    (where the symptom is silent packet drops).
    """
    from discord import voice_state

    src = inspect.getsource(voice_state.VoiceConnectionState)
    assert "dave_session" in src, (
        "discord.voice_state.VoiceConnectionState source no longer mentions "
        "'dave_session' - discord.py may have renamed the attr. Update "
        "vendor/discord-ext-voice-recv/.../reader.py::_maybe_dave_decrypt."
    )
    assert "dave_protocol_version" in src, (
        "discord.voice_state.VoiceConnectionState source no longer mentions "
        "'dave_protocol_version' - discord.py may have renamed the attr. "
        "Update _maybe_dave_decrypt accordingly."
    )


@@ -0,0 +1,55 @@
"""Tests for src/voice/voice_commands.detect_voice_change."""
from __future__ import annotations

import pytest

from src.voice.voice_commands import detect_voice_change


class TestDetectVoiceChange:
    # --- positive cases (direct form) ---
    @pytest.mark.parametrize("text,expected", [
        ("schimbă vocea pe M5", "M5"),
        ("Schimbă vocea pe F3.", "F3"),
        ("vorbește cu vocea M1", "M1"),
        ("vorbește cu vocea F2", "F2"),
        ("voce M4", "M4"),
        ("Voce F5.", "F5"),
        ("treci pe vocea F1", "F1"),
        ("Echo, treci pe M2.", "M2"),
        ("voice M3", "M3"),
    ])
    def test_direct_form(self, text, expected):
        assert detect_voice_change(text) == expected

    # --- positive cases (word form, what Whisper actually produces) ---
    @pytest.mark.parametrize("text,expected", [
        ("schimbă vocea pe em cinci", "M5"),
        ("vorbește cu vocea em trei", "M3"),
        ("voce em unu", "M1"),
        ("schimbă vocea pe ef doi", "F2"),
        ("voce ef cinci", "F5"),
        ("vorbește cu vocea masculină cinci", "M5"),
        ("schimbă vocea pe feminină trei", "F3"),
        ("voce masculin patru", "M4"),
        ("schimbă vocea pe M cinci", "M5"),
        ("voce F două", "F2"),
    ])
    def test_word_form(self, text, expected):
        assert detect_voice_change(text) == expected

    # --- negative cases ---
    @pytest.mark.parametrize("text", [
        "",
        "cât este ora",
        "M5",  # no trigger word
        "Salut Echo, sunt în M3",  # M3 here is a location/etc, no trigger
        "vocea ta este foarte bună",  # trigger but no voice id
        "schimbă te rog",  # trigger but no id
        "voce M6",  # out of range
        "voce M0",  # out of range
        "voce F8",  # out of range
        "schimbă vocea pe șapte",  # digit out of range
    ])
    def test_no_match(self, text):
        assert detect_voice_change(text) is None


@@ -0,0 +1,302 @@
"""DAVE receive-side decrypt tests for the vendored voice-recv fork.

Exercises Lane A's patch on
`vendor/discord-ext-voice-recv/discord/ext/voice_recv/reader.py`:

  * `_maybe_dave_decrypt(rtp_packet)`: DAVE E2E layer sandwiched between the
    transport-layer decrypt and the routing into the opus decoder. No-op when
    the room is non-DAVE or when davey isn't installed; drops the packet when
    the SSRC map hasn't caught up to a new speaker yet.
  * `callback()` hook: feeds the DAVE-unwrapped plaintext into
    `packet_router.feed_rtp()` on success, drops the packet on failure WITHOUT
    killing the reader thread.

The test fixtures mirror `tests/test_voice_session_cleanup.py:33-54`:

  * Construct `AudioReader` via `AudioReader.__new__(AudioReader)` + manual
    attr set so the reader thread is never started.
  * `MagicMock` everything below the unit under test.

`_HAS_DAVE` / `_MEDIA_TYPE_AUDIO` on the reader module are monkey-patched per
test so the suite passes whether or not `davey` is importable in the venv.
The assertions only become meaningful once Lane A's patch has landed and the
package has been re-installed (`pip install -e vendor/discord-ext-voice-recv
--force-reinstall`); the FILE itself is valid Python regardless.

See plan: /home/moltbot/.claude/plans/wiggly-exploring-glade.md
"""
from __future__ import annotations

from unittest.mock import MagicMock

import pytest

from discord.ext.voice_recv.reader import AudioReader

# Sentinel for `_MEDIA_TYPE_AUDIO`. Using a plain object() keeps the tests
# independent of whether davey is importable; we just assert the value
# flows through to `dave_session.decrypt()` unchanged.
_FAKE_MEDIA_TYPE_AUDIO = object()


# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------


@pytest.fixture
def fake_dave_session():
    sess = MagicMock(name="dave_session")
    sess.ready = True
    # Default: this user is NOT in passthrough, so DAVE decrypt must run.
    # Individual tests can override to True to exercise the passthrough path.
    sess.can_passthrough = MagicMock(return_value=False)
    sess.decrypt = MagicMock(return_value=b"plaintext_opus")
    return sess


@pytest.fixture
def fake_connection(fake_dave_session):
    conn = MagicMock(name="_connection")
    conn.dave_protocol_version = 1
    conn.dave_session = fake_dave_session
    return conn


@pytest.fixture
def fake_voice_client(fake_connection):
    vc = MagicMock(name="voice_client")
    vc._connection = fake_connection
    vc._ssrc_to_id = {12345: 999_000}
    return vc


@pytest.fixture
def fake_rtp_packet():
    pkt = MagicMock(name="rtp_packet")
    pkt.ssrc = 12345
    pkt.decrypted_data = b"ciphertext_after_transport_decrypt"
    pkt.is_silence = MagicMock(return_value=False)
    return pkt


@pytest.fixture
def reader(fake_voice_client):
    """`AudioReader` instance with no reader thread spawned.

    Same pattern used by `tests/test_voice_session_cleanup.py` for
    `VoiceSession`: bypass `__init__` so we can drive the public surface
    against pure mocks.
    """
    r = AudioReader.__new__(AudioReader)
    r.voice_client = fake_voice_client
    r.error = None
    return r


@pytest.fixture
def dave_enabled(monkeypatch):
    """Force the reader module's DAVE-availability flags ON.

    Pins `_MEDIA_TYPE_AUDIO` to a known sentinel so the happy-path test can
    assert exactly what gets passed to `dave_session.decrypt`. `raising=False`
    keeps the monkeypatch valid even if Lane A's patch hasn't landed yet;
    the tests will still fail (no `_maybe_dave_decrypt` attr), just for the
    right reason.
    """
    import discord.ext.voice_recv.reader as reader_mod

    monkeypatch.setattr(reader_mod, "_HAS_DAVE", True, raising=False)
    monkeypatch.setattr(
        reader_mod, "_MEDIA_TYPE_AUDIO", _FAKE_MEDIA_TYPE_AUDIO, raising=False
    )
    return reader_mod


# ---------------------------------------------------------------------------
# Unit tests: `_maybe_dave_decrypt`
# ---------------------------------------------------------------------------


class TestMaybeDaveDecrypt:
    """Unit tests on the DAVE-decrypt gate.

    The gate mirrors `voice_client.can_encrypt` in discord.py 2.7.1 exactly
    (`voice_state.py:272-273`). Bypass semantics on every "DAVE inactive"
    branch let non-DAVE rooms and davey-less environments keep working.
    """

    def test_protocol_version_zero_bypasses_decrypt(
        self, dave_enabled, reader, fake_connection, fake_dave_session, fake_rtp_packet,
    ):
        """`dave_protocol_version == 0` → return the transport-decrypted
        payload unchanged; never touch `dave_session.decrypt`."""
        fake_connection.dave_protocol_version = 0
        result = reader._maybe_dave_decrypt(fake_rtp_packet)
        assert result is fake_rtp_packet.decrypted_data
        fake_dave_session.decrypt.assert_not_called()

    def test_dave_session_none_bypasses_decrypt(
        self, dave_enabled, reader, fake_connection, fake_rtp_packet,
    ):
        """`dave_session is None` → bypass. Pre-MLS-handshake state."""
        fake_connection.dave_session = None
        result = reader._maybe_dave_decrypt(fake_rtp_packet)
        assert result is fake_rtp_packet.decrypted_data

    def test_dave_session_not_ready_bypasses_decrypt(
        self, dave_enabled, reader, fake_dave_session, fake_rtp_packet,
    ):
        """`dave_session.ready is False` → bypass. Pre-MLS-epoch-1 packets
        are transport-only on the wire."""
        fake_dave_session.ready = False
        result = reader._maybe_dave_decrypt(fake_rtp_packet)
        assert result is fake_rtp_packet.decrypted_data
        fake_dave_session.decrypt.assert_not_called()

    def test_unknown_ssrc_returns_none(
        self, dave_enabled, reader, fake_voice_client, fake_dave_session, fake_rtp_packet,
    ):
        """SSRC not in `_ssrc_to_id` → drop (return None).

        Accepted regression: davey requires per-user keys; when SPEAKING
        events race behind the first audio packet, 1-5 packets per new
        speaker per session are dropped. See plan §Edge cases.
        """
        fake_voice_client._ssrc_to_id.clear()
        result = reader._maybe_dave_decrypt(fake_rtp_packet)
        assert result is None
        fake_dave_session.decrypt.assert_not_called()

    def test_happy_path_invokes_decrypt_and_returns_plaintext(
        self, dave_enabled, reader, fake_dave_session, fake_rtp_packet,
    ):
        """Full DAVE-active path: `decrypt(user_id, MediaType.audio, ciphertext)`
        called exactly once with the expected args; method returns the
        davey plaintext bytes verbatim."""
        ciphertext = fake_rtp_packet.decrypted_data
        result = reader._maybe_dave_decrypt(fake_rtp_packet)
        assert result == b"plaintext_opus"
        fake_dave_session.decrypt.assert_called_once_with(
            999_000, _FAKE_MEDIA_TYPE_AUDIO, ciphertext,
        )

    def test_decrypt_raises_returns_none_no_crash(
        self, dave_enabled, reader, fake_dave_session, fake_rtp_packet,
    ):
        """davey.decrypt raising → drop the packet, don't propagate, and
        leave `reader.error` untouched so the reader thread stays alive.

        MLS epoch transitions can produce transient decrypt failures;
        bumping `reader.error` would call `self.stop()` and kill the whole
        receive pipeline."""
        fake_dave_session.decrypt.side_effect = RuntimeError(
            "simulated MLS epoch transition fail"
        )
        result = reader._maybe_dave_decrypt(fake_rtp_packet)
        assert result is None
        assert reader.error is None

    def test_has_dave_false_bypasses_even_with_session_present(
        self, monkeypatch, reader, fake_dave_session, fake_rtp_packet,
    ):
        """`_HAS_DAVE = False` → bypass everything, even if a real session
        somehow showed up on the connection. Defensive shim that keeps the
        tests (and any davey-less deploys) green."""
        import discord.ext.voice_recv.reader as reader_mod

        monkeypatch.setattr(reader_mod, "_HAS_DAVE", False, raising=False)
        result = reader._maybe_dave_decrypt(fake_rtp_packet)
        assert result is fake_rtp_packet.decrypted_data
        fake_dave_session.decrypt.assert_not_called()

    def test_can_passthrough_true_returns_payload_without_decrypt(
        self, dave_enabled, reader, fake_dave_session, fake_rtp_packet,
    ):
        """`can_passthrough(user_id) == True` → return the transport-decrypted
        payload as-is; never call `decrypt`. Mirrors Discord's protocol where
        a passthrough-mode peer sends non-DAVE-wrapped packets that the
        receiver must accept verbatim."""
        fake_dave_session.can_passthrough = MagicMock(return_value=True)
        result = reader._maybe_dave_decrypt(fake_rtp_packet)
        assert result is fake_rtp_packet.decrypted_data
        fake_dave_session.can_passthrough.assert_called_once_with(999_000)
        fake_dave_session.decrypt.assert_not_called()

    def test_can_passthrough_raises_falls_through_to_decrypt(
        self, dave_enabled, reader, fake_dave_session, fake_rtp_packet,
    ):
        """`can_passthrough` raising → swallow the error and try `decrypt`.
        Defensive: an older davey build or transient internal state shouldn't
        break the receive pipeline."""
        fake_dave_session.can_passthrough = MagicMock(
            side_effect=RuntimeError("simulated davey internal error")
        )
        result = reader._maybe_dave_decrypt(fake_rtp_packet)
        assert result == b"plaintext_opus"
        fake_dave_session.decrypt.assert_called_once()


# ---------------------------------------------------------------------------
# Integration tests: `callback()` exercises the DAVE hook
# ---------------------------------------------------------------------------


class TestCallbackIntegration:
    """Two integration tests for the hook Lane A inserts between transport
    decrypt (reader.py:141) and the post-decrypt routing (reader.py:159).

    Strategy: stub the transport-decrypt and RTP parsing path so `callback()`
    reaches the hook, then mock `_maybe_dave_decrypt` directly on the reader
    instance. The assertion focuses on `feed_rtp` being called
    (`test_callback_feeds_when_dave_returns_bytes`) vs. not called
    (`test_callback_drops_when_dave_returns_none`). The transport path
    correctness is covered by voice-recv's own upstream tests.
    """

    @staticmethod
    def _wire_callback(reader, monkeypatch, fake_rtp_packet):
        import discord.ext.voice_recv.reader as reader_mod

        # Redirect rtp parsing: we want an RTP path (not RTCP) so the hook fires.
        monkeypatch.setattr(reader_mod.rtp, "is_rtcp", lambda data: False)
        monkeypatch.setattr(reader_mod.rtp, "decode_rtp", lambda data: fake_rtp_packet)
        # Stub the instance attrs `callback()` touches besides the hook.
        reader.decryptor = MagicMock(name="decryptor")
        reader.decryptor.decrypt_rtp = MagicMock(return_value=b"ciphertext")
        reader.packet_router = MagicMock(name="packet_router")
        reader.packet_router.feed_rtp = MagicMock()
        reader.speaking_timer = MagicMock(name="speaking_timer")
        reader.sink = MagicMock(name="sink")

    def test_callback_feeds_when_dave_returns_bytes(
        self, monkeypatch, reader, fake_rtp_packet,
    ):
        """Hook returns plaintext → `feed_rtp` called once with the
        rtp_packet whose `decrypted_data` is now the post-DAVE plaintext."""
        self._wire_callback(reader, monkeypatch, fake_rtp_packet)
        plaintext = b"dave_unwrapped_opus_payload"
        reader._maybe_dave_decrypt = MagicMock(return_value=plaintext)

        reader.callback(b"raw_packet_bytes")

        reader._maybe_dave_decrypt.assert_called_once_with(fake_rtp_packet)
        assert reader.packet_router.feed_rtp.call_count == 1
        called_with = reader.packet_router.feed_rtp.call_args[0][0]
        assert called_with is fake_rtp_packet
        assert fake_rtp_packet.decrypted_data == plaintext
        assert reader.error is None

    def test_callback_drops_when_dave_returns_none(
        self, monkeypatch, reader, fake_rtp_packet,
    ):
        """Hook returns None → `feed_rtp` NOT called, no exception propagated,
        `reader.error` stays None (reader thread survives the drop)."""
        self._wire_callback(reader, monkeypatch, fake_rtp_packet)
        reader._maybe_dave_decrypt = MagicMock(return_value=None)

        reader.callback(b"raw_packet_bytes")

        reader._maybe_dave_decrypt.assert_called_once_with(fake_rtp_packet)
        reader.packet_router.feed_rtp.assert_not_called()
        assert reader.error is None