feat(voice): DAVE E2E receive-side decrypt — unblocks Pas 12
Vendored fork: discord-ext-voice_recv 0.5.3a+echo.dave1

Patches the receive pipeline to handle Discord's mandatory DAVE E2E
encryption on voice gateway v=8. Without this, opus_decode raised
"corrupted stream" on every received packet in a DAVE-active room and
voice-to-voice never connected.

DAVE patch (vendor/discord-ext-voice-recv/reader.py):
- `_maybe_dave_decrypt(rtp_packet)`: gate mirrors discord.py 2.7.1
  `voice_state.can_encrypt`. Uses davey's `can_passthrough(user_id)` to
  branch: peers in passthrough send transport-only packets that pass
  through verbatim; peers in a DAVE epoch go through `davey.decrypt`.
- Hooked in `callback()` between transport decrypt and feed_rtp; drops
  the packet on decrypt failure without killing the reader thread.
- Bumps __version__ to '0.5.3a+echo.dave1' (PEP 440 local segment) so a
  contract test can fail fast on accidental upstream-sync overwrite.

Pipeline fixes uncovered while testing DAVE end-to-end:
- src/voice/pipeline.py: silero-vad v6+ requires exactly 512 samples per
  call at 16kHz; our 100ms window (1600 samples) was silently raising
  ValueError → VAD always returned False → STT never fired. Slice the
  window into 512-sample chunks. Bump whisper beam_size 1→5 and add a
  Romanian `initial_prompt`: transcriptions go from "Eco salt."
  gibberish to "Echo, salutare, te rog spune-mi cât este ora."
- src/voice/tts_stream.py: EchoStreamingAudioSource.read() returns a
  20ms silence frame instead of b'' on empty queue. Discord treats an
  empty return as end-of-stream and kills the player, so any TTS pushed
  later would be silently discarded.
- src/adapters/discord_voice.py: actually attach
  EchoStreamingAudioSource to the voice client after the wakeup beep
  (chained via `after=`). The attach was missing entirely, so TTS frames
  had no consumer.

Tests:
- tests/test_voice_recv_dave.py: 11 unit + callback integration tests
  covering bypass paths, can_passthrough gate, decrypt error handling.
- tests/test_voice_adapter_contract.py: +test_voice_recv_fork_version
  and +test_voice_connection_state_has_dave_attrs guard against
  upstream drift on either side.

Config:
- config.json: voice.allowed_user_ids whitelist for Marius's user id.

Status: voice-to-voice loop closes end-to-end (DAVE → VAD → Whisper →
Claude → Supertonic → audio out). Latency is ~8-13s per turn, which is
out of scope for this commit; see TODOS.md for the real-time UX
follow-up plan.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
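
For readers without src/voice/pipeline.py at hand, the 512-sample slicing described in the commit message can be sketched roughly as follows. This is not the code from the commit: `iter_vad_chunks`, `window_has_speech` and the `is_speech` predicate are hypothetical names, and zero-padding the final partial chunk (1600 = 3 × 512 + 64) is an assumption, since the message does not say how the remainder is handled.

```python
from typing import Callable, Iterator, List, Sequence

# Per the commit message: silero-vad v6+ wants exactly 512 samples per call at 16 kHz.
VAD_CHUNK_SAMPLES = 512


def iter_vad_chunks(
    window: Sequence[float], chunk: int = VAD_CHUNK_SAMPLES
) -> Iterator[List[float]]:
    """Yield consecutive `chunk`-sample slices of `window`.

    Assumption: the last partial slice is zero-padded to full length;
    the real pipeline may drop or carry over the remainder instead.
    """
    for start in range(0, len(window), chunk):
        piece = list(window[start:start + chunk])
        if len(piece) < chunk:
            piece.extend([0.0] * (chunk - len(piece)))
        yield piece


def window_has_speech(
    window: Sequence[float], is_speech: Callable[[List[float]], bool]
) -> bool:
    """Return True if any chunk is flagged as speech.

    `is_speech` is a hypothetical stand-in for the per-chunk VAD call.
    """
    return any(is_speech(piece) for piece in iter_vad_chunks(window))
```

With a 100ms window of 1600 samples this produces four chunks, so the VAD is called four times per window instead of once with an invalid length.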
@@ -21,6 +21,14 @@ from typing import Optional
 import discord
 from discord import app_commands
 
+# Optional DAVE dep (mandatory at runtime when discord.py 2.7.1 is paired with
+# Discord voice gateway v=8; tolerated missing in tests / dev environments).
+try:
+    import davey
+    _HAS_DAVE = True
+except ImportError:
+    _HAS_DAVE = False
+
 from src.config import Config
 from src.voice.pipeline import (
     VoiceSession,
@@ -28,7 +36,7 @@ from src.voice.pipeline import (
     _get_whisper_model,
     _get_silero_vad,
 )
-from src.voice.tts_stream import TTSQueue
+from src.voice.tts_stream import TTSQueue, EchoStreamingAudioSource
 from src.voice._discord_voice_adapter import connect_voice
 
 log = logging.getLogger("echo-core.discord.voice")
@@ -53,6 +61,11 @@ async def warmup_models() -> None:
     """
     global _voice_load_error
     try:
+        if not discord.opus.is_loaded():
+            discord.opus.load_opus("libopus.so.0")
+        if _HAS_DAVE:
+            log.info("DAVE protocol v%d available (davey %s)",
+                     davey.DAVE_PROTOCOL_VERSION, davey.__version__)
         await asyncio.to_thread(_get_whisper_model)
         await asyncio.to_thread(_get_silero_vad)
         log.info("Voice models warm")
@@ -167,11 +180,24 @@ def register(tree: app_commands.CommandTree, bot: discord.Client) -> app_command
             )
             return
         _voice_sessions[guild_id] = session
-        # Wake-up beep
+        # Start TTS streaming source for the entire session. Chain the
+        # wake-up beep via `after=` so streaming takes over when beep ends.
+        def _start_stream(error: Optional[Exception] = None) -> None:
+            if error is not None:
+                log.warning("Beep playback ended with error: %s", error)
+            try:
+                vc.play(EchoStreamingAudioSource(ttsq))
+                log.info("TTS streaming source attached")
+            except Exception:
+                log.exception("EchoStreamingAudioSource attach failed")
         try:
-            vc.play(discord.FFmpegPCMAudio("assets/voice/beep_200ms.wav"))
+            vc.play(
+                discord.FFmpegPCMAudio("assets/voice/beep_200ms.wav"),
+                after=_start_stream,
+            )
         except Exception:
-            log.warning("Beep playback skipped", exc_info=True)
+            log.warning("Beep playback skipped, starting stream directly", exc_info=True)
+            _start_stream()
         # Attach sink
         try:
             bot_user_id = int(bot.user.id) if bot.user is not None else 0
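
The source attached above only stays alive if its `read()` never returns `b''`, which is the src/voice/tts_stream.py fix described in the commit message. A minimal sketch of that behaviour follows, assuming discord.py's raw PCM frame format (20ms of 16-bit 48kHz stereo, 3840 bytes). `SilenceOnEmptySource` and its queue are hypothetical illustrations, not the actual `EchoStreamingAudioSource`; a real source would subclass `discord.AudioSource`.

```python
import queue

# discord.py expects each non-Opus frame to hold 20 ms of 16-bit 48 kHz stereo PCM.
SAMPLE_RATE_HZ = 48_000
CHANNELS = 2
BYTES_PER_SAMPLE = 2
FRAME_MS = 20
FRAME_BYTES = SAMPLE_RATE_HZ * FRAME_MS // 1000 * CHANNELS * BYTES_PER_SAMPLE
SILENCE_FRAME = b"\x00" * FRAME_BYTES


class SilenceOnEmptySource:
    """Hypothetical stand-in showing the read() contract only.

    Returning b'' would signal end-of-stream to the player, so an empty
    queue yields one frame of silence instead.
    """

    def __init__(self, frames: "queue.Queue[bytes]") -> None:
        self._frames = frames

    def read(self) -> bytes:
        try:
            # Non-blocking: the player calls read() once per 20 ms tick.
            return self._frames.get_nowait()
        except queue.Empty:
            return SILENCE_FRAME
```

Because the player keeps receiving full-size frames, TTS audio queued later in the session still has a consumer.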