TODOS — Echo Core deferred work
Captured during planning reviews. Re-evaluate after relevant features ship or dogfood data accumulates.
Voice
Bounded SSRC buffer for DAVE-active unknown-SSRC race
What: Replace the hard-drop of unknown-SSRC RTP packets in _maybe_dave_decrypt (vendor/discord-ext-voice-recv/.../reader.py) with a small bounded buffer per SSRC. Flush on SPEAKING event mapping the SSRC → user_id, then DAVE-decrypt and feed downstream.
Why: Vanilla voice-recv feeds unknown-SSRC packets to the Opus decoder anyway (reader.py:178 logs at info level but still calls feed_rtp). The DAVE patch turns this into a hard drop because davey requires a user_id. Net regression: 40-200ms (1-5 packets) lost on the FIRST utterance of each new speaker per session, when audio races ahead of the SPEAKING event. Subsequent utterances are unaffected.
Pros: Eliminates first-utterance audio loss. Whisper STT gets the complete prefix ("Echo, cât e ceasul?" instead of possibly "co, cât e ceasul?").
Cons: New state machine — queue per SSRC, TTL flush (~2s), ordering preservation, memory bound. New race surface between socket-reader thread (queueing) and asyncio loop (SPEAKING event → flush). 50 packets * ~1KB * N concurrent unknown SSRCs = memory footprint. Bug risk traded for UX win.
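As a rough check on the memory bound in the Cons note, the product can be worked through for a concrete N. This is only a back-of-the-envelope sketch: the function name is hypothetical, the defaults are the 50-packet and ~1KB figures from the note, and Python object overhead for the queue entries is not counted.

```python
def worst_case_buffer_bytes(
    unknown_ssrcs: int,
    packets_per_ssrc: int = 50,    # queue cap from the Cons note
    bytes_per_packet: int = 1024,  # "~1KB" from the Cons note, rounded to 1 KiB
) -> int:
    """Upper bound on buffered payload bytes; ignores per-object overhead."""
    return unknown_ssrcs * packets_per_ssrc * bytes_per_packet


# Ten simultaneously unknown SSRCs, each queue completely full:
print(worst_case_buffer_bytes(10))  # 512000 bytes, i.e. 500 KiB
```

Under these assumptions the footprint stays well under 1 MiB even with ten unmapped speakers at once, so the memory cost looks minor compared with the added state and race surface.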
Context: Discovered during /plan-eng-review on /home/moltbot/.claude/plans/wiggly-exploring-glade.md (DAVE receive-side decrypt patch). The outside-voice reviewer flagged this as a regression against vanilla voice-recv behavior. Accepted as a tradeoff for v1 because SPEAKING typically arrives before audio in the normal Discord flow, so the impact may be rare. Depends on: dogfood data from Pas 12 Etapa 2 #3-#13 confirming that this is actually observed in practice (i.e., Whisper transcripts repeatedly missing the first word). If it is not observed, this TODO stays deferred indefinitely. If it is observed in 3+ sessions, escalate.
Where to start: _maybe_dave_decrypt in vendor/discord-ext-voice-recv/discord/ext/voice_recv/reader.py. Add _pending_packets: dict[ssrc, deque[bytes]] on AudioReader. Hook SPEAKING event handler in voice_client.py to call new flush_pending(ssrc, user_id) method.
Depends on / blocked by: Pas 12 dogfood data. Re-evaluate after 3+ sessions of live use.
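A minimal sketch of the bounded per-SSRC buffer described in this item, assuming a standalone helper class rather than the real AudioReader. The names PendingPacketBuffer, enqueue, take and expire are hypothetical; the 50-packet cap and ~2s TTL come from the Cons note. The flush_pending(ssrc, user_id) method proposed above would call take(ssrc) and then DAVE-decrypt each returned packet with the now-known user_id; the decrypt call itself is left out because its signature is not given here.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

MAX_PACKETS_PER_SSRC = 50   # assumed cap, from the Cons note
PENDING_TTL_SECONDS = 2.0   # assumed TTL, from the Cons note


class PendingPacketBuffer:
    """Hypothetical helper: holds raw packets for SSRCs with no user_id yet."""

    def __init__(
        self,
        max_packets: int = MAX_PACKETS_PER_SSRC,
        ttl: float = PENDING_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_packets = max_packets
        self._ttl = ttl
        self._clock = clock
        # One lock guards the dict: the socket-reader thread enqueues,
        # while the SPEAKING handler takes.
        self._lock = threading.Lock()
        # ssrc -> deque of (arrival_time, raw_packet)
        self._pending: Dict[int, Deque[Tuple[float, bytes]]] = {}

    def enqueue(self, ssrc: int, packet: bytes) -> None:
        """Queue a packet for an unknown SSRC; a full queue drops its oldest."""
        with self._lock:
            queue = self._pending.get(ssrc)
            if queue is None:
                queue = deque(maxlen=self._max_packets)
                self._pending[ssrc] = queue
            queue.append((self._clock(), packet))

    def take(self, ssrc: int) -> List[bytes]:
        """Remove and return unexpired packets for ssrc, in arrival order."""
        with self._lock:
            queue = self._pending.pop(ssrc, None)
        if not queue:
            return []
        cutoff = self._clock() - self._ttl
        return [packet for arrived, packet in queue if arrived >= cutoff]

    def expire(self) -> int:
        """Drop queues whose newest packet is past the TTL; return how many."""
        with self._lock:
            cutoff = self._clock() - self._ttl
            stale = [
                ssrc for ssrc, queue in self._pending.items()
                if not queue or queue[-1][0] < cutoff
            ]
            for ssrc in stale:
                del self._pending[ssrc]
            return len(stale)
```

Popping the whole queue under the lock keeps the critical section short and means a packet that arrives during a flush starts a fresh queue instead of being interleaved out of order. That late packet still needs handling once the SSRC is mapped, which is part of the race surface noted in the Cons. expire() would have to be called periodically so SSRCs that never receive a SPEAKING event do not hold memory.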
(Other deferred items from voice review — already in plan's "Out of scope" section)
- Wake-word "Echo" with Porcupine (P3, incompatible with /voice join continuous)
- Telegram voice memo bidirectional (P2 — reuses src/voice/pipeline.py)
- Full-session WAV recording (P3 — KB transcript sufficient v1)
- Upstreaming the DAVE patch to imayhaveborkedit/discord-ext-voice-recv (separate community effort)
- threading.Lock around davey.decrypt (conditional follow-up, only if dogfood reveals crashes)
- DAVE verification UI (voice_privacy_code, pairwise fingerprints; useful but not blocking voice-to-voice)
- Video E2E decrypt (Echo is audio-only, no video pipeline)
- Pre-existing test failures: TestPromptInjectionProtection × 2 + TestOnMessage × 4 (separate ticket)