TODOS — Echo Core deferred work

Captured during planning reviews. Re-evaluate after relevant features ship or dogfood data accumulates.

Voice

Bounded SSRC buffer for DAVE-active unknown-SSRC race

What: Replace the hard-drop of unknown-SSRC RTP packets in _maybe_dave_decrypt (vendor/discord-ext-voice-recv/.../reader.py) with a small bounded buffer per SSRC. Flush on SPEAKING event mapping the SSRC → user_id, then DAVE-decrypt and feed downstream.

Why: Vanilla voice-recv still feeds unknown-SSRC packets to the Opus decoder (reader.py:178 logs at info level but still calls feed_rtp). The DAVE patch turns this into a hard drop because davey requires a user_id. Net regression: 40-200ms (1-5 packets) lost on the first utterance of each new speaker per session, when audio races ahead of the SPEAKING event. Subsequent utterances are unaffected.

Pros: Eliminates first-utterance audio loss. Whisper STT gets the complete prefix ("Echo, cât e ceasul?", i.e. "Echo, what time is it?", instead of possibly "co, cât e ceasul?").

Cons: New state machine: a queue per SSRC, TTL flush (~2s), ordering preservation, and a memory bound. New race surface between the socket-reader thread (queueing) and the asyncio loop (SPEAKING event → flush). Memory footprint is roughly 50 packets * ~1KB * N concurrent unknown SSRCs. Bug risk traded for a UX win.
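To make the memory bound concrete, here is a small worked calculation. The 50-packet cap and ~1KB packet size are the figures quoted above; the SSRC counts are illustrative assumptions, not measurements.

```python
# Worst-case memory held by pending-packet buffers.
# Cap and packet size come from the Cons note; the SSRC counts are illustrative.
MAX_PACKETS_PER_SSRC = 50
APPROX_PACKET_BYTES = 1024  # "~1KB" per buffered RTP packet


def worst_case_bytes(unknown_ssrcs: int) -> int:
    """Upper bound on buffered bytes if every unknown SSRC fills its queue."""
    return MAX_PACKETS_PER_SSRC * APPROX_PACKET_BYTES * unknown_ssrcs


for n in (1, 5, 25):
    print(f"{n:>2} unknown SSRCs -> {worst_case_bytes(n) / 1024:.0f} KiB")
```

Even with 25 simultaneously unmapped SSRCs the bound stays near 1.2 MiB, so the per-SSRC cap matters more for correctness than for memory pressure.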

Context: Discovered during /plan-eng-review on /home/moltbot/.claude/plans/wiggly-exploring-glade.md (DAVE receive-side decrypt patch). The outside-voice reviewer flagged this as a regression versus vanilla voice-recv behavior. Accepted as a tradeoff for v1 because SPEAKING typically arrives before audio in the normal Discord flow, so the impact may be rare. Depends on: dogfood data from Pas 12 Etapa 2 (Step 12, Stage 2) #3-#13 confirming this is actually observed in practice (i.e., Whisper transcripts repeatedly missing the first word). If it is not observed, this TODO stays deferred indefinitely. If it is observed in 3+ sessions, escalate.

Where to start: _maybe_dave_decrypt in vendor/discord-ext-voice-recv/discord/ext/voice_recv/reader.py. Add _pending_packets: dict[int, deque[bytes]] (keyed by SSRC) on AudioReader. Hook the SPEAKING event handler in voice_client.py to call a new flush_pending(ssrc, user_id) method.
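A minimal sketch of the buffer described above, assuming a standalone helper rather than fields on AudioReader. The class name PendingPacketBuffer and its method names are hypothetical, and the DAVE decrypt and downstream feed are deliberately left to the caller. It uses only the standard library and takes an injectable clock so the TTL can be tested.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class PendingPacketBuffer:
    """Hypothetical helper: holds packets for SSRCs that have no user_id yet."""

    def __init__(self, max_packets: int = 50, ttl_seconds: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_packets = max_packets
        self._ttl = ttl_seconds
        self._clock = clock
        # Guards _pending: push() runs on the socket-reader thread,
        # flush() is triggered from the asyncio loop.
        self._lock = threading.Lock()
        self._pending: Dict[int, Deque[Tuple[float, bytes]]] = {}

    def push(self, ssrc: int, packet: bytes) -> None:
        """Queue a packet for an unknown SSRC; the oldest is evicted when full."""
        with self._lock:
            queue = self._pending.get(ssrc)
            if queue is None:
                queue = self._pending[ssrc] = deque(maxlen=self._max_packets)
            queue.append((self._clock(), packet))

    def flush(self, ssrc: int) -> List[bytes]:
        """Remove and return unexpired packets for ssrc, in arrival order."""
        with self._lock:
            queue = self._pending.pop(ssrc, None)
        if queue is None:
            return []
        cutoff = self._clock() - self._ttl
        return [pkt for ts, pkt in queue if ts >= cutoff]

    def expire(self) -> int:
        """Drop SSRCs whose newest packet is older than the TTL; return the count."""
        cutoff = self._clock() - self._ttl
        with self._lock:
            stale = [s for s, q in self._pending.items() if not q or q[-1][0] < cutoff]
            for s in stale:
                del self._pending[s]
        return len(stale)
```

In this sketch the SPEAKING handler would call flush(ssrc), then decrypt and feed each returned packet with the now-known user_id. One design caveat: deque(maxlen=...) evicts the oldest packets first, which are exactly the start of the utterance, so the cap should sit comfortably above the expected race window.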

Depends on / blocked by: Pas 12 dogfood data. Re-evaluate after 3+ sessions of live use.


(Other deferred items from voice review — already in plan's "Out of scope" section)

  • Wake-word "Echo" with Porcupine (P3 — incompatible with /voice join continuous)
  • Telegram voice memo bidirectional (P2 — reuses src/voice/pipeline.py)
  • Full-session WAV recording (P3 — KB transcript sufficient v1)
  • Upstreaming the DAVE patch to imayhaveborkedit/discord-ext-voice-recv (separate community effort)
  • threading.Lock around davey.decrypt (conditional follow-up — only if dogfood reveals crashes)
  • DAVE verification UI (voice_privacy_code, pairwise fingerprints — useful but not blocking voice-to-voice)
  • Video E2E decrypt (Echo is audio-only, no video pipeline)
  • Pre-existing test failures: TestPromptInjectionProtection × 2 + TestOnMessage × 4 (separate ticket)