Files
echo-core/TODOS.md

35 lines
3.0 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# TODOS — Echo Core deferred work
Captured during planning reviews. Re-evaluate after relevant features ship or dogfood data accumulates.
## Voice
### Bounded SSRC buffer for DAVE-active unknown-SSRC race
**What:** Replace the hard-drop of unknown-SSRC RTP packets in `_maybe_dave_decrypt` (vendor/discord-ext-voice-recv/.../reader.py) with a small bounded buffer per SSRC. Flush on SPEAKING event mapping the SSRC → user_id, then DAVE-decrypt and feed downstream.
**Why:** voice-recv vanilla feeds unknown-SSRC packets to opus decoder anyway (reader.py:178 logs `info` but still calls `feed_rtp`). The DAVE patch turns this into a hard drop because davey requires `user_id`. Net regression: 40-200ms (1-5 packets) lost on the FIRST utterance of each new speaker per session, when audio races ahead of SPEAKING event. Subsequent utterances unaffected.
**Pros:** Eliminates first-utterance audio loss. Whisper STT gets the complete prefix ("Echo, cât e ceasul?" instead of possibly "co, cât e ceasul?").
**Cons:** New state machine — queue per SSRC, TTL flush (~2s), ordering preservation, memory bound. New race surface between socket-reader thread (queueing) and asyncio loop (SPEAKING event → flush). 50 packets * ~1KB * N concurrent unknown SSRCs = memory footprint. Bug risk traded for UX win.
**Context:** Discovered during /plan-eng-review on `/home/moltbot/.claude/plans/wiggly-exploring-glade.md` (DAVE receive-side decrypt patch). Outside-voice reviewer flagged this as a regression vs voice-recv vanilla behavior. Accepted as tradeoff for v1 because SPEAKING typically arrives before audio in normal Discord flow — impact may be rare. **Depends on:** dogfood data from Pas 12 Etapa 2 #3-#13 confirming this IS observed in practice (i.e., Whisper transcripts repeatedly missing first word). If not observed, this TODO stays permanent. If observed in 3+ sessions, escalate.
**Where to start:** `_maybe_dave_decrypt` in `vendor/discord-ext-voice-recv/discord/ext/voice_recv/reader.py`. Add `_pending_packets: dict[ssrc, deque[bytes]]` on `AudioReader`. Hook SPEAKING event handler in voice_client.py to call new `flush_pending(ssrc, user_id)` method.
**Depends on / blocked by:** Pas 12 dogfood data. Re-evaluate after 3+ sessions of live use.
---
## (Other deferred items from voice review — already in plan's "Out of scope" section)
- Wake-word "Echo" cu porcupine (P3 — incompatible with /voice join continuous)
- Telegram voice memo bidirectional (P2 — reuses src/voice/pipeline.py)
- Full-session WAV recording (P3 — KB transcript sufficient v1)
- Upstreaming the DAVE patch to imayhaveborkedit/discord-ext-voice-recv (separate community effort)
- `threading.Lock` around davey.decrypt (conditional follow-up — only if dogfood reveals crashes)
- DAVE verification UI (`voice_privacy_code`, pairwise fingerprints — useful but not blocking voice-to-voice)
- Video E2E decrypt (Echo is audio-only, no video pipeline)
- Pre-existent test failures: TestPromptInjectionProtection × 2 + TestOnMessage × 4 (separate ticket)