# TODOS — Echo Core deferred work
Captured during planning reviews. Re-evaluate after relevant features ship or dogfood data accumulates.
## Voice
### Bounded SSRC buffer for DAVE-active unknown-SSRC race
**What:** Replace the hard drop of unknown-SSRC RTP packets in `_maybe_dave_decrypt` (vendor/discord-ext-voice-recv/.../reader.py) with a small bounded buffer per SSRC. When a SPEAKING event maps the SSRC to a `user_id`, flush the buffer: DAVE-decrypt the queued packets and feed them downstream.

**Why:** Vanilla voice-recv feeds unknown-SSRC packets to the Opus decoder anyway (reader.py:178 logs at `info` but still calls `feed_rtp`). The DAVE patch turns this into a hard drop because davey requires a `user_id`. Net regression: 40-200ms (1-5 packets) lost on the FIRST utterance of each new speaker per session, whenever audio races ahead of the SPEAKING event. Subsequent utterances are unaffected.

**Pros:** Eliminates first-utterance audio loss. Whisper STT gets the complete prefix ("Echo, cât e ceasul?", Romanian for "Echo, what time is it?", instead of possibly "co, cât e ceasul?").

**Cons:** Adds a new state machine: a queue per SSRC, TTL flush (~2s), ordering preservation, and a memory bound. Adds a new race surface between the socket-reader thread (queueing) and the asyncio loop (SPEAKING event → flush). Memory footprint is roughly 50 packets × ~1KB × N concurrent unknown SSRCs. Bug risk traded for a UX win.
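
To put the memory bound above in concrete terms, here is a small worst-case estimate. The function name is hypothetical, and treating "~1KB" as 1024 bytes is an assumption; the 50-packet cap comes from the note above.

```python
def pending_buffer_bytes(unknown_ssrcs: int,
                         max_packets: int = 50,
                         packet_bytes: int = 1024) -> int:
    """Worst-case bytes held if every unknown SSRC fills its queue.

    Assumptions: 50 packets per SSRC (from the TODO) and ~1KB taken
    as 1024 bytes per packet.
    """
    return unknown_ssrcs * max_packets * packet_bytes


if __name__ == "__main__":
    for n in (1, 5, 25):
        kib = pending_buffer_bytes(n) / 1024
        print(f"{n:>2} unknown SSRCs -> {kib:.0f} KiB")
```

Under these assumptions each unknown SSRC costs at most about 50 KiB, so the footprint only becomes notable if many SSRCs stay unmapped at once.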

**Context:** Discovered during /plan-eng-review on `/home/moltbot/.claude/plans/wiggly-exploring-glade.md` (DAVE receive-side decrypt patch). The outside-voice reviewer flagged this as a regression versus vanilla voice-recv behavior. Accepted as a tradeoff for v1 because SPEAKING typically arrives before audio in the normal Discord flow, so the impact may be rare. **Depends on:** dogfood data from Pas 12 Etapa 2 #3-#13 confirming this IS observed in practice (i.e., Whisper transcripts repeatedly missing the first word). If it is not observed, this TODO stays deferred indefinitely. If it is observed in 3+ sessions, escalate.

**Where to start:** `_maybe_dave_decrypt` in `vendor/discord-ext-voice-recv/discord/ext/voice_recv/reader.py`. Add `_pending_packets: dict[int, deque[bytes]]` (keyed by SSRC) on `AudioReader`. Hook the SPEAKING event handler in voice_client.py to call a new `flush_pending(ssrc, user_id)` method.
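
A minimal sketch of the buffer described above, written as a standalone helper rather than as a patch to `AudioReader`. Everything here is an assumption for illustration: the class name `PendingPacketBuffer`, the `push` and `expire` methods, the injected `clock`, and the choice to return packets from `flush_pending` so the caller can DAVE-decrypt them with the now-known `user_id`. It does not touch voice-recv or davey APIs.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class PendingPacketBuffer:
    """Hypothetical helper: holds RTP packets for SSRCs with no user_id yet."""

    def __init__(self, max_packets: int = 50, ttl: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_packets = max_packets
        self._ttl = ttl
        self._clock = clock
        # Guards _pending: push() runs on the socket-reader thread,
        # flush_pending() runs from the SPEAKING handler.
        self._lock = threading.Lock()
        self._pending: Dict[int, Deque[Tuple[float, bytes]]] = {}

    def push(self, ssrc: int, packet: bytes) -> None:
        """Queue a packet for an unknown SSRC (oldest is evicted when full)."""
        with self._lock:
            queue = self._pending.get(ssrc)
            if queue is None:
                queue = self._pending[ssrc] = deque(maxlen=self._max_packets)
            queue.append((self._clock(), packet))

    def flush_pending(self, ssrc: int) -> List[bytes]:
        """Remove the SSRC's queue and return its fresh packets in arrival order."""
        with self._lock:
            queue = self._pending.pop(ssrc, None)
        if not queue:
            return []
        cutoff = self._clock() - self._ttl
        return [packet for stamp, packet in queue if stamp >= cutoff]

    def expire(self) -> int:
        """Drop queues whose newest packet is older than the TTL; return how many."""
        cutoff = self._clock() - self._ttl
        with self._lock:
            stale = [ssrc for ssrc, queue in self._pending.items()
                     if not queue or queue[-1][0] < cutoff]
            for ssrc in stale:
                del self._pending[ssrc]
        return len(stale)
```

Two design choices are worth questioning before adopting anything like this. `deque(maxlen=...)` evicts the oldest packet when the queue is full, which favors recent audio over the start of the utterance; keeping the earliest packets instead may serve the first-word problem better. The single lock keeps the cross-thread handoff simple, at the cost of briefly blocking the socket-reader thread during a flush.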

**Depends on / blocked by:** Pas 12 dogfood data. Re-evaluate after 3+ sessions of live use.

---
## (Other deferred items from voice review — already in plan's "Out of scope" section)

- Wake-word "Echo" with Porcupine (P3 — incompatible with /voice join continuous)
- Telegram voice memo bidirectional (P2 — reuses src/voice/pipeline.py)
- Full-session WAV recording (P3 — KB transcript sufficient v1)
- Upstreaming the DAVE patch to imayhaveborkedit/discord-ext-voice-recv (separate community effort)
- `threading.Lock` around davey.decrypt (conditional follow-up — only if dogfood reveals crashes)
- DAVE verification UI (`voice_privacy_code`, pairwise fingerprints — useful but not blocking voice-to-voice)
- Video E2E decrypt (Echo is audio-only, no video pipeline)
- Pre-existing test failures: TestPromptInjectionProtection × 2 + TestOnMessage × 4 (separate ticket)
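
For the `threading.Lock` item in the list above, one possible shape is a thin wrapper that serializes calls to a decrypt function. This is only a sketch: `SerializedDecryptor` is a hypothetical name, and the `(user_id, packet) -> bytes` signature is a stand-in, not davey's actual API.

```python
import threading
from typing import Callable


class SerializedDecryptor:
    """Hypothetical wrapper: only one thread at a time runs the wrapped decrypt."""

    def __init__(self, decrypt: Callable[[int, bytes], bytes]) -> None:
        # `decrypt` is a placeholder callable; the real davey call may differ.
        self._decrypt = decrypt
        self._lock = threading.Lock()

    def __call__(self, user_id: int, packet: bytes) -> bytes:
        # Hold the lock for the whole call so decrypts never overlap.
        with self._lock:
            return self._decrypt(user_id, packet)
```

The lock adds contention on the hot receive path, which fits the note that this is a conditional follow-up rather than a default.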