feat(voice): Step 5 - voice/pipeline.py VoiceSession + EchoVoiceSink + cleanup
Central voice pipeline (~250 LOC + docstrings = ~430 lines):
VoiceSession (context manager + idempotent cleanup across 5 paths):
- __enter__: acquire _lock, open JSONL (record=on)
- __exit__: calls cleanup("exit"), does not suppress exceptions
- cleanup(reason): IDEMPOTENT, side effects run exactly once: JSONL
flush+close (record=on) or delete (record=off), bot presence cleared,
voice_client.cleanup(), ttsq.stop(), filler task cancelled, lock released,
structured log to logs/voice_metrics.jsonl
- on_segment_done(speaker_id, text, no_speech_prob): mirror to the text
channel, append to JSONL, arm the 3s filler timer, call route_message with
an on_text callback; cancel the filler on the first block
- last_activity_ts: time.monotonic() timestamp; the caller drives the 5min auto-leave
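Illustrative sketch (not the actual VoiceSession code) of the idempotent cleanup contract above, assuming a boolean flag guarded by its own lock. IdempotentSession, the steps list, and the log-and-continue handling of a failing step are hypothetical; in the real session the steps would be the JSONL flush/close or delete, presence clear, voice_client.cleanup(), ttsq.stop() and the filler cancel.

```python
import logging
import threading
from typing import Callable, List, Optional

log = logging.getLogger(__name__)


class IdempotentSession:
    """Hypothetical stand-in showing VoiceSession's cleanup contract."""

    def __init__(self, steps: List[Callable[[], None]]) -> None:
        self._lock = threading.Lock()        # held for the session's lifetime
        self._state_lock = threading.Lock()  # guards the _cleaned flag only
        self._cleaned = False
        self._steps = steps                  # side effects, run in order
        self.cleanup_reason: Optional[str] = None

    def __enter__(self) -> "IdempotentSession":
        self._lock.acquire()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.cleanup("exit")
        return False  # falsy: never suppress the caller's exception

    def cleanup(self, reason: str) -> bool:
        """Run every side effect once; later calls return False and do nothing."""
        with self._state_lock:
            if self._cleaned:
                return False
            self._cleaned = True
        self.cleanup_reason = reason
        try:
            for step in self._steps:
                try:
                    step()
                except Exception:
                    # Best effort: one failing step must not skip the others.
                    log.exception("cleanup step failed (reason=%s)", reason)
        finally:
            if self._lock.locked():
                self._lock.release()  # release the session lock last
        return True
```

Setting the flag under a separate lock before running any step is what makes racing callers (for example auto-leave firing while a disconnect is being handled) safe: only the first caller gets past the check, so every later path is a no-op.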
EchoVoiceSink(session, bot_user_id):
- wants_opus() returns False (the sink receives PCM)
- write() runs in the voice_recv reader thread (threading primitives only):
- GUARD 1: user None/id==0/id==bot_user_id → return (load-bearing
echo prevention)
- GUARD 2: whitelist filter (empty = allow all)
- Buffer 20ms packets per-user → batch 100ms (5×20ms = 19200 bytes)
→ silero-vad threshold 0.5 → 800ms cumulative silence flush
- _flush_to_stt: faster-whisper small, int8, cpu_threads=4, lang=ro,
beam_size=1; drop segments with no_speech_prob > 0.6; schedule
on_segment_done via run_coroutine_threadsafe on session.loop
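Minimal sketch of the guards, per-user batching and silence flush in EchoVoiceSink.write(), assuming 48 kHz stereo 16-bit PCM (3840 bytes per 20 ms frame, which is where 19200 bytes per 100 ms comes from). SegmentBuffer, the vad callable (silero-vad or the RMS fallback would sit behind it) and the on_flush callback are hypothetical names. The sketch treats the 800 ms as silence accumulated since the last speech batch and resets the counter on speech; whether the real sink resets it the same way is an assumption.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict, Optional

# 48 kHz, 2 channels, 16-bit: 48000 * 0.020 * 2 * 2 = 3840 bytes per 20 ms frame.
FRAME_BYTES = 3840
BATCH_MS = 100
BATCH_BYTES = FRAME_BYTES * (BATCH_MS // 20)  # 5 frames = 19200 bytes
VAD_THRESHOLD = 0.5
SILENCE_FLUSH_MS = 800


class _UserState:
    def __init__(self) -> None:
        self.pending = bytearray()  # bytes not yet forming a full 100 ms batch
        self.speech = bytearray()   # current utterance (speech batches only)
        self.silence_ms = 0         # silence since the last speech batch


class SegmentBuffer:
    """Hypothetical per-user batching + VAD segmentation for a reader thread."""

    def __init__(self, vad: Callable[[bytes], float],
                 on_flush: Callable[[int, bytes], None],
                 bot_user_id: int, whitelist: frozenset = frozenset()) -> None:
        self._vad = vad                # speech probability for one 100 ms batch
        self._on_flush = on_flush      # e.g. a _flush_to_stt-style callback
        self._bot_user_id = bot_user_id
        self._whitelist = whitelist    # empty = allow everyone
        self._lock = threading.Lock()  # plain threading lock: not on the event loop
        self._users: Dict[int, _UserState] = defaultdict(_UserState)

    def write(self, user_id: Optional[int], pcm: bytes) -> None:
        # GUARD 1: skip unknown users and the bot itself (echo prevention).
        if user_id is None or user_id == 0 or user_id == self._bot_user_id:
            return
        # GUARD 2: optional whitelist.
        if self._whitelist and user_id not in self._whitelist:
            return
        ready = []
        with self._lock:
            st = self._users[user_id]
            st.pending += pcm
            while len(st.pending) >= BATCH_BYTES:
                batch = bytes(st.pending[:BATCH_BYTES])
                del st.pending[:BATCH_BYTES]
                if self._vad(batch) >= VAD_THRESHOLD:
                    st.speech += batch
                    st.silence_ms = 0
                elif st.speech:  # silence only counts once speech has started
                    st.silence_ms += BATCH_MS
                    if st.silence_ms >= SILENCE_FLUSH_MS:
                        ready.append(bytes(st.speech))
                        st.speech.clear()
                        st.silence_ms = 0
        # Run the slow flush (STT) outside the lock.
        for segment in ready:
            self._on_flush(user_id, segment)
```

In the pipeline described above, on_flush is where the transcript would be handed back to the event loop with asyncio.run_coroutine_threadsafe, since write() never runs on session.loop.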
Module helpers (lazy thread-safe singletons): _get_whisper_model,
_get_silero_vad. Constants: FILLER_DELAY_S=3.0, SILENCE_FLUSH_MS=800,
VAD_THRESHOLD=0.5, VAD_WINDOW_MS=100, NO_SPEECH_DROP_THRESHOLD=0.6.
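A lazy, thread-safe singleton of the kind described for _get_whisper_model and _get_silero_vad can be sketched with double-checked locking. LazySingleton is a hypothetical helper; the commit only says the helpers are lazy and thread-safe, not how they are built.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazySingleton(Generic[T]):
    """Build an expensive object on first use; safe to call from any thread."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._instance: Optional[T] = None

    def get(self) -> T:
        inst = self._instance
        if inst is None:                    # fast path skips the lock once built
            with self._lock:
                if self._instance is None:  # re-check: another thread may have won
                    self._instance = self._factory()
                inst = self._instance
        return inst
```

With faster-whisper, the factory could be something like `lambda: WhisperModel("small", device="cpu", compute_type="int8", cpu_threads=4)`, so the model load happens on the first segment rather than at import time. A factory that returns None would be re-run on every call, which is fine for model loaders.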
Decisions:
- STT runs in the audio thread: acceptable at 2.25s p50 (the user has just
stopped talking, so there is no batching contention). Wrap in
ThreadPoolExecutor.submit if performance becomes a problem later.
- Downsample 48k→16k via 3-sample averaging (no scipy dependency). Whisper
is robust to mild aliasing.
- Energy-RMS VAD fallback if the torch import fails (graceful degradation).
- router_route_message injection seam as a kwarg, for testability.
- bot.change_presence is called cross-thread via run_coroutine_threadsafe.
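The 3-sample averaging and the energy fallback can be sketched with only the standard library. Averaging each group of three samples is a crude box low-pass filter followed by 3:1 decimation, so some content above 8 kHz folds back into the band; that is the mild aliasing referred to above. The function names, the 500.0 RMS threshold, the hypothetical silero loader passed to pick_vad, and the mono-input assumption are all illustrative (Discord delivers stereo, so channels would have to be mixed first; the commit does not say how that is handled).

```python
import math
from array import array
from typing import Callable

VadFn = Callable[[array], float]


def downsample_48k_to_16k(samples: array) -> array:
    """48 kHz -> 16 kHz by averaging each group of 3 int16 samples (mono assumed)."""
    n = len(samples) - len(samples) % 3  # ignore a trailing partial group
    out = array("h")
    for i in range(0, n, 3):
        # The mean of three int16 values always fits back into int16.
        out.append((samples[i] + samples[i + 1] + samples[i + 2]) // 3)
    return out


def rms_speech_prob(samples: array, threshold_rms: float = 500.0) -> float:
    """Energy-only fallback VAD: 1.0 if RMS >= threshold_rms, else 0.0.

    threshold_rms=500.0 is a hypothetical value, not taken from the commit.
    """
    if len(samples) == 0:
        return 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 1.0 if rms >= threshold_rms else 0.0


def pick_vad(load_silero: Callable[[], VadFn]) -> VadFn:
    """Use silero-vad if its loader works; degrade to RMS when torch is missing."""
    try:
        return load_silero()  # hypothetical loader that imports torch
    except ImportError:
        return rms_speech_prob
```

`array("h", pcm_bytes)` converts raw 16-bit PCM bytes into samples in the host's native byte order, which is the input these helpers expect.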
tests/test_voice_session_cleanup.py (6 tests):
- voice_leave / disconnect / crash via __exit__ / auto_leave /
user_left_channel (5 cleanup paths each verified for: JSONL state,
presence cleared, voice_client.cleanup, ttsq.stop, lock release,
idempotency)
- 1 robustness cross-cut (double-cleanup safety)
6/6 PASS. Regression suite 63/63 PASS (normalize + adapter + mutex).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>