chore(voice): spike STT latency benchmark + HT contention lesson
Step 1 (BLOCKING) of the Discord voice-to-voice test plan. Empirical sweet
spot on i7-6700T (4 cores / 8 threads): faster-whisper small int8 @
cpu_threads=4 → p50 2.25s, p95 2.64s, mean RTF 0.46. HT curve: 2t=3.25s →
4t=2.25s (sweet spot) → 6t=2.79s (+24% regression from contention). tiny
rejected: it hallucinates on Romanian.
- tools/voice_bench.py: benchmark harness with 8 Romanian samples
synthesized via the Supertonic API; measures p50/p95/RTF for small+tiny
on N threads.
- tools/voice_bench_results*.json: raw output of the 3 passes (threads 2/4/6).
- tasks/voice-bench-results*.md: Markdown summary per pass.
- tasks/lessons.md: HT contention rule: set cpu_threads = physical cores,
and run a sweep rather than a single point for compute-bound ML inference.
Budget updated in the plans: STT p50 1.5s → 2.5s, perceived latency
4s → 5s p50.
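For reference, a minimal sketch of the sweep-and-summarize logic behind the rule above. This is not the code in tools/voice_bench.py. The names `percentile`, `summarize`, `sweep`, `pick_sweet_spot` and the `run(threads, sample)` callable are hypothetical. RTF is assumed to mean processing time divided by audio duration, and the per-thread figures in this message are assumed to be p50 values.

```python
import math
import statistics
import time
from typing import Any, Callable, Dict, Sequence, Tuple

Stats = Dict[str, float]


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0..100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies: Sequence[float], audio_secs: Sequence[float]) -> Stats:
    """p50/p95 latency plus mean real-time factor (latency / audio length)."""
    rtfs = [lat / dur for lat, dur in zip(latencies, audio_secs)]
    return {
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "mean_rtf": statistics.fmean(rtfs),
    }


def sweep(
    run: Callable[[int, Any], None],          # hypothetical: transcribes one sample
    samples: Sequence[Tuple[Any, float]],     # (sample, audio duration in seconds)
    thread_counts: Sequence[int],
) -> Dict[int, Stats]:
    """Time every sample at every thread count instead of a single point."""
    durations = [dur for _, dur in samples]
    results: Dict[int, Stats] = {}
    for threads in thread_counts:
        latencies = []
        for sample, _ in samples:
            start = time.perf_counter()
            run(threads, sample)
            latencies.append(time.perf_counter() - start)
        results[threads] = summarize(latencies, durations)
    return results


def pick_sweet_spot(results: Dict[int, Stats]) -> int:
    """Thread count with the lowest p50 latency."""
    return min(results, key=lambda threads: results[threads]["p50"])
```

Fed the three figures from this commit (2t=3.25s, 4t=2.25s, 6t=2.79s), `pick_sweet_spot` selects 4 threads. In a real harness, `run` would presumably wrap a faster-whisper model constructed with `cpu_threads=threads`.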
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>