echo-core

romfast/echo-core

Commit Graph


Author

SHA1

Message

Date

Marius Mutu

e79bed7afe

feat(voice): unify Discord voice↔text session (squash of voice/text-unify)

Voice utterances and text messages on the same Discord channel now share
one Claude session, and Echo's voice replies are mirrored back into the
text channel. Replaces the old voice:<id> session-key split.

Changes:
- src/adapters/_text_chunks.py: new leaf module for split_message
  (used by both discord_bot and voice pipeline)
- src/router.py: drop voice: prefix from session_key; add [voice] marker;
  strip leading [speaker:/[voice] tokens from user input (anti-jailbreak);
  remove dead double-clear of voice: key
- src/claude_session.py: include personality/VOICE_MODE.md unconditionally
  (rules become per-turn-aware via [speaker:] prefix instead of session flag)
- src/voice/pipeline.py: VoiceSession splits text_channel_id +
  voice_channel_id; resolve text channel per-send (no stale refs); mirror
  Echo's reply text into the text channel after route_message returns
- src/adapters/discord_voice.py: /voice join passes both channel ids
- src/adapters/discord_bot.py: import split_message from leaf module
- personality/VOICE_MODE.md: rewrite as per-turn dynamic rules;
  add synthesis instructions for text turns after voice turns
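
The anti-jailbreak stripping in src/router.py can be pictured roughly as below. This is a minimal sketch, not the router's actual code: the exact marker syntax, the regex, and the helper names strip_leading_markers and tag_voice_turn are assumptions made for illustration.

```python
import re

# Assumed marker syntax: "[speaker:<name>]" and "[voice]" at the start of a turn.
_LEADING_MARKERS = re.compile(
    r"^(?:\s*\[(?:speaker:[^\]]*|voice)\])+\s*", re.IGNORECASE
)


def strip_leading_markers(text: str) -> str:
    """Drop marker tokens that a user typed at the start of a message."""
    return _LEADING_MARKERS.sub("", text)


def tag_voice_turn(speaker: str, text: str) -> str:
    """Prefix a sanitized utterance with markers that only the router adds."""
    return f"[speaker:{speaker}] [voice] {strip_leading_markers(text)}"
```

Stripping before tagging means that only markers added by the router reach the model, so a typed "[speaker:admin]" prefix cannot impersonate another speaker.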

Tests:
- tests/test_router.py: 4 new cases (plain channel_id, anti-jailbreak,
  text-adapter regression, no-double-clear)
- tests/test_pipeline_mirror.py: new — Echo reply mirror chunking,
  empty guard, mirror_enabled=False, send-raises resilience
- tests/test_voice_session_channel_ids.py: new — split-attr contract
  + metrics payload schema
- tests/test_voice_session_cleanup.py: update for new kwargs
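
The mirror behaviour that tests/test_pipeline_mirror.py exercises (chunking, empty guard, mirror_enabled=False, send-raises resilience) might look like the following sketch. The split strategy, the 2000-character default (Discord's per-message limit), and the mirror_reply helper are assumptions, not the project's implementation.

```python
import asyncio
from typing import Awaitable, Callable


def split_message(text: str, limit: int = 2000) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring
    newline or space boundaries over hard cuts."""
    chunks: list[str] = []
    remaining = text.strip()
    while remaining:
        if len(remaining) <= limit:
            chunks.append(remaining)
            break
        cut = remaining.rfind("\n", 0, limit)
        if cut <= 0:
            cut = remaining.rfind(" ", 0, limit)
        if cut <= 0:
            cut = limit  # no boundary found: hard cut
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    return chunks


async def mirror_reply(
    send: Callable[[str], Awaitable[None]], text: str, enabled: bool = True
) -> int:
    """Send each chunk via `send`; return how many chunks were delivered."""
    if not enabled or not text.strip():
        return 0  # mirror disabled or nothing to send
    sent = 0
    for chunk in split_message(text):
        try:
            await send(chunk)
            sent += 1
        except Exception:
            break  # a failed mirror must not break the voice turn
    return sent


if __name__ == "__main__":
    async def fake_send(chunk: str) -> None:  # stand-in for a channel send
        print(f"send: {chunk!r}")

    asyncio.run(mirror_reply(fake_send, "hello from the voice channel"))
```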

Plan: /home/moltbot/.claude/plans/vreau-ca-tot-textul-greedy-rivest.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-28 14:24:15 +00:00

Marius Mutu

23666f7910

feat(voice): Step 5 - voice/pipeline.py VoiceSession + EchoVoiceSink + cleanup

Central voice pipeline (~250 LOC + docstrings = ~430 lines):

VoiceSession (context manager + idempotent cleanup across 5 paths):
- __enter__: acquire _lock, open JSONL (record=on)
- __exit__: calls cleanup("exit"), does not suppress exceptions
- cleanup(reason): IDEMPOTENT, side effects run only once: JSONL
  flush+close (record=on) or delete (record=off), bot presence cleared,
  voice_client.cleanup(), ttsq.stop(), cancel filler task, lock release,
  structured log to logs/voice_metrics.jsonl
- on_segment_done(speaker_id, text, no_speech_prob): mirror text channel,
  append JSONL, arm 3s filler timer, route_message with on_text callback
  + cancel filler on first block
- last_activity_ts: time.monotonic(), for caller-driven 5min auto-leave
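
The context-manager and idempotent-cleanup contract above can be sketched as follows. SessionSketch and its attributes are hypothetical stand-ins. The real VoiceSession also handles JSONL, presence, the voice client and the TTS queue, which this sketch only marks with a comment.

```python
import threading


class SessionSketch:
    """Minimal model of a session whose cleanup runs its side effects once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()           # held while the session is live
        self._cleanup_guard = threading.Lock()  # protects the _cleaned flag
        self._cleaned = False
        self.cleanup_reasons: list[str] = []

    def __enter__(self) -> "SessionSketch":
        self._lock.acquire()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.cleanup("exit")
        return False  # never suppress the caller's exception

    def cleanup(self, reason: str) -> bool:
        """Return True only for the call that actually performed cleanup."""
        with self._cleanup_guard:
            if self._cleaned:
                return False
            self._cleaned = True
        self.cleanup_reasons.append(reason)
        # Real side effects (close files, stop queues, clear presence) go here.
        if self._lock.locked():
            self._lock.release()
        return True
```

Whichever path reaches cleanup first (an explicit leave, a disconnect, or __exit__ after a crash) wins, and later calls become no-ops.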

EchoVoiceSink(session, bot_user_id):
- wants_opus() False (PCM)
- write() runs in voice_recv reader thread (threading primitives only):
  - GUARD 1: user None/id==0/id==bot_user_id → return (load-bearing
    echo prevention)
  - GUARD 2: whitelist filter (empty = allow all)
  - Buffer 20ms packets per-user → batch 100ms (5×20ms = 19200 bytes)
    → silero-vad threshold 0.5 → 800ms cumulative silence flush
  - _flush_to_stt: faster-whisper small int8 cpu_threads=4 lang=ro
    beam_size=1, no_speech_prob > 0.6 drop, schedule on_segment_done
    via run_coroutine_threadsafe on session.loop
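
The buffering arithmetic above works out if the decoded PCM is 48 kHz, 16-bit, stereo: 20 ms is 3840 bytes, and five packets make a 19200-byte window. The sketch below is a rough model of the per-user batching and silence flush. UserBuffer and the is_speech callback are hypothetical, and resetting the silence counter when speech resumes is an assumption about how "cumulative" is meant.

```python
from typing import Callable, Optional

SAMPLE_RATE = 48_000
CHANNELS = 2
BYTES_PER_SAMPLE = 2
FRAME_MS = 20
VAD_WINDOW_MS = 100
SILENCE_FLUSH_MS = 800

FRAME_BYTES = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE * FRAME_MS // 1000
WINDOW_BYTES = FRAME_BYTES * (VAD_WINDOW_MS // FRAME_MS)


class UserBuffer:
    """Collect 20 ms packets into 100 ms windows and flush after silence."""

    def __init__(self, is_speech: Callable[[bytes], bool]) -> None:
        self.is_speech = is_speech   # stand-in for the VAD decision
        self.pending = bytearray()   # packets not yet forming a full window
        self.segment = bytearray()   # speech collected for the next STT call
        self.silence_ms = 0

    def push(self, pcm: bytes) -> Optional[bytes]:
        """Add one packet; return a finished segment, or None if not ready."""
        self.pending.extend(pcm)
        if len(self.pending) < WINDOW_BYTES:
            return None
        window = bytes(self.pending[:WINDOW_BYTES])
        del self.pending[:WINDOW_BYTES]
        if self.is_speech(window):
            self.segment.extend(window)
            self.silence_ms = 0      # assumption: speech resets the counter
            return None
        if not self.segment:
            return None              # silence before any speech: ignore
        self.silence_ms += VAD_WINDOW_MS
        if self.silence_ms < SILENCE_FLUSH_MS:
            return None
        done = bytes(self.segment)
        self.segment.clear()
        self.silence_ms = 0
        return done
```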

Module helpers (lazy thread-safe singletons): _get_whisper_model,
_get_silero_vad. Constants: FILLER_DELAY_S=3.0, SILENCE_FLUSH_MS=800,
VAD_THRESHOLD=0.5, VAD_WINDOW_MS=100, NO_SPEECH_DROP_THRESHOLD=0.6.
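
A lazy thread-safe singleton of the kind mentioned for the module helpers is commonly written with double-checked locking. The sketch below shows that general pattern only: get_model and _load_model are placeholder names, and the loader returns a dummy object rather than a real Whisper or VAD model.

```python
import threading

_model = None
_model_lock = threading.Lock()


def _load_model() -> object:
    """Stand-in for an expensive model load (hypothetical)."""
    return object()


def get_model() -> object:
    """Load the model on first use; later calls return the same instance."""
    global _model
    if _model is None:              # fast path, no lock once loaded
        with _model_lock:
            if _model is None:      # re-check: another thread may have won
                _model = _load_model()
    return _model
```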

Decisions:
- STT runs in audio thread: acceptable at 2.25s p50 (user just stopped
  talking, no batching contention). Wrap in ThreadPoolExecutor.submit
  if perf bites later.
- Downsample 48k→16k via 3-sample averaging (no scipy dep). Whisper is
  robust to mild aliasing.
- Energy-RMS VAD fallback if torch import fails (graceful degrade).
- router_route_message injection seam as a kwarg for testability.
- bot.change_presence handled cross-thread via run_coroutine_threadsafe.
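
The 48k→16k downsampling decision can be illustrated with the standard library alone. This is a sketch assuming mono, native-endian, 16-bit PCM input. The function name is hypothetical, and any stereo-to-mono step in the real pipeline is not shown.

```python
from array import array


def downsample_48k_to_16k(pcm: bytes) -> bytes:
    """Average each group of 3 int16 samples into one (48 kHz -> 16 kHz)."""
    samples = array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # drop a stray odd byte
    usable = len(samples) - len(samples) % 3           # drop incomplete group
    out = array(
        "h",
        (
            (samples[i] + samples[i + 1] + samples[i + 2]) // 3
            for i in range(0, usable, 3)
        ),
    )
    return out.tobytes()
```

Averaging acts as a crude low-pass filter before decimation, which is why only mild aliasing is expected. A proper resampler would use a designed anti-aliasing filter instead.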

tests/test_voice_session_cleanup.py — 6 tests:
- voice_leave / disconnect / crash via __exit__ / auto_leave /
  user_left_channel (5 cleanup paths each verified for: JSONL state,
  presence cleared, voice_client.cleanup, ttsq.stop, lock release,
  idempotency)
- 1 robustness cross-cut (double-cleanup safety)

6/6 PASS. Regression suite 63/63 PASS (normalize + adapter + mutex).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-27 14:55:57 +00:00

2 Commits