Marius Mutu e589e4885e feat(voice): voice-mode prompt, isolated session, units, verbal voice swap, fast barge-in
Second voice UX iteration. Targets Marius's live-test pain points from today.

- **Voice-mode system prompt** (personality/VOICE_MODE.md, plumbed via
  claude_session.build_system_prompt(voice_mode=True)) — when the voice
  adapter starts a session, append voice-tailored instructions: short replies,
  no markdown, no abbreviations, time without seconds, distances rounded
  to "mii"/"milioane", no curly quotes / em-dash / ellipsis. Marius asked
  for an "in-the-car friend" persona for voice.

- **Isolated voice session key** (router.py) — voice mode uses
  `voice:<channel_id>` so it doesn't share context with the text adapter
  on the same Discord channel. Fresh start, voice prompt applied
  automatically without `/clear` ceremony. `/clear` drops both keys.
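
A minimal sketch of the key isolation described above, not the actual router.py code. It assumes sessions live in a plain dict and that the text adapter keys its session by the bare channel ID; `session_key` and `clear_channel` are hypothetical names used only for illustration.

```python
def session_key(channel_id: int, voice: bool = False) -> str:
    # Voice sessions get a "voice:" prefix so they never collide with the
    # text session on the same channel (text key format is an assumption).
    return f"voice:{channel_id}" if voice else str(channel_id)


def clear_channel(sessions: dict, channel_id: int) -> list:
    # Mirrors "/clear drops both keys": remove the text and the voice
    # session for this channel and report which keys actually existed.
    dropped = []
    for key in (session_key(channel_id), session_key(channel_id, voice=True)):
        if sessions.pop(key, None) is not None:
            dropped.append(key)
    return dropped
```

Because the two keys differ, a voice session always starts without the text history, which is what lets the voice prompt apply without a manual `/clear`.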

- **Metric units + Romanian thousands** (src/voice/normalize.py) —
  `384.000 km` was being read as "trei sute optzeci și patru virgulă zero
  zero zero km" because the dot was treated as decimal separator and `km`
  wasn't expanded. New `normalize_thousands` collapses Romanian thousands
  separators (`X.000`/`X.000.000`) before number expansion, and
  `expand_units` handles km/kg/cm/mm/ml/ha/mp with correct Romanian
  pluralization ("un kilometru", "două kilograme", "douăzeci de
  centimetri", "o sută de kilometri" with "de" particle).
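
The two rules above can be sketched as follows. This illustrates the idea and is not the code in src/voice/normalize.py: the regex collapses any dot-separated three-digit groups (an assumption, the real function may only target `.000` groups), and `needs_de` is a hypothetical helper encoding the usual Romanian rule that "de" precedes the noun when the number is at least 20 and its last two digits are 00 or 20 to 99.

```python
import re

# One to three leading digits followed by one or more ".ddd" groups,
# not embedded in a longer number (so "3.14" and "1.2345" are left alone).
_THOUSANDS = re.compile(r"(?<![\d.,])(\d{1,3})((?:\.\d{3})+)(?!\d)")


def normalize_thousands(text: str) -> str:
    # "384.000 km" -> "384000 km": drop the separators so the number
    # expander sees one integer instead of a decimal fraction.
    return _THOUSANDS.sub(lambda m: m.group(1) + m.group(2).replace(".", ""), text)


def needs_de(n: int) -> bool:
    # "douăzeci de centimetri", "o sută de kilometri", but "două kilograme".
    return n >= 20 and (n % 100 == 0 or n % 100 >= 20)
```

Running `normalize_thousands` before number expansion matters because the expander would otherwise read the dot as a decimal separator, which is the bug described above.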

- **`/voice setvoice <M1-F5>` slash command** (discord_voice.py) — Discord
  native autocomplete; swaps the live TTSQueue voice_id AND persists
  voice.default_voice to config.json. No restart needed.

- **Verbal voice change** (src/voice/voice_commands.py — new module +
  29 tests) — say "schimbă vocea pe M5" / "vorbește cu vocea F3" / "voce
  em cinci" from inside the voice channel. Detector requires both a
  trigger word (voce/vorbește/schimbă/treci pe) and a recognizable voice
  ID (direct "M5", word form "em cinci", or fallback substring match for
  Whisper-mangled forms like "unul cinci"=M5 and "Mâcinci"=M5). On
  detection: live-swap, persist to config, mirror to chat with
  `🎤 ... / 🔊 Voce → M5`, speak short ack in the NEW voice, skip
  Claude. "pământinci" still can't be recovered (no recoverable digit
  substring); user gets passthrough to Claude in that case.
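
A simplified sketch of the two-condition detector described above: a trigger word plus a recognizable voice ID. It is not the code in src/voice/voice_commands.py. It assumes the voice IDs are M1 to M5 and F1 to F5, covers only the direct and word forms, and leaves out the fallback substring matching for Whisper-mangled transcripts; `detect_voice_command` is a hypothetical name.

```python
import re
from typing import Optional

# Trigger words from the description, with unaccented variants added
# as an assumption about what STT may produce.
TRIGGERS = ("voce", "vorbește", "vorbeste", "schimbă", "schimba", "treci pe")
DIGIT_WORDS = {"unu": "1", "doi": "2", "trei": "3", "patru": "4", "cinci": "5"}

_DIRECT = re.compile(r"\b([mf])\s?([1-5])\b")
_WORDS = re.compile(r"\b(em|ef)\s+(unu|doi|trei|patru|cinci)\b")


def detect_voice_command(transcript: str) -> Optional[str]:
    text = transcript.lower()
    # Without a trigger word the utterance is passed through untouched.
    if not any(trigger in text for trigger in TRIGGERS):
        return None
    match = _DIRECT.search(text)
    if match:
        return match.group(1).upper() + match.group(2)
    match = _WORDS.search(text)
    if match:
        letter = "M" if match.group(1) == "em" else "F"
        return letter + DIGIT_WORDS[match.group(2)]
    return None
```

Requiring both conditions keeps ordinary sentences that merely mention "M5" from changing the voice; anything the detector returns `None` for falls through to Claude.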

- **Whisper initial_prompt** now lists the voice-command vocabulary so
  STT biases toward producing clean "M5" / "F3" tokens instead of
  inventing "pământ" / "unul" phonetic neighbors.

- **Fast barge-in** (pipeline.py EchoVoiceSink) — previously `ttsq.clear()`
  only fired in `on_segment_done` (after 800ms silence + 2-3s STT ≈ 3s lag).
  Now also fires from the sink as soon as VAD detects ≥2 consecutive
  windows (~200ms) of sustained speech on Marius's user while Echo has
  pending TTS frames. Single-window glitches don't cut Echo off; sustained
  speech does. (Acoustic echo bleed-through still requires headphones —
  no AEC in the bot.)
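
The debounce logic above can be sketched as a small streak counter. This is an illustration, not the EchoVoiceSink code in pipeline.py: `BargeInDetector` and `feed` are hypothetical names, and the sketch assumes the sink receives one boolean VAD decision per window of roughly 100 ms.

```python
from dataclasses import dataclass


@dataclass
class BargeInDetector:
    min_windows: int = 2  # consecutive speech windows required (~200 ms)
    _streak: int = 0

    def feed(self, is_speech: bool, tts_pending: bool) -> bool:
        """Return True when the caller should clear the TTS queue."""
        # Silence, or nothing queued to interrupt, resets the streak, so a
        # single-window glitch never cuts playback off.
        if not (is_speech and tts_pending):
            self._streak = 0
            return False
        self._streak += 1
        if self._streak >= self.min_windows:
            self._streak = 0  # fire once, then start counting again
            return True
        return False
```

In this sketch the caller would invoke `ttsq.clear()` whenever `feed` returns True, instead of waiting for the segment to finish and for STT to complete.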

- Tests: 130 voice + router tests pass; updated test_router.py to expect
  `/clear` to drop both text and voice session keys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 20:59:10 +00:00