Gemma 4 cloud audio was infeasible (31b-cloud has no audio; E4B broken
upstream, no deploy host), so improve faster-whisper instead.
- Pin temperature=0.0 to disable the fallback ladder that re-decoded unclear
audio up to 6x (source of the 16-24s latency outliers); reject hallucinated
segments via avg_logprob/compression_ratio in the new pure _filter_segments.
- Adopt mikr/whisper-small-ro-cv11 (CT2 int8) via configurable voice.stt_model:
spike showed WER 24%->10%, numbers fixed at source, +0.33s p50 (in budget).
- Add tools/voice_stt_mine.py (log mining) + tools/voice_stt_spike.py (model
eval with diacritic scoring) + tests for the gate and miner.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Expose a navigation layer to the agent and harden RAG, after analyzing the
OKF note and testing on the real KB.
- memory_search.search(): dedupe best-chunk-per-file (a relevant note can no
longer be buried by another file's chunks) + keyword fallback tagged
degraded:True when Ollama is unreachable (no more hard crash).
- update_notes_index.py: emit per-folder index.md + root router; prune empty
folders; fix latent subcategory->project bug.
- Exclude generated index.md from RAG rglob (reindex/incremental) + indexer
scans + heartbeat freshness check (prevents self-pollution / reindex thrash).
- CLAUDE.md: reframe memory as hybrid (navigation first, RAG for fuzzy recall).
- Delete stale orphan kb/youtube/index.json; correct the OKF source note.
- Tests: dedup, keyword fallback, index.md exclusion. Plan + review in docs/.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Voice utterances and text messages on the same Discord channel now share
one Claude session, and Echo's voice replies are mirrored back into the
text channel. Replaces the old voice:<id> session-key split.
Changes:
- src/adapters/_text_chunks.py: new leaf module for split_message
(used by both discord_bot and voice pipeline)
- src/router.py: drop voice: prefix from session_key; add [voice] marker;
strip leading [speaker:/[voice] tokens from user input (anti-jailbreak);
remove dead double-clear of voice: key
- src/claude_session.py: include personality/VOICE_MODE.md unconditionally
(rules become per-turn-aware via [speaker:] prefix instead of session flag)
- src/voice/pipeline.py: VoiceSession splits text_channel_id +
voice_channel_id; resolve text channel per-send (no stale refs); mirror
Echo's reply text into the text channel after route_message returns
- src/adapters/discord_voice.py: /voice join passes both channel ids
- src/adapters/discord_bot.py: import split_message from leaf module
- personality/VOICE_MODE.md: rewrite as per-turn dynamic rules;
add synthesis instructions for text turns after voice turns
Tests:
- tests/test_router.py: 4 new cases (plain channel_id, anti-jailbreak,
text-adapter regression, no-double-clear)
- tests/test_pipeline_mirror.py: new — Echo reply mirror chunking,
empty guard, mirror_enabled=False, send-raises resilience
- tests/test_voice_session_channel_ids.py: new — split-attr contract
+ metrics payload schema
- tests/test_voice_session_cleanup.py: update for new kwargs
Plan: /home/moltbot/.claude/plans/vreau-ca-tot-textul-greedy-rivest.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Squashed branch: voice/dave-recv → master. Closes Pas 12 (DAVE E2E) and lands
voice-mode UX polish + verbal voice control on top of the Pas 1-10 scaffolding
already on master.
## DAVE E2E receive-side decrypt (e4f3177)
Vendored fork: discord-ext-voice-recv 0.5.3a+echo.dave1. Patches the receive
pipeline to handle Discord's mandatory DAVE encryption on voice gateway v=8.
- `_maybe_dave_decrypt`: uses davey.can_passthrough(user_id) as primary gate,
falls through to dave.decrypt for DAVE-epoch peers, drops on decrypt failure
without killing the reader thread.
- VAD fix: silero-vad v5+ requires exactly 512 samples; our 100ms window
(1600 samples) was silently raising ValueError → STT never fired. Now slice
into 512-sample chunks.
- Whisper: bumped beam_size 1→5 and added RO initial_prompt.
- Tests: 11 DAVE unit tests + 2 callback integration tests + contract test
with fork-version guard.
## Voice UX polish (d1bc77e)
- Killed the 3s "mă gândesc" filler (always collided with Claude p50 4-7s).
- Barge-in via `ttsq.clear()` at top of `on_segment_done`.
- DTX silence-flush poller (200ms tick) — Discord stops sending RTP packets
when silent, so the inline silence-check in sink.write() never fired for
trailing audio; background thread handles it.
- `EchoStreamingAudioSource.read()` non-blocking — old `get_frame(timeout=0.1)`
wrecked Discord's 20ms cadence and the client interpreted bursts as
stuttering (Marius heard "4 de minute" instead of full sentence).
- RO time expansion: 23:09 → "douăzeci și trei și nouă minute".
- Supertonic Unicode sanitize centralized in tools/tts.py.
- Whisper local_files_only=True — no HF metadata GET on each startup.
- Diagnostic logging through sink → VAD → Claude stream → TTS chain.
## Voice mode iteration (e589e48)
- `personality/VOICE_MODE.md` — voice-tailored system prompt (short, no
markdown, no abbreviations, time without seconds, distances in
"mii"/"milioane"); plumbed via build_system_prompt(voice_mode=True).
- Isolated voice session key `voice:<channel_id>` — voice doesn't share
context with text adapter on the same channel; auto-applied without
/clear ceremony. /clear drops both keys.
- Metric units + Romanian thousands (normalize.py): "384.000 km" →
"trei sute optzeci și patru de mii de kilometri" with feminine-correct
pluralization and "de" particle for ≥20.
- `/voice setvoice <M1-F5>` slash command with native autocomplete; swaps
live + persists voice.default_voice to config.json.
- Verbal voice change (src/voice/voice_commands.py + 29 tests) — "schimbă
vocea pe M5", "voce em cinci", with permissive substring fallback for
Whisper-mangled forms like "Mâcinci"=M5 and "unul cinci"=M5. Whisper
initial_prompt now lists voice vocabulary to bias STT toward clean
outputs.
- Fast barge-in: VAD ≥2 consecutive windows (~200ms) on Marius's user
while Echo has pending TTS frames → cut him off mid-sentence so user
doesn't wait the full silence + STT cycle. Acoustic echo bleed-through
still requires headphones (no AEC).
## Test suite
130 voice + router tests pass (test_voice_recv_dave, test_voice_session_cleanup,
test_voice_adapter_contract, test_voice_normalize, test_voice_commands,
test_router).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
src/voice/tts_stream.py (~280 lines):
- clause_segments(text, min_words=8): yield Romanian-aware clause chunks.
Split la punct (./!/?;:,) cu accumulation până min_words satisfied;
edge case text < min_words → single chunk. NU split mid-word/number/
currency. Romanian intonație de frază se rupe la sentence break — 8+
words minimizează seams.
- TTSQueue worker thread: text queue in → PCM frames out. Methods:
start/stop/push_text/push_filler/clear/is_empty. normalize_for_tts()
apply first, then clause_segments(), then Supertonic synth per chunk.
- EchoStreamingAudioSource(discord.AudioSource): read() pull from PCM
queue, 20ms frames (3840 bytes 48kHz s16le stereo). Eliminates RTP
gap between play() calls — single play() per session, source pulls.
- load_thinking_wav(): one-shot cache → 140 × 20ms frames (~2.8s)
pentru filler "Stai puțin să-mi adun gândurile".
- wav_to_pcm_20ms_frames(): WAV parser + ffmpeg subprocess resample
la 48kHz s16le stereo dacă nevoie.
Smoke test (în session): clause_segments behaviour OK, thinking.wav
loads, TTSQueue + EchoStreamingAudioSource construct clean. Integration
testing deferred la convergență (Pas 7 + Pas 11).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix arhitectural general (beneficiu și pentru text adapters), nu doar voice.
src/claude_session.py:
- _session_locks: dict[str, threading.Lock] cu bootstrap lock pentru
lazy creation thread-safe.
- _get_session_lock(channel_id) helper.
- send_message() body wrapped în with _get_session_lock(channel_id).
- threading.Lock (NU asyncio.Lock) — send_message e sync subprocess.run
blocking; asyncio.Lock nu protejează cod sync rulat via to_thread.
- Per-channel granularity preserved — different channels run în paralel.
- send_message() public signature unchanged.
src/router.py:
- route_message(): dacă adapter_name == "discord-voice", prepend
[speaker:<user_name>] prefix (Config.get("voice.user_name", "user")).
- Original text variable left untouched for downstream paths.
- Text adapters: zero behavior change.
- route_message() public signature unchanged.
tests/test_claude_session_mutex.py — 6 tests REGRESSION-CRITICAL:
- same channel serializes (concurrent → mutex serializes, no overlap)
- same channel lock identity (same dict entry per channel_id)
- different channels run in parallel (overlap MUST fire)
- 3 channels all overlap
- contested acquire blocks then proceeds (policy: blocking, not fail-fast)
- lock released on subprocess exception (no deadlock on crash)
Acquisition policy: BLOCKING acquire bound by claude --timeout (5min default)
nu fail-fast — adapters already serialize via asyncio.to_thread queue, un
non-blocking acquire ar surface transient busy errors.
Test results: 82 passed (51 existing + 31 new). 2 PRE-EXISTING failures în
TestPromptInjectionProtection (stale assertion vs current prompt text) —
out of scope, recomand ticket separat.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure functions pentru TTS text normalization (RO):
- strip_markdown: regex bold/italic/code/link/heading/list
- expand_numbers_ro: num2words pentru cardinals + decimal handling
("3.14" → "trei virgulă paisprezece", "3.05" → "trei virgulă zero
cinci" digit-by-digit la leading zero)
- expand_currency: formă naturală RO ("12.50 RON" → "doisprezece lei
și cincizeci de bani", "$25.99" → "douăzeci și cinci de dolari și
nouăzeci și nouă de cenți")
- expand_symbols: %/&/@/° + whitespace collapse
- expand_abbreviations: etc./dl./dna./nr./ş.a./ş.a.m.d.
- normalize_for_tts: full pipeline + hard truncate 200 cuvinte cu
"Restul l-am scris în chat."
Pipeline order: markdown → abbreviations → currency → numbers →
symbols → truncate. Currency BEFORE numbers — altfel "12.50 RON" se
degradează la "doisprezece virgulă cincizeci RON". Romanian "de"
particle rule: n>=20 AND (n%100 not in 1..19) → "o sută de lei",
"o sută cinci lei" (no "de"). n=1 with currency → "un dolar" /
"un leu" (article, nu cardinal).
35/35 tests pass: markdown(5), cardinals(6), decimals(4), currency
RON/USD/EUR/GBP mix(8), symbols(4), abbreviations(4), truncation(2),
edge cases empty/whitespace(2).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parametrul `lang` era definit (DEFAULT_LANG = "ro") dar nu era inclus
in request-ul HTTP catre /v1/audio/speech. Adaugat "lang": lang in
body-ul JSON si lang="ro" explicit in _tts_synthesize().
OpenAPI-ul Supertonic confirma ca /v1/audio/speech accepta `lang`
ca parametru optional (OpenAISpeechRequest schema).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two structural fixes that together let users manage feature-branch
work without manual intervention:
Approval guard — `/plan/start` returns 409 `already_committed` if the
project status is approved/running/complete, unless the body opts in
with `force=true`. Frontend now renders "Re-planifică" instead of
"Planifică" on approved cards and gates it behind a confirm dialog
that threads `force=true` through. Prevents an accidental click from
wiping `status=approved` and burning a fresh planning subprocess.
Worktree awareness — projects can now declare that they target a
feature branch on an existing Gitea repo, not a repo-per-slug clone.
Three optional fields added to approved-tasks.json: `repo` (default
= slug), `branch` (feature branch to create), `base_branch` (default
main). Wired through `/p` flag parser in router.py, the dashboard
Propose modal's new "Avansat" section, and the night-execute prompt
which clones {repo} and creates {branch} from {base_branch} before
running ralph.
CLAUDE.md updated with both flows + the new schema fields.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three fixes that together restore the planning UX:
- Dashboard reopen showed only a 500-char truncated excerpt of the last
assistant message. Backend now reads the Claude session JSONL directly
and returns full per-turn history; frontend iterates and renders all
bubbles, falling back to last_text_excerpt when the JSONL is missing.
- Phases never advanced because the agent ran /plan-* skills inline as
tool calls and the marker protocol was loose. Tightened the planning
prompt (mandatory PHASE_STATUS marker on the last line of every turn,
ban on inline phase invocation), and the frontend now auto-calls
/plan/advance when phase_ready=true.
- The phase strip never showed visual state because data-phase values
("office-hours") didn't match orchestrator phase names ("/office-hours").
Added normalizePhase + cleanup of PHASE_STATUS markers from rendered
bubbles.
Also bumps eco.py session-content truncation from 2k to 20k so /eco
session views aren't cut mid-response either.
Bumps last_text_excerpt fallback in planning_session.py from 500 to
50_000 so even when the JSONL is unavailable, the bubble isn't sliced
mid-word.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Înainte: click pe 🧠 Planifică (Discord/Telegram) sau /plan <slug> fără descriere
pe un proiect din workspace fără entry în approved-tasks.json → mesaj eroare
"Adaugă mai întâi cu /p <slug> <descriere>" și user-ul trebuia să facă două
operații.
Acum:
- Discord button "Planifică" cu descriere goală → deschide RalphPlanModal cu
TextInput pentru descriere; on_submit pornește direct start_planning_session
- Discord /plan <slug> fără description param și fără entry → același modal
(response.send_modal ÎNAINTE de defer — Discord constraint)
- Telegram callback "Planifică" cu descriere goală → set state
STEP_INPUT_DESCRIPTION_THEN_PLAN + ForceReply; handle_message detectează
step și pornește planning cu textul user-ului
- ralph_flow.py: nou STEP_INPUT_DESCRIPTION_THEN_PLAN (alături de cel existent
pentru propose-only)
start_planning_session deja auto-creează entry în approved-tasks.json dacă
proiectul lipsește, deci flow-ul e end-to-end: workspace → click → descriere
→ planning agent activ.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Echo Core devine planning agent: poartă o conversație multi-fază cu Marius
folosind skill-urile gstack (/office-hours → /plan-ceo-review →
/plan-eng-review → /plan-design-review opt) și produce final-plan.md în
~/workspace/<slug>/scripts/ralph/, gata să fie consumat de Ralph PRD
generator (W3) noaptea.
Decizii arhitecturale (din eng review + spike findings):
- PlanningSession ca clasă SEPARATĂ de chat-ul main (NU mode=string param)
— separation explicit. claude_session.py rămâne strict pentru chat;
planning trăiește în src/planning_session.py + src/planning_orchestrator.py.
Inheritance literală nu se aplică (claude_session.py expune funcții
module-level, nu o clasă) — separation e satisfacută prin module distinct.
- Fresh subprocess PER skill phase, NU single resumed session — phase-urile
coordinează via disk artifacts (gstack convention în
~/.gstack/projects/<slug>/). Avoids context window growth.
- --max-turns 20 default + retry pe error_max_turns la --max-turns 30.
Spike a arătat că prompt-uri complexe pot exploda turn budget-ul.
- approved-tasks.json schema extins cu planning_session_id + final_plan_path
(Status flow: pending → planning → approved → running → complete).
- State separat în sessions/planning.json (NU active.json), keyed pe
(adapter, channel_id) pentru re-resume la restart echo-core.
Trigger-e:
- Discord: slash command /plan <slug> [descriere] cu autocomplete pe pending,
buton "🧠 Planifică" în RalphProjectView, și /cancel slash command.
- Telegram: /plan + /cancel commands, plus buton "🧠 Planifică" în
ralph project keyboard.
- Router: state-aware routing — dacă chat-ul e în planning, mesajele plain
trec la PlanningOrchestrator.respond() prin --resume; /cancel revine la
status pending; /advance / "Continuă faza" advance fază nouă (fresh
subprocess); /finalize sau "Dau drumul" promote la status approved.
Discord defer pattern: toate butoanele noi (PlanningActiveView,
PlanningFinalView, "🧠 Planifică") apelează await
interaction.response.defer(ephemeral=True) ÎNAINTE de orice IO — evită
"Interaction failed" pe IO >3s.
UX strings warm + colaborativ (per design review): "🧠 Pornesc planning
pentru ...", "Răspunde aici", "Continuă faza", "Dau drumul tonight",
"Anulează" — niciun "Submit/Approve/Cancel" generic.
Tests: 23 noi (test_planning_session, test_planning_orchestrator,
test_router_planning) — toate pass. Mock pe _run_claude pentru a evita
subprocess Claude real în CI.
Files new:
prompts/planning_agent.md
src/planning_session.py
src/planning_orchestrator.py
tests/test_planning_session.py
tests/test_planning_orchestrator.py
tests/test_router_planning.py
Files modified:
src/claude_session.py — _run_claude(cwd=...) optional + surface subtype/is_error
src/router.py — state-aware routing, start_planning_session, planning_advance/approve/cancel, _ralph_propose schema cu planning_session_id + final_plan_path
src/adapters/discord_bot.py — /plan + /cancel slash commands; planning views imported
src/adapters/discord_views.py — PlanningActiveView, PlanningFinalView, "Planifică" button în RalphProjectView, _split_chunks helper
src/adapters/telegram_bot.py — /plan + /cancel handlers, callback_ralph extins cu plan/planadvance/plancancel/planapprove, planning keyboards
Status testelor pe modulele atinse: 75 passed, 0 failed
(test_claude_session security_section preexistent — neatins).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restructurează comenzile Ralph într-un dispatcher unificat (_try_ralph_dispatch)
care suportă atât comenzile noi scurte (/p /a /l /k) cât și aliasurile legacy
(!propose !approve !status !stop). Pe Discord adaugă slash commands native cu
autocomplete dinamic pentru pending (/a) și running (/k). Pe Telegram apar în
meniul /. WhatsApp le parsează ca text plain.
Activează cron jobs morning-report (08:30) și evening-report (21:00) și adaugă
night-execute (23:00) pentru execuția autonomă a proiectelor aprobate.
Foundation pentru W1 din planul "Echo Core conversational planning agent".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Uncommitted files alone are not an actionable heartbeat alert.
Only send a message if there are other findings besides git status.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Heartbeat is now handled exclusively by the Claude-based cron job
(heartbeat-2h in jobs.json), which is more flexible.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- AsyncIOScheduler now runs in Europe/Bucharest so cron strings in jobs.json
match local wall-clock time.
- New add_shell_job() validates name, cron, command list, channel, report_on
(always|changes|never), and optional timeout (1..3600s). Existing add_job()
stays untouched for the Claude path.
- _execute_job dispatches on job['kind'] (default 'claude'); legacy jobs
without the field still route to the Claude executor. Refactored the
Claude path into _execute_claude_job; new _execute_shell_job runs
subprocess with _safe_env + PROJECT_ROOT cwd.
- Shell semantics: non-zero exit always forwards stderr (trimmed to 500 ch)
as '[cron:NAME] exit CODE: STDERR' regardless of report_on. On exit 0,
'always' forwards stdout (trimmed to 1500 ch), 'never' stays silent, and
'changes' parses the GSTACK-CRON marker (^GSTACK-CRON: changes=\d+$) and
forwards stdout only when N>0; missing/malformed marker logs a warning
and stays silent.
- Timeout honoured per-job (falls back to JOB_TIMEOUT=300s).
Digest summarizes unread emails via Claude CLI; forward sends raw
content (split to 4096 chars). Wired as /email digest and
/email forward slash commands, plus instant per-guild sync on ready.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Beehiiv redirects non-existent newsletters to /?404=... with HTTP 302.
With follow_redirects=True, the final 200 was misread as "newsletter exists".
Fix: disable redirect following so only a direct HTTP 200 = newsletter real.
Also reset state back to last_sent=13 (real).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- heartbeat saves unread whitelisted emails via email_process --save --json
- fix: add --add-dir so Claude CLI subprocess can access memory/ symlink
- email_check/process: use BODY.PEEK[] to avoid marking emails as read
- email_process: simplify credential loading via credential_store only
- config: heartbeat interval 30→120min, quiet hours end 08→07
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
KB and embeddings reindex status messages were being sent to Discord
as heartbeat results. These are internal housekeeping — now logged
instead of surfaced to the user.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Switch Bash permission patterns from space to colon separator
- Add memory.bak/ to .gitignore
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- claude_session: replace 10 individual git command patterns with single Bash(git *) wildcard
- generate_pdf: add italic/bold-oblique font loading and render_rich_text() for inline bold/italic
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fast commands for git, email, calendar, notes, search, reminders, and
diagnostics — all execute instantly without Claude CLI. Incremental
embeddings indexing in heartbeat (1h cooldown) + inline indexing after
/note, /jurnal, /email save. Fix Ollama URL (localhost → 10.0.20.161),
fix email_process.py KB path (kb/ → memory/kb/).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Calendar no longer bypasses quiet hours. First run after quiet hours
sends full daily summary, subsequent runs only remind for next event
within 45 min with deduplication. Calendar cooldown set to 30 min.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Heartbeat system overhaul:
- Fix email/calendar checks to parse JSON output correctly
- Add per-check cooldowns and quiet hours config
- Send findings to Discord channel instead of just logging
- Auto-reindex KB when stale files detected
- Claude CLI called only if HEARTBEAT.md has extra instructions
- All settings configurable via config.json heartbeat section
Move hardcoded values to config.json:
- allowed_tools list (claude_session.py)
- Ollama URL/model (memory_search.py now reads ollama.url from config)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>