Update cron, dashboard, root +3 more (+1 ~11)

2026-05-28 20:21:28 +00:00
parent e79bed7afe
commit 0ce8a5a04d
12 changed files with 217 additions and 51 deletions
--- a/TODOS.md
+++ b/TODOS.md
@@ -0,0 +1,34 @@
 # TODOS — Echo Core deferred work
 Captured during planning reviews. Re-evaluate after relevant features ship or dogfood data accumulates.
 ## Voice
 ### Bounded SSRC buffer for DAVE-active unknown-SSRC race
 **What:** Replace the hard-drop of unknown-SSRC RTP packets in `_maybe_dave_decrypt` (vendor/discord-ext-voice-recv/.../reader.py) with a small bounded buffer per SSRC. Flush on SPEAKING event mapping the SSRC → user_id, then DAVE-decrypt and feed downstream.
 **Why:** voice-recv vanilla feeds unknown-SSRC packets to opus decoder anyway (reader.py:178 logs `info` but still calls `feed_rtp`). The DAVE patch turns this into a hard drop because davey requires `user_id`. Net regression: 40-200ms (1-5 packets) lost on the FIRST utterance of each new speaker per session, when audio races ahead of SPEAKING event. Subsequent utterances unaffected.
 **Pros:** Eliminates first-utterance audio loss. Whisper STT gets the complete prefix ("Echo, cât e ceasul?" instead of possibly "co, cât e ceasul?").
 **Cons:** New state machine: a queue per SSRC, TTL flush (~2s), ordering preservation, and a memory bound. New race surface between the socket-reader thread (queueing) and the asyncio loop (SPEAKING event → flush). Memory footprint is roughly 50 packets × ~1KB × N concurrent unknown SSRCs. Bug risk traded for a UX win.
 **Context:** Discovered during /plan-eng-review on `/home/moltbot/.claude/plans/wiggly-exploring-glade.md` (DAVE receive-side decrypt patch). Outside-voice reviewer flagged this as a regression vs voice-recv vanilla behavior. Accepted as tradeoff for v1 because SPEAKING typically arrives before audio in normal Discord flow — impact may be rare. **Depends on:** dogfood data from Pas 12 Etapa 2 #3-#13 confirming this IS observed in practice (i.e., Whisper transcripts repeatedly missing first word). If not observed, this TODO stays permanent. If observed in 3+ sessions, escalate.
 **Where to start:** `_maybe_dave_decrypt` in `vendor/discord-ext-voice-recv/discord/ext/voice_recv/reader.py`. Add `_pending_packets: dict[ssrc, deque[bytes]]` on `AudioReader`. Hook SPEAKING event handler in voice_client.py to call new `flush_pending(ssrc, user_id)` method.
 **Depends on / blocked by:** Pas 12 dogfood data. Re-evaluate after 3+ sessions of live use.
 ---
 ## (Other deferred items from voice review — already in plan's "Out of scope" section)
 - Wake-word "Echo" with Porcupine (P3 — incompatible with /voice join continuous)
 - Telegram voice memo bidirectional (P2 — reuses src/voice/pipeline.py)
 - Full-session WAV recording (P3 — KB transcript sufficient v1)
 - Upstreaming the DAVE patch to imayhaveborkedit/discord-ext-voice-recv (separate community effort)
 - `threading.Lock` around davey.decrypt (conditional follow-up — only if dogfood reveals crashes)
 - DAVE verification UI (`voice_privacy_code`, pairwise fingerprints — useful but not blocking voice-to-voice)
 - Video E2E decrypt (Echo is audio-only, no video pipeline)
 - Pre-existing test failures: TestPromptInjectionProtection × 2 + TestOnMessage × 4 (separate ticket)
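The bounded per-SSRC buffer proposed in the Voice TODO above could look roughly like the sketch below. This is not the patch itself: `PendingSsrcBuffer`, `push`, `flush`, and `expire` are hypothetical names, and the wiring into `AudioReader` and the SPEAKING handler is left out. It only illustrates the three properties the TODO names: a per-SSRC bound (50 packets), a TTL (~2s), and a lock shared between the socket-reader thread and the event loop.

```python
import threading
import time
from collections import deque


class PendingSsrcBuffer:
    """Hypothetical holding area for RTP packets whose SSRC has no user_id yet."""

    def __init__(self, max_packets=50, ttl_s=2.0, clock=time.monotonic):
        self._max_packets = max_packets
        self._ttl_s = ttl_s
        self._clock = clock
        # Guards _pending: push() runs on the reader thread, flush() on the loop.
        self._lock = threading.Lock()
        self._pending = {}  # ssrc -> deque of (arrival_time, packet_bytes)

    def push(self, ssrc, packet):
        """Queue a packet; deque(maxlen=...) evicts the oldest entry when full."""
        with self._lock:
            q = self._pending.setdefault(ssrc, deque(maxlen=self._max_packets))
            q.append((self._clock(), packet))

    def flush(self, ssrc):
        """Return packets still within the TTL, in arrival order, and forget the SSRC."""
        now = self._clock()
        with self._lock:
            q = self._pending.pop(ssrc, ())
        return [pkt for ts, pkt in q if now - ts <= self._ttl_s]

    def expire(self):
        """Drop SSRCs whose newest packet is older than the TTL; return the count."""
        now = self._clock()
        with self._lock:
            stale = [s for s, q in self._pending.items()
                     if not q or now - q[-1][0] > self._ttl_s]
            for s in stale:
                del self._pending[s]
        return len(stale)


# Demo with a fake clock so the TTL behaviour is deterministic.
fake_now = [0.0]
buf = PendingSsrcBuffer(max_packets=3, ttl_s=2.0, clock=lambda: fake_now[0])
for i in range(5):
    buf.push(1234, bytes([i]))
flushed = buf.flush(1234)  # only the 3 newest packets survive the bound
```

In a real integration the list returned by `flush` would be DAVE-decrypted and fed downstream once the SPEAKING event supplies the `user_id`; that part depends on the vendored reader and is deliberately not shown.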
--- a/config.json
+++ b/config.json
@@ -109,7 +109,7 @@
      "949388626146517022"
    ],
    "user_name": "Marius",
-    "default_voice": "M5",
+    "default_voice": "M2",
    "auto_leave_minutes": 5
  },
  "paths": {
--- a/cron/jobs.json
+++ b/cron/jobs.json
--- a/cron/newsletter-cercetasi-state.json
+++ b/cron/newsletter-cercetasi-state.json
@@ -1,5 +1,5 @@
 {
-  "last_sent": 19,
+  "last_sent": 20,
  "year": 2026,
-  "last_sent_at": "2026-05-21T17:00:58.795355+00:00"
+  "last_sent_at": "2026-05-28T20:05:22.628304+00:00"
 }
--- a/dashboard/habits.json
+++ b/dashboard/habits.json
@@ -1,5 +1,5 @@
 {
-  "lastUpdated": "2026-04-29T05:30:59.129949",
+  "lastUpdated": "2026-05-27T15:16:49.070154",
  "habits": [
    {
      "id": "95c15eef-3a14-4985-a61e-0b64b72851b0",
@@ -17,7 +17,7 @@
      "streak": {
        "current": 1,
        "best": 6,
-        "lastCheckIn": "2026-03-31"
+        "lastCheckIn": "2026-05-27"
      },
      "lives": 2,
      "completions": [
@@ -56,10 +56,14 @@
        {
          "date": "2026-03-31",
          "type": "check"
        },
        {
          "date": "2026-05-27",
          "type": "check"
        }
      ],
      "createdAt": "2026-02-11T00:54:03.447063",
-      "updatedAt": "2026-03-31T19:39:08.013266",
+      "updatedAt": "2026-05-27T15:16:49.070154",
      "lastLivesAward": "2026-02-23"
    },
    {
--- a/src/adapters/discord_bot.py
+++ b/src/adapters/discord_bot.py
@@ -15,7 +15,7 @@ from src.claude_session import (
    PROJECT_ROOT,
    VALID_MODELS,
 )
-from src.fast_commands import dispatch as fast_dispatch
+from src.fast_commands import dispatch as fast_dispatch, split_text_chunks, extract_url_text
 from src.router import (
    route_message,
    _ralph_propose,
@@ -916,6 +916,37 @@ def create_bot(config: Config) -> discord.Client:
        rezumat: bool = False,
    ) -> None:
        await interaction.response.defer()
        voice = voce or "M2"
        # URL without summary → fetch, split into chunks, send them one at a time
        if text_sau_url and text_sau_url.startswith("http") and not rezumat:
            text = await asyncio.to_thread(extract_url_text, text_sau_url)
            if not text:
                await interaction.followup.send("Nu am putut extrage text din URL.")
                return
            chunks = split_text_chunks(text, max_chars=1500)
            total = len(chunks)
            for i, chunk in enumerate(chunks, 1):
                result = await asyncio.to_thread(fast_dispatch, "audio", [voice, chunk])
                if result and result.startswith("__AUDIO__:"):
                    wav_path = result[len("__AUDIO__:"):]
                    try:
                        filename = f"echo-audio-{i}din{total}.wav" if total > 1 else "echo-audio.wav"
                        await interaction.followup.send(
                            content=f"Bucata {i}/{total}" if total > 1 else None,
                            file=discord.File(wav_path, filename=filename),
                        )
                    finally:
                        try:
                            os.unlink(wav_path)
                        except OSError:
                            pass
                else:
                    await interaction.followup.send(result or f"Eroare TTS la bucata {i}.")
                    return
            return
        # Existing behavior: direct text, empty input, or URL summary
        args: list[str] = []
        if voce:
            args.append(voce)
--- a/src/adapters/discord_voice.py
+++ b/src/adapters/discord_voice.py
@@ -285,6 +285,23 @@ def register(tree: app_commands.CommandTree, bot: discord.Client) -> app_command
            msg = f"Default voce setată {new_voice}. Va intra în vigoare la următorul /voice join."
        await interaction.followup.send(msg, ephemeral=True)
    @voice_group.command(name="stop", description="Oprește audio-ul curent (golește coada TTS)")
    async def stop_audio(interaction: discord.Interaction) -> None:
        await interaction.response.defer(ephemeral=True)
        guild_id = interaction.guild.id if interaction.guild else None
        session = _voice_sessions.get(guild_id) if guild_id is not None else None
        if session is None or session.ttsq is None:
            await interaction.followup.send("Nu sunt în voice.", ephemeral=True)
            return
        try:
            session.ttsq.clear()
            log.info("voice stop: TTS queue cleared by user %s", interaction.user)
        except Exception as e:
            log.warning("voice stop: ttsq.clear failed: %s", e)
            await interaction.followup.send(f"Eroare la oprire: {e}", ephemeral=True)
            return
        await interaction.followup.send("Audio oprit.", ephemeral=True)
    @voice_group.command(name="doctor", description="Verifică voice stack")
    async def doctor(interaction: discord.Interaction) -> None:
        await interaction.response.defer(ephemeral=True)
--- a/src/fast_commands.py
+++ b/src/fast_commands.py
@@ -812,6 +812,51 @@ def _tts_synthesize(text: str, voice: str) -> dict:
        return {"ok": False, "error": str(e)}
 def split_text_chunks(text: str, max_chars: int = 1500) -> list[str]:
    """Split text into chunks on paragraph boundaries without exceeding max_chars."""
    import re as _re
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    if not paragraphs:
        paragraphs = [p.strip() for p in text.split("\n") if p.strip()]
    chunks: list[str] = []
    current_parts: list[str] = []
    current_len = 0
    for para in paragraphs:
        if len(para) > max_chars:
            if current_parts:
                chunks.append("\n\n".join(current_parts))
                current_parts = []
                current_len = 0
            sentences = _re.split(r'(?<=[.!?])\s+', para)
            for sent in sentences:
                if current_len + len(sent) + 1 > max_chars and current_parts:
                    chunks.append(" ".join(current_parts))
                    current_parts = [sent]
                    current_len = len(sent)
                else:
                    current_parts.append(sent)
                    current_len += len(sent) + 1
        elif current_len + len(para) + 2 > max_chars and current_parts:
            chunks.append("\n\n".join(current_parts))
            current_parts = [para]
            current_len = len(para)
        else:
            current_parts.append(para)
            current_len += len(para) + 2
    if current_parts:
        chunks.append("\n\n".join(current_parts))
    return chunks if chunks else [text[:max_chars]]
 def extract_url_text(url: str) -> str | None:
    """Extract the main text from a URL (public wrapper)."""
    return _extract_url_text(url)
 def _extract_url_text(url: str) -> str | None:
    """Extract the main text from a URL using trafilatura."""
    try:
--- a/src/voice/pipeline.py
+++ b/src/voice/pipeline.py
@@ -53,6 +53,24 @@ NO_SPEECH_DROP_THRESHOLD = 0.6
 PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent
 LOGS_DIR = PROJECT_ROOT / "logs"
 VOICE_METRICS_PATH = LOGS_DIR / "voice_metrics.jsonl"
 VOICE_STT_LOG_PATH = LOGS_DIR / "voice_stt_log.jsonl"
 _stt_log_lock = threading.Lock()
 def _append_stt_log(entry: dict) -> None:
    """Append one Whisper transcript to ``voice_stt_log.jsonl``.
    Separate from ``record_enabled``/``transcripts_jsonl_path`` (which feed
    KB). This log is always-on, scoped to STT debugging — used to mine
    code-switching mistranscriptions (English words in Romanian flow) over
    several days and build a personal vocabulary correction table.
    """
    try:
        LOGS_DIR.mkdir(parents=True, exist_ok=True)
        with _stt_log_lock, VOICE_STT_LOG_PATH.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    except Exception as e:  # noqa: BLE001
        log.debug("STT log write failed: %s", e)
 # ---------- Lazy model singletons ----------
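The docstring of `_append_stt_log` says the log is meant to be mined over several days. A minimal sketch of that reading side, assuming only the JSONL layout and the `text` / `no_speech_prob` keys visible in this commit; `load_stt_entries` and `token_counts` are hypothetical helpers, not part of the repository, and the 0.6 cutoff mirrors `NO_SPEECH_DROP_THRESHOLD`.

```python
import json
from collections import Counter
from pathlib import Path


def load_stt_entries(path):
    """Read a JSONL file into a list of dicts, skipping blank or corrupt lines."""
    entries = []
    p = Path(path)
    if not p.exists():
        return entries
    with p.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                entries.append(json.loads(line))
            except json.JSONDecodeError:
                # A partially written line should not abort the whole scan.
                continue
    return entries


def token_counts(entries, max_no_speech=0.6):
    """Count lowercase tokens in transcripts at or below the no-speech cutoff."""
    counts = Counter()
    for entry in entries:
        if entry.get("no_speech_prob", 0.0) > max_no_speech:
            continue
        counts.update(entry.get("text", "").lower().split())
    return counts
```

Calling `token_counts(load_stt_entries("logs/voice_stt_log.jsonl")).most_common(20)` would surface the most frequent tokens, which is one plausible starting point for spotting recurring mistranscriptions.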
@@ -100,24 +118,31 @@ def _get_silero_vad():
 def _pcm48_stereo_to_16_mono(pcm: bytes) -> np.ndarray:
    """Discord 48kHz s16le stereo bytes -> 16kHz mono float32 in [-1, 1].
-    Cheap downsample: average the two channels, then average every 3
-    samples (48k / 3 = 16k). faster-whisper + silero-vad accept the
-    resulting ``np.float32`` array directly.
+    Mix channels to mono, then resample 48k→16k with torchaudio's polyphase
+    Kaiser-windowed sinc (``lowpass_filter_width=16``) instead of a naive
+    every-3-samples average. The previous decimation had no anti-aliasing,
+    which folded HF content (sibilants, fricatives) back into the
+    speech band and degraded Whisper's accuracy on short wake phrases
+    like "Salut, Eco". faster-whisper + silero-vad accept the resulting
+    ``np.float32`` array directly.
    """
    if not pcm:
        return np.zeros(0, dtype=np.float32)
    samples = np.frombuffer(pcm, dtype=np.int16)
    if samples.size % 2 != 0:
        samples = samples[:-1]
-    stereo = samples.reshape(-1, 2)
-    mono = stereo.mean(axis=1).astype(np.float32) / 32768.0
-    if mono.size == 0:
-        return mono
-    trim = (mono.size // 3) * 3
-    if trim == 0:
+    if samples.size == 0:
        return np.zeros(0, dtype=np.float32)
-    mono = mono[:trim].reshape(-1, 3).mean(axis=1)
-    return mono.astype(np.float32)
+    stereo = samples.reshape(-1, 2)
+    mono48 = stereo.mean(axis=1).astype(np.float32) / 32768.0
+    import torch
+    import torchaudio.functional as taF
+    wav = torch.from_numpy(mono48).unsqueeze(0)
+    mono16 = taF.resample(
+        wav, SAMPLE_RATE_DISCORD, SAMPLE_RATE_WHISPER,
+        lowpass_filter_width=16,
+    ).squeeze(0).numpy()
+    return np.ascontiguousarray(mono16, dtype=np.float32)
 # ---------- VoiceSession ----------
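The aliasing that the `_pcm48_stereo_to_16_mono` docstring attributes to the old every-3-samples average can be reproduced with the standard library alone. A minimal sketch, assuming a pure 10 kHz test tone as a stand-in for sibilant energy; `naive_decimate_by_3` and `tone_amplitude` are illustrative helpers, not code from the repository. At 16 kHz the Nyquist limit is 8 kHz, so a 10 kHz component folds back to 16 − 10 = 6 kHz, and the 3-sample average only attenuates it to about half.

```python
import cmath
import math

FS_IN = 48_000   # Discord sample rate
FS_OUT = 16_000  # Whisper sample rate


def naive_decimate_by_3(x):
    """Average each group of 3 samples: the approach the commit replaces."""
    n = len(x) // 3 * 3
    return [(x[i] + x[i + 1] + x[i + 2]) / 3.0 for i in range(0, n, 3)]


def tone_amplitude(y, freq_hz, fs):
    """Amplitude of one frequency component, via a direct DFT sum."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * k / fs)
              for k, v in enumerate(y))
    return 2.0 * abs(acc) / len(y)


tone_hz = 10_000  # above the 8 kHz Nyquist limit of the 16 kHz output
x = [math.sin(2 * math.pi * tone_hz * n / FS_IN) for n in range(4_800)]
y = naive_decimate_by_3(x)

alias_hz = FS_OUT - tone_hz  # where the tone lands after folding
alias_amp = tone_amplitude(y, alias_hz, FS_OUT)
print(f"{tone_hz} Hz tone reappears at {alias_hz} Hz, amplitude {alias_amp:.2f}")
```

A resampler with a proper low-pass stage removes most of that 10 kHz energy before decimating, which is the motivation the docstring gives for switching to `taF.resample`.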
@@ -646,19 +671,25 @@ class EchoVoiceSink(AudioSink):
    def _flush_to_stt(self, user_id: int, pcm48_stereo: bytes) -> None:
        """Downsample, Whisper-transcribe RO, drop hallucinations, dispatch."""
        try:
+            t_start = time.monotonic()
            mono16 = _pcm48_stereo_to_16_mono(pcm48_stereo)
            if mono16.size == 0:
                return
+            audio_duration_s = float(mono16.size) / float(SAMPLE_RATE_WHISPER)
            model = _get_whisper_model()
            segments, _info = model.transcribe(
                mono16, language="ro", beam_size=5,
                initial_prompt=(
-                    "Echo Core, asistent personal AI românesc al lui Marius. "
-                    "Conversație colocvială în română. "
-                    "Comenzi voce recunoscute: schimbă vocea pe M1, M2, M3, M4, M5, "
-                    "F1, F2, F3, F4, F5. Exemple: vorbește cu vocea M5, voce F3, "
+                    "Conversatie in romana cu asistentul Eco (Echo Core). "
+                    "Marius i se adreseaza cu 'Salut, Eco', 'Eco' sau 'Echo Core' "
+                    "la inceputul mesajului. Exemple: 'Salut, Eco, ce mai faci?', "
+                    "'Eco, adauga pe agenda de maine sa sun la Bianca', "
+                    "'Echo Core, vreau sa-mi reamintesti diseara'. "
+                    "Comenzi voce recunoscute: schimba vocea pe M1, M2, M3, M4, M5, "
+                    "F1, F2, F3, F4, F5. Exemple: vorbeste cu vocea M5, voce F3, "
                    "treci pe vocea F1."
                ),
+                hotwords="Eco Echo Core Marius Bianca",
                condition_on_previous_text=False,
            )
            text_parts: list[str] = []
            text_parts: list[str] = []
@@ -677,6 +708,16 @@ class EchoVoiceSink(AudioSink):
            text = " ".join(text_parts).strip()
            if not text:
                return
            _append_stt_log({
                "ts": time.time(),
                "channel_id": self.session.voice_channel_id,
                "user_id": int(user_id),
                "text": text,
                "no_speech_prob": round(worst_no_speech, 3),
                "audio_duration_s": round(audio_duration_s, 3),
                "stt_latency_s": round(time.monotonic() - t_start, 3),
                "model": "small",
            })
            self._schedule_segment_done(user_id, text, worst_no_speech)
        except Exception as e:  # noqa: BLE001
            log.warning("Whisper transcribe failed: %s", e)
--- a/tasks/lessons.md
+++ b/tasks/lessons.md
@@ -17,6 +17,13 @@ Lecții capturate din corectările lui Marius. Citește acest fișier la începu
 <!-- Lessons are added below, newest first. -->
 ## Enter plan mode BEFORE making any code change
 **Date:** 2026-05-28
 **Context:** Marius described a requirement to improve the `/audio` command for URLs (chunk by chunk). I implemented it directly, without plan mode.
 **Mistake:** I skipped the planning step and modified files without Marius's approval.
 **Rule:** For any code change (not just tasks with 3+ steps), enter plan mode, present the plan, and WAIT for approval before touching any file.
 **When it applies:** Any code/implementation request, regardless of apparent simplicity. If it is tempting to implement directly because it looks simple, that is exactly the moment to stop and plan.
 ## Supertonic rejects curly (Unicode) quotation marks with HTTP 500
 **Date:** 2026-05-27
 **Context:** Marius issued an audio command on Discord with a URL, and Claude's reply contained `„foo"` (Romanian curly quotes). Supertonic returned `HTTP 500: synthesis failed: Found 1 unsupported character(s): ['„']` and the reply was never played. No retry logic is visible in the UX; it simply goes silent.
--- a/tools/anaf-monitor/hashes.json
+++ b/tools/anaf-monitor/hashes.json
@@ -5,7 +5,7 @@
  "D394": "c4c4e62bda30032f12c17edf9a5087b6173a350ccb1fd750158978b3bd0acb7d",
  "D406": "ca6103448d663ab16fcaef0f29f8933ef526cbf5aad12c7ff5dbd61b22ca9fc6",
  "SIT_FIN_SEM_2025": "8164843431e6b703a38fbdedc7898ec6ae83559fe10f88663ba0b55f3091d5fe",
-  "SIT_FIN_AN_2025": "ec5b2ce694b02bf780e0f72df462b1aeec578ee64c11b3e44ed1a80b2dbe85d8",
+  "SIT_FIN_AN_2025": "accceef5b6585a3e901d83d23fc2e60f6562eac4a2ce00f943856232bed929d6",
  "DESCARCARE_DECLARATII": "8cc082021edb0ae97686d73f8179369be33a68ef03ec791757460bb7fff99e34",
  "D205": "d3c20a7ae70f4c18bbb7add42af035e3746d323b2e6df37a4e31ed625ddb86d9",
  "D390": "4726938ed5858ec735caefd947a7d182b6dc64009478332c4feabdb36412a84e",
--- a/tools/anaf-monitor/snapshots/SIT_FIN_AN_2025.txt
+++ b/tools/anaf-monitor/snapshots/SIT_FIN_AN_2025.txt
@@ -21,6 +21,7 @@ S1061
 S1070
 S1072
 S1079
 S1080
 Tabel 
 		codificări
 tipuri de situaţii financiare şi raportări anuale