Update cron, dashboard, root +3 more (+1 ~11)

This commit is contained in:
2026-05-28 20:21:28 +00:00
parent e79bed7afe
commit 0ce8a5a04d
12 changed files with 217 additions and 51 deletions

34
TODOS.md Normal file
View File

@@ -0,0 +1,34 @@
# TODOS — Echo Core deferred work
Captured during planning reviews. Re-evaluate after relevant features ship or dogfood data accumulates.
## Voice
### Bounded SSRC buffer for DAVE-active unknown-SSRC race
**What:** Replace the hard-drop of unknown-SSRC RTP packets in `_maybe_dave_decrypt` (vendor/discord-ext-voice-recv/.../reader.py) with a small bounded buffer per SSRC. Flush on SPEAKING event mapping the SSRC → user_id, then DAVE-decrypt and feed downstream.
**Why:** voice-recv vanilla feeds unknown-SSRC packets to opus decoder anyway (reader.py:178 logs `info` but still calls `feed_rtp`). The DAVE patch turns this into a hard drop because davey requires `user_id`. Net regression: 40-200ms (1-5 packets) lost on the FIRST utterance of each new speaker per session, when audio races ahead of SPEAKING event. Subsequent utterances unaffected.
**Pros:** Eliminates first-utterance audio loss. Whisper STT gets the complete prefix ("Echo, cât e ceasul?" instead of possibly "co, cât e ceasul?").
**Cons:** New state machine — queue per SSRC, TTL flush (~2s), ordering preservation, memory bound. New race surface between socket-reader thread (queueing) and asyncio loop (SPEAKING event → flush). 50 packets * ~1KB * N concurrent unknown SSRCs = memory footprint. Bug risk traded for UX win.
**Context:** Discovered during /plan-eng-review on `/home/moltbot/.claude/plans/wiggly-exploring-glade.md` (DAVE receive-side decrypt patch). Outside-voice reviewer flagged this as a regression vs voice-recv vanilla behavior. Accepted as tradeoff for v1 because SPEAKING typically arrives before audio in normal Discord flow — impact may be rare. **Depends on:** dogfood data from Pas 12 Etapa 2 #3-#13 confirming this IS observed in practice (i.e., Whisper transcripts repeatedly missing first word). If not observed, this TODO stays permanent. If observed in 3+ sessions, escalate.
**Where to start:** `_maybe_dave_decrypt` in `vendor/discord-ext-voice-recv/discord/ext/voice_recv/reader.py`. Add `_pending_packets: dict[ssrc, deque[bytes]]` on `AudioReader`. Hook SPEAKING event handler in voice_client.py to call new `flush_pending(ssrc, user_id)` method.
**Depends on / blocked by:** Pas 12 dogfood data. Re-evaluate after 3+ sessions of live use.
---
## (Other deferred items from voice review — already in plan's "Out of scope" section)
- Wake-word "Echo" cu porcupine (P3 — incompatible with /voice join continuous)
- Telegram voice memo bidirectional (P2 — reuses src/voice/pipeline.py)
- Full-session WAV recording (P3 — KB transcript sufficient v1)
- Upstreaming the DAVE patch to imayhaveborkedit/discord-ext-voice-recv (separate community effort)
- `threading.Lock` around davey.decrypt (conditional follow-up — only if dogfood reveals crashes)
- DAVE verification UI (`voice_privacy_code`, pairwise fingerprints — useful but not blocking voice-to-voice)
- Video E2E decrypt (Echo is audio-only, no video pipeline)
- Pre-existent test failures: TestPromptInjectionProtection × 2 + TestOnMessage × 4 (separate ticket)

View File

@@ -109,7 +109,7 @@
"949388626146517022" "949388626146517022"
], ],
"user_name": "Marius", "user_name": "Marius",
"default_voice": "M5", "default_voice": "M2",
"auto_leave_minutes": 5 "auto_leave_minutes": 5
}, },
"paths": { "paths": {

File diff suppressed because one or more lines are too long

View File

@@ -1,5 +1,5 @@
{ {
"last_sent": 19, "last_sent": 20,
"year": 2026, "year": 2026,
"last_sent_at": "2026-05-21T17:00:58.795355+00:00" "last_sent_at": "2026-05-28T20:05:22.628304+00:00"
} }

View File

@@ -1,5 +1,5 @@
{ {
"lastUpdated": "2026-04-29T05:30:59.129949", "lastUpdated": "2026-05-27T15:16:49.070154",
"habits": [ "habits": [
{ {
"id": "95c15eef-3a14-4985-a61e-0b64b72851b0", "id": "95c15eef-3a14-4985-a61e-0b64b72851b0",
@@ -17,7 +17,7 @@
"streak": { "streak": {
"current": 1, "current": 1,
"best": 6, "best": 6,
"lastCheckIn": "2026-03-31" "lastCheckIn": "2026-05-27"
}, },
"lives": 2, "lives": 2,
"completions": [ "completions": [
@@ -56,10 +56,14 @@
{ {
"date": "2026-03-31", "date": "2026-03-31",
"type": "check" "type": "check"
},
{
"date": "2026-05-27",
"type": "check"
} }
], ],
"createdAt": "2026-02-11T00:54:03.447063", "createdAt": "2026-02-11T00:54:03.447063",
"updatedAt": "2026-03-31T19:39:08.013266", "updatedAt": "2026-05-27T15:16:49.070154",
"lastLivesAward": "2026-02-23" "lastLivesAward": "2026-02-23"
}, },
{ {

View File

@@ -15,7 +15,7 @@ from src.claude_session import (
PROJECT_ROOT, PROJECT_ROOT,
VALID_MODELS, VALID_MODELS,
) )
from src.fast_commands import dispatch as fast_dispatch from src.fast_commands import dispatch as fast_dispatch, split_text_chunks, extract_url_text
from src.router import ( from src.router import (
route_message, route_message,
_ralph_propose, _ralph_propose,
@@ -916,6 +916,37 @@ def create_bot(config: Config) -> discord.Client:
rezumat: bool = False, rezumat: bool = False,
) -> None: ) -> None:
await interaction.response.defer() await interaction.response.defer()
voice = voce or "M2"
# URL fără rezumat → fetch + split în chunks + trimite pe rând
if text_sau_url and text_sau_url.startswith("http") and not rezumat:
text = await asyncio.to_thread(extract_url_text, text_sau_url)
if not text:
await interaction.followup.send("Nu am putut extrage text din URL.")
return
chunks = split_text_chunks(text, max_chars=1500)
total = len(chunks)
for i, chunk in enumerate(chunks, 1):
result = await asyncio.to_thread(fast_dispatch, "audio", [voice, chunk])
if result and result.startswith("__AUDIO__:"):
wav_path = result[len("__AUDIO__:"):]
try:
filename = f"echo-audio-{i}din{total}.wav" if total > 1 else "echo-audio.wav"
await interaction.followup.send(
content=f"Bucata {i}/{total}" if total > 1 else None,
file=discord.File(wav_path, filename=filename),
)
finally:
try:
os.unlink(wav_path)
except OSError:
pass
else:
await interaction.followup.send(result or f"Eroare TTS la bucata {i}.")
return
return
# Comportament existent: text direct, gol, sau rezumat URL
args: list[str] = [] args: list[str] = []
if voce: if voce:
args.append(voce) args.append(voce)

View File

@@ -285,6 +285,23 @@ def register(tree: app_commands.CommandTree, bot: discord.Client) -> app_command
msg = f"Default voce setată {new_voice}. Va intra în vigoare la următorul /voice join." msg = f"Default voce setată {new_voice}. Va intra în vigoare la următorul /voice join."
await interaction.followup.send(msg, ephemeral=True) await interaction.followup.send(msg, ephemeral=True)
@voice_group.command(name="stop", description="Oprește audio-ul curent (golește coada TTS)")
async def stop_audio(interaction: discord.Interaction) -> None:
await interaction.response.defer(ephemeral=True)
guild_id = interaction.guild.id if interaction.guild else None
session = _voice_sessions.get(guild_id) if guild_id is not None else None
if session is None or session.ttsq is None:
await interaction.followup.send("Nu sunt în voice.", ephemeral=True)
return
try:
session.ttsq.clear()
log.info("voice stop: TTS queue cleared by user %s", interaction.user)
except Exception as e:
log.warning("voice stop: ttsq.clear failed: %s", e)
await interaction.followup.send(f"Eroare la oprire: {e}", ephemeral=True)
return
await interaction.followup.send("Audio oprit.", ephemeral=True)
@voice_group.command(name="doctor", description="Verifică voice stack") @voice_group.command(name="doctor", description="Verifică voice stack")
async def doctor(interaction: discord.Interaction) -> None: async def doctor(interaction: discord.Interaction) -> None:
await interaction.response.defer(ephemeral=True) await interaction.response.defer(ephemeral=True)

View File

@@ -812,6 +812,51 @@ def _tts_synthesize(text: str, voice: str) -> dict:
return {"ok": False, "error": str(e)} return {"ok": False, "error": str(e)}
def split_text_chunks(text: str, max_chars: int = 1500) -> list[str]:
"""Împarte text în chunks pe paragrafe fără a depăși max_chars."""
import re as _re
paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
if not paragraphs:
paragraphs = [p.strip() for p in text.split("\n") if p.strip()]
chunks: list[str] = []
current_parts: list[str] = []
current_len = 0
for para in paragraphs:
if len(para) > max_chars:
if current_parts:
chunks.append("\n\n".join(current_parts))
current_parts = []
current_len = 0
sentences = _re.split(r'(?<=[.!?])\s+', para)
for sent in sentences:
if current_len + len(sent) + 1 > max_chars and current_parts:
chunks.append(" ".join(current_parts))
current_parts = [sent]
current_len = len(sent)
else:
current_parts.append(sent)
current_len += len(sent) + 1
elif current_len + len(para) + 2 > max_chars and current_parts:
chunks.append("\n\n".join(current_parts))
current_parts = [para]
current_len = len(para)
else:
current_parts.append(para)
current_len += len(para) + 2
if current_parts:
chunks.append("\n\n".join(current_parts))
return chunks if chunks else [text[:max_chars]]
def extract_url_text(url: str) -> str | None:
"""Extrage textul principal dintr-un URL (publică)."""
return _extract_url_text(url)
def _extract_url_text(url: str) -> str | None: def _extract_url_text(url: str) -> str | None:
"""Extrage textul principal dintr-un URL cu trafilatura.""" """Extrage textul principal dintr-un URL cu trafilatura."""
try: try:

View File

@@ -53,6 +53,24 @@ NO_SPEECH_DROP_THRESHOLD = 0.6
PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent
LOGS_DIR = PROJECT_ROOT / "logs" LOGS_DIR = PROJECT_ROOT / "logs"
VOICE_METRICS_PATH = LOGS_DIR / "voice_metrics.jsonl" VOICE_METRICS_PATH = LOGS_DIR / "voice_metrics.jsonl"
VOICE_STT_LOG_PATH = LOGS_DIR / "voice_stt_log.jsonl"
_stt_log_lock = threading.Lock()
def _append_stt_log(entry: dict) -> None:
"""Append one Whisper transcript to ``voice_stt_log.jsonl``.
Separate from ``record_enabled``/``transcripts_jsonl_path`` (which feed
KB). This log is always-on, scoped to STT debugging — used to mine
code-switching mistranscriptions (English words in Romanian flow) over
several days and build a personal vocabulary correction table.
"""
try:
LOGS_DIR.mkdir(parents=True, exist_ok=True)
with _stt_log_lock, VOICE_STT_LOG_PATH.open("a", encoding="utf-8") as f:
f.write(json.dumps(entry, ensure_ascii=False) + "\n")
except Exception as e: # noqa: BLE001
log.debug("STT log write failed: %s", e)
# ---------- Lazy model singletons ---------- # ---------- Lazy model singletons ----------
@@ -100,24 +118,31 @@ def _get_silero_vad():
def _pcm48_stereo_to_16_mono(pcm: bytes) -> np.ndarray: def _pcm48_stereo_to_16_mono(pcm: bytes) -> np.ndarray:
"""Discord 48kHz s16le stereo bytes -> 16kHz mono float32 in [-1, 1]. """Discord 48kHz s16le stereo bytes -> 16kHz mono float32 in [-1, 1].
Cheap downsample: average the two channels, then average every 3 Mix channels to mono, then resample 48k→16k with torchaudio's polyphase
samples (48k / 3 = 16k). faster-whisper + silero-vad accept the Kaiser-windowed sinc (``lowpass_filter_width=16``) instead of a naive
resulting ``np.float32`` array directly. every-3-samples average. The previous decimation had no anti-aliasing,
which folded HF content (sibilants, fricatives) back into the
speech band and degraded Whisper's accuracy on short wake phrases
like "Salut, Eco". faster-whisper + silero-vad accept the resulting
``np.float32`` array directly.
""" """
if not pcm: if not pcm:
return np.zeros(0, dtype=np.float32) return np.zeros(0, dtype=np.float32)
samples = np.frombuffer(pcm, dtype=np.int16) samples = np.frombuffer(pcm, dtype=np.int16)
if samples.size % 2 != 0: if samples.size % 2 != 0:
samples = samples[:-1] samples = samples[:-1]
stereo = samples.reshape(-1, 2) if samples.size == 0:
mono = stereo.mean(axis=1).astype(np.float32) / 32768.0
if mono.size == 0:
return mono
trim = (mono.size // 3) * 3
if trim == 0:
return np.zeros(0, dtype=np.float32) return np.zeros(0, dtype=np.float32)
mono = mono[:trim].reshape(-1, 3).mean(axis=1) stereo = samples.reshape(-1, 2)
return mono.astype(np.float32) mono48 = stereo.mean(axis=1).astype(np.float32) / 32768.0
import torch
import torchaudio.functional as taF
wav = torch.from_numpy(mono48).unsqueeze(0)
mono16 = taF.resample(
wav, SAMPLE_RATE_DISCORD, SAMPLE_RATE_WHISPER,
lowpass_filter_width=16,
).squeeze(0).numpy()
return np.ascontiguousarray(mono16, dtype=np.float32)
# ---------- VoiceSession ---------- # ---------- VoiceSession ----------
@@ -646,19 +671,25 @@ class EchoVoiceSink(AudioSink):
def _flush_to_stt(self, user_id: int, pcm48_stereo: bytes) -> None: def _flush_to_stt(self, user_id: int, pcm48_stereo: bytes) -> None:
"""Downsample, Whisper-transcribe RO, drop hallucinations, dispatch.""" """Downsample, Whisper-transcribe RO, drop hallucinations, dispatch."""
try: try:
t_start = time.monotonic()
mono16 = _pcm48_stereo_to_16_mono(pcm48_stereo) mono16 = _pcm48_stereo_to_16_mono(pcm48_stereo)
if mono16.size == 0: if mono16.size == 0:
return return
audio_duration_s = float(mono16.size) / float(SAMPLE_RATE_WHISPER)
model = _get_whisper_model() model = _get_whisper_model()
segments, _info = model.transcribe( segments, _info = model.transcribe(
mono16, language="ro", beam_size=5, mono16, language="ro", beam_size=5,
initial_prompt=( initial_prompt=(
"Echo Core, asistent personal AI românesc al lui Marius. " "Conversatie in romana cu asistentul Eco (Echo Core). "
"Conversație colocvială în română. " "Marius i se adreseaza cu 'Salut, Eco', 'Eco' sau 'Echo Core' "
"Comenzi voce recunoscute: schimbă vocea pe M1, M2, M3, M4, M5, " "la inceputul mesajului. Exemple: 'Salut, Eco, ce mai faci?', "
"F1, F2, F3, F4, F5. Exemple: vorbește cu vocea M5, voce F3, " "'Eco, adauga pe agenda de maine sa sun la Bianca', "
"'Echo Core, vreau sa-mi reamintesti diseara'. "
"Comenzi voce recunoscute: schimba vocea pe M1, M2, M3, M4, M5, "
"F1, F2, F3, F4, F5. Exemple: vorbeste cu vocea M5, voce F3, "
"treci pe vocea F1." "treci pe vocea F1."
), ),
hotwords="Eco Echo Core Marius Bianca",
condition_on_previous_text=False, condition_on_previous_text=False,
) )
text_parts: list[str] = [] text_parts: list[str] = []
@@ -677,6 +708,16 @@ class EchoVoiceSink(AudioSink):
text = " ".join(text_parts).strip() text = " ".join(text_parts).strip()
if not text: if not text:
return return
_append_stt_log({
"ts": time.time(),
"channel_id": self.session.voice_channel_id,
"user_id": int(user_id),
"text": text,
"no_speech_prob": round(worst_no_speech, 3),
"audio_duration_s": round(audio_duration_s, 3),
"stt_latency_s": round(time.monotonic() - t_start, 3),
"model": "small",
})
self._schedule_segment_done(user_id, text, worst_no_speech) self._schedule_segment_done(user_id, text, worst_no_speech)
except Exception as e: # noqa: BLE001 except Exception as e: # noqa: BLE001
log.warning("Whisper transcribe failed: %s", e) log.warning("Whisper transcribe failed: %s", e)

View File

@@ -17,6 +17,13 @@ Lecții capturate din corectările lui Marius. Citește acest fișier la începu
<!-- Lecțiile se adaugă mai jos, cele mai noi sus. --> <!-- Lecțiile se adaugă mai jos, cele mai noi sus. -->
## Intră în plan mode ÎNAINTE de a executa orice modificare de cod
**Data:** 2026-05-28
**Context:** Marius a descris o cerință de îmbunătățire a comenzii `/audio` cu URL (chunk by chunk). Am implementat direct fără plan mode.
**Greșeala:** Am sărit peste pasul de planificare și am modificat fișierele fără aprobarea lui Marius.
**Regula:** Pentru orice modificare de cod (nu doar task-uri cu 3+ pași), intră în plan mode, prezintă planul, și AȘTEAPTĂ aprobarea înainte de a atinge vreun fișier.
**Când se aplică:** Orice cerere de cod/implementare, indiferent de simplitate aparentă. Dacă e tentant să implementezi direct pentru că pare simplu — e exact momentul să te oprești și să planifici.
## Supertonic rejectează ghilimelele curly (Unicode) cu HTTP 500 ## Supertonic rejectează ghilimelele curly (Unicode) cu HTTP 500
**Data:** 2026-05-27 **Data:** 2026-05-27
**Context:** Marius a dat o comandă audio pe Discord cu un URL, iar răspunsul lui Claude conținea `„foo"` (ghilimele românești curly). Supertonic a returnat `HTTP 500: synthesis failed: Found 1 unsupported character(s): ['„']` și răspunsul nu s-a mai auzit. Fără retry logic vizibil în UX — pur și simplu tace. **Context:** Marius a dat o comandă audio pe Discord cu un URL, iar răspunsul lui Claude conținea `„foo"` (ghilimele românești curly). Supertonic a returnat `HTTP 500: synthesis failed: Found 1 unsupported character(s): ['„']` și răspunsul nu s-a mai auzit. Fără retry logic vizibil în UX — pur și simplu tace.

View File

@@ -5,7 +5,7 @@
"D394": "c4c4e62bda30032f12c17edf9a5087b6173a350ccb1fd750158978b3bd0acb7d", "D394": "c4c4e62bda30032f12c17edf9a5087b6173a350ccb1fd750158978b3bd0acb7d",
"D406": "ca6103448d663ab16fcaef0f29f8933ef526cbf5aad12c7ff5dbd61b22ca9fc6", "D406": "ca6103448d663ab16fcaef0f29f8933ef526cbf5aad12c7ff5dbd61b22ca9fc6",
"SIT_FIN_SEM_2025": "8164843431e6b703a38fbdedc7898ec6ae83559fe10f88663ba0b55f3091d5fe", "SIT_FIN_SEM_2025": "8164843431e6b703a38fbdedc7898ec6ae83559fe10f88663ba0b55f3091d5fe",
"SIT_FIN_AN_2025": "ec5b2ce694b02bf780e0f72df462b1aeec578ee64c11b3e44ed1a80b2dbe85d8", "SIT_FIN_AN_2025": "accceef5b6585a3e901d83d23fc2e60f6562eac4a2ce00f943856232bed929d6",
"DESCARCARE_DECLARATII": "8cc082021edb0ae97686d73f8179369be33a68ef03ec791757460bb7fff99e34", "DESCARCARE_DECLARATII": "8cc082021edb0ae97686d73f8179369be33a68ef03ec791757460bb7fff99e34",
"D205": "d3c20a7ae70f4c18bbb7add42af035e3746d323b2e6df37a4e31ed625ddb86d9", "D205": "d3c20a7ae70f4c18bbb7add42af035e3746d323b2e6df37a4e31ed625ddb86d9",
"D390": "4726938ed5858ec735caefd947a7d182b6dc64009478332c4feabdb36412a84e", "D390": "4726938ed5858ec735caefd947a7d182b6dc64009478332c4feabdb36412a84e",

View File

@@ -21,6 +21,7 @@ S1061
S1070 S1070
S1072 S1072
S1079 S1079
S1080
Tabel Tabel
codificări codificări
tipuri de situaţii financiare şi raportări anuale tipuri de situaţii financiare şi raportări anuale