chore: working-tree state — anaf snapshots, cron state, KB notes, tools

Pre-existing uncommitted changes swept in with the STT work:
anaf-monitor snapshots/versions, cron job + newsletter state, 9 youtube KB
notes, tools/ocr_bon.py, and tools/tts.py.

Note: the tts.py change breaks 2 truncation tests in test_voice_normalize.py
(sanitize word-count) — flagged for a separate follow-up.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
2026-06-27 18:16:31 +00:00
parent ce273d14db
commit d175d5ba5a
17 changed files with 840 additions and 41 deletions

View File

@@ -35,10 +35,26 @@ _TTS_PUNCT_MAP = {
}
# Supertonic ONNX model hard limit: inputs longer than this trigger
# Mul node dimension mismatches in attention layers.
_MAX_TTS_CHARS = 400
def sanitize_for_supertonic(text: str) -> str:
"""Replace Unicode punctuation Supertonic rejects with ASCII equivalents."""
"""Replace Unicode punctuation and strip chars that crash Supertonic's ONNX model."""
for src, dst in _TTS_PUNCT_MAP.items():
text = text.replace(src, dst)
# Strip emoji and high-codepoint chars (keep ASCII printable + Latin/Romanian diacritice)
cleaned = []
for ch in text:
cp = ord(ch)
if (32 <= cp <= 126) or (128 <= cp <= 591):
cleaned.append(ch)
else:
cleaned.append(' ')
text = ' '.join(''.join(cleaned).split())
if len(text) > _MAX_TTS_CHARS:
text = text[:_MAX_TTS_CHARS]
return text