chore: working-tree state — anaf snapshots, cron state, KB notes, tools
Pre-existing uncommitted changes swept in with the STT work: anaf-monitor snapshots/versions, cron job + newsletter state, 9 YouTube KB notes, tools/ocr_bon.py, and tools/tts.py.

Note: the tts.py change breaks 2 truncation tests in test_voice_normalize.py (sanitize word-count); flagged for a separate follow-up.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
18
tools/tts.py
@@ -35,10 +35,26 @@ _TTS_PUNCT_MAP = {
 }
 
 
+# Supertonic ONNX model hard limit: inputs longer than this trigger
+# Mul node dimension mismatches in attention layers.
+_MAX_TTS_CHARS = 400
+
+
 def sanitize_for_supertonic(text: str) -> str:
-    """Replace Unicode punctuation Supertonic rejects with ASCII equivalents."""
+    """Replace Unicode punctuation and strip chars that crash Supertonic's ONNX model."""
     for src, dst in _TTS_PUNCT_MAP.items():
         text = text.replace(src, dst)
+    # Strip emoji and high-codepoint chars (keep ASCII printable + Latin/Romanian diacritice)
+    cleaned = []
+    for ch in text:
+        cp = ord(ch)
+        if (32 <= cp <= 126) or (128 <= cp <= 591):
+            cleaned.append(ch)
+        else:
+            cleaned.append(' ')
+    text = ' '.join(''.join(cleaned).split())
+    if len(text) > _MAX_TTS_CHARS:
+        text = text[:_MAX_TTS_CHARS]
     return text
 
 
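For anyone investigating the truncation tests mentioned in the commit message, the new behavior can be reproduced outside the repository with a small standalone sketch. The entries of `_TTS_PUNCT_MAP` are not visible in this hunk, so the mapping below is a hypothetical stand-in; the code-point ranges and the 400-character limit are taken from the diff.

```python
# Hypothetical stand-in: the real _TTS_PUNCT_MAP entries are not shown in the hunk.
_TTS_PUNCT_MAP = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2019": "'",  # right single quotation mark
}

_MAX_TTS_CHARS = 400  # limit taken from the diff


def sanitize_for_supertonic(text: str) -> str:
    """Standalone copy of the sanitizer logic from the diff, for experimentation."""
    for src, dst in _TTS_PUNCT_MAP.items():
        text = text.replace(src, dst)
    # Keep printable ASCII and code points 128..591; replace everything else with a space.
    cleaned = [
        ch if (32 <= ord(ch) <= 126) or (128 <= ord(ch) <= 591) else " "
        for ch in text
    ]
    # Collapse runs of whitespace, then apply the hard character cut.
    text = " ".join("".join(cleaned).split())
    return text[:_MAX_TTS_CHARS]


if __name__ == "__main__":
    # Romanian diacritics survive, the emoji is dropped, curly quotes become ASCII.
    print(sanitize_for_supertonic("Bun\u0103 ziua \U0001F600 \u201chello\u201d"))
    # The cut is by characters, so it can land in the middle of a word.
    print(repr(sanitize_for_supertonic("abcde " * 100)[-10:]))
```

Because the limit is applied by character count after whitespace normalization, the last word of a long input can be cut short. That may be relevant to tests that assert on word counts, though the commit does not say exactly why they fail.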