Prevent + net the unescaped-quote bug in the durable prompts/pipeline

The escape-ASCII-quote rule previously lived only in ephemeral Agent-call
strings. Bake it into the durable artifacts so the next session doesn't
re-derive it:

- SUBAGENT_PROMPT.md + ENRICHMENT_PROMPT.md: explicit rule to escape any
  ASCII " inside JSON string values (Romanian „cuvânt" is the trap).
- run_enrichment.py collect_enrichment: repair malformed parts with
  escape_stray_quotes instead of dropping them. The enrichment path had no
  repair net (bad parts were silently dropped, losing that activity's
  enrichment). Extraction already had one; now both do.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -74,6 +74,21 @@ The file is one JSON object: a `header` plus an `activities` array.
- Do **not** paraphrase the `source_excerpt` — copy it character for character.
- Better to extract fewer activities accurately than to pad the output.

## Escaping quotes inside JSON strings (CRITICAL)

Any ASCII double-quote (`"`, U+0022) that appears **inside a string value** must
be written escaped as `\"`. This is the single most common way these extractions
break: Romanian source text uses typographic quotes like `„cuvânt"` where the
closing mark is a plain ASCII `"`. Written raw, it terminates the JSON string
early and corrupts the whole file. So:

- `"description": "grupul cântă „Unu\" în cor"` ← correct (inner `"` escaped)
- `"description": "grupul cântă „Unu" în cor"` ← BROKEN (unescaped `"`)

Prefer keeping the source's typographic quotes (`„ "`), but whenever a literal
ASCII `"` lands inside a value, escape it. After writing, re-read the file and
confirm it parses as valid JSON.
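
The rule and the re-read check above can be sketched in Python. This is a minimal illustration only: `is_valid_json` and `escape_inner_quotes` are hypothetical helpers written for this example, and they are not the pipeline's own `escape_stray_quotes`, whose implementation is not shown here. The repair heuristic assumes a quote closes a string only when the next non-space character is `,`, `:`, `}`, `]`, or the end of the text, so it can misjudge an inner quote that happens to be followed by a comma.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON (the re-read check)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def escape_inner_quotes(text: str) -> str:
    """Heuristically escape ASCII quotes that sit inside JSON string values.

    Hypothetical repair helper: a quote inside a string is treated as the
    closing delimiter only if the next non-space character is one of
    `,` `:` `}` `]` or there is nothing left; otherwise it is escaped.
    """
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string and ch == "\\" and i + 1 < len(text):
            # Keep an existing escape pair (e.g. \" or \\) untouched.
            out.append(text[i:i + 2])
            i += 2
            continue
        if ch == '"':
            if not in_string:
                in_string = True
            else:
                rest = text[i + 1:].lstrip()
                if rest == "" or rest[0] in ",:}]":
                    in_string = False  # structural closing quote
                else:
                    out.append("\\")  # stray inner quote: escape it
        out.append(ch)
        i += 1
    return "".join(out)


# Writing side: json.dumps escapes inner ASCII quotes automatically;
# ensure_ascii=False keeps the typographic quote and diacritics readable.
value = 'grupul cântă „Unu" în cor'
written = json.dumps({"description": value}, ensure_ascii=False)

# Repair side: the BROKEN line from the list above, wrapped in an object.
broken = '{"description": "grupul cântă „Unu" în cor"}'
repaired = escape_inner_quotes(broken)

print(is_valid_json(broken), is_valid_json(repaired), repaired == written)
```

Serializing with `json.dumps` avoids the problem at write time; a heuristic repair like the one above is only a fallback for text that was already written by hand.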

## Writing large outputs in batches (IMPORTANT)

A single Write tool call has a hard ~32K output-token limit. Dense chunks