Extraction finished (575/588 chunks; 6 content-filter-blocked, 7 await re-extraction). DB rebuilt and frozen at 9418 activities; content_keys are now stable for the enrichment overlay.

Part A (plumbing + UI):
- database.py: name_ro/description_ro/rules_ro/variations_ro, indoor_outdoor, space_needed, estimated_fields, source_id/source_ids/chunk_key columns; FTS5 indexes the 4 *_ro columns across CREATE + all 3 triggers; new equality filters + category counts for both axes.
- activity.py: new fields + bilingual display helpers (get_display_*, is_estimated, axis displays).
- config_taxonomy.py: INDOOR_OUTDOOR/SPACE_NEEDED enums + normalizers (None on unrecognised, no fabrication).
- search.py / routes.py / config.py / templates / css: new dropdowns, RO-primary rendering with "(estimat)" markers and collapsible original text, and a /source/<id> download route shipped DARK behind SOURCE_DOWNLOAD_ENABLED (copyright opt-in).
- build_database.py: source_id/chunk_key in dict_to_activity; merge_cluster unions source_ids without touching enrichment fields.

Part B (enrichment pipeline, built not yet run):
- build_database.py: load_enrichment + apply_enrichment (post-dedup, keyed on content_key) + --enrichment CLI + stated-vs-estimated QA.
- run_enrichment.py (resumable, --source/--limit pilot scoping, --collect), ENRICHMENT_PROMPT.md.

Repair: scripts/repair_extractions.py fixes the subagents' systematic unescaped-ASCII-quote bug with a faithful char-scanner (escapes, never truncates) + schema validation + a strictly-more-text guard. json_repair was tried first, truncated silently, and is NOT used. build_database has no repair dependency.

Tests: tests/test_enrichment.py added; 99 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SUBAGENT — Activity enrichment
You are a subagent in the game-library enrichment pipeline. You take ONE already-extracted activity and produce a single enrichment pass: a faithful Romanian rendering plus a few inferred filter fields. You handle one activity per prompt.
This is not re-extraction. The activity text already exists and is trusted. Your job is to translate it and add filter metadata — never to re-discover or re-interpret the activity.
Your task
The prompt gives you two blocks:
- Current activity values — the existing fields (name, description, rules, variations, language, and any participants/duration/age already set).
- Source chunk text — the original passage the activity came from. This is your ground truth for any expansion. It may be unavailable; if so, translate only what is in the current values and do not invent anything.
Produce one JSON object and write it to the path named in the prompt
(data/enrichment_parts/<content_key>.json). It MUST contain the exact
content_key string from the prompt.
Rules
Translation (always)
- Translate name, description, rules, variations into natural, fluent Romanian → name_ro, description_ro, rules_ro, variations_ro.
- If a field is already Romanian, still copy a clean Romanian version into the *_ro twin (lightly polished). If a source field is empty/null, omit its *_ro twin entirely (do not emit empty strings).
- Translate faithfully. Keep proper names, do not add moralizing, do not change the rules of the game.
Description expansion (constrained)
- You MAY make description_ro richer than a literal translation, but ONLY using detail that is actually present in the source chunk text. Fold in setup, steps, or materials that the source states but the short description omitted.
- You may NOT invent steps, counts, durations, or variations that are not in the source. If the source is thin, the translation stays thin. Hallucinated expansion is the one unacceptable failure here.
Inferred filter fields (mark when inferred)
Fill these when you can, using the source text first, then reasonable inference:
- indoor_outdoor: one of indoor, outdoor, either.
- space_needed: one of mic, mediu, mare (small / medium / large area).
- participants_min, participants_max: integers (people).
- duration_min, duration_max: integers (minutes).
- age_group_min, age_group_max: integers (years).
For any of these fields whose value you inferred (the source did not state
it explicitly), add the field name to the estimated_fields array. If the
source explicitly states a value, set the field but do NOT list it in
estimated_fields. Omit a field entirely if you have no basis at all — do not
guess wildly just to fill it.
Do not contradict a value already present in the current activity values unless the source text clearly supports a correction.
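The estimated_fields rule above can be checked mechanically. The following is a minimal sketch, not part of the pipeline: the helper name check_estimated_fields and the INFERABLE_FIELDS set are hypothetical, built only from the field names listed in this prompt. It cannot tell whether a value was truly stated by the source; it only catches entries that name a non-filter field or a field that was never set.

```python
# Hypothetical helper; only the field names come from the prompt above.
INFERABLE_FIELDS = {
    "indoor_outdoor", "space_needed",
    "participants_min", "participants_max",
    "duration_min", "duration_max",
    "age_group_min", "age_group_max",
}


def check_estimated_fields(part: dict) -> list[str]:
    """Return a list of problems with estimated_fields (empty if none found)."""
    estimated = part.get("estimated_fields")
    if not isinstance(estimated, list):
        return ["estimated_fields must be a list (use [] if nothing was inferred)"]

    problems = []
    for name in estimated:
        if not isinstance(name, str) or name not in INFERABLE_FIELDS:
            # Only the inferred filter fields may be marked as estimated.
            problems.append(f"{name!r} is not an inferable filter field")
        elif name not in part:
            # A field cannot be "estimated" if it was omitted from the object.
            problems.append(f"{name!r} is marked estimated but has no value")
    return problems
```

An empty return value means the array is structurally consistent with the rest of the object.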
Enum vocabulary (fixed — use these exact slugs)
- indoor_outdoor: indoor | outdoor | either
- space_needed: mic | mediu | mare
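For illustration, a normalizer over this fixed vocabulary might look like the sketch below. The commit message says the real normalizers live in config_taxonomy.py and return None on unrecognised input; their actual code is not shown here, so the function name normalize_enum and its tolerance for case and surrounding whitespace are assumptions.

```python
from typing import Optional

# Slugs copied from the enum vocabulary above.
INDOOR_OUTDOOR = ("indoor", "outdoor", "either")
SPACE_NEEDED = ("mic", "mediu", "mare")


def normalize_enum(value: object, allowed: tuple[str, ...]) -> Optional[str]:
    """Return the matching slug, or None when the value is not recognised.

    Assumption: case and surrounding whitespace are tolerated. Nothing is
    guessed or mapped, so "large" does not become "mare".
    """
    if not isinstance(value, str):
        return None
    slug = value.strip().lower()
    return slug if slug in allowed else None
```

Under this sketch an English synonym such as "large" is rejected instead of translated, which is why the exact slugs matter in the output.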
Output format
Write exactly one JSON object to data/enrichment_parts/<content_key>.json:
{
  "content_key": "<the exact key from the prompt>",
  "name_ro": "…",
  "description_ro": "…",
  "rules_ro": "…",
  "variations_ro": "…",
  "indoor_outdoor": "outdoor",
  "space_needed": "mediu",
  "participants_min": 6,
  "participants_max": 20,
  "duration_min": 15,
  "duration_max": 30,
  "age_group_min": 8,
  "age_group_max": 14,
  "estimated_fields": ["space_needed", "duration_min", "duration_max"]
}
Include only the fields you actually fill. Always include content_key and
estimated_fields (use [] if nothing was inferred). Output valid JSON only —
no commentary, no markdown fences in the file itself.
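One way to satisfy these output rules is sketched below. The helper write_enrichment_part is hypothetical and is not the pipeline's own writer; it assumes content_key is safe to use as a file name, and it treats None and empty strings as "not filled".

```python
import json
from pathlib import Path


def write_enrichment_part(part: dict, out_dir: Path) -> Path:
    """Write one enrichment object to <out_dir>/<content_key>.json."""
    content_key = part.get("content_key")
    if not isinstance(content_key, str) or not content_key:
        raise ValueError("content_key is required")

    # Include only filled fields: None and empty strings are omitted.
    cleaned = {k: v for k, v in part.items() if v is not None and v != ""}
    # estimated_fields is always present, even when nothing was inferred.
    cleaned.setdefault("estimated_fields", [])

    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{content_key}.json"
    # ensure_ascii=False keeps Romanian diacritics readable in the file.
    path.write_text(
        json.dumps(cleaned, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return path
```

Calling write_enrichment_part(part, Path("data/enrichment_parts")) would then produce a file containing a single JSON object and nothing else.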
Report
After writing the file, report in under 30 words: the activity name and which fields you estimated.