game-library/scripts/ENRICHMENT_PROMPT.md
Claude Agent bcfb6841eb Faza 1 complete: bilingual+enrichment plumbing, UI/filters, frozen DB
Extraction finished (575/588 chunks; 6 content-filter-blocked, 7 await
re-extraction). DB rebuilt and frozen at 9418 activities — content_keys
are now stable for the enrichment overlay.

Part A (plumbing + UI):
- database.py: name_ro/description_ro/rules_ro/variations_ro, indoor_outdoor,
  space_needed, estimated_fields, source_id/source_ids/chunk_key columns;
  FTS5 indexes the 4 *_ro columns across CREATE + all 3 triggers; new equality
  filters + category counts for both axes.
- activity.py: new fields + bilingual display helpers (get_display_*,
  is_estimated, axis displays).
- config_taxonomy.py: INDOOR_OUTDOOR/SPACE_NEEDED enums + normalizers
  (None on unrecognised, no fabrication).
- search.py / routes.py / config.py / templates / css: new dropdowns,
  RO-primary rendering with "(estimat)" markers and collapsible original
  text, and a /source/<id> download route shipped DARK behind
  SOURCE_DOWNLOAD_ENABLED (copyright opt-in).
- build_database.py: source_id/chunk_key in dict_to_activity; merge_cluster
  unions source_ids without touching enrichment fields.

Part B (enrichment pipeline, built not yet run):
- build_database.py: load_enrichment + apply_enrichment (post-dedup, keyed on
  content_key) + --enrichment CLI + stated-vs-estimated QA.
- run_enrichment.py (resumable, --source/--limit pilot scoping, --collect),
  ENRICHMENT_PROMPT.md.

Repair: scripts/repair_extractions.py fixes the subagents' systematic
unescaped-ASCII-quote bug with a faithful char-scanner (escapes, never
truncates) + schema validation + a strictly-more-text guard. json_repair was
tried first, truncated silently, and is NOT used. build_database has no repair
dependency.

Tests: tests/test_enrichment.py added; 99 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 18:10:13 +00:00

# SUBAGENT — Activity enrichment
You are a subagent in the game-library enrichment pipeline. You take ONE already
extracted activity and produce a single enrichment pass: a faithful Romanian
rendering plus a few inferred filter fields. You do **one** activity per prompt.

This is **not** re-extraction. The activity text already exists and is trusted.
Your job is to translate it and add filter metadata — never to re-discover or
re-interpret the activity.
## Your task
The prompt gives you two blocks:
1. **Current activity values** — the existing fields (name, description, rules,
variations, language, and any participants/duration/age already set).
2. **Source chunk text** — the original passage the activity came from. This is
your ground truth for any expansion. It may be unavailable; if so, translate
only what is in the current values and do not invent anything.

Produce one JSON object and write it to the path named in the prompt
(`data/enrichment_parts/<content_key>.json`). It MUST contain the exact
`content_key` string from the prompt.
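
A minimal sketch of the write step follows. It assumes the pipeline reads part files as UTF-8 JSON and that `content_key` is a filename-safe string. `write_enrichment_part` and `PARTS_DIR` are hypothetical names, not code from `run_enrichment.py`.

```python
import json
from pathlib import Path

# Hypothetical default, mirroring the path the prompt names.
PARTS_DIR = Path("data/enrichment_parts")


def write_enrichment_part(record: dict, content_key: str, parts_dir: Path = PARTS_DIR) -> Path:
    """Write one enrichment object to <parts_dir>/<content_key>.json (hypothetical helper)."""
    # The object must carry the exact key string the prompt supplied.
    if record.get("content_key") != content_key:
        raise ValueError("content_key in the object does not match the prompt")
    # Assumption: keys are filename-safe; refuse anything that could escape parts_dir.
    if not content_key or "/" in content_key or "\\" in content_key:
        raise ValueError(f"content_key is not a safe file name: {content_key!r}")
    parts_dir.mkdir(parents=True, exist_ok=True)
    path = parts_dir / f"{content_key}.json"
    # ensure_ascii=False keeps Romanian diacritics (ă, â, î, ș, ț) readable on disk.
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

`ensure_ascii=False` is chosen here only so the files stay legible when reviewed by hand. The escaped `\uXXXX` form would parse identically.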
## Rules
### Translation (always)
- Translate `name`, `description`, `rules`, `variations` into natural, fluent
Romanian → `name_ro`, `description_ro`, `rules_ro`, `variations_ro`.
- If a field is already Romanian, still copy a clean Romanian version into the
`*_ro` twin (lightly polished). If a source field is empty/null, omit its
`*_ro` twin entirely (do not emit empty strings).
- Translate faithfully. Keep proper names, do not add moralizing, do not change
the rules of the game.
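
The empty-field rule can be expressed as the sketch below. `build_ro_twins` is a hypothetical function, and the `translate` callable stands in for the translation the subagent performs itself. The sketch assumes each source field is either a string or null.

```python
from typing import Callable, Optional

# The four source fields that get a *_ro twin, per the rules above.
TRANSLATED_FIELDS = ("name", "description", "rules", "variations")


def build_ro_twins(current: dict, translate: Callable[[str], str]) -> dict:
    """Map each non-empty source field to its *_ro twin (hypothetical helper)."""
    twins = {}
    for field in TRANSLATED_FIELDS:
        value: Optional[str] = current.get(field)
        # Empty or null source: omit the twin entirely, never emit "".
        if value is None or not value.strip():
            continue
        twins[f"{field}_ro"] = translate(value)
    return twins
```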
### Description expansion (constrained)
- You MAY make `description_ro` richer than a literal translation — but ONLY
using detail that is actually present in the **source chunk text**. Fold in
setup, steps, or materials that the source states but the short description
omitted.
- You may NOT invent steps, counts, durations, or variations that are not in the
source. If the source is thin, the translation stays thin. Hallucinated
expansion is the one unacceptable failure here.
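
One cheap way to catch the most blatant invented detail is to flag any number in `description_ro` that appears nowhere in the source chunk or the current values. This is a hypothetical QA heuristic, not a documented pipeline step. It only sees digits, so counts spelled out as words pass unchecked, and a clean result does not prove the expansion is faithful.

```python
import re

# Runs of ASCII digits; numbers written as words are not detected.
_NUMBER = re.compile(r"\d+")


def ungrounded_numbers(description_ro: str, *grounding_texts: str) -> set[str]:
    """Return digit runs in description_ro found in none of the grounding texts."""
    grounded: set[str] = set()
    for text in grounding_texts:
        # A missing source chunk (None) contributes nothing.
        grounded.update(_NUMBER.findall(text or ""))
    return set(_NUMBER.findall(description_ro)) - grounded
```

A non-empty result, such as a "15 minute" the source never mentions, marks the record for manual review. It should not trigger an automatic rejection.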
### Inferred filter fields (mark when inferred)
Fill these when you can, using the source text first, then reasonable inference:
- `indoor_outdoor`: one of `indoor`, `outdoor`, `either`.
- `space_needed`: one of `mic`, `mediu`, `mare` (small / medium / large area).
- `participants_min`, `participants_max`: integers (people).
- `duration_min`, `duration_max`: integers (minutes).
- `age_group_min`, `age_group_max`: integers (years).

For any of these fields whose value you **inferred** (the source did not state
it explicitly), add the field name to the `estimated_fields` array. If the
source explicitly states a value, set the field but do NOT list it in
`estimated_fields`. Omit a field entirely if you have no basis at all; do not
guess wildly just to fill it.

Do not contradict a value already present in the current activity values unless
the source text clearly supports a correction.
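
The bookkeeping rules for these fields can be checked mechanically. The sketch below is hypothetical: `check_inferred_fields` is not taken from the codebase. It also assumes that a filled min/max pair must never be inverted, which the prompt implies but does not state outright.

```python
# Field names copied from the list of inferable filter fields above.
INFERABLE_FIELDS = {
    "indoor_outdoor", "space_needed",
    "participants_min", "participants_max",
    "duration_min", "duration_max",
    "age_group_min", "age_group_max",
}
RANGE_PAIRS = (
    ("participants_min", "participants_max"),
    ("duration_min", "duration_max"),
    ("age_group_min", "age_group_max"),
)


def _is_int(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_inferred_fields(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record passes."""
    problems = []
    for name in record.get("estimated_fields", []):
        if name not in INFERABLE_FIELDS:
            problems.append(f"{name} cannot be marked as estimated")
        elif name not in record:
            problems.append(f"{name} is listed as estimated but omitted")
    for low, high in RANGE_PAIRS:
        for name in (low, high):
            if name in record and not _is_int(record[name]):
                problems.append(f"{name} must be an integer")
        # Assumption: a filled min/max pair must not be inverted.
        if _is_int(record.get(low)) and _is_int(record.get(high)) and record[low] > record[high]:
            problems.append(f"{low} > {high}")
    return problems
```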
## Enum vocabulary (fixed — use these exact slugs)
- `indoor_outdoor`: `indoor` | `outdoor` | `either`
- `space_needed`: `mic` | `mediu` | `mare`
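
A consumer of these slugs could normalise them in the spirit of "None on unrecognised, no fabrication". Case and surrounding whitespace are tolerated, and anything else returns `None`. `normalize_slug` is an illustrative sketch and is not the actual code in `config_taxonomy.py`.

```python
from typing import Optional

# The fixed vocabularies listed above.
INDOOR_OUTDOOR = ("indoor", "outdoor", "either")
SPACE_NEEDED = ("mic", "mediu", "mare")


def normalize_slug(value: object, allowed: tuple[str, ...]) -> Optional[str]:
    """Return the canonical slug, or None when the value is not recognised."""
    if not isinstance(value, str):
        return None
    slug = value.strip().lower()
    # No synonym mapping: "outside" or "large" yield None, never a guessed slug.
    return slug if slug in allowed else None
```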
## Output format
Write exactly one JSON object to `data/enrichment_parts/<content_key>.json`:
```json
{
  "content_key": "<the exact key from the prompt>",
  "name_ro": "…",
  "description_ro": "…",
  "rules_ro": "…",
  "variations_ro": "…",
  "indoor_outdoor": "outdoor",
  "space_needed": "mediu",
  "participants_min": 6,
  "participants_max": 20,
  "duration_min": 15,
  "duration_max": 30,
  "age_group_min": 8,
  "age_group_max": 14,
  "estimated_fields": ["space_needed", "duration_min", "duration_max"]
}
```
Include only the fields you actually fill. Always include `content_key` and
`estimated_fields` (use `[]` if nothing was inferred). Output valid JSON only —
no commentary, no markdown fences in the file itself.
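
These format rules can be enforced when a part file is collected. The following is a hypothetical validator, not the pipeline's own `--collect` logic. Rejecting keys outside the documented list is an assumption about how strict a loader should be.

```python
import json

# Every key the output format above documents.
ALLOWED_KEYS = {
    "content_key", "estimated_fields",
    "name_ro", "description_ro", "rules_ro", "variations_ro",
    "indoor_outdoor", "space_needed",
    "participants_min", "participants_max",
    "duration_min", "duration_max",
    "age_group_min", "age_group_max",
}


def validate_part(raw: str, expected_key: str) -> dict:
    """Parse one part file and enforce the format rules; raise ValueError on failure."""
    # json.loads fails on surrounding commentary or fences (JSONDecodeError is a ValueError).
    record = json.loads(raw)
    if not isinstance(record, dict):
        raise ValueError("file must hold exactly one JSON object")
    if record.get("content_key") != expected_key:
        raise ValueError("content_key does not match the prompt")
    if not isinstance(record.get("estimated_fields"), list):
        raise ValueError("estimated_fields is required (use [] if nothing was inferred)")
    unknown = set(record) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    empty = [k for k, v in record.items() if isinstance(v, str) and not v.strip()]
    if empty:
        raise ValueError(f"empty strings must be omitted: {empty}")
    return record
```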
## Report
After writing the file, report in under 30 words: the activity name and which
fields you estimated.