# Enrichment PILOT: sign-off required before full-corpus scaling

**Date:** 2026-05-29. Pilot covers **34 activities** (the STOP gate from `HANDOFF.md` step 3, guarding ~6–8k LLM calls across the full corpus).

## Pipeline integrity (all green)

| Hop | Expected | Actual |
|-----|----------|--------|
| prompts emitted | 34 | 34 |
| part files on disk (valid JSON, key matches filename) | 34 | 34 |
| `enrichment.json` entries after `--collect` | 34 | 34 |
| rebuild overlay: `matched` / `orphaned` | 34 / 0 | **34 / 0** |

No leak at any hop. `orphaned 0` confirms that the content_key the rebuild computes matches what `run_enrichment` emitted (no dedup rep-selection drift).

## Pilot composition

Deliberately mixed to exercise BOTH operations (corpus is 7076 EN / 2465 RO, so en→ro translation is the dominant and highest-risk path):

- **26** rows from `teambuilding_corbu`, all Romanian → **ro→ro polish**
- **8** rows from `d3959920_outdoor_games`, all English → **en→ro translation**

Result: ~7 genuine en→ro translations + ~27 ro→ro polish.

## Field population (stated vs estimated)

```
age_group_max    : 0 stated / 30 estimated
age_group_min    : 0 / 34
duration_max     : 3 / 29
duration_min     : 4 / 28
indoor_outdoor   : 12 / 22
participants_max : 0 / 24
participants_min : 4 / 30
space_needed     : 2 / 32
```

Almost everything is estimated: sources rarely state ages/durations explicitly. The pipeline marks every inferred field in `estimated_fields`, and the UI shows an `(estimat)` marker, so estimates are transparent to end users.

## What to evaluate (the three sign-off axes)

1. **Translation fidelity (en→ro)**: e.g. *Labels → Etichete*, *Ships in a Fog → Nave în ceață*, *Spot the Colours → Găsește culorile*. Game rules preserved, no moralizing added, proper terms kept.
2. **Description fidelity / expansion**: ro→ro rows fold in setup/material detail that IS in the source chunk (e.g. *Găsește-ți fratele și sora* adds "carton A6" + "la semnal, toți încep simultan"; *Ce-mi place?* folds in the character-traits discussion). No invented steps observed.
3. **Estimation plausibility**: mostly reasonable. **Weak spots to judge:** a few age ranges are very wide/defaulted (e.g. *Găsește-ți fratele și sora* → age 10–99).

If wide age defaults are unacceptable, tighten the ENRICHMENT_PROMPT guidance before scaling.

## Inspect the data yourself

```bash
sqlite3 data/activities.db "select name, name_ro, language, indoor_outdoor, space_needed, estimated_fields from activities where name_ro is not null;"
# raw overlay:        data/enrichment.json (34 entries)
# per-activity parts: data/enrichment_parts/*.json
```

## After sign-off (do NOT auto-proceed)

Scale in waves of ~8–16 Sonnet subagents over the rest of the corpus (`run_enrichment.py` is additive and resumable: it skips already-enriched keys), then `--collect`, then a final `build_database.py --rebuild --enrichment`.
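
The "part files on disk (valid JSON, key matches filename)" hop in the integrity table can be re-checked independently with a few lines of Python. This is only a sketch, not the pipeline's own check: it assumes each part file holds a JSON object whose `content_key` field equals the filename stem. Both the field name and the `check_parts` helper are hypothetical, so adjust them to the real part-file layout.

```python
import json
from pathlib import Path


def check_parts(parts_dir, key_field="content_key"):
    """Return (ok_count, problems) for every *.json file in parts_dir.

    ASSUMPTION: a part file is a JSON object carrying its own key under
    `key_field`, and that key equals the filename without ".json".
    """
    ok, problems = 0, []
    for path in sorted(Path(parts_dir).glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            # File exists but is not parseable JSON.
            problems.append((path.name, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(data, dict) or data.get(key_field) != path.stem:
            # Parseable, but the embedded key disagrees with the filename.
            problems.append((path.name, "key does not match filename"))
            continue
        ok += 1
    return ok, problems
```

Calling `check_parts("data/enrichment_parts")` returns the count of clean files plus a list of `(filename, reason)` pairs. If the layout assumption holds, the count should agree with the 34 in the table and the list should be empty.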
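
To judge the wide age defaults flagged under estimation plausibility, a reviewer could list every activity whose age span exceeds some threshold. The sketch below makes several assumptions: rows arrive as dicts with `name`, `age_group_min`, and `age_group_max` keys (field names taken from the population table, but not confirmed as database columns), and the 30-year threshold is an arbitrary illustration, not a project rule.

```python
def flag_wide_age_ranges(rows, max_span=30):
    """Return (name, min_age, max_age) for rows whose age span exceeds max_span.

    ASSUMPTION: each row is a dict with "name", "age_group_min" and
    "age_group_max"; rows missing either bound are skipped, not flagged.
    """
    flagged = []
    for row in rows:
        lo, hi = row.get("age_group_min"), row.get("age_group_max")
        if lo is None or hi is None:
            continue
        if hi - lo > max_span:
            flagged.append((row["name"], lo, hi))
    return flagged


rows = [
    # Age range reported in the pilot notes above.
    {"name": "Găsește-ți fratele și sora", "age_group_min": 10, "age_group_max": 99},
    # Hypothetical row, included only for contrast.
    {"name": "example-narrow", "age_group_min": 8, "age_group_max": 14},
]
print(flag_wide_age_ranges(rows))
```

Only the first row is flagged here, since its span of 89 years exceeds the threshold while the hypothetical row's span of 6 does not. The resulting list gives a concrete set of cases to review when deciding whether the ENRICHMENT_PROMPT guidance needs tightening.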