game-library/ENRICHMENT_PILOT.md
Claude Agent f7a37f91ec Headless cron enrichment system + progress checkpoint at 32%
OS cron fires enrich_wave.sh twice nightly (post 23:00 UTC reset); each wave
caps at ~700 keys (~75% window) via enrichment_wave.py --prepare. Fully
headless: one claude -p per batch via xargs, flock-guarded, idempotent.
DB updated to 9541 activities; .gitignore covers enrichment intermediates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 21:26:35 +00:00

# Enrichment PILOT — sign-off required before full-corpus scaling
**Date:** 2026-05-29. Pilot covers **34 activities** (the STOP gate from `HANDOFF.md`
step 3, guarding ~68k LLM calls across the full corpus).
## Pipeline integrity (all green)
| Hop | Expected | Actual |
|-----|----------|--------|
| prompts emitted | 34 | 34 |
| part files on disk (valid JSON, key matches filename) | 34 | 34 |
| `enrichment.json` entries after `--collect` | 34 | 34 |
| rebuild overlay: `matched` / `orphaned` | 34 / 0 | **34 / 0** |

No leak at any hop. `orphaned` = 0 confirms that the `content_key` the rebuild computes
matches the one `run_enrichment` emitted (no drift in dedup representative selection).
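
The hop checks in the table can be approximated with a short script. This is a minimal sketch, not the project's actual tooling: it assumes each part file is a JSON object named `<key>.json` that stores its key under a `content_key` field, and that `orphaned` means an overlay key with no matching activity in the rebuilt database. The function names `check_parts` and `overlay_match` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def check_parts(parts_dir, key_field="content_key"):
    """Report part files that are valid JSON and whose key equals the filename stem.

    ASSUMPTION: each part is a JSON object storing its key under `key_field`
    and is named `<key>.json`; neither detail is confirmed by this document.
    """
    report = {"total": 0, "valid": 0, "bad": []}
    for path in sorted(Path(parts_dir).glob("*.json")):
        report["total"] += 1
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except ValueError:  # covers json.JSONDecodeError and undecodable bytes
            report["bad"].append((path.name, "invalid JSON"))
            continue
        if not isinstance(data, dict) or data.get(key_field) != path.stem:
            report["bad"].append((path.name, "key does not match filename"))
            continue
        report["valid"] += 1
    return report


def overlay_match(db_keys, overlay_keys):
    """Return (matched, orphaned) counts for overlay keys against rebuilt DB keys.

    ASSUMPTION: an overlay key is "orphaned" when no rebuilt activity has it.
    """
    db, overlay = set(db_keys), set(overlay_keys)
    return len(overlay & db), len(overlay - db)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Throwaway demo files, not real pilot output.
        Path(tmp, "k1.json").write_text(json.dumps({"content_key": "k1"}), encoding="utf-8")
        Path(tmp, "k2.json").write_text("{broken", encoding="utf-8")
        print(check_parts(tmp))
    print(overlay_match({"k1", "k2"}, {"k1"}))
```

For the pilot, a clean run would correspond to `total == valid == 34` and `overlay_match(...) == (34, 0)`.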
## Pilot composition
The pilot is deliberately mixed to exercise both operations. The corpus is 7076 EN / 2465 RO, so
en→ro translation is the dominant and highest-risk path:
- **26** rows from `teambuilding_corbu`, all Romanian → **ro→ro polish**
- **8** rows from `d3959920_outdoor_games`, all English → **en→ro translation**

Result: ~7 genuine en→ro translations + ~27 ro→ro polish.
## Field population (stated vs estimated)
```
age_group_max : 0 stated / 30 estimated
age_group_min : 0 / 34
duration_max : 3 / 29
duration_min : 4 / 28
indoor_outdoor : 12 / 22
participants_max : 0 / 24
participants_min : 4 / 30
space_needed : 2 / 32
```
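Almost everything is estimated, because sources rarely state ages or durations explicitly.
Each row reads stated / estimated. Rows that sum to fewer than 34 (e.g. `participants_max`:
0 + 24) presumably leave the field empty for the remaining activities.
The pipeline marks every inferred field in `estimated_fields`, and the UI shows an
`(estimat)` marker, so estimates are transparent to end users.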
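
A stated-vs-estimated tally like the one above could be produced along these lines. This is a sketch under assumptions: it treats each activity as a dict whose `estimated_fields` is a list of field names, and it skips fields whose value is `None`. The real storage format (for example a JSON string in SQLite) may differ, and `tally_fields` is a hypothetical name.

```python
from collections import Counter

FIELDS = (
    "age_group_max", "age_group_min", "duration_max", "duration_min",
    "indoor_outdoor", "participants_max", "participants_min", "space_needed",
)


def tally_fields(records, fields=FIELDS):
    """Return {field: (stated, estimated)} over populated values only.

    ASSUMPTION: `estimated_fields` is a list of field names on each record.
    """
    stated, estimated = Counter(), Counter()
    for rec in records:
        inferred = set(rec.get("estimated_fields") or [])
        for field in fields:
            if rec.get(field) is None:
                continue  # unpopulated: counts as neither stated nor estimated
            if field in inferred:
                estimated[field] += 1
            else:
                stated[field] += 1
    return {field: (stated[field], estimated[field]) for field in fields}


if __name__ == "__main__":
    # Made-up records for illustration, not pilot data.
    demo = [
        {"duration_min": 10, "duration_max": 20, "indoor_outdoor": "outdoor",
         "estimated_fields": ["duration_max"]},
        {"duration_min": 5, "duration_max": None,
         "estimated_fields": ["duration_min"]},
    ]
    for field, (n_stated, n_estimated) in tally_fields(
        demo, fields=("duration_min", "duration_max", "indoor_outdoor")
    ).items():
        print(f"{field:<16}: {n_stated} / {n_estimated}")
```

Skipping `None` values is a design choice here: it keeps unpopulated fields out of both columns, so a row can sum to less than the number of activities.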
## What to evaluate (the three sign-off axes)
1. **Translation fidelity (en→ro)** — e.g. *Labels → Etichete*, *Ships in a Fog →
Nave în ceață*, *Spot the Colours → Găsește culorile*. Game rules preserved,
no moralizing added, proper terms kept.
2. **Description fidelity / expansion** — ro→ro rows fold in setup/material detail
that IS in the source chunk (e.g. *Găsește-ți fratele și sora* adds "carton A6"
+ "la semnal, toți încep simultan"; *Ce-mi place?* folds in the character-traits
discussion). No invented steps observed.
3. **Estimation plausibility** — mostly reasonable. **Weak spots to judge:** a few
age ranges are very wide/defaulted (e.g. *Găsește-ți fratele și sora* → age
10-99). If wide age defaults are unacceptable, tighten the `ENRICHMENT_PROMPT`
guidance before scaling.
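
To make the estimation weak spots easier to review, wide estimated age ranges can be listed mechanically. The sketch below is illustrative only: the `max_span` threshold of 30 years is an arbitrary assumption, `flag_wide_age_ranges` is a hypothetical helper, and records are assumed to be dicts with `estimated_fields` as a list of field names.

```python
def flag_wide_age_ranges(records, max_span=30):
    """Return (name, min, max) for activities whose estimated age range is wide.

    ASSUMPTIONS: `max_span` is an arbitrary review threshold, and a range only
    counts when at least one of its two bounds is listed in `estimated_fields`.
    """
    flagged = []
    for rec in records:
        low, high = rec.get("age_group_min"), rec.get("age_group_max")
        if low is None or high is None:
            continue  # cannot judge a range with a missing bound
        inferred = rec.get("estimated_fields") or []
        is_estimated = "age_group_min" in inferred or "age_group_max" in inferred
        if is_estimated and high - low > max_span:
            flagged.append((rec.get("name"), low, high))
    return flagged


if __name__ == "__main__":
    # Made-up records for illustration, not pilot data.
    demo = [
        {"name": "wide estimate", "age_group_min": 10, "age_group_max": 99,
         "estimated_fields": ["age_group_min", "age_group_max"]},
        {"name": "narrow estimate", "age_group_min": 6, "age_group_max": 12,
         "estimated_fields": ["age_group_max"]},
    ]
    print(flag_wide_age_ranges(demo))
```

Ranges stated by the source are left alone on purpose, since only model-inferred defaults are the concern for sign-off.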
## Inspect the data yourself
```bash
sqlite3 data/activities.db "select name, name_ro, language, indoor_outdoor, space_needed, estimated_fields from activities where name_ro is not null;"
# raw overlay: data/enrichment.json (34 entries)
# per-activity parts: data/enrichment_parts/*.json
```
## After sign-off (do NOT auto-proceed)
Scale in waves of ~8-16 Sonnet subagents over the rest of the corpus
(`run_enrichment.py` is additive and resumable: it skips already-enriched keys),
then run `--collect`, then a final `build_database.py --rebuild --enrichment`.
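
The additive, resumable behaviour described above amounts to filtering out keys that are already enriched and batching the remainder. The following is a minimal sketch of that idea, not the code in `run_enrichment.py`; `plan_waves` and its `wave_size` parameter are hypothetical.

```python
def plan_waves(all_keys, enriched_keys, wave_size):
    """Split not-yet-enriched keys into consecutive waves of at most `wave_size`.

    ASSUMPTION: a key counts as done simply by appearing in `enriched_keys`.
    """
    if wave_size < 1:
        raise ValueError("wave_size must be at least 1")
    done = set(enriched_keys)
    pending = [key for key in all_keys if key not in done]  # keeps input order
    return [pending[i:i + wave_size] for i in range(0, len(pending), wave_size)]


if __name__ == "__main__":
    # Made-up keys for illustration.
    print(plan_waves(["a", "b", "c", "d", "e"], enriched_keys={"b"}, wave_size=2))
    # → [['a', 'c'], ['d', 'e']]
```

Because finished keys are dropped before batching, re-running the plan after an interrupted wave only schedules what is still missing.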