game-library/ENRICHMENT_PILOT.md
Claude Agent f7a37f91ec Headless cron enrichment system + progress checkpoint at 32%
OS cron fires enrich_wave.sh twice nightly (post 23:00 UTC reset); each wave
caps at ~700 keys (~75% window) via enrichment_wave.py --prepare. Fully
headless: one claude -p per batch via xargs, flock-guarded, idempotent.
DB updated to 9541 activities; .gitignore covers enrichment intermediates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 21:26:35 +00:00


Enrichment PILOT — sign-off required before full-corpus scaling

Date: 2026-05-29. The pilot covers 34 activities. It is the STOP gate from HANDOFF.md step 3, guarding ~6–8k LLM calls across the full corpus.

Pipeline integrity (all green)

Hop                                                    Expected   Actual
prompts emitted                                        34         34
part files on disk (valid JSON, key matches filename)  34         34
enrichment.json entries after --collect                34         34
rebuild overlay: matched / orphaned                    34 / 0     34 / 0

No leak at any hop. An orphaned count of 0 confirms that the content_key the rebuild computes matches the one run_enrichment emitted (no drift in dedup representative selection).
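The orphaned count is the set difference between the overlay's keys and the keys the rebuild computes. A minimal sketch of that check, assuming a hypothetical content_key (SHA-256 over the normalized name and description). The real key function used by run_enrichment and the rebuild is not shown in this document; the only requirement is that both sides use the same one.

```python
import hashlib


def content_key(activity: dict) -> str:
    """Hypothetical content_key: hash of normalized name + description.

    The real key may be computed differently; what matters is that the
    emitter and the rebuild compute it with the SAME function.
    """
    basis = activity["name"].strip().lower() + "\n" + activity["description"].strip()
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()[:16]


def overlay_stats(activities: list[dict], overlay: dict) -> tuple[int, int]:
    """Return (matched, orphaned) for an overlay keyed by content_key."""
    rebuild_keys = {content_key(a) for a in activities}
    matched = sum(1 for key in overlay if key in rebuild_keys)
    return matched, len(overlay) - matched


activities = [
    {"name": "Labels", "description": "Stick a label on each player's back."},
    {"name": "Ships in a Fog", "description": "Guide a blindfolded partner."},
]
overlay = {
    content_key(a): {"name_ro": ro}
    for a, ro in zip(activities, ["Etichete", "Nave în ceață"])
}
print(overlay_stats(activities, overlay))  # (2, 0)

# Simulated drift: the rebuild now sees different text for one activity.
activities[1]["description"] = "Guide a blindfolded partner across the room."
print(overlay_stats(activities, overlay))  # (1, 1)
```

Any change to the text a key is derived from (for example, a different dedup representative being selected) turns a matched entry into an orphan, which is exactly what a non-zero orphaned count would flag.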

Pilot composition

Deliberately mixed to exercise BOTH operations (the corpus is 7076 EN / 2465 RO, so en→ro translation is both the dominant and the highest-risk path):

  • 26 rows from teambuilding_corbu — all Romanian → ro→ro polish
  • 8 rows from d3959920_outdoor_games — all English → en→ro translation

Observed result: ~7 genuine en→ro translations + ~27 ro→ro polish, which does not exactly match the 8/26 split of the sources.
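The operation a row receives follows from its source language. A sketch of that routing, assuming the decision is made from the row's language column (operation_for is a hypothetical name, not a pipeline function). It reproduces the planned 26/8 mix, not the observed split.

```python
from collections import Counter


def operation_for(language: str) -> str:
    """Pick the enrichment operation from a row's source language.

    Every row ends up in Romanian, so English rows need translation and
    Romanian rows only need polishing.
    """
    if language == "en":
        return "en→ro translation"
    if language == "ro":
        return "ro→ro polish"
    raise ValueError(f"unexpected source language: {language!r}")


# Planned pilot mix: 26 Romanian rows (teambuilding_corbu)
# + 8 English rows (d3959920_outdoor_games).
pilot_languages = ["ro"] * 26 + ["en"] * 8
planned = Counter(operation_for(lang) for lang in pilot_languages)
print(planned)

# Share of the corpus that goes through the en→ro path.
en_share = 7076 / (7076 + 2465)
print(f"{en_share:.1%}")  # 74.2%
```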

Field population (stated vs estimated)

age_group_max     : 0 stated / 30 estimated
age_group_min     : 0 / 34
duration_max      : 3 / 29
duration_min      : 4 / 28
indoor_outdoor    : 12 / 22
participants_max  : 0 / 24
participants_min  : 4 / 30
space_needed      : 2 / 32

Almost everything is estimated: sources rarely state ages or durations explicitly. The pipeline records every inferred field in estimated_fields, and the UI flags those values with an (estimat) marker (Romanian for "estimated"), so estimates are transparent to end users.
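The stated/estimated counts above can be recomputed from the rows themselves. A sketch, assuming estimated_fields holds a JSON array of field names (the storage format is not specified here) and that a NULL field counts as neither stated nor estimated; display() is a hypothetical stand-in for the UI's (estimat) marker.

```python
import json
from collections import Counter

FIELDS = [
    "age_group_max", "age_group_min", "duration_max", "duration_min",
    "indoor_outdoor", "participants_max", "participants_min", "space_needed",
]


def inferred_fields(row: dict) -> set[str]:
    # Assumption: estimated_fields is stored as a JSON array of field names.
    return set(json.loads(row.get("estimated_fields") or "[]"))


def population(rows: list[dict]) -> dict[str, tuple[int, int]]:
    """Count (stated, estimated) per field; a NULL value counts as neither."""
    stated, estimated = Counter(), Counter()
    for row in rows:
        inferred = inferred_fields(row)
        for field in FIELDS:
            if row.get(field) is None:
                continue
            if field in inferred:
                estimated[field] += 1
            else:
                stated[field] += 1
    return {field: (stated[field], estimated[field]) for field in FIELDS}


def display(row: dict, field: str) -> str:
    """Render a value, tagging inferred ones with the (estimat) marker."""
    value = row.get(field)
    if value is None:
        return ""
    return f"{value} (estimat)" if field in inferred_fields(row) else str(value)


rows = [
    {"indoor_outdoor": "outdoor", "age_group_min": 8,
     "estimated_fields": '["age_group_min"]'},
    {"indoor_outdoor": "indoor", "age_group_min": 6, "duration_min": 10,
     "estimated_fields": '["indoor_outdoor", "age_group_min", "duration_min"]'},
]
print(population(rows)["age_group_min"])   # (0, 2)
print(population(rows)["indoor_outdoor"])  # (1, 1)
print(display(rows[0], "age_group_min"))   # 8 (estimat)
```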

What to evaluate (the three sign-off axes)

  1. Translation fidelity (en→ro): e.g. Labels → Etichete, Ships in a Fog → Nave în ceață, Spot the Colours → Găsește culorile. Game rules are preserved, no moralizing is added, and proper terms are kept.
  2. Description fidelity / expansion: ro→ro rows fold in setup and material detail that IS in the source chunk (e.g. Găsește-ți fratele și sora adds "carton A6" (A6 card) and "la semnal, toți încep simultan" (on the signal, everyone starts at once); Ce-mi place? folds in the character-traits discussion). No invented steps observed.
  3. Estimation plausibility: mostly reasonable. Weak spots to judge: a few age ranges are very wide or defaulted (e.g. Găsește-ți fratele și sora → age 10–99). If wide age defaults are unacceptable, tighten the ENRICHMENT_PROMPT guidance before scaling.
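Before deciding on a prompt change, wide or inverted age ranges can be counted mechanically. A minimal sketch; max_span=40 is an arbitrary illustrative threshold, not a project setting, and the sample rows are invented.

```python
def age_range_warnings(rows: list[dict], max_span: int = 40) -> list[str]:
    """Flag age ranges that are inverted or implausibly wide.

    max_span is a hypothetical review threshold in years.
    """
    warnings = []
    for row in rows:
        lo, hi = row.get("age_group_min"), row.get("age_group_max")
        if lo is None or hi is None:
            continue  # nothing to judge if either bound is missing
        if lo > hi:
            warnings.append(f"{row['name']}: min {lo} > max {hi}")
        elif hi - lo > max_span:
            warnings.append(f"{row['name']}: span {lo}-{hi} exceeds {max_span} years")
    return warnings


sample = [
    {"name": "example A", "age_group_min": 8, "age_group_max": 14},
    {"name": "example B", "age_group_min": 6, "age_group_max": 99},
    {"name": "example C", "age_group_min": 12, "age_group_max": 10},
]
for warning in age_range_warnings(sample):
    print(warning)
```

Running this over the enriched rows gives a count of defaulted ranges to weigh against the cost of re-prompting.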

Inspect the data yourself

sqlite3 data/activities.db "select name, name_ro, language, indoor_outdoor, space_needed, estimated_fields from activities where name_ro is not null;"
# raw overlay: data/enrichment.json (34 entries)
# per-activity parts: data/enrichment_parts/*.json
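The part-file hop (valid JSON, key matches filename) can also be re-checked directly on disk. A sketch, assuming a hypothetical layout where each file is named <content_key>.json and repeats that key in a top-level "content_key" field; adjust it to the real part-file schema.

```python
import json
from pathlib import Path


def check_parts(parts_dir: str) -> list[str]:
    """Return problems found in per-activity part files.

    Assumed (hypothetical) layout: <content_key>.json files, each holding a
    JSON object whose "content_key" field repeats the filename stem.
    """
    root = Path(parts_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"no such directory: {parts_dir}")
    problems = []
    for path in sorted(root.glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(data, dict) or data.get("content_key") != path.stem:
            problems.append(f"{path.name}: key does not match filename")
    return problems
```

Under those assumptions, check_parts("data/enrichment_parts") returns an empty list when this hop is green.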

After sign-off (do NOT auto-proceed)

Scale in waves of ~8–16 Sonnet subagents over the rest of the corpus (run_enrichment.py is additive and resumable: it skips already-enriched keys), run --collect, then run the final build_database.py --rebuild --enrichment.
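Because already-enriched keys are skipped, each wave only needs the keys that are still pending. A sketch of that planning step with hypothetical names and parameters (batch_size keys per subagent call, batches_per_wave standing in for the number of parallel subagents); it is not the actual run_enrichment.py logic.

```python
def plan_waves(
    all_keys: list[str],
    done_keys: set[str],
    batch_size: int,
    batches_per_wave: int,
) -> list[list[list[str]]]:
    """Split pending keys into waves of batches, skipping keys already done."""
    if batch_size < 1 or batches_per_wave < 1:
        raise ValueError("batch_size and batches_per_wave must be >= 1")
    # Additive + resumable: anything already enriched is simply left out.
    pending = [key for key in all_keys if key not in done_keys]
    batches = [pending[i:i + batch_size] for i in range(0, len(pending), batch_size)]
    return [
        batches[i:i + batches_per_wave]
        for i in range(0, len(batches), batches_per_wave)
    ]


keys = [f"k{i}" for i in range(100)]
pilot_done = set(keys[:34])  # e.g. the 34 pilot keys are already enriched
waves = plan_waves(keys, pilot_done, batch_size=5, batches_per_wave=8)
print(len(waves), [len(wave) for wave in waves])  # 2 [8, 6]
```

Re-running with a finished wave's keys added to done_keys yields only the remaining work, which is what makes an interrupted run safe to resume.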