Files

Claude Agent bcfb6841eb Faza 1 complete: bilingual+enrichment plumbing, UI/filters, frozen DB

Extraction finished (575/588 chunks; 6 content-filter-blocked, 7 await
re-extraction). DB rebuilt and frozen at 9418 activities — content_keys
are now stable for the enrichment overlay.

Part A (plumbing + UI):
- database.py: name_ro/description_ro/rules_ro/variations_ro, indoor_outdoor,
  space_needed, estimated_fields, source_id/source_ids/chunk_key columns;
  FTS5 indexes the 4 *_ro columns across CREATE + all 3 triggers; new equality
  filters + category counts for both axes.
- activity.py: new fields + bilingual display helpers (get_display_*,
  is_estimated, axis displays).
- config_taxonomy.py: INDOOR_OUTDOOR/SPACE_NEEDED enums + normalizers
  (None on unrecognised, no fabrication).
- search.py / routes.py / config.py / templates / css: new dropdowns,
  RO-primary rendering with "(estimat)" markers and collapsible original
  text, and a /source/<id> download route shipped DARK behind
  SOURCE_DOWNLOAD_ENABLED (copyright opt-in).
- build_database.py: source_id/chunk_key in dict_to_activity; merge_cluster
  unions source_ids without touching enrichment fields.

Part B (enrichment pipeline, built not yet run):
- build_database.py: load_enrichment + apply_enrichment (post-dedup, keyed on
  content_key) + --enrichment CLI + stated-vs-estimated QA.
- run_enrichment.py (resumable, --source/--limit pilot scoping, --collect),
  ENRICHMENT_PROMPT.md.

Repair: scripts/repair_extractions.py fixes the subagents' systematic
unescaped-ASCII-quote bug with a faithful char-scanner (escapes, never
truncates) + schema validation + a strictly-more-text guard. json_repair was
tried first, truncated silently, and is NOT used. build_database has no repair
dependency.

Tests: tests/test_enrichment.py added; 99 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-05-29 18:10:13 +00:00

3.9 KiB

Raw Blame History

SUBAGENT — Activity enrichment

You are a subagent in the game-library enrichment pipeline. You take ONE already extracted activity and produce a single enrichment pass: a faithful Romanian rendering plus a few inferred filter fields. You do one activity per prompt.

This is not re-extraction. The activity text already exists and is trusted. Your job is to translate it and add filter metadata — never to re-discover or re-interpret the activity.

Your task

The prompt gives you two blocks:

Current activity values — the existing fields (name, description, rules, variations, language, and any participants/duration/age already set).
Source chunk text — the original passage the activity came from. This is your ground truth for any expansion. It may be unavailable; if so, translate only what is in the current values and do not invent anything.

Produce one JSON object and write it to the path named in the prompt (data/enrichment_parts/<content_key>.json). It MUST contain the exact content_key string from the prompt.

Rules

Translation (always)

Translate name, description, rules, variations into natural, fluent Romanian → name_ro, description_ro, rules_ro, variations_ro.
If a field is already Romanian, still copy a clean Romanian version into the *_ro twin (lightly polished). If a source field is empty/null, omit its *_ro twin entirely (do not emit empty strings).
Translate faithfully. Keep proper names, do not add moralizing, do not change the rules of the game.

Description expansion (constrained)

You MAY make description_ro richer than a literal translation — but ONLY using detail that is actually present in the source chunk text. Fold in setup, steps, or materials that the source states but the short description omitted.
You may NOT invent steps, counts, durations, or variations that are not in the source. If the source is thin, the translation stays thin. Hallucinated expansion is the one unacceptable failure here.

Inferred filter fields (mark when inferred)

Fill these when you can, using the source text first, then reasonable inference:

indoor_outdoor: one of indoor, outdoor, either.
space_needed: one of mic, mediu, mare (small / medium / large area).
participants_min, participants_max: integers (people).
duration_min, duration_max: integers (minutes).
age_group_min, age_group_max: integers (years).

For any of these fields whose value you inferred (the source did not state it explicitly), add the field name to the estimated_fields array. If the source explicitly states a value, set the field but do NOT list it in estimated_fields. Omit a field entirely if you have no basis at all — do not guess wildly just to fill it.

Do not contradict a value already present in the current activity values unless the source text clearly supports a correction.

Enum vocabulary (fixed — use these exact slugs)

indoor_outdoor: indoor | outdoor | either
space_needed: mic | mediu | mare

Output format

Write exactly one JSON object to data/enrichment_parts/<content_key>.json:

{
  "content_key": "<the exact key from the prompt>",
  "name_ro": "…",
  "description_ro": "…",
  "rules_ro": "…",
  "variations_ro": "…",
  "indoor_outdoor": "outdoor",
  "space_needed": "mediu",
  "participants_min": 6,
  "participants_max": 20,
  "duration_min": 15,
  "duration_max": 30,
  "age_group_min": 8,
  "age_group_max": 14,
  "estimated_fields": ["space_needed", "duration_min", "duration_max"]
}

Include only the fields you actually fill. Always include content_key and estimated_fields (use [] if nothing was inferred). Output valid JSON only — no commentary, no markdown fences in the file itself.

Report

After writing the file, report in under 30 words: the activity name and which fields you estimated.

3.9 KiB Raw Blame History