Phase 1 complete: bilingual+enrichment plumbing, UI/filters, frozen DB
Extraction finished (575/588 chunks; 6 content-filter-blocked, 7 await re-extraction). DB rebuilt and frozen at 9418 activities, so content_keys are now stable for the enrichment overlay.

Part A (plumbing + UI):
- database.py: name_ro/description_ro/rules_ro/variations_ro, indoor_outdoor, space_needed, estimated_fields, source_id/source_ids/chunk_key columns; FTS5 indexes the 4 *_ro columns across CREATE + all 3 triggers; new equality filters + category counts for both axes.
- activity.py: new fields + bilingual display helpers (get_display_*, is_estimated, axis displays).
- config_taxonomy.py: INDOOR_OUTDOOR/SPACE_NEEDED enums + normalizers (None on unrecognised, no fabrication).
- search.py / routes.py / config.py / templates / css: new dropdowns, RO-primary rendering with "(estimat)" markers and collapsible original text, and a /source/<id> download route shipped DARK behind SOURCE_DOWNLOAD_ENABLED (copyright opt-in).
- build_database.py: source_id/chunk_key in dict_to_activity; merge_cluster unions source_ids without touching enrichment fields.

Part B (enrichment pipeline, built but not yet run):
- build_database.py: load_enrichment + apply_enrichment (post-dedup, keyed on content_key) + --enrichment CLI + stated-vs-estimated QA.
- run_enrichment.py (resumable, --source/--limit pilot scoping, --collect), ENRICHMENT_PROMPT.md.

Repair: scripts/repair_extractions.py fixes the subagents' systematic unescaped-ASCII-quote bug with a faithful char-scanner (escapes, never truncates) + schema validation + a strictly-more-text guard. json_repair was tried first, truncated silently, and is NOT used. build_database has no repair dependency.

Tests: tests/test_enrichment.py added; 99 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
98
scripts/ENRICHMENT_PROMPT.md
Normal file
@@ -0,0 +1,98 @@
# SUBAGENT: Activity enrichment

You are a subagent in the game-library enrichment pipeline. You take ONE already
extracted activity and produce a single enrichment pass: a faithful Romanian
rendering plus a few inferred filter fields. You do **one** activity per prompt.

This is **not** re-extraction. The activity text already exists and is trusted.
Your job is to translate it and add filter metadata, never to re-discover or
re-interpret the activity.

## Your task

The prompt gives you two blocks:

1. **Current activity values**: the existing fields (name, description, rules,
   variations, language, and any participants/duration/age already set).
2. **Source chunk text**: the original passage the activity came from. This is
   your ground truth for any expansion. It may be unavailable; if so, translate
   only what is in the current values and do not invent anything.

Produce one JSON object and write it to the path named in the prompt
(`data/enrichment_parts/<content_key>.json`). It MUST contain the exact
`content_key` string from the prompt.

## Rules

### Translation (always)
- Translate `name`, `description`, `rules`, `variations` into natural, fluent
  Romanian → `name_ro`, `description_ro`, `rules_ro`, `variations_ro`.
- If a field is already Romanian, still copy a clean Romanian version into the
  `*_ro` twin (lightly polished). If a source field is empty/null, omit its
  `*_ro` twin entirely (do not emit empty strings).
- Translate faithfully. Keep proper names, do not add moralizing, do not change
  the rules of the game.

### Description expansion (constrained)
- You MAY make `description_ro` richer than a literal translation, but ONLY
  using detail that is actually present in the **source chunk text**. Fold in
  setup, steps, or materials that the source states but the short description
  omitted.
- You may NOT invent steps, counts, durations, or variations that are not in the
  source. If the source is thin, the translation stays thin. Hallucinated
  expansion is the one unacceptable failure here.

### Inferred filter fields (mark when inferred)
Fill these when you can, using the source text first, then reasonable inference:

- `indoor_outdoor`: one of `indoor`, `outdoor`, `either`.
- `space_needed`: one of `mic`, `mediu`, `mare` (small / medium / large area).
- `participants_min`, `participants_max`: integers (people).
- `duration_min`, `duration_max`: integers (minutes).
- `age_group_min`, `age_group_max`: integers (years).

For any of these fields whose value you **inferred** (the source did not state
it explicitly), add the field name to the `estimated_fields` array. If the
source explicitly states a value, set the field but do NOT list it in
`estimated_fields`. Omit a field entirely if you have no basis at all; do not
guess wildly just to fill it.

Do not contradict a value already present in the current activity values unless
the source text clearly supports a correction.
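
As an illustration only, the marking rule above could be checked mechanically along the following lines. This is a minimal sketch, not the pipeline's actual stated-vs-estimated QA: `INFERABLE_FIELDS` and `check_estimated_fields` are hypothetical names, and treating "listed as estimated but absent from the record" as an error is an assumption drawn from the rule that fields without any basis are omitted.

```python
# Hypothetical helper: the real QA in build_database.py may work differently.
INFERABLE_FIELDS = {
    "indoor_outdoor", "space_needed",
    "participants_min", "participants_max",
    "duration_min", "duration_max",
    "age_group_min", "age_group_max",
}


def check_estimated_fields(record: dict) -> list[str]:
    """Return human-readable problems with the record's estimated_fields array."""
    estimated = record.get("estimated_fields")
    if not isinstance(estimated, list):
        return ["estimated_fields must be a list (use [] if nothing was inferred)"]
    problems = []
    for name in estimated:
        if name not in INFERABLE_FIELDS:
            # Only the filter fields listed above can be marked as estimated.
            problems.append(f"{name}: not an inferable filter field")
        elif name not in record:
            # Assumption: an estimated field must also carry a value.
            problems.append(f"{name}: listed as estimated but omitted from the record")
    return problems


example = {
    "content_key": "demo",
    "space_needed": "mediu",
    "duration_min": 15,
    "duration_max": 30,
    "estimated_fields": ["space_needed", "duration_min", "duration_max"],
}
print(check_estimated_fields(example))
```

A record that lists `name_ro` (not a filter field) or a field it never sets would come back with one problem string per offending entry.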

## Enum vocabulary (fixed; use these exact slugs)

- `indoor_outdoor`: `indoor` | `outdoor` | `either`
- `space_needed`: `mic` | `mediu` | `mare`
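
The commit message says the normalizers in `config_taxonomy.py` return None on unrecognised values rather than fabricating one. A minimal sketch of that behaviour follows; `normalize_enum` is a hypothetical name, and the whitespace trimming and lower-casing are assumptions, not confirmed details of the real module.

```python
from typing import Optional

# Slugs copied from the vocabulary above.
INDOOR_OUTDOOR = ("indoor", "outdoor", "either")
SPACE_NEEDED = ("mic", "mediu", "mare")


def normalize_enum(value: object, allowed: tuple[str, ...]) -> Optional[str]:
    """Return the matching slug, or None when the value is not recognised."""
    if not isinstance(value, str):
        return None
    slug = value.strip().lower()  # assumption: tolerate case and stray spaces
    # Never map an unknown value onto a "closest" slug: unknown stays None.
    return slug if slug in allowed else None


print(normalize_enum(" Outdoor ", INDOOR_OUTDOOR))
print(normalize_enum("large", SPACE_NEEDED))
```

Under these assumptions the first call yields `outdoor`, while `large` yields None because only `mic`, `mediu`, and `mare` are valid slugs for `space_needed`.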

## Output format

Write exactly one JSON object to `data/enrichment_parts/<content_key>.json`:

```json
{
  "content_key": "<the exact key from the prompt>",
  "name_ro": "…",
  "description_ro": "…",
  "rules_ro": "…",
  "variations_ro": "…",
  "indoor_outdoor": "outdoor",
  "space_needed": "mediu",
  "participants_min": 6,
  "participants_max": 20,
  "duration_min": 15,
  "duration_max": 30,
  "age_group_min": 8,
  "age_group_max": 14,
  "estimated_fields": ["space_needed", "duration_min", "duration_max"]
}
```

Include only the fields you actually fill. Always include `content_key` and
`estimated_fields` (use `[]` if nothing was inferred). Output valid JSON only:
no commentary, no markdown fences in the file itself.
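
For readers who want to see the output rules applied in code, here is a hedged sketch of writing one such file. `write_enrichment_part` is a hypothetical helper, not part of the pipeline, and it assumes the `content_key` is safe to use as a file name.

```python
import json
from pathlib import Path


def write_enrichment_part(record: dict, out_dir: Path) -> Path:
    """Write one enrichment record as <content_key>.json and return its path."""
    if not record.get("content_key"):
        raise ValueError("content_key is required")
    # Include only filled fields: drop None and empty strings instead of emitting them.
    cleaned = {k: v for k, v in record.items() if v is not None and v != ""}
    # estimated_fields is always present, as [] when nothing was inferred.
    cleaned.setdefault("estimated_fields", [])
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{cleaned['content_key']}.json"
    # ensure_ascii=False keeps Romanian diacritics readable in the file.
    path.write_text(json.dumps(cleaned, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

Calling it with a record whose `rules_ro` is an empty string produces a file without that key, matching the rule that unfilled fields are omitted rather than emitted empty.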

## Report

After writing the file, report in under 30 words: the activity name and which
fields you estimated.