3d9f266696182188d895f86dbb0e1c97e5d14b3e
61 chunks × LLM subagent extraction yielded 1780 raw activities;
build_database dedup + hallucination check yielded 1732 in DB.

Pilot metrics vs plan acceptance thresholds:
- hallucinated drops: 19/1780 = 1.07% (threshold ≤ 2%)
- schema-rejected files: 0/61 (threshold ≥ 0.9 valid)
- chunks needing re-extract: 13/61 (paraphrased excerpts 75-90/100)
- % with rules: 99.9%
- extraction_confidence high: 1712/1732 = 98.8%

OCR decision: NOT NEEDED. The Cartea_Mare scanned-PDF candidate
extracted 151 pages / 38k words of real text via pdfplumber alone.

Pilot files:
- 1000 Fantastic Scout Games (EN, 278pg, 18 chunks → 946 activities)
- dragon.sleepdeprived.ca/games mirror (EN, 498pg, 31 chunks → 531)
- Cartea Mare a Jocurilor (RO, 151pg, 10 chunks → 284)
- Activităţi şi jocuri ... .doc (RO, 7pg, 1 chunk → 19, needs_review)
- Amazing Race templates zip (graphics only, 0 activities, expected)

The old activities.db was backed up to .bak before atomic swap.
tests/ still green (71 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
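
The pilot figures above can be re-derived with a few lines of arithmetic. The following is a minimal sketch, not the repository's own check: the variable and function names are hypothetical, the counts are copied from the message, and two readings are assumptions. First, "threshold ≥ 0.9 valid" is taken to mean the fraction of files that passed schema validation. Second, every drop not attributed to the hallucination check is assumed to come from dedup.

```python
# Counts copied from the pilot commit message; all names here are hypothetical.
RAW_ACTIVITIES = 1780        # activities extracted across 61 chunks
DB_ACTIVITIES = 1732         # activities left after dedup + hallucination check
HALLUCINATED_DROPS = 19
SCHEMA_REJECTED_FILES = 0
TOTAL_FILES = 61
HIGH_CONFIDENCE = 1712

# Thresholds as stated in the message.
MAX_HALLUCINATION_RATE = 0.02
MIN_VALID_FILE_FRACTION = 0.9


def pilot_metrics():
    """Recompute the ratios quoted in the commit message."""
    return {
        "hallucination_rate": HALLUCINATED_DROPS / RAW_ACTIVITIES,
        # Assumption: "valid" means files that were not schema-rejected.
        "valid_file_fraction": (TOTAL_FILES - SCHEMA_REJECTED_FILES) / TOTAL_FILES,
        "high_confidence_rate": HIGH_CONFIDENCE / DB_ACTIVITIES,
        # Assumption: all remaining drops are attributable to dedup.
        "dedup_drops": RAW_ACTIVITIES - DB_ACTIVITIES - HALLUCINATED_DROPS,
    }


def passes_thresholds(metrics):
    """True when both stated acceptance thresholds are met."""
    return (
        metrics["hallucination_rate"] <= MAX_HALLUCINATION_RATE
        and metrics["valid_file_fraction"] >= MIN_VALID_FILE_FRACTION
    )


if __name__ == "__main__":
    m = pilot_metrics()
    print(f"hallucination rate: {m['hallucination_rate']:.2%}")
    print(f"valid file fraction: {m['valid_file_fraction']:.2f}")
    print(f"high confidence: {m['high_confidence_rate']:.1%}")
    print(f"implied dedup drops: {m['dedup_drops']}")
    print(f"thresholds met: {passes_thresholds(m)}")
```

Under these assumptions the per-file activity counts (946 + 531 + 284 + 19 + 0) also sum to the 1780 raw activities reported.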
Languages
- Python: 79.5%
- HTML: 11.3%
- CSS: 4.1%
- JavaScript: 3%
- Shell: 1.6%
- Other: 0.5%