game-library

Author	SHA1	Message	Date
Claude Agent	f7a37f91ec	Headless cron enrichment system + progress checkpoint at 32%. OS cron fires enrich_wave.sh twice nightly (post 23:00 UTC reset); each wave caps at ~700 keys (~75% window) via enrichment_wave.py --prepare. Fully headless: one claude -p per batch via xargs, flock-guarded, idempotent. DB updated to 9541 activities; .gitignore covers enrichment intermediates. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 21:26:35 +00:00
Claude Agent	d6971e47f8	Prevent + net the unescaped-quote bug in the durable prompts/pipeline. The escape-ASCII-quote rule previously lived only in ephemeral Agent-call strings. Bake it into the durable artifacts so the next session doesn't re-derive it: - SUBAGENT_PROMPT.md + ENRICHMENT_PROMPT.md: explicit rule to escape any ASCII " inside JSON string values (Romanian „cuvânt" is the trap). - run_enrichment.py collect_enrichment: repair malformed parts with escape_stray_quotes instead of dropping them. The enrichment path previously had no repair net (bad parts were silently dropped, losing that activity's enrichment). Extraction already had one; now both do. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 18:16:04 +00:00
Claude Agent	bcfb6841eb	Faza 1 complete: bilingual+enrichment plumbing, UI/filters, frozen DB Extraction finished (575/588 chunks; 6 content-filter-blocked, 7 await re-extraction). DB rebuilt and frozen at 9418 activities — content_keys are now stable for the enrichment overlay. Part A (plumbing + UI): - database.py: name_ro/description_ro/rules_ro/variations_ro, indoor_outdoor, space_needed, estimated_fields, source_id/source_ids/chunk_key columns; FTS5 indexes the 4 _ro columns across CREATE + all 3 triggers; new equality filters + category counts for both axes. - activity.py: new fields + bilingual display helpers (get_display_, is_estimated, axis displays). - config_taxonomy.py: INDOOR_OUTDOOR/SPACE_NEEDED enums + normalizers (None on unrecognised, no fabrication). - search.py / routes.py / config.py / templates / css: new dropdowns, RO-primary rendering with "(estimat)" markers and collapsible original text, and a /source/<id> download route shipped DARK behind SOURCE_DOWNLOAD_ENABLED (copyright opt-in). - build_database.py: source_id/chunk_key in dict_to_activity; merge_cluster unions source_ids without touching enrichment fields. Part B (enrichment pipeline, built not yet run): - build_database.py: load_enrichment + apply_enrichment (post-dedup, keyed on content_key) + --enrichment CLI + stated-vs-estimated QA. - run_enrichment.py (resumable, --source/--limit pilot scoping, --collect), ENRICHMENT_PROMPT.md. Repair: scripts/repair_extractions.py fixes the subagents' systematic unescaped-ASCII-quote bug with a faithful char-scanner (escapes, never truncates) + schema validation + a strictly-more-text guard. json_repair was tried first, truncated silently, and is NOT used. build_database has no repair dependency. Tests: tests/test_enrichment.py added; 99 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 18:10:13 +00:00
Claude Agent	46d9592a55	HANDOFF for Faza 1 resumption (10.9% done, switch to Sonnet). 64/588 chunks are extracted so far (~1949 activities), but a fresh session should switch the subagent model from Opus to Sonnet: the task is structured JSON extraction with a fixed schema, no complex reasoning is needed, and Sonnet's 200K context easily fits the ~25k-token prompt and ~20k-token output per chunk. The handoff document captures the exact resume procedure: pending-chunk discovery, the Agent call template with model:"sonnet", and the finalization steps (validate -> build_database -> needs_review bulk merge). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 19:32:44 +00:00
Claude Agent	09999ccd40	Faza 0 follow-ups: re-extract 13 chunks, resolve 377 needs_review - Re-extracted the 13 chunks with paraphrased source_excerpts (root cause: original excerpts straddled --- PAGE N --- markers which the rapidfuzz partial_ratio scored 75-90/100). Re-extraction used verbatim within-page quotes; all now score 100/100. - Hallucinated drops: 19 -> 0. - Bulk-resolved all 377 borderline-dedup needs_review pairs as merge (cleared the badge; both rows remain). They came from chunk overlap re-extracting the same activity with slightly different prose. - Final DB: 1751 activities (was 1732). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 07:59:36 +00:00
Claude Agent	3d9f266696	Faza 0 pilot: rebuild activities.db from 5-file extraction 61 chunks × LLM subagent extraction yielded 1780 raw activities; build_database dedup + hallucination check yielded 1732 in DB. Pilot metrics vs plan acceptance thresholds: - hallucinated drops : 19/1780 = 1.07% (threshold ≤ 2%) - schema-rejected files : 0/61 (threshold ≥ 0.9 valid) - chunks needing re-extract: 13/61 (paraphrased excerpts 75-90/100) - % with rules : 99.9% - extraction_confidence high: 1712/1732 = 98.8% OCR decision: NOT NEEDED. The Cartea_Mare scanned-PDF candidate extracted 151 pages / 38k words of real text via pdfplumber alone. Pilot files: - 1000 Fantastic Scout Games (EN, 278pg, 18 chunks → 946 activities) - dragon.sleepdeprived.ca/games mirror (EN, 498pg, 31 chunks → 531) - Cartea Mare a Jocurilor (RO, 151pg, 10 chunks → 284) - Activităţi şi jocuri ... .doc (RO, 7pg, 1 chunk → 19, needs_review) - Amazing Race templates zip (graphics only, 0 activities — expected) The old activities.db was backed up to .bak before atomic swap. tests/ still green (71 passed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 07:43:42 +00:00
Claude Agent	66ae831c36	Rebuild extraction pipeline infrastructure (Faza 0 prep) Implements the approved plan to replace the broken regex/index-master extraction with an LLM-subagent pipeline. Four parallel lanes: Lane A — scripts/extract_common.py (PDF/docx/doc/pptx/html/zip, no max_pages truncation), normalize_sources.py, chunk_sources.py (~20pg chunks + overlap, manifest registry), activity_schema.json. Lane B — app/config_taxonomy.py (16 fixed category slugs), schema rebuilt from scratch in app/models/ with content_type, language, source_files, source_excerpt, normalized_name, extraction_confidence, needs_review; FTS5 + 3 triggers extended with materials_list and skills_developed. Lane C — build_database.py (--rebuild, atomic swap, schema + fuzzy source_excerpt validation, dedup with needs_review band), validate_extractions.py, review_queue.py, new run_extraction.py orchestrator, SUBAGENT_PROMPT.md. Lane D — search.py content_type/language filters (default search excludes non-game content), E7 schema-compat audit; fixed a NULL keywords AttributeError in _boost_search_relevance. Removes 8 orphaned/dead scripts and app/services/parser.py + indexer.py. Adds tests/ (70 passing, 1 skipped — libreoffice absent). Note: Lane D made one additive edit to app/models/database.py (_update_category_counts) to surface content_type/language in get_filter_options, outside its nominal lane boundary but after Lane B completed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 17:43:38 +00:00
Claude Agent	e0080edf85	gitignore	2026-05-19 17:27:29 +00:00
Claude Agent	c68dda6c87	Preflight: untrack generated data, fix dangerous .gitignore patterns. Per plan E2/E3: ignore regenerated extraction data (sources, chunks, extracted, carti-camp-jocuri) and replace the test.py / debug.py / temp.py / test.db patterns that would silently hide the test suite. Keep activities.db, the hand-written index, golden set and test fixtures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 17:25:23 +00:00
Marius Mutu	a19ddf0b71	Refactor extraction system and reorganize project structure - Remove obsolete documentation files (DEPLOYMENT.md, PLAN_IMPLEMENTARE_S8_DETALIAT.md, README.md) - Add comprehensive extraction pipeline with multiple format support (PDF, HTML, text) - Implement Claude-based activity extraction with structured templates - Update dependencies and Docker configuration - Reorganize scripts directory with modular extraction components - Move example documentation to appropriate location 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-11 23:32:37 +03:00
Marius Mutu	1b6b7e06ad	Add strategic implementation plan for S8 Hybrid extraction strategy - Complete detailed plan for automated activity extraction from 2000+ files - Hybrid approach: Python scripts for HTML/TXT/MD + Claude for PDF/DOC - Includes full Python extractors with error handling and batch processing - Template for Claude-assisted PDF/DOC processing (high-value files) - Orchestrator script for complete automation workflow - Estimated result: 2000+ activities indexed in 8 hours total work Key components: - HTML extractor for 1876 files (BeautifulSoup + pattern recognition) - Text/MD extractor for 45 files (regex patterns + markdown parsing) - Unified processor with progress tracking and batch saving - Claude extraction templates with JSON import system - Complete automation for 90% of files, manual assist for 10% high-value 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-11 01:45:47 +03:00
Marius Mutu	4f83b8e73c	Complete v2.0 transformation: Production-ready Flask application Major Changes: - Migrated from prototype to production architecture - Implemented modular Flask app with models/services/web layers - Added Docker containerization with docker-compose - Switched to Pipenv for dependency management - Built advanced parser extracting 63 real activities from INDEX_MASTER - Implemented SQLite FTS5 full-text search - Created minimalist, responsive web interface - Added comprehensive documentation and deployment guides Technical Improvements: - Clean separation of concerns (models, services, web) - Enhanced database schema with FTS5 indexing - Dynamic filters populated from real data - Production-ready configuration management - Security best practices implementation - Health monitoring and API endpoints Removed Legacy Files: - Old src/ directory structure - Static requirements.txt (replaced by Pipfile) - Test and debug files - Temporary cache files Current Status: - 63 activities indexed across 8 categories - Full-text search operational - Docker deployment ready - Production documentation complete 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-11 00:23:47 +03:00
Marius Mutu	ed0fc0d010	gitignore	2025-09-10 23:21:09 +03:00
Marius Mutu	f47a31812e	Clean up README	2025-09-10 23:19:45 +03:00
Marius Mutu	7cb308d03f	Add database documentation and setup script - Add comprehensive DATABASE_SCHEMA.md with complete SQLite schemas - Document all 3 databases: activities.db, game_library.db, test_activities.db - Include recreation methods, examples, and troubleshooting - Add scripts/create_databases.py for automated database setup - Move README.md to project root for better visibility - Ensure *.db files excluded via .gitignore are fully documented 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-10 00:45:06 +03:00
Marius Mutu	fd87ebca03	Initial commit: Organize project structure - Create organized directory structure (src/, docs/, data/, static/, templates/) - Add comprehensive .gitignore for Python projects - Move Python source files to src/ - Move documentation files to docs/ with project/ and user/ subdirectories - Move database files to data/ - Update all database path references in Python code - Maintain Flask static/ and templates/ directories 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-10 00:40:39 +03:00

16 Commits
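
Commits d6971e47f8 and bcfb6841eb describe repairing unescaped ASCII quotes inside JSON string values with a character scanner that escapes but never truncates, plus a guard that the repaired text is not shorter than the input. The repository's own escape_stray_quotes is not shown in this log, so the following is only a minimal sketch of one way such a scanner could work. The function names and the "closing quote is followed by , } ] : or end of input" heuristic are assumptions, not the project's code.

```python
import json

# Characters that may legally follow a closing quote (after optional whitespace).
_AFTER_CLOSE = ",}]:"


def escape_stray_quotes_sketch(text: str) -> str:
    """Hypothetical scanner: escape ASCII quotes that appear inside a JSON
    string value but do not look like the string's closing quote."""
    out = []
    in_string = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if not in_string:
            if ch == '"':
                in_string = True
            out.append(ch)
            i += 1
        elif ch == "\\" and i + 1 < n:
            # Already-escaped character: copy the pair through untouched.
            out.append(text[i:i + 2])
            i += 2
        elif ch == '"':
            # Look past whitespace to see what follows this quote.
            j = i + 1
            while j < n and text[j] in " \t\r\n":
                j += 1
            if j == n or text[j] in _AFTER_CLOSE:
                in_string = False      # plausible closing quote
                out.append(ch)
            else:
                out.append('\\"')      # stray quote: escape it, drop nothing
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def repair_or_none(raw: str):
    """Return parsed JSON after repair, or None if the repair is not trusted."""
    fixed = escape_stray_quotes_sketch(raw)
    if len(fixed) < len(raw):          # text-preservation guard (assumed form)
        return None
    try:
        return json.loads(fixed)
    except json.JSONDecodeError:
        return None


broken = '{"name": "Jocul „cuvânt" magic", "n": 1}'
print(repair_or_none(broken))
```

The heuristic is deliberately conservative and can misjudge a stray quote that happens to be followed by a comma or colon, which is why a parse check and a length guard sit behind it rather than trusting the scanner alone.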