game-library

Author	SHA1	Message	Date
Claude Agent	3d9f266696	Phase 0 pilot: rebuild activities.db from 5-file extraction. 61 chunks × LLM subagent extraction yielded 1780 raw activities; build_database dedup + hallucination check yielded 1732 in DB. Pilot metrics vs plan acceptance thresholds: - hallucinated drops: 19/1780 = 1.07% (threshold ≤ 2%) - schema-rejected files: 0/61 (threshold ≥ 0.9 valid) - chunks needing re-extract: 13/61 (paraphrased excerpts 75-90/100) - % with rules: 99.9% - extraction_confidence high: 1712/1732 = 98.8% OCR decision: NOT NEEDED. The Cartea_Mare scanned-PDF candidate extracted 151 pages / 38k words of real text via pdfplumber alone. Pilot files: - 1000 Fantastic Scout Games (EN, 278pg, 18 chunks → 946 activities) - dragon.sleepdeprived.ca/games mirror (EN, 498pg, 31 chunks → 531) - Cartea Mare a Jocurilor (RO, 151pg, 10 chunks → 284) - Activităţi şi jocuri ... .doc (RO, 7pg, 1 chunk → 19, needs_review) - Amazing Race templates zip (graphics only, 0 activities, as expected) The old activities.db was backed up to .bak before the atomic swap. tests/ still green (71 passed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 07:43:42 +00:00
Claude Agent	66ae831c36	Rebuild extraction pipeline infrastructure (Phase 0 prep). Implements the approved plan to replace the broken regex/index-master extraction with an LLM-subagent pipeline. Four parallel lanes: Lane A: scripts/extract_common.py (PDF/docx/doc/pptx/html/zip, no max_pages truncation), normalize_sources.py, chunk_sources.py (~20pg chunks + overlap, manifest registry), activity_schema.json. Lane B: app/config_taxonomy.py (16 fixed category slugs), schema rebuilt from scratch in app/models/ with content_type, language, source_files, source_excerpt, normalized_name, extraction_confidence, needs_review; FTS5 + 3 triggers extended with materials_list and skills_developed. Lane C: build_database.py (--rebuild, atomic swap, schema + fuzzy source_excerpt validation, dedup with needs_review band), validate_extractions.py, review_queue.py, new run_extraction.py orchestrator, SUBAGENT_PROMPT.md. Lane D: search.py content_type/language filters (default search excludes non-game content), E7 schema-compat audit; fixed a NULL keywords AttributeError in _boost_search_relevance. Removes 8 orphaned/dead scripts and app/services/parser.py + indexer.py. Adds tests/ (70 passing, 1 skipped because libreoffice is absent). Note: Lane D made one additive edit to app/models/database.py (_update_category_counts) to surface content_type/language in get_filter_options, outside its nominal lane boundary but after Lane B completed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 17:43:38 +00:00
Claude Agent	e0080edf85	gitignore	2026-05-19 17:27:29 +00:00
Claude Agent	c68dda6c87	Preflight: untrack generated data, fix dangerous .gitignore patterns. Per plan E2/E3: ignore regenerated extraction data (sources, chunks, extracted, carti-camp-jocuri) and replace the test.py / debug.py / temp.py / test.db patterns that would silently hide the test suite. Keep activities.db, the hand-written index, golden set and test fixtures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 17:25:23 +00:00
Marius Mutu	a19ddf0b71	Refactor extraction system and reorganize project structure - Remove obsolete documentation files (DEPLOYMENT.md, PLAN_IMPLEMENTARE_S8_DETALIAT.md, README.md) - Add comprehensive extraction pipeline with multiple format support (PDF, HTML, text) - Implement Claude-based activity extraction with structured templates - Update dependencies and Docker configuration - Reorganize scripts directory with modular extraction components - Move example documentation to appropriate location 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-11 23:32:37 +03:00
Marius Mutu	1b6b7e06ad	Add strategic implementation plan for S8 Hybrid extraction strategy - Complete detailed plan for automated activity extraction from 2000+ files - Hybrid approach: Python scripts for HTML/TXT/MD + Claude for PDF/DOC - Includes full Python extractors with error handling and batch processing - Template for Claude-assisted PDF/DOC processing (high-value files) - Orchestrator script for complete automation workflow - Estimated result: 2000+ activities indexed in 8 hours total work Key components: - HTML extractor for 1876 files (BeautifulSoup + pattern recognition) - Text/MD extractor for 45 files (regex patterns + markdown parsing) - Unified processor with progress tracking and batch saving - Claude extraction templates with JSON import system - Complete automation for 90% of files, manual assist for 10% high-value 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-11 01:45:47 +03:00
Marius Mutu	4f83b8e73c	Complete v2.0 transformation: Production-ready Flask application Major Changes: - Migrated from prototype to production architecture - Implemented modular Flask app with models/services/web layers - Added Docker containerization with docker-compose - Switched to Pipenv for dependency management - Built advanced parser extracting 63 real activities from INDEX_MASTER - Implemented SQLite FTS5 full-text search - Created minimalist, responsive web interface - Added comprehensive documentation and deployment guides Technical Improvements: - Clean separation of concerns (models, services, web) - Enhanced database schema with FTS5 indexing - Dynamic filters populated from real data - Production-ready configuration management - Security best practices implementation - Health monitoring and API endpoints Removed Legacy Files: - Old src/ directory structure - Static requirements.txt (replaced by Pipfile) - Test and debug files - Temporary cache files Current Status: - 63 activities indexed across 8 categories - Full-text search operational - Docker deployment ready - Production documentation complete 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-11 00:23:47 +03:00
Marius Mutu	ed0fc0d010	gitignore	2025-09-10 23:21:09 +03:00
Marius Mutu	f47a31812e	README cleanup	2025-09-10 23:19:45 +03:00
Marius Mutu	7cb308d03f	Add database documentation and setup script - Add comprehensive DATABASE_SCHEMA.md with complete SQLite schemas - Document all 3 databases: activities.db, game_library.db, test_activities.db - Include recreation methods, examples, and troubleshooting - Add scripts/create_databases.py for automated database setup - Move README.md to project root for better visibility - Ensure *.db files excluded via .gitignore are fully documented 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-10 00:45:06 +03:00
Marius Mutu	fd87ebca03	Initial commit: Organize project structure - Create organized directory structure (src/, docs/, data/, static/, templates/) - Add comprehensive .gitignore for Python projects - Move Python source files to src/ - Move documentation files to docs/ with project/ and user/ subdirectories - Move database files to data/ - Update all database path references in Python code - Maintain Flask static/ and templates/ directories 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-09-10 00:40:39 +03:00

11 Commits
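
The pilot figures quoted in commit 3d9f266696 can be re-derived with a few lines of arithmetic. The sketch below is not part of the repository: the constant and function names are hypothetical, the counts are copied from that commit message, and the only thresholds checked are the two the message states (hallucinated drops ≤ 2%, schema-valid files ≥ 0.9).

```python
# Counts copied from the 3d9f266696 commit message; all names are illustrative.
RAW_ACTIVITIES = 1780      # activities produced by the subagent extraction
HALLUCINATED_DROPS = 19    # dropped by the hallucination check
ACTIVITIES_IN_DB = 1732    # rows left after dedup + hallucination check
HIGH_CONFIDENCE = 1712     # rows with extraction_confidence == "high"
CHUNKS = 61
SCHEMA_REJECTED = 0

# Per-source activity counts as listed under "Pilot files".
PER_SOURCE = {
    "1000 Fantastic Scout Games": 946,
    "dragon.sleepdeprived.ca/games mirror": 531,
    "Cartea Mare a Jocurilor": 284,
    "Activităţi şi jocuri ... .doc": 19,
    "Amazing Race templates zip": 0,
}


def pilot_metrics() -> dict:
    """Return the pilot ratios as fractions in [0, 1]."""
    return {
        "hallucinated_rate": HALLUCINATED_DROPS / RAW_ACTIVITIES,
        "schema_valid_rate": (CHUNKS - SCHEMA_REJECTED) / CHUNKS,
        "high_confidence_rate": HIGH_CONFIDENCE / ACTIVITIES_IN_DB,
    }


def passes_thresholds(metrics: dict) -> bool:
    """Check the two acceptance thresholds stated in the commit message."""
    return (
        metrics["hallucinated_rate"] <= 0.02
        and metrics["schema_valid_rate"] >= 0.9
    )


if __name__ == "__main__":
    m = pilot_metrics()
    print(f"hallucinated drops: {m['hallucinated_rate']:.2%}")
    print(f"high confidence:    {m['high_confidence_rate']:.1%}")
    print(f"per-source total:   {sum(PER_SOURCE.values())}")
    print(f"thresholds met:     {passes_thresholds(m)}")
```

The per-source counts sum to the 1780 raw activities, and the two percentages round to the 1.07% and 98.8% reported in the commit message.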