Commit Graph

3 Commits

Author SHA1 Message Date
Claude Agent
3d9f266696 Faza 0 (Phase 0) pilot: rebuild activities.db from a 5-file extraction
LLM subagent extraction over 61 chunks yielded 1780 raw activities;
build_database deduplication plus the hallucination check left 1732 in the DB.
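A minimal sketch of what a "dedup + hallucination check" step can look like, assuming each extracted activity carries a `title` and a verbatim `source_excerpt` that should appear in its chunk. The names `excerpt_score`, `filter_activities`, and the 75/100 drop threshold are hypothetical, not taken from build_database; the score uses the standard-library `difflib` rather than whatever matcher the project actually uses.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so PDF line breaks don't affect matching.
    return re.sub(r"\s+", " ", text.lower()).strip()


def excerpt_score(excerpt: str, chunk: str) -> float:
    """Return 0-100: how well `excerpt` is supported by `chunk` (100 = verbatim)."""
    ex, ch = normalize(excerpt), normalize(chunk)
    if not ex:
        return 0.0
    if ex in ch:
        return 100.0
    ex_words, ch_words = ex.split(), ch.split()
    n = len(ex_words)
    best = 0.0
    # Slide a window the length of the excerpt over the chunk and keep the best match.
    for i in range(max(1, len(ch_words) - n + 1)):
        window = " ".join(ch_words[i:i + n])
        best = max(best, SequenceMatcher(None, ex, window).ratio())
    return round(best * 100, 1)


def filter_activities(activities, chunk_text, drop_below=75.0):
    """Drop unsupported activities (likely hallucinated), then dedup by title.

    `drop_below` is a hypothetical threshold, chosen only for illustration.
    """
    kept, dropped, seen = [], [], set()
    for act in activities:
        if excerpt_score(act["source_excerpt"], chunk_text) < drop_below:
            dropped.append(act)
            continue
        key = normalize(act["title"])
        if key in seen:
            continue  # duplicate title: silently skipped, not counted as a drop
        seen.add(key)
        kept.append(act)
    return kept, dropped
```

Duplicates and hallucinations are tracked separately here so that the hallucination rate can be reported on its own, as the metrics below do.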

Pilot metrics vs plan acceptance thresholds:
- hallucinated drops          : 19/1780 = 1.07%  (threshold: ≤ 2%)
- schema-rejected files       : 0/61             (threshold: ≥ 90% valid)
- chunks needing re-extraction: 13/61 (paraphrased excerpts 75-90/100)
- activities with rules       : 99.9%
- extraction_confidence=high  : 1712/1732 = 98.8%
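The acceptance check above is simple arithmetic on a handful of counts. A sketch of it, assuming the counts are available as plain integers; `PilotCounts`, `check_pilot`, and the field names are hypothetical, while the default thresholds (≤ 2% hallucinated, ≥ 0.9 valid files) and the sample numbers come from this commit message.

```python
from dataclasses import dataclass


@dataclass
class PilotCounts:
    raw_activities: int   # activities returned by the LLM extraction
    hallucinated: int     # dropped by the hallucination check
    chunk_files: int      # per-chunk output files
    schema_rejected: int  # files that failed schema validation
    in_db: int            # activities written to activities.db
    high_confidence: int  # rows with extraction_confidence == "high"


def check_pilot(c: PilotCounts, max_halluc: float = 0.02, min_valid: float = 0.9) -> dict:
    halluc_rate = c.hallucinated / c.raw_activities
    valid_rate = (c.chunk_files - c.schema_rejected) / c.chunk_files
    return {
        "hallucinated_pct": round(100 * halluc_rate, 2),
        "valid_rate": round(valid_rate, 3),
        "high_conf_pct": round(100 * c.high_confidence / c.in_db, 1),
        "passes": halluc_rate <= max_halluc and valid_rate >= min_valid,
    }


pilot = PilotCounts(1780, 19, 61, 0, 1732, 1712)
print(check_pilot(pilot))
# {'hallucinated_pct': 1.07, 'valid_rate': 1.0, 'high_conf_pct': 98.8, 'passes': True}
```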

OCR decision: NOT NEEDED. The Cartea_Mare scanned-PDF candidate
extracted 151 pages / 38k words of real text via pdfplumber alone.
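One way to make the OCR decision repeatable is a words-per-page heuristic over pdfplumber's text layer (`pdfplumber.open`, `page.extract_text()`). The function names and both thresholds below are hypothetical; the commit does not say how the decision was made beyond "pdfplumber alone".

```python
def page_texts(pdf_path: str) -> list:
    # Imported lazily so the heuristic below can be used without pdfplumber installed.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return None for pages with no text layer.
        return [page.extract_text() or "" for page in pdf.pages]


def needs_ocr(texts: list, min_words_per_page: int = 20, min_text_pages: float = 0.8) -> bool:
    """Hypothetical rule: OCR only if too few pages carry a real text layer."""
    if not texts:
        return True
    text_pages = sum(1 for t in texts if len(t.split()) >= min_words_per_page)
    return text_pages / len(texts) < min_text_pages
```

For a file like Cartea_Mare (151 pages, roughly 250 words per page), `needs_ocr(page_texts(path))` would return `False`.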

Pilot files:
- 1000 Fantastic Scout Games (EN, 278pg, 18 chunks → 946 activities)
- dragon.sleepdeprived.ca/games mirror (EN, 498pg, 31 chunks → 531)
- Cartea Mare a Jocurilor (RO, 151pg, 10 chunks → 284)
- Activităţi şi jocuri ... .doc (RO, 7pg, 1 chunk → 19, needs_review)
- Amazing Race templates zip (graphics only, 0 activities — expected)

The old activities.db was backed up to .bak before the atomic swap.
tests/ still green (71 passed).
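A sketch of a backup-then-atomic-swap, assuming the new database is built as a separate file in the same directory (`os.replace` is only atomic within one filesystem) and that no connection holds the live DB open. The function name and the `activities.db.bak` naming are assumptions; the commit only says ".bak".

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def swap_in_new_db(new_db: Path, live_db: Path) -> Optional[Path]:
    """Back up the live DB (if any), then atomically replace it with new_db."""
    backup = None
    if live_db.exists():
        # Hypothetical naming: activities.db -> activities.db.bak
        backup = live_db.with_name(live_db.name + ".bak")
        shutil.copy2(live_db, backup)  # copy keeps the live file in place until the swap
    os.replace(new_db, live_db)  # single rename: readers see either the old or the new file
    return backup
```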

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 07:43:42 +00:00
a19ddf0b71 Refactor extraction system and reorganize project structure
- Remove obsolete documentation files (DEPLOYMENT.md, PLAN_IMPLEMENTARE_S8_DETALIAT.md, README.md)
- Add comprehensive extraction pipeline with multiple format support (PDF, HTML, text)
- Implement Claude-based activity extraction with structured templates
- Update dependencies and Docker configuration
- Reorganize scripts directory with modular extraction components
- Move example documentation to appropriate location

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-11 23:32:37 +03:00
4f83b8e73c Complete v2.0 transformation: Production-ready Flask application
Major Changes:
- Migrated from prototype to production architecture
- Implemented modular Flask app with models/services/web layers
- Added Docker containerization with docker-compose
- Switched to Pipenv for dependency management
- Built advanced parser extracting 63 real activities from INDEX_MASTER
- Implemented SQLite FTS5 full-text search
- Created minimalist, responsive web interface
- Added comprehensive documentation and deployment guides
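A minimal sketch of SQLite FTS5 full-text search from Python's standard `sqlite3` module, assuming the bundled SQLite was compiled with FTS5 (true of most builds). The table name `activities_fts` and its columns are hypothetical, not the project's actual schema.

```python
import sqlite3


def build_index(conn: sqlite3.Connection, activities) -> None:
    # FTS5 virtual table: every column is tokenized and searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS activities_fts "
        "USING fts5(title, category, rules)"
    )
    conn.executemany(
        "INSERT INTO activities_fts (title, category, rules) VALUES (?, ?, ?)",
        activities,
    )


def search(conn: sqlite3.Connection, query: str, limit: int = 10):
    # MATCH runs the full-text query; FTS5's built-in `rank` orders by relevance (bm25).
    return conn.execute(
        "SELECT title, category FROM activities_fts "
        "WHERE activities_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_index(conn, [
    ("Capture the Flag", "team", "Two teams hide a flag."),
    ("Dragon Tag", "tag", "Players hold shoulders to form a dragon."),
    ("Silent Ball", "quiet", "Pass the ball without speaking."),
])
print(search(conn, "flag"))
```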

Technical Improvements:
- Clean separation of concerns (models, services, web)
- Enhanced database schema with FTS5 indexing
- Dynamic filters populated from real data
- Production-ready configuration management
- Security best practices implementation
- Health monitoring and API endpoints

Removed Legacy Files:
- Old src/ directory structure
- Static requirements.txt (replaced by Pipfile)
- Test and debug files
- Temporary cache files

Current Status:
- 63 activities indexed across 8 categories
- Full-text search operational
- Docker deployment ready
- Production documentation complete

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-11 00:23:47 +03:00