feat(ocr): Add docTR OCR engine with metrics infrastructure

Add docTR as primary OCR engine with 2-tier sequential processing, OCR metrics tracking, and simplified engine selection.

Features:
- docTR OCR engine with light+medium preprocessing tiers
- doctr_plus mode with early exit optimization (~65% fast path)
- OCR metrics dashboard with per-engine statistics
- User OCR preference persistence
- Parallel worker pool for OCR processing
- Cross-validation for extraction quality

Engine options: tesseract, doctr, doctr_plus (recommended), paddleocr

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
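
The 2-tier sequential processing with early exit mentioned in the commit message can be pictured roughly as follows. This is a minimal sketch, not the code from this commit: `OCRResult`, `run_tiers`, the `ocr` callable, and the 0.85 confidence threshold are all hypothetical names and values chosen for illustration, and no docTR API is called.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence, Tuple


@dataclass
class OCRResult:
    """Hypothetical container for one OCR attempt."""
    text: str
    confidence: float  # assumed to be in the range 0.0 to 1.0
    tier: str          # name of the preprocessing tier that produced it


def run_tiers(
    image: Any,
    tiers: Sequence[Tuple[str, Callable[[Any], Any]]],
    ocr: Callable[[Any], Tuple[str, float]],
    threshold: float = 0.85,  # assumed cutoff, not taken from the commit
) -> OCRResult:
    """Try each preprocessing tier in order and stop at the first good result.

    `tiers` is an ordered list of (name, preprocess) pairs, cheapest first.
    `ocr` is any callable returning (text, confidence) for a preprocessed image.
    """
    best: Optional[OCRResult] = None
    for name, preprocess in tiers:
        text, confidence = ocr(preprocess(image))
        result = OCRResult(text=text, confidence=confidence, tier=name)
        # Keep the most confident attempt seen so far as a fallback.
        if best is None or result.confidence > best.confidence:
            best = result
        # Early exit: skip the more expensive tiers once the result is good enough.
        if result.confidence >= threshold:
            break
    if best is None:
        raise ValueError("at least one preprocessing tier is required")
    return best
```

With tiers ordered as light then medium, an input that already scores above the threshold after light preprocessing never reaches the medium tier, which is the fast path; otherwise the most confident of the attempts is returned.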
2026-01-02 05:37:16 +02:00
parent 74f7aefc26
commit 495790411f
75 changed files with 23349 additions and 1311 deletions
--- a/backend/modules/data_entry/migrations/env.py
+++ b/backend/modules/data_entry/migrations/env.py
@@ -17,6 +17,7 @@ load_dotenv()
 from backend.modules.data_entry.db.models.receipt import Receipt, ReceiptAttachment
 from backend.modules.data_entry.db.models.accounting_entry import AccountingEntry
 from backend.modules.data_entry.db.models.nomenclature import SyncedSupplier, LocalSupplier, SyncedCashRegister
+from backend.modules.data_entry.db.models.ocr_settings import UserOCRPreference, OCRJobMetrics

 # this is the Alembic Config object, which provides
 # access to the values within the .ini file in use.