Add docTR as primary OCR engine with 2-tier sequential processing, OCR metrics tracking, and simplified engine selection

Features:
- docTR OCR engine with light+medium preprocessing tiers
- doctr_plus mode with early exit optimization (~65% fast path)
- OCR metrics dashboard with per-engine statistics
- User OCR preference persistence
- Parallel worker pool for OCR processing
- Cross-validation for extraction quality

Engine options: tesseract, doctr, doctr_plus (recommended), paddleocr

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
OCR Test Results - docTR+ Engine
Date: 2026-01-02 | Receipts: 26 | Mode: sequential (one receipt at a time)
Summary Comparison
| Workers | Avg time/receipt | Total time | RAM used | RAM available |
|---|---|---|---|---|
| 1 | 6.8s | 176s | 3.2GB | 4.1GB |
| 2 | 7.2s | 187s | 3.1GB | 4.1GB |
| 3 | 6.8s | 176s | 3.9GB | 3.3GB |
Success rate: 80.8% (21/26), identical for all three worker counts.
Note: In a sequential test, 1 worker is as fast as 3 (176s total for both; 2 workers took 187s). Extra workers only help when requests arrive in parallel.
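Why extra workers do not help here: in a sequential test each receipt is submitted only after the previous result has come back, so at most one worker is ever busy. The sketch below simulates this with Python's `concurrent.futures`. `fake_ocr` and the 0.1s sleep are hypothetical stand-ins for a real docTR call, and threads are used only to keep the simulation light (a CPU-bound OCR pool would normally use processes, but the scheduling argument is the same).

```python
import time
from concurrent.futures import ThreadPoolExecutor

OCR_SECONDS = 0.1  # simulated per-receipt OCR time (stand-in for ~6s of real work)


def fake_ocr(receipt_id: int) -> int:
    """Hypothetical stand-in for one OCR call: sleeps instead of recognising text."""
    time.sleep(OCR_SECONDS)
    return receipt_id


def run_sequential(pool: ThreadPoolExecutor, n: int) -> float:
    """Submit one receipt, wait for its result, then submit the next (like this test)."""
    start = time.perf_counter()
    for i in range(n):
        pool.submit(fake_ocr, i).result()  # blocks, so only one worker is ever busy
    return time.perf_counter() - start


def run_parallel(pool: ThreadPoolExecutor, n: int) -> float:
    """Submit all receipts at once so idle workers can process them concurrently."""
    start = time.perf_counter()
    list(pool.map(fake_ocr, range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    for workers in (1, 3):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            seq = run_sequential(pool, 6)
            par = run_parallel(pool, 6)
        print(f"{workers} worker(s): sequential {seq:.2f}s, parallel {par:.2f}s")
```

With 3 workers the sequential loop still takes about 6 × 0.1s, while the parallel run needs only about two rounds of 0.1s.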
Detailed Results (1 Worker)
| # | Receipt | Time | Tier | Result | Notes |
|---|---|---|---|---|---|
| 01 | abonament kineterra | 6.8s | T1 | ✓ | 97% |
| 02 | benzina 14 august | 6.0s | T1 | ✓ | 83% |
| 03 | benzina 27 octombrie | 5.9s | T1 | ✓ | 83% |
| 04 | igiena 11 octombrie | 7.7s | T1 | ✓ | 97% |
| 05 | igiena 14 dec five-holding | 11.5s | T1+T2 | ✗ | TOTAL ±1 |
| 06 | rechizite 12 dec pictus | 5.9s | T1 | ✓ | 97% |
| 07 | benzina 10 mai 2025 | 5.1s | T1 | ✓ | 83% |
| 08 | brick consumabil 604 50% | 4.8s | T1 | ✓ | 97% |
| 09 | benzina 13 septembrie | 4.9s | T1 | ✓ | 83% |
| 10 | brick consumabile 604 | 5.3s | T1 | ✓ | 97% |
| 11 | benzina 20 dec | 5.8s | T1 | ✓ | 79% |
| 12 | bon fiscal Dedeman | 5.7s | T1 | ✓ | 90% |
| 13 | factura Dedeman | 6.8s | T1 | ✓ | 97% |
| 14 | benzina 13 iulie | 5.7s | T1 | ✓ | 95% |
| 15 | best print stampila | 4.5s | T1 | ✓ | 94% |
| 16 | electrobering telecomanda | 4.8s | T1 | ✓ | 97% |
| 17 | brick igiena 8 oct | 11.9s | T1+T2 | ✗ | TOTAL/CUI |
| 18 | gama ink refill toner | 5.9s | T1 | ✓ | 94% |
| 19 | kineterra fizioterapie | 4.6s | T1 | ✓ | 97% |
| 20 | brick igiena 1 sept | 12.5s | T1+T2 | ✗ | ALL None |
| 21 | kineterra abonament | 5.6s | T1 | ✓ | 97% |
| 22 | brick igiena electrice | 15.9s | T1+T2 | ✗ | DATE None |
| 23 | electrobering igiena | 4.4s | T1 | ✓ | 97% |
| 24 | Lidl papetarie 604 | 5.8s | T1 | ✓ | 87% |
| 25 | brick igiena 604 | 6.8s | T1 | ✗ | DATE ±1 |
| 26 | unlimited duplicat | 4.8s | T1 | ✓ | 86% |
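The summary figures can be re-derived from the per-receipt rows. A quick check in Python, with the times, tiers, and ✓/✗ results transcribed from the 1-worker table above:

```python
from statistics import mean

# (seconds, tiers used, passed) for receipts 01-26, transcribed from the 1-worker table
ROWS = [
    (6.8, "T1", True), (6.0, "T1", True), (5.9, "T1", True), (7.7, "T1", True),
    (11.5, "T1+T2", False), (5.9, "T1", True), (5.1, "T1", True), (4.8, "T1", True),
    (4.9, "T1", True), (5.3, "T1", True), (5.8, "T1", True), (5.7, "T1", True),
    (6.8, "T1", True), (5.7, "T1", True), (4.5, "T1", True), (4.8, "T1", True),
    (11.9, "T1+T2", False), (5.9, "T1", True), (4.6, "T1", True),
    (12.5, "T1+T2", False), (5.6, "T1", True), (15.9, "T1+T2", False),
    (4.4, "T1", True), (5.8, "T1", True), (6.8, "T1", False), (4.8, "T1", True),
]

passed = sum(ok for _, _, ok in ROWS)
early = [t for t, tier, _ in ROWS if tier == "T1"]
full = [t for t, tier, _ in ROWS if tier == "T1+T2"]
# Receipt numbers (1-based) that exited after T1 but still failed
early_failures = [
    i for i, (_, tier, ok) in enumerate(ROWS, start=1) if tier == "T1" and not ok
]
total = sum(t for t, _, _ in ROWS)

print(f"success: {passed}/{len(ROWS)} ({passed / len(ROWS):.1%})")
print(f"T1 only: {len(early)} receipts, avg {mean(early):.2f}s")
print(f"T1+T2:   {len(full)} receipts, avg {mean(full):.2f}s")
print(f"early-exit failures: {early_failures}")
print(f"sum of rounded times: {total:.1f}s")
```

This gives 21/26 passes, 22 receipts on the T1-only path and 4 on T1+T2, with #25 as the only early-exit failure. The rounded per-receipt times sum to 175.4s, in line with the reported 176s total.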
Time Comparison by Receipt
| # | Receipt | 1 worker | 2 workers | 3 workers |
|---|---|---|---|---|
| 01 | abonament kineterra | 6.8s | 6.7s | 5.8s |
| 02 | benzina 14 august | 6.0s | 5.5s | 5.8s |
| 03 | benzina 27 octombrie | 5.9s | 5.9s | 5.7s |
| 04 | igiena 11 octombrie | 7.7s | 8.9s | 7.4s |
| 05 | igiena 14 dec (FAIL) | 11.5s | 12.3s | 11.9s |
| 06 | rechizite pictus | 5.9s | 5.9s | 5.7s |
| 07 | benzina 10 mai | 5.1s | 6.0s | 5.8s |
| 08 | brick 50% | 4.8s | 5.9s | 5.5s |
| 09 | benzina 13 sept | 4.9s | 5.9s | 5.3s |
| 10 | brick consumabile | 5.3s | 5.7s | 5.7s |
| 11 | benzina 20 dec | 5.8s | 5.4s | 5.8s |
| 12 | bon Dedeman | 5.7s | 5.9s | 5.8s |
| 13 | factura Dedeman | 6.8s | 6.9s | 6.8s |
| 14 | benzina 13 iulie | 5.7s | 6.1s | 5.4s |
| 15 | best print | 4.5s | 5.8s | 4.8s |
| 16 | electrobering | 4.8s | 4.2s | 4.7s |
| 17 | brick 8 oct (FAIL) | 11.9s | 13.1s | 12.0s |
| 18 | gama ink | 5.9s | 5.9s | 4.7s |
| 19 | kineterra fizioterapie | 4.6s | 5.9s | 4.8s |
| 20 | brick 1 sept (FAIL) | 12.5s | 13.2s | 13.1s |
| 21 | kineterra abonament | 5.6s | 4.9s | 4.8s |
| 22 | brick electrice (FAIL) | 15.9s | 17.0s | 15.5s |
| 23 | electrobering igiena | 4.4s | 5.4s | 5.0s |
| 24 | Lidl papetarie | 5.8s | 6.9s | 5.8s |
| 25 | brick 604 (FAIL) | 6.8s | 6.5s | 6.9s |
| 26 | unlimited duplicat | 4.8s | 5.8s | 5.0s |
| | AVG | 6.8s | 7.2s | 6.8s |
| | TOTAL | 176s | 187s | 176s |
Tier Analysis
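- T1 only (early exit): 22 receipts, 4.4-7.7s each in the 1-worker run (avg ≈5.6s); 21 passed, 1 failed (#25, DATE ±1)
- T1+T2 (full): 4 receipts, 11.5-15.9s each in the 1-worker run (avg ≈13s); all 4 failed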
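The tier behaviour above (a fast T1 pass that usually suffices, with T2 run only on a miss) can be sketched as follows. This is a hypothetical outline, not the project's doctr_plus code: `two_tier_ocr`, `run_tier`, the `"light"`/`"medium"` labels, and the rule "exit early when total, date, and CUI are all present" are assumptions, with the field names taken from the failure table. The OCR call is injected as a callable so the control flow runs without docTR installed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Fields = Dict[str, Optional[str]]

# Assumed exit criterion: these fields (named in the failure table) must all be present.
REQUIRED_FIELDS = ("total", "date", "cui")


@dataclass
class Extraction:
    fields: Fields
    tiers: List[str]

    def found(self) -> int:
        """Count required fields that were extracted (not None)."""
        return sum(self.fields.get(name) is not None for name in REQUIRED_FIELDS)

    def complete(self) -> bool:
        return self.found() == len(REQUIRED_FIELDS)


def two_tier_ocr(image: bytes, run_tier: Callable[[bytes, str], Fields]) -> Extraction:
    """Hypothetical doctr_plus-style flow: light tier first, medium tier only on a miss."""
    t1 = Extraction(run_tier(image, "light"), ["T1"])
    if t1.complete():
        return t1  # early exit: T2 is never run
    t2_fields = run_tier(image, "medium")
    # Keep whichever pass recovered more required fields; ties favour T2.
    if Extraction(t2_fields, []).found() >= t1.found():
        return Extraction(t2_fields, ["T1", "T2"])
    return Extraction(t1.fields, ["T1", "T2"])
```

A presence-only exit check like this one cannot catch a value that is present but wrong, such as the off-by-one DATE on receipt #25; that kind of case would have to be caught by a separate validation step.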
Failures (5)
| Receipt | Issue | Fixable |
|---|---|---|
| igiena 14 dec | TOTAL ±1 | No |
| brick 8 oct | TOTAL/CUI | Maybe |
| brick 1 sept | ALL None | No (bad doc) |
| brick electrice | DATE None | Maybe |
| brick 604 | DATE ±1 | No |
Recommendation
OCR_WORKERS=1 # Sequential use: same speed as 3 workers, ~0.7GB less RAM
OCR_WORKERS=2 # Parallel requests (production)
OCR_MAX_TASKS_PER_CHILD=0 # Never restart worker processes
For 8GB RAM: use at most 2 workers (with 3 workers, available RAM dropped to 3.3GB, versus 4.1GB with 1 or 2).
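One way these variables might be turned into pool arguments, assuming the worker pool is built on Python's `multiprocessing.pool.Pool`. Only the variable names come from this report; `pool_settings`, `make_ocr_pool`, and the defaults are hypothetical. The sketch maps `OCR_MAX_TASKS_PER_CHILD=0` to `maxtasksperchild=None`, because `Pool` treats `None` as "workers live as long as the pool" and rejects 0 with a `ValueError`.

```python
import os
from multiprocessing.pool import Pool
from typing import Mapping, Optional, Tuple


def pool_settings(env: Mapping[str, str] = os.environ) -> Tuple[int, Optional[int]]:
    """Return (processes, maxtasksperchild) from the OCR_* environment variables."""
    # Hypothetical default: 1 worker, matching the sequential recommendation.
    workers = int(env.get("OCR_WORKERS", "1"))
    if workers < 1:
        raise ValueError("OCR_WORKERS must be at least 1")
    # Assumed convention: 0 (or unset) means "never restart workers".
    # Pool expresses that as maxtasksperchild=None and rejects 0.
    max_tasks = int(env.get("OCR_MAX_TASKS_PER_CHILD", "0"))
    return workers, (max_tasks if max_tasks > 0 else None)


def make_ocr_pool(env: Mapping[str, str] = os.environ) -> Pool:
    """Create the worker pool; the caller is responsible for close() and join()."""
    processes, max_tasks = pool_settings(env)
    return Pool(processes=processes, maxtasksperchild=max_tasks)
```

For example, `OCR_WORKERS=2` with `OCR_MAX_TASKS_PER_CHILD=0` yields `Pool(processes=2, maxtasksperchild=None)`.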