Add docTR as primary OCR engine with 2-tier sequential processing, OCR metrics tracking, and simplified engine selection.

Features:
- docTR OCR engine with light+medium preprocessing tiers
- doctr_plus mode with early exit optimization (~65% fast path)
- OCR metrics dashboard with per-engine statistics
- User OCR preference persistence
- Parallel worker pool for OCR processing
- Cross-validation for extraction quality

Engine options: tesseract, doctr, doctr_plus (recommended), paddleocr

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

# OCR Test Results - docTR+ Engine

**Date:** 2026-01-02 | **Receipts:** 26 | **Test:** Sequential

## Summary Comparison

| Workers | Avg time per receipt | Total time | Memory used | Memory available |
|---------|----------------------|------------|-------------|------------------|
| 1 | 6.8s | 176s | 3.2GB | 4.1GB |
| 2 | 7.2s | 187s | 3.1GB | 4.1GB |
| 3 | 6.8s | 176s | 3.9GB | 3.3GB |

**Success Rate:** 80.8% (21/26), identical for all worker configurations

**Note:** In the sequential test, 1 worker is as fast as 3 workers (176s total in both cases). Multiple workers only help when requests arrive in parallel.

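
The note above can be illustrated with a small scheduling model. This is only a sketch under simplifying assumptions: the function names are hypothetical, each receipt is assumed to take a fixed time, and CPU or memory contention between workers is ignored.

```python
import heapq


def sequential_total(times, workers):
    """Client sends one receipt at a time and waits for each result.

    Only one worker is ever busy, so the worker count has no effect.
    """
    return sum(times)


def parallel_makespan(times, workers):
    """All receipts are submitted at once; each goes to the first free worker."""
    free_at = [0.0] * workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for t in times:
        start = heapq.heappop(free_at)      # earliest available worker
        heapq.heappush(free_at, start + t)  # busy until start + t
    return max(free_at)


# First five 1-worker timings (seconds) from the detailed results in this report
times = [6.8, 6.0, 5.9, 7.7, 11.5]

for w in (1, 2, 3):
    print(w, sequential_total(times, w), round(parallel_makespan(times, w), 1))
```

In this model the sequential total is the same for every worker count, while the parallel finish time drops as workers are added, which matches the pattern in the summary table.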
## Detailed Results (1 Worker)

| # | Receipt | Time | Tier | Result | Notes |
|---|---------|------|------|--------|-------|
| 01 | abonament kineterra | 6.8s | T1 | ✓ | 97% |
| 02 | benzina 14 august | 6.0s | T1 | ✓ | 83% |
| 03 | benzina 27 octombrie | 5.9s | T1 | ✓ | 83% |
| 04 | igiena 11 octombrie | 7.7s | T1 | ✓ | 97% |
| 05 | igiena 14 dec five-holding | 11.5s | T1+T2 | ✗ | TOTAL ±1 |
| 06 | rechizite 12 dec pictus | 5.9s | T1 | ✓ | 97% |
| 07 | benzina 10 mai 2025 | 5.1s | T1 | ✓ | 83% |
| 08 | brick consumabil 604 50% | 4.8s | T1 | ✓ | 97% |
| 09 | benzina 13 septembrie | 4.9s | T1 | ✓ | 83% |
| 10 | brick consumabile 604 | 5.3s | T1 | ✓ | 97% |
| 11 | benzina 20 dec | 5.8s | T1 | ✓ | 79% |
| 12 | bon fiscal Dedeman | 5.7s | T1 | ✓ | 90% |
| 13 | factura Dedeman | 6.8s | T1 | ✓ | 97% |
| 14 | benzina 13 iulie | 5.7s | T1 | ✓ | 95% |
| 15 | best print stampila | 4.5s | T1 | ✓ | 94% |
| 16 | electrobering telecomanda | 4.8s | T1 | ✓ | 97% |
| 17 | brick igiena 8 oct | 11.9s | T1+T2 | ✗ | TOTAL/CUI |
| 18 | gama ink refill toner | 5.9s | T1 | ✓ | 94% |
| 19 | kineterra fizioterapie | 4.6s | T1 | ✓ | 97% |
| 20 | brick igiena 1 sept | 12.5s | T1+T2 | ✗ | ALL None |
| 21 | kineterra abonament | 5.6s | T1 | ✓ | 97% |
| 22 | brick igiena electrice | 15.9s | T1+T2 | ✗ | DATE None |
| 23 | electrobering igiena | 4.4s | T1 | ✓ | 97% |
| 24 | Lidl papetarie 604 | 5.8s | T1 | ✓ | 87% |
| 25 | brick igiena 604 | 6.8s | T1 | ✗ | DATE ±1 |
| 26 | unlimited duplicat | 4.8s | T1 | ✓ | 86% |

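
The per-tier counts and the success rate can be re-derived from the Tier and Result columns of the table above. The snippet below is a small tally sketch; the `rows` list is transcribed by hand from that table and the variable names are illustrative.

```python
from collections import Counter

# (tier, passed) for receipts 01-26, transcribed from the table above
rows = [
    ("T1", True), ("T1", True), ("T1", True), ("T1", True), ("T1+T2", False),      # 01-05
    ("T1", True), ("T1", True), ("T1", True), ("T1", True), ("T1", True),          # 06-10
    ("T1", True), ("T1", True), ("T1", True), ("T1", True), ("T1", True),          # 11-15
    ("T1", True), ("T1+T2", False), ("T1", True), ("T1", True), ("T1+T2", False),  # 16-20
    ("T1", True), ("T1+T2", False), ("T1", True), ("T1", True), ("T1", False),     # 21-25
    ("T1", True),                                                                  # 26
]

tier_counts = Counter(tier for tier, _ in rows)  # receipts per tier path
passed = sum(ok for _, ok in rows)               # receipts marked ✓
success_rate = 100 * passed / len(rows)

print(dict(tier_counts), passed, f"{success_rate:.1f}%")
```

Note that receipt 25 exits at T1 but still fails, so the number of T1+T2 runs (4) is not the same as the number of failures (5).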
## Time Comparison by Receipt

| # | Receipt | 1 worker | 2 workers | 3 workers |
|---|---------|----------|-----------|-----------|
| 01 | abonament kineterra | 6.8s | 6.7s | 5.8s |
| 02 | benzina 14 august | 6.0s | 5.5s | 5.8s |
| 03 | benzina 27 octombrie | 5.9s | 5.9s | 5.7s |
| 04 | igiena 11 octombrie | 7.7s | 8.9s | 7.4s |
| 05 | igiena 14 dec (FAIL) | 11.5s | 12.3s | 11.9s |
| 06 | rechizite pictus | 5.9s | 5.9s | 5.7s |
| 07 | benzina 10 mai | 5.1s | 6.0s | 5.8s |
| 08 | brick 50% | 4.8s | 5.9s | 5.5s |
| 09 | benzina 13 sept | 4.9s | 5.9s | 5.3s |
| 10 | brick consumabile | 5.3s | 5.7s | 5.7s |
| 11 | benzina 20 dec | 5.8s | 5.4s | 5.8s |
| 12 | bon Dedeman | 5.7s | 5.9s | 5.8s |
| 13 | factura Dedeman | 6.8s | 6.9s | 6.8s |
| 14 | benzina 13 iulie | 5.7s | 6.1s | 5.4s |
| 15 | best print | 4.5s | 5.8s | 4.8s |
| 16 | electrobering | 4.8s | 4.2s | 4.7s |
| 17 | brick 8 oct (FAIL) | 11.9s | 13.1s | 12.0s |
| 18 | gama ink | 5.9s | 5.9s | 4.7s |
| 19 | kineterra fizioterapie | 4.6s | 5.9s | 4.8s |
| 20 | brick 1 sept (FAIL) | 12.5s | 13.2s | 13.1s |
| 21 | kineterra abonament | 5.6s | 4.9s | 4.8s |
| 22 | brick electrice (FAIL) | 15.9s | 17.0s | 15.5s |
| 23 | electrobering igiena | 4.4s | 5.4s | 5.0s |
| 24 | Lidl papetarie | 5.8s | 6.9s | 5.8s |
| 25 | brick 604 (FAIL) | 6.8s | 6.5s | 6.9s |
| 26 | unlimited duplicat | 4.8s | 5.8s | 5.0s |
| **AVG** | | **6.8s** | **7.2s** | **6.8s** |
| **TOTAL** | | **176s** | **187s** | **176s** |

## Tier Analysis

- **T1 only (early exit):** 22 receipts (4.4-7.7s each), 21 of which passed
- **T1+T2 (full):** 4 receipts (11.5-15.9s each), all of which failed

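
The two-tier flow behind these numbers can be sketched roughly as follows. This is not the project's actual code: `Extraction`, `run_tiers`, the field names, and the `0.75` threshold are all hypothetical, chosen only to show how an early exit after tier 1 avoids the cost of tier 2.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Extraction:
    total: Optional[str]
    date: Optional[str]
    confidence: float  # 0.0-1.0


# Hypothetical threshold; the real cut-off is not stated in this report.
EARLY_EXIT_CONFIDENCE = 0.75


def run_tiers(
    image: bytes,
    tier1: Callable[[bytes], Extraction],
    tier2: Callable[[bytes], Extraction],
    threshold: float = EARLY_EXIT_CONFIDENCE,
) -> Tuple[str, Extraction]:
    """Run the light tier first and fall back to the medium tier only if needed."""
    first = tier1(image)
    # Early exit: tier 1 found the key fields with enough confidence.
    if first.confidence >= threshold and first.total and first.date:
        return "T1", first
    # Otherwise pay for the second pass and keep the more confident result.
    second = tier2(image)
    best = max(first, second, key=lambda e: e.confidence)
    return "T1+T2", best
```

With stub tiers, a confident first pass returns `"T1"` without ever calling `tier2`, which is why early-exit receipts take roughly half the time of full runs.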
## Failures (5)

| Receipt | Issue | Fixable |
|---------|-------|---------|
| igiena 14 dec | TOTAL ±1 | No |
| brick 8 oct | TOTAL/CUI | Maybe |
| brick 1 sept | ALL None | No (bad doc) |
| brick electrice | DATE None | Maybe |
| brick 604 | DATE ±1 | No |

## Recommendation

```
OCR_WORKERS=1  # Best for sequential, saves RAM
OCR_WORKERS=2  # For parallel requests (production)
OCR_MAX_TASKS_PER_CHILD=0  # No restart
```

**For 8GB RAM:** Use 1-2 workers max
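
As an illustration of how these settings might be consumed, the sketch below maps the two environment variables onto the arguments of Python's `multiprocessing.Pool`. The helper name `load_pool_settings` is hypothetical, and treating `0` as "never restart workers" is an assumption based on the comment in the block above.

```python
import os
from typing import Mapping


def load_pool_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Translate OCR_* environment variables into multiprocessing.Pool kwargs."""
    workers = int(env.get("OCR_WORKERS", "1"))
    max_tasks = int(env.get("OCR_MAX_TASKS_PER_CHILD", "0"))
    if workers < 1:
        raise ValueError("OCR_WORKERS must be at least 1")
    return {
        "processes": workers,
        # Assumption: 0 means "no restart". Pool expects None for that case.
        "maxtasksperchild": max_tasks if max_tasks > 0 else None,
    }


print(load_pool_settings({"OCR_WORKERS": "2", "OCR_MAX_TASKS_PER_CHILD": "0"}))
```

The result can be passed straight through, for example `multiprocessing.Pool(**load_pool_settings())`.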