roa2web-service-auto/docs/OCR_TEST_RESULTS.md
Marius Mutu 495790411f feat(ocr): Add docTR OCR engine with metrics infrastructure
Add docTR as primary OCR engine with 2-tier sequential processing,
OCR metrics tracking, and simplified engine selection.

Features:
- docTR OCR engine with light+medium preprocessing tiers
- doctr_plus mode with early exit optimization (~65% fast path)
- OCR metrics dashboard with per-engine statistics
- User OCR preference persistence
- Parallel worker pool for OCR processing
- Cross-validation for extraction quality

Engine options: tesseract, doctr, doctr_plus (recommended), paddleocr

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 05:37:16 +02:00


OCR Test Results - docTR+ Engine

Date: 2026-01-02 | Receipts: 26 | Test: Sequential

Summary Comparison

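| Workers | Avg per receipt | Total | Mem used | Mem available |
|---------|-----------------|-------|----------|---------------|
| 1 | 6.8s | 176s | 3.2GB | 4.1GB |
| 2 | 7.2s | 187s | 3.1GB | 4.1GB |
| 3 | 6.8s | 176s | 3.9GB | 3.3GB |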

Success Rate: 80.8% (21/26) - same for all configs

Note: In this sequential test, 1 worker is as fast as 3 workers (176s total in both cases). Additional workers only help when requests arrive in parallel.
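The note above can be illustrated with a small scheduling model. This is a sketch under simplifying assumptions, not a measurement: it takes the first four 1-worker timings from the detailed results, assumes a request costs the same regardless of how many workers exist, and ignores CPU and memory contention. The function names are hypothetical.

```python
import heapq


def sequential_total(durations):
    """Wall time when each request is sent only after the previous one finishes."""
    # Only one request is in flight at a time, so the worker count is irrelevant.
    return sum(durations)


def parallel_makespan(durations, workers):
    """Idealized wall time when all requests are queued at once on `workers` workers."""
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker takes the job
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# First four 1-worker timings from the detailed results, in seconds.
sample = [6.8, 6.0, 5.9, 7.7]

for w in (1, 2, 3):
    print(
        f"{w} worker(s): sequential {sequential_total(sample):.1f}s, "
        f"parallel (idealized) {parallel_makespan(sample, w):.1f}s"
    )
```

In this model the sequential total is the same for every worker count, while the parallel figure shrinks as workers are added, which matches the behaviour the note describes.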

Detailed Results (1 Worker)

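| # | Receipt | Time | Tier | Result | Notes |
|---|---------|------|------|--------|-------|
| 01 | abonament kineterra | 6.8s | T1 | OK | 97% |
| 02 | benzina 14 august | 6.0s | T1 | OK | 83% |
| 03 | benzina 27 octombrie | 5.9s | T1 | OK | 83% |
| 04 | igiena 11 octombrie | 7.7s | T1 | OK | 97% |
| 05 | igiena 14 dec five-holding | 11.5s | T1+T2 | FAIL | TOTAL ±1 |
| 06 | rechizite 12 dec pictus | 5.9s | T1 | OK | 97% |
| 07 | benzina 10 mai 2025 | 5.1s | T1 | OK | 83% |
| 08 | brick consumabil 604 50% | 4.8s | T1 | OK | 97% |
| 09 | benzina 13 septembrie | 4.9s | T1 | OK | 83% |
| 10 | brick consumabile 604 | 5.3s | T1 | OK | 97% |
| 11 | benzina 20 dec | 5.8s | T1 | OK | 79% |
| 12 | bon fiscal Dedeman | 5.7s | T1 | OK | 90% |
| 13 | factura Dedeman | 6.8s | T1 | OK | 97% |
| 14 | benzina 13 iulie | 5.7s | T1 | OK | 95% |
| 15 | best print stampila | 4.5s | T1 | OK | 94% |
| 16 | electrobering telecomanda | 4.8s | T1 | OK | 97% |
| 17 | brick igiena 8 oct | 11.9s | T1+T2 | FAIL | TOTAL/CUI |
| 18 | gama ink refill toner | 5.9s | T1 | OK | 94% |
| 19 | kineterra fizioterapie | 4.6s | T1 | OK | 97% |
| 20 | brick igiena 1 sept | 12.5s | T1+T2 | FAIL | ALL None |
| 21 | kineterra abonament | 5.6s | T1 | OK | 97% |
| 22 | brick igiena electrice | 15.9s | T1+T2 | FAIL | DATE None |
| 23 | electrobering igiena | 4.4s | T1 | OK | 97% |
| 24 | Lidl papetarie 604 | 5.8s | T1 | OK | 87% |
| 25 | brick igiena 604 | 6.8s | T1 | FAIL | DATE ±1 |
| 26 | unlimited duplicat | 4.8s | T1 | OK | 86% |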
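The 80.8% success rate can be re-derived from the last entry of each row above. A minimal sketch, assuming that a row counts as a success when that entry is a percentage and as a failure when it names a field (TOTAL, DATE, CUI, ALL); the classification rule and the helper name are assumptions, not the project's own logic.

```python
# Last entry of each row in the detailed results, receipts 01 to 26 in order.
results = [
    "97%", "83%", "83%", "97%", "TOTAL ±1", "97%", "83%", "97%", "83%",
    "97%", "79%", "90%", "97%", "95%", "94%", "97%", "TOTAL/CUI", "94%",
    "97%", "ALL None", "97%", "DATE None", "97%", "87%", "DATE ±1", "86%",
]


def is_success(entry):
    """Assumed rule: a percentage marks a success, a field name marks a failure."""
    return entry.endswith("%")


passed = sum(is_success(entry) for entry in results)
failed_ids = [i for i, entry in enumerate(results, start=1) if not is_success(entry)]
rate = 100 * passed / len(results)

print(f"Success rate: {rate:.1f}% ({passed}/{len(results)})")
print("Failed receipts:", failed_ids)
```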

Time Comparison by Receipt

| # | Receipt | 1W | 2W | 3W |
|---|---------|----|----|----|
| 01 | abonament kineterra | 6.8s | 6.7s | 5.8s |
| 02 | benzina 14 august | 6.0s | 5.5s | 5.8s |
| 03 | benzina 27 octombrie | 5.9s | 5.9s | 5.7s |
| 04 | igiena 11 octombrie | 7.7s | 8.9s | 7.4s |
| 05 | igiena 14 dec (FAIL) | 11.5s | 12.3s | 11.9s |
| 06 | rechizite pictus | 5.9s | 5.9s | 5.7s |
| 07 | benzina 10 mai | 5.1s | 6.0s | 5.8s |
| 08 | brick 50% | 4.8s | 5.9s | 5.5s |
| 09 | benzina 13 sept | 4.9s | 5.9s | 5.3s |
| 10 | brick consumabile | 5.3s | 5.7s | 5.7s |
| 11 | benzina 20 dec | 5.8s | 5.4s | 5.8s |
| 12 | bon Dedeman | 5.7s | 5.9s | 5.8s |
| 13 | factura Dedeman | 6.8s | 6.9s | 6.8s |
| 14 | benzina 13 iulie | 5.7s | 6.1s | 5.4s |
| 15 | best print | 4.5s | 5.8s | 4.8s |
| 16 | electrobering | 4.8s | 4.2s | 4.7s |
| 17 | brick 8 oct (FAIL) | 11.9s | 13.1s | 12.0s |
| 18 | gama ink | 5.9s | 5.9s | 4.7s |
| 19 | kineterra fizioterapie | 4.6s | 5.9s | 4.8s |
| 20 | brick 1 sept (FAIL) | 12.5s | 13.2s | 13.1s |
| 21 | kineterra abonament | 5.6s | 4.9s | 4.8s |
| 22 | brick electrice (FAIL) | 15.9s | 17.0s | 15.5s |
| 23 | electrobering igiena | 4.4s | 5.4s | 5.0s |
| 24 | Lidl papetarie | 5.8s | 6.9s | 5.8s |
| 25 | brick 604 (FAIL) | 6.8s | 6.5s | 6.9s |
| 26 | unlimited duplicat | 4.8s | 5.8s | 5.0s |
| | AVG | 6.8s | 7.2s | 6.8s |
| | TOTAL | 176s | 187s | 176s |
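The AVG and TOTAL rows can be cross-checked by summing the per-receipt columns. The sketch below only re-adds the rounded values printed in the table; how the report itself computed its aggregates is not stated, so small differences from the reported rows are to be expected.

```python
from statistics import mean

# Per-receipt times in seconds, receipts 01 to 26, copied from the table above.
times = {
    "1W": [6.8, 6.0, 5.9, 7.7, 11.5, 5.9, 5.1, 4.8, 4.9, 5.3, 5.8, 5.7, 6.8,
           5.7, 4.5, 4.8, 11.9, 5.9, 4.6, 12.5, 5.6, 15.9, 4.4, 5.8, 6.8, 4.8],
    "2W": [6.7, 5.5, 5.9, 8.9, 12.3, 5.9, 6.0, 5.9, 5.9, 5.7, 5.4, 5.9, 6.9,
           6.1, 5.8, 4.2, 13.1, 5.9, 5.9, 13.2, 4.9, 17.0, 5.4, 6.9, 6.5, 5.8],
    "3W": [5.8, 5.8, 5.7, 7.4, 11.9, 5.7, 5.8, 5.5, 5.3, 5.7, 5.8, 5.8, 6.8,
           5.4, 4.8, 4.7, 12.0, 4.7, 4.8, 13.1, 4.8, 15.5, 5.0, 5.8, 6.9, 5.0],
}

# Recompute the two aggregate rows for each worker configuration.
summary = {
    config: {"avg": mean(values), "total": sum(values)}
    for config, values in times.items()
}

for config, stats in summary.items():
    print(f"{config}: avg {stats['avg']:.2f}s, total {stats['total']:.1f}s")
```

The recomputed totals land within about a second of the reported 176s, 187s, and 176s; the gap is presumably due to rounding of the per-receipt values or to overhead outside the per-receipt timings.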

Tier Analysis

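  • T1 only (early exit): 22 receipts in the detailed table (4.4-7.7s), including receipt 25, which failed without escalating to T2
  • T1+T2 (full): 4 receipts in the detailed table (11.5-15.9s), all of which failed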
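The two-tier flow with early exit can be sketched roughly as follows. This is an illustration only: the `Extraction` class, the function names, the "all fields present" acceptance rule, and the merge policy are assumptions, and the stub tiers return made-up placeholder values rather than real docTR output.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Extraction:
    """Fields pulled from one receipt; None means the field was not found."""
    total: Optional[str] = None
    date: Optional[str] = None
    cui: Optional[str] = None

    def is_complete(self) -> bool:
        return all(v is not None for v in (self.total, self.date, self.cui))


Tier = Callable[[bytes], Extraction]


def extract_with_early_exit(image: bytes, tier1: Tier, tier2: Tier) -> Tuple[Extraction, str]:
    """Run tier 1; fall back to tier 2 only when tier 1 leaves a field empty."""
    first = tier1(image)
    if first.is_complete():
        return first, "T1"  # fast path: tier 2 never runs

    second = tier2(image)
    # Assumed merge policy: keep tier-1 values and fill the gaps from tier 2.
    merged = Extraction(
        total=first.total if first.total is not None else second.total,
        date=first.date if first.date is not None else second.date,
        cui=first.cui if first.cui is not None else second.cui,
    )
    return merged, "T1+T2"


# Stub tiers standing in for the light and medium preprocessing passes.
def light_pass(image: bytes) -> Extraction:
    return Extraction(total="12.50", date=None, cui="RO123")  # placeholder values


def medium_pass(image: bytes) -> Extraction:
    return Extraction(total="12.50", date="2025-10-08", cui="RO123")  # placeholder values


result, tiers_used = extract_with_early_exit(b"", light_pass, medium_pass)
print(tiers_used, result)
```

Because the second tier runs only when the first leaves a gap, receipts on the fast path pay for a single OCR pass, which is consistent with the roughly 2x time difference between the two groups above.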

Failures (5)

| Receipt | Issue | Fixable |
|---------|-------|---------|
| igiena 14 dec | TOTAL ±1 | No |
| brick 8 oct | TOTAL/CUI | Maybe |
| brick 1 sept | ALL None | No (bad doc) |
| brick electrice | DATE None | Maybe |
| brick 604 | DATE ±1 | No |

Recommendation

OCR_WORKERS=1                    # Best for sequential, saves RAM
OCR_WORKERS=2                    # For parallel requests (production)
OCR_MAX_TASKS_PER_CHILD=0        # No restart

On a machine with 8GB of RAM, use at most 1-2 workers.
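As an illustration of how these two settings might feed a worker pool, the sketch below reads them from the environment and maps them onto the keyword arguments of Python's `multiprocessing.Pool`. How the service actually consumes the variables is not shown in this report; the helper name, the default values, and the mapping of `0` to "never restart" are assumptions.

```python
import os


def pool_settings(env=os.environ):
    """Translate the OCR_* variables into multiprocessing.Pool keyword arguments."""
    workers = int(env.get("OCR_WORKERS", "1"))  # assumed default: 1 worker
    max_tasks = int(env.get("OCR_MAX_TASKS_PER_CHILD", "0"))  # assumed default: 0
    if workers < 1:
        raise ValueError("OCR_WORKERS must be at least 1")
    if max_tasks < 0:
        raise ValueError("OCR_MAX_TASKS_PER_CHILD must not be negative")
    # Pool treats maxtasksperchild=None as "workers live as long as the pool",
    # so 0 ("no restart") is mapped to None here.
    return {"processes": workers, "maxtasksperchild": max_tasks or None}


production = pool_settings({"OCR_WORKERS": "2", "OCR_MAX_TASKS_PER_CHILD": "0"})
print(production)

# A pool could then be created with:
#     multiprocessing.Pool(**pool_settings())
```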