Claude Agent 62f86250cc refactor(docs): consolidate and cleanup documentation

- Delete 9 deprecated/obsolete docs (~6,300 lines removed)
- Move test PDFs to tests/fixtures/ocr-samples/
- Create docs/DEPLOYMENT.md as principal guide
- Create tests/ocr-validation/README.md
- Update all refs for ultrathin monolith architecture
- Update OCR tests to use relative paths

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-22 09:14:51 +00:00

OCR Validation Tests

Tests that validate the accuracy of OCR extraction from fiscal receipts.

Prerequisites

The backend must be running at http://localhost:8000
The Data Entry module must be enabled in .env: MODULE_DATA_ENTRY_ENABLED=true
JWT_SECRET_KEY must be set (or use the test default)

# Start the backend
cd /workspace/roa2web
./start-prod.sh
# or
./start-test.sh
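
The prerequisites above can be checked before a test run. The following is a minimal sketch, not part of the test suite: the helper names (`parse_env`, `backend_reachable`, `missing_prerequisites`) are hypothetical, and the `.env` parser assumes plain `KEY=VALUE` lines.

```python
import socket
from urllib.parse import urlparse


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def backend_reachable(url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the backend's host and port succeeds."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_prerequisites(env: dict[str, str]) -> list[str]:
    """List the prerequisites from this README that the given env does not meet."""
    problems = []
    if env.get("MODULE_DATA_ENTRY_ENABLED", "").lower() != "true":
        problems.append("MODULE_DATA_ENTRY_ENABLED is not set to true")
    if not env.get("JWT_SECRET_KEY"):
        problems.append("JWT_SECRET_KEY is not set (the test default will be used)")
    return problems
```

A typical use would be to read `.env` from the repository root, pass its text to `parse_env`, print whatever `missing_prerequisites` returns, and call `backend_reachable()` before launching any of the scripts below.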

Test Files

File	Purpose
`expected_receipts.json`	Expected values for each receipt (ground truth)
`ocr-direct-validation.py`	Individual test with detailed comparison
`test_receipts_sequential.py`	Runs all receipts sequentially
`test_receipts_parallel.py`	Runs all receipts in parallel (performance test)
`test_receipts_parallel_windows.py`	Windows version with memory tracking
`get_raw_ocr_text.py`	Debug tool that prints the raw OCR text

Fixtures: tests/fixtures/ocr-samples/ - 30 PDFs of fiscal receipts

How to run the tests

1. Individual Test (recommended for debugging)

cd /workspace/roa2web

# Test all receipts with the doctr_plus engine
python tests/ocr-validation/ocr-direct-validation.py

# Test with a specific engine
python tests/ocr-validation/ocr-direct-validation.py --engine doctr_plus
python tests/ocr-validation/ocr-direct-validation.py --engine tesseract

# Test a single receipt
python tests/ocr-validation/ocr-direct-validation.py --receipt receipt_01

# Also include multi-page receipts
python tests/ocr-validation/ocr-direct-validation.py --include-multipage

2. Sequential Test (all receipts, one at a time)

python tests/ocr-validation/test_receipts_sequential.py

Output:

Processing: abonament kineterra.pdf
  ✓ Total: MATCH (1900.0 = 1900.0)
  ✓ Date: MATCH (2025-11-10)
  ✗ CUI: MISMATCH (expected: 31180432, got: 3118043)
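
To make the output above easier to read, here is a minimal sketch of how a per-field comparison could produce such lines. It is an illustration only and not the code in `test_receipts_sequential.py`; the `compare_field` helper and its numeric tolerance of 0.01 are assumptions.

```python
import math


def compare_field(label: str, expected, got, tolerance: float = 0.01) -> tuple[bool, str]:
    """Compare one extracted value against ground truth and format a report line."""
    numeric = isinstance(expected, (int, float)) and isinstance(got, (int, float))
    if numeric:
        # Amounts are compared with a small absolute tolerance (assumed value).
        ok = math.isclose(expected, got, abs_tol=tolerance)
        detail = f"{got} = {expected}"
    else:
        # Everything else (dates, CUI) is compared as an exact string.
        ok = str(expected) == str(got)
        detail = str(expected)
    if ok:
        return True, f"  ✓ {label}: MATCH ({detail})"
    return False, f"  ✗ {label}: MISMATCH (expected: {expected}, got: {got})"


if __name__ == "__main__":
    for args in [("Total", 1900.0, 1900.0),
                 ("Date", "2025-11-10", "2025-11-10"),
                 ("CUI", "31180432", "3118043")]:
        print(compare_field(*args)[1])
```

The CUI case shows why identifiers are better compared as strings: a single dropped digit is a hard mismatch, not a near miss.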

3. Parallel Test (performance benchmark)

python tests/ocr-validation/test_receipts_parallel.py

Output:

PARALLEL TEST: 26 receipts
Phase 1: Submitting all jobs...
Submitted 26 jobs in 2.3s
Phase 2: Waiting for results...
  OK: abonament kineterra.pdf                   12.3s  conf=95%
  OK: benzina 14 august.pdf                      8.7s  conf=92%
TOTAL TIME: 45.2s
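
The two phases in the output above (submit everything, then wait) follow a common pattern. The sketch below shows that pattern with a thread pool; it is an assumption about the general shape, not the actual logic of `test_receipts_parallel.py`, and `fake_ocr_job` is a hypothetical stand-in that sleeps instead of calling the backend.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_ocr_job(filename: str) -> dict:
    """Hypothetical stand-in for one OCR request; sleeps instead of calling the backend."""
    time.sleep(0.05)
    return {"filename": filename, "ok": True}


def run_parallel(filenames, job=fake_ocr_job, max_workers: int = 8):
    """Submit all jobs first, then collect results as they complete."""
    started = time.perf_counter()
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Phase 1: submit every job without waiting for any result.
        futures = {pool.submit(job, name): name for name in filenames}
        # Phase 2: gather results in completion order, not submission order.
        for future in as_completed(futures):
            results.append(future.result())
    return results, time.perf_counter() - started


if __name__ == "__main__":
    results, elapsed = run_parallel(["abonament kineterra.pdf", "benzina 14 august.pdf"])
    for result in results:
        print(f"  OK: {result['filename']}")
    print(f"TOTAL TIME: {elapsed:.1f}s")
```

Because results arrive in completion order, the per-file lines in a real run need not match the order in which jobs were submitted.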

4. Debug Raw OCR Text

# Show the raw text extracted by OCR
python tests/ocr-validation/get_raw_ocr_text.py

# Or for a specific file
python tests/ocr-validation/get_raw_ocr_text.py tests/fixtures/ocr-samples/benzina\ 14\ august.pdf

Expected Receipts Format

expected_receipts.json contains the ground truth for each receipt:

{
  "receipts": [
    {
      "id": "receipt_01",
      "filename": "abonament kineterra.pdf",
      "furnizor": "KINETERRA CONCEPT SRL",
      "cui_furnizor": "31180432",
      "total": 1900.0,
      "tva_details": [],
      "total_tva": 0.0,
      "data_bon": "2025-11-10",
      "notes": "Neplatitor TVA - abonament terapie"
    }
  ]
}
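
A quick structural check can catch malformed entries before they reach the OCR tests. The sketch below is an assumption-laden illustration: it treats every field in the sample above except `notes` as required, and the `load_receipts` and `validate_receipt` helpers are hypothetical names, not part of the test suite.

```python
import json
from datetime import date

# Assumed required fields, taken from the sample entry above ("notes" treated as optional).
REQUIRED_KEYS = {
    "id", "filename", "furnizor", "cui_furnizor",
    "total", "tva_details", "total_tva", "data_bon",
}


def load_receipts(text: str) -> list[dict]:
    """Return the list under the top-level "receipts" key."""
    return json.loads(text)["receipts"]


def validate_receipt(entry: dict) -> list[str]:
    """Return a list of problems found in one ground-truth entry (empty if none)."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(entry))]
    if problems:
        return problems
    for key in ("total", "total_tva"):
        if isinstance(entry[key], bool) or not isinstance(entry[key], (int, float)):
            problems.append(f"{key} must be a number")
    if not isinstance(entry["tva_details"], list):
        problems.append("tva_details must be a list")
    try:
        date.fromisoformat(entry["data_bon"])
    except (TypeError, ValueError):
        problems.append("data_bon must be an ISO date (YYYY-MM-DD)")
    return problems
```

Running `validate_receipt` over every entry returned by `load_receipts` gives a per-receipt list of problems that can be printed before any OCR job is submitted.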

Adding new receipts for testing

Put the PDF in tests/fixtures/ocr-samples/
Add an entry to expected_receipts.json with the correct values

Run the test:

python tests/ocr-validation/ocr-direct-validation.py --receipt receipt_XX
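
After adding an entry, it can help to confirm that every `filename` in expected_receipts.json has a matching PDF in the fixtures directory. The following is a minimal sketch with a hypothetical helper, `missing_fixtures`; the location of expected_receipts.json passed in the usage note is an assumption.

```python
import json
from pathlib import Path


def missing_fixtures(expected_path, fixtures_dir) -> list[str]:
    """Return filenames listed in the ground-truth file that have no PDF on disk."""
    data = json.loads(Path(expected_path).read_text(encoding="utf-8"))
    fixtures = Path(fixtures_dir)
    return [
        receipt["filename"]
        for receipt in data["receipts"]
        if not (fixtures / receipt["filename"]).is_file()
    ]
```

For example, `missing_fixtures("tests/ocr-validation/expected_receipts.json", "tests/fixtures/ocr-samples")` would return an empty list when every entry has its PDF, assuming the JSON file sits next to the test scripts.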

Troubleshooting

"Connection refused" sau "Failed to connect"

The backend is not running. Start it with ./start-prod.sh

"401 Unauthorized"

The JWT token is invalid. Check JWT_SECRET_KEY in .env

"File not found"

Check that the PDFs are in tests/fixtures/ocr-samples/

Incorrect results

Use get_raw_ocr_text.py to see what text the OCR extracts
Check that the receipt is legible and of good quality

Performance Notes

doctr_plus engine: ~8-15 seconds per receipt (GPU accelerated)
tesseract engine: ~3-5 seconds per receipt (CPU only)
The parallel test can process ~26 receipts in ~45 seconds (vs ~5 minutes sequentially)
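
The sequential estimate follows from the per-receipt figures. The short calculation below is a sketch that takes the doctr_plus range and the 26-receipt, 45.2-second run from the sample output above as its inputs; the variable names are illustrative.

```python
receipts = 26
per_receipt_s = (8, 15)      # doctr_plus range, seconds per receipt
parallel_total_s = 45.2      # TOTAL TIME from the sample parallel run

# Sequential time is simply receipts x per-receipt time.
sequential_s = tuple(receipts * t for t in per_receipt_s)
speedup = tuple(s / parallel_total_s for s in sequential_s)

print(f"sequential: {sequential_s[0] / 60:.1f}-{sequential_s[1] / 60:.1f} min")
print(f"speedup:    {speedup[0]:.1f}x-{speedup[1]:.1f}x")
# prints:
# sequential: 3.5-6.5 min
# speedup:    4.6x-8.6x
```

The quoted ~5 minutes sits inside that 3.5 to 6.5 minute range.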