# OCR Validation Tests

Tests that validate the accuracy of OCR extraction from fiscal receipts.

---

## Prerequisites

1. **The backend must be running** at `http://localhost:8000`.
2. **The Data Entry module must be enabled** in `.env`: `MODULE_DATA_ENTRY_ENABLED=true`
3. **`JWT_SECRET_KEY` must be set** (or rely on the default test key).

```bash
# Start the backend
cd /workspace/roa2web
./start.sh prod
# or
./start.sh test
```
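A quick pre-flight check can catch the setup problems listed above before any OCR test runs. The sketch below is hypothetical (no such helper exists in the repository): it parses simple `KEY=value` lines from `.env`, reports whether `MODULE_DATA_ENTRY_ENABLED` is `true` and whether `JWT_SECRET_KEY` is present, and tests whether anything is listening on the backend's host and port. It only opens a TCP connection, so a `True` result does not prove the API itself is healthy. From the repository root you could call, for example, `env_problems(open(".env").read())` and `backend_reachable()`.

```python
import socket
from urllib.parse import urlparse


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Strip optional surrounding quotes from the value.
        env[key.strip()] = value.strip().strip("\"'")
    return env


def env_problems(text: str) -> list:
    """List problems with the .env settings this README requires."""
    env = parse_env(text)
    problems = []
    if env.get("MODULE_DATA_ENTRY_ENABLED", "").lower() != "true":
        problems.append("MODULE_DATA_ENTRY_ENABLED is not true")
    if not env.get("JWT_SECRET_KEY"):
        # Not necessarily fatal: a default test key may be used instead.
        problems.append("JWT_SECRET_KEY is not set")
    return problems


def backend_reachable(url: str = "http://localhost:8000", timeout: float = 1.0) -> bool:
    """True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```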

---

## Test Files

| File | Purpose |
|------|---------|
| `expected_receipts.json` | Expected values for each receipt (ground truth) |
| `ocr-direct-validation.py` | Individual test with detailed comparison |
| `test_receipts_sequential.py` | Runs all receipts sequentially |
| `test_receipts_parallel.py` | Runs all receipts in parallel (performance test) |
| `test_receipts_parallel_windows.py` | Windows version with memory tracking |
| `get_raw_ocr_text.py` | Debug tool: prints the raw OCR text |

**Fixtures:** `tests/fixtures/ocr-samples/` contains 30 PDF fiscal receipts.
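Each PDF in the fixtures folder needs a matching entry in `expected_receipts.json` (and vice versa), so a small consistency check can catch a missing or misspelled `filename`. This is a hypothetical helper: it assumes the JSON layout shown under *Expected Receipts Format* (a top-level `receipts` list whose entries carry a `filename` key) and lowercase `.pdf` extensions. Assuming `expected_receipts.json` sits next to the test scripts, `fixture_mismatches(Path("tests/ocr-validation/expected_receipts.json"), Path("tests/fixtures/ocr-samples"))` returns the PDFs without an entry and the entries without a PDF.

```python
import json
from pathlib import Path


def fixture_mismatches(expected_json: Path, fixtures_dir: Path) -> tuple:
    """Compare PDFs on disk with the filenames listed in expected_receipts.json.

    Returns (pdfs_without_entry, entries_without_pdf), both sorted.
    """
    data = json.loads(expected_json.read_text(encoding="utf-8"))
    listed = {receipt["filename"] for receipt in data["receipts"]}
    # glob() is case-sensitive on Linux, so "*.PDF" files would be missed.
    on_disk = {path.name for path in fixtures_dir.glob("*.pdf")}
    return sorted(on_disk - listed), sorted(listed - on_disk)
```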

---

## How to Run the Tests

### 1. Individual Test (recommended for debugging)

```bash
cd /workspace/roa2web

# Test all receipts with the doctr_plus engine
python tests/ocr-validation/ocr-direct-validation.py

# Test with a specific engine
python tests/ocr-validation/ocr-direct-validation.py --engine doctr_plus
python tests/ocr-validation/ocr-direct-validation.py --engine tesseract

# Test a single receipt
python tests/ocr-validation/ocr-direct-validation.py --receipt receipt_01

# Also include multi-page receipts
python tests/ocr-validation/ocr-direct-validation.py --include-multipage
```
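For reference, the options used above can be written down as an `argparse` definition. This is a hypothetical sketch that documents the flags, not the script's actual source: the `choices` list covers only the two engines named in this README, and treating `doctr_plus` as the default is inferred from the first command, which passes no `--engine`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the flags shown for ocr-direct-validation.py."""
    parser = argparse.ArgumentParser(
        description="Validate OCR output against expected_receipts.json")
    parser.add_argument("--engine", choices=["doctr_plus", "tesseract"],
                        default="doctr_plus",
                        help="OCR engine (assumed default: doctr_plus)")
    parser.add_argument("--receipt", metavar="ID",
                        help="run only this receipt id, e.g. receipt_01")
    parser.add_argument("--include-multipage", action="store_true",
                        help="also process multi-page receipts")
    return parser
```

Under this sketch, running with no flags is equivalent to `--engine doctr_plus` with multi-page receipts excluded.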

### 2. Sequential Test (all receipts, one at a time)

```bash
python tests/ocr-validation/test_receipts_sequential.py
```

Example output:

```
Processing: abonament kineterra.pdf
✓ Total: MATCH (1900.0 = 1900.0)
✓ Date: MATCH (2025-11-10)
✗ CUI: MISMATCH (expected: 31180432, got: 3118043)
```
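The sample output reports one line per field. A minimal sketch of that comparison follows, assuming the compared fields are `total`, `data_bon` and `cui_furnizor` from `expected_receipts.json`, that totals match within a small tolerance, and that other fields must match exactly; the real script's rules are not documented here.

```python
def compare_fields(expected: dict, actual: dict, tolerance: float = 0.01) -> list:
    """Build MATCH/MISMATCH lines for the fields shown in the sample output."""
    lines = []
    checks = [("Total", "total"), ("Date", "data_bon"), ("CUI", "cui_furnizor")]
    for label, key in checks:
        exp, got = expected.get(key), actual.get(key)
        if key == "total" and exp is not None and got is not None:
            # Amounts are compared numerically, within the tolerance.
            ok = abs(float(exp) - float(got)) <= tolerance
            detail = f"{float(exp)} = {float(got)}"
        else:
            # Everything else must match exactly.
            ok = exp == got
            detail = str(exp)
        if ok:
            lines.append(f"✓ {label}: MATCH ({detail})")
        else:
            lines.append(f"✗ {label}: MISMATCH (expected: {exp}, got: {got})")
    return lines
```

With the `receipt_01` values and an extracted CUI of `3118043`, `compare_fields` returns exactly the three result lines shown in the sample output.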

### 3. Parallel Test (performance benchmark)

```bash
python tests/ocr-validation/test_receipts_parallel.py
```

Example output (excerpt):

```
PARALLEL TEST: 26 receipts
Phase 1: Submitting all jobs...
Submitted 26 jobs in 2.3s
Phase 2: Waiting for results...
OK: abonament kineterra.pdf 12.3s conf=95%
OK: benzina 14 august.pdf 8.7s conf=92%
TOTAL TIME: 45.2s
```
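The output suggests a two-phase design: submit every receipt first, then collect results as they finish. The sketch below reproduces that shape locally with `concurrent.futures`; `process` is a placeholder for whatever sends one receipt to the backend and waits for its result, and `max_workers=8` is an arbitrary assumption. Threads fit this pattern because the OCR work runs in the backend, so the client mostly waits on I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def _timed(process, name):
    """Run process(name) and return (result, elapsed_seconds)."""
    t0 = time.perf_counter()
    result = process(name)
    return result, time.perf_counter() - t0


def run_parallel(filenames, process, max_workers=8):
    """Phase 1 submits every file; phase 2 collects results in completion order.

    Returns ({filename: (result, seconds)}, total_wall_clock_seconds).
    """
    start = time.perf_counter()
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Phase 1: submit all jobs without waiting on any of them.
        futures = {pool.submit(_timed, process, name): name for name in filenames}
        # Phase 2: handle each job as soon as it finishes.
        for future in as_completed(futures):
            name = futures[future]
            # result() re-raises any exception raised inside process().
            results[name] = future.result()
            print(f"OK: {name} {results[name][1]:.1f}s")
    return results, time.perf_counter() - start
```

Because results are gathered with `as_completed`, a slow receipt does not delay reporting of faster ones, and the total time approaches that of the slowest job rather than the sum of all jobs, provided the backend can actually process them concurrently.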

### 4. Debug Raw OCR Text

```bash
# Show the raw text extracted by OCR
python tests/ocr-validation/get_raw_ocr_text.py

# Or for a specific file
python tests/ocr-validation/get_raw_ocr_text.py tests/fixtures/ocr-samples/benzina\ 14\ august.pdf
```
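When a field is wrong, the raw text tells you which stage failed: if the expected value is absent from the raw text, OCR misread it; if it is present, the field-extraction step picked the wrong token. The hypothetical helper below automates that check for one receipt. It ignores whitespace and tries a few assumed print formats (`1900,00` or `1900.00` for totals, `DD.MM.YYYY` or `DD/MM/YYYY` for dates). Plain substring matching can produce false positives, so treat a hit as a hint rather than proof.

```python
import re
from datetime import date


def locate_expected_values(raw_text: str, expected: dict) -> dict:
    """Report, per field, whether the expected value appears in the raw OCR text."""
    # Remove all whitespace so values split across tokens ("3118 0432") still match.
    compact = re.sub(r"\s+", "", raw_text)
    found = {}
    for key in ("cui_furnizor", "total", "data_bon"):
        value = expected.get(key)
        if value is None:
            continue
        if key == "total":
            # Assumed print formats: decimal point or decimal comma, two decimals.
            fixed = f"{float(value):.2f}"
            candidates = {fixed, fixed.replace(".", ",")}
        elif key == "data_bon":
            # Expected dates are ISO; receipts may print DD.MM.YYYY or DD/MM/YYYY.
            d = date.fromisoformat(value)
            candidates = {value, d.strftime("%d.%m.%Y"), d.strftime("%d/%m/%Y")}
        else:
            candidates = {str(value)}
        found[key] = any(c in compact for c in candidates)
    return found
```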

---

## Expected Receipts Format

`expected_receipts.json` contains the ground truth for each receipt. Field names are in Romanian: `furnizor` is the supplier, `cui_furnizor` the supplier's fiscal identification code (CUI), `tva_details` and `total_tva` the VAT (TVA) breakdown and total, `data_bon` the receipt date, and `notes` a free-text remark.

```json
{
  "receipts": [
    {
      "id": "receipt_01",
      "filename": "abonament kineterra.pdf",
      "furnizor": "KINETERRA CONCEPT SRL",
      "cui_furnizor": "31180432",
      "total": 1900.0,
      "tva_details": [],
      "total_tva": 0.0,
      "data_bon": "2025-11-10",
      "notes": "Neplatitor TVA - abonament terapie"
    }
  ]
}
```
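Before committing a new entry, a quick structural check helps. This hypothetical validator treats every field in the example above except `notes` as required, checks each field's JSON type, and verifies that `data_bon` is an ISO `YYYY-MM-DD` date; whether the real tests require exactly these fields is an assumption.

```python
from datetime import date

# Assumed required fields and their JSON types, taken from the example entry.
REQUIRED = {
    "id": str, "filename": str, "furnizor": str, "cui_furnizor": str,
    "total": (int, float), "tva_details": list,
    "total_tva": (int, float), "data_bon": str,
}


def entry_errors(entry: dict) -> list:
    """Return a list of problems with one expected_receipts.json entry."""
    errors = []
    for key, expected_type in REQUIRED.items():
        if key not in entry:
            errors.append(f"missing field: {key}")
        elif not isinstance(entry[key], expected_type):
            errors.append(f"wrong type for {key}: {type(entry[key]).__name__}")
    if isinstance(entry.get("data_bon"), str):
        try:
            date.fromisoformat(entry["data_bon"])
        except ValueError:
            errors.append("data_bon is not an ISO date (YYYY-MM-DD)")
    if isinstance(entry.get("filename"), str) and not entry["filename"].lower().endswith(".pdf"):
        errors.append("filename should point to a PDF")
    return errors
```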

---

## Adding New Receipts for Testing

1. Put the PDF in `tests/fixtures/ocr-samples/`.
2. Add an entry to `expected_receipts.json` with the correct values.
3. Run the test, replacing `receipt_XX` with the new entry's `id`:

   ```bash
   python tests/ocr-validation/ocr-direct-validation.py --receipt receipt_XX
   ```
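Step 2 can be scripted. The hypothetical helper below appends an entry to `expected_receipts.json` and returns the id to pass to `--receipt`, assuming ids follow the zero-padded `receipt_NN` pattern seen in `receipt_01`. It refuses duplicate filenames, and it rewrites the whole file with two-space indentation, which may reformat existing entries.

```python
import json
import re
from pathlib import Path


def next_receipt_id(receipts) -> str:
    """Return the next id in the receipt_NN sequence (at least two digits)."""
    numbers = []
    for receipt in receipts:
        match = re.fullmatch(r"receipt_(\d+)", receipt.get("id", ""))
        if match:
            numbers.append(int(match.group(1)))
    return f"receipt_{max(numbers, default=0) + 1:02d}"


def add_expected_receipt(json_path: Path, entry: dict) -> str:
    """Append `entry` with a fresh id and return that id."""
    data = json.loads(json_path.read_text(encoding="utf-8"))
    if any(r.get("filename") == entry["filename"] for r in data["receipts"]):
        raise ValueError(f"{entry['filename']} already has an entry")
    new_id = next_receipt_id(data["receipts"])
    data["receipts"].append({"id": new_id, **entry})
    # ensure_ascii=False keeps Romanian diacritics readable in the file.
    json_path.write_text(json.dumps(data, indent=2, ensure_ascii=False) + "\n",
                         encoding="utf-8")
    return new_id
```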

---

## Troubleshooting

### "Connection refused" or "Failed to connect"

- The backend is not running. Start it with `./start.sh prod`.

### "401 Unauthorized"

- The JWT token is invalid. Check `JWT_SECRET_KEY` in `.env`.
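To narrow down a 401, it can help to inspect the token's claims. The sketch below decodes a JWT payload without verifying its signature (for debugging only) and checks the standard `exp` claim. If the token is not expired, a signature mismatch, typically a token signed with a different `JWT_SECRET_KEY` than the backend uses, is the next suspect. How the tests obtain their token is not described here, so where the token string comes from is left open.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode a JWT's payload WITHOUT verifying its signature (debugging only)."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_expired(claims: dict, now: Optional[float] = None) -> bool:
    """True if the standard `exp` claim (seconds since the epoch) has passed."""
    now = time.time() if now is None else now
    return "exp" in claims and claims["exp"] <= now
```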

### "File not found"

- Check that the PDFs are in `tests/fixtures/ocr-samples/`.

### Incorrect results

- Use `get_raw_ocr_text.py` to see the text that OCR extracts.
- Check that the receipt is legible and of good quality.
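For CUI mismatches like the `3118043` vs `31180432` case in the sequential test output, a checksum test can flag misreads automatically. Romanian CUI/CIF codes end in a control digit computed with the key `753217532`; the hypothetical helper below implements that standard check. `31180432` passes while the truncated `3118043` fails, so a dropped digit is caught. A valid checksum only shows that a CUI is well-formed, not that it is the supplier's.

```python
import re


def cui_is_valid(cui: str) -> bool:
    """Check a Romanian CUI/CIF against its control digit (key 753217532).

    The last digit is the control digit. The remaining digits are left-padded
    to 9, multiplied position-wise by the key and summed; the control digit
    must equal (sum * 10) % 11, with a result of 10 mapped to 0.
    """
    digits = cui.strip().upper().removeprefix("RO").strip()
    if not re.fullmatch(r"[0-9]{2,10}", digits):
        return False
    body, control = digits[:-1].zfill(9), int(digits[-1])
    total = sum(int(d) * int(k) for d, k in zip(body, "753217532"))
    expected = total * 10 % 11
    return control == (0 if expected == 10 else expected)
```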

---

## Performance Notes

- **doctr_plus** engine: ~8-15 seconds per receipt (GPU-accelerated)
- **tesseract** engine: ~3-5 seconds per receipt (CPU only)
- The parallel test can process ~26 receipts in ~45 seconds (vs. ~5 minutes sequentially).
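
As a rough consistency check of these figures, assuming the ~5-minute sequential time refers to the doctr_plus engine:

```math
\begin{aligned}
T_{\text{seq}} &\approx 26 \times (8\text{ to }15)\,\text{s} = 208\text{ to }390\,\text{s} \approx 3.5\text{ to }6.5\,\text{min} \\
\text{speedup} &= \frac{T_{\text{seq}}}{T_{\text{par}}} \approx \frac{300\,\text{s}}{45\,\text{s}} \approx 6.7
\end{aligned}
```

The ~5-minute sequential figure therefore falls inside the range implied by the per-receipt timings.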