fix(ocr): Fix store profile extraction patterns and module loading

Major fixes to OCR store profiles for Romanian receipt extraction:

- Fix ProfileRegistry module path resolution (was loading 0 profiles)
- Add multiline TVA extraction for Brick, Electrobering, Gama Ink
- Add "CARTE CREDIT" payment detection for OMV/SOCAR gas stations
- Handle OCR artifacts: TVA→TUA, "-"→"4", I→L in CUI markers
- Add client CUI patterns for Brick receipts
- Add profile selection logging to ocr_extractor.py
- Create test script for all 29 PDFs (test_all_profiles.py)

Test results: 13/29 passing (improved from 9/29)
Remaining failures are primarily OCR quality issues.
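
A minimal sketch of the kind of OCR-tolerant patterns described above, not
the actual profile code. The names `TVA_ENTRY`, `CUI_MARKER`, `extract_tva`,
`extract_cui` and `paid_by_card` are hypothetical, and the assumed receipt
layout (a `TVA <letter> <rate>%` label with the amount on the same or the
next line) is an illustration only. The "-"→"4" artifact is not covered.

```python
import re

# Assumption: a TVA entry looks like "TVA A 21% 7.71", possibly split over
# two lines. [VU] tolerates the OCR misread TVA -> TUA, and \s* also matches
# a newline, which gives the multiline behaviour.
TVA_ENTRY = re.compile(r"T[VU]A\s+[A-Z]?\s*(\d{1,2})\s*%\s*(\d+[.,]\d{2})")

# Assumption: the tax ID is introduced by "CUI" or "CIF"; [IL] tolerates the
# OCR misread I -> L (e.g. "CUL", "CLF").
CUI_MARKER = re.compile(r"C(?:U[IL]|[IL]F)\s*:?\s*(RO)?\s*(\d{2,10})")

# "CARTE CREDIT" as printed on some gas station receipts.
CARD_PAYMENT = re.compile(r"CARTE\s+CREDIT", re.IGNORECASE)


def extract_tva(text):
    """Return (rate, value) pairs for every TVA entry found in the text."""
    return [
        (int(rate), float(value.replace(",", ".")))
        for rate, value in TVA_ENTRY.findall(text)
    ]


def extract_cui(text):
    """Return the first tax ID found (with the RO prefix if present), or None."""
    match = CUI_MARKER.search(text)
    if match is None:
        return None
    return (match.group(1) or "") + match.group(2)


def paid_by_card(text):
    """Return True if the text contains a 'CARTE CREDIT' payment line."""
    return CARD_PAYMENT.search(text) is not None
```

For example, `extract_tva("TUA A 21%\n7,71")` would treat the misread label
and the wrapped amount as a single 21% entry.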

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Claude Agent
2026-01-07 09:40:58 +00:00
parent 099556213d
commit 28f259cd05
13 changed files with 1531 additions and 257 deletions


@@ -617,11 +617,36 @@
 "data_bon": "2024-05-23",
 "numar_bon": "000004",
 "notes": "Duplicat cheie yala - NUMERAR"
 },
+{
+"id": "receipt_29",
+"filename": "Lidl personal 4 ianuarie .pdf",
+"furnizor": "LIDL DISCOUNT S.R.L.",
+"cui_furnizor": "RO22891860",
+"client": null,
+"cui_client": null,
+"total": 65.86,
+"tva_details": [
+{
+"rate": 21,
+"value": 7.71
+},
+{
+"rate": 11,
+"value": 2.13
+}
+],
+"total_tva": 9.84,
+"card": 65.86,
+"numerar": 0.0,
+"data_bon": "2026-01-04",
+"numar_bon": "00634",
+"notes": "Lidl multi-rate TVA test: A=21% (7.71), B=11% (2.13). FARA CIF CLIENT!"
+}
 ],
 "metadata": {
-"total_receipts": 30,
-"total_files": 28,
+"total_receipts": 31,
+"total_files": 29,
 "extracted_by": "Claude - manual extraction",
 "extraction_date": "2026-01-01",
 "notes": "Some PDF files contain multiple receipts (pages)"
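
A ground-truth entry such as `receipt_29` above can be sanity-checked before
it is used as a test expectation. The sketch below is a hypothetical helper,
not part of this commit. It assumes that `total_tva` should equal the sum of
`tva_details` and that `card` plus `numerar` should equal `total`, which may
not hold for receipts paid by other means.

```python
import math


def check_receipt(entry, tol=0.01):
    """Return a list of consistency problems found in one ground-truth entry."""
    problems = []

    # Assumption: total_tva is the sum of the per-rate TVA values.
    tva_sum = sum(detail["value"] for detail in entry.get("tva_details", []))
    if not math.isclose(tva_sum, entry["total_tva"], abs_tol=tol):
        problems.append(
            f"tva_details sum {tva_sum:.2f} != total_tva {entry['total_tva']:.2f}"
        )

    # Assumption: card and cash (numerar) together cover the whole total.
    paid = entry["card"] + entry["numerar"]
    if not math.isclose(paid, entry["total"], abs_tol=tol):
        problems.append(f"card + numerar {paid:.2f} != total {entry['total']:.2f}")

    return problems


# Values copied from the receipt_29 entry in the hunk above.
receipt_29 = {
    "total": 65.86,
    "tva_details": [{"rate": 21, "value": 7.71}, {"rate": 11, "value": 2.13}],
    "total_tva": 9.84,
    "card": 65.86,
    "numerar": 0.0,
}
```

Calling `check_receipt(receipt_29)` returns an empty list when the entry is
internally consistent under these assumptions.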