fix(ocr): Fix store profile extraction patterns and module loading
Major fixes to OCR store profiles for Romanian receipt extraction:

- Fix ProfileRegistry module path resolution (was loading 0 profiles)
- Add multiline TVA extraction for Brick, Electrobering, Gama Ink
- Add "CARTE CREDIT" payment detection for OMV/SOCAR gas stations
- Handle OCR artifacts: TVA→TUA, "-"→"4", I→L in CUI markers
- Add client CUI patterns for Brick receipts
- Add profile selection logging to ocr_extractor.py
- Create test script for all 29 PDFs (test_all_profiles.py)

Test results: 13/29 passing (improved from 9/29)
Remaining failures are primarily OCR quality issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
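For readers unfamiliar with the OCR artifacts listed above, here is a minimal sketch of what an artifact-tolerant, multiline TVA pattern could look like. It is not the pattern used in the store profiles: the names `TVA_LINE` and `extract_tva` are hypothetical, and the assumed receipt layout (label, optional rate letter, percentage, then the amount on the same or the next line) is an illustration only.

```python
import re

# Hypothetical pattern, NOT the one in the repository.
# - T[VU]A accepts "TVA" as well as the OCR misread "TUA".
# - An optional rate letter (A-D) may follow the label.
# - \s* also matches newlines, so the amount may sit on the next line.
TVA_LINE = re.compile(
    r"T[VU]A\s*(?:[A-D]\s*)?[-=:]?\s*(?P<rate>\d{1,2})(?:[.,]\d+)?\s*%"
    r"\s*(?P<value>\d+[.,]\d{2})",
    re.IGNORECASE,
)


def extract_tva(text):
    """Return a list of {"rate": int, "value": float} found in OCR text."""
    results = []
    for match in TVA_LINE.finditer(text):
        results.append(
            {
                "rate": int(match.group("rate")),
                # Romanian receipts often use a decimal comma.
                "value": float(match.group("value").replace(",", ".")),
            }
        )
    return results


# Assumed sample text: first amount on the next line, label misread as "TUA".
sample = "TUA A 21%\n7,71\nTVA B 11% 2.13"
tva_details = extract_tva(sample)
```

The result uses the same `rate`/`value` shape as the `tva_details` entries in the test fixture, which keeps extracted values directly comparable with the expected ones.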
@@ -617,11 +617,36 @@
       "data_bon": "2024-05-23",
       "numar_bon": "000004",
       "notes": "Duplicat cheie yala - NUMERAR"
+    },
+    {
+      "id": "receipt_29",
+      "filename": "Lidl personal 4 ianuarie .pdf",
+      "furnizor": "LIDL DISCOUNT S.R.L.",
+      "cui_furnizor": "RO22891860",
+      "client": null,
+      "cui_client": null,
+      "total": 65.86,
+      "tva_details": [
+        {
+          "rate": 21,
+          "value": 7.71
+        },
+        {
+          "rate": 11,
+          "value": 2.13
+        }
+      ],
+      "total_tva": 9.84,
+      "card": 65.86,
+      "numerar": 0.0,
+      "data_bon": "2026-01-04",
+      "numar_bon": "00634",
+      "notes": "Lidl multi-rate TVA test: A=21% (7.71), B=11% (2.13). FARA CIF CLIENT!"
     }
   ],
   "metadata": {
-    "total_receipts": 30,
-    "total_files": 28,
+    "total_receipts": 31,
+    "total_files": 29,
     "extracted_by": "Claude - manual extraction",
     "extraction_date": "2026-01-01",
     "notes": "Some PDF files contain multiple receipts (pages)"
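The new `receipt_29` entry is internally consistent: the per-rate TVA values sum to `total_tva` (7.71 + 2.13 = 9.84), and `card` plus `numerar` equals `total` (65.86 + 0.0 = 65.86). A minimal sketch of how such a check could be automated for fixture entries follows; `check_receipt` and its tolerance are hypothetical and are not part of the test script mentioned in the commit message.

```python
import math


def check_receipt(receipt, tol=0.005):
    """Return a list of consistency problems for one fixture entry (hypothetical helper)."""
    problems = []

    # The per-rate TVA values should add up to the declared TVA total.
    tva_sum = sum(item["value"] for item in receipt["tva_details"])
    if not math.isclose(tva_sum, receipt["total_tva"], abs_tol=tol):
        problems.append(f"tva_details sum {tva_sum:.2f} != total_tva {receipt['total_tva']:.2f}")

    # Card and cash payments together should cover the receipt total.
    paid = receipt["card"] + receipt["numerar"]
    if not math.isclose(paid, receipt["total"], abs_tol=tol):
        problems.append(f"card + numerar {paid:.2f} != total {receipt['total']:.2f}")

    return problems


# Values copied from the receipt_29 entry added in this diff.
receipt_29 = {
    "total": 65.86,
    "tva_details": [{"rate": 21, "value": 7.71}, {"rate": 11, "value": 2.13}],
    "total_tva": 9.84,
    "card": 65.86,
    "numerar": 0.0,
}

problems = check_receipt(receipt_29)
```

Comparing with an absolute tolerance of half a cent avoids false alarms from floating-point rounding when the amounts are summed.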