feat(ocr): Add modular store profiles with hot-reload support

## Store Profiles System
- Add ProfileRegistry for CUI-based profile lookup
- Add BaseStoreProfile with generic extraction patterns
- Implement hot-reload via POST /api/data-entry/ocr/profiles/reload

## 12 Store Profiles
- LIDL: Multi-rate TVA (A, B, C, D codes)
- OMV, SOCAR: B2B with client CUI, YYYY.MM.DD dates
- BRICK, DEDEMAN: Standard TVA, e-factura support
- KINETERRA, BEST PRINT: Non-VAT payers (returns [])
- STEPOUT MARKET: TVA 5% (books/reduced rate)
- UNLIMITED KEYS: NUMERAR payment detection
- GAMA INK, ELECTROBERING, PICTUS VELUM: Standard TVA

## Flexible TVA Patterns
- All patterns use (\d{1,2})% to accept any rate
- Supports historical (19%, 9%, 5%) and current (21%, 11%)

## Payment Methods Fix
- Fixed base.py to support multiple payments of same type
- Changed deduplication from method-only to (method, amount) tuple
- Returns separate entries for split payments

## Tools
- Add generate_store_profile.py for automatic profile generation
- Analyzes PDFs via OCR API and detects patterns

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Claude Agent
2026-01-06 23:07:07 +00:00
parent 67b0082df0
commit 099556213d
25 changed files with 3707 additions and 114 deletions

View File

@@ -0,0 +1,258 @@
# Store Profiles - OCR Extraction
Sistem de profile specifice pentru extracție OCR cu hot-reload.
---
## Quick Start: Adaugă un profil nou
```bash
# 1. Generează profil din PDF-uri (dry-run pentru preview)
python scripts/generate_store_profile.py \
--name "Magazin Nou SRL" \
--cui "12345678" \
--receipts "docs/data-entry/MagazinNou*.pdf" \
--dry-run
# 2. Generează și salvează
python scripts/generate_store_profile.py \
--name "Magazin Nou SRL" \
--cui "12345678" \
--receipts "docs/data-entry/MagazinNou*.pdf" \
--output backend/modules/data_entry/services/ocr/profiles/magazin_nou.py
# 3. Hot-reload (fără restart server)
curl -X POST http://localhost:8000/api/data-entry/ocr/profiles/reload
# 4. Verifică
curl http://localhost:8000/api/data-entry/ocr/profiles
```
---
## Structura directorului
```
profiles/
├── __init__.py # ProfileRegistry + hot-reload (~390 linii)
├── base.py # BaseStoreProfile + pattern-uri generice (~410 linii)
├── lidl.py # Multi-rate TVA (A/B)
├── omv.py # B2B, date YYYY.MM.DD
├── socar.py # B2B, date YYYY.MM.DD
├── brick.py # Standard TVA
├── dedeman.py # E-factura support
├── kineterra.py # Non-VAT payer
├── gama_ink.py # Standard TVA (toner/cartușe)
├── electrobering.py # Standard TVA (electronice)
├── pictus_velum.py # Standard TVA (rechizite)
├── unlimited_keys.py # Standard TVA, NUMERAR payment
├── best_print.py # Non-VAT payer (neplătitor TVA)
├── stepout_market.py # TVA 5% (cărți/librărie)
└── README.md # Acest fișier
```
---
## Profile existente (12 profile)
> **Note**: Pattern-urile TVA sunt **flexibile** și acceptă ORICE cotă (5%, 9%, 11%, 19%, 21%, etc.)
> pentru a gestiona atât datele istorice cât și schimbările viitoare ale legislației.
| Magazin | CUI | Fișier | Caracteristici |
|---------|-----|--------|----------------|
| LIDL DISCOUNT S.R.L. | 22891860 | `lidl.py` | Multi-rate TVA (coduri A, B, C, D) |
| OMV PETROM MARKETING S.R.L. | 11201891 | `omv.py` | B2B (client CUI), date YYYY.MM.DD |
| SOCAR PETROLEUM S.A. | 12546600 | `socar.py` | B2B (client CUI), date YYYY.MM.DD |
| FIVE-HOLDING S.A. (BRICK) | 10562600 | `brick.py` | Standard TVA |
| DEDEMAN SRL | 2816464 | `dedeman.py` | E-factura support |
| KINETERRA CONCEPT SRL | 31180432 | `kineterra.py` | Non-VAT payer (returnează `[]`) |
| GAMA INK SERVICE SRL | 17741882 | `gama_ink.py` | Standard TVA (toner, cartușe) |
| ELECTROBERING S.R.L. | 2744937 | `electrobering.py` | Standard TVA (electronice) |
| PICTUS VELUM SRL | 39634534 | `pictus_velum.py` | Standard TVA (rechizite) |
| UNLIMITED KEYS S.R.L. | 18993187 | `unlimited_keys.py` | Standard TVA, **NUMERAR** plată |
| BEST PRINT TRADE ACTIV SRL | 45417955 | `best_print.py` | **Non-VAT payer** (neplătitor TVA) |
| STEPOUT MARKET SRL | 35532655 | `stepout_market.py` | TVA 5% (cărți, librărie) |
---
## API Endpoints
| Endpoint | Metodă | Descriere |
|----------|--------|-----------|
| `/api/data-entry/ocr/profiles` | GET | Lista toate profilele |
| `/api/data-entry/ocr/profiles/{cui}` | GET | Detalii profil (acceptă RO prefix) |
| `/api/data-entry/ocr/profiles/reload` | POST | Hot-reload toate profilele |
### Exemple API
```bash
# Lista profile
curl http://localhost:8000/api/data-entry/ocr/profiles \
-H "Authorization: Bearer <token>"
# Detalii profil (cu sau fără RO prefix)
curl http://localhost:8000/api/data-entry/ocr/profiles/22891860
curl http://localhost:8000/api/data-entry/ocr/profiles/RO22891860
# Hot-reload după modificări
curl -X POST http://localhost:8000/api/data-entry/ocr/profiles/reload \
-H "Authorization: Bearer <token>"
# Response reload:
{
"success": true,
"reloaded_modules": 12,
"profiles_count": 12,
"registered_cuis": ["22891860", "11201891", "12546600", "10562600", ...],
"last_reload": "2026-01-06T22:37:05.000000"
}
```
---
## Cum funcționează sistemul
### Flow de extracție
```
ReceiptExtractor.extract()
├─► STEP 1: Extrage vendor + CUI
│ └─► _extract_vendor(), _extract_cui()
├─► ProfileRegistry.get_profile(cui)
│ └─► Returnează profil specific sau None
├─► STEP 2: Extracție cu profil (dacă există)
│ ├─► profile.extract_total()
│ ├─► profile.extract_date()
│ ├─► profile.extract_receipt_number()
│ ├─► profile.extract_tva_entries()
│ ├─► profile.extract_payment_methods()
│ └─► profile.extract_client_cui()
└─► STEP 3-4: Validare + post-procesare
```
### Fallback
Dacă nu există profil pentru CUI, se folosește logica generică din `ReceiptExtractor`.
---
## Structura unui profil
```python
from .base import BaseStoreProfile
from . import ProfileRegistry
@ProfileRegistry.register
class MagazinNouProfile(BaseStoreProfile):
"""Docstring cu descriere magazin."""
CUI_LIST = ["12345678"] # Poate avea mai multe CUI-uri
NAME_PATTERNS = ["MAGAZIN", "MAGAZIN NOU", "MAG4ZIN"] # OCR variants
STORE_NAME = "Magazin Nou SRL"
# Override doar ce e diferit de base class
def extract_tva_entries(self, text: str) -> List[dict]:
# Pattern-uri specifice magazinului
...
def get_validation_hints(self) -> Dict[str, Any]:
return {
"has_multi_rate_tva": False,
"card_equals_total": True,
"has_client_cui": False,
"has_efactura": False,
"is_non_vat_payer": False,
}
```
---
## Pattern-uri disponibile în base.py
BaseStoreProfile include pattern-uri generice OCR-tolerant:
| Pattern | Descriere |
|---------|-----------|
| `TOTAL_PATTERNS` | 8 variante pentru TOTAL (TOTAL:, TOTAL DE PLATA, etc.) |
| `DATE_PATTERNS` | 6 variante (DD.MM.YYYY, YYYY-MM-DD, DD/MM/YYYY) |
| `DATE_PATTERNS_OCR_SPACES` | 4 variante cu spații OCR ("2025. 08. 14") |
| `NUMBER_PATTERNS` | 11 variante pentru număr bon (NDS, BF, C3POS) |
| `PAYMENT_PATTERNS` | 8 variante pentru CARD/NUMERAR |
| `CLIENT_MARKERS` | 6 variante pentru secțiune CLIENT |
| `CLIENT_CUI_PATTERNS` | 7 variante pentru CUI client |
### Metode implementate în base class
- `extract_total(text)``Tuple[Decimal, float]`
- `extract_date(text)``Tuple[date, float]`
- `extract_receipt_number(text)``Tuple[str, float]`
- `extract_payment_methods(text)``List[dict]`
- `extract_client_cui(text)``Tuple[str, float]`
- `extract_client_name(text)``Tuple[str, float]`
---
## Când ai nevoie de profil custom?
| Situație | Exemplu | Ce trebuie override |
|----------|---------|---------------------|
| **Multi-rate TVA** | Lidl (TVA A, TVA B) | `extract_tva_entries()` |
| **Format dată special** | OMV/Socar (YYYY.MM.DD) | `DATE_PATTERNS_OCR_SPACES` |
| **B2B receipts** | Benzinării (au client CUI) | `extract_client_cui()` |
| **Non-VAT payer** | Kineterra | `extract_tva_entries()` returnează `[]` |
| **E-factura** | Dedeman | `extract_efactura_reference()` |
---
## Decizii de design
1. **Hot-reload manual** - endpoint `/profiles/reload` apelat când se modifică fișiere
2. **Persistență în Python** - profile în Git, version controlled
3. **Fallback graceful** - dacă nu există profil, folosește logica generică
4. **CUI normalization** - gestionează automat prefixul "RO" și whitespace
5. **Deduplicare TVA** - folosește `seen = set()` pentru a evita duplicate
---
## Comenzi utile
```bash
# Verifică syntax Python pentru toate profilele
for f in backend/modules/data_entry/services/ocr/profiles/*.py; do
python3 -m py_compile "$f" && echo "✓ $(basename $f)"
done
# Lista profile
ls -la backend/modules/data_entry/services/ocr/profiles/
# Pornește backend pentru testare
cd backend && source venv/bin/activate
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 1
# Test OCR pe un PDF
curl -X POST -F "file=@docs/data-entry/test.pdf" \
-H "Authorization: Bearer <token>" \
"http://localhost:8000/api/data-entry/ocr/extract?engine=doctr_plus"
```
---
## Script generare profile
`scripts/generate_store_profile.py` - generator automat de profile
```bash
# Vezi help
python scripts/generate_store_profile.py --help
# Funcționalități:
# - Analizează PDF-uri via OCR API
# - Detectează: TVA format, date format, payment patterns, B2B
# - Generează cod Python cu OCR error variants
# - Suportă glob patterns (*.pdf)
# - Verifică sintaxa după generare
```

View File

@@ -0,0 +1,388 @@
"""
Store Profiles Registry with Hot-Reload Support.
This module provides a registry for store-specific OCR extraction profiles.
Profiles can be reloaded at runtime without restarting the server.
Usage:
from backend.modules.data_entry.services.ocr.profiles import ProfileRegistry
# Get profile for a CUI
profile = ProfileRegistry.get_profile("22891860")
if profile:
tva_entries = profile.extract_tva_entries(text)
# Reload all profiles (after file changes)
count = ProfileRegistry.reload_all()
Architecture:
- ProfileRegistry: Singleton registry with class methods
- BaseStoreProfile: Abstract base class for profiles
- @ProfileRegistry.register: Decorator for profile classes
Hot-Reload Mechanism:
1. Admin calls POST /profiles/reload endpoint
2. Registry clears instance cache
3. importlib.reload() re-executes each profile module
4. @register decorator re-registers classes with new code
"""
from __future__ import annotations
import importlib
import logging
import sys
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional, Type, TYPE_CHECKING
if TYPE_CHECKING:
from .base import BaseStoreProfile
logger = logging.getLogger(__name__)
# Directory containing profile modules
PROFILES_DIR = Path(__file__).parent
class ProfileRegistry:
"""
Registry for store-specific OCR extraction profiles.
Uses class methods for singleton-like behavior without explicit instantiation.
Supports hot-reload via importlib.reload() for runtime updates.
Attributes:
_profiles: Maps CUI -> profile class (not instance)
_instances: Maps CUI -> profile instance (lazy, cleared on reload)
_last_reload: Timestamp of last reload
_loaded: Whether initial load has been performed
"""
# Class-level storage (singleton pattern via class methods)
_profiles: Dict[str, Type["BaseStoreProfile"]] = {}
_instances: Dict[str, "BaseStoreProfile"] = {}
_last_reload: Optional[datetime] = None
_loaded: bool = False
# -------------------------------------------------------------------------
# Registration
# -------------------------------------------------------------------------
@classmethod
def register(cls, profile_class: Type["BaseStoreProfile"]) -> Type["BaseStoreProfile"]:
"""
Decorator to register a store profile class.
Registers the profile for all CUIs in the class's CUI_LIST.
Safe for re-registration during hot-reload (overwrites existing).
Usage:
@ProfileRegistry.register
class LidlProfile(BaseStoreProfile):
CUI_LIST = ["22891860"]
...
Args:
profile_class: Profile class to register
Returns:
The same class (allows use as decorator)
Raises:
ValueError: If CUI_LIST is empty
"""
cui_list = getattr(profile_class, 'CUI_LIST', [])
store_name = getattr(profile_class, 'STORE_NAME', profile_class.__name__)
if not cui_list:
logger.warning(f"Profile {profile_class.__name__} has empty CUI_LIST, skipping")
return profile_class
# Register for each CUI
for cui in cui_list:
# Normalize CUI (remove RO prefix, strip whitespace)
normalized_cui = cls._normalize_cui(cui)
if normalized_cui in cls._profiles:
old_class = cls._profiles[normalized_cui]
logger.debug(
f"Re-registering CUI {normalized_cui}: "
f"{old_class.__name__} -> {profile_class.__name__}"
)
# Clear cached instance for this CUI
cls._instances.pop(normalized_cui, None)
cls._profiles[normalized_cui] = profile_class
logger.debug(f"Registered profile {profile_class.__name__} for CUI {normalized_cui}")
logger.info(f"Registered {store_name} for CUIs: {cui_list}")
return profile_class
# -------------------------------------------------------------------------
# Lookup
# -------------------------------------------------------------------------
@classmethod
def get_profile(cls, cui: Optional[str]) -> Optional["BaseStoreProfile"]:
"""
Get profile instance for a CUI.
Uses lazy instantiation - creates instance on first access.
Returns None if no profile is registered for this CUI.
Args:
cui: CUI to lookup (with or without RO prefix)
Returns:
Profile instance or None
"""
if not cui:
return None
# Ensure profiles are loaded
if not cls._loaded:
cls._load_all_profiles()
normalized_cui = cls._normalize_cui(cui)
# Check if profile exists
profile_class = cls._profiles.get(normalized_cui)
if not profile_class:
return None
# Lazy instantiation
if normalized_cui not in cls._instances:
try:
cls._instances[normalized_cui] = profile_class()
logger.debug(f"Instantiated {profile_class.__name__} for CUI {normalized_cui}")
except Exception as e:
logger.error(f"Failed to instantiate {profile_class.__name__}: {e}")
return None
return cls._instances[normalized_cui]
@classmethod
def has_profile(cls, cui: Optional[str]) -> bool:
"""Check if a profile exists for this CUI."""
if not cui:
return False
if not cls._loaded:
cls._load_all_profiles()
return cls._normalize_cui(cui) in cls._profiles
# -------------------------------------------------------------------------
# Listing
# -------------------------------------------------------------------------
@classmethod
def list_profiles(cls) -> List[Dict]:
"""
List all registered profiles.
Returns:
List of dicts with cui, class_name, store_name, name_patterns
"""
if not cls._loaded:
cls._load_all_profiles()
result = []
seen_classes = set()
for cui, profile_class in cls._profiles.items():
# Avoid duplicates for profiles with multiple CUIs
if profile_class.__name__ in seen_classes:
continue
seen_classes.add(profile_class.__name__)
result.append({
"cuis": list(getattr(profile_class, 'CUI_LIST', [])),
"class_name": profile_class.__name__,
"store_name": getattr(profile_class, 'STORE_NAME', profile_class.__name__),
"name_patterns": list(getattr(profile_class, 'NAME_PATTERNS', [])),
})
return result
@classmethod
def get_profile_info(cls, cui: str) -> Optional[Dict]:
"""
Get detailed info about a profile.
Args:
cui: CUI to lookup
Returns:
Dict with profile details or None
"""
profile = cls.get_profile(cui)
if not profile:
return None
return {
"cui": cui,
"cuis": list(profile.CUI_LIST),
"class_name": profile.__class__.__name__,
"store_name": profile.STORE_NAME,
"name_patterns": list(profile.NAME_PATTERNS),
"validation_hints": profile.get_validation_hints(),
}
# -------------------------------------------------------------------------
# Hot-Reload
# -------------------------------------------------------------------------
@classmethod
def reload_all(cls) -> int:
"""
Hot-reload all profile modules.
Clears instance cache and reloads all .py files in profiles directory.
Decorator re-registers classes with updated code.
Returns:
Number of modules reloaded
"""
logger.info("Starting profile hot-reload...")
# Clear instance cache (will be recreated on next get_profile)
cls._instances.clear()
# Get list of profile modules (exclude __init__, base)
module_names = cls._get_profile_module_names()
count = 0
for module_name in module_names:
full_name = f"backend.modules.data_entry.services.ocr.profiles.{module_name}"
try:
if full_name in sys.modules:
# Reload existing module
importlib.reload(sys.modules[full_name])
logger.debug(f"Reloaded module: {module_name}")
else:
# Import new module
importlib.import_module(full_name)
logger.debug(f"Imported new module: {module_name}")
count += 1
except Exception as e:
logger.error(f"Failed to reload {module_name}: {e}")
cls._last_reload = datetime.utcnow()
cls._loaded = True
logger.info(f"Profile hot-reload complete: {count} modules, {len(cls._profiles)} profiles")
return count
@classmethod
def get_reload_status(cls) -> Dict:
"""Get status of the registry including last reload time."""
return {
"loaded": cls._loaded,
"last_reload": cls._last_reload.isoformat() if cls._last_reload else None,
"profiles_count": len(cls._profiles),
"instances_count": len(cls._instances),
"registered_cuis": list(cls._profiles.keys()),
}
# -------------------------------------------------------------------------
# Internal methods
# -------------------------------------------------------------------------
@classmethod
def _normalize_cui(cls, cui: str) -> str:
"""
Normalize CUI for consistent lookup.
- Removes RO prefix (with or without space)
- Strips whitespace
- Converts to uppercase
Args:
cui: Raw CUI string
Returns:
Normalized CUI (digits only)
"""
if not cui:
return ""
cui = str(cui).strip().upper()
# Remove RO prefix (handles "RO12345" and "RO 12345")
if cui.startswith("RO"):
cui = cui[2:].lstrip()
return cui.strip()
@classmethod
def _get_profile_module_names(cls) -> List[str]:
"""
Get list of profile module names from profiles directory.
Excludes __init__.py and base.py.
Returns:
List of module names (without .py extension)
"""
excluded = {"__init__", "base", "__pycache__"}
modules = []
for path in PROFILES_DIR.glob("*.py"):
name = path.stem
if name not in excluded:
modules.append(name)
return sorted(modules)
@classmethod
def _load_all_profiles(cls) -> None:
"""
Initial load of all profile modules.
Called automatically on first get_profile() if not already loaded.
"""
if cls._loaded:
return
logger.info("Loading store profiles...")
module_names = cls._get_profile_module_names()
for module_name in module_names:
full_name = f"backend.modules.data_entry.services.ocr.profiles.{module_name}"
try:
importlib.import_module(full_name)
logger.debug(f"Loaded module: {module_name}")
except Exception as e:
logger.error(f"Failed to load {module_name}: {e}")
cls._loaded = True
cls._last_reload = datetime.utcnow()
logger.info(f"Loaded {len(cls._profiles)} store profiles")
@classmethod
def clear(cls) -> None:
"""
Clear all registered profiles.
Mainly useful for testing.
"""
cls._profiles.clear()
cls._instances.clear()
cls._loaded = False
cls._last_reload = None
# -------------------------------------------------------------------------
# Module exports
# -------------------------------------------------------------------------
__all__ = [
"ProfileRegistry",
"BaseStoreProfile",
]
# Re-export BaseStoreProfile for convenience
from .base import BaseStoreProfile

View File

@@ -0,0 +1,515 @@
"""
Base class for store-specific OCR extraction profiles.
Each store can have different receipt formats (TVA layout, total position, etc.).
Store profiles allow customizing extraction logic per-store for better accuracy.
Usage:
from .base import BaseStoreProfile
from . import ProfileRegistry
@ProfileRegistry.register
class LidlProfile(BaseStoreProfile):
CUI_LIST = ["22891860"]
NAME_PATTERNS = ["LIDL", "LDL"]
def extract_tva_entries(self, text: str) -> List[dict]:
# Custom Lidl TVA extraction logic
...
"""
import re
from abc import ABC
from decimal import Decimal, InvalidOperation
from typing import List, Optional, Tuple, Dict, Any
from datetime import date
class BaseStoreProfile(ABC):
"""
Abstract base class for store-specific extraction profiles.
Each profile defines:
- CUI_LIST: CUI codes that identify this store (without RO prefix)
- NAME_PATTERNS: OCR-tolerant name patterns for fallback matching
- Custom extraction methods for TVA, total, date, etc.
The ProfileRegistry uses CUI_LIST to lookup profiles during extraction.
"""
# -------------------------------------------------------------------------
# Class attributes - override in subclasses
# -------------------------------------------------------------------------
# List of CUI codes (without RO prefix) that identify this store
CUI_LIST: List[str] = []
# OCR-tolerant name patterns for fallback matching
NAME_PATTERNS: List[str] = []
# Store display name
STORE_NAME: str = "Unknown Store"
# -------------------------------------------------------------------------
# Generic patterns - can be overridden in subclasses
# -------------------------------------------------------------------------
# Total amount patterns (confidence-weighted)
TOTAL_PATTERNS = [
(r'T[O0]TAL[.\s]+L[E3][I1!]\s*:?\s*([\d\s.,]+)', 0.98),
(r'TOTAL\s+LEI\s*([\d\s.,]+)', 0.98),
(r'[OT]?OTAL\s+LEI\s*([\d\s.,]+)', 0.95),
(r'TOTAL\s*:?\s*([\d\s.,]+)\s*(?:RON|LEI)?', 0.95),
(r'TOTAL\s+(?:RON|LEI)\s*([\d\s.,]+)', 0.95),
(r'SUBTOTAL\s*([\d\s.,]+)', 0.90),
(r'DE\s+PLATA\s*:?\s*([\d\s.,]+)', 0.90),
(r'SUMA\s*:?\s*([\d\s.,]+)', 0.85),
]
# Date patterns (confidence-weighted)
DATE_PATTERNS = [
(r'D[AR]TA\s*:?\s*(\d{2}[-./]\d{2}[-./]\d{4})', 0.98),
(r'DATA\s*:?\s*(\d{2}[-./]\d{2}[-./]\d{4})', 0.98),
(r'(\d{2}[-./]\d{2}[-./]\d{4})\s+[O0]RA\s*:?\s*\d{2}:\d{2}', 0.95),
(r'(\d{2}[-./]\d{2}[-./]\d{4})\s+\d{2}:\d{2}', 0.90),
(r'(\d{2}[-./]\d{2}[-./]\d{4})', 0.80),
(r'(\d{4}[-./]\d{2}[-./]\d{2})', 0.75),
]
# Date patterns with OCR-introduced spaces (separate because format is different)
DATE_PATTERNS_OCR_SPACES = [
(r'(\d{4})[.,]\s*(\d{2})[.,]\s*(\d{2})\s+\d{2}:\d{2}', 0.92, 'ymd'),
(r'(\d{4})[.,]\s*(\d{2})[.,]\s*(\d{2})', 0.85, 'ymd'),
(r'(\d{2})[.,]\s*(\d{2})[.,]\s*(\d{4})\s+\d{2}:\d{2}', 0.92, 'dmy'),
(r'(\d{2})[.,]\s*(\d{2})[.,]\s*(\d{4})', 0.85, 'dmy'),
]
# Receipt number patterns (confidence-weighted)
NUMBER_PATTERNS = [
(r'NDS\s*:?\s*(\d+)', 0.98),
(r'C3POS[-A-Z0-9]*[N:](\d{6,7})', 0.98),
(r'C3POS.*?(\d{6,7})\b', 0.95),
(r'BF\s*:\s*(\d{4,})', 0.96),
(r'BF\s+(\d{4,})', 0.93),
(r'NIVS\s*:?\s*(\d+)', 0.95),
(r'NR\.?\s*BON\s*:?\s*(\d+)', 0.95),
(r'BON\s+(?:FISCAL\s+)?NR\.?\s*:?\s*(\d+)', 0.95),
(r'CHITANTA\s+NR\.?\s*:?\s*(\d+)', 0.95),
(r'NR\.?\s+DOCUMENT\s*:?\s*(\d+)', 0.90),
(r'ID\s*BF\s*:?\s*(\d+)', 0.90),
]
# Payment method patterns (pattern, method_type, confidence)
PAYMENT_PATTERNS = [
(r'CARTE\s+CREDIT\s*:?\s*([\d\s.,]+)', 'CARD', 0.98),
(r'CARTE\s+CREDIT\s*:?\s*\n\s*([\d\s.,]+)', 'CARD', 0.97),
(r'(?:PLATA\s+)?CARD\s*[:\sA-Z]?\s*([\d\s.,]+)', 'CARD', 0.95),
(r'NUMERAR\s*:?\s*([\d\s.,]+)', 'NUMERAR', 0.95),
(r'CASH\s*:?\s*([\d\s.,]+)', 'NUMERAR', 0.90),
(r'(?:^|\n|\s)RD\s*:?\s*(\d{1,6}[.,]\d{2})\b', 'CARD', 0.70),
(r'(?:^|\n|\s)ARD\s*:?\s*(\d{1,6}[.,]\d{2})\b', 'CARD', 0.75),
(r'(?:^|\n|\s)MERAR\s*:?\s*(\d{1,6}[.,]\d{2})\b', 'NUMERAR', 0.70),
]
# Client section markers (for B2B receipts)
CLIENT_MARKERS = [
r'C\.?\s*[I1]\.?\s*F\.?\s+CLIENT\s*:',
r'C\.?\s*U\.?\s*[I1]\.?\s+CLIENT\s*:',
r'CLIENT\s+C\.?\s*[UI1]\.?\s*[IF1]\.?\s*:',
r'CLIENT\s*:',
r'CUMPARATOR\s*:',
r'BENEFICIAR\s*:',
]
# Client CUI patterns (pattern, confidence)
CLIENT_CUI_PATTERNS = [
(r'(R[O0]\d{6,10})\s*\n\s*CLIENT\s+C\.?\s*U\.?\s*[I1]\.?', 0.99),
(r'(R[O0]\d{6,10})\s*:?\s*\n\s*CLIENT', 0.98),
(r'C[I1]F\s+[A-Z]*\s*CLIENT\s*:?\s*(R[O0]\d{6,10})', 0.98),
(r'C\.?\s*[I1]\.?\s*F\.?\s+CLIENT\s*:?\s*(R[O0]?\d{6,10})', 0.98),
(r'C\.?\s*U\.?\s*[I1]\.?\s+CLIENT\s*:?\s*(R[O0]?\d{6,10})', 0.98),
(r'CLIENT\s+C\.?\s*U\.?\s*[I1]\.?\s*:?\s*(R[O0]?\d{6,10})', 0.95),
(r'CLIENT\s*:?\s*(R[O0]?\d{6,10})', 0.90),
]
# Company type indicators (for identifying company names)
COMPANY_INDICATORS = [
r'\bS\.?\s*R\.?\s*L\.?\b', # S.R.L. or S. R. L.
r'\bS\.?\s*A\.?\b', # S.A. or S. A.
r'\bS\.?\s*N\.?\s*C\.?\b', # S.N.C. or S. N. C.
r'\bS\.?\s*C\.?\s*S\.?\b', # S.C.S. or S. C. S.
r'\bI\.?\s*I\.?\b', # I.I. or I. I.
r'\bP\.?\s*F\.?\s*A\.?\b', # P.F.A. or P. F. A.
r'\bS\.?\s*C\.?\s+[A-Z]', # S.C. followed by company name
r'HOLDING',
r'COMPANY',
r'GROUP',
]
# Maximum reasonable payment amount (to filter OCR errors)
MAX_PAYMENT = Decimal('100000')
# -------------------------------------------------------------------------
# Extraction methods - override in subclasses as needed
# -------------------------------------------------------------------------
def extract_tva_entries(self, text: str) -> List[dict]:
"""
Extract TVA entries from receipt text.
Override this method in subclasses to handle store-specific TVA formats.
Args:
text: Raw OCR text from receipt
Returns:
List of dicts with keys: code, percent, amount
"""
return []
def extract_total(self, text: str) -> Tuple[Optional[Decimal], float]:
"""
Extract total amount from receipt text.
Args:
text: Raw OCR text from receipt
Returns:
Tuple of (amount, confidence) or (None, 0.0)
"""
text_upper = text.upper()
for pattern, confidence in self.TOTAL_PATTERNS:
match = re.search(pattern, text_upper)
if match:
amount = self._parse_decimal(match.group(1))
if amount and amount > 0 and amount < self.MAX_PAYMENT:
return (amount, confidence)
return (None, 0.0)
def extract_date(self, text: str) -> Tuple[Optional[date], float]:
"""
Extract receipt date from text.
Args:
text: Raw OCR text from receipt
Returns:
Tuple of (date, confidence) or (None, 0.0)
"""
text_upper = text.upper()
# Try standard patterns first
for pattern, confidence in self.DATE_PATTERNS:
match = re.search(pattern, text_upper)
if match:
parsed = self._parse_date(match.group(1))
if parsed:
return (parsed, confidence)
# Try OCR-corrupted patterns with spaces
for pattern, confidence, fmt in self.DATE_PATTERNS_OCR_SPACES:
match = re.search(pattern, text_upper)
if match:
try:
if fmt == 'ymd':
year, month, day = int(match.group(1)), int(match.group(2)), int(match.group(3))
else: # dmy
day, month, year = int(match.group(1)), int(match.group(2)), int(match.group(3))
if 1 <= day <= 31 and 1 <= month <= 12 and 2000 <= year <= 2100:
return (date(year, month, day), confidence)
except (ValueError, TypeError):
continue
return (None, 0.0)
def extract_receipt_number(self, text: str) -> Tuple[Optional[str], float]:
"""
Extract receipt number from text.
Args:
text: Raw OCR text from receipt
Returns:
Tuple of (number, confidence) or (None, 0.0)
"""
text_upper = text.upper()
for pattern, confidence in self.NUMBER_PATTERNS:
match = re.search(pattern, text_upper)
if match:
number = match.group(1).strip()
if number and len(number) >= 3:
return (number, confidence)
return (None, 0.0)
def extract_payment_methods(self, text: str) -> List[dict]:
"""
Extract payment methods (CARD/NUMERAR) from receipt.
Supports multiple payments of the same type (e.g., 2x CARD for split payments).
Each payment is returned as a separate entry with its amount.
Args:
text: Raw OCR text from receipt
Returns:
List of dicts: [{'method': 'CARD'/'NUMERAR', 'amount': Decimal, 'confidence': float}]
Multiple entries of same method type are allowed for split payments.
"""
text_upper = text.upper()
methods = []
# Track (method, amount) pairs to avoid exact duplicates from overlapping patterns
seen_entries = set()
for pattern, method, confidence in self.PAYMENT_PATTERNS:
for match in re.finditer(pattern, text_upper):
try:
amount = self._parse_decimal(match.group(1))
if amount and amount > 0 and amount < self.MAX_PAYMENT:
# Deduplicate by (method, amount) to avoid same entry from multiple patterns
# But allow different amounts for same method (split payments)
entry_key = (method, amount)
if entry_key not in seen_entries:
methods.append({
'method': method,
'amount': amount,
'confidence': confidence
})
seen_entries.add(entry_key)
except (ValueError, InvalidOperation):
continue
return methods
def extract_client_cui(self, text: str) -> Tuple[Optional[str], float]:
"""
Extract client CUI from B2B receipts.
Args:
text: Raw OCR text from receipt
Returns:
Tuple of (cui, confidence) or (None, 0.0)
"""
text_upper = text.upper()
# First check if there's a CLIENT section
has_client_section = any(
re.search(marker, text_upper, re.IGNORECASE)
for marker in self.CLIENT_MARKERS
)
if not has_client_section:
return (None, 0.0)
# Try to extract CUI
for pattern, confidence in self.CLIENT_CUI_PATTERNS:
match = re.search(pattern, text_upper, re.IGNORECASE | re.MULTILINE)
if match:
cui = match.group(1)
# Normalize: remove RO prefix for storage
cui_digits = re.sub(r'[^0-9]', '', cui)
if 6 <= len(cui_digits) <= 10:
return (cui_digits, confidence)
return (None, 0.0)
def extract_client_name(self, text: str) -> Tuple[Optional[str], float]:
"""
Extract client/buyer company name from B2B receipts.
Args:
text: Raw OCR text from receipt
Returns:
Tuple of (client_name, confidence) or (None, 0.0)
"""
text_upper = text.upper()
lines = text.split('\n')
# First check if there's a CLIENT section
client_section_idx = None
for i, line in enumerate(lines):
line_upper = line.upper().strip()
if any(re.search(marker, line_upper, re.IGNORECASE) for marker in self.CLIENT_MARKERS):
client_section_idx = i
break
if client_section_idx is None:
return (None, 0.0)
# Look for company name in CLIENT section
line = lines[client_section_idx].strip()
line_upper = line.upper()
# Strategy 1: Check if name is on same line after ":"
if ':' in line:
name_part = line.split(':', 1)[1].strip()
if name_part and len(name_part) >= 3:
# Skip if it looks like a CUI (RO followed by digits)
if re.match(r'^R[O0]?\d{6,10}$', name_part.upper()):
pass # This is CUI, not name - continue to next strategy
else:
# Check for company indicators
name_upper = name_part.upper()
if any(re.search(ind, name_upper) for ind in self.COMPANY_INDICATORS):
return (self._clean_company_name(name_part), 0.95)
elif len(name_part) >= 5 and not name_part.isdigit():
return (self._clean_company_name(name_part), 0.80)
# Strategy 2: Check next line for company name
if client_section_idx + 1 < len(lines):
next_line = lines[client_section_idx + 1].strip()
next_upper = next_line.upper()
# Skip if it's a CUI/CIF line or looks like CUI
if not re.search(r'C\.?\s*[UI]\.?\s*[IF]\.?', next_upper):
if not re.match(r'^R[O0]?\d{6,10}$', next_upper):
if any(re.search(ind, next_upper) for ind in self.COMPANY_INDICATORS):
return (self._clean_company_name(next_line), 0.90)
elif len(next_line) >= 5 and not next_line.isdigit():
# Check it's not CUI/CIF/COD keywords
if not any(kw in next_upper for kw in ['CUI', 'CIF', 'COD', 'FISCAL']):
return (self._clean_company_name(next_line), 0.75)
# Strategy 3: Look for any line with company indicators in CLIENT section region
search_end = min(client_section_idx + 5, len(lines))
for i in range(client_section_idx + 1, search_end):
line = lines[i].strip()
line_upper = line.upper()
# Skip CUI/CIF lines
if re.search(r'C\.?\s*[UI]\.?\s*[IF]\.?', line_upper):
continue
if re.match(r'^R[O0]?\d{6,10}$', line_upper):
continue
if any(re.search(ind, line_upper) for ind in self.COMPANY_INDICATORS):
return (self._clean_company_name(line), 0.85)
return (None, 0.0)
@staticmethod
def _clean_company_name(name: str) -> str:
"""Clean company name for storage."""
if not name:
return ""
# Remove extra whitespace
name = re.sub(r'\s+', ' ', name).strip()
# Remove trailing punctuation except periods in S.R.L., S.A., etc.
name = re.sub(r'[,;:]+$', '', name).strip()
return name
# -------------------------------------------------------------------------
# Validation hints - override to customize validation behavior
# -------------------------------------------------------------------------
def get_validation_hints(self) -> Dict[str, Any]:
"""
Return validation hints for this store.
Returns:
Dict with validation hints. Common keys:
- has_multi_rate_tva: bool - Store uses multiple TVA rates
- card_equals_total: bool - CARD payment equals total
- has_client_cui: bool - Receipt includes client CUI
- has_efactura: bool - Store uses e-factura format
- is_non_vat_payer: bool - Store is not a VAT payer
"""
return {}
# -------------------------------------------------------------------------
# Helper methods - available to all subclasses
# -------------------------------------------------------------------------
@staticmethod
def _normalize_number(text: str) -> str:
"""
Normalize a number string for Decimal conversion.
Handles Romanian formats: "1.234,56" -> "1234.56"
"""
if not text:
return "0"
# Remove spaces
text = text.replace(" ", "")
# Determine decimal separator
last_comma = text.rfind(",")
last_dot = text.rfind(".")
if last_comma > last_dot:
text = text.replace(".", "").replace(",", ".")
elif last_dot > last_comma:
text = text.replace(",", "")
else:
text = text.replace(",", ".")
return text
@staticmethod
def _parse_decimal(text: str) -> Optional[Decimal]:
"""Parse a string to Decimal, handling various formats."""
try:
normalized = BaseStoreProfile._normalize_number(text)
return Decimal(normalized)
except (InvalidOperation, ValueError, TypeError):
return None
@staticmethod
def _parse_date(text: str) -> Optional[date]:
"""
Parse date string in various formats.
Supports: DD-MM-YYYY, DD/MM/YYYY, DD.MM.YYYY, YYYY-MM-DD
"""
if not text:
return None
# Normalize separators
text = text.replace('/', '-').replace('.', '-')
try:
parts = text.split('-')
if len(parts) != 3:
return None
# Determine format based on first part length
if len(parts[0]) == 4:
# YYYY-MM-DD
year, month, day = int(parts[0]), int(parts[1]), int(parts[2])
else:
# DD-MM-YYYY
day, month, year = int(parts[0]), int(parts[1]), int(parts[2])
# Validate ranges
if 1 <= day <= 31 and 1 <= month <= 12 and 2000 <= year <= 2100:
return date(year, month, day)
except (ValueError, TypeError, IndexError):
pass
return None
@staticmethod
def _clean_text(text: str) -> str:
"""Clean OCR text for pattern matching."""
if not text:
return ""
text = re.sub(r'\s+', ' ', text)
text = re.sub(r'[\x00-\x09\x0b\x0c\x0e-\x1f\x7f]', '', text)
return text.strip()
# -------------------------------------------------------------------------
# Magic methods
# -------------------------------------------------------------------------
def __repr__(self) -> str:
return f"<{self.__class__.__name__} CUI={self.CUI_LIST}>"
def __str__(self) -> str:
return f"{self.STORE_NAME} ({', '.join(self.CUI_LIST)})"

View File

@@ -0,0 +1,54 @@
"""
BEST PRINT TRADE ACTIV SRL store profile for OCR extraction.
Stamp manufacturing service. Non-VAT payer (neplătitor de TVA).
"""
from typing import List, Dict, Any
from .base import BaseStoreProfile
from . import ProfileRegistry
@ProfileRegistry.register
class BestPrintProfile(BaseStoreProfile):
"""
BEST PRINT TRADE ACTIV SRL - non-VAT payer profile.
Key characteristics:
- Non-VAT payer (neplătitor de TVA) - NO TVA on receipts
- Stamp manufacturing and printing services
- Total amount has no TVA component
- CARD payment typical
"""
CUI_LIST = ["45417955"]
NAME_PATTERNS = ["BEST PRINT", "BESTPRINT", "BEST PRINT TRADE", "BEST PR1NT"]
STORE_NAME = "BEST PRINT TRADE ACTIV SRL"
def extract_tva_entries(self, text: str) -> List[dict]:
"""
Extract TVA entries - returns empty for non-VAT payer.
BEST PRINT is a non-VAT payer (neplătitor de TVA),
so no TVA entries are expected on receipts.
Args:
text: Raw OCR text from receipt (unused)
Returns:
Empty list (non-VAT payer has no TVA)
"""
# Non-VAT payer - no TVA entries
return []
def get_validation_hints(self) -> Dict[str, Any]:
"""Return BEST PRINT-specific validation hints."""
return {
"has_multi_rate_tva": False,
"card_equals_total": True,
"has_client_cui": True, # May have client CUI
"has_efactura": False,
"is_non_vat_payer": True, # CRITICAL: Non-VAT payer
"tva_pattern": "none",
}

View File

@@ -0,0 +1,101 @@
"""
BRICK (Five-Holding) store profile for OCR extraction.
Five-Holding S.A. operates BRICK stores with standard receipt format.
"""
import re
from decimal import Decimal, InvalidOperation
from typing import List, Dict, Any
from .base import BaseStoreProfile
from . import ProfileRegistry
@ProfileRegistry.register
class BrickProfile(BaseStoreProfile):
"""
FIVE-HOLDING S.A. (BRICK) - standard TVA format.
Key characteristics:
- Standard TVA format
- Single TVA rate typically
- No client CUI on receipts
"""
CUI_LIST = ["10562600"]
NAME_PATTERNS = ["BRICK", "FIVE-HOLDING", "FIVE HOLDING", "BR1CK"] # OCR variants
STORE_NAME = "FIVE-HOLDING S.A."
# Standard TVA patterns (flexible - accepts any rate)
TVA_PATTERNS = [
# "TVA A: XX% = YY,YY" or "TVA-A XX% YY,YY"
r'TVA\s*[-:]?\s*([A-D])\s*:?\s*(\d{1,2})\s*%\s*[=:]?\s*([\d.,]+)',
# "A - XX,XX% = YY,YY"
r'([A-D])\s*[-:]\s*(\d{1,2})[.,]?\d{0,2}\s*%\s*[=:]?\s*([\d.,]+)',
# Simple: "TVA XX% YY,YY"
r'TVA\s+(\d{1,2})\s*%\s*([\d.,]+)',
]
def extract_tva_entries(self, text: str) -> List[dict]:
"""
Extract BRICK-specific TVA entries.
Args:
text: Raw OCR text from receipt
Returns:
List of TVA entries with code, percent, and amount
"""
entries = []
seen = set()
# Try coded patterns first
for pattern in self.TVA_PATTERNS[:2]:
for match in re.finditer(pattern, text, re.IGNORECASE):
try:
code = match.group(1).upper()
percent = int(match.group(2))
amount = self._parse_decimal(match.group(3))
if amount and amount > 0:
entry_key = (code, percent)
if entry_key not in seen:
entries.append({
'code': code,
'percent': percent,
'amount': amount
})
seen.add(entry_key)
except (ValueError, InvalidOperation, IndexError):
continue
# Fallback to simple format
if not entries:
simple_pattern = self.TVA_PATTERNS[2]
for match in re.finditer(simple_pattern, text, re.IGNORECASE):
try:
percent = int(match.group(1))
amount = self._parse_decimal(match.group(2))
if amount and amount > 0:
entries.append({
'code': 'A',
'percent': percent,
'amount': amount
})
break
except (ValueError, InvalidOperation):
continue
return entries
def get_validation_hints(self) -> Dict[str, Any]:
"""Return BRICK-specific validation hints."""
return {
"has_multi_rate_tva": False,
"card_equals_total": False,
"has_client_cui": False,
"has_efactura": False,
"is_non_vat_payer": False,
}

View File

@@ -0,0 +1,118 @@
"""
DEDEMAN store profile for OCR extraction.
Dedeman receipts may include e-factura information and use standard TVA format.
Large DIY retailer in Romania.
"""
import re
from decimal import Decimal, InvalidOperation
from typing import List, Dict, Any
from .base import BaseStoreProfile
from . import ProfileRegistry
@ProfileRegistry.register
class DedemanProfile(BaseStoreProfile):
"""
DEDEMAN SRL - standard TVA with e-factura support.
Key characteristics:
- Standard TVA format
- May include e-factura reference number
- Professional receipts for construction materials
"""
CUI_LIST = ["2816464"]
NAME_PATTERNS = ["DEDEMAN", "DEDEMAN SRL", "OEDEMAN", "D3DEMAN"] # OCR variants
STORE_NAME = "DEDEMAN SRL"
# Standard TVA patterns (flexible - accepts any rate)
TVA_PATTERNS = [
# "TVA A: XX% = YY,YY" or "TVA-A XX% YY,YY"
r'TVA\s*[-:]?\s*([A-D])\s*:?\s*(\d{1,2})\s*%\s*[=:]?\s*([\d.,]+)',
# "A - XX,XX% = YY,YY"
r'([A-D])\s*[-:]\s*(\d{1,2})[.,]?\d{0,2}\s*%\s*[=:]?\s*([\d.,]+)',
# "TVA (XX%) YY,YY"
r'TVA\s*\(?\s*(\d{1,2})\s*%\s*\)?\s*:?\s*([\d.,]+)',
]
# E-factura pattern for reference extraction
EFACTURA_PATTERN = r'e-?factura\s*:?\s*([A-Z0-9]+)'
def extract_tva_entries(self, text: str) -> List[dict]:
"""
Extract Dedeman-specific TVA entries.
Args:
text: Raw OCR text from receipt
Returns:
List of TVA entries with code, percent, and amount
"""
entries = []
seen = set()
# Try coded patterns first
for pattern in self.TVA_PATTERNS[:2]:
for match in re.finditer(pattern, text, re.IGNORECASE):
try:
code = match.group(1).upper()
percent = int(match.group(2))
amount = self._parse_decimal(match.group(3))
if amount and amount > 0:
entry_key = (code, percent)
if entry_key not in seen:
entries.append({
'code': code,
'percent': percent,
'amount': amount
})
seen.add(entry_key)
except (ValueError, InvalidOperation, IndexError):
continue
# Fallback to simple format
if not entries:
simple_pattern = self.TVA_PATTERNS[2]
for match in re.finditer(simple_pattern, text, re.IGNORECASE):
try:
percent = int(match.group(1))
amount = self._parse_decimal(match.group(2))
if amount and amount > 0:
entries.append({
'code': 'A',
'percent': percent,
'amount': amount
})
break
except (ValueError, InvalidOperation):
continue
return entries
def extract_efactura_reference(self, text: str) -> str | None:
"""
Extract e-factura reference number if present.
Args:
text: Raw OCR text from receipt
Returns:
E-factura reference string or None
"""
match = re.search(self.EFACTURA_PATTERN, text, re.IGNORECASE)
return match.group(1) if match else None
def get_validation_hints(self) -> Dict[str, Any]:
"""Return Dedeman-specific validation hints."""
return {
"has_multi_rate_tva": False,
"card_equals_total": False,
"has_client_cui": False,
"has_efactura": True,
"is_non_vat_payer": False,
}

View File

@@ -0,0 +1,102 @@
"""
ELECTROBERING S.R.L. store profile for OCR extraction.
Electronics and home supplies store.
"""
import re
from decimal import Decimal, InvalidOperation
from typing import List, Dict, Any
from .base import BaseStoreProfile
from . import ProfileRegistry
@ProfileRegistry.register
class ElectroberingProfile(BaseStoreProfile):
"""
ELECTROBERING S.R.L. - standard TVA profile.
Key characteristics:
- Standard TVA format (single rate, any percentage)
- Electronics and home supplies
- May have client CUI for B2B purchases
- CARD payment typical
"""
CUI_LIST = ["2744937"]
NAME_PATTERNS = ["ELECTROBERING", "ELECTR0BERING", "ELECTROBERING SRL"]
STORE_NAME = "ELECTROBERING S.R.L."
# Standard TVA patterns (flexible - accepts any rate)
TVA_PATTERNS = [
# "TVA A: XX% = YY,YY" or "TVA-A XX% YY,YY"
r'TVA\s*[-:]?\s*([A-D])\s*:?\s*(\d{1,2})\s*%\s*[=:]?\s*([\d.,]+)',
# "A - XX,XX% = YY,YY"
r'([A-D])\s*[-:]\s*(\d{1,2})[.,]?\d{0,2}\s*%\s*[=:]?\s*([\d.,]+)',
# "TVA XX% YY,YY" (simple format without code)
r'TVA\s+(\d{1,2})\s*%\s*([\d.,]+)',
]
def extract_tva_entries(self, text: str) -> List[dict]:
"""
Extract TVA entries from receipt text.
Args:
text: Raw OCR text from receipt
Returns:
List of TVA entries with code, percent, and amount
"""
entries = []
seen = set()
# Try coded patterns first
for pattern in self.TVA_PATTERNS[:2]:
for match in re.finditer(pattern, text, re.IGNORECASE):
try:
code = match.group(1).upper()
percent = int(match.group(2))
amount = self._parse_decimal(match.group(3))
if amount and amount > 0:
entry_key = (code, percent)
if entry_key not in seen:
entries.append({
'code': code,
'percent': percent,
'amount': amount
})
seen.add(entry_key)
except (ValueError, InvalidOperation, IndexError):
continue
# Fallback to simple format
if not entries:
simple_pattern = self.TVA_PATTERNS[2]
for match in re.finditer(simple_pattern, text, re.IGNORECASE):
try:
percent = int(match.group(1))
amount = self._parse_decimal(match.group(2))
if amount and amount > 0:
entries.append({
'code': 'A',
'percent': percent,
'amount': amount
})
break
except (ValueError, InvalidOperation):
continue
return entries
def get_validation_hints(self) -> Dict[str, Any]:
"""Return ELECTROBERING-specific validation hints."""
return {
"has_multi_rate_tva": False,
"card_equals_total": True,
"has_client_cui": True, # May have client CUI for B2B
"has_efactura": False,
"is_non_vat_payer": False,
}

View File

@@ -0,0 +1,103 @@
"""
GAMA INK SERVICE SRL store profile for OCR extraction.
Toner refill and printer supplies store.
"""
import re
from decimal import Decimal, InvalidOperation
from typing import List, Dict, Any
from .base import BaseStoreProfile
from . import ProfileRegistry
@ProfileRegistry.register
class GamaInkProfile(BaseStoreProfile):
"""
GAMA INK SERVICE SRL - standard TVA profile.
Key characteristics:
- Standard TVA format (single rate, any percentage)
- Service-based (toner refill, printer supplies)
- CARD payment typical
"""
CUI_LIST = ["17741882"]
NAME_PATTERNS = ["GAMA INK", "GAMA", "GAMAINK", "GAMA INK SERVICE"]
STORE_NAME = "GAMA INK SERVICE SRL"
# Standard TVA patterns (flexible - accepts any rate)
TVA_PATTERNS = [
# "TVA A: XX% = YY,YY" or "TVA-A XX% YY,YY"
r'TVA\s*[-:]?\s*([A-D])\s*:?\s*(\d{1,2})\s*%\s*[=:]?\s*([\d.,]+)',
# "A - XX,XX% = YY,YY"
r'([A-D])\s*[-:]\s*(\d{1,2})[.,]?\d{0,2}\s*%\s*[=:]?\s*([\d.,]+)',
# "TVA XX% YY,YY" (simple format without code)
r'TVA\s+(\d{1,2})\s*%\s*([\d.,]+)',
# "TVA: YY,YY" (amount only, percent inferred)
r'TVA\s*:?\s*([\d.,]+)\s*(?:LEI|RON)?',
]
def extract_tva_entries(self, text: str) -> List[dict]:
"""
Extract TVA entries from receipt text.
Args:
text: Raw OCR text from receipt
Returns:
List of TVA entries with code, percent, and amount
"""
entries = []
seen = set()
# Try coded patterns first (have both code and percent)
for pattern in self.TVA_PATTERNS[:2]:
for match in re.finditer(pattern, text, re.IGNORECASE):
try:
code = match.group(1).upper()
percent = int(match.group(2))
amount = self._parse_decimal(match.group(3))
if amount and amount > 0:
entry_key = (code, percent)
if entry_key not in seen:
entries.append({
'code': code,
'percent': percent,
'amount': amount
})
seen.add(entry_key)
except (ValueError, InvalidOperation, IndexError):
continue
# Fallback to simple format (percent + amount without code)
if not entries:
simple_pattern = self.TVA_PATTERNS[2]
for match in re.finditer(simple_pattern, text, re.IGNORECASE):
try:
percent = int(match.group(1))
amount = self._parse_decimal(match.group(2))
if amount and amount > 0:
entries.append({
'code': 'A',
'percent': percent,
'amount': amount
})
break
except (ValueError, InvalidOperation):
continue
return entries
def get_validation_hints(self) -> Dict[str, Any]:
"""Return GAMA INK-specific validation hints."""
return {
"has_multi_rate_tva": False,
"card_equals_total": True,
"has_client_cui": False,
"has_efactura": False,
"is_non_vat_payer": False,
}

View File

@@ -0,0 +1,53 @@
"""
KINETERRA store profile for OCR extraction.
Kineterra is a non-VAT payer (neplătitor de TVA).
Receipts don't include TVA breakdown.
"""
from typing import List, Dict, Any
from .base import BaseStoreProfile
from . import ProfileRegistry
@ProfileRegistry.register
class KineterraProfile(BaseStoreProfile):
"""
KINETERRA CONCEPT SRL - non-VAT payer profile.
Key characteristics:
- Non-VAT payer (neplătitor de TVA)
- No TVA breakdown on receipts
- Total amount has no TVA component
"""
CUI_LIST = ["31180432"]
NAME_PATTERNS = ["KINETERRA", "KINETERRA CONCEPT", "K1NETERRA"] # OCR variants
STORE_NAME = "KINETERRA CONCEPT SRL"
def extract_tva_entries(self, text: str) -> List[dict]:
"""
Extract TVA entries - returns empty for non-VAT payer.
Kineterra is a non-VAT payer, so no TVA entries are expected.
Args:
text: Raw OCR text from receipt (unused)
Returns:
Empty list (non-VAT payer has no TVA)
"""
# Non-VAT payer - no TVA entries
return []
def get_validation_hints(self) -> Dict[str, Any]:
"""Return Kineterra-specific validation hints."""
return {
"has_multi_rate_tva": False,
"card_equals_total": False,
"has_client_cui": False,
"has_efactura": False,
"is_non_vat_payer": True,
"tva_pattern": "none",
}

View File

@@ -0,0 +1,93 @@
"""
LIDL store profile for OCR extraction.
Lidl receipts have a specific TVA format without hyphen/colon separators:
TOTAL TVA 9,84
TVA A 21,00% 7,71
TVA B 11,00% 2,13
This profile handles multi-rate TVA extraction for Lidl receipts.
"""
import re
from decimal import Decimal, InvalidOperation
from typing import List, Dict, Any
from .base import BaseStoreProfile
from . import ProfileRegistry
@ProfileRegistry.register
class LidlProfile(BaseStoreProfile):
"""
LIDL DISCOUNT S.R.L. - multi-rate TVA profile.
Key characteristics:
- Multi-rate TVA (codes A, B, C, D with any percentage - patterns are flexible)
- TVA format: "TVA A XX,XX% YY,YY" (code + percent + amount on same line)
- Supports historical rates (19%, 9%, 5%) and current rates (21%, 11%)
- CARD payment usually equals total
- No client CUI on receipts
"""
CUI_LIST = ["22891860"]
NAME_PATTERNS = ["LIDL", "LDL", "L1DL", "LIDL DISCOUNT"] # OCR variants
STORE_NAME = "LIDL DISCOUNT S.R.L."
# Lidl-specific TVA patterns
# Format: "TVA A 21,00% 7,71" (code + percent + amount on same line)
TVA_PATTERNS = [
# Primary: "TVA A 21,00% 7.71" with various spacing
r'T[VU][AR]\s+([A-D])\s+(\d{1,2})[.,]?\d{0,2}\s*%\s+([\d.,]+)',
# With backslash OCR artifact: "TVA A \21,00% 7.71"
r'T[VU][AR]\s+([A-D])\s+\\?(\d{1,2})[.,]?\d{0,2}\s*%\s+([\d.,]+)',
# IVA variant (rare OCR misread)
r'IVA\s+([A-D])\s+(\d{1,2})[.,]?\d{0,2}\s*%\s+([\d.,]+)',
]
def extract_tva_entries(self, text: str) -> List[dict]:
"""
Extract Lidl-specific TVA entries.
Handles multiple TVA rates (A, B, C, D) commonly found on Lidl receipts.
Uses deduplication to avoid counting the same entry twice from different patterns.
Args:
text: Raw OCR text from receipt
Returns:
List of TVA entries with code, percent, and amount
"""
entries = []
seen = set() # Deduplication key: (code, percent)
for pattern in self.TVA_PATTERNS:
for match in re.finditer(pattern, text, re.IGNORECASE):
try:
code = match.group(1).upper()
percent = int(match.group(2))
amount = self._parse_decimal(match.group(3))
if amount and amount > 0:
entry_key = (code, percent)
if entry_key not in seen:
entries.append({
'code': code,
'percent': percent,
'amount': amount
})
seen.add(entry_key)
except (ValueError, InvalidOperation):
continue
return entries
def get_validation_hints(self) -> Dict[str, Any]:
"""Return Lidl-specific validation hints."""
return {
"has_multi_rate_tva": True,
"card_equals_total": True,
"has_client_cui": False,
"has_efactura": False,
"is_non_vat_payer": False,
}

View File

@@ -0,0 +1,99 @@
"""
OMV Petrom store profile for OCR extraction.
OMV receipts typically include client CUI and use standard TVA format.
Common at gas stations with fuel purchases.
Date format: YYYY. MM. DD with spaces (e.g., "2025. 08. 14")
"""
import re
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import List, Dict, Any, Tuple, Optional
from .base import BaseStoreProfile
from . import ProfileRegistry
@ProfileRegistry.register
class OMVProfile(BaseStoreProfile):
"""
OMV PETROM MARKETING S.R.L. - standard TVA with client CUI.
Key characteristics:
- Standard TVA format (usually single rate, any percentage)
- Includes client CUI on receipt (for business purchases)
- TVA table format: "A-XX,XX% base_amount tva_amount"
- Supports historical rates (19%) and current rates (21%)
- Date format: YYYY. MM. DD (with spaces)
"""
CUI_LIST = ["11201891"]
NAME_PATTERNS = ["OMV", "PETROM", "OMV PETROM", "0MV"] # OCR variants
STORE_NAME = "OMV PETROM MARKETING S.R.L."
# OMV TVA table pattern: "A-19,00% 285,66 49,58" (code-percent base tva)
TVA_TABLE_PATTERN = r'([A-D])\s*[-:]\s*(\d{1,2})[.,]\d{2}\s*%\s+([\d.,]+)\s+([\d.,]+)'
# Standard TVA pattern fallback
TVA_STANDARD_PATTERN = r'TVA\s*:?\s*([\d.,]+)'
# OMV specific: prioritize YYYY. MM. DD format with spaces
DATE_PATTERNS_OCR_SPACES = [
# YYYY. MM. DD with time (OMV format)
(r'(\d{4})[.,]\s*(\d{2})[.,]\s*(\d{2})\s+\d{2}:\d{2}', 0.98, 'ymd'),
(r'(\d{4})[.,]\s*(\d{2})[.,]\s*(\d{2})', 0.95, 'ymd'),
# Fallback to DD. MM. YYYY
(r'(\d{2})[.,]\s*(\d{2})[.,]\s*(\d{4})\s+\d{2}:\d{2}', 0.92, 'dmy'),
(r'(\d{2})[.,]\s*(\d{2})[.,]\s*(\d{4})', 0.85, 'dmy'),
]
def extract_tva_entries(self, text: str) -> List[dict]:
"""
Extract OMV-specific TVA entries.
OMV receipts often show TVA in table format with base and TVA amounts.
Falls back to standard extraction if table format not found.
Args:
text: Raw OCR text from receipt
Returns:
List of TVA entries with code, percent, and amount
"""
entries = []
seen = set()
# Try table format first (more accurate)
for match in re.finditer(self.TVA_TABLE_PATTERN, text, re.IGNORECASE):
try:
code = match.group(1).upper()
percent = int(match.group(2))
# TVA amount is the second number (smaller one)
tva_amount = self._parse_decimal(match.group(4))
if tva_amount and tva_amount > 0:
entry_key = (code, percent)
if entry_key not in seen:
entries.append({
'code': code,
'percent': percent,
'amount': tva_amount
})
seen.add(entry_key)
except (ValueError, InvalidOperation):
continue
return entries
def get_validation_hints(self) -> Dict[str, Any]:
"""Return OMV-specific validation hints."""
return {
"has_multi_rate_tva": False,
"card_equals_total": False,
"has_client_cui": True,
"has_efactura": False,
"is_non_vat_payer": False,
"tva_table_format": True,
}

View File

@@ -0,0 +1,101 @@
"""
PICTUS VELUM SRL store profile for OCR extraction.
Office supplies and stationery store.
"""
import re
from decimal import Decimal, InvalidOperation
from typing import List, Dict, Any
from .base import BaseStoreProfile
from . import ProfileRegistry
@ProfileRegistry.register
class PictusVelumProfile(BaseStoreProfile):
"""
PICTUS VELUM SRL - standard TVA profile.
Key characteristics:
- Standard TVA format (single rate, any percentage)
- Office supplies and stationery (rechizite)
- CARD payment typical
"""
CUI_LIST = ["39634534"]
NAME_PATTERNS = ["PICTUS", "PICTUS VELUM", "P1CTUS", "PICTUS VELUM SRL"]
STORE_NAME = "PICTUS VELUM SRL"
# Standard TVA patterns (flexible - accepts any rate)
TVA_PATTERNS = [
# "TVA A: XX% = YY,YY" or "TVA-A XX% YY,YY"
r'TVA\s*[-:]?\s*([A-D])\s*:?\s*(\d{1,2})\s*%\s*[=:]?\s*([\d.,]+)',
# "A - XX,XX% = YY,YY"
r'([A-D])\s*[-:]\s*(\d{1,2})[.,]?\d{0,2}\s*%\s*[=:]?\s*([\d.,]+)',
# "TVA XX% YY,YY" (simple format without code)
r'TVA\s+(\d{1,2})\s*%\s*([\d.,]+)',
]
def extract_tva_entries(self, text: str) -> List[dict]:
"""
Extract TVA entries from receipt text.
Args:
text: Raw OCR text from receipt
Returns:
List of TVA entries with code, percent, and amount
"""
entries = []
seen = set()
# Try coded patterns first
for pattern in self.TVA_PATTERNS[:2]:
for match in re.finditer(pattern, text, re.IGNORECASE):
try:
code = match.group(1).upper()
percent = int(match.group(2))
amount = self._parse_decimal(match.group(3))
if amount and amount > 0:
entry_key = (code, percent)
if entry_key not in seen:
entries.append({
'code': code,
'percent': percent,
'amount': amount
})
seen.add(entry_key)
except (ValueError, InvalidOperation, IndexError):
continue
# Fallback to simple format
if not entries:
simple_pattern = self.TVA_PATTERNS[2]
for match in re.finditer(simple_pattern, text, re.IGNORECASE):
try:
percent = int(match.group(1))
amount = self._parse_decimal(match.group(2))
if amount and amount > 0:
entries.append({
'code': 'A',
'percent': percent,
'amount': amount
})
break
except (ValueError, InvalidOperation):
continue
return entries
def get_validation_hints(self) -> Dict[str, Any]:
"""Return PICTUS VELUM-specific validation hints."""
return {
"has_multi_rate_tva": False,
"card_equals_total": True,
"has_client_cui": False,
"has_efactura": False,
"is_non_vat_payer": False,
}

View File

@@ -0,0 +1,111 @@
"""
SOCAR Petroleum store profile for OCR extraction.
SOCAR receipts are similar to OMV - gas station with client CUI support.
Date format may use YYYY. MM. DD with spaces.
"""
import re
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import List, Dict, Any, Tuple, Optional
from .base import BaseStoreProfile
from . import ProfileRegistry
@ProfileRegistry.register
class SocarProfile(BaseStoreProfile):
"""
SOCAR PETROLEUM S.A. - standard TVA with client CUI.
Key characteristics:
- Standard TVA format (usually single rate)
- Includes client CUI on receipt (for business purchases)
- Similar format to OMV/Petrom
- Date format may use YYYY. MM. DD (with spaces)
"""
CUI_LIST = ["12546600"]
NAME_PATTERNS = ["SOCAR", "S0CAR", "SOCAR PETROLEUM"] # OCR variants
STORE_NAME = "SOCAR PETROLEUM S.A."
# Standard TVA patterns for gas stations
TVA_PATTERNS = [
# Table format: "A-19,00% 285,66 49,58"
r'([A-D])\s*[-:]\s*(\d{1,2})[.,]\d{2}\s*%\s+([\d.,]+)\s+([\d.,]+)',
# Simple format: "TVA 19% 49,58"
r'TVA\s+(\d{1,2})\s*%\s*([\d.,]+)',
]
# Gas stations may use YYYY. MM. DD format
DATE_PATTERNS_OCR_SPACES = [
(r'(\d{4})[.,]\s*(\d{2})[.,]\s*(\d{2})\s+\d{2}:\d{2}', 0.98, 'ymd'),
(r'(\d{4})[.,]\s*(\d{2})[.,]\s*(\d{2})', 0.95, 'ymd'),
(r'(\d{2})[.,]\s*(\d{2})[.,]\s*(\d{4})\s+\d{2}:\d{2}', 0.92, 'dmy'),
(r'(\d{2})[.,]\s*(\d{2})[.,]\s*(\d{4})', 0.85, 'dmy'),
]
def extract_tva_entries(self, text: str) -> List[dict]:
"""
Extract SOCAR-specific TVA entries.
Args:
text: Raw OCR text from receipt
Returns:
List of TVA entries with code, percent, and amount
"""
entries = []
seen = set()
# Try table format first
table_pattern = self.TVA_PATTERNS[0]
for match in re.finditer(table_pattern, text, re.IGNORECASE):
try:
code = match.group(1).upper()
percent = int(match.group(2))
tva_amount = self._parse_decimal(match.group(4))
if tva_amount and tva_amount > 0:
entry_key = (code, percent)
if entry_key not in seen:
entries.append({
'code': code,
'percent': percent,
'amount': tva_amount
})
seen.add(entry_key)
except (ValueError, InvalidOperation):
continue
# Fallback to simple format if no table entries found
if not entries:
simple_pattern = self.TVA_PATTERNS[1]
for match in re.finditer(simple_pattern, text, re.IGNORECASE):
try:
percent = int(match.group(1))
amount = self._parse_decimal(match.group(2))
if amount and amount > 0:
# Default to code 'A' for simple format
entries.append({
'code': 'A',
'percent': percent,
'amount': amount
})
break # Only take first match for simple format
except (ValueError, InvalidOperation):
continue
return entries
def get_validation_hints(self) -> Dict[str, Any]:
"""Return SOCAR-specific validation hints."""
return {
"has_multi_rate_tva": False,
"card_equals_total": False,
"has_client_cui": True,
"has_efactura": False,
"is_non_vat_payer": False,
}

View File

@@ -0,0 +1,112 @@
"""
STEPOUT MARKET SRL store profile for OCR extraction.
Bookstore with reduced TVA rate (5% for books in Romania).
"""
import re
from decimal import Decimal, InvalidOperation
from typing import List, Dict, Any
from .base import BaseStoreProfile
from . import ProfileRegistry
@ProfileRegistry.register
class StepoutMarketProfile(BaseStoreProfile):
"""
STEPOUT MARKET SRL - reduced TVA rate profile (books).
Key characteristics:
- Reduced TVA rate: 5% for books (cărți qualification in Romania)
- May also have standard rates for non-book items
- Patterns are flexible to accept ANY TVA rate
- CARD payment typical
"""
CUI_LIST = ["35532655"]
NAME_PATTERNS = ["STEPOUT", "STEPOUT MARKET", "STEP0UT", "STEPOUT MARKET SRL"]
STORE_NAME = "STEPOUT MARKET SRL"
# TVA patterns (flexible - accepts any rate including 5%)
TVA_PATTERNS = [
# "TVA A: 5% = YY,YY" or "TVA-A 5% YY,YY" (coded format)
r'TVA\s*[-:]?\s*([A-D])\s*:?\s*(\d{1,2})\s*%\s*[=:]?\s*([\d.,]+)',
# "A - 5,00% = YY,YY" (table format)
r'([A-D])\s*[-:]\s*(\d{1,2})[.,]?\d{0,2}\s*%\s*[=:]?\s*([\d.,]+)',
# "TVA 5% YY,YY" (simple format - common for single rate)
r'TVA\s+(\d{1,2})\s*%\s*([\d.,]+)',
# "TVA 5,00%: YY,YY" (percent with colon)
r'TVA\s+(\d{1,2})[.,]\d{2}\s*%\s*:?\s*([\d.,]+)',
]
def extract_tva_entries(self, text: str) -> List[dict]:
"""
Extract TVA entries from receipt text.
Stepout Market primarily sells books which have 5% TVA in Romania.
The patterns are generic and will extract whatever rate is on the receipt.
Args:
text: Raw OCR text from receipt
Returns:
List of TVA entries with code, percent, and amount
"""
entries = []
seen = set()
# Try coded patterns first (have code letter)
for pattern in self.TVA_PATTERNS[:2]:
for match in re.finditer(pattern, text, re.IGNORECASE):
try:
code = match.group(1).upper()
percent = int(match.group(2))
amount = self._parse_decimal(match.group(3))
if amount and amount > 0:
entry_key = (code, percent)
if entry_key not in seen:
entries.append({
'code': code,
'percent': percent,
'amount': amount
})
seen.add(entry_key)
except (ValueError, InvalidOperation, IndexError):
continue
# Fallback to simple format (no code letter, just percent + amount)
if not entries:
for pattern in self.TVA_PATTERNS[2:]:
for match in re.finditer(pattern, text, re.IGNORECASE):
try:
percent = int(match.group(1))
amount = self._parse_decimal(match.group(2))
if amount and amount > 0:
# Default to code 'A' for simple format
entries.append({
'code': 'A',
'percent': percent,
'amount': amount
})
break # Only take first match for simple format
except (ValueError, InvalidOperation):
continue
if entries:
break
return entries
def get_validation_hints(self) -> Dict[str, Any]:
"""Return STEPOUT MARKET-specific validation hints."""
return {
"has_multi_rate_tva": False,
"card_equals_total": True,
"has_client_cui": True, # May have client CUI
"has_efactura": False,
"is_non_vat_payer": False,
"typical_tva_rate": 5, # Books have 5% TVA in Romania
"product_category": "books",
}

View File

@@ -0,0 +1,103 @@
"""
UNLIMITED KEYS S.R.L. store profile for OCR extraction.
Key duplication service. Notable for CASH (NUMERAR) payments.
"""
import re
from decimal import Decimal, InvalidOperation
from typing import List, Dict, Any
from .base import BaseStoreProfile
from . import ProfileRegistry
@ProfileRegistry.register
class UnlimitedKeysProfile(BaseStoreProfile):
"""
UNLIMITED KEYS S.R.L. - standard TVA profile with NUMERAR payment.
Key characteristics:
- Standard TVA format (single rate, any percentage)
- Key duplication service
- NUMERAR (cash) payment common - different from most stores!
- May also accept CARD
"""
CUI_LIST = ["18993187"]
NAME_PATTERNS = ["UNLIMITED KEYS", "UNLIMITED", "UNL1MITED", "UNLIMITED KEYS SRL"]
STORE_NAME = "UNLIMITED KEYS S.R.L."
# Standard TVA patterns (flexible - accepts any rate)
TVA_PATTERNS = [
# "TVA A: XX% = YY,YY" or "TVA-A XX% YY,YY"
r'TVA\s*[-:]?\s*([A-D])\s*:?\s*(\d{1,2})\s*%\s*[=:]?\s*([\d.,]+)',
# "A - XX,XX% = YY,YY"
r'([A-D])\s*[-:]\s*(\d{1,2})[.,]?\d{0,2}\s*%\s*[=:]?\s*([\d.,]+)',
# "TVA XX% YY,YY" (simple format without code)
r'TVA\s+(\d{1,2})\s*%\s*([\d.,]+)',
]
def extract_tva_entries(self, text: str) -> List[dict]:
"""
Extract TVA entries from receipt text.
Args:
text: Raw OCR text from receipt
Returns:
List of TVA entries with code, percent, and amount
"""
entries = []
seen = set()
# Try coded patterns first
for pattern in self.TVA_PATTERNS[:2]:
for match in re.finditer(pattern, text, re.IGNORECASE):
try:
code = match.group(1).upper()
percent = int(match.group(2))
amount = self._parse_decimal(match.group(3))
if amount and amount > 0:
entry_key = (code, percent)
if entry_key not in seen:
entries.append({
'code': code,
'percent': percent,
'amount': amount
})
seen.add(entry_key)
except (ValueError, InvalidOperation, IndexError):
continue
# Fallback to simple format
if not entries:
simple_pattern = self.TVA_PATTERNS[2]
for match in re.finditer(simple_pattern, text, re.IGNORECASE):
try:
percent = int(match.group(1))
amount = self._parse_decimal(match.group(2))
if amount and amount > 0:
entries.append({
'code': 'A',
'percent': percent,
'amount': amount
})
break
except (ValueError, InvalidOperation):
continue
return entries
def get_validation_hints(self) -> Dict[str, Any]:
"""Return UNLIMITED KEYS-specific validation hints."""
return {
"has_multi_rate_tva": False,
"card_equals_total": False, # May be NUMERAR (cash)
"has_client_cui": True, # May have client CUI
"has_efactura": False,
"is_non_vat_payer": False,
"common_payment": "NUMERAR", # Cash payments common
}