feat(ocr): Add modular store profiles with hot-reload support

## Store Profiles System

- Add ProfileRegistry for CUI-based profile lookup
- Add BaseStoreProfile with generic extraction patterns
- Implement hot-reload via POST /api/data-entry/ocr/profiles/reload

## 12 Store Profiles

- LIDL: Multi-rate TVA (A, B, C, D codes)
- OMV, SOCAR: B2B with client CUI, YYYY.MM.DD dates
- BRICK, DEDEMAN: Standard TVA, e-factura support
- KINETERRA, BEST PRINT: Non-VAT payers (returns [])
- STEPOUT MARKET: TVA 5% (books/reduced rate)
- UNLIMITED KEYS: NUMERAR payment detection
- GAMA INK, ELECTROBERING, PICTUS VELUM: Standard TVA

## Flexible TVA Patterns

- All patterns use (\d{1,2})% to accept any rate
- Supports historical (19%, 9%, 5%) and current (21%, 11%)

## Payment Methods Fix

- Fixed base.py to support multiple payments of same type
- Changed deduplication from method-only to (method, amount) tuple
- Returns separate entries for split payments

## Tools

- Add generate_store_profile.py for automatic profile generation
- Analyzes PDFs via OCR API and detects patterns

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 23:07:07 +00:00
parent 67b0082df0
commit 099556213d
25 changed files with 3707 additions and 114 deletions
--- a/.claude/rules/claude-learn-backend.md
+++ b/.claude/rules/claude-learn-backend.md
@@ -0,0 +1,106 @@
 # Claude Learn: Backend
 **Domain**: backend
 **Last updated**: 2026-01-06
 **Sessions recorded**: 1
 Knowledge about FastAPI, Python services, Oracle DB, and backend architecture.
 ---
 ## Patterns
 ### ProfileRegistry with Hot-Reload for Store Profiles
 **Discovered**: 2026-01-06 (feature: ocr-store-profiles)
 **Description**: OCR profile registration system that uses the `@ProfileRegistry.register` decorator, with hot-reload via `importlib.reload()`. Profiles can be added or modified without restarting the server.
 **Example** (`backend/modules/data_entry/services/ocr/profiles/__init__.py`):
 ```python
 class ProfileRegistry:
    _profiles: Dict[str, Type["BaseStoreProfile"]] = {}
    _instances: Dict[str, "BaseStoreProfile"] = {}
    @classmethod
    def register(cls, profile_class):
        """Decorator to register a store profile class."""
        for cui in profile_class.CUI_LIST:
            cls._profiles[cls._normalize_cui(cui)] = profile_class
        return profile_class
    @classmethod
    def reload_all(cls):
        """Hot-reload all profile modules via importlib.reload()."""
        cls._instances.clear()
        for module_name in cls._get_profile_module_names():
            importlib.reload(sys.modules[f"backend...profiles.{module_name}"])
 ```
 **Usage**:
 ```python
@ProfileRegistry.register
 class LidlProfile(BaseStoreProfile):
    CUI_LIST = ["22891860"]
    STORE_NAME = "LIDL DISCOUNT S.R.L."
 # Lookup
 profile = ProfileRegistry.get_profile("22891860")
 # Hot-reload (HTTP endpoint, not a Python statement):
 # POST /api/data-entry/ocr/profiles/reload
 ```
 **Tags**: registry-pattern, hot-reload, decorator, ocr, singleton
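The decorator-plus-reload mechanism described above can be tried in isolation. The sketch below is a minimal stand-in, not the project's code: `REGISTRY`, `register`, the `demo_registry` module name and `DemoProfile` are all hypothetical, and the "profile" is a throwaway file written to a temporary directory. It only shows that `importlib.reload()` re-executes the module, so the decorator runs again and the registry ends up pointing at the new class.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path
from typing import Dict

# Skip .pyc files so a quick rewrite of the source is never masked by a cache.
sys.dont_write_bytecode = True

# Hypothetical stand-in for ProfileRegistry._profiles: CUI -> profile class.
REGISTRY: Dict[str, type] = {}


def register(profile_class: type) -> type:
    """Class decorator: (re)register the class under each of its CUIs."""
    for cui in profile_class.CUI_LIST:
        REGISTRY[cui] = profile_class
    return profile_class


# Expose the decorator under an importable name for the generated module.
registry_module = types.ModuleType("demo_registry")
registry_module.register = register
sys.modules["demo_registry"] = registry_module

PROFILE_SOURCE = '''from demo_registry import register

@register
class DemoProfile:
    CUI_LIST = ["12345678"]
    STORE_NAME = {store_name!r}
'''

with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "demo_profile.py"
    module_path.write_text(PROFILE_SOURCE.format(store_name="Old Name"))
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("demo_profile")
        name_before = REGISTRY["12345678"].STORE_NAME

        # Edit the file on disk, then reload: the decorator runs a second time.
        module_path.write_text(PROFILE_SOURCE.format(store_name="New Name SRL"))
        importlib.reload(module)
        name_after = REGISTRY["12345678"].STORE_NAME
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_profile", None)
```

One consequence of this design is worth keeping in mind: reloading only overwrites entries for classes that still exist, so an entry whose module was deleted stays in the mapping unless it is removed explicitly.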
 ---
 ### Script for generating Python code from PDF analysis
 **Discovered**: 2026-01-06 (feature: ocr-store-profiles)
 **Description**: Script that analyzes PDFs via the OCR API, detects patterns (TVA format, date format, payment) and automatically generates Python code for new profiles. Includes JWT auth, async polling, and a syntax check.
 **Example** (`scripts/generate_store_profile.py`):
 ```python
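 import re
 from collections import defaultdict
 from typing import Dict, List
 def analyze_tva_patterns(results: List[Dict]) -> Dict:
    """Detect the dominant TVA format in the OCR results."""
    tva_formats = defaultdict(int)
    # Assumption: each OCR result exposes its text under a "raw_text" key (hypothetical name)
    raw_texts = [result.get("raw_text", "") for result in results]
    for text in raw_texts:
        text_upper = text.upper()
        if re.search(r'TVA\s+[A-D]\s+\d{1,2}', text_upper):
            tva_formats["lidl_multi_rate"] += 1
        if re.search(r'BAZA\s+TVA', text_upper):
            tva_formats["table"] += 1
    # default=None avoids a ValueError when no format was detected
    return {"dominant_format": max(tva_formats, key=tva_formats.get, default=None)}
 def generate_profile_code(store_name, cui, tva_analysis, ...):
    """Generate Python code for the profile class."""
    # Template-based generation with OCR error variants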
 **Usage**:
 ```bash
 # Dry-run to preview
 python scripts/generate_store_profile.py \
  --name "Magazin Nou" --cui "12345678" \
  --receipts "docs/data-entry/MagazinNou*.pdf" --dry-run
 # Generate and save
 python scripts/generate_store_profile.py \
  --name "Magazin Nou" --cui "12345678" \
  --receipts "docs/data-entry/MagazinNou*.pdf" \
  --output backend/.../profiles/magazin_nou.py
 ```
 **Tags**: code-generation, ocr, automation, cli-tool
 ---
 ## Gotchas
 _(None recorded yet)_
 ---
 ## Statistics
 - **Total Patterns**: 2
 - **Total Gotchas**: 0
 - **Last Session**: 2026-01-06
 - **Sessions Recorded**: 1
--- a/.claude/rules/claude-learn-database.md
+++ b/.claude/rules/claude-learn-database.md
@@ -0,0 +1,28 @@
 # Claude Learn: Database
 **Domain**: database
 **Last updated**: -
 **Sessions recorded**: 0
 Knowledge about Oracle DB, SQLite, SQLModel, migrations, and data modeling.
 ---
 ## Patterns
 _(None recorded yet)_
 ---
 ## Gotchas
 _(None recorded yet)_
 ---
 ## Statistics
 - **Total Patterns**: 0
 - **Total Gotchas**: 0
 - **Last Session**: -
 - **Sessions Recorded**: 0
--- a/.claude/rules/claude-learn-deployment.md
+++ b/.claude/rules/claude-learn-deployment.md
@@ -0,0 +1,55 @@
 # Claude Learn: Deployment
 **Domain**: deployment
 **Last updated**: 2026-01-06
 **Sessions recorded**: 1
 Knowledge about IIS, Docker, deployment scripts, and infrastructure.
 ---
 ## Patterns
 ### IIS URL Rewrite Rules for SPA with Multiple API Backends
 **Discovered**: 2025-12-22 (feature: unified-app)
 **Description**: Configure IIS web.config to proxy different API paths to different backend ports while serving SPA for all other routes. Enables single IIS site to route to multiple microservices.
 **Example** (`public/web.config:5-28`):
 ```xml
 <rewrite>
  <rules>
    <rule name="Proxy Reports API" stopProcessing="true">
      <match url="^api/reports/(.*)" />
      <action type="Rewrite" url="http://localhost:8001/api/{R:1}" />
    </rule>
    <rule name="Proxy Data Entry API" stopProcessing="true">
      <match url="^api/data-entry/(.*)" />
      <action type="Rewrite" url="http://localhost:8003/api/{R:1}" />
    </rule>
    <rule name="SPA Fallback" stopProcessing="true">
      <match url=".*" />
      <conditions logicalGrouping="MatchAll">
        <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
      </conditions>
      <action type="Rewrite" url="/index.html" />
    </rule>
  </rules>
 </rewrite>
 ```
 **Tags**: iis, deployment, spa, microservices, proxy
 ---
 ## Gotchas
 _(None recorded yet)_
 ---
 ## Statistics
 - **Total Patterns**: 1
 - **Total Gotchas**: 0
 - **Last Session**: 2026-01-06
 - **Sessions Recorded**: 1
--- a/.claude/rules/claude-learn-domains.md
+++ b/.claude/rules/claude-learn-domains.md
@@ -0,0 +1,83 @@
 # Claude Learn Domains Configuration
 **Last updated**: 2026-01-06
 This file defines available knowledge domains and their file path patterns.
 ---
 ## Domains
 ### frontend
 **File**: `claude-learn-frontend.md`
 **Patterns**:
 - `src/**/*.vue`
 - `src/**/*.js`
 - `src/**/*.ts`
 - `src/**/*.css`
 - `vite.config.*`
 - `package.json`
 ---
 ### backend
 **File**: `claude-learn-backend.md`
 **Patterns**:
 - `backend/**/*.py`
 - `backend/modules/**/*`
 - `requirements.txt`
 ---
 ### database
 **File**: `claude-learn-database.md`
 **Patterns**:
 - `**/*.sql`
 - `**/models.py`
 - `**/schemas.py`
 - `backend/**/db/**/*`
 ---
 ### testing
 **File**: `claude-learn-testing.md`
 **Patterns**:
 - `tests/**/*`
 - `**/*.test.*`
 - `**/*.spec.*`
 - `pytest.ini`
 - `vitest.config.*`
 ---
 ### deployment
 **File**: `claude-learn-deployment.md`
 **Patterns**:
 - `deployment/**/*`
 - `public/web.config`
 - `Dockerfile*`
 - `docker-compose*.yml`
 - `*.sh`
 - `ansible/**/*`
 ---
 ### global
 **File**: `claude-learn-global.md`
 **Patterns**:
 - `*` (catch-all for cross-cutting concerns)
 ---
 ## Statistics
 | Domain | Patterns | Gotchas | Last Updated |
 |--------|----------|---------|--------------|
 | frontend | 8 | 10 | 2026-01-06 |
 | deployment | 1 | 0 | 2026-01-06 |
 | global | 0 | 1 | 2026-01-06 |
 | backend | 2 | 0 | 2026-01-06 |
 | database | 0 | 0 | - |
 | testing | 0 | 0 | - |
 **Total**: 11 patterns, 11 gotchas across 4 domains
--- a/.claude/rules/claude-learn-frontend.md
+++ b/.claude/rules/claude-learn-frontend.md
@@ -1,9 +1,10 @@
-# Learned Patterns & Gotchas
+# Claude Learn: Frontend
-**Last updated**: 2025-12-24
+**Domain**: frontend
-**Maintained**: Manually (add new patterns/gotchas as discovered)
+**Last updated**: 2026-01-06
 **Sessions recorded**: 3
-This file contains insights learned during feature implementations. Claude Code auto-loads this file to prevent repeating past mistakes.
+Knowledge about Vue.js, Vite, Pinia, CSS, and frontend architecture.
 ---
@@ -130,37 +131,6 @@ resolve: {
 ---
 ### IIS URL Rewrite Rules for SPA with Multiple API Backends
 **Discovered**: 2025-12-22 (feature: unified-app)
 **Description**: Configure IIS web.config to proxy different API paths to different backend ports while serving SPA for all other routes. Enables single IIS site to route to multiple microservices.
 **Example** (`public/web.config:5-28`):
 ```xml
 <rewrite>
  <rules>
    <rule name="Proxy Reports API" stopProcessing="true">
      <match url="^api/reports/(.*)" />
      <action type="Rewrite" url="http://localhost:8001/api/{R:1}" />
    </rule>
    <rule name="Proxy Data Entry API" stopProcessing="true">
      <match url="^api/data-entry/(.*)" />
      <action type="Rewrite" url="http://localhost:8003/api/{R:1}" />
    </rule>
    <rule name="SPA Fallback" stopProcessing="true">
      <match url=".*" />
      <conditions logicalGrouping="MatchAll">
        <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
      </conditions>
      <action type="Rewrite" url="/index.html" />
    </rule>
  </rules>
 </rewrite>
 ```
 **Tags**: iis, deployment, spa, microservices, proxy
 ---
 ### Vue Watcher for Auto-Loading Dependent Data
 **Discovered**: 2025-12-24 (feature: unified-app-ux)
 **Description**: Use Vue watch() to automatically trigger data loading when dependent selections change. Watch company selection changes to auto-load accounting periods, ensuring UI stays synchronized without manual intervention.
@@ -248,15 +218,6 @@ const getStorageKey = () => {
 ---
 ### Sed Command Quote Mismatch in Bulk Find-Replace
 **Discovered**: 2025-12-22 (feature: unified-app)
 **Problem**: Bulk sed commands using single quotes in pattern didn't match imports using double quotes, and vice versa. Commands like sed 's|from '@/stores/'|...' didn't replace from "@/stores/" lines.
 **Solution**: Always use the quote style that matches the target files. For Vue/JS files with ESLint using double quotes, use double quotes in sed patterns. Better yet: use find -exec with separate sed for each file to handle both quote styles.
 **Tags**: sed, regex, scripting, find-replace, migration
 ---
 ### Circular Reference in API Wrapper
 **Discovered**: 2025-12-22 (feature: unified-app)
 **Problem**: receiptsStore.js failed to build with 'Identifier api has already been declared' because it imported api and then declared const api = { ... } wrapper object using the same name.
@@ -287,7 +248,7 @@ const getStorageKey = () => {
 ### Vite Build Transform Count is Progress Indicator
 **Discovered**: 2025-12-22 (feature: unified-app)
 **Problem**: Hard to tell if build is making progress when fixing import issues. Each fix revealed new errors, causing frustration.
-**Solution**: Watch the 'transforming... ✓ N modules transformed' count - it increases with each successful fix even if build ultimately fails. Going from 200→573→1490→1492 modules meant we were getting close to success. Use this as encouragement!
+**Solution**: Watch the 'transforming... N modules transformed' count - it increases with each successful fix even if build ultimately fails. Going from 200->573->1490->1492 modules meant we were getting close to success. Use this as encouragement!
 **Tags**: vite, build, debugging, progress-tracking, developer-experience
@@ -329,9 +290,9 @@ const getStorageKey = () => {
 ---
-## Memory Statistics
+## Statistics
- **Total Patterns**: 9
+- **Total Patterns**: 8
- **Total Gotchas**: 11
+- **Total Gotchas**: 10
- **Last Session**: 2025-12-24 (unified-app-ux)
+- **Last Session**: 2026-01-06
- **Sessions Recorded**: 2
+- **Sessions Recorded**: 3
--- a/.claude/rules/claude-learn-global.md
+++ b/.claude/rules/claude-learn-global.md
@@ -0,0 +1,33 @@
 # Claude Learn: Global
 **Domain**: global
 **Last updated**: 2026-01-06
 **Sessions recorded**: 1
 Cross-cutting knowledge applicable to multiple domains (scripting, tooling, workflow).
 ---
 ## Patterns
 _(None recorded yet)_
 ---
 ## Gotchas
 ### Sed Command Quote Mismatch in Bulk Find-Replace
 **Discovered**: 2025-12-22 (feature: unified-app)
 **Problem**: Bulk sed commands using single quotes in pattern didn't match imports using double quotes, and vice versa. Commands like sed 's|from '@/stores/'|...' didn't replace from "@/stores/" lines.
 **Solution**: Always use the quote style that matches the target files. For Vue/JS files with ESLint using double quotes, use double quotes in sed patterns. Better yet: use find -exec with separate sed for each file to handle both quote styles.
 **Tags**: sed, regex, scripting, find-replace, migration
 ---
 ## Statistics
 - **Total Patterns**: 0
 - **Total Gotchas**: 1
 - **Last Session**: 2026-01-06
 - **Sessions Recorded**: 1
--- a/.claude/rules/claude-learn-testing.md
+++ b/.claude/rules/claude-learn-testing.md
@@ -0,0 +1,28 @@
 # Claude Learn: Testing
 **Domain**: testing
 **Last updated**: -
 **Sessions recorded**: 0
 Knowledge about pytest, Vitest, test patterns, and validation strategies.
 ---
 ## Patterns
 _(None recorded yet)_
 ---
 ## Gotchas
 _(None recorded yet)_
 ---
 ## Statistics
 - **Total Patterns**: 0
 - **Total Gotchas**: 0
 - **Last Session**: -
 - **Sessions Recorded**: 0
--- a/backend/modules/data_entry/routers/ocr.py
+++ b/backend/modules/data_entry/routers/ocr.py
@@ -628,3 +628,86 @@ def _dict_to_extraction_data(data: dict) -> ExtractionData:
        validation_errors=data.get('validation_errors', []),
        inter_ocr_ratios=data.get('inter_ocr_ratios', {}),
    )
 # ============================================================================
 # Store Profiles Management Endpoints
 # ============================================================================
@router.post("/profiles/reload")
 async def reload_store_profiles(
    current_user: CurrentUser = Depends(get_current_user)
 ) -> dict:
    """
    Hot-reload all store profiles.
    Reloads profile Python modules without server restart.
    Use after adding/modifying profile files.
    Returns:
        Dict with reloaded count and profile list
    """
    from backend.modules.data_entry.services.ocr.profiles import ProfileRegistry
    count = ProfileRegistry.reload_all()
    status = ProfileRegistry.get_reload_status()
    return {
        "success": True,
        "reloaded_modules": count,
        "profiles_count": status["profiles_count"],
        "registered_cuis": status["registered_cuis"],
        "last_reload": status["last_reload"],
    }
@router.get("/profiles")
 async def list_store_profiles(
    current_user: CurrentUser = Depends(get_current_user)
 ) -> dict:
    """
    List all registered store profiles.
    Returns:
        Dict with profiles list and status
    """
    from backend.modules.data_entry.services.ocr.profiles import ProfileRegistry
    profiles = ProfileRegistry.list_profiles()
    status = ProfileRegistry.get_reload_status()
    return {
        "profiles": profiles,
        "count": len(profiles),
        "last_reload": status["last_reload"],
    }
@router.get("/profiles/{cui}")
 async def get_store_profile(
    cui: str,
    current_user: CurrentUser = Depends(get_current_user)
 ) -> dict:
    """
    Get details for a specific store profile.
    Args:
        cui: Store CUI (with or without RO prefix)
    Returns:
        Profile details including validation hints
    Raises:
        404: If no profile exists for this CUI
    """
    from backend.modules.data_entry.services.ocr.profiles import ProfileRegistry
    info = ProfileRegistry.get_profile_info(cui)
    if not info:
        raise HTTPException(
            status_code=404,
            detail=f"No profile registered for CUI: {cui}"
        )
    return info
--- a/backend/modules/data_entry/services/ocr/profiles/README.md
+++ b/backend/modules/data_entry/services/ocr/profiles/README.md
@@ -0,0 +1,258 @@
 # Store Profiles - OCR Extraction
 Store-specific profiles for OCR extraction, with hot-reload.
 ---
 ## Quick Start: Add a new profile
 ```bash
 # 1. Generate a profile from PDFs (dry-run to preview)
 python scripts/generate_store_profile.py \
  --name "Magazin Nou SRL" \
  --cui "12345678" \
  --receipts "docs/data-entry/MagazinNou*.pdf" \
  --dry-run
 # 2. Generate and save
 python scripts/generate_store_profile.py \
  --name "Magazin Nou SRL" \
  --cui "12345678" \
  --receipts "docs/data-entry/MagazinNou*.pdf" \
  --output backend/modules/data_entry/services/ocr/profiles/magazin_nou.py
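 # 3. Hot-reload (no server restart); the endpoint requires an authenticated user
 curl -X POST http://localhost:8000/api/data-entry/ocr/profiles/reload -H "Authorization: Bearer <token>"
 # 4. Verify
 curl http://localhost:8000/api/data-entry/ocr/profiles -H "Authorization: Bearer <token>"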
 ```
 ---
 ## Directory structure
 ```
 profiles/
 ├── __init__.py        # ProfileRegistry + hot-reload (~390 lines)
 ├── base.py            # BaseStoreProfile + generic patterns (~410 lines)
 ├── lidl.py            # Multi-rate TVA (A/B/C/D)
 ├── omv.py             # B2B, YYYY.MM.DD dates
 ├── socar.py           # B2B, YYYY.MM.DD dates
 ├── brick.py           # Standard TVA
 ├── dedeman.py         # E-factura support
 ├── kineterra.py       # Non-VAT payer
 ├── gama_ink.py        # Standard TVA (toner/cartridges)
 ├── electrobering.py   # Standard TVA (electronics)
 ├── pictus_velum.py    # Standard TVA (stationery)
 ├── unlimited_keys.py  # Standard TVA, NUMERAR payment
 ├── best_print.py      # Non-VAT payer
 ├── stepout_market.py  # TVA 5% (books/bookstore)
 └── README.md          # This file
 ```
 ---
 ## Existing profiles (12 profiles)
 > **Note**: The TVA patterns are **flexible** and accept ANY rate (5%, 9%, 11%, 19%, 21%, etc.)
 > in order to handle both historical data and future changes in legislation.
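As an illustration of a rate-agnostic pattern, here is a minimal sketch built around the `(\d{1,2})%` group mentioned in the commit message. The line layout `TVA <code> <rate>% <amount>` and the names `TVA_LINE` and `find_tva_rates` are assumptions made for this example; the real profiles use their own store-specific patterns.

```python
import re
from decimal import Decimal
from typing import List, Tuple

# Assumed layout: "TVA <code> <rate>% <amount>". The rate group accepts any
# one- or two-digit percentage instead of a hard-coded list of rates.
TVA_LINE = re.compile(r"TVA\s+([A-D])\s+(\d{1,2})%\s+(\d+[.,]\d{2})")


def find_tva_rates(text: str) -> List[Tuple[str, int, Decimal]]:
    """Return (code, rate, amount) for every TVA line found in the text."""
    found = []
    for code, rate, amount in TVA_LINE.findall(text.upper()):
        # Receipts may use a decimal comma; Decimal needs a dot.
        found.append((code, int(rate), Decimal(amount.replace(",", "."))))
    return found


sample = "TVA A 19% 12,50\nTVA B 9% 3,20\nTVA A 21% 4.10"
entries = find_tva_rates(sample)
```

Because the rate is captured rather than enumerated, the same pattern matches both the historical 19% line and the 21% line in the sample.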
 | Store | CUI | File | Characteristics |
 |---------|-----|--------|----------------|
 | LIDL DISCOUNT S.R.L. | 22891860 | `lidl.py` | Multi-rate TVA (codes A, B, C, D) |
 | OMV PETROM MARKETING S.R.L. | 11201891 | `omv.py` | B2B (client CUI), YYYY.MM.DD dates |
 | SOCAR PETROLEUM S.A. | 12546600 | `socar.py` | B2B (client CUI), YYYY.MM.DD dates |
 | FIVE-HOLDING S.A. (BRICK) | 10562600 | `brick.py` | Standard TVA |
 | DEDEMAN SRL | 2816464 | `dedeman.py` | E-factura support |
 | KINETERRA CONCEPT SRL | 31180432 | `kineterra.py` | Non-VAT payer (returns `[]`) |
 | GAMA INK SERVICE SRL | 17741882 | `gama_ink.py` | Standard TVA (toner, cartridges) |
 | ELECTROBERING S.R.L. | 2744937 | `electrobering.py` | Standard TVA (electronics) |
 | PICTUS VELUM SRL | 39634534 | `pictus_velum.py` | Standard TVA (stationery) |
 | UNLIMITED KEYS S.R.L. | 18993187 | `unlimited_keys.py` | Standard TVA, **NUMERAR** payment |
 | BEST PRINT TRADE ACTIV SRL | 45417955 | `best_print.py` | **Non-VAT payer** |
 | STEPOUT MARKET SRL | 35532655 | `stepout_market.py` | TVA 5% (books, bookstore) |
 ---
 ## API Endpoints
 | Endpoint | Method | Description |
 |----------|--------|-----------|
 | `/api/data-entry/ocr/profiles` | GET | List all profiles |
 | `/api/data-entry/ocr/profiles/{cui}` | GET | Profile details (accepts the RO prefix) |
 | `/api/data-entry/ocr/profiles/reload` | POST | Hot-reload all profiles |
 ### API examples
 ```bash
 # List profiles
 curl http://localhost:8000/api/data-entry/ocr/profiles \
  -H "Authorization: Bearer <token>"
 # Profile details (with or without the RO prefix)
 curl http://localhost:8000/api/data-entry/ocr/profiles/22891860 -H "Authorization: Bearer <token>"
 curl http://localhost:8000/api/data-entry/ocr/profiles/RO22891860 -H "Authorization: Bearer <token>"
 # Hot-reload after changes
 curl -X POST http://localhost:8000/api/data-entry/ocr/profiles/reload \
  -H "Authorization: Bearer <token>"
 # Reload response:
 {
  "success": true,
  "reloaded_modules": 12,
  "profiles_count": 12,
  "registered_cuis": ["22891860", "11201891", "12546600", "10562600", ...],
  "last_reload": "2026-01-06T22:37:05.000000"
 }
 ```
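The same reload call can be made from Python with only the standard library. This is a sketch under assumptions: `build_reload_request` and `summarize_reload` are hypothetical helper names, the base URL and token are placeholders, and the response fields are the ones shown in the sample response above. The request is built but not sent, so the example runs without a server.

```python
import json
import urllib.request


def build_reload_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the reload endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/data-entry/ocr/profiles/reload",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def summarize_reload(body: bytes) -> str:
    """Turn the JSON response body into a one-line summary."""
    payload = json.loads(body)
    return f"{payload['reloaded_modules']} modules, {payload['profiles_count']} profiles"


request = build_reload_request("http://localhost:8000/", "<token>")
# To actually send it: urllib.request.urlopen(request).read()
summary = summarize_reload(
    b'{"success": true, "reloaded_modules": 12, "profiles_count": 12}'
)
```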
 ---
 ## How the system works
 ### Extraction flow
 ```
 ReceiptExtractor.extract()
  │
  ├─► STEP 1: Extract vendor + CUI
  │     └─► _extract_vendor(), _extract_cui()
  │
  ├─► ProfileRegistry.get_profile(cui)
  │     └─► Returns the store-specific profile or None
  │
  ├─► STEP 2: Extraction with the profile (if one exists)
  │     ├─► profile.extract_total()
  │     ├─► profile.extract_date()
  │     ├─► profile.extract_receipt_number()
  │     ├─► profile.extract_tva_entries()
  │     ├─► profile.extract_payment_methods()
  │     └─► profile.extract_client_cui()
  │
  └─► STEP 3-4: Validation + post-processing
 ```
 ### Fallback
 If no profile exists for the CUI, the generic logic in `ReceiptExtractor` is used.
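The lookup-then-fallback step can be sketched as follows. Everything here is a simplified stand-in: `PROFILES`, `get_profile`, `generic_tva_entries` and `extract_tva` are hypothetical names, and the real `ReceiptExtractor` is not shown in this diff. The sketch assumes that an empty list returned by a profile is a valid answer (as for a non-VAT payer) and must not trigger the generic path.

```python
from typing import Dict, List, Optional


class NonVatProfile:
    """Hypothetical profile for a non-VAT payer: never reports TVA entries."""

    def extract_tva_entries(self, text: str) -> List[dict]:
        return []


# Hypothetical registry content: CUI -> profile instance.
PROFILES: Dict[str, NonVatProfile] = {"31180432": NonVatProfile()}


def get_profile(cui: Optional[str]) -> Optional[NonVatProfile]:
    """Return the profile registered for this CUI, or None."""
    if not cui:
        return None
    return PROFILES.get(cui.strip())


def generic_tva_entries(text: str) -> List[dict]:
    """Stand-in for the generic extraction logic."""
    return [{"source": "generic"}]


def extract_tva(text: str, cui: Optional[str]) -> List[dict]:
    """Use the store profile when one exists, otherwise the generic logic."""
    profile = get_profile(cui)
    if profile is not None:
        # Do not write `profile_result or generic(...)`: [] is a real result.
        return profile.extract_tva_entries(text)
    return generic_tva_entries(text)


with_profile = extract_tva("receipt text", "31180432")
without_profile = extract_tva("receipt text", "99999999")
```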
 ---
 ## Structure of a profile
 ```python
 from typing import Any, Dict, List
 from .base import BaseStoreProfile
 from . import ProfileRegistry
 @ProfileRegistry.register
 class MagazinNouProfile(BaseStoreProfile):
    """Docstring describing the store."""
    CUI_LIST = ["12345678"]  # May contain several CUIs
    NAME_PATTERNS = ["MAGAZIN", "MAGAZIN NOU", "MAG4ZIN"]  # OCR variants
    STORE_NAME = "Magazin Nou SRL"
    # Override only what differs from the base class
    def extract_tva_entries(self, text: str) -> List[dict]:
        # Store-specific patterns
        ...
    def get_validation_hints(self) -> Dict[str, Any]:
        return {
            "has_multi_rate_tva": False,
            "card_equals_total": True,
            "has_client_cui": False,
            "has_efactura": False,
            "is_non_vat_payer": False,
        }
 ```
 ---
 ## Patterns available in base.py
 BaseStoreProfile includes generic, OCR-tolerant patterns:
 | Pattern | Description |
 |---------|-----------|
 | `TOTAL_PATTERNS` | 8 variants for TOTAL (TOTAL:, TOTAL DE PLATA, etc.) |
 | `DATE_PATTERNS` | 6 variants (DD.MM.YYYY, YYYY-MM-DD, DD/MM/YYYY) |
 | `DATE_PATTERNS_OCR_SPACES` | 4 variants with OCR-inserted spaces ("2025. 08. 14") |
 | `NUMBER_PATTERNS` | 11 variants for the receipt number (NDS, BF, C3POS) |
 | `PAYMENT_PATTERNS` | 8 variants for CARD/NUMERAR |
 | `CLIENT_MARKERS` | 6 variants for the CLIENT section |
 | `CLIENT_CUI_PATTERNS` | 7 variants for the client CUI |
 ### Methods implemented in the base class
 - `extract_total(text)` → `Tuple[Decimal, float]`
 - `extract_date(text)` → `Tuple[date, float]`
 - `extract_receipt_number(text)` → `Tuple[str, float]`
 - `extract_payment_methods(text)` → `List[dict]`
 - `extract_client_cui(text)` → `Tuple[str, float]`
 - `extract_client_name(text)` → `Tuple[str, float]`
 ---
 ## When do you need a custom profile?
 | Situation | Example | What to override |
 |----------|---------|---------------------|
 | **Multi-rate TVA** | Lidl (TVA A, TVA B) | `extract_tva_entries()` |
 | **Special date format** | OMV/Socar (YYYY.MM.DD) | `DATE_PATTERNS_OCR_SPACES` |
 | **B2B receipts** | Gas stations (have a client CUI) | `extract_client_cui()` |
 | **Non-VAT payer** | Kineterra | `extract_tva_entries()` returns `[]` |
 | **E-factura** | Dedeman | `extract_efactura_reference()` |
 ---
 ## Design decisions
 1. **Manual hot-reload** - the `/profiles/reload` endpoint is called when files change
 2. **Persistence in Python** - profiles live in Git, version controlled
 3. **Graceful fallback** - if no profile exists, the generic logic is used
 4. **CUI normalization** - automatically handles the "RO" prefix and whitespace
 5. **TVA deduplication** - uses `seen = set()` to avoid duplicates
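Decisions 4 and 5 can be illustrated with a short sketch. It is not the code from `__init__.py` or `base.py` (the body of `_normalize_cui` does not appear in this diff): `normalize_cui`, `dedupe_tva` and the `rate`/`amount` keys are assumptions chosen for the example.

```python
from typing import Dict, List, Set, Tuple


def normalize_cui(cui: str) -> str:
    """Remove all whitespace and an optional leading 'RO' (any case)."""
    cleaned = "".join(cui.split()).upper()
    if cleaned.startswith("RO"):
        cleaned = cleaned[2:]
    return cleaned


def dedupe_tva(entries: List[Dict]) -> List[Dict]:
    """Keep the first occurrence of each (rate, amount) pair, in order."""
    seen: Set[Tuple[int, str]] = set()
    unique: List[Dict] = []
    for entry in entries:
        key = (entry["rate"], entry["amount"])
        if key in seen:
            continue  # same TVA line matched by more than one pattern
        seen.add(key)
        unique.append(entry)
    return unique


cuis = [normalize_cui(raw) for raw in ("22891860", "RO22891860", " ro 22891860 ")]
tva = dedupe_tva([
    {"rate": 19, "amount": "12.50"},
    {"rate": 19, "amount": "12.50"},
    {"rate": 9, "amount": "3.20"},
])
```

Keying the `seen` set on a tuple rather than on a single field mirrors the payment-method fix in this commit, where deduplication moved from method-only to `(method, amount)` so that split payments of the same type survive.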
 ---
 ## Useful commands
 ```bash
 # Check Python syntax for all profiles
 for f in backend/modules/data_entry/services/ocr/profiles/*.py; do
  python3 -m py_compile "$f" && echo "✓ $(basename "$f")"
 done
 # List profile files
 ls -la backend/modules/data_entry/services/ocr/profiles/
 # Start the backend for testing
 cd backend && source venv/bin/activate
 uvicorn main:app --host 0.0.0.0 --port 8000 --workers 1
 # Test OCR on a PDF
 curl -X POST -F "file=@docs/data-entry/test.pdf" \
  -H "Authorization: Bearer <token>" \
  "http://localhost:8000/api/data-entry/ocr/extract?engine=doctr_plus"
 ```
 ---
 ## Profile generation script
 `scripts/generate_store_profile.py` - automatic profile generator
 ```bash
 # Show help
 python scripts/generate_store_profile.py --help
 # Features:
 # - Analyzes PDFs via the OCR API
 # - Detects: TVA format, date format, payment patterns, B2B
 # - Generates Python code with OCR error variants
 # - Supports glob patterns (*.pdf)
 # - Checks syntax after generation
 ```
--- a/backend/modules/data_entry/services/ocr/profiles/__init__.py
+++ b/backend/modules/data_entry/services/ocr/profiles/__init__.py
@@ -0,0 +1,388 @@
 """
 Store Profiles Registry with Hot-Reload Support.
 This module provides a registry for store-specific OCR extraction profiles.
 Profiles can be reloaded at runtime without restarting the server.
 Usage:
    from backend.modules.data_entry.services.ocr.profiles import ProfileRegistry
    # Get profile for a CUI
    profile = ProfileRegistry.get_profile("22891860")
    if profile:
        tva_entries = profile.extract_tva_entries(text)
    # Reload all profiles (after file changes)
    count = ProfileRegistry.reload_all()
 Architecture:
    - ProfileRegistry: Singleton registry with class methods
    - BaseStoreProfile: Abstract base class for profiles
    - @ProfileRegistry.register: Decorator for profile classes
 Hot-Reload Mechanism:
    1. Admin calls POST /profiles/reload endpoint
    2. Registry clears instance cache
    3. importlib.reload() re-executes each profile module
    4. @register decorator re-registers classes with new code
 """
 from __future__ import annotations
 import importlib
 import logging
 import sys
 from datetime import datetime
 from pathlib import Path
 from typing import Dict, List, Optional, Type, TYPE_CHECKING
 if TYPE_CHECKING:
    from .base import BaseStoreProfile
 logger = logging.getLogger(__name__)
 # Directory containing profile modules
 PROFILES_DIR = Path(__file__).parent
 class ProfileRegistry:
    """
    Registry for store-specific OCR extraction profiles.
    Uses class methods for singleton-like behavior without explicit instantiation.
    Supports hot-reload via importlib.reload() for runtime updates.
    Attributes:
        _profiles: Maps CUI -> profile class (not instance)
        _instances: Maps CUI -> profile instance (lazy, cleared on reload)
        _last_reload: Timestamp of last reload
        _loaded: Whether initial load has been performed
    """
    # Class-level storage (singleton pattern via class methods)
    _profiles: Dict[str, Type["BaseStoreProfile"]] = {}
    _instances: Dict[str, "BaseStoreProfile"] = {}
    _last_reload: Optional[datetime] = None
    _loaded: bool = False
    # -------------------------------------------------------------------------
    # Registration
    # -------------------------------------------------------------------------
    @classmethod
    def register(cls, profile_class: Type["BaseStoreProfile"]) -> Type["BaseStoreProfile"]:
        """
        Decorator to register a store profile class.
        Registers the profile for all CUIs in the class's CUI_LIST.
        Safe for re-registration during hot-reload (overwrites existing).
        Usage:
            @ProfileRegistry.register
            class LidlProfile(BaseStoreProfile):
                CUI_LIST = ["22891860"]
                ...
        Args:
            profile_class: Profile class to register
        Returns:
            The same class (allows use as decorator)
         Note:
            A class with an empty CUI_LIST is skipped with a warning; no exception is raised.
        """
        cui_list = getattr(profile_class, 'CUI_LIST', [])
        store_name = getattr(profile_class, 'STORE_NAME', profile_class.__name__)
        if not cui_list:
            logger.warning(f"Profile {profile_class.__name__} has empty CUI_LIST, skipping")
            return profile_class
        # Register for each CUI
        for cui in cui_list:
            # Normalize CUI (remove RO prefix, strip whitespace)
            normalized_cui = cls._normalize_cui(cui)
            if normalized_cui in cls._profiles:
                old_class = cls._profiles[normalized_cui]
                logger.debug(
                    f"Re-registering CUI {normalized_cui}: "
                    f"{old_class.__name__} -> {profile_class.__name__}"
                )
                # Clear cached instance for this CUI
                cls._instances.pop(normalized_cui, None)
            cls._profiles[normalized_cui] = profile_class
            logger.debug(f"Registered profile {profile_class.__name__} for CUI {normalized_cui}")
        logger.info(f"Registered {store_name} for CUIs: {cui_list}")
        return profile_class
    # -------------------------------------------------------------------------
    # Lookup
    # -------------------------------------------------------------------------
    @classmethod
    def get_profile(cls, cui: Optional[str]) -> Optional["BaseStoreProfile"]:
        """
        Get profile instance for a CUI.
        Uses lazy instantiation - creates instance on first access.
        Returns None if no profile is registered for this CUI.
        Args:
            cui: CUI to lookup (with or without RO prefix)
        Returns:
            Profile instance or None
        """
        if not cui:
            return None
        # Ensure profiles are loaded
        if not cls._loaded:
            cls._load_all_profiles()
        normalized_cui = cls._normalize_cui(cui)
        # Check if profile exists
        profile_class = cls._profiles.get(normalized_cui)
        if not profile_class:
            return None
        # Lazy instantiation
        if normalized_cui not in cls._instances:
            try:
                cls._instances[normalized_cui] = profile_class()
                logger.debug(f"Instantiated {profile_class.__name__} for CUI {normalized_cui}")
            except Exception as e:
                logger.error(f"Failed to instantiate {profile_class.__name__}: {e}")
                return None
        return cls._instances[normalized_cui]
    @classmethod
    def has_profile(cls, cui: Optional[str]) -> bool:
        """Check if a profile exists for this CUI."""
        if not cui:
            return False
        if not cls._loaded:
            cls._load_all_profiles()
        return cls._normalize_cui(cui) in cls._profiles
    # -------------------------------------------------------------------------
    # Listing
    # -------------------------------------------------------------------------
    @classmethod
    def list_profiles(cls) -> List[Dict]:
        """
        List all registered profiles.
        Returns:
            List of dicts with cuis, class_name, store_name, name_patterns
        """
        if not cls._loaded:
            cls._load_all_profiles()
        result = []
        seen_classes = set()
        for cui, profile_class in cls._profiles.items():
            # Avoid duplicates for profiles with multiple CUIs
            if profile_class.__name__ in seen_classes:
                continue
            seen_classes.add(profile_class.__name__)
            result.append({
                "cuis": list(getattr(profile_class, 'CUI_LIST', [])),
                "class_name": profile_class.__name__,
                "store_name": getattr(profile_class, 'STORE_NAME', profile_class.__name__),
                "name_patterns": list(getattr(profile_class, 'NAME_PATTERNS', [])),
            })
        return result
    @classmethod
    def get_profile_info(cls, cui: str) -> Optional[Dict]:
        """
        Get detailed info about a profile.
        Args:
            cui: CUI to lookup
        Returns:
            Dict with profile details or None
        """
        profile = cls.get_profile(cui)
        if not profile:
            return None
        return {
            "cui": cui,
            "cuis": list(profile.CUI_LIST),
            "class_name": profile.__class__.__name__,
            "store_name": profile.STORE_NAME,
            "name_patterns": list(profile.NAME_PATTERNS),
            "validation_hints": profile.get_validation_hints(),
        }
    # -------------------------------------------------------------------------
    # Hot-Reload
    # -------------------------------------------------------------------------
    @classmethod
    def reload_all(cls) -> int:
        """
        Hot-reload all profile modules.
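        Clears the instance cache and reloads every profile module in the
        profiles directory. The decorator re-registers classes with updated code.
        Existing registrations are not cleared first, so a profile whose module
        file was deleted stays registered.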
        Returns:
            Number of modules reloaded
        """
        logger.info("Starting profile hot-reload...")
        # Clear instance cache (will be recreated on next get_profile)
        cls._instances.clear()
        # Get list of profile modules (exclude __init__, base)
        module_names = cls._get_profile_module_names()
        count = 0
        for module_name in module_names:
            full_name = f"backend.modules.data_entry.services.ocr.profiles.{module_name}"
            try:
                if full_name in sys.modules:
                    # Reload existing module
                    importlib.reload(sys.modules[full_name])
                    logger.debug(f"Reloaded module: {module_name}")
                else:
                    # Import new module
                    importlib.import_module(full_name)
                    logger.debug(f"Imported new module: {module_name}")
                count += 1
            except Exception as e:
                logger.error(f"Failed to reload {module_name}: {e}")
        cls._last_reload = datetime.utcnow()
        cls._loaded = True
        logger.info(f"Profile hot-reload complete: {count} modules, {len(cls._profiles)} profiles")
        return count
    @classmethod
    def get_reload_status(cls) -> Dict:
        """Get status of the registry including last reload time."""
        return {
            "loaded": cls._loaded,
            "last_reload": cls._last_reload.isoformat() if cls._last_reload else None,
            "profiles_count": len(cls._profiles),
            "instances_count": len(cls._instances),
            "registered_cuis": list(cls._profiles.keys()),
        }
    # -------------------------------------------------------------------------
    # Internal methods
    # -------------------------------------------------------------------------
    @classmethod
    def _normalize_cui(cls, cui: str) -> str:
        """
        Normalize CUI for consistent lookup.
        - Removes RO prefix (with or without space)
        - Strips whitespace
        - Converts to uppercase
        Args:
            cui: Raw CUI string
        Returns:
            Normalized CUI (uppercase, RO prefix removed; other characters
            are not validated)
        """
        if not cui:
            return ""
        cui = str(cui).strip().upper()
        # Remove RO prefix (handles "RO12345" and "RO 12345")
        if cui.startswith("RO"):
            cui = cui[2:].lstrip()
        return cui.strip()
    @classmethod
    def _get_profile_module_names(cls) -> List[str]:
        """
        Get list of profile module names from profiles directory.
        Excludes __init__.py and base.py.
        Returns:
            List of module names (without .py extension)
        """
        excluded = {"__init__", "base", "__pycache__"}
        modules = []
        for path in PROFILES_DIR.glob("*.py"):
            name = path.stem
            if name not in excluded:
                modules.append(name)
        return sorted(modules)
    @classmethod
    def _load_all_profiles(cls) -> None:
        """
        Initial load of all profile modules.
        Called automatically by get_profile(), has_profile() and list_profiles()
        when the registry has not been loaded yet.
        """
        if cls._loaded:
            return
        logger.info("Loading store profiles...")
        module_names = cls._get_profile_module_names()
        for module_name in module_names:
            full_name = f"backend.modules.data_entry.services.ocr.profiles.{module_name}"
            try:
                importlib.import_module(full_name)
                logger.debug(f"Loaded module: {module_name}")
            except Exception as e:
                logger.error(f"Failed to load {module_name}: {e}")
        cls._loaded = True
        cls._last_reload = datetime.utcnow()
        logger.info(f"Loaded {len(cls._profiles)} store profiles")
    @classmethod
    def clear(cls) -> None:
        """
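        Clear all registered profiles.
        Mainly useful for testing. Modules that are already imported are not
        executed again by a plain import, so call reload_all() afterwards to
        register the profiles again.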
        """
        cls._profiles.clear()
        cls._instances.clear()
        cls._loaded = False
        cls._last_reload = None
 # -------------------------------------------------------------------------
 # Module exports
 # -------------------------------------------------------------------------
 __all__ = [
    "ProfileRegistry",
    "BaseStoreProfile",
 ]
 # Re-export BaseStoreProfile for convenience
 from .base import BaseStoreProfile
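
The register/lookup cycle described above can be sketched in isolation. This is a minimal illustration, not the registry itself: `MiniRegistry` and `DemoProfile` are hypothetical names, and the body of `register` is an assumption, since the real decorator is defined earlier in this file and may do more (logging, validation).

```python
from typing import Dict, Optional


class MiniRegistry:
    """Hypothetical stand-in for ProfileRegistry (illustration only)."""

    _profiles: Dict[str, type] = {}
    _instances: Dict[str, object] = {}

    @staticmethod
    def normalize_cui(cui: str) -> str:
        # Same steps as _normalize_cui: strip, uppercase, drop the "RO" prefix.
        cui = str(cui).strip().upper()
        if cui.startswith("RO"):
            cui = cui[2:].lstrip()
        return cui

    @classmethod
    def register(cls, profile_class: type) -> type:
        # Assumed behaviour: map every CUI in CUI_LIST to the class.
        for cui in getattr(profile_class, "CUI_LIST", []):
            cls._profiles[cls.normalize_cui(cui)] = profile_class
        return profile_class

    @classmethod
    def get_profile(cls, cui: Optional[str]) -> Optional[object]:
        if not cui:
            return None
        key = cls.normalize_cui(cui)
        profile_class = cls._profiles.get(key)
        if profile_class is None:
            return None
        # Lazy instantiation: create the instance on first lookup, then reuse it.
        if key not in cls._instances:
            cls._instances[key] = profile_class()
        return cls._instances[key]


@MiniRegistry.register
class DemoProfile:
    CUI_LIST = ["22891860"]


first = MiniRegistry.get_profile("RO 22891860")
second = MiniRegistry.get_profile("ro22891860")
print(type(first).__name__, first is second)  # DemoProfile True
```

Both spellings of the CUI resolve to the same cached instance, which is why `reload_all()` has to clear the instance cache before the updated classes can take effect.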
--- a/backend/modules/data_entry/services/ocr/profiles/base.py
+++ b/backend/modules/data_entry/services/ocr/profiles/base.py
@@ -0,0 +1,515 @@
 """
 Base class for store-specific OCR extraction profiles.
 Each store can have different receipt formats (TVA layout, total position, etc.).
 Store profiles allow customizing extraction logic per-store for better accuracy.
 Usage:
    from .base import BaseStoreProfile
    from . import ProfileRegistry
    @ProfileRegistry.register
    class LidlProfile(BaseStoreProfile):
        CUI_LIST = ["22891860"]
        NAME_PATTERNS = ["LIDL", "LDL"]
        def extract_tva_entries(self, text: str) -> List[dict]:
            # Custom Lidl TVA extraction logic
            ...
 """
 import re
 from abc import ABC
 from decimal import Decimal, InvalidOperation
 from typing import List, Optional, Tuple, Dict, Any
 from datetime import date
 class BaseStoreProfile(ABC):
    """
    Abstract base class for store-specific extraction profiles.
    Each profile defines:
    - CUI_LIST: CUI codes that identify this store (without RO prefix)
    - NAME_PATTERNS: OCR-tolerant name patterns for fallback matching
    - Custom extraction methods for TVA, total, date, etc.
    The ProfileRegistry uses CUI_LIST to lookup profiles during extraction.
    """
    # -------------------------------------------------------------------------
    # Class attributes - override in subclasses
    # -------------------------------------------------------------------------
    # List of CUI codes (without RO prefix) that identify this store
    CUI_LIST: List[str] = []
    # OCR-tolerant name patterns for fallback matching
    NAME_PATTERNS: List[str] = []
    # Store display name
    STORE_NAME: str = "Unknown Store"
    # -------------------------------------------------------------------------
    # Generic patterns - can be overridden in subclasses
    # -------------------------------------------------------------------------
    # Total amount patterns (confidence-weighted)
    TOTAL_PATTERNS = [
        (r'T[O0]TAL[.\s]+L[E3][I1!]\s*:?\s*([\d\s.,]+)', 0.98),
        (r'TOTAL\s+LEI\s*([\d\s.,]+)', 0.98),
        (r'[OT]?OTAL\s+LEI\s*([\d\s.,]+)', 0.95),
        # \b keeps the next two patterns from matching inside "SUBTOTAL"
        (r'\bTOTAL\s*:?\s*([\d\s.,]+)\s*(?:RON|LEI)?', 0.95),
        (r'\bTOTAL\s+(?:RON|LEI)\s*([\d\s.,]+)', 0.95),
        (r'SUBTOTAL\s*([\d\s.,]+)', 0.90),
        (r'DE\s+PLATA\s*:?\s*([\d\s.,]+)', 0.90),
        (r'SUMA\s*:?\s*([\d\s.,]+)', 0.85),
    ]
    # Date patterns (confidence-weighted)
    DATE_PATTERNS = [
        (r'D[AR]TA\s*:?\s*(\d{2}[-./]\d{2}[-./]\d{4})', 0.98),
        (r'DATA\s*:?\s*(\d{2}[-./]\d{2}[-./]\d{4})', 0.98),
        (r'(\d{2}[-./]\d{2}[-./]\d{4})\s+[O0]RA\s*:?\s*\d{2}:\d{2}', 0.95),
        (r'(\d{2}[-./]\d{2}[-./]\d{4})\s+\d{2}:\d{2}', 0.90),
        (r'(\d{2}[-./]\d{2}[-./]\d{4})', 0.80),
        (r'(\d{4}[-./]\d{2}[-./]\d{2})', 0.75),
    ]
    # Date patterns with OCR-introduced spaces (separate because format is different)
    DATE_PATTERNS_OCR_SPACES = [
        (r'(\d{4})[.,]\s*(\d{2})[.,]\s*(\d{2})\s+\d{2}:\d{2}', 0.92, 'ymd'),
        (r'(\d{4})[.,]\s*(\d{2})[.,]\s*(\d{2})', 0.85, 'ymd'),
        (r'(\d{2})[.,]\s*(\d{2})[.,]\s*(\d{4})\s+\d{2}:\d{2}', 0.92, 'dmy'),
        (r'(\d{2})[.,]\s*(\d{2})[.,]\s*(\d{4})', 0.85, 'dmy'),
    ]
    # Receipt number patterns (confidence-weighted)
    NUMBER_PATTERNS = [
        (r'NDS\s*:?\s*(\d+)', 0.98),
        (r'C3POS[-A-Z0-9]*[N:](\d{6,7})', 0.98),
        (r'C3POS.*?(\d{6,7})\b', 0.95),
        (r'BF\s*:\s*(\d{4,})', 0.96),
        (r'BF\s+(\d{4,})', 0.93),
        (r'NIVS\s*:?\s*(\d+)', 0.95),
        (r'NR\.?\s*BON\s*:?\s*(\d+)', 0.95),
        (r'BON\s+(?:FISCAL\s+)?NR\.?\s*:?\s*(\d+)', 0.95),
        (r'CHITANTA\s+NR\.?\s*:?\s*(\d+)', 0.95),
        (r'NR\.?\s+DOCUMENT\s*:?\s*(\d+)', 0.90),
        (r'ID\s*BF\s*:?\s*(\d+)', 0.90),
    ]
    # Payment method patterns (pattern, method_type, confidence)
    PAYMENT_PATTERNS = [
        (r'CARTE\s+CREDIT\s*:?\s*([\d\s.,]+)', 'CARD', 0.98),
        (r'CARTE\s+CREDIT\s*:?\s*\n\s*([\d\s.,]+)', 'CARD', 0.97),
        (r'(?:PLATA\s+)?CARD\s*[:\sA-Z]?\s*([\d\s.,]+)', 'CARD', 0.95),
        (r'NUMERAR\s*:?\s*([\d\s.,]+)', 'NUMERAR', 0.95),
        (r'CASH\s*:?\s*([\d\s.,]+)', 'NUMERAR', 0.90),
        (r'(?:^|\n|\s)RD\s*:?\s*(\d{1,6}[.,]\d{2})\b', 'CARD', 0.70),
        (r'(?:^|\n|\s)ARD\s*:?\s*(\d{1,6}[.,]\d{2})\b', 'CARD', 0.75),
        (r'(?:^|\n|\s)MERAR\s*:?\s*(\d{1,6}[.,]\d{2})\b', 'NUMERAR', 0.70),
    ]
    # Client section markers (for B2B receipts)
    CLIENT_MARKERS = [
        r'C\.?\s*[I1]\.?\s*F\.?\s+CLIENT\s*:',
        r'C\.?\s*U\.?\s*[I1]\.?\s+CLIENT\s*:',
        r'CLIENT\s+C\.?\s*[UI1]\.?\s*[IF1]\.?\s*:',
        r'CLIENT\s*:',
        r'CUMPARATOR\s*:',
        r'BENEFICIAR\s*:',
    ]
    # Client CUI patterns (pattern, confidence)
    CLIENT_CUI_PATTERNS = [
        (r'(R[O0]\d{6,10})\s*\n\s*CLIENT\s+C\.?\s*U\.?\s*[I1]\.?', 0.99),
        (r'(R[O0]\d{6,10})\s*:?\s*\n\s*CLIENT', 0.98),
        (r'C[I1]F\s+[A-Z]*\s*CLIENT\s*:?\s*(R[O0]\d{6,10})', 0.98),
        (r'C\.?\s*[I1]\.?\s*F\.?\s+CLIENT\s*:?\s*(R[O0]?\d{6,10})', 0.98),
        (r'C\.?\s*U\.?\s*[I1]\.?\s+CLIENT\s*:?\s*(R[O0]?\d{6,10})', 0.98),
        (r'CLIENT\s+C\.?\s*U\.?\s*[I1]\.?\s*:?\s*(R[O0]?\d{6,10})', 0.95),
        (r'CLIENT\s*:?\s*(R[O0]?\d{6,10})', 0.90),
    ]
    # Company type indicators (for identifying company names)
    COMPANY_INDICATORS = [
        r'\bS\.?\s*R\.?\s*L\.?\b',      # S.R.L. or S. R. L.
        r'\bS\.?\s*A\.?\b',              # S.A. or S. A.
        r'\bS\.?\s*N\.?\s*C\.?\b',      # S.N.C. or S. N. C.
        r'\bS\.?\s*C\.?\s*S\.?\b',      # S.C.S. or S. C. S.
        r'\bI\.?\s*I\.?\b',              # I.I. or I. I.
        r'\bP\.?\s*F\.?\s*A\.?\b',      # P.F.A. or P. F. A.
        r'\bS\.?\s*C\.?\s+[A-Z]',       # S.C. followed by company name
        r'HOLDING',
        r'COMPANY',
        r'GROUP',
    ]
    # Maximum reasonable payment amount (to filter OCR errors)
    MAX_PAYMENT = Decimal('100000')
    # -------------------------------------------------------------------------
    # Extraction methods - override in subclasses as needed
    # -------------------------------------------------------------------------
    def extract_tva_entries(self, text: str) -> List[dict]:
        """
        Extract TVA entries from receipt text.
        Override this method in subclasses to handle store-specific TVA formats.
        Args:
            text: Raw OCR text from receipt
        Returns:
            List of dicts with keys: code, percent, amount
        """
        return []
    def extract_total(self, text: str) -> Tuple[Optional[Decimal], float]:
        """
        Extract total amount from receipt text.
        Args:
            text: Raw OCR text from receipt
        Returns:
            Tuple of (amount, confidence) or (None, 0.0)
        """
        text_upper = text.upper()
        for pattern, confidence in self.TOTAL_PATTERNS:
            match = re.search(pattern, text_upper)
            if match:
                amount = self._parse_decimal(match.group(1))
                if amount and amount > 0 and amount < self.MAX_PAYMENT:
                    return (amount, confidence)
        return (None, 0.0)
    def extract_date(self, text: str) -> Tuple[Optional[date], float]:
        """
        Extract receipt date from text.
        Args:
            text: Raw OCR text from receipt
        Returns:
            Tuple of (date, confidence) or (None, 0.0)
        """
        text_upper = text.upper()
        # Try standard patterns first
        for pattern, confidence in self.DATE_PATTERNS:
            match = re.search(pattern, text_upper)
            if match:
                parsed = self._parse_date(match.group(1))
                if parsed:
                    return (parsed, confidence)
        # Try OCR-corrupted patterns with spaces
        for pattern, confidence, fmt in self.DATE_PATTERNS_OCR_SPACES:
            match = re.search(pattern, text_upper)
            if match:
                try:
                    if fmt == 'ymd':
                        year, month, day = int(match.group(1)), int(match.group(2)), int(match.group(3))
                    else:  # dmy
                        day, month, year = int(match.group(1)), int(match.group(2)), int(match.group(3))
                    if 1 <= day <= 31 and 1 <= month <= 12 and 2000 <= year <= 2100:
                        return (date(year, month, day), confidence)
                except (ValueError, TypeError):
                    continue
        return (None, 0.0)
    def extract_receipt_number(self, text: str) -> Tuple[Optional[str], float]:
        """
        Extract receipt number from text.
        Args:
            text: Raw OCR text from receipt
        Returns:
            Tuple of (number, confidence) or (None, 0.0)
        """
        text_upper = text.upper()
        for pattern, confidence in self.NUMBER_PATTERNS:
            match = re.search(pattern, text_upper)
            if match:
                number = match.group(1).strip()
                if number and len(number) >= 3:
                    return (number, confidence)
        return (None, 0.0)
    def extract_payment_methods(self, text: str) -> List[dict]:
        """
        Extract payment methods (CARD/NUMERAR) from receipt.
        Supports multiple payments of the same type (e.g., 2x CARD for split payments).
        Each payment is returned as a separate entry with its amount.
        Args:
            text: Raw OCR text from receipt
        Returns:
            List of dicts: [{'method': 'CARD'/'NUMERAR', 'amount': Decimal, 'confidence': float}]
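            Multiple entries of the same method type are allowed for split payments.
            Two payments with the same method and the same amount are collapsed
            into one entry by the (method, amount) deduplication.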
        """
        text_upper = text.upper()
        methods = []
        # Track (method, amount) pairs to avoid exact duplicates from overlapping patterns
        seen_entries = set()
        for pattern, method, confidence in self.PAYMENT_PATTERNS:
            for match in re.finditer(pattern, text_upper):
                try:
                    amount = self._parse_decimal(match.group(1))
                    if amount and amount > 0 and amount < self.MAX_PAYMENT:
                        # Deduplicate by (method, amount) to avoid same entry from multiple patterns
                        # But allow different amounts for same method (split payments)
                        entry_key = (method, amount)
                        if entry_key not in seen_entries:
                            methods.append({
                                'method': method,
                                'amount': amount,
                                'confidence': confidence
                            })
                            seen_entries.add(entry_key)
                except (ValueError, InvalidOperation):
                    continue
        return methods
    def extract_client_cui(self, text: str) -> Tuple[Optional[str], float]:
        """
        Extract client CUI from B2B receipts.
        Args:
            text: Raw OCR text from receipt
        Returns:
            Tuple of (cui, confidence) or (None, 0.0)
        """
        text_upper = text.upper()
        # First check if there's a CLIENT section
        has_client_section = any(
            re.search(marker, text_upper, re.IGNORECASE)
            for marker in self.CLIENT_MARKERS
        )
        if not has_client_section:
            return (None, 0.0)
        # Try to extract CUI
        for pattern, confidence in self.CLIENT_CUI_PATTERNS:
            match = re.search(pattern, text_upper, re.IGNORECASE | re.MULTILINE)
            if match:
                cui = match.group(1)
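                # Normalize: drop the RO prefix (or its OCR variant "R0") before
                # keeping digits, so the "0" of "R0" is not read as part of the CUI
                cui_digits = re.sub(r'[^0-9]', '', re.sub(r'^R[O0]?', '', cui))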
                if 6 <= len(cui_digits) <= 10:
                    return (cui_digits, confidence)
        return (None, 0.0)
    def extract_client_name(self, text: str) -> Tuple[Optional[str], float]:
        """
        Extract client/buyer company name from B2B receipts.
        Args:
            text: Raw OCR text from receipt
        Returns:
            Tuple of (client_name, confidence) or (None, 0.0)
        """
        text_upper = text.upper()
        lines = text.split('\n')
        # First check if there's a CLIENT section
        client_section_idx = None
        for i, line in enumerate(lines):
            line_upper = line.upper().strip()
            if any(re.search(marker, line_upper, re.IGNORECASE) for marker in self.CLIENT_MARKERS):
                client_section_idx = i
                break
        if client_section_idx is None:
            return (None, 0.0)
        # Look for company name in CLIENT section
        line = lines[client_section_idx].strip()
        line_upper = line.upper()
        # Strategy 1: Check if name is on same line after ":"
        if ':' in line:
            name_part = line.split(':', 1)[1].strip()
            if name_part and len(name_part) >= 3:
                # Skip if it looks like a CUI (RO followed by digits)
                if re.match(r'^R[O0]?\d{6,10}$', name_part.upper()):
                    pass  # This is CUI, not name - continue to next strategy
                else:
                    # Check for company indicators
                    name_upper = name_part.upper()
                    if any(re.search(ind, name_upper) for ind in self.COMPANY_INDICATORS):
                        return (self._clean_company_name(name_part), 0.95)
                    elif len(name_part) >= 5 and not name_part.isdigit():
                        return (self._clean_company_name(name_part), 0.80)
        # Strategy 2: Check next line for company name
        if client_section_idx + 1 < len(lines):
            next_line = lines[client_section_idx + 1].strip()
            next_upper = next_line.upper()
            # Skip if it's a CUI/CIF line or looks like CUI
            if not re.search(r'C\.?\s*[UI]\.?\s*[IF]\.?', next_upper):
                if not re.match(r'^R[O0]?\d{6,10}$', next_upper):
                    if any(re.search(ind, next_upper) for ind in self.COMPANY_INDICATORS):
                        return (self._clean_company_name(next_line), 0.90)
                    elif len(next_line) >= 5 and not next_line.isdigit():
                        # Check it's not CUI/CIF/COD keywords
                        if not any(kw in next_upper for kw in ['CUI', 'CIF', 'COD', 'FISCAL']):
                            return (self._clean_company_name(next_line), 0.75)
        # Strategy 3: Look for any line with company indicators in CLIENT section region
        search_end = min(client_section_idx + 5, len(lines))
        for i in range(client_section_idx + 1, search_end):
            line = lines[i].strip()
            line_upper = line.upper()
            # Skip CUI/CIF lines
            if re.search(r'C\.?\s*[UI]\.?\s*[IF]\.?', line_upper):
                continue
            if re.match(r'^R[O0]?\d{6,10}$', line_upper):
                continue
            if any(re.search(ind, line_upper) for ind in self.COMPANY_INDICATORS):
                return (self._clean_company_name(line), 0.85)
        return (None, 0.0)
    @staticmethod
    def _clean_company_name(name: str) -> str:
        """Clean company name for storage."""
        if not name:
            return ""
        # Remove extra whitespace
        name = re.sub(r'\s+', ' ', name).strip()
        # Remove trailing punctuation except periods in S.R.L., S.A., etc.
        name = re.sub(r'[,;:]+$', '', name).strip()
        return name
    # -------------------------------------------------------------------------
    # Validation hints - override to customize validation behavior
    # -------------------------------------------------------------------------
    def get_validation_hints(self) -> Dict[str, Any]:
        """
        Return validation hints for this store.
        Returns:
            Dict with validation hints. Common keys:
            - has_multi_rate_tva: bool - Store uses multiple TVA rates
            - card_equals_total: bool - CARD payment equals total
            - has_client_cui: bool - Receipt includes client CUI
            - has_efactura: bool - Store uses e-factura format
            - is_non_vat_payer: bool - Store is not a VAT payer
        """
        return {}
    # -------------------------------------------------------------------------
    # Helper methods - available to all subclasses
    # -------------------------------------------------------------------------
    @staticmethod
    def _normalize_number(text: str) -> str:
        """
        Normalize a number string for Decimal conversion.
        Handles Romanian formats: "1.234,56" -> "1234.56"
        """
        if not text:
            return "0"
        # Remove spaces
        text = text.replace(" ", "")
        # Determine decimal separator
        last_comma = text.rfind(",")
        last_dot = text.rfind(".")
        if last_comma > last_dot:
            text = text.replace(".", "").replace(",", ".")
        elif last_dot > last_comma:
            text = text.replace(",", "")
        else:
            text = text.replace(",", ".")
        return text
    @staticmethod
    def _parse_decimal(text: str) -> Optional[Decimal]:
        """Parse a string to Decimal, handling various formats."""
        try:
            normalized = BaseStoreProfile._normalize_number(text)
            return Decimal(normalized)
        except (InvalidOperation, ValueError, TypeError):
            return None
    @staticmethod
    def _parse_date(text: str) -> Optional[date]:
        """
        Parse date string in various formats.
        Supports: DD-MM-YYYY, DD/MM/YYYY, DD.MM.YYYY, YYYY-MM-DD
        """
        if not text:
            return None
        # Normalize separators
        text = text.replace('/', '-').replace('.', '-')
        try:
            parts = text.split('-')
            if len(parts) != 3:
                return None
            # Determine format based on first part length
            if len(parts[0]) == 4:
                # YYYY-MM-DD
                year, month, day = int(parts[0]), int(parts[1]), int(parts[2])
            else:
                # DD-MM-YYYY
                day, month, year = int(parts[0]), int(parts[1]), int(parts[2])
            # Validate ranges
            if 1 <= day <= 31 and 1 <= month <= 12 and 2000 <= year <= 2100:
                return date(year, month, day)
        except (ValueError, TypeError, IndexError):
            pass
        return None
    @staticmethod
    def _clean_text(text: str) -> str:
        """Clean OCR text for pattern matching."""
        if not text:
            return ""
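        text = re.sub(r'\s+', ' ', text)
        text = re.sub(r'[\x00-\x09\x0b\x0c\x0e-\x1f\x7f]', '', text)
        # Removing a control character can leave two spaces side by side
        text = re.sub(r' {2,}', ' ', text)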
        return text.strip()
    # -------------------------------------------------------------------------
    # Magic methods
    # -------------------------------------------------------------------------
    def __repr__(self) -> str:
        return f"<{self.__class__.__name__} CUI={self.CUI_LIST}>"
    def __str__(self) -> str:
        return f"{self.STORE_NAME} ({', '.join(self.CUI_LIST)})"
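
The `(method, amount)` deduplication in `extract_payment_methods` can be tried on its own with the sketch below. The three patterns are simplified, hypothetical stand-ins for `PAYMENT_PATTERNS` (the first two overlap on purpose), `parse_amount` condenses the separator heuristic of `_normalize_number`, and the receipt strings are invented for illustration.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Dict, List, Optional

# Simplified, hypothetical patterns; the first two overlap on purpose.
PATTERNS = [
    (r'CARD\s*:?\s*([\d.,]+)', 'CARD', 0.95),
    (r'PLATA\s+CARD\s*:?\s*([\d.,]+)', 'CARD', 0.90),
    (r'NUMERAR\s*:?\s*([\d.,]+)', 'NUMERAR', 0.95),
]


def parse_amount(raw: str) -> Optional[Decimal]:
    # Same heuristic as _normalize_number: the last separator is the decimal one.
    raw = raw.replace(" ", "")
    if raw.rfind(",") > raw.rfind("."):
        raw = raw.replace(".", "").replace(",", ".")
    else:
        raw = raw.replace(",", "")
    try:
        return Decimal(raw)
    except InvalidOperation:
        return None


def extract_payments(text: str) -> List[Dict]:
    seen = set()
    payments = []
    for pattern, method, confidence in PATTERNS:
        for match in re.finditer(pattern, text.upper()):
            amount = parse_amount(match.group(1))
            if amount is None or amount <= 0:
                continue
            key = (method, amount)
            if key in seen:
                continue  # same payment found again by an overlapping pattern
            seen.add(key)
            payments.append(
                {'method': method, 'amount': amount, 'confidence': confidence}
            )
    return payments


split = extract_payments("PLATA CARD: 1.250,00\nCARD: 30,00\nNUMERAR: 20,00")
equal = extract_payments("CARD: 25,00\nCARD: 25,00")
print(len(split), len(equal))  # 3 1
```

The first receipt yields two CARD entries and one NUMERAR entry even though two patterns hit the same CARD line. The second receipt shows the trade-off of this key: two equal CARD payments cannot be told apart from a duplicate match and come back as a single entry.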
--- a/backend/modules/data_entry/services/ocr/profiles/best_print.py
+++ b/backend/modules/data_entry/services/ocr/profiles/best_print.py
@@ -0,0 +1,54 @@
 """
 BEST PRINT TRADE ACTIV SRL store profile for OCR extraction.
 Stamp manufacturing service. Non-VAT payer (neplătitor de TVA).
 """
 from typing import List, Dict, Any
 from .base import BaseStoreProfile
 from . import ProfileRegistry
 @ProfileRegistry.register
 class BestPrintProfile(BaseStoreProfile):
    """
    BEST PRINT TRADE ACTIV SRL - non-VAT payer profile.
    Key characteristics:
    - Non-VAT payer (neplătitor de TVA) - NO TVA on receipts
    - Stamp manufacturing and printing services
    - Total amount has no TVA component
    - CARD payment typical
    """
    CUI_LIST = ["45417955"]
    NAME_PATTERNS = ["BEST PRINT", "BESTPRINT", "BEST PRINT TRADE", "BEST PR1NT"]
    STORE_NAME = "BEST PRINT TRADE ACTIV SRL"
    def extract_tva_entries(self, text: str) -> List[dict]:
        """
        Extract TVA entries - returns empty for non-VAT payer.
        BEST PRINT is a non-VAT payer (neplătitor de TVA),
        so no TVA entries are expected on receipts.
        Args:
            text: Raw OCR text from receipt (unused)
        Returns:
            Empty list (non-VAT payer has no TVA)
        """
        # Non-VAT payer - no TVA entries
        return []
    def get_validation_hints(self) -> Dict[str, Any]:
        """Return BEST PRINT-specific validation hints."""
        return {
            "has_multi_rate_tva": False,
            "card_equals_total": True,
            "has_client_cui": True,  # May have client CUI
            "has_efactura": False,
            "is_non_vat_payer": True,  # CRITICAL: Non-VAT payer
            "tva_pattern": "none",
        }
--- a/backend/modules/data_entry/services/ocr/profiles/brick.py
+++ b/backend/modules/data_entry/services/ocr/profiles/brick.py
@@ -0,0 +1,101 @@
 """
 BRICK (Five-Holding) store profile for OCR extraction.
 Five-Holding S.A. operates BRICK stores with standard receipt format.
 """
 import re
 from decimal import Decimal, InvalidOperation
 from typing import List, Dict, Any
 from .base import BaseStoreProfile
 from . import ProfileRegistry
 @ProfileRegistry.register
 class BrickProfile(BaseStoreProfile):
    """
    FIVE-HOLDING S.A. (BRICK) - standard TVA format.
    Key characteristics:
    - Standard TVA format
    - Single TVA rate typically
    - No client CUI on receipts
    """
    CUI_LIST = ["10562600"]
    NAME_PATTERNS = ["BRICK", "FIVE-HOLDING", "FIVE HOLDING", "BR1CK"]  # OCR variants
    STORE_NAME = "FIVE-HOLDING S.A."
    # Standard TVA patterns (flexible - accepts any rate)
    TVA_PATTERNS = [
        # "TVA A: XX% = YY,YY" or "TVA-A XX% YY,YY"
        r'TVA\s*[-:]?\s*([A-D])\s*:?\s*(\d{1,2})\s*%\s*[=:]?\s*([\d.,]+)',
        # "A - XX,XX% = YY,YY"
        r'([A-D])\s*[-:]\s*(\d{1,2})[.,]?\d{0,2}\s*%\s*[=:]?\s*([\d.,]+)',
        # Simple: "TVA XX% YY,YY"
        r'TVA\s+(\d{1,2})\s*%\s*([\d.,]+)',
    ]
    def extract_tva_entries(self, text: str) -> List[dict]:
        """
        Extract BRICK-specific TVA entries.
        Args:
            text: Raw OCR text from receipt
        Returns:
            List of TVA entries with code, percent, and amount
        """
        entries = []
        seen = set()
        # Try coded patterns first
        for pattern in self.TVA_PATTERNS[:2]:
            for match in re.finditer(pattern, text, re.IGNORECASE):
                try:
                    code = match.group(1).upper()
                    percent = int(match.group(2))
                    amount = self._parse_decimal(match.group(3))
                    if amount and amount > 0:
                        entry_key = (code, percent)
                        if entry_key not in seen:
                            entries.append({
                                'code': code,
                                'percent': percent,
                                'amount': amount
                            })
                            seen.add(entry_key)
                except (ValueError, InvalidOperation, IndexError):
                    continue
        # Fallback to simple format
        if not entries:
            simple_pattern = self.TVA_PATTERNS[2]
            for match in re.finditer(simple_pattern, text, re.IGNORECASE):
                try:
                    percent = int(match.group(1))
                    amount = self._parse_decimal(match.group(2))
                    if amount and amount > 0:
                        entries.append({
                            'code': 'A',
                            'percent': percent,
                            'amount': amount
                        })
                        break
                except (ValueError, InvalidOperation):
                    continue
        return entries
    def get_validation_hints(self) -> Dict[str, Any]:
        """Return BRICK-specific validation hints."""
        return {
            "has_multi_rate_tva": False,
            "card_equals_total": False,
            "has_client_cui": False,
            "has_efactura": False,
            "is_non_vat_payer": False,
        }
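
To see how the flexible `(\d{1,2})%` rate matching behaves, the sketch below runs the first (coded) and third (simple) pattern from `BrickProfile.TVA_PATTERNS` against a few lines. `match_tva` is a hypothetical helper and the sample lines are invented; real receipts may be laid out differently.

```python
import re
from typing import Optional, Tuple

# Copied from BrickProfile.TVA_PATTERNS (first and third entry).
CODED = r'TVA\s*[-:]?\s*([A-D])\s*:?\s*(\d{1,2})\s*%\s*[=:]?\s*([\d.,]+)'
SIMPLE = r'TVA\s+(\d{1,2})\s*%\s*([\d.,]+)'


def match_tva(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (code, percent, raw_amount) for one line, or None."""
    match = re.search(CODED, line, re.IGNORECASE)
    if match:
        return (match.group(1).upper(), int(match.group(2)), match.group(3))
    # Fallback mirrors the profile: no code on the line, so default to 'A'.
    match = re.search(SIMPLE, line, re.IGNORECASE)
    if match:
        return ('A', int(match.group(1)), match.group(2))
    return None


samples = [
    "TVA A: 19% = 15,97",
    "TVA-B 9% 4,50",
    "TVA 21% 3,00",
    "TOTAL LEI 100,00",
]
results = [match_tva(line) for line in samples]
for line, result in zip(samples, results):
    print(f"{line!r} -> {result}")
```

Because the rate is captured rather than hard-coded, the same patterns accept 19% and 21% lines alike; the amount is left as a raw string here and would go through `_parse_decimal` in the profile.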
--- a/backend/modules/data_entry/services/ocr/profiles/dedeman.py
+++ b/backend/modules/data_entry/services/ocr/profiles/dedeman.py
@@ -0,0 +1,118 @@
 """
 DEDEMAN store profile for OCR extraction.
 Dedeman receipts may include e-factura information and use standard TVA format.
 Large DIY retailer in Romania.
 """
 import re
 from decimal import Decimal, InvalidOperation
 from typing import List, Dict, Any
 from .base import BaseStoreProfile
 from . import ProfileRegistry
 @ProfileRegistry.register
 class DedemanProfile(BaseStoreProfile):
    """
    DEDEMAN SRL - standard TVA with e-factura support.
    Key characteristics:
    - Standard TVA format
    - May include e-factura reference number
    - Professional receipts for construction materials
    """
    CUI_LIST = ["2816464"]
    NAME_PATTERNS = ["DEDEMAN", "DEDEMAN SRL", "OEDEMAN", "D3DEMAN"]  # OCR variants
    STORE_NAME = "DEDEMAN SRL"
    # Standard TVA patterns (flexible - accepts any rate)
    TVA_PATTERNS = [
        # "TVA A: XX% = YY,YY" or "TVA-A XX% YY,YY"
        r'TVA\s*[-:]?\s*([A-D])\s*:?\s*(\d{1,2})\s*%\s*[=:]?\s*([\d.,]+)',
        # "A - XX,XX% = YY,YY"
        r'([A-D])\s*[-:]\s*(\d{1,2})[.,]?\d{0,2}\s*%\s*[=:]?\s*([\d.,]+)',
        # "TVA (XX%) YY,YY"
        r'TVA\s*\(?\s*(\d{1,2})\s*%\s*\)?\s*:?\s*([\d.,]+)',
    ]
    # E-factura pattern for reference extraction
    EFACTURA_PATTERN = r'e-?factura\s*:?\s*([A-Z0-9]+)'
    def extract_tva_entries(self, text: str) -> List[dict]:
        """
        Extract Dedeman-specific TVA entries.
        Args:
            text: Raw OCR text from receipt
        Returns:
            List of TVA entries with code, percent, and amount
        """
        entries = []
        seen = set()
        # Try coded patterns first
        for pattern in self.TVA_PATTERNS[:2]:
            for match in re.finditer(pattern, text, re.IGNORECASE):
                try:
                    code = match.group(1).upper()
                    percent = int(match.group(2))
                    amount = self._parse_decimal(match.group(3))
                    if amount and amount > 0:
                        entry_key = (code, percent)
                        if entry_key not in seen:
                            entries.append({
                                'code': code,
                                'percent': percent,
                                'amount': amount
                            })
                            seen.add(entry_key)
                except (ValueError, InvalidOperation, IndexError):
                    continue
        # Fallback to simple format
        if not entries:
            simple_pattern = self.TVA_PATTERNS[2]
            for match in re.finditer(simple_pattern, text, re.IGNORECASE):
                try:
                    percent = int(match.group(1))
                    amount = self._parse_decimal(match.group(2))
                    if amount and amount > 0:
                        entries.append({
                            'code': 'A',
                            'percent': percent,
                            'amount': amount
                        })
                        break
                except (ValueError, InvalidOperation):
                    continue
        return entries
    def extract_efactura_reference(self, text: str) -> str | None:
        """
        Extract e-factura reference number if present.
        Args:
            text: Raw OCR text from receipt
        Returns:
            E-factura reference string or None
        """
        match = re.search(self.EFACTURA_PATTERN, text, re.IGNORECASE)
        return match.group(1) if match else None
    def get_validation_hints(self) -> Dict[str, Any]:
        """Return Dedeman-specific validation hints."""
        return {
            "has_multi_rate_tva": False,
            "card_equals_total": False,
            "has_client_cui": False,
            "has_efactura": True,
            "is_non_vat_payer": False,
        }
--- a/backend/modules/data_entry/services/ocr/profiles/electrobering.py
+++ b/backend/modules/data_entry/services/ocr/profiles/electrobering.py
@@ -0,0 +1,102 @@
 """
 ELECTROBERING S.R.L. store profile for OCR extraction.
 Electronics and home supplies store.
 """
 import re
 from decimal import Decimal, InvalidOperation
 from typing import List, Dict, Any
 from .base import BaseStoreProfile
 from . import ProfileRegistry
@ProfileRegistry.register
 class ElectroberingProfile(BaseStoreProfile):
    """
    ELECTROBERING S.R.L. - standard TVA profile.
    Key characteristics:
    - Standard TVA format (single rate, any percentage)
    - Electronics and home supplies
    - May have client CUI for B2B purchases
    - CARD payment typical
    """
    CUI_LIST = ["2744937"]
    NAME_PATTERNS = ["ELECTROBERING", "ELECTR0BERING", "ELECTROBERING SRL"]
    STORE_NAME = "ELECTROBERING S.R.L."
    # Standard TVA patterns (flexible - accepts any rate)
    TVA_PATTERNS = [
        # "TVA A: XX% = YY,YY" or "TVA-A XX% YY,YY"
        r'TVA\s*[-:]?\s*([A-D])\s*:?\s*(\d{1,2})\s*%\s*[=:]?\s*([\d.,]+)',
        # "A - XX,XX% = YY,YY"
        r'([A-D])\s*[-:]\s*(\d{1,2})[.,]?\d{0,2}\s*%\s*[=:]?\s*([\d.,]+)',
        # "TVA XX% YY,YY" (simple format without code)
        r'TVA\s+(\d{1,2})\s*%\s*([\d.,]+)',
    ]
    def extract_tva_entries(self, text: str) -> List[dict]:
        """
        Extract TVA entries from receipt text.
        Args:
            text: Raw OCR text from receipt
        Returns:
            List of TVA entries with code, percent, and amount
        """
        entries = []
        seen = set()
        # Try coded patterns first
        for pattern in self.TVA_PATTERNS[:2]:
            for match in re.finditer(pattern, text, re.IGNORECASE):
                try:
                    code = match.group(1).upper()
                    percent = int(match.group(2))
                    amount = self._parse_decimal(match.group(3))
                    if amount and amount > 0:
                        entry_key = (code, percent)
                        if entry_key not in seen:
                            entries.append({
                                'code': code,
                                'percent': percent,
                                'amount': amount
                            })
                            seen.add(entry_key)
                except (ValueError, InvalidOperation, IndexError):
                    continue
        # Fallback to simple format
        if not entries:
            simple_pattern = self.TVA_PATTERNS[2]
            for match in re.finditer(simple_pattern, text, re.IGNORECASE):
                try:
                    percent = int(match.group(1))
                    amount = self._parse_decimal(match.group(2))
                    if amount and amount > 0:
                        entries.append({
                            'code': 'A',
                            'percent': percent,
                            'amount': amount
                        })
                        break
                except (ValueError, InvalidOperation):
                    continue
        return entries
    def get_validation_hints(self) -> Dict[str, Any]:
        """Return ELECTROBERING-specific validation hints."""
        return {
            "has_multi_rate_tva": False,
            "card_equals_total": True,
            "has_client_cui": True,  # May have client CUI for B2B
            "has_efactura": False,
            "is_non_vat_payer": False,
        }
--- a/backend/modules/data_entry/services/ocr/profiles/gama_ink.py
+++ b/backend/modules/data_entry/services/ocr/profiles/gama_ink.py
@@ -0,0 +1,103 @@
 """
 GAMA INK SERVICE SRL store profile for OCR extraction.
 Toner refill and printer supplies store.
 """
 import re
 from decimal import Decimal, InvalidOperation
 from typing import List, Dict, Any
 from .base import BaseStoreProfile
 from . import ProfileRegistry
@ProfileRegistry.register
 class GamaInkProfile(BaseStoreProfile):
    """
    GAMA INK SERVICE SRL - standard TVA profile.
    Key characteristics:
    - Standard TVA format (single rate, any percentage)
    - Service-based (toner refill, printer supplies)
    - CARD payment typical
    """
    CUI_LIST = ["17741882"]
    NAME_PATTERNS = ["GAMA INK", "GAMA", "GAMAINK", "GAMA INK SERVICE"]
    STORE_NAME = "GAMA INK SERVICE SRL"
    # Standard TVA patterns (flexible - accepts any rate)
    TVA_PATTERNS = [
        # "TVA A: XX% = YY,YY" or "TVA-A XX% YY,YY"
        r'TVA\s*[-:]?\s*([A-D])\s*:?\s*(\d{1,2})\s*%\s*[=:]?\s*([\d.,]+)',
        # "A - XX,XX% = YY,YY"
        r'([A-D])\s*[-:]\s*(\d{1,2})[.,]?\d{0,2}\s*%\s*[=:]?\s*([\d.,]+)',
        # "TVA XX% YY,YY" (simple format without code)
        r'TVA\s+(\d{1,2})\s*%\s*([\d.,]+)',
        # "TVA: YY,YY" (amount only, no percent); defined for reference, not used by extract_tva_entries
        r'TVA\s*:?\s*([\d.,]+)\s*(?:LEI|RON)?',
    ]
    def extract_tva_entries(self, text: str) -> List[dict]:
        """
        Extract TVA entries from receipt text.
        Args:
            text: Raw OCR text from receipt
        Returns:
            List of TVA entries with code, percent, and amount
        """
        entries = []
        seen = set()
        # Try coded patterns first (have both code and percent)
        for pattern in self.TVA_PATTERNS[:2]:
            for match in re.finditer(pattern, text, re.IGNORECASE):
                try:
                    code = match.group(1).upper()
                    percent = int(match.group(2))
                    amount = self._parse_decimal(match.group(3))
                    if amount and amount > 0:
                        entry_key = (code, percent)
                        if entry_key not in seen:
                            entries.append({
                                'code': code,
                                'percent': percent,
                                'amount': amount
                            })
                            seen.add(entry_key)
                except (ValueError, InvalidOperation, IndexError):
                    continue
        # Fallback to simple format (percent + amount without code)
        if not entries:
            simple_pattern = self.TVA_PATTERNS[2]
            for match in re.finditer(simple_pattern, text, re.IGNORECASE):
                try:
                    percent = int(match.group(1))
                    amount = self._parse_decimal(match.group(2))
                    if amount and amount > 0:
                        entries.append({
                            'code': 'A',
                            'percent': percent,
                            'amount': amount
                        })
                        break
                except (ValueError, InvalidOperation):
                    continue
        return entries
    def get_validation_hints(self) -> Dict[str, Any]:
        """Return GAMA INK-specific validation hints."""
        return {
            "has_multi_rate_tva": False,
            "card_equals_total": True,
            "has_client_cui": False,
            "has_efactura": False,
            "is_non_vat_payer": False,
        }
--- a/backend/modules/data_entry/services/ocr/profiles/kineterra.py
+++ b/backend/modules/data_entry/services/ocr/profiles/kineterra.py
@@ -0,0 +1,53 @@
 """
 KINETERRA store profile for OCR extraction.
 Kineterra is a non-VAT payer (neplătitor de TVA).
 Receipts don't include TVA breakdown.
 """
 from typing import List, Dict, Any
 from .base import BaseStoreProfile
 from . import ProfileRegistry
@ProfileRegistry.register
 class KineterraProfile(BaseStoreProfile):
    """
    KINETERRA CONCEPT SRL - non-VAT payer profile.
    Key characteristics:
    - Non-VAT payer (neplătitor de TVA)
    - No TVA breakdown on receipts
    - Total amount has no TVA component
    """
    CUI_LIST = ["31180432"]
    NAME_PATTERNS = ["KINETERRA", "KINETERRA CONCEPT", "K1NETERRA"]  # OCR variants
    STORE_NAME = "KINETERRA CONCEPT SRL"
    def extract_tva_entries(self, text: str) -> List[dict]:
        """
        Extract TVA entries - returns empty for non-VAT payer.
        Kineterra is a non-VAT payer, so no TVA entries are expected.
        Args:
            text: Raw OCR text from receipt (unused)
        Returns:
            Empty list (non-VAT payer has no TVA)
        """
        # Non-VAT payer - no TVA entries
        return []
    def get_validation_hints(self) -> Dict[str, Any]:
        """Return Kineterra-specific validation hints."""
        return {
            "has_multi_rate_tva": False,
            "card_equals_total": False,
            "has_client_cui": False,
            "has_efactura": False,
            "is_non_vat_payer": True,
            "tva_pattern": "none",
        }
--- a/backend/modules/data_entry/services/ocr/profiles/lidl.py
+++ b/backend/modules/data_entry/services/ocr/profiles/lidl.py
@@ -0,0 +1,93 @@
 """
 LIDL store profile for OCR extraction.
 Lidl receipts have a specific TVA format without hyphen/colon separators:
    TOTAL TVA 9,84
    TVA A 21,00% 7,71
    TVA B 11,00% 2,13
 This profile handles multi-rate TVA extraction for Lidl receipts.
 """
 import re
 from decimal import Decimal, InvalidOperation
 from typing import List, Dict, Any
 from .base import BaseStoreProfile
 from . import ProfileRegistry
@ProfileRegistry.register
 class LidlProfile(BaseStoreProfile):
    """
    LIDL DISCOUNT S.R.L. - multi-rate TVA profile.
    Key characteristics:
    - Multi-rate TVA (codes A, B, C, D with any percentage - patterns are flexible)
    - TVA format: "TVA A XX,XX% YY,YY" (code + percent + amount on same line)
    - Supports historical rates (19%, 9%, 5%) and current rates (21%, 11%)
    - CARD payment usually equals total
    - No client CUI on receipts
    """
    CUI_LIST = ["22891860"]
    NAME_PATTERNS = ["LIDL", "LDL", "L1DL", "LIDL DISCOUNT"]  # OCR variants
    STORE_NAME = "LIDL DISCOUNT S.R.L."
    # Lidl-specific TVA patterns
    # Format: "TVA A 21,00% 7,71" (code + percent + amount on same line)
    TVA_PATTERNS = [
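        # Primary: "TVA A 21,00% 7,71" with various spacing; T[VU][AR] also tolerates OCR misreads such as "TUA" or "TVR"
        r'T[VU][AR]\s+([A-D])\s+(\d{1,2})[.,]?\d{0,2}\s*%\s+([\d.,]+)',
        # With backslash OCR artifact: "TVA A \21,00% 7,71" (superset of the primary pattern; duplicates are dropped via `seen`)
        r'T[VU][AR]\s+([A-D])\s+\\?(\d{1,2})[.,]?\d{0,2}\s*%\s+([\d.,]+)',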
        # IVA variant (rare OCR misread)
        r'IVA\s+([A-D])\s+(\d{1,2})[.,]?\d{0,2}\s*%\s+([\d.,]+)',
    ]
    def extract_tva_entries(self, text: str) -> List[dict]:
        """
        Extract Lidl-specific TVA entries.
        Handles multiple TVA rates (A, B, C, D) commonly found on Lidl receipts.
        Uses deduplication to avoid counting the same entry twice from different patterns.
        Args:
            text: Raw OCR text from receipt
        Returns:
            List of TVA entries with code, percent, and amount
        """
        entries = []
        seen = set()  # Deduplication key: (code, percent)
        for pattern in self.TVA_PATTERNS:
            for match in re.finditer(pattern, text, re.IGNORECASE):
                try:
                    code = match.group(1).upper()
                    percent = int(match.group(2))
                    amount = self._parse_decimal(match.group(3))
                    if amount and amount > 0:
                        entry_key = (code, percent)
                        if entry_key not in seen:
                            entries.append({
                                'code': code,
                                'percent': percent,
                                'amount': amount
                            })
                            seen.add(entry_key)
                except (ValueError, InvalidOperation):
                    continue
        return entries
    def get_validation_hints(self) -> Dict[str, Any]:
        """Return Lidl-specific validation hints."""
        return {
            "has_multi_rate_tva": True,
            "card_equals_total": True,
            "has_client_cui": False,
            "has_efactura": False,
            "is_non_vat_payer": False,
        }
--- a/backend/modules/data_entry/services/ocr/profiles/omv.py
+++ b/backend/modules/data_entry/services/ocr/profiles/omv.py
@@ -0,0 +1,99 @@
 """
 OMV Petrom store profile for OCR extraction.
 OMV receipts typically include client CUI and use standard TVA format.
 Common at gas stations with fuel purchases.
 Date format: YYYY. MM. DD with spaces (e.g., "2025. 08. 14")
 """
 import re
 from decimal import Decimal, InvalidOperation
 from typing import List, Dict, Any
 from .base import BaseStoreProfile
 from . import ProfileRegistry
@ProfileRegistry.register
 class OMVProfile(BaseStoreProfile):
    """
    OMV PETROM MARKETING S.R.L. - standard TVA with client CUI.
    Key characteristics:
    - Standard TVA format (usually single rate, any percentage)
    - Includes client CUI on receipt (for business purchases)
    - TVA table format: "A-XX,XX% base_amount tva_amount"
    - Supports historical rates (19%) and current rates (21%)
    - Date format: YYYY. MM. DD (with spaces)
    """
    CUI_LIST = ["11201891"]
    NAME_PATTERNS = ["OMV", "PETROM", "OMV PETROM", "0MV"]  # OCR variants
    STORE_NAME = "OMV PETROM MARKETING S.R.L."
    # OMV TVA table pattern: "A-19,00%  285,66  49,58" (code-percent base tva)
    TVA_TABLE_PATTERN = r'([A-D])\s*[-:]\s*(\d{1,2})[.,]\d{2}\s*%\s+([\d.,]+)\s+([\d.,]+)'
    # Standard TVA pattern (amount only); intended as a fallback but not used by extract_tva_entries
    TVA_STANDARD_PATTERN = r'TVA\s*:?\s*([\d.,]+)'
    # OMV specific: prioritize YYYY. MM. DD format with spaces
    DATE_PATTERNS_OCR_SPACES = [
        # YYYY. MM. DD with time (OMV format)
        (r'(\d{4})[.,]\s*(\d{2})[.,]\s*(\d{2})\s+\d{2}:\d{2}', 0.98, 'ymd'),
        (r'(\d{4})[.,]\s*(\d{2})[.,]\s*(\d{2})', 0.95, 'ymd'),
        # Fallback to DD. MM. YYYY
        (r'(\d{2})[.,]\s*(\d{2})[.,]\s*(\d{4})\s+\d{2}:\d{2}', 0.92, 'dmy'),
        (r'(\d{2})[.,]\s*(\d{2})[.,]\s*(\d{4})', 0.85, 'dmy'),
    ]
    def extract_tva_entries(self, text: str) -> List[dict]:
        """
        Extract OMV-specific TVA entries.
        OMV receipts often show TVA in table format with base and TVA amounts.
        Returns an empty list if the table format is not found; TVA_STANDARD_PATTERN is not applied here.
        Args:
            text: Raw OCR text from receipt
        Returns:
            List of TVA entries with code, percent, and amount
        """
        entries = []
        seen = set()
        # Try table format first (more accurate)
        for match in re.finditer(self.TVA_TABLE_PATTERN, text, re.IGNORECASE):
            try:
                code = match.group(1).upper()
                percent = int(match.group(2))
                # TVA amount is the second number (smaller one)
                tva_amount = self._parse_decimal(match.group(4))
                if tva_amount and tva_amount > 0:
                    entry_key = (code, percent)
                    if entry_key not in seen:
                        entries.append({
                            'code': code,
                            'percent': percent,
                            'amount': tva_amount
                        })
                        seen.add(entry_key)
            except (ValueError, InvalidOperation):
                continue
        return entries
    def get_validation_hints(self) -> Dict[str, Any]:
        """Return OMV-specific validation hints."""
        return {
            "has_multi_rate_tva": False,
            "card_equals_total": False,
            "has_client_cui": True,
            "has_efactura": False,
            "is_non_vat_payer": False,
            "tva_table_format": True,
        }
--- a/backend/modules/data_entry/services/ocr/profiles/pictus_velum.py
+++ b/backend/modules/data_entry/services/ocr/profiles/pictus_velum.py
@@ -0,0 +1,101 @@
 """
 PICTUS VELUM SRL store profile for OCR extraction.
 Office supplies and stationery store.
 """
 import re
 from decimal import Decimal, InvalidOperation
 from typing import List, Dict, Any
 from .base import BaseStoreProfile
 from . import ProfileRegistry
@ProfileRegistry.register
 class PictusVelumProfile(BaseStoreProfile):
    """
    PICTUS VELUM SRL - standard TVA profile.
    Key characteristics:
    - Standard TVA format (single rate, any percentage)
    - Office supplies and stationery (rechizite)
    - CARD payment typical
    """
    CUI_LIST = ["39634534"]
    NAME_PATTERNS = ["PICTUS", "PICTUS VELUM", "P1CTUS", "PICTUS VELUM SRL"]
    STORE_NAME = "PICTUS VELUM SRL"
    # Standard TVA patterns (flexible - accepts any rate)
    TVA_PATTERNS = [
        # "TVA A: XX% = YY,YY" or "TVA-A XX% YY,YY"
        r'TVA\s*[-:]?\s*([A-D])\s*:?\s*(\d{1,2})\s*%\s*[=:]?\s*([\d.,]+)',
        # "A - XX,XX% = YY,YY"
        r'([A-D])\s*[-:]\s*(\d{1,2})[.,]?\d{0,2}\s*%\s*[=:]?\s*([\d.,]+)',
        # "TVA XX% YY,YY" (simple format without code)
        r'TVA\s+(\d{1,2})\s*%\s*([\d.,]+)',
    ]
    def extract_tva_entries(self, text: str) -> List[dict]:
        """
        Extract TVA entries from receipt text.
        Args:
            text: Raw OCR text from receipt
        Returns:
            List of TVA entries with code, percent, and amount
        """
        entries = []
        seen = set()
        # Try coded patterns first
        for pattern in self.TVA_PATTERNS[:2]:
            for match in re.finditer(pattern, text, re.IGNORECASE):
                try:
                    code = match.group(1).upper()
                    percent = int(match.group(2))
                    amount = self._parse_decimal(match.group(3))
                    if amount and amount > 0:
                        entry_key = (code, percent)
                        if entry_key not in seen:
                            entries.append({
                                'code': code,
                                'percent': percent,
                                'amount': amount
                            })
                            seen.add(entry_key)
                except (ValueError, InvalidOperation, IndexError):
                    continue
        # Fallback to simple format
        if not entries:
            simple_pattern = self.TVA_PATTERNS[2]
            for match in re.finditer(simple_pattern, text, re.IGNORECASE):
                try:
                    percent = int(match.group(1))
                    amount = self._parse_decimal(match.group(2))
                    if amount and amount > 0:
                        entries.append({
                            'code': 'A',
                            'percent': percent,
                            'amount': amount
                        })
                        break
                except (ValueError, InvalidOperation):
                    continue
        return entries
    def get_validation_hints(self) -> Dict[str, Any]:
        """Return PICTUS VELUM-specific validation hints."""
        return {
            "has_multi_rate_tva": False,
            "card_equals_total": True,
            "has_client_cui": False,
            "has_efactura": False,
            "is_non_vat_payer": False,
        }
--- a/backend/modules/data_entry/services/ocr/profiles/socar.py
+++ b/backend/modules/data_entry/services/ocr/profiles/socar.py
@@ -0,0 +1,111 @@
 """
 SOCAR Petroleum store profile for OCR extraction.
 SOCAR receipts are similar to OMV - gas station with client CUI support.
 Date format may use YYYY. MM. DD with spaces.
 """
 import re
 from decimal import Decimal, InvalidOperation
 from typing import List, Dict, Any
 from .base import BaseStoreProfile
 from . import ProfileRegistry
@ProfileRegistry.register
 class SocarProfile(BaseStoreProfile):
    """
    SOCAR PETROLEUM S.A. - standard TVA with client CUI.
    Key characteristics:
    - Standard TVA format (usually single rate)
    - Includes client CUI on receipt (for business purchases)
    - Similar format to OMV/Petrom
    - Date format may use YYYY. MM. DD (with spaces)
    """
    CUI_LIST = ["12546600"]
    NAME_PATTERNS = ["SOCAR", "S0CAR", "SOCAR PETROLEUM"]  # OCR variants
    STORE_NAME = "SOCAR PETROLEUM S.A."
    # Standard TVA patterns for gas stations
    TVA_PATTERNS = [
        # Table format: "A-19,00% 285,66 49,58"
        r'([A-D])\s*[-:]\s*(\d{1,2})[.,]\d{2}\s*%\s+([\d.,]+)\s+([\d.,]+)',
        # Simple format: "TVA 19% 49,58"
        r'TVA\s+(\d{1,2})\s*%\s*([\d.,]+)',
    ]
    # Gas stations may use YYYY. MM. DD format
    DATE_PATTERNS_OCR_SPACES = [
        (r'(\d{4})[.,]\s*(\d{2})[.,]\s*(\d{2})\s+\d{2}:\d{2}', 0.98, 'ymd'),
        (r'(\d{4})[.,]\s*(\d{2})[.,]\s*(\d{2})', 0.95, 'ymd'),
        (r'(\d{2})[.,]\s*(\d{2})[.,]\s*(\d{4})\s+\d{2}:\d{2}', 0.92, 'dmy'),
        (r'(\d{2})[.,]\s*(\d{2})[.,]\s*(\d{4})', 0.85, 'dmy'),
    ]
    def extract_tva_entries(self, text: str) -> List[dict]:
        """
        Extract SOCAR-specific TVA entries.
        Args:
            text: Raw OCR text from receipt
        Returns:
            List of TVA entries with code, percent, and amount
        """
        entries = []
        seen = set()
        # Try table format first
        table_pattern = self.TVA_PATTERNS[0]
        for match in re.finditer(table_pattern, text, re.IGNORECASE):
            try:
                code = match.group(1).upper()
                percent = int(match.group(2))
                tva_amount = self._parse_decimal(match.group(4))
                if tva_amount and tva_amount > 0:
                    entry_key = (code, percent)
                    if entry_key not in seen:
                        entries.append({
                            'code': code,
                            'percent': percent,
                            'amount': tva_amount
                        })
                        seen.add(entry_key)
            except (ValueError, InvalidOperation):
                continue
        # Fallback to simple format if no table entries found
        if not entries:
            simple_pattern = self.TVA_PATTERNS[1]
            for match in re.finditer(simple_pattern, text, re.IGNORECASE):
                try:
                    percent = int(match.group(1))
                    amount = self._parse_decimal(match.group(2))
                    if amount and amount > 0:
                        # Default to code 'A' for simple format
                        entries.append({
                            'code': 'A',
                            'percent': percent,
                            'amount': amount
                        })
                        break  # Only take first match for simple format
                except (ValueError, InvalidOperation):
                    continue
        return entries
    def get_validation_hints(self) -> Dict[str, Any]:
        """Return SOCAR-specific validation hints."""
        return {
            "has_multi_rate_tva": False,
            "card_equals_total": False,
            "has_client_cui": True,
            "has_efactura": False,
            "is_non_vat_payer": False,
        }
--- a/backend/modules/data_entry/services/ocr/profiles/stepout_market.py
+++ b/backend/modules/data_entry/services/ocr/profiles/stepout_market.py
@@ -0,0 +1,112 @@
 """
 STEPOUT MARKET SRL store profile for OCR extraction.
 Bookstore with reduced TVA rate (5% for books in Romania).
 """
 import re
 from decimal import Decimal, InvalidOperation
 from typing import List, Dict, Any
 from .base import BaseStoreProfile
 from . import ProfileRegistry
@ProfileRegistry.register
 class StepoutMarketProfile(BaseStoreProfile):
    """
    STEPOUT MARKET SRL - reduced TVA rate profile (books).
    Key characteristics:
    - Reduced TVA rate: 5% for books (cărți), the historical reduced rate in Romania
    - May also have standard rates for non-book items
    - Patterns are flexible to accept ANY TVA rate
    - CARD payment typical
    """
    CUI_LIST = ["35532655"]
    NAME_PATTERNS = ["STEPOUT", "STEPOUT MARKET", "STEP0UT", "STEPOUT MARKET SRL"]
    STORE_NAME = "STEPOUT MARKET SRL"
    # TVA patterns (flexible - accepts any rate including 5%)
    TVA_PATTERNS = [
        # "TVA A: 5% = YY,YY" or "TVA-A 5% YY,YY" (coded format)
        r'TVA\s*[-:]?\s*([A-D])\s*:?\s*(\d{1,2})\s*%\s*[=:]?\s*([\d.,]+)',
        # "A - 5,00% = YY,YY" (table format)
        r'([A-D])\s*[-:]\s*(\d{1,2})[.,]?\d{0,2}\s*%\s*[=:]?\s*([\d.,]+)',
        # "TVA 5% YY,YY" (simple format - common for single rate)
        r'TVA\s+(\d{1,2})\s*%\s*([\d.,]+)',
        # "TVA 5,00%: YY,YY" (percent with colon)
        r'TVA\s+(\d{1,2})[.,]\d{2}\s*%\s*:?\s*([\d.,]+)',
    ]
    def extract_tva_entries(self, text: str) -> List[dict]:
        """
        Extract TVA entries from receipt text.
        Stepout Market primarily sells books which have 5% TVA in Romania.
        The patterns are generic and will extract whatever rate is on the receipt.
        Args:
            text: Raw OCR text from receipt
        Returns:
            List of TVA entries with code, percent, and amount
        """
        entries = []
        seen = set()
        # Try coded patterns first (have code letter)
        for pattern in self.TVA_PATTERNS[:2]:
            for match in re.finditer(pattern, text, re.IGNORECASE):
                try:
                    code = match.group(1).upper()
                    percent = int(match.group(2))
                    amount = self._parse_decimal(match.group(3))
                    if amount and amount > 0:
                        entry_key = (code, percent)
                        if entry_key not in seen:
                            entries.append({
                                'code': code,
                                'percent': percent,
                                'amount': amount
                            })
                            seen.add(entry_key)
                except (ValueError, InvalidOperation, IndexError):
                    continue
        # Fallback to simple format (no code letter, just percent + amount)
        if not entries:
            for pattern in self.TVA_PATTERNS[2:]:
                for match in re.finditer(pattern, text, re.IGNORECASE):
                    try:
                        percent = int(match.group(1))
                        amount = self._parse_decimal(match.group(2))
                        if amount and amount > 0:
                            # Default to code 'A' for simple format
                            entries.append({
                                'code': 'A',
                                'percent': percent,
                                'amount': amount
                            })
                            break  # Only take first match for simple format
                    except (ValueError, InvalidOperation):
                        continue
                if entries:
                    break
        return entries
    def get_validation_hints(self) -> Dict[str, Any]:
        """Return STEPOUT MARKET-specific validation hints."""
        return {
            "has_multi_rate_tva": False,
            "card_equals_total": True,
            "has_client_cui": True,  # May have client CUI
            "has_efactura": False,
            "is_non_vat_payer": False,
            "typical_tva_rate": 5,  # Books have 5% TVA in Romania
            "product_category": "books",
        }
--- a/backend/modules/data_entry/services/ocr/profiles/unlimited_keys.py
+++ b/backend/modules/data_entry/services/ocr/profiles/unlimited_keys.py
@@ -0,0 +1,103 @@
 """
 UNLIMITED KEYS S.R.L. store profile for OCR extraction.
 Key duplication service. Notable for CASH (NUMERAR) payments.
 """
 import re
 from decimal import Decimal, InvalidOperation
 from typing import List, Dict, Any
 from .base import BaseStoreProfile
 from . import ProfileRegistry
@ProfileRegistry.register
 class UnlimitedKeysProfile(BaseStoreProfile):
    """
    UNLIMITED KEYS S.R.L. - standard TVA profile with NUMERAR payment.
    Key characteristics:
    - Standard TVA format (single rate, any percentage)
    - Key duplication service
    - NUMERAR (cash) payment common - different from most stores!
    - May also accept CARD
    """
    CUI_LIST = ["18993187"]
    NAME_PATTERNS = ["UNLIMITED KEYS", "UNLIMITED", "UNL1MITED", "UNLIMITED KEYS SRL"]
    STORE_NAME = "UNLIMITED KEYS S.R.L."
    # Standard TVA patterns (flexible - accepts any rate)
    TVA_PATTERNS = [
        # "TVA A: XX% = YY,YY" or "TVA-A XX% YY,YY"
        r'TVA\s*[-:]?\s*([A-D])\s*:?\s*(\d{1,2})\s*%\s*[=:]?\s*([\d.,]+)',
        # "A - XX,XX% = YY,YY"
        r'([A-D])\s*[-:]\s*(\d{1,2})[.,]?\d{0,2}\s*%\s*[=:]?\s*([\d.,]+)',
        # "TVA XX% YY,YY" (simple format without code)
        r'TVA\s+(\d{1,2})\s*%\s*([\d.,]+)',
    ]
    def extract_tva_entries(self, text: str) -> List[dict]:
        """
        Extract TVA entries from receipt text.
        Args:
            text: Raw OCR text from receipt
        Returns:
            List of TVA entries with code, percent, and amount
        """
        entries = []
        seen = set()
        # Try coded patterns first
        for pattern in self.TVA_PATTERNS[:2]:
            for match in re.finditer(pattern, text, re.IGNORECASE):
                try:
                    code = match.group(1).upper()
                    percent = int(match.group(2))
                    amount = self._parse_decimal(match.group(3))
                    if amount and amount > 0:
                        entry_key = (code, percent)
                        if entry_key not in seen:
                            entries.append({
                                'code': code,
                                'percent': percent,
                                'amount': amount
                            })
                            seen.add(entry_key)
                except (ValueError, InvalidOperation, IndexError):
                    continue
        # Fallback to simple format
        if not entries:
            simple_pattern = self.TVA_PATTERNS[2]
            for match in re.finditer(simple_pattern, text, re.IGNORECASE):
                try:
                    percent = int(match.group(1))
                    amount = self._parse_decimal(match.group(2))
                    if amount and amount > 0:
                        entries.append({
                            'code': 'A',
                            'percent': percent,
                            'amount': amount
                        })
                        break
                except (ValueError, InvalidOperation):
                    continue
        return entries
    def get_validation_hints(self) -> Dict[str, Any]:
        """Return UNLIMITED KEYS-specific validation hints."""
        return {
            "has_multi_rate_tva": False,
            "card_equals_total": False,  # May be NUMERAR (cash)
            "has_client_cui": True,  # May have client CUI
            "has_efactura": False,
            "is_non_vat_payer": False,
            "common_payment": "NUMERAR",  # Cash payments common
        }
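
To illustrate how the three `TVA_PATTERNS` above behave on typical lines, here is a minimal standalone sketch. It copies the regexes from the profile, but `parse_decimal` is a hypothetical stand-in for `self._parse_decimal` (whose implementation is not shown in this diff) and only treats a comma as the decimal separator; the real helper may accept more formats.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import List, Optional

# Regexes copied from the UNLIMITED KEYS profile above.
TVA_PATTERNS = [
    r'TVA\s*[-:]?\s*([A-D])\s*:?\s*(\d{1,2})\s*%\s*[=:]?\s*([\d.,]+)',
    r'([A-D])\s*[-:]\s*(\d{1,2})[.,]?\d{0,2}\s*%\s*[=:]?\s*([\d.,]+)',
    r'TVA\s+(\d{1,2})\s*%\s*([\d.,]+)',
]


def parse_decimal(raw: str) -> Optional[Decimal]:
    """Hypothetical helper (assumption): comma is the decimal separator."""
    try:
        return Decimal(raw.strip().rstrip('.,').replace(',', '.'))
    except InvalidOperation:
        return None


def extract_tva_entries(text: str) -> List[dict]:
    """Coded patterns first, then the code-less fallback (default code 'A')."""
    entries: List[dict] = []
    seen = set()
    for pattern in TVA_PATTERNS[:2]:
        for match in re.finditer(pattern, text, re.IGNORECASE):
            code = match.group(1).upper()
            percent = int(match.group(2))
            amount = parse_decimal(match.group(3))
            # Deduplicate on (code, percent), as the profile does.
            if amount and amount > 0 and (code, percent) not in seen:
                seen.add((code, percent))
                entries.append({'code': code, 'percent': percent, 'amount': amount})
    if not entries:
        for match in re.finditer(TVA_PATTERNS[2], text, re.IGNORECASE):
            amount = parse_decimal(match.group(2))
            if amount and amount > 0:
                entries.append({'code': 'A', 'percent': int(match.group(1)), 'amount': amount})
                break  # the simple format yields at most one entry
    return entries


if __name__ == "__main__":
    print(extract_tva_entries("TVA A: 19% = 15,20"))  # coded format
    print(extract_tva_entries("TVA 21% 7,71"))        # simple fallback format
```

Note that a coded line such as `TVA A: 19% = 15,20` is matched by both of the first two patterns; the `seen` set is what keeps it from being reported twice.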
--- a/backend/modules/data_entry/services/ocr_extractor.py
+++ b/backend/modules/data_entry/services/ocr_extractor.py
@@ -7,6 +7,7 @@ from typing import Optional, Tuple, List
 from dataclasses import dataclass, field
 from backend.modules.data_entry.services.ocr.validation import OCRValidationEngine
 from backend.modules.data_entry.services.ocr.profiles import ProfileRegistry
@dataclass
@@ -63,6 +64,57 @@ class ExtractionResult:
 class ReceiptExtractor:
    """Extract receipt fields using pattern matching for Romanian receipts."""
    # =========================================================================
    # DEPRECATED: STORE_PROFILES dict - USE ProfileRegistry INSTEAD
    # =========================================================================
    # Store profiles are now managed by ProfileRegistry in:
    #   backend/modules/data_entry/services/ocr/profiles/
    #
    # This dict is kept for reference only. All extraction logic now uses:
    #   ProfileRegistry.get_profile(cui)
    #
    # See: backend/modules/data_entry/services/ocr/profiles/README.md
    # =========================================================================
    STORE_PROFILES = {
        # Lidl - multi-rate TVA (A+B), specific format without hyphen/colon
        "22891860": {
            "name": "LIDL DISCOUNT S.R.L.",
            "tva_pattern": "lidl",
            "tva_format": "TVA {code} {percent}% {amount}",
            "has_multi_rate_tva": True,
            "card_equals_total": True,
        },
        # OMV Petrom - single TVA rate, client CUI included
        "11201891": {
            "name": "OMV PETROM MARKETING S.R.L.",
            "tva_pattern": "standard",
            "has_client_cui": True,
        },
        # FIVE-HOLDING (BRICK) - standard format
        "10562600": {
            "name": "FIVE-HOLDING S.A.",
            "tva_pattern": "standard",
        },
        # Dedeman - e-factura format
        "2816464": {
            "name": "DEDEMAN SRL",
            "tva_pattern": "standard",
            "has_efactura": True,
        },
        # SOCAR Petroleum
        "12546600": {
            "name": "SOCAR PETROLEUM S.A.",
            "tva_pattern": "standard",
            "has_client_cui": True,
        },
        # Kineterra - non-VAT payer
        "31180432": {
            "name": "KINETERRA CONCEPT SRL",
            "tva_pattern": "none",
            "is_non_vat_payer": True,
        },
    }
    # Total amount patterns (most specific first)
    # Romanian receipts use various formats: TOTAL LEI, TOTAL:, TOTAL RON, etc.
    # OCR often produces errors, so patterns must be tolerant
@@ -394,48 +446,101 @@ class ReceiptExtractor:
        result.raw_text = text
        text_upper = text.upper()
-        # Extract core fields
+        # =========================================================================
-        result.amount, result.confidence_amount = self._extract_amount(text_upper)
+        # STEP 1: Extract vendor info FIRST to find store profile
-        result.receipt_date, result.confidence_date = self._extract_date(text_upper)
+        # =========================================================================
        result.receipt_number, _ = self._extract_number(text_upper)
        result.receipt_series, _ = self._extract_series(text_upper)
        result.partner_name, result.confidence_vendor = self._extract_vendor(text)
        result.cui, _ = self._extract_cui(text_upper, text)
        # Normalize CUI: fix R0 → RO OCR error and validate format
        result.cui = OCRValidationEngine.normalize_cui(result.cui)
-        # Extract additional fields - Multiple TVA entries
+        # Lookup store-specific profile for enhanced extraction accuracy
-        result.tva_entries, result.tva_total = self._extract_tva_entries(text_upper)
+        store_profile = ProfileRegistry.get_profile(result.cui) if result.cui else None
        if store_profile:
            print(f"[Profile] Using {store_profile.__class__.__name__} for CUI {result.cui}", flush=True)
        # =========================================================================
        # STEP 2: Extract ALL fields using profile (if available) or generic
        # =========================================================================
        if store_profile:
            # Profile-specific extraction (higher accuracy for known stores)
            result.amount, result.confidence_amount = store_profile.extract_total(text_upper)
            result.receipt_date, result.confidence_date = store_profile.extract_date(text_upper)
            result.receipt_number, _ = store_profile.extract_receipt_number(text_upper)
            result.tva_entries = store_profile.extract_tva_entries(text_upper)
            result.tva_total = sum(e['amount'] for e in result.tva_entries) if result.tva_entries else None
            result.payment_methods = store_profile.extract_payment_methods(text_upper)
            # Client data extraction via profile (CUI + name)
            profile_client_cui, cui_confidence = store_profile.extract_client_cui(text_upper)
            profile_client_name, name_confidence = store_profile.extract_client_name(text)
            if profile_client_cui or profile_client_name:
                # Use profile extraction results
                result.client_cui = OCRValidationEngine.normalize_cui(profile_client_cui) if profile_client_cui else None
                result.client_name = profile_client_name
                result.confidence_client = max(cui_confidence, name_confidence)
                # Address still via generic (no profile method)
                _, _, client_address, _ = self._extract_client_data(text_upper, text)
                result.client_address = client_address
            else:
                # Fallback to generic client extraction
                client_name, client_cui, client_address, confidence = self._extract_client_data(text_upper, text)
                result.client_name = client_name
                result.client_cui = OCRValidationEngine.normalize_cui(client_cui)
                result.client_address = client_address
                result.confidence_client = confidence
            print(f"[Profile] Extracted: total={result.amount}, date={result.receipt_date}, "
                  f"TVA entries={len(result.tva_entries)}, payments={len(result.payment_methods)}", flush=True)
        else:
            # Generic extraction for unknown stores
            result.amount, result.confidence_amount = self._extract_amount(text_upper)
            result.receipt_date, result.confidence_date = self._extract_date(text_upper)
            result.receipt_number, _ = self._extract_number(text_upper)
            result.tva_entries, result.tva_total = self._extract_tva_entries(text_upper)
            result.payment_methods = self._extract_payment_methods(text_upper)
            # Generic client extraction
            client_name, client_cui, client_address, confidence = self._extract_client_data(text_upper, text)
            result.client_name = client_name
            result.client_cui = OCRValidationEngine.normalize_cui(client_cui)
            result.client_address = client_address
            result.confidence_client = confidence
        # Series extraction (no profile method, always generic)
        result.receipt_series, _ = self._extract_series(text_upper)
        # =========================================================================
        # STEP 3: Debug logging and validation
        # =========================================================================
        if not result.tva_entries:
            print(f"[TVA Debug] No TVA found. Checking patterns...", flush=True)
            # Debug: show what patterns see
            normalized = re.sub(r'(\d+)[.,]\s+(\d{2})', r'\1.\2', text_upper)
            taxe_match = re.search(r'T?OTAL\s+TAXE', normalized, re.IGNORECASE)
            rev_match = re.search(r'([\d.,]+)\s*T?OTAL\s+TAXE', normalized, re.IGNORECASE)
            print(f"[TVA Debug] 'OTAL TAXE' found: {bool(taxe_match)}, reversed: {rev_match.group(1) if rev_match else None}", flush=True)
-        # Log TVA vs TOTAL for debugging (validation happens in ocr_service._final_validation)
+        # Log TVA vs TOTAL for debugging
        # NOTE: We NO LONGER clear TVA here - the service will recalculate TOTAL from TVA if needed
        if result.tva_total and result.amount:
            if result.tva_total > result.amount:
                print(f"[TVA Extraction] TVA ({result.tva_total}) > TOTAL ({result.amount}) - will be corrected in final validation", flush=True)
            elif result.tva_total > result.amount * Decimal('0.5'):
                print(f"[TVA Extraction] Warning: TVA ({result.tva_total}) is > 50% of TOTAL ({result.amount}) - suspicious", flush=True)
        # Additional generic extractions
        result.items_count = self._extract_items_count(text_upper)
        result.address = self._extract_address(text_upper)
         # payment_methods is already set in STEP 2 (profile-specific or generic); not re-extracted here
-        # Validate payment methods against extracted amount
+        # =========================================================================
-        # If payment sum >> amount, clear invalid payments (likely OCR error)
+        # STEP 4: Validate and post-process
        # =========================================================================
        # Save original payment methods before validation (for payment mode detection)
        original_payment_methods = result.payment_methods.copy() if result.payment_methods else []
        # Validate payment methods against extracted amount
        result.payment_methods = self._validate_payment_methods(result.payment_methods, result.amount)
        # Auto-suggest payment_mode based on detected payment methods
        # Use ORIGINAL payment_methods to detect CARD even if validation cleared them
        # (e.g., CARD 318.16 is valid even if total validation failed)
        payment_methods_for_mode = result.payment_methods if result.payment_methods else original_payment_methods
        if payment_methods_for_mode:
            card_amount = sum(
@@ -447,17 +552,9 @@ class ReceiptExtractor:
                result.suggested_payment_mode = 'banca'
                print(f"[Payment Mode] CARD detected ({card_amount}), suggesting 'banca'", flush=True)
            else:
                # Only cash payments detected
                result.suggested_payment_mode = 'numerar'
                print(f"[Payment Mode] Cash only detected, suggesting 'numerar'", flush=True)
-        # Extract client data (B2B receipts)
-        client_name, client_cui, client_address, confidence_client = self._extract_client_data(text_upper, text)
-        result.client_name = client_name
-        result.client_cui = OCRValidationEngine.normalize_cui(client_cui)  # Fix R0 → RO OCR error
-        result.client_address = client_address
-        result.confidence_client = confidence_client
        # Detect receipt type
        result.receipt_type = self._detect_receipt_type(text_upper)
@@ -620,6 +717,40 @@ class ReceiptExtractor:
        return num_str
    def _calculate_multi_rate_tva_total(self, tva_entries: List[dict]) -> Optional[Decimal]:
        """
        Calculate implied total from ALL TVA entries (multi-rate support).
        Formula for each entry: total_for_entry = tva * (100 + rate) / rate
        Final total = sum of all entry totals
        Example for Lidl (TVA A 21% = 7.71, TVA B 11% = 2.13):
             Entry A: 7.71 * 121 / 21 ≈ 44.424
             Entry B: 2.13 * 111 / 11 ≈ 21.494
             Total: 44.424 + 21.494 = 65.918 -> 65.92 ≈ 65.86 (within tolerance)
        Returns:
            Implied total Decimal, or None if calculation not possible
        """
        if not tva_entries:
            return None
        total = Decimal('0')
        for entry in tva_entries:
            rate = entry.get('percent', 0)
            tva_amount = entry.get('amount')
            if tva_amount and rate > 0:
                try:
                    tva_dec = Decimal(str(tva_amount))
                    # Formula: total_for_entry = tva * (100 + rate) / rate
                    entry_total = tva_dec * Decimal(100 + rate) / Decimal(rate)
                    total += entry_total
                    print(f"[Multi-rate TVA] Entry {entry.get('code', '?')}: tva={tva_amount}, rate={rate}% -> implied={entry_total:.2f}", flush=True)
                except (InvalidOperation, ValueError, TypeError):
                    continue
        return total.quantize(Decimal('0.01')) if total > 0 else None
    def _cross_validate_and_calculate_amount(
        self,
        amount: Optional[Decimal],
@@ -634,12 +765,11 @@ class ReceiptExtractor:
        Returns: (amount, confidence, source_description)
        Logic:
-        1. If amount is valid (>0) with high confidence (>=0.8), use it directly
+        1. Collect all available sources: extracted amount, payment sum, TVA-implied total
-        2. Calculate payment_sum = CARD + NUMERAR + other methods
+        2. Find consensus: 2+ sources within 3% tolerance
-        3. Calculate tva_implied_total = tva_total * (100 + rate) / rate
+        3. If consensus found, use the higher-confidence source value
-        4. Cross-validate: if payment_sum matches extracted amount, boost confidence
+        4. If extracted differs >10% from all others, it's an outlier - correct it
-        5. If amount is 0/None, use payment_sum as total
+        5. If no consensus possible, fallback to individual validations
        6. If payment_sum is 0, try to calculate from TVA
        """
        # Calculate payment methods sum
        payment_sum = Decimal('0')
@@ -652,43 +782,73 @@ class ReceiptExtractor:
                except (InvalidOperation, ValueError, TypeError):
                    continue
-        # Calculate TVA-implied total: total = tva * (100 + rate) / rate
+        # Calculate TVA-implied total using ALL entries (multi-rate fix)
-        tva_implied_total = None
+        tva_implied_total = self._calculate_multi_rate_tva_total(tva_entries)
-        if tva_entries:
-            # Use the main TVA entry (typically the largest or first one)
-            main_entry = tva_entries[0]
-            rate = main_entry.get('percent', 19)
-            tva_amount = main_entry.get('amount')
-            if tva_amount and rate > 0:
-                try:
-                    tva_dec = Decimal(str(tva_amount))
-                    # total = tva * (100 + rate) / rate
-                    tva_implied_total = (tva_dec * Decimal(100 + rate) / Decimal(rate)).quantize(Decimal('0.01'))
-                except (InvalidOperation, ValueError, TypeError):
-                    pass
-        # Case 1: Amount is valid with high confidence - validate against TVA and payments
+        # Multi-source consensus approach (3% tolerance for multi-rate TVA rounding)
        CONSENSUS_TOLERANCE = 3.0  # 3% tolerance
        # Collect all available sources with their confidences
        sources = []
        if amount and amount > 0:
            sources.append(('extracted', float(amount), confidence_amount))
        if payment_sum > 0:
            sources.append(('payment', float(payment_sum), 0.92))  # Payment is very reliable
        if tva_implied_total and tva_implied_total > 0:
            sources.append(('tva_calc', float(tva_implied_total), 0.88))  # TVA calc is reliable
        print(f"[Cross-Validation] Sources: {[(s[0], f'{s[1]:.2f}', f'{s[2]:.2f}') for s in sources]}", flush=True)
        # Find consensus: 2+ sources within tolerance
        if len(sources) >= 2:
            for i, (name1, val1, conf1) in enumerate(sources):
                for name2, val2, conf2 in sources[i+1:]:
                    if val1 <= 0 or val2 <= 0:
                        continue
                    diff_pct = abs(val1 - val2) / max(val1, val2) * 100
                    if diff_pct <= CONSENSUS_TOLERANCE:
                        # Consensus found! Use value from higher-confidence source
                        if conf1 >= conf2:
                            consensus_val, consensus_conf = val1, conf1
                        else:
                            consensus_val, consensus_conf = val2, conf2
                        # Boost confidence for consensus
                        consensus_conf = min(0.98, consensus_conf + 0.05)
                        print(f"[Cross-Validation] Consensus: {name1}={val1:.2f} ≈ {name2}={val2:.2f} (diff={diff_pct:.1f}%)", flush=True)
                        return Decimal(str(round(consensus_val, 2))), consensus_conf, f"consensus ({name1}+{name2})"
        # No consensus - check if extracted is an outlier (differs >10% from all others)
        if amount and amount > 0 and len(sources) >= 2:
            other_sources = [s for s in sources if s[0] != 'extracted']
            if other_sources:
                extracted_val = float(amount)
                all_differ = all(
                    abs(extracted_val - s[1]) / max(extracted_val, s[1]) * 100 > 10
                    for s in other_sources if s[1] > 0
                )
                if all_differ:
                    # Extracted differs significantly from all others - use the best other source
                    best_other = max(other_sources, key=lambda s: s[2])
                    print(f"[Cross-Validation] Extracted outlier: {extracted_val:.2f} differs >10% from all others, using {best_other[0]}={best_other[1]:.2f}", flush=True)
                    return Decimal(str(round(best_other[1], 2))), best_other[2], f"corrected (extracted outlier, using {best_other[0]})"
        # Fallback: Case 1 - Amount valid with high confidence
        if amount and amount > 0 and confidence_amount >= 0.8:
-            # First check TVA-implied total (most reliable when TVA is extracted correctly)
+            # Check TVA-implied total
            if tva_implied_total and tva_implied_total > 0:
                tva_diff_percent = abs(float(amount) - float(tva_implied_total)) / float(tva_implied_total) * 100
-                if tva_diff_percent <= 1:
+                if tva_diff_percent <= 3:
                     # TVA-implied total within 3% of extracted amount - boost confidence
                    return amount, min(0.98, confidence_amount + 0.05), "extracted (validated by TVA)"
                elif tva_diff_percent > 10:
                    # Significant mismatch - TVA-implied total is more reliable
                    # This catches cases where wrong TOTAL line was extracted (e.g., REST, SUBTOTAL)
                    print(f"[Cross-Validation] Amount mismatch with TVA: extracted={amount}, tva_implied={tva_implied_total} (diff={tva_diff_percent:.1f}%)", flush=True)
                    return tva_implied_total, 0.90, "calculated from TVA (extracted amount mismatch)"
            # Cross-validate with payment methods
            if payment_sum > 0 and abs(amount - payment_sum) <= Decimal('0.02'):
                # Perfect match - boost confidence
                return amount, min(0.98, confidence_amount + 0.05), "extracted (validated by payment methods)"
            elif payment_sum > 0:
                payment_diff_percent = abs(float(amount) - float(payment_sum)) / float(payment_sum) * 100
                if payment_diff_percent > 10:
                    # Significant mismatch - payment sum is more reliable
                    print(f"[Cross-Validation] Amount mismatch with payments: extracted={amount}, payments={payment_sum} (diff={payment_diff_percent:.1f}%)", flush=True)
                    return payment_sum, 0.88, "calculated from payment methods (extracted amount mismatch)"
@@ -696,29 +856,22 @@ class ReceiptExtractor:
        # Case 2: Amount exists but low confidence - try to validate/correct
        if amount and amount > 0:
            # First check TVA-implied total (most reliable)
            if tva_implied_total and tva_implied_total > 0:
                tva_diff_percent = abs(float(amount) - float(tva_implied_total)) / float(tva_implied_total) * 100
-                if tva_diff_percent <= 2:
+                if tva_diff_percent <= 3:
                    # Close match - boost confidence
                    return amount, 0.88, "extracted (validated by TVA)"
                elif tva_diff_percent > 10:
                    # Significant mismatch - use TVA-implied total
                    print(f"[Cross-Validation] Amount mismatch with TVA: extracted={amount}, tva_implied={tva_implied_total} (diff={tva_diff_percent:.1f}%)", flush=True)
                    return tva_implied_total, 0.85, "calculated from TVA"
            # Check if payment methods sum matches
            if payment_sum > 0:
                payment_diff_percent = abs(float(amount) - float(payment_sum)) / float(payment_sum) * 100
-                if payment_diff_percent <= 0.5:
+                if payment_diff_percent <= 1:
                    # Close match - boost confidence
                    return amount, 0.90, "extracted (validated by payment methods)"
                elif payment_diff_percent > 10:
                    # Mismatch - prefer payment_sum as it's more reliable
                    print(f"[Cross-Validation] Amount mismatch: extracted={amount}, payments={payment_sum}", flush=True)
                    return payment_sum, 0.85, "calculated from payment methods"
            # No validation possible - return as-is
            return amount, confidence_amount, "extracted (unvalidated)"
        # Case 3: Amount is 0 or None - calculate from payment methods
@@ -946,6 +1099,28 @@ class ReceiptExtractor:
        return name
    def _get_store_profile(self, cui: Optional[str]) -> Optional[dict]:
        """
        Get store-specific profile by CUI.
        DEPRECATED: Use ProfileRegistry.get_profile() directly for profile objects.
        This method is kept for backward compatibility and returns validation hints dict.
        Args:
            cui: The CUI extracted from receipt (with or without RO prefix)
        Returns:
            Store profile validation hints dict or None if not found
        """
        profile = ProfileRegistry.get_profile(cui)
        if profile:
            # Return validation hints for backward compatibility
            hints = profile.get_validation_hints()
            hints['name'] = profile.STORE_NAME
            print(f"[Store Profile] Found profile for {cui}: {profile.STORE_NAME}", flush=True)
            return hints
        return None
    def _extract_cui(self, text_upper: str, original_text: str) -> Tuple[Optional[str], float]:
        """
        Extract vendor CUI (fiscal identification code) from text.
@@ -1020,11 +1195,114 @@ class ReceiptExtractor:
        # Default to bon_fiscal if neither found
        return 'bon_fiscal'
    def _try_pattern_lidl(self, text: str) -> List[dict]:
        """
        Try Lidl-style TVA pattern: "TVA A 21,00% 7.71" (no hyphen/colon separator).
        Lidl receipts format:
            TOTAL TVA 9,84
            TVA A 21,00% 7,71
            TVA B 11,00% 2,13
        Returns list of TVA entries found.
        """
        entries = []
        seen = set()
        # Pattern: TVA/TUA/IVA + code (A-D) + percent + amount (on same line)
        # Handles: "TVA A 21,00% 7,71", "TVA B 11,00% 2,13", "TUA A 21% 7.71"
        lidl_patterns = [
            # Same line: "TVA A  21,00%   7.71" (with various spacing)
            r'T[VU][AR]\s+([A-D])\s+(\d{1,2})[.,]?\d{0,2}\s*%\s+([\d.,]+)',
            # Same line with backslash (OCR artifact): "TVA A \21,00% 7.71"
            r'T[VU][AR]\s+([A-D])\s+\\?(\d{1,2})[.,]?\d{0,2}\s*%\s+([\d.,]+)',
            # IVA variant
            r'IVA\s+([A-D])\s+(\d{1,2})[.,]?\d{0,2}\s*%\s+([\d.,]+)',
        ]
        for pattern in lidl_patterns:
            for match in re.finditer(pattern, text, re.IGNORECASE):
                try:
                    code = match.group(1).upper()
                    percent = int(match.group(2))
                    amount_str = self._normalize_number(match.group(3))
                    amount = Decimal(amount_str)
                    if amount > 0:
                        entry_key = (code, percent)
                        if entry_key not in seen:
                            entries.append({
                                'code': code,
                                'percent': percent,
                                'amount': amount
                            })
                            seen.add(entry_key)
                            print(f"[TVA Lidl] Found: TVA {code} {percent}% = {amount}", flush=True)
                except (ValueError, InvalidOperation):
                    continue
        return entries
    def _select_best_tva_candidate(
        self,
        candidates: List[tuple],
        tva_bon_total: Optional[Decimal]
    ) -> Tuple[List[dict], Optional[Decimal]]:
        """
        Select the best TVA candidate from collected candidates.
        Selection criteria (priority order):
        1. Sum matches TOTAL TVA BON (highest priority)
        2. More entries = better (for multi-rate receipts)
        3. Pattern confidence as tiebreaker
        Args:
            candidates: List of (pattern_name, confidence, entries, sum)
            tva_bon_total: Authoritative TOTAL TVA BON value (if extracted)
        Returns:
            (best_entries, best_sum)
        """
        if not candidates:
            return [], None
        # Score each candidate
        scored = []
        for name, confidence, entries, sum_val in candidates:
            score = 0.0
            # Criterion 1: Sum matches TOTAL TVA BON (highest priority)
            if tva_bon_total and sum_val:
                tolerance = max(Decimal('0.02'), tva_bon_total * Decimal('0.02'))  # 2% tolerance
                if abs(sum_val - tva_bon_total) <= tolerance:
                    score += 100  # High bonus for matching authoritative total
                    print(f"[TVA Select] {name}: sum {sum_val} matches tva_bon_total {tva_bon_total}", flush=True)
            # Criterion 2: More entries (for multi-rate receipts)
            score += len(entries) * 10
            # Criterion 3: Pattern confidence
            score += confidence * 5
            scored.append((score, name, confidence, entries, sum_val))
            print(f"[TVA Select] Candidate {name}: score={score:.1f}, entries={len(entries)}, sum={sum_val}", flush=True)
        # Sort by score descending
        scored.sort(key=lambda x: x[0], reverse=True)
        best = scored[0]
        print(f"[TVA Select] Winner: {best[1]} (score={best[0]:.1f})", flush=True)
        return best[3], best[4]
    def _extract_tva_entries(self, text: str) -> Tuple[List[dict], Optional[Decimal]]:
        """
        Extract multiple TVA (VAT) entries from text.
         Romanian receipts can have multiple TVA rates, each with a letter code
         (historically A=19%, B=9%, C=5%, D=0%; current receipts use 21% and 11%).
        Uses CANDIDATE COLLECTION approach:
        - Try ALL patterns and collect candidates
        - Select best candidate based on matching TOTAL TVA BON
        Returns (tva_entries, tva_total) where tva_entries is a list of:
            {'code': 'A', 'percent': 19, 'amount': Decimal('15.20')}
        """
@@ -1054,6 +1332,22 @@ class ReceiptExtractor:
        # Also normalize comma followed by space to comma (for "21, 00%" -> "21,00%")
        normalized_text = re.sub(r'(\d+),\s+(\d{2})\s*%', r'\1.\2%', normalized_text)
        # Extract TOTAL TVA BON/TOTAL TVA first as the authoritative reference
        tva_bon_total = self._extract_total_tva_bon(normalized_text)
        print(f"[TVA Debug] TOTAL TVA BON: {tva_bon_total}", flush=True)
        # CANDIDATE COLLECTION APPROACH: Try all patterns, collect candidates, select best
        all_candidates = []  # List of (pattern_name, confidence, entries, sum)
        # === LIDL-STYLE PATTERNS (NEW) ===
        # Lidl format: "TVA A 21,00% 7.71" or "TVA B 11,00% 2.13" (no hyphen/colon)
        # This pattern handles multi-rate TVA receipts
        lidl_entries = self._try_pattern_lidl(normalized_text)
        if lidl_entries:
            lidl_sum = sum(e['amount'] for e in lidl_entries)
            all_candidates.append(('lidl', 0.96, lidl_entries, lidl_sum))
            print(f"[TVA Debug] Lidl pattern: {len(lidl_entries)} entries, sum={lidl_sum}", flush=True)
        # Pattern 0a: First try to get TVA from "TOTAL TAXE:" which is most reliable
        # Format: "TOTAL TAXE: 55,22" - this is always the TVA amount
        # OCR may cut "T" producing "OTAL TAXE:" instead of "TOTAL TAXE:"
@@ -1372,10 +1666,21 @@ class ReceiptExtractor:
                    except (ValueError, InvalidOperation):
                        continue
-        # Extract TOTAL TVA BON as reference (separate from individual entries)
+        # Add existing extraction results to candidates (if any)
-        tva_bon_total = self._extract_total_tva_bon(normalized_text)
+        if tva_entries:
            entries_sum = sum(entry['amount'] for entry in tva_entries)
            all_candidates.append(('standard', 0.90, tva_entries, entries_sum))
            print(f"[TVA Debug] Standard patterns: {len(tva_entries)} entries, sum={entries_sum}", flush=True)
-        # Calculate sum from entries
+        # === CANDIDATE SELECTION ===
        # Select best candidate using TOTAL TVA BON as authoritative reference
        if all_candidates:
            best_entries, best_sum = self._select_best_tva_candidate(all_candidates, tva_bon_total)
            if best_entries:
                tva_entries = best_entries
                entries_sum = best_sum
        # Calculate sum from entries (if not set by candidate selection)
        entries_sum = None
        if tva_entries:
            entries_sum = sum(entry['amount'] for entry in tva_entries)
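
The multi-rate formula in `_calculate_multi_rate_tva_total` and the consensus step in `_cross_validate_and_calculate_amount` can be tried in isolation with the sketch below. It is a simplified standalone version, not the code from the commit: `implied_total` and `find_consensus` are hypothetical names, logging is omitted, and the 0.80 confidence given to the extracted amount in the usage section is an assumed value. The TVA amounts and the printed total of 65.86 come from the Lidl example in the docstring.

```python
from decimal import Decimal
from typing import List, Optional, Tuple

Source = Tuple[str, float, float]  # (name, value, confidence)


def implied_total(tva_entries: List[dict]) -> Optional[Decimal]:
    """Sum tva * (100 + rate) / rate over all entries, rounding once at the end."""
    total = Decimal('0')
    for entry in tva_entries:
        rate = entry.get('percent', 0)
        amount = entry.get('amount')
        if amount and rate > 0:
            total += Decimal(str(amount)) * Decimal(100 + rate) / Decimal(rate)
    return total.quantize(Decimal('0.01')) if total > 0 else None


def find_consensus(
    sources: List[Source], tolerance_pct: float = 3.0
) -> Optional[Tuple[Decimal, float, str]]:
    """Return the first pair of sources that agree within tolerance_pct percent."""
    for i, (name1, val1, conf1) in enumerate(sources):
        for name2, val2, conf2 in sources[i + 1:]:
            if val1 <= 0 or val2 <= 0:
                continue
            diff_pct = abs(val1 - val2) / max(val1, val2) * 100
            if diff_pct <= tolerance_pct:
                # Keep the value from the higher-confidence source, then boost it.
                value, conf = (val1, conf1) if conf1 >= conf2 else (val2, conf2)
                boosted = min(0.98, conf + 0.05)
                return Decimal(str(round(value, 2))), boosted, f"consensus ({name1}+{name2})"
    return None


if __name__ == "__main__":
    lidl_entries = [
        {'code': 'A', 'percent': 21, 'amount': Decimal('7.71')},
        {'code': 'B', 'percent': 11, 'amount': Decimal('2.13')},
    ]
    tva_total = implied_total(lidl_entries)
    sources = [
        ('extracted', 65.86, 0.80),             # 0.80 is an assumed confidence
        ('tva_calc', float(tva_total), 0.88),   # 0.88 as used in the diff
    ]
    print(tva_total, find_consensus(sources))
```

Because the per-entry totals are summed before rounding, the implied total can differ by a cent from the sum of individually rounded entries, which is one reason the consensus check uses a percentage tolerance rather than exact equality.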
--- a/scripts/generate_store_profile.py
+++ b/scripts/generate_store_profile.py
@@ -0,0 +1,600 @@
 #!/usr/bin/env python3
 """
 Store Profile Generator Script
 Analyzes PDF receipts from a store and generates a Python profile class
 for the OCR extraction system.
 Usage:
    python scripts/generate_store_profile.py \
        --name "Magazin Exemplu" \
        --cui "12345678" \
        --receipts "docs/data-entry/MagazinExemplu*.pdf" \
        --output "backend/modules/data_entry/services/ocr/profiles/magazin_exemplu.py"
 Features:
    - Submits PDFs to OCR API
    - Analyzes extracted text for patterns (TVA, total, date, payment)
    - Generates a BaseStoreProfile subclass with detected patterns
    - Supports hot-reload via ProfileRegistry
 Requirements:
    - Backend server running on localhost:8000
    - JWT authentication
    - python-jose, requests packages
 """
 import argparse
 import glob
 import json
 import os
 import re
 import sys
 import time
 from collections import Counter, defaultdict
 from datetime import datetime, timedelta, timezone
 from pathlib import Path
 from typing import Dict, List, Optional, Tuple
 try:
    import requests
    from jose import jwt
 except ImportError:
    print("Error: Required packages not installed.")
    print("Run: pip install python-jose requests")
    sys.exit(1)
 # Configuration
 API_BASE = os.getenv("API_BASE", "http://localhost:8000")
 JWT_SECRET = os.getenv("JWT_SECRET_KEY", "GENERATE_NEW_SECRET_FOR_PRODUCTION3334!")
 def create_jwt_token() -> str:
    """Create a test JWT token for API authentication."""
    payload = {
        "username": "PROFILE_GENERATOR",
        "user_id": 1,
        "companies": ["604"],
        "permissions": ["read", "write"],
        "exp": datetime.now(timezone.utc) + timedelta(hours=1),
        "iat": datetime.now(timezone.utc),
        "type": "access"
    }
    return jwt.encode(payload, JWT_SECRET, algorithm="HS256")
 def submit_ocr(pdf_path: str, token: str, api_base: str = API_BASE, timeout: int = 120) -> Optional[Dict]:
    """
    Submit a PDF to OCR API and wait for result.
    Args:
        pdf_path: Path to PDF file
        token: JWT authentication token
        api_base: API base URL
        timeout: Max seconds to wait for completion
    Returns:
        Extraction result dict or None on failure
    """
    headers = {"Authorization": f"Bearer {token}"}
    filename = os.path.basename(pdf_path)
    print(f"  Submitting: {filename}...", end=" ", flush=True)
    try:
        with open(pdf_path, "rb") as f:
            files = {"file": (filename, f, "application/pdf")}
            response = requests.post(
                f"{api_base}/api/data-entry/ocr/extract?engine=doctr_plus",
                files=files,
                headers=headers,
                timeout=30
            )
        if response.status_code != 200:
            print(f"FAILED (HTTP {response.status_code})")
            return None
        job_data = response.json()
        job_id = job_data.get("job_id")
        if not job_id:
            print("FAILED (no job_id)")
            return None
        # Poll for completion
        start_time = time.time()
        while time.time() - start_time < timeout:
            poll_response = requests.get(
                f"{api_base}/api/data-entry/ocr/jobs/{job_id}/wait?timeout=30",
                headers=headers,
                timeout=35
            )
            if poll_response.status_code == 200:
                job_result = poll_response.json()
                status = job_result.get("status")
                if status == "completed":
                    elapsed = time.time() - start_time
                    print(f"OK ({elapsed:.1f}s)")
                    return job_result.get("result", {})
                elif status == "error":
                    print(f"ERROR: {job_result.get('error', 'Unknown')}")
                    return None
            time.sleep(2)
        print("TIMEOUT")
        return None
    except Exception as e:
        print(f"EXCEPTION: {e}")
        return None
 def analyze_tva_patterns(results: List[Dict]) -> Dict:
    """
    Analyze TVA patterns from multiple extraction results.
    Returns:
        Dict with detected patterns and statistics
    """
    tva_entries = []
    raw_texts = []
    for r in results:
        if r.get("tva_entries"):
            tva_entries.extend(r["tva_entries"])
        if r.get("raw_text"):
            raw_texts.append(r["raw_text"])
    # Analyze TVA code patterns (A, B, C, etc.)
    codes = Counter(e.get("code") for e in tva_entries if e.get("code"))
    # Analyze TVA percentage patterns
    percents = Counter(e.get("percent") for e in tva_entries if e.get("percent"))
    # Detect TVA format from raw text
    tva_formats = defaultdict(int)
    for text in raw_texts:
        text_upper = text.upper()
        # Standard format: "TVA 19% 10.50" or "TVA: 19% 10.50"
        if re.search(r'TVA\s*:?\s*\d{1,2}%', text_upper):
            tva_formats["standard"] += 1
        # Lidl format: "TVA A 21% 7.71"
        if re.search(r'TVA\s+[A-D]\s+\d{1,2}', text_upper):
            tva_formats["lidl_multi_rate"] += 1
        # Table format: "BAZA TVA | % TVA | VALOARE TVA"
        if re.search(r'BAZA\s+TVA', text_upper):
            tva_formats["table"] += 1
        # No TVA (neplatitor)
        if re.search(r'NEPLATITOR|NON.?TVA', text_upper):
            tva_formats["non_vat"] += 1
    return {
        "codes": dict(codes),
        "percents": dict(percents),
        "formats": dict(tva_formats),
        "has_multi_rate": len(codes) > 1,
        "is_non_vat": tva_formats.get("non_vat", 0) > 0,
        "dominant_format": max(tva_formats, key=tva_formats.get) if tva_formats else "standard"
    }
 def analyze_total_patterns(results: List[Dict]) -> Dict:
    """Analyze TOTAL patterns from extraction results."""
    totals = []
    raw_texts = []
    for r in results:
        if r.get("amount"):
            totals.append(float(r["amount"]))
        if r.get("raw_text"):
            raw_texts.append(r["raw_text"])
    total_formats = defaultdict(int)
    for text in raw_texts:
        text_upper = text.upper()
        if re.search(r'TOTAL\s*:?\s*[\d.,]+', text_upper):
            total_formats["TOTAL:"] += 1
        if re.search(r'TOTAL\s+DE\s+PLAT', text_upper):
            total_formats["TOTAL DE PLATA"] += 1
        if re.search(r'SUMA\s+TOTAL', text_upper):
            total_formats["SUMA TOTALA"] += 1
        if re.search(r'GRAND\s*TOTAL', text_upper):
            total_formats["GRAND TOTAL"] += 1
    return {
        "count": len(totals),
        "formats": dict(total_formats),
        "dominant_format": max(total_formats, key=total_formats.get) if total_formats else "TOTAL"
    }
 def analyze_date_patterns(results: List[Dict]) -> Dict:
    """Analyze date patterns from extraction results."""
    dates = []
    raw_texts = []
    for r in results:
        if r.get("receipt_date"):
            dates.append(r["receipt_date"])
        if r.get("raw_text"):
            raw_texts.append(r["raw_text"])
    date_formats = defaultdict(int)
    for text in raw_texts:
        # DD.MM.YYYY
        if re.search(r'\d{2}\.\d{2}\.\d{4}', text):
            date_formats["DD.MM.YYYY"] += 1
        # YYYY.MM.DD (OMV/SOCAR style)
        if re.search(r'\d{4}\.\d{2}\.\d{2}', text):
            date_formats["YYYY.MM.DD"] += 1
        # DD-MM-YYYY
        if re.search(r'\d{2}-\d{2}-\d{4}', text):
            date_formats["DD-MM-YYYY"] += 1
        # DD/MM/YYYY
        if re.search(r'\d{2}/\d{2}/\d{4}', text):
            date_formats["DD/MM/YYYY"] += 1
    return {
        "extracted_dates": dates,
        "formats": dict(date_formats),
        "dominant_format": max(date_formats, key=date_formats.get) if date_formats else "DD.MM.YYYY"
    }
 def analyze_payment_patterns(results: List[Dict]) -> Dict:
    """Analyze payment method patterns."""
    payment_counts = defaultdict(int)
    for r in results:
        methods = r.get("payment_methods", [])
        for m in methods:
            method_type = m.get("method", "UNKNOWN")
            payment_counts[method_type] += 1
    return {
        "methods": dict(payment_counts),
        "has_mixed_payments": len(payment_counts) > 1
    }
 def analyze_client_patterns(results: List[Dict]) -> Dict:
    """Analyze client (B2B) patterns."""
    has_client_cui = 0
    has_client_name = 0
    for r in results:
        if r.get("client_cui"):
            has_client_cui += 1
        if r.get("client_name"):
            has_client_name += 1
    return {
        "has_client_cui": has_client_cui > 0,
        "has_client_name": has_client_name > 0,
        "b2b_ratio": has_client_cui / len(results) if results else 0
    }
 def generate_profile_code(
    store_name: str,
    cui: str,
    tva_analysis: Dict,
    total_analysis: Dict,
    date_analysis: Dict,
    payment_analysis: Dict,
    client_analysis: Dict
 ) -> str:
    """
    Generate Python profile class code.
    Args:
        store_name: Human-readable store name
        cui: CUI number (without RO prefix)
        *_analysis: Analysis results from pattern detection
    Returns:
        Python source code for the profile class
    """
    # Generate class name from store name
    class_name = "".join(
        word.capitalize()
        for word in re.sub(r'[^a-zA-Z0-9\s]', '', store_name).split()
    ) + "Profile"
    # Generate module name
    module_name = re.sub(r'[^a-z0-9]', '_', store_name.lower()).strip('_')
    # Determine profile characteristics
    is_non_vat = tva_analysis.get("is_non_vat", False)
    has_multi_rate = tva_analysis.get("has_multi_rate", False)
    has_client_cui = client_analysis.get("has_client_cui", False)
    uses_yyyy_mm_dd = date_analysis.get("dominant_format") == "YYYY.MM.DD"
    # Generate OCR name patterns
    name_words = store_name.upper().split()
    primary_word = name_words[0] if name_words else store_name.upper()
    name_patterns = [
        primary_word,
        store_name.upper().replace(".", "").replace(",", ""),
    ]
    # Add OCR error variants
    ocr_variants = {
        'O': '0', 'I': '1', 'L': '1', 'S': '5', 'B': '8', 'E': '3'
    }
    for char, replacement in ocr_variants.items():
        if char in primary_word:
            name_patterns.append(primary_word.replace(char, replacement, 1))
    name_patterns = list(dict.fromkeys(name_patterns))[:4]  # Unique, max 4
    # Build the code
    code_lines = [
        '"""',
        f'{store_name} store profile for OCR extraction.',
        '',
        'Auto-generated by generate_store_profile.py',
        f'Generated: {datetime.now().strftime("%Y-%m-%d %H:%M")}',
        '"""',
        '',
        'import re',
        'from decimal import Decimal, InvalidOperation',
        'from typing import List, Dict, Any',
        '',
        'from .base import BaseStoreProfile',
        'from . import ProfileRegistry',
        '',
        '',
        '@ProfileRegistry.register',
        f'class {class_name}(BaseStoreProfile):',
        '    """',
        f'    {store_name} - OCR extraction profile.',
        '    ',
    ]
    # Add characteristics to docstring
    characteristics = []
    if is_non_vat:
        characteristics.append("Non-VAT payer (neplatitor TVA)")
    if has_multi_rate:
        characteristics.append("Multi-rate TVA")
    if has_client_cui:
        characteristics.append("B2B receipts with client CUI")
    if uses_yyyy_mm_dd:
        characteristics.append("Date format: YYYY.MM.DD")
    if characteristics:
        code_lines.append('    Key characteristics:')
        for c in characteristics:
            code_lines.append(f'    - {c}')
        code_lines.append('    ')
    code_lines.extend([
        '    """',
        '',
        f'    CUI_LIST = ["{cui}"]',
        f'    NAME_PATTERNS = {name_patterns}',
        # repr() keeps the generated literal valid if the name contains quotes
        f'    STORE_NAME = {store_name!r}',
        '',
    ])
    # Add date patterns override for YYYY.MM.DD format
    if uses_yyyy_mm_dd:
        code_lines.extend([
            '    # Override date patterns for YYYY.MM.DD format',
            '    DATE_PATTERNS_OCR_SPACES = [',
            '        r\'(\\d{4})[.,]\\s*(\\d{2})[.,]\\s*(\\d{2})\',  # YYYY. MM. DD with spaces',
            '        r\'(\\d{4})[.,](\\d{2})[.,](\\d{2})\',  # YYYY.MM.DD',
            '    ]',
            '',
        ])
    # Add TVA extraction method for multi-rate or non-VAT
    if is_non_vat:
        code_lines.extend([
            '    def extract_tva_entries(self, text: str) -> List[dict]:',
            '        """Non-VAT payer - returns empty list."""',
            '        return []',
            '',
        ])
    elif has_multi_rate and tva_analysis.get("dominant_format") == "lidl_multi_rate":
        code_lines.extend([
            '    # Store-specific TVA patterns',
            '    TVA_PATTERNS = [',
            '        r\'T[VU][AR]\\s+([A-D])\\s+(\\d{1,2})[.,]?\\d{0,2}\\s*%\\s+([\\d.,]+)\',',
            '    ]',
            '',
            '    def extract_tva_entries(self, text: str) -> List[dict]:',
            '        """Extract multi-rate TVA entries."""',
            '        entries = []',
            '        seen = set()',
            '',
            '        for pattern in self.TVA_PATTERNS:',
            '            for match in re.finditer(pattern, text, re.IGNORECASE):',
            '                try:',
            '                    code = match.group(1).upper()',
            '                    percent = int(match.group(2))',
            '                    amount = self._parse_decimal(match.group(3))',
            '',
            '                    if amount and amount > 0:',
            '                        entry_key = (code, percent)',
            '                        if entry_key not in seen:',
            '                            entries.append({',
            '                                \'code\': code,',
            '                                \'percent\': percent,',
            '                                \'amount\': amount',
            '                            })',
            '                            seen.add(entry_key)',
            '                except (ValueError, InvalidOperation):',
            '                    continue',
            '',
            '        return entries',
            '',
        ])
    # Add validation hints method
    code_lines.extend([
        '    def get_validation_hints(self) -> Dict[str, Any]:',
        f'        """Return {store_name}-specific validation hints."""',
        '        return {',
        f'            "has_multi_rate_tva": {has_multi_rate},',
        f'            "card_equals_total": True,',
        f'            "has_client_cui": {has_client_cui},',
        f'            "has_efactura": False,',
        f'            "is_non_vat_payer": {is_non_vat},',
        '        }',
    ])
    return '\n'.join(code_lines) + '\n'
 def main():
    parser = argparse.ArgumentParser(
        description="Generate store profile from PDF receipts",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
 Examples:
    # Generate profile from a single PDF
    python scripts/generate_store_profile.py \\
        --name "Magazin Nou" --cui "12345678" \\
        --receipts "docs/data-entry/magazin_nou.pdf"
    # Generate profile from multiple PDFs (glob pattern)
    python scripts/generate_store_profile.py \\
        --name "Carrefour" --cui "2475489" \\
        --receipts "docs/data-entry/Carrefour*.pdf" \\
        --output backend/modules/data_entry/services/ocr/profiles/carrefour.py
    # Dry run (analyze only, don't write file)
    python scripts/generate_store_profile.py \\
        --name "Test Store" --cui "11111111" \\
        --receipts "docs/data-entry/test*.pdf" \\
        --dry-run
        """
    )
    parser.add_argument("--name", required=True, help="Store name (e.g., 'LIDL DISCOUNT S.R.L.')")
    parser.add_argument("--cui", required=True, help="CUI number without RO prefix")
    parser.add_argument("--receipts", required=True, help="PDF file path or glob pattern")
    parser.add_argument("--output", help="Output file path (default: auto-generated)")
    parser.add_argument("--dry-run", action="store_true", help="Analyze only, don't write file")
    parser.add_argument("--api-base", default=API_BASE, help=f"API base URL (default: {API_BASE})")
    args = parser.parse_args()
    # Update API base if provided
    api_base = args.api_base
    # Validate CUI format
    cui = args.cui.strip().upper().replace("RO", "").replace(" ", "")
    if not cui.isdigit() or len(cui) < 6 or len(cui) > 10:
        print(f"Error: Invalid CUI format: {args.cui}")
        sys.exit(1)
    # Find PDF files
    pdf_files = sorted(glob.glob(args.receipts))  # sorted for a deterministic processing order
    if not pdf_files:
        print(f"Error: No PDF files found matching: {args.receipts}")
        sys.exit(1)
    print(f"\n{'='*60}")
    print(f"Store Profile Generator")
    print(f"{'='*60}")
    print(f"Store: {args.name}")
    print(f"CUI: {cui}")
    print(f"PDFs: {len(pdf_files)} files")
    print(f"{'='*60}\n")
    # Generate JWT token
    token = create_jwt_token()
    # Submit PDFs to OCR
    print("Step 1: Submitting PDFs to OCR API...")
    results = []
    for pdf_path in pdf_files:
        result = submit_ocr(pdf_path, token, api_base=api_base)
        if result:
            results.append(result)
    if not results:
        print("\nError: No successful extractions. Check if backend is running.")
        sys.exit(1)
    print(f"\nSuccessfully extracted: {len(results)}/{len(pdf_files)} PDFs")
    # Analyze patterns
    print("\nStep 2: Analyzing patterns...")
    tva_analysis = analyze_tva_patterns(results)
    total_analysis = analyze_total_patterns(results)
    date_analysis = analyze_date_patterns(results)
    payment_analysis = analyze_payment_patterns(results)
    client_analysis = analyze_client_patterns(results)
    print(f"  TVA: {tva_analysis['dominant_format']} format, multi-rate={tva_analysis['has_multi_rate']}")
    print(f"  Date: {date_analysis['dominant_format']} format")
    print(f"  Payments: {list(payment_analysis['methods'].keys())}")
    print(f"  B2B: {client_analysis['has_client_cui']}")
    # Generate profile code
    print("\nStep 3: Generating profile code...")
    code = generate_profile_code(
        store_name=args.name,
        cui=cui,
        tva_analysis=tva_analysis,
        total_analysis=total_analysis,
        date_analysis=date_analysis,
        payment_analysis=payment_analysis,
        client_analysis=client_analysis
    )
    # Determine output path
    if args.output:
        output_path = args.output
    else:
        module_name = re.sub(r'[^a-z0-9]', '_', args.name.lower()).strip('_')
        output_path = f"backend/modules/data_entry/services/ocr/profiles/{module_name}.py"
    if args.dry_run:
        print(f"\n[DRY RUN] Would write to: {output_path}")
        print(f"\n{'='*60}")
        print("Generated code:")
        print(f"{'='*60}")
        print(code)
    else:
        # Write file
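        # os.makedirs('') raises, so only create a directory when the path has one
        output_dir = os.path.dirname(output_path)
        if output_dir:
            os.makedirs(output_dir, exist_ok=True)
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write(code)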
        print(f"  Written to: {output_path}")
        # Verify syntax
        import py_compile
        try:
            py_compile.compile(output_path, doraise=True)
            print(f"  Syntax check: OK")
        except py_compile.PyCompileError as e:
            print(f"  Syntax check: FAILED - {e}")
    print(f"\n{'='*60}")
    print("Profile generation complete!")
    print(f"{'='*60}")
    if not args.dry_run:
        print(f"\nNext steps:")
        print(f"1. Review the generated code: {output_path}")
        print(f"2. Customize patterns if needed")
        print(f"3. Hot-reload profiles: curl -X POST {api_base}/api/data-entry/ocr/profiles/reload")
        print(f"4. Test with a sample receipt")
 if __name__ == "__main__":
    main()