roa2web-service-auto/docs/data-entry/ARCHITECTURE.md
Marius Mutu 9008876b16 chore: Remove obsolete microservices directories and update all references
- Delete data-entry-app/ (1.6GB), reports-app/ (447MB), .auto-build-data/
- Saved ~1.4GB disk space (64% reduction: 2.2GB → 845MB)

Updated references across 38 files:
- .claude/rules/ paths: backend/modules/, src/modules/
- .claude/commands/validate.md: all validation paths
- docs/ (13 files): data-entry, telegram, README, CLAUDE.md
- scripts/ (3 files): backup-secrets, restore-secrets, test-docker
- security/ (2 files): git_cleanup, SECURITY_PROCEDURES
- deployment/ & shared/: updated all stale comments

All paths now reflect ultrathin monolith architecture:
- Backend: backend/modules/{reports,data_entry,telegram}/
- Frontend: src/modules/{reports,data-entry}/
- Shared: shared/{auth,database,routes}/

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30 12:08:20 +02:00


Architecture: Data Entry App

Overview

A separate application for entering data into the ERP, with an approval workflow and a staging area that holds data before it is synchronized to Oracle.

Architectural Decisions

1. SQLModel + Alembic

Choice: SQLModel (Pydantic + SQLAlchemy) with Alembic for migrations

Rationale:

  • Created by the author of FastAPI, so the two integrate cleanly
  • One model serves as both the Pydantic schema and the SQLAlchemy table, so definitions are not duplicated
  • Native async support (through SQLAlchemy's async engine)
  • Alembic is the industry standard for migrations
  • Automatic validation: Pydantic validates input, SQLAlchemy handles the database

Alternatives considered:

  • Plain SQLAlchemy: more verbose, needs separate Pydantic schemas
  • Tortoise ORM: natively async, but a smaller community
  • Peewee: simple, but no async support

2. Separation from Reports-App

Choice: A separate application in backend/modules/data_entry/

Rationale:

  • Different responsibilities: reports is read-only, data-entry writes
  • Different lifecycle: data-entry can be released more often
  • Isolated risk: a bug in data-entry does not affect reporting
  • Independent scaling

Shared Components:

  • shared/database/oracle_pool.py: Oracle connection for nomenclatures (reference data) and authentication
  • shared/auth/: common JWT authentication (middleware, routes factory, auth service)
  • shared/frontend/components/LoginView.vue: shared login UI
  • shared/frontend/stores/auth.js: Pinia auth store factory
  • shared/frontend/styles/login.css: login styles

3. Workflow with a Staging Area

Choice: Local SQLite as staging, then sync to Oracle

Rationale:

  • Allows offline work (users can fill in receipts)
  • Accountant review before data reaches Oracle
  • Simple rollback (delete from SQLite)
  • Complete audit trail

Flow:

User Input → SQLite (staging) → Accountant Review → Oracle (final)
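
The flow above can be sketched with the standard-library sqlite3 module. This is a minimal illustration, not the project's implementation: the reduced `receipts` schema, the helper names, and the `push_to_oracle` callback are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timezone


def open_staging(path=":memory:"):
    """Open the staging database and create a reduced receipts table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS receipts (
               id INTEGER PRIMARY KEY,
               amount TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'draft',
               oracle_synced_at TEXT
           )"""
    )
    return conn


def stage_receipt(conn, amount):
    """User input lands in SQLite first, as a draft."""
    cur = conn.execute("INSERT INTO receipts (amount) VALUES (?)", (amount,))
    conn.commit()
    return cur.lastrowid


def set_status(conn, receipt_id, status):
    """Record the outcome of a workflow step (submit, approve, reject)."""
    conn.execute("UPDATE receipts SET status = ? WHERE id = ?", (status, receipt_id))
    conn.commit()


def sync_approved(conn, push_to_oracle):
    """Send approved rows through the callback and mark them as synced."""
    rows = conn.execute(
        "SELECT id, amount FROM receipts WHERE status = 'approved'"
    ).fetchall()
    for receipt_id, amount in rows:
        push_to_oracle(receipt_id, amount)  # hypothetical Oracle writer
        conn.execute(
            "UPDATE receipts SET status = 'synced', oracle_synced_at = ? WHERE id = ?",
            (datetime.now(timezone.utc).isoformat(), receipt_id),
        )
    conn.commit()
    return len(rows)
```

Because nothing reaches Oracle until `sync_approved` runs, rolling back a draft in this sketch is a plain `DELETE` against the staging table.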

4. File Storage

Choice: Local filesystem with references in the database

Rationale:

  • Simple to implement and back up
  • Good performance for images
  • Can be migrated to S3/Azure Blob if needed

Structure:

data/uploads/
  {year}/
    {month}/
      {uuid}.{ext}
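
A path following this layout could be built as in the sketch below. The function name, the zero-padded month, and the `bin` fallback extension are assumptions; the source only specifies the `{year}/{month}/{uuid}.{ext}` shape.

```python
import uuid
from datetime import date
from pathlib import PurePosixPath


def build_upload_path(original_name, today=None, root="data/uploads"):
    """Return root/{year}/{month}/{uuid}.{ext} for a new upload."""
    today = today or date.today()
    # Only the extension of the client-supplied name is kept; the stem
    # is replaced by a random UUID.
    ext = PurePosixPath(original_name).suffix.lstrip(".").lower() or "bin"
    stored_filename = f"{uuid.uuid4().hex}.{ext}"
    # Zero-padding the month is an assumption made for sortable directories.
    return PurePosixPath(root) / str(today.year) / f"{today.month:02d}" / stored_filename
```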

Component Diagram

┌─────────────────────────────────────────────────────────────────┐
│                        data-entry-app                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐   │
│  │   Frontend   │────▶│   Backend    │────▶│   SQLite     │   │
│  │   Vue.js     │     │   FastAPI    │     │   (staging)  │   │
│  │   :3010      │     │   :8003      │     │              │   │
│  └──────────────┘     └──────┬───────┘     └──────────────┘   │
│        │                     │                                  │
│        │ OCR Upload          │ Nomenclatures                    │
│        ▼                     ▼                                  │
│  ┌──────────────┐     ┌──────────────┐                         │
│  │  OCR Service │     │   Oracle     │                         │
│  │  PaddleOCR   │     │ (read-only)  │                         │
│  │  +Tesseract  │     └──────────────┘                         │
│  └──────────────┘                                               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Data Model

Receipt (Fiscal Receipt)

receipts
├── id (PK)
├── receipt_type: enum(bon_fiscal, chitanta)
├── direction: enum(cheltuiala, incasare)
├── receipt_number, receipt_series
├── receipt_date, amount, description
├── company_id, partner_id, partner_name
├── cash_register_id, cash_register_name
├── expense_type_code
├── status: enum(draft, pending_review, approved, rejected, synced)
├── created_by, created_at, updated_at
├── submitted_at, reviewed_by, reviewed_at
├── rejection_reason
└── oracle_synced_at, oracle_act_id, oracle_error

ReceiptAttachment (Attachments)

receipt_attachments
├── id (PK)
├── receipt_id (FK)
├── filename, stored_filename
├── file_path, file_size, mime_type
└── uploaded_at

AccountingEntry (Accounting Entries)

accounting_entries
├── id (PK)
├── receipt_id (FK)
├── entry_type: enum(debit, credit)
├── account_code, account_name
├── amount
├── partner_id, cost_center_id
├── is_auto_generated
└── modified_by, modified_at

Workflow States

     ┌─────────┐
     │  DRAFT  │◀────────────────────┐
     └────┬────┘                     │
          │ submit()                 │ (edit after reject)
          ▼                          │
   ┌──────────────┐                  │
   │PENDING_REVIEW│──────────────────┤
   └──────┬───────┘                  │
          │                          │
    ┌─────┴─────┐                    │
    ▼           ▼                    │
┌────────┐  ┌────────┐              │
│APPROVED│  │REJECTED│──────────────┘
└────┬───┘  └────────┘    resubmit()
     │
     │ (Phase 2)
     ▼
  ┌──────┐
  │SYNCED│
  └──────┘
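
The diagram can be expressed as a transition table. The sketch below is one reading of it, not the project's code: it assumes a rejected receipt returns to DRAFT for editing and is then submitted again from there, and the names `ReceiptStatus`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical.

```python
from enum import Enum


class ReceiptStatus(str, Enum):
    DRAFT = "draft"
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"
    REJECTED = "rejected"
    SYNCED = "synced"


# Allowed moves, read off the diagram. REJECTED -> DRAFT models
# "edit after reject"; resubmission then happens from DRAFT (assumption).
ALLOWED_TRANSITIONS = {
    ReceiptStatus.DRAFT: {ReceiptStatus.PENDING_REVIEW},
    ReceiptStatus.PENDING_REVIEW: {ReceiptStatus.APPROVED, ReceiptStatus.REJECTED},
    ReceiptStatus.REJECTED: {ReceiptStatus.DRAFT},
    ReceiptStatus.APPROVED: {ReceiptStatus.SYNCED},  # Phase 2
    ReceiptStatus.SYNCED: set(),
}


def transition(current, target):
    """Return the new status, or raise ValueError for a disallowed move."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Keeping the rules in one table means the service layer and the tests can share a single source of truth for workflow transitions.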

Accounting Entry Generation

Algorithm

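from decimal import Decimal, ROUND_HALF_UP


def generate_entries(receipt):
    # EXPENSE_TYPES, Entry, DEBIT and CREDIT are assumed to be defined elsewhere.
    expense_type = EXPENSE_TYPES[receipt.expense_type_code]
    entries = []

    if expense_type.has_vat:
        # receipt.amount includes 19% VAT. The net part is rounded to 0.01
        # (rounding mode is an assumption) so that net + VAT equals the total.
        net_amount = (receipt.amount / Decimal('1.19')).quantize(
            Decimal('0.01'), rounding=ROUND_HALF_UP)
        vat_amount = receipt.amount - net_amount

        # Expense (debit)
        entries.append(Entry(DEBIT, expense_type.account, net_amount))
        # Deductible VAT (debit)
        entries.append(Entry(DEBIT, "4426", vat_amount))
    else:
        entries.append(Entry(DEBIT, expense_type.account, receipt.amount))

    # Credit the cash register / bank account
    entries.append(Entry(CREDIT, receipt.cash_register_account, receipt.amount))

    return entries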

Example: Fuel Receipt, 200 RON

Debit  6022  Fuel expenses           168.07
Debit  4426  Deductible VAT           31.93
Credit 5311  Cash in lei             200.00
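
The figures in this example can be reproduced with a short self-contained script. It is a sketch under stated assumptions: the `Entry` dataclass, the helper names, and the half-up rounding mode are illustrative, and the 19% rate is the one implied by the 1.19 divisor in the algorithm.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

VAT_RATE = Decimal("0.19")  # implied by the 1.19 divisor in the algorithm
CENT = Decimal("0.01")


@dataclass(frozen=True)
class Entry:
    entry_type: str  # "debit" or "credit"
    account_code: str
    amount: Decimal


def split_vat(gross):
    """Split a VAT-inclusive amount into (net, vat), rounded to 0.01."""
    net = (gross / (1 + VAT_RATE)).quantize(CENT, rounding=ROUND_HALF_UP)
    # VAT is the remainder, so net + vat always equals the gross amount.
    return net, gross - net


def fuel_receipt_entries(gross, expense_account="6022", cash_account="5311"):
    """Build the three entries for a VAT-bearing cash expense."""
    net, vat = split_vat(gross)
    return [
        Entry("debit", expense_account, net),
        Entry("debit", "4426", vat),
        Entry("credit", cash_account, gross),
    ]


entries = fuel_receipt_entries(Decimal("200.00"))
debits = sum(e.amount for e in entries if e.entry_type == "debit")
credits = sum(e.amount for e in entries if e.entry_type == "credit")
```

Computing the VAT as the remainder, rather than rounding it independently, keeps total debits equal to total credits for every amount.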

Oracle Integration (Phase 2)

Stored Procedures

-- 1. Initialization
pack_contafin.init_scriere_act_rul_local()

-- 2. Insert lines
INSERT INTO ACT_TEMP (
    ID_ACT, DATAIREG, DATAACT, SCD, ASCD, SCC, ASCC,
    SUMA, ID_CTR, ID_PARTD, EXPLICATIA, ...
)

-- 3. Finalization
pack_contafin.finalizeaza_scriere_act_rul()
--   SCRIE_IN_ACT()
--   SCRIE_IN_RUL()
--   Update reports (BV, BP, VAT)

Security

Authentication

  • JWT tokens from the shared auth module
  • Middleware validates the token and injects the user

Authorization

  • Permissions are checked in the service layer
  • A user can edit only their own receipts, and only while in DRAFT
  • Only an accountant can approve or reject
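
These rules could be written as two small predicates in the service layer. The sketch below is illustrative only: the `User` and `Receipt` dataclasses, the `accountant` role name, and the restriction of reviews to pending receipts are assumptions, not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass
class User:
    username: str
    role: str  # hypothetical role names: "user" or "accountant"


@dataclass
class Receipt:
    created_by: str
    status: str


def can_edit(user, receipt):
    """Owners may edit their own receipts, and only while in draft."""
    return receipt.created_by == user.username and receipt.status == "draft"


def can_review(user, receipt):
    """Only accountants may approve or reject, and only pending receipts."""
    return user.role == "accountant" and receipt.status == "pending_review"
```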

File Uploads

  • MIME type validation (whitelist)
  • Filename sanitization
  • Size limit (10MB)
  • Stored under a UUID name (prevents path traversal)
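
The four checks above could be combined as in the following sketch. The whitelist contents, the function names, and the sanitization rule are assumptions chosen for illustration; only the 10MB limit and the UUID naming come from the list.

```python
import re
import uuid
from pathlib import PurePosixPath

MAX_UPLOAD_SIZE = 10 * 1024 * 1024  # 10MB, matching the limit above
ALLOWED_MIME_TYPES = {"image/jpeg", "image/png", "application/pdf"}  # assumed whitelist


def sanitize_filename(name):
    """Drop directory parts and replace anything outside [A-Za-z0-9._-]."""
    base = PurePosixPath(name.replace("\\", "/")).name
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".")
    return cleaned or "upload"


def validate_upload(filename, mime_type, size):
    """Return (display_name, stored_filename) or raise ValueError."""
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"MIME type not allowed: {mime_type}")
    if size > MAX_UPLOAD_SIZE:
        raise ValueError("file exceeds the 10MB limit")
    display_name = sanitize_filename(filename)
    ext = PurePosixPath(display_name).suffix.lower()
    # The on-disk name is a fresh UUID; only the sanitized extension of the
    # user-supplied name is carried over.
    return display_name, f"{uuid.uuid4().hex}{ext}"
```

The sanitized name is kept only for display; the UUID-based name is what would be written to disk.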

Configuration

Environment Variables

# SQLite Database
SQLITE_DATABASE_PATH=data/receipts.db

# File Storage
UPLOAD_PATH=data/uploads
MAX_UPLOAD_SIZE=10485760  # 10MB

# Oracle (for nomenclatures)
ORACLE_USER=CONTAFIN_ORACLE
ORACLE_PASSWORD=***
ORACLE_HOST=localhost
ORACLE_PORT=1526
ORACLE_SID=ROA

# JWT (shared)
JWT_SECRET_KEY=***
JWT_ALGORITHM=HS256
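
One way to read these variables at startup is a small typed settings object. This is a hedged sketch: the `Settings` class and `load_settings` function are hypothetical, and using the sample values above as defaults is an assumption. The JWT secret is deliberately required, so a missing value fails fast.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    sqlite_database_path: str
    upload_path: str
    max_upload_size: int
    oracle_host: str
    oracle_port: int
    oracle_sid: str
    jwt_secret_key: str
    jwt_algorithm: str


def load_settings(env=None):
    """Build Settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return Settings(
        sqlite_database_path=env.get("SQLITE_DATABASE_PATH", "data/receipts.db"),
        upload_path=env.get("UPLOAD_PATH", "data/uploads"),
        max_upload_size=int(env.get("MAX_UPLOAD_SIZE", "10485760")),
        oracle_host=env.get("ORACLE_HOST", "localhost"),
        oracle_port=int(env.get("ORACLE_PORT", "1526")),
        oracle_sid=env.get("ORACLE_SID", "ROA"),
        jwt_secret_key=env["JWT_SECRET_KEY"],  # no default: raises KeyError if unset
        jwt_algorithm=env.get("JWT_ALGORITHM", "HS256"),
    )
```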

OCR Processing Pipeline

5. OCR Architecture

Choice: PaddleOCR (primary) + Tesseract (fallback), 100% local processing

Rationale:

  • No external costs (no cloud APIs)
  • On-premise processing (sensitive data stays local)
  • PaddleOCR: high accuracy, CPU-friendly
  • Tesseract: excellent support for Romanian diacritics

OCR Stack:

┌─────────────────────────────────────────────────────┐
│                   OCR Pipeline                       │
├─────────────────────────────────────────────────────┤
│                                                      │
│  Image Upload → ImagePreprocessor → OCREngine        │
│       │              │                  │            │
│       │              ▼                  ▼            │
│       │         ┌─────────┐      ┌──────────────┐   │
│       │         │ OpenCV  │      │ PaddleOCR    │   │
│       │         │ Pipeline│      │ (primary)    │   │
│       │         └─────────┘      └──────┬───────┘   │
│       │              │                  │            │
│       │              │           fallback│            │
│       │              │                  ▼            │
│       │              │           ┌──────────────┐   │
│       │              │           │ Tesseract    │   │
│       │              │           │ (ron+eng)    │   │
│       │              │           └──────────────┘   │
│       │              │                  │            │
│       ▼              ▼                  ▼            │
│  ┌──────────────────────────────────────────────┐   │
│  │           ReceiptExtractor (Regex)           │   │
│  │  - Amount patterns (TOTAL, DE PLATA)         │   │
│  │  - Date patterns (DD.MM.YYYY)                │   │
│  │  - CUI patterns (C.U.I., C.I.F.)             │   │
│  │  - Vendor extraction (first lines)           │   │
│  └──────────────────────────────────────────────┘   │
│                        │                             │
│                        ▼                             │
│              ExtractionResult + Confidence           │
│                                                      │
└─────────────────────────────────────────────────────┘
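
The primary/fallback step in the diagram can be sketched independently of either OCR library. The source does not say when the fallback triggers, so the rule below (primary engine raises, returns empty text, or scores under a threshold) is an assumption, as are the `OCRResult` type, the `run_with_fallback` name, and the 0.6 threshold. Engines are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OCRResult:
    text: str
    confidence: float  # 0.0 to 1.0
    engine: str


Engine = Callable[[bytes], OCRResult]


def run_with_fallback(image: bytes, primary: Engine, fallback: Engine,
                      min_confidence: float = 0.6) -> OCRResult:
    """Try the primary engine; use the fallback if it fails or scores low."""
    try:
        result = primary(image)
    except Exception:
        result = None  # treat an engine crash like an unusable result
    if result is not None and result.text.strip() and result.confidence >= min_confidence:
        return result
    return fallback(image)
```

Passing the engines as callables keeps the decision logic testable with stubs, without loading PaddleOCR or Tesseract.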

Image Preprocessing Pipeline

def preprocess(image):
    # Pipeline steps, in order (pseudocode):
    # 1. Convert to grayscale
    # 2. Resize if width < 1000px (upscale for better OCR)
    # 3. Deskew using Hough lines (straighten rotated text)
    # 4. Denoise (Non-local means denoising)
    # 5. Adaptive thresholding (binarization)
    # 6. Morphological close (connect broken characters)
    return processed_image

Extraction Patterns (Romanian Receipts)

Pattern Type     Regex Example                      Confidence
Amount           TOTAL\s*:?\s*([\d.,]+)             0.95
Date             (\d{2}[./]\d{2}[./]\d{4})          0.90
CUI              C\.?U\.?I\.?\s*:?\s*(\d{6,10})     0.95
Receipt Number   NR\.?\s*BON\s*:?\s*(\d+)           0.95
Vendor           First 5 non-keyword lines          0.70
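
The regex rows of the table can be exercised directly with Python's `re` module. In this sketch the patterns and confidence values are copied from the table, while the `extract_fields` helper, the case-insensitive matching, and the sample receipt text are invented for illustration. Vendor extraction is omitted because it is not regex-based.

```python
import re

# (field, compiled pattern, confidence), taken from the table above.
PATTERNS = [
    ("amount", re.compile(r"TOTAL\s*:?\s*([\d.,]+)", re.IGNORECASE), 0.95),
    ("date", re.compile(r"(\d{2}[./]\d{2}[./]\d{4})"), 0.90),
    ("cui", re.compile(r"C\.?U\.?I\.?\s*:?\s*(\d{6,10})", re.IGNORECASE), 0.95),
    ("receipt_number", re.compile(r"NR\.?\s*BON\s*:?\s*(\d+)", re.IGNORECASE), 0.95),
]


def extract_fields(text):
    """Return {field: (value, confidence)} for the first match of each pattern."""
    found = {}
    for field, pattern, confidence in PATTERNS:
        match = pattern.search(text)
        if match:
            found[field] = (match.group(1), confidence)
    return found


# Invented sample text, shaped like OCR output from a Romanian receipt.
sample = """SC EXEMPLU SRL
C.U.I.: 12345678
NR. BON: 0042
30.12.2025
TOTAL: 200,00
"""
fields = extract_fields(sample)
```

Values come back as raw strings (for example a comma decimal separator), so a real extractor would still need to normalize amounts and parse dates before filling the form.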

OCR API Endpoints

GET  /api/ocr/status                      # Check OCR availability
POST /api/ocr/extract                     # Extract from uploaded image
POST /api/ocr/extract-attachment/{id}     # Re-process existing attachment

System Dependencies

# Ubuntu/Debian
apt-get install -y \
    tesseract-ocr tesseract-ocr-ron tesseract-ocr-eng \
    poppler-utils libgl1-mesa-glx libglib2.0-0

Testing Strategy

Unit Tests

  • CRUD operations
  • Workflow transitions
  • Entry generation logic
  • OCR extraction patterns

Integration Tests

  • API endpoints
  • File upload/download
  • Oracle nomenclature fetch
  • OCR endpoint with sample receipts

E2E Tests

  • Complete workflow: create → submit → approve
  • File upload with preview
  • OCR extraction → form auto-fill