# Architecture: Data Entry App

## Overview

A separate application for entering data into the ERP, with an approval workflow and a staging area before synchronization to Oracle.

## Architectural Decisions

### 1. SQLModel + Alembic

**Choice**: SQLModel (Pydantic + SQLAlchemy) with Alembic for migrations

**Motivation**:
- Created by the author of FastAPI, so the two integrate cleanly
- One model serves as both the Pydantic schema and the SQLAlchemy table, so definitions are not duplicated
- Native async support
- Alembic is the industry standard for migrations
- Automatic validation: Pydantic validates input, SQLAlchemy manages the DB

**Alternatives considered**:
- Pure SQLAlchemy: more verbose, requires separate Pydantic schemas
- Tortoise ORM: natively async, but a smaller community
- Peewee: simple, but no async support

### 2. Separation from Reports-App

**Choice**: Separate application in `data-entry-app/`

**Motivation**:
- Different responsibilities: reports = read-only, data-entry = write
- Different lifecycle: data-entry can have more frequent releases
- Isolated risk: a bug in data-entry does not affect reporting
- Independent scaling

**Shared Components**:
- `shared/database/oracle_pool.py` - Oracle connection for reference data (nomenclatures)
- `shared/auth/` - common JWT authentication

### 3. Workflow with Staging Area

**Choice**: Local SQLite as staging, then sync to Oracle

**Motivation**:
- Allows offline work (users can fill in receipts)
- Accountant review before data reaches Oracle
- Simple rollback (delete from SQLite)
- Complete audit trail

**Flow**:
```
User Input → SQLite (staging) → Accountant Review → Oracle (final)
```

### 4. File Storage

**Choice**: Local filesystem with references in the DB

**Motivation**:
- Simple to implement and back up
- Good performance for images
- Can migrate to S3/Azure Blob if needed

**Structure**:
```
data/uploads/
  {year}/
    {month}/
      {uuid}.{ext}
```

## Component Diagram

```
┌─────────────────────────────────────────────────────────────────┐
│                         data-entry-app                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐     │
│  │   Frontend   │────▶│   Backend    │────▶│   SQLite     │     │
│  │   Vue.js     │     │   FastAPI    │     │  (staging)   │     │
│  │   :3010      │     │   :8003      │     │              │     │
│  └──────────────┘     └──────┬───────┘     └──────────────┘     │
│                              │                                  │
│         │ OCR Upload         │ Reference data                   │
│         ▼                    ▼                                  │
│  ┌──────────────┐     ┌──────────────┐                          │
│  │ OCR Service  │     │    Oracle    │                          │
│  │ PaddleOCR    │     │  (read-only) │                          │
│  │ +Tesseract   │     └──────────────┘                          │
│  └──────────────┘                                               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Data Model

### Receipt (Fiscal Receipt)

```
receipts
├── id (PK)
├── receipt_type: enum(bon_fiscal, chitanta)
├── direction: enum(cheltuiala, incasare)
├── receipt_number, receipt_series
├── receipt_date, amount, description
├── company_id, partner_id, partner_name
├── cash_register_id, cash_register_name
├── expense_type_code
├── status: enum(draft, pending_review, approved, rejected, synced)
├── created_by, created_at, updated_at
├── submitted_at, reviewed_by, reviewed_at
├── rejection_reason
└── oracle_synced_at, oracle_act_id, oracle_error
```

### ReceiptAttachment (Attachments)

```
receipt_attachments
├── id (PK)
├── receipt_id (FK)
├── filename, stored_filename
├── file_path, file_size, mime_type
└── uploaded_at
```

### AccountingEntry (Accounting Entries)

```
accounting_entries
├── id (PK)
├── receipt_id (FK)
├── entry_type: enum(debit, credit)
├── account_code, account_name
├── amount
├── partner_id, cost_center_id
├── is_auto_generated
└── modified_by, modified_at
```

## Workflow States

```
     ┌─────────┐
     │  DRAFT  │◀────────────────────┐
     └────┬────┘                     │
          │ submit()                 │ (edit after reject)
          ▼                          │
   ┌──────────────┐                  │
   │PENDING_REVIEW│──────────────────┤
   └──────┬───────┘                  │
          │                          │
    ┌─────┴─────┐                    │
    ▼           ▼                    │
┌────────┐  ┌────────┐               │
│APPROVED│  │REJECTED│───────────────┘
└────┬───┘  └────────┘  resubmit()
     │
     │ (Phase 2)
     ▼
  ┌──────┐
  │SYNCED│
  └──────┘
```

## Accounting Entry Generation

### Algorithm

```python
from decimal import Decimal, ROUND_HALF_UP

# EXPENSE_TYPES, Entry, DEBIT and CREDIT are defined elsewhere in the app.
def generate_entries(receipt):
    expense_type = EXPENSE_TYPES[receipt.expense_type_code]
    entries = []

    if expense_type.has_vat:
        # receipt.amount includes 19% VAT; round the net amount to 2 decimals
        net_amount = (receipt.amount / Decimal("1.19")).quantize(
            Decimal("0.01"), rounding=ROUND_HALF_UP
        )
        vat_amount = receipt.amount - net_amount

        # Expense (debit)
        entries.append(Entry(DEBIT, expense_type.account, net_amount))
        # VAT (debit)
        entries.append(Entry(DEBIT, "4426", vat_amount))
    else:
        entries.append(Entry(DEBIT, expense_type.account, receipt.amount))

    # Credit cash register / bank
    entries.append(Entry(CREDIT, receipt.cash_register_account, receipt.amount))

    return entries
```

### Example: 200 RON Fuel Receipt

```
Debit   6022  Fuel expenses      168.07
Debit   4426  Deductible VAT      31.93
Credit  5311  Cash in lei        200.00
```

## Oracle Integration (Phase 2)

### Stored Procedures

```sql
-- 1. Initialization
pack_contafin.init_scriere_act_rul_local()

-- 2. Insert lines
INSERT INTO ACT_TEMP (
    ID_ACT, DATAIREG, DATAACT, SCD, ASCD, SCC, ASCC,
    SUMA, ID_CTR, ID_PARTD, EXPLICATIA, ...
)

-- 3. Finalization
pack_contafin.finalizeaza_scriere_act_rul()
--   → SCRIE_IN_ACT()
--   → SCRIE_IN_RUL()
--   → Update reports (BV, BP, TVA)
```

## Security

### Authentication
- JWT tokens from shared auth
- Middleware validates the token and injects the user

### Authorization
- Permissions are checked in the services
- A user can edit only their own receipts in DRAFT
- Only an accountant can approve or reject

### File Upload
- MIME type validation (whitelist)
- Filename sanitization
- Size limit (10MB)
- Storage under a UUID filename (prevents path traversal)

## Configuration

### Environment Variables

```bash
# SQLite Database
SQLITE_DATABASE_PATH=data/receipts.db

# File Storage
UPLOAD_PATH=data/uploads
MAX_UPLOAD_SIZE=10485760  # 10MB

# Oracle (for reference data)
ORACLE_USER=CONTAFIN_ORACLE
ORACLE_PASSWORD=***
ORACLE_HOST=localhost
ORACLE_PORT=1526
ORACLE_SID=ROA

# JWT (shared)
JWT_SECRET_KEY=***
JWT_ALGORITHM=HS256
```

## OCR Processing Pipeline

### 5. OCR Architecture

**Choice**: PaddleOCR (primary) + Tesseract (fallback), with 100% local processing

**Motivation**:
- No external costs (no cloud APIs)
- On-premise processing (sensitive data stays local)
- PaddleOCR: high accuracy, CPU-friendly
- Tesseract: excellent support for Romanian diacritics

**OCR Stack**:
```
┌─────────────────────────────────────────────────────┐
│                    OCR Pipeline                     │
├─────────────────────────────────────────────────────┤
│                                                     │
│  Image Upload → ImagePreprocessor → OCREngine       │
│       │                 │               │           │
│       │                 ▼               ▼           │
│       │            ┌─────────┐   ┌──────────────┐   │
│       │            │ OpenCV  │   │  PaddleOCR   │   │
│       │            │ Pipeline│   │  (primary)   │   │
│       │            └─────────┘   └──────┬───────┘   │
│       │                 │               │           │
│       │                 │       fallback│           │
│       │                 │               ▼           │
│       │                 │        ┌──────────────┐   │
│       │                 │        │  Tesseract   │   │
│       │                 │        │  (ron+eng)   │   │
│       │                 │        └──────────────┘   │
│       │                 │               │           │
│       ▼                 ▼               ▼           │
│  ┌──────────────────────────────────────────────┐   │
│  │           ReceiptExtractor (Regex)           │   │
│  │  - Amount patterns (TOTAL, DE PLATA)         │   │
│  │  - Date patterns (DD.MM.YYYY)                │   │
│  │  - CUI patterns (C.U.I., C.I.F.)             │   │
│  │  - Vendor extraction (first lines)           │   │
│  └──────────────────────────────────────────────┘   │
│                          │                          │
│                          ▼                          │
│            ExtractionResult + Confidence            │
│                                                     │
└─────────────────────────────────────────────────────┘
```

### Image Preprocessing Pipeline

```python
import cv2

def preprocess(image):
    # Parameter values are illustrative; deskew() is a helper defined elsewhere.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)        # 1. Convert to grayscale
    width = gray.shape[1]
    if width < 1000:                                       # 2. Upscale small images for better OCR
        scale = 1000 / width
        gray = cv2.resize(gray, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
    gray = deskew(gray)                                    # 3. Deskew using Hough lines
    gray = cv2.fastNlMeansDenoising(gray, None, h=10)      # 4. Non-local means denoising
    binary = cv2.adaptiveThreshold(                        # 5. Adaptive thresholding (binarization)
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 10)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
    return cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # 6. Morphological close
```

### Extraction Patterns (Romanian Receipts)

| Pattern Type | Regex Examples | Confidence |
|--------------|----------------|------------|
| Amount | `TOTAL\s*:?\s*([\d.,]+)` | 0.95 |
| Date | `(\d{2}[./]\d{2}[./]\d{4})` | 0.90 |
| CUI | `C\.?U\.?I\.?\s*:?\s*(\d{6,10})` | 0.95 |
| Receipt Number | `NR\.?\s*BON\s*:?\s*(\d+)` | 0.95 |
| Vendor | First 5 non-keyword lines | 0.70 |

### OCR API Endpoints

```
GET  /api/ocr/status                   # Check OCR availability
POST /api/ocr/extract                  # Extract from uploaded image
POST /api/ocr/extract-attachment/{id}  # Re-process existing attachment
```

### System Dependencies

```bash
# Ubuntu/Debian
apt-get install -y \
    tesseract-ocr tesseract-ocr-ron tesseract-ocr-eng \
    poppler-utils libgl1-mesa-glx libglib2.0-0
```

## Testing Strategy

### Unit Tests
- CRUD operations
- Workflow transitions
- Entry generation logic
- OCR extraction patterns

### Integration Tests
- API endpoints
- File upload/download
- Oracle reference data fetch
- OCR endpoint with sample receipts

### E2E Tests
- Complete workflow: create → submit → approve
- File upload with preview
- OCR extraction → form auto-fill
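
The entry generation logic listed under Unit Tests can be exercised against the 200 RON fuel receipt example. The following is a minimal, self-contained sketch rather than the application's actual code: the `ExpenseType`, `Entry` and `Receipt` dataclasses, the `"FUEL"` expense code and the `is_balanced` helper are assumptions introduced for illustration, and rounding to two decimals with `ROUND_HALF_UP` is assumed because the example shows 168.07.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


@dataclass(frozen=True)
class ExpenseType:
    account: str
    has_vat: bool


@dataclass(frozen=True)
class Entry:
    entry_type: str  # "debit" or "credit"
    account_code: str
    amount: Decimal


@dataclass(frozen=True)
class Receipt:
    amount: Decimal
    expense_type_code: str
    cash_register_account: str


# Hypothetical lookup table; real expense types would come from reference data.
EXPENSE_TYPES = {"FUEL": ExpenseType(account="6022", has_vat=True)}


def generate_entries(receipt: Receipt) -> list[Entry]:
    """Split a VAT-inclusive amount into expense, VAT and cash entries."""
    expense_type = EXPENSE_TYPES[receipt.expense_type_code]
    entries = []
    if expense_type.has_vat:
        # Net amount rounded to cents; VAT takes the remainder so totals match.
        net = (receipt.amount / Decimal("1.19")).quantize(CENT, rounding=ROUND_HALF_UP)
        vat = receipt.amount - net
        entries.append(Entry("debit", expense_type.account, net))
        entries.append(Entry("debit", "4426", vat))
    else:
        entries.append(Entry("debit", expense_type.account, receipt.amount))
    entries.append(Entry("credit", receipt.cash_register_account, receipt.amount))
    return entries


def is_balanced(entries) -> bool:
    """Total debits must equal total credits."""
    debit = sum((e.amount for e in entries if e.entry_type == "debit"), Decimal("0"))
    credit = sum((e.amount for e in entries if e.entry_type == "credit"), Decimal("0"))
    return debit == credit


entries = generate_entries(Receipt(Decimal("200.00"), "FUEL", "5311"))
for entry in entries:
    print(entry.entry_type, entry.account_code, entry.amount)
```

Computing VAT as the remainder, instead of rounding it separately, keeps debits and credits equal even when the net amount is rounded.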
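
For the workflow transition tests, the state diagram can be encoded as a transition table. This is a sketch under assumptions: `ReceiptStatus`, `ALLOWED_TRANSITIONS`, `InvalidTransition` and `transition` are hypothetical names, and the edge from PENDING_REVIEW back to DRAFT is one reading of the diagram that should be confirmed against the real workflow.

```python
from enum import Enum


class ReceiptStatus(str, Enum):
    DRAFT = "draft"
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"
    REJECTED = "rejected"
    SYNCED = "synced"


# Allowed target states for each current state, read off the diagram.
ALLOWED_TRANSITIONS = {
    ReceiptStatus.DRAFT: {ReceiptStatus.PENDING_REVIEW},  # submit()
    ReceiptStatus.PENDING_REVIEW: {
        ReceiptStatus.APPROVED,
        ReceiptStatus.REJECTED,
        ReceiptStatus.DRAFT,  # assumed: back to draft before review
    },
    ReceiptStatus.REJECTED: {ReceiptStatus.DRAFT},  # edit after reject
    ReceiptStatus.APPROVED: {ReceiptStatus.SYNCED},  # Phase 2 sync
    ReceiptStatus.SYNCED: set(),  # terminal state
}


class InvalidTransition(ValueError):
    """Raised when a status change is not allowed by the workflow."""


def transition(current: ReceiptStatus, target: ReceiptStatus) -> ReceiptStatus:
    """Return the new status, or raise if the workflow forbids the change."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target


status = transition(ReceiptStatus.DRAFT, ReceiptStatus.PENDING_REVIEW)
status = transition(status, ReceiptStatus.APPROVED)
print(status.value)
```

Keeping the table in one place lets a unit test enumerate every pair of states and assert which ones raise.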
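
The OCR extraction patterns can be unit-tested without any OCR engine by running the regexes from the extraction patterns table over plain text. The sketch below uses only the standard library. The `PATTERNS` dictionary, `extract_fields`, `parse_amount` and the sample receipt text are illustrative assumptions, and `parse_amount` assumes Romanian number formatting (comma as decimal separator).

```python
import re
from datetime import datetime
from decimal import Decimal

# (compiled regex, confidence) pairs copied from the extraction patterns table.
PATTERNS = {
    "amount": (re.compile(r"TOTAL\s*:?\s*([\d.,]+)"), 0.95),
    "date": (re.compile(r"(\d{2}[./]\d{2}[./]\d{4})"), 0.90),
    "cui": (re.compile(r"C\.?U\.?I\.?\s*:?\s*(\d{6,10})"), 0.95),
    "receipt_number": (re.compile(r"NR\.?\s*BON\s*:?\s*(\d+)"), 0.95),
}


def extract_fields(text: str) -> dict:
    """Return {field: (raw_value, confidence)} for every pattern that matches."""
    results = {}
    for name, (pattern, confidence) in PATTERNS.items():
        match = pattern.search(text)
        if match:
            results[name] = (match.group(1), confidence)
    return results


def parse_amount(raw: str) -> Decimal:
    """Convert '1.234,56' or '200,00' to Decimal (assumes comma = decimal separator)."""
    raw = raw.strip(".,")
    if "," in raw:
        raw = raw.replace(".", "").replace(",", ".")
    return Decimal(raw)


def parse_date(raw: str):
    """Parse DD.MM.YYYY or DD/MM/YYYY into a date."""
    return datetime.strptime(raw.replace("/", "."), "%d.%m.%Y").date()


# Made-up receipt text standing in for OCR output.
SAMPLE = """S.C. EXEMPLU S.R.L.
C.U.I.: 12345678
NR. BON: 0042
DATA: 15.03.2024
TOTAL: 200,00
"""

fields = extract_fields(SAMPLE)
amount = parse_amount(fields["amount"][0])
receipt_date = parse_date(fields["date"][0])
print(fields)
```

Separating raw matching from value parsing lets tests cover noisy OCR strings and number formats independently.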