# Data Entry App - Fiscal Receipts

Application for entering fiscal receipts, with an approval workflow and automatic data extraction via OCR.

## Quick Start

### Prerequisites

- Python 3.10+
- Node.js 18+
- (Optional) SSH tunnel for Oracle nomenclatures

### Using Start Script (Recommended)

```bash
# Start all services
./start-data-entry.sh

# Or individual commands:
./start-data-entry.sh start            # Start all
./start-data-entry.sh stop             # Stop all
./start-data-entry.sh status           # Check status
./start-data-entry.sh restart backend  # Restart backend only
```

**Services:**
- Backend: http://localhost:8003
- Frontend: http://localhost:3010
- API Docs: http://localhost:8003/docs

### Manual Setup

#### Backend Setup

```bash
cd backend/modules/data_entry/backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# Create .env file
cp .env.example .env
# Edit .env with your settings

# Run migrations
alembic upgrade head

# Start server
uvicorn app.main:app --reload --port 8003
```

#### Frontend Setup

```bash
cd backend/modules/data_entry/frontend

# Install dependencies
npm install

# Start dev server
npm run dev -- --port 3010
```

## Features

### For Users

- **Automatic OCR** - Automatic data extraction from the receipt photo (amount, date, supplier, CUI)
- Upload fiscal receipt photos
- Fill in receipt data (amount, date, supplier)
- Select expense type
- Submit for approval

### For Accountants

- View pending receipts
- Edit proposed accounting entries
- Approve/Reject receipts
- Bulk approval

## OCR Feature

### How it works

1. **Upload image** - Drag and drop or select the receipt photo
2. **OCR processing** - Click "Proceseaza cu OCR" ("Process with OCR")
3. **Preview** - The extracted data is shown with confidence indicators
4. **Apply** - Click "Aplica datele in formular" ("Apply data to form") to auto-fill the form

### Automatically extracted fields

| Field | Estimated accuracy |
|-------|--------------------|
| Amount (TOTAL) | 90-95% |
| Date | 85-90% |
| Receipt number | 80-85% |
| Supplier | 70-80% |
| CUI | 85-90% |
| Document type | 95%+ |

### OCR System Dependencies (Linux/Docker)

OCR requires the following system packages:

```bash
# Ubuntu/Debian
apt-get install -y \
    tesseract-ocr \
    tesseract-ocr-ron \
    tesseract-ocr-eng \
    poppler-utils \
    libgl1-mesa-glx \
    libglib2.0-0

# Fedora/RHEL
dnf install -y \
    tesseract \
    tesseract-langpack-ron \
    tesseract-langpack-eng \
    poppler-utils
```

**Note:** PaddleOCR (the primary engine) is installed automatically via pip. Tesseract is used as a fallback.

### OCR System Dependencies (Windows)

On Windows Server, the following components must be installed manually:

#### 1. Poppler (for PDF → image conversion)

```powershell
# Download Poppler for Windows
# https://github.com/osborn/poppler-windows/releases
# or https://github.com/bblanchon/pdfium-binaries

# Extract to C:\Program Files\poppler\
# Add to PATH: C:\Program Files\poppler\Library\bin
```

#### 2. Tesseract OCR (backup OCR engine)

```powershell
# Download the installer from:
# https://github.com/UB-Mannheim/tesseract/wiki

# Install with languages: English + Romanian
# Default path: C:\Program Files\Tesseract-OCR\
# Add to PATH
```

#### 3. Python OCR Dependencies (in venv)

```powershell
cd C:\inetpub\wwwroot\roa2web\data-entry-backend
.\venv\Scripts\activate

# Install the OCR dependencies
# (version specifiers are quoted so the shell does not treat ">" as a redirect)
pip install "paddlepaddle>=2.5.0"
pip install "paddleocr>=2.7.0"
pip install "opencv-python>=4.8.0"
pip install "pytesseract>=0.3.10"
pip install "pdf2image>=1.16.0"

# Or from requirements.txt
pip install -r requirements.txt
```

#### 4. Restart service

```powershell
nssm restart ROA2WEB-DataEntry
```

**Important Windows notes:**
- The first PaddleOCR run downloads models (~200MB), which can take a few minutes
- PaddleOCR needs ~2GB of available RAM
- Verify the PATH entries for Poppler and Tesseract after installation
- Restart the backend service after any PATH change

### OCR API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/ocr/status | Check OCR service status |
| POST | /api/ocr/extract | Extract data from uploaded image |
| POST | /api/ocr/extract-attachment/{id} | Re-process existing attachment |

### Test OCR

```bash
# Check OCR status
curl http://localhost:8003/api/ocr/status

# Extract from image
curl -X POST -F "file=@bon.jpg" http://localhost:8003/api/ocr/extract
```

## Workflow

```
DRAFT → PENDING_REVIEW → APPROVED/REJECTED → (SYNCED in Oracle)
```

1. **DRAFT**: The user fills in the data (manually or via OCR)
2. **PENDING_REVIEW**: The system generates accounting entries automatically
3. **APPROVED**: The accountant approved the receipt
4. **REJECTED**: The accountant rejected the receipt (the user can correct it)

## Project Structure

```
backend/modules/data_entry/
├── backend/
│   ├── app/
│   │   ├── main.py                    # FastAPI entry point
│   │   ├── config.py                  # Settings
│   │   ├── db/
│   │   │   ├── database.py            # SQLite engine
│   │   │   ├── models/                # SQLModel models
│   │   │   └── crud/                  # CRUD operations
│   │   ├── schemas/                   # Pydantic schemas
│   │   │   └── ocr.py                 # OCR response schemas
│   │   ├── services/
│   │   │   ├── receipt_service.py
│   │   │   ├── ocr_service.py         # OCR orchestration
│   │   │   ├── ocr_engine.py          # PaddleOCR/Tesseract
│   │   │   ├── ocr_extractor.py       # Regex patterns RO
│   │   │   └── image_preprocessor.py  # OpenCV pipeline
│   │   └── routers/
│   │       ├── receipts.py
│   │       └── ocr.py                 # OCR endpoints
│   ├── migrations/                    # Alembic migrations
│   ├── data/
│   │   ├── receipts.db                # SQLite database
│   │   └── uploads/                   # Uploaded files
│   └── requirements.txt
│
├── frontend/
│   ├── src/
│   │   ├── views/receipts/            # Page components
│   │   ├── components/
│   │   │   ├── receipts/              # Receipt components
│   │   │   └── ocr/                   # OCR components
│   │   │       ├── OCRUploadZone.vue
│   │   │       ├── OCRPreview.vue
│   │   │       └── OCRConfidenceIndicator.vue
│   │   ├── stores/                    # Pinia stores
│   │   └── router/                    # Vue Router
│   ├── package.json
│   └── vite.config.js
│
└── docs/                              # Documentation
```

## Environment Variables

### Backend (.env)

```bash
# SQLite
SQLITE_DATABASE_PATH=data/receipts.db

# File uploads
UPLOAD_PATH=data/uploads
MAX_UPLOAD_SIZE_MB=10

# Oracle (for nomenclatures)
ORACLE_USER=CONTAFIN_ORACLE
ORACLE_PASSWORD=your_password
ORACLE_HOST=localhost
ORACLE_PORT=1526
ORACLE_SID=ROA

# JWT (shared with Reports module)
JWT_SECRET_KEY=your_secret_key
JWT_ALGORITHM=HS256
```

## Development

### Create new migration

```bash
cd backend
alembic revision --autogenerate -m "Add new field"
alembic upgrade head
```

### Run tests

```bash
# Backend
cd backend && pytest

# Frontend
cd frontend && npm run test
```

## API Documentation

Full API documentation is available at http://localhost:8003/docs when the backend is running.

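
FastAPI applications usually also expose the machine-readable schema behind `/docs` at `/openapi.json`. As a minimal sketch, assuming this backend keeps that default URL, the helper below flattens an OpenAPI document into method/path pairs. The names `list_endpoints` and `fetch_schema` are hypothetical (not part of the project), and `SAMPLE_SCHEMA` is a hand-written fragment for illustration, not output captured from the running service.

```python
import json
from urllib.request import urlopen

# HTTP methods that can appear as operation keys under an OpenAPI path item.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_endpoints(schema: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI schema, sorted by path."""
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints, key=lambda pair: (pair[1], pair[0]))


def fetch_schema(base_url: str = "http://localhost:8003") -> dict:
    """Download the schema; assumes the default FastAPI /openapi.json URL."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


# Hand-written sample fragment (illustrative only).
SAMPLE_SCHEMA = {
    "paths": {
        "/api/ocr/status": {"get": {}},
        "/api/receipts/": {"get": {}, "post": {}},
    }
}

if __name__ == "__main__":
    for method, path in list_endpoints(SAMPLE_SCHEMA):
        print(f"{method:<6} {path}")
```

To try it against a running backend, pass `fetch_schema()` instead of `SAMPLE_SCHEMA`; if the application overrides the schema URL, adjust the path accordingly.
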
### Key Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /api/receipts/ | Create receipt |
| GET | /api/receipts/ | List receipts |
| GET | /api/receipts/{id} | Get receipt details |
| POST | /api/receipts/{id}/submit | Submit for review |
| POST | /api/receipts/{id}/approve | Approve receipt |
| POST | /api/receipts/{id}/reject | Reject receipt |
| POST | /api/receipts/{id}/attachments | Upload attachment |
| GET | /api/ocr/status | OCR service status |
| POST | /api/ocr/extract | OCR image extraction |

## Troubleshooting

### OCR not working

1. Check OCR status: `curl http://localhost:8003/api/ocr/status`
2. Install system dependencies (tesseract, poppler)
3. Verify PaddleOCR is installed: `python -c "from paddleocr import PaddleOCR"`

### OCR Windows - "poppler not in PATH"

```powershell
# Error: "Unable to get page count. Is poppler installed and in PATH?"

# Solution 1: Add Poppler to PATH
# System Properties → Environment Variables → System variables → Path → New
# Add: C:\Program Files\poppler\Library\bin

# Solution 2: Restart the service after changing PATH
nssm restart ROA2WEB-DataEntry

# Verify:
pdfinfo --version
```

### OCR Windows - "tesseract not found"

```powershell
# Error: "tesseract is not installed or it's not in your PATH"

# Solution: Add Tesseract to PATH
# C:\Program Files\Tesseract-OCR\

# Verify:
tesseract --version
tesseract --list-langs  # Should list 'ron' and 'eng'
```

### OCR Windows - PaddleOCR import error

```powershell
# Error: "No module named 'paddleocr'"

cd C:\inetpub\wwwroot\roa2web\data-entry-backend
.\venv\Scripts\activate
# Version specifiers are quoted so the shell does not treat ">" as a redirect
pip install "paddlepaddle>=2.5.0"
pip install "paddleocr>=2.7.0"

# Restart the service
nssm restart ROA2WEB-DataEntry
```

### Low OCR accuracy

- Ensure good lighting when taking receipt photos
- Keep the receipt flat (no folds or wrinkles)
- Try PDF instead of JPG for scanned documents
- Check that the text is in focus

## Phase 2 (Future)

- Oracle sync for approved receipts
- Integration with pack_contafin procedures
- Automatic posting to ACT/RUL tables
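
The Oracle sync item would add the final `SYNCED` step to the workflow described earlier. A minimal sketch of the status transitions follows, assuming that a rejected receipt returns to `DRAFT` for correction and that `SYNCED` is terminal. Both are assumptions, and the names `ReceiptStatus`, `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical rather than taken from the codebase.

```python
from enum import Enum


class ReceiptStatus(str, Enum):
    DRAFT = "DRAFT"
    PENDING_REVIEW = "PENDING_REVIEW"
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"
    SYNCED = "SYNCED"


# Transition table: current status -> statuses it may move to.
ALLOWED_TRANSITIONS = {
    ReceiptStatus.DRAFT: {ReceiptStatus.PENDING_REVIEW},
    ReceiptStatus.PENDING_REVIEW: {ReceiptStatus.APPROVED, ReceiptStatus.REJECTED},
    # Assumption: the user corrects a rejected receipt as a draft again.
    ReceiptStatus.REJECTED: {ReceiptStatus.DRAFT},
    # Phase 2: approved receipts get synced to Oracle.
    ReceiptStatus.APPROVED: {ReceiptStatus.SYNCED},
    # Assumption: nothing follows a successful sync.
    ReceiptStatus.SYNCED: set(),
}


def can_transition(current: ReceiptStatus, target: ReceiptStatus) -> bool:
    """Return True if moving from `current` to `target` is allowed."""
    return target in ALLOWED_TRANSITIONS[current]


if __name__ == "__main__":
    print(can_transition(ReceiptStatus.APPROVED, ReceiptStatus.SYNCED))
    print(can_transition(ReceiptStatus.DRAFT, ReceiptStatus.APPROVED))
```

Keeping the rules in a single table means the Phase 2 sync step is one extra entry rather than a new code path.
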