Data Entry App - Fiscal Receipts
Application for entering fiscal receipts, with an approval workflow and automatic data extraction via OCR.
Quick Start
Prerequisites
- Python 3.10+
- Node.js 18+
- (Optional) SSH tunnel for the Oracle nomenclatures
Using Start Script (Recommended)
# Start all services
./start-data-entry.sh
# Or individual commands:
./start-data-entry.sh start # Start all
./start-data-entry.sh stop # Stop all
./start-data-entry.sh status # Check status
./start-data-entry.sh restart backend # Restart backend only
Services:
- Backend: http://localhost:8003
- Frontend: http://localhost:3010
- API Docs: http://localhost:8003/docs
Manual Setup
Backend Setup
cd backend/modules/data_entry/backend
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or: venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Create .env file
cp .env.example .env
# Edit .env with your settings
# Run migrations
alembic upgrade head
# Start server
uvicorn app.main:app --reload --port 8003
Frontend Setup
cd backend/modules/data_entry/frontend
# Install dependencies
npm install
# Start dev server
npm run dev -- --port 3010
Features
For Users
- Automatic OCR - Extracts data from the receipt photo (amount, date, supplier, CUI)
- Upload photos of fiscal receipts
- Fill in receipt data (amount, date, supplier)
- Select the expense type
- Submit for approval
For Accountants
- View pending receipts
- Edit proposed accounting entries
- Approve/Reject receipts
- Bulk approval
OCR Feature
How it works
- Upload image - Drag in or select the receipt photo
- OCR processing - Click "Proceseaza cu OCR" (Process with OCR)
- Preview - The extracted data is shown with confidence indicators
- Apply - Click "Aplica datele in formular" (Apply data to form) to auto-fill the form
Automatically extracted fields
| Field | Estimated accuracy |
|---|---|
| Amount (TOTAL) | 90-95% |
| Date | 85-90% |
| Receipt number | 80-85% |
| Supplier | 70-80% |
| CUI | 85-90% |
| Document type | 95%+ |
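The kind of field extraction listed above can be illustrated with a few regular expressions. This is a minimal sketch, not the contents of ocr_extractor.py: the patterns, the function name `extract_fields`, and the sample receipt text are all assumptions made for illustration, and real receipts will need more robust rules (for example, lines such as "TOTAL TVA" are not handled here).

```python
import re
from datetime import date
from decimal import Decimal

# Hypothetical patterns for common Romanian receipt layouts.
# \bTOTAL\b does not match inside "SUBTOTAL" because there is no word boundary there.
TOTAL_RE = re.compile(r"\bTOTAL\b[^\d\n]*(\d+[.,]\d{2})", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{2})[./-](\d{2})[./-](\d{4})\b")
CUI_RE = re.compile(r"\b(?:CUI|CIF)\s*:?\s*(?:RO)?\s*(\d{2,10})\b", re.IGNORECASE)


def _parse_date(match):
    """Turn a DD.MM.YYYY match into a date, or None if the digits are not a real date."""
    day, month, year = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None


def extract_fields(text: str) -> dict:
    """Return the fields found in OCR text; fields that are not found map to None."""
    total = TOTAL_RE.search(text)
    day = DATE_RE.search(text)
    cui = CUI_RE.search(text)
    return {
        # Receipts usually print a decimal comma; Decimal needs a dot.
        "total": Decimal(total.group(1).replace(",", ".")) if total else None,
        "date": _parse_date(day) if day else None,
        "cui": cui.group(1) if cui else None,
    }


# Made-up OCR output, for illustration only.
sample = """S.C. EXEMPLU S.R.L.
CUI: RO12345678
DATA: 15.03.2024
SUBTOTAL 100,00
TOTAL LEI 119,00
"""

if __name__ == "__main__":
    print(extract_fields(sample))
```

Returning None for a missing field, rather than raising, lets a form show partial results and leave the remaining inputs for manual entry.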
OCR System Dependencies (Linux/Docker)
The following packages must be installed for OCR to work:
# Ubuntu/Debian
apt-get install -y \
tesseract-ocr \
tesseract-ocr-ron \
tesseract-ocr-eng \
poppler-utils \
libgl1-mesa-glx \
libglib2.0-0
# Fedora/RHEL
dnf install -y \
tesseract \
tesseract-langpack-ron \
tesseract-langpack-eng \
poppler-utils
Note: PaddleOCR (the primary engine) is installed automatically via pip. Tesseract is used as a fallback.
OCR System Dependencies (Windows)
On Windows Server, the following components must be installed manually:
1. Poppler (for PDF → image conversion)
# Download Poppler for Windows
# https://github.com/osborn/poppler-windows/releases
# or https://github.com/bblanchon/pdfium-binaries
# Extract to C:\Program Files\poppler\
# Add to PATH: C:\Program Files\poppler\Library\bin
2. Tesseract OCR (backup OCR engine)
# Download the installer from:
# https://github.com/UB-Mannheim/tesseract/wiki
# Install with the languages: English + Romanian
# Default path: C:\Program Files\Tesseract-OCR\
# Add it to PATH
3. Python OCR Dependencies (in venv)
cd C:\inetpub\wwwroot\roa2web\data-entry-backend
.\venv\Scripts\activate
# Install the OCR dependencies
# (quote the version specifiers so the shell does not treat ">" as a redirect)
pip install "paddlepaddle>=2.5.0"
pip install "paddleocr>=2.7.0"
pip install "opencv-python>=4.8.0"
pip install "pytesseract>=0.3.10"
pip install "pdf2image>=1.16.0"
# Or from requirements.txt
pip install -r requirements.txt
4. Restart the service
nssm restart ROA2WEB-DataEntry
Important notes for Windows:
- On first run, PaddleOCR downloads its models (~200MB); this can take a few minutes
- PaddleOCR needs ~2GB of available RAM
- Verify the PATH entries for Poppler and Tesseract after installation
- Restart the backend service after any PATH change
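The PATH check mentioned in the notes can be scripted. The sketch below uses only the Python standard library; the function name `find_tools` and the choice of `pdfinfo` as the Poppler executable to look for are assumptions for illustration, not part of the project.

```python
import shutil

# Executables looked up on PATH: pdfinfo ships with Poppler,
# tesseract with Tesseract OCR.
REQUIRED_TOOLS = ("pdfinfo", "tesseract")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its resolved location, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in find_tools().items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```

Run it from the same account and environment as the backend service, since a service may see a different PATH than an interactive shell.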
OCR API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/ocr/status | Check OCR service status |
| POST | /api/ocr/extract | Extract data from uploaded image |
| POST | /api/ocr/extract-attachment/{id} | Re-process existing attachment |
Test OCR
# Check OCR status
curl http://localhost:8003/api/ocr/status
# Extract from image
curl -X POST -F "file=@bon.jpg" http://localhost:8003/api/ocr/extract
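The status check above can also be done from Python, for example in a deployment smoke test. This is a minimal sketch using only the standard library; the helper names are hypothetical, and because the response schema is not documented here, the JSON body is returned as-is rather than interpreted.

```python
import json
import urllib.request


def ocr_status_url(base_url="http://localhost:8003"):
    """Build the status endpoint URL, tolerating a trailing slash on base_url."""
    return base_url.rstrip("/") + "/api/ocr/status"


def get_ocr_status(base_url="http://localhost:8003", timeout=5.0):
    """GET /api/ocr/status and return the decoded JSON body.

    Raises urllib.error.URLError if the backend is not reachable.
    """
    with urllib.request.urlopen(ocr_status_url(base_url), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(get_ocr_status())
```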
Workflow
DRAFT → PENDING_REVIEW → APPROVED/REJECTED → (SYNCED in Oracle)
- DRAFT: The user fills in the data (manually or via OCR)
- PENDING_REVIEW: The system automatically generates accounting entries
- APPROVED: The accountant has approved the receipt
- REJECTED: The accountant has rejected the receipt (the user can correct it)
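The workflow above is a small state machine, which can be sketched as a table of allowed transitions. The names `ReceiptStatus`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical, and two edges are assumptions rather than documented behavior: that a rejected receipt returns to DRAFT for correction, and that SYNCED is reached only from APPROVED.

```python
from enum import Enum


class ReceiptStatus(str, Enum):
    DRAFT = "DRAFT"
    PENDING_REVIEW = "PENDING_REVIEW"
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"
    SYNCED = "SYNCED"


ALLOWED_TRANSITIONS = {
    ReceiptStatus.DRAFT: {ReceiptStatus.PENDING_REVIEW},
    ReceiptStatus.PENDING_REVIEW: {ReceiptStatus.APPROVED, ReceiptStatus.REJECTED},
    # Assumption: a rejected receipt goes back to DRAFT so the user can correct it.
    ReceiptStatus.REJECTED: {ReceiptStatus.DRAFT},
    # Assumption: only approved receipts are synced to Oracle.
    ReceiptStatus.APPROVED: {ReceiptStatus.SYNCED},
    ReceiptStatus.SYNCED: set(),
}


def transition(current: ReceiptStatus, target: ReceiptStatus) -> ReceiptStatus:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Cannot move receipt from {current.value} to {target.value}")
    return target


if __name__ == "__main__":
    status = ReceiptStatus.DRAFT
    status = transition(status, ReceiptStatus.PENDING_REVIEW)
    status = transition(status, ReceiptStatus.APPROVED)
    print(status.value)
```

Keeping the rules in one table means the submit, approve, and reject handlers can share a single check instead of each encoding its own preconditions.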
Project Structure
backend/modules/data_entry/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI entry point
│ │ ├── config.py # Settings
│ │ ├── db/
│ │ │ ├── database.py # SQLite engine
│ │ │ ├── models/ # SQLModel models
│ │ │ └── crud/ # CRUD operations
│ │ ├── schemas/ # Pydantic schemas
│ │ │ └── ocr.py # OCR response schemas
│ │ ├── services/
│ │ │ ├── receipt_service.py
│ │ │ ├── ocr_service.py # OCR orchestration
│ │ │ ├── ocr_engine.py # PaddleOCR/Tesseract
│ │ │ ├── ocr_extractor.py # Regex patterns RO
│ │ │ └── image_preprocessor.py # OpenCV pipeline
│ │ └── routers/
│ │ ├── receipts.py
│ │ └── ocr.py # OCR endpoints
│ ├── migrations/ # Alembic migrations
│ ├── data/
│ │ ├── receipts.db # SQLite database
│ │ └── uploads/ # Uploaded files
│ └── requirements.txt
│
├── frontend/
│ ├── src/
│ │ ├── views/receipts/ # Page components
│ │ ├── components/
│ │ │ ├── receipts/ # Receipt components
│ │ │ └── ocr/ # OCR components
│ │ │ ├── OCRUploadZone.vue
│ │ │ ├── OCRPreview.vue
│ │ │ └── OCRConfidenceIndicator.vue
│ │ ├── stores/ # Pinia stores
│ │ └── router/ # Vue Router
│ ├── package.json
│ └── vite.config.js
│
└── docs/ # Documentation
Environment Variables
Backend (.env)
# SQLite
SQLITE_DATABASE_PATH=data/receipts.db
# File uploads
UPLOAD_PATH=data/uploads
MAX_UPLOAD_SIZE_MB=10
# Oracle (for nomenclatures)
ORACLE_USER=CONTAFIN_ORACLE
ORACLE_PASSWORD=your_password
ORACLE_HOST=localhost
ORACLE_PORT=1526
ORACLE_SID=ROA
# JWT (shared with Reports module)
JWT_SECRET_KEY=your_secret_key
JWT_ALGORITHM=HS256
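To show how these variables might be consumed, here is a minimal standard-library sketch. It is not the project's config.py: the `Settings` class, the `load_settings` function, and the use of the example values as defaults are assumptions. Secrets (ORACLE_PASSWORD, JWT_SECRET_KEY) are deliberately left out of the sketch.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    sqlite_database_path: str
    upload_path: str
    max_upload_size_mb: int
    oracle_host: str
    oracle_port: int
    oracle_sid: str
    jwt_algorithm: str

    @property
    def max_upload_size_bytes(self) -> int:
        """Upload limit in bytes, convenient for comparing against file sizes."""
        return self.max_upload_size_mb * 1024 * 1024


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default) and validate them."""
    env = os.environ if env is None else env
    settings = Settings(
        sqlite_database_path=env.get("SQLITE_DATABASE_PATH", "data/receipts.db"),
        upload_path=env.get("UPLOAD_PATH", "data/uploads"),
        max_upload_size_mb=int(env.get("MAX_UPLOAD_SIZE_MB", "10")),
        oracle_host=env.get("ORACLE_HOST", "localhost"),
        oracle_port=int(env.get("ORACLE_PORT", "1526")),
        oracle_sid=env.get("ORACLE_SID", "ROA"),
        jwt_algorithm=env.get("JWT_ALGORITHM", "HS256"),
    )
    if settings.max_upload_size_mb <= 0:
        raise ValueError("MAX_UPLOAD_SIZE_MB must be a positive integer")
    return settings


if __name__ == "__main__":
    print(load_settings())
```

Passing the environment in as a mapping keeps the loader easy to test without touching the real process environment.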
Development
Create new migration
cd backend
alembic revision --autogenerate -m "Add new field"
alembic upgrade head
Run tests
# Backend
cd backend && pytest
# Frontend
cd frontend && npm run test
API Documentation
Full API documentation is available at http://localhost:8003/docs when the backend is running.
Key Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/receipts/ | Create receipt |
| GET | /api/receipts/ | List receipts |
| GET | /api/receipts/{id} | Get receipt details |
| POST | /api/receipts/{id}/submit | Submit for review |
| POST | /api/receipts/{id}/approve | Approve receipt |
| POST | /api/receipts/{id}/reject | Reject receipt |
| POST | /api/receipts/{id}/attachments | Upload attachment |
| GET | /api/ocr/status | OCR service status |
| POST | /api/ocr/extract | OCR image extraction |
Troubleshooting
OCR not working
- Check OCR status:
curl http://localhost:8003/api/ocr/status
- Install system dependencies (tesseract, poppler)
- Verify PaddleOCR is installed:
python -c "from paddleocr import PaddleOCR"
OCR Windows - "poppler not in PATH"
# Error: "Unable to get page count. Is poppler installed and in PATH?"
# Step 1: Add Poppler to PATH
# System Properties → Environment Variables → System variables → Path → New
# Add: C:\Program Files\poppler\Library\bin
# Step 2: Restart the service so it picks up the new PATH
nssm restart ROA2WEB-DataEntry
# Verify:
pdfinfo -v
OCR Windows - "tesseract not found"
# Error: "tesseract is not installed or it's not in your PATH"
# Solution: Add Tesseract to PATH
# C:\Program Files\Tesseract-OCR\
# Verify:
tesseract --version
tesseract --list-langs # Should list 'ron' and 'eng'
OCR Windows - PaddleOCR import error
# Error: "No module named 'paddleocr'"
cd C:\inetpub\wwwroot\roa2web\data-entry-backend
.\venv\Scripts\activate
pip install "paddlepaddle>=2.5.0"
pip install "paddleocr>=2.7.0"
# Restart the service
nssm restart ROA2WEB-DataEntry
Low OCR accuracy
- Ensure good lighting when taking receipt photos
- Keep receipt flat (no folds/wrinkles)
- Try PDF instead of JPG for scanned documents
- Check if text is in focus
Phase 2 (Future)
- Oracle sync for approved receipts
- Integration with pack_contafin procedures
- Automatic posting to ACT/RUL tables