# Data Entry App - Fiscal Receipts

An application for entering fiscal receipts, with an approval workflow and automatic data extraction via OCR.

## Quick Start

### Prerequisites

- Python 3.10+
- Node.js 18+
- (Optional) SSH tunnel for the Oracle nomenclatures

### Using Start Script (Recommended)

```bash
# Start all services
./start-data-entry.sh

# Or individual commands:
./start-data-entry.sh start              # Start all
./start-data-entry.sh stop               # Stop all
./start-data-entry.sh status             # Check status
./start-data-entry.sh restart backend    # Restart backend only
```

**Services:**
- Backend: http://localhost:8003
- Frontend: http://localhost:3010
- API Docs: http://localhost:8003/docs

### Manual Setup

#### Backend Setup

```bash
cd backend/modules/data_entry/backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# Create .env file
cp .env.example .env
# Edit .env with your settings

# Run migrations
alembic upgrade head

# Start server
uvicorn app.main:app --reload --port 8003
```

#### Frontend Setup

```bash
cd backend/modules/data_entry/frontend

# Install dependencies
npm install

# Start dev server
npm run dev -- --port 3010
```

## Features

### For Users
- **Automatic OCR** - Extracts data automatically from the receipt photo (amount, date, supplier, CUI)
- Upload fiscal receipt photos
- Fill in receipt data (amount, date, supplier)
- Select the expense type
- Submit for approval

### For Accountants
- View pending receipts
- Edit proposed accounting entries
- Approve/Reject receipts
- Bulk approval

## OCR Feature

### How it works

1. **Upload image** - Drag and drop or select the receipt photo
2. **OCR processing** - Click "Proceseaza cu OCR" ("Process with OCR")
3. **Preview** - The extracted data is shown with confidence indicators
4. **Apply** - Click "Aplica datele in formular" ("Apply the data to the form") to auto-fill
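
The "Apply" step can be sketched as a small merge function. This is only an illustration: the response shape (a `value` and a `confidence` per field), the field names, and the `0.8` threshold are assumptions, not the schema defined in `schemas/ocr.py`.

```python
from typing import Any, Dict, List, Tuple


def apply_ocr_to_form(
    form: Dict[str, Any],
    extracted: Dict[str, Dict[str, Any]],
    min_confidence: float = 0.8,  # hypothetical threshold
) -> Tuple[Dict[str, Any], List[str]]:
    """Copy OCR values into empty form fields and flag low-confidence ones.

    `extracted` is assumed to look like
    {"amount": {"value": "123.45", "confidence": 0.93}, ...}.
    """
    filled = dict(form)  # never mutate the caller's form
    needs_review: List[str] = []
    for name, field in extracted.items():
        value = field.get("value")
        if value is None:
            continue  # OCR found nothing for this field
        if field.get("confidence", 0.0) < min_confidence:
            needs_review.append(name)
        if not filled.get(name):
            filled[name] = value  # keep anything the user already typed
    return filled, needs_review
```

Values the user has already entered are left untouched, and fields below the threshold are returned separately so the UI could highlight them for a manual check.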

### Automatically extracted fields

| Field | Estimated accuracy |
|-------|--------------------|
| Amount (TOTAL) | 90-95% |
| Date | 85-90% |
| Receipt number | 80-85% |
| Supplier | 70-80% |
| CUI | 85-90% |
| Document type | 95%+ |

### OCR System Dependencies (Linux/Docker)

The following packages must be installed for OCR to work:

```bash
# Ubuntu/Debian
apt-get install -y \
    tesseract-ocr \
    tesseract-ocr-ron \
    tesseract-ocr-eng \
    poppler-utils \
    libgl1-mesa-glx \
    libglib2.0-0

# Fedora/RHEL
dnf install -y \
    tesseract \
    tesseract-langpack-ron \
    tesseract-langpack-eng \
    poppler-utils
```

**Note:** PaddleOCR (the primary engine) is installed automatically by pip. Tesseract is used as a fallback.
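
The primary/fallback arrangement can be sketched as follows. This is a minimal illustration, not the code in `ocr_engine.py`: the engines are passed in as plain callables (hypothetical wrappers around PaddleOCR and Tesseract), and treating an empty result as a failure is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical engine signature: raw image bytes in, recognized text out.
OcrEngine = Callable[[bytes], str]


def recognize_with_fallback(
    image: bytes, engines: Sequence[Tuple[str, OcrEngine]]
) -> Tuple[str, str]:
    """Try each engine in order; return (engine_name, text) from the first that works."""
    errors: List[str] = []
    for name, engine in engines:
        try:
            text = engine(image)
        except Exception as exc:  # an engine may be missing or fail on this image
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError("all OCR engines failed: " + "; ".join(errors))
```

Called with `[("paddleocr", run_paddle), ("tesseract", run_tesseract)]`, where both wrappers are hypothetical, Tesseract would only run when PaddleOCR raises or returns no text.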

### OCR System Dependencies (Windows)

On Windows Server, the following components must be installed manually:

#### 1. Poppler (for PDF → image conversion)

```powershell
# Download Poppler for Windows
# https://github.com/osborn/poppler-windows/releases
# or https://github.com/bblanchon/pdfium-binaries

# Extract to C:\Program Files\poppler\
# Add to PATH: C:\Program Files\poppler\Library\bin
```

#### 2. Tesseract OCR (backup OCR engine)

```powershell
# Download the installer from:
# https://github.com/UB-Mannheim/tesseract/wiki

# Install with the languages: English + Romanian
# Default path: C:\Program Files\Tesseract-OCR\
# Add it to PATH
```

#### 3. Python OCR Dependencies (in venv)

```powershell
cd C:\inetpub\wwwroot\roa2web\data-entry-backend
.\venv\Scripts\activate

# Install the OCR dependencies
# (quote the specifiers so the shell does not treat ">" as a redirect)
pip install "paddlepaddle>=2.5.0"
pip install "paddleocr>=2.7.0"
pip install "opencv-python>=4.8.0"
pip install "pytesseract>=0.3.10"
pip install "pdf2image>=1.16.0"

# Or from requirements.txt
pip install -r requirements.txt
```

#### 4. Restart the service

```powershell
nssm restart ROA2WEB-DataEntry
```

**Important Windows notes:**
- On first run, PaddleOCR downloads its models (~200MB), which can take a few minutes
- PaddleOCR needs ~2GB of available RAM
- Verify that Poppler and Tesseract are on PATH after installation
- Restart the backend service after any PATH change

### OCR API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/ocr/status | Check OCR service status |
| POST | /api/ocr/extract | Extract data from uploaded image |
| POST | /api/ocr/extract-attachment/{id} | Re-process existing attachment |

### Test OCR

```bash
# Check OCR status
curl http://localhost:8003/api/ocr/status

# Extract from image
curl -X POST -F "file=@bon.jpg" http://localhost:8003/api/ocr/extract
```
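
The same two calls can be made from Python using only the standard library. This is a sketch under assumptions: that both endpoints return JSON, and that the upload field is named `file`, as in the curl example. The helper names are hypothetical.

```python
import json
import urllib.request
import uuid
from typing import Tuple

BASE_URL = "http://localhost:8003"


def build_multipart(
    field_name: str, filename: str, payload: bytes, content_type: str = "image/jpeg"
) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def ocr_status(base_url: str = BASE_URL) -> dict:
    """GET /api/ocr/status (response assumed to be JSON)."""
    with urllib.request.urlopen(f"{base_url}/api/ocr/status", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def ocr_extract(image_path: str, base_url: str = BASE_URL) -> dict:
    """POST an image to /api/ocr/extract, like `curl -F "file=@bon.jpg"`."""
    with open(image_path, "rb") as fh:
        body, content_type = build_multipart("file", "bon.jpg", fh.read())
    request = urllib.request.Request(
        f"{base_url}/api/ocr/extract",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # Generous timeout: the first OCR call may trigger a model download.
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `ocr_status()` followed by `ocr_extract("bon.jpg")` mirrors the curl commands above.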

## Workflow

```
DRAFT → PENDING_REVIEW → APPROVED/REJECTED → (SYNCED in Oracle)
```

1. **DRAFT**: The user fills in the data (manually or via OCR)
2. **PENDING_REVIEW**: The system generates the accounting entries automatically
3. **APPROVED**: The accountant approved the receipt
4. **REJECTED**: The accountant rejected the receipt (the user can correct it)
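
The workflow above can be modeled as a small state machine. This is a sketch, not the code in `receipt_service.py`. Two transitions are assumptions: that a rejected receipt goes back to `DRAFT` for correction, and that `APPROVED → SYNCED` becomes available once the Oracle sync exists.

```python
from enum import Enum
from typing import Dict, Set


class ReceiptStatus(str, Enum):
    DRAFT = "DRAFT"
    PENDING_REVIEW = "PENDING_REVIEW"
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"
    SYNCED = "SYNCED"


ALLOWED_TRANSITIONS: Dict[ReceiptStatus, Set[ReceiptStatus]] = {
    ReceiptStatus.DRAFT: {ReceiptStatus.PENDING_REVIEW},
    ReceiptStatus.PENDING_REVIEW: {ReceiptStatus.APPROVED, ReceiptStatus.REJECTED},
    ReceiptStatus.REJECTED: {ReceiptStatus.DRAFT},   # assumption: back to editing
    ReceiptStatus.APPROVED: {ReceiptStatus.SYNCED},  # assumption: future Oracle sync
    ReceiptStatus.SYNCED: set(),                     # terminal state
}


def transition(current: ReceiptStatus, target: ReceiptStatus) -> ReceiptStatus:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Keeping the allowed moves in one table makes it easy to reject invalid requests, for example approving a receipt that was never submitted.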

## Project Structure

```
backend/modules/data_entry/
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI entry point
│   │   ├── config.py            # Settings
│   │   ├── db/
│   │   │   ├── database.py      # SQLite engine
│   │   │   ├── models/          # SQLModel models
│   │   │   └── crud/            # CRUD operations
│   │   ├── schemas/             # Pydantic schemas
│   │   │   └── ocr.py           # OCR response schemas
│   │   ├── services/
│   │   │   ├── receipt_service.py
│   │   │   ├── ocr_service.py       # OCR orchestration
│   │   │   ├── ocr_engine.py        # PaddleOCR/Tesseract
│   │   │   ├── ocr_extractor.py     # Regex patterns RO
│   │   │   └── image_preprocessor.py # OpenCV pipeline
│   │   └── routers/
│   │       ├── receipts.py
│   │       └── ocr.py           # OCR endpoints
│   ├── migrations/              # Alembic migrations
│   ├── data/
│   │   ├── receipts.db          # SQLite database
│   │   └── uploads/             # Uploaded files
│   └── requirements.txt
│
├── frontend/
│   ├── src/
│   │   ├── views/receipts/      # Page components
│   │   ├── components/
│   │   │   ├── receipts/        # Receipt components
│   │   │   └── ocr/             # OCR components
│   │   │       ├── OCRUploadZone.vue
│   │   │       ├── OCRPreview.vue
│   │   │       └── OCRConfidenceIndicator.vue
│   │   ├── stores/              # Pinia stores
│   │   └── router/              # Vue Router
│   ├── package.json
│   └── vite.config.js
│
└── docs/                        # Documentation
```

## Environment Variables

### Backend (.env)

```bash
# SQLite
SQLITE_DATABASE_PATH=data/receipts.db

# File uploads
UPLOAD_PATH=data/uploads
MAX_UPLOAD_SIZE_MB=10

# Oracle (for nomenclatures)
ORACLE_USER=CONTAFIN_ORACLE
ORACLE_PASSWORD=your_password
ORACLE_HOST=localhost
ORACLE_PORT=1526
ORACLE_SID=ROA

# JWT (shared with Reports module)
JWT_SECRET_KEY=your_secret_key
JWT_ALGORITHM=HS256
```
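
For illustration, these variables could be read into a typed settings object as sketched below. The real `app/config.py` may work differently. The `Settings` class and `load_settings` function are hypothetical, the defaults are simply the example values above, and only a subset of the variables is shown.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    sqlite_database_path: str
    upload_path: str
    max_upload_size_mb: int
    oracle_host: str
    oracle_port: int
    oracle_sid: str

    @property
    def max_upload_size_bytes(self) -> int:
        # Convert the MB limit into bytes for comparing against file sizes.
        return self.max_upload_size_mb * 1024 * 1024


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return Settings(
        sqlite_database_path=env.get("SQLITE_DATABASE_PATH", "data/receipts.db"),
        upload_path=env.get("UPLOAD_PATH", "data/uploads"),
        max_upload_size_mb=int(env.get("MAX_UPLOAD_SIZE_MB", "10")),
        oracle_host=env.get("ORACLE_HOST", "localhost"),
        oracle_port=int(env.get("ORACLE_PORT", "1526")),
        oracle_sid=env.get("ORACLE_SID", "ROA"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in tests.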

## Development

### Create new migration

```bash
cd backend
alembic revision --autogenerate -m "Add new field"
alembic upgrade head
```

### Run tests

```bash
# Backend
cd backend && pytest

# Frontend
cd frontend && npm run test
```

## API Documentation

Full API documentation available at http://localhost:8003/docs when backend is running.

### Key Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /api/receipts/ | Create receipt |
| GET | /api/receipts/ | List receipts |
| GET | /api/receipts/{id} | Get receipt details |
| POST | /api/receipts/{id}/submit | Submit for review |
| POST | /api/receipts/{id}/approve | Approve receipt |
| POST | /api/receipts/{id}/reject | Reject receipt |
| POST | /api/receipts/{id}/attachments | Upload attachment |
| GET | /api/ocr/status | OCR service status |
| POST | /api/ocr/extract | OCR image extraction |

## Troubleshooting

### OCR not working

1. Check OCR status: `curl http://localhost:8003/api/ocr/status`
2. Install system dependencies (tesseract, poppler)
3. Verify PaddleOCR installed: `python -c "from paddleocr import PaddleOCR"`
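
The checks above can be bundled into one diagnostic run inside the backend's virtual environment. This is a sketch: the function name is hypothetical, and the module and binary lists are inferred from the dependencies listed in this README (`pdfinfo` ships with Poppler).

```python
import importlib.util
import shutil
from typing import Dict, Sequence


def check_ocr_dependencies(
    modules: Sequence[str] = ("paddleocr", "pytesseract", "pdf2image", "cv2"),
    binaries: Sequence[str] = ("tesseract", "pdfinfo"),
) -> Dict[str, bool]:
    """Report which Python modules are importable and which binaries are on PATH."""
    report: Dict[str, bool] = {}
    for module in modules:
        # find_spec locates a top-level module without importing it
        report[f"module:{module}"] = importlib.util.find_spec(module) is not None
    for binary in binaries:
        report[f"binary:{binary}"] = shutil.which(binary) is not None
    return report
```

Printing the result of `check_ocr_dependencies()` shows at a glance whether a missing pip package or a missing PATH entry is the likely cause.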

### OCR Windows - "poppler not in PATH"

```powershell
# Error: "Unable to get page count. Is poppler installed and in PATH?"

# Step 1: Add Poppler to PATH
# System Properties → Environment Variables → System variables → Path → New
# Add: C:\Program Files\poppler\Library\bin

# Step 2: Restart the service after changing PATH
nssm restart ROA2WEB-DataEntry

# Verify:
pdfinfo --version
```

### OCR Windows - "tesseract not found"

```powershell
# Error: "tesseract is not installed or it's not in your PATH"

# Fix: Add Tesseract to PATH
# C:\Program Files\Tesseract-OCR\

# Verify:
tesseract --version
tesseract --list-langs  # Should list 'ron' and 'eng'
```

### OCR Windows - PaddleOCR import error

```powershell
# Error: "No module named 'paddleocr'"

cd C:\inetpub\wwwroot\roa2web\data-entry-backend
.\venv\Scripts\activate
pip install "paddlepaddle>=2.5.0"
pip install "paddleocr>=2.7.0"

# Restart the service
nssm restart ROA2WEB-DataEntry
```

### Low OCR accuracy

- Ensure good lighting when taking receipt photos
- Keep receipt flat (no folds/wrinkles)
- Try PDF instead of JPG for scanned documents
- Check if text is in focus

## Phase 2 (Future)

- Oracle sync for approved receipts
- Integration with pack_contafin procedures
- Automatic posting to ACT/RUL tables