- Delete data-entry-app/ (1.6GB), reports-app/ (447MB), .auto-build-data/
- Saved ~1.4GB disk space (64% reduction: 2.2GB → 845MB)
Updated references across 38 files:
- .claude/rules/ paths: backend/modules/, src/modules/
- .claude/commands/validate.md: all validation paths
- docs/ (13 files): data-entry, telegram, README, CLAUDE.md
- scripts/ (3 files): backup-secrets, restore-secrets, test-docker
- security/ (2 files): git_cleanup, SECURITY_PROCEDURES
- deployment/ & shared/: updated all stale comments
All paths now reflect ultrathin monolith architecture:
- Backend: backend/modules/{reports,data_entry,telegram}/
- Frontend: src/modules/{reports,data-entry}/
- Shared: shared/{auth,database,routes}/
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
# Data Entry App - Fiscal Receipts

Application for entering fiscal receipts (bonuri fiscale), with an approval workflow and automatic data extraction via OCR.

## Quick Start

### Prerequisites

- Python 3.10+
- Node.js 18+
- (Optional) SSH tunnel for the Oracle nomenclatures

### Using Start Script (Recommended)

```bash
# Start all services
./start-data-entry.sh

# Or individual commands:
./start-data-entry.sh start            # Start all
./start-data-entry.sh stop             # Stop all
./start-data-entry.sh status           # Check status
./start-data-entry.sh restart backend  # Restart backend only
```

**Services:**
- Backend: http://localhost:8003
- Frontend: http://localhost:3010
- API Docs: http://localhost:8003/docs

### Manual Setup

#### Backend Setup

```bash
cd backend/modules/data_entry/backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# Create .env file
cp .env.example .env
# Edit .env with your settings

# Run migrations
alembic upgrade head

# Start server
uvicorn app.main:app --reload --port 8003
```

#### Frontend Setup

```bash
cd backend/modules/data_entry/frontend

# Install dependencies
npm install

# Start dev server
npm run dev -- --port 3010
```

## Features

### For Users
- **Automatic OCR** - Automatic data extraction from the receipt photo (amount, date, supplier, CUI, the Romanian company tax ID)
- Upload photos of fiscal receipts
- Fill in receipt data (amount, date, supplier)
- Select the expense type
- Submit for approval

### For Accountants
- View pending receipts
- Edit the proposed accounting entries
- Approve/reject receipts
- Bulk approval

## OCR Feature

### How it works

1. **Upload image** - Drag and drop or select the receipt photo
2. **OCR processing** - Click "Proceseaza cu OCR" (Process with OCR)
3. **Preview** - The extracted data is shown with confidence indicators
4. **Apply** - Click "Aplica datele in formular" (Apply data to form) to auto-fill the form

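
The confidence indicators from step 3 could be driven by a simple bucketing of the per-field score. The sketch below is illustrative only: the `ExtractedField` type, the helper names, and the 0.90 / 0.70 thresholds are assumptions, not taken from the application.

```python
from dataclasses import dataclass

# Hypothetical cut-offs; the real UI may use different values.
HIGH_CONFIDENCE = 0.90
MEDIUM_CONFIDENCE = 0.70


@dataclass(frozen=True)
class ExtractedField:
    """One OCR result, e.g. the receipt total (hypothetical shape)."""
    name: str
    value: str
    confidence: float  # expected range: 0.0 .. 1.0


def confidence_level(score: float) -> str:
    """Map a numeric confidence to a coarse level for display."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence out of range: {score}")
    if score >= HIGH_CONFIDENCE:
        return "high"
    if score >= MEDIUM_CONFIDENCE:
        return "medium"
    return "low"


def fields_needing_review(fields: list[ExtractedField]) -> list[str]:
    """Names of fields the user should double-check before applying."""
    return [f.name for f in fields if confidence_level(f.confidence) != "high"]
```

With such a mapping, only the fields returned by `fields_needing_review` would need to be highlighted before the data is applied to the form.
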
### Automatically extracted fields

| Field | Estimated accuracy |
|-------|--------------------|
| Amount (TOTAL) | 90-95% |
| Date | 85-90% |
| Receipt number | 80-85% |
| Supplier | 70-80% |
| CUI | 85-90% |
| Document type | 95%+ |

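
As a rough illustration of how such fields can be pulled out of raw OCR text, here is a minimal regex sketch covering three of them. The patterns, the `extract_fields` helper, and the assumed receipt layout (`TOTAL ... 45,90`, `dd.mm.yyyy`, `CUI: RO...`) are hypothetical; the actual rules in `ocr_extractor.py` may differ.

```python
import re
from datetime import date
from decimal import Decimal
from typing import Optional

# Hypothetical patterns for common Romanian receipt layouts.
TOTAL_RE = re.compile(r"\bTOTAL\b[^\d\n]*(\d+(?:[.,]\d{2})?)", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{2})[./-](\d{2})[./-](\d{4})\b")
CUI_RE = re.compile(
    r"\b(?:C\.?U\.?I\.?|C\.?I\.?F\.?)\s*:?\s*(?:RO)?\s*(\d{2,10})\b",
    re.IGNORECASE,
)


def _parse_date(match: re.Match) -> Optional[date]:
    day, month, year = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # impossible date, e.g. a misread digit
        return None


def extract_fields(text: str) -> dict:
    """Return total, date and CUI found in OCR text (None when absent)."""
    total = TOTAL_RE.search(text)
    found_date = DATE_RE.search(text)
    cui = CUI_RE.search(text)
    return {
        # Receipts use a decimal comma; Decimal needs a dot.
        "total": Decimal(total.group(1).replace(",", ".")) if total else None,
        "date": _parse_date(found_date) if found_date else None,
        "cui": cui.group(1) if cui else None,
    }
```

Each pattern takes the first match only, which is one reason real-world accuracy varies per field: a receipt can contain several dates or several lines mentioning a total.
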
### OCR System Dependencies (Linux/Docker)

OCR requires the following system packages:

```bash
# Ubuntu/Debian
# (on newer releases that no longer ship libgl1-mesa-glx, install libgl1 instead)
apt-get install -y \
    tesseract-ocr \
    tesseract-ocr-ron \
    tesseract-ocr-eng \
    poppler-utils \
    libgl1-mesa-glx \
    libglib2.0-0

# Fedora/RHEL
dnf install -y \
    tesseract \
    tesseract-langpack-ron \
    tesseract-langpack-eng \
    poppler-utils
```

**Note:** PaddleOCR (the primary engine) is installed automatically via pip. Tesseract is used as a fallback.

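
The primary/fallback arrangement can be sketched as trying each engine in order. This is an assumed design, not the code in `ocr_engine.py`: engines are modelled as plain callables, `run_with_fallback` is a hypothetical helper, and treating an empty result as a failure is also an assumption.

```python
from typing import Callable, Sequence, Tuple

# An engine takes image bytes and returns the recognized text.
OcrEngine = Callable[[bytes], str]


def run_with_fallback(
    image: bytes, engines: Sequence[Tuple[str, OcrEngine]]
) -> Tuple[str, str]:
    """Try engines in order; return (engine_name, text) from the first success."""
    errors = []
    for name, engine in engines:
        try:
            text = engine(image)
        except Exception as exc:  # engine missing, model failed to load, etc.
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError("all OCR engines failed: " + "; ".join(errors))
```

In the application the list would presumably be `[("paddleocr", ...), ("tesseract", ...)]`, with each callable wrapping the corresponding library.
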
### OCR System Dependencies (Windows)

On Windows Server, the following components must be installed manually:

#### 1. Poppler (for PDF → image conversion)

```powershell
# Download Poppler for Windows
# https://github.com/osborn/poppler-windows/releases
# or https://github.com/bblanchon/pdfium-binaries

# Extract to C:\Program Files\poppler\
# Add to PATH: C:\Program Files\poppler\Library\bin
```

#### 2. Tesseract OCR (backup OCR engine)

```powershell
# Download the installer from:
# https://github.com/UB-Mannheim/tesseract/wiki

# Install with the languages: English + Romanian
# Default path: C:\Program Files\Tesseract-OCR\
# Add it to PATH
```

#### 3. Python OCR Dependencies (in venv)

```powershell
cd C:\inetpub\wwwroot\roa2web\data-entry-backend
.\venv\Scripts\activate

# Install the OCR dependencies
# (quote the specifiers, otherwise the shell treats ">" as output redirection)
pip install "paddlepaddle>=2.5.0"
pip install "paddleocr>=2.7.0"
pip install "opencv-python>=4.8.0"
pip install "pytesseract>=0.3.10"
pip install "pdf2image>=1.16.0"

# Or from requirements.txt
pip install -r requirements.txt
```

#### 4. Restart the service

```powershell
nssm restart ROA2WEB-DataEntry
```

**Important notes for Windows:**
- The first PaddleOCR run downloads models (~200MB), which can take a few minutes
- PaddleOCR needs ~2GB of available RAM
- Verify that Poppler and Tesseract are on PATH after installation
- Restart the backend service after any PATH change

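
A quick way to run the PATH check from the notes above is `shutil.which`. This is a small sketch; `find_tools` and `missing_tools` are hypothetical helpers, and `pdfinfo` is used as the Poppler probe because it ships with Poppler.

```python
import shutil
from typing import Optional

# Executables the OCR pipeline expects to find on PATH.
REQUIRED_TOOLS = ("pdfinfo", "tesseract")


def find_tools(tools: tuple = REQUIRED_TOOLS) -> dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(tools: tuple = REQUIRED_TOOLS) -> list[str]:
    """Names of tools that could not be resolved on PATH."""
    return [tool for tool, path in find_tools(tools).items() if path is None]


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

The result reflects the PATH of the process that runs the script. A service that was started before PATH changed keeps its old environment, which is why the restart is needed.
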
### OCR API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/ocr/status | Check OCR service status |
| POST | /api/ocr/extract | Extract data from uploaded image |
| POST | /api/ocr/extract-attachment/{id} | Re-process existing attachment |

### Test OCR

```bash
# Check OCR status
curl http://localhost:8003/api/ocr/status

# Extract from image
curl -X POST -F "file=@bon.jpg" http://localhost:8003/api/ocr/extract
```

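
The same two calls can be made from Python using only the standard library. This is a sketch: the JSON response format is an assumption, and `build_multipart`, `ocr_status`, and `ocr_extract` are hypothetical helper names. The form field name `file` mirrors the `curl -F "file=@bon.jpg"` call above.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib import request

BASE_URL = "http://localhost:8003"  # backend address from the Services list


def build_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def ocr_status() -> dict:
    """GET /api/ocr/status (assumes a JSON response)."""
    with request.urlopen(f"{BASE_URL}/api/ocr/status", timeout=10) as resp:
        return json.load(resp)


def ocr_extract(image_path: str) -> dict:
    """POST an image to /api/ocr/extract (assumes a JSON response)."""
    path = Path(image_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    req = request.Request(
        f"{BASE_URL}/api/ocr/extract",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # Generous timeout: the first OCR run may download models.
    with request.urlopen(req, timeout=120) as resp:
        return json.load(resp)
```

With the backend running, `ocr_status()` followed by `ocr_extract("bon.jpg")` corresponds to the two curl commands.
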
## Workflow

```
DRAFT → PENDING_REVIEW → APPROVED/REJECTED → (SYNCED in Oracle)
```

1. **DRAFT**: The user fills in the data (manually or via OCR)
2. **PENDING_REVIEW**: The system automatically generates the accounting entries
3. **APPROVED**: The accountant has approved the receipt
4. **REJECTED**: The accountant has rejected the receipt (the user can correct it)

`SYNCED` corresponds to the Oracle sync planned for Phase 2.

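
The workflow can be modelled as a small state machine. The sketch below is not the application's code: `ReceiptStatus`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical names, and the `REJECTED → DRAFT` edge is an assumption about how a rejected receipt is reopened for correction.

```python
from enum import Enum


class ReceiptStatus(str, Enum):
    DRAFT = "DRAFT"
    PENDING_REVIEW = "PENDING_REVIEW"
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"
    SYNCED = "SYNCED"


ALLOWED_TRANSITIONS = {
    ReceiptStatus.DRAFT: {ReceiptStatus.PENDING_REVIEW},
    ReceiptStatus.PENDING_REVIEW: {ReceiptStatus.APPROVED, ReceiptStatus.REJECTED},
    # Assumption: a rejected receipt goes back to DRAFT for correction.
    ReceiptStatus.REJECTED: {ReceiptStatus.DRAFT},
    # Oracle sync of approved receipts is listed under Phase 2.
    ReceiptStatus.APPROVED: {ReceiptStatus.SYNCED},
    ReceiptStatus.SYNCED: set(),
}


def transition(current: ReceiptStatus, target: ReceiptStatus) -> ReceiptStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(
            f"cannot move receipt from {current.value} to {target.value}"
        )
    return target
```

Keeping the allowed moves in one table makes it easy to reject invalid requests, such as approving a receipt that was never submitted.
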
## Project Structure

```
backend/modules/data_entry/
├── backend/
│   ├── app/
│   │   ├── main.py                    # FastAPI entry point
│   │   ├── config.py                  # Settings
│   │   ├── db/
│   │   │   ├── database.py            # SQLite engine
│   │   │   ├── models/                # SQLModel models
│   │   │   └── crud/                  # CRUD operations
│   │   ├── schemas/                   # Pydantic schemas
│   │   │   └── ocr.py                 # OCR response schemas
│   │   ├── services/
│   │   │   ├── receipt_service.py
│   │   │   ├── ocr_service.py         # OCR orchestration
│   │   │   ├── ocr_engine.py          # PaddleOCR/Tesseract
│   │   │   ├── ocr_extractor.py       # Regex patterns RO
│   │   │   └── image_preprocessor.py  # OpenCV pipeline
│   │   └── routers/
│   │       ├── receipts.py
│   │       └── ocr.py                 # OCR endpoints
│   ├── migrations/                    # Alembic migrations
│   ├── data/
│   │   ├── receipts.db                # SQLite database
│   │   └── uploads/                   # Uploaded files
│   └── requirements.txt
│
├── frontend/
│   ├── src/
│   │   ├── views/receipts/            # Page components
│   │   ├── components/
│   │   │   ├── receipts/              # Receipt components
│   │   │   └── ocr/                   # OCR components
│   │   │       ├── OCRUploadZone.vue
│   │   │       ├── OCRPreview.vue
│   │   │       └── OCRConfidenceIndicator.vue
│   │   ├── stores/                    # Pinia stores
│   │   └── router/                    # Vue Router
│   ├── package.json
│   └── vite.config.js
│
└── docs/                              # Documentation
```

## Environment Variables

### Backend (.env)

```bash
# SQLite
SQLITE_DATABASE_PATH=data/receipts.db

# File uploads
UPLOAD_PATH=data/uploads
MAX_UPLOAD_SIZE_MB=10

# Oracle (for nomenclatures)
ORACLE_USER=CONTAFIN_ORACLE
ORACLE_PASSWORD=your_password
ORACLE_HOST=localhost
ORACLE_PORT=1526
ORACLE_SID=ROA

# JWT (shared with Reports module)
JWT_SECRET_KEY=your_secret_key
JWT_ALGORITHM=HS256
```

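
For illustration, these variables could be read and validated with the standard library alone. This is a sketch, not the contents of `config.py`: the `Settings` class and `load_settings` are hypothetical, the defaults simply repeat the example values above, treating `JWT_SECRET_KEY` as the only required variable is an assumption, and the Oracle credentials are left out for brevity.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    sqlite_database_path: str
    upload_path: str
    max_upload_size_mb: int
    oracle_host: str
    oracle_port: int
    oracle_sid: str
    jwt_algorithm: str
    jwt_secret_key: str = field(repr=False)  # keep the secret out of logs

    @property
    def max_upload_bytes(self) -> int:
        """Upload limit in bytes, convenient for request size checks."""
        return self.max_upload_size_mb * 1024 * 1024


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast if invalid."""
    secret = env.get("JWT_SECRET_KEY")
    if not secret:
        raise RuntimeError("JWT_SECRET_KEY must be set")
    return Settings(
        sqlite_database_path=env.get("SQLITE_DATABASE_PATH", "data/receipts.db"),
        upload_path=env.get("UPLOAD_PATH", "data/uploads"),
        max_upload_size_mb=int(env.get("MAX_UPLOAD_SIZE_MB", "10")),
        oracle_host=env.get("ORACLE_HOST", "localhost"),
        oracle_port=int(env.get("ORACLE_PORT", "1526")),
        oracle_sid=env.get("ORACLE_SID", "ROA"),
        jwt_algorithm=env.get("JWT_ALGORITHM", "HS256"),
        jwt_secret_key=secret,
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.
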
## Development

### Create new migration

```bash
cd backend
alembic revision --autogenerate -m "Add new field"
alembic upgrade head
```

### Run tests

```bash
# Backend
cd backend && pytest

# Frontend
cd frontend && npm run test
```

## API Documentation

Full API documentation is available at http://localhost:8003/docs when the backend is running.

### Key Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /api/receipts/ | Create receipt |
| GET | /api/receipts/ | List receipts |
| GET | /api/receipts/{id} | Get receipt details |
| POST | /api/receipts/{id}/submit | Submit for review |
| POST | /api/receipts/{id}/approve | Approve receipt |
| POST | /api/receipts/{id}/reject | Reject receipt |
| POST | /api/receipts/{id}/attachments | Upload attachment |
| GET | /api/ocr/status | OCR service status |
| POST | /api/ocr/extract | OCR image extraction |

## Troubleshooting

### OCR not working

1. Check OCR status: `curl http://localhost:8003/api/ocr/status`
2. Install system dependencies (tesseract, poppler)
3. Verify PaddleOCR installed: `python -c "from paddleocr import PaddleOCR"`

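
Step 3 can be extended to all Python OCR dependencies without actually importing them. This is a sketch; `missing_packages` is a hypothetical helper, and the mapping pairs each pip package named in this README with its usual import name.

```python
from importlib.util import find_spec

# pip package name -> top-level module it provides
OCR_MODULES = {
    "paddlepaddle": "paddle",
    "paddleocr": "paddleocr",
    "opencv-python": "cv2",
    "pytesseract": "pytesseract",
    "pdf2image": "pdf2image",
}


def missing_packages(modules: dict[str, str] = OCR_MODULES) -> list[str]:
    """Return pip package names whose module cannot be found in this environment."""
    return [pkg for pkg, module in modules.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All OCR Python packages are installed")
```

Run it with the interpreter from the backend's virtual environment, since packages installed elsewhere are not visible to the service.
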
### OCR Windows - "poppler not in PATH"

```powershell
# Error: "Unable to get page count. Is poppler installed and in PATH?"

# Step 1: Add Poppler to PATH
# System Properties → Environment Variables → System variables → Path → New
# Add: C:\Program Files\poppler\Library\bin

# Step 2: Restart the service after changing PATH
nssm restart ROA2WEB-DataEntry

# Verify:
pdfinfo -v
```

### OCR Windows - "tesseract not found"

```powershell
# Error: "tesseract is not installed or it's not in your PATH"

# Solution: Add Tesseract to PATH
# C:\Program Files\Tesseract-OCR\

# Verify:
tesseract --version
tesseract --list-langs  # Should list 'ron' and 'eng'
```

### OCR Windows - PaddleOCR import error

```powershell
# Error: "No module named 'paddleocr'"

cd C:\inetpub\wwwroot\roa2web\data-entry-backend
.\venv\Scripts\activate
# Quote the specifiers, otherwise the shell treats ">" as output redirection
pip install "paddlepaddle>=2.5.0"
pip install "paddleocr>=2.7.0"

# Restart the service
nssm restart ROA2WEB-DataEntry
```

### Low OCR accuracy

- Ensure good lighting when taking receipt photos
- Keep receipt flat (no folds/wrinkles)
- Try PDF instead of JPG for scanned documents
- Check if text is in focus

## Phase 2 (Future)

- Oracle sync for approved receipts
- Integration with pack_contafin procedures
- Automatic posting to ACT/RUL tables