feat: Migrate to ultrathin monolith architecture
Consolidate 3 separate applications (reports-app, data-entry-app, telegram-bot) into a unified
architecture with single backend and frontend:
Backend Changes:
- Unified FastAPI backend at backend/ with modular structure
- Modules: reports, data_entry, telegram in backend/modules/
- Centralized config.py and main.py with all routers registered
- Single worker mode (--workers 1) for Telegram bot compatibility
- Shared Oracle connection pool and JWT authentication
- Unified requirements.txt and environment configuration
Frontend Changes:
- Single Vue.js SPA with module-based routing
- Unified frontend at src/ with modules in src/modules/{reports,data-entry}/
- Shared components and stores in src/shared/
- Error boundaries for module isolation
- Dual API proxy in Vite for module communication
Infrastructure:
- New unified startup scripts: start-prod.sh, start-test.sh, start-backend.sh
- Environment templates: .env.dev.example, .env.test.example, .env.prod.example
- Updated deployment scripts for Windows IIS
- Simplified SSH tunnel management
Documentation:
- Comprehensive CLAUDE.md with architecture overview
- Module-specific docs in docs/{data-entry,telegram}/
- Architecture decision records in docs/ARCHITECTURE-DECISIONS.md
- Deployment guides consolidated in deployment/windows/docs/
This migration reduces complexity, improves maintainability, and enables easier
deployment while maintaining all existing functionality.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
382
docs/data-entry/README.md
Normal file
382
docs/data-entry/README.md
Normal file
@@ -0,0 +1,382 @@
|
||||
# Data Entry App - Bonuri Fiscale
|
||||
|
||||
Aplicatie pentru introducere bonuri fiscale cu workflow de aprobare si extragere automata date prin OCR.
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- Python 3.10+
|
||||
- Node.js 18+
|
||||
- (Optional) SSH tunnel pentru Oracle nomenclatoare
|
||||
|
||||
### Using Start Script (Recommended)
|
||||
|
||||
```bash
|
||||
# Start all services
|
||||
./start-data-entry.sh
|
||||
|
||||
# Or individual commands:
|
||||
./start-data-entry.sh start # Start all
|
||||
./start-data-entry.sh stop # Stop all
|
||||
./start-data-entry.sh status # Check status
|
||||
./start-data-entry.sh restart backend # Restart backend only
|
||||
```
|
||||
|
||||
**Services:**
|
||||
- Backend: http://localhost:8003
|
||||
- Frontend: http://localhost:3010
|
||||
- API Docs: http://localhost:8003/docs
|
||||
|
||||
### Manual Setup
|
||||
|
||||
#### Backend Setup
|
||||
|
||||
```bash
|
||||
cd data-entry-app/backend
|
||||
|
||||
# Create virtual environment
|
||||
python -m venv venv
|
||||
source venv/bin/activate # Linux/Mac
|
||||
# sau: venv\Scripts\activate # Windows
|
||||
|
||||
# Install dependencies
|
||||
pip install -r requirements.txt
|
||||
|
||||
# Create .env file
|
||||
cp .env.example .env
|
||||
# Edit .env with your settings
|
||||
|
||||
# Run migrations
|
||||
alembic upgrade head
|
||||
|
||||
# Start server
|
||||
uvicorn app.main:app --reload --port 8003
|
||||
```
|
||||
|
||||
#### Frontend Setup
|
||||
|
||||
```bash
|
||||
cd data-entry-app/frontend
|
||||
|
||||
# Install dependencies
|
||||
npm install
|
||||
|
||||
# Start dev server
|
||||
npm run dev -- --port 3010
|
||||
```
|
||||
|
||||
## Features
|
||||
|
||||
### Pentru Utilizatori
|
||||
- **OCR Automat** - Extragere automata date din poza bonului (suma, data, furnizor, CUI)
|
||||
- Upload poze bonuri fiscale
|
||||
- Completare date bon (suma, data, furnizor)
|
||||
- Selectie tip cheltuiala
|
||||
- Trimitere spre aprobare
|
||||
|
||||
### Pentru Contabili
|
||||
- Vizualizare bonuri in asteptare
|
||||
- Editare note contabile propuse
|
||||
- Aprobare/Respingere bonuri
|
||||
- Aprobare in masa
|
||||
|
||||
## OCR Feature
|
||||
|
||||
### Cum functioneaza
|
||||
|
||||
1. **Upload imagine** - Trage sau selecteaza poza bonului
|
||||
2. **Procesare OCR** - Click pe "Proceseaza cu OCR"
|
||||
3. **Previzualizare** - Datele extrase sunt afisate cu indicatori de incredere
|
||||
4. **Aplicare** - Click "Aplica datele in formular" pentru auto-fill
|
||||
|
||||
### Campuri extrase automat
|
||||
|
||||
| Camp | Acuratete estimata |
|
||||
|------|-------------------|
|
||||
| Suma (TOTAL) | 90-95% |
|
||||
| Data | 85-90% |
|
||||
| Numar bon | 80-85% |
|
||||
| Furnizor | 70-80% |
|
||||
| CUI | 85-90% |
|
||||
| Tip document | 95%+ |
|
||||
|
||||
### OCR System Dependencies (Linux/Docker)
|
||||
|
||||
Pentru functionarea OCR trebuie instalate:
|
||||
|
||||
```bash
|
||||
# Ubuntu/Debian
|
||||
apt-get install -y \
|
||||
tesseract-ocr \
|
||||
tesseract-ocr-ron \
|
||||
tesseract-ocr-eng \
|
||||
poppler-utils \
|
||||
libgl1-mesa-glx \
|
||||
libglib2.0-0
|
||||
|
||||
# Fedora/RHEL
|
||||
dnf install -y \
|
||||
tesseract \
|
||||
tesseract-langpack-ron \
|
||||
tesseract-langpack-eng \
|
||||
poppler-utils
|
||||
```
|
||||
|
||||
**Note:** PaddleOCR (engine principal) se instaleaza automat cu pip. Tesseract este folosit ca fallback.
|
||||
|
||||
### OCR System Dependencies (Windows)
|
||||
|
||||
Pe Windows Server trebuie instalate manual urmatoarele componente:
|
||||
|
||||
#### 1. Poppler (pentru conversie PDF → imagini)
|
||||
|
||||
```powershell
|
||||
# Descarca Poppler pentru Windows
|
||||
# https://github.com/osborn/poppler-windows/releases
|
||||
# sau https://github.com/bblanchon/pdfium-binaries
|
||||
|
||||
# Extrage in C:\Program Files\poppler\
|
||||
# Adauga la PATH: C:\Program Files\poppler\Library\bin
|
||||
```
|
||||
|
||||
#### 2. Tesseract OCR (engine OCR backup)
|
||||
|
||||
```powershell
|
||||
# Descarca installer de la:
|
||||
# https://github.com/UB-Mannheim/tesseract/wiki
|
||||
|
||||
# Instaleaza cu limbile: English + Romanian
|
||||
# Default path: C:\Program Files\Tesseract-OCR\
|
||||
# Adauga la PATH
|
||||
```
|
||||
|
||||
#### 3. Python OCR Dependencies (in venv)
|
||||
|
||||
```powershell
|
||||
cd C:\inetpub\wwwroot\roa2web\data-entry-backend
|
||||
.\venv\Scripts\activate
|
||||
|
||||
# Instaleaza dependentele OCR
|
||||
pip install paddlepaddle>=2.5.0
|
||||
pip install paddleocr>=2.7.0
|
||||
pip install opencv-python>=4.8.0
|
||||
pip install pytesseract>=0.3.10
|
||||
pip install pdf2image>=1.16.0
|
||||
|
||||
# Sau din requirements.txt
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
#### 4. Restart serviciu
|
||||
|
||||
```powershell
|
||||
nssm restart ROA2WEB-DataEntry
|
||||
```
|
||||
|
||||
**Note importante Windows:**
|
||||
- Prima rulare PaddleOCR descarca modele (~200MB) - poate dura cateva minute
|
||||
- PaddleOCR necesita ~2GB RAM disponibil
|
||||
- Verifica PATH-ul pentru Poppler si Tesseract dupa instalare
|
||||
- Restart serviciul backend dupa orice modificare PATH
|
||||
|
||||
### OCR API Endpoints
|
||||
|
||||
| Method | Endpoint | Description |
|
||||
|--------|----------|-------------|
|
||||
| GET | /api/ocr/status | Check OCR service status |
|
||||
| POST | /api/ocr/extract | Extract data from uploaded image |
|
||||
| POST | /api/ocr/extract-attachment/{id} | Re-process existing attachment |
|
||||
|
||||
### Test OCR
|
||||
|
||||
```bash
|
||||
# Check OCR status
|
||||
curl http://localhost:8003/api/ocr/status
|
||||
|
||||
# Extract from image
|
||||
curl -X POST -F "file=@bon.jpg" http://localhost:8003/api/ocr/extract
|
||||
```
|
||||
|
||||
## Workflow
|
||||
|
||||
```
|
||||
DRAFT → PENDING_REVIEW → APPROVED/REJECTED → (SYNCED in Oracle)
|
||||
```
|
||||
|
||||
1. **DRAFT**: Utilizator completeaza datele (manual sau via OCR)
|
||||
2. **PENDING_REVIEW**: Sistemul genereaza note contabile automat
|
||||
3. **APPROVED**: Contabil a aprobat bonul
|
||||
4. **REJECTED**: Contabil a respins (utilizatorul poate corecta)
|
||||
|
||||
## Project Structure
|
||||
|
||||
```
|
||||
data-entry-app/
|
||||
├── backend/
|
||||
│ ├── app/
|
||||
│ │ ├── main.py # FastAPI entry point
|
||||
│ │ ├── config.py # Settings
|
||||
│ │ ├── db/
|
||||
│ │ │ ├── database.py # SQLite engine
|
||||
│ │ │ ├── models/ # SQLModel models
|
||||
│ │ │ └── crud/ # CRUD operations
|
||||
│ │ ├── schemas/ # Pydantic schemas
|
||||
│ │ │ └── ocr.py # OCR response schemas
|
||||
│ │ ├── services/
|
||||
│ │ │ ├── receipt_service.py
|
||||
│ │ │ ├── ocr_service.py # OCR orchestration
|
||||
│ │ │ ├── ocr_engine.py # PaddleOCR/Tesseract
|
||||
│ │ │ ├── ocr_extractor.py # Regex patterns RO
|
||||
│ │ │ └── image_preprocessor.py # OpenCV pipeline
|
||||
│ │ └── routers/
|
||||
│ │ ├── receipts.py
|
||||
│ │ └── ocr.py # OCR endpoints
|
||||
│ ├── migrations/ # Alembic migrations
|
||||
│ ├── data/
|
||||
│ │ ├── receipts.db # SQLite database
|
||||
│ │ └── uploads/ # Uploaded files
|
||||
│ └── requirements.txt
|
||||
│
|
||||
├── frontend/
|
||||
│ ├── src/
|
||||
│ │ ├── views/receipts/ # Page components
|
||||
│ │ ├── components/
|
||||
│ │ │ ├── receipts/ # Receipt components
|
||||
│ │ │ └── ocr/ # OCR components
|
||||
│ │ │ ├── OCRUploadZone.vue
|
||||
│ │ │ ├── OCRPreview.vue
|
||||
│ │ │ └── OCRConfidenceIndicator.vue
|
||||
│ │ ├── stores/ # Pinia stores
|
||||
│ │ └── router/ # Vue Router
|
||||
│ ├── package.json
|
||||
│ └── vite.config.js
|
||||
│
|
||||
└── docs/ # Documentation
|
||||
```
|
||||
|
||||
## Environment Variables
|
||||
|
||||
### Backend (.env)
|
||||
|
||||
```bash
|
||||
# SQLite
|
||||
SQLITE_DATABASE_PATH=data/receipts.db
|
||||
|
||||
# File uploads
|
||||
UPLOAD_PATH=data/uploads
|
||||
MAX_UPLOAD_SIZE_MB=10
|
||||
|
||||
# Oracle (for nomenclatures)
|
||||
ORACLE_USER=CONTAFIN_ORACLE
|
||||
ORACLE_PASSWORD=your_password
|
||||
ORACLE_HOST=localhost
|
||||
ORACLE_PORT=1526
|
||||
ORACLE_SID=ROA
|
||||
|
||||
# JWT (shared with reports-app)
|
||||
JWT_SECRET_KEY=your_secret_key
|
||||
JWT_ALGORITHM=HS256
|
||||
```
|
||||
|
||||
## Development
|
||||
|
||||
### Create new migration
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
alembic revision --autogenerate -m "Add new field"
|
||||
alembic upgrade head
|
||||
```
|
||||
|
||||
### Run tests
|
||||
|
||||
```bash
|
||||
# Backend
|
||||
cd backend && pytest
|
||||
|
||||
# Frontend
|
||||
cd frontend && npm run test
|
||||
```
|
||||
|
||||
## API Documentation
|
||||
|
||||
Full API documentation available at http://localhost:8003/docs when backend is running.
|
||||
|
||||
### Key Endpoints
|
||||
|
||||
| Method | Endpoint | Description |
|
||||
|--------|----------|-------------|
|
||||
| POST | /api/receipts/ | Create receipt |
|
||||
| GET | /api/receipts/ | List receipts |
|
||||
| GET | /api/receipts/{id} | Get receipt details |
|
||||
| POST | /api/receipts/{id}/submit | Submit for review |
|
||||
| POST | /api/receipts/{id}/approve | Approve receipt |
|
||||
| POST | /api/receipts/{id}/reject | Reject receipt |
|
||||
| POST | /api/receipts/{id}/attachments | Upload attachment |
|
||||
| GET | /api/ocr/status | OCR service status |
|
||||
| POST | /api/ocr/extract | OCR image extraction |
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### OCR not working
|
||||
|
||||
1. Check OCR status: `curl http://localhost:8003/api/ocr/status`
|
||||
2. Install system dependencies (tesseract, poppler)
|
||||
3. Verify PaddleOCR installed: `python -c "from paddleocr import PaddleOCR"`
|
||||
|
||||
### OCR Windows - "poppler not in PATH"
|
||||
|
||||
```powershell
|
||||
# Eroare: "Unable to get page count. Is poppler installed and in PATH?"
|
||||
|
||||
# Solutie 1: Adauga Poppler la PATH
|
||||
# System Properties → Environment Variables → System variables → Path → New
|
||||
# Adauga: C:\Program Files\poppler\Library\bin
|
||||
|
||||
# Solutie 2: Restart serviciul dupa modificarea PATH
|
||||
nssm restart ROA2WEB-DataEntry
|
||||
|
||||
# Verificare:
|
||||
pdfinfo --version
|
||||
```
|
||||
|
||||
### OCR Windows - "tesseract not found"
|
||||
|
||||
```powershell
|
||||
# Eroare: "tesseract is not installed or it's not in your PATH"
|
||||
|
||||
# Solutie: Adauga Tesseract la PATH
|
||||
# C:\Program Files\Tesseract-OCR\
|
||||
|
||||
# Verificare:
|
||||
tesseract --version
|
||||
tesseract --list-langs # Trebuie sa arate 'ron' si 'eng'
|
||||
```
|
||||
|
||||
### OCR Windows - PaddleOCR import error
|
||||
|
||||
```powershell
|
||||
# Eroare: "No module named 'paddleocr'"
|
||||
|
||||
cd C:\inetpub\wwwroot\roa2web\data-entry-backend
|
||||
.\venv\Scripts\activate
|
||||
pip install paddlepaddle>=2.5.0
|
||||
pip install paddleocr>=2.7.0
|
||||
|
||||
# Restart serviciu
|
||||
nssm restart ROA2WEB-DataEntry
|
||||
```
|
||||
|
||||
### Low OCR accuracy
|
||||
|
||||
- Ensure good lighting when taking receipt photos
|
||||
- Keep receipt flat (no folds/wrinkles)
|
||||
- Try PDF instead of JPG for scanned documents
|
||||
- Check if text is in focus
|
||||
|
||||
## Phase 2 (Future)
|
||||
|
||||
- Oracle sync for approved receipts
|
||||
- Integration with pack_contafin procedures
|
||||
- Automatic posting to ACT/RUL tables
|
||||
Reference in New Issue
Block a user