refactor(docs): consolidate and cleanup documentation

- Delete 9 deprecated/obsolete docs (~6,300 lines removed)
- Move test PDFs to tests/fixtures/ocr-samples/
- Create docs/DEPLOYMENT.md as principal guide
- Create tests/ocr-validation/README.md
- Update all refs for ultrathin monolith architecture
- Update OCR tests to use relative paths

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 09:14:51 +00:00
parent 1b9ebf1d8f
commit 62f86250cc
55 changed files with 604 additions and 6334 deletions
--- a/docs/data-entry/Lidl
+++ b/docs/data-entry/Lidl
--- a/docs/data-entry/Lidl
+++ b/docs/data-entry/Lidl
--- a/docs/data-entry/README.md
+++ b/docs/data-entry/README.md
@@ -1,382 +0,0 @@
-# Data Entry App - Fiscal Receipts
-
-Application for entering fiscal receipts, with an approval workflow and automatic data extraction via OCR.
-
-## Quick Start
-
-### Prerequisites
-
-- Python 3.10+
-- Node.js 18+
-- (Optional) SSH tunnel for the Oracle nomenclatures
-
-### Using Start Script (Recommended)
-
-```bash
-# Start all services
-./start-data-entry.sh
-
-# Or individual commands:
-./start-data-entry.sh start              # Start all
-./start-data-entry.sh stop               # Stop all
-./start-data-entry.sh status             # Check status
-./start-data-entry.sh restart backend    # Restart backend only
-```
-
-**Services:**
-- Backend: http://localhost:8003
-- Frontend: http://localhost:3010
-- API Docs: http://localhost:8003/docs
-
-### Manual Setup
-
-#### Backend Setup
-
-```bash
-cd backend/modules/data_entry/backend
-
-# Create virtual environment
-python -m venv venv
-source venv/bin/activate  # Linux/Mac
-# or: venv\Scripts\activate  # Windows
-
-# Install dependencies
-pip install -r requirements.txt
-
-# Create .env file
-cp .env.example .env
-# Edit .env with your settings
-
-# Run migrations
-alembic upgrade head
-
-# Start server
-uvicorn app.main:app --reload --port 8003
-```
-
-#### Frontend Setup
-
-```bash
-cd backend/modules/data_entry/frontend
-
-# Install dependencies
-npm install
-
-# Start dev server
-npm run dev -- --port 3010
-```
-
-## Features
-
-### For Users
-- **Automatic OCR** - Automatic data extraction from the receipt photo (amount, date, supplier, CUI)
-- Upload receipt photos
-- Fill in receipt data (amount, date, supplier)
-- Select the expense type
-- Submit for approval
-
-### For Accountants
-- View pending receipts
-- Edit proposed accounting entries
-- Approve/Reject receipts
-- Bulk approval
-
-## OCR Feature
-
-### How it works
-
-1. **Upload image** - Drag or select the receipt photo
-2. **OCR processing** - Click "Proceseaza cu OCR" (Process with OCR)
-3. **Preview** - The extracted data is shown with confidence indicators
-4. **Apply** - Click "Aplica datele in formular" (Apply data to form) to auto-fill the form
-
-### Automatically extracted fields
-
-| Field | Estimated accuracy |
-|-------|--------------------|
-| Amount (TOTAL) | 90-95% |
-| Date | 85-90% |
-| Receipt number | 80-85% |
-| Supplier | 70-80% |
-| CUI | 85-90% |
-| Document type | 95%+ |
-
-### OCR System Dependencies (Linux/Docker)
-
-OCR requires the following system packages:
-
-```bash
-# Ubuntu/Debian
-apt-get install -y \
-    tesseract-ocr \
-    tesseract-ocr-ron \
-    tesseract-ocr-eng \
-    poppler-utils \
-    libgl1-mesa-glx \
-    libglib2.0-0
-
-# Fedora/RHEL
-dnf install -y \
-    tesseract \
-    tesseract-langpack-ron \
-    tesseract-langpack-eng \
-    poppler-utils
-```
-
-**Note:** PaddleOCR (the primary engine) is installed automatically via pip. Tesseract is used as a fallback.
-
-### OCR System Dependencies (Windows)
-
-On Windows Server, the following components must be installed manually:
-
-#### 1. Poppler (for PDF → image conversion)
-
-```powershell
-# Download Poppler for Windows
-# https://github.com/osborn/poppler-windows/releases
-# or https://github.com/bblanchon/pdfium-binaries
-
-# Extract to C:\Program Files\poppler\
-# Add to PATH: C:\Program Files\poppler\Library\bin
-```
-
-#### 2. Tesseract OCR (fallback OCR engine)
-
-```powershell
-# Download the installer from:
-# https://github.com/UB-Mannheim/tesseract/wiki
-
-# Install with the languages: English + Romanian
-# Default path: C:\Program Files\Tesseract-OCR\
-# Add to PATH
-```
-
-#### 3. Python OCR Dependencies (in venv)
-
-```powershell
-cd C:\inetpub\wwwroot\roa2web\data-entry-backend
-.\venv\Scripts\activate
-
-# Install the OCR dependencies (quoted so that ">" is not parsed as output redirection)
-pip install "paddlepaddle>=2.5.0"
-pip install "paddleocr>=2.7.0"
-pip install "opencv-python>=4.8.0"
-pip install "pytesseract>=0.3.10"
-pip install "pdf2image>=1.16.0"
-
-# Or from requirements.txt
-pip install -r requirements.txt
-```
-
-#### 4. Restart the service
-
-```powershell
-nssm restart ROA2WEB-DataEntry
-```
-
-**Important notes for Windows:**
-- The first PaddleOCR run downloads models (~200MB), which can take a few minutes
-- PaddleOCR needs ~2GB of available RAM
-- Verify the PATH entries for Poppler and Tesseract after installation
-- Restart the backend service after any PATH change
-
-### OCR API Endpoints
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| GET | /api/ocr/status | Check OCR service status |
-| POST | /api/ocr/extract | Extract data from uploaded image |
-| POST | /api/ocr/extract-attachment/{id} | Re-process existing attachment |
-
-### Test OCR
-
-```bash
-# Check OCR status
-curl http://localhost:8003/api/ocr/status
-
-# Extract from image
-curl -X POST -F "file=@bon.jpg" http://localhost:8003/api/ocr/extract
-```
-
-## Workflow
-
-```
-DRAFT → PENDING_REVIEW → APPROVED/REJECTED → (SYNCED in Oracle)
-```
-
-1. **DRAFT**: The user fills in the data (manually or via OCR)
-2. **PENDING_REVIEW**: The system generates accounting entries automatically
-3. **APPROVED**: The accountant approved the receipt
-4. **REJECTED**: The accountant rejected the receipt (the user can correct it)
-
-## Project Structure
-
-```
-backend/modules/data_entry/
-├── backend/
-│   ├── app/
-│   │   ├── main.py              # FastAPI entry point
-│   │   ├── config.py            # Settings
-│   │   ├── db/
-│   │   │   ├── database.py      # SQLite engine
-│   │   │   ├── models/          # SQLModel models
-│   │   │   └── crud/            # CRUD operations
-│   │   ├── schemas/             # Pydantic schemas
-│   │   │   └── ocr.py           # OCR response schemas
-│   │   ├── services/
-│   │   │   ├── receipt_service.py
-│   │   │   ├── ocr_service.py       # OCR orchestration
-│   │   │   ├── ocr_engine.py        # PaddleOCR/Tesseract
-│   │   │   ├── ocr_extractor.py     # Regex patterns RO
-│   │   │   └── image_preprocessor.py # OpenCV pipeline
-│   │   └── routers/
-│   │       ├── receipts.py
-│   │       └── ocr.py           # OCR endpoints
-│   ├── migrations/              # Alembic migrations
-│   ├── data/
-│   │   ├── receipts.db          # SQLite database
-│   │   └── uploads/             # Uploaded files
-│   └── requirements.txt
-│
-├── frontend/
-│   ├── src/
-│   │   ├── views/receipts/      # Page components
-│   │   ├── components/
-│   │   │   ├── receipts/        # Receipt components
-│   │   │   └── ocr/             # OCR components
-│   │   │       ├── OCRUploadZone.vue
-│   │   │       ├── OCRPreview.vue
-│   │   │       └── OCRConfidenceIndicator.vue
-│   │   ├── stores/              # Pinia stores
-│   │   └── router/              # Vue Router
-│   ├── package.json
-│   └── vite.config.js
-│
-└── docs/                        # Documentation
-```
-
-## Environment Variables
-
-### Backend (.env)
-
-```bash
-# SQLite
-SQLITE_DATABASE_PATH=data/receipts.db
-
-# File uploads
-UPLOAD_PATH=data/uploads
-MAX_UPLOAD_SIZE_MB=10
-
-# Oracle (for nomenclatures)
-ORACLE_USER=CONTAFIN_ORACLE
-ORACLE_PASSWORD=your_password
-ORACLE_HOST=localhost
-ORACLE_PORT=1526
-ORACLE_SID=ROA
-
-# JWT (shared with Reports module)
-JWT_SECRET_KEY=your_secret_key
-JWT_ALGORITHM=HS256
-```
-
-## Development
-
-### Create new migration
-
-```bash
-cd backend
-alembic revision --autogenerate -m "Add new field"
-alembic upgrade head
-```
-
-### Run tests
-
-```bash
-# Backend
-cd backend && pytest
-
-# Frontend
-cd frontend && npm run test
-```
-
-## API Documentation
-
-Full API documentation available at http://localhost:8003/docs when backend is running.
-
-### Key Endpoints
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| POST | /api/receipts/ | Create receipt |
-| GET | /api/receipts/ | List receipts |
-| GET | /api/receipts/{id} | Get receipt details |
-| POST | /api/receipts/{id}/submit | Submit for review |
-| POST | /api/receipts/{id}/approve | Approve receipt |
-| POST | /api/receipts/{id}/reject | Reject receipt |
-| POST | /api/receipts/{id}/attachments | Upload attachment |
-| GET | /api/ocr/status | OCR service status |
-| POST | /api/ocr/extract | OCR image extraction |
-
-## Troubleshooting
-
-### OCR not working
-
-1. Check OCR status: `curl http://localhost:8003/api/ocr/status`
-2. Install system dependencies (tesseract, poppler)
-3. Verify PaddleOCR installed: `python -c "from paddleocr import PaddleOCR"`
-
-### OCR Windows - "poppler not in PATH"
-
-```powershell
-# Error: "Unable to get page count. Is poppler installed and in PATH?"
-
-# Solution 1: Add Poppler to PATH
-# System Properties → Environment Variables → System variables → Path → New
-# Add: C:\Program Files\poppler\Library\bin
-
-# Solution 2: Restart the service after changing PATH
-nssm restart ROA2WEB-DataEntry
-
-# Verify:
-pdfinfo --version
-```
-
-### OCR Windows - "tesseract not found"
-
-```powershell
-# Error: "tesseract is not installed or it's not in your PATH"
-
-# Solution: Add Tesseract to PATH
-# C:\Program Files\Tesseract-OCR\
-
-# Verify:
-tesseract --version
-tesseract --list-langs  # Must list 'ron' and 'eng'
-```
-
-### OCR Windows - PaddleOCR import error
-
-```powershell
-# Error: "No module named 'paddleocr'"
-
-cd C:\inetpub\wwwroot\roa2web\data-entry-backend
-.\venv\Scripts\activate
-pip install "paddlepaddle>=2.5.0"
-pip install "paddleocr>=2.7.0"
-
-# Restart the service
-nssm restart ROA2WEB-DataEntry
-```
-
-### Low OCR accuracy
-
-- Ensure good lighting when taking receipt photos
-- Keep receipt flat (no folds/wrinkles)
-- Try PDF instead of JPG for scanned documents
-- Check if text is in focus
-
-## Phase 2 (Future)
-
-- Oracle sync for approved receipts
-- Integration with pack_contafin procedures
-- Automatic posting to ACT/RUL tables
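
For readers reconstructing the receipt lifecycle from the README removed above, the status flow `DRAFT → PENDING_REVIEW → APPROVED/REJECTED` can be sketched as a small transition table. This is a minimal illustration, not code from the repository: `ReceiptStatus`, `ALLOWED`, and `transition` are hypothetical names, and the `REJECTED → DRAFT` edge is an assumption based on the note that the user can correct a rejected receipt. `SYNCED` is left out because the README lists Oracle sync as future work.

```python
from enum import Enum


class ReceiptStatus(str, Enum):
    """Statuses named in the removed README's Workflow section."""
    DRAFT = "DRAFT"
    PENDING_REVIEW = "PENDING_REVIEW"
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"


# Allowed transitions, read off the workflow description.
# ASSUMPTION: a rejected receipt returns to DRAFT so the user can correct it.
ALLOWED = {
    ReceiptStatus.DRAFT: {ReceiptStatus.PENDING_REVIEW},
    ReceiptStatus.PENDING_REVIEW: {ReceiptStatus.APPROVED, ReceiptStatus.REJECTED},
    ReceiptStatus.REJECTED: {ReceiptStatus.DRAFT},
    ReceiptStatus.APPROVED: set(),  # terminal until Oracle sync exists
}


def transition(current: ReceiptStatus, target: ReceiptStatus) -> ReceiptStatus:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Keeping the rules in a single dictionary means a later `SYNCED` state would need one new entry rather than changes to every endpoint that moves a receipt.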
--- a/docs/data-entry/abonament
+++ b/docs/data-entry/abonament
--- a/docs/data-entry/benzina
+++ b/docs/data-entry/benzina
--- a/docs/data-entry/benzina
+++ b/docs/data-entry/benzina
--- a/docs/data-entry/benzina
+++ b/docs/data-entry/benzina
--- a/docs/data-entry/benzina
+++ b/docs/data-entry/benzina
--- a/docs/data-entry/benzina
+++ b/docs/data-entry/benzina
--- a/docs/data-entry/benzina
+++ b/docs/data-entry/benzina
--- a/docs/data-entry/benzina
+++ b/docs/data-entry/benzina
--- a/docs/data-entry/best
+++ b/docs/data-entry/best
--- a/docs/data-entry/bon
+++ b/docs/data-entry/bon
--- a/docs/data-entry/brick
+++ b/docs/data-entry/brick
--- a/docs/data-entry/brick
+++ b/docs/data-entry/brick
--- a/docs/data-entry/brick
+++ b/docs/data-entry/brick
--- a/docs/data-entry/brick
+++ b/docs/data-entry/brick
--- a/docs/data-entry/brick
+++ b/docs/data-entry/brick
--- a/docs/data-entry/brick
+++ b/docs/data-entry/brick
--- a/docs/data-entry/electrobering
+++ b/docs/data-entry/electrobering
--- a/docs/data-entry/electrobering
+++ b/docs/data-entry/electrobering
--- a/docs/data-entry/factura
+++ b/docs/data-entry/factura
--- a/docs/data-entry/gama
+++ b/docs/data-entry/gama
--- a/docs/data-entry/igiena
+++ b/docs/data-entry/igiena
--- a/docs/data-entry/igiena
+++ b/docs/data-entry/igiena
--- a/docs/data-entry/kineterra
+++ b/docs/data-entry/kineterra
--- a/docs/data-entry/kineterra
+++ b/docs/data-entry/kineterra
--- a/docs/data-entry/rechizite
+++ b/docs/data-entry/rechizite
--- a/docs/data-entry/stepout-bon1-5.jpg
+++ b/docs/data-entry/stepout-bon1-5.jpg
--- a/docs/data-entry/stepout-bon2-5.jpg
+++ b/docs/data-entry/stepout-bon2-5.jpg
--- a/docs/data-entry/unlimited
+++ b/docs/data-entry/unlimited