From 642ae3a96c82d785e1d4f6ecb3c9d5de9432b444 Mon Sep 17 00:00:00 2001
From: Marius Mutu
Date: Thu, 18 Dec 2025 19:43:33 +0200
Subject: [PATCH] docs: Add Windows OCR dependencies and fix IIS API error handling
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add OCR installation instructions for Windows (Poppler, Tesseract, PaddleOCR)
- Add troubleshooting section for common OCR errors on Windows
- Fix web.config.data-entry to use existingResponse="Auto" instead of "Replace"
  This allows FastAPI JSON error responses to pass through IIS unchanged
- Update system requirements to recommend 16GB RAM for OCR workloads

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
---
 data-entry-app/README.md                      |  98 ++++++++
 deployment/windows/README.md                  | 215 ++++++++++++++----
 .../windows/config/web.config.data-entry      | 161 +++++++++++++
 3 files changed, 432 insertions(+), 42 deletions(-)
 create mode 100644 deployment/windows/config/web.config.data-entry

diff --git a/data-entry-app/README.md b/data-entry-app/README.md
index 172ec71..c4b7f00 100644
--- a/data-entry-app/README.md
+++ b/data-entry-app/README.md
@@ -125,6 +125,61 @@ dnf install -y \

 **Note:** PaddleOCR (the primary engine) is installed automatically with pip. Tesseract is used as a fallback.

+### OCR System Dependencies (Windows)
+
+On Windows Server, the following components must be installed manually:
+
+#### 1. Poppler (for PDF → image conversion)
+
+```powershell
+# Download Poppler for Windows
+# https://github.com/osborn/poppler-windows/releases
+# or https://github.com/bblanchon/pdfium-binaries
+
+# Extract to C:\Program Files\poppler\
+# Add to PATH: C:\Program Files\poppler\Library\bin
+```
+
+#### 2. Tesseract OCR (fallback OCR engine)
+
+```powershell
+# Download the installer from:
+# https://github.com/UB-Mannheim/tesseract/wiki
+
+# Install with the language packs: English + Romanian
+# Default path: C:\Program Files\Tesseract-OCR\
+# Add it to PATH
+```
+
+#### 3. Python OCR Dependencies (in venv)
+
+```powershell
+cd C:\inetpub\wwwroot\roa2web\data-entry-backend
+.\venv\Scripts\activate
+
+# Install the OCR dependencies (quoted so PowerShell does not treat '>' as a redirect)
+pip install "paddlepaddle>=2.5.0"
+pip install "paddleocr>=2.7.0"
+pip install "opencv-python>=4.8.0"
+pip install "pytesseract>=0.3.10"
+pip install "pdf2image>=1.16.0"
+
+# Or from requirements.txt
+pip install -r requirements.txt
+```
+
+#### 4. Restart the service
+
+```powershell
+nssm restart ROA2WEB-DataEntry
+```
+
+**Important notes for Windows:**
+- The first PaddleOCR run downloads models (~200MB), which can take a few minutes
+- PaddleOCR needs ~2GB of available RAM
+- Verify that Poppler and Tesseract are on PATH after installation
+- Restart the backend service after any PATH change
+
 ### OCR API Endpoints

 | Method | Endpoint | Description |
@@ -270,6 +325,49 @@ Full API documentation available at http://localhost:8003/docs when backend is r
 2. Install system dependencies (tesseract, poppler)
 3. Verify PaddleOCR installed: `python -c "from paddleocr import PaddleOCR"`

+### OCR Windows - "poppler not in PATH"
+
+```powershell
+# Error: "Unable to get page count. Is poppler installed and in PATH?"
+
+# Solution 1: Add Poppler to PATH
+# System Properties → Environment Variables → System variables → Path → New
+# Add: C:\Program Files\poppler\Library\bin
+
+# Solution 2: Restart the service after changing PATH
+nssm restart ROA2WEB-DataEntry
+
+# Verify:
+pdfinfo -v
+```
+
+### OCR Windows - "tesseract not found"
+
+```powershell
+# Error: "tesseract is not installed or it's not in your PATH"
+
+# Solution: Add Tesseract to PATH
+# C:\Program Files\Tesseract-OCR\
+
+# Verify:
+tesseract --version
+tesseract --list-langs  # Should list 'ron' and 'eng'
+```
+
+### OCR Windows - PaddleOCR import error
+
+```powershell
+# Error: "No module named 'paddleocr'"
+
+cd C:\inetpub\wwwroot\roa2web\data-entry-backend
+.\venv\Scripts\activate
+pip install "paddlepaddle>=2.5.0"
+pip install "paddleocr>=2.7.0"
+
+# Restart the service
+nssm restart ROA2WEB-DataEntry
+```
+
 ### Low OCR accuracy

 - Ensure good lighting when taking receipt photos
diff --git a/deployment/windows/README.md b/deployment/windows/README.md
index c9960ca..904f32d 100644
--- a/deployment/windows/README.md
+++ b/deployment/windows/README.md
@@ -2,28 +2,33 @@

 Complete deployment solution for ROA2WEB on Windows Server with IIS and Oracle Database.

+**Includes:**
+- **Reports App** - Read-only Oracle reports (Port 8000)
+- **Telegram Bot** - Telegram integration (Port 8002)
+- **Data Entry App** - Receipt data entry with approval workflow (Port 8003)
+
 ---

 ## 📂 Package Contents

 ```
 deployment/windows/
-├── config/                      # Configuration files
-│   ├── web.config               # IIS configuration (URL Rewrite, reverse proxy)
-│   └── .env.production.windows  # Environment variables template
+├── config/                          # Configuration files
+│   ├── web.config                   # IIS config for Reports App
+│   ├── web.config.data-entry        # IIS config for Data Entry App
+│   └── .env.production.windows      # Environment variables template
 │
-├── scripts/                     # PowerShell automation scripts
-│   ├── Install-ROA2WEB.ps1      # Initial installation
-│   ├── Deploy-ROA2WEB.ps1       # Deploy updates
-│   ├── Build-Frontend.ps1       # Build Vue.js frontend (run locally)
-│   ├── Start-ROA2WEB.ps1        # Start backend service
-│   ├── Stop-ROA2WEB.ps1         # Stop backend service
-│   └── Restart-ROA2WEB.ps1      # Restart backend service
+├── scripts/                         # PowerShell automation scripts
+│   ├── Build-ROA2WEB.ps1            # Build all components (interactive menu)
+│   ├── ROA2WEB-Console.ps1          # Unified deployment & management console
+│   ├── Install-ROA2WEB.ps1          # Initial Reports App installation
+│   ├── Install-TelegramBot.ps1      # Telegram Bot installation
+│   └── deploy-config.json           # Deployment configuration
 │
-├── docs/                        # Documentation
-│   └── WINDOWS_DEPLOYMENT.md    # Complete deployment guide
+├── docs/                            # Documentation
+│   └── WINDOWS_DEPLOYMENT.md        # Complete deployment guide
 │
-└── README.md                    # This file
+└── README.md                        # This file
 ```

 ---
@@ -150,24 +155,56 @@ cd C:\inetpub\wwwroot\roa2web\deployment\windows\scripts

 ## 🔧 Management Commands

+### Interactive Console (Recommended)
+
 ```powershell
-# Start backend service
-.\Start-ROA2WEB.ps1
+# Open unified management console
+cd C:\inetpub\wwwroot\roa2web\deployment\windows\scripts
+.\ROA2WEB-Console.ps1

-# Stop backend service
-.\Stop-ROA2WEB.ps1
+# Menu options:
+# [1] Deploy Components
+# [2] Manage Services
+# [3] Check Status
+```

-# Restart backend service
-.\Restart-ROA2WEB.ps1
+### Non-Interactive Commands
+
+```powershell
+# Deploy all components
+.\ROA2WEB-Console.ps1 -NonInteractive -Action DeployAll
+
+# Deploy specific component
+.\ROA2WEB-Console.ps1 -NonInteractive -Action DeployBackend
+.\ROA2WEB-Console.ps1 -NonInteractive -Action DeployTelegramBot
+.\ROA2WEB-Console.ps1 -NonInteractive -Action DeployDataEntry
+
+# Service management
+.\ROA2WEB-Console.ps1 -NonInteractive -Action StartAll
+.\ROA2WEB-Console.ps1 -NonInteractive -Action StopAll
+.\ROA2WEB-Console.ps1 -NonInteractive -Action RestartAll
+
+# Data Entry service management
+.\ROA2WEB-Console.ps1 -NonInteractive -Action StartDataEntry
+.\ROA2WEB-Console.ps1 -NonInteractive -Action StopDataEntry
+.\ROA2WEB-Console.ps1 -NonInteractive -Action RestartDataEntry
+
+# Check status
+.\ROA2WEB-Console.ps1 -NonInteractive -Action Status
+```
+
+### Direct Service Commands
+
+```powershell
+# Check all ROA2WEB services
+Get-Service ROA2WEB-*

 # View logs
 Get-Content C:\inetpub\wwwroot\roa2web\logs\backend-stdout.log -Tail 50 -Wait
+Get-Content C:\inetpub\wwwroot\roa2web\data-entry-backend\logs\stdout.log -Tail 50 -Wait

-# Check service status
-Get-Service ROA2WEB-Backend
-
-# Check IIS website
-Get-Website ROA2WEB
+# Check IIS
+Get-Website | Where-Object { $_.Name -like "*roa2web*" -or $_.Name -like "*data-entry*" }
 ```

 ---
@@ -178,43 +215,85 @@ Get-Website ROA2WEB

 | Component | Type | Port | Purpose |
 |-----------|------|------|---------|
-| **Frontend** | IIS Static Files | 80/443 | Vue.js SPA |
-| **Backend** | Windows Service | 8000 | FastAPI API |
-| **Database** | Oracle | 1521 | Data storage |
-| **Reverse Proxy** | IIS URL Rewrite | - | API routing |
+| **Reports Frontend** | IIS Static Files | 80/443 | Vue.js SPA (Reports) |
+| **Reports Backend** | Windows Service | 8000 | FastAPI API (Reports) |
+| **Telegram Bot** | Windows Service | 8002 | Telegram integration |
+| **Data Entry Frontend** | IIS Static Files | 80/443 | Vue.js SPA (Data Entry) |
+| **Data Entry Backend** | Windows Service | 8003 | FastAPI API (Data Entry) |
+| **Database** | Oracle | 1521 | Reports data (read-only) |
+| **SQLite** | File | - | Data Entry local storage |

 ### Network Flow

 ```
-Client → IIS (port 80) → [web.config URL Rewrite]
-                           ├─ /api/* → Backend Service (localhost:8000)
-                           │              ↓
-                           │         Oracle DB (localhost:1521)
-                           └─ /* → Static Files (Vue.js)
+Client → IIS (port 80/443)
+    │
+    ├─ /roa2web/api/*     → Reports Backend (localhost:8000) → Oracle DB
+    │
+    ├─ /roa2web/*         → Reports Frontend (Vue.js)
+    │
+    ├─ /data-entry/api/*  → Data Entry Backend (localhost:8003) → SQLite
+    │
+    └─ /data-entry/*      → Data Entry Frontend (Vue.js)
 ```

+### Windows Services
+
+| Service Name | Description | Port |
+|-------------|-------------|------|
+| ROA2WEB-Backend | Reports API | 8000 |
+| ROA2WEB-TelegramBot | Telegram Bot | 8002 |
+| ROA2WEB-DataEntry | Data Entry API | 8003 |
+
 ---

 ## 📋 Directory Structure After Installation

 ```
 C:\inetpub\wwwroot\roa2web\
-├── backend\                 # FastAPI application
+│
+├── backend\                     # Reports Backend (FastAPI)
 │   ├── app\
 │   ├── requirements.txt
-│   ├── .env                 # Configuration
-│   └── logs\
+│   ├── venv\
+│   └── .env
 │
-├── frontend\                # Vue.js static files
+├── frontend\                    # Reports Frontend (Vue.js)
 │   ├── index.html
 │   ├── assets\
 │   └── web.config
 │
-├── logs\                    # Service logs
+├── telegram-bot\                # Telegram Bot
+│   ├── app\
+│   ├── data\telegram_bot.db
+│   ├── requirements.txt
+│   ├── venv\
+│   └── .env
+│
+├── data-entry-backend\          # Data Entry Backend (FastAPI)
+│   ├── app\
+│   ├── migrations\
+│   ├── data\receipts.db         # SQLite database
+│   ├── data\uploads\            # Uploaded receipts
+│   ├── requirements.txt
+│   ├── venv\
+│   └── .env
+│
+├── data-entry-frontend\         # Data Entry Frontend (Vue.js)
+│   ├── index.html
+│   ├── assets\
+│   └── web.config
+│
+├── shared\                      # Shared Python modules
+│   ├── auth\
+│   ├── database\
+│   └── utils\
+│
+├── logs\                        # Service logs
 │   ├── backend-stdout.log
 │   └── backend-stderr.log
 │
-└── backups\                 # Automatic backups
+└── backups\                     # Automatic backups
     └── backup-YYYYMMDD-HHMMSS\
 ```

@@ -294,13 +373,64 @@ For complete documentation, see:
 | Resource | Minimum | Recommended |
 |----------|---------|-------------|
 | **OS** | Windows Server 2016 | Windows Server 2019+ |
-| **RAM** | 4 GB | 8 GB |
+| **RAM** | 4 GB | 8 GB (16 GB if using OCR) |
 | **CPU** | 2 cores | 4 cores |
 | **Disk** | 10 GB free | 20 GB free |
 | **Network** | 100 Mbps | 1 Gbps |

 ---

+## 🔍 OCR Dependencies (Data Entry App)
+
+The Data Entry App uses OCR to automatically extract data from fiscal receipts. On Windows, the following must be installed manually:
+
+### 1. Poppler (PDF → image conversion)
+
+```powershell
+# Download from: https://github.com/osborn/poppler-windows/releases
+# Extract to: C:\Program Files\poppler\
+# Add to the system PATH: C:\Program Files\poppler\Library\bin
+
+# Verify the installation:
+pdfinfo -v
+```
+
+### 2. Tesseract OCR (fallback OCR engine)
+
+```powershell
+# Download the installer: https://github.com/UB-Mannheim/tesseract/wiki
+# Select the languages: English + Romanian
+# Default path: C:\Program Files\Tesseract-OCR\
+# Add it to the system PATH
+
+# Verify the installation:
+tesseract --version
+```
+
+### 3. Python OCR Packages
+
+```powershell
+cd C:\inetpub\wwwroot\roa2web\data-entry-backend
+.\venv\Scripts\activate
+
+pip install "paddlepaddle>=2.5.0"
+pip install "paddleocr>=2.7.0"
+pip install "opencv-python>=4.8.0"
+pip install "pytesseract>=0.3.10"
+pip install "pdf2image>=1.16.0"
+
+# Restart the service
+nssm restart ROA2WEB-DataEntry
+```
+
+### Important notes
+- **PaddleOCR** downloads models (~200MB) on first run
+- **RAM**: PaddleOCR needs ~2GB of available RAM
+- **PATH**: After changing PATH, restart the backend service
+- **Test OCR**: `curl http://localhost:8003/api/ocr/status`
+
+---
+
 ## 🔐 Security Recommendations

 1. **Generate Strong JWT Secret:**
@@ -353,9 +483,10 @@ For issues or questions:

 | Version | Date | Changes |
 |---------|------|---------|
+| 2.1.0 | 2025-12-18 | Added Data Entry App deployment support |
 | 2.0.0 | 2025-01-18 | Initial Windows deployment package |

 ---

-*ROA2WEB - Modern ERP Reports Application*
-*Windows Server Deployment Package v2.0.0*
+*ROA2WEB - Modern ERP Application (Reports + Data Entry)*
+*Windows Server Deployment Package v2.1.0*
diff --git a/deployment/windows/config/web.config.data-entry b/deployment/windows/config/web.config.data-entry
new file mode 100644
index 0000000..6eb8712
--- /dev/null
+++ b/deployment/windows/config/web.config.data-entry
@@ -0,0 +1,161 @@
[XML body of web.config.data-entry (161 added lines) not recoverable from this copy]