docs: Add Windows OCR dependencies and fix IIS API error handling
- Add OCR installation instructions for Windows (Poppler, Tesseract, PaddleOCR)
- Add troubleshooting section for common OCR errors on Windows
- Fix web.config.data-entry to use existingResponse="Auto" instead of "Replace"
  This allows FastAPI JSON error responses to pass through IIS unchanged
- Update system requirements to recommend 16GB RAM for OCR workloads

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -125,6 +125,61 @@ dnf install -y \

**Note:** PaddleOCR (the primary engine) is installed automatically by pip. Tesseract is used as a fallback.

### OCR System Dependencies (Windows)

On Windows Server, the following components must be installed manually:

#### 1. Poppler (for PDF → image conversion)

```powershell
# Download Poppler for Windows
# https://github.com/osborn/poppler-windows/releases
# or https://github.com/bblanchon/pdfium-binaries

# Extract to C:\Program Files\poppler\
# Add to PATH: C:\Program Files\poppler\Library\bin
```

#### 2. Tesseract OCR (backup OCR engine)

```powershell
# Download the installer from:
# https://github.com/UB-Mannheim/tesseract/wiki

# Install with languages: English + Romanian
# Default path: C:\Program Files\Tesseract-OCR\
# Add it to PATH
```

#### 3. Python OCR Dependencies (in venv)

```powershell
cd C:\inetpub\wwwroot\roa2web\data-entry-backend
.\venv\Scripts\activate

# Install the OCR dependencies
# (quoted so PowerShell does not treat ">" as output redirection)
pip install "paddlepaddle>=2.5.0"
pip install "paddleocr>=2.7.0"
pip install "opencv-python>=4.8.0"
pip install "pytesseract>=0.3.10"
pip install "pdf2image>=1.16.0"

# Or from requirements.txt
pip install -r requirements.txt
```

#### 4. Restart the service

```powershell
nssm restart ROA2WEB-DataEntry
```

**Important notes for Windows:**
- The first PaddleOCR run downloads models (~200MB); this can take a few minutes
- PaddleOCR needs ~2GB of available RAM
- Check the PATH entries for Poppler and Tesseract after installation
- Restart the backend service after any PATH change

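The PATH check in the notes above can be sketched in Python. This is a minimal sketch, not part of the project: `REQUIRED_TOOLS` and `find_missing_tools` are hypothetical names, and the executable names are taken from the verification commands in this guide.

```python
import shutil

# Executables the OCR stack is expected to find on PATH
# (hypothetical list, based on the commands used in this guide).
REQUIRED_TOOLS = ("pdfinfo", "tesseract")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be resolved on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("Poppler and Tesseract are both reachable on PATH.")
```

A shell opened before the PATH change keeps the old value, so run the check from a new session.
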
### OCR API Endpoints

| Method | Endpoint | Description |
@@ -270,6 +325,49 @@ Full API documentation available at http://localhost:8003/docs when backend is r
2. Install system dependencies (tesseract, poppler)
3. Verify PaddleOCR installed: `python -c "from paddleocr import PaddleOCR"`

### OCR Windows - "poppler not in PATH"

```powershell
# Error: "Unable to get page count. Is poppler installed and in PATH?"

# Step 1: Add Poppler to PATH
# System Properties → Environment Variables → System variables → Path → New
# Add: C:\Program Files\poppler\Library\bin

# Step 2: Restart the service after changing PATH
nssm restart ROA2WEB-DataEntry

# Verify:
pdfinfo -v
```

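If changing the system PATH is not practical, pdf2image's `convert_from_path` also accepts a `poppler_path` argument. The following is a minimal sketch that assumes the install folder from this guide. `POPPLER_BIN`, `poppler_kwargs`, and `pdf_to_images` are hypothetical names, not taken from the backend code.

```python
from pathlib import Path

# Assumed install location, taken from the extraction step in this guide.
POPPLER_BIN = Path(r"C:\Program Files\poppler\Library\bin")


def poppler_kwargs(poppler_bin=POPPLER_BIN):
    """Return pdf2image keyword arguments; set poppler_path only if the folder exists."""
    if Path(poppler_bin).is_dir():
        return {"poppler_path": str(poppler_bin)}
    return {}  # empty dict: pdf2image falls back to looking Poppler up on PATH


def pdf_to_images(pdf_path, dpi=200):
    """Convert a PDF into a list of PIL images, one per page."""
    # Imported lazily so the helper above can be used without pdf2image installed.
    from pdf2image import convert_from_path

    return convert_from_path(str(pdf_path), dpi=dpi, **poppler_kwargs())
```

Passing the folder explicitly removes the dependency on the service account's PATH, at the cost of hard-coding an install location.
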
### OCR Windows - "tesseract not found"

```powershell
# Error: "tesseract is not installed or it's not in your PATH"

# Solution: Add Tesseract to PATH
# C:\Program Files\Tesseract-OCR\

# Verify:
tesseract --version
tesseract --list-langs  # Should list 'ron' and 'eng'
```

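The language check above can also be automated. This is a rough sketch that parses the text printed by `tesseract --list-langs`. The helper names (`parse_langs`, `missing_langs`, `installed_langs_output`) are hypothetical, and the header format is an assumption based on typical Tesseract output.

```python
import subprocess

# Languages this guide asks for: English and Romanian.
REQUIRED_LANGS = {"eng", "ron"}


def parse_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Skip the header line, assumed to start with "List of available languages".
    return {line for line in lines if not line.lower().startswith("list of")}


def missing_langs(output, required=REQUIRED_LANGS):
    """Return required language codes that are absent from the output, sorted."""
    return sorted(set(required) - parse_langs(output))


def installed_langs_output(cmd="tesseract"):
    """Run tesseract and return its language listing as text."""
    result = subprocess.run(
        [cmd, "--list-langs"], capture_output=True, text=True, check=True
    )
    # Some Tesseract builds print the list to stderr, so both streams are combined.
    return result.stdout + result.stderr
```

Calling `missing_langs(installed_langs_output())` returns an empty list when both language packs are installed.
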
### OCR Windows - PaddleOCR import error

```powershell
# Error: "No module named 'paddleocr'"

cd C:\inetpub\wwwroot\roa2web\data-entry-backend
.\venv\Scripts\activate
# Quoted so PowerShell does not treat ">" as output redirection
pip install "paddlepaddle>=2.5.0"
pip install "paddleocr>=2.7.0"

# Restart the service
nssm restart ROA2WEB-DataEntry
```

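A "No module named" error often means the packages were installed into a different interpreter than the one the service runs. The sketch below reports which OCR modules the current interpreter can find, without importing them. `OCR_MODULES` and `missing_modules` are hypothetical names. The import names (`paddle` for paddlepaddle, `cv2` for opencv-python) are those the packages expose.

```python
import importlib.util
import sys

# Import names for the packages installed in this guide.
OCR_MODULES = ("paddle", "paddleocr", "cv2", "pytesseract", "pdf2image")


def missing_modules(modules=OCR_MODULES):
    """Return the top-level module names the current interpreter cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The interpreter path shows whether the venv is actually in use.
    print("Interpreter:", sys.executable)
    print("Missing modules:", missing_modules() or "none")
```

Run it with the venv's `python.exe` so the result reflects the environment the backend uses.
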
### Low OCR accuracy

- Ensure good lighting when taking receipt photos