refactor(docs): consolidate and cleanup documentation
- Delete 9 deprecated/obsolete docs (~6,300 lines removed)
- Move test PDFs to tests/fixtures/ocr-samples/
- Create docs/DEPLOYMENT.md as principal guide
- Create tests/ocr-validation/README.md
- Update all refs for ultrathin monolith architecture
- Update OCR tests to use relative paths

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -1,382 +0,0 @@
# Data Entry App - Fiscal Receipts

Application for entering fiscal receipts, with an approval workflow and automatic data extraction via OCR.

## Quick Start

### Prerequisites

- Python 3.10+
- Node.js 18+
- (Optional) SSH tunnel for the Oracle nomenclatures

### Using Start Script (Recommended)

```bash
# Start all services
./start-data-entry.sh

# Or individual commands:
./start-data-entry.sh start            # Start all
./start-data-entry.sh stop             # Stop all
./start-data-entry.sh status           # Check status
./start-data-entry.sh restart backend  # Restart backend only
```

**Services:**
- Backend: http://localhost:8003
- Frontend: http://localhost:3010
- API Docs: http://localhost:8003/docs

### Manual Setup

#### Backend Setup

```bash
cd backend/modules/data_entry/backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# Create .env file
cp .env.example .env
# Edit .env with your settings

# Run migrations
alembic upgrade head

# Start server
uvicorn app.main:app --reload --port 8003
```

#### Frontend Setup

```bash
cd backend/modules/data_entry/frontend

# Install dependencies
npm install

# Start dev server
npm run dev -- --port 3010
```

## Features

### For Users
- **Automatic OCR** - Automatic extraction of data from the receipt photo (amount, date, supplier, CUI, the Romanian company tax ID)
- Upload fiscal receipt photos
- Fill in receipt data (amount, date, supplier)
- Select the expense type
- Submit for approval

### For Accountants
- View pending receipts
- Edit proposed accounting entries
- Approve/reject receipts
- Bulk approval

## OCR Feature

### How it works

1. **Upload image** - Drag and drop or select the receipt photo
2. **OCR processing** - Click "Proceseaza cu OCR" (Process with OCR)
3. **Preview** - The extracted data is displayed with confidence indicators
4. **Apply** - Click "Aplica datele in formular" (Apply data to form) to auto-fill the form

### Automatically extracted fields

| Field | Estimated accuracy |
|-------|--------------------|
| Amount (TOTAL) | 90-95% |
| Date | 85-90% |
| Receipt number | 80-85% |
| Supplier | 70-80% |
| CUI | 85-90% |
| Document type | 95%+ |
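
To illustrate how fields such as the amount, date, and CUI might be pulled out of raw OCR text, here is a minimal sketch. The regular expressions, the `extract_fields` function, and the sample text are all hypothetical and written for this example; they are not the patterns used in `ocr_extractor.py`, and real receipts vary by cash-register model.

```python
import re
from datetime import date
from decimal import Decimal
from typing import Dict, Optional

# Hypothetical patterns, for illustration only.
TOTAL_RE = re.compile(r"\bTOTAL\b\D*?(\d+[.,]\d{2})", re.IGNORECASE)
CUI_RE = re.compile(r"\b(?:CUI|CIF)\s*:?\s*(?:RO)?\s*(\d{2,10})\b", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{2})[./-](\d{2})[./-](\d{4})\b")


def extract_total(text: str) -> Optional[Decimal]:
    """Return the first amount that follows the word TOTAL, if any."""
    match = TOTAL_RE.search(text)
    if match is None:
        return None
    # Romanian receipts usually print a decimal comma; normalise it to a dot.
    return Decimal(match.group(1).replace(",", "."))


def extract_cui(text: str) -> Optional[str]:
    """Return the digits of the tax ID, without the optional RO prefix."""
    match = CUI_RE.search(text)
    return match.group(1) if match else None


def extract_date(text: str) -> Optional[date]:
    """Return the first DD.MM.YYYY (or DD/MM/YYYY, DD-MM-YYYY) date, if valid."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    day, month, year = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. 31.02.2024 produced by an OCR misread
        return None


def extract_fields(text: str) -> Dict[str, object]:
    """Collect all fields; a missing field is reported as None."""
    return {
        "total": extract_total(text),
        "cui": extract_cui(text),
        "date": extract_date(text),
    }
```

Returning `None` for a field that was not found lets the form leave that input empty for manual entry instead of pre-filling a guess.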

### OCR System Dependencies (Linux/Docker)

OCR requires the following system packages:

```bash
# Ubuntu/Debian
apt-get install -y \
    tesseract-ocr \
    tesseract-ocr-ron \
    tesseract-ocr-eng \
    poppler-utils \
    libgl1-mesa-glx \
    libglib2.0-0

# Fedora/RHEL
dnf install -y \
    tesseract \
    tesseract-langpack-ron \
    tesseract-langpack-eng \
    poppler-utils
```

**Note:** PaddleOCR (the primary engine) is installed automatically via pip. Tesseract is used as a fallback.
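
The primary/fallback arrangement can be sketched as a small helper that tries each engine in order. This is only an illustration of the pattern: the `OcrEngine` signature and `run_with_fallback` are hypothetical names, and the engines are passed in as plain callables so that the sketch does not depend on the actual PaddleOCR or Tesseract APIs or on how `ocr_engine.py` is written.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical engine signature: raw image bytes in, recognised text out.
OcrEngine = Callable[[bytes], str]


def run_with_fallback(
    image: bytes, engines: Sequence[Tuple[str, OcrEngine]]
) -> Tuple[str, str]:
    """Try each engine in order; return (engine_name, text) from the first success."""
    errors: List[str] = []
    for name, engine in engines:
        try:
            text = engine(image)
        except Exception as exc:  # any engine failure moves on to the next engine
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError("all OCR engines failed: " + "; ".join(errors))
```

A caller would pass the engines in priority order, for example `[("paddleocr", paddle_fn), ("tesseract", tesseract_fn)]`, where both functions are wrappers the caller supplies.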

### OCR System Dependencies (Windows)

On Windows Server the following components must be installed manually:

#### 1. Poppler (for PDF → image conversion)

```powershell
# Download Poppler for Windows
# https://github.com/osborn/poppler-windows/releases
# or https://github.com/bblanchon/pdfium-binaries

# Extract to C:\Program Files\poppler\
# Add to PATH: C:\Program Files\poppler\Library\bin
```

#### 2. Tesseract OCR (fallback OCR engine)

```powershell
# Download the installer from:
# https://github.com/UB-Mannheim/tesseract/wiki

# Install with the languages: English + Romanian
# Default path: C:\Program Files\Tesseract-OCR\
# Add it to PATH
```

#### 3. Python OCR Dependencies (in venv)

```powershell
cd C:\inetpub\wwwroot\roa2web\data-entry-backend
.\venv\Scripts\activate

# Install the OCR dependencies
# (quote each specifier so the shell does not treat ">" as a redirect)
pip install "paddlepaddle>=2.5.0"
pip install "paddleocr>=2.7.0"
pip install "opencv-python>=4.8.0"
pip install "pytesseract>=0.3.10"
pip install "pdf2image>=1.16.0"

# Or from requirements.txt
pip install -r requirements.txt
```

#### 4. Restart the service

```powershell
nssm restart ROA2WEB-DataEntry
```

**Important notes for Windows:**
- The first PaddleOCR run downloads models (~200MB), which can take a few minutes
- PaddleOCR needs ~2GB of available RAM
- Check the PATH for Poppler and Tesseract after installation
- Restart the backend service after any PATH change
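
The PATH check can be scripted with the standard library. The sketch below assumes the executables of interest are `pdfinfo` and `pdftoppm` (the Poppler tools that `pdf2image` calls) plus `tesseract`; the function names are made up for this example.

```python
import shutil
from typing import Dict, Iterable, List, Optional

# Poppler tools used by pdf2image, plus the Tesseract CLI.
REQUIRED_TOOLS = ("pdfinfo", "pdftoppm", "tesseract")


def find_tools(names: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the sorted names of the tools that could not be resolved."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    located = find_tools()
    for tool, path in located.items():
        print(f"{tool}: {path or 'NOT FOUND'}")
    if missing_tools(located):
        print("Add the missing tools to PATH, then restart the backend service.")
```

Run it from the same account and environment as the backend service, since a Windows service may see a different PATH than an interactive shell.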

### OCR API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/ocr/status | Check OCR service status |
| POST | /api/ocr/extract | Extract data from uploaded image |
| POST | /api/ocr/extract-attachment/{id} | Re-process existing attachment |

### Test OCR

```bash
# Check OCR status
curl http://localhost:8003/api/ocr/status

# Extract from image
curl -X POST -F "file=@bon.jpg" http://localhost:8003/api/ocr/extract
```
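
The same status check can be done from Python with only the standard library. This is a sketch under two assumptions: that the backend listens on `http://localhost:8003` as listed under Services, and that `/api/ocr/status` returns a JSON body (the response fields are not documented here, so the code does not rely on any of them). `endpoint` and `get_ocr_status` are names invented for the example.

```python
import json
import urllib.request
from urllib.parse import urljoin

BASE_URL = "http://localhost:8003"  # backend address from the Services list


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an API path without doubling or dropping slashes."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def get_ocr_status(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET /api/ocr/status and decode the body, assuming it is JSON."""
    url = endpoint("/api/ocr/status", base_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print(get_ocr_status())
    except OSError as exc:  # URLError and connection errors are OSError subclasses
        print(f"OCR status check failed: {exc}")
```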

## Workflow

```
DRAFT → PENDING_REVIEW → APPROVED/REJECTED → (SYNCED in Oracle)
```

1. **DRAFT**: The user fills in the data (manually or via OCR)
2. **PENDING_REVIEW**: The system generates the accounting entries automatically
3. **APPROVED**: The accountant has approved the receipt
4. **REJECTED**: The accountant has rejected the receipt (the user can correct it)
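
The workflow above can be modelled as a small state machine. The sketch below is illustrative only: `ReceiptStatus`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical names, and the `REJECTED → DRAFT` edge is an assumption drawn from the remark that the user can correct a rejected receipt. The `APPROVED → SYNCED` edge corresponds to the Oracle sync, which is listed as future work.

```python
from enum import Enum
from typing import Dict, Set


class ReceiptStatus(str, Enum):
    DRAFT = "DRAFT"
    PENDING_REVIEW = "PENDING_REVIEW"
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"
    SYNCED = "SYNCED"


ALLOWED_TRANSITIONS: Dict[ReceiptStatus, Set[ReceiptStatus]] = {
    ReceiptStatus.DRAFT: {ReceiptStatus.PENDING_REVIEW},
    ReceiptStatus.PENDING_REVIEW: {ReceiptStatus.APPROVED, ReceiptStatus.REJECTED},
    # Assumption: a rejected receipt returns to DRAFT so the user can fix it.
    ReceiptStatus.REJECTED: {ReceiptStatus.DRAFT},
    # Planned: approved receipts are later synced to Oracle.
    ReceiptStatus.APPROVED: {ReceiptStatus.SYNCED},
    ReceiptStatus.SYNCED: set(),
}


def transition(current: ReceiptStatus, target: ReceiptStatus) -> ReceiptStatus:
    """Return the target status if the move is allowed, else raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move receipt from {current.value} to {target.value}")
    return target
```

Keeping the allowed moves in one table makes it straightforward to reject calls such as approving a receipt that was never submitted.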

## Project Structure

```
backend/modules/data_entry/
├── backend/
│   ├── app/
│   │   ├── main.py                    # FastAPI entry point
│   │   ├── config.py                  # Settings
│   │   ├── db/
│   │   │   ├── database.py            # SQLite engine
│   │   │   ├── models/                # SQLModel models
│   │   │   └── crud/                  # CRUD operations
│   │   ├── schemas/                   # Pydantic schemas
│   │   │   └── ocr.py                 # OCR response schemas
│   │   ├── services/
│   │   │   ├── receipt_service.py
│   │   │   ├── ocr_service.py         # OCR orchestration
│   │   │   ├── ocr_engine.py          # PaddleOCR/Tesseract
│   │   │   ├── ocr_extractor.py       # Romanian regex patterns
│   │   │   └── image_preprocessor.py  # OpenCV pipeline
│   │   └── routers/
│   │       ├── receipts.py
│   │       └── ocr.py                 # OCR endpoints
│   ├── migrations/                    # Alembic migrations
│   ├── data/
│   │   ├── receipts.db                # SQLite database
│   │   └── uploads/                   # Uploaded files
│   └── requirements.txt
│
├── frontend/
│   ├── src/
│   │   ├── views/receipts/            # Page components
│   │   ├── components/
│   │   │   ├── receipts/              # Receipt components
│   │   │   └── ocr/                   # OCR components
│   │   │       ├── OCRUploadZone.vue
│   │   │       ├── OCRPreview.vue
│   │   │       └── OCRConfidenceIndicator.vue
│   │   ├── stores/                    # Pinia stores
│   │   └── router/                    # Vue Router
│   ├── package.json
│   └── vite.config.js
│
└── docs/                              # Documentation
```

## Environment Variables

### Backend (.env)

```bash
# SQLite
SQLITE_DATABASE_PATH=data/receipts.db

# File uploads
UPLOAD_PATH=data/uploads
MAX_UPLOAD_SIZE_MB=10

# Oracle (for nomenclatures)
ORACLE_USER=CONTAFIN_ORACLE
ORACLE_PASSWORD=your_password
ORACLE_HOST=localhost
ORACLE_PORT=1526
ORACLE_SID=ROA

# JWT (shared with Reports module)
JWT_SECRET_KEY=your_secret_key
JWT_ALGORITHM=HS256
```
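
As an illustration of how these variables might be read and type-checked at startup, here is a standard-library sketch. It is not the contents of `config.py`; the `Settings` dataclass and `load_settings` function are hypothetical, the defaults simply mirror the sample values above, and the credential and secret variables are deliberately left out.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    sqlite_database_path: str
    upload_path: str
    max_upload_size_mb: int
    oracle_host: str
    oracle_port: int
    oracle_sid: str
    jwt_algorithm: str

    @property
    def max_upload_size_bytes(self) -> int:
        """Upload limit converted from megabytes to bytes."""
        return self.max_upload_size_mb * 1024 * 1024


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, falling back to the sample .env values."""
    return Settings(
        sqlite_database_path=env.get("SQLITE_DATABASE_PATH", "data/receipts.db"),
        upload_path=env.get("UPLOAD_PATH", "data/uploads"),
        max_upload_size_mb=int(env.get("MAX_UPLOAD_SIZE_MB", "10")),
        oracle_host=env.get("ORACLE_HOST", "localhost"),
        oracle_port=int(env.get("ORACLE_PORT", "1526")),
        oracle_sid=env.get("ORACLE_SID", "ROA"),
        jwt_algorithm=env.get("JWT_ALGORITHM", "HS256"),
    )
```

Converting the numeric values with `int()` at load time means a malformed `ORACLE_PORT` or `MAX_UPLOAD_SIZE_MB` fails immediately with a `ValueError` instead of surfacing later during a request.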

## Development

### Create new migration

```bash
cd backend
alembic revision --autogenerate -m "Add new field"
alembic upgrade head
```

### Run tests

```bash
# Backend
cd backend && pytest

# Frontend
cd frontend && npm run test
```

## API Documentation

Full API documentation is available at http://localhost:8003/docs when the backend is running.

### Key Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /api/receipts/ | Create receipt |
| GET | /api/receipts/ | List receipts |
| GET | /api/receipts/{id} | Get receipt details |
| POST | /api/receipts/{id}/submit | Submit for review |
| POST | /api/receipts/{id}/approve | Approve receipt |
| POST | /api/receipts/{id}/reject | Reject receipt |
| POST | /api/receipts/{id}/attachments | Upload attachment |
| GET | /api/ocr/status | OCR service status |
| POST | /api/ocr/extract | OCR image extraction |

## Troubleshooting

### OCR not working

1. Check OCR status: `curl http://localhost:8003/api/ocr/status`
2. Install system dependencies (tesseract, poppler)
3. Verify PaddleOCR is installed: `python -c "from paddleocr import PaddleOCR"`
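
Step 3 can be extended to all of the Python OCR packages at once. The sketch below checks whether each module can be located without actually importing it, which avoids triggering PaddleOCR's model download. The module list is an assumption based on the pip packages named in this README (`paddlepaddle` is imported as `paddle`, `opencv-python` as `cv2`); the function names are made up for the example.

```python
import importlib.util
from typing import Dict, Iterable

# Import names for paddlepaddle, paddleocr, opencv-python, pytesseract, pdf2image.
OCR_MODULES = ("paddle", "paddleocr", "cv2", "pytesseract", "pdf2image")


def module_available(name: str) -> bool:
    """True if a top-level module can be found in the active environment."""
    return importlib.util.find_spec(name) is not None


def check_modules(names: Iterable[str] = OCR_MODULES) -> Dict[str, bool]:
    """Map each module name to whether it is installed."""
    return {name: module_available(name) for name in names}


if __name__ == "__main__":
    for module, present in check_modules().items():
        print(f"{module}: {'ok' if present else 'MISSING'}")
```

Run it inside the backend's virtual environment; a module reported as missing there points to a `pip install` problem, not a PATH problem.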

### OCR Windows - "poppler not in PATH"

```powershell
# Error: "Unable to get page count. Is poppler installed and in PATH?"

# Solution 1: Add Poppler to PATH
# System Properties → Environment Variables → System variables → Path → New
# Add: C:\Program Files\poppler\Library\bin

# Solution 2: Restart the service after changing PATH
nssm restart ROA2WEB-DataEntry

# Verify:
pdfinfo -v
```

### OCR Windows - "tesseract not found"

```powershell
# Error: "tesseract is not installed or it's not in your PATH"

# Solution: Add Tesseract to PATH
# C:\Program Files\Tesseract-OCR\

# Verify:
tesseract --version
tesseract --list-langs  # Should list 'ron' and 'eng'
```

### OCR Windows - PaddleOCR import error

```powershell
# Error: "No module named 'paddleocr'"

cd C:\inetpub\wwwroot\roa2web\data-entry-backend
.\venv\Scripts\activate
# Quote the specifiers so the shell does not treat ">" as a redirect
pip install "paddlepaddle>=2.5.0"
pip install "paddleocr>=2.7.0"

# Restart the service
nssm restart ROA2WEB-DataEntry
```

### Low OCR accuracy

- Ensure good lighting when taking receipt photos
- Keep receipt flat (no folds/wrinkles)
- Try PDF instead of JPG for scanned documents
- Check if text is in focus

## Phase 2 (Future)

- Oracle sync for approved receipts
- Integration with pack_contafin procedures
- Automatic posting to ACT/RUL tables