Marius Mutu 74f7aefc26 feat(ocr): Implement persistent worker pool with SQLite job queue
Major OCR infrastructure improvements:
- Add persistent SQLite-based job queue for OCR tasks
- Implement worker pool with process isolation and auto-restart
- Add OCR engine selector dropdown (Tesseract/PaddleOCR) in upload zone
- Optimize Tesseract preprocessing based on benchmark results (8x faster)
- Add recognize_cif_optimized() with multi-strategy CIF extraction
- Add Romanian CIF checksum validation
- Increase Telegram long polling timeout from 10s to 30s

Squashed commits:
- feat(ocr): Implement persistent worker pool with SQLite job queue
- feat(ocr): Add OCR engine selector dropdown to upload zone
- perf(telegram): Increase long polling timeout from 10s to 30s
- perf(ocr): Optimize Tesseract preprocessing based on benchmark results

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 12:32:12 +02:00
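
The commit lists Romanian CIF checksum validation without showing it. The sketch below illustrates the commonly documented control-digit rule (weights 7 5 3 2 1 7 5 3 2, right-aligned against the digits before the control digit). The name `is_valid_cif` and the handling of an `RO` prefix are assumptions for illustration; the validation code in this repository may differ.

```python
# Weights for the Romanian CIF/CUI control digit, applied right-aligned
# to every digit except the last one (the control digit itself).
CIF_WEIGHTS = (7, 5, 3, 2, 1, 7, 5, 3, 2)


def is_valid_cif(value: str) -> bool:
    """Hypothetical helper: True if `value` passes the CIF control-digit check."""
    text = value.strip().upper()
    # Assumption: an optional "RO" VAT prefix is tolerated and ignored.
    if text.startswith("RO"):
        text = text[2:].strip()
    # A CIF has 2 to 10 ASCII digits, the last one being the control digit.
    if not (text.isascii() and text.isdigit()) or not 2 <= len(text) <= 10:
        return False
    body, control = text[:-1], int(text[-1])
    # Right-align the body against the weights by dropping leading weights.
    weights = CIF_WEIGHTS[len(CIF_WEIGHTS) - len(body):]
    total = sum(int(digit) * weight for digit, weight in zip(body, weights))
    expected = (total * 10) % 11
    # A remainder of 10 maps to control digit 0.
    if expected == 10:
        expected = 0
    return control == expected
```

A check like this is cheap enough to run on every OCR candidate, so it can filter misread digits before a result is accepted.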

"""
OCR Services Module

Provides a persistent OCR worker pool with a job queue for efficient processing.

Components:
- ocr_worker_pool: Manages ProcessPoolExecutor with persistent PaddleOCR
- job_queue: SQLite-based job queue for async processing
- job_worker: Background task that processes queued jobs
- tesseract_engine: Optimized Tesseract with multi-PSM and polarity fix

Architecture:
    FastAPI → job_queue.create_job() → SQLite
    job_worker loop → ocr_worker_pool.submit_task() → Worker Process
    Worker Process → PaddleOCR/Tesseract
"""

from .ocr_worker_pool import ocr_worker_pool, OCRWorkerPool
from .job_queue import job_queue, OCRJobQueue, OCRJob, OCRJobStatus
from .job_worker import start_job_worker, stop_job_worker
from .tesseract_engine import TesseractEngine
from .validation import OCRValidationEngine

__all__ = [
    # Worker pool
    "ocr_worker_pool",
    "OCRWorkerPool",
    # Job queue
    "job_queue",
    "OCRJobQueue",
    "OCRJob",
    "OCRJobStatus",
    # Job worker
    "start_job_worker",
    "stop_job_worker",
    # Engines
    "TesseractEngine",
    # Validation
    "OCRValidationEngine",
]
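
The architecture in the module docstring (a request creates a job in SQLite, and a worker loop later picks it up) can be sketched with the standard-library `sqlite3` module. Everything here is illustrative: `MiniJobQueue`, its table layout, and the status strings are assumptions, not the actual `OCRJobQueue` or `OCRJobStatus` exported above.

```python
import sqlite3
from typing import Optional, Tuple


class MiniJobQueue:
    """Illustrative stand-in for a SQLite-backed job queue (hypothetical)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "image_path TEXT NOT NULL, "
            "status TEXT NOT NULL DEFAULT 'pending', "
            "result TEXT)"
        )
        self.conn.commit()

    def create_job(self, image_path: str) -> int:
        """Persist a new pending job and return its id."""
        with self.conn:  # commits on success, rolls back on error
            cur = self.conn.execute(
                "INSERT INTO jobs (image_path) VALUES (?)", (image_path,)
            )
        return cur.lastrowid

    def claim_next(self) -> Optional[Tuple[int, str]]:
        """Mark the oldest pending job as processing and return (id, image_path).

        Single-process sketch: several competing workers would need a
        stricter transaction (for example BEGIN IMMEDIATE) around this.
        """
        with self.conn:
            row = self.conn.execute(
                "SELECT id, image_path FROM jobs "
                "WHERE status = 'pending' ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                return None
            self.conn.execute(
                "UPDATE jobs SET status = 'processing' WHERE id = ?", (row[0],)
            )
        return row

    def complete(self, job_id: int, result: str) -> None:
        """Store the OCR result and mark the job as completed."""
        with self.conn:
            self.conn.execute(
                "UPDATE jobs SET status = 'completed', result = ? WHERE id = ?",
                (result, job_id),
            )

    def status(self, job_id: int) -> Optional[str]:
        """Return the job's status string, or None if the id is unknown."""
        row = self.conn.execute(
            "SELECT status FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        return row[0] if row else None
```

Because the jobs live in a database file instead of in memory, pending work survives a restart of the web process when a real file path is used in place of `:memory:`.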