Major OCR infrastructure improvements:
- Add persistent SQLite-based job queue for OCR tasks
- Implement worker pool with process isolation and auto-restart
- Add OCR engine selector dropdown (Tesseract/PaddleOCR) in upload zone
- Optimize Tesseract preprocessing based on benchmark results (8x faster)
- Add recognize_cif_optimized() with multi-strategy CIF extraction
- Add Romanian CIF checksum validation
- Increase Telegram long polling timeout from 10s to 30s

Squashed commits:
- feat(ocr): Implement persistent worker pool with SQLite job queue
- feat(ocr): Add OCR engine selector dropdown to upload zone
- perf(telegram): Increase long polling timeout from 10s to 30s
- perf(ocr): Optimize Tesseract preprocessing based on benchmark results

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
43 lines
1.3 KiB
Python
"""
OCR Services Module

Provides persistent OCR worker pool with job queue for efficient processing.

Components:
- ocr_worker_pool: Manages ProcessPoolExecutor with persistent PaddleOCR
- job_queue: SQLite-based job queue for async processing
- job_worker: Background task that processes queued jobs
- tesseract_engine: Optimized Tesseract with multi-PSM and polarity fix

Architecture:
    FastAPI → job_queue.create_job() → SQLite
                                         ↓
    job_worker loop → ocr_worker_pool.submit_task() → Worker Process
                                                            ↓
                                                   PaddleOCR/Tesseract
"""

from .ocr_worker_pool import ocr_worker_pool, OCRWorkerPool
from .job_queue import job_queue, OCRJobQueue, OCRJob, OCRJobStatus
from .job_worker import start_job_worker, stop_job_worker
from .tesseract_engine import TesseractEngine
from .validation import OCRValidationEngine

__all__ = [
    # Worker pool
    "ocr_worker_pool",
    "OCRWorkerPool",
    # Job queue
    "job_queue",
    "OCRJobQueue",
    "OCRJob",
    "OCRJobStatus",
    # Job worker
    "start_job_worker",
    "stop_job_worker",
    # Engines
    "TesseractEngine",
    # Validation
    "OCRValidationEngine",
]
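
The flow described in the module docstring (create a job, persist it in SQLite, let a worker loop hand it to a pool) can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the table schema, `create_job`, `process_pending`, `fake_ocr`, and `demo` are all hypothetical names, and a `ThreadPoolExecutor` stands in for the `ProcessPoolExecutor` the docstring mentions so that the sketch runs without pickling concerns.

```python
import sqlite3
from concurrent.futures import Executor, ThreadPoolExecutor

# Hypothetical schema; the real job_queue table layout is not shown in the source.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ocr_jobs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    image_path TEXT NOT NULL,
    engine TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    result TEXT
)
"""


def create_job(conn, image_path, engine="tesseract"):
    """Persist a job row and return its id (stand-in for job_queue.create_job)."""
    cur = conn.execute(
        "INSERT INTO ocr_jobs (image_path, engine) VALUES (?, ?)",
        (image_path, engine),
    )
    conn.commit()
    return cur.lastrowid


def fake_ocr(image_path, engine):
    """Placeholder for the real PaddleOCR/Tesseract call."""
    return f"{engine}:{image_path}"


def process_pending(conn, pool: Executor):
    """One pass of a worker loop: run each pending job in the pool, store the outcome."""
    rows = conn.execute(
        "SELECT id, image_path, engine FROM ocr_jobs "
        "WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for job_id, image_path, engine in rows:
        # Mark the job as claimed before handing it to the pool.
        conn.execute(
            "UPDATE ocr_jobs SET status = 'processing' WHERE id = ?", (job_id,)
        )
        conn.commit()
        try:
            text = pool.submit(fake_ocr, image_path, engine).result()
            conn.execute(
                "UPDATE ocr_jobs SET status = 'completed', result = ? WHERE id = ?",
                (text, job_id),
            )
        except Exception as exc:  # record the failure instead of crashing the loop
            conn.execute(
                "UPDATE ocr_jobs SET status = 'failed', result = ? WHERE id = ?",
                (str(exc), job_id),
            )
        conn.commit()
    return len(rows)


def demo():
    """Queue one job in an in-memory database and process it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    job_id = create_job(conn, "invoice.png", "tesseract")
    with ThreadPoolExecutor(max_workers=1) as pool:
        process_pending(conn, pool)
    return conn.execute(
        "SELECT status, result FROM ocr_jobs WHERE id = ?", (job_id,)
    ).fetchone()
```

Because the job state lives in SQLite and not in memory, a job left in `pending` or `processing` after a crash is still visible on restart. That is presumably the motivation for the persistent queue, though the recovery policy is not shown in the source.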