feat(ocr): Implement persistent worker pool with SQLite job queue

Major OCR infrastructure improvements:
- Add persistent SQLite-based job queue for OCR tasks
- Implement worker pool with process isolation and auto-restart
- Add OCR engine selector dropdown (Tesseract/PaddleOCR) in upload zone
- Optimize Tesseract preprocessing based on benchmark results (8x faster)
- Add recognize_cif_optimized() with multi-strategy CIF extraction
- Add Romanian CIF checksum validation
- Increase Telegram long polling timeout from 10s to 30s

Squashed commits:
- feat(ocr): Implement persistent worker pool with SQLite job queue
- feat(ocr): Add OCR engine selector dropdown to upload zone
- perf(telegram): Increase long polling timeout from 10s to 30s
- perf(ocr): Optimize Tesseract preprocessing based on benchmark results

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
2025-12-31 12:32:12 +02:00
parent 00f826f7ed
commit 74f7aefc26
23 changed files with 3616 additions and 209 deletions

View File

@@ -260,7 +260,11 @@ async def main():
logger.info("🤖 Starting Telegram bot polling...")
await telegram_app.initialize()
await telegram_app.start()
await telegram_app.updater.start_polling(drop_pending_updates=True)
await telegram_app.updater.start_polling(
drop_pending_updates=True,
poll_interval=0, # No delay between polls
timeout=30 # Long poll timeout 30 seconds (reduces requests from ~6/min to ~2/min)
)
logger.info("✅ Telegram bot is now running and polling for updates")
logger.info(f"📱 Bot ready to receive messages at @{(await telegram_app.bot.get_me()).username}")