feat(ocr): Add docTR OCR engine with metrics infrastructure

Add docTR as primary OCR engine with 2-tier sequential processing, OCR metrics tracking, and simplified engine selection.

Features:
- docTR OCR engine with light+medium preprocessing tiers
- doctr_plus mode with early exit optimization (~65% fast path)
- OCR metrics dashboard with per-engine statistics
- User OCR preference persistence
- Parallel worker pool for OCR processing
- Cross-validation for extraction quality

Engine options: tesseract, doctr, doctr_plus (recommended), paddleocr

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
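
Review note: the message describes doctr_plus as a 2-tier sequential mode with early exit, but the diff shown here only covers the settings change. The control flow might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `OcrResult` type, the `run_two_tier` function, the `(text, confidence)` return shape of each pass, and the 0.85 threshold are hypothetical and are not taken from this commit.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical shape of one OCR pass: image bytes in, (text, confidence) out.
OcrPass = Callable[[bytes], Tuple[str, float]]


@dataclass
class OcrResult:
    text: str
    confidence: float  # assumed to be a mean confidence in [0, 1]
    tier: str          # which preprocessing tier produced the text


def run_two_tier(
    image: bytes,
    light: OcrPass,
    medium: OcrPass,
    threshold: float = 0.85,  # assumed cut-off, not from the commit
) -> OcrResult:
    """Run the light tier first; fall back to the medium tier only if needed."""
    text, conf = light(image)
    first = OcrResult(text, conf, "light")
    if conf >= threshold:
        # Early exit: the cheap pass was good enough (the "fast path").
        return first

    text, conf = medium(image)
    second = OcrResult(text, conf, "medium")
    # Keep whichever tier reported the higher confidence.
    return second if second.confidence > first.confidence else first
```

With this shape, the share of documents taking the fast path depends entirely on the threshold, so the ~65% figure in the message would presumably be measured against whatever cut-off the real implementation uses.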
@@ -48,6 +48,11 @@ class Settings(BaseSettings):
     # CORS
     cors_origins: str = "http://localhost:3010,http://localhost:3000"

+    # OCR Engines (comma-separated list of active engines shown in UI)
+    # Available: tesseract, paddleocr, doctr, doctr_plus
+    # doctr_plus is recommended (2-tier sequential with early exit)
+    ocr_active_engines: str = "doctr,doctr_plus"
+
     class Config:
         env_file = ".env"
         env_file_encoding = "utf-8"
@@ -80,6 +85,11 @@ class Settings(BaseSettings):
         """Get CORS origins as list."""
         return [origin.strip() for origin in self.cors_origins.split(",")]

+    @property
+    def ocr_active_engines_list(self) -> List[str]:
+        """Get OCR active engines as list."""
+        return [engine.strip() for engine in self.ocr_active_engines.split(",")]
+
     @property
     def oracle_dsn(self) -> str:
         """Get Oracle DSN string."""
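
Review note: `ocr_active_engines_list` splits on commas and strips whitespace, so an empty setting yields `[""]` and a trailing comma yields an empty entry; unknown names also pass through unchanged. A stricter parser could look like the sketch below. The `KNOWN_ENGINES` tuple is copied from the "Available" comment in the first hunk; the `parse_active_engines` helper, the lowercasing, and the de-duplication are assumptions for illustration and are not part of this commit.

```python
from typing import List

# Engine names taken from the "Available:" comment in the settings hunk.
KNOWN_ENGINES = ("tesseract", "paddleocr", "doctr", "doctr_plus")


def parse_active_engines(raw: str) -> List[str]:
    """Parse a comma-separated engine list, rejecting unknown names.

    Hypothetical helper: drops empty entries, normalises case,
    and removes duplicates while preserving order.
    """
    engines: List[str] = []
    for item in raw.split(","):
        name = item.strip().lower()
        if not name:
            continue  # skip entries produced by "" or a trailing comma
        if name not in KNOWN_ENGINES:
            raise ValueError(f"Unknown OCR engine: {name!r}")
        if name not in engines:
            engines.append(name)
    return engines
```

Raising on an unknown name surfaces a typo in `.env` at startup instead of leaving a non-existent engine in the UI list; silently dropping the entry would be the more lenient alternative.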