feat(ocr): Add validation system and CLIENT CUI extraction

OCR Data Extraction Validation System:
- Add 7 validation rules (amount range, TVA ratio, payment sum, etc.)
- Add Medium preprocessing to replace Heavy (fixes digit concatenation)
- Add validation warnings to API responses
- Flag receipts needing manual review (needs_manual_review field)
- Add database migration for needs_manual_review column

CLIENT CUI Extraction Improvements:
- Support all format variations: CIF CLIENT:, CLIENT C.U.I/C.I.F., etc.
- Handle OCR errors (R0 vs RO, C1F vs CIF)
- Add client_name, client_cui, client_address to API response
- Add validation fields to API response (was missing)

QA Review: 12 issues found, 9 fixed (5 errors + 4 warnings)
- Fixed type safety in validation rules
- Fixed ZeroDivisionError risk
- Fixed schema mismatch (Optional[bool] for needs_manual_review)
- All 37 unit tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
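
For illustration only, the CLIENT CUI handling described in the message (label variants plus the R0/RO and C1F/CIF OCR confusions) might be sketched as below. The names `normalize_label_text`, `CLIENT_CUI_RE` and `extract_client_cui` are hypothetical and do not come from this commit, and the 2 to 10 digit length is an assumption about CUI numbers; the actual extractor may work quite differently.

```python
import re
from typing import Optional


def normalize_label_text(line: str) -> str:
    """Upper-case the line and undo the C1F -> CIF confusion (assumed OCR error)."""
    text = line.upper()
    return text.replace("C1F", "CIF").replace("C.1.F", "C.I.F")


# Hypothetical pattern covering "CIF CLIENT:", "CUI CLIENT", "CLIENT CIF",
# and "CLIENT C.U.I/C.I.F."; an optional RO prefix may be misread as R0.
CLIENT_CUI_RE = re.compile(
    r"(?:CLIENT\s+(?:C\.?U\.?I\.?|C\.?I\.?F\.?)(?:\s*/\s*C\.?I\.?F\.?)?"
    r"|(?:CUI|CIF)\s+CLIENT)"
    r"\s*[:.]?\s*"
    r"(R[O0])?\s*(\d{2,10})"  # assumption: a CUI has 2 to 10 digits
)


def extract_client_cui(line: str) -> Optional[str]:
    """Return the client CUI with a normalized 'RO' prefix, or None if absent."""
    match = CLIENT_CUI_RE.search(normalize_label_text(line))
    if match is None:
        return None
    prefix = "RO" if match.group(1) else ""  # "R0" is rewritten to "RO"
    return prefix + match.group(2)
```

Normalizing the label text before matching keeps the regular expression readable; an alternative would be to encode every OCR confusion directly in the pattern.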
@@ -20,6 +20,15 @@ class PaymentMethod(BaseModel):
    amount: Decimal = Field(description="Amount paid")


class ValidationWarning(BaseModel):
    """Validation warning from OCR extraction."""
    field: str = Field(description="Field name (e.g., 'amount', 'tva_total')")
    rule: str = Field(description="Rule name (e.g., 'amount_range', 'tva_ratio')")
    message: str = Field(description="Human-readable warning message")
    severity: str = Field(description="Severity: 'info', 'warning', 'error'")
    suggested_value: Optional[str] = Field(default=None, description="Suggested corrected value")


class ExtractionData(BaseModel):
    """Extracted receipt data from OCR."""

@@ -56,6 +65,13 @@ class ExtractionData(BaseModel):
    ocr_engine: str = Field(default="", description="OCR engine used: paddleocr or tesseract")
    processing_time_ms: int = Field(default=0, ge=0, description="Processing time in milliseconds")

    # Validation results (added by bon-ocr-validation feature)
    # needs_manual_review: None = not validated yet (old receipts), False = no review needed, True = needs review
    needs_manual_review: Optional[bool] = Field(default=None, description="Flag for supervisor review (None=not validated, False=ok, True=needs review)")
    validation_warnings: List[str] = Field(default=[], description="Validation warnings")
    validation_errors: List[str] = Field(default=[], description="Validation errors")
    inter_ocr_ratios: dict[str, float] = Field(default={}, description="Inter-OCR consistency ratios")

    class Config:
        """Pydantic config."""
        json_schema_extra = {
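
As a rough sketch of how the pieces in this diff could fit together, the following shows one TVA-ratio rule with a guard against the ZeroDivisionError mentioned in the commit message, and a helper that derives the tri-state `needs_manual_review` value. `WarningRecord` is a stdlib stand-in that mirrors the `ValidationWarning` fields so the example runs without Pydantic; `check_tva_ratio`, `derive_needs_manual_review` and the `max_ratio` threshold are hypothetical placeholders, not the rules added by this commit.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional


@dataclass
class WarningRecord:
    """Stdlib stand-in mirroring the ValidationWarning fields in the diff."""
    field: str
    rule: str
    message: str
    severity: str  # 'info', 'warning' or 'error'
    suggested_value: Optional[str] = None


def check_tva_ratio(
    total: Optional[Decimal],
    tva_total: Optional[Decimal],
    max_ratio: Decimal = Decimal("0.25"),  # placeholder threshold, an assumption
) -> Optional[WarningRecord]:
    """Hypothetical 'tva_ratio' rule: flag TVA that is an implausible share of the total."""
    if total is None or tva_total is None:
        return None  # nothing to validate
    if total == 0:
        # Guard: never compute tva_total / total when total is zero.
        return WarningRecord("amount", "tva_ratio", "Total is zero; TVA ratio is undefined", "error")
    ratio = tva_total / total
    if ratio < 0 or ratio > max_ratio:
        return WarningRecord("tva_total", "tva_ratio", f"TVA is {ratio:.3f} of the total", "warning")
    return None


def derive_needs_manual_review(warnings: Optional[List[WarningRecord]]) -> Optional[bool]:
    """None = not validated yet, False = no review needed, True = needs review."""
    if warnings is None:
        return None
    return any(w.severity in ("warning", "error") for w in warnings)
```

Keeping `None` distinct from `False` lets receipts stored before the migration stay distinguishable from receipts that were validated and passed.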