feat(ocr): Add validation system and CLIENT CUI extraction
OCR Data Extraction Validation System: - Add 7 validation rules (amount range, TVA ratio, payment sum, etc.) - Add Medium preprocessing to replace Heavy (fixes digit concatenation) - Add validation warnings to API responses - Flag receipts needing manual review (needs_manual_review field) - Add database migration for needs_manual_review column CLIENT CUI Extraction Improvements: - Support all format variations: CIF CLIENT:, CLIENT C.U.I/C.I.F., etc. - Handle OCR errors (R0 vs RO, C1F vs CIF) - Add client_name, client_cui, client_address to API response - Add validation fields to API response (was missing) QA Review: 12 issues found, 9 fixed (5 errors + 4 warnings) - Fixed type safety in validation rules - Fixed ZeroDivisionError risk - Fixed schema mismatch (Optional[bool] for needs_manual_review) - All 37 unit tests passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
@@ -118,13 +118,23 @@ async def extract_from_image(file: UploadFile = File(...)):
|
||||
items_count=result.items_count,
|
||||
payment_methods=payment_methods_list,
|
||||
suggested_payment_mode=suggested_payment_mode,
|
||||
# Client data (B2B receipts)
|
||||
client_name=result.client_name,
|
||||
client_cui=result.client_cui,
|
||||
client_address=result.client_address,
|
||||
confidence_amount=result.confidence_amount,
|
||||
confidence_date=result.confidence_date,
|
||||
confidence_vendor=result.confidence_vendor,
|
||||
confidence_client=result.confidence_client,
|
||||
overall_confidence=result.overall_confidence,
|
||||
raw_text=result.raw_text,
|
||||
ocr_engine=result.ocr_engine,
|
||||
processing_time_ms=result.processing_time_ms,
|
||||
# Validation results
|
||||
needs_manual_review=result.needs_manual_review,
|
||||
validation_warnings=result.validation_warnings,
|
||||
validation_errors=result.validation_errors,
|
||||
inter_ocr_ratios=result.inter_ocr_ratios,
|
||||
)
|
||||
|
||||
return OCRResponse(success=True, message=message, data=data)
|
||||
@@ -206,13 +216,23 @@ async def extract_from_attachment(
|
||||
items_count=result.items_count,
|
||||
payment_methods=payment_methods_list,
|
||||
suggested_payment_mode=suggested_payment_mode,
|
||||
# Client data (B2B receipts)
|
||||
client_name=result.client_name,
|
||||
client_cui=result.client_cui,
|
||||
client_address=result.client_address,
|
||||
confidence_amount=result.confidence_amount,
|
||||
confidence_date=result.confidence_date,
|
||||
confidence_vendor=result.confidence_vendor,
|
||||
confidence_client=result.confidence_client,
|
||||
overall_confidence=result.overall_confidence,
|
||||
raw_text=result.raw_text,
|
||||
ocr_engine=result.ocr_engine,
|
||||
processing_time_ms=result.processing_time_ms,
|
||||
# Validation results
|
||||
needs_manual_review=result.needs_manual_review,
|
||||
validation_warnings=result.validation_warnings,
|
||||
validation_errors=result.validation_errors,
|
||||
inter_ocr_ratios=result.inter_ocr_ratios,
|
||||
)
|
||||
|
||||
return OCRResponse(success=True, message=message, data=data)
|
||||
|
||||
Reference in New Issue
Block a user