feat(ocr): Add validation system and CLIENT CUI extraction

OCR Data Extraction Validation System:
- Add 7 validation rules (amount range, TVA ratio, payment sum, etc.)
- Add Medium preprocessing to replace Heavy (fixes digit concatenation)
- Add validation warnings to API responses
- Flag receipts needing manual review (needs_manual_review field)
- Add database migration for needs_manual_review column

CLIENT CUI Extraction Improvements:
- Support all format variations: CIF CLIENT:, CLIENT C.U.I/C.I.F., etc.
- Handle OCR errors (R0 vs RO, C1F vs CIF)
- Add client_name, client_cui, client_address to API response
- Add validation fields to API response (was missing)

QA Review: 12 issues found, 9 fixed (5 errors + 4 warnings)
- Fixed type safety in validation rules
- Fixed ZeroDivisionError risk
- Fixed schema mismatch (Optional[bool] for needs_manual_review)
- All 37 unit tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
2025-12-30 19:12:52 +02:00
parent ce85e0643b
commit ab160b628d
14 changed files with 4161 additions and 33 deletions

View File

@@ -0,0 +1,40 @@
"""Add needs_manual_review flag to receipts table.
Revision ID: 20251230_needs_manual_review
Revises: 20251216_payment_mode
Create Date: 2025-12-30
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = '20251230_needs_manual_review'
down_revision = '20251216_payment_mode'
branch_labels = None
depends_on = None
def upgrade() -> None:
"""Add needs_manual_review column for OCR validation tracking.
This column tracks whether a receipt needs manual supervisor review
based on OCR extraction validation warnings:
- NULL = not validated yet (old receipts before validation feature)
- FALSE = validated, no review needed
- TRUE = validated, needs review
"""
with op.batch_alter_table('receipts', schema=None) as batch_op:
batch_op.add_column(
sa.Column('needs_manual_review', sa.Boolean(), nullable=True)
)
# NOTE: We do NOT set a default value for existing rows.
# NULL indicates the receipt was created before validation was implemented.
# Only new receipts (created after this migration) will have TRUE/FALSE values.
def downgrade() -> None:
"""Remove needs_manual_review column."""
with op.batch_alter_table('receipts', schema=None) as batch_op:
batch_op.drop_column('needs_manual_review')