rar-autopass/app/operatii_seed.py
Claude Agent 756f77730f feat(5.18): k-NN corpus of labeled examples + real Haiku seed (17181 ops)
Seed app/data/operatii-etichetate.json regenerated with Haiku subagents over ALL
17181 distinct operations (in frequency order, 100% coverage), replacing the Groq
seed (3758). Haiku vs Groq validation on 157 labeled ops: on disagreements Haiku
is correct ~22/30, Groq ~0. Haiku catches the junk Groq missed (ITP, tire rental,
part names with no action): NUL 2200 (12.8%) vs ~7.6% for Groq; electronic
adaptation is OE-7 (not OE-5), worn brake pads are OE-1 (not OE-F, damage).

US-001..006: deterministic NUL prefilter, offline labeler, seed generator,
mapping_suggestions seeder (in init_db, gated by seed_operatii_enabled), embeddings
index the labeled corpus, NUL+kNN enrich. Seed distribution: OE-1 80.1%, NUL
12.8%, OE-2 3.5%, the rest rare (OE-4/3/7/8/R/I/5, AITLV, R-ODO).
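
To illustrate the "labeled corpus k-NN" idea from US-005/006, here is a minimal sketch of a majority vote over the nearest labeled examples. Everything in it is an assumption for illustration: the function names `cosine` and `knn_suggest`, the toy 3-dimensional vectors, and the use of cosine similarity are not taken from the repository, which may embed and rank differently.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_suggest(query_vec, corpus, k=3):
    """Return (label, vote_share) from the k most similar labeled examples.

    `corpus` is a list of (vector, label) pairs. Hypothetical helper, not the
    project's API. An empty corpus yields (None, 0.0).
    """
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    top = ranked[:k]
    if not top:
        return None, 0.0
    votes = Counter(label for _, label in top)
    label, count = votes.most_common(1)[0]
    return label, count / len(top)


# Toy labeled corpus; real embeddings would have hundreds of dimensions.
CORPUS = [
    ([1.0, 0.0, 0.1], "OE-1"),
    ([0.9, 0.1, 0.0], "OE-1"),
    ([0.0, 1.0, 0.0], "NUL"),
    ([0.1, 0.9, 0.1], "NUL"),
    ([0.0, 0.1, 1.0], "OE-2"),
]

label, share = knn_suggest([1.0, 0.05, 0.0], CORPUS, k=3)
```

With this toy corpus the query lands next to the two OE-1 examples, so the vote is OE-1 with a 2/3 share. Such a result would still be a suggestion only, in line with the suggestion-only rule stated in this commit.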

config: seed_operatii_enabled=True + embeddings_enabled=True by default (SILVER
populated + semantic suggestions; both suggestion-only, can be disabled via env).

Suite: 1387 passed, 1 deselected (live).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 06:59:15 +00:00

60 lines
2.2 KiB
Python

"""Seeder for the labeled-operations corpus -> mapping_suggestions (SILVER, PRD 5.18 US-004).

The artifact `app/data/operatii-etichetate.json` is produced offline by
`tools/mapare-llm/genereaza_seed.py` (LM Studio labeling, run once) and committed
to the repo. At `init_db` we load it into `mapping_suggestions` with INSERT OR IGNORE, so
SILVER is no longer empty in production (real exact-match suggestions + k-NN corpus).

Seed format: [{denumire, denumire_normalizata, cod, is_nul, source, confidence}].

Reuses `shared_store.seed_suggestions` (normalizes the key + enforces NUL -> cod NULL,
INSERT OR IGNORE). NB (F10): HUMAN confirmations live in `shared_mappings`, NOT here,
so INSERT OR IGNORE keeps the existing LLM code on re-seed (v1 = ignore, not upsert).

SUGGESTION-ONLY (invariant #13): nothing from SILVER enters resolve_prestatii/load_mapping.
"""
from __future__ import annotations

import json
import os
import sqlite3

from .shared_store import seed_suggestions

SEED_PATH = os.path.join(os.path.dirname(__file__), "data", "operatii-etichetate.json")


def load_seed_file(path: str = SEED_PATH) -> list[dict]:
    """Read the seed artifact. Missing / invalid -> [] (graceful degradation)."""
    if not path or not os.path.exists(path):
        return []
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (ValueError, OSError):
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        return []
    return data if isinstance(data, list) else []


def seed_operatii_etichetate(conn: sqlite3.Connection, path: str = SEED_PATH) -> int:
    """Load the seed into mapping_suggestions (INSERT OR IGNORE). Return the number of rows inserted.

    Maps the seed key `cod` -> `cod_prestatie` (the shape expected by seed_suggestions);
    `is_nul=True` forces cod NULL there. Idempotent: re-running does not duplicate rows.
    """
    raw = load_seed_file(path)
    if not raw:
        return 0
    items = [
        {
            # Fall back to the normalized name when the raw name is missing.
            "denumire": e.get("denumire") or e.get("denumire_normalizata") or "",
            "cod_prestatie": e.get("cod"),
            "is_nul": bool(e.get("is_nul")),
            "source": e.get("source") or "llm_seed",
            "confidence": e.get("confidence") or 0.0,
        }
        for e in raw
        if isinstance(e, dict)  # skip malformed (non-object) entries
    ]
    return seed_suggestions(conn, items)
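
The docstrings above rely on `shared_store.seed_suggestions`, which is not shown in this file. The sketch below is one way such a helper could behave, written only to illustrate the "v1 = ignore, not upsert" and idempotency claims. The table schema, the `normalize` rule, and the name `seed_suggestions_sketch` are all assumptions, not the project's actual code.

```python
import sqlite3


def normalize(name: str) -> str:
    """Hypothetical key normalization: lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


def seed_suggestions_sketch(conn: sqlite3.Connection, items: list[dict]) -> int:
    """Insert items with INSERT OR IGNORE and return how many rows were added.

    Assumed schema: the normalized name is the primary key, so a second
    insert for the same key is silently skipped instead of overwriting.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS mapping_suggestions (
               denumire_normalizata TEXT PRIMARY KEY,
               cod_prestatie TEXT,
               is_nul INTEGER NOT NULL DEFAULT 0,
               source TEXT,
               confidence REAL)"""
    )
    rows = [
        (
            normalize(it["denumire"]),
            None if it["is_nul"] else it["cod_prestatie"],  # NUL -> cod NULL
            int(it["is_nul"]),
            it["source"],
            it["confidence"],
        )
        for it in items
        if it["denumire"].strip()  # skip entries with an empty name
    ]
    before = conn.total_changes  # counts rows actually inserted/updated/deleted
    conn.executemany(
        "INSERT OR IGNORE INTO mapping_suggestions VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
seed = [
    {"denumire": "Inlocuit  placute frana", "cod_prestatie": "OE-1",
     "is_nul": False, "source": "llm_seed", "confidence": 0.9},
    {"denumire": "ITP", "cod_prestatie": "OE-1",
     "is_nul": True, "source": "llm_seed", "confidence": 0.8},
]
first = seed_suggestions_sketch(conn, seed)   # both rows are new
second = seed_suggestions_sketch(conn, seed)  # same keys, nothing inserted
```

Because the conflict policy is IGNORE rather than an upsert, re-seeding with a different code for an existing key leaves the stored code untouched, which matches the behaviour the module docstring describes for re-seed.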