Seed app/data/operatii-etichetate.json regenerated with Haiku subagents over ALL 17181 distinct operations (frequency order, 100%), replacing the Groq seed (3758).

Haiku vs Groq validation on 157 labeled ops: on the disagreements, Haiku was correct ~22/30, Groq ~0. Haiku catches the junk that Groq missed (ITP periodic inspection, tire rental, part names with no action): NUL 2200 (12.8%) vs ~7.6% for Groq; electronic adaptation is OE-7 (not OE-5), worn brake pads are OE-1 (not OE-F damage).

US-001..006: deterministic NUL prefilter, offline labeler, seed generator, mapping_suggestions seeder (in init_db, gated by seed_operatii_enabled), embeddings index the labeled corpus, enrich NUL+kNN.

Seed distribution: OE-1 80.1%, NUL 12.8%, OE-2 3.5%, the rest rare (OE-4/3/7/8/R/I/5, AITLV, R-ODO).

config: seed_operatii_enabled=True + embeddings_enabled=True by default (SILVER populated + semantic suggestions; both are suggestion-only and can be disabled via env).

Test suite: 1387 passed, 1 deselected (live).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
60 lines
2.2 KiB
Python
"""Seeder corpus operatii etichetate -> mapping_suggestions (SILVER, PRD 5.18 US-004).
|
|
|
|
Artefactul `app/data/operatii-etichetate.json` e produs offline de
|
|
`tools/mapare-llm/genereaza_seed.py` (etichetare LM Studio, o singura data) si comis
|
|
in repo. La `init_db` il incarcam in `mapping_suggestions` cu INSERT OR IGNORE, ca
|
|
SILVER sa nu mai fie gol in productie (sugestii exact-match + corpus k-NN reale).
|
|
|
|
Format seed: [{denumire, denumire_normalizata, cod, is_nul, source, confidence}].
|
|
Reutilizeaza `shared_store.seed_suggestions` (normalizeaza cheia + impune NUL->cod NULL,
|
|
INSERT OR IGNORE). NB (F10): confirmarile UMANE stau in `shared_mappings`, NU aici —
|
|
deci INSERT OR IGNORE pastreaza codul LLM existent la re-seed (v1 = ignore, nu upsert).
|
|
|
|
SUGGESTION-ONLY (invariant #13): nimic din SILVER nu intra in resolve_prestatii/load_mapping.
|
|
"""
|
|
|
|
from __future__ import annotations
|
|
|
|
import json
|
|
import os
|
|
import sqlite3
|
|
|
|
from .shared_store import seed_suggestions
|
|
|
|
SEED_PATH = os.path.join(os.path.dirname(__file__), "data", "operatii-etichetate.json")
|
|
|
|
|
|
def load_seed_file(path: str = SEED_PATH) -> list[dict]:
    """Read the seed artifact. Missing / invalid -> [] (graceful degradation)."""
    if not path or not os.path.exists(path):
        return []
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (ValueError, OSError):
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        return []
    return data if isinstance(data, list) else []


def seed_operatii_etichetate(conn: sqlite3.Connection, path: str = SEED_PATH) -> int:
    """Load the seed into mapping_suggestions (INSERT OR IGNORE). Returns the number of rows inserted.

    Maps the seed key `cod` -> `cod_prestatie` (the shape seed_suggestions expects);
    `is_nul=True` forces cod NULL there. Idempotent: re-running does not duplicate rows.
    """
    raw = load_seed_file(path)
    if not raw:
        return 0
    items = [
        {
            "denumire": e.get("denumire") or e.get("denumire_normalizata") or "",
            "cod_prestatie": e.get("cod"),
            "is_nul": bool(e.get("is_nul")),
            "source": e.get("source") or "llm_seed",
            "confidence": e.get("confidence") or 0.0,
        }
        for e in raw
        if isinstance(e, dict)
    ]
    return seed_suggestions(conn, items)
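The idempotency the docstrings rely on (INSERT OR IGNORE, NUL forcing a NULL code, re-seeding inserting nothing) can be illustrated with a small standalone sketch. The real `shared_store.seed_suggestions` is not shown in this file, so everything below is an assumption made for illustration: the helper name `seed_suggestions_sketch`, the table schema, the choice of the normalized name as the unique key, the whitespace/lowercase normalization, and the sample rows with their confidence values.

```python
import sqlite3


def seed_suggestions_sketch(conn: sqlite3.Connection, items: list[dict]) -> int:
    """Hypothetical stand-in for shared_store.seed_suggestions (the real one is not shown)."""
    # Assumed schema: the normalized name is the unique key that makes IGNORE kick in.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS mapping_suggestions (
            denumire_normalizata TEXT PRIMARY KEY,
            cod_prestatie TEXT,
            is_nul INTEGER NOT NULL,
            source TEXT,
            confidence REAL
        )
        """
    )
    inserted = 0
    for item in items:
        # Assumed normalization: lowercase and collapse whitespace.
        key = " ".join(item.get("denumire", "").lower().split())
        if not key:
            continue
        is_nul = bool(item.get("is_nul"))
        # A NUL label always stores a NULL code, whatever the seed entry carried.
        cod = None if is_nul else item.get("cod_prestatie")
        cur = conn.execute(
            "INSERT OR IGNORE INTO mapping_suggestions VALUES (?, ?, ?, ?, ?)",
            (key, cod, int(is_nul), item.get("source"), item.get("confidence")),
        )
        inserted += cur.rowcount  # 1 when the row was inserted, 0 when it was ignored
    conn.commit()
    return inserted


# Illustrative rows only; they are not taken from the real seed file.
conn = sqlite3.connect(":memory:")
items = [
    {"denumire": "Inlocuit placute frana", "cod_prestatie": "OE-1",
     "is_nul": False, "source": "llm_seed", "confidence": 0.9},
    {"denumire": "ITP", "cod_prestatie": "OE-1",
     "is_nul": True, "source": "llm_seed", "confidence": 0.8},
]
first = seed_suggestions_sketch(conn, items)   # both rows are new
second = seed_suggestions_sketch(conn, items)  # re-seed: every row is ignored
nul_cod = conn.execute(
    "SELECT cod_prestatie FROM mapping_suggestions WHERE denumire_normalizata = 'itp'"
).fetchone()[0]
```

Under these assumptions the first call inserts both rows and the second inserts none, which is the "v1 = ignore, not upsert" behaviour: a code that is already stored is never overwritten by a later re-seed.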