feat(5.18): corpus k-NN exemple etichetate + seed real Haiku (17181 op)
Seed app/data/operatii-etichetate.json regenerat cu subagenti Haiku pe TOATE cele 17181 operatii distincte (ordine frecventa, 100%), inlocuind seed-ul Groq (3758). Validare Haiku vs Groq pe 157 op etichetate: la dezacorduri Haiku corect ~22/30, Groq ~0. Haiku prinde gunoiul ratat de Groq (ITP, chirie anvelope, nume piese fara actiune): NUL 2200 (12.8%) vs ~7.6% Groq; adaptare electronica OE-7 (nu OE-5), placute frana uzura OE-1 (nu OE-F avarie). US-001..006: prefiltru NUL determinist, etichetator offline, generator seed, seeder mapping_suggestions (in init_db, gated seed_operatii_enabled), embeddings indexeaza corpus etichetat, enrich NUL+kNN. Distributie seed: OE-1 80.1%, NUL 12.8%, OE-2 3.5%, restul rar (OE-4/3/7/8/R/I/5, AITLV, R-ODO). config: seed_operatii_enabled=True + embeddings_enabled=True implicit (SILVER populat + sugestii semantice; ambele suggestion-only, dezactivabile prin env). Suita: 1387 passed, 1 deselected (live). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
59
app/operatii_seed.py
Normal file
59
app/operatii_seed.py
Normal file
@@ -0,0 +1,59 @@
|
||||
"""Seeder corpus operatii etichetate -> mapping_suggestions (SILVER, PRD 5.18 US-004).
|
||||
|
||||
Artefactul `app/data/operatii-etichetate.json` e produs offline de
|
||||
`tools/mapare-llm/genereaza_seed.py` (etichetare LM Studio, o singura data) si comis
|
||||
in repo. La `init_db` il incarcam in `mapping_suggestions` cu INSERT OR IGNORE, ca
|
||||
SILVER sa nu mai fie gol in productie (sugestii exact-match + corpus k-NN reale).
|
||||
|
||||
Format seed: [{denumire, denumire_normalizata, cod, is_nul, source, confidence}].
|
||||
Reutilizeaza `shared_store.seed_suggestions` (normalizeaza cheia + impune NUL->cod NULL,
|
||||
INSERT OR IGNORE). NB (F10): confirmarile UMANE stau in `shared_mappings`, NU aici —
|
||||
deci INSERT OR IGNORE pastreaza codul LLM existent la re-seed (v1 = ignore, nu upsert).
|
||||
|
||||
SUGGESTION-ONLY (invariant #13): nimic din SILVER nu intra in resolve_prestatii/load_mapping.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
import sqlite3
|
||||
|
||||
from .shared_store import seed_suggestions
|
||||
|
||||
SEED_PATH = os.path.join(os.path.dirname(__file__), "data", "operatii-etichetate.json")
|
||||
|
||||
|
||||
def load_seed_file(path: str = SEED_PATH) -> list[dict]:
|
||||
"""Citeste artefactul seed. Lipsa / invalid -> [] (degradare gratioasa)."""
|
||||
if not path or not os.path.exists(path):
|
||||
return []
|
||||
try:
|
||||
with open(path, encoding="utf-8") as fh:
|
||||
data = json.load(fh)
|
||||
except (ValueError, OSError):
|
||||
return []
|
||||
return data if isinstance(data, list) else []
|
||||
|
||||
|
||||
def seed_operatii_etichetate(conn: sqlite3.Connection, path: str = SEED_PATH) -> int:
|
||||
"""Incarca seedul in mapping_suggestions (INSERT OR IGNORE). Intoarce nr. randuri inserate.
|
||||
|
||||
Mapeaza cheia seedului `cod` -> `cod_prestatie` (forma asteptata de seed_suggestions);
|
||||
`is_nul=True` forteaza cod NULL acolo. Idempotent: re-rularea nu dubleaza randuri.
|
||||
"""
|
||||
raw = load_seed_file(path)
|
||||
if not raw:
|
||||
return 0
|
||||
items = [
|
||||
{
|
||||
"denumire": e.get("denumire") or e.get("denumire_normalizata") or "",
|
||||
"cod_prestatie": e.get("cod"),
|
||||
"is_nul": bool(e.get("is_nul")),
|
||||
"source": e.get("source") or "llm_seed",
|
||||
"confidence": e.get("confidence") or 0.0,
|
||||
}
|
||||
for e in raw
|
||||
if isinstance(e, dict)
|
||||
]
|
||||
return seed_suggestions(conn, items)
|
||||
Reference in New Issue
Block a user