feat(5.15+5.14): CLOSE: code-review fixes + functional embeddings

5.15 (design propagation + dashboard editing) and 5.14 (distilled LLM mapping)
closed after /code-review high. 8 bugs fixed with TDD:

- HIGH modal did not open on the slim row (base.html: trimitere-slim)
- HIGH /repune truncated prestatii (incomplete declaration sent to RAR) -> now
  iterates over existing, with codes taken by position
- HIGH embeddings loaded a ~230MB model needlessly on an empty corpus -> has_corpus() gate
- HIGH picker chips empty on error re-render -> conn/account_id passed on every branch
- MED obs re-derived after an explicit deletion -> _merge_override keeps obs=''
- MED mapping saved without a denumire pollutes GOLD -> _record_gold_validation guard
- MED typo nome_prestatie -> nume_prestatie in the /repune select
- MED time bucketing with a fixed +3h was wrong in winter -> SQLite localtime + TZ=Europe/Bucharest
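Why a fixed +3h breaks in winter: Romania is on EET (UTC+2) in winter and EEST (UTC+3) in summer, so a hard-coded +3h shifts every winter timestamp one hour late and can push late-evening events into the next day's bucket. The commit itself relies on SQLite `localtime` with `TZ=Europe/Bucharest`; the sketch below only illustrates the offset arithmetic by hand-coding the EU summer-time rule (from 01:00 UTC on the last Sunday of March to 01:00 UTC on the last Sunday of October) instead of reading the tz database. All function names are hypothetical, and in real code `zoneinfo.ZoneInfo("Europe/Bucharest")` would be the more idiomatic choice.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional


def last_sunday(year: int, month: int) -> date:
    """Last Sunday of March or October (both months have 31 days)."""
    last = date(year, month, 31)
    # weekday(): Monday=0 ... Sunday=6, so step back to the closest Sunday
    return last - timedelta(days=(last.weekday() - 6) % 7)


def bucharest_offset_hours(ts_utc: datetime) -> int:
    """UTC offset of Europe/Bucharest under the EU summer-time rule (hypothetical helper).

    ts_utc must be a timezone-aware UTC datetime.
    """
    year = ts_utc.year
    dst_start = datetime(year, 3, last_sunday(year, 3).day, 1, tzinfo=timezone.utc)
    dst_end = datetime(year, 10, last_sunday(year, 10).day, 1, tzinfo=timezone.utc)
    return 3 if dst_start <= ts_utc < dst_end else 2


def hour_bucket(ts_utc: datetime, fixed_offset: Optional[int] = None) -> str:
    """Local hour bucket; pass fixed_offset=3 to reproduce the old behaviour."""
    offset = bucharest_offset_hours(ts_utc) if fixed_offset is None else fixed_offset
    return (ts_utc + timedelta(hours=offset)).strftime("%Y-%m-%d %H:00")


winter = datetime(2026, 1, 15, 21, 30, tzinfo=timezone.utc)
print(hour_bucket(winter))                  # 2026-01-15 23:00
print(hour_bucket(winter, fixed_offset=3))  # 2026-01-16 00:00 (wrong day)
```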

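The `_merge_override` fix hinges on telling an absent value apart from one the user cleared on purpose. A truthiness test treats `''` like "no override" and lets the derived text come back. A minimal sketch of the difference, using hypothetical helpers rather than the project's actual signature; treating `None` as "no override" is an assumption of the sketch.

```python
def merge_override_buggy(derived: dict, override: dict) -> dict:
    merged = dict(derived)
    for key, value in override.items():
        if value:  # BUG: '' is falsy, so an explicit deletion is silently ignored
            merged[key] = value
    return merged


def merge_override(derived: dict, override: dict) -> dict:
    """Apply user overrides on top of derived values (hypothetical helper).

    A value of '' in `override` wins (explicit deletion); only None means
    "no override" and falls back to the derived value.
    """
    merged = dict(derived)
    for key, value in override.items():
        if value is not None:
            merged[key] = value
    return merged
```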
Embeddings wired up functionally (PRD #15, user decision): ensure_embeddings_corpus
builds the corpus from the nomenclator, gated on AUTOPASS_EMBEDDINGS_ENABLED (default
off). Model size corrected from ~50MB to ~230MB (the PRD estimate was wrong).
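The flag and the has_corpus() gate combine into one rule: never pay for loading the ~230MB model unless the feature is switched on and there is something to search. A sketch of that ordering; `EmbeddingIndex`, `flag_enabled` and the loader callback are hypothetical names, and the accepted flag values (`1`, `true`, `yes`, `on`) are an assumption, not the project's documented parsing.

```python
import os
from collections.abc import Callable, Mapping, Sequence
from typing import Optional

FLAG = "AUTOPASS_EMBEDDINGS_ENABLED"


def flag_enabled(env: Mapping[str, str]) -> bool:
    # Default off: a missing or unrecognized value keeps embeddings disabled
    return env.get(FLAG, "").strip().lower() in {"1", "true", "yes", "on"}


class EmbeddingIndex:
    """Loads the embedding model lazily, and only when it can be used (hypothetical)."""

    def __init__(
        self,
        corpus: Sequence[str],
        load_model: Callable[[], object],
        env: Optional[Mapping[str, str]] = None,
    ) -> None:
        self.corpus = list(corpus)
        self._load_model = load_model
        self._env = os.environ if env is None else env
        self._model: Optional[object] = None

    def has_corpus(self) -> bool:
        return bool(self.corpus)

    def model(self) -> Optional[object]:
        # Cheap checks first (flag, then corpus); the expensive load happens last, once
        if not flag_enabled(self._env) or not self.has_corpus():
            return None
        if self._model is None:
            self._model = self._load_model()
        return self._model
```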

Cleanup: hoisted load_* out of the bulk-fix loop; moved import re to the top.
Regression: 1256 passed, 1 deselected (live), 0 failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude Agent
2026-06-28 20:48:34 +00:00
parent 9e42e7ed6f
commit 3fc53534e2
53 changed files with 9684 additions and 384 deletions

@@ -0,0 +1,567 @@
"""Held-out evaluation harness for the operation -> RAR code mapping system.

Purpose (L14-S5, Decision #19 PRD 5.14):
    Measure the REAL ACCURACY of the classifier before allowing any auto-send
    tier beyond our own GOLD set.

Rationale:
    The existing measurements (100% agreement vs Groq, 87% unanimous NVIDIA) are
    measures of AGREEMENT (cross-model), not of ACCURACY against ground truth.
    Models from the same NVIDIA family make correlated errors: if both get an
    item wrong in the same way, agreement is 100% while accuracy is 0. A
    HUMAN-labeled set (random stratified sample) is the only way to measure
    real accuracy.

Contents:
    1. sample_stratified(): random stratified sampling over the head/middle/tail
       of the Zipf distribution, deterministic for a given seed. NO LLM calls.
    2. export_for_labeling(): exports a blank CSV for human labeling (ground truth).
       The cod_gold column is LEFT EMPTY: labeling is solely the operator's
       responsibility.
    3. eval_predictions(): (predictions, gold) -> global precision (overall
       accuracy) + per-code metrics + confusion matrix + wrong-code rate.
    4. kill_criterion(): checks whether the system meets the acceptance threshold
       (F-E, PRD 5.14).

What it does NOT do:
    It does NOT label the ground truth. Machine-labeling the codes would be
    exactly "training on the test set" and would invalidate the reported
    precision (Decision #19). The exported file is filled in MANUALLY by the
    human operator.

CLI:
    python3 tools/mapare-llm/heldout_eval.py --n 250 --out esantion-heldout.csv
        Generates a sample of 250 denumiri for human labeling.
    python3 tools/mapare-llm/heldout_eval.py --eval predictii.csv gold.csv
        Evaluates predictions against ground truth (both CSVs need a 'denumire' column).
"""
from __future__ import annotations

import csv
import os
import random
import sys

_HERE = os.path.dirname(os.path.abspath(__file__))
_ROOT = os.path.abspath(os.path.join(_HERE, '..', '..'))
if _ROOT not in sys.path:
    sys.path.insert(0, _ROOT)
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------

# Valid RAR codes (from or_common.py / nomenclator): 18 codes. NUL is separate:
# NUL = suppression (non-operation); it is NOT a valid RAR code sent to RAR.
VALID_RAR: frozenset[str] = frozenset([
    "OE-1", "OE-2", "OE-3", "OE-4", "OE-5", "OE-6", "OE-7", "OE-8",
    "OE-D", "OE-F", "OE-C", "OE-S", "OE-R", "OE-A", "OE-I",
    "AITLV", "R-ODO", "I-ODO",
])
NUL = "NUL"  # special label: suppression (not a RAR code)
ALL_LABELS = VALID_RAR | {NUL}  # every valid classifier label
UNRESOLVED = "?"  # the classifier gave no answer -> needs_mapping

# Default seed for reproducible sampling
DEFAULT_SEED = 42

# Zipf strata (shares of the number of DISTINCT denumiri):
#   cap    (head)   = top 20% by frequency (few denumiri, high volume)
#   mijloc (middle) = the next 30%
#   coada  (tail)   = the remaining 50% (many denumiri, low volume each)
_STRAT_HEAD_END_PCT = 0.20
_STRAT_MID_END_PCT = 0.50  # head+mid = 50%, so mid = 30%

# Kill criterion (F-E, PRD 5.14):
#
# DEFAULT_WRONG_CODE_THRESHOLD = 0.005 (0.5%)
#   Rationale: one wrong code = an irreversible FINALIZATA at RAR (Premise 3).
#   At 200 auto-resolved operations/day, a 0.5% wrong-code rate means 1 wrong
#   FINALIZATA per day, which already exceeds acceptable operational tolerance.
#   The threshold may be RELAXED empirically based on real data; it must NOT be
#   tightened post hoc.
#   Recommended: collect at least 200 samples before calibrating.
#
# DEFAULT_COVERAGE_THRESHOLD = 0.50 (50%)
#   Rationale: below 50% coverage the system brings no real savings over manual
#   needs_mapping (you might as well leave everything to the human operator).
DEFAULT_WRONG_CODE_THRESHOLD = 0.005
DEFAULT_COVERAGE_THRESHOLD = 0.50
# ---------------------------------------------------------------------------
# Stratified sampling (NO LLM)
# ---------------------------------------------------------------------------
def sample_stratified(
    rows: list[tuple[str, int]],
    n_sample: int = 250,
    seed: int = DEFAULT_SEED,
) -> list[dict]:
    """Random stratified sampling over three Zipf strata: head/middle/tail.

    Deterministic for a given seed; does NOT call any LLM (PRD L14-S5).

    rows:     list of (denumire, nr), the absolute frequencies.
              Does not need to be pre-sorted.
    n_sample: total sample size (approximate, +/-3 because of rounding).
              The default of 250 is practical for 2-3 hours of human labeling.
    seed:     seed for random.Random; the same seed yields the same sample.

    Returns:
        list of dict: {denumire: str, nr: int, strat: str}
        strat in {"cap", "mijloc", "coada"}

    Stratification (by count of distinct denumiri, not by volume):
        cap    = top 20% of the distinct denumiri (the high-frequency ones)
        mijloc = the next 30%
        coada  = the remaining 50%

    Per-stratum allocation: proportional to stratum size (each denumire equally
    likely), with at least 1 per non-empty stratum.
    """
    if not rows:
        return []

    # Sort by descending frequency so the strata boundaries are well defined
    sorted_rows = sorted(rows, key=lambda x: -x[1])
    n = len(sorted_rows)

    # Stratum boundaries (as indices)
    head_end = max(1, round(n * _STRAT_HEAD_END_PCT))
    mid_end = max(head_end + 1, round(n * _STRAT_MID_END_PCT))
    mid_end = min(mid_end, n)

    strata: dict[str, list[tuple[str, int]]] = {
        "cap": sorted_rows[:head_end],
        "mijloc": sorted_rows[head_end:mid_end],
        "coada": sorted_rows[mid_end:],
    }

    # Allocation proportional to stratum size
    names = ["cap", "mijloc", "coada"]
    sizes = {name: len(strata[name]) for name in names}
    total_size = sum(sizes.values())  # == n
    rng = random.Random(seed)

    # Every stratum except the last: max(1, round(n_sample * frac)) if non-empty
    alloc: dict[str, int] = {}
    for name in names[:-1]:
        if sizes[name] == 0:
            alloc[name] = 0
        else:
            a = max(1, round(n_sample * sizes[name] / total_size))
            a = min(a, sizes[name])  # never more than the stratum holds
            alloc[name] = a

    # The last stratum gets the remainder (to get as close as possible to n_sample)
    used = sum(alloc.get(name, 0) for name in names[:-1])
    remaining = max(0, n_sample - used)
    alloc["coada"] = min(remaining, sizes["coada"])
    if alloc["coada"] == 0 and sizes["coada"] > 0:
        alloc["coada"] = 1  # guarantee at least 1 item from the tail if it exists

    # Sample each stratum
    result: list[dict] = []
    for name in names:
        items = strata[name]
        k = alloc.get(name, 0)
        if k > 0 and items:
            sampled = rng.sample(items, k)
            for (den, nr) in sampled:
                result.append({"denumire": den, "nr": nr, "strat": name})
    return result
# ---------------------------------------------------------------------------
# CSV export for human labeling
# ---------------------------------------------------------------------------
def export_for_labeling(sample: list[dict], path: str) -> None:
    """Export the sample as a CSV for HUMAN labeling (ground truth).

    The `cod_gold` column is left EMPTY in the exported file.
    Do NOT fill it with LLM or automatic labels: that would be "training on the
    test set" and would invalidate the reported precision (Decision #19, PRD 5.14).

    sample: list of {denumire, nr, strat} as returned by sample_stratified()
    path:   CSV file to write (overwritten if it exists)

    CSV format: UTF-8 with BOM, ';' separator, columns:
        denumire;nr;strat;cod_gold
    """
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f, delimiter=';')
        writer.writerow(["denumire", "nr", "strat", "cod_gold"])
        for item in sample:
            writer.writerow([
                item["denumire"],
                item["nr"],
                item["strat"],
                "",  # cod_gold EMPTY: to be filled in by the human operator
            ])
# ---------------------------------------------------------------------------
# Evaluating predictions against ground truth
# ---------------------------------------------------------------------------
def eval_predictions(
    predictions: list[dict],
    ground_truth: list[dict],
) -> dict:
    """Evaluate the classifier's predictions against the human ground truth.

    Items are matched on 'denumire'. Ground-truth denumiri with no matching
    prediction are treated as UNRESOLVED (pred='?').

    predictions:  list of {denumire: str, cod_pred: str}
                  cod_pred: RAR code ("OE-1"...) | "NUL" | "?" (unresolved)
    ground_truth: list of {denumire: str, cod_gold: str}
                  cod_gold: RAR code | "NUL" (filled in by the human operator)

    Returns a dict with:
        total            : number of distinct denumiri in ground_truth
        correct          : correct predictions (pred == gold)
        global_precision : correct / total (overall accuracy)
        wrong_code_count : wrong-code cases (critical: irreversible FINALIZATA)
                           definition: pred in VALID_RAR AND gold in VALID_RAR AND pred != gold
        wrong_code_rate  : wrong_code_count / total
        coverage_count   : predictions with cod_pred != '?' (the classifier answered)
        coverage_rate    : coverage_count / total
        per_cod          : dict {cod -> {tp, fp, fn, precision, recall}}
        confusion_matrix : dict {"gold->pred" -> count}

    'Wrong code' vs 'wrong NUL':
        pred=NUL  and gold=OE-X -> the item goes to needs_mapping, not to FINALIZATA.
                                   Bad (a lost operation), but REPAIRABLE.
        pred=OE-X and gold=NUL  -> a non-operation is sent to RAR with a code.
                                   Bad (misleading), but RAR does not accept it as an operation.
        pred=OE-X and gold=OE-Y (X != Y) -> FINALIZATA with a WRONG code. IRREVERSIBLE.
    Only the last case counts as 'wrong_code' (it blocks auto-send beyond GOLD).
    """
    if not ground_truth:
        return {
            "total": 0,
            "correct": 0,
            "global_precision": 0.0,
            "wrong_code_count": 0,
            "wrong_code_rate": 0.0,
            "coverage_count": 0,
            "coverage_rate": 0.0,
            "per_cod": {},
            "confusion_matrix": {},
        }

    gt_map: dict[str, str] = {item["denumire"]: item["cod_gold"] for item in ground_truth}
    pred_map: dict[str, str] = {item["denumire"]: item["cod_pred"] for item in predictions}

    total = len(gt_map)
    correct = 0
    wrong_code_count = 0
    coverage_count = 0
    per_cod_tp: dict[str, int] = {}
    per_cod_fp: dict[str, int] = {}
    per_cod_fn: dict[str, int] = {}
    confusion: dict[str, int] = {}

    for den, gold in gt_map.items():
        pred = pred_map.get(den, UNRESOLVED)

        # Confusion matrix
        key = f"{gold}->{pred}"
        confusion[key] = confusion.get(key, 0) + 1

        # Coverage: the classifier gave an answer (not '?')
        if pred != UNRESOLVED:
            coverage_count += 1

        if pred == gold:
            # Correct prediction
            correct += 1
            per_cod_tp[gold] = per_cod_tp.get(gold, 0) + 1
        else:
            # Error: FN for gold, FP for pred (unless pred is '?')
            per_cod_fn[gold] = per_cod_fn.get(gold, 0) + 1
            if pred != UNRESOLVED:
                per_cod_fp[pred] = per_cod_fp.get(pred, 0) + 1
            # WRONG CODE: pred and gold are both valid (and different) RAR codes
            # -> would produce a FINALIZATA with the wrong code (irreversible)
            if pred in VALID_RAR and gold in VALID_RAR:
                wrong_code_count += 1

    # Per-code metrics (union of every code seen)
    all_codes = set(per_cod_tp) | set(per_cod_fp) | set(per_cod_fn)
    per_cod: dict[str, dict] = {}
    for code in sorted(all_codes):
        tp = per_cod_tp.get(code, 0)
        fp = per_cod_fp.get(code, 0)
        fn = per_cod_fn.get(code, 0)
        precision = tp / (tp + fp) if (tp + fp) > 0 else None
        recall = tp / (tp + fn) if (tp + fn) > 0 else None
        per_cod[code] = {
            "tp": tp,
            "fp": fp,
            "fn": fn,
            "precision": precision,
            "recall": recall,
        }

    return {
        "total": total,
        "correct": correct,
        "global_precision": correct / total,
        "wrong_code_count": wrong_code_count,
        "wrong_code_rate": wrong_code_count / total,
        "coverage_count": coverage_count,
        "coverage_rate": coverage_count / total,
        "per_cod": per_cod,
        "confusion_matrix": confusion,
    }
# ---------------------------------------------------------------------------
# Kill criterion (F-E, PRD 5.14)
# ---------------------------------------------------------------------------
def kill_criterion(
    metrics: dict,
    wrong_code_threshold: float = DEFAULT_WRONG_CODE_THRESHOLD,
    coverage_threshold: float = DEFAULT_COVERAGE_THRESHOLD,
) -> dict:
    """Check whether the classification system meets the acceptance threshold (F-E).

    The system PASSES if:
        wrong_code_rate < wrong_code_threshold   (default 0.5%)
        AND
        coverage_rate   > coverage_threshold     (default 50%)

    A system that fails the kill criterion must NOT be used for auto-send
    beyond our own GOLD set (Decision #19, #17, PRD 5.14).

    metrics:              dict returned by eval_predictions() or compatible
                          (must contain wrong_code_rate and coverage_rate; missing
                          keys default to the failing values 1.0 and 0.0).
    wrong_code_threshold: the wrong-code rate must stay strictly below this value.
    coverage_threshold:   the coverage must be strictly above this value.

    Returns a dict with:
        passes          : True if both conditions hold
        reason          : explanation (in Romanian)
        wrong_code_rate : the measured value
        coverage_rate   : the measured value
        thresholds      : {"wrong_code": ..., "coverage": ...}
    """
    wcr = metrics.get("wrong_code_rate", 1.0)
    cvr = metrics.get("coverage_rate", 0.0)
    cond_wrong_code = wcr < wrong_code_threshold
    cond_coverage = cvr > coverage_threshold
    passes = cond_wrong_code and cond_coverage

    if passes:
        reason = (
            f"TRECE: rata cod-gresit {wcr:.2%} < {wrong_code_threshold:.2%} "
            f"si acoperire {cvr:.1%} > {coverage_threshold:.1%}."
        )
    elif not cond_wrong_code and not cond_coverage:
        reason = (
            f"ESUEAZA: rata cod-gresit {wcr:.2%} >= {wrong_code_threshold:.2%} "
            f"(FINALIZATA ireversibila) SI acoperire {cvr:.1%} <= {coverage_threshold:.1%} "
            f"(sistem neutilizabil). Auto-send dincolo de GOLD dezactivat."
        )
    elif not cond_wrong_code:
        reason = (
            f"ESUEAZA: rata cod-gresit {wcr:.2%} >= {wrong_code_threshold:.2%}. "
            f"Un cod gresit = FINALIZATA ireversibila la RAR (Premisa 3, PRD 5.14). "
            f"Auto-send dincolo de GOLD dezactivat pana la recalibrat."
        )
    else:
        reason = (
            f"ESUEAZA: acoperire {cvr:.1%} <= {coverage_threshold:.1%}. "
            f"Sub pragul minim de utilitate practica. "
            f"Sistemul ar lasa prea multe intrari in needs_mapping vs efort uman direct."
        )

    return {
        "passes": passes,
        "reason": reason,
        "wrong_code_rate": wcr,
        "coverage_rate": cvr,
        "thresholds": {
            "wrong_code": wrong_code_threshold,
            "coverage": coverage_threshold,
        },
    }
# ---------------------------------------------------------------------------
# Real-corpus I/O (mirrors holdout.load_csv)
# ---------------------------------------------------------------------------
def _load_corpus_from_csvs(data_dir: str) -> list[tuple[str, int]]:
    """Load the corpus from the docs/operatii-service/*.csv files.

    Mirrors the parsing logic of holdout.load_csv, additionally aggregating
    across clients and silently skipping unreadable files.
    """
    import glob

    from app.mapping import normalize_for_match

    agg: dict[str, list] = {}  # normalized key -> [first-seen denumire, total nr]
    for path in sorted(glob.glob(os.path.join(data_dir, "*.csv"))):
        try:
            with open(path, encoding='utf-8-sig') as f:
                reader = csv.DictReader(f, delimiter=';')
                for row in reader:
                    denop = (row.get('DENOP') or '').strip().strip('"')
                    nr_raw = (row.get('NR') or '').strip().strip('"')
                    if not denop or not nr_raw:
                        continue
                    try:
                        nr = int(nr_raw)
                    except ValueError:
                        continue
                    if nr <= 0:
                        continue
                    key = normalize_for_match(denop)
                    if key not in agg:
                        agg[key] = [denop, 0]
                    agg[key][1] += nr
        except OSError:
            continue
    return [(v[0], v[1]) for v in agg.values()]
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def _print_report(metrics: dict) -> None:
    sep = "=" * 70
    print(sep)
    print("RAPORT EVALUARE HELD-OUT (L14-S5, PRD 5.14)")
    print(sep)
    print(f" Total intrari evaluate: {metrics['total']}")
    print(f" Corecte: {metrics['correct']}")
    print(f" Precizie globala: {metrics['global_precision']:.2%}")
    print(f" Acoperire (pred != '?'): {metrics['coverage_rate']:.2%}")
    print(f" Rata cod-gresit: {metrics['wrong_code_rate']:.2%} "
          f"({metrics['wrong_code_count']} cazuri)")
    print()
    print("KILL-CRITERION (F-E):")
    kc = kill_criterion(metrics)
    print(f" {kc['reason']}")
    print()
    if metrics['per_cod']:
        print("PRECIZIE PER COD (TP/FP/FN/prec/recall):")
        for cod, s in sorted(metrics['per_cod'].items()):
            prec = f"{s['precision']:.0%}" if s['precision'] is not None else "N/A"
            rec = f"{s['recall']:.0%}" if s['recall'] is not None else "N/A"
            print(f" {cod:<10} TP={s['tp']:3d} FP={s['fp']:3d} FN={s['fn']:3d}"
                  f" prec={prec:>5} recall={rec:>5}")
        print()
    if metrics['confusion_matrix']:
        print("MATRICE CONFUZIE (gold->pred, >0):")
        for key, cnt in sorted(metrics['confusion_matrix'].items()):
            gold, pred_lbl = key.split("->", 1)
            # Show only the errors (gold != pred)
            if cnt > 0 and gold != pred_lbl:
                print(f" {key:<25} {cnt}")
    print(sep)
def main() -> None:
    import argparse

    p = argparse.ArgumentParser(
        description="Harness eval held-out L14-S5 (PRD 5.14).",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Moduri de utilizare:
  Generare esantion pt etichetare umana (FARA LLM):
    python3 tools/mapare-llm/heldout_eval.py --n 250 --out esantion.csv
  Evaluare predictii vs ground-truth (dupa etichetare umana):
    python3 tools/mapare-llm/heldout_eval.py \\
        --eval predictii.csv gold.csv
Format CSV predictii: denumire;cod_pred (separator ';')
Format CSV gold: denumire;cod_gold (separator ';')
""",
    )
    p.add_argument("--n", type=int, default=250,
                   help="Marimea esantionului de etichetat (default 250)")
    p.add_argument("--seed", type=int, default=DEFAULT_SEED,
                   help=f"Seed reproductibilitate (default {DEFAULT_SEED})")
    p.add_argument("--out", default=None,
                   help="Fisier output CSV pt esantion (mod generare)")
    p.add_argument("--eval", nargs=2, metavar=("PRED_CSV", "GOLD_CSV"),
                   help="Fisiere predictii si ground-truth (mod evaluare)")
    p.add_argument("--data", default=None,
                   help="Director CSV date (default: docs/operatii-service/)")
    args = p.parse_args()
    data_dir = args.data or os.path.join(_ROOT, "docs", "operatii-service")

    if args.eval:
        # Evaluation mode
        pred_path, gold_path = args.eval

        def read_csv_map(path, cod_col):
            result = []
            with open(path, encoding='utf-8-sig') as f:
                reader = csv.DictReader(f, delimiter=';')
                for row in reader:
                    den = (row.get('denumire') or '').strip()
                    cod = (row.get(cod_col) or '').strip()
                    if den:
                        result.append({"denumire": den, cod_col: cod})
            return result

        preds = read_csv_map(pred_path, "cod_pred")
        gold = read_csv_map(gold_path, "cod_gold")
        metrics = eval_predictions(preds, gold)
        _print_report(metrics)
        return

    # Sample-generation mode
    print(f"Incarcare corpus din {data_dir} ...")
    rows = _load_corpus_from_csvs(data_dir)
    print(f"Corpus: {len(rows)} denumiri distincte, "
          f"volum total {sum(nr for _, nr in rows):,}")
    sample = sample_stratified(rows, n_sample=args.n, seed=args.seed)

    # Per-stratum statistics
    from collections import Counter
    strat_cnt = Counter(item["strat"] for item in sample)
    print(f"Esantion ({len(sample)} iteme, seed={args.seed}):")
    for strat in ("cap", "mijloc", "coada"):
        print(f" {strat:<8}: {strat_cnt.get(strat, 0):4d} iteme")

    out_path = args.out or os.path.join(_HERE, "heldout-esantion.csv")
    export_for_labeling(sample, out_path)
    print(f"Esantion exportat: {out_path}")
    print()
    print("INSTRUCTIUNI ETICHETARE:")
    print(" Deschide fisierul exportat si completeaza coloana 'cod_gold'")
    print(" cu codul RAR corect pentru fiecare denumire.")
    print(" Coduri RAR valide:", ", ".join(sorted(VALID_RAR)), ", NUL")
    print(" NUL = denumire care NU este operatie de service (discount, ITP, etc.)")
    print(" '?' = incert (clasificatorul nu poate decide)")
    print()
    print(" ATENTIE: NU folosi etichete LLM drept cod_gold!")
    print(" Asta ar fi 'antrenare pe test' (Decision #19, PRD 5.14) si ar")
    print(" invalida orice masurare de acuratete.")


if __name__ == "__main__":
    main()
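To see how sample_stratified() splits the default 250-item budget, its boundary and allocation arithmetic can be replayed on hypothetical corpus sizes. The sketch below copies only those rules (20%/50% boundaries, max(1, round(...)) per stratum, remainder to the tail), not the random sampling, so it runs without the project's modules; `allocation` is a hypothetical helper name.

```python
def allocation(n_distinct: int, n_sample: int = 250) -> dict[str, int]:
    """Replays the stratum boundaries and per-stratum allocation of sample_stratified()."""
    head_end = max(1, round(n_distinct * 0.20))
    mid_end = min(max(head_end + 1, round(n_distinct * 0.50)), n_distinct)
    sizes = {
        "cap": head_end,
        "mijloc": mid_end - head_end,
        "coada": n_distinct - mid_end,
    }
    alloc: dict[str, int] = {}
    for name in ("cap", "mijloc"):
        if sizes[name] == 0:
            alloc[name] = 0
        else:
            share = max(1, round(n_sample * sizes[name] / n_distinct))
            alloc[name] = min(sizes[name], share)  # never more than the stratum holds
    # The tail takes whatever budget is left, but at least 1 item if non-empty
    remaining = max(0, n_sample - alloc["cap"] - alloc["mijloc"])
    alloc["coada"] = min(remaining, sizes["coada"])
    if alloc["coada"] == 0 and sizes["coada"] > 0:
        alloc["coada"] = 1
    return alloc


print(allocation(1000))  # {'cap': 50, 'mijloc': 75, 'coada': 125}
print(allocation(7))     # {'cap': 1, 'mijloc': 3, 'coada': 3}
```

With 1,000 distinct denumiri the budget splits 50/75/125, mirroring the 20/30/50 strata; a corpus with fewer distinct denumiri than the budget is returned whole.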

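A related question for the kill criterion's 0.5% wrong-code threshold is how low a rate a labeled sample of a given size can actually certify. If n labeled items contain zero wrong codes, the exact one-sided 95% upper bound on the true rate is 1 - 0.05^(1/n), close to the "rule of three" (3/n). The sketch below is a sizing aid added alongside the harness, not part of it; both function names are hypothetical.

```python
def zero_error_upper_bound(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided (1 - alpha) upper bound on a rate when 0 of n items fail."""
    return 1.0 - alpha ** (1.0 / n)


def min_sample_for(threshold: float, alpha: float = 0.05) -> int:
    """Smallest n for which zero observed errors bound the rate below `threshold`."""
    n = 1
    while zero_error_upper_bound(n, alpha) >= threshold:
        n += 1
    return n


print(f"{zero_error_upper_bound(250):.2%}")  # 1.19%
print(min_sample_for(0.005))                 # 598
```

So a 250-item sample with no wrong codes still only bounds the wrong-code rate at about 1.2%, above the 0.5% threshold; certifying "below 0.5%" at 95% confidence takes roughly 600 labeled items with zero wrong codes, and more if any occur.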
tools/mapare-llm/holdout.py

@@ -0,0 +1,347 @@
"""
Empirical validation of Premise 1: "90%+ of future traffic consists of repeats of the same denumiri".

CRITICAL LIMITATION (documented explicitly):
    The CSVs in docs/operatii-service/ contain AGGREGATED frequencies (DENOP + NR),
    with no date/timestamp column. Strict temporal validation (corpus = months 1..N,
    test = the months after N) is NOT possible with the current data.

PROXIES USED (honest: neither claims to be temporal validation):
    1. COVERAGE PROXY (Zipf):
        hit_rate_at_K = sum(NR of the top-K denumiri by frequency) / total_NR
        Measures: if we label the top-K denumiri and future traffic follows the
        same Zipf distribution (stationarity assumption), what % of the traffic
        is covered. It does NOT measure vocabulary drift over time.
    2. LEAVE-FIRST-OUT PROXY:
        leave_one_out_hit_rate = (total_volume - total_distinct) / total_volume
        Measures: if the corpus = "every denumire seen at least once", what % of
        occurrences are "repeats" (occurrences 2, 3, ..., n of each denumire)?
        Singletons (NR=1) contribute 0 hits (the first occurrence is an unavoidable miss).
        This is the upper bound of the hit rate under stationarity.

VERDICT on Premise 1 (based on the coverage proxy):
    SUSTINUTA    (supported):    <= 10% of the distinct denumiri cover >= 90% of the volume
    SLABA        (weak):         10% to 30% of the distinct denumiri needed for >= 90% of the volume
    NEVALIDABILA (unverifiable): > 30% of the distinct denumiri needed (weak/flat Zipf distribution)

Reuses normalize_for_match from app/mapping.py as the matching key.
"""
from __future__ import annotations

import csv
import os
import sys

# Path to the project root (two levels above tools/mapare-llm/)
_HERE = os.path.dirname(os.path.abspath(__file__))
_ROOT = os.path.abspath(os.path.join(_HERE, '..', '..'))
if _ROOT not in sys.path:
    sys.path.insert(0, _ROOT)

from app.mapping import normalize_for_match


# Re-export normalize_for_match under a shorter alias for internal use + tests.
def normalize_key(text: object) -> str:
    """Alias for normalize_for_match from app/mapping.py.

    Upper-case, diacritics removed, whitespace collapsed.
    Example: 'Reparație motor' -> 'REPARATIE MOTOR'.
    """
    return normalize_for_match(text)
# ---------------------------------------------------------------------------
# I/O
# ---------------------------------------------------------------------------
def load_csv(path: str) -> list[tuple[str, int]]:
    """Load a CSV with the columns DENOP (denumire) + NR (frequency).

    Returns a list of (original_denumire, total_nr) after aggregating on the
    normalize_key key (which unifies spelling variants: diacritics, case).
    Rows with an empty DENOP or a non-integer or non-positive NR are skipped.
    """
    agg: dict[str, list] = {}  # normalized_key -> [first_seen_denumire, total_nr]
    with open(path, encoding='utf-8-sig') as f:
        reader = csv.DictReader(f, delimiter=';')
        for row in reader:
            denop = (row.get('DENOP') or '').strip().strip('"')
            nr_raw = (row.get('NR') or '').strip().strip('"')
            if not denop or not nr_raw:
                continue
            try:
                nr = int(nr_raw)
            except ValueError:
                continue
            if nr <= 0:
                continue
            key = normalize_key(denop)
            if key not in agg:
                agg[key] = [denop, 0]
            agg[key][1] += nr
    return [(v[0], v[1]) for v in agg.values()]
# ---------------------------------------------------------------------------
# Pure functions (testable without I/O)
# ---------------------------------------------------------------------------
def compute_volume_coverage(rows: list[tuple[str, int]]) -> list[dict]:
    """Sort by NR (descending) and compute the cumulative volume coverage.

    Returns:
        [{denumire, nr, cumulative_volume_frac, cumulative_count}, ...]
    where cumulative_volume_frac is the share of total_NR covered by the first
    `cumulative_count` denumiri (after the descending sort).
    """
    sorted_rows = sorted(rows, key=lambda x: -x[1])
    total_volume = sum(nr for _, nr in sorted_rows)
    if total_volume == 0:
        return []
    cumul = 0
    result = []
    for i, (denumire, nr) in enumerate(sorted_rows, 1):
        cumul += nr
        result.append({
            'denumire': denumire,
            'nr': nr,
            'cumulative_volume_frac': cumul / total_volume,
            'cumulative_count': i,
        })
    return result
def corpus_size_for_threshold(rows: list[tuple[str, int]], threshold: float = 0.90) -> int:
    """Minimum number of labels (highest frequency first) for >= threshold volume coverage.

    Sorts in descending order and counts how many denumiri are needed to reach
    the threshold. Returns len(rows) if the threshold is never reached, which
    only happens for an empty or zero-volume input (or a threshold above 1.0).
    """
    coverage = compute_volume_coverage(rows)
    for entry in coverage:
        if entry['cumulative_volume_frac'] >= threshold:
            return entry['cumulative_count']
    return len(rows)
def compute_hit_rate_at_k(rows: list[tuple[str, int]], k: int) -> float:
    """Share of total volume covered by the top-K denumiri (coverage proxy).

    Interpretation: if we label the K most frequent denumiri and future traffic
    follows the same distribution, hit_rate_at_K = the probability that a
    future transaction is covered by the corpus.
    """
    if not rows:
        return 0.0
    sorted_rows = sorted(rows, key=lambda x: -x[1])
    total_volume = sum(nr for _, nr in sorted_rows)
    if total_volume == 0:
        return 0.0
    top_k_volume = sum(nr for _, nr in sorted_rows[:k])
    return top_k_volume / total_volume
def leave_one_out_hit_rate(rows: list[tuple[str, int]]) -> float:
    """Leave-first-out proxy: (total_volume - total_distinct) / total_volume.

    Interpretation: if the corpus = every denumire seen at least once, the share
    of occurrences that are "repeats" (not a first occurrence) = hits.
    Singletons (NR=1) contribute 0 hits (first and only occurrence = miss).
    This is an UPPER BOUND on the real hit rate under the stationarity assumption.
    It is NOT temporal validation (it does not measure when new denumiri appear).
    """
    if not rows:
        return 0.0
    total_volume = sum(nr for _, nr in rows)
    total_distinct = len(rows)
    if total_volume == 0:
        return 0.0
    return (total_volume - total_distinct) / total_volume
def singleton_stats(rows: list[tuple[str, int]]) -> dict:
    """Statistics for denumiri with NR=1 (seen exactly once).

    Singletons matter: they are ALWAYS misses on their first occurrence and,
    if they never recur, they stay misses for good.
    """
    singletons = [(d, n) for d, n in rows if n == 1]
    total_distinct = len(rows)
    total_volume = sum(nr for _, nr in rows)
    singleton_volume = len(singletons)  # each singleton contributes NR=1
    return {
        'singleton_count': len(singletons),
        'total_distinct': total_distinct,
        'singleton_volume_frac': singleton_volume / total_volume if total_volume else 0.0,
        'singleton_distinct_frac': len(singletons) / total_distinct if total_distinct else 0.0,
    }
def run_holdout(rows: list[tuple[str, int]], client_name: str = 'unknown') -> dict:
    """Full proxy holdout analysis for a set of (denumire, nr) pairs.

    Combines the coverage proxy (Zipf) and the leave-first-out proxy.
    Returns a dict with statistics and a verdict on Premise 1; percentages are
    reported on a 0-100 scale, rounded to 2 decimals.
    """
    total_distinct = len(rows)
    total_volume = sum(nr for _, nr in rows)
    coverage_at_100 = compute_hit_rate_at_k(rows, k=100)
    coverage_at_500 = compute_hit_rate_at_k(rows, k=500)
    coverage_at_1000 = compute_hit_rate_at_k(rows, k=1000)
    labels_for_90pct = corpus_size_for_threshold(rows, threshold=0.90)
    frac_for_90pct = labels_for_90pct / total_distinct if total_distinct else 1.0
    loh = leave_one_out_hit_rate(rows)
    s = singleton_stats(rows)

    # Verdict from the coverage proxy (Zipf): share of distinct denumiri needed for 90% of the volume
    if frac_for_90pct <= 0.10:
        verdict = 'SUSTINUTA'
    elif frac_for_90pct <= 0.30:
        verdict = 'SLABA'
    else:
        verdict = 'NEVALIDABILA'

    return {
        'client': client_name,
        'total_distinct': total_distinct,
        'total_volume': total_volume,
        'coverage_at_100': round(coverage_at_100 * 100, 2),
        'coverage_at_500': round(coverage_at_500 * 100, 2),
        'coverage_at_1000': round(coverage_at_1000 * 100, 2),
        'labels_for_90pct': labels_for_90pct,
        'frac_for_90pct': round(frac_for_90pct * 100, 2),
        'leave_one_out_hit_rate': round(loh * 100, 2),
        'singleton_count': s['singleton_count'],
        'singleton_distinct_frac': round(s['singleton_distinct_frac'] * 100, 2),
        'singleton_volume_frac': round(s['singleton_volume_frac'] * 100, 2),
        'verdict': verdict,
        'nota': (
            'PROXY FRECVENTA (fara timestamp temporal): validare temporala stricta '
            'imposibila cu datele curente. hit_rate_at_K = % volum acoperit de top-K '
            'etichete; valida NUMAI sub ipoteza distributie stabila in timp.'
        ),
    }
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def _format_row(label: str, value: str, width: int = 45) -> str:
return f" {label:<{width}}{value}"
def main() -> None:
"""Ruleaza holdout pe toate CSV-urile din docs/operatii-service/."""
root = os.path.join(_ROOT, 'docs', 'operatii-service')
clients = ['clever', 'sigma', 'automotive', 'south']
sep = "=" * 72
print(sep)
print("HOLDOUT PREMISA 1 — PROXY FRECVENTA (fara date temporale)")
print(sep)
print("LIMITARE: CSV-urile contin frecvente AGREGATE (DENOP + NR), fara")
print("coloana de data/timestamp. Validarea temporala stricta NU e posibila.")
print()
print("PROXY 1 (Coverage Zipf): hit_rate_at_K = % volum acoperit de top-K")
print(" -> valida sub ipoteza distributie stabila (nemasurabila cu date curente)")
print("PROXY 2 (Leave-first-out): (total_vol - total_distinct) / total_vol")
print(" -> limita superioara a hit-rate-ului daca am eticheta tot ce vedem odata")
print(sep)
print()
all_rows_combined: list[tuple[str, int]] = []
results = []
for client in clients:
path = os.path.join(root, f'operatii-service-{client}.csv')
rows = load_csv(path)
all_rows_combined.extend(rows)
r = run_holdout(rows, client_name=client)
results.append(r)
print(f"CLIENT: {client.upper()}")
print(_format_row("Denumiri distincte:", f"{r['total_distinct']:,}"))
print(_format_row("Volum total operatii:", f"{r['total_volume']:,}"))
print(_format_row("Coverage top-100:", f"{r['coverage_at_100']:.1f}%"))
print(_format_row("Coverage top-500:", f"{r['coverage_at_500']:.1f}%"))
print(_format_row("Coverage top-1000:", f"{r['coverage_at_1000']:.1f}%"))
print(_format_row(
"Etichete pt 90% vol:",
f"{r['labels_for_90pct']} ({r['frac_for_90pct']:.1f}% din distinct)"
))
print(_format_row(
"Leave-first-out hit-rate:",
f"{r['leave_one_out_hit_rate']:.1f}%"
))
print(_format_row(
"Singletons (NR=1):",
f"{r['singleton_count']} ({r['singleton_distinct_frac']:.1f}% din distinct,"
f" {r['singleton_volume_frac']:.1f}% din vol)"
))
print(f" VERDICT PREMISA 1: {r['verdict']}")
print()
    # Aggregate: re-aggregate on the normalized key (different clients may use the same names)
    agg_dict: dict[str, list] = {}
    for client in clients:
        path = os.path.join(root, f'operatii-service-{client}.csv')
        rows_c = load_csv(path)
        for (d, n) in rows_c:
            k = normalize_key(d)
            if k not in agg_dict:
                agg_dict[k] = [d, 0]
            agg_dict[k][1] += n
    all_rows_agg = [(v[0], v[1]) for v in agg_dict.values()]
    agg = run_holdout(all_rows_agg, client_name='AGREGAT_4_CLIENTI')
    print("CLIENT: AGREGAT (4 clienti, distinct cross-client)")
    print(_format_row("Denumiri distincte:", f"{agg['total_distinct']:,}"))
    print(_format_row("Volum total operatii:", f"{agg['total_volume']:,}"))
    print(_format_row("Coverage top-100:", f"{agg['coverage_at_100']:.1f}%"))
    print(_format_row("Coverage top-500:", f"{agg['coverage_at_500']:.1f}%"))
    print(_format_row("Coverage top-1000:", f"{agg['coverage_at_1000']:.1f}%"))
    print(_format_row(
        "Etichete pt 90% vol:",
        f"{agg['labels_for_90pct']} ({agg['frac_for_90pct']:.1f}% din distinct)"
    ))
    print(_format_row("Leave-first-out hit-rate:", f"{agg['leave_one_out_hit_rate']:.1f}%"))
    print(_format_row(
        "Singletons (NR=1):",
        f"{agg['singleton_count']} ({agg['singleton_distinct_frac']:.1f}% din distinct,"
        f" {agg['singleton_volume_frac']:.1f}% din vol)"
    ))
    print(f" VERDICT PREMISA 1: {agg['verdict']}")
    print()
    print(sep)
    print("CONCLUZIE PREMISA 1:")
    verdicts = [r['verdict'] for r in results]
    if all(v == 'SUSTINUTA' for v in verdicts):
        print(" SUSTINUTA la toti clientii individual.")
    elif any(v == 'SUSTINUTA' for v in verdicts):
        print(" PARTIALA: sustinuta la unii clienti, slaba/nevalidabila la altii.")
    else:
        print(" SLABA sau NEVALIDABILA la toti clientii.")
    print(f" Agregat: {agg['verdict']}")
    print()
    print("NOTA METODOLOGICA:")
    print(" Concluzia e valida NUMAI sub ipoteza ca distributia de frecvente e stabila")
    print(" in timp (vocabularul service-ului nu se schimba semnificativ de la luna la luna).")
    print(" Pentru validare temporala stricta, sunt necesare date cu coloana de data.")
    print(sep)


if __name__ == '__main__':
    main()
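The harness reports how much of the operation volume the most frequent names cover: coverage at the top K, labels needed for 90% of volume, and singletons. It also merges clients on a normalized key. `run_holdout()` and `normalize_key()` are defined outside this excerpt. The sketch below therefore uses *assumed* definitions inferred only from the printed labels, and `normalize()` is a hypothetical stand-in for the real normalizer.

```python
# Sketch only: ASSUMED definitions of the volume metrics printed by the
# harness. The real run_holdout() and normalize_key() live outside this
# excerpt and may compute these differently.


def normalize(name):
    # HYPOTHETICAL stand-in for normalize_key(): casefold + collapse whitespace
    return " ".join(name.casefold().split())


def aggregate(per_client_rows):
    """Merge (name, count) rows from several clients on the normalized key,
    keeping the first spelling seen as the display name."""
    merged = {}
    for rows in per_client_rows:
        for name, n in rows:
            entry = merged.setdefault(normalize(name), [name, 0])
            entry[1] += n
    return [(name, n) for name, n in merged.values()]


def coverage_metrics(rows, ks=(100, 500, 1000)):
    """rows: (name, count) pairs. Percentages are relative to total volume."""
    counts = sorted((n for _, n in rows), reverse=True)  # most frequent first
    total = sum(counts) or 1
    metrics = {}
    for k in ks:
        # share of all operations covered by the k most frequent names
        metrics[f"coverage_at_{k}"] = 100.0 * sum(counts[:k]) / total
    # smallest number of labels whose cumulative volume reaches 90%
    # (integer comparison avoids float rounding at the boundary)
    acc, labels = 0, len(counts)
    for i, n in enumerate(counts, start=1):
        acc += n
        if 10 * acc >= 9 * total:
            labels = i
            break
    metrics["labels_for_90pct"] = labels
    singletons = sum(1 for n in counts if n == 1)
    metrics["singleton_count"] = singletons
    metrics["singleton_distinct_frac"] = 100.0 * singletons / (len(counts) or 1)
    metrics["singleton_volume_frac"] = 100.0 * singletons / total
    return metrics
```

Take counts 50, 30, 10 and 5 plus five singletons (total volume 100). Under these assumed definitions, three labels reach 90% of the volume, while the five singletons make up over half of the distinct names but only 5% of the volume.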

@@ -0,0 +1,300 @@
"""Offline OpenRouter batch labeler (Layer 1), L14-S1.
Classifies service-operation names into the 18 RAR codes + NUL.
Implemented requirements (PRD 5.14 / Decision Audit Trail):
  1. Prioritization by FREQUENCY (descending): corpus_by_freq() from or_common
  2. Similarity grouping (rapidfuzz token_sort_ratio, conservative threshold,
     Eng-F7): the LLM labels only the representative, and its code is
     propagated to the rest of the group
  3. NVIDIA ensemble (super-120b + nano-9b, PRD #9): unanimous agreement -> high;
     disagreement (any divergence) -> needs_mapping. Votes are on codes, not on
     self-confidence. ultra-550b EXCLUDED (4-5x slower, zero gain)
  4. PII scrubbing (F3): built into or_common.call() (regex for licence-plate
     numbers/VIN)
  5. Resumable: writes *-partial.json incrementally and resumes where it left
     off; retry/backoff on 429 is handled by or_common.call()
  6. Output: {denumire, cod, sursa, confidence, grup_rep}
     NUL = negative anchor + suppression, NOT promoted to a RAR code (#4)
CLI: python3 tools/mapare-llm/or_label.py [N] [--out path] [--partial path]
     [--threshold 85] [--batch 20] [--pace 4.0]
"""

import sys
import os
import json
import time
from collections import Counter

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
import or_common as oc
from rapidfuzz import fuzz

# NVIDIA models (PRD decision #9: keep super-120b + nano-9b; drop ultra-550b)
MODELS = [
    "nvidia/nemotron-3-super-120b-a12b:free",
    "nvidia/nemotron-nano-9b-v2:free",
]
DEFAULT_THRESHOLD = 85  # conservative grouping radius (Eng-F7)
DEFAULT_BATCH = 20      # names per LLM call (free-tier cap ~50 requests/day)
DEFAULT_N = 500         # top N names by frequency to process
DEFAULT_PACE = 4.0      # seconds between batches (OpenRouter free tier ~20 req/min)
HERE = os.path.dirname(os.path.abspath(__file__))
PARTIAL_PATH = os.path.join(HERE, "or-labels-partial.json")
FINAL_PATH = os.path.join(HERE, "or-labels-final.json")


def group_by_similarity(corpus, threshold=DEFAULT_THRESHOLD):
    """Group names by fuzz.token_sort_ratio similarity.
    corpus: list of (denumire, freq) sorted by frequency, DESCENDING.
        The highest-frequency element becomes the group representative.
    threshold: minimum similarity score (0-100). Conservative value = 85.
    Greedy algorithm: the first unassigned item becomes a representative; later
    items scoring >= threshold against that representative join its group.
    Conservative: no transitive grouping (similarity is measured only against
    the representative).
    Returns: list of dicts {rep: str, freq: int, members: [(den, freq), ...]}
    """
    assigned = set()
    groups = []
    for i, (den_i, freq_i) in enumerate(corpus):
        if den_i in assigned:
            continue
        members = []
        for j, (den_j, freq_j) in enumerate(corpus):
            if j <= i or den_j in assigned:
                continue
            if fuzz.token_sort_ratio(den_i, den_j) >= threshold:
                members.append((den_j, freq_j))
                assigned.add(den_j)
        assigned.add(den_i)
        groups.append({"rep": den_i, "freq": freq_i, "members": members})
    return groups


def ensemble_vote(votes):
    """Compute the ensemble verdict from the models' votes.
    votes: dict {model_id: cod}; "?" means a parse failure (excluded).
    Rules (2 NVIDIA models, same family):
      - All N models return the same valid code -> (cod, "high", "ensemble-unanim")
      - All N models return "NUL" -> ("NUL", "high", "ensemble-unanim-nul")
      - Any divergence / partial parse failure -> ("?", "needs_mapping", "ensemble-dezacord")
    N is len(votes), so the caller must record a vote (possibly "?") for every
    model; a missing entry would shrink N.
    Votes are on codes, NOT on self-confidence (PRD #10, Eng-F7).
    NUL is handled SEPARATELY: it is a negative anchor, not a RAR code (#4).
    Returns: (cod_final, confidence, sursa)
      cod_final: valid RAR code | "NUL" | "?" (needs human review)
      confidence: "high" | "needs_mapping"
      sursa: "ensemble-unanim" | "ensemble-unanim-nul" | "ensemble-dezacord"
    """
    n_models = len(votes)
    valid_votes = [v for v in votes.values() if v != "?"]
    if not valid_votes:
        return "?", "needs_mapping", "ensemble-dezacord"
    c = Counter(valid_votes)
    top_cod, top_cnt = c.most_common(1)[0]
    if top_cnt == n_models:
        # Unanimity: all N models answered with the same code
        if top_cod == "NUL":
            return "NUL", "high", "ensemble-unanim-nul"
        if top_cod in oc.VALID:
            return top_cod, "high", "ensemble-unanim"
        # The code returned by the LLM is not in the RAR nomenclature -> disagreement
        return "?", "needs_mapping", "ensemble-dezacord"
    # Disagreement (including a partial parse failure: top_cnt < n_models)
    return "?", "needs_mapping", "ensemble-dezacord"


def load_partial(path):
    """Load partial results if the file exists.
    Returns dict {rep -> {cod, confidence, sursa, votes}}, or {} if the file
    is missing or corrupt.
    """
    if os.path.exists(path):
        try:
            with open(path, encoding="utf-8") as f:
                return json.load(f)
        except (json.JSONDecodeError, OSError):
            return {}
    return {}


def save_partial(path, results):
    """Save partial results incrementally (overwrites the file).
    results: dict {rep -> {cod, confidence, sursa, votes}}
    """
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=1)


def label_groups(groups, partial, batch_size=DEFAULT_BATCH, pace=DEFAULT_PACE):
    """Label the group representatives with the NVIDIA ensemble.
    Skips representatives already present in partial (resumable).
    Collects per-model votes in batches, computes the ensemble verdict and
    updates partial at the end.
    groups: list of {rep, freq, members} from group_by_similarity()
    partial: dict {rep -> label}, the previous state (modified in place)
    batch_size: names per LLM call
    pace: seconds between batches (0 = no pause, useful in tests)
    Returns the updated partial.
    """
    todo = [g["rep"] for g in groups if g["rep"] not in partial]
    if not todo:
        print("toti reprezentantii sunt deja in partial, nimic de facut", flush=True)
        return partial
    print(f"de etichetat: {len(todo)} reprezentanti "
          f"(skip {len(groups) - len(todo)} din partial)", flush=True)
    # Collect votes per model for every unresolved representative
    votes_per_rep = {rep: {} for rep in todo}
    nb = (len(todo) + batch_size - 1) // batch_size
    for mi, m in enumerate(MODELS):
        print(f" model: {m}", flush=True)
        for bi, k in enumerate(range(0, len(todo), batch_size)):
            batch = todo[k:k + batch_size]
            codes, meta = oc.call(m, batch)
            # Default every name in the batch to "?" so that a short `codes`
            # list counts as a parse failure instead of silently dropping this
            # model's vote (which would let a single vote pass as unanimous)
            for rep in batch:
                votes_per_rep[rep][m] = "?"
            for rep, cod in zip(batch, codes):
                votes_per_rep[rep][m] = cod
            print(f" batch {bi+1}/{nb} {meta['ms']}ms err={meta['err']}", flush=True)
            if bi < nb - 1 and pace > 0:
                time.sleep(pace)
        if pace > 0 and mi < len(MODELS) - 1:
            time.sleep(pace)  # pause between different models
    # Ensemble vote + write into partial
    for rep in todo:
        cod, confidence, sursa = ensemble_vote(votes_per_rep[rep])
        partial[rep] = {
            "cod": cod,
            "confidence": confidence,
            "sursa": sursa,
            "votes": votes_per_rep[rep],
        }
    return partial


def expand_to_all(groups, partial):
    """Propagate the representatives' labels to the group members.
    The representative keeps its ensemble sursa ("ensemble-*").
    Members get sursa="propagat" and the representative's cod/confidence.
    NUL stays NUL when propagated; it is not converted to a RAR code (#4).
    Returns: list of dicts {denumire, cod, sursa, confidence, grup_rep}
    """
    results = []
    for g in groups:
        rep = g["rep"]
        label = partial.get(rep, {})
        cod = label.get("cod", "?")
        confidence = label.get("confidence", "needs_mapping")
        sursa_rep = label.get("sursa", "ensemble-dezacord")
        # The representative
        results.append({
            "denumire": rep,
            "cod": cod,
            "sursa": sursa_rep,
            "confidence": confidence,
            "grup_rep": rep,
        })
        # Group members: propagate the representative's code
        for (mem, _freq) in g["members"]:
            results.append({
                "denumire": mem,
                "cod": cod,
                "sursa": "propagat",
                "confidence": confidence,
                "grup_rep": rep,
            })
    return results


def run(n=DEFAULT_N, output_path=FINAL_PATH, partial_path=PARTIAL_PATH,
        threshold=DEFAULT_THRESHOLD, batch_size=DEFAULT_BATCH, pace=DEFAULT_PACE):
    """Main entry point: read the corpus, group, label, save.
    Resumable: if partial_path exists, already-labeled representatives are skipped.
    n: top N names by frequency to process
    output_path: JSON file with all labels (final)
    partial_path: resumable JSON file (intermediate state per representative)
    threshold: similarity radius for grouping (0-100, default 85 = conservative)
    batch_size: names per LLM call
    pace: seconds between batches
    Returns: list of results (identical to the contents of output_path).
    """
    corpus = oc.corpus_by_freq()
    top = corpus[:n]
    vol_total = sum(nr for _, nr in corpus) or 1
    vol_top = sum(nr for _, nr in top)
    print(f"corpus: {len(corpus)} denumiri distincte, volum total {vol_total}")
    print(f"top {n} dupa frecventa: volum {vol_top} ({100*vol_top/vol_total:.1f}%)")
    groups = group_by_similarity(top, threshold)
    n_reps = len(groups)
    n_mems = sum(len(g["members"]) for g in groups)
    print(f"dupa grupare: {n_reps} reprezentanti, {n_mems} membri propagati din {len(top)}")
    partial = load_partial(partial_path)
    print(f"partial incarcat: {len(partial)} reprezentanti deja etichetati")
    partial = label_groups(groups, partial, batch_size, pace)
    save_partial(partial_path, partial)
    print(f"partial salvat: {partial_path}")
    results = expand_to_all(groups, partial)
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=1)
    # Summary report
    nul_cnt = sum(1 for r in results if r["cod"] == "NUL")
    high_cnt = sum(1 for r in results if r["confidence"] == "high")
    needs_cnt = sum(1 for r in results if r["confidence"] == "needs_mapping")
    prop_cnt = sum(1 for r in results if r["sursa"] == "propagat")
    print(f"\nREZULTAT: {len(results)} denumiri in output")
    print(f" NUL (gunoi, ancore negative): {nul_cnt}")
    print(f" confidence high (unanim): {high_cnt}")
    print(f" needs_mapping (dezacord): {needs_cnt}")
    print(f" propagate din grup: {prop_cnt}")
    print(f"salvat: {output_path}")
    return results


if __name__ == "__main__":
    import argparse
    p = argparse.ArgumentParser(description="Etichetator batch offline OpenRouter (L14-S1)")
    p.add_argument("n", nargs="?", type=int, default=DEFAULT_N,
                   help=f"top N denumiri dupa frecventa (default {DEFAULT_N})")
    p.add_argument("--out", default=FINAL_PATH, metavar="PATH",
                   help="fisier output final JSON (default: or-labels-final.json)")
    p.add_argument("--partial", default=PARTIAL_PATH, metavar="PATH",
                   help="fisier partial resumabil JSON (default: or-labels-partial.json)")
    p.add_argument("--threshold", type=int, default=DEFAULT_THRESHOLD,
                   help=f"raza similaritate grupare 0-100 (default {DEFAULT_THRESHOLD})")
    p.add_argument("--batch", type=int, default=DEFAULT_BATCH,
                   help=f"denumiri per apel LLM (default {DEFAULT_BATCH})")
    p.add_argument("--pace", type=float, default=DEFAULT_PACE,
                   help=f"sec intre batch-uri (default {DEFAULT_PACE})")
    a = p.parse_args()
    run(a.n, a.out, a.partial, a.threshold, a.batch, a.pace)
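The or_label.py pipeline has three steps: greedy grouping against a single representative, a unanimous vote on codes, and propagation to group members. All three can be exercised offline without OpenRouter. This is a sketch, not the module's code. `difflib.SequenceMatcher` stands in for rapidfuzz, so scores differ from `fuzz.token_sort_ratio`. The codes `"A"`/`"B"` are hypothetical placeholders for RAR codes. Unlike `ensemble_vote()`, which uses `len(votes)`, `unanimous_vote()` takes the expected model count explicitly (an assumption made for the sketch).

```python
# Sketch only: a dependency-free model of or_label.py's pipeline
# (greedy grouping -> unanimous vote -> propagation). difflib stands in for
# rapidfuzz; the codes "A"/"B" are HYPOTHETICAL placeholders for RAR codes.
from collections import Counter
from difflib import SequenceMatcher


def token_sort_score(a, b):
    # Approximates token_sort_ratio: sort the tokens, then compare the strings.
    # difflib's ratio() is a different matcher, so scores will not match
    # rapidfuzz exactly.
    sa = " ".join(sorted(a.lower().split()))
    sb = " ".join(sorted(b.lower().split()))
    return 100.0 * SequenceMatcher(None, sa, sb).ratio()


def group_greedy(corpus, threshold=85, score=token_sort_score):
    """corpus: (name, freq) sorted by freq descending. Members are compared
    only against the representative, never chained transitively."""
    assigned, groups = set(), []
    for i, (rep, freq) in enumerate(corpus):
        if rep in assigned:
            continue
        assigned.add(rep)
        members = []
        for name, f in corpus[i + 1:]:
            if name not in assigned and score(rep, name) >= threshold:
                members.append((name, f))
                assigned.add(name)
        groups.append({"rep": rep, "freq": freq, "members": members})
    return groups


def unanimous_vote(votes, n_models, valid_codes):
    """High confidence only if every expected model voted the same valid code
    (or NUL); a missing vote or a "?" forces needs_mapping."""
    valid = [v for v in votes.values() if v != "?"]
    if len(votes) != n_models or not valid:
        return "?", "needs_mapping"
    code, count = Counter(valid).most_common(1)[0]
    if count == n_models and (code == "NUL" or code in valid_codes):
        return code, "high"
    return "?", "needs_mapping"


def propagate(groups, labels):
    """Copy each representative's (code, confidence) to its members."""
    out = []
    for g in groups:
        code, conf = labels.get(g["rep"], ("?", "needs_mapping"))
        out.append((g["rep"], code, conf, "rep"))
        out.extend((name, code, conf, "propagat") for name, _ in g["members"])
    return out
```

Passing `n_models` explicitly is the design choice here. If a model's vote never arrives, the name drops to needs_mapping rather than letting the one remaining vote count as unanimous. In the real module, the equivalent guard is recording `"?"` for every model in `label_groups()`.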