nlp-master/tests/test_regression.sh
Marius Mutu d22038d002 refactor: parametrize pipeline with --course flag + Vimeo/text support
A single set of scripts now runs on any course configured in
courses.py. Master stays at the repo root (backward-compat M1-M6);
new courses (e.g. practitioner at shop.cursnlp.ro) get a dedicated
root (nlp-practitioner/) with their own artifacts.

- courses.py: config dict (master, practitioner) + course_paths() +
  validate_manifest_course() (a manifest without course_key = master).
- download.py: --course + --modules; three lecture types (HTTP audio,
  Vimeo iframe via yt-dlp audio-only, text-only with HTML capture);
  merges with the existing manifest instead of replacing it; strips
  [Audio] for backward-compat paths.
- transcribe.py: --course + --modules; skips type==text; paths via
  course_paths(); course_key validation.
- summarize.py: --course + --compile; the prompt template uses
  course['name']; writes SUPORT_CURS.md with explicit LF (WSL2 baseline).
- md_to_pdf.py: --course resolves summaries_dir / pdf_dir per course.
- run.bat: detects master|practitioner as the first argument,
  propagates --course to the sub-scripts; backward-compat run.bat [modules].
- requirements.txt: + yt-dlp.
- .gitignore: nlp-practitioner/audio/, audio_wav/, scratch_recon.py, tmp_recon/.
- tests/test_regression.sh: 5 read-only gates (import, schema,
  disk-coherence, byte-identical SUPORT_CURS, cross-course isolation).
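
The courses.py contract described above can be sketched roughly as follows. This is a minimal illustration, not the actual module: the `COURSES` dict, the `root` field, the placeholder course names, and the exact error text are assumptions. Only the function names, the `key` and `name` fields, the two path keys, and the "manifest without course_key = master" rule come from the commit and the test script.

```python
from pathlib import Path

# Hypothetical config; the real courses.py may hold more fields.
# "root" and the "name" values are placeholders for illustration.
COURSES = {
    "master": {"key": "master", "name": "Master", "root": "."},
    "practitioner": {
        "key": "practitioner",
        "name": "Practitioner",
        "root": "nlp-practitioner",
    },
}


def get_course(key):
    """Return the config dict for a course key, or exit on an unknown key."""
    try:
        return COURSES[key]
    except KeyError:
        raise SystemExit(f"unknown course {key!r}; expected one of {sorted(COURSES)}")


def course_paths(course):
    """Resolve per-course artifact paths relative to the repo root."""
    root = Path(course["root"])
    return {
        # Path(".") / "manifest.json" collapses to "manifest.json",
        # which keeps the legacy master layout unchanged.
        "manifest": root / "manifest.json",
        "master_guide": root / "SUPORT_CURS.md",
    }


def validate_manifest_course(manifest, expected_key):
    """Refuse to operate on a manifest that belongs to another course."""
    # Legacy manifests have no course_key and are treated as 'master'.
    actual = manifest.get("course_key", "master")
    if actual != expected_key:
        raise SystemExit(
            f"manifest belongs to course {actual!r}, not {expected_key!r}"
        )


if __name__ == "__main__":
    paths = course_paths(get_course("master"))
    print(paths["manifest"], paths["master_guide"])
    validate_manifest_course({"modules": []}, "master")  # legacy manifest passes
```

Raising `SystemExit` rather than a custom exception matches what gate 2 of the regression script catches. A real implementation could just as well use a dedicated exception type and convert it at the CLI boundary.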

Master course regression: PASS (manifest + SUPORT_CURS.md hash
identical to the baseline /tmp/suport_before.md).
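
The byte-identity claim depends on the compiled file being written with fixed LF line endings regardless of platform. A minimal sketch of that idea, assuming hypothetical helper names (`write_lf`, `sha256_of`) and placeholder file contents, none of which come from summarize.py:

```python
import hashlib
import tempfile
from pathlib import Path


def write_lf(path, text):
    """Write text with LF line endings on every platform."""
    # newline="\n" disables newline translation, so Windows does not
    # turn "\n" into "\r\n" and change the file's bytes.
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        fh.write(text)


def sha256_of(path):
    """Hash the raw bytes of a file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    baseline = Path(tmp) / "suport_before.md"
    rebuilt = Path(tmp) / "SUPORT_CURS.md"
    content = "# Module 1\nSummary\n"  # placeholder content
    write_lf(baseline, content)
    write_lf(rebuilt, content)
    same = sha256_of(baseline) == sha256_of(rebuilt)
    print("byte-identical:", same)
```

Comparing hashes is equivalent in outcome to the `diff -q` check in the script; the hash form is convenient when the baseline lives on another machine.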

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 14:33:19 +03:00


#!/bin/bash
# Regression test: master course (cursuri.aresens.ro/curs/26), run after the
# refactor to confirm that backward compatibility is intact.
#
# Read-only: no download, no re-transcription, no visible change to the
# manifest (summarize.py --compile only overwrites SUPORT_CURS.md, which we
# compare byte-for-byte against the baseline captured pre-refactor).
#
# Baseline: /tmp/suport_before.md (captured pre-refactor).
# Usage: bash tests/test_regression.sh
set -euo pipefail
ROOT="$(cd "$(dirname "$0")/.." && pwd)"
cd "$ROOT"
PY="$ROOT/.venv/Scripts/python.exe"
[ -x "$PY" ] || PY=python
if [ ! -f /tmp/suport_before.md ]; then
  echo "FAIL: baseline /tmp/suport_before.md is missing. Capture it with:"
  echo "  cp SUPORT_CURS.md /tmp/suport_before.md"
  exit 1
fi
echo "=== [1/5] courses.py importable + course 'master' resolves ==="
"$PY" -c "
from courses import get_course, course_paths, validate_manifest_course
c = get_course('master')
p = course_paths(c)
assert c['key'] == 'master'
assert str(p['manifest']) == 'manifest.json', p['manifest']
assert str(p['master_guide']) == 'SUPORT_CURS.md'
print('OK: master root=. manifest=manifest.json')
"
echo "=== [2/5] manifest.json: backward-compat schema (course_key absent or 'master') ==="
"$PY" - <<'PY'
import json
from courses import validate_manifest_course

with open("manifest.json", encoding="utf-8") as fh:
    m = json.load(fh)
# Legacy (no course_key) must be accepted as 'master'.
validate_manifest_course(m, "master")
# Opposite direction must raise.
try:
    validate_manifest_course(m, "practitioner")
except SystemExit as e:
    print(f"OK: cross-course validation refuses: {e}")
else:
    raise SystemExit("FAIL: cross-course validation silently allowed")
assert len(m["modules"]) >= 1, "no modules"
print(f"OK: {len(m['modules'])} modules in manifest")
PY
echo "=== [3/5] transcribe.py --course master (idempotent dry-run: reads manifest, does not re-transcribe) ==="
# A direct invocation is dominated by the disk check on transcript_path; if
# all .txt files exist, whisper does not run. This gate verifies that
# precondition instead of invoking transcribe.py.
"$PY" -c "
import json
from pathlib import Path
m = json.load(open('manifest.json', encoding='utf-8'))
# Lectures marked complete (and not text-only) must have a transcript on disk.
missing = [l['title'] for mod in m['modules'] for l in mod['lectures']
           if l.get('transcribe_status') == 'complete'
           and l.get('type') != 'text'
           and not Path(l['transcript_path']).exists()]
if missing:
    print('FAIL: transcribe_status=complete but .txt missing for:', missing[:3])
    raise SystemExit(1)
print('OK: all completed transcripts present on disk')
"
echo "=== [4/5] summarize.py --course master --compile: SUPORT_CURS.md byte-identical to baseline ==="
"$PY" summarize.py --course master --compile
if ! diff -q SUPORT_CURS.md /tmp/suport_before.md >/dev/null; then
  echo "FAIL: SUPORT_CURS.md differs from baseline /tmp/suport_before.md"
  # '|| true' keeps 'set -e' with pipefail from aborting before the explicit exit.
  diff /tmp/suport_before.md SUPORT_CURS.md | head -30 || true
  exit 1
fi
echo "OK: SUPORT_CURS.md byte-identical to baseline."
echo "=== [5/5] cross-course isolation: --course practitioner does not touch master state ==="
OUT="$("$PY" transcribe.py --course practitioner 2>&1 || true)"
if echo "$OUT" | grep -qiE "belongs to course|not found"; then
  echo "OK: transcribe --course practitioner did not run on the master manifest"
  echo "    (message: $(echo "$OUT" | grep -oE '(belongs to course[^"]*|not found[^"]*)' | head -1))"
else
  echo "FAIL: transcribe --course practitioner produced unexpected output:"
  echo "$OUT" | head -3
  exit 1
fi
echo ""
echo "REGRESSION OK: master course backward-compat intact."