A single set of scripts now runs on any course configured in courses.py. Master stays at the repo root (backward-compat M1-M6); new courses (e.g. practitioner at shop.cursnlp.ro) get a dedicated root (nlp-practitioner/) with their own artifacts.

- courses.py: config dict (master, practitioner) + course_paths() + validate_manifest_course() (a manifest without course_key is treated as master).
- download.py: --course + --modules; three lecture types (HTTP audio, Vimeo iframe via yt-dlp audio-only, text-only with HTML capture); merges into the existing manifest instead of replacing it; strips [Audio] for backward-compat paths.
- transcribe.py: --course + --modules; skips type==text; paths resolved through course_paths(); course_key validation.
- summarize.py: --course + --compile; the prompt template uses course['name']; writes SUPORT_CURS.md with explicit LF (WSL2 baseline).
- md_to_pdf.py: --course resolves summaries_dir / pdf_dir per course.
- run.bat: detects master|practitioner as the first argument and propagates --course to the sub-scripts; backward-compat run.bat [modules].
- requirements.txt: + yt-dlp.
- .gitignore: nlp-practitioner/audio/, audio_wav/, scratch_recon.py, tmp_recon/.
- tests/test_regression.sh: 5 read-only gates (import, schema, disk-coherence, SUPORT_CURS byte-identical, cross-course isolation).

Regression for the master course: PASS (manifest + SUPORT_CURS.md hash identical to baseline /tmp/suport_before.md).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
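To make the courses.py contract concrete, here is a minimal sketch of what the three helpers might look like. Only the function names, the `key` field, the `manifest` and `master_guide` path keys, the `.` and `nlp-practitioner/` roots, and the "missing course_key means master" rule come from this commit and its regression test. The `COURSES` dict name, the `root` field, the `name` values, and the exact error wording are assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical config layout: the commit only says "config dict (master, practitioner)".
# The "name" values are placeholders, not the real course titles.
COURSES = {
    "master": {"key": "master", "name": "Master (placeholder)", "root": "."},
    "practitioner": {
        "key": "practitioner",
        "name": "Practitioner (placeholder)",
        "root": "nlp-practitioner",
    },
}


def get_course(key):
    """Return the config for a course key, or exit with a readable message."""
    try:
        return COURSES[key]
    except KeyError:
        raise SystemExit(f"unknown course {key!r}; expected one of {sorted(COURSES)}")


def course_paths(course):
    """Resolve per-course artifact paths relative to the course root."""
    root = Path(course["root"])
    return {
        # Path(".") / "manifest.json" stringifies to plain "manifest.json",
        # which keeps the master paths identical to the pre-refactor layout.
        "manifest": root / "manifest.json",
        "master_guide": root / "SUPORT_CURS.md",
    }


def validate_manifest_course(manifest, course_key):
    """Refuse to run one course's scripts against another course's manifest."""
    # Legacy manifests have no course_key and are treated as master.
    found = manifest.get("course_key", "master")
    if found != course_key:
        # Assumed wording; the regression test only greps for "belongs to course".
        raise SystemExit(f"manifest belongs to course {found!r}, not {course_key!r}")
```

Raising `SystemExit` rather than a custom exception matches what gate 2 of the regression test catches, and lets the command-line scripts fail with a message instead of a traceback.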
92 lines
3.5 KiB
Bash
#!/bin/bash
# Regression test for the master course (cursuri.aresens.ro/curs/26), run after
# the refactor to confirm that backward compatibility is intact.
#
# Read-only: does not download, does not re-transcribe, and does not visibly
# modify the manifest (summarize.py --compile overwrites only SUPORT_CURS.md,
# which is compared byte for byte against the baseline captured pre-refactor).
#
# Baseline: /tmp/suport_before.md (captured pre-refactor).
# Usage: bash tests/test_regression.sh
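set -euo pipefail

# Run from the repo root regardless of where the script is invoked.
ROOT="$(cd "$(dirname "$0")/.." && pwd)"
cd "$ROOT"
# Prefer the Windows venv interpreter; fall back to python on PATH.
PY="$ROOT/.venv/Scripts/python.exe"
[ -x "$PY" ] || PY=python
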
if [ ! -f /tmp/suport_before.md ]; then
    echo "FAIL: baseline /tmp/suport_before.md is missing. Capture it with:"
    echo "    cp SUPORT_CURS.md /tmp/suport_before.md"
    exit 1
fi

echo "=== [1/5] courses.py importable + course 'master' resolves ==="
"$PY" -c "
from courses import get_course, course_paths, validate_manifest_course
c = get_course('master')
p = course_paths(c)
assert c['key'] == 'master'
assert str(p['manifest']) == 'manifest.json', p['manifest']
assert str(p['master_guide']) == 'SUPORT_CURS.md'
print('OK: master root=. manifest=manifest.json')
"

echo "=== [2/5] manifest.json: backward-compat schema (course_key absent or 'master') ==="
"$PY" - <<'PY'
import json
from courses import validate_manifest_course
with open("manifest.json", encoding="utf-8") as f:
    m = json.load(f)
# Legacy (no course_key) must be accepted as 'master'.
validate_manifest_course(m, "master")
# Opposite direction must raise.
try:
    validate_manifest_course(m, "practitioner")
except SystemExit as e:
    print(f"OK: cross-course validation refuses: {e}")
else:
    raise SystemExit("FAIL: cross-course validation silently allowed")
assert len(m["modules"]) >= 1, "no modules"
print(f"OK: {len(m['modules'])} modules in manifest")
PY

echo "=== [3/5] transcribe.py --course master (idempotent dry-run: reads the manifest, does not re-transcribe) ==="
# A direct invocation is dominated by the disk check on transcript_path; if
# every .txt exists, whisper does not run. This gate replays only that disk
# check instead of invoking transcribe.py.
"$PY" -c "
import json
from pathlib import Path
with open('manifest.json', encoding='utf-8') as f:
    m = json.load(f)
missing = [l['title'] for mod in m['modules'] for l in mod['lectures']
           if l.get('transcribe_status') == 'complete'
           and l.get('type') != 'text'
           and not Path(l['transcript_path']).exists()]
if missing:
    print('FAIL: transcribe_status=complete but .txt missing for:', missing[:3])
    raise SystemExit(1)
print('OK: all completed transcripts present on disk')
"

echo "=== [4/5] summarize.py --course master --compile: SUPORT_CURS.md byte-identical to baseline ==="
"$PY" summarize.py --course master --compile
if ! diff -q SUPORT_CURS.md /tmp/suport_before.md >/dev/null; then
    echo "FAIL: SUPORT_CURS.md differs from baseline /tmp/suport_before.md"
    # diff exits 1 when the files differ; "|| true" keeps set -e/pipefail from
    # aborting here so the explicit exit below is the one that runs.
    diff /tmp/suport_before.md SUPORT_CURS.md | head -30 || true
    exit 1
fi
echo "OK: SUPORT_CURS.md is byte-identical to the baseline."

echo "=== [5/5] cross-course isolation: --course practitioner does not touch master state ==="
OUT="$("$PY" transcribe.py --course practitioner 2>&1 || true)"
# Here-strings avoid a SIGPIPE from "echo | grep -q" being reported as a
# failure under pipefail.
if grep -qiE "belongs to course|not found" <<<"$OUT"; then
    echo "OK: transcribe --course practitioner did not run on the master manifest"
    echo "    (message: $(grep -oiE '(belongs to course[^"]*|not found[^"]*)' <<<"$OUT" | head -1))"
else
    echo "FAIL: unexpected output from transcribe --course practitioner:"
    head -3 <<<"$OUT"
    exit 1
fi

echo ""
echo "REGRESSION OK: master course backward compatibility intact."
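
Gate 4 only passes if summarize.py produces the same bytes on every platform, which is why the commit writes SUPORT_CURS.md with explicit LF. The sketch below shows one way to do that in Python, along with a hash comparison equivalent to the `diff -q` check. The helper names `write_lf`, `sha256_of`, and `same_bytes` are hypothetical and are not taken from summarize.py.

```python
import hashlib
from pathlib import Path


def write_lf(path, text):
    """Write text with LF line endings on every platform."""
    # newline="\n" disables newline translation on write, so "\n" is not
    # turned into CRLF when the script runs under Windows Python.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)


def sha256_of(path):
    """Hex SHA-256 digest of the file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_bytes(a, b):
    """True when both files hash identically."""
    return sha256_of(a) == sha256_of(b)
```

A file written through the default text mode on Windows would contain CRLF and fail this comparison against a baseline captured under WSL2, even though the two look identical in an editor.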