refactor: parametrize pipeline with --course flag + Vimeo/text support

A single set of scripts now runs on any course configured in courses.py. Master stays at the repo root (backward-compat M1-M6); new courses (e.g. practitioner at shop.cursnlp.ro) get a dedicated root (nlp-practitioner/) with their own artifacts.

- courses.py: config dict (master, practitioner) + course_paths() + validate_manifest_course() (a manifest without course_key = master).
- download.py: --course + --modules; three lecture types (HTTP audio, Vimeo iframe via yt-dlp audio-only, text-only with HTML capture); merges with the existing manifest instead of replacing it; strips [Audio] for backward-compat paths.
- transcribe.py: --course + --modules; skips type==text; paths via course_paths(); course_key validation.
- summarize.py: --course + --compile; the prompt template uses course['name']; writes SUPORT_CURS.md with explicit LF (WSL2 baseline).
- md_to_pdf.py: --course resolves summaries_dir / pdf_dir per course.
- run.bat: detects master|practitioner as the first argument and propagates --course to the sub-scripts; backward-compat run.bat [modules].
- requirements.txt: + yt-dlp.
- .gitignore: nlp-practitioner/audio/, audio_wav/, scratch_recon.py, tmp_recon/.
- tests/test_regression.sh: 5 read-only gates (import, schema, disk-coherence, SUPORT_CURS byte-identical, cross-course isolation).

Regression, master course: PASS (manifest + SUPORT_CURS.md hash identical to baseline /tmp/suport_before.md).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
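A minimal sketch of the courses.py contract, inferred only from the commit message and from what gates 1 and 2 of tests/test_regression.sh assert. The function names, the `key` and `name` fields, the `manifest` and `master_guide` path keys, the `SystemExit` on mismatch, and the "belongs to course" wording come from the source. The `COURSES` and `root` names, the display names, and the exact error text are assumptions for illustration; the real module may differ.

```python
from pathlib import Path

# Hypothetical registry. Only the two course keys, the repo-root layout for
# master, and the nlp-practitioner/ root are taken from the commit message.
COURSES = {
    "master": {"key": "master", "name": "Master (placeholder name)", "root": "."},
    "practitioner": {
        "key": "practitioner",
        "name": "Practitioner (placeholder name)",
        "root": "nlp-practitioner",
    },
}


def get_course(key):
    """Look up a course config by key, exiting on an unknown key (assumed behavior)."""
    try:
        return COURSES[key]
    except KeyError:
        raise SystemExit(f"unknown course: {key!r}")


def course_paths(course):
    """Resolve per-course artifact paths relative to the course root."""
    root = Path(course["root"])
    return {
        "manifest": root / "manifest.json",
        "master_guide": root / "SUPORT_CURS.md",
    }


def validate_manifest_course(manifest, expected_key):
    """Refuse to operate on a manifest that belongs to a different course.

    A legacy manifest without course_key is treated as 'master'.
    """
    actual = manifest.get("course_key", "master")
    if actual != expected_key:
        raise SystemExit(
            f"manifest belongs to course {actual!r}, not {expected_key!r}"
        )
```

With this layout, `course_paths(get_course('master'))['manifest']` stringifies to `manifest.json`, which is what gate 1 asserts, and a legacy manifest validated against `practitioner` raises `SystemExit`, which is what gate 2 expects.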
91
tests/test_regression.sh
Normal file
@@ -0,0 +1,91 @@
#!/bin/bash
# Regression test: master course (cursuri.aresens.ro/curs/26), run after the
# refactor to confirm that backward compatibility is intact.
#
# Read-only: does not download, does not re-transcribe, and does not visibly
# modify the manifest (summarize.py --compile only overwrites SUPORT_CURS.md,
# which is compared byte-for-byte against the baseline captured pre-refactor).
#
# Baseline: /tmp/suport_before.md (captured pre-refactor).
# Usage: bash tests/test_regression.sh
set -euo pipefail

ROOT="$(cd "$(dirname "$0")/.." && pwd)"
cd "$ROOT"
PY="$ROOT/.venv/Scripts/python.exe"
[ -x "$PY" ] || PY=python

if [ ! -f /tmp/suport_before.md ]; then
    echo "FAIL: baseline /tmp/suport_before.md is missing. Capture it with:"
    echo "  cp SUPORT_CURS.md /tmp/suport_before.md"
    exit 1
fi

echo "=== [1/5] courses.py is importable + course 'master' resolves ==="
"$PY" -c "
from courses import get_course, course_paths, validate_manifest_course
c = get_course('master')
p = course_paths(c)
assert c['key'] == 'master'
assert str(p['manifest']) == 'manifest.json', p['manifest']
assert str(p['master_guide']) == 'SUPORT_CURS.md'
print('OK: master root=. manifest=manifest.json')
"

echo "=== [2/5] manifest.json: backward-compat schema (course_key absent or 'master') ==="
"$PY" - <<'PY'
import json
from courses import validate_manifest_course
m = json.load(open("manifest.json", encoding="utf-8"))
# Legacy (no course_key) must be accepted as 'master'.
validate_manifest_course(m, "master")
# Opposite direction must raise.
try:
    validate_manifest_course(m, "practitioner")
except SystemExit as e:
    print(f"OK: cross-course validation refuses: {e}")
else:
    raise SystemExit("FAIL: cross-course validation silently allowed")
assert len(m["modules"]) >= 1, "no modules"
print(f"OK: {len(m['modules'])} modules in manifest")
PY

echo "=== [3/5] transcribe.py --course master (idempotent dry run: reads the manifest, does not re-transcribe) ==="
# A direct invocation is dominated by the disk check on transcript_path; if all
# .txt files exist, whisper does not run.
"$PY" -c "
import json
from pathlib import Path
m = json.load(open('manifest.json', encoding='utf-8'))
missing = [l['title'] for mod in m['modules'] for l in mod['lectures']
           if l.get('transcribe_status') == 'complete'
           and l.get('type') != 'text'
           and not Path(l['transcript_path']).exists()]
if missing:
    print('FAIL: transcribe_status=complete but .txt missing for:', missing[:3])
    raise SystemExit(1)
print('OK: all completed transcripts present on disk')
"

echo "=== [4/5] summarize.py --course master --compile: SUPORT_CURS.md byte-identical to baseline ==="
"$PY" summarize.py --course master --compile
if ! diff -q SUPORT_CURS.md /tmp/suport_before.md >/dev/null; then
    echo "FAIL: SUPORT_CURS.md differs from baseline /tmp/suport_before.md"
    diff /tmp/suport_before.md SUPORT_CURS.md | head -30 || true
    exit 1
fi
echo "OK: SUPORT_CURS.md byte-identical to baseline."

echo "=== [5/5] cross-course isolation: --course practitioner does not touch master state ==="
OUT="$("$PY" transcribe.py --course practitioner 2>&1 || true)"
if echo "$OUT" | grep -qiE "belongs to course|not found"; then
    echo "OK: transcribe --course practitioner did not run on the master manifest"
    echo "    (message: $(echo "$OUT" | grep -oiE '(belongs to course[^"]*|not found[^"]*)' | head -1))"
else
    echo "FAIL: unexpected output from transcribe --course practitioner:"
    echo "$OUT" | head -3
    exit 1
fi

echo ""
echo "REGRESSION OK: master course backward compatibility intact."
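Gate 4 checks byte identity with `diff -q`, while the commit message reports an identical hash. A minimal Python sketch of the hash variant follows. It assumes SHA-256, because the source does not say which hash was used, and the helper names `sha256_of` and `is_byte_identical` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of the file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_byte_identical(candidate, baseline):
    """True when both files hash to the same digest.

    Files are read as bytes, so a CRLF/LF difference counts as a mismatch.
    """
    return sha256_of(candidate) == sha256_of(baseline)
```

Because the files are read in binary mode, this comparison is as sensitive to line endings as `diff -q`, which is why summarize.py writing SUPORT_CURS.md with explicit LF matters for the baseline.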