refactor: parametrize pipeline with --course flag + Vimeo/text support

A single set of scripts now runs on any course configured in
courses.py. Master stays at the repo root (backward-compat M1-M6);
new courses (e.g. practitioner at shop.cursnlp.ro) get a dedicated
root (nlp-practitioner/) with their own artifacts.

- courses.py: config dict (master, practitioner) + course_paths() +
  validate_manifest_course() (manifest without course_key = master).
- download.py: --course + --modules; three lesson types (HTTP audio,
  Vimeo iframe via yt-dlp audio-only, text-only with HTML capture);
  merges with the existing manifest instead of replacing it; strips
  [Audio] for backward-compatible paths.
- transcribe.py: --course + --modules; skips type==text; paths via
  course_paths(); course_key validation.
- summarize.py: --course + --compile; prompt template uses
  course['name']; writes SUPORT_CURS.md with explicit LF (WSL2 baseline).
- md_to_pdf.py: --course resolves summaries_dir / pdf_dir per course.
- run.bat: detects master|practitioner as the first argument,
  propagates --course to the sub-scripts; backward-compat run.bat [modules].
- requirements.txt: + yt-dlp.
- .gitignore: nlp-practitioner/audio/, audio_wav/, scratch_recon.py, tmp_recon/.
- tests/test_regression.sh: 5 read-only gates (import, schema,
  disk-coherence, byte-identical SUPORT_CURS, cross-course isolation).
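
To illustrate the courses.py bullet above, here is a minimal sketch of how a per-course config, course_paths(), and validate_manifest_course() could fit together. The dict fields, the directory and file names (audio, transcripts, manifest.json), and the function signatures are assumptions made for illustration, not the actual contents of courses.py. Only the rule that a manifest without course_key counts as master comes from this commit message.

```python
from pathlib import Path

# Hypothetical config: the fields shown here are illustrative assumptions.
COURSES = {
    "master": {"name": "NLP Master", "root": "."},
    "practitioner": {"name": "NLP Practitioner", "root": "nlp-practitioner"},
}


def course_paths(course_key: str, repo_root: Path) -> dict[str, Path]:
    """Return per-course artifact locations (directory names are assumed)."""
    if course_key not in COURSES:
        raise KeyError(f"unknown course: {course_key!r}")
    root = repo_root / COURSES[course_key]["root"]
    return {
        "root": root,
        "audio": root / "audio",
        "transcripts": root / "transcripts",
        "manifest": root / "manifest.json",
    }


def validate_manifest_course(manifest: dict, course_key: str) -> None:
    """Reject a manifest that belongs to a different course.

    A manifest without course_key is treated as master, so manifests
    written before this refactor keep working.
    """
    found = manifest.get("course_key", "master")
    if found != course_key:
        raise ValueError(f"manifest is for {found!r}, expected {course_key!r}")
```

With this layout, master resolves to the repo root while practitioner resolves under nlp-practitioner/, which is what keeps the two courses' artifacts isolated.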

Master course regression: PASS (manifest + SUPORT_CURS.md hash
identical to baseline /tmp/suport_before.md).
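
The byte-identical check can be sketched as a hash comparison. This is a minimal sketch, assuming SHA-256 and the hypothetical helper names sha256_of and files_identical; the actual gate lives in tests/test_regression.sh and may be implemented differently.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the raw bytes, so a line-ending change (CRLF vs LF) alters the digest."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_identical(baseline: Path, current: Path) -> bool:
    """Return True only when both files contain exactly the same bytes."""
    return sha256_of(baseline) == sha256_of(current)
```

A call such as files_identical(Path("/tmp/suport_before.md"), Path("SUPORT_CURS.md")) would then fail if summarize.py wrote CRLF instead of LF, which is why the explicit-LF write matters for this gate.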

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 14:33:19 +03:00
parent ada00e380d
commit d22038d002
9 changed files with 1192 additions and 795 deletions

run.bat (56 lines changed)

@@ -2,9 +2,27 @@
 setlocal enabledelayedexpansion
 cd /d "%~dp0"
+:: ============================================================
+:: Course + module filter argument parsing
+:: Usage:
+:: run.bat -> master, all modules (backward-compat)
+:: run.bat 1-3 -> master, modules 1-3 (backward-compat)
+:: run.bat practitioner -> practitioner, all modules
+:: run.bat practitioner 1-3 -> practitioner, modules 1-3
+:: ============================================================
+set "COURSE_KEY=master"
+set "MODULE_FILTER=%~1"
+if /i "%~1"=="master" (
+set "COURSE_KEY=master"
+set "MODULE_FILTER=%~2"
+)
+if /i "%~1"=="practitioner" (
+set "COURSE_KEY=practitioner"
+set "MODULE_FILTER=%~2"
+)
 echo ============================================================
-echo NLP Master - Download + Transcribe Pipeline
+echo NLP Course Pipeline (course: %COURSE_KEY%)
 echo ============================================================
 echo.
@@ -46,20 +64,28 @@ if not defined PYTHON_CMD (
 )
 :: --- .env credentials ---
+:: Each course uses its own env var pair. Check based on selected course.
+if /i "%COURSE_KEY%"=="practitioner" (
+set "ENV_USER=PRACTITIONER_USERNAME"
+set "ENV_PASS=PRACTITIONER_PASSWORD"
+) else (
+set "ENV_USER=COURSE_USERNAME"
+set "ENV_PASS=COURSE_PASSWORD"
+)
 if exist ".env" (
-findstr /m "COURSE_USERNAME=." ".env" >nul 2>&1
+findstr /m "!ENV_USER!=." ".env" >nul 2>&1
 if errorlevel 1 (
-echo [X] .env File exists but COURSE_USERNAME is empty
-echo Edit .env and fill in your credentials.
+echo [X] .env File exists but !ENV_USER! is empty
+echo Edit .env and set !ENV_USER! and !ENV_PASS!.
 set "PREREQ_OK="
 ) else (
-echo [OK] .env Credentials configured
+echo [OK] .env Credentials configured for %COURSE_KEY%
 )
 ) else (
 echo [X] .env NOT FOUND
 echo Create .env with:
-echo COURSE_USERNAME=your_email
-echo COURSE_PASSWORD=your_password
+echo !ENV_USER!=your_email
+echo !ENV_PASS!=your_password
 set "PREREQ_OK="
 )
@@ -265,11 +291,11 @@ echo Done.
 echo.
 echo [3/4] Downloading audio files...
 echo ============================================================
-if "%~1"=="" (
-.venv\Scripts\python download.py
+if "!MODULE_FILTER!"=="" (
+.venv\Scripts\python download.py --course %COURSE_KEY%
 ) else (
-echo Modules filter: %~1
-.venv\Scripts\python download.py --modules %~1
+echo Modules filter: !MODULE_FILTER!
+.venv\Scripts\python download.py --course %COURSE_KEY% --modules !MODULE_FILTER!
 )
 if errorlevel 1 (
 echo.
@@ -287,11 +313,11 @@ echo Using: %WHISPER_BIN%
 echo Model: %WHISPER_MODEL%
 echo.
-if "%~1"=="" (
-.venv\Scripts\python transcribe.py
+if "!MODULE_FILTER!"=="" (
+.venv\Scripts\python transcribe.py --course %COURSE_KEY%
 ) else (
-echo Modules filter: %~1
-.venv\Scripts\python transcribe.py --modules %~1
+echo Modules filter: !MODULE_FILTER!
+.venv\Scripts\python transcribe.py --course %COURSE_KEY% --modules !MODULE_FILTER!
 )
 if errorlevel 1 (
 echo.