feat(practitioner): per-module structure + source PDFs + 2-PC split

- audio/Modul {N}/filename.mp3: each module lives in its own subdirectory,
  for copying to a phone and transferring between PCs.
- PDFs are kept as source files in summaries/pdf/ (no txt extract).
- transcribe_status="pdf_source_only" for PDF lessons, so summarize.py
  filters them out automatically.
- Fix manifest transcript_path collision (stem-based instead of preserving
  the prior value).
- One .bat per module (M2-M8) + dispatchers run_pc1_all (M2-M5) and
  run_pc2_all (M6-M8) to split the work across 2 PCs.
- prepare_pc2_bundle.py: zip with scripts + manifest + .env + PDFs for
  PC2 (self-installs whisper.cpp/model/ffmpeg on first run).
- M1 whisper complete (49/49 audio+vimeo transcribed).
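
For illustration only, the `pdf_source_only` filter and the stem-based `transcript_path` could look roughly like the sketch below. It is not the code from this commit. The manifest layout (a list of lesson dicts), the `audio_path` key, the `transcripts` directory, and the `.srt` suffix are all assumptions; only `transcribe_status`, `pdf_source_only`, and `transcript_path` come from the message above.

```python
from pathlib import Path

# Status value named in the commit message.
PDF_ONLY = "pdf_source_only"


def lessons_to_summarize(lessons):
    """Drop PDF-only lessons; they have no transcript to summarize.

    `lessons` is assumed to be a list of dicts loaded from manifest.json.
    """
    return [l for l in lessons if l.get("transcribe_status") != PDF_ONLY]


def transcript_path_for(audio_path, transcripts_dir="transcripts", suffix=".srt"):
    """Derive the transcript path from the audio file's stem.

    The path is always recomputed from the file name, never copied from a
    previously stored manifest value. Directory and suffix are hypothetical.
    """
    return Path(transcripts_dir) / f"{Path(audio_path).stem}{suffix}"


if __name__ == "__main__":
    manifest = [
        {"audio_path": "audio/Modul 2/lesson one.mp3", "transcribe_status": "pending"},
        {"audio_path": None, "transcribe_status": PDF_ONLY},
    ]
    for lesson in lessons_to_summarize(manifest):
        lesson["transcript_path"] = str(transcript_path_for(lesson["audio_path"]))
        print(lesson["transcript_path"])
```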

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 08:48:58 +03:00
parent 2e4bb88624
commit 6ee53133b7
132 changed files with 28904 additions and 74 deletions

@@ -20,6 +20,8 @@ Three-stage batch pipeline, all driven by a shared `manifest.json`:
**run.bat** — Windows batch orchestrator: checks prerequisites, auto-installs missing components, creates venv, runs download→transcribe pipeline. Accepts optional module filter argument.
**retranscribe_tail.py** / **fix_hallucinations.bat** — Post-processing repair. Scans SRTs for Whisper hallucination bursts (repeated lines), classifies each as `burst` (in-file loop) or `tail` (runs to EOF), extracts the bad audio segment with ffmpeg, re-transcribes it with anti-hallucination whisper.cpp flags, and splices the result back. Supports `--dry-run` and per-file targeting. Run only after transcription produces visibly broken outputs.
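
The detection step described above can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the actual logic of `retranscribe_tail.py`: the `Cue` type, the `min_repeats` threshold, and the rule "identical consecutive cue text" are hypothetical stand-ins for whatever heuristics the script really uses.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    start: str  # SRT timestamp, e.g. "00:01:02,500"
    end: str
    text: str


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues; malformed or empty blocks are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue
        start, end = (part.strip() for part in lines[1].split("-->"))
        cues.append(Cue(start, end, " ".join(lines[2:]).strip()))
    return cues


def find_repeats(cues: list[Cue], min_repeats: int = 4):
    """Return (first_idx, last_idx, kind) for each run of identical cue text.

    kind is "tail" when the run reaches the last cue, otherwise "burst".
    min_repeats is an assumed threshold, not a value taken from the script.
    """
    runs = []
    i = 0
    while i < len(cues):
        j = i
        while j + 1 < len(cues) and cues[j + 1].text == cues[i].text:
            j += 1
        if j - i + 1 >= min_repeats:
            kind = "tail" if j == len(cues) - 1 else "burst"
            runs.append((i, j, kind))
        i = j + 1
    return runs
```

The start of the first cue and the end of the last cue in each run give the time window that would then be cut out with ffmpeg and re-transcribed.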
## Commands
```bash
@@ -34,6 +36,12 @@ python transcribe.py --modules 1 # transcribe module 1 only
python summarize.py # print summary prompts to stdout
python summarize.py --compile # compile SUPORT_CURS.md from existing summaries
# Repair hallucinated transcripts
python retranscribe_tail.py --dry-run # report what would be fixed
python retranscribe_tail.py # auto-fix all broken transcripts
python retranscribe_tail.py "Master 25M1 Z2B" # fix a single file
fix_hallucinations.bat # same, via Windows wrapper
# MD → PDF (from WSL2, uses .venv_pdf)
.venv_pdf/bin/python md_to_pdf.py # all MODUL*_*.md → summaries/pdf/
.venv_pdf/bin/python md_to_pdf.py --modules 1-3 # specific modules