feat(practitioner): per-module structure + source PDFs + 2-PC split
- audio/Modul {N}/filename.mp3: each module lives in its own subdirectory,
for copying to a phone and transferring between PCs.
- PDFs are kept as source in summaries/pdf/ (no txt extraction).
- transcribe_status="pdf_source_only" for PDF lessons → summarize.py
filters them out automatically.
- Fix manifest transcript_path collision (stem-based, rather than preserving the prior value).
- One .bat per module (M2-M8) + dispatchers run_pc1_all (M2-M5) + run_pc2_all
(M6-M8) to split the work across 2 PCs.
- prepare_pc2_bundle.py: zip with scripts + manifest + .env + PDFs for
PC2 (self-installs whisper.cpp/model/ffmpeg on first run).
- M1 whisper complete (49/49 audio+vimeo transcribed).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -20,6 +20,8 @@ Three-stage batch pipeline, all driven by a shared `manifest.json`:
**run.bat** — Windows batch orchestrator: checks prerequisites, auto-installs missing components, creates venv, runs download→transcribe pipeline. Accepts optional module filter argument.

**retranscribe_tail.py** / **fix_hallucinations.bat** — Post-processing repair. Scans SRTs for Whisper hallucination bursts (repeated lines), classifies each as `burst` (in-file loop) or `tail` (runs to EOF), extracts the bad audio segment with ffmpeg, re-transcribes it with anti-hallucination whisper.cpp flags, and splices the result back. Supports `--dry-run` and per-file targeting. Run only after transcription produces visibly broken outputs.
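The burst/tail classification described above can be illustrated with a minimal sketch. This is not the code in `retranscribe_tail.py`: the function names `parse_srt_texts` and `find_repeated_runs`, and the threshold of four identical consecutive cues, are assumptions chosen for illustration only.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}")


def parse_srt_texts(srt: str) -> list[str]:
    """Return the spoken text of each SRT cue, in order (hypothetical helper)."""
    texts = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, ln in enumerate(lines):
            if TIMESTAMP.search(ln):
                # Everything after the timing line is the cue text.
                text = " ".join(lines[i + 1:])
                if text:
                    texts.append(text)
                break
    return texts


def find_repeated_runs(texts: list[str], min_repeats: int = 4) -> list[tuple[int, int, str]]:
    """Return (first_cue, last_cue, kind) for each run of identical cues.

    kind is "tail" when the run reaches the last cue of the file,
    otherwise "burst". min_repeats is an assumed threshold.
    """
    runs = []
    i = 0
    while i < len(texts):
        j = i
        # Extend the run while the next cue repeats the current text.
        while j + 1 < len(texts) and texts[j + 1] == texts[i]:
            j += 1
        if j - i + 1 >= min_repeats:
            kind = "tail" if j == len(texts) - 1 else "burst"
            runs.append((i, j, kind))
        i = j + 1
    return runs
```

Under these assumptions, the cue indices of each run could be mapped back to their SRT timestamps to choose the audio segment handed to ffmpeg for re-transcription.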
## Commands
```bash
@@ -34,6 +36,12 @@ python transcribe.py --modules 1 # transcribe module 1 only
python summarize.py # print summary prompts to stdout
python summarize.py --compile # compile SUPORT_CURS.md from existing summaries

# Repair hallucinated transcripts
python retranscribe_tail.py --dry-run # report what would be fixed
python retranscribe_tail.py # auto-fix all broken transcripts
python retranscribe_tail.py "Master 25M1 Z2B" # fix a single file
fix_hallucinations.bat # same, via Windows wrapper

# MD → PDF (from WSL2, uses .venv_pdf)
.venv_pdf/bin/python md_to_pdf.py # all MODUL*_*.md → summaries/pdf/
.venv_pdf/bin/python md_to_pdf.py --modules 1-3 # specific modules