nlp-master/TODOS.md
Marius Mutu bbc5884545 NLP Master: pipeline download + transcribe + summarize
- run.bat: one-click pipeline (download, convert, transcribe)
- download.py: fetch audio from course platform
- transcribe.py: whisper.cpp batch transcription (CPU, WAV 16kHz)
  - MP3->WAV conversion via ffmpeg
  - --modules filter for splitting work across machines
- summarize.py: generate summaries from transcripts
- setup_whisper.py: auto-download whisper.cpp, ffmpeg, and model
- Medium model (q5_0) instead of large to avoid VRAM crashes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 01:37:13 +02:00


TODOS

Re-run pipeline for Module 6

  • What: Re-run download.py when module 6 becomes available on cursuri.aresens.ro/curs/26
  • Why: The course has 6 modules in total, but only 5 are currently available. The pipeline is designed to be re-runnable: manifest.json plus resumability lets it discover new modules and skip already-downloaded files.
  • How: Run python download.py → check manifest.json for new files → run python transcribe.py → generate summaries → update SUPORT_CURS.md
  • Depends on: Course provider publishing module 6
  • Added: 2026-03-24
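
The "check manifest for new files" step could be sketched roughly as below. This is only an illustration, not the actual logic in download.py: the manifest layout (a top-level "modules" list whose entries have "name" and "files" keys) and the helper names pending_files and new_modules are assumptions made for this example, since the real schema of manifest.json is not shown here.

```python
from pathlib import Path


def pending_files(manifest: dict, audio_dir: Path) -> list[str]:
    """Return file names listed in the manifest that are not yet on disk.

    Assumed (hypothetical) manifest layout:
    {"modules": [{"name": "...", "files": ["a.mp3", ...]}, ...]}
    """
    pending = []
    for module in manifest.get("modules", []):
        for name in module.get("files", []):
            # A file that already exists is treated as downloaded and skipped.
            if not (audio_dir / name).exists():
                pending.append(name)
    return pending


def new_modules(old: dict, new: dict) -> list[str]:
    """Return names of modules present in the new manifest but not the old one."""
    known = {m["name"] for m in old.get("modules", [])}
    return [m["name"] for m in new.get("modules", []) if m["name"] not in known]
```

Comparing a manifest snapshot saved before the re-run with the one written afterwards would show whether module 6 has appeared, and pending_files would list only the files that still need to be fetched.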