NLP Master: pipeline download + transcribe + summarize

- run.bat: one-click pipeline (download, convert, transcribe)
- download.py: fetch audio from course platform
- transcribe.py: whisper.cpp batch transcription (CPU, WAV 16kHz)
- MP3->WAV conversion via ffmpeg
- --modules filter for splitting work across machines
- summarize.py: generate summaries from transcripts
- setup_whisper.py: auto-download whisper.cpp, ffmpeg, and model
- Medium model (q5_0) instead of large to avoid VRAM crashes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 01:37:13 +02:00
commit bbc5884545
10 changed files with 2203 additions and 0 deletions
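
The MP3->WAV conversion step named in the commit message could look roughly like the sketch below. It is not taken from transcribe.py: the function names are hypothetical, the `audio_wav` default only mirrors the cache directory listed in .gitignore, and mono 16-bit PCM output is an assumption (the message states only "WAV 16kHz"). The ffmpeg flags themselves (`-ar`, `-ac`, `-c:a pcm_s16le`) are standard.

```python
from pathlib import Path
import subprocess


def build_ffmpeg_cmd(src, dst, ffmpeg="ffmpeg"):
    """Build an ffmpeg command that resamples src to 16 kHz mono 16-bit PCM WAV."""
    return [
        ffmpeg,
        "-y",                 # overwrite the output file if it already exists
        "-i", str(src),       # input file (e.g. an MP3)
        "-ar", "16000",       # 16 kHz sample rate, as stated in the commit message
        "-ac", "1",           # mono (assumption, not stated in the commit)
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM (assumption)
        str(dst),
    ]


def wav_path_for(mp3, wav_dir):
    """Map an MP3 path to its cached WAV path inside wav_dir (hypothetical layout)."""
    return Path(wav_dir) / (Path(mp3).stem + ".wav")


def convert(mp3, wav_dir="audio_wav", ffmpeg="ffmpeg"):
    """Convert one MP3 to WAV, skipping the work if a cached WAV already exists."""
    dst = wav_path_for(mp3, wav_dir)
    if dst.exists():
        return dst
    dst.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero status.
    subprocess.run(build_ffmpeg_cmd(mp3, dst, ffmpeg), check=True)
    return dst
```

Keeping the command construction separate from the `subprocess.run` call makes the flags easy to inspect or test without ffmpeg installed.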
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,34 @@
+# Audio files
+audio/
+*.mp3
+*.wav
+
+# Whisper models
+models/
+*.bin
+
+# Credentials
+.env
+
+# Transcripts and summaries (large generated content)
+transcripts/
+summaries/
+
+# Binaries (downloaded by setup_whisper.py)
+whisper-bin/
+ffmpeg-bin/
+
+# Temp files
+.whisper_bin_path
+.ffmpeg_bin_path
+
+# WAV cache (converted from MP3)
+audio_wav/
+
+# Python
+__pycache__/
+*.pyc
+.venv/
+
+# Logs
+*.log