NLP Master: pipeline download + transcribe + summarize
- run.bat: one-click pipeline (download, convert, transcribe)
- download.py: fetch audio from course platform
- transcribe.py: whisper.cpp batch transcription (CPU, WAV 16kHz)
- MP3->WAV conversion via ffmpeg
- --modules filter for splitting work across machines
- summarize.py: generate summaries from transcripts
- setup_whisper.py: auto-download whisper.cpp, ffmpeg, and model
- Medium model (q5_0) instead of large to avoid VRAM crashes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
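As a rough illustration of the MP3->WAV step named in the commit message, the conversion could be sketched as below. This is not the code from transcribe.py: the function names (`wav_path_for`, `build_ffmpeg_cmd`, `convert`) are hypothetical, and the mono channel count and 16-bit PCM codec are assumptions, since the message only states WAV at 16kHz. The `audio/` and `audio_wav/` directory names are taken from the .gitignore in this commit.

```python
import subprocess
from pathlib import Path


def wav_path_for(mp3_path: Path, audio_dir: Path, wav_dir: Path) -> Path:
    """Mirror an MP3 path under audio_dir into wav_dir with a .wav suffix."""
    return wav_dir / mp3_path.relative_to(audio_dir).with_suffix(".wav")


def build_ffmpeg_cmd(mp3_path: Path, wav_path: Path, ffmpeg: str = "ffmpeg") -> list:
    """Build the ffmpeg argument list for a 16 kHz WAV conversion."""
    return [
        ffmpeg,
        "-y",                 # overwrite the output file if it already exists
        "-i", str(mp3_path),  # input file
        "-ar", "16000",       # resample to 16 kHz, as stated in the commit message
        "-ac", "1",           # mono (assumption, not stated in the commit)
        "-c:a", "pcm_s16le",  # 16-bit PCM (assumption, not stated in the commit)
        str(wav_path),
    ]


def convert(mp3_path: Path, wav_path: Path, ffmpeg: str = "ffmpeg") -> None:
    """Run ffmpeg, creating the output directory first; raises on failure."""
    wav_path.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_cmd(mp3_path, wav_path, ffmpeg), check=True)
```

Keeping the command construction separate from the `subprocess.run` call makes the arguments easy to inspect without ffmpeg installed. The `ffmpeg` parameter can point at a locally downloaded binary, for example one placed under `ffmpeg-bin/`.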
34
.gitignore
vendored
Normal file
@@ -0,0 +1,34 @@
# Audio files
audio/
*.mp3
*.wav

# Whisper models
models/
*.bin

# Credentials
.env

# Transcripts and summaries (large generated content)
transcripts/
summaries/

# Binaries (downloaded by setup_whisper.py)
whisper-bin/
ffmpeg-bin/

# Temp files
.whisper_bin_path
.ffmpeg_bin_path

# WAV cache (converted from MP3)
audio_wav/

# Python
__pycache__/
*.pyc
.venv/

# Logs
*.log