nlp-master/.gitignore at bbc588454564aabdc6e725576cc211b2b5d8d634

romfast/nlp-master

Marius Mutu bbc5884545 NLP Master: pipeline download + transcribe + summarize

- run.bat: one-click pipeline (download, convert, transcribe)
- download.py: fetch audio from course platform
- transcribe.py: whisper.cpp batch transcription (CPU, WAV 16kHz)
  - MP3->WAV conversion via ffmpeg
  - --modules filter for splitting work across machines
- summarize.py: generate summaries from transcripts
- setup_whisper.py: auto-download whisper.cpp, ffmpeg, and model
- Medium model (q5_0) instead of large to avoid VRAM crashes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-24 01:37:13 +02:00
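
The commit message mentions converting MP3 to 16 kHz WAV with ffmpeg before transcription. Below is a minimal sketch of how such a step might build the ffmpeg command. It is not taken from `transcribe.py`: the function name `build_ffmpeg_cmd` is hypothetical, and the mono 16-bit PCM output is an assumption. The `audio_wav/` directory name comes from the ignore file.

```python
from pathlib import Path


def build_ffmpeg_cmd(mp3_path: Path, wav_dir: Path, ffmpeg: str = "ffmpeg") -> list[str]:
    """Return an ffmpeg argument list that converts one MP3 to a 16 kHz WAV.

    Hypothetical helper: the real transcribe.py may use different options.
    """
    wav_path = wav_dir / (mp3_path.stem + ".wav")
    return [
        ffmpeg,
        "-y",                 # overwrite an existing output file
        "-i", str(mp3_path),  # input MP3
        "-ar", "16000",       # resample to 16 kHz, as stated in the commit message
        "-ac", "1",           # downmix to mono (assumption)
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM (assumption)
        str(wav_path),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command separately from running it keeps the conversion settings easy to inspect without ffmpeg installed.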

35 lines

378 B

# Audio files
audio/
*.mp3
*.wav

# Whisper models
models/
*.bin

# Credentials
.env

# Transcripts and summaries (large generated content)
transcripts/
summaries/

# Binaries (downloaded by setup_whisper.py)
whisper-bin/
ffmpeg-bin/

# Temp files
.whisper_bin_path
.ffmpeg_bin_path

# WAV cache (converted from MP3)
audio_wav/

# Python
__pycache__/
*.pyc
.venv/

# Logs
*.log
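
To see which paths these patterns cover, the sketch below applies a deliberately simplified matcher to the pattern list above. It is not a full gitignore implementation: it handles only the two pattern shapes used in this file (directory names ending in `/`, and slash-free names or globs), and the name `is_ignored` is hypothetical. `git check-ignore -v <path>` remains the authoritative check.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the .gitignore above (comments omitted).
PATTERNS = [
    "audio/", "*.mp3", "*.wav", "models/", "*.bin", ".env",
    "transcripts/", "summaries/", "whisper-bin/", "ffmpeg-bin/",
    ".whisper_bin_path", ".ffmpeg_bin_path", "audio_wav/",
    "__pycache__/", "*.pyc", ".venv/", "*.log",
]


def is_ignored(path: str, patterns: list[str] = PATTERNS) -> bool:
    """Simplified check of a file path against slash-free gitignore patterns."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignored if any parent directory has that name.
            if any(fnmatchcase(part, pattern[:-1]) for part in parts[:-1]):
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            # Slash-free pattern: may match a component at any depth.
            return True
    return False


for candidate in ["audio/lesson01.mp3", "transcribe.py", ".env", "logs/run.log"]:
    print(candidate, is_ignored(candidate))
```

Under this simplified model the scripts themselves (`transcribe.py`, `run.bat`) stay tracked, while downloaded audio, models, credentials, and generated output are excluded.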