Add complete UPS monitoring system with monthly battery testing

This commit adds a comprehensive UPS monitoring and management system for
the Proxmox cluster with automated shutdown orchestration and monthly
battery health testing.

Features:
- NUT (Network UPS Tools) configuration for INNO TECH USB UPS
- Automated cluster shutdown on power failure (3-minute grace period)
- Monthly automated battery testing with health evaluation
- Email notifications via PVE::Notify system
- WinNUT monitoring client for Windows VM 201

Components added:
- config/: NUT configuration files (ups.conf, upsd.conf, upsmon.conf, etc.)
- scripts/ups-shutdown-cluster.sh: Orchestrated cluster shutdown
- scripts/ups-monthly-test.sh: Monthly battery test with email reports
- scripts/upssched-cmd: Event handler for UPS state changes
- docs/: Complete installation and usage documentation
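The commit does not spell out what "orchestrated" means for ups-shutdown-cluster.sh. The sketch below shows one plausible order under several assumptions:

- The node names in REMOTE_NODES are placeholders.
- Root SSH between cluster nodes is available, as a Proxmox cluster normally configures.
- Guests on every node are stopped with pvenode stopall before any node powers off.
- The USB-attached primary (pvemini) goes down last.

This is not the committed script. DRY_RUN is a hypothetical switch that lets the order be inspected without powering anything off.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an orchestrated cluster shutdown; not the committed
# scripts/ups-shutdown-cluster.sh. Node names are placeholders.
set -u

# Cluster members other than the USB-attached primary (placeholder names).
REMOTE_NODES="${REMOTE_NODES:-pve-node2 pve-node3}"
# DRY_RUN=1 prints each command instead of executing it (assumed test switch).
DRY_RUN="${DRY_RUN:-0}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

shutdown_cluster() {
    local node
    # 1. Stop all VMs and containers on every node while power is still available.
    for node in $REMOTE_NODES; do
        run ssh "root@$node" pvenode stopall
    done
    run pvenode stopall
    # 2. Power off the other nodes, then this node last, so upsmon on the
    #    primary keeps monitoring the UPS until the end.
    for node in $REMOTE_NODES; do
        run ssh "root@$node" poweroff
    done
    run poweroff
}

# A real script would end by calling: shutdown_cluster
```

With DRY_RUN=1 and REMOTE_NODES="a b", calling shutdown_cluster prints the six commands in order and executes none of them.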

Key findings:
- UPS battery.charge reporting lags 10-40 seconds behind the test start
- The test must monitor both the voltage drop (1.5-2 V) and the charge drop (9-27%)
- Battery health evaluation: EXCELLENT/GOOD/FAIR/POOR based on discharge rate
- Email notifications use Handlebars templates without Unicode emojis for compatibility
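Because of the reporting lag, the test has to poll battery.charge instead of reading it once. The sketch below makes these assumptions:

- The UPS is named "ups" on localhost and is read with the standard NUT client upsc.
- The function names and the 60 s timeout are hypothetical.
- The EXCELLENT/GOOD/FAIR/POOR thresholds in classify_health are illustrative placeholders, not the values used in scripts/ups-monthly-test.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the monthly test's measurement step; not the committed
# scripts/ups-monthly-test.sh. Assumes a NUT UPS named "ups" on localhost.
set -u

UPS="${UPS:-ups@localhost}"
POLL_INTERVAL="${POLL_INTERVAL:-5}"  # seconds between reads; must be > 0
POLL_TIMEOUT="${POLL_TIMEOUT:-60}"   # longer than the observed 10-40 s lag

# Read one variable, e.g. battery.charge or battery.voltage.
read_var() { upsc "$UPS" "$1" 2>/dev/null; }

# Poll VAR until it falls below BASELINE; print the new value on success,
# return 1 if no drop is reported within POLL_TIMEOUT seconds.
wait_for_drop() {
    local var="$1" baseline="$2" waited=0 value
    while [ "$waited" -lt "$POLL_TIMEOUT" ]; do
        sleep "$POLL_INTERVAL"
        waited=$((waited + POLL_INTERVAL))
        value="$(read_var "$var")"
        if [ -n "$value" ] && awk -v v="$value" -v b="$baseline" 'BEGIN { exit !(v < b) }'; then
            echo "$value"
            return 0
        fi
    done
    return 1
}

# Map charge lost per minute of testing to a health grade.
# PLACEHOLDER thresholds for illustration only.
classify_health() {
    awk -v drop="$1" -v minutes="$2" 'BEGIN {
        if (minutes <= 0) { print "UNKNOWN"; exit }
        rate = drop / minutes
        if (rate < 5)       print "EXCELLENT"
        else if (rate < 10) print "GOOD"
        else if (rate < 20) print "FAIR"
        else                print "POOR"
    }'
}
```

In use, the baselines would be read with read_var before the test starts, for example through NUT's upscmd and the test.battery.start.quick instant command if the driver exposes it. wait_for_drop would then be called once for battery.voltage and once for battery.charge.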

Configuration:
- UPS: INNO TECH (Voltronic protocol, vendor 0665:5161)
- Primary node: pvemini (10.0.20.201) with USB connection
- Monthly test: cron 0 0 1 * * /opt/scripts/ups-monthly-test.sh (00:00 on the 1st of each month)
- Shutdown timer: 180 seconds on battery before cluster shutdown

Documentation includes complete installation guides for NUT server,
WinNUT client, and troubleshooting procedures.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Marius
2025-10-06 21:39:46 +03:00
parent 238c02fdf0
commit 87b9709a0d
14 changed files with 3292 additions and 0 deletions


@@ -0,0 +1,23 @@
# upssched configuration for orchestrated Proxmox cluster shutdown
#
# This file defines timed actions for UPS events
CMDSCRIPT /usr/local/bin/upssched-cmd
PIPEFN /run/nut/upssched.pipe
LOCKFN /run/nut/upssched.lock
# When the UPS switches to battery (ONBATT), wait 180 seconds (3 minutes)
# If power returns within that time, cancel the shutdown
AT ONBATT * START-TIMER onbatt 180
# When the UPS reports low battery (LOWBATT), shut down immediately
AT LOWBATT * EXECUTE lowbatt
# When power returns (ONLINE), cancel the pending onbatt shutdown timer
AT ONLINE * CANCEL-TIMER onbatt
# When communication with the UPS is lost (COMMBAD), wait 30 seconds
AT COMMBAD * START-TIMER commbad 30
# When communication is restored (COMMOK), cancel the commbad timer
AT COMMOK * CANCEL-TIMER commbad
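upssched passes one argument to CMDSCRIPT: the timer name (onbatt, commbad) or the EXECUTE argument (lowbatt). The sketch below shows a minimal handler of that shape. It is not the committed scripts/upssched-cmd. The shutdown script path and the DRY_RUN switch are assumptions added for illustration and testing.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an upssched CMDSCRIPT handler; not the committed
# scripts/upssched-cmd. upssched calls it with one argument: the timer name
# (onbatt, commbad) or the EXECUTE argument (lowbatt).
set -u

# Assumed install path of the shutdown orchestrator.
SHUTDOWN_SCRIPT="${SHUTDOWN_SCRIPT:-/opt/scripts/ups-shutdown-cluster.sh}"
# DRY_RUN=1 prints the action instead of executing it (assumed test switch).
DRY_RUN="${DRY_RUN:-0}"

log() {
    # Prefer syslog; fall back to stderr where logger is unavailable.
    if command -v logger >/dev/null 2>&1; then
        logger -t upssched-cmd -- "$*"
    else
        echo "upssched-cmd: $*" >&2
    fi
}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

handle_event() {
    case "$1" in
        onbatt)   # 180 s on battery passed without ONLINE cancelling the timer
            log "On battery for 180 s, starting cluster shutdown"
            run "$SHUTDOWN_SCRIPT"
            ;;
        lowbatt)  # low battery: no grace period
            log "Low battery, starting immediate cluster shutdown"
            run "$SHUTDOWN_SCRIPT"
            ;;
        commbad)  # no COMMOK within 30 s
            log "Lost communication with the UPS for 30 s"
            ;;
        *)
            log "Unrecognized event: $1"
            return 1
            ;;
    esac
}

if [ "$#" -gt 0 ]; then
    handle_event "$1"
fi
```

The ONLINE and COMMOK lines cancel their timers inside upssched itself. As a result, the handler only sees onbatt or commbad after the full delay has elapsed.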