docs(proxmox): document HA, corosync tuning, diagnostic tools and mail relay
Following the 2026-04-20 cluster outage, the cluster README now covers HA resource limits, corosync token tuning (10s tolerance for USB glitches), the rasdaemon/netconsole/kdump diagnostic stack on pvemini, mail relay via mail.romfast.ro with SMTP auth, OOM alerting via cron, and swap on pveelite.

The VM 109 README now states clearly that it was removed from HA and is started only by the weekly DR test script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -7,8 +7,10 @@ proxmox/
 ├── README.md                    # This file (main index)
 │
 ├── cluster/                     # Proxmox cluster infrastructure
-│   ├── README.md                # SSH and cluster administration guide
+│   ├── README.md                # Guide: SSH, HA, corosync tuning, diagnostic tools, mail relay, OOM alert
 │   ├── cluster-ha-monitor.sh    # HA monitoring script
+│   ├── incidents/               # Cluster incident post-mortems
+│   │   └── 2026-04-20-cluster-outage.md  # pveelite OOM cascade + USB LAN watchdog
 │   └── ups/                     # UPS system for the cluster
 │       ├── README.md
 │       ├── docs/