feat(dr): manual failover/failback scripts + pveelite-down email alert
failover-dr-to-pvemini.sh and failback-dr-to-pveelite.sh promote/demote the rpool/oracle-backups dataset between nodes when pveelite is down. Both refuse to run if the other side is reachable to prevent split-brain. Both patch transfer_backups.ps1 on Oracle Production (10.0.20.36) via SSH to redirect the daily SCP target between 10.0.20.202 and 10.0.20.201. The PowerShell patch uses -EncodedCommand (UTF-16LE base64) so the bash caller does not need to escape PowerShell quoting. End-to-end test including failover -> failback confirmed transfer_backups.ps1 returns to byte-identical state (SHA256 43DD2187...). pveelite-down-alert.sh runs every minute on pvemini and emails an alert with copy-paste failover instructions after 5 consecutive ping failures. The alert body includes the latest oracle-backups and VM 109 replica timestamps so the operator knows the recovery point before deciding. The DR weekly-test script gains a cluster-aware guard at the top that exits silently when /etc/pve/qemu-server/109.conf is not on the local node, allowing the same cron entry to be present on both pveelite and pvemini without double-firing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -44,6 +44,14 @@ export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
|
||||
|
||||
# Configuration
|
||||
DR_VM_ID="109"
|
||||
|
||||
# Cluster-aware exit: only the node currently hosting VM 109 should run the
|
||||
# test. With cron deployed on both pveelite (normal home) and pvemini (DR
|
||||
# failover home), this guard ensures only one instance fires.
|
||||
if [ ! -f "/etc/pve/qemu-server/${DR_VM_ID}.conf" ] && [ "${1:-}" != "--install" ] && [ "${1:-}" != "--help" ]; then
|
||||
exit 0
|
||||
fi
|
||||
|
||||
DR_VM_IP="10.0.20.37"
|
||||
DR_VM_PORT="22122"
|
||||
DR_VM_USER="romfast"
|
||||
|
||||
Reference in New Issue
Block a user