failover-dr-to-pvemini.sh and failback-dr-to-pveelite.sh promote/demote
the rpool/oracle-backups dataset between nodes when pveelite is down.
Both refuse to run if the other side is reachable to prevent split-brain.
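The mutual refusal can be sketched as a small guard; `ping` as the liveness probe and the peer address in the usage comment are assumptions, not lifted from the scripts:

```shell
#!/usr/bin/env bash
# Hypothetical split-brain guard: refuse to promote the dataset while the
# peer node still answers. Probe command and peer address are assumptions.
refuse_if_peer_up() {
    local peer=$1
    if ping -c 1 -W 2 "$peer" >/dev/null 2>&1; then
        echo "ERROR: peer $peer is reachable; refusing to run (split-brain risk)" >&2
        return 1
    fi
    return 0
}

# failover-dr-to-pvemini.sh would call this with pveelite's address, e.g.:
# refuse_if_peer_up 10.0.20.201 || exit 1
```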
Both patch transfer_backups.ps1 on Oracle Production (10.0.20.36) via
SSH to redirect the daily SCP target between 10.0.20.202 and 10.0.20.201.
The PowerShell patch is delivered with -EncodedCommand (UTF-16LE base64)
so the bash caller never has to escape PowerShell quoting. An end-to-end
failover -> failback test confirmed that transfer_backups.ps1 returns to
a byte-identical state (SHA256 43DD2187...).
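The -EncodedCommand handoff can be sketched as below; the payload is an illustrative stand-in for the real transfer_backups.ps1 patch, and the SSH user name is an assumption:

```shell
#!/usr/bin/env bash
# powershell -EncodedCommand expects the script as base64 over UTF-16LE,
# so the bash side never escapes PowerShell quoting. Payload is illustrative.
ps_payload='Write-Host "patching transfer_backups.ps1"'
encoded=$(printf '%s' "$ps_payload" | iconv -f UTF-8 -t UTF-16LE | base64 | tr -d '\n')

# Hypothetical invocation against Oracle Production:
# ssh oracle@10.0.20.36 "powershell -NoProfile -EncodedCommand $encoded"
echo "$encoded"
```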
pveelite-down-alert.sh runs every minute on pvemini and emails an alert
with copy-paste failover instructions after 5 consecutive ping failures.
The alert body includes the latest oracle-backups and VM 109 replica
timestamps so the operator knows the recovery point before deciding.
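A minimal sketch of the consecutive-failure counter; the state-file path, peer address, and mail recipient are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of pveelite-down-alert.sh: cron calls check_peer once
# per minute; the alert fires only on the 5th consecutive ping failure.
check_peer() {
    local peer=$1 state=$2 threshold=5 prev fails
    if ping -c 1 -W 2 "$peer" >/dev/null 2>&1; then
        echo 0 > "$state"            # peer answered; reset the streak
        return 0
    fi
    prev=$(cat "$state" 2>/dev/null)
    fails=$(( ${prev:-0} + 1 ))
    echo "$fails" > "$state"
    if [ "$fails" -eq "$threshold" ]; then
        # Fire exactly once per outage; the real body carries the failover
        # steps and the latest oracle-backups / VM 109 replica timestamps.
        mail -s "pveelite DOWN - failover instructions inside" root </dev/null
    fi
    return 0
}

# cron entry (every minute): check_peer 10.0.20.201 /var/run/pveelite-down.count
```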
The DR weekly-test script gains a cluster-aware guard at the top that
exits silently when /etc/pve/qemu-server/109.conf is not present on the
local node, so the same cron entry can live on both pveelite and pvemini
without double-firing.
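The guard is essentially one test; it is sketched here with the config path parameterised so it can be exercised, while the real script presumably hardcodes the pmxcfs path:

```shell
#!/usr/bin/env bash
# Cluster-aware guard: /etc/pve/qemu-server/<vmid>.conf exists only on the
# node that currently owns the VM (pmxcfs moves it on migration), so a
# missing file means "not our VM today" and the script exits silently.
vm_is_local() {
    [ -e "${VM_CONF:-/etc/pve/qemu-server/109.conf}" ]
}

# top of the weekly-test script:
# vm_is_local || exit 0
```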
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DR test script now refuses to start VM 109 if:
* the cluster is not quorate (e.g. mid-failover into a degraded state),
* available host memory is below VM 109's configured RAM plus a 1 GB margin.
Both checks scale automatically: the memory threshold is computed from
qm config, so resizing VM 109 never requires touching the script.
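Assuming quorum is read from `pvecm status` and the configured memory from `qm config` (both real Proxmox CLIs, though the exact parsing below is a sketch, and the meminfo path is parameterised for testability):

```shell
#!/usr/bin/env bash
# Sketched pre-start guards for VM 109. The memory threshold derives from
# `qm config 109`, so a VM resize needs no script change; the 1 GB margin
# matches the commit text.
quorate() {
    pvecm status 2>/dev/null | grep -q '^Quorate:.*Yes'
}

enough_memory() {
    local vm_mb need_kb avail_kb
    vm_mb=$(qm config 109 | awk -F': *' '$1 == "memory" {print $2}')
    need_kb=$(( (vm_mb + 1024) * 1024 ))   # configured RAM + 1 GB margin, in KB
    avail_kb=$(awk '$1 == "MemAvailable:" {print $2}' "${MEMINFO:-/proc/meminfo}")
    [ "$avail_kb" -ge "$need_kb" ]
}

# quorate       || { echo "cluster not quorate, skipping DR test"; exit 0; }
# enough_memory || { echo "insufficient RAM for VM 109, skipping";  exit 0; }
```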
Adds vm109-watchdog.sh, scheduled cluster-wide every minute. The
watchdog is the second line of defence behind the cleanup trap from
8a0c557: it force-stops VM 109 if the trap was bypassed (script
killed, host crash mid-test, manual run forgotten). It honours
/var/run/vm109-debug.flag for legitimate manual sessions and is
node-aware via /etc/pve/qemu-server/109.conf so it can be deployed
on every node without coordinating with VM 109's current location.
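The watchdog's decision path can be sketched as follows; the flag and conf paths come from the commit text, while `qm status`/`qm stop` as the probe and force-stop commands are assumptions:

```shell
#!/usr/bin/env bash
# vm109-watchdog.sh sketch: safe to deploy on every node.
watchdog() {
    # Node-aware: only the node that owns 109.conf may act.
    [ -e "${VM_CONF:-/etc/pve/qemu-server/109.conf}" ] || return 0
    # Operator escape hatch for legitimate manual sessions.
    [ -e "${DEBUG_FLAG:-/var/run/vm109-debug.flag}" ] && return 0
    # Second line of defence: if the DR test's cleanup trap was bypassed
    # and the VM is still running, force-stop it.
    if qm status 109 | grep -q running; then
        qm stop 109
    fi
    return 0
}

# cron (every minute, every node): watchdog
```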
Both safeguards target the 04-18 to 04-20 failure chain: VM 109 was left
running for 2.5 days, then collided with an HA failover that pushed CT 108
Oracle (8 GB) onto pveelite (16 GB), triggering an OOM cascade.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cleanup trap added in 8a0c557 stopped VM 109 unconditionally on EXIT,
which kills the VM during --install/--help or when an operator launched
it manually for debugging. Gate the trap with DR_VM_STARTED_BY_US so it
only fires when the script itself started the VM.
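The gate can be sketched as below; the variable name comes from the commit, while the exact qm invocations are assumptions:

```shell
#!/usr/bin/env bash
# Stop VM 109 on exit only if this very script started it; --install,
# --help, and manually-launched debugging sessions leave the VM alone.
DR_VM_STARTED_BY_US=0

cleanup() {
    if [ "$DR_VM_STARTED_BY_US" = 1 ]; then
        qm stop 109    # no 2>/dev/null here, so cross-node errors reach the log
    fi
}
trap cleanup EXIT

start_vm() {
    qm start 109 && DR_VM_STARTED_BY_US=1
}
```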
Also remove the 2>/dev/null swallow on qm start so cross-node failures
(e.g. running on a node where the VM is not configured) appear in the
log instead of producing a silent "Failed to start VM 109" in 0 seconds.
Root cause for the 2026-04-25 silent failure: cron lived on pveelite
while VM 109 had been migrated to pvemini; qm start returned an error
that was hidden by the redirect.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The DR test script used set -euo pipefail, so a failing SSH
shutdown command aborted the script before it reached qm stop.
On 2026-04-20 this left VM 109 running for 2.5 days and
triggered an OOM cascade when pvemini HA-failed over to
pveelite.
Adds an EXIT trap that force-stops VM 109 regardless of exit
path, and makes the Step 7 SSH shutdown tolerant of failure.
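Under set -euo pipefail the fix looks roughly like this; host, user, and step numbering are assumptions:

```shell
#!/usr/bin/env bash
# Sketch of the fix: an EXIT trap force-stops VM 109 on every exit path,
# and the Step 7 guest shutdown tolerates failure instead of aborting the
# script under set -euo pipefail. Subshell body so the trap scopes cleanly.
dr_test() (
    set -euo pipefail
    # `|| true` keeps the trap itself from failing if the VM is already down.
    trap 'qm stop 109 >/dev/null 2>&1 || true' EXIT

    # ... steps 1-6 ...

    # Step 7: a guest-level shutdown may legitimately fail (VM already off,
    # guest SSH down); tolerate it so the trap still gets to run.
    ssh Administrator@vm109 'shutdown /s /t 0' \
        || echo "WARN: guest shutdown failed; EXIT trap will hard-stop VM 109"
)
```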
Incident details: proxmox/cluster/incidents/2026-04-20-cluster-outage.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Move oracle/migration-scripts/ to proxmox/lxc108-oracle/migration/
- Move oracle/roa/ and oracle/roa-romconstruct/ to proxmox/lxc108-oracle/sql/
- Move oracle/standby-server-scripts/ to proxmox/vm109-windows-dr/
- Move chatbot/ to proxmox/lxc104-flowise/
- Update proxmox/README.md with new structure and navigation
- Update all documentation with correct directory references
- Remove unused input/claude-agent-sdk/ files
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>