The cleanup trap added in 8a0c557 stopped VM 109 unconditionally on EXIT,
which kills the VM during --install/--help or when an operator launched
it manually for debugging. Gate the trap with DR_VM_STARTED_BY_US so it
only fires when the script itself started the VM.
Also remove the 2>/dev/null swallow on qm start so cross-node failures
(e.g. running on a node where the VM is not configured) appear in the
log instead of producing a silent "Failed to start VM 109" in 0 seconds.
Root cause for the 2026-04-25 silent failure: cron lived on pveelite
while VM 109 had been migrated to pvemini; qm start returned an error
that was hidden by the redirect.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Following the 2026-04-20 cluster outage, the cluster README now covers
HA resource limits, corosync token tuning (10s tolerance for USB glitches),
rasdaemon/netconsole/kdump diagnostic stack on pvemini, mail relay via
mail.romfast.ro with SMTP auth, OOM alerting via cron, and swap on pveelite.
VM 109 README now clearly states it was removed from HA and is only
started by the weekly DR test script.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The DR test script used set -euo pipefail, so a failing SSH
shutdown command caused the script to exit before qm stop.
On 2026-04-20 this left VM 109 running for 2.5 days and
triggered an OOM cascade when pvemini HA-failed over to
pveelite.
Adds EXIT trap that force-stops VM 109 regardless of exit
path, and makes the Step 7 SSH shutdown tolerant of failure.
Incident details: proxmox/cluster/incidents/2026-04-20-cluster-outage.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Move oracle/migration-scripts/ to proxmox/lxc108-oracle/migration/
- Move oracle/roa/ and oracle/roa-romconstruct/ to proxmox/lxc108-oracle/sql/
- Move oracle/standby-server-scripts/ to proxmox/vm109-windows-dr/
- Move chatbot/ to proxmox/lxc104-flowise/
- Update proxmox/README.md with new structure and navigation
- Update all documentation with correct directory references
- Remove unused input/claude-agent-sdk/ files
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>