VM 109 returned to its original home on pveelite, co-located with
the oracle-backups NFS storage. The README is updated to reflect that
the VM is now in HA (ha-prefer-pveelite, state=stopped, nofailback=1)
rather than excluded from HA, and to document the new layered defences
(trap guard, watchdog cron, dynamic memory pre-flight, max_restart
caps) alongside the original 8a0c557 trap.
Adds a Storage Failover section describing the pveelite -> pvemini
manual failover flow: email alert from pveelite-down-alert.sh,
failover-dr-to-pvemini.sh on the surviving node, failback when
pveelite returns. The pve1 nightly mirror is the third copy.
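A sketch of the HA wiring described above; the node priorities are an
assumption, while the group name, state, and nofailback come from the
commit:

    ha-manager groupadd ha-prefer-pveelite \
        --nodes "pveelite:2,pvemini:1" --nofailback 1
    ha-manager add vm:109 --group ha-prefer-pveelite \
        --state stopped --max_restart 1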
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
failover-dr-to-pvemini.sh and failback-dr-to-pveelite.sh promote/demote
the rpool/oracle-backups dataset between nodes when pveelite is down.
Both refuse to run if the other side is reachable to prevent split-brain.
Both patch transfer_backups.ps1 on Oracle Production (10.0.20.36) via
SSH to redirect the daily SCP target between 10.0.20.202 and 10.0.20.201.
The PowerShell patch uses -EncodedCommand (UTF-16LE base64) so the bash
caller does not need to escape PowerShell quoting. An end-to-end test
covering failover -> failback confirmed that transfer_backups.ps1
returns to a byte-identical state (SHA256 43DD2187...).
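On the bash side the encoding step looks roughly like this (the script
path and the exact -replace are illustrative, not the real patch):

    # UTF-16LE base64 payload: no PowerShell quoting leaks into bash
    PS_CMD='(Get-Content C:\scripts\transfer_backups.ps1) -replace "10.0.20.202","10.0.20.201" | Set-Content C:\scripts\transfer_backups.ps1'
    ENC=$(printf '%s' "$PS_CMD" | iconv -f UTF-8 -t UTF-16LE | base64 -w0)
    ssh oracle@10.0.20.36 "powershell -NoProfile -EncodedCommand $ENC"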
pveelite-down-alert.sh runs every minute on pvemini and emails an alert
with copy-paste failover instructions after 5 consecutive ping failures.
The alert body includes the latest oracle-backups and VM 109 replica
timestamps so the operator knows the recovery point before deciding.
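The five-in-a-row logic is a counter persisted across cron runs; a
minimal sketch (state file path and mail recipient are assumptions):

    STATE=/var/run/pveelite-ping-fails
    if ping -c1 -W2 10.0.20.202 >/dev/null 2>&1; then
        echo 0 >"$STATE"; exit 0          # reachable: reset counter
    fi
    N=$(( $(cat "$STATE" 2>/dev/null || echo 0) + 1 ))
    echo "$N" >"$STATE"
    [ "$N" -eq 5 ] || exit 0              # alert once, on the 5th miss
    # build_alert_body (hypothetical) gathers the runbook text and the
    # latest oracle-backups / VM 109 replica timestamps
    build_alert_body | mail -s "pveelite DOWN - failover runbook" root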
The DR weekly-test script gains a cluster-aware guard at the top that
exits silently when /etc/pve/qemu-server/109.conf is not on the local
node, allowing the same cron entry to be present on both pveelite and
pvemini without double-firing.
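/etc/pve/qemu-server is a symlink to the local node's own config
directory, so the guard is a single line:

    # exit silently unless VM 109 is configured on this node
    [ -e /etc/pve/qemu-server/109.conf ] || exit 0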
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Convert /mnt/pve/oracle-backups from a directory on the pveelite
rootfs into a dedicated ZFS dataset rpool/oracle-backups so it can be
incrementally replicated to pvemini. zfs-replicate-oracle-backups.sh
runs every 15 minutes from cron on pveelite and uses zfs send/recv
over the cluster's internal SSH (direct IP, /etc/pve/priv/known_hosts)
to avoid Tailscale MagicDNS detours that broke the first attempt.
The destination dataset is set readonly=on so accidental writes on
pvemini cannot diverge it. Snapshot pruning keeps 5 rolling copies.
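A sketch of the incremental leg, assuming pvemini at 10.0.20.201 and a
repl-<timestamp> snapshot naming convention; the real script also
handles the initial full send and the pruning:

    SRC=rpool/oracle-backups
    SSH="ssh -o UserKnownHostsFile=/etc/pve/priv/known_hosts root@10.0.20.201"
    NEW="$SRC@repl-$(date +%Y%m%d-%H%M%S)"
    zfs snapshot "$NEW"
    # newest snapshot already present on the destination
    LAST=$($SSH "zfs list -H -t snapshot -o name -s creation $SRC" | tail -1)
    zfs send -i "@${LAST##*@}" "$NEW" | $SSH "zfs recv -F $SRC"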
nightly-backup-mirror.sh ships a third copy nightly to pve1's
backup-ssd (ext4 SATA) — different physical disk, different
filesystem, different node — guarding against the failure mode where
both pveelite and pvemini are simultaneously unavailable. The same
script tars /etc/pve and rotates 14 days of cluster config archives,
since pmxcfs is in-RAM and a multi-node quorum loss would otherwise
take cluster config with it.
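The config archive rotation is the usual tar + find pairing; a sketch
assuming the backup-ssd mountpoint on pve1:

    tar czf "/mnt/backup-ssd/pve-config-$(date +%F).tar.gz" -C / etc/pve
    find /mnt/backup-ssd -name 'pve-config-*.tar.gz' -mtime +14 -delete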
The old directory is kept as oracle-backups.old-DELETE-AFTER-2026-05-02
on pveelite for one week as a safety net.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DR test script now refuses to start VM 109 if:
* cluster is not quorate (e.g. mid-failover into a degraded state),
* available memory on the host is below VM 109 config + 1 GB margin.
Both checks scale automatically: the memory threshold is computed from
qm config, so resizing VM 109 does not require touching the script.
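Both guards in shell form; the 1 GB margin and the qm config parse are
as described above:

    # refuse to start mid-failover into a degraded cluster
    pvecm status 2>/dev/null | grep -q 'Quorate:.*Yes' || exit 1
    # threshold derives from the VM config, so a resize needs no edit here
    VM_MB=$(qm config 109 | awk '/^memory:/ {print $2}')
    AVAIL_MB=$(awk '/^MemAvailable:/ {print int($2/1024)}' /proc/meminfo)
    [ "$AVAIL_MB" -ge $((VM_MB + 1024)) ] || exit 1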
Adds vm109-watchdog.sh, scheduled cluster-wide every minute. The
watchdog is the second line of defence behind the cleanup trap from
8a0c557: it force-stops VM 109 if the trap was bypassed (script
killed, host crash mid-test, manual run forgotten). It honours
/var/run/vm109-debug.flag for legitimate manual sessions and is
node-aware via /etc/pve/qemu-server/109.conf so it can be deployed
on every node without coordinating with VM 109's current location.
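A plausible shape for the watchdog; the debug flag and the node check
are from this commit, while the idea that the DR test raises the same
flag for its own window is an assumption:

    [ -e /etc/pve/qemu-server/109.conf ] || exit 0   # not VM 109's node
    [ -f /var/run/vm109-debug.flag ] && exit 0       # sanctioned session
    qm status 109 2>/dev/null | grep -q running || exit 0
    logger -t vm109-watchdog "VM 109 running unsanctioned, force-stopping"
    qm stop 109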
Both safeguards target the 04-18 → 04-20 failure chain: VM 109 was
left running for 2.5 days, then an HA failover pushed CT 108 Oracle
(8 GB) onto pveelite (16 GB) and triggered an OOM cascade.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cleanup trap added in 8a0c557 stopped VM 109 unconditionally on EXIT,
which kills the VM during --install/--help or when an operator launched
it manually for debugging. Gate the trap with DR_VM_STARTED_BY_US so it
only fires when the script itself started the VM.
Also remove the 2>/dev/null swallow on qm start so cross-node failures
(e.g. running on a node where the VM is not configured) appear in the
log instead of producing a silent "Failed to start VM 109" in 0 seconds.
Root cause for the 2026-04-25 silent failure: cron lived on pveelite
while VM 109 had been migrated to pvemini; qm start returned an error
that was hidden by the redirect.
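In sketch form:

    DR_VM_STARTED_BY_US=0
    cleanup() {
        [ "$DR_VM_STARTED_BY_US" = 1 ] || return 0
        qm stop 109
    }
    trap cleanup EXIT
    # no 2>/dev/null: a cross-node "VM not found" now reaches the log
    if qm start 109; then
        DR_VM_STARTED_BY_US=1
    fi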
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VM 201 (Windows critical) stays out of HA by design. Added:
- failover-vm201.sh: interactive failover pvemini -> pveelite with ZFS replication state
- recover-vm201-to-pvemini.sh: interactive reverse migration with uptime + split-brain checks
- pvemini-down-alert.sh: cron watchdog on pveelite that emails the full runbook after 2 minutes of DOWN
Replication RPO tightened: CT 108 + VM 201 to 5min, CT 171 to 15min.
CT 171 added to HA (ha-group-main) for continuous Claude Code access.
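The RPO changes map onto pvesr schedule updates; the <vmid>-0 job IDs
below are the conventional form and an assumption:

    pvesr update 108-0 --schedule '*/5'
    pvesr update 201-0 --schedule '*/5'
    pvesr update 171-0 --schedule '*/15'
    ha-manager add ct:171 --group ha-group-main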
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Following the 2026-04-20 cluster outage, the cluster README now covers
HA resource limits, corosync token tuning (10s tolerance for USB
glitches), the rasdaemon/netconsole/kdump diagnostic stack on pvemini,
mail relay via mail.romfast.ro with SMTP auth, OOM alerting via cron,
and swap on pveelite.
VM 109 README now clearly states it was removed from HA and is only
started by the weekly DR test script.
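For reference, the token tuning amounts to one line in the totem
stanza of /etc/pve/corosync.conf (bump config_version when editing):

    totem {
      token: 10000    # ms; 10s rides out brief USB NIC glitches
    }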
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The DR test script used set -euo pipefail, so a failing SSH
shutdown command caused the script to exit before qm stop.
On 2026-04-20 this left VM 109 running for 2.5 days and
triggered an OOM cascade when pvemini HA-failed over to
pveelite.
Adds an EXIT trap that force-stops VM 109 regardless of exit
path, and makes the Step 7 SSH shutdown tolerant of failure.
Incident details: proxmox/cluster/incidents/2026-04-20-cluster-outage.md
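A sketch of the fix as described; the guest host variable is
illustrative:

    # force-stop on any exit path (a later commit gates this with
    # DR_VM_STARTED_BY_US)
    trap 'qm stop 109 || true' EXIT
    # Step 7: under set -euo pipefail a failed guest shutdown must not
    # abort the script
    ssh "$GUEST_HOST" 'shutdown /s /t 0' || true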
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add cleanup for WRI$_ADV_* tables (which can accumulate millions of
rows and gigabytes of space), a scheduler$_event_log truncate, and
automatic UNDO/SYSAUX datafile resizing with progressive fallback
(2G→4G→6G). Tested on Oracle 18c XE.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Python + .bat script that automatically converts FORALL/BULK_ROWCOUNT
usage in PACK_CONTAFIN.pck into FOR LOOPs compatible with Oracle 10g.
Includes pre/post validation, atomic writes, and a diff display.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ExecuteScriptOS.prc: runs PowerShell scripts via DBMS_SCHEDULER
- UpdateSQLPLUS.prc: runs SQL*Plus scripts via DBMS_SCHEDULER
- find_oracle_locations.sql: comprehensive script to discover all Oracle DB paths for backup/migration
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add section explaining the root cause (IPVS broken in LXC), the
solution (dnsrr endpoint mode), and the dokploy-dnsrr-fix systemd
service that auto-applies the fix on every Dokploy deployment.
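Applied by hand the fix is one flag per service (service name
hypothetical); the systemd unit reapplies it after each deployment:

    # dnsrr sidesteps the IPVS-backed VIP mode that LXC cannot provide
    docker service update --endpoint-mode dnsrr dokploy-app_web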
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Changes after the MoltBot → OpenClaw upgrade:
- RAM increased from 2GB to 4GB (the recommended minimum for OpenClaw)
- Version updated: OpenClaw v2026.2.9 (previously MoltBot v2026.1.24-3)
- Added troubleshooting for OOM kill issues
- Cleaned up old sessions (85 → 80)
Problem solved: the gateway was being killed by the OOM killer due to
insufficient memory (975MB peak with only 2GB total RAM).
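The RAM bump itself is a single pct call on the Proxmox host, assuming
container ID 110:

    pct set 110 --memory 4096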
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Create proxmox/lxc110-moltbot/ with complete README documentation
- MoltBot AI chatbot with Telegram and WhatsApp channels
- Claude Opus 4.5 model integration via Anthropic API
- Security: dedicated moltbot user, UFW firewall, fail2ban, Tailscale SSH
- Gateway on port 18789 (loopback), token+password auth
- Update proxmox/README.md with LXC 110 quick start and navigation
- Update CLAUDE.md network layout with MoltBot entry
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- New-OracleDirectory: Improved verification with direct SQL check, preserves
existing DMPDIR path instead of blindly recreating
- Get-DatafilePath: Better fallback logic using ORACLE_HOME to derive path,
no longer hardcodes C:\app\oracle
- grants-public.sql: Fixed DMPDIR creation - now preserves existing path
instead of overriding with wrong D:\Oracle\admin\ORCL\dpdump
- config.example.ps1: Added DATAFILE_DIR parameter with documentation
These fixes ensure scripts work without manual intervention on fresh Oracle XE
installations where default DMPDIR points to non-existent paths.
Tested on VM 302 - full installation (01-08) now completes successfully.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- New-OracleDirectory now checks if DMPDIR exists with wrong path
- If path differs from target, drops and recreates the directory
- Fixes Oracle XE issue where DMPDIR defaults to D:\Oracle\admin\ORCL\dpdump
- Added VM302-TESTING.md with complete testing workflow documentation
- Includes Proxmox VM management commands, troubleshooting, and deployment steps
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The script had inline SQL that was missing 20 synonyms compared to
synonyms-public.sql, causing PACK_DEF and other packages to fail with
missing synonym errors (SYN_VNOM_UM_ISO, SYN_ATAS_*, SYN_SAL_*, etc.).
Changes:
- Remove all inline SQL (~350 lines)
- Now runs synonyms-public.sql (81 synonyms vs 61 before)
- Now runs grants-public.sql for all grants and ACL
- Add verification of SESIUNE context
This ensures the script stays in sync with the SQL files and
prevents future desync issues.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Key fixes:
- Add Run.cmd/RunAll.cmd wrappers with ExecutionPolicy Bypass
- Add Get-ListenerHost() to auto-detect listener IP address
- Fix impdp connection using EZConnect format (host:port/service)
- Add parallel=1 for Oracle XE compatibility
- Fix Write-Log to accept empty strings with [AllowEmptyString()]
- Fix Get-SchemaObjectCount regex for Windows line endings (\r\n)
- Fix path comparison for DMP file copy operation
- Add GRANT EXECUTE ON SYS.AUTH_PACK TO PUBLIC for PACK_DREPTURI
- Fix VAUTH_SERII view to use SYN_NOM_PROGRAME (has DENUMIRE column)
- Add sections 10-11 to grants-public.sql for SYS object grants
Tested on VM 302 (10.0.20.130) with Oracle XE 21c.
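The EZConnect form the impdp fix switches to, with placeholder
credentials and dump file name:

    impdp system/password@10.0.20.130:1521/XEPDB1 \
        directory=DMPDIR dumpfile=contafin.dmp parallel=1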
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Increase wait time from 10s to max 60s after listener restart
- Add active polling every 5s to check if service is registered
- Log progress while waiting for service registration
- Fixes race condition where script proceeds before service is ready
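The polling shape (the real script is PowerShell; shell pseudocode of
the same loop, service name assumed):

    # poll every 5s, up to 60s, until the service registers
    for i in $(seq 1 12); do
        lsnrctl status | grep -q 'Service "XEPDB1"' && break
        sleep 5
    done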
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- run-all-sys.sql: Master script that executes sys-objects.sql,
sys-grants.sql, and any scripts in sys-updates/ folder in order
- sys-grants.sql: Grants EXECUTE on AUTH_PACK, DBMS_SCHEDULER,
DBMS_LOCK, UTL_* packages to CONTAFIN_ORACLE; creates public
synonyms for SYS procedures; creates DMPDIR directory
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Rename $Host parameter to $DbHost in oracle-functions.ps1 (Invoke-SqlPlus,
  Test-OracleConnection, Get-OracleVersion, Test-PDB, Get-ServiceName);
  $Host is a reserved automatic variable in PowerShell, so it cannot be
  used safely as a parameter name
- Update all function calls in 01-setup-database.ps1 to use -DbHost instead of -Host
- Fix ${Host} -> ${DbHost} in log message (line 147)
- Fix Write-Log "" -> Write-Host "" to avoid empty string parameter error
- Add DbHost/Port parameters and config.ps1 support to setup script
- Update sys-updates/README.md to clarify folder is for future patches only
Tested successfully on ROACENTRAL (10.0.20.130) with Oracle XE 21c.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added roa_kill_user_sessions helper procedure
- Kill all active sessions BEFORE attempting DROP USER
- Improved company user detection (also checks for synonyms to CONTAFIN_ORACLE)
- Added more Oracle 21c internal users to exclusion list
- Better error handling and output messages
- Helper procedure auto-cleanup at end
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Original Oracle 10g R1/R2 setup scripts and SQL migrations from 2007-2026.
Preserved as reference for understanding ROA database structure and
historical schema evolution.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
PowerShell scripts for setting up Oracle 21c/XE with ROA application:
- Automated tablespace, user creation and imports
- sqlnet.ora config for Instant Client 11g/ODBC compatibility
- Oracle 21c read-only Home path handling (homes/OraDB21Home1)
- Listener restart + 10G password verifier for legacy auth
- Tested on VM 302 with CONTAFIN_ORACLE schema import
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add config/sqlnet.ora with ALLOWED_LOGON_VERSION=8 for old client support
- Add scripts/fix-sqlnet.sh startup script to persist config across container restarts
- Update README with ORA-28040 troubleshooting, ODBC connection params, and deployment instructions
- Fix SID description: Oracle 18c has PDB (XEPDB1), not non-CDB
- Update container recreation instructions with startup scripts volume
Resolves ORA-28040: No matching authentication protocol when connecting
from Windows ODBC with Oracle Instant Client 11.2
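The relevant sqlnet.ora lines; on 12c+ the parameter carries the
_SERVER/_CLIENT suffixes, which the summary above abbreviates:

    # permit 10g-era password verifiers for Instant Client 11.2
    SQLNET.ALLOWED_LOGON_VERSION_SERVER=8
    SQLNET.ALLOWED_LOGON_VERSION_CLIENT=8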
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Rename proxmox/claude-agent/ to proxmox/lxc171-claude-agent/
- Move scripts to scripts/ subdirectory
- Add complete installation guide for new LXC from scratch
- Update proxmox/README.md with LXC 171 documentation and navigation
- Add LXC 171 to containers table
- Remove .serena/project.yml
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Move oracle/migration-scripts/ to proxmox/lxc108-oracle/migration/
- Move oracle/roa/ and oracle/roa-romconstruct/ to proxmox/lxc108-oracle/sql/
- Move oracle/standby-server-scripts/ to proxmox/vm109-windows-dr/
- Move chatbot/ to proxmox/lxc104-flowise/
- Update proxmox/README.md with new structure and navigation
- Update all documentation with correct directory references
- Remove unused input/claude-agent-sdk/ files
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create cluster/ for Proxmox cluster infrastructure (SSH guide, HA monitor, UPS)
- Create lxc108-oracle/ for Oracle Database documentation and scripts
- Create vm201-windows/ for Windows 11 VM docs and SSL certificate scripts
- Add SSL certificate monitoring scripts (check-ssl-certificates.ps1, monitor-ssl-certificates.sh)
- Remove archived VM107 references (decommissioned)
- Update all cross-references between files
- Update main README.md with new structure and navigation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace FORALL bulk operations with FOR loops to avoid the PLS-00436
error on Oracle 10.2.0.5; that version does not support referencing
record fields of a collection inside FORALL statements.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix permission denied on log files (chown nut:nut)
- Fix upssched.conf permissions (root:nut)
- Add sudo for perl to allow PVE::Notify from user nut
- Add periodic battery status emails every minute when on battery
- Add charging status emails at 5, 10, 30 min after power restore
- Remove diacritics from all notification messages
- Update documentation with sudo and permissions setup
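The permission fixes in command form (log path and the sudoers rule
shape are assumptions):

    chown nut:nut /var/log/upssched*.log
    chown root:nut /etc/nut/upssched.conf
    chmod 640 /etc/nut/upssched.conf
    # /etc/sudoers.d/nut: let upssched handlers reach PVE::Notify
    echo 'nut ALL=(root) NOPASSWD: /usr/bin/perl' > /etc/sudoers.d/nut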
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add email notifications via PVE::Notify for all UPS events:
- ONBATT: when UPS switches to battery
- ONLINE: when power is restored
- LOWBATT: critical battery level
- SHUTDOWN_START/NODE/PRIMARY: during cluster shutdown
- COMMBAD: communication lost with UPS
- Add automatic UPS shutdown command after cluster shutdown
(protects against power surge when power returns)
- Update upssched.conf with ONLINE handler and immediate ONBATT notification
- Add notification templates for HTML and text emails
- Update documentation with new features and timer configuration
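The upssched.conf additions follow NUT's AT syntax; timer and handler
names here are illustrative:

    AT ONBATT * EXECUTE notify-onbatt          # immediate notification
    AT ONBATT * START-TIMER shutdown-timer 300
    AT ONLINE * EXECUTE notify-online
    AT ONLINE * CANCEL-TIMER shutdown-timer    # power back: stand down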
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Document Flowise and ngrok configuration on LXC 104, including
troubleshooting steps for CORS and version issues.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Storage Configuration improvements:
- Add "Noduri" column showing which nodes have access to each storage
- Clarify that 'local' is separate on each node (non-shared)
- Clarify that 'local-zfs' is shared across pvemini, pve1, pveelite
- Clarify that 'backup' is only on pvemini (10.0.20.201)
- Add detailed explanations for each storage type
- Add storage paths section with important locations
Node name corrections:
- Fix node name: pve2 → pveelite (correct cluster name)
- Update all references across proxmox-ssh-guide.md and README.md
- Add node descriptions in tables for clarity
Benefits:
- Users now know exactly which storage is available on which nodes
- Clear distinction between shared and non-shared storage
- Correct node naming throughout documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>