- Fix Proxmox template compatibility: {{hostname}} → {{node}}, {{timestamp}} → {{date}}
- Remove duplicate node fields and fix JSON structure
- Complete full testing plan execution for monitoring and DR test scripts
- Validate notification system functionality with PVE::Notify
- Sync tested scripts from production back to repository
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
🛡️ Oracle DR System - Complete Architecture
📊 System Overview
┌─────────────────────────────────────────────────────────────────┐
│ PRODUCTION ENVIRONMENT │
├─────────────────────────────────────────────────────────────────┤
│ PRIMARY SERVER (10.0.20.36) │
│ Windows Server + Oracle 19c │
│ ┌──────────────────────────────┐ │
│ │ Database: ROA │ │
│ │ Size: ~80 GB │ │
│ │ Tables: 42,625 │ │
│ └──────────────────────────────┘ │
│ │ │
│ ▼ Backups (Daily) │
│ ┌──────────────────────────────┐ │
│ │ 02:30 - FULL backup (6-7 GB) │ │
│ │ 13:00 - CUMULATIVE (200 MB) │ │
│ │ 18:00 - CUMULATIVE (300 MB) │ │
│ └──────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
│ SSH Transfer (Port 22)
▼
┌─────────────────────────────────────────────────────────────────┐
│ DR ENVIRONMENT │
├─────────────────────────────────────────────────────────────────┤
│ PROXMOX HOST (10.0.20.202 - pveelite) │
│ ┌──────────────────────────────┐ │
│ │ Backup Storage (NFS Server) │◄─────── Monitoring Scripts │
│ │ /mnt/pve/oracle-backups/ │ /opt/scripts/ │
│ │ └── ROA/autobackup/ │ │
│ └──────────────────────────────┘ │
│ │ │
│ │ NFS Mount (F:\) │
│ ▼ │
│ ┌──────────────────────────────┐ │
│ │ DR VM 109 (10.0.20.37) │ │
│ │ Windows Server + Oracle 19c │ │
│ │ Status: OFF (normally) │ │
│ │ Starts for: Tests or Disaster │ │
│ └──────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
🎯 Quick Actions
⚡ Emergency DR Activation (Production Down!)
# 1. Start DR VM
ssh root@10.0.20.202 "qm start 109"
# 2. Connect to VM (wait 3 min for boot)
ssh -p 22122 romfast@10.0.20.37
# 3. Run restore (takes ~10-15 minutes)
D:\oracle\scripts\rman_restore_from_zero.cmd
# 4. Database is now RUNNING - Update app connections to 10.0.20.37
🧪 Weekly Test (Every Saturday)
# Automatic at 06:00 via cron, or manual:
ssh root@10.0.20.202 "/opt/scripts/weekly-dr-test-proxmox.sh"
# What it does:
# ✓ Starts VM → Restores DB → Tests → Cleanup → Shutdown
# ✓ Sends email report with results
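The sequence listed above could be outlined roughly as follows. This is only a sketch, not the contents of `weekly-dr-test-proxmox.sh`: the helper names `run_step` and `weekly_dr_test` and the `DRY_RUN` switch are hypothetical, the fixed 180-second wait stands in for the "wait 3 min for boot" note, and the "Tests" and email steps are left out because this document does not describe them. The point it illustrates is that cleanup and shutdown still run when the restore fails, so the VM is not left running.

```shell
#!/usr/bin/env bash
# Hypothetical outline of the weekly test sequence (not the real script).
VMID=109
VM_SSH="ssh -p 22122 romfast@10.0.20.37"

# Print a step label, then run the command (or only print it if DRY_RUN=1).
run_step() {
    local label="$1"; shift
    printf '== %s\n' "$label"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '   would run: %s\n' "$*"
    else
        "$@"
    fi
}

weekly_dr_test() {
    local rc=0
    run_step "Start VM"      qm start "$VMID" || return 1
    run_step "Wait for boot" sleep 180
    # $VM_SSH is left unquoted on purpose so it splits into ssh + options.
    # A failed restore is recorded, but cleanup and shutdown still run.
    run_step "Restore DB" $VM_SSH 'D:\oracle\scripts\rman_restore_from_zero.cmd' || rc=1
    run_step "Cleanup"    $VM_SSH 'D:\oracle\scripts\cleanup_database.cmd' || rc=1
    run_step "Shutdown"   qm shutdown "$VMID" || rc=1
    return "$rc"
}
```

Running `DRY_RUN=1 weekly_dr_test` prints the five steps without touching the VM.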
📊 Check Backup Health
# Manual check (runs daily at 09:00 automatically)
ssh root@10.0.20.202 "/opt/scripts/oracle-backup-monitor-proxmox.sh"
# Output:
# Status: OK
# FULL backup age: 11 hours ✓
# CUMULATIVE backup age: 2 hours ✓
# Disk usage: 45% ✓
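A backup age like the ones in the output above can be derived from file modification times. The sketch below assumes GNU `find` on the Proxmox host and assumes that a file's mtime reflects when the backup arrived. The function name `newest_backup_age_hours` is hypothetical and is not taken from the actual monitor script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the age, in whole hours, of the newest file in
# DIR whose name matches the glob PATTERN. Requires GNU find (-printf).
newest_backup_age_hours() {
    local dir="$1" pattern="$2" now newest
    now="${3:-$(date +%s)}"     # optional third argument: "now" in epoch seconds
    # %T@ prints the mtime as epoch seconds with a fractional part
    newest="$(find "$dir" -maxdepth 1 -type f -name "$pattern" -printf '%T@\n' \
              | sort -n | tail -1)"
    if [ -z "$newest" ]; then
        echo "no files matching $pattern in $dir" >&2
        return 1
    fi
    newest="${newest%.*}"       # drop the fractional seconds
    echo $(( (now - newest) / 3600 ))
}
```

For example, `newest_backup_age_hours /mnt/pve/oracle-backups/ROA/autobackup 'FULL_*.BKP'` would print the age of the latest FULL backup.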
🗂️ Component Locations
📁 PRIMARY Server (10.0.20.36)
D:\rman_backup\
├── rman_backup_full.txt # RMAN script for FULL backup
├── rman_backup_incremental.txt # RMAN script for CUMULATIVE
├── transfer_to_dr.ps1 # Transfer FULL to Proxmox
└── transfer_incremental.ps1 # Transfer CUMULATIVE to Proxmox
Scheduled Tasks:
├── 02:30 - Oracle RMAN Full Backup
├── 13:00 - Oracle RMAN Cumulative Backup
└── 18:00 - Oracle RMAN Cumulative Backup
📁 PROXMOX Host (10.0.20.202)
/opt/scripts/
├── oracle-backup-monitor-proxmox.sh # Daily backup monitoring
├── weekly-dr-test-proxmox.sh # Weekly DR test
└── PROXMOX_NOTIFICATIONS_README.md # Documentation
/mnt/pve/oracle-backups/ROA/autobackup/
├── FULL_20251010_023001.BKP # Latest FULL backup
├── INCR_20251010_130001.BKP # CUMULATIVE 13:00
└── INCR_20251010_180001.BKP # CUMULATIVE 18:00
Cron Jobs:
0 9 * * * /opt/scripts/oracle-backup-monitor-proxmox.sh
0 6 * * 6 /opt/scripts/weekly-dr-test-proxmox.sh
📁 DR VM 109 (10.0.20.37) - When Running
D:\oracle\scripts\
├── rman_restore_from_zero.cmd # Main restore script ⭐
├── cleanup_database.cmd # Cleanup after test
└── mount-nfs.bat # Mount F:\ at startup
F:\ (NFS mount from Proxmox)
└── ROA\autobackup\ # All backup files
🔄 How It Works
Backup Flow (Daily)
PRIMARY                    PROXMOX
   │                          │
   ├─02:30─FULL─Backup────────►
   │       (6-7 GB)           │
   │                          │
   ├─13:00─CUMULATIVE─────────►
   │       (200 MB)           │
   │                          │
   └─18:00─CUMULATIVE─────────►
           (300 MB)        Storage
                         ┌──────────┐
                         │ Monitor  │ 09:00 Daily
                         │ Check Age│ Alert if old
                         └──────────┘
Restore Process
Start VM → Mount F:\ → Copy Backups → RMAN Restore → Database OPEN
  2min       Auto          2min           8min          Ready!
Total Time: ~15 minutes
🔧 Manual Operations
Test Individual Components
# 1. Test backup transfer (on PRIMARY)
D:\rman_backup\transfer_incremental.ps1
# 2. Test NFS mount (on VM 109)
mount -o rw,nolock,mtype=hard,timeout=60 10.0.20.202:/mnt/pve/oracle-backups F:
dir F:\ROA\autobackup
# 3. Test notification system
ssh root@10.0.20.202 "touch -d '2 days ago' /mnt/pve/oracle-backups/ROA/autobackup/*FULL*.BKP"
ssh root@10.0.20.202 "/opt/scripts/oracle-backup-monitor-proxmox.sh"
# Should send WARNING notification
# 4. Test database restore (on VM 109)
D:\oracle\scripts\rman_restore_from_zero.cmd
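Step 3 above changes the modification time of the real FULL backup files, so the monitor may keep reporting them as old until the next backup arrives. One way to undo that is to reset each file's mtime from the timestamp in its name. This sketch assumes the names follow the `FULL_YYYYMMDD_HHMMSS.BKP` / `INCR_YYYYMMDD_HHMMSS.BKP` pattern seen in the file listing and that the encoded time is local time; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a backup file name such as
# FULL_20251010_023001.BKP into "2025-10-10 02:30:01".
backup_time_from_name() {
    local name ts
    name="$(basename "$1")"
    ts="$(printf '%s\n' "$name" | sed -nE \
        's/^(FULL|INCR)_([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2})([0-9]{2})\.BKP$/\2-\3-\4 \5:\6:\7/p')"
    if [ -z "$ts" ]; then
        echo "unrecognised backup name: $name" >&2
        return 1
    fi
    echo "$ts"
}

# Reset the mtime of every *.BKP file in DIR to the time encoded in its name.
# Files with names that do not match the pattern are skipped.
restore_backup_mtimes() {
    local dir="$1" f ts
    for f in "$dir"/*.BKP; do
        [ -e "$f" ] || continue
        ts="$(backup_time_from_name "$f")" || continue
        touch -d "$ts" "$f"
    done
}
```

After the notification test, `restore_backup_mtimes /mnt/pve/oracle-backups/ROA/autobackup` on the Proxmox host would put the timestamps back.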
Force Actions
# Force backup now (on PRIMARY)
rman cmdfile=D:\rman_backup\rman_backup_incremental.txt
# Force cleanup VM (on VM 109)
D:\oracle\scripts\cleanup_database.cmd
# Force VM shutdown
ssh root@10.0.20.202 "qm stop 109"
🐛 Troubleshooting
❌ Backup Monitor Not Sending Alerts
# 1. Check templates exist
ssh root@10.0.20.202 "ls /usr/share/pve-manager/templates/default/oracle-*"
# 2. Reinstall templates
ssh root@10.0.20.202 "/opt/scripts/oracle-backup-monitor-proxmox.sh --install"
# 3. Check Proxmox notifications work
ssh root@10.0.20.202 'pvesh create /nodes/$(hostname)/apt/update'
# Should receive update notification
❌ F:\ Drive Not Accessible in VM
# On VM 109:
# 1. Check NFS Client service
Get-Service | Where {$_.Name -like "*NFS*"}
# 2. Manual mount
mount -o rw,nolock,mtype=hard,timeout=60 10.0.20.202:/mnt/pve/oracle-backups F:
# 3. Check Proxmox NFS server
ssh root@10.0.20.202 "showmount -e localhost"
# Should show: /mnt/pve/oracle-backups 10.0.20.37
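The export check in step 3 can be automated by parsing the `showmount -e` output. The sketch below assumes the usual two-column output (a header line, then `path client[,client...]`) and only handles exact client matches and the `*` wildcard, not subnet or netgroup entries. The function name `export_allows_client` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `showmount -e` output on stdin and succeed if
# EXPORT_PATH ($1) is exported to CLIENT ($2), by exact match or "*".
export_allows_client() {
    awk -v path="$1" -v client="$2" '
        $1 == path {
            n = split($2, hosts, ",")
            for (i = 1; i <= n; i++)
                if (hosts[i] == client || hosts[i] == "*") found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

Typical use from the workstation: `ssh root@10.0.20.202 "showmount -e localhost" | export_allows_client /mnt/pve/oracle-backups 10.0.20.37 && echo "export OK"`.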
❌ Restore Fails
# 1. Check backup files exist
dir F:\ROA\autobackup\*.BKP
# 2. Check Oracle service
sc query OracleServiceROA
# 3. Check PFILE exists
dir C:\Users\oracle\admin\ROA\pfile\initROA.ora
# 4. View restore log
type D:\oracle\logs\restore_from_zero.log
❌ VM Won't Start
# Check VM status
ssh root@10.0.20.202 "qm status 109"
# Check VM config
ssh root@10.0.20.202 "qm config 109 | grep -E 'memory|cores|bootdisk'"
# Force unlock if locked
ssh root@10.0.20.202 "qm unlock 109"
# Start with console
ssh root@10.0.20.202 "qm start 109 && qm terminal 109"
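Instead of checking `qm status 109` by hand, the state can be polled until the VM reports `running`. This is a sketch under the assumption that it runs on the Proxmox host, where `qm status <vmid>` prints a line such as `status: running`. The function name `wait_for_vm_state` and the `STATUS_CMD` override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a status command until it reports the wanted
# state ($1) or the timeout in seconds ($2, default 180) expires.
# STATUS_CMD defaults to `qm status 109`; override it to test without a VM.
wait_for_vm_state() {
    local want="$1" timeout="${2:-180}" interval="${3:-5}" waited=0 state
    while :; do
        # Extract the second field of the "status: <state>" line
        state="$(${STATUS_CMD:-qm status 109} 2>/dev/null | awk '/^status:/ {print $2}')"
        [ "$state" = "$want" ] && return 0
        [ "$waited" -ge "$timeout" ] && return 1
        sleep "$interval"
        waited=$(( waited + interval ))
    done
}
```

For example, `qm start 109 && wait_for_vm_state running 120 || echo "VM 109 did not start"`.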
📈 Monitoring & Metrics
Key Metrics
| Metric | Target | Alert Threshold |
|---|---|---|
| FULL Backup Age | < 24h | > 25h |
| CUMULATIVE Age | < 6h | > 7h |
| Backup Size | ~7 GB/day | > 10 GB |
| Restore Time | < 15 min | > 30 min |
| Disk Usage | < 80% | > 80% |
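The alert thresholds in the table amount to simple "greater than" comparisons. The sketch below shows one way to express them; the function names `check_threshold` and `disk_usage_percent` are hypothetical, values are assumed to be whole numbers, and how the real monitor script evaluates these metrics is not documented here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a measured integer value against an alert
# threshold from the Key Metrics table and print one status line.
# Returns 1 when the value exceeds the threshold.
check_threshold() {
    local label="$1" value="$2" limit="$3" unit="$4"
    if [ "$value" -gt "$limit" ]; then
        echo "ALERT: $label is $value$unit (threshold: > $limit$unit)"
        return 1
    fi
    echo "OK: $label is $value$unit"
}

# Disk usage of a mount point as a bare integer percentage (GNU df assumed).
disk_usage_percent() {
    df --output=pcent "$1" | tail -1 | tr -dc '0-9'
}
```

For example, `check_threshold "Disk usage" "$(disk_usage_percent /mnt/pve/oracle-backups)" 80 %` applies the last row of the table.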
Check Logs
# Backup logs (on PRIMARY)
Get-Content D:\rman_backup\logs\backup_*.log -Tail 50
# Transfer logs (on PRIMARY)
Get-Content D:\rman_backup\logs\transfer_*.log -Tail 50
# Monitoring logs (on Proxmox)
tail -50 /var/log/oracle-dr/*.log
# Restore logs (on VM 109)
type D:\oracle\logs\restore_from_zero.log
🔐 Security & Access
SSH Keys Setup
PRIMARY (10.0.20.36) ──────► PROXMOX (10.0.20.202)
                     SSH Key
                     Port 22
LINUX WORKSTATION ─────────► PROXMOX (10.0.20.202)
                  SSH Key
                  Port 22
LINUX WORKSTATION ─────────► VM 109 (10.0.20.37)
                  SSH Key
                  Port 22122
Required Credentials
- PRIMARY: Administrator (for scheduled tasks)
- PROXMOX: root (for scripts and VM control)
- VM 109: romfast (user), SYSTEM (Oracle service)
📅 Maintenance Schedule
| Day | Time | Action | Duration | Impact |
|---|---|---|---|---|
| Daily | 02:30 | FULL Backup | 30 min | None |
| Daily | 09:00 | Monitor Backups | 1 min | None |
| Daily | 13:00 | CUMULATIVE Backup | 5 min | None |
| Daily | 18:00 | CUMULATIVE Backup | 5 min | None |
| Saturday | 06:00 | DR Test | 30 min | None |
🚨 Disaster Recovery Procedure
When PRIMARY is DOWN:
1. Confirm PRIMARY is unreachable
   ping 10.0.20.36 # Should fail
2. Start DR VM
   ssh root@10.0.20.202 "qm start 109"
3. Wait for boot (3 minutes)
4. Connect to DR VM
   ssh -p 22122 romfast@10.0.20.37
5. Run restore
   D:\oracle\scripts\rman_restore_from_zero.cmd
6. Verify database
   sqlplus / as sysdba
   SELECT name, open_mode FROM v$database;
   -- Should show: ROA, READ WRITE
7. Update application connections
   - Change from: 10.0.20.36:1521/ROA
   - Change to: 10.0.20.37:1521/ROA
8. Monitor DR system
   - Database is now production
   - Do NOT run cleanup!
   - Keep VM running
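The "Verify database" step can be checked mechanically once the output of the `v$database` query has been captured as text. The sketch below assumes the default SQL*Plus column layout (database name followed by the open mode on one row) and strips Windows line endings first. The function name `db_is_open_read_write` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read SQL*Plus query output on stdin and succeed only
# if it contains a row "<NAME>   READ WRITE" (NAME defaults to ROA).
db_is_open_read_write() {
    local name="${1:-ROA}"
    # Remove carriage returns from Windows output, then match the result row
    tr -d '\r' | grep -Eq "^${name}[[:space:]]+READ WRITE[[:space:]]*\$"
}
```

Feeding it the saved query output, for example `db_is_open_read_write ROA < verify_output.txt && echo "ROA is open"`, gives a yes/no answer before application connections are switched; `verify_output.txt` is a placeholder file name.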
📝 Quick Reference Card
╔══════════════════════════════════════════════════════════════╗
║ DR QUICK REFERENCE ║
╠══════════════════════════════════════════════════════════════╣
║ PRIMARY DOWN? ║
║ ssh root@10.0.20.202 ║
║ qm start 109 ║
║ # Wait 3 min ║
║ ssh -p 22122 romfast@10.0.20.37 ║
║ D:\oracle\scripts\rman_restore_from_zero.cmd ║
╠══════════════════════════════════════════════════════════════╣
║ TEST DR? ║
║ ssh root@10.0.20.202 "/opt/scripts/weekly-dr-test-proxmox.sh"║
╠══════════════════════════════════════════════════════════════╣
║ CHECK BACKUPS? ║
║ ssh root@10.0.20.202 "/opt/scripts/oracle-backup-monitor-proxmox.sh"║
╠══════════════════════════════════════════════════════════════╣
║ SUPPORT: ║
║ Logs: /var/log/oracle-dr/ ║
║ Docs: /opt/scripts/PROXMOX_NOTIFICATIONS_README.md ║
╚══════════════════════════════════════════════════════════════╝
Last Updated: October 10, 2025
Version: 2.0 - Complete DR System with Proxmox Integration
Status: ✅ Production Ready