# πŸ›‘οΈ Oracle DR System - Complete Architecture ## πŸ“Š System Overview ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ PRODUCTION ENVIRONMENT β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ PRIMARY SERVER (10.0.20.36) β”‚ β”‚ Windows Server + Oracle 19c β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ Database: ROA β”‚ β”‚ β”‚ β”‚ Size: ~80 GB β”‚ β”‚ β”‚ β”‚ Tables: 42,625 β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”‚ β”‚ β”‚ β–Ό Backups (Daily) β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ 02:30 - FULL backup (6-7 GB) β”‚ β”‚ β”‚ β”‚ 13:00 - CUMULATIVE (200 MB) β”‚ β”‚ β”‚ β”‚ 18:00 - CUMULATIVE (300 MB) β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ SSH Transfer (Port 22) β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ DR ENVIRONMENT β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ PROXMOX HOST (10.0.20.202 - pveelite) β”‚ β”‚ 
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ Backup Storage (NFS Server) │◄─────── Monitoring Scripts β”‚ β”‚ β”‚ /mnt/pve/oracle-backups/ β”‚ /opt/scripts/ β”‚ β”‚ β”‚ └── ROA/autobackup/ β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ NFS Mount (F:\) β”‚ β”‚ β–Ό β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ DR VM 109 (10.0.20.37) β”‚ β”‚ β”‚ β”‚ Windows Server + Oracle 19c β”‚ β”‚ β”‚ β”‚ Status: OFF (normally) β”‚ β”‚ β”‚ β”‚ Starts for: Tests or Disaster β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` ## 🎯 Quick Actions ### ⚑ Emergency DR Activation (Production Down!) ```bash # 1. Start DR VM ssh root@10.0.20.202 "qm start 109" # 2. Connect to VM (wait 3 min for boot) ssh -p 22122 romfast@10.0.20.37 # 3. Run restore (takes ~10-15 minutes) D:\oracle\scripts\rman_restore_from_zero.cmd # 4. 
Database is now RUNNING - Update app connections to 10.0.20.37 ``` ### πŸ§ͺ Weekly Test (Every Saturday) ```bash # Automatic at 06:00 via cron, or manual: ssh root@10.0.20.202 "/opt/scripts/weekly-dr-test-proxmox.sh" # What it does: # βœ“ Starts VM β†’ Restores DB β†’ Tests β†’ Cleanup β†’ Shutdown # βœ“ Sends email report with results ``` ### πŸ“Š Check Backup Health ```bash # Manual check (runs daily at 09:00 automatically) ssh root@10.0.20.202 "/opt/scripts/oracle-backup-monitor-proxmox.sh" # Output: # Status: OK # FULL backup age: 11 hours βœ“ # CUMULATIVE backup age: 2 hours βœ“ # Disk usage: 45% βœ“ ``` ## πŸ—‚οΈ Component Locations ### πŸ“ PRIMARY Server (10.0.20.36) ``` D:\rman_backup\ β”œβ”€β”€ rman_backup_full.txt # RMAN script for FULL backup β”œβ”€β”€ rman_backup_incremental.txt # RMAN script for CUMULATIVE β”œβ”€β”€ transfer_to_dr.ps1 # Transfer FULL to Proxmox └── transfer_incremental.ps1 # Transfer CUMULATIVE to Proxmox Scheduled Tasks: β”œβ”€β”€ 02:30 - Oracle RMAN Full Backup β”œβ”€β”€ 13:00 - Oracle RMAN Cumulative Backup └── 18:00 - Oracle RMAN Cumulative Backup ``` ### πŸ“ PROXMOX Host (10.0.20.202) ``` /opt/scripts/ β”œβ”€β”€ oracle-backup-monitor-proxmox.sh # Daily backup monitoring β”œβ”€β”€ weekly-dr-test-proxmox.sh # Weekly DR test └── PROXMOX_NOTIFICATIONS_README.md # Documentation /mnt/pve/oracle-backups/ROA/autobackup/ β”œβ”€β”€ FULL_20251010_023001.BKP # Latest FULL backup β”œβ”€β”€ INCR_20251010_130001.BKP # CUMULATIVE 13:00 └── INCR_20251010_180001.BKP # CUMULATIVE 18:00 Cron Jobs: 0 9 * * * /opt/scripts/oracle-backup-monitor-proxmox.sh 0 6 * * 6 /opt/scripts/weekly-dr-test-proxmox.sh ``` ### πŸ“ DR VM 109 (10.0.20.37) - When Running ``` D:\oracle\scripts\ β”œβ”€β”€ rman_restore_from_zero.cmd # Main restore script ⭐ β”œβ”€β”€ cleanup_database.cmd # Cleanup after test └── mount-nfs.bat # Mount F:\ at startup F:\ (NFS mount from Proxmox) └── ROA\autobackup\ # All backup files ``` ## πŸ”„ How It Works ### Backup Flow (Daily) ``` 
PRIMARY                     PROXMOX
   │                           │
   ├─02:30─FULL─Backup────────►
   │       (6-7 GB)            │
   │                           │
   ├─13:00─CUMULATIVE─────────►
   │       (200 MB)            │
   │                           │
   └─18:00─CUMULATIVE─────────►
           (300 MB)         Storage
                          ┌──────────┐
                          │ Monitor  │ 09:00 Daily
                          │ Check Age│ Alert if old
                          └──────────┘
```

### Restore Process

```
Start VM → Mount F:\ → Copy Backups → RMAN Restore → Database OPEN
  2min       Auto          2min           8min          Ready!

Total Time: ~15 minutes
```

## 🔧 Manual Operations

### Test Individual Components

```bash
# 1. Test backup transfer (on PRIMARY)
D:\rman_backup\transfer_incremental.ps1

# 2. Test NFS mount (on VM 109)
mount -o rw,nolock,mtype=hard,timeout=60 10.0.20.202:/mnt/pve/oracle-backups F:
dir F:\ROA\autobackup

# 3. Test notification system
ssh root@10.0.20.202 "touch -d '2 days ago' /mnt/pve/oracle-backups/ROA/autobackup/*FULL*.BKP"
ssh root@10.0.20.202 "/opt/scripts/oracle-backup-monitor-proxmox.sh"
# Should send WARNING notification

# 4. Test database restore (on VM 109)
D:\oracle\scripts\rman_restore_from_zero.cmd
```

### Force Actions

```bash
# Force backup now (on PRIMARY)
rman cmdfile=D:\rman_backup\rman_backup_incremental.txt

# Force cleanup VM (on VM 109)
D:\oracle\scripts\cleanup_database.cmd

# Force VM shutdown
ssh root@10.0.20.202 "qm stop 109"
```

## 🐛 Troubleshooting

### 🔍 Debugging Restore Tests

#### Check Backup Files on Proxmox (10.0.20.202)

```bash
# 1. List all backup files with size and date
ssh root@10.0.20.202 "ls -lh /mnt/pve/oracle-backups/ROA/autobackup/*.BKP"

# 2. Count backup files
ssh root@10.0.20.202 "ls /mnt/pve/oracle-backups/ROA/autobackup/*.BKP | wc -l"

# 3. Check latest backups (last 24 hours)
ssh root@10.0.20.202 "find /mnt/pve/oracle-backups/ROA/autobackup -name '*.BKP' -mtime -1 -ls"

# 4. Show backup files grouped by type (with new naming convention)
ssh root@10.0.20.202 "ls -lh /mnt/pve/oracle-backups/ROA/autobackup/ | grep -E '(L0_|L1_|ARC_|SPFILE_|CF_|O1_MF)'"

# 5. Check disk space usage
ssh root@10.0.20.202 "df -h /mnt/pve/oracle-backups"

# 6. Verify newest backup timestamp
ssh root@10.0.20.202 "stat /mnt/pve/oracle-backups/ROA/autobackup/L0_*.BKP 2>/dev/null | grep Modify || echo 'No L0 backups with new naming'"
```

#### Verify Backup Files on DR VM (when running)

```powershell
# 1. Check NFS mount is accessible
Test-Path F:\ROA\autobackup

# 2. List all backup files
Get-ChildItem F:\ROA\autobackup\*.BKP | Format-Table Name, Length, LastWriteTime

# 3. Count backup files
(Get-ChildItem F:\ROA\autobackup\*.BKP).Count

# 4. Show total backup size
"{0:N2} GB" -f ((Get-ChildItem F:\ROA\autobackup\*.BKP | Measure-Object -Property Length -Sum).Sum / 1GB)

# 5. Check latest Level 0 backup
Get-ChildItem F:\ROA\autobackup\L0_*.BKP -ErrorAction SilentlyContinue | Sort-Object LastWriteTime -Descending | Select-Object -First 1

# 6. Check what was copied during last restore
Get-Content D:\oracle\logs\restore_from_zero.log | Select-String "Copying|Copied"
```

#### Check DR Test Results

```bash
# 1. View latest DR test log
ssh root@10.0.20.202 "ls -lt /var/log/oracle-dr/dr_test_*.log | head -1 | awk '{print \$9}' | xargs cat | tail -100"

# 2. Check test status (passed/failed)
ssh root@10.0.20.202 "grep -E 'PASSED|FAILED|Database Verification' /var/log/oracle-dr/dr_test_*.log | tail -5"

# 3. See backup selection logic output
ssh root@10.0.20.202 "grep -A5 'TEST MODE: Selecting' /var/log/oracle-dr/dr_test_*.log | tail -20"

# 4. Check how many files were selected
ssh root@10.0.20.202 "grep 'Total files selected' /var/log/oracle-dr/dr_test_*.log | tail -1"

# 5. View RMAN errors (if any)
ssh root@10.0.20.202 "grep -i 'RMAN-\|ORA-' /var/log/oracle-dr/dr_test_*.log | tail -20"
```

#### Simulate Test Locally (on DR VM)

```powershell
# 1. Start Oracle service manually
Start-Service OracleServiceROA

# 2. Run cleanup to prepare for restore
D:\oracle\scripts\cleanup_database.ps1 /SILENT

# 3. Run restore in test mode
D:\oracle\scripts\rman_restore_from_zero.ps1 -TestMode

# 4. Verify database opened correctly
sqlplus / as sysdba @D:\oracle\scripts\verify_db.sql

# 5. Check what backups were used
Get-Content D:\oracle\logs\restore_from_zero.log | Select-String "backup piece"

# 6. View database verification output
Get-Content D:\oracle\logs\restore_from_zero.log | Select-String -Pattern "DB_NAME|OPEN_MODE|TABLES" -Context 0,1
```

#### Common Restore Test Issues

| Issue | Check | Fix |
|-------|-------|-----|
| Test reports FAILED but DB is open | Check log for "OPEN_MODE: READ WRITE" | Already fixed in latest version |
| Missing datafiles in restore | Count backup files: should be 15-40+ | Wait for next full backup or copy all files |
| "No backups found" error | Verify NFS mount: `Test-Path F:\` | Remount NFS or check Proxmox NFS service |
| Restore takes > 30 min | Check backup size: should be ~5-8 GB | Normal for first restore after format change |
| RMAN-06023 errors | Check for `L0_*.BKP` files on `F:\` | Old format: need new backup with naming convention |

#### Verify Naming Convention is Active

```bash
# Check if new naming convention is being used (after Oct 11, 2025)
ssh root@10.0.20.202 "ls /mnt/pve/oracle-backups/ROA/autobackup/ | grep -E '^(L0_|L1_|ARC_|SPFILE_|CF_)' | wc -l"

# Should return > 0 if active
# If 0, backups are still using old format (O1_MF_ANNNN_*)
# Wait for next scheduled backup (02:30 daily) or run manual backup
```

#### Manual Test Run with Verbose Output

```bash
# Run test with full output visible
ssh root@10.0.20.202
cd /opt/scripts
./weekly-dr-test-proxmox.sh 2>&1 | tee /tmp/dr_test_manual.log

# Watch in real-time what's happening
# Look for these key stages:
# - "TEST MODE: Selecting latest backup set"
# - "Total files selected: XX"
# - "RMAN restore completed successfully"
# - "OPEN_MODE: READ WRITE"
```

### ❌ Backup Monitor Not Sending Alerts

```bash
# 1. Check templates exist
ssh root@10.0.20.202 "ls /usr/share/pve-manager/templates/default/oracle-*"

# 2. Reinstall templates
ssh root@10.0.20.202 "/opt/scripts/oracle-backup-monitor-proxmox.sh --install"

# 3. Check Proxmox notifications work
# Single quotes so that $(hostname) is evaluated on the Proxmox host,
# not on the workstation running ssh
ssh root@10.0.20.202 'pvesh create /nodes/$(hostname)/apt/update'
# Should receive update notification
```

### ❌ F:\ Drive Not Accessible in VM

```bash
# On VM 109:
# 1. Check NFS Client service
Get-Service | Where {$_.Name -like "*NFS*"}

# 2. Manual mount
mount -o rw,nolock,mtype=hard,timeout=60 10.0.20.202:/mnt/pve/oracle-backups F:

# 3. Check Proxmox NFS server
ssh root@10.0.20.202 "showmount -e localhost"
# Should show: /mnt/pve/oracle-backups 10.0.20.37
```

### ❌ Restore Fails

```bash
# 1. Check backup files exist
dir F:\ROA\autobackup\*.BKP

# 2. Check Oracle service
sc query OracleServiceROA

# 3. Check PFILE exists
dir C:\Users\oracle\admin\ROA\pfile\initROA.ora

# 4. View restore log
type D:\oracle\logs\restore_from_zero.log
```

### ❌ VM Won't Start

```bash
# Check VM status
ssh root@10.0.20.202 "qm status 109"

# Check VM config
ssh root@10.0.20.202 "qm config 109 | grep -E 'memory|cores|bootdisk'"

# Force unlock if locked
ssh root@10.0.20.202 "qm unlock 109"

# Start with console
ssh root@10.0.20.202 "qm start 109 && qm terminal 109"
```

## 📈 Monitoring & Metrics

### Key Metrics

| Metric | Target | Alert Threshold |
|--------|--------|-----------------|
| FULL Backup Age | < 24h | > 25h |
| CUMULATIVE Age | < 6h | > 7h |
| Backup Size | ~7 GB/day | > 10 GB |
| Restore Time | < 15 min | > 30 min |
| Disk Usage | < 80% | > 80% |

### Check Logs

```bash
# Backup logs (on PRIMARY)
Get-Content D:\rman_backup\logs\backup_*.log -Tail 50

# Transfer logs (on PRIMARY)
Get-Content D:\rman_backup\logs\transfer_*.log -Tail 50

# Monitoring logs (on Proxmox)
tail -50 /var/log/oracle-dr/*.log

# Restore logs (on VM 109)
type D:\oracle\logs\restore_from_zero.log
```

## 🔐 Security & Access

### SSH Keys Setup

```
PRIMARY (10.0.20.36) ──────► PROXMOX (10.0.20.202)
                     SSH Key
                     Port 22

LINUX WORKSTATION ─────────► PROXMOX (10.0.20.202)
                     SSH Key
                     Port 22

LINUX WORKSTATION ─────────► VM 109 (10.0.20.37)
                     SSH Key
                     Port 22122
```

### Required Credentials

- **PRIMARY**: Administrator (for scheduled tasks)
- **PROXMOX**: root (for scripts and VM control)
- **VM 109**: romfast (user), SYSTEM (Oracle service)

## 📅 Maintenance Schedule

| Day | Time | Action | Duration | Impact |
|-----|------|--------|----------|--------|
| Daily | 02:30 | FULL Backup | 30 min | None |
| Daily | 09:00 | Monitor Backups | 1 min | None |
| Daily | 13:00 | CUMULATIVE Backup | 5 min | None |
| Daily | 18:00 | CUMULATIVE Backup | 5 min | None |
| Saturday | 06:00 | DR Test | 30 min | None |

## 🚨 Disaster Recovery Procedure

### When PRIMARY is DOWN:

1. **Confirm PRIMARY is unreachable**
   ```bash
   ping 10.0.20.36  # Should fail
   ```

2. **Start DR VM**
   ```bash
   ssh root@10.0.20.202 "qm start 109"
   ```

3. **Wait for boot (3 minutes)**

4. **Connect to DR VM**
   ```bash
   ssh -p 22122 romfast@10.0.20.37
   ```

5. **Run restore**
   ```cmd
   D:\oracle\scripts\rman_restore_from_zero.cmd
   ```

6. **Verify database**
   ```sql
   sqlplus / as sysdba
   SELECT name, open_mode FROM v$database;
   -- Should show: ROA, READ WRITE
   ```

7. **Update application connections**
   - Change from: 10.0.20.36:1521/ROA
   - Change to: 10.0.20.37:1521/ROA

8. **Monitor DR system**
   - Database is now production
   - Do NOT run cleanup!
   - Keep VM running

## 📝 Quick Reference Card

```
╔════════════════════════════════════════════════════════════════════════╗
║                           DR QUICK REFERENCE                           ║
╠════════════════════════════════════════════════════════════════════════╣
║ PRIMARY DOWN?                                                          ║
║   ssh root@10.0.20.202                                                 ║
║   qm start 109                                                         ║
║   # Wait 3 min                                                         ║
║   ssh -p 22122 romfast@10.0.20.37                                      ║
║   D:\oracle\scripts\rman_restore_from_zero.cmd                         ║
╠════════════════════════════════════════════════════════════════════════╣
║ TEST DR?                                                               ║
║   ssh root@10.0.20.202 "/opt/scripts/weekly-dr-test-proxmox.sh"        ║
╠════════════════════════════════════════════════════════════════════════╣
║ CHECK BACKUPS?                                                         ║
║   ssh root@10.0.20.202 "/opt/scripts/oracle-backup-monitor-proxmox.sh" ║
╠════════════════════════════════════════════════════════════════════════╣
║ SUPPORT:                                                               ║
║   Logs: /var/log/oracle-dr/                                            ║
║   Docs: /opt/scripts/PROXMOX_NOTIFICATIONS_README.md                   ║
╚════════════════════════════════════════════════════════════════════════╝
```

---

**Last Updated:** October 11, 2025

**Version:** 2.1 - Added restore test debugging guide + naming convention

**Status:** ✅ Production Ready
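
## 📎 Appendix: Backup Age Check Sketch

The Key Metrics table alerts when the newest FULL backup is older than 25 hours or the newest CUMULATIVE backup is older than 7 hours. The sketch below shows one way such an age check could be written in bash. It is not the contents of `oracle-backup-monitor-proxmox.sh`: the function names `newest_age_hours` and `check_backup` are hypothetical, and it assumes GNU `find` and `date` as shipped on a Debian-based Proxmox host.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real oracle-backup-monitor-proxmox.sh.
# Function names and output wording are assumptions for illustration.

# newest_age_hours DIR PATTERN
# Prints the age (whole hours) of the newest file in DIR matching PATTERN.
# Returns 1 if no file matches or DIR is unreadable.
newest_age_hours() {
    local dir=$1 pattern=$2 newest now
    # GNU find prints the mtime of each match as epoch seconds; keep the largest.
    newest=$(find "$dir" -maxdepth 1 -type f -name "$pattern" -printf '%T@\n' 2>/dev/null \
             | sort -n | tail -1)
    [ -n "$newest" ] || return 1
    now=$(date +%s)
    # %T@ has a fractional part; strip it before integer arithmetic.
    echo $(( (now - ${newest%.*}) / 3600 ))
}

# check_backup LABEL DIR PATTERN MAX_HOURS
# Prints one status line. Returns 0 (OK), 1 (too old) or 2 (no backup found).
check_backup() {
    local label=$1 dir=$2 pattern=$3 max=$4 age
    if ! age=$(newest_age_hours "$dir" "$pattern"); then
        echo "$label: MISSING"
        return 2
    fi
    if [ "$age" -gt "$max" ]; then
        echo "$label: WARNING (age ${age}h > ${max}h)"
        return 1
    fi
    echo "$label: OK (age ${age}h)"
}
```

With the thresholds from the Key Metrics table, a call might look like `check_backup FULL /mnt/pve/oracle-backups/ROA/autobackup 'L0_*.BKP' 25` and `check_backup CUMULATIVE /mnt/pve/oracle-backups/ROA/autobackup 'L1_*.BKP' 7`. Mapping `L0_` to FULL and `L1_` to CUMULATIVE is an assumption based on the naming convention described above. The non-zero return codes let a caller decide whether to send a notification.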