diff --git a/oracle/standby-server-scripts/README.md b/oracle/standby-server-scripts/README.md
index 8c40a2a..f5cdd9b 100644
--- a/oracle/standby-server-scripts/README.md
+++ b/oracle/standby-server-scripts/README.md
@@ -195,6 +195,130 @@ ssh root@10.0.20.202 "qm stop 109"
 
 ## 🐛 Troubleshooting
 
+### 🔍 Debugging Restore Tests
+
+#### Check Backup Files on Proxmox (10.0.20.202)
+
+```bash
+# 1. List all backup files with size and date
+ssh root@10.0.20.202 "ls -lh /mnt/pve/oracle-backups/ROA/autobackup/*.BKP"
+
+# 2. Count backup files
+ssh root@10.0.20.202 "ls /mnt/pve/oracle-backups/ROA/autobackup/*.BKP | wc -l"
+
+# 3. Check latest backups (last 24 hours)
+ssh root@10.0.20.202 "find /mnt/pve/oracle-backups/ROA/autobackup -name '*.BKP' -mtime -1 -ls"
+
+# 4. Show backup files grouped by type (with new naming convention)
+ssh root@10.0.20.202 "ls -lh /mnt/pve/oracle-backups/ROA/autobackup/ | grep -E '(L0_|L1_|ARC_|SPFILE_|CF_|O1_MF)'"
+
+# 5. Check disk space usage
+ssh root@10.0.20.202 "df -h /mnt/pve/oracle-backups"
+
+# 6. Verify newest backup timestamp
+ssh root@10.0.20.202 "stat /mnt/pve/oracle-backups/ROA/autobackup/L0_*.BKP 2>/dev/null | grep Modify || echo 'No L0 backups with new naming'"
+```
+
+#### Verify Backup Files on DR VM (when running)
+
+```powershell
+# 1. Check NFS mount is accessible
+Test-Path F:\ROA\autobackup
+
+# 2. List all backup files
+Get-ChildItem F:\ROA\autobackup\*.BKP | Format-Table Name, Length, LastWriteTime
+
+# 3. Count backup files
+(Get-ChildItem F:\ROA\autobackup\*.BKP).Count
+
+# 4. Show total backup size
+"{0:N2} GB" -f ((Get-ChildItem F:\ROA\autobackup\*.BKP | Measure-Object -Property Length -Sum).Sum / 1GB)
+
+# 5. Check latest Level 0 backup
+Get-ChildItem F:\ROA\autobackup\L0_*.BKP -ErrorAction SilentlyContinue | Sort-Object LastWriteTime -Descending | Select-Object -First 1
+
+# 6. Check what was copied during last restore
+Get-Content D:\oracle\logs\restore_from_zero.log | Select-String "Copying|Copied"
+```
+
+#### Check DR Test Results
+
+```bash
+# 1. View latest DR test log
+ssh root@10.0.20.202 "ls -lt /var/log/oracle-dr/dr_test_*.log | head -1 | awk '{print \$9}' | xargs cat | tail -100"
+
+# 2. Check test status (passed/failed)
+ssh root@10.0.20.202 "grep -E 'PASSED|FAILED|Database Verification' /var/log/oracle-dr/dr_test_*.log | tail -5"
+
+# 3. See backup selection logic output
+ssh root@10.0.20.202 "grep -A5 'TEST MODE: Selecting' /var/log/oracle-dr/dr_test_*.log | tail -20"
+
+# 4. Check how many files were selected
+ssh root@10.0.20.202 "grep 'Total files selected' /var/log/oracle-dr/dr_test_*.log | tail -1"
+
+# 5. View RMAN errors (if any)
+ssh root@10.0.20.202 "grep -i 'RMAN-\|ORA-' /var/log/oracle-dr/dr_test_*.log | tail -20"
+```
+
+#### Simulate Test Locally (on DR VM)
+
+```powershell
+# 1. Start Oracle service manually
+Start-Service OracleServiceROA
+
+# 2. Run cleanup to prepare for restore
+D:\oracle\scripts\cleanup_database.ps1 /SILENT
+
+# 3. Run restore in test mode
+D:\oracle\scripts\rman_restore_from_zero.ps1 -TestMode
+
+# 4. Verify database opened correctly
+sqlplus / as sysdba @D:\oracle\scripts\verify_db.sql
+
+# 5. Check what backups were used
+Get-Content D:\oracle\logs\restore_from_zero.log | Select-String "backup piece"
+
+# 6. View database verification output
+Get-Content D:\oracle\logs\restore_from_zero.log | Select-String -Pattern "DB_NAME|OPEN_MODE|TABLES" -Context 0,1
+```
+
+#### Common Restore Test Issues
+
+| Issue | Check | Fix |
+|-------|-------|-----|
+| Test reports FAILED but DB is open | Check log for "OPEN_MODE: READ WRITE" | Already fixed in latest version |
+| Missing datafiles in restore | Count backup files: should be 15-40+ | Wait for next full backup or copy all files |
+| "No backups found" error | Verify NFS mount: `Test-Path F:\` | Remount NFS or check Proxmox NFS service |
+| Restore takes > 30 min | Check backup size: should be ~5-8 GB | Normal for first restore after format change |
+| RMAN-06023 errors | Check for L0_*.BKP files on F:\ | Old format: need new backup with naming convention |
+
+#### Verify Naming Convention is Active
+
+```bash
+# Check if new naming convention is being used (after Oct 11, 2025)
+ssh root@10.0.20.202 "ls /mnt/pve/oracle-backups/ROA/autobackup/ | grep -E '^(L0_|L1_|ARC_|SPFILE_|CF_)' | wc -l"
+# Should return > 0 if active
+
+# If 0, backups are still using old format (O1_MF_ANNNN_*)
+# Wait for next scheduled backup (02:30 daily) or run manual backup
+```
+
+#### Manual Test Run with Verbose Output
+
+```bash
+# Run test with full output visible
+ssh root@10.0.20.202
+cd /opt/scripts
+./weekly-dr-test-proxmox.sh 2>&1 | tee /tmp/dr_test_manual.log
+
+# Watch in real-time what's happening
+# Look for these key stages:
+# - "TEST MODE: Selecting latest backup set"
+# - "Total files selected: XX"
+# - "RMAN restore completed successfully"
+# - "OPEN_MODE: READ WRITE"
+```
+
 ### ❌ Backup Monitor Not Sending Alerts
 
 ```bash
@@ -384,6 +508,6 @@ LINUX WORKSTATION ─────────► VM 109 (10.0.20.37)
 
 ---
 
-**Last Updated:** October 10, 2025
-**Version:** 2.0 - Complete DR System with Proxmox Integration
+**Last Updated:** October 11, 2025
+**Version:** 2.1 - Added restore test debugging guide + naming convention
 **Status:** ✅ Production Ready
\ No newline at end of file