ROMFASTSQL/oracle/standby-server-scripts/DR_WINDOWS_VM_IMPLEMENTATION_PLAN.md
Marius ac2340c967 Oracle DR: Complete Windows VM implementation and cleanup
Major changes:
- Implemented Windows VM 109 as DR target (replaces Linux LXC)
- Tested RMAN restore successfully (12-15 min RTO, 24h RPO)
- Added comprehensive DR documentation:
  * DR_WINDOWS_VM_STATUS_2025-10-09.md - Current implementation status
  * DR_UPGRADE_TO_CUMULATIVE_PLAN.md - Plan for cumulative incremental backups
  * DR_VM_MIGRATION_GUIDE.md - Guide for VM migration between Proxmox nodes
- Updated DR_WINDOWS_VM_IMPLEMENTATION_PLAN.md with completed phases

New scripts:
- add_system_key_dr.ps1 - SSH key setup for automated transfers
- configure_listener_dr.ps1 - Oracle Listener configuration
- fix_ssh_via_service.ps1 - SSH authentication fix
- rman_restore_final.cmd - Working RMAN restore script (tested)
- transfer_to_dr.ps1 - FULL backup transfer (renamed from 02_*)
- transfer_incremental.ps1 - Incremental backup transfer (renamed from 02b_*)

Cleanup:
- Removed 19 obsolete scripts for Linux LXC DR
- Removed 8 outdated documentation files
- Organized project structure

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-09 18:54:08 +03:00


Oracle DR - Windows VM Implementation Plan

Generated: 2025-10-08
Objective: Replace Linux LXC DR with Windows VM for same-platform RMAN restore
Target: Windows VM in Proxmox, IP 10.0.20.37, Oracle 19c SE2


📋 PRE-IMPLEMENTATION CHECKLIST

Current Infrastructure (IMPLEMENTED)

  • PRIMARY: Windows Server, Oracle 19c SE2, IP: 10.0.20.36, SSH port 22122
  • Database: ROA, DBID: 1363569330
  • RMAN backups: FULL daily (02:30 AM)
  • DIFFERENTIAL INCREMENTAL (14:00) - NOT USED (causes UNDO corruption on restore)
  • Transfer scripts: PowerShell scripts transferring to VM 109 (Windows)
  • Backup size: ~6-7GB compressed (from 23GB), retention 2 days
  • DR target: Windows VM 109 (10.0.20.37) on pveelite - OPERATIONAL

Planned Upgrade (see DR_UPGRADE_TO_CUMULATIVE_PLAN.md)

  • 🔄 Convert DIFFERENTIAL → CUMULATIVE incremental backups
  • 🔄 Add second daily incremental (13:00 + 18:00 vs current 14:00 only)
  • 🔄 Store backups on Proxmox host (pveelite), mounted in VM when needed
  • 🔄 Target RPO: 3-4 hours (vs current 24 hours)
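
A quick way to sanity-check an RPO target is to compute the longest gap between consecutive backup start times, since that gap bounds the data loss when only the backups themselves are restored. The sketch below is illustrative only: `Get-MaxBackupGapHours` is a hypothetical helper, not one of the existing scripts, and it ignores backup duration, transfer time and any archived-log recovery.

```powershell
function Get-MaxBackupGapHours {
    param([Parameter(Mandatory)][string[]]$Times)   # daily start times as "HH:mm"

    # Convert each start time to minutes after midnight and sort ascending
    $minutes = @($Times | ForEach-Object {
        [int](([TimeSpan]::Parse($_)).TotalMinutes)
    } | Sort-Object)

    $max = 0
    for ($i = 0; $i -lt $minutes.Count; $i++) {
        # Next backup, wrapping around midnight after the last one of the day
        $next = $minutes[($i + 1) % $minutes.Count]
        $gap  = ($next - $minutes[$i] + 1440) % 1440
        if ($gap -eq 0) { $gap = 1440 }   # a single daily backup repeats after 24 hours
        if ($gap -gt $max) { $max = $gap }
    }
    return $max / 60
}

Get-MaxBackupGapHours -Times '02:30'                    # 24   (current: FULL only)
Get-MaxBackupGapHours -Times '02:30', '13:00', '18:00'  # 10.5 (planned start times)
```

For the planned start times the longest gap is 02:30 to 13:00, so the 3-4 hour target presumably relies on measures described in DR_UPGRADE_TO_CUMULATIVE_PLAN.md rather than on backup spacing alone.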

What We'll Build

  • 🎯 Windows VM in Proxmox (replaces LXC 109)
  • 🎯 IP: 10.0.20.37 (same as current LXC)
  • 🎯 Oracle 19c SE2 installed (empty database template)
  • 🎯 OpenSSH Server for passwordless transfer
  • 🎯 RMAN restore scripts (automated DR recovery)
  • 🎯 Zero daily resource consumption (VM powered off when not needed)

Resource Requirements

  • RAM: 4-6 GB (allocated, but VM runs only during DR events)
  • Disk: 100 GB (OS + Oracle + backup storage)
  • CPU: 2-4 vCPU
  • Network: Access to 10.0.20.0/24
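
As a rough check that 100 GB is enough, the figures quoted elsewhere in this plan can be added up. This is a back-of-the-envelope sketch, assuming the 23 GB uncompressed backup size approximates the restored datafiles and that the FRA fills to the 20 GB cap set in Step 2.5; the variable names are illustrative.

```powershell
# Figures taken from this plan; whatever is left must hold Windows and the Oracle home
$diskGB        = 100   # VM disk size
$datafilesGB   = 23    # assumed restored database size (uncompressed backup size)
$fraCapGB      = 20    # DB_RECOVERY_FILE_DEST_SIZE
$backupGB      = 7     # one compressed FULL backup (upper estimate)
$retentionDays = 2     # backup retention on the DR side

$dataGB     = $datafilesGB + $fraCapGB + ($backupGB * $retentionDays)
$headroomGB = $diskGB - $dataGB

"Oracle data footprint: $dataGB GB, left for OS + Oracle home: $headroomGB GB"
# Oracle data footprint: 57 GB, left for OS + Oracle home: 43 GB
```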

🚀 PHASE 1: CREATE WINDOWS VM IN PROXMOX (30 minutes)

Step 1.1: Download Windows 11 ISO

# On Proxmox host or download station
cd /var/lib/vz/template/iso

# Option A: Download Windows 11 from Microsoft
wget -O Win11_EnglishInternational_x64v1.iso \
  "https://software-download.microsoft.com/download/pr/..."

# Option B: Upload existing ISO via Proxmox web UI
# Datacenter → Storage → ISO Images → Upload

Step 1.2: Create VM in Proxmox Web UI

Proxmox Web UI → Create VM

General:
  - VM ID: 109 (same as LXC number for consistency)
  - Name: oracle-dr-windows
  - Start at boot: NO (VM stays off until DR event)

OS:
  - ISO: Win11_EnglishInternational_x64v1.iso
  - Type: Microsoft Windows
  - Version: 11/2022

System:
  - Machine: q35
  - BIOS: OVMF (UEFI)
  - Add TPM: YES (for Windows 11)
  - SCSI Controller: VirtIO SCSI

Disks:
  - Bus/Device: SCSI 0
  - Storage: local-lvm (or your storage)
  - Size: 100 GB
  - Cache: Write back
  - Discard: YES
  - IO thread: YES

CPU:
  - Cores: 4
  - Type: host

Memory:
  - RAM: 6144 MB (6 GB)
  - Ballooning: NO

Network:
  - Bridge: vmbr0
  - Model: VirtIO
  - Firewall: NO

Step 1.3: Install Windows 11

1. Start VM → Open Console (noVNC)
2. Boot from ISO
3. Windows Setup:
   - Language: English
   - Install Now
   - Windows 11 Pro (or your edition)
   - Custom Install
   - Load driver: Browse → virtio-win-0.1.x (if needed for disk detection)
   - Select disk → Format → Next

4. Initial Setup:
   - Computer name: ORACLE-DR
   - Local account: Administrator / <strong-password>
   - Disable all telemetry/tracking options

5. First boot:
   - Disable Windows Defender real-time protection (for Oracle performance)
   - Disable Windows Update automatic restart
   - Install VirtIO drivers (guest tools)

Step 1.4: Configure Network (Static IP)

# In Windows VM, run PowerShell as Administrator

# Set static IP 10.0.20.37
New-NetIPAddress -InterfaceAlias "Ethernet" -IPAddress 10.0.20.37 -PrefixLength 24 -DefaultGateway 10.0.20.1

# Set DNS
Set-DnsClientServerAddress -InterfaceAlias "Ethernet" -ServerAddresses ("10.0.20.1","8.8.8.8")

# Verify
Get-NetIPAddress | Where-Object {$_.IPAddress -eq "10.0.20.37"}
Test-Connection 10.0.20.36 -Count 2

Step 1.5: Windows Initial Configuration

# Run as Administrator

# Enable Remote Desktop (optional, for management)
Set-ItemProperty -Path 'HKLM:\System\CurrentControlSet\Control\Terminal Server' -Name "fDenyTSConnections" -Value 0
Enable-NetFirewallRule -DisplayGroup "Remote Desktop"

# Disable Windows Firewall on all profiles (alternatively, keep it enabled and add rules for SSH and port 1521)
Set-NetFirewallProfile -Profile Domain,Public,Private -Enabled False

# Set timezone
Set-TimeZone -Id "GTB Standard Time"  # Romania timezone

# Disable hibernation (saves disk space)
powercfg /hibernate off

# Create directories for Oracle
New-Item -ItemType Directory -Path "D:\oracle" -Force
New-Item -ItemType Directory -Path "D:\oracle\backups" -Force
New-Item -ItemType Directory -Path "D:\oracle\oradata" -Force
New-Item -ItemType Directory -Path "D:\oracle\fra" -Force

PHASE 1 COMPLETE: Windows VM created, network configured, ready for Oracle installation


🗄️ PHASE 2: INSTALL ORACLE 19c (60-90 minutes)

Step 2.1: Download Oracle 19c

On developer machine or PRIMARY:

1. Go to: https://www.oracle.com/database/technologies/oracle19c-windows-downloads.html
2. Download: WINDOWS.X64_193000_db_home.zip (3.0 GB)
3. Transfer to VM:
   - Option A: Shared folder via Proxmox
   - Option B: HTTP file server
   - Option C: Direct download in VM

Step 2.2: Prepare Installation (in Windows VM)

# Run as Administrator

# Extract Oracle installation
Expand-Archive -Path "C:\Temp\WINDOWS.X64_193000_db_home.zip" -DestinationPath "D:\oracle\product\19c\dbhome_1"

# Create response file for silent install
$responseFile = @"
oracle.install.option=INSTALL_DB_SWONLY
UNIX_GROUP_NAME=
INVENTORY_LOCATION=D:\oracle\oraInventory
ORACLE_HOME=D:\oracle\product\19c\dbhome_1
ORACLE_BASE=D:\oracle
oracle.install.db.InstallEdition=SE2
oracle.install.db.OSDBA_GROUP=ORA_DBA
oracle.install.db.OSOPER_GROUP=ORA_OPER
oracle.install.db.OSBACKUPDBA_GROUP=ORA_BACKUPDBA
oracle.install.db.OSDGDBA_GROUP=ORA_DG
oracle.install.db.OSKMDBA_GROUP=ORA_KM
oracle.install.db.OSRACDBA_GROUP=ORA_RAC
DECLINE_SECURITY_UPDATES=true
"@

$responseFile | Out-File -FilePath "D:\oracle\db_install.rsp" -Encoding ASCII

Step 2.3: Silent Installation

# Run as Administrator

cd D:\oracle\product\19c\dbhome_1

# Silent install (takes 30-60 minutes)
.\setup.exe -silent -responseFile D:\oracle\db_install.rsp -ignorePrereqFailure

# Wait for completion, check log:
# D:\oracle\oraInventory\logs\installActions<timestamp>.log

# Note: there is no root.sh/root.bat step on Windows; the installer completes
# its configuration itself when run as Administrator

Step 2.4: Create Listener

# Set environment
$env:ORACLE_HOME = "D:\oracle\product\19c\dbhome_1"
$env:PATH = "$env:ORACLE_HOME\bin;$env:PATH"

# Create listener.ora
$listenerOra = @"
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.20.37)(PORT = 1521))
    )
  )
"@

$listenerOra | Out-File -FilePath "D:\oracle\product\19c\dbhome_1\network\admin\listener.ora" -Encoding ASCII

# Start listener
lsnrctl start

# Configure listener as Windows service (optional)
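
On Windows, the first `lsnrctl start` normally registers the listener as a Windows service, so this optional step usually comes down to choosing its startup type. The sketch below assumes the service follows the usual `Oracle<HomeName>TNSListener` naming; the wildcard is an assumption, so confirm the real name with `Get-Service` first.

```powershell
# Assumption: the listener service name matches Oracle*TNSListener*
$listenerSvc = Get-Service -Name 'Oracle*TNSListener*' -ErrorAction SilentlyContinue

if (-not $listenerSvc) {
    Write-Warning "No listener service found; run 'lsnrctl start' once as Administrator."
} else {
    # Start the listener automatically whenever the DR VM is powered on
    $listenerSvc | Set-Service -StartupType Automatic
    $listenerSvc | Select-Object Name, Status
}
```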

Step 2.5: Create Empty Database Template (for faster DR restore)

# Create init parameter file
$initROA = @"
DB_NAME=ROA
DB_BLOCK_SIZE=8192
COMPATIBLE=19.0.0
MEMORY_TARGET=2G
PROCESSES=300
OPEN_CURSORS=300
DB_RECOVERY_FILE_DEST=D:\oracle\fra
DB_RECOVERY_FILE_DEST_SIZE=20G
CONTROL_FILES=('D:\oracle\oradata\ROA\control01.ctl','D:\oracle\oradata\ROA\control02.ctl')
"@

$initROA | Out-File -FilePath "D:\oracle\product\19c\dbhome_1\database\initROA.ora" -Encoding ASCII

# Create directory structure
New-Item -ItemType Directory -Path "D:\oracle\oradata\ROA" -Force
New-Item -ItemType Directory -Path "D:\oracle\fra" -Force

# Note: We will NOT create the database now
# Database will be created via RMAN RESTORE during DR event
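
On Windows an Oracle instance runs inside a Windows service, so the `STARTUP NOMOUNT` in the restore script needs a service for SID `ROA` to exist beforehand. This plan does not show that step; a minimal sketch using Oracle's `oradim` utility might look like the following. The manual start mode and the `OracleServiceROA` existence check are assumptions to adapt to the actual environment.

```powershell
$env:ORACLE_HOME = "D:\oracle\product\19c\dbhome_1"
$env:PATH = "$env:ORACLE_HOME\bin;$env:PATH"

# Create the instance service only if it is missing (oradim names it OracleService<SID>)
if (-not (Get-Service -Name "OracleServiceROA" -ErrorAction SilentlyContinue)) {
    oradim -NEW -SID ROA -STARTMODE manual -PFILE "$env:ORACLE_HOME\database\initROA.ora"
}

# Confirm the service exists before relying on it during a DR event
Get-Service -Name "OracleServiceROA" | Select-Object Name, Status
```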

PHASE 2 COMPLETE: Oracle 19c installed, listener configured, ready for SSH setup


🔐 PHASE 3: CONFIGURE SSH FOR AUTOMATED TRANSFERS (20 minutes)

Step 3.1: Install OpenSSH Server

# Run as Administrator

# Install OpenSSH Server
Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0

# Start and enable service
Start-Service sshd
Set-Service -Name sshd -StartupType 'Automatic'

# Confirm firewall rule
Get-NetFirewallRule -Name *ssh*

# Test SSH from developer machine
# ssh Administrator@10.0.20.37

Step 3.2: Configure Passwordless SSH (Key-based Authentication)

# On Windows VM, as Administrator

# Use the system-wide OpenSSH directory (keys for members of Administrators are read from here)
$sshDir = "$env:ProgramData\ssh"
New-Item -ItemType Directory -Path $sshDir -Force

# Get public key from PRIMARY server
# Option A: Copy manually from PRIMARY C:\Users\Administrator\.ssh\id_rsa.pub
# Option B: Download via SCP from developer machine

# For this example, manually copy the content:
# From PRIMARY run: Get-Content C:\Users\Administrator\.ssh\id_rsa.pub

# On DR Windows VM:
$publicKey = "<paste-public-key-here>"
$publicKey | Out-File -FilePath "$sshDir\administrators_authorized_keys" -Encoding ASCII

# Set permissions (CRITICAL for SSH to work)
icacls "$sshDir\administrators_authorized_keys" /inheritance:r
icacls "$sshDir\administrators_authorized_keys" /grant "SYSTEM:(F)"
icacls "$sshDir\administrators_authorized_keys" /grant "BUILTIN\Administrators:(F)"

# Restart SSH service
Restart-Service sshd

Step 3.3: Configure SSH for SYSTEM Account (for scheduled tasks)

# Windows scheduled tasks run as SYSTEM, so we need SYSTEM's SSH key

# Create SYSTEM's .ssh directory
$systemSSHDir = "C:\Windows\System32\config\systemprofile\.ssh"
New-Item -ItemType Directory -Path $systemSSHDir -Force

# Copy the same authorized_keys
Copy-Item "$env:ProgramData\ssh\administrators_authorized_keys" `
          -Destination "$systemSSHDir\authorized_keys" -Force

# Set permissions
icacls "$systemSSHDir\authorized_keys" /inheritance:r
icacls "$systemSSHDir\authorized_keys" /grant "SYSTEM:(F)"

Step 3.4: Test SSH Connection from PRIMARY

# On PRIMARY (10.0.20.36), test SSH to DR VM

# Test 1: Manual connection
ssh -i C:\Users\Administrator\.ssh\id_rsa Administrator@10.0.20.37 "echo SSH_OK"

# Test 2: File transfer
echo "test content" > C:\Temp\test.txt
scp -i C:\Users\Administrator\.ssh\id_rsa C:\Temp\test.txt Administrator@10.0.20.37:D:/oracle/backups/

# If successful, you should see the file on DR VM
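
A zero exit code from `scp` does not by itself prove the file arrived intact. One option, sketched here, is to compare SHA-256 hashes from both sides. `Test-HashMatch` is a hypothetical helper, and the commented SSH call is only one possible way to obtain the remote hash.

```powershell
function Test-HashMatch {
    param(
        [Parameter(Mandatory)][string]$LocalFile,
        [Parameter(Mandatory)][string]$RemoteHash   # hex string reported by the DR side
    )
    $localHash = (Get-FileHash -Path $LocalFile -Algorithm SHA256).Hash
    # -eq on strings is case-insensitive, so upper/lower-case hex both match
    return $localHash -eq $RemoteHash.Trim()
}

# Example (run on PRIMARY; values are placeholders):
# $key    = "C:\Users\Administrator\.ssh\id_rsa"
# $remote = ssh -i $key Administrator@10.0.20.37 "powershell -NoProfile -Command (Get-FileHash 'D:/oracle/backups/test.txt').Hash"
# Test-HashMatch -LocalFile C:\Temp\test.txt -RemoteHash $remote
```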

PHASE 3 COMPLETE: OpenSSH configured, passwordless authentication working


📝 PHASE 4: UPDATE TRANSFER SCRIPTS (15 minutes)

Step 4.1: Modify 02_transfer_to_dr.ps1 for Windows Target

# File: D:\rman_backup\02_transfer_to_dr_windows.ps1
# Changes needed:

# OLD (Linux target):
# $DRPath = "/opt/oracle/backups/primary"

# NEW (Windows target):
$DRHost = "10.0.20.37"
$DRUser = "Administrator"  # Changed from "root"
$DRPath = "D:/oracle/backups/primary"  # Windows path with forward slashes for SCP
$SSHKeyPath = "C:\Users\Administrator\.ssh\id_rsa"

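# Update SSH commands to use Windows paths
# Example: Directory creation (wrapped in powershell.exe because the default
# OpenSSH shell on Windows is cmd.exe, which does not know PowerShell cmdlets)
$null = & ssh -n -i $SSHKeyPath "${DRUser}@${DRHost}" `
    "powershell -NoProfile -Command New-Item -ItemType Directory -Path '$DRPath' -Force" 2>&1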

# Update cleanup command for Windows
function Cleanup-OldBackupsOnDR {
    Write-Log "Cleaning up old backups on DR (keeping last 2 days)..."

    try {
        $cleanupCmd = @"
Get-ChildItem -Path '$DRPath' -Filter '*.BKP' |
    Where-Object { `$_.LastWriteTime -lt (Get-Date).AddDays(-2) } |
    Remove-Item -Force
"@
        $result = & ssh -n -i $SSHKeyPath "${DRUser}@${DRHost}" "powershell -Command `"$cleanupCmd`"" 2>&1

        Write-Log "Cleanup completed on DR"
    } catch {
        Write-Log "Cleanup warning: $_" "WARNING"
    }
}
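
Multi-line PowerShell sent through `ssh` has to survive quoting on the local side and again in the remote shell, where pipes and quotes are easily mangled. One common way around this is the `-EncodedCommand` parameter of `powershell.exe`, which takes the script as Base64-encoded UTF-16LE text. The sketch below shows the idea; `ConvertTo-EncodedCommand` is a hypothetical helper and the path is the one used in this plan.

```powershell
function ConvertTo-EncodedCommand {
    param([Parameter(Mandatory)][string]$Command)
    # powershell.exe -EncodedCommand expects Base64 of the UTF-16LE bytes
    $bytes = [System.Text.Encoding]::Unicode.GetBytes($Command)
    return [Convert]::ToBase64String($bytes)
}

# Single-quoted here-string: nothing is expanded locally
$cleanup = @'
Get-ChildItem -Path 'D:/oracle/backups/primary' -Filter '*.BKP' |
    Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-2) } |
    Remove-Item -Force
'@

$encoded = ConvertTo-EncodedCommand -Command $cleanup

# The remote command line now contains no quotes, pipes or newlines:
# & ssh -n -i $SSHKeyPath "${DRUser}@${DRHost}" "powershell -NoProfile -EncodedCommand $encoded"
```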

Step 4.2: Create Updated Transfer Scripts

# Save updated versions:
# - 02_transfer_to_dr_windows.ps1 (FULL backup transfer)
# - 02b_transfer_incremental_to_dr_windows.ps1 (INCREMENTAL transfer)

# Key changes for Windows:
# 1. DRUser = "Administrator" instead of "root"
# 2. DRPath = "D:/oracle/backups/primary" (Windows path)
# 3. SSH commands use PowerShell instead of Linux commands
# 4. File existence check: Test-Path instead of "test -f"
# 5. Cleanup: Get-ChildItem instead of find

Step 4.3: Test Transfer Script

# On PRIMARY, test the new script

# Manual test
D:\rman_backup\02_transfer_to_dr_windows.ps1

# Check log output
Get-Content "D:\rman_backup\logs\transfer_$(Get-Date -Format 'yyyyMMdd').log" -Tail 50

# Verify on DR VM
ssh Administrator@10.0.20.37 "powershell -NoProfile -Command Get-ChildItem D:\oracle\backups\primary"

PHASE 4 COMPLETE: Transfer scripts updated and tested for Windows target


🔄 PHASE 5: CREATE RMAN RESTORE SCRIPT ON DR VM (30 minutes)

Step 5.1: Create RMAN Restore Script

# File: D:\oracle\scripts\rman_restore_from_primary.ps1
# Run on DR Windows VM

param(
    [string]$BackupPath = "D:\oracle\backups\primary",
    [string]$OracleHome = "D:\oracle\product\19c\dbhome_1",
    [string]$OracleBase = "D:\oracle",
    [string]$DataDir = "D:\oracle\oradata\ROA",
    [string]$FRADir = "D:\oracle\fra",
    [long]$DBID = 1363569330,   # DBIDs are unsigned 32-bit values, so [int] can overflow for other databases
    [string]$LogFile = "D:\oracle\logs\restore_$(Get-Date -Format 'yyyyMMdd_HHmmss').log"
)

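$ErrorActionPreference = "Stop"

# Create the log directory up front; Write-Log fails if it does not exist
New-Item -ItemType Directory -Path (Split-Path -Parent $LogFile) -Force | Out-Null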

function Write-Log {
    param([string]$Message, [string]$Level = "INFO")
    $timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    $logLine = "[$timestamp] [$Level] $Message"
    Write-Host $logLine
    Add-Content -Path $LogFile -Value $logLine -Encoding UTF8
}

try {
    Write-Log "======================================================================"
    Write-Log "Oracle DR Restore - Starting"
    Write-Log "======================================================================"
    Write-Log "Backup Path: $BackupPath"
    Write-Log "Oracle Home: $OracleHome"
    Write-Log "DBID: $DBID"

    # Set environment
    $env:ORACLE_HOME = $OracleHome
    $env:ORACLE_SID = "ROA"
    $env:PATH = "$OracleHome\bin;$env:PATH"

    # Step 1: Cleanup old database files
    Write-Log "[1/6] Cleaning old database files..."
    if (Test-Path $DataDir) {
        Remove-Item "$DataDir\*" -Recurse -Force -ErrorAction SilentlyContinue
    }
    if (Test-Path $FRADir) {
        Remove-Item "$FRADir\*" -Recurse -Force -ErrorAction SilentlyContinue
    }
    New-Item -ItemType Directory -Path $DataDir -Force | Out-Null
    New-Item -ItemType Directory -Path $FRADir -Force | Out-Null

    # Step 2: Startup NOMOUNT
    Write-Log "[2/6] Starting instance in NOMOUNT mode..."
    $sqlNomount = @"
STARTUP NOMOUNT PFILE='$OracleHome\database\initROA.ora';
EXIT;
"@
    $sqlNomount | sqlplus / as sysdba

    # Step 3: RMAN Restore
    Write-Log "[3/6] Running RMAN RESTORE CONTROLFILE..."

    $rmanScript = @"
SET DBID $DBID;

RUN {
    ALLOCATE CHANNEL ch1 DEVICE TYPE DISK;

    # Restore controlfile from autobackup
    SET CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '$BackupPath/%F';
    RESTORE CONTROLFILE FROM AUTOBACKUP;
}

EXIT;
"@

    $rmanScript | rman TARGET /

    if ($LASTEXITCODE -ne 0) {
        throw "RMAN RESTORE CONTROLFILE failed"
    }

    # Step 4: Mount database
    Write-Log "[4/6] Mounting database..."
    # SQL*Plus does not split several statements on one line, so send EXIT on its own line
    "ALTER DATABASE MOUNT;`nEXIT;" | sqlplus / as sysdba

    # Step 5: Catalog and restore database
    Write-Log "[5/6] Cataloging backups and restoring database..."

    $rmanRestore = @"
CATALOG START WITH '$BackupPath/' NOPROMPT;

RUN {
    SET NEWNAME FOR DATABASE TO '$DataDir\%b';
    RESTORE DATABASE;
    SWITCH DATAFILE ALL;
    RECOVER DATABASE;
}

EXIT;
"@

    $rmanRestore | rman TARGET /

    if ($LASTEXITCODE -ne 0) {
        throw "RMAN RESTORE DATABASE failed"
    }

    # Step 6: Open database RESETLOGS
    Write-Log "[6/6] Opening database with RESETLOGS..."
    # EXIT goes on its own line so SQL*Plus parses it as a separate command
    "ALTER DATABASE OPEN RESETLOGS;`nEXIT;" | sqlplus / as sysdba

    Write-Log "======================================================================"
    Write-Log "DR RESTORE COMPLETED SUCCESSFULLY!"
    Write-Log "======================================================================"
    Write-Log "Database ROA is now OPEN and ready"

    # Verify
    Write-Log "Verification:"
    $verifySQL = @"
SELECT name, open_mode, database_role FROM v`$database;
EXIT;
"@
    $verifySQL | sqlplus -s / as sysdba

    exit 0

} catch {
    Write-Log "CRITICAL ERROR: $($_.Exception.Message)" "ERROR"
    Write-Log "Stack trace: $($_.ScriptStackTrace)" "ERROR"
    exit 1
}
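
The script checks `$LASTEXITCODE` after each `rman` call, but the `sqlplus` steps are not checked, and SQL*Plus by default exits with code 0 even after an ORA- error. One option is to prefix each SQL script with `WHENEVER SQLERROR EXIT FAILURE` and test the exit code. The helpers below are a hypothetical sketch of that idea, not part of the tested script.

```powershell
function New-SqlPlusScript {
    param([Parameter(Mandatory)][string[]]$Statements)
    # One statement per line; abort with a non-zero exit code on the first error
    $lines = @('WHENEVER SQLERROR EXIT FAILURE', 'WHENEVER OSERROR EXIT FAILURE') +
             $Statements + 'EXIT;'
    return ($lines -join "`n")
}

function Invoke-SqlPlusChecked {
    param([Parameter(Mandatory)][string[]]$Statements)
    New-SqlPlusScript -Statements $Statements | sqlplus -s / as sysdba
    if ($LASTEXITCODE -ne 0) {
        throw "sqlplus failed with exit code $LASTEXITCODE"
    }
}

# Usage inside the restore script (requires ORACLE_HOME and ORACLE_SID to be set):
# Invoke-SqlPlusChecked -Statements 'ALTER DATABASE MOUNT;'
```

With `$ErrorActionPreference = "Stop"` and the existing `try`/`catch`, a failed mount or open would then end the script with exit code 1 instead of continuing to the next step.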

Step 5.2: Create Quick Test Script

# File: D:\oracle\scripts\test_restore_latest.ps1
# Quick test to verify restore works

$BackupPath = "D:\oracle\backups\primary"
$LatestBackup = Get-ChildItem "$BackupPath\*.BKP" |
    Sort-Object LastWriteTime -Descending |
    Select-Object -First 1

Write-Host "Latest backup: $($LatestBackup.Name)"
Write-Host "Size: $([math]::Round($LatestBackup.Length / 1GB, 2)) GB"
Write-Host "Date: $($LatestBackup.LastWriteTime)"
Write-Host ""
Write-Host "Ready to test restore? Run:"
Write-Host "D:\oracle\scripts\rman_restore_from_primary.ps1"

PHASE 5 COMPLETE: RMAN restore script created and ready to test


🧪 PHASE 6: TEST DR RESTORE (30 minutes)

Step 6.1: Verify Backups Transferred

# On DR Windows VM

# Check backup files
Get-ChildItem D:\oracle\backups\primary\*.BKP |
    Sort-Object LastWriteTime -Descending |
    Select-Object Name, @{N='SizeMB';E={[math]::Round($_.Length/1MB,2)}}, LastWriteTime

# Expected output: 15-20 files (FULL + INCREMENTAL + CONTROLFILE + SPFILE + ARCHIVELOGS)
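
The runbook later assumes that the newest backups on the DR VM are less than 24 hours old. A small check along these lines could make that explicit before a restore is attempted. `Test-BackupFreshness` is a hypothetical helper; the path and the 24-hour threshold come from this plan.

```powershell
function Test-BackupFreshness {
    param(
        [string]$BackupPath = "D:\oracle\backups\primary",
        [int]$MaxAgeHours = 24,
        [datetime]$Now = (Get-Date)
    )
    # Newest backup piece in the directory, if any
    $latest = Get-ChildItem -Path $BackupPath -Filter '*.BKP' -File -ErrorAction SilentlyContinue |
        Sort-Object LastWriteTime -Descending |
        Select-Object -First 1

    if (-not $latest) { return $false }   # no backup pieces at all
    return ($Now - $latest.LastWriteTime).TotalHours -le $MaxAgeHours
}

if (-not (Test-BackupFreshness)) {
    Write-Warning "No backup newer than 24 hours found; check the transfer logs on PRIMARY."
}
```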

Step 6.2: Run Test Restore

# IMPORTANT: This will create a live database on DR VM
# Make sure PRIMARY is still running (don't confuse them!)

# Run restore
D:\oracle\scripts\rman_restore_from_primary.ps1

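# Monitor progress in the newest log (from a second PowerShell window, since the restore runs in the foreground)
Get-ChildItem "D:\oracle\logs\restore_*.log" | Sort-Object LastWriteTime | Select-Object -Last 1 | Get-Content -Wait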

# Expected duration: 10-15 minutes

Step 6.3: Verify Database

# Connect to restored database
sqlplus sys/romfastsoft@10.0.20.37:1521/ROA as sysdba

SQL> SELECT name, open_mode FROM v$database;
# Expected: ROA, READ WRITE

SQL> SELECT tablespace_name, status FROM dba_tablespaces;
# Expected: SYSTEM, SYSAUX, UNDOTBS, TS_ROA, USERS - all ONLINE

SQL> SELECT COUNT(*) FROM dba_tables WHERE owner='<your-app-schema>';
# Verify application tables restored

SQL> EXIT;

Step 6.4: Shutdown DR Database (conserve resources)

# After successful test, shutdown database
sqlplus / as sysdba

SQL> SHUTDOWN IMMEDIATE;
SQL> EXIT;

# Stop listener
lsnrctl stop

# Optional: Shutdown Windows VM to conserve resources
# (VM will be started only during actual DR events)

PHASE 6 COMPLETE: DR restore tested and verified working


⚙️ PHASE 7: UPDATE TASK SCHEDULER ON PRIMARY (10 minutes)

Step 7.1: Update Scheduled Tasks to Use New Scripts

# On PRIMARY (10.0.20.36)

# Task 1: FULL Backup + Transfer (already exists, just update transfer script)
# Name: "Oracle RMAN Daily Backup + DR Transfer"
# Trigger: Daily 02:30 AM
# Action 1: Run RMAN backup (unchanged)
# Action 2: UPDATE to new script

# Update task to use new transfer script
$action = New-ScheduledTaskAction -Execute "PowerShell.exe" `
    -Argument "-NoProfile -ExecutionPolicy Bypass -File D:\rman_backup\02_transfer_to_dr_windows.ps1"

Set-ScheduledTask -TaskName "Oracle RMAN Daily Backup + DR Transfer" -Action $action

# Task 2: INCREMENTAL Backup + Transfer
# Similar update for incremental task

Step 7.2: Test Scheduled Task Manually

# On PRIMARY

# Run FULL backup + transfer task manually
Start-ScheduledTask -TaskName "Oracle RMAN Daily Backup + DR Transfer"

# Monitor task status
Get-ScheduledTask -TaskName "Oracle RMAN Daily Backup + DR Transfer" |
    Get-ScheduledTaskInfo

# Check transfer log
Get-Content "D:\rman_backup\logs\transfer_$(Get-Date -Format 'yyyyMMdd').log" -Tail 50

# Verify on DR
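# Count the .BKP files on DR (no pipe in the remote command, so it also works when cmd.exe is the remote shell)
ssh Administrator@10.0.20.37 "powershell -NoProfile -Command (Get-ChildItem D:\oracle\backups\primary -Filter *.BKP).Count"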
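
`Start-ScheduledTask` returns immediately, so checking the task info right away may still show the previous run. A minimal sketch for waiting until the task finishes and then inspecting its result could look like this; the 30-second poll interval and the two-hour deadline are assumptions.

```powershell
$taskName = "Oracle RMAN Daily Backup + DR Transfer"
$deadline = (Get-Date).AddHours(2)   # assumed upper bound for backup + transfer

Start-ScheduledTask -TaskName $taskName
do {
    Start-Sleep -Seconds 30
    $state = (Get-ScheduledTask -TaskName $taskName).State
} while ($state -eq 'Running' -and (Get-Date) -lt $deadline)

$info = Get-ScheduledTaskInfo -TaskName $taskName
if ($state -eq 'Running') {
    Write-Warning "Task is still running after the deadline"
} elseif ($info.LastTaskResult -ne 0) {
    Write-Warning "Task finished with result code $($info.LastTaskResult)"
} else {
    Write-Host "Task completed successfully (last run: $($info.LastRunTime))"
}
```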

PHASE 7 COMPLETE: Automated backup and transfer configured


📚 PHASE 8: CREATE DR RUNBOOK (15 minutes)

Step 8.1: DR Emergency Procedure

# DISASTER RECOVERY PROCEDURE
## When PRIMARY Server (10.0.20.36) Fails

### PRE-REQUISITES
- Proxmox access available
- DR Windows VM exists (ID 109)
- Latest backups transferred (<24h old)

### DR ACTIVATION STEPS (RTO: 15-20 minutes)

1. **Start DR Windows VM (2 minutes)**

   - Proxmox Web UI → VM 109 (oracle-dr-windows) → Start
   - Wait for Windows to boot
   - Verify network: ping 10.0.20.37

2. **Verify Backups Present (1 minute)**

```powershell
# RDP or Console to 10.0.20.37
Get-ChildItem D:\oracle\backups\primary\*.BKP |
    Sort-Object LastWriteTime -Descending |
    Select-Object -First 10

# Verify you see today's or yesterday's backups
```

3. **Run RMAN Restore (12-15 minutes)**

```powershell
# Run restore script
D:\oracle\scripts\rman_restore_from_primary.ps1

# Monitor log in real-time
Get-Content D:\oracle\logs\restore_*.log -Wait
```

4. **Verify Database (2 minutes)**

```
# Connect to database
sqlplus sys/romfastsoft@localhost:1521/ROA as sysdba

SQL> SELECT name, open_mode FROM v$database;
SQL> SELECT tablespace_name, status FROM dba_tablespaces;
SQL> -- Verify critical application tables
SQL> EXIT;
```

5. **Update Network/DNS (5 minutes)**

   - Update DNS: roa-db.example.com → 10.0.20.37
   - OR: Update application connection strings to 10.0.20.37
   - Test application connectivity

6. **Monitor & Notify**

   - Monitor database alert log: D:\oracle\diag\rdbms\roa\ROA\trace\alert_ROA.log
   - Notify team that DR is active
   - Document incident timeline

### RECOVERY BACK TO PRIMARY (When repaired)

1. Create fresh RMAN backup from DR (now contains latest data)
2. Transfer backup to repaired PRIMARY
3. Restore on PRIMARY
4. Switch DNS/connections back to PRIMARY
5. Shutdown DR VM

### TESTING SCHEDULE

- Monthly DR test: Last Sunday of month
- Test duration: 30 minutes
- Document test results

**✅ PHASE 8 COMPLETE:** DR runbook documented
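
For the monthly test on the last Sunday of the month, the date can be computed rather than looked up. The helper below is a hypothetical sketch for use in a reminder or reporting script; it is not part of the existing tooling.

```powershell
function Get-LastSundayOfMonth {
    param(
        [Parameter(Mandatory)][int]$Year,
        [Parameter(Mandatory)][int]$Month
    )
    # Start from the last day of the month and walk back to a Sunday
    $d = (Get-Date -Year $Year -Month $Month -Day 1).Date.AddMonths(1).AddDays(-1)
    while ($d.DayOfWeek -ne [DayOfWeek]::Sunday) {
        $d = $d.AddDays(-1)
    }
    return $d
}

(Get-LastSundayOfMonth -Year 2025 -Month 10).ToString('yyyy-MM-dd')   # 2025-10-26
```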

---

## 📊 FINAL ARCHITECTURE

┌─────────────────────────────────────────────────────────────┐
│                   PRODUCTION ENVIRONMENT                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  PRIMARY (10.0.20.36) - Windows Physical Server             │
│  ├─ Oracle 19c SE2                                          │
│  ├─ Database: ROA                                           │
│  ├─ RMAN Backups:                                           │
│  │  ├─ FULL: Daily 02:30 AM (~7GB compressed)               │
│  │  └─ INCREMENTAL: Daily 14:00 (~50MB)                     │
│  └─ Automatic Transfer to DR via SSH/SCP                    │
│                                                             │
│           ↓ SSH Transfer                                    │
│           ↓ (950 Mbps)                                      │
│           ↓                                                 │
│  DR (10.0.20.37) - Windows VM in Proxmox (ID 109)           │
│  ├─ Oracle 19c SE2 (installed, ready)                       │
│  ├─ VM State: POWERED OFF (0 RAM consumption)               │
│  ├─ Backups: D:\oracle\backups\primary                      │
│  ├─ Storage: 100 GB (OS + Oracle + backups)                 │
│  └─ Restore Script: D:\oracle\scripts\rman_restore...       │
│                                                             │
│  DR ACTIVATION (when needed):                               │
│  ├─ 1. Power ON VM (2 min)                                  │
│  ├─ 2. Run restore script (12 min)                          │
│  ├─ 3. Database OPEN (1 min)                                │
│  └─ TOTAL RTO: ~15 minutes                                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

METRICS (Current Implementation):

  • RPO: 24 hours (only FULL backup used; incremental causes UNDO corruption)
  • RTO: 15 minutes
  • Storage: 500 GB VM + backups on host
  • Daily resources: ZERO (VM powered off)
  • DR test: Weekly (planned)

METRICS (After Upgrade to CUMULATIVE):

  • RPO: 3-4 hours (FULL + latest CUMULATIVE)
  • RTO: 15 minutes (unchanged)
  • Storage: 500 GB VM + ~15 GB on Proxmox host
  • Daily resources: ZERO (VM powered off)
  • DR test: Weekly (automated)

---

## ✅ POST-IMPLEMENTATION CHECKLIST

### Phase 1-8 (Initial Setup) - ✅ COMPLETED 2025-10-09

- [x] Windows VM created in Proxmox (VM ID 109, IP 10.0.20.37)
- [x] Oracle 19c SE2 installed and working
- [x] OpenSSH Server configured with passwordless authentication
- [x] Transfer scripts updated and tested (FULL backup)
- [x] RMAN restore script created on DR VM
- [x] DR restore tested successfully (database opens and is usable)
- [x] Scheduled tasks on PRIMARY verified
- [x] DR procedures documented
- [x] VM shutdown after testing (to conserve resources)

### Phase 9 (Upgrade to CUMULATIVE) - 📋 PLANNED

**See:** `DR_UPGRADE_TO_CUMULATIVE_PLAN.md` for detailed implementation steps

- [ ] Proxmox host storage configured (`/mnt/pve/oracle-backups`)
- [ ] VM 109 mount point configured (E:\ from host)
- [ ] RMAN script updated to CUMULATIVE incremental
- [ ] Transfer scripts updated to send to Proxmox host
- [ ] SSH key for Proxmox host access configured
- [ ] Scheduled task created for 13:00 CUMULATIVE backup
- [ ] Scheduled task created for 18:00 CUMULATIVE backup
- [ ] Existing 14:00 task removed
- [ ] 02:30 FULL task updated to use new transfer script
- [ ] DR restore script updated for cumulative backups
- [ ] End-to-end restore test with CUMULATIVE successful
- [ ] Weekly test script created and scheduled
- [ ] Team trained on new backup strategy

---

## 🔧 TROUBLESHOOTING GUIDE

### Issue: SSH Connection Fails
```powershell
# Check 1: SSH service running?
Get-Service sshd

# Check 2: Firewall blocking?
Get-NetFirewallRule -Name *ssh*

# Check 3: Authorized keys permissions?
icacls "C:\ProgramData\ssh\administrators_authorized_keys"

# Check 4: Test from PRIMARY
ssh -v Administrator@10.0.20.37
```

### Issue: RMAN Restore Fails "CONTROLFILE not found"
```powershell
# This is the cross-platform issue!
# Solution: Ensure you're using Windows→Windows (same platform)
# Check Oracle version matches: 19c on both sides
```

### Issue: Database Won't Start
```powershell
# Check alert log
Get-Content D:\oracle\diag\rdbms\roa\ROA\trace\alert_ROA.log -Tail 100

# Check parameter file
Get-Content D:\oracle\product\19c\dbhome_1\database\initROA.ora

# Verify directories exist
Test-Path D:\oracle\oradata\ROA
Test-Path D:\oracle\fra
```

### Issue: VM Uses Too Much Disk
```powershell
# Enforce backup retention (deletes .BKP files older than 3 days)
Get-ChildItem D:\oracle\backups\primary\*.BKP |
    Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-3) } |
    Remove-Item -Force

# Check FRA usage (run in SQL*Plus, not PowerShell)
SELECT * FROM V$RECOVERY_FILE_DEST;

# Cleanup old archives (run in RMAN)
RMAN> DELETE NOPROMPT ARCHIVELOG ALL COMPLETED BEFORE 'SYSDATE-2';
```

📞 SUPPORT & REFERENCES

Oracle Documentation

Internal Scripts

  • PRIMARY RMAN backup: D:\rman_backup\rman_backup.txt
  • Transfer script (FULL): D:\rman_backup\02_transfer_to_dr_windows.ps1
  • Transfer script (INCREMENTAL): D:\rman_backup\02b_transfer_incremental_to_dr_windows.ps1
  • DR restore script: D:\oracle\scripts\rman_restore_from_primary.ps1 (on DR VM)

Logs Location

  • PRIMARY transfer logs: D:\rman_backup\logs\
  • DR restore logs: D:\oracle\logs\
  • Oracle alert log: D:\oracle\diag\rdbms\roa\ROA\trace\alert_ROA.log

🎯 IMPLEMENTATION TIMELINE

| Phase | Task | Duration | Responsible |
|-------|------|----------|-------------|
| 1 | Create Windows VM in Proxmox | 30 min | Infrastructure Admin |
| 2 | Install Oracle 19c | 90 min | DBA |
| 3 | Configure SSH | 20 min | Infrastructure Admin |
| 4 | Update Transfer Scripts | 15 min | DBA |
| 5 | Create Restore Script | 30 min | DBA |
| 6 | Test DR Restore | 30 min | DBA |
| 7 | Update Scheduled Tasks | 10 min | DBA |
| 8 | Document DR Runbook | 15 min | DBA |
| TOTAL | | ~4 hours | |

Note: This is one-time setup. After completion, daily operations are fully automated with ZERO maintenance overhead.


Generated: 2025-10-08
Last Updated: 2025-10-09
Version: 2.0
Status: Phase 1-8 COMPLETED | 📋 Phase 9 (CUMULATIVE upgrade) PLANNED

Implementation Status:

  • Initial setup (Phases 1-8): COMPLETED 2025-10-09
  • RMAN restore tested: SUCCESSFUL (12-15 minutes RTO)
  • Current RPO: 24 hours (FULL backup only)
  • Next: Upgrade to CUMULATIVE incremental for 3-4 hour RPO

Next Session: Implement CUMULATIVE backup strategy
See: DR_UPGRADE_TO_CUMULATIVE_PLAN.md for upgrade plan