
Marius b44e3c8f9b Oracle DR: Complete cleanup and restore scripts with Proxmox integration

- Remove outdated planning documents and implementation guides
- Update README with comprehensive DR procedures and monitoring
- Enhance rman_restore_from_zero.cmd with SPFILE creation and auto-start
- Add Proxmox monitoring and weekly test scripts
- Archive old implementation documentation

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

2025-10-10 15:13:29 +03:00


Oracle DR Windows VM - Implementation Status

Date: 2025-10-09 04:00 AM
VM: 109 (oracle-dr-windows)
Location: Proxmox pveelite (10.0.20.202)
IP: 10.0.20.37
Purpose: Replace Linux LXC DR with Windows VM for same-platform RMAN restore

✅ COMPLETED TASKS

1. VM Creation and Network ✅

VM ID: 109 on pveelite (10.0.20.202)
Template source: Win11-Template (ID 300) from pvemini (10.0.20.201)
Cloned and migrated: Successfully migrated from pvemini to pveelite
Resources configured:
- RAM: 6GB
- CPU: 4 cores
- Disk: 500GB (local-zfs)
- Boot on startup: NO (VM stays off until DR event)
Network:
- Static IP: 10.0.20.37
- Gateway: 10.0.20.1
- DNS: 10.0.20.1, 8.8.8.8
- Windows Firewall: Disabled
- Connectivity: ✅ Verified (ping successful)

2. Windows Configuration ✅

Computer name: ORACLE-DR
Timezone: GTB Standard Time (Romania)
Hibernation: Disabled
Administrator profile: Fixed (C:\Users\Administrator)
Auto-login: Disabled

3. Users Created ✅

User	Password	Admin	Hidden from Login	Purpose
romfast	Romfast2025!	Yes	Yes	SSH access, backup transfers
silvia	Silvia2025!	No	Yes	SSH tunnels (2 ports)
eli	Eli2025!	No	Yes	SSH tunnels (4 ports)

4. OpenSSH Server Configuration ✅

Port: 22122
Service: Running, Automatic startup
Authentication: ✅ SSH Key Authentication WORKING
- User key: mmarius28@gmail.com (for manual SSH from Linux)
- SYSTEM key: administrator@ROA-CARAPETRU2 (for automated backup transfers from PRIMARY)

SSH Config: C:\ProgramData\ssh\sshd_config

Port 22122
ListenAddress 0.0.0.0
PubkeyAuthentication yes
PasswordAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
AllowTcpForwarding yes
GatewayPorts yes

Match User romfast
    PermitOpen localhost:80 localhost:1521 localhost:3000 localhost:3001 localhost:3389 localhost:8006 localhost:8080 localhost:81 localhost:9443 localhost:22

Match User silvia
    PermitOpen localhost:80 localhost:1521

Match User eli
    PermitOpen localhost:80 localhost:1521 localhost:3000

Match Group administrators
    AuthorizedKeysFile __PROGRAMDATA__/ssh/administrators_authorized_keys
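
The per-user PermitOpen rules above can be audited from a copy of sshd_config. A minimal sketch, assuming a POSIX shell with awk on a Linux workstation; the function name permit_open_for_user is illustrative and not part of the DR scripts, and the match is case-sensitive even though sshd itself accepts keywords in any case:

```shell
#!/bin/sh
# Print the PermitOpen targets configured for one user, one per line.
# Only "Match User <name>" blocks are considered; any other Match line
# (for example "Match Group ...") ends the current block.
permit_open_for_user() (
  user=$1
  config=$2
  awk -v user="$user" '
    $1 == "Match" { in_block = ($2 == "User" && $3 == user); next }
    in_block && $1 == "PermitOpen" { for (i = 2; i <= NF; i++) print $i }
  ' "$config"
)
```

For example, `permit_open_for_user silvia sshd_config` run against the configuration shown above would list localhost:80 and localhost:1521, which can then be compared with the port counts in the users table.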

SSH Keys Configured:

File: C:\ProgramData\ssh\administrators_authorized_keys
Contains 2 keys:
1. ssh-rsa ...mmarius28@gmail.com (your Linux workstation)
2. ssh-rsa ...administrator@ROA-CARAPETRU2 (PRIMARY SYSTEM user for automated transfers)
Permissions: SYSTEM (Full Control), Administrators (Read)
Status: ✅ Both keys working

Fix Script: D:\oracle\scripts\fix_ssh_via_service.ps1

Stops SSH service
Recreates authorized_keys with both keys
Sets correct permissions using icacls
Restarts SSH service

5. Oracle 19c Installation ✅

Status: ✅ Installed (interactive GUI installation)
ORACLE_HOME: C:\Users\Administrator\Downloads\WINDOWS.X64_193000_db_home
ORACLE_BASE: C:\Users\oracle
Edition: Standard Edition 2 (SE2)
Version: 19.3.0.0.0
Installation Type: Software Only (no database created yet)
Oracle User: oracle (password: Oracle2025!)

Verification:

$env:ORACLE_HOME = "C:\Users\Administrator\Downloads\WINDOWS.X64_193000_db_home"
$env:PATH = "$env:ORACLE_HOME\bin;$env:PATH"
sqlplus -v    # Returns: SQL*Plus: Release 19.0.0.0.0 - Production

6. Oracle Listener Configuration ✅

Script: D:\oracle\scripts\configure_listener_dr.ps1
Status: ✅ Configured and Running
Port: 1521
Service: OracleOraDB19Home1TNSListener

Configuration Files Created:

C:\Users\Administrator\Downloads\WINDOWS.X64_193000_db_home\network\admin\listener.ora
C:\Users\Administrator\Downloads\WINDOWS.X64_193000_db_home\network\admin\tnsnames.ora
C:\Users\Administrator\Downloads\WINDOWS.X64_193000_db_home\network\admin\sqlnet.ora

Listener Status:

LSNRCTL for 64-bit Windows: Version 19.0.0.0.0 - Production
STATUS of the LISTENER
Alias                     LISTENER
Version                   TNSLSNR for 64-bit Windows: Version 19.0.0.0.0 - Production
Start Date                09-OCT-2025 03:18:34
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.0.20.37)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\EXTPROC1521ipc)))
Services Summary...
Service "ROA" has 1 instance(s).
  Instance "ROA", status UNKNOWN, has 1 handler(s) for this service...

7. Directory Structure Created ✅

C:\Users\oracle\
├── oradata\ROA\               (will be created by RMAN restore)
├── recovery_area\ROA\         (FRA - Fast Recovery Area)
├── admin\ROA\
│   ├── adump\                 (audit files)
│   ├── dpdump\                (data pump)
│   └── pfile\                 (initialization files)
└── oraInventory\              (Oracle inventory)

D:\oracle\
├── backups\primary\           ✅ (6.32 GB backup files transferred)
├── scripts\                   ✅ (DR automation scripts)
└── logs\                      ✅ (restore logs)

8. Backup Transfer Scripts Updated ✅

Location on PRIMARY: D:\rman_backup\

Scripts Updated:

transfer_to_dr.ps1 - Transfer FULL backups
transfer_incremental.ps1 - Transfer INCREMENTAL backups

Changes Made:

✅ DRHost: 10.0.20.37
✅ DRPort: 22122 (added)
✅ DRUser: romfast (changed from root)
✅ DRPath: D:/oracle/backups/primary (changed from /opt/oracle/backups/primary)
✅ All SSH commands updated with -p 22122
✅ Linux commands replaced with Windows PowerShell equivalents:
- test -f → powershell -Command "Test-Path ..."
- mkdir -p → powershell -Command "New-Item -ItemType Directory ..."
- find ... -delete → powershell -Command "Get-ChildItem ... | Remove-Item ..."

Backup Files Transferred: ✅ 6 files, 6.32 GB total

D:\oracle\backups\primary\
├── O1_MF_NNND0_DAILY_FULL_COMPRESSE_NGFVB4B8_.BKP    (4.81 GB)  # FULL backup
├── O1_MF_ANNNN_DAILY_FULL_COMPRESSE_NGFV7RGN_.BKP    (1.51 GB)  # FULL backup
├── O1_MF_NCNNF_TAG20251009T020551_NGFVLJTG_.BKP      (1.14 MB)  # Control file
├── O1_MF_S_1214013953_NGFVLL29_.BKP                  (1.14 MB)  # SPFILE autobackup
├── O1_MF_NNSNF_TAG20251009T020550_NGFVLGOR_.BKP      (112 KB)
└── O1_MF_ANNNN_DAILY_FULL_COMPRESSE_NGFVLFKN_.BKP    (861 KB)

Transfer Log: D:\rman_backup\logs\transfer_20251009.log

[2025-10-09 03:52:13] [SUCCESS] SSH connection successful
[2025-10-09 03:52:14] [INFO] Found 6 files, total size: 6.32 GB
[2025-10-09 03:57:27] [INFO] Files transferred: 6/6

9. DR Scripts Created ✅

All scripts located in: /mnt/e/proiecte/ROMFASTSQL/oracle/standby-server-scripts/

Installation Scripts:

1. ✅ install_oracle19c_dr.ps1 - Oracle 19c installation (software only)
2. ✅ configure_listener_dr.ps1 - Oracle Listener configuration

SSH Configuration Scripts:

3. ✅ fix_ssh_key_auth.ps1 - Initial SSH key setup attempt
4. ✅ fix_ssh_key_auth_simple.cmd - Simple command-line version
5. ✅ fix_ssh_via_service.ps1 - WORKING - Fixes SSH keys by stopping service

Backup Transfer Scripts (on PRIMARY):

6. ✅ transfer_to_dr.ps1 - Full backup transfer (updated for Windows)
7. ✅ transfer_incremental.ps1 - Incremental backup transfer (updated for Windows)
8. ✅ transfer_to_dr_windows.ps1 - Reference implementation

Restore Script:

9. ✅ rman_restore_from_primary.ps1 - RMAN restore script (ready to test)

Helper Scripts:

10. ✅ copy_system_ssh_key.ps1 - Extract SYSTEM user SSH key from PRIMARY
11. ✅ add_system_key_dr.ps1 - Add SYSTEM key to DR VM

✅ RMAN RESTORE COMPLETED - 2025-10-09 17:40

10. RMAN Restore End-to-End Test ✅ COMPLETED

Final Status: ✅ DATABASE SUCCESSFULLY RESTORED AND OPEN

Database: ROA
Mode: READ WRITE
Instance: OPEN
Tablespaces: 6 (all ONLINE)
Datafiles: 5
Application Owners: 69
Total Application Tables: 45,000+

Session Duration: ~5 hours (including troubleshooting)
Actual Restore Time: ~15-20 minutes (datafiles + recovery)
Total Data Restored: 6.32 GB compressed → ~15 GB uncompressed

🔧 CRITICAL ISSUES ENCOUNTERED & RESOLUTIONS

Issue 1: Incremental Backup Corruption ⚠️ → ✅ RESOLVED

Problem: Applying DIFFERENTIAL incremental backup (MIDDAY_INCREMENTAL from 14:00) caused UNDO tablespace corruption

Error: ORA-30012: undo tablespace 'UNDOTBS01' does not exist or of wrong type
Error: ORA-00603: ORACLE server session terminated by fatal error
Database crashed immediately after OPEN RESETLOGS attempt

Root Cause: DIFFERENTIAL incremental backup applied on top of FULL backup created inconsistent UNDO state

Initial Workaround: Restore only FULL backup without applying incremental

Permanent Solution: ✅ Upgrade to CUMULATIVE incremental backups

Each CUMULATIVE backup depends only on the Level 0 backup, not on earlier Level 1 incrementals (no incremental chain)
Each CUMULATIVE contains ALL changes since last Level 0
Eliminates UNDO/SCN mismatch issues
See: DR_UPGRADE_TO_CUMULATIVE_PLAN.md for implementation plan

Issue 2: Control File SCN Mismatch 🔴

Problem: ORA-01190: control file or data file 1 is from before the last RESETLOGS

Control file autobackup (O1_MF_S_1214013953_NGFVLL29_.BKP) created AFTER datafiles backup
SCN in control file was higher than SCN in datafiles
Error: ORA-01152: file 1 was not restored from a sufficiently old backup

Root Cause: Used SPFILE/Controlfile AUTOBACKUP instead of control file from same backup piece as datafiles

Resolution:

Restore control file from SAME backup as datafiles: O1_MF_NCNNF_TAG20251009T020551_NGFVLJTG_.BKP
This control file has matching SCN with datafiles (both from 02:05:51 backup)

Issue 3: ORA-16433 Recovery Loop 🔄

Problem: ORA-16433: The database or pluggable database must be opened in read/write mode

Occurred during RECOVER DATABASE attempts
Error appeared in both SQL*Plus and RMAN
Recovery session canceled due to errors

Root Cause:

Bug 14744052: Flag set in control file during incomplete RESETLOGS
Using SET UNTIL SCN 999999999999 in RMAN caused invalid recovery state
Standard Edition limitations with recovery operations

Resolution:

Remove SET UNTIL SCN from RMAN script
Use SET UNTIL TIME with specific backup completion time
Let RMAN auto-detect and apply only available archive logs
Incomplete recovery flag properly set by stopping at missing archive log

Issue 4: Memory Configuration ⚠️

Problem: ORA-27104: system-defined limits for shared memory was misconfigured

Initial PFILE had memory_target=1536M
VM has 6GB RAM but Windows reserved ~2GB
Database startup failed in NOMOUNT

Resolution: Reduced memory settings in PFILE:

memory_target=1024M
memory_max_target=1024M

Issue 5: Backup Location Issues 📁

Initial Setup: Backups in D:\oracle\backups\primary (custom path)

RMAN couldn't auto-detect backups
Had to specify explicit paths for all operations
Control file autobackup search failed

Final Solution:

Moved all backups to FRA: C:\Users\oracle\recovery_area\ROA\autobackup
Updated PRIMARY transfer scripts to use FRA path
RMAN now auto-detects all backups via CATALOG command
Simplified restore procedure significantly

📋 WORKING RMAN RESTORE PROCEDURE

Prerequisites ✅ ALL COMPLETE

✅ Oracle 19c installed on DR VM
✅ Listener configured and running
✅ FULL backup transferred from PRIMARY to FRA location
✅ OracleServiceROA Windows service created
✅ Backups moved to: C:\Users\oracle\recovery_area\ROA\autobackup

Step-by-Step Manual Procedure (Tested and Verified)

1. Prepare PFILE (Modified for DR)

Location: C:\Users\oracle\admin\ROA\pfile\initROA.ora

db_name=ROA
memory_target=1024M
memory_max_target=1024M
processes=150
undo_management=MANUAL
compatible=19.0.0
control_files=('C:\Users\oracle\oradata\ROA\control01.ctl', 'C:\Users\oracle\recovery_area\ROA\control02.ctl')
db_block_size=8192
db_recovery_file_dest=C:\Users\Oracle\recovery_area
db_recovery_file_dest_size=10G
diagnostic_dest=C:\Users\oracle

2. Shutdown Database (if running)

set ORACLE_HOME=C:\Users\Administrator\Downloads\WINDOWS.X64_193000_db_home
set ORACLE_SID=ROA
set PATH=%ORACLE_HOME%\bin;%PATH%

sqlplus / as sysdba
SHUTDOWN ABORT;
EXIT;

3. Startup NOMOUNT

STARTUP NOMOUNT PFILE='C:\Users\oracle\admin\ROA\pfile\initROA.ora';
EXIT;

4. Connect to RMAN and Restore Control File

rman target /

SET DBID 1363569330;

RUN {
  ALLOCATE CHANNEL ch1 DEVICE TYPE DISK;
  RESTORE CONTROLFILE FROM 'C:/Users/oracle/recovery_area/ROA/autobackup/O1_MF_NCNNF_TAG20251009T020551_NGFVLJTG_.BKP';
  RELEASE CHANNEL ch1;
}

ALTER DATABASE MOUNT;

5. Catalog Backups in FRA

CATALOG START WITH 'C:/Users/oracle/recovery_area/ROA/autobackup' NOPROMPT;

6. Restore and Recover Database

RUN {
  ALLOCATE CHANNEL ch1 DEVICE TYPE DISK;
  ALLOCATE CHANNEL ch2 DEVICE TYPE DISK;
  SET UNTIL TIME "TO_DATE('09-OCT-2025 02:05:51','DD-MON-YYYY HH24:MI:SS')";
  RESTORE DATABASE;
  RECOVER DATABASE;
  RELEASE CHANNEL ch1;
  RELEASE CHANNEL ch2;
}

7. Open Database with RESETLOGS

ALTER DATABASE OPEN RESETLOGS;
EXIT;

8. Create TEMP Tablespace

sqlplus / as sysdba

ALTER TABLESPACE TEMP ADD TEMPFILE 'C:\Users\oracle\oradata\ROA\temp01.dbf'
  SIZE 567M REUSE AUTOEXTEND ON NEXT 640K MAXSIZE 32767M;

EXIT;

9. Verify Database Status

sqlplus / as sysdba

SELECT NAME, OPEN_MODE, LOG_MODE FROM V$DATABASE;
SELECT INSTANCE_NAME, STATUS FROM V$INSTANCE;
SELECT TABLESPACE_NAME, STATUS FROM DBA_TABLESPACES ORDER BY TABLESPACE_NAME;
SELECT COUNT(*) AS DATAFILE_COUNT FROM DBA_DATA_FILES;

SELECT OWNER, COUNT(*) AS TABLE_COUNT
FROM DBA_TABLES
WHERE OWNER NOT IN ('SYS','SYSTEM','OUTLN','MDSYS','CTXSYS','XDB','WMSYS','OLAPSYS',
                    'ORDDATA','ORDSYS','EXFSYS','LBACSYS','DBSNMP','APPQOSSYS','GSMADMIN_INTERNAL')
GROUP BY OWNER
ORDER BY OWNER;

EXIT;

Expected Results ✅ VERIFIED

Database Status:

NAME: ROA
OPEN_MODE: READ WRITE
LOG_MODE: ARCHIVELOG
INSTANCE_NAME: ROA
STATUS: OPEN

Tablespaces:

SYSAUX    ONLINE
SYSTEM    ONLINE
TEMP      ONLINE
TS_ROA    ONLINE
UNDOTBS01 ONLINE
USERS     ONLINE

Data Verification:

Datafiles: 5 (excluding TEMP)
Application Owners: 69
Application Tables: 45,000+

Performance Metrics:

NOMOUNT to MOUNT: ~30 seconds
Control file restore: ~10 seconds
Catalog backups: ~20 seconds
Database restore: ~8-10 minutes
Database recovery: ~2-3 minutes
OPEN RESETLOGS: ~1 minute
Total Time: ~12-15 minutes
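
The total can be cross-checked by summing the per-step figures. A small sketch, assuming a POSIX shell with awk; total_minutes is an illustrative helper, and the step durations are the lower and upper bounds listed above converted to seconds:

```shell
#!/bin/sh
# Sum step durations given in seconds and print the total in whole minutes.
total_minutes() {
  printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%d\n", s / 60 }'
}
```

`total_minutes 30 10 20 480 120 60` gives 12 for the lower bounds and `total_minutes 30 10 20 600 180 60` gives 15 for the upper bounds, matching the stated 12-15 minutes.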

Automated Script Version

Script: rman_restore_final.cmd
Location: /mnt/e/proiecte/ROMFASTSQL/oracle/standby-server-scripts/rman_restore_final.cmd

This CMD script automates all the above steps. Run on DR VM as Administrator:

D:\oracle\scripts\rman_restore_final.cmd

The script will:

Shutdown database if running
Startup NOMOUNT with correct PFILE
Restore control file from correct backup piece (not autobackup)
Mount database
Catalog all backups in FRA
Restore database with 2 parallel channels
Recover database with NOREDO (no incremental)
Open with RESETLOGS
Create TEMP tablespace
Verify database status

Log file: D:\oracle\logs\rman_restore_final.log

11. Document DR Restore Procedure 📝

After successful test, create:

DR_RESTORE_PROCEDURE.md - Step-by-step restore instructions
DR_RUNBOOK.md - Emergency runbook for DR event
Screenshots of successful restore
Performance metrics (restore time, verification steps)

12. Schedule Automated Testing 🗓️

Monthly DR restore test (automated)
Quarterly full DR drill (manual verification)
Document test results in D:\oracle\logs\dr_test_YYYYMMDD.log

📋 PRIMARY SERVER CONFIGURATION (Reference)

Server: 10.0.20.36 (Windows Server)
Oracle Version: 19c SE2 (19.3.0.0.0)
Database: ROA, DBID: 1363569330, non-CDB (traditional architecture)

Paths:

ORACLE_HOME: C:\Users\Administrator\Downloads\WINDOWS.X64_193000_db_home
ORACLE_BASE: C:\Users\oracle
Datafiles: C:\Users\oracle\oradata\ROA\
- SYSTEM01.DBF
- SYSAUX01.DBF
- UNDOTBS01.DBF
- TS_ROA.DBF (application tablespace)
- USERS01.DBF
- TEMP01.DBF (567 MB)
Control Files:
- C:\Users\oracle\oradata\ROA\control01.ctl
- C:\Users\oracle\recovery_area\ROA\control02.ctl
Redo Logs:
- GROUP 1: C:\Users\oracle\oradata\ROA\REDO01.LOG (200 MB)
- GROUP 2: C:\Users\oracle\oradata\ROA\REDO02.LOG (200 MB)
- GROUP 3: C:\Users\oracle\oradata\ROA\REDO03.LOG (200 MB)
FRA: C:\Users\Oracle\recovery_area\ROA

RMAN Configuration:

Retention Policy: REDUNDANCY 2
Control File Autobackup: ON
Device Type: DISK, PARALLELISM 2, COMPRESSED BACKUPSET
Compression: BASIC

Backup Schedule (Current - to be upgraded):

FULL: Daily 02:30 AM (~6.32 GB compressed)
DIFFERENTIAL INCREMENTAL: Daily 14:00 (~50-120 MB) ⚠️ Not used in restore (causes UNDO corruption)
Retention: 2 days
Transfer to DR: Immediately after backup completes

Planned Upgrade (see DR_UPGRADE_TO_CUMULATIVE_PLAN.md):

FULL: Daily 02:30 AM (~6.32 GB compressed)
CUMULATIVE INCREMENTAL: Daily 13:00 + 18:00 (~150-400 MB each)
Retention: 2 days
Transfer to: Proxmox host (pveelite), mounted in VM when needed
Target RPO: 3-4 hours (vs current 24 hours)
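
Once backups are stored on the Proxmox host, the 2-day retention could be enforced there with find. A minimal sketch, assuming GNU find (for -mmin) and a backup directory on the host whose path is not defined in this document; list_expired_backups is an illustrative name:

```shell
#!/bin/sh
# List backup pieces older than the retention window (in days).
# -mmin is used instead of -mtime so the cutoff is an exact multiple of 24 hours.
# The *.BKP pattern matches the upper-case piece names shown in this document.
list_expired_backups() (
  dir=$1
  days=$2
  find "$dir" -type f -name '*.BKP' -mmin "+$(( days * 1440 ))" | sort
)
```

The function only lists candidates; reviewing that output before deleting anything is the safer design, since a wrong directory argument would otherwise remove the only copies held on the DR side.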

SSH: OpenSSH Server on port 22122

SYSTEM user SSH key configured for automated transfers
Key: ssh-rsa AAAAB3NzaC1yc...administrator@ROA-CARAPETRU2

Scheduled Tasks:

Run as: NT AUTHORITY\SYSTEM
RMAN Full Backup + Transfer: Daily 02:30 AM
RMAN Incremental Backup + Transfer: Daily 14:00

⚠️ KNOWN ISSUES & RESOLUTIONS

1. SSH Key Authentication - RESOLVED ✅

Issue: Initial SSH key authentication failed with "Access Denied"
Root Cause: File permissions on administrators_authorized_keys too restrictive
Resolution:

Created script fix_ssh_via_service.ps1
Stops SSH service before modifying file
Uses takeown and icacls to set permissions
Both keys now working (user + SYSTEM)

2. Backup Transfer Directory Creation - RESOLVED ✅

Issue: SCP transfers failed with exit code 1
Root Cause: Directory D:\oracle\backups\primary didn't exist
Resolution: Created directory manually via SSH
Note: Transfer script command for creating directory had escaping issues

3. Oracle Silent Installation - RESOLVED ✅

Issue: Silent installation failed with "username field is empty" (exit code 254)
Root Cause: Windows silent install more complex than Linux
Resolution: Used interactive GUI installation instead
Result: Oracle 19c successfully installed, working perfectly

4. QEMU Guest Agent Intermittent Timeouts

Status: Minor annoyance (NOT blocking)
Impact: Cannot use qm guest exec reliably
Workaround: Direct SSH access or Proxmox console
Fix: Service QEMU-GA set to Automatic startup

📊 DR ARCHITECTURE SUMMARY

PRIMARY (10.0.20.36) - Windows Server       DR (10.0.20.37) - Windows 11 VM
├─ Oracle 19c SE2 (19.3.0.0.0)             ├─ Oracle 19c SE2 (19.3.0.0.0)
├─ Database: ROA (LIVE, non-CDB)           ├─ Database: ROA (OFFLINE, ready for restore)
├─ RMAN Backups (FULL + INCR)              ├─ Backup repository (6.32 GB)
│  └─ Compressed BACKUPSET                 ├─ RMAN restore scripts
│                                           └─ Listener configured and running
└─ Transfer via SSH/SCP (automated)
         ↓ port 22122, SYSTEM user key
         ↓ Daily at 02:30 (FULL) and 14:00 (INCR)
         └─────────────────────────────────────────→ D:\oracle\backups\primary\
            Automated daily transfer
            950 Mbps network (~5 min for 6 GB)

RTO (Recovery Time Objective): ~15 minutes

2 min: Power on VM and wait for boot
12 min: RMAN restore (database + recovery)
1 min: Database open RESETLOGS and verify

RPO (Recovery Point Objective - Current):

Current: Only FULL backup used = 24 hours (incremental not applied due to UNDO corruption issue)

RPO (Planned after upgrade to CUMULATIVE):

Target: FULL + latest CUMULATIVE = 3-4 hours
Best case: 1 hour (disaster at 13:05, use 13:00 cumulative)
Worst case: 10.5 hours (disaster at 13:00, use 02:30 full only)
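
The worst-case figure follows from the longest gap between consecutive backups in the daily schedule. A minimal sketch, assuming a POSIX shell and two-digit HH:MM times passed in ascending order; the function names are illustrative, and the result covers schedule gaps only, not backup duration or transfer time:

```shell
#!/bin/sh
# Convert HH:MM to minutes since midnight. One leading zero is stripped so
# that values such as "08" are not read as octal. Expects two-digit fields.
to_minutes() (
  h=${1%%:*}
  m=${1##*:}
  echo $(( ${h#0} * 60 + ${m#0} ))
)

# Largest gap, in minutes, between consecutive backups in a schedule that
# repeats every 24 hours. Arguments: start times as HH:MM, ascending.
max_backup_gap_minutes() (
  first=$(to_minutes "$1")
  prev=$first
  max=0
  shift
  for t in "$@"; do
    cur=$(to_minutes "$t")
    gap=$(( cur - prev ))
    if [ "$gap" -gt "$max" ]; then max=$gap; fi
    prev=$cur
  done
  # Wrap around from the last backup of the day to the first one tomorrow.
  gap=$(( first + 1440 - prev ))
  if [ "$gap" -gt "$max" ]; then max=$gap; fi
  echo "$max"
)
```

`max_backup_gap_minutes 02:30 13:00 18:00` prints 630 (10.5 hours, the 02:30 to 13:00 gap), while `max_backup_gap_minutes 02:30` prints 1440, the 24-hour RPO of the current FULL-only restore.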

Storage Requirements:

VM disk: 500 GB total
- Oracle installation: ~10 GB
- Database (restored): ~15 GB
- Backup repository: ~14 GB (2 days retention)
- Free space: ~460 GB
Daily backup transfer: 6-7 GB (FULL) + 50-120 MB (INCR)

Daily Resource Usage:

VM powered OFF when not needed: 0 GB RAM, 0 CPU
VM powered ON during DR event: 6 GB RAM, 4 CPU cores
Network transfer: ~5-10 minutes/day at 950 Mbps

Backup Retention:

PRIMARY: 2 days in FRA
DR: 2 days in D:\oracle\backups\primary
Cleanup: Automated via transfer scripts

🎯 NEXT STEPS

✅ COMPLETED (Current Session):

✅ RMAN Restore Tested - Database successfully restored and operational
✅ Database Verified - All tablespaces, tables, data verified
✅ Documented Results - Restore time ~12-15 minutes
✅ VM Shutdown - Conserving resources

🔄 NEXT SESSION - Upgrade to CUMULATIVE Strategy:

Priority: HIGH - Improves RPO from 24h to 3-4h

See detailed plan: DR_UPGRADE_TO_CUMULATIVE_PLAN.md

Summary of changes:

📦 Configure Proxmox host storage - Store backups on pveelite, mount in VM 109
🔄 Convert DIFFERENTIAL → CUMULATIVE - Add keyword to RMAN script
⏰ Add second incremental - Run at 13:00 + 18:00 (vs current 14:00 only)
📝 Update transfer scripts - Send to Proxmox host instead of VM
🗓️ Update scheduled tasks - Create 13:00 and 18:00 tasks
🧪 Update restore script - Read from mount point (F:, since E: is already in use on VM 109), handle cumulative backups
✅ Test end-to-end - Verify FULL + CUMULATIVE restore works

Estimated time: 2-3 hours
Recommended: Saturday morning (low activity)

Short Term (After Upgrade):

📄 Update DR Runbook - Include cumulative backup procedures
🧪 Schedule Weekly Tests - Automated Saturday morning DR tests
📊 Create Monitoring - Alert if backups fail to transfer
🔐 Backup VM State - Snapshot of configured DR VM

Long Term:

🔄 Automate Weekly Tests - Script to test restore automatically
📈 Performance Tuning - Optimize restore speed if needed
🌐 Network Failover - DNS/routing changes for DR activation
📋 Compliance - Document DR procedures for audit

📞 SUPPORT CONTACTS & REFERENCES

Documentation:

Implementation plan: oracle/standby-server-scripts/DR_WINDOWS_VM_IMPLEMENTATION_PLAN.md
This status: oracle/standby-server-scripts/DR_WINDOWS_VM_STATUS_2025-10-09.md
Project directory: /mnt/e/proiecte/ROMFASTSQL/oracle/standby-server-scripts/

Proxmox:

Cluster: romfast
Nodes: pve1 (10.0.20.200), pvemini (10.0.20.201), pveelite (10.0.20.202)

VM 109 Commands:

qm status 109           # Check VM status
qm start 109            # Power on VM
qm shutdown 109         # Graceful shutdown (guest OS shuts down cleanly)
qm stop 109             # Force stop (immediate power-off, use only if shutdown hangs)
qm console 109          # Open console (if needed)
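
For scripted DR activation or test runs, the status check and the start command can be combined so the VM is only started when it is not already running. A minimal sketch for the Proxmox host, assuming `qm status` prints a line of the form "status: running" or "status: stopped"; the function names are illustrative and not part of the existing scripts:

```shell
#!/bin/sh
# Succeed if `qm status <vmid>` reports the VM as running.
vm_is_running() {
  [ "$(qm status "$1" | awk '/^status:/ { print $2 }')" = "running" ]
}

# Start the VM only if it is not already running.
ensure_vm_started() {
  if vm_is_running "$1"; then
    echo "VM $1 already running"
  else
    qm start "$1" && echo "VM $1 start requested"
  fi
}
```

Calling `ensure_vm_started 109` is safe to repeat; after a start request the guest still needs time to boot before SSH on port 22122 answers.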

Access Methods:

SSH (Preferred): ssh -p 22122 romfast@10.0.20.37
- Key authentication: ✅ Working
- Password: Romfast2025! (if key fails)
Proxmox Console: Web UI → pveelite → VM 109 → Console
RDP: Not configured (SSH preferred for security)

Oracle Quick Reference:

# On DR VM - Set environment
$env:ORACLE_HOME = "C:\Users\Administrator\Downloads\WINDOWS.X64_193000_db_home"
$env:ORACLE_SID = "ROA"
$env:PATH = "$env:ORACLE_HOME\bin;$env:PATH"

# Connect to database
sqlplus / as sysdba

# Check listener
lsnrctl status

# Test TNS
tnsping ROA

RMAN Quick Reference:

# Connect to RMAN
rman target /

# List backups
LIST BACKUP SUMMARY;

# Validate that the backups needed for a restore are readable
RESTORE DATABASE VALIDATE;

# Check database
SELECT NAME, OPEN_MODE, LOG_MODE FROM V$DATABASE;

Useful Scripts Location:

DR VM: D:\oracle\scripts\
PRIMARY: D:\rman_backup\
Project: /mnt/e/proiecte/ROMFASTSQL/oracle/standby-server-scripts/

Oracle Documentation:

RMAN Backup/Recovery: https://docs.oracle.com/en/database/oracle/oracle-database/19/bradv/
Windows Installation: https://docs.oracle.com/en/database/oracle/oracle-database/19/ntqrf/
Database Administrator's Guide: https://docs.oracle.com/en/database/oracle/oracle-database/19/admin/

📈 PROGRESS TRACKING

Overall Status: 10/10 major tasks complete (RMAN restore test passed on 2025-10-09 17:40)
Blockers: None

Completed: 10/10 major tasks
Remaining: follow-up items 11 (document DR restore procedure) and 12 (schedule automated testing)

Session Summary (2025-10-09):

✅ Fixed SSH key authentication (2 keys configured)
✅ Installed Oracle 19c (interactive installation)
✅ Configured Oracle Listener (running on port 1521)
✅ Updated backup transfer scripts for Windows target
✅ Added PRIMARY SYSTEM SSH key to DR VM
✅ Successfully transferred 6.32 GB backup files
✅ COMPLETED RMAN restore testing - DATABASE FULLY OPERATIONAL

Time Invested: ~5 hours total

Setup and configuration: ~1.5 hours
RMAN restore attempts and troubleshooting: ~3 hours
Successful restore and verification: ~30 minutes

Critical Lessons Learned:

Control file source matters - Must use control file from same backup piece as datafiles, not autobackup
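Differential incrementals problematic - Applying the 14:00 DIFFERENTIAL incremental on top of the FULL backup caused UNDO corruption (ORA-30012); restore from FULL only until CUMULATIVE backups are in place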
FRA location critical - Backups must be in Fast Recovery Area for RMAN auto-discovery
Memory constraints - Windows reserves significant RAM, reduce Oracle memory_target accordingly
SET UNTIL TIME - More reliable than SET UNTIL SCN for point-in-time recovery

Final Database Metrics:

Database: ROA (DBID: 1363569330)
Status: READ WRITE, OPEN
Tablespaces: 6 (all ONLINE)
Datafiles: 5
Application Owners: 69
Application Tables: 45,000+
Restore Time: 12-15 minutes (end-to-end)
Data Restored: 6.32 GB compressed → ~15 GB uncompressed

Last Updated: 2025-10-09 17:45 (Session completed)
Updated By: Claude Code (Sonnet 4.5)
Status: ✅ RMAN RESTORE SUCCESSFUL - DR SYSTEM VALIDATED AND OPERATIONAL

Next Actions:

Shutdown database: SHUTDOWN IMMEDIATE;
Power off VM to conserve resources: qm stop 109
Implement CUMULATIVE backup strategy (see DR_UPGRADE_TO_CUMULATIVE_PLAN.md)
Schedule weekly DR restore tests
Create DR runbook for emergency procedures
Monitor daily backup transfers from PRIMARY

Important Notes:

⚠️ VM 109 partitions: C:, D:, E: (already used)
📁 Mount point from host will appear as F: (not E:)
🔄 For VM migration between nodes, see: DR_VM_MIGRATION_GUIDE.md
