diff --git a/proxmox/vm109-windows-dr/README.md b/proxmox/vm109-windows-dr/README.md index 1b6585b..ef9c3a0 100644 --- a/proxmox/vm109-windows-dr/README.md +++ b/proxmox/vm109-windows-dr/README.md @@ -97,6 +97,19 @@ D:\oracle\scripts\rman_restore_from_zero.cmd # 4. Database is now RUNNING - Update app connections to 10.0.20.37 ``` +### 🔄 Failback DR → PRIMARY (when production is repaired) + +The reverse procedure, for when the production server has been repaired or reinstalled and +production must be moved back to `10.0.20.36`: + +> ⚠️ **On PRIMARY, install Oracle 19c (NOT 21c) for an urgent failback.** The backups are 19.3. 21c can technically restore them, but it requires an additional data dictionary upgrade (~30-60 min extra), which is an unnecessary risk during the crisis window. The migration to 21c is done separately, after failback. Details in `FAILBACK_PROCEDURE.md`. + +➡️ See **[docs/FAILBACK_PROCEDURE.md](docs/FAILBACK_PROCEDURE.md)** for the end-to-end steps: +- Final backup on DR (with the DB in read-only / restricted mode) +- Restore on the new PRIMARY with `scripts/rman_restore_to_primary.ps1` +- Switch connection strings and re-enable the RMAN scheduled tasks +- Stop VM 109, return to the normal state + ### 🧪 Weekly Test (Every Saturday) ```bash @@ -557,6 +570,7 @@ vm109-windows-dr/ ├── docs/ │ ├── PLAN_TESTARE_MONITORIZARE.md # DR testing and monitoring plan │ ├── PROXMOX_NOTIFICATIONS_README.md # Proxmox notifications setup +│ ├── FAILBACK_PROCEDURE.md # Failback DR → PRIMARY (reverse procedure) │ └── archive/ # Previous plans and status reports │ ├── DR_UPGRADE_TO_CUMULATIVE_PLAN.md │ ├── DR_VM_MIGRATION_GUIDE.md @@ -568,7 +582,8 @@ vm109-windows-dr/ ├── rman_backup.bat # RMAN full backup (Windows) ├── rman_backup_incremental.bat # RMAN incremental (Windows) ├── transfer_backups.ps1 # Backup transfer (Windows) - ├── rman_restore_from_zero.ps1 # Full restore (Windows DR) + ├── rman_restore_from_zero.ps1 # Restore PRIMARY → DR (disaster activation) + ├── rman_restore_to_primary.ps1 # Restore DR → PRIMARY (failback) 
├── cleanup_database.ps1 # Cleanup after test (Windows DR) └── *.ps1 # Other configuration scripts ``` diff --git a/proxmox/vm109-windows-dr/docs/FAILBACK_PROCEDURE.md b/proxmox/vm109-windows-dr/docs/FAILBACK_PROCEDURE.md new file mode 100644 index 0000000..87f8382 --- /dev/null +++ b/proxmox/vm109-windows-dr/docs/FAILBACK_PROCEDURE.md @@ -0,0 +1,362 @@ +# Failback DR → PRIMARY (reverse procedure) + +Procedure for moving back from the DR server (VM 109, `10.0.20.37`) to the repaired or reinstalled production server (PRIMARY, `10.0.20.36`). + +> **Context**: This procedure applies AFTER DR has taken over production (see `README.md` § "Emergency DR Activation"). The applications run on DR (`10.0.20.37:1521/ROA`), PRIMARY has been repaired or reinstalled from scratch, and production must now be moved back to PRIMARY. + +--- + +## ⚠️ WARNING: On PRIMARY, install Oracle **19c** (strongly recommended for failback) + +The existing RMAN backups were taken from Oracle **19.3** (`ORACLE_HOME = WINDOWS.X64_193000_db_home`, `compatible=19.0.0`). + +**Install Oracle Database 19c:** +- Version: **Oracle Database 19c** (19.3 base plus the latest Release Update) +- Edition: same as the original; check on DR with `SELECT banner FROM v$version;` +- Installer: `WINDOWS.X64_193000_db_home.zip` (Oracle eDelivery / OTN) +- Same path as on DR, so that the restore script works without modification + +**Why NOT 21c (or 23ai) for failback:** + +Technically, RMAN 21c **can** read 19c backups (the controlfile, datafiles and archivelogs all restore). BUT the datafile headers remain 19c, so a normal `ALTER DATABASE OPEN RESETLOGS` fails. 
The correct path on 21c is: + +``` +RESTORE CONTROLFILE → RESTORE DATABASE → RECOVER DATABASE +→ STARTUP UPGRADE (NOT a normal OPEN) +→ dbupgrade (or catctl.pl catupgrd.sql) ~30-60 min +→ ALTER DATABASE OPEN RESETLOGS +→ @utlrp.sql (recompile invalid objects) +→ ALTER SYSTEM SET COMPATIBLE='21.0.0' SCOPE=SPFILE (irreversible, after validation) +``` + +**Problems with this path during failback:** +- 30-60 min of extra downtime in the critical window +- The dictionary upgrade step has not been tested on this dataset +- Incompatible objects or PL/SQL may be discovered after the upgrade, under pressure +- If the upgrade fails partway, rollback is complicated (DR is still an option, but you lose time) + +**Firm recommendation:** +- **Urgent failback** → install 19c. Identical to DR. Path tested weekly. No surprises. +- **Migration to 21c** → a separate, planned operation, after production is stable on PRIMARY 19c. Two options: + - **DBUA in-place upgrade** 19c → 21c: keeps the DBID, planned downtime + - **Data Pump** (expdp/impdp full): cleaner, but you lose the DBID and statistics + +The `rman_restore_to_primary.ps1` script has an explicit check on the major version and aborts if it finds anything other than 19, so that nobody starts with half a plan in the crisis window. + +--- + +## 1. Preconditions + +| Requirement | Check | +|---------|------------| +| PRIMARY (10.0.20.36) reachable on the network | `ping 10.0.20.36` | +| Oracle 19c installed on PRIMARY (same version as DR) | `sqlplus -V` on PRIMARY | +| `ORACLE_SID=ROA`, `ORACLE_HOME` at the same path as on DR | see `proxmox/lxc108-oracle/roa-windows-setup/` | +| Applications can be put into maintenance (downtime ~30 min) | planned window | +| NFS share `oracle-backups` reachable from PRIMARY (or plan B: copy via SMB/SCP) | `mount -o ... 
10.0.20.202:/mnt/pve/oracle-backups F:` | +| SYSDBA account available on PRIMARY (fresh install = `system/manager` or the password set at install time) | `sqlplus / as sysdba` | +| ZFS replication pveelite → pve1 of the `oracle-backups` dataset is running | `zfs list -t snapshot \| grep oracle-backups` | + +--- + +## 2. Steps: overview + +``` +┌──────────────────────────────────────────────────────────────┐ +│ PHASE 1: PREPARE (DR in maintenance, final backup) │ +│ 1. Announce application downtime │ +│ 2. Read-only on DR │ +│ 3. RMAN full backup on DR → NFS │ +│ 4-5. Switch + archivelog for final transactions; verify NFS │ +├──────────────────────────────────────────────────────────────┤ +│ PHASE 2: RESTORE (restore on the new PRIMARY) │ +│ 6. Mount NFS on PRIMARY (or copy the backups) │ +│ 7. Clean up PRIMARY (if the reinstall created a demo DB) │ +│ 8. Run rman_restore_to_primary.ps1 │ +│ 9-10. Verify DB OPEN + tables; reset RMAN catalog │ +├──────────────────────────────────────────────────────────────┤ +│ PHASE 3: SWITCH (switch applications, reset infra) │ +│ 11. Update connection strings: 10.0.20.37 → 10.0.20.36 │ +│ 12. Test application connectivity │ +│ 13. Re-enable RMAN backup scheduled tasks on PRIMARY │ +│ 14. Stop VM 109 (`qm stop 109`), back to `state=stopped` │ +│ 15-16. Verify ZFS replica + backup flow; weekly DR test │ +└──────────────────────────────────────────────────────────────┘ +``` + +--- + +## PHASE 1: Prepare + +### Step 1: Announce application downtime + +All write traffic must be stopped during the final backup. Notify users and plan a window of ~30 min. 
+ +### Step 2: Read-only on DR (10.0.20.37) + +```sql +-- Connect to DR +sqlplus / as sysdba + +-- Check active user sessions +SELECT username, status, count(*) FROM v$session WHERE type='USER' GROUP BY username,status; + +-- Put the DB in restricted mode (new connections limited to users with RESTRICTED SESSION, e.g. SYSDBA) +ALTER SYSTEM ENABLE RESTRICTED SESSION; + +-- Or, safer: read-only mode (requires shutdown + mount) +SHUTDOWN IMMEDIATE; +STARTUP MOUNT; +ALTER DATABASE OPEN READ ONLY; +``` + +> **Note**: read-only mode stops writes completely, but applications will also get errors on INSERT/UPDATE. Restricted session is gentler if the applications can tolerate a short disconnect. + +### Step 3: RMAN full backup on DR + +```cmd +REM On the DR VM (10.0.20.37), if it is still in restricted/read-only mode +D:\oracle\scripts\rman_backup.bat +``` + +The backup is written to `F:\ROA\autobackup` (NFS mount from `10.0.20.202:/mnt/pve/oracle-backups`). + +### Step 4: Switch + archivelog for final transactions + +If the DB is still in READ WRITE, capture the last archivelogs: + +```sql +ALTER SYSTEM SWITCH LOGFILE; +ALTER SYSTEM ARCHIVE LOG CURRENT; + +-- Back up the archivelogs +-- in RMAN: +RMAN> BACKUP ARCHIVELOG ALL DELETE INPUT; +``` + +Note the **current SCN**; it is needed for later verification: + +```sql +SELECT CURRENT_SCN FROM v$database; +-- Note the value, e.g. 12345678 +``` + +### Step 5: Verify the backups on NFS + +```bash +ssh root@10.0.20.202 "ls -lt /mnt/pve/oracle-backups/ROA/autobackup/ | head -20" +``` + +You should see the recent backups (date/time from steps 3-4). 
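
The visual check above can also be scripted on the Proxmox host. The following is a minimal sketch, not part of this change set: `newest_backup_age_minutes` and `backups_are_fresh` are hypothetical helper names, GNU `find` is assumed (as on a Debian-based Proxmox host), and the 120-minute default simply mirrors the 2-hour freshness warning in `rman_restore_to_primary.ps1`.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical and not part of the repo.
# Requires GNU find (for -printf).

# Print the age, in whole minutes, of the newest regular file directly inside $1.
newest_backup_age_minutes() {
    local dir="$1" newest now
    # %T@ prints each file's modification time as epoch seconds (with a fraction)
    newest=$(find "$dir" -maxdepth 1 -type f -printf '%T@\n' 2>/dev/null | sort -n | tail -n 1)
    if [ -z "$newest" ]; then
        echo "no backup files found in $dir" >&2
        return 1
    fi
    now=$(date +%s)
    # Drop the fractional part before doing integer arithmetic
    echo $(( (now - ${newest%.*}) / 60 ))
}

# Succeed only if the newest file in $1 is at most $2 minutes old (default 120).
backups_are_fresh() {
    local age
    age=$(newest_backup_age_minutes "$1") || return 1
    [ "$age" -le "${2:-120}" ]
}

# Example (run on the Proxmox host):
#   backups_are_fresh /mnt/pve/oracle-backups/ROA/autobackup 120 && echo fresh || echo stale
```

A non-zero exit status here is only a prompt to go back and retake the final backup; it does not replace inspecting the RMAN log.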
+ +--- + +## PHASE 2: Restore on PRIMARY + +### Step 6: Mount NFS on PRIMARY (or copy the backups) + +**Option A (recommended): mount NFS directly on PRIMARY** + +On the Windows PRIMARY: + +```powershell +# Enable the NFS Client (one time only, if not installed) +Install-WindowsFeature -Name NFS-Client + +# Mount the NFS share as F: (use mount.exe: in PowerShell, `mount` is an alias for New-PSDrive) +mount.exe -o 'anon,nolock,mtype=hard,timeout=60' 10.0.20.202:/mnt/pve/oracle-backups F: + +# Verify +dir F:\ROA\autobackup +``` + +**Option B (fallback): copy via SMB/SCP** + +If NFS does not work on PRIMARY (rare): + +```powershell +# On PRIMARY, copy from the DR VM, which has F:\ mounted: +robocopy \\10.0.20.37\F$\ROA\autobackup F:\ROA\autobackup /MIR /Z +``` + +### Step 7: Clean up PRIMARY (if the reinstall created a demo DB) + +If the Oracle installation created a demo DB (e.g. ORCL), delete it to avoid conflicts: + +```cmd +REM On PRIMARY +sqlplus / as sysdba +SHUTDOWN ABORT; +EXIT; + +oradim -delete -sid ORCL +``` + +Then run `cleanup_database.ps1` from DR (copied to PRIMARY beforehand), or just delete the existing oradata/recovery_area directories for SID `ROA` (note: if PRIMARY has a leftover `OracleServiceROA` from a previous installation, stop and delete the service with `oradim -delete -sid ROA`). 
+ +### Step 8: Run the restore script + +```cmd +REM On PRIMARY (10.0.20.36) +powershell -ExecutionPolicy Bypass -File D:\oracle\scripts\rman_restore_to_primary.ps1 +``` + +The `rman_restore_to_primary.ps1` script (see `proxmox/vm109-windows-dr/scripts/`): +- Uses the same DBID `1363569330` +- Restores from `F:\ROA\autobackup` +- Configures the listener on `10.0.20.36:1521` +- Creates the SPFILE and sets the `OracleServiceROA` service to AUTOMATIC + +### Step 9: Verify the DB + +```sql +sqlplus / as sysdba + +SELECT name, open_mode, dbid FROM v$database; +-- Expected: ROA, READ WRITE, 1363569330 + +SELECT current_scn FROM v$database; +-- Must be >= the SCN noted in step 4 + +SELECT count(*) FROM dba_tables WHERE owner NOT IN ('SYS','SYSTEM','XDB','GSMADMIN_INTERNAL','APPQOSSYS','OUTLN','DBSNMP','WMSYS','OLAPSYS','MDSYS','CTXSYS','EXFSYS','ORDSYS','LBACSYS'); +-- Compare with the count on DR (before read-only) + +-- Write test +CREATE TABLE test_failback_check (id NUMBER, ts DATE); +INSERT INTO test_failback_check VALUES (1, SYSDATE); +COMMIT; +SELECT * FROM test_failback_check; +DROP TABLE test_failback_check PURGE; +``` + +### Step 10: Reset the RMAN catalog + +After `OPEN RESETLOGS`, mark the new incarnation: + +```cmd +rman target / + +RMAN> LIST INCARNATION; +RMAN> RESET DATABASE TO INCARNATION ; +RMAN> CROSSCHECK BACKUP; +RMAN> DELETE NOPROMPT EXPIRED BACKUP; +``` + +--- + +## PHASE 3: Switch & cleanup + +### Step 11: Update application connection strings + +Change in all applications: + +``` +BEFORE: 10.0.20.37:1521/ROA (DR) +AFTER: 10.0.20.36:1521/ROA (restored PRIMARY) +``` + +Places to update (per `vm109-windows-dr/README.md`): +- Client applications (TNS_ADMIN or hardcoded connection strings) +- IIS reverse proxy (VM 201), if it routes to Oracle +- Flowise (LXC 104): Oracle environment variables +- Scheduled tasks/cron jobs that connect to Oracle + +### Step 12: Test connectivity + +```bash +# From Claude Agent (10.0.20.171) +sqlplus 
user/pass@10.0.20.36:1521/ROA + +# Test from the main application: one read and one minor write +``` + +### Step 13: Re-enable the RMAN scheduled tasks on PRIMARY + +On PRIMARY (Windows Task Scheduler): +- `Oracle RMAN Full Backup`: weekly (runs `D:\oracle\scripts\rman_backup.bat`) +- `Oracle RMAN Incremental`: daily (runs `D:\oracle\scripts\rman_backup_incremental.bat`) +- `Oracle Backup Transfer`: after each backup (runs `D:\oracle\scripts\transfer_backups.ps1`) + +Check that they run correctly: + +```cmd +REM Force a test transfer +powershell -File D:\oracle\scripts\transfer_backups.ps1 + +REM Check on Proxmox that the files arrived +ssh root@10.0.20.202 "ls -lt /mnt/pve/oracle-backups/ROA/autobackup/ | head -5" +``` + +### Step 14: Stop VM 109, return to the normal state + +```bash +# On pveelite +ssh root@10.0.20.203 "qm stop 109" + +# Check the HA config: VM 109 must remain state=stopped, nofailback=1 +ssh root@10.0.20.203 "ha-manager config | grep -A2 vm:109" +# Expected: +# state stopped +# group ha-prefer-pveelite +# nofailback 1 +``` + +### Step 15: Verify ZFS replication of the backups + +```bash +# New snapshots from pveelite must appear on pve1 +ssh root@10.0.20.201 "zfs list -t snapshot rpool/oracle-backups | tail -10" + +# Check the replication job +ssh root@10.0.20.203 "cat /var/log/oracle-dr/replication.log | tail -20" +``` + +### Step 16: Re-enable the weekly DR test + +The weekly test in cron (Saturday 06:00) will run automatically. Check the first run after failback to confirm that the new backups from PRIMARY can be restored on VM 109: + +```bash +# The following Saturday, after 06:00 +ssh root@10.0.20.203 "ls -lt /var/log/oracle-dr/dr_test_*.log | head -1" +# Then tail it and look for "PASSED" +``` + +--- + +## Rollback (if failback fails) + +If the restore on PRIMARY fails or verification finds missing data: + +1. **Do NOT delete** the database on DR: it is still your fallback +2. 
Point the applications back to `10.0.20.37:1521/ROA` (DR) +3. Resume writes on DR (`ALTER SYSTEM DISABLE RESTRICTED SESSION;`, or `SHUTDOWN IMMEDIATE` followed by a normal `STARTUP` if it was read-only) +4. Investigate PRIMARY separately, and retry only after you understand the cause + +--- + +## Annex A: Differences between the DR test and a real failback + +| Aspect | Weekly DR test | Real failback | +|--------|---------------------|---------------| +| Backup source | NFS, latest backups from `F:\` | NFS, **fresh backup** taken on DR in step 3 | +| TestMode flag | `-TestMode` (skip listener config) | **NO** TestMode (full config, including listener) | +| Post-restore cleanup | YES: `cleanup_database.ps1 /AFTER` | **NO**: the DB is now production | +| Stop VM/server afterwards | YES: `qm stop 109` | **NO**: the server stays up | +| Connection strings | unchanged | switched to PRIMARY | +| RMAN scheduled tasks | untouched | re-enabled on PRIMARY | + +## Annex B: Duration estimates + +| Step | Estimated duration | +|-----|-----------------| +| Steps 2-4: Read-only + final backup on DR | 5-10 min (depends on DB size) | +| Step 6: NFS mount + verify | 2 min | +| Steps 7-8: Cleanup + restore script | 15-25 min (RMAN restore from 139 backup files) | +| Steps 9-10: Verify + RMAN catalog reset | 5 min | +| Steps 11-12: Switch applications + test | 5-10 min | +| **Total downtime** | **~30-50 min** | + +--- + +**Last updated**: 2026-04-25 +**Status**: The procedure has not yet been executed end-to-end. Test it in a sandbox environment (see VM 302) before a real failback. 
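
As a cross-check on the Annex B total, the per-row ranges can be added up. This is a small sketch with a hypothetical helper name (`sum_ranges`); single values such as "2 min" are written as `2-2`. The rows sum to 32-52 minutes, which the table rounds to ~30-50.

```shell
#!/usr/bin/env bash
# Sketch only: sum_ranges is a hypothetical helper that adds "min-max" minute ranges.
sum_ranges() {
    local lo=0 hi=0 r
    for r in "$@"; do
        lo=$(( lo + ${r%-*} ))   # part before the dash
        hi=$(( hi + ${r#*-} ))   # part after the dash
    done
    echo "${lo}-${hi}"
}

# Annex B rows, in order; single values are written as N-N
sum_ranges 5-10 2-2 15-25 5-5 5-10   # prints 32-52
```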
diff --git a/proxmox/vm109-windows-dr/scripts/rman_restore_to_primary.ps1 b/proxmox/vm109-windows-dr/scripts/rman_restore_to_primary.ps1 new file mode 100644 index 0000000..a23879f --- /dev/null +++ b/proxmox/vm109-windows-dr/scripts/rman_restore_to_primary.ps1 @@ -0,0 +1,452 @@ +# RMAN Restore to PRIMARY (failback procedure) +# Restores ROA database on a freshly-installed/reinstalled PRIMARY server (10.0.20.36) +# from backups on F:\ (NFS mount from Proxmox host). +# +# This is the FAILBACK companion to rman_restore_from_zero.ps1: +# - rman_restore_from_zero.ps1: PRIMARY -> DR (during disaster activation) +# - rman_restore_to_primary.ps1 (this): DR -> PRIMARY (after PRIMARY repaired) +# +# See docs/FAILBACK_PROCEDURE.md for the full procedure. +# +# Run as: Administrator on PRIMARY (10.0.20.36) +# Location: D:\oracle\scripts\rman_restore_to_primary.ps1 +# +# !!! IMPORTANT — ORACLE VERSION !!! +# This script REQUIRES Oracle Database 19c on PRIMARY and aborts otherwise. +# Backups are 19.3 with compatible=19.0.0. Restoring onto 19c is the path tested +# weekly on DR — same flow, no upgrade step. +# +# Technically 21c can RMAN-restore 19c backups, BUT the database then needs +# STARTUP UPGRADE + dbupgrade (catctl.pl) before OPEN — adds 30-60 min and an +# untested upgrade-during-failback risk. For a calm planned migration to 21c, +# do failback to 19c first, then upgrade in a separate window. +# Use installer WINDOWS.X64_193000_db_home.zip. +# +# Prerequisites: +# 1. Final RMAN backup taken on DR (10.0.20.37) and visible on F:\ROA\autobackup +# 2. Oracle 19c installed on PRIMARY at the same path as on DR (NOT 21c!) +# 3. ORACLE_SID=ROA, no demo DB present (if any, run cleanup_database.ps1 first) +# 4. 
NFS mount: mount -o anon,nolock,mtype=hard,timeout=60 10.0.20.202:/mnt/pve/oracle-backups F: +# +# Parameters: +# -SkipCleanup: Skip the destructive cleanup step (use only if PRIMARY is already clean) +# -DryRun: Print actions but do not execute (validate config before real run) + +param( + [switch]$SkipCleanup, + [switch]$DryRun +) + +$ErrorActionPreference = "Continue" + +$env:ORACLE_HOME = "C:\Users\Administrator\Downloads\WINDOWS.X64_193000_db_home" +$env:ORACLE_SID = "ROA" +$env:PATH = "$env:ORACLE_HOME\bin;$env:PATH" + +$DBID = "1363569330" +$LISTENER_IP = "10.0.20.36" +$LISTENER_PORT = "1521" +$SERVICE_NAME = "ROA" + +Write-Host "============================================" +Write-Host "RMAN Restore TO PRIMARY (Failback)" +Write-Host "============================================" +Write-Host "" +Write-Host "Target: PRIMARY $LISTENER_IP" +Write-Host "Database: $SERVICE_NAME" +Write-Host "DBID: $DBID" +Write-Host "Backups: F:\ROA\autobackup (NFS from Proxmox)" +Write-Host "DryRun: $DryRun" +Write-Host "" + +# Sanity: confirm we're running on PRIMARY, not on DR. +# The host may have several 10.0.20.* addresses, so treat the result as a list. +$myIPs = @((Get-NetIPAddress -AddressFamily IPv4 | Where-Object { $_.IPAddress -like "10.0.20.*" }).IPAddress) +if ($myIPs -contains "10.0.20.37") { + Write-Host "ERROR: This script is for PRIMARY (10.0.20.36), but I'm running on DR (10.0.20.37)" -ForegroundColor Red + Write-Host "Did you mean to run rman_restore_from_zero.ps1 instead?" + exit 1 +} +if ($myIPs -notcontains $LISTENER_IP) { + Write-Host "WARNING: My IP is $($myIPs -join ', '), expected $LISTENER_IP" -ForegroundColor Yellow + Write-Host "Continue anyway? (Ctrl+C to abort, Enter to continue)" + if (-not $DryRun) { Read-Host } +} + +# Verify Oracle version is 19c: RMAN backups are 19.3 / compatible=19.0.0. A 21c+ home can +# restore them only with an extra upgrade step, which this script does not perform. +Write-Host "[CHECK] Verifying Oracle version on PRIMARY..." 
+$sqlplusBin = Join-Path $env:ORACLE_HOME "bin\sqlplus.exe" +if (-not (Test-Path $sqlplusBin)) { + Write-Host "ERROR: sqlplus.exe not found at $sqlplusBin" -ForegroundColor Red + Write-Host " Is Oracle Database 19c installed at ORACLE_HOME=$env:ORACLE_HOME ?" + exit 1 +} +$versionOutput = & $sqlplusBin -V 2>&1 | Out-String +# Expected: "SQL*Plus: Release 19.0.0.0.0 ..." +if ($versionOutput -match "Release\s+(\d+)\.(\d+)") { + $majorVersion = [int]$Matches[1] + Write-Host "[CHECK] Detected Oracle major version: $majorVersion" + if ($majorVersion -ne 19) { + Write-Host "" + Write-Host "ERROR: Oracle major version is $majorVersion, this script supports only 19c." -ForegroundColor Red + Write-Host " Backups are 19.3 (compatible=19.0.0)." -ForegroundColor Red + Write-Host "" + Write-Host " 21c CAN technically restore 19c backups, but requires an extra" -ForegroundColor Yellow + Write-Host " STARTUP UPGRADE + dbupgrade step (~30-60 min) which this script" -ForegroundColor Yellow + Write-Host " does NOT perform. Doing a cross-version upgrade during a failback" -ForegroundColor Yellow + Write-Host " is risky — recommended path is: install 19c, failback, then upgrade" -ForegroundColor Yellow + Write-Host " to 21c later in a planned window (DBUA or Data Pump)." -ForegroundColor Yellow + Write-Host "" + Write-Host " Install Oracle Database 19c (WINDOWS.X64_193000_db_home.zip) and re-run." -ForegroundColor Yellow + exit 1 + } + Write-Host "[OK] Oracle 19c confirmed" +} else { + Write-Host "WARNING: Could not parse Oracle version from sqlplus output:" -ForegroundColor Yellow + Write-Host $versionOutput + Write-Host "Continue anyway? (yes/no)" + $confirm = Read-Host + if ($confirm -ne "yes") { exit 1 } +} + +# Verify NFS mount +if (-not (Test-Path "F:\ROA\autobackup")) { + Write-Host "ERROR: F:\ROA\autobackup not accessible!" 
-ForegroundColor Red + Write-Host "" + Write-Host "Mount NFS first:" + Write-Host " mount -o anon,nolock,mtype=hard,timeout=60 10.0.20.202:/mnt/pve/oracle-backups F:" + exit 1 +} +Write-Host "[OK] F:\ROA\autobackup is accessible" + +# Verify backup freshness: warn if newest backup is > 2 hours old +$newest = Get-ChildItem "F:\ROA\autobackup\*.BKP" -ErrorAction SilentlyContinue | + Sort-Object LastWriteTime -Descending | Select-Object -First 1 +if ($newest) { + $ageHours = ((Get-Date) - $newest.LastWriteTime).TotalHours + Write-Host "[INFO] Newest backup: $($newest.Name) ($([math]::Round($ageHours,1))h old)" + if ($ageHours -gt 2) { + Write-Host "WARNING: Newest backup is more than 2 hours old." -ForegroundColor Yellow + Write-Host " Did you take a final backup on DR before starting failback?" + Write-Host " (See FAILBACK_PROCEDURE.md Pas 3-4)" + if (-not $DryRun) { + $confirm = Read-Host "Continue anyway? (yes/no)" + if ($confirm -ne "yes") { exit 1 } + } + } +} else { + Write-Host "ERROR: No .BKP files found in F:\ROA\autobackup" -ForegroundColor Red + exit 1 +} + +# Create local directories (skipped under -DryRun so that a dry run really makes no changes) +if (-not $DryRun) { + New-Item -ItemType Directory -Path "D:\oracle\temp" -Force -ErrorAction SilentlyContinue | Out-Null + New-Item -ItemType Directory -Path "D:\oracle\logs" -Force -ErrorAction SilentlyContinue | Out-Null + New-Item -ItemType Directory -Path "C:\Users\oracle\oradata\ROA" -Force -ErrorAction SilentlyContinue | Out-Null + New-Item -ItemType Directory -Path "C:\Users\oracle\recovery_area\ROA" -Force -ErrorAction SilentlyContinue | Out-Null + New-Item -ItemType Directory -Path "C:\Users\oracle\admin\ROA\adump" -Force -ErrorAction SilentlyContinue | Out-Null + New-Item -ItemType Directory -Path "C:\Users\oracle\admin\ROA\pfile" -Force -ErrorAction SilentlyContinue | Out-Null +} + +if ($DryRun) { + Write-Host "" + Write-Host "[DRYRUN] Would proceed to STEP 1 (cleanup), STEP 2 (restore), STEP 3 (configure listener)." + Write-Host "[DRYRUN] No changes made. Re-run without -DryRun to execute." 
+ exit 0 +} + +# ============================================ +# STEP 1: CLEANUP — delete any pre-existing DB +# ============================================ +if (-not $SkipCleanup) { + Write-Host "" + Write-Host "============================================" + Write-Host "STEP 1: CLEANUP — remove any existing DB" + Write-Host "============================================" + + if (Test-Path "D:\oracle\scripts\cleanup_database.ps1") { + & "D:\oracle\scripts\cleanup_database.ps1" /SILENT + if ($LASTEXITCODE -ne 0) { + Write-Host "ERROR: Cleanup failed!" -ForegroundColor Red + exit 1 + } + } else { + Write-Host "WARNING: cleanup_database.ps1 not found, doing minimal cleanup..." -ForegroundColor Yellow + $svc = Get-Service -Name "OracleService$SERVICE_NAME" -ErrorAction SilentlyContinue + if ($svc) { + Stop-Service -Name "OracleService$SERVICE_NAME" -Force -ErrorAction SilentlyContinue + & oradim -delete -sid $SERVICE_NAME 2>&1 | Out-Null + } + Get-ChildItem "C:\Users\oracle\oradata\ROA" -ErrorAction SilentlyContinue | Remove-Item -Recurse -Force -ErrorAction SilentlyContinue + Get-ChildItem "C:\Users\oracle\recovery_area\ROA" -ErrorAction SilentlyContinue | Remove-Item -Recurse -Force -ErrorAction SilentlyContinue + } + Write-Host "[OK] Cleanup complete" +} else { + Write-Host "[SKIP] Cleanup skipped (-SkipCleanup)" +} + +# ============================================ +# STEP 2: RESTORE +# ============================================ +Write-Host "" +Write-Host "============================================" +Write-Host "STEP 2: RESTORE from F:\ backups" +Write-Host "============================================" + +# Step 2.1: Create PFILE +$pfilePath = "C:\Users\oracle\admin\ROA\pfile\initROA.ora" +$pfileContent = @" +# Initialization parameters for ROA — PRIMARY (failback) +# Auto-generated by rman_restore_to_primary.ps1 — $(Get-Date -Format 'yyyy-MM-dd HH:mm:ss') + +db_name=ROA +db_unique_name=ROA + +memory_target=2048M +memory_max_target=2048M + 
+control_files=('C:\Users\oracle\oradata\ROA\control01.ctl', 'C:\Users\oracle\recovery_area\ROA\control02.ctl') +db_recovery_file_dest='C:\Users\oracle\recovery_area' +db_recovery_file_dest_size=100G +audit_file_dest='C:\Users\oracle\admin\ROA\adump' + +log_archive_format=%t_%s_%r.dbf + +compatible=19.0.0 + +nls_language=AMERICAN +nls_territory=AMERICA + +processes=300 +sessions=472 + +diagnostic_dest='C:\Users\oracle' +"@ +$pfileContent | Out-File -FilePath $pfilePath -Encoding ASCII +Write-Host "[OK] PFILE created at $pfilePath" + +# Step 2.2: Create Oracle service +Write-Host "[2.2] Creating Oracle service $SERVICE_NAME..." +$svc = Get-Service -Name "OracleService$SERVICE_NAME" -ErrorAction SilentlyContinue +if (-not $svc) { + & oradim -new -sid $SERVICE_NAME -startmode auto -pfile $pfilePath 2>&1 | Out-Null + if ($LASTEXITCODE -ne 0) { + Write-Host "ERROR: oradim -new failed" -ForegroundColor Red + exit 1 + } + Write-Host "[OK] Service OracleService$SERVICE_NAME created" + Start-Sleep -Seconds 3 +} else { + Write-Host "[OK] Service OracleService$SERVICE_NAME already exists" +} + +# Step 2.3: Startup NOMOUNT +Write-Host "[2.3] Starting database NOMOUNT..." +@" +WHENEVER SQLERROR CONTINUE +SHUTDOWN ABORT; +EXIT; +"@ | & sqlplus -S / as sysdba 2>&1 | Out-Null + +@" +STARTUP NOMOUNT PFILE='$pfilePath'; +EXIT; +"@ | & sqlplus -S / as sysdba 2>&1 | Out-Null + +if ($LASTEXITCODE -ne 0) { + Write-Host "ERROR: STARTUP NOMOUNT failed" -ForegroundColor Red + exit 1 +} +Write-Host "[OK] Database in NOMOUNT mode" + +# Step 2.4: Copy ALL backups from F:\ to local recovery area (failback uses ALL, not test subset) +Write-Host "[2.4] Copying ALL backups from F:\ROA\autobackup to local recovery area..." 
+$backupFiles = @(Get-ChildItem "F:\ROA\autobackup\*.BKP" -ErrorAction Continue) +if ($backupFiles.Count -lt 2) { + Write-Host "ERROR: Insufficient backups (found $($backupFiles.Count))" -ForegroundColor Red + exit 1 +} + +# The local autobackup directory is not created anywhere else, so create it before copying +$localBackupDir = "C:\Users\oracle\recovery_area\ROA\autobackup" +New-Item -ItemType Directory -Path $localBackupDir -Force | Out-Null + +Write-Host "[INFO] Copying $($backupFiles.Count) files (~$([math]::Round(($backupFiles | Measure-Object -Property Length -Sum).Sum / 1GB, 2)) GB)" +$copyErrors = @() +foreach ($f in $backupFiles) { + try { + Copy-Item $f.FullName $localBackupDir -Force -ErrorAction Stop + } catch { + $copyErrors += "$($f.Name): $_" + } +} +if ($copyErrors.Count -gt 0) { + Write-Host "ERROR: $($copyErrors.Count) copy failures:" -ForegroundColor Red + $copyErrors | ForEach-Object { Write-Host " $_" -ForegroundColor Red } + exit 1 +} +Write-Host "[OK] All backups copied" + +# Step 2.5: RMAN restore +$rmanScript = "D:\oracle\temp\restore_to_primary.rman" +$logFile = "D:\oracle\logs\restore_to_primary.log" + +# Note: RECOVER DATABASE (without NOREDO) applies all archivelogs through the final SCN. +# RECOVER DATABASE NOREDO is for the DR test path where archivelogs may be stale. +# For failback we want every committed transaction up to the final backup. +$rmanContent = @" +SET DBID $DBID; + +RUN { + ALLOCATE CHANNEL ch1 DEVICE TYPE DISK; + RESTORE CONTROLFILE FROM AUTOBACKUP; + RELEASE CHANNEL ch1; +} + +ALTER DATABASE MOUNT; + +CATALOG START WITH 'C:/USERS/ORACLE/RECOVERY_AREA/ROA/AUTOBACKUP' NOPROMPT; +CROSSCHECK BACKUP; +CROSSCHECK ARCHIVELOG ALL; +DELETE NOPROMPT EXPIRED BACKUP; + +RUN { + ALLOCATE CHANNEL ch1 DEVICE TYPE DISK; + ALLOCATE CHANNEL ch2 DEVICE TYPE DISK; + RESTORE DATABASE; + RECOVER DATABASE; + RELEASE CHANNEL ch1; + RELEASE CHANNEL ch2; +} + +ALTER DATABASE OPEN RESETLOGS; + +LIST INCARNATION; + +DELETE NOPROMPT OBSOLETE; + +EXIT; +"@ +$rmanContent | Out-File -FilePath $rmanScript -Encoding ASCII + +Write-Host "[2.5] Running RMAN restore (10-25 min)..." 
+Write-Host " Log: $logFile" +& rman target / cmdfile=$rmanScript log=$logFile + +if ($LASTEXITCODE -ne 0) { + Write-Host "ERROR: RMAN restore failed. Last 40 lines of log:" -ForegroundColor Red + Get-Content $logFile -Tail 40 + exit 1 +} +Write-Host "[OK] RMAN restore complete" + +# ============================================ +# STEP 3: CONFIGURE — SPFILE, listener, register +# ============================================ +Write-Host "" +Write-Host "============================================" +Write-Host "STEP 3: CONFIGURE listener and SPFILE" +Write-Host "============================================" + +# Step 3.1: SPFILE +Write-Host "[3.1] Creating SPFILE..." +@" +CREATE SPFILE FROM PFILE='$pfilePath'; +EXIT; +"@ | & sqlplus -S / as sysdba 2>&1 | Out-Null + +# Step 3.2: Recreate service to use SPFILE +Write-Host "[3.2] Reconfiguring service to use SPFILE..." +@" +SHUTDOWN IMMEDIATE; +EXIT; +"@ | & sqlplus -S / as sysdba 2>&1 | Out-Null +Start-Sleep -Seconds 3 + +& oradim -delete -sid $SERVICE_NAME 2>&1 | Out-Null +Start-Sleep -Seconds 2 +& oradim -new -sid $SERVICE_NAME -startmode auto -spfile 2>&1 | Out-Null + +# Step 3.3: Startup with SPFILE +@" +STARTUP; +EXIT; +"@ | & sqlplus -S / as sysdba 2>&1 | Out-Null +Start-Sleep -Seconds 3 +Write-Host "[OK] Database started with SPFILE" + +# Step 3.4: Configure listener.ora to bind on PRIMARY IP +Write-Host "[3.4] Configuring listener for $LISTENER_IP`:$LISTENER_PORT..." 
+$listenerOra = "$env:ORACLE_HOME\network\admin\listener.ora" +$listenerContent = @" +LISTENER = + (DESCRIPTION_LIST = + (DESCRIPTION = + (ADDRESS = (PROTOCOL = TCP)(HOST = $LISTENER_IP)(PORT = $LISTENER_PORT)) + (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521)) + ) + ) + +SID_LIST_LISTENER = + (SID_LIST = + (SID_DESC = + (GLOBAL_DBNAME = $SERVICE_NAME) + (ORACLE_HOME = $env:ORACLE_HOME) + (SID_NAME = $SERVICE_NAME) + ) + ) +"@ +$listenerContent | Out-File -FilePath $listenerOra -Encoding ASCII + +Set-Service -Name "OracleOraDB19Home1TNSListener" -StartupType Automatic -ErrorAction SilentlyContinue +Restart-Service -Name "OracleOraDB19Home1TNSListener" -Force -ErrorAction SilentlyContinue + +Start-Sleep -Seconds 3 + +# Step 3.5: Register DB with listener +@" +ALTER SYSTEM SET LOCAL_LISTENER='(ADDRESS=(PROTOCOL=TCP)(HOST=$LISTENER_IP)(PORT=$LISTENER_PORT))' SCOPE=BOTH; +ALTER SYSTEM REGISTER; +EXIT; +"@ | & sqlplus -S / as sysdba 2>&1 | Out-Null +Write-Host "[OK] Database registered with listener" + +# ============================================ +# STEP 4: VERIFY +# ============================================ +Write-Host "" +Write-Host "============================================" +Write-Host "STEP 4: VERIFY" +Write-Host "============================================" + +@" +SET PAGESIZE 100 LINESIZE 200 +COLUMN info FORMAT A100 +SELECT 'DB: ' || NAME || ' | OPEN_MODE: ' || OPEN_MODE || ' | DBID: ' || DBID || ' | SCN: ' || CURRENT_SCN AS info FROM V`$DATABASE; +SELECT 'INSTANCE: ' || INSTANCE_NAME || ' | STATUS: ' || STATUS AS info FROM V`$INSTANCE; +SELECT 'LISTENER_HOST: ' || NETWORK_NAME || ' | SERVICE: ' || NAME AS info FROM V`$ACTIVE_SERVICES WHERE NAME = '$SERVICE_NAME'; +SELECT 'TABLESPACES: ' || COUNT(*) AS info FROM DBA_TABLESPACES; +SELECT 'DATAFILES: ' || COUNT(*) AS info FROM DBA_DATA_FILES; +SELECT 'USER_TABLES: ' || COUNT(*) AS info FROM DBA_TABLES + WHERE OWNER NOT IN 
('SYS','SYSTEM','XDB','GSMADMIN_INTERNAL','APPQOSSYS','OUTLN','DBSNMP','WMSYS','OLAPSYS','MDSYS','CTXSYS','EXFSYS','ORDSYS','LBACSYS'); +EXIT; +"@ | & sqlplus -S / as sysdba + +# Listener status +Write-Host "" +Write-Host "[4.1] Listener status:" +& lsnrctl status + +Write-Host "" +Write-Host "============================================" +Write-Host "Failback restore COMPLETE" +Write-Host "============================================" +Write-Host "" +Write-Host "Restore log: $logFile" +Write-Host "" +Write-Host "Verify SCN above is >= the SCN you noted on DR after final archive log switch." +Write-Host "" +Write-Host "Next steps (manual):" +Write-Host " 1. Test app connectivity: sqlplus user/pass@$LISTENER_IP`:$LISTENER_PORT/$SERVICE_NAME" +Write-Host " 2. Update app connection strings: 10.0.20.37 -> $LISTENER_IP" +Write-Host " 3. Re-enable scheduled tasks: rman_backup.bat, rman_backup_incremental.bat, transfer_backups.ps1" +Write-Host " 4. Stop DR VM: ssh root@10.0.20.203 'qm stop 109'" +Write-Host " 5. RMAN catalog: open 'rman target /' and run RESET DATABASE TO INCARNATION " +Write-Host "" + +exit 0