Oracle DR: Complete cleanup and restore scripts with Proxmox integration

- Remove outdated planning documents and implementation guides
- Update README with comprehensive DR procedures and monitoring
- Enhance rman_restore_from_zero.cmd with SPFILE creation and auto-start
- Add Proxmox monitoring and weekly test scripts
- Archive old implementation documentation

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

# Oracle ROA - Disaster Recovery Setup

## Backup-Based DR: Windows PRIMARY (10.0.20.36) → Linux DR (10.0.20.37)

# 🛡️ Oracle DR System - Complete Architecture

**Database:** ROA (Accounting)

**Strategy:** 4-Level Backup Protection

**RTO:** 45-75 minutes

**RPO:** Max 1 day (last backup at 02:00 AM)

## 📊 System Overview

```
┌─────────────────────────────────────────────────────────────────┐
│  PRODUCTION ENVIRONMENT                                         │
├─────────────────────────────────────────────────────────────────┤
│  PRIMARY SERVER (10.0.20.36)                                    │
│  Windows Server + Oracle 19c                                    │
│  ┌──────────────────────────────┐                               │
│  │ Database: ROA                │                               │
│  │ Size: ~80 GB                 │                               │
│  │ Tables: 42,625               │                               │
│  └──────────────────────────────┘                               │
│                 │                                               │
│                 ▼ Backups (Daily)                               │
│  ┌──────────────────────────────┐                               │
│  │ 02:30 - FULL backup (6-7 GB) │                               │
│  │ 13:00 - CUMULATIVE (200 MB)  │                               │
│  │ 18:00 - CUMULATIVE (300 MB)  │                               │
│  └──────────────────────────────┘                               │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 │ SSH Transfer (Port 22)
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│  DR ENVIRONMENT                                                 │
├─────────────────────────────────────────────────────────────────┤
│  PROXMOX HOST (10.0.20.202 - pveelite)                          │
│  ┌──────────────────────────────┐                               │
│  │ Backup Storage (NFS Server)  │◄─────── Monitoring Scripts    │
│  │ /mnt/pve/oracle-backups/     │         /opt/scripts/         │
│  │   └── ROA/autobackup/        │                               │
│  └──────────────────────────────┘                               │
│                 │                                               │
│                 │ NFS Mount (F:\)                               │
│                 ▼                                               │
│  ┌──────────────────────────────┐                               │
│  │ DR VM 109 (10.0.20.37)       │                               │
│  │ Windows Server + Oracle 19c  │                               │
│  │ Status: OFF (normally)       │                               │
│  │ Starts for: Tests or Disaster│                               │
│  └──────────────────────────────┘                               │
└─────────────────────────────────────────────────────────────────┘
```

## 🎯 Quick Actions

### ⚡ Emergency DR Activation (Production Down!)

```bash
# 1. Start DR VM
ssh root@10.0.20.202 "qm start 109"

# 2. Connect to VM (wait 3 min for boot)
ssh -p 22122 romfast@10.0.20.37

# 3. Run restore inside the VM (takes ~10-15 minutes)
D:\oracle\scripts\rman_restore_from_zero.cmd

# 4. Database is now RUNNING - update app connections to 10.0.20.37
```

---

## 📋 SYSTEM COMPONENTS

### PRIMARY Server (10.0.20.36 - Windows)
- Oracle 19c SE2 database ROA (production)
- Daily RMAN backup at 02:00 AM (COMPRESSED)
- DR transfer at 03:00 AM
- Copy to external HDD at 21:00

### DR Server (10.0.20.37 - Linux LXC 109)
- Docker container: `oracle-standby`
- Oracle 19c installed (database STOPPED until a disaster)
- Receives backups automatically from PRIMARY
- Retention: 1 backup (ONLY the most recent one, which is what matters for accounting!)

---

## 🗂️ FILES IN THIS DIRECTORY

| File | Description | Used On |
|--------|-----------|------------|
| `01_rman_backup_upgraded.txt` | Upgraded RMAN script with compression | PRIMARY (Windows) |
| `02_transfer_to_dr.ps1` | PowerShell script that transfers backups → DR | PRIMARY (Windows) |
| `03_setup_dr_transfer_task.ps1` | Task Scheduler setup for the transfer | PRIMARY (Windows) |
| `04_full_dr_restore.sh` | COMPLETE restore script on DR (disaster recovery) | DR (Linux) |
| `05_test_restore_dr.sh` | MONTHLY test restore (verifies DR capability) | DR (Linux) |
| `06_quick_verify_backups.sh` | DAILY backup verification (monitoring) | DR (Linux) |
| **OPTIONAL - Incremental Backups (improved RPO):** | | |
| `01b_rman_backup_incremental.txt` | Incremental RMAN script (midday) | PRIMARY (Windows) |
| `02b_transfer_incremental_to_dr.ps1` | Incremental transfer → DR | PRIMARY (Windows) |
| `03b_setup_incremental_tasks.ps1` | Task setup for incremental backups | PRIMARY (Windows) |
| **Documentation:** | | |
| `STRATEGIE_BACKUP_CONTABILITATE.md` | Documentation of the complete strategy | Reference |
| `STRATEGIE_INCREMENTAL.md` | Incremental backup for a better RPO (OPTIONAL) | Reference |
| `PLAN_BACKUP_DR_SIMPLE.md` | Original detailed technical plan | Reference |
| `VERIFICARE_DR.md` | Guide to verifying and testing DR capability | Reference |
| `RATIONAL_RETENTIE.md` | Rationale for REDUNDANCY 1 for accounting | Reference |
| `README.md` | This file - quick start guide | Reference |

---

## 🚀 QUICK SETUP (Quick Start)

### Step 1: Set Up SSH Keys (PRIMARY → DR)

```powershell
# On PRIMARY (10.0.20.36) - PowerShell as Administrator
ssh-keygen -t rsa -b 4096 -f "$env:USERPROFILE\.ssh\id_rsa" -N '""'

# Show the public key
Get-Content "$env:USERPROFILE\.ssh\id_rsa.pub"
# Copy the OUTPUT
```

```bash
# On the DR server (10.0.20.37)
ssh root@10.0.20.37

# Add the public key
mkdir -p /root/.ssh
chmod 700 /root/.ssh
nano /root/.ssh/authorized_keys
# PASTE the public key here, then save (Ctrl+X, Y, Enter)
chmod 600 /root/.ssh/authorized_keys

exit
```

```powershell
# Test the connection (on PRIMARY)
ssh -i "$env:USERPROFILE\.ssh\id_rsa" root@10.0.20.37 "echo 'SSH OK'"
# You should see "SSH OK" WITHOUT a password prompt!
```
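
If the passwordless login still asks for a password, file permissions are a frequent cause. With OpenSSH's default `StrictModes yes`, sshd refuses to use an `authorized_keys` file when the file or its directory has permissions that are too open. Below is a minimal sketch of a check for the exact modes used above, meant to run on the DR server. It assumes GNU `stat`, and `check_ssh_perms` is a hypothetical helper, not one of the scripts in this directory:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the DR scripts): check the modes that
# Step 1 sets: ~/.ssh = 700 and ~/.ssh/authorized_keys = 600.
# Requires GNU stat (Linux).

check_ssh_perms() {
    local ssh_dir="${1:-$HOME/.ssh}"
    local keys="$ssh_dir/authorized_keys"
    local rc=0 mode

    if [ ! -d "$ssh_dir" ]; then
        echo "MISSING: $ssh_dir"
        return 1
    fi
    mode=$(stat -c '%a' "$ssh_dir")
    if [ "$mode" != "700" ]; then
        echo "BAD: $ssh_dir has mode $mode (expected 700)"
        rc=1
    fi

    if [ ! -f "$keys" ]; then
        echo "MISSING: $keys"
        return 1
    fi
    mode=$(stat -c '%a' "$keys")
    if [ "$mode" != "600" ]; then
        echo "BAD: $keys has mode $mode (expected 600)"
        rc=1
    fi

    if [ "$rc" -eq 0 ]; then
        echo "OK: $ssh_dir"
    fi
    return "$rc"
}

# Usage on the DR server:
#   check_ssh_perms /root/.ssh
```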
---
### Step 2: Upgrade the RMAN Backup Script (PRIMARY)

```powershell
# On PRIMARY - back up the old script
Copy-Item "D:\rman_backup\rman_backup.txt" "D:\rman_backup\rman_backup.txt.backup_$(Get-Date -Format 'yyyyMMdd')"

# Copy the contents of 01_rman_backup_upgraded.txt
# into D:\rman_backup\rman_backup.txt

# OR directly:
# Copy-Item "\\path\to\01_rman_backup_upgraded.txt" "D:\rman_backup\rman_backup.txt"
```

**What the upgrade does:**
- ✅ Adds compression → reduces the backup from 23 GB to ~8 GB
- ✅ Includes ARCHIVELOG DELETE INPUT
- ✅ REDUNDANCY 1 (keeps only the latest backup, which is what matters for accounting!)
- ✅ BACKUP VALIDATE (integrity check after the backup)
- ✅ Parallelism: 2 channels (faster)

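
On PRIMARY, `REDUNDANCY 1` is enforced by RMAN itself (`DELETE OBSOLETE` removes what the policy no longer needs). If the DR-side copy should follow the same "latest backup only" rule, a sketch like the one below could list the FULL pieces that a cleanup would remove. It assumes the `FULL_*.BKP` naming shown for the Proxmox storage elsewhere in this README. `obsolete_full_backups` is a hypothetical name, and the function only prints; it never deletes:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every FULL_*.BKP in a directory except the
# newest one (by modification time), i.e. what a "keep only the latest
# backup" rule would remove. It only prints; it never deletes anything.
# Assumes piece names without spaces or newlines.

obsolete_full_backups() {
    local dir="$1"
    # ls -t lists newest first; tail -n +2 drops that newest entry.
    ls -1t "$dir"/FULL_*.BKP 2>/dev/null | tail -n +2
}

# Usage (review the list before removing anything by hand):
#   obsolete_full_backups /opt/oracle/backups/primary
```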
---
### Step 3: Install the Transfer Script (PRIMARY)

```powershell
# Create the logs directory
New-Item -ItemType Directory -Force -Path "D:\rman_backup\logs"

# Copy the script
Copy-Item "\\path\to\02_transfer_to_dr.ps1" "D:\rman_backup\transfer_to_dr.ps1"

# Manual test
PowerShell -ExecutionPolicy Bypass -File "D:\rman_backup\transfer_to_dr.ps1"
```
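
The transfer script (`02_transfer_to_dr.ps1`) is PowerShell, and its internals are not shown in this README. Purely to illustrate one way of sending only new backup pieces, here is a shell sketch that lists the `*.BKP` files modified after a marker file, where the marker is refreshed after each successful transfer. The marker convention and the function name are assumptions, not documented behaviour of `transfer_to_dr.ps1`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: list *.BKP files modified after a marker file.
# If the marker does not exist yet (first run), list every *.BKP file.

files_since_last_transfer() {
    local dir="$1" marker="$2"
    if [ -e "$marker" ]; then
        find "$dir" -maxdepth 1 -type f -name '*.BKP' -newer "$marker" | sort
    else
        find "$dir" -maxdepth 1 -type f -name '*.BKP' | sort
    fi
}

# Usage (take the new marker BEFORE listing, so pieces written during the
# copy are picked up next time; promote it only after a successful copy):
#   touch "$marker.new"
#   files_since_last_transfer "$dir" "$marker"    # ...copy these files...
#   mv "$marker.new" "$marker"
```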
---
### Step 4: Set Up Task Scheduler (PRIMARY)

```powershell
# Run the setup script as Administrator
PowerShell -ExecutionPolicy Bypass -File "\\path\to\03_setup_dr_transfer_task.ps1"

# OR manually:
$action = New-ScheduledTaskAction -Execute "PowerShell.exe" `
    -Argument "-ExecutionPolicy Bypass -File D:\rman_backup\transfer_to_dr.ps1"

$trigger = New-ScheduledTaskTrigger -Daily -At "03:00AM"

$principal = New-ScheduledTaskPrincipal -UserId "SYSTEM" `
    -LogonType ServiceAccount -RunLevel Highest

Register-ScheduledTask -TaskName "Oracle_DR_Transfer" `
    -Action $action -Trigger $trigger -Principal $principal

# Verify
Get-ScheduledTask -TaskName "Oracle_DR_Transfer"
```
---
### Step 5: Set Up the DR Server (Linux)

```bash
# On the DR server (10.0.20.37)
ssh root@10.0.20.37

# The directories already exist; verify them:
ls -la /opt/oracle/backups/primary/
ls -la /opt/oracle/scripts/dr/
ls -la /opt/oracle/logs/dr/

# Check the Docker container
docker ps | grep oracle-standby

# Check the Oracle software
docker exec -u oracle oracle-standby bash -c 'ls -la $ORACLE_HOME/bin/rman'
```

**The restore script (`04_full_dr_restore.sh`) is already installed on DR!**

### 🧪 Weekly Test (Every Saturday)

```bash
# Automatic at 06:00 via cron, or manually:
ssh root@10.0.20.202 "/opt/scripts/weekly-dr-test-proxmox.sh"

# What it does:
# ✓ Starts VM → Restores DB → Tests → Cleanup → Shutdown
# ✓ Sends email report with results
```
---
## 🔥 DISASTER RECOVERY - Emergency Procedure

### When should you activate DR?

**✅ YES - Activate DR if:**
- PRIMARY server 10.0.20.36 has NOT responded for >30 minutes
- The Oracle database is corrupted (it will not open)
- Disk C:\ or D:\ has crashed
- Ransomware / malware

**❌ NO - Do not activate DR for:**
- Minor performance problems
- A user accidentally deleted a few records
- A Windows restart or maintenance
- Errors that can be fixed in <30 minutes

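
The ">30 minutes" rule is easy to misapply under pressure, because a single failed ping does not mean an outage. Below is a sketch of a check that declares PRIMARY down only if every probe over the whole window fails. It assumes Linux iputils `ping`, and the probe command can be overridden so that the logic is testable. Both `confirm_primary_down` and the `PROBE_CMD` variable are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: return 0 ("down") only if HOST fails every probe
# during the observation window; return 1 as soon as one probe succeeds.
#   $1 = host, $2 = number of probes, $3 = seconds between probes
# PROBE_CMD defaults to one ping with a 2-second timeout (iputils ping).

confirm_primary_down() {
    local host="$1" probes="$2" interval="$3"
    local probe="${PROBE_CMD:-ping -c 1 -W 2}"
    local i
    for ((i = 1; i <= probes; i++)); do
        # $probe is intentionally unquoted so "ping -c 1 -W 2" splits into words.
        if $probe "$host" >/dev/null 2>&1; then
            echo "PRIMARY $host answered on probe $i: do NOT activate DR"
            return 1
        fi
        if [ "$i" -lt "$probes" ]; then
            sleep "$interval"
        fi
    done
    echo "PRIMARY $host failed $probes probes: DR activation criteria met"
    return 0
}

# Usage (one probe per minute for 30 minutes):
#   confirm_primary_down 10.0.20.36 30 60
```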
---
### DR Procedure (60 minutes)

```bash
# Connect to the DR server
ssh root@10.0.20.37

# IMPORTANT: Verify that PRIMARY is REALLY down!
ping -c 10 10.0.20.36
# If it responds → STOP! Do NOT continue!

# Run the restore script
/opt/oracle/scripts/dr/full_dr_restore.sh

# Monitor progress
tail -f /opt/oracle/logs/dr/restore_*.log

# After ~45-60 minutes, check that the database is OPEN
docker exec -u oracle oracle-standby bash -c "
export ORACLE_SID=ROA
export ORACLE_HOME=/opt/oracle/product/19c/dbhome_1
\$ORACLE_HOME/bin/sqlplus / as sysdba <<< 'SELECT name, open_mode FROM v\$database;'
"

# Expected output:
# NAME      OPEN_MODE
# --------- ----------
# ROA       READ WRITE
```

**After the restore:**
1. Update connection strings: `10.0.20.36:1521/ROA` → `10.0.20.37:1521/ROA`
2. Notify users
3. Test applications
4. Monitor performance

### 📊 Check Backup Health

```bash
# Manual check (runs daily at 09:00 automatically)
ssh root@10.0.20.202 "/opt/scripts/oracle-backup-monitor-proxmox.sh"

# Output:
# Status: OK
# FULL backup age: 11 hours ✓
# CUMULATIVE backup age: 2 hours ✓
# Disk usage: 45% ✓
```
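
The monitor reports backup ages in hours. Below is a sketch of that calculation based on the timestamp embedded in file names such as `FULL_20251010_023001.BKP` (the naming used on the Proxmox storage), using the alert thresholds from the Key Metrics table (FULL > 25 h, CUMULATIVE > 7 h). It assumes GNU `date` and local-time timestamps. It is not the actual code of `oracle-backup-monitor-proxmox.sh`, and both function names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not the code of oracle-backup-monitor-proxmox.sh).
# File names are assumed to look like FULL_20251010_023001.BKP or
# INCR_20251010_130001.BKP, with the timestamp in local time.
# Requires GNU date (-d).

# Age in whole hours, relative to now or to an optional reference time ($2).
backup_age_hours() {
    local name ts file_epoch now_epoch
    name=$(basename "$1")
    ts=${name#*_}     # 20251010_023001.BKP
    ts=${ts%.BKP}     # 20251010_023001
    file_epoch=$(date -d "${ts:0:4}-${ts:4:2}-${ts:6:2} ${ts:9:2}:${ts:11:2}:${ts:13:2}" +%s) || return 1
    if [ -n "${2:-}" ]; then
        now_epoch=$(date -d "$2" +%s) || return 1
    else
        now_epoch=$(date +%s)
    fi
    echo $(( (now_epoch - file_epoch) / 3600 ))
}

# Compare with the Key Metrics alert thresholds: FULL > 25 h, CUMULATIVE > 7 h.
backup_status() {
    local name age limit
    name=$(basename "$1")
    case "$name" in
        FULL_*) limit=25 ;;
        INCR_*) limit=7 ;;
        *) echo "UNKNOWN: $name"; return 2 ;;
    esac
    age=$(backup_age_hours "$1" "${2:-}") || return 2
    if [ "$age" -gt "$limit" ]; then
        echo "WARNING: $name is ${age}h old (limit ${limit}h)"
        return 1
    fi
    echo "OK: $name is ${age}h old"
}

# Usage (on the Proxmox host):
#   backup_status /mnt/pve/oracle-backups/ROA/autobackup/FULL_20251010_023001.BKP
```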
---
## 📊 ARCHITECTURE FLOW

```
┌─────────────────────────────────────────────────┐
│  PRIMARY 10.0.20.36 (Windows)                   │
│                                                 │
│  02:00 → RMAN Backup COMPRESSED                 │
│        └─ FRA: ~8GB (vs 23GB original)          │
│        ↓                                        │
│  03:00 → Transfer DR (NEW)                      │
│        └─ SCP → 10.0.20.37                      │
│        ↓                                        │
│  21:00 → MareBackup (EXISTING)                  │
│        └─ Copy → E:\backup_roa\                 │
│                                                 │
└─────────────────────────────────────────────────┘
                         ↓ SSH/SCP
┌─────────────────────────────────────────────────┐
│  DR 10.0.20.37 (Linux LXC 109)                  │
│  Docker: oracle-standby                         │
│                                                 │
│  /opt/oracle/backups/primary/                   │
│    ├─ *.BKP (backup files)                      │
│    └─ Retention: 1 backup (latest only!)        │
│                                                 │
│  Database: STOPPED (started on disaster)        │
│                                                 │
│  On disaster:                                   │
│    → /opt/oracle/scripts/dr/full_dr_restore.sh  │
│    → RTO: 45-75 minutes                         │
│    → RPO: Max 1 day                             │
│                                                 │
└─────────────────────────────────────────────────┘
```

## 🗂️ Component Locations

### 📁 PRIMARY Server (10.0.20.36)
```
D:\rman_backup\
├── rman_backup_full.txt          # RMAN script for FULL backup
├── rman_backup_incremental.txt   # RMAN script for CUMULATIVE
├── transfer_to_dr.ps1            # Transfer FULL to Proxmox
└── transfer_incremental.ps1      # Transfer CUMULATIVE to Proxmox

Scheduled Tasks:
├── 02:30 - Oracle RMAN Full Backup
├── 13:00 - Oracle RMAN Cumulative Backup
└── 18:00 - Oracle RMAN Cumulative Backup
```

### 📁 PROXMOX Host (10.0.20.202)
```
/opt/scripts/
├── oracle-backup-monitor-proxmox.sh   # Daily backup monitoring
├── weekly-dr-test-proxmox.sh          # Weekly DR test
└── PROXMOX_NOTIFICATIONS_README.md    # Documentation

/mnt/pve/oracle-backups/ROA/autobackup/
├── FULL_20251010_023001.BKP           # Latest FULL backup
├── INCR_20251010_130001.BKP           # CUMULATIVE 13:00
└── INCR_20251010_180001.BKP           # CUMULATIVE 18:00

Cron Jobs:
0 9 * * * /opt/scripts/oracle-backup-monitor-proxmox.sh
0 6 * * 6 /opt/scripts/weekly-dr-test-proxmox.sh
```

### 📁 DR VM 109 (10.0.20.37) - When Running
```
D:\oracle\scripts\
├── rman_restore_from_zero.cmd   # Main restore script ⭐
├── cleanup_database.cmd         # Cleanup after test
└── mount-nfs.bat                # Mount F:\ at startup

F:\ (NFS mount from Proxmox)
└── ROA\autobackup\              # All backup files
```

---

## ✅ IMPLEMENTATION CHECKLIST

### Pre-Implementation
- [ ] Back up the old RMAN script (`rman_backup.txt.backup_*`)
- [ ] Check disk space on PRIMARY (C:\, D:\, E:\)
- [ ] Check disk space on DR (`/opt/oracle` >50 GB free)
- [ ] Container `oracle-standby` is running on DR

### SSH Setup (30 minutes)
- [ ] Generate SSH keys on PRIMARY
- [ ] Copy the public key to DR
- [ ] Test the passwordless connection
- [ ] Verify that the firewall allows port 22

### PRIMARY Setup (20 minutes)
- [ ] Upgrade `rman_backup.txt` (adds compression)
- [ ] Copy `transfer_to_dr.ps1` to `D:\rman_backup\`
- [ ] Create the `D:\rman_backup\logs\` directory
- [ ] Set up Task Scheduler (Oracle_DR_Transfer at 03:00 AM)
- [ ] Test the transfer script manually

### DR Setup (10 minutes)
- [ ] Verify the directories (`/opt/oracle/backups/primary`)
- [ ] Script `full_dr_restore.sh` installed
- [ ] Correct permissions (oracle:dba)
- [ ] Oracle container is functional

### Testing (60 minutes)
- [ ] Manual RMAN backup test (verify compression)
- [ ] Manual transfer test (verify that the backups reach DR)
- [ ] Check the transfer logs (no errors)
- [ ] Test restore on DR (OPTIONAL but RECOMMENDED!)

### Go-Live
- [ ] Monitor for 3 consecutive nights
- [ ] Review logs daily
- [ ] Document issues
- [ ] Update documentation

---

## 📈 MONITORING

### Daily Checks (5 minutes)

```powershell
# On PRIMARY - quick health check

# Check 1: Last backup
$lastBackup = Get-ChildItem "C:\Users\Oracle\recovery_area\ROA\BACKUPSET" -Recurse -File |
    Sort-Object LastWriteTime -Descending | Select-Object -First 1
$age = (Get-Date) - $lastBackup.LastWriteTime
# TotalHours, not Hours: .Hours is only the hour component and wraps after 24 h
Write-Host "Last backup: $([math]::Floor($age.TotalHours)) hours ago"

# Check 2: Transfer log
Get-Content "D:\rman_backup\logs\transfer_*.log" | Select-String "completed successfully" | Select-Object -Last 1

# Check 3: Disk space
Get-PSDrive C,D,E | Format-Table Name, @{L="Free(GB)";E={[math]::Round($_.Free/1GB,1)}}
```
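
Check 3 looks at free space on the Windows drives. On the Linux side (for example the Proxmox backup storage), the equivalent check against the 80% threshold from the Key Metrics table can be sketched by parsing POSIX `df -P` output. The function names are hypothetical, and the alert fires only when usage is above the limit, matching the table:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a disk-usage check on a Linux host.
# Relies on POSIX `df -P` output (one line per filesystem, no wrapping).

# Read `df -P <path>` output on stdin and print the Capacity value without "%".
# Line 2 describes the filesystem holding <path>; field 5 is "Capacity".
disk_usage_pct() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5; found = 1 } END { exit !found }'
}

# Compare a usage percentage with a limit (default 80, the Key Metrics
# threshold). Return 1 and print WARNING only when usage is ABOVE the limit.
usage_status() {
    local pct="$1" limit="${2:-80}"
    if [ "$pct" -gt "$limit" ]; then
        echo "WARNING: disk usage ${pct}% (limit ${limit}%)"
        return 1
    fi
    echo "OK: disk usage ${pct}%"
}

# Convenience wrapper: check the filesystem that holds <path>.
check_disk_usage() {
    local path="$1" limit="${2:-80}" pct
    pct=$(df -P "$path" | disk_usage_pct) || { echo "UNKNOWN: $path"; return 2; }
    usage_status "$pct" "$limit"
}

# Usage (on the Proxmox host):
#   check_disk_usage /mnt/pve/oracle-backups
```
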
## 🔄 How It Works

### Backup Flow (Daily)
```
PRIMARY                        PROXMOX
   │                              │
   ├─02:30─FULL─Backup───────────►│
   │  (6-7 GB)                    │
   │                              │
   ├─13:00─CUMULATIVE────────────►│
   │  (200 MB)                    │
   │                              │
   └─18:00─CUMULATIVE────────────►│
      (300 MB)                 Storage

                            ┌───────────┐
                            │ Monitor   │ 09:00 Daily
                            │ Check Age │ Alert if old
                            └───────────┘
```

### Restore Process
```
Start VM → Mount F:\ → Copy Backups → RMAN Restore → Database OPEN
2min       Auto        2min           8min           Ready!

Total Time: ~15 minutes
```

## 🔧 Manual Operations

### Test Individual Components

```bash
# 1. Test backup transfer (on PRIMARY)
D:\rman_backup\transfer_incremental.ps1

# 2. Test NFS mount (on VM 109)
mount -o rw,nolock,mtype=hard,timeout=60 10.0.20.202:/mnt/pve/oracle-backups F:
dir F:\ROA\autobackup

# 3. Test notification system
# (note: this rewrites the modification time of the real FULL backup file)
ssh root@10.0.20.202 "touch -d '2 days ago' /mnt/pve/oracle-backups/ROA/autobackup/*FULL*.BKP"
ssh root@10.0.20.202 "/opt/scripts/oracle-backup-monitor-proxmox.sh"
# Should send WARNING notification

# 4. Test database restore (on VM 109)
D:\oracle\scripts\rman_restore_from_zero.cmd
```

### Weekly Checks (10 minutes)

```bash
# On DR - list the most recent backups
ssh root@10.0.20.37 "ls -lth /opt/oracle/backups/primary/*.BKP | head -5"

# On DR - check backup status
ssh root@10.0.20.37 "/opt/oracle/scripts/dr/06_quick_verify_backups.sh"
```

### Force Actions

```bash
# Force backup now (on PRIMARY)
rman cmdfile=D:\rman_backup\rman_backup_incremental.txt

# Force cleanup VM (on VM 109)
D:\oracle\scripts\cleanup_database.cmd

# Force VM shutdown (hard stop; "qm shutdown 109" requests a clean guest shutdown)
ssh root@10.0.20.202 "qm stop 109"
```

### Monthly Tasks (MANDATORY!)

**First Sunday of the month - full TEST RESTORE:**

```bash
# On DR - test restore (takes 45-75 min)
ssh root@10.0.20.37
/opt/oracle/scripts/dr/05_test_restore_dr.sh

# Check the report
cat /opt/oracle/logs/dr/test_report_$(date +%Y%m%d).txt
```

- **Review:** Metrics, logs, disk space, RTO
- **Update:** Documentation, if necessary
- **Notify:** Management about the test result

## 🐛 Troubleshooting

### ❌ Backup Monitor Not Sending Alerts

```bash
# 1. Check that the templates exist
ssh root@10.0.20.202 "ls /usr/share/pve-manager/templates/default/oracle-*"

# 2. Reinstall templates
ssh root@10.0.20.202 "/opt/scripts/oracle-backup-monitor-proxmox.sh --install"

# 3. Check Proxmox notifications work
# (single quotes so $(hostname) is expanded on Proxmox, not on the local machine)
ssh root@10.0.20.202 'pvesh create /nodes/$(hostname)/apt/update'
# Should receive update notification
```
---
### "Transfer failed - SSH connection refused"

```powershell
# Test the connection
ping 10.0.20.37
ssh -v -i "$env:USERPROFILE\.ssh\id_rsa" root@10.0.20.37 "echo OK"
```

**Solutions:**
- Verify that the DR server is running
- Check the firewall (port 22)
- Regenerate the SSH keys

---

### "RMAN backup failed"

```sql
-- On PRIMARY
sqlplus / as sysdba

-- Check FRA usage
SELECT * FROM v$recovery_area_usage;

-- Manual cleanup (in RMAN, started with: rman target /)
RMAN> DELETE NOPROMPT OBSOLETE;
```

**Solutions:**
- Disk full → clean up old backups
- FRA quota exceeded → increase `DB_RECOVERY_FILE_DEST_SIZE`
- Oracle process crash → restart the database

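
When an RMAN run fails, the first `RMAN-` and `ORA-` codes in its log usually point to the cause (for example, `ORA-19809` when the recovery area limit is exceeded). Below is a sketch that lists the distinct error codes in a log file in order of first appearance. The function name is hypothetical, and the log path depends on how the backup task redirects RMAN output:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each distinct RMAN-/ORA- error code found in a
# log, in order of first appearance. Returns 1 if no code is found.

rman_error_codes() {
    local log="$1"
    # grep -o prints each match on its own line; awk keeps first occurrences;
    # the final "grep ." turns "no output" into a non-zero exit status.
    grep -oE '(RMAN|ORA)-[0-9]{5}' "$log" | awk '!seen[$0]++' | grep .
}

# Usage:
#   rman_error_codes /path/to/rman_backup.log
```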
---
### ❌ F:\ Drive Not Accessible in VM

```bash
# On VM 109:
# 1. Check NFS Client service
Get-Service | Where {$_.Name -like "*NFS*"}

# 2. Manual mount
mount -o rw,nolock,mtype=hard,timeout=60 10.0.20.202:/mnt/pve/oracle-backups F:

# 3. Check Proxmox NFS server
ssh root@10.0.20.202 "showmount -e localhost"
# Should show: /mnt/pve/oracle-backups 10.0.20.37
```

### ❌ VM Won't Start

```bash
# Check VM status
ssh root@10.0.20.202 "qm status 109"

# Check VM config
ssh root@10.0.20.202 "qm config 109 | grep -E 'memory|cores|bootdisk'"

# Force unlock if locked
ssh root@10.0.20.202 "qm unlock 109"

# Start with console (qm terminal needs a serial port configured on the VM;
# -t gives ssh an interactive terminal)
ssh -t root@10.0.20.202 "qm start 109 && qm terminal 109"
```

### ❌ Restore Fails

```bash
# 1. Check backup files exist
dir F:\ROA\autobackup\*.BKP

# 2. Check Oracle service
sc query OracleServiceROA

# 3. Check PFILE exists
dir C:\Users\oracle\admin\ROA\pfile\initROA.ora

# 4. View restore log
type D:\oracle\logs\restore_from_zero.log
```

### "Restore failed on DR"

```bash
# Compute backup file checksums (compare them with the values from PRIMARY)
md5sum /opt/oracle/backups/primary/*.BKP

# Check container logs
docker logs oracle-standby --tail 100

# Check Oracle alert log
docker exec oracle-standby tail -100 /opt/oracle/diag/rdbms/roa/ROA/trace/alert_ROA.log
```
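
Checksums computed on the DR side prove nothing on their own; they have to be compared with values taken where the backup was written. Below is a sketch of a manifest-based check using GNU coreutils `sha256sum`. It assumes that a `SHA256SUMS` file is written next to the backup pieces before transfer, which is a hypothetical convention, not something the current transfer scripts are known to produce. Both function names are also hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: write and verify a SHA-256 manifest for *.BKP files.
# Uses GNU coreutils sha256sum; the manifest name SHA256SUMS is an assumption.

# Run where the pieces are produced (before transfer).
write_manifest() {
    local dir="$1"
    ( cd "$dir" && sha256sum -- *.BKP > SHA256SUMS )
}

# Run on the receiving side (after transfer). --quiet prints only failures;
# the exit status is non-zero on any mismatch or missing file.
verify_manifest() {
    local dir="$1"
    ( cd "$dir" && sha256sum --quiet -c SHA256SUMS )
}

# Usage (on DR, after the manifest has been copied with the pieces):
#   verify_manifest /opt/oracle/backups/primary || echo "Backup set is damaged"
```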
## 📈 Monitoring & Metrics

### Key Metrics
| Metric | Target | Alert Threshold |
|--------|--------|-----------------|
| FULL Backup Age | < 24h | > 25h |
| CUMULATIVE Age | < 6h | > 7h |
| Backup Size | ~7 GB/day | > 10 GB |
| Restore Time | < 15 min | > 30 min |
| Disk Usage | < 80% | > 80% |

### Check Logs

```bash
# Backup logs (on PRIMARY)
Get-Content D:\rman_backup\logs\backup_*.log -Tail 50

# Transfer logs (on PRIMARY)
Get-Content D:\rman_backup\logs\transfer_*.log -Tail 50

# Monitoring logs (on Proxmox)
tail -50 /var/log/oracle-dr/*.log

# Restore logs (on VM 109)
type D:\oracle\logs\restore_from_zero.log
```

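
The Backup Size row (alert above 10 GB) can be checked by adding up the pieces of one backup day. Below is a sketch that assumes GNU `find -printf` and the `<TYPE>_YYYYMMDD_HHMMSS.BKP` naming used on the Proxmox storage. The function names are hypothetical, and the limit is compared in GiB (1024³ bytes), which is slightly larger than a decimal "10 GB":

```bash
#!/usr/bin/env bash
# Hypothetical helpers: total size of one day's backup pieces, compared with
# the Key Metrics "Backup Size" alert threshold. Requires GNU find (-printf).

# Print the total bytes of all *_<YYYYMMDD>_*.BKP files in a directory.
backup_day_bytes() {
    local dir="$1" day="$2"
    # %.0f instead of %d: awk sums in floating point, and some awk
    # implementations truncate %d at 2^31.
    find "$dir" -maxdepth 1 -type f -name "*_${day}_*.BKP" -printf '%s\n' \
        | awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Return 1 (WARNING) if the day's total exceeds the limit in GiB (default 10).
check_backup_size() {
    local dir="$1" day="$2" limit_gib="${3:-10}" bytes limit
    bytes=$(backup_day_bytes "$dir" "$day")
    limit=$(( limit_gib * 1024 * 1024 * 1024 ))
    if [ "$bytes" -gt "$limit" ]; then
        echo "WARNING: $day backups total $bytes bytes (> ${limit_gib} GiB)"
        return 1
    fi
    echo "OK: $day backups total $bytes bytes"
}

# Usage (on the Proxmox host):
#   check_backup_size /mnt/pve/oracle-backups/ROA/autobackup 20251010
```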
## 🔐 Security & Access

### SSH Keys Setup
```
PRIMARY (10.0.20.36) ──────► PROXMOX (10.0.20.202)
                     SSH Key
                     Port 22

LINUX WORKSTATION ─────────► PROXMOX (10.0.20.202)
                     SSH Key
                     Port 22

LINUX WORKSTATION ─────────► VM 109 (10.0.20.37)
                     SSH Key
                     Port 22122
```

### Required Credentials
- **PRIMARY**: Administrator (for scheduled tasks)
- **PROXMOX**: root (for scripts and VM control)
- **VM 109**: romfast (user), SYSTEM (Oracle service)

## 📅 Maintenance Schedule

| Day | Time | Action | Duration | Impact |
|-----|------|--------|----------|--------|
| Daily | 02:30 | FULL Backup | 30 min | None |
| Daily | 09:00 | Monitor Backups | 1 min | None |
| Daily | 13:00 | CUMULATIVE Backup | 5 min | None |
| Daily | 18:00 | CUMULATIVE Backup | 5 min | None |
| Saturday | 06:00 | DR Test | 30 min | None |

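
The Saturday DR test fits a plain cron line (`0 6 * * 6`), but the monthly full test restore on the first Sunday of the month does not. When both the day-of-month and day-of-week fields are restricted, cron runs the job if *either* field matches. A common workaround, sketched below, is to schedule the job on days 1-7 and exit unless the day is a Sunday. The helper name and the 06:00 time in the example are assumptions, and the code requires GNU `date`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given date (default: today) is the
# first Sunday of its month. Requires GNU date (-d, %u, %-d).

is_first_sunday() {
    local day="${1:-today}" dow dom
    dow=$(date -d "$day" +%u) || return 2   # 1 = Monday ... 7 = Sunday
    dom=$(date -d "$day" +%-d)              # day of month without leading zero
    [ "$dow" -eq 7 ] && [ "$dom" -le 7 ]
}

# Inside a wrapper script that cron runs on days 1-7:
#   is_first_sunday || exit 0
#   /opt/oracle/scripts/dr/05_test_restore_dr.sh
#
# Crontab-only equivalent (the % must be escaped inside crontab):
#   0 6 1-7 * * [ "$(date +\%u)" = 7 ] && /opt/oracle/scripts/dr/05_test_restore_dr.sh
```
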
## 🚨 Disaster Recovery Procedure

### When PRIMARY is DOWN:

1. **Confirm PRIMARY is unreachable**
   ```bash
   ping -c 4 10.0.20.36  # Should fail
   ```

2. **Start DR VM**
   ```bash
   ssh root@10.0.20.202 "qm start 109"
   ```

3. **Wait for boot (3 minutes)**

4. **Connect to DR VM**
   ```bash
   ssh -p 22122 romfast@10.0.20.37
   ```

5. **Run restore**
   ```cmd
   D:\oracle\scripts\rman_restore_from_zero.cmd
   ```

6. **Verify database**
   ```sql
   sqlplus / as sysdba
   SELECT name, open_mode FROM v$database;
   -- Should show: ROA, READ WRITE
   ```

7. **Update application connections**
   - Change from: 10.0.20.36:1521/ROA
   - Change to: 10.0.20.37:1521/ROA

8. **Monitor DR system**
   - Database is now production
   - Do NOT run cleanup!
   - Keep VM running

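
Instead of waiting a fixed 3 minutes in step 3, you can poll the VM's SSH port (22122) until it accepts connections. Below is a sketch using bash's `/dev/tcp` pseudo-device and coreutils `timeout`. The function name and the 300-second default are assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until HOST:PORT accepts TCP connections.
#   $1 = host, $2 = port, $3 = max seconds to wait (default 300)
# Uses bash's /dev/tcp redirection and coreutils `timeout`.

wait_for_port() {
    local host="$1" port="$2" max="${3:-300}"
    local deadline=$(( SECONDS + max ))
    while (( SECONDS < deadline )); do
        # Each attempt is capped at 2 s so a silently dropped SYN cannot hang us.
        if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
            echo "$host:$port is reachable"
            return 0
        fi
        sleep 1
    done
    echo "timed out after ${max}s waiting for $host:$port"
    return 1
}

# Usage (steps 2-4 combined):
#   ssh root@10.0.20.202 "qm start 109" &&
#     wait_for_port 10.0.20.37 22122 &&
#     ssh -p 22122 romfast@10.0.20.37
```
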
## 📝 Quick Reference Card

```
╔═════════════════════════════════════════════════════════════════════════╗
║                           DR QUICK REFERENCE                            ║
╠═════════════════════════════════════════════════════════════════════════╣
║  PRIMARY DOWN?                                                          ║
║    ssh root@10.0.20.202                                                 ║
║    qm start 109                                                         ║
║    # Wait 3 min                                                         ║
║    ssh -p 22122 romfast@10.0.20.37                                      ║
║    D:\oracle\scripts\rman_restore_from_zero.cmd                         ║
╠═════════════════════════════════════════════════════════════════════════╣
║  TEST DR?                                                               ║
║    ssh root@10.0.20.202 "/opt/scripts/weekly-dr-test-proxmox.sh"        ║
╠═════════════════════════════════════════════════════════════════════════╣
║  CHECK BACKUPS?                                                         ║
║    ssh root@10.0.20.202 "/opt/scripts/oracle-backup-monitor-proxmox.sh" ║
╠═════════════════════════════════════════════════════════════════════════╣
║  SUPPORT:                                                               ║
║    Logs: /var/log/oracle-dr/                                            ║
║    Docs: /opt/scripts/PROXMOX_NOTIFICATIONS_README.md                   ║
╚═════════════════════════════════════════════════════════════════════════╝
```
---
## 📞 SUPPORT

### Log Locations

| Type | Location |
|-----|----------|
| **RMAN Backup** | Oracle Alert Log |
| **DR Transfer** | `D:\rman_backup\logs\transfer_YYYYMMDD.log` |
| **DR Restore** | `/opt/oracle/logs/dr/restore_*.log` |
| **Task Scheduler** | Event Viewer > Task Scheduler |

### Escalation

| Severity | Response Time | Action |
|----------|---------------|--------|
| **P1 - PRIMARY Down** | Immediate | Activate DR |
| **P2 - Backup Failed** | 2 hours | Retry manually |
| **P3 - Transfer Failed** | 4 hours | Retry next night |

---

## 📚 COMPLETE DOCUMENTATION

For complete technical details, see:
- **`STRATEGIE_BACKUP_CONTABILITATE.md`** - The complete 4-level protection strategy
- **`PLAN_BACKUP_DR_SIMPLE.md`** - Original detailed technical plan

---

## ✨ NEXT STEPS

1. **Read this entire README**
2. **Follow the IMPLEMENTATION CHECKLIST** (section above)
3. **Test all components** manually
4. **Monitor** the first 3 days after go-live
5. **Schedule the first monthly test restore** (mandatory!)

---

**Last updated:** 2025-10-07

**Status:** Production Ready

**Version:** 1.0

**Last Updated:** October 10, 2025

**Version:** 2.0 - Complete DR System with Proxmox Integration

**Status:** ✅ Production Ready