Commit Graph

12 Commits

Author SHA1 Message Date
Marius
5750b42836 Oracle DR: Replace fixed VM boot wait with intelligent polling
Performance optimization for VM startup:

Before: Fixed 180s wait regardless of actual boot time
After: Intelligent polling with early exit when VM is ready

Implementation:
- Poll every 5 seconds (max 180s timeout)
- Check 1: VM running status in Proxmox (qm status)
- Check 2: SSH connectivity test
- Check 3: PowerShell availability (what we actually need)
- Exit immediately when all checks pass
- Progress logging every 30 seconds
- Fallback: Continue after 180s with warning
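
The polling steps above can be sketched roughly as follows. This is an illustration, not the code from the commit: the function name `wait_until_ready`, the check functions, and the `VM_ID` / `VM_HOST` variables are all hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: names below are assumptions, not taken from weekly-dr-test-proxmox.sh.

# wait_until_ready TIMEOUT INTERVAL CHECK...
# Runs each CHECK command once per round. Returns 0 as soon as all of them
# succeed, or 1 once TIMEOUT seconds have elapsed.
wait_until_ready() {
    local timeout=$1 interval=$2
    shift 2
    local elapsed=0 check all_ok
    while [ "$elapsed" -lt "$timeout" ]; do
        all_ok=1
        for check in "$@"; do
            if ! "$check"; then
                all_ok=0
                break            # skip later checks until earlier ones pass
            fi
        done
        if [ "$all_ok" -eq 1 ]; then
            echo "VM ready after ${elapsed}s"
            return 0
        fi
        # Progress line every 30 seconds
        if [ "$elapsed" -gt 0 ] && [ $((elapsed % 30)) -eq 0 ]; then
            echo "Still waiting... ${elapsed}s elapsed"
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "WARNING: VM not confirmed ready after ${timeout}s, continuing anyway"
    return 1
}

# Hypothetical checks; VM_ID and VM_HOST are placeholders.
vm_running() { qm status "$VM_ID" | grep -q 'status: running'; }
ssh_up()     { ssh -o ConnectTimeout=5 -o BatchMode=yes "$VM_HOST" exit; }
ps_ready()   { ssh -o ConnectTimeout=5 "$VM_HOST" 'powershell -Command "exit 0"'; }

# Example call (commented out so the sketch has no side effects):
# wait_until_ready 180 5 vm_running ssh_up ps_ready || true
```

The `|| true` in the example call mirrors the fallback behavior: a timeout logs a warning but does not abort the test.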

Benefits:
- Fast VM boot (30s) → saves 150s (2min 30s)
- Normal VM boot (60s) → saves 120s (2min)
- Slow VM boot → 180s (same as before)
- More robust: verifies SSH+PowerShell actually work

Average expected improvement: 60-120 seconds per test

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2025-10-11 14:30:32 +03:00
Marius
835d8b465b Oracle DR: Fix database verification, add bash log, and collect full RMAN log
Critical fixes and improvements:

1. Database verification fix (robust):
   - Use Select-String -Quiet to get True/False boolean
   - Convert PowerShell boolean to bash-friendly format
   - Check for 'READ WRITE' in entire sqlplus output
   - Eliminates false negatives from text parsing issues
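
A minimal sketch of the bash side of this conversion, assuming the remote PowerShell command prints the result of `Select-String -Quiet`. The function name `db_is_open` is hypothetical, and treating anything other than `True` (including empty output) as "not open" is a defensive assumption, not something stated in the commit.

```bash
#!/usr/bin/env bash
# Sketch only: db_is_open is a hypothetical helper name.

# db_is_open RAW
# RAW is the text printed by the remote PowerShell command, expected to be
# "True" or "False", possibly with a trailing CR/LF from the Windows side.
# Succeeds only when RAW normalizes to "true".
db_is_open() {
    local raw
    raw=$(printf '%s' "$1" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
    [ "$raw" = "true" ]
}

# Example use, with $raw_status captured from the SSH call:
# if db_is_open "$raw_status"; then echo "Database is READ WRITE"; fi
```

Stripping whitespace before comparing avoids a false negative when the Windows side appends a carriage return to the output.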

2. Collect FULL RMAN restore log:
   - Removed -Head 200 limitation
   - Now sends complete RMAN log in email
   - Better debugging with full context
   - Updated templates: "first 200 lines" → "complete"

3. Add bash script log to email notifications:
   - Include last 100 lines of bash execution log
   - Separate "RMAN Restore Log" and "Bash Script Log" sections
   - Both text and HTML templates updated
   - Shows script flow and any bash-level errors

This fixes the issue where 42,625 tables were restored successfully
but the test reported FAILED because of a query output format mismatch.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2025-10-11 14:25:58 +03:00
Marius
12700261c7 Oracle DR: Fix database verification and restore log collection
Critical fixes for false negatives in DR test reporting:

1. Database verification fix:
   - Changed from 'findstr' (CMD) to 'Select-String' (PowerShell native)
   - findstr was failing in the PowerShell context, leaving db_status empty
   - Result: DB with 42,625 tables was incorrectly reported as FAILED

2. Restore log collection fix:
   - Changed from 'type' (CMD) to 'Get-Content' (PowerShell native)
   - type command doesn't work through SSH PowerShell context
   - Added -ErrorAction SilentlyContinue for cleaner error handling
   - Simplified fallback logic using an empty-string test ([ -z ]) instead of string matching
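
The empty-string fallback could look something like the sketch below. The helper name `log_or_fallback` and the placeholder message are hypothetical; the commit does not show the actual wording.

```bash
#!/usr/bin/env bash
# Sketch only: helper name and fallback text are assumptions.

# log_or_fallback LOG_TEXT
# Prints LOG_TEXT, or a placeholder message when LOG_TEXT is empty
# (for example, when Get-Content found no file and printed nothing).
log_or_fallback() {
    if [ -z "$1" ]; then
        echo "Restore log not available"
    else
        printf '%s\n' "$1"
    fi
}

# Example use, with $restore_log captured from the SSH call:
# log_or_fallback "$restore_log"
```

Because `-ErrorAction SilentlyContinue` suppresses the error text, an empty variable is enough to detect a missing log.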

Both issues were caused by mixing CMD commands in PowerShell context.
Now uses PowerShell-native commands throughout for consistency.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2025-10-11 11:11:37 +03:00
Marius
3a51880c9e Oracle DR: Fix RMAN crosscheck sequence and improve error handling
- Fix CROSSCHECK BACKUP command to execute after database is mounted
- Correct CATALOG command to use recovery_area instead of F:\ path
- Add robust backup file validation with detailed error reporting
- Improve file-by-file backup copying with individual error tracking
- Enhance restore log collection for both success and failure scenarios
- Fix database verification to check OPEN_MODE instead of STATUS
- Add comprehensive directory and permissions error handling

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2025-10-11 10:32:49 +03:00
Marius
9ed0ee9e0e Oracle DR: Add TestMode parameter for dual behavior
rman_restore_from_zero.ps1:
- Add -TestMode switch parameter
- TestMode (weekly DR test): Skip service/listener config, only verify restore works
- Standalone mode: Full config with SPFILE + Listener for production use

weekly-dr-test-proxmox.sh:
- Call restore script with -TestMode flag
- Avoids service recreation and SSH disconnect during tests

Benefits:
- Weekly tests are faster and cleaner (no service restart)
- Manual restore prepares system for production use
- No more 'Broken pipe' errors during tests

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2025-10-11 02:33:43 +03:00
Marius
6accd1f996 Oracle DR: Fix verification commands and auto-start services
weekly-dr-test-proxmox.sh:
- Replace Unix commands (echo, grep) with PowerShell equivalents
- Use PowerShell Select-String for database status verification
- Fix table count query to work properly through SSH

rman_restore_from_zero.ps1:
- Set Oracle service to AUTOMATIC startup (was manual)
- Set Listener service to AUTOMATIC startup
- Auto-start Listener after database restore
- Add fallback to lsnrctl if service start fails

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2025-10-11 02:03:57 +03:00
Marius
026b0436ba Oracle DR: Complete migration to PowerShell scripts and cleanup
Changes:
- Remove obsolete .cmd scripts (cleanup_database.cmd, rman_restore_from_zero.cmd, rman_restore_final.cmd)
- Update weekly-dr-test-proxmox.sh to call PowerShell scripts with /SILENT parameter
- Add initROA.ora configuration file for reference

All DR test scripts now use PowerShell for SSH compatibility
Resolves input redirection issues with Windows SSH

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2025-10-11 01:47:39 +03:00
Marius
2f8c927bbe Oracle DR: Convert restore scripts to PowerShell for SSH compatibility
- Add cleanup_database.ps1: PowerShell version without input redirection issues
- Add rman_restore_from_zero.ps1: PowerShell version with inline SQL commands
- Update weekly-dr-test-proxmox.sh: Call .ps1 scripts via PowerShell

PowerShell scripts resolve SSH 'Input redirection not supported' errors
All SQL commands are piped directly to sqlplus (no temp files needed)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2025-10-11 01:26:35 +03:00
Marius
839f1b6b82 Oracle DR: Enhance notification templates with compact HTML layouts and improved data collection
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2025-10-10 22:41:32 +03:00
Marius
6f56e61b04 Oracle DR: Fix Gmail compatibility with plain text email templates
- Convert complex HTML/CSS templates to plain text format for Gmail compatibility
- Replace decorative characters (box drawing, special symbols) with simple text
- Use single-line bullet points instead of complex table layouts
- Improve readability across all email clients (Gmail, Outlook, mobile)
- Remove HTML markup from the templates completely, use only text format
- Keep informative structure with clear section separators
- Text and HTML template variants now carry identical plain-text content for consistency
- Critical for Gmail users who only see plain text formatting

New format works perfectly in Gmail:
Oracle Backup WARNING - pveelite
WARNING

========================================
WARNINGS:
- FULL backup is 51 hours old (threshold: 25)

========================================
BACKUP STATUS:
FULL: 51h old TOO OLD (limit: 25h)
CUMULATIVE: 4h old OK (limit: 7h)
Total: 12 files | Size: 6.3GB | Disk: 2%

========================================
Next check: 2025-10-10 + 24h | Proxmox Monitoring

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2025-10-10 17:41:33 +03:00
Marius
b34006a499 Oracle DR: Fix template variables and complete monitoring system testing
- Fix Proxmox template compatibility: {{hostname}} → {{node}}, {{timestamp}} → {{date}}
- Remove duplicate node fields and fix JSON structure
- Complete full testing plan execution for monitoring and DR test scripts
- Validate notification system functionality with PVE::Notify
- Sync tested scripts from production back to repository

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2025-10-10 17:17:27 +03:00
Marius
b44e3c8f9b Oracle DR: Complete cleanup and restore scripts with Proxmox integration
- Remove outdated planning documents and implementation guides
- Update README with comprehensive DR procedures and monitoring
- Enhance rman_restore_from_zero.cmd with SPFILE creation and auto-start
- Add Proxmox monitoring and weekly test scripts
- Archive old implementation documentation

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2025-10-10 15:13:29 +03:00