Critical fix based on user analysis:
PROBLEM:
Cleanup is called in 2 contexts with different requirements:
1. BEFORE restore (from rman_restore): Should NOT shutdown
2. AFTER restore (from weekly-test): MUST shutdown to delete files
USER INSIGHT:
"Why shutdown if restore will clean anyway? But AFTER restore,
you MUST shutdown to release file locks for deletion!"
SOLUTION:
Add /AFTER parameter to cleanup_database.ps1:
WITHOUT /AFTER (before restore):
- Skip SHUTDOWN ABORT
- Skip Stop-Service
- Leave service in current state (running/stopped)
- Files CAN be deleted (no lock before restore)
- Optimization: If service running → restore saves ~30s
WITH /AFTER (after restore):
- SHUTDOWN ABORT (stop instance)
- Stop-Service (release file locks)
- REQUIRED for file deletion after restore
- Files are locked by active instance/service
CALL SITES:
1. rman_restore: cleanup_database.ps1 /SILENT (no /AFTER)
2. weekly-test: cleanup_database.ps1 /SILENT /AFTER (with /AFTER)
FLOW OPTIMIZATION:
Test 1: Service stopped → start(30s) → restore → cleanup /AFTER
Test 2: Service stopped → start(30s) → restore → cleanup /AFTER
→ No improvement yet
BUT if we keep service running between tests:
Test 1: Service stopped → start(30s) → restore → cleanup /AFTER
Test 2: Service running → restore(0s saved!) → cleanup /AFTER
→ Save 30s on subsequent tests!
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Performance optimization for VM startup:
Before: Fixed 180s wait regardless of actual boot time
After: Intelligent polling with early exit when VM is ready
Implementation:
- Poll every 5 seconds (max 180s timeout)
- Check 1: VM running status in Proxmox (qm status)
- Check 2: SSH connectivity test
- Check 3: PowerShell availability (what we actually need)
- Exit immediately when all checks pass
- Progress logging every 30 seconds
- Fallback: Continue after 180s with warning
Benefits:
- Fast VM boot (30s) → saves 150s (2min 30s)
- Normal VM boot (60s) → saves 120s (2min)
- Slow VM boot → 180s (same as before)
- More robust: verifies SSH+PowerShell actually work
Average expected improvement: 60-120 seconds per test
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Critical fixes and improvements:
1. Database verification fix (robust):
- Use Select-String -Quiet to get True/False boolean
- Convert PowerShell boolean to bash-friendly format
- Check for 'READ WRITE' in entire sqlplus output
- Eliminates false negatives from text parsing issues
2. Collect FULL RMAN restore log:
- Removed -Head 200 limitation
- Now sends complete RMAN log in email
- Better debugging with full context
- Updated templates: "first 200 lines" → "complete"
3. Add bash script log to email notifications:
- Include last 100 lines of bash execution log
- Separate "RMAN Restore Log" and "Bash Script Log" sections
- Both text and HTML templates updated
- Shows script flow and any bash-level errors
This fixes the issue where 42,625 tables were restored successfully
but test reported FAILED due to query output format mismatch.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Critical fixes for false negatives in DR test reporting:
1. Database verification fix:
- Changed from 'findstr' (CMD) to 'Select-String' (PowerShell native)
- findstr was failing in PowerShell context causing db_status to be empty
- Result: DB with 42,625 tables was incorrectly reported as FAILED
2. Restore log collection fix:
- Changed from 'type' (CMD) to 'Get-Content' (PowerShell native)
- type command doesn't work through SSH PowerShell context
- Added -ErrorAction SilentlyContinue for cleaner error handling
- Simplified fallback logic using [-z] instead of string matching
Both issues were caused by mixing CMD commands in PowerShell context.
Now uses PowerShell-native commands throughout for consistency.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Fix CROSSCHECK BACKUP command to execute after database is mounted
- Correct CATALOG command to use recovery_area instead of F:\ path
- Add robust backup file validation with detailed error reporting
- Improve file-by-file backup copying with individual error tracking
- Enhance restore log collection for both success and failure scenarios
- Fix database verification to check OPEN_MODE instead of STATUS
- Add comprehensive directory and permissions error handling
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
rman_restore_from_zero.ps1:
- Add -TestMode switch parameter
- TestMode (weekly DR test): Skip service/listener config, only verify restore works
- Standalone mode: Full config with SPFILE + Listener for production use
weekly-dr-test-proxmox.sh:
- Call restore script with -TestMode flag
- Avoids service recreation and SSH disconnect during tests
Benefits:
- Weekly tests are faster and cleaner (no service restart)
- Manual restore prepares system for production use
- No more 'Broken pipe' errors during tests
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
weekly-dr-test-proxmox.sh:
- Replace Unix commands (echo, grep) with PowerShell equivalents
- Use PowerShell Select-String for database status verification
- Fix table count query to work properly through SSH
rman_restore_from_zero.ps1:
- Set Oracle service to AUTOMATIC startup (was manual)
- Set Listener service to AUTOMATIC startup
- Auto-start Listener after database restore
- Add fallback to lsnrctl if service start fails
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Changes:
- Remove obsolete .cmd scripts (cleanup_database.cmd, rman_restore_from_zero.cmd, rman_restore_final.cmd)
- Update weekly-dr-test-proxmox.sh to call PowerShell scripts with /SILENT parameter
- Add initROA.ora configuration file for reference
All DR test scripts now use PowerShell for SSH compatibility
Resolves input redirection issues with Windows SSH
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Add cleanup_database.ps1: PowerShell version without input redirection issues
- Add rman_restore_from_zero.ps1: PowerShell version with inline SQL commands
- Update weekly-dr-test-proxmox.sh: Call .ps1 scripts via PowerShell
PowerShell scripts resolve SSH 'Input redirection not supported' errors
All SQL commands are piped directly to sqlplus (no temp files needed)
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Convert complex HTML/CSS templates to plain text format for Gmail compatibility
- Replace decorative characters (box drawing, special symbols) with simple text
- Use single-line bullet points instead of complex table layouts
- Improve readability across all email clients (Gmail, Outlook, mobile)
- Remove HTML templates completely, use only text format
- Keep informative structure with clear section separators
- Both text and HTML templates now identical for consistency
- Critical for Gmail users who only see plain text formatting
New format works perfectly in Gmail:
Oracle Backup WARNING - pveelite
WARNING
========================================
WARNINGS:
- FULL backup is 51 hours old (threshold: 25)
========================================
BACKUP STATUS:
FULL: 51h old TOO OLD (limit: 25h)
CUMULATIVE: 4h old OK (limit: 7h)
Total: 12 files | Size: 6.3GB | Disk: 2%
========================================
Next check: 2025-10-10 + 24h | Proxmox Monitoring
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Fix Proxmox template compatibility: {{hostname}} → {{node}}, {{timestamp}} → {{date}}
- Remove duplicate node fields and fix JSON structure
- Complete full testing plan execution for monitoring and DR test scripts
- Validate notification system functionality with PVE::Notify
- Sync tested scripts from production back to repository
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Remove outdated planning documents and implementation guides
- Update README with comprehensive DR procedures and monitoring
- Enhance rman_restore_from_zero.cmd with SPFILE creation and auto-start
- Add Proxmox monitoring and weekly test scripts
- Archive old implementation documentation
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>