Commit Graph

114 Commits

Author SHA1 Message Date
Claude Agent
a62bcb4331 feat(dr): replicate oracle-backups dataset, mirror to pve1 nightly
Convert /mnt/pve/oracle-backups from a directory on the pveelite
rootfs into a dedicated ZFS dataset rpool/oracle-backups so it can be
incrementally replicated to pvemini. zfs-replicate-oracle-backups.sh
runs every 15 minutes from cron on pveelite and uses zfs send/recv
over the cluster's internal SSH (direct IP, /etc/pve/priv/known_hosts)
to avoid Tailscale MagicDNS detours that broke the first attempt.
The destination dataset is set readonly=on so accidental writes on
pvemini cannot diverge it. Snapshot pruning keeps 5 rolling copies.
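
A minimal sketch of the pruning rule above, written in Python rather than the shell the script itself uses. The snapshot naming scheme (a `repl-` prefix followed by a sortable timestamp) and the function name are assumptions for illustration; only the count of 5 comes from this commit.

```python
def snapshots_to_prune(snapshots, keep=5, prefix="rpool/oracle-backups@repl-"):
    """Return replication snapshots that fall outside the newest `keep`.

    Assumes (hypothetically) that names end in a sortable timestamp, so
    lexicographic order matches creation order.
    """
    ours = sorted(s for s in snapshots if s.startswith(prefix))
    return ours[:-keep] if keep > 0 else ours


# Eight replication snapshots plus one manual snapshot that must be left alone.
existing = [f"rpool/oracle-backups@repl-20260425-{h:02d}00" for h in range(8)]
existing.append("rpool/oracle-backups@manual-before-upgrade")
doomed = snapshots_to_prune(existing)
print(doomed)
```

Filtering by prefix before counting keeps snapshots taken by hand out of the rotation, which is one way to avoid destroying something an operator created on purpose.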

nightly-backup-mirror.sh ships a third copy nightly to pve1's
backup-ssd (ext4 SATA) — different physical disk, different
filesystem, different node — guarding against the failure mode where
both pveelite and pvemini are simultaneously unavailable. The same
script tars /etc/pve and rotates 14 days of cluster config archives,
since pmxcfs is in-RAM and a multi-node quorum loss would otherwise
take cluster config with it.

The old directory is kept as oracle-backups.old-DELETE-AFTER-2026-05-02
on pveelite for one week as a safety net.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 19:00:04 +00:00
Claude Agent
62e9926bd4 feat(dr): add cluster + memory pre-flight, deploy VM 109 watchdog
DR test script now refuses to start VM 109 if:
  * cluster is not quorate (e.g. mid-failover into a degraded state),
  * available memory on the host is below VM 109 config + 1 GB margin.

Both checks scale automatically — memory threshold is computed from
qm config so resizing VM 109 does not require touching the script.
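
The two checks can be sketched as follows, in Python for illustration rather than the shell of the real script. It assumes the `qm config` output carries a plain `memory: <MiB>` line; the function names and the sample output are hypothetical, and how quorum and free memory are measured is left to the caller.

```python
import re

MARGIN_MIB = 1024  # the 1 GB safety margin named in the commit message


def required_mib(qm_config_text):
    """Extract the `memory:` value (MiB) from `qm config <vmid>` output."""
    match = re.search(r"^memory:\s*(\d+)\s*$", qm_config_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'memory:' line found in qm config output")
    return int(match.group(1)) + MARGIN_MIB


def preflight_ok(quorate, available_mib, qm_config_text):
    """Allow the VM to start only on a quorate cluster with enough free memory."""
    return quorate and available_mib >= required_mib(qm_config_text)


# Hypothetical sample of `qm config 109` output, for illustration only.
sample = "boot: order=scsi0\ncores: 4\nmemory: 6144\nname: oracle-dr\n"
print(preflight_ok(True, 8000, sample))
print(preflight_ok(True, 7000, sample))
print(preflight_ok(False, 16000, sample))
```

Because the threshold is derived from the config text on every run, changing the VM's memory changes the check without editing the script.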

Adds vm109-watchdog.sh, scheduled cluster-wide every minute. The
watchdog is the second line of defence behind the cleanup trap from
60c27e7: it force-stops VM 109 if the trap was bypassed (script
killed, host crash mid-test, manual run forgotten). It honours
/var/run/vm109-debug.flag for legitimate manual sessions and is
node-aware via /etc/pve/qemu-server/109.conf so it can be deployed
on every node without coordinating with VM 109's current location.
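
A rough sketch of the watchdog's decision, in Python for illustration. The commit does not say how the watchdog concludes that the trap was bypassed; the maximum-uptime rule and its 4-hour default below are assumptions, as are the function and parameter names. The two path checks mirror the files named above.

```python
import os
import tempfile


def watchdog_should_stop(conf_path, debug_flag_path, vm_running,
                         uptime_s, max_uptime_s=4 * 3600):
    """Decide whether this node's watchdog run should force-stop the VM."""
    if not os.path.exists(conf_path):
        return False  # VM is not configured on this node; nothing to do here
    if os.path.exists(debug_flag_path):
        return False  # operator has declared a manual debugging session
    # Assumption: a VM up longer than any test should take was left behind.
    return vm_running and uptime_s > max_uptime_s


# Exercise the three cases with throwaway files instead of the real paths.
with tempfile.TemporaryDirectory() as d:
    conf = os.path.join(d, "109.conf")
    flag = os.path.join(d, "vm109-debug.flag")
    absent = watchdog_should_stop(conf, flag, True, 10 * 3600)
    open(conf, "w").close()
    stale = watchdog_should_stop(conf, flag, True, 10 * 3600)
    open(flag, "w").close()
    debugging = watchdog_should_stop(conf, flag, True, 10 * 3600)
print(absent, stale, debugging)
```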

Both safeguards target the 2026-04-18 → 04-20 chain: VM 109 was left
running for 2.5 days, then an HA failover pushed CT 108 Oracle (8 GB)
onto pveelite (16 GB), causing an OOM cascade.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 18:48:12 +00:00
Claude Agent
2e8cd9ca59 fix(dr-test): guard cleanup trap + surface qm start errors
The cleanup trap added in 60c27e7 stopped VM 109 unconditionally on EXIT,
which killed the VM during --install/--help runs or when an operator had
launched it manually for debugging. Gate the trap with DR_VM_STARTED_BY_US
so it only fires when the script itself started the VM.
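
The guard pattern, sketched in Python with `try`/`finally` standing in for the shell EXIT trap. The `run` function and the recorded action strings are illustrative only; nothing is executed, the commands are merely appended to a list.

```python
def run(args, actions):
    started_by_us = False              # plays the role of DR_VM_STARTED_BY_US
    try:
        if "--help" in args or "--install" in args:
            return                     # paths that never touch the VM
        actions.append("qm start 109")
        started_by_us = True           # set only once this run started the VM
        actions.append("run DR test")
    finally:
        # Equivalent of the EXIT trap: always runs, but acts only on our own VM.
        if started_by_us:
            actions.append("qm stop 109")


help_log, test_log = [], []
run(["--help"], help_log)
run([], test_log)
print(help_log, test_log)
```

The flag is set after the start call rather than before it, so a failed start does not lead to a stop of a VM that someone else may be running.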

Also remove the 2>/dev/null swallow on qm start so cross-node failures
(e.g. running on a node where the VM is not configured) appear in the
log instead of producing a silent "Failed to start VM 109" in 0 seconds.

Root cause for the 2026-04-25 silent failure: cron lived on pveelite
while VM 109 had been migrated to pvemini; qm start returned an error
that was hidden by the redirect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 08:47:54 +00:00
Claude Agent
8a0c557981 feat(failover): add VM 201 manual failover + recovery scripts, watchdog alert
VM 201 (Windows critical) stays out of HA by design. Added:
- failover-vm201.sh: interactive failover pvemini -> pveelite with ZFS replication state
- recover-vm201-to-pvemini.sh: interactive reverse migration with uptime + split-brain checks
- pvemini-down-alert.sh: cron watchdog on pveelite, emails full runbook after 2min DOWN

Replication RPO tightened: CT 108 + VM 201 to 5min, CT 171 to 15min.
CT 171 added to HA (ha-group-main) for continuous Claude Code access.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:40:49 +00:00
Claude Agent
1203c24d63 docs(proxmox): document HA, corosync tuning, diagnostic tools and mail relay
Following the 2026-04-20 cluster outage, the cluster README now covers
HA resource limits, corosync token tuning (10s tolerance for USB glitches),
rasdaemon/netconsole/kdump diagnostic stack on pvemini, mail relay via
mail.romfast.ro with SMTP auth, OOM alerting via cron, and swap on pveelite.

VM 109 README now clearly states it was removed from HA and is only
started by the weekly DR test script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 11:30:46 +00:00
Claude Agent
60c27e7232 fix(vm109-dr): trap cleanup to stop VM 109 on script exit
The DR test script used set -euo pipefail, so a failing SSH
shutdown command caused the script to exit before qm stop.
This left VM 109 running for 2.5 days (2026-04-18 → 04-20) and
triggered an OOM cascade on 2026-04-20 when pvemini HA-failed
over to pveelite.

Adds EXIT trap that force-stops VM 109 regardless of exit
path, and makes the Step 7 SSH shutdown tolerant of failure.
Incident details: proxmox/cluster/incidents/2026-04-20-cluster-outage.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 11:16:04 +00:00
Claude Agent
11001933f2 feat(cleanup): add advisor objects, scheduler log cleanup and datafile resize
Add cleanup for WRI$_ADV_* tables (can accumulate millions of rows/GBs),
scheduler$_event_log truncate, and automatic UNDO/SYSAUX datafile resize
with progressive fallback (2G→4G→6G). Tested on Oracle 18c XE.
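
A minimal sketch of the progressive fallback, in Python for illustration (the real cleanup is SQL). The `resize` callable is a hypothetical stand-in for the actual datafile resize statement, and the use of `RuntimeError` to signal a failed attempt is an assumption.

```python
def resize_with_fallback(resize, sizes=("2G", "4G", "6G")):
    """Try each target size in order; return the first one that succeeds."""
    for size in sizes:
        try:
            resize(size)
            return size
        except RuntimeError:
            continue  # e.g. data still allocated beyond this size; try larger
    return None  # every attempt failed; leave the datafile as it is


def fake_resize(size):
    # Stand-in for the real resize call: pretend 2G is too small.
    if size == "2G":
        raise RuntimeError("file contains used data beyond requested size")


def always_fail(size):
    raise RuntimeError(size)


print(resize_with_fallback(fake_resize))
```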

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 16:55:58 +00:00
Claude Agent
6410339196 feat(clienti): add Oracle XE PDB recreation scripts and audit cleanup
- Complete PDB export/import workflow (16 scripts in clienti/oracle-xe-21c/import/)
- PDB recreation script with step-by-step guide (recreare_pdb.sql)
- Universal audit cleanup script for Oracle XE 11g-21c (cleanup_audit.sql)
- Troubleshooting guide with all lessons learned (depanare-ora-12954-spatiu.md)
- Fixed: DIRECTORY grant syntax, DBMS_LOCK grant, remap_tablespace USERS:ROA,
  impdp quoted AS SYSDBA for Windows, AWR retention 8 days, datafile full path
- Updated roa-windows-setup docs with XE prevention steps and gotchas table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 16:31:17 +00:00
Claude Agent
36c6405c21 docs(infra): add space-booking deploy guide and Gitea LXC 106 docs
- Add proxmox/lxc106-gitea/README.md: app.ini editing, Docker restart,
  webhook ALLOWED_HOST_LIST fix (hairpin NAT), troubleshooting
- Add proxmox/lxc103-dokploy/docs/space-booking-app.md: full deploy
  guide with env vars, auto-seed accounts, SMTP, troubleshooting
- Update proxmox/README.md: add LXC 106 entry and quick start
- Update lxc103-dokploy/README.md: add space.roa.romfast.ro in
  domains table, ASCII architecture, and docs links

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 16:31:00 +00:00
Marius
29e88631ba feat(scripts): add PACK_CONTAFIN Oracle 10g converter
Python script plus .bat wrapper that automatically converts FORALL/BULK_ROWCOUNT
in PACK_CONTAFIN.pck into FOR LOOPs compatible with Oracle 10g.
Includes pre/post validation, atomic writes, and diff display.
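
A short sketch of the atomic-write and diff steps, assuming the common temp-file-then-rename approach; the converter's actual implementation is not shown in this log, and the function names and sample text here are hypothetical.

```python
import difflib
import os
import tempfile


def write_atomically(path, text):
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as handle:
            handle.write(text)
        os.replace(tmp, path)  # atomic when source and target share a filesystem
    except BaseException:
        os.unlink(tmp)         # do not leave a half-written temp file behind
        raise


def show_diff(before, after, name="PACK_CONTAFIN.pck"):
    """Return a unified diff of the original and converted source."""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True), after.splitlines(keepends=True),
        fromfile=name + " (original)", tofile=name + " (converted)"))


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "PACK_CONTAFIN.pck")
    write_atomically(target, "FORALL i IN 1..n\n")
    write_atomically(target, "FOR i IN 1..n LOOP\n")
    with open(target, encoding="utf-8") as handle:
        final = handle.read()
    leftovers = os.listdir(d)
diff = show_diff("FORALL i IN 1..n\n", "FOR i IN 1..n LOOP\n")
print(diff)
```

Creating the temp file next to the target matters: a rename across filesystems would fall back to a copy and lose the atomicity.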

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09 15:19:36 +02:00
Marius
13114ef41d feat(oracle): add OS script execution procedures and Oracle locations finder
- ExecuteScriptOS.prc: runs PowerShell scripts via DBMS_SCHEDULER
- UpdateSQLPLUS.prc: runs SQL*Plus scripts via DBMS_SCHEDULER
- find_oracle_locations.sql: comprehensive script to discover all Oracle DB paths for backup/migration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09 14:05:46 +02:00
Claude Agent
2ca27aefc6 docs(lxc103): document Docker Swarm VIP DNS fix with dnsrr
Add section explaining the root cause (IPVS broken in LXC), the
solution (dnsrr endpoint mode), and the dokploy-dnsrr-fix systemd
service that auto-applies the fix on every Dokploy deployment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 15:46:13 +00:00
Claude Agent
fcf1e06c66 feat(infra): add Dokploy LXC 103 and new IIS web domains
- Add LXC 103 Dokploy infrastructure (v0.28.2) with Traefik
- Deploy pdf-qr-app and qr-generator via Dokploy from GitHub
- Configure IIS VM 201: roa-qr and *.roa.romfast.ro wildcard sites
- Add SSL certificates (Let's Encrypt + wildcard DNS challenge)
- Fix Docker Swarm VIP DNS issue with dnsrr endpoint mode
- Document architecture: IIS → Traefik → Dokploy containers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 15:44:53 +00:00
Claude Agent
ae325d33b6 Update LXC 110 MoltBot: OpenClaw v2026.2.9 and 4GB RAM
Changes after the MoltBot → OpenClaw upgrade:
- RAM increased from 2GB to 4GB (recommended minimum for OpenClaw)
- Version updated: OpenClaw v2026.2.9 (formerly MoltBot v2026.1.24-3)
- Added troubleshooting for OOM kill issues
- Cleaned up old sessions (85 → 80)

Problem solved: the gateway was being killed by the OOM killer because of
insufficient memory (975MB peak with only 2GB total RAM).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 08:30:48 +00:00
Marius
3402d7fffa Add LXC 110 MoltBot documentation and infrastructure setup
- Create proxmox/lxc110-moltbot/ with complete README documentation
- MoltBot AI chatbot with Telegram and WhatsApp channels
- Claude Opus 4.5 model integration via Anthropic API
- Security: dedicated moltbot user, UFW firewall, fail2ban, Tailscale SSH
- Gateway on port 18789 (loopback), token+password auth
- Update proxmox/README.md with LXC 110 quick start and navigation
- Update CLAUDE.md network layout with MoltBot entry

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 14:46:21 +02:00
Marius
f50bfcf8d8 Fix DMPDIR handling and datafile auto-detection for ROA Windows setup
- New-OracleDirectory: Improved verification with direct SQL check, preserves
  existing DMPDIR path instead of blindly recreating
- Get-DatafilePath: Better fallback logic using ORACLE_HOME to derive path,
  no longer hardcodes C:\app\oracle
- grants-public.sql: Fixed DMPDIR creation - now preserves existing path
  instead of overriding with wrong D:\Oracle\admin\ORCL\dpdump
- config.example.ps1: Added DATAFILE_DIR parameter with documentation

These fixes ensure scripts work without manual intervention on fresh Oracle XE
installations where default DMPDIR points to non-existent paths.

Tested on VM 302 - full installation (01-08) now completes successfully.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 02:38:54 +02:00
Marius
709c822e38 Fix DMPDIR path detection and add VM302 testing documentation
- New-OracleDirectory now checks if DMPDIR exists with wrong path
- If path differs from target, drops and recreates the directory
- Fixes Oracle XE issue where DMPDIR defaults to D:\Oracle\admin\ORCL\dpdump
- Added VM302-TESTING.md with complete testing workflow documentation
- Includes Proxmox VM management commands, troubleshooting, and deployment steps

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 02:09:06 +02:00
Marius
ed3f5f2c43 Simplify config.example.ps1 with default values
- SYSTEM_PASSWORD: romfastsoft
- CONTAFIN_PASSWORD: ROMFASTSOFT (uppercase)
- COMPANY_PASSWORD: ROMFASTSOFT (uppercase)
- SERVICE_NAME: XEPDB1
- DMPDIR: C:\DMPDIR
- ROAUPDATE_BASE_PATH: D:\ROAUPDATE

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 01:50:23 +02:00
Marius
ef1e40675f Fix ROA Windows setup script issues discovered during VM 302 testing
- 03-import-contafin.ps1: Auto-detect DMP file when not specified
- 05-import-companies.ps1: Default DumpDirectory to C:\DMPDIR
- 08-post-install-config.ps1: Fix SERVER_INFO column names (NAME/VALUE)

Tested full installation on VM 302 (Oracle XE 21c):
- CONTAFIN_ORACLE: 344 objects imported
- CAPIDAVATOUR: 3418 objects imported
- 54 ROAUPDATE directories created
- Scheduler jobs configured

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 01:42:39 +02:00
Marius
648342c0a8 Add post-install configuration for ROA Windows setup
New files:
- 08-post-install-config.ps1: Creates ROAUPDATE folders (54 dirs),
  Oracle DIRECTORY objects, SERVER_INFO config, scheduler jobs
- directories-roaupdate.sql: 54 UPD_* directory objects for PACK_UPDATE
- server-info-init.sql: Encoded passwords, paths, email settings
- scheduler-jobs.sql: UPDATEROA_ZILNIC, UPDATERTVAI_ZILNIC (disabled)
- auth-detalii-init.sql: Customer ID for licensing

Updates:
- RunAll.cmd: Added step 6 (08-post-install-config.ps1)
- README.md: Simplified Quick Start, single execution path (RunAll.cmd)
- 00-INSTALL-ORACLE-*.md: Removed redundant manual steps (handled by scripts)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 01:24:15 +02:00
Marius
aaf5942f6b Fix 04-create-synonyms-grants.ps1 to use SQL files instead of inline SQL
The script had inline SQL that was missing 20 synonyms compared to
synonyms-public.sql, causing PACK_DEF and other packages to fail with
missing synonym errors (SYN_VNOM_UM_ISO, SYN_ATAS_*, SYN_SAL_*, etc.).

Changes:
- Remove all inline SQL (~350 lines)
- Now runs synonyms-public.sql (81 synonyms vs 61 before)
- Now runs grants-public.sql for all grants and ACL
- Add verification of SESIUNE context

This ensures the script stays in sync with the SQL files and
prevents future desync issues.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 00:30:12 +02:00
Marius
bcb558f1dc Fix ROA Windows setup scripts for Oracle XE deployment
Key fixes:
- Add Run.cmd/RunAll.cmd wrappers with ExecutionPolicy Bypass
- Add Get-ListenerHost() to auto-detect listener IP address
- Fix impdp connection using EZConnect format (host:port/service)
- Add parallel=1 for Oracle XE compatibility
- Fix Write-Log to accept empty strings with [AllowEmptyString()]
- Fix Get-SchemaObjectCount regex for Windows line endings (\r\n)
- Fix path comparison for DMP file copy operation
- Add GRANT EXECUTE ON SYS.AUTH_PACK TO PUBLIC for PACK_DREPTURI
- Fix VAUTH_SERII view to use SYN_NOM_PROGRAME (has DENUMIRE column)
- Add sections 10-11 to grants-public.sql for SYS object grants

Tested on VM 302 (10.0.20.130) with Oracle XE 21c.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 20:56:58 +02:00
Marius
75a5daab6f Improve listener restart wait logic with active polling
- Increase wait time from 10s to max 60s after listener restart
- Add active polling every 5s to check if service is registered
- Log progress while waiting for service registration
- Fixes race condition where script proceeds before service is ready
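
The polling loop can be sketched like this, in Python for illustration (the real script is PowerShell). The 5 s interval and 60 s cap come from the commit; `is_registered` is a hypothetical callable standing in for the actual listener check, and `sleep` and `log` are injectable so the loop can be exercised without waiting.

```python
import time


def wait_for_service(is_registered, timeout_s=60, interval_s=5,
                     sleep=time.sleep, log=print):
    """Poll until the service is registered or the timeout elapses."""
    waited = 0
    while True:
        if is_registered():
            return True
        if waited >= timeout_s:
            return False
        log(f"service not registered yet ({waited}s/{timeout_s}s), waiting...")
        sleep(interval_s)
        waited += interval_s


# Simulated listener: the service shows up on the third check.
answers = iter([False, False, True])
naps = []
ready = wait_for_service(lambda: next(answers), sleep=naps.append,
                         log=lambda message: None)

# Simulated listener that never registers the service.
never_naps = []
timed_out = wait_for_service(lambda: False, sleep=never_naps.append,
                             log=lambda message: None)
print(ready, naps, timed_out, sum(never_naps))
```

Checking before sleeping means a service that is already registered costs no delay, while the worst case stays bounded by the timeout.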

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 18:33:16 +02:00
Marius
d00d4d94c2 Add SYS grants and master installation scripts for ROA
- run-all-sys.sql: Master script that executes sys-objects.sql,
  sys-grants.sql, and any scripts in sys-updates/ folder in order
- sys-grants.sql: Grants EXECUTE on AUTH_PACK, DBMS_SCHEDULER,
  DBMS_LOCK, UTL_* packages to CONTAFIN_ORACLE; creates public
  synonyms for SYS procedures; creates DMPDIR directory

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 18:32:02 +02:00
Marius
498025160e Fix PowerShell $Host reserved variable conflict in ROA setup scripts
- Rename $Host parameter to $DbHost in oracle-functions.ps1 (Invoke-SqlPlus,
  Test-OracleConnection, Get-OracleVersion, Test-PDB, Get-ServiceName)
- Update all function calls in 01-setup-database.ps1 to use -DbHost instead of -Host
- Fix ${Host} -> ${DbHost} in log message (line 147)
- Fix Write-Log "" -> Write-Host "" to avoid empty string parameter error
- Add DbHost/Port parameters and config.ps1 support to setup script
- Update sys-updates/README.md to clarify folder is for future patches only

Tested successfully on ROACENTRAL (10.0.20.130) with Oracle XE 21c.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 18:30:31 +02:00
Marius
a74d93f3ac Fix uninstall script: kill sessions before dropping users
- Added roa_kill_user_sessions helper procedure
- Kill all active sessions BEFORE attempting DROP USER
- Improved company user detection (also checks for synonyms to CONTAFIN_ORACLE)
- Added more Oracle 21c internal users to exclusion list
- Better error handling and output messages
- Helper procedure auto-cleanup at end

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 17:45:33 +02:00
Marius
33a3581823 Add ROA uninstall/cleanup script for testing iterations
- sql/uninstall-roa.sql: Removes all ROA objects in correct order
  - Drops company users (dynamically detected by ROA tablespace)
  - Drops CONTAFIN_ORACLE user CASCADE
  - Drops public synonyms pointing to CONTAFIN_ORACLE
  - Drops SYS custom objects (AUTH_PACK, AUTH_SERII, INFO, etc.)
  - Drops application context SESIUNE
  - Drops tablespace ROA including datafiles
- scripts/99-uninstall-roa.ps1: PowerShell wrapper with confirmation
- Updated README with uninstall documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 17:13:06 +02:00
Marius
4f51ee48f6 Add legacy ROA Oracle 10g server setup scripts (reference)
Original Oracle 10g R1/R2 setup scripts and SQL migrations from 2007-2026.
Preserved as reference for understanding ROA database structure and
historical schema evolution.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 17:08:13 +02:00
Marius
989477f7a4 Add ROA Oracle Database Windows setup scripts with old client support
PowerShell scripts for setting up Oracle 21c/XE with ROA application:
- Automated tablespace, user creation and imports
- sqlnet.ora config for Instant Client 11g/ODBC compatibility
- Oracle 21c read-only Home path handling (homes/OraDB21Home1)
- Listener restart + 10G password verifier for legacy auth
- Tested on VM 302 with CONTAFIN_ORACLE schema import

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 17:08:02 +02:00
Marius
665c2b5d37 Add Oracle 18c sqlnet.ora config for old ODBC/Instant Client 11g compatibility
- Add config/sqlnet.ora with ALLOWED_LOGON_VERSION=8 for old client support
- Add scripts/fix-sqlnet.sh startup script to persist config across container restarts
- Update README with ORA-28040 troubleshooting, ODBC connection params, and deployment instructions
- Fix SID description: Oracle 18c XE is a CDB with a PDB (XEPDB1), not a non-CDB
- Update container recreation instructions with startup scripts volume

Resolves ORA-28040: No matching authentication protocol when connecting
from Windows ODBC with Oracle Instant Client 11.2

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 00:22:03 +02:00
Marius
fb474c3726 cleanup 2026-01-27 23:43:20 +02:00
Marius
d2b24c1c47 Update Oracle 18c/21c export scripts and documentation
- Increase LXC 108 memory from 4GB to 8GB + 2GB swap
- Add manual startup/shutdown instructions for Oracle containers
- Document CDB/PDB architecture and correct connection strings
- Fix export-roa2.sh: use XEPDB1 PDB for Oracle 18c, separate DMPDIR
- Fix export-roa2.ps1: dual DMPDIR paths, auto-start containers
- Add container/database status checks before export
- Add TNS entries with SERVICE_NAME=XEPDB1 (not SID=XE)
- Document DBMS_CUBE_EXP warnings as harmless

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 23:42:34 +02:00
Marius
7c6e54f018 Add CLAUDE.md for Claude Code guidance
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 18:07:23 +02:00
Marius
c5936791d0 Rename claude-agent to lxc171-claude-agent with full setup documentation
- Rename proxmox/claude-agent/ to proxmox/lxc171-claude-agent/
- Move scripts to scripts/ subdirectory
- Add complete installation guide for new LXC from scratch
- Update proxmox/README.md with LXC 171 documentation and navigation
- Add LXC 171 to containers table
- Remove .serena/project.yml

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 17:54:17 +02:00
Marius
a567f75f25 Reorganize oracle/ and chatbot/ into proxmox/ per LXC/VM structure
- Move oracle/migration-scripts/ to proxmox/lxc108-oracle/migration/
- Move oracle/roa/ and oracle/roa-romconstruct/ to proxmox/lxc108-oracle/sql/
- Move oracle/standby-server-scripts/ to proxmox/vm109-windows-dr/
- Move chatbot/ to proxmox/lxc104-flowise/
- Update proxmox/README.md with new structure and navigation
- Update all documentation with correct directory references
- Remove unused input/claude-agent-sdk/ files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 17:28:53 +02:00
Marius
4d51d5b2d2 Reorganize proxmox documentation into subdirectories per LXC/VM
- Create cluster/ for Proxmox cluster infrastructure (SSH guide, HA monitor, UPS)
- Create lxc108-oracle/ for Oracle Database documentation and scripts
- Create vm201-windows/ for Windows 11 VM docs and SSL certificate scripts
- Add SSL certificate monitoring scripts (check-ssl-certificates.ps1, monitor-ssl-certificates.sh)
- Remove archived VM107 references (decommissioned)
- Update all cross-references between files
- Update main README.md with new structure and navigation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 17:02:49 +02:00
Marius
1da4c2347c Fix Oracle 10g compatibility in PACK_CONTAFIN SCRIE_JC_2007 procedure
Replace FORALL bulk operations with FOR loops to avoid PLS-00436 error
on Oracle 10.2.0.5. The older Oracle version does not support referencing
record fields from collection in FORALL statements.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 13:06:17 +02:00
Marius
a96b9b8d8b romconstruct 2026-01-15 13:00:14 +02:00
Marius
1011d9202c Fix UPS notifications and add periodic battery status emails
- Fix permission denied on log files (chown nut:nut)
- Fix upssched.conf permissions (root:nut)
- Add sudo for perl to allow PVE::Notify from user nut
- Add periodic battery status emails every minute when on battery
- Add charging status emails at 5, 10, 30 min after power restore
- Remove diacritics from all notification messages
- Update documentation with sudo and permissions setup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 13:06:25 +02:00
Marius
ab6ac77d50 Add UPS email notifications and automatic UPS shutdown
- Add email notifications via PVE::Notify for all UPS events:
  - ONBATT: when UPS switches to battery
  - ONLINE: when power is restored
  - LOWBATT: critical battery level
  - SHUTDOWN_START/NODE/PRIMARY: during cluster shutdown
  - COMMBAD: communication lost with UPS

- Add automatic UPS shutdown command after cluster shutdown
  (protects against power surge when power returns)

- Update upssched.conf with ONLINE handler and immediate ONBATT notification

- Add notification templates for HTML and text emails

- Update documentation with new features and timer configuration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 20:11:30 +02:00
Marius
e0f84298e9 cleanup 2026-01-11 15:29:12 +02:00
Marius
00c6410dbd Document VM 201 power outage incident and update HA configuration
- Add troubleshooting guide for 2026-01-11 power outage incident
- Update vm201-windows11.md with correct storage details (disk-1, disk-3)
- Remove HA configuration, document manual failover procedure
- Add ZFS replication status and commands
- Document lessons learned: ISO attachments block migration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 15:28:00 +02:00
Marius
594b77e449 Add Claude Agent LXC setup and workflow scripts
- Create LXC 171 (claude-agent) on pveelite with Ubuntu 24.04
- Install Node.js 20.x, Claude Code, tmux, Tailscale
- Configure SSH access and Gitea integration
- Add workflow scripts: start-agent.sh, work.sh, new-task.sh, finish-task.sh
- Add code-server for mobile file browsing
- Document complete setup in proxmox/claude-agent/README.md

LXC Details:
- IP internal: 10.0.20.171
- IP Tailscale: 100.95.55.51
- code-server: port 8080

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 18:53:23 +02:00
Marius
f01341a707 Add Claude Code configuration
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 01:03:23 +02:00
Marius
42f5d5ac85 Add chatbot infrastructure documentation
Document Flowise and ngrok configuration on LXC 104, including
troubleshooting steps for CORS and version issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 01:02:50 +02:00
Marius
91b9e08e9d general roundup 2025-12-20 22:23:11 +02:00
Marius
90d77704d6 Reorganize Proxmox documentation with clear structure and VM/LXC mapping
## Changes

### Documentation Reorganization
- **README.md**: Complete restructure with logical sections
  - Infrastructure General (proxmox-ssh-guide.md)
  - LXC Containers (oracle-database-lxc108.md)
  - Virtual Machines (vm201-*.md)
  - Cluster-Wide Resources (cluster-ha-monitor.sh, ups/)
  - Archived/Decommissioned (archived-vm107-monitor.sh)
  - Added quick navigation "Am nevoie să..." ("I need to...") section
  - Added recommended workflows
  - Added complete directory structure map

- **proxmox-ssh-guide.md**: Added documentation references section
  - Clear links to all related documentation
  - When to use each document
  - Quick start snippets for each resource

### File Renames for Clarity
- `certificat-letsencrypt-iis.md` → `vm201-certificat-letsencrypt-iis.md`
- `troubleshooting-vm201-backup-nfs.md` → `vm201-troubleshooting-backup-nfs.md`
- `ha-monitor.sh` → `cluster-ha-monitor.sh`
- `vm107-monitor.sh` → `archived-vm107-monitor.sh`

### New Documentation
- **vm201-windows11.md**: Complete VM 201 documentation
  - Hardware configuration
  - Installed services (IIS, SQL*Plus, WinNUT, RDP)
  - Network configuration
  - Backup and recovery procedures
  - Common troubleshooting

## Benefits
- Clear naming convention: VM/LXC/Cluster prefixes
- Central index in README.md with navigation
- Cross-references between documents
- Complete VM 201 documentation suite
- Clear archival of decommissioned resources

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:43:44 +02:00
Marius
cd7b2ed9e7 Clarify storage configuration and fix node names
Storage Configuration improvements:
- Add "Noduri" (Nodes) column showing which nodes have access to each storage
- Clarify that 'local' is separate on each node (non-shared)
- Clarify that 'local-zfs' is shared across pvemini, pve1, pveelite
- Clarify that 'backup' is only on pvemini (10.0.20.201)
- Add detailed explanations for each storage type
- Add storage paths section with important locations

Node name corrections:
- Fix node name: pve2 → pveelite (correct cluster name)
- Update all references across proxmox-ssh-guide.md and README.md
- Add node descriptions in tables for clarity

Benefits:
- Users now know exactly which storage is available on which nodes
- Clear distinction between shared and non-shared storage
- Correct node naming throughout documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:01:08 +02:00
Marius
f1b982794b Reorganize Oracle and Proxmox documentation structure
- Move oracle/CONEXIUNI-ORACLE.md → proxmox/oracle-database-lxc108.md
- Create proxmox/README.md as documentation index
- Update proxmox-ssh-guide.md:
  * Remove VM 107 references (decommissioned)
  * Update LXC and VM tables with IP addresses
  * Add IP address map for all services
  * Simplify Oracle section (detailed info in oracle-database-lxc108.md)
  * Update backup job configuration

Benefits:
- All infrastructure docs in proxmox/ directory
- Clear separation: general Proxmox (proxmox-ssh-guide.md) vs Oracle-specific (oracle-database-lxc108.md)
- No duplicate information between files
- Easy navigation with README.md index

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 12:58:22 +02:00
Marius
b4c2a24281 Fix Oracle DR test ORA-00600 error by forcing service shutdown in cleanup
Problem: DR weekly test failed with ORA-00600 [kcbzib_kcrsds_1] when executed
via cron, but succeeded when run manually. Error occurred during "ALTER DATABASE
OPEN RESETLOGS" step after successful restore and recovery.

Root Cause Analysis:
- Manual test (12:09): Undo initialization = 0ms, no errors
- Cron test (10:45): Undo initialization = 2735ms, ORA-00600 crash
- Alert log showed: "Undo initialization recovery: err:600"
- Oracle instance was in inconsistent state from previous run

The cleanup_database.ps1 script had an "optimization" that preserved the
running Oracle service to "save ~30s startup time". This left the service
in an inconsistent state between test runs, causing Oracle to crash when
attempting to open the database with RESETLOGS.

Solution:
Modified cleanup_database.ps1 to ALWAYS stop Oracle service completely:
1. SHUTDOWN ABORT the instance (not just when the /AFTER flag is set)
2. Stop-Service OracleServiceROA (force clean state)
3. Kill remaining oracle processes
4. Service starts fresh during restore (clean Undo initialization)

Changes:
- Removed if/else branch that skipped shutdown before restore
- Always perform full shutdown regardless of /AFTER parameter
- Updated messages to reflect clean state approach
- Added explanation: "This ensures no state inconsistencies (prevents ORA-00600)"

Testing: Manual test confirmed clean 0ms Undo initialization after fix.

Related: Works in conjunction with weekly-dr-test-proxmox.sh PATH fix (commit 34f91ba)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-06 12:25:38 +02:00