# Infrastructure (Proxmox + Docker)

Last updated: 2026-04-26. Synced with `romfastsql/proxmox/` in Gitea. Repo cloned locally at `/home/moltbot/workspace/romfastsql/` (HTTPS, no SSH key). Detailed documentation per LXC/VM: `romfastsql/proxmox/<component>/README.md`.
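Since the remote is HTTPS, refreshing the local clone needs no SSH key; a plain pull is enough:

```bash
# Update the local copy of romfastsql before reading it
git -C /home/moltbot/workspace/romfastsql pull
```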
## Quick LXC access

| ID | Name | Node | IP | Direct SSH | Via Proxmox |
|---|---|---|---|---|---|
| 100 | portainer | pvemini | 10.0.20.170 | `ssh echo@10.0.20.170` | `ssh echo@10.0.20.201 "sudo pct exec 100 -- bash"` |
| 101 | minecraft | pveelite | 10.0.20.162 | `ssh echo@10.0.20.162` | `ssh echo@10.0.20.202 "sudo pct exec 101 -- bash"` |
| 103 | dokploy | pvemini | 10.0.20.167 | `ssh echo@10.0.20.167` | `ssh echo@10.0.20.201 "sudo pct exec 103 -- bash"` |
| 104 | flowise | pvemini | 10.0.20.161 | ❌ (publickey only) | `ssh echo@10.0.20.201 "sudo pct exec 104 -- bash"` |
| 106 | gitea | pvemini | 10.0.20.165 | — | `ssh echo@10.0.20.201 "sudo pct exec 106 -- sh"` ⚠️ Alpine (sh, not bash) |
| 108 | central-oracle | pvemini | 10.0.20.121 | `ssh echo@10.0.20.121` | `ssh echo@10.0.20.201 "sudo pct exec 108 -- bash"` |
| 110 | moltbot | pveelite | 10.0.20.173 | `ssh moltbot@10.0.20.173` | `ssh echo@10.0.20.202 "sudo pct exec 110 -- bash"` |
| 171 | claude-agent | pvemini ⚠️ | 10.0.20.171 | `ssh claude@10.0.20.171` | `ssh echo@10.0.20.201 "sudo pct exec 171 -- bash"` |
## LXC 100 — portainer (pvemini)

- IP: 10.0.20.170 | OS: Debian/systemd | Tailscale: yes
- Resources: 4GB RAM (414MB used) | 20GB disk (4GB used, 20%)
- Portainer UI: https://10.0.20.170:9443
- docker-compose projects: `/opt/docker/`

Docker containers:
| Container | External port | Status | Description |
|---|---|---|---|
| portainer | 9443 | ✅ healthy | Docker management |
| hbbs | 21115-21116, 21118 | ✅ | RustDesk relay (STUN) |
| hbbr | 21117, 21119 | ✅ | RustDesk relay (TURN) |
| pulse | 7655 | ✅ healthy | Proxmox monitoring |
| wol-manager | — | ✅ | Wake-on-LAN |
| bt-web-automation | 5000, 8081→8080 | ✅ | BT automation |
| roa-efactura | 5003→5000 | ⚠️ unhealthy | ANAF e-Factura |
| pdf-qr-app | 5002→5000 | ✅ healthy | Invoice QR codes |
| docker-flask_app-1 | 5001→5000 | ✅ | ROA Flask |
Troubleshooting:

```bash
# Container logs
ssh echo@10.0.20.201 "sudo pct exec 100 -- docker logs <container> --tail 50"
# Restart a container
ssh echo@10.0.20.201 "sudo pct exec 100 -- docker restart <container>"
# Status of everything
ssh echo@10.0.20.201 "sudo pct exec 100 -- docker ps -a"
```
## LXC 101 — minecraft (pveelite)

- IP: 10.0.20.162 | OS: Debian/systemd | Tailscale: no
- Resources: 8GB RAM (3.8GB used) | 100GB disk (49GB used, 49%)

Services:

| Service | Port | Description |
|---|---|---|
| crafty | 8443 | Crafty4 web panel (Python) |
| minecraft | 25565 | Minecraft server (Java) |
| playit | — | Public tunnel for Minecraft |

Troubleshooting:

```bash
ssh echo@10.0.20.202 "sudo pct exec 101 -- systemctl status crafty"
ssh echo@10.0.20.202 "sudo pct exec 101 -- journalctl -u crafty -n 50"
```
## LXC 103 — dokploy (pvemini)

- IP: 10.0.20.167 | OS: Debian/systemd | Tailscale: yes
- Resources: 4GB RAM (1.1GB used) | 50GB disk (5.8GB used, 12%)
- Dokploy UI: http://10.0.20.167:3000

Docker containers (managed by Dokploy + Traefik):

| Container | Port | Status | Description |
|---|---|---|---|
| dokploy-traefik | 80, 443 | ✅ | Reverse proxy |
| dokploy | 3000 | ✅ healthy | Deployment platform |
| dokploy-postgres | 5432 (internal) | ✅ | Dokploy DB |
| dokploy-redis | 6379 (internal) | ✅ | Dokploy cache |
| utile-icongenerator | 80 (internal) | ✅ | Icon generator |
| qr-qrgenerator | 80 (internal) | ✅ | QR generator |
| qr-pdfqrapp | — | ✅ | PDF+QR app |
| constanta-space-booking-backend | 8000 (internal) | ✅ | Space booking API |

Troubleshooting:

```bash
ssh echo@10.0.20.201 "sudo pct exec 103 -- docker ps -a"
ssh echo@10.0.20.201 "sudo pct exec 103 -- docker logs <container> --tail 50"
```
## LXC 104 — flowise (pvemini)

- IP: 10.0.20.161 | OS: Debian/systemd | Tailscale: yes
- Resources: 8GB RAM (418MB used) | 100GB disk (23GB used, 23%)
- Direct SSH: ❌ (does not work with user echo; pct exec only)

Services:

| Service | Port | Status | Description |
|---|---|---|---|
| ollama | 127.0.0.1:11434 | ✅ | Local LLM (CPU-only, avx2) |
| flowise | 3000 | ✅ | AI flow builder |
| ngrok | — | ✅ | Public tunnel |

Ollama — available models (a quick API check follows the list):

- `all-minilm:latest` — fast embeddings ← used by echo-core memory_search
- `nomic-embed-text:latest` — quality embeddings
- `llama3.2:3b-instruct-q8_0` — conversational LLM
- `llama3.2:3b`, `llama3.2:1b` — general LLM
- `smollm:135m` — small, fast LLM
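A quick way to confirm the embedding model actually answers is to hit the Ollama embeddings API directly; the model name and prompt below are just examples:

```bash
# Request an embedding from all-minilm (returns a JSON vector on success)
ssh echo@10.0.20.201 "sudo pct exec 104 -- curl -s http://localhost:11434/api/embeddings -d '{\"model\": \"all-minilm\", \"prompt\": \"test\"}'"
```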
Important notes:

- Models stored in `/usr/share/ollama/.ollama/models/` (user `ollama`)
- The ollama service runs as user `ollama`, with `HOME=/usr/share/ollama`
- CPU-only — no GPU; no CUDA/ROCm
Ollama troubleshooting:

```bash
# Status
ssh echo@10.0.20.201 "sudo pct exec 104 -- systemctl status ollama"
# Logs (common issue: $HOME undefined or permissions)
ssh echo@10.0.20.201 "sudo pct exec 104 -- journalctl -u ollama -n 30"
# Fix permissions (if ollama does not start)
ssh echo@10.0.20.201 "sudo pct exec 104 -- chown -R ollama:ollama /usr/share/ollama/.ollama/"
# Test API
ssh echo@10.0.20.201 "sudo pct exec 104 -- curl -s http://localhost:11434/api/tags"
# Pull a model
ssh echo@10.0.20.201 "sudo pct exec 104 -- ollama pull all-minilm"
```
## LXC 106 — gitea (pvemini)

- IP: 10.0.20.165 | OS: Alpine Linux + OpenRC ⚠️ (no systemd, no bash!)
- Resources: 250GB disk (1.1GB used, 0%)
- Gitea web: http://10.0.20.165:3000 (or gitea.romfast.ro)
- Gitea SSH: port 222

Alpine specifics:

- Shell: `sh` (not bash) — `pct exec 106 -- sh`
- Init: OpenRC (not systemd) — `rc-status`, not `systemctl`
- Gitea runs via Docker + s6, not natively

OpenRC services:

| Service | Status | Description |
|---|---|---|
| networking | ✅ | Network |
| tailscale | ✅ | VPN |
| crond | ✅ | Cron |
| tailscale-gitea | ❌ CRASHED | Custom Tailscale script — needs investigation |
Troubleshooting:

```bash
# Access (use sh, not bash!)
ssh echo@10.0.20.201 "sudo pct exec 106 -- sh -c 'rc-status'"
ssh echo@10.0.20.201 "sudo pct exec 106 -- sh -c 'docker ps'"
# tailscale-gitea logs
ssh echo@10.0.20.201 "sudo pct exec 106 -- sh -c 'cat /var/log/tailscale-gitea.log 2>/dev/null || rc-service tailscale-gitea status'"
```
## LXC 108 — central-oracle (pvemini)

- IP: 10.0.20.121 | OS: Debian/systemd | Tailscale: no
- Resources: 8GB RAM (4.2GB used) | 50GB disk (15GB used, 29%)

Docker containers:

| Container | Port | Status | Description |
|---|---|---|---|
| oracle-xe | 1521, 5500 (EM Express) | ✅ healthy | Main Oracle XE |
| oracle18-xe | 1522→1521, 5502→5500 | ✅ | Oracle 18 XE |
| portainer | 9000, 9443, 8000 | ✅ | Local management |
Oracle troubleshooting:

```bash
# Status
ssh echo@10.0.20.201 "sudo pct exec 108 -- docker ps -a"
# Oracle logs
ssh echo@10.0.20.201 "sudo pct exec 108 -- docker logs oracle-xe --tail 50"
# Enter the Oracle container
ssh echo@10.0.20.201 "sudo pct exec 108 -- docker exec -it oracle-xe bash"
```
## LXC 110 — moltbot (pveelite)

- IP: 10.0.20.173 | Tailscale IP: 100.120.119.70 | OS: Debian/systemd | Tailscale: yes
- Resources: 4GB RAM | 8GB disk (local-zfs) | 2 cores
- Direct SSH: `ssh moltbot@10.0.20.173` (dedicated non-root user)
- This is the LXC that runs echo-core (OpenClaw)

Services:

| Service | Port | Description |
|---|---|---|
| code-server@moltbot | 8080 | VS Code in the browser |
| ttyd | 7681 | Web terminal |
| echo-core dashboard | 8088 | Echo Task Board |
| whatsapp-bridge | 8098 | Baileys bridge (Node.js) |
| fail2ban | — | SSH protection |
## LXC 171 — claude-agent (pvemini)

- IP: 10.0.20.171 | Tailscale: 100.95.55.51 | OS: Ubuntu 24.04 LTS/systemd
- Resources: 4GB RAM | 32GB disk (local-zfs) | 2 cores
- Main user: `claude` | Workspace: `/workspace/`

Services:

| Service | Port | Description |
|---|---|---|
| code-server@claude | 8080 | VS Code (user: claude) |
| ttyd | 7681 | Web terminal (`/workspace/start-agent.sh`, auth: claude:claude2025) |

Claude Code:

- Installed and configured, Git → gitea.romfast.ro
- Programmatic mode: `claude -p "task"` from the project directory

Projects in `/workspace/` → full details in `kb/tools/claude-agent-projects.md`:
| Project | Stack | Purpose |
|---|---|---|
| roa2web | FastAPI + Vue.js + Oracle | Modern ROA web ERP |
| roaauto | Vue 3 + wa-sqlite + FastAPI | Auto-service PWA (offline-first) |
| vfp_roaauto | Visual FoxPro (legacy) | VFP version of ROA AUTO |
| romfastsql | Docs + SQL + Python | Infrastructure + Oracle migration |
| gomag-vending | FastAPI + Oracle PL/SQL | GoMag order import → ROA |
| space-booking | FastAPI + SQLite + Vue | Multi-tenant desk booking |
| service-auto | Vue 3 + Vite + Tailwind 4 | Auto-service PWA (new version) |
| atm | Python 3.11+ | Automated Trading Monitor (M2D) |
| paula-escape | HTML | Escape room game |
Troubleshooting (LXC 171 is on pvemini, so go via 10.0.20.201):

```bash
ssh echo@10.0.20.201 "sudo pct exec 171 -- systemctl status code-server@claude ttyd"
ssh echo@10.0.20.201 "sudo pct exec 171 -- df -h /"
```
## VM 201 — roacentral (pvemini)

- VMID: 201 | Host: pvemini | Status: running (autostart)
- OS: Windows 11 Pro (24H2) | QEMU Guest Agent: yes
- Resources: 2 cores | 4GB RAM | 500GB disk (local-zfs, ~89GB used)
- Network: virtio bridge (DHCP) | RDP: port 3389

Main role — IIS reverse proxy:

| Domain | Destination |
|---|---|
| roa.romfast.ro | ROA application |
| gitea.romfast.ro | LXC 106 |
| dokploy.romfast.ro | LXC 103 Traefik |
| roa-qr.romfast.ro | LXC 103 Traefik |
| *.roa.romfast.ro | Dokploy wildcard |

Installed services:

- IIS 10.0 — ASP.NET 4.8, WebSockets, URL Rewrite, SSL termination
- Win-ACME v2.2.9 — automated Let's Encrypt certificates
- Oracle Instant Client — JDBC client for LXC 108
- WinNUT — UPS monitor (NUT server: 10.0.20.201:3493)

Backup & replication:

- Daily backup at 02:00 (zstd-compressed)
- Active ZFS replication: pvemini → pve1 + pveelite (30-minute interval); see the status check below
- HA disabled — manual start on failover
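Replication health can be checked from the sending node; `pvesr` is the standard Proxmox storage-replication CLI (assuming the echo sudo rules cover it, otherwise run as root):

```bash
# Per-job replication status: last sync, duration, failures
ssh echo@10.0.20.201 "sudo pvesr status"
```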
## VM 109 — oracle-dr (pveelite)

- VMID: 109 | Host: pveelite | Status: stopped (started only for DR/testing)
- IP: 10.0.20.37 | OS: Windows Server + Oracle 19c
- HA group: ha-prefer-pveelite | state=stopped, nofailback=1
- Purpose: disaster recovery for the Oracle database (RMAN backups from the external Windows server)

Oracle Database:

- DB name: ROA | Size: ~80 GB | Tables: 42,625
- Strategy: daily full backup (6-7 GB) + cumulative incrementals (200-300 MB)

RMAN backup schedule (a generic command sketch follows the table):

| Time | Type |
|---|---|
| 02:30 | Full backup |
| 13:00 | Cumulative incremental |
| 18:00 | Cumulative incremental |
| 09:00 | Automated monitoring |
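For reference, a cumulative level-1 incremental in RMAN looks roughly like this; the real scripts live on the external Windows server and were not inspected, so this is a generic sketch only:

```bash
# Generic RMAN cumulative incremental (sketch — actual scripts may differ)
rman target / <<'EOF'
BACKUP INCREMENTAL LEVEL 1 CUMULATIVE DATABASE PLUS ARCHIVELOG;
EOF
```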
Troubleshooting:

```bash
ssh echo@10.0.20.202 "sudo qm status 109"
```
## VM 302 — oracle-test (pvemini)

- VMID: 302 | Host: pvemini | Status: stopped (on-demand testing)
- IP: 10.0.20.130 | OS: Windows 11
- Resources: 4GB RAM | 500GB disk
- Purpose: test environment for ROA install scripts on Windows with Oracle 21c XE

Oracle configuration:

- Edition: Oracle 21c XE (CDB/PDB) | Port: 1521 | Service: XEPDB1
- Setup dir: `C:\roa-setup\` | DMP files: `C:\DMPDIR\`
- Full install: ~8 minutes

Troubleshooting:

```bash
ssh echo@10.0.20.201 "sudo qm status 302"
```
## External Windows server — production

| Machine | IP | Port | Role |
|---|---|---|---|
| Oracle production | 10.0.20.36 | 1521 | Oracle 10g on Windows, main ROA database |
## Proxmox nodes

Version: Proxmox VE 8.4.14 | Cluster: romfast (3 nodes, quorum active)
User: echo | SSH access: `ssh echo@<IP>` | Sudo: qm, pct, pvesh

Cluster storage (a usage check follows the table):

| Storage | Type | Capacity | Purpose |
|---|---|---|---|
| local-zfs | ZFS pool | 1.75 TiB | VM/LXC disks |
| backup | Directory | 1.79 TiB | Backups (pvemini only) |
| local | Directory | 1.51 TiB | ISOs and templates |
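Current usage of these storages can be listed on any node with Proxmox's storage manager CLI:

```bash
# Name, type, status and total/used/available per storage
ssh echo@10.0.20.201 "sudo pvesm status"
```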
### pvemini (10.0.20.201) — main host

- Resources: 64GB RAM, 1.4TB disk
- LXCs: 100 (running), 103 (running), 104 (running), 105 (stopped), 106 (running), 108 (running), 171 (running)
- VMs: 201 (running), 300 (stopped — Windows 11 template), 302 (stopped — Oracle test)
- Daily backup at 02:00: LXC 100, 104, 106, 108, VM 201 → storage "backup"

Scripts in `/opt/scripts/`:

- `ha-monitor.sh` — daily at 00:00, HA cluster status
- `monitor-ssl-certificates.sh` — checks SSL certificates
- `ups-shutdown-cluster.sh` — orchestrated shutdown on critical UPS state
- `ups-monthly-test.sh` — 1st of the month, UPS battery test
- `ups-maintenance-shutdown.sh` — UPS maintenance shutdown
- `vm107-monitor.sh` — VM 107 monitoring
### pveelite (10.0.20.202)

- Resources: 16GB RAM, 557GB disk (+ 8GB ZFS swap — added 2026-04-20, anti-OOM)
- LXCs: 101 (running), 105 (stopped), 110 (running), 301 (stopped)
- VMs: 109 (stopped — Oracle DR)
- Daily backup at 22:00: LXC 101, 110 → backup-pvemini-nfs

Scripts in `/opt/scripts/`:

- `oracle-backup-monitor-proxmox.sh` — daily at 21:00, checks the Oracle backup
- `weekly-dr-test-proxmox.sh` — Saturday 06:00, Oracle DR restore test (VM 109)
### pve1 (10.0.20.200)

- Resources: 32GB RAM, 1.3TB disk
- Status: empty (no active VM/LXC)
## Local LLM/AI services

| Service | LXC | IP:Port | Notes |
|---|---|---|---|
| Ollama | 104 flowise | 10.0.20.161:11434 | CPU-only; models: all-minilm, nomic-embed-text, llama3.2 |
| Flowise | 104 flowise | 10.0.20.161:3000 | AI flow builder |
## High Availability (HA)

HA groups (node priorities in parentheses):

```
ha-group-main  → pvemini (100), pveelite (50), pve1 (33)
ha-group-elite → pveelite (100), pve1 (33), pvemini (50)
```

Active HA resources:

| Resource | Group | Max restart | Max relocate | Note |
|---|---|---|---|---|
| ct:100 portainer | ha-group-main | 3 | 3 | |
| ct:101 minecraft | ha-group-elite | 3 | 3 | Runs on pveelite |
| ct:104 flowise | ha-group-main | 3 | 2 | Limits added 2026-04-20 |
| ct:106 gitea | ha-group-main | 3 | 3 | |
| ct:108 central-oracle | ha-group-main | 3 | 2 | Limits added 2026-04-20 |

VM 109 is NO longer in HA — removed 2026-04-20 after an OOM loop. It is started manually only (weekly DR test, Saturday 06:00).

```bash
# Check HA
ssh echo@10.0.20.201 "sudo ha-manager status"
# Change limits (example)
ssh echo@10.0.20.201 "sudo ha-manager set ct:108 --max_restart 3 --max_relocate 2"
```
## Corosync tuning (post-incident 2026-04-20)

Token raised to 10000 ms (default: 1000 ms) — tolerates a short USB disconnect on pveelite without a forced reboot.

```bash
# Check
ssh echo@10.0.20.201 "sudo corosync-cmapctl | grep 'totem.token '"
# runtime.config.totem.token (u32) = 10650
# totem.token (u32) = 10000
```
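The runtime value is higher than the configured one because corosync adds a per-node token coefficient (650 ms per node above two by default, so 10000 + 650 = 10650 with 3 nodes). The value itself lives in the totem section of `/etc/corosync/corosync.conf`; the stanza below is a sketch, not a dump of the live config:

```bash
# Relevant stanza in /etc/corosync/corosync.conf (sketch):
#   totem {
#     ...
#     token: 10000
#   }
# After editing, bump config_version in the same file, then reload cluster-wide:
ssh echo@10.0.20.201 "sudo corosync-cfgtool -R"
```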
## Diagnostic tools (installed 2026-04-20)

### rasdaemon — MCE + PCIe AER monitoring

```bash
ssh echo@10.0.20.201 "sudo ras-mc-ctl --summary"
```

### netconsole — kernel logs → pve1

If pvemini crashes hard, the last kernel lines end up on pve1:

```bash
ssh echo@10.0.20.200 "sudo tail /var/log/netconsole-pvemini.log"
ssh echo@10.0.20.200 "sudo systemctl status netconsole-receiver"
```
### kdump-tools — crash dump capture

```bash
ssh echo@10.0.20.201 "sudo systemctl is-active kdump-tools"
# Dumps land in /var/crash/ on pvemini
```

### kernel.panic auto-reboot

```bash
ssh echo@10.0.20.201 "sudo sysctl kernel.panic"
# kernel.panic = 10 → auto-reboot 10 s after a kernel panic
```
## OOM alerting

Script `/opt/scripts/oom-alert.sh` on all 3 nodes — cron every minute — mails mmarius28@gmail.com when it detects an OOM kill. A sketch of such a check follows the verification snippet below.

```bash
# Verify it is installed on all nodes
for ip in 10.0.20.200 10.0.20.201 10.0.20.202; do
  ssh echo@$ip "sudo crontab -l | grep oom-alert"
done
```
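For orientation, a minimal version of such a check could look like this; the deployed /opt/scripts/oom-alert.sh was not copied here and may differ:

```bash
#!/bin/bash
# Sketch: mail once per new OOM event seen in the kernel log (hypothetical logic)
STATE=/var/tmp/oom-alert.last
HITS=$(journalctl -k --since "2 minutes ago" | grep -iE 'oom-kill|Out of memory' | tail -n 5)
if [ -n "$HITS" ] && [ "$HITS" != "$(cat "$STATE" 2>/dev/null)" ]; then
  echo "$HITS" | mail -r ups@romfast.ro -s "OOM kill on $(hostname)" mmarius28@gmail.com
  printf '%s' "$HITS" > "$STATE"
fi
```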
## Mail notifications (Proxmox → mail.romfast.ro)

All 3 nodes send through mail.romfast.ro:465 as ups@romfast.ro.

```bash
# Quick test
ssh echo@10.0.20.201 "echo 'test' | sudo mail -r 'ups@romfast.ro' -s 'test pvemini' mmarius28@gmail.com"
ssh echo@10.0.20.201 "sudo journalctl -u 'postfix@-' --since '1 min ago' | grep status="
# Expected: status=sent (250 OK ...)
```
## Swap on pveelite (8GB ZFS zvol)

Added 2026-04-20 against OOM (pveelite has only 16GB RAM).

```bash
ssh echo@10.0.20.202 "sudo swapon --show; sudo sysctl vm.swappiness"
# swappiness: 10 (swap only under real memory pressure)
```
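For the record, swap on a ZFS zvol is typically created along these lines; the zvol name rpool/swap and the property set are assumptions based on common Proxmox practice, not a dump of the actual setup:

```bash
# Create an 8G zvol tuned for swap, then enable it (run on pveelite as root)
zfs create -V 8G -b "$(getconf PAGESIZE)" -o compression=zle \
  -o logbias=throughput -o sync=always -o primarycache=metadata rpool/swap
mkswap /dev/zvol/rpool/swap
swapon /dev/zvol/rpool/swap
sysctl vm.swappiness=10   # matches the value shown above
```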
## Automatic alerts when

- A container/VM is down unexpectedly
- Disk usage >85% on any container/VM
- A service is `unhealthy` for >1h
- Repeated errors in the logs
## I act on my own (without asking)

- Monitoring and reading status
- Diagnosis: logs, configuration, health checks
- Safe fixes: permissions, service restarts
## I ask first

- Starting/stopping a VM or LXC
- Configuration changes (network, storage, resources)
- Any destructive operation