T6 (supervised worker):
- app/worker/healthcheck.py: probe on the heartbeat stored in the DB (stale beat -> exit 1).
Catches a hung worker (process alive, beat frozen), which restart: always does not
detect. Wired in compose as the healthcheck of the worker service.
- autoheal sidecar: actually restarts the unhealthy container (plain compose only
marks it unhealthy; it does not restart on unhealthy).
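The probe only helps if the worker keeps its heartbeat row fresh. Below is a minimal, self-contained sketch of both halves using the stdlib `sqlite3` module, assuming a hypothetical single-row layout: only the table name `worker_heartbeat` and the column `last_beat` come from healthcheck.py; `beat()`, `is_fresh()` and the `id = 1` convention are illustrative, not the project's `app.db` API.

```python
from __future__ import annotations

import sqlite3
from datetime import datetime, timezone

# Hypothetical single-row layout; only `worker_heartbeat` / `last_beat` are
# taken from healthcheck.py, the `id = 1` row convention is an assumption.
SCHEMA = """
CREATE TABLE IF NOT EXISTS worker_heartbeat (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    last_beat TEXT
)
"""


def beat(conn: sqlite3.Connection, now: datetime | None = None) -> None:
    """Worker side: overwrite the single heartbeat row with an aware UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    conn.execute(
        "INSERT OR REPLACE INTO worker_heartbeat (id, last_beat) VALUES (1, ?)",
        (now.isoformat(),),
    )
    conn.commit()


def is_fresh(conn: sqlite3.Connection, stale_s: float, now: datetime | None = None) -> bool:
    """Probe side: healthy only if a beat exists and is at most `stale_s` seconds old."""
    row = conn.execute("SELECT last_beat FROM worker_heartbeat WHERE id = 1").fetchone()
    if row is None or not row[0]:
        return False  # worker never started (or never wrote a beat)
    last = datetime.fromisoformat(row[0])
    now = now or datetime.now(timezone.utc)
    return (now - last).total_seconds() <= stale_s
```

A hung worker is exactly the case where `beat()` stops being called while the process stays alive: `is_fresh()` flips to False once `stale_s` seconds pass, even though `restart: always` sees nothing wrong. Passing `now` explicitly keeps the check deterministic in tests.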
T7 (deploy):
- tools/backup.py: ONLINE backup via Connection.backup (a WAL-mode database cannot be
copied safely with cp); --keep N rotates the snapshots.
- .env.example documents the environment variables; the named persistent volume is
already in compose.
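A minimal sketch of an online backup with `--keep N`-style rotation, assuming snapshots are named `snapshot-<UTC timestamp>.db` in a single directory. The function name `backup_db`, the naming scheme and the optional `now` parameter (added only for deterministic tests) are hypothetical, not the actual tools/backup.py interface. What it illustrates: `sqlite3.Connection.backup` goes through SQLite's online backup API, so committed pages still sitting in the `-wal` file end up in the snapshot, whereas `cp` of the main file alone can miss them or catch the file mid-checkpoint.

```python
from __future__ import annotations

import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def backup_db(src: str | Path, dest_dir: str | Path, keep: int,
              now: datetime | None = None) -> Path:
    """Snapshot `src` into `dest_dir` online, then keep only the newest `keep` files."""
    if keep < 1:
        raise ValueError("keep must be >= 1")
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%S%fZ")
    target = dest_dir / f"snapshot-{stamp}.db"

    src_conn = sqlite3.connect(str(src))
    dst_conn = sqlite3.connect(str(target))
    try:
        # Online backup API: copies a consistent image of the database, including
        # committed pages that have not been checkpointed out of the -wal file yet.
        src_conn.backup(dst_conn)
    finally:
        dst_conn.close()
        src_conn.close()

    # Rotation: fixed-width UTC timestamps sort chronologically as strings,
    # so everything except the last `keep` names is the oldest and gets deleted.
    snapshots = sorted(dest_dir.glob("snapshot-*.db"))
    for old in snapshots[:-keep]:
        old.unlink()
    return target
```

Timestamped names make rotation a plain sort-and-slice with no extra metadata; the trade-off is that the naming scheme must stay fixed-width for the string order to match the time order.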
Critical fix (api/worker split into 2 containers): AUTOPASS_CREDS_KEY must be SHARED
between api and worker, otherwise the worker cannot decrypt the credentials encrypted
by the API -> submissions get stuck. Now enforced in compose (${...:?} -> explicit
failure if it is missing).
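Compose's `${VAR:?message}` form makes compose fail with an error when the variable is unset or empty. If the same guarantee were also wanted inside the Python processes (an assumption; the commit only enforces it in compose), a fail-fast startup check could look like the sketch below; `require_shared_key` is a hypothetical helper.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def require_shared_key(env: Mapping[str, str] | None = None,
                       name: str = "AUTOPASS_CREDS_KEY") -> str:
    """Return the shared credentials key, or exit loudly if it is unset or empty."""
    source = os.environ if env is None else env
    value = source.get(name, "")
    if not value:
        # Mirrors compose's ${VAR:?}: "unset" and "set but empty" both fail,
        # so neither api nor worker can start without the key.
        raise SystemExit(f"{name} is unset or empty; api and worker must share the same value")
    return value
```

This catches a missing key, not two different keys; equality still depends on both services reading the same value from `.env`.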
.gitignore: exception !.env.example, so the example file is still tracked.
5 new tests (tests/test_deploy.py). 100 passing in total.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
"""Liveness probe for the worker (T6), used by the Docker healthcheck.

The worker is not an HTTP server, so there is no endpoint to probe, and
`restart: always` only catches a DEAD process, not a HUNG worker (process
alive but no longer writing heartbeats). This probe reads `worker_heartbeat`
from the DB and fails if the last beat is older than `worker_heartbeat_stale_s`;
Docker then marks the worker container unhealthy and the autoheal sidecar
restarts it.

Usage (compose healthcheck): python -m app.worker.healthcheck
Exit 0 = healthy, 1 = stale or missing.
"""

from __future__ import annotations

import sys
from datetime import datetime, timezone

from ..config import Settings, get_settings
from ..db import get_connection, read_heartbeat


def worker_healthy(conn, settings: Settings, *, now: datetime | None = None) -> bool:
    """True if the last heartbeat is fresher than the staleness threshold."""
    hb = read_heartbeat(conn)
    # No row, or a row with an empty timestamp: the worker never started.
    if hb is None or not hb["last_beat"]:
        return False
    try:
        last = datetime.fromisoformat(hb["last_beat"])
    except (ValueError, TypeError):
        # Unparseable timestamp: report unhealthy instead of crashing.
        return False
    now = now or datetime.now(timezone.utc)
    # Treat naive timestamps as UTC so a naive/aware mix cannot raise TypeError
    # outside the try block above (both-naive and both-aware behave as before).
    if last.tzinfo is None:
        last = last.replace(tzinfo=timezone.utc)
    if now.tzinfo is None:
        now = now.replace(tzinfo=timezone.utc)
    return (now - last).total_seconds() <= settings.worker_heartbeat_stale_s


def main() -> int:
    settings = get_settings()
    conn = get_connection()
    try:
        ok = worker_healthy(conn, settings)
    finally:
        # Always release the DB handle; the probe runs on every healthcheck interval.
        conn.close()
    if not ok:
        print("[healthcheck] worker stale or not started", flush=True)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())