roa2web-service-auto/shared/docs/MULTI_TENANT_UPGRADE_PLAN.md
Marius Mutu c56f832e81 Add comprehensive multi-tenant architecture upgrade plan
Creates detailed 60-page implementation roadmap for transforming ROA2WEB from
single-tenant to multi-tenant SaaS architecture. Plan includes 6 phases with
backward compatibility, hybrid connection support (SSH tunnel + direct), and
complete deployment strategies for dev/Docker/Windows environments.

Key features:
- Tenant isolation with separate Oracle connection pools per tenant
- Dynamic SSH tunnel management with auto-restart
- Encrypted credentials in PostgreSQL/SQLite tenant config DB
- JWT-based tenant identification and access validation
- Redis cache namespacing per tenant
- Comprehensive testing and migration strategies

Timeline: 14-20 days implementation
Target: <10% performance overhead, zero downtime migration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-25 22:59:12 +03:00

# Multi-Tenant Architecture Upgrade Plan - ROA2WEB
**Version:** 1.0
**Created:** 2025-10-25
**Status:** Planning Phase
---
## 📋 Executive Summary
ROA2WEB will be transformed from a **single-tenant** application (one client, one Oracle database) into a **multi-tenant SaaS** architecture that supports:
- **Multiple simultaneous clients** with complete isolation between tenants (pools, cache, audit logs)
- **Hybrid connections**: SSH tunnel for remote clients OR direct TCP for clients on the LAN
- **Flexible deployment**: Development (WSL), Docker (Proxmox LXC), Windows IIS
- **Backward compatibility**: The "default" tenant works exactly like the current single-tenant setup (zero breaking changes)
- **Gradual migration**: Each phase is independently testable, with incremental rollout
- **Security-first**: Passwords encrypted in the tenant DB, read-only SSH keys, JWT signing per tenant
- **Performance**: < 10% overhead vs. single-tenant, with pool isolation per tenant
---
## 🏗️ Target Architecture
### Single-Tenant (Current)
```
┌─────────────────────────────────────────────────────┐
│ FastAPI Backend │
│ │
│ ┌─────────────────────────────────────────────┐ │
│ │ OraclePool (Singleton) │ │
│ │ - Hardcoded credentials from .env │ │
│ │ - Min: 2, Max: 10 connections │ │
│ │ - Shared by all users │ │
│ └─────────────────────────────────────────────┘ │
│ ▼ │
└──────────────────────┼──────────────────────────────┘
┌─────────────┴───────────┐
│ │
SSH Tunnel Direct Connection
(Development) (Windows Production)
│ │
▼ ▼
┌─────────────────┐ ┌──────────────────┐
│ Oracle Server │ │ Oracle Server │
│ (Remote) │ │ (Local LAN) │
└─────────────────┘ └──────────────────┘
JWT Token Structure (Current):
{
"username": "john.doe",
"user_id": 123,
"companies": ["COMP1", "COMP2"],
"permissions": ["read", "reports"],
"exp": 1234567890,
"iat": 1234567800,
"type": "access"
}
```
### Multi-Tenant (Target)
```
┌────────────────────────────────────────────────────────────────────┐
│ FastAPI Backend │
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ MultiTenantPoolManager (New) │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Client A │ │ Client B │ │ Client C │ │ │
│ │ │ Pool (2-10) │ │ Pool (2-10) │ │ Pool (2-10) │ │ │
│ │ │ SSH Tunnel │ │ Direct Conn │ │ SSH Tunnel │ │ │
│ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ │
│ │ │ │ │ │ │
│ └─────────┼─────────────────┼─────────────────┼──────────────┘ │
│ │ │ │ │
└────────────┼─────────────────┼─────────────────┼────────────────┘
│ │ │
┌────────┴─────┐ ┌────────┴─────┐ ┌────────┴─────┐
│ SSH Process │ │ Direct │ │ SSH Process │
│ localhost: │ │ 192.168.1.50 │ │ localhost: │
│ 15261 │ │ :1521 │ │ 15262 │
└────────┬─────┘ └────────┬─────┘ └────────┬─────┘
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Oracle │ │ Oracle │ │ Oracle │
│ Client A │ │ Client B │ │ Client C │
│ (Remote) │ │ (LAN) │ │ (Remote) │
└──────────────┘ └──────────────┘ └──────────────┘
┌──────────────────────┐
│ Tenant Config DB │
│ (PostgreSQL/SQLite) │
│ │
│ - tenants │
│ - tenant_users │
│ - audit_logs │
└──────────────────────┘
JWT Token Structure (Target):
{
"username": "john.doe",
"user_id": 123,
"tenant_id": "client-a-uuid", ← NEW
"companies": ["COMP1", "COMP2"],
"permissions": ["read", "reports"],
"exp": 1234567890,
"iat": 1234567800,
"type": "access"
}
Redis Cache Keys:
cache:{tenant_id}:dashboard:{company_id} ← Already prepared!
cache:{tenant_id}:invoices:{filters_hash}
```
### Key Architectural Decisions
1. **Lazy Pool Initialization**: Pools are created only when a tenant is first accessed (saves memory)
2. **SSH Tunnel per Tenant**: A separate subprocess for each remote tenant (isolation, resilience)
3. **Separate Tenant Config DB**: Tenant config is not stored in Oracle (avoids circular dependencies)
4. **JWT Tenant ID Signed**: The tenant ID lives inside the signed token, so the client cannot modify it
5. **Pool Cleanup**: Pools inactive for > 1h are closed automatically (saves resources)
6. **Backward Compatible**: The "default" tenant maps to the current .env (zero migration pain)
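
Decision 4 relies on the token signature covering the `tenant_id` claim. The sketch below illustrates that property with an HS256-style signature built from the standard library only. It is not the project's `jwt_handler`; the function names (`sign_token`, `verify_token`, `tamper_tenant`) and the demo secret are assumptions made for this example.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """URL-safe base64 without padding, as used by JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode a JWT segment, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes) -> dict:
    """Return the payload, or raise ValueError if the signature does not match."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("invalid signature")
    return json.loads(_b64url_decode(signing_input.split(".")[1]))


def tamper_tenant(token: str, new_tenant_id: str) -> str:
    """Simulate a client rewriting tenant_id while keeping the old signature."""
    header_b64, payload_b64, signature = token.split(".")
    payload = json.loads(_b64url_decode(payload_b64))
    payload["tenant_id"] = new_tenant_id
    forged = _b64url(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
    return f"{header_b64}.{forged}.{signature}"
```

A token whose `tenant_id` was changed after signing fails `verify_token` with `ValueError`, because the signature was computed over the original payload bytes.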
---
## 🗂️ File Structure
### New Files
```
shared/
├── database/
│ ├── multi_tenant_pool.py ✅ NEW - MultiTenantPoolManager class
│ ├── tenant_config.py ✅ NEW - Tenant configuration loader
│ ├── ssh_tunnel_manager.py ✅ NEW - SSH tunnel per tenant management
│ └── tenant_models.py ✅ NEW - Pydantic models for tenants
├── middleware/
│ └── tenant_middleware.py ✅ NEW - Tenant identification middleware
├── schemas/
│ └── tenant_config_schema.sql ✅ NEW - PostgreSQL/SQLite schema
└── utils/
├── encryption.py ✅ NEW - Fernet encryption for passwords
└── tenant_utils.py ✅ NEW - Tenant helper functions
deployment/
├── docker/
│ └── tenant-config-db.dockerfile ✅ NEW - PostgreSQL tenant config container
└── windows/
└── tenant-config-setup.ps1 ✅ NEW - SQL Server Express setup for tenants
```
### Modified Files
```
shared/
├── database/
│ └── oracle_pool.py ⚠️ MODIFY - Add DEPRECATED warning
├── auth/
│ ├── jwt_handler.py ⚠️ MODIFY - Add tenant_id to JWT payload
│ └── middleware.py ⚠️ MODIFY - Extract tenant_id, validate access
└── cache/
└── redis_client.py ⚠️ MODIFY - Use real tenant_id (not "default")
reports-app/backend/
├── app/
│ ├── main.py ⚠️ MODIFY - Initialize MultiTenantPoolManager
│ └── routers/
│ ├── companies.py ⚠️ MODIFY - Use tenant_id from request.state
│ ├── dashboard.py ⚠️ MODIFY - Use tenant_id from request.state
│ ├── invoices.py ⚠️ MODIFY - Use tenant_id from request.state
│ └── treasury.py ⚠️ MODIFY - Use tenant_id from request.state
└── .env.example ⚠️ MODIFY - Add tenant config DB variables
docker-compose.yml ⚠️ MODIFY - Add tenant-config-db service
deployment/windows/
└── scripts/
└── Install-ROA2WEB.ps1 ⚠️ MODIFY - Add tenant DB setup
```
### Database Schema (Tenant Config DB)
**PostgreSQL/SQLite Compatible Schema**
```sql
-- shared/schemas/tenant_config_schema.sql
-- Tenants configuration table
CREATE TABLE IF NOT EXISTS tenants (
id VARCHAR(36) PRIMARY KEY, -- UUID
name VARCHAR(255) NOT NULL, -- Display name (ex: "Client A - Retail SRL")
connection_type VARCHAR(20) NOT NULL, -- 'ssh_tunnel' | 'direct'
-- Oracle connection details
oracle_host VARCHAR(255) NOT NULL, -- Oracle server IP/hostname
oracle_port INTEGER NOT NULL DEFAULT 1521,
oracle_sid VARCHAR(50) NOT NULL DEFAULT 'ROA',
oracle_user VARCHAR(100) NOT NULL,
oracle_password_encrypted TEXT NOT NULL, -- Fernet encrypted password
-- SSH tunnel configuration (NULL if connection_type='direct')
ssh_host VARCHAR(255), -- SSH server IP
ssh_port INTEGER DEFAULT 22,
ssh_user VARCHAR(100),
ssh_key_path VARCHAR(500), -- Path to SSH private key
ssh_tunnel_local_port INTEGER, -- Local port for tunnel (ex: 15261)
-- Pool configuration
min_connections INTEGER NOT NULL DEFAULT 2,
max_connections INTEGER NOT NULL DEFAULT 10,
-- Status
is_active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
-- Constraints
CONSTRAINT chk_connection_type CHECK (connection_type IN ('ssh_tunnel', 'direct')),
CONSTRAINT chk_ssh_config CHECK (
(connection_type = 'direct') OR
(connection_type = 'ssh_tunnel' AND ssh_host IS NOT NULL AND ssh_key_path IS NOT NULL)
)
);
-- Tenant users mapping (which users have access to which tenants)
CREATE TABLE IF NOT EXISTS tenant_users (
id SERIAL PRIMARY KEY, -- Auto-increment ID (PostgreSQL; on SQLite use INTEGER PRIMARY KEY AUTOINCREMENT)
tenant_id VARCHAR(36) NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
user_id INTEGER NOT NULL, -- Oracle user ID from CONTAFIN_ORACLE.UTILIZATORI
username VARCHAR(100) NOT NULL, -- Oracle username
is_admin BOOLEAN NOT NULL DEFAULT FALSE, -- Tenant admin (can manage tenant config)
granted_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
granted_by INTEGER, -- User ID who granted access
UNIQUE(tenant_id, user_id)
);
-- Audit logs per tenant
CREATE TABLE IF NOT EXISTS audit_logs (
id SERIAL PRIMARY KEY, -- PostgreSQL; on SQLite use INTEGER PRIMARY KEY AUTOINCREMENT
tenant_id VARCHAR(36) NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
user_id INTEGER NOT NULL,
username VARCHAR(100) NOT NULL,
action VARCHAR(100) NOT NULL, -- 'login', 'query', 'export', etc.
resource VARCHAR(255), -- Resource accessed (ex: 'dashboard', 'invoices')
status VARCHAR(20) NOT NULL, -- 'success' | 'error'
error_message TEXT,
ip_address VARCHAR(50),
user_agent TEXT,
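created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Indexes for fast audit queries
-- (created separately: inline INDEX clauses inside CREATE TABLE are MySQL syntax,
-- not valid in PostgreSQL or SQLite)
CREATE INDEX IF NOT EXISTS idx_tenant_user ON audit_logs(tenant_id, user_id);
CREATE INDEX IF NOT EXISTS idx_created_at ON audit_logs(created_at);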
-- Insert default tenant (backward compatibility)
-- This maps to existing .env credentials
INSERT INTO tenants (
id, name, connection_type,
oracle_host, oracle_port, oracle_sid, oracle_user, oracle_password_encrypted,
min_connections, max_connections, is_active
) VALUES (
'default',
'Default Tenant (Single-Tenant Legacy)',
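'direct', -- Placeholder: an 'ssh_tunnel' row without ssh_host/ssh_key_path would violate chk_ssh_config; the migration script sets the real value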
'localhost', -- Will be overridden by environment if needed
1526,
'ROA',
'CONTAFIN_ORACLE',
'PLACEHOLDER_ENCRYPTED_PASSWORD', -- Will be replaced by migration script
2,
10,
TRUE
) ON CONFLICT (id) DO NOTHING;
-- Indexes for performance
CREATE INDEX IF NOT EXISTS idx_tenants_active ON tenants(is_active);
CREATE INDEX IF NOT EXISTS idx_tenant_users_user ON tenant_users(user_id);
CREATE INDEX IF NOT EXISTS idx_audit_tenant ON audit_logs(tenant_id);
```
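
The `chk_connection_type` and `chk_ssh_config` constraints can be exercised against an in-memory SQLite database before any migration is written. The sketch below uses a reduced copy of the `tenants` table (only the columns the constraints touch); the helper names `try_insert` and `demo` are assumptions for this example, not part of the plan.

```python
import sqlite3

# Reduced copy of the tenants table: only the columns used by the CHECK constraints.
SCHEMA = """
CREATE TABLE tenants (
    id VARCHAR(36) PRIMARY KEY,
    connection_type VARCHAR(20) NOT NULL,
    ssh_host VARCHAR(255),
    ssh_key_path VARCHAR(500),
    CONSTRAINT chk_connection_type CHECK (connection_type IN ('ssh_tunnel', 'direct')),
    CONSTRAINT chk_ssh_config CHECK (
        (connection_type = 'direct') OR
        (connection_type = 'ssh_tunnel' AND ssh_host IS NOT NULL AND ssh_key_path IS NOT NULL)
    )
);
"""


def try_insert(conn: sqlite3.Connection, row: tuple) -> bool:
    """Insert one row; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO tenants (id, connection_type, ssh_host, ssh_key_path) "
                "VALUES (?, ?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


def demo() -> dict:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    results = {
        "direct_without_ssh": try_insert(conn, ("t1", "direct", None, None)),
        "tunnel_without_ssh": try_insert(conn, ("t2", "ssh_tunnel", None, None)),
        "tunnel_with_ssh": try_insert(conn, ("t3", "ssh_tunnel", "10.0.0.5", "/keys/t3")),
        "unknown_type": try_insert(conn, ("t4", "vpn", None, None)),
    }
    conn.close()
    return results
```

Only the `direct` row and the fully specified `ssh_tunnel` row are accepted; a tunnel row without `ssh_host` and `ssh_key_path`, or an unknown connection type, is rejected with `sqlite3.IntegrityError`.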
---
## 🚀 Upgrade Phases
### PHASE 1: Tenant Configuration Database (2-3 days)
**Goal:** Create the tenant configuration database and a loader that reads tenant configs at startup.
#### Tasks
1. **Create the PostgreSQL/SQLite schema for tenant config**
   - **File:** `shared/schemas/tenant_config_schema.sql`
   - **Action:** Define tables `tenants`, `tenant_users`, `audit_logs`
   - **Deployment:**
     - Dev: SQLite (`data/tenant_config.db`)
     - Docker: PostgreSQL container (`roa-tenant-config-db`)
     - Windows: SQL Server Express OR PostgreSQL Windows service
2. **Implement TenantConfigLoader**
   - **File:** `shared/database/tenant_config.py`
   - **Class:** `TenantConfigLoader(db_url: str)`
   - **Methods:**
     - `async def load_tenants() -> Dict[str, TenantConfig]` - Load all active tenants
     - `async def get_tenant(tenant_id: str) -> Optional[TenantConfig]` - Get specific tenant
     - `async def reload_tenant(tenant_id: str)` - Reload tenant config (for updates)
   - **Pattern:** Async context manager for DB connections
3. **Implement Pydantic models for tenant config**
   - **File:** `shared/database/tenant_models.py`
   - **Models:**
```python
from typing import Literal, Optional

from pydantic import BaseModel


class TenantConfig(BaseModel):
    id: str  # UUID
    name: str
    connection_type: Literal['ssh_tunnel', 'direct']
    oracle_host: str
    oracle_port: int
    oracle_sid: str
    oracle_user: str
    oracle_password: str  # Decrypted
    ssh_host: Optional[str] = None
    ssh_port: Optional[int] = 22
    ssh_user: Optional[str] = None
    ssh_key_path: Optional[str] = None
    ssh_tunnel_local_port: Optional[int] = None
    min_connections: int = 2
    max_connections: int = 10
    is_active: bool = True
```
4. **Implement password encryption/decryption**
   - **File:** `shared/utils/encryption.py`
   - **Functions:**
     - `encrypt_password(password: str, key: str) -> str` - Fernet encryption
     - `decrypt_password(encrypted: str, key: str) -> str` - Fernet decryption
   - **Environment:** `DB_ENCRYPTION_KEY` (generate with `Fernet.generate_key()`)
5. **Create the migration script for the default tenant**
   - **File:** `shared/scripts/create_default_tenant.py`
   - **Action:**
     - Read credentials from the current `.env`
     - Encrypt the password with `DB_ENCRYPTION_KEY`
     - Insert the "default" tenant into the tenant DB
     - Test decryption and the Oracle connection
6. **Update Docker Compose with the tenant config DB**
   - **File:** `docker-compose.yml`
   - **New service:**
```yaml
roa-tenant-config-db:
  image: postgres:15-alpine
  container_name: roa-tenant-config-db
  environment:
    POSTGRES_DB: tenant_config
    POSTGRES_USER: tenant_admin
    POSTGRES_PASSWORD: ${TENANT_DB_PASSWORD}
  volumes:
    - tenant-config-data:/var/lib/postgresql/data
    - ./shared/schemas/tenant_config_schema.sql:/docker-entrypoint-initdb.d/schema.sql:ro
  networks:
    - roa-network
```
7. **Update .env.example with tenant DB variables**
```bash
# Tenant Configuration Database
TENANT_DB_URL=postgresql://tenant_admin:password@localhost:5432/tenant_config
# For SQLite (development): sqlite:///data/tenant_config.db
DB_ENCRYPTION_KEY=GENERATE_WITH_Fernet.generate_key()
```
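
A Fernet key is 32 random bytes in URL-safe base64, so the `DB_ENCRYPTION_KEY` placeholder above can be caught at startup before any password is decrypted. The following is a minimal standard-library sketch; `generate_key` and `is_valid_fernet_key` are hypothetical helper names, and a real deployment would presumably still generate the key with `Fernet.generate_key()` from the `cryptography` package.

```python
import base64
import binascii
import os


def generate_key() -> str:
    """Return 32 random bytes as URL-safe base64 (the Fernet key format)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def is_valid_fernet_key(value: str) -> bool:
    """Check that the value decodes to exactly 32 bytes of key material."""
    try:
        raw = base64.urlsafe_b64decode(value.encode("ascii"))
    except (binascii.Error, ValueError):
        # Bad padding, or non-ASCII characters in the configured value
        return False
    return len(raw) == 32


def load_encryption_key(env: dict) -> str:
    """Read DB_ENCRYPTION_KEY from an environment mapping, failing fast if unusable."""
    key = env.get("DB_ENCRYPTION_KEY", "")
    if not is_valid_fernet_key(key):
        raise RuntimeError("DB_ENCRYPTION_KEY is missing or not a valid Fernet key")
    return key
```

Calling `load_encryption_key(os.environ)` during application startup would turn a forgotten placeholder into an immediate, readable error instead of a decryption failure on the first tenant request.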
#### Verifiable Output
- ✅ Tenant DB is created successfully (PostgreSQL/SQLite)
- ✅ Schema tables are created (`tenants`, `tenant_users`, `audit_logs`)
- ✅ Default tenant loads with credentials from the current `.env`
- ✅ Password encryption/decryption works
- ✅ Test: `pytest shared/tests/test_tenant_config.py -v`
- ✅ Docker: `docker-compose up roa-tenant-config-db` starts successfully
---
### PHASE 2: MultiTenantPoolManager (3-4 days)
**Goal:** Implement a pool manager that creates separate Oracle pools per tenant with lazy initialization.
#### Tasks
1. **Implement the MultiTenantPoolManager class**
   - **File:** `shared/database/multi_tenant_pool.py`
   - **Pattern:** Singleton (similar to the current `OraclePool`)
   - **Structure:**
```python
class MultiTenantPoolManager:
    _instance: Optional['MultiTenantPoolManager'] = None
    _pools: Dict[str, oracledb.ConnectionPool] = {}  # tenant_id -> pool
    _tenant_configs: Dict[str, TenantConfig] = {}
    _pool_locks: Dict[str, asyncio.Lock] = {}  # Serializes pool creation per tenant (coroutine-safe, not thread-safe)
    _last_access: Dict[str, datetime] = {}  # For cleanup of inactive pools

    async def initialize(self, tenant_db_url: str):
        """Load tenant configs from tenant DB"""

    async def get_connection(self, tenant_id: str):
        """Context manager - get connection from tenant pool (lazy init)"""

    async def _ensure_pool(self, tenant_id: str):
        """Lazy initialize pool if not exists"""

    async def reload_tenant(self, tenant_id: str):
        """Reload tenant config and recreate pool"""

    async def cleanup_inactive_pools(self, max_idle_hours: int = 1):
        """Close pools inactive > max_idle_hours"""

    async def close_all_pools(self):
        """Shutdown - close all pools"""
```
2. **Implement lazy pool initialization**
   - **Logic:**
```python
async def _ensure_pool(self, tenant_id: str):
    if tenant_id in self._pools:
        self._last_access[tenant_id] = datetime.utcnow()
        return  # Pool already exists
    # Acquire the per-tenant lock so concurrent requests create only one pool
    async with self._pool_locks.setdefault(tenant_id, asyncio.Lock()):
        # Double-check inside the lock
        if tenant_id in self._pools:
            return
        # Load tenant config
        tenant_config = await self._load_tenant_config(tenant_id)
        if not tenant_config.is_active:
            raise ValueError(f"Tenant {tenant_id} is not active")
        # Create pool
        pool = oracledb.create_pool(
            user=tenant_config.oracle_user,
            password=tenant_config.oracle_password,
            host=tenant_config.oracle_host,
            port=tenant_config.oracle_port,
            sid=tenant_config.oracle_sid,
            min=tenant_config.min_connections,
            max=tenant_config.max_connections,
            increment=1,
            getmode=oracledb.POOL_GETMODE_WAIT
        )
        self._pools[tenant_id] = pool
        self._tenant_configs[tenant_id] = tenant_config
        self._last_access[tenant_id] = datetime.utcnow()
        logger.info(f"Created pool for tenant {tenant_id} ({tenant_config.name})")
```
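
The double-checked locking pattern used by `_ensure_pool()` can be verified in isolation, without Oracle. In the sketch below the pool is replaced by a string and pool creation by a short sleep; `LazyPools`, `_create_pool`, and `demo` are illustrative names, not part of the planned module.

```python
import asyncio
from typing import Dict


class LazyPools:
    """Toy stand-in for the pool manager: one lazily created 'pool' per tenant."""

    def __init__(self) -> None:
        self._pools: Dict[str, str] = {}
        self._locks: Dict[str, asyncio.Lock] = {}
        self.created = 0  # counts how many times a pool was actually built

    async def _create_pool(self, tenant_id: str) -> str:
        await asyncio.sleep(0.01)  # simulates slow pool creation
        self.created += 1
        return f"pool-for-{tenant_id}"

    async def ensure_pool(self, tenant_id: str) -> str:
        # Fast path: no lock needed once the pool exists
        if tenant_id in self._pools:
            return self._pools[tenant_id]
        lock = self._locks.setdefault(tenant_id, asyncio.Lock())
        async with lock:
            # Second check: another coroutine may have created it while we waited
            if tenant_id not in self._pools:
                self._pools[tenant_id] = await self._create_pool(tenant_id)
        return self._pools[tenant_id]


async def demo() -> int:
    pools = LazyPools()
    # 20 concurrent first requests for the same tenant
    await asyncio.gather(*(pools.ensure_pool("client-a") for _ in range(20)))
    return pools.created
```

Running `asyncio.run(demo())` returns the number of pools built for 20 concurrent requests; with the second check inside the lock, only the first coroutine creates the pool and the rest reuse it.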
3. **Implement the get_connection context manager**
   - **Pattern:** Same as `OraclePool.get_connection()`, but per tenant
```python
@asynccontextmanager
async def get_connection(self, tenant_id: str):
    await self._ensure_pool(tenant_id)  # Lazy init
    pool = self._pools[tenant_id]
    connection = None
    try:
        connection = pool.acquire()
        self._last_access[tenant_id] = datetime.utcnow()
        logger.debug(f"Connection acquired for tenant {tenant_id}")
        yield connection
    finally:
        if connection is not None:
            connection.close()  # Returns the connection to the pool
            logger.debug(f"Connection returned for tenant {tenant_id}")
```
4. **Implement pool cleanup for inactive tenants**
   - **Scheduled task:** Run every hour, close pools inactive > 1h
```python
async def cleanup_inactive_pools(self, max_idle_hours: int = 1):
    now = datetime.utcnow()
    inactive_tenants = []
    # Collect first, then close: the dicts must not change while being iterated
    for tenant_id, last_access in self._last_access.items():
        idle_hours = (now - last_access).total_seconds() / 3600
        if idle_hours > max_idle_hours:
            inactive_tenants.append(tenant_id)
    for tenant_id in inactive_tenants:
        logger.info(f"Closing inactive pool for tenant {tenant_id}")
        pool = self._pools.pop(tenant_id, None)
        if pool:
            pool.close()
        self._tenant_configs.pop(tenant_id, None)
        self._last_access.pop(tenant_id, None)
```
5. **Implement tenant config reload (for dynamic updates)**
   - **Use case:** An admin updates the tenant config in the DB and the application reloads it without a restart
```python
async def reload_tenant(self, tenant_id: str):
    # Close existing pool
    old_pool = self._pools.pop(tenant_id, None)
    if old_pool:
        old_pool.close()
    # Reload config from DB
    tenant_config = await self._tenant_config_loader.get_tenant(tenant_id)
    if not tenant_config:
        raise ValueError(f"Tenant {tenant_id} not found")
    # Pool will be recreated on next request (lazy init)
    self._tenant_configs.pop(tenant_id, None)
    self._last_access.pop(tenant_id, None)
    logger.info(f"Reloaded tenant config for {tenant_id}")
```
6. **Add backward compatibility layer**
   - **The "default" tenant** maps to the credentials in `.env` for zero breaking changes
```python
async def _load_default_tenant_from_env(self) -> TenantConfig:
    """Fallback: Load default tenant from .env if tenant DB is not available"""
    return TenantConfig(
        id='default',
        name='Default Tenant (Legacy)',
        connection_type='ssh_tunnel' if os.getenv('ORACLE_HOST') == 'localhost' else 'direct',
        oracle_host=os.getenv('ORACLE_HOST', 'localhost'),
        oracle_port=int(os.getenv('ORACLE_PORT', '1526')),
        oracle_sid=os.getenv('ORACLE_SID', 'ROA'),
        oracle_user=os.getenv('ORACLE_USER'),
        oracle_password=os.getenv('ORACLE_PASSWORD'),
        min_connections=2,
        max_connections=10,
        is_active=True
    )
```
7. **Mark OraclePool as DEPRECATED**
   - **File:** `shared/database/oracle_pool.py`
   - **Action:** Add deprecation warning
```python
import warnings


class OraclePool:
    """
    DEPRECATED: Use MultiTenantPoolManager instead.
    This class is kept for backward compatibility only.
    Will be removed in version 2.0.
    """

    def __init__(self):
        warnings.warn(
            "OraclePool is deprecated. Use MultiTenantPoolManager for multi-tenant support.",
            DeprecationWarning,
            stacklevel=2
        )
        # ... rest of code
```
#### Verifiable Output
- ✅ `MultiTenantPoolManager` creates pools per tenant
- ✅ Lazy initialization: the pool is created only on the first request
- ✅ The "default" tenant works with credentials from `.env` (backward compatible)
- ✅ Pool cleanup: inactive pools are closed automatically after 1h
- ✅ Reload tenant: config update without restarting the application
- ✅ Test: `pytest shared/tests/test_multi_tenant_pool.py -v`
- ✅ Test: Connect to 3 dummy tenants simultaneously
---
### PHASE 3: SSH Tunnel Management per Tenant (2-3 days)
**Goal:** Implement an SSH tunnel manager that creates and monitors an SSH subprocess per remote tenant.
#### Tasks
1. **Implement the SSHTunnelManager class**
   - **File:** `shared/database/ssh_tunnel_manager.py`
   - **Responsibilities:**
     - Start SSH tunnel subprocess per tenant
     - Monitor tunnel health (periodic checks)
     - Auto-restart on failure (exponential backoff)
     - Cleanup on shutdown
   - **Structure:**
```python
class SSHTunnelManager:
    _tunnels: Dict[str, subprocess.Popen] = {}  # tenant_id -> SSH process
    _tunnel_ports: Dict[str, int] = {}  # tenant_id -> local port
    _restart_attempts: Dict[str, int] = {}  # For exponential backoff

    async def start_tunnel(self, tenant_config: TenantConfig) -> int:
        """Start SSH tunnel for tenant, return local port"""

    async def stop_tunnel(self, tenant_id: str):
        """Stop SSH tunnel subprocess"""

    async def check_tunnel_health(self, tenant_id: str) -> bool:
        """Check if tunnel is alive and responding"""

    async def restart_tunnel(self, tenant_id: str):
        """Restart tunnel with exponential backoff"""

    async def cleanup_all_tunnels(self):
        """Shutdown - kill all SSH processes"""
```
2. **Implement the SSH tunnel start logic**
   - **Logic:**
```python
async def start_tunnel(self, tenant_config: TenantConfig) -> int:
    tenant_id = tenant_config.id
    # Use the configured local port, or allocate a free one for this tenant
    local_port = tenant_config.ssh_tunnel_local_port or self._allocate_port()
    # Build SSH command. No '-f' here: it makes ssh fork into the background,
    # so the Popen handle would exit at once and the health check would report a dead tunnel.
    ssh_cmd = [
        'ssh', '-N',
        '-L', f'{local_port}:{tenant_config.oracle_host}:{tenant_config.oracle_port}',
        '-p', str(tenant_config.ssh_port),
        '-i', tenant_config.ssh_key_path,
        '-o', 'ServerAliveInterval=60',
        '-o', 'ServerAliveCountMax=3',
        '-o', 'ExitOnForwardFailure=yes',
        f'{tenant_config.ssh_user}@{tenant_config.ssh_host}'
    ]
    # Start process
    process = subprocess.Popen(ssh_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Wait for tunnel to establish (max 10 seconds)
    for _ in range(10):
        if self._check_port_open('localhost', local_port):
            break
        await asyncio.sleep(1)
    else:
        # Loop finished without break: the port never opened
        process.kill()
        raise RuntimeError(f"SSH tunnel failed to start for tenant {tenant_id}")
    self._tunnels[tenant_id] = process
    self._tunnel_ports[tenant_id] = local_port
    logger.info(f"SSH tunnel started for tenant {tenant_id} on port {local_port}")
    return local_port
```
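
`start_tunnel()` falls back to `self._allocate_port()`, which the plan does not define. One possible approach, sketched here as a standalone function under that assumption, is to let the operating system pick a free port by binding to port 0. The names `allocate_port` and `is_port_open` are illustrative.

```python
import socket


def allocate_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently free TCP port on the given interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 means "pick any free port"
        return sock.getsockname()[1]


def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

The port is released before `ssh` binds it, so another process could in principle take it in between. Storing a fixed `ssh_tunnel_local_port` per tenant, as the schema already allows, avoids that race; dynamic allocation is only a fallback.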
3. **Implement tunnel health checks**
   - **Periodic check:** Every 60 seconds, verify the tunnel is alive
```python
async def check_tunnel_health(self, tenant_id: str) -> bool:
    if tenant_id not in self._tunnels:
        return False
    process = self._tunnels[tenant_id]
    local_port = self._tunnel_ports[tenant_id]
    # Check process is alive
    if process.poll() is not None:
        logger.warning(f"SSH tunnel process died for tenant {tenant_id}")
        return False
    # Check port is accessible
    if not self._check_port_open('localhost', local_port):
        logger.warning(f"SSH tunnel port {local_port} not accessible for tenant {tenant_id}")
        return False
    return True

def _check_port_open(self, host: str, port: int) -> bool:
    import socket
    try:
        with socket.create_connection((host, port), timeout=2):
            return True
    except OSError:  # Refused, timed out, or unreachable
        return False
```
4. **Implement auto-restart with exponential backoff**
   - **Logic:** If the tunnel dies, restart with delay: 5s, 10s, 20s, 40s, max 60s
```python
async def restart_tunnel(self, tenant_id: str):
    attempts = self._restart_attempts.get(tenant_id, 0)
    delay = min(5 * (2 ** attempts), 60)  # Exponential backoff, max 60s
    logger.info(f"Restarting tunnel for tenant {tenant_id} (attempt {attempts+1}, delay {delay}s)")
    await asyncio.sleep(delay)
    try:
        await self.stop_tunnel(tenant_id)
        tenant_config = await self._get_tenant_config(tenant_id)
        await self.start_tunnel(tenant_config)
        # Reset attempts on success
        self._restart_attempts[tenant_id] = 0
        logger.info(f"Tunnel restarted successfully for tenant {tenant_id}")
    except Exception as e:
        self._restart_attempts[tenant_id] = attempts + 1
        logger.error(f"Tunnel restart failed for tenant {tenant_id}: {e}")
        raise
```
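
The delay formula in `restart_tunnel()` can be checked on its own. This small sketch extracts it into a pure function (`backoff_delay` is a name chosen for this example) and lists the resulting schedule.

```python
def backoff_delay(attempts: int, base: float = 5.0, cap: float = 60.0) -> float:
    """Delay in seconds before restart number `attempts` (0-based), capped at `cap`."""
    return min(base * (2 ** attempts), cap)


def backoff_schedule(max_attempts: int) -> list:
    """Delays for the first `max_attempts` consecutive failures."""
    return [backoff_delay(n) for n in range(max_attempts)]
```

`backoff_schedule(6)` yields 5, 10, 20, 40, 60, 60 seconds: the fifth attempt would be 80s uncapped, so it and every later attempt are held at the 60s ceiling.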
5. **Integrate the SSH tunnel manager into MultiTenantPoolManager**
   - **Logic:** If the tenant has `connection_type='ssh_tunnel'`, start the tunnel before creating the pool
```python
# In MultiTenantPoolManager._ensure_pool()
tenant_config = await self._load_tenant_config(tenant_id)
# Start SSH tunnel if needed
if tenant_config.connection_type == 'ssh_tunnel':
    if await self._ssh_tunnel_manager.check_tunnel_health(tenant_id):
        # Tunnel already running: reuse its local port
        local_port = self._ssh_tunnel_manager._tunnel_ports[tenant_id]
    else:
        local_port = await self._ssh_tunnel_manager.start_tunnel(tenant_config)
    # Override Oracle host/port to use tunnel
    tenant_config.oracle_host = 'localhost'
    tenant_config.oracle_port = local_port
# Create pool (rest of code same as before)
pool = oracledb.create_pool(...)
```
6. **Implement cleanup on shutdown**
   - **Logic:** Kill all SSH processes gracefully
```python
async def cleanup_all_tunnels(self):
    for tenant_id, process in self._tunnels.items():
        try:
            process.terminate()  # SIGTERM
            await asyncio.sleep(2)
            if process.poll() is None:
                process.kill()  # SIGKILL if not dead
            logger.info(f"Stopped SSH tunnel for tenant {tenant_id}")
        except Exception as e:
            logger.error(f"Error stopping tunnel for tenant {tenant_id}: {e}")
    self._tunnels.clear()
    self._tunnel_ports.clear()
```
7. **Add a background task for health monitoring**
   - **File:** `reports-app/backend/app/main.py`
   - **Task:** Run every 60 seconds
```python
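async def monitor_ssh_tunnels():
    # The tunnel registry lives on the SSH tunnel manager, not on the pool manager
    tunnel_manager = multi_tenant_pool._ssh_tunnel_manager
    while True:
        await asyncio.sleep(60)
        # Snapshot the keys: restart_tunnel() mutates the dict during iteration
        for tenant_id in list(tunnel_manager._tunnels.keys()):
            if not await tunnel_manager.check_tunnel_health(tenant_id):
                logger.warning(f"Tunnel unhealthy for tenant {tenant_id}, restarting...")
                try:
                    await tunnel_manager.restart_tunnel(tenant_id)
                except Exception:
                    # Already logged by restart_tunnel(); keep monitoring other tenants
                    continue

# In lifespan startup
asyncio.create_task(monitor_ssh_tunnels())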
```
#### Verifiable Output
- ✅ SSH tunnel subprocess starts per remote tenant
- ✅ Tunnel health check detects dead tunnels
- ✅ Auto-restart with exponential backoff works
- ✅ Multiple tenants with simultaneous SSH tunnels (unique port allocation)
- ✅ Cleanup on shutdown: all SSH processes stop
- ✅ Test: `pytest shared/tests/test_ssh_tunnel_manager.py -v`
- ✅ Manual test: Kill the SSH process, verify auto-restart in < 60s
---
### PHASE 4: JWT & Middleware Update (2-3 days)
**Goal:** Update JWT tokens to include `tenant_id` and the middleware to extract and validate tenant access.
#### Tasks
1. **Update the JWT handler to include tenant_id**
   - **File:** `shared/auth/jwt_handler.py`
   - **Changes:**
```python
# In the TokenData model
class TokenData(BaseModel):
    username: str
    user_id: Optional[int] = None
    tenant_id: str = Field(description="Tenant ID (UUID)")  # NEW
    companies: List[str] = Field(default_factory=list)
    permissions: List[str] = Field(default_factory=list)
    exp: datetime
    iat: datetime
    token_type: str = Field(alias="type")

# In create_access_token()
def create_access_token(
    self,
    username: str,
    tenant_id: str,  # NEW parameter
    companies: List[str],
    user_id: Optional[int] = None,
    permissions: Optional[List[str]] = None
) -> str:
    payload = {
        "username": username,
        "user_id": user_id,
        "tenant_id": tenant_id,  # NEW
        "companies": companies or [],
        "permissions": permissions or ["read"],
        "exp": expire,
        "iat": now,
        "type": "access"
    }
    # ... rest same
```
2. **Update the login endpoint to determine tenant_id**
   - **File:** `reports-app/backend/app/main.py` (auth router)
   - **Logic:**
     - Check the `tenant_users` table for the user_id
     - If the user has access to multiple tenants, return the first one (default)
     - Or the user selects a tenant at login (future enhancement)
```python
# In the login endpoint
# Get user's tenants from tenant_users table
tenants = await tenant_config_loader.get_user_tenants(user_id)
if not tenants:
    # Fallback: Use "default" tenant (backward compatibility)
    tenant_id = "default"
else:
    # Use first tenant (or let user select in future)
    tenant_id = tenants[0]['tenant_id']
# Create JWT with tenant_id
access_token = jwt_handler.create_access_token(
    username=credentials.username,
    tenant_id=tenant_id,  # NEW
    companies=companies,
    user_id=user_id,
    permissions=["read", "reports"]
)
```
3. **Implement TenantMiddleware to validate tenant access**
   - **File:** `shared/middleware/tenant_middleware.py`
   - **Responsibilities:**
     - Extract `tenant_id` from the JWT token
     - Validate that the user has access to that tenant
     - Inject `tenant_id` into `request.state.tenant_id`
```python
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import JSONResponse


class TenantMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # Skip excluded paths
        if request.url.path in self.excluded_paths:
            return await call_next(request)
        # Extract tenant_id from JWT token (already decoded by AuthMiddleware)
        user = getattr(request.state, 'user', None)
        if not user:
            return JSONResponse(
                status_code=401,
                content={"detail": "Not authenticated"}
            )
        tenant_id = user.get('tenant_id')
        if not tenant_id:
            return JSONResponse(
                status_code=400,
                content={"detail": "Missing tenant_id in token"}
            )
        # Validate tenant exists and is active
        tenant_config = await tenant_config_loader.get_tenant(tenant_id)
        if not tenant_config or not tenant_config.is_active:
            return JSONResponse(
                status_code=403,
                content={"detail": f"Tenant {tenant_id} is not active"}
            )
        # Validate user has access to this tenant
        user_id = user.get('user_id')
        has_access = await tenant_config_loader.check_user_tenant_access(user_id, tenant_id)
        if not has_access:
            return JSONResponse(
                status_code=403,
                content={"detail": f"User {user_id} does not have access to tenant {tenant_id}"}
            )
        # Inject tenant_id into request state
        request.state.tenant_id = tenant_id
        request.state.tenant_name = tenant_config.name
        # Continue request
        response = await call_next(request)
        # Log audit entry
        await self._log_audit(request, response, tenant_id, user_id)
        return response
```
4. **Update AuthenticationMiddleware to work with TenantMiddleware**
   - **File:** `shared/auth/middleware.py`
   - **Middleware order:**
```python
# In main.py (the middleware added last wraps the others and runs first)
app.add_middleware(TenantMiddleware, excluded_paths=["/", "/docs", "/health", ...])
app.add_middleware(AuthenticationMiddleware, excluded_paths=["/", "/docs", "/health", ...])
```
   - **Flow:** AuthMiddleware decodes the JWT → TenantMiddleware validates tenant access
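
The registration order matters because Starlette places the most recently added middleware outermost, so it sees the request first. The toy model below mimics only that ordering rule in plain Python; `MiniApp`, `auth_middleware`, and `tenant_middleware` are simplified stand-ins written for this illustration, not the real classes.

```python
from typing import Callable, Dict, List

Handler = Callable[[Dict], Dict]


class MiniApp:
    """Toy model of add_middleware ordering: last added becomes outermost."""

    def __init__(self, endpoint: Handler) -> None:
        self._endpoint = endpoint
        self._middleware: List[Callable[[Handler], Handler]] = []

    def add_middleware(self, factory: Callable[[Handler], Handler]) -> None:
        self._middleware.insert(0, factory)  # newest goes to the front

    def handle(self, request: Dict) -> Dict:
        handler = self._endpoint
        # Wrap from the innermost (added first) to the outermost (added last)
        for factory in reversed(self._middleware):
            handler = factory(handler)
        return handler(request)


def auth_middleware(next_handler: Handler) -> Handler:
    def handler(request: Dict) -> Dict:
        request["trace"].append("auth")
        request["user"] = {"user_id": 123, "tenant_id": "client-a-uuid"}
        return next_handler(request)
    return handler


def tenant_middleware(next_handler: Handler) -> Handler:
    def handler(request: Dict) -> Dict:
        request["trace"].append("tenant")
        request["tenant_id"] = request["user"]["tenant_id"]  # needs auth to have run
        return next_handler(request)
    return handler


def endpoint(request: Dict) -> Dict:
    request["trace"].append("endpoint")
    return request


def demo() -> List[str]:
    app = MiniApp(endpoint)
    app.add_middleware(tenant_middleware)  # added first -> inner
    app.add_middleware(auth_middleware)    # added last -> outer, runs first
    return app.handle({"trace": []})["trace"]
```

With the tenant middleware registered before the auth middleware, the trace comes out as auth, tenant, endpoint. Swapping the two `add_middleware` calls would make the tenant step run before `request["user"]` exists.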
5. **Update all routers to use tenant_id from request.state**
   - **Files:** `reports-app/backend/app/routers/*.py`
   - **Pattern:**
```python
# Before (single-tenant)
async with oracle_pool.get_connection() as connection:
    # query...
# After (multi-tenant)
tenant_id = request.state.tenant_id  # Injected by TenantMiddleware
async with multi_tenant_pool.get_connection(tenant_id) as connection:
    # query...
```
   - **Example:** `dashboard.py`
```python
@router.get("/{company_id}")
async def get_dashboard(company_id: str, request: Request):
    tenant_id = request.state.tenant_id  # NEW
    async with multi_tenant_pool.get_connection(tenant_id) as connection:
        with connection.cursor() as cursor:
            # ... rest same
```
6. **Update the Telegram bot for tenant support**
   - **File:** `reports-app/telegram-bot/app/auth/linking.py`
   - **Changes:**
     - On linking, also save `tenant_id` in SQLite
     - The JWT token includes `tenant_id`
     - All requests to the backend include the correct tenant_id
7. **Add tenant selection endpoint (future enhancement)**
   - **Endpoint:** `POST /api/auth/select-tenant`
   - **Use case:** A user with access to multiple tenants can switch between them
   - **Response:** New JWT token with a different tenant_id
#### Verifiable Output
- ✅ JWT token includes the `tenant_id` field
- ✅ Login endpoint generates a token with the correct tenant_id
- ✅ TenantMiddleware extracts and validates tenant_id
- ✅ Routers use `multi_tenant_pool.get_connection(tenant_id)`
- ✅ A request to an invalid tenant returns 403 Forbidden
- ✅ A user without access to the tenant gets 403 Forbidden
- ✅ Test: `pytest shared/tests/test_tenant_middleware.py -v`
- ✅ Test: Login with a user who has access to tenant A, request to tenant B → 403
---
### PHASE 5: Cache & Audit Logging Integration (1-2 days)
**Goal:** Update the Redis cache to use the real tenant_id (not "default") and implement audit logging per tenant.
#### Tasks
1. **Update the Redis cache to use the real tenant_id**
   - **File:** `shared/cache/redis_client.py` (if it exists) or inline in routers
   - **Change:** Replace the hardcoded `"default"` with the real `tenant_id`
   - **Before:**
```python
cache_key = f"cache:default:dashboard:{company_id}"
```
- **After:**
```python
tenant_id = request.state.tenant_id
cache_key = f"cache:{tenant_id}:dashboard:{company_id}"
```
2. **Implement per-tenant cache invalidation**
- **Use case:** An admin updates tenant data; invalidate only that tenant's cache
- **Endpoint:** `DELETE /api/cache/{tenant_id}` (admin only)
- **Logic:**
```python
pattern = f"cache:{tenant_id}:*"
# SCAN iterates incrementally; KEYS would block Redis on a large keyspace
keys = list(redis_client.scan_iter(match=pattern))
if keys:
    redis_client.delete(*keys)
```
3. **Implement audit logging in TenantMiddleware**
- **File:** `shared/middleware/tenant_middleware.py`
- **Logic:** Log every request to the `audit_logs` table
```python
async def _log_audit(self, request: Request, response: Response, tenant_id: str, user_id: int):
    # Extract info
    action = f"{request.method} {request.url.path}"
    status = "success" if response.status_code < 400 else "error"
    # Responses passed through middleware are often streamed and may not expose
    # .body, so record the status code rather than decoding the body
    error_message = None if status == "success" else f"HTTP {response.status_code}"

    # Insert into the audit_logs table (ideally scheduled as a background task)
    await audit_logger.log(
        tenant_id=tenant_id,
        user_id=user_id,
        username=request.state.user.get('username'),
        action=action,
        resource=request.url.path,
        status=status,
        error_message=error_message,
        ip_address=request.client.host if request.client else None,
        user_agent=request.headers.get('user-agent')
    )
```
4. **Implement the AuditLogger helper class**
- **File:** `shared/utils/audit_logger.py`
- **Method:**
```python
from typing import Optional


class AuditLogger:
    def __init__(self, tenant_db_url: str):
        self.db_url = tenant_db_url

    async def log(
        self,
        tenant_id: str,
        user_id: int,
        username: str,
        action: str,
        resource: str,
        status: str,
        error_message: Optional[str] = None,
        ip_address: Optional[str] = None,
        user_agent: Optional[str] = None
    ):
        # Insert into the audit_logs table (PostgreSQL/SQLite).
        # $1..$9 is PostgreSQL placeholder syntax; SQLite expects "?" instead.
        query = """
            INSERT INTO audit_logs (
                tenant_id, user_id, username, action, resource,
                status, error_message, ip_address, user_agent
            ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
        """
        await self._execute_query(query, [
            tenant_id, user_id, username, action, resource,
            status, error_message, ip_address, user_agent
        ])
```
5. **Add audit logs viewing endpoint**
- **Endpoint:** `GET /api/audit-logs/{tenant_id}` (tenant admin only)
- **Filters:** `?user_id=123&start_date=2025-10-01&end_date=2025-10-31&status=error`
- **Response:** Paginated audit logs for tenant
6. **Add metrics per tenant (optional, future)**
- **Metrics:**
- Request count per tenant
- Response time per tenant
- Error rate per tenant
- Active users per tenant
- **Storage:** Time-series database (InfluxDB) or Redis sorted sets
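
The cache namespacing and per-tenant invalidation from tasks 1 and 2 can be sketched as two small helpers. This is a minimal sketch under assumptions: `tenant_cache_key` and `invalidate_tenant_cache` are hypothetical names, and `InMemoryRedis` is a stub that mimics only the `scan_iter` and `delete` methods of a redis-py client so the example runs without a Redis server.

```python
class InMemoryRedis:
    """Hypothetical stub exposing the two client methods the sketch needs."""

    def __init__(self):
        self.store: dict[str, str] = {}

    def scan_iter(self, match: str):
        prefix = match.rstrip("*")
        # Snapshot the keys so callers may delete while iterating
        return iter([key for key in self.store if key.startswith(prefix)])

    def delete(self, *keys: str) -> int:
        return sum(1 for key in keys if self.store.pop(key, None) is not None)


def tenant_cache_key(tenant_id: str, *parts: str) -> str:
    """Build 'cache:{tenant_id}:...' and reject IDs that could escape the namespace."""
    if not tenant_id or any(ch in tenant_id for ch in ":*?[]"):
        raise ValueError(f"invalid tenant_id for cache key: {tenant_id!r}")
    return ":".join(("cache", tenant_id, *parts))


def invalidate_tenant_cache(client, tenant_id: str) -> int:
    """Delete every key in one tenant's namespace; returns the number removed."""
    keys = list(client.scan_iter(match=tenant_cache_key(tenant_id, "*")))
    return client.delete(*keys) if keys else 0
```

Rejecting `:` and glob characters in the tenant ID is a design choice of this sketch: it keeps a crafted ID from matching or overwriting keys that belong to another tenant.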
#### Verifiable Output
- ✅ Redis cache keys include the real tenant_id (not "default")
- ✅ Cache isolation: tenant A's cache is not visible to tenant B
- ✅ Per-tenant cache invalidation works
- ✅ Audit logs are saved to the `audit_logs` table
- ✅ Audit logs include tenant_id, user_id, action, status
- ✅ Audit logs viewing endpoint returns logs filtered per tenant
- ✅ Test: `pytest shared/tests/test_audit_logging.py -v`
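
To make the audit trail and its per-tenant filter concrete, here is a small sketch against an in-memory SQLite database. The schema is an assumed, reduced version of `audit_logs` (the real one lives in `tenant_config_schema.sql`), and `log_event` and `logs_for_tenant` are hypothetical helper names.

```python
import sqlite3

# Assumed minimal schema for illustration only
SCHEMA = """
CREATE TABLE audit_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    tenant_id TEXT NOT NULL,
    user_id INTEGER,
    username TEXT,
    action TEXT NOT NULL,
    resource TEXT,
    status TEXT NOT NULL,
    error_message TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_audit_tenant ON audit_logs (tenant_id, created_at);
"""


def log_event(db, tenant_id, user_id, username, action, resource, status, error_message=None):
    # SQLite uses "?" placeholders; PostgreSQL drivers typically use $1, $2, ...
    db.execute(
        "INSERT INTO audit_logs"
        " (tenant_id, user_id, username, action, resource, status, error_message)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (tenant_id, user_id, username, action, resource, status, error_message),
    )


def logs_for_tenant(db, tenant_id, status=None):
    """Return (username, action, status) rows for one tenant only."""
    sql = "SELECT username, action, status FROM audit_logs WHERE tenant_id = ?"
    params = [tenant_id]
    if status is not None:
        sql += " AND status = ?"
        params.append(status)
    return db.execute(sql + " ORDER BY id", params).fetchall()
```

The tenant filter is always part of the query and always bound as a parameter, so an endpoint built on `logs_for_tenant` cannot return another tenant's rows by accident.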
---
### PHASE 6: Deployment & Testing (3-4 days)
**Objective:** Deploy multi-tenant to all environments (dev, Docker, Windows) and test it end to end.
#### Tasks
1. **Update development environment (WSL)**
- **Setup:**
```bash
# Create SQLite tenant DB
sqlite3 data/tenant_config.db < shared/schemas/tenant_config_schema.sql
# Generate encryption key
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# Update .env
echo "TENANT_DB_URL=sqlite:///data/tenant_config.db" >> .env
echo "DB_ENCRYPTION_KEY=<generated_key>" >> .env
# Create default tenant
python shared/scripts/create_default_tenant.py
# Start app
./start-dev.sh
```
- **Verification:** Login works with tenant "default"
2. **Update Docker deployment**
- **File:** `docker-compose.yml`
- **Changes:**
- Add `roa-tenant-config-db` service (PostgreSQL)
- Update `roa-backend` env vars (`TENANT_DB_URL`, `DB_ENCRYPTION_KEY`)
- Mount SSH keys volume read-only
- **Deployment:**
```bash
# Build images
docker-compose build
# Start services
docker-compose up -d
# Initialize tenant DB
docker-compose exec roa-backend python shared/scripts/create_default_tenant.py
# Verify
docker-compose logs roa-backend | grep "tenant"
```
3. **Update Windows IIS deployment**
- **Script:** `deployment/windows/scripts/Setup-TenantDB.ps1`
- **Actions:**
- Install SQL Server Express OR the PostgreSQL Windows service
- Create `tenant_config` database
- Run schema SQL
- Generate encryption key (store it in Windows Credential Manager)
- Create default tenant
- **Manual steps:**
```powershell
# Run setup
.\deployment\windows\scripts\Setup-TenantDB.ps1
# Update web.config with TENANT_DB_URL
# Restart ROA2WEB-Backend service
Restart-Service ROA2WEB-Backend
```
4. **Implement comprehensive integration tests**
- **File:** `shared/tests/integration/test_multi_tenant_flow.py`
- **Scenarios:**
- Login with tenant A → Get dashboard → Cache hit tenant A
- Login with tenant B → Get dashboard → Cache miss (different tenant)
- User with access to tenant A tries tenant B → 403 Forbidden
- Tenant SSH tunnel is killed → Auto-recovery restarts it
- Tenant inactive > 1h → Pool cleanup
- **Run:**
```bash
pytest shared/tests/integration/ -v --tb=short
```
5. **Implement load testing with multiple tenants**
- **Tool:** Locust or Apache Bench
- **Scenario:** 3 tenants, 100 requests each, simultaneously
- **Script:** `shared/tests/load/test_multi_tenant_load.py`
- **Metrics:**
- Response time per tenant (< 200ms avg)
- Error rate (< 1%)
- Pool usage (max connections per tenant)
- SSH tunnel stability (no restarts)
6. **Create tenant onboarding guide**
- **File:** `shared/docs/TENANT_ONBOARDING.md`
- **Contents:**
- How to add a new tenant (manual SQL or admin UI)
- SSH key setup for remote tenants
- User assignment to a tenant
- Testing tenant connection
- Troubleshooting common issues
7. **Create monitoring dashboard (optional)**
- **Tools:** Grafana + Prometheus
- **Metrics:**
- Active tenants count
- Pool connections per tenant
- Request rate per tenant
- Error rate per tenant
- SSH tunnel uptime per tenant
#### Verifiable Output
- ✅ Development (WSL): multi-tenant works with the SQLite tenant DB
- ✅ Docker: multi-tenant works with the PostgreSQL tenant DB
- ✅ Windows IIS: multi-tenant works with SQL Server Express
- ✅ Integration tests pass (100% success rate)
- ✅ Load tests: 3 tenants × 100 requests, < 200ms avg response time
- ✅ SSH tunnels: No crashes during 1h load test
- ✅ Cache isolation validated: Tenant A cache ≠ Tenant B cache
- ✅ Audit logs populated correctly for all requests
- ✅ Documentation complete (onboarding guide, troubleshooting)
---
## 🔧 Connection Management
### SSH Tunnel Configuration
**Tenant with SSH Tunnel (Remote Client)**
```json
{
"id": "client-a-uuid",
"name": "Client A - Retail SRL",
"connection_type": "ssh_tunnel",
"oracle_host": "10.0.20.36",
"oracle_port": 1521,
"oracle_sid": "ROA",
"oracle_user": "CLIENT_A_USER",
"oracle_password_encrypted": "gAAAAABh...",
"ssh_host": "83.103.197.79",
"ssh_port": 22122,
"ssh_user": "roa2web",
"ssh_key_path": "/app/ssh-keys/client-a.key",
"ssh_tunnel_local_port": 15261,
"min_connections": 2,
"max_connections": 10,
"is_active": true
}
```
**SSH Tunnel Flow:**
```
Backend Process
SSHTunnelManager.start_tunnel()
subprocess: ssh -f -N -L 15261:10.0.20.36:1521 -p 22122 roa2web@83.103.197.79
Tunnel established: localhost:15261 → 10.0.20.36:1521
OraclePool connects to localhost:15261
Oracle queries are routed through the SSH tunnel
```
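
The `ssh` invocation in the flow above can be derived from the tenant config. The sketch below is an assumption about how `SSHTunnelManager` might build its arguments, not its actual code: `build_tunnel_command` is a hypothetical helper, passing `ssh_key_path` with `-i` is assumed, and `-f` is deliberately left out so that a supervising process started with `subprocess.Popen` keeps a handle on the tunnel.

```python
def build_tunnel_command(cfg: dict) -> list[str]:
    """Build the argv for the local port forward described by a tenant config."""
    if cfg.get("connection_type") != "ssh_tunnel":
        raise ValueError("tenant does not use an SSH tunnel")
    forward = f"{cfg['ssh_tunnel_local_port']}:{cfg['oracle_host']}:{cfg['oracle_port']}"
    command = ["ssh", "-N", "-L", forward, "-p", str(cfg["ssh_port"])]
    if cfg.get("ssh_key_path"):
        # Assumption: the private key from ssh_key_path is passed with -i
        command += ["-i", cfg["ssh_key_path"]]
    # Never prompt for input, and exit if the forward cannot be established
    command += ["-o", "BatchMode=yes", "-o", "ExitOnForwardFailure=yes"]
    command.append(f"{cfg['ssh_user']}@{cfg['ssh_host']}")
    return command
```

Returning a list rather than a shell string means the command can be passed to `subprocess.Popen` without `shell=True`, so values from the tenant DB are never interpreted by a shell.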
### Direct Connection Configuration
**Tenant with Direct Connection (LAN Client)**
```json
{
"id": "client-b-uuid",
"name": "Client B - Import Export SA",
"connection_type": "direct",
"oracle_host": "192.168.1.50",
"oracle_port": 1521,
"oracle_sid": "ROA",
"oracle_user": "CLIENT_B_USER",
"oracle_password_encrypted": "gAAAAABh...",
"ssh_host": null,
"ssh_port": null,
"ssh_user": null,
"ssh_key_path": null,
"ssh_tunnel_local_port": null,
"min_connections": 5,
"max_connections": 20,
"is_active": true
}
```
**Direct Connection Flow:**
```
Backend Process
MultiTenantPoolManager.get_connection(tenant_id)
Check connection_type: "direct" → Skip SSH tunnel
OraclePool.create_pool(host=192.168.1.50, port=1521, ...)
Oracle queries go directly to 192.168.1.50:1521
```
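
Both flows end with the pool connecting to a host and port; only the way that endpoint is chosen differs. A minimal sketch of the decision, assuming a hypothetical helper named `resolve_oracle_endpoint` that receives the tenant config as a dict:

```python
def resolve_oracle_endpoint(cfg: dict) -> tuple[str, int]:
    """Return the (host, port) the Oracle pool should actually connect to."""
    connection_type = cfg["connection_type"]
    if connection_type == "ssh_tunnel":
        # Traffic enters the tunnel on the locally forwarded port
        return ("localhost", cfg["ssh_tunnel_local_port"])
    if connection_type == "direct":
        return (cfg["oracle_host"], cfg["oracle_port"])
    raise ValueError(f"unsupported connection_type: {connection_type!r}")
```

For the two example tenants this yields `("localhost", 15261)` for Client A and `("192.168.1.50", 1521)` for Client B.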
### Mixed Environment Setup
**3 Tenants: 2 SSH, 1 Direct**
| Tenant ID | Name | Type | Oracle Host | SSH Tunnel | Local Port |
|-----------|------|------|-------------|------------|------------|
| client-a-uuid | Client A - Retail SRL | ssh_tunnel | 10.0.20.36:1521 | 83.103.197.79:22122 | 15261 |
| client-b-uuid | Client B - Import SA | direct | 192.168.1.50:1521 | N/A | N/A |
| client-c-uuid | Client C - Distribution | ssh_tunnel | 10.0.20.36:1521 | 212.18.45.99:22 | 15262 |
**Resource Usage:**
```
Backend Memory:
├── Pool Client A: 2-10 connections × ~5MB = 10-50MB
├── Pool Client B: 5-20 connections × ~5MB = 25-100MB
├── Pool Client C: 2-10 connections × ~5MB = 10-50MB
└── Total: ~50-200MB (vs single-tenant ~10-50MB)
SSH Processes:
├── Tunnel Client A: ~10MB RAM
├── Tunnel Client C: ~10MB RAM
└── Total: ~20MB
Total Overhead: ~70-220MB (acceptable for multi-tenant SaaS)
```
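
The totals above follow from simple arithmetic, reproduced here so the estimate can be rerun when tenants are added. The per-connection and per-tunnel figures are the rough values from the breakdown, not measurements, and `estimate_memory_mb` is a hypothetical helper.

```python
MB_PER_CONNECTION = 5  # rough per-connection estimate from the breakdown
MB_PER_TUNNEL = 10     # rough per-tunnel estimate from the breakdown

# (min_connections, max_connections) per tenant
POOLS = {"Client A": (2, 10), "Client B": (5, 20), "Client C": (2, 10)}
TUNNELS = ["Client A", "Client C"]


def estimate_memory_mb(pools: dict, tunnels: list) -> tuple[int, int]:
    """Return (low, high) estimated overhead in MB for pools plus tunnels."""
    low = sum(lo for lo, _ in pools.values()) * MB_PER_CONNECTION
    high = sum(hi for _, hi in pools.values()) * MB_PER_CONNECTION
    tunnel_mb = len(tunnels) * MB_PER_TUNNEL
    return (low + tunnel_mb, high + tunnel_mb)
```

With the three example tenants this gives 65 MB to 220 MB; the breakdown rounds the lower bound up to roughly 70 MB.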
---
## 🔒 Security Model
### Encryption Strategy
**Password Encryption in the Tenant DB**
```python
from cryptography.fernet import Fernet

# Generate the encryption key once (store it in .env as DB_ENCRYPTION_KEY)
encryption_key = Fernet.generate_key()  # Example: b'Xs3J7...'

# Encrypt the password; encrypt() returns bytes, so decode for storage as text
fernet = Fernet(encryption_key)
encrypted_password = fernet.encrypt(b"oracle_password_plaintext").decode()
# Result: "gAAAAABh3J..."

# Decrypt the password (at runtime)
decrypted_password = fernet.decrypt(encrypted_password.encode()).decode()
```
**Security Properties:**
- ✅ Symmetric encryption (Fernet: AES-128 in CBC mode + HMAC)
- ✅ Encryption key in an environment variable (`DB_ENCRYPTION_KEY`)
- ✅ Passwords encrypted at rest in the tenant DB
- ✅ Decryption only at pool initialization (memory only)
- ❌ **NEVER:** passwords in logs, error messages, or audit trails
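
One way to keep passwords out of logs is to redact tenant configs before they reach a logger. The sketch below is an illustration, not part of the plan: `redact_config` and `SENSITIVE_MARKERS` are hypothetical names, and the marker list is deliberately conservative (it also masks `ssh_key_path`, which is a path rather than a secret).

```python
SENSITIVE_MARKERS = ("password", "secret", "key")


def redact_config(cfg: dict) -> dict:
    """Return a copy of a tenant config that is safe to log."""
    safe = {}
    for name, value in cfg.items():
        is_sensitive = any(marker in name.lower() for marker in SENSITIVE_MARKERS)
        # Keep None visible so missing values can still be diagnosed
        safe[name] = "***" if is_sensitive and value is not None else value
    return safe
```

A call such as `logger.info("tenant loaded: %s", redact_config(cfg))` then never writes the encrypted password, even though ciphertext is less sensitive than plaintext.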
### Tenant Isolation
**Complete Isolation Between Tenants**
```
┌─────────────────────────────────────────────────────────┐
│ Tenant A │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Connection Pool (2-10 connections) │ │
│ │ - oracle_host: 10.0.20.36 (via SSH tunnel) │ │
│ │ - oracle_user: CLIENT_A_USER │ │
│ │ - Schema: CLIENT_A_SCHEMA │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Redis Cache Namespace │ │
│ │ - cache:client-a-uuid:* │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Audit Logs │ │
│ │ - audit_logs WHERE tenant_id='client-a-uuid' │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
❌ ZERO SHARING ❌
┌─────────────────────────────────────────────────────────┐
│ Tenant B │
│ (Same structure, COMPLETELY ISOLATED) │
└─────────────────────────────────────────────────────────┘
```
**Isolation Guarantees:**
1. **Connection Pool:** Tenant A connections are NEVER used for tenant B queries
2. **Cache:** Redis keys namespaced per tenant (`cache:{tenant_id}:*`)
3. **Audit Logs:** Query filter `WHERE tenant_id = $1` (indexed for performance)
4. **SSH Tunnels:** Separate processes, separate local ports (no crosstalk)
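
The "separate local ports" guarantee for SSH tunnels only holds if no two tenants are configured with the same `ssh_tunnel_local_port`. A small check that could run when tenant configs are loaded, assuming a hypothetical helper named `find_port_conflicts`:

```python
from collections import defaultdict


def find_port_conflicts(tenants: list[dict]) -> dict[int, list[str]]:
    """Map each local tunnel port claimed by more than one tenant to the tenant IDs."""
    owners = defaultdict(list)
    for tenant in tenants:
        port = tenant.get("ssh_tunnel_local_port")
        if tenant.get("connection_type") == "ssh_tunnel" and port is not None:
            owners[port].append(tenant["id"])
    return {port: ids for port, ids in owners.items() if len(ids) > 1}
```

An empty result means every tunnel has its own local port; any entry in the result names the tenants whose traffic would otherwise collide.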
### JWT Token Structure
**Token with Tenant ID (Signed)**
```json
{
"username": "john.doe",
"user_id": 123,
"tenant_id": "client-a-uuid",
"companies": ["COMP1", "COMP2"],
"permissions": ["read", "reports"],
"exp": 1735142400,
"iat": 1735140600,
"type": "access"
}
```
**Security Checks in TenantMiddleware:**
```python
from fastapi import HTTPException

# 1. Extract tenant_id from JWT (decoded by AuthMiddleware)
tenant_id = request.state.user.get('tenant_id')

# 2. Validate tenant exists and is active
tenant_config = await tenant_config_loader.get_tenant(tenant_id)
if not tenant_config or not tenant_config.is_active:
    raise HTTPException(403, "Tenant not active")

# 3. Validate user has access to this tenant
user_id = request.state.user.get('user_id')
has_access = await tenant_config_loader.check_user_tenant_access(user_id, tenant_id)
if not has_access:
    raise HTTPException(403, "User does not have access to this tenant")

# 4. Inject tenant_id into request.state (routers must treat it as read-only)
request.state.tenant_id = tenant_id  # Routers use this
```
**Attack Scenarios Prevented:**
- ❌ **Tenant ID Tampering:** The JWT is signed, so a client cannot change tenant_id without invalidating the signature
- ❌ **Cross-Tenant Access:** A user with access to tenant A cannot access tenant B (check in step 3)
- ❌ **Inactive Tenant Access:** Tenant deactivated → requests rejected (check in step 2)
- ❌ **SQL Injection via Tenant ID:** The UUID is validated and used only in parameterized queries
---
## 🧪 Testing Strategy
### Unit Tests
**Test Coverage per Component**
```bash
shared/tests/
├── test_tenant_config.py # TenantConfigLoader
│ ├── test_load_tenants() # Load all tenants from DB
│ ├── test_get_tenant() # Get specific tenant
│ ├── test_reload_tenant() # Reload tenant config
│ ├── test_encryption_decryption() # Password encryption/decryption
│ └── test_default_tenant_fallback() # Fall back to .env credentials
├── test_multi_tenant_pool.py # MultiTenantPoolManager
│ ├── test_lazy_pool_initialization() # Pool created only on first request
│ ├── test_pool_per_tenant() # Separate pools per tenant
│ ├── test_pool_cleanup_inactive() # Cleanup after 1h of inactivity
│ ├── test_tenant_reload() # Reload tenant without restart
│ └── test_connection_context_manager() # get_connection() pattern
├── test_ssh_tunnel_manager.py # SSHTunnelManager
│ ├── test_start_tunnel() # Start SSH tunnel subprocess
│ ├── test_stop_tunnel() # Stop SSH tunnel gracefully
│ ├── test_tunnel_health_check() # Detect dead tunnels
│ ├── test_auto_restart() # Restart cu exponential backoff
│ └── test_cleanup_all_tunnels() # Kill all processes on shutdown
├── test_tenant_middleware.py # TenantMiddleware
│ ├── test_extract_tenant_id() # Extract tenant_id from JWT
│ ├── test_validate_tenant_access() # User access validation
│ ├── test_inactive_tenant_blocked() # Inactive tenant → 403
│ ├── test_cross_tenant_access_blocked() # User A tenant → User B tenant → 403
│ └── test_audit_logging() # Audit logs saved correctly
└── test_encryption.py # Encryption utils
├── test_fernet_encryption() # Encrypt/decrypt passwords
└── test_key_rotation() # Future: Key rotation support
```
**Run Unit Tests:**
```bash
cd shared/
pytest tests/ -v --cov=database --cov=middleware --cov=utils --cov-report=html
# Expected output:
# ✅ test_tenant_config.py::test_load_tenants PASSED
# ✅ test_multi_tenant_pool.py::test_lazy_pool_initialization PASSED
# ...
# Coverage: 85% (target: > 80%)
```
### Integration Tests
**End-to-End Scenarios**
```bash
shared/tests/integration/
├── test_multi_tenant_flow.py # Complete multi-tenant flow
│ ├── test_login_with_tenant_a() # Login → JWT with tenant A
│ ├── test_dashboard_tenant_a() # Dashboard query tenant A
│ ├── test_cache_hit_tenant_a() # Cache hit for tenant A
│ ├── test_cross_tenant_isolation() # Tenant A cache ≠ Tenant B cache
│ └── test_audit_logs_populated() # Audit logs saved per tenant
├── test_ssh_tunnel_resilience.py # SSH tunnel stability
│ ├── test_tunnel_auto_recovery() # Kill tunnel → Auto-restart
│ ├── test_multiple_tunnels_parallel() # 3 SSH tenants simultaneously
│ └── test_tunnel_port_conflicts() # Port allocation unique
└── test_deployment_scenarios.py # Deployment compatibility
├── test_development_sqlite() # Development with SQLite tenant DB
├── test_docker_postgresql() # Docker with PostgreSQL tenant DB
└── test_backward_compatibility() # Tenant "default" works
```
**Run Integration Tests:**
```bash
# Requires: PostgreSQL tenant DB running + Redis + Oracle test server
docker-compose -f docker-compose.test.yml up -d
pytest shared/tests/integration/ -v --tb=short
# Expected output:
# ✅ test_multi_tenant_flow.py::test_login_with_tenant_a PASSED (0.5s)
# ✅ test_multi_tenant_flow.py::test_cache_hit_tenant_a PASSED (0.2s)
# ...
```
### Load Testing
**Performance Validation with Multiple Tenants**
```python
# shared/tests/load/test_multi_tenant_load.py
import random

from locust import HttpUser, between, task


class MultiTenantUser(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        # Log in to a random tenant
        self.tenant = random.choice(['client-a-uuid', 'client-b-uuid', 'client-c-uuid'])
        response = self.client.post('/api/auth/login', json={
            'username': f'user_{self.tenant}',
            'password': 'test_password'
        })
        response.raise_for_status()  # Fail fast if the login itself fails
        self.token = response.json()['access_token']
        self.client.headers.update({'Authorization': f'Bearer {self.token}'})

    @task(3)
    def get_dashboard(self):
        self.client.get('/api/dashboard/COMP1')

    @task(2)
    def get_invoices(self):
        self.client.get('/api/invoices/COMP1')

    @task(1)
    def get_treasury(self):
        self.client.get('/api/treasury/COMP1')
```
**Run Load Test:**
```bash
locust -f shared/tests/load/test_multi_tenant_load.py --host=http://localhost:8001
# Scenario: 3 tenants × 100 users = 300 concurrent users
# Duration: 10 minutes
# Expected:
# - Response time: < 200ms (p95)
# - Error rate: < 1%
# - SSH tunnels: No restarts
# - Pool connections: Max 10 per tenant (no exhaustion)
```
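
The pass/fail thresholds above can be checked mechanically once latencies and error counts are exported from a run. This is a minimal sketch with hypothetical helper names (`p95`, `meets_targets`); it uses the nearest-rank percentile definition, which may differ slightly from the percentile Locust itself reports.

```python
def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def meets_targets(latencies_ms, error_count, p95_limit_ms=200.0, max_error_rate=0.01):
    """True when p95 latency and error rate are both under their limits."""
    error_rate = error_count / len(latencies_ms)
    return p95(latencies_ms) < p95_limit_ms and error_rate < max_error_rate
```

Running the check per tenant rather than over the pooled samples keeps one fast tenant from hiding a slow one.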
---
## 📊 Migration Checklist
### Pre-Migration
- [ ] **Backup production database**
```bash
# Backup Oracle database
expdp username/password@ROA directory=BACKUP dumpfile=pre_migration.dmp
# Backup existing .env files
cp reports-app/backend/.env reports-app/backend/.env.backup
```
- [ ] **Document current single-tenant config**
```bash
# Save current credentials (the copy contains secrets: keep it out of version control)
cat reports-app/backend/.env > docs/pre_migration_env.txt
# Save current SSH tunnel config
./ssh_tunnel.sh status > docs/pre_migration_ssh.txt
```
- [ ] **Test the deployment in a non-production environment**
```bash
# Create staging environment
docker-compose -f docker-compose.staging.yml up -d
# Deploy multi-tenant to staging
# ... follow migration steps ...
# Validate staging works
curl http://staging.roa2web.local/api/health
```
- [ ] **Generate DB encryption key**
```bash
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# Save to .env: DB_ENCRYPTION_KEY=<generated_key>
```
- [ ] **Prepare tenant configuration**
- Create tenant DB (PostgreSQL/SQLite)
- Populate it with tenant "default" (existing credentials)
- Add SSH keys for remote tenants
### Migration Steps (Production)
**Step 1: Deploy Tenant Config DB (30 min)**
```bash
# Docker deployment
docker-compose up -d roa-tenant-config-db
# Verify DB is running
docker-compose exec roa-tenant-config-db psql -U tenant_admin -d tenant_config -c '\dt'
# Run schema
docker-compose exec roa-tenant-config-db psql -U tenant_admin -d tenant_config -f /docker-entrypoint-initdb.d/schema.sql
```
**Step 2: Populate Tenant "default" (15 min)**
```bash
# Run migration script
docker-compose exec roa-backend python shared/scripts/create_default_tenant.py
# Verify tenant created
docker-compose exec roa-tenant-config-db psql -U tenant_admin -d tenant_config -c 'SELECT id, name, connection_type FROM tenants;'
```
**Step 3: Deploy Backend with MultiTenantPoolManager (45 min)**
```bash
# Update .env with tenant DB URL
echo "TENANT_DB_URL=postgresql://tenant_admin:password@roa-tenant-config-db:5432/tenant_config" >> .env
# Rebuild backend image
docker-compose build roa-backend
# Deploy new backend (docker-compose recreates the container, so expect a brief restart)
docker-compose up -d roa-backend
# Wait for health check
watch -n 2 'curl -s http://localhost:8001/health | jq'
```
**Step 4: Verify Tenant "default" Works (15 min)**
```bash
# Test login (should work exactly as before)
curl -X POST http://localhost:8001/api/auth/login \
-H 'Content-Type: application/json' \
-d '{"username": "test_user", "password": "test_password"}'
# Response should include tenant_id: "default"
# {
# "access_token": "eyJ...",
# "user": {
# "tenant_id": "default",
# ...
# }
# }
# Test dashboard (should work as before)
curl -H "Authorization: Bearer $TOKEN" http://localhost:8001/api/dashboard/COMP1
```
**Step 5: Add New Tenants (One by One)**
```bash
# Add tenant A (SSH tunnel)
docker-compose exec roa-backend python shared/scripts/add_tenant.py \
--name "Client A - Retail SRL" \
--connection-type ssh_tunnel \
--oracle-host 10.0.20.36 \
--oracle-user CLIENT_A_USER \
--oracle-password "encrypted_password" \
--ssh-host 83.103.197.79 \
--ssh-port 22122 \
--ssh-key /app/ssh-keys/client-a.key \
--ssh-local-port 15261
# Add users to tenant A
docker-compose exec roa-tenant-config-db psql -U tenant_admin -d tenant_config -c \
"INSERT INTO tenant_users (tenant_id, user_id, username) VALUES ('client-a-uuid', 123, 'john.doe');"
# Test tenant A login
curl -X POST http://localhost:8001/api/auth/login \
-H 'Content-Type: application/json' \
-d '{"username": "john.doe", "password": "password"}'
# Verify JWT includes tenant_id: "client-a-uuid"
```
**Step 6: Monitor Logs per Tenant (Ongoing)**
```bash
# Monitor all tenant logs
docker-compose logs -f roa-backend | grep "tenant_id"
# Monitor SSH tunnels
docker-compose logs -f roa-backend | grep "SSH tunnel"
# Monitor pool connections
docker-compose logs -f roa-backend | grep "pool"
# Check audit logs
docker-compose exec roa-tenant-config-db psql -U tenant_admin -d tenant_config -c \
'SELECT tenant_id, username, action, status, created_at FROM audit_logs ORDER BY created_at DESC LIMIT 20;'
```
**Step 7: Performance Validation (1-2h)**
```bash
# Run load test
locust -f shared/tests/load/test_multi_tenant_load.py --host=http://localhost:8001 --users=100 --spawn-rate=10 --run-time=1h
# Monitor metrics
# - Response time: < 200ms (p95)
# - Error rate: < 1%
# - Pool usage: < 80% per tenant
# - SSH tunnels: No restarts
```
### Post-Migration
- [ ] **All tenants functional**
- Tenant "default" works (backward compatibility)
- Tenant A works (SSH tunnel)
- Tenant B works (direct connection)
- [ ] **No performance degradation**
- Response time same as single-tenant (< 10% overhead)
- No connection pool exhaustion
- SSH tunnels stable (no auto-restarts)
- [ ] **Audit logs populated**
```sql
-- Verify audit logs per tenant
SELECT tenant_id, COUNT(*) FROM audit_logs GROUP BY tenant_id;
```
- [ ] **Documentation updated**
- Update `CLAUDE.md` with the multi-tenant architecture
- Update deployment guides (Docker, Windows)
- Create tenant onboarding guide
- [ ] **Monitoring dashboards**
- Grafana dashboard per tenant
- Alerts for pool exhaustion, SSH tunnel failures
---
## 🎯 Deployment Guides
### Development Setup (WSL/Local)
**Prerequisites:**
- Python 3.11+
- SQLite3
- Redis server
- SSH access to the Oracle server (for tenants using an SSH tunnel)
**Setup Steps:**
```bash
# 1. Create SQLite tenant DB
mkdir -p data
sqlite3 data/tenant_config.db < shared/schemas/tenant_config_schema.sql
# 2. Generate encryption key
python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())" > .encryption_key
DB_ENCRYPTION_KEY=$(cat .encryption_key)
# 3. Update .env
cat >> reports-app/backend/.env << EOF
# Tenant Configuration
TENANT_DB_URL=sqlite:///data/tenant_config.db
DB_ENCRYPTION_KEY=$DB_ENCRYPTION_KEY
EOF
# 4. Create default tenant
cd shared/
python scripts/create_default_tenant.py
# 5. Start Redis
redis-server --daemonize yes
# 6. Start application
cd ../
./start-dev.sh
# 7. Verify
curl http://localhost:8001/health
# Should return: {"database": "connected", "tenants_loaded": 1}
```
**Add New Tenant (Development):**
```bash
# Add tenant via SQL
sqlite3 data/tenant_config.db << EOF
INSERT INTO tenants (
id, name, connection_type,
oracle_host, oracle_port, oracle_sid, oracle_user, oracle_password_encrypted,
ssh_host, ssh_port, ssh_user, ssh_key_path, ssh_tunnel_local_port
) VALUES (
'dev-tenant-uuid',
'Dev Tenant - Test Company',
'ssh_tunnel',
'10.0.20.36',
1521,
'ROA',
'DEV_USER',
'encrypted_password_here',
'83.103.197.79',
22122,
'roa2web',
'/tmp/roa_oracle_server',
15263
);
-- Add user to tenant
INSERT INTO tenant_users (tenant_id, user_id, username)
VALUES ('dev-tenant-uuid', 999, 'dev_user');
EOF
# Restart backend
pkill -f "uvicorn app.main:app"
./start-dev.sh
```
---
### Docker Deployment (Proxmox LXC)
**Prerequisites:**
- Docker 24+
- Docker Compose 2.20+
- 4GB RAM minimum
- PostgreSQL 15 container
**docker-compose.multi-tenant.yml:**
```yaml
version: '3.8'
services:
  # Tenant Configuration Database
  roa-tenant-config-db:
    image: postgres:15-alpine
    container_name: roa-tenant-config-db
    restart: unless-stopped
    environment:
      POSTGRES_DB: tenant_config
      POSTGRES_USER: tenant_admin
      POSTGRES_PASSWORD: ${TENANT_DB_PASSWORD}
    volumes:
      - tenant-config-data:/var/lib/postgresql/data
      - ./shared/schemas/tenant_config_schema.sql:/docker-entrypoint-initdb.d/schema.sql:ro
    networks:
      - roa-network
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U tenant_admin -d tenant_config"]
      interval: 10s
      timeout: 5s
      retries: 5

  # Backend (Multi-Tenant)
  roa-backend:
    build:
      context: .
      dockerfile: ./reports-app/backend/Dockerfile
    image: roa2web/backend:multi-tenant
    container_name: roa-backend
    restart: unless-stopped
    environment:
      # Tenant Configuration
      - TENANT_DB_URL=postgresql://tenant_admin:${TENANT_DB_PASSWORD}@roa-tenant-config-db:5432/tenant_config
      - DB_ENCRYPTION_KEY=${DB_ENCRYPTION_KEY}
      # JWT Configuration
      - JWT_SECRET_KEY=${JWT_SECRET_KEY}
      # Redis Cache
      - REDIS_URL=redis://:${REDIS_PASSWORD}@roa-redis:6379/0
    volumes:
      # SSH keys for tenant tunnels (read-only)
      - ./ssh-keys:/app/ssh-keys:ro
      - backend-logs:/app/logs
    networks:
      - roa-network
    depends_on:
      roa-tenant-config-db:
        condition: service_healthy
      roa-redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  # Redis Cache
  roa-redis:
    image: redis:7-alpine
    container_name: roa-redis
    restart: unless-stopped
    command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis-data:/data
    networks:
      - roa-network
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  # Frontend (unchanged)
  roa-frontend:
    build:
      context: ./reports-app/frontend
      dockerfile: Dockerfile
    image: roa2web/frontend:latest
    container_name: roa-frontend
    restart: unless-stopped
    networks:
      - roa-network

  # Nginx Gateway (unchanged)
  roa-gateway:
    build:
      context: ./nginx
      dockerfile: Dockerfile
    image: roa2web/nginx-gateway:latest
    container_name: roa-gateway
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    networks:
      - roa-network
    depends_on:
      - roa-backend
      - roa-frontend

volumes:
  tenant-config-data:
  redis-data:
  backend-logs:

networks:
  roa-network:
    driver: bridge
```
**Deployment:**
```bash
# 1. Create .env file
# Hex secrets avoid '/', '+' and '=' (which would need escaping inside
# TENANT_DB_URL and REDIS_URL) and the line wrapping of long base64 output
cat > .env << EOF
TENANT_DB_PASSWORD=$(openssl rand -hex 32)
DB_ENCRYPTION_KEY=$(python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
JWT_SECRET_KEY=$(openssl rand -hex 64)
REDIS_PASSWORD=$(openssl rand -hex 32)
EOF
# 2. Prepare SSH keys directory
mkdir -p ssh-keys
chmod 700 ssh-keys
cp /path/to/client-a.key ssh-keys/client-a.key
chmod 400 ssh-keys/client-a.key
# 3. Build and start services
docker-compose -f docker-compose.multi-tenant.yml build
docker-compose -f docker-compose.multi-tenant.yml up -d
# 4. Wait for tenant DB initialization
docker-compose logs -f roa-tenant-config-db | grep "database system is ready"
# 5. Create default tenant
docker-compose exec roa-backend python shared/scripts/create_default_tenant.py
# 6. Verify deployment
curl http://localhost/api/health
# {"api": "healthy", "database": "connected", "tenants_loaded": 1}
```
**Add New Tenant:**
```bash
# Connect to tenant DB
docker-compose exec roa-tenant-config-db psql -U tenant_admin -d tenant_config
# Insert tenant (with encrypted password)
INSERT INTO tenants (id, name, connection_type, oracle_host, oracle_port, oracle_sid, oracle_user, oracle_password_encrypted, ssh_host, ssh_port, ssh_user, ssh_key_path, ssh_tunnel_local_port, is_active)
VALUES (
'client-a-uuid',
'Client A - Retail SRL',
'ssh_tunnel',
'10.0.20.36',
1521,
'ROA',
'CLIENT_A_USER',
'gAAAAABh...', -- Fernet encrypted password
'83.103.197.79',
22122,
'roa2web',
'/app/ssh-keys/client-a.key',
15261,
TRUE
);
-- Add user to tenant
INSERT INTO tenant_users (tenant_id, user_id, username)
VALUES ('client-a-uuid', 123, 'john.doe');
\q
# Reload backend (or wait for auto-reload)
docker-compose restart roa-backend
```
---
### Windows IIS Deployment
**Prerequisites:**
- Windows Server 2019+
- IIS 10+
- SQL Server Express 2019+ OR PostgreSQL 15 for Windows
- Python 3.11+ (Windows installer)
- Redis for Windows (MSI installer)
**Setup Script:** `deployment/windows/scripts/Setup-MultiTenant.ps1`
```powershell
# Run as Administrator
.\deployment\windows\scripts\Setup-MultiTenant.ps1
<#
This script will:
1. Install SQL Server Express 2019
2. Create tenant_config database
3. Run schema SQL
4. Generate encryption key (save it in Windows Credential Manager)
5. Create default tenant
6. Update ROA2WEB backend service
7. Restart IIS
#>
```
**Manual Setup:**
```powershell
# 1. Install SQL Server Express
# Download from: https://www.microsoft.com/en-us/sql-server/sql-server-downloads
# Install with default instance name: SQLEXPRESS
# 2. Create tenant database
sqlcmd -S localhost\SQLEXPRESS -E -Q "CREATE DATABASE tenant_config"
# 3. Run schema
sqlcmd -S localhost\SQLEXPRESS -d tenant_config -E -i shared\schemas\tenant_config_schema.sql
# 4. Generate encryption key
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())" | Out-File -FilePath .encryption_key -NoNewline
# 5. Store the key in Windows Credential Manager
cmdkey /generic:ROA2WEB_DB_ENCRYPTION_KEY /user:system "/pass:$(Get-Content .encryption_key)"
# 6. Update backend .env
@"
TENANT_DB_URL=mssql+pyodbc://localhost\SQLEXPRESS/tenant_config?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes
DB_ENCRYPTION_KEY=$(Get-Content .encryption_key)
"@ | Add-Content -Path C:\inetpub\wwwroot\roa2web\backend\.env
# 7. Create default tenant
cd C:\inetpub\wwwroot\roa2web
python shared\scripts\create_default_tenant.py
# 8. Restart backend service
Restart-Service ROA2WEB-Backend
# 9. Verify
curl http://localhost:8000/health
```
**Add New Tenant (Windows):**
```powershell
# Connect to SQL Server
sqlcmd -S localhost\SQLEXPRESS -d tenant_config -E
-- Insert tenant
INSERT INTO tenants (id, name, connection_type, oracle_host, oracle_port, oracle_sid, oracle_user, oracle_password_encrypted, is_active)
VALUES (
'client-b-uuid',
'Client B - Import Export SA',
'direct',
'192.168.1.50',
1521,
'ROA',
'CLIENT_B_USER',
'gAAAAABh...', -- Encrypted password
1
);
-- Add user to tenant
INSERT INTO tenant_users (tenant_id, user_id, username)
VALUES ('client-b-uuid', 456, 'jane.smith');
GO
EXIT
# Restart backend
Restart-Service ROA2WEB-Backend
```
---
## 📝 Configuration Examples
### Tenant Config: SSH Tunnel (Development)
```json
{
"id": "dev-client-uuid",
"name": "Development Client - Test Company",
"connection_type": "ssh_tunnel",
"oracle_host": "10.0.20.36",
"oracle_port": 1521,
"oracle_sid": "ROA",
"oracle_user": "DEV_USER",
"oracle_password_encrypted": "gAAAAABhXj7Ks3J...",
"ssh_host": "83.103.197.79",
"ssh_port": 22122,
"ssh_user": "roa2web",
"ssh_key_path": "/tmp/roa_oracle_server",
"ssh_tunnel_local_port": 15260,
"min_connections": 2,
"max_connections": 5,
"is_active": true
}
```
### Tenant Config: Direct Connection (Production)
```json
{
"id": "prod-client-uuid",
"name": "Production Client - Enterprise Corp",
"connection_type": "direct",
"oracle_host": "192.168.100.50",
"oracle_port": 1521,
"oracle_sid": "ROA",
"oracle_user": "PROD_USER",
"oracle_password_encrypted": "gAAAAABhXj8Nm4K...",
"ssh_host": null,
"ssh_port": null,
"ssh_user": null,
"ssh_key_path": null,
"ssh_tunnel_local_port": null,
"min_connections": 5,
"max_connections": 20,
"is_active": true
}
```
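
The two config shapes above differ in a predictable way: the SSH fields are all set for `ssh_tunnel` and all null for `direct`. A validation sketch that could run before a tenant is saved, assuming a hypothetical helper named `validate_tenant_config` and configs passed as plain dicts:

```python
SSH_FIELDS = ("ssh_host", "ssh_port", "ssh_user", "ssh_key_path", "ssh_tunnel_local_port")


def validate_tenant_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks consistent."""
    errors = []
    connection_type = cfg.get("connection_type")
    if connection_type == "ssh_tunnel":
        errors += [f"{name} is required for ssh_tunnel"
                   for name in SSH_FIELDS if cfg.get(name) is None]
    elif connection_type == "direct":
        errors += [f"{name} must be null for direct"
                   for name in SSH_FIELDS if cfg.get(name) is not None]
    else:
        errors.append(f"unknown connection_type: {connection_type!r}")
    if not 1 <= cfg.get("min_connections", 0) <= cfg.get("max_connections", 0):
        errors.append("expected 1 <= min_connections <= max_connections")
    return errors
```

Returning all problems at once, instead of raising on the first, is a choice made here so an onboarding script can report every mistake in a single pass.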
### Tenant Config: Docker Deployment (PostgreSQL Tenant DB)
**.env for Docker Compose:**
```bash
# Tenant Configuration Database
TENANT_DB_PASSWORD=SecurePostgresPassword123!
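# If used directly as a Fernet key, this must be 44 characters of URL-safe base64 (32 bytes); the value below is a placeholder
DB_ENCRYPTION_KEY=Xs3J7vN2pQ8kR9mT1wY5zC6bA4dF0gH=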
# Backend
JWT_SECRET_KEY=YourVerySecureJWTSecretKeyHere123456789
# Redis
REDIS_PASSWORD=SecureRedisPassword456!
```
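A Fernet key is 32 random bytes encoded as URL-safe base64, which gives a 44-character string. Assuming `DB_ENCRYPTION_KEY` is passed to Fernet unchanged (the plan does not say whether a key-derivation step sits in between), a value in the right format can be generated with the standard library alone. `generate_fernet_key` and `looks_like_fernet_key` are hypothetical helper names used only for this sketch.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return 32 random bytes as URL-safe base64, the format Fernet expects."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def looks_like_fernet_key(value: str) -> bool:
    """Check only the format: the value must decode to exactly 32 bytes."""
    try:
        return len(base64.urlsafe_b64decode(value.encode("ascii"))) == 32
    except ValueError:  # covers bad padding and non-ASCII input
        return False


if __name__ == "__main__":
    print(f"DB_ENCRYPTION_KEY={generate_fernet_key()}")
```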
### User-Tenant Mapping Example
```sql
-- User john.doe has access to 2 tenants
INSERT INTO tenant_users (tenant_id, user_id, username, is_admin) VALUES
('client-a-uuid', 123, 'john.doe', TRUE),
('client-b-uuid', 123, 'john.doe', FALSE);
-- User jane.smith has access to 1 tenant
INSERT INTO tenant_users (tenant_id, user_id, username, is_admin) VALUES
('client-b-uuid', 456, 'jane.smith', FALSE);
-- Query: Get all tenants for user
SELECT t.id, t.name, tu.is_admin
FROM tenants t
JOIN tenant_users tu ON t.id = tu.tenant_id
WHERE tu.user_id = 123 AND t.is_active = TRUE;
-- Result:
-- | id | name | is_admin |
-- |----------------|-------------------------------|----------|
-- | client-a-uuid | Client A - Retail SRL | TRUE |
-- | client-b-uuid | Client B - Import Export SA | FALSE |
```
---
## 🎯 Success Criteria
### Definition of Done
**Functional:**
- ✅ The application supports at least 3 concurrent tenants
- ✅ Tenant identification from the JWT works correctly
- ✅ SSH tunnels start and stop automatically per tenant
- ✅ Connection pools are isolated per tenant (zero sharing)
- ✅ Cache isolation between tenants (namespace per tenant)
- ✅ No cross-tenant data leakage in audit logs or cache
**Deployment:**
- ✅ Works in all deployment scenarios (dev/WSL, Docker, Windows IIS)
- ✅ Backward compatibility: the "default" tenant behaves exactly like the current single-tenant setup
- ✅ Zero downtime for existing tenants when a new tenant is added (lazy loading)
- ✅ Migration script completes successfully in < 2h (staging environment)
**Performance:**
- ✅ Overhead < 10% vs single-tenant (measured in load testing)
- ✅ Response time < 200ms (p95) with 3 tenants × 100 requests
- ✅ No connection pool exhaustion (max 80% usage per tenant)
- ✅ SSH tunnels stable (zero auto-restarts in a 1h load test)
**Security:**
- ✅ Passwords encrypted at rest in the tenant DB (Fernet, AES-128)
- ✅ SSH keys mounted read-only in Docker volumes
- ✅ JWT tenant_id is signed (cannot be modified by the client)
- ✅ Tenant access validation in middleware (403 for unauthorized)
- ✅ Audit logging of ALL operations per tenant
**Testing:**
- ✅ Unit tests: > 80% code coverage
- ✅ Integration tests: All scenarios pass (login, dashboard, cross-tenant isolation)
- ✅ Load tests: 3 tenants × 100 users, 10 minutes, < 1% error rate
- ✅ Manual testing: Tenant onboarding guide validated
**Documentation:**
- ✅ CLAUDE.md updated with the multi-tenant architecture
- ✅ Deployment guides (dev, Docker, Windows) complete
- ✅ Tenant onboarding guide created
- ✅ Troubleshooting guide created
- ✅ API documentation updated (Swagger/ReDoc)
---
## ⚠️ Risks & Mitigations
### Risk: SSH Tunnel Instability
**Scenario:** The SSH tunnel process crashes, or the network between the backend and the SSH server is interrupted.
**Impact:** The affected tenant cannot reach its Oracle DB (requests fail with a connection error).
**Mitigation:**
1. **Health Checks:** Background task checks tunnel health every 60s
2. **Auto-Restart:** Restart the tunnel automatically with exponential backoff (5s, 10s, 20s, max 60s)
3. **Monitoring:** Alert if a tunnel is down > 5 minutes
4. **Fallback:** Graceful degradation - other tenants keep working normally
**Detection:**
```python
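async def monitor_ssh_tunnels():
    """Run one health-check pass over all tenant tunnels (scheduled every 60s)."""
    # Iterate over a copy of the keys: a restart may modify the collection.
    for tenant_id in list(ssh_tunnel_manager.tunnels):
        if not await ssh_tunnel_manager.check_tunnel_health(tenant_id):
            logger.error(f"Tunnel down for tenant {tenant_id}, restarting...")
            await ssh_tunnel_manager.restart_tunnel(tenant_id)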
```
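The auto-restart schedule from the mitigation list (5s, 10s, 20s, capped at 60s) can be sketched as below. The 40s step is simply what doubling gives between 20s and the cap. `backoff_delays` and `restart_with_backoff` are hypothetical helpers; the `restart` argument stands in for whatever coroutine actually restarts a tunnel and is assumed to return `True` on success. The `sleep` parameter is injectable so the logic can be tested without waiting.

```python
import asyncio
from typing import Awaitable, Callable


def backoff_delays(base: float = 5.0, cap: float = 60.0, attempts: int = 6) -> list[float]:
    """Delays that double from `base` and are capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


async def restart_with_backoff(
    restart: Callable[[], Awaitable[bool]],
    delays: list[float],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> bool:
    """Call `restart` until it reports success, waiting the next delay after each failure."""
    for delay in delays:
        if await restart():
            return True
        await sleep(delay)
    return False  # every attempt failed; the caller should raise an alert
```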
---
### Risk: Connection Pool Exhaustion
**Scenario:** A tenant sends a burst of requests, the pool reaches max connections (e.g. 10), and new requests block or time out.
**Impact:** Slow response times or 503 Service Unavailable for that tenant.
**Mitigation:**
1. **Pool Limits:** Set realistic limits per tenant (min=2, max=10 default, configurable)
2. **Queue Timeout:** `getmode=oracledb.POOL_GETMODE_TIMEDWAIT` with `wait_timeout` in milliseconds (e.g. 30000 for 30s); `POOL_GETMODE_WAIT` on its own waits indefinitely
3. **Rate Limiting:** Limit requests per user/tenant (e.g. 100 req/min)
4. **Monitoring:** Alert if pool usage > 80% for > 5 minutes
5. **Scaling:** Increase `max_connections` for high-traffic tenants
**Configuration:**
```python
# In the tenant config DB (SQL):
#   UPDATE tenants SET max_connections = 20 WHERE id = 'high-traffic-tenant-uuid';
# Then reload the tenant so the new limit takes effect
await multi_tenant_pool.reload_tenant('high-traffic-tenant-uuid')
```
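The per-tenant rate limit mentioned in the mitigation list (e.g. 100 req/min) could be enforced with a sliding window, sketched below. `SlidingWindowLimiter` is a hypothetical class, not an existing component. It keeps its state in process memory, so with several backend replicas a shared store would be needed instead.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per tenant within any `window_seconds` span."""

    def __init__(self, limit: int = 100, window_seconds: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # tenant_id -> timestamps of accepted requests

    def allow(self, tenant_id: str) -> bool:
        now = self.clock()
        hits = self._hits[tenant_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller would typically answer 429 Too Many Requests
        hits.append(now)
        return True
```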
---
### Risk: Tenant Credential Leak
**Scenario:** An attacker gains access to the tenant DB or to logs and sees Oracle passwords.
**Impact:** Data breach - the attacker can access the Oracle DB directly.
**Mitigation:**
1. **Encryption at Rest:** Passwords encrypted with Fernet in the tenant DB
2. **Encryption Key Security:** `DB_ENCRYPTION_KEY` kept in environment variables (not in git)
3. **Access Control:** Tenant DB access restricted (firewall, VPN)
4. **No Plaintext Logs:** NEVER log decrypted passwords (verify in code reviews)
5. **Audit Logging:** Log all access to tenant config (who/when)
6. **Key Rotation:** Support key rotation (encrypt with the new key, keep the old key available to decrypt existing values)
**Validation:**
```bash
# Check logs for password leaks
docker-compose logs roa-backend | grep -i "password" | grep -v "encrypted"
# Should return ZERO results
# Check tenant DB
docker-compose exec roa-tenant-config-db psql -U tenant_admin -d tenant_config -c 'SELECT id, name, oracle_password_encrypted FROM tenants LIMIT 5;'
# oracle_password_encrypted should start with "gAAAAA..." (Fernet token)
```
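As a defence-in-depth complement to the "No Plaintext Logs" rule, a logging filter can mask anything that looks like a password before it is written. This is a sketch only: `RedactPasswords` and the regular expression are hypothetical, and the pattern catches just simple `key=value` or `key: value` pairs, so it does not replace the code-review check.

```python
import logging
import re

# Matches `key=value` or `key: value` where the key contains "password" (case-insensitive).
_PASSWORD_PAIR = re.compile(r"(?i)(\b\w*password\w*\b\s*[=:]\s*)(\S+)")


class RedactPasswords(logging.Filter):
    """Mask password-like values in log records before handlers see them."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message first so %-style arguments are covered too.
        message = record.getMessage()
        record.msg = _PASSWORD_PAIR.sub(r"\1***", message)
        record.args = None
        return True  # keep the record, only its text is changed


# Usage: attach to the handler(s) that write backend logs.
# handler.addFilter(RedactPasswords())
```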
---
### Risk: Cross-Tenant Data Leakage
**Scenario:** A bug in the middleware or in a router lets a user from tenant A access data from tenant B.
**Impact:** CRITICAL data breach - confidentiality compromised.
**Mitigation:**
1. **Mandatory Middleware:** TenantMiddleware validates tenant access for ALL requests
2. **Explicit Tenant ID:** Routers MUST use `request.state.tenant_id` (no global state)
3. **Code Reviews:** ALL router changes are reviewed for tenant isolation
4. **Integration Tests:** Test cross-tenant access blocked (403 Forbidden)
5. **Audit Logging:** Log tenant_id in ALL audit entries for forensics
**Test Scenario:**
```python
from datetime import datetime, timedelta, timezone

import jwt  # assumption: PyJWT; adjust the import if the project uses another JWT library

# Test: a user authorized for tenant A tries to access tenant B
# (assumes the test fixtures map user 123 to tenant A only)
def test_cross_tenant_access_blocked():
    # Log in with tenant A (a legitimate session exists)
    token_a = login(user_id=123, tenant_id='client-a-uuid')
    # Attack simulation: a correctly signed token that claims tenant B
    forged_token = jwt.encode({
        'user_id': 123,
        'tenant_id': 'client-b-uuid',  # FORGED
        'exp': datetime.now(timezone.utc) + timedelta(hours=1)
    }, secret_key, algorithm='HS256')
    # Request with the forged token
    response = client.get('/api/dashboard/COMP1', headers={'Authorization': f'Bearer {forged_token}'})
    # MUST return 403 Forbidden (not 200 OK)
    assert response.status_code == 403
    assert 'does not have access to tenant' in response.json()['detail']
```
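The check that this test exercises comes down to a lookup against the user-tenant mapping, independent of what the token claims. A minimal sketch, using the rows from the User-Tenant Mapping Example as in-memory data: `TenantAccessDenied`, `TENANT_USERS`, and `ensure_tenant_access` are hypothetical names, and a real middleware would query the `tenant_users` table instead of a hard-coded set.

```python
class TenantAccessDenied(Exception):
    """Raised when a user is not mapped to the tenant named in the token."""

    status_code = 403


# (tenant_id, user_id) pairs, mirroring rows of the tenant_users table.
TENANT_USERS = {
    ("client-a-uuid", 123),
    ("client-b-uuid", 123),
    ("client-b-uuid", 456),
}


def ensure_tenant_access(user_id: int, tenant_id: str, mappings=TENANT_USERS) -> str:
    """Return tenant_id if the user is mapped to it, otherwise raise TenantAccessDenied."""
    if (tenant_id, user_id) not in mappings:
        raise TenantAccessDenied(f"User {user_id} does not have access to tenant {tenant_id}")
    return tenant_id
```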
---
### Risk: Performance Degradation with Multiple Tenants
**Scenario:** With 10+ tenants, response time increases or the backend consumes too much memory.
**Impact:** Poor user experience, server overload.
**Mitigation:**
1. **Lazy Loading:** Pools are created only when a tenant is first accessed (saves memory)
2. **Pool Cleanup:** Pools inactive for > 1h are closed automatically
3. **Resource Limits:** Set a realistic `max_connections` per tenant (avoids OOM)
4. **Monitoring:** Track memory usage, response time per tenant
5. **Horizontal Scaling:** Add more backend replicas (Docker Swarm, Kubernetes)
6. **Connection Pooling:** Reuse connections (python-oracledb's `create_pool` already does this)
**Performance Baseline:**
```
Single-Tenant:
- Memory: 50MB (1 pool × 2-10 connections)
- Response time: 50ms (p95)
Multi-Tenant (3 tenants):
- Memory: 150MB (3 pools × 2-10 connections)
- Response time: 55ms (p95)
- Overhead: 10% (at the limit of the < 10% target)
Multi-Tenant (10 tenants):
- Memory: 500MB (10 pools × 2-10 connections)
- Response time: 65ms (p95)
- Overhead: 30% (exceeds the < 10% target, needs optimization)
```
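The overhead figures in the baseline are the relative increase in p95 response time over the single-tenant value: (55 - 50) / 50 = 10% and (65 - 50) / 50 = 30%. A small sketch of that arithmetic, with hypothetical helper names, using the strict `< 10%` comparison from the success criteria (so 55ms against a 50ms baseline sits exactly on the limit and does not pass):

```python
def overhead_percent(baseline_ms: float, measured_ms: float) -> float:
    """Relative increase of measured_ms over baseline_ms, in percent."""
    return 100 * (measured_ms - baseline_ms) / baseline_ms


def within_target(baseline_ms: float, measured_ms: float, target_percent: float = 10.0) -> bool:
    """True only when the overhead is strictly below the target."""
    return overhead_percent(baseline_ms, measured_ms) < target_percent


for tenants, p95_ms in [(3, 55), (10, 65)]:
    print(tenants, overhead_percent(50, p95_ms), within_target(50, p95_ms))
```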
**Optimization:**
- Reduce `min_connections` from 2 to 1 for low-traffic tenants
- Aggressive cleanup: close pools idle > 30 min (instead of 1h)
- Cache more aggressively (reduce Oracle queries)
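
Lazy loading and idle cleanup fit together in one small registry: a pool is created on first access, its last-use time is refreshed on every access, and a periodic sweep closes pools that have been idle too long. The sketch below is illustrative only. `LazyPoolRegistry` is a hypothetical class, `create_pool` is any callable that returns an object with a `close()` method, and a real version would also need locking and care around connections that are still checked out.

```python
import time


class LazyPoolRegistry:
    """Create per-tenant pools on demand and close the ones left idle."""

    def __init__(self, create_pool, idle_seconds: float = 3600.0, clock=time.monotonic):
        self._create_pool = create_pool  # callable: tenant_id -> pool object
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._pools = {}  # tenant_id -> (pool, last_used)

    def get(self, tenant_id):
        """Return the tenant's pool, creating it on first access."""
        entry = self._pools.get(tenant_id)
        pool = entry[0] if entry else self._create_pool(tenant_id)
        self._pools[tenant_id] = (pool, self._clock())
        return pool

    def close_idle(self) -> list:
        """Close and forget pools unused for longer than idle_seconds."""
        now = self._clock()
        idle = [t for t, (_, used) in self._pools.items() if now - used > self._idle_seconds]
        for tenant_id in idle:
            pool, _ = self._pools.pop(tenant_id)
            pool.close()
        return idle
```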
---
## 📚 References
### Current Implementation
- **OraclePool:** `shared/database/oracle_pool.py` - Singleton pattern for single-tenant
- **JWT Handler:** `shared/auth/jwt_handler.py` - Token creation/validation (needs tenant_id)
- **Auth Middleware:** `shared/auth/middleware.py` - JWT verification (needs tenant validation)
- **Backend Main:** `reports-app/backend/app/main.py` - Startup logic (needs MultiTenantPoolManager)
- **SSH Tunnel Script:** `ssh_tunnel.sh` - Single tunnel script (needs per-tenant manager)
### Inspiration & Patterns
- **Redis Implementation Plan:** `shared/docs/REDIS_IMPLEMENTATION_PLAN.md` - Good structure for this plan
- **Docker Compose:** `docker-compose.yml` - Current deployment (needs tenant-config-db service)
- **Windows Deployment:** `deployment/windows/scripts/` - Deployment patterns for Windows
- **Python oracledb Docs:** https://python-oracledb.readthedocs.io/en/latest/user_guide/connection_handling.html
- **Fernet Encryption:** https://cryptography.io/en/latest/fernet/
### Multi-Tenant Best Practices
- **Tenant Isolation Patterns:** https://docs.microsoft.com/en-us/azure/architecture/guide/multitenant/
- **Connection Pooling:** https://python-oracledb.readthedocs.io/en/latest/user_guide/connection_handling.html#connection-pooling
- **SSH Tunnel Management:** https://www.ssh.com/academy/ssh/tunneling-example
- **JWT Security:** https://jwt.io/introduction
### Testing Resources
- **pytest-asyncio:** https://pytest-asyncio.readthedocs.io/
- **Locust Load Testing:** https://docs.locust.io/en/stable/
- **Docker Compose Testing:** https://docs.docker.com/compose/
---
## 📅 Timeline Summary
| Phase | Duration | Objective | Verifiable Output |
|-------|----------|-----------|-------------------|
| **Phase 1** | 2-3 days | Tenant Config DB | Tenant DB works, default tenant created |
| **Phase 2** | 3-4 days | MultiTenantPoolManager | Pools per tenant, lazy loading |
| **Phase 3** | 2-3 days | SSH Tunnel Manager | SSH tunnels per tenant, auto-restart |
| **Phase 4** | 2-3 days | JWT & Middleware | JWT with tenant_id, tenant validation |
| **Phase 5** | 1-2 days | Cache & Audit | Redis cache per tenant, audit logs |
| **Phase 6** | 3-4 days | Deployment & Testing | Deployed in all environments, tests pass |
| **TOTAL** | **14-20 days** | **Multi-Tenant Production-Ready** | All success criteria met |
---
## 🚀 Next Steps
1. **Review this plan** with the team/stakeholders
2. **Prioritize the phases** (perhaps Phases 1+2 first, the rest afterwards)
3. **Set up the development environment** for testing
4. **Create branch:** `feature/multi-tenant-architecture`
5. **Start Phase 1:** Tenant Configuration Database
6. **Iterate:** Test after each phase, adjust the plan if needed
---
**Document Version:** 1.0
**Last Updated:** 2025-10-25
**Author:** Claude Code (Anthropic)
**Status:** Ready for Implementation