HA Manager¶

Section: High Availability | Article 62
Audience: System Administrators
Last Updated: 2026-04-08

Overview¶

The RP-PAM HA Manager is a built-in tool for managing multi-node clusters. It provides a single interface for all cluster operations — adding nodes, monitoring health, pushing configuration, testing failover, and managing the virtual IP.

On Windows, launch the tool from Start Menu → RP-PAM HA Manager to get a graphical interface with tabs for Cluster Status, Node Management, Redis, VIP, and Failover Testing. On Linux, use the interactive CLI menu (rppam-ha-manager).

The HA Manager ships with the RP-PAM installer — no additional installation is required.

Architecture¶

Every RP-PAM node runs the same binary. The HA Manager selects its mode automatically from the node's cluster role:

Node Role	HA Manager Mode	What It Does
Primary (Leader)	Controller	Full cluster management — add/remove nodes, push config, run tests
Standby	Agent	Reports status, accepts config pushes from the controller
Witness	Agent	Participates in quorum voting only

When leadership changes (failover), the new leader automatically becomes the controller. No manual intervention is needed.
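
The role-to-mode mapping in the table can be sketched in a few lines of Python. This is an illustration only: the `Mode` enum and the `ha_manager_mode` function are hypothetical names, not part of the RP-PAM binary, and the role strings are taken from the table above.

```python
from enum import Enum


class Mode(Enum):
    CONTROLLER = "controller"
    AGENT = "agent"


def ha_manager_mode(role: str) -> Mode:
    """Map a cluster role to the HA Manager mode listed in the table."""
    role = role.strip().lower()
    if role in ("primary", "leader"):
        return Mode.CONTROLLER
    if role in ("standby", "witness"):
        return Mode.AGENT
    raise ValueError(f"unknown node role: {role!r}")


# After a failover, the promoted standby is simply evaluated again as primary.
before = ha_manager_mode("standby")
after = ha_manager_mode("primary")
print(before, "->", after)
```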

Using the HA Manager¶

Interactive Mode¶

sudo rppam-ha-manager

This opens the menu:

╔══════════════════════════════════════╗
║       RP-PAM HA Manager v1.0        ║
╚══════════════════════════════════════╝

Commands:
  1) Cluster Status
  2) Add Node
  3) Remove Node
  4) Push Config
  5) Failover Test
  6) VIP Configuration
  7) Redis Configuration
  8) Node Maintenance Mode
  0) Exit

Direct Command Mode¶

# View cluster status
rppam-ha-manager status

# Add a standby node
rppam-ha-manager add-node --host 10.0.0.2 --port 7100 --role standby

# Add a witness node
rppam-ha-manager add-node --host 10.0.0.3 --port 7100 --role witness
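
When nodes are added from a provisioning script, it can help to build the command line programmatically. The sketch below uses only the subcommand and flags shown above; the helper name `add_node_command`, the default port of 7100, and the restriction to the `standby` and `witness` roles are assumptions drawn from these examples.

```python
import shlex
from typing import List

# Only the two roles that appear in the examples above (assumption).
ALLOWED_ROLES = ("standby", "witness")


def add_node_command(host: str, role: str, port: int = 7100) -> List[str]:
    """Build the argument list for `rppam-ha-manager add-node`."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return [
        "rppam-ha-manager", "add-node",
        "--host", host,
        "--port", str(port),
        "--role", role,
    ]


# Print the commands instead of running them, so the sketch works anywhere.
for host, role in [("10.0.0.2", "standby"), ("10.0.0.3", "witness")]:
    print(shlex.join(add_node_command(host, role)))
```

On a node where the tool is installed, the returned list could be passed to `subprocess.run(cmd, check=True)`.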

Cluster Status¶

Shows all nodes, their health, roles, and cluster state:

  Cluster: 3/3 nodes healthy
  Quorum:  YES
  Leader:  node1.corp.local
  VIP:     192.168.1.100 (held)
  Mode:    Normal

  NODE                 ROLE       STATUS     HEALTHY    LAST HEARTBEAT
  ───────────────────────────────────────────────────────────────────────────
  node1.corp.local     primary    active     YES        14:30:05        ←
  node2.corp.local     standby    active     YES        14:30:03
  node3.corp.local     witness    active     YES        14:30:04
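
For monitoring scripts, the node table can be parsed with a few lines of Python. This is a rough sketch that assumes the column layout of the sample above stays stable and that the HEALTHY column contains either YES or NO (only YES appears in the sample). The `NodeRow` type and `parse_node_rows` function are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import List

SAMPLE = """\
  NODE                 ROLE       STATUS     HEALTHY    LAST HEARTBEAT
  ───────────────────────────────────────────────────────────────────────────
  node1.corp.local     primary    active     YES        14:30:05        ←
  node2.corp.local     standby    active     YES        14:30:03
  node3.corp.local     witness    active     YES        14:30:04
"""


@dataclass
class NodeRow:
    name: str
    role: str
    status: str
    healthy: bool
    last_heartbeat: str
    marked: bool  # True when the row carries the trailing arrow marker


def parse_node_rows(text: str) -> List[NodeRow]:
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header, the separator, and anything that is not a data row.
        if len(parts) < 5 or parts[3] not in ("YES", "NO"):
            continue
        rows.append(NodeRow(parts[0], parts[1], parts[2],
                            parts[3] == "YES", parts[4], "←" in parts[5:]))
    return rows


for row in parse_node_rows(SAMPLE):
    print(row.name, row.role, "healthy" if row.healthy else "UNHEALTHY")
```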

Adding Nodes¶

The node addition flow:

1. Enter hostname/IP: the target node's address
2. Reachability check: HA Manager pings the node to verify it's online
3. Agent check: verifies RP-PAM is installed and the agent is running
4. Registration: node is registered in the cluster database
5. Config push: cluster configuration is pushed to the new node
6. Health verification: confirms the node is healthy and participating
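
The flow above is a sequence of checks in which each step depends on the one before it. The Python sketch below models that ordering with stand-in checks. It assumes the flow stops at the first failing step, which the text implies but does not state, and every function here is a hypothetical placeholder for logic that lives inside the HA Manager.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[str], bool]]


def run_flow(host: str, steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order; return (completed step names, failed step or None)."""
    completed: List[str] = []
    for name, check in steps:
        if not check(host):
            return completed, name
        completed.append(name)
    return completed, None


# Stand-in checks for illustration; the agent check simulates a failure.
demo_steps: List[Step] = [
    ("reachability check", lambda host: True),
    ("agent check", lambda host: False),
    ("registration", lambda host: True),
    ("config push", lambda host: True),
    ("health verification", lambda host: True),
]

completed, failed = run_flow("10.0.0.2", demo_steps)
print("completed:", completed)
print("failed at:", failed)
```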

Witness Nodes¶

For 2-node clusters, the HA Manager recommends adding a witness node for quorum. The witness:

- Must be on a separate host from all other nodes (the tool rejects the node if its IP matches an existing node)
- Is lightweight: no database, no API, no vault keys
- Only participates in leader election quorum voting
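
The arithmetic behind the witness recommendation is short enough to sketch. Assuming a simple majority quorum (consistent with the N/2+1 rule in the Troubleshooting table), two voters tolerate no failures while three voters tolerate one. The functions below are hypothetical illustrations, and the IP comparison only mirrors the rule described above for IP literals, not hostnames.

```python
import ipaddress
from typing import Iterable


def majority(voters: int) -> int:
    """Smallest number of votes that forms a majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can fail while a majority remains."""
    return voters - majority(voters)


def witness_ip_allowed(candidate: str, existing: Iterable[str]) -> bool:
    """Reject a witness whose IP equals that of any existing node."""
    cand = ipaddress.ip_address(candidate)
    return all(ipaddress.ip_address(ip) != cand for ip in existing)


print("2 voters tolerate", tolerated_failures(2), "failures")
print("3 voters tolerate", tolerated_failures(3), "failures")
print(witness_ip_allowed("10.0.0.3", ["10.0.0.1", "10.0.0.2"]))
```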

Failover Testing¶

The failover test validates that your cluster survives a primary node failure:

1. Pre-test checklist: all nodes healthy, quorum met, not in read-only mode
2. Stop primary: the test stops the primary node's RP-PAM service
3. Monitor promotion: measures how long until a new leader is elected
4. Verify: health endpoint responds, VIP transferred, grants intact
5. Restart old primary: verifies it rejoins the cluster as standby
6. Report: pass/fail with failover time, session survival, data integrity
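
The "monitor promotion" step amounts to polling for a leader that differs from the old one and timing how long that takes. The Python sketch below shows the idea under stated assumptions: `get_leader` is a hypothetical callable (for example, something that reads the Leader line from the status output), and the 60-second timeout and 0.5-second poll interval are arbitrary values, not RP-PAM defaults.

```python
import time
from typing import Callable, Optional, Tuple


def wait_for_new_leader(
    get_leader: Callable[[], Optional[str]],
    old_leader: str,
    timeout_s: float = 60.0,
    poll_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[str], float]:
    """Poll until a leader other than old_leader appears or the timeout expires.

    Returns (new leader or None, elapsed seconds).
    """
    start = time.monotonic()
    while True:
        leader = get_leader()
        elapsed = time.monotonic() - start
        if leader is not None and leader != old_leader:
            return leader, elapsed
        if elapsed >= timeout_s:
            return None, elapsed
        sleep(poll_s)


# Simulated election: old leader, then no leader, then the promoted standby.
answers = iter(["node1.corp.local", None, "node2.corp.local"])
leader, seconds = wait_for_new_leader(
    lambda: next(answers), "node1.corp.local", sleep=lambda _s: None
)
print(f"new leader: {leader} after {seconds:.3f}s")
```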

Tip: Run failover tests from a standby node so the tool stays connected during the test.

Configuration Push¶

Push configuration changes to one or all nodes without manually editing rppam.config on each server:

  Config section (e.g., 'cluster', 'redis'): redis
  Config JSON: {"enabled": true, "connectionString": "redis.corp.local:6379"}
  Target node (blank for all): 
  Restart after push? (yes/no): yes

All config changes are written as properly formatted JSON — no manual file editing required.
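
Because the Config JSON prompt takes raw JSON, it can be worth validating the value before pasting it in. The Python sketch below parses the string, rejects anything that is not a JSON object, and emits a compact single-line form. The helper name is hypothetical, and the requirement that a section be a JSON object is an assumption based on the example above.

```python
import json


def normalize_config_json(raw: str) -> str:
    """Validate a config section and return it as compact single-line JSON."""
    value = json.loads(raw)  # raises json.JSONDecodeError on malformed input
    if not isinstance(value, dict):
        raise ValueError("config section must be a JSON object")
    return json.dumps(value, separators=(",", ":"))


redis_section = '{"enabled": true, "connectionString": "redis.corp.local:6379"}'
print(normalize_config_json(redis_section))
```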

Troubleshooting¶

Problem	Cause	Solution
"Cannot reach node" when adding	Firewall blocking gRPC port	Allow TCP 7100 between all cluster nodes
Node shows "unhealthy"	Heartbeat timeout (>30 seconds)	Check whether the RP-PAM service is running on that node
Quorum not met	Fewer than floor(N/2)+1 of N nodes healthy (for example, fewer than 2 of 3)	Bring offline nodes back online or add a witness
VIP not transferring	Gratuitous ARP blocked	Check ARP settings on the network switch; on Windows, use DNS failover
Failover test fails pre-checks	Cluster not in healthy state	Resolve node health issues first
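
To check the "unhealthy" row by hand, compare a node's LAST HEARTBEAT value from the status output with the current time. The Python sketch below applies the 30-second threshold from the table. It assumes HH:MM:SS timestamps in the same time zone with synchronized clocks, and `is_stale` is a hypothetical helper rather than part of the tool.

```python
from datetime import datetime, timedelta

HEARTBEAT_TIMEOUT = timedelta(seconds=30)  # threshold from the table above
TIME_FORMAT = "%H:%M:%S"


def is_stale(last_heartbeat: str, now: str) -> bool:
    """Return True when the heartbeat is more than 30 seconds old."""
    last = datetime.strptime(last_heartbeat, TIME_FORMAT)
    current = datetime.strptime(now, TIME_FORMAT)
    if current < last:
        # The heartbeat was recorded just before midnight.
        current += timedelta(days=1)
    return current - last > HEARTBEAT_TIMEOUT


print(is_stale("14:30:05", "14:30:20"))
print(is_stale("14:29:30", "14:30:05"))
```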

Next Steps¶

HA Multi-Node Setup — Initial cluster setup
Redis Configuration — Required for HA deployments
VIP Failover — Virtual IP configuration
Failover Testing — Manual failover procedures