HA Manager

Section: High Availability | Article 62
Audience: System Administrators
Last Updated: 2026-04-08


Overview

The RP-PAM HA Manager is a built-in tool for managing multi-node clusters. It provides a single interface for all cluster operations — adding nodes, monitoring health, pushing configuration, testing failover, and managing the virtual IP.

On Windows, launch it from Start Menu → RP-PAM HA Manager. The graphical interface has tabs for Cluster Status, Node Management, Redis, VIP, and Failover Testing. On Linux, run the interactive CLI menu (rppam-ha-manager).

The HA Manager ships with the RP-PAM installer — no additional installation is required.


Architecture

Every RP-PAM node runs the same binary. The HA Manager automatically selects its mode from the node's role:

  Node Role          HA Manager Mode   What It Does
  ───────────────────────────────────────────────────────────────────────────────────────────────────────
  Primary (Leader)   Controller        Full cluster management: add/remove nodes, push config, run tests
  Standby            Agent             Reports status, accepts config pushes from the controller
  Witness            Agent             Participates in quorum voting only

When leadership changes (failover), the new leader automatically becomes the controller. No manual intervention is needed.
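
The role-to-mode mapping above can be sketched in a few lines of Python. This is only an illustration of the table, not RP-PAM source code: the names `MODE_BY_ROLE`, `manager_mode`, and `promote` are hypothetical, and the rule that only a standby is promoted is an assumption made for the sketch.

```python
# Illustrative model of the role/mode table; all names here are hypothetical.
MODE_BY_ROLE = {
    "primary": "controller",  # full cluster management
    "standby": "agent",       # reports status, accepts config pushes
    "witness": "agent",       # quorum voting only
}


def manager_mode(role: str) -> str:
    """Return the HA Manager mode that corresponds to a node role."""
    try:
        return MODE_BY_ROLE[role.lower()]
    except KeyError:
        raise ValueError(f"unknown node role: {role!r}") from None


def promote(roles: dict[str, str], new_leader: str) -> dict[str, str]:
    """Return the role map after a failover to new_leader.

    Assumption: only a standby is promoted; the old primary becomes a
    standby, matching the failover test description in this article.
    """
    if roles.get(new_leader) != "standby":
        raise ValueError("this sketch only promotes standby nodes")
    updated = {}
    for node, role in roles.items():
        if node == new_leader:
            updated[node] = "primary"
        elif role == "primary":
            updated[node] = "standby"
        else:
            updated[node] = role
    return updated
```

After `promote(...)`, calling `manager_mode` on the new primary returns "controller", which mirrors the automatic hand-over described above.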


Using the HA Manager

Interactive Mode

sudo rppam-ha-manager

This opens the menu:

╔══════════════════════════════════════╗
║       RP-PAM HA Manager v1.0        ║
╚══════════════════════════════════════╝

Commands:
  1) Cluster Status
  2) Add Node
  3) Remove Node
  4) Push Config
  5) Failover Test
  6) VIP Configuration
  7) Redis Configuration
  8) Node Maintenance Mode
  0) Exit

Direct Command Mode

# View cluster status
rppam-ha-manager status

# Add a standby node
rppam-ha-manager add-node --host 10.0.0.2 --port 7100 --role standby

# Add a witness node
rppam-ha-manager add-node --host 10.0.0.3 --port 7100 --role witness
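
When nodes are added from a provisioning script, it can help to build the command line programmatically and validate the arguments first. The sketch below uses only the subcommand and flags shown above (`add-node`, `--host`, `--port`, `--role`). The helper name `add_node_command` and the validation rules are assumptions, and the sketch only builds the argument list without running anything.

```python
import shlex

# Roles that appear in the add-node examples in this article.
VALID_ROLES = {"standby", "witness"}


def add_node_command(host: str, role: str, port: int = 7100) -> list[str]:
    """Build the argv list for `rppam-ha-manager add-node` (hypothetical helper)."""
    if not host:
        raise ValueError("host must not be empty")
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {sorted(VALID_ROLES)}")
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return [
        "rppam-ha-manager", "add-node",
        "--host", host,
        "--port", str(port),
        "--role", role,
    ]


cmd = add_node_command("10.0.0.2", "standby")
print(shlex.join(cmd))
# prints: rppam-ha-manager add-node --host 10.0.0.2 --port 7100 --role standby
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a node where the tool is installed.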

Cluster Status

Shows all nodes, their health, roles, and cluster state:

  Cluster: 3/3 nodes healthy
  Quorum:  YES
  Leader:  node1.corp.local
  VIP:     192.168.1.100 (held)
  Mode:    Normal

  NODE                 ROLE       STATUS     HEALTHY    LAST HEARTBEAT
  ───────────────────────────────────────────────────────────────────────────
  node1.corp.local     primary    active     YES        14:30:05        ←
  node2.corp.local     standby    active     YES        14:30:03
  node3.corp.local     witness    active     YES        14:30:04
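
For monitoring scripts, the node table can be parsed from the text output. The sketch below assumes the column layout shown in the sample above and nothing more. The `NodeStatus` class and `parse_node_rows` function are hypothetical names, and because this article does not define the trailing arrow, the parser records it only as a generic `flagged` field.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    name: str
    role: str
    status: str
    healthy: bool
    last_heartbeat: str
    flagged: bool  # True when the row ends with the arrow marker


def parse_node_rows(output: str) -> list[NodeStatus]:
    """Parse the NODE/ROLE/STATUS table from status output (layout assumed)."""
    nodes = []
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if not in_table:
            # Rows start after the header line whose first column is NODE.
            in_table = bool(fields) and fields[0] == "NODE"
            continue
        if len(fields) < 5:
            continue  # skips the rule line and blank lines
        nodes.append(NodeStatus(
            name=fields[0],
            role=fields[1],
            status=fields[2],
            healthy=fields[3] == "YES",
            last_heartbeat=fields[4],
            flagged=len(fields) > 5 and fields[5] == "←",
        ))
    return nodes


SAMPLE = """\
  Cluster: 3/3 nodes healthy
  Quorum:  YES

  NODE                 ROLE       STATUS     HEALTHY    LAST HEARTBEAT
  ───────────────────────────────────────────────────────────────────────────
  node1.corp.local     primary    active     YES        14:30:05        ←
  node2.corp.local     standby    active     YES        14:30:03
  node3.corp.local     witness    active     YES        14:30:04
"""

unhealthy = [n.name for n in parse_node_rows(SAMPLE) if not n.healthy]
```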

Adding Nodes

The node addition flow:

  1. Enter hostname/IP — the target node's address
  2. Reachability check — HA Manager pings the node to verify it's online
  3. Agent check — verifies RP-PAM is installed and the agent is running
  4. Registration — node is registered in the cluster database
  5. Config push — cluster configuration is pushed to the new node
  6. Health verification — confirms the node is healthy and participating
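
Before starting the flow, you can confirm from the controller that the target's cluster port accepts connections. This article does not describe how the built-in reachability check works, so the sketch below is a separate, assumed approach: a plain TCP connect to port 7100 (the port used in the examples here). The function name `can_reach` is hypothetical.

```python
import socket


def can_reach(host: str, port: int = 7100, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A call such as `can_reach("10.0.0.2")` returning False before you add the node usually points to a firewall rule or a stopped service rather than a problem in the HA Manager itself.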

Witness Nodes

For 2-node clusters, the HA Manager recommends adding a witness node for quorum. The witness:

  - Must be on a separate host from all other nodes (the tool rejects the witness if its IP matches an existing node)
  - Is lightweight: no database, no API, no vault keys
  - Only participates in leader election quorum voting
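
The two witness rules above (recommend a witness for 2-node clusters, reject a witness whose IP matches an existing node) can be sketched as follows. The function names are hypothetical, and the tool's real checks may be stricter than this.

```python
import ipaddress


def witness_ip_conflicts(witness_ip: str, existing_ips: list[str]) -> bool:
    """Return True if the witness IP equals the IP of any existing node."""
    candidate = ipaddress.ip_address(witness_ip)
    return any(candidate == ipaddress.ip_address(ip) for ip in existing_ips)


def should_recommend_witness(node_count: int, has_witness: bool) -> bool:
    """A 2-node cluster without a witness cannot keep quorum if one node fails."""
    return node_count == 2 and not has_witness
```

Parsing with `ipaddress` rather than comparing strings rejects malformed addresses early and treats different spellings of the same IPv6 address as equal.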


Failover Testing

The failover test validates that your cluster survives a primary node failure:

  1. Pre-test checklist: all nodes healthy, quorum met, not in read-only mode
  2. Stop primary: the test stops the primary node's RP-PAM service
  3. Monitor promotion: measures how long until a new leader is elected
  4. Verify: health endpoint responds, VIP transferred, grants intact
  5. Restart old primary: verifies it rejoins the cluster as standby
  6. Report: pass/fail with failover time, session survival, data integrity

Tip: Run failover tests from a standby node so the tool stays connected during the test.
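
Step 3 of the test (measuring time to promotion) can be approximated in your own tooling by polling for a new leader. In the sketch below, `get_leader` is a hypothetical callable that you supply, for example one that reads the Leader line from the status output. The timeout and polling interval are arbitrary defaults, not values taken from RP-PAM.

```python
import time
from typing import Callable, Optional


def measure_promotion(
    get_leader: Callable[[], Optional[str]],
    old_leader: str,
    timeout: float = 60.0,
    interval: float = 0.5,
) -> Optional[float]:
    """Poll until a leader other than old_leader appears.

    Returns the elapsed seconds, or None if no new leader is seen
    before the timeout expires.
    """
    start = time.monotonic()  # monotonic clock is immune to wall-clock jumps
    while True:
        leader = get_leader()
        if leader is not None and leader != old_leader:
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

The measured value is only accurate to within one polling interval, so use a short interval if you need to compare failover times between runs.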


Configuration Push

Push configuration changes to one or all nodes without manually editing rppam.config on each server:

  Config section (e.g., 'cluster', 'redis'): redis
  Config JSON: {"enabled": true, "connectionString": "redis.corp.local:6379"}
  Target node (blank for all): 
  Restart after push? (yes/no): yes

All config changes are written as properly formatted JSON — no manual file editing required.
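
Because the Config JSON prompt takes raw JSON, validating the fragment before you paste it can save a failed push. This is a minimal sketch using Python's standard `json` module. The helper name `parse_config_fragment` is hypothetical, and the requirement that a section be a JSON object is an assumption based on the example above.

```python
import json


def parse_config_fragment(section: str, raw_json: str) -> dict:
    """Validate a config fragment and return it as a dict."""
    if not section.strip():
        raise ValueError("config section must not be empty")
    try:
        value = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON for section {section!r}: {exc}") from exc
    if not isinstance(value, dict):
        # Assumption: each section is a JSON object, as in the redis example.
        raise ValueError("config JSON must be an object")
    return value


redis_cfg = parse_config_fragment(
    "redis",
    '{"enabled": true, "connectionString": "redis.corp.local:6379"}',
)
```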


Troubleshooting

  Problem                           Cause                             Solution
  ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  "Cannot reach node" when adding   Firewall blocking gRPC port       Allow TCP 7100 between all cluster nodes
  Node shows "unhealthy"            Heartbeat timeout (>30 seconds)   Check if RP-PAM service is running on that node
  Quorum not met                    Fewer than N/2+1 nodes healthy    Bring offline nodes back online or add a witness
  VIP not transferring              Gratuitous ARP blocked            Check network switch ARP settings; use DNS failover on Windows
  Failover test fails pre-checks    Cluster not in healthy state      Resolve node health issues first
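
The quorum and heartbeat thresholds in the table can be written out explicitly. The sketch below reads N/2+1 as integer division (the usual majority rule) and uses the 30-second heartbeat timeout from the table. The function names are hypothetical.

```python
HEARTBEAT_TIMEOUT_SECONDS = 30  # threshold from the troubleshooting table


def quorum_size(total_nodes: int) -> int:
    """Smallest number of healthy nodes that forms a majority."""
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    return total_nodes // 2 + 1


def has_quorum(healthy_nodes: int, total_nodes: int) -> bool:
    return healthy_nodes >= quorum_size(total_nodes)


def heartbeat_expired(seconds_since_heartbeat: float) -> bool:
    """A node is reported unhealthy once its heartbeat is older than the timeout."""
    return seconds_since_heartbeat > HEARTBEAT_TIMEOUT_SECONDS
```

With two nodes the quorum size is 2, so losing either node loses quorum. With three nodes (for example, two data nodes plus a witness) the quorum size is still 2, so the cluster tolerates one failure.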


RP-PAM v1.0.0 — Copyright 2026 Ravenphyre. All rights reserved.