HA Multi-Node Setup¶

Section: High Availability | Article 15
Audience: System Administrators
Last Updated: 2026-04-06

Overview¶

RP-PAM supports high-availability (HA) clustering with two or more nodes. In an HA deployment, one node is the primary (leader) and the others are standby nodes. If the primary fails, a standby automatically promotes to leader within seconds. Users experience no downtime.

This article walks through a complete 2-node HA setup from scratch. If you need three or more nodes, follow the 2-node guide first, then see Adding More Nodes at the end.

Prerequisites¶

Before starting, ensure you have:

Requirement	Detail
Two servers	Both meeting the System Requirements for a medium deployment (8 cores, 16 GB RAM each)
RP-PAM installed on both	Same version on both nodes (Windows install or Linux install)
Redis instance	Redis 6.2 or later, accessible from both nodes (see Redis Setup)
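Network connectivity	Ports 7001-7012, 5201, and 6379 open between nodes, plus the database port (for example 1433 for MSSQL or 5432 for PostgreSQL) in external-shared mode (see Network Requirements)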
Database	Either a shared external database or local databases on each node (see Database Mode)
License	Enterprise or MSP tier license (Standard tier does not include HA)
Load balancer	nginx, HAProxy, or cloud load balancer for production use (see Load Balancer Example)

Naming Convention¶

Throughout this guide:

Name	IP Address	Role
node1	`10.0.1.10`	Primary
node2	`10.0.1.11`	Standby
redis	`10.0.1.20`	Redis server (can also run on node1)
db-server	`10.0.1.30`	External database (if using external-shared mode)

Replace these with your actual hostnames and IP addresses.

Step 1: Verify Network Connectivity¶

Before configuring anything, confirm that both nodes can reach each other and the supporting infrastructure.

Windows PowerShell (run on each node):

# From node1, test connectivity to node2
Test-NetConnection -ComputerName 10.0.1.11 -Port 5201
Test-NetConnection -ComputerName 10.0.1.11 -Port 7001

# Test Redis
Test-NetConnection -ComputerName 10.0.1.20 -Port 6379

# Test database (MSSQL example)
Test-NetConnection -ComputerName 10.0.1.30 -Port 1433

Linux (run on each node):

# From node1, test connectivity to node2
nc -zv 10.0.1.11 5201
nc -zv 10.0.1.11 7001

# Test Redis
nc -zv 10.0.1.20 6379

# Test database (PostgreSQL example)
nc -zv 10.0.1.30 5432

All tests must succeed before you proceed. Run the same checks from node2, using node1's address (10.0.1.10) in place of node2's. If any test fails, review your firewall rules.
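The checks above cover only ports 5201 and 7001, while the Prerequisites table lists 5201 and the whole 7001-7012 range. A minimal sketch that probes all of them, assuming bash built with /dev/tcp support (the default on most Linux distributions) and GNU coreutils `timeout`; `check_port` and `check_node_ports` are hypothetical helper names, not RP-PAM tools:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers, not shipped with RP-PAM): probe every
# inter-node port from the Prerequisites table using bash's /dev/tcp,
# so nc does not need to be installed.

# Inter-node ports: 5201 plus the range 7001-7012.
NODE_PORTS=(5201 {7001..7012})

# check_port HOST PORT: succeeds if a TCP connection opens within 3 seconds.
check_port() {
    timeout 3 bash -c ': < "/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# check_node_ports HOST: prints OK/FAIL per port; returns 1 if any port failed.
check_node_ports() {
    local host=$1 port rc=0
    for port in "${NODE_PORTS[@]}"; do
        if check_port "$host" "$port"; then
            echo "OK   $host:$port"
        else
            echo "FAIL $host:$port"
            rc=1
        fi
    done
    return "$rc"
}
```

On node1, run `check_node_ports 10.0.1.11`; on node2, run `check_node_ports 10.0.1.10`. The same `check_port` helper covers Redis (`check_port 10.0.1.20 6379`) and the database port.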

Step 2: Install and Configure Redis¶

Redis is required for all HA deployments. It serves as the distributed cache and coordinates leader election between nodes.

Option A: Redis on a Dedicated Server (Recommended for Production)¶

Linux (on the Redis server):

# Ubuntu / Debian
sudo apt update && sudo apt install -y redis-server

# RHEL / CentOS
sudo dnf install -y redis

Edit /etc/redis/redis.conf (on RHEL-family systems the path may be /etc/redis.conf instead):

# Bind to all interfaces (restrict via firewall)
bind 0.0.0.0

# Set a strong password
requirepass YOUR_REDIS_PASSWORD_HERE

# Enable persistence
appendonly yes
appendfilename "appendonly.aof"

# Optional: enable TLS (recommended for production)
# tls-port 6379
# port 0
# tls-cert-file /etc/redis/tls/redis.crt
# tls-key-file /etc/redis/tls/redis.key
# tls-ca-cert-file /etc/redis/tls/ca.crt

# Debian / Ubuntu service name; on RHEL / CentOS use "redis" instead of "redis-server"
sudo systemctl enable redis-server
sudo systemctl restart redis-server
sudo systemctl status redis-server

Windows (on the Redis server):

Redis does not officially support Windows. For Windows environments, use one of:
- Memurai (Redis-compatible, native Windows): download from https://www.memurai.com/
- Redis in WSL2: run Redis inside Windows Subsystem for Linux
- Redis in Docker Desktop: docker run -d --name redis -p 6379:6379 redis:7 --requirepass YOUR_REDIS_PASSWORD_HERE

The Docker approach is simplest for Windows:

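# Persist data in a named volume and enable AOF, matching the Linux redis.conf above
docker run -d --name rppam-redis `
  -p 6379:6379 `
  -v rppam-redis-data:/data `
  --restart unless-stopped `
  redis:7 --requirepass YOUR_REDIS_PASSWORD_HERE --appendonly yes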

Option B: Redis Co-Located on Node 1¶

For smaller deployments, Redis can run on the same server as Node 1. Follow the installation steps above on Node 1 instead of a dedicated server, point both nodes' Redis connection strings at 10.0.1.10:6379, and ensure Node 2 can reach Node 1 on port 6379.

Verify Redis¶

# Run from each node. On Windows, the same command works wherever a Redis CLI is installed.
redis-cli -h 10.0.1.20 -a YOUR_REDIS_PASSWORD_HERE ping
# Expected output: PONG
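Passing the password with `-a` exposes it in shell history and the process list, and redis-cli prints a warning when you do. A sketch that reads the password from the `REDISCLI_AUTH` environment variable instead and also confirms that the append-only persistence configured earlier is active. It assumes a redis-cli recent enough to honour `REDISCLI_AUTH` (Redis 6.2, the minimum in the Prerequisites, is) and that the `CONFIG` command has not been renamed or disabled on the server; `verify_redis` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): check that Redis answers PING and that
# append-only persistence is enabled, as configured in redis.conf above.
# Put the password in the environment instead of on the command line, e.g.:
#   read -rs REDISCLI_AUTH && export REDISCLI_AUTH

verify_redis() {
    local host=$1 port=${2:-6379} reply aof
    reply=$(redis-cli -h "$host" -p "$port" ping) || return 1
    if [ "$reply" != "PONG" ]; then
        echo "Redis at $host:$port did not answer PONG (got: $reply)" >&2
        return 1
    fi
    # With output piped, CONFIG GET prints the key, then the value; keep the value.
    aof=$(redis-cli -h "$host" -p "$port" CONFIG GET appendonly | tail -n 1)
    if [ "$aof" != "yes" ]; then
        echo "Redis at $host:$port has appendonly=$aof; expected yes" >&2
        return 1
    fi
    echo "Redis at $host:$port: PONG, appendonly=yes"
}
```

Usage: export `REDISCLI_AUTH`, then run `verify_redis 10.0.1.20` from each node.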

Step 3: Choose Your Database Mode¶

RP-PAM supports two database modes for HA clusters. Choose the one that fits your environment.

Option A: `external-shared` (Recommended)¶

Both nodes connect to the same external database server. This is the simplest approach and is recommended for most deployments.

Aspect	Detail
How it works	A single MSSQL or PostgreSQL instance serves both nodes. Both read and write to the same database.
Advantages	Simple setup. No data replication to manage. Standard database HA (AlwaysOn, replication) can protect the database itself.
Disadvantages	Database is a single point of failure unless you also configure database-level HA.
Best for	Organisations that already have enterprise database infrastructure with their own HA and backup.

Option B: `local-sync` (Advanced)¶

Each node has its own local database. RP-PAM handles replication between nodes automatically.

Aspect	Detail
How it works	Each node runs its own database instance. When the primary writes data, the write must be acknowledged by a quorum of floor(N/2) + 1 nodes, the primary included, before it is considered committed.
Advantages	No external database dependency. In clusters of three or more nodes, a compromised or failed node can be isolated while the other nodes continue operating with their own data.
Disadvantages	More complex setup. Each node needs sufficient disk space for a full database copy. Write latency is slightly higher due to quorum replication.
Best for	Air-gapped environments, high-security deployments, or organisations that prefer self-contained nodes.
Quorum rule	Writes must reach floor(N/2) + 1 nodes. For a 2-node cluster, that means both nodes must acknowledge the write. For a 3-node cluster, 2 of 3 must acknowledge.
Compromised node	If a node is suspected of compromise, it can be isolated (removed from the cluster) immediately. The remaining nodes retain a complete, consistent copy of all data.
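The quorum rule is plain integer arithmetic. This small sketch (hypothetical helper names) computes the majority quorum for N quorum-participating nodes and how many of them can be offline while writes still commit. DR nodes, described later, are left out of N because they do not count toward the quorum.

```bash
#!/usr/bin/env bash
# Sketch: majority write quorum for N nodes that participate in the quorum.
# Integer division in $(( )) rounds down, giving floor(N/2) + 1.

quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

# Nodes that can be offline while writes can still reach the quorum.
tolerated_failures() {
    echo $(( $1 - $(quorum_for "$1") ))
}

for n in 2 3 4 5; do
    printf '%d nodes: writeQuorum=%d, tolerates %d failed node(s)\n' \
        "$n" "$(quorum_for "$n")" "$(tolerated_failures "$n")"
done
# 2 nodes: writeQuorum=2, tolerates 0 failed node(s)
# 3 nodes: writeQuorum=2, tolerates 1 failed node(s)
# 4 nodes: writeQuorum=3, tolerates 1 failed node(s)
# 5 nodes: writeQuorum=3, tolerates 2 failed node(s)
```

The 4-node line shows why odd cluster sizes are usual: a fourth node raises the quorum to 3 without letting the cluster tolerate more failures than three nodes do.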

Step 4: Configure Node 1 (Primary)¶

Edit rppam.config on Node 1. Its location depends on the operating system:

Windows: C:\ProgramData\Ravenphyre\RP-PAM\rppam.config
Linux: /etc/rppam/rppam.config

Using `external-shared` Database Mode¶

{
  "node": {
    "nodeName": "node1",
    "grpcPortBase": 7001
  },
  "database": {
    "globalConnectionString": "Server=10.0.1.30,1433;Database=RpPam_Global;User Id=rppam_svc;Password=YOUR_DB_PASSWORD;Encrypt=True;TrustServerCertificate=False",
    "dbType": "mssql"
  },
  "databaseSync": {
    "mode": "external-shared",
    "nodeRole": "primary",
    "peerEndpoints": [
      "https://10.0.1.11:5201"
    ],
    "writeQuorum": 1
  },
  "redis": {
    "enabled": true,
    "connectionString": "10.0.1.20:6379,password=YOUR_REDIS_PASSWORD_HERE,ssl=false",
    "keyPrefix": "rppam:",
    "tlsEnabled": false
  },
  "cluster": {
    "leaderLockTtlSeconds": 30,
    "leaderRenewalIntervalSeconds": 10,
    "outboxPollIntervalSeconds": 5,
    "heartbeatIntervalSeconds": 5,
    "virtualIp": "",
    "vipInterface": ""
  }
}

Using `local-sync` Database Mode¶

If you use local-sync, only the database and databaseSync sections differ from the example above. The complete file:

{
  "node": {
    "nodeName": "node1",
    "grpcPortBase": 7001
  },
  "database": {
    "globalConnectionString": "Server=127.0.0.1,1433;Database=RpPam_Global;User Id=rppam_svc;Password=YOUR_DB_PASSWORD;Encrypt=True;TrustServerCertificate=True",
    "dbType": "mssql"
  },
  "databaseSync": {
    "mode": "local-sync",
    "nodeRole": "primary",
    "peerEndpoints": [
      "https://10.0.1.11:5201"
    ],
    "writeQuorum": 2,
    "backup": {
      "enabled": true,
      "schedule": "0 2 * * *",
      "retentionDays": 30
    }
  },
  "redis": {
    "enabled": true,
    "connectionString": "10.0.1.20:6379,password=YOUR_REDIS_PASSWORD_HERE,ssl=false",
    "keyPrefix": "rppam:",
    "tlsEnabled": false
  },
  "cluster": {
    "leaderLockTtlSeconds": 30,
    "leaderRenewalIntervalSeconds": 10,
    "outboxPollIntervalSeconds": 5,
    "heartbeatIntervalSeconds": 5,
    "virtualIp": "",
    "vipInterface": ""
  }
}

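Note: For a 2-node local-sync cluster, writeQuorum must be 2 (both nodes), so every committed write exists on both nodes and no committed data is lost if either node fails. The trade-off is that new writes cannot be committed while either node is down. For larger clusters, set writeQuorum to a majority, floor(N/2) + 1: 2 for three nodes, 3 for five nodes.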

Step 5: Configure Node 2 (Standby)¶

Edit rppam.config on Node 2.

Using `external-shared` Database Mode¶

{
  "node": {
    "nodeName": "node2",
    "grpcPortBase": 7001
  },
  "database": {
    "globalConnectionString": "Server=10.0.1.30,1433;Database=RpPam_Global;User Id=rppam_svc;Password=YOUR_DB_PASSWORD;Encrypt=True;TrustServerCertificate=False",
    "dbType": "mssql"
  },
  "databaseSync": {
    "mode": "external-shared",
    "nodeRole": "standby",
    "peerEndpoints": [
      "https://10.0.1.10:5201"
    ],
    "writeQuorum": 1
  },
  "redis": {
    "enabled": true,
    "connectionString": "10.0.1.20:6379,password=YOUR_REDIS_PASSWORD_HERE,ssl=false",
    "keyPrefix": "rppam:",
    "tlsEnabled": false
  },
  "cluster": {
    "leaderLockTtlSeconds": 30,
    "leaderRenewalIntervalSeconds": 10,
    "outboxPollIntervalSeconds": 5,
    "heartbeatIntervalSeconds": 5,
    "virtualIp": "",
    "vipInterface": ""
  }
}

Using `local-sync` Database Mode¶

{
  "node": {
    "nodeName": "node2",
    "grpcPortBase": 7001
  },
  "database": {
    "globalConnectionString": "Server=127.0.0.1,1433;Database=RpPam_Global;User Id=rppam_svc;Password=YOUR_DB_PASSWORD;Encrypt=True;TrustServerCertificate=True",
    "dbType": "mssql"
  },
  "databaseSync": {
    "mode": "local-sync",
    "nodeRole": "standby",
    "peerEndpoints": [
      "https://10.0.1.10:5201"
    ],
    "writeQuorum": 2,
    "backup": {
      "enabled": true,
      "schedule": "0 2 * * *",
      "retentionDays": 30
    }
  },
  "redis": {
    "enabled": true,
    "connectionString": "10.0.1.20:6379,password=YOUR_REDIS_PASSWORD_HERE,ssl=false",
    "keyPrefix": "rppam:",
    "tlsEnabled": false
  },
  "cluster": {
    "leaderLockTtlSeconds": 30,
    "leaderRenewalIntervalSeconds": 10,
    "outboxPollIntervalSeconds": 5,
    "heartbeatIntervalSeconds": 5,
    "virtualIp": "",
    "vipInterface": ""
  }
}

Key differences from Node 1:
- nodeName is "node2"
- nodeRole is "standby"
- peerEndpoints points to Node 1's address (10.0.1.10) instead of Node 2's
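Apart from those three fields, the Node 1 and Node 2 examples are identical. Assuming your files follow the same layout (one key or array entry per line, as above), a line-based sketch like this can flag accidental drift, such as a mistyped Redis password or a different leaderLockTtlSeconds. It is a heuristic built on grep and diff, not a JSON parser, and `strip_node_fields` and `compare_node_configs` are hypothetical names:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): report differences between two rppam.config
# files other than the per-node fields. Line-based heuristic, not a JSON parser;
# it assumes one key or array entry per line, as in the examples above.

# Print the file without the lines expected to differ between nodes:
# nodeName, nodeRole, and the peerEndpoints URLs on port 5201.
strip_node_fields() {
    grep -v -E '"(nodeName|nodeRole)"|"https://[^"]+:5201"' "$1"
}

# compare_node_configs FILE_A FILE_B: prints unexpected differences in diff
# format; returns 0 if only per-node fields differ, non-zero otherwise.
compare_node_configs() {
    [ -r "$1" ] && [ -r "$2" ] || { echo "cannot read $1 or $2" >&2; return 2; }
    diff <(strip_node_fields "$1") <(strip_node_fields "$2")
}
```

For example, copy Node 2's file to Node 1 and run `compare_node_configs /etc/rppam/rppam.config /tmp/node2-rppam.config`.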

Step 6: Start the Cluster¶

Start the primary node first, then the standby.

Windows PowerShell¶

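# On Node 1 (primary): start first
Start-Service RpPam

# Wait up to 5 minutes for Node 1 to report healthy
$deadline = (Get-Date).AddMinutes(5)
do {
    Start-Sleep -Seconds 2
    try {
        $health = Invoke-RestMethod -Uri "http://localhost:7101/system/health/ping"
    } catch {
        $health = $null   # service not answering yet
    }
} while ($health.status -ne "healthy" -and (Get-Date) -lt $deadline)

if ($health.status -ne "healthy") {
    throw "Node 1 did not report healthy within 5 minutes; check its logs before starting Node 2"
}
Write-Host "Node 1 is healthy"

# On Node 2 (standby): start second
Start-Service RpPam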

Linux¶

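# On Node 1 (primary): start first
sudo systemctl start rppam

# Wait up to 5 minutes (150 x 2 s) for Node 1 to report healthy
healthy=false
for attempt in $(seq 1 150); do
    if curl -sf http://localhost:7101/system/health/ping | grep -q '"status" *: *"healthy"'; then
        healthy=true
        break
    fi
    sleep 2
done
if $healthy; then
    echo "Node 1 is healthy"
else
    echo "Node 1 did not report healthy within 5 minutes; check its logs before starting Node 2" >&2
fi

# On Node 2 (standby): start second
sudo systemctl start rppam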

Step 7: Verify Cluster Status¶

Once both nodes are running, verify the cluster is healthy.

Windows PowerShell¶

# Check cluster status from either node.
# $adminJwt must already hold a valid administrator JWT.
$cluster = Invoke-RestMethod -Uri "http://localhost:7101/system/health/cluster" `
  -Headers @{ Authorization = "Bearer $adminJwt" }

$cluster | ConvertTo-Json -Depth 5

Expected response:

{
  "clusterHealthy": true,
  "leaderNode": "node1",
  "nodes": [
    {
      "nodeName": "node1",
      "role": "primary",
      "status": "healthy",
      "lastHeartbeat": "2026-04-06T10:30:00Z",
      "databaseMode": "external-shared"
    },
    {
      "nodeName": "node2",
      "role": "standby",
      "status": "healthy",
      "lastHeartbeat": "2026-04-06T10:30:02Z",
      "databaseMode": "external-shared"
    }
  ],
  "redisConnected": true,
  "writeQuorum": 1,
  "quorumMet": true
}

Linux (curl)¶

# $ADMIN_JWT must already hold a valid administrator JWT
curl -s http://localhost:7101/system/health/cluster \
  -H "Authorization: Bearer $ADMIN_JWT" | jq .

What to Check¶

Field	Expected Value	Problem if Wrong
`clusterHealthy`	`true`	One or more nodes unhealthy — check logs
`leaderNode`	`"node1"`	If empty, leader election has not completed — check Redis
Both nodes `status`	`"healthy"`	If `"unreachable"`, check firewall rules between nodes
`redisConnected`	`true`	If `false`, Redis is down or unreachable
`quorumMet`	`true`	If `false`, not enough nodes are online to satisfy `writeQuorum`
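The checks in this table can be scripted. A minimal sketch, assuming jq is installed (as in the curl example above) and that the response has the shape of the expected response shown earlier; `check_cluster_json` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): read the /system/health/cluster JSON on stdin
# and report every field from the "What to Check" table that is not as expected.
# Returns 0 if all checks pass, 1 if any fail, 2 if the input is not valid JSON.

check_cluster_json() {
    local problems
    problems=$(jq -r '
        (if .clusterHealthy != true then "clusterHealthy is not true" else empty end),
        (if (.leaderNode // "") == "" then "leaderNode is empty: leader election has not completed" else empty end),
        (if .redisConnected != true then "redisConnected is not true" else empty end),
        (if .quorumMet != true then "quorumMet is not true" else empty end),
        (.nodes[]? | select(.status != "healthy") | "\(.nodeName): status is \(.status)")
    ') || return 2
    if [ -n "$problems" ]; then
        printf '%s\n' "$problems"
        return 1
    fi
    echo "cluster OK"
}
```

Usage: `curl -s http://localhost:7101/system/health/cluster -H "Authorization: Bearer $ADMIN_JWT" | check_cluster_json`.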

Step 8: Configure a Load Balancer¶

In production, users should connect through a load balancer rather than directly to individual nodes. This ensures automatic failover if the primary becomes unavailable.

nginx Example¶

Install nginx on a separate server or on one of the RP-PAM nodes. A separate server keeps the load balancer available when a node fails.

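# Sends "Connection: upgrade" only when the client requests a WebSocket upgrade
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

upstream rppam_cluster {
    # Primary node (preferred). nginx marks it unavailable for 10s after
    # 3 failed attempts within 10s, and then uses the backup.
    server 10.0.1.10:7101 weight=5 max_fails=3 fail_timeout=10s;
    # Standby node: receives traffic only while the primary is unavailable
    server 10.0.1.11:7101 backup;
}

server {
    listen 443 ssl;
    server_name pam.corp.local;

    ssl_certificate     /etc/nginx/ssl/pam.corp.local.crt;
    ssl_certificate_key /etc/nginx/ssl/pam.corp.local.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;

    # Health ping, proxied with short timeouts. Open-source nginx has no active
    # health checks; failover relies on max_fails/fail_timeout above.
    location /system/health/ping {
        proxy_pass http://rppam_cluster;
        proxy_connect_timeout 3s;
        proxy_read_timeout 5s;
    }

    location / {
        # Nodes are reached over HTTP on port 7101, as in Steps 6 and 7.
        # Switch both proxy_pass lines to https:// if your nodes serve TLS there.
        proxy_pass http://rppam_cluster;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Try the other server if the chosen one fails or returns 502/503
        proxy_next_upstream error timeout http_502 http_503;

        # WebSocket support (for live session monitoring)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;

        proxy_connect_timeout 10s;
        proxy_read_timeout 300s;
        proxy_send_timeout 60s;
    }
}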

After saving the configuration, test it and reload nginx:

sudo nginx -t          # Test configuration
sudo systemctl reload nginx

Users now connect to https://pam.corp.local instead of individual node addresses.

Adding More Nodes (3+)¶

Adding a third (or fourth, fifth, etc.) node follows the same pattern as Node 2.

Step-by-Step¶

Install RP-PAM on the new server using the same version as the existing nodes.

Edit rppam.config on the new node. Copy the database, redis, and cluster sections from an existing node, use nodeRole: "standby", and list all other nodes in peerEndpoints:

{
  "node": {
    "nodeName": "node3",
    "grpcPortBase": 7001
  },
  "databaseSync": {
    "mode": "external-shared",
    "nodeRole": "standby",
    "peerEndpoints": [
      "https://10.0.1.10:5201",
      "https://10.0.1.11:5201"
    ],
    "writeQuorum": 1
  }
}

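Update existing nodes. Add the new node's endpoint to peerEndpoints on every existing node and, in local-sync mode, set writeQuorum to the majority for the new cluster size (see Write Quorum for 3+ Nodes). Restart the existing nodes one at a time (a rolling restart) so the change takes effect.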

On Node 1 and Node 2, add "https://10.0.1.12:5201" to their peerEndpoints arrays.
Update the load balancer to include the new node.
Start the new node (on Windows, run Start-Service RpPam instead):
```
sudo systemctl start rppam
```

Verify cluster status — all three nodes should appear:

curl -s http://localhost:7101/system/health/cluster \
  -H "Authorization: Bearer $ADMIN_JWT" | jq .
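Updating peerEndpoints on every node and the rolling restart are easy to get wrong by hand (see the peerEndpoints row in Troubleshooting). A sketch that prints each node's peer list and restarts nodes one at a time, assuming Linux nodes, SSH access with sudo rights on each, and the health ping on http://localhost:7101 used in Step 6; both function names are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): assumes Linux nodes, SSH access with sudo
# rights, and the health ping on http://localhost:7101 used in Step 6.

PEER_PORT=5201

# peer_endpoints_for SELF_IP ALL_IPS...: prints the peerEndpoints array for
# one node, listing every other cluster member and never the node itself.
peer_endpoints_for() {
    local self=$1 ip first=1
    shift
    echo '"peerEndpoints": ['
    for ip in "$@"; do
        [ "$ip" = "$self" ] && continue
        [ "$first" -eq 1 ] || printf ',\n'
        printf '  "https://%s:%s"' "$ip" "$PEER_PORT"
        first=0
    done
    printf '\n]\n'
}

# rolling_restart HOST...: restarts rppam on one node at a time and waits
# (60 polls, 2 s apart) for it to report healthy before moving on.
rolling_restart() {
    local node attempt
    for node in "$@"; do
        echo "Restarting rppam on $node"
        ssh "$node" 'sudo systemctl restart rppam' || return 1
        for (( attempt = 1; attempt <= 60; attempt++ )); do
            if ssh "$node" 'curl -sf http://localhost:7101/system/health/ping' |
                grep -q '"status" *: *"healthy"'; then
                echo "$node is healthy"
                continue 2
            fi
            sleep 2
        done
        echo "$node did not become healthy; stopping the rollout" >&2
        return 1
    done
}
```

For example, `peer_endpoints_for 10.0.1.10 10.0.1.10 10.0.1.11 10.0.1.12` prints the array to paste into node1's rppam.config, and `rolling_restart 10.0.1.10 10.0.1.11` then restarts the existing nodes in turn.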

Write Quorum for 3+ Nodes¶

Cluster Size	Recommended `writeQuorum`	Meaning
2 nodes	`2` (local-sync) or `1` (external-shared)	local-sync: both nodes must confirm each write, so writes pause while either node is down. external-shared: the shared database handles consistency.
3 nodes	`2`	Majority. Cluster survives 1 node failure.
5 nodes	`3`	Majority. Cluster survives 2 node failures.

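For external-shared mode, writeQuorum controls cluster-level acknowledgement only; the external database handles its own consistency. The external-shared examples in this article keep writeQuorum at 1, including the three-node example above.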

DR Node (Disaster Recovery)¶

A DR node is a special standby node located in a separate data centre or region. It receives data via asynchronous replication (not synchronous quorum), so it does not add latency to normal writes.

Key Differences from a Standard Standby¶

Aspect	Standard Standby	DR Node
Replication	Synchronous (participates in quorum)	Asynchronous (does not participate in quorum)
Latency impact	Must be low-latency to primary	Can be high-latency (cross-region)
Automatic failover	Yes — can be promoted to leader	No — must be manually promoted
Purpose	Local HA	Disaster recovery for regional/site failure

Configuring a DR Node¶

{
  "node": {
    "nodeName": "dr-node",
    "grpcPortBase": 7001
  },
  "databaseSync": {
    "mode": "local-sync",
    "nodeRole": "dr",
    "peerEndpoints": [
      "https://10.0.1.10:5201",
      "https://10.0.1.11:5201"
    ],
    "writeQuorum": 2
  }
}

The nodeRole: "dr" setting tells RP-PAM that this node:
- Receives replicated data asynchronously
- Does not participate in leader election
- Does not count toward write quorum
- Can be manually promoted to primary in a disaster

Promoting a DR Node¶

If the primary data centre is completely lost:

Windows PowerShell:

& "C:\Program Files\Ravenphyre\RP-PAM\tools\rppam-cluster.exe" promote-dr `
  --node dr-node `
  --confirm

Linux:

sudo /opt/rppam/tools/rppam-cluster promote-dr \
  --node dr-node \
  --confirm
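Promoting the DR node while an original node is still running risks two nodes acting as primary. Before running either command above, a pre-check like this sketch can help. It assumes bash with /dev/tcp support and that the original nodes listen on port 5201 as in this article's examples; `primary_site_reachable` is a hypothetical helper, not an rppam-cluster feature. A failed probe from the DR site does not prove the primary site is down, because a network partition looks the same, so confirm the outage through an independent channel as well.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper, not part of rppam-cluster): before promote-dr,
# check whether any original node still accepts connections on port 5201.
# An unreachable result only shows what the DR site can see; a network
# partition looks identical, so confirm the outage independently too.

PRIMARY_SITE_NODES=(10.0.1.10 10.0.1.11)

# primary_site_reachable HOST...: returns 0 (and names the host) if any host
# accepts a TCP connection on port 5201 within 3 seconds, otherwise 1.
primary_site_reachable() {
    local ip
    for ip in "$@"; do
        if timeout 3 bash -c ': < "/dev/tcp/$1/5201"' _ "$ip" 2>/dev/null; then
            echo "Still reachable: $ip:5201"
            return 0
        fi
    done
    return 1
}

# Example guard around the Linux promotion command shown above:
# if primary_site_reachable "${PRIMARY_SITE_NODES[@]}"; then
#     echo "Original nodes still respond; not promoting." >&2
# else
#     sudo /opt/rppam/tools/rppam-cluster promote-dr --node dr-node --confirm
# fi
```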

After promotion:
1. The DR node becomes the new primary.
2. Update DNS or the load balancer to point to the DR node.
3. Verify cluster status and data integrity.
4. When the original data centre recovers, add the old nodes back as standbys.

Warning: There may be a small amount of data loss when promoting a DR node, limited to writes that were committed on the primary but not yet replicated to the DR node. The promotion tool reports the replication lag at the time of promotion.

Troubleshooting¶

Problem	Likely Cause	Solution
"Leader election failed"	Redis unreachable	Check Redis is running and both nodes can connect
Node shows "unreachable"	Firewall blocking port 5201 or 7001-7012	Open inter-node ports (see Network Requirements)
"Quorum not met"	Too few nodes online	Start additional nodes; check `writeQuorum` setting
Standby never syncs (local-sync)	`peerEndpoints` misconfigured	Verify each node lists the other nodes' addresses, not its own
Split-brain warning	Redis connectivity issue	Check that all nodes can reach the same Redis instance
High write latency (local-sync)	Cross-network replication	Ensure nodes are on the same low-latency network, or switch to `external-shared`

Next Steps¶

Redis Configuration for HA — Advanced Redis settings, TLS, and persistence
VIP Failover Configuration — Configure a virtual IP for automatic client failover
Failover Testing — Test that failover works correctly
LVS Relay Setup — For HA clusters, deploy a single LVS relay behind your load balancer so license validation survives internet or upstream outages
HA Manager — Manage and monitor your cluster from a single tool
