HA Multi-Node Setup

Section: High Availability | Article 15
Audience: System Administrators
Last Updated: 2026-04-06


Overview

RP-PAM supports high-availability (HA) clustering with two or more nodes. In an HA deployment, one node is the primary (leader) and the others are standby nodes. If the primary fails, a standby is automatically promoted to leader within seconds, so users see little or no interruption.

This article walks through a complete 2-node HA setup from scratch. If you need three or more nodes, follow the 2-node guide first, then see Adding More Nodes at the end.


Prerequisites

Before starting, ensure you have:

| Requirement | Detail |
|---|---|
| Two servers | Both meeting the System Requirements for a medium deployment (8 cores, 16 GB RAM each) |
| RP-PAM installed on both | Same version on both nodes (Windows install or Linux install) |
| Redis instance | Redis 6.2 or later, accessible from both nodes (see Redis Setup) |
| Network connectivity | Ports 7001-7012, 5201, and 6379 open between nodes (see Network Requirements) |
| Database | Either a shared external database or local databases on each node (see Database Mode) |
| License | Enterprise or MSP tier license (Standard tier does not include HA) |
| Load balancer | nginx, HAProxy, or cloud load balancer for production use (see Load Balancer Example) |

Naming Convention

Throughout this guide:

| Name | IP Address | Role |
|---|---|---|
| node1 | 10.0.1.10 | Primary |
| node2 | 10.0.1.11 | Standby |
| redis | 10.0.1.20 | Redis server (can also run on node1) |
| db-server | 10.0.1.30 | External database (if using external-shared mode) |

Replace these with your actual hostnames and IP addresses.


Step 1: Verify Network Connectivity

Before configuring anything, confirm that both nodes can reach each other and the supporting infrastructure.

Windows PowerShell (run on each node):

# From node1, test connectivity to node2
Test-NetConnection -ComputerName 10.0.1.11 -Port 5201
Test-NetConnection -ComputerName 10.0.1.11 -Port 7001

# Test Redis
Test-NetConnection -ComputerName 10.0.1.20 -Port 6379

# Test database (MSSQL example)
Test-NetConnection -ComputerName 10.0.1.30 -Port 1433

Linux (run on each node):

# From node1, test connectivity to node2
nc -zv 10.0.1.11 5201
nc -zv 10.0.1.11 7001

# Test Redis
nc -zv 10.0.1.20 6379

# Test database (PostgreSQL example)
nc -zv 10.0.1.30 5432

All tests must succeed before proceeding. If any fail, review your firewall rules.
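
For checking every inter-node port in one pass rather than one command per port, the following is a minimal Python sketch. The helper names (`port_open`, `closed_ports`) are hypothetical and not part of RP-PAM; the port list is taken from the prerequisites in this guide.

```python
import socket

# Inter-node ports listed in the prerequisites: 5201 and 7001-7012.
INTER_NODE_PORTS = [5201] + list(range(7001, 7013))


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def closed_ports(host: str, ports: list[int], timeout: float = 3.0) -> list[int]:
    """Return the subset of ports on host that did not accept a connection."""
    return [p for p in ports if not port_open(host, p, timeout)]


# Example usage from node1 (addresses are the example values in this guide):
#   closed_ports("10.0.1.11", INTER_NODE_PORTS)   # node2
#   closed_ports("10.0.1.20", [6379])             # redis
# An empty list means every port accepted a connection.
```

A port only accepts connections once something is listening on it, so ports served by RP-PAM itself may show as closed until the service is running on the target node.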


Step 2: Install and Configure Redis

Redis is required for all HA deployments. It serves as the distributed cache and coordinates leader election between nodes.

Option A: Dedicated Redis Server

Linux (on the Redis server):

# Ubuntu / Debian
sudo apt update && sudo apt install -y redis-server

# RHEL / CentOS
sudo dnf install -y redis

Edit /etc/redis/redis.conf (on some RHEL / CentOS releases the file is /etc/redis.conf):

# Bind to all interfaces (restrict via firewall)
bind 0.0.0.0

# Set a strong password
requirepass YOUR_REDIS_PASSWORD_HERE

# Enable persistence
appendonly yes
appendfilename "appendonly.aof"

# Optional: enable TLS (recommended for production)
# tls-port 6379
# port 0
# tls-cert-file /etc/redis/tls/redis.crt
# tls-key-file /etc/redis/tls/redis.key
# tls-ca-cert-file /etc/redis/tls/ca.crt

# The service is named redis-server on Ubuntu / Debian; on RHEL / CentOS use redis
sudo systemctl enable redis-server
sudo systemctl restart redis-server
sudo systemctl status redis-server

Windows (on the Redis server):

Redis does not officially support Windows. For Windows environments, use one of:

- Memurai (Redis-compatible, native Windows): download from https://www.memurai.com/
- Redis in WSL2: run Redis inside Windows Subsystem for Linux
- Redis in Docker Desktop: docker run -d --name redis -p 6379:6379 redis:7 --requirepass YOUR_REDIS_PASSWORD_HERE

The Docker approach is simplest for Windows:

docker run -d --name rppam-redis `
  -p 6379:6379 `
  --restart unless-stopped `
  redis:7 --requirepass YOUR_REDIS_PASSWORD_HERE

Option B: Redis Co-Located on Node 1

For smaller deployments, Redis can run on the same server as Node 1. Use the same installation steps above but install on Node 1 instead of a dedicated server. Ensure Node 2 can reach Node 1 on port 6379.

Verify Redis

# Linux
redis-cli -h 10.0.1.20 -a YOUR_REDIS_PASSWORD_HERE ping
# Expected output: PONG

# Windows (if Redis CLI is available)
redis-cli -h 10.0.1.20 -a YOUR_REDIS_PASSWORD_HERE ping
# Expected output: PONG

Step 3: Choose Your Database Mode

RP-PAM supports two database modes for HA clusters. Choose the one that fits your environment.

Option A: external-shared (Recommended)

Both nodes connect to the same external database server. This is the simplest approach and is recommended for most deployments.

| Aspect | Detail |
|---|---|
| How it works | A single MSSQL or PostgreSQL instance serves both nodes. Both read and write to the same database. |
| Advantages | Simple setup. No data replication to manage. Standard database HA (AlwaysOn, replication) can protect the database itself. |
| Disadvantages | Database is a single point of failure unless you also configure database-level HA. |
| Best for | Organisations that already have enterprise database infrastructure with their own HA and backup. |

Option B: local-sync (Advanced)

Each node has its own local database. RP-PAM handles replication between nodes automatically.

| Aspect | Detail |
|---|---|
| How it works | Each node runs its own database instance. When the primary writes data, the write is replicated to N/2+1 nodes (quorum, with N/2 rounded down) before it is considered committed. |
| Advantages | No external database dependency. A compromised or failed node can be isolated without affecting the cluster; the other nodes continue operating with their own data. |
| Disadvantages | More complex setup. Each node needs sufficient disk space for a full database copy. Write latency is slightly higher due to quorum replication. |
| Best for | Air-gapped environments, high-security deployments, or organisations that prefer self-contained nodes. |
| Quorum rule | Writes must reach N/2+1 nodes (N/2 rounded down). For a 2-node cluster, that means both nodes must acknowledge the write. For a 3-node cluster, 2 of 3 must acknowledge. |
| Compromised node | If a node is suspected of compromise, it can be isolated (removed from the cluster) immediately. The remaining nodes retain a complete, consistent copy of all data. |
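
The quorum rule is simple integer arithmetic. As a small illustration (the function names are hypothetical, not RP-PAM APIs), the sketch below computes the majority size for a cluster and how many nodes can be offline while a majority can still acknowledge writes.

```python
def write_quorum(node_count: int) -> int:
    """Smallest majority of node_count nodes: floor(N/2) + 1."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return node_count // 2 + 1


def tolerated_failures(node_count: int) -> int:
    """Nodes that can be offline while a majority can still acknowledge writes."""
    return node_count - write_quorum(node_count)


for n in (2, 3, 5):
    print(f"{n} nodes: quorum {write_quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

By this arithmetic a 2-node local-sync cluster needs both nodes to acknowledge every write, so it has no spare node for quorum purposes; three nodes is the smallest size that keeps a majority with one node offline.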

Step 4: Configure Node 1 (Primary)

Edit rppam.config on Node 1.

Using external-shared Database Mode

Windows: C:\ProgramData\Ravenphyre\RP-PAM\rppam.config
Linux: /etc/rppam/rppam.config

{
  "node": {
    "nodeName": "node1",
    "grpcPortBase": 7001
  },
  "database": {
    "globalConnectionString": "Server=10.0.1.30,1433;Database=RpPam_Global;User Id=rppam_svc;Password=YOUR_DB_PASSWORD;Encrypt=True;TrustServerCertificate=False",
    "dbType": "mssql"
  },
  "databaseSync": {
    "mode": "external-shared",
    "nodeRole": "primary",
    "peerEndpoints": [
      "https://10.0.1.11:5201"
    ],
    "writeQuorum": 1
  },
  "redis": {
    "enabled": true,
    "connectionString": "10.0.1.20:6379,password=YOUR_REDIS_PASSWORD_HERE,ssl=false",
    "keyPrefix": "rppam:",
    "tlsEnabled": false
  },
  "cluster": {
    "leaderLockTtlSeconds": 30,
    "leaderRenewalIntervalSeconds": 10,
    "outboxPollIntervalSeconds": 5,
    "heartbeatIntervalSeconds": 5,
    "virtualIp": "",
    "vipInterface": ""
  }
}
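
The leaderLockTtlSeconds and leaderRenewalIntervalSeconds settings suggest a lease-style leader lock held in Redis. This guide does not describe the exact mechanism, so the following is only a sketch under that assumption: the leader holds a lock that expires after the TTL unless it is renewed, and a standby can take over once the lock has expired. The `LeaseLock` class is a hypothetical in-memory stand-in for the Redis key, not RP-PAM code.

```python
class LeaseLock:
    """In-memory stand-in for a Redis key with a TTL (illustration only)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire_or_renew(self, node: str, now: float) -> bool:
        """Grant the lease if it is free, expired, or already held by node."""
        if self.holder is None or now >= self.expires_at or self.holder == node:
            self.holder = node
            self.expires_at = now + self.ttl
            return True
        return False


lock = LeaseLock(ttl_seconds=30)                        # leaderLockTtlSeconds
assert lock.try_acquire_or_renew("node1", now=0)        # node1 becomes leader
assert not lock.try_acquire_or_renew("node2", now=5)    # lease is still held
assert lock.try_acquire_or_renew("node1", now=10)       # renewal after leaderRenewalIntervalSeconds
# Suppose node1 fails right after t=10: its lease runs until t=40.
assert not lock.try_acquire_or_renew("node2", now=39)
assert lock.try_acquire_or_renew("node2", now=40)       # node2 takes over
print("leader is now", lock.holder)
```

Under these assumptions, the longest a standby waits before taking over is one TTL after the last successful renewal, and keeping the renewal interval well below the TTL lets the leader miss a renewal without losing the lease.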

Using local-sync Database Mode

If using local-sync, change the database and databaseSync sections:

{
  "node": {
    "nodeName": "node1",
    "grpcPortBase": 7001
  },
  "database": {
    "globalConnectionString": "Server=127.0.0.1,1433;Database=RpPam_Global;User Id=rppam_svc;Password=YOUR_DB_PASSWORD;Encrypt=True;TrustServerCertificate=True",
    "dbType": "mssql"
  },
  "databaseSync": {
    "mode": "local-sync",
    "nodeRole": "primary",
    "peerEndpoints": [
      "https://10.0.1.11:5201"
    ],
    "writeQuorum": 2,
    "backup": {
      "enabled": true,
      "schedule": "0 2 * * *",
      "retentionDays": 30
    }
  },
  "redis": {
    "enabled": true,
    "connectionString": "10.0.1.20:6379,password=YOUR_REDIS_PASSWORD_HERE,ssl=false",
    "keyPrefix": "rppam:",
    "tlsEnabled": false
  },
  "cluster": {
    "leaderLockTtlSeconds": 30,
    "leaderRenewalIntervalSeconds": 10,
    "outboxPollIntervalSeconds": 5,
    "heartbeatIntervalSeconds": 5,
    "virtualIp": "",
    "vipInterface": ""
  }
}

Note: For a 2-node local-sync cluster, writeQuorum must be 2 (both nodes). This ensures no data is lost if either node fails. For larger clusters, set writeQuorum to a majority of the nodes (N/2+1, with N/2 rounded down): 2 for a 3-node cluster, 3 for a 5-node cluster.


Step 5: Configure Node 2 (Standby)

Edit rppam.config on Node 2.

Using external-shared Database Mode

{
  "node": {
    "nodeName": "node2",
    "grpcPortBase": 7001
  },
  "database": {
    "globalConnectionString": "Server=10.0.1.30,1433;Database=RpPam_Global;User Id=rppam_svc;Password=YOUR_DB_PASSWORD;Encrypt=True;TrustServerCertificate=False",
    "dbType": "mssql"
  },
  "databaseSync": {
    "mode": "external-shared",
    "nodeRole": "standby",
    "peerEndpoints": [
      "https://10.0.1.10:5201"
    ],
    "writeQuorum": 1
  },
  "redis": {
    "enabled": true,
    "connectionString": "10.0.1.20:6379,password=YOUR_REDIS_PASSWORD_HERE,ssl=false",
    "keyPrefix": "rppam:",
    "tlsEnabled": false
  },
  "cluster": {
    "leaderLockTtlSeconds": 30,
    "leaderRenewalIntervalSeconds": 10,
    "outboxPollIntervalSeconds": 5,
    "heartbeatIntervalSeconds": 5,
    "virtualIp": "",
    "vipInterface": ""
  }
}

Using local-sync Database Mode

{
  "node": {
    "nodeName": "node2",
    "grpcPortBase": 7001
  },
  "database": {
    "globalConnectionString": "Server=127.0.0.1,1433;Database=RpPam_Global;User Id=rppam_svc;Password=YOUR_DB_PASSWORD;Encrypt=True;TrustServerCertificate=True",
    "dbType": "mssql"
  },
  "databaseSync": {
    "mode": "local-sync",
    "nodeRole": "standby",
    "peerEndpoints": [
      "https://10.0.1.10:5201"
    ],
    "writeQuorum": 2,
    "backup": {
      "enabled": true,
      "schedule": "0 2 * * *",
      "retentionDays": 30
    }
  },
  "redis": {
    "enabled": true,
    "connectionString": "10.0.1.20:6379,password=YOUR_REDIS_PASSWORD_HERE,ssl=false",
    "keyPrefix": "rppam:",
    "tlsEnabled": false
  },
  "cluster": {
    "leaderLockTtlSeconds": 30,
    "leaderRenewalIntervalSeconds": 10,
    "outboxPollIntervalSeconds": 5,
    "heartbeatIntervalSeconds": 5,
    "virtualIp": "",
    "vipInterface": ""
  }
}

Key differences from Node 1:

- nodeName is "node2"
- nodeRole is "standby"
- peerEndpoints points to Node 1's address (10.0.1.10) instead of Node 2's
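
Because the two node configurations differ in only a few fields, copy-and-paste mistakes are easy to make, for example leaving a node's own address in peerEndpoints. The following is a minimal sketch of a pre-start sanity check. The function `check_sync_config` is hypothetical, and the rules it applies (known modes and roles, no self-reference, writeQuorum no larger than the number of nodes) are inferred from this guide rather than taken from RP-PAM's own validation.

```python
import json
from urllib.parse import urlparse

VALID_ROLES = {"primary", "standby", "dr"}
VALID_MODES = {"external-shared", "local-sync"}


def check_sync_config(config: dict, own_ip: str) -> list[str]:
    """Return a list of problems found in the databaseSync section."""
    problems = []
    sync = config.get("databaseSync", {})
    if sync.get("mode") not in VALID_MODES:
        problems.append(f"unknown mode: {sync.get('mode')!r}")
    if sync.get("nodeRole") not in VALID_ROLES:
        problems.append(f"unknown nodeRole: {sync.get('nodeRole')!r}")
    peers = sync.get("peerEndpoints", [])
    if not peers:
        problems.append("peerEndpoints is empty")
    for endpoint in peers:
        if urlparse(endpoint).hostname == own_ip:
            problems.append(f"peerEndpoints lists this node itself: {endpoint}")
    quorum = sync.get("writeQuorum", 0)
    # The cluster has len(peers) + 1 nodes, counting this one.
    if not 1 <= quorum <= len(peers) + 1:
        problems.append(f"writeQuorum {quorum} is outside 1..{len(peers) + 1}")
    return problems


# Node 2 config that was copied from Node 1 without updating peerEndpoints.
node2 = json.loads("""
{"databaseSync": {"mode": "local-sync", "nodeRole": "standby",
                  "peerEndpoints": ["https://10.0.1.11:5201"], "writeQuorum": 2}}
""")
print(check_sync_config(node2, own_ip="10.0.1.11"))
```

In practice the dictionary would come from `json.load` on the node's rppam.config file, and an empty result means none of these particular mistakes were found.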


Step 6: Start the Cluster

Start the primary node first, then the standby.

Windows PowerShell

# On Node 1 (primary) — start first
Start-Service RpPam

# Wait for Node 1 to become healthy
do {
    Start-Sleep -Seconds 2
    try {
        $health = Invoke-RestMethod -Uri "http://localhost:7101/system/health/ping"
    } catch {
        $health = $null   # Service not answering yet; retry
    }
} while ($health.status -ne "healthy")

Write-Host "Node 1 is healthy"

# On Node 2 (standby) — start second
Start-Service RpPam

Linux

# On Node 1 (primary) — start first
sudo systemctl start rppam

# Wait for Node 1 to become healthy
until curl -sf http://localhost:7101/system/health/ping | grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'; do
    sleep 2
done
echo "Node 1 is healthy"

# On Node 2 (standby) — start second
sudo systemctl start rppam
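
The wait loops above poll indefinitely, which can hang an automated rollout if Node 1 never becomes healthy. As a hedged alternative, the following Python sketch polls with a deadline. The helper names are hypothetical, the 120-second default is an arbitrary choice, and the response shape (a JSON object with a status field equal to "healthy") is assumed from the loops in this step.

```python
import json
import time
import urllib.request
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 120.0,
               interval: float = 2.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() every interval seconds; return False once timeout elapses."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def node_is_healthy(url: str = "http://localhost:7101/system/health/ping") -> bool:
    """Return True if the health endpoint answers with status "healthy"."""
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            data = json.load(resp)
    except (OSError, ValueError):
        return False  # not listening yet, HTTP error, or non-JSON body
    return isinstance(data, dict) and data.get("status") == "healthy"


# Example usage on Node 1 before starting Node 2:
#   if not wait_until(node_is_healthy):
#       raise SystemExit("Node 1 did not become healthy in time")
```

The clock and sleep parameters exist only so the loop can be exercised without real delays; normal callers can leave them at their defaults.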

Step 7: Verify Cluster Status

Once both nodes are running, verify the cluster is healthy.

Windows PowerShell

# Check cluster status from either node.
# $adminJwt must already hold a valid admin JWT.
$cluster = Invoke-RestMethod -Uri "http://localhost:7101/system/health/cluster" `
  -Headers @{ Authorization = "Bearer $adminJwt" }

$cluster | ConvertTo-Json -Depth 5

Expected response:

{
  "clusterHealthy": true,
  "leaderNode": "node1",
  "nodes": [
    {
      "nodeName": "node1",
      "role": "primary",
      "status": "healthy",
      "lastHeartbeat": "2026-04-06T10:30:00Z",
      "databaseMode": "external-shared"
    },
    {
      "nodeName": "node2",
      "role": "standby",
      "status": "healthy",
      "lastHeartbeat": "2026-04-06T10:30:02Z",
      "databaseMode": "external-shared"
    }
  ],
  "redisConnected": true,
  "writeQuorum": 1,
  "quorumMet": true
}

Linux (curl)

curl -s http://localhost:7101/system/health/cluster \
  -H "Authorization: Bearer $ADMIN_JWT" | jq .

What to Check

| Field | Expected Value | Problem if Wrong |
|---|---|---|
| clusterHealthy | true | One or more nodes unhealthy; check logs |
| leaderNode | "node1" | If empty, leader election has not completed; check Redis |
| Both nodes status | "healthy" | If "unreachable", check firewall rules between nodes |
| redisConnected | true | If false, Redis is down or unreachable |
| quorumMet | true | If false, not enough nodes are online to satisfy writeQuorum |
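
These checks can also be automated. The following is a minimal sketch that takes the parsed JSON from /system/health/cluster and returns a list of problems, assuming the response has the shape shown in the expected response above. The function name `cluster_problems` is hypothetical.

```python
def cluster_problems(status: dict, expected_nodes: int = 2) -> list[str]:
    """Return human-readable problems found in a cluster health response."""
    problems = []
    if not status.get("clusterHealthy"):
        problems.append("clusterHealthy is not true: check node logs")
    if not status.get("leaderNode"):
        problems.append("no leader elected: check Redis")
    if not status.get("redisConnected"):
        problems.append("Redis is down or unreachable")
    if not status.get("quorumMet"):
        problems.append("not enough nodes online to satisfy writeQuorum")
    nodes = status.get("nodes", [])
    if len(nodes) != expected_nodes:
        problems.append(f"expected {expected_nodes} nodes, found {len(nodes)}")
    for node in nodes:
        if node.get("status") != "healthy":
            problems.append(f"{node.get('nodeName')} is {node.get('status')}")
    return problems


# A degraded example: node2 cannot be reached, so quorum is lost.
degraded = {
    "clusterHealthy": False,
    "leaderNode": "node1",
    "nodes": [
        {"nodeName": "node1", "role": "primary", "status": "healthy"},
        {"nodeName": "node2", "role": "standby", "status": "unreachable"},
    ],
    "redisConnected": True,
    "writeQuorum": 2,
    "quorumMet": False,
}
for problem in cluster_problems(degraded):
    print(problem)
```

An empty list means every field in the table has its expected value; a monitoring job could alert on any non-empty result.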

Step 8: Configure a Load Balancer

In production, users should connect through a load balancer rather than directly to individual nodes. This ensures automatic failover if the primary becomes unavailable.

nginx Example

Install nginx on a separate server or on one of the RP-PAM nodes.

upstream rppam_cluster {
    # Primary node — preferred
    server 10.0.1.10:7101 weight=5;
    # Standby node — failover
    server 10.0.1.11:7101 backup;
}

server {
    listen 443 ssl;
    server_name pam.corp.local;

    ssl_certificate     /etc/nginx/ssl/pam.corp.local.crt;
    ssl_certificate_key /etc/nginx/ssl/pam.corp.local.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;

    # Expose the health endpoint through the proxy
    # (open-source nginx performs passive health checks only)
    location /system/health/ping {
        proxy_pass http://rppam_cluster;
        proxy_connect_timeout 3s;
        proxy_read_timeout 5s;
    }

    location / {
        proxy_pass http://rppam_cluster;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (for live session monitoring)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_connect_timeout 10s;
        proxy_read_timeout 300s;
        proxy_send_timeout 60s;
    }
}

After saving this configuration:

sudo nginx -t          # Test configuration
sudo systemctl reload nginx

Users now connect to https://pam.corp.local instead of individual node addresses.


Adding More Nodes (3+)

Adding a third (or fourth, fifth, etc.) node follows the same pattern as Node 2.

Step-by-Step

  1. Install RP-PAM on the new server using the same version as the existing nodes.

  2. Edit rppam.config on the new node. Use nodeRole: "standby" and list all other nodes in peerEndpoints:

    {
      "node": {
        "nodeName": "node3",
        "grpcPortBase": 7001
      },
      "databaseSync": {
        "mode": "external-shared",
        "nodeRole": "standby",
        "peerEndpoints": [
          "https://10.0.1.10:5201",
          "https://10.0.1.11:5201"
        ],
        "writeQuorum": 1
      }
    }
    
  3. Update existing nodes. Add the new node's endpoint to peerEndpoints on every existing node. Restart each node for the change to take effect (rolling restart — one at a time).

    On Node 1 and Node 2, add "https://10.0.1.12:5201" (node3, assumed here to be at 10.0.1.12) to their peerEndpoints arrays.

  4. Update the load balancer to include the new node.

  5. Start the new node:

    sudo systemctl start rppam
    

  6. Verify cluster status — all three nodes should appear:

    curl -s http://localhost:7101/system/health/cluster \
      -H "Authorization: Bearer $ADMIN_JWT" | jq .
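
As the cluster grows, keeping peerEndpoints consistent by hand becomes error-prone, since every node must list every other node and never itself. The following sketch derives the arrays from a single inventory. The function name `peer_endpoints` and the address 10.0.1.12 for node3 are assumptions made for illustration.

```python
def peer_endpoints(all_nodes: dict[str, str], port: int = 5201) -> dict[str, list[str]]:
    """Map each node name to the endpoints of every other node."""
    return {
        name: [f"https://{ip}:{port}" for other, ip in all_nodes.items() if other != name]
        for name in all_nodes
    }


nodes = {"node1": "10.0.1.10", "node2": "10.0.1.11", "node3": "10.0.1.12"}
for name, peers in peer_endpoints(nodes).items():
    print(name, peers)
```

Each resulting list can be pasted into the peerEndpoints array of the matching node's rppam.config.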
    

Write Quorum for 3+ Nodes

| Cluster Size | Recommended writeQuorum | Meaning |
|---|---|---|
| 2 nodes | 2 (local-sync) or 1 (external-shared) | Both nodes must confirm (local-sync); or single DB handles it (external-shared) |
| 3 nodes | 2 | Majority. Cluster survives 1 node failure. |
| 5 nodes | 3 | Majority. Cluster survives 2 node failures. |

For external-shared mode, writeQuorum controls cluster-level acknowledgement only. The external database handles its own consistency.


DR Node (Disaster Recovery)

A DR node is a special standby node located in a separate data centre or region. It receives data via asynchronous replication (not synchronous quorum), so it does not add latency to normal writes.

Key Differences from a Standard Standby

| Aspect | Standard Standby | DR Node |
|---|---|---|
| Replication | Synchronous (participates in quorum) | Asynchronous (does not participate in quorum) |
| Latency impact | Must be low-latency to primary | Can be high-latency (cross-region) |
| Automatic failover | Yes, can be promoted to leader | No, must be manually promoted |
| Purpose | Local HA | Disaster recovery for regional/site failure |

Configuring a DR Node

{
  "node": {
    "nodeName": "dr-node",
    "grpcPortBase": 7001
  },
  "databaseSync": {
    "mode": "local-sync",
    "nodeRole": "dr",
    "peerEndpoints": [
      "https://10.0.1.10:5201",
      "https://10.0.1.11:5201"
    ],
    "writeQuorum": 2
  }
}

The nodeRole: "dr" setting tells RP-PAM that this node:

- Receives replicated data asynchronously
- Does not participate in leader election
- Does not count toward write quorum
- Can be manually promoted to primary in a disaster
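
To make the effect on quorum concrete, the following sketch counts only healthy non-DR nodes when deciding whether a write can be acknowledged. It illustrates the rule stated above and is not RP-PAM's implementation; the function name `quorum_met` and the node dictionaries (modelled on the cluster status response) are assumptions.

```python
def quorum_met(nodes: list[dict], write_quorum: int) -> bool:
    """True if enough healthy non-DR nodes are online to acknowledge a write."""
    voters = [n for n in nodes if n["role"] != "dr" and n["status"] == "healthy"]
    return len(voters) >= write_quorum


cluster = [
    {"nodeName": "node1", "role": "primary", "status": "healthy"},
    {"nodeName": "node2", "role": "standby", "status": "unreachable"},
    {"nodeName": "dr-node", "role": "dr", "status": "healthy"},
]
# The healthy DR node does not make up for the unreachable standby.
print(quorum_met(cluster, write_quorum=2))
```

In other words, adding a DR node improves recoverability after a site loss but does not change how many local nodes must stay online for writes to commit.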

Promoting a DR Node

If the primary data centre is completely lost:

Windows PowerShell:

& "C:\Program Files\Ravenphyre\RP-PAM\tools\rppam-cluster.exe" promote-dr `
  --node dr-node `
  --confirm

Linux:

sudo /opt/rppam/tools/rppam-cluster promote-dr \
  --node dr-node \
  --confirm

After promotion:

  1. The DR node becomes the new primary.
  2. Update DNS or load balancer to point to the DR node.
  3. Verify cluster status and data integrity.
  4. When the original data centre recovers, add the old nodes back as standbys.

Warning: There may be a small amount of data loss when promoting a DR node, limited to writes that were committed on the primary but not yet replicated to the DR node. The promotion tool reports the replication lag at the time of promotion.


Troubleshooting

| Problem | Likely Cause | Solution |
|---|---|---|
| "Leader election failed" | Redis unreachable | Check Redis is running and both nodes can connect |
| Node shows "unreachable" | Firewall blocking port 5201 or 7001-7012 | Open inter-node ports (see Network Requirements) |
| "Quorum not met" | Too few nodes online | Start additional nodes; check writeQuorum setting |
| Standby never syncs (local-sync) | peerEndpoints misconfigured | Verify each node lists the other nodes' addresses, not its own |
| Split-brain warning | Redis connectivity issue | Check that all nodes can reach the same Redis instance |
| High write latency (local-sync) | Cross-network replication | Ensure nodes are on the same low-latency network, or switch to external-shared |


RP-PAM v1.0.0 — Copyright 2026 Ravenphyre. All rights reserved.