AI Setup with Ollama (Self-Hosted)

Section: AI Assistant | Article 31
Audience: System Administrators
Last Updated: 2026-04-07


Overview

Ollama is a self-hosted AI runtime that runs large language models locally on your own hardware. When you use Ollama with RP-PAM, no data leaves your network. All AI processing -- embeddings, completions, risk scoring, and anomaly detection -- happens entirely on infrastructure you control.

This makes Ollama the recommended choice for:

  • Air-gapped environments
  • Highly regulated industries (healthcare, finance, government)
  • Organisations with strict data sovereignty requirements
  • Environments where cloud API costs are a concern


Prerequisites

| Requirement | Detail |
|---|---|
| RP-PAM licence | Enterprise or MSP tier |
| Ollama host | A server (physical or virtual) with sufficient hardware (see below) |
| Network access | The RP-PAM server must be able to reach the Ollama host on port 11434 |

Hardware Requirements for the Ollama Host

The Ollama host runs the AI models in memory. Resource requirements depend on the models you choose.

| Configuration | CPU | RAM | GPU | Storage | Use Case |
|---|---|---|---|---|---|
| Minimum | 8 cores | 16 GB | None (CPU-only) | 20 GB | Small deployments, basic NL queries, slower responses |
| Recommended | 16 cores | 32 GB | NVIDIA GPU with 8+ GB VRAM | 50 GB | Standard deployments, fast responses |
| Production | 16+ cores | 64 GB | NVIDIA GPU with 16+ GB VRAM | 100 GB | Large deployments, multiple concurrent users |

Note: Ollama can run on CPU only (no GPU required), but responses will be significantly slower. A GPU with CUDA support dramatically improves performance.

GPU Support

| GPU Vendor | Supported? | Notes |
|---|---|---|
| NVIDIA | Yes | CUDA 11.8+ required; install NVIDIA Container Toolkit for Docker |
| AMD | Partial | ROCm support is experimental |
| Intel | No | Not currently supported |
| Apple Silicon | Yes | Metal acceleration on macOS |

Step 1: Install Ollama

Linux

# One-line installer
curl -fsSL https://ollama.com/install.sh | sh

# Verify installation
ollama --version
# Expected: ollama version 0.x.x

# Start the service
sudo systemctl enable ollama
sudo systemctl start ollama

Windows

  1. Download the installer from ollama.com/download.
  2. Run the installer and follow the prompts.
  3. Ollama starts automatically as a background service.

Verify in PowerShell:

ollama --version

Docker (Either Platform)

Linux:

docker run -d \
  --name ollama \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  --restart unless-stopped \
  ollama/ollama

PowerShell:

docker run -d `
  --name ollama `
  -p 11434:11434 `
  -v ollama-data:/root/.ollama `
  --restart unless-stopped `
  ollama/ollama

With GPU support (NVIDIA):

docker run -d \
  --name ollama \
  --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  --restart unless-stopped \
  ollama/ollama


Step 2: Pull the Required Models

RP-PAM needs two models: one for completions and one for embeddings.

Pull the Completion Model

ollama pull llama3

This downloads the Llama 3 8B model (approximately 4.7 GB). The download happens once; the model is cached locally.

Pull the Embedding Model

ollama pull nomic-embed-text

This downloads the nomic-embed-text model (approximately 274 MB).

Verify Models Are Available

ollama list

Expected output:

NAME                 ID            SIZE    MODIFIED
llama3:latest        a6990ed6be41  4.7 GB  2 minutes ago
nomic-embed-text     0a109f422b47  274 MB  1 minute ago

Alternative Models

| Model | Type | Size | Notes |
|---|---|---|---|
| llama3 | Completion | 4.7 GB | Recommended default; good balance of quality and speed |
| llama3:70b | Completion | 40 GB | Higher quality; requires 64+ GB RAM or large GPU |
| mistral | Completion | 4.1 GB | Alternative to Llama 3; slightly different strengths |
| phi3 | Completion | 2.3 GB | Smaller, faster; good for resource-constrained environments |
| nomic-embed-text | Embedding | 274 MB | Recommended; 768 dimensions |
| mxbai-embed-large | Embedding | 670 MB | Higher quality; 1024 dimensions |

Step 3: Configure Ollama for Network Access

By default, Ollama listens on 127.0.0.1:11434 (localhost only). If the Ollama host is a different server than your RP-PAM node, you must configure it to listen on all interfaces.

Linux

Edit the Ollama service configuration:

sudo systemctl edit ollama

Add:

[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

Restart:

sudo systemctl restart ollama

Windows

Set the environment variable:

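# Run from an elevated PowerShell session; the Machine scope requires administrator rights
[Environment]::SetEnvironmentVariable("OLLAMA_HOST", "0.0.0.0:11434", "Machine")
# Restart Ollama afterwards so that it picks up the new value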

Docker

If using Docker, the -p 11434:11434 flag already exposes the port to all interfaces.

Verify Remote Access

From your RP-PAM server:

Linux:

curl -s http://ollama-host:11434/api/tags | jq .

PowerShell:

Invoke-RestMethod -Uri "http://ollama-host:11434/api/tags"

You should see the list of available models.
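
The same check can be scripted so that it also confirms both required models are present. The sketch below is a minimal example, assuming Python 3 is available on the RP-PAM server. The helper names (fetch_tags, missing_models) and the ollama-host hostname are illustrative, and the code relies on the /api/tags response carrying a models array whose entries have a name field.

```python
import json
import urllib.request


def fetch_tags(base_url, timeout=5.0):
    """Return the parsed JSON body of GET <base_url>/api/tags."""
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def missing_models(tags, required):
    """List the required models that the Ollama host does not report.

    A name without an explicit tag (e.g. "llama3") is compared as
    "<name>:latest", which is how Ollama resolves untagged names.
    """
    def normalise(name):
        return name if ":" in name else f"{name}:latest"

    installed = {normalise(m.get("name", "")) for m in tags.get("models", [])}
    return [name for name in required if normalise(name) not in installed]


# Example usage ("ollama-host" is a placeholder for your host name or IP):
#   tags = fetch_tags("http://ollama-host:11434")
#   print(missing_models(tags, ["llama3", "nomic-embed-text"]))
```

An empty list means both models from Step 2 are visible to remote clients. Any names returned still need to be pulled on the Ollama host.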


Step 4: Configure rppam.config

Edit the AI section in rppam.config on your RP-PAM node(s).

Windows path: C:\ProgramData\Ravenphyre\RP-PAM\rppam.config
Linux path: /etc/rppam/rppam.config

{
  "ai": {
    "enabled": true,
    "provider": "ollama",
    "ollamaBaseUrl": "http://10.0.1.40:11434",
    "embeddingModel": "nomic-embed-text",
    "completionModel": "llama3",
    "embeddingDimension": 768,
    "maxTokens": 4096,
    "temperature": 0.3,
    "riskScoring": {
      "enabled": true,
      "threshold": 60
    },
    "anomalyDetection": {
      "enabled": true,
      "lookbackDays": 90
    }
  }
}

Configuration Fields

| Field | Description | Default |
|---|---|---|
| provider | Set to "ollama" | (required) |
| ollamaBaseUrl | URL of the Ollama API | http://localhost:11434 |
| embeddingModel | Ollama model name for embeddings | nomic-embed-text |
| completionModel | Ollama model name for completions | llama3 |
| embeddingDimension | Vector dimension (must match the embedding model) | 768 |
| maxTokens | Maximum tokens in a completion response | 4096 |
| temperature | Response creativity (0.0-1.0) | 0.3 |

Embedding Dimensions by Model

| Model | Dimension | Set embeddingDimension to |
|---|---|---|
| nomic-embed-text | 768 | 768 |
| mxbai-embed-large | 1024 | 1024 |

Important: If you change the embedding model after initial setup, the existing embeddings in the database are incompatible. You must rebuild the embedding index. See Rebuilding Embeddings below.
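
A mismatch between embeddingModel and embeddingDimension can be caught before RP-PAM is restarted. The following is a minimal sketch, assuming rppam.config parses as plain JSON with the ai section shown above. The check_embedding_dimension and load_config helpers are hypothetical, and the lookup table contains only the two embedding models listed in this article.

```python
import json

# Dimensions taken from the "Embedding Dimensions by Model" table in this article.
KNOWN_DIMENSIONS = {
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
}


def check_embedding_dimension(config):
    """Return a list of problems found in the "ai" section (empty if none)."""
    ai = config.get("ai", {})
    model = ai.get("embeddingModel", "nomic-embed-text")  # documented default
    configured = ai.get("embeddingDimension", 768)  # documented default
    base_name = model.split(":", 1)[0]  # ignore a tag such as ":latest"
    expected = KNOWN_DIMENSIONS.get(base_name)
    if expected is None:
        return [f"unknown embedding model {model!r}; cannot verify dimension"]
    if configured != expected:
        return [f"embeddingDimension is {configured}, but {model} produces {expected}"]
    return []


def load_config(path):
    """Parse rppam.config, assuming the file is plain JSON."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Example usage (Linux path from this article):
#   print(check_embedding_dimension(load_config("/etc/rppam/rppam.config")))
```

The check only compares the configuration against the table; it does not query Ollama, so a model that is not in the table is reported as unverifiable rather than wrong.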


Step 5: Restart RP-PAM

Windows PowerShell:

Restart-Service RpPam

Linux:

sudo systemctl restart rppam


Step 6: Verify AI Is Working

Check Module Health

Linux:

curl -s http://localhost:7101/api/v1/modules \
  -H "Authorization: Bearer $ADMIN_JWT" | jq '.items[] | select(.moduleName == "ai")'

PowerShell:

$modules = Invoke-RestMethod -Uri "http://localhost:7101/api/v1/modules" `
  -Headers @{ Authorization = "Bearer $adminJwt" }
$modules.items | Where-Object { $_.moduleName -eq "ai" } | ConvertTo-Json

Expected:

{
  "moduleName": "ai",
  "status": "healthy",
  "provider": "ollama",
  "embeddingModel": "nomic-embed-text",
  "completionModel": "llama3",
  "ollamaBaseUrl": "http://10.0.1.40:11434"
}
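
For scripted health checks, the same response can be evaluated programmatically. This sketch assumes the response shape shown above (an items array whose entries carry moduleName and status). The fetch_modules and ai_module_status names are hypothetical, and the token is read from an ADMIN_JWT environment variable, mirroring the curl example.

```python
import json
import os
import urllib.request


def fetch_modules(base_url="http://localhost:7101", token=None, timeout=10.0):
    """GET /api/v1/modules with a bearer token and return the parsed JSON."""
    token = token or os.environ["ADMIN_JWT"]
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/modules",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def ai_module_status(modules):
    """Return the AI module's status string, or None if the module is absent."""
    for item in modules.get("items", []):
        if item.get("moduleName") == "ai":
            return item.get("status")
    return None


# Example usage:
#   status = ai_module_status(fetch_modules())
#   print("AI module:", status)
```

A return value of None suggests the AI module is not loaded at all, which is a different problem from an "unhealthy" status.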

Test a Query

Linux:

curl -s -X POST http://localhost:7101/api/v1/ai/query \
  -H "Authorization: Bearer $ADMIN_JWT" \
  -H "Content-Type: application/json" \
  -d '{ "query": "What access requests were submitted today?" }' | jq .

PowerShell:

$response = Invoke-RestMethod -Uri "http://localhost:7101/api/v1/ai/query" `
  -Method Post `
  -Headers @{ Authorization = "Bearer $adminJwt" } `
  -ContentType "application/json" `
  -Body '{ "query": "What access requests were submitted today?" }'
$response | ConvertTo-Json

Note: The first query after a cold start may be slower as Ollama loads the model into memory. Subsequent queries are faster.
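
If the cold-start delay matters, the model can be loaded before the first user query arrives. The sketch below relies on Ollama's documented behaviour that an /api/generate request naming a model but carrying no prompt loads that model, and that keep_alive controls how long it stays in memory. The helper names and the 30-minute value are illustrative, and whether RP-PAM sets its own keep_alive on later requests is not covered here.

```python
import json
import urllib.request


def preload_payload(model, keep_alive="30m"):
    """Build a prompt-less /api/generate body, which asks Ollama to load the model."""
    return {"model": model, "keep_alive": keep_alive, "stream": False}


def preload_model(base_url, model, keep_alive="30m", timeout=300.0):
    """POST the preload request and return Ollama's JSON reply."""
    body = json.dumps(preload_payload(model, keep_alive)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Example usage (address taken from the sample rppam.config):
#   preload_model("http://10.0.1.40:11434", "llama3")
```

The generous timeout allows for a large model being read from disk on a CPU-only host.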


Privacy Note

When using Ollama as your AI provider:

  • No data is sent to any external service. All prompts and responses are processed locally.
  • No API keys are needed. There is no cloud account to manage.
  • No internet connection is required after the initial model download. You can even download models on a connected machine and copy them to an air-gapped host.
  • Audit data stays local. Embedding vectors generated from access patterns are stored in your RP-PAM database, never transmitted externally.

This makes Ollama the only AI provider option suitable for fully air-gapped deployments.
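
The offline transfer mentioned above can be done by archiving the model store on the connected machine and unpacking it on the air-gapped host. This is a sketch only. The store location is an assumption (commonly ~/.ollama/models for a user install, /usr/share/ollama/.ollama/models for the Linux service, or the ollama-data volume under Docker), so confirm it on your system. The function names are hypothetical.

```python
import tarfile
from pathlib import Path


def archive_models(models_dir, archive_path):
    """Pack the model store directory into an uncompressed tarball.

    Compression is skipped because model blobs are already compact and large.
    """
    with tarfile.open(archive_path, "w") as tar:
        tar.add(Path(models_dir), arcname="models")
    return Path(archive_path)


def restore_models(archive_path, ollama_home):
    """Unpack the tarball so that <ollama_home>/models holds the copied store.

    Only use this on archives you created yourself with archive_models.
    """
    with tarfile.open(archive_path, "r") as tar:
        tar.extractall(ollama_home)


# Example usage (paths are assumptions; adjust to your installation):
#   archive_models(Path.home() / ".ollama" / "models", "ollama-models.tar")
#   restore_models("ollama-models.tar", "/usr/share/ollama/.ollama")
```

After restoring, check that the files are readable by the account Ollama runs under, restart Ollama, and run ollama list to confirm the models appear.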


Rebuilding Embeddings

If you change the embedding model, existing embeddings become incompatible. Rebuild them:

Linux:

sudo /opt/rppam/tools/rppam-migrate rebuild-embeddings

PowerShell:

& "C:\Program Files\Ravenphyre\RP-PAM\tools\rppam-migrate.exe" rebuild-embeddings

This re-processes all historical data through the new embedding model. Duration depends on the amount of historical data and the hardware available.


Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| "status": "unhealthy" for AI module | Cannot reach Ollama | Verify Ollama is running (ollama list); check network connectivity to ollamaBaseUrl |
| Connection refused | Ollama listening on localhost only | Set OLLAMA_HOST=0.0.0.0:11434 and restart Ollama |
| Very slow responses (30+ seconds) | No GPU; model running on CPU | Add a GPU, or switch to a smaller model (phi3) |
| "model not found" | Model not pulled | Run ollama pull <model-name> on the Ollama host |
| Out of memory | Model too large for available RAM/VRAM | Use a smaller model; add more RAM/VRAM |
| "embeddingDimension mismatch" | Dimension config does not match model | Set embeddingDimension to 768 for nomic-embed-text or 1024 for mxbai-embed-large |
| First query slow, subsequent queries fast | Normal -- model loading into memory | Expected behaviour; no action needed |
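
When the AI module reports as unhealthy, a plain TCP probe from the RP-PAM server helps separate the network-level causes listed above. The sketch below is a minimal example using only the Python standard library. The probe_port name and the returned labels are illustrative. A "refused" result is consistent with Ollama being stopped or bound to localhost only, while a "timeout" more often points at a firewall or routing issue.

```python
import socket


def probe_port(host, port=11434, timeout=3.0):
    """Classify a TCP connection attempt to the Ollama port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"      # host reachable, nothing listening on this port
    except socket.timeout:
        return "timeout"      # no answer within the timeout
    except socket.gaierror:
        return "dns-failure"  # host name could not be resolved
    except OSError as exc:
        return f"error: {exc}"


# Example usage ("ollama-host" is a placeholder for your host name or IP):
#   print(probe_port("ollama-host"))
```

An "open" result only shows that something accepts connections on the port; the /api/tags request from Step 3 is still needed to confirm it is Ollama.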

RP-PAM v1.0.0 -- Copyright 2026 Ravenphyre. All rights reserved.