AI Setup with Ollama (Self-Hosted)¶

Section: AI Assistant | Article 31
Audience: System Administrators
Last Updated: 2026-04-07

Overview¶

Ollama is a self-hosted AI runtime that runs large language models locally on your own hardware. When you use Ollama with RP-PAM, no data leaves your network. All AI processing -- embeddings, completions, risk scoring, and anomaly detection -- happens entirely on infrastructure you control.

This makes Ollama the recommended choice for:

- Air-gapped environments
- Highly regulated industries (healthcare, finance, government)
- Organisations with strict data sovereignty requirements
- Environments where cloud API costs are a concern

Prerequisites¶

Requirement	Detail
RP-PAM licence	Enterprise or MSP tier
Ollama host	A server (physical or virtual) with sufficient hardware (see below)
Network access	The RP-PAM server must be able to reach the Ollama host on port 11434

Hardware Requirements for the Ollama Host¶

The Ollama host runs the AI models in memory. Resource requirements depend on the models you choose.

Configuration	CPU	RAM	GPU	Storage	Use Case
Minimum	8 cores	16 GB	None (CPU-only)	20 GB	Small deployments, basic NL queries, slower responses
Recommended	16 cores	32 GB	NVIDIA GPU with 8+ GB VRAM	50 GB	Standard deployments, fast responses
Production	16+ cores	64 GB	NVIDIA GPU with 16+ GB VRAM	100 GB	Large deployments, multiple concurrent users

Note: Ollama can run on CPU only (no GPU required), but responses will be significantly slower and can take 30 seconds or more. A GPU with CUDA support dramatically improves performance.

GPU Support¶

GPU Vendor	Supported?	Notes
NVIDIA	Yes	CUDA 11.8+ required; install NVIDIA Container Toolkit for Docker
AMD	Partial	ROCm support is experimental
Intel	No	Not currently supported
Apple Silicon	Yes	Metal acceleration on macOS

Step 1: Install Ollama¶

Linux¶

# One-line installer
curl -fsSL https://ollama.com/install.sh | sh

# Verify installation
ollama --version
# Expected: ollama version 0.x.x

# Start the service
sudo systemctl enable ollama
sudo systemctl start ollama

Windows¶

1. Download the installer from ollama.com/download.
2. Run the installer and follow the prompts.
3. Ollama starts automatically as a background service.

Verify in PowerShell:

ollama --version

Docker (Either Platform)¶

Linux:

docker run -d \
  --name ollama \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  --restart unless-stopped \
  ollama/ollama

PowerShell:

docker run -d `
  --name ollama `
  -p 11434:11434 `
  -v ollama-data:/root/.ollama `
  --restart unless-stopped `
  ollama/ollama

With GPU support (NVIDIA, Linux shell syntax; requires the NVIDIA Container Toolkit on the Docker host):

docker run -d \
  --name ollama \
  --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  --restart unless-stopped \
  ollama/ollama

Step 2: Pull the Required Models¶

RP-PAM needs two models: one for completions and one for embeddings.

Pull the Completion Model¶

ollama pull llama3

This downloads the Llama 3 8B model (approximately 4.7 GB). The download happens once; the model is cached locally.

Pull the Embedding Model¶

ollama pull nomic-embed-text

This downloads the nomic-embed-text model (approximately 274 MB).

Verify Models Are Available¶

ollama list

Expected output:

NAME                 ID            SIZE    MODIFIED
llama3:latest        a6990ed6be41  4.7 GB  2 minutes ago
nomic-embed-text     0a109f422b47  274 MB  1 minute ago
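
The same check can be scripted. The following is a minimal Python sketch and not part of RP-PAM: `missing_models` and `normalize` are hypothetical helpers that read the text printed by `ollama list` and report which required models are absent. It assumes the column layout shown in the expected output above, with the model name in the first column, and it treats a name without a tag as equivalent to `:latest`.

```python
REQUIRED_MODELS = ("llama3", "nomic-embed-text")  # the two models pulled in this step


def normalize(name: str) -> str:
    """Treat an untagged model name as the ':latest' tag (assumption)."""
    return name if ":" in name else f"{name}:latest"


def missing_models(list_output: str, required=REQUIRED_MODELS) -> list:
    """Return the required models that do not appear in `ollama list` output."""
    installed = set()
    for line in list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields:
            installed.add(normalize(fields[0]))  # first column is NAME
    return [m for m in required if normalize(m) not in installed]


# Offline demonstration using the expected output shown above
sample = """NAME                 ID            SIZE    MODIFIED
llama3:latest        a6990ed6be41  4.7 GB  2 minutes ago
nomic-embed-text     0a109f422b47  274 MB  1 minute ago
"""
print(missing_models(sample))  # prints []
```

On the Ollama host, the real output could be passed in with `subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout`. An empty list means both models are present.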

Alternative Models¶

Model	Type	Size	Notes
`llama3`	Completion	4.7 GB	Recommended default; good balance of quality and speed
`llama3:70b`	Completion	40 GB	Higher quality; requires 64+ GB RAM or large GPU
`mistral`	Completion	4.1 GB	Alternative to Llama 3; slightly different strengths
`phi3`	Completion	2.3 GB	Smaller, faster; good for resource-constrained environments
`nomic-embed-text`	Embedding	274 MB	Recommended; 768 dimensions
`mxbai-embed-large`	Embedding	670 MB	Higher quality; 1024 dimensions

Step 3: Configure Ollama for Network Access¶

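By default, Ollama listens on 127.0.0.1:11434 (localhost only). If Ollama runs on a different server from your RP-PAM node, you must configure it to listen on all interfaces. The Ollama API has no built-in authentication, so restrict port 11434 at the firewall to your RP-PAM node(s).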

Linux¶

Edit the Ollama service configuration:

sudo systemctl edit ollama

Add:

[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

Restart:

sudo systemctl restart ollama

Windows¶

Set the environment variable:

[Environment]::SetEnvironmentVariable("OLLAMA_HOST", "0.0.0.0:11434", "Machine")
# Restart Ollama so that it picks up the new machine-level variable

Docker¶

If you run Ollama in Docker, the -p 11434:11434 flag already publishes the port on all host interfaces, so no further change is needed.

Verify Remote Access¶

From your RP-PAM server:

Linux:

curl -s http://ollama-host:11434/api/tags | jq .

PowerShell:

Invoke-RestMethod -Uri "http://ollama-host:11434/api/tags"

You should see the list of available models.
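
Where `jq` is not installed, or one script must run on both Windows and Linux, the check can be written in Python. This is a sketch under assumptions: `tags_url`, `model_names`, and `fetch_model_names` are hypothetical helper names, `ollama-host` is the placeholder hostname used above, and the response is assumed to carry a `models` array whose entries have a `name` field, as in Ollama's `/api/tags` API.

```python
import json
import urllib.request


def tags_url(base_url: str) -> str:
    """Build the /api/tags URL from a base URL such as http://ollama-host:11434."""
    return base_url.rstrip("/") + "/api/tags"


def model_names(payload: dict) -> list:
    """Extract model names from a parsed /api/tags response."""
    return [m["name"] for m in payload.get("models", [])]


def fetch_model_names(base_url: str, timeout: float = 5.0) -> list:
    """Query the Ollama host; raises urllib.error.URLError if it is unreachable."""
    with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as resp:
        return model_names(json.load(resp))


# Offline demonstration with a trimmed sample response (illustrative values only)
sample = json.loads(
    '{"models": [{"name": "llama3:latest"}, {"name": "nomic-embed-text:latest"}]}'
)
print(model_names(sample))
```

From the RP-PAM server, `fetch_model_names("http://ollama-host:11434")` would perform the live request. A `URLError` there usually points to the network or to Ollama still listening on localhost only.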

Step 4: Configure rppam.config¶

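Edit the AI section in rppam.config on your RP-PAM node(s). Set ollamaBaseUrl to the address of your own Ollama host; 10.0.1.40 in the example below is a placeholder.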

Windows path: C:\ProgramData\Ravenphyre\RP-PAM\rppam.config
Linux path: /etc/rppam/rppam.config

{
  "ai": {
    "enabled": true,
    "provider": "ollama",
    "ollamaBaseUrl": "http://10.0.1.40:11434",
    "embeddingModel": "nomic-embed-text",
    "completionModel": "llama3",
    "embeddingDimension": 768,
    "maxTokens": 4096,
    "temperature": 0.3,
    "riskScoring": {
      "enabled": true,
      "threshold": 60
    },
    "anomalyDetection": {
      "enabled": true,
      "lookbackDays": 90
    }
  }
}

Configuration Fields¶

Field	Description	Default
`provider`	Set to `"ollama"`	(required)
`ollamaBaseUrl`	URL of the Ollama API	`http://localhost:11434`
`embeddingModel`	Ollama model name for embeddings	`nomic-embed-text`
`completionModel`	Ollama model name for completions	`llama3`
`embeddingDimension`	Vector dimension (must match the embedding model)	`768`
`maxTokens`	Maximum tokens in a completion response	`4096`
`temperature`	Sampling temperature (0.0-1.0); lower values give more deterministic responses	`0.3`

Embedding Dimensions by Model¶

Model	Dimension	Set `embeddingDimension` to
`nomic-embed-text`	768	`768`
`mxbai-embed-large`	1024	`1024`

Important: If you change the embedding model after initial setup, the existing embeddings in the database are incompatible. You must rebuild the embedding index. See Rebuilding Embeddings below.
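
A dimension mismatch can be caught before RP-PAM is restarted. The sketch below is illustrative and not an RP-PAM tool: `check_ai_config` is a hypothetical helper, it assumes rppam.config parses as plain JSON, and it knows only the two embedding models listed in the table above. Defaults follow the Configuration Fields table.

```python
import json

# Dimensions taken from the "Embedding Dimensions by Model" table
KNOWN_DIMENSIONS = {"nomic-embed-text": 768, "mxbai-embed-large": 1024}


def check_ai_config(config: dict) -> list:
    """Return a list of problems found in the "ai" section (empty if none)."""
    ai = config.get("ai", {})
    problems = []
    if ai.get("provider") != "ollama":
        problems.append('provider must be "ollama"')
    model = ai.get("embeddingModel", "nomic-embed-text")  # documented default
    actual = ai.get("embeddingDimension", 768)  # documented default
    expected = KNOWN_DIMENSIONS.get(model.split(":")[0])  # ignore any ":tag" suffix
    if expected is None:
        problems.append(f"unknown embedding model {model!r}; check its dimension manually")
    elif actual != expected:
        problems.append(f"embeddingDimension is {actual}, but {model} produces {expected}")
    return problems


# Offline demonstration: the model was switched but the dimension was left at 768
config = json.loads("""
{
  "ai": {
    "provider": "ollama",
    "embeddingModel": "mxbai-embed-large",
    "embeddingDimension": 768
  }
}
""")
for problem in check_ai_config(config):
    print(problem)
```

For the sample above, the sketch reports that `embeddingDimension` is 768 while `mxbai-embed-large` produces 1024. To check a real file, load it with `json.load(open(path))` using the path for your platform.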

Step 5: Restart RP-PAM¶

Windows PowerShell:

Restart-Service RpPam

Linux:

sudo systemctl restart rppam

Step 6: Verify AI Is Working¶

Check Module Health¶

Linux:

curl -s http://localhost:7101/api/v1/modules \
  -H "Authorization: Bearer $ADMIN_JWT" | jq '.items[] | select(.moduleName == "ai")'

PowerShell:

$modules = Invoke-RestMethod -Uri "http://localhost:7101/api/v1/modules" `
  -Headers @{ Authorization = "Bearer $adminJwt" }
$modules.items | Where-Object { $_.moduleName -eq "ai" } | ConvertTo-Json

Expected:

{
  "moduleName": "ai",
  "status": "healthy",
  "provider": "ollama",
  "embeddingModel": "nomic-embed-text",
  "completionModel": "llama3",
  "ollamaBaseUrl": "http://10.0.1.40:11434"
}
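
For automated checks, the same filter can be expressed in Python. This is a minimal sketch: `find_ai_module` and `ai_health_problem` are hypothetical helpers, and the response shape (an `items` array whose entries have `moduleName` and `status`) is inferred from the `jq` filter and the expected output above.

```python
def find_ai_module(modules_response: dict):
    """Python equivalent of the jq filter: .items[] | select(.moduleName == "ai")"""
    for item in modules_response.get("items", []):
        if item.get("moduleName") == "ai":
            return item
    return None


def ai_health_problem(modules_response: dict):
    """Return None when the AI module is healthy, otherwise a short description."""
    module = find_ai_module(modules_response)
    if module is None:
        return "AI module not listed in the response"
    if module.get("status") != "healthy":
        return f"AI module status is {module.get('status')!r}"
    return None


# Offline demonstration using the expected output shown above
sample = {
    "items": [
        {
            "moduleName": "ai",
            "status": "healthy",
            "provider": "ollama",
            "embeddingModel": "nomic-embed-text",
            "completionModel": "llama3",
            "ollamaBaseUrl": "http://10.0.1.40:11434",
        }
    ]
}
print(ai_health_problem(sample))  # prints None
```

A monitoring job could call `ai_health_problem` on the parsed response of `/api/v1/modules` and raise an alert on any value other than `None`.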

Test a Query¶

Linux:

curl -s -X POST http://localhost:7101/api/v1/ai/query \
  -H "Authorization: Bearer $ADMIN_JWT" \
  -H "Content-Type: application/json" \
  -d '{ "query": "What access requests were submitted today?" }' | jq .

PowerShell:

$response = Invoke-RestMethod -Uri "http://localhost:7101/api/v1/ai/query" `
  -Method Post `
  -Headers @{ Authorization = "Bearer $adminJwt" } `
  -ContentType "application/json" `
  -Body '{ "query": "What access requests were submitted today?" }'
$response | ConvertTo-Json

Note: The first query after a cold start may be slower as Ollama loads the model into memory. Subsequent queries are faster.
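
The cold-start delay can be reduced by asking Ollama to load the model ahead of time. Ollama's `/api/generate` endpoint loads a model when it receives a request that names the model but carries no prompt, and the `keep_alive` field controls how long the model stays in memory afterwards. The sketch below is not an RP-PAM feature: `build_preload_request` and `preload` are hypothetical helpers, the `30m` value is an arbitrary example, and whether RP-PAM's own requests later change the keep-alive period is not covered here.

```python
import json
import urllib.request


def build_preload_request(base_url: str, model: str,
                          keep_alive: str = "30m") -> urllib.request.Request:
    """Build a POST to /api/generate that names a model but sends no prompt."""
    body = json.dumps({"model": model, "keep_alive": keep_alive}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload(base_url: str, model: str, keep_alive: str = "30m",
            timeout: float = 120.0) -> None:
    """Send the preload request; loading a large model can take a while."""
    request = build_preload_request(base_url, model, keep_alive)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        resp.read()


# Offline demonstration: build the request without sending it
req = build_preload_request("http://10.0.1.40:11434", "llama3")
print(req.get_method(), req.full_url)
```

Calling `preload("http://10.0.1.40:11434", "llama3")` after the Ollama service starts, with your own host address, would warm the completion model before the first user query arrives.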

Privacy Note¶

When using Ollama as your AI provider:

- No data is sent to any external service. All prompts and responses are processed locally.
- No API keys are needed. There is no cloud account to manage.
- No internet connection is required after the initial model download. You can even download models on a connected machine and copy them to an air-gapped host.
- Audit data stays local. Embedding vectors generated from access patterns are stored in your RP-PAM database, never transmitted externally.

This makes Ollama the only AI provider option suitable for fully air-gapped deployments.
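
Copying models to an air-gapped host amounts to transferring Ollama's models directory. The sketch below packs that directory into a tar archive for transfer. It rests on assumptions you should confirm on your own hosts: that models live under `~/.ollama/models` for a per-user install, `/usr/share/ollama/.ollama/models` for the Linux service install, or `/root/.ollama/models` inside the Docker container, and that the extracted files must be readable by the account that runs Ollama on the destination. `archive_models` is a hypothetical helper.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_models(models_dir, archive_path) -> list:
    """Pack a models directory into a tar archive and return the member names."""
    with tarfile.open(archive_path, "w") as tar:
        tar.add(Path(models_dir), arcname="models")  # store under a fixed top-level name
    with tarfile.open(archive_path) as tar:
        return tar.getnames()


# Offline demonstration on a throwaway directory standing in for the models directory
with tempfile.TemporaryDirectory() as tmp:
    fake_models = Path(tmp) / "models"
    (fake_models / "blobs").mkdir(parents=True)
    (fake_models / "blobs" / "sha256-example").write_bytes(b"placeholder")
    names = archive_models(fake_models, Path(tmp) / "ollama-models.tar")
    print(names)
```

On the connected machine, point `archive_models` at the real models directory after running `ollama pull`. Extract the archive into the corresponding location on the air-gapped host, then confirm the result with `ollama list`.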

Rebuilding Embeddings¶

If you change the embedding model, existing embeddings become incompatible. Rebuild them:

Linux:

sudo /opt/rppam/tools/rppam-migrate rebuild-embeddings

PowerShell:

& "C:\Program Files\Ravenphyre\RP-PAM\tools\rppam-migrate.exe" rebuild-embeddings

This re-processes all historical data through the new embedding model. Duration depends on the amount of historical data and the hardware available.

Troubleshooting¶

Problem	Cause	Solution
`"status": "unhealthy"` for AI module	Cannot reach Ollama	Verify Ollama is running (`ollama list`); check network connectivity to `ollamaBaseUrl`
`Connection refused`	Ollama listening on localhost only	Set `OLLAMA_HOST=0.0.0.0:11434` and restart Ollama
Very slow responses (30+ seconds)	No GPU; model running on CPU	Add a GPU, or switch to a smaller model (`phi3`)
`"model not found"`	Model not pulled	Run `ollama pull <model-name>` on the Ollama host
Out of memory	Model too large for available RAM/VRAM	Use a smaller model; add more RAM/VRAM
`"embeddingDimension mismatch"`	Dimension config does not match model	Set `embeddingDimension` to 768 for `nomic-embed-text` or 1024 for `mxbai-embed-large`
First query slow, subsequent queries fast	Normal -- model loading into memory	Expected behaviour; no action needed

Next Steps¶

- AI Assistant Overview -- What the AI module can do
- AI Setup with OpenAI -- Cloud alternative if local hosting is not required
- AI Setup with Anthropic -- Pair Ollama embeddings with Anthropic completions