AI Setup with Ollama (Self-Hosted)¶
Section: AI Assistant | Article 31
Audience: System Administrators
Last Updated: 2026-04-07
Overview¶
Ollama is a self-hosted AI runtime that runs large language models locally on your own hardware. When you use Ollama with RP-PAM, no data leaves your network. All AI processing -- embeddings, completions, risk scoring, and anomaly detection -- happens entirely on infrastructure you control.
This makes Ollama the recommended choice for:
- Air-gapped environments
- Highly regulated industries (healthcare, finance, government)
- Organisations with strict data sovereignty requirements
- Environments where cloud API costs are a concern
Prerequisites¶
| Requirement | Detail |
|---|---|
| RP-PAM licence | Enterprise or MSP tier |
| Ollama host | A server (physical or virtual) with sufficient hardware (see below) |
| Network access | The RP-PAM server must be able to reach the Ollama host on port 11434 |
Hardware Requirements for the Ollama Host¶
The Ollama host runs the AI models in memory. Resource requirements depend on the models you choose.
| Configuration | CPU | RAM | GPU | Storage | Use Case |
|---|---|---|---|---|---|
| Minimum | 8 cores | 16 GB | None (CPU-only) | 20 GB | Small deployments, basic NL queries, slower responses |
| Recommended | 16 cores | 32 GB | NVIDIA GPU with 8+ GB VRAM | 50 GB | Standard deployments, fast responses |
| Production | 16+ cores | 64 GB | NVIDIA GPU with 16+ GB VRAM | 100 GB | Large deployments, multiple concurrent users |
Note: Ollama can run on CPU only (no GPU required), but responses will be significantly slower. A GPU with CUDA support dramatically improves performance.
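The tiers in the table above can be checked on a candidate Linux host before anything is installed. The following is a minimal sketch, not an RP-PAM or Ollama tool: `classify_host` is a hypothetical helper name, and it compares only core count and RAM against the table (GPU and storage are not checked).

```shell
#!/usr/bin/env bash
# Hypothetical helper: map core count and RAM (GB) to the tiers in the table above.
classify_host() {
  local cores=$1 ram_gb=$2
  if [ "$cores" -ge 16 ] && [ "$ram_gb" -ge 64 ]; then
    echo "production"
  elif [ "$cores" -ge 16 ] && [ "$ram_gb" -ge 32 ]; then
    echo "recommended"
  elif [ "$cores" -ge 8 ] && [ "$ram_gb" -ge 16 ]; then
    echo "minimum"
  else
    echo "below-minimum"
  fi
}

# On Linux, read the local values: nproc for cores, /proc/meminfo for RAM.
if [ -r /proc/meminfo ] && command -v nproc >/dev/null 2>&1; then
  cores=$(nproc)
  ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  # MemTotal is slightly below the installed amount, so round to the nearest GB.
  ram_gb=$(( (ram_kb + 524288) / 1048576 ))
  echo "cores=$cores ram=${ram_gb}GB tier=$(classify_host "$cores" "$ram_gb")"
fi
```

A host that reports `below-minimum` may still run a small model such as phi3, but it falls outside the configurations listed here.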
GPU Support¶
| GPU Vendor | Supported? | Notes |
|---|---|---|
| NVIDIA | Yes | CUDA 11.8+ required; install NVIDIA Container Toolkit for Docker |
| AMD | Partial | ROCm support is experimental |
| Intel | No | Not currently supported |
| Apple Silicon | Yes | Metal acceleration on macOS |
Step 1: Install Ollama¶
Linux¶
# One-line installer
curl -fsSL https://ollama.com/install.sh | sh
# Verify installation
ollama --version
# Expected: ollama version 0.x.x
# Start the service
sudo systemctl enable ollama
sudo systemctl start ollama
Windows¶
- Download the installer from ollama.com/download.
- Run the installer and follow the prompts.
- Ollama starts automatically as a background service.
Verify in PowerShell:
ollama --version
Docker (Either Platform)¶
Linux:
docker run -d \
--name ollama \
-p 11434:11434 \
-v ollama-data:/root/.ollama \
--restart unless-stopped \
ollama/ollama
PowerShell:
docker run -d `
--name ollama `
-p 11434:11434 `
-v ollama-data:/root/.ollama `
--restart unless-stopped `
ollama/ollama
With GPU support (NVIDIA):
docker run -d \
--name ollama \
--gpus all \
-p 11434:11434 \
-v ollama-data:/root/.ollama \
--restart unless-stopped \
ollama/ollama
Step 2: Pull the Required Models¶
RP-PAM needs two models: one for completions and one for embeddings.
Pull the Completion Model¶
ollama pull llama3
This downloads the Llama 3 8B model (approximately 4.7 GB). The download happens once; the model is cached locally.
Pull the Embedding Model¶
ollama pull nomic-embed-text
This downloads the nomic-embed-text model (approximately 274 MB).
Verify Models Are Available¶
ollama list
Expected output:
NAME ID SIZE MODIFIED
llama3:latest a6990ed6be41 4.7 GB 2 minutes ago
nomic-embed-text 0a109f422b47 274 MB 1 minute ago
Alternative Models¶
| Model | Type | Size | Notes |
|---|---|---|---|
| llama3 | Completion | 4.7 GB | Recommended default; good balance of quality and speed |
| llama3:70b | Completion | 40 GB | Higher quality; requires 64+ GB RAM or large GPU |
| mistral | Completion | 4.1 GB | Alternative to Llama 3; slightly different strengths |
| phi3 | Completion | 2.3 GB | Smaller, faster; good for resource-constrained environments |
| nomic-embed-text | Embedding | 274 MB | Recommended; 768 dimensions |
| mxbai-embed-large | Embedding | 670 MB | Higher quality; 1024 dimensions |
Step 3: Configure Ollama for Network Access¶
By default, Ollama listens on 127.0.0.1:11434 (localhost only). If the Ollama host is a different server than your RP-PAM node, you must configure it to listen on all interfaces.
Linux¶
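Edit the Ollama service configuration:
sudo systemctl edit ollama
Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Restart:
sudo systemctl daemon-reload
sudo systemctl restart ollama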
Windows¶
Set the environment variable:
[Environment]::SetEnvironmentVariable("OLLAMA_HOST", "0.0.0.0:11434", "Machine")
# Restart the Ollama service
Docker¶
If using Docker, the -p 11434:11434 flag already exposes the port to all interfaces.
Verify Remote Access¶
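From your RP-PAM server (replace 10.0.1.40 with the address of your Ollama host):
Linux:
curl -s http://10.0.1.40:11434/api/tags
PowerShell:
Invoke-RestMethod -Uri "http://10.0.1.40:11434/api/tags"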
You should see the list of available models.
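Beyond reading the list by eye, the same response can be checked for the two models RP-PAM needs. This is a minimal sketch, assuming Ollama's `/api/tags` endpoint returns JSON with a `models` array whose entries carry a `name` field, and reusing the example address from Step 4. `has_model` and `check_ollama` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the /api/tags JSON on stdin lists the model.
# Matches "name":"<model>" with or without a tag suffix such as ":latest".
has_model() {
  local model=$1
  grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"${model}(:[^\"]*)?\""
}

# Fetch the model list and report on the two models used in this article.
# The default base URL is the example address from Step 4; adjust to your host.
check_ollama() {
  local base_url=${1:-http://10.0.1.40:11434} tags m
  tags=$(curl -fsS --max-time 10 "$base_url/api/tags") || {
    echo "cannot reach $base_url" >&2
    return 1
  }
  for m in llama3 nomic-embed-text; do
    if printf '%s' "$tags" | has_model "$m"; then
      echo "ok: $m"
    else
      echo "missing: $m"
    fi
  done
}
```

Run `check_ollama` with no arguments to use the example address, or pass another base URL as the first argument. A `missing` line means the model still needs to be pulled on the Ollama host.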
Step 4: Configure rppam.config¶
Edit the AI section in rppam.config on your RP-PAM node(s).
Windows path: C:\ProgramData\Ravenphyre\RP-PAM\rppam.config
Linux path: /etc/rppam/rppam.config
{
"ai": {
"enabled": true,
"provider": "ollama",
"ollamaBaseUrl": "http://10.0.1.40:11434",
"embeddingModel": "nomic-embed-text",
"completionModel": "llama3",
"embeddingDimension": 768,
"maxTokens": 4096,
"temperature": 0.3,
"riskScoring": {
"enabled": true,
"threshold": 60
},
"anomalyDetection": {
"enabled": true,
"lookbackDays": 90
}
}
}
Configuration Fields¶
| Field | Description | Default |
|---|---|---|
| provider | Set to "ollama" | (required) |
| ollamaBaseUrl | URL of the Ollama API | http://localhost:11434 |
| embeddingModel | Ollama model name for embeddings | nomic-embed-text |
| completionModel | Ollama model name for completions | llama3 |
| embeddingDimension | Vector dimension (must match the embedding model) | 768 |
| maxTokens | Maximum tokens in a completion response | 4096 |
| temperature | Response creativity (0.0-1.0) | 0.3 |
Embedding Dimensions by Model¶
| Model | Dimension | Set embeddingDimension to |
|---|---|---|
| nomic-embed-text | 768 | 768 |
| mxbai-embed-large | 1024 | 1024 |
Important: If you change the embedding model after initial setup, the existing embeddings in the database are incompatible. You must rebuild the embedding index. See Rebuilding Embeddings below.
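The pairing between embedding model and dimension can be checked mechanically before restarting RP-PAM. This is a minimal sketch that covers only the two embedding models listed in this article. `expected_dimension` and `check_dimension` are hypothetical names, not RP-PAM commands.

```shell
#!/usr/bin/env bash
# Hypothetical lookup: embedding dimension for the models listed in this article.
expected_dimension() {
  case "$1" in
    nomic-embed-text|nomic-embed-text:*)   echo 768 ;;
    mxbai-embed-large|mxbai-embed-large:*) echo 1024 ;;
    *) return 1 ;;   # unknown model: this sketch cannot verify it
  esac
}

# Compare a configured embeddingDimension against the expected value.
check_dimension() {
  local model=$1 configured=$2 expected
  expected=$(expected_dimension "$model") || {
    echo "unknown model: $model"
    return 2
  }
  if [ "$configured" -eq "$expected" ]; then
    echo "ok: $model uses $expected dimensions"
  else
    echo "mismatch: $model needs $expected, config has $configured"
    return 1
  fi
}

check_dimension nomic-embed-text 768
```

Since rppam.config is JSON, the two arguments could be read from it with jq, for example `jq -r '.ai.embeddingModel'` and `jq -r '.ai.embeddingDimension'`.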
Step 5: Restart RP-PAM¶
Windows PowerShell:
Linux:
Step 6: Verify AI Is Working¶
Check Module Health¶
Linux:
curl -s http://localhost:7101/api/v1/modules \
-H "Authorization: Bearer $ADMIN_JWT" | jq '.items[] | select(.moduleName == "ai")'
PowerShell:
$modules = Invoke-RestMethod -Uri "http://localhost:7101/api/v1/modules" `
-Headers @{ Authorization = "Bearer $adminJwt" }
$modules.items | Where-Object { $_.moduleName -eq "ai" } | ConvertTo-Json
Expected:
{
"moduleName": "ai",
"status": "healthy",
"provider": "ollama",
"embeddingModel": "nomic-embed-text",
"completionModel": "llama3",
"ollamaBaseUrl": "http://10.0.1.40:11434"
}
Test a Query¶
Linux:
curl -s -X POST http://localhost:7101/api/v1/ai/query \
-H "Authorization: Bearer $ADMIN_JWT" \
-H "Content-Type: application/json" \
-d '{ "query": "What access requests were submitted today?" }' | jq .
PowerShell:
$response = Invoke-RestMethod -Uri "http://localhost:7101/api/v1/ai/query" `
-Method Post `
-Headers @{ Authorization = "Bearer $adminJwt" } `
-ContentType "application/json" `
-Body '{ "query": "What access requests were submitted today?" }'
$response | ConvertTo-Json
Note: The first query after a cold start may be slower as Ollama loads the model into memory. Subsequent queries are faster.
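The cold-start delay can be moved out of the first user query by asking Ollama to load the model ahead of time. This is a minimal sketch, assuming Ollama's `/api/generate` endpoint loads a model when it receives a request with a model name and no prompt. `warmup_payload` and `warm_model` are hypothetical helper names, and the default address is the example from Step 4.

```shell
#!/usr/bin/env bash
# Build the JSON body for a load-only request: a model name and no prompt.
warmup_payload() {
  printf '{"model": "%s"}' "$1"
}

# Ask Ollama to load the model so the first RP-PAM query does not pay the load cost.
# The default base URL is the example address from Step 4; adjust to your host.
warm_model() {
  local model=$1 base_url=${2:-http://10.0.1.40:11434}
  curl -fsS --max-time 300 -X POST "$base_url/api/generate" \
    -H "Content-Type: application/json" \
    -d "$(warmup_payload "$model")" >/dev/null
}
```

Calling `warm_model llama3` after Ollama starts loads the completion model in advance. Ollama unloads idle models after a timeout (five minutes by default), so the delay can return after a quiet period.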
Privacy Note¶
When using Ollama as your AI provider:
- No data is sent to any external service. All prompts and responses are processed locally.
- No API keys are needed. There is no cloud account to manage.
- No internet connection is required after the initial model download. You can even download models on a connected machine and copy them to an air-gapped host.
- Audit data stays local. Embedding vectors generated from access patterns are stored in your RP-PAM database, never transmitted externally.
This makes Ollama the only AI provider option suitable for fully air-gapped deployments.
Rebuilding Embeddings¶
If you change the embedding model, existing embeddings become incompatible. Rebuild them:
Linux:
PowerShell:
This re-processes all historical data through the new embedding model. Duration depends on the amount of historical data and the hardware available.
Troubleshooting¶
| Problem | Cause | Solution |
|---|---|---|
| "status": "unhealthy" for AI module | Cannot reach Ollama | Verify Ollama is running (ollama list); check network connectivity to ollamaBaseUrl |
| Connection refused | Ollama listening on localhost only | Set OLLAMA_HOST=0.0.0.0:11434 and restart Ollama |
| Very slow responses (30+ seconds) | No GPU; model running on CPU | Add a GPU, or switch to a smaller model (phi3) |
| "model not found" | Model not pulled | Run ollama pull <model-name> on the Ollama host |
| Out of memory | Model too large for available RAM/VRAM | Use a smaller model; add more RAM/VRAM |
| "embeddingDimension mismatch" | Dimension config does not match model | Set embeddingDimension to 768 for nomic-embed-text or 1024 for mxbai-embed-large |
| First query slow, subsequent queries fast | Normal -- model loading into memory | Expected behaviour; no action needed |
Next Steps¶
- AI Assistant Overview -- What the AI module can do
- AI Setup with OpenAI -- Cloud alternative if local hosting is not required
- AI Setup with Anthropic -- Pair Ollama embeddings with Anthropic completions
RP-PAM v1.0.0 -- Copyright 2026 Ravenphyre. All rights reserved.