# Ollama Setup
Configure Mailpilot to use Ollama for running local LLM models with complete privacy.
Privacy-focused solution: Ollama runs models entirely on your hardware. Emails never leave your server.
## What is Ollama?

Ollama is a tool that makes it easy to run large language models locally:
- Download and run models with a single command
- No API keys or cloud services required
- Complete privacy - data never leaves your machine
- Free to use - no per-request costs
- GPU acceleration - fast inference with NVIDIA/AMD/Apple Silicon
## Prerequisites

- Hardware:
  - 8GB+ RAM minimum (16GB+ recommended)
  - GPU recommended for better performance (optional)
- OS: Linux, macOS, or Windows
- Disk space: 4-8GB per model
## Step 1: Install Ollama

### Linux

```shell
curl -fsSL https://ollama.com/install.sh | sh
```

### macOS

```shell
brew install ollama
```

Or download from ollama.com/download

### Windows

Download the installer from ollama.com/download
## Step 2: Start Ollama Service

### Linux/macOS

```shell
ollama serve
```

This starts the Ollama server on http://localhost:11434

Run in the background:

```shell
# Using systemd (Linux)
sudo systemctl enable ollama
sudo systemctl start ollama

# Using launchd (macOS)
brew services start ollama
```

### Windows
Ollama runs as a service automatically after installation. Check the system tray for the Ollama icon.
## Step 3: Download a Model

### Recommended Models for Email Classification
| Model | Size | RAM Needed | Speed | Accuracy |
|---|---|---|---|---|
| llama3.2:latest | 2GB | 8GB | ⚡⚡⚡ | ⭐⭐⭐⭐ |
| llama3.2:3b | 2GB | 8GB | ⚡⚡⚡ | ⭐⭐⭐⭐ |
| llama3.1:8b | 4.7GB | 16GB | ⚡⚡ | ⭐⭐⭐⭐ |
| mistral:latest | 4.1GB | 16GB | ⚡⚡ | ⭐⭐⭐⭐ |
| phi3:mini | 2.3GB | 8GB | ⚡⚡⚡ | ⭐⭐⭐ |
### Download Your First Model

```shell
ollama pull llama3.2:latest
```

This downloads and caches the model locally.
### List Available Models

```shell
ollama list
```

Output:

```
NAME              ID          SIZE    MODIFIED
llama3.2:latest   abc123def   2.0 GB  2 hours ago
mistral:latest    xyz789ghi   4.1 GB  1 day ago
```

## Step 4: Test Ollama
Test that the model works:

```shell
ollama run llama3.2:latest
```

Chat with the model:

```
>>> Classify this email: "Meeting reminder for tomorrow at 3pm"
This appears to be a calendar/scheduling email...
```

Type `/bye` to exit.
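Before wiring up Mailpilot, you can also exercise the model through Ollama's HTTP API (`POST /api/generate`), which is what client integrations use under the hood. A minimal sketch, assuming the server from Step 2 is running locally; `build_request` and `classify` are illustrative helpers, not Mailpilot code:

```python
import json
import urllib.request

# Ollama's HTTP endpoint for single-shot (non-streaming) generation
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, email_text: str) -> dict:
    """Build an /api/generate payload that asks the model to classify an email."""
    return {
        "model": model,
        "prompt": f'Classify this email in one short phrase: "{email_text}"',
        "stream": False,  # return one JSON object instead of a token stream
    }

def classify(email_text: str, model: str = "llama3.2:latest") -> str:
    """Send the classification prompt to a running Ollama server."""
    payload = json.dumps(build_request(model, email_text)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, `classify('Meeting reminder for tomorrow at 3pm')` should return the model's free-text answer, similar to the chat session above.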
## Step 5: Configure Mailpilot

Add Ollama to your config.yaml:

```yaml
llm_providers:
  - name: ollama
    provider: ollama
    base_url: http://localhost:11434
    model: llama3.2:latest
    temperature: 0.1

accounts:
  - name: personal
    imap:
      host: imap.gmail.com
      # ... imap config
    folders:
      - name: INBOX
        llm_provider: ollama
        prompt: |
          Classify this email into one of the following categories:
          - Important
          - Social
          - Promotions
          - Spam
          Return JSON: {"action": "move", "folder": "category", "confidence": 0.95}
```

## Step 6: Start Mailpilot
```shell
pnpm start
```

Check the logs for:

```
✓ LLM Provider 'ollama' initialized successfully
✓ Connected to Ollama server at http://localhost:11434
✓ Using model: llama3.2:latest
```

## Configuration Options
### Basic Configuration

```yaml
llm_providers:
  - name: ollama
    provider: ollama                    # Required: provider type
    base_url: http://localhost:11434    # Required: Ollama server URL
    model: llama3.2:latest              # Required: model name
```

### Advanced Configuration
```yaml
llm_providers:
  - name: ollama-advanced
    provider: ollama
    base_url: http://localhost:11434
    model: llama3.2:latest
    temperature: 0.1      # Randomness (0 = deterministic)
    top_p: 0.9            # Nucleus sampling
    top_k: 40             # Token selection
    repeat_penalty: 1.1   # Reduce repetition
    num_ctx: 2048         # Context window size
    timeout: 60000        # Request timeout (ms)
```

### Remote Ollama Server
Run Ollama on a different machine:
```yaml
llm_providers:
  - name: ollama-remote
    provider: ollama
    base_url: http://192.168.1.100:11434   # Remote server IP
    model: llama3.2:latest
```

## Model Selection Guide
### For General Email Classification

Recommended: llama3.2:latest (2GB)

```shell
ollama pull llama3.2:latest
```

Pros:
- Small and fast
- Good accuracy for classification
- Runs on modest hardware (8GB RAM)
Typical performance: ~1-2 seconds per email
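To verify that figure on your own hardware, you can time a small batch. A sketch; the `classify` argument is a stand-in for a real call to your Ollama model (e.g. the API helper from Step 4):

```python
import time

def seconds_per_email(classify, emails):
    """Average wall-clock seconds per email for a given classify() callable."""
    start = time.perf_counter()
    for email in emails:
        classify(email)
    return (time.perf_counter() - start) / len(emails)
```

Run it over 10-20 representative emails rather than a single one, since the first request often includes model load time.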
### For Better Accuracy

Recommended: llama3.1:8b (4.7GB)

```shell
ollama pull llama3.1:8b
```

Pros:
- Higher accuracy than 3B models
- Better understanding of complex emails
- Good at following instructions
Requirements: 16GB+ RAM recommended
### For Minimal Hardware

Recommended: phi3:mini (2.3GB)

```shell
ollama pull phi3:mini
```

Pros:
- Very small model
- Runs on 8GB RAM
- Decent accuracy
Cons:
- Lower accuracy than Llama models
- May miss nuanced classifications
## Performance Optimization

### GPU Acceleration

Ollama automatically uses a GPU if one is available.

Check GPU usage:

```shell
ollama ps
```

The output shows GPU memory usage:

```
NAME              SIZE    GPU
llama3.2:latest   2.0 GB  4.2 GB/8.0 GB
```

### CPU-Only Performance
Without a GPU, models run slower but still work.

Tips for CPU-only setups:

- Use smaller models (llama3.2, phi3:mini)
- Reduce `num_ctx` (the context window)
- Increase the `timeout` setting
- Use fewer concurrent connections
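The last tip, using fewer concurrent connections, amounts to bounding the number of in-flight requests. A sketch of the pattern with an asyncio semaphore; this illustrates the idea and is not Mailpilot's actual implementation:

```python
import asyncio

async def classify_all(emails, classify, max_concurrent=3):
    """Classify emails with at most max_concurrent requests in flight at once."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(email):
        async with sem:  # waits here once max_concurrent tasks are running
            return await classify(email)

    # gather() preserves input order in its results
    return await asyncio.gather(*(bounded(e) for e in emails))
```

On CPU-only hosts a limit of 1 often gives the best total throughput, since concurrent inferences compete for the same cores.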
### Concurrent Requests

Ollama can handle multiple requests:

```yaml
llm_providers:
  - name: ollama
    provider: ollama
    base_url: http://localhost:11434
    model: llama3.2:latest
    max_concurrent: 3   # Process 3 emails simultaneously
```

### Model Preloading
Keep model loaded in memory for faster responses:
```shell
# Preload the model (stays in memory)
ollama run llama3.2:latest &

# Or configure keep_alive
ollama run --keep-alive=24h llama3.2:latest
```

## Advanced Features
### Custom Models

Create custom models with Modelfiles.

Create a file named `Modelfile`:

```
FROM llama3.2:latest

# Set temperature
PARAMETER temperature 0.1

# Set system prompt
SYSTEM You are an expert email classifier. Classify emails concisely and accurately.
```

Build:

```shell
ollama create email-classifier -f Modelfile
```

Use:
```yaml
llm_providers:
  - name: ollama-custom
    provider: ollama
    model: email-classifier
```

### Multiple Models
Use different models for different purposes:
```yaml
llm_providers:
  - name: ollama-fast
    provider: ollama
    model: llama3.2:latest   # Fast, small model
  - name: ollama-accurate
    provider: ollama
    model: llama3.1:8b       # Slower, more accurate

accounts:
  - name: personal
    folders:
      - name: INBOX
        llm_provider: ollama-fast       # Fast classification for high volume
  - name: work
    folders:
      - name: INBOX
        llm_provider: ollama-accurate   # Better accuracy for important emails
```

## Troubleshooting
### "Connection refused" or "Cannot connect to Ollama"

Cause: The Ollama server is not running.

Solutions:

```shell
# Start the Ollama server
ollama serve

# Or check whether it is already running
curl http://localhost:11434
```

Should return: `Ollama is running`
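The same check can be scripted, for example as a startup probe before launching Mailpilot. A sketch using only the Python standard library; `ollama_is_up` is a hypothetical helper, not part of Mailpilot:

```python
import urllib.request
import urllib.error

def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200  # the root endpoint replies "Ollama is running"
    except (urllib.error.URLError, OSError):
        return False  # connection refused, DNS failure, or timeout
```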
### "Model not found"

Cause: The model has not been downloaded locally.

Solutions:

```shell
# List available models
ollama list

# Pull the model you need
ollama pull llama3.2:latest
```

### Out of memory errors
Cause: The model is too large for the available RAM.

Solutions:

- Use a smaller model:

  ```shell
  ollama pull phi3:mini   # Only 2.3GB
  ```

- Use a quantized model (reduced precision):

  ```shell
  ollama pull llama3.2:latest-q4   # 4-bit quantization
  ```

- Close other applications to free RAM
- Upgrade RAM (16GB+ recommended)
### Slow performance
Causes:
- No GPU acceleration
- Large model on limited hardware
- High concurrent requests
Solutions:

- Enable GPU acceleration if available
- Use a smaller model (llama3.2 vs llama3.1:8b)
- Reduce concurrent requests: `max_concurrent: 1`
- Reduce the context window: `num_ctx: 1024` (down from the default 2048)
### Model produces poor classifications
Causes:
- Model too small for task complexity
- Poor prompt engineering
- Low temperature causing repetitive outputs
Solutions:

- Try a larger model:

  ```shell
  ollama pull llama3.1:8b
  ```

- Improve your prompts (see the Prompts Guide)
- Adjust the temperature: `temperature: 0.2` (increase from 0.1)
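It also helps to check whether the model's raw replies actually match the JSON shape the Step 5 prompt asks for, since small models sometimes wrap JSON in prose or drop fields. A sketch of such a validator; the required keys mirror the example prompt and should be adjusted to your own:

```python
import json

REQUIRED_KEYS = {"action", "folder", "confidence"}

def parse_classification(raw: str):
    """Parse a model reply like
    {"action": "move", "folder": "Spam", "confidence": 0.95}
    and return the dict, or None if the reply is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # model produced prose or malformed JSON
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None  # missing one of the fields the prompt requires
    if not 0.0 <= float(data["confidence"]) <= 1.0:
        return None  # confidence out of range
    return data
```

Logging the raw replies that fail this check is usually the fastest way to see whether the prompt or the model is at fault.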
## System Requirements

### Minimum Requirements
- CPU: Modern x64 processor
- RAM: 8GB
- Disk: 10GB free space
- OS: Linux, macOS 11+, Windows 10+
### Recommended Requirements
- CPU: 8+ cores
- RAM: 16GB+
- GPU: NVIDIA RTX 3060+ / AMD RX 6000+ / Apple Silicon M1+
- Disk: 20GB+ SSD
### GPU Support
NVIDIA:
- CUDA 11.7+
- 6GB+ VRAM
AMD:
- ROCm 5.7+
- 6GB+ VRAM
Apple Silicon:
- M1/M2/M3
- 8GB+ unified memory
## Privacy & Security

### Data Privacy
With Ollama:
- ✅ All processing happens locally
- ✅ No data sent to external servers
- ✅ No API keys required
- ✅ Works offline
- ✅ Supports GDPR/HIPAA compliance (data never leaves your infrastructure)
### Network Security
Ollama binds to localhost by default:
- Only accessible from your machine
- No internet exposure
- Safe behind firewall
To allow remote access (advanced):
```shell
# NOT RECOMMENDED for production
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```

## Cost Comparison
### Ollama (Free)
- Initial cost: $0 (open source)
- Running cost: $0/month
- Hardware cost: One-time purchase or existing server
- Per-email cost: $0
### Example Calculation
Processing 100 emails/day:
- Ollama: $0/month
- OpenAI gpt-4o-mini: ~$0.45/month
- Anthropic Claude: ~$0.75/month
Annual savings with Ollama at this volume: ~$5-10/year. The savings scale linearly with volume, so high-throughput deployments (thousands of emails per day) save proportionally more.
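The per-provider figures above can be reproduced with simple token arithmetic. A sketch; the ~1000 input tokens per email and the $0.15-per-million-token rate are illustrative assumptions, not quoted prices:

```python
def monthly_cost(emails_per_day: int,
                 tokens_per_email: int,
                 usd_per_million_tokens: float) -> float:
    """Approximate monthly token cost for classifying emails."""
    tokens = emails_per_day * 30 * tokens_per_email
    return tokens / 1_000_000 * usd_per_million_tokens

# 100 emails/day * 30 days * 1000 tokens = 3M tokens;
# at $0.15 per 1M tokens that is $0.45/month.
# Ollama's marginal cost is $0 regardless of volume.
```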
## Monitoring

### Check Ollama Status

```shell
# View running models
ollama ps

# View model details
ollama show llama3.2:latest

# Check logs (Linux systemd)
journalctl -u ollama -f
```

### Performance Metrics
Monitor resource usage:
```shell
# CPU and RAM usage
top

# GPU usage (NVIDIA)
nvidia-smi

# GPU usage (AMD)
rocm-smi
```

## Updating Ollama
### Update Ollama

```shell
# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS
brew upgrade ollama

# Windows
# Download the latest installer from ollama.com
```

### Update Models

```shell
# Pull the latest version of a model
ollama pull llama3.2:latest

# Remove old versions
ollama rm old-model-name
```