Mailpilot

Performance Benchmarks

Performance benchmarks for Mailpilot across different configurations and hardware setups.

Test Methodology

All benchmarks were conducted using:

  • Test emails: 1000 emails with varying complexity
  • Classification: Standard prompts with folder organization
  • Database: SQLite with WAL mode
  • Measurement: Average of 10 runs with outliers removed
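The "average of 10 runs with outliers removed" reduction can be sketched as a simple trimmed mean; a minimal, hedged example (the workload passed to `benchmark` is a hypothetical stand-in for a classification run, not Mailpilot's actual harness):

```python
import time

def trimmed_mean(samples, trim=1):
    """Average after dropping the `trim` lowest and highest samples."""
    if len(samples) <= 2 * trim:
        raise ValueError("not enough samples to trim")
    kept = sorted(samples)[trim:len(samples) - trim]
    return sum(kept) / len(kept)

def benchmark(fn, runs=10):
    """Time `fn` over `runs` runs; return the outlier-trimmed average in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return trimmed_mean(samples)
```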

Hardware Specifications

Test Systems

Baseline (Minimum Requirements):

  • CPU: Intel Core i5-1135G7 (4 cores @ 2.4GHz)
  • RAM: 8 GB DDR4
  • Storage: SATA SSD (500 MB/s)
  • Network: 100 Mbps

Recommended (Optimal Performance):

  • CPU: AMD Ryzen 7 5800X (8 cores @ 3.8GHz)
  • RAM: 16 GB DDR4
  • Storage: NVMe SSD (3500 MB/s)
  • Network: 1 Gbps

High-Performance (Local Models):

  • CPU: AMD Ryzen 9 5950X (16 cores @ 3.4GHz)
  • RAM: 32 GB DDR4
  • GPU: NVIDIA RTX 4090 (24 GB VRAM)
  • Storage: NVMe SSD (7000 MB/s)
  • Network: 1 Gbps

LLM Provider Comparison

Latency Benchmarks

Average time to classify a single email (includes network + API processing):

Provider   | Model             | Avg Latency | P95 Latency | P99 Latency
OpenAI     | gpt-4o-mini       | 450ms       | 780ms       | 1200ms
OpenAI     | gpt-4o            | 850ms       | 1400ms      | 2100ms
Anthropic  | claude-3-haiku    | 520ms       | 890ms       | 1300ms
Anthropic  | claude-3.5-sonnet | 980ms       | 1600ms      | 2400ms
Ollama     | llama3.2:3b       | 180ms       | 290ms       | 450ms
Ollama     | llama3.1:8b       | 420ms       | 680ms       | 950ms
Ollama     | qwen2.5:14b       | 890ms       | 1400ms      | 2100ms

Latency measured on Recommended hardware. Local models (Ollama) assume GPU acceleration.
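P95 and P99 are order statistics over the per-email latency samples; a minimal nearest-rank sketch of how such percentiles can be computed (illustrative, not necessarily the exact method used for the table above):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample covering p% of the data."""
    ranked = sorted(samples)
    rank = math.ceil(p / 100 * len(ranked))
    return ranked[max(rank, 1) - 1]

latencies_ms = list(range(1, 101))   # toy data: 1..100 ms
p95 = percentile(latencies_ms, 95)   # nearest-rank P95 of the toy data
```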

Throughput Benchmarks

Emails processed per hour (concurrent processing, 100 emails/account):

Provider   | Model          | Baseline | Recommended | High-Performance
OpenAI     | gpt-4o-mini    | 6,800    | 8,000       | 8,000*
Anthropic  | claude-3-haiku | 5,900    | 6,900       | 6,900*
Ollama     | llama3.2:3b    | 12,000   | 20,000      | 35,000
Ollama     | llama3.1:8b    | 5,400    | 8,600       | 18,000

* API-based models are bottlenecked by rate limits, not local hardware.

Cost Analysis

API Provider Costs

Cost to classify 10,000 emails (typical monthly volume for heavy users):

Provider   | Model             | Cost per 1K emails | 10K emails | 100K emails
OpenAI     | gpt-4o-mini       | $0.003             | $0.03      | $0.30
OpenAI     | gpt-4o            | $0.015             | $0.15      | $1.50
Anthropic  | claude-3-haiku    | $0.0025            | $0.025     | $0.25
Anthropic  | claude-3.5-sonnet | $0.030             | $0.30      | $3.00

Assumes average 200 tokens per classification (150 input + 50 output)
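Costs like these follow from simple token arithmetic; a hedged sketch, where the default per-million-token rates are placeholders for illustration (check your provider's current pricing, and note these defaults are not the rates behind the table above):

```python
def classification_cost(emails, in_tokens=150, out_tokens=50,
                        in_rate_per_m=0.15, out_rate_per_m=0.60):
    """Estimated API cost in dollars for classifying `emails` emails.

    Rates are dollars per million tokens; the defaults are placeholders.
    """
    input_cost = emails * in_tokens * in_rate_per_m / 1_000_000
    output_cost = emails * out_tokens * out_rate_per_m / 1_000_000
    return input_cost + output_cost

cost_1k = classification_cost(1_000)  # cost for 1,000 emails at placeholder rates
```

Cost scales linearly with volume, so a per-1K figure multiplies directly to 10K and 100K estimates.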

Local Model Costs

One-time hardware investment for local processing:

Configuration | Hardware Cost | Monthly Electricity | Break-even (vs API)
CPU-only      | $800          | ~$5                 | N/A (too slow)
Entry GPU     | $1,500        | ~$15                | ~25K emails/month
High-end GPU  | $3,500        | ~$30                | ~50K emails/month

Break-even calculated against gpt-4o-mini pricing
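One way to reason about break-even: local inference pays off once the API spend you avoid exceeds the electricity cost, amortizing the hardware over time. A hedged sketch with entirely hypothetical inputs (the $3-per-1K rate below is illustrative and deliberately does not match the table above):

```python
def months_to_recoup(hardware_cost, monthly_emails, api_cost_per_1k,
                     monthly_electricity):
    """Months until avoided API spend covers the hardware outlay."""
    monthly_savings = (monthly_emails / 1_000) * api_cost_per_1k - monthly_electricity
    if monthly_savings <= 0:
        return float("inf")  # the API stays cheaper at this volume
    return hardware_cost / monthly_savings

# e.g. a $1,500 GPU vs a hypothetical API at $3 per 1K emails, 25K emails/month:
months = months_to_recoup(1_500, 25_000, 3.0, 15)
```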

Resource Usage

Memory Footprint

Component            | Idle   | 1 Account | 5 Accounts | 10 Accounts
Backend              | 45 MB  | 80 MB     | 150 MB     | 280 MB
Dashboard            | -      | 25 MB     | 25 MB      | 25 MB
Database             | 2 MB   | 15 MB     | 60 MB      | 120 MB
Ollama (llama3.2:3b) | 2.1 GB | 2.3 GB    | 2.5 GB     | 2.8 GB
Ollama (qwen2.5:14b) | 8.2 GB | 8.5 GB    | 8.9 GB     | 9.3 GB

Ollama memory includes model loaded in VRAM/RAM

CPU Utilization

Average CPU usage during email processing:

Scenario                | Baseline | Recommended | High-Performance
Idle (IMAP IDLE)        | 1%       | <1%         | <1%
Processing (API)        | 8-15%    | 5-10%       | 3-7%
Processing (Ollama 3b)  | 45-80%   | 30-60%      | 15-35%
Processing (Ollama 14b) | 95-100%* | 75-90%      | 40-65%

* Sustained 100% CPU on Baseline hardware causes a significant slowdown.

Network Bandwidth

Provider Type | Idle    | Light (100/day) | Heavy (1000/day)
API Providers | <1 KB/s | 5-10 KB/s       | 50-80 KB/s
Local Models  | <1 KB/s | <1 KB/s         | <1 KB/s

Network usage is minimal: mainly IMAP heartbeats and API calls.

Scalability Benchmarks

Email Volume

How Mailpilot handles different volumes (using gpt-4o-mini):

Daily Volume | Accounts | Avg Latency | Database Size (30d) | RAM Usage
100          | 1-2      | 450ms       | 50 MB               | 100 MB
500          | 3-5      | 480ms       | 200 MB              | 180 MB
1,000        | 5-10     | 520ms       | 400 MB              | 300 MB
5,000        | 10-20    | 650ms       | 2 GB                | 600 MB
10,000       | 20-50    | 800ms       | 4 GB                | 1.2 GB

Latency increases with scale due to database size and concurrent processing

Account Limits

Practical limits based on testing:

Hardware         | Max Accounts | Max Emails/Day | Notes
Baseline         | 10           | 1,000          | Slow with local models
Recommended      | 25           | 5,000          | Comfortable for most users
High-Performance | 100+         | 20,000+        | Limited by IMAP connections

Optimization Strategies

For High Volume

1. Use Faster Models

llm_providers:
  - name: fast
    provider: openai
    model: gpt-4o-mini  # 3x faster than gpt-4o

2. Reduce Context

attachments:
  max_extracted_chars: 5000  # Limit attachment text
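Capping extracted characters indirectly caps prompt tokens; a rough sketch of the relationship, where the ~4-characters-per-token ratio is a common heuristic for English prose, not a guarantee:

```python
def truncate_attachment_text(text, max_chars=5000):
    """Enforce a max_extracted_chars-style cap before building the prompt."""
    return text[:max_chars]

def approx_tokens(text, chars_per_token=4):
    """Very rough token estimate for English prose."""
    return len(text) // chars_per_token

capped = truncate_attachment_text("x" * 20_000)  # trimmed to 5,000 chars
```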

3. Batch Processing

concurrency: 5  # Process 5 emails concurrently
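A concurrency cap like this is typically a semaphore around the classification call; a minimal asyncio sketch (the `classify_email` coroutine is a hypothetical stand-in for the LLM request):

```python
import asyncio

async def classify_email(email, sem, counter):
    """Classify one email while respecting the concurrency cap."""
    async with sem:
        counter["now"] += 1
        counter["peak"] = max(counter["peak"], counter["now"])
        await asyncio.sleep(0.01)  # stand-in for the actual LLM call
        counter["now"] -= 1
        return f"classified:{email}"

async def process(emails, concurrency=5):
    """Process emails with at most `concurrency` in flight at once."""
    sem = asyncio.Semaphore(concurrency)
    counter = {"now": 0, "peak": 0}
    results = await asyncio.gather(*(classify_email(e, sem, counter) for e in emails))
    return results, counter["peak"]

results, peak = asyncio.run(process([f"msg{i}" for i in range(20)]))
```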

For Cost Reduction

1. Use Local Models

llm_providers:
  - name: local
    provider: ollama
    model: llama3.2:3b  # $0 per email

2. Selective Processing

accounts:
  - name: important
    llm_provider: claude-3.5-sonnet  # Expensive, accurate
  - name: newsletters
    llm_provider: ollama-3b  # Free, good enough

For Latency Reduction

1. Local Models with GPU

  • Use NVIDIA GPU with CUDA support
  • Choose smaller models (3b-8b parameters)
  • Enable GPU acceleration in Ollama

2. IMAP IDLE

  • Use providers that support IDLE (instant notifications)
  • Reduces polling overhead
  • Lower latency for real-time processing

3. Geographic Proximity

  • Choose API provider region closest to you
  • Self-host Ollama on local network
  • Reduces network latency

Real-World Performance

Personal Inbox (500 emails/day)

Setup:

  • Provider: OpenAI gpt-4o-mini
  • Accounts: 2 (Gmail, Outlook)
  • Hardware: Recommended

Results:

  • Processing time: <30 seconds behind email arrival
  • Monthly cost: ~$0.05
  • CPU usage: 5-8% average
  • RAM usage: 150 MB

Business Inbox (2000 emails/day)

Setup:

  • Provider: Claude 3 Haiku (primary), Ollama llama3.2:3b (newsletters)
  • Accounts: 10
  • Hardware: High-Performance

Results:

  • Processing time: <15 seconds behind email arrival
  • Monthly cost: ~$1.50 (mostly Claude API)
  • CPU usage: 12-20% average
  • RAM usage: 600 MB (including Ollama model)

Bottleneck Analysis

Common Bottlenecks

  1. IMAP Connection (No IDLE support)

    • Impact: 30-60s delay per email
    • Solution: Use providers with IDLE support
  2. LLM API Rate Limits

    • Impact: Emails queued during bursts
    • Solution: Increase concurrency or use local models
  3. Database Locks (High concurrency)

    • Impact: Occasional retry delays
    • Solution: Already mitigated with WAL mode
  4. Slow Models (Local inference)

    • Impact: Processing lag on large volumes
    • Solution: Use smaller models or GPU acceleration
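The WAL mitigation noted above is a one-line pragma at connection time; a minimal sketch with Python's sqlite3 (the database path here is a throwaway temp file):

```python
import os
import sqlite3
import tempfile

# WAL requires a file-backed database; in-memory databases ignore it.
path = os.path.join(tempfile.mkdtemp(), "mailpilot.db")
conn = sqlite3.connect(path)
# Switch the journal to write-ahead logging; readers no longer block
# the writer, which reduces lock contention under concurrent processing.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.close()
```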

Next Steps