Intelligent LLM routing. One config file. Zero lock-in.

llmsoup is a lightweight, high-performance LLM routing proxy built in Rust. Route requests to the optimal model based on cost, quality, and latency — with a single YAML config and zero code changes.

Get Started Documentation

Get started in seconds

curl -fsSL https://llmsoup.insideapp.fr/install.sh | sh

llmsoup — serve --stats

  ██      ██      ██    ██  ██████   ████   ██  ██  █████ 
  ██      ██      ███  ███  ██      ██  ██  ██  ██  ██  ██
  ██      ██      ████████  ██████  ██  ██  ██  ██  █████ 
  ██      ██      ██ ██ ██      ██  ██  ██  ██  ██  ██    
  ██████  ██████  ██    ██  ██████   ████    ████   ██    

  v0.1.0  |  http://127.0.0.1:8080  |  Uptime: 2h 15m 23s  |  [r]eset  [q]uit  [l]ogs  [m]odel

┌ Cost & Savings ─────────────────────────────────────┐┌ Requests ───────────────────────────────────────────┐
│Cost:   $5.21                                        ││Requests: 1,247                                      │
│Saved:  $59.92                                       ││Active:   3                                          │
│Rate:   92.0%                                        ││Errors:   0                                          │
└─────────────────────────────────────────────────────┘└─────────────────────────────────────────────────────┘
┌ Per-Model [cost] ──────────────────────────────────────────────────────────────────────────────────────────┐
│google/gemini-3.1-pro-preview $3.12      480K↑    180K↓  59.9%    █████████████████████████░░░░░░░░░░░░░░░░░│
│xiaomi/mimo-v2-flash          $0.84      3.8M↑    1.5M↓  16.1%    ███████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
│minimax/minimax-m2.5          $0.73      935K↑    374K↓  14.0%    ██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
│deepseek/deepseek-v3.2        $0.52      1.3M↑    500K↓  10.0%    ████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
│arcee-ai/trinity-mini:free    $0.00         0↑       0↓  0%       ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
│deepseek/deepseek-v3.2:free   $0.00         0↑       0↓  0%       ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
└────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌ Triggered Routes ────────────────────────────────────────────────────┐┌ Top Errors ────────────────────────┐
│code_generation     362  29%   ███████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░││No error                            │
│code_review         287  23%   █████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░││                                    │
│testing             212  17%   ███████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░││                                    │
│bug_fixing          137  11%   ████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░││                                    │
│refactoring         100  8%    ███░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░││                                    │
│feature_planning    75   6%    ██░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░││                                    │
│architecture        37   3%    █░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░││                                    │
│deep_analysis       25   2%    █░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░││                                    │
│simple_task         12   1%    ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░││                                    │
│fallback            0    0%    ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░││                                    │
│                                                                      ││                                    │
└──────────────────────────────────────────────────────────────────────┘└────────────────────────────────────┘

Why llmsoup?

Cost-Aware Routing

Route requests to the cheapest model that meets quality thresholds. Track per-request costs with Prometheus metrics.

Privacy First

Use local models, detect and redact PII before it reaches any provider. Your data stays on your infrastructure.

LLM Observability

See which tasks hit which models, track domain classification, and understand your LLM traffic with Prometheus metrics.

Single Binary

One Rust binary, one YAML config. No runtime dependencies, no containers required. Idle memory under 500MB.

OpenAI-Compatible

Drop-in replacement for the OpenAI API. Zero code changes in your existing applications.

Local-First Security

Runs on your infrastructure. Token-based auth on all endpoints. No external dependencies required.