Metrics Reference
llmsoup exposes 36 Prometheus metrics covering requests, routing, signals, caching, plugins, costs, models, and reasoning. All metrics are available at the /metrics endpoint in Prometheus text exposition format.
Endpoint behavior
| Property | Value |
|---|---|
| URL | GET /metrics |
| Authentication | Not required |
| Content-Type | text/plain; version=0.0.4 |
| Format | Prometheus text exposition format |
```shell
curl http://localhost:8080/metrics
```

For Prometheus scrape configuration and Docker Compose monitoring setup, see the Deployment Guide.
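The text exposition format is line-oriented, so a scrape can be consumed without a client library. A minimal sketch (the sample payload below is illustrative, not actual llmsoup output, and it ignores optional trailing timestamps):

```python
# Minimal parser for the Prometheus text exposition format:
# comment lines start with "#", sample lines are "<series> <value>".
sample = """\
# HELP llmsoup_requests_total Total HTTP requests received
# TYPE llmsoup_requests_total counter
llmsoup_requests_total{method="POST",status_code="200",endpoint="/v1/chat/completions",user_id="anonymous"} 42
llmsoup_active_connections 3
"""

def parse_samples(text: str) -> dict[str, float]:
    """Map each sample line (metric name plus labels) to its value."""
    out = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(" ", 1)
        out[series] = float(value)
    return out

metrics = parse_samples(sample)
print(metrics["llmsoup_active_connections"])  # 3.0
```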
Naming conventions
All llmsoup metrics follow Prometheus best practices:
- Prefix: Every metric starts with `llmsoup_`
- Case: `snake_case` for metric names and label names
- Unit suffixes:
  - `_total` — counters (monotonically increasing)
  - `_seconds` — duration histograms
  - `_dollars` — cost values (project-specific, not a Prometheus standard)
  - `_bytes` — sizes (reserved, not currently used)
- Label conventions:
  - `user_id` — authenticated user or `"anonymous"`
  - `model` — model name from routing configuration
  - `decision_name` — routing rule/decision name
  - `status` — operation outcome (`success`, `error`, `hit`, `miss`)
Request metrics
Core HTTP request tracking.
llmsoup_requests_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | method, status_code, endpoint, user_id |
| Description | Total HTTP requests received |
Labels:
- `method` — HTTP method (`GET`, `POST`)
- `status_code` — HTTP status code (`200`, `401`, `500`)
- `endpoint` — Request path (`/v1/chat/completions`, `/metrics`)
- `user_id` — Authenticated user ID or `"anonymous"`
llmsoup_active_connections
| Property | Value |
|---|---|
| Type | Gauge |
| Labels | None |
| Description | Current active HTTP connections |
llmsoup_errors_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | error_type, user_id |
| Description | Total errors by type |
Labels:
- `error_type` — Error category (`routing_error`, `model_error`, `config_error`)
- `user_id` — Authenticated user ID or `"anonymous"`
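Counters only ever increase, so rate-style queries over them compare two scrapes. A hedged Python sketch of the error-percentage arithmetic, with hypothetical snapshot values for `llmsoup_errors_total` and `llmsoup_requests_total`:

```python
# Error percentage from two counter snapshots taken one window apart.
# All values here are made up for illustration.
interval_s = 300  # a 5m window, matching rate(...[5m])

errors_t0, errors_t1 = 120.0, 150.0            # llmsoup_errors_total
requests_t0, requests_t1 = 10_000.0, 11_500.0  # llmsoup_requests_total

error_rate = (errors_t1 - errors_t0) / interval_s      # errors per second
request_rate = (requests_t1 - requests_t0) / interval_s  # requests per second
error_pct = error_rate / request_rate * 100

print(round(error_pct, 1))  # 2.0
```

This ignores counter resets (a restart drops the value back to zero), which Prometheus's `rate()` handles automatically.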
Example PromQL — Request rate by endpoint:
```promql
rate(llmsoup_requests_total{endpoint="/v1/chat/completions"}[5m])
```

Example PromQL — Error rate percentage:

```promql
sum(rate(llmsoup_errors_total[5m])) / sum(rate(llmsoup_requests_total[5m])) * 100
```

Routing metrics
Routing decision evaluation, rule matching, and model selection.
llmsoup_routing_decisions_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | outcome, decision_name, user_id |
| Description | Routing decision outcomes |
Labels:
- `outcome` — Result of routing (`success`, `failed`, `fallback`, `default`)
- `decision_name` — Name of the matched routing rule
- `user_id` — Authenticated user ID or `"anonymous"`
llmsoup_routing_duration_seconds
| Property | Value |
|---|---|
| Type | Histogram |
| Labels | decision_name, user_id |
| Description | Time spent making routing decisions |
| Buckets | 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0 (Prometheus defaults) |
llmsoup_decision_evaluation_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | user_id |
| Description | Total routing rule evaluations |
llmsoup_decision_match_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | decision_name, user_id |
| Description | Total routing rule matches by decision name |
llmsoup_decision_confidence
| Property | Value |
|---|---|
| Type | Histogram |
| Labels | decision_name, user_id |
| Description | Algorithm confidence/score for routing decisions |
| Buckets | 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99 |
llmsoup_model_selection_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | algorithm_type, model, decision_name, user_id |
| Description | Total algorithm-based model selections |
Labels:
- `algorithm_type` — Algorithm used (`"confidence"` or `"ratings"`)
- `model` — Selected model name
- `decision_name` — Name of the matched routing rule
- `user_id` — Authenticated user ID or `"anonymous"`
llmsoup_model_routing_modifications_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | source_model, target_model, user_id |
| Description | Total fallback/parallel re-routing events |
Labels:
- `source_model` — Original primary model before re-routing
- `target_model` — Actual model that responded after re-routing
Example PromQL — Routing decisions per second by outcome:
```promql
sum by (outcome) (rate(llmsoup_routing_decisions_total[5m]))
```

Example PromQL — Average routing latency:

```promql
rate(llmsoup_routing_duration_seconds_sum[5m]) / rate(llmsoup_routing_duration_seconds_count[5m])
```

Signal metrics
Per-signal evaluation tracking across all signal types: keyword, embedding, domain, language, latency, fact_check, user_feedback, preference.
llmsoup_signal_extraction_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | signal_type, signal_name, user_id |
| Description | Signal extraction attempts |
Labels:
- `signal_type` — Signal evaluator type (`keyword`, `embedding`, `domain`, `language`, `latency`, `fact_check`, `user_feedback`, `preference`)
- `signal_name` — Specific signal name from evaluator configuration
llmsoup_signal_match_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | signal_type, signal_name, user_id |
| Description | Successful signal matches (triggered=true) |
llmsoup_signal_extraction_duration_seconds
| Property | Value |
|---|---|
| Type | Histogram |
| Labels | signal_type, user_id |
| Description | Per-signal evaluation latency |
| Buckets | 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0 |
llmsoup_signal_errors_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | signal_type, error_type, user_id |
| Description | Signal evaluation failures |
llmsoup_classification_confidence
| Property | Value |
|---|---|
| Type | Histogram |
| Labels | category, classification_method, user_id |
| Description | Classification confidence scores |
| Buckets | 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99 |
Labels:
- `category` — Predicted classification category (e.g., `"math"`, `"computer_science"`)
- `classification_method` — Classifier type (`"domain"`, `"fact_check"`, `"preference"`)
llmsoup_classification_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | category, user_id |
| Description | Classification outcomes across all classifier-based evaluators |
Tracks domain, fact_check, and preference classification events. Incremented once per signal evaluation.
Example PromQL — Signal evaluation rate by type:
```promql
sum by (signal_type) (rate(llmsoup_signal_extraction_total[5m]))
```

Example PromQL — Signal match ratio (how often signals trigger):

```promql
sum by (signal_type) (rate(llmsoup_signal_match_total[5m])) / sum by (signal_type) (rate(llmsoup_signal_extraction_total[5m]))
```

Cache metrics
Observability for three cache backends tracked by Prometheus metrics:
- model_response — TTL-based cache for model responses
- embedding — Capacity-based LRU cache for embedding vectors
- semantic — Semantic cache plugin cache
Note: llmsoup also maintains an internal TPOT (Time Per Output Token) EMA cache for latency-aware routing. TPOT values are tracked via `llmsoup_model_tpot_seconds` in the Model metrics section rather than through cache-level metrics.
llmsoup_cache_operations_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | backend, operation, status |
| Description | Cache operations by backend, operation, and status |
Labels:
- `backend` — Cache backend (`model_response`, `embedding`, `semantic`)
- `operation` — Operation type (`get`, `put`, `evict`)
- `status` — Operation outcome (`hit`, `miss`, `error`)
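Because `backend`, `operation`, and `status` are all labels on one counter, a per-backend hit rate is just a ratio of label-filtered sums. A sketch with hypothetical counter values keyed by those labels:

```python
# Per-backend hit rate over llmsoup_cache_operations_total-style series.
# Keys are (backend, operation, status); values are hypothetical counts.
ops = {
    ("model_response", "get", "hit"): 800.0,
    ("model_response", "get", "miss"): 200.0,
    ("embedding", "get", "hit"): 450.0,
    ("embedding", "get", "miss"): 50.0,
}

def hit_rate(backend: str) -> float:
    """hits / all gets for one backend, mirroring the PromQL ratio."""
    hits = sum(v for (b, _, st), v in ops.items()
               if b == backend and st == "hit")
    gets = sum(v for (b, op, _), v in ops.items()
               if b == backend and op == "get")
    return hits / gets

print(hit_rate("model_response"))  # 0.8
print(hit_rate("embedding"))       # 0.9
```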
llmsoup_cache_operation_duration_seconds
| Property | Value |
|---|---|
| Type | Histogram |
| Labels | backend, operation |
| Description | Cache operation latency |
| Buckets | 0.00001, 0.00005, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5 |
Buckets are microsecond-to-millisecond focused for in-memory caches.
llmsoup_cache_entries
| Property | Value |
|---|---|
| Type | Gauge |
| Labels | backend |
| Description | Current cache entry count by backend |
llmsoup_cache_plugin_hits_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | decision_name |
| Description | Semantic cache plugin hits by decision name |
llmsoup_cache_plugin_misses_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | decision_name |
| Description | Semantic cache plugin misses by decision name |
Example PromQL — Cache hit rate by backend:
```promql
sum by (backend) (rate(llmsoup_cache_operations_total{status="hit"}[5m])) / sum by (backend) (rate(llmsoup_cache_operations_total{operation="get"}[5m]))
```

Example PromQL — Semantic cache plugin effectiveness:

```promql
sum(rate(llmsoup_cache_plugin_hits_total[5m])) / (sum(rate(llmsoup_cache_plugin_hits_total[5m])) + sum(rate(llmsoup_cache_plugin_misses_total[5m])))
```

Plugin metrics
Execution tracking for all plugin types: semantic-cache, jailbreak, pii, system_prompt, header_mutation, hallucination, router_replay.
llmsoup_plugin_execution_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | plugin_type, decision_name, status, user_id |
| Description | Plugin executions by type, decision, status, and user |
Labels:
- `plugin_type` — Plugin type (`semantic-cache`, `jailbreak`, `pii`, `system_prompt`, `header_mutation`, `hallucination`, `router_replay`)
- `decision_name` — Name of the matched routing rule
- `status` — Execution status (`success`, `error`)
llmsoup_plugin_execution_duration_seconds
| Property | Value |
|---|---|
| Type | Histogram |
| Labels | plugin_type, user_id |
| Description | Plugin execution latency |
| Buckets | 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0 |
Expect wide variance: regex-only plugins run in roughly 10–100 µs, embedding-based plugins in 10–500 ms, and LLM-calling plugins in 100 ms–5 s.
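A histogram observation amounts to timing the execution and incrementing every bucket whose `le` bound covers the elapsed time. A sketch of finding the smallest covering bucket (`run_plugin` is a stand-in, not a real llmsoup API):

```python
import bisect
import time

# Bucket upper bounds for llmsoup_plugin_execution_duration_seconds.
BUCKETS = [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0]

def run_plugin():
    time.sleep(0.002)  # pretend this is a fast, regex-style plugin

start = time.perf_counter()
run_plugin()
elapsed = time.perf_counter() - start

# Smallest upper bound >= elapsed; past the end means the +Inf bucket.
idx = bisect.bisect_left(BUCKETS, elapsed)
le = BUCKETS[idx] if idx < len(BUCKETS) else float("inf")
print(le)  # e.g. 0.005 for a ~2 ms observation
```

In a real exporter the cumulative count for this `le` and every larger bound is incremented, which is what makes `histogram_quantile` work later.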
llmsoup_plugin_errors_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | plugin_type, error_reason, user_id |
| Description | Plugin execution failures by type and error reason |
Labels:
- `error_reason` — Error category (`execution_failed`, `configuration_error`, `timeout`, `internal_error`)
llmsoup_pii_violations_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | model, pii_type, user_id |
| Description | PII violations detected by model, type, and user |
Labels:
- `pii_type` — PII type detected (`EMAIL_ADDRESS`, `PHONE_NUMBER`, etc.)
Example PromQL — Plugin error rate by type:
```promql
sum by (plugin_type) (rate(llmsoup_plugin_errors_total[5m])) / sum by (plugin_type) (rate(llmsoup_plugin_execution_total[5m]))
```

Example PromQL — PII violations over time:

```promql
sum by (pii_type) (rate(llmsoup_pii_violations_total[5m]))
```

Cost metrics
Per-request cost tracking and cost-aware routing savings.
llmsoup_request_cost_dollars_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | model, user_id, currency |
| Description | Cumulative request cost in dollars by model |
llmsoup_request_cost_dollars
| Property | Value |
|---|---|
| Type | Histogram |
| Labels | model, user_id |
| Description | Per-request cost distribution in dollars |
| Buckets | 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0 |
llmsoup_routing_cost_dollars_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | user_id, currency |
| Description | Cumulative cost of external routing calls (e.g., preference signal LLM calls) |
llmsoup_cost_savings_dollars_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | decision_name, user_id, currency |
| Description | Estimated cost savings from cost-aware routing |
llmsoup_tokens_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | model, user_id, token_type |
| Description | Total tokens processed by model and type |
Labels:
- `token_type` — One of `"prompt"`, `"completion"`, `"cached_prompt"`, `"cached_completion"`
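Token counts become dollars by multiplying each token type by its per-token price. A sketch of that attribution; the model names and per-million-token prices below are made up for illustration and are not llmsoup configuration:

```python
# Hypothetical (prompt, completion) prices in dollars per million tokens.
PRICES_PER_M = {
    "gpt-small": (0.15, 0.60),
    "gpt-large": (2.50, 10.00),
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of one request from its token counts."""
    p_in, p_out = PRICES_PER_M[model]
    return prompt_tokens / 1e6 * p_in + completion_tokens / 1e6 * p_out

cost = request_cost("gpt-small", prompt_tokens=2_000, completion_tokens=500)
print(round(cost, 6))  # 0.0006
```

A value like this would feed both the `llmsoup_request_cost_dollars_total` counter (cumulative) and the `llmsoup_request_cost_dollars` histogram (per-request distribution).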
llmsoup_tokens_per_request
| Property | Value |
|---|---|
| Type | Histogram |
| Labels | model, user_id, token_type |
| Description | Token count distribution per request |
| Buckets | 10, 50, 100, 500, 1000, 2000, 5000, 10000, 50000, 100000 |
Example PromQL — Hourly cost by model:
```promql
sum by (model) (increase(llmsoup_request_cost_dollars_total[1h]))
```

Example PromQL — Total cost savings:

```promql
sum(increase(llmsoup_cost_savings_dollars_total[24h]))
```

Model metrics
Per-model latency, throughput, and error tracking.
llmsoup_model_request_duration_seconds
| Property | Value |
|---|---|
| Type | Histogram |
| Labels | model, user_id |
| Description | End-to-end model call latency in seconds |
| Buckets | 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0 (Prometheus defaults) |
llmsoup_model_tpot_seconds
| Property | Value |
|---|---|
| Type | Histogram |
| Labels | model |
| Description | Time Per Output Token (TPOT) in seconds |
| Buckets | 0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0 |
TPOT measures how long each output token takes to generate. llmsoup uses Exponential Moving Average (EMA) smoothing for TPOT values in its internal cache, but the raw per-request TPOT is recorded in this histogram. Lower TPOT indicates faster model generation speed.
llmsoup_model_request_errors_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | model, reason, user_id |
| Description | Model request errors by model and reason |
Labels:
- `reason` — Error reason code (`timeout`, `http_error`, `parse_error`, `connection_error`)
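Quantile queries over `llmsoup_model_request_duration_seconds` estimate percentiles by linear interpolation inside the cumulative buckets, in the style of Prometheus's `histogram_quantile`. A simplified pure-Python sketch with made-up bucket counts (the real function also handles resets, rates, and `+Inf`):

```python
# (le upper bound in seconds, cumulative observation count); hypothetical data.
buckets = [
    (0.25, 0), (0.5, 50), (1.0, 90), (2.5, 99), (5.0, 100),
]

def quantile(q: float, buckets: list[tuple[float, int]]) -> float:
    """Estimate the q-quantile by interpolating within the target bucket."""
    target = q * buckets[-1][1]          # rank of the target observation
    lower_le, lower_count = 0.0, 0
    for le, count in buckets:
        if count >= target:
            frac = (target - lower_count) / (count - lower_count)
            return lower_le + frac * (le - lower_le)
        lower_le, lower_count = le, count
    return buckets[-1][0]

print(quantile(0.99, buckets))  # 2.5
```

This is why quantile accuracy depends on bucket layout: the estimate can never be more precise than the width of the bucket the quantile lands in.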
Example PromQL — P99 model latency:
```promql
histogram_quantile(0.99, sum by (model, le) (rate(llmsoup_model_request_duration_seconds_bucket[5m])))
```

Example PromQL — Model error rate:

```promql
sum by (model) (rate(llmsoup_model_request_errors_total[5m])) / sum by (model) (rate(llmsoup_model_request_duration_seconds_count[5m]))
```

Reasoning metrics
Reasoning control decisions and effort distribution for models that support reasoning parameters.
llmsoup_reasoning_decisions_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | category, model, enabled, effort, user_id |
| Description | Reasoning decisions by category, model, enabled state, effort level, and user |
Labels:
- `category` — Matched rule name or `"default"`
- `model` — Routed model name
- `enabled` — Whether reasoning is enabled (`"true"`, `"false"`)
- `effort` — Reasoning effort level (`"low"`, `"medium"`, `"high"`, `"default"`)
llmsoup_reasoning_effort_usage_total
| Property | Value |
|---|---|
| Type | Counter |
| Labels | family, effort, user_id |
| Description | Reasoning effort distribution by family and effort level |
Labels:
- `family` — Reasoning family type (`"reasoning_effort"`, `"chat_template_kwargs"`)
- `effort` — Reasoning effort level (`"low"`, `"medium"`, `"high"`, `"default"`)
Example PromQL — Reasoning usage by effort level:
```promql
sum by (effort) (rate(llmsoup_reasoning_decisions_total[5m]))
```

Example PromQL — Reasoning family distribution:

```promql
sum by (family, effort) (rate(llmsoup_reasoning_effort_usage_total[5m]))
```

Complete metric index
| Metric | Type | Category |
|---|---|---|
| llmsoup_requests_total | Counter | Request |
| llmsoup_active_connections | Gauge | Request |
| llmsoup_errors_total | Counter | Request |
| llmsoup_routing_decisions_total | Counter | Routing |
| llmsoup_routing_duration_seconds | Histogram | Routing |
| llmsoup_decision_evaluation_total | Counter | Routing |
| llmsoup_decision_match_total | Counter | Routing |
| llmsoup_decision_confidence | Histogram | Routing |
| llmsoup_model_selection_total | Counter | Routing |
| llmsoup_model_routing_modifications_total | Counter | Routing |
| llmsoup_signal_extraction_total | Counter | Signal |
| llmsoup_signal_match_total | Counter | Signal |
| llmsoup_signal_extraction_duration_seconds | Histogram | Signal |
| llmsoup_signal_errors_total | Counter | Signal |
| llmsoup_classification_confidence | Histogram | Signal |
| llmsoup_classification_total | Counter | Signal |
| llmsoup_cache_operations_total | Counter | Cache |
| llmsoup_cache_operation_duration_seconds | Histogram | Cache |
| llmsoup_cache_entries | Gauge | Cache |
| llmsoup_cache_plugin_hits_total | Counter | Cache |
| llmsoup_cache_plugin_misses_total | Counter | Cache |
| llmsoup_plugin_execution_total | Counter | Plugin |
| llmsoup_plugin_execution_duration_seconds | Histogram | Plugin |
| llmsoup_plugin_errors_total | Counter | Plugin |
| llmsoup_pii_violations_total | Counter | Plugin |
| llmsoup_request_cost_dollars_total | Counter | Cost |
| llmsoup_request_cost_dollars | Histogram | Cost |
| llmsoup_routing_cost_dollars_total | Counter | Cost |
| llmsoup_cost_savings_dollars_total | Counter | Cost |
| llmsoup_tokens_total | Counter | Cost |
| llmsoup_tokens_per_request | Histogram | Cost |
| llmsoup_model_request_duration_seconds | Histogram | Model |
| llmsoup_model_tpot_seconds | Histogram | Model |
| llmsoup_model_request_errors_total | Counter | Model |
| llmsoup_reasoning_decisions_total | Counter | Reasoning |
| llmsoup_reasoning_effort_usage_total | Counter | Reasoning |