
Metrics Reference

llmsoup exposes 36 Prometheus metrics covering requests, routing, signals, caching, plugins, costs, models, and reasoning. All metrics are available at the /metrics endpoint in Prometheus text exposition format.

URL: GET /metrics
Authentication: Not required
Content-Type: text/plain; version=0.0.4
Format: Prometheus text exposition format

curl http://localhost:8080/metrics
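
The response body uses the standard text exposition layout: # HELP and # TYPE comment lines followed by one line per labeled series. The sample below is illustrative, not captured output:

# HELP llmsoup_requests_total Total HTTP requests received
# TYPE llmsoup_requests_total counter
llmsoup_requests_total{method="POST",status_code="200",endpoint="/v1/chat/completions",user_id="anonymous"} 1027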

For Prometheus scrape configuration and Docker Compose monitoring setup, see the Deployment Guide.


All llmsoup metrics follow Prometheus best practices:

  • Prefix: Every metric starts with llmsoup_
  • Case: snake_case for metric names and label names
  • Unit suffixes:
    • _total — counters (monotonically increasing)
    • _seconds — duration histograms
    • _dollars — cost values (project-specific, not a Prometheus standard)
    • _bytes — sizes (reserved, not currently used)
  • Label conventions:
    • user_id — authenticated user or "anonymous"
    • model — model name from routing configuration
    • decision_name — routing rule/decision name
    • status — operation outcome (success, error, hit, miss)
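
Putting the conventions together, individual series look like the lines below. The model, user, decision, and currency values shown (gpt-4o, alice, cheap-default, USD) are placeholders for whatever appears in your configuration, and the sample values are made up:

llmsoup_request_cost_dollars_total{model="gpt-4o",user_id="alice",currency="USD"} 0.4187
llmsoup_routing_duration_seconds_bucket{decision_name="cheap-default",user_id="alice",le="0.05"} 912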

Request Metrics

Core HTTP request tracking.

llmsoup_requests_total

Type: Counter
Labels: method, status_code, endpoint, user_id
Description: Total HTTP requests received

Labels:

  • method — HTTP method (GET, POST)
  • status_code — HTTP status code (200, 401, 500)
  • endpoint — Request path (/v1/chat/completions, /metrics)
  • user_id — Authenticated user ID or "anonymous"

llmsoup_active_connections

Type: Gauge
Labels: None
Description: Current active HTTP connections

llmsoup_errors_total

Type: Counter
Labels: error_type, user_id
Description: Total errors by type

Labels:

  • error_type — Error category (routing_error, model_error, config_error)
  • user_id — Authenticated user ID or "anonymous"

Example PromQL — Request rate by endpoint:

rate(llmsoup_requests_total{endpoint="/v1/chat/completions"}[5m])

Example PromQL — Error rate percentage:

sum(rate(llmsoup_errors_total[5m])) / sum(rate(llmsoup_requests_total[5m])) * 100
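
A per-user traffic breakdown can be sketched the same way, using only the labels documented above:

sum by (user_id) (rate(llmsoup_requests_total[5m]))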

Routing Metrics

Routing decision evaluation, rule matching, and model selection.

llmsoup_routing_decisions_total

Type: Counter
Labels: outcome, decision_name, user_id
Description: Routing decision outcomes

Labels:

  • outcome — Result of routing (success, failed, fallback, default)
  • decision_name — Name of the matched routing rule
  • user_id — Authenticated user ID or "anonymous"

llmsoup_routing_duration_seconds

Type: Histogram
Labels: decision_name, user_id
Description: Time spent making routing decisions
Buckets: 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0 (Prometheus defaults)

llmsoup_decision_evaluation_total

Type: Counter
Labels: user_id
Description: Total routing rule evaluations

llmsoup_decision_match_total

Type: Counter
Labels: decision_name, user_id
Description: Total routing rule matches by decision name

llmsoup_decision_confidence

Type: Histogram
Labels: decision_name, user_id
Description: Algorithm confidence/score for routing decisions
Buckets: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99

llmsoup_model_selection_total

Type: Counter
Labels: algorithm_type, model, decision_name, user_id
Description: Total algorithm-based model selections

Labels:

  • algorithm_type — Algorithm used ("confidence" or "ratings")
  • model — Selected model name
  • decision_name — Name of the matched routing rule
  • user_id — Authenticated user ID or "anonymous"

llmsoup_model_routing_modifications_total

Type: Counter
Labels: source_model, target_model, user_id
Description: Total fallback/parallel re-routing events

Labels:

  • source_model — Original primary model before re-routing
  • target_model — Actual model that responded after re-routing

Example PromQL — Routing decisions per second by outcome:

sum by (outcome) (rate(llmsoup_routing_decisions_total[5m]))

Example PromQL — Average routing latency:

rate(llmsoup_routing_duration_seconds_sum[5m]) / rate(llmsoup_routing_duration_seconds_count[5m])
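
For tail latency rather than the average, a P95 over the same histogram can be sketched as:

histogram_quantile(0.95, sum by (le) (rate(llmsoup_routing_duration_seconds_bucket[5m])))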

Signal Metrics

Per-signal evaluation tracking across all signal types: keyword, embedding, domain, language, latency, fact_check, user_feedback, preference.

llmsoup_signal_extraction_total

Type: Counter
Labels: signal_type, signal_name, user_id
Description: Signal extraction attempts

Labels:

  • signal_type — Signal evaluator type (keyword, embedding, domain, language, latency, fact_check, user_feedback, preference)
  • signal_name — Specific signal name from evaluator configuration

llmsoup_signal_match_total

Type: Counter
Labels: signal_type, signal_name, user_id
Description: Successful signal matches (triggered=true)

llmsoup_signal_extraction_duration_seconds

Type: Histogram
Labels: signal_type, user_id
Description: Per-signal evaluation latency
Buckets: 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0

llmsoup_signal_errors_total

Type: Counter
Labels: signal_type, error_type, user_id
Description: Signal evaluation failures

llmsoup_classification_confidence

Type: Histogram
Labels: category, classification_method, user_id
Description: Classification confidence scores
Buckets: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99

Labels:

  • category — Predicted classification category (e.g., "math", "computer_science")
  • classification_method — Classifier type ("domain", "fact_check", "preference")

llmsoup_classification_total

Type: Counter
Labels: category, user_id
Description: Classification outcomes across all classifier-based evaluators

Tracks domain, fact_check, and preference classification events. Incremented once per signal evaluation.

Example PromQL — Signal evaluation rate by type:

sum by (signal_type) (rate(llmsoup_signal_extraction_total[5m]))

Example PromQL — Signal match ratio (how often signals trigger):

sum by (signal_type) (rate(llmsoup_signal_match_total[5m]))
/ sum by (signal_type) (rate(llmsoup_signal_extraction_total[5m]))
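
Per-signal tail latency can be sketched from the duration histogram in the same way:

histogram_quantile(0.99, sum by (signal_type, le) (rate(llmsoup_signal_extraction_duration_seconds_bucket[5m])))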

Cache Metrics

Observability for three cache backends tracked by Prometheus metrics:

  • model_response — TTL-based cache for model responses
  • embedding — Capacity-based LRU cache for embedding vectors
  • semantic — Cache used by the semantic cache plugin

Note: llmsoup also maintains an internal TPOT (Time Per Output Token) EMA cache for latency-aware routing. TPOT values are tracked via llmsoup_model_tpot_seconds in the Model metrics section rather than through cache-level metrics.

llmsoup_cache_operations_total

Type: Counter
Labels: backend, operation, status
Description: Cache operations by backend, operation, and status

Labels:

  • backend — Cache backend (model_response, embedding, semantic)
  • operation — Operation type (get, put, evict)
  • status — Operation outcome (hit, miss, error)

llmsoup_cache_operation_duration_seconds

Type: Histogram
Labels: backend, operation
Description: Cache operation latency
Buckets: 0.00001, 0.00005, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5

Buckets are microsecond-to-millisecond focused for in-memory caches.

llmsoup_cache_entries

Type: Gauge
Labels: backend
Description: Current cache entry count by backend

llmsoup_cache_plugin_hits_total

Type: Counter
Labels: decision_name
Description: Semantic cache plugin hits by decision name

llmsoup_cache_plugin_misses_total

Type: Counter
Labels: decision_name
Description: Semantic cache plugin misses by decision name

Example PromQL — Cache hit rate by backend:

sum by (backend) (rate(llmsoup_cache_operations_total{status="hit"}[5m]))
/ sum by (backend) (rate(llmsoup_cache_operations_total{operation="get"}[5m]))

Example PromQL — Semantic cache plugin effectiveness:

sum(rate(llmsoup_cache_plugin_hits_total[5m]))
/ (sum(rate(llmsoup_cache_plugin_hits_total[5m])) + sum(rate(llmsoup_cache_plugin_misses_total[5m])))
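
Lookup tail latency per backend can be sketched from the operation duration histogram:

histogram_quantile(0.99, sum by (backend, le) (rate(llmsoup_cache_operation_duration_seconds_bucket{operation="get"}[5m])))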

Plugin Metrics

Execution tracking for all plugin types: semantic-cache, jailbreak, pii, system_prompt, header_mutation, hallucination, router_replay.

llmsoup_plugin_execution_total

Type: Counter
Labels: plugin_type, decision_name, status, user_id
Description: Plugin executions by type, decision, status, and user

Labels:

  • plugin_type — Plugin type (semantic-cache, jailbreak, pii, system_prompt, header_mutation, hallucination, router_replay)
  • decision_name — Name of the matched routing rule
  • status — Execution status (success, error)

llmsoup_plugin_execution_duration_seconds

Type: Histogram
Labels: plugin_type, user_id
Description: Plugin execution latency
Buckets: 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0

Expect wide variance: regex-only plugins run in roughly 10-100 µs, embedding-based plugins in 10-500 ms, and LLM-calling plugins in 100 ms to 5 s.

llmsoup_plugin_errors_total

Type: Counter
Labels: plugin_type, error_reason, user_id
Description: Plugin execution failures by type and error reason

Labels:

  • error_reason — Error category (execution_failed, configuration_error, timeout, internal_error)

llmsoup_pii_violations_total

Type: Counter
Labels: model, pii_type, user_id
Description: PII violations detected by model, type, and user

Labels:

  • pii_type — PII type detected (EMAIL_ADDRESS, PHONE_NUMBER, etc.)

Example PromQL — Plugin error rate by type:

sum by (plugin_type) (rate(llmsoup_plugin_errors_total[5m]))
/ sum by (plugin_type) (rate(llmsoup_plugin_execution_total[5m]))

Example PromQL — PII violations over time:

sum by (pii_type) (rate(llmsoup_pii_violations_total[5m]))
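
Plugin tail latency by type follows the same histogram_quantile pattern:

histogram_quantile(0.99, sum by (plugin_type, le) (rate(llmsoup_plugin_execution_duration_seconds_bucket[5m])))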

Cost Metrics

Per-request cost tracking and cost-aware routing savings.

llmsoup_request_cost_dollars_total

Type: Counter
Labels: model, user_id, currency
Description: Cumulative request cost in dollars by model

llmsoup_request_cost_dollars

Type: Histogram
Labels: model, user_id
Description: Per-request cost distribution in dollars
Buckets: 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0

llmsoup_routing_cost_dollars_total

Type: Counter
Labels: user_id, currency
Description: Cumulative cost of external routing calls (e.g., preference signal LLM calls)

llmsoup_cost_savings_dollars_total

Type: Counter
Labels: decision_name, user_id, currency
Description: Estimated cost savings from cost-aware routing

llmsoup_tokens_total

Type: Counter
Labels: model, user_id, token_type
Description: Total tokens processed by model and type

Labels:

  • token_type — One of "prompt", "completion", "cached_prompt", "cached_completion"

llmsoup_tokens_per_request

Type: Histogram
Labels: model, user_id, token_type
Description: Token count distribution per request
Buckets: 10, 50, 100, 500, 1000, 2000, 5000, 10000, 50000, 100000

Example PromQL — Hourly cost by model:

sum by (model) (increase(llmsoup_request_cost_dollars_total[1h]))

Example PromQL — Total cost savings:

sum(increase(llmsoup_cost_savings_dollars_total[24h]))
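
Token throughput per model and token type can be sketched as:

sum by (model, token_type) (rate(llmsoup_tokens_total[5m]))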

Model Metrics

Per-model latency, throughput, and error tracking.

llmsoup_model_request_duration_seconds

Type: Histogram
Labels: model, user_id
Description: End-to-end model call latency in seconds
Buckets: 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0 (Prometheus defaults)

llmsoup_model_tpot_seconds

Type: Histogram
Labels: model
Description: Time Per Output Token (TPOT) in seconds
Buckets: 0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0

TPOT measures how long each output token takes to generate. llmsoup uses Exponential Moving Average (EMA) smoothing for TPOT values in its internal cache, but the raw per-request TPOT is recorded in this histogram. Lower TPOT indicates faster model generation speed.

llmsoup_model_request_errors_total

Type: Counter
Labels: model, reason, user_id
Description: Model request errors by model and reason

Labels:

  • reason — Error reason code (timeout, http_error, parse_error, connection_error)

Example PromQL — P99 model latency:

histogram_quantile(0.99, sum by (model, le) (rate(llmsoup_model_request_duration_seconds_bucket[5m])))

Example PromQL — Model error rate:

sum by (model) (rate(llmsoup_model_request_errors_total[5m]))
/ sum by (model) (rate(llmsoup_model_request_duration_seconds_count[5m]))
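
Average TPOT per model can be sketched from the histogram sum and count series:

sum by (model) (rate(llmsoup_model_tpot_seconds_sum[5m])) / sum by (model) (rate(llmsoup_model_tpot_seconds_count[5m]))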

Reasoning Metrics

Reasoning control decisions and effort distribution for models that support reasoning parameters.

llmsoup_reasoning_decisions_total

Type: Counter
Labels: category, model, enabled, effort, user_id
Description: Reasoning decisions by category, model, enabled state, effort level, and user

Labels:

  • category — Matched rule name or "default"
  • model — Routed model name
  • enabled — Whether reasoning is enabled ("true", "false")
  • effort — Reasoning effort level ("low", "medium", "high", "default")

llmsoup_reasoning_effort_usage_total

Type: Counter
Labels: family, effort, user_id
Description: Reasoning effort distribution by family and effort level

Labels:

  • family — Reasoning family type ("reasoning_effort", "chat_template_kwargs")
  • effort — Reasoning effort level ("low", "medium", "high", "default")

Example PromQL — Reasoning usage by effort level:

sum by (effort) (rate(llmsoup_reasoning_decisions_total[5m]))

Example PromQL — Reasoning family distribution:

sum by (family, effort) (rate(llmsoup_reasoning_effort_usage_total[5m]))
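
The share of requests with reasoning enabled can be sketched using the enabled label:

sum(rate(llmsoup_reasoning_decisions_total{enabled="true"}[5m])) / sum(rate(llmsoup_reasoning_decisions_total[5m]))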

Metric Summary

Metric | Type | Category
llmsoup_requests_total | Counter | Request
llmsoup_active_connections | Gauge | Request
llmsoup_errors_total | Counter | Request
llmsoup_routing_decisions_total | Counter | Routing
llmsoup_routing_duration_seconds | Histogram | Routing
llmsoup_decision_evaluation_total | Counter | Routing
llmsoup_decision_match_total | Counter | Routing
llmsoup_decision_confidence | Histogram | Routing
llmsoup_model_selection_total | Counter | Routing
llmsoup_model_routing_modifications_total | Counter | Routing
llmsoup_signal_extraction_total | Counter | Signal
llmsoup_signal_match_total | Counter | Signal
llmsoup_signal_extraction_duration_seconds | Histogram | Signal
llmsoup_signal_errors_total | Counter | Signal
llmsoup_classification_confidence | Histogram | Signal
llmsoup_classification_total | Counter | Signal
llmsoup_cache_operations_total | Counter | Cache
llmsoup_cache_operation_duration_seconds | Histogram | Cache
llmsoup_cache_entries | Gauge | Cache
llmsoup_cache_plugin_hits_total | Counter | Cache
llmsoup_cache_plugin_misses_total | Counter | Cache
llmsoup_plugin_execution_total | Counter | Plugin
llmsoup_plugin_execution_duration_seconds | Histogram | Plugin
llmsoup_plugin_errors_total | Counter | Plugin
llmsoup_pii_violations_total | Counter | Plugin
llmsoup_request_cost_dollars_total | Counter | Cost
llmsoup_request_cost_dollars | Histogram | Cost
llmsoup_routing_cost_dollars_total | Counter | Cost
llmsoup_cost_savings_dollars_total | Counter | Cost
llmsoup_tokens_total | Counter | Cost
llmsoup_tokens_per_request | Histogram | Cost
llmsoup_model_request_duration_seconds | Histogram | Model
llmsoup_model_tpot_seconds | Histogram | Model
llmsoup_model_request_errors_total | Counter | Model
llmsoup_reasoning_decisions_total | Counter | Reasoning
llmsoup_reasoning_effort_usage_total | Counter | Reasoning