
Metrics Reference

llmsoup exposes 36 Prometheus metrics covering requests, routing, signals, caching, plugins, costs, models, and reasoning. All metrics are available at the /metrics endpoint in Prometheus text exposition format.

URL: GET /metrics
Authentication: Not required
Content-Type: text/plain; version=0.0.4
Format: Prometheus text exposition format

curl http://localhost:8080/metrics
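
The response body uses the standard text exposition layout: # HELP and # TYPE comment lines followed by one line per labeled series. The sample below is illustrative, not captured output:

# HELP llmsoup_requests_total Total HTTP requests received
# TYPE llmsoup_requests_total counter
llmsoup_requests_total{method="POST",status_code="200",endpoint="/v1/chat/completions",user_id="anonymous"} 1027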

For Prometheus scrape configuration and Docker Compose monitoring setup, see the Deployment Guide.


All llmsoup metrics follow Prometheus best practices:

  • Prefix: Every metric starts with llmsoup_
  • Case: snake_case for metric names and label names
  • Unit suffixes:
    • _total — counters (monotonically increasing)
    • _seconds — duration histograms
    • _dollars — cost values (project-specific, not a Prometheus standard)
    • _bytes — sizes (reserved, not currently used)
  • Label conventions:
    • user_id — authenticated user or "anonymous"
    • model — model name from routing configuration
    • decision_name — routing rule/decision name
    • status — operation outcome (success, error, hit, miss)
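
Putting the conventions together, individual series look like the lines below. The model, user, decision, and currency values shown (gpt-4o, alice, cheap-default, USD) are placeholders for whatever appears in your configuration, and the sample values are made up:

llmsoup_request_cost_dollars_total{model="gpt-4o",user_id="alice",currency="USD"} 0.4187
llmsoup_routing_duration_seconds_bucket{decision_name="cheap-default",user_id="alice",le="0.05"} 912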

Request Metrics

Core HTTP request tracking.

llmsoup_requests_total

Type: Counter
Labels: method, status_code, endpoint, user_id
Description: Total HTTP requests received

Labels:

  • method — HTTP method (GET, POST)
  • status_code — HTTP status code (200, 401, 500)
  • endpoint — Request path (/v1/chat/completions, /metrics)
  • user_id — Authenticated user ID or "anonymous"

llmsoup_active_connections

Type: Gauge
Labels: None
Description: Current active HTTP connections

llmsoup_errors_total

Type: Counter
Labels: error_type, user_id
Description: Total errors by type

Labels:

  • error_type — Error category (routing_error, model_error, config_error)
  • user_id — Authenticated user ID or "anonymous"

Example PromQL — Request rate by endpoint:

rate(llmsoup_requests_total{endpoint="/v1/chat/completions"}[5m])

Example PromQL — Error rate percentage:

sum(rate(llmsoup_errors_total[5m])) / sum(rate(llmsoup_requests_total[5m])) * 100
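
A per-user traffic breakdown can be sketched the same way, using only the labels documented above:

sum by (user_id) (rate(llmsoup_requests_total[5m]))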

Routing Metrics

Routing decision evaluation, rule matching, and model selection.

llmsoup_routing_decisions_total

Type: Counter
Labels: outcome, decision_name, user_id
Description: Routing decision outcomes

Labels:

  • outcome — Result of routing (success, failed, fallback, default)
  • decision_name — Name of the matched routing rule
  • user_id — Authenticated user ID or "anonymous"

llmsoup_routing_duration_seconds

Type: Histogram
Labels: decision_name, user_id
Description: Time spent making routing decisions
Buckets: 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0 (Prometheus defaults)

llmsoup_decision_evaluation_total

Type: Counter
Labels: user_id
Description: Total routing rule evaluations

llmsoup_decision_match_total

Type: Counter
Labels: decision_name, user_id
Description: Total routing rule matches by decision name

llmsoup_decision_confidence

Type: Histogram
Labels: decision_name, user_id
Description: Algorithm confidence/score for routing decisions
Buckets: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99

llmsoup_model_selection_total

Type: Counter
Labels: algorithm_type, model, decision_name, user_id
Description: Total algorithm-based model selections

Labels:

  • algorithm_type — Algorithm used ("confidence" or "ratings")
  • model — Selected model name
  • decision_name — Name of the matched routing rule
  • user_id — Authenticated user ID or "anonymous"

llmsoup_model_routing_modifications_total

Type: Counter
Labels: source_model, target_model, user_id
Description: Total fallback/parallel re-routing events

Labels:

  • source_model — Original primary model before re-routing
  • target_model — Actual model that responded after re-routing

Example PromQL — Routing decisions per second by outcome:

sum by (outcome) (rate(llmsoup_routing_decisions_total[5m]))

Example PromQL — Average routing latency:

rate(llmsoup_routing_duration_seconds_sum[5m]) / rate(llmsoup_routing_duration_seconds_count[5m])
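
For tail latency rather than the average, a P95 over the same histogram can be sketched as:

histogram_quantile(0.95, sum by (le) (rate(llmsoup_routing_duration_seconds_bucket[5m])))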

Signal Metrics

Per-signal evaluation tracking across all signal types: keyword, embedding, domain, language, latency, fact_check, user_feedback, preference.

llmsoup_signal_extraction_total

Type: Counter
Labels: signal_type, signal_name, user_id
Description: Signal extraction attempts

Labels:

  • signal_type — Signal evaluator type (keyword, embedding, domain, language, latency, fact_check, user_feedback, preference)
  • signal_name — Specific signal name from evaluator configuration

llmsoup_signal_match_total

Type: Counter
Labels: signal_type, signal_name, user_id
Description: Successful signal matches (triggered=true)

llmsoup_signal_extraction_duration_seconds

Type: Histogram
Labels: signal_type, user_id
Description: Per-signal evaluation latency
Buckets: 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0

llmsoup_signal_errors_total

Type: Counter
Labels: signal_type, error_type, user_id
Description: Signal evaluation failures

llmsoup_classification_confidence

Type: Histogram
Labels: category, classification_method, user_id
Description: Classification confidence scores
Buckets: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99

Labels:

  • category — Predicted classification category (e.g., "math", "computer_science")
  • classification_method — Classifier type ("domain", "fact_check", "preference")

llmsoup_classification_total

Type: Counter
Labels: category, user_id
Description: Classification outcomes across all classifier-based evaluators

Tracks domain, fact_check, and preference classification events. Incremented once per signal evaluation.

Example PromQL — Signal evaluation rate by type:

sum by (signal_type) (rate(llmsoup_signal_extraction_total[5m]))

Example PromQL — Signal match ratio (how often signals trigger):

sum by (signal_type) (rate(llmsoup_signal_match_total[5m]))
/ sum by (signal_type) (rate(llmsoup_signal_extraction_total[5m]))
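
Per-signal tail latency can be sketched from the duration histogram in the same way:

histogram_quantile(0.99, sum by (signal_type, le) (rate(llmsoup_signal_extraction_duration_seconds_bucket[5m])))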

Cache Metrics

Observability for three cache backends tracked by Prometheus metrics:

  • model_response — TTL-based cache for model responses
  • embedding — Capacity-based LRU cache for embedding vectors
  • semantic — Cache used by the semantic cache plugin

Note: llmsoup also maintains an internal TPOT (Time Per Output Token) EMA cache for latency-aware routing. TPOT values are tracked via llmsoup_model_tpot_seconds in the Model metrics section rather than through cache-level metrics.

llmsoup_cache_operations_total

Type: Counter
Labels: backend, operation, status
Description: Cache operations by backend, operation, and status

Labels:

  • backend — Cache backend (model_response, embedding, semantic)
  • operation — Operation type (get, put, evict)
  • status — Operation outcome (hit, miss, error)

llmsoup_cache_operation_duration_seconds

Type: Histogram
Labels: backend, operation
Description: Cache operation latency
Buckets: 0.00001, 0.00005, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5

Buckets are microsecond-to-millisecond focused for in-memory caches.

llmsoup_cache_entries

Type: Gauge
Labels: backend
Description: Current cache entry count by backend

llmsoup_cache_plugin_hits_total

Type: Counter
Labels: decision_name
Description: Semantic cache plugin hits by decision name

llmsoup_cache_plugin_misses_total

Type: Counter
Labels: decision_name
Description: Semantic cache plugin misses by decision name

Example PromQL — Cache hit rate by backend:

sum by (backend) (rate(llmsoup_cache_operations_total{status="hit"}[5m]))
/ sum by (backend) (rate(llmsoup_cache_operations_total{operation="get"}[5m]))

Example PromQL — Semantic cache plugin effectiveness:

sum(rate(llmsoup_cache_plugin_hits_total[5m]))
/ (sum(rate(llmsoup_cache_plugin_hits_total[5m])) + sum(rate(llmsoup_cache_plugin_misses_total[5m])))
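
Lookup tail latency per backend can be sketched from the operation duration histogram:

histogram_quantile(0.99, sum by (backend, le) (rate(llmsoup_cache_operation_duration_seconds_bucket{operation="get"}[5m])))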

Plugin Metrics

Execution tracking for all plugin types: semantic-cache, jailbreak, pii, system_prompt, header_mutation, hallucination, router_replay.

llmsoup_plugin_execution_total

Type: Counter
Labels: plugin_type, decision_name, status, user_id
Description: Plugin executions by type, decision, status, and user

Labels:

  • plugin_type — Plugin type (semantic-cache, jailbreak, pii, system_prompt, header_mutation, hallucination, router_replay)
  • decision_name — Name of the matched routing rule
  • status — Execution status (success, error)

llmsoup_plugin_execution_duration_seconds

Type: Histogram
Labels: plugin_type, user_id
Description: Plugin execution latency
Buckets: 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0

Expect wide variance: regex-only plugins run in roughly 10-100 µs, embedding-based plugins in 10-500 ms, and LLM-calling plugins in 100 ms to 5 s.

llmsoup_plugin_errors_total

Type: Counter
Labels: plugin_type, error_reason, user_id
Description: Plugin execution failures by type and error reason

Labels:

  • error_reason — Error category (execution_failed, configuration_error, timeout, internal_error)

llmsoup_pii_violations_total

Type: Counter
Labels: model, pii_type, user_id
Description: PII violations detected by model, type, and user

Labels:

  • pii_type — PII type detected (EMAIL_ADDRESS, PHONE_NUMBER, etc.)

Example PromQL — Plugin error rate by type:

sum by (plugin_type) (rate(llmsoup_plugin_errors_total[5m]))
/ sum by (plugin_type) (rate(llmsoup_plugin_execution_total[5m]))

Example PromQL — PII violations over time:

sum by (pii_type) (rate(llmsoup_pii_violations_total[5m]))
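
Plugin tail latency by type follows the same histogram_quantile pattern:

histogram_quantile(0.99, sum by (plugin_type, le) (rate(llmsoup_plugin_execution_duration_seconds_bucket[5m])))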

Cost Metrics

Per-request cost tracking and cost-aware routing savings.

llmsoup_request_cost_dollars_total

Type: Counter
Labels: model, user_id, currency
Description: Cumulative request cost in dollars by model

llmsoup_request_cost_dollars

Type: Histogram
Labels: model, user_id
Description: Per-request cost distribution in dollars
Buckets: 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0

llmsoup_routing_cost_dollars_total

Type: Counter
Labels: user_id, currency
Description: Cumulative cost of external routing calls (e.g., preference signal LLM calls)

llmsoup_cost_savings_dollars_total

Type: Counter
Labels: decision_name, user_id, currency
Description: Estimated cost savings from cost-aware routing

llmsoup_tokens_total

Type: Counter
Labels: model, user_id, token_type
Description: Total tokens processed by model and type

Labels:

  • token_type — One of "prompt", "completion", "cached_prompt", "cached_completion"

llmsoup_tokens_per_request

Type: Histogram
Labels: model, user_id, token_type
Description: Token count distribution per request
Buckets: 10, 50, 100, 500, 1000, 2000, 5000, 10000, 50000, 100000

Example PromQL — Hourly cost by model:

sum by (model) (increase(llmsoup_request_cost_dollars_total[1h]))

Example PromQL — Total cost savings:

sum(increase(llmsoup_cost_savings_dollars_total[24h]))
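
Token throughput per model and token type can be sketched as:

sum by (model, token_type) (rate(llmsoup_tokens_total[5m]))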

Model Metrics

Per-model latency, throughput, and error tracking.

llmsoup_model_request_duration_seconds

Type: Histogram
Labels: model, user_id
Description: End-to-end model call latency in seconds
Buckets: 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0 (Prometheus defaults)

llmsoup_model_tpot_seconds

Type: Histogram
Labels: model
Description: Time Per Output Token (TPOT) in seconds
Buckets: 0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0

TPOT measures how long each output token takes to generate. llmsoup uses Exponential Moving Average (EMA) smoothing for TPOT values in its internal cache, but the raw per-request TPOT is recorded in this histogram. Lower TPOT indicates faster model generation speed.

llmsoup_model_request_errors_total

Type: Counter
Labels: model, reason, user_id
Description: Model request errors by model and reason

Labels:

  • reason — Error reason code (timeout, http_error, parse_error, connection_error)

Example PromQL — P99 model latency:

histogram_quantile(0.99, sum by (model, le) (rate(llmsoup_model_request_duration_seconds_bucket[5m])))

Example PromQL — Model error rate:

sum by (model) (rate(llmsoup_model_request_errors_total[5m]))
/ sum by (model) (rate(llmsoup_model_request_duration_seconds_count[5m]))
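
Average TPOT per model can be sketched from the histogram sum and count series:

sum by (model) (rate(llmsoup_model_tpot_seconds_sum[5m])) / sum by (model) (rate(llmsoup_model_tpot_seconds_count[5m]))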

Reasoning Metrics

Reasoning control decisions and effort distribution for models that support reasoning parameters.

llmsoup_reasoning_decisions_total

Type: Counter
Labels: category, model, enabled, effort, user_id
Description: Reasoning decisions by category, model, enabled state, effort level, and user

Labels:

  • category — Matched rule name or "default"
  • model — Routed model name
  • enabled — Whether reasoning is enabled ("true", "false")
  • effort — Reasoning effort level ("low", "medium", "high", "default")

llmsoup_reasoning_effort_usage_total

Type: Counter
Labels: family, effort, user_id
Description: Reasoning effort distribution by family and effort level

Labels:

  • family — Reasoning family type ("reasoning_effort", "chat_template_kwargs")
  • effort — Reasoning effort level ("low", "medium", "high", "default")

Example PromQL — Reasoning usage by effort level:

sum by (effort) (rate(llmsoup_reasoning_decisions_total[5m]))

Example PromQL — Reasoning family distribution:

sum by (family, effort) (rate(llmsoup_reasoning_effort_usage_total[5m]))
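
The share of requests with reasoning enabled can be sketched using the enabled label:

sum(rate(llmsoup_reasoning_decisions_total{enabled="true"}[5m])) / sum(rate(llmsoup_reasoning_decisions_total[5m]))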

Metric Summary

Metric | Type | Category
llmsoup_requests_total | Counter | Request
llmsoup_active_connections | Gauge | Request
llmsoup_errors_total | Counter | Request
llmsoup_routing_decisions_total | Counter | Routing
llmsoup_routing_duration_seconds | Histogram | Routing
llmsoup_decision_evaluation_total | Counter | Routing
llmsoup_decision_match_total | Counter | Routing
llmsoup_decision_confidence | Histogram | Routing
llmsoup_model_selection_total | Counter | Routing
llmsoup_model_routing_modifications_total | Counter | Routing
llmsoup_signal_extraction_total | Counter | Signal
llmsoup_signal_match_total | Counter | Signal
llmsoup_signal_extraction_duration_seconds | Histogram | Signal
llmsoup_signal_errors_total | Counter | Signal
llmsoup_classification_confidence | Histogram | Signal
llmsoup_classification_total | Counter | Signal
llmsoup_cache_operations_total | Counter | Cache
llmsoup_cache_operation_duration_seconds | Histogram | Cache
llmsoup_cache_entries | Gauge | Cache
llmsoup_cache_plugin_hits_total | Counter | Cache
llmsoup_cache_plugin_misses_total | Counter | Cache
llmsoup_plugin_execution_total | Counter | Plugin
llmsoup_plugin_execution_duration_seconds | Histogram | Plugin
llmsoup_plugin_errors_total | Counter | Plugin
llmsoup_pii_violations_total | Counter | Plugin
llmsoup_request_cost_dollars_total | Counter | Cost
llmsoup_request_cost_dollars | Histogram | Cost
llmsoup_routing_cost_dollars_total | Counter | Cost
llmsoup_cost_savings_dollars_total | Counter | Cost
llmsoup_tokens_total | Counter | Cost
llmsoup_tokens_per_request | Histogram | Cost
llmsoup_model_request_duration_seconds | Histogram | Model
llmsoup_model_tpot_seconds | Histogram | Model
llmsoup_model_request_errors_total | Counter | Model
llmsoup_reasoning_decisions_total | Counter | Reasoning
llmsoup_reasoning_effort_usage_total | Counter | Reasoning