Plugins

Plugins add security, caching, and observability capabilities to routing decisions. They execute as hooks around model calls — some run before the request reaches the model, others inspect the response afterward. This page documents the plugin system and every built-in plugin.

For the YAML field reference and rule syntax, see the Configuration Reference. For signal evaluation and rule matching, see Signals & Routing.

How plugins work

Plugins are scoped to routing rules (decisions), not global. Each rule can define its own set of plugins that execute only when that rule matches a request. This gives fine-grained control — you might enable jailbreak detection on public-facing rules while skipping it for internal traffic.

Plugins hook into two phases of the request lifecycle:

Incoming request
        │
        ▼
Signal evaluation & rule matching
        │
        ▼
┌──────────────────────────────────┐
│ Plugin request hooks (pre)       │ ← semantic-cache, jailbreak, pii,
│ Execute in declaration order     │   system_prompt, header_mutation
└──────────────────────────────────┘
        │
        ▼
Model HTTP call
        │
        ▼
┌──────────────────────────────────┐
│ Plugin response hooks (post)     │ ← hallucination, semantic-cache (store),
│ Execute in declaration order     │   header_mutation, router_replay
└──────────────────────────────────┘
        │
        ▼
Response returned to client

Pre-request plugins can inspect or modify the request before it reaches the model. Some can short-circuit the pipeline entirely:

  • semantic-cache returns a cached response if a similar request was seen recently, skipping the model call.
  • jailbreak and pii can block the request and return an error response.
  • system_prompt injects or replaces the system message in the conversation.
  • header_mutation adds, sets, or removes headers on the outbound model request.

Post-response plugins inspect or modify the model response before it reaches the client:

  • hallucination evaluates response quality and can add warning headers, modify the body, or block the response.
  • semantic-cache stores the response for future cache hits.
  • header_mutation can also modify response headers sent back to the client.
  • router_replay captures the full routing decision and optional request/response data for debugging.

Plugins execute in the order they appear in the plugins array of a rule. This order is deterministic and matters — for example, placing jailbreak before system_prompt ensures malicious prompts are caught before any system prompt injection occurs.

Plugin failures do not block routing. If a plugin encounters an error during execution, the error is logged and metrics are recorded, but the request continues through the remaining plugins and model call. This ensures that a misconfigured or failing plugin never takes down the routing pipeline. For broader context on how llmsoup handles component failures, see Graceful degradation in the Signals & Routing guide.
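
This isolation behavior can be sketched as a guarded loop. The sketch below is a minimal illustration, not llmsoup's actual hook runner; the plugin callables and names are hypothetical:

```python
import logging

logger = logging.getLogger("llmsoup.plugins")

def run_request_hooks(plugins, request):
    """Run each plugin's request hook in declaration order.

    A plugin error is logged and counted, but never aborts the chain:
    the request always proceeds to the remaining plugins and the model.
    """
    errors = 0
    for plugin in plugins:
        try:
            result = plugin(request)
            if result is not None:      # a plugin may rewrite the request
                request = result
        except Exception:
            errors += 1                 # a metrics counter in the real system
            logger.exception("plugin failed; continuing")
    return request, errors

# A failing plugin does not stop the pipeline:
def broken(_req):
    raise RuntimeError("misconfigured")

def tag(req):
    return {**req, "tagged": True}

req, errs = run_request_hooks([broken, tag], {"body": "hi"})
```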

Configuration

Plugins are configured within the plugins array of a routing rule. Each plugin entry requires a type field identifying the plugin and a configuration object with plugin-specific settings.

decisions:
  - name: general-routing
    priority: 50
    conditions: []
    action:
      model: gpt-5-mini
      strategy: default
    plugins:
      - type: jailbreak
        configuration:
          enabled: true
          threshold: 0.7
      - type: system_prompt
        configuration:
          enabled: true
          system_prompt: "You are a helpful assistant."
          mode: replace
      - type: hallucination
        configuration:
          enabled: true
          threshold: 0.7
          action: header

Every plugin supports these fields in its configuration:

Field     Type      Default   Description
enabled   boolean   varies    Whether the plugin is active. Disabled plugins are skipped entirely.

Plugin-specific fields are documented in each plugin section below.

semantic-cache

Caches model responses based on semantic similarity of the input. When a new request is similar enough to a previously cached request, the cached response is returned directly — skipping the model call and saving latency and cost.

  1. On request, the plugin computes an embedding of the last user message.
  2. It searches the cache for entries with the same context hash (conversation history minus the last message) and a cosine similarity above the threshold.
  3. On a cache hit, the cached response is injected into the plugin context and the model call is skipped.
  4. On a cache miss, after the model responds, the plugin stores the response keyed by context hash and embedding.

The cache is model-aware — responses are partitioned by the target model name, so a cached GPT-5 response is never returned for a Claude request.

Field                  Type      Default   Description
enabled                boolean   true      Enable or disable the cache.
similarity_threshold   float     0.85      Minimum cosine similarity (0.0–1.0) for a cache hit. Higher values require closer semantic matches.
ttl_seconds            integer   3600      Time-to-live in seconds. Cached entries expire after this duration.

The cache holds a fixed maximum of 10,000 entries (not configurable via YAML) and uses FIFO eviction when full.
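
The lookup, partitioning, and FIFO eviction described above can be sketched roughly as follows. This is a simplified illustration, not llmsoup's implementation — the cosine helper, hashing scheme, and storage layout are all assumptions:

```python
import hashlib
import math
import time
from collections import deque

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """FIFO-bounded cache keyed by (model, context hash), matched by embedding."""

    def __init__(self, threshold=0.85, ttl=3600, max_entries=10_000):
        self.threshold = threshold
        self.ttl = ttl
        self.max_entries = max_entries
        self.entries = deque()  # append order doubles as FIFO eviction order

    @staticmethod
    def context_hash(model, history):
        # history = conversation minus the last user message; keyed per model,
        # so a cached response for one model is never served for another
        raw = model + "\x00" + "\x00".join(history)
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, model, history, embedding):
        key = self.context_hash(model, history)
        now = time.time()
        for entry_key, emb, response, ts in self.entries:
            if entry_key != key or now - ts > self.ttl:
                continue  # wrong context/model, or expired
            if cosine(emb, embedding) >= self.threshold:
                return response  # cache hit: the model call is skipped
        return None

    def put(self, model, history, embedding, response):
        if len(self.entries) >= self.max_entries:
            self.entries.popleft()  # FIFO: evict the oldest entry
        self.entries.append(
            (self.context_hash(model, history), embedding, response, time.time())
        )
```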

- type: semantic-cache
  configuration:
    enabled: true
    similarity_threshold: 0.9
    ttl_seconds: 1800

jailbreak

Detects jailbreak attempts in user messages and blocks them before they reach the model. Uses a hybrid approach combining keyword pattern matching with embedding-based semantic analysis.

  1. The plugin scans the last user message against a built-in list of jailbreak keyword patterns (phrases like “ignore previous instructions”, “DAN mode”, etc.).
  2. If an embedding model is available, it also computes a semantic similarity score against known jailbreak patterns.
  3. The combined score is calculated as 40% keyword match + 60% embedding similarity (or keyword-only if no embedding model is loaded).
  4. If the combined score exceeds the threshold, the request is blocked with an OpenAI-compatible error response using finish_reason: content_filter.
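
The weighting in step 3 is simple to express directly. The functions below are an illustrative sketch of that arithmetic, not the plugin's actual code:

```python
def jailbreak_score(keyword_score, embedding_score=None):
    """Combine the two signals: 40% keyword + 60% embedding.

    Both inputs are in [0.0, 1.0]; embedding_score is None when no
    embedding model is loaded, in which case keywords decide alone.
    """
    if embedding_score is None:
        return keyword_score
    return 0.4 * keyword_score + 0.6 * embedding_score

def should_block(keyword_score, embedding_score=None, threshold=0.7):
    """Block the request when the combined score exceeds the threshold."""
    return jailbreak_score(keyword_score, embedding_score) > threshold
```

Note that a strong embedding signal can push a borderline keyword match over the threshold, which is the point of the hybrid approach.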

Field       Type      Default   Description
enabled     boolean   true      Enable or disable jailbreak detection.
threshold   float     0.7       Detection threshold (0.0–1.0). Lower values are more aggressive.

- type: jailbreak
  configuration:
    enabled: true
    threshold: 0.65

pii

Detects personally identifiable information (PII) in user messages and blocks requests containing sensitive data. Uses hybrid regex pattern matching combined with embedding-based detection.

  1. The plugin runs regex patterns for 14 PII types against the last user message.
  2. If an embedding model is available, it also computes semantic similarity for PII-related content.
  3. The combined score is calculated as 70% regex match + 30% embedding similarity. When regex patterns match, the regex component contributes a high confidence score (0.95).
  4. If PII is detected above the threshold, the request is blocked with an OpenAI-compatible error response using finish_reason: content_filter.

The 14 built-in PII types are: EMAIL_ADDRESS, PHONE_NUMBER, US_SSN, CREDIT_CARD, IP_ADDRESS, IBAN_CODE, STREET_ADDRESS, PERSON, DOMAIN_NAME, DATE_TIME, AGE, US_DRIVER_LICENSE, ZIP_CODE, ORGANIZATION.
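
The scoring and allow-list behavior can be sketched as follows. This is an illustration only — the two regex patterns stand in for the 14 real ones, and the function shape is an assumption, not llmsoup's API:

```python
import re

# Illustrative patterns for two of the built-in PII types.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def detect_pii(text, allowed=(), threshold=0.7, embedding_score=0.0):
    """Return (blocked, detected_types) for the last user message.

    A regex match contributes a fixed 0.95 confidence, combined
    70/30 with the optional embedding score, as described above.
    Types listed in `allowed` never trigger blocking.
    """
    detected = [t for t, p in PATTERNS.items()
                if p.search(text) and t not in allowed]
    regex_score = 0.95 if detected else 0.0
    score = 0.7 * regex_score + 0.3 * embedding_score
    return score > threshold, detected
```

Listing a type in pii_types_allowed removes its matches before scoring, so an allowed type alone never blocks a request.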

Field               Type      Default   Description
enabled             boolean   true      Enable or disable PII detection.
threshold           float     0.7       Detection threshold (0.0–1.0). Lower values are more sensitive.
pii_types_allowed   list      []        PII types to allow through without blocking. Use type names from the list above.

- type: pii
  configuration:
    enabled: true
    threshold: 0.7
    pii_types_allowed:
      - DOMAIN_NAME
      - DATE_TIME

system_prompt

Injects or replaces the system message in the conversation before it reaches the model. Useful for enforcing consistent behavior, adding safety instructions, or customizing model personality per routing rule.

The plugin operates in one of two modes:

  • replace (default) — Replaces the first system message in the conversation. If no system message exists, the configured prompt is prepended as a new system message.
  • insert — Always prepends a new system message at position 0, regardless of existing system messages.

This plugin only runs on request (pre-model). It has no post-response behavior.
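
The two modes can be sketched on an OpenAI-style message list. A minimal illustration of the behavior described above, not llmsoup's implementation:

```python
def apply_system_prompt(messages, system_prompt, mode="replace"):
    """Apply replace/insert injection to a list of chat messages."""
    messages = list(messages)  # do not mutate the caller's list
    if mode == "insert":
        # insert: always prepend, regardless of existing system messages
        messages.insert(0, {"role": "system", "content": system_prompt})
        return messages
    # replace: rewrite the first system message, or prepend if none exists
    for i, m in enumerate(messages):
        if m["role"] == "system":
            messages[i] = {"role": "system", "content": system_prompt}
            return messages
    messages.insert(0, {"role": "system", "content": system_prompt})
    return messages
```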

Field           Type      Default     Description
enabled         boolean   true        Enable or disable prompt injection.
system_prompt   string    required    The system prompt text to inject.
mode            string    "replace"   Injection mode: replace or insert.

- type: system_prompt
  configuration:
    enabled: true
    system_prompt: "You are a helpful coding assistant. Always include code examples."
    mode: replace

header_mutation

Manipulates HTTP headers on requests sent to models and/or responses returned to clients. Supports adding, setting, and removing headers across request and response phases.

Each mutation specifies an operation, a header name, an optional value, and a phase:

  • Operations:

    • add — Adds the header only if it is not already present.
    • set — Sets the header unconditionally, overwriting any existing value.
    • remove — Removes the header if present.
  • Phases:

    • request — Applies to the outbound request sent to the model.
    • response — Applies to the response returned to the client.
    • both — Applies to both request and response.

Certain restricted headers cannot be modified: authorization, content-type, content-length, and host.
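
The operation, phase, and restricted-header rules above can be sketched as a single pass over the mutation list. An illustrative sketch only; the function shape and header-dict representation are assumptions:

```python
def apply_mutations(headers, mutations, phase):
    """Apply add/set/remove mutations for one phase ("request" or "response")."""
    RESTRICTED = {"authorization", "content-type", "content-length", "host"}
    headers = dict(headers)  # work on a copy
    for m in mutations:
        if m["phase"] not in (phase, "both"):
            continue  # mutation targets the other phase
        name = m["header"].lower()
        if name in RESTRICTED:
            continue  # restricted headers are never modified
        if m["operation"] == "add" and name not in headers:
            headers[name] = m["value"]   # add: only if absent
        elif m["operation"] == "set":
            headers[name] = m["value"]   # set: unconditional overwrite
        elif m["operation"] == "remove":
            headers.pop(name, None)      # remove: drop if present
    return headers
```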

Field       Type      Default   Description
enabled     boolean   true      Enable or disable header mutation.
mutations   list      []        Array of mutation objects (see below).

Each mutation object:

Field       Type     Description
operation   string   add, set, or remove.
header      string   Header name.
value       string   Header value (not required for remove).
phase       string   request, response, or both.

- type: header_mutation
  configuration:
    enabled: true
    mutations:
      - operation: set
        header: x-routed-by
        value: llmsoup
        phase: response
      - operation: add
        header: x-request-source
        value: internal
        phase: request
      - operation: remove
        header: x-debug-trace
        phase: both

hallucination

Detects potential hallucinations in model responses using heuristic analysis. Checks for hedging language, fabrication indicators, and self-contradictions. Runs post-response only.

  1. The plugin analyzes the model response text for three hallucination signals:
    • Hedging (weight: 30%) — phrases indicating uncertainty (“I think”, “it’s possible”, etc.). Produces a continuous score (0.0–1.0) based on pattern density.
    • Fabrication indicators (weight: 40%) — patterns suggesting made-up information. Produces a continuous score (0.0–1.0) based on pattern density.
    • Self-contradictions (weight: 30%) — conflicting statements within the response. Binary detection: contributes a fixed penalty of 0.5 when contradictions are found, 0.0 otherwise.
  2. The hallucination score is calculated as: (hedging × 0.3 + fabrication × 0.4 + contradiction_penalty × 0.3) × (0.5 + sensitivity), where contradiction_penalty is 0.5 when contradictions are detected or 0.0 otherwise.
  3. If the score exceeds the threshold, the configured action is taken.
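
The formula in step 2 can be written out directly. An illustrative sketch of the arithmetic, not the plugin's code:

```python
def hallucination_score(hedging, fabrication, contradictions_found, sensitivity=0.5):
    """Weighted score per the formula above.

    hedging and fabrication are continuous densities in [0.0, 1.0];
    contradictions_found is a binary detection flag.
    """
    contradiction_penalty = 0.5 if contradictions_found else 0.0
    base = hedging * 0.3 + fabrication * 0.4 + contradiction_penalty * 0.3
    return base * (0.5 + sensitivity)
```

Worked example: hedging 0.6, fabrication 0.5, contradictions detected, sensitivity 0.5 gives (0.18 + 0.20 + 0.15) × 1.0 = 0.53 — below the default 0.7 threshold, so no action fires; raising heuristic_sensitivity to 1.0 scales the same inputs to 0.795, which does.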

Action   Behavior
header   Adds x-llmsoup-hallucination-score and x-llmsoup-hallucination-detected response headers. The original response is returned unchanged.
body     Injects _llmsoup_warnings into the response body alongside the original content.
block    Replaces the entire response with an error message indicating hallucination was detected.
log      Logs the detection. No modification to the response.

Field                   Type      Default    Description
enabled                 boolean   true       Enable or disable hallucination detection.
threshold               float     0.7        Score threshold (0.0–1.0) to trigger the action.
action                  string    "header"   Action on detection: header, body, block, or log.
heuristic_sensitivity   float     0.5        Sensitivity multiplier (0.0–1.0). Higher values produce higher scores.

- type: hallucination
  configuration:
    enabled: true
    threshold: 0.6
    action: header
    heuristic_sensitivity: 0.7

router_replay

Captures routing decisions and optional request/response data for debugging and replay. This is primarily a development and troubleshooting tool — it is disabled by default.

  1. On every request that passes through a rule with this plugin enabled, the plugin records the routing decision: matched rule name, priority, selected model, strategy, and algorithm details.
  2. Optionally, it captures truncated snippets of the request body and response body.
  3. Records are stored in a bounded in-memory store with FIFO eviction when max_records is reached.
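
The bounded store with FIFO eviction and body truncation can be sketched with a fixed-length deque. A simplified illustration under assumed record and parameter shapes, not llmsoup's implementation:

```python
from collections import deque

class ReplayStore:
    """Bounded in-memory record store; deque(maxlen=...) gives FIFO eviction."""

    def __init__(self, max_records=200, max_body_bytes=4096):
        self.records = deque(maxlen=max_records)
        self.max_body_bytes = max_body_bytes

    def capture(self, decision, request_body=None, response_body=None):
        # decision carries rule name, priority, model, strategy, etc.
        record = dict(decision)
        if request_body is not None:
            record["request_body"] = request_body[: self.max_body_bytes]
        if response_body is not None:
            record["response_body"] = response_body[: self.max_body_bytes]
        self.records.append(record)  # oldest record drops out when full
```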

Field                   Type      Default   Description
enabled                 boolean   false     Enable or disable replay capture. Disabled by default.
max_records             integer   200       Maximum number of records to keep in the replay store.
capture_request_body    boolean   false     Whether to capture a snippet of the request body.
capture_response_body   boolean   false     Whether to capture a snippet of the response body.
max_body_bytes          integer   4096      Maximum bytes to capture per request/response body.

- type: router_replay
  configuration:
    enabled: true
    max_records: 500
    capture_request_body: true
    capture_response_body: true
    max_body_bytes: 2048

Metrics

All plugins emit Prometheus metrics for monitoring execution health and performance. These metrics use the standard llmsoup_ prefix.

Metric                                      Type        Labels                                        Description
llmsoup_plugin_execution_total              Counter     plugin_type, decision_name, status, user_id   Total plugin executions. status is success or error.
llmsoup_plugin_execution_duration_seconds   Histogram   plugin_type, user_id                          Plugin execution latency distribution.
llmsoup_plugin_errors_total                 Counter     plugin_type, error_reason, user_id            Plugin failures. error_reason is one of: execution_failed, configuration_error, timeout, internal_error.

The PII plugin also emits:

Metric                         Type      Labels                     Description
llmsoup_pii_violations_total   Counter   model, pii_type, user_id   PII violations detected, broken down by PII type.

For the complete metrics catalog, see the Metrics Reference.

Custom plugins

The plugin system is built-in only. llmsoup does not expose a custom plugin API — all available plugins are listed on this page. To add new plugin behavior, you must modify the llmsoup source code.