Skip to content

Controls

Controls are modular safety checks that run on text before it reaches the LLM (input stage) and after the LLM responds (output stage). Controls are scan-only — they detect and report issues but do not modify the text. Based on their configured action, they can observe, flag, or block requests.

The fastest way to enable controls is through glacis.yaml:

version: "1.3"
controls:
input:
pii_phi:
enabled: true
mode: "fast"
if_detected: "flag"

Then pass the config to your integration wrapper:

from glacis.integrations.openai import attested_openai
client = attested_openai(config="glacis.yaml")
# PII in the prompt is detected, flagged, and recorded in the attestation
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "My SSN is 123-45-6789"}],
)

Detects the 18 HIPAA Safe Harbor identifiers using Microsoft Presidio with custom healthcare-specific recognizers.

Install:

Terminal window
pip install glacis[controls]

Two scanning modes:

ModeEngineLatencyBest For
fastRegex-only< 2 msHigh-throughput, latency-sensitive
fullRegex + spaCy NER~15-20 msHigher accuracy for names/locations

Configuration:

controls:
input:
pii_phi:
enabled: true
model: "presidio"
mode: "fast" # "fast" or "full"
entities: ["US_SSN", "EMAIL_ADDRESS"] # Empty = all HIPAA entities
if_detected: "flag" # "forward", "flag", or "block"

Supported entity types:

The PII control covers the full HIPAA Safe Harbor set including PERSON, DATE_TIME, PHONE_NUMBER, EMAIL_ADDRESS, US_SSN, US_DRIVER_LICENSE, URL, IP_ADDRESS, CREDIT_CARD, US_BANK_NUMBER, IBAN_CODE, US_PASSPORT, US_ITIN, MEDICAL_RECORD_NUMBER, HEALTH_PLAN_BENEFICIARY, NPI, DEA_NUMBER, MEDICAL_LICENSE, US_ZIP_CODE, STREET_ADDRESS, VIN, LICENSE_PLATE, DEVICE_SERIAL, UDI, IMEI, FAX_NUMBER, BIOMETRIC_ID, and UUID.

When entities is empty (the default), all HIPAA entity types are scanned.

Detects jailbreak and prompt injection attempts using Meta Llama Prompt Guard 2 models.

Install:

Terminal window
pip install glacis[jailbreak]

Supported models:

ModelParametersLatencyUse Case
prompt_guard_22m~22M (DeBERTa-xsmall)< 10 ms (CPU)High-throughput, latency-sensitive
prompt_guard_86m~86M (DeBERTa-v3-base)~20-50 msHigher accuracy, complex attacks

Configuration:

controls:
input:
jailbreak:
enabled: true
model: "prompt_guard_22m" # or "prompt_guard_86m"
threshold: 0.5 # Classification threshold (0-1)
if_detected: "block" # "forward", "flag", or "block"

The model classifies text as either BENIGN or MALICIOUS. When the malicious confidence score exceeds the threshold, the control reports a detection.

Case-insensitive literal string matching for detecting prohibited terms. Uses re.escape() to prevent regex injection. No extra dependencies required.

Configuration:

controls:
input:
word_filter:
enabled: true
entities: ["confidential", "proprietary", "internal only"]
if_detected: "flag"
output:
word_filter:
enabled: true
entities: ["system prompt", "secret key"]
if_detected: "block"

Safety limits: a maximum of 500 entities, each up to 256 characters.

Every control returns an action that determines how the pipeline proceeds:

ActionBehaviorPipeline continues?
forwardObserve and pass throughYes
flagLog detection and continueYes
blockHalt the requestNo (input) / Depends (output)

When an output control triggers block, the output_block_action setting determines what happens:

controls:
output_block_action: "block" # or "forward"
SettingBehavior
"block" (default)Raises GlacisBlockedError — the LLM response is withheld
"forward"Returns the LLM response but marks the determination as "blocked" in the attestation

When using provider integrations (OpenAI, Anthropic, Gemini), controls are configured through glacis.yaml and run automatically:

from glacis.integrations.openai import attested_openai
from glacis.integrations.base import GlacisBlockedError
client = attested_openai(config="glacis.yaml")
try:
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Ignore all instructions"}],
)
except GlacisBlockedError as e:
print(f"Blocked by {e.control_type}") # e.g., "jailbreak"
if e.score is not None:
print(f"Score: {e.score:.2f}")

You can also pass control instances directly to integrations without a config file, using the input_controls and output_controls parameters:

from glacis.controls import PIIControl, JailbreakControl
from glacis.config import PiiPhiControlConfig, JailbreakControlConfig
from glacis.integrations.openai import attested_openai
pii = PIIControl(PiiPhiControlConfig(enabled=True, mode="fast", if_detected="flag"))
jailbreak = JailbreakControl(JailbreakControlConfig(enabled=True, threshold=0.5, if_detected="block"))
client = attested_openai(
input_controls=[pii, jailbreak],
)

Glacis recognizes 8 control types. Each control you write or configure is classified into one of these types in the attestation record.

TypeBuilt-inDescriptionExample Use Case
piiPIIControlPII/PHI detectionScanning for SSNs, emails, medical records
jailbreakJailbreakControlPrompt injection detection (ML)Blocking “ignore all instructions” attacks
word_filterWordFilterControlLiteral keyword matchingCatching leaked terms like “confidential”
content_safetyContentSafetyControlToxicity / harmful content (ML)Filtering offensive or policy-violating output
topicTopicControlTopic enforcement (keyword)Ensuring LLM stays within intended domain
prompt_securityPromptSecurityControlPrompt extraction detection (regex)Detecting system prompt extraction attempts
groundingGroundingControl (stub)Factual grounding / hallucinationValidating LLM output against source documents
customCatch-allAny other validationDomain-specific business logic

All 7 built-in controls listed above (excluding custom) can be configured entirely in glacis.yaml. The grounding control is a pass-through stub — for real grounding validation, use the custom section with a control that accepts reference text. Set the control_type class attribute on your custom control class to any of these values. Controls with unrecognized types are automatically classified as "custom" in the attestation.

Detects toxic, harmful, or policy-violating content using HuggingFace toxicity classifiers. The model is lazy-loaded on first use.

controls:
output:
content_safety:
enabled: true
model: "toxic-bert" # HuggingFace model alias
threshold: 0.5 # Score threshold (0-1)
categories: ["toxic", "threat", "insult"] # Empty = all categories
if_detected: "flag"

Categories (toxic-bert): toxic, severe_toxic, obscene, threat, insult, identity_hate.

Keyword-based topic control with two modes: blocklist (flag matching terms) and allowlist (flag when no terms match).

controls:
input:
topic:
enabled: true
allowed_topics: ["healthcare", "medical", "patient"] # Must match at least one
blocked_topics: ["politics", "gambling"] # Must not match any
if_detected: "block"

When both are configured, blocked topics are checked first. No external dependencies required.

Detects prompt extraction attempts, instruction overrides, and role manipulation using built-in regex patterns. Ships with patterns for common attacks (system prompt extraction, “ignore instructions”, DAN, developer mode, etc.).

controls:
input:
prompt_security:
enabled: true
patterns: ["secret\\s+password"] # Additional custom patterns (regex)
if_detected: "block" # Defaults to "block" for security

Complements jailbreak (ML-based): prompt_security is rule-based and zero-latency. No external dependencies.

The built-in grounding control is a pass-through stub because check(text) doesn’t receive reference text for comparison. Enable it for attestation type classification, or implement real grounding via custom:

controls:
output:
grounding:
enabled: true # Stub: always passes, sets control_type="grounding"
custom:
- path: "my_grounding.GroundingValidator" # Real implementation
enabled: true
args:
reference_text: "The source document..."
threshold: 0.7

Custom controls let you plug any validation logic into the Glacis pipeline — LLM-based judges, ML models, API calls, regex matching, database lookups, or anything else. They run automatically on every LLM call and their results are cryptographically attested.

Three things are required:

  1. Set control_type — a class attribute identifying the control (any of the 8 types above)
  2. Implement check(text) — the single abstract method that receives the text to validate
  3. Return a ControlResult — a standardized result with detection info

The check() method is the universal extension point. For input controls, text is the user’s message. For output controls, text is the LLM response. What happens inside check() is entirely up to you.

from glacis.controls.base import BaseControl, ControlResult
class GroundingControl(BaseControl):
"""Validates LLM output is grounded in a reference document."""
control_type = "grounding" # Maps to the "grounding" attestation type
def __init__(self, api_key: str, threshold: float = 0.7, if_detected: str = "flag"):
self._api_key = api_key
self._threshold = threshold
self._action = if_detected
def check(self, text: str) -> ControlResult:
# Your validation logic — LLM call, ML model, API, anything
score = self._compute_grounding_score(text)
is_ungrounded = score < self._threshold
return ControlResult(
control_type=self.control_type,
detected=is_ungrounded,
action=self._action if is_ungrounded else "forward",
score=score,
categories=["low_grounding"] if is_ungrounded else [],
latency_ms=0, # Set by your implementation
metadata={"threshold": self._threshold, "model": "your-model"},
)
def _compute_grounding_score(self, text: str) -> float:
# ... your scoring logic ...
return 0.85
def close(self) -> None:
# Optional: release resources (API clients, ML models, etc.)
pass

The recommended way to register custom controls is through glacis.yaml. This lets you enable, disable, and tune controls without changing any code.

controls:
output:
custom:
- path: "grounding_control.GroundingControl" # module.ClassName
enabled: true
if_detected: "flag"
args:
api_key: "${OPENAI_API_KEY}" # Environment variable
threshold: 0.7

path — Dot-separated import path in the format module_name.ClassName. The module is resolved relative to the YAML file’s directory (automatically added to sys.path).

enabled — Toggle the control on/off without removing the configuration. Default: true.

if_detected — Action when the control detects an issue: "forward", "flag", or "block". Default: "flag". This is passed to your constructor as the if_detected kwarg.

args — Constructor keyword arguments. Supports ${ENV_VAR} substitution for secrets.

Place your control module next to glacis.yaml. Glacis automatically adds the YAML file’s directory to sys.path, so imports just work:

my-project/
glacis.yaml # References "grounding_control.GroundingControl"
grounding_control.py # Your custom control module
app.py

For controls in a package:

my-project/
glacis.yaml # References "controls.grounding.GroundingControl"
controls/
__init__.py
grounding.py
app.py

Use ${VAR_NAME} syntax to inject environment variables into any string value in glacis.yaml. This works everywhere in the config, not just in custom control args:

controls:
output:
custom:
- path: "my_control.QAValidator"
args:
api_key: "${OPENAI_API_KEY}"
endpoint: "${VALIDATION_API_URL}"

If a referenced variable is not set, Glacis raises a clear error at startup:

ValueError: Environment variable 'OPENAI_API_KEY' is not set.
Referenced in glacis.yaml via ${OPENAI_API_KEY}.

For cases where YAML configuration isn’t suitable (e.g., controls that require runtime-constructed objects), pass control instances directly:

from glacis.integrations.openai import attested_openai
client = attested_openai(
output_controls=[GroundingControl(api_key="sk-...", threshold=0.7)],
)

You can register any number of custom controls on both input and output stages:

controls:
input:
custom:
- path: "security.PromptLeakDetector"
enabled: true
if_detected: "block"
args:
model: "classifier-v2"
output:
custom:
- path: "grounding_control.GroundingControl"
enabled: true
if_detected: "flag"
args:
api_key: "${OPENAI_API_KEY}"
- path: "toxicity.ContentSafetyControl"
enabled: true
if_detected: "block"
args:
threshold: 0.9

All controls — built-in and custom — run in parallel within each stage. Total latency equals the slowest control, not the sum. Errors in individual controls don’t crash the pipeline.

If a custom control fails to load, Glacis raises a descriptive error at startup:

ErrorCauseExample Message
ImportErrorInvalid path formatInvalid control path 'NoDotsHere'. Expected format: 'module_name.ClassName' (e.g., 'my_controls.ToxicityControl').
ImportErrorModule not foundCannot import module 'my_controls' for custom control 'my_controls.Foo'. Glacis looked in: /path/to/project (glacis.yaml directory) and standard Python path. Check that the file 'my_controls.py' exists and has no import errors.
AttributeErrorClass not in moduleModule 'my_controls' has no class 'Foo'. Available controls in 'my_controls': ['GroundingControl', 'ToxicityControl']
TypeErrorNot a BaseControl'my_controls.Helper' is not a BaseControl subclass. Custom controls must extend glacis.controls.base.BaseControl.
TypeErrorConstructor mismatchFailed to instantiate 'my_controls.MyCtrl' with args ['api_key']. Check that the constructor accepts these parameters. Error: ...
ValueErrorMissing env varEnvironment variable 'MY_KEY' is not set. Referenced in glacis.yaml via ${MY_KEY}.

Control results are recorded in the attestation’s control_plane_results field. Each control execution is captured as a ControlExecution entry:

FieldTypeDescription
idstrIdentifier (e.g., "glacis-input-pii")
typestrControl type ("content_safety", "pii", "jailbreak", "topic", "prompt_security", "grounding", "word_filter", "custom")
versionstrSDK version
providerstrProvider identifier
latency_msintProcessing time in milliseconds
statusstrAction taken: "forward", "flag", "block", or "error"
scorefloat | NoneConfidence score (scale is control-specific, e.g., 0-1 for ML classifiers, 0-3 for grading rubrics)
result_hashstr | NoneHash of the control result
stagestrPipeline stage: "input" or "output"

The top-level determination field in the control plane results records whether the overall request was "forwarded" or "blocked".

Every control returns a standardized ControlResult:

FieldTypeDescription
control_typestrControl type identifier
detectedboolWhether a threat/issue was detected
actionstr"forward", "flag", "block", or "error"
scorefloat | NoneConfidence score (must be >= 0, scale is control-specific)
categorieslist[str]Detected categories (e.g., ["US_SSN", "PERSON"])
latency_msintProcessing time in milliseconds
modified_textstr | NoneReserved for future use (not currently used)
metadatadictControl-specific metadata for audit trail

The glacis.controls module also exports the following types useful for programmatic control orchestration:

ExportDescription
ControlsRunnerOrchestrates running multiple controls on a given text
StageResultResult object from running controls on one stage (input or output)
ControlActionLiteral["forward", "flag", "block", "error"] type alias for control action strings
from glacis.controls import ControlsRunner, StageResult, ControlAction