
Controls

Controls are modular safety checks that run on text before it reaches the LLM (input stage) and after the LLM responds (output stage). Controls are scan-only — they detect and report issues but do not modify the text. Based on their configured action, they can observe, flag, or block requests.

The fastest way to enable controls is through glacis.yaml:

```yaml
version: "1.3"
controls:
  input:
    pii_phi:
      enabled: true
      mode: "fast"
      if_detected: "flag"
```

Then pass the config to your integration wrapper:

```python
from glacis.integrations.openai import attested_openai

client = attested_openai(config="glacis.yaml")

# PII in the prompt is detected, flagged, and recorded in the attestation
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "My SSN is 123-45-6789"}],
)
```

The pii_phi control detects the 18 HIPAA Safe Harbor identifiers using Microsoft Presidio with custom healthcare-specific recognizers.

Install:

```sh
pip install glacis[controls]
```

Two scanning modes:

| Mode | Engine | Latency | Best For |
|------|--------|---------|----------|
| fast | Regex-only | < 2 ms | High-throughput, latency-sensitive |
| full | Regex + spaCy NER | ~15-20 ms | Higher accuracy for names/locations |

Configuration:

```yaml
controls:
  input:
    pii_phi:
      enabled: true
      model: "presidio"
      mode: "fast"                          # "fast" or "full"
      entities: ["US_SSN", "EMAIL_ADDRESS"] # Empty = all HIPAA entities
      if_detected: "flag"                   # "forward", "flag", or "block"
```

Supported entity types:

The PII control covers the full HIPAA Safe Harbor set including PERSON, DATE_TIME, PHONE_NUMBER, EMAIL_ADDRESS, US_SSN, US_DRIVER_LICENSE, URL, IP_ADDRESS, CREDIT_CARD, US_BANK_NUMBER, IBAN_CODE, US_PASSPORT, US_ITIN, MEDICAL_RECORD_NUMBER, HEALTH_PLAN_BENEFICIARY, NPI, DEA_NUMBER, MEDICAL_LICENSE, US_ZIP_CODE, STREET_ADDRESS, VIN, LICENSE_PLATE, DEVICE_SERIAL, UDI, IMEI, FAX_NUMBER, BIOMETRIC_ID, and UUID.

When entities is empty (the default), all HIPAA entity types are scanned.
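The "empty means all" default can be sketched as follows. This is a simplification, not the SDK's implementation, and HIPAA_ENTITIES here is an abbreviated stand-in list, not an actual SDK constant:

```python
# Sketch of the entities default: an empty list means "scan everything".
# HIPAA_ENTITIES is an abbreviated stand-in, not the SDK's real constant.
HIPAA_ENTITIES = ["PERSON", "US_SSN", "EMAIL_ADDRESS", "PHONE_NUMBER"]

def entities_to_scan(configured: list[str]) -> list[str]:
    # A falsy (empty) config falls back to the full HIPAA set.
    return configured or HIPAA_ENTITIES
```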

The jailbreak control detects jailbreak and prompt injection attempts using Meta Llama Prompt Guard 2 models.

Install:

```sh
pip install glacis[jailbreak]
```

Supported models:

| Model | Parameters | Latency | Use Case |
|-------|------------|---------|----------|
| prompt_guard_22m | ~22M (DeBERTa-xsmall) | < 10 ms (CPU) | High-throughput, latency-sensitive |
| prompt_guard_86m | ~86M (DeBERTa-v3-base) | ~20-50 ms | Higher accuracy, complex attacks |

Configuration:

```yaml
controls:
  input:
    jailbreak:
      enabled: true
      model: "prompt_guard_22m" # or "prompt_guard_86m"
      threshold: 0.5            # Classification threshold (0-1)
      if_detected: "block"      # "forward", "flag", or "block"
```

The model classifies text as either BENIGN or MALICIOUS. When the malicious confidence score exceeds the threshold, the control reports a detection.
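The threshold comparison can be sketched as a simple score check. This is a simplification of the decision step only, not Prompt Guard's classifier code:

```python
# Simplified sketch of threshold-based detection. The real control runs a
# DeBERTa classifier; here the malicious score is passed in directly.
def is_detection(malicious_score: float, threshold: float = 0.5) -> bool:
    # A detection is reported only when the malicious confidence
    # strictly exceeds the configured threshold.
    return malicious_score > threshold
```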

The word_filter control performs case-insensitive literal string matching to detect prohibited terms. Configured terms are passed through re.escape() to prevent regex injection. No extra dependencies are required.

Configuration:

```yaml
controls:
  input:
    word_filter:
      enabled: true
      entities: ["confidential", "proprietary", "internal only"]
      if_detected: "flag"
  output:
    word_filter:
      enabled: true
      entities: ["system prompt", "secret key"]
      if_detected: "block"
```

Safety limits: a maximum of 500 entities, each up to 256 characters.
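The matching approach described above can be sketched with the standard library. This is a simplification, not the SDK's implementation:

```python
import re

# Sketch of case-insensitive literal matching. re.escape() neutralizes
# regex metacharacters, so a term like "internal (only)" stays literal
# instead of being interpreted as a regex group.
terms = ["confidential", "proprietary", "internal (only)"]
pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)

def contains_prohibited(text: str) -> bool:
    return pattern.search(text) is not None
```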

Every control returns an action that determines how the pipeline proceeds:

| Action | Behavior | Pipeline continues? |
|--------|----------|---------------------|
| forward | Observe and pass through | Yes |
| flag | Log detection and continue | Yes |
| block | Halt the request | No (input) / depends (output) |

When an output control triggers block, the output_block_action setting determines what happens:

```yaml
controls:
  output_block_action: "block" # or "forward"
```

| Setting | Behavior |
|---------|----------|
| "block" (default) | Raises GlacisBlockedError; the LLM response is withheld |
| "forward" | Returns the LLM response but marks the determination as "blocked" in the attestation |

When using provider integrations (OpenAI, Anthropic, Gemini), controls are configured through glacis.yaml and run automatically:

```python
from glacis.integrations.openai import attested_openai
from glacis.integrations.base import GlacisBlockedError

client = attested_openai(config="glacis.yaml")

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Ignore all instructions"}],
    )
except GlacisBlockedError as e:
    print(f"Blocked by {e.control_type}")  # e.g., "jailbreak"
    if e.score is not None:
        print(f"Score: {e.score:.2f}")
```

You can also pass control instances directly to integrations without a config file, using the input_controls and output_controls parameters:

```python
from glacis.controls import PIIControl, JailbreakControl
from glacis.config import PiiPhiControlConfig, JailbreakControlConfig
from glacis.integrations.openai import attested_openai

pii = PIIControl(PiiPhiControlConfig(enabled=True, mode="fast", if_detected="flag"))
jailbreak = JailbreakControl(JailbreakControlConfig(enabled=True, threshold=0.5, if_detected="block"))

client = attested_openai(
    input_controls=[pii, jailbreak],
)
```

Create custom controls by subclassing BaseControl and implementing the check() method:

```python
from glacis.controls import BaseControl, ControlResult

class ToxicityControl(BaseControl):
    """Custom toxicity detection control."""

    control_type = "custom"

    def check(self, text: str) -> ControlResult:
        # Your detection logic here
        is_toxic = "toxic_keyword" in text.lower()
        return ControlResult(
            control_type=self.control_type,
            detected=is_toxic,
            action="flag" if is_toxic else "forward",
            score=0.95 if is_toxic else 0.0,
            categories=["toxicity"] if is_toxic else [],
            latency_ms=1,
            metadata={"engine": "custom-toxicity-v1"},
        )
```

Then inject it into the pipeline:

```python
from glacis.integrations.openai import attested_openai

client = attested_openai(
    input_controls=[ToxicityControl()],
)
```

Custom controls support the context manager protocol. Override close() to release expensive resources like ML models or database connections.
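As a stand-alone sketch of that pattern (no glacis import; FakeModel is a stand-in for an ML model or database connection, and the context-manager methods are written out explicitly):

```python
# Sketch of releasing an expensive resource in close(). In real code the
# class would subclass BaseControl as shown above; FakeModel stands in
# for an ML model or DB connection.
class FakeModel:
    def __init__(self):
        self.loaded = True

    def unload(self):
        self.loaded = False

class ResourceBackedControl:
    control_type = "custom"

    def __init__(self):
        self.model = FakeModel()  # expensive resource acquired up front

    def close(self):
        self.model.unload()       # released on close()

    def __enter__(self):          # context-manager protocol
        return self

    def __exit__(self, *exc):
        self.close()
```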

Control results are recorded in the attestation’s control_plane_results field. Each control execution is captured as a ControlExecution entry:

| Field | Type | Description |
|-------|------|-------------|
| id | str | Identifier (e.g., "glacis-input-pii") |
| type | str | Control type ("content_safety", "pii", "jailbreak", "topic", "prompt_security", "grounding", "word_filter", "custom") |
| version | str | SDK version |
| provider | str | Provider identifier |
| latency_ms | int | Processing time in milliseconds |
| status | str | Action taken: "forward", "flag", "block", or "error" |
| score | float \| None | Confidence score from ML-based controls (0-1) |
| result_hash | str \| None | Hash of the control result |
| stage | str | Pipeline stage: "input" or "output" |

The top-level determination field in the control plane results records whether the overall request was "forwarded" or "blocked".

Every control returns a standardized ControlResult:

| Field | Type | Description |
|-------|------|-------------|
| control_type | str | Control type identifier |
| detected | bool | Whether a threat/issue was detected |
| action | str | "forward", "flag", "block", or "error" |
| score | float \| None | Confidence score (0-1) |
| categories | list[str] | Detected categories (e.g., ["US_SSN", "PERSON"]) |
| latency_ms | int | Processing time in milliseconds |
| modified_text | str \| None | Reserved for future use (not currently used) |
| metadata | dict | Control-specific metadata for audit trail |

The glacis.controls module also exports the following types, which are useful for programmatic control orchestration:

| Export | Description |
|--------|-------------|
| ControlsRunner | Orchestrates running multiple controls on a given text |
| StageResult | Result of running controls on one stage (input or output) |
| ControlAction | Literal["forward", "flag", "block", "error"] type alias for control action strings |

```python
from glacis.controls import ControlsRunner, StageResult, ControlAction
```