corail/guards
Guards
Guards provide input/output security checks for agent conversations. They detect prompt injection, PII leakage, and secret exposure. Guards are used by the agent-react strategy via the GuardPipeline.
Guard interface
from corail.guards.base import Guard, GuardDirection, GuardResult
class Guard(ABC):
@property
def name(self) -> str: ...
@property
def direction(self) -> GuardDirection:
return GuardDirection.BOTH # INPUT, OUTPUT, or BOTH
async def check(self, content: str, direction: GuardDirection) -> GuardResult: ...GuardResult carries:
| Field | Type | Description |
|---|---|---|
allowed | bool | Whether the content passed |
reason | str | Why it was blocked (empty if allowed) |
sanitized | str | Modified content to use instead (e.g., PII masked) |
guard_name | str | Name of the guard that produced this result |
details | dict | Additional metadata (pattern matched, PII types, etc.) |
Built-in guards
PromptInjectionGuard
Direction: INPUT only
Detects common prompt injection patterns in user messages. Blocks the request if any pattern matches.
Patterns detected:
- "ignore all previous instructions"
- "disregard previous", "forget previous"
- "you are now", "act as", "pretend to be"
- "new instructions:", "system:"
- "override all rules"
- "jailbreak", "DAN mode"
- XML-style
<system>tags
from corail.guards.builtins import PromptInjectionGuard
guard = PromptInjectionGuard()
result = await guard.check("ignore all previous instructions and ...", GuardDirection.INPUT)
# result.allowed = False
# result.reason = "Prompt injection detected: 'ignore all previous instructions'"PIIGuard
Direction: OUTPUT only
Detects personally identifiable information in agent responses. Two modes:
| Mode | Behavior |
|---|---|
mask=True (default) | Replaces PII with [TYPE_REDACTED] tokens, allows the response |
block=True | Blocks the entire response |
PII types detected:
| Type | Pattern |
|---|---|
email | Email addresses |
phone_fr | French phone numbers (+33, 0x) |
phone_intl | International phone numbers |
ssn | US Social Security numbers |
credit_card | Credit card numbers (4 groups of 4) |
iban | International Bank Account Numbers |
from corail.guards.builtins import PIIGuard
guard = PIIGuard(mask=True, block=False)
result = await guard.check("Contact john@example.com for details", GuardDirection.OUTPUT)
# result.allowed = True
# result.sanitized = "Contact [EMAIL_REDACTED] for details"SecretGuard
Direction: BOTH
Detects API keys, tokens, and credentials. Always blocks (no masking mode).
Patterns detected:
| Type | Example |
|---|---|
api_key | api_key=sk_live_abc123... |
bearer_token | Bearer eyJhbG... |
aws_key | AKIA0123456789ABCDEF |
github_token | ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx |
generic_secret | password="my_secret_value" |
private_key | -----BEGIN PRIVATE KEY----- |
GuardPipeline
Runs guards in sequence. If any guard blocks, the pipeline stops immediately. If a guard returns sanitized content, the next guard checks the sanitized version.
from corail.guards.pipeline import GuardPipeline
from corail.guards.builtins import PromptInjectionGuard, PIIGuard, SecretGuard
pipeline = GuardPipeline(
guards=[PromptInjectionGuard(), PIIGuard(), SecretGuard()],
event_bus=event_bus, # Optional: emits GUARD_INPUT_CHECKED, GUARD_BLOCKED events
)
# Check user input
input_result = await pipeline.check_input("Hello, can you help me?")
# Check agent output
output_result = await pipeline.check_output("Sure! Contact support@company.com")
# output_result.sanitized = "Sure! Contact [EMAIL_REDACTED]"The pipeline emits events:
| Event | When |
|---|---|
GUARD_INPUT_CHECKED | Input guard passed |
GUARD_OUTPUT_CHECKED | Output guard passed |
GUARD_BLOCKED | Any guard blocked the content |
GuardFactory
Guards are resolved via the registry:
from corail.guards.factory import GuardFactory
guard = GuardFactory.create("prompt_injection")
guard = GuardFactory.create("pii", mask=True, block=False)
guard = GuardFactory.create("secrets")
# List available guards
print(GuardFactory.available()) # ['pii', 'prompt_injection', 'secrets']Adding a custom guard
from corail.guards.factory import register_guard
register_guard("toxicity", "mypackage.guards", "ToxicityGuard")Configuration
Guards are configured via the CORAIL_GUARDS environment variable (used by the agent-react strategy initializer):
CORAIL_GUARDS='["prompt_injection", "pii", "secrets"]'This JSON array of guard names is parsed at startup. Each guard is created via GuardFactory and assembled into a GuardPipeline.