v0.2.0 · Apache 2.0

Guards

Guards provide input/output security checks for agent conversations. They detect prompt injection, PII leakage, and secret exposure. Guards are used by the agent-react strategy via the GuardPipeline.

Guard interface

# Import in your own code:
from corail.guards.base import Guard, GuardDirection, GuardResult

# Simplified view of the Guard interface as defined in corail.guards.base:
from abc import ABC

class Guard(ABC):
    @property
    def name(self) -> str: ...

    @property
    def direction(self) -> GuardDirection:
        return GuardDirection.BOTH  # INPUT, OUTPUT, or BOTH

    async def check(self, content: str, direction: GuardDirection) -> GuardResult: ...

GuardResult carries:

Field       Type  Description
----------  ----  -------------------------------------------------------
allowed     bool  Whether the content passed
reason      str   Why it was blocked (empty if allowed)
sanitized   str   Modified content to use instead (e.g., PII masked)
guard_name  str   Name of the guard that produced this result
details     dict  Additional metadata (pattern matched, PII types, etc.)
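For readers writing their own guards or consuming results, the result contract can be modeled with a plain dataclass. The sketch below is a stand-in, not corail's class: the field names match the table, but the defaults, the helper name effective_content, and the rule that an empty sanitized string means "use the original content" are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class GuardResult:
    # Stand-in: field names follow the table above; the defaults are assumptions.
    allowed: bool
    reason: str = ""
    sanitized: str = ""
    guard_name: str = ""
    details: dict[str, Any] = field(default_factory=dict)


def effective_content(original: str, result: GuardResult) -> Optional[str]:
    """Return the text to pass downstream, or None if the guard blocked it.

    Assumption: an empty `sanitized` means the guard left the content unchanged.
    """
    if not result.allowed:
        return None
    return result.sanitized or original


masked = GuardResult(allowed=True, sanitized="Contact [EMAIL_REDACTED]", guard_name="pii")
blocked = GuardResult(allowed=False, reason="secret detected", guard_name="secrets")
print(effective_content("Contact john@example.com", masked))  # Contact [EMAIL_REDACTED]
print(effective_content("AKIA0123456789ABCDEF", blocked))     # None
```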

Built-in guards

PromptInjectionGuard

Direction: INPUT only

Detects common prompt injection patterns in user messages. Blocks the request if any pattern matches.

Patterns detected:

  • "ignore all previous instructions"
  • "disregard previous", "forget previous"
  • "you are now", "act as", "pretend to be"
  • "new instructions:", "system:"
  • "override all rules"
  • "jailbreak", "DAN mode"
  • XML-style <system> tags

from corail.guards.base import GuardDirection
from corail.guards.builtins import PromptInjectionGuard

guard = PromptInjectionGuard()
# check() is a coroutine: await it inside an async function (e.g., one run with asyncio.run)
result = await guard.check("ignore all previous instructions and ...", GuardDirection.INPUT)
# result.allowed = False
# result.reason = "Prompt injection detected: 'ignore all previous instructions'"
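The detection step can be sketched with case-insensitive regular expressions. This is an illustration under stated assumptions: the regexes below are hypothetical renderings of the listed patterns rather than corail's actual pattern set, and detect_injection is an illustrative name.

```python
import re
from typing import Optional

# Hypothetical regexes for the patterns listed above; corail's real set may differ.
# Word boundaries (\b) stop short phrases like "act as" matching inside "exact asset".
INJECTION_PATTERNS = [
    r"\bignore all previous instructions\b",
    r"\b(?:disregard|forget) previous\b",
    r"\b(?:you are now|act as|pretend to be)\b",
    r"\bnew instructions:|\bsystem:",
    r"\boverride all rules\b",
    r"\b(?:jailbreak|DAN mode)\b",
    r"</?system>",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in INJECTION_PATTERNS]


def detect_injection(text: str) -> Optional[str]:
    """Return the first matching snippet, or None if no pattern matches."""
    for pattern in _COMPILED:
        match = pattern.search(text)
        if match:
            return match.group(0)
    return None


print(detect_injection("Please IGNORE ALL PREVIOUS INSTRUCTIONS and ..."))
print(detect_injection("Hello, can you help me?"))  # None
```

Word boundaries are a design choice worth checking in any real pattern list: without them, a short phrase such as "act as" would also fire inside ordinary text like "exact asset".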

PIIGuard

Direction: OUTPUT only

Detects personally identifiable information in agent responses. Two modes:

Mode                 Behavior
-------------------  ---------------------------------------------------------------
mask=True (default)  Replaces PII with [TYPE_REDACTED] tokens; the response is allowed
block=True           Blocks the entire response

PII types detected:

Type         Pattern
-----------  -----------------------------------
email        Email addresses
phone_fr     French phone numbers (+33, 0x)
phone_intl   International phone numbers
ssn          US Social Security numbers
credit_card  Credit card numbers (4 groups of 4)
iban         International Bank Account Numbers

from corail.guards.base import GuardDirection
from corail.guards.builtins import PIIGuard

guard = PIIGuard(mask=True, block=False)
# check() is a coroutine: await it inside an async function
result = await guard.check("Contact john@example.com for details", GuardDirection.OUTPUT)
# result.allowed = True
# result.sanitized = "Contact [EMAIL_REDACTED] for details"
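The two PII modes can be sketched with re.sub. The patterns here cover only email and credit_card and are deliberately simple; they, the helper name scan_pii, and the (allowed, sanitized, types) return shape are assumptions for illustration. Only the [TYPE_REDACTED] token format comes from the table above.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately simple regexes for two of the PII types above;
# corail's actual patterns (and its handling of the other types) may differ.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "credit_card": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
}


def scan_pii(text: str, block: bool = False) -> tuple[bool, str, list[str]]:
    """Return (allowed, sanitized_text, detected_types).

    With block=True any hit rejects the whole text; otherwise each hit is
    replaced by a [TYPE_REDACTED] token and the text is allowed through.
    """
    found = [name for name, rx in PII_PATTERNS.items() if rx.search(text)]
    if block and found:
        return False, "", found
    for name in found:
        text = PII_PATTERNS[name].sub(f"[{name.upper()}_REDACTED]", text)
    return True, text, found


print(scan_pii("Contact john@example.com for details"))
# (True, 'Contact [EMAIL_REDACTED] for details', ['email'])
print(scan_pii("Card: 4111 1111 1111 1111", block=True))
# (False, '', ['credit_card'])
```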

SecretGuard

Direction: BOTH

Detects API keys, tokens, and credentials. Always blocks (no masking mode).

Patterns detected:

Type            Example
--------------  ----------------------------------------
api_key         api_key=sk_live_abc123...
bearer_token    Bearer eyJhbG...
aws_key         AKIA0123456789ABCDEF
github_token    ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
generic_secret  password="my_secret_value"
private_key     -----BEGIN PRIVATE KEY-----
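Because SecretGuard never masks, its core reduces to "does any pattern match?". A minimal sketch, assuming hypothetical regexes for three of the rows above: an AWS key as AKIA followed by 16 uppercase letters or digits (matching the example), a GitHub token as ghp_ followed by 36 alphanumerics, and a PEM private key header. The function name find_secrets is illustrative.

```python
from __future__ import annotations

import re

# Hypothetical regexes for three rows of the table above; corail's may differ.
SECRET_PATTERNS = {
    "aws_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[str]:
    """Return the names of every secret type found; any hit means block."""
    return [name for name, rx in SECRET_PATTERNS.items() if rx.search(text)]


print(find_secrets("key=AKIA0123456789ABCDEF"))  # ['aws_key']
print(find_secrets("no credentials here"))      # []
```

Since there is no masking mode, the same check applies unchanged to input and output: allowed is simply `not find_secrets(text)`.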

GuardPipeline

Runs guards in sequence. If any guard blocks, the pipeline stops immediately. If a guard returns sanitized content, the next guard checks the sanitized version.

from corail.guards.pipeline import GuardPipeline
from corail.guards.builtins import PromptInjectionGuard, PIIGuard, SecretGuard

pipeline = GuardPipeline(
    guards=[PromptInjectionGuard(), PIIGuard(), SecretGuard()],
    # Optional: an existing event bus; the pipeline emits GUARD_INPUT_CHECKED,
    # GUARD_OUTPUT_CHECKED, and GUARD_BLOCKED events to it
    event_bus=event_bus,
)

# The checks are coroutines; await them inside an async function.

# Check user input
input_result = await pipeline.check_input("Hello, can you help me?")

# Check agent output
output_result = await pipeline.check_output("Sure! Contact support@company.com")
# output_result.sanitized = "Sure! Contact [EMAIL_REDACTED]"

The pipeline emits events:

Event                 When
--------------------  -----------------------------
GUARD_INPUT_CHECKED   Input guard passed
GUARD_OUTPUT_CHECKED  Output guard passed
GUARD_BLOCKED         Any guard blocked the content
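The pipeline's control flow (sequential checks, stop on the first block, sanitized text carried forward) can be sketched in a few lines. Everything here is hypothetical: SketchGuard, run_pipeline, and the emit callback stand in for corail's guards and event bus. Two behaviors are assumed rather than documented: guards whose direction does not match are skipped, and a single GUARD_INPUT_CHECKED or GUARD_OUTPUT_CHECKED event is emitted per successful pass.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class SketchGuard:
    # Hypothetical stand-in: `fn` returns (allowed, sanitized, reason),
    # where an empty `sanitized` means "unchanged".
    name: str
    directions: set[str]  # subset of {"input", "output"}
    fn: Callable[[str], tuple[bool, str, str]]


def run_pipeline(guards, text, direction, emit):
    """Run the guards that apply to `direction` in order, stopping at the first block.

    Each guard sees the text as sanitized by the guards before it.
    """
    for guard in guards:
        if direction not in guard.directions:
            continue  # assumption: guards for the other direction are skipped
        allowed, sanitized, reason = guard.fn(text)
        if not allowed:
            emit("GUARD_BLOCKED", guard.name, reason)
            return False, text
        text = sanitized or text
    emit(f"GUARD_{direction.upper()}_CHECKED")  # assumption: one event per pass
    return True, text


pii_guard = SketchGuard("pii", {"output"},
                        lambda t: (True, re.sub(r"\S+@\S+", "[EMAIL_REDACTED]", t), ""))
secret_guard = SketchGuard("secrets", {"input", "output"},
                           lambda t: ("AKIA" not in t, "", "AWS key found"))

events = []


def record(*args):
    events.append(args)


print(run_pipeline([pii_guard, secret_guard], "Sure! Contact support@company.com", "output", record))
# (True, 'Sure! Contact [EMAIL_REDACTED]')
print(run_pipeline([pii_guard, secret_guard], "my key is AKIA0123456789ABCDEF", "input", record))
# (False, 'my key is AKIA0123456789ABCDEF')
print(events)
# [('GUARD_OUTPUT_CHECKED',), ('GUARD_BLOCKED', 'secrets', 'AWS key found')]
```

Ordering matters in this design: a masking guard placed before a blocking guard hands the blocker already-sanitized text, so the order of the guards list is part of the policy.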

GuardFactory

Guards are resolved via the registry:

from corail.guards.factory import GuardFactory
 
guard = GuardFactory.create("prompt_injection")
guard = GuardFactory.create("pii", mask=True, block=False)
guard = GuardFactory.create("secrets")
 
# List available guards
print(GuardFactory.available())  # ['pii', 'prompt_injection', 'secrets']

Adding a custom guard

from corail.guards.factory import register_guard
 
register_guard("toxicity", "mypackage.guards", "ToxicityGuard")
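Registering by module path and class name suggests a lazy registry that imports the class only when it is first created. The sketch below shows that idea with importlib.import_module. It is an assumption about how such a registry could work, not corail's GuardFactory, and it uses types.SimpleNamespace as a stand-in guard class so it runs without corail installed.

```python
from __future__ import annotations

import importlib
from typing import Any

# Hypothetical registry mirroring register_guard / create / available above.
_REGISTRY: dict[str, tuple[str, str]] = {}


def register_guard(name: str, module_path: str, class_name: str) -> None:
    """Record where a guard class lives; nothing is imported yet."""
    _REGISTRY[name] = (module_path, class_name)


def create(name: str, **kwargs: Any) -> Any:
    """Import the registered class on demand and instantiate it with kwargs."""
    try:
        module_path, class_name = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown guard {name!r}; available: {available()}") from None
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**kwargs)


def available() -> list[str]:
    return sorted(_REGISTRY)


# Demo: a stdlib class stands in for a real guard class.
register_guard("demo", "types", "SimpleNamespace")
guard = create("demo", mask=True, block=False)
print(guard.mask)   # True
print(available())  # ['demo']
```

Deferring the import keeps startup cheap and means a guard with heavy dependencies (a toxicity model, say) costs nothing unless it is actually configured.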

Configuration

Guards are configured via the CORAIL_GUARDS environment variable (used by the agent-react strategy initializer):

CORAIL_GUARDS='["prompt_injection", "pii", "secrets"]'

This JSON array of guard names is parsed at startup. Each guard is created via GuardFactory and assembled into a GuardPipeline.
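The startup parsing step can be sketched with json.loads. The function name guard_names_from_env, the validation rule, and treating an unset variable as "no guards" are assumptions; per the description above, corail would then pass each name to GuardFactory.create and the resulting guards to a GuardPipeline.

```python
from __future__ import annotations

import json
import os


def guard_names_from_env(var: str = "CORAIL_GUARDS") -> list[str]:
    """Parse a JSON array of guard names from the environment.

    Assumptions: an unset variable yields no guards, and anything other
    than a list of strings is rejected with ValueError.
    """
    raw = os.environ.get(var, "[]")
    names = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError(f"{var} must be a JSON array of guard names, got {raw!r}")
    return names


os.environ["CORAIL_GUARDS"] = '["prompt_injection", "pii", "secrets"]'
print(guard_names_from_env())  # ['prompt_injection', 'pii', 'secrets']
```

Because json.JSONDecodeError subclasses ValueError, a single `except ValueError` at startup covers both malformed JSON and a value of the wrong shape.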