Working with Guardrails

Guardrails have become an essential component in agentic AI systems. They allow you to validate user inputs and LLM responses using configurable policies. For vendor-supported guardrails, see:

Embabel provides a framework for building custom guardrails, enabling developers to integrate validation logic of their choice.

Motivation

While you can validate user prompts or thinking blocks using custom validators, Embabel provides a standardized framework through the withGuardRails API. Guardrails can be implemented as POJOs or Spring beans that implement Embabel’s guardrail interfaces.

Common use cases for guardrails:

Input validation: Validate user prompts with common, streaming, or thinking prompt runners
Response validation with thinking: The thinking API provides access to LLM thinking blocks, even when the LLM cannot construct an object
Object response validation: When the LLM constructs an object, you can still validate the output (the content being validated is the object’s JSON representation)
Streaming validation: In streaming mode, StreamingEvent.Thinking provides direct access to LLM reasoning content via the doOnNext callback (see Working with Streams)

A key benefit of this framework is access to the Blackboard object, which allows guardrail logic to consider other entities participating in the agentic workflow.

Concepts

UserInputGuardRail and AssistantMessageGuardRail interfaces define guardrails for user inputs and LLM responses, respectively
Guardrails are registered using the withGuardRails API, which can be chained
Guardrail validation returns a ValidationResult object
Validation errors are sorted by ValidationSeverity level and logged at the corresponding level
A CRITICAL severity level causes a GuardRailViolationException to be thrown for user input guardrails, preventing the LLM operation from executing
By design, createObjectIfPossible handles exceptions gracefully and completes without constructing an object; however, GuardRailViolationException is wrapped inside ThinkingResponse when using thinking mode
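
The severity rules above can be sketched in plain Java. The types below (Severity, Finding, BlockedException) are hypothetical stand-ins, not Embabel's ValidationSeverity, ValidationError, or GuardRailViolationException. The WARNING and ERROR levels and the most-severe-first ordering are assumptions made for illustration only.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for illustration; Embabel's real types may differ.
enum Severity { INFO, WARNING, ERROR, CRITICAL }

record Finding(String code, String message, Severity severity) {}

class BlockedException extends RuntimeException {
    BlockedException(String message) { super(message); }
}

class SeverityHandlingSketch {

    /** Orders findings from most to least severe, then blocks if any is CRITICAL. */
    static List<Finding> process(List<Finding> findings) {
        List<Finding> sorted = findings.stream()
                .sorted(Comparator.comparing(Finding::severity).reversed())
                .toList();
        for (Finding f : sorted) {
            if (f.severity() == Severity.CRITICAL) {
                // For user input, this is the point at which the LLM call would be skipped
                throw new BlockedException(f.code() + ": " + f.message());
            }
        }
        return sorted;
    }
}
```

Findings below CRITICAL are returned in sorted order so a caller could log each one at its own level; a single CRITICAL finding aborts processing.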

Customizing Message Combining

In multi-turn conversations, guardrails often need to validate not just a single prompt but an entire conversation history. When doTransform or similar methods are called with multiple UserMessage objects, each UserInputGuardRail receives the full list and must combine them into a single string for validation.

The combineMessages method controls how this combination happens. Different guardrails may need different formats:

A toxicity filter might want all messages concatenated to check the overall tone
An audit guardrail might want each message tagged with its position in the conversation
A PII detector might want clear separators to identify which message contains sensitive data

The default implementation joins messages with newlines:

(Java first, then the Kotlin equivalent)

default String combineMessages(List<UserMessage> userMessages) {
    return userMessages.stream()
            .map(UserMessage::getContent)
            .collect(Collectors.joining("\n"));
}

fun combineMessages(userMessages: List<UserMessage>): String {
    return userMessages.joinToString(separator = "\n") { message ->
        message.content
    }
}

For example, three messages ["Hello", "How are you?", "Tell me about X"] become:

Hello
How are you?
Tell me about X

To customize this behavior, override combineMessages in your guardrail:

(Java first, then the Kotlin equivalent)

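class AuditGuardRail implements UserInputGuardRail {

    // Assumes SLF4J (org.slf4j.Logger, org.slf4j.LoggerFactory)
    private static final Logger logger = LoggerFactory.getLogger(AuditGuardRail.class);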

    @Override
    public @NotNull String getName() {
        return "AuditGuard";
    }

    @Override
    public @NotNull String getDescription() {
        return "Logs conversation with message markers for audit trail";
    }

    @Override
    public @NotNull String combineMessages(@NotNull List<UserMessage> userMessages) {
        // Tag each message with its position for audit logging
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < userMessages.size(); i++) {
            if (i > 0) {
                result.append("\n");
            }
            result.append("[Turn ").append(i + 1).append("]: ")
                  .append(userMessages.get(i).getContent());
        }
        return result.toString();
    }

    @Override
    public @NotNull ValidationResult validate(@NotNull String input, @NotNull Blackboard blackboard) {
        // input now contains: "[Turn 1]: Hello\n[Turn 2]: How are you?\n[Turn 3]: Tell me about X"
        logger.info("Audit trail: {}", input);
        return ValidationResult.VALID;
    }
}

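class AuditGuardRail : UserInputGuardRail {
    // Assumes SLF4J (org.slf4j.LoggerFactory)
    private val logger = LoggerFactory.getLogger(AuditGuardRail::class.java)
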
    override val name = "AuditGuard"
    override val description = "Logs conversation with message markers for audit trail"

    override fun combineMessages(userMessages: List<UserMessage>): String {
        // Tag each message with its position for audit logging
        return userMessages.mapIndexed { index, message ->
            "[Turn ${index + 1}]: ${message.content}"
        }.joinToString("\n")
    }

    override fun validate(input: String, blackboard: Blackboard): ValidationResult {
        // input now contains: "[Turn 1]: Hello\n[Turn 2]: How are you?\n[Turn 3]: Tell me about X"
        logger.info("Audit trail: {}", input)
        return ValidationResult.VALID
    }
}
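
The separator-based format mentioned earlier for a PII detector can be sketched the same way. This is a minimal sketch that works on plain strings rather than UserMessage objects; the SeparatorCombineSketch class, the boundary text, and the indexOfMessageContaining helper are all hypothetical names introduced for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

class SeparatorCombineSketch {

    // Hypothetical delimiter, chosen so it is unlikely to appear in user text
    static final String SEPARATOR = "\n--- message boundary ---\n";

    /** Joins message contents with an explicit boundary line between them. */
    static String combine(List<String> contents) {
        return String.join(SEPARATOR, contents);
    }

    /** Returns the zero-based index of the first message containing the marker, or -1. */
    static int indexOfMessageContaining(String combined, String marker) {
        String[] parts = combined.split(Pattern.quote(SEPARATOR), -1);
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].contains(marker)) {
                return i;
            }
        }
        return -1;
    }
}
```

Inside a real guardrail, combine would be the body of combineMessages (mapping each UserMessage to its content first), and validate could use the boundary to report which turn triggered a finding.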

Example: Blocking LLM Execution with CRITICAL Validation Errors

This example demonstrates how a guardrail with CRITICAL severity prevents LLM execution by throwing a GuardRailViolationException.

Step 1: Define the guardrails

First, define a user input guardrail that returns a CRITICAL validation error:

(Java first, then the Kotlin equivalent)

/**
 * A guardrail that blocks execution by returning a CRITICAL validation error.
 */
class CriticalUserInputGuardRail implements UserInputGuardRail {

    @Override
    public @NotNull String getName() {
        return "CriticalUserInputGuardRail";
    }

    @Override
    public @NotNull String getDescription() {
        return "Blocks execution when critical policy violations are detected";
    }

    @Override
    public @NotNull ValidationResult validate(@NotNull String input, @NotNull Blackboard blackboard) {
        // Return a CRITICAL error to block LLM execution
        return new ValidationResult(false, List.of(
            new ValidationError("policy-violation", "Content violates safety policy", ValidationSeverity.CRITICAL)
        ));
    }
}

/**
 * A guardrail that blocks execution by returning a CRITICAL validation error.
 */
class CriticalUserInputGuardRail : UserInputGuardRail {

    override val name = "CriticalUserInputGuardRail"

    override val description = "Blocks execution when critical policy violations are detected"

    override fun validate(input: String, blackboard: Blackboard): ValidationResult {
        // Return a CRITICAL error to block LLM execution
        return ValidationResult(false, listOf(
            ValidationError("policy-violation", "Content violates safety policy", ValidationSeverity.CRITICAL)
        ))
    }
}

Next, define an assistant message guardrail to validate LLM responses:

(Java first, then the Kotlin equivalent)

/**
 * A guardrail that validates LLM thinking blocks.
 */
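class ThinkingBlocksGuardRail implements AssistantMessageGuardRail {

    // Assumes SLF4J (org.slf4j.Logger, org.slf4j.LoggerFactory)
    private static final Logger logger = LoggerFactory.getLogger(ThinkingBlocksGuardRail.class);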

    @Override
    public @NotNull String getName() {
        return "ThinkingBlocksGuardRail";
    }

    @Override
    public @NotNull String getDescription() {
        return "Validates LLM thinking blocks for compliance";
    }

    @Override
    public @NotNull ValidationResult validate(@NotNull ThinkingResponse<?> response, @NotNull Blackboard blackboard) {
        logger.info("Validating thinking blocks: {}", response.getThinkingBlocks());
        return new ValidationResult(true, Collections.emptyList());
    }

    @Override
    public @NotNull ValidationResult validate(@NotNull String input, @NotNull Blackboard blackboard) {
        return new ValidationResult(true, Collections.emptyList());
    }
}

/**
 * A guardrail that validates LLM thinking blocks.
 */
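class ThinkingBlocksGuardRail : AssistantMessageGuardRail {

    // Assumes SLF4J (org.slf4j.LoggerFactory)
    private val logger = LoggerFactory.getLogger(ThinkingBlocksGuardRail::class.java)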

    override val name = "ThinkingBlocksGuardRail"

    override val description = "Validates LLM thinking blocks for compliance"

    override fun validate(response: ThinkingResponse<*>, blackboard: Blackboard): ValidationResult {
        logger.info("Validating thinking blocks: {}", response.thinkingBlocks)
        return ValidationResult(true, emptyList())
    }

    override fun validate(input: String, blackboard: Blackboard): ValidationResult {
        return ValidationResult(true, emptyList())
    }
}

Step 2: Use the guardrails with a PromptRunner

(Java first, then the Kotlin equivalent)

// Configure the PromptRunner with guardrails
PromptRunner runner = ai.withDefaultLlm()  // Example uses claude-sonnet-4-5
        .withToolObject(Tooling.class)
        .withGenerateExamples(true)
        .withGuardRails(new CriticalUserInputGuardRail(), new ThinkingBlocksGuardRail());

String prompt = """
        What is the hottest month in Florida and provide its temperature.
        The name should be the month name, temperature should be in Fahrenheit.
        """;

try {
    // Attempt to create an object with thinking
    ThinkingResponse<MonthItem> response = runner
            .thinking()
            .createObject(prompt, MonthItem.class);
} catch (GuardRailViolationException ex) {
    // CRITICAL validation errors cause this exception to be thrown,
    // preventing the LLM operation from executing
    logger.error("Guardrail blocked execution: {}", ex.getMessage());
}

// Configure the PromptRunner with guardrails
val runner = ai.withDefaultLlm()  // Example uses claude-sonnet-4-5
        .withToolObject(Tooling::class.java)
        .withGenerateExamples(true)
        .withGuardRails(CriticalUserInputGuardRail(), ThinkingBlocksGuardRail())

val prompt = """
        What is the hottest month in Florida and provide its temperature.
        The name should be the month name, temperature should be in Fahrenheit.
        """.trimIndent()

try {
    // Attempt to create an object with thinking
    val response = runner
            .thinking()
            .createObject(prompt, MonthItem::class.java)
} catch (ex: GuardRailViolationException) {
    // CRITICAL validation errors cause this exception to be thrown,
    // preventing the LLM operation from executing
    logger.error("Guardrail blocked execution: {}", ex.message)
}

Example: Using Guardrails for Response Analysis

When the LLM cannot construct an object (for example, when the prompt is ambiguous), guardrails can still analyze the LLM’s thinking process. This is useful for understanding why object creation failed or for extracting insights from the reasoning.

Step 1: Define a simple user input guardrail

(Java first, then the Kotlin equivalent)

/**
 * A guardrail that logs user input with INFO-level validation messages.
 */
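class LoggingUserInputGuardRail implements UserInputGuardRail {

    // Assumes SLF4J (org.slf4j.Logger, org.slf4j.LoggerFactory)
    private static final Logger logger = LoggerFactory.getLogger(LoggingUserInputGuardRail.class);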

    @Override
    public @NotNull String getName() {
        return "LoggingUserInputGuardRail";
    }

    @Override
    public @NotNull String getDescription() {
        return "Logs user input for auditing purposes";
    }

    @Override
    public @NotNull ValidationResult validate(@NotNull String input, @NotNull Blackboard blackboard) {
        logger.info("Processing user input: {}", input);
        // Return an INFO-level message (does not block execution)
        return new ValidationResult(true, List.of(
            new ValidationError("audit", "Input logged", ValidationSeverity.INFO)
        ));
    }
}

/**
 * A guardrail that logs user input with INFO-level validation messages.
 */
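class LoggingUserInputGuardRail : UserInputGuardRail {

    // Assumes SLF4J (org.slf4j.LoggerFactory)
    private val logger = LoggerFactory.getLogger(LoggingUserInputGuardRail::class.java)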

    override val name = "LoggingUserInputGuardRail"

    override val description = "Logs user input for auditing purposes"

    override fun validate(input: String, blackboard: Blackboard): ValidationResult {
        logger.info("Processing user input: {}", input)
        // Return an INFO-level message (does not block execution)
        return ValidationResult(true, listOf(
            ValidationError("audit", "Input logged", ValidationSeverity.INFO)
        ))
    }
}

Step 2: Use guardrails with createObjectIfPossible

(Java first, then the Kotlin equivalent)

// Configure the PromptRunner with chained guardrails
PromptRunner runner = ai.withDefaultLlm()  // Example uses claude-sonnet-4-5
        .withToolObject(Tooling.class)
        .withGuardRails(new LoggingUserInputGuardRail())
        .withGuardRails(new ThinkingBlocksGuardRail());

String prompt = "Think about the coldest month in Alaska and its temperature. Provide your analysis.";

// Use createObjectIfPossible to handle cases where object creation may fail
ThinkingResponse<MonthItem> response = runner
        .thinking()
        .createObjectIfPossible(prompt, MonthItem.class);

// The LLM may not be able to construct an object if the prompt is ambiguous
if (response.getResult() == null) {
    // Analyze the thinking blocks to understand the LLM's reasoning
    logger.info("Object creation not possible. Thinking blocks: {}", response.getThinkingBlocks());
}

// Configure the PromptRunner with chained guardrails
val runner = ai.withDefaultLlm()  // Example uses claude-sonnet-4-5
        .withToolObject(Tooling::class.java)
        .withGuardRails(LoggingUserInputGuardRail())
        .withGuardRails(ThinkingBlocksGuardRail())

val prompt = "Think about the coldest month in Alaska and its temperature. Provide your analysis."

// Use createObjectIfPossible to handle cases where object creation may fail
val response = runner
        .thinking()
        .createObjectIfPossible(prompt, MonthItem::class.java)

// The LLM may not be able to construct an object if the prompt is ambiguous
if (response.result == null) {
    // Analyze the thinking blocks to understand the LLM's reasoning
    logger.info("Object creation not possible. Thinking blocks: {}", response.thinkingBlocks)
}

When the LLM cannot provide a definitive answer, you might see reasoning like:

Since I must be SURE about EVERY field and cannot make assumptions or provide approximate values,
I cannot provide the success structure with confidence.

Guardrails can automate further analysis of LLM responses, for example by using semantic text processing tools like CoreNLP.
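
As a much simpler stand-in for a full NLP pipeline, the sketch below scans reasoning text for phrases that suggest the model declined to commit to an answer. The ThinkingAnalysisSketch class and its phrase list are hypothetical and would need tuning for a real model and domain; an assistant message guardrail could call something like this from its validate method.

```java
import java.util.List;
import java.util.Locale;

class ThinkingAnalysisSketch {

    // Hypothetical phrases; tune these to the model and domain in use
    static final List<String> UNCERTAINTY_MARKERS =
            List.of("cannot provide", "not sure", "cannot be certain", "ambiguous");

    /** Returns the markers found in the reasoning text, in the order they are listed above. */
    static List<String> findMarkers(String thinking) {
        String lower = thinking.toLowerCase(Locale.ROOT);
        return UNCERTAINTY_MARKERS.stream()
                .filter(lower::contains)
                .toList();
    }
}
```

Applied to the reasoning quoted above, this matches the phrase "cannot provide", which a guardrail could surface as an INFO-level finding rather than a blocking error.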

For more examples, see:

embabel-agent-autoconfigure/models/embabel-agent-anthropic-autoconfigure/src/test/java/com/embabel/agent/config/models/anthropic/LLMAnthropicThinkingIT.java

Global Guardrails Configuration

In addition to attaching guardrails per-call via withGuardRails(…), Embabel supports declaring global guardrails through application properties. A global guardrail is instantiated once at startup and applied to every LLM operation, in addition to any interaction-specific guardrails configured on the PromptRunner.

This is useful for cross-cutting safety policies that should always run regardless of which call site invokes the LLM, such as PII redaction, profanity filtering, cost limits, and audit logging.

Property Configuration

Global guardrails are defined as comma-separated, fully-qualified class names in application.properties (or application.yml):

# Guardrails applied to every user input
embabel.agent.guardrails.user-input=com.example.ProfanityFilter,com.example.LengthValidator

# Guardrails applied to every assistant message
embabel.agent.guardrails.assistant-message=com.example.OutputValidator

# Whether instantiation failures should fail-fast at startup (default: false)
embabel.agent.guardrails.fail-on-error=false

Each class must:

Implement the appropriate interface (UserInputGuardRail for user-input, AssistantMessageGuardRail for assistant-message)
Provide a public no-arg constructor (instances are created via BeanUtils.instantiateClass)

Guardrails registered through these properties are plain POJOs, not Spring-managed beans. The registry calls BeanUtils.instantiateClass(…) directly, so:

@Autowired, @Value, and constructor injection of Spring beans do not work
@PostConstruct / @PreDestroy lifecycle callbacks are not invoked
ApplicationContextAware, BeanNameAware, and similar *Aware callbacks are not invoked
Dynamic access patterns (e.g. composing the name field from a Spring-backed holder, or pulling a config bean via a static ApplicationContext accessor) will fail or return null at startup, because the holder may not be initialized yet

If your guardrail genuinely needs Spring dependencies, the recommended pattern is to bridge the ApplicationContext through a small ApplicationContextAware holder, and have the guardrail look up beans or properties at validation time rather than in the constructor (see Accessing Spring Beans from a POJO Guardrail). Alternatively, declare the guardrail as a @Component and attach it per-call through withGuardRails(…) on the PromptRunner.

Accessing Spring Beans from a POJO Guardrail

When a property-registered guardrail must consult Spring beans or environment properties (for example, a cost limit defined in application.yml, or a shared MeterRegistry), expose the ApplicationContext through a static holder and let the guardrail call back into it at validation time.

package com.example.observability;

import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;
import org.springframework.stereotype.Component;

/**
 * Bridges Spring's ApplicationContext to POJO guardrails instantiated by
 * embabel's GlobalGuardRailsRegistry. Those guardrails are created via
 * BeanUtils.instantiateClass() with a no-arg constructor, so they cannot
 * use @Autowired — they reach back through this holder to look up beans
 * and properties at validation time.
 */
@Component
public class SpringContextHolder implements ApplicationContextAware {

    private static volatile ApplicationContext context;

    @Override
    public void setApplicationContext(ApplicationContext applicationContext) {
        context = applicationContext;
    }

    public static ApplicationContext context() {
        return context;
    }

    public static <T> T getBean(Class<T> type) {
        ApplicationContext ctx = context;
        return ctx != null ? ctx.getBean(type) : null;
    }

    public static String getProperty(String key, String defaultValue) {
        ApplicationContext ctx = context;
        return ctx != null ? ctx.getEnvironment().getProperty(key, defaultValue) : defaultValue;
    }
}

A guardrail can then resolve its dependencies lazily, inside validate(…):

(Java first, then the Kotlin equivalent)

public class CostLimitGuardRail implements UserInputGuardRail {

    @Override
    public @NotNull String getName() {
        return "CostLimitGuardRail";
    }

    @Override
    public @NotNull String getDescription() {
        return "Blocks requests when the monthly LLM budget is exceeded";
    }

    @Override
    public @NotNull ValidationResult validate(@NotNull String input, @NotNull Blackboard blackboard) {
        // Look up dependencies at validation time — the holder is wired by then
        CostTracker tracker = SpringContextHolder.getBean(CostTracker.class);
        String budget = SpringContextHolder.getProperty("app.llm.monthly-budget-usd", "100");

        if (tracker != null && tracker.spentThisMonth() > Double.parseDouble(budget)) {
            return new ValidationResult(false, List.of(
                new ValidationError("budget-exceeded",
                    "Monthly LLM budget exceeded",
                    ValidationSeverity.CRITICAL)
            ));
        }
        return new ValidationResult(true, Collections.emptyList());
    }
}

class CostLimitGuardRail : UserInputGuardRail {

    override val name = "CostLimitGuardRail"
    override val description = "Blocks requests when the monthly LLM budget is exceeded"

    override fun validate(input: String, blackboard: Blackboard): ValidationResult {
        // Look up dependencies at validation time — the holder is wired by then
        val tracker = SpringContextHolder.getBean(CostTracker::class.java)
        val budget = SpringContextHolder.getProperty("app.llm.monthly-budget-usd", "100").toDouble()

        if (tracker != null && tracker.spentThisMonth() > budget) {
            return ValidationResult(false, listOf(
                ValidationError("budget-exceeded",
                    "Monthly LLM budget exceeded",
                    ValidationSeverity.CRITICAL)
            ))
        }
        return ValidationResult(true, emptyList())
    }
}

Merging with Interaction-Specific Guardrails

When an LLM operation executes, global guardrails are merged with the interaction-specific guardrails set through withGuardRails(…):

Global guardrails always run first, then interaction-specific ones
Duplicates are removed based on class identity (not the name field), so a class registered both globally and per-call will be invoked only once
The deduplication keeps the global instance, ensuring singleton semantics
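
The merge rules above can be sketched as a small generic helper. This is an illustration of the described behavior under the stated rules, not Embabel's actual merge code; GuardRailMergeSketch and merge are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GuardRailMergeSketch {

    /**
     * Returns global entries first, followed by per-call entries whose class
     * has not already been seen. Identity is the runtime class, not a name.
     */
    static <T> List<T> merge(List<T> global, List<T> perCall) {
        List<T> merged = new ArrayList<>();
        Set<Class<?>> seen = new HashSet<>();
        for (T guardRail : global) {
            if (seen.add(guardRail.getClass())) {
                merged.add(guardRail);
            }
        }
        for (T guardRail : perCall) {
            // A per-call instance is dropped when its class is already present
            if (seen.add(guardRail.getClass())) {
                merged.add(guardRail);
            }
        }
        return merged;
    }
}
```

Because the global list is processed first, a class that appears in both lists is represented by its global instance in the result.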

Strict Mode: fail-on-error

By default (fail-on-error=false), if a configured guardrail cannot be instantiated (for example, because of a typo in the class name, a missing no-arg constructor, or a constructor that throws), the error is logged and the application continues with the remaining guardrails.

Setting fail-on-error=true causes Spring startup to fail with a GuardRailInstantiationException if any of the following occurs:

The class cannot be loaded
The class does not implement the expected guardrail interface
The constructor throws an exception

Strict mode is recommended in production, where a missing guardrail must be treated as a deployment error rather than a silent omission.
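
The difference between the two modes can be sketched with plain reflection. This is not the registry's real code: GuardRailLoaderSketch and load are hypothetical names, reflection stands in for BeanUtils.instantiateClass, and IllegalStateException stands in for GuardRailInstantiationException.

```java
import java.util.ArrayList;
import java.util.List;

class GuardRailLoaderSketch {

    /**
     * Instantiates each comma-separated class name that implements expectedType.
     * With failOnError=false, failures are reported and skipped; with
     * failOnError=true, the first failure aborts loading.
     */
    static <T> List<T> load(String property, Class<T> expectedType, boolean failOnError) {
        List<T> instances = new ArrayList<>();
        for (String raw : property.split(",")) {
            String className = raw.trim();
            if (className.isEmpty()) {
                continue;
            }
            try {
                Class<?> type = Class.forName(className);
                if (!expectedType.isAssignableFrom(type)) {
                    throw new IllegalArgumentException(
                            className + " does not implement " + expectedType.getName());
                }
                // Requires an accessible no-arg constructor
                instances.add(expectedType.cast(type.getDeclaredConstructor().newInstance()));
            } catch (ReflectiveOperationException | RuntimeException e) {
                if (failOnError) {
                    throw new IllegalStateException("Cannot instantiate guardrail " + className, e);
                }
                System.err.println("Skipping guardrail " + className + ": " + e);
            }
        }
        return instances;
    }
}
```

In lenient mode a misspelled class name silently reduces the set of active guardrails, which is the omission that strict mode turns into a startup failure.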

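The ProfanityFilter class referenced in the property configuration above can be written as a plain class with a public no-arg constructor (Java first, then the Kotlin equivalent):

package com.example;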

public class ProfanityFilter implements UserInputGuardRail {

    @Override
    public @NotNull String getName() {
        return "ProfanityFilter";
    }

    @Override
    public @NotNull String getDescription() {
        return "Blocks user input containing profanity";
    }

    @Override
    public @NotNull ValidationResult validate(@NotNull String input, @NotNull Blackboard blackboard) {
        if (containsProfanity(input)) {
            return new ValidationResult(false, List.of(
                new ValidationError("profanity", "Input contains disallowed terms", ValidationSeverity.CRITICAL)
            ));
        }
        return new ValidationResult(true, Collections.emptyList());
    }

    private boolean containsProfanity(String input) {
        // ... implementation
        return false;
    }
}

package com.example

class ProfanityFilter : UserInputGuardRail {

    override val name = "ProfanityFilter"
    override val description = "Blocks user input containing profanity"

    override fun validate(input: String, blackboard: Blackboard): ValidationResult {
        if (containsProfanity(input)) {
            return ValidationResult(false, listOf(
                ValidationError("profanity", "Input contains disallowed terms", ValidationSeverity.CRITICAL)
            ))
        }
        return ValidationResult(true, emptyList())
    }

    private fun containsProfanity(input: String): Boolean {
        // ... implementation
        return false
    }
}

Once registered in application.properties, the guardrail applies to every LLM call — no code changes are needed at the call site:

// ProfanityFilter runs automatically — no explicit withGuardRails() needed
val response = ai.withDefaultLlm()
    .createObject("Hello, world", Greeting::class.java)

Programmatic Access

The registry can also be accessed programmatically, either as an injected Spring bean or via its companion accessor:

(Java first, then the Kotlin equivalent)

// Spring injection
public class MyService {
    private final GlobalGuardRailsRegistry registry;

    public MyService(GlobalGuardRailsRegistry registry) {
        this.registry = registry;
    }
}

// Static access — get() returns null if Spring hasn't initialized the registry yet;
// the list accessors return an empty list in that case.
GlobalGuardRailsRegistry registry = GlobalGuardRailsRegistry.get();
List<UserInputGuardRail> userGuards = GlobalGuardRailsRegistry.getUserInputGuardRails();
List<AssistantMessageGuardRail> assistantGuards = GlobalGuardRailsRegistry.getAssistantMessageGuardRails();

// Spring injection
class MyService(private val registry: GlobalGuardRailsRegistry)

// Static access — get() returns null if Spring hasn't initialized the registry yet;
// the list accessors return an empty list in that case.
val registry = GlobalGuardRailsRegistry.get()
val userGuards = GlobalGuardRailsRegistry.getUserInputGuardRails()
val assistantGuards = GlobalGuardRailsRegistry.getAssistantMessageGuardRails()

Relationship with Other Validation Mechanisms

The Agent API framework supports Jakarta Bean Validation (JSR-380) for domain object constraints. These constraints are injected into the schema and validated during object construction.

In addition, a planned validation framework for Agent Actions will reuse the same data structures as guardrails, including ValidationResult, ValidationError, and ContentValidator.

In summary, guardrails and bean validators are complementary but distinct:

Bean validation ensures objects are well-formed and meet business constraints
Guardrails ensure AI interactions are safe and compliant with policies

Both can be enabled independently and serve different aspects of the AI safety stack.

@SecureAgentTool is a third, orthogonal mechanism: it enforces access control rather than content safety or data validity. Where guardrails ask “is this content acceptable?”, @SecureAgentTool asks “is this principal allowed to invoke this agent action at all?” The two work well together — @SecureAgentTool prevents unauthorised principals from calling a tool, while guardrails validate the inputs and outputs of calls that are permitted. See @SecureAgentTool for details.