Tracking LLM Cost and Usage
Embabel emits an event for every LLM and embedding call your agent makes. Subscribe to those events to know, in real time, how much each call cost, which model handled it, and which agent process it belongs to.
The events
Two events are available:
- LlmInvocationEvent: emitted once per LLM call.
- EmbeddingInvocationEvent: emitted once per embedding call.
Each event exposes:
- invocation.llmMetadata (or embeddingMetadata): model name and provider
- invocation.usage: token counts
- invocation.cost(): computed cost for that call
- interactionId: identifier of the originating interaction
- agentProcess: the agent process that triggered the call (use agentProcess.id to group, agentProcess.agent.name to label)
Subscribing to cost events
Implement AgenticEventListener and react to the events you care about.
The listener is registered like any other Embabel event listener.
Java:

public class OrganizationCostTracker implements AgenticEventListener {

    // Running cost total per agent name; DoubleAdder tolerates concurrent updates.
    private final ConcurrentMap<String, DoubleAdder> costPerAgent = new ConcurrentHashMap<>();

    @Override
    public void onProcessEvent(AgentProcessEvent event) {
        if (event instanceof LlmInvocationEvent llm) {
            costPerAgent
                .computeIfAbsent(llm.getAgentProcess().getAgent().getName(), k -> new DoubleAdder())
                .add(llm.getInvocation().cost());
        }
    }
}

Kotlin:

class OrganizationCostTracker : AgenticEventListener {

    // Running cost total per agent name; DoubleAdder tolerates concurrent updates.
    private val costPerAgent = ConcurrentHashMap<String, DoubleAdder>()

    override fun onProcessEvent(event: AgentProcessEvent) {
        if (event is LlmInvocationEvent) {
            costPerAgent
                .computeIfAbsent(event.agentProcess.agent.name) { DoubleAdder() }
                .add(event.invocation.cost())
        }
    }
}

The same pattern works for EmbeddingInvocationEvent.
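To get totals per run instead of per agent name, the same accumulation can be keyed by agentProcess.id and labeled with agentProcess.agent.name. The sketch below shows only that bookkeeping, with no Embabel types involved. ProcessCostLedger, record, totalFor, and report are hypothetical names, not part of Embabel, and the sketch assumes the cost arrives as a double, as in the listing above. A listener would call record with values read from the event.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.DoubleAdder;

/** Hypothetical helper (not an Embabel type): accumulates cost per agent process id. */
public class ProcessCostLedger {

    // Running total per process id.
    private final ConcurrentMap<String, DoubleAdder> costPerProcess = new ConcurrentHashMap<>();
    // Human-readable label (agent name) for each process id.
    private final ConcurrentMap<String, String> agentNameByProcess = new ConcurrentHashMap<>();

    /** Adds one call's cost to the process total; the first agent name seen is kept as the label. */
    public void record(String processId, String agentName, double cost) {
        // Store the label first so any id visible in costPerProcess always has a name.
        agentNameByProcess.putIfAbsent(processId, agentName);
        costPerProcess.computeIfAbsent(processId, k -> new DoubleAdder()).add(cost);
    }

    /** Returns the accumulated cost for a process, or 0.0 if nothing was recorded. */
    public double totalFor(String processId) {
        DoubleAdder adder = costPerProcess.get(processId);
        return adder == null ? 0.0 : adder.sum();
    }

    /** Returns a sorted snapshot keyed by "agentName/processId". */
    public Map<String, Double> report() {
        Map<String, Double> snapshot = new TreeMap<>();
        costPerProcess.forEach((id, adder) ->
            snapshot.put(agentNameByProcess.get(id) + "/" + id, adder.sum()));
        return snapshot;
    }
}
```

Grouping by process id keeps two concurrent runs of the same agent apart, while the agent name makes the report readable.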
Blocking spending: the Budget Guardrail pattern
Cost events fire after the call completes, so they cannot stop the call that just ran. What they can do is stop the next one.
The pattern combines two pieces you already know:
- A listener that counts. Subscribe to LlmInvocationEvent and accumulate cost or tokens against the key you care about: agent process id, tenant, end user.
- A guardrail that blocks. A UserInputGuardRail reads the counter before the next LLM call. If the budget is exceeded, the guardrail returns a CRITICAL validation error and the call never happens.
LLM call ───► LlmInvocationEvent ─┐
                                  ▼
                          counter (per agent / tenant / user)
                                                       │
next call ──► UserInputGuardRail reads counter ────────┘
                      │
                over budget? ──► CRITICAL ──► call blocked

The counter lives in your listener; the decision lives in your guardrail.
Embabel wires both into the agent process for you.
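The shared state between the two pieces can be very small. The following is a minimal sketch of such a counter, assuming a single fixed limit per key and a cost expressed as a double. BudgetCounter, record, and isOverBudget are hypothetical names, not Embabel API, and the guardrail wiring itself is deliberately left out because it depends on the UserInputGuardRail contract.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.DoubleAdder;

/** Hypothetical helper (not an Embabel type): state shared by the listener and the guardrail. */
public class BudgetCounter {

    // Maximum spend allowed per key, in the same unit as the recorded cost.
    private final double limit;
    // Accumulated spend per key (agent process id, tenant, end user, ...).
    private final ConcurrentMap<String, DoubleAdder> spent = new ConcurrentHashMap<>();

    public BudgetCounter(double limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be non-negative");
        }
        this.limit = limit;
    }

    /** Listener side: call after each completed LLM call with that call's cost. */
    public void record(String key, double cost) {
        spent.computeIfAbsent(key, k -> new DoubleAdder()).add(cost);
    }

    /** Guardrail side: true once the recorded spend for the key exceeds the limit. */
    public boolean isOverBudget(String key) {
        DoubleAdder adder = spent.get(key);
        return adder != null && adder.sum() > limit;
    }
}
```

In this arrangement the listener would call record from its LlmInvocationEvent branch, and the guardrail would call isOverBudget and return a CRITICAL validation error when it is true. Because the cost is recorded only after a call completes, the call that crosses the limit still runs, so a budget enforced this way can overshoot by roughly the cost of the calls already in flight.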
See Working with Guardrails for how to register a UserInputGuardRail and how CRITICAL validation errors stop execution.