The demo always works. That is the whole problem with demos. A human runs the agent on a clean input they chose, the tools happen to return good data, the model happens to produce parseable output, and everyone agrees it is ready to ship. Then it meets real traffic, where inputs are malformed, tools time out, APIs return half-broken JSON, and the model occasionally outputs a paragraph of prose where you asked for a number. The agent that sailed through the demo now fails in ways nobody planned for, because nobody planned for failure at all.
The difference between a demo agent and a production agent is not a better model or a cleverer prompt. It is an entire layer the demo never needed: the robustness layer. It is unglamorous, it never shows up in a screenshot, and it is the single biggest reason agents that look done are not. Here is what is actually in it.
Guardrails on the way in
The first thing a production agent does with a request is not trust it. Input guardrails sit in front of the model and decide whether this input should reach it at all.
There are two jobs here. The first is correctness: is the input well-formed and within the bounds the agent can handle? An empty request, a request in a language the agent does not support, a file that is the wrong type, a query that is ten times longer than anything the agent was designed for. These should be caught and handled with a clear response, not passed to the model to flail at.
The second job is safety: is this input trying to make the agent do something it should not? Prompt injection arriving through a document the agent was asked to summarize, an instruction buried in user-supplied text trying to override the system prompt, a request to exfiltrate data the agent has access to. Input guardrails are where you catch the obvious cases before they become incidents.
The mistake teams make is treating the model as its own guardrail: "we told it in the prompt not to do that." A prompt instruction is a suggestion, not a control. Guardrails are code that runs regardless of what the model decides.
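To make "code that runs regardless of what the model decides" concrete, here is a minimal sketch of an input guardrail in Python. Everything in it is illustrative: the `MAX_CHARS` limit, the `GuardrailResult` type, and the two regex patterns are assumptions rather than a recommended rule set. Pattern matching like this catches only the obvious injection attempts.

```python
import re
from dataclasses import dataclass

MAX_CHARS = 8_000  # assumed limit; set it to what the agent was designed for

# Deliberately crude, hypothetical patterns: they catch only the obvious cases.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"reveal (your|the) system prompt", re.I),
]


@dataclass
class GuardrailResult:
    allowed: bool
    reason: str = ""


def check_input(text: str) -> GuardrailResult:
    """Runs before the model is called, whatever the prompt says."""
    # Correctness checks: is the input something the agent can handle?
    if not text or not text.strip():
        return GuardrailResult(False, "empty_input")
    if len(text) > MAX_CHARS:
        return GuardrailResult(False, "input_too_long")
    # Safety checks: does the input look like an attempt to redirect the agent?
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            return GuardrailResult(False, "possible_prompt_injection")
    return GuardrailResult(True)
```

The point of the shape is that the caller gets a reason code it can turn into a clear response, and the model never sees a rejected input.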
Guardrails on the way out
Output guardrails are the mirror image, and they are the more commonly skipped of the two. Before the agent acts on what the model produced, something has to check that the output is valid.
If you asked for structured output, validate it against the schema. Models produce almost-valid JSON often enough that "almost" will eventually break you: a trailing comma, a hallucinated field, a number formatted as a sentence. If the output drives an action (sending an email, updating a record, charging a card), validate that the action is within policy before executing it. An agent that decides to issue a refund should not be the only thing standing between a malformed reasoning step and your payment system.
This is also where you catch the model confidently making things up. A faithfulness check that verifies the answer is grounded in the retrieved source is an output guardrail: it sits between the model and the user and refuses to pass an answer the evidence does not support. The principle generalizes. Anything the model produces that will be acted on should pass through a gate that can say no.
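As a sketch of such a gate, assume the model was asked to return a refund decision as JSON. The field names, the `MAX_REFUND_CENTS` policy limit, and the `OutputRejected` exception below are all hypothetical; a real system would likely use a schema library and its own policy rules. The structure is what matters: parse, validate shape, validate policy, and only then hand the result to the code that acts.

```python
import json

MAX_REFUND_CENTS = 5_000  # assumed policy limit for automatic refunds
EXPECTED_FIELDS = {"action", "order_id", "amount_cents"}


class OutputRejected(Exception):
    """Raised when model output must not be acted on."""


def parse_refund_decision(raw: str) -> dict:
    """Return the decision only if it is well-formed and within policy."""
    try:
        data = json.loads(raw)  # rejects trailing commas and other near-JSON
    except json.JSONDecodeError as exc:
        raise OutputRejected(f"invalid_json: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise OutputRejected("not_an_object")
    if set(data) != EXPECTED_FIELDS:  # catches hallucinated or missing fields
        raise OutputRejected("unexpected_or_missing_fields")
    if data["action"] != "refund":
        raise OutputRejected("unknown_action")
    if not isinstance(data["order_id"], str) or not data["order_id"]:
        raise OutputRejected("bad_order_id")
    amount = data["amount_cents"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int):
        raise OutputRejected("amount_not_integer")
    if not 0 < amount <= MAX_REFUND_CENTS:
        raise OutputRejected("amount_outside_policy")
    return data
```

A number formatted as a sentence fails the type check, and a well-formed but oversized refund fails the policy check, so neither reaches the payment system.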
Retries that are actually smart
When a step fails, the naive response is to retry. The naive retry is also how you turn one failure into a cascade of failures that costs ten times as much.
A real retry policy distinguishes two kinds of failure. Transient failures (a timeout, a rate limit, a temporary network blip, a momentary 503) are worth retrying, because the same call might succeed a moment later. These get retried with exponential backoff: wait a bit, then a bit longer, then give up after a bounded number of attempts. Deterministic failures (a malformed request, an authentication error, a 400 because the input is wrong) will fail identically every time. Retrying them is pure waste: same money, same latency, same failure at the end.
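A minimal sketch of that policy in Python follows. The three exception classes and the `call_with_retry` helper are hypothetical names for illustration; in practice you would map your HTTP client's timeouts and status codes onto the transient and deterministic categories yourself. The default delays are assumptions, not recommendations.

```python
import time


class TransientError(Exception):
    """Timeouts, rate limits, 503s: the same call may succeed a moment later."""


class DeterministicError(Exception):
    """Malformed requests, auth errors, 400s: the call fails identically every time."""


class RetriesExhausted(Exception):
    """The attempt cap was hit; the caller has to decide what happens next."""


def call_with_retry(fn, max_attempts=3, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call fn(), retrying only transient failures, with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except DeterministicError:
            raise  # retrying is pure waste, so surface it immediately
        except TransientError as exc:
            if attempt == max_attempts:
                raise RetriesExhausted(f"gave up after {attempt} attempts") from exc
            # Wait a bit, then a bit longer: 0.5s, 1s, 2s, ... up to max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
```

Passing `sleep` as a parameter is only a convenience so the helper can be tested without real waiting. Raising a distinct `RetriesExhausted` means the caller cannot mistake a capped-out retry for a result.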
The number that matters most is the bound. An unbounded retry, or a retry nested inside an agent loop that is itself retrying, is how a single stuck tool turns into a runaway that burns the budget and pins the latency at the timeout ceiling. Every retry needs a hard cap and a backoff, and the agent needs to know what to do when the cap is hit, which brings us to the part teams skip entirely.
Fallbacks: what happens when a step just cannot succeed
Here is the question almost no demo agent can answer: when this step genuinely cannot succeed, what does the agent do?
The default answer, the one you get when you do not design a fallback, is the worst one: the agent produces something anyway. The tool failed, so the model guesses what the tool would have returned and continues as if it had real data. This is how agents ship confident wrong answers. The failure was invisible because the agent papered over it.
A real fallback is a deliberate decision for each step about how to degrade:
- Degrade gracefully: return a partial result with the missing piece clearly marked, rather than a complete-looking result that is secretly fabricated.
- Substitute: fall back to a simpler tool, a cached value, or a default when the primary path fails.
- Escalate: hand off to a human when the agent cannot complete the task safely. For high-stakes actions this is not a failure of the agent; it is the agent working correctly.
- Abort cleanly: stop and report the failure honestly, rather than continuing on fabricated data.
The unifying rule: a failed step should never silently become a successful-looking output. The agent should always know the difference between "I did this" and "I could not do this and here is what I did instead."
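One way to enforce that rule in code is to make every step return an explicit status instead of a bare value. The sketch below is one possible shape, not a standard pattern: `Status`, `StepResult`, and `run_step` are hypothetical names, and it covers only the substitute and abort cases. Escalation and partial results would need their own statuses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional


class Status(Enum):
    OK = "ok"                    # "I did this"
    SUBSTITUTED = "substituted"  # "I could not, and here is what I did instead"
    FAILED = "failed"            # "I could not, and I stopped"


@dataclass
class StepResult:
    status: Status
    value: Any = None
    note: str = ""


def describe(exc: Exception) -> str:
    return f"{type(exc).__name__}: {exc}"


def run_step(primary: Callable[[], Any],
             fallback: Optional[Callable[[], Any]] = None) -> StepResult:
    """Run a step so that a failure can never look like a success."""
    try:
        return StepResult(Status.OK, primary())
    except Exception as exc:
        primary_error = describe(exc)
    if fallback is None:
        return StepResult(Status.FAILED, None, f"primary failed ({primary_error})")
    try:
        value = fallback()  # e.g. a cached value or a simpler tool
    except Exception as exc:
        return StepResult(
            Status.FAILED, None,
            f"primary failed ({primary_error}); fallback failed ({describe(exc)})",
        )
    return StepResult(Status.SUBSTITUTED, value, f"primary failed ({primary_error})")
```

Downstream code then has to look at `status` before using `value`, which is exactly the friction you want: a substituted or failed step stays labeled as such all the way to the final answer.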
You cannot make robust what you cannot see
All of this assumes you can tell when a guardrail tripped, a retry fired, or a fallback engaged. If those events are invisible, you are flying blind, and your robustness layer might be silently catching nothing or catching everything. This is why robustness and observability are the same project: every guardrail, retry, and fallback should emit a signal you can see and alert on. The first time you find out your output guardrail has been rejecting 20% of responses should not be a customer complaint.
It also makes failures debuggable. When something goes wrong in production, you want to look at the trace and see exactly which guardrail fired or which retry exhausted, the same discipline we apply when doing forensics on an agent failure after the fact.
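A sketch of what "emit a signal" can mean at its simplest: one function that every guardrail, retry, and fallback calls, which both increments a counter and writes a structured log line. The `emit` and `rejection_rate` helpers and the in-memory `EVENT_COUNTS` counter are assumptions for illustration; in production the counter would be whatever metrics client you already run.

```python
import json
import logging
from collections import Counter

logger = logging.getLogger("agent.robustness")
EVENT_COUNTS = Counter()  # in-memory stand-in for a real metrics client


def emit(kind: str, name: str, **fields) -> dict:
    """Record one robustness event; kind is 'guardrail', 'retry', or 'fallback'."""
    EVENT_COUNTS[(kind, name)] += 1
    event = {"kind": kind, "name": name, **fields}
    # One JSON object per line keeps the events searchable in a trace or log store.
    logger.info(json.dumps(event, sort_keys=True, default=str))
    return event


def rejection_rate(name: str, total_responses: int) -> float:
    """Share of responses a given guardrail rejected; something to alert on."""
    if total_responses <= 0:
        return 0.0
    return EVENT_COUNTS[("guardrail", name)] / total_responses
```

For example, an output guardrail might call `emit("guardrail", "refund_schema", reason="amount_outside_policy", trace_id=trace_id)` each time it rejects a response, where `trace_id` is whatever identifier your tracing already attaches to the request.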
A robustness checklist before you ship
Before an agent goes to real users, walk through this:
- Every input passes through validation and safety guardrails before reaching the model.
- Every model output that drives an action is validated against a schema or policy before the action runs.
- Every tool call has a bounded, backed-off retry that distinguishes transient from deterministic failures.
- Every step has a defined fallback, and no failed step can silently produce a successful-looking result.
- Every guardrail, retry, and fallback emits a signal you can monitor.
If you cannot check all five, the agent is not done, no matter how good the demo looked.
If your agent is fragile in ways you keep discovering in production
The pattern of "it broke again, in a new way we did not expect" is the signature of a missing robustness layer. Each new failure is not a new bug to patch; it is the absence of a layer that should have handled an entire class of failures at once.
Sapota runs a one-week agent hardening pass that adds the input and output guardrails, the retry and fallback policies, and the monitoring to see them work, shipped as a working integration on top of your existing agent. We have done this for support agents, financial workflows, and document pipelines where a confident wrong answer is expensive.
Reach out via the AI engineering page with a description of how your agent fails. The failure modes are usually familiar, and the layer that prevents them is well understood.








