We’re at an inflection point with AI. The systems being deployed today don’t just answer questions or generate text – they plan, execute multi-step workflows, and make decisions across extended chains of reasoning. This is agentic AI, and it represents a fundamental shift in how we interact with these tools.
If you’re building agentic systems, you probably already know what I’m about to say. The question is whether everyone using them does.
The potential is obvious. An AI agent could research a legal question, draft a memo, identify relevant cases, and format citations – all while you’re in court or meeting with clients. It could manage your inbox, schedule meetings based on priority, and draft responses that account for your communication style and ongoing projects. These aren’t hypothetical futures; they’re capabilities being built right now.
But there’s a problem we need to talk about, and it’s not the one you might expect. The issue isn’t whether agentic AI will work. It’s whether we’re ready for what happens when it doesn’t.
The Multiplication of Error
Traditional AI systems present a contained risk. You ask Claude or ChatGPT for something, you get an answer, and you verify it. One question, one output, one opportunity to catch a mistake. The verification step is your responsibility, but at least the error surface is limited.
Agentic systems change this equation. They take action after action after action, each one building on the last. And here’s where things get dangerous: a single mistake early in the chain doesn’t just create one error; it creates a foundation that every subsequent action builds upon. The arithmetic alone should give us pause: even if each step were 98% reliable, a fifty-step chain would complete without error only about a third of the time, and that assumes the errors stay independent rather than compounding.
Consider the attorney who asks an AI to research case law. The AI hallucinates a case; we know this happens. In a traditional interaction, the lawyer sees the citation, has the chance to verify it, and catches the problem. But an agent doesn’t stop there. It uses that fabricated case to support its legal reasoning. It drafts a brief around that reasoning. It might even cite the fake case multiple times throughout the document, each reference appearing to corroborate the others. By the time the document reaches the lawyer, it looks thoroughly researched and internally consistent. The error hasn’t just persisted; it has metastasized.
This is what makes cascading failures so insidious. Each step adds apparent legitimacy to the mistake that came before.
We Haven’t Learned the First Lesson Yet
Here’s what worries me most: we’re rushing toward agentic AI while many users haven’t internalized the basic requirement of verifying AI outputs at all.
The cases keep coming. Lawyers sanctioned for citing non-existent cases. Researchers retracting papers built on fabricated data. Business decisions based on confidently wrong AI analysis. These aren’t edge cases anymore – they’re a pattern. And they all share a common cause: someone trusted the AI output without verification.
If people won’t verify a single citation from ChatGPT, what happens when an agent generates fifty of them across multiple documents? If someone doesn’t catch a factual error in one AI-written email, how will they audit an agent that’s been managing their correspondence for a week?
The autonomy that makes agents powerful is the same quality that removes the natural checkpoints where humans would catch errors. Every automated action is one less moment where someone stops and asks, “Wait, is this actually right?”
This Isn’t Anti-Progress
I want to be clear: I’m not arguing against agentic AI. The capabilities are real, and the productivity gains will be substantial. Trying to stop this technology would be both futile and counterproductive.
But we need to be honest about the transition we’re making. We’re moving from tools that assist us to tools that act for us. That’s not a small distinction, and it carries implications we’re not yet taking seriously enough.
The solution isn’t to avoid agents. It’s to build and use them responsibly:
Verification by design: Agentic systems need built-in checkpoints where they surface their reasoning and sources for human review before proceeding to the next step. Not as an optional feature, but as a core design principle.
Audit trails that matter: Every action an agent takes should be logged in a way that humans can actually review. Not buried in technical logs, but presented clearly: “I did X because of Y, using source Z.” A sketch of how this and the checkpoint principle above might combine in code follows this list.
Education that sticks: Users need to understand that agents aren’t more reliable than traditional AI – they’re less forgiving of our inattention. The same verification discipline required for a single AI response applies to every single action an agent takes.
Appropriate scope: Some tasks are well-suited for autonomous execution. Others require human judgment at every step. We need to get better at distinguishing between them, and that distinction will be different for different users and contexts.
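To make the first two principles concrete, here’s a minimal sketch in Python of how a human checkpoint and a readable audit trail might fit together in an agent loop. It’s illustrative only: the AgentStep fields, the rule that unsourced steps require sign-off, and the log format are my assumptions, not a description of any existing framework.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentStep:
    """One proposed action, plus the context a human needs to review it."""
    action: str                # what the agent wants to do
    reasoning: str             # why it believes this is the right step
    sources: list[str]         # where the supporting information came from
    high_stakes: bool = False  # does this step need explicit sign-off?

@dataclass
class AuditedAgent:
    """Wraps each step in an audit entry and a human checkpoint."""
    confirm: Callable[[str], bool]  # e.g. a CLI prompt or UI dialog
    audit_log: list[str] = field(default_factory=list)

    def execute(self, step: AgentStep) -> bool:
        # Record the "I did X because of Y, using source Z" entry before
        # anything happens, so rejected steps stay visible too.
        entry = (f"PROPOSED: {step.action} | BECAUSE: {step.reasoning} | "
                 f"SOURCES: {', '.join(step.sources) or 'none'}")
        self.audit_log.append(entry)

        # Checkpoint: unsourced or high-stakes steps can't proceed
        # without a human explicitly approving them.
        if step.high_stakes or not step.sources:
            if not self.confirm(entry):
                self.audit_log.append(f"REJECTED: {step.action}")
                return False

        self.audit_log.append(f"EXECUTED: {step.action}")
        return True

# Usage: a high-stakes step with no verifiable sources (here, a made-up
# citation) is forced through human review instead of executing silently.
agent = AuditedAgent(
    confirm=lambda entry: input(f"{entry}\nApprove? [y/N] ").strip().lower() == "y"
)
agent.execute(AgentStep(
    action="cite Smith v. Jones in the draft brief",
    reasoning="supports the negligence argument",
    sources=[],          # nothing verifiable, so the checkpoint triggers
    high_stakes=True,
))
print("\n".join(agent.audit_log))
```

The design choice worth noting: the audit entry is written before the action runs, so a rejected or failed step leaves the same paper trail as a successful one.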
Signs of Progress (and Why They’re Not Enough)
There are promising developments. Some agent frameworks now include explicit reasoning traces that show their step-by-step logic. Others build in source citations and self-verification steps, or the ability to pause and ask for confirmation before high-stakes actions. These are exactly the kinds of safeguards we need. But here’s the problem: they’re not yet standard. Many agentic systems being deployed today lack these features entirely. And even when the technical safeguards exist, adoption is inconsistent.
More importantly, cultural habits lag far behind the technology. Even with reasoning traces available, how many users actually read them? Even with confirmation prompts built in, how many people wave them through without genuine review? The tools are getting better, but our practices haven’t caught up – and in some cases, haven’t even started the journey.
This gap between what’s technically possible and what’s culturally embedded is where the real risk lives. We can build all the safeguards we want, but if users treat them as obstacles to efficiency rather than essential checkpoints, we haven’t solved the problem.
The Stakes Are Rising
The lawyer who doesn’t verify AI citations might face sanctions. Embarrassing, costly, but usually containable. An agent acting on bad information could file motions, miss deadlines, and make strategic decisions that compound over weeks or months before anyone notices. The financial analyst who doesn’t verify an AI-generated report might make one bad call. An agent trading on flawed analysis could execute an entire strategy before the underlying error surfaces.
This isn’t fear-mongering. It’s extrapolating from what we already know: AI makes mistakes, and people don’t verify as consistently as they should. Agentic systems don’t fix either of those problems. They amplify the consequences.
What We Owe the Future
There is no doubt in my mind that agentic AI will become ubiquitous. The efficiency gains are too significant, and the competitive pressure too intense, for any other outcome. The question isn’t whether we’ll use these systems, but whether we’ll use them wisely.
That means acknowledging an uncomfortable truth: the very features that make agents useful – their autonomy, their ability to chain actions together, their capacity to work without constant supervision – also create new failure modes that our current habits aren’t equipped to handle.
We need to mature our relationship with AI before we hand it the keys to autonomous action. That means treating verification not as an optional safety check but as a non-negotiable practice. It means designing systems that assume errors will happen and make them visible when they do. It means recognizing that increased automation requires more vigilance, not less.
The good news is that we’re seeing the right technical foundations being laid. The concerning news is that technical capability alone won’t save us from ourselves. We need the cultural shift to match the technological one – and right now, we’re not even close.
The technology is moving forward regardless. The question is whether we’re moving forward with it or just being carried along.