AI Agents Are Injecting Chaos You Can't See
Every autonomous AI agent in production is a chaos engineering experiment you didn't design, didn't approve, and aren't tracking. That is the central finding from a new analysis by Sayali Patil, a veteran of Cisco and Splunk who spent six years building infrastructure automation at enterprise scale. Patil argues that the gap between agentic AI and chaos engineering is generating a wave of production incidents that most organizations have no framework to detect, let alone prevent.
Seventy-nine percent of organizations now have some form of AI agent in production, with 96% planning expansion, according to PwC. Gartner predicts 33% of enterprise software will include agentic AI by 2028, but separately warns that 40% of agentic AI projects will be canceled by the end of 2027 due to poor risk controls. What neither statistic captures is the failure mode happening between those two numbers: agents that are running, that are not canceled, and that are quietly generating infrastructure events no one has categorized as risk.
For enterprise executives, this is not a theoretical concern. The same agents designed to remediate latency, restart services, or reroute traffic are themselves becoming the primary cause of cascading failures. The postmortems don't capture it because the agent is invisible—logged as a service restart or a connection pool saturation, not as the initiating cause. The result is a blind spot that will only grow as agent adoption accelerates.
The Judgment Call That Agents Skip
Traditional chaos engineering has a critical safety property: a human makes a judgment call before injecting failure. They check dashboards, assess error budget burn rates, and evaluate whether dependencies are stable. It's imperfect, but there is a person in the loop asking the right question before anything runs.
Autonomous remediation agents bypass that entirely. An agent sees an anomaly, takes an action—restart a service, scale resources, modify configurations—and that action is a chaos event. No SLO burn rate check. No blast radius calculation. No human judgment about whether right now is the right moment to introduce additional stress into a system already under pressure.
Patil describes a specific failure mode: a remediation agent detects elevated latency on a microservice and restarts the service cluster. Reasonable, given its narrow view. But the agent doesn't know that three other services are handling peak traffic, the shared connection pool is at 87% utilization, and a dependent database is running a background index rebuild. The restart triggers a thundering herd. What started as a latency spike becomes a cascade the agent was never designed to model.
According to the AI Incident Database, reported AI-related incidents rose 21% from 2024 to 2025. That count almost certainly understates actual exposure because most organizations have no incident classification that captures an autonomous agent action as the initiating cause.
Absorb Capacity: The Missing Resource
The underlying problem is that enterprise systems have no shared language for absorb capacity—the real-time estimate of how much additional stress a system can take before breaching SLO commitments. Chaos engineering programs manage it implicitly through human judgment and static thresholds. Agents don't manage it at all.
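One input to any absorb capacity estimate, SLO burn rate, has a widely used definition: the observed error rate divided by the error rate the SLO allows. The following is a minimal Python sketch of that calculation. The function name and the example numbers are assumptions chosen for illustration, not values from Patil's analysis.

```python
def slo_burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    A value of 1.0 means the budget runs out exactly at the end of the
    SLO window; values above 1.0 mean it runs out early.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        return 0.0  # no traffic observed, so nothing was burned
    allowed_error_rate = 1.0 - slo_target
    observed_error_rate = failed / total
    return observed_error_rate / allowed_error_rate


# Illustrative numbers: a 99.9% SLO with 50 failures in 10,000 requests
# burns the error budget roughly five times faster than planned.
rate = slo_burn_rate(failed=50, total=10_000, slo_target=0.999)
```

A remediation agent that reads this number before acting has at least one of the inputs a human operator would check on a dashboard.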
Patil proposes a resilience budget model that treats absorb capacity as a continuously recomputed, consumable resource. It draws on four live signal classes: SLO burn rate, P99 latency trend, dependency saturation state, and application behavioral signals like session completion rates and API call pattern shifts. Every chaos experiment and every agent action draws from this budget. Without a shared ledger, two teams running experiments against overlapping dependencies produce a combined blast radius neither planned. Add autonomous agents acting outside the ledger, and the accounting collapses.
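The shared-ledger idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Patil's implementation: the class name, the abstract "stress units", and the actor names are hypothetical, and because the source does not specify how the four signal classes combine into one number, the capacity is simply passed in as a precomputed value.

```python
from dataclasses import dataclass, field


@dataclass
class ResilienceBudget:
    """Shared ledger of absorb capacity, in abstract stress units."""

    capacity: float  # assumed to be recomputed elsewhere from live signals
    draws: dict[str, float] = field(default_factory=dict)  # actor -> units held

    @property
    def remaining(self) -> float:
        return self.capacity - sum(self.draws.values())

    def recompute(self, capacity: float) -> None:
        """Replace the capacity with a fresh estimate from live signals."""
        self.capacity = capacity

    def request(self, actor: str, cost: float) -> bool:
        """Grant the draw only if the remaining budget covers it."""
        if cost <= 0:
            raise ValueError("cost must be positive")
        if cost > self.remaining:
            return False
        self.draws[actor] = self.draws.get(actor, 0.0) + cost
        return True

    def release(self, actor: str) -> None:
        """Return an actor's units once its experiment or action ends."""
        self.draws.pop(actor, None)


budget = ResilienceBudget(capacity=10.0)
chaos_ok = budget.request("chaos-team-a", 6.0)       # granted
agent_ok = budget.request("remediation-agent", 6.0)  # denied: only 4.0 left
budget.release("chaos-team-a")
retry_ok = budget.request("remediation-agent", 6.0)  # granted after release
```

The point of the design is that chaos experiments and agent actions go through the same `request` call, so a combined blast radius is refused up front instead of discovered in a postmortem.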
Where LLMs Help—And Where They Fail
Some organizations are using large language models to generate chaos hypotheses from dependency graphs and postmortem corpora. Results are directionally useful: LLMs surface plausible failure modes faster than manual processes. But the limit is dependency graph staleness. A hypothesis generated from a graph that doesn't reflect last month's service extraction will propose an experiment with incorrect blast radius assumptions. The model is confidently incorrect about a system boundary that no longer exists.
Stanford's Trustworthy AI Research Lab found that model-level guardrails alone are insufficient: fine-tuning attacks bypassed leading models in the majority of tested cases. The implication for chaos hypothesis generation is direct: a model that cannot reliably hold its own safety boundaries cannot be trusted to accurately model the blast radius of an action it has never seen in a dependency graph it has not verified.
When hypothesis generation draws from postmortem corpora, the staleness problem shrinks. Postmortems describe failures that actually occurred—the signal is inherently validated by production reality. This is the tractable near-term AI application: generating hypotheses from incident history. What AI cannot do—and should not be asked to do—is make the execution decision when signals are ambiguous. That judgment requires awareness of things outside any monitoring system: pending deployments, on-call staffing, customer commitments. A model without that context should not be making that call.
Winners & Losers
Winners: Chaos engineering and reliability automation vendors will see surging demand as enterprises realize they need integrated governance for agent actions. Enterprises that invest early in resilience budgets and agent governance will safely adopt agentic AI and gain competitive advantage while competitors struggle with cascading failures.
Losers: Enterprises with weak risk controls face higher likelihood of project cancellations and operational failures from unmonitored AI agents. AI vendors relying solely on model-level guardrails will lose trust and market share as their solutions prove insufficient against real-world production chaos.
Second-Order Effects
The market will shift from focusing on AI model capabilities to emphasizing robust testing, monitoring, and governance frameworks. Chaos engineering will become a standard practice for agentic AI deployments. Regulatory scrutiny may increase as incidents rise, potentially slowing adoption for unprepared enterprises. The next wave of AI infrastructure spending will flow to reliability and observability tools, not just model training.
Market / Industry Impact
The enterprise AI market is at an inflection point. The same forces driving adoption—autonomy, speed, scale—are creating systemic risks that traditional engineering practices cannot manage. The organizations that operate autonomous agents reliably at scale will not be those with the most sophisticated models. They will be the ones that understood, before something went badly wrong, that every agent action is a chaos event and built their governance layer accordingly.
Executive Action
- Audit every autonomous agent currently touching infrastructure. Map its action surface against live SLO burn rate signals and define explicit floor conditions below which the agent must wait or escalate.
- Implement a resilience budget model that treats absorb capacity as a shared, consumable resource. Ensure all agent actions and chaos experiments draw from the same ledger.
- Require human-in-the-loop for ambiguous execution decisions. A circuit breaker that hands ambiguous cases to a human is not a weakness—it is what makes the architecture trustworthy enough to run in production.
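The floor conditions and the human handoff described in these actions can be sketched as a single gate that every agent action passes through. The Python below is an illustration only: the signal names, the threshold values, and the three-way decision are assumptions, and real floor conditions would be tuned per service.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    WAIT = "wait"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Signals:
    burn_rate: float              # SLO burn rate, where 1.0 is on budget
    pool_utilization: float       # shared connection pool, 0.0 to 1.0
    dependency_maintenance: bool  # e.g. an index rebuild in progress


def gate(
    signals: Signals,
    max_burn_rate: float = 2.0,    # hypothetical floor condition
    max_pool: float = 0.80,        # hypothetical floor condition
    clear_burn_rate: float = 1.0,  # at or below this is clearly healthy
    clear_pool: float = 0.60,      # at or below this is clearly healthy
) -> Decision:
    """Decide whether an agent action may run right now."""
    # Floor breached: the system cannot absorb more stress, so hold off.
    if signals.burn_rate > max_burn_rate or signals.pool_utilization > max_pool:
        return Decision.WAIT
    clearly_safe = (
        signals.burn_rate <= clear_burn_rate
        and signals.pool_utilization <= clear_pool
        and not signals.dependency_maintenance
    )
    # Anything between clearly safe and clearly unsafe goes to a human.
    return Decision.PROCEED if clearly_safe else Decision.ESCALATE


# The restart scenario described earlier: pool at 87%, index rebuild running.
# The burn rate of 1.5 is an assumed value for the example.
restart_decision = gate(Signals(1.5, 0.87, True))
```

With these assumed thresholds the restart from the earlier scenario is held back, while a mid-range reading such as a burn rate of 1.5 with a quiet connection pool is handed to a person instead of being decided by the agent.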
Why This Matters
The gap between agentic AI and chaos engineering is not a technical nuance—it is a structural vulnerability that will produce the next wave of major production incidents. Every day without an integrated governance layer is a day your agents are running experiments you didn't approve, against systems you don't fully understand, with consequences you won't discover until it's too late. The organizations that act now will build a moat; those that wait will be caught in the cascade.
Final Take
AI agents are not just tools—they are autonomous chaos injectors. Treating them as anything less is a strategic error that will compound as adoption scales. The winning enterprises will be those that integrate chaos engineering into their agent governance from day one, not those that retrofit it after the first unplanned outage. The clock is ticking.
Intelligence FAQ
AI agents can trigger cascading infrastructure failures because they act without human judgment about system absorb capacity, and most organizations lack frameworks to detect or prevent these incidents.
Implement a resilience budget model that treats absorb capacity as a shared resource, require all agent actions to register against live SLO signals, and mandate human-in-the-loop for ambiguous decisions.
Gartner attributes cancellations to poor risk controls—specifically, the inability to manage the chaos that autonomous agents introduce into production systems.



