Intro: The Core Shift
OpenAI and Thrive Holdings just revealed a working self-improving agent in the wild. Tax AI, built on Codex, processed 7,000 tax returns for Crete's 30+ accounting firms this season. The headline numbers—97% accuracy, 50% throughput increase, one-third time savings—are impressive. But the strategic signal is far bigger: the system measurably improved itself without manual retraining. At launch, only 25% of returns reached 75% field completion. Six weeks later, 86% did. That's not a software update; that's a paradigm shift in how professional services automation works.
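To make the completion metric concrete, here is a minimal Python sketch of how the share of returns reaching a field-completion threshold could be computed. The function name, the data layout (a filled-field count and a total-field count per return), and the sample numbers are illustrative assumptions, not details published by OpenAI or Thrive.

```python
from typing import Sequence, Tuple


def share_reaching_threshold(
    returns: Sequence[Tuple[int, int]], threshold: float = 0.75
) -> float:
    """Fraction of returns whose field completion meets the threshold.

    Each return is a hypothetical (fields_filled, fields_total) pair.
    """
    if not returns:
        return 0.0
    reached = sum(
        1 for filled, total in returns
        if total > 0 and filled / total >= threshold
    )
    return reached / len(returns)


# Made-up sample: four returns, only the first reaches 75% completion.
sample = [(30, 40), (10, 40), (20, 40), (29, 40)]
print(share_reaching_threshold(sample))  # 0.25
```

Read this way, the launch figure says one return in four cleared the 75% bar, and the six-week figure says roughly six in seven did. It is a measure of how much of each return the system filled in, which is distinct from the 97% accuracy figure.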
Analysis: Strategic Consequences
1. The Self-Improvement Loop Is Now Production-Proven
The three-pillar architecture (practitioner feedback, production traces, and Codex-driven iteration) turns every correction into a structured signal. Codex investigates the trace, runs targeted evals, and proposes code changes. This is not a demo; it's a live system in which the share of returns reaching 75% field completion rose from 25% to 86% in six weeks. The implications for any domain with repetitive, expert-driven workflows are profound. Bookkeeping, audit, legal document review, medical coding: any process where humans correct AI outputs could, in principle, become a self-improving system.
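As a rough illustration of what "turning a correction into a structured signal" might look like, the Python sketch below records each practitioner correction alongside a trace identifier and ranks form fields by how often they were corrected, as candidates for targeted evals. The `Correction` schema, the field names, and `rank_fields_for_evals` are hypothetical. The source does not describe Tax AI's actual data model or how Codex prioritizes its investigations.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Correction:
    """One practitioner correction tied to a production trace (hypothetical schema)."""
    trace_id: str     # identifier of the production trace for this return
    field_name: str   # form field the practitioner changed
    predicted: str    # value the agent produced
    corrected: str    # value the practitioner entered


def rank_fields_for_evals(
    corrections: List[Correction], top_n: int = 3
) -> List[Tuple[str, int]]:
    """Rank fields by number of genuine corrections, most-corrected first."""
    counts = Counter(
        c.field_name for c in corrections if c.predicted != c.corrected
    )
    return counts.most_common(top_n)


# Made-up corrections for illustration only.
corrections = [
    Correction("t1", "wages", "52000", "52500"),
    Correction("t2", "wages", "48000", "48100"),
    Correction("t3", "dividends", "120", "210"),
    Correction("t4", "interest", "35", "35"),  # value unchanged, so ignored
]
print(rank_fields_for_evals(corrections))
# [('wages', 2), ('dividends', 1)]
```

Keeping the trace identifier on every correction is the design choice that matters here: it lets whoever (or whatever) investigates a failure go from the corrected field back to the source documents and intermediate steps that produced it.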
2. Winners and Losers
Winners: Thrive Holdings gains a reusable blueprint for its portfolio companies. OpenAI validates Codex's agentic capabilities in a high-stakes, regulated domain. Crete accountants win back time: one senior accountant dropped from 180 hours to 15 hours on tax prep, reallocating time to client advisory and business growth.
Losers: Traditional tax software vendors like Intuit (TurboTax) and Thomson Reuters (UltraTax) face an existential threat. Their products require manual updates and lack autonomous improvement loops. Low-skill data entry roles in tax preparation will shrink rapidly.
3. Second-Order Effects: The Blueprint Expands
Thrive Holdings is already applying the same three-pillar design to bookkeeping, audit, and IT help desk automation. This means the self-improving agent pattern will spread horizontally across professional services. The key insight: the loop works best when practitioners, product, and engineering sit under one roof. Thrive's structure as an owner-operator allows this integration. Competitors without this vertical alignment will struggle to replicate the speed of improvement.
4. Market and Industry Impact
The tax preparation software market is worth roughly $12 billion annually. Tax AI's ability to handle complex 1040 and 1041 returns with 97% accuracy while improving itself will pressure incumbents to either acquire AI capabilities or lose market share. The broader professional services automation market, estimated at $5 billion and growing, is likely to see a wave of similar agents. The barrier to entry is not just AI technology but the ability to embed it inside a practitioner-led feedback loop. If the pattern spreads, OpenAI's Codex is positioned to become a default platform for building such systems, strengthening its enterprise moat.
Bottom Line: Impact for Executives
For CEOs of professional services firms, the message is clear: the era of static automation is over. Any system that does not improve from human corrections is a liability. For technology vendors, the race is on to build self-improving loops into their products. The Tax AI case study provides a concrete, measurable template. Executives should ask: Where in our operations do experts correct AI outputs? Can we capture those corrections as structured signals? If not, a competitor will.
Intelligence FAQ
Practitioners correct AI outputs; the system captures the full trace from source files to final return. Codex then investigates the failure, runs targeted evals, and proposes code changes. This turns every correction into a structured improvement signal without manual engineering.
Vendors like Intuit and Thomson Reuters face an existential threat. Their products require manual updates and lack autonomous improvement. Tax AI's jump from 25% to 86% of returns reaching 75% field completion in six weeks sets a new bar; incumbents must either acquire AI capabilities or risk losing market share to self-improving agents.
Any domain with repetitive, expert-driven workflows: bookkeeping, audit, legal document review, medical coding, insurance claims processing. Thrive Holdings is already applying the blueprint to bookkeeping, audit, and IT help desk automation.


