Making AI Coding Agents Follow True TDD
Author: Ptrck Brgr
AI coding agents can write tests and features, but without discipline they tend to skip the core rhythm of test-driven development. Instead of the tight red–green–refactor loop, they often write all the tests at once, then produce a full implementation in one pass. The result works, but it misses the benefits of iterative design and focused scope.
A growing approach is to add an enforcement layer between the AI and the codebase. This layer blocks changes that violate TDD sequencing and forces the agent to proceed test-by-test. It’s slower, but it aligns AI output with your team’s engineering values.
Main Story
Test-driven development is built on three short steps: write a failing test (red), write just enough code to pass it (green), then refactor without changing behavior. This cycle keeps code lean, guides design, and reduces waste.
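The loop is easiest to see in miniature. Below is a minimal Python sketch of one cycle; the `add` function and its test are illustrative placeholders, not from the source:

```python
# Red: write one failing test before the implementation exists.
# Running only this function at that point raises NameError, because
# add() has not been written yet -- that failure is the "red" signal.
def test_add_two_numbers():
    assert add(2, 3) == 5

# Green: write just enough code to make that single test pass.
def add(a, b):
    return a + b

# Refactor: with the test green, restructure freely; the test pins behavior.
test_add_two_numbers()
```

The point of keeping the step this small is that each test documents one decision, and the implementation never grows beyond what a test demands.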
AI agents, even with careful prompting, tend to drift toward “big bang” test-first development. They generate multiple tests, then implement the entire feature at once. This bypasses the learning and design feedback that come from small, iterative steps.
A structured experiment on eShopOnWeb, a realistic .NET sample app, confirmed the pattern. Prompt-only TDD guidance produced functional code in minutes, but the agent skipped the iterative loop. It introduced minor architectural deviations and wrote extra, unneeded methods.
A tool called tdd-guard changes this behavior. It hooks into file writes, runs tests, and uses a separate AI “judge” to check if the change follows TDD rules. If it detects missing red phases, multiple tests added at once, or bottom-up work when outside-in was specified, it blocks the change until fixed.
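tdd-guard's real implementation is more involved (it delegates judgment to an AI model), but the core blocking idea can be sketched with simplified, hard-coded rules. Everything below, including the `last_test_run.json` file name and the rule set, is a hypothetical illustration, not tdd-guard's actual code:

```python
import json
from pathlib import Path

def allow_change(state_dir: str, new_tests_added: int,
                 is_implementation: bool) -> tuple[bool, str]:
    """Hypothetical guard: decide whether a proposed file write
    respects TDD sequencing, using the last recorded test run
    from a shared state directory.

    Simplified rules: at most one new test per step, and
    implementation code only after a red (failing) test run.
    """
    last_run = json.loads(Path(state_dir, "last_test_run.json").read_text())
    if new_tests_added > 1:
        return False, "blocked: more than one test added at once"
    if is_implementation and last_run["failed"] == 0:
        return False, "blocked: no failing (red) test precedes this change"
    return True, "ok"
```

A real enforcement layer would hook this kind of check into the agent's file-write path, so a blocked change is rejected with the reason fed back to the agent.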
"We're now enforcing test-driven development… cloud code now has no option to not follow these instructions." — Jo Van Eyck
With tdd-guard, the agent had to write a single failing test, make it pass, then repeat. It improved assertion quality, adhered to the intended architecture, and avoided unnecessary code. The trade-off: roughly double the development time and higher token use due to the extra validation step.
Technical Considerations
Strict enforcement requires integration between your AI assistant, a test runner, and a validation process. tdd-guard uses a language-specific test reporter and a shared state directory to track the last test run. The AI judge is programmed with your team’s TDD rules and reviews each proposed change in context.
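The reporter half of that setup can be sketched as a small function that persists each test run to the shared state directory, so the judge can evaluate the next proposed change in context. The file name and schema here are assumptions for illustration, not tdd-guard's actual format:

```python
import json
import time
from pathlib import Path

def record_test_run(state_dir: str, passed: int, failed: int) -> dict:
    """Hypothetical test reporter step: after each test run, write
    the results to a shared state directory that the guard reads."""
    Path(state_dir).mkdir(parents=True, exist_ok=True)
    result = {
        "timestamp": time.time(),
        "passed": passed,
        "failed": failed,
        "red": failed > 0,  # at least one failing test marks a red phase
    }
    Path(state_dir, "last_test_run.json").write_text(json.dumps(result))
    return result
```

In practice this role is played by a language-specific test reporter plugin, so the state stays current without the agent having to cooperate.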
This setup adds latency. Every change triggers a test run and an AI validation call. For large codebases or slow test suites, this can be significant. Token consumption rises because of repeated AI evaluations. Teams must weigh the cost of slower throughput against the benefit of higher adherence to process.
Integration is straightforward for teams already using CI/CD hooks and automated tests. Privacy and security considerations apply—AI validation may require sending code snippets to an external service. Vendor risk and compliance requirements should be factored in.
Business Impact & Strategy
From a leadership perspective, the decision comes down to quality vs. speed. Strict TDD enforcement with AI agents can:
- Improve architectural consistency and maintainability
- Reduce rework caused by overengineering or irrelevant features
- Align AI output with existing team practices
However, it also:
- Increases development time by roughly 2× in the tested scenario
- Raises operational costs due to additional AI calls
- May require retraining teams to work effectively with the enforced loop
For high-stakes, long-lived systems, the quality gains may outweigh the slowdown. For exploratory or low-risk projects, relaxed enforcement may be more pragmatic.
Key Insights
- AI coding agents default to large, test-first steps without enforcement
- True TDD requires guardrails to maintain the red–green–refactor rhythm
- Enforcement tools like tdd-guard improve adherence to design intent
- The cost is slower delivery and higher token consumption
- The approach is most valuable for core, long-term codebases
Why It Matters
AI-assisted development is becoming common, but without process discipline it risks producing code that’s functional yet misaligned with team values. Enforcing TDD with AI agents is a way to blend automation with craftsmanship. It gives leaders a lever to control quality, predictability, and maintainability while still benefiting from AI’s speed in the right contexts.
Actionable Playbook
- Integrate Guardrails: Link your AI assistant to a TDD enforcement tool via file-write hooks; measure pass rate of enforced TDD steps
- Define Preferred Workflow: Configure the guard for your TDD style; check architectural compliance in each iteration
- Seed with Acceptance Criteria: Give the AI minimal, clear criteria for each step; track how often extra, unneeded code is avoided
- Review and Refine Tests: Inspect generated tests before green phases; aim for 90%+ acceptance on first review
- Balance Speed vs. Discipline: Apply strict enforcement only where long-term quality is critical; monitor delivery time impact
Conclusion
AI can follow true TDD—but only with help. By adding a guard layer, teams can enforce the discipline that keeps code clean and architectures sound. The trade-off is speed, but for the right projects, the return is lasting quality.
Inspired by: Can AI coding agents do Test-Driven Development (TDD)? — Jo Van Eyck
Dive deeper into the content →
https://www.youtube.com/watch?v=IVdYaVKuekk