Making AI Coding Agents Follow True TDD
Author: Ptrck Brgr
AI coding agents can write tests and features, but without discipline they tend to skip the core rhythm of test-driven development. Instead of the tight red–green–refactor loop, they often write all the tests at once, then produce a full implementation in one pass. The result works, but it misses the benefits of iterative design and focused scope.
A growing approach is to add an enforcement layer between the AI and the codebase. This layer blocks changes that violate TDD sequencing and forces the agent to proceed test-by-test. It’s slower, but it aligns AI output with your team’s engineering values.
Main Story
Test-driven development is built on three short steps: write a failing test (red), write just enough code to pass it (green), then refactor without changing behavior. This cycle keeps code lean, guides design, and reduces waste.
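The loop is easiest to see in miniature. Below is a minimal Python sketch of one cycle; the `add` function and its test are illustrative placeholders, not from the source:

```python
# Red: write one failing test before the implementation exists.
# Running only this function at that point raises NameError, because
# add() has not been written yet -- that failure is the "red" signal.
def test_add_two_numbers():
    assert add(2, 3) == 5

# Green: write just enough code to make that single test pass.
def add(a, b):
    return a + b

# Refactor: with the test green, restructure freely; the test pins behavior.
test_add_two_numbers()
```

The point of keeping the step this small is that each test documents one decision, and the implementation never grows beyond what a test demands.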
AI agents, even with careful prompting, tend to drift toward “big bang” test-first development. They generate multiple tests, then implement the entire feature at once. This bypasses the learning and design feedback that come from small, iterative steps.
A structured experiment on eShopOnWeb, a realistic .NET sample app, confirmed the pattern. Prompt-only TDD guidance produced functional code in minutes, but the agent skipped the iterative loop. It introduced minor architectural deviations and wrote extra, unneeded methods.
A tool called tdd-guard changes this behavior. It hooks into file writes, runs tests, and uses a separate AI “judge” to check if the change follows TDD rules. If it detects missing red phases, multiple tests added at once, or bottom-up work when outside-in was specified, it blocks the change until fixed.
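tdd-guard's real implementation is more involved (it delegates judgment to an AI model), but the core blocking idea can be sketched with simplified, hard-coded rules. Everything below, including the `last_test_run.json` file name and the rule set, is a hypothetical illustration, not tdd-guard's actual code:

```python
import json
from pathlib import Path

def allow_change(state_dir: str, new_tests_added: int,
                 is_implementation: bool) -> tuple[bool, str]:
    """Hypothetical guard: decide whether a proposed file write
    respects TDD sequencing, using the last recorded test run
    from a shared state directory.

    Simplified rules: at most one new test per step, and
    implementation code only after a red (failing) test run.
    """
    last_run = json.loads(Path(state_dir, "last_test_run.json").read_text())
    if new_tests_added > 1:
        return False, "blocked: more than one test added at once"
    if is_implementation and last_run["failed"] == 0:
        return False, "blocked: no failing (red) test precedes this change"
    return True, "ok"
```

A real enforcement layer would hook this kind of check into the agent's file-write path, so a blocked change is rejected with the reason fed back to the agent.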
"We're now enforcing test-driven development… cloud code now has no option to not follow these instructions." — Jo Van Eyck
With tdd-guard, the agent had to write a single failing test, make it pass, then repeat. It improved assertion quality, adhered to the intended architecture, and avoided unnecessary code. The trade-off: roughly double the development time and higher token use due to the extra validation step.
Technical Considerations
Strict enforcement requires integration between your AI assistant, a test runner, and a validation process. tdd-guard uses a language-specific test reporter and a shared state directory to track the last test run. The AI judge is programmed with your team’s TDD rules and reviews each proposed change in context.
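The reporter half of that setup can be sketched as a small function that persists each test run to the shared state directory, so the judge can evaluate the next proposed change in context. The file name and schema here are assumptions for illustration, not tdd-guard's actual format:

```python
import json
import time
from pathlib import Path

def record_test_run(state_dir: str, passed: int, failed: int) -> dict:
    """Hypothetical test reporter step: after each test run, write
    the results to a shared state directory that the guard reads."""
    Path(state_dir).mkdir(parents=True, exist_ok=True)
    result = {
        "timestamp": time.time(),
        "passed": passed,
        "failed": failed,
        "red": failed > 0,  # at least one failing test marks a red phase
    }
    Path(state_dir, "last_test_run.json").write_text(json.dumps(result))
    return result
```

In practice this role is played by a language-specific test reporter plugin, so the state stays current without the agent having to cooperate.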
This setup adds latency. Every change triggers a test run and an AI validation call. For large codebases or slow test suites, this can be significant. Token consumption rises because of repeated AI evaluations. Teams must weigh the cost of slower throughput against the benefit of higher adherence to process.
Integration is straightforward for teams already using CI/CD hooks and automated tests. Privacy and security considerations apply—AI validation may require sending code snippets to an external service. Vendor risk and compliance requirements should be factored in.
Business Impact & Strategy
From a leadership perspective, the decision comes down to quality vs. speed. Strict TDD enforcement with AI agents can:
- Improve architectural consistency and maintainability
- Reduce rework caused by overengineering or irrelevant features
- Align AI output with existing team practices
However, it also:
- Increases development time by roughly 2× in the tested scenario
- Raises operational costs due to additional AI calls
- May require retraining teams to work effectively with the enforced loop
For high-stakes, long-lived systems, the quality gains may outweigh the slowdown. For exploratory or low-risk projects, relaxed enforcement may be more pragmatic.
Key Insights
- AI coding agents default to large, test-first steps without enforcement
- True TDD requires guardrails to maintain the red–green–refactor rhythm
- Enforcement tools like tdd-guard improve adherence to design intent
- The cost is slower delivery and higher token consumption
- The approach is most valuable for core, long-term codebases
Why It Matters
AI-assisted development is becoming common, but without process discipline it risks producing code that’s functional yet misaligned with team values. Enforcing TDD with AI agents is a way to blend automation with craftsmanship. It gives leaders a lever to control quality, predictability, and maintainability while still benefiting from AI’s speed in the right contexts.
Actionable Playbook
- Integrate Guardrails: Link your AI assistant to a TDD enforcement tool via file-write hooks; measure pass rate of enforced TDD steps
- Define Preferred Workflow: Configure the guard for your TDD style; check architectural compliance in each iteration
- Seed with Acceptance Criteria: Give the AI minimal, clear criteria for each step; track how often extra, unneeded code is avoided
- Review and Refine Tests: Inspect generated tests before green phases; aim for 90%+ acceptance on first review
- Balance Speed vs. Discipline: Apply strict enforcement only where long-term quality is critical; monitor delivery time impact
Conclusion
AI can follow true TDD—but only with help. By adding a guard layer, teams can enforce the discipline that keeps code clean and architectures sound. The trade-off is speed, but for the right projects, the return is lasting quality.
Inspired by: Can AI coding agents do Test-Driven Development (TDD)? — Jo Van Eyck
Dive deeper into the content →
https://www.youtube.com/watch?v=IVdYaVKuekk