Stanford's New Rule for AI Coding: No Contracts, No Agents

By Ptrck Brgr

Your agents are only as good as the codebase they land in. No tests, no linting, inconsistent APIs—and an agent will compound every one of those gaps faster than any intern ever could.

Mihail Eric, who teaches Stanford's first AI-across-the-SDLC course, lays this out in From Writing Code to Managing Agents—the bottleneck isn't model quality or prompt tricks. It's whether your codebase has explicit contracts an agent can follow.

That framing stopped me. At Tier, deploying edge ML on scooters taught me that constraints force clarity—you can't ship ambiguity to a device with 512MB of RAM. Agents face the same wall. They just hit it at compile speed instead of deploy speed.

The Last Boss in the Game

Eric's Stanford class filled to more than 100 students within hours. The number of CS graduates has doubled or tripled in the last decade, the post-COVID correction wiped out roles, and employers are asking whether fewer AI-native hires can cover the headcount. One Berkeley grad applied to a thousand places. Heard back from two.

Eric's prescription isn't "learn prompting." It's learn orchestration.

Really knowing how to properly handle multiple agents is like the last boss in a game. If you can do that really well, then you are literally the top 0.1% of users even today. — Mihail Eric

I'm not sure that framing holds for long. Multi-agent orchestration feels rare now, but so did containerization in 2014. What sticks is the underlying skill: breaking work into contract-bounded units and knowing when to add parallelism. That transfers regardless of what tooling looks like in two years.

One Agent, Then Two

Eric's advice to students is almost aggressively incremental. Master one agent workflow. Get confident. Then add a second agent for an isolated task—fix the logo while the first agent handles the backend. Only scale when each agent's scope is clean.

He's explicit about why:

Agents can compound errors very quickly. If an agent has one misunderstanding in the code, and then it sees that misunderstanding that it created in step one, it can double down and create another error in step two. — Mihail Eric

One confused agent doesn't ask for help. It builds on its own bad assumptions—and the more agents you run without boundaries, the faster that compounding propagates. This resonates with my PhD work on autonomous systems: complex agents fail in ways that cascade, and the failure modes are rarely the ones you tested for.

Context Switching Is the Real Skill

Here's what caught me off guard. Eric describes multi-agent work as constant context switching—watching three agents in terminals, jumping between them while retaining enough state to steer each one.

His observation: the people best at this are former engineering managers. They already built the muscle for tracking parallel workstreams and knowing when to intervene.

We keep talking about "AI-native" as a technical skill. Eric is saying it's a management skill. The terminal is your team standup. The agents are your reports. And the skill that separates top performers isn't technical depth—it's the ability to allocate attention without losing the thread.

The Contract Requirement

This is where it gets concrete:

You need to define these contracts. If you don't have enough test coverage, then you don't have contracts for your software. Agents only can operate on contracts—explicitly defined contracts of software. — Mihail Eric

Eric walks through the failure chain: outdated READMEs contradicting the code, two different APIs for the same object. Agents face the same confusion a new hire would—except they don't stop to ask which pattern is right.
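The "two different APIs for the same object" failure is easy to make concrete. Here's a minimal Python sketch with hypothetical names (`get_user`, `fetch_user`): mark the legacy path deprecated and pin the canonical one with a test, so an agent reads an explicit contract instead of flipping a coin.

```python
import warnings

# Hypothetical example: two APIs for the same object. Without an explicit
# contract, an agent may pick either one and compound from there.

def get_user(user_id: int) -> dict:
    """Canonical API: returns a user record as a dict."""
    return {"id": user_id, "name": f"user-{user_id}"}

def fetch_user(user_id):
    """Legacy API: kept for old callers, but explicitly marked deprecated
    so both humans and agents know which pattern is current."""
    warnings.warn("fetch_user is deprecated; use get_user", DeprecationWarning)
    return get_user(user_id)

# The contract an agent can follow: a test that pins the canonical behavior.
def test_get_user_returns_record():
    record = get_user(42)
    assert record["id"] == 42
```

The deprecation warning does the same job for an agent that a hallway conversation does for a new hire: it says which pattern is current without deleting the old one.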

I'd push this further. Tests and linting are necessary but not sufficient. In enterprise codebases, the bigger limiter is organizational—who owns what, which decisions are current versus legacy, what the release process actually is. That lives in people's heads. (Honestly, most humans in large orgs can't access tribal knowledge either, which is half the problem.)

Speed Doesn't Fix Product Sense

Near the end, Eric drops a sharp warning: iterate with Claude or Codex for a month, build something "crazy overengineered," launch—and nobody wants it.

Agent speed doesn't guarantee market fit. I don't have a clean answer for how to balance build velocity with demand validation when iteration costs minutes instead of days. The question worth asking: does near-zero iteration cost make product discipline harder, not easier?

Rem Koning from Harvard broadens the frame: "Increasingly what matters is your ability to allocate intelligence." Not just writing code—deciding where machine intelligence goes and where human judgment stays.

Why This Matters

Enterprise codebases—decade-old services, inconsistent APIs, READMEs nobody updates—are where agents struggle most. The agent won't flag confusion. It'll pick one interpretation and compound from there. Agents just made the cost of skipping hygiene dramatically higher.

But here's the catch: Eric's framing stops at the codebase layer. Reviews, ownership models, release gates, and incentive structures are the organizational layer. If you don't redesign code review for agent-generated output, you've moved the bottleneck from generation to review. The traffic jam relocates. Net throughput? Flat.

What Works

Start with one agent on a well-tested codebase. Measure what ships, not how much code gets written. Activity metrics lie.

Add agents incrementally. Isolated tasks, clear boundaries. No shared dependencies between concurrent agents until you've earned that complexity.
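One way to enforce "no shared dependencies between concurrent agents" is to declare each agent's file scope up front and refuse to launch when any two scopes overlap. A minimal sketch, with made-up agent names and paths:

```python
# Hypothetical sketch: each concurrent agent gets an explicit file scope,
# and launch is refused if any two scopes share a file.

def validate_agent_scopes(scopes: dict[str, set[str]]) -> None:
    """Raise if any two agents' file scopes intersect."""
    names = list(scopes)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = scopes[first] & scopes[second]
            if shared:
                raise ValueError(f"{first} and {second} share files: {shared}")

agents = {
    "frontend": {"web/logo.svg", "web/header.css"},
    "backend": {"api/routes.py", "api/models.py"},
}
validate_agent_scopes(agents)  # disjoint scopes: no shared dependencies
```

The check is trivial on purpose. The discipline is in writing the scopes down at all; once they're explicit, overlap becomes a launch error instead of a merge conflict discovered an hour later.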

Treat your test suite as the agent contract. If a behavior isn't in a test, an agent has no way to verify its work. Coverage isn't just for CI—it's the interface between human intent and machine execution.
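Operationally, that means an agent's change is accepted only while the suite stays green. A minimal sketch, assuming your test runner is a shell command (the `pytest` invocation named in the comment is illustrative, not prescribed):

```python
# Hypothetical sketch: a green test suite as the acceptance gate for
# agent-generated changes. Substitute your project's real runner, e.g.
# ["pytest", "-q"] or ["go", "test", "./..."].
import subprocess
import sys

def change_is_acceptable(test_command: list[str], repo_dir: str = ".") -> bool:
    """Accept an agent's edit only if the contract (the test suite) still holds."""
    result = subprocess.run(test_command, cwd=repo_dir)
    return result.returncode == 0

# Any command's exit code can serve as the contract check.
assert change_is_acceptable([sys.executable, "-c", "pass"])
```

Wiring this into the loop turns "the agent says it's done" into "the contract says it's done", which is the only claim worth shipping on.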

Build context-switching muscle. Track each agent's state and scope. The skill isn't prompting—it's attention management, and if you can't handle three, adding a fourth won't help.

Don't confuse build speed with product validation. Agents make iteration nearly free. That's dangerous when you're iterating on something nobody wants.

These practices work when your codebase has clear contracts. Without that foundation, agents amplify whatever mess was already there. Ask me again in six months whether tooling closes the gap—right now, the discipline is on you.

Full talk: Watch on YouTube