MCP Servers Are Agent UIs, Not API Wrappers
Ptrck Brgr
Your MCP server isn't infrastructure. It's a user interface. And right now, most of them are designed like a 2005-era Swagger page dumped into a context window.
Jeremiah Lowin, founder of Prefect and creator of FastMCP, lays this out sharply in Your MCP Server is Bad (and you should feel bad)—agents aren't humans browsing documentation. They're constrained systems with tight token budgets, slow iteration loops, and zero tolerance for ambiguity.
This hit close to home. At ENVAIO, we designed IoT product surfaces for devices with severe resource constraints—limited memory, bandwidth, processing. You don't expose everything the backend can do. You curate what the constrained client needs. I didn't expect that principle to map so cleanly onto agent tool design, but here we are.
The Handshake Tax
Every time an agent connects to an MCP server, it pays a tax. It enumerates every tool, reads every schema, ingests every description—before doing a single useful thing.
Every single time that thing turns on, it shakes hands with the server. It learns about the server. It enumerates every single tool and every single description on that server. — Jeremiah Lowin, Prefect
The math gets uncomfortable. Lowin describes a company that wanted to expose 800 endpoints as MCP tools. With a 200,000-token context window, each tool gets roughly 250 tokens for its name, schema, and docs. Use all 800? The entire context window is gone on handshake. Zero room to think.
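The arithmetic is easy to check. A back-of-the-envelope sketch with Lowin's numbers (the 250-token figure is a rough average for a name plus schema plus description, not a spec constant):

```python
# Handshake tax, back of the envelope.
CONTEXT_WINDOW = 200_000  # tokens the agent has in total
TOOL_COUNT = 800          # endpoints exposed as MCP tools
TOKENS_PER_TOOL = 250     # name + input schema + description, roughly

handshake = TOOL_COUNT * TOKENS_PER_TOOL
print(f"handshake cost: {handshake:,} tokens")                # 200,000
print(f"left for reasoning: {CONTEXT_WINDOW - handshake:,}")  # 0
```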
Here's the question I keep coming back to: we obsess over model quality and prompt engineering, but what if the tool surface is eating the context budget before the model even starts reasoning? Context engineering beats model quality—and tool design is context engineering.
50 Tools Is a Smell
Lowin draws the line at about 50 tools per agent before performance degrades. Not per server—per agent, across all connected servers. That feels low.
But here's the catch: it's a heuristic, not a law. GitHub's MCP server has around 170 tools and ships. The question is whether you've invested in routing and curation—or whether you're dumping endpoints and hoping the model figures it out.
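What investing in routing can look like: instead of enumerating the full catalog at handshake, expose a couple of meta-tools that let the agent discover what it needs on demand. A minimal sketch using FastMCP (the catalog, tool names, and matching logic are all hypothetical; a companion tool that dispatches by name is omitted):

```python
from fastmcp import FastMCP

mcp = FastMCP("routed-catalog")

# Hypothetical: the full catalog stays server-side and never
# hits the context window during the handshake.
CATALOG = {
    "orders_lookup": "Fetch an order's status and line items by order ID.",
    "orders_refund": "Refund a paid order. Requires an order ID and a reason.",
    # ...hundreds more
}

@mcp.tool()
def find_tools(task: str) -> dict[str, str]:
    """Describe your task in plain words; returns matching tool names
    and descriptions so you can pick one to call."""
    query = task.lower()
    return {name: desc for name, desc in CATALOG.items() if query in desc.lower()}

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```

Routing keeps the handshake small; curation decides what exists at all.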
The Fiverr case makes this concrete. An engineer built a server that grew to 188 tools. A month later, he curated it down to 5. Five. Not trimming—a fundamental rethink of what the agent actually needs.
(Honestly, this surprised me most. Not that fewer tools work better, but that the reduction was that dramatic and performance improved.)
Errors Are Prompts
This one reframed how I think about error handling in agent systems.
Errors are prompts. — Jeremiah Lowin, Prefect
When a tool call fails, the error message becomes part of the model's next input. It's not a log line for a developer—it's a prompt that shapes the agent's next decision. "Invalid date format—use YYYY-MM-DD" beats "ValueError: could not parse datetime" every time.
I could be wrong, but error design might be the highest-ROI investment in most agent stacks right now. It directly reduces iteration loops—and iteration is the enemy.
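A concrete sketch of the difference (the tool and wording are mine, not from the talk). The bad version lets the ValueError propagate; the good version returns instructions:

```python
from datetime import date

def set_deadline(raw: str) -> str:
    """Tool body sketch. Whatever this returns is the agent's next prompt."""
    try:
        deadline = date.fromisoformat(raw)
    except ValueError:
        # Don't echo the traceback. Tell the model exactly what to send instead.
        return f"Invalid date format: got {raw!r}. Use YYYY-MM-DD, e.g. 2025-03-01."
    return f"Deadline set to {deadline.isoformat()}."
```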
Stop Wrapping REST
Lowin's most pointed plea: stop converting REST APIs into MCP servers and shipping them to production.
The irony isn't lost on him. FastMCP's OpenAPI-to-MCP converter is one of its most popular features. He built it. He also wrote a blog post that says, "I know I introduced this capability. Please stop."
Auto-conversion is great for bootstrapping—validate tool usage, test patterns, figure out which endpoints matter. But the REST wrapper carries debt: bloated tool counts, schemas designed for developers, naming that confuses the model.
The fix isn't complicated. It's just work. Curate down. Write descriptions for the agent. Name tools so the model picks the right one—not so a future engineer understands the codebase.
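What "written for the agent" might look like, as a hedged FastMCP sketch (the tool, its behavior, and the referenced lookup_customer are illustrative, not from the talk). The auto-converted version of this endpoint would be named something like POST_v2_customers__id__orders with the REST schema verbatim; the curated version tells the model when to pick it:

```python
from fastmcp import FastMCP

mcp = FastMCP("orders")

@mcp.tool()
def create_order(customer_id: str, sku: str, quantity: int = 1) -> dict:
    """Create an order for an existing customer.

    Only call this after confirming the customer exists (use
    lookup_customer first). Returns the new order's ID and totals.
    """
    # Stub body for the sketch; a real server would call the backend here.
    return {"order_id": "ord_0001", "sku": sku, "quantity": quantity}
```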
The Client Problem
What do you do when the client doesn't follow the spec? I don't have a clean answer.
Lowin calls out Claude Desktop specifically—until recently, it sent all structured arguments as strings. FastMCP had to add string-to-object deserialization to cope. Even spec-compliant progressive disclosure can break because clients cache tools in ways the spec doesn't anticipate.
In enterprise deployments, I've seen this with every integration layer. You design for the spec, then work around implementations. Lowin's advice: design for worst-case clients. Assume tools get discovered all at once. Assume schemas get mangled.
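Defensive handling at the argument boundary is mundane but effective. A minimal sketch of the idea, not FastMCP's actual implementation: accept structured arguments even when they arrive as JSON strings.

```python
import json
from typing import Any

def coerce_argument(value: Any) -> Any:
    """Unwrap stringified objects/arrays from worst-case clients; pass everything else through."""
    if isinstance(value, str) and value[:1] in ("{", "["):
        try:
            return json.loads(value)
        except json.JSONDecodeError:
            pass  # genuinely a string that happens to start with a brace
    return value

print(coerce_argument('{"status": "open"}'))  # {'status': 'open'}
print(coerce_argument("open"))                # 'open', left alone
```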
My sample size is limited to enterprise contexts, but I'd push harder on measurement. "50 tools" is a useful starting point, but production teams need task success rates by tool count, token cost per outcome, retry frequency. Without numbers, you're tuning by intuition.
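Even a crude tracker beats intuition. A sketch of the minimum I'd want per tool-surface configuration (the shape and field names are mine):

```python
from dataclasses import dataclass

@dataclass
class SurfaceMetrics:
    """Outcomes for one tool-surface configuration, e.g. '12 tools exposed'."""
    tool_count: int
    tasks: int = 0
    successes: int = 0
    tokens: int = 0
    retries: int = 0

    def record(self, success: bool, tokens: int, retries: int) -> None:
        self.tasks += 1
        self.successes += int(success)
        self.tokens += tokens
        self.retries += retries

    def summary(self) -> str:
        rate = self.successes / self.tasks if self.tasks else 0.0
        per_task = self.tokens // max(self.tasks, 1)
        return (f"{self.tool_count} tools: {rate:.0%} success, "
                f"{per_task} tokens/task, "
                f"{self.retries / max(self.tasks, 1):.1f} retries/task")
```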
Why This Matters
The gap between "works in a demo" and "works in production" is almost entirely a design gap. Not model capability. Not infrastructure. Design.
Agents operate under constraints that look nothing like human workflows—small context windows, expensive discovery, slow iteration. Every tool, schema, and error message eats the agent's cognitive budget. Waste it on handshake overhead and there's nothing left.
Lowin predicts we'll talk about "context products" instead of "MCP servers" within a year. I'm not sure about the timeline, but the direction feels right. Teams that treat tool surfaces as agent UIs—curated, tested, token-aware—will ship agents that work. Everyone else will wonder why theirs keep failing on step one.
What Works
Curate aggressively. Lowin's 50-tool ceiling is where degradation sets in; treat it as a limit, not a target. Past 15 or so tools exposed to a single agent, ask whether each one earns its token budget. The Fiverr path (188 down to 5) is extreme but directional.
Write error messages for the model, not for developers. Every word costs tokens and shapes the agent's next move.
Measure what matters. Track task success rate by tool count. Monitor token cost per outcome. Count retries. Heuristics start you off; metrics tell you when you've arrived.
Design for bad clients. Assume tools get discovered all at once, schemas get cached unpredictably, and structured arguments arrive as strings.
Start with the REST wrapper to validate. Ship the curated version. The bootstrap-to-production gap is where the design work lives—and where most teams stop too early.
These patterns work best when you control both server and client. When you don't, the constraints tighten and defensive design matters more.
Full talk: Watch on YouTube