Advanced Context Engineering for AI Agents

Author: Ptrck Brgr

AI coding agents can feel magical in small demos—and maddening in real projects. In greenfield code, naive prompting sometimes works. In large, messy systems, it quickly collapses under the weight of ambiguity, noise, and wasted context.

Dexter Horthy, founder of Human Layer, argues the fix is not a smarter model, but smarter inputs. His team built a repeatable method—what he calls advanced context engineering—to turn agents into reliable, heavily used tools for shipping complex changes in production.

Main Story

The early approach was familiar: tell the agent what you want, iterate until it works. That broke down in brownfield Go systems with thousands of lines of intertwined logic. The back‑and‑forth produced “slop”—half‑right code that took longer to fix than to write from scratch.

The breakthrough was spec‑first development. Instead of reviewing giant PRs, the team aligned on detailed specs and test plans before touching code. This shifted review from line‑by‑line syntax to intent and architecture. It was uncomfortable at first—slowing down to speed up—but it became a productivity multiplier.

"Everything that makes agents good is context engineering."

Context engineering means being deliberate about every byte that goes into the model’s context window. Large language models behave like pure functions: output quality is a direct function of input quality. The levers are correctness, completeness, size, and trajectory.
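To make the "pure function" framing concrete, here is a minimal sketch of deliberate context assembly under a size budget. The `ContextItem` type, the relevance score, and the `assemble_context` helper are illustrative assumptions, not part of Human Layer's tooling:

```python
from dataclasses import dataclass

# Hypothetical representation of one candidate piece of context.
@dataclass
class ContextItem:
    text: str
    relevance: float  # proxy for correctness/completeness, scored upstream

def assemble_context(items: list[ContextItem], budget_chars: int) -> str:
    """Keep only the most relevant items that fit a deliberate size budget."""
    chosen, used = [], 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        if used + len(item.text) > budget_chars:
            continue  # drop low-value bulk rather than overflow the window
        chosen.append(item.text)
        used += len(item.text)
    return "\n\n".join(chosen)

# Output quality tracks input quality: the same prompt over a noisy,
# oversized context tends to do worse than over this curated slice.
```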

Human Layer replaced ad hoc /compact commands with structured progress files—a distillation of exactly what matters for the next step. This trims bloat like oversized JSON tool outputs while keeping critical state. They also used subagents for context‑heavy searches, so the main agent stayed focused.
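One way to picture a structured progress file is as a small, regenerated summary of exactly what the next step needs. The format and helper below are assumptions for illustration, not Human Layer's actual schema:

```python
import json
from pathlib import Path

def write_progress_file(path: Path, goal: str, findings: dict, next_step: str) -> None:
    """Distill current state into a compact file the next agent turn reads
    instead of the full transcript (illustrative format)."""
    lines = [f"# Goal\n{goal}", "\n# Key findings"]
    for key, value in findings.items():
        text = value if isinstance(value, str) else json.dumps(value)
        # Trim oversized tool output instead of carrying it forward verbatim.
        if len(text) > 500:
            text = text[:500] + " (truncated)"
        lines.append(f"- {key}: {text}")
    lines.append(f"\n# Next step\n{next_step}")
    path.write_text("\n".join(lines))
```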

The workflow settled into three phases:

  • Research: Map system behavior, key files, line numbers
  • Plan: List each change, test strategy, and affected files
  • Implement: Write code guided by the plan

At each phase, context was intentionally compacted, keeping utilization under ~40%. If implementation failed, they revisited the plan, not the code. Human review happened at research and plan stages—fast, high‑signal checkpoints that avoided the pain of reading massive diffs.
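A minimal sketch of that loop, assuming a generic agent runtime: `run_agent`, `utilization`, `compact`, and `implementation_ok` are hypothetical callables standing in for whatever harness you use, and the 0.4 threshold mirrors the ~40% target above.

```python
from typing import Callable

def run_task(
    task: str,
    run_agent: Callable[[str, str], str],      # (phase, input) -> output text
    utilization: Callable[[], float],           # fraction of the context window in use
    compact: Callable[[], None],                # rewrite state into a progress file
    implementation_ok: Callable[[str], bool],   # e.g. the plan's tests pass
) -> str:
    """Illustrative research -> plan -> implement loop with intentional compaction."""
    research = run_agent("research", task)      # system behavior, key files, line numbers
    plan = run_agent("plan", research)          # each change, test strategy, affected files

    for _ in range(3):
        if utilization() > 0.4:                 # keep context utilization under ~40%
            compact()
        result = run_agent("implement", plan)
        if implementation_ok(result):
            return result
        # If implementation fails, revisit the plan, not the emitted code.
        plan = run_agent("plan", research + "\n\nLast attempt failed:\n" + result)
    return plan
```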

The payoff: one‑shot fixes to sprawling codebases, rapid feature delivery, and complex changes—like adding WASM support—done in hours instead of weeks.

Technical Considerations

For engineering leaders, the constraints are clear:

  • Context window limits: Even with large models, noise and irrelevant state degrade output
  • Trade‑offs: More context is not always better; precision beats volume
  • Tooling: Structured progress files and subagents require lightweight orchestration, but not heavy infra
  • Latency and throughput: Frequent compaction adds steps but reduces wasted loops and rework
  • Privacy and security: Progress files and subagent outputs must respect data boundaries
  • Vendor risk: Model choice matters less than context discipline, but switching costs still exist
  • Skills: Teams need to learn to write high‑fidelity specs and plans; this is a muscle, not a toggle
  • Integration paths: Embed compaction and review gates into existing dev workflows without overhauling everything at once

The method assumes your agents can run multiple times per task and that you can insert human review at key points without blocking the whole pipeline.
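A lightweight way to picture those review points, sketched with hypothetical helpers rather than any specific tool: `request_review` stands in for however your team approves artifacts (a PR comment, a Slack approval, a CLI prompt).

```python
from typing import Callable

def gated_run(
    task: str,
    run_agent: Callable[[str, str], str],
    request_review: Callable[[str, str], bool],
) -> str:
    """Human review at the research and plan checkpoints; implementation runs unattended."""
    research = run_agent("research", task)
    if not request_review("research", research):
        raise RuntimeError("research rejected; refine it before planning")

    plan = run_agent("plan", research)
    if not request_review("plan", plan):
        raise RuntimeError("plan rejected; cheaper to fix here than in a large diff")

    return run_agent("implement", plan)
```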

Business Impact & Strategy

The business outcomes are tangible:

  • Time‑to‑value: Large, complex changes ship in hours, not weeks
  • Cost vectors: Less agent time wasted on rework; fewer human hours spent on low‑value code review
  • KPIs: PR cycle time, defect rates, and review load all improve when plans are correct from the start
  • Org design: Shifts review culture from reactive code policing to proactive spec alignment
  • Risks and mitigations: The main risk is poor specs leading to large volumes of bad code; mitigate with early, lightweight reviews
  • Evaluation criteria: Success is measured by first‑pass yield—how often the first implementation matches the agreed plan
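Tracking first-pass yield reduces to a simple ratio; the record format here is just an assumption for illustration.

```python
def first_pass_yield(runs: list[dict]) -> float:
    """Share of tasks where the first implementation matched the agreed plan.
    Each run is assumed to look like {"task": ..., "first_attempt_matched_plan": bool}."""
    if not runs:
        return 0.0
    matched = sum(1 for r in runs if r["first_attempt_matched_plan"])
    return matched / len(runs)

# Example: 8 of 10 tasks landing on the first pass gives a first-pass yield of 0.8.
```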

Horthy’s team found that a bad part of a plan could mean hundreds of bad lines of code. Catching it early was far cheaper than fixing it later.

Key Insights

  • Naive prompting fails in complex, messy codebases
  • Spec‑first development enables high‑signal, low‑cost reviews
  • Context engineering is the primary lever for agent quality
  • Structured progress files outperform generic compaction commands
  • Subagents help manage context‑heavy searches without polluting the main agent’s state
  • Frequent intentional compaction keeps context under control and output aligned

Why It Matters

For technical teams, this is a blueprint for scaling AI agents beyond toy problems. For business leaders, it’s a way to unlock AI use without the hidden costs of rework and review fatigue.

The principle is simple: you cannot out‑model bad context. The discipline is in designing workflows that keep agents focused, humans aligned, and context clean. Done well, this turns AI from an experiment into a dependable part of your delivery process.

Conclusion

Advanced context engineering reframes AI agent performance as a context problem, not a model problem. By adopting spec‑first workflows, structured progress files, and frequent compaction, teams can achieve reliable, one‑shot results in even the most complex codebases.

Watch the full conversation with Dexter Horthy here: https://www.youtube.com/watch?v=IS_y40zY-hc