You’re working with Claude Code, and the session is going well. Then something shifts. Responses get vaguer. The agent suggests things you already tried.
Your instinct? Add more context. Paste in more files, explain the architecture again. But that’s exactly the wrong move, because the problem isn’t that your coding agent needs more information — it’s drowning in the information it already has.
What context window actually means
A context window is the total amount of input and output tokens an LLM can process in a single interaction. Everything counts — your messages, the agent’s responses, files it’s reading, system instructions, those MCP servers you installed.
Hit the token limit and generation stops, but that’s not necessarily the main problem. The real issue happens long before you run out of tokens: as you stuff more information into context, retrieval quality degrades. This is especially true for information buried in the middle — the lost-in-the-middle effect.
Think of it as attention, not RAM. A coding agent with 200,000 tokens of context isn’t necessarily more capable than one with 20,000 tokens of focused, relevant context. Often it’s less capable, spending resources navigating an ocean of information when it needs to make precise decisions.
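To make the budget concrete, here is a rough sketch of how a session’s pieces compete for one window. The 4-characters-per-token ratio is a crude heuristic, not a real tokenizer, and the component sizes are invented for illustration — use your provider’s token-counting API for real numbers.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token for English)."""
    return max(1, len(text) // 4)


def context_usage(window: int, parts: dict[str, str]) -> dict[str, int]:
    """Per-part token estimates plus what's left of the window."""
    usage = {name: estimate_tokens(text) for name, text in parts.items()}
    usage["remaining"] = window - sum(usage.values())
    return usage


# Hypothetical long-running session: sizes are made up, but the shape is real.
session = {
    "system_prompt": "x" * 8_000,      # persistent instructions
    "mcp_tool_schemas": "x" * 40_000,  # five or six MCP servers add up
    "conversation": "x" * 120_000,     # a thread that spans days
    "pasted_files": "x" * 600_000,     # "just add more context"
}

usage = context_usage(200_000, session)
print(usage)  # nearly the whole window is spoken for before your next message
```

The point of the sketch: the window fills up from sources you stopped noticing, long before you paste the next file.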
The 2M token myth
Models with 1-million or 2-million-token windows felt like the solution. They’re not — at least not by themselves.
Large context windows do matter. They let you work with entire codebases at once, and when you need to refactor across dozens of files or understand complex system interactions, a bigger window is genuinely valuable.
But what matters more isn’t how much information the model can hold, but how well it retrieves what it needs. A model with perfect retrieval from 50,000 tokens outperforms one with mediocre retrieval from 2 million tokens. Context overload is still overload, regardless of capacity.
Don’t judge models by window size alone. Judge them by how well they use the context you give them—and by how well you manage what goes in.
Practical strategies for leaner sessions
Default to fresh starts. This is the single most important shift. Instead of maintaining one epic conversation thread that spans days, prefer short, focused sessions. When a thread starts feeling heavy — responses slow down, the agent seems confused — just start fresh.
You’ve built up context, taught the agent about your codebase. Starting over feels wasteful, but a clean slate with a clear objective almost always outperforms a bloated thread. Use /compact sparingly. It costs tokens and time, and it’s rarely as effective as a new thread with a clear goal.
Document decisions, don’t hoard context. Ask the coding agent to document key decisions in markdown files as you go. In Claude Code you can use # to put relevant instructions in memory.
Create a decisions.md or progress.md file where it records—concisely—what was decided, why, what failed, what constraints matter.
When you start a fresh session, scan the markdown and give the agent exactly the context it needs. You’re pointing to documented decisions, not asking it to remember everything from days ago. A 200-message thread doesn’t scale. A well-maintained markdown file with 20 key decisions? Useful indefinitely.
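A minimal decisions.md might look like this — every entry below is a hypothetical example, not a prescription:

```markdown
# decisions.md

## Auth: session cookies instead of JWTs
- Decided: server-side sessions with httpOnly cookies
- Why: token revocation was the blocker for JWTs
- Failed: stateless refresh-token flow (couldn't revoke on logout)

## Database: stay on Postgres
- Decided: no migration to a document store
- Why: relational constraints are load-bearing in billing
- Constraint: all schema changes go through migrations/
```

Three or four lines per decision is enough; the file is a pointer for a fresh session, not a transcript.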
Watch for hidden context drain. Every MCP server you install consumes context before you even start typing. Install five or six and you begin each session with a significant chunk already spoken for. Plus, the agent now has more irrelevant information to navigate.
Be intentional. Do you need that database MCP for today’s frontend work? Disable tools you’re not using.
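In Claude Code, project-scoped MCP servers live in a .mcp.json file at the project root, so trimming that file is one concrete way to reclaim context. A sketch — the server name, command, and path here are illustrative, and you should verify what’s actually loaded with a command like `claude mcp list`:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "./src"]
    }
  }
}
```

A database or browser-automation server removed from this file for a frontend-only day is context handed back to the work that matters.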
Judge by retrieval quality, not window size. The agent suggests things you already rejected. Asks questions you answered. Provides code that conflicts with earlier patterns. Responses become generic.
These aren’t signs the agent is “bad” — it’s struggling with too much context. Time for a reset.
Quick reference tips
Session management
- Start fresh sessions for new tasks or features;
- Reset when responses slow down or get vague by using /clear or a similar command;
- Use /compact only when absolutely necessary;
- Aim for focused threads, not epic conversations.
Context “hygiene”
- Audit installed MCP servers regularly;
- Disable servers not needed for current work;
- Keep lean system prompts;
- Remove persistent instructions that aren’t relevant.
Documentation over memory
- Create decisions.md or progress.md in your project;
- Have the agent update it after key decisions;
- Keep entries concise: what was decided, why, what failed;
- Reference it when starting fresh sessions instead of repeating context.
Warning signs
- Agent suggests previously rejected approaches;
- Asks questions you already answered;
- Code conflicts with established patterns;
- Responses become increasingly generic or hedged;
- Session feels sluggish or unresponsive.
Closing
Productivity with coding agents doesn’t come from feeding them more information. It comes from keeping context clean and relevant.
When something goes wrong, the natural response is adding more context. But you’re not solving a knowledge problem—you’re creating a retrieval problem.
For those working with tools like Claude Code, this means rethinking how we structure work. Focused sessions with clear objectives. Documented decisions, not hoarded context. The discipline of keeping sessions lean matters more than the theoretical capacity to make them massive.
Start fresh more often than feels necessary. Document more than you think you should. Reset when things feel heavy.