The limited context window of large language models (LLMs) poses a challenge when working with large codebases. If you feed an AI model many huge code files, it can quickly burn through token or usage limits, since the LLM has to re-evaluate everything with each response.
Tricks of the trade
To overcome these limits, coding agent creators use several techniques. For example, AI models are fine-tuned to write code that outsources tasks to other software tools. They might write Python scripts to extract data from images or files instead of feeding the entire file to the LLM, which saves tokens and improves accuracy.
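To make that concrete, here is a minimal sketch of the kind of script an agent might generate instead of pasting a large data file into its context. The file name and column are assumptions for illustration; the point is that only a tiny summary, not the raw file, ever reaches the model.

```python
import csv

def summarize_column(path, column):
    """Return count, min, and max for one numeric column of a CSV file.

    A hypothetical agent-written helper: the model sees only the small
    dictionary this returns, not the (possibly huge) file contents.
    """
    values = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            values.append(float(row[column]))
    return {"count": len(values), "min": min(values), "max": max(values)}
```

The agent would run a script like this with a tool call and read back only the few dozen tokens of output.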
Anthropic’s documentation notes that Claude Code also uses this approach for complex data analysis over large databases. It writes targeted queries and uses Bash commands like “head” and “tail” to analyze large volumes of data without loading everything into context.
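A rough Python equivalent of that head/tail pattern looks like this: peek at the start and end of a large file without ever reading the whole thing into memory (and thus without loading it into the model's context). This is an illustrative sketch, not Claude Code's actual implementation.

```python
from collections import deque
from itertools import islice

def head(path, n=10):
    """First n lines of a file, read lazily like the Unix "head" command."""
    with open(path) as f:
        return [line.rstrip("\n") for line in islice(f, n)]

def tail(path, n=10):
    """Last n lines of a file, like "tail": deque keeps a rolling
    fixed-size window as the file streams past, so memory stays small."""
    with open(path) as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

Even for a log file with millions of lines, the model only ever sees the handful of lines these helpers return.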
(These AI agents are, in a way, guided but semi-autonomous tool-using programs – a major extension of a concept we first saw in early 2023.)
Another breakthrough in agents is dynamic context management. Agents can do this in a few ways that aren’t fully public, but the most significant technique is context compression.
The command-line version of OpenAI Codex running in a macOS terminal window. Credit: Benj Edwards
When a coding LLM nears its context limit, it compresses the context history by summarizing it, losing some details but shortening the history to key information. Anthropic’s documentation describes this “compaction” as distilling context in a high-fidelity way, preserving important details like architectural decisions and unresolved bugs while discarding redundant outputs.
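The basic shape of that compaction loop can be sketched in a few lines of Python. Everything here is an assumption for illustration: the word-count tokenizer is a crude stand-in for a real one, and `summarize` stands in for an actual call asking the model to distill the older turns.

```python
def count_tokens(text):
    # Crude stand-in for a real tokenizer: roughly one token per word.
    return len(text.split())

def compact(history, budget, keep_recent=4, summarize=None):
    """Compress a list of message strings once it exceeds the token budget.

    Older messages are collapsed into a single summary; the most recent
    messages are kept verbatim so the agent doesn't lose its place.
    """
    total = sum(count_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    # In a real agent, summarize() would ask the model to distill the old
    # turns, preserving decisions and unresolved bugs while dropping
    # redundant tool output.
    if summarize is None:
        summary = "[summary of %d earlier messages]" % len(old)
    else:
        summary = summarize(old)
    return [summary] + recent
```

Each time the history nears the budget, the agent swaps a long prefix of the conversation for one short summary message and keeps going.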
This means the AI coding agents periodically “forget” a lot of what they’re doing, but they aren’t fully lost. They can quickly re-orient themselves by reading existing code, notes in files, and change logs.