The XML Diet: a leaner knowledge base for agents that fix themselves

Fri, 12 Jun 2026 09:00:00 +0000

Feed a local model a 500-page PDF and it bogs down fast: answers get slower and the output gets more confused.

When you decide to leave cloud models and run your agents locally, this is the first big challenge you’ll face. Renting AI by the token buries the cost of bloat inside someone else’s infinite servers, so you never feel it. Move the same agent onto hardware you own and every wasted token turns into latency and inaccuracy. Don’t give up at this point. Owning the hardware forces a discipline that pays off. A sprawling knowledge base on a local box is a grand piano trying to fit into a Honda Civic.

The solution is the XML Diet.

The bloat: why your knowledge base is failing

Most teams treat retrieval (RAG) like a messy attic. They dump the docs, the Slack logs, and the internal wikis into a vector database. When an agent needs an answer, the system grabs a handful of “relevant” chunks and drops them into the context window.

The result is a context window that reads like a junk drawer.

The model gets confused. It sees five different ways to reset a password, because the retrieved context includes a chat from 2022, a policy from 2024, and a snarky aside from an engineer named Dave. It burns tokens figuring out what matters and what’s noise. That’s token friction, and it wrecks agent performance.

On local hardware, token friction is the enemy. It slows your inference. It bloats your RAM. It makes a self-hosted agent feel like it’s wading through waist-deep mud.

Enter the XML Diet

XML earns its place for one reason: LLMs are good at reading tag boundaries. Those angle brackets from the 1990s web turn out to be exactly the structure a model needs to tell one thought from the next. XML is the skeleton of a fast knowledge base.

Compressing a knowledge base into XML is closer to semantic surgery than to zipping a file. You strip the prose and keep the logic.

1. Tag-based boundaries

Wrap each piece of data in a named tag, and you tell the model exactly where one thought ends and the next begins. The model stops guessing whether a sentence is an instruction or background. The tag says so.
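Here is a minimal sketch of that wrapping step in Python, using only the standard library. The tag names (rule, fact, question) and the sample text are placeholders invented for this example, not a recommended schema.

```python
import xml.etree.ElementTree as ET

def wrap(tag: str, text: str) -> str:
    """Wrap text in a single XML element, escaping characters such as & and <."""
    el = ET.Element(tag)
    el.text = text
    return ET.tostring(el, encoding="unicode")

# Hypothetical tag names and content, chosen only for illustration.
prompt = "\n".join([
    wrap("rule", "Answer only from the facts provided."),
    wrap("fact", "Password resets follow the 2024 policy & nothing else."),
    wrap("question", "How do I reset my password?"),
])
print(prompt)
```

Building the elements with a library instead of string concatenation means stray ampersands or angle brackets in your data get escaped and cannot break the tag boundaries.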

2. Concise shorthand

On the XML Diet, the labels lose weight. Instead of a long, fully spelled-out tag name, use a short abbreviation of it. LLMs have seen so much code and markup that they read these abbreviations fine. You save tokens on the labels and spend them on the payload.
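One way to apply the shorthand is a rename pass over existing entries. The sketch below assumes a long-to-short mapping that is made up for this example; pick your own names and keep the mapping in one place so every entry uses the same abbreviations.

```python
import xml.etree.ElementTree as ET

# Hypothetical long-to-short mapping, invented for this example.
SHORT = {"procedure": "proc", "prerequisite": "pre", "command": "cmd"}

def shorten(xml_text: str) -> str:
    """Rename every element that has a shorter alias; leave other tags alone."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        el.tag = SHORT.get(el.tag, el.tag)
    return ET.tostring(root, encoding="unicode")

verbose = (
    "<procedure><prerequisite>vpn connected</prerequisite>"
    "<command>ping gateway</command></procedure>"
)
compact = shorten(verbose)
print(compact)
print(len(verbose), "->", len(compact), "characters")
```

Character count is only a rough proxy for tokens. To measure real savings, count with the tokenizer of the model you actually run.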

3. Structural hierarchy

XML nests without confusion. A parent element can hold a child element, which in turn holds individual leaf tags. That hierarchy maps to how an agent works through instructions. It’s a roadmap for the model’s inner monologue.
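To make the nesting concrete, here is a small sketch with a three-level layout. The element names (kb, proc, step) and the attributes id and n are assumptions for the example, not names taken from any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-level layout: knowledge base > procedure > step.
KB = """
<kb>
  <proc id="restart-gateway">
    <step n="2">restart service</step>
    <step n="1">check load balancer</step>
  </proc>
</kb>
"""

def steps_for(kb_xml: str, proc_id: str) -> list[str]:
    """Return the steps of one procedure, ordered by their n attribute."""
    root = ET.fromstring(kb_xml)
    proc = root.find(f"./proc[@id='{proc_id}']")
    if proc is None:
        return []
    ordered = sorted(proc.findall("step"), key=lambda s: int(s.get("n")))
    return [s.text for s in ordered]

print(steps_for(KB, "restart-gateway"))
```

The same lookup that lets code pull one procedure out of the tree is what lets you hand the model only the branch it needs.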

The secret sauce: a knowledge base that heals itself

The best part of an XML knowledge base is that it can repair itself.

Say your agent fails a task. Maybe it couldn’t find the right server credentials because the docs were stale. In a self-hosted pipeline, you set up a second agent (call it the Doctor) to watch the logs.

Because the knowledge base is XML, the Doctor can parse it without reading a whole document. It looks for the one node whose ID caused the failure.

The failure. Agent A misses a step because the entry it relied on is deprecated.
The diagnosis. The Doctor reads the error log and finds the tag responsible.
The healing. The Doctor fetches the new info, rewrites only that XML node, and bumps its version.
The update. The vector index re-runs for just that chunk.
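The healing step of the loop above can be sketched in a few lines. This assumes each entry is an element named entry with an id attribute and an integer v attribute for the version; those names are hypothetical. Fetching the new info and re-running the vector index are left to your own pipeline, so the function only returns the ID that needs re-indexing.

```python
import xml.etree.ElementTree as ET

def heal(kb_xml: str, failed_id: str, new_text: str) -> tuple[str, str]:
    """Rewrite the one entry whose id matches and bump its version.

    Returns the updated XML plus the id of the chunk to re-index.
    """
    root = ET.fromstring(kb_xml)
    for node in root.iter("entry"):  # "entry", "id" and "v" are assumed names
        if node.get("id") == failed_id:
            node.text = new_text
            node.set("v", str(int(node.get("v", "1")) + 1))
            return ET.tostring(root, encoding="unicode"), failed_id
    raise KeyError(f"no entry with id {failed_id!r}")

kb = '<kb><entry id="creds" v="1">old path</entry><entry id="dns" v="3">keep</entry></kb>'
updated, reindex_id = heal(kb, "creds", "new path")
print(updated)
print("re-index:", reindex_id)
```

Only the matching node changes. Every other entry, and its version, is left exactly as it was.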

That’s the self-healing loop. Your knowledge base gets leaner and sharper every time it makes a mistake, rather than rotting into a tombstone of stale docs.

How to put your own KB on the diet

If you want to start shrinking your knowledge base today, here’s the protocol we use.

Step 1: Define your schema

Don’t wing the tags. Pick a small core set. We like a minimalist stack:

A tag for global rules.
A tag for retrieved knowledge chunks.
A tag for user-specific data.
A tag for few-shot examples.
A tag for the required output format.
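Once the core set is fixed, it helps to enforce it in code. The sketch below keeps an allowlist of tag names and reports anything outside it. The names in ALLOWED are placeholders for the five roles listed above plus a root element; substitute your own.

```python
import xml.etree.ElementTree as ET

# Hypothetical names for: root, global rules, retrieved chunks,
# user data, few-shot examples, output format.
ALLOWED = {"kb", "rules", "ctx", "user", "ex", "out"}

def unknown_tags(xml_text: str) -> set[str]:
    """Return every tag name in the document that is not in the core set."""
    root = ET.fromstring(xml_text)
    return {el.tag for el in root.iter()} - ALLOWED

print(unknown_tags("<kb><rules>be brief</rules><note>ad hoc</note></kb>"))
```

An empty set means the entry sticks to the schema. Anything else is a tag someone invented on the fly.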

Step 2: Strip the prose

Run your current RAG chunks through a summarizer with one instruction: rewrite this as a high-density XML entry, no fluff, facts and logic only.

A fat chunk: “To restart the payment gateway, first check the load balancer. If it’s green, go to the terminal and type sudo service restart. Do this in order.”

A diet chunk:

lb_status=="green"

sudo service restart strict

The second version is about 70% smaller in tokens and far more reliable for an agent to execute.
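To show why a structured chunk is easier to execute, here is a sketch of how code (or an agent harness) might read one. The tag names proc, if and cmd and the order attribute are assumptions made for this example; the condition and command text are the ones from the chunk above. The condition parser handles only the simple key=="value" form.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical markup around the condition and command from the diet chunk.
CHUNK = '<proc><if>lb_status=="green"</if><cmd order="strict">sudo service restart</cmd></proc>'

def next_command(chunk_xml: str, state: dict) -> Optional[str]:
    """Return the command only if the chunk's condition holds for this state."""
    root = ET.fromstring(chunk_xml)
    key, _, expected = root.findtext("if").partition("==")
    if state.get(key.strip()) != expected.strip().strip('"'):
        return None  # condition not met, so nothing should run
    return root.findtext("cmd")

print(next_command(CHUNK, {"lb_status": "green"}))
print(next_command(CHUNK, {"lb_status": "red"}))
```

The precondition and the command live in separate fields, so nothing has to infer the order from a sentence.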

Step 3: Optimize for local hardware

Since this runs on your own gear, host the models with vLLM or Ollama and pair them with an XML-aware parser. When the model outputs XML, you catch it, validate it against your schema, and if it’s broken you send it back for a self-healing retry before the user ever sees it.
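A rough sketch of that catch, validate, retry loop follows. The generate argument stands in for whatever function calls your local model; it is a placeholder, not a vLLM or Ollama API. Validation here only checks that the output is well-formed and has the expected root tag, which is a deliberately small stand-in for a full schema check.

```python
import xml.etree.ElementTree as ET
from typing import Callable

def generate_valid_xml(
    generate: Callable[[str], str],  # placeholder for your model call
    prompt: str,
    required_root: str,
    max_tries: int = 3,
) -> str:
    """Ask the model until its output parses and has the required root tag."""
    feedback = ""
    for _ in range(max_tries):
        raw = generate(prompt + feedback)
        try:
            root = ET.fromstring(raw)
        except ET.ParseError as err:
            feedback = f"\nYour last output was not well-formed XML ({err}). Try again."
            continue
        if root.tag == required_root:
            return raw
        feedback = f"\nThe root element must be <{required_root}>. Try again."
    raise ValueError("model did not produce valid XML")

# Usage with a fake model that fails once, then succeeds.
replies = iter(["<out>unclosed", "<out>done</out>"])
print(generate_valid_xml(lambda p: next(replies), "task", "out"))
```

Feeding the parser’s error message back into the retry prompt gives the model something specific to fix, and the user only ever sees the version that passed.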

Why this matters for your business

This works for agents running on cloud models too. If you’re a founder or an engineer at a mid-sized company, you don’t want a $5,000-a-month API bill for an agent that hallucinates because its context window is full of garbage. Leaner context is fewer tokens, and on a metered cloud model that lands straight on your invoice.

The XML Diet is about efficiency. It’s how a garage-built pipeline you own can outrun the big, bloated cloud setups. Structure the knowledge base this way and you can run real, multi-step agents on a single Mac Studio or one dedicated Linux box.

You go from renting intelligence to owning the infrastructure.

At Ghost Hat Studio we help companies build exactly these pipelines: lean, fast, self-improving agent systems that are theirs to own.

Stop feeding your agents junk. Put them on the XML Diet and watch them get the job done.

The Ghost Hat Studio Blog