<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/blog/feed.xsl"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>The Ghost Hat Studio Blog</title>
<link>https://ghosthatstudio.com/blog/</link>
<atom:link href="https://ghosthatstudio.com/blog/feed.xml" rel="self" type="application/rss+xml"/>
<description>Field notes on building AI agents you own.</description>
<language>en-us</language>
<lastBuildDate>Sat, 13 Jun 2026 09:00:00 +0000</lastBuildDate>
<item><title>The XML Diet: a leaner knowledge base for agents that fix themselves</title><link>https://ghosthatstudio.com/blog/xml-diet/</link><guid>https://ghosthatstudio.com/blog/xml-diet/</guid><pubDate>Fri, 12 Jun 2026 09:00:00 +0000</pubDate><enclosure url="https://ghosthatstudio.com/blog/xml-diet/img/hero.webp" type="image/webp" length="0"/><description><![CDATA[Feed a local model a 500-page PDF and it bogs down fast with slower answers and more confused output.]]></description><content:encoded><![CDATA[<p><img src="https://ghosthatstudio.com/blog/xml-diet/img/hero.webp"
        alt="A vintage 1950s wood-cabinet television used as a computer monitor, its curved screen glowing with a structured stream of code, against a workshop wall covered in brass gears and gauges."></p>

<p>Feed a local model a 500-page PDF and it bogs down fast with slower answers and more confused output.</p>

<p>This is the first big challenge you&rsquo;ll face when you leave cloud models and run your agents locally. Renting AI by the token buries the cost of bloat inside someone else&rsquo;s seemingly endless servers, so you barely feel it. Move the same agent onto hardware you own and every wasted token turns into latency and less accurate answers. A sprawling knowledge base on a local box is a grand piano trying to fit into a Honda Civic. Don&rsquo;t give up at this point. The constraint forces a discipline that pays off.</p>

<p>The solution is the XML Diet.</p>

<h2>The bloat: why your knowledge base is failing</h2>
<p>Most teams treat retrieval-augmented generation (RAG) like a messy attic. They dump the docs, the Slack logs, and the internal wikis into a vector database. When an agent needs an answer, the system grabs a handful of &ldquo;relevant&rdquo; chunks and drops them into the context window.</p>
<p>The result is a context window that reads like a junk drawer.</p>
<p>The model gets confused. It sees five different ways to reset a password, because the retrieved context includes a chat from 2022, a policy from 2024, and a snarky aside from an engineer named Dave. It burns tokens figuring out what matters and what&rsquo;s noise. That&rsquo;s token friction, and it wrecks agent performance.</p>
<p>On local hardware, token friction is the enemy. It slows your inference. It bloats your RAM. It makes a self-hosted agent feel like it&rsquo;s wading through waist-deep mud.</p>

<p><img src="https://ghosthatstudio.com/blog/xml-diet/img/bloat.webp"
        alt="An old metal workshop drawer pulled open and overstuffed with crumpled papers, tangled wires, and brass parts spilling over the edges."></p>

<h2>Enter the XML Diet</h2>
<p>XML earns its place for one reason: LLMs are good at reading tag boundaries. Those angle brackets from the 1990s web turn out to be exactly the structure a model needs to tell one thought from the next. XML is the skeleton of a fast knowledge base.</p>
<p>Compressing a knowledge base into XML is closer to semantic surgery than to zipping a file. You strip the prose and keep the logic.</p>

<h3>1. Tag-based boundaries</h3>
<p>Wrap your data in tags like <code>&lt;instr&gt;</code>, <code>&lt;policy&gt;</code>, or <code>&lt;ref&gt;</code>, and you tell the model exactly where one thought ends and the next begins. The model stops guessing whether a sentence is an instruction or background. The tag says so.</p>
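<p>Here&rsquo;s a minimal sketch of that wrapping step in Python, using only the standard library. The <code>wrap</code> helper and the sample strings are hypothetical. The one detail worth copying is the escaping, so a stray <code>&amp;</code> or <code>&lt;</code> in the payload can&rsquo;t be mistaken for a tag boundary.</p>

```python
from xml.sax.saxutils import escape


def wrap(tag: str, text: str) -> str:
    """Wrap text in one XML element, escaping &, <, and > in the payload."""
    return f"<{tag}>{escape(text.strip())}</{tag}>"


# Hypothetical snippets standing in for retrieved material.
prompt = "\n".join([
    wrap("instr", "Reset the user's password."),
    wrap("policy", "Passwords expire after 90 days & must be 12+ chars."),
    wrap("ref", "Handbook section 4 <security>"),
])
print(prompt)
```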

<h3>2. Concise shorthand</h3>
<p>On the XML Diet, the labels lose weight. Instead of <code>&lt;detailed_customer_profile_information&gt;</code>, use <code>&lt;usr&gt;</code>. Instead of <code>&lt;relevant_context_from_the_database&gt;</code>, use <code>&lt;ctx&gt;</code>. LLMs have seen so much code and markup that they read these abbreviations fine. You save tokens on the labels and spend them on the payload.</p>
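<p>To get a rough feel for the savings, you can count what each open/close pair costs. The sketch below counts characters, which is only a proxy. Real token counts depend on your model&rsquo;s tokenizer, so treat the numbers as directional.</p>

```python
# Character counts are only a rough proxy for tokens; the real number
# depends on your model's tokenizer.
LONG_TO_SHORT = {
    "detailed_customer_profile_information": "usr",
    "relevant_context_from_the_database": "ctx",
}


def tag_overhead(name: str) -> int:
    """Characters spent on one open/close pair: <name> plus </name>."""
    return len(f"<{name}></{name}>")


for long_name, short_name in LONG_TO_SHORT.items():
    saved = tag_overhead(long_name) - tag_overhead(short_name)
    print(f"{long_name} -> {short_name}: {saved} characters saved per use")
```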

<h3>3. Structural hierarchy</h3>
<p>XML nests without confusion. A <code>&lt;task&gt;</code> can hold a <code>&lt;step_list&gt;</code>, which holds individual <code>&lt;step&gt;</code> tags. That hierarchy maps to how an agent works through instructions. It&rsquo;s a roadmap for the model&rsquo;s inner monologue.</p>
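<p>A small sketch of that nesting, parsed with Python&rsquo;s built-in <code>xml.etree.ElementTree</code>. The task and its steps are made up for illustration. Only the tag names come from the paragraph above.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical task; only the tag names come from the text above.
TASK_XML = """
<task id="rotate_keys">
  <step_list>
    <step>Generate a new key pair.</step>
    <step>Deploy the public key.</step>
    <step>Revoke the old key.</step>
  </step_list>
</task>
"""

task = ET.fromstring(TASK_XML)

# Document order is preserved, so the list comes back in execution order.
steps = [step.text for step in task.findall("./step_list/step")]

for number, text in enumerate(steps, start=1):
    print(f"{task.get('id')} step {number}: {text}")
```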

<h2>The secret sauce: a knowledge base that heals itself</h2>
<p>The best part of an XML knowledge base is that it can repair itself.</p>
<p>Say your agent fails a task. Maybe it couldn&rsquo;t find the right server credentials because the docs were stale. In a self-hosted pipeline, you set up a second agent (call it the Doctor) to watch the logs.</p>
<p>Because the knowledge base is XML, the Doctor can parse it without reading a whole document. It looks for the one <code>&lt;entry&gt;</code> whose ID caused the failure.</p>
<ol>
    <li><strong>The failure.</strong> Agent A misses a step because <code>&lt;policy id=&quot;v1&quot;&gt;</code> is deprecated.</li>
    <li><strong>The diagnosis.</strong> The Doctor reads the error log and finds the tag.</li>
    <li><strong>The healing.</strong> The Doctor fetches the new info, rewrites only that XML node, and bumps the version to <code>&lt;policy id=&quot;v2&quot;&gt;</code>.</li>
    <li><strong>The update.</strong> The vector index re-embeds just that chunk.</li>
</ol>
<p>That&rsquo;s the self-healing loop. Your knowledge base gets leaner and sharper every time it makes a mistake, rather than rotting into a tombstone of stale docs.</p>
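<p>The healing step can be sketched in a few lines of Python. This assumes each entry carries an <code>id</code> attribute plus a separate <code>ver</code> attribute for the version, which is this sketch&rsquo;s own convention. The <code>heal</code> function, the <code>&lt;kb&gt;</code> wrapper, and the sample policies are all hypothetical, and fetching the new info and re-embedding are left out.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical knowledge base. The element and attribute names
# (kb, policy, id, ver) are assumptions for this sketch.
KB_XML = """
<kb>
  <policy id="pw_reset" ver="1">Email the helpdesk.</policy>
  <policy id="vpn" ver="3">Use the corporate client.</policy>
</kb>
"""


def heal(kb: ET.Element, entry_id: str, new_text: str) -> ET.Element:
    """Rewrite one policy node in place and bump its version."""
    node = kb.find(f"./policy[@id='{entry_id}']")
    if node is None:
        raise KeyError(f"no policy with id {entry_id!r}")
    node.text = new_text
    node.set("ver", str(int(node.get("ver", "0")) + 1))
    return node  # the only chunk that needs re-embedding


kb = ET.fromstring(KB_XML)
changed = heal(kb, "pw_reset", "Use the self-service portal.")
print(ET.tostring(changed, encoding="unicode").strip())
```

<p>Every other node is left untouched, which is what keeps the re-index cheap.</p>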

<p><img src="https://ghosthatstudio.com/blog/xml-diet/img/healing.webp"
        alt="A robotic hand using a digital laser to stitch together glowing fragments of XML code, healing a broken knowledge base."></p>

<h2>How to put your own KB on the diet</h2>
<p>If you want to start shrinking your knowledge base today, here&rsquo;s the protocol we use.</p>

<h3>Step 1: Define your schema</h3>
<p>Don&rsquo;t wing the tags. Pick a small core set. We like a minimalist stack:</p>
<ul>
    <li><code>&lt;sys&gt;</code>: global rules.</li>
    <li><code>&lt;ctx&gt;</code>: retrieved knowledge chunks.</li>
    <li><code>&lt;usr&gt;</code>: user-specific data.</li>
    <li><code>&lt;ex&gt;</code>: few-shot examples.</li>
    <li><code>&lt;out&gt;</code>: the required output format.</li>
</ul>
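<p>One way to keep yourself honest about the schema is to make the prompt builder reject anything outside it. A minimal sketch, where <code>build_prompt</code> is a hypothetical helper and the fixed ordering is an assumption rather than a requirement:</p>

```python
from xml.sax.saxutils import escape

# The five tags from the list above, in the order they appear in the prompt.
SCHEMA = ("sys", "ctx", "usr", "ex", "out")


def build_prompt(sections: dict[str, str]) -> str:
    """Render sections in schema order and reject tags outside the schema."""
    unknown = set(sections) - set(SCHEMA)
    if unknown:
        raise ValueError(f"tags not in schema: {sorted(unknown)}")
    return "\n".join(
        f"<{tag}>{escape(sections[tag])}</{tag}>"
        for tag in SCHEMA
        if tag in sections
    )


# Hypothetical content for illustration.
print(build_prompt({
    "usr": "plan=pro",
    "sys": "Answer from ctx only.",
    "out": "One short paragraph.",
}))
```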

<h3>Step 2: Strip the prose</h3>
<p>Run your current RAG chunks through a summarizer with one instruction: rewrite this as a high-density XML entry, no fluff, facts and logic only.</p>
<p>A fat chunk: <em>&ldquo;To restart the payment gateway, first check the load balancer. If it&rsquo;s green, go to the terminal and type sudo service restart. Do this in order.&rdquo;</em></p>
<p>A diet chunk: <code>&lt;action id=&quot;pay_restart&quot;&gt; &lt;pre&gt;lb_status==&quot;green&quot;&lt;/pre&gt; &lt;cmd&gt;sudo service restart&lt;/cmd&gt; &lt;order&gt;strict&lt;/order&gt; &lt;/action&gt;</code></p>
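<p>On an example this small, the tags cost about as much as the prose they replace, so the token count doesn&rsquo;t shrink much, if at all. The savings come on real chunks, where most of the text is filler around a few facts, so measure with your model&rsquo;s tokenizer. Even at equal size, the second version is far more reliable for an agent to execute: the precondition, the command, and the ordering rule each sit in their own tag.</p>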
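<p>Part of what you gain is that the diet chunk is machine-readable as is. A quick sketch with Python&rsquo;s standard library turns it into a plain dictionary an agent runtime could act on. The <code>record</code> layout is just one hypothetical way to hold it.</p>

```python
import xml.etree.ElementTree as ET

# The diet chunk from above, verbatim.
DIET_CHUNK = (
    '<action id="pay_restart"> <pre>lb_status=="green"</pre> '
    "<cmd>sudo service restart</cmd> <order>strict</order> </action>"
)

action = ET.fromstring(DIET_CHUNK)

# Flatten the element into a dict: the id attribute plus one key per child tag.
record = {"id": action.get("id")}
record.update({child.tag: child.text for child in action})
print(record)
```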

<h3>Step 3: Optimize for local hardware</h3>
<p>Since this runs on your own gear, host the models with <a href="https://docs.vllm.ai/en/latest/">vLLM</a> or <a href="https://ollama.com/">Ollama</a> and pair them with an XML-aware parser. When the model outputs XML, you catch it, validate it against your schema, and if it&rsquo;s broken you send it back for a self-healing retry before the user ever sees it.</p>
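<p>The catch, validate, retry loop might look something like this. It&rsquo;s a sketch under assumptions: <code>model</code> is a placeholder for whatever callable wraps your vLLM or Ollama endpoint, the required children of <code>&lt;action&gt;</code> are an assumed schema, and the <code>&lt;err&gt;</code> tag used to feed the error back is invented for this example.</p>

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

REQUIRED_CHILDREN = {"pre", "cmd", "order"}  # assumed schema for <action>


def validate(raw: str) -> Optional[str]:
    """Return an error message, or None if raw is a well-formed <action>."""
    try:
        root = ET.fromstring(raw)
    except ET.ParseError as err:
        return f"not well-formed XML: {err}"
    if root.tag != "action":
        return f"expected <action>, got <{root.tag}>"
    missing = REQUIRED_CHILDREN - {child.tag for child in root}
    if missing:
        return f"missing children: {sorted(missing)}"
    return None


def generate_valid(
    model: Callable[[str], str], prompt: str, max_tries: int = 3
) -> str:
    """Call the model until its output validates, feeding errors back in."""
    for _ in range(max_tries):
        raw = model(prompt)
        error = validate(raw)
        if error is None:
            return raw
        # Append the validation error so the next attempt can correct it.
        prompt = f"{prompt}\n<err>{error}</err>"
    raise RuntimeError(f"no valid output after {max_tries} tries")
```

<p>Capping the retries matters on local hardware, since every failed attempt is inference time you pay for.</p>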

<h2>Why this matters for your business</h2>
<p>This works for agents running on cloud models too. If you&rsquo;re a founder or an engineer at a mid-sized company, you don&rsquo;t want a $5,000-a-month API bill for an agent that hallucinates because its context window is full of garbage. Leaner context means fewer tokens, and on a metered cloud model that saving lands straight on your invoice.</p>
<p>The XML Diet is about efficiency. It&rsquo;s how a garage-built pipeline you own can outrun the big, bloated cloud setups. Structure the knowledge base this way and you can run real, multi-step agents on a single Mac Studio or one dedicated Linux box.</p>
<p>You go from renting intelligence to owning the infrastructure.</p>
<p>At <a href="https://www.ghosthatstudio.com">Ghost Hat Studio</a> we help companies build exactly these pipelines: lean, fast, self-improving agent systems that are theirs to own.</p>
<p>Stop feeding your agents junk. Put them on the XML Diet and watch them get the job done.</p>

<p><img src="https://ghosthatstudio.com/blog/xml-diet/img/architecture.webp"
        alt="A minimalist architecture diagram on a dark background: a circle labeled Local Hardware holding three nodes, XML Knowledge Base, Agent, and Self-Healing Loop, with arrows flowing between them."></p>]]></content:encoded></item>
</channel>
</rss>
