Documentation that doesn't go stale

In every organisation I’ve worked in, documentation has been either missing or, once written, out of date. So we’ve stopped treating it as something people maintain and have agents regenerate it whenever the code changes.

TL;DR

  • estate-wiki is an internal, self-updating wiki for the Jollyes backend estate: one page per Bitbucket repo, regenerated by a scheduled agent that re-reads the code whenever the repo has changed since the last run. The docs are a generated artefact, so they cannot rot.
  • Each repo read produces two views from one pass: a README human onboarding view and a CLAUDE.md machine view, rendered together with a structured facts blob that machines consume.
  • Everything is searchable over an MCP, so Claude can pull in the real Jollyes landscape while coding in any project or tracing company-wide data flows.
  • There’s a fun symmetry between each file being deterministically pre-digested for the models into information-rich summaries, and the project as a whole condensing the estate into a summary other agents can call via the MCP: like Russian dolls of information.

The stack

estate-wiki runs a single, provider-agnostic review agent against the Jollyes Bitbucket estate inside an ephemeral ECS task. The model call sits behind a neutral interface, so the same agent runs on OpenAI, Anthropic, Bedrock, or others. It auto-discovers repos from the workspace, so a new repo gets a page with no config edit; the live wiki now covers a few hundred pages (one per repo, plus a sub-page per Airflow DAG).
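
As a rough sketch of what a neutral interface like this can look like (not the production code), the agent depends on a one-method protocol and a config value picks the adapter. The names ModelClient, EchoClient, PROVIDERS, make_client and review_repo are all made up for illustration, and the stand-in client just echoes its prompt instead of calling a vendor SDK.

```python
from dataclasses import dataclass
from typing import Protocol


class ModelClient(Protocol):
    """Hypothetical neutral interface: one method, no provider types leak out."""

    def complete(self, system: str, prompt: str) -> str: ...


@dataclass
class EchoClient:
    """Stand-in provider for local runs; a real adapter would wrap a vendor SDK."""

    name: str

    def complete(self, system: str, prompt: str) -> str:
        return f"[{self.name}] {prompt[:40]}"


# Registry keyed by a config value; the keys here are illustrative only.
PROVIDERS = {
    "openai": lambda: EchoClient("openai"),
    "anthropic": lambda: EchoClient("anthropic"),
}


def make_client(provider: str) -> ModelClient:
    """Resolve the configured provider name to a client, or fail loudly."""
    try:
        return PROVIDERS[provider]()
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None


def review_repo(client: ModelClient, repo_name: str) -> str:
    # The agent only ever talks to the interface, never to a vendor SDK.
    return client.complete("You document repositories.", f"Summarise {repo_name}")
```

Swapping providers is then a config change: review_repo(make_client("anthropic"), "stock-service") and review_repo(make_client("openai"), "stock-service") run the same agent code.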

Airflow DAG
        │
        ▼
ECS Fargate task (review agent, service token)
        │  clone → HEAD → skip-if-unchanged?
        ▼
scope to git-tracked files  ──►  digesters (.dtsx, DAGs)
        │
        ▼
ONE summarise pass ──► facts JSON + human view + machine view
        │  (model chosen by config: OpenAI · Anthropic · Bedrock)
        ▼
backend REST /api/private/*  ──►  Postgres (one row per repo)
        ▲
        └── read back by AI agents over the MCP

The agent never touches the database directly. It writes through /api/private/* with a service token, so the backend stays the single Postgres writer (and reader).
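
A minimal sketch of those two steps, the skip-if-unchanged check and the write through the private API, might look like the following. The route /api/private/repos/<repo>, the bearer-token header and the headSha field are assumptions made for the example; only the /api/private/* prefix and the service token come from the real system. The request is built but not sent, so the snippet runs without a backend.

```python
import json
import urllib.request
from typing import Optional


def should_skip(head_sha: str, last_documented_sha: Optional[str]) -> bool:
    """Skip the expensive summarise pass when HEAD has not moved."""
    return last_documented_sha is not None and head_sha == last_documented_sha


def build_write_request(
    base_url: str, token: str, repo: str, page: dict
) -> urllib.request.Request:
    """Build (but do not send) the PUT that hands a page to the backend."""
    body = json.dumps(page).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/private/repos/{repo}",  # hypothetical route
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",  # assumed bearer scheme
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    if not should_skip("abc123", last_documented_sha="9f8e7d"):
        req = build_write_request(
            "https://wiki.example", "service-token", "stock-service",
            {"headSha": "abc123"},
        )
        print(req.get_method(), req.full_url)
```

Because the agent only holds an HTTP token and never a database credential, the backend can validate every write before it reaches Postgres.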

Two views, one source

Each repo is one Postgres row holding the Q&A, both rendered views, and a facts JSONB column. The agent extracts the facts and renders both markdown views in a single pass: a README-flavoured human view for the helpdesk and new starters, and a CLAUDE.md-flavoured machine view for developers and AI agents.

The facts blob does three jobs: it seeds both views, it gives the Q&A agent cheap grounding so it needn’t re-read a whole repo per question, and it’s machine-consumable over the MCP. Fields include languages, endpoints, env vars, data stores, integrations, deploy target, owners, and key files, plus category-specific dags[] and ssisPackages[].
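
To make the shape concrete, here is an illustrative model of a facts blob and of how it can ground a question without touching the repo. Only dags and ssisPackages are spelled as they appear above; the other key names (envVars, dataStores and so on), the prompt wording and the sample values are guesses for the sake of the example, not the real schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Facts:
    """Assumed shape of the per-repo facts blob (key spellings are guesses)."""

    languages: List[str] = field(default_factory=list)
    endpoints: List[str] = field(default_factory=list)
    envVars: List[str] = field(default_factory=list)
    dataStores: List[str] = field(default_factory=list)
    integrations: List[str] = field(default_factory=list)
    deployTarget: Optional[str] = None
    owners: List[str] = field(default_factory=list)
    keyFiles: List[str] = field(default_factory=list)
    # Category-specific fields, empty for repos they do not apply to.
    dags: List[str] = field(default_factory=list)
    ssisPackages: List[str] = field(default_factory=list)


def grounding_prompt(facts: Facts, question: str) -> str:
    """Build a Q&A prompt from the facts blob alone, not the repo contents."""
    blob = json.dumps(asdict(facts), indent=2, sort_keys=True)
    return f"Answer using only these facts:\n{blob}\n\nQuestion: {question}"


if __name__ == "__main__":
    facts = Facts(languages=["python"], dags=["stock_sync"])  # sample values
    print(grounding_prompt(facts, "Which DAGs does this repo define?"))
```

The prompt stays a few hundred bytes regardless of repo size, which is what makes per-question grounding cheap.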

Digesting files before reading them

Not every repo is source code you can hand directly to a model. For both SSIS packages and Airflow DAGs, we have deterministic digesters that run before any LLM call. The model never sees the raw files, which is both safer and gives it better context.

SSIS (.dtsx) packages, for example, are often huge XML documents (a single MAIN.dtsx can be ~800 KB) and may contain encrypted secrets. Passing the raw XML to a model would be slow and expensive, and could expose credentials. dtsxDigest parses the package into a compact, secret-masked JSON representation containing:

  • Connection managers (the actual source and destination systems)
  • Data-flow components in execution order
  • SQL executed by each step

The result reads like a concise “source → transform → destination” pipeline rather than hundreds of kilobytes of XML markup.
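
The sketch below shows the general technique on connection managers only; it is not dtsxDigest itself. It assumes the XML layout of a recent package format, where connection managers sit under a DTS:ConnectionManagers element in the www.microsoft.com/SqlServer/Dts namespace. The sample package, the digest and mask helpers and the masking regex are all invented for the example.

```python
import re
import xml.etree.ElementTree as ET

DTS = "www.microsoft.com/SqlServer/Dts"  # namespace used in the sample below

# Made-up, heavily simplified package fragment for illustration.
SAMPLE = f"""<DTS:Executable xmlns:DTS="{DTS}" DTS:ObjectName="MAIN">
  <DTS:ConnectionManagers>
    <DTS:ConnectionManager DTS:ObjectName="Warehouse" DTS:CreationName="OLEDB">
      <DTS:ObjectData>
        <DTS:ConnectionManager
          DTS:ConnectionString="Data Source=db01;User ID=etl;Password=hunter2;" />
      </DTS:ObjectData>
    </DTS:ConnectionManager>
  </DTS:ConnectionManagers>
</DTS:Executable>"""

# Matches "Password=..." or "Pwd=..." up to the next semicolon.
SECRET = re.compile(r"(?i)\b(password|pwd)\s*=\s*[^;]*")


def mask(conn: str) -> str:
    """Replace any password value in a connection string with ***."""
    return SECRET.sub(lambda m: f"{m.group(1)}=***", conn)


def digest(xml_text: str) -> dict:
    """Reduce a package to its name and secret-masked connection managers."""
    root = ET.fromstring(xml_text)

    def q(name: str) -> str:
        return f"{{{DTS}}}{name}"  # namespace-qualified tag or attribute

    connections = []
    for cm in root.findall(f"./{q('ConnectionManagers')}/{q('ConnectionManager')}"):
        inner = cm.find(f"./{q('ObjectData')}/{q('ConnectionManager')}")
        raw = inner.get(q("ConnectionString"), "") if inner is not None else ""
        connections.append({
            "name": cm.get(q("ObjectName")),
            "type": cm.get(q("CreationName")),
            "connection": mask(raw),
        })
    return {"package": root.get(q("ObjectName")), "connections": connections}


if __name__ == "__main__":
    print(digest(SAMPLE))
```

A fuller digester would walk the data-flow components and SQL statements the same way; the point is that masking happens in deterministic code, before any model is involved.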

I find it an interesting principle to think over: use cheap, fast and deterministic parsing to decide exactly what information reaches the model. The expensive LLM step receives only a clean, safe, information-dense representation.
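
The same principle can be sketched for an Airflow DAG file: parse it statically with Python’s ast module and keep only the DAG id, schedule and task ids, without importing Airflow or executing the file. This is an illustration of the idea, not our DAG digester; the sample DAG and the helper names are hypothetical, and the approach only catches literal keyword arguments.

```python
import ast

# Made-up DAG source; it is parsed as text and never executed.
SAMPLE_DAG = '''
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="stock_sync", schedule="@daily") as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    load = BashOperator(task_id="load", bash_command="echo load")
    extract >> load
'''


def _kwarg(call: ast.Call, name: str):
    """Return a keyword argument's value if it is a plain literal, else None."""
    for kw in call.keywords:
        if kw.arg == name and isinstance(kw.value, ast.Constant):
            return kw.value.value
    return None


def digest_dag(source: str) -> dict:
    """Statically list dag_id, schedule and task ids without importing Airflow."""
    digest = {"dag_id": None, "schedule": None, "tasks": []}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        callee = node.func
        # Handles both DAG(...) and module.DAG(...) style calls.
        name = callee.id if isinstance(callee, ast.Name) else getattr(callee, "attr", None)
        if name == "DAG":
            digest["dag_id"] = _kwarg(node, "dag_id")
            digest["schedule"] = _kwarg(node, "schedule")
        task_id = _kwarg(node, "task_id")
        if task_id is not None:
            digest["tasks"].append({"task_id": task_id, "operator": name})
    return digest


if __name__ == "__main__":
    print(digest_dag(SAMPLE_DAG))
```

Because nothing is executed, a malformed or side-effecting DAG file cannot do any harm at digest time, and the model receives a handful of fields rather than the whole module.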

In many ways, the entire wiki follows the same principle: the repos are parsed into information-dense, machine-accessible representations of the code base, so downstream systems can consume them quickly and cheaply.

An example: stock, end to end

The wiki has a built-in Q&A per page, where the agent answers user questions using the facts blob. Conceptually that’s simple, as the information is self-contained.

A much harder question is one that spans the whole estate, and here the wiki MCP comes into its own. I asked Claude Code: how does a “linked” pack/single SKU work, end to end? (A single 390g can, 61329, and its 12-pack, 61338, are the same product sold two ways, but stocked only as 12-packs, and counted as singles!)

Claude Code first searched across the wiki via the MCP for ‘stock’, ‘parent’ and ‘child’ and found the relevant repos. From there, the agent fanned out roughly 30 sub-agents across 17 repos, first to build the flow and then to adversarially verify each part of the final claim. The whole review cost a couple of million tokens. It was cheap because it ran small models over the repos and only opened and read the key files with bigger models. We managed to dig out a complex multi-system, multi-repo flow within 15 minutes.

(What’s more, the entire finding was then verified using the SQL MCP against our allocation and stock data, and double-checked against live point-of-sale (POS) APIs with a temporary token. Finally, the whole write-up was emailed to me from within the CLI using a new draft_email → send_email tool chain in the SQL MCP (from claude@jollyes.com)!)

Closing

The ‘win’ isn’t that an agent can write your documentation. It’s that the documentation is never stale, and is readily available to downstream consumers in a useful format.

The stock example is where the system stops being just documentation and starts becoming operational infrastructure. That level of analysis is only possible because the estate has already been indexed, digested, and made searchable. The MCP compresses hundreds of pages of organisational knowledge (derived from millions of lines of code) into a form that an agent can navigate in seconds, while the facts blobs allow it to piece together the exact endpoints and inter-system dependencies in a structured way.

The wiki and MCP serve complementary roles. The wiki keeps the documentation accurate and discoverable; the MCP turns that knowledge into a machine-accessible interface. Together they allow agents to self-serve information at low cost and high speed to complete tasks more efficiently and accurately.

Written on June 8, 2026