Every app has an implicit truth about what it is meant to be, and what it actually is. Call this the specification.
An idea I’ve been thinking about: in our new agentic development world, this specification has become even more important.
If the specification is correct, you can actually rebuild an application from scratch fairly cheaply in terms of human effort. You don’t even need to use the same technology underneath.
In this sense, the actual code is a derived product of the specification. What’s more, things like the test suite can be separately derived from the specification. As can automated or human QA.
If the specification for an application is understood, then it can be copied and modified. I can share the specification for an application that I’ve found useful, and someone else can change the human readable description of it, and build it from scratch to their liking.
Some problems
In practice, there is often no single source of truth, only the artefacts surrounding the product. The development team has a backlog of stuff that has been built and a roadmap, the QA team has its own description of the product, and marketing has its own view of what it does.
And this is in a traditional organization. I’ve recently been building apps myself. I create plans, agents implement them. The source of truth about what the application is becomes the code itself. “Add a button so that I can export that”.
Moreover, the specification of the product can be further divided into two descriptive groups: stuff that it does, and stuff that we want it to do. Hopefully the former is a subset of the latter, but that’s not always the case. And the aspirational side is not always the same among different stakeholders. Other interesting sets include: stuff we don’t want it to do (which it may or may not do); stuff we think it does, but it doesn’t; and stuff it does that we aren’t aware of (whether or not we want it).
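These groups behave like plain overlapping sets, which makes them easy to reason about mechanically. A toy sketch, where every feature name is invented purely for illustration:

```python
# Toy model of the specification as overlapping sets of behaviours.
# All feature names here are invented for illustration.
does = {"export_csv", "login", "autosave", "tracking_pixel"}
want = {"export_csv", "login", "autosave", "dark_mode"}
dont_want = {"tracking_pixel"}

built_as_intended = does & want        # hopefully most of `does`
aspirational = want - does             # wanted but not yet built
unwanted_behaviour = does & dont_want  # it does this; we wish it didn't
surprises = does - want - dont_want    # behaviour nobody has an opinion on

print(aspirational)        # {'dark_mode'}
print(unwanted_behaviour)  # {'tracking_pixel'}
```

The point isn’t the code, it’s that once the spec is explicit, these deltas stop being a matter of opinion and become something you can compute.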
Trying to solve that
I think that “intent” is the current way of describing what we would like an application to be.
There’s a problem with intent being ephemeral. At a given point in time, we wanted a thing, so we built it. And now we may or may not remember why we wanted it at that point. Agents make bad assumptions about things that already exist.
Plans go stale. They are superseded. Plans also need to be highly granular when they first exist, but get summarised implicitly once we are familiar with them.
Maintaining a source of truth
When I ask Claude to build something, what I’m really doing is asking for the description of the application to be changed. If that is not recorded formally, the description of the application still changes, but it is implicit in the code and the tests.
This is problematic because I don’t really read the code any more. I can’t reason about a spec that I can’t see. One way to get around this is to read the code. But I don’t think that’s the answer. The code doesn’t necessarily matter. It’s like assembler used to be. It’s a poor communicator of intent.
When I ask for something, the set of aspirational items increases, and the delta between built items and aspirational items grows. When the new thing is built, the delta shrinks.
But what is the workflow? In the past, the delta between what we want and what we have needed to be rigorously ordered, because it was expensive to build things. Now the delta gets attacked as soon as we voice the idea. But it doesn’t need to be like that.
In an ideal world, I think it would work like this:
- I ask for something
- The specification is updated
- It is noted that something else might have become redundant as a result
- It is noted that this new thing is not implemented
- The delta of things we want vs the things we don’t want vs what we have is resolved
- The specification is updated to reflect the new situation
This does happen currently, but the code is the specification. Claude often says it is going to read the code to see what the situation is. That is fine for noting what is done, but not for understanding intent.
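One way to picture that ideal loop is a spec where every item carries a status, and implementation only ever closes a gap the spec already records. A minimal sketch, with statuses and item names of my own invention:

```python
from dataclasses import dataclass, field

@dataclass
class SpecItem:
    name: str
    status: str = "wanted"  # "wanted" | "built" | "redundant"

@dataclass
class Spec:
    items: dict = field(default_factory=dict)

    def request(self, name, supersedes=None):
        # Steps 1-3: record the ask (and any newly redundant item)
        # before anything is implemented.
        self.items[name] = SpecItem(name)
        if supersedes and supersedes in self.items:
            self.items[supersedes].status = "redundant"

    def delta(self):
        # Step 4: the things we want but don't yet have.
        return [i.name for i in self.items.values() if i.status == "wanted"]

    def mark_built(self, name):
        # Step 5: the spec is updated to reflect the new situation.
        self.items[name].status = "built"

spec = Spec()
spec.request("export_csv")
spec.request("export_xlsx", supersedes="export_csv")
print(spec.delta())  # ['export_xlsx'] — csv export is now redundant
spec.mark_built("export_xlsx")
print(spec.delta())  # []
```

Nothing here touches code; the whole exchange is bookkeeping about intent, which is exactly the part that gets lost when the code is the spec.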
In practice, I could either tackle this in the manner above, or just maintain a spec of what actually exists. That would look like the following:
- I ask for something
- A thing is built
- The spec is updated to reflect what was built and why
The delta of things we aspire to but haven’t yet built remains in all the tickets and planning documents that we already have.
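This lighter-weight variant can be as crude as an append-only log of what exists and why. A sketch, with field names I’ve made up:

```python
import datetime
import json

def record_built(spec_path, what, why):
    """Append a 'built' entry to a JSON-lines spec of what actually exists."""
    entry = {
        "what": what,
        "why": why,
        "recorded": datetime.date.today().isoformat(),
    }
    with open(spec_path, "a") as f:
        f.write(json.dumps(entry) + "\n")

record_built("spec.jsonl", "export button", "user needed CSV output")
```

Crucially the entry records the why, which is the one thing the code itself can never recover.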
But another issue is discipline. It’s hard to do everything in the right way.
It’s hard not to simply ask for implementation changes, rather than thinking about the spec holistically, and how our model of the world has changed. But this is potentially something agents can do more thoroughly than we can.
Wiki spec
Over the last couple of days I’ve been thinking about how I can improve the way I work. My latest idea is to keep the specification of the project as a set of markdown documents in the repo that can be served as a wiki.
A wiki is good for this. Firstly, it can be divided into different sections that act as different lenses on the application. Examples include:
- Functionality - what does the system allow a user to do
- UI - how are the controls of the system presented to the user
- Architecture - what are the important aspects of how the system should be built
- Algorithms, data structures, processing pipelines - how is data manipulated and transformed by the system
So parts of the specification are declarative, and others are more descriptive, leaning into the qualities of the system we want to build. Even things like algorithms can be described in human language at a high level, without resorting to putting them in code, where they get mixed with other competing concepts.
Second, a wiki can describe a system fractally — splitting off into new pages that are linked together. In this way, we can try to be comprehensive about describing what we want the system to be, without putting it all in a massive document. My intuition is that it’s useful for agents to be able to have a narrow context window in many circumstances, and looking at a subset of pages can help with that.
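Concretely, such a wiki might live in the repo as nothing more than a directory of linked markdown pages, one section per lens. This layout is just one possible shape, not a prescription:

```
spec-wiki/
├── functionality/
│   ├── index.md      # what the system lets a user do
│   └── exporting.md  # split off when a page grows too big
├── ui/
│   └── index.md      # how controls are presented to the user
├── architecture/
│   └── index.md      # important aspects of how it should be built
└── pipelines/
    └── index.md      # how data is manipulated and transformed
```

An agent working on exporting only needs `functionality/exporting.md` and whatever it links to, which is the narrow-context property mentioned above.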
Invariants
For all the assertions made about the system in the specification, we should be able to declare them as invariants. However the system is built, these things should be true. We should then be able to say “is this invariant true?” for each of them to get the subset of the system that is actually complete. The remaining invariants are the delta of features that is aspirational.
The way that we check whether an invariant is true might be through integration tests, it might be through QA, or it might be some other mechanism. Ideally, the actual tests are derived from the spec. Many people like to say that user stories, acceptance criteria, or something else are the source of truth for the system. My view is that having humans write these kinds of things in very specific ways is mostly a waste of time. They should be produced by agents.
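A minimal sketch of what declaring and checking invariants could look like. The invariant names are invented, and the checks are stubbed booleans; in reality each would be an integration test, a QA step, or an agent-run probe:

```python
# Each invariant pairs a spec assertion with some way of checking it.
# The checks are stubs here; real ones would exercise the system.
invariants = {
    "exported CSV round-trips through import": lambda: True,
    "deleting an account removes its data": lambda: True,
    "dark mode persists across sessions": lambda: False,  # not built yet
}

complete = {name for name, check in invariants.items() if check()}
aspirational = set(invariants) - complete

print(f"{len(complete)} invariants hold; delta: {sorted(aspirational)}")
```

Running every check partitions the spec into the subset that is actually true of the system and the aspirational delta, which is precisely the bookkeeping described above.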
Agents put it together and keep it running
A key point is that wikis — like all documentation — are subject to entropy. Things go stale or become out of date. Humans are terrible at sticking to conventions, or updating existing documents when something changes.
Agents can help in multiple ways:
- The wiki is essentially read only for humans. If you want something updated, you ask an agent to do it. The agent has specific skills (like a checklist) to make sure it’s done properly. The wiki interface has mechanisms for asking agents to do stuff.
- The wiki is crawled regularly for changes. When it changes, agents make sure that it is not broken.
- When code changes, the wiki is checked for consistency with the code. Are invariants being met? Has the delta between aspiration and reality diminished?
- When asking for changes, agents should be viewed as ultra-disciplined PMs. They don’t go and implement anything. They update the wiki. They can then plan an implementation based on the delta between the current state of the code, and what the wiki now describes.
Bootstrapping
With any attempt to introduce rigour to a process, there is a cold start problem. How can I see the value of working in this way, without putting in a huge amount of effort? How do I know it will be worth it?
I’m trying to make this easier. This means that there is a batch import for the wiki that:
- Looks for markdown files in your project that might contain plans
- Categorises the content by wiki categories
- Updates existing pages or creates new ones
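The categorisation step can start out crude. Here is a sketch of a keyword-based pass; the category keywords are invented, and a real agent would do this far better than string matching:

```python
import pathlib

# Crude keyword buckets mapping plan text to wiki categories.
# Both the categories and the keywords are illustrative guesses.
CATEGORIES = {
    "functionality": ("feature", "user can", "allow"),
    "ui": ("button", "screen", "layout"),
    "architecture": ("service", "database", "module"),
    "pipelines": ("transform", "pipeline", "parse"),
}

def categorise(text):
    """Return the wiki category whose keywords match the text most often."""
    text = text.lower()
    scores = {
        cat: sum(text.count(kw) for kw in kws)
        for cat, kws in CATEGORIES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorised"

def batch_import(project_dir):
    """Find markdown plans in the project and bucket them by category."""
    buckets = {}
    for path in pathlib.Path(project_dir).rglob("*.md"):
        buckets.setdefault(categorise(path.read_text()), []).append(path.name)
    return buckets

print(categorise("Add a button to the settings screen"))  # ui
```

Anything that lands in the uncategorised bucket is a prompt for a human (or a smarter agent) to decide which lens it belongs to.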
The next steps are to look through the git commits and the codebase itself to see if there are aspects of the system that haven’t been documented yet. This step would also check whether invariants are met or not.
I’ve run this script against the documents in my wiki project directory, and I’ve hosted a version of it online for reference. The agent instruction boxes and search don’t work, but the principle is demonstrated. There’s a page for the underlying principles behind it, as well as the initial version of the batch process mentioned above. The repo is here, but there’s not a huge amount to it at the moment. The principles and the spec is the interesting part that can be built however you like!