Thoughts from working intensely with Claude on a project that got complicated
Working with Claude interactively is often frustrating. Waiting while it explores. Giving permissions. Testing and providing feedback.
It also has an unhelpful tendency to be overly action-oriented and to jump into work when the time isn’t right. I’ve seen it fix bugs that another agent created, complete more items from a task list than it had been authorised to, and commit too much to the repo.
In my experience, it often layers new code paths into the application when existing ones could have been reused. This creates annoying tech debt.
When I work with Claude interactively, there are also benefits. I can optimise the workflow and the rules governing development. I can stop it going down stupid rabbit holes or getting into loops where it changes variables back and forth. But this basically means I’m babysitting.
Another benefit is that it sparks ideas for future development or process improvements. The problem I ran into was that I often felt torn about these ideas: do them now, because they seem related to the current work, or somehow capture their intent so they can be worked on later at the appropriate time. As a corollary to the latter, I don’t want to capture too much now because it will derail the current work (and is potential waste in the lean manufacturing sense), but I also don’t want to leave myself with a vague todo that will be unrecognisable in future.
I also want to keep technical debt to a bare minimum. It slows down development and makes outcomes very difficult to inspect. At the same time, I don’t want to be inspecting the code and making constant refactoring suggestions. Even so, on a number of occasions I’ve had to come up with a better abstraction and request a refactoring.
What I really want is to operate at the level of intent. I want to imagine the best experience that users can have, then work towards it by adding functionality that fits and removing functionality that doesn’t. I want agents to flesh out plans, work out testing strategies, and fill in the details.
I want to describe rules for how the application works in plain language and have agents build things that don’t violate those rules.
I want records of all the ideas I’ve requested and built. When there are silly regressions, or a new Claude session doesn’t have that history in its context, I want to be able to give it a map before it goes to fix a bug.
I don’t want to write long detailed documents. I want to describe intent in a very short form and have the guardrails mapped out by an agent. My preference is hand-writing notes away from the computer.
I want to be the gatekeeper of the user experience, but that doesn’t mean I want to be a QA who has to file tons of bug reports.
It’s worth pointing out that I’ve written zero tests, zero code, and zero planning documents on this project. I rarely read the code that is produced.
What I’ve done
When everything became unmanageable, I stepped back and made a plan for a new way of working that would help me manage product development better.
- I treat the interactive Claude as an assistant project manager and avoid using it for development work. I created a CLAUDE.local.md in a different part of the repo from where Claude expects it, symlinked it into place, and put that symlink in .gitignore. Interactive Claude gets instructions to act as a product manager and to avoid writing code at all costs. Its primary mode is helping me understand what needs doing and enqueuing work for other agents.
- I prefer asynchronous work wherever possible. PM Claude writes task breakdowns into a directory and puts them into an orchestration database. An orchestrator looks for work that needs doing and spins up appropriate agents.
- I have worker agents do their work in a git worktree set up specifically for them. This might be a new feature branch if the task is self-contained, or an existing project branch that already contains work from other agents.
- I give implementers their task as a prompt. They write progress reports as they go and then declare that they are done. Some work auto-merges. Other work requires checks before it can proceed to final approval from me.
- I allow auto-merging when work does not affect the user experience. This is usually infrastructure or tooling work that I might previously have asked Claude to do interactively. If it doesn’t work, I can revert or fix it later. Reverts and fixes are much cheaper than they used to be, so I try to take myself out of the loop.
- When a feature needs human checking, it first goes through a set of gatekeeper agents. The orchestrator spins these up automatically, and the task itself declares which gatekeepers should review it. For product features, the main one is a QA agent.
- I don’t allow the QA agent to look at the code. I ask it to read the acceptance tests and imagine what state needs to be created in the application and what that will look like. It serialises the starting state, opens a browser via Playwright MCP with that state loaded, and attempts the operations described. It reports whether the UI behaves as expected and whether the operations are discoverable without explanation. This catches the most egregious problems before I ever see the PR.
- I created a drafts directory. I use a slash command to capture ideas, which are categorised, numbered, and stored alongside my original prompt. PM Claude fleshes these out and adds clarifying questions.
- I created tasks for reviewing drafts and turning them into actionable work. Processing a draft checks whether the work has already been done, proposes tasks if not, splits remaining work into new drafts if partially complete, and captures any rules about how the application should behave.
- I run an automatic check for drafts older than a few days. An agent processes them and drops a note into a human inbox describing what action was taken. This keeps work moving without me needing to remember it.
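The symlink setup for PM Claude’s instructions in the list above is easy to script. This is a minimal sketch, assuming the real file lives at a hypothetical pm/CLAUDE.local.md and that Claude looks for CLAUDE.local.md at the repo root; the function name link_local_instructions is also illustrative.

```python
import os
from pathlib import Path


def link_local_instructions(repo: Path, source_rel: str = "pm/CLAUDE.local.md") -> Path:
    """Symlink PM-only instructions into the repo root and git-ignore the link.

    `source_rel` is a hypothetical location; point it at wherever the file really lives.
    """
    source = repo / source_rel
    link = repo / "CLAUDE.local.md"
    if not link.is_symlink():
        # Relative target, so the link keeps working if the repo directory is moved.
        link.symlink_to(os.path.relpath(source, link.parent))

    gitignore = repo / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    entry = "/CLAUDE.local.md"  # leading slash: ignore only the root symlink, not the source file
    if entry not in text.splitlines():
        if text and not text.endswith("\n"):
            text += "\n"
        gitignore.write_text(text + entry + "\n")
    return link
```

Running it twice is harmless: the symlink and the ignore entry are each added only once.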
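The orchestration database that PM Claude enqueues into, described in the list above, could be as small as one SQLite table. This is a sketch only: the schema, the 'queued'/'running' statuses and the enqueue and claim_next helpers are hypothetical, not the actual orchestrator. Because agents don’t need continuity of identity, a worker only has to claim whatever is next.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL,                    -- task breakdown file written by PM Claude
    role TEXT NOT NULL,                    -- which kind of agent to spin up
    status TEXT NOT NULL DEFAULT 'queued', -- queued -> running -> done
    claimed_by TEXT
)
"""


def enqueue(conn: sqlite3.Connection, path: str, role: str) -> int:
    with conn:  # commits the insert
        cur = conn.execute("INSERT INTO tasks (path, role) VALUES (?, ?)", (path, role))
    return cur.lastrowid


def claim_next(conn: sqlite3.Connection, worker: str) -> Optional[tuple]:
    """Claim the oldest queued task for `worker`; None if nothing was claimed."""
    with conn:
        row = conn.execute(
            "SELECT id, path, role FROM tasks WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status check means only one worker wins if two race for the same row.
        cur = conn.execute(
            "UPDATE tasks SET status = 'running', claimed_by = ? WHERE id = ? AND status = 'queued'",
            (worker, row[0]),
        )
    return row if cur.rowcount == 1 else None
```

An orchestrator loop would call claim_next and use the role column to decide which agent to start for the task file at path.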
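Giving each worker its own worktree, as described in the list above, comes down to one `git worktree add` call. This sketch only builds the command; the worktree_command helper and the one-directory-per-task layout are hypothetical.

```python
from pathlib import Path
from typing import Optional


def worktree_command(task_id: str, root: Path, branch: str, base: Optional[str] = None) -> list[str]:
    """Build the `git worktree add` call for a hypothetical per-task worktree layout."""
    path = str(root / task_id)
    if base is not None:
        # Self-contained task: create a new feature branch from `base`.
        return ["git", "worktree", "add", "-b", branch, path, base]
    # Project task: check out an existing branch that holds other agents' work.
    return ["git", "worktree", "add", path, branch]
```

Pass the result to `subprocess.run(cmd, check=True)` to execute it. Git refuses to check out a branch that is already checked out in another worktree unless forced, so in this sketch agents sharing a project branch would have to take turns.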
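The choice between auto-merging and gatekeeper review, described in the list above, is essentially a routing rule. This is a minimal sketch, assuming each task carries a hypothetical affects_ux flag and a list of gatekeeper names. Falling back to the QA agent when a task declares none is an assumption of the sketch, not something the process above specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    id: str
    affects_ux: bool  # hypothetical flag: does the change touch the user experience?
    gatekeepers: list[str] = field(default_factory=list)  # declared by the task itself


def next_steps(task: Task) -> list[str]:
    """Steps the orchestrator takes once an implementer declares the task done."""
    if not task.affects_ux:
        # Infrastructure or tooling: merge now, revert or fix later if needed.
        return ["auto_merge"]
    # Assumption: fall back to the QA agent if the task declares no gatekeepers.
    gates = task.gatekeepers or ["qa"]
    return [f"review:{g}" for g in gates] + ["human_approval"]
```

Keeping the gatekeeper list on the task, rather than in the orchestrator, matches the idea that the task itself declares who reviews it.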
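The drafts workflow in the list above needs two mechanical pieces: numbering new drafts and finding drafts that have sat too long. This sketch assumes a hypothetical NNN-category.md naming scheme, a three-day threshold, and file modification time as the age signal. The capture_draft and stale_drafts names are illustrative.

```python
import re
import time
from pathlib import Path
from typing import Optional

DRAFT_NAME = re.compile(r"(\d+)-")  # hypothetical scheme: 001-feature.md, 002-bug.md, ...


def capture_draft(drafts: Path, category: str, prompt: str) -> Path:
    """Write the next numbered draft, keeping the original prompt verbatim."""
    drafts.mkdir(parents=True, exist_ok=True)
    numbers = [int(m.group(1)) for p in drafts.glob("*.md") if (m := DRAFT_NAME.match(p.name))]
    path = drafts / f"{max(numbers, default=0) + 1:03d}-{category}.md"
    # PM Claude later fills in the clarifying questions section.
    path.write_text(f"# {category}\n\n## Original prompt\n\n{prompt}\n\n## Clarifying questions\n")
    return path


def stale_drafts(drafts: Path, max_age_days: float = 3, now: Optional[float] = None) -> list[Path]:
    """Drafts not modified within `max_age_days`, in draft-number order."""
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    return sorted(p for p in drafts.glob("*.md") if p.stat().st_mtime < cutoff)
```

A scheduled job could hand each path from stale_drafts to a processing agent, which then writes its note to the human inbox.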
Future improvements
This system is working reasonably well, but there are still areas I haven’t built out properly:
- Improving architectural quality
- Message passing and actioning
- Prioritising work
Proactively improving the code (and tests)
I realised I was missing a role that looks at the trade-off between short-term speed and long-term speed.
Implementer agents with acceptance tests tend to find the shortest path to completion. I need adversarial roles that introduce checks and balances.
For my project’s velocity to stay high, I need agents to follow consistent conventions and avoid making the codebase worse.
I’m planning to address this in two ways:
- I will introduce dedicated gatekeeper agents that call out violations such as creating new code paths when existing ones could be reused.
- I will periodically spin up advisory agents to review the codebase and testing. They will draft proposals for improvement, but only if there aren’t already outstanding proposals from them.
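The rule that advisors only draft proposals when none of theirs are outstanding could be a guard the orchestrator checks before spawning one. This is a sketch with a hypothetical Proposal record, role names and status values.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    author_role: str  # e.g. "architecture-advisor" (hypothetical role name)
    status: str       # "open", "accepted" or "rejected"


def should_spawn_advisor(role: str, proposals: list[Proposal]) -> bool:
    """Spawn an advisory review only if this role has nothing still awaiting a decision."""
    return not any(p.author_role == role and p.status == "open" for p in proposals)
```

The effect is that each advisor has at most one set of proposals waiting for review at a time.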
Message passing and actioning
With all this going on, I needed a way to keep both the agents and myself in the loop asynchronously.
I introduced a human inbox where agents can send messages. Each message must include a clear list of possible actions I can take in response, such as reversing a decision, enqueuing a project, or accepting a proposal.
The agent that sent the message doesn’t have to be the one that handles the follow-up. It just needs to include instructions that any agent can follow.
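One way to enforce these inbox rules is to validate messages when they are created. This is a sketch, assuming hypothetical Action and InboxMessage types. Each action carries self-contained instructions, so handoff_prompt can give the chosen follow-up to any agent, not just the sender.

```python
from dataclasses import dataclass


@dataclass
class Action:
    label: str         # what I see, e.g. "Reverse the decision"
    instructions: str  # self-contained steps any agent can follow


@dataclass
class InboxMessage:
    sender: str
    summary: str
    actions: list[Action]

    def __post_init__(self) -> None:
        # Reject messages that don't give me concrete ways to respond.
        if not self.actions:
            raise ValueError("inbox messages must list at least one action")
        if any(not a.instructions.strip() for a in self.actions):
            raise ValueError("every action needs instructions another agent can follow")


def handoff_prompt(msg: InboxMessage, choice: int) -> str:
    """Prompt for whichever agent picks up the chosen action; the sender isn't needed."""
    action = msg.actions[choice]
    return f"Context: {msg.summary}\nChosen action: {action.label}\n\n{action.instructions}"
```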
I also introduced agent inboxes, although I’m still figuring out exactly how they should be used. There’s no need for continuity of identity between agents, so any agent can pick up work.
Prioritising work
Where I want to get to is being able to record very generalised product intent and have agents work towards making it happen.
I want a roadmap that orders all proposed projects: bug fixes, features, and architectural improvements.
I want a current priorities document that informs how automated PM agents make decisions. Sometimes I might want to focus on tech debt. Sometimes I might want to push towards a minimal viable product.
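A current priorities document could feed roadmap ordering as a set of weights per kind of work. This is a sketch only, assuming a hypothetical 1-5 value estimate per project and a weights dictionary read from that document; ties break alphabetically.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    kind: str   # "bug", "feature" or "architecture"
    value: int  # rough benefit estimate on a hypothetical 1-5 scale


def order_roadmap(projects: list[Project], priorities: dict[str, float]) -> list[Project]:
    """Highest weighted value first; kinds missing from `priorities` get weight 1.0."""
    return sorted(projects, key=lambda p: (-p.value * priorities.get(p.kind, 1.0), p.name))
```

Switching focus between tech debt and a minimal viable product then means editing the weights rather than re-ranking projects by hand.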
I also want to create some form of product culture documentation that describes how I want the system to evolve: keeping the feature set minimal, prioritising safety in architecture, favouring progressive disclosure for advanced users, and being cautious about adding new functionality.
Process wrangling
Getting this process working has been difficult, partly because of early decisions like using git submodules.
I’ve had to fix a lot of this through interactive Claude. Next, I’m building an agentic meta-process improver that activates when problems appear in the queue.
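The trigger for a meta-process improver could start with a scan of the queue for failed or stuck tasks. This is a sketch, assuming task records are dictionaries with hypothetical id, status and started_at fields, and a one-hour runtime limit.

```python
import time
from typing import Optional


def queue_problems(tasks: list[dict], max_runtime_s: float = 3600, now: Optional[float] = None) -> list[tuple]:
    """Return (task id, reason) pairs that should wake the meta-process improver."""
    now = time.time() if now is None else now
    problems = []
    for task in tasks:
        if task["status"] == "failed":
            problems.append((task["id"], "failed"))
        elif task["status"] == "running" and now - task["started_at"] > max_runtime_s:
            problems.append((task["id"], "stuck"))
    return problems
```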
The orchestrator has a TUI that visualises what’s happening, and it’s extremely satisfying to watch a fleet of agents working autonomously.