Improve Codebase Architecture
Surface architectural friction and propose deepening opportunities — refactors that turn shallow modules into deep ones. The aim is testability and AI-navigability.
Glossary
Use these terms exactly in every suggestion. Consistent language is the point — don't drift into "component," "service," "API," or "boundary." Full definitions in LANGUAGE.md.
- Module — anything with an interface and an implementation (function, class, package, slice).
- Interface — everything a caller must know to use the module: types, invariants, error modes, ordering, config. Not just the type signature.
- Implementation — the code inside.
- Depth — leverage at the interface: a lot of behaviour behind a small interface. Deep = high leverage. Shallow = interface nearly as complex as the implementation.
- Seam — where an interface lives; a place behaviour can be altered without editing in place. (Use this, not "boundary.")
- Adapter — a concrete thing satisfying an interface at a seam.
- Leverage — what callers get from depth.
- Locality — what maintainers get from depth: change, bugs, knowledge concentrated in one place.
Key principles (see LANGUAGE.md for the full list):
- Deletion test: imagine deleting the module. If complexity vanishes, it was a pass-through. If complexity reappears across N callers, it was earning its keep.
- The interface is the test surface.
- One adapter = hypothetical seam. Two adapters = real seam.
If the project has its own domain vocabulary (in code, types, or docs), use it for the domain — LANGUAGE.md vocabulary is only for the architecture layer.
Process
1. Explore
Use the Agent tool with subagent_type=Explore to walk the codebase. Don't follow rigid heuristics — explore organically and note where you experience friction:
- Where does understanding one concept require bouncing between many small modules?
- Where are modules shallow — interface nearly as complex as the implementation?
- Where have pure functions been extracted just for testability, but the real bugs hide in how they're called (no locality)?
- Where do tightly-coupled modules leak across their seams?
- Which parts of the codebase are untested, or hard to test through their current interface?
Apply the deletion test to anything you suspect is shallow: would deleting it concentrate complexity, or just move it? A "yes, concentrates" is the signal you want.
2. Present candidates as an HTML report
Write a self-contained HTML file to the OS temp directory so nothing lands in the repo. Resolve the temp dir from $TMPDIR, falling back to /tmp (or %TEMP% on Windows), and write to <tmpdir>/architecture-review-<timestamp>.html so each run gets a fresh file. Open it for the user — xdg-open <path> on Linux, open <path> on macOS, start <path> on Windows — and tell them the absolute path.
The report uses Tailwind via CDN for layout and styling, and Mermaid via CDN for diagrams where a graph/flow/sequence reliably communicates the structure. Mix Mermaid with hand-crafted CSS/SVG visuals — use Mermaid when relationships are graph-shaped (call graphs, dependencies, sequences), and hand-built divs/SVG when you want something more editorial (mass diagrams, cross-sections, collapse animations). Each candidate gets a before/after visualisation. Be visual.
For each candidate, the same template as before, but rendered as a card:
- Files — which files/modules are involved
- Problem — why the current architecture is causing friction
- Solution — plain English description of what would change
- Benefits — explained in terms of locality and leverage, and how tests would improve
- Before / After diagram — side-by-side, custom-drawn, illustrating the shallowness and the deepening
- Recommendation strength — one of
Strong,Worth exploring,Speculative, rendered as a badge
End the report with a Top recommendation section: which candidate you'd tackle first and why.
Use the project's own domain vocabulary, and LANGUAGE.md vocabulary for the architecture. If the project's code or docs name an "Order," talk about "the Order intake module" — not "the FooBarHandler," and not "the Order service."
See HTML-REPORT.md for the full HTML scaffold, diagram patterns, and styling guidance.
Do NOT propose interfaces yet. After the file is written, ask the user: "Which of these would you like to explore?"
3. Refine the chosen candidate
Once the user picks a candidate, walk the design tree with them — constraints, dependencies, the shape of the deepened module, what sits behind the seam, what tests survive. Load interview:plan for a structured interview.
If they want to explore alternative interfaces for the deepened module, see INTERFACE-DESIGN.md.