Debug Skill
4-phase root-cause investigation. Iron Law: no fix without root cause.
When to use
- Any test failure (unit, integration, E2E)
- Build break or compile error
- Runtime exception in production code
- Performance regression
- Integration issue between modules
- "It worked yesterday" bugs
When NOT to use
- Feature implementation (use
/planor wave-executor) - Pure refactoring with no broken behavior
- Style/lint fixes with no behavioral impact
Phase 0: Bootstrap Gate
Read skills/_shared/bootstrap-gate.md and execute the gate check. If the gate is CLOSED, invoke skills/bootstrap/SKILL.md and wait for completion before proceeding. If the gate is OPEN, continue to Phase 1.
The Iron Law
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.
No matter how obvious the fix looks, complete Phase 1 (Root Cause Investigation) and produce the artifact BEFORE writing any fix code. This is non-negotiable.
The "2-hour obviously a typo" case has been the root cause of every multi-day debugging session in every project's history. Treat the obvious fix as a hypothesis to verify, not a conclusion to act on.
Phase 1: Root Cause Investigation
1.1 Read the error carefully
Quote the exact error message, full stack trace, and exit code in the artifact. Do not paraphrase. Paraphrasing introduces your assumptions before investigation begins.
1.2 Reproduce reliably
Document the exact command and environment that triggers it. If reproduction is flaky, that itself is a data point — write down the failure rate.
# minimum reproduction attempt
<exact command>
# env: Node version, OS, env vars relevant to the failure
1.3 Check recent changes
git log --oneline -20
git diff HEAD~5..HEAD -- <affected paths>
Identify the smallest set of commits that could have introduced this. Do not assume the most recent commit is the culprit without checking.
1.4 Instrument every component boundary
Add temporary logging at each input/output point along the failing path's modules. Capture actual values, not assumptions about what they should be. Run with instrumentation in place and record results.
1.5 Trace data flow backward
Start from the failure point and walk upstream. At each step, verify what you think is true with an actual Read or Bash call. "I assume X is Y" is not data.
Phase 1 artifact
Write to .orchestrator/debug/<session-id>-<sequence>.md (sequence = 1, 2, 3 within session).
Artifact MUST be written before proceeding to Phase 2. See "Artifact contract" section below.
Phase 2: Pattern Identification
After the Phase 1 artifact is written:
- Is this a recurrence of a previously fixed bug? Search
git log --all --grep="<symptom keyword>". - Is there a class of bug this exemplifies (race condition, off-by-one, null check, encoding, async ordering)?
- Is there a missing test that would have caught this? Identify what the test would look like.
- Search the codebase for similar patterns:
grep -rn "<suspect pattern>" --include="*.ts" --include="*.mjs" git grep "<symbol>"
Record findings in the artifact's Phase 2 section.
Optional operator-side companion: /tmux-layout --layout debug renders a 4-pane layout (scratch shell + npm test --watch + tail -F .orchestrator/debug/*.md + watch -n 2 'git diff --stat') so you can observe hypothesis tests, debug artifacts, and diff progression peripherally. Pure observability — the /debug skill works identically with or without the layout (per ADR-0007 + GitLab #562).
Phase 3: Impact Analysis
- What else does the root-caused code touch?
grep -rn "<function or symbol name>" - What other code paths could share the same root cause?
- Will the fix break anything else? Identify callers, dependents, transitive effects.
- List the affected files explicitly in the artifact.
Phase 4: Solution
Only after Phases 1-3 are complete and the artifact is written:
- Propose the minimal fix — the smallest code change that addresses the confirmed root cause
- Identify test cases that would have caught this; write them alongside or before the fix if appropriate
- Document the fix decision in the artifact under a "Resolution" section
- Verify the fix: run the test command from Session Config (
test-command) and confirm the failure no longer reproduces
Do not gold-plate the fix. Scope creep during a bugfix introduces new bugs.
Artifact contract
.orchestrator/debug/<session-id>-<sequence>.md:
# Debug session: <topic>
Created: <ISO timestamp>
Session: <session-id>
## Phase 1 — Root Cause
### Error
[exact error message + stack trace + exit code — verbatim]
### Reproduction
[exact command]
[environment: Node version, platform, relevant env vars]
[frequency: always | intermittent (~N/10)]
### Suspect commits
- <sha> <subject> — reason
### Instrumentation data
[per-boundary observations with actual values]
### Hypothesized root cause
[ONE sentence] · Confidence: [low|medium|high]
## Phase 2 — Pattern
[recurrence? class? missing test? similar code paths?]
## Phase 3 — Impact
[affected files, callers, dependents]
## Phase 4 — Solution
[proposed fix + test cases identified]
## Resolution
[commit SHA, files changed, tests added — filled in after fix lands]
Rules for the artifact:
- "Hypothesized root cause" is always ONE sentence. Multiple hypotheses signal you have none — pick the most evidence-backed one and lower confidence if uncertain.
- Do not write fix code in Phase 1-3 sections. Those sections are investigation only.
- The artifact is the deliverable — not an optional note.
Integration with wave-executor
When wave-executor classifies a task as bugfix, it routes through this skill. code-implementer agents handling bugfix-classified tasks must reference an existing Phase-1 artifact path in their task description before editing any production code. Wire-up happens in wave-executor's routing layer (Wave 3 P1 cross-reference).
Anti-Patterns
- Skipping Phase 1 because "it's obviously a typo" — see Iron Law. The obvious hypothesis goes in Phase 1 as a hypothesis to verify, not a fix to apply.
- Multiple hypotheses in "Hypothesized root cause" — pick ONE. Lower confidence rating if uncertain. "Either X or Y" means you don't know yet.
- Writing fix code in Phase 1 — fixes belong in Phase 4. Investigation and implementation are separate acts.
- Skipping the artifact write — the artifact IS the work product. No artifact = no investigation.
- Fixing the symptom — the error location is the symptom; the root cause may be several frames upstream. Trace backward.
- Assuming the most recent commit is the cause — check
git log -20and the diff before assuming recency.
See Also
skills/discovery/SKILL.md— discovery surfaces problems; debug investigates them.claude/rules/verification-before-completion.md— once fixed, verify fresh (issue #38)agents/code-implementer.md— references this skill for bugfix-classified tasks