Debugging
If .khuym/onboarding.json is missing or stale for the current repo, stop and invoke khuym:using-khuym before continuing.
Resolve blockers and failures systematically. Do not guess. Triage first, reproduce second, diagnose third, fix fourth.
When To Use This Skill
- a build fails
- a test fails
- a runtime crash or exception occurs
- an integration breaks
- a worker is blocked by dependencies or reservations
- the reviewing or executing skill hands off a failure that needs root-cause analysis
Step 1: Triage
Classify the issue before you investigate it.
| Type | Signals |
|---|---|
| Build failure | compiler error, type error, missing module, bundler failure |
| Test failure | assertion mismatch, timeout, snapshot diff, flake |
| Runtime error | crash, uncaught exception, undefined behavior |
| Integration failure | HTTP 4xx/5xx, auth failure, env mismatch, schema mismatch |
| Worker blocker | circular bead dependency, conflicting reservations, no safe execution path |
Output a one-line classification:
```
[TYPE] in [component]: [symptom]
```
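The signals in the table can be turned into a rough first-pass classifier over captured output. This is a minimal sketch, not part of khuym. `classify_failure` and `triage_line` are hypothetical names, and the regex patterns are illustrative assumptions where the first match wins. The component and symptom still come from you.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper: guesses the failure type from captured output
# using the signals in the triage table. Patterns are illustrative; first match wins.
classify_failure() {
  local log_file=$1
  if grep -qiE 'circular|cycle|conflicting reservation' "$log_file"; then
    echo "Worker blocker"
  elif grep -qiE 'HTTP [45][0-9]{2}|status code [45][0-9]{2}|unauthorized|forbidden|ECONNREFUSED' "$log_file"; then
    echo "Integration failure"
  elif grep -qiE 'AssertionError|expected .* (to|but)|snapshot|timed? ?out' "$log_file"; then
    echo "Test failure"
  elif grep -qiE 'error TS[0-9]+|cannot find module|module not found|failed to compile' "$log_file"; then
    echo "Build failure"
  elif grep -qiE 'traceback|uncaught|segmentation fault|panic' "$log_file"; then
    echo "Runtime error"
  else
    echo "Unclassified"
  fi
}

# Formats the one-line classification: [TYPE] in [component]: [symptom]
triage_line() {
  local log_file=$1 component=$2 symptom=$3
  printf '%s in %s: %s\n' "$(classify_failure "$log_file")" "$component" "$symptom"
}
```

For example, `triage_line /tmp/debug-output.txt web/build "missing module ./config"` prints a line such as `Build failure in web/build: missing module ./config`. Treat the guessed type as a starting point. Always confirm it against the actual output.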
Step 2: Reproduce
Check history/learnings/critical-patterns.md first. If a matching pattern already exists, start from that fix path.
If not, rerun the exact failing command and capture the exact output.
Examples:
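```bash
# Keep the failing command's exit status when piping through tee (bash).
set -o pipefail
npm run build 2>&1 | tee /tmp/debug-output.txt
pytest tests/specific_test.py -v 2>&1 | tee /tmp/debug-output.txt
```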
Run it twice. If it is intermittent, treat it as a flaky failure rather than a deterministic one.
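The two-run rule can be wrapped in a small script. This is a sketch under several assumptions. `reproduce_twice` is a hypothetical name, and the output file names are arbitrary. The script also judges determinism by exit status only, so diff the two saved outputs if the error messages might differ between runs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: runs the exact failing command twice, saves each run's
# combined output, and reports whether the failure is deterministic or flaky.
reproduce_twice() {
  local cmd=$1 out_dir=${2:-/tmp}
  local status1=0 status2=0
  bash -c "$cmd" >"$out_dir/debug-run1.txt" 2>&1 || status1=$?
  bash -c "$cmd" >"$out_dir/debug-run2.txt" 2>&1 || status2=$?

  if [ "$status1" -eq 0 ] && [ "$status2" -eq 0 ]; then
    echo "not-reproduced"   # passed both times: check env, inputs, or the command itself
  elif [ "$status1" -ne 0 ] && [ "$status2" -ne 0 ]; then
    echo "deterministic"    # failed both times: proceed to diagnosis
  else
    echo "flaky"            # one pass, one fail: treat as intermittent
  fi
}
```

Usage: `reproduce_twice 'pytest tests/specific_test.py -v'`. The output of each run is saved to /tmp/debug-run1.txt and /tmp/debug-run2.txt.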
Step 3: Diagnose
Work through these checks in order.
3a. Read the relevant files
Use the failing output to identify the smallest relevant slice. Do not read the entire repo.
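When the failing output contains path:line locations, you can extract and rank them to choose the smallest slice to read. This is a minimal sketch. The function names are hypothetical, and the list of file extensions is an assumption. Python tracebacks (`File "x.py", line N`) use a different format that this pattern does not match.

```bash
#!/usr/bin/env bash
# Hypothetical helper: lists file:line locations found in captured output,
# most frequently mentioned first. Adjust the extension list to your stack.
failing_locations() {
  local log_file=$1
  grep -oE '[A-Za-z0-9_./-]+\.(ts|tsx|js|mjs|py|go|rs|java):[0-9]+' "$log_file" \
    | sort | uniq -c | sort -rn | awk '{print $2}'
}

# Hypothetical helper: prints about five lines of context on each side of "file:line".
show_slice() {
  local file=${1%%:*} line=${1##*:} start
  start=$(( line > 5 ? line - 5 : 1 ))
  sed -n "${start},$(( line + 5 ))p" "$file"
}
```

For example, run `failing_locations /tmp/debug-output.txt | head -n 3` and then call `show_slice` on the top entry.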
3b. Check recent changes
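```bash
git log --oneline -20                 # the last 20 commits, one line each
git blame -L <line>,<line> -- <file>  # the commit that last changed the failing line
git diff HEAD~3 -- <file>             # changes to this file since HEAD~3, including uncommitted edits
```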
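To go straight from a failing line to the commit that last touched it, use `git blame --porcelain`, which prints the commit SHA at the start of its first output line. These helpers are hypothetical wrappers around standard git commands. An all-zero SHA means the line has uncommitted changes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints "<short-sha> <subject>" for the commit that last
# changed one line, or "uncommitted" if the working-tree line differs from HEAD.
last_change_to_line() {
  local file=$1 line=$2 sha
  # The first porcelain line is "<sha> <orig-line> <final-line> <count>".
  sha=$(git blame --porcelain -L "${line},${line}" -- "$file" | head -n 1 | cut -d' ' -f1)
  if [[ $sha =~ ^0+$ ]]; then
    echo "uncommitted"
    return 0
  fi
  git log -1 --format='%h %s' "$sha"
}

# Hypothetical helper: the most recent commits that touched one file.
recent_file_commits() {
  local file=$1 count=${2:-10}
  git log --oneline -n "$count" -- "$file"
}
```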
3c. Check bead intent
```bash
br show <bead-id>
```
Ask whether the code drifted from the bead, or the bead itself is wrong.
3d. Check locked decisions
Read the relevant CONTEXT.md entries and confirm the implementation did not violate a locked decision.
3e. Check local reservation state
```bash
node .codex/khuym_reservations.mjs list --active-only --json
```
Look for:
- overlapping reservations
- leaked reservations from a finished worker
- a worker that still holds files after a blocker or timeout
Also inspect .khuym/state.json for the active worker list.
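Overlap checks are easy to script once reservations are reduced to one "<agent> <path>" pair per line. This section does not describe the JSON shape that khuym_reservations.mjs emits, so the conversion step (for example with jq) is left as an assumption. The sketch also catches only identical paths, not glob or directory overlaps.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "<agent> <path>" lines on stdin and prints each path
# held by more than one agent, followed by the comma-separated holders.
# Only identical paths are compared; glob or directory overlaps are not detected.
overlapping_reservations() {
  awk 'NF >= 2 {
         path = $2; agent = $1
         if (!((path, agent) in seen)) {   # count each agent once per path
           seen[path, agent] = 1
           count[path]++
           holders[path] = (count[path] == 1) ? agent : (holders[path] "," agent)
         }
       }
       END { for (p in count) if (count[p] > 1) print p, holders[p] }' | sort
}
```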
3f. Check recent worker results in the parent thread
If this debugging pass was spawned from swarming, use the parent-thread context and the saved worker status in .khuym/state.json as the coordination surface. Do not assume an external inbox exists.
3g. Write the root cause sentence
Do not proceed until you can write:
Root cause:
<file>:<line>—<what is wrong and why>
If you cannot write that sentence, you do not have the root cause yet.
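For a mechanical gate before moving on, a shell check can confirm that the root-cause line has the required shape: a file, a line number, and an explanation. This is a minimal sketch. `has_root_cause` and its four-word minimum are hypothetical choices, and passing the check says nothing about whether the diagnosis is right.

```bash
#!/usr/bin/env bash
# Hypothetical gate: succeeds only if the root-cause line starts with <file>:<line>
# and is followed by an explanation of at least min_words words.
# It checks the sentence's shape, not whether the diagnosis is correct.
has_root_cause() {
  local sentence=$1 min_words=${2:-4}
  local re='^([^[:space:]:]+):([0-9]+)(.*)$'
  [[ $sentence =~ $re ]] || return 1
  local -a words
  read -ra words <<<"${BASH_REMATCH[3]}"
  (( ${#words[@]} >= min_words ))
}
```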
Step 4: Fix And Verify
Small fix
If the fix is obvious and low risk:
- implement directly
- run the exact failing command again
- run the next-wider verification that protects against regressions
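The two reruns for a small fix can be chained so that the wider check runs only after the exact command passes. This sketch uses hypothetical names. Choosing what counts as the "next-wider" verification (a test file, a package, or the full suite) is up to you.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reruns the exact failing command, then a wider regression
# check, stopping at the first failure. Exit 1 = original still fails, 2 = regression.
verify_fix() {
  local exact_cmd=$1 wider_cmd=$2
  if ! bash -c "$exact_cmd"; then
    echo "FAIL: original command still fails; return to diagnosis" >&2
    return 1
  fi
  if ! bash -c "$wider_cmd"; then
    echo "FAIL: wider verification regressed" >&2
    return 2
  fi
  echo "PASS: original and wider verification both succeed"
}
```

Usage: `verify_fix 'pytest tests/specific_test.py -v' 'pytest tests/ -q'`.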
Larger fix
If the fix is cross-cutting or changes the intended behavior:
```bash
br create "Fix: <root cause summary>" -t task --blocks <original-bead-id>
```
Then implement against that new bead.
Decision violation
If a locked decision was violated:
- do not silently "fix" it by changing behavior on your own
- return or report a blocker summary to the parent thread or user
- propose the conservative fix that honors CONTEXT.md
Reservation-related fixes
If the failure is caused by leaked or stale reservations:
- inspect the holder
- release the reservation only if the holder is clearly done or abandoned
- note the release explicitly in your final report
Use:
```bash
node .codex/khuym_reservations.mjs release --agent "<codex-name>" --bead "<bead-id>" --json
node .codex/khuym_reservations.mjs sweep --json
```
Verify
The original failing command must pass cleanly. If it still fails, return to diagnosis.
Step 5: Report
If you are inside a swarm, return the debugging result to the parent thread using a clear status heading:
- [DONE] if the fix is complete and verified
- [BLOCKED] if the problem needs a decision, another worker to release its reservations, or a broader redesign
At minimum include:
- root cause sentence
- fix summary
- verification result
- reservation impact, if any
- next action needed
If you are working directly for the user, give the same information in the final response.
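A small formatter keeps the report fields consistent between swarm runs and direct-user runs. This is a hypothetical helper. Its field labels mirror the minimum list above, and "none" is an assumed default for runs that did not affect reservations.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the Step 5 report under a [DONE] or [BLOCKED] heading.
debug_report() {
  local status=$1 root_cause=$2 fix=$3 verification=$4 reservations=$5 next=$6
  case $status in
    DONE|BLOCKED) ;;
    *) echo "status must be DONE or BLOCKED" >&2; return 1 ;;
  esac
  printf '[%s]\n' "$status"
  printf -- '- Root cause: %s\n' "$root_cause"
  printf -- '- Fix: %s\n' "$fix"
  printf -- '- Verification: %s\n' "$verification"
  printf -- '- Reservation impact: %s\n' "${reservations:-none}"   # default when untouched
  printf -- '- Next action: %s\n' "$next"
}
```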
Step 6: Learn
If this exposed a new reusable failure pattern, write a debug note for compounding so the lesson can be promoted later.
If the failure matched an existing pattern from critical-patterns.md, verify whether that guidance still works. If not, flag it for compounding.
Blocker-Specific Protocol
When a worker is stuck rather than code-broken:
- inspect cycles and dependencies:
  ```bash
  bv --robot-insights 2>/dev/null | jq '.Cycles'
  ```
- inspect local reservations:
  ```bash
  node .codex/khuym_reservations.mjs list --active-only --json
  ```
- determine whether the worker is:
  - waiting on another bead
  - blocked by an overlapping reservation
  - blocked by a real product decision
If it is only waiting, return [BLOCKED] with the dependency or reservation holder and stop.
If it is a real dead-end, return [BLOCKED] with concrete options for the parent or user.
Do not spin.
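bv --robot-insights is the intended cycle check. If all you have is a list of dependency pairs, the standard tsort(1) utility can serve as a fallback, because it writes a loop diagnostic to stderr when its input is not acyclic. This swaps in a different technique. The "<blocker> <blocked>" input format and the `has_cycle` name are assumptions.

```bash
#!/usr/bin/env bash
# Fallback sketch: detect a dependency cycle with tsort(1).
# Input: one "<blocker-bead> <blocked-bead>" pair per line on stdin (hypothetical format).
# Any tsort diagnostic (a loop, or malformed input such as an odd token count)
# is treated as a problem and echoed to stderr.
has_cycle() {
  local diagnostics
  # Ordering goes to /dev/null; only stderr is captured.
  diagnostics=$(tsort 2>&1 >/dev/null) || true
  if [ -n "$diagnostics" ]; then
    printf '%s\n' "$diagnostics" >&2
    return 0
  fi
  return 1
}
```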
Red Flags
- fixing symptoms instead of root cause
- skipping reproduction
- ignoring critical-patterns.md
- patching around a locked decision violation
- reporting success without rerunning the original failing command
- forgetting to account for local reservations during swarm debugging
Quick Reference
| Situation | First action |
|---|---|
| Build fails | rerun the exact build command |
| Test fails | rerun the exact test and capture assertion output |
| Runtime crash | read the stack trace and find the topmost frame in your own code |
| Integration error | check env/config, then the real response body |
| Worker stuck | inspect bv insights and local reservations |
| Recurring issue | check history/learnings/critical-patterns.md first |
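You can check critical-patterns.md with a literal, case-insensitive search for a distinctive fragment of the error. This is a minimal sketch. `find_known_pattern` is a hypothetical name, and it exits 0 on a match, 1 on no match, and 2 when the patterns file is missing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: searches the critical-patterns file for an error fragment
# (a literal string, not a regex) before debugging from scratch.
find_known_pattern() {
  local signature=$1
  local patterns_file=${2:-history/learnings/critical-patterns.md}
  if [ ! -f "$patterns_file" ]; then
    echo "no patterns file at $patterns_file" >&2
    return 2
  fi
  # -F literal match, -i case-insensitive, -n line numbers for follow-up reading
  grep -n -i -F -- "$signature" "$patterns_file"
}
```

Usage: `find_known_pattern 'ECONNRESET'` prints every matching line with its line number.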