Context Loading Protocol
Token-budget reference data (CLAUDE.md baseline, full-load ceiling, per-agent and per-skill costs) lives in the Baseline Budget section of CLAUDE.md. This skill defines the runtime procedure. The table is deliberately not duplicated here, because a copy would go stale.
Constraints
- Never load all agents upfront; load only the primary agent for each phase.
- Keep total context below 40% of the model's window at all times.
- Load agents on demand when their phase begins, not speculatively.
- Use tool-based file reads (Read); do not paste file contents into the prompt.
Loading Decision Procedure
Step 1: Classify the task
| Profile | Description | Example |
|---|---|---|
| Simple/Single | One agent, no skills | "Fix this typo", "Write a unit test" |
| Standard/Single | One agent + 1–2 skills | "Implement this feature using hexagonal architecture" |
| Multi-Agent | 2–3 agents coordinating | "Design and implement a new API endpoint" |
| Complex/Multi | 3+ agents + skills | "Build a new bounded context with full test coverage" |
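The table leaves a few boundaries open: a 3-agent task fits both Multi-Agent and Complex/Multi, and a 1-agent task with 3+ skills has no row. The sketch below shows one way to resolve these cases, assuming 3 agents plus any skills counts as Complex/Multi, more than 3 agents is always Complex/Multi, and one agent with any number of skills stays Standard/Single. These tie-breaks are illustrative assumptions, not rules from this skill.

```python
from enum import Enum


class Profile(Enum):
    SIMPLE_SINGLE = "Simple/Single"
    STANDARD_SINGLE = "Standard/Single"
    MULTI_AGENT = "Multi-Agent"
    COMPLEX_MULTI = "Complex/Multi"


def classify_task(num_agents: int, num_skills: int) -> Profile:
    """Map estimated agent/skill counts to a loading profile.

    Tie-break rules for overlapping rows are assumptions, not part of the skill.
    """
    if num_agents < 1:
        raise ValueError("every task needs at least one (primary) agent")
    if num_agents == 1:
        # Assumption: one agent with 3+ skills is still Standard/Single.
        return Profile.SIMPLE_SINGLE if num_skills == 0 else Profile.STANDARD_SINGLE
    # Assumption: 3 agents become Complex/Multi only once skills are involved.
    if num_agents > 3 or (num_agents == 3 and num_skills > 0):
        return Profile.COMPLEX_MULTI
    return Profile.MULTI_AGENT
```

For example, `classify_task(1, 2).value` is `"Standard/Single"`, matching the hexagonal-architecture row.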
Step 2: Select agents
Load the minimum set:
- Identify the primary agent (owns the deliverable).
- Identify supporting agents (input or review).
- Do NOT load agents for downstream validation yet — load them when their phase begins.
Order: primary first, then supporting agents one at a time as their phase begins.
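The selection rule amounts to a filter on phase. It loads the single primary agent immediately and admits every other agent only once its phase has started. Below is a minimal sketch. The `AgentCandidate` type, its role labels, and the integer phase indices are hypothetical, introduced only to make the rule concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCandidate:
    name: str
    role: str   # hypothetical labels: "primary" | "supporting" | "validation"
    phase: int  # hypothetical index of the phase in which the agent is first needed


def agents_to_load(candidates: list[AgentCandidate], current_phase: int) -> list[str]:
    """Agents that should be loaded by current_phase: primary first, then others in phase order."""
    primary = [c for c in candidates if c.role == "primary"]
    if len(primary) != 1:
        raise ValueError("exactly one primary agent owns the deliverable")
    # Supporting and validation agents are admitted only once their phase begins,
    # never speculatively.
    others = sorted(
        (c for c in candidates if c.role != "primary" and c.phase <= current_phase),
        key=lambda c: c.phase,
    )
    return [primary[0].name] + [c.name for c in others]
```

With a Software Engineer as primary (phase 0), an Architect supporting in phase 1, and QA validating in phase 2, phase 0 loads only the Software Engineer. Each later phase adds exactly one agent.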
Step 3: Select skills
For each loaded agent, check its ## Skills section:
- Only load skills relevant to the current task — not all skills the agent references.
- Skills shared by multiple loaded agents only need to be loaded once.
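Both skill rules can be expressed as one pass. It keeps only skills that are relevant to the task and loads each shared skill once. The sketch below assumes each agent's `## Skills` section has already been parsed into a list of skill names. Apart from `hexagonal-architecture`, the skill names are hypothetical.

```python
def skills_to_load(agent_skills: dict[str, list[str]], relevant: set[str]) -> list[str]:
    """Relevant skills across all loaded agents, each listed once, in first-seen order."""
    seen: set[str] = set()
    selected: list[str] = []
    for skills in agent_skills.values():
        for skill in skills:
            # Skip skills the task does not need, and skills another agent already pulled in.
            if skill in relevant and skill not in seen:
                seen.add(skill)
                selected.append(skill)
    return selected


loaded = {
    "Software Engineer": ["hexagonal-architecture", "tdd", "logging"],  # hypothetical list
    "QA": ["tdd", "test-strategy"],                                      # hypothetical list
}
print(skills_to_load(loaded, {"hexagonal-architecture", "tdd"}))
```

`tdd` is referenced by both agents but appears once in the result, and `logging` and `test-strategy` are never loaded.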
Step 4: Calculate token budget
Total = CLAUDE.md baseline
+ conversation history (estimate)
+ agent files (sum selected)
+ skill files (sum selected)
+ expected output (estimate)
Target: total < 40% of the model's context window. For a Claude model with a 200K-token window, that is < 80K tokens. The config files are a small fraction of this. The real budget concern is conversation history and output accumulating over multi-turn tasks.
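The Step 4 sum and the 40% check are a few lines of arithmetic. The token counts below are placeholder estimates chosen for illustration. Real per-agent and per-skill costs come from the Baseline Budget section of CLAUDE.md.

```python
from dataclasses import dataclass, field

CEILING = 0.40  # maximum share of the context window the loaded context may use


@dataclass
class BudgetEstimate:
    baseline: int                                          # CLAUDE.md baseline
    history: int                                           # conversation history estimate
    agents: dict[str, int] = field(default_factory=dict)   # selected agent files
    skills: dict[str, int] = field(default_factory=dict)   # selected skill files
    expected_output: int = 0                               # expected output estimate

    def total(self) -> int:
        return (self.baseline + self.history + sum(self.agents.values())
                + sum(self.skills.values()) + self.expected_output)

    def utilization(self, window: int) -> float:
        return self.total() / window

    def within_ceiling(self, window: int) -> bool:
        return self.utilization(window) < CEILING


# Placeholder numbers, not measured costs.
plan = BudgetEstimate(
    baseline=3_000,
    history=20_000,
    agents={"Software Engineer": 2_500},
    skills={"hexagonal-architecture": 1_500},
    expected_output=8_000,
)
print(plan.total(), f"{plan.utilization(200_000):.1%}", plan.within_ceiling(200_000))
```

Here the total is 35,000 tokens, or 17.5% of a 200K window, well inside the 80K ceiling. Raising `history` to 80,000 pushes the total to 95,000 (47.5%), which fails the check even though the config files are unchanged. This is the multi-turn accumulation risk noted above.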
Step 5: Load via tool-based file reads
Read agents/software-engineer.md
Read skills/hexagonal-architecture/SKILL.md
Do NOT copy file contents into the system prompt or conversation.
Loading Profiles
Pre-computed loading sets for common task types.
Code Implementation
- Load: Software Engineer + relevant skill(s)
- Defer: QA (load after implementation), Architect (load only if design questions arise)
Architecture Design
- Load: Architect + relevant architecture skill(s)
- Defer: Software Engineer (load at implementation), QA (load at validation)
Bug Fix
- Load: Software Engineer only
- Defer: QA (load if regression test needed)
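These profiles are simple lookup data. The sketch below encodes them as a table mapping each deferred agent to the condition that triggers its load. The `LoadingProfile` type and `initial_load` helper are hypothetical names. Skills are left out because Step 3 selects them per task.

```python
from typing import NamedTuple


class LoadingProfile(NamedTuple):
    load: tuple[str, ...]  # agents loaded when the task starts
    defer: dict[str, str]  # deferred agent -> condition that triggers loading it


LOADING_PROFILES: dict[str, LoadingProfile] = {
    "Code Implementation": LoadingProfile(
        load=("Software Engineer",),
        defer={"QA": "after implementation", "Architect": "only if design questions arise"},
    ),
    "Architecture Design": LoadingProfile(
        load=("Architect",),
        defer={"Software Engineer": "at implementation", "QA": "at validation"},
    ),
    "Bug Fix": LoadingProfile(
        load=("Software Engineer",),
        defer={"QA": "if a regression test is needed"},
    ),
}


def initial_load(task_type: str) -> tuple[str, ...]:
    """Agents to load at the start of a task of the given type."""
    return LOADING_PROFILES[task_type].load
```

Keeping the trigger next to each deferred agent makes the "load on demand" condition explicit, rather than leaving it implied by the agent's position in a list.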
New Feature (full lifecycle)
Three phases, each in a fresh context window, with a human review gate between consecutive phases. Each phase's output is a structured progress file in memory/ that onboards the next phase.
| Phase | Load | Purpose | Output |
|---|---|---|---|
| 1. Research | Orchestrator + sub-agents (exploration) | Understand system, find files, trace data flows | Research progress file |
| 2. Plan | Architect + PM (if needed) + relevant skill(s) | Specify every change: files, snippets, tests | Implementation plan progress file |
| 3. Implement | Software Engineer + QA + skill(s) | Execute the plan; code, tests | Working code + test results |
Key rules:
- Each phase starts with a fresh context window, loading only the previous phase's progress file.
- Human reviews and approves the progress file before the next phase begins.
- Sub-agents primarily provide context isolation — they search, read, and return concise findings.
- If implementation is large, compact mid-phase: update the plan progress file with completed steps and continue in a fresh context.
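The key rules determine what a fresh conversation starts with. It contains the approved progress file from the previous phase plus that phase's load set, and nothing else. Below is a minimal sketch. The progress file names `memory/research.md` and `memory/plan.md` are hypothetical, and the approval flag stands in for the human review gate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    load: tuple[str, ...]
    output: str


PHASES = (
    Phase("Research", ("Orchestrator", "exploration sub-agents"), "memory/research.md"),  # hypothetical file
    Phase("Plan", ("Architect", "PM (if needed)", "relevant skill(s)"), "memory/plan.md"),  # hypothetical file
    Phase("Implement", ("Software Engineer", "QA", "skill(s)"), "working code + test results"),
)


def fresh_context(phase_index: int, previous_approved: bool) -> list[str]:
    """Items a new conversation loads for PHASES[phase_index]."""
    phase = PHASES[phase_index]
    if phase_index == 0:
        return list(phase.load)
    previous = PHASES[phase_index - 1]
    if not previous_approved:
        # Human review gate: no phase starts on an unapproved progress file.
        raise RuntimeError(f"{previous.output} must be approved before {phase.name} begins")
    # Only the previous phase's progress file carries over, not its conversation.
    return [previous.output, *phase.load]
```

Mid-phase compaction fits the same shape. After updating `memory/plan.md` with completed steps, the fresh Implement conversation starts again from `fresh_context(2, True)`.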
Unloading
Since tokens can't be literally removed from context:
- Phase transitions: summarize the completed phase's output into memory/ and start a new conversation for the next phase.
- Within a conversation: stop referencing the agent or skill, and have the orchestrator note that it is no longer active. Use the Context Summarization skill to compress stale content.
- Multi-turn accumulation — when conversation history crosses 30% utilization, trigger summarization before loading additional agents.
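The 30% rule is a guard that runs before every load. The sketch below tracks history as labeled token counts and refuses to load more once the threshold is crossed. Here `summarize()` is a hypothetical stand-in for the Context Summarization skill that replaces accumulated history with a single summary entry of an estimated size.

```python
from dataclasses import dataclass, field

SUMMARIZE_THRESHOLD = 0.30  # history share of the window that triggers summarization


@dataclass
class ContextTracker:
    window: int
    history: list[tuple[str, int]] = field(default_factory=list)  # (label, tokens)

    def history_tokens(self) -> int:
        return sum(tokens for _, tokens in self.history)

    def needs_summary(self) -> bool:
        return self.history_tokens() / self.window >= SUMMARIZE_THRESHOLD

    def summarize(self, summary_tokens: int) -> None:
        """Hypothetical stand-in for the Context Summarization skill."""
        self.history = [("summary", summary_tokens)]

    def load(self, label: str, tokens: int) -> None:
        # Refuse to load additional agents or skills until history is compressed.
        if self.needs_summary():
            raise RuntimeError("summarize before loading additional agents")
        self.history.append((label, tokens))
```

With a 200K window, 65K tokens of history (32.5%) blocks the next load. Summarizing to a 5K-token digest brings utilization back to 2.5%, and loading can resume.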
Anti-patterns
- Loading all agents upfront — wastes tokens before any work begins. Load only the primary agent.
- Loading all of an agent's skills — most are irrelevant to the specific request.
- Never unloading: context grows monotonically until hallucination risk becomes significant. Summarize and phase-transition instead.
- Loading agents "just in case" — adds cost without value. Load on demand when the phase begins.
Output
Loading plan as one table: selected agents + skills, token costs, estimated total, and utilization percentage against the 40% ceiling. No narration.
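One way to produce that single table is sketched below. It renders rows of (item, kind, tokens) as markdown, then appends the total and the utilization with the 40% ceiling alongside. The column names, the status wording, and the token figures in the usage are illustrative placeholders, not values from CLAUDE.md.

```python
def render_plan(window: int, rows: list[tuple[str, str, int]], ceiling: float = 0.40) -> str:
    """Render a loading plan as one markdown table with total and utilization rows."""
    lines = ["| Item | Kind | Tokens |", "|---|---|---|"]
    for item, kind, tokens in rows:
        lines.append(f"| {item} | {kind} | {tokens:,} |")
    total = sum(tokens for _, _, tokens in rows)
    utilization = total / window
    status = "OK" if utilization < ceiling else "OVER CEILING"
    lines.append(f"| **Total** | | {total:,} |")
    lines.append(
        f"| **Utilization** | | {utilization:.1%} of window (ceiling {ceiling:.0%}, {status}) |"
    )
    return "\n".join(lines)


rows = [  # placeholder estimates
    ("CLAUDE.md baseline", "baseline", 3_000),
    ("Software Engineer", "agent", 2_500),
    ("hexagonal-architecture", "skill", 1_500),
    ("History + expected output", "estimate", 28_000),
]
print(render_plan(200_000, rows))
```

With these placeholders, the last two rows read 35,000 tokens and 17.5% of the window, below the 40% ceiling.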