Context Loading Protocol

The token-budget reference (CLAUDE.md baseline, full-load ceiling, per-agent and per-skill costs) lives in the Baseline Budget section of CLAUDE.md. This skill is the runtime procedure; don't duplicate the table here, because it goes stale.

Constraints

Never load all agents upfront; load only the primary agent for each phase.
Keep total context below 40% of the model's window at all times.
Load agents on demand when their phase begins, not speculatively.
Use tool-based file reads (Read); do not paste file contents into the prompt.

Loading Decision Procedure

Step 1: Classify the task

Profile	Description	Example
Simple/Single	One agent, no skills	"Fix this typo", "Write a unit test"
Standard/Single	One agent + 1–2 skills	"Implement this feature using hexagonal architecture"
Multi-Agent	2–3 agents coordinating	"Design and implement a new API endpoint"
Complex/Multi	3+ agents + skills	"Build a new bounded context with full test coverage"
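
As a rough illustration, the table can be read as a function of how many agents and skills a task needs. This is an assumption-laden reading, not part of the protocol: `classify_profile` is a hypothetical helper, and the table is silent on two cases that the sketch resolves by guess (three agents count as Complex/Multi only when skills are involved, and one agent with more than two skills stays Standard/Single).

```python
def classify_profile(agent_count: int, skill_count: int) -> str:
    """Map agent/skill counts onto the four profiles in the table (hypothetical helper)."""
    if agent_count < 1:
        raise ValueError("at least one agent (the primary) is required")
    if skill_count < 0:
        raise ValueError("skill_count cannot be negative")

    if agent_count == 1:
        # Assumption: one agent with more than two skills is still Standard/Single.
        return "Simple/Single" if skill_count == 0 else "Standard/Single"

    # Assumption: the 3-agent overlap between the last two rows is
    # resolved by whether any skills are loaded.
    if agent_count >= 4 or (agent_count == 3 and skill_count > 0):
        return "Complex/Multi"
    return "Multi-Agent"
```

In practice the profile is a judgment about the task, so a function like this is only useful as a sanity check on a classification made by reading the request.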

Step 2: Select agents

Load the minimum set:

Identify the primary agent (owns the deliverable).
Identify supporting agents (input or review).
Do NOT load agents for downstream validation yet — load them when their phase begins.

Order: primary first, then supporting agents one at a time as their phase begins.

Step 3: Select skills

For each loaded agent, check its ## Skills section:

Only load skills relevant to the current task — not all skills the agent references.
Skills shared by multiple loaded agents only need to be loaded once.
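
The two rules above amount to a relevance filter plus de-duplication. A minimal sketch, assuming each agent's ## Skills section has already been parsed into a list of skill names; `select_skills` and the skill names `tdd` and `adr` are hypothetical.

```python
def select_skills(agent_skills: dict[str, list[str]], relevant: set[str]) -> list[str]:
    """Return relevant skills in first-seen order, each listed once."""
    selected: list[str] = []
    seen: set[str] = set()
    for skills in agent_skills.values():
        for skill in skills:
            # Skip skills the task does not need, and skills already
            # selected on behalf of another loaded agent.
            if skill in relevant and skill not in seen:
                seen.add(skill)
                selected.append(skill)
    return selected


# Hypothetical parsed "## Skills" sections for two loaded agents.
loaded = {
    "software-engineer": ["hexagonal-architecture", "tdd"],
    "architect": ["hexagonal-architecture", "adr"],
}
print(select_skills(loaded, relevant={"hexagonal-architecture", "adr"}))
# ['hexagonal-architecture', 'adr']
```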

Step 4: Calculate token budget

Total = CLAUDE.md baseline
      + conversation history (estimate)
      + agent files (sum selected)
      + skill files (sum selected)
      + expected output (estimate)

Target: total < 40% of the model's context window. For a Claude model with a 200K-token window, that is < 80K tokens. The config files (CLAUDE.md, agents, skills) are a small fraction of this; the real budget concern is conversation history and output accumulating over multi-turn tasks.
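
The sum and the ceiling check can be sketched in a few lines. This is illustrative only: the `Budget` class is hypothetical, and every token count below is a placeholder rather than a figure from CLAUDE.md.

```python
from dataclasses import dataclass, field

CEILING_PERCENT = 40  # ceiling from the Constraints section


@dataclass
class Budget:
    baseline: int          # CLAUDE.md baseline
    history: int           # conversation history (estimate)
    expected_output: int   # expected output (estimate)
    agents: list[int] = field(default_factory=list)  # one cost per selected agent file
    skills: list[int] = field(default_factory=list)  # one cost per selected skill file

    def total(self) -> int:
        return (self.baseline + self.history + sum(self.agents)
                + sum(self.skills) + self.expected_output)

    def within_ceiling(self, window: int) -> bool:
        # Integer arithmetic avoids float rounding at the boundary.
        return self.total() * 100 < CEILING_PERCENT * window


# Placeholder numbers for illustration only.
budget = Budget(baseline=2_000, history=30_000, expected_output=8_000,
                agents=[1_500], skills=[1_200])
print(budget.total())                  # 42700
print(budget.within_ceiling(200_000))  # True
```

With these placeholder values the history estimate alone is about 70% of the total, which is the point the paragraph above makes about where the budget actually goes.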

Step 5: Load via tool-based file reads

Read agents/software-engineer.md
Read skills/hexagonal-architecture/SKILL.md

Do NOT copy file contents into the system prompt or conversation.

Loading Profiles

Pre-computed loading sets for common task types.

Code Implementation

Load: Software Engineer + relevant skill(s)
Defer: QA (load after implementation), Architect (load only if design questions arise)

Architecture Design

Load: Architect + relevant architecture skill(s)
Defer: Software Engineer (load at implementation), QA (load at validation)

Bug Fix

Load: Software Engineer only
Defer: QA (load if regression test needed)

New Feature (full lifecycle)

Three phases, each run in a fresh context window, with a human review gate between phases. Each phase's output is a structured progress file in memory/ that onboards the next phase.

Phase	Load	Purpose	Output
1. Research	Orchestrator + sub-agents (exploration)	Understand system, find files, trace data flows	Research progress file
2. Plan	Architect + PM (if needed) + relevant skill(s)	Specify every change: files, snippets, tests	Implementation plan progress file
3. Implement	Software Engineer + QA + skill(s)	Execute the plan; code, tests	Working code + test results

Key rules:

Each phase starts with a fresh context window, loading only the previous phase's progress file.
Human reviews and approves the progress file before the next phase begins.
Sub-agents primarily provide context isolation — they search, read, and return concise findings.
If implementation is large, compact mid-phase: update the plan progress file with completed steps and continue in a fresh context.

Unloading

Tokens can't literally be removed from context, so unloading means one of the following:

Phase transitions: summarize the completed phase's output into memory/ and start a new conversation for the next phase.
Within a conversation: stop referencing the agent/skill, and have the orchestrator treat it as no longer active. Use the Context Summarization skill to compress stale content.
Multi-turn accumulation: when conversation history crosses 30% of the context window, trigger summarization before loading additional agents.
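
The multi-turn rule is a simple gate in front of every additional load. A minimal sketch, assuming a token estimate for the conversation history is available; `next_step` is a hypothetical helper, and treating exactly 30% as having crossed the threshold is an assumption.

```python
SUMMARIZE_AT_PERCENT = 30  # history threshold from the multi-turn rule


def next_step(history_tokens: int, window_tokens: int) -> str:
    """Return 'summarize' once history reaches the threshold, else 'load'."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    # Assumption: reaching exactly 30% counts as crossing the threshold.
    if history_tokens * 100 >= SUMMARIZE_AT_PERCENT * window_tokens:
        return "summarize"
    return "load"
```

For a 200K window the gate flips at 60K tokens of history, which leaves headroom below the 40% ceiling for the agent being loaded.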

Anti-patterns

Loading all agents upfront: wastes tokens before any work begins. Load only the primary agent.
Loading all of an agent's skills: most are irrelevant to the specific request.
Never unloading: context grows monotonically, and hallucination risk grows with it. Summarize and phase-transition.
Loading agents "just in case": adds cost without value. Load on demand when the phase begins.

Output

Loading plan as one table: selected agents + skills, token costs, estimated total, and utilization percentage against the 40% ceiling. No narration.
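
One possible shape for that table, sketched as a small renderer. The column names, the tab-separated layout, and the `render_loading_plan` helper are assumptions, and the token costs are placeholders rather than values from CLAUDE.md.

```python
def render_loading_plan(items: list[tuple[str, str, int]], baseline: int,
                        window: int, ceiling_percent: int = 40) -> str:
    """Render a tab-separated loading plan; items are (name, type, tokens)."""
    rows = ["Item\tType\tTokens", f"CLAUDE.md\tbaseline\t{baseline}"]
    total = baseline
    for name, kind, tokens in items:
        rows.append(f"{name}\t{kind}\t{tokens}")
        total += tokens
    utilization = 100 * total / window  # percent of the context window
    rows.append(f"Estimated total\t\t{total}")
    rows.append(f"Utilization\t\t{utilization:.1f}% (ceiling {ceiling_percent}%)")
    return "\n".join(rows)


# Placeholder costs for illustration only.
plan = render_loading_plan(
    [("agents/software-engineer.md", "agent", 1_600),
     ("skills/hexagonal-architecture/SKILL.md", "skill", 1_200)],
    baseline=2_000,
    window=200_000,
)
print(plan)
```

History and expected-output estimates can be passed as extra items (for example with type `estimate`) so the total matches the Step 4 formula.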