test

Input Handling

Treat the current command arguments as this workflow's input. When invoked from a slash command, use the forwarded $ARGUMENTS value.

test: test coverage with risk-aware focus

Description

What — Triage changes into risk tiers, dispatch @tester subagents to fix lint and write risk-appropriate tests (not brute-force 100% line coverage), and commit after each passing batch.
Outcome — All lint issues fixed, surgical test coverage that maximizes confidence while minimizing token cost, incremental commits per batch, artifacts recorded in OUT_DIR.
Philosophy — Test behaviors at boundaries, not implementation details. Prioritize code that can hurt users when it breaks. Skip tests that only measure "was this line executed?" without verifying correctness.

ARGUMENTS Input

Optional scope hint or specific files to focus on.

<ARGUMENTS> $ARGUMENTS </ARGUMENTS>

Instructions

Primary agent plans and verifies; @tester subagents write test code
Maximize parallelism: dispatch multiple @tester agents simultaneously, not sequentially
Primary agent coordinates; subagents execute test writing in parallel batches
Keep OUT_DIR artifacts minimal (working_set.json only); this is a lightweight flow
Risk assessment is inline reasoning, not a classification phase
Test behaviors at boundaries, not implementation details
Skip tests for P3 files (types, configs, simple wrappers)
When committing, using --no-verify, adding eslint-disable comments, or committing code that contains eslint-disable is expressly forbidden without the user's explicit permission.
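
As a rough illustration of the commit rule above, a pre-commit check could look something like the sketch below. The helper names findEslintDisables and usesNoVerify are hypothetical and not part of this workflow; the sketch assumes the staged file contents and the git commit arguments are already available as strings.

```typescript
// Hypothetical helpers (not defined by this workflow): flag the two things
// the commit rule forbids without explicit user permission.

interface Suppression {
  line: number; // 1-based line number in the source
  text: string; // trimmed content of that line
}

// Report every line that contains an ESLint suppression directive
// (eslint-disable, eslint-disable-line, eslint-disable-next-line).
function findEslintDisables(source: string): Suppression[] {
  return source
    .split("\n")
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter(({ text }) => text.includes("eslint-disable"));
}

// Detect a hook bypass in `git commit` arguments. `-n` is the short form of
// `--no-verify`; combined short flags such as `-nm` are not detected here.
function usesNoVerify(commitArgs: string[]): boolean {
  return commitArgs.some((arg) => arg === "--no-verify" || arg === "-n");
}
```

A non-empty result from findEslintDisables, or true from usesNoVerify, would be a reason to stop and ask the user rather than proceed.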

Why Risk-Weighted > 100% Line Coverage

100% line coverage is a vanity metric that:

Treats all code equally (a payment handler vs a string formatter)
Tests implementation details (brittle, breaks on refactor)
Creates maintenance burden (tests that slow you down)
Gives false confidence (100% coverage ≠ 100% correctness)
Burns tokens on code that can't break in production

Risk-weighted coverage instead:

Focuses testing effort where bugs cause user pain
Tests behaviors and contracts, not internal wiring
Creates tests that survive refactoring
Catches actual bugs via mutation-resistant assertions
Dramatically reduces token cost while increasing safety

Risk Tier Definitions

P0 — Critical (Must Test Thoroughly)

Identification patterns:

Path contains: auth, payment, security, crypto, session, token
File has @critical JSDoc/comment annotation
Handles: user data mutations, financial transactions, PII, permissions
API handlers for external consumers
Database migration files

Coverage requirements:

100% behavioral coverage (every user-facing outcome has a test)
All error paths tested with specific error assertions
Edge cases for security-sensitive inputs (null, empty, malformed, overflow)
Contract tests for all public APIs (schema validation)
Mutation-resistant assertions (would a bug actually fail this test?)

Test quality bar:

// ✅ GOOD P0 test - tests behavior and catches real bugs
it('rejects payment when card is expired', async () => {
  const result = await processPayment({ card: expiredCard, amount: 100 });
  expect(result.status).toBe('DECLINED');
  expect(result.reason).toBe('CARD_EXPIRED');
  expect(chargeService.charge).not.toHaveBeenCalled(); // Side effect prevented
});

// ❌ BAD test - tests implementation, not behavior
it('calls validateCard', async () => {
  await processPayment({ card, amount: 100 });
  expect(validateCard).toHaveBeenCalledWith(card);
});

P1 — Core (Test Key Behaviors)

Identification patterns:

Main feature components (not utility wrappers)
API route handlers (internal)
State management (stores, reducers, contexts)
Core business logic
Data fetching/caching layers

Coverage requirements:

Happy path coverage for all public functions
Critical error paths (ones users would see)
Contract tests at team boundaries (exported APIs other modules consume)
No need to test internal helper functions
No need to test every code branch, just primary behaviors

Test quality bar:

// ✅ GOOD P1 test - covers the behavior users care about
it('fetches and caches user profile', async () => {
  const profile = await getUserProfile(userId);
  expect(profile.name).toBe('Joe');

  // Second call uses cache
  await getUserProfile(userId);
  expect(api.get).toHaveBeenCalledTimes(1);
});

// ❌ SKIP - internal implementation detail
it('calls normalizeUserData internally', () => { ... });

P2 — Supporting (Test Public Surface Only)

Identification patterns:

Utility functions and helpers
Internal services not exposed to other teams
Formatters, validators, transformers
Hooks that compose other hooks
Adapters and wrappers

Coverage requirements:

Public exported functions: happy path only
Skip internal/private functions entirely
Skip trivial functions (single-line returns, simple compositions)
Only test if the function has logic worth verifying

Test quality bar:

// ✅ GOOD P2 test - public util with actual logic
it('formats currency correctly', () => {
  expect(formatCurrency(1234.5, 'USD')).toBe('$1,234.50');
  expect(formatCurrency(1234.5, 'EUR')).toBe('€1,234.50');
});

// ❌ SKIP - trivial wrapper with no logic
// export const getFullName = (u) => `${u.first} ${u.last}`;

P3 — Low Risk (Skip Testing)

Identification patterns:

TypeScript type definitions (.d.ts)
JSON/YAML configuration files
CSS/SCSS/Tailwind styles
Markdown documentation
Constants and enums (no logic)
Re-export barrels (index.ts that just re-exports)
Simple component wrappers (just pass props through)
Build scripts and tooling config

Coverage requirements:

NO TESTS REQUIRED — Types are the test
Type checking + linting is sufficient
These files rarely break at runtime in ways unit tests would catch
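
The path-based parts of the identification patterns above can be approximated in code. The sketch below is only an illustration of the heuristic, since the workflow treats risk assessment as inline reasoning; classifyRiskTier, the P2 path hints, and the check order are assumptions, and signals such as "handles PII" still need human or agent judgment.

```typescript
type RiskTier = "P0" | "P1" | "P2" | "P3";

// Keywords taken from the P0 identification patterns.
const P0_PATH_KEYWORDS = ["auth", "payment", "security", "crypto", "session", "token"];

// File shapes taken from the P3 identification patterns.
const P3_PATTERNS: RegExp[] = [
  /\.d\.ts$/, // TypeScript type definitions
  /\.(json|ya?ml)$/, // configuration files
  /\.(css|scss)$/, // styles
  /\.md$/, // documentation
  /(^|\/)index\.ts$/, // barrels (assumed to only re-export)
];

// Assumption: path fragments that usually indicate P2 supporting code.
const P2_PATH_HINTS = ["util", "helper", "format", "hook", "adapter"];

function classifyRiskTier(path: string, hasCriticalAnnotation = false): RiskTier {
  const lower = path.toLowerCase();
  if (hasCriticalAnnotation) return "P0";
  if (P3_PATTERNS.some((pattern) => pattern.test(lower))) return "P3";
  // Substring matching is deliberately crude: "author" also contains "auth".
  if (P0_PATH_KEYWORDS.some((keyword) => lower.includes(keyword))) return "P0";
  if (P2_PATH_HINTS.some((hint) => lower.includes(hint))) return "P2";
  return "P1"; // default: treat unknown code as core
}
```

Defaulting to P1 rather than P3 is a deliberate choice in this sketch: an unrecognized file gets key-behavior tests instead of being silently skipped.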

Test Quality Requirements (All Tiers)

Each test MUST:

Test ONE behavior — Single assertion focus, clear failure message
Use descriptive names — when_[condition]_then_[outcome] or [action]_should_[result]
Assert outcomes, not calls — Verify what happened, not what was invoked
Be refactor-resilient — Test should pass if behavior unchanged, even if internals change
Catch real bugs — Ask: "If I introduced a bug, would this test fail?"

Each test MUST NOT:

Mock implementation details — Don't mock internal functions
Assert on call counts — Unless testing side-effect prevention
Duplicate type coverage — Don't test that TS types are correct
Test framework behavior — Don't test that React renders or Express routes

Mutation Testing Mindset

For every test, ask: "If I changed the implementation to return a wrong value, would this test catch it?"

// ✅ Mutation-resistant — changing the discount calculation would fail this
it('applies 20% discount for premium users', () => {
  expect(calculateTotal({ items: [100], userTier: 'premium' })).toBe(80);
});

// ❌ NOT mutation-resistant — always passes regardless of implementation
it('calls calculateDiscount', () => {
  calculateTotal({ items: [100], userTier: 'premium' });
  expect(calculateDiscount).toHaveBeenCalled();
});
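
To make the mutation question concrete, the sketch below runs both styles of check against a correct implementation and a deliberately broken "mutant". It is framework-free so it can run on its own; the calculateTotal body, the mutant, and the two check functions are hypothetical stand-ins, not code from any real project.

```typescript
type Order = { items: number[]; userTier: "standard" | "premium" };
type TotalFn = (order: Order) => number;

// Build a total function with a given premium discount (integer percent,
// to keep the arithmetic exact).
function makeCalculateTotal(premiumDiscountPercent: number): TotalFn {
  return ({ items, userTier }) => {
    const subtotal = items.reduce((sum, price) => sum + price, 0);
    const percent = userTier === "premium" ? premiumDiscountPercent : 0;
    return (subtotal * (100 - percent)) / 100;
  };
}

const calculateTotal = makeCalculateTotal(20); // intended behavior
const mutantCalculateTotal = makeCalculateTotal(10); // injected bug

// Outcome-based check: asserts the value a user would actually be charged.
function outcomeCheckPasses(impl: TotalFn): boolean {
  return impl({ items: [100], userTier: "premium" }) === 80;
}

// Call-based check: only verifies that the function was invoked.
function callOnlyCheckPasses(impl: TotalFn): boolean {
  let called = false;
  const spy: TotalFn = (order) => {
    called = true;
    return impl(order);
  };
  spy({ items: [100], userTier: "premium" });
  return called;
}
```

Here outcomeCheckPasses fails for the mutant while callOnlyCheckPasses still passes, which is the gap the mutation mindset is meant to expose.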

Contract Tests at Team Boundaries

When your code is consumed by other teams/modules, add contract tests:

// API Contract Test
describe('UserAPI contract', () => {
  it('GET /users/:id returns UserResponse schema', async () => {
    const response = await request(app).get('/users/123');
    expect(response.body).toMatchSchema(UserResponseSchema);
  });

  it('returns standard APIError shape on 404', async () => {
    const response = await request(app).get('/users/nonexistent');
    expect(response.status).toBe(404);
    expect(response.body).toMatchSchema(APIErrorSchema);
  });
});

// Event Contract Test
describe('UserCreated event contract', () => {
  it('emits event matching UserCreatedEvent schema', async () => {
    await createUser({ name: 'Test' });
    expect(eventBus.lastEvent).toMatchSchema(UserCreatedEventSchema);
  });
});
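
The toMatchSchema matcher in the examples above is assumed to come from a schema-matching extension rather than from the test framework itself. Where no such matcher is available, a contract can be approximated with hand-written shape checks, as in this minimal sketch. The UserResponse and APIError fields shown here are hypothetical; the real schemas are whatever the consuming teams rely on.

```typescript
// Hypothetical contract shapes (field names are assumptions for illustration).
interface UserResponse {
  id: string;
  name: string;
}

interface APIError {
  code: string;
  message: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Runtime guard: true only when every contract field has the expected type.
function isUserResponse(value: unknown): value is UserResponse {
  return isRecord(value) && typeof value["id"] === "string" && typeof value["name"] === "string";
}

function isAPIError(value: unknown): value is APIError {
  return isRecord(value) && typeof value["code"] === "string" && typeof value["message"] === "string";
}
```

A contract test would then assert isUserResponse(response.body) on the success path and isAPIError(response.body) on the 404 path. A dedicated schema library is usually a better fit once contracts grow beyond a few fields.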

Steps

Step (1/4) - Discover Full Working Set and Plan

Action — DiscoverFullWorkingSet:
- Validate commit_id if provided
- Gather: committed changes + staged + unstaged + untracked
- Full Working Set = UNION of all sources
Action — RecordWorkingSet: Write OUT_DIR/working_set.json
Action — BaselineLintFull: Run lint on ALL files in Full Working Set
Action — MapDependencies: Build import/dep snapshot
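
The union in DiscoverFullWorkingSet can be sketched as below, assuming the four file lists have already been gathered (for example from git). The function names and the working_set.json shape are assumptions for illustration; the workflow does not prescribe a schema.

```typescript
interface WorkingSetSources {
  committed: string[];
  staged: string[];
  unstaged: string[];
  untracked: string[];
}

// Full Working Set = union of all sources, de-duplicated and sorted so the
// recorded artifact is stable across runs.
function buildFullWorkingSet(sources: WorkingSetSources): string[] {
  const all = [
    ...sources.committed,
    ...sources.staged,
    ...sources.unstaged,
    ...sources.untracked,
  ];
  return Array.from(new Set(all)).sort();
}

// Assumed artifact shape: a flat list of files under a "files" key.
function toWorkingSetJson(files: string[]): string {
  return JSON.stringify({ files }, null, 2);
}
```

A file that appears in several sources (committed and then modified again, say) is listed once, so lint and risk triage each see it exactly once.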

Step (2/4) - Risk Assessment & Test Plan

Action — InlineRiskCheck: Quick mental triage of changed files

P0 Critical (thorough coverage required):
- Paths containing: auth, payment, security, crypto, session, token
- Handles: user data mutations, financial transactions, PII, permissions
- Has @critical annotation
P1 Core (key behaviors):
- API handlers, feature components, state management, services
P2 Supporting (public surface only):
- Utils, helpers, hooks, formatters
P3 Skip (no tests):
- Type definitions (.d.ts), configs, styles, index barrels, simple wrappers
Action — CreateTestPlan: Write 3-7 bullet test plan
- Format: - [P{tier}] {file}: {behavior to test}
- P0 files get multiple bullets (all behaviors + error paths)
- P1 files get 1-2 bullets (happy path + critical errors)
- P2 files get 1 bullet (public function smoke test)
- P3 files listed as "SKIP — {reason}"
Action - UpdateWorkingSet: Update OUT_DIR/working_set.json with the risk tier assigned to each file.
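
The test plan format from CreateTestPlan can be illustrated with a small formatter. This is a sketch under assumptions: the PlanItem type and function names are hypothetical, and combining the tier prefix with the SKIP marker on P3 lines is one reading of the format, not a requirement stated by the workflow.

```typescript
type RiskTier = "P0" | "P1" | "P2" | "P3";

interface PlanItem {
  tier: RiskTier;
  file: string;
  // Behavior to test, or the skip reason when tier is P3.
  note: string;
}

// Render one bullet as "- [P{tier}] {file}: {behavior to test}".
// The SKIP marker string is copied from the plan format described above.
function formatPlanItem({ tier, file, note }: PlanItem): string {
  const body = tier === "P3" ? `SKIP — ${note}` : note;
  return `- [${tier}] ${file}: ${body}`;
}

// List higher-risk items first; tier names sort correctly as plain strings.
function formatTestPlan(items: PlanItem[]): string {
  return items
    .slice()
    .sort((a, b) => a.tier.localeCompare(b.tier))
    .map(formatPlanItem)
    .join("\n");
}
```

For example, a P0 file with two behaviors simply contributes two PlanItem entries, which matches the "multiple bullets" rule for P0.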

Step (3/4) - Write Tests & Verify

Action — DispatchTestWriter: Spawn MULTIPLE @tester subagents IN PARALLEL
- Parallelization Strategy:
  - Partition test plan items into independent batches (by file or logical grouping)
  - Dispatch one @tester per batch — aim for 3-5 parallel agents for medium scope, up to 8 for large scope
  - Each agent receives: its batch of test plan items, file paths, risk tier context
  - Critical: Use a single message with multiple Task tool calls to launch all agents simultaneously
- Batching Heuristics:
  - P0 files: 1 agent per file (thorough coverage requires focus)
  - P1 files: Group 2-3 related files per agent
  - P2 files: Group 3-5 files per agent (lighter coverage)
- Instruct each: "Write behavioral tests, assert outcomes not calls, mutation-resistant"
- Wait for all agents to complete before proceeding to lint/test verification
Action — RunLint: Execute linter; fix violations
- If lint fails → autofix first, then manual fix
- Else → continue
Action — RunTests: Execute full test suite
- If tests fail → analyze failure, fix via @tester or direct edit
- Else → continue
Action — VerifyQuality: Spot-check 1-2 tests
- Confirm: tests assert behaviors, would catch real bugs, survive refactoring
- If test quality poor → rework via @tester
- Else → continue
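
The batching heuristics in this step can be sketched as a partition function. The names and default group sizes below are assumptions chosen from the upper end of the stated ranges; "related files" is approximated by input order, and the cap on parallel agents is left to the caller.

```typescript
type RiskTier = "P0" | "P1" | "P2" | "P3";

interface PlanFile {
  file: string;
  tier: RiskTier;
}

// Split a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// One batch per @tester agent: P0 files alone, P1 in groups of up to 3,
// P2 in groups of up to 5. P3 files are skipped entirely.
function partitionIntoBatches(files: PlanFile[], p1GroupSize = 3, p2GroupSize = 5): string[][] {
  const pathsFor = (tier: RiskTier): string[] =>
    files.filter((f) => f.tier === tier).map((f) => f.file);
  return [
    ...chunk(pathsFor("P0"), 1),
    ...chunk(pathsFor("P1"), p1GroupSize),
    ...chunk(pathsFor("P2"), p2GroupSize),
  ];
}
```

If the result exceeds the intended agent count (up to 8 for large scope), the caller could raise the group sizes or merge the smallest P2 batches before dispatching.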

Step (4/4) - Commit

Action — CommitPlanningArtifacts: Gather and commit planning/working docs FIRST
- Check for uncommitted files in OUT_DIR/:
  - working_set.json (scope and risk tier categorization)
  - Any other .md or .json artifacts created during this flow
- Check for uncommitted docs in docs/tasks/{branch_name}/ or related planning directories
- If uncommitted planning artifacts exist:
  - Stage all: git add docs/tasks/{branch_name}/ OUT_DIR/
  - Commit: docs(test): add test planning artifacts for {branch_name}
Action — GroupChanges: Organize code changes into logical commits
- Group by: feat/fix/refactor/test/chore
- Tests can be bundled with their feature or separate (your judgment)
Action — CommitAll: Create conventional commits for code changes
- Format: type(scope): description
- Each commit answers: What changed and why?
Action — RenderFooter: Render Next Steps footer using Skill(spectre-guide) skill (contains format template and spectre command options)
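
The commit format used in this step, type(scope): description, can be checked mechanically. The sketch below is an assumption-laden illustration: the allowed type list is limited to the types mentioned in this workflow, and the function names are hypothetical.

```typescript
// Types mentioned in this workflow; a real project may allow more.
const COMMIT_TYPES = ["feat", "fix", "refactor", "test", "chore", "docs"] as const;
type CommitType = (typeof COMMIT_TYPES)[number];

// Build a message in the "type(scope): description" format.
function formatCommitMessage(type: CommitType, scope: string, description: string): string {
  return `${type}(${scope}): ${description}`;
}

// Validate the subject line: known type, non-empty scope without spaces or
// parentheses, then ": " and a non-empty description.
function isConventionalCommit(subject: string): boolean {
  const pattern = new RegExp(`^(${COMMIT_TYPES.join("|")})\\([^()\\s]+\\): \\S.*$`);
  return pattern.test(subject);
}
```

For instance, formatCommitMessage("docs", "test", "add test planning artifacts for my-branch") yields the planning-artifact commit subject described above, with my-branch standing in for the real branch name.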

Next Steps

See the Skill(spectre-guide) skill for the footer format and command options.

Success Criteria

Step 1 - Discover Full Working Set and Plan:

Scope identified (files changed) and documented
Behaviors changed listed (not just file names)

Step 2 - Risk Assessment & Test Plan:

Each changed file assigned P0-P3 tier
Test plan created with 3-7 bullets
P3 files explicitly marked SKIP

Step 3 - Write Tests & Verify:

Multiple @tester agents dispatched in parallel (not sequential)
Test plan partitioned into independent batches
All agents launched in single message (parallel tool calls)
P0 files have thorough behavioral coverage
P1 files have key path coverage
P2 files have public surface coverage
P3 files have NO tests (confirmed skipped)
Lint passes
All tests pass
Test quality spot-checked

Step 4 - Commit:

Changes grouped logically
Conventional commit format used
Single Next Steps footer rendered
Next steps guide read and options sourced