Phases run in order. Skip a phase if you already have the information it produces. Phase 3 runs only in fix mode.
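The ordering rules can be sketched as a filter over the phase list. This is a minimal TypeScript sketch under stated assumptions: `PhaseId`, `Mode`, `PHASES`, and `phasesToRun` are hypothetical names. It does not model the early stop in Phase 1b when an existing fix is found.

```typescript
// Hypothetical identifiers mirroring the phase headings in this skill.
type PhaseId = "1" | "1b" | "2" | "3";
type Mode = "plan" | "fix";

interface Phase {
  id: PhaseId;
  fixModeOnly: boolean;
}

// Phases in the order they run.
const PHASES: readonly Phase[] = [
  { id: "1", fixModeOnly: false },  // classify test type
  { id: "1b", fixModeOnly: false }, // check for existing fixes
  { id: "2", fixModeOnly: false },  // produce a plan
  { id: "3", fixModeOnly: true },   // apply the plan
];

// Returns the phases to execute, in order. `alreadyKnown` lists phases whose
// output is already available, so they are skipped.
function phasesToRun(mode: Mode, alreadyKnown: ReadonlySet<PhaseId>): PhaseId[] {
  return PHASES.filter((p) => !(p.fixModeOnly && mode === "plan"))
    .filter((p) => !alreadyKnown.has(p.id))
    .map((p) => p.id);
}
```

For example, if the user has already named the test type and asked only for a plan, `phasesToRun("plan", new Set<PhaseId>(["1"]))` leaves just Phases 1b and 2.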
Mode: plan vs fix
This skill runs in one of two modes:
- Fix mode (default): produce a plan, then apply it.
- Plan mode: produce a plan and stop, for human review.
Use plan mode when the user asks for a plan, an investigation, a triage report, or says "don't fix yet" / "just plan it". Otherwise default to fix mode. Both modes share the same diagnosis path; the plan is the artifact you hand to a reviewer (plan mode) or to yourself (fix mode) before editing code.
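As a rough illustration of the mode decision, here is a keyword-based stand-in. It assumes the request arrives as plain text. The patterns and the `selectMode` name are hypothetical. An agent should judge intent rather than match strings. The sketch only shows that plan mode is opt-in and fix mode is the fallback.

```typescript
type Mode = "plan" | "fix";

// Hypothetical patterns built from the phrases listed above. \b keeps words
// such as "explanation" from matching "plan".
const PLAN_PATTERNS: readonly RegExp[] = [
  /\bplan\b/i,
  /\binvestigat(e|ion)\b/i,
  /\btriage\b/i,
  /\bdon'?t fix yet\b/i,
];

// Plan mode is opt-in; anything else falls through to the fix-mode default.
function selectMode(request: string): Mode {
  return PLAN_PATTERNS.some((re) => re.test(request)) ? "plan" : "fix";
}
```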
Phase 1: Classify Test Type
Determine the test type from the user's input before doing anything else. The type dictates the investigation path.
| Type | Signals |
|---|---|
| E2E (Playwright) | .spec.ts file (other than *.service.spec.ts), mentions Playwright, has a GitHub Actions run URL with a playwright-llm-report artifact, browser-level errors
| Service (NestJS integration) | Spins up a NestJS app, uses supertest or a similar HTTP testing library, MongoDB/Redis connection errors, *.service.spec.ts or test descriptions mentioning "service test"
| React component | Uses @testing-library/react, render(), screen.*, .test.tsx file, React act() warnings |
| Unit | Pure logic tests, .test.ts file, no app bootstrap or DOM, Jest/Vitest matchers on plain functions or classes |
If the type is ambiguous, check the test file extension and imports to confirm.
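One way to encode the table is the TypeScript sketch below. The names `classifyTest` and `importsPackage` are hypothetical. The package names `@playwright/test`, `supertest`, and `@testing-library/react` are the real npm packages. Using a `createNestApplication` call as the "spins up a NestJS app" signal is an assumption about how the service tests bootstrap. Imports are checked before file extensions because a bare `.spec.ts` suffix is the weakest signal in the table.

```typescript
type TestType = "e2e" | "service" | "react-component" | "unit";

// True if `pkg` appears as a quoted string, which is how import and require
// specifiers are written.
function importsPackage(source: string, pkg: string): boolean {
  return source.includes(`"${pkg}"`) || source.includes(`'${pkg}'`);
}

// Heuristic classifier for the table above. Import signals are checked
// before file extensions because extensions alone are ambiguous.
function classifyTest(filePath: string, source: string): TestType {
  if (importsPackage(source, "@playwright/test")) return "e2e";
  if (
    importsPackage(source, "supertest") ||
    source.includes("createNestApplication") || // assumed NestJS bootstrap call
    filePath.endsWith(".service.spec.ts")
  ) {
    return "service";
  }
  if (importsPackage(source, "@testing-library/react") || filePath.endsWith(".test.tsx")) {
    return "react-component";
  }
  // Extension-only E2E signal from the table; confirm by reading the file.
  if (filePath.endsWith(".spec.ts")) return "e2e";
  return "unit";
}
```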
Phase 1b: Check for Existing Fixes
Before investigating, check whether someone (or another agent) has already fixed this flake.
- Search open PRs with the flaky-test-fix label that touch the failing test file or its surrounding code. Use GitHub search scoped to the repo:
  - Search PRs labeled flaky-test-fix for the test file name or test directory.
  - Review each PR's changes to assess whether they address the same flake pattern with reasonable confidence. If so, stop and report it to the user rather than opening a duplicate fix.
  - If the PR only partially addresses the flake or targets a different root cause, note it and proceed with investigation.
- Check recent commits on main that touch the failing test file or its surrounding code:
  - Run git log --oneline -20 origin/main -- <test-file-path>, and also check the parent directory or related source files.
  - Read the commit messages. If one clearly fixes the same flake pattern, stop and report it to the user.
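The searches above can be assembled as argument lists. This is a minimal sketch that returns them rather than running them, so they can be inspected or passed to a process runner such as Node's `child_process.execFile`. `--label`, `--state`, `--search`, and `--json` are real `gh pr list` flags. GitHub search matches PR titles, bodies, and comments rather than changed files, so each hit still needs its diff reviewed, for example with `gh pr diff`. `splitPath`, `existingFixCommands`, and the example path are hypothetical.

```typescript
const FIX_LABEL = "flaky-test-fix";

// Splits a repo-relative POSIX path into its directory and file name.
function splitPath(path: string): { dir: string; base: string } {
  const i = path.lastIndexOf("/");
  return i === -1 ? { dir: ".", base: path } : { dir: path.slice(0, i), base: path.slice(i + 1) };
}

// Returns argv arrays for the existing-fix searches, without running them.
function existingFixCommands(testFilePath: string): string[][] {
  const { dir, base } = splitPath(testFilePath);
  const prSearch = (query: string): string[] => [
    "gh", "pr", "list",
    "--label", FIX_LABEL,
    "--state", "open",
    "--search", query,
    "--json", "number,title,url",
  ];
  const mainLog = (pathspec: string): string[] => [
    "git", "log", "--oneline", "-20", "origin/main", "--", pathspec,
  ];
  return [
    prSearch(base),        // open labeled PRs mentioning the test file name
    prSearch(dir),         // open labeled PRs mentioning the test directory
    mainLog(testFilePath), // recent commits on main touching the test file
    mainLog(dir),          // recent commits on main touching its parent directory
  ];
}
```

For a hypothetical `apps/web/e2e/checkout.spec.ts`, the third entry joins to `git log --oneline -20 origin/main -- apps/web/e2e/checkout.spec.ts`, the command given in the list.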
If an existing fix is found, report:
- The PR number/URL or commit hash
- A brief summary of what it addresses
- Whether it fully covers the current flake or only partially
If no existing fix is found, proceed to Phase 2.
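The report and the stop-or-continue rule can be captured in a small type. This is a sketch only: the `ExistingFixReport` shape and the `formatReport` and `shouldStop` helpers are hypothetical. It reflects the rules above. A fix that fully covers the flake ends the run. A partial match is reported and investigation continues.

```typescript
type Coverage = "full" | "partial";

// Hypothetical shape for the report described above.
interface ExistingFixReport {
  ref: string;        // PR number/URL or commit hash
  summary: string;    // what the existing fix addresses
  coverage: Coverage; // whether it covers the current flake fully or partially
}

function formatReport(r: ExistingFixReport): string {
  const coverageText =
    r.coverage === "full"
      ? "fully covers the current flake"
      : "only partially covers the current flake";
  return `Existing fix: ${r.ref}\nSummary: ${r.summary}\nCoverage: ${coverageText}`;
}

// Stop only when a found fix fully covers the flake; otherwise go on to Phase 2.
function shouldStop(r: ExistingFixReport | undefined): boolean {
  return r !== undefined && r.coverage === "full";
}
```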
Phase 2: Produce a Plan
Follow references/plan.md. It walks through investigation, diagnosis, evidence gathering, and the fix decision tree, and produces a structured plan with a confidence score.
If the plan's confidence is less than 5/5, it must include the frontend and/or backend observability changes needed to reach 5/5 confidence next time. The plan may request changes across multiple repositories; assume you have access to all of them.
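The confidence rule can be checked mechanically before the plan is handed off. This sketch assumes a plan shape of its own. The real structure is defined in references/plan.md. `Plan`, `ObservabilityChange`, and `validatePlan` are hypothetical names, and the 1 to 5 integer scale is an assumption read from the "5/5" notation.

```typescript
// Hypothetical plan shape; the real structure lives in references/plan.md.
interface Plan {
  confidence: number; // assumed integer scale, 1 to 5
  observabilityChanges: ObservabilityChange[];
}

interface ObservabilityChange {
  side: "frontend" | "backend";
  repo: string;        // plans may span multiple repositories
  description: string;
}

// Returns rule violations; an empty array means the plan passes the check.
function validatePlan(plan: Plan): string[] {
  const problems: string[] = [];
  if (!Number.isInteger(plan.confidence) || plan.confidence < 1 || plan.confidence > 5) {
    problems.push("confidence must be an integer from 1 to 5");
  }
  if (plan.confidence < 5 && plan.observabilityChanges.length === 0) {
    problems.push("confidence below 5/5 requires at least one observability change");
  }
  return problems;
}
```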
If you are in plan mode, present the plan and stop here.
Phase 3: Apply the Plan (fix mode only)
Follow references/fix.md. It takes the plan from Phase 2, applies the proposed fix, searches for sibling anti-patterns, and verifies the result. PR creation is out of scope: if the user later opens a PR (or invokes a PR-shipping skill), label it flaky-test-fix.