PR Review — 3-Phase Orchestrator
End-to-end PR review workflow that orchestrates three phases (Pre-Flight, Try-Fix, Report) to explore independent fix alternatives and produce a review recommendation.
Trigger phrases: "review PR #XXXXX", "work on PR #XXXXX", "fix issue #XXXXX"
🚨 NEVER use gh pr review --approve or --request-changes. AI agents must NEVER post review comments.
🚨 DO NOT post any comments to the PR. This skill only produces output files in CustomAgentLogsTmp/PRState/.
Overview
Gate (pre-run) → Already completed by Review-PR.ps1 before this skill runs
Phase 1: Pre-Flight → Gather context, classify files, code review → .github/pr-review/pr-preflight.md
Phase 2: Try-Fix → ⚠️ MANDATORY multi-model exploration → invoke try-fix skill (×4 models)
Phase 3: Report → Write review recommendation → .github/pr-review/pr-report.md
Gate and Branch setup are handled by Review-PR.ps1 before this skill is invoked. The gate result is passed in the prompt. Do NOT re-run gate verification.
All phases write output to: CustomAgentLogsTmp/PRState/{PRNumber}/PRAgent/{phase}/content.md
Pre-Flight also writes: CustomAgentLogsTmp/PRState/{PRNumber}/PRAgent/pre-flight/code-review.md
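The path convention above can be sketched as a small helper. This is only an illustration: the function names `phase_content_path` and `code_review_path` are hypothetical and not part of the skill or its scripts, and the phase folder names are taken from the Output Directory Structure section.

```python
from pathlib import Path

# Phase folder names as listed in the Output Directory Structure section.
PHASES = ("pre-flight", "gate", "try-fix", "report")


def phase_content_path(pr_number: int, phase: str,
                       root: str = "CustomAgentLogsTmp/PRState") -> Path:
    """Return the content.md path for one phase of one PR review."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    return Path(root) / str(pr_number) / "PRAgent" / phase / "content.md"


def code_review_path(pr_number: int,
                     root: str = "CustomAgentLogsTmp/PRState") -> Path:
    """Return the extra code-review.md file that Pre-Flight also writes."""
    return phase_content_path(pr_number, "pre-flight", root).with_name("code-review.md")
```

Rejecting unknown phase names keeps a typo from silently creating a folder that no later step reads.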
Critical Rules
- ❌ Never run git checkout or git switch to change branches — stay on the review branch set up by the caller
- ❌ Never stop and ask the user — use best judgment to skip blocked phases and continue
- ❌ Never mark a phase complete with pending fields
- ❌ Never skip Phase 2 multi-model exploration — it is MANDATORY for every review, no exceptions
- ❌ Never run git commands that change branch state during Phases 2-3 (scripts handle file manipulation)
- ❌ Never duplicate phase content — each phase writes ONLY to its own content.md. Do NOT copy gate results into try-fix or report content files.
- ✅ Always create CustomAgentLogsTmp/ output files for every phase
- ✅ Always include Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> in any commits
- ✅ Always use skills' scripts — don't bypass with manual commands
- ✅ Each phase's content.md must use the exact template from the phase instruction doc — no extra prose
Multi-Model Configuration
Phase 2 uses these 4 AI models (run SEQUENTIALLY — they modify the same files):
| Order | Model |
|---|---|
| 1 | claude-opus-4.6 |
| 2 | claude-opus-4.7 |
| 3 | gpt-5.3-codex |
| 4 | gpt-5.5 |
🚨 MANDATORY: Use mode: "sync" for ALL try-fix task invocations. Never use mode: "background". Background mode causes the orchestrator to move on before the attempt finishes, which means try-fix/content.md is never written and try-fix results are lost from the PR comment. Each try-fix task MUST complete and return its result before you proceed to the next attempt or to the Phase 3 completion checklist.
Environment Blockers
| Blocker Type | Max Retries | Then Do |
|---|---|---|
| Missing tool/driver | 1 install attempt | Skip phase, continue |
| Server errors (500, timeout) | 1 retry | Skip phase, continue |
| Port conflicts | 1 (kill process) | Skip phase, continue |
| Build failures in try-fix | 2 attempts | Skip remaining models, proceed to Report |
| Configuration issues | 1 fix attempt | Skip phase, continue |
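One way to read the blocker table is as a lookup from blocker type to a recovery budget and a fallback action. The sketch below assumes that "Max Retries" counts recovery attempts made after the first failure. The key names, `BlockerPolicy`, and `next_action` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockerPolicy:
    max_recovery_attempts: int  # recovery attempts allowed after the first failure
    then: str                   # what to do once the budget is spent


# Values transcribed from the Environment Blockers table; key names are made up.
POLICIES = {
    "missing_tool": BlockerPolicy(1, "skip phase, continue"),
    "server_error": BlockerPolicy(1, "skip phase, continue"),
    "port_conflict": BlockerPolicy(1, "skip phase, continue"),
    "try_fix_build_failure": BlockerPolicy(2, "skip remaining models, proceed to Report"),
    "configuration": BlockerPolicy(1, "skip phase, continue"),
}


def next_action(blocker: str, recovery_attempts_so_far: int) -> str:
    """Return "retry" while budget remains, otherwise the table's fallback."""
    policy = POLICIES[blocker]
    if recovery_attempts_so_far < policy.max_recovery_attempts:
        return "retry"
    return policy.then
```

For example, `next_action("server_error", 1)` yields the skip action because the single retry has already been used.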
Phase 1: Pre-Flight
Read and follow .github/pr-review/pr-preflight.md
Gather context from the issue, PR, comments, classify changed files, and perform a deep code review using the code-review skill.
Pre-Flight now has two parts:
- Part A (Steps 1–6): Context gathering — read issue, PR, comments, classify files
- Part B (Step 7): Code review — independence-first code analysis using .github/skills/code-review/SKILL.md and .github/skills/code-review/references/review-rules.md
Outputs:
- pre-flight/content.md — Context + code review summary
- pre-flight/code-review.md — Full code-review output (findings, blast radius, failure-mode probes, verdict)
Gate: None — always runs.
Why code review runs here: The code-review findings (❌ Errors, ⚠️ Warnings, failure-mode probes, blast radius) become structured hints for Phase 2 (Try-Fix). Instead of each model starting from scratch, they receive concrete code concerns to address, leading to higher-quality fix exploration.
Phase 2: Try-Fix → Invoke try-fix Skill (×4 Models)
Read and follow .github/skills/try-fix/SKILL.md
⚠️ THIS PHASE IS MANDATORY. YOU MUST NEVER SKIP IT. NO EXCEPTIONS.
Even if the PR's fix looks correct and Gate passed, you MUST still run all 4 models to explore alternative approaches. The purpose is to find the BEST fix, not just validate one.
🚨 CRITICAL: try-fix is Independent of PR's Fix
"Independent" means each model explores a different fix approach from the PR's fix — not that models are isolated from code-review context. Code-review findings are provided as advisory background to improve fix quality.
The purpose is NOT to re-test the PR's fix, but to:
- Generate independent fix ideas — What would YOU do to fix this bug?
- Test those ideas empirically — Actually implement and run tests
- Compare with PR's fix — Is there a simpler/better alternative?
- Learn from failures — Record WHY failed attempts didn't work
Checklist (you MUST complete ALL of these)
- Attempt 1 launched with claude-opus-4.6
- try-fix/content.md updated with attempt 1 result
- Attempt 2 launched with claude-opus-4.7
- try-fix/content.md updated with attempt 2 result
- Attempt 3 launched with gpt-5.3-codex
- try-fix/content.md updated with attempt 3 result
- Attempt 4 launched with gpt-5.5
- try-fix/content.md updated with attempt 4 result
- Cross-pollination round completed (all models queried)
- Best fix selected with comparison table
Round 1: Independent Exploration
For each model, invoke try-fix skill via a general-purpose task agent with that model:
prompt: |
  Invoke the try-fix skill for PR #XXXXX:
  - problem: {bug description from Pre-Flight}
  - platform: {platform from Platform Selection}
  - test_command: {test command from detected test type — use BuildAndRunHostApp.ps1 for UITest, Run-DeviceTests.ps1 for DeviceTest, dotnet test for UnitTest}
  - target_files:
    - src/{area}/{file1}.cs
    - src/{area}/{file2}.cs
  - hints: |
      Code review found the following concerns (advisory — use to inform your approach, not as a checklist):
      Errors:
      - {❌ Error finding 1 with file:line reference}
      # Include warnings ONLY if relevant to the root cause:
      # Warnings:
      # - {⚠️ Warning — omit if unrelated to root cause}
      Failure modes:
      - {Failure mode 1}: {What happens in this scenario}
      Blast radius: {Summary — e.g., "Runs for ALL toolbar items at startup, not just badged ones"}
      Code review verdict: {LGTM / NEEDS_CHANGES / NEEDS_DISCUSSION} (confidence: {high/medium/low})

  Generate ONE independent fix idea. Review the PR's fix first to ensure your approach is DIFFERENT.
  "Independent" means exploring a different fix approach — the code review context above is background
  information to help you make better decisions, not a constraint on your exploration.
Include code review context in the hints field (try-fix's documented optional input). If Pre-Flight code review found no issues, use hints: "Code review found no issues (verdict: LGTM)". If code review was SKIPPED, omit the hints field entirely.
Selectivity: Only include ❌ Error findings and failure-mode probes that are relevant to the bug being fixed. Omit 💡 Suggestions. Include ⚠️ Warnings only if directly related to the root cause.
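The hints rules above (omit on SKIPPED, fixed string when nothing was found, drop suggestions, keep warnings only when tied to the root cause) could be expressed roughly as follows. The finding dictionaries, their `severity`, `relevant`, and `root_cause` keys, and the `build_hints` name are assumptions made for this sketch. The actual code-review output format is not specified here.

```python
from typing import Optional


def build_hints(verdict: str, confidence: str, findings: list,
                failure_modes: tuple = (), blast_radius: str = "") -> Optional[str]:
    """Return the hints text for try-fix, or None when hints must be omitted."""
    if verdict == "SKIPPED":
        return None  # code review did not run: leave the hints field out entirely

    # Keep errors relevant to the bug; keep warnings only if tied to the root cause.
    errors = [f["text"] for f in findings
              if f["severity"] == "error" and f.get("relevant", True)]
    warnings = [f["text"] for f in findings
                if f["severity"] == "warning" and f.get("root_cause", False)]
    # Suggestions are never forwarded.

    if verdict == "LGTM" and not (errors or warnings or failure_modes):
        return "Code review found no issues (verdict: LGTM)"

    lines = ["Code review found the following concerns (advisory, not a checklist):"]
    if errors:
        lines.append("Errors:")
        lines += [f"- {e}" for e in errors]
    if warnings:
        lines.append("Warnings:")
        lines += [f"- {w}" for w in warnings]
    if failure_modes:
        lines.append("Failure modes:")
        lines += [f"- {m}" for m in failure_modes]
    if blast_radius:
        lines.append(f"Blast radius: {blast_radius}")
    lines.append(f"Code review verdict: {verdict} (confidence: {confidence})")
    return "\n".join(lines)
```

Returning `None` for the SKIPPED case lets the caller distinguish "omit the field" from "send an empty string".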
Wait for each to complete before starting the next.
🧹 MANDATORY: Clean up between attempts:
# Restore baseline from previous attempt — this is the ONLY way to restore.
# Do NOT use manual git checkout/restore/reset commands.
pwsh .github/scripts/EstablishBrokenBaseline.ps1 -Restore
📝 MANDATORY: Update try-fix/content.md after EVERY attempt. Do not wait until all attempts are done. After each try-fix attempt completes (pass or fail), immediately write/update CustomAgentLogsTmp/PRState/{PRNumber}/PRAgent/try-fix/content.md with all results so far. This ensures the PR comment always reflects the latest try-fix state, even if a later attempt times out or the agent is interrupted.
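The Round 1 rhythm (run one attempt to completion, record it immediately, restore the baseline, move on) might look like the loop below. The three callables are hypothetical stand-ins: `run_attempt` for a synchronous try-fix task, `write_summary` for rewriting try-fix/content.md, and `restore_baseline` for `pwsh .github/scripts/EstablishBrokenBaseline.ps1 -Restore`.

```python
from typing import Callable

MODELS = ["claude-opus-4.6", "claude-opus-4.7", "gpt-5.3-codex", "gpt-5.5"]


def run_round_one(run_attempt: Callable[[int, str], str],
                  restore_baseline: Callable[[], None],
                  write_summary: Callable[[list], None]) -> list:
    """Run one try-fix attempt per model, strictly one after another."""
    results = []
    for number, model in enumerate(MODELS, start=1):
        # Blocks until the attempt has finished (the mode: "sync" behaviour).
        outcome = run_attempt(number, model)
        results.append({"attempt": number, "model": model, "result": outcome})
        # Persist everything so far before anything else can go wrong.
        write_summary(results)
        # Undo this attempt's file changes before the next model starts.
        restore_baseline()
    return results
```

Writing the summary inside the loop, not after it, is what keeps earlier results safe if a later attempt times out.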
Round 2+: Cross-Pollination (MANDATORY)
After Round 1, invoke EACH model via task agent:
"Review PR #XXXXX fix attempts:
- Attempt 1: {approach} - ✅/❌
- Attempt 2: {approach} - ✅/❌
...
Do you have any NEW fix ideas? Reply: 'NEW IDEA: {desc}' or 'NO NEW IDEAS'"
Run any new ideas as additional try-fix attempts. Repeat until all say "NO NEW IDEAS" (max 3 rounds).
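The stopping rule can be sketched as a bounded loop. This assumes "max 3 rounds" limits the number of cross-pollination rounds and that any reply not starting with `NEW IDEA:` counts as no new idea. `ask` and `run_idea` are hypothetical callables standing in for the task-agent query and the extra try-fix attempt.

```python
from typing import Callable

MAX_ROUNDS = 3


def cross_pollinate(models: list,
                    ask: Callable[[str, list], str],
                    run_idea: Callable[[str, str], str],
                    attempts: list) -> bool:
    """Return True if every model reported NO NEW IDEAS within MAX_ROUNDS."""
    for _ in range(MAX_ROUNDS):
        new_ideas = []
        for model in models:  # every model is queried in every round
            reply = ask(model, attempts).strip()
            if reply.startswith("NEW IDEA:"):
                new_ideas.append((model, reply[len("NEW IDEA:"):].strip()))
        if not new_ideas:
            return True  # exhausted: nobody proposed anything new
        for model, idea in new_ideas:
            outcome = run_idea(model, idea)  # runs as an additional try-fix attempt
            attempts.append(f"{idea} - {outcome}")
    return False  # round limit reached while ideas were still coming
```

The boolean maps onto the "Exhausted: Yes/No" field of the try-fix output file.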
Selecting the Best Fix
Compare all passing candidates on:
- Must pass tests — Only consider ✅ PASS candidates
- Simplest solution — Fewer files, fewer lines
- Most robust — Handles edge cases
- Matches codebase style — Consistent with existing patterns
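The two mechanical criteria (passing tests, then smallest change) can be sketched as a filter followed by a sort key. Robustness and style fit are judgment calls that this sketch deliberately leaves to the reviewer. `Candidate` and `select_best` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    passed: bool
    files_changed: int
    lines_changed: int


def select_best(candidates: list) -> Optional[Candidate]:
    """Pick the passing candidate touching the fewest files, then fewest lines."""
    passing = [c for c in candidates if c.passed]
    if not passing:
        return None  # nothing passed, so there is nothing to recommend
    return min(passing, key=lambda c: (c.files_changed, c.lines_changed))
```

A shortlist produced this way still needs a manual pass for edge-case handling and consistency with existing patterns.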
Output File
mkdir -p CustomAgentLogsTmp/PRState/{PRNumber}/PRAgent/try-fix
Write content.md:
### Fix Candidates
| # | Source | Approach | Test Result | Files Changed | Notes |
|---|--------|----------|-------------|---------------|-------|
| 1 | try-fix | {approach} | ✅/❌ | 1 file | {insight} |
| ... | ... | ... | ... | ... | ... |
| PR | PR #XXXXX | {approach} | ✅ PASSED (Gate) | 2 files | Original PR |
### Cross-Pollination
| Model | Round | New Ideas? | Details |
|-------|-------|------------|---------|
| ... | 2 | Yes/No | {idea or "NO NEW IDEAS"} |
**Exhausted:** {Yes/No}
**Selected Fix:** {PR's fix / Candidate #N} — {Reason}
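As an illustration of filling the Fix Candidates table, the sketch below renders rows from dictionaries. The row keys (`id`, `source`, `approach`, `result`, `files`, `notes`) and the function names are assumptions. Only the heading and column titles come from the template above.

```python
def _cell(value) -> str:
    """Escape pipes so free-text values cannot break the table layout."""
    return str(value).replace("|", "\\|")


def render_fix_candidates(rows: list) -> str:
    """Render the Fix Candidates section of try-fix/content.md."""
    lines = [
        "### Fix Candidates",
        "| # | Source | Approach | Test Result | Files Changed | Notes |",
        "|---|--------|----------|-------------|---------------|-------|",
    ]
    for r in rows:
        cells = (r["id"], r["source"], r["approach"], r["result"], r["files"], r["notes"])
        lines.append("| " + " | ".join(_cell(c) for c in cells) + " |")
    return "\n".join(lines)
```

Regenerating the whole table from the accumulated rows after each attempt avoids partial or duplicated entries.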
Common Mistakes
- ❌ Reusing or lightly tweaking the PR's fix: review it only to confirm that your approach is different
- ❌ Running try-fix in parallel — SEQUENTIAL ONLY, always mode: "sync"
- ❌ Using mode: "background" for try-fix tasks — results will be lost
- ❌ Skipping cleanup between attempts — ALWAYS run cleanup commands
- ❌ Declaring exhaustion without querying all 4 models
Phase 3: Report
Read and follow .github/pr-review/pr-report.md
Deliver the final review recommendation.
🚨 DO NOT post any comments. All output goes to CustomAgentLogsTmp/PRState/.
Gate: Phases 1-2 must be complete.
Output Directory Structure (MANDATORY)
CustomAgentLogsTmp/PRState/{PRNumber}/PRAgent/
├── pre-flight/
│   ├── content.md                   # Phase 1 output (context + code review summary)
│   └── code-review.md               # Full code-review skill output (findings, blast radius, verdict)
├── gate/
│   └── content.md                   # Gate output (pr-gate, run separately)
├── try-fix/
│   ├── content.md                   # Phase 2 summary
│   └── attempt-{N}/                 # Per-model attempt
│       ├── baseline.log             # Baseline establishment proof
│       ├── approach.md              # What was tried
│       ├── result.txt               # Pass / Fail / Blocked
│       ├── fix.diff                 # git diff of changes
│       ├── test-output.log          # Full test command output
│       ├── reviewer-findings.json   # Inline expert self-review (`[]` if clean) — reflects the FINAL diff (refreshed by Step 7.5 if test loop modified code)
│       ├── reviewer-findings.diff   # Snapshot of the diff that the self-review evaluated (used by Step 7.5 to detect drift)
│       └── analysis.md              # Why it worked/failed + self-review summary
└── report/
    └── content.md                   # Phase 3 output (pr-report)
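A quick completeness check for one attempt folder could look like this. It assumes every file listed under attempt-{N}/ is required, which the tree implies but does not state outright. `missing_attempt_files` is a hypothetical helper.

```python
from pathlib import Path

# File names copied from the attempt-{N}/ entry in the tree above.
REQUIRED_ATTEMPT_FILES = (
    "baseline.log",
    "approach.md",
    "result.txt",
    "fix.diff",
    "test-output.log",
    "reviewer-findings.json",
    "reviewer-findings.diff",
    "analysis.md",
)


def missing_attempt_files(attempt_dir: Path) -> list:
    """Return the expected file names that are absent from attempt_dir."""
    return [name for name in REQUIRED_ATTEMPT_FILES
            if not (attempt_dir / name).is_file()]
```

An empty list means the attempt folder matches the documented layout.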
Quick Reference
| Phase | Instructions | Key Action | If Blocked |
|---|---|---|---|
| Gate (pre-run) | pr-gate.md | Verify tests (run by Review-PR.ps1) | Result passed in prompt — if missing, document and continue |
| 1. Pre-Flight | pr-preflight.md | Read issue + PR context + code review | Skip missing info; if code review fails, set verdict to SKIPPED |
| 2. Try-Fix | try-fix skill (×4) | 4-model exploration with code-review hints (MANDATORY) | Skip failing models, continue |
| 3. Report | pr-report.md | Write review recommendation | Never skip |
Common Errors and Recovery
| Error | Cause | Fix |
|---|---|---|
| ENOENT: no such file on skill | Dirty working tree from prior attempt | Run cleanup: -Restore + git checkout HEAD -- . + git clean -fd --exclude=CustomAgentLogsTmp/ |
| Dirty working tree before attempt | Prior attempt didn't restore | Same cleanup as above |
| Build errors in unmodified files | Stale state | Cleanup + retry; if still fails, treat as environment blocker |