Verify LLM Artifacts Findings

Second-pass verification for .beagle/llm-artifacts-review.json. The detection pass optimizes for recall; this pass optimizes for precision so agents do not remove or “clean” code that is still required.

When to run

After /beagle-core:review-llm-artifacts (especially full-project scans).
Before /beagle-core:fix-llm-artifacts when findings include deletions, dead code, or High risk.
Whenever past runs flagged artifacts that should not have been removed.

Inputs

Required: .beagle/llm-artifacts-review.json from a completed review.
Optional: $ARGUMENTS, accepting --priority-only (verify only dead_code findings and any finding whose fix_action is delete, then stop) and --id N (verify the single finding with that id).

If the review file is missing, exit with: Run /beagle-core:review-llm-artifacts first.

Prerequisite skills

Load Skill(skill: "beagle-core:review-verification-protocol") — general anti–false-positive discipline.
Load Skill(skill: "beagle-core:llm-artifacts-detection") — category criteria for what counts as a real issue.

Instructions

Hard gates

Objective pass conditions before you claim verification is done:

Input parse: The JSON load command in step 1 exits 0 (no traceback). Pass: valid JSON on disk at .beagle/llm-artifacts-review.json.
Evidence before verdict: For each finding you adjudicate, you have applied the part of references/verification-checklist.md that covers its category (or documented why the category is N/A) and recorded each check you ran as a string in checks_performed. Pass: no status is assigned without at least one checklist-backed check or an explicit N/A note in notes.
Output contract: After writing .beagle/llm-artifacts-verification.json, the validate command in step 4 exits 0; summary counts equal the number of results entries by status; every id matches the source report. Pass: schema-valid JSON and consistent ids/counts.

1. Load and validate JSON

python3 -c "import json; json.load(open('.beagle/llm-artifacts-review.json'))"

Pass: command exits 0.

Record git_head and scope from the report. If the working tree no longer matches the recorded git_head (optional strict mode: compare it to the output of git rev-parse HEAD), warn that line numbers may have drifted.
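The strict-mode comparison could be sketched as follows. This is a minimal illustration, assuming git_head is a top-level key of the review report; the helper names current_head and drift_warning are hypothetical and not part of any beagle skill.

```python
import subprocess


def current_head(repo_dir="."):
    """Return the commit hash that HEAD points at, via `git rev-parse HEAD`."""
    out = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def drift_warning(report, head):
    """Return a warning string if the report was made at another commit, else None.

    Assumes the report stores its commit under a top-level "git_head" key.
    """
    recorded = report.get("git_head")
    if not recorded:
        return "Review report has no git_head; cannot check for drift."
    if recorded != head:
        return (
            f"Review ran at {recorded[:12]} but HEAD is {head[:12]}; "
            "line numbers may have drifted."
        )
    return None
```

A caller would load the review JSON, then print drift_warning(report, current_head()) when it is not None. Comparing commit hashes does not catch uncommitted edits, so the warning is a lower bound on drift.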

2. Order findings

Default order:

First: findings where category == "dead_code" or fix_action == "delete" or risk == "High"
Then: remaining findings sorted by (risk descending, id ascending)

With --priority-only, stop after processing every finding whose category is dead_code or whose fix_action is delete, and still write the full output file for the findings you processed.
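One way to express this ordering in code is sketched below. The risk labels other than "High" and the sort order inside the priority tier are assumptions, since the text above does not specify them; order_findings is a hypothetical helper.

```python
# Assumed risk labels: only "High" is named in this document.
RISK_RANK = {"High": 0, "Medium": 1, "Low": 2}


def is_priority(finding):
    """True for findings in the first tier of the default order."""
    return (
        finding.get("category") == "dead_code"
        or finding.get("fix_action") == "delete"
        or finding.get("risk") == "High"
    )


def by_risk_then_id(finding):
    """Sort key: risk descending (unknown labels last), then id ascending."""
    return (RISK_RANK.get(finding.get("risk"), len(RISK_RANK)), finding["id"])


def order_findings(findings, priority_only=False):
    """Return findings in verification order.

    Assumption: the priority tier is itself sorted by risk, then id.
    """
    priority = sorted((f for f in findings if is_priority(f)), key=by_risk_then_id)
    if priority_only:
        # --priority-only covers dead_code and delete findings only.
        return [
            f for f in priority
            if f.get("category") == "dead_code" or f.get("fix_action") == "delete"
        ]
    rest = sorted((f for f in findings if not is_priority(f)), key=by_risk_then_id)
    return priority + rest
```

Unknown risk labels sort last, so a report with unexpected values still produces a stable order.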

3. Verify each finding

For each finding, follow references/verification-checklist.md.

Minimum evidence per finding:

Read the file at the cited location and enough context to judge (parent symbol, imports).
For unused/dead claims: search the repo (symbols, exports, string hooks) unless the issue is purely stylistic with no removal.

Pass: checks_performed lists only checks you actually ran (e.g. read_symbol, ripgrep_symbol); notes cite the decisive observation.
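Where ripgrep is unavailable, the repo search for an unused/dead claim can be approximated in plain Python. The sketch below is a stand-in, not the ripgrep_symbol check itself; find_symbol and its default suffix list are hypothetical.

```python
import re
from pathlib import Path


def find_symbol(root, symbol, suffixes=(".py", ".js", ".ts")):
    """Return (path, line_number, line) for each whole-word occurrence of symbol.

    Matches inside string literals are included on purpose, because
    string hooks (registries, getattr lookups) can keep a symbol alive.
    """
    pattern = re.compile(rf"\b{re.escape(symbol)}\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

A symbol whose only hit is its own definition supports a dead-code claim; any other hit is a reason to prefer false_positive or inconclusive. Record the check in checks_performed only if you actually ran it.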

Assign one status:

| `status` | Meaning |
| --- | --- |
| `confirmed_issue` | The finding is valid; acting on it is appropriate. |
| `false_positive` | The finding should be discarded; do not auto-fix. |
| `inconclusive` | Needs human or product context; treat like risky in `fix-llm-artifacts`. |

Set confidence: high | medium | low based on how direct the evidence was.

4. Write output

Create .beagle if needed. Write .beagle/llm-artifacts-verification.json:

{
  "version": "1.0.0",
  "created_at": "2026-04-19T12:00:00Z",
  "source_report": ".beagle/llm-artifacts-review.json",
  "source_git_head": "<from review>",
  "review_scope": "all|changed",
  "results": [
    {
      "id": 1,
      "status": "confirmed_issue|false_positive|inconclusive",
      "confidence": "high|medium|low",
      "checks_performed": ["read_symbol", "ripgrep_symbol", "export_trace"],
      "notes": "1-3 sentences of evidence"
    }
  ],
  "summary": {
    "confirmed_issue": 0,
    "false_positive": 0,
    "inconclusive": 0
  }
}

Validate the file you wrote:

python3 -c "import json; json.load(open('.beagle/llm-artifacts-verification.json'))"

Pass: command exits 0; re-open the file and confirm summary matches results (count each status).
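The json.load command only proves the file parses. The count and id checks from the output contract could be automated along these lines; this is a sketch that assumes the review report keeps its findings in a top-level findings array, and check_verification is a hypothetical helper.

```python
from collections import Counter

STATUSES = ("confirmed_issue", "false_positive", "inconclusive")


def check_verification(verification, review):
    """Return a list of contract problems; an empty list means the checks pass."""
    problems = []
    results = verification.get("results", [])
    summary = verification.get("summary", {})

    # Count results by status and compare against the declared summary.
    counts = Counter(r.get("status") for r in results)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        problems.append(f"unknown status values: {sorted(map(str, unknown))}")
    for status in STATUSES:
        actual = counts.get(status, 0)
        if summary.get(status) != actual:
            problems.append(
                f"summary.{status} is {summary.get(status)}, results contain {actual}"
            )

    # Every result id must exist in the source report, with no duplicates.
    source_ids = {f["id"] for f in review.get("findings", [])}
    result_ids = [r.get("id") for r in results]
    stray = [i for i in result_ids if i not in source_ids]
    if stray:
        problems.append(f"ids not in source report: {stray}")
    if len(result_ids) != len(set(result_ids)):
        problems.append("duplicate ids in results")
    return problems
```

Load both files with json.load and treat a non-empty return value as a failed gate. The check allows results to cover only a subset of source ids, which is what --priority-only and --id N produce.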

5. Summarize for the user

Print a short markdown table: id, category, original one-line description, verdict, confidence.

End with:

Counts of confirmed vs false positive vs inconclusive.
Recommendation: run fix-llm-artifacts only on confirmed_issue findings (that skill consults the verification file when it is present).
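The summary table could be rendered with a small helper like the one below. It is a sketch only: the description field name on each finding is an assumption about the review report, and summary_table is hypothetical.

```python
def first_line(text):
    """Return the first line of text with table-breaking pipes escaped."""
    lines = str(text).splitlines()
    return lines[0].replace("|", "\\|") if lines else ""


def summary_table(findings, results):
    """Build the markdown summary table as one string.

    Assumes each finding carries its text under a "description" key.
    """
    by_id = {f["id"]: f for f in findings}
    rows = [
        "| id | category | description | verdict | confidence |",
        "| --- | --- | --- | --- | --- |",
    ]
    for result in results:
        finding = by_id.get(result["id"], {})
        rows.append(
            "| {id} | {category} | {description} | {verdict} | {confidence} |".format(
                id=result["id"],
                category=finding.get("category", ""),
                description=first_line(finding.get("description", "")),
                verdict=result["status"],
                confidence=result["confidence"],
            )
        )
    return "\n".join(rows)
```

Follow the printed table with the three counts taken from summary in the verification file.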

Rules

Do not invent new issues; only adjudicate existing findings[] entries.
Prefer inconclusive over confirmed_issue when removal could break dynamic or cross-repo usage.
Preserve finding id values exactly as in the source report.

Integration

fix-llm-artifacts: When .beagle/llm-artifacts-verification.json exists, use it to skip false_positive ids and to treat inconclusive ids like risky fixes.
fix_action custody: The fix_action field (refactor/delete/simplify/extract) is emitted by review-llm-artifacts and consumed by fix-llm-artifacts as a risk gate; verification carries it through unchanged and does not re-validate it.