Verify LLM Artifacts Findings
Second-pass verification for .beagle/llm-artifacts-review.json. The detection pass optimizes for recall; this pass optimizes for precision so agents do not remove or “clean” code that is still required.
When to run
- After /beagle-core:review-llm-artifacts (especially full-project scans).
- Before /beagle-core:fix-llm-artifacts when findings include deletions, dead code, or High risk.
- Whenever past runs flagged artifacts that should not have been removed.
Inputs
- Required: .beagle/llm-artifacts-review.json from a completed review.
- Optional: $ARGUMENTS, which accepts --priority-only (verify dead_code and any fix_action of delete first; then others) and --id N (single finding id).
If the review file is missing, exit with: Run /beagle-core:review-llm-artifacts first.
Prerequisite skills
- Load Skill(skill: "beagle-core:review-verification-protocol") for general anti-false-positive discipline.
- Load Skill(skill: "beagle-core:llm-artifacts-detection") for the category criteria that define a real issue.
Instructions
Hard gates
Objective pass conditions before you claim verification is done:
- Input parse: The JSON load command in step 1 exits 0 (no traceback). Pass: valid JSON on disk at .beagle/llm-artifacts-review.json.
- Evidence before verdict: For each finding you adjudicate, you have applied references/verification-checklist.md for its category (or documented why the category is N/A) and recorded matching strings in checks_performed. Pass: no status without at least one checklist-backed check or an explicit N/A note in notes.
- Output contract: After writing .beagle/llm-artifacts-verification.json, the validate command in step 4 exits 0; summary counts equal the number of results entries by status; every id matches the source report. Pass: schema-valid JSON and consistent ids/counts.
1. Load and validate JSON
python3 -c "import json; json.load(open('.beagle/llm-artifacts-review.json'))"
Pass: command exits 0.
Record git_head and scope from the report. If the working tree no longer matches (optional strict mode: compare to git rev-parse HEAD), warn that line numbers may drift.
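The load and drift check above could be scripted roughly as follows. This is a minimal sketch, assuming git_head is a top-level key in the report; the function names (load_review, current_head, drift_warning) are hypothetical and not part of any beagle-core skill.

```python
import json
import subprocess
from pathlib import Path

REVIEW_PATH = Path(".beagle/llm-artifacts-review.json")


def load_review(path=REVIEW_PATH):
    """Load the review report, or exit with the documented message if it is missing."""
    if not Path(path).is_file():
        raise SystemExit("Run /beagle-core:review-llm-artifacts first.")
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)  # raises json.JSONDecodeError on invalid JSON


def current_head():
    """Return the current commit hash via `git rev-parse HEAD` (strict mode only)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def drift_warning(report, head):
    """Return a warning when the report's git_head differs from head, else None."""
    recorded = report.get("git_head")  # assumed top-level key
    if recorded and recorded != head:
        return (
            f"Report was made at {recorded} but HEAD is {head}; "
            "line numbers may drift."
        )
    return None
```

In strict mode the pieces combine as drift_warning(load_review(), current_head()); outside strict mode the git call can be skipped entirely.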
2. Order findings
Default order:
- category == "dead_code" or fix_action == "delete" or risk == "High"
- Remaining findings by (risk descending, id ascending)
With --priority-only, stop after processing category dead_code and all fix_action: delete (still write full output for those processed).
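One way to express this ordering is sketched below. Only the risk value "High" appears in this document, so the "Medium" and "Low" values in RISK_RANK are assumptions, as is sorting the first group by the same (risk, id) key. The helper names are hypothetical.

```python
# Assumed risk vocabulary: only "High" is named in this document.
RISK_RANK = {"High": 3, "Medium": 2, "Low": 1}


def is_priority(finding):
    """True for the first group: dead code, deletions, or High risk."""
    return (
        finding.get("category") == "dead_code"
        or finding.get("fix_action") == "delete"
        or finding.get("risk") == "High"
    )


def by_risk_then_id(finding):
    """Sort key: risk descending (unknown risk last), then id ascending."""
    return (-RISK_RANK.get(finding.get("risk"), 0), finding["id"])


def order_findings(findings, priority_only=False):
    """Return findings in verification order."""
    first = sorted((f for f in findings if is_priority(f)), key=by_risk_then_id)
    if priority_only:
        # --priority-only stops after dead_code and fix_action == "delete".
        return [
            f for f in first
            if f.get("category") == "dead_code" or f.get("fix_action") == "delete"
        ]
    rest = sorted((f for f in findings if not is_priority(f)), key=by_risk_then_id)
    return first + rest
```

Note that under --priority-only a finding that is in the first group only because of risk == "High" is not processed, which follows the wording of the paragraph above.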
3. Verify each finding
For each finding, follow references/verification-checklist.md.
Minimum evidence per finding:
- Read the file at the cited location and enough context to judge (parent symbol, imports).
- For unused/dead claims: search the repo (symbols, exports, string hooks) unless the issue is purely stylistic with no removal.
Pass: checks_performed lists only checks you actually ran (e.g. read_symbol, ripgrep_symbol); notes cite the decisive observation.
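For the repo search on unused/dead claims, a plain-Python stand-in is sketched below. It is not ripgrep and not part of the checklist; find_references and its skip_dirs default are hypothetical. If a search like this is what actually ran, record a check name that reflects it rather than ripgrep_symbol.

```python
import re
from pathlib import Path


def find_references(symbol, root=".", skip_dirs=(".git", "node_modules", ".beagle")):
    """Return (path, line_number, line) for each whole-word match of symbol under root."""
    pattern = re.compile(rf"\b{re.escape(symbol)}\b")
    root_path = Path(root)
    hits = []
    for path in sorted(root_path.rglob("*")):
        if not path.is_file():
            continue
        # Skip directories that never hold first-party source.
        if any(part in skip_dirs for part in path.relative_to(root_path).parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

A single hit at the definition site supports a dead-code claim but does not prove it: names built at runtime or referenced from another repository will not appear in a text search, which is when inconclusive is the safer status.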
Assign one status:
| status | Meaning |
|---|---|
| confirmed_issue | The finding is valid; acting on it is appropriate. |
| false_positive | The finding should be discarded; do not auto-fix. |
| inconclusive | Needs human or product context; treat like risky in fix-llm-artifacts. |
Set confidence: high | medium | low based on how direct the evidence was.
4. Write output
Create .beagle if needed. Write .beagle/llm-artifacts-verification.json:
{
  "version": "1.0.0",
  "created_at": "2026-04-19T12:00:00Z",
  "source_report": ".beagle/llm-artifacts-review.json",
  "source_git_head": "<from review>",
  "review_scope": "all|changed",
  "results": [
    {
      "id": 1,
      "status": "confirmed_issue|false_positive|inconclusive",
      "confidence": "high|medium|low",
      "checks_performed": ["read_symbol", "ripgrep_symbol", "export_trace"],
      "notes": "1-3 sentences of evidence"
    }
  ],
  "summary": {
    "confirmed_issue": 0,
    "false_positive": 0,
    "inconclusive": 0
  }
}
Validate the file you wrote:
python3 -c "import json; json.load(open('.beagle/llm-artifacts-verification.json'))"
Pass: command exits 0; re-open the file and confirm summary matches results (count each status).
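The json.load command only proves the file parses. The count and id checks from the output contract can be automated along these lines; this is a sketch that assumes findings is a top-level array in the review report, and contract_errors is a hypothetical helper name.

```python
from collections import Counter

STATUSES = ("confirmed_issue", "false_positive", "inconclusive")


def contract_errors(verification, review):
    """Return a list of problems; an empty list means the output contract holds."""
    errors = []
    results = verification.get("results", [])
    counts = Counter(r.get("status") for r in results)

    # Every status must be one of the three documented values.
    for status in counts:
        if status not in STATUSES:
            errors.append(f"unknown status: {status!r}")

    # summary must equal the per-status count of results entries.
    summary = verification.get("summary", {})
    for status in STATUSES:
        expected = counts.get(status, 0)
        if summary.get(status) != expected:
            errors.append(
                f"summary[{status!r}] is {summary.get(status)!r}, "
                f"but results contain {expected}"
            )

    # Every result id must exist in the source report (assumed key: findings).
    source_ids = {f["id"] for f in review.get("findings", [])}
    for r in results:
        if r.get("id") not in source_ids:
            errors.append(f"result id {r.get('id')!r} not in source report")
    return errors
```

Load both JSON files, pass the parsed objects in, and treat any non-empty return value as a failed gate.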
5. Summarize for the user
Print a short markdown table: id, category, original one-line description, verdict, confidence.
End with:
- Counts of confirmed vs false positive vs inconclusive.
- Recommendation: run fix-llm-artifacts only on confirmed (see that skill when verification file is present).
Rules
- Do not invent new issues; only adjudicate existing findings[] entries.
- Prefer inconclusive over confirmed_issue when removal could break dynamic or cross-repo usage.
- Preserve finding id values exactly as in the source report.
Integration
- fix-llm-artifacts: When .beagle/llm-artifacts-verification.json exists, use it to skip false_positive ids and to treat inconclusive like risky fixes.
- fix_action custody: The fix_action field (refactor / delete / simplify / extract) is emitted by review-llm-artifacts and consumed by fix-llm-artifacts as a risk gate; verification carries it through unchanged and does not re-validate it.