False-Positive Reduction (hybrid joern + LLM)
Purpose
Transform a stream of unified findings into a disposition register that the exec-report-generator can trust. Every finding gets a verdict (true_positive | likely_true_positive | uncertain | likely_false_positive | false_positive), a reachability trace, an exploitability score, and a reachability_source tag (joern-cpg or llm-fallback).
The skill's job is to remove noise without suppressing real issues. False positives waste analyst attention; missed true positives get someone fired.
Six-stage rubric (applied in order; each stage can downgrade severity or change verdict)
Lifted from the opus_repo_scan_test reference's § analyze-11 framework with extensions for the disposition-register output format. Stage 0 is new: a self-adversarial pre-pass that sharpens Stage 1 and strengthens the audit trail.
Stage 0 — Devil's advocate
Question: What is the strongest argument that this finding is NOT a vulnerability?
The agent generates a counter-argument before applying the rubric. This is not a skip gate — all five subsequent stages still run. The purpose is twofold:
- Sharpen Stage 1: a strong counter-argument gives Stage 1 a concrete hypothesis to test (is the path actually dead / test-only?) rather than an open-ended search.
- Strengthen the audit trail: a
true_positivethat explicitly refuted a counter-argument is more trustworthy than one that never examined the counter-case. A well-reasonedfalse_positiveis more trustworthy than a silent discard.
Counter-argument prompts:
- Framework/runtime protection: does the tech stack have a built-in prevention for this class (ORM parameterization, template auto-escaping, TLS termination at the LB)?
- Trusted caller: is this code only reachable from internal, trusted, or admin-only paths?
- Non-production context: is the file a migration, test fixture, seed script, or utility that RECON's
entry_pointsdon't include? - Rule pattern noise: does this rule commonly fire on intentional non-exploitable configurations?
Disposition rules:
- Strong counter-argument →
da_strong: true; Stage 1 tests the hypothesis - Weak / no counter-argument →
da_strong: false; Stage 1 performs open-ended reachability search da_strong: true+ Stage 1 confirms (unreachable) →false_positive; both arguments cited in rationaleda_strong: true+ Stage 1 disproves (reachable) → rejected counter-argument cited intrue_positiverationale
Stage 1 — Reachability
Question: Is this code executed in production at all?
Disposition rules:
- Dead code (no inbound call graph from any entry point) →
verdict: false_positive, severity →infopresentational. - Test-only paths (only reached from test code) →
verdict: likely_false_positive, severity → one level down (CRITICAL → HIGH, HIGH → MEDIUM). - Feature-flagged-off in all production configs → one level down, verdict stays
true_positive. - Reached from production entry point → no change; record the entry point in
reachability.rationale.
Joern-present mode: reachability is computed from the CPG by tracing back from the finding location to HTTP/CLI/lambda/cron entry points.
Joern-absent mode: the agent reasons from RECON's entry_points and security_surface fields, plus grep over the call sites. Tag each entry with reachability_source: llm-fallback.
Stage 2 — Environment context
Question: Could deployed configuration override the committed value, making the finding inert?
Disposition rules:
- Confirmed override at deploy time (e.g. env var in
values.yamlor Helm chart overrides a committed default) → one level down,verdict: likely_true_positive(the committed value is still a weak default). - No override found → full severity, verdict unchanged.
The agent consults docker-compose*.yml, values.yaml, helmfile.yaml, k8s/*.yaml, and any CI-scoped env vars discoverable in .github/workflows/* or GitLab equivalents.
Stage 3 — Compensating controls
Question: Is there a control in the repo that mitigates this finding's impact?
Disposition rules:
- Confirmed in-repo control (WAF rule, rate limiter, input validation layer upstream of the finding, idempotency key check, etc.) → one level down, verdict
likely_true_positivewith the control's file:line in the rationale. - Assumed-only ("we have a WAF in prod" — not verifiable from the repo) → no change.
- Absent → no change.
Stage 4 — Deduplication
Question: Is this the same root cause as another finding already in the register?
Disposition rules:
- Same rule_id + same value (e.g. same secret SHA-256) across multiple files → collapse to ONE finding with a
locationsarray; emit one disposition entry referencing the primary finding. - Same rule_id + different values (e.g. 14 different hardcoded passwords) → separate findings, NOT deduplicated.
- Different rule_ids that describe the same root cause (e.g.
semgrep.python.hardcoded-password+gitleaks.generic.aws-access-keyfiring on the same line) → dedupe keeping the higher-priority source per the static-analysis skill's priority order.
Stage 5 — Severity calibration
Question: Is the severity consistent across similar findings?
Disposition rules:
- Ensure two findings with identical exploitability profiles receive identical presentational severity across the run.
- If a finding falls between two severity levels, prefer the higher — better to over-flag than miss. Use exploitability score (0–10) to break ties deterministically per the severity-mapping table in the primitives contract.
Exploitability scoring (0–10)
Per-finding score determines presentational severity bucket (see primitives contract § Severity mapping). Factors:
| Factor | Weight | Example |
|---|---|---|
| Network reachability | +3 | Finding is in an HTTP handler on a public route |
| Authentication bypass | +3 | Finding bypasses an auth check (not merely missing one) |
| Credential exposure | +2 | Finding leaks a credential an attacker could use elsewhere |
| Input-controlled | +2 | An attacker can influence the vulnerable value via request parameters |
| Persistent | +1 | Finding creates persistent state (stored XSS, stored credentials) |
| Privileged context | +1 | Finding runs in an elevated context (root, admin route) |
| Cascading | +1 | A successful exploit unlocks further access (lateral movement) |
Rationale field is mandatory (min 20 chars per schema). Summarize which factors applied and why.
Joern integration (when present)
If joern is on PATH, invoke via tools/reachability.sh (build commands + CPG cache details are in the script). Stage 1 reachability queries the CPG for paths from the finding location back to entry points; cite the entry point path in reachability.rationale.
LLM-fallback mode (joern absent)
Stages 1–3 use judgment rather than CPG data; Stages 4–5 work unchanged. Every entry in fallback mode carries reachability_source: llm-fallback. The exec-report-generator detects this and emits a banner — see agents/exec-report-generator.md § Section 0 banners.
Output
A DispositionRegister object per plugins/agentic-dev-team/knowledge/schemas/disposition-register-v1.json. Required entry fields and required envelope fields (schema_version, generated_at, dispositioner, reachability_tool, entries[]) are defined in the schema. Written to memory/disposition-<assessment-slug>.json.
Related
agents/fp-reduction.md— the opus agent that implements this skillplugins/agentic-dev-team/knowledge/security-primitives-contract.md— disposition register schema + severity mappingplugins/agentic-dev-team/knowledge/schemas/disposition-register-v1.json— JSON Schematools/reachability.sh— joern wrapper (installed if joern is on PATH)