Skip to main content
AI/MLruvnet

safety-scan

Scan inputs for prompt injection, unsafe content, and adversarial attacks using AIDefence

Stars
56,726
Source
ruvnet/claude-flow
Updated
2026-05-31
Slug
ruvnet--claude-flow--safety-scan
View on GitHubRaw SKILL.md

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/ruvnet/claude-flow/HEAD/plugins/ruflo-aidefence/skills/safety-scan/SKILL.md -o .claude/skills/safety-scan.md

Drops the SKILL.md into .claude/skills/safety-scan.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

Safety Scan

Scan content for prompt injection, jailbreak attempts, and unsafe patterns.

When to use

Before processing untrusted input (user submissions, API payloads, webhook data), scan it to detect prompt injection, adversarial content, or policy violations.

Steps

  1. Quick safety check — call mcp__claude-flow__aidefence_is_safe with the input text for a boolean safe/unsafe result
  2. Deep analysis — call mcp__claude-flow__aidefence_analyze for detailed threat classification and confidence scores
  3. Full scan — call mcp__claude-flow__aidefence_scan for comprehensive multi-layer scanning
  4. Train defenses — call mcp__claude-flow__aidefence_learn with confirmed threats to improve detection
  5. View stats — call mcp__claude-flow__aidefence_stats for detection rates and false positive metrics

Threat categories

  • Prompt injection (direct and indirect)
  • Jailbreak attempts
  • Data exfiltration patterns
  • Instruction override attacks
  • Social engineering prompts