Safety Scan
Scan content for prompt injection, jailbreak attempts, and unsafe patterns.
When to use
Before processing untrusted input (user submissions, API payloads, webhook data), scan it to detect prompt injection, adversarial content, or policy violations.
Steps
- Quick safety check — call
mcp__claude-flow__aidefence_is_safewith the input text for a boolean safe/unsafe result - Deep analysis — call
mcp__claude-flow__aidefence_analyzefor detailed threat classification and confidence scores - Full scan — call
mcp__claude-flow__aidefence_scanfor comprehensive multi-layer scanning - Train defenses — call
mcp__claude-flow__aidefence_learnwith confirmed threats to improve detection - View stats — call
mcp__claude-flow__aidefence_statsfor detection rates and false positive metrics
Threat categories
- Prompt injection (direct and indirect)
- Jailbreak attempts
- Data exfiltration patterns
- Instruction override attacks
- Social engineering prompts