confusable-unicode-audit

Detect bidi controls, zero-width characters, mixed-script identifiers, and homoglyph risks in source and release metadata

Stars: 141
Source: jmagly/aiwg
Updated: 2026-05-31
Slug: jmagly--aiwg--confusable-unicode-audit

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/jmagly/aiwg/HEAD/agentic/code/frameworks/security-engineering/skills/confusable-unicode-audit/SKILL.md -o .claude/skills/confusable-unicode-audit.md

Drops the SKILL.md into .claude/skills/confusable-unicode-audit.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

Confusable Unicode Audit

Detect Trojan Source and homoglyph risks in source files, dependency names, and release metadata. This skill enforces the no-confusable-unicode rule and maps curl Practice 8 to an AIWG control.

Detection Targets

  • Bidirectional controls: U+202A through U+202E, U+2066 through U+2069.
  • Zero-width and invisible formatting characters: U+200B through U+200F (a range that includes the directional marks U+200E and U+200F), U+FEFF.
  • Non-ASCII identifiers in source code.
  • Mixed-script identifiers, especially Latin plus Cyrillic or Greek.
  • Package/dependency names containing non-ASCII or confusable characters.
  • Optional metadata scan: commit subjects, PR titles, release notes.
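
The first four targets can be illustrated with a small scanner. This is only a minimal sketch, not the skill's actual implementation: the function names are hypothetical, the code point ranges are copied from the list above, and the script check guesses a character's script from the first word of its Unicode name, which is a rough stand-in for the script data that Unicode TR39 relies on.

```python
import unicodedata

# Code point ranges copied from the Detection Targets list.
BIDI_CONTROLS = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))
ZERO_WIDTH = set(range(0x200B, 0x2010)) | {0xFEFF}


def find_invisible_controls(text):
    """Return (line, column, code point, kind) for each bidi or zero-width character."""
    findings = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            cp = ord(ch)
            if cp in BIDI_CONTROLS:
                findings.append((line_no, col_no, f"U+{cp:04X}", "bidi"))
            elif cp in ZERO_WIDTH:
                findings.append((line_no, col_no, f"U+{cp:04X}", "zero-width"))
    return findings


def scripts_in(identifier):
    """Approximate the set of scripts used by the letters of an identifier."""
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            # Heuristic (assumption): the first word of the character name,
            # e.g. "LATIN", "CYRILLIC", "GREEK", is treated as the script.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_non_ascii(identifier):
    """True if the identifier contains any character outside ASCII."""
    return not identifier.isascii()


def is_mixed_script(identifier):
    """True if the letters of the identifier come from more than one script."""
    return len(scripts_in(identifier)) > 1
```

For example, `is_mixed_script("p\u0430ypal")` flags an identifier whose second letter is Cyrillic U+0430 rather than Latin "a", while plain `"paypal"` passes. The same functions could be applied to dependency names or commit subjects, since they operate on plain strings.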

Allowlist

Legitimate non-ASCII is declared in .aiwg/security/confusable-unicode-allowlist.yaml:

version: 1
allow:
  - path: "docs/i18n/**"
    reason: "localized documentation"
  - identifier: "naïve_bayes"
    codepoints: ["U+00EF"]
    reason: "historical exported API spelling"
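
One way a scanner might consult this allowlist is sketched below. It is an assumption-laden illustration: the dictionary stands in for the parsed YAML (no YAML library is used), `is_allowed` is a hypothetical helper, and path patterns are matched with `fnmatch`-style globbing, which may differ from the glob semantics the skill actually uses. The identifier is spelled with U+00EF so that the declared code point really occurs in it.

```python
from fnmatch import fnmatchcase

# Stand-in for the parsed allowlist file (hypothetical in-memory shape).
ALLOWLIST = {
    "version": 1,
    "allow": [
        {"path": "docs/i18n/**", "reason": "localized documentation"},
        {
            "identifier": "na\u00efve_bayes",  # contains U+00EF
            "codepoints": ["U+00EF"],
            "reason": "historical exported API spelling",
        },
    ],
}


def is_allowed(allowlist, path, identifier=None, codepoint=None):
    """Return True if a finding is covered by a path or identifier entry."""
    for entry in allowlist["allow"]:
        # Path entries exempt every finding under the matching files.
        if "path" in entry and fnmatchcase(path, entry["path"]):
            return True
        # Identifier entries exempt only the listed code points.
        if (
            "identifier" in entry
            and identifier == entry["identifier"]
            and codepoint in entry.get("codepoints", [])
        ):
            return True
    return False
```

Scoping identifier entries to specific code points means an allowlisted name does not silently excuse a different confusable character introduced later.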

Output

Reports show file, line, column, Unicode code point, character name, and remediation. Bidi and zero-width controls are always HIGH severity.
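
A report entry with these fields could be modeled roughly as follows. This is a sketch under stated assumptions: the `Finding` class, `make_finding`, the remediation wording, and the non-HIGH severity label are all hypothetical. Only the field list and the rule that bidi and zero-width controls are HIGH come from the text above.

```python
import unicodedata
from dataclasses import dataclass

BIDI_CONTROLS = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))
ZERO_WIDTH = set(range(0x200B, 0x2010)) | {0xFEFF}


@dataclass
class Finding:
    file: str
    line: int
    column: int
    codepoint: str
    name: str
    severity: str
    remediation: str


def make_finding(file, line, column, ch):
    """Build one report entry for a suspicious character."""
    cp = ord(ch)
    if cp in BIDI_CONTROLS or cp in ZERO_WIDTH:
        severity = "HIGH"  # always HIGH for bidi and zero-width controls
        remediation = "Remove the invisible control character."
    else:
        severity = "REVIEW"  # hypothetical label for other findings
        remediation = "Replace with ASCII or add an allowlist entry."
    return Finding(
        file=file,
        line=line,
        column=column,
        codepoint=f"U+{cp:04X}",
        name=unicodedata.name(ch, "UNNAMED"),
        severity=severity,
        remediation=remediation,
    )
```

Including the character name next to the code point lets a reviewer see what an invisible character is without looking it up.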

References

  • agentic/code/frameworks/security-engineering/rules/no-confusable-unicode.md
  • Unicode TR39 (Unicode Technical Standard #39, Unicode Security Mechanisms)
  • Trojan Source / CVE-2021-42574