Codex Adversarial Review + Double-Check
You are a translator + executor + double-checker. Adversarial review defaults to skepticism — it looks for reasons NOT to ship. Because adversarial hallucinates more than plain review, Phase 4 rigor matters even more than usual.
Execution Contract
This contract overrides default exploration habits. Read it before Phase 1.
| Phase | Allowed | Forbidden |
|---|---|---|
| 1 ANALYZE | test -f/-s/-d, git rev-parse --verify, git branch --list, wc -l/-c, file, echo, printf |
cat, head, tail, git diff, git log -p, git show, git blame, Read, Grep, Glob |
| 2 INVOKE | Bash for companion launch (multi-arg form only — never $ARGUMENTS blob) |
All source reads |
| 3 WAIT | BashOutput |
All source reads, manual polling, ps/kill outside KillShell |
| 4 DOUBLE-CHECK | Read ONLY files/lines Codex cited | Reading whole files "for context"; reading uncited files; inventing citations |
| 5 REPORT + SAVE | Write report file | n/a |
The companion collects the diff itself. Unknown flags are silently
joined into the prompt by the companion (lib/args.mjs:47-49 +
:613-619). Phase 1 whitelist is the only safety net.
Phase 1: Analyze
You are a translator. Use LM intelligence, not regex tables.
Whitelist for this skill: --base <ref>, --scope <auto|working-tree|branch>, --model <slug>, --effort <level>, and positional focus text (natural-language attack hints, e.g., "check for SQL injection in login handler").
--model and --effort route through scripts/apply-codex-config.py to update ~/.codex/config.toml before the companion launches. Two reasons:
--effortis not a registered review flag (handleReviewCommandvalueOptions = ["base", "scope", "model", "cwd"]at:684). Passing--effortdirectly would become silent prompt corruption (references/companion-usage.md §3). Only the config.tomlmodel_reasoning_effortkey reaches the review path.- Consistency + persistence.
--modelIS honored as a flag in v1.0.4 (startThread({ model }),lib/codex.mjs:56-66), but routing it through config.toml keeps every codex-advisor skill identical and lets the value persist for the next session without re-typing.
Rules:
- Meta-instructions addressed to YOU ("한국어로", "빨리", "thoroughly") → obey for your own behavior, never forward.
- Junk, emoji, trailing punctuation on flag values → drop (
--base develop,→base=develop). - Focus text — unlike
/codex-review, adversarial DOES accept it. Collect all non-flag, non-meta tokens and join with spaces. This becomes the positional prompt passed after the flags. Never embed meta-instructions in the focus text. - Unknown flag (e.g.,
--commit,--uncommitted,--wait,--foo) →AskUserQuestionto clarify. Common corrections:--uncommitted→ did you mean--scope working-tree?--commit <sha>→ did you mean--base <sha>~1 --scope branch?--wait/--background→ silent no-ops on adversarial; drop.- Never pass through.
- Duplicate flag →
AskUserQuestionwhich one. - Ambiguous →
AskUserQuestion(interactive) or exit 1 (non-interactive, seereferences/companion-usage.md §9).
Input validation (allowed in Phase 1):
# Replace <literal clean base> with the value from Phase 1 — or skip
# this block entirely if the user gave no --base.
git rev-parse --verify "<literal clean base>" >/dev/null 2>&1 \
|| { echo "Unknown revision: <literal clean base>" >&2; git branch --list | head -20 >&2; exit 1; }
Apply model/effort (if either flag was provided)
Run before Phase 2 so the companion sees the new config.toml:
python3 "${CLAUDE_PLUGIN_ROOT}/scripts/apply-codex-config.py" \
"<literal clean model from Phase 1 or empty>" \
"<literal clean effort from Phase 1 or empty>"
Relay the Model: ... | Effort: ... stdout line verbatim, plus any stderr advisories. config.toml is global — the change affects every Codex invocation until changed again. Flag that to the user when values changed.
If neither flag was provided, still call with two empty strings so the user sees the current values in the same format.
Before Phase 2, also print the Parsed line:
Parsed: base=develop, scope=auto, focus="check SQL injection in login" (meta: "빨리" obeyed)
Order: apply-codex-config.py output first, Parsed line second.
For edge cases, read ${CLAUDE_PLUGIN_ROOT}/references/companion-usage.md §7.
Phase 2: Invoke (Pattern A — Bash run_in_background)
Adversarial shares handleReviewCommand with review (:725, :992-1003), so
--background / --wait are silent no-ops. Use Bash
run_in_background=true.
set -o pipefail
CODEX_COMPANION=$("${CLAUDE_PLUGIN_ROOT}/scripts/resolve-companion.sh") \
|| { echo "Official Codex plugin not found — run /codex-setup" >&2; exit 1; }
mkdir -p "${CLAUDE_PLUGIN_DATA}/tmp"
TS=$(date +%s%N)
OUT_FILE="${CLAUDE_PLUGIN_DATA}/tmp/adversarial-${TS}.json"
ERR_FILE="${CLAUDE_PLUGIN_DATA}/tmp/adversarial-${TS}.log"
echo "OUT_FILE=$OUT_FILE"
echo "ERR_FILE=$ERR_FILE"
# Launch via Bash run_in_background=true.
# Replace <literal ...> with values from Phase 1. Omit the entire line
# for values the user did not provide. Focus text is a positional arg —
# place it AFTER all flags, or omit if empty.
node "$CODEX_COMPANION" adversarial-review --json \
--base "<literal clean base from Phase 1>" \
--scope "<literal clean scope from Phase 1>" \
"<literal clean focus text from Phase 1>" \
> "$OUT_FILE" 2> "$ERR_FILE"
Capture the bash_id and the literal OUT_FILE / ERR_FILE paths.
Re-inject these as literal strings in every subsequent Bash call — shell
variables do not survive across calls.
Phase 3: Wait
Poll with BashOutput every 30 seconds (60s acceptable for very
long reviews). Termination: BashOutput response field
status === "completed". Never match on stdout content.
| Situation | Action |
|---|---|
completed + $OUT_FILE is valid JSON |
Proceed to Phase 4 |
completed + $OUT_FILE empty |
Read $ERR_FILE, categorize per §6, save adversarial-<ts>-failed.md, stop |
completed + non-JSON |
unexpected-format — show stderr verbatim, abort |
| 30 minutes elapsed | wait-timeout — KillShell the bash_id, handle per §6 |
Full error table: ${CLAUDE_PLUGIN_ROOT}/references/companion-usage.md §6.
Phase 4: Double-check
Now — and only now — you may read source code.
Read ${CLAUDE_PLUGIN_ROOT}/references/evaluation.md.
Adversarial findings are intentionally skeptical. Be especially rigorous — adversarial review produces more false positives by design.
Parse $OUT_FILE JSON. For each finding:
- Read ONLY the file:line Codex cited. Never whole files.
- Verify the attack scenario is realistic. Adversarial prompts happily invent implausible failure modes.
- Classify:
- Agree — cited code matches a real vulnerability / bug
- Disagree — cited code does not have the problem described
- Nuance — real issue but Codex overstated severity or missed a mitigating factor
- False Positive (hallucination) — Codex cited a file, function, or line that does not exist in the current source tree. This is the most common failure mode for adversarial.
- Uncited — no concrete file:line. Surface to user as "verification deferred". Never invent citations.
Phase 5: Report + save
mkdir -p "${CLAUDE_PLUGIN_DATA}/reviews"
Success: save to
${CLAUDE_PLUGIN_DATA}/reviews/adversarial-<YYYYMMDD-HHMMSS>.md. Include
Codex output verbatim, per-finding classifications, and a realistic risk
assessment separating genuine concerns from noise.
Failure: save to
${CLAUDE_PLUGIN_DATA}/reviews/adversarial-<YYYYMMDD-HHMMSS>-failed.md
with error category and captured stderr.
Clean up temp files using the literal paths captured in Phase 2:
rm -f "<literal $OUT_FILE path>" "<literal $ERR_FILE path>"
Gotchas
- Adversarial hallucinates more. False Positive classification in Phase 4 is the most common outcome for the noisiest findings. Always verify the cited file:line exists before agreeing.
- Focus text IS allowed here (unlike
/codex-review). It goes as a positional argument after the flags. --commit,--uncommitteddo not exist — translate via ANALYZE, never pass through.- Companion-side
--background/--waitare silent no-ops. Same handler as review. Use Pattern A.
For the full shared gotchas list, read
${CLAUDE_PLUGIN_ROOT}/references/companion-usage.md §10.