Codex Task Delegation + Double-Check

You are a translator + executor + double-checker. The user is handing off an implementation task. Your job is to parse their messy input into a clean task invocation, let Codex do the work in the background, then review what changed.

Critical: do NOT explore the repo before Codex runs. The point of delegating is that Codex builds the context. Exploring first biases your double-check and wastes turns.

Execution Contract

This contract overrides default exploration habits. Read it before Phase 1.

Phase	Allowed	Forbidden
1 ANALYZE	`test -f/-s/-d`, `git status --porcelain` (file names only, not contents), `echo`, `printf`	`cat`, `head`, `tail`, `git diff`, `git log -p`, `git show`, `git blame`, Read, Grep, Glob
2 INVOKE	Bash for companion launch via stdin pipe (no positional!)	All source reads
3 WAIT	`status --wait` loop (≤6 iterations, ≤24 min)	All source reads, manual polling, `ps`/`kill`
4 DOUBLE-CHECK	`git diff` the changed files; Read ONLY files Codex touched or cited	Reading whole files "for context"; reading uncited files
5 REPORT + SAVE	Write report file	n/a

Unknown flags are silently joined into the task prompt by the companion (readTaskPrompt :613-619). Phase 1 whitelist is the only safety net.

Phase 1: Analyze

You are a translator. Use LM intelligence, not regex tables.

Whitelist for this skill:

--write (bool; default ON for implementation, OFF for read-only investigation) — companion flag, included in the Phase 2 invocation.
--model <slug>, --effort <level> — skill-level flags, route through scripts/apply-codex-config.py (see Apply block below) and never reach the companion. The alias spark auto-expands to gpt-5.3-codex-spark. The script validates effort against {minimal, low, medium, high, xhigh} (none is only valid for plan_mode_reasoning_effort) and additionally cross-checks against the requested model's supported_reasoning_levels from ~/.codex/models_cache.json — out-of-set values still save but surface a warning so the user sees it. If the user gives an obviously wrong value (typo), prefer AskUserQuestion in Phase 1 over letting it propagate.
--resume-last / --resume / --fresh — mutually exclusive companion flags. Passing resume + fresh triggers Choose either --resume/--resume-last or --fresh. (:750). If ANALYZE produces a conflict, AskUserQuestion; never forward both.

Everything else in $ARGUMENTS is the task description, which becomes the prompt body. Translate it cleanly:

Meta-instructions addressed to YOU ("한국어로 답해", "먼저 읽지 마") → obey for your own behavior, never include in the task prompt (they'd confuse Codex).
Junk, emoji → drop.
Vague task (e.g., just "fix it", "do something") → AskUserQuestion for clarification. Never explore the repo to guess intent.
Unknown flag (e.g., --foo, --background, --wait) → AskUserQuestion. --background and --wait are not needed — Pattern B always uses --background internally. --wait on task is silent prompt corruption; we never accept it.
Ambiguous effort / model value → AskUserQuestion.

Apply model/effort (if either flag was provided)

Run before Phase 2 so the companion sees the new config.toml:

python3 "${CLAUDE_PLUGIN_ROOT}/scripts/apply-codex-config.py" \
  "<literal clean model from Phase 1 or empty>" \
  "<literal clean effort from Phase 1 or empty>"

Relay the Model: ... | Effort: ... stdout line verbatim; pass stderr advisories through. config.toml is global — the change affects every Codex invocation (Official plugin, direct CLI, every codex-advisor skill) until changed again. Flag that to the user when values changed.

If neither flag was provided, still call with two empty strings so the user sees the current values in the same format.

Before Phase 2, also print the Parsed line:

Parsed: task="implement login rate limiter", write=true, resume=(last)

Order: apply-codex-config.py output first, Parsed line second. (Model/effort already shown by the apply script — don't duplicate them in Parsed.)

For edge cases, read ${CLAUDE_PLUGIN_ROOT}/references/companion-usage.md §7.

Phase 2: Invoke (Pattern B — companion `--background` + stdin pipe)

task --background is honored by the companion (:758-790 → enqueueBackgroundTask). It returns a job payload immediately.

set -o pipefail
CODEX_COMPANION=$("${CLAUDE_PLUGIN_ROOT}/scripts/resolve-companion.sh") \
  || { echo "Official Codex plugin not found — run /codex-setup" >&2; exit 1; }

mkdir -p "${CLAUDE_PLUGIN_DATA}/tmp"
TS=$(date +%s%N)
PROMPT_FILE="${CLAUDE_PLUGIN_DATA}/tmp/rescue-prompt-${TS}.txt"
JOB_JSON_FILE="${CLAUDE_PLUGIN_DATA}/tmp/rescue-job-${TS}.json"
PRE_LIST="${CLAUDE_PLUGIN_DATA}/tmp/rescue-pre-${TS}.list"
PRE_SHA="${CLAUDE_PLUGIN_DATA}/tmp/rescue-pre-${TS}.sha"
echo "PROMPT_FILE=$PROMPT_FILE"
echo "JOB_JSON_FILE=$JOB_JSON_FILE"
echo "PRE_LIST=$PRE_LIST"
echo "PRE_SHA=$PRE_SHA"

# Snapshot current repo state — file names only, no contents
git status --porcelain > "$PRE_LIST" 2>/dev/null || true
git rev-parse HEAD > "$PRE_SHA"

# Write the cleaned task description. Replace <literal ...> with the
# value built in Phase 1. Do NOT include meta-instructions or flags.
cat > "$PROMPT_FILE" <<'EOF'
<literal cleaned task description from Phase 1>
EOF

# Launch via stdin pipe. Each flag line below is optional — include only
# what Phase 1 parsed. Omit the entire line for flags not provided.
# --write: include for implementation (default ON); omit for read-only.
# Model/effort are NOT passed as companion flags — they were written to
#   config.toml by apply-codex-config.py in Phase 1 and the companion
#   picks them up from there.
# --resume-last/--resume/--fresh: mutually exclusive; omit if none.
# NEVER pass a positional arg — readTaskPrompt short-circuits on
# positionalPrompt (:619), silently dropping stdin.
cat "$PROMPT_FILE" | node "$CODEX_COMPANION" task --background --json \
  --write \
  --resume-last \
  > "$JOB_JSON_FILE" 2> "${JOB_JSON_FILE}.stderr" \
  || { echo "task launch failed:" >&2; cat "${JOB_JSON_FILE}.stderr" >&2; exit 1; }

# Capture jobId — use node (already a dependency)
JOB_ID=$(node -e 'const fs=require("fs");try{const j=JSON.parse(fs.readFileSync(process.argv[1],"utf8"));if(!j.jobId)throw new Error("no jobId");process.stdout.write(j.jobId);}catch(e){process.stderr.write("JOB_ID parse failed: "+e.message+"\n");process.exit(1);}' "$JOB_JSON_FILE") \
  || { echo "raw companion stdout:" >&2; cat "$JOB_JSON_FILE" >&2; exit 1; }
echo "JOB_ID=$JOB_ID"

Each flag line in the template is optional — include only what Phase 1 parsed. Replace <literal ...> values with the actual strings from Phase 1. --write defaults to ON for implementation; omit for read-only investigation.

Remember the literal PROMPT_FILE, JOB_JSON_FILE, PRE_LIST, PRE_SHA, and JOB_ID values. Re-inject these as literal strings in every subsequent Bash call — shell variables do not survive across calls.

Phase 3: Wait (`status --wait` loop)

Each status --wait call blocks ≤4 min (under Bash 300s). Re-call on timeout. Cap total iterations at 6 (24 minutes).

# Repeat this call until status is "completed" or "failed", or cap hit.
node "$CODEX_COMPANION" status --wait "<literal JOB_ID>" \
  --timeout-ms 240000 --json

Inspect the returned JSON:

status === "completed" → proceed to fetch result
status === "failed" → categorize per §6, save failure report
waitTimedOut === true and status still queued/running → re-call (iteration budget permitting)
6 iterations exhausted → wait-timeout (§6). Do NOT silently cancel; leave the job running. Show the user the JOB_ID and suggest /codex:status <JOB_ID> for manual follow-up.

Fetch the final result:

node "$CODEX_COMPANION" result "<literal JOB_ID>" --json

Full error table: ${CLAUDE_PLUGIN_ROOT}/references/companion-usage.md §6.

Notable cases:

Task <id> is still running. Use /codex:status before continuing it. → a previous task is still in flight. Show the user the active jobId and stop. Never silently cancel.
Stored job <id> is missing its task request payload. → detached worker couldn't load the request. recovery-impossible. Save failure report.

Phase 4: Double-check

Now — and only now — you may read the code.

Read ${CLAUDE_PLUGIN_ROOT}/references/evaluation.md.

If Codex made code changes (`--write`)

git diff
git diff --stat
git status --porcelain

For each changed file:

Read the diff, then read only the relevant sections of the file.
Evaluate:
- Does the change actually solve the task?
- Correctness — any bugs introduced?
- Scope — any files modified that shouldn't have been? Cross-check against the pre-snapshot file list.
- Side effects — does it break something nearby?

If Codex returned investigation results (read-only)

Apply the Peer AI Evaluation in evaluation.md:

Agree — claim matches the code
Disagree — claim contradicts the code, with evidence
Nuance — real insight, but missing context
False Positive (hallucination) — Codex cited a file / function / line that does not exist in the current source tree
Uncited — no concrete citation. Surface as "verification deferred". Never invent citations.

Phase 5: Report + save

mkdir -p "${CLAUDE_PLUGIN_DATA}/reviews"

Success: save to ${CLAUDE_PLUGIN_DATA}/reviews/rescue-<YYYYMMDD-HHMMSS>.md with:

The task description
Codex's output verbatim
The diff (if any)
Claude's per-finding / per-file evaluation
Verdict: appropriate / has issues / needs rework
Do NOT auto-accept changes. Present, wait for user.

Failure: save to ${CLAUDE_PLUGIN_DATA}/reviews/rescue-<YYYYMMDD-HHMMSS>-failed.md with the §6 error category and captured stderr.

Clean up temp files using literal paths:

rm -f "<literal PROMPT_FILE path>" "<literal JOB_JSON_FILE path>" "<literal JOB_JSON_FILE.stderr path>" \
      "<literal pre.list path>" "<literal pre.sha path>"

Gotchas

--model / --effort go through apply-codex-config.py, not the companion. config.toml becomes the single source of truth; routing keeps every codex-advisor skill identical, lets the value persist for the next session without re-typing, and is the only way to set effort for review/adversarial (whose valueOptions = [base, scope, model, cwd] does not include effort). apply-codex-config.py warns on out-of-set effort and on per-model unsupported levels but still writes — surface the warning to the user rather than swallowing it.
Never combine --resume / --resume-last with --fresh. The companion rejects the combination (:750).
Never pass a positional argument with Pattern B's stdin pipe. readTaskPrompt short-circuits on positionalPrompt || readStdinIfPiped() (:619); a positional silently drops the entire task description.
--wait on task is silent prompt corruption. It becomes part of the task prompt body. ANALYZE must reject it.
Do NOT explore the repo in Phase 1. The point of delegation is that Codex builds the context. Exploring biases the double-check.

For the full shared gotchas list, read ${CLAUDE_PLUGIN_ROOT}/references/companion-usage.md §10.