Skip to main content
AI/MLantinomyhq

test-reasoning

Validate that reasoning parameters are correctly serialized and sent to provider APIs. Use when the user asks to test reasoning serialization, run reasoning tests, verify reasoning config fields, or check that ReasoningConfig maps correctly to provider-specific JSON (OpenRouter, Anthropic, GitHub Copilot, Codex).

Stars
7,360
Source
antinomyhq/forge
Updated
2026-05-31
Slug
antinomyhq--forge--test-reasoning
View on GitHubRaw SKILL.md

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/antinomyhq/forge/HEAD/.forge/skills/test-reasoning/SKILL.md -o .claude/skills/test-reasoning.md

Drops the SKILL.md into .claude/skills/test-reasoning.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

Test Reasoning Serialization

Validates that ReasoningConfig fields are correctly serialized into provider-specific JSON for OpenRouter, Anthropic, GitHub Copilot, and Codex.

Quick Start

Run all tests with the bundled script:

./scripts/test-reasoning.sh

The script builds forge in debug mode, runs each provider/model combination, captures the outgoing HTTP request body via FORGE_DEBUG_REQUESTS, and asserts the correct JSON fields.

Running a Single Test Manually

FORGE_DEBUG_REQUESTS="forge.request.json" \
FORGE_SESSION__PROVIDER_ID=<provider_id> \
FORGE_SESSION__MODEL_ID=<model_id> \
FORGE_REASONING__EFFORT=<effort> \
target/debug/forge -p "Hello!"

Then inspect .forge/forge.request.json for the expected fields.

Test Coverage

Provider Model Config fields Expected JSON field
open_router openai/o4-mini effort: none|minimal|low|medium|high|xhigh reasoning.effort
open_router openai/o4-mini max_tokens: 4000 reasoning.max_tokens
open_router openai/o4-mini effort: high + exclude: true reasoning.effort + .exclude
open_router openai/o4-mini enabled: true reasoning.enabled
open_router anthropic/claude-opus-4-5 max_tokens: 4000 reasoning.max_tokens
open_router moonshotai/kimi-k2 max_tokens: 4000 reasoning.max_tokens
open_router moonshotai/kimi-k2 effort: high reasoning.effort
open_router minimax/minimax-m2 max_tokens: 4000 reasoning.max_tokens
open_router minimax/minimax-m2 effort: high reasoning.effort
anthropic claude-opus-4-6 effort: low|medium|high|max output_config.effort
anthropic claude-3-7-sonnet-20250219 enabled: true + max_tokens: 8000 thinking.type + budget_tokens
github_copilot o4-mini effort: none|minimal|low|medium|high|xhigh reasoning_effort (top-level)
codex gpt-5.1-codex effort: none|minimal|low|medium|high|xhigh reasoning.effort + .summary
codex gpt-5.1-codex effort: medium + exclude: true reasoning.summary = "concise"
all providers one model each effort: invalid non-zero exit, no request written

Tests for unconfigured providers are skipped automatically. Invalid-effort tests run regardless of credentials — the rejection happens at config parse time before any provider interaction.

References