Cost Trend
The smoke gate is binary (`winRate ≥ 0.80` → pass/fail). Corpus benchmarks captured over time form a curve, and a curve catches regressions the gate misses. A win rate sliding from 100% to 85% still passes the smoke gate, but it is a real degradation.
This skill reads every persisted run in `docs/benchmarks/runs/*.json` and reports first→last deltas plus a per-run series, flagging regressions in win rate or latency.
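To make the gate-versus-curve point concrete, here is a minimal JavaScript sketch. JavaScript is used because the trend script is a Node `.mjs` file. It assumes each run exposes a `winRate` between 0 and 1. The helper names `smokeGatePasses` and `winRateDrift` are hypothetical and are not the script's actual API.

```javascript
// Hypothetical helpers illustrating why a binary gate misses slow drift.
const SMOKE_THRESHOLD = 0.8; // smoke gate: winRate >= 0.80 passes

// Binary gate: judges a single run in isolation.
function smokeGatePasses(winRate) {
  return winRate >= SMOKE_THRESHOLD;
}

// Trend view: compares the first and last run of a series (oldest first).
function winRateDrift(winRates) {
  if (winRates.length < 2) return 0;
  return winRates[winRates.length - 1] - winRates[0];
}

// Illustrative series: win rate sliding from 100% to 85% over four runs.
const series = [1.0, 0.95, 0.9, 0.85];
const everyRunPasses = series.every((w) => smokeGatePasses(w));
const drift = winRateDrift(series);

console.log(`gate passes every run: ${everyRunPasses}`);
console.log(`first->last win-rate drift: ${(drift * 100).toFixed(1)} pts`);
```

Every run clears the gate, yet the trend view reports a 15-point drop. The gate cannot see that degradation.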
When to use
- Before a release: check that the speedup hasn't drifted.
- After expanding the corpus: check that runs on the expanded corpus still reach the win rate that earlier runs recorded.
- After upgrading `agent-booster`: surface latency and strategy changes.
Steps
Run the trend script from the project root:
`node plugins/ruflo-cost-tracker/scripts/trend.mjs`
Optional environment variables:
- `TREND_FORMAT=json`: emit JSON instead of markdown
- `TREND_LIMIT=10`: consider only the most recent N runs
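One way the two variables could be interpreted is sketched below. It assumes each run carries an ISO-8601 `timestamp` used for ordering. The function names and the `timestamp` field are hypothetical, so check `trend.mjs` for its actual behavior, for example how it treats a non-numeric `TREND_LIMIT`.

```javascript
// Hypothetical: interpret TREND_FORMAT and TREND_LIMIT from an env-like object
// (a real script would pass process.env).
function trendOptions(env) {
  const format = env.TREND_FORMAT === "json" ? "json" : "markdown";
  const parsed = Number.parseInt(env.TREND_LIMIT ?? "", 10);
  // A missing, non-numeric, or non-positive limit is treated as "no limit".
  const limit = Number.isInteger(parsed) && parsed > 0 ? parsed : Infinity;
  return { format, limit };
}

// Order runs oldest first, then keep only the most recent `limit` of them.
function selectRuns(runs, limit) {
  const ordered = [...runs].sort(
    (a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp)
  );
  return Number.isFinite(limit) ? ordered.slice(-limit) : ordered;
}

// Illustrative runs, deliberately out of order.
const runs = [
  { timestamp: "2025-03-01T00:00:00Z", winRate: 0.97 },
  { timestamp: "2025-01-01T00:00:00Z", winRate: 1.0 },
  { timestamp: "2025-02-01T00:00:00Z", winRate: 0.98 },
];
const opts = trendOptions({ TREND_FORMAT: "json", TREND_LIMIT: "2" });
const selected = selectRuns(runs, opts.limit);
console.log(opts.format, selected.map((r) => r.timestamp));
```

Sorting before slicing matters because directory listing order is not guaranteed to match run order.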
Inspect the drift summary: first run vs last run for win rate, average latency, p99 latency, escalation rate, and speedup vs Gemini.
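A minimal sketch of the first-vs-last comparison follows. It assumes runs are already ordered oldest first and use the hypothetical field names in `METRICS`. The real run JSON may name these fields differently, and the sample numbers are illustrative, not real measurements.

```javascript
// Hypothetical field names for the five drift-summary metrics.
const METRICS = [
  "winRate",
  "avgLatencyMs",
  "p99LatencyMs",
  "escalationRate",
  "speedupVsGemini",
];

// First-vs-last drift per metric; delta is rounded to hide float noise.
function driftSummary(runs) {
  if (runs.length === 0) return [];
  const first = runs[0];
  const last = runs[runs.length - 1];
  return METRICS.map((metric) => ({
    metric,
    first: first[metric],
    last: last[metric],
    delta: Number((last[metric] - first[metric]).toFixed(4)),
  }));
}

// Illustrative runs, oldest first.
const runs = [
  { winRate: 1.0, avgLatencyMs: 12, p99LatencyMs: 40, escalationRate: 0.0, speedupVsGemini: 50 },
  { winRate: 0.9, avgLatencyMs: 15, p99LatencyMs: 55, escalationRate: 0.1, speedupVsGemini: 42 },
];

for (const row of driftSummary(runs)) {
  console.log(`${row.metric}: ${row.first} -> ${row.last} (delta ${row.delta})`);
}
```

A metric missing from a run produces `NaN` here; a real report would need to decide whether to skip or flag it.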
Inspect the per-run series: one row per run, including Sonnet 4.6 and Opus 4.7 baseline latencies if those baselines were enabled (`BENCH_ANTHROPIC=1` at run time).
Check the regression flags. The script emits `> ⚠ Regression` callouts when:
- Win rate dropped between the first and last run
- Average latency rose to at least 1.5× the first run's
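The two rules can be sketched as a function over runs ordered oldest first. The sketch assumes the latency rule compares the last run with the first, since the rule only names the first run as the baseline. The field names `winRate` and `avgLatencyMs`, and the exact callout wording, are hypothetical.

```javascript
// Hypothetical fields: winRate (0-1) and avgLatencyMs.
const LATENCY_RATIO_LIMIT = 1.5;

// Returns the callout lines the two regression rules would produce.
function regressionCallouts(runs) {
  if (runs.length < 2) return [];
  const first = runs[0];
  const last = runs[runs.length - 1];
  const callouts = [];
  // Rule 1: any first->last drop in win rate.
  if (last.winRate < first.winRate) {
    callouts.push(`> ⚠ Regression: win rate dropped from ${first.winRate} to ${last.winRate}`);
  }
  // Rule 2: average latency at or above 1.5x the first run's.
  if (first.avgLatencyMs > 0 && last.avgLatencyMs / first.avgLatencyMs >= LATENCY_RATIO_LIMIT) {
    callouts.push(`> ⚠ Regression: avg latency rose from ${first.avgLatencyMs} ms to ${last.avgLatencyMs} ms`);
  }
  return callouts;
}

const flagged = regressionCallouts([
  { winRate: 1.0, avgLatencyMs: 10 },
  { winRate: 0.95, avgLatencyMs: 15 },
]);
flagged.forEach((line) => console.log(line));
```

The win-rate rule has no tolerance, so even a one-task loss is flagged. That strictness is what lets it catch the slow slide the smoke gate tolerates.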
Cross-references
- `cost-benchmark`: the producer of the run JSON files this skill consumes
- `bench/booster-corpus.json`: the corpus version is recorded in each run, so trends across corpus versions remain interpretable
- `docs/benchmarks/runs/latest.json`: the most recent run; smoke step 23 gates on `winRate ≥ 0.80` from this file
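Because each run records its corpus version, a trend report can mark where that version changes. A drop that coincides with a corpus expansion can then be read in context. This is a minimal sketch, assuming a hypothetical `corpusVersion` field on each run and runs ordered oldest first. The version strings are placeholders.

```javascript
// Hypothetical: each run records the corpus it ran against in `corpusVersion`.
// Returns every point where the version changes between consecutive runs.
function corpusBoundaries(runs) {
  const boundaries = [];
  for (let i = 1; i < runs.length; i++) {
    if (runs[i].corpusVersion !== runs[i - 1].corpusVersion) {
      boundaries.push({
        index: i,
        from: runs[i - 1].corpusVersion,
        to: runs[i].corpusVersion,
      });
    }
  }
  return boundaries;
}

// Illustrative runs, oldest first.
const runs = [
  { corpusVersion: "v1", winRate: 1.0 },
  { corpusVersion: "v1", winRate: 0.98 },
  { corpusVersion: "v2", winRate: 0.9 },
];
console.log(corpusBoundaries(runs));
```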