Skip to main content
AI/MLathola

performance-review

Detects time and space complexity hotspots via AST scan. Use when code feels slow, before performance-sensitive merges, or to find O(n²) regressions.

Stars
294
Source
athola/claude-night-market
Updated
2026-05-30
Slug
athola--claude-night-market--performance-review
View on GitHubRaw SKILL.md

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/athola/claude-night-market/HEAD/plugins/pensive/skills/performance-review/SKILL.md -o .claude/skills/performance-review.md

Drops the SKILL.md into .claude/skills/performance-review.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

Table of Contents

Performance Review

Static-analysis review of time and space complexity hotspots.

The skill runs in three escalating tiers. Tier 1 uses Python's stdlib ast and always runs. Tier 2 uses gauntlet's tree-sitter parser to extend detection across languages when gauntlet is installed. Tier 3 uses the gauntlet code graph to upgrade severity when hotspots reach other hotspots transitively. If gauntlet is missing, Tiers 2 and 3 no-op and Tier 1 still produces useful findings on Python source.

Quick Start

/performance-review                  # scan changed files
/performance-review path/to/file.py  # scan one file
/performance-review --tier 1         # force Tier 1 only

Programmatic use:

from pensive.skills.performance_review import PerformanceReviewSkill
skill = PerformanceReviewSkill()
result = skill.analyze(context, "src/module.py")
for f in result.issues:
    print(f"[{f.severity}] {f.file}:{f.line} {f.message}")

When to Use

  • Pre-merge review of code that runs on user-scaled inputs.
  • Triage of a function that "feels slow" before reaching for a profiler.
  • Audit a refactor for newly introduced O(n²) patterns.
  • Guardrail for AI-generated code where nested-loop hot spots are common.

When NOT to Use

  • The target needs runtime measurement (memory profile, CPU time on real data). Use Skill(parseltongue:python-performance) instead: that skill drives cProfile, py-spy, and benchmarks.
  • General refactoring guidance not focused on hotspots: use Skill(pensive:code-refinement) whose algorithm-efficiency module covers broader optimization patterns. This skill detects; that skill teaches.
  • Architecture-level performance (sharding, caching layers, queue placement): use Skill(pensive:architecture-review).

Required TodoWrite Items

  1. perf-review:context-established
  2. perf-review:scan-complete
  3. perf-review:findings-categorized
  4. perf-review:integration-checked
  5. perf-review:report-generated

Workflow

Step 1: Context (perf-review:context-established)

  • Identify target files. If invoked with no argument, use git diff --name-only. If invoked with a path, scope to that.
  • Note language(s) involved. Tier 1 covers Python; non-Python files need gauntlet for Tier 2 coverage.

Step 2: Tier 1 AST scan (perf-review:scan-complete)

Load modules/time-complexity.md for the time-side patterns and modules/space-complexity.md for space-side. Each module documents the AST shape of every detector.

For each Python target file, call:

from pensive.skills.performance_review import PerformanceReviewSkill
result = PerformanceReviewSkill().analyze(context, path)

The visitor walks the AST once and emits ReviewFinding records.

Step 3: Categorize and rank (perf-review:findings-categorized)

Group findings by severity:

  • HIGH: O(n²) or worse on input-sized iterables (T1, T2).
  • MEDIUM: Unbounded allocation or per-iteration overhead (T3, T4, S1, S3).
  • LOW: Style-level inefficiencies (T5, T6, S2).
  • CRITICAL: Reserved for Tier-3 transitive upgrades.

Within a severity, sort by file then line. Suppress findings the user has explicitly marked acceptable (TODO/comment markers) at module-load time of the target.

Step 4: Tier 2/3 enrichment (perf-review:integration-checked)

Load modules/gauntlet-integration.md for the contract.

If gauntlet is installed, run Tier 2 on non-Python files that were skipped at Step 2. If a .gauntlet/graph.db exists in the working tree, run Tier 3 to upgrade severities based on transitive hotspot reachability.

If gauntlet is missing, this step is a no-op and the report notes "Tier 2/3 not available: install gauntlet for multi-language and call-chain coverage."

Step 5: Report (perf-review:report-generated)

Emit a markdown report:

## Performance Review: <target>

### HIGH (<count>)
- src/foo.py:42: Nested loop over the same iterable 'items'.
  Suggestion: sort + two pointers, or hash-set membership.

### MEDIUM (<count>)
- ...

### LOW (<count>)
- ...

Tier coverage: 1 (always) | 2 (gauntlet ✓/✗) | 3 (graph ✓/✗)

The report is informational. Apply fixes via Skill(pensive:code-refinement) or hand-merge.

Tiered Analysis

Tier Source When it runs What it covers
1 stdlib ast Always (Python source only) T1-T6, S1-S3
2 gauntlet.treesitter_parser When gauntlet importable Same patterns adapted to JS/TS, Go, Rust, Java, C/C++
3 gauntlet.graph.GraphStore When .gauntlet/graph.db exists Severity upgrade via transitive call chains

Output Format

Findings use the shared ReviewFinding dataclass from pensive.skills.base:

ReviewFinding(
    file="src/module.py",
    line=42,
    severity="HIGH",          # LOW | MEDIUM | HIGH | CRITICAL
    category="time",          # time | space
    message="Nested loop over the same iterable 'items'.",
    suggestion="Sort + two pointers, or hash-set membership.",
    code_snippet="",
)

This shape matches every other pensive review skill, so the findings can flow into Skill(pensive:unified-review) without translation.

Cross-Plugin Dependencies

Dependency Required? Effect when missing
gauntlet.treesitter_parser Optional Tier 2 returns []; Python coverage unchanged
gauntlet.graph.GraphStore Optional Tier 3 returns []; severities are not upgraded

The optional-import contract follows the precedent in plugins/leyline/src/leyline/tokens.py:25-32 and plugins/gauntlet/hooks/pr_blast_radius.py:52-56: try-import to module-level sentinels, then early-return on None inside each tier helper. See modules/gauntlet-integration.md for the exact code shape.

Supporting Modules

  • modules/time-complexity.md: T1-T6 detector patterns and AST shapes.
  • modules/space-complexity.md: S1-S3 detector patterns.
  • modules/gauntlet-integration.md: Tier 2/3 contract, fallback semantics, examples.
  • modules/kuva-visualization.md: Rendering benchmark data as charts with kuva (criterion, pytest-benchmark, ad-hoc tables). Covers when chart evidence satisfies proof-of-work requirements.

Verification

A perf-review finding is only useful if the caller can confirm it is real. Use this checklist before treating any finding as worth fixing:

  1. Reproduce under a profiler. Run cProfile, py-spy, or the language-specific equivalent on the hotspot. The findings pinpoint AST shapes; the profiler validates the runtime impact.
  2. Re-run the failing benchmark. If benches/ exists, the hotspot should show up in numbers, not just AST scans.
  3. Compare numbers before and after the proposed fix. The fix is wrong if numbers do not move. Capture both timings as evidence references like [E1] (before) and [E2] (after). When 3+ data points exist, render a kuva chart and attach it to the PR — see modules/kuva-visualization.md.
  4. Sample two or three reported hotspots manually. Findings can be true at the AST level and false at the call-graph level when callers short-circuit. Manual sampling catches that.

The Skill(imbue:proof-of-work) discipline applies: claims like "the hotspot is fixed" require evidence, not assertion.

Testing

A test file already lives at plugins/pensive/tests/skills/test_performance_review.py covering the AST-shape detectors. Two rules for changes here:

  • Add a new detector with a test. Any new T-* or S-* pattern added to the modules ships with a test that has the smallest AST sample exercising it.
  • Add a regression test for any false positive removed. When the skill stops firing on a shape that used to look hot, the reason should appear as a test case so the regression is discoverable later.

The Iron Law applies: a new detector without a failing test first is a request to skip TDD on a code-analysis component, which is exactly the place where TDD pays off most.

Exit Criteria

  • A perf-review report file exists for the requested target.
  • Every finding carries a severity label and a concrete suggestion the caller can act on.
  • Time-complexity (T1-T6) and space-complexity (S1-S3) detectors have been run; tier coverage is reported.
  • Tier 2 (gauntlet treesitter) and Tier 3 (graph store) contracts honor the optional-import sentinel: missing modules return [] rather than raising.
  • Each new detector ships with a smallest-AST test that fails before the detector exists; each removed false positive ships with a regression test.
  • Findings flow into Skill(pensive:unified-review) without translation when invoked from the unified entry point.