
corpus-snapshot

Generate a corpus snapshot report: compute dimensions, topology, degree distribution, and the delta from the previous snapshot. Also assists with the cluster, chain, and gap analysis sections.

Stars: 141
Source: jmagly/aiwg
Updated: 2026-05-31
Slug: jmagly--aiwg--corpus-snapshot

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/jmagly/aiwg/HEAD/agentic/code/frameworks/research-complete/skills/corpus-snapshot/SKILL.md -o .claude/skills/corpus-snapshot.md

Drops the SKILL.md into .claude/skills/corpus-snapshot.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

Corpus Snapshot

Generate a point-in-time snapshot of the research corpus with computed metrics and analysis. Reads a snapshot template, fills [COMPUTE] sections with data, assists with [ANALYZE] sections, and writes the completed report.

Triggers

  • "take a corpus snapshot"
  • "generate corpus report"
  • "snapshot the research"
  • "corpus snapshot"
  • /corpus-snapshot

Parameters

--compute-only (optional)

Only compute data sections — skip analysis sections. Faster, fully automated.

--delta-only (optional)

Only compute the delta from the previous snapshot. Useful for tracking session progress.

--template <path> (optional)

Custom template path. Default: .aiwg/reports/corpus-snapshot-template.md.

--format <format> (optional)

Output format: full (default for the report file), summary (terminal), json (programmatic).

Prerequisites

Before generating a snapshot, the following should be current:

| Prerequisite | Command | Gates on |
| --- | --- | --- |
| Citation edges complete | /citation-backfill | Topology metrics |
| Indices up to date | /corpus-index-build | Group counts, hub analysis |
| Stub rate < 10% | /research-quality-audit | Snapshot validity |

If prerequisites are stale, the snapshot will include warnings.
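As a rough illustration of the stub-rate gate, the check could look like the sketch below. The function name, its arguments, and the warning wording are hypothetical; only the 10% threshold and the `/research-quality-audit` command come from the table above.

```python
def prerequisite_warnings(total_papers, stub_count, threshold=0.10):
    """Return a list of warning strings for stale prerequisites.

    Only the stub-rate check is sketched here; the function name and
    message text are illustrative assumptions, not part of the skill.
    """
    warnings = []
    stub_rate = stub_count / total_papers if total_papers else 0.0
    if stub_rate >= threshold:
        warnings.append(
            f"Stub rate {stub_rate:.0%} is not below {threshold:.0%}; "
            "see /research-quality-audit"
        )
    return warnings


# 65 stubs out of 372 papers is about 17%, so one warning is produced.
print(prerequisite_warnings(372, 65))
```

A snapshot generated in that state would still be written, with the warning attached rather than the run aborted.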

Execution Flow

Phase 1: Collect Raw Metrics

Scan the corpus and compute:

Dimensions:

  • Total papers (node count)
  • Total citation edges (edge count)
  • Topics (unique tag count)
  • Authors (unique author count)
  • Year range (oldest → newest)
  • Source types distribution

Topology (from citation-network index):

  • Graph density: edges / (nodes * (nodes-1)), i.e. the density of a directed graph without self-citations
  • Average degree (mean edges per node)
  • Max hub (node with most connections)
  • Connected components count
  • Isolated nodes (degree 0)
  • Diameter estimate (longest shortest path in largest component)
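The topology metrics above can be sketched in plain Python. This is a minimal illustration, not the skill's implementation: the input shape (a list of paper IDs plus `(citing, cited)` pairs) is an assumption, degree is taken as in-degree plus out-degree, and components are counted on the undirected view of the graph. The diameter estimate is left out. With the figures used elsewhere in this document (372 papers, 1,247 edges), the density formula gives 1247 / (372 × 371) ≈ 0.009.

```python
from collections import deque


def topology_metrics(nodes, edges):
    """Basic topology metrics for a directed citation graph.

    nodes: iterable of paper IDs (hypothetical input shape).
    edges: iterable of (citing_id, cited_id) pairs; both IDs must be in nodes.
    """
    nodes = list(nodes)
    edges = list(edges)
    n = len(nodes)

    degree = {node: 0 for node in nodes}         # in-degree + out-degree
    neighbors = {node: set() for node in nodes}  # undirected view for components
    for citing, cited in edges:
        degree[citing] += 1
        degree[cited] += 1
        neighbors[citing].add(cited)
        neighbors[cited].add(citing)

    # Count weakly connected components with a breadth-first search.
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbors[current] - seen:
                seen.add(nxt)
                queue.append(nxt)

    return {
        "density": len(edges) / (n * (n - 1)) if n > 1 else 0.0,
        "average_degree": sum(degree.values()) / n if n else 0.0,
        "max_hub": max(degree, key=degree.get) if n else None,
        "components": components,
        "isolated": [node for node in nodes if degree[node] == 0],
    }


metrics = topology_metrics(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
print(metrics)
```

In the toy graph, `B` is the hub, `D` is isolated, and there are two components.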

Degree Distribution:

  • Histogram: how many nodes have degree 0, 1-2, 3-5, 6-10, 11-20, 21+
  • Power law fit (if applicable)
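A minimal sketch of the histogram bucketing, assuming the input is a flat list of node degrees. The top bucket is treated as 21 and above so that a degree of exactly 20 lands in one bucket only. The power law fit is not covered here.

```python
from collections import Counter

# (label, lower bound, upper bound); None means no upper bound.
BINS = [
    ("0", 0, 0),
    ("1-2", 1, 2),
    ("3-5", 3, 5),
    ("6-10", 6, 10),
    ("11-20", 11, 20),
    ("21+", 21, None),
]


def degree_histogram(degrees):
    """Count how many nodes fall into each degree bucket."""
    counts = Counter()
    for d in degrees:
        for label, lo, hi in BINS:
            if d >= lo and (hi is None or d <= hi):
                counts[label] += 1
                break
    # Return every bucket, including empty ones, in display order.
    return {label: counts[label] for label, _, _ in BINS}


print(degree_histogram([0, 0, 1, 2, 5, 20, 21, 34]))
# {'0': 2, '1-2': 2, '3-5': 1, '6-10': 0, '11-20': 1, '21+': 2}
```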

Quality Distribution:

  • GRADE breakdown: High / Moderate / Low / Very Low
  • Doc depth: Full / Adequate / Stub / Skeleton (from quality-audit)
  • Source availability: PDF present / Full text extracted / Missing
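The GRADE breakdown is a simple percentage table. The sketch below assumes each paper carries one GRADE label as a string; the function name is hypothetical. Because each level is rounded independently, the percentages may add up to 99 or 101 rather than exactly 100.

```python
from collections import Counter

GRADE_LEVELS = ["High", "Moderate", "Low", "Very Low"]


def grade_breakdown(grades):
    """Return the percentage of papers at each GRADE level, rounded to whole numbers."""
    counts = Counter(grades)
    total = sum(counts[level] for level in GRADE_LEVELS)
    return {
        level: round(100 * counts[level] / total) if total else 0
        for level in GRADE_LEVELS
    }


print(grade_breakdown(["High", "High", "Low", "Very Low"]))
# {'High': 50, 'Moderate': 0, 'Low': 25, 'Very Low': 25}
```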

Phase 2: Compute Delta (if previous snapshot exists)

Compare current metrics against the most recent snapshot:

Delta from previous snapshot (2026-04-10):
  Papers:     +12 (360 → 372)
  Edges:      +87 (1,160 → 1,247)
  Density:    +0.001 (0.008 → 0.009)
  New topics:  +2 (gui-agents, code-generation)
  Stubs fixed: 23 (88 → 65)
  New hubs:    REF-364 (entered top 10)
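The delta arithmetic can be sketched as a comparison of two metric dictionaries. The dictionary keys and both function names are illustrative assumptions; how the skill actually stores the previous snapshot's metrics is not specified here.

```python
def compute_delta(previous, current):
    """Return {metric: (change, old, new)} for numeric metrics present in both snapshots."""
    delta = {}
    for key, new in current.items():
        old = previous.get(key)
        if isinstance(old, (int, float)) and isinstance(new, (int, float)):
            delta[key] = (new - old, old, new)
    return delta


def format_delta_line(label, change, old, new):
    """Format one integer delta row, e.g. 'Edges: +87 (1,160 → 1,247)'."""
    return f"{label}: {change:+,} ({old:,} → {new:,})"


previous = {"papers": 360, "edges": 1160, "stubs": 88}
current = {"papers": 372, "edges": 1247, "stubs": 65}

for name, (change, old, new) in compute_delta(previous, current).items():
    print(format_delta_line(name.capitalize(), change, old, new))
```

Non-numeric fields such as new topic names or hub IDs would need set comparisons instead of subtraction.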

Phase 3: Fill Template Sections

Read the snapshot template and fill sections:

[COMPUTE] sections — fully automated:

  • Dimensions table
  • Topology metrics
  • Degree distribution histogram
  • GRADE distribution
  • Delta table

[ANALYZE] sections — agent-assisted:

  • Cluster narrative: describe the main clusters and their themes
  • Chain analysis: identify citation chains (A→B→C→D) and their significance
  • Gap narrative: summarize disconnected areas and bridge opportunities
  • Trend analysis: what's growing, what's stagnant

Phase 4: Write Report

Write the completed snapshot to:

.aiwg/reports/corpus-snapshot-YYYY-MM-DD.md

With frontmatter:

---
type: corpus-snapshot
date: 2026-04-13
papers: 372
edges: 1247
density: 0.009
components: 9
stub_rate: 0.17
previous: corpus-snapshot-2026-04-10.md
---
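A small sketch of how the output path and frontmatter could be produced. Both helper names are hypothetical, and the frontmatter renderer only handles simple scalar values like the ones shown above; it does no YAML quoting or escaping.

```python
from datetime import date


def report_path(snapshot_date):
    """Build the dated report path from a datetime.date."""
    return f".aiwg/reports/corpus-snapshot-{snapshot_date.isoformat()}.md"


def render_frontmatter(fields):
    """Render a dict of simple scalars as a frontmatter block."""
    lines = ["---"]
    for key, value in fields.items():
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


print(report_path(date(2026, 4, 13)))
print(render_frontmatter({"type": "corpus-snapshot", "papers": 372, "density": 0.009}))
```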

Phase 5: Report Summary

Corpus Snapshot Generated
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Papers: 372 (+12)  |  Edges: 1,247 (+87)
Density: 0.009     |  Components: 9
Hub: REF-016 (34)  |  Isolated: 3
GRADE: 33% High, 24% Mod, 26% Low, 16% VLow
Stubs: 65 (17%)    |  Full text: 54%

Delta highlights:
  +12 papers inducted
  +87 citation edges (backfill)
  -23 stubs (expanded)
  +2 new topics

Written to: .aiwg/reports/corpus-snapshot-2026-04-13.md

Template Format

The default template uses markers for computed vs analyzed sections:

# Corpus Snapshot — [DATE]

## Dimensions
[COMPUTE: dimensions-table]

## Topology
[COMPUTE: topology-metrics]

## Degree Distribution
[COMPUTE: degree-histogram]

## Quality Distribution
[COMPUTE: grade-distribution]
[COMPUTE: depth-distribution]

## Delta
[COMPUTE: delta-from-previous]

## Cluster Analysis
[ANALYZE: describe main clusters, their themes, and notable papers]

## Citation Chains
[ANALYZE: identify significant citation chains and their meaning]

## Gaps and Opportunities
[ANALYZE: summarize disconnected areas and bridge opportunities]

## Recommendations
[ANALYZE: what should be inducted next, what needs expansion]
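One way to fill the markers is a regular-expression substitution. The sketch below is an assumption about the mechanics, not the skill's actual code: it replaces each `[COMPUTE: name]` marker with pre-rendered text from a dictionary and leaves `[ANALYZE: ...]` markers and unknown names untouched for a later pass.

```python
import re

# Matches [COMPUTE: name] and [ANALYZE: free text]; [DATE] is not matched.
MARKER = re.compile(r"\[(COMPUTE|ANALYZE):\s*([^\]]+)\]")


def fill_compute_sections(template, computed):
    """Replace COMPUTE markers whose name appears in `computed`."""

    def replace(match):
        kind, name = match.group(1), match.group(2).strip()
        if kind == "COMPUTE" and name in computed:
            return computed[name]
        return match.group(0)  # keep ANALYZE and unknown markers as-is

    return MARKER.sub(replace, template)


template = (
    "## Delta\n"
    "[COMPUTE: delta-from-previous]\n\n"
    "## Cluster Analysis\n"
    "[ANALYZE: describe main clusters, their themes, and notable papers]\n"
)
print(fill_compute_sections(template, {"delta-from-previous": "Papers: +12 (360 → 372)"}))
```

Using a function as the replacement avoids any backslash interpretation in the computed text.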

Integration Points

| Component | Relationship |
| --- | --- |
| corpus-index-build | Reads index metrics (topology, hubs, components) |
| research-quality-audit | Reads depth distribution; gates if stub rate > 10% |
| citation-backfill | Must run before snapshot for accurate topology |
| research-gap-detect | Cluster data feeds into gap narrative |
| research-status | Snapshot is the detailed version of the health score |

Examples

# Full snapshot with analysis
/corpus-snapshot

# Just data, no analysis sections
/corpus-snapshot --compute-only

# Delta from previous snapshot only
/corpus-snapshot --delta-only

# Custom template
/corpus-snapshot --template .aiwg/reports/custom-template.md

# JSON metrics for dashboards
/corpus-snapshot --format json

References

  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/corpus-index-build/SKILL.md — Index metrics source
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-quality-audit/SKILL.md — Depth distribution source
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/citation-backfill/SKILL.md — Prerequisite for topology
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-gap-detect/SKILL.md — Cluster data for narrative
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-status/SKILL.md — Health scoring complement