Skip to main content
AI/MLjmagly

research-gap-detect

Build the mutual citation graph, find connected components, identify isolated clusters, and optionally search for bridge candidates and file gap issues.

Stars
141
Source
jmagly/aiwg
Updated
2026-05-31
Slug
jmagly--aiwg--research-gap-detect
View on GitHubRaw SKILL.md

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/jmagly/aiwg/HEAD/agentic/code/frameworks/research-complete/skills/research-gap-detect/SKILL.md -o .claude/skills/research-gap-detect.md

Drops the SKILL.md into .claude/skills/research-gap-detect.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

Research Gap Detect

Analyze the research corpus citation graph to find disconnected clusters, isolated papers, and gap opportunities. Optionally searches for bridge paper candidates and files gap issues.

Triggers

  • "find research gaps"
  • "detect clusters"
  • "cluster analysis"
  • "find isolated papers"
  • "bridge candidate search"
  • /research-gap-detect

Parameters

--clusters-only (optional)

Only run cluster detection — skip bridge search and issue filing.

--file-issues (optional)

Auto-file gap issues for each disconnected cluster pair.

--search-bridges (optional)

Search external databases for papers that could bridge disconnected clusters.

--min-cluster-size N (optional)

Minimum papers in a cluster to report. Default: 2.

--format (optional)

Output format: full (default), summary, or json.

Execution Flow

Phase 1: Build Citation Graph

  1. Read the citation-network index (from /corpus-index-build --graph citation-network)
    • If stale or missing: run /corpus-index-build --graph citation-network first
  2. Build an adjacency list from outgoing + incoming edges
  3. Treat as undirected for cluster detection (A cites B ≡ A connected to B)

Phase 2: Connected Components (BFS)

Run BFS/connected-components on the undirected citation graph:

  1. Initialize: all nodes unvisited
  2. For each unvisited node: BFS to find its connected component
  3. Collect components sorted by size (largest first)

Output:

Connected Components: 9

Cluster 1: "Agentic Workflows" (124 papers)
  Hub: REF-016 (34 connections)
  Topics: agentic-workflows, multi-agent, orchestration
  Sample: REF-001, REF-016, REF-024, REF-121 ...

Cluster 2: "GUI Agents" (31 papers)
  Hub: REF-198 (12 connections)
  Topics: gui-agents, web-agents, screen-understanding
  Sample: REF-198, REF-201, REF-215 ...

...

Cluster 9: "Isolated" (3 papers)
  No hub (all degree 1)
  REF-299, REF-312, REF-350

Phase 3: Gap Analysis

For each pair of clusters, assess the gap:

  1. Topic overlap — do the clusters share any tags?
  2. Temporal overlap — do they cover the same years?
  3. Author overlap — do any authors appear in both clusters?
  4. Bridgeability — could a single paper connect them?

Prioritize gaps by:

  • Size product — larger clusters disconnected = higher priority
  • Topic proximity — clusters with related but not identical topics
  • Recency — newer clusters may simply be missing recent cross-citations

Output:

Gap Analysis: 12 cluster pairs

Priority 1: "Agentic Workflows" ↔ "GUI Agents"
  Gap: 124 × 31 = 3,844 (size product)
  Topic overlap: agent, llm (2 shared tags)
  Bridge opportunity: HIGH
  Suggested search: "LLM agent GUI interaction orchestration"

Priority 2: "Evaluation" ↔ "Reproducibility"
  Gap: 45 × 28 = 1,260
  Topic overlap: evaluation, benchmark (2 shared tags)
  Bridge opportunity: MEDIUM
  Suggested search: "reproducible LLM evaluation benchmarks"
...

Phase 4: Bridge Search (if --search-bridges)

For each high-priority gap:

  1. Generate search queries from cluster topic overlap
  2. Search external databases (Semantic Scholar, arXiv, Google Scholar)
  3. Filter candidates by:
    • Cites papers from BOTH clusters
    • Published in overlapping time range
    • High citation count (likely to be connecting work)
  4. Rank candidates by bridge potential

Output:

Bridge Candidates Found: 8

For gap "Agentic Workflows" ↔ "GUI Agents":
  1. "WebAgent: World-Centric Web Navigation" (2024)
     Cites: REF-016 (Cluster 1), REF-198 (Cluster 2)
     Citations: 87
     Bridge potential: HIGH

  2. "Agent-E: Vision-Language Planning for Web Tasks" (2024)
     Cites: REF-024 (Cluster 1), REF-201 (Cluster 2)
     Citations: 45
     Bridge potential: MEDIUM

Phase 5: File Issues (if --file-issues)

For each gap with bridge candidates, file a research induction issue:

## Research Gap: [Cluster A] ↔ [Cluster B]

**Gap Size**: [N × M papers disconnected]
**Bridge Candidates**: [list]
**Suggested Action**: Induct [top candidate] to connect clusters

### Bridge Papers to Induct
- [ ] "WebAgent: World-Centric Web Navigation" — arxiv:2401.XXXXX
- [ ] "Agent-E: Vision-Language Planning" — arxiv:2403.XXXXX

Phase 6: Report

Research Gap Detection
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Graph: 372 nodes, 1,247 edges
Connected components: 9
Largest cluster: 124 papers ("Agentic Workflows")
Isolated papers: 3

Gap analysis: 12 cluster pairs
  HIGH priority: 4 (bridge candidates available)
  MEDIUM priority: 5
  LOW priority: 3

Bridge candidates found: 8 papers
Issues filed: 4
Papers recommended for induction: 8

Distinction from research-gap

Tool Approach Output
research-gap Intellectual — topic coverage, missing areas, GRADE gaps Gap report with search queries
research-gap-detect Structural — citation graph topology, disconnected components Cluster map, bridge candidates, filed issues

research-gap answers "what topics are we missing?" while research-gap-detect answers "which existing papers don't cite each other but should?"

Examples

# Full analysis with bridge search
/research-gap-detect --search-bridges

# Just show clusters
/research-gap-detect --clusters-only

# Detect and auto-file issues
/research-gap-detect --file-issues

# Combined: search + file
/research-gap-detect --search-bridges --file-issues

# JSON for visualization
/research-gap-detect --format json

References

  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/corpus-index-build/SKILL.md — Builds the citation-network graph
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/citation-backfill/SKILL.md — Prerequisite: complete bidirectional edges
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-gap/SKILL.md — Complementary intellectual gap analysis
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/induct-research/SKILL.md — Inducts bridge candidates