Research Gap Detect
Analyze the research corpus citation graph to find disconnected clusters, isolated papers, and gap opportunities. Optionally searches for bridge paper candidates and files gap issues.
Triggers
- "find research gaps"
- "detect clusters"
- "cluster analysis"
- "find isolated papers"
- "bridge candidate search"
/research-gap-detect
Parameters
--clusters-only (optional)
Only run cluster detection — skip bridge search and issue filing.
--file-issues (optional)
Auto-file gap issues for each disconnected cluster pair.
--search-bridges (optional)
Search external databases for papers that could bridge disconnected clusters.
--min-cluster-size N (optional)
Minimum papers in a cluster to report. Default: 2.
--format (optional)
Output format: full (default), summary, or json.
Execution Flow
Phase 1: Build Citation Graph
- Read the citation-network index (from
/corpus-index-build --graph citation-network)- If stale or missing: run
/corpus-index-build --graph citation-networkfirst
- If stale or missing: run
- Build an adjacency list from outgoing + incoming edges
- Treat as undirected for cluster detection (A cites B ≡ A connected to B)
Phase 2: Connected Components (BFS)
Run BFS/connected-components on the undirected citation graph:
- Initialize: all nodes unvisited
- For each unvisited node: BFS to find its connected component
- Collect components sorted by size (largest first)
Output:
Connected Components: 9
Cluster 1: "Agentic Workflows" (124 papers)
Hub: REF-016 (34 connections)
Topics: agentic-workflows, multi-agent, orchestration
Sample: REF-001, REF-016, REF-024, REF-121 ...
Cluster 2: "GUI Agents" (31 papers)
Hub: REF-198 (12 connections)
Topics: gui-agents, web-agents, screen-understanding
Sample: REF-198, REF-201, REF-215 ...
...
Cluster 9: "Isolated" (3 papers)
No hub (all degree 1)
REF-299, REF-312, REF-350
Phase 3: Gap Analysis
For each pair of clusters, assess the gap:
- Topic overlap — do the clusters share any tags?
- Temporal overlap — do they cover the same years?
- Author overlap — do any authors appear in both clusters?
- Bridgeability — could a single paper connect them?
Prioritize gaps by:
- Size product — larger clusters disconnected = higher priority
- Topic proximity — clusters with related but not identical topics
- Recency — newer clusters may simply be missing recent cross-citations
Output:
Gap Analysis: 12 cluster pairs
Priority 1: "Agentic Workflows" ↔ "GUI Agents"
Gap: 124 × 31 = 3,844 (size product)
Topic overlap: agent, llm (2 shared tags)
Bridge opportunity: HIGH
Suggested search: "LLM agent GUI interaction orchestration"
Priority 2: "Evaluation" ↔ "Reproducibility"
Gap: 45 × 28 = 1,260
Topic overlap: evaluation, benchmark (2 shared tags)
Bridge opportunity: MEDIUM
Suggested search: "reproducible LLM evaluation benchmarks"
...
Phase 4: Bridge Search (if --search-bridges)
For each high-priority gap:
- Generate search queries from cluster topic overlap
- Search external databases (Semantic Scholar, arXiv, Google Scholar)
- Filter candidates by:
- Cites papers from BOTH clusters
- Published in overlapping time range
- High citation count (likely to be connecting work)
- Rank candidates by bridge potential
Output:
Bridge Candidates Found: 8
For gap "Agentic Workflows" ↔ "GUI Agents":
1. "WebAgent: World-Centric Web Navigation" (2024)
Cites: REF-016 (Cluster 1), REF-198 (Cluster 2)
Citations: 87
Bridge potential: HIGH
2. "Agent-E: Vision-Language Planning for Web Tasks" (2024)
Cites: REF-024 (Cluster 1), REF-201 (Cluster 2)
Citations: 45
Bridge potential: MEDIUM
Phase 5: File Issues (if --file-issues)
For each gap with bridge candidates, file a research induction issue:
## Research Gap: [Cluster A] ↔ [Cluster B]
**Gap Size**: [N × M papers disconnected]
**Bridge Candidates**: [list]
**Suggested Action**: Induct [top candidate] to connect clusters
### Bridge Papers to Induct
- [ ] "WebAgent: World-Centric Web Navigation" — arxiv:2401.XXXXX
- [ ] "Agent-E: Vision-Language Planning" — arxiv:2403.XXXXX
Phase 6: Report
Research Gap Detection
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Graph: 372 nodes, 1,247 edges
Connected components: 9
Largest cluster: 124 papers ("Agentic Workflows")
Isolated papers: 3
Gap analysis: 12 cluster pairs
HIGH priority: 4 (bridge candidates available)
MEDIUM priority: 5
LOW priority: 3
Bridge candidates found: 8 papers
Issues filed: 4
Papers recommended for induction: 8
Distinction from research-gap
| Tool | Approach | Output |
|---|---|---|
research-gap |
Intellectual — topic coverage, missing areas, GRADE gaps | Gap report with search queries |
research-gap-detect |
Structural — citation graph topology, disconnected components | Cluster map, bridge candidates, filed issues |
research-gap answers "what topics are we missing?" while research-gap-detect answers "which existing papers don't cite each other but should?"
Examples
# Full analysis with bridge search
/research-gap-detect --search-bridges
# Just show clusters
/research-gap-detect --clusters-only
# Detect and auto-file issues
/research-gap-detect --file-issues
# Combined: search + file
/research-gap-detect --search-bridges --file-issues
# JSON for visualization
/research-gap-detect --format json
References
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/corpus-index-build/SKILL.md — Builds the citation-network graph
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/citation-backfill/SKILL.md — Prerequisite: complete bidirectional edges
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-gap/SKILL.md — Complementary intellectual gap analysis
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/induct-research/SKILL.md — Inducts bridge candidates