AI/ML · jmagly

rlm-search

Run the full Recursive Language Model pipeline — prep, fan out across chunks, and recursively synthesize until results fit one context window

Stars: 141
Source: jmagly/aiwg
Updated: 2026-05-31
Slug: jmagly--aiwg--rlm-search

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/jmagly/aiwg/HEAD/agentic/code/addons/rlm/skills/rlm-search/SKILL.md -o .claude/skills/rlm-search.md

Drops the SKILL.md into .claude/skills/rlm-search.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

RLM Search

The full Recursive Language Model pipeline in one command. Prepares content if needed, fans the query out across all chunks, and recursively synthesizes results until they fit in a single context window. Use this when you need to answer a question against content too large to read at once.

Triggers

Alternate expressions and non-obvious activations:

  • "deep search this codebase" → rlm-search with source .
  • "answer this using the whole repo" → rlm-search with source .
  • "recursive search" → rlm-search
  • "search the entire codebase for X" → rlm-search with extracted query
  • "use RLM to find X" → rlm-search

Trigger Patterns Reference

| Pattern | Example | Action |
|---|---|---|
| Whole-repo search | "search the entire codebase for all usages of deprecated API" | rlm-search "..." --source . |
| Directory search | "recursively search src/ for logging calls" | --source src/ |
| File search | "use RLM to analyze this 5000-line file" | --source path/to/file.ts |
| Budget limit | "search but cap at 200k tokens" | --budget 200000 |
| Depth limit | "search up to 2 levels deep" | --depth 2 |
| Skip re-prep | "search using the existing prep" | No re-prep if manifest exists |
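
Once the query and options have been extracted from the user's phrasing, the flag mappings above can be assembled mechanically. A minimal sketch, assuming a hypothetical helper named `build_rlm_command` (it is not part of aiwg) that uses only the flags documented in this skill:

```python
import shlex
from typing import List, Optional


def build_rlm_command(
    query: str,
    source: str = ".",
    budget: Optional[int] = None,
    depth: Optional[int] = None,
) -> List[str]:
    """Assemble an `aiwg rlm-search` argv list (hypothetical helper, for illustration)."""
    argv = ["aiwg", "rlm-search", query, "--source", source]
    # Optional limits are appended only when the user asked for them.
    if budget is not None:
        argv += ["--budget", str(budget)]
    if depth is not None:
        argv += ["--depth", str(depth)]
    return argv


cmd = build_rlm_command("find logging calls", source="src/", budget=200000, depth=2)
print(shlex.join(cmd))  # shell-quoted form, safe to paste into a terminal
```

Building an argv list rather than a single string keeps the query intact as one argument even when it contains spaces or quotes.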

Behavior

When triggered:

  1. Extract query and source — identify the natural language query and the source path (file or directory). Default source is . (current directory).

  2. Check for existing prep: look for a valid manifest in .aiwg/rlm-prep/ matching the source. Reuse it only when the prep index covers every source file, manifest, and chunk. If the prep is missing, stale, incomplete, or left over from an older prep run that dropped files fitting in a single chunk, rebuild it with rlm-prep.

  3. Initial fanout (level 1) — dispatch the query across all chunks, up to --max-parallel subagents at a time. Collect results with provenance.

  4. Check synthesis fit — measure the total size of all level-1 results. If they fit in a single context window, synthesize directly (base case). If not, recurse.

  5. Recursive reduction — chunk the level-1 results into a new set of chunks and fan out again. Each level-N subagent synthesizes the results from one batch of level-(N-1) answers. Repeat until the output fits in one window.

  6. Final synthesis — produce a single coherent answer from the last reduction level. Include provenance: trace each claim back to a source file and line range.

  7. Cost summary — report total tokens consumed, number of subagents launched, recursion depth reached, and USD cost estimate.
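
The fanout and reduction steps above can be sketched as a small map-reduce loop. This is an illustration under stated assumptions, not the aiwg implementation: `ask` stands in for a subagent call, `count_tokens` is a crude word-count estimate, and provenance tracking and cost reporting are omitted.

```python
from typing import Callable, List, Tuple


def count_tokens(text: str) -> int:
    """Crude estimate (assumption: one token per whitespace-separated word)."""
    return len(text.split())


def batch_by_size(items: List[str], limit: int) -> List[List[str]]:
    """Greedily pack items into batches whose estimated size stays within `limit`."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for item in items:
        size = count_tokens(item)
        if current and used + size > limit:
            batches.append(current)
            current, used = [], 0
        current.append(item)
        used += size
    if current:
        batches.append(current)
    return batches


def rlm_search(
    query: str,
    chunks: List[str],
    ask: Callable[[str, str], str],
    window: int,
    max_depth: int = 3,
) -> Tuple[str, int]:
    """Fan out over chunks, then reduce level by level until results fit `window`."""
    # Level 1: one call per chunk; empty (no-match) results are dropped.
    results = [r for r in (ask(query, c) for c in chunks) if r]
    depth = 1
    # Recurse while the results overflow the window and the depth cap allows it.
    while sum(count_tokens(r) for r in results) > window and depth < max_depth:
        batches = batch_by_size(results, window)
        results = [ask(query, "\n".join(b)) for b in batches]
        depth += 1
    # Final synthesis; this is forced if the depth cap was reached first.
    return ask(query, "\n".join(results)), depth


def toy_subagent(query: str, text: str) -> str:
    # Hypothetical stand-in for a model call: keep lines mentioning the query term.
    return "\n".join(line for line in text.splitlines() if query in line)


chunks = ["rate limit in gateway", "nothing here", "rate limit in auth"]
answer, depth = rlm_search("rate", chunks, toy_subagent, window=100)
print(depth)
print(answer)
```

With a generous window the loop body never runs and the search finishes at depth 1, which corresponds to the base case in step 4.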

Recursion Diagram

Level 0 (root query)
  └── Level 1 fanout: N subagents (one per chunk)
        ├── chunk-0001 → answer fragment A
        ├── chunk-0002 → answer fragment B
        ├── chunk-0003 → (no match)
        └── chunk-0004 → answer fragment C

      If A + B + C fit in one window:
        └── Synthesize → Final Answer  ✓

      If A + B + C do NOT fit:
        Level 2 fanout: chunk the level-1 results
          ├── [A + B] → synthesis fragment 1
          └── [C]     → synthesis fragment 2
        └── Synthesize fragments 1 + 2 → Final Answer  ✓

The default --depth 3 caps recursion at three levels: once level 3 is reached, the pipeline forces a final synthesis even if the results still exceed one context window.

Final Answer Format

RLM Search Complete
Query: "Where is rate limiting implemented?"
Source: src/  |  Chunks: 47  |  Depth reached: 1  |  Subagents: 47

Answer:

Rate limiting is implemented in three places:

1. **API gateway level** — `src/gateway/rate-limit.ts` (lines 12-45) applies
   a sliding window limiter using Redis. Limits are configured per route in
   `config/rate-limits.yaml`.

2. **Auth service** — `src/auth/middleware.ts` (lines 88-102) imposes a
   per-IP limit of 10 login attempts per minute using an in-memory store.

3. **WebSocket connections** — `src/realtime/server.ts` (lines 231-248)
   limits new connections per second to prevent connection floods.

Cost summary: 47 subagents, 184,320 tokens (~$0.18), 1 synthesis pass
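
The dollar figure in the sample output is consistent with roughly $1 per million tokens (184,320 tokens comes to about $0.18). A short sketch of how such a line could be formatted, assuming that rate; it is inferred from the sample, not a documented price, and `cost_summary` is a hypothetical name:

```python
def cost_summary(subagents: int, tokens: int, passes: int,
                 usd_per_million: float = 1.0) -> str:
    """Format a cost line. The $1/M default is inferred from the sample output."""
    usd = tokens / 1_000_000 * usd_per_million
    noun = "pass" if passes == 1 else "passes"
    return (f"Cost summary: {subagents} subagents, {tokens:,} tokens "
            f"(~${usd:.2f}), {passes} synthesis {noun}")


print(cost_summary(47, 184_320, 1))
# Cost summary: 47 subagents, 184,320 tokens (~$0.18), 1 synthesis pass
```

Actual cost depends on the model used and on the split between input and output tokens, so treat the single rate as a rough estimate.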

Parameters

  • <query> — Natural language question or task (required)
  • --source <file|dir> — Source content to search (default: .)
  • --depth N — Maximum recursion depth before forcing synthesis (default: 3)
  • --max-parallel N — Max parallel subagents per level (default: 4, bounded by context budget). Alias --parallel is also accepted by the CLI.
  • --budget N — Token budget for the entire operation (default: 500000)

Prep coverage note: files that fit within one chunk are still written to a manifest and included in the search plan. Existing prep indexes are validated before reuse so older partial indexes are rebuilt automatically.
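
The coverage check described in the note can be pictured as a set comparison between the files on disk and the files the manifest knows about. A minimal sketch, assuming a hypothetical manifest shape (source path mapped to a list of chunk ids); the real rlm-prep manifest format may differ:

```python
from typing import Dict, Iterable, List


def missing_from_prep(source_files: Iterable[str],
                      manifest: Dict[str, List[str]]) -> List[str]:
    """Return source files that have no chunks recorded in the manifest."""
    return sorted(f for f in source_files if not manifest.get(f))


# Assumed shape: source path -> chunk ids (illustrative only).
manifest = {
    "src/big.ts": ["chunk-0001", "chunk-0002"],
    "src/small.ts": ["chunk-0003"],  # a file that fits in one chunk is still listed
}
files = ["src/big.ts", "src/small.ts", "src/new.ts"]

missing = missing_from_prep(files, manifest)
needs_rebuild = bool(missing)  # any uncovered file means the prep is incomplete
print(missing, needs_rebuild)
```

A complete check would also compare timestamps or content hashes to detect stale entries; this sketch covers only the "every source file is present" condition.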

Examples

Example 1: Whole-codebase search

User: "search the entire codebase for where authentication tokens are validated"

Action: Check for an existing prep of the current directory (.), fan out across all chunks, then synthesize.

Response:

RLM Search Complete
Query: "where are authentication tokens validated?"
Source: .  |  Chunks: 84  |  Depth reached: 1  |  Subagents: 84

Answer:

Token validation occurs at two layers:

1. **HTTP middleware** — `src/auth/middleware.ts` lines 34-67: the
   `validateToken` function decodes and verifies JWTs using the
   `jsonwebtoken` library, checking signature and expiry.

2. **GraphQL context** — `src/graphql/context.ts` lines 18-31: calls
   `validateToken` on every request and attaches the decoded payload
   to the GraphQL execution context.

Cost summary: 84 subagents, 241,800 tokens (~$0.24)

Example 2: Large document set, multi-level recursion

User: "use RLM to find all compliance-relevant data handling in the entire codebase"

Action:

aiwg rlm-search "find all places where PII or sensitive data is stored, transmitted, or logged" --source .

Level-1 produces 28 matching fragments totaling 40,000 tokens (too large for one pass). Level-2 reduces to 4 synthesis fragments, then final synthesis produces the answer.

Response: "Depth reached: 2. Found 14 locations across 9 files. [Full provenance-tagged answer]"


Example 3: Budget-constrained search

User: "deep search src/payments/ for Stripe webhook handling, cap at 100k tokens"

Action:

aiwg rlm-search "how are Stripe webhooks handled?" \
  --source src/payments/ \
  --budget 100000 \
  --max-parallel 4

Response: If the budget would be exceeded, the pipeline pauses and reports: "Budget checkpoint: 82,400 tokens used. Continue (remaining budget: 17,600)? [y/n]"
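
The checkpoint arithmetic is simple: the remaining budget is the cap minus the tokens already used (100,000 - 82,400 = 17,600). A sketch of the check, assuming a hypothetical `next_cost` estimate for the upcoming step; how aiwg actually estimates that cost is not documented here:

```python
from typing import Optional


def budget_checkpoint(used: int, budget: int, next_cost: int) -> Optional[str]:
    """Return a confirmation prompt if the next step would exceed the budget, else None."""
    if used + next_cost <= budget:
        return None  # enough headroom, continue without pausing
    remaining = budget - used
    return (f"Budget checkpoint: {used:,} tokens used. "
            f"Continue (remaining budget: {remaining:,})? [y/n]")


print(budget_checkpoint(82_400, 100_000, 20_000))
# Budget checkpoint: 82,400 tokens used. Continue (remaining budget: 17,600)? [y/n]
```

Checking before each step, rather than after, is what allows the run to pause with budget still in hand instead of reporting an overrun.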


Example 4: Single large file

User: "use RLM to analyze this 8,000-line migration file for rollback risk"

Action:

aiwg rlm-search "identify any irreversible operations with no rollback path" \
  --source db/migrations/0099_big_schema.sql \
  --depth 2

Response: Preps the single file into ~40 chunks, fans out, synthesizes. Reports all DROP, TRUNCATE, and ALTER TABLE ... DROP COLUMN statements with line numbers.


Example 5: Shallow search (fast mode)

User: "quick RLM search: where is the database connection string set?"

Action:

aiwg rlm-search "where is the database connection string configured?" \
  --source . \
  --depth 1 \
  --max-parallel 8

Response: Forces synthesis at depth 1 — faster but may miss cross-chunk context. Reports results within a single fanout pass.
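
Raising --max-parallel increases how many subagents run at once within a level; it does not change how many chunks are searched. A sketch of bounded fanout using Python's standard thread pool, with `toy_ask` as a hypothetical stand-in for a subagent call:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def fan_out(query: str, chunks: List[str],
            ask: Callable[[str, str], str], max_parallel: int = 4) -> List[str]:
    """Run `ask` on every chunk with at most `max_parallel` calls in flight."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() preserves input order, so result i corresponds to chunk i.
        return list(pool.map(lambda chunk: ask(query, chunk), chunks))


def toy_ask(query: str, chunk: str) -> str:
    # Hypothetical subagent: return the chunk if it mentions the query term.
    return chunk if query in chunk else ""


results = fan_out("DATABASE_URL", ["sets DATABASE_URL", "sets PORT"],
                  toy_ask, max_parallel=8)
print(results)
```

Because results come back in chunk order, each answer can be paired with its chunk id for provenance without extra bookkeeping.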

Clarification Prompts

If the user's intent is ambiguous:

  • "Should I search the whole repo or a specific directory?"
  • "What token budget should I use? Default is 500,000 tokens (~$0.50 with Haiku)."
  • "Is this a one-time search or should I prep the source for repeated queries?"

References

  • @$AIWG_ROOT/agentic/code/addons/rlm/skills/chunk/SKILL.md — Chunking used in prep stage
  • @$AIWG_ROOT/agentic/code/addons/rlm/skills/fanout/SKILL.md — Fanout used at each recursion level
  • @$AIWG_ROOT/agentic/code/addons/rlm/skills/rlm-prep/SKILL.md — Prep stage (called automatically if needed)
  • @$AIWG_ROOT/agentic/code/addons/rlm/skills/rlm-status/SKILL.md — Monitor a running rlm-search
  • @$AIWG_ROOT/agentic/code/addons/rlm/schemas/rlm-state.yaml — State schema for in-progress searches
  • @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/context-budget.md — Budget and parallel limits
  • @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/subagent-scoping.md — Subagent isolation (max 2-level delegation)
  • @.aiwg/research/findings/REF-089-recursive-language-models.md — RLM research foundation