RLM Search

The full Recursive Language Model pipeline in one command. Prepares content if needed, fans the query out across all chunks, and recursively synthesizes results until they fit in a single context window. Use this when you need to answer a question against content too large to read at once.

Triggers

Alternate expressions and non-obvious activations:

"deep search this codebase" → rlm-search with source .
"answer this using the whole repo" → rlm-search with source .
"recursive search" → rlm-search
"search the entire codebase for X" → rlm-search with extracted query
"use RLM to find X" → rlm-search

Trigger Patterns Reference

Pattern	Example	Action
Whole-repo search	"search the entire codebase for all usages of deprecated API"	`rlm-search "..." --source .`
Directory search	"recursively search src/ for logging calls"	`--source src/`
File search	"use RLM to analyze this 5000-line file"	`--source path/to/file.ts`
Budget limit	"search but cap at 200k tokens"	`--budget 200000`
Depth limit	"search up to 2 levels deep"	`--depth 2`
Skip re-prep	"search using the existing prep"	No re-prep if manifest exists

Behavior

When triggered:

Extract query and source — identify the natural language query and the source path (file or directory). Default source is . (current directory).
Check for existing prep — look for a valid manifest in .aiwg/rlm-prep/ matching the source. Reuse only when the prep index covers every source file, manifest, and chunk. If missing, stale, incomplete, or from an older single-chunk-dropping prep run, rebuild with rlm-prep.
Initial fanout (level 1) — dispatch the query across all chunks, up to --max-parallel subagents at a time. Collect results with provenance.
Check synthesis fit — measure the total size of all level-1 results. If they fit in a single context window, synthesize directly (base case). If not, recurse.
Recursive reduction — chunk the level-1 results into a new set of chunks and fan out again. Each level-N subagent synthesizes the results from one batch of level-(N-1) answers. Repeat until the output fits in one window.
Final synthesis — produce a single coherent answer from the last reduction level. Include provenance: trace each claim back to a source file and line range.
Cost summary — report total tokens consumed, number of subagents launched, recursion depth reached, and USD cost estimate.

Recursion Diagram

Level 0 (root query)
  └── Level 1 fanout: N subagents (one per chunk)
        ├── chunk-0001 → answer fragment A
        ├── chunk-0002 → answer fragment B
        ├── chunk-0003 → (no match)
        └── chunk-0004 → answer fragment C

      If A + B + C fit in one window:
        └── Synthesize → Final Answer  ✓

      If A + B + C do NOT fit:
        Level 2 fanout: chunk the level-1 results
          ├── [A + B] → synthesis fragment 1
          └── [C]     → synthesis fragment 2
          └── Synthesize fragments 1 + 2 → Final Answer  ✓

The default --depth 3 means the pipeline will recurse at most 3 times before forcing synthesis even if results are large.

Final Answer Format

RLM Search Complete
Query: "Where is rate limiting implemented?"
Source: src/  |  Chunks: 47  |  Depth reached: 1  |  Subagents: 14

Answer:

Rate limiting is implemented in three places:

1. **API gateway level** — `src/gateway/rate-limit.ts` (lines 12-45) applies
   a sliding window limiter using Redis. Limits are configured per route in
   `config/rate-limits.yaml`.

2. **Auth service** — `src/auth/middleware.ts` (lines 88-102) imposes a
   per-IP limit of 10 login attempts per minute using an in-memory store.

3. **WebSocket connections** — `src/realtime/server.ts` (lines 231-248)
   limits new connections per second to prevent connection floods.

Cost summary: 47 subagents, 184,320 tokens (~$0.18), 1 synthesis pass

Parameters

<query> — Natural language question or task (required)
--source <file|dir> — Source content to search (default: .)
--depth N — Maximum recursion depth before forcing synthesis (default: 3)
--max-parallel N — Max parallel subagents per level (default: 4, bounded by context budget). Alias --parallel is also accepted by the CLI.
--budget N — Token budget for the entire operation (default: 500000)

Prep coverage note: files that fit within one chunk are still written to a manifest and included in the search plan. Existing prep indexes are validated before reuse so older partial indexes are rebuilt automatically.

Examples

Example 1: Whole-codebase search

User: "search the entire codebase for where authentication tokens are validated"

Action: Check for existing prep of ., fanout across all chunks, synthesize.

Response:

RLM Search Complete
Query: "where are authentication tokens validated?"
Source: .  |  Chunks: 84  |  Depth: 1  |  Subagents: 84

Answer:

Token validation occurs at two layers:

1. **HTTP middleware** — `src/auth/middleware.ts` lines 34-67: the
   `validateToken` function decodes and verifies JWTs using the
   `jsonwebtoken` library, checking signature and expiry.

2. **GraphQL context** — `src/graphql/context.ts` lines 18-31: calls
   `validateToken` on every request and attaches the decoded payload
   to the GraphQL execution context.

Cost: 84 subagents, 241,800 tokens (~$0.24)

Example 2: Large document set, multi-level recursion

User: "use RLM to find all compliance-relevant data handling in the entire codebase"

Action:

aiwg rlm-search "find all places where PII or sensitive data is stored, transmitted, or logged" --source .

Level-1 produces 28 matching fragments totaling 40,000 tokens (too large for one pass). Level-2 reduces to 4 synthesis fragments, then final synthesis produces the answer.

Response: "Depth reached: 2. Found 14 locations across 9 files. [Full provenance-tagged answer]"

Example 3: Budget-constrained search

User: "deep search src/payments/ for Stripe webhook handling, cap at 100k tokens"

Action:

aiwg rlm-search "how are Stripe webhooks handled?" \
  --source src/payments/ \
  --budget 100000 \
  --max-parallel 4

Response: If budget would be exceeded, the pipeline pauses and reports: "Budget checkpoint: 82,400 tokens used. Continue (remaining budget: 17,600)? [y/n]"

Example 4: Single large file

User: "use RLM to analyze this 8,000-line migration file for rollback risk"

Action:

aiwg rlm-search "identify any irreversible operations with no rollback path" \
  --source db/migrations/0099_big_schema.sql \
  --depth 2

Response: Preps the single file into ~40 chunks, fans out, synthesizes. Reports all DROP, TRUNCATE, and ALTER TABLE ... DROP COLUMN statements with line numbers.

Example 5: Shallow search (fast mode)

User: "quick RLM search: where is the database connection string set?"

Action:

aiwg rlm-search "where is the database connection string configured?" \
  --source . \
  --depth 1 \
  --max-parallel 8

Response: Forces synthesis at depth 1 — faster but may miss cross-chunk context. Reports results within a single fanout pass.

Clarification Prompts

If the user's intent is ambiguous:

"Should I search the whole repo or a specific directory?"
"What token budget should I use? Default is 500,000 tokens (~$0.50 with haiku)."
"Is this a one-time search or should I prep the source for repeated queries?"

References

@$AIWG_ROOT/agentic/code/addons/rlm/skills/chunk/SKILL.md — Chunking used in prep stage
@$AIWG_ROOT/agentic/code/addons/rlm/skills/fanout/SKILL.md — Fanout used at each recursion level
@$AIWG_ROOT/agentic/code/addons/rlm/skills/rlm-prep/SKILL.md — Prep stage (called automatically if needed)
@$AIWG_ROOT/agentic/code/addons/rlm/skills/rlm-status/SKILL.md — Monitor a running rlm-search
@$AIWG_ROOT/agentic/code/addons/rlm/schemas/rlm-state.yaml — State schema for in-progress searches
@$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/context-budget.md — Budget and parallel limits
@$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/subagent-scoping.md — Subagent isolation (max 2-level delegation)
@.aiwg/research/findings/REF-089-recursive-language-models.md — RLM research foundation