Design RAG Architecture

Design a Retrieval-Augmented Generation system for a given use case.

Arguments

$ARGUMENTS - The RAG use case to design for (e.g., "customer support chatbot", "documentation Q&A", "legal document search", "code assistant")

Workflow

Clarify requirements by understanding:
- What types of questions will users ask?
- How large is the document corpus, and what document types does it contain?
- What level of accuracy and faithfulness is required?
- What is the end-to-end latency budget?
- Is multi-turn conversation required?
Load relevant skills based on the use case:
- RAG patterns → rag-architecture
- Vector store selection → vector-databases
- LLM serving → llm-serving-patterns
- Inference optimization → ml-inference-optimization
Spawn the rag-architect agent for comprehensive design:
- Use the Task tool with subagent_type="rag-architect"
- Provide the full use case context and requirements
- Request an end-to-end RAG architecture
Design the ingestion pipeline:
- Document extraction (PDF, HTML, code)
- Chunking strategy selection
- Embedding model selection
- Vector database configuration
- Metadata extraction and indexing
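
To make the chunking step above concrete, here is a minimal sketch of one possible strategy: fixed-size word windows with overlap. The names `Chunk` and `chunk_words`, and the default `size` and `overlap` values, are illustrative assumptions rather than part of this command. A real design might instead chunk by tokens, sentences, or document structure.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str   # identifier of the source document (kept as metadata)
    index: int    # position of the chunk within the document
    text: str     # chunk content to be embedded


def chunk_words(doc_id: str, text: str, size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split text into windows of `size` words that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: list[Chunk] = []
    for index, start in enumerate(range(0, len(words), step)):
        chunks.append(Chunk(doc_id, index, " ".join(words[start:start + size])))
        # Stop once a window has reached the end of the document,
        # so no trailing chunk consists only of overlap words.
        if start + size >= len(words):
            break
    return chunks
```

The overlap trades index size for robustness: a sentence cut at a window boundary still appears whole in one of the two neighboring chunks.
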
Design the retrieval pipeline:
- Query processing (expansion, HyDE)
- Retrieval strategy (dense, sparse, hybrid)
- Reranking approach
- Context assembly
- Prompt engineering
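
For the hybrid retrieval option above, one common way to merge a dense ranking with a sparse one is reciprocal rank fusion (RRF). The sketch below is an illustration under assumptions: the function name and the document IDs in the usage note are hypothetical, and `k = 60` is a conventional default, not a value this command prescribes.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores the sum of 1 / (k + rank) over the lists that
    contain it, where rank starts at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, fusing a dense ranking `["a", "b", "c"]` with a sparse ranking `["b", "d", "a"]` places `b` first, because it ranks highly in both lists. RRF uses only rank positions, so the two retrievers' raw scores do not need to be on a comparable scale.
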
Address quality and scale:
- Retrieval accuracy (recall@k, MRR)
- Answer faithfulness (grounding)
- Latency budget allocation
- Cost optimization
- Scaling strategy
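
The retrieval accuracy metrics named above can be computed from a small labeled evaluation set. This is a minimal sketch, assuming each query has a known set of relevant document IDs; the function names and data shapes are illustrative.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average of 1 / rank of the first relevant result per query.

    A query with no relevant result in its list contributes 0.
    """
    if not runs:
        return 0.0
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)
```

Recall@k measures whether the needed context reaches the prompt at all, while MRR reflects how high the first useful chunk ranks, which matters when only a few chunks fit in the context window.
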

Example Usage

/sd:rag-design customer support chatbot with 10K FAQ documents
/sd:rag-design internal documentation Q&A for engineering team
/sd:rag-design legal document search for contract review
/sd:rag-design code assistant for enterprise codebase
/sd:rag-design research paper Q&A with 100K papers
/sd:rag-design product catalog search with structured data
/sd:rag-design multilingual knowledge base

Use Case Categories

Category	Key Considerations
Customer Support	FAQ coverage, escalation, tone consistency
Documentation	Technical accuracy, code examples, versioning
Legal/Compliance	Citation accuracy, audit trails, access control
Code Assistance	AST-aware chunking, context relevance, IDE integration
Research/Academic	Multi-document reasoning, citation, long-form answers
E-commerce	Product attributes, inventory awareness, personalization

RAG Pattern Selection Guide

Complexity	Pattern	When to Use
Low	Basic RAG	Simple Q&A, small corpus
Medium	RAG + Reranking	Higher accuracy needed
Medium	Hybrid Search	Mixed keyword + semantic queries
High	Query-Transformed	Vague or complex queries
High	Agentic RAG	Multi-hop reasoning, tool use
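
The selection guide above can be read as a simple decision rule. The sketch below is one hypothetical encoding of it: the `Requirements` fields and the order in which they are checked (most complex need first) are assumptions made for illustration, not rules stated by the guide.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Hypothetical flags derived from the requirements clarification step.
    multi_hop: bool = False       # multi-hop reasoning or tool use
    vague_queries: bool = False   # vague or complex queries
    keyword_heavy: bool = False   # mixed keyword + semantic queries
    high_accuracy: bool = False   # higher accuracy needed


def select_pattern(req: Requirements) -> str:
    """Return the pattern name from the guide that covers the hardest requirement."""
    if req.multi_hop:
        return "Agentic RAG"
    if req.vague_queries:
        return "Query-Transformed"
    if req.keyword_heavy:
        return "Hybrid Search"
    if req.high_accuracy:
        return "RAG + Reranking"
    return "Basic RAG"
```

In practice the patterns combine (for example, hybrid search followed by reranking), so a rule like this is best treated as a starting point for the design discussion.
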

Output

A comprehensive RAG system architecture including:

- Ingestion pipeline (documents → vectors)
- Retrieval pipeline (query → context)
- Technology stack (embedding model, vector DB, LLM)
- Quality targets (recall, faithfulness, latency)
- Trade-offs and alternatives
- Cost estimate (per-query and monthly)
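
As an illustration of how the per-query and monthly cost estimate might be derived, the sketch below uses a simple token-based model. All names are hypothetical, and any prices passed in are placeholders to be replaced with the chosen provider's actual rates. Infrastructure costs such as vector database hosting are folded into an optional fixed per-query term.

```python
def per_query_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    fixed_cost: float = 0.0,
) -> float:
    """Estimated cost of one query, in the same currency as the prices.

    Prices are per million tokens; `fixed_cost` covers per-query extras
    such as query embedding, reranking, or amortized vector DB hosting.
    """
    llm_input = prompt_tokens / 1_000_000 * input_price_per_mtok
    llm_output = completion_tokens / 1_000_000 * output_price_per_mtok
    return llm_input + llm_output + fixed_cost


def monthly_cost(cost_per_query: float, queries_per_day: float, days: int = 30) -> float:
    """Scale the per-query cost to a monthly figure."""
    return cost_per_query * queries_per_day * days
```

The retrieved context usually dominates `prompt_tokens`, so the number and size of chunks placed in the prompt is often the main cost lever.
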