Skip to main content
Designmelodic-software

rag-design

Design a RAG architecture for a use case

Stars
74
Source
melodic-software/claude-code-plugins
Updated
2026-04-07
Slug
melodic-software--claude-code-plugins--rag-design
View on GitHubRaw SKILL.md

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/melodic-software/claude-code-plugins/HEAD/plugins/systems-design/skills/rag-design/SKILL.md -o .claude/skills/rag-design.md

Drops the SKILL.md into .claude/skills/rag-design.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

Design RAG Architecture

Design a Retrieval-Augmented Generation system for a given use case.

Arguments

$ARGUMENTS - The RAG use case to design for (e.g., "customer support chatbot", "documentation Q&A", "legal document search", "code assistant")

Workflow

  1. Clarify requirements by understanding:

    • What type of questions will be asked?
    • What is the document corpus size and type?
    • What is the required accuracy/faithfulness?
    • What is the latency budget?
    • Are there multi-turn conversation requirements?
  2. Load relevant skills based on the use case:

    • RAG patterns → rag-architecture
    • Vector store selection → vector-databases
    • LLM serving → llm-serving-patterns
    • Inference optimization → ml-inference-optimization
  3. Spawn the rag-architect agent for comprehensive design:

    • Use Task tool with subagent_type="rag-architect"
    • Provide full use case context and requirements
    • Request end-to-end RAG architecture
  4. Design the ingestion pipeline:

    • Document extraction (PDF, HTML, code)
    • Chunking strategy selection
    • Embedding model selection
    • Vector database configuration
    • Metadata extraction and indexing
  5. Design the retrieval pipeline:

    • Query processing (expansion, HyDE)
    • Retrieval strategy (dense, sparse, hybrid)
    • Reranking approach
    • Context assembly
    • Prompt engineering
  6. Address quality and scale:

    • Retrieval accuracy (recall@k, MRR)
    • Answer faithfulness (grounding)
    • Latency budget allocation
    • Cost optimization
    • Scaling strategy

Example Usage

/sd:rag-design customer support chatbot with 10K FAQ documents
/sd:rag-design internal documentation Q&A for engineering team
/sd:rag-design legal document search for contract review
/sd:rag-design code assistant for enterprise codebase
/sd:rag-design research paper Q&A with 100K papers
/sd:rag-design product catalog search with structured data
/sd:rag-design multi-lingual knowledge base

Use Case Categories

Category Key Considerations
Customer Support FAQ coverage, escalation, tone consistency
Documentation Technical accuracy, code examples, versioning
Legal/Compliance Citation accuracy, audit trails, access control
Code Assistance AST-aware chunking, context relevance, IDE integration
Research/Academic Multi-document reasoning, citation, long-form answers
E-commerce Product attributes, inventory awareness, personalization

RAG Pattern Selection Guide

Complexity Pattern When to Use
Low Basic RAG Simple Q&A, small corpus
Medium RAG + Reranking Higher accuracy needed
Medium Hybrid Search Mixed keyword + semantic queries
High Query-Transformed Vague or complex queries
High Agentic RAG Multi-hop reasoning, tool use

Output

A comprehensive RAG system architecture including:

  • Ingestion pipeline (documents → vectors)
  • Retrieval pipeline (query → context)
  • Technology stack (embedding model, vector DB, LLM)
  • Quality targets (recall, faithfulness, latency)
  • Trade-offs and alternatives
  • Cost estimate (per-query and monthly)