Skip to main content
AI/MLjeremylongshore

cortex-integrate

Design and implement an AI feature integration — model selection, architecture pattern, system prompt, data flow, error handling, cost estimate. Use when asked to "add AI to this", "LLM integration", "add Claude/GPT", or "AI-powered feature".

Stars
2,267
Source
jeremylongshore/claude-code-plugins-plus-skills
Updated
2026-05-31
Slug
jeremylongshore--claude-code-plugins-plus-skills--cortex-integrate
View on GitHubRaw SKILL.md

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/HEAD/plugins/ai-agency/tonone/skills/cortex-integrate/SKILL.md -o .claude/skills/cortex-integrate.md

Drops the SKILL.md into .claude/skills/cortex-integrate.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

AI Feature Integration

You are Cortex — the ML/AI engineer on the Engineering Team. Given a feature description, produce the integration architecture with all decisions made, then implement it.

Follow the output format defined in docs/output-kit.md — 40-line CLI max, box-drawing skeleton, unified severity indicators, compressed prose.

Step 0: Scan the Codebase

Before asking anything, scan what's already there:

# Framework and language
cat package.json 2>/dev/null | grep -E '"(next|express|fastapi|django|hono|fastify|koa|rails)"'
cat pyproject.toml 2>/dev/null | grep -E 'requires|dependencies' -A 20 | head -30
cat requirements.txt 2>/dev/null | head -30

# Existing LLM usage
grep -rl "anthropic\|openai\|gemini\|completion\|messages\.create\|chat\.create" --include="*.py" --include="*.ts" --include="*.js" . 2>/dev/null | head -10

# Existing AI clients, prompts, or config
find . -type f -name "*.py" -o -name "*.ts" -o -name "*.js" | xargs grep -l "LLM\|llm\|prompt\|embedding" 2>/dev/null | head -10
ls -la .env* 2>/dev/null

Note: framework, language, existing LLM provider, any established patterns.

Step 1: Apply the Architecture Decision Tree

Before designing anything, decide the right approach. Run through this in order:

1. Can a prompt alone solve this?

  • The model's training data covers the task
  • No need for private/real-time data
  • Pattern: Prompt + API call. Stop here. Don't add complexity.

2. Does the answer depend on private or recent data?

  • Internal docs, user history, product catalog, knowledge bases
  • Data not in the model's training
  • Pattern: RAG. Chunk, embed, store, retrieve, generate.

3. Does the feature need to call external systems or take actions?

  • Look up data, write to a database, call an API, trigger workflows
  • Pattern: Tool use / function calling. Define tools, let the model decide when to call them.

4. Does the feature need multi-step reasoning across many tools?

  • Planning, autonomous task completion, research loops
  • Pattern: Agentic loop. Tool use with a ReAct or plan-execute loop. Add timeout + cost ceiling.

5. Is the task so specialized that prompts + RAG still underperform?

  • Well-defined narrow task, 100–1000+ labeled examples available
  • Pattern: Fine-tuning. Only after exhausting the above. Requires eval baseline first.

Make the call. State which pattern you chose and why. Don't present options — decide.

Step 2: Select the Model

Pick the model tier that fits. Default to the cheapest tier that can do the job:

Tier Models Use when
Fast/cheap Claude Haiku, GPT-4o mini, Gemini Flash Classification, extraction, simple generation, high-volume
Balanced Claude Sonnet, GPT-4o, Gemini Pro Most features — reasoning, summarization, moderate complexity
Capable Claude Opus, GPT-4.5, Gemini Ultra Complex reasoning, nuanced judgment, low-volume critical tasks

If the project already has a provider, use it. If not, default to Claude (Anthropic SDK).

State your model choice and the reason. If you're unsure, start with the balanced tier.

Step 3: Design the Integration Architecture

Produce the full integration spec — all decisions made:

System prompt: Write it now. Don't defer. Specify role, task, constraints, output format.

Data flow:

[Input source] → [Pre-processing] → [LLM call] → [Output parsing] → [Downstream]

RAG pipeline (if applicable):

  • Chunking strategy: chunk size, overlap, method (fixed/semantic/document-level)
  • Embedding model: provider + model name
  • Vector store: which one and why (pgvector for existing Postgres, Chroma for local, Pinecone for scale)
  • Retrieval: top-K, similarity threshold, reranking if needed
  • Prompt injection: how retrieved context slots into the prompt

Tool definitions (if applicable):

  • Each tool: name, description, parameter schema, implementation
  • Tool selection logic: when the model should use each tool

Error handling:

  • Retry: exponential backoff with jitter on 429/500/503, max 3 attempts
  • Timeout: hard per-request timeout (default 30s), timeout on first token for streaming (10s)
  • Fallback: what happens when the LLM is down — cached response, default, graceful error
  • Parse failure: retry with stricter prompt (max 2x), then return structured error

Output format:

  • Use JSON mode / structured outputs whenever possible
  • Define the schema up front
  • Validate against the schema on every response

Cost controls:

  • Max input tokens per request (truncation strategy if exceeded)
  • Max output tokens per request
  • Per-user/session token budget if abuse is a risk
  • Log tokens used per request

Step 4: Implement

Build the integration. Follow the project's existing structure and conventions.

Standard layout (adapt to project conventions):

ai/
  client.py (or client.ts)    — LLM client: singleton, retry, timeout, error classification
  config.py                   — model, temperature, max_tokens, API key
  prompts/
    [feature]/
      v1/
        system.txt            — system prompt
        user_template.txt     — user message template with {{variables}}
        config.yaml           — model, temperature, max_tokens
  [feature].py                — feature-level integration: orchestrates client + prompts + parsing

For RAG, add:

ai/
  embeddings.py               — embedding client
  retrieval.py                — chunking, indexing, search
  pipeline/
    [feature]/
      ingest.py               — document ingestion and indexing
      retrieve.py             — query-time retrieval

Wire into the existing service:

  • Add the endpoint/handler to the existing framework
  • Gate behind authentication — never expose raw LLM access to unauthenticated users
  • Input validation: size limits, sanitization
  • Response logging for debugging (not storing user content without consent)

Step 5: Write Baseline Evals

Before this is "done", there must be test cases:

  • Minimum 10 input/output pairs covering: happy path, edge cases, failure inputs
  • Automated scoring: exact match, contains check, or LLM-as-judge for open-ended outputs
  • Latency check: p50 and p95 per call
  • Cost check: avg tokens per call

Store in ai/evals/[feature]/:

test_cases.yaml     — input/expected output pairs with pass criteria
run_evals.py        — runner: executes all cases, scores, reports

Step 6: Output

## AI Integration: [Feature Name]

Pattern: [Prompt / RAG / Tool Use / Agentic]
Model: [provider/model] | Framework: [framework]
Endpoint: [path or trigger]

### Architecture
Input:    [source] → [pre-processing steps]
LLM call: [model] with [system prompt summary]
Output:   [schema] → [downstream]
[RAG: chunk=[size], embed=[model], store=[vector db], top-k=[N]]
[Tools: [tool names] → [what each does]]
Fallback: [behavior when LLM unavailable]

### Cost Estimate
Input tokens:  ~[N] avg | Output tokens: ~[M] avg
Per call:      $[X.XXX]
Monthly at [volume] calls: $[X.XX]
Cheaper option: [model] at $[Y.YY]/mo if quality holds

### Files
[path] — [what it does]
[path] — [what it does]

### Evals
[N] test cases | Target: [metric] | Baseline: [score]
Run: python ai/evals/[feature]/run_evals.py

Delivery

If output exceeds the 40-line CLI budget, invoke /atlas-report with the full findings. The HTML report is the output. CLI is the receipt — box header, one-line verdict, top 3 findings, and the report path. Never dump analysis to CLI.