MCP Server Development
Adapted from anthropics/skills/mcp-builder. MCP-server quality is measured by how well it lets LLMs accomplish real-world tasks — not by endpoint count.
Stack default for our projects
- Language: TypeScript (matches our stack; static typing + Zod schemas + good LLM code-gen)
- Transport: stdio for local tools, Streamable HTTP (stateless JSON) for remote
- SDK: @modelcontextprotocol/sdk
- Package manager: pnpm (never npm/yarn in our repos)
Phase 1 — Research & Plan
1.1 Design principles
API coverage vs. workflow tools. Balance comprehensive endpoint coverage with specialized workflow shortcuts. Default to coverage unless you have a clear reason — agents compose basic tools well; workflow tools ossify.
Tool naming & discoverability. Consistent prefix + action verb. Examples:
- github_create_issue, github_list_repos
- gitlab_search_issues, gitlab_close_mr
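The naming rule can be checked mechanically before tools are registered. A minimal sketch, assuming snake_case names whose first segment is the service prefix; `isWellNamedTool` and the pattern are hypothetical helpers, not part of the SDK.

```typescript
// Hypothetical pattern: lowercase snake_case with at least two segments,
// e.g. "<service>_<verb>_<noun>".
const TOOL_NAME_PATTERN = /^[a-z][a-z0-9]*(_[a-z0-9]+)+$/;

// Returns true when the name is snake_case and starts with "<servicePrefix>_".
function isWellNamedTool(name: string, servicePrefix: string): boolean {
  return TOOL_NAME_PATTERN.test(name) && name.startsWith(`${servicePrefix}_`);
}
```

Running a check like this over every tool name at startup catches missing prefixes before they collide with tools from another server.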
Context management. Return focused, paginated data. Agents suffer when a single tool call floods context.
Actionable error messages. Errors must guide the next action:
❌ "Invalid input"
✅ "Field 'project_id' is required. Call gitlab_list_projects to enumerate available IDs."
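One way to keep such messages consistent is a small builder that always names the missing field and the tool that can supply it. This is a sketch only: `missingFieldError` is a hypothetical helper, and the result shape (`isError` plus text content) follows the MCP tool-result convention.

```typescript
// Minimal shape of an MCP tool error result: isError plus text content.
interface ToolErrorResult {
  isError: true;
  content: { type: 'text'; text: string }[];
}

// Hypothetical helper: names the missing field and the tool that lists valid values.
function missingFieldError(field: string, discoveryTool: string): ToolErrorResult {
  return {
    isError: true,
    content: [
      {
        type: 'text',
        text: `Field '${field}' is required. Call ${discoveryTool} to enumerate available values.`,
      },
    ],
  };
}
```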
1.2 Read the spec
- Sitemap: https://modelcontextprotocol.io/sitemap.xml
- Append .md to any page URL for markdown (e.g. https://modelcontextprotocol.io/specification/draft.md)
Focus on: tool definitions, resource definitions, transport mechanisms.
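The .md convention is easy to apply in a fetch helper. A minimal sketch, assuming page URLs carry no query string or fragment; `toMarkdownUrl` is a hypothetical name.

```typescript
// Hypothetical helper: turns a spec page URL into its markdown variant.
// Assumes the URL has no query string or fragment.
function toMarkdownUrl(pageUrl: string): string {
  // Drop trailing slashes so the result is ".../draft.md", not ".../draft/.md".
  const trimmed = pageUrl.replace(/\/+$/, '');
  return trimmed.endsWith('.md') ? trimmed : `${trimmed}.md`;
}
```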
1.3 Load SDK docs
- TS SDK README: https://raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/main/README.md
- Python SDK README: https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md
Fetch via WebFetch only when needed — don't dump entire docs into context upfront.
1.4 Plan implementation
- Review the target service's API docs (auth, core endpoints, data models)
- List endpoints by priority — most-common operations first
- Identify destructive vs. read-only operations (matters for tool annotations)
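For REST-style services, the HTTP method gives a rough first pass at the destructive vs. read-only split. The mapping below is a heuristic and an assumption, not something the SDK provides; `draftAnnotations` is a hypothetical helper, and the service's API docs always override it.

```typescript
type HttpMethod = 'GET' | 'HEAD' | 'POST' | 'PUT' | 'PATCH' | 'DELETE';

interface AnnotationDraft {
  readOnlyHint: boolean;
  destructiveHint: boolean;
  idempotentHint: boolean;
}

// Heuristic starting point only: confirm each endpoint against the API docs.
function draftAnnotations(method: HttpMethod): AnnotationDraft {
  switch (method) {
    case 'GET':
    case 'HEAD':
      return { readOnlyHint: true, destructiveHint: false, idempotentHint: true };
    case 'PUT':
    case 'DELETE':
      // Overwrites or removes existing state; repeating the call changes nothing further.
      return { readOnlyHint: false, destructiveHint: true, idempotentHint: true };
    case 'PATCH':
      // Mutates existing state; repeat calls are not guaranteed to be safe.
      return { readOnlyHint: false, destructiveHint: true, idempotentHint: false };
    case 'POST':
      // Assumed additive (creates a new resource); flag manually if it mutates shared state.
      return { readOnlyHint: false, destructiveHint: false, idempotentHint: false };
  }
}
```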
Tool-Hosting Pattern — In-Process vs Stdio MCP
Before writing a line of implementation code, choose a hosting pattern. The wrong choice cannot be refactored cheaply once tooling is wired.
Decision tree
≤ 5 tools AND latency-critical (<50ms tool resolution)?
│
├─ Yes → tools share the SDK process AND no external auth required?
│ │
│ ├─ Yes → In-process @tool decorator (single-process, sub-ms resolution)
│ └─ No → Stdio MCP Server
│
└─ No → Stdio MCP Server
(≥ 6 tools, external auth, language/runtime mismatch, long-lived process)
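The same tree can be written as a function, which is handy for recording the decision in a design doc or a test. A sketch only: the type and function names are hypothetical, and the inputs mirror the questions in the tree.

```typescript
interface HostingInputs {
  toolCount: number;
  latencyCritical: boolean; // needs < 50 ms tool resolution
  sharesSdkProcess: boolean; // tools live in the SDK process
  needsExternalAuth: boolean;
}

type HostingPattern = 'in-process' | 'stdio-mcp-server';

// Mirrors the decision tree: in-process only when every condition holds.
function chooseHostingPattern(inputs: HostingInputs): HostingPattern {
  const smallAndFast = inputs.toolCount <= 5 && inputs.latencyCritical;
  const selfContained = inputs.sharesSdkProcess && !inputs.needsExternalAuth;
  return smallAndFast && selfContained ? 'in-process' : 'stdio-mcp-server';
}
```

Any single "No" falls through to the stdio server, which matches the tree's default.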
In-process @tool decorator (Python — anthropics/claude-agent-sdk-python)
Use create_sdk_mcp_server when your tools live entirely inside the SDK process and you need the lowest possible latency. Source reference: examples/mcp_calculator.py L11–99.
from claude_agent_sdk import tool, create_sdk_mcp_server

@tool(name="add", description="Add two numbers", input_schema={"a": int, "b": int})
async def add(args):
    return {"content": [{"type": "text", "text": str(args["a"] + args["b"])}]}

server = create_sdk_mcp_server(name="calc", version="1.0.0", tools=[add])
In-process registration (TypeScript — @modelcontextprotocol/sdk)
Our default stack uses McpServer.registerTool() from @modelcontextprotocol/sdk. The inline Zod schema is parsed at registration time — no separate schema file needed for small tool sets.
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';
const server = new McpServer({ name: 'calc', version: '1.0.0' });
server.registerTool(
  'add',
  {
    title: 'Add two numbers',
    inputSchema: { a: z.number(), b: z.number() },
  },
  async ({ a, b }) => ({
    content: [{ type: 'text', text: String(a + b) }],
  }),
);
Tool annotations — readOnlyHint and destructiveHint
Annotations are first-class SDK metadata that Claude and downstream hooks use for permission decisions. Set them on every tool:
server.registerTool(
  'delete-file',
  {
    title: 'Delete a file',
    inputSchema: { path: z.string() },
    annotations: { readOnlyHint: false, destructiveHint: true },
  },
  handler,
);
- readOnlyHint: true signals that the tool only reads state; Claude can call it freely without a permission prompt.
- destructiveHint: true signals irreversible side effects; our pre-bash-destructive-guard hook and agents/security-reviewer.md both elevate review priority for tools carrying this flag. Any tool that deletes, overwrites, or mutates shared state must set this.
- Missing destructiveHint: true on a destructive tool is a known pitfall; see the "Common pitfalls" table below.
Pattern comparison
| Aspect | In-Process @tool | Stdio MCP Server |
|---|---|---|
| Tool count | ≤ 5 | 6+ |
| Latency | Sub-ms resolution | 5–50 ms IPC overhead |
| Auth complexity | Shares SDK auth | Separate auth context |
| Language constraint | Must match SDK | Any runtime |
| Process isolation | None (in-SDK) | Full (separate child) |
| Lifecycle | Bound to SDK session | Long-lived independent |
For the stdio MCP server implementation path (≥ 6 tools, external auth, or language mismatch), continue with Phase 2 — Implementation below, which covers project structure, core infrastructure, and the full TypeScript stdio setup.
Phase 2 — Implementation
2.1 Project structure (TypeScript)
mcp-server-name/
├── package.json
├── tsconfig.json
├── src/
│ ├── index.ts (server entry, transport wiring)
│ ├── tools/ (one file per tool or tool group)
│ ├── schemas.ts (shared Zod schemas)
│ └── client.ts (API client with auth + error handling)
└── README.md (setup + config)
2.2 Core infrastructure
Build once, reuse everywhere:
- API client with auth (env-var-driven, never hardcoded)
- Error-handler helper that returns actionable MCP error responses
- Pagination helper (most APIs paginate; most tools forget)
- Response formatter (JSON for structured, Markdown for human-readable where agents benefit from it)
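As an illustration of the pagination helper, here is a minimal cursor-based sketch over an in-memory list. The names (`paginate`, `Page`, `DEFAULT_PAGE_SIZE`) and the default of 20 are assumptions; a real helper would pass the cursor through to the upstream API rather than slice locally.

```typescript
interface Page<T> {
  items: T[];
  total: number;
  nextCursor?: string; // absent on the last page
}

const DEFAULT_PAGE_SIZE = 20; // assumed default, tune per service

// Hypothetical helper: the cursor is the stringified start offset.
function paginate<T>(all: T[], cursor?: string, pageSize: number = DEFAULT_PAGE_SIZE): Page<T> {
  const start = cursor === undefined ? 0 : Number.parseInt(cursor, 10);
  if (!Number.isInteger(start) || start < 0) {
    throw new Error(`Invalid cursor '${cursor}'. Omit it for the first page, or pass nextCursor from the previous call.`);
  }
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new Error(`Invalid pageSize ${pageSize}. Use a positive integer.`);
  }
  const end = start + pageSize;
  const page: Page<T> = { items: all.slice(start, end), total: all.length };
  if (end < all.length) {
    page.nextCursor = String(end);
  }
  return page;
}
```

Returning `total` alongside `nextCursor` lets the agent decide whether fetching further pages is worth the context.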
2.3 Implement tools
For each tool:
Input schema — Zod, with descriptions per field:
z.object({
  projectId: z.string().describe("GitLab project ID. Call gitlab_list_projects to discover."),
  state: z.enum(["opened", "closed", "all"]).default("opened").describe("Issue state filter. Defaults to opened."),
});
Output schema — define outputSchema where possible; use structuredContent in tool responses (TS SDK feature). This helps downstream agents parse results.
Annotations — set all four:
- readOnlyHint: true/false
- destructiveHint: true/false
- idempotentHint: true/false
- openWorldHint: true/false
These inform Claude's hook decisions (destructive-guard, permission prompts).
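A small lint pass can enforce the "set all four" rule before a server ships. This is a sketch under assumptions: `annotationProblems` is a hypothetical helper, and the local `ToolAnnotations` interface only mirrors the four hint names listed above.

```typescript
interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

const REQUIRED_HINTS = ['readOnlyHint', 'destructiveHint', 'idempotentHint', 'openWorldHint'] as const;

// Returns one message per problem; an empty array means the annotations pass.
function annotationProblems(toolName: string, annotations: ToolAnnotations): string[] {
  const problems: string[] = [];
  for (const hint of REQUIRED_HINTS) {
    if (annotations[hint] === undefined) {
      problems.push(`${toolName}: ${hint} is not set`);
    }
  }
  // A tool cannot both only read state and have destructive side effects.
  if (annotations.readOnlyHint === true && annotations.destructiveHint === true) {
    problems.push(`${toolName}: readOnlyHint and destructiveHint are both true`);
  }
  return problems;
}
```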
Implementation — async/await for I/O; errors must surface with enough context for the LLM to fix them.
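One way to guarantee that errors surface is to wrap every handler in a shared error boundary. The sketch below is illustrative: `toErrorResult` and `withErrorBoundary` are hypothetical helpers, and `ToolResult` is a simplified local stand-in for the SDK's result type.

```typescript
// Simplified local stand-in for the SDK's tool result type.
interface ToolResult {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
}

type Handler<A> = (args: A) => Promise<ToolResult>;

// Converts any thrown value into a structured error result that names the tool.
function toErrorResult(toolName: string, err: unknown): ToolResult {
  const reason = err instanceof Error ? err.message : String(err);
  return {
    isError: true,
    content: [{ type: 'text', text: `${toolName} failed: ${reason}` }],
  };
}

// Wraps a handler so rejected promises become error results instead of being lost.
function withErrorBoundary<A>(toolName: string, handler: Handler<A>): Handler<A> {
  return async (args: A): Promise<ToolResult> => {
    try {
      return await handler(args);
    } catch (err) {
      return toErrorResult(toolName, err);
    }
  };
}
```

The `await` inside the `try` matters: returning the promise without awaiting it would let a rejection escape the `catch`.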
Phase 3 — Review & Test
3.1 Code quality
- DRY — no duplicated API-call logic
- Consistent error handling (one helper, not ad-hoc throws)
- Full TypeScript coverage: tsgo --noEmit or tsc --noEmit clean
- Clear tool descriptions
3.2 Build & test
pnpm build # or npm run build in non-pnpm projects
npx @modelcontextprotocol/inspector # interactive testing UI
Walk through every tool in the Inspector. If a tool can fail, trigger the failure and verify the error message is actionable.
Phase 4 — Evaluations
Create 10 evaluation questions. An MCP server without evals is a guess, not a deliverable.
Each question must be:
- Independent — doesn't depend on a previous question's answer
- Read-only — no destructive side effects
- Complex — requires multiple tool calls, not a single lookup
- Realistic — a real user would actually ask this
- Verifiable — has a single correct answer checkable by string comparison
- Stable — answer doesn't change over time
Output format
<evaluation>
<qa_pair>
<question>Which GitLab project in group 'X' has the highest number of open issues labeled 'bug'?</question>
<answer>project-name-here</answer>
</qa_pair>
</evaluation>
Run the eval by having Claude, connected to the MCP server, answer each question, then compare its output to the expected answer. An overall accuracy below 80% signals tool-design problems (usually unclear descriptions, missing pagination, or bad error messages).
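Scoring a run takes only a few lines. A minimal sketch, assuming answers are compared after trimming whitespace and lowercasing (that normalization is an assumption, not part of the format above); `EvalCase`, `accuracy`, and `passes` are hypothetical names.

```typescript
interface EvalCase {
  question: string;
  expected: string;
  actual: string; // the answer produced by the agent
}

const PASS_THRESHOLD = 0.8; // the 80% guideline

// Assumed normalization: ignore surrounding whitespace and letter case.
const normalize = (s: string): string => s.trim().toLowerCase();

// Fraction of cases whose normalized answer matches the expected answer.
function accuracy(cases: EvalCase[]): number {
  if (cases.length === 0) return 0;
  const correct = cases.filter((c) => normalize(c.actual) === normalize(c.expected)).length;
  return correct / cases.length;
}

function passes(cases: EvalCase[]): boolean {
  return accuracy(cases) >= PASS_THRESHOLD;
}
```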
Common pitfalls
| Pitfall | Fix |
|---|---|
| Tool returns 10k rows, agent context blows up | Add pagination + default page size |
| Agent can't figure out auth failure | Error message: "Set ENV_VAR_NAME — current value is empty" |
| Tool name collision across MCP servers | Always prefix with service name |
| Destructive tools without destructiveHint: true | Breaks our destructive-guard hook |
| Async errors swallowed | Wrap every handler in try/catch that returns structured error |
References
Upstream reference material (worth reading once, not mirroring here):