Architecture · 10 min read · 12 June 2026

MCP Server in Production: Security, Scope, and Reliability

Learn how to design, secure, and operate an MCP server for Claude agents in production: permissions, tool scope, error handling, and enterprise deployment patterns.

By Solomon Udoh · AI Architect & Certification Lead


An MCP server is the standardised integration layer that lets Claude agents call external tools, read resources, and trigger actions without bespoke glue code for every service. The Model Context Protocol (MCP), published by Anthropic in late 2024, defines a client-server contract so that any compliant host, including Claude, can discover and invoke capabilities at runtime. Getting that contract right in production is what Domain 2 of the CCA-F exam (Tool Design & MCP Integration, 18% of the exam) tests directly.

This guide covers the decisions that matter most: how to scope permissions without locking agents out, when to use read-only resources versus write actions, how to prevent tool overload, and how to build the error-handling and observability layer that keeps enterprise deployments auditable.

What exactly is an MCP server and how does Claude connect to it?

An MCP server exposes three primitive types to a connected host: tools (callable functions that may have side effects), resources (read-only content addressable by URI), and prompts (reusable prompt templates). The host application, such as Claude Code or Claude Desktop, runs an MCP client that negotiates capabilities with the server in an initialisation handshake at session start and then lists the server's primitives. Claude selects among the exposed tools during inference.
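As a sketch of that discovery step, the client-side JSON-RPC 2.0 messages can be built as plain Python dicts and framed for the stdio transport as one JSON object per line. The method names (`initialize`, `notifications/initialized`, `tools/list`, `resources/list`, `prompts/list`) come from the MCP specification; the protocol version string, client name, and helper functions are illustrative assumptions. A real client sends these one at a time, waits for each response, and only lists primitive types the server advertised in its `initialize` response.

```python
import json

# Illustrative value: use the protocol version your client and server
# actually negotiate.
PROTOCOL_VERSION = "2025-06-18"


def discovery_messages(client_name: str = "example-client") -> list[dict]:
    """Build the client-side message sequence for the handshake and discovery."""
    return [
        # 1. Capability negotiation: the client states its version and capabilities.
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": PROTOCOL_VERSION,
                "capabilities": {},
                "clientInfo": {"name": client_name, "version": "0.1.0"},
            },
        },
        # 2. A notification (no id, so no response expected) that setup is done.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        # 3. Discovery requests, one per primitive type the server advertised.
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
        {"jsonrpc": "2.0", "id": 3, "method": "resources/list"},
        {"jsonrpc": "2.0", "id": 4, "method": "prompts/list"},
    ]


def to_stdio_frames(messages: list[dict]) -> str:
    """Serialise messages as newline-delimited JSON for the stdio transport."""
    return "".join(json.dumps(message) + "\n" for message in messages)
```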

The configuration that wires Claude to a server lives in a JSON file. In Claude Code, for example, a local filesystem server entry looks like this:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/workspace/project"],
      "env": {
        "NODE_ENV": "production"
      }
    }
  }
}
```

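The command field launches the server as a local subprocess, and the client talks to it over stdio; remote servers are reached over HTTP instead (Streamable HTTP in current revisions of the specification, which superseded the earlier HTTP+SSE transport). Because a locally launched server runs with the same operating-system permissions as the host process, what that process can reach matters enormously. We return to this under security below.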

For a deeper look at how tool results flow back into the conversation, see our concept on Tool Result Appending.

How should you scope MCP server permissions to avoid PII leakage?

Scope permissions to the minimum surface the agent needs for its task, then enforce that surface programmatically rather than relying on prompt instructions alone.

The exam consistently rewards deterministic, programmatic controls over probabilistic ones when stakes are high. That principle maps directly to MCP permission design:

| Control layer | Mechanism | Reliability |
|---|---|---|
| Filesystem server path restriction | Pass only the target directory as an argument | Deterministic |
| Database server row-level filtering | Server-side WHERE clause on every query | Deterministic |
| API server OAuth scope | Issue tokens with minimum required scopes | Deterministic |
| Prompt instruction ("don't read PII") | System prompt text | Probabilistic |

Prompt-level instructions are useful for nuance, but they are not a security boundary. A server that can read /etc/passwd will read it if the model decides to. Restrict the path at the server level instead.
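If you write your own server, the path restriction takes only a few lines of code. This minimal sketch assumes tool arguments arrive as path strings; `resolve_within_root` and `ALLOWED_ROOT` are hypothetical names, not part of any MCP SDK.

```python
from pathlib import Path

# Hypothetical allowlisted root; a real server would take it from its launch arguments.
ALLOWED_ROOT = Path("/workspace/project")


def resolve_within_root(requested: str, root: Path = ALLOWED_ROOT) -> Path:
    """Resolve a requested path and refuse anything outside the allowed root.

    resolve() collapses ".." segments and follows symlinks, so requests such as
    "../../etc/passwd", an absolute "/etc/passwd", or a symlink pointing outside
    the root all resolve to a location that fails the containment check.
    """
    root = root.resolve()
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"{requested!r} is outside {root}")
    return candidate
```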

For PII specifically, consider building a sanitisation hook that strips or masks sensitive fields before the tool result reaches the model's context window. Our concept on PostToolUse Hooks for Data Normalisation explains the hook pattern in detail.
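A minimal sketch of such a sanitiser, assuming tool results are JSON-like dicts, lists, and strings. The regexes and the `SENSITIVE_KEYS` set are hypothetical placeholders to adapt to your own data, and the function is not the Claude Code hook API; it could equally run inside the server just before a result is returned.

```python
import re

# Illustrative patterns only. The phone pattern is deliberately loose and will
# also mask some non-phone numbers (ISO dates, for example); real PII detection
# needs patterns and field allowlists tuned to your data.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
SENSITIVE_KEYS = {"email", "phone", "ssn", "date_of_birth"}  # hypothetical field names


def mask_pii(value):
    """Recursively mask sensitive keys and PII-looking strings in a tool result."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key in SENSITIVE_KEYS else mask_pii(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [mask_pii(item) for item in value]
    if isinstance(value, str):
        # Mask emails first so their digits are not caught by the phone pattern.
        return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", value))
    return value  # numbers, booleans and None pass through unchanged
```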

Environment variables are the correct way to pass secrets (API keys, database credentials) into an MCP server process. Hard-coding credentials in the config file creates version-control exposure. See Environment Variable Expansion in MCP Config for the exact syntax Claude Code supports.

```json
{
  "mcpServers": {
    "crm": {
      "command": "node",
      "args": ["./servers/crm-server.js"],
      "env": {
        "CRM_API_KEY": "${CRM_API_KEY}",
        "CRM_BASE_URL": "https://crm.internal.example.com"
      }
    }
  }
}
```
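On the server side, the process reads those variables at startup. The config above launches a Node server; the sketch below shows the same fail-fast pattern in Python, the language of the other server examples in this guide. `CrmConfig` and `load_crm_config` are hypothetical names.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CrmConfig:
    api_key: str = field(repr=False)  # keep the secret out of repr() and any logs
    base_url: str


def load_crm_config(env: Mapping[str, str] = os.environ) -> CrmConfig:
    """Read settings injected through the MCP config's "env" block; fail fast if absent."""
    missing = [name for name in ("CRM_API_KEY", "CRM_BASE_URL") if not env.get(name)]
    if missing:
        # Refuse to start half-configured; report names only, never values.
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return CrmConfig(api_key=env["CRM_API_KEY"], base_url=env["CRM_BASE_URL"])
```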

When should an MCP server expose resources versus tools?

Use resources for read-only, URI-addressable content that the agent needs as context. Use tools for anything with a side effect: writing a file, sending an email, updating a record, executing code.

This distinction matters for two reasons. First, resources do not require the same level of confirmation logic as tools because they cannot change state. Second, MCP clients can prefetch or cache resources more aggressively than tool calls, which reduces latency and token consumption.
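To illustrate why caching is safe for resources but not for tools, here is a minimal client-side time-to-live cache keyed by resource URI. `ResourceCache`, the `catalogue://` URI, and the 60-second default are assumptions for illustration, not part of the MCP SDK; the injectable clock exists only to make the behaviour testable.

```python
import time
from collections.abc import Callable


class ResourceCache:
    """Cache read-only resource contents by URI for a fixed time-to-live.

    This is safe only because resources have no side effects: serving a cached
    copy can never skip a write. Tool calls must never go through a cache like this.
    """

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}  # uri -> (fetched_at, content)

    def read(self, uri: str) -> str:
        now = self._clock()
        hit = self._entries.get(uri)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh copy: no round trip to the server
        content = self._fetch(uri)
        self._entries[uri] = (now, content)
        return content
```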

A practical heuristic:

| Operation | Primitive | Rationale |
|---|---|---|
| Read a product catalogue | Resource | No side effect; cacheable |
| Fetch a customer record for display | Resource | Read-only |
| Update a customer record | Tool | Mutates state |
| Send a Slack message | Tool | External side effect |
| Execute a SQL SELECT | Resource or tool | Resource for a fixed query; tool if the query is dynamic or parameterised |
| Execute a SQL INSERT | Tool | Always mutates state |

For write operations that are irreversible (deleting records, sending emails, making payments), the exam pattern is to add a confirmation gate before execution. This is not a prompt instruction; it is a server-side check that requires an explicit confirmed: true parameter before the destructive path runs.

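```python
# Assumes `server` (an MCP server object with a tool() decorator) and `crm`
# (an async CRM client) are defined elsewhere.
@server.tool()
async def delete_customer(customer_id: str, confirmed: bool = False) -> dict:
    if not confirmed:
        # Gate: without explicit confirmation, describe the action and stop.
        return {
            "isError": False,
            "content": [{"type": "text", "text": f"Dry run: would delete customer {customer_id}. Pass confirmed=true to proceed."}],
        }
    # Destructive path: runs only when confirmed is true.
    await crm.delete(customer_id)
    return {"isError": False, "content": [{"type": "text", "text": f"Deleted {customer_id}"}]}
```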

How do too many MCP servers hurt agent performance?

Connecting every available MCP server to every agent session is the most common production mistake. Each server's tool list is retrieved after capability negotiation and loaded into the context window. With ten servers exposing fifteen tools each, the agent receives 150 tool descriptions before it has processed a single user message.

This creates two compounding problems. First, token consumption rises, increasing cost and latency. Second, the model's ability to select the correct tool degrades as the tool list grows, a phenomenon the exam calls the tool overload problem.

The fix is scoping: connect only the servers relevant to the current task or agent role. The MCP Scoping Hierarchy defines three levels at which you can apply this:

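| Scope level | Where configured | Granularity |
|---|---|---|
| User (personal) | ~/.claude.json (user scope in Claude Code) | Per-developer, across projects |
| Project | .mcp.json in the repo root | Per-codebase |
| Session | Runtime flag or API parameter | Per-invocation |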

In a multi-agent system, the coordinator should pass only the tools its subagents need for their specific subtask, not the full tool manifest. This is the Tool Distribution Strategy Design pattern: treat tool access as a capability that is granted per-role, not broadcast globally.
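A coordinator-side sketch of per-role distribution, assuming each tool definition is a dict with a `name` key, as MCP tool definitions are. `ROLE_GRANTS`, the role names, and the tool names are hypothetical.

```python
# Hypothetical role-to-tool grants; in a real system these would live in config.
ROLE_GRANTS: dict[str, set[str]] = {
    "billing_agent": {"fetch_order", "issue_refund"},
    "support_agent": {"fetch_order", "search_orders", "crm.update_customer"},
}


def tools_for_role(manifest: list[dict], role: str) -> list[dict]:
    """Return only the tool definitions granted to a subagent role.

    Unknown roles receive no tools (deny by default) rather than the full manifest.
    """
    granted = ROLE_GRANTS.get(role, set())
    return [tool for tool in manifest if tool["name"] in granted]
```

Denying unknown roles by default means a misconfigured subagent fails visibly instead of silently receiving every tool.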

Tools are a form of API surface. Every tool you expose is a surface you must secure, monitor, and maintain. Fewer tools, better described, outperform many tools with thin descriptions.

Anthropic, Model Context Protocol Documentation

What does enterprise-grade MCP server deployment require?

Production deployments need four operational layers beyond the basic server implementation: authentication, rate limiting, structured logging, and health monitoring.

Authentication. For HTTP-transport MCP servers, use OAuth 2.0 or API key validation at the server boundary. The server should reject unauthenticated requests before any tool logic runs. Do not rely on network isolation alone.
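A framework-agnostic sketch of API key validation at the server boundary, assuming request headers arrive as a dict with normalised names. The key store and `authenticate` are hypothetical; for OAuth 2.0 you would validate access tokens with your authorisation server's tooling instead.

```python
import hashlib
import hmac

# Hypothetical store of SHA-256 digests of issued keys. Storing digests means
# a leaked store does not expose usable keys.
ISSUED_KEY_DIGESTS = {hashlib.sha256(b"example-key-123").hexdigest()}


def authenticate(headers: dict[str, str]) -> bool:
    """Check a bearer API key; call this before any tool logic runs."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return False
    digest = hashlib.sha256(auth.removeprefix("Bearer ").encode()).hexdigest()
    # compare_digest compares in constant time, avoiding timing side channels.
    return any(hmac.compare_digest(digest, known) for known in ISSUED_KEY_DIGESTS)
```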

Rate limiting. Agents in agentic loops can call tools far faster than a human would. Without rate limits, a runaway loop can exhaust a downstream API quota in seconds. Implement per-session and per-tool rate limits server-side.
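One common way to implement per-session, per-tool limits is a token bucket keyed by the (session, tool) pair. This is a single-process sketch with hypothetical names; a deployment with several server replicas would need a shared store instead of an in-memory dict. The wait time it returns is what a rate-limit `isError` response could surface as a retry-after value.

```python
import time
from collections.abc import Callable


class ToolRateLimiter:
    """Token-bucket limits keyed by (session_id, tool_name).

    Each bucket holds up to `burst` tokens and refills at `rate_per_sec`.
    """

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        self.buckets: dict[tuple[str, str], tuple[float, float]] = {}  # key -> (tokens, last_seen)

    def acquire(self, session_id: str, tool: str) -> float:
        """Consume one token. Return 0.0 if the call may proceed, else seconds to wait."""
        now = self.clock()
        key = (session_id, tool)
        tokens, last = self.buckets.get(key, (float(self.burst), now))
        # Refill for the time elapsed since this bucket was last touched.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return 0.0
        self.buckets[key] = (tokens, now)
        return (1.0 - tokens) / self.rate
```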

Structured logging. Every tool invocation should emit a structured log entry with at minimum: timestamp, tool name, input parameters (with PII masked), output summary, latency, and success/error status. This is the audit trail that compliance teams require and that you need for root-cause debugging.

```json
{
  "timestamp": "2026-06-11T14:32:01Z",
  "tool": "crm.update_customer",
  "session_id": "sess_abc123",
  "input": {"customer_id": "cust_789", "field": "email", "value": "[REDACTED]"},
  "output_summary": "success",
  "latency_ms": 142,
  "error": null
}
```
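A sketch of an emitter that produces entries in the shape of the sample above. `build_log_entry`, `run_logged`, and `REDACT_KEYS` are hypothetical names; `emit` defaults to `print` here but would be your logging pipeline in production, and async tools would need an `async` variant of the wrapper.

```python
import json
import time
from collections.abc import Callable
from datetime import datetime, timezone
from typing import Any, Optional

# Hypothetical set of input keys whose values must never reach the logs.
REDACT_KEYS = {"value", "email", "phone"}


def build_log_entry(tool: str, session_id: str, params: dict[str, Any],
                    latency_ms: int, output_summary: str,
                    error: Optional[str] = None) -> str:
    """Serialise one tool invocation as a single-line JSON audit record."""
    entry = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "tool": tool,
        "session_id": session_id,
        "input": {k: "[REDACTED]" if k in REDACT_KEYS else v for k, v in params.items()},
        "output_summary": output_summary,
        "latency_ms": latency_ms,
        "error": error,
    }
    return json.dumps(entry)


def run_logged(tool: str, session_id: str, params: dict[str, Any],
               fn: Callable[..., Any], emit: Callable[[str], None] = print) -> Any:
    """Call fn(**params) and emit one log line whether it succeeds or raises."""
    start = time.monotonic()
    try:
        result = fn(**params)
    except Exception as exc:
        latency = round((time.monotonic() - start) * 1000)
        emit(build_log_entry(tool, session_id, params, latency, "error", repr(exc)))
        raise  # logging must not swallow the failure
    latency = round((time.monotonic() - start) * 1000)
    emit(build_log_entry(tool, session_id, params, latency, "success"))
    return result
```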

Health monitoring. Expose a /health endpoint from each HTTP MCP server. Your orchestration layer should poll it and remove unhealthy servers from the active pool rather than letting the agent discover failures mid-task.
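A sketch of the polling side using only the standard library. The `/health` path is a deployment convention, as in the text, not something the MCP specification defines; `http_probe` and `active_pool` are hypothetical names, and the probe is injectable so the pool logic can be tested without a network.

```python
import urllib.request
from collections.abc import Callable, Iterable


def http_probe(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections and timeouts
        return False


def active_pool(servers: Iterable[str],
                probe: Callable[[str], bool] = http_probe) -> list[str]:
    """Keep only servers whose health check passes, so agents never see dead ones."""
    return [url for url in servers if probe(url)]
```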

As of 12 March 2026, the CCA-F exam tests these patterns under Domain 2 (Tool Design & MCP Integration, 18%) and Domain 5 (Context Management & Reliability, 15%). As of 3 June 2026, more than 10,000 individuals had passed the exam, and production MCP deployment questions appear consistently in its scenario-based items.

How should MCP servers handle errors so agents can recover?

MCP defines the isError flag on tool results precisely for this purpose. When a tool call fails, the server should return isError: true with a structured error payload rather than throwing an exception or returning an empty result. An exception crashes the tool call and gives the agent nothing to reason about. A structured error gives the agent information it can act on.

The four error categories the exam tests are: access failure, validation failure, downstream service failure, and rate-limit failure. Each warrants a different response:

| Error category | isError | Recommended agent action |
|---|---|---|
| Access failure (auth/permission) | true | Escalate to a human; do not retry |
| Validation failure (bad input) | true | Retry with corrected parameters |
| Downstream service failure | true | Retry with backoff; escalate after N attempts |
| Rate-limit failure | true | Wait and retry; surface the wait time if known |
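```python
# Assumes `server` (an MCP server object with a tool() decorator), `orders_api`
# (an async orders client) and its `RateLimitError` exception are defined
# elsewhere. Order IDs are assumed to be numeric.
@server.tool()
async def fetch_order(order_id: str) -> dict:
    if not order_id.isdigit():
        # Validation failure: the agent can correct this input and retry.
        return {"isError": True, "content": [{"type": "text", "text": f"Invalid order_id {order_id!r}: expected a numeric ID. Retry with a corrected value."}]}
    try:
        order = await orders_api.get(order_id)
        return {"isError": False, "content": [{"type": "text", "text": order.to_json()}]}
    except PermissionError as e:
        # Access failure: retrying cannot help, so tell the agent to escalate.
        return {"isError": True, "content": [{"type": "text", "text": f"Access denied: {e}. Escalate to administrator."}]}
    except RateLimitError as e:
        # Rate-limit failure: surface the wait time the downstream reported.
        return {"isError": True, "content": [{"type": "text", "text": f"Rate limited. Retry after {e.retry_after}s."}]}
    except Exception as e:
        # Downstream service failure: likely transient, so retry with backoff.
        return {"isError": True, "content": [{"type": "text", "text": f"Service error: {e}. Retry with backoff."}]}
```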
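On the agent or orchestrator side, the table above translates into a small routing function. This sketch assumes the server tags each error with one of the four categories, a convention of your own (for example a prefix in the error text) rather than an MCP field; `next_action` and the category strings are hypothetical.

```python
import random
from typing import Optional


def next_action(category: str, attempt: int, max_attempts: int = 3,
                retry_after: Optional[float] = None,
                base_delay: float = 0.5) -> tuple[str, float]:
    """Map an error category to (action, delay_seconds), following the table above.

    `attempt` is the number of attempts already made for this call.
    """
    if category == "access":
        return ("escalate", 0.0)   # retrying cannot fix a permission problem
    if category == "validation":
        return ("fix_input", 0.0)  # hand back to the model to correct its parameters
    if attempt >= max_attempts:
        return ("escalate", 0.0)   # stop after N attempts
    if category == "rate_limit":
        # Prefer the server's stated wait; otherwise fall back to exponential backoff.
        return ("retry", retry_after if retry_after is not None else base_delay * 2 ** attempt)
    if category == "downstream":
        # Exponential backoff with full jitter to avoid synchronised retries.
        return ("retry", random.uniform(0.0, base_delay * 2 ** attempt))
    return ("escalate", 0.0)       # unknown category: fail safe
```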

The distinction between an access failure and a valid empty result is a common exam trap. If a search returns zero results, isError should be false with an empty list. If the search could not execute because of a permission problem, isError should be true. Conflating these causes agents to misroute their recovery logic. See Access Failure vs Valid Empty Result for the full treatment.
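A minimal sketch of the two outcomes for a hypothetical `search_customers` tool; the `can_read` flag and in-memory `index` stand in for a real permission check and search backend.

```python
import json


def search_customers(query: str, can_read: bool, index: dict[str, list[dict]]) -> dict:
    """Hypothetical search tool showing the two outcomes the text distinguishes."""
    if not can_read:
        # The search never ran: an access failure, not "no results".
        return {"isError": True, "content": [{"type": "text", "text": "Access denied to customer index. Escalate; do not retry."}]}
    matches = index.get(query.lower(), [])
    # The search ran and may have found nothing: a valid result either way.
    return {"isError": False, "content": [{"type": "text", "text": json.dumps({"matches": matches})}]}
```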

How do you test MCP integrations for real agent reliability?

Unit tests on individual tool functions are necessary but not sufficient. Agents fail in ways that only emerge from the full tool-selection-and-execution loop: the model picks the wrong tool, a tool returns a subtly malformed result that the model misinterprets, or a sequence of valid calls produces an invalid aggregate state.

A reliable MCP test suite has three layers:

  1. Unit tests on each tool function: valid inputs, each error category, boundary conditions.
  2. Integration tests that run the MCP server against a real (or stubbed) downstream and verify the full request-response cycle including isError semantics.
  3. Agent-in-the-loop tests that send a natural-language task to Claude with the MCP server connected and assert on the final outcome, not the intermediate steps.

For layer three, determinism is the challenge. Use a fixed model version, a fixed system prompt, and a fixed tool manifest. Seed any randomness in the downstream stub. If a test is flaky, the root cause is almost always either an ambiguous tool description (the model sometimes picks the wrong tool) or an error response that does not give the model enough information to recover.
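For the downstream stub, a private seeded random generator keeps injected failures reproducible across runs. `StubOrdersAPI` and its data are hypothetical, and the stub is synchronous for brevity, whereas the earlier `orders_api` example was async.

```python
import random


class StubOrdersAPI:
    """Deterministic stand-in for a downstream orders service in tests.

    Failures come from a seeded, private RNG, so a given seed always fails the
    same calls; a flaky agent test then points at the agent, not the stub.
    """

    def __init__(self, seed: int, failure_rate: float = 0.2):
        self._rng = random.Random(seed)  # independent of the global random state
        self._failure_rate = failure_rate
        self._orders = {"1001": {"id": "1001", "status": "shipped"}}

    def get(self, order_id: str) -> dict:
        if self._rng.random() < self._failure_rate:
            raise ConnectionError("injected downstream failure")
        if order_id not in self._orders:
            raise KeyError(order_id)
        return self._orders[order_id]


def outcome_sequence(seed: int, calls: int = 10) -> list[str]:
    """Record which calls succeed for a given seed, for reproducibility checks."""
    api = StubOrdersAPI(seed)
    results = []
    for _ in range(calls):
        try:
            api.get("1001")
            results.append("ok")
        except ConnectionError:
            results.append("fail")
    return results
```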

When a tool is selected incorrectly, the first place to look is the tool description, not the model. Descriptions are the primary selection mechanism.

Anthropic, Model Context Protocol Documentation

Tool descriptions are the primary lever for fixing misrouting. A description that says "fetch data" will misfire. A description that says "fetch a single order record by its numeric order ID; use search_orders for fuzzy lookups by customer name" will not. Our concept on Writing Effective Tool Descriptions has the full pattern.
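The two descriptions above, written as MCP tool definitions with the `name`, `description`, and JSON Schema `inputSchema` fields that `tools/list` returns, plus a crude lint. The tool names match the example; the numeric-ID `pattern` and the eight-word threshold in `thin_descriptions` are arbitrary assumptions, and a word count cannot judge whether a description is accurate.

```python
# Tool definitions in the shape MCP servers return from tools/list.
TOOLS = [
    {
        "name": "fetch_order",
        "description": ("Fetch a single order record by its numeric order ID. "
                        "Use search_orders for fuzzy lookups by customer name."),
        "inputSchema": {
            "type": "object",
            "properties": {"order_id": {"type": "string", "pattern": "^[0-9]+$"}},
            "required": ["order_id"],
        },
    },
    {
        "name": "search_orders",
        "description": ("Search orders by customer name using fuzzy matching. Returns "
                        "matching order summaries; use fetch_order for a full record."),
        "inputSchema": {
            "type": "object",
            "properties": {"customer_name": {"type": "string"}},
            "required": ["customer_name"],
        },
    },
]


def thin_descriptions(tools: list[dict], min_words: int = 8) -> list[str]:
    """Flag tools whose descriptions are too short to guide selection (heuristic)."""
    return [t["name"] for t in tools if len(t.get("description", "").split()) < min_words]
```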

Should you build a custom MCP server or use an existing one?

The Build vs Use Decision for MCP Servers comes down to three questions: Does a maintained open-source or vendor server already cover the integration? Does the existing server's permission model match your security requirements? Does the existing server's error semantics match what your agents expect?

If the answer to all three is yes, use the existing server. The MCP ecosystem already includes servers for GitHub, Slack, PostgreSQL, filesystem access, web search, and dozens of other common integrations. Building a custom server for a commodity integration adds maintenance burden without adding capability.

Build custom when: the downstream system is internal and not publicly available, the existing server exposes too broad a surface (you need a constrained subset), or the error semantics are wrong for your agent's recovery logic.

The CCA-F exam tests this decision under the MCP Server Integration Best Practices task statement. The exam pattern is: prefer existing servers for commodity integrations, build custom for internal systems or when security constraints require a narrower surface.

For teams preparing for the exam, our Tool Design & MCP Integration concept library covers Domain 2 (18% of the exam) across 30 mapped task statements, with practice questions scored on the same 100-to-1000 scale as the real exam.

Frequently asked questions

What transport protocols does an MCP server support?
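MCP defines two standard transports: stdio (standard input/output), used for local processes launched by the host, and Streamable HTTP, used for remote or networked servers. Streamable HTTP replaced the earlier HTTP with Server-Sent Events (HTTP+SSE) transport in the 2025-03-26 revision of the specification, although some clients still support the older transport for compatibility. Claude Code uses stdio for locally launched servers. Remote deployments typically add OAuth or API key authentication at the server boundary.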
How many MCP servers can you connect to Claude at once?
There is no hard protocol limit on the number of connected MCP servers, but practical limits emerge quickly. Each server's tool list is retrieved after capability negotiation and loaded into the context window. Connecting too many servers causes token bloat and degrades tool-selection accuracy. Best practice is to scope servers to the task or agent role rather than connecting all available servers globally.
Does the CCA-F exam test MCP server configuration syntax?
The CCA-F exam tests MCP concepts and decision-making rather than verbatim syntax recall. Domain 2 (Tool Design & MCP Integration, 18% of the exam) covers tool description design, error handling with the isError flag, scoping strategy, and the build-vs-use decision. Scenario questions ask you to diagnose misrouting or choose the correct error response, not to write JSON from memory.
What is the difference between an MCP tool and an MCP resource?
MCP tools are callable functions that can have side effects: writing files, sending messages, updating records. MCP resources are read-only, URI-addressable content that the agent uses as context. Use resources for catalogues, documents, and reference data. Use tools for any operation that mutates state or triggers an external action.
How do you prevent an MCP server from being called in an infinite loop?
Implement server-side rate limits per session and per tool, and return a structured isError response with a retry-after value when the limit is hit. On the agent side, configure a maximum iteration count in the agentic loop and treat repeated identical tool calls as a loop-termination signal. Never rely solely on prompt instructions to prevent runaway loops.
Is AI Skill Certs affiliated with Anthropic or the CCA-F exam programme?
No. AI Skill Certs is an independent adaptive preparation platform for the CCA-F exam. It is not affiliated with, endorsed by, or approved by Anthropic. The platform uses Bayesian Knowledge Tracing with a 0.90 mastery threshold and covers 174 atomic concepts mapped to the five exam domains.

People also ask

What is an MCP server used for?
An MCP server exposes tools, resources, and prompt templates to a connected AI host such as Claude. Tools let the agent take actions with side effects (writing files, calling APIs). Resources provide read-only context. The protocol standardises how agents discover and invoke these capabilities without bespoke integration code for every service.
How do I secure an MCP server in production?
Restrict the server's filesystem or API surface at the process level, not just in the system prompt. Use OAuth or API key authentication for HTTP-transport servers. Pass secrets via environment variables, not config files. Add server-side rate limits and structured logging. Programmatic controls are deterministic; prompt instructions are not a security boundary.
What is the Model Context Protocol?
The Model Context Protocol (MCP) is an open standard published by Anthropic that defines how AI hosts like Claude connect to external tools and data sources. It specifies a client-server contract for capability discovery, tool invocation, resource retrieval, and error reporting, enabling reusable integrations across different agents and applications.
How does MCP handle errors when a tool call fails?
MCP uses an isError flag on tool results. When a call fails, the server returns isError: true with a structured error payload describing the failure category and recommended recovery action. This gives the agent actionable information rather than an exception. The four main categories are access failure, validation failure, downstream service failure, and rate-limit failure.
Can Claude use multiple MCP servers at the same time?
Yes. Claude can connect to multiple MCP servers simultaneously, with each server's tools appearing in the combined tool manifest. However, connecting too many servers inflates the context window and degrades tool-selection accuracy. Best practice is to scope each agent or session to only the servers relevant to its specific task.

About the author

Solomon Udoh

AI Architect & Certification Lead

Solomon Udoh is an AI Architect who designs and ships production agent systems on the Claude API and Claude Code. He built AI Skill Certs' adaptive engine and authored its 174-concept knowledge graph, mapping every Claude Certified Architect - Foundations objective to hands-on, exam-aligned practice.

  • Designs production multi-agent systems on the Claude API and Agent SDK
  • Author of the AI Skill Certs knowledge graph (174 mapped exam concepts)
  • Builds with MCP, Claude Code, structured outputs, and agentic loops daily
  • Reviews every concept page against the official Anthropic exam guide
