Architecture · 10 min read · 12 June 2026

MCP Server in Production: Security, Scope, and Reliability

Learn how to design, secure, and operate an MCP server for Claude agents in production: permissions, tool scope, error handling, and enterprise deployment patterns.

By Solomon Udoh · AI Architect & Certification Lead

An MCP server is the standardised integration layer that lets Claude agents call external tools, read resources, and trigger actions without bespoke glue code for every service. The Model Context Protocol (MCP), published by Anthropic in late 2024, defines a client-server contract so that any compliant host, including Claude, can discover and invoke capabilities at runtime. Getting that contract right in production is what Domain 2 of the CCA-F exam (Tool Design & MCP Integration, 18% of the exam) tests directly.

This guide covers the decisions that matter most: how to scope permissions without locking agents out, when to use read-only resources versus write actions, how to prevent tool overload, and how to build the error-handling and observability layer that keeps enterprise deployments auditable.

What exactly is an MCP server and how does Claude connect to it?

An MCP server exposes three primitive types to a connected host: tools (callable functions that may have side effects), resources (read-only content addressable by URI), and prompts (reusable prompt templates). Claude, acting as an MCP client, negotiates capabilities with the server in an initialization handshake at session start, discovers the available primitives through list requests (tools/list, resources/list, prompts/list), then selects among them during inference.
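To make the discovery step concrete, here is a minimal sketch of the JSON-RPC 2.0 envelopes involved. The helper names and the example response (a made-up CRM tool) are illustrative assumptions, not output from a real server.

```python
import json
from typing import Optional


def make_request(request_id: int, method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope, the message format MCP uses."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


# Discovery requests a client can issue once initialization has completed.
DISCOVERY_METHODS = ["tools/list", "resources/list", "prompts/list"]
discovery_requests = [
    make_request(i, method) for i, method in enumerate(DISCOVERY_METHODS, start=1)
]

# Illustrative (made-up) tools/list response from a hypothetical CRM server.
example_tools_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "update_customer",
                "description": "Update one field on a customer record by customer ID.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"customer_id": {"type": "string"}},
                    "required": ["customer_id"],
                },
            }
        ]
    },
}


def tool_names(response: dict) -> list:
    """Extract the tool names the model will be able to choose from."""
    return [tool["name"] for tool in response["result"]["tools"]]


if __name__ == "__main__":
    print(json.dumps(discovery_requests[0]))
    print(tool_names(example_tools_response))
```

Each tool arrives with a name, a description, and an input schema, which is why description quality has such a direct effect on selection later on.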

The configuration that wires Claude to a server lives in a JSON file. In Claude Code, for example, a local filesystem server entry looks like this:

json

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/workspace/project"],
      "env": {
        "NODE_ENV": "production"
      }
    }
  }
}

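The command field launches the server as a local child process, and Claude communicates with it over stdio. Remote servers are reached over HTTP instead (Streamable HTTP in current revisions of the specification, which replaced the earlier HTTP+SSE transport). Because a locally launched server runs with the permissions of the host process, the scope of that process matters enormously. We return to this under security below.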

For a deeper look at how tool results flow back into the conversation, see our concept on Tool Result Appending.

How should you scope MCP server permissions to avoid PII leakage?

Scope permissions to the minimum surface the agent needs for its task, then enforce that surface programmatically rather than relying on prompt instructions alone.

The exam consistently rewards deterministic, programmatic controls over probabilistic ones when stakes are high. That principle maps directly to MCP permission design:

Control layer	Mechanism	Reliability
Filesystem server path restriction	Pass only the target directory as an argument	Deterministic
Database server row-level filtering	Server-side WHERE clause on every query	Deterministic
API server OAuth scope	Issue tokens with minimum required scopes	Deterministic
Prompt instruction ("don't read PII")	System prompt text	Probabilistic

Prompt-level instructions are useful for nuance, but they are not a security boundary. A server that can read /etc/passwd will read it if the model decides to. Restrict the path at the server level instead.
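As an illustration of a deterministic path restriction, the sketch below shows one way a custom server could refuse any path that resolves outside its allowed root. The function and exception names are hypothetical; this is not how the reference filesystem server is implemented.

```python
from pathlib import Path


class PathOutsideRootError(PermissionError):
    """Raised when a requested path escapes the allowed root directory."""


def resolve_inside(root: str, requested: str) -> Path:
    """Resolve `requested` against `root` and reject anything outside it.

    resolve() normalises ".." segments and follows symlinks, so traversal
    tricks are judged by where the path really ends up.
    """
    root_path = Path(root).resolve()
    candidate = (root_path / requested).resolve()
    if candidate != root_path and root_path not in candidate.parents:
        raise PathOutsideRootError(f"{requested!r} is outside the allowed root")
    return candidate
```

With this check in place, a request for ../../etc/passwd fails in code no matter what the model was told in the system prompt.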

For PII specifically, consider building a sanitisation hook that strips or masks sensitive fields before the tool result reaches the model's context window. Our concept on PostToolUse Hooks for Data Normalisation explains the hook pattern in detail.
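A sanitisation step of this kind might look like the sketch below. The key list and the email pattern are assumptions chosen for illustration; a real deployment would need a vetted PII policy, and this is not the hook API of any particular product.

```python
import re

# Assumed field names to redact wholesale; adjust to your own data model.
SENSITIVE_KEYS = {"email", "phone", "ssn"}

# Simple email pattern for free-text fields; deliberately not exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_pii(value):
    """Recursively mask sensitive keys and inline emails in a tool result."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else mask_pii(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [mask_pii(item) for item in value]
    if isinstance(value, str):
        return EMAIL_RE.sub("[REDACTED]", value)
    return value
```

Running the result through mask_pii before it is returned to the client means the sensitive values never enter the context window in the first place.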

Environment variables are the correct way to pass secrets (API keys, database credentials) into an MCP server process. Hard-coding credentials in the config file creates version-control exposure. See Environment Variable Expansion in MCP Config for the exact syntax Claude Code supports.

json

{
  "mcpServers": {
    "crm": {
      "command": "node",
      "args": ["./servers/crm-server.js"],
      "env": {
        "CRM_API_KEY": "${CRM_API_KEY}",
        "CRM_BASE_URL": "https://crm.internal.example.com"
      }
    }
  }
}
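To show what placeholder expansion does conceptually, here is a small sketch that substitutes ${VAR} references from the environment and fails loudly when a variable is missing. It is an illustration written for this article, not Claude Code's actual implementation, and expand_env is a hypothetical name.

```python
import os
import re

PLACEHOLDER_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, environ=None):
    """Recursively replace ${VAR} placeholders in a parsed config structure."""
    env = os.environ if environ is None else environ
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    if isinstance(value, str):
        def replace(match):
            name = match.group(1)
            if name not in env:
                # Failing fast beats launching a server with an empty secret.
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return PLACEHOLDER_RE.sub(replace, value)
    return value
```

The config file committed to version control then contains only the placeholder, while the secret itself lives in the deployment environment.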

When should an MCP server expose resources versus tools?

Use resources for read-only, URI-addressable content that the agent needs as context. Use tools for anything with a side effect: writing a file, sending an email, updating a record, executing code.

This distinction matters for two reasons. First, resources do not require the same level of confirmation logic as tools because they cannot change state. Second, MCP clients can prefetch or cache resources more aggressively than tool calls, which reduces latency and token consumption.

A practical heuristic:

Operation	Primitive	Rationale
Read a product catalogue	Resource	No side effect; cacheable
Fetch a customer record for display	Resource	Read-only
Update a customer record	Tool	Mutates state
Send a Slack message	Tool	External side effect
Execute a SQL SELECT	Resource or Tool	Resource if the query is fixed; tool if it is dynamic or parameterised
Execute a SQL INSERT	Tool	Always

For write operations that are irreversible (deleting records, sending emails, making payments), the exam pattern is to add a confirmation gate before execution. This is not a prompt instruction; it is a server-side check that requires an explicit confirmed: true parameter before the destructive path runs.

python

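# Assumes `server` is an MCP server object whose tool() decorator registers the
# function, and `crm` is an async client for the downstream CRM.
@server.tool()
async def delete_customer(customer_id: str, confirmed: bool = False) -> dict:
    """Permanently delete a customer record. Irreversible: requires confirmed=true."""
    if not confirmed:
        # Dry run: describe the effect without touching any state.
        return {
            "isError": False,
            "content": [{"type": "text", "text": f"Dry run: would delete customer {customer_id}. Pass confirmed=true to proceed."}]
        }
    # Destructive path: runs only after explicit confirmation.
    await crm.delete(customer_id)
    return {"isError": False, "content": [{"type": "text", "text": f"Deleted {customer_id}"}]}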

How do too many MCP servers hurt agent performance?

Connecting every available MCP server to every agent session is the most common production mistake. Each connected server's tool definitions (name, description, and input schema) are loaded into the model's context once the client has discovered them. With ten servers exposing fifteen tools each, the agent receives 150 tool descriptions before it has processed a single user message.

This creates two compounding problems. First, token consumption rises, increasing cost and latency. Second, the model's ability to select the correct tool degrades as the tool list grows, a phenomenon the exam calls the tool overload problem.

The fix is scoping: connect only the servers relevant to the current task or agent role. The MCP Scoping Hierarchy defines three levels at which you can apply this:

Scope level	Where configured	Granularity
User (personal)	`~/.claude.json`	Per-developer
Project	`.mcp.json` in repo root	Per-codebase
Session	Runtime flag or API param	Per-invocation

In a multi-agent system, the coordinator should pass only the tools its subagents need for their specific subtask, not the full tool manifest. This is the Tool Distribution Strategy Design pattern: treat tool access as a capability that is granted per-role, not broadcast globally.
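A per-role grant can be as simple as an allowlist applied to the manifest before a subagent is started. The role names and tool names below are hypothetical, and the sketch assumes the manifest is a list of dictionaries with a name key.

```python
# Hypothetical role-to-tool allowlist maintained by the coordinator.
ROLE_TOOLS = {
    "billing_agent": {"fetch_order", "issue_refund"},
    "support_agent": {"fetch_order", "search_orders"},
}


def tools_for_role(manifest: list, role: str) -> list:
    """Return only the tool definitions the given role is allowed to see.

    Unknown roles get an empty list: access is granted, never assumed.
    """
    allowed = ROLE_TOOLS.get(role, set())
    return [tool for tool in manifest if tool["name"] in allowed]
```

Defaulting unknown roles to no tools keeps a misconfigured subagent from silently inheriting the full manifest.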

Tools are a form of API surface. Every tool you expose is a surface you must secure, monitor, and maintain. Fewer tools, better described, outperform many tools with thin descriptions.

Anthropic, Model Context Protocol Documentation

What does enterprise-grade MCP server deployment require?

Production deployments need four operational layers beyond the basic server implementation: authentication, rate limiting, structured logging, and health monitoring.

Authentication. For HTTP-transport MCP servers, use OAuth 2.0 or API key validation at the server boundary. The server should reject unauthenticated requests before any tool logic runs. Do not rely on network isolation alone.

Rate limiting. Agents in agentic loops can call tools far faster than a human would. Without rate limits, a runaway loop can exhaust a downstream API quota in seconds. Implement per-session and per-tool rate limits server-side.
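One way to implement a per-session, per-tool limit is a sliding window keyed on the (session, tool) pair, sketched below. The class name and default behaviour are assumptions for illustration; the injectable clock exists only to make the limiter easy to test.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` per `window_seconds` for each (session, tool)."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self._calls = defaultdict(deque)

    def check(self, session_id: str, tool: str) -> float:
        """Return 0.0 and record the call if allowed, else seconds to wait."""
        now = self.clock()
        calls = self._calls[(session_id, tool)]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) < self.max_calls:
            calls.append(now)
            return 0.0
        # The oldest call in the window determines when a slot frees up.
        return self.window_seconds - (now - calls[0])
```

A non-zero return value is the wait time the server can surface to the agent in its rate-limit error.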

Structured logging. Every tool invocation should emit a structured log entry with at minimum: timestamp, tool name, input parameters (with PII masked), output summary, latency, and success/error status. This is the audit trail that compliance teams require and that you need for root-cause debugging.

json

{
  "timestamp": "2026-06-11T14:32:01Z",
  "tool": "crm.update_customer",
  "session_id": "sess_abc123",
  "input": {"customer_id": "cust_789", "field": "email", "value": "[REDACTED]"},
  "output_summary": "success",
  "latency_ms": 142,
  "error": null
}
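An entry like the one above could be produced by a small helper such as this sketch. The function name and the set of masked field names are assumptions; the output fields mirror the example entry.

```python
import json
from datetime import datetime, timezone

# Assumed parameter names whose values must never be written to logs.
MASKED_FIELDS = {"value", "email", "phone"}


def build_log_entry(tool, session_id, params, latency_ms, error=None, now=None):
    """Serialise one tool invocation as a JSON log line with PII masked."""
    moment = now if now is not None else datetime.now(timezone.utc)
    masked = {
        key: "[REDACTED]" if key in MASKED_FIELDS else value
        for key, value in params.items()
    }
    return json.dumps({
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "tool": tool,
        "session_id": session_id,
        "input": masked,
        "output_summary": "error" if error else "success",
        "latency_ms": latency_ms,
        "error": error,
    })
```

Masking at the point where the entry is built, rather than in a downstream log processor, means unredacted values never reach disk.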

Health monitoring. Expose a /health endpoint from each HTTP MCP server. Your orchestration layer should poll it and remove unhealthy servers from the active pool rather than letting the agent discover failures mid-task.
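The polling side can be sketched as a filter over the server pool. The function names are hypothetical, and the probe assumes the /health endpoint answers a plain GET with status 200 when the server is healthy.

```python
from typing import Callable, Iterable
from urllib.request import urlopen


def http_probe(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    with urlopen(f"{base_url}/health", timeout=timeout) as response:
        return response.status == 200


def healthy_servers(servers: Iterable[str], probe: Callable[[str], bool] = http_probe) -> list:
    """Keep only the servers whose probe succeeds.

    Any exception from the probe (timeout, refused connection, HTTP error)
    counts as unhealthy, so one bad server cannot break the sweep.
    """
    active = []
    for url in servers:
        try:
            if probe(url):
                active.append(url)
        except Exception:
            continue
    return active
```

Running this sweep before each agent session, or on a timer, keeps failing servers out of the tool manifest instead of surfacing them as mid-task errors.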

The CCA-F exam as of 12 March 2026 tests these patterns under Domain 2 (Tool Design & MCP Integration, 18%) and Domain 5 (Context Management & Reliability, 15%). As of 3 June 2026, more than 10,000 individuals have passed the exam, and production MCP deployment questions appear consistently in scenario-based items.

How should MCP servers handle errors so agents can recover?

MCP defines the isError flag on tool results precisely for this purpose. When a tool call fails, the server should return isError: true with a structured error payload rather than letting an exception escape or returning an empty result. Depending on the SDK, an unhandled exception surfaces as a protocol-level error or a generic message, neither of which tells the agent how to recover. A structured error gives the agent information it can act on.

The four error categories the exam tests are: access failure, validation failure, downstream service failure, and rate-limit failure. Each warrants a different response:

Error category	`isError`	Recommended agent action
Access failure (auth/permission)	`true`	Escalate to human; do not retry
Validation failure (bad input)	`true`	Retry with corrected parameters
Downstream service failure	`true`	Retry with backoff; escalate after N attempts
Rate-limit failure	`true`	Wait and retry; surface wait time if known

python

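# Assumes `server`, `orders_api`, and `RateLimitError` (with a retry_after
# attribute) are defined elsewhere, and that orders_api raises ValueError
# for malformed input.
@server.tool()
async def fetch_order(order_id: str) -> dict:
    """Fetch a single order record by its order ID."""
    try:
        order = await orders_api.get(order_id)
        return {"isError": False, "content": [{"type": "text", "text": order.to_json()}]}
    except PermissionError as e:
        # Access failure: retrying will not help, so tell the agent to escalate.
        return {"isError": True, "content": [{"type": "text", "text": f"Access denied: {e}. Escalate to administrator."}]}
    except ValueError as e:
        # Validation failure: the agent can retry with corrected parameters.
        return {"isError": True, "content": [{"type": "text", "text": f"Invalid input: {e}. Correct order_id and retry."}]}
    except RateLimitError as e:
        # Rate-limit failure: surface the wait time so the agent can back off.
        return {"isError": True, "content": [{"type": "text", "text": f"Rate limited. Retry after {e.retry_after}s."}]}
    except Exception as e:
        # Downstream failure: anything else is treated as transient.
        return {"isError": True, "content": [{"type": "text", "text": f"Service error: {e}. Retry with backoff."}]}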
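On the agent side, the recovery table can be expressed as a small policy function. This is a sketch under assumptions: the category enum, the action labels, and the exponential backoff schedule are illustrative choices, not something the protocol prescribes.

```python
from enum import Enum
from typing import Optional, Tuple


class ErrorCategory(Enum):
    ACCESS = "access"
    VALIDATION = "validation"
    DOWNSTREAM = "downstream"
    RATE_LIMIT = "rate_limit"


def next_action(
    category: ErrorCategory,
    attempt: int,
    max_attempts: int = 3,
    retry_after: Optional[float] = None,
    base_delay: float = 1.0,
) -> Tuple[str, float]:
    """Map an error category to (action, delay in seconds before acting)."""
    if category is ErrorCategory.ACCESS:
        return ("escalate", 0.0)  # retrying cannot fix a permission problem
    if category is ErrorCategory.VALIDATION:
        return ("retry_with_corrected_input", 0.0)
    if category is ErrorCategory.RATE_LIMIT:
        # Prefer the server's own wait time when it provided one.
        return ("wait_and_retry", retry_after if retry_after is not None else base_delay)
    # Downstream failure: exponential backoff, then hand off to a human.
    if attempt >= max_attempts:
        return ("escalate", 0.0)
    return ("retry", base_delay * 2 ** (attempt - 1))
```

For example, a downstream failure on the first attempt yields a retry after base_delay seconds, while the same failure on the third attempt escalates.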

The distinction between an access failure and a valid empty result is a common exam trap. If a search returns zero results, isError should be false with an empty list. If the search could not execute because of a permission problem, isError should be true. Conflating these causes agents to misroute their recovery logic. See Access Failure vs Valid Empty Result for the full treatment.
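The two cases look like this in a minimal sketch. The in-memory ORDERS data, the orders:read scope name, and the plain function (no server decorator) are all assumptions made to keep the example self-contained.

```python
import json

# Hypothetical in-memory data standing in for a real order store.
ORDERS = {"alice": ["ord_1", "ord_2"]}


def search_orders(customer: str, caller_scopes: set) -> dict:
    """Return matching order IDs, keeping 'no results' distinct from 'no access'."""
    if "orders:read" not in caller_scopes:
        # The search could not run at all: this is an error.
        return {
            "isError": True,
            "content": [{"type": "text", "text": "Access denied: missing orders:read scope. Escalate to an administrator."}],
        }
    # The search ran; zero matches is a valid answer, not an error.
    matches = ORDERS.get(customer, [])
    return {"isError": False, "content": [{"type": "text", "text": json.dumps(matches)}]}
```

An agent that receives the empty list can confidently tell the user there are no orders; an agent that receives the access error knows that claim would be unfounded.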

How do you test MCP integrations for real agent reliability?

Unit tests on individual tool functions are necessary but not sufficient. Agents fail in ways that only emerge from the full tool-selection-and-execution loop: the model picks the wrong tool, a tool returns a subtly malformed result that the model misinterprets, or a sequence of valid calls produces an invalid aggregate state.

A reliable MCP test suite has three layers:

1. Unit tests on each tool function: valid inputs, each error category, boundary conditions.
2. Integration tests that run the MCP server against a real (or stubbed) downstream and verify the full request-response cycle including isError semantics.
3. Agent-in-the-loop tests that send a natural-language task to Claude with the MCP server connected and assert on the final outcome, not the intermediate steps.

For layer three, determinism is the challenge. Use a fixed model version, a fixed system prompt, and a fixed tool manifest. Seed any randomness in the downstream stub. If a test is flaky, the root cause is almost always either an ambiguous tool description (the model sometimes picks the wrong tool) or an error response that does not give the model enough information to recover.
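One way to surface flakiness is to run the same task several times and assert on the pass rate of the final outcome. In the sketch below, run_task is a hypothetical callable that would send the task to the agent and return the downstream stub's final state; fake_run_task is a stand-in so the example runs without a model.

```python
from typing import Callable


def outcome_pass_rate(
    run_task: Callable[[], dict],
    check_outcome: Callable[[dict], bool],
    runs: int = 5,
) -> float:
    """Run the task `runs` times and return the fraction of passing outcomes.

    Only the final state is checked; the path the agent took is ignored.
    """
    passed = 0
    for _ in range(runs):
        final_state = run_task()
        if check_outcome(final_state):
            passed += 1
    return passed / runs


def fake_run_task() -> dict:
    # Stand-in for: send the task to the agent, then read the stub's state.
    return {"order_42": {"status": "refunded"}}


if __name__ == "__main__":
    rate = outcome_pass_rate(
        fake_run_task,
        lambda state: state["order_42"]["status"] == "refunded",
    )
    print(f"pass rate: {rate:.0%}")
```

A pass rate below 100% on a supposedly deterministic setup is the cue to inspect tool descriptions and error messages before blaming the model.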

When a tool is selected incorrectly, the first place to look is the tool description, not the model. Descriptions are the primary selection mechanism.

Anthropic, Model Context Protocol Documentation

Tool descriptions are the primary lever for fixing misrouting. A description that says "fetch data" will misfire. A description that says "fetch a single order record by its numeric order ID; use search_orders for fuzzy lookups by customer name" will not. Our concept on Writing Effective Tool Descriptions has the full pattern.

Should you build a custom MCP server or use an existing one?

The Build vs Use Decision for MCP Servers comes down to three questions: Does a maintained open-source or vendor server already cover the integration? Does the existing server's permission model match your security requirements? Does the existing server's error semantics match what your agents expect?

If the answer to all three is yes, use the existing server. The MCP ecosystem already includes servers for GitHub, Slack, PostgreSQL, filesystem access, web search, and dozens of other common integrations. Building a custom server for a commodity integration adds maintenance burden without adding capability.

Build custom when: the downstream system is internal and not publicly available, the existing server exposes too broad a surface (you need a constrained subset), or the error semantics are wrong for your agent's recovery logic.

The CCA-F exam tests this decision under the MCP Server Integration Best Practices task statement. The exam pattern is: prefer existing servers for commodity integrations, build custom for internal systems or when security constraints require a narrower surface.

For teams preparing for the exam, our Tool Design & MCP Integration concept library covers all 18% of Domain 2 across 30 mapped task statements, with practice questions scored on the same 100-to-1000 scale as the real exam.

Frequently asked questions

What transport protocols does an MCP server support?

MCP servers support two standard transports: stdio (standard input/output), used for local processes launched by the host, and Streamable HTTP, used for remote or networked servers. Streamable HTTP replaced the earlier HTTP with Server-Sent Events (SSE) transport in the 2025-03-26 revision of the specification, although some older servers still expose HTTP+SSE. Claude Code uses stdio for locally configured servers. Remote deployments typically add OAuth or API key authentication at the server boundary.

How many MCP servers can you connect to Claude at once?

There is no hard protocol limit on the number of connected MCP servers, but practical limits emerge quickly. Each connected server's tool definitions are loaded into the model's context once the client has discovered them. Connecting too many servers causes token bloat and degrades tool-selection accuracy. Best practice is to scope servers to the task or agent role rather than connecting all available servers globally.

Does the CCA-F exam test MCP server configuration syntax?

The CCA-F exam tests MCP concepts and decision-making rather than verbatim syntax recall. Domain 2 (Tool Design & MCP Integration, 18% of the exam) covers tool description design, error handling with the isError flag, scoping strategy, and the build-vs-use decision. Scenario questions ask you to diagnose misrouting or choose the correct error response, not to write JSON from memory.

What is the difference between an MCP tool and an MCP resource?

MCP tools are callable functions that can have side effects: writing files, sending messages, updating records. MCP resources are read-only, URI-addressable content that the agent uses as context. Use resources for catalogues, documents, and reference data. Use tools for any operation that mutates state or triggers an external action.

How do you prevent an MCP server from being called in an infinite loop?

Implement server-side rate limits per session and per tool, and return a structured isError response with a retry-after value when the limit is hit. On the agent side, configure a maximum iteration count in the agentic loop and treat repeated identical tool calls as a loop-termination signal. Never rely solely on prompt instructions to prevent runaway loops.
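The agent-side half of that answer can be sketched as a small guard object checked before every tool call. The class name and default thresholds are assumptions for illustration.

```python
import json


class LoopGuard:
    """Stop an agentic loop on too many iterations or repeated identical calls."""

    def __init__(self, max_iterations: int = 20, max_repeats: int = 3):
        self.max_iterations = max_iterations
        self.max_repeats = max_repeats
        self.iterations = 0
        self._last_call = None
        self._repeat_count = 0

    def should_stop(self, tool: str, arguments: dict) -> bool:
        """Record one tool call and report whether the loop should terminate."""
        self.iterations += 1
        # Serialise with sorted keys so argument order does not hide repeats.
        call = (tool, json.dumps(arguments, sort_keys=True))
        if call == self._last_call:
            self._repeat_count += 1
        else:
            self._last_call = call
            self._repeat_count = 1
        return (
            self.iterations > self.max_iterations
            or self._repeat_count >= self.max_repeats
        )
```

Because the guard lives in the loop's code rather than in the prompt, it terminates a runaway agent regardless of what the model decides to do next.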

Is AI Skill Certs affiliated with Anthropic or the CCA-F exam programme?

No. AI Skill Certs is an independent adaptive preparation platform for the CCA-F exam. It is not affiliated with, endorsed by, or approved by Anthropic. The platform uses Bayesian Knowledge Tracing with a 0.90 mastery threshold and covers 174 atomic concepts mapped to the five exam domains.
