MCP (Model Context Protocol) in Production: A 2026 Field Guide for AI Developers

MCP (Model Context Protocol) in Production: A 2026 Field Guide for AI Developers

posted 11 min read

TL;DR — Model Context Protocol (MCP) is Anthropic's open standard for connecting LLMs to external tools, data sources, and services. It's powerful, but it's also overhyped. This guide covers what MCP actually is under the hood, when it's the right architectural choice, when you should just use plain tool use instead, and how to build production-grade MCP servers that don't fall over under real traffic.


Why MCP exists (and why you probably misunderstand it)

If you've been building with Claude, GPT, or any modern LLM for more than a year, you've solved the same problem at least three times: "how do I let the model talk to my database / my filesystem / my Jira / my Slack?"

Every team ends up writing the same glue code:

  1. Define a tool schema (JSON Schema for inputs)
  2. Wire it into the LLM's tool-use API
  3. Implement an executor that runs the tool and returns results
  4. Handle auth, errors, retries, observability
  5. Repeat for every new integration, every new project, every new LLM provider

MCP standardizes steps 1-5 into a protocol. Instead of every team reinventing the integration layer, MCP defines a wire format (JSON-RPC over stdio, SSE, or HTTP) and a discovery mechanism. An MCP server exposes "tools," "resources," and "prompts." An MCP client (Claude Desktop, Claude Code, Cursor, your custom agent) discovers and calls them.

The result: write the integration once, use it in every MCP-compatible client.

That's the value prop. Now let me tell you what the docs don't tell you.


What MCP actually is (without the marketing)

MCP has three primitives. Most articles list them and move on; understanding what they actually mean in practice is where the production wins live.

1. Tools

Functions the LLM can call. Identical in shape to OpenAI/Anthropic tool use. A name, a description, a JSON Schema for input. The LLM decides when to call them.

{
  "name": "search_jira_issues",
  "description": "Search Jira issues by project and status",
  "inputSchema": {
    "type": "object",
    "properties": {
      "project": {"type": "string"},
      "status": {"type": "string", "enum": ["open", "in_progress", "done"]}
    },
    "required": ["project"]
  }
}

If you've done function calling, you've done MCP tools — just over a different wire.

2. Resources

Read-only data the LLM (or the client) can fetch. Files, database rows, API responses, log streams. Resources are addressable by URI (file:///, postgres://, confluence://, etc.).

The critical distinction: tools are active (LLM decides when to invoke), resources are passive (the client decides what to attach to the conversation).

This is where most teams get MCP wrong. They model everything as a tool because "the LLM should decide." But if a user says "summarize this Confluence page," that's a resource attachment, not a tool call. Using a tool there means the LLM has to discover the page exists, decide to fetch it, and risk getting the URI wrong.

3. Prompts

Reusable prompt templates the server exposes to clients. Less interesting in most production stacks — useful for shared team prompts ("summarize-pr", "review-security", "extract-action-items"), but easily replaced by your own prompt library.

In practice, 80% of production MCP value comes from tools + resources. Prompts are nice-to-have.

Want a structured walkthrough of MCP architecture, server implementation, and integration patterns with hands-on exercises? The MCP (Model Context Protocol) course on Cursuri-AI.ro covers all three primitives, transport options, and real production server examples.


When MCP is the right choice

Three scenarios where MCP earns its complexity.

Scenario 1: Multi-client integration

You're building a tool that needs to work inside Claude Desktop, Claude Code, Cursor, Continue, AND your own custom agent. Without MCP, you write five separate integrations. With MCP, you write one server and every client speaks to it.

This is the original use case Anthropic designed MCP for. If your tool will live in only one client, MCP is overhead.

Scenario 2: Cross-team integration platforms

Your company has 15 internal tools (Jira, Confluence, Datadog, GitHub, an internal CRM, your data warehouse, three custom services). You want every engineer to be able to use any of those tools from any AI assistant without each team reimplementing.

You build one MCP server per tool, run them as services, and every AI assistant in the company connects to all of them. New tools added to the catalog become available to everyone automatically.

This is where MCP shines as a platform play — and where most public MCP discussions miss the point. It's not really about a single dev wiring Claude Desktop to their filesystem; it's about a company building an internal tool fabric for AI.

Scenario 3: Open-source AI tooling distribution

You're publishing a tool you want devs to use across the AI ecosystem. An MCP server is installable as a package, runs as a subprocess, and works in any compatible client. Distribution becomes a non-issue.

The current MCP server catalog already has servers for GitHub, Slack, Postgres, Brave Search, Filesystem, Memory, and dozens more — all installable in one config line.


When MCP is the WRONG choice

This is where every other MCP article stops being useful. Skip MCP when:

Anti-pattern 1: Single-app, single-purpose tools

You're building a SaaS product where your backend already exposes APIs. Your LLM agent runs inside your backend, calling your own services. MCP adds a wire protocol where none is needed.

Just use plain tool use. Define tool schemas in your Anthropic API call, execute them in-process, return results. Zero MCP machinery.

# This is fine. You don't need MCP for this.
tools = [{
    "name": "create_invoice",
    "description": "Create a new invoice for a customer",
    "input_schema": {...}
}]

response = client.messages.create(
    model="claude-sonnet-4-6",
    tools=tools,
    messages=[...]
)

Anti-pattern 2: Latency-sensitive paths

MCP adds protocol overhead — JSON-RPC serialization, transport (stdio/SSE/HTTP), discovery roundtrips. For sub-100ms operations called dozens of times per second, that overhead matters.

If your tool is called inside a tight inner loop (e.g., per-token evaluation, real-time streaming with side effects), bypass MCP and call your function directly.

Anti-pattern 3: Tightly coupled domain logic

If your "tool" is really just a thin wrapper around a function in the same codebase as your agent, don't extract it to MCP. You'll add a network/IPC boundary, deployment complexity, version skew problems, and observability gaps for zero architectural benefit.

MCP shines when the tool and the agent have independent lifecycles. If they don't, keep them in-process.

Anti-pattern 4: One-off prototyping

Spinning up an MCP server, configuring transport, writing a tool spec, registering it in your client — that's a 30-minute setup before you've called a function. For a one-off script or a hackathon prototype, just use tool use directly.

When to use MCP, when to use direct tool use, when to use a multi-agent architecture — and how each affects cost, latency, and maintainability — is exactly what the Advanced LLM Integration course on Cursuri-AI.ro trains you to decide.


Building a production MCP server (working code)

The MCP TypeScript and Python SDKs make this surprisingly approachable. Here's a minimal but production-shaped server in Python.

# server.py
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent
import asyncpg
import os

app = Server("internal-crm")

@app.list_tools()
async def list_tools() -> list[Tool]:
    return [
        Tool(
            name="find_customer",
            description="Find a customer by email or phone",
            inputSchema={
                "type": "object",
                "properties": {
                    "email": {"type": "string"},
                    "phone": {"type": "string"}
                },
            },
        ),
        Tool(
            name="recent_orders",
            description="Get the N most recent orders for a customer",
            inputSchema={
                "type": "object",
                "properties": {
                    "customer_id": {"type": "string"},
                    "limit": {"type": "integer", "default": 10}
                },
                "required": ["customer_id"],
            },
        ),
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    pool = await get_pool()

    if name == "find_customer":
        row = await pool.fetchrow(
            "SELECT id, email, name FROM customers WHERE email = $1 OR phone = $2",
            arguments.get("email"),
            arguments.get("phone"),
        )
        if not row:
            return [TextContent(type="text", text="No customer found")]
        return [TextContent(type="text", text=f"{row['id']} | {row['name']} | {row['email']}")]

    if name == "recent_orders":
        rows = await pool.fetch(
            "SELECT id, total, created_at FROM orders WHERE customer_id = $1 ORDER BY created_at DESC LIMIT $2",
            arguments["customer_id"],
            arguments.get("limit", 10),
        )
        lines = [f"{r['id']} | ${r['total']} | {r['created_at']}" for r in rows]
        return [TextContent(type="text", text="\n".join(lines) or "No orders")]

    raise ValueError(f"Unknown tool: {name}")

_pool = None
async def get_pool():
    global _pool
    if _pool is None:
        _pool = await asyncpg.create_pool(os.environ["DATABASE_URL"], min_size=2, max_size=10)
    return _pool

if __name__ == "__main__":
    import asyncio
    asyncio.run(stdio_server(app))

That's ~50 lines for a working MCP server. Register it in your Claude Desktop config:

{
  "mcpServers": {
    "internal-crm": {
      "command": "python",
      "args": ["/path/to/server.py"],
      "env": {
        "DATABASE_URL": "postgresql://..."
      }
    }
  }
}

Restart the client. Tools appear. Done.


Production concerns nobody warns you about

The 50-line example above is a demo. Here's what changes when you ship for real.

1. Authentication is your problem

MCP doesn't define auth. The protocol assumes the transport handles it (stdio = local trust, HTTP = your headers/tokens). For a server running as a subprocess, the user's local credentials are usually fine. For a remote MCP server (HTTP transport), you must layer your own auth on top — typically JWT, OAuth2, or mTLS.

A common mistake: shipping an HTTP MCP server with no auth because "Claude Desktop just connects to it." If the server is reachable on the network, anyone on the network can call your tools.

2. Tool output size matters

Every tool result is fed back into the LLM's context. If your tool returns a 50,000-row query result, you've just:

  • Blown past your context window
  • Burned $5 per call on input tokens
  • Confused the LLM with a wall of mostly-irrelevant data

Always paginate, summarize, or filter inside the tool. Return at most a few hundred tokens of output unless you have a specific reason otherwise. Offer a "give me more" follow-up tool for cases where the model needs to drill in.

3. Idempotency and side effects

Tool calls can be retried by the client (network failures, rate limits, user-triggered re-runs). If your tool has side effects — sending an email, creating a record, charging a card — build in idempotency keys.

@app.call_tool()
async def call_tool(name: str, arguments: dict):
    if name == "create_invoice":
        idempotency_key = arguments.get("idempotency_key") or generate_key(arguments)
        existing = await db.fetch_invoice_by_key(idempotency_key)
        if existing:
            return [TextContent(type="text", text=f"Invoice {existing.id} (already created)")]
        # ... actually create ...

4. Observability

Local MCP servers (stdio transport) have no built-in logging surface that bubbles up to the client. Log to a file with structured JSON, ship those logs to your observability platform like any other service. You want:

  • Tool invocation count + latency
  • Error rate per tool
  • Input size, output size distributions
  • Top arguments per tool (for prompt-engineering insights)

Without this, debugging a flaky agent that "sometimes can't find customers" is misery.

5. Versioning and breaking changes

When you change a tool's schema (rename a parameter, change a default, remove a tool), every client that has cached your tool list has stale info until they reconnect. Plan for:

  • Semver on your server package
  • Deprecation periods (keep old tool names as aliases for a release or two)
  • A tools/list notification mechanism if your client supports it

MCP vs the alternatives

Three patterns get conflated in 2026 discussions. Knowing when to use each is half the architectural skill.

Pattern Best for Avoid when
Direct tool use (OpenAI/Anthropic native function calling) Single app, in-process tools, latency-sensitive paths, prototyping Sharing tools across clients/teams
MCP Multi-client distribution, cross-team integration platforms, OSS tooling Single-app monoliths, sub-100ms tools, tight domain coupling
HTTP REST APIs called via tool use Tools owned by another team/service that already has a stable API When you'd be wrapping the same call surface twice

A good production stack often uses all three:

  • HTTP APIs for cross-service calls (existing infra)
  • Direct tool use for in-process logic (low overhead)
  • MCP for tools that need to live in multiple AI clients (developer experience play)

Picking the right integration boundary — when to wrap a service in MCP vs leave it as a REST API vs inline it as tool use — is one of the most consequential architecture decisions in modern AI apps. The AI System Architecture course on Cursuri-AI.ro walks through these tradeoffs with real case studies.


A production checklist before you ship an MCP server

  • [ ] Tool schemas are stable and versioned
  • [ ] Tool outputs are bounded in size (paginated/summarized)
  • [ ] Idempotency keys on every side-effecting tool
  • [ ] Auth layered on top of HTTP transport (if not stdio)
  • [ ] Structured logging with per-tool latency + error rate metrics
  • [ ] Connection retries and graceful degradation on the client side
  • [ ] At least one client tested end-to-end (Claude Desktop, Claude Code, Cursor)
  • [ ] Tool descriptions are LLM-friendly (clear, no jargon, examples in description)
  • [ ] Resources used for read-only attachments, tools used for active calls
  • [ ] Server is monitored like any other production service

Wrapping up

MCP is a real architectural advance for AI tooling, but it's not a silver bullet. It earns its complexity when you have multiple AI clients to support, multiple teams sharing tools, or an OSS distribution story. It costs you complexity when you're building a single-app agent that calls in-process functions.

The teams getting outsized value from MCP in 2026 are the ones treating it as a platform: one MCP server per backend service, exposed to every AI assistant the company uses, observed and versioned like any production service. The teams burning time on MCP are the ones using it as a substitute for plain function calling in single-app contexts.

Where to go deeper

I write about production AI engineering — MCP, agent architectures, RAG, Claude API patterns — on Cursuri-AI.ro, an interactive learning platform built for engineers shipping AI in production. Courses most relevant to what's in this article:

Course content is delivered in Romanian (the platform's primary audience), but the code, protocols, and architecture patterns are language-agnostic. The IT Pro track is built specifically for engineers shipping AI in production environments.


What's your MCP setup in production? Are you running a single internal platform server, or distributing OSS servers, or skipping MCP entirely in favor of direct tool use? I'm collecting real-world architectures for a follow-up on multi-tenant MCP at scale — auth boundaries, per-customer tool scoping, and the cost model when 1,000+ engineers share a tool fabric.

If this was useful, leave a comment with your stack — I read every reply.


Related reading:

More Posts

I’m a Senior Dev and I’ve Forgotten How to Think Without a Prompt

Karol Modelskiverified - Mar 19

How I Built a React Portfolio in 7 Days That Landed ₹1.2L in Freelance Work

Dharanidharan - Feb 9

Architecting a Local-First Hybrid RAG for Finance

Pocket Portfolio - Feb 25

Your AI Doesn't Just Write Tests. It Runs Them Too.

Kevin Martinez - May 12

TypeScript Complexity Has Finally Reached the Point of Total Absurdity

Karol Modelskiverified - Apr 23
chevron_left

Related Jobs

View all jobs →

Commenters (This Week)

3 comments
1 comment
1 comment

Contribute meaningful comments to climb the leaderboard and earn badges!