I have shipped three Model Context Protocol servers in the last few months, all of them in production, all of them things I use myself every day. One gives Claude semantic recall over the stuff I save from X and Reddit. One puts live SEO data inside Claude and Cursor. One watches my competitors and briefs me on Mondays.
None of that is the interesting part. The interesting part is everything that broke on the way there, and the handful of patterns that held up. This is the write-up I wish I had read before I started, from someone who was actually running the things rather than demoing them.
1. The client forgets everything, so your server has to remember
The first thing that trips people up is that an MCP server is stateless from the model's point of view. Claude opens a chat, calls your tool, gets a response, and by the next conversation it has forgotten the whole thing happened. If your server only ever returns text into the chat, you have built a novelty. The user gets an answer, closes the tab, and the answer is gone.
The servers of mine that earned a permanent spot all do the same thing: they write durable artifacts to the user's disk. My SEO server does not just answer "which keywords am I ranking for," it also mirrors every lookup into a local folder as plain markdown, so the research compounds instead of evaporating. That one design choice is the difference between a tool people try once and a tool people keep connected.
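To make the idea concrete, here is a minimal sketch of the "mirror every lookup as markdown" step. It is an illustration under assumptions, not the code from the server described above: the `KeywordLookup` shape, the file-naming scheme, and the helper names are all hypothetical. It only builds the file path and contents; the actual write to disk is left to whatever process has access to the user's folder.

```typescript
// Hypothetical shape of one SEO lookup; the field names are illustrative assumptions.
interface KeywordLookup {
  query: string;
  fetchedAt: Date;
  rows: { keyword: string; position: number; url: string }[];
}

// Turn free text into a filesystem-friendly slug.
function slugify(text: string): string {
  const slug = text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  return slug || "untitled";
}

// Escape pipes so a keyword or URL cannot break the markdown table.
function cell(text: string): string {
  return text.replace(/\|/g, "\\|");
}

// Build the note a lookup would be mirrored into: a dated file name plus a markdown body.
function toMarkdownNote(lookup: KeywordLookup): { relativePath: string; body: string } {
  const stamp = lookup.fetchedAt.toISOString();
  const lines = [
    "# " + lookup.query,
    "",
    "Fetched: " + stamp,
    "",
    "| Keyword | Position | URL |",
    "| --- | --- | --- |",
  ];
  for (const row of lookup.rows) {
    lines.push("| " + cell(row.keyword) + " | " + row.position + " | " + cell(row.url) + " |");
  }
  return {
    // Date prefix keeps notes sorted chronologically in a plain folder listing.
    relativePath: stamp.slice(0, 10) + "-" + slugify(lookup.query) + ".md",
    body: lines.join("\n") + "\n",
  };
}
```

One file per lookup, named by date and question, means the folder stays readable and searchable without the server, which is the point of a durable artifact.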
If you take one thing from this post, take that. Decide early where the memory lives. The chat is not it.
2. Stateless workers will double-charge you if you let them
This is the bug that cost me the most sleep, so I will be specific.
I run everything on Cloudflare Workers. Serverless workers are cheap and they scale to zero, which is perfect for a solo founder. They also get torn down aggressively when idle, often within about thirty seconds. If a request is mid-flight when the worker is recycled, the client retries. Now the same unit of work runs twice. For a tool that costs real money per call (an external data API, an LLM call, a credit deduction), running twice means charging the user twice for one action.
The fix is idempotency, and you want it from day one, not after the first angry email. Hash the meaningful inputs of the request into a stable key (I use a SHA-256 of the request payload). Before doing anything expensive, check a short-lived KV store for that key. If it is already there, you are looking at a retry, so return the cached result instead of redoing the work. If it is not, write the key the moment work starts and store the result under it when the work finishes. Pair it with a watermark, a stored marker of the last step that completed, so a retry resumes from that point instead of repeating partial progress. It is a small amount of code and it is the load-bearing wall of a paid MCP server.
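A minimal sketch of that check, under stated assumptions. The `KeyValueStore` interface is a hypothetical stand-in loosely modeled on the `get`/`put` shape of Workers KV, `runOnce`, `canonicalJson`, and the five-minute TTL are illustrative choices, and the hash function is injectable so the logic can be exercised without a Workers runtime. The default hasher uses the Web Crypto `crypto.subtle.digest` call. The sketch is not atomic (two retries arriving at the same instant could both pass the check), and the watermark for partial progress is not shown.

```typescript
// Hypothetical minimal KV interface, loosely modeled on Workers KV get/put.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

type Outcome<T> =
  | { status: "fresh"; result: T }
  | { status: "replayed"; result: T }
  | { status: "in_progress" };

const IDEMPOTENCY_TTL_SECONDS = 300; // arbitrary illustrative value

// Serialize with sorted object keys so equal payloads always hash the same.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return "[" + value.map((v) => canonicalJson(v)).join(",") + "]";
  const keys = Object.keys(value).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalJson(value[k])).join(",") + "}";
}

// SHA-256 as lowercase hex via Web Crypto.
async function sha256Hex(input: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(input));
  const bytes = new Uint8Array(digest);
  let hex = "";
  for (let i = 0; i < bytes.length; i++) hex += ("0" + bytes[i].toString(16)).slice(-2);
  return hex;
}

// Run `work` at most once per (tool, payload) while the key is alive.
async function runOnce<T>(
  kv: KeyValueStore,
  toolName: string,
  payload: Json,
  work: () => Promise<T>,
  hash: (input: string) => Promise<string> = sha256Hex
): Promise<Outcome<T>> {
  const key = "idem:" + (await hash(toolName + ":" + canonicalJson(payload)));

  // A stored record means this is a retry: replay it or report it as in flight.
  const existing = await kv.get(key);
  if (existing !== null) {
    const record = JSON.parse(existing) as { done: boolean; result?: T };
    if (record.done) return { status: "replayed", result: record.result as T };
    return { status: "in_progress" };
  }

  // Mark the work as started before spending any money.
  await kv.put(key, JSON.stringify({ done: false }), { expirationTtl: IDEMPOTENCY_TTL_SECONDS });
  const result = await work();
  // Cache the result so a retry can be answered without redoing the work.
  await kv.put(key, JSON.stringify({ done: true, result }), { expirationTtl: IDEMPOTENCY_TTL_SECONDS });
  return { status: "fresh", result };
}
```

Returning `in_progress` rather than rerunning is a deliberate choice in this sketch: when the first attempt may still be running (or may have died after charging), refusing to start a second one is the option that cannot double-charge.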
3. The tool description is the whole contract
You are not writing an API for a developer who reads docs. You are writing for a model that decides, in the moment, whether your tool is the right one to call and how to fill in the arguments. The tool name and description are the entire contract.
I have watched Claude skip a perfectly good tool because the description was vague, and I have watched it call the right tool with confidence after I rewrote one sentence. Treat descriptions as prompt engineering. Say plainly what the tool is for, when to reach for it, and what each argument expects. Give an example in the description if the arguments are non-obvious. Then actually test it by asking the model natural questions and watching which tool it picks. If it picks wrong, that is a description bug, not a user error.
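As an illustration, here is what a description written to that standard might look like, plus a crude lint for the obvious failures. The tool itself (`get_keyword_rankings`), its arguments, and the lint thresholds are hypothetical; only the `name` / `description` / `inputSchema` layout follows the way MCP tools are declared. A lint like this catches missing information, but it is no substitute for asking the model real questions and watching which tool it picks.

```typescript
// Subset of an MCP tool declaration: a name, a description, and a JSON Schema for arguments.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: { [arg: string]: { type: string; description?: string } };
    required?: string[];
  };
}

// Hypothetical tool: says what it is for, when to use it, when not to, and gives an example.
const keywordRankingsTool: ToolDefinition = {
  name: "get_keyword_rankings",
  description:
    "Look up the search keywords a domain currently ranks for. " +
    "Use this when the user asks which keywords or queries a site ranks for. " +
    "Do not use it for backlink or traffic questions. " +
    'Example arguments: { "domain": "example.com", "limit": 20 }.',
  inputSchema: {
    type: "object",
    properties: {
      domain: { type: "string", description: "Bare domain without protocol, e.g. example.com" },
      limit: { type: "number", description: "Maximum number of keywords to return, 1 to 100" },
    },
    required: ["domain"],
  },
};

// Crude checks for descriptions a model is likely to misread. Thresholds are arbitrary.
function lintTool(tool: ToolDefinition): string[] {
  const problems: string[] = [];
  if (tool.description.length < 80) {
    problems.push("description is too short to say when to use the tool");
  }
  if (!/\b(use this|use when|call this)\b/i.test(tool.description)) {
    problems.push("description never says when to reach for the tool");
  }
  for (const arg of Object.keys(tool.inputSchema.properties)) {
    if (!tool.inputSchema.properties[arg].description) {
      problems.push('argument "' + arg + '" has no description');
    }
  }
  return problems;
}
```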
4. Auth is most of the work, and it is where the standards are still moving
The happy-path demo of an MCP server is an afternoon. Authenticating real users against it is the rest of the month.
If you are building anything multi-tenant, budget for it. Getting OAuth working cleanly on the edge involved more sharp edges than anything else I built: cookie handling that behaves inside an embedded client, the newer dynamic client registration flow that MCP is standardizing on, and picking a password hashing strategy that actually runs in a Workers runtime rather than assuming a full Node environment. None of it is glamorous and all of it is the difference between a personal tool and a product. Do not leave it to the end and assume it will slot in.
5. Metering and abuse limits are day-one features
Because each call can cost you money (point 2), you cannot treat rate limiting and usage accounting as things you will add later. The first person who scripts a loop against your public tool will find the gap before you do, and you will be the one paying for it.
I put a hard per-hour ceiling on calls and a credit system on the paid tools from the start. It does two jobs. It caps your downside if someone hammers the endpoint, and it gives you the usage data you need to price the thing honestly. You cannot set a fair price for an MCP server until you know what a typical week of real usage actually costs you to serve, and you only learn that by metering from the first day.
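A minimal sketch of the two mechanisms together, assuming a fixed one-hour window and integer credits. The `UsageMeter` class, its method names, and the in-memory `Map` are hypothetical; on a serverless platform the state would have to live in durable storage rather than in process memory, and that part is not shown. The clock is injectable so the behaviour can be checked without waiting an hour.

```typescript
const HOUR_MS = 60 * 60 * 1000;

interface Account {
  windowStart: number; // start of the current hourly window, in ms
  callsInWindow: number;
  credits: number;
}

type ChargeResult =
  | { allowed: true; creditsLeft: number }
  | { allowed: false; reason: "rate_limited" | "insufficient_credits" };

class UsageMeter {
  private accounts = new Map<string, Account>();
  private maxCallsPerHour: number;
  private now: () => number;

  constructor(maxCallsPerHour: number, now: () => number = Date.now) {
    this.maxCallsPerHour = maxCallsPerHour;
    this.now = now;
  }

  // Add credits to a user's balance, for example after a purchase.
  grant(userId: string, credits: number): void {
    this.accountFor(userId).credits += credits;
  }

  // Check the hourly ceiling first, then the balance; deduct only if both pass.
  charge(userId: string, cost: number): ChargeResult {
    const account = this.accountFor(userId);
    const windowStart = Math.floor(this.now() / HOUR_MS) * HOUR_MS;
    if (account.windowStart !== windowStart) {
      // A new hour has started: reset the call counter.
      account.windowStart = windowStart;
      account.callsInWindow = 0;
    }
    if (account.callsInWindow >= this.maxCallsPerHour) {
      return { allowed: false, reason: "rate_limited" };
    }
    if (account.credits < cost) {
      return { allowed: false, reason: "insufficient_credits" };
    }
    account.callsInWindow += 1;
    account.credits -= cost;
    return { allowed: true, creditsLeft: account.credits };
  }

  private accountFor(userId: string): Account {
    let account = this.accounts.get(userId);
    if (account === undefined) {
      account = { windowStart: 0, callsInWindow: 0, credits: 0 };
      this.accounts.set(userId, account);
    }
    return account;
  }
}
```

Calling `charge` before the expensive work, and logging each result, gives both the cap on downside and the per-user usage record needed for pricing.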
6. Ship for a job, not for a demo
This is the product lesson underneath all the technical ones. The MCP servers that stick are the ones that do a specific job the user already has, inside the tool they already work in. "Look, Claude can call an API" is a demo. "I asked which pages slipped in search this week and got the answer without opening a dashboard" is a job.
Every server I kept started as a thing I was doing by hand and resented. I keep a running directory of the best MCP servers I actually use, grouped by the job they do rather than by category, because that framing is the one that has never let me down. If you cannot name the job in a sentence, the server is probably a demo.
Where I would start if I did it again
Pick one job you do by hand and hate. Build the smallest server that does exactly that job and writes its output somewhere durable. Make it idempotent before you make it pretty. Write the tool description as if the model is a smart colleague who has never seen your code. Then use it yourself for two weeks before you show anyone. That last step is the real filter. The servers I still run are the ones I could not stop using during those two weeks.
David Hamilton is the founder of ContextBolt, where he builds MCP servers that give AI assistants real-world context, including a live SEO MCP server for Claude, Cursor, and Codex.