Building an MCP Server Ecosystem from Scratch

When I started on Lasius, we had a single MCP server, a Slack channel full of opinions, and a vague sense that this protocol was either going to eat half our integration work or melt down trying. A year later it has done both, several times, and we've come out with something I can recommend with a straight face.

This is a long note on what the Model Context Protocol actually asks of you once you go past the hello-world server, written for the person who is about to be the one carrying the pager.

The mental model that finally clicked

MCP is not an API. It is a transport contract between an agent and a set of tools, with optional resource and prompt surfaces. The hard part is not implementing the verbs. The hard part is that you now have an N×M problem — N agents, M servers — and every cross-cutting concern (auth, observability, rate limits, schema drift) has to live somewhere that isn't the servers themselves.

Once we stopped treating each MCP server as a self-contained microservice and started treating the fleet as a directory of capabilities behind a gateway, almost every architectural argument we had been having dissolved.
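To put rough numbers on that shift, here is a back-of-envelope sketch. It assumes every direct agent-to-server pairing would otherwise need its own auth, logging, and rate-limit wiring, and that a gateway reduces this to one connection per party. The function name and the fleet size are made up for illustration.

```typescript
// Hypothetical helper: counts integration points for a fleet of a given size.
function integrationPoints(agents: number, servers: number) {
  return {
    direct: agents * servers, // every agent wired to every server
    viaGateway: agents + servers, // every party wired once, to the gateway
  };
}

const fleet = integrationPoints(5, 12);
// fleet.direct is 60, fleet.viaGateway is 17
```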

Layer one — the gateway

The gateway is the only thing your agents talk to. It owns:

Server discovery (who is up, what tools they advertise, schema version).
Authorization (this agent can call these tools, with these scopes).
Transport translation (STDIO ↔ SSE ↔ Streamable HTTP).
Telemetry (every call, latency, error class, cost where applicable).

Putting all four of these in one component is opinionated. It is also the only configuration we tried that survived the first three customers.
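The discovery half of that list can be sketched as a small in-memory directory. This is an illustration under stated assumptions, not our gateway: the type names, fields, and the idea of filtering on a health flag are all hypothetical, and none of them come from an MCP SDK.

```typescript
// Illustrative shapes only; none of these names come from an MCP SDK.
type TransportKind = "stdio" | "sse" | "streamable-http";

interface ServerEntry {
  serverId: string;
  transport: TransportKind;
  healthy: boolean;
  tools: { name: string; schemaVersion: number }[];
}

class CapabilityDirectory {
  private servers = new Map<string, ServerEntry>();

  // Called when a server registers or re-advertises its tools.
  upsert(entry: ServerEntry): void {
    this.servers.set(entry.serverId, entry);
  }

  // Discovery: which healthy servers currently advertise this tool?
  findTool(toolName: string): ServerEntry[] {
    return Array.from(this.servers.values()).filter(
      (s) => s.healthy && s.tools.some((t) => t.name === toolName),
    );
  }
}

const directory = new CapabilityDirectory();
directory.upsert({
  serverId: "calendar",
  transport: "sse",
  healthy: true,
  tools: [{ name: "create_event", schemaVersion: 3 }],
});
directory.upsert({
  serverId: "calendar-canary",
  transport: "stdio",
  healthy: false,
  tools: [{ name: "create_event", schemaVersion: 4 }],
});

const candidates = directory.findTool("create_event").map((s) => s.serverId);
// candidates is ["calendar"]: the unhealthy canary is filtered out
```

Keeping the transport and schema version on the entry means the other three responsibilities can read from the same record instead of keeping their own copies.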

A gateway is not glamorous infrastructure. It will feel like an afterthought until the day a customer asks how to revoke an agent's access to a single tool without redeploying anything — and then it is the only thing that matters.
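That revocation request is mostly a data-model question. A minimal sketch, assuming grants are stored per agent as (server, tool) pairs and checked on every call; the class and method names are hypothetical.

```typescript
// Hypothetical grant table: agentId -> set of (serverId, tool) keys it may call.
class GrantTable {
  private grants = new Map<string, Set<string>>();

  // JSON-encode the pair so ids containing separators cannot collide.
  private key(serverId: string, tool: string): string {
    return JSON.stringify([serverId, tool]);
  }

  allow(agentId: string, serverId: string, tool: string): void {
    const set = this.grants.get(agentId) ?? new Set<string>();
    set.add(this.key(serverId, tool));
    this.grants.set(agentId, set);
  }

  // Revocation is a data change picked up on the next call; nothing redeploys.
  revoke(agentId: string, serverId: string, tool: string): boolean {
    return this.grants.get(agentId)?.delete(this.key(serverId, tool)) ?? false;
  }

  isAllowed(agentId: string, serverId: string, tool: string): boolean {
    return this.grants.get(agentId)?.has(this.key(serverId, tool)) ?? false;
  }
}

const grants = new GrantTable();
grants.allow("agent-7", "slack", "post_message");
grants.allow("agent-7", "slack", "list_channels");
grants.revoke("agent-7", "slack", "post_message");

const canPost = grants.isAllowed("agent-7", "slack", "post_message"); // false
const canList = grants.isAllowed("agent-7", "slack", "list_channels"); // true
```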

Layer two — the proxy

You will meet three MCP transports in the wild today and you will, eventually, need all of them. STDIO is the developer-laptop default and the one your local servers will speak. SSE, the older HTTP+SSE transport that the spec has since deprecated in favor of Streamable HTTP, is what many already-hosted servers still speak. Streamable HTTP is what survives the firewall, the load balancer, and the customer's corporate proxy.

We wrote a proxy layer that does the boring work — accepting one transport on the wire, speaking another to the server, preserving message ordering, and surfacing transport-level errors as first-class events the gateway can act on.

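// Sketch of the transport pump. Transport, TraceContext, and OnTransportError
// are illustrative shapes, not types from an MCP SDK.
interface Transport {
  messages(): AsyncIterable<unknown>;
  send(msg: unknown): Promise<void>;
}
type TraceContext = { requestId: string; agentId: string; serverId: string };
type OnTransportError = (msg: unknown, err: unknown) => void;

async function pump(inbound: Transport, outbound: Transport, ctx: TraceContext, onTransportError: OnTransportError) {
  // for await handles one message at a time, which preserves ordering.
  for await (const msg of inbound.messages()) {
    const traced = { ...ctx, payload: msg }; // attach trace fields to the message
    try {
      await outbound.send(traced);
    } catch (err) {
      onTransportError(traced, err); // surface the failure as a first-class event
      throw err; // then stop the pump by rethrowing
    }
  }
}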

The single most useful thing we did here was make the proxy emit one structured log line per message — request id, agent id, server id, tool, transport, byte count, latency. That log table is now our debugging tool, our billing source, and the input to our weekly capacity review.
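A sketch of what such a record might look like. The fields are the ones named in the paragraph above; the exact names, the JSON-lines encoding, and the aggregation helper are assumptions for illustration.

```typescript
// Field list taken from the prose; names and types are assumptions.
interface ProxyLogRecord {
  requestId: string;
  agentId: string;
  serverId: string;
  tool: string;
  transport: "stdio" | "sse" | "streamable-http";
  bytes: number;
  latencyMs: number;
}

// One JSON object per line, so the output loads into a log table as-is.
function formatLogLine(record: ProxyLogRecord): string {
  return JSON.stringify(record);
}

// Billing and capacity review become aggregations over the same records.
function bytesByAgent(records: ProxyLogRecord[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const r of records) {
    totals[r.agentId] = (totals[r.agentId] ?? 0) + r.bytes;
  }
  return totals;
}

const sample: ProxyLogRecord[] = [
  { requestId: "r1", agentId: "agent-7", serverId: "slack", tool: "post_message", transport: "sse", bytes: 512, latencyMs: 40 },
  { requestId: "r2", agentId: "agent-7", serverId: "slack", tool: "list_channels", transport: "sse", bytes: 256, latencyMs: 22 },
  { requestId: "r3", agentId: "agent-9", serverId: "calendar", tool: "create_event", transport: "stdio", bytes: 128, latencyMs: 9 },
];

const totals = bytesByAgent(sample);
// totals is { "agent-7": 768, "agent-9": 128 }
```

The point of a flat record is that debugging, billing, and capacity questions all become queries over one table rather than three separate pipelines.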

Layer three — auth, the part everyone underestimates

MCP servers tend to be wrappers over real services with real credentials — Google Workspace, Slack, internal databases. Your gateway has to broker access to those without ever handing the underlying tokens to the agent.

We landed on a three-tier model:

Identity Provider (IDP). The human or service account behind the request. Stable. Owns roles.
MCP Auth. A short-lived credential scoped to a specific (agent × server × toolset) tuple. The gateway mints this and the proxy presents it.
Downstream OAuth. The actual Google / GitHub / whatever token, held by the server, refreshed by a job that the agent never sees.

Every time we have been tempted to collapse two of these into one, we have regretted it within a sprint.
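The middle tier is the one that is easiest to get vague about, so here is a minimal sketch of it. Everything in it is hypothetical: the type names, the five-minute default lifetime, and the unsigned plain-object credential (a real one would be signed or otherwise tamper-proof).

```typescript
// Tier 1: what the identity provider tells us about the caller.
interface Principal {
  subject: string;
  roles: string[];
}

// Tier 2: short-lived, scoped to one agent, one server, one toolset.
interface McpCredential {
  subject: string;
  agentId: string;
  serverId: string;
  tools: string[];
  expiresAt: number; // epoch milliseconds
}

// The gateway mints the credential. The 5-minute default TTL is an assumption.
function mintCredential(
  principal: Principal,
  agentId: string,
  serverId: string,
  tools: string[],
  nowMs: number,
  ttlMs = 5 * 60_000,
): McpCredential {
  return { subject: principal.subject, agentId, serverId, tools: [...tools], expiresAt: nowMs + ttlMs };
}

// The server checks scope and expiry, then uses its own downstream OAuth
// token (tier 3), which never appears in anything the agent can see.
function authorizes(cred: McpCredential, serverId: string, tool: string, nowMs: number): boolean {
  return cred.serverId === serverId && cred.tools.includes(tool) && nowMs < cred.expiresAt;
}

const user: Principal = { subject: "svc-reporting", roles: ["analyst"] };
const cred = mintCredential(user, "agent-7", "calendar", ["create_event"], 0);

const inScope = authorizes(cred, "calendar", "create_event", 1_000); // true
const wrongTool = authorizes(cred, "calendar", "delete_event", 1_000); // false
const expired = authorizes(cred, "calendar", "create_event", 5 * 60_000); // false
```

Note that the credential type has no field for the downstream token at all, which is one way to make collapsing tiers two and three hard to do by accident.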

Schemas drift. Plan for it.

Tool schemas change. Servers get redeployed. Agents that were fine yesterday will hit a validation error tomorrow and the only person who can explain why is on a flight.

The thing that saved us was simple: every tool schema is versioned, the gateway pins agents to a specific version on first contact, and a background job warns the agent owner when the pinned version is more than two versions behind the latest. We have rolled back a bad schema deploy three times since, and because agents were pinned, each rollback was a routine change and never an emergency revert.
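The pinning rule can be sketched in a few lines. This assumes schema versions are integers that increase by one per release, so "more than two versions behind" means latest minus pinned is greater than 2; the class and method names are hypothetical.

```typescript
class SchemaPins {
  private latest = new Map<string, number>(); // tool -> latest published version
  private pins = new Map<string, number>(); // (agentId, tool) key -> pinned version

  private key(agentId: string, tool: string): string {
    return JSON.stringify([agentId, tool]);
  }

  publish(tool: string, version: number): void {
    this.latest.set(tool, version);
  }

  // First contact pins the agent to whatever is latest at that moment.
  resolve(agentId: string, tool: string): number {
    const key = this.key(agentId, tool);
    const pinned = this.pins.get(key);
    if (pinned !== undefined) return pinned;
    const current = this.latest.get(tool);
    if (current === undefined) throw new Error(`unknown tool: ${tool}`);
    this.pins.set(key, current);
    return current;
  }

  // What the background job would check before warning the agent owner.
  isStale(agentId: string, tool: string): boolean {
    const pinned = this.pins.get(this.key(agentId, tool));
    const current = this.latest.get(tool);
    if (pinned === undefined || current === undefined) return false;
    return current - pinned > 2;
  }
}

const pins = new SchemaPins();
pins.publish("search", 3);
const first = pins.resolve("agent-7", "search"); // 3, pinned on first contact
pins.publish("search", 5);
const second = pins.resolve("agent-7", "search"); // still 3
const staleAt5 = pins.isStale("agent-7", "search"); // false: 5 - 3 = 2
pins.publish("search", 6);
const staleAt6 = pins.isStale("agent-7", "search"); // true: 6 - 3 = 3
```

Because agents resolve through the pin rather than through "latest", publishing or withdrawing a version does not change what an already-pinned agent sees.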

The protocol is the easy part. The discipline of versioning, observing, and revoking is the actual product.

What I would do differently

One thing only: I would write the gateway first, with a stub server behind it, and not write a real MCP server until the gateway could handle two of them. We did it the other way around. We were wrong.
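For a sense of how little a stub needs to do, here is an in-process sketch. It skips transports entirely, and the interface, the echo behavior, and the first-match routing are all assumptions chosen to keep it short.

```typescript
// In-process stand-in; a real server would sit behind a transport.
interface ToolServer {
  id: string;
  listTools(): string[];
  callTool(name: string, args: unknown): unknown;
}

function makeStubServer(id: string, tools: string[]): ToolServer {
  return {
    id,
    listTools: () => [...tools],
    // Echo the call back so gateway routing can be exercised end to end.
    callTool: (name, args) => ({ server: id, tool: name, echoed: args }),
  };
}

// Hypothetical minimal gateway: route to the first server advertising the tool.
function route(servers: ToolServer[], tool: string, args: unknown): unknown {
  const target = servers.find((s) => s.listTools().includes(tool));
  if (!target) throw new Error(`no server advertises ${tool}`);
  return target.callTool(tool, args);
}

const stubs = [makeStubServer("stub-a", ["ping"]), makeStubServer("stub-b", ["echo"])];
const reply = route(stubs, "echo", { text: "hi" }) as { server: string; tool: string };
// reply.server is "stub-b"
```

Two stubs are enough to force the routing, discovery, and authorization questions before any real server exists.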

Everything else — the proxy, the auth model, the schema pins — came out of necessity in the right order. The gateway is the one piece you need before you need it.

Closing

MCP is still a young protocol and the specifics in this note will age. The shape of the problem — N agents, M servers, one place that has to know about all of it — will not. If you are building on MCP today, the favor I'd ask is this: pick the gateway-first path, write down your three auth tiers before you write any code, and instrument the proxy from line one.

That is most of what I know. The rest I will probably learn the hard way, and write up here.