gRPC Between Services — What I'd Do Differently

We added gRPC to Lasius because we wanted typed contracts between services and we were tired of arguing about JSON field names in Slack. Six months later, we have typed contracts, we still argue about field names, and we also have a set of opinions that did not exist before.

Here is what I would change if I were starting from scratch.

Proto files are a monorepo problem you haven't solved yet

The moment you have two services sharing a .proto file, you have a versioning problem. We tried three approaches in sequence:

Copy-paste the proto into each repo. Painful immediately.
Shared npm package with the generated stubs. Worked until it didn't — publish cadence became a friction point.
A proto/ monorepo with generated code committed alongside. Still our current setup, still imperfect.

The thing nobody warns you about: the TypeScript generated by ts-proto and the Python generated by grpcio-tools have subtly different null semantics. A field declared optional in the proto is T | undefined in TypeScript. In Python, reading an unset field returns the type's default value (an empty string for a string field) unless you check presence with HasField first. We hit this with a string field that a TypeScript service left unset (undefined); a Python service read it as an empty string and then used it as a key.

Rule I'd give past-me: proto3 has no required fields, so any field that will be used as an identifier (ID, name, key) should be validated at the service boundary on read, not trusted from the wire.
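A minimal sketch of that read-side check in TypeScript. The helper name requireIdentifier and the field name account_id are made up for illustration; the only point is that undefined and the empty string are both rejected before the value is used as a key.

```typescript
// Hypothetical boundary helper: call it on every identifier-like field
// right after decoding a message, before the value is used anywhere.
function requireIdentifier(
  value: string | undefined | null,
  fieldName: string
): string {
  // Unset on the TypeScript side shows up as undefined.
  if (value === undefined || value === null) {
    throw new Error(`${fieldName} is missing`);
  }
  // Unset on the wire can surface as "" after a default-value read,
  // so an empty or whitespace-only identifier is treated as missing too.
  if (value.trim() === "") {
    throw new Error(`${fieldName} is empty`);
  }
  return value;
}

// Usage: validate first, then use the value as a key.
const accountsByKey: { [key: string]: number } = {};
const accountId = requireIdentifier("acct_42", "account_id");
accountsByKey[accountId] = 1;
```

The same check belongs in the Python reader; it is shown in TypeScript here only to keep the examples in one language.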

Error propagation across language boundaries

gRPC status codes are nominally a shared language. In practice, an INVALID_ARGUMENT from a Python grpcio service did not look like an INVALID_ARGUMENT thrown by a Node.js service using @grpc/grpc-js: the status code matched, but the useful detail arrived in different places.

The Python ecosystem tends to attach error detail in the trailing metadata. The Node ecosystem tends to attach it in the details string. We ended up writing an error-normalization layer in our gateway that unpacked both conventions and re-surfaced them as a consistent shape.

If you are polyglot from day one, write this normalization layer before you need it, not after you have five services with five conventions.
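Roughly what such a layer can look like, as a dependency-free sketch rather than our actual gateway code. The WireError shape, the x-error-detail metadata key, and the NormalizedError output are all assumptions; real @grpc/grpc-js errors carry a Metadata object rather than the plain record used here.

```typescript
// Hypothetical view of a failed call as the gateway sees it.
// Trailing metadata is flattened to a plain record to keep this self-contained.
interface WireError {
  code: number;
  details: string;
  trailers: { [key: string]: string };
}

// Hypothetical consistent shape handed to the rest of the gateway.
interface NormalizedError {
  code: number;
  message: string;
  source: "trailers" | "details" | "none";
}

// Assumed metadata key; substitute whatever your services actually send.
const DETAIL_KEY = "x-error-detail";

function normalizeError(err: WireError): NormalizedError {
  // Convention 1: the detail travels in trailing metadata.
  const fromTrailers = err.trailers[DETAIL_KEY];
  if (fromTrailers !== undefined && fromTrailers.trim() !== "") {
    return { code: err.code, message: fromTrailers.trim(), source: "trailers" };
  }
  // Convention 2: the detail travels in the details string.
  if (err.details.trim() !== "") {
    return { code: err.code, message: err.details.trim(), source: "details" };
  }
  // Neither convention was followed; keep the code and say so.
  return { code: err.code, message: "no detail provided", source: "none" };
}

// 3 is the numeric value of INVALID_ARGUMENT.
const fromPython = normalizeError({
  code: 3,
  details: "",
  trailers: { "x-error-detail": "user_id is required" },
});
const fromNode = normalizeError({
  code: 3,
  details: "user_id is required",
  trailers: {},
});
```

Both calls produce the same code and message; only the source field records which convention the upstream service followed, which is handy when chasing down the odd one out.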

When to not use gRPC

gRPC is worth it when:

You cross a language boundary and the call is on a hot path
You need streaming (bidirectional or server-push)
The schema discipline pays for itself in team coordination

It is probably not worth it when:

Both services are Node.js (just call the function, or use a typed HTTP client)
The service is customer-facing anyway (you'll wrap in REST/GraphQL regardless)
The team is small enough that a shared TypeScript interface file would solve the coordination problem

We removed gRPC from two internal service pairs this year and replaced them with direct HTTP calls with Zod validation at the boundary. Faster to iterate, zero protoc in CI, same contract guarantee where it matters.
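To show what validation at the boundary means here, below is a hand-written stand-in for a schema parse. It is not Zod and not our production code: with Zod the same rules would live in a schema and be applied through its parse method. The JobStatus shape and its field names are invented for the example.

```typescript
// Hypothetical response shape for an internal HTTP endpoint.
interface JobStatus {
  jobId: string;
  state: "queued" | "running" | "done";
}

// Parse an untrusted JSON body into a typed value, or throw.
function parseJobStatus(input: unknown): JobStatus {
  if (typeof input !== "object" || input === null) {
    throw new Error("expected a JSON object");
  }
  const record = input as { [key: string]: unknown };
  const jobId = record["jobId"];
  const state = record["state"];

  // Identifier fields must be present and non-empty.
  if (typeof jobId !== "string" || jobId === "") {
    throw new Error("jobId must be a non-empty string");
  }
  // Enumerated fields must match one of the known values exactly.
  if (state === "queued" || state === "running" || state === "done") {
    return { jobId, state };
  }
  throw new Error("state must be queued, running, or done");
}

// Usage: parse the response body before anything else touches it.
const parsed = parseJobStatus(JSON.parse('{"jobId":"j1","state":"running"}'));
```

The caller only ever sees a JobStatus or an exception, which is the part of the proto contract we actually relied on.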

Streaming is the actual unlock

The part that made gRPC worth keeping: server-streaming for MCP tool results. When an agent calls a long-running tool, we stream partial results back over a gRPC server-stream, which the gateway bridges to SSE for the client. The alternative — polling or holding a long HTTP connection — was worse in every measurement.

If you have a use case with long-lived, incremental results, gRPC streaming is genuinely good. That is the case where I would pick it first, not last.
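For the bridging step, here is a simplified sketch of turning stream events into SSE frames. The event names (partial, done, error) and the createSseBridge helper are assumptions for illustration; the handlers are shaped so they can be attached to the data, end, and error events of a client-side stream, with write standing in for the HTTP response writer.

```typescript
// Format one server-sent event: optional id, event name, one data line,
// then a blank line that terminates the frame.
function toSseFrame(event: string, data: unknown, id?: number): string {
  const lines: string[] = [];
  if (id !== undefined) {
    lines.push("id: " + id);
  }
  lines.push("event: " + event);
  // JSON.stringify without indentation never emits raw newlines,
  // so the payload always fits on a single data line.
  lines.push("data: " + JSON.stringify(data ?? null));
  return lines.join("\n") + "\n\n";
}

// Hypothetical bridge: returns handlers for the stream's events and
// forwards each one to the client as an SSE frame through write().
function createSseBridge<T>(write: (frame: string) => void) {
  let seq = 0;
  return {
    onData(chunk: T): void {
      seq += 1;
      write(toSseFrame("partial", chunk, seq));
    },
    onEnd(): void {
      write(toSseFrame("done", { chunks: seq }));
    },
    onError(err: Error): void {
      write(toSseFrame("error", { message: err.message }));
    },
  };
}

// Usage with an in-memory sink standing in for the HTTP response.
const frames: string[] = [];
const bridge = createSseBridge<{ text: string }>((frame) => {
  frames.push(frame);
});
bridge.onData({ text: "first partial result" });
bridge.onData({ text: "second partial result" });
bridge.onEnd();
```

Numbering the partial frames gives the client an id to resume from; whether to honor a resume request is a separate decision this sketch does not cover.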