July 24, 2025

How does a Go webserver behave when it receives… a… request…

slowly?

We had the opportunity to investigate that recently.

A black Grafana dashboard entitled “Real Time Latency”. The time range goes from 16:00 to 18:15, and the p99 latency line spikes partway through.

By “opportunity”, I of course mean “production incident”.

Setup

I’ll set the scene. It’s 4:30PM on a Tuesday, at the fraud detection company. I went on call half an hour ago. My team is winding up tasks for the day. The data platform team is about to kick off a process to construct side tables so they can start training models with Data Science tomorrow. Everything is humming along, until I get paged.

Investigation

“Real Time Latency” measures how long it takes for our fraud detection engine to process so-called real-time transactions. That is, somebody has swiped a debit card and they’re waiting for the machine to beep. Our p99 latency is normally under 70ms, and it just shot up to 480ms. The p95 followed, but the p50 remained rock solid around 40ms.

We looked at other stats. CPU and memory were normal across all of our microservices. None of the downstream microservices were showing increased error rates or timeouts. We hadn’t deployed any new code recently.

I checked with Data Platform, and they had indeed just started their side table creation tasks. They assured me, and I confirmed, that the job was running in the same Kubernetes cluster but on a different node pool, so there was no resource contention between their workload and ours.

Grasping at straws, I guessed “goroutine leak in the main Fraud Service” and restarted that deployment around 16:45. The p95 and p99 latency metrics returned to normal, but the p99 shot back up around 16:50.

Traces

We instrument our webserver with OpenTelemetry tracing by wrapping the handler function in otelhttp.NewHandler, and write those traces to AWS X-Ray[1].
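
For context, the wrapping looks roughly like this. It’s a minimal sketch, not our actual service: the handler, route, and port are made up, and the tracer provider / exporter setup that actually ships spans to X-Ray is omitted.

```go
package main

import (
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

// fraudHandler stands in for our real POST handler (illustrative only).
func fraudHandler(w http.ResponseWriter, r *http.Request) {
	// ... score the transaction ...
	w.WriteHeader(http.StatusOK)
}

func main() {
	// otelhttp.NewHandler starts a server span for every request and ends it
	// when the handler returns; "fraud-service" is just the span's label.
	handler := otelhttp.NewHandler(http.HandlerFunc(fraudHandler), "fraud-service")

	http.Handle("/transactions", handler)
	http.ListenAndServe(":8080", nil)
}
```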

Baseline normal trace

Screenshot of an AWS X-Ray trace, showing a bunch of segments starting roughly immediately and completing in 55ms.

This is what the Fraud Service segment of a normal trace looks like. The trace starts when the webserver gets a POST request. We evaluate feature flags, check a cache to make sure this isn’t a duplicate, get features, run the model, apply rules, update the cache, and return a response. Simple as.
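
To make those segments concrete, here’s a rough sketch of how steps like these could be wrapped in child spans under the request span that otelhttp creates. The function and span names are illustrative, not lifted from our code.

```go
package fraud

import (
	"context"
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/trace"
)

// Stubs standing in for the real pipeline steps.
func evaluateFeatureFlags(ctx context.Context) {}
func checkDedupCache(ctx context.Context)      {}
func runModel(ctx context.Context)             {}

// step runs fn inside its own child span, so each pipeline stage shows up
// as a separate segment in the trace.
func step(ctx context.Context, tracer trace.Tracer, name string, fn func(context.Context)) {
	ctx, span := tracer.Start(ctx, name)
	defer span.End()
	fn(ctx)
}

func handleTransaction(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context() // carries the span started by otelhttp.NewHandler
	tracer := otel.Tracer("fraud-service")

	step(ctx, tracer, "evaluate-feature-flags", evaluateFeatureFlags)
	step(ctx, tracer, "check-dedup-cache", checkDedupCache)
	step(ctx, tracer, "run-model", runModel)
	// ... apply rules, update the cache ...

	w.WriteHeader(http.StatusOK)
}
```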