Software Engineer @ Grafana Labs
Björn “Beorn” Rabenstein is an engineer at Grafana and a Prometheus developer. Previously, he was a Production Engineer at SoundCloud, a Site Reliability Engineer at Google, and a number cruncher for science.
Should SLOs Be Request-Based or Time-Based? And Why Neither Really Works…
Once you have gotten somewhat familiar with SLOs, you have probably realized that time-based SLOs aren't really fair for most users. It doesn't help you if your ISP gives you perfect connectivity while you are asleep but always goes down during that important weekly video conference. In other words: a time-based SLO means free uptime whenever your service isn't used. Clearly, a request-based SLO is much better: it measures what matters, and an outage during peak time now consumes your error budget much more quickly. If this talk were on the “New To SLOs” track, we would stop here. But since this is on the “Deep Dive” track, we need to go deeper. Let's explore a few common scenarios to see how a request-based SLO sometimes exaggerates and sometimes masks problems with your service, and what we can do about it.
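To make the difference between the two SLO flavors concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption and not taken from the talk: the hourly traffic numbers, the one-hour total outage, and the helper name `availability`. It compares the same one-hour outage at night and at peak time.

```python
def availability(requests_per_hour, outage_hour):
    """Return (time_based, request_based) availability for a one-hour total outage.

    Assumption for this sketch: every request in the outage hour fails,
    and every request outside of it succeeds.
    """
    total_hours = len(requests_per_hour)
    # Time-based: every hour counts the same, regardless of traffic.
    time_based = (total_hours - 1) / total_hours

    total_requests = sum(requests_per_hour)
    failed_requests = requests_per_hour[outage_hour]
    # Request-based: every request counts the same, so busy hours weigh more.
    request_based = (total_requests - failed_requests) / total_requests
    return time_based, request_based


# Hypothetical day: quiet night, busy daytime, quiet evening.
traffic = [10] * 8 + [1000] * 8 + [10] * 8

night = availability(traffic, outage_hour=3)   # outage while users are asleep
peak = availability(traffic, outage_hour=12)   # outage during peak time

print(f"night outage: time-based {night[0]:.4f}, request-based {night[1]:.4f}")
print(f"peak outage:  time-based {peak[0]:.4f}, request-based {peak[1]:.4f}")
```

In this toy setup, the time-based number is identical for both outages, while the request-based number barely moves for the night outage and drops sharply for the peak outage.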