Site Reliability Engineer @ Grafana Labs
Milan has been working as a Site Reliability Engineer at Grafana Labs since early 2020, as part of the Platform squad, which manages internal infrastructure, develops internal tooling, and provides support to the product teams. Prior to that, Milan worked as an Infrastructure Engineer at a local AI startup and as a Site Reliability Engineer at Google Cloud.
Production Readiness Review: Providing a Solid Base for SLOs
It's hard to propose a good SLO for a new service with little mileage. Even for a service that has been running for years, it's hard to gain confidence that the SLO won't be impacted if the service scales 10x. We'll have a look at the Production Readiness Review process, which uses a focused review to identify and remove common pitfalls and avoid repeating already-learned mistakes, strengthening confidence in the defined SLO. The process was originally developed at Google (https://sre.google/sre-book/evolving-sre-engagement-model/); at Grafana Labs, we've tailored the process to our needs, which is what this talk will discuss.