New to SLOs Talks
The first Service Level Objective Conference for Site Reliability Engineers
A Year of SLO Bootcamps
New To SLOs · Kit Merker
In this talk, I'll share what I've learned in the last year leading a hands-on SLO bootcamp for a variety of cross-functional teams. You'll learn a proven strategy for helping teams get over the hump of a first SLO and how to drive a scalable organizational and cultural change to the SLO-based way of thinking. Because of COVID, I had to move my SLO Bootcamp online, and this forced me to focus on just the essentials, increase interactivity, and ensure the course was of value to all the participants. I'll go over resources you can use to run your own SLO Bootcamp too!
Creating Great Dev Culture through Error Budgets
New To SLOs · Sal Kimmich
In the most basic definition, error budgets are simply the amount of error that a service can accumulate over a specified period of time before users grumble about the experience. While many organizations that introduce error budgets treat them as just another quality-control metric, there is great value in making error budgets a fundamental part of your developer culture around trust and timely innovation. Because this way of working gives engineers critical autonomy, the development team can spend its error budget however it sees fit, either on preventing or on curing system instabilities. In this talk, we will cover common combinations of SLIs that lead to error budget best practices, as well as the protocols that can be enacted when error budgets slip: the who, what, when, and why of pre-incident reporting.
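To make "the amount of error that a service can accumulate over a specified period" concrete, here is a minimal sketch of a request-based error budget. It assumes the SLI is the ratio of good requests to total requests over the SLO window; the `ErrorBudget` class and its field names are hypothetical, not part of any particular SLO tool.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Request-based error budget for one SLO window (hypothetical helper)."""
    slo_target: float      # e.g. 0.999 means 99.9% of requests must be good
    total_requests: int    # requests observed so far in the window
    failed_requests: int   # requests counted as "bad" by the SLI

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 = untouched budget, 0.0 = exhausted, negative = overspent.
        if self.allowed_failures == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1.0 - self.failed_requests / self.allowed_failures


budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(round(budget.allowed_failures))       # 1000
print(round(budget.remaining_fraction, 2))  # 0.75
```

With three quarters of the budget left, the team can choose to spend it on shipping riskier changes or invest it in preventing future instability; an overspent (negative) budget is the kind of signal that would trigger the protocols the talk describes.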
From Availability to User Happiness: An Introduction to SLOs That Matter
New To SLOs · Michael Ericksen
This talk tells the story of an engineering team that finds itself in a quasi-incident with a web application that runs inside Electronic Health Record (EHR) systems such as Epic and Cerner. The engineering dashboard for the application showed 100% uptime. Users, however, paused their implementation timelines because of poor application performance. As an organization, we were measuring the wrong thing. In this talk, I will tell the story of how the engineering team pivoted from measuring availability to measuring key application behaviors for their end users, and how that dramatically improved user satisfaction.
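The gap between "100% uptime" and unhappy users shows up clearly when two SLIs are computed over the same requests. This is a minimal sketch assuming a simple request log of status codes and latencies; the 1000 ms threshold, the sample data, and the function names are made up for illustration and are not taken from the talk.

```python
from typing import Sequence


def availability_sli(status_codes: Sequence[int]) -> float:
    """Fraction of requests that did not fail with a 5xx status."""
    ok = sum(1 for code in status_codes if code < 500)
    return ok / len(status_codes)


def latency_sli(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Fraction of requests answered within threshold_ms."""
    fast = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    return fast / len(latencies_ms)


# Hypothetical sample: every request "succeeds", but most are slow.
codes = [200] * 10
latencies = [350, 4200, 3900, 5100, 280, 4700, 6100, 3300, 310, 4400]

print(availability_sli(codes))                    # 1.0
print(latency_sli(latencies, threshold_ms=1000))  # 0.3
```

An availability dashboard would report this service as perfect, while a latency SLI tied to what users actually experience shows only 30% of requests were acceptable.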
Infrastructure Comes out of the Wall, No One Cares How
New To SLOs · Richard Hartmann
You care about your service and how it works internally; your users do not. Your water, electricity, and Internet come out of the wall, and if they stop doing that, you call someone to complain. That's how you should think about your services, and we'll explore this idea further.
Just Say No (to Dashboards): You Don’t Need More Information, You Need the Right Information
New To SLOs · Zac Nickens, Abby Bangser
This talk will illustrate the differences between signal and noise in monitoring efforts. Engineers shouldn't sit watching dashboards; they should be improving existing features and developing new ones. Dashboard and metrics fatigue keeps engineers from living their best life. The talk will trace a journey from too many dashboards to identifying the signals that are most meaningful for a team, and to adopting an SLO approach that reduces signal fatigue.
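One common way an SLO approach replaces dashboard-watching is to alert only on how fast the error budget is burning. The sketch below is not from the talk: it assumes the multiwindow burn-rate pattern described in the Google SRE Workbook, with a 14.4x threshold (roughly 2% of a 30-day budget spent in one hour). The function names are hypothetical.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 = exactly on budget."""
    return error_ratio / (1.0 - slo_target)


def should_page(long_window_errors: float, short_window_errors: float,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Page only when both windows burn faster than the threshold.

    The long window (e.g. 1h) shows the burn is significant; the short
    window (e.g. 5m) shows it is still happening, so the alert clears quickly.
    """
    return (burn_rate(long_window_errors, slo_target) >= threshold
            and burn_rate(short_window_errors, slo_target) >= threshold)


# 99.9% SLO: 2% errors over the last hour and 3% over the last 5 minutes.
print(should_page(0.02, 0.03, slo_target=0.999))    # True
# A spike that has already recovered in the short window does not page.
print(should_page(0.02, 0.0005, slo_target=0.999))  # False
```

In this pattern nobody has to watch a dashboard: a human is paged only when the budget is burning fast enough to threaten the SLO.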
Production Readiness Review: Providing a Solid Base for SLOs
New To SLOs · Milan Plžík
It's hard to propose a good SLO for a new service with little mileage. Even for a service that has been running for years, it's hard to be confident that the SLO won't be impacted if the service scales 10x. We'll look at the Production Readiness Review (PRR) process, which uses a focused review to identify and remove common pitfalls and already-learned mistakes, strengthening confidence in the defined SLO. The process was originally developed at Google (https://sre.google/sre-book/evolving-sre-engagement-model/); at Grafana Labs, we've tailored it to our needs, and that is what this talk will discuss.
Service Level Overkill - SLO In a World of SOA
New To SLOs · Mick Roper
Service levels are excellent for understanding the limits you put on your own services, but in a world of web services, your ability to create a useful SLO is affected by everything you depend on. In this chat I discuss how to understand SLOs from other teams, how to mitigate their impact on your own SLO, and how to deal with it when that impact happens. I also talk about what a low SLO means, and why you shouldn't assume you need nine 9s of availability to offer a useful service!
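To see why nine 9s is rarely needed, it helps to convert a number of nines into the downtime it allows. A minimal sketch, assuming a 30-day window and a purely time-based availability SLO; `allowed_downtime_seconds` is a hypothetical helper.

```python
def allowed_downtime_seconds(nines: int, window_days: float = 30.0) -> float:
    """Downtime an availability SLO with `nines` nines permits per window."""
    unavailability = 10.0 ** -nines          # 3 nines (99.9%) -> 0.001
    window_seconds = window_days * 24 * 60 * 60
    return window_seconds * unavailability


for n in (2, 3, 4, 5, 9):
    print(f"{n} nines: {allowed_downtime_seconds(n):g} s of downtime per 30 days")
```

Three nines leaves about 43 minutes a month, while nine nines leaves under 3 milliseconds, which is far tighter than most users can notice or most dependencies can deliver.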
SLO Basics - a conversation about reliability
New To SLOs · Keri Melich
SLO Basics - a conversation about reliability
New To SLOs · Steve McGhee
It's the architecture, not the products or infrastructure, that matters: how to think about your dependencies and how their SLOs affect your own.
⛓ Chained services: SLO = SLO ^ depth
⛷ Parallel isolated services: SLO = min(SLOs)
🤹‍♂️ Redundant parallel services: much better, ≈ the SLO of the load balancer (LB) “above”
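The three dependency rules above can be checked numerically. This is a minimal sketch assuming failures are independent, which real systems often violate (shared infrastructure, correlated deploys); the function names are hypothetical, and the `redundant` formula is one simple model of replicas behind a load balancer, not necessarily the speaker's.

```python
from math import prod
from typing import Sequence


def chained(slos: Sequence[float]) -> float:
    """Serial dependencies: every hop must succeed, so availabilities multiply.
    With identical SLOs this is slo ** depth."""
    return prod(slos)


def parallel_isolated(slos: Sequence[float]) -> float:
    """Isolated services: the weakest one bounds what you can promise."""
    return min(slos)


def redundant(replica_slo: float, replicas: int, lb_slo: float) -> float:
    """Redundant replicas behind a load balancer, assuming independent failures:
    a request fails if the LB fails or if every replica fails."""
    all_replicas_down = (1.0 - replica_slo) ** replicas
    return lb_slo * (1.0 - all_replicas_down)


print(chained([0.999] * 3))                        # ~0.997
print(parallel_isolated([0.999, 0.995]))           # 0.995
print(redundant(0.99, replicas=3, lb_slo=0.9999))  # ~0.9999, close to the LB's SLO
```

Three replicas at only 99% each end up almost exactly as reliable as the load balancer in front of them, which is why redundancy changes the picture so much more than the SLO of any single component.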
SLOs at Facebook
New To SLOs · Posten A
Scaling SLOs at Facebook to planetary scale using SLICK, a purpose-built, centralised SLO store integrated into key observability systems.
SLOs for climate: How to Continuously Reduce the Climate Impact of Tech Services
New To SLOs · Benoit Petit
Site Reliability Engineering’s goal is to ensure that the software systems and services created in an organization can evolve easily and, above all, are extremely reliable. There are several definitions of reliability, one being: “reliability is the ability for a system to fulfill a mission in some defined conditions, for a given period of time”. This definition lets us redefine the conditions that determine whether the system actually fulfilled its mission over the given period of time. As the tech industry has to lower its greenhouse gas (GHG) emissions by 45% over the next 10 years to meet the Paris Agreement objectives, it seems essential to me that a tech service or system be considered reliable not only if it satisfies the client in the short term, but also if it doesn’t jeopardize the client’s future. That obviously means it has to respect well-defined objectives for the GHG emissions related to its very existence and usage. In this talk, we'll see what we can do right now to use those methods, not only to create business value, but for our future too.
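As a worked example of the arithmetic behind a 45% cut over 10 years, this sketch assumes a constant yearly reduction rate and treats each year's emissions cap as an objective the service must meet. The units (for instance tonnes CO2e), the baseline of 100, and the helper names are hypothetical; the talk does not prescribe this method.

```python
def yearly_target(baseline: float, year: int,
                  total_cut: float = 0.45, horizon_years: int = 10) -> float:
    """Emissions allowed in `year` (0 = baseline year) on a constant-rate
    path that reaches a `total_cut` reduction after `horizon_years`."""
    annual_factor = (1.0 - total_cut) ** (1.0 / horizon_years)
    return baseline * annual_factor ** year


def meets_emissions_objective(measured: float, baseline: float, year: int) -> bool:
    """True if a year's measured emissions are within that year's target."""
    return measured <= yearly_target(baseline, year)


annual_cut = 1 - (1 - 0.45) ** (1 / 10)
print(f"required cut per year: {annual_cut:.1%}")       # 5.8%
print(round(yearly_target(100.0, 10), 6))              # 55.0
print(meets_emissions_objective(88.0, 100.0, year=2))  # True
print(meets_emissions_objective(90.0, 100.0, year=2))  # False
```

Under this assumption the service needs to cut emissions by roughly 5.8% every year, so a year-2 measurement of 90 (against a target of about 88.7) would count as missing the objective.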
SLOs for VPs: What They Give You, What They Cost
New To SLOs · Niall Murphy
Targeting a "VP-style" audience, this talk explains that SLOs kinda look like KPIs, but they're used to make resourcing decisions rather than to provide pure visibility. Worked example. 10 minutes.
Unboxing Blackbox Monitoring for SLO
New To SLOs · Navya Dwarakanath
You have read it in every SLO book and heard it in several talks: measure SLOs from the perspective of the end user. Measuring from the user’s perspective is neither easy nor straightforward, but it is fundamental to how effective your SLOs are. Learn why the user’s perspective is paramount, what makes blackbox monitoring effective, the blind spots it helps you cover, and how you can use it to define your SLOs.
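A blackbox probe observes the service from outside, as a user would, with no knowledge of its internals. Below is a minimal sketch using only Python's standard library; the latency threshold, the sample data, and the helper names `probe` and `blackbox_sli` are assumptions for illustration. In practice a dedicated prober (for example Prometheus' blackbox_exporter) running from several locations would usually do this job.

```python
import http.client
import time
import urllib.request
from typing import List, Tuple


def probe(url: str, timeout_s: float = 5.0) -> Tuple[bool, float]:
    """Request `url` from outside the service, as a user would.

    Returns (success, latency_seconds). HTTP errors, timeouts and connection
    failures all count as failures: from the user's side, the service did not work.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            ok = 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # urllib.error.URLError/HTTPError and socket timeouts are OSError subclasses.
        ok = False
    return ok, time.monotonic() - start


def blackbox_sli(results: List[Tuple[bool, float]], latency_slo_s: float) -> float:
    """Fraction of probes that both succeeded and answered within latency_slo_s."""
    good = sum(1 for ok, latency in results if ok and latency <= latency_slo_s)
    return good / len(results)


# Hypothetical recorded probe results: (success, latency in seconds).
sample = [(True, 0.2), (True, 1.8), (False, 5.0), (True, 0.3)]
print(blackbox_sli(sample, latency_slo_s=1.0))  # 0.5
```

Running something like `probe("https://example.com/")` on a schedule and feeding the results to `blackbox_sli` gives an SLI that counts slow successes as bad, which is one of the blind spots internal success counters tend to miss.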
Weaknesses of the SLO Model
New To SLOs · Niall Murphy
Kit will hate this, but it's probably worth spending 5 minutes on problems with the SLO model, and potential approaches to fixing them.