SLOconf 2021

The world wants to share and learn about SLOs, and who are we to stop them?
Learn about the success of SLOconf 2021, as we’re bringing back the virtual conference to our community in 2022!

 

Previous speakers

Abby Bangser

Site Reliability Engineer

Duffel

Abby Bangser is an engineer with a keen interest in working on products where colleagues are the users. Abby brings the techniques of analysing and testing customer-facing products to tools like delivery pipelines and logging so as to generate clearer feedback and greater value. Currently, Abby is a Site Reliability Engineer (SRE) at Duffel where they are redefining travel.

Outside of work, Abby is active in the community: she co-leads TechVoices, which mentors new and diverse speakers; hosts the London chapter of the #CoffeeOps meetup, which provides a more interactive space for DevOps professionals to discuss relevant topics; and co-hosts the London Essentials, which brings together mentors and new joiners to the software testing industry.
2021
Track New To SLOs

Just Say No (to Dashboards): You Don't Need More Information, You Need the Right Information

This talk illustrates the differences between signal and noise in monitoring efforts. Engineers shouldn't sit watching dashboards; they should be improving existing features and developing new ones. Dashboard and metrics fatigue prevents engineers from living their best life. The talk traces a journey from too many dashboards to identifying the signals that are most meaningful for a team, and adopting an SLO approach to reduce signal fatigue.

Alex Hidalgo

Principal Reliability Advocate

Nobl9

Alex Hidalgo is the Principal Reliability Advocate at Nobl9 and author of “Implementing Service Level Objectives.” During his career he has developed a deep love for sustainable operations, proper observability, and using SLO data to drive discussions and make decisions. Alex’s previous jobs have included IT support, network security, restaurant work, t-shirt design, and hosting game shows at bars. When not sharing his passion for technology with others, you can find him scuba diving or watching college basketball. He lives in Brooklyn with his partner Jen and a rescue dog named Taco. Alex has a BA in philosophy from Virginia Commonwealth University.

Alina Anderson

Senior TPM of Site Reliability Engineering

Outreach

Alina is a Senior TPM, cat-herding organizations through complex challenges at the intersection of humans and systems. Over the last six years, she has navigated on-call through pre-IPO hypergrowth to Enterprise scale. Alina is committed to giving back to the Seattle DevOps community through Ada Developers Academy mentorship, co-organizing DevOps Days Seattle, and volunteering on the King County crisis hotline.
2021
Track Beyond Theory

Survival Guide: What I Learned From Putting 200 Developers On Call

We want to live in a world where the development team that writes the code also owns that code's success, or failure, in production. Nothing incentivizes a team to ship better quality software than getting paged at 2am, but how do we do this? In this talk, you'll learn some tips and tricks for easing less-than-enthusiastic development teams into on-call rotations, how SRE facilitates the transition to production code ownership, and why SLOs are critical to your success.

Andreas Grabner

DevOps Activist at Dynatrace & DevRel for CNCF Keptn

Dynatrace

Andreas Grabner (@grabnerandi) has 20+ years of experience as a software developer, tester and architect and is an advocate for high-performing cloud-scale applications. He is a contributor and DevRel for the CNCF open source project Keptn (www.keptn.sh). Andreas is also a regular contributor to the DevOps community, a frequent speaker at technology conferences, and regularly publishes articles on blog.dynatrace.com or Medium. In his spare time you can most likely find him on one of the salsa dancefloors of the world (will resume once Covid is behind us)!
2021
Track Technical and Deep Dive

SLOs For Quality Gates In Your Delivery Pipeline

SREs use SLOs to ensure production is stable and changes from development are not impacting SLAs. Error budgets are a great way to decide whether we can still deploy or not. But every deployment carries a risk of impacting critical SLOs, eating up the error budget faster than planned, and eventually slowing down innovation.

In this session we demonstrate how to use SLOs as part of continuous delivery to validate the impact of code or configuration changes before it's time to deploy to production. This gives developers faster feedback on the potential impact of their code changes, increases the quality of the code that makes it to the gates of production, and therefore results in less impact when the actual production deployment happens.

We will demo this approach using the SLO-based Quality Gate capability of the open source project Keptn.
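
The gating idea described in this abstract can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not Keptn's API or configuration format: the `Objective` type, the metric names, and the `pass_ratio` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    """A hypothetical pre-production objective for one indicator."""
    sli: str          # name of the measured indicator
    max_value: float  # measurement must be <= this to meet the objective


def evaluate_gate(measured, objectives, pass_ratio=1.0):
    """Return (passed, per-objective results) for one test run.

    An objective whose indicator is missing from `measured` counts as
    failed. The gate passes when the share of met objectives is at
    least `pass_ratio`.
    """
    if not objectives:
        raise ValueError("at least one objective is required")
    results = {
        o.sli: o.sli in measured and measured[o.sli] <= o.max_value
        for o in objectives
    }
    score = sum(results.values()) / len(objectives)
    return score >= pass_ratio, results


# Assumed objectives and measurements from a staging test run.
objectives = [
    Objective("p95_latency_ms", 300.0),
    Objective("error_rate", 0.01),
]
passed, results = evaluate_gate(
    {"p95_latency_ms": 250.0, "error_rate": 0.02}, objectives
)
print(passed, results)
```

Here the latency objective is met but the error rate is not, so the gate blocks the change before it can spend any production error budget.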

Andrew Newdigate

Distinguished Engineer

GitLab Inc.

Andrew is a Distinguished Engineer in the Infrastructure team at GitLab, where he helps keep GitLab.com available, observable and scaling. Before GitLab, he cofounded the developer community site Gitter in 2012, and was CTO there until Gitter’s acquisition by GitLab in 2017.

After living in London in the UK for 17 years, he recently relocated back to his hometown of Cape Town in South Africa.
2021
Track Technical and Deep Dive

GitLab's journey to SLO Monitoring

This talk covers GitLab's adoption of SLO monitoring: from our previous causal alerting strategy, which had outgrown its purpose as complexity and traffic volumes grew, through our early attempts at building and maintaining configuration and the problems those brought about, to our current, declarative approach. The talk will cover the challenges of getting buy-in from engineering, operations and product stakeholders, the benefits of having a common language of availability across the organisation, and our future plans. This is a deep-dive, practical talk; all the code and configuration for GitLab.com's monitoring infrastructure is open source, and the talk will include links to these resources. The talk is based on one I gave at ScaleConf 2020, which received good feedback.

Bart Enkelaar

Lead Site Reliability Engineer

bol.com

Equal parts excited bubble of enthusiasm and space geek, Bart has always had a passion for sharing knowledge. With 12 years of backend engineering experience under his belt, Site Reliability Engineering is his next challenge. More particularly, the challenge is rolling it out as the next evolution of DevOps at bol.com, the largest online retailing platform in the Netherlands and Belgium.
2021
Track SLOcializing

The Game of SLOs - A Three Part Reliability Musical

Ever since the great success of important society-shaping documentaries like Cats, Wicked and Hamilton, it has been clear that music is the way to truly get a broad audience to accept new information.

As SREs, evangelisation is often a core part of what we do, since it often revolves around convincing people to take a new approach to innovation.

In this three-part musical, we'll describe the journey through SRE in a manner which is both recognisable and informative and as such should be directly applicable to change hearts and minds on reliability all across the world. Get out your ukulele and sing along!

Benoit Petit

Founder

Hubblo

Systems and cloud engineer. I want to help reduce the climate impact of tech by enabling full transparency regarding our impact on the physical world. I work on scaphandre, an open-source tool to measure the power consumption of servers and the services they host.
2021
Track New To SLOs

SLOs for climate: How to Continuously Reduce the Climate Impact of Tech Services

Site Reliability Engineering's goal is to ensure that the software systems and services created in an organization can evolve easily and, above all, are extremely reliable.

There are several definitions of reliability, one being: 'reliability is the ability for a system to fulfill a mission in some defined conditions, for a given period of time'. This definition allows us to redefine the conditions that dictate whether the system actually fulfilled its mission over the given period of time.

As the tech industry has to lower its greenhouse gas (GHG) emissions by 45% in the next 10 years to match the Paris Agreement objectives, it seems essential to me that a tech service or system is considered reliable not only if it satisfies the client in the short term, but also if it doesn't jeopardize the client's future. That obviously means it has to respect smartly defined objectives for the GHG emissions related to its very existence and usage.

In this talk we'll see what we can do right now to use those methods, not only to create business value, but for our future too.

Bhargav Bhikkaji

Founder & CEO

Tailwinds

Bhargav Bhikkaji is the Founder and CEO of Tailwinds.ai, which provides automated continuous deployments for cloud native platforms to help teams accelerate. Bhargav has 20 years of experience in the software industry, having built software for computer networking and for the cloud. Bhargav is an avid runner and has run 22 marathons around the world.
2021
Track Technical and Deep Dive

SLOs for Production Grade Kubernetes.

We all know that cloud native platforms, and especially Kubernetes, are hard to operate. Wouldn't it be great to look at a list of SLIs/SLOs to understand whether our Kubernetes platform is fine or not? As a cloud native consultant who has worked with many organizations and helped customers kick-start and manage their Kubernetes journey, I would like to share experiences on the important SLOs they monitor for their production-grade Kubernetes.

Björn Rabenstein

Grafana Labs

Björn “Beorn” Rabenstein is an engineer at Grafana and a Prometheus developer. Previously, he was a Production Engineer at SoundCloud, a Site Reliability Engineer at Google, and a number cruncher for science.
2021
Track Technical and Deep Dive

Should SLOs Be Request-Based or Time-Based? And Why Neither Really Works…

Once you have gotten somewhat familiar with SLOs, you probably realized that time-based SLOs aren't really fair for most users. It doesn't help you if your ISP gives you perfect connectivity while you are asleep but always goes down during that important weekly video conference. Or in other words: a time-based SLO means free uptime whenever your service isn't used. Clearly a request-based SLO is much better: it measures what matters, and now an outage during peak time will consume your error budget much more quickly. If this talk were on the “New To SLOs” track, we would stop here. But since this is on the “Deep Dive” track, we need to go deeper. Let's explore a few common scenarios to see how a request-based SLO sometimes exaggerates and sometimes masks problems with your service, and what we can do about it.
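
To make the contrast in this abstract concrete, here is a minimal Python sketch. It is not taken from the talk: the `Minute` record, the 1% per-minute error threshold, and the traffic numbers are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Minute:
    """One minute of traffic (hypothetical record for illustration)."""
    total: int   # requests served in this minute
    failed: int  # requests that failed in this minute


def time_based_sli(minutes, max_error_ratio=0.01):
    """Fraction of minutes that were 'good'.

    A minute is good if its error ratio is at most max_error_ratio.
    Minutes with no traffic count as good, which is the "free uptime"
    effect of a time-based SLO.
    """
    if not minutes:
        raise ValueError("need at least one minute of data")
    good = sum(
        1 for m in minutes
        if m.total == 0 or m.failed / m.total <= max_error_ratio
    )
    return good / len(minutes)


def request_based_sli(minutes):
    """Fraction of all requests that succeeded."""
    total = sum(m.total for m in minutes)
    failed = sum(m.failed for m in minutes)
    return 1.0 if total == 0 else (total - failed) / total


# Assumed day: 1430 quiet minutes (1 request each, none failing) and a
# 10-minute outage at peak (1000 requests per minute, all failing).
day = [Minute(1, 0)] * 1430 + [Minute(1000, 1000)] * 10

print(f"time-based:    {time_based_sli(day):.4f}")
print(f"request-based: {request_based_sli(day):.4f}")
```

With these assumed numbers the time-based SLI stays above 99% while the request-based SLI falls to roughly 12.5%, because the ten bad minutes carried most of the day's requests.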

Dan Wilson

Co-founder, CTO

Control Plane Corporation

Dan Wilson has served as CTO of Control Plane since October of 2019. Dan has over 20 years of experience working on cloud services in contributor and leadership roles across operations, engineering, and architecture. While working at SAP Concur, he scaled their SaaS offering to millions of users and directed their shift to cloud architecture. Dan's passion is building tools to help engineering teams leverage cloud-native solutions with ease yet sustain power and flexibility. Dan is an open-source software advocate with contributions to Kubernetes, Istio, Knative, and more. He inspires teams to focus on powerful but simple architectures, end-to-end security, and failure mitigation.
2021
Track Beyond Theory

Lessons from Failure: How to Fail and Still Succeed

I worked at Concur on infrastructure, operations and engineering as it grew from a few users to millions. Over the years, I witnessed many failures across the stack and caused a handful of issues myself. In this talk, I'll walk through some of the most brutal and customer-impacting failures that I saw or caused and highlight the core principles I learned after surviving these stressful situations.

Daniel “Spoons” Spoonhower

Co-founder, Chief Architect

Lightstep

Daniel “Spoons” Spoonhower is a co-founder and Chief Architect at Lightstep, where he’s building performance management tools for deep software systems. He is an author of Distributed Tracing in Practice (O’Reilly Media, 2020). Previously, Spoons spent almost six years at Google where he worked as part of Google’s infrastructure and Cloud Platform teams. He has published papers on the performance of parallel programs, garbage collection, and real-time programming. He has a PhD in programming languages from Carnegie Mellon University but still hasn’t found one he loves.
2021
Track Beyond Theory

Using Observability to Set Good SLOs

While setting SLOs for externally visible services can be relatively straightforward, doing so for *internal* services can be more challenging. Teams can use current performance metrics to take a first stab at what internal service SLOs should be. While this lets them set realistic targets, it often means that they set objectives that are too high. In contrast, using distributed traces to understand how requests (and SLOs) flow through the application can help set SLOs that are looser (but not too loose). And not only does it help teams set better SLOs, it also helps them better understand which other SLOs their services depend on (and which depend on them). In this talk, I'll walk through a couple of examples to show how.

Dylan Zehr

Site Reliability Engineering Manager

Google SRE

SRE manager for Google Cloud (GCP) BigQuery.
2021
Track Technical and Deep Dive

Using Binomial proportion confidence intervals to reduce false positives in low QPS services

This talk describes how to use binomial proportion confidence intervals (specifically Wilson score intervals) to modify SLO metrics and reduce false positives in services with periods of low QPS. It covers some basic background on the statistical methods, some example graphs, and possibly an example of how to configure this on a common platform.
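
As a rough illustration of the statistics behind this abstract, the sketch below computes a Wilson score interval and uses its upper bound to decide whether a low success ratio is a confident SLO breach or just small-sample noise. It is a minimal sketch and not the speaker's implementation: the `confident_breach` helper, the 95% confidence level (z = 1.96), and the 0.95 target are assumptions.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence. With no observations
    the proportion is unconstrained, so the full range (0, 1) is returned.
    """
    if total == 0:
        return 0.0, 1.0
    p = successes / total
    z2 = z * z
    denom = 1.0 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half = (z / denom) * math.sqrt(
        p * (1 - p) / total + z2 / (4 * total * total)
    )
    return max(0.0, center - half), min(1.0, center + half)


def confident_breach(successes, total, slo_target, z=1.96):
    """True only if even the interval's upper bound is below the target."""
    _, upper = wilson_interval(successes, total, z)
    return upper < slo_target


# Same observed ratio (90% success), very different sample sizes.
print(confident_breach(9, 10, 0.95))        # low QPS: 1 failure in 10
print(confident_breach(9000, 10000, 0.95))  # high QPS: 1000 failures
```

Both cases show a 90% success ratio against a 95% target, but only the high-traffic case is flagged: with ten requests the interval is too wide to rule out that the service is meeting its objective.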

Fred Moyer

Senior Staff SRE

Zendesk

Fred is a resident SLOgician and Observability Economist at Zendesk, where he works to ensure reliability is world class through the use of SLOs and Error Budgets. He previously worked with high scale operational telemetry at Circonus, and before that at Turnitin.com. Fred recently received a patent for Inverse Cumulative histograms, which Zendesk uses to power SLOs and Error Budgets.
2021
Track Technical and Deep Dive

SLIs, SLOs, and Error Budgets at Scale

How can one democratize the implementation of SLIs, SLOs, and Error Budgets to put them in the hands of a thousand engineers at once? At Zendesk we developed simple algorithms and practical approaches for implementing SLIs, SLOs, and Error Budgets at scale using a number of observability tools. This talk will show the approaches developed and how we were able to manage observability instrumentation across dozens of teams quickly in a complex ecosystem (CDN, UI, middleware, backend, queues, dbs, etc.).

This talk is for engineers and operations folks who are putting SLIs, SLOs, and Error Budgets into practice. Attendees will come away with concrete examples of how to communicate and implement Error Budgets across multiple teams and diverse service architectures.

Frederic Branczyk

CEO and Founder

Polar Signals

Frederic is the founder and CEO of Polar Signals. Before founding Polar Signals he was a senior principal engineer and the main architect for all things observability at Red Hat, which he joined through the CoreOS acquisition. Frederic is a Prometheus and Thanos maintainer as well as the tech lead for the special interest group for instrumentation in Kubernetes. In a previous life, he was a security researcher working on key management solutions as well as intrusion detection systems. When not working on software, Frederic enjoys obsessing over brewing a perfect cup of coffee.
2021
Track Technical and Deep Dive

Defining SLOs: A Practical Guide

SLOs often seem simple in theory, but tend to get difficult when actually implementing them, as reality is often not by the textbook. SLOs are an invaluable tool for both engineers and management to consistently communicate reliability with data. Defining bad SLOs can also be harmful, so it's important to keep various caveats in mind. SLOs are not only about data; it is equally important to clarify and evangelize expectations of SLOs within an organization.

Frederic and Matthias have many years of experience defining SLOs for many services and components. Together they will demonstrate real-life examples of choosing, measuring, alerting on and reporting SLOs based on Prometheus metrics.

Join this talk to learn how to implement SLOs successfully using data you most likely already have.

Hassy Veldstra

Open source developer, SRE, founder of an open source company. On a mission to help dev teams keep their production systems fast & reliable and pagers silent.

Artillery.io

2021
Track Technical and Deep Dive

Production Load Testing as a Guardrail for SLOs

Production load testing (yes, you read that right!) can be an excellent technique for building an extra buffer of safety around your SLOs.

We will cover:
- Using existing SLOs to prioritize the areas of the system to test
- Using existing SLOs to run production load tests safely
- Putting SLOs on the load tests themselves

This talk draws on the author's experience of implementing production load testing for building a margin of safety around SLOs at a large international publisher.

Heinrich Hartmann

Principal Engineer

Zalando

PhD in Mathematics. Ex Circonus Data Scientist. Now Principal Engineer @ Zalando SRE.

I have been speaking about Monitoring, Statistics and Histograms at various tech conferences over the past 5 years.
2021
Track Technical and Deep Dive

The State of the Histogram

In this talk we are going to survey different available technologies to capture (latency) distributions and store them in time-series databases. This includes (a) the theoretical underpinnings, (b) accuracy and performance, (c) operational aspects, and (d) adoption.

Disclaimer: The author worked on openhistogram.io in the past.

Ioannis Georgoulas

Senior Site Reliability Engineering Manager

Paddle.com

Ioannis is Senior SRE Engineering Manager at Paddle.com. He is an SLO evangelist and practitioner with an obsession for measuring anything that matters to the users and the business.
2021
Track Beyond Theory

SLO From Nothing to Production

This talk focuses on how I educated myself about SLOs and how I applied this to my organization. I will present my biggest learnings, such as that having an SLO mindset is definitely a marathon. I will present my SLO journey, and more specifically: what I read and did to learn more about SLOs, how I got buy-in from the appropriate stakeholders, how internal advocacy of SLOs is super important, and how we built an SLO "framework".

On the SLO framework, I will cover what tools we use to build our SLIs, where we store the SLO docs, how we implement burn rate alerting, and how all these fit together in a scalable and extendable way. The last part will cover learnings from our SLOs and ways of working with the Product teams in order to define their SLOs.
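
For readers new to the burn rate alerting mentioned in this abstract, here is a minimal Python sketch of the general idea. It is not the speaker's framework: the function names and the two-window check are illustrative, and the default threshold of 14.4 is an assumed value, namely the rate at which 2% of a 30-day error budget burns in one hour (0.02 × 720 hours).

```python
def burn_rate(error_ratio, slo_target):
    """How fast the error budget is being consumed.

    1.0 means the budget would be used up exactly at the end of the SLO
    window; 2.0 means it is burning twice as fast as that.
    """
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(long_window_error_ratio, short_window_error_ratio,
                slo_target, threshold=14.4):
    """Page only if both windows burn faster than the threshold.

    The long window shows the problem is significant; the short window
    shows it is still happening, so the alert clears soon after recovery.
    """
    return (
        burn_rate(long_window_error_ratio, slo_target) > threshold
        and burn_rate(short_window_error_ratio, slo_target) > threshold
    )


# Assumed 99.9% SLO: a 2% error ratio burns the budget about 20x too fast.
print(burn_rate(0.02, 0.999))
print(should_page(0.02, 0.03, 0.999))  # still failing in the short window
print(should_page(0.02, 0.0, 0.999))   # already recovered
```

The window lengths themselves (for example one hour and five minutes) are a separate choice made wherever the two error ratios are computed.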

Jacob Scott

Reliability Engineer

Stripe

Software engineer at Stripe focused on Reliability. Previously South Park Commons, Lyft.
2021
Track Technical and Deep Dive

SLOs As One Course in the Full Reliability Tasting Menu

SLOs can help us understand our reliability, but they aren't magic beans. In this talk I'll explain what they aren't good for (spoiler: catastrophes). Embracing the fact that SLOs are an incomplete approach to reliability lets us use them in composition with other approaches to better wrangle with the end-to-end reliability of our (complex, socio-technical) systems. I'll also discuss how techniques from modern safety science ('resilience engineering') can pair well with SLOs.

You'll leave this talk curious about how these techniques can help you address the concrete reliability challenges you face in your systems today.

Julie Gunderson

Senior Reliability Advocate

Gremlin Inc.

Julie Gunderson is a DevOps Advocate at PagerDuty, where she works to further the adoption of DevOps best practices and methodologies. She has been actively involved in the DevOps space for over five years and is passionate about helping individuals, teams and organizations understand how to leverage DevOps and develop amazing cultures. Julie made a career developing relationships and building communities. She has delivered talks at conferences such as Velocity, Agile Conf, OSCON and more, as well as being a contributor to opensource.com and techtarget. Julie is also a founding member and co-organizer of DevOpsDays Boise.
In her off time Julie can be found either traipsing through the mountains in Idaho, or making circuit boards into wearable art.
2021
Track Beyond Theory

The Psychology of Chaos Engineering

Chaos Engineering, failure injection, and similar practices have verified benefits to the resilience of systems and infrastructure. But can they provide similar resilience to teams and people? What are the effects and impacts on the humans involved in the systems? This talk will delve into both positive and negative outcomes for all the groups of people involved, including users, engineers, product, and business owners.

Jürgen Etzlstorfer

Technology Strategist

Dynatrace

Jürgen is a Technology Strategist in the Innovation Lab at Dynatrace. He is a maintainer of the Keptn open-source project and is working with contributors and the community on growing the Keptn ecosystem, including chaos engineering, performance testing, and other tools. He is not only passionate about developing new software, but equally excited to share his experience, most recently at conferences on Kubernetes based technologies and chaos engineering.
2021
Track Technical and Deep Dive

Evaluate Application Resilience with Chaos Engineering and SLOs

SLOs are not only a great way to efficiently measure the availability and quality of production environments but should also be used to ensure the resilience of applications before production as part of chaos engineering. While many organizations start with ad-hoc chaos experiments in production to validate the impact on SLOs, it is more efficient to bake these tests and checks into the continuous delivery process.

In this session, we give you practical guidance on “chaos stages” as part of your continuous delivery to validate the compliance with your production SLOs prior to entering production. As a showcase we are demoing a chaos-enriched delivery orchestration with the CNCF projects LitmusChaos (for chaos experiments) and Keptn (for orchestration of automated load testing and SLO validation).

Keri Melich

Senior Site Reliability Engineer

Nobl9

Keri is an SRE for Nobl9 working to help scale and secure the Nobl9 platform. Before that she spent 4 years at Squarespace building secure, scalable solutions for internal users. In her free time, she loves crafting, backpacking, and snowboarding.
2021
Track New To SLOs

SLO Basics - a conversation about reliability

Kit Merker

COO

Nobl9

Kit Merker's 20+ year career spans product management, engineering, evangelism and community-building roles at Google, Microsoft, JFrog, and the governing board of the Cloud Native Computing Foundation (CNCF). He is currently Chief Operating Officer for Nobl9, the service level observability company, helping software teams optimize their delivery to make customers happy and business sustainable.
2021
Track New To SLOs

A Year of SLO Bootcamps

In this talk, I'll share what I've learned in the last year leading a hands-on SLO bootcamp for a variety of cross functional teams. You'll learn a proven strategy for helping teams get over the hump of a first SLO and how to drive a scalable organizational and cultural change to the SLO-based way of thinking. With COVID, I had to adapt my SLO Bootcamp to being online only, and this forced me to focus on just the essentials, increase interactivity, and ensure the course was of value to all the participants. I'll go over resources you can use to run your own SLO Bootcamp too!

Kristina Bennett

Site Reliability Engineer, Customer Reliability

Google SRE

Site Reliability Engineer, Customer Reliability for Google's GCP cloud.

Kristof Renders

Autonomous Cloud Enablement Practice Manager

Dynatrace

Kristof is the Autonomous Cloud Enablement Practice Manager at Dynatrace based out of Belgium with over a decade of experience in the observability space. Having spent his career helping our largest customers capitalize on their Performance Monitoring investment, he now uses that expertise to help customers get to the next level of Autonomous Cloud Enablement. When not jet-setting around the globe for work, Kristof likes to travel and see all the beauty, and breweries, that the world has to offer by bike or with a backpack.
2021
Track Beyond Theory

Top 5 Real-life SLOs and Decision Tree to Define Your SLOs

The Google SRE theory already tells us what many confirm on their own SRE journey: it is a hard task to determine the most valuable SLOs for your system. Monitoring tools like Dynatrace provide over 2000 metrics with many filter options, and even more data is available through the integration of data sources like OpenTelemetry, SNMP, or any business data source. For SLOs, one needs to choose to focus on the important data. We had a look at our customers adopting SLO monitoring in Dynatrace and present a hit list of the SLO types reported to us as important. We show what the setup of such SLOs looks like for both major categories of SLOs: real-user traffic request-count-based SLOs and synthetic availability monitoring SLOs. We propose a decision tree for getting from an idea to defined SLO configurations.

Liz Fong-Jones

Developer advocate, Labor And Ethics Organizer, & Site Reliability Engineer

Honeycomb

SLOs & Observability - better together
Learn more ›
close
Liz Fong-Jones

Liz Fong-Jones

Developer advocate, Labor And Ethics Organizer, & Site Reliability Engineer

Honeycomb

Liz is a developer advocate, labor and ethics organizer, and Site Reliability Engineer (SRE) with 16+ years of experience. She is an advocate at Honeycomb for the SRE and Observability communities, and previously was an SRE working on products ranging from the Google Cloud Load Balancer to Google Flights.

She lives in Vancouver, BC with her wife Elly, partners, and a Samoyed/Golden Retriever mix, and in Sydney, NSW. She plays classical piano, leads an EVE Online alliance, and advocates for transgender rights.
2021
Track Beyond Theory

SLOs & Observability - better together

We built support for SLOs (Service Level Objectives) against our event store so we could monitor our own complex distributed system. In the process of doing so, we learned that there were a number of important aspects that we didn't expect from carefully reading the SRE workbook. This talk is the story of the missing pieces, unexpected pitfalls, and how we solved those problems. We'd like to share what we learned and how we iterated on our SLO adventure. As an SLO advocate and a design researcher, we collected user feedback through iterative deployments to learn what challenges users were running into. This conversation will discuss how we iterated our design, based on user feedback; how we deployed, what we learned, and re-deployed; and how we collected information from our users and from the alerts our system fired.

In this talk, we will discuss how we brought the theory of SLOs to practice, and what we learned that we hadn't expected in the process. We'll discuss implementing the SLO feature and burn alerts, and our experiences from working with the SRE team who started using the alerts. Our hope is that when you buy or build your SLO tools, you'll know what to look for and how to get started; that implementors will be able to start on more solid ground; and that we will be able to advance the state of SLO support for all teams that wish to implement them.

The major design points will be broken into a discussion of what we actually built; a number of unexpected technical features; and ways that we had to educate users beyond the standard SLO guidelines. The talk is largely conceptual: no live code will be shown, although some innocent servers may well die in the process of being visualized.

MaSonya Scott

Principal Specialist

AWS

MaSonya B. Scott has over 20 years of experience developing and managing the execution of complex programs, entailing IT organizational transformation, Service Management (ITILv3) process engineering, system integration and business relationship management for a multitude of high-profile organizations in both the commercial and public sector arenas. In the past five years, across top-tier commercial clients (members of the Fortune 100) and mission-critical federal agencies, MaSonya’s client portfolio encompasses a total cost savings value of over $40 million.

Matt Ray

Regional Manager, Customer Architect - APJ

Chef

Matt Ray has worked in and with enterprises and startups across a wide variety of industries including banking, retail, and government. He has been active in Open Source and DevOps communities for over two decades and has spoken at and helped organize many conferences and meetups. He currently resides in Sydney, Australia after relocating from Austin, Texas. He podcasts at SoftwareDefinedTalk.com, blogs at MattRay.dev, and is @mattray on Twitter, IRC, GitHub, and too many Slacks.
2021
Track SLOcializing

Applying SLOs to Infrastructure and Compliance as Code

Audits, compliance, and security are top of mind for most enterprises, while configuration management is not something most executives consider. Management teams are focused on reaching their business targets, but operations is the engine that helps the organization achieve its goals. Developers and operators need to align their goals with the business, and Service Level Objectives (SLOs) help focus these efforts and raise visibility. Configuration management _is_ important, but it needs to be part of an SLO for delivering reliable infrastructure quickly and efficiently. Security and passing audits are important; we need to understand our exposure to risk by attaining high levels of compliance. This session will provide examples of making those goals visible through SLOs, with examples provided from the open source Chef and InSpec projects.

Matthias Loibl

Senior Software Engineer

Polar Signals

Matthias Loibl is a Senior Software Engineer who works on cloud-native observability, previously at Red Hat and Kubermatic, and is a maintainer of many projects like Thanos and the Prometheus Operator. He contributed to the SLO book: "Implementing Service Level Objectives". He loves working on Distributed Systems with Go, Docker, Kubernetes and Prometheus. In his free time, he contributes to numerous open source projects related to Prometheus and Drone.
2021
Track Technical and Deep Dive

Defining SLOs: A Practical Guide

SLOs often seem simple in theory, but tend to get difficult when actually implementing them, as reality is often not by the textbook. SLOs are an invaluable tool for both engineers and management to consistently communicate reliability with data. Defining bad SLOs can also be harmful, so it's important to keep various caveats in mind. SLOs are not only about data; it is equally important to clarify and evangelize expectations of SLOs within an organization. Frederic and Matthias have many years of experience defining SLOs for many services and components. Together they will demonstrate real-life examples of choosing, measuring, alerting and reporting SLOs based on Prometheus metrics. Join this talk to learn how to implement SLOs successfully using data you most likely already have.

Meghan Jordan

Senior Product Manager

Datadog

Meghan Jordan is a Senior Product Manager at Datadog focusing on improving the experience for on-call engineers with Datadog's SLO and Incident Management products. When she's not working with customers, she's walking her dog around Brooklyn.
2021
Track Beyond Theory

Fundamentals for improving customer experience

Service level objectives (SLOs) help you understand the health of your systems and how your end users experience them. You're not likely to achieve desired results if you're not basing decisions on useful data. This means that poorly defined SLIs (using the wrong metrics) and SLOs (defining the wrong targets) could cause worse outcomes for your users. In this talk we'll cover how SLOs help you make more informed decisions. You'll learn how to get started with SLOs and choose the right service level indicators to meet your customers' expectations.

Melissa Boggs

VP of Business Agility

Sauce Labs

VP of Business Agility at Sauce Labs
2021
Track Beyond Theory

Agile & DevOps Walk into a Bar

Tune in to hear an Agility Exec and a DevOps Exec talk about the intersection of agile, DevOps, and metrics over a virtual "beer". In this 10 minute convo, we chat about the definitions of DevOps and agile and how metrics can play a part in showing leadership and teams where they can improve. Are your metrics acting as a window or a mirror?

Michael Ericksen

Site Reliability Engineer

Intelligent Medical Objects

An electric bike hive member, Michael works as a Site Reliability Engineer and obsesses about service level objectives and measuring what matters to users. When he’s not working (or on call), you’ll likely find him picking up kids from school, grocery shopping, and running other local errands on his bicycle.
2021
Track New To SLOs

From Availability to User Happiness: An Introduction to SLOs That Matter

This talk tells the story of an engineering team that finds themselves in a quasi-incident for a web application that runs inside of Electronic Health Record (EHR) systems like Epic and Cerner. The engineering dashboard for the application showed uptime at 100%. Users, however, paused their implementation timelines because of poor application performance. As an organization, we were measuring the wrong thing. In this talk, I will tell the story of how an engineering team pivoted from measuring availability to key application behaviors for their end users to dramatically improve user satisfaction.

Michael Friedrich

Senior Developer Evangelist

GitLab Inc.

Michael Friedrich is a Senior Developer Evangelist at GitLab focussing on Observability, SRE, and Ops. He studied Hardware/Software Systems Engineering and moved into DNS and monitoring development at the University of Vienna and ACO.net. Michael was a maintainer of an OSS monitoring software for 11 years before joining GitLab. He loves to help educate everyone and regularly speaks at events and meetups. Michael co-founded the #EveryoneCanContribute cafe meetup group to learn cloud-native & DevOps. Michael is a Polynaut advisor at Polywork, created o11y.love as a learning platform for Observability, and shares insights in the opsindev.news newsletter.
2021
Track SLOcializing

Left Shift your SLOs

Everyone talks about Security shifting left in your CI/CD pipeline. Tools and cultural changes enable teams to scale and avoid deployment problems. SLOs are left out: what if a software change triggers a regression and your production SLOs fail? As a developer, you want to detect these problems as early as possible. This talk dives deep into CI/CD pipelines and discusses ideas to calculate and match SLOs in the development lifecycle, early in your Pull or Merge Request for review.

Michael March

Head of Innovation

Isos Technology

30+ years in IT services, Atlassian "expert", SLO luv'r
2021
Track SLOcializing

Supporting tools/templates to guide your SLO journey

Your org has chosen to implement SLOs, awesome! Beyond the core tooling (monitoring, SLO measuring, etc.), this talk will quickly demonstrate concrete examples of tools and processes you can use to support your organization's implementation journey, soup to nuts.

Mick Roper

Software Engineer

Reliably

Software, Architecture and reliability engineer
2021
Track New To SLOs

Service Level Overkill - SLO In a World of SOA

Service levels are excellent for understanding the limits you put on your own services, but in a world of web services your own ability to create a useful SLO is impacted by everything you depend upon. In this chat I discuss how to understand SLOs from other teams, how to try to mitigate SLO impact and how to deal with it when it happens. I also talk about what a low SLO means, and why it shouldn't be assumed that you need 9 9's of availability to offer a useful service!

Milan Plžík

Site Reliability Engineer

Grafana Labs

Milan has been working as a Site Reliability Engineer at Grafana Labs since early 2020, as part of the Platform squad that manages internal infrastructure, develops internal tooling and provides support to the product teams. Prior to that, Milan worked as an Infrastructure Engineer at a local AI startup and as a Site Reliability Engineer at Google Cloud.
2021
Track New To SLOs

Production Readiness Review: Providing a Solid Base for SLOs

It's hard to propose a good SLO for a new service with little mileage. Even for a service that has been running for years, it's hard to gain confidence that the SLO won't be impacted if the service scales 10x. We'll have a look at the Production Readiness Review process, which seeks to identify and remove common pitfalls and already-learned mistakes through a focused review, strengthening confidence in the defined SLO. The process was originally developed at Google (https://sre.google/sre-book/evolving-sre-engagement-model/); at Grafana Labs, we've tailored the process towards our needs, which is what this talk will discuss.

Navya Dwarakanath

Senior Solutions Engineer

Catchpoint Systems

Navya Dwarakanath is a Senior Solutions Engineer at Catchpoint Systems. With over 8 years of experience in the digital performance and monitoring space, she works with customers to define and refine their digital monitoring strategy. She is passionate about networking, distributed systems and defining and improving availability and performance in a digital world. She is a strong advocate of the saying “That which is measured improves”. She takes pleasure in being a data detective when troubleshooting issues and enjoys the thrills of digging into monitoring data to unravel the unknowns.
2021
Track New To SLOs

Unboxing Blackbox Monitoring for SLO

You have read this in every SLO book and heard it in several talks: measure SLOs from the perspective of the end user. Measuring from the user's perspective is not easy or straightforward, but it is fundamental to how effective your SLOs are. Learn why the user's perspective is paramount, what makes Blackbox monitoring effective, the blind spots it helps you cover, and how you can use it to define your SLOs.

Niall Murphy

Author

SRE Book

SRE book maven
2021
Track Technical and Deep Dive

Introduction to SLO Alerting and Monitoring

Super simple rehearsal of the "SLO alerting" chapter from the book, with worked example.

Posten A

Production Engineer

Facebook

Posten is a Production Engineer at Facebook in London with 10 years of experience in Site Reliability topics, currently working on monitoring at Facebook.
2021
Track New To SLOs

SLOs at Facebook

Scaling SLOs at Facebook to planetary scale using SLICK - a purpose built centralised SLO store integrated into key observability systems.

Richard Hartmann

Community Director

Grafana Labs

RichiH has been running Open Source communities and conferences for two decades, several of them the largest on Earth like freenode and FOSDEM. For eleven years, he was the only person at a company truly caring about the pager, and he would not have made it through without on-point alerting.
He also designed and built a datacenter from scratch, is a Prometheus maintainer, and founded OpenMetrics.
2021
Track New To SLOs

Infrastructure Comes out of the Wall, No One Cares How

You care about your service and how it works internally; your users do not. Your water, electricity, and Internet come out of the wall, and if they stop doing that, you call someone to complain. That's how you should think about your services, and we'll explore this thought more.

Ryan Lockard

SVP, @ Contino | Chief Technology Officer Cloud+

Ryan Lockard, SVP, Contino | CTO, Cognizant Cloud+
2021
Track Beyond Theory

Agile & DevOps Walk into a Bar

Tune in to hear an Agility Exec and a DevOps Exec talk about the intersection of agile, DevOps, and metrics over a virtual "beer". In this 10 minute convo, we chat about the definitions of DevOps and agile and how metrics can play a part in showing leadership and teams where they can improve. Are your metrics acting as a window or a mirror?

Sal Kimmich

Product Strategist, Developer Advocate

Reliably

Sara Kimmich is a full-stack developer and certified scrum master focused on building better-than-agile systems in the workplace. With experience leading teams ranging from small, co-located agile software teams to distributed online teams of nearly a thousand, she teaches practical and quantitative strategies for successful tech development. She cares about the open web, believes in equal opportunity to education, and is passionate about how the internet can be a force for good in the world.
2021
Track New To SLOs

Creating Great Dev Culture though Error Budgets

In the most basic definition, error budgets are simply the amount of error that a service can accumulate over a specified period of time before users grumble about the experience. While many organizations introducing error budgets observe them as just another metric for system quality control, there's a huge utility to incorporating error budgets as a fundamental part of your developer culture around trust and timely innovation: with the critical autonomy provided to engineers in this working paradigm, the development team can spend their error budget however they feel is right: either in prevention or cure of system instabilities. In this talk, we will cover common combinations of SLIs that lead to error budget best practices, as well as protocols that can be enacted when error budgets slip: the who, what, and when and why of pre-incident reporting.
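To make the definition above concrete, here is a minimal Python sketch of error budget arithmetic. It assumes a time-based SLO (a target fraction of "good" minutes over a rolling window); the function names `error_budget_minutes` and `budget_remaining` are hypothetical and chosen only for this illustration, not taken from the talk.

```python
def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Minutes of unreliability allowed in the window (assumes a time-based SLO)."""
    total_minutes = window_days * 24 * 60
    return (1 - slo_target) * total_minutes


def budget_remaining(slo_target: float, window_days: int, bad_minutes: float) -> float:
    """Fraction of the error budget still unspent; negative means the budget is blown."""
    budget = error_budget_minutes(slo_target, window_days)
    return 1 - bad_minutes / budget


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of errors.
    print(round(error_budget_minutes(0.999, 30), 1))
    # After 21.6 bad minutes, about half of the budget is left.
    print(round(budget_remaining(0.999, 30, 21.6), 2))
```

A team could consult a figure like the remaining fraction when deciding whether to spend budget on a risky release or on stability work.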

Simon Aronsson

Head of Developer Relations

K6

I’m a thirty-something gopher, developer 🥑, public speaker and meetup organizer from Sweden. I’ve been working in tech for the last 10 years or so, in many different roles ranging from full-stack dev and systems architect to scrum master and ops engineer. During the last couple of years I’ve put a lot of my time into DevOps practices, cloud development, automation and creating highly efficient, self-organising teams.

In my spare time, you’ll usually find me either out and about on my longboard or alpine skis, caring for the chilies in my hydroponic window garden, building software or hardware or playing with my Commodore 64.
2021
Track Technical and Deep Dive

Error Economics: How to avoid breaking the budget

It's scary to release to production, especially if you don't know if your system is performing within your quality SLOs. Using error budgets and testing at scale as quality gates in your release cycle, you'll be able to gain much-needed confidence about the risk-level associated with your release.

Using open-source tools, we'll set up a test, generate the necessary load to run it at scale and make sure we stay on budget.

After attending this talk, attendees will:
- Have an understanding of what error budgets are and how they are measured.
- Know how to use them as indicators of service quality.
- Know how to create their first high-concurrency test using a load generator and how to set it up with acceptance thresholds based on their error budget.

Steve McGhee

Reliability Advocacy Engineer

Google SRE

Steve is a Reliability Advocate, helping teams understand how best to build and operate world-class, reliable services. Before that, he spent 10+ years as an SRE within Google, learning how to scale global systems in Search, YouTube, Android, and Cloud.  He managed multiple engineering teams in California, Japan, and the UK.  Steve also spent some time with a California-based enterprise to help them transition onto the Cloud.
2021
Track New To SLOs

SLO Math

It's the architecture, not the products or infrastructure, that matters. How to think about your dependencies and how their SLOs affect your own:
- Chained services: SLO = SLOs ^ depth
- Parallel isolated services: SLO = min(SLOs)
- Redundant parallel services: much better, ~= SLO of the LB “above”
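As a rough illustration of the arithmetic in this abstract, here is a minimal Python sketch. It assumes dependencies fail independently, which real systems often violate, and the function names (`chained_slo`, `parallel_isolated_slo`, `redundant_slo`) are made up for this example rather than taken from the talk. The abstract's `SLOs ^ depth` is treated as the special case of a chain in which every hop has the same SLO.

```python
from math import prod


def chained_slo(slos):
    """Serial chain: a request succeeds only if every hop succeeds.

    With identical hops this reduces to slo ** depth.
    """
    return prod(slos)


def parallel_isolated_slo(slos):
    """Isolated parallel services: following the abstract, the weakest one sets the figure."""
    return min(slos)


def redundant_slo(replica_slos, lb_slo):
    """Redundant replicas behind a load balancer (LB).

    The pool fails only if all replicas fail at once (independence assumed),
    so the result approaches, and is capped by, the LB's own SLO.
    """
    all_replicas_fail = prod(1 - s for s in replica_slos)
    return lb_slo * (1 - all_replicas_fail)


if __name__ == "__main__":
    print(chained_slo([0.999, 0.999, 0.999]))
    print(parallel_isolated_slo([0.999, 0.99]))
    print(redundant_slo([0.99, 0.99], lb_slo=0.9999))
```

Under these assumptions, three chained services at 99.9% each yield roughly 99.7% end to end, while two redundant 99% replicas behind a 99.99% load balancer land close to the load balancer's own figure.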

Uma Mukkara

Co-Founder & CEO

ChaosNative

Uma Mukkara is a maintainer of the CNCF chaos engineering project Litmus. He is also the CEO of ChaosNative, the company that provides enterprise support and solutions around Litmus. Uma co-created the Litmus project while trying to chaos test the OpenEBS project, which he also co-created years ago. He is passionate about building solutions around resilience through chaos testing in the cloud native space. He is a regular speaker at events, meetups and conferences related to SREs, reliability and Chaos Engineering. Uma holds a Master’s degree in Telecommunications and software engineering from Illinois Institute of Technology, Chicago and a bachelor’s degree in Communications from S.V.University, Tirupati, India.
2021
Track Technical and Deep Dive

Benchmarking SLOs Using Chaos Engineering

SLOs are the visible results that SREs need to maintain in any operations. Recently, SLOs are increasingly being applied in pre-production CI/CD pipelines. If the pre-production setups are close to production, the resilience of such a setup can be tested by introducing chaos in the pipeline and measuring the SLOs. In this talk, we discuss techniques for introducing chaos testing as a trigger to CD and as a post-CD action in production or pre-production. The audience will see an example chaos stage in action in a cloud-native CI/CD pipeline, how Prometheus-based SLIs are used to measure SLOs over a given period of time, and how this benchmarking is used to decide whether to trigger continuous deployments. The takeaway for SREs is using chaos testing as a tool to measure SLO-based resilience and how this can be automated using declarative config and GitOps.

Vidya Subramanian

Founder

MyTechLadder

An entrepreneur at heart with a passion for education and a technologist who believed in DevOps and pushed for it before the word was in existence.
Founder MyTechLadder - a career progression and career mobility service, my give back project.
Founder 1MinuteDances - a nonprofit dedicated to helping women reconnect with the performing art.
Startup advisor - Founder Devopsly, LLC. Advising startups in the DevSecOps space.
Startup investor - Angel and early stage investor
Wolfgang Heider

Senior Technical Product Manager

Dynatrace

With a Ph.D. on the evolution of software product lines, Wolfgang is an expert at optimizing software engineering productivity. As a specialist in DevOps, Wolfgang works on improving the efficiency of pipelines and the automation of answering "To Release or not to release?" to enable SRE concepts, SLOs, continuous paradigms, and progressive delivery (in other words, developing the machines that make the machines that make machines).
2021
Track Beyond Theory

Top 5 Real-life SLOs and Decision Tree to Define Your SLOs

Google's SRE theory already tells us what many confirm on their own SRE journey: it is hard to determine the most valuable SLOs for your system. Monitoring tools like Dynatrace provide over 2000 metrics with many filter options, and even more data is available through the integration of data sources like OpenTelemetry, SNMP, or any business data source. For SLOs, one needs to focus on the important data. We looked at our customers adopting SLO monitoring in Dynatrace and present a hit list of the SLO types reported to us as important. We show what the setup of such SLOs looks like for both major categories of SLOs: real-user traffic request-count-based SLOs and synthetic availability monitoring SLOs. We propose a decision tree for getting from an idea to defined SLO configurations.

Yury Niño Roa

SRE Technical Program Manager

ADL Digital Labs

Site Reliability Engineer, SRE Technical Program Manager and Chaos Engineering Advocate. 

Software Engineer with 7+ years of experience designing, implementing and managing the development of software applications using agile methodologies such as scrum and kanban. 2+ years of hands-on experience supporting, automating and optimizing mission-critical deployments. Experience with on-premise and cloud architectures and foundations both on the coding and deploying systems. +1 year as Technical Program Manager of a Site Reliability Engineering Team, designing and architecting software to improve availability, scalability, latency and efficiency.

Professor of Software Engineering and Researcher with interest in solving performance, resilience and reliability issues, using chaos engineering and studying human factors, safety on systems and lack of monitoring and observability.
2021
Track SLOcializing

Defining a Maturity Model for SLOs

Service Level Objectives or SLOs are a quantitative contract that describes the expected service behavior. They are often used by Organizations to prioritize the reliability, availability, coverage, and other service-level indicators of the software systems. Based on what I have learned defining and implementing SLOs, I have discovered that they are valuable when they are used to build feedback loops in two axes: adoption and automation. SLOs are a process, not a project, which imposes a need for having a framework that helps organizations to adopt a culture based on SLOs. In this talk, I am presenting a framework that allows determining the level of adoption and automation of SLOs. Based on questions related to the amount of convincing: engineering, operations, product, leadership, legal, and quality assurance, we determine the level of adoption. On the other side, considering aspects such as established and documented measurements, the level of user-centric metrics, observability strategies, and reporting toolsets, we determine the level of automation.

Zac Nickens

Site Reliability Engineer

Boxboat

Reliability Engineer specializing in accelerating adoption of reliability-oriented culture and business/technical feedback loops around SLOs
2021
Track SLOcializing

No More Theater: Building SLO Culture Without the Bullsh*t

Using SLO culture to break down silos, empower engineers, and drive user (and engineer) happiness. Using real-life examples from unnamed orgs, I will highlight the pitfalls and traps of "theater" and "fiefdoms" and how SLO culture can be used to break down barriers to high performance and high happiness.

2021 SLOconf Highlights

2021 Talks

A Year of SLO Bootcamps
Agile & DevOps Walk into a Bar
Applying SLOs to Infrastructure and Comp...
Benchmarking SLOs Using Chaos Engineerin...
Creating Great Dev Culture though Error ...
Defining a Maturity Model for SLOs
Defining SLOs: A Practical Guide
Don't be a victim of your own success
Error Economics: How to avoid breaking t...
Evaluate Application Resilience with Cha...

Previous Sponsors