SLOconf 2022

Ajuna Kyaruzi

Developer Relations

Datadog

Ajuna Kyaruzi works in Developer Relations at Datadog and was born and raised in Dar es Salaam, Tanzania. She loves community building and volunteers with multiple mentorship programs aimed at helping early-career folks break into tech and build successful careers. Previously, she worked at Google for about four years as a Software Engineer on Google Maps and as a Site Reliability Engineer on Google Cloud.

2022

Track SLO FUNdamentals

Ensuring Reliability using SLO Burn Rates

Service Level Objectives (SLOs) are a measurement of the reliability and general experience your end users and customers can expect. SLO burn rate is a value that indicates how fast your error budget is being consumed relative to your SLO’s time window. In this talk, we’ll cover how to calculate burn rates and how to use them alongside error budgets to get a solid, actionable metric for balancing innovation and velocity with reliability and safety over a specific period of time. You will also learn how to use burn rates as a measure of the likelihood of missing SLO targets.
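A minimal sketch of the burn-rate arithmetic, assuming the common definition: burn rate is the observed error rate divided by the error rate the SLO allows (1 minus the target). The function names and the 30-day window in the usage note are illustrative assumptions, not taken from the talk.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows (1 - target)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events == 0:
        return 0.0  # no traffic, so no budget is being consumed
    observed_error_rate = bad_events / total_events
    allowed_error_rate = 1.0 - slo_target
    return observed_error_rate / allowed_error_rate


def hours_to_exhaust_full_budget(rate: float, slo_window_hours: float) -> float:
    """Time to spend a full, untouched error budget if the burn rate stays constant."""
    return float("inf") if rate <= 0 else slo_window_hours / rate


rate = burn_rate(bad_events=1_000, total_events=100_000, slo_target=0.999)
print(round(rate, 6), round(hours_to_exhaust_full_budget(rate, 720.0), 6))
```

For a 99.9% target over a 30-day (720-hour) window, 1,000 failed requests out of 100,000 gives a burn rate of about 10, which would spend the entire budget in about 72 hours. A burn rate of exactly 1 spends the budget precisely over the window.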

Aleksander Tarraro

Software Engineer

Nobl9

2022

Track SLO FUNdamentals

What is an Error Budget For?

Alex Hidalgo

Principal Reliability Advocate

Nobl9

Alex Hidalgo is the Principal Reliability Advocate at Nobl9 and author of “Implementing Service Level Objectives.” During his career he has developed a deep love for sustainable operations, proper observability, and using SLO data to drive discussions and make decisions. Alex’s previous jobs have included IT support, network security, restaurant work, t-shirt design, and hosting game shows at bars. When he’s not sharing his passion for technology with others, you can find him scuba diving or watching college basketball. He lives in Brooklyn with his partner Jen and a rescue dog named Taco. Alex has a BA in philosophy from Virginia Commonwealth University.

2022

Track SLOs for everyone

SLOs for Everyone

Service Level Objectives are often thought of as an engineering task meant to help engineers better understand their services. But the concepts behind such an approach can be useful for so many other things! Let’s talk about how you can use SLOs to think more clearly about money, human factors, high-level business decisions, and even uses outside the world of tech!

Alex Rasmussen

Principal Cloud Economist

The Duckbill Group

Alex Rasmussen is a Principal Cloud Economist at The Duckbill Group, where he helps organizations with horrifying AWS bills make those bills a little less horrifying. Alex holds a Ph.D. in Computer Science and Engineering from UC San Diego, and has spent over a decade building high-performance, robust data management and processing systems. As an early member of a couple of fast-growing startups, he’s had the opportunity to wear a lot of different hats, serving at various times as an individual contributor, tech lead, manager, and executive. Prior to joining the Duckbill Group, Alex spent a few years as a freelance data engineering consultant, helping his clients build, manage, and maintain their data infrastructure. He lives in Los Angeles, CA.

2022

Track The future of SLOs

The Real World Math and Implications of S3's 99.999999999% Advertised Durability

Amazon S3 advertises 11 nines (99.999999999%) of durability for both the Standard and One Zone-IA storage classes. This is a mind-bogglingly high level of durability. The way AWS puts it, if you’ve stored 10 million objects, you can expect on average to lose a single object once every 10,000 years, even within a single availability zone!
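The arithmetic behind the AWS example can be checked in a few lines. This sketch assumes, as a simplification, that “11 nines” means each object independently has a 10**-11 chance of being lost per year. That independence assumption is exactly what correlated failures and your own application bugs can break, so treat the output as the designed figure, not a guarantee. The function names are illustrative.

```python
import math


def annual_loss_probability(nines: int) -> float:
    """Read 'N nines' of annual durability as a 10**-N per-object yearly loss chance."""
    return 10.0 ** -nines


def expected_objects_lost_per_year(object_count: int, nines: int) -> float:
    return object_count * annual_loss_probability(nines)


def mean_years_between_losses(object_count: int, nines: int) -> float:
    return 1.0 / expected_objects_lost_per_year(object_count, nines)


def prob_any_loss(object_count: int, nines: int, years: float) -> float:
    """P(at least one loss) = 1 - (1 - p)**(N * years), assuming independent object-years.
    log1p/expm1 keep precision when p is tiny."""
    p = annual_loss_probability(nines)
    return -math.expm1(object_count * years * math.log1p(-p))


print(mean_years_between_losses(10_000_000, 11))     # roughly 10,000 years
print(mean_years_between_losses(1_000_000_000, 11))  # roughly 100 years
```

Scaling the object count changes the picture quickly. At one billion objects, the same per-object figure implies an expected loss roughly every 100 years.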

It’s important to note, though, that this is the designed durability, not the guaranteed durability. There are all kinds of things that can go wrong that could impact durability on AWS’s side, and it’s even more likely that problems with your application can impact the durability of your data in S3, even if AWS does everything right.

In this talk, we’ll look at some of those things that might go wrong, and explore what S3’s advertised durability actually means to you if you build systems that use it.

Andreas Grabner

DevOps Activist at Dynatrace & DevRel for CNCF Keptn

Dynatrace

Andreas Grabner (@grabnerandi) has 20+ years of experience as a software developer, tester, and architect, and is an advocate for high-performing cloud-scale applications. He is a contributor and DevRel for the CNCF open source project Keptn (www.keptn.sh). Andreas is also a regular contributor to the DevOps community, a frequent speaker at technology conferences, and regularly publishes articles on blog.dynatrace.com or Medium. In his spare time you can most likely find him on one of the salsa dance floors of the world (will resume once Covid is behind us)!

2022

Track SLO Stories

Tips for Running Successful SLO Workshops in Under One Hour

I have been advising organizations over the past year on how to start with SLOs. I struggled just as many do, but eventually found an approach that has worked repeatedly over the past few months. Based on one specific workshop I ran, my audience will learn about Business Level Objectives such as “Mobile App Adoption” and “App Rating”, and how to break those down into more technical Service Level Objectives such as “Crash Rates”, “Availability”, “Performance”, and “Error Rates”. I will explain how to define those SLOs, and also how to capture and report on them.

Andrew Newdigate

Distinguished Engineer

GitLab Inc.

Andrew is a Distinguished Engineer in the Infrastructure team at GitLab, where he helps keep GitLab.com available, observable and scaling. Before GitLab, he cofounded the developer community site Gitter in 2012, and was CTO there until Gitter’s acquisition by GitLab in 2017.

After living in London in the UK for 17 years, he recently relocated back to his hometown of Cape Town in South Africa.

2022

Track SLOs for everyone

Everyone Can Contribute to Our SLO

At GitLab, we’ve built an extensive framework for defining service level indicators (SLIs) for our different services. This allows us to take a simple definition and turn it into dashboards and alerts. There are different owners involved: Infrastructure and stage groups. The SLIs we use to monitor GitLab.com are attributed to the groups that build the features we run. Everyone is held to the same 99.95% SLO, and everyone can contribute to our observability.

Join this talk to learn about the challenges with SLOs and error budgets. Hear how we are aggregating our infrastructure SLIs by feature group, and how we are involving groups in improving our SLI definitions.
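The following is a hypothetical sketch of aggregating SLI counts by feature group and checking each group against the shared 99.95% target quoted above. The sample structure and group names are invented for illustration and do not reflect GitLab’s actual framework. The one design point it shows is that good and total counts are summed per group before dividing, rather than averaging per-sample ratios.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple

SLO_TARGET = 0.9995  # the shared 99.95% target quoted in the abstract


class SliSample(NamedTuple):
    feature_group: str  # hypothetical attribution label for the owning group
    total: int          # operations observed in this sample
    good: int           # operations that met the SLI definition


def ratios_by_group(samples: Iterable[SliSample]) -> Dict[str, float]:
    """Sum good/total counts per group, then divide (never average ratios)."""
    totals: Dict[str, int] = defaultdict(int)
    goods: Dict[str, int] = defaultdict(int)
    for s in samples:
        totals[s.feature_group] += s.total
        goods[s.feature_group] += s.good
    return {g: goods[g] / totals[g] for g in totals if totals[g] > 0}


def groups_below_target(samples: Iterable[SliSample], target: float = SLO_TARGET) -> List[str]:
    return sorted(g for g, ratio in ratios_by_group(samples).items() if ratio < target)


samples = [
    SliSample("source_code", 10_000, 9_999),
    SliSample("source_code", 10_000, 9_998),
    SliSample("ci", 20_000, 19_980),
]
print(groups_below_target(samples))
```

Here the hypothetical `source_code` group lands at 99.985% and passes, while `ci` sits at 99.9% and falls below the shared target.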

Andrew Snyder

Senior DevOps Consultant

Contino

Andrew is a senior global IT professional entering his third decade of providing outstanding information technology solutions focused on reliability, resilience, observability, performance, scalability, cost optimization, best practices, up-skilling, documentation, and compliance. He is passionate about meeting business requirements with optimized technical solutions. Voraciously curious and endlessly learning, he treats the frontiers of his comprehension as envelopes under constant expansion.

2022

Track The future of SLOs

Using a Service Canvas to Define SLOs

Through our visual Service Canvas methodology, we have enabled IT teams to collaboratively discuss and readily identify their catalog of services from their internal customers’ perspectives, and to identify the Service Level Objectives (SLOs) and corresponding Service Level Indicators (SLIs) for all of their offerings. The Service Canvas method uses a top-down whiteboard of swim lanes. Teams can complete it virtually or remotely using freely available SaaS tools such as Miro, or in person using sticky notes. Each swim-lane row is a repository for the descriptive components required to establish the services that need objectives and indicators, along with the qualities necessary for identifying all meaningful metrics. This top-down visual approach helps subsequent answers arise organically once the team completes its first service mapping, and we have had much success enabling teams to quickly identify their Service Catalog along with the corresponding Service Level Objectives and Service Level Indicators (image of Service Canvas available upon request; it’s much prettier in person!):

Service: What is the service that your internal customer wants to use?
Components: Which components within the service do your internal users interact with?
Customers: Who uses your services?
Target Category: Which qualities of the service are important to your internal customers?
Customer Needs: How does your internal customer describe a level of service that meets their business requirements?
Service Level Indicator: How can we measure that we are meeting our customer’s service-quality needs?
Service Level Objective: What goals will we target in order to maintain our internal customer’s satisfaction with our services?

We iterate through each of the team’s services and coach the team to get feedback and buy-in from their internal customers on the SLO and SLI definitions for each service. This method has proven effective enough that it has become a staple of our SRE transformation engagements.

This talk will provide all of the necessary tools and demonstrate how to facilitate the Service Canvas method. Attendees will gain a powerful new way of helping IT teams catalog their services and assign meaningful objectives to them, in coordination with the consumers of those services: their internal customers.

Ashutosh Agrawal

Platform architect

Disney+Hotstar

As a platform architect for Disney+ Hotstar, Ashutosh is responsible for the stability, performance, and operations of the entire platform. As a result, he works with all the teams (400+) across Hotstar to ensure that whatever they build fits into the platform’s vision and is built to the required quality. Ashutosh is a full-stack engineer with deep exposure to backend and DevOps work. He has worked across a range of products, from SaaS and e-commerce to live streaming platforms.

In the past, Ashutosh built one of the highest-scale systems at Hotstar, SSAI (server-side ad insertion), which inserts ads into live video streams for more than 15M concurrent users. Building this system taught him a great deal about the fundamentals of how tech works.

2022

Track SLO Stories

Using SLOs at Scale: How Disney+ Hotstar Streams One of the Biggest Sports Tournaments in the World!

The last few years of successfully streaming the IPL at this scale have forced us at Disney+Hotstar to go back to the drawing board and think about site reliability engineering from first principles.

Outside-in monitoring of services was our goal as we headed into this uphill task, with the following acceptance criteria for our systems:

Build correlation across different services and different layers.
Drill down to the layer or component causing the trouble.
Detect degradations after changes or deployments on the platform.
Automatically detect thresholds and set alerts on the data.
Detect anomalous patterns across the platform.
Monitor all resources created in production, and auto-discover correlations.

It was clear to us as an organisation that to achieve this wish list, we needed a way to measure and monitor the current performance of our services. This is where the concept of service level objectives entered the picture!

Today, Disney+ Hotstar extensively uses SLOs to measure the performance of our systems. This talk will deep dive into how we approached SLOs at this scale, how we measured the baseline performance of our systems before setting SLOs, why we chose the compliance targets and periods we did, and much more. The last few years at Disney+ Hotstar have been exciting, and I would love for everyone to know how SLOs help engineering scale!

Austin Krauza

Site Reliability Engineer

JPMorgan Chase

Austin Krauza is a Site Reliability Engineer for Identity and Trust Platforms at JPMorgan Chase & Co. Previously, Austin worked as an SRE building out monitoring for the Private Cloud environment, and as a Platform Engineer building several internal platforms for the firm. He graduated from the Honors College of the City University of New York (CUNY) in June 2016 with a Bachelor of Science in Computer Science, and is currently pursuing a Master of Science in Cybersecurity at New York University. Within the firm, Austin leads several “Ignite” Communities of Practice, especially around Site Reliability Engineering (SRE) and Cloud/Distributed Computing.

2022

Track SLO FUNdamentals

What Are SLIs and Why Should I Care?

Every day we interact with various systems, both online and offline: from our visit to the bagel store to checking our email on mobile. We commonly talk about Service Level Objectives (SLOs) within the Reliability Engineering community. Still, we rarely talk about the underlying indicators that tell us whether a system is healthy. Typically, engineers start by picking low-hanging-fruit objectives (such as request time and volume) rather than doing a complete analysis of their dependencies and the metrics available to them from their systems or applications. By leveraging the VALET (Volume, Availability, Latency, Errors, and Tickets) framework, teams can begin exploring their metrics and better define their “health” indicators, leading to more sustainable Service Level Objectives.
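The following is a sketch of one possible way to turn the five VALET dimensions into checkable indicators. The field names, the per-window counting, and every threshold are hypothetical, and the talk may map the dimensions differently. The sketch only shows that each dimension becomes its own pass/fail indicator rather than a single request-time metric.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ValetWindow:
    """Counts observed over one evaluation window (for example, one day)."""
    requests: int
    errors: int
    slow_requests: int  # requests slower than the chosen latency threshold
    up_minutes: int
    total_minutes: int
    tickets: int        # tickets or manual interventions raised


@dataclass
class ValetTargets:
    """Hypothetical per-dimension targets; derive real ones from your own data."""
    min_requests: int
    availability: float
    latency_good_ratio: float
    max_error_ratio: float
    max_tickets: int


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0


def evaluate(w: ValetWindow, t: ValetTargets) -> Dict[str, bool]:
    """Return one pass/fail indicator per VALET dimension."""
    return {
        "volume": w.requests >= t.min_requests,
        "availability": _ratio(w.up_minutes, w.total_minutes) >= t.availability,
        "latency": 1.0 - _ratio(w.slow_requests, w.requests) >= t.latency_good_ratio,
        "errors": _ratio(w.errors, w.requests) <= t.max_error_ratio,
        "tickets": w.tickets <= t.max_tickets,
    }
```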

Austin Parker

Head of Developer Relations

Lightstep

Austin Parker is the Head of Developer Relations at Lightstep, and has been creating problems with computers for most of his life. He’s a maintainer of the OpenTelemetry project, the host of several podcasts, organizer of Deserted Island DevOps, infrequent Twitch streamer, conference speaker, and more. When he’s not working, you can find him posting on Twitter, cooking, and parenting. His most recent book is Distributed Tracing in Practice, published by O’Reilly Media.

2022

Track The future of SLOs

Burn Your Dashboards: The Case For SLO-First Monitoring

SLO adoption in existing organizations is often seen as a secondary method of communicating reliability: a way to translate internal performance metrics into more “human-digestible” forms. What if we flipped this supposition around, though? Is our reliance on traditional monitoring practice, staring at a bunch of indicators on a dashboard, actually holding us back from realizing the benefits of SLOs? In this talk, I’ll present an alternative approach in which SLOs are your primary measure of service health for on-call, and show how to incorporate them into your runbooks and release processes.
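One common way to page on SLOs instead of dashboard thresholds is multiwindow, multi-burn-rate alerting. This sketch uses the thresholds from the Google SRE Workbook’s example for a 30-day SLO window. It is not necessarily what this talk proposes, and the window labels and dictionary input are illustrative assumptions. Each rule fires only when both a long window and a short window are burning fast. This combination avoids paging on brief blips and stops paging soon after recovery.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BurnRateRule:
    severity: str      # "page" wakes someone up; "ticket" can wait for working hours
    long_window: str
    short_window: str  # must also be burning, so alerts clear soon after recovery
    threshold: float   # multiple of the sustainable burn rate


# Parameters from the SRE Workbook example, assuming a 30-day SLO window.
RULES: Tuple[BurnRateRule, ...] = (
    BurnRateRule("page", "1h", "5m", 14.4),
    BurnRateRule("page", "6h", "30m", 6.0),
    BurnRateRule("ticket", "3d", "6h", 1.0),
)


def firing_alerts(burn_rates: Dict[str, float], rules: Tuple[BurnRateRule, ...] = RULES) -> List[str]:
    """burn_rates maps a window label to the burn rate measured over that window."""
    fired = []
    for r in rules:
        long_rate = burn_rates.get(r.long_window, 0.0)
        short_rate = burn_rates.get(r.short_window, 0.0)
        if long_rate > r.threshold and short_rate > r.threshold:
            fired.append(f"{r.severity}: {r.long_window}/{r.short_window} > {r.threshold}")
    return fired
```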

Bob Van Landuyt

Senior Developer Scalability

GitLab Inc.

Bob is a Backend Developer, Scalability at GitLab. He has worked on improving SLI accuracy, attributing traffic to features, and building tooling that helps development groups monitor GitLab.com. He lives in Kortrijk, Belgium.

2022

Track SLOs for everyone

Everyone Can Contribute to Our SLO

Christian Beedgen

CTO & Co-Founder

Sumo Logic

As a co-founder and CTO of Sumo Logic, Christian Beedgen brings 18 years of experience creating industry-leading enterprise software products. Since 2010 he has been focused on building Sumo Logic’s multi-tenant, cloud-native machine data analytics platform which is widely used today by more than 2,000 customers and 50,000 users. Prior to Sumo Logic, Christian was an early engineer, engineering director, and chief architect at ArcSight, contributing to ArcSight’s SIEM and log management solutions.

2022

Track The future of SLOs

Dynamic Environments Need SLOs

The goal is to shift attention from individual KPIs to broader indicators that help observers manage very large, complex systems and ensure that their SLOs can be met. This service-level orientation allows teams to address unreliable applications by using SLOs to objectively calculate adherence to the SLA. Managing SLOs is key to ensuring that modern app stacks perform reliably for end users. However, many organizations either avoid defining SLOs altogether or use manual processes and tools to track and manage them. Without structured approaches to SLO management, observability is a pointless investment; simply put, you get what you measure. This talk will discuss how to create and monitor SLOs, how dependencies drive composite SLOs, and how to use the service map to analyze those dependencies when setting and predicting SLOs, a key differentiator.
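The sketch below covers the textbook arithmetic for composite availability, assuming failures are independent (real dependencies often fail together, which makes these numbers optimistic). It is not necessarily the model used in this talk. A service that needs every one of its hard dependencies can do no better than their product, while redundant replicas combine as one minus the chance that all are down.

```python
import math
from typing import Iterable


def serial_availability(dependencies: Iterable[float]) -> float:
    """All hard dependencies must be up: multiply (assumes independent failures)."""
    return math.prod(dependencies)


def redundant_availability(replicas: Iterable[float]) -> float:
    """Any one replica suffices: 1 - P(all down) (assumes independent failures)."""
    return 1.0 - math.prod(1.0 - a for a in replicas)


# A service with hard dependencies at 99.95% and 99.9%:
print(serial_availability([0.9995, 0.999]))  # about 0.9985
```

The two hypothetical dependencies cap the service at roughly 99.85% unless it degrades gracefully when one of them is down, which is why dependency SLOs feed directly into what a service can promise.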

Colin Curtin

Engineering Leader

Square

Colin Curtin is an engineering leader, coach, and systems engineer. Colin brings models and techniques from disparate domains (engineering and non-engineering) to bear on socio-technical problems. Colin has seen (and created) many failures in the past, and through this talk, he hopes you can learn from and benefit from them.

2022

Track The future of SLOs

Humans First: Using Error Budgets to Keep Your Team Happy and Healthy

Humans are minimal creatures. They need food, water, sleep, sunlight, joy, etc. What a terrible choice for a system that requires consistent performance, right? Well, except for how adaptable humans are! They can do many different things, like writing software in collaboration with others, dreaming up wild plans, and solving constraints in hundreds of dimensions.

In this talk, we’ll pound more nails in the coffin of industrialization by centering human needs at work. We will measure and correct simple indicators of success: PTO, interruptions, total working time, and work outside of work hours. That’s it? Yes. If you nail these SLOs, I guarantee your humans will be happy, healthy, and referring colleagues to your team.

Colin Douch

Systems Reliability Engineer

Cloudflare

2022

Track SLO Stories

You Have an SLO, Whether You Know It or Not

Cloudflare’s Observability Team learned the lesson of implicit SLOs the hard way. When you aren’t actively setting expectations with your customers, they pick their own SLOs, which are often at odds with what you expected to offer in the first place. In this talk, Colin will outline the steps his Observability Team took to get a handle on its customers’ expectations, properly establish SLOs, and use those SLOs to drive further discussions about the reliability of their systems.

In the process, Colin will talk about how damaging these implicit SLOs can be to the perceived reliability of your systems, how to have those conversations with your customers, and what operators can do to avoid them in the first place.

Dotan Horovits

Developer Advocate

Logz.io

Dotan lives at the intersection of technology, product, and innovation. With over 20 years in the hi-tech industry as a software developer, a solutions architect, and a product manager, he brings a wealth of knowledge in cloud computing, big data solutions, DevOps practices, and more. Dotan is an avid advocate of open source software, open standards, and communities. Dotan is an advocate at the Cloud Native Computing Foundation (CNCF), co-organizes the CNCF Tel Aviv meetup group, and runs the OpenObservability Talks podcast, among other activities. Currently working as a developer advocate at Logz.io, Dotan evangelizes observability in IT systems using popular open source projects such as the ELK stack, Prometheus, Grafana, Jaeger, and OpenTelemetry.

2022

Track The future of SLOs

OpenTelemetry: the Open Source Vision for Unified Observability

Everyone wants observability into their systems but finds themselves with too many vendors and tools, each with its own API, SDK, agent, and collectors.

In this talk I will present OpenTelemetry, an ambitious open source project that promises a unified framework for collecting observability data. With OpenTelemetry you can instrument your application in a vendor-agnostic way, and then analyze the telemetry data in your backend tool of choice, whether Prometheus, Jaeger, Zipkin, or others.

I will cover the current state of OpenTelemetry’s various projects (across programming languages, exporters, receivers, and protocols), some of which are not yet GA, and provide useful guidance on how to get started with it.

Emily Gorcenski

Head of Data and AI

Thoughtworks

2022

Track The future of SLOs

A Better SLO for Data-Intensive Systems

Availability is one of the core SLOs that comes to mind when designing any new system. However, in the analytics and business intelligence space, stakeholders still demand an impossible “24/7 uptime” SLO while simultaneously struggling to find ways to improve the system’s reliability. Every data engineer is familiar with broken pipelines, long-running jobs crashing after 11 hours of runtime, and angry analytics stakeholders wondering where the data is or why it isn’t up to date. Let’s solve that problem by designing SLOs that suit the unique characteristics of analytics and business intelligence use cases. This talk proposes such an SLO in a way that should inspire similar thinking, which can be extended to quality, latency, and other data-relevant measures.
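The abstract does not spell out the proposed SLO. As an illustration only, one commonly used data-centric alternative to “24/7 uptime” is freshness: the fraction of checks at which the newest completed load is no older than an agreed staleness limit. In this sketch, the function name, the hourly checks, and the 6-hour limit are all assumptions.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import Sequence


def freshness_sli(load_times: Sequence[datetime],
                  check_times: Sequence[datetime],
                  max_staleness: timedelta) -> float:
    """Fraction of checks at which the newest completed load is at most max_staleness old."""
    if not check_times:
        return 1.0
    loads = sorted(load_times)
    fresh = 0
    for t in check_times:
        i = bisect_right(loads, t)  # loads[:i] completed at or before t
        if i > 0 and t - loads[i - 1] <= max_staleness:
            fresh += 1
    return fresh / len(check_times)


day = datetime(2022, 5, 1)
loads = [day + timedelta(hours=h) for h in (0, 6, 12)]  # the 18:00 load never ran
checks = [day + timedelta(hours=h) for h in range(24)]
print(freshness_sli(loads, checks, timedelta(hours=6)))  # 19 of 24 hourly checks are fresh
```

Unlike an uptime figure, this indicator directly reflects the stakeholder’s question of whether the data is up to date. A missed run costs only the hours during which the data was actually stale.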

Erik Morgan

Professional Chef

8 Hands Farm

Erik Morgan has cooked professionally for 20 years and sometimes even makes meals at home. He started cooking in the fall of 2002 to pursue a small dream and also pay rent, and found that the occupation agreed with him. It is estimated (and likely true) that he cooked over 20,000 burgers to temperature at his first job, at Mick O’Shea’s Irish Pub in Baltimore. Over the next 15 years, he ran or helped run some of the best kitchens on the East Coast, including Ripple in DC and later Zahav and Aldine in Philadelphia. Currently he enjoys a line-cooking semi-retirement at 8 Hands Farm in Cutchogue, Long Island, a small farm with responsibly and organically raised livestock and plants. During the summer of 2013, Erik, while eating lasagna at Maggiano’s in Chicago, saw One Direction leave the building across the street and get into a tour bus.

2022

Track SLOs for everyone

The SLO Food Movement: Error Creep Management in the Restaurant Kitchen

In addition to guests, restaurants also host many errors, great and small. There is therefore a constant challenge to prevent small errors from creeping in as well as to avoid the obvious catastrophes. But when bad things happen, and they always do, how do you get back on track?

Fred Moyer

Observability Engineer

Zendesk

Fred Moyer is an Observability Engineer at Zendesk, where he likes to apply math to ridiculously large sets of data. Fred is a recovering Perl and C programmer who these days likes to hack in Go and Ruby. He is a 2018 Google dev award winner for his Istio adapter, a 2013 White Camel award winner, and an Apache Software Foundation member, and has worked in software engineering and reliability roles for the last 18 years. Fred has two young children and now only rides his bike on the trainer in the garage.

2022

Track The future of SLOs

Don’t Go Chasing Percentiles, Use Histograms if You Want Precision SLO Latency

This talk will discuss the motivations and mathematical reasons for using statistical distributions for latency-based SLOs. The status quo in monitoring and observability tooling relies on percentile-based latency metrics. Those inputs work fine when everything is working well, but as practitioners we are tasked with delivering best-in-class results when things are at their worst.

Come learn a little math as I present proven, at-scale techniques for high-precision latency SLOs using histograms.
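The sketch below shows why averaging per-host percentiles misleads while histograms merge cleanly. It uses a small set of fixed, hypothetical bucket bounds, whereas production systems use far finer buckets, and it is not meant to represent the specific histogram implementation discussed in the talk. Bucket counts from different hosts can simply be added, and a threshold that sits on a bucket bound yields an exact “fraction of requests within target”.

```python
import math
from bisect import bisect_left
from typing import List, Sequence

# Hypothetical inclusive bucket upper bounds in milliseconds; the last bucket catches everything.
BOUNDS_MS = (50, 100, 250, 500, 1000, math.inf)


def to_histogram(latencies_ms: Sequence[float]) -> List[int]:
    counts = [0] * len(BOUNDS_MS)
    for x in latencies_ms:
        counts[bisect_left(BOUNDS_MS, x)] += 1  # first bucket whose bound is >= x
    return counts


def merge(*hists: List[int]) -> List[int]:
    """Histograms from different hosts combine losslessly by adding bucket counts."""
    return [sum(col) for col in zip(*hists)]


def fraction_at_or_below(hist: List[int], threshold_ms: float) -> float:
    """Exact when threshold_ms is one of the bucket bounds."""
    idx = BOUNDS_MS.index(threshold_ms)
    return sum(hist[: idx + 1]) / sum(hist)


def p99_nearest_rank(latencies_ms: Sequence[float]) -> float:
    ordered = sorted(latencies_ms)
    return ordered[math.ceil(0.99 * len(ordered)) - 1]


host_a = [40] * 980 + [800] * 20  # busy host with a slow tail
host_b = [40] * 10                # quiet, fast host
merged = merge(to_histogram(host_a), to_histogram(host_b))
```

In this made-up example, averaging the two hosts’ p99 values gives 420 ms, while the true fleet-wide p99 is 800 ms. The merged histogram correctly reports that about 98.0% of requests completed within 500 ms.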

Greg Patmore

Contino

For more than 15 years, Greg has been leading teams in a number of industries including Financial Services, Ad Technology, Real Money Gaming, and Streaming Content Providers.

Over the last 4 years, Greg has been leading teams for Contino on a variety of Cloud and Devops projects that are truly transforming the organizations he works with. His projects are centered around rapid assessment, cloud transformation, workload and technology optimization, security and compliance integration, and process engineering.

2022

Track The future of SLOs

Using a Service Canvas to Define SLOs

Gwen De Leon

Site Reliability Engineer

IAG

2022

Track SLO FUNdamentals

Defining SLOs When You Don't Know Anything About SLOs

In this talk we walk through our SLO definition workshop, a facilitated session that we used at IAG as an experiment to help teams embed customer focus. We talk openly about what did and did not work, and the experimentation and adjustments we made along the way.

Heidi Waterhouse

Developer advocate

LaunchDarkly

Heidi is a developer advocate with LaunchDarkly. She delights in working at the intersection of usability, risk reduction, and cutting-edge technology. One of her favorite hobbies is talking to developers about things they already knew but had never thought of that way before. She sews all her conference dresses so that she’s sure there is a pocket for the mic.

2022

Track The future of SLOs

Uh-oh: How Automating Responses Saves Your SLO

Service-level objectives are about the way your users experience your system. We spend a lot of time trying to ensure that the system meets those objectives, but because of the nature of systems, humans, and fate, sometimes our systems have problems.

When that happens, how do we respond? Is it a panic? Is it a manual method? Is it something that requires people noticing something and reacting the right way?

Planning and automating our response as much as possible saves both time and our SLO number, even if we don’t have a precise answer for what has gone wrong. In this talk, we’ll walk through a couple of examples of how you can use standard tools to preserve your service levels.
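The following is a hypothetical sketch of one automated response: a kill switch that turns a feature off when its recent error rate crosses a threshold, so nobody has to notice and react first. The class, its parameters, and the in-memory `enabled` flag are invented for illustration and are not any feature-flag vendor’s API. In a real system, the point where the flag flips is where you would call your flag service or trigger a rollback.

```python
from collections import deque


class KillSwitch:
    """Hypothetical guard that flips a feature off when its recent error rate is too high."""

    def __init__(self, window: int, max_error_rate: float, min_samples: int) -> None:
        self.outcomes = deque(maxlen=window)  # sliding window; True means the request succeeded
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples        # avoid tripping on a handful of requests
        self.enabled = True

    def record(self, ok: bool) -> None:
        self.outcomes.append(ok)
        if not self.enabled or len(self.outcomes) < self.min_samples:
            return
        error_rate = self.outcomes.count(False) / len(self.outcomes)
        if error_rate > self.max_error_rate:
            # The automated response: turn the feature off without waiting for a human.
            self.enabled = False
```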

Ian Bartholomew

Site Reliability Engineering Manager

Nobl9

2022

Track SLO FUNdamentals

What Does "Reliability" Actually Mean?

One of the most important things when developing SLOs is to have meaningful SLIs. A key characteristic of meaningful SLIs is that they focus on reliability from the user’s perspective. This talk will focus on what reliability is and how to integrate it into the creation of meaningful SLIs and, ultimately, better SLOs.

Ioannis Georgoulas

Senior Site Reliability Engineering Manager

Paddle.com

Ioannis is a Senior SRE Engineering Manager at Paddle.com. He is an SLO evangelist and practitioner with an obsession for measuring anything that matters to users and the business.

2022

Track SLO FUNdamentals

SLIs the Hard Way

In this talk, Ioannis will focus on the challenging task of setting up SLIs. He uses real-world examples to explain why some use cases (cron jobs, queues, low-traffic services, internal tooling, etc.) make it hard to set up meaningful SLIs, and presents some tips and lessons learned… the hard way.
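For the cron-job case, one approach is to measure against the schedule rather than against the runs the job reports. Under this approach, a run that never started still counts as bad. The sketch below assumes you can list the expected start times and look up each run’s outcome. The data model and the deadline are hypothetical, and the talk may recommend something different.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional, Sequence


@dataclass
class CronRun:
    finished_at: Optional[datetime]  # None if the run never finished
    succeeded: bool


def cron_sli(expected_starts: Sequence[datetime],
             runs: Dict[datetime, CronRun],
             deadline: timedelta) -> float:
    """Good runs / scheduled runs. Iterating over the schedule means a run that
    never started still counts against the SLI, which a metric emitted by the
    job itself would miss."""
    if not expected_starts:
        return 1.0
    good = 0
    for scheduled in expected_starts:
        run = runs.get(scheduled)
        if (run is not None and run.succeeded and run.finished_at is not None
                and run.finished_at - scheduled <= deadline):
            good += 1
    return good / len(expected_starts)
```

With four hourly slots (one on-time success, one late success, one failure, and one run that never started) and a 30-minute deadline, the SLI is 0.25, even though the job itself only ever emitted three results.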

Isobel Redelmeier

Senior Site Reliability Engineer

Discord

Isobel Redelmeier is a Senior Site Reliability Engineer at Discord, where her dog Mischa helps her keep the SLOs green so everyone can keep hanging out and talking about their dogs.

2022

Track SLOs for everyone

SLOs for the New Dog Owner

Jakub Warczarek

Principal Architect

Nobl9

Jakub Warczarek is the Principal Architect at Nobl9, where he works on the ultimate SLO platform. During his career he has developed a deep love for open source software, cloud native solutions, distributed systems and proper observability.

2022

Track SLO FUNdamentals

How to Answer Tricky Questions the SRE Way

Save money and time with answers to the following questions: Why shouldn’t you aim for 100% reliability? When does making an application twice as fast fail to improve user experience? And what should wake up on-call engineers?
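
One way to see why 100% is rarely the right target is to turn an availability objective into the downtime it permits. The sketch below is an illustration only and is not taken from the talk. It assumes a simple time-based SLO over a fixed 30-day window. Under that assumption, a 99.9% target leaves 43 minutes and 12 seconds of error budget. A 100% target leaves none, so every deploy, dependency blip, or maintenance window becomes a violation.

```python
from datetime import timedelta


def error_budget(target: float, window: timedelta) -> timedelta:
    """Downtime allowed by a time-based availability target over a window.

    `target` is a fraction, e.g. 0.999 for "three nines". This assumes a
    simple time-based SLO; request-based SLOs budget failed events instead.
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0 and 1")
    # timedelta * float is supported and rounds to the nearest microsecond.
    return window * (1.0 - target)


window = timedelta(days=30)
for target in (0.99, 0.999, 0.9999, 1.0):
    print(f"{target:.2%} over 30 days -> {error_budget(target, window)}")
```

Each extra nine divides the budget by ten. The jump from the last finite budget to zero is where cost rises sharply and user-visible benefit usually stops.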

James Strong

Lead Solutions Architect

Chainguard

James joined Chainguard after a long stint of helping customers migrate to the Cloud and Kubernetes. Security was the number one issue he saw during these migrations, and he now wants to help customers secure their supply chains. James is also the author of O’Reilly’s Networking & Kubernetes, a KubePhilly Meetup organizer, and an A Cloud Guru instructor. When he is not at a computer, you can find him in the gym doing Olympic weightlifting.

2022

Track The future of SLOs

SLOs For Your Software Supply Chain Security

Supply chain security is paramount to your service level objectives. A compromised system impacts services and your company’s reputation. A rogue dependency will demolish your error budget and is 100% preventable. In this talk, we will discuss SLOs from a security perspective that developers and organizations as a whole should measure to increase the security of their services, supply chains, and peace of mind.

SLOs extend to the entire supply chain, not just operations. Adding SLOs to the development of services improves SLOs at runtime in an application’s life cycle. We will review organization-level metrics that companies should be enforcing, such as the number of signed containers, the number of container image sources, and the MTTR of a CVE, among others that affect the security of the development, the build, and ultimately the runtime of your services. This presentation is for anyone responsible for their software pipeline, from developers who can learn which security measures they should be implementing for their services, up to CISOs looking to improve the supply chain security of their organization.
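
The supply-chain metrics named above can take the same good-events-over-total-events shape as any other SLI. This is a hedged sketch, not the speaker’s method. The remediation records, the 7-day threshold, and the 90% objective are all hypothetical. "MTTR" is read here as mean time to remediate.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical remediation log: (identifier, detected at, fixed at).
# Placeholder identifiers stand in for real CVE numbers.
Record = Tuple[str, datetime, datetime]

records: List[Record] = [
    ("CVE-A", datetime(2022, 5, 1), datetime(2022, 5, 3)),   # 2 days
    ("CVE-B", datetime(2022, 5, 2), datetime(2022, 5, 12)),  # 10 days
    ("CVE-C", datetime(2022, 5, 5), datetime(2022, 5, 6)),   # 1 day
    ("CVE-D", datetime(2022, 5, 9), datetime(2022, 5, 13)),  # 4 days
]


def mean_time_to_remediate(records: List[Record]) -> timedelta:
    """Average time from detection to fix across all records."""
    if not records:
        raise ValueError("no remediation records")
    total = sum((fixed - detected for _, detected, fixed in records), timedelta())
    return total / len(records)


def fixed_within_sli(records: List[Record], threshold: timedelta) -> float:
    """Fraction of CVEs fixed within `threshold` (good events / all events)."""
    if not records:
        raise ValueError("no remediation records")
    good = sum(1 for _, detected, fixed in records if fixed - detected <= threshold)
    return good / len(records)


TARGET = 0.90  # hypothetical objective: 90% of CVEs fixed within 7 days
sli = fixed_within_sli(records, timedelta(days=7))
print(f"MTTR: {mean_time_to_remediate(records)}")
print(f"SLI: {sli:.0%} (objective {TARGET:.0%}, met: {sli >= TARGET})")
```

Tracking the percentage fixed within a threshold, alongside the mean, keeps a single long-lived CVE from hiding behind many quick fixes.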

Jan Ritter

DevOps Engineer Lead

Trusted Shops

Jan Ritter is DevOps Engineer Lead at Trusted Shops.

2022

Track SLO FUNdamentals

Good Reasons for SLOs in Less Than 10 Minutes

For both engineering and product / business, there are good reasons to use service level objectives (SLOs). This talk will, based on concrete examples, give you some of these reasons to start your SLO journey.

Jason Yee

Director of Advocacy

Gremlin Inc.

Jason Yee is Director of Advocacy at Gremlin, where he helps people build more resilient systems by learning from how they fail. Previously, he was Senior Technical Evangelist at Datadog, a Community Manager for DevOps & Performance at O’Reilly Media, and a Software Engineer at MongoDB. Outside of work, he likes to spend his time collecting interesting regional whiskey and Pokémon.

2022

Track The future of SLOs

Budgets are for Spending

Whenever the news would announce that some extremely wealthy person died, my dad would remind me, “You can’t take it with you.” The statement was intended to frame money as a resource—something to be used with intention, not squandered or hoarded.

With reliability, we’ve adopted the financial framing of Error Budgets, and along with it we’ve also adopted some poor financial habits. This often leads to teams wasting their Error Budgets on unforeseen incidents or simply not using them at all.

In this session, I’ll share how to spend your Error Budget in order to improve your applications, your engineering teams, and the overall success of your organization.

Jayesh Ahire

Product Manager

Last9

Jayesh Ahire is the Product Manager at Last9. He is an AWS ML Hero, a Twilio Champion, and a maintainer of the OSS project Hypertrace. He organizes the AWS UG, Elastic UG, TensorFlow UG, Microsoft AI community, and many other communities in India. His research interests have included distributed neural computers and DeFi. In his free time, he likes to read, and he is learning to play the piano.

2022

Track SLO FUNdamentals

Monitoring Services Not Servers

The Internet and pets have an old relationship. It started with the infamous Pets.com. While the business unfortunately crashed, it established that online was here to stay. To run a business online, we used to buy server hardware for operations. We named these servers with respect, after animals, dragons, Star Wars, wines, or movie characters. Just like our pets. Fast forward to today, and infrastructure is overwhelmed with pets again. This time around, we are exchanging pet photos, not pet supplies. Suddenly, we have a flock of these servers at our disposal. As the scale evolved, the rise of service-oriented architecture was inevitable. But in this microservices-led world, what has become absolutely clear is that one needs to start monitoring services, not servers.

In this talk, we intend to explain the rationale behind monitoring services and not servers, define the different types of services available today, and define the key SLOs that should be measured for each type of service. We will also cover how SLOs can help you settle the ultimate debate of features vs. stability and address cascading impacts, which are so common in today’s world of distributed systems.

Jennifer Robertson, M.S., CCC-SLP

Speech-Language Pathologist

Jen Robertson is a Speech-Language Pathologist who has worked with people with developmental disabilities for the past fifteen years. Her career has focused primarily on Augmentative and Alternative Communication, dysphagia, and play-based intervention. Jen currently works in pediatrics at Helen Keller Children’s Learning Center. She enjoys horror movies, crossword puzzles (both solving and constructing), and post-punk. She lives in Brooklyn with her partner Alex and their dog Taco.

2022

Track SLOs for everyone

Now You're Talking: SLIs & SLOs for SLPs

SLIs and SLOs aren’t just for tech! While trying to teach small children how to communicate, I need a way to keep track of what they’re doing, what they should be doing, and why. The objectives SLPs set are critical not just for data, but to ensure that what we’re keeping track of actually matters.

John Willis

Red Hat

2022

Track The future of SLOs

Dr. Deming Would Have Hated OKRs, but Would Have Loved SLOs

Julie Gunderson

Senior Reliability Advocate

Gremlin Inc.

Julie Gunderson is a DevOps Advocate at PagerDuty, where she works to further the adoption of DevOps best practices and methodologies. She has been actively involved in the DevOps space for over five years and is passionate about helping individuals, teams and organizations understand how to leverage DevOps and develop amazing cultures. Julie made a career developing relationships and building communities. She has delivered talks at conferences such as Velocity, Agile Conf, OSCON and more, as well as being a contributor to opensource.com and TechTarget. Julie is also a founding member and co-organizer of DevOpsDays Boise.
In her off time Julie can be found either traipsing through the mountains in Idaho, or making circuit boards into wearable art.

2022

Track The future of SLOs

Reducing Trauma in Production with SLOs and Chaos Engineering

Customer experience is the responsibility of the entire team. Many organizations leave reliability up to the SRE team; however, reliability should be built in from the very beginning. In this talk, Mandi and Julie will discuss what Service Level Objectives are, why they are important to the organization, and how to define and set them. Going beyond SLOs, attendees will learn what Chaos Engineering is and practical ways to ensure compliance and resilience with best practices. We’ll show you how to focus your goals and error budgets with examples that will lead to reliability and improved user experience.

Kasia Zemka

Engineering Manager & Software Engineer

Nobl9

Kasia works as both Engineering Manager and Software Engineer at Nobl9, where she began her journey with SLOs.

Kasia’s adventure with programming and the IT industry began 10 years ago. Over those years, she has worked in various roles and found fulfillment in many fields, including frontend development, testing, team leadership, and management. Currently, she uses all the experience she gained during this time to build reliable software and spread domain knowledge. And with that said, Kasia is glad to work at Nobl9, a startup where she can pursue this field on a daily basis.

2022

Track SLOs for everyone

SLOs are Not Only for SREs

When you think about SLOs, you probably think about SREs. But if we think about the whole idea behind SLOs and SLO culture, we quickly realize it is a great tool for teams to stay on track with their goals. One of the best ways to reach consensus here is to share a common understanding of what our objective is and when good is good enough. SLOs are a great way to do that, and they are applicable throughout the whole software development process. We have things we monitor and standards we want to meet. In my talk, I want to dive deep into this aspect of SLOs, explaining why they’re not only for SREs and how anyone in our industry can use them in their everyday work.

Keri Melich

Senior Site Reliability Engineer

Nobl9

Keri is an SRE for Nobl9 working to help scale and secure the Nobl9 platform. Before that she spent 4 years at Squarespace building secure, scalable solutions for internal users. In her free time, she loves crafting, backpacking, and snowboarding.

2022

Track SLOs for everyone

Growing Business Ops Through SLOs

SLOs are a great tool for helping engineers communicate with the business. But in this talk, we’re going to look at SLOs from the perspective of everyone except an engineer. We’ll touch on SLAs, Roadmapping, and the monetary benefits of setting realistic reliability goals.

Kit Merker

COO

Nobl9

Kit Merker's 20+ year career spans product management, engineering, evangelism and community-building roles at Google, Microsoft, JFrog, and the governing board of the Cloud Native Computing Foundation (CNCF). He is currently Chief Operating Officer for Nobl9, the service level observability company, helping software teams optimize their delivery to make customers happy and business sustainable.

2022

Track The future of SLOs

The SLODLC: SLO Development Lifecycle

An overview of the SLODLC, a repeatable methodology for adopting SLOs across the organization. You’ll get a walkthrough of the methodology, the resources, examples, and how to get started in your organization.

Leo Vasiliou

Director of Product Marketing

Catchpoint Systems

With more than 15 years of experience leading production operations, web performance, and security programs, the battle scars Leo carries to this day are part of the reason why he now works in marketing, evangelizing the correct way to think about monitoring and observability ecosystems. Leo is the primary author and analyst of the annual Site Reliability Engineering report. He also has a passion for data analysis as applied to monitoring and web performance data. He started his career in IT infrastructure and Operations in the United States Air Force.

2022

Track SLO FUNdamentals

Perform: How many Nines? Depends on Accumulation

Meet the powerful analytic for performance-based SLOs. This talk starts from the observation that most SLO teaching material illustrates SLO concepts with an internal, non-cumulative endpoint (e.g., how many GET requests to /API succeed). It then shows that when setting an SLO for a cumulative endpoint (e.g., an app or page consisting of many distributed requests), the number of nines for the objective must be adjusted accordingly. In other words, three or four nines may be acceptable for /API, but three or four nines for an experience-based (cumulative) endpoint is not practical. In this session, we will discuss the adjustments needed for experience-based (cumulative) endpoints through both an availability and a performance lens. We will then expand on the performance lens and discuss semi-advanced distribution functions for analyzing performance, with the ultimate goal of reliable, resilient experiences that better serve self, team, and business.
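
A rough way to see why cumulative endpoints need different targets uses two simplifying assumptions. First, the page renders correctly only when every one of its N requests succeeds. Second, request failures are independent. Under those assumptions, the page's availability is the product of the per-request availabilities. Retries, caching, and optional resources all break this model, and it is not necessarily the adjustment method presented in the talk. At 99.9% per request, a page making 100 requests lands near 90.5%.

```python
def cumulative_availability(per_request: float, requests: int) -> float:
    """Availability of an experience that needs `requests` calls to all succeed.

    Assumes failures are independent, which real systems rarely guarantee.
    """
    return per_request ** requests


def required_per_request(target: float, requests: int) -> float:
    """Per-request availability needed for the whole experience to hit `target`."""
    return target ** (1.0 / requests)


for n in (1, 10, 100):
    print(f"{n:>3} requests at 99.9% each -> {cumulative_availability(0.999, n):.4%}")

needed = required_per_request(0.999, 100)
print(f"99.9% for a 100-request page needs ~{needed:.6%} per request")
```

Read in reverse, the same relationship shows why three nines for the whole experience would demand roughly five nines from every request it depends on.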

Liz Fong-Jones

Developer advocate, Labor And Ethics Organizer, & Site Reliability Engineer

Honeycomb

Liz is a developer advocate, labor and ethics organizer, and Site Reliability Engineer (SRE) with 16+ years of experience. She is an advocate at Honeycomb for the SRE and Observability communities, and previously was an SRE working on products ranging from the Google Cloud Load Balancer to Google Flights.

She lives in Vancouver, BC with her wife Elly, partners, and a Samoyed/Golden Retriever mix, and in Sydney, NSW. She plays classical piano, leads an EVE Online alliance, and advocates for transgender rights.

2022

Track SLO Stories

Evaluating Event-based SLOs at Scale

How do you evaluate and alert on thousands of SLOs based on millions of incoming telemetry events per second? Originally, we evaluated the data at rest every minute, but we found that inefficient and expensive. Learn how you too can turn high-throughput logs into SLO gold with streaming evaluation.

In the early days of our SLO evaluation system, we ran up an AWS Lambda bill of $10k in a few days before implementing caching. As our system scaled, we discovered that the majority of our total computational resources were devoted just to processing SLOs over the events that had arrived in the last minute, leaving aside interactive user querying or other analysis. Thus, we needed to switch to streaming SLO evaluation based on incoming events, leveraging Kafka consumer groups. In this talk, you’ll learn from our mistakes and be better able to implement streaming SLO evaluation at scale for true real-time SLO computation, action, and iteration on existing SLOs.
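
To make the batch-versus-streaming distinction concrete, here is a minimal single-process sketch of streaming evaluation. Each event updates per-SLO counters once, as it arrives, instead of every SLO re-scanning the last minute of stored data. It is not Honeycomb’s implementation. The event fields, one-minute buckets, and 60-minute window are assumptions. The Kafka consumer groups mentioned above, which would split the event stream across many such workers, are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List

Event = Dict[str, Any]  # hypothetical shape: {"timestamp": epoch_seconds, ...}


@dataclass
class StreamingSLO:
    """SLI kept up to date incrementally as events arrive.

    Counts live in one-minute buckets, so each event is touched once
    instead of re-querying stored data on every evaluation.
    """

    name: str
    is_eligible: Callable[[Event], bool]
    is_good: Callable[[Event], bool]
    window_minutes: int = 60
    latest_minute: int = 0
    buckets: Dict[int, List[int]] = field(default_factory=dict)  # minute -> [good, total]

    def observe(self, event: Event) -> None:
        if not self.is_eligible(event):
            return
        minute = int(event["timestamp"]) // 60  # epoch seconds -> minute bucket
        self.latest_minute = max(self.latest_minute, minute)
        oldest = self.latest_minute - self.window_minutes + 1
        if minute < oldest:
            return  # arrived too late to fall inside the current window
        counts = self.buckets.setdefault(minute, [0, 0])
        counts[1] += 1
        if self.is_good(event):
            counts[0] += 1
        # Drop buckets that have slid out of the window.
        for stale in [m for m in self.buckets if m < oldest]:
            del self.buckets[stale]

    def sli(self) -> float:
        good = sum(c[0] for c in self.buckets.values())
        total = sum(c[1] for c in self.buckets.values())
        return good / total if total else 1.0


def consume(events: Iterable[Event], slos: List[StreamingSLO]) -> None:
    """Fan each incoming event out to every SLO exactly once."""
    for event in events:
        for slo in slos:
            slo.observe(event)


api = StreamingSLO(
    name="api-availability",
    is_eligible=lambda e: e.get("route") == "/api",
    is_good=lambda e: int(e["status"]) < 500,
)
consume(
    [
        {"timestamp": 0, "route": "/api", "status": 200},
        {"timestamp": 30, "route": "/api", "status": 503},
        {"timestamp": 90, "route": "/api", "status": 200},
        {"timestamp": 95, "route": "/web", "status": 500},
    ],
    [api],
)
print(f"{api.name}: {api.sli():.3f}")
```

Reading the SLI only sums at most `window_minutes` buckets, so its cost no longer grows with event volume. The per-event work is a constant-size counter update.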

Luis Parada

Head of Engineering

FARFETCH

Luís Parada is Head of Engineering at FARFETCH leading the Platform Tech Foundations area, currently focused on foundational aspects of the platform such as Observability, CI/CD pipelines, Scalability and Resilience.

Prior to this, Parada led Farfetch ID, the group responsible for FARFETCH’s Identity Provider, Authentication & Authorisation, Customer and Partner Account data.

Parada focuses on the continuous improvement of teams and organisations and on personal development, sharing insights on his YouTube channel and via a Monthly Newsletter called A Leader’s Mindset.

2022

Track SLO Stories

How FARFETCH is dealing with SLOs

FARFETCH is the leading global platform for the luxury fashion industry, with over 2,500 engineers building a platform of around 1,000 services. For us, customer and partner satisfaction is paramount, so the reliability of our platform is extremely important. This is the story of how we’re tackling reliability transparency and how we’re implementing SLOs using the OpenSLO standard.

Lukasz Dobek

Software Engineer

Nobl9

Łukasz Dobek is a Software Engineer that works with cloud-native technologies on a daily basis. He strives to be language-agnostic and to treat programming languages as tools. Most of the time, you can find him building software with Go, JavaScript, or Python. He has experience in DevOps which certainly helps him develop and implement practical, effective, and easy-to-maintain solutions. Working at scale is another thing he can share his knowledge about, be it Kubernetes or Serverless architecture.

Currently, he’s developing a Service Level Objectives platform at Nobl9, helping drive a cultural shift toward the Site Reliability Engineering mindset.

2022

Track SLO FUNdamentals

Service Level Objectives: What Are Those?

As Service Level Objectives gain more and more traction, they’re also becoming a foundation for building reliable and customer-centric software. It’s important to know what they are, how they can help, and how to pick them. This quick talk aims to go through the very basics of SLOs and enable beginners to take their first steps in the reliability domain.

Mandi Walls

DevOps Advocate

PagerDuty

Mandi Walls is a DevOps Advocate at PagerDuty. For PagerDuty, she helps organizations along their IT Modernization journey. Prior to PagerDuty, she worked at Chef Software and AOL. She is an international speaker on DevOps topics and the author of the whitepaper “Building A DevOps Culture”, published by O’Reilly.

2022

Track The future of SLOs

Reducing Trauma in Production with SLOs and Chaos Engineering

Matthew Macdonald-Wallace

Principal Consultant

Contino

Currently heading up the Observability Community within Contino, Matt has a history of helping organizations monitor and track anything from air quality and viticulture to systems with thousands of servers and instances running public cloud platforms.

Matt’s career started in first-line tech support over 20 years ago. Since then, he has worked at all levels of customer support, managed a shared web hosting cluster serving data and email for over 500,000 sites, held roles as a software developer building a cloud platform for public use, and provided consultancy services to some of the UK’s best-known companies.

Matt’s experience of working with physical hardware, IoT devices, and cloud platforms has led him to a strange position in the world of observability, in which he has been able to monitor servers, serverless, and even farm animals using the same set of open source tools, tools that he continues to recommend to customers.
When he’s not preaching about the wonders of what good dashboards look like, he helps run the local Makerspace, plays his guitar, and spends time with his family. Of course, he’s also frequently seen out and about with the family dog!

2022

Track SLO FUNdamentals

Automated deployments of SLOs Using Sloth and Prometheus

In this talk, I’ll take you on a quick tour of how Contino.io helps our customers manage and deploy their SLIs and SLOs using https://sloth.dev

We’ll cover the CI pipeline commands that we use to validate, generate, and deploy SLIs and SLOs alongside the application, and the different options available for Prometheus, Cortex, and Grafana Cloud.

We’ve only got 10 minutes, so it will need to be quick, but thankfully the tools available make this a lot easier than you think!

Max Knee

Principal Software Engineer

Comcast

Max is a Principal Software Engineer at Comcast, focused on improving software delivery through tools and practices. When Max is not coding, you can catch Max cooking or baking, hanging out with his dog Bonnie (a rescued greyhound), or riding his bike in the Philadelphia area.

2022

Track SLO FUNdamentals

Defining SLOs for Running a Delivery Tool as a Platform

We at Comcast run ConcourseCI as our company-wide CI/CD tool. To make it easier for us to operate and to identify problem areas, we needed to implement SLIs and SLOs. This raises an interesting problem: how do you implement SLIs and SLOs for an open source project you run when you don’t directly own the code and need to adhere to the maintainers’ vision? We did exactly that by using the project’s existing metrics to define our own SLIs and SLOs, ensuring that our instance performs up to our needs and standards.

Michael Ericksen

Site Reliability Engineer

Intelligent Medical Objects

An electric bike hive member, Michael works as a Site Reliability Engineer and obsesses about service level objectives and measuring what matters to users. When he’s not working (or on call), you’ll likely find him picking up kids from school, grocery shopping, and running other local errands on his bicycle.

2022

Track SLOs for everyone

Reliability Lessons from the Back of a Bicycle: 100% is Never the Goal

According to a 2018 study by the US Department of Transportation, almost 60% of vehicle trips are less than six miles and 75% are less than ten. This talk discusses anthropogenic climate change and examines how service level objectives shaped one family’s strategy to reduce their dependency on cars and fossil fuels. In it, the presenter explores how tools like service level indicators and objectives helped their family side-step impossible goals like eliminating 100% of car trips and instead focus on more realistic ones like “How many trips less than 2 miles might we make by bicycle instead?” To conclude, the presenter discusses how focusing on good enough rather than perfection leaves space for the “reliability events” like a burst appendix or winter in Chicago that periodically interrupt daily life.

Michael Friedrich

Senior Developer Evangelist

GitLab Inc.

Michael Friedrich is a Senior Developer Evangelist at GitLab focussing on Observability, SRE, and Ops. He studied Hardware/Software Systems Engineering and moved into DNS and monitoring development at the University of Vienna and ACO.net. Michael was a maintainer of an OSS monitoring software for 11 years before joining GitLab. He loves to help educate everyone and regularly speaks at events and meetups. Michael co-founded the #EveryoneCanContribute cafe meetup group to learn cloud-native & DevOps. Michael is a Polynaut advisor at Polywork, created o11y.love as a learning platform for Observability, and shares insights in the opsindev.news newsletter.

2022

Track The future of SLOs

Left Shift your SLOs with Chaos

Developers and SREs are instrumenting applications and applying observability workflows with metrics, traces, logs, and beyond. The first service level objective (SLO) is defined. Now what? Wait for the first production incident?

Think of day-2 operations: SLOs need to be well understood and simulated early in the development process. False-positive alerts can lead to on-call fatigue. How do you simulate an incident? Add chaos to production, simulate network failures, broken apps, etc., and validate the SLOs. Developers can add their own chaos experiments too.

Join this talk to learn how SLOs can be shifted left with chaos, and get inspired by new tools and workflows for your production environment.

Nikhil Unni

Co-Founder & Chief Architect

Cortex

Nikhil is the Co-Founder & Chief Architect of Cortex.io.

2022

Track SLOs for everyone

We’re Not the Bad Guys: Extending Your Influence as an SRE

To start, how do you get people to care about reliability? At the end of the day, all engineering work should tie back to the business. Underscore that reliability leads to money saved in the long run. Maintaining SLOs, minimizing downtime, and speeding up developer productivity all have a real dollar amount saved for businesses.

Next, you need to align with leadership and engineers on standards. Avoid subjective criteria; instead, define objective standards such as SRE readiness checklists and SLO adoption targets. This will not only allow you to take stock of the current state of the world, but will also let you track your reliability journey over time.

After that, make it as easy as possible for your engineers to adopt best practices: use automation, tooling and work with counterparts in infra/platform to drive org-wide adoption of tools and processes. It’s much easier to get an organization aligned on the “right way” to do things if it’s seamless for them to do so.

Finally, after driving reliability initiatives for several quarters, prove the value of the work. Tie it back to the business use case, and track results to make sure that the investment is paying off.

Pedro Alves

Zalando

Pedro has been developing back-end code for web apps since 2008 and has been at Zalando since 2013. He has worked in different areas of Zalando’s business and is now on the SRE team, ensuring people can buy shoes reliably.

2022

Track The future of SLOs

Operation Based SLOs

The industry by and large assigns SLOs to services (it is in the name, after all). At Zalando, we did the same thing, but we weren’t getting all the nice benefits we were expecting. Two things we struggled with were managing the vast collection of SLOs for our hundreds of microservices and translating the performance of all of those SLOs into the performance of the customer experience.

In this talk, we’ll present a new implementation of SLOs that are defined on end-customer operations. We’ll see what the advantages of these SLOs are for engineers and managers alike. We’ll also look at how distributed tracing can be used to measure these SLOs and how they serve as key ingredients for symptom-based alerting.

Petr Hajek

Senior Engineer

Omio

I am a Senior Engineer at Omio. I enjoy building end-to-end solutions and being responsible for the whole lifecycle. SLOs became an important asset in my toolkit for ensuring a high level of service quality.

2022

Track SLO FUNdamentals

Alerting on High and Low Traffic Using SLOs

This talk describes how we implemented SLO alerting in an environment of 50+ services with varying traffic:

some services generate hundreds of events per hour, while others generate single-digit numbers of events per day
peak vs. off-peak traffic variation is 10x
the monitored services integrate with 3rd-party APIs, which often generate a high volume of errors

Our solution uses standard tooling (Grafana, Graphite, and Terraform), which makes it easy for other teams or organizations to implement.
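
Here is a hedged sketch of one common way to keep SLO burn-rate alerts useful across very different traffic levels. When a window has enough events for the ratio to mean something, alert on burn rate. Otherwise, fall back to an absolute failure count. This is not Omio’s Grafana/Graphite/Terraform setup, and all thresholds are hypothetical. The 14.4 value comes from the frequently cited fast-burn example in Google’s SRE Workbook, which corresponds to spending 2% of a 30-day budget in one hour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowCounts:
    good: int
    total: int


def burn_rate(counts: WindowCounts, target: float) -> float:
    """How fast the error budget is being spent: 1.0 means exactly on budget."""
    if counts.total == 0:
        return 0.0
    error_ratio = 1.0 - counts.good / counts.total
    return error_ratio / (1.0 - target)


def should_alert(
    counts: WindowCounts,
    target: float,
    burn_threshold: float,
    min_events: int,
    min_bad_events: int,
) -> bool:
    """Burn-rate alert that stays quiet on statistically meaningless windows."""
    if counts.total >= min_events:
        return burn_rate(counts, target) >= burn_threshold
    # Low traffic: one failure out of three events is a 33% error ratio,
    # so fall back to requiring an absolute number of failures.
    bad = counts.total - counts.good
    return bad >= min_bad_events


# Hypothetical settings: 99% objective, 14.4x burn over the alert window,
# ratio-based alerting only above 100 events, otherwise 3 or more failures.
SETTINGS = dict(target=0.99, burn_threshold=14.4, min_events=100, min_bad_events=3)

print(should_alert(WindowCounts(good=8500, total=10_000), **SETTINGS))  # high traffic, 15% errors
print(should_alert(WindowCounts(good=2, total=3), **SETTINGS))          # low traffic, 1 failure
print(should_alert(WindowCounts(good=0, total=4), **SETTINGS))          # low traffic, 4 failures
```

The absolute-count fallback also limits noise from chatty 3rd-party APIs on quiet services, because a single upstream error in an otherwise empty window no longer pages anyone.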

Pieter van Noordennen

Senior Director, Growth

Slim.AI

Pieter van Noordennen is the Senior Director of Growth at Slim.AI. A former journalist and product leader, Pieter has held roles at TripAdvisor, Orbitz, Time Inc, and Globe Pequot Publishing. He’s led major consumer-facing technology projects across multiple engineering teams, with a focus on machine-learning driven recommendation systems and content at scale. A graduate of the MIT Sloan School of Management (MBA) and The George Washington University (B.A., Journalism), he writes and speaks frequently about how we can build technology products more effectively. His interests include skiing, sailing, cooking, smoking (the BBQ kind), and learning “Hello World” in every available programming language.

2022

Track SLOs for everyone

Growing Fast, Growing SLO

We know SLOs are great for SREs and DevOps professionals in gauging reliability, managing tech debt, and ensuring the team is working on the right things at the right time. More and more business leaders, especially in product management and related areas, are also embracing SLOs as ways to manage business outcomes. And the best SLO practitioners are adept at showing the impact SLOs have on their customers, not just their up-time.

Pieter van Noordennen is Senior Director of Growth for Slim.AI, a fast-scaling startup working on dev tools for cloud-native apps. Growth is rife with metrics, most commonly tracked in the form of OKRs or KPIs, and when a product is expanding quickly, there are often trade-offs that are difficult to assess and prioritize in the heat of the moment. For instance, should the team focus on increasing the acquisition of new users or focus on more deeply engaging with the users you already have? Most CEOs would want both, but how do we come up with a framework for making these trade-offs in real-time?

A recent convert to SLOs (thanks Kit!), Pieter wanted to see what he could learn from translating the current OKRs and KPIs into SLOs and tracking them that way. In this talk, he’ll share his approach, process, and learning. He will share:

A comparison of SLOs and OKRs as tools for guiding business outcomes
An assessment of how SLOs can be applied to traditional marketing metrics
Learnings, gotchas, and real-life experience in socializing SLOs in a non-traditional function area

This talk is aimed at:

business leaders looking to assess the use of SLOs in their daily work
startup founders and early employees interested in new approaches to measuring key metrics in their growth journey
and SLO practitioners interested in expanding the use of SLOs in their organizations to new functional areas.

Piotr Ptak

Software Engineer

Nobl9

Who’s Piotr?
Known mainly as a human being living in Poznań, Piotr is a programmer interested in all things computers. He’s a free software enthusiast who currently focuses on the world of distributed systems. Although he is often stuck in his own head, you can find him wandering through different dimensions playing guitar or harmonica, drawing things that grab his attention, and talking about SLOs.

He’s constantly searching for new ways of communicating with nature, such as hiking and climbing. And for unknown reasons, he’s fond of staying deep in the forest at night.

2022

Track SLOs for everyone

Write Your Code with SLOs in Mind

The Software Development Life Cycle consists of multiple phases, such as Requirements Analysis, Design, Implementation, Testing, and Deployment. As more and more people adopt the Site Reliability Engineering philosophy, the question that comes to mind is: how do SLOs fit into this cycle? How do we incorporate SLOs into the methodologies and techniques that the software development industry has built and polished over the years?

Adding SLOs as the final step of this cycle may seem tempting (to some), but there are many advantages to doing so in the earlier phases. In this talk I’m going to discuss these benefits and highlight some principles and practices that help teams follow this approach.

Prathamesh Sonpatki

Software Engineer

Last9

Prathamesh works at Last9, where he manages the SLOs of his own services and builds ways for other people to manage theirs.

2022

Track SLO FUNdamentals

Custom SLOs for all: Manage Your Business With Your Own Metrics

As software becomes more and more complicated, it is essential to give users the power to measure and optimize whichever metrics matter most to their business. This becomes important when out-of-the-box solutions fail to give users the flexibility to choose their own metrics, and users then tend to move to open source tools that provide such leeway.

This talk is intended for such users: power users who write their own PromQL queries and heavily leverage existing tools on the market to build custom, tailor-made monitoring solutions for their organizations, OR up-and-coming users who intend to go down this rabbit hole.

Rama Kulasekaran

Service Reliability Advocate

Optum

I help my team focus on high-value work and align engineering efforts with business goals. I am a developer advocate as well as a service reliability advocate. I enjoy public speaking at conferences and meetups about engineering management, site reliability engineering, engineering culture, building high-performance teams, psychological safety, and leadership. I promote early career programs as well as diversity, equity, and inclusion efforts. I love networking with like-minded professionals, as I am always learning.

2022

Track SLO FUNdamentals

Measuring Service Reliability for Greater User Experience: Theory to Practice!

If your business offers a service to customers and assures them of always-on reliability, where and how do you get started? What framework and approach could you use to build a tangible process that makes sense for you and your customers? Perfection and 100% reliability aren’t the goal, but setting measurable, concrete reliability targets will result in happy customers. Start the SLO discussion early in the design process. So, what are SLIs and SLOs? What should your SLO be? Why should you pick SLOs based on historical performance rather than current performance? What other industry-standard metrics should you care about and track while you make better software, faster? What specific measures can keep you from getting paged in the middle of the night because of an unstable service?

Ricardo Castro

Lead SRE

Anova

2022

Track SLO FUNdamentals

First Principles: How to Learn about SLO-based Approaches from Scratch

SRE and SLOs are getting all the hype. Everywhere you look, there they are. You’ve read about them. Watched a few talks. But you’re still having some trouble applying them to your context. Don’t worry, you’re not alone.

Putting SRE and SLOs aside for a bit, your ultimate goal is to amaze your customers. You want to make them happy using your product or service. And you know that happy customers are good for business. SRE and SLOs can help you achieve that.

This talk will reason from first principles. It will break these complicated concepts down into their basic elements and then reassemble them from the bottom up. Reasoning from first principles is one of the best ways to learn new and complex concepts.

Robert Ross

CEO

FireHydrant

Robert is a software engineer turned CEO of FireHydrant. He loves creating products that other developers use as a primary tool in their daily lives. When he is not advocating for reliability as a business metric rather than a software metric, Robert is a huge marching band dork, punk-pop listener, and skier.

2022

Track The future of SLOs

Why You Should Probably Only Alert on SLOs

This talk focuses on moving away from alerting on computer vitals and toward alerting on customer pain as measured through SLOs. We’ll focus on symptoms, not causes, and on how this can better tie your product/engineering organization to customer satisfaction. It’s a highly energetic talk that will question a norm that almost every organization I’ve worked at has operated under: “I’ll just page someone when CPU is greater than 80%.” Why? I say that if customers are as happy as they were before the alert went out, we should be cheering about that with a pizza party.

Sal Furino

CRE

Sal Furino is a Customer Reliability Engineer. During his career he's worked as a TPM, SRE, Developer, Sys Admin, and in IT support. While not working he enjoys cooking, gaming, traveling, skiing, and golfing. Sal lives in Queens with his partner and has a BS in Applied Mathematics from Marist College.

2022

Track The future of SLOs

Life of an SLO

Sal Kimmich

Product Strategist, Developer Advocate

Reliably

Sara Kimmich is a full-stack developer and certified scrum master focused on building better-than-agile systems in the workplace. With experience leading teams ranging from small, co-located agile software teams to distributed online teams of nearly a thousand, she teaches practical and quantitative strategies for successful tech development. She cares about the open web, believes in equal access to education, and is passionate about how the internet can be a force for good in the world.

2022

Track SLOs for everyone

Culture Clash: Why DevOps, SRE and Cybersecurity Teams Have Different Motivations and Professional Cultures

The most beautiful thing about SRE is your error budget, but in cybersecurity there is no room for an error budget when a critical vulnerability is found in what we call a “zero day” event. Here we’ll talk about shared SLOs that serve to keep source code running while keeping vulnerabilities out. This is a great talk for SREs, cybersecurity engineers and researchers, and especially the managers who oversee both of them. The methods and motivations behind accelerating delivery and tolerating some risk with an error budget are incredibly different from the cybersecurity approach of zero tolerance for critical vulnerabilities. Both approaches are absolutely necessary to keep software running in the real world. This talk will show how we can structurally support shared aims with SLOs, and, most importantly, you’ll walk away with a little more empathy for two dev teams who typically only come together to put out fires, building a better culture across teams in the modern DevOps-driven enterprise.

Shelby Spees

Site Reliability Engineer

Equinix Metal

Shelby Spees is a site reliability engineer who’s been making the tech industry more accessible and equitable through better engineering practices since 2015. Shelby lives in Los Angeles, CA, where she enjoys making up songs about her rescue pitbull, Nova.

2022

Track The future of SLOs

Intro to Tracing-Based SLOs

Distributed tracing is becoming more accessible with libraries like OpenTelemetry, and custom instrumentation helps us capture data that better represents the user experience. Shelby Spees, SRE at Equinix, shares the benefits of tracing-based SLOs and how to define SLIs for trace data using examples from Equinix Metal.

Stefan Zier

Chief Architect

Sumo Logic

Stefan Zier is the Chief Architect at Sumo Logic. He enjoys building and running large distributed systems. He is also one of the key influencers for Sumo Logic’s DevSecOps culture. Prior to Sumo Logic, Stefan was a key contributor to several products at ArcSight (now HP).

2022

Track SLOs for everyone

Customer-Centric SLOs: Track SLOs for Every Customer

As an enterprise vendor, we don’t have millions of customers, but thousands, each with specific needs and behaviors. In this talk, I will walk you through how we mapped “is Sumo Logic working” to concrete SLOs, and how we track them on a per-customer basis. We use this data to surface new issues that would have been lost in a “global SLO” but may have major impact on individual customers.

Stephan Mousset

Lead Reliability Advocate

ING

Stephan’s passion is helping ING’s IT systems and services always match our typical users’ expectations. In his role as Lead Reliability Advocate, Stephan has so far taught Google’s Art of SLOs class to hundreds of ING engineers. He is also a frequent speaker at ING about Google’s SLI, SLO, and Error Budget concepts and the good they will bring to our company, and since November 2021 he has been responsible for implementing them fully across ING globally. Stephan is also an innovative Product Owner responsible for performance test automation within ING and the proud founder of ING’s Performance Guild. He tells the story of this journey at https://testguild.com/podcast/performance/p85-stephan/. In his free time, Stephan enjoys quality time with his family and the occasional craft beer with friends.

2022

Track SLO Stories

How to Plant the SLO Seed, from Availability Reporting to SLO: A Real Life Story from the ING

ING is a major global financial institution with many external stakeholders and regulators who request availability reporting from us to prove that we meet the availability targets set by our regulators. But how do we translate this into the experience of our typical customers, and how do we steer on meeting their expectations? And how do we use it to steer our squads on where to put their engineering power to get the maximum value for our customers? This is where the concept of SLIs, SLOs, and Error Budgets comes in, giving us an opportunity to steer not only on end-to-end availability but also on all the other quality criteria our typical customers expect from us. In this story I will explain our journey so far and how we are making this a global ING standard. Expect a practical talk with a lot of usable real-life examples.

Stephen Townshend

Site Reliability Engineer

IAG

Stephen pretended to be a performance engineer for thirteen years, and very recently started pretending to be an SRE. However, he is actually an actor playing the role of a Site Reliability Engineer. He lost touch with reality at some point and is no longer sure whether he is acting or whether this has become his reality. In the words of Robert Downey, Jr. in the film Tropic Thunder: “I know who I am. I’m a dude playing a dude disguised as another dude.”

2022

Track SLO FUNdamentals

Defining SLOs When You Don't Know Anything About SLOs

Steve McGhee

Reliability Advocacy Engineer

Google SRE

Steve was an SRE at Google for about 10 years, then left to help a company move to the Cloud. He's back at Google, helping more companies do that.

2022

Track SLO Stories

SLO Classical Lit

How can Greek literature help us understand Internet services? In this talk, Steve will draw a few points from this ancient literature and show how they relate to our modern Internet world.

Timothy Bonci

Principal DevOps Engineer

Cimpress

Tim (he/him) wears many hats and identifies with “the jack of all trades,” having worked many different roles across different industries - E-commerce, A/V, broadband engineering, even managing a wine and cheese shop. He is passionate about staffing, problem solving, having a wide-angle view of things, and can be a bit of an IT adrenaline junkie. Home is just north of Providence, RI, where he lives with his wife and 2 children on their alpaca and chicken farm.

2022

Track SLO Stories

Measure What Matters: How it Started and How it's Going Providing a CI/CD Platform

When I started building a new platform, I had to decide when an issue was important enough to be woken up for, so I needed to establish an SLO. What I found is that measuring what is easy to measure doesn’t always capture the intent of your SLI. I had to iterate a few times and take new things into consideration as the proverbial water drained and uncovered more rocks.

Weyert de Boer

Head of App Store Engineering

Tapico

Weyert is a converted interaction designer who is passionate about building amazing, user-friendly digital products that are a joy to use! His developer story in short: from Flash to the Web.

Weyert also contributes to various communities and is part of the OpenSLO team, helping to define SLOs in a declarative way.

In his spare time, Weyert enjoys reading about ancient history, dabbling in paleoanthropology, helping developers out in various communities, and trying to get better at oil painting.

2022

Track The future of SLOs

OpenSLO Alerting

As part of the OpenSLO specification team, I have worked on defining an extension that allows you to define alerting as part of your OpenSLO definitions. In this talk, I will cover how you can define alerts for your SLOs in OpenSLO.

Media Sponsors

DevOps.com

DevOps.com hosts a variety of articles, videos, podcasts and custom content, all designed to educate, inform and engage.

TFiR

TFiR is a video-focussed story-telling platform covering Open Source, Cloud Native Computing, Security, Edge, 5G & AI/ML.

The New Stack

For developers and engineers building and managing new stacks around the world that are built on open source technologies and distributed infrastructures.

VMblog

VMblog.com is dedicated to spreading the word about modern Data Center technologies like Virtualization, Cloud Computing, Containers, Hyperconvergence, IoT, Software-Defined "X", etc.

SLOconf 2022

Speakers

Ajuna Kyaruzi

Ensuring Reliability using SLO Burn Rates

Aleksander Tarraro

What is an Error Budget For?

Alex Hidalgo

SLOs for Everyone

Alex Rasmussen

The Real World Math and Implications of S3's 99.999999999% Advertised Durability

Andreas Grabner

Tips for Running Successful SLO Workshops in Under One Hour

Andrew Newdigate

Everyone Can Contribute to Our SLO

Andrew Snyder

Using a Service Canvas to Define SLOs

Ashutosh Agrawal

Using SLOs at Scale: How Disney+ Hotstar Streams One of the Biggest Sports Tournaments in the World!

Austin Krauza

What Are SLIs and Why Should I Care?

Austin Parker

Burn Your Dashboards: The Case For SLO-First Monitoring

Bob Van Landuyt

Everyone Can Contribute to Our SLO

Christian Beedgen

Dynamic Environments Need SLOs

Colin Curtin

Humans First: Using Error Budgets to Keep Your Team Happy and Healthy