Ayelet Sachto - Maintaining Reliable Systems: How to Minimize Incidents’ Impact - WTF is SRE 2023

Container Solutions
We know that failures will occur, but how can we make them ‘hurt less’? How can we reduce their impact? I will cover several methods we can use to reduce customer impact, from both technical and people perspectives, and show how we can use incident response and postmortems to make data-driven decisions.

Incidents are expensive for the business, especially when customers leave because they perceive us as unreliable. But failures will happen: it’s not a question of if, but when. So how can we reduce the impact on our users? In this talk, I will review the production incident cycle, meaning the time during which we are unreliable and our users are unhappy. That cycle includes the time to detect, the time to repair, and the time between failures. I’ll share a few methods for tackling each of those parts to minimize incident impact from both technical and people perspectives, and I’ll expand on how incident response and postmortems help us identify what matters most to us, so that those decisions are data-driven.
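The abstract frames incident impact in terms of time to detect (TTD), time to repair (TTR), and time between failures (TBF). As a minimal sketch, assuming the commonly used SRE approximation that expected user-facing "bad minutes" per year ≈ (TTD + TTR) × impact × (minutes per year ÷ TBF), the code below turns postmortem records into those numbers. The `Incident` record, its fields, and the helper functions are hypothetical illustrations, not the speaker's tooling or formula.

```python
from dataclasses import dataclass
from statistics import mean

MINUTES_PER_YEAR = 365 * 24 * 60


@dataclass
class Incident:
    # Hypothetical postmortem record; all timestamps are in minutes.
    started_at: float   # when the failure began affecting users
    detected_at: float  # when the team noticed it
    resolved_at: float  # when users stopped being affected
    impact: float       # fraction of users affected, 0.0 to 1.0

    @property
    def ttd(self) -> float:
        # Time to detect: failure start until detection.
        return self.detected_at - self.started_at

    @property
    def ttr(self) -> float:
        # Time to repair: detection until resolution.
        return self.resolved_at - self.detected_at


def mean_time_between_failures(incidents: list[Incident]) -> float:
    # Average gap between consecutive failure starts.
    # Needs at least two incidents; statistics.mean raises on an empty list.
    starts = sorted(i.started_at for i in incidents)
    return mean(b - a for a, b in zip(starts, starts[1:]))


def expected_bad_minutes_per_year(ttd: float, ttr: float, tbf: float, impact: float) -> float:
    # Each failure hurts users for (ttd + ttr) minutes, scaled by impact,
    # and failures occur roughly MINUTES_PER_YEAR / tbf times per year.
    return (ttd + ttr) * impact * MINUTES_PER_YEAR / tbf


def summarize(incidents: list[Incident]) -> dict:
    ttd = mean(i.ttd for i in incidents)
    ttr = mean(i.ttr for i in incidents)
    tbf = mean_time_between_failures(incidents)
    impact = mean(i.impact for i in incidents)  # rough: averages impact separately from duration
    return {
        "mean_ttd": ttd,
        "mean_ttr": ttr,
        "mean_tbf": tbf,
        "bad_minutes_per_year": expected_bad_minutes_per_year(ttd, ttr, tbf, impact),
        # The longer phase is a candidate first target for improvement.
        "focus": "detection" if ttd > ttr else "repair",
    }


if __name__ == "__main__":
    history = [
        Incident(started_at=0, detected_at=10, resolved_at=30, impact=0.5),
        Incident(started_at=10_000, detected_at=10_010, resolved_at=10_030, impact=0.5),
        Incident(started_at=20_000, detected_at=20_010, resolved_at=20_030, impact=0.5),
    ]
    print(summarize(history))
```

With these made-up records, repair time dominates detection time, so the sketch points at repair as the first place to invest. Shortening either phase, or lengthening the time between failures, lowers the expected bad minutes.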

Ayelet Sachto, Cloud Engineer @Google
Ayelet on Twitter https://bit.ly/3I2tgSN

This presentation was recorded at WTF is SRE 2023, London. #WTFisSRE

Learn more about Container Solutions events:
https://www.container-solutions.com/e...

WTF is Cloud Native Collections:
https://www.container-solutions.com/w...

Links to our social media channels:
LinkedIn - https://bit.ly/3I0NMD2
Twitter - https://bit.ly/3I0NLyY
YouTube (subscribe to our channel) - https://bit.ly/3VXMUFg
Published on 1403/01/03 (Solar Hijri calendar; 22 March 2024).