Saturday, August 21, 2021

On Call At 99

Had a couple of on-call workshops last week, and a few thoughts came out of them:

  • You get a little bump in your wage
  • If you get woken up in the middle of the night and have no idea how to resolve the situation, just escalate to a more senior engineer
  • You always need to document what is happening, as you have seen in post-mortems
  • This is a really great resource for the general reasons for on call, the tools we use and how to respond to alerts
  • Some specific on-call to-dos are provided here by Giacomo
  • Lach's workshop didn't really contain anything too memorable, but I did write this down: when looking at something with a spike in traffic, find correlated metrics in Datadog
  • The ops exercise provided some great challenges for trying out our tools, using things like 99cli and Datadog to spot spikes and anomalies
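
To make the "find correlated metrics" tip a bit more concrete, here is a rough sketch of the idea in plain Python. It does not call Datadog or 99cli at all; the metric names and sample values are made up, and the helper functions `pearson` and `rank_correlated` are hypothetical names for illustration only. It just ranks a few series by how closely they move with a traffic spike.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_correlated(target, candidates):
    """Sort candidate metrics by |correlation| with the target, strongest first."""
    scores = {name: pearson(target, series) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Made-up per-minute samples; every name and value here is hypothetical.
requests = [100, 110, 105, 400, 420, 115]
candidates = {
    "db.connections": [20, 22, 21, 80, 85, 23],
    "cache.hit_rate": [0.9, 0.9, 0.91, 0.4, 0.38, 0.9],
    "disk.free_gb": [50, 50, 49, 49, 48, 48],
}

ranked = rank_correlated(requests, candidates)
for name, score in ranked:
    print(f"{name}: {score:+.2f}")
```

A strongly negative score (like the cache hit rate dropping while traffic spikes) is just as interesting as a strongly positive one, which is why the ranking uses the absolute value.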