Had a couple of on-call workshops last week and have some thoughts that came out of them:
- You get a little bump in your wage
- If you get woken up in the middle of the night and have no idea how to resolve the situation, just escalate to a more senior engineer
- You always need to document what is happening, as you have seen in post-mortems
- This is a really great resource for the general reasons for on-call, the tools we use, and how to respond to alerts
- Some specific on-call to-dos are provided here by Giacomo
- Lach's workshop didn't really contain anything too memorable, but I did write this down: when looking at something with a spike in traffic, find correlated metrics in Datadog
- The ops exercise provided some great challenges for trying out our tools, using things like 99cli and Datadog to spot spikes and abnormalities
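
To make the "find correlated metrics" tip from Lach's workshop a bit more concrete, here is a rough sketch of the idea in plain Python. It does not call Datadog at all: it assumes you have already exported a few metric series as equally spaced lists of numbers, and the metric names and values below are made up for illustration. It ranks the other metrics by how strongly they move with the traffic series, using the Pearson correlation coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series tells us nothing about correlation
    return cov / sqrt(var_x * var_y)


def rank_correlated(target, candidates):
    """Return (name, r) pairs sorted by absolute correlation, strongest first."""
    scores = [(name, pearson(target, series)) for name, series in candidates.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical sample data: a traffic spike in the 4th and 5th samples.
requests = [100, 105, 98, 400, 420, 110]
candidates = {
    "latency_ms": [20, 21, 19, 80, 85, 22],
    "cpu_percent": [30, 31, 29, 75, 78, 32],
    "disk_free_gb": [50, 50, 49, 49, 48, 48],
}

ranked = rank_correlated(requests, candidates)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

Metrics near the top of the list spiked along with traffic and are worth looking at first. A strong correlation only shows that two metrics moved together, not which one caused the other.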