Unplanned outages are painful and costly, and as such should not be tolerated lightly. Yet do you know the root cause of each of your outages, and what has been done to prevent it from happening again?
Next time, make sure you receive a service outage analysis, and make sure it answers these questions:
- When did it start affecting users?
- When could users return to business-as-usual service?
- How was the incident classified?
- Does it include a timeline recording the key actions that took place?
- Does it explain technically what happened?
- Does it provide a root cause analysis?
- Does it explain which process or procedure failed, from an operational perspective, to allow it to happen?
- Does it give details of the workaround put in place?
- Does it detail the remedial actions that will be performed to ensure it does not happen again?
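
If you track outage analyses in a tool rather than on paper, the checklist above can be turned into a simple completeness check. The following is a minimal sketch only: the `OutageAnalysis` class, its field names, and the `missing_items` helper are hypothetical names chosen for illustration, not part of any particular incident management product.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import List, Optional


@dataclass
class OutageAnalysis:
    """Hypothetical record with one field per checklist question."""
    impact_started: Optional[datetime] = None      # when users were first affected
    service_restored: Optional[datetime] = None    # when business as usual resumed
    classification: Optional[str] = None           # how the incident was classified
    timeline: Optional[List[str]] = None           # key actions, in order
    technical_explanation: Optional[str] = None    # what happened technically
    root_cause: Optional[str] = None               # root cause analysis
    process_failure: Optional[str] = None          # process/procedure that failed
    workaround: Optional[str] = None               # workaround put in place
    remedial_actions: Optional[List[str]] = None   # actions to prevent a recurrence


def missing_items(report: OutageAnalysis) -> List[str]:
    """Return the names of checklist items that are absent or empty."""
    return [f.name for f in fields(report) if not getattr(report, f.name)]


# Example with made-up values: only two of the nine items are filled in.
report = OutageAnalysis(
    impact_started=datetime(2024, 1, 1, 9, 0),
    root_cause="example root cause",
)
print(missing_items(report))
```

An empty list from `missing_items` means every question has at least some answer; it says nothing about the quality of those answers, which still needs a human reviewer.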