On 8/6 at 7:37 AM Pacific local time, NocTel engineering detected anomalies in service and application availability through monitoring systems and began investigating the scope of impaired services. By 7:51 AM Pacific local time, the NocTel engineering team had diagnosed a storage system failure that impacted the full range of services, and worked to redirect access to available resources while beginning repair of affected services and servers.
The timeline of NocTel engineering's activities to address service and system repairs:
8:27 AM Pacific local, engineering team arrived at the affected datacenter.
9:01 AM Pacific local, voice services on PDX10, 11, and 19 were repaired, tested, and verified.
9:06 AM Pacific local, voice services on PDX12 were repaired, tested, and verified.
9:26 AM Pacific local, some NocTel Flow accounts were repaired, tested, and verified.
9:32 AM Pacific local, voice services on PDX16 were repaired, tested, and verified.
10:25 AM Pacific local, voice services on PDX15 were repaired, tested, and verified.
11:31 AM Pacific local, voice services on PDX13 were repaired, tested, and verified.
11:58 AM Pacific local, all Flow services were restored.
12:06 PM Pacific local, PDX14 voice services and faxing services were repaired, tested, and verified.
2:11 PM Pacific local, all services were repaired, tested, and verified. Direct support was conducted in the following hours for affected accounts with specific configurations.
In reflecting on the incident, NocTel engineering recognizes the need for architectural improvements to eliminate single points of failure and to reduce the "blast radius" when an incident occurs anywhere in the system or infrastructure. For the longer term, NocTel engineering already has discussion, design, and implementation of a more performant and fault-tolerant architecture in progress. In the interim, measures will be implemented to address specific fault-prone or high-impact points of failure.