We would like to take this opportunity to apologise once again to all ShopWired customers who were affected by the downtime on 15th August 2020.
This incident was highly unusual in that multiple parts of the infrastructure were affected simultaneously, and the error also affected the backup servers that are in place in case of failure.
The incident cause
An update was issued to fix identified vulnerabilities in third-party software that is used on the ShopWired platform.
The security update was applied automatically to the ShopWired servers, and this process completed successfully. Later, the servers were automatically rebooted after a separate, unconnected update.
This process is entirely automatic.
Unfortunately, following the update, some of the servers on our network became unresponsive. Our engineers were alerted and attempted to reboot the servers manually, but all of these attempts failed.
Our engineers then began to investigate the root cause of the issue, which was difficult to identify. A “perfect storm” of problems occurred: the backup servers, which are in place to mitigate such a failure, were also affected and became unresponsive.
Repeated attempts to reboot the existing infrastructure failed, and parts of it had to be rebuilt.
Remediation
Our engineers have put in place systems to better guard against reboot failures across our network caused by automatic updates.
Backup servers are now more closely monitored with regard to updates and, where possible, systems have been installed to protect the backup infrastructure against errors caused by automatic updates.