Website errors
Incident Report for ShopWired
Postmortem

We would like to take this opportunity again to apologise to all ShopWired customers who were affected by the downtime on 15th August 2020.

This incident was highly unusual in that multiple parts of the infrastructure were affected simultaneously, and the error also affected the backup servers that are in place in case of failure.

The incident cause

An update was issued to fix identified vulnerabilities in third party software that is used on the ShopWired platform.

The security update was applied automatically to the ShopWired servers and this process completed successfully. Later, the servers were automatically rebooted following a separate, unrelated update.

This process is entirely automatic.

Unfortunately, following the update, some of the servers on our network became unresponsive. Our engineers were alerted and attempted to reboot the servers manually, but all of these attempts failed.

Our engineers then began to investigate the root cause of the issue, which was difficult to identify. Unfortunately, a “perfect storm” of problems occurred: the backup servers in place to mitigate such a failure were also affected and became unresponsive.

Repeated attempts to reboot the existing infrastructure failed and parts of it required a rebuild.

Remediation

Our engineers have put in place systems to better guard against reboot failures across our network caused by automatic updates.

Backup servers are now more closely monitored with regard to updates, and where possible systems have now been installed to protect the backup infrastructure against errors caused by automatic updates.
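
As an illustration only, one common way to keep backup servers from being taken down by the same automatic reboot as the primary servers is a staged reboot with a health check between steps. The sketch below is hypothetical and does not describe ShopWired's actual systems: the function name staged_reboot and the reboot and is_healthy callbacks are assumptions made for the example.

```python
from typing import Callable, Iterable, List


def staged_reboot(
    primaries: Iterable[str],
    backups: Iterable[str],
    reboot: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Reboot servers one at a time, primaries first, then backups.

    Stops at the first server that fails its health check, so every
    server later in the order is left untouched. If any primary fails,
    no backup is rebooted. Returns the servers that were rebooted.
    """
    rebooted: List[str] = []
    for server in list(primaries) + list(backups):
        reboot(server)            # hypothetical callback that triggers the reboot
        rebooted.append(server)
        if not is_healthy(server):  # hypothetical post-reboot health probe
            break                 # halt the rollout; remaining servers stay up
    return rebooted
```

With this ordering, a reboot failure on a primary server halts the process before any backup server is touched, so the backups remain available to take over.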

Posted Sep 06, 2020 - 11:59 BST

Resolved
This incident now appears to be resolved with all systems fully operational. We will continue to monitor the platform infrastructure.
Posted Aug 15, 2020 - 17:03 BST
Update
We are continuing to monitor for any further issues.
Posted Aug 15, 2020 - 16:29 BST
Monitoring
A fix has been implemented and we are monitoring results. Most users will already be able to view their websites and access the admin system. We are continuing to work on the issue.
Posted Aug 15, 2020 - 16:13 BST
Update
We are continuing to work on bringing affected hardware back online. This page will be updated as soon as we are able to provide a further update.
Posted Aug 15, 2020 - 14:12 BST
Update
The underlying cause of the failure has been identified. We are about to commence the process of bringing hardware back online. We appreciate our customers' patience at this time.
Posted Aug 15, 2020 - 12:55 BST
Update
We are working with our hosts to re-provision replacement hardware for failed areas of the infrastructure. We are advised that it may take between two and three hours for this process to complete and for the platform to be fully available to all customers. We apologise to all customers for the inconvenience caused.
Posted Aug 15, 2020 - 09:59 BST
Identified
We have identified hardware failure as the cause of the fault and are currently re-provisioning resources.
Posted Aug 15, 2020 - 09:34 BST
Investigating
We are currently investigating an outage on part of our network that is causing websites and admin to be unresponsive. We are working to establish the root cause and get a fix implemented.
Posted Aug 15, 2020 - 08:38 BST
This incident affected: Websites, Product Search, Checkout, ShopWired Admin System, and API.