Resolved -
We've monitored the situation for the past 11 days. Everything has remained stable, so we're closing this incident.
Jan 1, 14:08 EET
Update -
We've replaced additional hardware. After performing a manual fsck on each virtual machine's virtual drive, we proceeded with booting up all affected VPS instances. As of now, all services are restored. We'll continue monitoring the situation over the next few days.
Dec 22, 01:35 EET
Update -
Unfortunately, as we booted up the majority of the affected VMs, the node became unstable again, with flush operations queuing up. We're looking into alternative solutions at the moment. Further updates to come.
Dec 21, 17:44 EET
Monitoring -
We'll be booting up the remaining virtual servers within the next hour and will continue monitoring the hypervisor.
Dec 21, 16:01 EET
Update -
We're continuing to work on restoring VMs. If your VM is affected, please wait until it has been booted up.
Dec 20, 23:04 EET
Update -
We are starting the restoration of individual KVM machines. Further updates to follow.
Dec 20, 15:54 EET
Identified -
We're currently working on bringing the node back online in a stable state. We estimate it will take an additional 3-4 hours before we can start booting up the virtual machines and assessing the damage. Please stand by for further updates.
Dec 20, 11:31 EET
Investigating -
We've been alerted to a catastrophic RAID failure on our nuestorage07 node. Earlier this morning, the controller started flapping, which pushed the I/O load to over 90 and made every VM on the host hypervisor unresponsive. We've replaced the RAID adapter and are moving toward an attempted recovery. At this point, we're unsure whether recovery will be possible.
Dec 20, 09:49 EET