nuestorage07 - Catastrophic RAID failure
Incident Report for AlphaVPS Status
Update
We've replaced additional hardware. After running a manual fsck on each virtual machine's virtual drive, we booted up all affected VPSes. As of now, all services are restored. We'll be monitoring the situation over the next few days.
Posted Dec 22, 2024 - 01:35 EET
Update
Unfortunately, as we booted up the majority of the affected VMs, the node became unstable again, with flush operations starting to queue up. We're currently looking into alternative solutions. Further updates to come.
Posted Dec 21, 2024 - 17:44 EET
Monitoring
We'll boot up the remaining virtual servers within the next hour and continue monitoring the hypervisor.
Posted Dec 21, 2024 - 16:01 EET
Update
We're continuing to work on restoring VMs. If your VM is affected, please wait until it's booted up.
Posted Dec 20, 2024 - 23:04 EET
Update
We are starting the restoration of individual KVM virtual machines. Further updates to follow.
Posted Dec 20, 2024 - 15:54 EET
Identified
We're currently working on bringing the node back online in a stable state. We estimate that it will take 3-4 more hours before we can start booting up the virtual machines and assessing the damage. Please stand by for further updates.
Posted Dec 20, 2024 - 11:31 EET
Investigating
We've been alerted to a catastrophic RAID failure on our nuestorage07 node. Earlier this morning, the controller started flapping, which drove the I/O load above 90 and made every VM on the host hypervisor unresponsive. We've gone ahead with replacing the RAID adapter and are moving towards a recovery attempt. At this point, we're unsure whether recovery will be possible.
Posted Dec 20, 2024 - 09:49 EET
This incident affects: Nuremberg, Germany (KVM Services in Nuremberg).