Yesterday at 03:03 UTC, ZenIV had a problem with one of its many disks -
one of the two for the /home partition. The drive appeared to have used
up all of its spare sectors for reallocation, and then just got stuck.
Bryce cleared space on the FTP archive disks, and transferred /home to
there, which allowed the machine to resume normal operations.
Today, at around 05:29 UTC, the machine went offline - totally offline.
Our remote KVM card says that the machine is powered off, and no attempts
to turn it back on cause any change of state on the remote console.
We have contacted the hosting facility for assistance. They say that they
have an issue with the power strip that the machine is plugged into, and
they are replacing it. They said they'd call back. Further updates as we
get them.
However, this speeds up the timescale for ZenV, ZenIV's replacement
machine, somewhat. We were hoping to be able to wait for Alan Cox to have more time
to discuss his ideas for virtualization on ZenV, but we believe we can't wait
for that anymore - we need to get ZenIV replaced.
Update: 13:08
Joy, oh joy. We had to call back. (That's the norm in this
day and age: people no longer have pride in doing a "good job", and
never make good on their promises to call you back. And people
wonder why I try to ensure that I am capable of doing everything I need to
myself. Relying on 3rd parties, in my experience, generally leads to failure.)
It seems that they needed us to raise another ticket to replace their PDU,
but didn't tell us. Raised at 12:30.
Update: 13:48
They're now saying the entire cabinet needs to be replaced. They also said
they put power back onto the machine, and the HD lamps lit up. But we're
not getting the BIOS screen, and we can see the PSU voltages bouncing around.
Update: 15:24
New PSU ordered, to be delivered to the hosting facility tomorrow. Machine
will be down at least until the PSU is replaced.
Update: 07 March 2013 16:52
The PSU was supposed to be delivered today - we paid extra to have it on
express delivery. The delivery driver couldn't find the data centre (we
know this because he phoned Bryce), decided he wouldn't bother anymore,
and stated a non-delivery reason of "No one present to sign for
the delivery". Lies. Lies. And more damned Lies. There are two
security guards on duty at the data centre. Data centre staff are
surprised by that, because they know otherwise. Oh well, your standard
lies from parcel delivery agencies.
Will they deliver it tomorrow? Who knows. It'll probably be a repeat
performance.
Update: 08 March 2013 10:36
Well, the parcel tracking site says that it's been delivered and signed
for, but do we trust yet that it's been delivered to the correct location?
I don't think we should assume that until we have confirmation from the data
centre.
Update: 17:29
Okay, this is getting tiresome. Replacement PSU arrived today, and was
fitted. However, it was found to be DoA. Investigating options at
present.
Update: 09 March 2013 17:45
ZenIV lives! Thanks to Dave Gilbert, who went to scan.co.uk, collected
a power supply, took it to Leeds by train, and fitted it, we now have
power on the machine again, and it's up and running!