lug.org.uk is dead (again)

Hi,

You may be looking at this because you have a web site or mailing list hosted by lug.org.uk and you are wondering where it went.

At just after 01:00 GMT today, Sunday 25th June, the server suffered some form of hardware hiccup. The following was seen on its console:

hda: lost interrupt
hda: dma_timer_expiry: dma status == 0x61
hda: DMA timeout error

and it locked up.
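For anyone who wants to spot this kind of message in their own logs, here is a minimal Python sketch that counts such errors per drive. The function name `count_ide_errors` and the regular expression are my own illustration, and the pattern only covers the three message shapes quoted above, so treat it as a starting point rather than a complete detector.

```python
import re
from collections import Counter

# Assumption: matches only the three old-style IDE driver messages quoted
# above, e.g. "hda: DMA timeout error". Other error shapes are ignored.
IDE_ERROR = re.compile(
    r"\b(hd[a-z]): (lost interrupt|dma_timer_expiry|DMA timeout error)"
)


def count_ide_errors(lines):
    """Count matching error lines per device name (e.g. 'hda')."""
    counts = Counter()
    for line in lines:
        match = IDE_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The console output from this incident, used as sample input.
sample = [
    "hda: lost interrupt",
    "hda: dma_timer_expiry: dma status == 0x61",
    "hda: DMA timeout error",
]
print(count_ide_errors(sample))
```

Fed a saved kernel log instead of the sample list, a tally like this would make it easier to see whether the errors follow one drive or move between them.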

Over the last few weeks we have had similar problems a few times, but they all involved /dev/hdc and the server always came back after a power cycle via Black Cat’s APC masterswitch. Last Wednesday I went to the data centre and replaced hdc with a new Western Digital drive in an attempt to cure the problem.

This time it involves /dev/hda and the machine isn’t coming back after a power cycle. I expect the worst to be honest, but if we are lucky it’s simply the case that hda is dead and the BIOS is refusing to boot from hdc.

All this means that I need to go to the data centre later today, as soon as I am able, and assess the situation. We could be in for a long, possibly permanent downtime.

I know this sucks, but before complaining too much, please consider that we have no budget and our existing setup consists of desktop-quality hardware being used in a 24×7 hosting environment. If anyone is prepared to donate a decent 1U server that can take two IDE (PATA) drives and 3x1GB DDR RAM (from the existing hardware) then that would be really great.

I will update as soon as I know more.

Cheers,
Andy

6 thoughts on “lug.org.uk is dead (again)”

  1. 🙁 @ problem
    🙁 🙁 at the comment about people complaining.

    I would like to say a hugemongous “thank you” on behalf of all the lug.org.uk users. I don’t think most of them realise how much effort you put in. I’m sure I don’t and I’m already extremely grateful.

    Adrian

  2. Hi,

    1) Do you use any sort of RAID? Or are you likely to have lost data?

    2) Have you thought about approaching any of the hardware vendors (sun, hp, dell etc) to see whether they could donate hardware? I suspect even a low end machine would do the job?

    Good luck fixing it….

    David.

  3. Yes, Linux software RAID is used in RAID-1 across two drives. Also backups are done every 4 hours. No data will have been lost; the most likely case is that the BIOS won’t, or isn’t set up to, fall back to booting from hdc (cheap desktop motherboard).

  4. I have only just got auth. to visit the datacentre and have plans for this afternoon already, so I will be there for an initial investigation at about 8pm instead.

  5. Got back from Redbus about an hour ago. Annoyingly the machine booted up and worked fine in the build room. Since the only hint we have involves hda, I will be returning tomorrow (Monday) night to swap that for another drive, and fit some extra/replacement fans, both provided by Alan Pope.

  6. BTW thanks Adrian for your kind words. I wasn’t fishing for sympathy, honest! 😉

    (Indeed, Popey has been down to Redbus twice now as well, and lots of people have helped and/or donated small bits of hardware – thanks all)
