Confusing hardware issues at home

I’ve got this server in my loft at home that’s mainly a file server for the data we use/view/listen to here. It looks like this:

A bit of a beast. When I bought it over 4 years ago I somehow thought I’d be adding a lot more drives. Anyway.

It’s been a good, reliable bit of kit and had no problems for a long time apart from overheating in the old house, but that was a problem with the room it was in. It’s never even lost a disk. A couple of months ago though the PSU went pop, and ever since then it has occasionally been giving me this sort of thing:

Mar 21 13:53:16 specialbrew kernel: [5875576.400044] ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Mar 21 13:53:16 specialbrew kernel: [5875576.400095] ata3.01: cmd c8/00:50:9e:a2:1d/00:00:00:00:00/f2 tag 0 dma 40960 in
Mar 21 13:53:16 specialbrew kernel: [5875576.400098]          res 40/00:01:01:4f:c2/00:00:00:00:00/10 Emask 0x4 (timeout)
Mar 21 13:53:16 specialbrew kernel: [5875576.400167] ata3.01: status: { DRDY }
Mar 21 13:53:16 specialbrew kernel: [5875576.400196] ata3: soft resetting link
Mar 21 13:53:16 specialbrew kernel: [5875576.719196] ata3.00: configured for UDMA/33
Mar 21 13:53:16 specialbrew kernel: [5875576.759036] ata3.01: configured for UDMA/100
Mar 21 13:53:16 specialbrew kernel: [5875576.759075] ata3: EH complete
Mar 21 13:53:16 specialbrew kernel: [5875576.800851] sd 2:0:0:0: [sdc] 625134827 512-byte hardware sectors (320069 MB)
Mar 21 13:53:16 specialbrew kernel: [5875576.801386] sd 2:0:0:0: [sdc] Write Protect is off
Mar 21 13:53:16 specialbrew kernel: [5875576.801418] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Mar 21 13:53:16 specialbrew kernel: [5875576.808855] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 21 13:53:16 specialbrew kernel: [5875576.810058] sd 2:0:1:0: [sdd] 625134827 512-byte hardware sectors (320069 MB)
Mar 21 13:53:16 specialbrew kernel: [5875576.810452] sd 2:0:1:0: [sdd] Write Protect is off
Mar 21 13:53:16 specialbrew kernel: [5875576.810482] sd 2:0:1:0: [sdd] Mode Sense: 00 3a 00 00
Mar 21 13:53:16 specialbrew kernel: [5875576.867347] sd 2:0:1:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 21 13:53:16 specialbrew kernel: [5875576.871943] sd 2:0:0:0: [sdc] 625134827 512-byte hardware sectors (320069 MB)
Mar 21 13:53:16 specialbrew kernel: [5875576.873744] sd 2:0:0:0: [sdc] Write Protect is off
Mar 21 13:53:16 specialbrew kernel: [5875576.873770] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Mar 21 13:53:16 specialbrew kernel: [5875576.873966] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 21 13:53:16 specialbrew kernel: [5875576.874062] sd 2:0:1:0: [sdd] 625134827 512-byte hardware sectors (320069 MB)
Mar 21 13:53:16 specialbrew kernel: [5875576.874125] sd 2:0:1:0: [sdd] Write Protect is off
Mar 21 13:53:16 specialbrew kernel: [5875576.874148] sd 2:0:1:0: [sdd] Mode Sense: 00 3a 00 00
Mar 21 13:53:16 specialbrew kernel: [5875576.874195] sd 2:0:1:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

There are six drives in there and the above messages have been seen referring to all of them at one time or another, so I don’t believe it’s as simple as a broken disk.

These errors have become more and more frequent, so today I spent some time trying to work out where the problem lay.

The way it seemed to affect all ATA busses made me think maybe the (new) PSU was underperforming, but I tried two different ones and they seem fine.

The six disks are inserted into two 3-bay Icydocks. Here’s what they look like:

They’re pretty dumb devices which just let you fit three 3.5″ disks into two 5.25″ bays. On the back they have three SATA data connectors (one for each disk), two molex power, one SATA power and a fan. I bought them because I didn’t want to buy a really expensive disk chassis for home, but I also didn’t want to screw six drives inside the case where they’d be hard to get access to.

Inside I have four of the drives connected to the motherboard’s SATA controller, and two of them connected to an additional Si3112 SATA card. This setup has been in place for over four years.

When all the drives are removed from the Icydocks and directly connected to SATA and power, everything appears to be fine. When either of the Icydocks has three disks in, the problem reappears. I then put three disks in an Icydock, three disks directly connected, but popped one of the disks in the Icydock out. This appears to also work fine (the file systems are all RAID-10 so can stand to run with one disk missing).

I’m a bit confused by that. When I was testing the Icydocks individually, I was using the same set of three disks with each one (with the other three disks connected directly). I could believe that the disk I have now removed is bad in some way that causes the whole bus to reset, but I would have to ask why it affects the other busses, and why it doesn’t happen when it’s directly connected.

I know other people who bought Icydocks and had a real struggle getting them to behave reliably, but mine worked well from the start and have done for over four years. I could believe that one of them went bad when the power popped, even though they are very simple electro-mechanical devices, but it’s hard to believe that two of them did.

I can’t just remove the Icydocks from the picture and forget about it because that leaves six SATA drives running on the floor. 🙂 They need to be inside some form of enclosure, and I don’t want to fork out for a new enclosure or two right now if I can help it.

I’ve left it there for this evening, but I’ll have to return to it tomorrow afternoon. I’ll probably start by putting the other three disks back in their Icydock to see if the removal of that one really does fix it.

Any ideas for ways to narrow the problem down?

I hate hardware.

Update 2010-03-31

I tentatively believe I’ve tracked down the issue.

Joel wins: despite the new PSU being a bit beefier in max output than the dead one I was replacing (500W vs 384W), the new one actually had a lower limit on the 12V rail: 2.5A vs the previous 3.3A.

I scavenged a PSU from elsewhere that also had 3.3A and everything seems fine now and has been for 2 days.

I think that things worked fine outside the Icydocks because the Icydocks have fans, which are probably not very good, and suck additional power. Or else they maybe don’t do any kind of staggered spinup that might happen without them.

Burn Notice’s Gabrielle Anwar and Bruce Campbell

We’re quite enjoying Burn Notice at the moment.

There’s something strangely familiar about Gabrielle Anwar. I looked at her IMDB and nothing really stood out (except possibly Things to Do in Denver When You’re Dead).

And then I got to Press Gang. Sam Black!

The other one I should really have known, but must admit I didn’t even consider until I noticed his name in the credits. How could I have failed to recognise Bruce Campbell!? I suppose there aren’t enough zombies in Miami. Is that why his character is called Sam Axe?

Coping with busy mailing lists with Mutt

I’m on a couple of fairly busy mailing lists which by their nature have loose or no moderation. It’s natural that some mailing lists work well with tight moderation, even perhaps requiring every post to be approved, but it’s more common for there to be little or no moderation. This is not a bad thing; people have very different ideas about what sort of posts are interesting.

As a consequence though, I tend to find that many (sometimes the majority) of posts are uninteresting. Clearly if they are all uninteresting then I need to just unsubscribe, but I’m on plenty of lists that do come up with gems from time to time. I find Mutt, the text-based (console) email client, to be really helpful at quickly getting through these mailing lists without missing too many interesting things, and I thought I’d share some ways I do that.

These are primarily simple tips for dealing with other people’s sub-optimal mailing list behaviour. In some cases it’s the poster who’s clearly in the wrong, but asking people to give a toss about those reading their words of wisdom is apparently considered offensive in many places, and doesn’t actually modify behaviour.

First off, Mutt is kind of a culture shock for most people. This post is only really for people who already use Mutt, or maybe who were already considering using it. I’m not even going to try convincing the typical gmail web user to switch. Or anyone really. Depending on how you configure it, Mutt looks a bit like this:

[Screenshot of Mutt]

If that freaks you out, it may be best to stop reading, move along. Run, don’t walk.

Context is useful

Like, I suspect, most Mutt users, I don’t really know most of its features. It’s pretty complicated to configure. When other Mutt users glance at my email they quite commonly ask, “how did you split the window like that!?” I didn’t, it’s not split, it’s just a different layout of message window. Here’s how:

set pager_index_lines=5

It’s just handy for being able to see a bit of the context of where the current post sits in relation to the rest of the thread.

Threading matters

Once a thread has gone bad, it’s usually going to stay bad. The most useful tricks involve operating on whole threads at once, so you don’t have to tediously click something on every email. So while it might not seem like the most annoying thing at first, people not threading properly becomes one of the more annoying things later as it slows down whatever you are trying to do on the thread.

Sort your folders by thread:

set sort=threads

Unfortunately Mutt doesn’t seem to have a feature to break a thread when the subject header changes, so you might instead prefer to sort by subject. That had too many false positives for me though, even with sort_re.
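As a rough sketch of the threading setup discussed above, something like the following could go in a muttrc. Only `sort=threads` comes from this post; `sort_aux` and `strict_threads` are my additions for illustration, and the values are assumptions about what you might want, not a recommendation. Note that `strict_threads` does not break a thread when the subject changes; it only stops Mutt from grouping unrelated messages together because their subjects happen to match.

```
# From the post: show each folder as threads.
set sort=threads

# Assumption: order the threads themselves by their most recently
# received message, so active threads gather at the bottom.
set sort_aux=last-date-received

# Assumption: thread only on In-Reply-To/References headers and never
# guess from matching subject lines.
set strict_threads=yes
```

With `strict_threads` left unset (the default), Mutt also builds pseudo-threads from similar subjects, and `sort_re` adjusts how that guessing behaves.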

Get rid of a whole thread

If you’re looking at the start of a thread, and it’s uninteresting, and you can see all of the thread below it, chances are that it’s all going to be uninteresting. You can mark it all read with ctrl-r. I prefer to see the whole thread on the screen before doing that, because there’s some chance that someone might change the subject line into something that becomes interesting.

I find myself reading some lists mostly with ctrl-r without even looking at the content of the posts. For example, a thread that starts with “Mandriva v Windows” isn’t very likely to contain anything except trolling and counter-trolling (If you are unaware of what Mandriva is, you are reading the wrong blog post and only need know that it is a brand name for a popular USB personal massager product).

The risk is that someone will go off at a tangent and post something that is actually interesting, without changing the subject. I’m willing to take the risk; they should have changed the subject IMHO. And anyway, you still have the mail, it’s only been marked as read.

Deal with subthreads

If you can see that the subject of a thread has changed or there’s some other reason why you might want to treat every message below the current one differently, then you can operate on subthreads.

The simplest thing is to break the subthread off into a new thread of its own. # will do that. You can then treat it differently, using the thread commands on it in isolation. That’s how I usually do it, because if this has happened then to me it means that the content of the subthread is very different to the rest of the thread.

If you want to keep it as part of the same thread, you can mark the subthread read with esc-r (or alt-r).

Also useful for when someone decides that the right way to start a new email is to just press reply on some other random email.
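If those keys clash with your terminal or you’d rather use different ones, the functions behind ctrl-r and esc-r are called `read-thread` and `read-subthread`. A minimal sketch, which just restates what I believe are Mutt’s default bindings so they’re easy to find and change (swap the key for whatever suits you):

```
# ctrl-r: mark every message in the current thread as read.
bind index \Cr read-thread
bind pager \Cr read-thread

# esc-r (or alt-r): mark the current message and its replies as read.
bind index \er read-subthread
bind pager \er read-subthread
```

The function behind # is `break-thread`, should you want to move that one too.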

Tagging

Occasionally a bunch of posts are the same sort of thing but they’re not in a thread. If you can find something about them that’s common then you can tag them based on that, with T. For example, T followed by ~h @luser.example.com tags every email that has “@luser.example.com” in its headers.

If you can’t think of anything to match on then at the very least, just hitting t on each of the posts will tag it.

Once tagged, you’ll find that many existing Mutt commands that operate on a single email will work on a bunch of tagged emails as well, by prefixing the command with ;. So, if you imagine you’d tagged the above emails and wanted to mark them read, the next thing you’d do would be ;N.
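If you do the same tag-and-mark-read dance often, it can be wrapped up in a macro. This is only a sketch: the F8 key and the description string are arbitrary choices of mine, and it uses `clear-flag` with N instead of the ;N toggle, because a toggle would mark already-read messages as new again.

```
# Hypothetical macro: tag everything with @luser.example.com in its
# headers, clear the "new" flag on the tagged messages, then untag
# everything (~A matches all messages).
macro index <F8> "<tag-pattern>~h @luser.example.com<enter><tag-prefix><clear-flag>N<untag-pattern>~A<enter>" "mark mail from luser.example.com as read"
```

Bear in mind that ~h has to look at full headers, so it can be slow on large or remote folders.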

Rejoin broken threading

Some people continue to use broken email clients that don’t do threading properly. All of their posts appear in a new thread. You can easily rejoin errant posts into an existing thread by tagging them, moving to the post they should be replies to, and using the & command. It may seem like a lot of hassle, but the benefit is that every reply to that one will then be in the right place too.

Useful for those pointless flamewars that just won’t die. “Oh look it’s that thread again, I’ll just mark it read again.”

More info

That’s about all I can think of in terms of the simple stuff I do every day when reading email. I hope it helped some newcomers to Mutt. There are a lot of great tips in the documentation but it can seem a bit impenetrable at first: