Supermicro SATA DOM flash devices don’t report lifetime writes correctly

I’m playing around with a pair of Supermicro SATA DOM flash devices at the moment, evaluating them for use as the operating system storage for servers (as opposed to where customer data goes).

They’re flash devices with a limited write endurance. The smallest model (16GB), for example, is good for 17TB of writes. Therefore it’s important to know how much you’ve actually written to it.

Many SSDs and other flash devices expose the total amount written through the SMART attribute 241, Total_LBAs_Written. The SATA DOM devices do seem to expose this attribute, but right now they say this:

$ for dom in $(sudo lsblk --paths -d -o NAME,MODEL --noheadings |
    awk '/SATA SSD/ { print $1 }')
do
    echo -n "$dom: "
    sudo smartctl -A "$dom" |
      awk '/^241/ { print $10 * 512 * 1.0e-9, "GB" }'
done
/dev/sda: 0.00856934 GB
/dev/sdb: 0.00881715 GB

This being after install and (as of now) more than a week of uptime, ~9MB of lifetime writes isn’t credible.

Another place we can look for amount of bytes written is /proc/diskstats. The 10th column is the number of (512-byte) sectors written, so:

$ for dom in $(sudo lsblk -d -o NAME,MODEL --noheadings |
    awk '/SATA SSD/ { print $1 }')
do
     awk "/$dom / {
        print \$3, \$10 / 2 * 1.0e-6, \"GB\"
    }" /proc/diskstats
done
sda 3.93009 GB
sdb 3.93009 GB

Almost 4GB is a lot more believable, so can we just use /proc/diskstats? Well, the problem there is that those figures are only since boot. That won’t include, for example, all the data written during install.

Okay, so, are these figures even consistent? Let’s write 100MB and see what changes.

Since the figure provided by SMART attribute 241 apparently isn’t actually 512-byte blocks we’ll just print the raw value there.

Before:

$ for dom in $(sudo lsblk -d -o NAME,MODEL --noheadings |
    awk '/SATA SSD/ { print $1 }')
do
     awk "/$dom / {
        print \$3, \$10 / 2 * 1.0e-6, \"GB\"
    }" /proc/diskstats
done
sda 4.03076 GB
sdb 4.03076 GB
$ for dom in $(sudo lsblk --paths -d -o NAME,MODEL --noheadings |
  awk '/SATA SSD/ { print $1 }')
do
    echo -n "$dom: "
    sudo smartctl -A "$dom" |
      awk '/^241/ { print $10 }'
done
/dev/sda: 16835
/dev/sdb: 17318

Write 100MB:

$ dd if=/dev/urandom bs=1MB count=100 > /var/tmp/one_hundred_megabytes
100+0 records in
100+0 records out
100000000 bytes (100 MB) copied, 7.40454 s, 13.5 MB/s

(I used /dev/urandom just in case some compression might take place or something)

After:

$ for dom in $(sudo lsblk -d -o NAME,MODEL --noheadings |
    awk '/SATA SSD/ { print $1 }')
do
     awk "/$dom / {
        print \$3, \$10 / 2 * 1.0e-6, \"GB\"
    }" /proc/diskstats
done
sda 4.13046 GB
sdb 4.13046 GB
$ for dom in $(sudo lsblk --paths -d -o NAME,MODEL --noheadings |
  awk '/SATA SSD/ { print $1 }')
do
    echo -n "$dom: "
    sudo smartctl -A "$dom" |
      awk '/^241/ { print $10 }'
done
/dev/sda: 16932
/dev/sdb: 17416

Well, alright, all is apparently not lost: SMART attribute 241 went up by ~100 and diskstats agrees that ~100MB was written too, so it looks like it does actually report lifetime writes, but it’s reporting them as megabytes (109 bytes), not 512-byte sectors.

Every reference I can find says that Total_LBAs_Written is the number of 512-byte sectors, though, so in reporting units of 1MB I feel that these devices are doing the wrong thing.

Anyway, I’m a little alarmed that ~0.1% of the lifetime has gone already, although a lot of that would have been the install. I probably should take this opportunity to get rid of a lot of writes by tracking down logging of mundane garbage. Also this is the smallest model; the devices are rated for 1 DWPD so just over-provisioning by using a larger model than necessary will help.

Installing Debian by PXE using Supermicro IPMI Serial over LAN

Here’s how to install Debian jessie on a Supermicro server using PXE boot and the IPMI serial-over-LAN.

Using these instructions you will be able to complete an install of a remote machine, although you will initially need access to the BIOS to configure the IPMI part.

BIOS settings ^

This bit needs you to be in the same location as the machine, or else have someone who is make the required changes.

Press DEL to go into the BIOS configuration.

Under Advanced > PCIe/PCI/PnP Configuration make sure that the network interface through which you’ll reach your PXE server has the “PXE” option ROM set:

BIOS PSE option ROM

Under Advanced > Serial Port Console Redirection you’ll want to enable SOL Console Redirection.

BIOS serial console redirection

(Pictured here is also COM1 Console Redirection. This is for the physical serial port on the machine, not the one in the IPMI.)

Under SOL Console Redirection Settings you may as well set the Bits per second to 115200.

BIOS SOL redirection settings

Now it’s time to configure the IPMI so you can interact with it over the network. Under IPMI > BMC Network Configuration, put the IPMI on your management network:

IPMI network configuration

Connecting to the IPMI serial ^

With the above BIOS settings in place you should be able to save and reboot and then connect to the IPMI serial console. The default credentials are ADMIn / ADMIN which you should of course change with ipmitool, but that is for a different post.

There’s two ways to connect to the serial-over-LAN: You can ssh to the IPMI controller, or you can use ipmitool. Personally I prefer ssh, but the ipmitool way is like this:

$ ipmitool -I lanplus -H 192.168.1.22 -U ADMIN -a sol activate

The ssh way:

$ ssh ADMIN@192.168.1.22
The authenticity of host '192.168.1.22 (192.168.1.22)' can't be established.
RSA key fingerprint is b7:e1:12:94:37:81:fc:f7:db:6f:1c:00:e4:e0:e1:c4.
Are you sure you want to continue connecting (yes/no)?
Warning: Permanently added '192.168.1.22,192.168.1.22' (RSA) to the list of known hosts.
ADMIN@192.168.1.22's password:
 
ATEN SMASH-CLP System Management Shell, version 1.05
Copyright (c) 2008-2009 by ATEN International CO., Ltd.
All Rights Reserved 
 
 
-> cd /system1/sol1
/system1/sol1
 
-> start
/system1/sol1
press <Enter>, <Esc>, and then <T> to terminate session
(press the keys in sequence, one after the other)

They both end up displaying basically the same thing.

The serial console should just be displaying the boot process, which won’t go anywhere.

DHCP and TFTP server ^

You will need to configure a DHCP and TFTP server on an already-existing machine on the same LAN as your new server. They can both run on the same host.

The DHCP server responds to the initial requests for IP address configuration and passes along where to get the boot environment from. The TFTP server serves up that boot environment. The boot environment here consists of a kernel, initramfs and some configuration for passing arguments to the bootloader/kernel. The boot environment is provided by the Debian project.

DHCP ^

I’m using isc-dhcp-server. Its configuration file is at /etc/dhcp/dhcpd.conf.

You’ll need to know the MAC address of the server, which can be obtained either from the front page of the IPMI controller’s web interface (i.e. https://192.168.1.22/ in this case) or else it is displayed on the serial console when it attempts to do a PXE boot. So, add a section for that:

subnet 192.168.2.0 netmask 255.255.255.0 {
}
 
host foo {
    hardware ethernet 0C:C4:7A:7C:28:40;
    fixed-address 192.168.2.22;
    filename "pxelinux.0";
    next-server 192.168.2.251;
    option subnet-mask 255.255.255.0;
    option routers 192.168.2.1;
}

Here we set the network configuration of the new server with fixed-address, option subnet-mask and option routers. The IP address in next-server refers to the IP address of the TFTP server, and pxelinux.0 is what the new server will download from it.

Make sure that is running:

# service isc-dhcp-server start

DHCP uses UDP port 67, so make sure that is allowed through your firewall.

TFTP ^

A number of different TFTP servers are available. I use tftpd-hpa, which is mostly configured by variables in /etc/default/tftp-hpa:

TFTP_OPTIONS="--secure"
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/srv/tftp"
TFTP_ADDRESS="0.0.0.0:69"

TFTP_DIRECTORY is where you’ll put the files for the PXE environment.

Make sure that the TFTP server is running:

# service tftpd-hpa start

TFTP uses UDP port 69, so make sure that is allowed through your firewall.

Download the netboot files from your local Debian mirror:

$ cd /srv/tftp
$ curl -s http://ftp.YOUR-MIRROR.debian.org/debian/dists/jessie/main/installer-amd64/current/images/netboot/netboot.tar.gz | sudo tar zxvf -
./                
./version.info    
./ldlinux.c32     
./pxelinux.0      
./pxelinux.cfg    
./debian-installer/
./debian-installer/amd64/
./debian-installer/amd64/bootnetx64.efi
./debian-installer/amd64/grub/
./debian-installer/amd64/grub/grub.cfg
./debian-installer/amd64/grub/font.pf2
…

(This assumes you are installing a device with architecture amd64.)

At this point your TFTP server root should contain a debian-installer subdirectory and a couple of links into it:

$ ls -l .
total 8
drwxrwxr-x 3 root root 4096 Jun  4  2015 debian-installer
lrwxrwxrwx 1 root root   47 Jun  4  2015 ldlinux.c32 -> debian-installer/amd64/boot-screens/ldlinux.c32
lrwxrwxrwx 1 root root   33 Jun  4  2015 pxelinux.0 -> debian-installer/amd64/pxelinux.0
lrwxrwxrwx 1 root root   35 Jun  4  2015 pxelinux.cfg -> debian-installer/amd64/pxelinux.cfg
-rw-rw-r-- 1 root root   61 Jun  4  2015 version.info

You could now boot your server and it would call out to PXE to do its netboot, but would be displaying the installer process on the VGA output. If you intend to carry it out using the Remote Console facility of the IPMI interface then that may be good enough. If you want to do it over the serial-over-LAN though, you’ll need to edit some of the files that came out of the netboot.tar.gz to configure that.

Here’s a list of the files you need to edit. All you are doing in each one is telling it to use serial console. The changes are quite mechanical so you can easily come up with a script to do it, but here I will show the changes verbosely. All the files live in the debian-installer/amd64/boot-screens/ directory.

ttyS1 is used here because this system has a real serial port on ttyS0. 115200 is the baud rate of ttyS1 as configured in the BIOS earlier.

adtxt.cfg

From:

label expert
        menu label ^Expert install
        kernel debian-installer/amd64/linux
        append priority=low vga=788 initrd=debian-installer/amd64/initrd.gz --- 
include debian-installer/amd64/boot-screens/rqtxt.cfg
label auto
        menu label ^Automated install
        kernel debian-installer/amd64/linux
        append auto=true priority=critical vga=788 initrd=debian-installer/amd64/initrd.gz --- quiet

To:

label expert
        menu label ^Expert install
        kernel debian-installer/amd64/linux
        append priority=low console=ttyS1,115200n8 initrd=debian-installer/amd64/initrd.gz --- 
include debian-installer/amd64/boot-screens/rqtxt.cfg
label auto
        menu label ^Automated install
        kernel debian-installer/amd64/linux
        append auto=true priority=critical console=ttyS1,115200n8 initrd=debian-installer/amd64/initrd.gz --- quiet
rqtxt.cfg

From:

label rescue
        menu label ^Rescue mode
        kernel debian-installer/amd64/linux
        append vga=788 initrd=debian-installer/amd64/initrd.gz rescue/enable=true --- quiet

To:

label rescue
        menu label ^Rescue mode
        kernel debian-installer/amd64/linux
        append console=ttyS1,115200n8 initrd=debian-installer/amd64/initrd.gz rescue/enable=true --- quiet
syslinux.cfg

From:

# D-I config version 2.0
# search path for the c32 support libraries (libcom32, libutil etc.)
path debian-installer/amd64/boot-screens/
include debian-installer/amd64/boot-screens/menu.cfg
default debian-installer/amd64/boot-screens/vesamenu.c32
prompt 0
timeout 0

To:

serial 1 115200
console 1
# D-I config version 2.0
# search path for the c32 support libraries (libcom32, libutil etc.)
path debian-installer/amd64/boot-screens/
include debian-installer/amd64/boot-screens/menu.cfg
default debian-installer/amd64/boot-screens/vesamenu.c32
prompt 0
timeout 0
txt.cfg

From:

default install
label install
        menu label ^Install
        menu default
        kernel debian-installer/amd64/linux
        append vga=788 initrd=debian-installer/amd64/initrd.gz --- quiet

To:

default install
label install
        menu label ^Install
        menu default
        kernel debian-installer/amd64/linux
        append console=ttyS1,115200n8 initrd=debian-installer/amd64/initrd.gz --- quiet

Perform the install ^

Connect to the serial-over-LAN and get started. If the server doesn’t have anything currently installed then it should go straight to trying PXE boot. If it does have something on the storage that it would boot then you will have to use F12 at the BIOS screen to convince it to jump straight to PXE boot.

$ ssh ADMIN@192.168.1.22
ADMIN@192.168.1.22's password:
 
ATEN SMASH-CLP System Management Shell, version 1.05
Copyright (c) 2008-2009 by ATEN International CO., Ltd.
All Rights Reserved 
 
 
-> cd /system1/sol1
/system1/sol1
 
-> start
/system1/sol1
press <Enter>, <Esc>, and then <T> to terminate session
(press the keys in sequence, one after the other)
 
Intel(R) Boot Agent GE v1.5.13                                                  
Copyright (C) 1997-2013, Intel Corporation                                      
 
CLIENT MAC ADDR: 0C C4 7A 7C 28 40  GUID: 00000000 0000 0000 0000 0CC47A7C2840  
CLIENT IP: 192.168.2.22  MASK: 255.255.255.0  DHCP IP: 192.168.2.252             
GATEWAY IP: 192.168.2.1
 
PXELINUX 6.03 PXE 20150107 Copyright (C) 1994-2014 H. Peter Anvin et al    
 
 
 
 
 
 
                 ┌───────────────────────────────────────┐
                 │ Debian GNU/Linux installer boot menu  │
                 ├───────────────────────────────────────┤
                 │ Install                               │
                 │ Advanced options                    > │
                 │ Help                                  │
                 │ Install with speech synthesis         │
                 │                                       │
                 │                                       │
                 │                                       │
                 │                                       │
                 │                                       │
                 │                                       │
                 └───────────────────────────────────────┘
 
 
 
              Press ENTER to boot or TAB to edit a menu entry
 
 
  ┌───────────────────────┤ [!!] Select a language ├────────────────────────┐
  │                                                                         │
  │ Choose the language to be used for the installation process. The        │
  │ selected language will also be the default language for the installed   │
  │ system.                                                                 │
  │                                                                         │
  │ Language:                                                               │
  │                                                                         │
  │                               C                                         │
  │                               English                                   │
  │                                                                         │
  │     <Go Back>                                                           │
  │                                                                         │
  └─────────────────────────────────────────────────────────────────────────┘
 
 
 
 
<Tab> moves; <Space> selects; <Enter> activates buttons

…and now the installation proceeds as normal.

At the end of this you should be left with a system that uses ttyS1 for its console. You may need to tweak that depending on whether you want the VGA console also.

SSDs and Linux Native Command Queuing

Native Command Queueing ^

Native Command Queuing (NCQ) is an extension of the Serial ATA protocol that allows multiple requests to be sent to a drive, allowing the drive to order them in a way it considers optimal.

This is very handy for rotational media like conventional hard drives, because they have to move the head all over to do random IO, so in theory if they are allowed to optimise ordering then they may be able to do a better job of it. If the drive supports NCQ then it will advertise this fact to the operating system and Linux by default will enable it.

Queue depth ^

The maximum depth of the queue in SATA is 31 for practical purposes, and so if the drive supports NCQ then Linux will usually set the depth to 31. You can change the depth by writing a number between 1 and 31 to /sys/block/<device>/device/queue_depth. Writing 1 to the file effectively disables NCQ for that device.

NCQ and SSDs ^

So what about SSDs? They aren’t rotational media; any access is in theory the same as any other access, so no need to optimally order the commands, right?

The sad fact is, many SSDs even today have incompatibilities with SATA drivers and chipsets such that NCQ does not reliably work. There’s advice all over the place that NCQ can be disabled with no ill effect, because supposedly SSDs do not benefit from it. Some posts even go as far as to suggest that NCQ might be detrimental to performance with SSDs.

Well, let’s see what fio has to say about that.

The setup ^

  • Two Intel DC s3610 1.6TB SSDs in an MD RAID-10 on Debian 8.1.
  • noop IO scheduler.
  • fio operating on a 4GiB test file that is on an ext4 filesystem backed by LVM.
  • fio set to do a 70/30% mix of read vs write operations with 128 simultaneous IO operations in flight.

The goal of this is to simulate a busy highly parallel server load, such as you might see with a database.

The fio command line looks like this:

fio --randrepeat=1 \
    --ioengine=libaio \
    --direct=1 \
    --gtod_reduce=1 \
    --name=ncq \
    --filename=test \
    --bs=4k \
    --iodepth=128 \
    --size=4G \
    --readwrite=randrw \
    --rwmixread=70

Expected output will be something like this:

ncq: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
fio-2.1.11
Starting 1 process
Jobs: 1 (f=1): [m(1)] [100.0% done] [50805KB/21546KB/0KB /s] [12.8K/5386/0 iops] [eta 00m:00s]
ncq1: (groupid=0, jobs=1): err= 0: pid=11272: Sun Aug  9 06:29:33 2015
  read : io=2867.6MB, bw=44949KB/s, iops=11237, runt= 65327msec
  write: io=1228.5MB, bw=19256KB/s, iops=4813, runt= 65327msec
  cpu          : usr=4.39%, sys=25.20%, ctx=732814, majf=0, minf=6
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=734099/w=314477/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=128
 
Run status group 0 (all jobs):
   READ: io=2867.6MB, aggrb=44949KB/s, minb=44949KB/s, maxb=44949KB/s, mint=65327msec, maxt=65327msec
  WRITE: io=1228.5MB, aggrb=19255KB/s, minb=19255KB/s, maxb=19255KB/s, mint=65327msec, maxt=65327msec
 
Disk stats (read/write):
    dm-0: ios=732755/313937, merge=0/0, ticks=4865644/3457248, in_queue=8323636, util=99.97%, aggrios=734101/314673, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
    md4: ios=734101/314673, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=364562/313849, aggrmerge=2519/1670, aggrticks=2422422/2049132, aggrin_queue=4471730, aggrutil=94.37%
  sda: ios=364664/313901, merge=2526/1618, ticks=2627716/2223944, in_queue=4852092, util=94.37%
  sdb: ios=364461/313797, merge=2513/1722, ticks=2217128/1874320, in_queue=4091368, util=91.68%

The figures we’re interested in are the iops= ones, in this case 11237 and 4813 for read and write respectively.

Results ^

Here’s how different NCQ queue depths affected things. Click the graph image for the full size version.

Graph of the effect of NCQ queue depth on read/write IOPS

Conclusion ^

On this setup anything below a queue depth of about 8 is disastrous to performance. The aberration at a queue depth of 19 is interesting. This is actually repeatable. I have no explanation for it.

Don’t believe anyone who tells you that NCQ is unimportant for SSDs unless you’ve benchmarked that and proven it to yourself. Disabling NCQ on an Intel DC s3610 appears to reduce its performance to around 25% of what it would be with even a queue depth of 8. Modern SSDs, especially enterprise ones, have a parallel architecture that allows them to get multiple things done at once. They expect NCQ to be enabled.

It’s easy to guess why 8 might be the magic number for the DC s3610:

The top of the PCB has eight NAND emplacements and Intel’s proprietary eight-channel PC29AS21CB0 controller.

The newer NVMe devices are even more aggressive with this; while the SATA spec stops at one queue with a depth of 32, NVMe specifies up to 65k queues with a depth of up to 65k each! Modern SSDs are designed with this in mind.

Fun with Supermicro motherboard serial headers

or, “LOL, standards” ^

TL;DR: Most motherboards have a serial header in an IDC-10 (5×2 pins) arrangement with the pins as a row of even numbered pins (2,4,6,8,X) followed by a row of odd numbered pins (1,3,5,7,9). Supermicro ones appear to have the pins in sequential order (6,7,8,9,X and then 1,2,3,4,5). As a result a standard IDC-10 to DB-9 cable will not work and you’ll need to either hack one about or buy the Supermicro one.

Are we sitting comfortably? ^

I bought a Supermicro motherboard. It doesn’t have a serial port exposed at the back. I like to use serial ports for a serial console even though I am aware that IPMI exists. IPMI on this board works okay but I like knowing I can always get to the “real” serial port as well.

The motherboard has a COM1 serial header, and I wasn’t using the PCI expansion slot on the back of the chassis, so I decided to put a serial port there. I bought a typical IDC-10 / DB-9 cable and plate:

IDC-10 to DB-9

Didn’t work. Serial-over-LAN (IPMI) worked alright. On COM1 I would get either nothing or a run of garbage characters from time to time. I wasted a good number of hours messing with BIOS settings, baud rates, checking if my USB serial adaptor actually worked with another device (of which I only have one in my home), before I decided to sit down and check the pin numbering for both the header and the cable.

Looking at the motherboard manual we see this:

x10sdv board com1 pin layout

And the cable?

IDC-10 to DB-9 pinout

Notice anything amiss?

The cable’s pins go in a row of odd numbers and then a row of even numbers:

2 4 6 8 X
1 3 5 7 9
    -

The X is the missing pin (serial uses 9 pins) and the - indicates where the notch for the connector would be: next to pin 5 in this case.

The header’s pins go in sequential order:

6 7 8 9 X
1 2 3 4 5
    -

As a result all but pin 1 are incorrect.

You actually need a Supermicro cable for this. CBL-0010L is the part number in my case. CBL-0010LP would be the low profile version. Good luck finding it mentioned on Supermicro’s site, but your favourite reseller will probably know of it. As it was I found one on Ebay for £1.58+VAT, and it works now.

After knowing what to search for I also found someone else having a similar issue with a Supermicro board.

You could of course instead hack any existing cable’s pins about or fit an adaptor in between (as the person in the above link did).

Thanks Supermicro. Thupermicro.

OneRNG kickstarter arrived!

My OneRNG kickstarter arrived today. I had five units, so I chose three external models and two internal ones. The finish of the external model isn’t really up to the quality of an Entropy Key. Here’s a picture of them together.

OneRNG external and Entropy Key

Given that the external model looks rather flimsy — I could imagine it getting snapped in half if someone bumped into it — I think I’d probably prefer the internal model in practice. Here’s what that looks like:

OneRNG Internal

The three different connectors are to try to ensure you can find a useful connection angle no matter how your motherboard’s internal USB headers are laid out.

I haven’t yet plugged them in to check out how they work. This is probably going to have to wait a few weeks as I have quite a lot on.

Assuming they work about as well as the Entropy Keys then I only need to keep two of these for myself, so if anyone wants one I would be willing to sell it on to you at cost plus postage.

Has your CurrentCost ever done this?

Update: Replacing the battery and retraining the receiver fixed it. I suppose it must have had enough juice to flash the LED but not transmit.

A few days ago my CurrentCost starting reading just dashes. There’s also no transmitter icon, so I think it’s not receiving anything from the transmitter. It looks like this:

CurrentCost showing just dashes

I went and fished the transmitter box out of the meter closet expecting its batteries to be dead, but it still has its red LED flashing periodically, so I don’t think it’s that.

I did the thing where you hold down the button on the transmitter for 9 seconds and also hold down the V button on the display to make them pair. The display showed its “searching” screen for a while but then it went back to how it looks above.

Anyone had that happen before? It’s otherwise worked fine for 4 years or so (batteries replaced once).

Home file server / disk enclosure troubles

On March 26th the file server here at home started resetting one of its ATA links over and over, in a couple of bursts each hour. Like this:

Mar 26 13:38:34 specialbrew kernel: [5137674.573170] ata5.03: exception Emask 0x0 SAct 0xc0 SErr 0x400000 action 0x6
Mar 26 13:38:34 specialbrew kernel: [5137674.579164] ata5.03: irq_stat 0x00060002, device error via SDB FIS
Mar 26 13:38:34 specialbrew kernel: [5137674.585127] ata5.03: SError: { Handshk }
Mar 26 13:38:34 specialbrew kernel: [5137674.591016] ata5.03: failed command: WRITE FPDMA QUEUED
Mar 26 13:38:34 specialbrew kernel: [5137674.596962] ata5.03: cmd 61/80:30:80:79:10/00:00:14:00:00/40 tag 6 ncq 65536 out
Mar 26 13:38:34 specialbrew kernel: [5137674.596962]          res 41/84:00:9f:79:10/00:00:14:00:00/40 Emask 0x410 (ATA bus error) <F>
Mar 26 13:38:34 specialbrew kernel: [5137674.609065] ata5.03: status: { DRDY ERR }
Mar 26 13:38:34 specialbrew kernel: [5137674.615131] ata5.03: error: { ICRC ABRT }
Mar 26 13:38:34 specialbrew kernel: [5137674.621135] ata5.03: failed command: WRITE FPDMA QUEUED
Mar 26 13:38:34 specialbrew kernel: [5137674.627123] ata5.03: cmd 61/80:38:00:7a:10/00:00:14:00:00/40 tag 7 ncq 65536 out
Mar 26 13:38:34 specialbrew kernel: [5137674.627123]          res 98/d7:00:00:00:00/00:00:00:70:98/00 Emask 0x2 (HSM violation)
Mar 26 13:38:34 specialbrew kernel: [5137674.639241] ata5.03: status: { Busy }
Mar 26 13:38:34 specialbrew kernel: [5137674.645349] ata5.03: error: { ICRC UNC AMNF IDNF ABRT }
Mar 26 13:38:34 specialbrew kernel: [5137674.651591] ata5.03: hard resetting link
Mar 26 13:38:34 specialbrew kernel: [5137674.976944] ata5.03: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Mar 26 13:38:34 specialbrew kernel: [5137674.984592] ata5.03: configured for UDMA/100
Mar 26 13:38:34 specialbrew kernel: [5137674.990852] ata5: EH complete

Or this:

Mar 26 13:53:36 specialbrew kernel: [5138576.767785] ata5.02: exception Emask 0x0 SAct 0x100000 SErr 0x400000 action 0x6
Mar 26 13:53:36 specialbrew kernel: [5138576.774595] ata5.02: irq_stat 0x00060002, device error via SDB FIS
Mar 26 13:53:36 specialbrew kernel: [5138576.781330] ata5.02: SError: { Handshk }
Mar 26 13:53:36 specialbrew kernel: [5138576.788067] ata5.02: failed command: WRITE FPDMA QUEUED
Mar 26 13:53:36 specialbrew kernel: [5138576.794853] ata5.02: cmd 61/80:a0:00:20:a8/00:00:25:00:00/40 tag 20 ncq 65536 out
Mar 26 13:53:36 specialbrew kernel: [5138576.794853]          res 41/84:80:00:20:a8/00:00:25:00:00/00 Emask 0x410 (ATA bus error) <F>
Mar 26 13:53:36 specialbrew kernel: [5138576.808592] ata5.02: status: { DRDY ERR }
Mar 26 13:53:36 specialbrew kernel: [5138576.815527] ata5.02: error: { ICRC ABRT }
Mar 26 13:53:36 specialbrew kernel: [5138576.822504] ata5.02: hard resetting link
Mar 26 13:53:36 specialbrew kernel: [5138577.145789] ata5.02: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Mar 26 13:53:36 specialbrew kernel: [5138577.154616] ata5.02: configured for UDMA/100
Mar 26 13:53:36 specialbrew kernel: [5138577.161849] ata5: EH complete
Mar 26 13:53:38 specialbrew kernel: [5138578.402262] ata5.03: failed command: WRITE FPDMA QUEUED
Mar 26 13:53:38 specialbrew kernel: [5138578.409516] ata5.03: cmd 61/d8:f0:a0:5b:11/00:00:00:00:00/40 tag 30 ncq 110592 out
Mar 26 13:53:38 specialbrew kernel: [5138578.409516]          res 99/d7:00:00:00:00/00:00:00:e0:99/00 Emask 0x2 (HSM violation)
Mar 26 13:53:38 specialbrew kernel: [5138578.424086] ata5.03: status: { Busy }
Mar 26 13:53:38 specialbrew kernel: [5138578.431389] ata5.03: error: { ICRC UNC AMNF IDNF ABRT }
Mar 26 13:53:38 specialbrew kernel: [5138578.438850] ata5.03: hard resetting link
Mar 26 13:53:38 specialbrew kernel: [5138578.765827] ata5.03: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Mar 26 13:53:38 specialbrew kernel: [5138578.774446] ata5.03: configured for UDMA/100
Mar 26 13:53:38 specialbrew kernel: [5138578.781795] ata5: EH complete

Despite the masses of link resets a couple of times an hour it all seemed to be working still, and it was a few days before I looked into it further.

The hardware is an HP Microserver with a Sans Digital 8-bay SATA disk enclosure connected over 2x eSATA. The Microserver’s four internal SATA slots are used for something else; these messages relate to the two eSATA links (ata5 and ata6) to the disk enclosure. There’s seven disks in there, six of which form a btrfs filesystem which holds the majority of files we use here at home (yes, I’ve heard about ZFSonLinux. No, I’m not interested in using it yet).

I have never been too happy with that Sans Digital enclosure:

  • Half the LEDs on the front have never worked. So you only get link lights for one SATA link.
  • As a more general example of the above, the mechanical build quality just seems bad. It’s really flimsy.
  • It’s pretty slow to detect disks and spews errors in the dmesg as it does.
  • It’s done this thing before where it resets the SATA link over and over, but the problem mysteriously went away when everything was power-cycled.

So anyway, plan of action:

  1. Try rebooting everything.
  2. If no joy, buy a new enclosure from somewhere and try to make it not another one of those.

It was 30th March before I got around to giving that a go. There was a short period of terror where the Microserver abruptly declared

No boot device. Insert boot floppy.

but amusingly this turned out to be because the BIOS was set to prefer booting off a USB disk, and I had left my Sansa Clip plugged in to charge it. It was trying to boot off of an mp3 player.

Sadly once we did actually get past GruB things were not much happier. SATA links resetting pretty much continuously, to the point were it couldn’t even get to the end of Linux attaching the disks as SCSI devices. Unplugging the eSATA cables entirely left the Microserver happy to boot with no issues other than all my data being missing!

A process of elimination (take all disks out, boot, put disks back in one by one and boot each time until everything complains) allowed me to identify one particular disk that was really upsetting things. I was able to boot without that disk inserted, and I could mount the btrfs filesystem in degraded mode as it’s a RAID-1 for both data and metadata.

At this point I decided to push my luck and try inserting that one problematic disk again. As soon as I did it all went crazy again, and the link resets caused a bunch more devices to be ejected by Linux. I was left with a btrfs filesystem with three remaining devices that had mounted itself read-only due to read errors.

So, is it the disk or is the the enclosure?

I don’t have a spare enclosure so it’s rather difficult to stress test that, but bearing in mind that I’m not too happy with it anyway I decided to buy a new enclosure anyway and investigate the existing one later. This time I decided to try this thing, which appears to be a CFi-B7886CM. No I’ve never heard of them either but I doubt it is any different to what Sans Digital buy and slap their branding on.

It’s quite a bit cheaper than the Sans Digital unit, though obviously it only has one eSATA link. I’m not too bothered by that as I doubt that any four of my disks were exceeding 3Gbps or whatever, so yeah, whatever, I’ll put six disks on one link. It might even be a benefit if I want to add another one of these enclosures later, as there’s no more room in the Microserver for more eSATA cards.

While that was on order I got things back up and running with the five disks and had the btrfs filesystem running degraded (yes, I do have backups). I attached that suspect disk to another machine via a USB dock and was able to repeatedly read it all, perform SMART long tests on it without error, etc.

The disk caddies are almost identical, and are interchangeable.
The disk caddies are almost identical, and are interchangeable.

The new enclosure arrived today and my first impressions of it are fairly positive. In terms of appearance and dimensions it’s almost identical to the Sans Digital one. In fact the disk caddies look identical bar the colour of the plastic handle, and are interchangeable, which saved me the hassle of having to unscrew 28 screws (seven disks, four per disk) and screw them in again. It definitely feels bit more sturdy though.

On boot for some reason I expected the btrfs filesystem to not automatically mount. It had been running with a device missing for nearly two days, and in that state you have to do

# mount -odegraded /srv/tank

to get it to mount, so I was thinking that even with all disks inserted it would decide that the out-of-date disk was unacceptable and require manual intervention to mount it. Well, I was wrong about that. It automatically mounted it and then immediately started complaining:

Apr  1 14:46:14 specialbrew kernel: [  161.210994] BTRFS info (device sdj): csum failed ino 259 off 12288 csum 1424469249 expected csum 1836639932
Apr  1 14:46:14 specialbrew kernel: [  161.211321] BTRFS info (device sdj): csum failed ino 259 off 24576 csum 3532125205 expected csum 1877817807
Apr  1 14:46:19 specialbrew kernel: [  165.447070] verify_parent_transid: 9 callbacks suppressed
Apr  1 14:46:19 specialbrew kernel: [  165.457108] parent transid verify failed on 3438446813184 wanted 1144554 found 1141678
Apr  1 14:46:19 specialbrew kernel: [  165.468310] repair_io_failure: 16 callbacks suppressed
Apr  1 14:46:19 specialbrew kernel: [  165.478409] BTRFS: read error corrected: ino 1 off 3438446813184 (dev /dev/sdk sector 1525564240)
Apr  1 14:46:19 specialbrew kernel: [  165.491461] parent transid verify failed on 3438446821376 wanted 1144554 found 1141678
Apr  1 14:46:19 specialbrew kernel: [  165.501892] BTRFS: read error corrected: ino 1 off 3438446821376 (dev /dev/sdk sector 1525564256)
Apr  1 14:46:19 specialbrew kernel: [  165.512262] parent transid verify failed on 3438336446464 wanted 1144475 found 1141571
Apr  1 14:46:19 specialbrew kernel: [  165.522746] BTRFS: read error corrected: ino 1 off 3438336446464 (dev /dev/sdk sector 1525348680)
Apr  1 14:46:34 specialbrew kernel: [  180.411201] parent transid verify failed on 3892014911488 wanted 1144992 found 1142134
Apr  1 14:46:34 specialbrew kernel: [  180.422607] BTRFS: read error corrected: ino 1 off 3892014911488 (dev /dev/sdk sector 150856)
Apr  1 14:46:34 specialbrew kernel: [  180.442075] parent transid verify failed on 3892014706688 wanted 1144992 found 1142133
Apr  1 14:46:34 specialbrew kernel: [  180.453157] BTRFS: read error corrected: ino 1 off 3892014706688 (dev /dev/sdk sector 150456)
Apr  1 14:46:34 specialbrew kernel: [  180.475472] parent transid verify failed on 3892014710784 wanted 1144992 found 1142133
Apr  1 14:46:34 specialbrew kernel: [  180.486688] BTRFS: read error corrected: ino 1 off 3892014710784 (dev /dev/sdk sector 150464)
Apr  1 14:46:34 specialbrew kernel: [  180.508824] parent transid verify failed on 3892014714880 wanted 1144992 found 1142133
Apr  1 14:46:34 specialbrew kernel: [  180.519368] BTRFS: read error corrected: ino 1 off 3892014714880 (dev /dev/sdk sector 150472)
Apr  1 14:47:22 specialbrew kernel: [  229.019490] parent transid verify failed on 3438634471424 wanted 1144757 found 1141901
Apr  1 14:47:22 specialbrew kernel: [  229.056008] BTRFS: read error corrected: ino 1 off 3438634471424 (dev /dev/sdk sector 1525930760)
It's nice to have all the blinkenlights blinking.
It’s nice to have all the blinkenlights blinking.

This is apparently what you would normally expect from btrfs fixing the out-of-date metadata from the other copy.

Everything seems okay; there are no read errors and no more link resets, and a cursory inspection of important data is matching up with checksums from last good backup. Oh, and all the SATA link/activity LEDs work!

I have to now:

  • do a btrfs scrub to make sure all the data on sdk is correct.
  • investigate the Sans Digital enclosure to see if I can replicate its brokenness.
While we're at it, is there a brand of label tape that doesn't dry up and peel off?
While we’re at it, is there a brand of label tape that doesn’t dry up and peel off?

Electronically I can’t tell if it’s any different to the old enclosure (aside from only having one SATA link, not two).

Comparing the dmesg between “SCSI subsystem initialized” and the last SCSI disk being attached, for the old and the new enclosure I see a few interesting differences. The new one is a little bit faster, and has fewer weird errors (failed to IDENTIFY, etc.).

Old:

[    5.818389] ata5.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9

New:

[    8.116807] ata6.15: Port Multiplier 1.2, 0x197b:0x575f r0, 15 ports, feat 0x5/0xf

So the new one supports Port Multiplier v1.2 as opposed to v1.1. Presumably that is good news. I wonder what feat 0x1/0x9 vs feat 0x5/0xf means?

With the old enclosure the SATA links seem to come up as a mixture of 1.5 and 3.0Gbps, and multiple times as the links are reset:

[    3.510120] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    3.510209] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    3.510290] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    3.510359] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    5.818090] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[    6.138331] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[    6.458331] ata5.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    6.778330] ata5.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    7.738323] ata5.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[   10.082066] ata5.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   10.402304] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   11.778298] ata5.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   12.098296] ata5.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   13.058294] ata5.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[   15.406036] ata5.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   15.726279] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   16.046275] ata5.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   17.418267] ata5.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   18.378264] ata5.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[   20.586006] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   20.906281] ata6.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   21.226248] ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   21.866245] ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   22.506241] ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[   24.869982] ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   25.190226] ata6.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   26.546216] ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   27.186212] ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   27.826212] ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[   30.169952] ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   30.490216] ata6.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   31.866187] ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   32.506184] ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   33.146180] ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)

The new one, they’re all 6.0Gbps (although the eSATA link itself seems to only be 3Gbps; not sure why that would be but it’s no worse anyway):

[    3.509782] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    3.513788] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    3.513878] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    3.513965] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    7.961748] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[    8.454063] ata6.00: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    8.822059] ata6.01: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    9.174057] ata6.02: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    9.510064] ata6.03: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    9.846061] ata6.04: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   10.182059] ata6.05: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   10.518057] ata6.06: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

Hmm, probably should investigate why /dev/sd[a-d] all have their write cache disabled.

Update 2nd April:

The scrub seems to have gone okay:

scrub status for 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
scrub device /dev/sdh (id 2) history
        scrub started at Wed Apr  1 20:05:58 2015 and finished after 14642 seconds
        total bytes scrubbed: 383.42GiB with 0 errors
scrub device /dev/sdg (id 3) history
        scrub started at Wed Apr  1 20:05:58 2015 and finished after 14504 seconds
        total bytes scrubbed: 382.62GiB with 0 errors
scrub device /dev/sdf (id 4) history
        scrub started at Wed Apr  1 20:05:58 2015 and finished after 14436 seconds
        total bytes scrubbed: 383.00GiB with 0 errors
scrub device /dev/sdk (id 5) history
        scrub started at Wed Apr  1 20:05:58 2015 and finished after 21156 seconds
        total bytes scrubbed: 1.13TiB with 14530 errors
        error details: verify=10909 csum=3621
        corrected errors: 14530, uncorrectable errors: 0, unverified errors: 0
scrub device /dev/sdj (id 6) history
        scrub started at Wed Apr  1 20:05:58 2015 and finished after 5693 seconds
        total bytes scrubbed: 119.42GiB with 0 errors
scrub device /dev/sde (id 7) history
        scrub started at Wed Apr  1 20:05:58 2015 and finished after 5282 seconds
        total bytes scrubbed: 114.45GiB with 0 errors

Those dmesg excerpts.

Old enclosure:

[    3.010635] SCSI subsystem initialized
[    3.013325] libata version 3.00 loaded.
[    3.014937] ahci 0000:00:11.0: version 3.0
[    3.015245] ahci 0000:00:11.0: irq 42 for MSI/MSI-X
[    3.015350] ahci 0000:00:11.0: AHCI 0001.0200 32 slots 4 ports 3 Gbps 0xf impl SATA mode
[    3.015413] ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part 
[    3.016613] scsi0 : ahci
[    3.017073] scsi1 : ahci
[    3.017527] scsi2 : ahci
[    3.017891] scsi3 : ahci
[    3.018071] ata1: SATA max UDMA/133 abar m1024@0xfe5ffc00 port 0xfe5ffd00 irq 42
[    3.018123] ata2: SATA max UDMA/133 abar m1024@0xfe5ffc00 port 0xfe5ffd80 irq 42
[    3.018174] ata3: SATA max UDMA/133 abar m1024@0xfe5ffc00 port 0xfe5ffe00 irq 42
[    3.018224] ata4: SATA max UDMA/133 abar m1024@0xfe5ffc00 port 0xfe5ffe80 irq 42
[    3.510120] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    3.510209] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    3.510290] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    3.510359] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    3.511393] ata3.00: ATA-8: ST3320418AS, CC38, max UDMA/133
[    3.511446] ata3.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
[    3.512865] ata3.00: configured for UDMA/133
[    3.550857] ata2.00: ATA-7: ST3320620AS, 3.AAC, max UDMA/133
[    3.550910] ata2.00: 625134827 sectors, multi 0: LBA48 NCQ (depth 31/32)
[    3.552655] ata4.00: ATA-7: ST3320620AS, 3.AAC, max UDMA/133
[    3.552708] ata4.00: 625134827 sectors, multi 0: LBA48 NCQ (depth 31/32)
[    3.555002] ata1.00: ATA-7: ST3320620AS, 3.AAC, max UDMA/133
[    3.555055] ata1.00: 625134827 sectors, multi 0: LBA48 NCQ (depth 31/32)
[    3.609166] ata2.00: configured for UDMA/133
[    3.610965] ata4.00: configured for UDMA/133
[    3.613304] ata1.00: configured for UDMA/133
[    3.613707] scsi 0:0:0:0: Direct-Access     ATA      ST3320620AS      C    PQ: 0 ANSI: 5
[    3.614932] scsi 1:0:0:0: Direct-Access     ATA      ST3320620AS      C    PQ: 0 ANSI: 5
[    3.615974] scsi 2:0:0:0: Direct-Access     ATA      ST3320418AS      CC38 PQ: 0 ANSI: 5
[    3.616662] scsi 3:0:0:0: Direct-Access     ATA      ST3320620AS      C    PQ: 0 ANSI: 5
[    3.619964] sata_sil24 0000:02:00.0: version 1.1
[    3.620839] scsi4 : sata_sil24
[    3.621229] scsi5 : sata_sil24
[    3.621391] ata5: SATA max UDMA/100 host m128@0xfe8ffc00 port 0xfe8f8000 irq 16
[    3.621446] ata6: SATA max UDMA/100 host m128@0xfe8ffc00 port 0xfe8fa000 irq 16
[    3.914252] Switched to clocksource tsc
[    5.818090] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[    5.818389] ata5.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
[    5.820283] ata5.00: hard resetting link
[    6.138331] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[    6.138411] ata5.01: hard resetting link
[    6.458331] ata5.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    6.458412] ata5.02: hard resetting link
[    6.778330] ata5.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    6.778409] ata5.03: hard resetting link
[    7.098325] ata5.03: SATA link down (SStatus 0 SControl 320)
[    7.098427] ata5.04: hard resetting link
[    7.418325] ata5.04: SATA link down (SStatus 0 SControl 320)
[    7.418427] ata5.05: hard resetting link
[    7.738323] ata5.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[    7.739205] ata5.00: ATA-8: ST2000DL003-9VT166, CC3C, max UDMA/133
[    7.739259] ata5.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 31/32)
[    7.740177] ata5.00: configured for UDMA/100
[    7.850059] ata5.01: failed to IDENTIFY (I/O error, err_mask=0x11)
[    7.850116] ata5.15: hard resetting link
[    7.850161] ata5: controller in dubious state, performing PORT_RST
[   10.082066] ata5.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   10.082368] ata5.00: hard resetting link
[   10.402304] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   11.458039] ata5.01: hard resetting link
[   11.778298] ata5.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   11.778378] ata5.02: hard resetting link
[   12.098296] ata5.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   12.738033] ata5.05: hard resetting link
[   13.058294] ata5.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[   13.060046] ata5.00: configured for UDMA/100
[   13.061547] ata5.01: ATA-8: ST500DM002-1BD142, KC45, max UDMA/133
[   13.061600] ata5.01: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[   13.063284] ata5.01: configured for UDMA/100
[   13.174029] ata5.02: failed to IDENTIFY (I/O error, err_mask=0x11)
[   13.174086] ata5.15: hard resetting link
[   13.174131] ata5: controller in dubious state, performing PORT_RST
[   15.406036] ata5.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   15.406340] ata5.00: hard resetting link
[   15.726279] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   15.726359] ata5.01: hard resetting link
[   16.046275] ata5.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   17.098013] ata5.02: hard resetting link
[   17.418267] ata5.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   17.418347] ata5.03: hard resetting link
[   17.738266] ata5.03: SATA link down (SStatus 0 SControl 320)
[   17.738371] ata5.04: hard resetting link
[   18.058262] ata5.04: SATA link down (SStatus 0 SControl 320)
[   18.058364] ata5.05: hard resetting link
[   18.378264] ata5.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[   18.380014] ata5.00: configured for UDMA/100
[   18.383049] ata5.01: configured for UDMA/100
[   18.383981] ata5.02: ATA-8: ST2000DL003-9VT166, CC3C, max UDMA/133
[   18.384034] ata5.02: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[   18.385025] ata5.02: configured for UDMA/100
[   18.385136] ata5: EH complete
[   18.385504] scsi 4:0:0:0: Direct-Access     ATA      ST2000DL003-9VT1 CC3C PQ: 0 ANSI: 5
[   18.386442] scsi 4:1:0:0: Direct-Access     ATA      ST500DM002-1BD14 KC45 PQ: 0 ANSI: 5
[   18.387336] scsi 4:2:0:0: Direct-Access     ATA      ST2000DL003-9VT1 CC3C PQ: 0 ANSI: 5
[   20.586006] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   20.586306] ata6.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
[   20.588209] ata6.00: hard resetting link
[   20.906281] ata6.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   20.906362] ata6.01: hard resetting link
[   21.226248] ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   21.226328] ata6.02: hard resetting link
[   21.546245] ata6.02: SATA link down (SStatus 0 SControl 320)
[   21.546348] ata6.03: hard resetting link
[   21.866245] ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   21.866326] ata6.04: hard resetting link
[   22.186243] ata6.04: SATA link down (SStatus 0 SControl 320)
[   22.186345] ata6.05: hard resetting link
[   22.506241] ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[   22.507044] ata6.00: ATA-8: ST2000DM001-1CH164, CC26, max UDMA/133
[   22.507097] ata6.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 31/32)
[   22.526916] ata6.00: configured for UDMA/100
[   22.637976] ata6.01: failed to IDENTIFY (I/O error, err_mask=0x11)
[   22.638033] ata6.15: hard resetting link
[   22.638077] ata6: controller in dubious state, performing PORT_RST
[   24.869982] ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   24.870284] ata6.00: hard resetting link
[   25.190226] ata6.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   26.225956] ata6.01: hard resetting link
[   26.546216] ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   26.865951] ata6.03: hard resetting link
[   27.186212] ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   27.505948] ata6.05: hard resetting link
[   27.826212] ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[   27.828088] ata6.00: configured for UDMA/100
[   27.937960] ata6.01: failed to IDENTIFY (I/O error, err_mask=0x11)
[   27.938018] ata6.15: hard resetting link
[   27.938064] ata6: controller in dubious state, performing PORT_RST
[   30.169952] ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   30.170255] ata6.00: hard resetting link
[   30.490216] ata6.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   31.545926] ata6.01: hard resetting link
[   31.866187] ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   31.866267] ata6.02: hard resetting link
[   32.186183] ata6.02: SATA link down (SStatus 0 SControl 320)
[   32.186286] ata6.03: hard resetting link
[   32.506184] ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   32.506265] ata6.04: hard resetting link
[   32.826180] ata6.04: SATA link down (SStatus 0 SControl 320)
[   32.826283] ata6.05: hard resetting link
[   33.146180] ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[   33.148097] ata6.00: configured for UDMA/100
[   33.149175] ata6.01: ATA-8: ST2000DM001-1CH164, CC26, max UDMA/133
[   33.149232] ata6.01: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[   33.150077] ata6.01: configured for UDMA/100
[   33.150673] ata6.03: ATA-9: WDC WD30EZRX-00D8PB0, 80.00A80, max UDMA/133
[   33.151349] ata6.03: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[   33.151980] ata6.03: configured for UDMA/100
[   33.152091] ata6: EH complete
[   33.152449] scsi 5:0:0:0: Direct-Access     ATA      ST2000DM001-1CH1 CC26 PQ: 0 ANSI: 5
[   33.153351] scsi 5:1:0:0: Direct-Access     ATA      ST2000DM001-1CH1 CC26 PQ: 0 ANSI: 5
[   33.154294] scsi 5:3:0:0: Direct-Access     ATA      WDC WD30EZRX-00D 0A80 PQ: 0 ANSI: 5
[   33.159340] udevd[79]: starting version 175
[   33.279358] pps_core: LinuxPPS API ver. 1 registered
[   33.279447] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[   33.281013] PTP clock support registered
[   33.288070] tg3.c:v3.137 (May 11, 2014)
[   33.305489] ACPI: bus type USB registered
[   33.305628] usbcore: registered new interface driver usbfs
[   33.305692] usbcore: registered new interface driver hub
[   33.334492] sd 2:0:0:0: [sdc] 625142448 512-byte logical blocks: (320 GB/298 GiB)
[   33.334600] sd 1:0:0:0: [sdb] 625134827 512-byte logical blocks: (320 GB/298 GiB)
[   33.334663] sd 4:0:0:0: [sde] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[   33.334713] sd 4:0:0:0: [sde] 4096-byte physical blocks
[   33.334769] sd 3:0:0:0: [sdd] 625134827 512-byte logical blocks: (320 GB/298 GiB)
[   33.334851] sd 0:0:0:0: [sda] 625134827 512-byte logical blocks: (320 GB/298 GiB)
[   33.335019] sd 1:0:0:0: [sdb] Write Protect is off
[   33.335071] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[   33.335654] sd 3:0:0:0: [sdd] Write Protect is off
[   33.335738] sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[   33.335805] sd 0:0:0:0: [sda] Write Protect is off
[   33.335851] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   33.335931] sd 2:0:0:0: [sdc] Write Protect is off
[   33.335981] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[   33.335997] sd 4:0:0:0: [sde] Write Protect is off
[   33.336043] sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00
[   33.336072] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   33.336272] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   33.336430] sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   33.336549] sd 3:0:0:0: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   33.336657] sd 2:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   33.338151] usbcore: registered new device driver usb
[   33.338392] sd 4:1:0:0: [sdf] 976773168 512-byte logical blocks: (500 GB/465 GiB)
[   33.338448] sd 4:1:0:0: [sdf] 4096-byte physical blocks
[   33.338556] sd 4:1:0:0: [sdf] Write Protect is off
[   33.338602] sd 4:1:0:0: [sdf] Mode Sense: 00 3a 00 00
[   33.338631] sd 4:1:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   33.340427] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[   33.340790] ehci-pci: EHCI PCI platform driver
[   33.341526] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[   33.341963] ohci-pci: OHCI PCI platform driver
[   33.343818]  sdf: unknown partition table
[   33.344181] sd 4:1:0:0: [sdf] Attached SCSI disk
[   33.344271] sd 4:2:0:0: [sdg] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[   33.344322] sd 4:2:0:0: [sdg] 4096-byte physical blocks
[   33.344424] sd 4:2:0:0: [sdg] Write Protect is off
[   33.344470] sd 4:2:0:0: [sdg] Mode Sense: 00 3a 00 00
[   33.344497] sd 4:2:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   33.348760]  sdc: sdc1 sdc2 sdc3
[   33.348907] scsi6 : pata_atiixp
[   33.349487] sd 2:0:0:0: [sdc] Attached SCSI disk
[   33.349615] sd 5:0:0:0: [sdh] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[   33.349674] sd 5:0:0:0: [sdh] 4096-byte physical blocks
[   33.349777] sd 5:0:0:0: [sdh] Write Protect is off
[   33.349892] sd 5:0:0:0: [sdh] Mode Sense: 00 3a 00 00
[   33.349928] sd 5:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   33.351467] sd 5:1:0:0: [sdi] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[   33.351562] sd 5:1:0:0: [sdi] 4096-byte physical blocks
[   33.351659] sd 5:1:0:0: [sdi] Write Protect is off
[   33.351707] sd 5:1:0:0: [sdi] Mode Sense: 00 3a 00 00
[   33.351731] sd 5:1:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   33.354852]  sdb: sdb1 sdb2 sdb3
[   33.355394] sd 1:0:0:0: [sdb] Attached SCSI disk
[   33.355517] sd 5:3:0:0: [sdj] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
[   33.355570] sd 5:3:0:0: [sdj] 4096-byte physical blocks
[   33.355687] sd 5:3:0:0: [sdj] Write Protect is off
[   33.355746] sd 5:3:0:0: [sdj] Mode Sense: 00 3a 00 00
[   33.355774] sd 5:3:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   33.356077]  sdd: sdd1 sdd2 sdd3
[   33.356565] sd 3:0:0:0: [sdd] Attached SCSI disk
[   33.356772]  sde: unknown partition table
[   33.357024] sd 4:0:0:0: [sde] Attached SCSI disk
[   33.358014]  sda: sda1 sda2 sda3
[   33.358470] sd 0:0:0:0: [sda] Attached SCSI disk
[   33.360879] sd 0:0:0:0: Attached scsi generic sg0 type 0
[   33.361384] sd 1:0:0:0: Attached scsi generic sg1 type 0
[   33.361524] sd 2:0:0:0: Attached scsi generic sg2 type 0
[   33.362055] sd 3:0:0:0: Attached scsi generic sg3 type 0
[   33.362192] random: nonblocking pool is initialized
[   33.362992]  sdg: unknown partition table
[   33.363064] sd 4:0:0:0: Attached scsi generic sg4 type 0
[   33.363217] sd 4:1:0:0: Attached scsi generic sg5 type 0
[   33.363425] sd 4:2:0:0: [sdg] Attached SCSI disk
[   33.363684] sd 4:2:0:0: Attached scsi generic sg6 type 0
[   33.364163] sd 5:0:0:0: Attached scsi generic sg7 type 0
[   33.364262] scsi7 : pata_atiixp
[   33.364414] ata7: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xff00 irq 14
[   33.364482] ata8: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xff08 irq 15
[   33.364779] sd 5:1:0:0: Attached scsi generic sg8 type 0
[   33.364816] QUIRK: Enable AMD PLL fix
[   33.364856] ehci-pci 0000:00:12.2: EHCI Host Controller
[   33.364916] ehci-pci 0000:00:12.2: new USB bus registered, assigned bus number 1
[   33.364973] ehci-pci 0000:00:12.2: applying AMD SB700/SB800/Hudson-2/3 EHCI dummy qh workaround
[   33.365035] ehci-pci 0000:00:12.2: debug port 1
[   33.365143] ehci-pci 0000:00:12.2: irq 17, io mem 0xfe5ff800
[   33.365218] sd 5:3:0:0: Attached scsi generic sg9 type 0
[   33.372701] tg3 0000:03:00.0 eth0: Tigon3 [partno(BCM95723) rev 5784100] (PCI Express) MAC address d4:85:64:c9:e6:24
[   33.372767] tg3 0000:03:00.0 eth0: attached PHY is 5784 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
[   33.372820] tg3 0000:03:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[   33.372870] tg3 0000:03:00.0 eth0: dma_rwctrl[76180000] dma_mask[64-bit]
[   33.373201]  sdh: unknown partition table
[   33.373555] sd 5:0:0:0: [sdh] Attached SCSI disk
[   33.373944] ehci-pci 0000:00:12.2: USB 2.0 started, EHCI 1.00
[   33.374071] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[   33.374119] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   33.374169] usb usb1: Product: EHCI Host Controller
[   33.374214] usb usb1: Manufacturer: Linux 3.16.0-0.bpo.4-amd64 ehci_hcd
[   33.374260] usb usb1: SerialNumber: 0000:00:12.2
[   33.374691] hub 1-0:1.0: USB hub found
[   33.374781] hub 1-0:1.0: 5 ports detected
[   33.375334] ehci-pci 0000:00:13.2: EHCI Host Controller
[   33.375390] ehci-pci 0000:00:13.2: new USB bus registered, assigned bus number 2
[   33.375449] ehci-pci 0000:00:13.2: applying AMD SB700/SB800/Hudson-2/3 EHCI dummy qh workaround
[   33.375512] ehci-pci 0000:00:13.2: debug port 1
[   33.375607] ehci-pci 0000:00:13.2: irq 17, io mem 0xfe5ff400
[   33.376656]  sdi: unknown partition table
[   33.377076] sd 5:1:0:0: [sdi] Attached SCSI disk
[   33.385906] ehci-pci 0000:00:13.2: USB 2.0 started, EHCI 1.00
[   33.386167] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
[   33.386220] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   33.386270] usb usb2: Product: EHCI Host Controller
[   33.386316] usb usb2: Manufacturer: Linux 3.16.0-0.bpo.4-amd64 ehci_hcd
[   33.386362] usb usb2: SerialNumber: 0000:00:13.2
[   33.387005] hub 2-0:1.0: USB hub found
[   33.387265] hub 2-0:1.0: 5 ports detected
[   33.387879] ehci-pci 0000:00:16.2: EHCI Host Controller
[   33.387933] ehci-pci 0000:00:16.2: new USB bus registered, assigned bus number 3
[   33.387986] ehci-pci 0000:00:16.2: applying AMD SB700/SB800/Hudson-2/3 EHCI dummy qh workaround
[   33.388049] ehci-pci 0000:00:16.2: debug port 1
[   33.388139] ehci-pci 0000:00:16.2: irq 17, io mem 0xfe5ff000
[   33.397878] ehci-pci 0000:00:16.2: USB 2.0 started, EHCI 1.00
[   33.398037] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002
[   33.398090] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   33.398142] usb usb3: Product: EHCI Host Controller
[   33.398188] usb usb3: Manufacturer: Linux 3.16.0-0.bpo.4-amd64 ehci_hcd
[   33.398234] usb usb3: SerialNumber: 0000:00:16.2
[   33.398551] hub 3-0:1.0: USB hub found
[   33.398615] hub 3-0:1.0: 4 ports detected
[   33.399512] ohci-pci 0000:00:12.0: OHCI PCI host controller
[   33.399567] ohci-pci 0000:00:12.0: new USB bus registered, assigned bus number 4
[   33.399666] ohci-pci 0000:00:12.0: irq 18, io mem 0xfe5fe000
[   33.458012] usb usb4: New USB device found, idVendor=1d6b, idProduct=0001
[   33.458069] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   33.458122] usb usb4: Product: OHCI PCI host controller
[   33.458167] usb usb4: Manufacturer: Linux 3.16.0-0.bpo.4-amd64 ohci_hcd
[   33.458214] usb usb4: SerialNumber: 0000:00:12.0
[   33.458798] hub 4-0:1.0: USB hub found
[   33.458864] hub 4-0:1.0: 5 ports detected
[   33.459403] ohci-pci 0000:00:13.0: OHCI PCI host controller
[   33.460079] ohci-pci 0000:00:13.0: new USB bus registered, assigned bus number 5
[   33.460173] ohci-pci 0000:00:13.0: irq 18, io mem 0xfe5fd000
[   33.517983] usb usb5: New USB device found, idVendor=1d6b, idProduct=0001
[   33.518039] usb usb5: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   33.518093] usb usb5: Product: OHCI PCI host controller
[   33.518142] usb usb5: Manufacturer: Linux 3.16.0-0.bpo.4-amd64 ohci_hcd
[   33.518212] usb usb5: SerialNumber: 0000:00:13.0
[   33.519228] hub 5-0:1.0: USB hub found
[   33.519291] hub 5-0:1.0: 5 ports detected
[   33.519871] ohci-pci 0000:00:16.0: OHCI PCI host controller
[   33.519946] ohci-pci 0000:00:16.0: new USB bus registered, assigned bus number 6
[   33.520048] ohci-pci 0000:00:16.0: irq 18, io mem 0xfe5fc000
[   33.577996] usb usb6: New USB device found, idVendor=1d6b, idProduct=0001
[   33.578054] usb usb6: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   33.578105] usb usb6: Product: OHCI PCI host controller
[   33.578151] usb usb6: Manufacturer: Linux 3.16.0-0.bpo.4-amd64 ohci_hcd
[   33.578198] usb usb6: SerialNumber: 0000:00:16.0
[   33.578570] hub 6-0:1.0: USB hub found
[   33.578635] hub 6-0:1.0: 4 ports detected
[   34.261707]  sdj: unknown partition table
[   34.262244] sd 5:3:0:0: [sdj] Attached SCSI disk

And with the new enclosure:

[    3.014000] SCSI subsystem initialized
[    3.016739] libata version 3.00 loaded.
[    3.018309] ahci 0000:00:11.0: version 3.0
[    3.018588] ahci 0000:00:11.0: irq 42 for MSI/MSI-X
[    3.018684] ahci 0000:00:11.0: AHCI 0001.0200 32 slots 4 ports 3 Gbps 0xf impl SATA mode
[    3.018742] ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part 
[    3.019841] scsi0 : ahci
[    3.020241] scsi1 : ahci
[    3.020705] scsi2 : ahci
[    3.021119] scsi3 : ahci
[    3.021317] ata1: SATA max UDMA/133 abar m1024@0xfe5ffc00 port 0xfe5ffd00 irq 42
[    3.021372] ata2: SATA max UDMA/133 abar m1024@0xfe5ffc00 port 0xfe5ffd80 irq 42
[    3.021427] ata3: SATA max UDMA/133 abar m1024@0xfe5ffc00 port 0xfe5ffe00 irq 42
[    3.021479] ata4: SATA max UDMA/133 abar m1024@0xfe5ffc00 port 0xfe5ffe80 irq 42
[    3.509782] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    3.513788] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    3.513878] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    3.513965] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    3.515180] ata3.00: ATA-8: ST3320418AS, CC38, max UDMA/133
[    3.515233] ata3.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
[    3.516651] ata3.00: configured for UDMA/133
[    3.554413] ata1.00: ATA-7: ST3320620AS, 3.AAC, max UDMA/133
[    3.554466] ata1.00: 625134827 sectors, multi 0: LBA48 NCQ (depth 31/32)
[    3.554593] ata2.00: ATA-7: ST3320620AS, 3.AAC, max UDMA/133
[    3.554639] ata2.00: 625134827 sectors, multi 0: LBA48 NCQ (depth 31/32)
[    3.557219] ata4.00: ATA-7: ST3320620AS, 3.AAC, max UDMA/133
[    3.557272] ata4.00: 625134827 sectors, multi 0: LBA48 NCQ (depth 31/32)
[    3.612719] ata1.00: configured for UDMA/133
[    3.612949] ata2.00: configured for UDMA/133
[    3.613000] scsi 0:0:0:0: Direct-Access     ATA      ST3320620AS      C    PQ: 0 ANSI: 5
[    3.613707] scsi 1:0:0:0: Direct-Access     ATA      ST3320620AS      C    PQ: 0 ANSI: 5
[    3.614554] scsi 2:0:0:0: Direct-Access     ATA      ST3320418AS      CC38 PQ: 0 ANSI: 5
[    3.615496] ata4.00: configured for UDMA/133
[    3.615739] scsi 3:0:0:0: Direct-Access     ATA      ST3320620AS      C    PQ: 0 ANSI: 5
[    3.619004] sata_sil24 0000:02:00.0: version 1.1
[    3.619834] scsi4 : sata_sil24
[    3.620237] scsi5 : sata_sil24
[    3.620414] ata5: SATA max UDMA/100 host m128@0xfe8ffc00 port 0xfe8f8000 irq 16
[    3.620468] ata6: SATA max UDMA/100 host m128@0xfe8ffc00 port 0xfe8fa000 irq 16
[    3.921925] Switched to clocksource tsc
[    5.697760] ata5: SATA link down (SStatus 0 SControl 0)
[    7.961748] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[    8.116807] ata6.15: Port Multiplier 1.2, 0x197b:0x575f r0, 15 ports, feat 0x5/0xf
[    8.119113] ata6.00: hard resetting link
[    8.454063] ata6.00: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    8.454167] ata6.01: hard resetting link
[    8.822059] ata6.01: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    8.822162] ata6.02: hard resetting link
[    9.174057] ata6.02: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    9.174160] ata6.03: hard resetting link
[    9.510064] ata6.03: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    9.510168] ata6.04: hard resetting link
[    9.846061] ata6.04: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    9.846165] ata6.05: hard resetting link
[   10.182059] ata6.05: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   10.182164] ata6.06: hard resetting link
[   10.518057] ata6.06: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   10.518162] ata6.07: hard resetting link
[   10.838099] ata6.07: SATA link down (SStatus 0 SControl 320)
[   10.838219] ata6.08: hard resetting link
[   11.158095] ata6.08: SATA link down (SStatus 0 SControl 320)
[   11.158213] ata6.09: hard resetting link
[   11.478093] ata6.09: SATA link down (SStatus 0 SControl 320)
[   11.478212] ata6.10: hard resetting link
[   11.798094] ata6.10: SATA link down (SStatus 0 SControl 320)
[   11.798214] ata6.11: hard resetting link
[   12.118088] ata6.11: SATA link down (SStatus 0 SControl 320)
[   12.118208] ata6.12: hard resetting link
[   12.438091] ata6.12: SATA link down (SStatus 0 SControl 320)
[   12.438209] ata6.13: hard resetting link
[   12.758087] ata6.13: SATA link down (SStatus 0 SControl 320)
[   12.758206] ata6.14: hard resetting link
[   13.078082] ata6.14: SATA link down (SStatus 0 SControl 320)
[   13.078746] ata6.00: ATA-9: WDC WD30EZRX-00D8PB0, 80.00A80, max UDMA/133
[   13.078802] ata6.00: 5860533168 sectors, multi 16: LBA48 NCQ (depth 31/32)
[   13.079454] ata6.00: configured for UDMA/100
[   13.083304] ata6.01: ATA-8: ST2000DM001-1CH164, CC26, max UDMA/133
[   13.083357] ata6.01: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[   17.852068] ata6.01: configured for UDMA/100
[   17.855904] ata6.02: ATA-8: ST2000DM001-1CH164, CC26, max UDMA/133
[   17.855957] ata6.02: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[   22.590353] ata6.02: configured for UDMA/100
[   22.596782] ata6.03: ATA-8: ST2000DL003-9VT166, CC3C, max UDMA/133
[   22.596838] ata6.03: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[   22.614183] ata6.03: configured for UDMA/100
[   22.616206] ata6.04: ATA-8: ST500DM002-1BD142, KC45, max UDMA/133
[   22.616259] ata6.04: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[   22.640259] ata6.04: configured for UDMA/100
[   22.646640] ata6.05: ATA-8: ST2000DL003-9VT166, CC3C, max UDMA/133
[   22.646693] ata6.05: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[   22.664004] ata6.05: configured for UDMA/100
[   22.664550] ata6.06: ATA-9: WDC WD30EZRX-00D8PB0, 80.00A80, max UDMA/133
[   22.664603] ata6.06: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[   22.665186] ata6.06: configured for UDMA/100
[   22.665437] ata6: EH complete
[   22.665865] scsi 5:0:0:0: Direct-Access     ATA      WDC WD30EZRX-00D 0A80 PQ: 0 ANSI: 5
[   22.666964] scsi 5:1:0:0: Direct-Access     ATA      ST2000DM001-1CH1 CC26 PQ: 0 ANSI: 5
[   22.668009] scsi 5:2:0:0: Direct-Access     ATA      ST2000DM001-1CH1 CC26 PQ: 0 ANSI: 5
[   22.668589] scsi 5:3:0:0: Direct-Access     ATA      ST2000DL003-9VT1 CC3C PQ: 0 ANSI: 5
[   22.669046] scsi 5:4:0:0: Direct-Access     ATA      ST500DM002-1BD14 KC45 PQ: 0 ANSI: 5
[   22.669519] scsi 5:5:0:0: Direct-Access     ATA      ST2000DL003-9VT1 CC3C PQ: 0 ANSI: 5
[   22.669972] scsi 5:6:0:0: Direct-Access     ATA      WDC WD30EZRX-00D 0A80 PQ: 0 ANSI: 5
[   22.674753] udevd[79]: starting version 175
[   22.718011] pps_core: LinuxPPS API ver. 1 registered
[   22.718099] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[   22.718474] PTP clock support registered
[   22.723563] tg3.c:v3.137 (May 11, 2014)
[   22.779806] tg3 0000:03:00.0 eth0: Tigon3 [partno(BCM95723) rev 5784100] (PCI Express) MAC address d4:85:64:c9:e6:24
[   22.779903] tg3 0000:03:00.0 eth0: attached PHY is 5784 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
[   22.779956] tg3 0000:03:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[   22.780009] tg3 0000:03:00.0 eth0: dma_rwctrl[76180000] dma_mask[64-bit]
[   22.781303] ACPI: bus type USB registered
[   22.781448] usbcore: registered new interface driver usbfs
[   22.781515] usbcore: registered new interface driver hub
[   22.808386] usbcore: registered new device driver usb
[   22.809764] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[   22.810453] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[   22.810686] ehci-pci: EHCI PCI platform driver
[   22.810872] ohci-pci: OHCI PCI platform driver
[   22.823167] scsi6 : pata_atiixp
[   22.832627] sd 1:0:0:0: [sdb] 625134827 512-byte logical blocks: (320 GB/298 GiB)
[   22.832785] sd 1:0:0:0: [sdb] Write Protect is off
[   22.832834] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[   22.832863] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   22.833349] sd 2:0:0:0: [sdc] 625142448 512-byte logical blocks: (320 GB/298 GiB)
[   22.833402] sd 5:0:0:0: [sde] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
[   22.833404] sd 5:0:0:0: [sde] 4096-byte physical blocks
[   22.833417] sd 3:0:0:0: [sdd] 625134827 512-byte logical blocks: (320 GB/298 GiB)
[   22.833504] sd 5:0:0:0: [sde] Write Protect is off
[   22.833507] sd 5:0:0:0: [sde] Mode Sense: 00 3a 00 00
[   22.833517] sd 3:0:0:0: [sdd] Write Protect is off
[   22.833520] sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[   22.833564] sd 5:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   22.833642] sd 3:0:0:0: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   22.833777] sd 0:0:0:0: [sda] 625134827 512-byte logical blocks: (320 GB/298 GiB)
[   22.834163] sd 0:0:0:0: [sda] Write Protect is off
[   22.834212] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   22.834241] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   22.835054] sd 2:0:0:0: [sdc] Write Protect is off
[   22.835101] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[   22.835130] sd 2:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   22.845520]  sdc: sdc1 sdc2 sdc3
[   22.846184] sd 2:0:0:0: [sdc] Attached SCSI disk
[   22.846305] sd 5:1:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[   22.846356] sd 5:1:0:0: [sdf] 4096-byte physical blocks
[   22.846455] sd 5:1:0:0: [sdf] Write Protect is off
[   22.846501] sd 5:1:0:0: [sdf] Mode Sense: 00 3a 00 00
[   22.846528] sd 5:1:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   22.850755] sd 5:2:0:0: [sdg] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[   22.850849] sd 5:2:0:0: [sdg] 4096-byte physical blocks
[   22.850953] sd 5:2:0:0: [sdg] Write Protect is off
[   22.850999] sd 5:2:0:0: [sdg] Mode Sense: 00 3a 00 00
[   22.851026] sd 5:2:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   22.853444]  sda: sda1 sda2 sda3
[   22.854098]  sdb: sdb1 sdb2 sdb3
[   22.854559] sd 1:0:0:0: [sdb] Attached SCSI disk
[   22.854682] sd 5:3:0:0: [sdh] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[   22.854738] sd 5:3:0:0: [sdh] 4096-byte physical blocks
[   22.854832] sd 0:0:0:0: [sda] Attached SCSI disk
[   22.854955] sd 5:4:0:0: [sdi] 976773168 512-byte logical blocks: (500 GB/465 GiB)
[   22.855008] sd 5:4:0:0: [sdi] 4096-byte physical blocks
[   22.855069] sd 5:3:0:0: [sdh] Write Protect is off
[   22.855118] sd 5:3:0:0: [sdh] Mode Sense: 00 3a 00 00
[   22.855174] sd 5:3:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   22.855303]  sdf: unknown partition table
[   22.855502] sd 5:4:0:0: [sdi] Write Protect is off
[   22.855554] sd 5:4:0:0: [sdi] Mode Sense: 00 3a 00 00
[   22.855666] sd 5:4:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   22.855884] sd 5:1:0:0: [sdf] Attached SCSI disk
[   22.856103] sd 5:5:0:0: [sdj] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[   22.856154] sd 5:5:0:0: [sdj] 4096-byte physical blocks
[   22.856354]  sdd: sdd1 sdd2 sdd3
[   22.856791] sd 3:0:0:0: [sdd] Attached SCSI disk
[   22.856903] sd 5:6:0:0: [sdk] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
[   22.856954] sd 5:6:0:0: [sdk] 4096-byte physical blocks
[   22.857051] sd 5:6:0:0: [sdk] Write Protect is off
[   22.857097] sd 5:6:0:0: [sdk] Mode Sense: 00 3a 00 00
[   22.857123] sd 5:6:0:0: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   22.857453] sd 5:5:0:0: [sdj] Write Protect is off
[   22.857504] sd 5:5:0:0: [sdj] Mode Sense: 00 3a 00 00
[   22.857542] sd 5:5:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   22.861699] sd 0:0:0:0: Attached scsi generic sg0 type 0
[   22.864256] scsi7 : pata_atiixp
[   22.864427] ata7: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xff00 irq 14
[   22.864479] ata8: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xff08 irq 15
[   22.864823] QUIRK: Enable AMD PLL fix
[   22.864866] ehci-pci 0000:00:12.2: EHCI Host Controller
[   22.864921] ehci-pci 0000:00:12.2: new USB bus registered, assigned bus number 1
[   22.864979] ehci-pci 0000:00:12.2: applying AMD SB700/SB800/Hudson-2/3 EHCI dummy qh workaround
[   22.865041] ehci-pci 0000:00:12.2: debug port 1
[   22.865148] ehci-pci 0000:00:12.2: irq 17, io mem 0xfe5ff800
[   22.865331]  sdg: unknown partition table
[   22.865675] sd 5:2:0:0: [sdg] Attached SCSI disk
[   22.868644] sd 1:0:0:0: Attached scsi generic sg1 type 0
[   22.869290] sd 2:0:0:0: Attached scsi generic sg2 type 0
[   22.869628] sd 3:0:0:0: Attached scsi generic sg3 type 0
[   22.871473] sd 5:0:0:0: Attached scsi generic sg4 type 0
[   22.872143] sd 5:1:0:0: Attached scsi generic sg5 type 0
[   22.872544] sd 5:2:0:0: Attached scsi generic sg6 type 0
[   22.872889] sd 5:3:0:0: Attached scsi generic sg7 type 0
[   22.873228] sd 5:4:0:0: Attached scsi generic sg8 type 0
[   22.873605] ehci-pci 0000:00:12.2: USB 2.0 started, EHCI 1.00
[   22.873738] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[   22.873786] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   22.873836] usb usb1: Product: EHCI Host Controller
[   22.873881] usb usb1: Manufacturer: Linux 3.16.0-0.bpo.4-amd64 ehci_hcd
[   22.873927] usb usb1: SerialNumber: 0000:00:12.2
[   22.874176] hub 1-0:1.0: USB hub found
[   22.874237] hub 1-0:1.0: 5 ports detected
[   22.874799] ehci-pci 0000:00:13.2: EHCI Host Controller
[   22.874853] ehci-pci 0000:00:13.2: new USB bus registered, assigned bus number 2
[   22.874908] ehci-pci 0000:00:13.2: applying AMD SB700/SB800/Hudson-2/3 EHCI dummy qh workaround
[   22.874970] ehci-pci 0000:00:13.2: debug port 1
[   22.875060] ehci-pci 0000:00:13.2: irq 17, io mem 0xfe5ff400
[   22.875304] sd 5:5:0:0: Attached scsi generic sg9 type 0
[   22.875734] sd 5:6:0:0: Attached scsi generic sg10 type 0
[   22.885688] ehci-pci 0000:00:13.2: USB 2.0 started, EHCI 1.00
[   22.885832] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
[   22.885885] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   22.885935] usb usb2: Product: EHCI Host Controller
[   22.885981] usb usb2: Manufacturer: Linux 3.16.0-0.bpo.4-amd64 ehci_hcd
[   22.886028] usb usb2: SerialNumber: 0000:00:13.2
[   22.886291] hub 2-0:1.0: USB hub found
[   22.886350] hub 2-0:1.0: 5 ports detected
[   22.886918] ehci-pci 0000:00:16.2: EHCI Host Controller
[   22.886972] ehci-pci 0000:00:16.2: new USB bus registered, assigned bus number 3
[   22.887028] ehci-pci 0000:00:16.2: applying AMD SB700/SB800/Hudson-2/3 EHCI dummy qh workaround
[   22.887090] ehci-pci 0000:00:16.2: debug port 1
[   22.887185] ehci-pci 0000:00:16.2: irq 17, io mem 0xfe5ff000
[   22.897569] ehci-pci 0000:00:16.2: USB 2.0 started, EHCI 1.00
[   22.897730] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002
[   22.897783] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   22.897836] usb usb3: Product: EHCI Host Controller
[   22.897883] usb usb3: Manufacturer: Linux 3.16.0-0.bpo.4-amd64 ehci_hcd
[   22.897932] usb usb3: SerialNumber: 0000:00:16.2
[   22.898214] hub 3-0:1.0: USB hub found
[   22.898272] hub 3-0:1.0: 4 ports detected
[   22.898847] ohci-pci 0000:00:12.0: OHCI PCI host controller
[   22.898914] ohci-pci 0000:00:12.0: new USB bus registered, assigned bus number 4
[   22.899017] ohci-pci 0000:00:12.0: irq 18, io mem 0xfe5fe000
[   22.957744] usb usb4: New USB device found, idVendor=1d6b, idProduct=0001
[   22.957823] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   22.957874] usb usb4: Product: OHCI PCI host controller
[   22.957920] usb usb4: Manufacturer: Linux 3.16.0-0.bpo.4-amd64 ohci_hcd
[   22.957966] usb usb4: SerialNumber: 0000:00:12.0
[   22.958434] hub 4-0:1.0: USB hub found
[   22.958495] hub 4-0:1.0: 5 ports detected
[   22.959085] ohci-pci 0000:00:13.0: OHCI PCI host controller
[   22.959142] ohci-pci 0000:00:13.0: new USB bus registered, assigned bus number 5
[   22.959237] ohci-pci 0000:00:13.0: irq 18, io mem 0xfe5fd000
[   22.972654] random: nonblocking pool is initialized
[   23.018494] usb usb5: New USB device found, idVendor=1d6b, idProduct=0001
[   23.018554] usb usb5: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   23.018605] usb usb5: Product: OHCI PCI host controller
[   23.018650] usb usb5: Manufacturer: Linux 3.16.0-0.bpo.4-amd64 ohci_hcd
[   23.018698] usb usb5: SerialNumber: 0000:00:13.0
[   23.019240] hub 5-0:1.0: USB hub found
[   23.019342] hub 5-0:1.0: 5 ports detected
[   23.019877] ohci-pci 0000:00:16.0: OHCI PCI host controller
[   23.019935] ohci-pci 0000:00:16.0: new USB bus registered, assigned bus number 6
[   23.020023] ohci-pci 0000:00:16.0: irq 18, io mem 0xfe5fc000
[   23.081765] usb usb6: New USB device found, idVendor=1d6b, idProduct=0001
[   23.081821] usb usb6: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   23.081871] usb usb6: Product: OHCI PCI host controller
[   23.081917] usb usb6: Manufacturer: Linux 3.16.0-0.bpo.4-amd64 ohci_hcd
[   23.081962] usb usb6: SerialNumber: 0000:00:16.0
[   23.082344] hub 6-0:1.0: USB hub found
[   23.082408] hub 6-0:1.0: 4 ports detected
[   23.785682] usb 4-2: new low-speed USB device number 2 using ohci-pci
[   23.967768] usb 4-2: New USB device found, idVendor=04d9, idProduct=1503
[   23.967824] usb 4-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[   23.967871] usb 4-2: Product: USB Keyboard
[   23.967915] usb 4-2: Manufacturer:  
[   23.973497] hidraw: raw HID events driver (C) Jiri Kosina
[   24.003112] usbcore: registered new interface driver usbhid
[   24.003200] usbhid: USB HID core driver
[   24.006819] input:   USB Keyboard as /devices/pci0000:00/0000:00:12.0/usb4/4-2/4-2:1.0/0003:04D9:1503.0001/input/input0
[   24.006968] hid-generic 0003:04D9:1503.0001: input,hidraw0: USB HID v1.10 Keyboard [  USB Keyboard] on usb-0000:00:12.0-2/input0
[   24.011935] input:   USB Keyboard as /devices/pci0000:00/0000:00:12.0/usb4/4-2/4-2:1.1/0003:04D9:1503.0002/input/input1
[   24.012130] hid-generic 0003:04D9:1503.0002: input,hidraw1: USB HID v1.10 Device [  USB Keyboard] on usb-0000:00:12.0-2/input1
[   24.233688] usb 4-5: new full-speed USB device number 3 using ohci-pci
[   24.400767] usb 4-5: New USB device found, idVendor=067b, idProduct=2303
[   24.400821] usb 4-5: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[   24.400869] usb 4-5: Product: USB-Serial Controller
[   24.400914] usb 4-5: Manufacturer: Prolific Technology Inc.
[   30.567918]  sdh: unknown partition table
[   30.568200]  sdj: unknown partition table
[   30.568393]  sdi: unknown partition table
[   30.569631] sd 5:4:0:0: [sdi] Attached SCSI disk
[   30.570218]  sde: unknown partition table
[   30.570481]  sdk: unknown partition table
[   30.571689] sd 5:5:0:0: [sdj] Attached SCSI disk
[   30.571875] sd 5:6:0:0: [sdk] Attached SCSI disk
[   30.572689] sd 5:3:0:0: [sdh] Attached SCSI disk
[   30.572778] sd 5:0:0:0: [sde] Attached SCSI disk

Adventures in entropy, part 2

Recap ^

Back in part 1 I discussed what entropy is as far as Linux is concerned, why I’ve started to look in to entropy as it relates to a Linux/Xen-based virtual hosting platform, how much entropy I have available, and how this might be improved.

If you didn’t read that part yet then you might want to do so, before carrying on with this part.

As before, click on any graph to see the full-size version.

Hosting server with an Entropy Key ^

Recently I colocated a new hosting server so it seemed like a good opportunity to try out the Entropy Key at the same time. Here’s what the available entropy looks like whilst ekeyd is running.

urquell.bitfolk.com available entropy with ekey, daily

First impressions are, this is pretty impressive. It hovers very close to 4096 bytes at all times. There is very little jitter.

Trying to deplete the entropy pool, while using an Entropy Key ^

As per Hugo’s comment in part 1, I tried watch -n 0.25 cat /proc/sys/kernel/random/entropy_avail to see if I could deplete the entropy pool, but it had virtually no effect. I tried with watch -n 0.1 cat /proc/sys/kernel/random/entropy_avail (so every tenth of a second) and the available entropy fluctuated mostly around 4000 bytes with a brief dip to ~3600 bytes:

urquell.bitfolk.com available entropy with ekey, trying to deplete the pool

In the above graph, the first watch invocation was at ~1100 UTC. The second one was at ~1135 UTC.

Disabling the Entropy Key ^

Unfortunately I forgot to get graphs of urquell before the ekeyd was started, so I have no baseline for this machine.

I assumed it would be the same as all the other host machines, but decided to shut down ekeyd to verify that. Here’s what happened.

urquell.bitfolk.com available entropy with ekeyd shut down, daily

The huge chasm of very little entropy in the middle of this graph is urquell running without an ekeyd. At first I was at a loss to explain why it should only have ~400 bytes of entropy by itself, when the other hosting servers manage somewhere between 3250 and 4096 bytes.

I now believe that it’s because urquell is newly installed and has no real load. Looking into how modern Linux kernels obtain entropy, it’s basically:

  • keyboard interrupts;
  • mouse interrupts;
  • other device driver interrupts with the flag IRQF_SAMPLE_RANDOM.

Bear in mind that headless servers usuallly don’t have a mouse or keyboard attached!

You can see which other drivers are candidates for filling up the entropy pool by looking where the IRQF_SAMPLE_RANDOM identifier occurs in the source of the kernel:

http://www.cs.fsu.edu/~baker/devices/lxr/http/ident?i=IRQF_SAMPLE_RANDOM

(as an aside, in 2.4.x kernels, most of the network interface card drivers had IRQF_SAMPLE_RANDOM and then they all got removed through the 2.6.x cycle since it was decided that IRQF_SAMPLE_RANDOM is really only for interrupts that can’t be observed or tampered with by an outside party. That’s why a lot of people reported problems with lack of entropy after upgrading their kernels.)

My hosting servers are typically Supermicro motherboards with Intel gigabit NICs and 3ware RAID controller. The most obvious device in the list that could be supplying entropy is probably block/xen-blkfront since there’s one of those for each block device exported to a Xen virtual machine on the system.

To test the hypothesis that the other servers are getting entropy from busy Xen block devices, I shut down ekeyd and then hammered on a VM filesystem:

urquell.bitfolk.com available entropy with ekeyd shut down, hammering a VM filesystem

The increase you see towards the end of the graph was while I was hammering the virtual machine’s filesystem. I was able to raise the available entropy to a stable ~2000 bytes doing this, so I’m satisfied that if urquell were as busy as the other servers then it would have similar available entropy to them, even without the Entropy Key.

Feeding entropy to other hosts ^

ekeyd by default feeds entropy from the key directly into the Linux kernel of the host it’s on, but it can be configured to listen on a Unix or TCP socket and mimic the egd protocol. I set it up this way and then put an instance of HAProxy into a VM with my ekeyd as a back end. So at this point I had a service IP which would talk egd protocol, and client machines could use to request entropy.

On the client side, ekeyd-egd-linux can be found in Debian lenny-backports and in Debian squeeze, as well as Ubuntu universe since Jaunty. This daemon can read from a Unix or TCP socket using the egd protocol and will feed the received entropy into the Linux kernel.

I took a look at which of my VMs had the lowest available entropy and installed ekeyd-egd-linux on them, pointing it at my entropy service IP:

admin.obstler.bitfolk.com available entropy after hooking up to entropy service

panel0.bitfolk.com available entropy after hooking up to entropy service

spamd0.lon.bitfolk.com available entropy after hooking up to entropy service

Success!

Where next? ^

  • Get some customers using it, explore the limits of how much entropy can be served.
  • Buy another Entropy Key so that it doesn’t all grind to a halt if one of them should die.
  • Investigate a way to get egd to read from another egd so I can serve the entropy directly from a VM and not have so many connections to my real hardware. Anyone interested in coding that?
  • Monitor the served entropy both for availability and for quality.

Adventures in entropy, part 1

A while back, a couple of BitFolk customers mentioned to me that they were having problems running out of entropy.

A brief explanation of entropy as it relates to computing ^

Where we say entropy, we could in layman’s terms say “randomness”. Computers need entropy for a lot of things, particularly cryptographic operations. You may not think that you do a lot of cryptography on your computer, and you personally probably don’t, but for example every time you visit a secure web site (https://…) your computer has to set up a cryptographic channel with the server. Cryptographic algorithms generally require a lot of random data and it has to be secure random data. For the purposes of this discussion, “secure” means that an attacker shouldn’t be able to guess or influence what the random data is.

Why would an attacker be able to guess or influence the random data if it is actually random? Because it’s not actually random. The computer has to get the data from somewhere. A lot of places it might be programmed to get it from may seem random but potentially aren’t. A silly implementation might just use the number of seconds the computer has been running as a basis for generating “random” numbers, but you can see that an attacker can guess this and may even be able to influence it, which could weaken any cryptographic algorithm that uses the “random” data.

Modern computers and operating systems generate entropy based on events like electrical noise, timings of data coming into the computer over the network, what’s going on with the disks, etc. fed into algorithms — what we call pseudo-random number generators (PRNGs). A lot of data goes in and a relatively small amount of entropy comes out, but it’s entropy you should be able to trust.

That works reasonably well for conventional computers and servers, but it doesn’t work so well for virtual servers. Virtual servers are running in an emulated environment, with very little access to “real” hardware. The random data that conventional computers get from their hardware doesn’t happen with emulated virtual hardware, so the prime source of entropy just isn’t present.

When you have an application that wants some entropy and the system has no more entropy to give, what usually happens is that the application blocks, doing nothing, until the system can supply some more entropy. Linux systems have two ways for applications to request entropy: there’s /dev/random and /dev/urandom. random is the high-quality one. When it runs out, it blocks until there is more available. urandom will supply high-quality entropy until it runs out, then it will generate more programmatically, so it doesn’t block, but it might not be as secure as random. I’m vastly simplifying how these interfaces work, but that’s the basic gist of it.

What to do when there’s no more entropy? ^

If you’re running applications that want a lot of high-quality entropy, and your system keeps running out, there’s a few things you could do about it.

Nothing ^

So stuff slows down, who cares? It’s only applications that want high-quality entropy and they’re pretty specialised, right?

Well, no, not really. If you’re running a busy site with a lot of HTTPS connections then you probably don’t want it to be waiting around for more entropy when it could be serving your users. Another one that tends to use all the entropy is secure email – mail servers talking to each other using Transport Layer Security so the email is encrypted on the wire.

Use real hosting hardware ^

Most of BitFolk’s customers are using it for personal hosting, this problem is common to virtual hosting platforms (it’s not a BitFolk-specific issue), and BitFolk doesn’t provide dedicated/colo servers, so arguably I don’t need to consider this my problem to fix. If the customer could justify greater expense then they could move to a dedicated server or colo provider to host their stuff.

Tell the software to use urandom instead ^

In a lot of cases it’s possible to tell the applications to use urandom instead. Since urandom doesn’t block, but instead generates more lower-quality entropy on demand, there shouldn’t be a performance problem. There are obvious downsides to this:

  • If the application author wanted high-quality entropy, it might be unwise to not respect that.
  • Altering this may not be as simple as changing its configuration. You might find yourself having to recompile the software, which is a lot of extra work.

You could force this system-wide by replacing your /dev/random with /dev/urandom.

Customers could get some more entropy from somewhere else ^

It’s possible to feed your own data into your system’s pseudo-random number generator, so if you have a good source of entropy you can help yourself. People have used some weird and wonderful things for entropy sources. Some examples:

  • A sound card listening to electro-magnetic interference (“static”).
  • A web camera watching a lava lamp.
  • A web camera in a dark box, so it just sees noise on its CCD.

The problem for BitFolk customers of course is that all they have is a virtual server. They can’t attach web cams and sound cards to their servers! If they had real servers then they probably wouldn’t be having this issue at all.

BitFolk could get some entropy from somewhere else, and serve it to customers ^

BitFolk has the real servers, so I could do the above to get some extra entropy. I might not even need extra entropy; I could just serve the entropy that the real machines have. If it wasn’t for the existence of the Simtec Electronics Entropy Key then that’s probably what I’d be trying.

I haven’t got time to be playing about with sound cards listening to static, webcams in boxes and things like that, but buying a relatively cheap little gadget is well within the limit of things I’m prepared to risk wasting money on. 🙂

Customers would need to trust my entropy, of course. They already need to trust a lot of other things that I do though.

Entropy Key ^

Entropy Keys are very interesting little gadgets and I encourage you to read about how they work. It’s all a bit beyond me though, so for the purposes of this series of blog posts I’ll just take it as read that you plug in an Entropy Key into a USB port, run ekeyd and it feeds high quality entropy into your PRNG.

I’d been watching the development of the Entropy Key with interest. When they were offered for cheap at the Debian-UK BBQ in 2009 I was sorely tempted, but I knew I wasn’t going to be able to attend, so I left it.

Then earlier this year, James at Jump happened to mention that he was doing a bulk order (I assume to fix this same issue for his own VPS customers) if anyone wanted in. Between the Debian BBQ and then I’d had a few more complaints about people running out of entropy so at ~£30 each I was thinking it was definitely worth exploring with one of them; perhaps buy more if it works.

How much entropy do I have anyway? ^

Before stuffing more entropy in to my systems, I was curious how much I had available anyway. On Linux you can check this by looking at /proc/sys/kernel/random/entropy_avail. I think this value is in bytes, and tops out at 4096. Not hard to plug this in to your graphing system.

Click on the following images to see the full-size versions.

Typical host server, no Entropy Key ^

Here’s what some typical BitFolk VM hosting servers have in terms of available entropy.

barbar.bitfolk.com available entropy, daily

That’s pretty good. The available entropy hovers close to 4096 bytes all the time. It’s what you’d expect from a typical piece of computer hardware. The weekly view shows the small jitter:

barbar.bitfolk.com available entropy, weekly

The lighter pink area is the highest 5-minute reading in each 30 minute sample. The dark line is the lowest 5-minute reading. You can see that there is a small amount of jitter where the available entropy fluctuates between about 3250 and 4096 bytes.

Here’s a couple of the other host servers just to see the pattern:

corona.bitfolk.com available entropy, daily

corona.bitfolk.com available entropy, weekly

faustino.bitfolk.com available entropy, daily

faustino.bitfolk.com available entropy, weekly

No surprises here; they’re all much the same. If these were the only machines I was using then I’d probably decide that I have enough entropy.

Typical general purpose Xen-based paravirtualised virtual machine ^

Here’s a typical general purpose BitFolk VPS. It’s doing some crypto stuff, but there’s a good mix of every type of workload here.

bitfolk.com available entropy, daily

bitfolk.com available entropy, weekly

These graphs are very different. There’s much more jitter and a general lack of entropy to begin with. Still, it never appears to reach zero (although it’s important to realise that these graphs are at best 5-minute averages, so the minimum and maximum values will be lower and higher within that 5-minute span) so there doesn’t seem to be a huge problem here.

Virtual machines with more crypto ^

Here’s a couple of VMs which are doing more SSL work.

cacti.bitfolk.com available entropy, daily

cacti.bitfolk.com available entropy, weekly

This one has a fair number of web visitors and they’re all HTTPS. You can see that it’s even more jittery, and spends most of its time with less than 1024 bytes of entropy available. It goes as low as ~140 bytes from time to time, and because of the 5-minute sampling it’s possible that it does run out.

panel0.bitfolk.com available entropy, daily

panel0.bitfolk.com available entropy, weekly

Again, this one has some HTTPS traffic and is faring worse for entropy, with an average of only ~470 bytes available. I ran a check every second for several hours and available entropy at times was as low as 133 bytes.

Summary so far ^

BitFolk doesn’t have any particularly busy crypto-heavy VMs so the above was the best I could do. I think that I’ve shown that virtual machines do have less entropy generally available, and that a moderate amount of crypto work can come close to draining it.

Based on the above results I probably wouldn’t personally take any action since it seems none of my own VMs run out of entropy, although I am unsure if the 133 bytes I measured was merely as low as the pool is allowed to go before blocking happens. In any case, I am not really noticing poor performance.

Customers have reported running out of entropy though, so it might still be something I can fix, for them.

Where next? ^

Next:

  • See what effect using an Entropy Key has on a machine’s available entropy.
  • Assuming it has a positive effect, see if I can serve this entropy to other machines, particularly virtual ones.
  • Can I serve it from a virtual machine, so I don’t have customers interacting with my real hosts?
  • Does one Entropy Key give enough entropy for everyone that wants it?
  • Can I add extra keys and serve their entropy in a highly-available fashion?

Those are the things I’ll be looking into and will blog some more about in later parts. This isn’t high priority though so it might take a while. In the meantime, if you’re a BitFolk customer who actually is experiencing entropy exhaustion in a repeatable fashion then it’d be great if you could get in touch with me so we can see if it can be fixed.

In part 2 of this series of posts I do get the key working and serve entropy to my virtual machines.

Gating Twitter to IRC using Perl and Net::Twitter

The other day I had a strop about someone carrying on a Twitter conversation in IRC, since I didn’t have a Twitter account and had no idea what people were talking about. It was at that point that I floated the idea of gating the channel members’ tweets into the channel so I would still only have to look at IRC to follow what was going on. To me not really any more anoying or intrusive than announcing our blog posts, which the planet bot already does.

People were generally in favour of it, so I had a look around for libraries. I quickly found the very handy Net::Twitter Perl module and bolted on the Twitter bits to an existing quote bot skeleton.

The result is Twitfolk. It currently lives in #bitfolk on irc.bitfolk.com. The source is in SVN at https://svn.bitfolk.com/repos/twitfolk/.

PS. I am not a very good Perl programmer but it seems to work.