OpenLDAP and md5crypt

I’ve got some machines which authenticate their local users against OpenLDAP, and I wanted to reset some passwords from a Perl script.

First I tried just calling modify from Net::LDAP. That worked but just set the new password as plain text. My passwords appear to be “md5crypt”, and normally look like this:

{CRYPT}$1$fywXcrPC$Uakrx8POGBf1WM9l6mkG6/
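Just calling modify amounts to a straight replace of the userPassword attribute, which is why the value went in as plain text. For illustration, the command-line equivalent with ldapmodify would be something like this (server and DNs made up):

$ ldapmodify -x -H ldap://ldap.example.com -D cn=admin,dc=example,dc=com -W <<EOF
dn: uid=someuser,ou=people,dc=example,dc=com
changetype: modify
replace: userPassword
userPassword: newsecret
EOF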

I could come up with some code to create the correct hash on the machine where I was running the Perl script, but I really wanted the LDAP server to do it for me, for consistency.
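For completeness, generating the md5crypt hash client-side isn’t hard either; OpenSSL will do it given a password and an 8-character salt, and you’d then prepend {CRYPT} to the result before storing it in userPassword. The salt below is the one from the example above and the password is made up:

$ openssl passwd -1 -salt fywXcrPC newsecret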

A little bit of searching revealed Net::LDAP::Extension::SetPassword, so I gave that a go. Well, that was progress, but it set MD5 passwords. They look like this:

{MD5}13dmpYRmooMYt50wdZBpSQ==
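If you want to test the same thing without writing any Perl, ldappasswd speaks the same Password Modify extended operation that SetPassword uses (server, bind DN and target DN below are made up):

$ ldappasswd -x -H ldap://ldap.example.com -D cn=admin,dc=example,dc=com -W \
    -s newsecret uid=someuser,ou=people,dc=example,dc=com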

Why did it just decide to use MD5? The answer’s in the OpenLDAP FAQ. password-hash was indeed set to {md5} on the server.

Right, OK, so set password-hash to {md5crypt} then? No! It does not accept that value. It does accept {crypt}, but on its own that produces hashes like:

{CRYPT}Q.nfbCdTMBuGU

It has the right scheme ({CRYPT}) but it’s much shorter: that’s the traditional DES-based POSIX crypt(3). Not quite what I wanted.

The eventual answer was found in the archives of the openldap-software mailing list from almost 8 years ago! So once slapd.conf contained:

password-hash  {CRYPT}
password-crypt-salt-format "$1$%.8s"

the correct password hash was generated.

How did I know about the “.8” bit? In an md5crypt hash, the characters between the $1$ and the next $ are the salt, and there are 8 of them; %.8s is just printf notation for a string truncated to 8 characters, hence the .8s.
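You can also sanity-check the salt format without touching the directory at all, since slappasswd takes the same settings on the command line (assuming your slapd build has crypt(3) support); it should print a {CRYPT}$1$ hash with an 8-character salt:

$ slappasswd -h '{CRYPT}' -c '$1$%.8s' -s newsecret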

Linux software RAID hot swap disk replacement

One of BitFolk’s servers in the US has had first one and then two dead disks for quite some time. It has a 4 disk software RAID-10, so by pure luck it was still running. Obviously as soon as a disk breaks you really should replace it, preferably with a hot spare. I was very lucky that the second disk failure wasn’t in the same half of the RAID-10, which would have meant downtime and a restore from backup. There’s no customer data or customer-facing services on this machine though, so I let it slide for far too long.

Yesterday morning Graham was visiting the datacenter and kindly agreed to replace the disks for me. As it happens I don’t have that much experience of software RAID since the production machines I’ve worked on tend to have hardware RAID and the home ones tend not to be hot swap. It didn’t go entirely smoothly, but I think it was my fault.

The server chassis doesn’t support drive identification (e.g. by turning a light on) so I had to generate some disk activity so that Graham could see which drive lights weren’t blinking. It was easy enough for him to spot that slots 0 and 1 were still blinking away with slots 2 and 3 dead. I checked /proc/mdstat to ensure that those disks weren’t still present in any of the arrays. If they had been then I would have done:

$ sudo mdadm /dev/mdX --fail /dev/sdbX --remove /dev/sdbX

to fail and then remove each one from its array.

They weren’t present, so I gave Graham the go-ahead to pull the hot swap drive trays out.

At first the server didn’t notice anything, which I thought was bad: I wanted it to notice! This was confirmed when all disk I/O blocked and the load went through the roof.

I think what I had forgotten to do was to remove the devices from the SCSI subsystem as described in this article. So for me, it would have been something like:

$ for disk in sd{a,b,c,d}; do echo -n "$disk: "; ls -d /sys/block/$disk/device/scsi_device*; done
sda: /sys/block/sda/device/scsi_device:0:0:0:0
sdb: /sys/block/sdb/device/scsi_device:0:0:1:0
sdc: /sys/block/sdc/device/scsi_device:1:0:0:0
sdd: /sys/block/sdd/device/scsi_device:1:0:1:0

From /proc/mdstat I knew it was sdb and sdd that were broken. I think I should have done:

$ sudo sh -c 'echo "scsi remove-single-device 0 0 1 0" > /proc/scsi/scsi'
$ sudo sh -c 'echo "scsi remove-single-device 1 0 1 0" > /proc/scsi/scsi'

Anyway, at the time what I had was a largely unresponsive server. I used Magic Sysrq to sync, mount filesystems read-only and then reboot. In Cernio’s console this would normally be “~b” to send a break, but Xen uses “ctrl-o”. So that was ctrl-o s to sync, ctrl-o u to remount read-only and then ctrl-o b to reboot the system.
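If you do still have a working root shell, the same sysrq actions can be triggered from inside the machine by writing to /proc/sysrq-trigger instead of using a console escape:

$ sudo sh -c 'echo s > /proc/sysrq-trigger'   # sync
$ sudo sh -c 'echo u > /proc/sysrq-trigger'   # remount read-only
$ sudo sh -c 'echo b > /proc/sysrq-trigger'   # reboot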

Graham had by then taken the dead disks out of the caddies, replaced them with the new ones, re-inserted them and powered the server back on.

Happily it did come back up fine, so I then set about adding the new disks to the arrays.

I’d already been forewarned that the new disks had 488397168 sectors whereas the existing ones had 490234752 — both described as 250GB of course! That’s a difference of just under 900MiB, despite them both being from the same manufacturer, from the same range even. I didn’t bother adding a swap partition on the two new disks, which left them just about big enough for everything else.
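For the partitioning itself, one way is to dump the existing table with sfdisk, edit out the swap entry and shrink the last partition to fit, then write the result to the new disk. Device names here are only examples, so triple-check you’re writing to the new disk:

$ sudo sfdisk -d /dev/sda > parts.txt
$ vi parts.txt     # drop the swap entry, shrink the last partition to fit
$ sudo sfdisk /dev/sdb < parts.txt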

$ sudo mdadm --add /dev/md1 /dev/sdb1
mdadm: Cannot open /dev/sdb1: Device or resource busy

Oh dear!

After lengthy googling, this article gave me a clue.

$ sudo multipath -l
SATA_WDC_WD2500SD-01WD-WCAL72844661dm-1 ATA,WDC WD2500SD-01K
[size=233G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:1:0 sdd 8:48  [active][undef]
SATA_WDC_WD2500SD-01WD-WCAL72802716dm-0 ATA,WDC WD2500SD-01K
[size=233G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 0:0:1:0 sdb 8:16  [active][undef]

There’s my disks!

Stopping the multipath daemon didn’t help, but running multipath -F (which flushes all the unused multipath maps) did.
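In other words something like the following, though the init script path will vary by distribution:

$ sudo /etc/init.d/multipathd stop
$ sudo multipath -F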

The usual

$ sudo mdadm --add /dev/md1 /dev/sdb1
$ sudo mdadm --add /dev/md1 /dev/sdd1
$ sudo mdadm --add /dev/md3 /dev/sdb3
$ sudo mdadm --add /dev/md3 /dev/sdd3
$ sudo mdadm --add /dev/md5 /dev/sdb5
$ sudo mdadm --add /dev/md5 /dev/sdd5

worked fine after that.
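You can then watch the arrays rebuild in /proc/mdstat:

$ watch -n5 cat /proc/mdstat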

I hope that was useful to someone. I’ll be practising it some more on some spare hardware here to see if the fiddling with /proc/scsi/scsi really does work.

Update:

Dominic (author of the linked article about dm-multipath) says:

I think there’s also a “remove” or “delete” file you can echo to in the /sys device directory, bit more friendly than talking to /proc/scsi/scsi.

and provides this snippet for multipath.conf which should disable multipath:

# Blacklist all devices by default. Remove this to enable multipathing
# on the default devices.
blacklist {
        devnode "*"
}
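The sysfs file he’s referring to is, I believe, the per-device delete node, so something like this should drop sdb out of the SCSI subsystem:

$ sudo sh -c 'echo 1 > /sys/block/sdb/device/delete'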

THING!

I always find it adorable when Jenny talks to me in her sleep, especially when my responses obviously provoke a reaction without waking her up.

Out of nowhere just now:

Her: Erm. Erm. If you’ve separated all the wood stuff how are you going to separate the rest?

Me: What wood stuff?

Her: From my bit.

(voice manages to convey mild irritation at my lack of understanding)

Me: Your bit of what?

Her: Thing!

(Sleep-Jenny clearly losing patience)

Me: Okay then. We’ll work it out.

Her: Good.

(the world has been set to rights)

In the morning I shall endeavour to find out what her bit is and what apart from wood needs to be separated from it.

Feltham Airparcs leisure centre FAIL

Feltham Airparcs leisure centre has for the last 2 weeks, and counting, been closing at 4pm instead of 10pm, because the emergency lighting doesn’t work.

The actual lighting works fine; it’s just that if the lighting did fail then there’d be no emergency lights directing the shallow end of the gene pool to safety.

So the staff close the place up as soon as it starts to get a bit dusky out.