Tracking down the lvmcache fix

Background

In the previous article I covered how, in order to get decent performance out of lvmcache with a packaged Debian kernel, you’d have to use the 4.12.2-1~exp1 kernel from experimental. The kernels packaged in sid, testing (buster) and stable (stretch) aren’t new enough.

I decided to bisect the Linux kernel upstream git repository to find out exactly which commit(s) fixed things.

Results

Here’s a graph showing the IOPS over time for baseline SSD and lvmcache with a full cache under several different kernel versions. As in previous articles, the lines are Bezier curves fitted to the data, which are 500ms averages and are scattered all over the place.

What we can see here is that performance starts to improve with commit 4d44ec5ab751 authored by Joe Thornber:

dm cache policy smq: put newly promoted entries at the top of the multiqueue

This stops entries bouncing in and out of the cache quickly.

This is part of a set of commits authored by Joe Thornber on the drivers/md/dm-cache-policy-smq.c file and committed on 2017-05-14. By the time we reach commit 6cf4cc8f8b3b we have the long-term good performance that we were looking for.

The first of Joe Thornber’s commits on that day in the dm-cache area was 072792dcdfc8 and stepping back to the commit immediately prior to that one (2ea659a9ef48) we get a kernel representing the moment that Linus designated the v4.12-rc1 tag. Joe’s commits went into -rc1, and without them the performance of lvmcache under these test conditions isn’t much better than baseline HDD.

It seems like some of Joe’s changes helped a lot and then the last one really provided the long term performance.

git bisect procedure

Normally when you do a git bisect you’re starting with something that works and you’re looking for the commit that introduced a bug. In this case I was starting off with a known-good state and was interested in which commit(s) got me there. The normal bisect key words of “good” and “bad” in this case would be backwards to what I wanted. Dominic gave me the tip that I could alias the terms in order to reduce my confusion:

$ git bisect start --term-old broken --term-new fixed

From here on, when I encountered a test run that produced poor results I would issue:

$ git bisect broken

and when I had a test run with good results I would issue:

$ git bisect fixed

As I knew that the tag v4.13-rc1 produced a good run and v4.11 was bad, I could start off with:

$ git bisect fixed v4.13-rc1
$ git bisect broken v4.11

git would then keep bisecting the search space of commits until I would find the one(s) that resulted in the high performance I was looking for.
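As a rough illustration of the aliased terms in action, here is a minimal sketch that runs the same kind of "which commit fixed it" bisection automatically on a throwaway repository. Everything in it is made up for the demo: the temporary repository, the ten commits, and the FIXED marker file standing in for "the fio run looked good". It relies on git bisect run treating exit status 0 as the old term (here "broken") and 1 as the new term (here "fixed").

```shell
#!/bin/sh
# Hypothetical demo: find the first "fixed" commit in a scratch repository.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "Demo"

# Ten commits; the pretend fix (a file named FIXED) arrives in commit 7.
i=1
while [ "$i" -le 10 ]; do
    echo "$i" > counter
    if [ "$i" -eq 7 ]; then touch FIXED; fi
    git add -A
    git commit -q -m "commit $i"
    i=$((i + 1))
done

first=$(git rev-list --max-parents=0 HEAD)
git bisect start --term-old broken --term-new fixed >/dev/null
git bisect fixed HEAD >/dev/null
git bisect broken "$first" >/dev/null

# Exit 0 (FIXED absent) marks a commit "broken"; exit 1 marks it "fixed".
git bisect run sh -c 'test ! -e FIXED' >/dev/null

# refs/bisect/fixed now points at the earliest commit judged "fixed".
first_fixed=$(git log -1 --format=%s refs/bisect/fixed)
echo "first fixed commit: $first_fixed"
git bisect reset >/dev/null 2>&1
```

In the real bisection each step meant building a kernel, booting it and judging a fio run by eye, so every verdict was entered by hand rather than through git bisect run.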

Good and bad?

As before I’m using fio to conduct the testing, with the same job specification:


The only difference from the other articles was that the run time was reduced to 15 minutes as all of the interesting behaviour happened within the first 11 minutes.

To recap, this fio job specification lays out two 2GiB files of random data and then starts two processes that perform 4kiB-sized reads against the files. Direct IO is used, in order to bypass the page cache.

A Zipfian distribution with a factor of 1.2 is used; this gives a 90/10 split where about 90% of the reads should come from about 10% of the data. The purpose of this is to simulate the hot spots of popular data that occur in real life. If the access pattern were to be perfectly and uniformly random then caching would not be effective.

In previous tests we had observed dramatically different performance on the first run against an empty cache device compared to all subsequent runs against what would be a full cache device. In the tests using kernels with the fix present the IOPS achieved would converge towards baseline SSD performance, whereas in kernels without the fix the performance would remain down near the level of baseline HDD. Therefore each fio test was carried out twice, so that the second run would be against a full cache.

Where to next?

I think I am going to see what happens when the cache device is pretty small in comparison to the working data.

All of the tests so far have used a 4GiB cache with 4GiB of data, so if everything got promoted it would entirely fit in cache. Not only that but the Zipf distribution makes most of the hits come from 10% of the data, so it’s actually just ~400MiB of hot data. I think it would be interesting to see what happens when the hot 10% is bigger than the cache device.

git bisect progress and test output

Unless you are particularly interested in the fio output and why I considered each one to be either fixed or broken, you probably want to stop reading now.


Using a TOTP app for multi-factor SSH auth

I’ve been playing around with enabling multi-factor authentication (MFA) on web services and went with TOTP. It’s pretty simple to implement in Perl, and there are plenty of apps for it including Google Authenticator, 1Password and others.

I also wanted to use the same multi-factor auth for SSH logins. Happily, from Debian jessie onwards libpam-google-authenticator is packaged. To enable it for SSH you would just add the following:

auth required pam_google_authenticator.so

to /etc/pam.d/sshd (put it just after @include common-auth).

and ensure that:

ChallengeResponseAuthentication yes

is in /etc/ssh/sshd_config.

Not all my users will have MFA enabled though, so to skip prompting for these I use:

auth required pam_google_authenticator.so nullok

Finally, I only wanted users in a particular Unix group to be prompted for an MFA token so (assuming that group was totp) that would be:

auth [success=1 default=ignore] pam_succeed_if.so quiet user notingroup totp
auth required pam_google_authenticator.so nullok

If the pam_succeed_if conditions are met then the next line is skipped, so that causes pam_google_authenticator to be skipped for users not in the group totp.

Each user will need a TOTP secret key to be generated and stored. If you’re only setting this up for SSH then you can use the google-authenticator binary from the libpam-google-authenticator package. This asks you some simple questions and then populates the file $HOME/.google_authenticator with the key and some configuration options. That looks like:

" RATE_LIMIT 3 30 1462548404

The first line is the secret key; the five numbers are emergency codes that will always work (once each) if locked out.

If generating keys elsewhere then you can just populate this file yourself. If the file isn’t present then that’s when “nullok” applies; without “nullok” authentication would fail.

Note that despite the repeated mentions of “google” here, this is not a Google-specific service and no data is sent to Google. Google are the authors of the open source Google Authenticator mobile app and the libpam-google-authenticator PAM module, but (as evidenced by the Perl example) this is an open standard and client and server sides can be implemented in any language.

So that is how you can make a web service and an SSH service use the same TOTP multi-factor authentication.

Your Debian netboot suddenly can’t do Ext4?

If, like me, you’ve just done a Debian netboot install over PXE and discovered that the partitioner suddenly seems to have no option for Ext4 filesystem (leaving only btrfs and XFS), despite the fact that it worked fine a couple of weeks ago, do not be alarmed. You aren’t losing your mind. It seems to be a bug.

As the comment says, downloading netboot.tar.gz version 20150422+deb8u3 fixes it. You can find your version in the debian-installer/amd64/boot-screens/f1.txt file. I was previously using 20150422+deb8u1 and the commenter was using 20150422+deb8u2.

Looking at the dates on the files I’m guessing this broke on 23rd January 2016. There was a Debian point release around then, so possibly you are supposed to download a new netboot.tar.gz with each one – not sure. Although if this is the case it would still be nice to know you’re doing something wrong as opposed to having the installer appear to proceed normally except for denying the existence of any filesystems except XFS and btrfs.

Oh and don’t forget to restart your TFTP daemon. tftpd-hpa at least seems to cache things (or maybe hold the tftp directory open, as I had just moved the old directory out of the way), so I was left even more confused when it still seemed to be serving 20150422+deb8u1.
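To check which netboot build is actually on disk before and after swapping the files, something like the following sketch may help. The function name is made up, and the pattern it greps for (eight digits, "+deb", a release number, "u", an update number) is only an assumption based on the version strings quoted above.

```shell
# Print the installer build string found in a netboot f1.txt file.
# Assumption: the version looks like 20150422+deb8u3, as quoted in this post.
netboot_version() {
    grep -Eo '[0-9]{8}\+deb[0-9]+u[0-9]+' "$1" | head -n 1
}

# Example, run from the TFTP root:
#   netboot_version debian-installer/amd64/boot-screens/f1.txt
```

If the string on disk is the new one but clients still see the old one, that points at the TFTP daemon needing a restart.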

Installing Debian by PXE using Supermicro IPMI Serial over LAN

Here’s how to install Debian jessie on a Supermicro server using PXE boot and the IPMI serial-over-LAN.

Using these instructions you will be able to complete an install of a remote machine, although you will initially need access to the BIOS to configure the IPMI part.

BIOS settings

This bit needs you to be in the same location as the machine, or else have someone who is there make the required changes.

Press DEL to go into the BIOS configuration.

Under Advanced > PCIe/PCI/PnP Configuration make sure that the network interface through which you’ll reach your PXE server has the “PXE” option ROM set:


Under Advanced > Serial Port Console Redirection you’ll want to enable SOL Console Redirection.

BIOS serial console redirection

(Pictured here is also COM1 Console Redirection. This is for the physical serial port on the machine, not the one in the IPMI.)

Under SOL Console Redirection Settings you may as well set the Bits per second to 115200.

BIOS SOL redirection settings

Now it’s time to configure the IPMI so you can interact with it over the network. Under IPMI > BMC Network Configuration, put the IPMI on your management network:

IPMI network configuration

Connecting to the IPMI serial

With the above BIOS settings in place you should be able to save and reboot and then connect to the IPMI serial console. The default credentials are ADMIN / ADMIN which you should of course change with ipmitool, but that is for a different post.

There are two ways to connect to the serial-over-LAN: you can ssh to the IPMI controller, or you can use ipmitool. Personally I prefer ssh, but the ipmitool way is like this:

$ ipmitool -I lanplus -H -U ADMIN -a sol activate

The ssh way:

$ ssh ADMIN@
The authenticity of host ' (' can't be established.
RSA key fingerprint is b7:e1:12:94:37:81:fc:f7:db:6f:1c:00:e4:e0:e1:c4.
Are you sure you want to continue connecting (yes/no)?
Warning: Permanently added ',' (RSA) to the list of known hosts.
ADMIN@'s password:
ATEN SMASH-CLP System Management Shell, version 1.05
Copyright (c) 2008-2009 by ATEN International CO., Ltd.
All Rights Reserved 
-> cd /system1/sol1
-> start
press <Enter>, <Esc>, and then <T> to terminate session
(press the keys in sequence, one after the other)

They both end up displaying basically the same thing.

The serial console should just be displaying the boot process, which won’t go anywhere.

DHCP and TFTP server

You will need to configure a DHCP and TFTP server on an already-existing machine on the same LAN as your new server. They can both run on the same host.

The DHCP server responds to the initial requests for IP address configuration and passes along where to get the boot environment from. The TFTP server serves up that boot environment. The boot environment here consists of a kernel, initramfs and some configuration for passing arguments to the bootloader/kernel. The boot environment is provided by the Debian project.


I’m using isc-dhcp-server. Its configuration file is at /etc/dhcp/dhcpd.conf.

You’ll need to know the MAC address of the server, which can be obtained either from the front page of the IPMI controller’s web interface or from the serial console, where it is displayed when the server attempts a PXE boot. So, add a section for that:

subnet netmask {
host foo {
    hardware ethernet 0C:C4:7A:7C:28:40;
    filename "pxelinux.0";
    option subnet-mask;
    option routers;

Here we set the network configuration of the new server with fixed-address, option subnet-mask and option routers. The IP address in next-server refers to the IP address of the TFTP server, and pxelinux.0 is what the new server will download from it.
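For illustration, here is one plausible complete form of such a section, written to a scratch file so it can be inspected or syntax-checked without touching the live configuration. Every 192.0.2.x address is a placeholder from the documentation range and is purely an assumption; substitute the real addresses for your management LAN. Only the MAC address and the pxelinux.0 filename come from this article.

```shell
# Write an illustrative dhcpd.conf fragment to a scratch file.
# All 192.0.2.x addresses are placeholders, not values from the real setup.
conf=$(mktemp)
cat > "$conf" <<'EOF'
subnet 192.0.2.0 netmask 255.255.255.0 {
}

host foo {
    hardware ethernet 0C:C4:7A:7C:28:40;
    fixed-address 192.0.2.22;           # address given to the new server
    next-server 192.0.2.2;              # the TFTP server
    filename "pxelinux.0";              # what the new server downloads from it
    option subnet-mask 255.255.255.0;
    option routers 192.0.2.1;
}
EOF
echo "wrote $conf"

# If isc-dhcp-server is installed, the file can be syntax-checked with:
#   dhcpd -t -cf "$conf"
```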

Make sure that the DHCP server is running:

# service isc-dhcp-server start

DHCP uses UDP port 67, so make sure that is allowed through your firewall.


A number of different TFTP servers are available. I use tftpd-hpa, which is mostly configured by variables in /etc/default/tftpd-hpa:


TFTP_DIRECTORY is where you’ll put the files for the PXE environment.

Make sure that the TFTP server is running:

# service tftpd-hpa start

TFTP uses UDP port 69, so make sure that is allowed through your firewall.

Download the netboot files from your local Debian mirror:

$ cd /srv/tftp
$ curl -s | sudo tar zxvf -

(This assumes you are installing a device with architecture amd64.)

At this point your TFTP server root should contain a debian-installer subdirectory and a couple of links into it:

$ ls -l .
total 8
drwxrwxr-x 3 root root 4096 Jun  4  2015 debian-installer
lrwxrwxrwx 1 root root   47 Jun  4  2015 ldlinux.c32 -> debian-installer/amd64/boot-screens/ldlinux.c32
lrwxrwxrwx 1 root root   33 Jun  4  2015 pxelinux.0 -> debian-installer/amd64/pxelinux.0
lrwxrwxrwx 1 root root   35 Jun  4  2015 pxelinux.cfg -> debian-installer/amd64/pxelinux.cfg
-rw-rw-r-- 1 root root   61 Jun  4  2015

You could now boot your server and it would call out to PXE to do its netboot, but would be displaying the installer process on the VGA output. If you intend to carry it out using the Remote Console facility of the IPMI interface then that may be good enough. If you want to do it over the serial-over-LAN though, you’ll need to edit some of the files that came out of the netboot.tar.gz to configure that.

Here’s a list of the files you need to edit. All you are doing in each one is telling it to use serial console. The changes are quite mechanical so you can easily come up with a script to do it, but here I will show the changes verbosely. All the files live in the debian-installer/amd64/boot-screens/ directory.

ttyS1 is used here because this system has a real serial port on ttyS0. 115200 is the baud rate of ttyS1 as configured in the BIOS earlier.



adtxt.cfg, before:

label expert
        menu label ^Expert install
        kernel debian-installer/amd64/linux
        append priority=low vga=788 initrd=debian-installer/amd64/initrd.gz --- 
include debian-installer/amd64/boot-screens/rqtxt.cfg
label auto
        menu label ^Automated install
        kernel debian-installer/amd64/linux
        append auto=true priority=critical vga=788 initrd=debian-installer/amd64/initrd.gz --- quiet

adtxt.cfg, after:

label expert
        menu label ^Expert install
        kernel debian-installer/amd64/linux
        append priority=low console=ttyS1,115200n8 initrd=debian-installer/amd64/initrd.gz --- 
include debian-installer/amd64/boot-screens/rqtxt.cfg
label auto
        menu label ^Automated install
        kernel debian-installer/amd64/linux
        append auto=true priority=critical console=ttyS1,115200n8 initrd=debian-installer/amd64/initrd.gz --- quiet

rqtxt.cfg, before:

label rescue
        menu label ^Rescue mode
        kernel debian-installer/amd64/linux
        append vga=788 initrd=debian-installer/amd64/initrd.gz rescue/enable=true --- quiet

rqtxt.cfg, after:

label rescue
        menu label ^Rescue mode
        kernel debian-installer/amd64/linux
        append console=ttyS1,115200n8 initrd=debian-installer/amd64/initrd.gz rescue/enable=true --- quiet

syslinux.cfg, before:

# D-I config version 2.0
# search path for the c32 support libraries (libcom32, libutil etc.)
path debian-installer/amd64/boot-screens/
include debian-installer/amd64/boot-screens/menu.cfg
default debian-installer/amd64/boot-screens/vesamenu.c32
prompt 0
timeout 0

syslinux.cfg, after:

serial 1 115200
console 1
# D-I config version 2.0
# search path for the c32 support libraries (libcom32, libutil etc.)
path debian-installer/amd64/boot-screens/
include debian-installer/amd64/boot-screens/menu.cfg
default debian-installer/amd64/boot-screens/vesamenu.c32
prompt 0
timeout 0

txt.cfg, before:

default install
label install
        menu label ^Install
        menu default
        kernel debian-installer/amd64/linux
        append vga=788 initrd=debian-installer/amd64/initrd.gz --- quiet

txt.cfg, after:

default install
label install
        menu label ^Install
        menu default
        kernel debian-installer/amd64/linux
        append console=ttyS1,115200n8 initrd=debian-installer/amd64/initrd.gz --- quiet
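Since the changes are mechanical, they could be scripted along the following lines. This is only a sketch: the function name is made up, it assumes the four files involved are adtxt.cfg, rqtxt.cfg, txt.cfg and syslinux.cfg in the boot-screens directory, it assumes vga=788 is the only argument that needs swapping, and it uses GNU sed’s -i option. Try it on a copy of the directory first.

```shell
# Rewrite the installer boot-screen files in directory $1 for a serial console.
# SERIAL_UNIT and BAUD default to the ttyS1 / 115200 values used in this article.
serialise_boot_screens() {
    dir=$1
    unit=${SERIAL_UNIT:-1}
    baud=${BAUD:-115200}

    # Swap the VGA mode argument for a serial console argument.
    for f in adtxt.cfg rqtxt.cfg txt.cfg; do
        sed -i "s/vga=788/console=ttyS${unit},${baud}n8/" "$dir/$f"
    done

    # Point the bootloader itself at the serial port, unless already done.
    if ! grep -q '^serial ' "$dir/syslinux.cfg"; then
        {
            printf 'serial %s %s\nconsole 1\n' "$unit" "$baud"
            cat "$dir/syslinux.cfg"
        } > "$dir/syslinux.cfg.new"
        mv "$dir/syslinux.cfg.new" "$dir/syslinux.cfg"
    fi
}

# Example, run from the TFTP root:
#   serialise_boot_screens debian-installer/amd64/boot-screens
```

The guard around the syslinux.cfg step is there so that running the function twice does not add the serial lines a second time.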

Perform the install

Connect to the serial-over-LAN and get started. If the server doesn’t have anything currently installed then it should go straight to trying PXE boot. If it does have something on the storage that it would boot then you will have to use F12 at the BIOS screen to convince it to jump straight to PXE boot.

$ ssh ADMIN@
ADMIN@'s password:
ATEN SMASH-CLP System Management Shell, version 1.05
Copyright (c) 2008-2009 by ATEN International CO., Ltd.
All Rights Reserved 
-> cd /system1/sol1
-> start
press <Enter>, <Esc>, and then <T> to terminate session
(press the keys in sequence, one after the other)
Intel(R) Boot Agent GE v1.5.13                                                  
Copyright (C) 1997-2013, Intel Corporation                                      
CLIENT MAC ADDR: 0C C4 7A 7C 28 40  GUID: 00000000 0000 0000 0000 0CC47A7C2840  
CLIENT IP:  MASK:  DHCP IP:             
PXELINUX 6.03 PXE 20150107 Copyright (C) 1994-2014 H. Peter Anvin et al    
                 │ Debian GNU/Linux installer boot menu  │
                 │ Install                               │
                 │ Advanced options                    > │
                 │ Help                                  │
                 │ Install with speech synthesis         │
                 │                                       │
                 │                                       │
                 │                                       │
                 │                                       │
                 │                                       │
                 │                                       │
              Press ENTER to boot or TAB to edit a menu entry
  ┌───────────────────────┤ [!!] Select a language ├────────────────────────┐
  │                                                                         │
  │ Choose the language to be used for the installation process. The        │
  │ selected language will also be the default language for the installed   │
  │ system.                                                                 │
  │                                                                         │
  │ Language:                                                               │
  │                                                                         │
  │                               C                                         │
  │                               English                                   │
  │                                                                         │
  │     <Go Back>                                                           │
  │                                                                         │
<Tab> moves; <Space> selects; <Enter> activates buttons

…and now the installation proceeds as normal.

At the end of this you should be left with a system that uses ttyS1 for its console. You may need to tweak that depending on whether you want the VGA console also.

systemd on Debian, reading the persistent system logs as a user

All the documentation and guides I found say that to enable a persistent journal on Debian you just need to create /var/log/journal. It is true that once you create that directory you will get a persistent journal.

All the documentation and guides I found say that as long as you are in group adm (or sometimes they say group systemd-journal) it is possible to see all system logs by just typing journalctl, without having to run it as root. Having simply done mkdir /var/log/journal I can tell you that is not the case. All you will see is logs relating to your user.

The missing piece of info is contained in /usr/share/doc/systemd/README.Debian:

Enabling persistent logging in journald

To enable persistent logging, create /var/log/journal and set up proper permissions:

install -d -g systemd-journal /var/log/journal
setfacl -R -nm g:adm:rx,d:g:adm:rx /var/log/journal

-- Tollef Fog Heen <>, Wed, 12 Oct 2011 08:43:50 +0200

Without the above you will not have permission to read the /var/log/journal//system.journal file, and the ACL is necessary for journal files created in the future to also be readable.

Paranoid, Init

Having marvelled at the er… unique nature of MikeeUSA’s Systemd Blues: Took our thing (Wooo) blues homage to the perils of using systemd, I decided what the world actually needs is something from the metal genre.

So, here’s the lyrics to Paranoid, Init.

Default soon on Debian
This doesn’t help me with my mind
People think I’m insane
Because I am trolling all the time

All day long I fight Red Hat
And uphold UNIX philosophy
Think I’ll lose my mind
If I can’t use sysvinit on jessie

Can you help me
Terrorise pid 1?
Oh yeah!

Tried to show the committee
That things were wrong with this design
They can’t see Poettering’s plan in this
They must be blind

Some sick joke I could just cry
GNOME needs logind API
QR codes gave me a feel
Then binary logs just broke the deal

And so as you hear these words
Telling you now of my state
Can’t log off and enjoy life
I’ve another sock puppet to create

rsync: “Inflate (token) returned -5”

Today one of my rsync backups began failing with:

inflate (token) returned -5
rsync error: error in rsync protocol data stream (code 12) at token.c(604) [receiver=3.0.3]
rsync: writefd_unbuffered failed to write 373 bytes [generator]: Broken pipe (32)
rsync error: error in rsync protocol data stream (code 12) at io.c(1544) [generator=3.0.3]

It was repeatable when trying to transfer the same file (a large gzipped SQL dump file).

It turned out to be a bug in that version of rsync.

rsync 3.0.3 comes with Debian lenny. To get a newer version I had to use lenny-backports. That gets me rsync v3.0.7, which does not exhibit this bug.

(Yes, I am aware that squeeze has been released and this host should be upgraded to that. There is security support for lenny until at least February 2012.)

Linux, IPv6, router advertisements and forwarding

By default, a Linux host on an IPv6 network will listen for and solicit router advertisements in order to choose an IPv6 address for itself and to set up its default route. This is referred to as stateless address autoconfiguration (SLAAC).

If you don’t want a host to automatically configure an address and route then you could disable this behaviour by writing “0” to /proc/sys/net/ipv6/conf/*/accept_ra.

Additionally, if the Linux host considers itself to be a router then it will ignore all router advertisements.

In this context, what determines whether a host counts as a router is the setting of the /proc/sys/net/ipv6/conf/*/forwarding files (or the net.ipv6.conf.*.forwarding sysctl). If you turn your host into a router by setting one of those to “1”, you may find that your host removes any IPv6 address and default route it learnt via SLAAC.

There is a valid argument that a router should not be autoconfiguring itself, and should have its addresses and routes configured statically. Linux has IP forwarding features for a reason though, and sometimes you want to forward packets with a Linux box while still enjoying autoconfiguration. In my case I have some hosts running virtual machines, with IPv6 prefixes routed to the virtual machines. I’d still like the hosts to learn their default route via SLAAC.

It’s taken me a long time to work out how to do this. It isn’t well-documented.

Firstly, if you have a kernel version of 2.6.37 or higher then your answer is to set accept_ra to “2”. From ip-sysctl.txt:

accept_ra – BOOLEAN

Accept Router Advertisements; autoconfigure using them.

Possible values are:

  • 0 Do not accept Router Advertisements.
  • 1 Accept Router Advertisements if forwarding is disabled.
  • 2 Overrule forwarding behaviour. Accept Router Advertisements even if forwarding is enabled.

Functional default:

  • enabled if local forwarding is disabled.
  • disabled if local forwarding is enabled.

This appears to be a type of boolean that I wasn’t previously familiar with – one that has three different values.

If you don’t have kernel version 2.6.37 though, like say, everyone running the current Debian stable (2.6.32), this will not work. Helpfully, it also doesn’t give you any sort of error when you set accept_ra to “2”. It just sets it and continues silently ignoring router advertisements.


Fortunately Bjørn Mork posted about a workaround for earlier kernels which I would likely have never discovered otherwise. You just have to disable forwarding for the interface that your router advertisements will come in on, e.g.:

# echo 0 > /proc/sys/net/ipv6/conf/eth0/forwarding

Apparently as long as /proc/sys/net/ipv6/conf/all/forwarding is still set to “1” then forwarding will still be enabled. Obviously.

Additionally there are some extremely unintuitive interactions between the “default” and “all” settings you may set in /etc/sysctl.conf and pre-existing interfaces, and there is a race condition on boot between IPv6 interfaces coming up and sysctl configuration being parsed. martin f krafft posted about this, and on Debian recommends setting desired sysctls in pre-up headers of the relevant iface stanza in /etc/network/interfaces, e.g.:

iface eth0 inet6 static
    address 2001:0db8:10c0:d0c5::1
    netmask 64
# Enable forwarding
    pre-up echo 1 > /proc/sys/net/ipv6/conf/default/forwarding
    pre-up echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
# But disable forwarding on THIS interface so we still get RAs
    pre-up echo 0 > /proc/sys/net/ipv6/conf/$IFACE/forwarding
    pre-up echo 1 > /proc/sys/net/ipv6/conf/$IFACE/accept_ra
    pre-up echo 1 > /proc/sys/net/ipv6/conf/all/accept_ra
    pre-up echo 1 > /proc/sys/net/ipv6/conf/default/accept_ra

You will now have forwarding and SLAAC.
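One way to check the result is to apply the accept_ra rules quoted from ip-sysctl.txt to the per-interface values currently in /proc. The sketch below does that; the function name is made up, and it takes the documented semantics at face value, so on kernels older than 2.6.37 a reported "yes" for accept_ra=2 would be wrong because that value is silently ignored there.

```shell
# Decide whether an interface should honour router advertisements,
# following the accept_ra rules quoted from ip-sysctl.txt.
ra_accepted() {
    accept_ra=$1
    forwarding=$2
    case "$accept_ra" in
        2) echo yes ;;
        1) if [ "$forwarding" = 0 ]; then echo yes; else echo no; fi ;;
        *) echo no ;;
    esac
}

# Report on every interface known to the IPv6 stack.
for dir in /proc/sys/net/ipv6/conf/*/; do
    [ -r "${dir}accept_ra" ] || continue
    iface=$(basename "$dir")
    printf '%s: %s\n' "$iface" \
        "$(ra_accepted "$(cat "${dir}accept_ra")" "$(cat "${dir}forwarding")")"
done
```

With the interfaces stanza above, eth0 has forwarding 0 and accept_ra 1, so it should be reported as accepting router advertisements while the "all" entry still shows forwarding enabled.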

everything went better than expected

Recently updated gnutls then found you can’t connect to LDAP?

If you recently installed this update:

gnutls26 (2.4.2-6+lenny2) stable-security; urgency=high
  * Non-maintainer upload by the Security Team.
  * Fixed CVE-2009-2730: a vulnerability related to NUL bytes in
    X.509 certificate name fields. (Closes: #541439)

 -- Giuseppe Iuculano <> Sun, 01 Nov 2009 21:29:06

and then found that your applications began failing to connect to your LDAP server, you may want to check that your SSL certificate is valid. Along with this update it seems that the default behaviour changed to being more strict. In my case I was using self-signed SSL certificates without the CA being available.

You can disable the verification if you don’t want it by adding:

TLS_REQCERT     never

in /etc/ldap/ldap.conf on each client machine.