Using Duplicity to back up to Amazon S3 over IPv6 (only)

Scenario ^

I have a server that I use for making backups. I also send backups from that server into Amazon S3 at the “Infrequent Access” storage class. That class is cheaper to store but expensive to access. It’s intended for backups of last resort that you only access in an emergency. I use Duplicity to handle the S3 part.

(I could save a bit more by using one of the “Glacier” classes but at the moment the cost is minimal and I’m not that brave.)

I recently decided to change which server I use for the backups. I noticed that renting a server with only IPv6 connectivity was cheaper, and as all the hosts I back up have IPv6 connectivity I decided to give that a go.

This mostly worked fine. The only thing I really noticed was when I tried to install some software from GitHub. GitHub doesn’t support IPv6, so I had to piggyback that download through another host.

Then I came to set up Duplicity again and found that I needed to make some non-obvious changes to get it working with S3 from an IPv6-only host.

S3 endpoint ^

The main issue is that the default S3 endpoint URL is https://s3.<region>.amazonaws.com, and this host only has an A (IPv4) record! For example:

$ host s3.us-east-1.amazonaws.com
s3.us-east-1.amazonaws.com has address 52.216.89.254

If you run Duplicity with a target like s3://yourbucketname/path/to/backup then it will try that endpoint, get only an IPv4 address, and fail with “Network unreachable”.

S3 does actually support IPv6, but for that to work you need to use a dual-stack endpoint! They look like this:

$ host s3.dualstack.us-east-1.amazonaws.com
s3.dualstack.us-east-1.amazonaws.com has address 54.231.129.0
s3.dualstack.us-east-1.amazonaws.com has IPv6 address 2600:1fa0:80dc:5101:34d9:451e::

So we need to specify the S3 endpoint to use.

Specifying the S3 endpoint ^

In order to do this you need to switch Duplicity to the “boto3” backend. Assuming you’ve installed the correct package (python3-boto3 on Debian), this is as simple as changing the target from s3://… to boto3+s3://….

That then allows you to use the command line arguments --s3-region-name and --s3-endpoint-url so you can tell it which host to talk to. That ends up giving you both an IPv4 and an IPv6 address and your system correctly chooses the IPv6 one.

The full script ^

The new, working script now looks something like this:

export PASSPHRASE="highlysecret"
export AWS_ACCESS_KEY_ID="notquiteassecret"
export AWS_SECRET_ACCESS_KEY="extremelysecret"
# Somewhere with plenty of free space.
export TMPDIR=/var/tmp
 
duplicity --encrypt-key ABCDEF0123456789 \
          --asynchronous-upload \
          -v 4 \
          --archive-dir=/path/to/your/duplicity/archives \
          --s3-use-ia \
          --s3-use-multiprocessing \
          --s3-use-new-style \
          --s3-region-name "us-east-1" \
          --s3-endpoint-url "https://s3.dualstack.us-east-1.amazonaws.com" \
          incr \
          --full-if-older-than 30D \
          /stuff/you/want/backed/up \
          "boto3+s3://yourbucketname/path/to/backups"

The previous version of the script looked a bit like:

# All the exports stayed the same
duplicity --encrypt-key ABCDEF0123456789 \
          --asynchronous-upload \
          -v 4 \
          --archive-dir=/path/to/your/duplicity/archives \
          --s3-use-ia \
          --s3-use-multiprocessing \
          incr \
          --full-if-older-than 30D \
          /stuff/you/want/backed/up \
          "s3+http://yourbucketname/path/to/backups"

Building BitFolk’s Rescue VM

Overview ^

BitFolk’s Rescue VM is a live system based on the Debian Live project. You boot it, it finds its root filesystem over read-only NFS, and then it mounts a unionfs RAM disk over that so that you can make changes (e.g. install packages) that don’t persist. People generally use it to repair broken operating systems, reset root passwords etc.

Every few years I have to rebuild it, because it’s important that it’s new enough to be able to effectively poke around in guest filesystems. Each time I have to try to remember how I did it. It’s not that difficult but it’s well past time that I document how it’s done.

Basic concept of Debian Live ^

The idea is that everything under the config/ directory of your build area is either

  • a set of configuration options for the process itself,
  • some files to put in the image,
  • some scripts to run while building the image, or
  • some scripts to run while booting the image.

Install packages ^

Pick a host running at least the latest Debian stable. It might be possible to build a live image for a newer version of Debian, but the live-build system and its dependencies like debootstrap might end up being too old.

$ sudo apt install live-build live-boot live-config

Prepare the work directory ^

$ sudo mkdir -vp /srv/lb/auto
$ cd /srv/lb

Main configuration ^

All of these config options are described in the lb_config man page.

$ sudo tee auto/config >/dev/null <<'_EOF_'
#!/bin/sh
 
set -e
 
cacher_prefix="apt-cacher.lon.bitfolk.com/debian"
mirror_host="deb.debian.org"
main_mirror="http://${cacher_prefix}/${mirror_host}/debian/"
sec_mirror="http://${cacher_prefix}/${mirror_host}/debian-security/"
 
lb config noauto \
    --architectures                     amd64 \
    --distribution                      bullseye \
    --binary-images                     netboot \
    --archive-areas                     main \
    --apt-source-archives               false \
    --apt-indices                       false \
    --backports                         true \
    --mirror-bootstrap                  "$main_mirror" \
    --mirror-chroot-security            "$sec_mirror" \
    --mirror-binary                     "$main_mirror" \
    --mirror-binary-security            "$sec_mirror" \
    --memtest                           none \
    --net-tarball                       true \
    "${@}"
_EOF_

The variables at the top just save me having to repeat myself for all the mirrors. They make both the build process and the resulting image use BitFolk’s apt-cacher to proxy the deb.debian.org mirror.

I’m not going to describe every config option as you can just look them up in the man page. The most important one is --binary-images netboot to make sure it builds an image that can be booted by network.

Extra packages ^

There are some extra packages I want available in the rescue image. Here’s how to get them installed.

$ sudo tee config/package-lists/bitfolk_rescue.list.chroot > /dev/null <<_EOF_
pwgen
less
binutils
build-essential
bzip2
gnupg
openssh-client
openssh-server
perl
perl-modules
telnet
screen
tmux
rpm
_EOF_

Installing a backports kernel ^

I want the rescue system to be Debian 11 (bullseye), but with a bullseye-backports kernel.

We already used --backports true to make sure that we have access to the backports package mirrors, but we need to run a script hook to actually install the backports kernel in the image while it’s being built.

$ sudo tee config/hooks/live/9000-install-backports-kernel.hook.chroot >/dev/null <<'_EOF_'
#!/bin/sh
 
set -e
 
apt -y install -t bullseye-backports linux-image-amd64
apt -y purge -t bullseye linux-image-amd64
apt -y purge -t bullseye 'linux-image-5.10.*'
_EOF_

Set a static /etc/resolv.conf ^

This image will only be booted on one network where I know what the nameservers are, so I may as well statically override them. If you were building an image to use on different networks you’d probably instead want to use one of the public resolvers or accept what DHCP gives you.

$ sudo tee config/includes.chroot/etc/resolv.conf >/dev/null <<_EOF_
nameserver 85.119.80.232
nameserver 85.119.80.233
_EOF_

Set an explanatory footer text in /etc/issue.footer ^

The people using this rescue image don’t necessarily know what it is and how to use it. I take the opportunity to put some basic info in the file /etc/issue.footer in the image, which will later end up in the real /etc/issue.

$ sudo tee config/includes.chroot/etc/issue.footer >/dev/null <<_EOF_
BitFolk Rescue Environment - https://tools.bitfolk.com/wiki/Rescue
 
Blah blah about what this is and how to use it
_EOF_

Set a random password at boot ^

By default a Debian Live image has a user name of “user” and a password of “live”. This isn’t suitable for a networked service that will have sshd active from the start, so we will install a hook script that sets a random password. This will be run near the end of the image’s boot process.

$ sudo tee config/includes.chroot/lib/live/config/2000-passwd >/dev/null <<'_EOF_'
#!/bin/sh
 
set -e
 
echo -n " random-password "
 
NEWPASS=$(/usr/bin/pwgen -c -N 1)
printf "user:%s\n" "$NEWPASS" | chpasswd
 
RED='\033[0;31m'
NORMAL='\033[0m'
 
{
    printf "****************************************\n";
    printf "Resetting user password to random value:\n";
    printf "\t${RED}New user password:${NORMAL} %s\n" "$NEWPASS";
    printf "****************************************\n";
    cat /etc/issue.footer
} >> /etc/issue
_EOF_

This script puts the random password and the footer text into the /etc/issue file which is displayed above the console login prompt, so the user can see what the password is.

Fix initial networking setup ^

This one’s a bit unfortunate and is a huge hack, but I’m not sure enough of the details to report a bug yet.

The live image when booted is supposed to be able to set up its network in a number of different ways. DHCP would be the most sensible for an image you take with you to different networks.

The BitFolk Rescue VM is only ever booted in one network though, and we don’t use DHCP. I want to set static networking through the ip=… syntax of the kernel command line.

Unfortunately it doesn’t seem to work properly with live-boot as shipped. I had to hack the /lib/live/boot/9990-networking.sh file to make it parse the values out of the kernel command line.

Here’s a diff. Copy /lib/live/boot/9990-networking.sh to config/includes.chroot/usr/lib/live/boot/9990-networking.sh and then apply that patch to it.

It’s simple enough that you could probably edit it by hand. All it does is comment out one section and replace it with some bits that parse IP setup out of the $STATICIP variable.
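To give a flavour of it, the replacement logic is roughly along these lines. This is a hand-written sketch of the idea, not the actual patch; ${STATICIP} really is what live-boot sets from the ip= parameter, but everything else here is illustrative:

# Pull the colon-separated fields out of the ip= value that
# live-boot stashed in ${STATICIP}, then bring the interface up.
# eth0 is hardcoded because this image only ever boots in one place.
if [ -n "${STATICIP}" ]
then
    IFADDRESS=$(echo "${STATICIP}" | cut -d: -f1)
    IFGATEWAY=$(echo "${STATICIP}" | cut -d: -f3)
    IFNETMASK=$(echo "${STATICIP}" | cut -d: -f4)

    ifconfig eth0 "${IFADDRESS}" netmask "${IFNETMASK}" up
    route add default gw "${IFGATEWAY}"
fi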

Fix the shutdown process ^

Again this is a horrible hack and I’m sure there is a better way to handle it, but I couldn’t work out anything better and this works.

This image will be running with its root filesystem on NFS. When a shutdown or halt command is issued however, systemd seems extremely keen to shut off the network as soon as possible. That leaves the shutdown process unable to continue because it can’t read or write its root filesystem any more. The shutdown process stalls forever.

As this is a read-only system with no persistent state I don’t care how brutal the shutdown process is. I care more that it does actually shut down. So, I have added a systemd service that issues systemctl --force --force poweroff any time that it’s about to shut down by any means.

$ sudo tee config/includes.chroot/etc/systemd/system/always-brutally-poweroff.service >/dev/null <<_EOF_
[Unit]
Description=Every kind of shutdown will be brutal poweroff
DefaultDependencies=no
After=final.target
 
[Service]
Type=oneshot
ExecStart=/usr/bin/systemctl --force --force poweroff
 
[Install]
WantedBy=final.target
_EOF_

And to force it to be enabled at boot time:

$ sudo tee config/includes.chroot/etc/rc.local >/dev/null <<_EOF_
#!/bin/sh
 
set -e
 
systemctl enable always-brutally-poweroff
_EOF_

Build it ^

At last we’re ready to build the image.

$ sudo lb clean && sudo lb config && sudo lb build

The “lb clean” is there because you probably won’t get this right first time and will want to iterate on it.

Once complete you’ll find the files to put on your NFS server in binary/ and the kernel and initramfs to boot on your client machine in tftpboot/live/.

$ sudo rsync -av binary/ my.nfs.server:/srv/rescue/

Booting it ^

The details of exactly how I boot the client side (which in BitFolk’s case is a customer VM) are out of scope here, but this is sort of what the kernel command line looks like on the client (normally all on one line):

root=/dev/nfs
ip=192.168.0.225:192.168.0.243:192.168.0.1:255.255.248.0:rescue
hostname=rescue
nfsroot=192.168.0.243:/srv/rescue
nfsopts=tcp
boot=live
persistent

Explained:

root=/dev/nfs
    Get root filesystem from NFS.
ip=192.168.0.225:192.168.0.243:192.168.0.1:255.255.248.0:rescue
    Static IP configuration on the kernel command line. Colon-separated fields:
      • Client’s IP
      • NFS server’s IP
      • Default gateway
      • Netmask
      • Host name
hostname=rescue
    Host name.
nfsroot=192.168.0.243:/srv/rescue
    Where to mount root from on the NFS server.
nfsopts=tcp
    NFS client options to use.
boot=live
    Tell live-boot that this is a live image.
persistent
    Look for persistent data.

In action ^

Here’s an Asciinema of this image in action.

Improvements ^

There’s a few things in here which are hacks. What I have works but no doubt I am doing some things wrong. If you know better please do let me know in comments or whatever. Ideally I’d like to stick with Debian Live though because it’s got a lot of problems solved already.

btrfs compression wins

Some quite good btrfs compression results from my backup hosts (which back up customer data).

Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       64%       68G         105G         1.2T
none       100%       24G          24G         434G
zlib        54%       43G          80G         797G

Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       74%       91G         123G         992G
none       100%       59G          59G         599G
lzo         50%       32G          63G         393G

Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       73%       16G          22G         459G
none       100%       12G          12G         269G
lzo         40%      4.1G          10G         190G

Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       71%      105G         148G         1.9T
none       100%       70G          70G         910G
zlib        40%       24G          60G         1.0T
lzo         58%       10G          17G          17G

So that’s 398G that takes up 280G, a 29.6% reduction.

The “none” type is incompressible files such as media that’s already compressed. I started off with lzo compression but I’m switching to zlib now as it compresses more and this data is rarely accessed so I’m not too concerned about performance. I need newer kernels on these before I can try zstd.
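For reference, switching an existing filesystem over goes something like this (a sketch; the mount point is an example, and compress= only affects newly written data, hence the defragment to rewrite what’s already there):

$ # Make new writes use zlib. Add compress=zlib to /etc/fstab too.
$ sudo mount -o remount,compress=zlib /srv/backup
$ # Recompress what's already there.
$ sudo btrfs filesystem defragment -r -czlib /srv/backup

Bear in mind that defragmenting breaks reflink/snapshot sharing, so if you keep snapshots it can temporarily increase disk usage.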

I’ve had serious concerns about btrfs before based on issues I’ve had using it at home, but these were mostly around multiple device usage. Here they get a single block device that has redundancy underneath so the only remotely interesting thing that btrfs is doing here is the compression.

Might try some offline deduplication next.

Resolving a sector offset to a logical volume

The Problem ^

Sometimes Linux logs interesting things with sector offsets. For example:

Jul 23 23:11:19 tanqueray kernel: [197925.429561] sg[22] phys_addr:0x00000015bac60000 offset:0 length:4096 dma_address:0x00000012cf47a000 dma_length:4096
Jul 23 23:11:19 tanqueray kernel: [197925.430323] sg[23] phys_addr:0x00000015bac5d000 offset:0 length:4608 dma_address:0x00000012cf47b000 dma_length:4608
Jul 23 23:11:19 tanqueray kernel: [197925.431052] sg[24] phys_addr:0x00000015bac5e200 offset:512 length:3584 dma_address:0x00000012cf47c200 dma_length:3584
Jul 23 23:11:19 tanqueray kernel: [197925.431824] sg[25] phys_addr:0x00000015bac2e000 offset:0 length:4096 dma_address:0x00000012cf47d000 dma_length:4096
.
.
.
Jul 23 23:11:19 tanqueray kernel: [197925.434447] Invalid SGL for payload:131072 nents:32
.
.
.
Jul 23 23:11:19 tanqueray kernel: [197925.454419] blk_update_request: I/O error, dev nvme0n1, sector 509505343 op 0x1:(WRITE) flags 0x800 phys_seg 32 prio class 0
Jul 23 23:11:19 tanqueray kernel: [197925.464644] md/raid1:md5: Disk failure on nvme0n1p5, disabling device.
Jul 23 23:11:19 tanqueray kernel: [197925.464644] md/raid1:md5: Operation continuing on 1 devices.

What is at sector 509505343 of /dev/nvme0n1p5 anyway? Well, that’s part of an md array and then on top of that is an lvm physical volume, which has a number of logical volumes.

I’d like to know which logical volume sector 509505343 of /dev/nvme0n1p5 corresponds to.

At the md level ^

Thankfully this is a RAID-1 so every device in it has the exact same layout.

$ grep -A 2 ^md5 /proc/mdstat 
md5 : active raid1 nvme0n1p5[0] sda5[1]
      3738534208 blocks super 1.2 [2/2] [UU]
      bitmap: 2/28 pages [8KB], 65536KB chunk

Superblock format 1.2 puts the RAID metadata 4KiB from the start of each member device, and the data itself starts at the “Data Offset” that mdadm --examine reports. Strictly speaking that offset should be subtracted when translating a member device sector into an array sector, but it’s tiny compared to the numbers involved here and won’t change which LV we end up in, so I’m going to ignore it.

For all intents and purposes sector 509505343 of /dev/nvme0n1p5 is the same as sector 509505343 of /dev/md5.

If I’d been using a different RAID level like 5 or 6 then this would have been far more complicated as the data would have been striped across multiple devices at different offsets, together with parity. Some layouts of Linux RAID-10 would also have different offsets.

At the lvm level ^

LVM has physical volumes (PVs) that are split into extents, then one or more ranges of one or more extents make up a logical volume (LV). The physical volumes are just the underlying device, so in my case that’s /dev/md5.

Offset into the PV ^

LVM has some metadata at the start of the PV, so we first work out how far into the PV the extents can start:

$ sudo pvs --noheadings -o pe_start --units s /dev/md5
    2048S

So, sector 509505343 is actually 509503295 sectors into this PV, because the first 2048 sectors are reserved for metadata.

How big is an extent? ^

Next we need to know how big an LVM extent is.

$ sudo pvdisplay --units s /dev/md5 | grep 'PE Size'
  PE Size               8192 Se

There’s 8192 sectors in each of the extents in this PV, so this sector is inside extent number 509503295 / 8192 = 62195.22644043.

It’s fractional because naturally the sector is not on an exact PE boundary. If I need to I could work out from the remainder how many sectors into PE 62195 this is, but I’m only interested in the LV name and each LV has an integer number of PEs, so that’s fine: PE 62195.

Look at the PV’s mappings ^

Now you can dump out a list of mappings for the PV. This will show you what each range of extents corresponds to. Note that there might be multiple ranges for an LV if it’s been grown later on.

$ sudo pvdisplay --maps /dev/md5 | grep -A1 'Physical extent'
.
.
.
  Physical extent 58934 to 71733:
    Logical volume      /dev/myvg/domu_backup4_xvdd
--
  Physical extent 71734 to 912726:
    FREE

So, extent 62195 is inside /dev/myvg/domu_backup4_xvdd.

What’s going on here then? ^

I’m not sure, but there appears to be a kernel bug and it’s probably got something to do with the fact that this LV is a disk with an unaligned partition table:

$ sudo fdisk -u -l /dev/myvg/domu_backup4_xvdd
Disk /dev/myvg/domu_backup4_xvdd: 50 GiB, 53687091200 bytes, 104857600 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x07c7ce4c

Device                       Boot Start       End   Sectors Size Id Type
/dev/myvg/domu_backup4_xvdd1         63 104857599 104857537  50G 83 Linux

Partition 1 does not start on physical sector boundary.

The Linux NVMe driver can only do IO in multiples of 4096 bytes. As seen in the initial logs, two of the requests were for 4608 and 3584 bytes respectively; these are not divisible by 4096 and thus hit a WARN().

.
.
.
Jul 23 23:11:19 tanqueray kernel: [197925.430323] sg[23] phys_addr:0x00000015bac5d000 offset:0 length:4608 dma_address:0x00000012cf47b000 dma_length:4608
Jul 23 23:11:19 tanqueray kernel: [197925.431052] sg[24] phys_addr:0x00000015bac5e200 offset:512 length:3584 dma_address:0x00000012cf47c200 dma_length:3584
.
.
.

Going further: finding the file ^

I’m not interested in doing this, because it’s fairly likely the cause is the misaligned partition, and many kinds of IO to it will trigger the same problem.

If you did want to though, you’d first work out how far into the LV the sector is. Assuming the extent range above maps to the start of the LV, that’s (62195 - 58934) whole extents plus the remainder from earlier: 3261 * 8192 + 1855 = 26,715,967 sectors into the LV. Partition 1 starts at sector 63, so that’s 26,715,904 sectors into the filesystem.

You can then (for ext4) use debugfs to poke about and see which file that corresponds to.
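As a sketch of how that would look here, assuming the default 4KiB ext4 block size and that the guest’s partition has been mapped on the host (the device name below is what kpartx -a would typically create): 26,715,904 sectors * 512 / 4096 = filesystem block 3,339,488.

$ # Which inode owns that block?
$ sudo debugfs -R 'icheck 3339488' /dev/mapper/myvg-domu_backup4_xvdd1
$ # Then map the inode number it prints back to a pathname.
$ sudo debugfs -R 'ncheck <inode>' /dev/mapper/myvg-domu_backup4_xvdd1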

Keeping firewall logs out of Linux’s kernel log with ulogd2

A few words about iptables vs nft ^

nftables is the new thing and iptables is deprecated, but I haven’t found time to convert everything to nft rules syntax yet.

I’m still using iptables rules, but via the iptables frontend to nftables. All of this works with both legacy iptables and nft, just with different syntax.

Logging with iptables ^

As a contrived example let’s log inbound ICMP packets at a maximum rate of 1 per second:

-A INPUT -m limit --limit 1/s -p icmp -j LOG --log-level 7 --log-prefix "ICMP: "

The Problem ^

If you have logging rules in your firewall then they’ll log to your kernel log, which is exposed at /dev/kmsg. The dmesg command displays this, but the kernel log is a fixed-size circular buffer, so after a while your firewall logs will crowd out everything else.

On a modern systemd system this stuff does get copied to the journal, so if you set that to be persistent then you can keep the kernel logs forever. Or you can additionally run a syslog daemon like rsyslogd, and have that keep things forever.

Either way though your dmesg or journalctl -k commands are only going to display the contents of the kernel’s ring buffer which will be a limited amount.

I’m not that interested in firewall logs. They’re nice to have and very occasionally valuable when debugging something, but most of the time I’d rather they weren’t in my kernel log.

An answer: ulogd2 ^

One answer to this problem is ulogd2. ulogd2 is a userspace logging daemon into which you can feed netfilter data and have it log it in a flexible way, to multiple different formats and destinations.

I actually already use it to log certain firewall things to a MariaDB database for monitoring purposes, but you can also emit plain text, JSON, netflow and all manner of things. Since I’m already running it I decided to switch my general firewall logging to it.

Configuring ulogd2 ^

I added the following to /etc/ulogd.conf:

# This one for logging to local file in emulated syslog format.
stack=log2:NFLOG,base1:BASE,ifi1:IFINDEX,ip2str1:IP2STR,print1:PRINTPKT,emu1:LOGEMU
 
[log2]
group=2
 
[emu1]
file="/var/log/iptables_ulogd2.log"
sync=1

I already had a stack called log1 for logging to MariaDB, so I called the new one log2 with its output being emu1.

The log2 section can then be told to expect messages from netfilter group 2. Don’t worry too much about what that means; just know that this is what you refer to in your firewall rules, and that you can’t use group 0 because that’s used for something else.

The emu1 section then says which file to write this stuff to.

That’s it. Restart the daemon.

Configuring iptables ^

Now it’s time to make iptables log to netfilter group 2 instead of its normal LOG target. As a reminder, here’s what the rule was like before:

-A INPUT -m limit --limit 1/s -p icmp -j LOG --log-level 7 --log-prefix "ICMP: "

And here’s what you’d change it to:

-A INPUT -m limit --limit 1/s -p icmp -j NFLOG --nflog-group 2 --nflog-prefix "ICMP:"

The --nflog-group 2 needs to match what you put in /etc/ulogd.conf.
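For completeness, the native nft version of that rule is something along these lines (a sketch, assuming an inet table called filter with an input chain):

$ sudo nft add rule inet filter input meta l4proto icmp \
    limit rate 1/second log prefix \"ICMP:\" group 2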

You’re now logging with ulogd2 and none of this will be going to the kernel log buffer. Don’t forget to rotate the new log file! Or maybe you’d like to play with logging this as JSON or into a SQLite DB?
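ulogd2 reopens its log files when sent a SIGHUP, so rotation can be as simple as this logrotate snippet (a sketch; the retention is whatever suits you):

/var/log/iptables_ulogd2.log {
    weekly
    rotate 12
    compress
    missingok
    notifempty
    postrotate
        systemctl kill -s HUP ulogd2.service
    endscript
}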

rsync and sudo without X forwarding

Five years ago I wrote about how to do rsync as root on both sides. That solution required using ssh-askpass which in turn requires X forwarding.

The main complication here is that sudo on the remote side is going to ask for a password, which either requires an interactive terminal or a forwarded X session.

I thought I would mention that if you’ve disabled tty_tickets in the sudo configuration then you can “prime” the sudo authentication with some harmless command and then do the real rsync without it asking for a sudo password:

local$ ssh -t you@remote.example.com sudo whoami
[sudo] password for you: 
root
local$ sudo rsync --rsync-path="sudo rsync" -av --delete \
  you@remote.example.com:/etc/secret/ /etc/secret/

This suggestion was already supplied as a comment on the earlier post five years ago, but I keep forgetting it.

I suggest this is only for ad hoc commands and not for automation. For automation you need to find a way to make sudo not ever ask for a password, and some would say to add configuration to sudo with a NOPASSWD directive to accomplish that.

I would instead suggest allowing a root login by ssh using a public key that is only for the specific purpose, as you can lock it down to only ever be able to execute that one script/program.
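That can look something like this in the remote root’s authorized_keys (a sketch; the script path is an example and the key material is elided):

# This key can only ever run the one wrapper script, whatever
# command the client asks for.
command="/usr/local/sbin/backup-secret-etc",restrict ssh-ed25519 AAAA... backup@local

The wrapper script then execs the single rsync server command you intend to allow; rrsync, shipped with the rsync package, is an existing implementation of that idea.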

Also bear in mind that if you permanently allow “host A” to run rsync as root with unrestricted parameters on “host B” then a compromise of “host A” is also a compromise of “host B”, as full write access to the filesystem is granted. Whereas if you only allow “host A” to run a specific script/program on “host B” then you’ve a better chance of things being contained.

grub-install: error: embedding is not possible, but this is required for RAID and LVM install

The Initial Problem ^

The recent security update of the GRUB bootloader did not want to install on my fileserver at home:

$ sudo apt dist-upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
  grub-common grub-pc grub-pc-bin grub2-common
4 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 4,067 kB of archives.
After this operation, 72.7 kB of additional disk space will be used.
Do you want to continue? [Y/n]
…
Setting up grub-pc (2.02+dfsg1-20+deb10u4) ...
Installing for i386-pc platform.
grub-install: warning: your core.img is unusually large.  It won't fit in the embedding area.
grub-install: error: embedding is not possible, but this is required for RAID and LVM install.
Installing for i386-pc platform.
grub-install: warning: your core.img is unusually large.  It won't fit in the embedding area.
grub-install: error: embedding is not possible, but this is required for RAID and LVM install.
Installing for i386-pc platform.
grub-install: warning: your core.img is unusually large.  It won't fit in the embedding area.
grub-install: error: embedding is not possible, but this is required for RAID and LVM install.
Installing for i386-pc platform.
grub-install: warning: your core.img is unusually large.  It won't fit in the embedding area.
grub-install: error: embedding is not possible, but this is required for RAID and LVM install.

Four identical error messages, because this server has four drives upon which the operating system is installed, and I’d decided to do a four way RAID-1 of a small first partition to make up /boot. This error is coming from grub-install.

Ancient History ^

This system came to life in 2006, so it’s 15 years old. It’s always been Debian stable, so right now it runs Debian buster and during those 15 years it’s been transplanted into several different iterations of hardware.

Choices were made in 2006 that were reasonable for 2006, but it’s not 2006 now. Some of these choices are now causing problems.

Aside: four way RAID-1 might seem excessive, but we’re only talking about the small /boot partition. Back in 2006 I chose a ~256M one so if I did the minimal thing of only having a RAID-1 pair I’d have 2x 256M spare on the two other drives, which isn’t very useful. I’d honestly rather have all four system drives with the same partition table and there’s hardly ever writes to /boot anyway.

Here’s what the identical partition tables of the drives /dev/sd[abcd] look like:

$ sudo fdisk -u -l /dev/sda
Disk /dev/sda: 298.1 GiB, 320069031424 bytes, 625134827 sectors
Disk model: ST3320620AS     
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000
 
Device     Boot   Start       End   Sectors  Size Id Type
/dev/sda1  *         63    514079    514017  251M fd Linux raid autodetect
/dev/sda2        514080   6393869   5879790  2.8G fd Linux raid autodetect
/dev/sda3       6393870 625121279 618727410  295G fd Linux raid autodetect

Note that the first partition starts at sector 63, 32,256 bytes into the disk. Modern partition tools tend to start partitions at sector 2,048 (1,024KiB in), but this was acceptable in 2006 for me and worked up until a few days ago.

Those four partitions /dev/sd[abcd]1 make up an mdadm RAID-1 with metadata version 0.90. This was purposefully chosen because at the time of install GRUB did not have RAID support. This metadata version lives at the end of the member device so anything that just reads the device can pretend it’s an ext2 filesystem. That’s what people did many years ago to boot off of software RAID.

What’s Gone Wrong? ^

The last successful update of grub-pc seems to have been done on 7 February 2021:

$ ls -la /boot/grub/i386-pc/core.img
-rw-r--r-- 1 root root 31082 Feb  7 17:19 /boot/grub/i386-pc/core.img

I’ve got 62 sectors available for the core.img so that’s 31,744 bytes – just 662 bytes more than what is required.

The update of grub-pc appears to be detecting that my /boot partition is on a software RAID and is now including MD RAID support even though I don’t strictly require it. This makes the core.img larger than the space I have available for it.

I don’t think it is great that such a major change has been introduced as a security update, and it doesn’t seem like there is any easy way to tell it not to include the MD RAID support, but I’m sure everyone is doing their best here and it’s more important to get the security update out.

Possible Fixes ^

So, how to fix? It seems to me the choices are:

  1. Ignore the problem and stay on the older grub-pc
  2. Create a core.img with only the modules I need
  3. Rebuild my /boot partition

Option #1 is okay short term, especially if you don’t use Secure Boot as that’s what the security update was about.

Option #2 doesn’t seem that feasible as I can’t find a way to influence how Debian’s upgrade process calls grub-install. I don’t want that to become a manual process.

Option #3 seems like the easiest thing to do, as shaving ~1MiB off the size of my /boot isn’t going to cause me any issues.

Rebuilding My /boot ^

Take a backup ^

/boot is only relatively small so it seemed easiest just to tar it up ready to put it back later.

$ sudo tar -C /boot -cvf ~/boot.tar .

I then sent that tar file off to another machine as well, just in case the worst should happen.

Unmount /boot and stop the RAID array that it’s on ^

I’ve already checked in /etc/fstab that /boot is on /dev/md0.

$ sudo umount /boot
$ sudo mdadm --stop /dev/md0
mdadm: stopped /dev/md0

At this point I would also recommend doing a wipefs -a on each of the partitions in order to remove the MD superblocks. I didn’t and it caused me a slight problem later as we shall see.
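That is, something like this before re-partitioning, with the array stopped:

$ for part in /dev/sd[abcd]1; do sudo wipefs -a "$part"; done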

Delete and recreate first partition on each drive ^

I chose to use parted, but should be doable with fdisk or sfdisk or whatever you prefer.

I know from the fdisk output way above that the new partition needs to start at sector 2048 and end at sector 514,079.

$ sudo parted /dev/sda                                                             
GNU Parted 3.2
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) unit s
(parted) rm 1
(parted) mkpart primary ext4 2048 514079s
(parted) set 1 raid on
(parted) set 1 boot on
(parted) p
Model: ATA ST3320620AS (scsi)
Disk /dev/sda: 625134827s
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:
 
Number  Start     End         Size        Type     File system  Flags
 1      2048s     514079s     512032s     primary  ext4         boot, raid, lba
 2      514080s   6393869s    5879790s    primary               raid
 3      6393870s  625121279s  618727410s  primary               raid
 
(parted) q
Information: You may need to update /etc/fstab.

Do that for each drive in turn. When I got to /dev/sdd, this happened:

Error: Partition(s) 1 on /dev/sdd have been written, but we have been unable to
inform the kernel of the change, probably because it/they are in use.  As a result,
the old partition(s) will remain in use.  You should reboot now before making further changes.
Ignore/Cancel?

The reason for this seems to be that something has decided that there is still a RAID signature on /dev/sdd1 and so it will try to incrementally assemble the RAID-1 automatically in the background. This is why I recommend a wipefs of each member device.

To get out of this situation without rebooting I needed to repeat my mdadm --stop /dev/md0 command and then do a wipefs -a /dev/sdd1. I was then able to partition it with parted.

Create md0 array again ^

I’m going to stick with metadata format 0.90 for this one even though it may not be strictly necessary.

$ sudo mdadm --create /dev/md0 \
             --metadata 0.9 \
             --level=1 \
             --raid-devices=4 \
             /dev/sd[abcd]1
mdadm: array /dev/md0 started.

Again, if you did not do a wipefs earlier then mdadm will complain that these devices already have a RAID array on them and ask for confirmation.

Get the Array UUID ^

$ sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 0.90
     Creation Time : Sat Mar  6 03:20:10 2021
        Raid Level : raid1
        Array Size : 255936 (249.94 MiB 262.08 MB)
     Used Dev Size : 255936 (249.94 MiB 262.08 MB)
      Raid Devices : 4
     Total Devices : 4
   Preferred Minor : 0
       Persistence : Superblock is persistent
 
       Update Time : Sat Mar  6 03:20:16 2021
             State : clean
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0
 
Consistency Policy : resync
 
              UUID : e05aa2fc:91023169:da7eb873:22131b12 (local to host specialbrew.localnet)
            Events : 0.18
 
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

Change your /etc/mdadm/mdadm.conf for the updated UUID of md0:

$ grep md0 /etc/mdadm/mdadm.conf
ARRAY /dev/md0 level=raid1 num-devices=4 UUID=e05aa2fc:91023169:da7eb873:22131b12

Make a new filesystem on /dev/md0 ^

$ sudo mkfs.ext4 -m0 -L boot /dev/md0
mke2fs 1.44.5 (15-Dec-2018)
Creating filesystem with 255936 1k blocks and 64000 inodes
Filesystem UUID: fdc611f2-e82a-4877-91d3-0f5f8a5dd31d
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729, 204801, 221185
 
Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

My /etc/fstab didn’t need a change because it mounted by device name, i.e. /dev/md0, but if yours uses UUID or label then you’ll need to update that now, too.
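For example, a label-based line could now look something like this, matching the -L boot given to mkfs.ext4 above:

LABEL=boot  /boot  ext4  defaults  0  2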

Mount it and put your files back ^

$ sudo mount /boot
$ sudo tar -C /boot -xvf ~/boot.tar

Reinstall grub-pc ^

$ sudo apt reinstall grub-pc
…
Setting up grub-pc (2.02+dfsg1-20+deb10u4) ...
Installing for i386-pc platform.
Installation finished. No error reported.
Installing for i386-pc platform.
Installation finished. No error reported.
Installing for i386-pc platform.
Installation finished. No error reported.
Installing for i386-pc platform.
Installation finished. No error reported.

Reboot ^

You probably should reboot now to make sure it all works when you have time to fix any problems, as opposed to risking issues when you least expect it.

$ uprecords 
     #               Uptime | System                                     Boot up
----------------------------+---------------------------------------------------
     1   392 days, 16:45:55 | Linux 4.7.0               Thu Jun 14 16:13:52 2018
     2   325 days, 03:20:18 | Linux 3.16.0-0.bpo.4-amd  Wed Apr  1 14:43:32 2015
->   3   287 days, 16:03:12 | Linux 4.19.0-9-amd64      Fri May 22 12:33:27 2020
     4   257 days, 07:31:42 | Linux 4.19.0-6-amd64      Sun Sep  8 05:00:38 2019
     5   246 days, 14:45:10 | Linux 4.7.0               Sat Aug  6 06:27:52 2016
     6   165 days, 01:24:22 | Linux 4.5.0-rc4-specialb  Sat Feb 20 18:18:47 2016
     7   131 days, 18:27:51 | Linux 3.16.0              Tue Sep 16 08:01:05 2014
     8    89 days, 16:01:40 | Linux 4.7.0               Fri May 26 18:28:40 2017
     9    85 days, 17:33:51 | Linux 4.7.0               Mon Feb 19 17:17:39 2018
    10    63 days, 18:57:12 | Linux 3.16.0-0.bpo.4-amd  Mon Jan 26 02:33:47 2015
----------------------------+---------------------------------------------------
1up in    37 days, 11:17:07 | at                        Mon Apr 12 15:53:46 2021
no1 in   105 days, 00:42:44 | at                        Sat Jun 19 05:19:23 2021
    up  2362 days, 06:33:25 | since                     Tue Sep 16 08:01:05 2014
  down     0 days, 14:02:09 | since                     Tue Sep 16 08:01:05 2014
   %up               99.975 | since                     Tue Sep 16 08:01:05 2014

My Kingdom For 7 Bytes ^

My new core.img is 7 bytes too big to fit before my original /boot:

$ ls -la /boot/grub/i386-pc/core.img
-rw-r--r-- 1 root root 31751 Mar  6 03:24 /boot/grub/i386-pc/core.img

Booting the CentOS/RHEL installer under Xen PVH mode

CentOS/RHEL and Xen ^

As of the release of CentOS 8 / RHEL8, Red Hat disabled kernel support for running as a Xen PV or PVH guest, even though such support is enabled by default in the upstream Linux kernel.

As a result, unlike with all previous versions of CentOS/RHEL, you cannot boot the installer in Xen PV or PVH mode. You can still boot it in Xen HVM mode, or under KVM, but that is not very helpful if you don’t want to run HVM or KVM.

At BitFolk ever since the release of CentOS 8 we’ve had to tell customers to use the Rescue VM (a kind of live system) to unpack CentOS into a chroot.

Fortunately there is now a better way.

Credit ^

This method was worked out by Jon Fautley. Jon emailed me instructions and I was able to replicate them. Several people have since asked me how it was done and Jon was happy for me to write it up, but this was all worked out by Jon, not me.

Overview ^

The basic idea here is to:

  1. take the installer initrd.img
  2. unpack it
  3. shove the modules from a Debian kernel into it
  4. repack it
  5. use a Debian kernel and this new frankeninitrd as the installer kernel and initrd
  6. switch the installed OS to the kernel-ml package from ELRepo so it has a working kernel when it boots

Detailed process ^

I’ll go into enough detail that you should be able to exactly replicate what I did to end up with something that works. This is quite a lot but it only needs to be done each time the real installer initrd.img changes, which isn’t that often. The resulting kernel and initrd.img can be used to install many guests.

Throughout the rest of this article I’ll refer to CentOS, but Jon initially made this work for RHEL 8. I’ve replicated it for CentOS 8 and will soon do so for RHEL 8 as well.

Extract the CentOS initrd.img ^

You will find this in the install ISO or on mirrors as images/pxeboot/initrd.img.

$ mkdir /var/tmp/frankeninitrd/initrd
$ cd /var/tmp/frankeninitrd/initrd
$ xz -dc /path/to/initrd.img > ../initrd.cpio
$ # root needed because this will do some mknod/mkdev.
$ sudo cpio -idv < ../initrd.cpio

Copy modules from a working Xen guest ^

I’m going to use the Xen guest that I’m doing this on, which at the time of writing is a Debian buster system running kernel 4.19.0-13. Even a system that is not currently running as a Xen guest will probably work, as they usually have modules available for everything.

At the time of writing the kernel version in the installer is 4.18.0-240.

If your versions differ, adjust filenames accordingly.

$ sudo cp -r /lib/modules/4.19.0-13-amd64 lib/modules/
$ # You're not going to use the original modules
$ # so may as well delete them to save space.
$ sudo rm -vr lib/modules/4.18*

Add dracut hook to copy fs modules ^

$ sudo tee usr/lib/dracut/hooks/pre-pivot/99-move-modules.sh >/dev/null <<'__EOF__'
#!/bin/sh
 
mkdir -p /sysroot/lib/modules/$(uname -r)/kernel/fs
rm -r /sysroot/lib/modules/4.18*
cp -r /lib/modules/$(uname -r)/kernel/fs/* /sysroot/lib/modules/$(uname -r)/kernel/fs
cp /lib/modules/$(uname -r)/modules.builtin /sysroot/lib/modules/$(uname -r)/
depmod -a -b /sysroot
 
exit 0
__EOF__
$ sudo chmod +x usr/lib/dracut/hooks/pre-pivot/99-move-modules.sh

Repack initrd ^

This will take a really long time because xz -9 is sloooooow.

$ sudo find . 2>/dev/null | \
  sudo cpio -o -H newc -R root:root | \
  xz -9 --format=lzma > ../centos8-initrd.img

Use the Debian kernel ^

Put the matching kernel next to your initrd.

$ cp /boot/vmlinuz-4.19.0-13-amd64 ../centos8-vmlinuz
$ ls -lah ../centos*
-rw-r--r-- 1 andy andy  81M Feb  1 04:43 ../centos8-initrd.img
-rw-r--r-- 1 andy andy 5.1M Feb  1 04:04 ../centos8-vmlinuz

Boot this kernel/initrd as a Xen guest ^

Copy the kernel and initrd to somewhere on your dom0 and create a guest config file that looks a bit like this:

name       = "centostest"
# CentOS 8 installer requires at least 2.5G RAM.
# OS will run with a lot less though.
memory     = 2560
vif        = [ "mac=00:16:5e:00:02:39, ip=192.168.82.225, vifname=v-centostest" ]
type       = "pvh"
kernel     = "/var/tmp/frankeninitrd/centos8-vmlinuz"
ramdisk    = "/var/tmp/frankeninitrd/centos8-initrd.img"
extra      = "console=hvc0 ip=192.168.82.225::192.168.82.1:255.255.255.0:centostest:eth0:none nameserver=8.8.8.8 inst.stage2=http://www.mirrorservice.org/sites/mirror.centos.org/8/BaseOS/x86_64/os/ inst.ks=http://example.com/yourkickstart.ks"
disk       = [ "phy:/dev/vg/centostest_xvda,xvda,w",
               "phy:/dev/vg/centostest_xvdb,xvdb,w" ]

Assumptions in the above:

  • vif and disk settings will be however you usually do that.
  • “extra” is for the kernel command line. Here it gives the installer static networking using the ip=<client IP>::<gateway>:<netmask>:<hostname>:<interface>:<autoconf> syntax.
  • inst.stage2 here goes to a public mirror but could be an unpacked installer iso file instead.
  • inst.ks points to a minimal kickstart file you’ll have to create (see below).

Minimal kickstart file ^

This kickstart file will:

  • Automatically wipe disks and partition. I use xvda for the OS and xvdb for swap. Adjust accordingly.
  • Install only minimal package set.
  • Switch the installed system over to kernel-ml from ELRepo.
  • Force an SELinux autorelabel at first boot.

The only thing it doesn’t do is create any users. The installer will wait for you to do that. If you want an entirely automated install just add the user creation stuff to your kickstart file (there’s a sketch of that after the kickstart below).

url --url="http://www.mirrorservice.org/sites/mirror.centos.org/8/BaseOS/x86_64/os"
text
 
# Clear all the disks.
clearpart --all --initlabel
zerombr
 
# A root filesystem that takes up all of xvda.
part /    --ondisk=xvda --fstype=xfs --size=1 --grow
 
# A swap partition that takes up all of xvdb.
part swap --ondisk=xvdb --size=1 --grow
 
bootloader --location=mbr --driveorder=xvda --append="console=hvc0"
firstboot --disabled
timezone --utc Etc/UTC --ntpservers="0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org"
keyboard --vckeymap=gb --xlayouts='gb'
lang en_GB.UTF-8
skipx
firewall --enabled --ssh
halt
 
%packages
@^Minimal install
%end 
 
%post --interpreter=/usr/bin/bash --log=/root/ks-post.log --erroronfail
 
# Switch to kernel-ml from ELRepo. Necessary for Xen PV/PVH boot support.
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
yum -y install https://www.elrepo.org/elrepo-release-8.el8.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel -y install kernel-ml
yum -y remove kernel-tools kernel-core kernel-modules
 
sed -i -e 's/DEFAULTKERNEL=.*/DEFAULTKERNEL=kernel-ml/' /etc/sysconfig/kernel
grub2-mkconfig -o /boot/grub2/grub.cfg
 
# Force SELinux autorelabel on first boot.
touch /.autorelabel
%end
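If you do want it fully unattended, user creation is only a couple more kickstart lines, e.g. (a sketch; generate your own hash with something like openssl passwd -6):

# Lock root and create an admin user instead. The hash is a placeholder.
rootpw --lock
user --name=admin --groups=wheel --password=$6$placeholderhash --iscrypted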

Launch the guest ^

$ sudo xl create -c /etc/xen/centostest.conf

Obviously this guest config can only boot the installer. Once it’s actually installed and halts you’ll want to make a guest config suitable for normal booting. The kernel-ml kernel does work in PVH mode, so at BitFolk we use pvhgrub to boot these.
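A normal-boot guest config might then look something like this (a sketch; the pvhgrub binary path is the one Debian’s grub-xen-host package ships, adjust for your dom0):

name    = "centostest"
memory  = 1024
type    = "pvh"
# pvhgrub: a grub built as a PVH kernel, which loads the guest's own
# grub config and kernel-ml kernel from its filesystem.
kernel  = "/usr/lib/grub-xen/grub-i386-xen_pvh.bin"
vif     = [ "mac=00:16:5e:00:02:39, ip=192.168.82.225, vifname=v-centostest" ]
disk    = [ "phy:/dev/vg/centostest_xvda,xvda,w",
            "phy:/dev/vg/centostest_xvdb,xvdb,w" ]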

A better way? ^

The actual modification needed to the stock installer kernel is quite small: just enable the CONFIG_XEN_PVH kernel option and rebuild. I don’t know the process for building a CentOS or RHEL installer kernel though, so that wasn’t an option for me.

If you do know how to do it please do send me any information you have.

If you’re running Ubuntu and/or using snaps, look into CVE-2020-27348

I was reading an article about CVE-2020-27348 earlier, which is quite a nasty bug affecting a lot of snap packages.

My desktop runs Ubuntu 18.04 at the moment, and so does my partner’s laptop. I also have a Debian buster laptop but I’ve never installed snapd there. So it’s just my desktop and my partner’s laptop I’m concerned about.

If you run Ubuntu 20.04 or later I think there’s probably more concern, as I understand the software centre offers snap versions of things by default.

Anyway, I couldn’t recall ever installing a snap on purpose on my desktop except for a short while ago when I intentionally installed signal-desktop. But in fact I have quite a few snaps installed.

$ snap list
Name                  Version                     Rev    Tracking         Publisher     Notes
core                  16-2.48.2                   1058   latest/stable    canonical✓    core
core18                20201210                    1944   latest/stable    canonical✓    base 
gnome-3-26-1604       3.26.0.20200529             100    latest/stable/…  canonical✓    -
gnome-3-28-1804       3.28.0-19-g98f9e67.98f9e67  145    latest/stable    canonical✓    -
gnome-3-34-1804       0+git.3556cb3               66     latest/stable    canonical✓    -
gnome-calculator      3.38.0+git7.c840c69c        826    latest/stable/…  canonical✓    -
gnome-characters      v3.34.0+git9.eeab5f2        570    latest/stable/…  canonical✓    -
gnome-logs            3.34.0                      100    latest/stable/…  canonical✓    -
gnome-system-monitor  3.36.0-12-g35f88a56d7       148    latest/stable/…  canonical✓    -
gtk-common-themes     0.1-50-gf7627e4             1514   latest/stable/…  canonical✓    -
signal-desktop        1.39.5                      345    latest/stable    snapcrafters  -

I don’t know why gnome-calculator is there. It doesn’t appear to be the binary that’s run when I start the calculator.

So are any of them a security risk? Well…

$ grep -l \$LD_LIBRARY_PATH /snap/*/current/snap/snapcraft.yaml
/snap/gnome-calculator/current/snap/snapcraft.yaml
/snap/gnome-characters/current/snap/snapcraft.yaml
/snap/gnome-logs/current/snap/snapcraft.yaml
/snap/gnome-system-monitor/current/snap/snapcraft.yaml

Those are all the snaps on my system which include the value of the (empty) environment variable LD_LIBRARY_PATH, so are likely vulnerable to this.

But does this really end up with an empty item in the LD_LIBRARY_PATH list?

$ which gnome-system-monitor 
/snap/bin/gnome-system-monitor
$ gnome-system-monitor &
$ pgrep -f gnome-system-monitor
8259
$ tr '\0' '\n' < /proc/8259/environ | grep ^LD_LIBR | grep -q :: && echo "oh dear"
oh dear

Yes it really does.

(The tr is necessary because /proc/*/environ is a NUL-separated string; the tr turns it into one variable per line, the first grep picks out the LD_LIBRARY_PATH line, and the second checks whether it contains an empty entry, ::.)

So yeah, my gnome-system-monitor is a local code execution vector.

As are my gnome-characters, gnome-logs and that gnome-calculator if I ever uninstall the non-snap version.

That CVE seems to have been published on 3 December 2020. I hope that the affected snaps will be fixed soon.

I don’t like that the CVE says the impact is:

If a user were tricked into installing a malicious snap or downloading a malicious library, under certain circumstances an attacker could exploit this to affect strict mode snaps that have access to the library and were launched from the directory containing the library.

My first thought upon reading it was, “I’m safe, I haven’t been tricked into downloading any malicious snaps!” But I do have snaps that aren’t malicious, just insecure. The hardest part of the exploit is indeed getting a malicious file (a library) into a directory that I will run a snap from.

Starting services only when the network is ready on Debian/systemd

TL;DR: ^

  • Make sure that whatever configures your network supports network-online.target
  • Override the service unit to have Wants=network.target network-online.target and the same for After=

Overview ^

Sometimes you only want services to start up once there is a network configured. Most network services can handle there being no network when they start, waiting until one appears, because this is a very common situation.

Other services though may not in themselves be expecting to use the network, and so have never thought about it. Also a great thing about open source software is that it tends to be very composable, so it’s not possible to predict the ways that people will use combinations of software.

The problem ^

systemd will tend to start things as soon as it can. If your service is not configured to wait for the network that means it will most likely be started up before the network exists. If your service then tries to do something that requires a network it will receive an error, which it may not be prepared to handle.

A concrete example: ulogd2 ^

A real life example for me is ulogd2. ulogd2 allows your firewall rules to log things in a variety of ways, in incredible detail.

Most of the ways people configure it involve just logging to the local filesystem, so it doesn’t actually require the network to be configured first.

The default systemd configuration in Debian buster for the ulogd2 service looks like this:

$ sudo systemctl cat ulogd2.service
[Unit]
Description=Netfilter Userspace Logging Daemon
Documentation=man:ulogd(8)
 
[Service]
Type=forking
PIDFile=/run/ulog/ulogd.pid
ExecStart=/usr/sbin/ulogd --daemon --uid ulog --pidfile /run/ulog/ulogd.pid
 
[Install]
Alias=ulogd.service
WantedBy=multi-user.target

As you can see there’s nothing in there that says to wait for a network.

I use a database plugin for ulogd2 that makes it log to a (remote) database. As a consequence as soon as it starts up it tries to establish a database connection, immediately fails as there is no route to any remote host, retries a few times and then bails out.

Most of the time it exhausts its retries before the network is up, so the result is that the service is in a failed state. Simply manually starting the service (or having config management do it) resolves that, but that’s a mess.

Ideally I don’t want systemd to start ulogd2 until there is a network.

“Wants=network-online.target” mate. Job done. No! ^

If like me you know just enough about systemd to be dangerous, you figure that what you want to do is add something like this to the [Unit] section of the service unit file:

[Unit]
…
Wants=network.target network-online.target
After=network.target network-online.target

This is only part of the correct solution. If you do only this then you’ll probably find that nothing actually changes.

About network-online.target ^

The thing about the network-online target is that it doesn’t exist unless you’re using a “modern” method of bringing up your networking, like NetworkManager or systemd-networkd.

If you’re not doing that then systemd works out that the network-online target can never be reached and ignores it as a Want.

I’m using ifupdown on servers as it still does everything I need it to. To make ifupdown support the network-online target on Debian, you should enable the ifupdown-wait-online service:

$ sudo systemctl enable ifupdown-wait-online.service

This will inject the network-online “target reached” state when every interface that is marked as “auto” in /etc/network/interfaces is up.

Editing a service file ^

The temptation now may be to edit the ulogd2 service file that’s under /lib/systemd/system/ to contain the Want/After bits.

That will work but it isn’t the correct way because if there is a package update then your changes will be overwritten.

A better way is to place a new service file into /etc/systemd/system/. That will entirely override the distributed copy. The obvious downside there is that if there’s an improvement to the packaged service file then you’ll never use it, as you’ve entirely overridden it with your own file.

Overrides to the rescue ^

The best way is to use an override file, and the easiest way to do that is with systemctl edit:

$ sudo systemctl edit ulogd2
[your favourite editor starts]
[Unit]
Wants=network.target network-online.target
After=network.target network-online.target

Check your changes took effect:

$ sudo systemctl cat ulogd2.service
# /lib/systemd/system/ulogd2.service
[Unit]
Description=Netfilter Userspace Logging Daemon
Documentation=man:ulogd(8)
 
[Service]
Type=forking
PIDFile=/run/ulog/ulogd.pid
ExecStart=/usr/sbin/ulogd --daemon --uid ulog --pidfile /run/ulog/ulogd.pid
 
[Install]
Alias=ulogd.service
WantedBy=multi-user.target
 
# /etc/systemd/system/ulogd2.service.d/override.conf
[Unit]
Wants=network.target network-online.target
After=network.target network-online.target

Note that this shows you where the files actually are. That makes it easy to distribute this through config management.
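After a reboot you can sanity-check that the ordering took effect, e.g.:

$ systemd-analyze critical-chain ulogd2.service

network-online.target should now appear in the chain of units that ulogd2.service waited for.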