Link shortener, part 2

In the previous article I had some thoughts about link shorteners, and how I might try to implement one as a means to learn some Rust.

Since then I’ve made some progress and pushed it to GitHub. It will not be a good example of Rust nor of a link shortener. Remember: it’s only the second piece of Rust I’ve ever written. At the moment it only implements a basic REST API and stores things in an in-memory SQLite database. I’ll probably work on it some more in order to learn some more.

I’ve really enjoyed the process of learning Rust so far though. I mean, it may lead to a lot of yak shaving on changing my editor to something that will pop up all the fancy completions that you see developers on YouTube have. But other than that it was surprisingly fast to learn how to do things, even if not yet in the best possible way.

Things I found/find particularly helpful:

Musings on link shorteners, UUIDv7, Base58, …

Yesterday I started thinking about maybe learning some Rust and as part of that I thought perhaps I might try to implement a link shortener.

Disclaimer ^

Now, clearly there are tons of existing commercial link shorteners. I’m not interested in making a commercial link shortener since:

  • I don’t think I am capable of doing a great job of it.
  • I think being a commercial success here would involve a lot of activity
    that I don’t like.

There are also plenty of existing FOSS projects that implement a link shortener, in almost every language you can think of. That’s interesting from an inspiration point of view, but my goal here is to learn a little bit about Rust, so there’s no point in just setting up someone else’s project.

So anyway, point is, I’m not looking to make something commercially viable and I don’t think I can make something better than what exists. I’m just messing about and trying to learn some things.

Seems like a good project to learn stuff – it can be very very simple, but grow to include many different areas such as databases, REST API, authentication and so on.

On Procrasturbation ^

The correct thing to do at this point is to just get on with it. I should not even be writing this article! I should be either doing paying work, life admin, or messing about with my learning project.

As usual though, my life is a testament to not doing the correct thing. 😀 Charitably I’ll say that I can’t stop myself from planning how it should be done rather than just doing it. Realistically there is a lot of procrastination in there too.

So Many Open Questions ^

I hope I’ve made clear that this is a learning goal for me, so it follows that I have a lot of open questions. If you came here looking for expert guidance then you’ve come to the wrong place. If you have answers to my questions that would be great though. And of the few assertions I do make, if you disagree then I’d also like to hear that opinion.

Thinking About Link Shorteners ^

Is enumeration bad? ^

Obviously the entire point of a link shortener is to map short strings to long ones. As a consequence the key space of short strings is quite easy to iterate through and the pressure is always there to keep the space small as that’s what makes for appealing short links. Is this bad? If it is bad, how bad is it and so how much should the temptation to keep things short be resisted?

An extremely naive link shortener might just increment a counter (and perhaps convert decimal digits to a string with more symbols so it’s shorter). For example:
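A minimal sketch of that naive approach might look like the following (the alphabet and function names here are just illustrative, not from my actual code):

```rust
// Naive shortener keys: an incrementing counter rendered in base 62,
// so the decimal counter value becomes a shorter string.
const ALPHABET: &[u8] = b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

fn encode(mut n: u64) -> String {
    let base = ALPHABET.len() as u64;
    let mut out = Vec::new();
    loop {
        out.push(ALPHABET[(n % base) as usize]);
        n /= base;
        if n == 0 {
            break;
        }
    }
    out.reverse(); // most significant digit first
    String::from_utf8(out).unwrap()
}

fn main() {
    // Even the millionth link gets only a 4-character key: /4c92
    for n in [0u64, 61, 62, 12345, 1_000_000] {
        println!("link {} -> /{}", n, encode(n));
    }
}
```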


That’s great for the shortest keys possible, but it’s trivial for anyone in the world to just iterate through every link in your database. Users will expect that links they shorten and don’t show to anyone else remain known only to them, and people shorten links to private documents all the time. But every guess results in a working (or at least, submitted) link, and neighbouring keys are proximal in time: link 99 and link 100 were likely submitted very close together, quite possibly by the same person.

A simple counter seems unacceptable here.

But what can a link shortener actually do to defend against enumeration? The obvious answer is rate limiting. Nobody should be doing thousands of GET requests against the shortener. And if the key space was made sparse so that some of these GET requests result in a 404, that’s also highly suspicious and might make banning decisions a lot easier.

Therefore, I think there should be rate limiting, and the key space should be sparse so that most guesses result in a 404 error.

When I say “sparse key space” I mean that the short link key that is generated should be randomly and evenly distributed over a much larger range than is required to fit all links in the database.

How sparse though? Should random guesses result in success 50% of the time? 1%? 0.1%? I don’t know. I don’t have a feel for how big a key space would be to begin with. There is always the tension here against the primary purpose of being short!
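One way to get a feel for it is to pick a target hit rate and work backwards to the number of random key bits required. This is back-of-envelope arithmetic, not anything from my code:

```rust
// How many random bits must the key space have so that a random guess
// hits an existing link with probability at most `p`, given `links` links?
fn bits_needed(links: u64, p: f64) -> u32 {
    // Need 2^bits >= links / p, so bits = ceil(log2(links / p)).
    (links as f64 / p).log2().ceil() as u32
}

fn main() {
    // 32 links with a 1-in-8 hit rate: 256 slots, i.e. 8 bits (two hex chars).
    println!("{} bits", bits_needed(32, 1.0 / 8.0));
    // A million links at a 0.1% hit rate: about 30 bits.
    println!("{} bits", bits_needed(1_000_000, 0.001));
}
```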

If that’s not clear, consider a hash function like md5. You can feed anything into md5 and it’ll give you back a 128 bit value which you could use as the key (the short link). Even if you have billions of links in your database, most of that 128 bit key space will be empty.

(If you’re going to use a hash function for this, there’s much better hash functions than md5, but just to illustrate the point.)

The problem here is, well, it’s 128 bits though. You might turn it into hex or Base64 encode it but there’s no escaping the fact that 128 bits of data is Not Very Short and never will be.

Even if you do distribute over a decently large key space you’ll want to cut it down for brevity purposes, but it’s hard (for me) to know how far you can go with that. After all, if you have just 32 links in your database then you could spread them out between /00 and /ff using only hex, and only 1 in 8 guesses would correspond to a working link, right?

I don’t know if 1 in 8 is a small enough hit rate, especially at the start when it’s clear to an attacker that your space has just 256 possible values.

Alphabets ^

Moving on from the space in which the keys exist, what should they actually look like?

Since I don’t yet know how big the key space will be, but do think it will have to start big and be cut down, maybe I will start by just looking at various ways of representing the full hash to see what kind of compression can be achieved.

I’ve kind of settled on the idea of database keys being UUIDv7. A UUIDv7 is 128 bits, although 6 of those bits are reserved for the version and variant fields. The top 48 bits are a millisecond timestamp, followed by the 4-bit version and 12 more random bits. Of the bottom 64 bits, 62 of them are used for essentially random data.

I’m thinking that these database keys will be private so it doesn’t matter that if you had one you could extract the submit time out of it (the top 48 bits). The purpose of having the first part of the key be time-based is to make them a bit easier on the database, providing some locality. 128 bits of key is massive overkill but I think it’s worth it for the support for UUIDv7 across multiple languages and applications.
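For reference, the RFC 9562 field layout can be sketched by packing a UUIDv7 into a u128 by hand. This is only an illustration of the bit layout, not necessarily how one should generate them in practice:

```rust
// Assemble a UUIDv7 from its fields: 48-bit ms timestamp, 4-bit version,
// 12 random bits, 2-bit variant, 62 random bits (RFC 9562 layout).
fn uuid_v7(unix_ms: u64, rand_a: u16, rand_b: u64) -> u128 {
    let ts  = ((unix_ms as u128) & 0xFFFF_FFFF_FFFF) << 80; // bits 127..80
    let ver = 0x7u128 << 76;                                // bits 79..76
    let ra  = ((rand_a & 0x0FFF) as u128) << 64;            // bits 75..64
    let var = 0b10u128 << 62;                               // bits 63..62
    let rb  = (rand_b & 0x3FFF_FFFF_FFFF_FFFF) as u128;     // bits 61..0
    ts | ver | ra | var | rb
}

fn main() {
    // Reconstructs the example UUID shown later in this article.
    let u = uuid_v7(0x018F_244B_942B, 0x007, 0x127B_ACE4_FADF_4A88);
    println!("{:032x}", u); // 018f244b942b7007927bace4fadf4a88
}
```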

As I say, I know I’m not going to use all of the 128 bits of the UUIDv7 to generate a short key but just to see what different representations would look like I will start with the whole thing.

Base64 ^

The typical answer to this sort of thing is Base64. A Base64 representation of 128 bits looks like this:

$ dd if=/dev/urandom bs=16 count=1 status=none | base64

The == at the end of the output are padding, and if the string doesn’t need to be decoded, i.e. it’s just being used as an identifier — as is the case here — then they can be omitted. So that’s a 22-character string.
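That length falls straight out of the arithmetic: at 6 bits per character, 128 bits needs ⌈128 / 6⌉ = 22 data characters, and the padding exists only to round the output up to a multiple of 4:

```rust
fn main() {
    let bits = 128u32;
    let data_chars = (bits + 5) / 6;             // ceil(128 / 6) = 22
    let padded_len = ((data_chars + 3) / 4) * 4; // Base64 pads to a multiple of 4
    println!("{} data chars + {} padding chars", data_chars, padded_len - data_chars);
}
```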

Base64URL ^

Base64 has a few issues when used as parts of URLs. Its alphabet contains ‘+’, ‘/’ and (when padding is included) ‘=’, all of which are difficult when included in a URL string.

Base64URL is a modified Base64 alphabet that uses ‘-‘ and ‘_’ instead and has no padding. Apart from being friendly for URLs it will also be 22 characters.

Base58 ^

There are additional problems with Base64 besides its URL-unfriendly alphabet. Some of it is also unfriendly to human eyesight. Its alphabet contains ‘1’, ‘l’, ‘O’ and ‘0’ which are easy to confuse with each other.

The Bitcoin developers came up with Base58 (but let’s not hold that against it…) in order to avoid these transcription problems. Although short links will primarily be copied, pasted and clicked on it does seem desirable to also be able to easily manually transcribe them. How much would we pay for that, in terms of key length?

A Base58 of 128 bits looks like this:


That happens to be 21 characters, which implies it is somehow shorter than Base64 despite the fact that Base58 has a smaller alphabet. How is that possible?

It’s because each character of Base58 encodes a fractional amount of data — 58 isn’t a power of 2 — so depending upon what data you put in sometimes it will need 22 characters and other times it only needs 21.

It can be quantified like this:

  • Base64 = log2 64 = 6 bits encoded per character.
  • Base58 = log2 58 = 5.857980995127572 bits per character.
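A divmod sketch over the Base58 alphabet shows the variable length directly. This treats the 128 bits as one big integer; a real implementation like Bitcoin’s also preserves leading zero bytes, which doesn’t matter for this illustration:

```rust
// The Base58 alphabet: 0-9A-Za-z minus the confusable 0, O, I and l.
const B58: &[u8] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

fn base58(mut n: u128) -> String {
    if n == 0 {
        return "1".into();
    }
    let mut out = Vec::new();
    while n > 0 {
        out.push(B58[(n % 58) as usize]);
        n /= 58;
    }
    out.reverse(); // most significant digit first
    String::from_utf8(out).unwrap()
}

fn main() {
    // Most 128-bit values need 22 characters, but some fit in 21.
    println!("{}", base58(u128::MAX).len());    // 22
    println!("{}", base58(1u128 << 122).len()); // 21
}
```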

It seems worth it to me. It’s very close.

How much to throw away ^

In order to help answer that question I wanted to visualise just how big various key spaces would be. Finally, I could no longer avoid writing some code!

I’d just watched Jeremy Chone’s video about UUIDs and Rust, so I knocked together this thing that explores UUIDv7 and Base58 representations of (bits of) it. This is the first Rust I have ever written so there’s no doubt lots of issues with it.

The output looks like this:

Full UUID:
  uuid v7 (36): 018f244b-942b-7007-927b-ace4fadf4a88
Base64URL (22): AY8kS5QrcAeSe6zk-t9KiA
   Base58 (21): CAfx7fLJ3YBDDvuwwEEPH
Base58 of bottom 64 bits:
              Hex bytes: [92, 7b, ac, e4, fa, df, 4a, 88]
Base58 encodes log2(58) = 5.857980995127572 bits per character
IDs from…   Max chars Base58          Can store
…bottom 64b 11        RW53EVp5FnF =   18,446,744,073,709,551,616 keys
…bottom 56b 10        5gqCeG4Uij  =       72,057,594,037,927,936 keys
…bottom 48b  9        2V6bFSkrT   =          281,474,976,710,656 keys
…bottom 40b  7        SqN8A3h     =            1,099,511,627,776 keys
…bottom 32b  6        7QvuWo      =                4,294,967,296 keys
…bottom 24b  5        2J14b       =                   16,777,216 keys
…bottom 16b  3        6fy         =                       65,536 keys

The idea here is that the bottom (right most) 64 bits of the UUIDv7 are used to make a short key, but only as many bytes of it as we decide we need.

So for example, if we decide we only need two bytes (16 bits) of random data then there’ll be 2^16 = 65,536 possible keys, which will encode into three characters of Base58 — all short links will be 3 characters for a while.

When using only a few bytes of the UUID there will of course be collisions. These will be rare so I don’t think it will be an issue to just generate another UUID. As the number of existing keys grows, more bytes can be used.

Using more bytes also controls how sparse the key space is.

For example, let’s say we decide that only 1 in 1,000 random guesses should hit upon an existing entry. The first 65 links can be just three characters in length. After that the key space has to increase to 4 characters. That gets us 4 × 5.857980995127572 = 23 and change bits of entropy, which is 2^23 = 8,388,608 keys. Once we get to 8,388 links in the database we have to go to 5 characters, which sees us through to 16,777 total keys.
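The character counts above (and in the earlier table) follow from the bits-per-character figure; as a sketch:

```rust
// Maximum Base58 characters needed to encode `bits` bits of key.
fn base58_chars(bits: u32) -> u32 {
    (f64::from(bits) / 58f64.log2()).ceil() as u32
}

fn main() {
    // Reproduces the "Max chars" column from the table above.
    for bytes in (2u32..=8).rev() {
        let bits = bytes * 8;
        println!("bottom {:2} bits -> {:2} chars, {} keys",
                 bits, base58_chars(bits), 1u128 << bits);
    }
}
```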

Wrap Up ^

Is that good enough? I don’t know. What do you think?

Ultimately you will not stop determined enumerators. They will use public clouds to request from a large set of sources and they won’t go sequentially.

People should not put links to sensitive-but-publicly-requestable data in link shorteners. People should not put sensitive data anywhere that can be accessed without authentication. Some people will sometimes put sensitive data in places where it can be accessed. I think it’s still worth trying to protect them.

Aside ^

I have some customers who run personal link shorteners that they keep open to the public (i.e. anyone can submit a link), and these constantly get used for linking to malicious content. People link to phishing pages and malware and then put the shortlink into their spam emails so that URL-based antispam is confused. It is a constant source of administrative burden.

If I ever get a minimum viable product it will not allow public link submission.

For file integrity testing, you’re wasting your time with md5

Every time I go to test file integrity — e.g. are these two files the same? Is this file the same as a backup copy of this file? — muscle memory makes me type md5sum. Then my brain reprimands me:

Wait! md5 is insecure! It’s broken! Use SHA256!

Hands are wrong and brain is wrong. Just another day in the computer mines.

Well, if it was a secure hash function you were looking for, where someone might tamper with these files, then brain is not so wrong: MD5 has long been known to be too weak and is trivially subject to collision attacks.

But for file integrity on trusted data, like where you are checking for bitrot, cosmic rays or just everyday changes, you don’t need a cryptographically secure hash function. md5sum is safe enough for this, but in terms of performance it sucks. There have been better hash functions around, and packaged in major operating systems, for years. Such as xxHash!

Maybe like me you reach for md5sum because…

  • You always have!
  • It’s right there!
  • It’s pretty fast though right?

On Debian, xxhash is right there after you have typed:

$ sudo apt install xxhash

Here’s me hashing the first 1GiB of one of my desktop machine’s NVMe drives.

Hash Function   CPU seconds (user+kernel)   %CPU
XXH128          0.21                        10
XXH64           0.21                        11
MD5             1.38                        56
SHA1            1.72                        62
SHA512          2.36                        70
SHA256          3.76                        80

I think this scenario was a good test as NVMe drives are really fast, so it focuses on the cost of the algorithm rather than the IO. But if you want to see similar numbers for slow storage, here is me doing the same by reading 10GiB off a pair of 7,200RPM SATA drives:

Hash Function   CPU seconds (user+kernel)   %CPU
XXH128           2.44                        5
XXH64            4.76                       10
MD5             16.62                       35
SHA1            18.00                       38
SHA512          23.74                       51
SHA256          35.99                       69
$ for sum in md5 sha1 sha256 sha512 xxh64 xxh128; do \
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'; \
printf "# %ssum\n" "$sum"; \
sudo dd if=/dev/sda bs=1M count=1024 status=none \
| /usr/bin/time -f 'CPU time %Us (user), %Ss (kernel); %P total CPU' "${sum}sum"; \
done
# md5sum
c5515c49de5116184a980a51c7783d9f  -
CPU time 1.28s (user), 0.10s (kernel); 56% total CPU
# sha1sum
60ecdfefb6d95338067b52118d2c7144b9dc2d63  -
CPU time 1.62s (user), 0.10s (kernel); 62% total CPU
# sha256sum
7fbffa1d96ae2232aa754111597634e37e5fd9b28ec692fb6deff2d020cb5bce  -
CPU time 3.68s (user), 0.08s (kernel); 80% total CPU
# sha512sum
eb4bffafc0dbdf523cc5229ba379c08916f0d25e762b60b2f52597acb040057a4b6795aa10dd098929bde61cffc7a7de1ed38fc53d5bd9e194e3a84b90fd9a21  -
CPU time 2.29s (user), 0.07s (kernel); 70% total CPU
# xxh64sum
d43417824bd6ef3a  stdin
CPU time 0.06s (user), 0.15s (kernel); 11% total CPU
# xxh128sum
e339b1c3c5c1e44db741a2e08e76fe66  stdin
CPU time 0.02s (user), 0.19s (kernel); 10% total CPU

Just use xxhsum!

I don’t think the cheapest APC Back-UPS units can be monitored except in Windows

TL;DR: Despite otherwise seeming to work correctly, I can’t monitor a Back-UPS BX1600MI in Linux without seeing a constant stream of spurious battery detach/reattach and power fail/restore events that last less than 2 seconds each. I’ve tried multiple computers and multiple UPSes of that model. It doesn’t happen in their own proprietary Windows software, so I think they’ve changed the protocol.

Apart from nearly two decades ago when I was given one for free, I’ve never bothered with a UPS at home. Our power grid is very reliable. Looking at availability information from “uptimed“, my home file server has been powered on for 99.97% of the time in the last 14 years. That includes time spent moving house and a day when the house power was off for several hours while the kitchen was refitted!

However, in December 2023 a fault with our electric oven popped the breaker for the sockets causing everything to be harshly powered off. My fileserver took it badly and one drive died. That wasn’t a huge issue as it has a redundant filesystem, but I didn’t like it.

I decided I could afford to treat myself to a relatively cheap UPS.

I did some research and read some reviews of the APC Back-UPS range, their cheapest offering. Many people were dismissive, calling them cheap pieces of crap with flimsy plastic construction and batteries that are not regarded as user-replaceable. But there was no indication that such a model would not work, and I found it hard to justify paying a lot here.

I found YouTube videos of the procedure that a technician would go through to replace the battery in 3 to 5 years. To do it yourself voids your warranty, but your warranty is done after 3 years anyway. It looked pretty doable even for a hardware-avoidant person like myself.

It’s important to me that the UPS can be monitored by a Linux computer. The entire point here is that the computer detects when the battery is near to exhausted and gracefully powers itself down. There are two main options on Linux for this: apcupsd and Network UPS Tools (“nut“).

Looking at the Back-UPS BX1600MI model, it has a USB port for monitoring and says it can be monitored with APC’s own Powerchute Serial Shutdown Windows software. There’s an entry in nut‘s hardware compatibility list for “Back-UPS (USB)” of “supported, based on publicly available protocol”. I made the order.

The UPS worked as expected in terms of being an uninterruptible power supply. It was hopeless trying to talk to it with nut though. nut just kept saying it was losing communications.

I tried apcupsd instead. This stayed connected, but it showed a continuous stream of battery detach/reattach and power fail/restore events each lasting less than 2 seconds. Normally on a power fail you’d expect a visual and audible alert on the UPS itself and I wasn’t getting any of that, but I don’t know if that’s because they were real events that were just too brief.

I contacted APC support but they were very quick to tell me that they did not support any other software but their own Windows-only Powerchute Serial Shutdown (PCSS).

I then asked about this on the apcupsd mailing list. The first response:

“Something’s wrong with your UPS, most likely the battery is bad, but since you say the UPS is brand new, just get it replaced.”

As this thing was brand new I wasn’t going to go through a warranty claim with APC. I just contacted the vendor and told them I thought it was faulty and I wanted to return it. They actually offered to send me another one in advance and me send back the one I had, so I went for that.

In the mean time I found time to install Windows 10 in a virtual machine and pass through USB to it. Guess what? No spurious events in PCSS on Windows. It detected expected events when I yanked the power etc. I had no evidence that the UPS was in any way faulty. You can probably see what is coming.

The replacement UPS (of the same model) behaved exactly the same: spurious events. This just seems to be what the APC Back-UPS does on non-Windows.

Returning to my thread on the apcupsd mailing list, I asked again if there was actually anyone out there who had one of these working with non-Windows. The only substantive response I’ve got so far is:

“BX are the El Cheapo plastic craps, worst of all, not even the BExx0 family is such a crap – Schneider’s direct response to all the chinese craps flooding the markets […] no sane person would buy these things, but, well, here we are.”

So as far as I am aware, the Back-UPS models cannot currently be monitored from non-Windows. That will have to be my working theory unless someone who has it working with non-Windows contacts me to let me know I am wrong, which I would be interested to know about. I feel like I’ve done all that I can to find such people, by asking on the mailing list for the software that is meant for monitoring APC UPSes on Unix.

After talking all this over with the vendor they’ve recommended a Riello NPW 1.5kVA which is listed as fully supported by nut. They are taking the APC units back for a full refund; the Riello is about £30 more expensive.

Farewell Soekris, old friend

This morning I shut off the Soekris Engineering net4801 that has served as our home firewall / PPP termination box for just over 18½ years.

Front view of a Soekris net4801. Clothes peg for scale.
Inside of a Soekris net4801.

In truth this has been long overdue. Like, at least 10 years overdue. It has been struggling to cope with even our paltry ~60Mbps VDSL (what the UK calls Fibre to the Cabinet). But I am very lazy, and change is work.

In theory we can get fibre from Openreach to approach 1Gbit/s down, and I should sort that out, but see above about me being really very lazy. The poor old Soekris would certainly not be viable then.

I’ve replaced it with a PC Engines APU2 (the apu2e2 model). Much like the Soekris it’s a fanless single board x86 computer with coreboot firmware so it’s manageable from the BIOS over serial.

          Soekris net4801              PC Engines apu2e2
CPU       1 core @266MHz,              4 cores @1GHz (turbo 1.4GHz),
          x86 (32-bit)                 amd64 (64-bit)
Memory    128MiB                       2048MiB
Storage   512MiB CompactFlash          16GiB mSATA SSD
Ports     3x 100M Ethernet, 1 serial   3x 1G Ethernet, 1 serial

The Soekris ran Debian and so does the APU2. Installing it over PXE was completely straightforward on the APU2; a bit simpler than it was with the net4801 back in 2005! If you have just one and it’s right there in the same building then it’s probably quicker to just boot the Debian installer off of USB though. I may be lazy but once I do get going I’m also pointlessly bloody-minded.

Anyway, completely stock Debian works fine, though obviously it has no display whatsoever — all non-Ethernet-based interaction would have to be done over serial. By default that runs at 115200 baud (8n1).

This is not “home server” material. Like the Soekris was even in 2005, it’s weak and it’s expensive for what it is. It’s meant to be an appliance. I think I was proved right about the Soekris’s endurance, beyond even sensible limits, and I hope I will be right about the APU2.

The Soekris is still using its original 512M CompactFlash card from 2005 by the way. Although admittedly I did go to some effort to make it run on a read-only filesystem, only flipped to read-write for upgrades.

ncmpcpp — A Modern(ish) Text-Based Music Setup On Linux

Preface ^

This article is about how I’ve ended up (back) on the terminal-based music player ncmpcpp on my GNOME Linux desktop and laptop. I’ll cover why it is that this has happened, and some of the finer points of the configuration. The various scripts are available at GitHub. My thing now looks like this:

A screenshot of my ncmpcpp setup running in a kitty terminal, with a track change notification visible in the top right corner

These sorts of things are inherently personal. I don’t expect that most people would have my requirements — the lack of functioning software that caters for them must indicate that — but if you do, or if you’re just interested in seeing what a modern text interface player can do on Linux, maybe you will be interested in what I came up with.

My Requirements ^

I’m one of those strange old-fashioned people who likes owning the music I regularly play, instead of just streaming everything, always. I don’t mind doing a stream search to play something on a whim or to check out new music, but if I think I’ll want to listen to it again then I want to own a copy of it. So I also need something to play music with.

I thought I had simple requirements.

Essential ^

  • Fill a play queue randomly by album, i.e. queue entire albums at once until some target number of tracks are in the queue. The sort of thing that’s often called a “dynamic playlist” or a “smart playlist” these days.
  • Have working media keys, i.e. when I press the Play/Pause button or the Next button on my keyboard, that actually happens.

That’s it. Those are my essential requirements.

Nice to have ^

  • Have album cover art displayed.
  • Have desktop notifications show up announcing a new track being played.

Ancient history ^

Literally decades ago these needs were met by the likes of Winamp and Amarok; software that’s now consigned to history. Still more than a decade ago on desktop Linux I looked around and couldn’t easily find what I wanted from any of the music apps. I settled on putting my music in mpd and using an mpd client to play it, because that way it was fairly easy to write a script for a dynamic play queue that worked exactly how I wanted it to — the most important requirement.

For a while I used a terminal-based mpd client called ncmpcpp. I’m very comfortable in a Linux terminal so this wasn’t alien to me. It’s very pleasant to use, but being text-based it doesn’t come with the niceties of media key support, album cover art or desktop notifications. The mpd client that I settled upon was GNOME’s built-in gmpc. It’s a very basic player but all it had to do was show the play queue that mpd had provided, and do the media keys, album art and notifications.

Change Is Forced Upon Me ^

Fast forward to December 2023 and I found myself desperately needing to upgrade my Ubuntu 18.04 desktop machine. I switched to Debian 12, which brought with it a new major version of GNOME as well as using Wayland instead of Xorg. And I found that gmpc didn’t work correctly any more! The media keys weren’t doing anything (they work fine in everything else), and I didn’t like the notifications.

I checked out a wide range of media players again. I’m talking Rhythmbox, Clementine, Strawberry, Quod Libet and more. Some of them clearly didn’t do the play queue thing. Others might do, but were incomprehensible to me and lacking in documentation. I think the nearest might have been Rhythmbox, which has a plugin that can queue a specified number of random albums. There is an 11-year-old GitHub issue asking for it to just continually queue such albums. A bit clunky without that.

I expect some reading this are now shouting at their screens about how their favourite player does actually do what I want. It’s quite possible I was too ignorant to notice it or work out how. Did I mention that quite a lot of this software is not documented at all? Seriously, major pieces of software that just have a web site that is a set of screenshots and a bulleted feature list and …that’s it. I had complained about this on Fedi and got some suggestions for things to try, which I will (and I’ll check out any that are suggested here), but the thing is… I know how shell scripts work now. 😀

This Is The Way ^

I had a look at ncmpcpp again. I still enjoyed using it. I was able to see how I could get the niceties after all. This is how.

Required Software ^

Here’s the software I needed to install to make this work on Debian 12. I’m not going to particularly go into the configuration of Debian, GNOME, mpd or ncmpcpp because it doesn’t really matter how you set those up. Just first get to the point where your music is in mpd and you can start ncmpcpp to play it.

Packaged in Debian ^

  • mpd
  • mpc
  • ncmpcpp
  • kitty
  • timg
  • libnotify-bin
  • inotify-tools


$ apt install mpd mpc ncmpcpp kitty timg libnotify-bin inotify-tools

In case you weren’t aware, you can arrange for your personal mpd to be started every time you start your desktop environment like this:

$ systemctl --user enable --now mpd

The --now flag both enables the service and starts it right away.

At this point you should have mpd running and serving your music collection to any mpd client that connects. You can verify this with gmpc which is a very simple graphical mpd client.

Not currently packaged in Debian ^

mpd-mpris ^

This small Go binary listens on the user DBUS for the media keys and issues mpd commands appropriately. If you didn’t want to use this then you could lash up something very simple that executes e.g. “mpc next” or “mpc toggle” when the relevant key is pressed, but this does it all for you. Once you’ve got it from GitHub place the binary in $HOME/bin/, the mpd-mpris.service file from my GitHub at $HOME/.config/systemd/user/mpd-mpris.service and issue:

$ systemctl --user enable --now mpd-mpris

Assuming you have a running mpd and mpd client your media keys should now control it. Test that with gmpc or whatever.

My scripts and supporting files ^

Just four files, and they are all in GitHub. Here’s what to do with them.

Put it in $HOME/.ncmpcpp/. It shouldn’t need editing.

default_cover.jpg ^

Put it in $HOME/.ncmpcpp/. If you don’t like it, just substitute it with any other image you like. When it comes time for timg to display it, it will scale it to fit inside the window, whatever size it is on your desktop.

Put it in $HOME/.ncmpcpp/. You’ll need to change candidate_name near the top if your album cover art files aren’t called cover.jpg.

viz.conf ^

Put it in $HOME/.ncmpcpp/. This is a cut-down example ncmpcpp config for the visualizer pane that removes a number of UI elements. It’s just for an ncmpcpp that starts on a visualizer view so feel free to customise it however you like your visualizer to be. You will need to change mpd_music_dir to match where your music is, like in your main ncmpcpp config.

The Main App ^

The main app displayed in the screenshot above is a kitty terminal with three windows. The leftmost 75% of the kitty terminal runs ncmpcpp defaulting to the playlist view. In the bottom right corner is a copy of ncmpcpp defaulting to the visualizer view and using the viz.conf. The top right corner is running a shell script that polls for album cover art and displays it in the terminal.

kitty is one of the newer crop of terminals that can display graphics. The timg program will detect kitty‘s graphics support and display a proper graphical image. In the absence of kitty‘s graphical protocol timg will fall back to sixel mode, which may be discernible but I wouldn’t personally want to use it.

I don’t actually use kitty as my day-to-day terminal. I use gnome-terminal and tmux. You can make a layout like this with gnome-terminal and tmux, or even kitty and tmux, but tmux doesn’t support kitty‘s graphical protocol so it would cause a fall back to sixel mode. So for this use and this use alone I use kitty and its built-in windowing support.

If you don’t want to use kitty then pick whatever terminal you like and figure out how to put some different windows in it (tmux panes work fine, layout-wise). timg will probably fall back to sixels as even the venerable xterm supports that. But assuming you are willing to use kitty, you can start it like this:

$ kitty -o font_size=16 --session ~/.config/kitty/ncmpcpp.session

That kitty session file is in GitHub with everything else, and it’s what lays things out in the main terminal window. You should now be able to start playing music in ncmpcpp and have everything work.

How Stuff Works ^

You don’t need to know how it works, but in case you care I will explain a bit.

There are two bash shell scripts.

Album cover art ^

The cover art script uses inotifywait from the inotify-tools package to watch a file in a cache directory. Any time that file changes, it uses timg to display it in the upper right window and queries mpd for the metadata of the currently-playing track.

Track change tasks ^

The track change script is a bit more involved.

ncmpcpp is made to execute it when it changes track by adding this to your ncmpcpp configuration:

execute_on_song_change = "~/.ncmpcpp/ -m /path/to/your/music/dir"

The /path/to/your/music/dir should be the same as what you have set your music library to in your MPD config. It defaults to $HOME/Music/ if not set.

First it asks mpd for a bunch of metadata about the currently-playing track. Using that, it’s able to find the directory in the filesystem where the track file lives. It assumes that if album cover art is available then it will be in this directory and named cover.jpg. If it finds such a file then it copies it to the place where the cover art script is expecting to find it, which triggers that script’s inotifywait and displays the new image. If it doesn’t find such a file then a default generic cover art image is used.

(A consequence of this is that it expects each directory in your music library to be for an album, with cover.jpg being the album cover art. It intentionally doesn’t try to handle layouts like Artist/Track.ogg because it hasn’t got a way to know which file would be for that album. If you use some other layout I’d be interested in hearing about it. An obvious improvement would be to have it look inside each file’s metadata for art in the absence of a cover.jpg in the directory. That would be pretty easy, but it’s not relevant for my use at the moment.)
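In shell terms the cover art lookup amounts to something like this sketch. It is not the actual script; the function name and the fallback filename are my own inventions for illustration.

```shell
# Given the track path mpd reports (relative to the music library root)
# and the library root itself, print the cover art file that would be
# displayed. "default-cover.jpg" stands in for the generic fallback.
find_cover() {
    local track=$1 music_dir=$2
    local dir="$music_dir/$(dirname "$track")"
    if [ -f "$dir/cover.jpg" ]; then
        printf '%s\n' "$dir/cover.jpg"
    else
        printf '%s\n' "default-cover.jpg"
    fi
}
```

The real script then copies the winning file into the watched cache location rather than printing it.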

Secondly, a desktop notification is sent using notify-send. Most modern desktops, including GNOME, come with support for showing such notifications. Exactly how they look and the degree to which you can configure that depends upon your desktop environment. For GNOME, the answers are “like ass” and “not at all without changing the notification daemon”, but that’s the case for every notification on the system so it’s a bit out of scope for this article.

Other Useful Tools ^

I use a few other bits of software to help manage my music collection and play things nicely, that aren’t directly relevant to this.

Library maintenance ^

A good experience relies on there being correct metadata and files in the expected directory structure. It’s pretty common for music I buy to have junk metadata, and moving things into place would be tedious even when the metadata is correct. MusicBrainz Picard to the rescue!

It’s great at fixing metadata and then moving files en masse to my chosen directory structure. It can even be told, for example, that if the current track artist differs from the album artist then it should save the file out to “${album_artist}/${track_number}-${track_artist}-${track_title}.mp3”, so that a directory listing of a large “Various Artists” compilation album looks nice.

It also finds and saves album cover art for me.

It’s packaged in Debian.

I hear good things about beets, too, but have never tried it.

Album cover art ^

Picard is pretty good at finding album cover art but sometimes it can’t manage it, or it chooses sub-par images. I like the Python app sacad which tries really hard to find good quality album art and works on masses of directories at once.

Nicer desktop notifications ^

I really don’t like the default GNOME desktop notifications. On a 4K display they are tiny unless you crank up the general font size, in which case your entire desktop then looks like a toddler’s toy. Not only is their text tiny but they don’t hold much content either. When most track title notifications are ellipsized I start to wonder what the point is.

I replaced GNOME’s notification daemon with wired-notify, which is extremely configurable. I did have to clone it from GitHub, install the Rust toolchain and cargo build it, however.

My track change script that I talk about above will issue notifications that work on stock GNOME just as well as any other app’s notifications, but I prefer the wired-notify ones. Here’s an unscaled example.

A close up of a notification from wired-notify

It’s not a work of art by any means, but is so much better than the default experience. There’s a bunch of other people’s configs showcased on their GitHub.

Scrobbling ^

mpdscribble has got you covered for scrobbling. Again, it is already packaged in Debian.

Shortcomings ^

If there are any music files with tabs or newlines in any of their metadata, the scripts are going to blow up. I’m not sure of the best way to handle that one. mpc can’t NUL-separate its output like you can with GNU find, and I don’t think there is any character you can make it use in a format that is banned in metadata. I think the worst case is simply a messed up display and/or no cover art displayed, and I’d regard tabs and newlines in track metadata as a data error that I’d want to fix anyway, so maybe I don’t care too much.

timg is supposed to scale and centre the image in the terminal, and the kitty window does resize to keep it at 25% width, 50% height, but timg is sometimes visibly a little off-centre. No ideas at the moment how to improve that.

mpd is a networked application — while by default it listens only on localhost, you can configure it to listen on any or all interfaces and be available over your local network or even the Internet. All of these scripts rely on your mpd client, in this case ncmpcpp, having direct access to the music library and its files, which is probably not going to be the case for a non-localhost mpd server. I can think of various tricky ways to handle this, but again it’s not relevant to my situation at present.

Mutt wins again – subject munging on display

TL;DR: ^

You can munge subjects for display only using the subjectrx Mutt configuration directive.

The Setup ^

I use the terminal-based email reader Mutt.

Many projects that I follow are switching away from email discussion lists in favour of web-first interfaces (“forums”, I think the youngsters are calling them now) like Discourse. That is fine—there’s lots of problems with trying to run a busy community over email—but Discourse offers a “mailing list mode” and I still find my Mutt email client to be a comfortable way to follow discussions. So all my accounts on the various Discourse instances are set to mailing list mode.

The Problem ^

One of the slight issues I have with this is the subject lines that Discourse uses. On an instance with a lot of categories and sub-categories, these will all be prepended to the subject line of each email using up quite a lot of screen space.

The same is true for legacy mailing list subject tags, but in that environment the admins were generally conscious that whatever text they chose would be prepended to every subject, so they tend to choose terse tags like “[users]” for example.

There was a time when subject line tags were controversial amongst experienced email users, because experienced email users know how to sort and filter their mails based on headers and don’t need a tag in the subject line to let them know what an email is. It doesn’t seem to be very controversial any more; I hypothesise that’s because new Internet users don’t use email as much and so don’t value spending much time working out how to get their filtering just right, and so on. So, most legacy mailing lists that I’m part of now do use terse subject tags and not many people complain about that.

Since the posts on Discourse are primarily intended for a web browser, the verbosity of the categories is not an issue. It’s not uncommon to see a category called, say, “Help & Support” and then within that a sub-category for a particular project, e.g. “Footronic 5.x”. When Discourse sends out an email for a post to such a category, it’ll look like this:

Subject: [Help & Support] [Footronic 5.x] Need some help getting my Foo into alignment after passing through a bar-quux transform

Lots of space used by that prefix, on every message, and pointlessly so for me since these mails will have been filtered into a folder so I always know which folder I’m looking at anyway: all of the messages in that folder would be for help and support on Footronic 5.x. Like most email clients, Mutt has an index view that shows an overview of all the emails with a single line for each. Long subjects are truncated at the edge of my terminal.

I’ve put up with this for years now but the last straw was the newly-launched Ansible forum. Their category names are large and there’s lots of sub-categories. Here’s an example of what that looks like in my 95 character wide terminal.

The index view of a Mutt email client

This is quite irritating! I wondered if it could be fixed.

The Fix ^

Of course the Mutt community thought of this, and years ago. subjectrx! You put it in your config, specifying a regular expression to match and what it should be replaced with. For example:

subjectrx '\[Ansible\] \[[^]]*\] *' '%L%R'

That matches any sequence of “[Ansible] ” followed by another set of “[]” that have anything inside them, and replaces all of that with the left side of the match (%L) and the right side of the match (%R). So that effectively discards the tags.
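You can preview what the regular expression will do from the shell with sed, since the pattern syntax is close enough for this case. The sample subject here is made up; this is just a demonstration, not part of Mutt.

```shell
# The same ERE applied with sed shows what subjectrx would display.
echo '[Ansible] [Get Help] Need some help' \
    | sed -E 's/\[Ansible\] \[[^]]*\] *//'
# → Need some help
```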

This happens only on display; it doesn’t modify the actual emails on storage.

Here’s what that looks like afterwards:

The index view of a Mutt email client, with tags removed from subject lines

Much better, right?

And that’s one of the reasons why I continue to use Mutt.

Other Solutions ^

Off the top of my head, there are some other ways this could have been done.

Alter emails upon delivery ^

It would have been pretty simple to strip these tags out of emails as they were being delivered, but I really like to keep emails on storage the same as they were when they arrived. At the very least doing this will cause a DKIM failure as I would have modified the message after it was signed. That wouldn’t be an issue for my delivery since my DKIM check would happen before any such editing, but I’d still rather not.

Run the subject lines through an external filter program ^

The format of many things in Mutt is highly configurable and one such format is index_format, which controls how the lines on the index view are displayed.

Sadly there is not a builtin format specifier to search and replace in the subject tag (or any other tag), but you can run the whole thing through an external program, which could do anything you liked to it. That would involve fork()ing and exec()ing a process for every single mail in a mailbox though. Yuck.

On Discourse ^

This is not a gripe about Discourse. I think Discourse is a better way to run a busy community than email lists. At this point I’d be happy for most mailing lists I’m part of to switch to Discourse instances, especially the very busy ones. I’m impressed with the amount of work and features that Discourse now has.

The only exception to that I think is that purely question-answer support mailing lists might be better off with a StackOverflow-style approach like AskUbuntu. But failing that, I think Discourse is still many times better than a mailing list for that use case.

Not that you asked, but I think the primary problem with email as a community platform is that only old people use email. In the 21st century it’s an unacceptable barrier to entry.

The next most serious problem with email for running a community is that any decently-sized community will have a certain percentage of utter numpties; these utter numpties won’t be self-aware enough to know they are utter numpties, and they will post a lot of nonsense. The only way to counter a numpty posting nonsense is to reply to it and call them out. That is exhausting, unrewarding work, which frequently goes wrong, adding to the noise and ill-feeling. Problem posters do not get dealt with until they reach a level bad enough to warrant their posting rights being removed. Forums like Discourse scale their moderation tasks much better, with a lot of it being amenable to wide community input.

I could go on to list a lot more serious problems but those two are the worst in my opinion.

Happy birthday, /dev/sdd?

One of my hard drives reaches 120,000 hours of operation in about a month:

$ ~/src/blkleaderboard/
     sdd 119195 hours (13.59 years) 0.29TiB ST3320620AS
     sdb 114560 hours (13.06 years) 0.29TiB ST3320620AS
     sda 113030 hours (12.89 years) 0.29TiB ST3320620AS
     sdk  76904 hours ( 8.77 years) 2.73TiB WDC WD30EZRX-00D
     sdh  66018 hours ( 7.53 years) 0.91TiB Hitachi HUA72201
     sde  45746 hours ( 5.21 years) 0.91TiB SanDisk SDSSDH31
     sdc  39179 hours ( 4.46 years) 0.29TiB ST3320418AS
     sdf  28758 hours ( 3.28 years) 1.82TiB Samsung SSD 860
     sdj  28637 hours ( 3.26 years) 1.75TiB KINGSTON SUV5001
     sdg  23067 hours ( 2.63 years) 1.75TiB KINGSTON SUV5001
     sdi   9596 hours ( 1.09 years) 0.45TiB ST500DM002-1BD14

It’s a 320GB Seagate Barracuda 7200.10.

The machine these are in is a fileserver at my home. The four 320GB HDDs are what the operating system is installed on, whereas the hodge podge assortment of differently-sized HDDs and SSDs are the main storage for files.

That is not the most performant way to do things, but it’s only at home and doesn’t need best performance. It mostly just uses up discarded storage from other machines as they get replaced.

sdd has seen every release of Debian since 4.0 (etch) and several iterations of hardware, but this can’t go on much longer. The machine that the four 320GB HDDs are in now is very underpowered but any replacement I can think of won’t be needing four 3.5″ SATA devices inside it. More like 2x 2.5″ NVMe or M.2.

Then again, I’ve been saying that it must be replaced for about 5 years now, so who knows how long it will take me. sdd will definitely reach 120,000 hours in the next month, barring hardware failure. blkleaderboard is on GitHub, by the way.
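The core idea, asking SMART for each disk’s power-on hours and ranking them, can be sketched like this. The helper names and the smartctl parsing are my guesses at the approach, not necessarily what the real script does.

```shell
# Pull the raw Power_On_Hours value out of "smartctl -A" output.
power_on_hours() {
    awk '$2 == "Power_On_Hours" { print $10 }'
}

# Rank every SATA disk by power-on hours, most-travelled first.
# smartctl needs root; this loop is illustrative only.
leaderboard() {
    for dev in /dev/sd?; do
        hours=$(smartctl -A "$dev" | power_on_hours)
        printf '%8s %6s hours\n' "${dev#/dev/}" "${hours:-?}"
    done | sort -k2 -rn
}
```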

Exim: Adding the Autonomous System Number as a header in received emails

Updates ^

2022-11-05 ^

  • Added a bit about timeouts, as concern was expressed that I am “bonkers”.

The Problem ^

For statistical purposes I wanted to add the Autonomous System Number (ASN) for the IP address of the connecting host as a header in the received email, like this:

X-ASN: AS63949 2a01:7e01::/32

The Answer ^

You can obtain this information through a DNS query to Team Cymru:

$ sipcalc -r 2a01:7e01::f03c:92ff:fe32:a408
-[ipv6 : 2a01:7e01::f03c:92ff:fe32:a408] - 0

[IPV6 DNS]
Reverse DNS (ip6.arpa)  -
        8.0.4.a.2.3.e.f.f.f.2.9.c.3.0.f.0.0.0.0.0.0.0.0.1.0.e.7.1.0.a.2.ip6.arpa.
$ dig +short -t txt 8.0.4.a.2.3.e.f.f.f.2.9.c.3.0.f.0.0.0.0.0.0.0.0.1.0.e.7.1.0.a.2.origin6.asn.cymru.com
"63949 | 2a01:7e01::/32 | US | ripencc | 2011-02-01"

Or for legacy Internet addresses:

$ dig +noall +question -x
;   IN      PTR
$ dig +short -t txt
"13414 | | US | arin | 2010-11-23"

So for IPv6 addresses the process is:

  1. Expand the address out fully (2a01:7e01::f03c:92ff:fe32:a408 → 2a01:7e01:0000:0000:f03c:92ff:fe32:a408)
  2. Remove the colons (2a01:7e01:0000:0000:f03c:92ff:fe32:a408 → 2a017e0100000000f03c92fffe32a408)
  3. Reverse it (2a017e0100000000f03c92fffe32a408 → 804a23efff29c30f0000000010e710a2)
  4. Add a dot after every hexadecimal digit (804a23efff29c30f0000000010e710a2 → 8.0.4.a.2.3.e.f.f.f.2.9.c.3.0.f.0.0.0.0.0.0.0.0.1.0.e.7.1.0.a.2)
  5. Add origin6.asn.cymru.com on the end (8.0.4.a.2.3.e.f.f.f.2.9.c.3.0.f.0.0.0.0.0.0.0.0.1.0.e.7.1.0.a.2.origin6.asn.cymru.com)
  6. Query that TXT record and parse out the first two values separated by ‘|’ in the response.

For legacy IP addresses the process is much simpler: reverse the octets, add origin.asn.cymru.com on the end and query that.
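Both processes can be sketched in bash. The helper functions below are my own illustration of the steps above, not part of any real tool.

```shell
# Build the DNS name to query for Team Cymru's IP-to-ASN service.

# IPv6: expand fully, strip colons, reverse the nibbles, dot-separate
# them and append origin6.asn.cymru.com (steps 1-5 above).
asn_query_name_v6() {
    local addr=$1 expanded="" h
    local -a groups l r
    if [[ $addr == *::* ]]; then
        # Split on "::" and pad the middle out with zero groups.
        IFS=: read -ra l <<< "${addr%%::*}"
        IFS=: read -ra r <<< "${addr#*::}"
        local missing=$(( 8 - ${#l[@]} - ${#r[@]} )) i
        groups=( "${l[@]}" )
        for (( i = 0; i < missing; i++ )); do groups+=( 0 ); done
        groups+=( "${r[@]}" )
    else
        IFS=: read -ra groups <<< "$addr"
    fi
    # Zero-pad each group to 4 hex digits and concatenate.
    for h in "${groups[@]}"; do
        expanded+=$(printf '%04x' "0x$h")
    done
    # Reverse the 32 nibbles, dot-separate them, append the suffix.
    printf '%sorigin6.asn.cymru.com\n' "$(rev <<< "$expanded" | sed 's/./&./g')"
}

# Legacy IP: just reverse the octets (much simpler).
asn_query_name_v4() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    printf '%s.%s.%s.%s.origin.asn.cymru.com\n' "$d" "$c" "$b" "$a"
}

asn_query_name_v6 2a01:7e01::f03c:92ff:fe32:a408
# → 8.0.4.a.2.3.e.f.f.f.2.9.c.3.0.f.0.0.0.0.0.0.0.0.1.0.e.7.1.0.a.2.origin6.asn.cymru.com
asn_query_name_v4 192.0.2.1
# → 1.2.0.192.origin.asn.cymru.com
```

To finish the job you’d pass the resulting name to dig +short -t txt and parse the response.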

An Exim Answer ^

In Exim configuration you can do it like this:

(This is meant to go inside an ACL like your check_rcpt or check_data. Maybe near the end of check_data at the point where you’ve already decided to accept the email. No point in doing this for an email you will reject.)

# Add X-ASN: header for IPv6 senders.
  warn message = X-ASN: AS${sg{${extract{1}{|}{$acl_m9}}}{\N\s+\N}{}} ${sg{${extract{2}{|}{$acl_m9}}}{\N\s+\N}{}}
     condition = ${if isip6{$sender_host_address}}
    set acl_m9 = ${lookup dnsdb{txt=${reverse_ip:$sender_host_address}.origin6.asn.cymru.com}}
# Add X-ASN: header for legacy IP senders.
  warn message = X-ASN: AS${sg{${extract{1}{|}{$acl_m9}}}{\N\s+\N}{}} ${sg{${extract{2}{|}{$acl_m9}}}{\N\s+\N}{}}
     condition = ${if isip4{$sender_host_address}}
    set acl_m9 = ${lookup dnsdb{txt=${reverse_ip:$sender_host_address}.origin.asn.cymru.com}}

I dislike that I’ve had to use two tests that are almost exactly the same except that they query slightly different DNS names (origin.asn.cymru.com vs origin6.asn.cymru.com). I’m sure it could be done in one, but I’m not good enough with the Exim string evaluations; they send me cross-eyed. I couldn’t find a better way, so I decided to use the time-honoured tactic of posting what I have in order to provoke people into correcting me. Please let me know if you can improve it!

The amount of nested {} will probably drive you mad, but basically:

  • ${reverse_ip:$sender_host_address} handles expanding and reversing an IP address into the form you would use for a reverse DNS query.
  • That gets queried in DNS with the correct suffix and the full response stored in $acl_m9.
  • warn message = X-ASN: adds a header to the email, the content of which is built from two fields extracted out of $acl_m9 with all whitespace removed (${sg{source}{regex}{replacement}}).
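The extraction in that last bullet can be illustrated outside Exim. Here is a shell equivalent of turning Team Cymru’s TXT response into the header value; the variable names are mine.

```shell
# A sample Team Cymru TXT response for an IP-to-ASN lookup.
resp='63949 | 2a01:7e01::/32 | US | ripencc | 2011-02-01'

# Take fields 1 (ASN) and 2 (prefix), stripping all whitespace,
# just as the ${extract}/${sg} pair does in the Exim config above.
asn=$(printf '%s' "$resp" | cut -d'|' -f1 | tr -d '[:space:]')
prefix=$(printf '%s' "$resp" | cut -d'|' -f2 | tr -d '[:space:]')
printf 'X-ASN: AS%s %s\n' "$asn" "$prefix"
# → X-ASN: AS63949 2a01:7e01::/32
```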

What about timeouts? ^

One piece of feedback I got was that I am “bonkers” to make my email delivery rely on a real time network lookup. I can kind of see the argument, but also not: this is a DNS query exactly the same as a typical DNSBL query (Team Cymru IP-to-ASN service is used exactly like a typical DNSBL).

Most people’s mail servers do multiple DNSBL queries already and nobody really is up in arms saying it’s bonkers to do so. My Exim already does a couple of DNSBL queries and then if it is going to deliver the email it will call out to SpamAssassin which does many DNSBL queries itself. If these hit a timeout then it would slow down my mail delivery.

In the past where a DNSBL has unceremoniously shut down and made its nameservers unresponsive I have seen problems, as it caused the delivery processes to pile up while they waited on their timeouts and then Exim would complain that there’s too many processes. That would be resolved by removing the errant DNSBL(s) from the configuration.

Query load is not a concern as DNS is highly scalable and my system is not going to add noticeable load to Team Cymru’s already public service. The SpamAssassin ASN plugin is already out there, hard coded to use this same service and must have many many users already.

As far as I can tell, in Exim dnsdb queries use the same timeouts and retries as dnslist queries do, that being controlled by the dns_retrans and dns_retry settings. These settings both default to 0, which means “operating system / resolver library default”. If you were worried you could explicitly set these to their minimum values in the lookup itself:

set acl_m9 = ${lookup dnsdb{retrans_1s,retry_1,txt=${reverse_ip:$sender_host_address}.origin6.asn.cymru.com}}

If still worried then you would first have to either turn off all DNSBLs or make sure you had local copies of them (e.g. by arranging AXFR to your own local servers). Then to do the IP-to-ASN lookup locally you’d arrange to have a local BGP feed that you could query. I think you’d need to have an absolutely huge mail server before these issues became real concerns.

As for dnslist, the consequence of a time out is that you get no data, so it would just result in an empty header.

But Why? ^

I’ve actually been doing this for a while with SpamAssassin’s ASN plugin but I’ve changed the way in which I query SpamAssassin and now I don’t directly get the rewritten email that SpamAssassin makes (with its X-Spam-ASN: header in).

I use it for feeding into Bayes to see if there’s a particular prevalence of ASN amongst the email that is classified as spam, and I sometimes add a few points on manually for ASNs that are particularly bad. That is a lot less work than trying to track down all their IP addresses and keep that up to date.

Using Duplicity to back up to Amazon S3 over IPv6 (only)

Scenario ^

I have a server that I use for making backups. I also send backups from that server into Amazon S3 at the “Infrequent Access” storage class. That class is cheaper to store but expensive to access. It’s intended for backups of last resort that you only access in an emergency. I use Duplicity to handle the S3 part.

(I could save a bit more by using one of the “Glacier” classes but at the moment the cost is minimal and I’m not that brave.)

I recently decided to change which server I use for the backups. I noticed that renting a server with only IPv6 connectivity was cheaper, and as all the hosts I back up have IPv6 connectivity I decided to give that a go.

This mostly worked fine. The only thing I really noticed was when I tried to install some software from GitHub. GitHub doesn’t support IPv6, so I had to piggy back that download through another host.

Then I came to set up Duplicity again and found that I needed to make some non-obvious changes to make it work with S3 over IPv6-only.

S3 endpoint ^

The main issue is that the default S3 endpoint URL is https://s3.<region>.amazonaws.com, and this host only has an A (IPv4) record! For example:

$ host s3.us-east-1.amazonaws.com
s3.us-east-1.amazonaws.com has address …

If you run Duplicity with a target like s3://yourbucketname/path/to/backup then it will try that endpoint, get only an IPv4 address, and return Network unreachable.

S3 does actually support IPv6, but for that to work you need to use a dual stack endpoint! They look like this:

$ host s3.dualstack.us-east-1.amazonaws.com
s3.dualstack.us-east-1.amazonaws.com has address …
s3.dualstack.us-east-1.amazonaws.com has IPv6 address 2600:1fa0:80dc:5101:34d9:451e::

So we need to specify the S3 endpoint to use.

Specifying the S3 endpoint ^

In order to do this you need to switch Duplicity to the “boto3” backend. Assuming you’ve installed the correct package (python3-boto3 on Debian), this is as simple as changing the target from s3://… to boto3+s3://….

That then allows you to use the command line arguments --s3-region-name and --s3-endpoint-url so you can tell it which host to talk to. That ends up giving you both an IPv4 and an IPv6 address and your system correctly chooses the IPv6 one.

The full script ^

The new, working script now looks something like this:

export PASSPHRASE="highlysecret"
export AWS_ACCESS_KEY_ID="notquiteassecret"
export AWS_SECRET_ACCESS_KEY="extremelysecret"
# Somewhere with plenty of free space.
export TMPDIR=/var/tmp
duplicity --encrypt-key ABCDEF0123456789 \
          --asynchronous-upload \
          -v 4 \
          --archive-dir=/path/to/your/duplicity/archives \
          --s3-use-ia \
          --s3-use-multiprocessing \
          --s3-use-new-style \
          --s3-region-name "us-east-1" \
          --s3-endpoint-url "https://s3.dualstack.us-east-1.amazonaws.com" \
          incr \
          --full-if-older-than 30D \
          /stuff/you/want/backed/up \
          "boto3+s3://yourbucketname/path/to/backup"

The previous version of the script looked a bit like:

# All the exports stayed the same
duplicity --encrypt-key ABCDEF0123456789 \
          --asynchronous-upload \
          -v 4 \
          --archive-dir=/path/to/your/duplicity/archives \
          --s3-use-ia \
          --s3-use-multiprocessing \
          incr \
          --full-if-older-than 30D \
          /stuff/you/want/backed/up \
          "s3://yourbucketname/path/to/backup"