I don’t think the cheapest APC Back-UPS units can be monitored except in Windows

TL;DR: Despite otherwise seeming to work correctly, I can’t monitor a Back-UPS BX1600MI in Linux without seeing a constant stream of spurious battery detach/reattach and power fail/restore events that last less than 2 seconds each. I’ve tried multiple computers and multiple UPSes of that model. It doesn’t happen in their own proprietary Windows software, so I think they’ve changed the protocol.

Apart from nearly two decades ago when I was given one for free, I’ve never bothered with a UPS at home. Our power grid is very reliable. Looking at availability information from “uptimed”, my home file server has been powered on for 99.97% of the time in the last 14 years. That includes time spent moving house and a day when the house power was off for several hours while the kitchen was refitted!

However, in December 2023 a fault with our electric oven popped the breaker for the sockets causing everything to be harshly powered off. My fileserver took it badly and one drive died. That wasn’t a huge issue as it has a redundant filesystem, but I didn’t like it.

I decided I could afford to treat myself to a relatively cheap UPS.

I did some research and read some reviews of the APC Back-UPS range, their cheapest offering. Many people were dismissive, calling them cheap pieces of crap with flimsy plastic construction and batteries that are not regarded as user-replaceable. But there was no indication that such a model would not work, and I found it hard to justify paying a lot here.

I found YouTube videos of the procedure that a technician would go through to replace the battery in 3 to 5 years. To do it yourself voids your warranty, but your warranty is done after 3 years anyway. It looked pretty doable even for a hardware-avoidant person like myself.

It’s important to me that the UPS can be monitored by a Linux computer. The entire point here is that the computer detects when the battery is nearly exhausted and gracefully powers itself down. There are two main options on Linux for this: apcupsd and Network UPS Tools (“nut”).

Looking at the Back-UPS BX1600MI model, it has a USB port for monitoring and says it can be monitored with APC’s own Powerchute Serial Shutdown Windows software. There’s an entry in nut’s hardware compatibility list for “Back-UPS (USB)” of “supported, based on publicly available protocol”. I made the order.

The UPS worked as expected in terms of being an uninterruptible power supply. It was hopeless trying to talk to it with nut though. nut just kept saying it was losing communications.

I tried apcupsd instead. This stayed connected, but it showed a continuous stream of battery detach/reattach and power fail/restore events each lasting less than 2 seconds. Normally on a power fail you’d expect a visual and audible alert on the UPS itself and I wasn’t getting any of that, but I don’t know if that’s because they were real events that were just too brief.
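
For anyone wanting to see the same behaviour for themselves, apcupsd logs every event it registers and apcaccess dumps its current view of the UPS. A rough sketch assuming Debian’s default paths, which may differ elsewhere:

$ apcaccess status
$ sudo tail -f /var/log/apcupsd.events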

I contacted APC support but they were very quick to tell me that they did not support any other software but their own Windows-only Powerchute Serial Shutdown (PCSS).

I then asked about this on the apcupsd mailing list. The first response:

“Something’s wrong with your UPS, most likely the battery is bad, but since you say the UPS is brand new, just get it replaced.”

As this thing was brand new I wasn’t going to go through a warranty claim with APC. I just contacted the vendor and told them I thought it was faulty and I wanted to return it. They actually offered to send me another one in advance and have me send back the one I had, so I went for that.

In the meantime I found time to install Windows 10 in a virtual machine and pass USB through to it. Guess what? No spurious events in PCSS on Windows. It detected expected events when I yanked the power etc. I had no evidence that the UPS was in any way faulty. You can probably see what is coming.

The replacement UPS (of the same model) behaved exactly the same: spurious events. This just seems to be what the APC Back-UPS does on non-Windows.

Returning to my thread on the apcupsd mailing list, I asked again if there was actually anyone out there who had one of these working with non-Windows. The only substantive response I’ve got so far is:

“BX are the El Cheapo plastic craps, worst of all, not even the BExx0 family is such a crap – Schneider’s direct response to all the chinese craps flooding the markets […] no sane person would buy these things, but, well, here we are.”

So as far as I am aware, the Back-UPS models cannot currently be monitored from non-Windows. That will have to be my working theory unless someone who has it working with non-Windows contacts me to let me know I am wrong, which I would be interested to know about. I feel like I’ve done all that I can to find such people, by asking on the mailing list for the software that is meant for monitoring APC UPSes on Unix.

After talking all this over with the vendor they’ve recommended a Riello NPW 1.5kVA which is listed as fully supported by nut. They are taking the APC units back for a full refund; the Riello is about £30 more expensive.

grub-install: error: embedding is not possible, but this is required for RAID and LVM install

The Initial Problem ^

The recent security update of the GRUB bootloader did not want to install on my fileserver at home:

$ sudo apt dist-upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
  grub-common grub-pc grub-pc-bin grub2-common
4 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 4,067 kB of archives.
After this operation, 72.7 kB of additional disk space will be used.
Do you want to continue? [Y/n]
…
Setting up grub-pc (2.02+dfsg1-20+deb10u4) ...
Installing for i386-pc platform.
grub-install: warning: your core.img is unusually large.  It won't fit in the embedding area.
grub-install: error: embedding is not possible, but this is required for RAID and LVM install.
Installing for i386-pc platform.
grub-install: warning: your core.img is unusually large.  It won't fit in the embedding area.
grub-install: error: embedding is not possible, but this is required for RAID and LVM install.
Installing for i386-pc platform.
grub-install: warning: your core.img is unusually large.  It won't fit in the embedding area.
grub-install: error: embedding is not possible, but this is required for RAID and LVM install.
Installing for i386-pc platform.
grub-install: warning: your core.img is unusually large.  It won't fit in the embedding area.
grub-install: error: embedding is not possible, but this is required for RAID and LVM install.

Four identical error messages, because this server has four drives upon which the operating system is installed, and I’d decided to do a four way RAID-1 of a small first partition to make up /boot. This error is coming from grub-install.

Ancient History ^

This system came to life in 2006, so it’s 15 years old. It’s always been Debian stable, so right now it runs Debian buster and during those 15 years it’s been transplanted into several different iterations of hardware.

Choices were made in 2006 that were reasonable for 2006, but it’s not 2006 now. Some of these choices are now causing problems.

Aside: four way RAID-1 might seem excessive, but we’re only talking about the small /boot partition. Back in 2006 I chose a ~256M one so if I did the minimal thing of only having a RAID-1 pair I’d have 2x 256M spare on the two other drives, which isn’t very useful. I’d honestly rather have all four system drives with the same partition table and there’s hardly ever writes to /boot anyway.

Here’s what the identical partition tables of the drives /dev/sd[abcd] look like:

$ sudo fdisk -u -l /dev/sda
Disk /dev/sda: 298.1 GiB, 320069031424 bytes, 625134827 sectors
Disk model: ST3320620AS     
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000
 
Device     Boot   Start       End   Sectors  Size Id Type
/dev/sda1  *         63    514079    514017  251M fd Linux raid autodetect
/dev/sda2        514080   6393869   5879790  2.8G fd Linux raid autodetect
/dev/sda3       6393870 625121279 618727410  295G fd Linux raid autodetect

Note that the first partition starts at sector 63, 32,256 bytes into the disk. Modern partition tools tend to start partitions at sector 2,048 (1,024KiB in), but this was acceptable in 2006 for me and worked up until a few days ago.

Those four partitions /dev/sd[abcd]1 make up an mdadm RAID-1 with metadata version 0.90. This was purposefully chosen because at the time of install GRUB did not have RAID support. This metadata version lives at the end of the member device so anything that just reads the device can pretend it’s an ext2 filesystem. That’s what people did many years ago to boot off of software RAID.

What’s Gone Wrong? ^

The last successful update of grub-pc seems to have been done on 7 February 2021:

$ ls -la /boot/grub/i386-pc/core.img
-rw-r--r-- 1 root root 31082 Feb  7 17:19 /boot/grub/i386-pc/core.img

I’ve got 62 sectors available for the core.img so that’s 31,744 bytes – just 662 bytes more than what is required.
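
(Just to show the arithmetic: 31,744 is the gap between the MBR and the start of the first partition, and the 662 is the difference from the earlier core.img size:)

$ echo $(( (63 - 1) * 512 ))
31744
$ echo $(( 31744 - 31082 ))
662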

The update of grub-pc appears to be detecting that my /boot partition is on a software RAID and is now including MD RAID support even though I don’t strictly require it. This makes the core.img larger than the space I have available for it.

I don’t think it is great that such a major change has been introduced as a security update, and it doesn’t seem like there is any easy way to tell it not to include the MD RAID support, but I’m sure everyone is doing their best here and it’s more important to get the security update out.

Possible Fixes ^

So, how to fix? It seems to me the choices are:

  1. Ignore the problem and stay on the older grub-pc
  2. Create a core.img with only the modules I need
  3. Rebuild my /boot partition

Option #1 is okay short term, especially if you don’t use Secure Boot as that’s what the security update was about.

Option #2 doesn’t seem that feasible as I can’t find a way to influence how Debian’s upgrade process calls grub-install. I don’t want that to become a manual process.

Option #3 seems like the easiest thing to do, as shaving ~1MiB off the size of my /boot isn’t going to cause me any issues.

Rebuilding My /boot ^

Take a backup ^

/boot is only relatively small so it seemed easiest just to tar it up ready to put it back later.

$ sudo tar -C /boot -cvf ~/boot.tar .

I then sent that tar file off to another machine as well, just in case the worst should happen.

Unmount /boot and stop the RAID array that it’s on ^

I’ve already checked in /etc/fstab that /boot is on /dev/md0.
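
A quick grep is enough to confirm that; your line will of course differ, but mine looks something like:

$ grep '/boot' /etc/fstab
/dev/md0  /boot  ext4  defaults  0  2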

$ sudo umount /boot
$ sudo mdadm --stop /dev/md0
mdadm: stopped /dev/md0

At this point I would also recommend doing a wipefs -a on each of the partitions in order to remove the MD superblocks. I didn’t and it caused me a slight problem later as we shall see.
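
If you’re following along, that looks something like this (with the array already stopped):

$ for part in /dev/sd[abcd]1; do sudo wipefs -a "$part"; done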

Delete and recreate first partition on each drive ^

I chose to use parted, but should be doable with fdisk or sfdisk or whatever you prefer.

I know from the fdisk output way above that the new partition needs to start at sector 2048 and end at sector 514,079.

$ sudo parted /dev/sda                                                             
GNU Parted 3.2
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) unit s
(parted) rm 1
(parted) mkpart primary ext4 2048 514079s
(parted) set 1 raid on
(parted) set 1 boot on
(parted) p
Model: ATA ST3320620AS (scsi)
Disk /dev/sda: 625134827s
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:
 
Number  Start     End         Size        Type     File system  Flags
 1      2048s     514079s     512032s     primary  ext4         boot, raid, lba
 2      514080s   6393869s    5879790s    primary               raid
 3      6393870s  625121279s  618727410s  primary               raid
 
(parted) q
Information: You may need to update /etc/fstab.

Do that for each drive in turn. When I got to /dev/sdd, this happened:

Error: Partition(s) 1 on /dev/sdd have been written, but we have been unable to
inform the kernel of the change, probably because it/they are in use.  As a result,
the old partition(s) will remain in use.  You should reboot now before making further changes.
Ignore/Cancel?

The reason for this seems to be that something has decided that there is still a RAID signature on /dev/sdd1 and so it will try to incrementally assemble the RAID-1 automatically in the background. This is why I recommend a wipefs of each member device.

To get out of this situation without rebooting I needed to repeat my mdadm --stop /dev/md0 command and then do a wipefs -a /dev/sdd1. I was then able to partition it with parted.

Create md0 array again ^

I’m going to stick with metadata format 0.90 for this one even though it may not be strictly necessary.

$ sudo mdadm --create /dev/md0 \
             --metadata 0.9 \
             --level=1 \
             --raid-devices=4 \
             /dev/sd[abcd]1
mdadm: array /dev/md0 started.

Again, if you did not do a wipefs earlier then mdadm will complain that these devices already have a RAID array on them and ask for confirmation.

Get the Array UUID ^

$ sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 0.90
     Creation Time : Sat Mar  6 03:20:10 2021
        Raid Level : raid1
        Array Size : 255936 (249.94 MiB 262.08 MB)
     Used Dev Size : 255936 (249.94 MiB 262.08 MB)
      Raid Devices : 4
     Total Devices : 4
   Preferred Minor : 0
       Persistence : Superblock is persistent
 
       Update Time : Sat Mar  6 03:20:16 2021
             State : clean
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0
 
Consistency Policy : resync
 
              UUID : e05aa2fc:91023169:da7eb873:22131b12 (local to host specialbrew.localnet)
            Events : 0.18
 
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

Change your /etc/mdadm/mdadm.conf for the updated UUID of md0:

$ grep md0 /etc/mdadm/mdadm.conf
ARRAY /dev/md0 level=raid1 num-devices=4 UUID=e05aa2fc:91023169:da7eb873:22131b12
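
If you’d rather not edit it by hand, mdadm can print a suitable ARRAY line for you, and on Debian it’s also worth refreshing the initramfs afterwards so that the copy of mdadm.conf used at early boot has the new UUID too:

$ sudo mdadm --detail --scan
$ sudo update-initramfs -u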

Make a new filesystem on /dev/md0 ^

$ sudo mkfs.ext4 -m0 -L boot /dev/md0
mke2fs 1.44.5 (15-Dec-2018)
Creating filesystem with 255936 1k blocks and 64000 inodes
Filesystem UUID: fdc611f2-e82a-4877-91d3-0f5f8a5dd31d
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729, 204801, 221185
 
Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

My /etc/fstab didn’t need a change because it mounted by device name, i.e. /dev/md0, but if yours uses UUID or label then you’ll need to update that now, too.

Mount it and put your files back ^

$ sudo mount /boot
$ sudo tar -C /boot -xvf ~/boot.tar

Reinstall grub-pc ^

$ sudo apt reinstall grub-pc
…
Setting up grub-pc (2.02+dfsg1-20+deb10u4) ...
Installing for i386-pc platform.
Installation finished. No error reported.
Installing for i386-pc platform.
Installation finished. No error reported.
Installing for i386-pc platform.
Installation finished. No error reported.
Installing for i386-pc platform.
Installation finished. No error reported.

Reboot ^

You probably should reboot now to make sure it all works when you have time to fix any problems, as opposed to risking issues when you least expect it.

$ uprecords 
     #               Uptime | System                                     Boot up
----------------------------+---------------------------------------------------
     1   392 days, 16:45:55 | Linux 4.7.0               Thu Jun 14 16:13:52 2018
     2   325 days, 03:20:18 | Linux 3.16.0-0.bpo.4-amd  Wed Apr  1 14:43:32 2015
->   3   287 days, 16:03:12 | Linux 4.19.0-9-amd64      Fri May 22 12:33:27 2020
     4   257 days, 07:31:42 | Linux 4.19.0-6-amd64      Sun Sep  8 05:00:38 2019
     5   246 days, 14:45:10 | Linux 4.7.0               Sat Aug  6 06:27:52 2016
     6   165 days, 01:24:22 | Linux 4.5.0-rc4-specialb  Sat Feb 20 18:18:47 2016
     7   131 days, 18:27:51 | Linux 3.16.0              Tue Sep 16 08:01:05 2014
     8    89 days, 16:01:40 | Linux 4.7.0               Fri May 26 18:28:40 2017
     9    85 days, 17:33:51 | Linux 4.7.0               Mon Feb 19 17:17:39 2018
    10    63 days, 18:57:12 | Linux 3.16.0-0.bpo.4-amd  Mon Jan 26 02:33:47 2015
----------------------------+---------------------------------------------------
1up in    37 days, 11:17:07 | at                        Mon Apr 12 15:53:46 2021
no1 in   105 days, 00:42:44 | at                        Sat Jun 19 05:19:23 2021
    up  2362 days, 06:33:25 | since                     Tue Sep 16 08:01:05 2014
  down     0 days, 14:02:09 | since                     Tue Sep 16 08:01:05 2014
   %up               99.975 | since                     Tue Sep 16 08:01:05 2014

My Kingdom For 7 Bytes ^

My new core.img is 7 bytes too big to fit before my original /boot:

$ ls -la /boot/grub/i386-pc/core.img
-rw-r--r-- 1 root root 31751 Mar  6 03:24 /boot/grub/i386-pc/core.img

Intel may need me to sign an NDA before I can know the capacity of one of their SSDs

Apologies for the slightly clickbaity title! I could not resist. While an Intel employee did tell me this, they are obviously wrong.

Still, I found out some interesting things that I was previously unaware of.

I was thinking of purchasing some “3.84TB” Intel D3-S4610 SSDs for work. I already have some “3.84TB” Samsung SM883s so it would be good if the actual byte capacity of the Intel SSDs were at least as much as the Samsung ones, so that they could be used to replace a failed Samsung SSD.

Those with little tech experience would probably assume that two things described as X TB in capacity would be either:

  1. Actually X TB in size, where 1TB = 1,000 x 1,000 x 1,000 x 1,000 bytes, using powers of ten SI prefixes. Or;
  2. Actually X TiB in size, where 1TiB = 1,024 x 1,024 x 1,024 x 1,024 bytes, using binary prefixes.

…and there was a period of time where this was mostly correct, in that manufacturers would prefer something like the former case, as it results in larger headline numbers.

The thing is, years ago, manufacturers used to pick a capacity that was at least what was advertised (in powers of 10 figures) but it wasn’t standardised.

If you used those drives in a RAID array then it was possible that a replacement—even from the same manufacturer—could be very slightly smaller. That would give you a bad day as you generally need devices that are all the same size. Larger is okay (you’ll waste some), but smaller won’t work.

So for those of us who, like me, are old, this is something we’re accustomed to checking, and I still thought it was the case. I wanted to find out the exact byte capacity of this Intel SSD. So I tried to ask Intel, in a live support chat.

Edgar (22/02/2021, 13:50:59): Hello. My name is Edgar and I’ll be helping you today.

Me (22/02/2021, 13:51:36): Hi Edgar, I have a simple request. Please could you tell me the exact byte capacity of a SSD-SSDSC2KG038T801 that is a 3.84TB Intel D3-S4610 SSD

Me (22/02/2021, 13:51:47): I need this information for matching capacities in a RAID set

Edgar (22/02/2021, 13:52:07): Hello, thank you for contacting Intel Technical Support. It is going to be my pleasure to help you.

Edgar (22/02/2021, 13:53:05): Allow me a moment to create a ticket for you.

Edgar (22/02/2021, 13:57:26): We have a calculation to get the decimal drive sectors of an SSD because the information you are asking for most probably is going to need a Non-Disclousre Agreement (NDA)

Yeah, an Intel employee told me that I might need to sign an NDA to know the usable capacity of an SSD. This is obviously nonsense. I don’t know whether they misunderstood and thought I was asking about the raw capacity of the flash chips or what.

Me (22/02/2021, 13:58:15): That seems a bit strange. If I buy this drive I can just plug it in and see the capacity in bytes. But if it’s too small then that is a wasted purchase which would be RMA’d

Edgar (22/02/2021, 14:02:48): It is 7,500,000,000

Edgar (22/02/2021, 14:03:17): Because you take the size of the SSD that is 3.84 in TB, in Byte is 3840000000000

Edgar (22/02/2021, 14:03:47): So we divide 3840000000000 / 512 which is the sector size for a total of 7,500,000,000 Bytes

Me (22/02/2021, 14:05:50): you must mean 7,500,000,000 sectors of 512byte, right?

Edgar (22/02/2021, 14:07:45): That is the total sector size, 512 byte

Edgar (22/02/2021, 14:08:12): So the total sector size of the SSD is 7,500,000,000

Me (22/02/2021, 14:08:26): 7,500,000,000 sectors is only 3,750GB so this seems rather unlikely

The reason why this seemed unlikely to me is that I have never seen an Intel or Samsung SSD that was advertised as X.Y TB capacity that did not have a usable capacity of at least X,Y00,000,000,000 bytes. So I would expect a “3.84TB” device to have at least 3,840,000,000,000 bytes of usable capacity.

Edgar was unable to help me further so the support chat was ended. I decided to ask around online to see if anyone actually had one of these devices running and could tell me the capacity.

Peter Corlett responded to me with:

As per IDEMA LBA1-03, the capacity is 1,000,194,048 bytes per marketing gigabyte plus 10,838,016 bytes. A marketing terabyte is 1000 marketing gigabytes.

3840 * 1000194048 + 10838016 = 3840755982336. Presumably your Samsung disk has that capacity, as should that Intel one you’re eyeing up.

My Samsung ones do! And every other SSD I’ve checked obeys this formula, which explains why things have seemed a lot more standard recently. I think this might have been standardised some time around 2014 / 2015. I can’t tell right now because the IDEMA web site is down!
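
The formula is trivial to script if you ever want to sanity-check a drive before buying it. A little sketch using a throwaway shell function, where the argument is the marketing capacity in GB:

$ idema() { echo $(( $1 * 1000194048 + 10838016 )); }
$ idema 3840
3840755982336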

So the interesting and previously unknown to me thing is that storage device sizes are indeed standardised now, albeit not to any sane definition of the units that they use.

What a relief.

Also that Intel live support sadly can’t be relied upon to know basic facts about Intel products. 🙁

Recovering From an Exif Disaster

The Discovery ^

Sometime in late December (2019) I noticed that when I clicked on a tag in Shotwell, the photo management software that I use, it was showing either zero or hardly any matching photos when I knew for sure that there should be many more.

(When I say “tag” in this article it’s mostly going to refer to the type of tags you generally put on an image, i.e. the tags that identify who or what is in the image, what event it is associated with, the place it was taken etc. Images can have many different kinds of tags containing all manner of metadata, but for avoidance of doubt please assume that I don’t mean any of those.)

I have Shotwell set to store the tags in the image files themselves, in the metadata. There is a standard for this called Exif. What seems to have happened is that Shotwell had removed a huge number of tags from the files themselves. At the time of discovery I had around 15,500 photos in my collection and it looked like the only way to tell what was in them would be by looking at them. Disaster.

Here follow some notes about what I found out when trying to recover from this situation, in case it is ever useful for anyone.

Shotwell still had a visible tag hierarchy, so I could for example click on the “Pets/Remy” tag, but this brought up only one photo that I took on 14 December 2019. I’ve been taking photos of Remy for years so I knew there should be many more. Here’s Remy.

Remy at The Avenue Ealing Christmas Fair, December 2019

Luckily, I have backups.

Comparing Good and Bad Copies of a Photo ^

I knew this must have happened fairly recently because I’d have noticed quite quickly that photos were “missing”. I had a look for a recent photo that I knew I’d tagged with a particular thing, and then looked in the backups to see when it was last modified.

As an example I found a photo that was taken on 30 October 2019 that should have been tagged “Pets/Violet” but no longer was. It had been modified (but not by me) on 7 December 2019.

A broken photo of Violet

(Sorry about the text-as-images; I’m reconstructing this series of events from a Twitter thread, where things necessarily had to be posted as screenshots.)

What the above shows is that the version of the photo that existed on 30 October 2019 had the tags “Pets”, “Edna”, and “Violet” but then the version that was written on 7 December 2019 lost the “Violet” tag.

Here I used the exiftool utility to display EXIF tags from the photo files. You can do that like this:

$ exiftool -s $filename

Using egrep I limited this to the tag keys “Subject”, “Keywords”, and “TagsListLastKeywordXMP”, but this was a slight mistake: “TagsListLastKeywordXMP” was actually a typo; it is totally irrelevant and should be ignored.

“Subject” and “Keywords” were always identical for any photo I examined and contained the flattened list of tags. For example, in Shotwell that photo originally had the tags:

  • Pets/Edna
  • Pets/Violet

It seems that Shotwell flattens that to:

  • Pets
  • Edna
  • Violet

and then stores it in “Subject” and “Keywords”.

The tags with hierarchy are actually in the key “TagsList” like:

  • Pets
  • Pets/Edna
  • Pets/Violet
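
So for a photo tagged like that, the three keys as printed by exiftool look roughly like this (filename made up; note the values are flat comma-separated lists):

$ exiftool -s -Subject -Keywords -TagsList 20191030_violet.jpg
Subject                         : Pets, Edna, Violet
Keywords                        : Pets, Edna, Violet
TagsList                        : Pets, Pets/Edna, Pets/Violet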

Fixing One Photo ^

I tested stuffing the tag “Violet” back in to this file under the keys “Subject” and “Keywords”:

$ exiftool -keywords+="…" -subject+="…" $filename

Stuffing the Violet tag back in

This shows that the “Violet” tag is now back in the current version of the file. After restarting Shotwell and doing a free text search for “Violet”, this photo now shows up whereas before it did not. It still did not show up when I clicked on “Pets/Violet” in the tag hierarchy however. It was then that I realised I also needed to put “Pets/Violet” into the “TagsList” key.

I ended up using a script to do this in bulk fashion, but individually I think you should be able to do this like:

$ exiftool -keywords+=Violet -subject+=Violet -TagsList+=Pets/Violet $filename

After restarting Shotwell I was able to click on the “Pets/Violet” tag and see this photo.

Fixing All the Photos? ^

My process to recover from this, then, was to compile a list of each file that had been modified at the suspected time of disaster, and for each:

  1. Read the list of tags from “Keywords”
  2. Read the list of tags from “Subject”
  3. De-duplicate them and store them as $keywords
  4. Read the list of tags from “TagsList” and store them as $tagslist
  5. Stuff $keywords back into both “Subject” and “Keywords” of the current version of the file
  6. Stuff $tagslist back into “TagsList” of the current version of the file

Gulp.
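
I ended up doing it with a couple of small Perl scripts, described below, but for illustration the same idea can be sketched with exiftool alone by copying those three keys from the known-good copy of each file over the top of the current one. This assumes the good copies are mounted at $GOOD (hypothetical) and reads the paths from the busted.txt produced in the next section:

$ while read -r f; do
>     exiftool -overwrite_original -tagsFromFile "$GOOD/$f" \
>         -Subject -Keywords -TagsList "$f"
> done < ~/busted.txt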

Which files were tampered with? ^

It was relatively easy to work out which files had been screwed with, because thankfully I didn’t make any other photo modifications on 7 December 2019. So any photo that got modified that day was probably a candidate.

I haven’t mentioned what actually caused this problem yet. I don’t know exactly. At 16:53 on 7 December 2019 I was importing some photos into Shotwell, and I do seem to recall it crashed at some point, either while I was doing that or shortly after.

The photos from that import and all others afterwards had retained their tags correctly, but many that existed prior to that time seemed to be missing some or all tags. I have no idea why such a crash would cause Shotwell to do that but that must have been what did it.

Running this against my backups identified 3,721 files that had been modified on 7 December 2019:

$ cd weekly.2/specialbrew.21tc.bitfolk.com/srv/tank/Photos/Andy
$ find . -type f \
  -newermt "2019-12-07 00:00:00" \! \
  -newermt "2019-12-07 23:59:59" > ~/busted.txt

The next thing I did was to check that each of these file paths still exist in the current photo store and in the known-good backups (weekly.3).

Extract tags from known-good copies ^

Next up, I wrote a script which:

  1. Goes to the known-good copies of the files
  2. Extracts the Subject and Keywords and deduplicates them
  3. Extracts the TagsList
  4. Writes it all into a hash
  5. Dumps that out as a YAML file

All scripts mentioned here use the Perl module Image::ExifTool which is part of the exiftool package.

backup_host$ ./gather_tags.pl < ~/busted.txt > ~/tags.yaml

tags.yaml looks a bit like this:

---
2011/01/16/16012011163.jpg:
  keywords:
  - Hatter
  - Pets
  tagslist:
  - Pets
  - Pets/Hatter
[…]
2019/11/29/20191129_095218~2.jpg:
  keywords:
  - Bedfont Lakes
  - Feltham
  - London
  - Mandy
  - Pets
  - Places
  tagslist:
  - Pets
  - Pets/Mandy
  - Places
  - Places/London
  - Places/London/Feltham
  - Places/London/Feltham/Bedfont Lakes

Stuff tags back into current versions of photos ^

After transferring tags.yaml back to my home fileserver it was time to use it to stuff the tags back into the files that had lost them.

One thing to note while doing this is that if you just add a tag, it adds it even if the same tag already exists, leading to duplicates. I thought it best to first delete the tag and then add it again so that there would only be one instance of each one.
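
With the exiftool command line that idiom is to delete and re-add in the same invocation, which leaves exactly one copy of the tag (filename hypothetical):

$ exiftool -keywords-=Violet -keywords+=Violet \
           -subject-=Violet -subject+=Violet photo.jpg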

I called that one fix_tags.pl.

$ ./fix_tags.pl tags.yaml

Profit! Or, only slight loss, I guess ^

16m53s of runtime later, it had completed its work… 🙌 2020 will definitely be the year of Linux on the desktop¹.

¹ As long as you know how to manipulate EXIF tags from a programming language and have a functioning backup system and even then don’t mind losing some stuff

Losing some stuff…? ^

Unfortunately there were some things I couldn’t restore. It was at this point that I discovered that Shotwell does not ever put tags into video files (even though they do support EXIF tags…)

That means that the only record of the tags on a video file is in Shotwell’s own database, which I did not back up as I didn’t think I needed to.

Getting Tags Out of Shotwell ^

I am now backing that up, but should this sort of thing happen in the future I’d need to know how to manipulate the tags for videos in Shotwell’s database.

Shotwell’s database is an SQLite file that’s normally at $HOME/.local/share/shotwell/data/photo.db. I’m fairly familiar with SQLite so I had a poke around, but couldn’t immediately see how these tags were stored. I had to ask on the Shotwell mailing list.

Here’s how Shotwell does it. There’s a table called TagTable which stores the name of each tag and a comma-separated list of every photo/video which matches it:

sqlite> .schema TagTable 
CREATE TABLE TagTable (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL, photo_id_list TEXT, time_created INTEGER);

The photo_id_list column holds the comma-separated list. Each item in the list is of the form:

  1. “thumb” or “video-” depending on whether the item is a photo or a video
  2. 16 hex digits, zero padded, which is the ID value from the PhotosTable or VideosTable for that item
  3. a comma

Full example of extracting tags for the video file 2019/12/31/20191231_121604.mp4:

$ sqlite3 /home/andy/.local/share/shotwell/data/photo.db
SQLite version 3.22.0 2018-01-22 18:45:57
Enter ".help" for usage hints.
sqlite> SELECT id
        FROM VideoTable
        WHERE filename LIKE '%20191231%';
553
sqlite> SELECT printf("%016x", 553);
0000000000000229
sqlite> SELECT name
        FROM TagTable
        WHERE photo_id_list LIKE '%video-0000000000000229,%';
/Places
/Places/London
/Places/London/Feltham
/Pets
/Places/London/Feltham/Bedfont Lakes
/Pets/Marge
/Pets/Mandy

If that is not completely clear:

  • The ID for that video file is 553
  • 553 in hexadecimal is 229
  • Pad that to 16 digits, add “video-” at the front and “,” at the end (even the last item in the list has a comma at the end)
  • Search for that string in photo_id_list
  • If a row matches then the name column is a tag that is attached to that file
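
Put together, the whole lookup can be done in one go from the shell, something like this for the same video ID of 553:

$ sqlite3 ~/.local/share/shotwell/data/photo.db \
    "SELECT name FROM TagTable
     WHERE photo_id_list LIKE '%video-' || printf('%016x', 553) || ',%';"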

I don’t exactly know how I would have identified which videos got messed with, but at least I would have had both versions of the database to compare, and I now know how I would do the comparison.

Should Tags Even Be In Photos? ^

During my Twitter thread it was suggested to me that tags should not be stored in photos, but only in the photo cataloging software, where they can be backed up along with everything else.

I disagree with this for several reasons:

  • Exif exists for the purpose of storing tags like this.

  • When I move my photos from one piece of software to another I want it to be able to read the tags. I don’t want to have to input them all over again. That would be unimaginably tedious.

    When I moved from F-Spot to Shotwell the fact that the tags were in the files saved me countless hours of work. It just worked on import.

    If there wasn’t a dedicated importer feature then it would be so much work that really the only way to do it would be to extract the tags from the database and insert them again programmatically, which is basically admitting that to change software you need to be an expert. That really isn’t how this should work.

  • If the only copy of my tags is in the internal database of a unique piece of cataloging software, then I have to become an expert on the internal data store of that piece of software. I don’t want to have to do that.

    I’ve been forced to do that here for Shotwell because of a deficiency of Shotwell in not storing video tags in the files. But if we’re only talking about photos then I could have avoided it, and could also avoid having to be an expert on every future piece of cataloging software.

  • Even if I’m not moving to a different cataloging solution, lots of software understands Exif and it’s useful to be able to query those things from other software.

    I regard it very much like artist, album, author, genre etc tags in the metadata of digital music and ebooks, all of which are in the files; you would not expect to have to reconstruct these out of the database of some other bit of software every time you wanted to use them elsewhere.

It was a mistake not to backup the Shotwell database though; I thought I did not need it as I thought all tags were being stored in files, and tags were the only things I cared about. As it happened, tags were not being stored in video files and tags for video files only exist in Shotwell’s database.

Other Thoughts ^

Having backups was obviously a lifesaver here. It took me ~3 weeks to notice.

Being able to manipulate them like a regular filesystem made things a lot more convenient, so that’s a property I will want to keep in whatever future backup arrangements I have.

I might very well switch to different photo management software now, assuming I could find any that I prefer, but all software has bugs. Whatever I switch to I would have to ensure that I knew how to extract the tags from that as well, if it doesn’t store them in the files.

I don’t want to store my photos and videos “in the cloud” but it is a shortcoming of Shotwell that I can basically only use it from my desktop at home. Its database does not support multiple or remote access. I wonder if there is some web-based thing that can just read (and cache) the tags out of the files, build dynamic galleries and allow arbitrary searches on them…

Shotwell’s database schema and its use of 16 hexadecimal digits (nibbles?) means I can only store a maximum of 18,446,744,073,709,551,615 (2⁶⁴ − 1) photos or videos of dogs. Arbitrary limits suck so much.

Greyhounds Marge, Janti and Will at Sainsbury’s Staines with Wimbledon Greyhound Welfare, December 2019

The Internet of Unprofitable Things

Gather round children ^

Uncle Andrew wants to tell you a festive story. The NTPmare shortly after Christmas.

A modest proposal ^

Nearly two years ago, on the afternoon of Monday 16th January 2017, I received an interesting BitFolk support ticket from a non-customer. The sender identified themselves as a senior software engineer at NetThings UK Ltd.

Subject: Specific request for NTP on IP 85.119.80.232

Hi,

This might sound odd but I need to setup an NTP server instance on IP address 85.119.80.232.

wats 85.119.80.232 precious? ^

85.119.80.232 is actually one of the IP addresses of one of BitFolk’s customer-facing NTP servers. It was also, until a few weeks before this email, part of the NTP Pool project.

“Was” being the important issue here. In late December of 2016 I had withdrawn BitFolk’s NTP servers from the public pool and firewalled them off to non-customers.

I’d done that because they were receiving an unusually large amount of traffic due to the Snapchat NTP bug. It wasn’t really causing any huge problems, but the number of traffic flows was pushing useful information out of Jump’s fixed-size netflow database and I didn’t want to deal with it over the holiday period, so this public service was withdrawn.

NTP? ^

This article was posted to Hacker News and a couple of comments there said they would have liked to have seen a brief explanation of what NTP is, so I’ve now added this section. If you know what NTP is already then you should probably skip this section because it will be quite brief and non-technical.

Network Time Protocol is a means by which a computer can use multiple other computers, often from across the Internet on completely different networks under different administrative control, to accurately determine what the current time is. By using several different computers, a small number of them can be inaccurate or even downright broken or hostile, and still the protocol can detect the “bad” clocks and only take into account the more accurate majority.

NTP is supposed to be used in a hierarchical fashion: A small number of servers have hardware directly attached from which they can very accurately tell the time, e.g. an atomic clock, GPS, etc. Those are called “Stratum 1” servers. A larger number of servers use the stratum 1 servers to set their own time, then serve that time to a much larger population of clients, and so on.

It used to be the case that it was quite hard to find NTP servers that you were allowed to use. Your own organisation might have one or two, but really you should have at least 3 to 7 of them and it’s better if there are multiple different organisations involved. In a university environment that wasn’t so difficult because you could speak to colleagues from another institution and swap NTP access. As the Internet matured and became majority used by corporations and private individuals though, people still needed access to accurate time, and this wasn’t going to cut it.

The NTP Pool project came to the rescue by making an easy web interface for people to volunteer their NTP servers, and then they’d be served collectively in a DNS zone with some basic means to share load. A private individual can just use three names from the pool zone and they will get three different (constantly changing) NTP servers.

Corporations and those making products that need to query the NTP pool are supposed to ask for a “vendor zone”. They make some small contribution to the NTP pool project and then they get a DNS zone dedicated to their product, so it’s easier for the pool administrators to direct the traffic.
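
Debian is an example of doing it properly: it has its own vendor zone, and a stock Debian install’s ntp.conf points at it with something like:

pool 0.debian.pool.ntp.org iburst
pool 1.debian.pool.ntp.org iburst
pool 2.debian.pool.ntp.org iburst
pool 3.debian.pool.ntp.org iburst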

Sadly many companies don’t take the time to understand this and just use the generic pool zone. NetThings UK Ltd went one step further in a very wrong direction by taking an IP address from the pool and just using it directly, assuming it would always be available for their use. In reality it was a free service donated to the pool by BitFolk and as it had become temporarily inconvenient for that arrangement to continue, service was withdrawn.

On with the story…

They want what? ^

The Senior Software Engineer continued:

The NTP service was recently shutdown and I am interested to know if there is any possibility of starting it up again on the IP address mentioned. Either through the current holder of the IP address or through the migration of the current machine to another address to enable us to lease 85.119.80.232.

Um…

I realise that this is a peculiar request but I can assure you it is genuine.

That’s not gonna work ^

Obviously what with 85.119.80.232 currently being in use by all customers as a resolver and NTP server I wasn’t very interested in getting them all to change their configuration and then leasing it to NetThings UK Ltd.

What I did was remove the firewalling so that 85.119.80.232 still worked as an NTP server for NetThings UK Ltd until we worked out what could be done.

I then asked some pertinent questions so we could work out the scope of the service we’d need to provide. Questions such as:

  • How many clients do you have using this?
  • Do you know their IP addresses?
  • When do they need to use the NTP server and for how long?
  • Can you make them use the pool properly (a vendor zone)?

Down the rabbit hole ^

The answers to some of the above questions were quite disappointing.

It would be of some use for our manufacturing setup (where the RTCs are initially set) but unfortunately we also have a reasonably large field population (~500 units with weekly NTP calls) that use roaming GPRS SIMs. I don’t know if we can rely on the source IP of the APN for configuring the firewall in this case (I will check though). We are also unable to update the firmware remotely on these devices as they only have a 5MB per month data allowance. We are able to wirelessly update them locally but the timeline for this is months rather than weeks.

Basically it seemed that NetThings UK Ltd made remote controlled thermostats and lighting controllers for large retail spaces etc. And their devices had one of BitFolk’s IP addresses burnt into them at the factory. And they could not be identified or remotely updated.

Facepalm

Oh, and whatever these devices were, without an external time source their clocks would start to noticeably drift within 2 weeks.

By the way, they solved their “burnt into it at the factory” problem by bringing up BitFolk’s IP address locally at their factory to set initial date/time.

Group Facepalm

I’ll admit, at this point I was slightly tempted to work out how to identify these devices and reply to them with completely the wrong times to see if I could get some retail parks to turn their lights on and off at strange times.

Weekly?? ^

We are triggering ntp calls on a weekly cron with no client side load balancing. This would result in a flood of calls at the same time every Sunday evening at around 19:45.

Yeah, they made every single one of their unidentifiable devices contact a hard coded IP address within a two minute window every Sunday night.

The Senior Software Engineer was initially very worried that they were the cause of the excess flows I had mentioned earlier, but I reassured them that it was definitely the Snapchat bug. In fact I never was able to detect their devices above background noise; it turns out that ~500 devices doing a single SNTP query is pretty light load. They’d been doing it for over 2 years before I received this email.

I did of course point out that they were lucky we caught this early because they could have ended up as the next Netgear vs. University of Wisconsin.

I am feeling really, really bad about this. I’m very, very sorry if we were the cause of your problems.

Bless. I must point out that throughout all of this, their Senior Software Engineer was a pleasure to work with.

We made a deal ^

While NTP service is something BitFolk provides as a courtesy to customers, it’s not something that I wanted to sell as a service on its own. And after all, who would buy it, when the public pool exists? The correct thing for a corporate entity to do is support the pool with a vendor zone.

But NetThings UK Ltd were in a bind and not allowing them to use BitFolk’s NTP server was going to cause them great commercial harm. Potentially I could have asked for a lot of money at this point, but (no doubt to my detriment) that just felt wrong.

I proposed that initially they pay me for two hours of consultancy to cover work already done in dealing with their request and making the firewall changes.

I further proposed that I charged them one hour of consultancy per month for a period of 12 months, to cover continued operation of the NTP server. Of course, I do not spend an hour a month fiddling with NTP, but this unusual departure from my normal business had to come at some cost.

I was keen to point out that this wasn’t something I wanted to continue forever:

Finally, this is not a punitive charge. It seems likely that you are in a difficult position at the moment and there is the temptation to charge you as much as we can get away with (a lot more than £840 [+VAT per year], anyway), but this seems unfair to me. However, providing NTP service to third parties is not a business we want to be in so we would expect this to only last around 12 months. If you end up having to renew this service after 12 months then that would be an indication that we haven’t charged you enough and we will increase the price.

Does this seem reasonable?

NetThings UK Ltd happily agreed to this proposal on a quarterly basis.

Thanks again for the info and help. You have saved me a huge amount of convoluted and throwaway work. This give us enough time to fix things properly.

Not plain sailing ^

I only communicated with the Senior Software Engineer one more time. The rest of the correspondence was with financial staff, mainly because NetThings UK Ltd did not like paying its bills on time.

NetThings UK Ltd paid 3 of its 4 invoices in the first year late. I made sure to charge them statutory late payment fees for each overdue invoice.

Yearly report card: must try harder ^

As 2017 was drawing to a close, I asked the Senior Software Engineer how NetThings UK Ltd was getting on with ceasing to hard code BitFolk’s IP address in its products.

To give you a quick summary, we have migrated the majority of our products away from using the fixed IP address. There is still one project to be updated after which there will be no new units being manufactured using the fixed IP address. However, we still have around 1000 units out in the field that are not readily updatable and will continue to perform weekly NTP calls to the fixed IP address. So to answer your question, yes we will still require the service past January 2018.

This was a bit disappointing because a year earlier the number had been “about 500” devices, yet despite a year of effort the number had apparently doubled.

That alone would have been enough for me to increase the charge, but I was going to anyway due to NetThings UK Ltd’s aversion to paying on time. I gave them just over 2 months of notice that the price was going to double.

u wot m8 ^

Approximately 15 weeks after being told that the price doubling was going to happen, NetThings UK Ltd’s Financial Controller asked me why it had happened, while letting me know that another of their late payments had been made:

Date: Wed, 21 Feb 2018 14:59:42 +0000

We’ve paid this now, but can you explain why the price has doubled?

I was very happy to explain again in detail why it had doubled. The Financial Controller in response tried to agree a fixed price for a year, which I said I would be happy to do if they paid for the full year in one payment.

My rationale for this was that a large part of the reason for the increase was that I had been spending a lot of time chasing their late payments, so if they wanted to still make quarterly payments then I would need the opportunity to charge more if I needed to. If they wanted assurance then in my view they should pay for it by making one yearly payment.

There was no reply, so the arrangement continued on a quarterly basis.

All good things… ^

On 20 November 2018 BitFolk received a letter from Deloitte:

Netthings Limited – In Administration (“The Company”)

Company Number: SC313913

[…]

Cessation of Trading

The Company ceased to trade with effect from 15 November 2018.

Investigation

As part of our duties as Joint Administrators, we shall be investigating what assets the Company holds and what recoveries if any may be made for the benefit of creditors as well as the manner in which the Company’s business has been conducted.

And then on 21 December:

Under paragraph 51(1)(b) of the Insolvency Act 1986, the Joint Administrators are not required to call an initial creditors’ meeting unless the Company has sufficient funds to make a distribution to the unsecured creditors, or unless a meeting is requested on Form SADM_127 by 10% or more in value of the Company’s unsecured creditors. There will be no funds available to make a distribution to the unsecured creditors of the Company, therefore a creditors’ meeting will not be convened.

Luckily their only unpaid invoice was for service from some point in November, so they didn’t really get anything that they hadn’t already paid for.

So that’s the story of NetThings UK Ltd, a brave pioneer of the Internet of Things wave, who thought that the public NTP pool was just an inherent part of the Internet that anyone could use for free, and that the way to do that was to pick one IP address out of it at random and bake that into over a thousand bits of hardware that they distributed around the country with no way to remotely update.

This coupled with their innovative reluctance to pay for anything on time was sadly not enough to let them remain solvent.

Google App Engine started requiring Content-Length header on POST requests

TL;DR ^

Update: It’s GoCardless who moved api.gocardless.com to Google Cloud. Google Cloud has behaved this way for years.

I think that Google App Engine may have recently started requiring every POST request to have a Content-Length header, even if there is no request body.

That will cause you problems if your library doesn’t add one for POST requests that have no content. Perl’s HTTP::Request is one such library.

You might be experiencing this if an API has just started replying to you with:

Error 411 (Length Required)!!1

411.That’s an error.

POST requests require a Content-length header. That’s all we know.

(Yes, the title does contain “!!1”.)

You can fix it by adding the header yourself, e.g.:

use LWP::UserAgent;
use HTTP::Request;
use JSON;
 
my $ua = LWP::UserAgent->new;
 
my $req = HTTP::Request->new(
    POST => "https://api.example.com/things/$id/actions/fettle"
);
 
$req->header('Accept' => 'application/json');
$req->content_type('application/json');
 
my $json;
$json = JSON->new->utf8->canonical->encode($params) if $params;
 
$req->content($json) if $json;
# Explicitly set Content-Length to zero as HTTP::Request doesn't add one
# when there's no content.
$req->header( 'Content-Length' => 0 ) unless $json;
 
my $res = $ua->request( $req );

This is a bit far outside of my comfort zone so I’m not sure if I’m 100% correct, but I do know that sending the header fixes things for me.

What happened? ^

Yesterday a BitFolk customer tried to cancel their Direct Debit mandate, and it didn’t work. The server logs contained the above message.

For Direct Debit payments we use the Perl module Business::GoCardless for integrating with GoCardless, but the additional HTML styling in the message (which I’ve left out for brevity) made clear that the message was coming from Google. api.gocardless.com is hosted on Google App Engine (or some bit of Google cloud anyway).

After a bit of debugging I established that HTTP::Request was only setting a Content-Length header when there was actually request content. The API for cancelling a Direct Debit mandate is to send an empty POST to https://api.gocardless.com/mandates/$id/actions/cancel.

Adding Content-Length: 0 makes it work again.
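
Incidentally, curl adds the header itself for a bodyless POST, which is probably why this doesn’t bite more people. A quick way to see what actually goes on the wire (api.example.com being just a placeholder):

$ curl -sv -o /dev/null -X POST https://api.example.com/ 2>&1 | grep -i '^> content-length'
> Content-Length: 0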

When did it change? ^

There was a successful mandate cancellation on 25 October 2018, so some time between then and 12 December 2018. I haven’t looked for any change notice put out by Google as I’m not a Google Cloud user and wouldn’t know where to look.

Who’s to blame ^

I haven’t yet looked into whether the HTTP standard requires POST requests to have a Content-Length header. I welcome comments from someone who wants to do the digging.

Realistically even if it doesn’t and Google is just being overly strict, other servers might also be strict, so I guess HTTP::Request should always send the header.

Fun with Supermicro motherboard serial headers

or, “LOL, standards” ^

TL;DR: Most motherboards have a serial header in an IDC-10 (5×2 pins) arrangement with the pins as a row of even numbered pins (2,4,6,8,X) followed by a row of odd numbered pins (1,3,5,7,9). Supermicro ones appear to have the pins in sequential order (6,7,8,9,X and then 1,2,3,4,5). As a result a standard IDC-10 to DB-9 cable will not work and you’ll need to either hack one about or buy the Supermicro one.

Update ^

A comment below kindly points out that Supermicro actually is using a standard header pinout, it’s just that it’s a competing and lesser-used standard. It’s apparently called Intel/DTK or crossover, so that may help you find a working cable.

Are we sitting comfortably? ^

I bought a Supermicro motherboard. It doesn’t have a serial port exposed at the back. I like to use serial ports for a serial console even though I am aware that IPMI exists. IPMI on this board works okay but I like knowing I can always get to the “real” serial port as well.

The motherboard has a COM1 serial header, and I wasn’t using the PCI expansion slot on the back of the chassis, so I decided to put a serial port there. I bought a typical IDC-10 / DB-9 cable and plate:

IDC-10 to DB-9

Didn’t work. Serial-over-LAN (IPMI) worked alright. On COM1 I would get either nothing or a run of garbage characters from time to time. I wasted a good number of hours messing with BIOS settings, baud rates, checking if my USB serial adaptor actually worked with another device (of which I only have one in my home), before I decided to sit down and check the pin numbering for both the header and the cable.

Looking at the motherboard manual we see this:

x10sdv board com1 pin layout

And the cable?

IDC-10 to DB-9 pinout

Notice anything amiss?

The cable’s pins go in a row of even numbers and then a row of odd numbers:

2 4 6 8 X
1 3 5 7 9
    -

The X is the missing pin (serial uses 9 pins) and the - indicates where the notch for the connector would be: next to pin 5 in this case.

The header’s pins go in sequential order:

6 7 8 9 X
1 2 3 4 5
    -

As a result all but pin 1 are incorrect.

You actually need a Supermicro cable for this. CBL-0010L is the part number in my case. CBL-0010LP would be the low profile version. Good luck finding it mentioned on Supermicro’s site, but your favourite reseller will probably know of it. As it was I found one on Ebay for £1.58+VAT, and it works now.

After knowing what to search for I also found someone else having a similar issue with a Supermicro board.

You could of course instead hack any existing cable’s pins about or fit an adaptor in between (as the person in the above link did).

Thanks Supermicro. Thupermicro.

Firefox, Ubuntu and middlemouse.contentLoadURL

I use Firefox web browser, currently on Ubuntu 10.04 LTS. For many years I have set the config option middlemouse.contentLoadURL to true so that middle clicking anywhere in the page (that does not accept input) will load the URL that is in my clipboard.

After restarting my web browser somewhere near the end of January 2012 I found my Firefox 3.x had been upgraded to Firefox 9.x. Also the middle click behaviour no longer worked.

Perusing about:config showed that the option had been set to false again. I set it back to true but on restart of the browser it was set back to false. A bit of searching about found various suggestions about forcing it in my user.js file, but none of those worked either.

Finally, in desperation, I did a search of every file beneath /usr for the string “middlemouse”. Lo and behold:

/usr/lib/firefox-9.0.1/extensions/ubufox@ubuntu.com/defaults/preferences/ubuntu-mods.js

…
pref("middlemouse.contentLoadURL", false); //setting to false disables pasting urls on to the page
…

Commenting this line out once more allowed me to change the setting myself.
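
For the record, the brute-force search that found it was nothing cleverer than:

$ grep -rl middlemouse /usr 2>/dev/null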

It seems this override was discussed by Ubuntu as far back as 2004, but it only became something that I could not override upon the upgrade to Firefox 9.

I reported a bug about this, and one of the comments seems to suggest that Ubuntu changed the method it uses to apply these settings because the old one was breaking Firefox Sync, and that overriding middlemouse.contentLoadURL is considered less bad than breaking Firefox Sync.

Even so, I would suggest that this outcome is very confusing, and that since middlemouse.contentLoadURL is a popular and easily changed setting, it should not be overridden in some obscure file.

As of the recent upgrade to Firefox 11, the file with the override in it has now moved to /usr/share/xul-ext/ubufox/defaults/preferences/ubuntu-mods.js.

Dear System Integrators, a few words about screwing

Right, System Integrators – those companies that buy components from Supermicro et al and build you a server out of them. You guys seem to have a bit of a fascination with screwing. Screwing things in as tight as you can. Please stop.

It’s 100% true that vibration of components like hard disks is bad. Numerous studies have shown that vibration causes performance problems, as drives have to do more corrective work.

However, this does not mean that you have to screw the drives into the caddies to the limit of what is physically possible. They just need to be snug: tightened to the point where light force won’t turn them any further.

When you supply me with a server that’s got four super-tightened screws for each drive in it, and I deploy that server, chances are that one of the first things that will break in that server is one of the disk drives.

During the years those screws have been in there, they haven’t got any looser. If you tightened them all to the limit of your strength and tools, it’s likely that by now the force required to unscrew them exceeds the force needed to deform the screw head. Like this:

Stripped screw heads in a drive caddy

Close-up of a stripped screw head

No, this is not an issue of using the wrong driver head. Yes, you will strip a screw if you use the wrong driver head. That’s why I carry this stuff every time I go to a datacentre:

A selection of screwdrivers for your pleasure

There are two exactly-right drivers in there, and several that should also work despite being a little bit off. I have never had a problem unscrewing any screw that I originally put in, probably because I don’t tighten them like I am some sort of lunatic. I can even unscrew them around a corner with the offset driver. Oh yeah baby. So far nothing I have screwed in with merely normal force has fallen apart.

And this is not an isolated occurrence! Nearly all of you seem to do this with every screw, everywhere. Stop it!

The drive in that caddy is a dead one, and luckily I had a spare caddy with me for the replacement drive to go in, otherwise I too would have been screwed beyond the limits of my endurance.

So now I’ve got to drill those out just to get this caddy back to being useful again. Or, more likely, find someone else to drill them out for me, as I don’t really trust myself with power tools.

ffffuuuuu

Did anyone else get this spam to an address they gave to Red Hat?

On November 2nd I received this spam:

(some headers removed; xxxxxxxxxxx@strugglers.net is my censored email address)

Received: from mail15.soatube.com ([184.105.143.66])
        by mail.bitfolk.com with esmtp (Exim 4.72)
        (envelope-from <bounce@soatube.com>)
        id 1RLikr-00070I-6U
        for xxxxxxxxxxx@strugglers.net; Wed, 02 Nov 2011 21:53:57 +0000
Received: from [64.62.145.53] (mail3.soatube.com [64.62.145.53])
        by mail15.soatube.com (Postfix) with ESMTP id 6B324181CFF
        for <xxxxxxxxxxx@strugglers.net>;
        Wed,  2 Nov 2011 14:46:01 -0700 (PDT)
To: xxxxxxxxxxx@strugglers.net
From: events@idevnews.com
Date: Wed, 02 Nov 2011 14:00:40 -0700
Subject: BPM Panel Discussion: IBM, Oracle and Progress Software

-------------
BPM-CON: BPM Panel Discussion - IBM, Oracle and Progress Software
-------------
Online Conference

Expert Speakers:
IBM, Oracle, Progress Software
etc..

The email address it arrived at is one I created in November 2004 in order to take a web-based test on Red Hat’s web site prior to going on an RHCE course. It has only ever been provided to Red Hat, and had not received any email since 2007 (and all of that was from Red Hat). Until November 2nd.

The spam email contains no reference to Red Hat and is not related to any Red Hat product.

From my point of view, I can only think that one of the following things has happened:

  1. Spammers guessed this email address out of the blue, first time, without trying any of the other possible variations of it, all of which would also have reached me.
  2. One of my computers has been cracked into and the only apparent repercussion is that someone spammed an email address that appears only in an email archive from 2004/2005.
  3. Red Hat knowingly gave/sold my email address to some spammers.
  4. Red Hat or one of its agents have accidentally lost a database containing email addresses.

Possibility #4 seems far and away the most likely.

I contacted Red Hat to ask them if they knew what had happened, but they ignored all of my questions and simply sent me the following statement:

“Hello.

Thank you for contacting Red Hat.

we apologies for the inconvenience caused however we would like to inform you that we have not provided your email address to anyone.

Thank You.

Red Hat Training coordinator.”

That wasn’t really what I was asking. Let’s try again.

“Hi Red Hat Training coordinator,

Thanks for your reply, but I’m afraid I am not very reassured by your response. Do you have any suggestions as to how an email address created in 2004 and used only by yourselves for my RHCE exam managed to be used for unrelated marketing by a third party in 2011, unless Red Hat either provided my email address or leaked my email address?

For clarity we are talking about the email address “xxxxxxxxxxx@strugglers.net” which has never ever received any email except from Red Hat, until yesterday, when it got some unwanted
marketing email from a third party.”

“Hi Andy,

Please be assured that Red Hat does not circulate student’s e-mail address to any third party.

Thanks,
Red Hat Training Coordinator”

I’m not getting anywhere, am I? I was only after some reassurance that they would actually look into it. Maybe they are looking into it, and for some reason decided that the best way to assure me of this was to show complete disinterest.

Oh well, I can send that email address to the bitbucket, but I can’t help thinking it’s not just my email address that has been leaked.

Anyone else received similar email? If so, was it to an address you gave to Red Hat?

Update 2011-11-10: Someone suggested I politely ask the marketer where they obtained my email address. It’s worth a try.

“Hi Integration Developer News,

May I ask where you obtained my email address
“xxxxxxxxxxx@strugglers.net”? I’m concerned that it may have been
given to you without my authority.

Thanks,
Andy”

Also I have now been contacted by someone from Red Hat’s Information Security team, who is looking into it. Thanks!