Sleeping deterministically in a shell script

Occasionally on a Unix machine you want a task to run at the same time on every iteration, without having to specify the exact time. An example is getting multiple machines to mirror something from a master: you would like this to happen, say, every half an hour, but not for every machine to do it at the exact same moment.

If you distribute your configs from a central place then you can’t hard-code a time, since every machine would then hit the master at once. Likewise, if you are suggesting a config for others to use, you do not want to specify a time, because they will hard-code it too.

A technique I’ve used for a few years is to use the “hostid” command to get a sort of serial number for each host:

ruminant$ hostid
0dd44bc6
specialbrew$ hostid
a8c00800

I use this inside sleep commands to pause some tasks for a “random” but deterministic amount of time. For example, in /etc/cron.daily/000-sleep-deterministically I have:

#!/bin/bash

sleep $(($(printf %d 0x$(hostid)) % 24))h

The printf command converts the hex string into a decimal number and the outer $(()) construct then takes it modulo 24, so the whole expression yields a “random” number between 0 and 23. The script then sleeps for that many hours; the h suffix is a GNU sleep extension.
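As a rough worked example, here is the same arithmetic applied to the two host IDs shown earlier. The function name offset_for is just something I made up for illustration; it is not a standard tool.

```shell
#!/bin/bash

# offset_for is a hypothetical helper: it converts a hex host ID to
# decimal and reduces it modulo the given period.
offset_for() {
    local hex_id=$1 period=$2
    local decimal
    # printf %d accepts a 0x-prefixed hex string and prints it in decimal.
    decimal=$(printf '%d' "0x${hex_id}")
    echo $(( decimal % period ))
}

offset_for 0dd44bc6 24   # ruminant, hours for the daily job: prints 6
offset_for a8c00800 24   # specialbrew, hours for the daily job: prints 8
offset_for 0dd44bc6 30   # ruminant, minutes for a half-hourly job: prints 18
offset_for a8c00800 30   # specialbrew, minutes for a half-hourly job: prints 8
```

Note that two hosts can land on the same offset, as both of these do with 8. The aim is only to spread the load out, not to guarantee that every machine gets a unique slot.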

Because it is named 000-sleep-deterministically, it is the first script to run out of /etc/cron.daily/ and so holds up the entire daily cron routine by this “random” number of hours. Because the delay is deterministic, the next run will still be exactly 24 hours later, so we still get a single daily cron run at the same time every day.

The point is that when all the virtual machines on one server do their daily cron it won’t be at the exact same moment for all of them, which can be a real issue when you have 30+ machines on one piece of hardware.

I use the same thing for other cron jobs, often calling them like:

*/30 * * * * sleep $(($(printf \%d 0x$(hostid)) \% 30))m && some_heavy_task

This fires every half an hour but then sleeps for between 0 and 29 minutes. I can use the same cron job on every machine but not have to worry about multiple clients running some_heavy_task and all hitting the server at :00 and :30. The backslashes before each % are needed only in the crontab, where cron treats an unescaped % as a newline.

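“hostid” doesn’t really produce a random number of course; it’s based on the IP address of the host. dg pointed out that “hostid” is still broken in Ubuntu if you have 127.0.1.1 in /etc/hosts: as far as I can tell, a host whose own name resolves to that address derives its ID from 127.0.1.1 rather than from its real IP address, so every such machine ends up with the same value. If you can’t use the workaround suggested then maybe you could use the MAC address of an interface or similar.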
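A minimal sketch of the MAC address idea follows. It is an untested-in-production suggestion rather than something from the original setup. The function name mac_offset is hypothetical, the MAC address is made up for the example, and the eth0 interface name in the final comment is an assumption that will differ between machines.

```shell
#!/bin/bash

# mac_offset is a hypothetical helper: it turns a MAC address such as
# 52:54:00:12:34:56 into an offset between 0 and period-1.
mac_offset() {
    local mac=$1 period=$2
    local hex
    # Strip the colons to leave 12 hex digits (48 bits), which fits
    # comfortably in 64-bit shell arithmetic.
    hex=$(printf '%s' "$mac" | tr -d ':')
    echo $(( 0x${hex} % period ))
}

# Made-up MAC address, purely for illustration: prints 22
mac_offset 52:54:00:12:34:56 30

# On a Linux host, assuming the interface is called eth0:
#   sleep "$(mac_offset "$(cat /sys/class/net/eth0/address)" 30)m"
```

This should be spliced into a crontab line with care: any % inside a crontab command still needs a backslash in front of it, so it is probably simpler to keep the arithmetic in a small script and call that script from cron.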

This technique was also discussed on Debian Administration a while back, but the site seems to be down right now so I can’t find the exact article.