Adventures in entropy, part 1

A while back, a couple of BitFolk customers mentioned to me that they were having problems running out of entropy.

A brief explanation of entropy as it relates to computing

Where we say entropy, we could in layman’s terms say “randomness”. Computers need entropy for a lot of things, particularly cryptographic operations. You may not think that you do a lot of cryptography on your computer, and you personally probably don’t, but for example every time you visit a secure web site (https://…) your computer has to set up a cryptographic channel with the server. Cryptographic algorithms generally require a lot of random data and it has to be secure random data. For the purposes of this discussion, “secure” means that an attacker shouldn’t be able to guess or influence what the random data is.

Why would an attacker be able to guess or influence the random data if it is actually random? Because it’s not actually random. The computer has to get the data from somewhere. A lot of places it might be programmed to get it from may seem random but potentially aren’t. A silly implementation might just use the number of seconds the computer has been running as a basis for generating “random” numbers, but you can see that an attacker can guess this and may even be able to influence it, which could weaken any cryptographic algorithm that uses the “random” data.
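
As a toy illustration (this uses bash’s RANDOM variable, which is seeded by assigning to it, purely to make the point): anyone who can guess roughly how long the machine has been up can reproduce the whole sequence.

    # Hypothetical weak seeding: "random" numbers derived from uptime.
    seed=$(cut -d. -f1 /proc/uptime)   # whole seconds since boot
    RANDOM=$seed                       # assigning to RANDOM seeds bash's PRNG
    echo "$RANDOM $RANDOM $RANDOM"     # same guessable seed, same output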

Modern computers and operating systems generate entropy from events like electrical noise, the timing of data arriving over the network, disk activity and so on, feeding them into algorithms known as pseudo-random number generators (PRNGs). A lot of data goes in and a relatively small amount of entropy comes out, but it’s entropy you should be able to trust.

That works reasonably well for conventional computers and servers, but it doesn’t work so well for virtual servers. Virtual servers run in an emulated environment, with very little access to “real” hardware. The hardware events that conventional computers harvest for entropy mostly don’t exist on emulated virtual hardware, so the prime source of entropy just isn’t present.

When you have an application that wants some entropy and the system has no more entropy to give, what usually happens is that the application blocks, doing nothing, until the system can supply some more entropy. Linux systems have two interfaces through which applications can request entropy: /dev/random and /dev/urandom. /dev/random is the high-quality one; when the pool runs out, reads from it block until more entropy is available. /dev/urandom supplies the same high-quality entropy until the pool runs out, then generates more programmatically, so it never blocks, but its output might not be as secure as /dev/random’s. I’m vastly simplifying how these interfaces work, but that’s the basic gist of it.
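
You can see the difference for yourself; on a machine whose pool is drained, the first of these commands will stall while the second returns immediately (a quick sketch using dd):

    # Pull 64 bytes from each interface.
    dd if=/dev/random of=/dev/null bs=64 count=1    # blocks while the pool is empty
    dd if=/dev/urandom of=/dev/null bs=64 count=1   # never blocks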

What to do when there’s no more entropy?

If you’re running applications that want a lot of high-quality entropy, and your system keeps running out, there are a few things you could do about it.

Nothing

So stuff slows down, who cares? It’s only applications that want high-quality entropy and they’re pretty specialised, right?

Well, no, not really. If you’re running a busy site with a lot of HTTPS connections then you probably don’t want it to be waiting around for more entropy when it could be serving your users. Another one that tends to use all the entropy is secure email – mail servers talking to each other using Transport Layer Security so the email is encrypted on the wire.

Use real hosting hardware

Most of BitFolk’s customers are using it for personal hosting, this problem is common to virtual hosting platforms (it’s not a BitFolk-specific issue), and BitFolk doesn’t provide dedicated/colo servers, so arguably I don’t need to consider this my problem to fix. If the customer could justify greater expense then they could move to a dedicated server or colo provider to host their stuff.

Tell the software to use urandom instead

In a lot of cases it’s possible to tell the applications to use urandom instead. Since urandom doesn’t block, but instead generates more lower-quality entropy on demand, there shouldn’t be a performance problem. There are obvious downsides to this:

  • If the application author wanted high-quality entropy, it might be unwise to not respect that.
  • Altering this may not be as simple as changing its configuration. You might find yourself having to recompile the software, which is a lot of extra work.

You could force this system-wide by replacing your /dev/random with /dev/urandom.
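
For instance (a sketch only: /dev/random is character device 1,8 and /dev/urandom is 1,9, and this change won’t survive a reboot unless your boot scripts or udev configuration repeat it):

    # Replace /dev/random with a device node that behaves like urandom.
    mv /dev/random /dev/random.blocking
    mknod /dev/random c 1 9
    chmod 666 /dev/random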

Customers could get some more entropy from somewhere else

It’s possible to feed your own data into your system’s pseudo-random number generator, so if you have a good source of entropy you can help yourself (a sketch of how to do this follows the list). People have used some weird and wonderful things for entropy sources. Some examples:

  • A sound card listening to electro-magnetic interference (“static”).
  • A web camera watching a lava lamp.
  • A web camera in a dark box, so it just sees noise on its CCD.
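
On Linux, anything written to /dev/random (or /dev/urandom) is mixed into the pool, though a plain write doesn’t increase the kernel’s entropy estimate; crediting the estimate requires the RNDADDENTROPY ioctl, which is what daemons like rngd and ekeyd use. A minimal sketch, assuming a hypothetical noise source at /dev/noise-source:

    # Stir external noise into the kernel's pool. This mixes the data in
    # but does not credit entropy_avail; a daemon using RNDADDENTROPY would.
    dd if=/dev/noise-source of=/dev/random bs=512 count=1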

The problem for BitFolk customers of course is that all they have is a virtual server. They can’t attach web cams and sound cards to their servers! If they had real servers then they probably wouldn’t be having this issue at all.

BitFolk could get some entropy from somewhere else, and serve it to customers

BitFolk has the real servers, so I could do the above to get some extra entropy. I might not even need extra entropy; I could just serve the entropy that the real machines have. If it wasn’t for the existence of the Simtec Electronics Entropy Key then that’s probably what I’d be trying.

I haven’t got time to be playing about with sound cards listening to static, webcams in boxes and things like that, but buying a relatively cheap little gadget is well within the limit of things I’m prepared to risk wasting money on. 🙂

Customers would need to trust my entropy, of course. They already need to trust a lot of other things that I do though.

Entropy Key

Entropy Keys are very interesting little gadgets and I encourage you to read about how they work. It’s all a bit beyond me though, so for the purposes of this series of blog posts I’ll just take it as read that you plug an Entropy Key into a USB port, run ekeyd, and it feeds high-quality entropy into your PRNG.
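
On a Debian-style system the setup is roughly as follows (a sketch; ekeyd was the Debian package name at the time, but names and init arrangements may differ on your distribution):

    # Install the daemon; the package's init script starts it with the
    # key plugged in, and the pool should then climb towards its maximum.
    apt-get install ekeyd
    cat /proc/sys/kernel/random/entropy_avail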

I’d been watching the development of the Entropy Key with interest. When they were offered for cheap at the Debian-UK BBQ in 2009 I was sorely tempted, but I knew I wasn’t going to be able to attend, so I left it.

Then earlier this year, James at Jump happened to mention that he was doing a bulk order (I assume to fix this same issue for his own VPS customers) if anyone wanted in. Between the Debian BBQ and then I’d had a few more complaints about people running out of entropy, so at ~£30 each I thought it was definitely worth experimenting with one of them, perhaps buying more if it worked.

How much entropy do I have anyway?

Before stuffing more entropy into my systems, I was curious how much I had available anyway. On Linux you can check this by looking at /proc/sys/kernel/random/entropy_avail. The value is in bits, and with the default pool size it tops out at 4096. It’s not hard to plug this into your graphing system.
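
Sampling it is trivial; a graphing plugin boils down to something like this (a sketch, with an illustrative log path):

    # Record available entropy (in bits) every five minutes.
    while sleep 300; do
        printf '%s %s\n' "$(date +%s)" "$(cat /proc/sys/kernel/random/entropy_avail)"
    done >> /var/log/entropy_avail.log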


Typical host server, no Entropy Key

Here’s what some typical BitFolk VM hosting servers have in terms of available entropy.

[graph: barbar.bitfolk.com available entropy, daily]

That’s pretty good. The available entropy hovers close to 4096 bits all the time. It’s what you’d expect from a typical piece of computer hardware. The weekly view shows the small jitter:

[graph: barbar.bitfolk.com available entropy, weekly]

The lighter pink area is the highest 5-minute reading in each 30-minute sample. The dark line is the lowest 5-minute reading. You can see that there is a small amount of jitter where the available entropy fluctuates between about 3250 and 4096 bits.

Here are a couple of the other host servers, just to show the pattern:

[graph: corona.bitfolk.com available entropy, daily]

[graph: corona.bitfolk.com available entropy, weekly]

[graph: faustino.bitfolk.com available entropy, daily]

[graph: faustino.bitfolk.com available entropy, weekly]

No surprises here; they’re all much the same. If these were the only machines I was using then I’d probably decide that I have enough entropy.

Typical general purpose Xen-based paravirtualised virtual machine

Here’s a typical general purpose BitFolk VPS. It’s doing some crypto stuff, but there’s a good mix of every type of workload here.

[graph: bitfolk.com available entropy, daily]

[graph: bitfolk.com available entropy, weekly]

These graphs are very different. There’s much more jitter and a general lack of entropy to begin with. Still, it never appears to reach zero (though it’s important to realise that these graphs are at best 5-minute averages, so the true minimum and maximum within each 5-minute span will be lower and higher than what is shown), so there doesn’t seem to be a huge problem here.

Virtual machines with more crypto

Here are a couple of VMs which are doing more SSL work.

[graph: cacti.bitfolk.com available entropy, daily]

[graph: cacti.bitfolk.com available entropy, weekly]

This one has a fair number of web visitors and they’re all HTTPS. You can see that it’s even more jittery, and spends most of its time with less than 1024 bits of entropy available. It goes as low as ~140 bits from time to time, and because of the 5-minute sampling it’s possible that it does run out.

[graph: panel0.bitfolk.com available entropy, daily]

[graph: panel0.bitfolk.com available entropy, weekly]

Again, this one has some HTTPS traffic and is faring worse for entropy, with an average of only ~470 bits available. I ran a check every second for several hours and available entropy at times was as low as 133 bits.
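
A check like that can be as simple as a loop over the same /proc file (a sketch; note that spawning cat itself consumes a little entropy, as discussed in the comments below):

    # Poll once a second and report each new low-water mark.
    min=4096
    while sleep 1; do
        now=$(cat /proc/sys/kernel/random/entropy_avail)
        if [ "$now" -lt "$min" ]; then
            min=$now
            echo "new low: $min bits"
        fi
    done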

Summary so far

BitFolk doesn’t have any particularly busy crypto-heavy VMs so the above was the best I could do. I think that I’ve shown that virtual machines do have less entropy generally available, and that a moderate amount of crypto work can come close to draining it.

Based on the above results I probably wouldn’t personally take any action, since it seems none of my own VMs run out of entropy, although I am unsure whether the 133 bits I measured is simply as low as the pool is allowed to go before blocking happens. In any case, I am not really noticing poor performance.

Customers have reported running out of entropy though, so it might still be something I can fix, for them.

Where next?

Next:

  • See what effect using an Entropy Key has on a machine’s available entropy.
  • Assuming it has a positive effect, see if I can serve this entropy to other machines, particularly virtual ones.
  • Can I serve it from a virtual machine, so I don’t have customers interacting with my real hosts?
  • Does one Entropy Key give enough entropy for everyone that wants it?
  • Can I add extra keys and serve their entropy in a highly-available fashion?

Those are the things I’ll be looking into and will blog some more about in later parts. This isn’t high priority though so it might take a while. In the meantime, if you’re a BitFolk customer who actually is experiencing entropy exhaustion in a repeatable fashion then it’d be great if you could get in touch with me so we can see if it can be fixed.

In part 2 of this series of posts I do get the key working and serve entropy to my virtual machines.

11 thoughts on “Adventures in entropy, part 1”

  1. It’s worth noting that simply using cat on /proc/sys/kernel/random/entropy_avail depletes your entropy. Try:

    watch -n 0.25 cat /proc/sys/kernel/random/entropy_avail

    and watch your free entropy drain away. Granted, checking every 5 minutes probably won’t hurt too much, but it’s an interesting effect.

  2. There are also kernel patches (grsec iirc) that increase the poolsize to 16384, which certainly helps in some situations.

  3. Hugo,

    Good point, I forgot about that. I think that’s because a certain amount of entropy is required to create a process.

    With 0.25 seconds between “cat” processes it made the available entropy jump around a lot but not really run out. I ran it with 0.1 seconds and I was able to bring available entropy down to hover around 0.

    I have an ekey plugged into a machine now and when I tried it on that one, it stayed mostly around 4096. I saw it drop to 3084 for a fraction of a second before it went back up again. That’s encouraging, if nothing else.

    This is the machine I have the ekey plugged into:

    http://tools.bitfolk.com/cacti/graph_2012.html

    At ~0430 UTC I shut down the ekeyd so it wasn’t getting any entropy from the ekey. I don’t yet understand why the available entropy dropped to ~400 bytes. Unfortunately I have no graphs of this machine from before the ekey was ever plugged in. It could be the normal state for it to have only ~400 bytes available, although this would make it unique amongst my host machines.

In the next hour that graph should show a little dip, which is where I ran the “watch -n 0.1 cat …”.

  4. mksh now has a cat(1) builtin in the current version (not yet released)… so no process creation 😉 You could of course put it into some self-contained dæmon – which can use the ioctl instead of procfs, too.

    I’m a happy eKey user tho…

  5. Firstly, thanks for this great writeup, very useful. I manage a number of non-virtualised servers that nonetheless suffer from severe entropy-starvation. It may be because they are not very highly loaded IO-wise or because I try to be privacy conscious and encourage all my clients to use SSH/SSL/TLS as much as possible. But anyway, I needed a solution. The end result is that I’ve set up an OpenVPN network across all my servers and distribute entropy over EGD from one of them equipped with an EntropyKey. This solves the unencrypted-TCP problem, and with topped up entropy pools on all machines, OpenVPN should be very secure 😉

    On a side note, the default entropy pool on linux currently is 4096 *bits*, not even bytes. That seems very limited, but of course the size of this buffer is only relevant if you have spikes in your entropy demand (or dips in its generation).

    1. It seems to be based around a laser beam, and isn’t yet commercialised. I don’t think I will be plugging this into my colo servers any time soon!

  6. Great write up Andy. Always nice to see stuff from 7 years ago is still very relevant today! There are a lot more entropy generation sources around today, but I want to throw out https://getnetrandom.com – this is an online entropy as a service, which uses another quantum-based source for true entropy. Give it a go! (full disclosure, I work here)
