Musings on link shorteners, UUIDv7, Base58, …

Yesterday I started thinking about maybe learning some Rust and as part of that I thought perhaps I might try to implement a link shortener.

Disclaimer ^

Now, clearly there are tons of existing commercial link shorteners. I’m not interested in making a commercial link shortener since:

  • I don’t think I am capable of doing a great job of it.
  • I think being a commercial success here would involve a lot of activity
    that I don’t like.

There are also plenty of existing FOSS projects that implement a link shortener, in almost every language you can think of. That’s interesting from an inspiration point of view but my goal here is to learn a little bit about Rust so there’s no point in just setting up someone else’s project.

So anyway, point is, I’m not looking to make something commercially viable and I don’t think I can make something better than what exists. I’m just messing about and trying to learn some things.

Seems like a good project to learn stuff – it can be very very simple, but grow to include many different areas such as databases, REST API, authentication and so on.

On Procrasturbation ^

The correct thing to do at this point is to just get on with it. I should not even be writing this article! I should be either doing paying work, life admin, or messing about with my learning project.

As usual though, my life is a testament to not doing the correct thing. 😀 Charitably I’ll say that I can’t stop myself from planning how it should be done rather than just doing it. Realistically there is a lot of procrastination in there too.

So Many Open Questions ^

I hope I’ve made clear that this is a learning goal for me, so it follows that I have a lot of open questions. If you came here looking for expert guidance then you’ve come to the wrong place. If you have answers to my questions that would be great though. And of the few assertions I do make, if you disagree then I’d also like to hear that opinion.

Thinking About Link Shorteners ^

Is enumeration bad? ^

Obviously the entire point of a link shortener is to map short strings to long ones. As a consequence the key space of short strings is quite easy to iterate through and the pressure is always there to keep the space small as that’s what makes for appealing short links. Is this bad? If it is bad, how bad is it and so how much should the temptation to keep things short be resisted?

An extremely naive link shortener might just increment a counter (and perhaps convert decimal digits to a string with more symbols so it’s shorter). For example:

  • example.com/0
  • example.com/1
  • example.com/2
  • example.com/99
  • example.com/100

That’s great for the shortest keys possible but it’s trivial for anyone in the world to just iterate through every link in your database. Users will have an expectation that links they shorten and do not show to anyone else remain known only by them. People shorten links to private documents all the time. But with a simple counter every guess results in a working (or at least, submitted) link, and neighbouring links would be proximal in time: link 99 and link 100 were likely submitted very close together in time, quite possibly by the same person.

A simple counter seems unacceptable here.

But what can a link shortener actually do to defend against enumeration? The obvious answer is rate limiting. Nobody should be doing thousands of GET requests against the shortener. And if the key space was made sparse so that some of these GET requests result in a 404, that’s also highly suspicious and might make banning decisions a lot easier.

Therefore, I think there should be rate limiting, and the key space should be sparse so that most guesses result in a 404 error.

When I say “sparse key space” I mean that the short link key that is generated should be randomly and evenly distributed over a much larger range than is required to fit all links in the database.

How sparse though? Should random guesses result in success 50% of the time? 1%? 0.1%? I don’t know. I don’t have a feel for how big a key space would be to begin with. There is always the tension here against the primary purpose of being short!

If that’s not clear, consider a hash function like md5. You can feed anything into md5 and it’ll give you back a 128 bit value which you could use as the key (the short link). Even if you have billions of links in your database, most of that 128 bit key space will be empty.

(If you’re going to use a hash function for this, there’s much better hash functions than md5, but just to illustrate the point.)

The problem here is, well, it’s 128 bits though. You might turn it into hex or Base64 encode it but there’s no escaping the fact that 128 bits of data is Not Very Short and never will be.

Even if you do distribute over a decently large key space you’ll want to cut it down for brevity purposes, but it’s hard (for me) to know how far you can go with that. After all, if you have just 32 links in your database then you could spread them out between /00 and /ff using only hex and only 1 in 8 guesses (32 out of 256) would correspond to a working link, right?

I don’t know if 1 in 8 is a small enough hit rate, especially at the start when it’s clear to an attacker that your space has just 256 possible values.

Alphabets ^

Moving on from the space in which the keys exist, what should they actually look like?

Since I don’t yet know how big the key space will be, but do think it will have to start big and be cut down, maybe I will start by just looking at various ways of representing the full hash to see what kind of compression can be achieved.

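I’ve kind of settled on the idea of database keys being UUIDv7. A UUIDv7 is 128 bits, although 6 bits of it are reserved for the version and variant fields. Out of the top 64 bits, 48 are a millisecond Unix timestamp, 4 are the version, and the remaining 12 can carry either extra timestamp precision or random data, so up to 60 of them are time-based. Of the bottom 64 bits, 2 are the variant and 62 are essentially random data.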

I’m thinking that these database keys will be private so it doesn’t matter that if you had one you could extract the submit time out of it (the top 60 bits). The purpose of having the first half of the key be time-based is to make them a bit easier on the database, providing some locality. 128 bits of key is massive overkill but I think it’s worth it for the support (for UUIDv7) across multiple languages and applications.
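To make that layout a bit more concrete, here is a minimal Python sketch (not the Rust I intend to write) that pulls a UUIDv7 apart into its fields. It assumes the field layout from RFC 9562; the function name split_uuid7 is made up for illustration, and the UUID is the example one from the program output later in this post.

```python
import uuid
from datetime import datetime, timezone


def split_uuid7(u: uuid.UUID) -> dict:
    """Split a UUIDv7 into its fields (layout assumed from RFC 9562)."""
    n = u.int
    return {
        "unix_ts_ms": n >> 80,           # top 48 bits: milliseconds since the epoch
        "version": (n >> 76) & 0xF,      # next 4 bits: always 7 for UUIDv7
        "rand_a": (n >> 64) & 0xFFF,     # next 12 bits: extra precision or random
        "variant": (n >> 62) & 0x3,      # next 2 bits: binary 10
        "rand_b": n & ((1 << 62) - 1),   # bottom 62 bits: random
    }


example = uuid.UUID("018f244b-942b-7007-927b-ace4fadf4a88")
fields = split_uuid7(example)

# Anyone holding the UUID can recover the submit time from the top bits.
submitted = datetime.fromtimestamp(fields["unix_ts_ms"] / 1000, tz=timezone.utc)
print(fields["version"], fields["variant"], submitted.isoformat())
```

This is exactly why the UUID itself should stay private: the timestamp falls straight out of it.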

As I say, I know I’m not going to use all of the 128 bits of the UUIDv7 to generate a short key but just to see what different representations would look like I will start with the whole thing.

Base64 ^

The typical answer to this sort of thing is Base64. A Base64 representation of 128 bits looks like this:

$ dd if=/dev/urandom bs=16 count=1 status=none | base64
uLU3DiqA74492Ma6IMXfyA==

The == at the end are padding and if this doesn’t need to be decoded, i.e. it’s just being used as an identifier — as is the case here — then they can be omitted. So that’s a 22-character string.

Base64URL ^

Base64 has a few issues when used as parts of URLs. Its alphabet contains ‘+’, ‘/’ and (when padding is included) ‘=’, all of which are difficult when included in a URL string.

Base64URL is a modified Base64 alphabet that uses ‘-’ and ‘_’ instead and has no padding. Apart from being friendly for URLs it will also be 22 characters.
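As a quick check of those lengths, here is a small Python sketch that encodes 16 random bytes both ways. It only uses the standard library; note that Python’s urlsafe_b64encode keeps the padding, so the sketch strips it by hand.

```python
import base64
import os

raw = os.urandom(16)  # 128 random bits, like the dd command above

# Standard Base64: may contain '+' and '/', and is padded with '='.
standard = base64.b64encode(raw).decode("ascii")

# Base64URL: '-' and '_' replace '+' and '/'; padding stripped manually.
url_safe = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")

print(len(standard), standard)
print(len(url_safe), url_safe)
```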

Base58 ^

There are additional problems with Base64 besides its URL-unfriendly alphabet. Some of it is also unfriendly to human eyesight. Its alphabet contains ‘1’, ‘l’, ‘O’ and ‘0’ which are easy to confuse with each other.

The Bitcoin developers came up with Base58 (but let’s not hold that against it…) in order to avoid these transcription problems. Although short links will primarily be copied, pasted and clicked on it does seem desirable to also be able to easily manually transcribe them. How much would we pay for that, in terms of key length?

A Base58 of 128 bits looks like this:

CAfx7fLJ3YBDDvuwwEEPH

That happens to be 21 characters which implies it is somehow shorter than Base64 despite the fact that Base58 has a smaller alphabet than Base64. How is it possible?

It’s because each character of Base58 encodes a fractional amount of data — 58 isn’t a power of 2 — so depending upon what data you put in sometimes it will need 22 characters and other times it only needs 21.

It can be quantified like this:

  • Base64: log2(64) = 6 bits encoded per character.
  • Base58: log2(58) = 5.857980995127572 bits encoded per character.

It seems worth it to me. It’s very close.
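Just to convince myself of the arithmetic, here is a rough Python sketch of the same comparison. The helper max_chars is a made-up name; it finds the worst-case length with integer arithmetic rather than trusting floating point.

```python
import math


def max_chars(bits: int, base: int) -> int:
    """Smallest k such that base**k covers every possible value of `bits` bits."""
    k = 0
    while base ** k < 2 ** bits:
        k += 1
    return k


print(math.log2(64), math.log2(58))

# Worst-case length of a 128-bit value in each alphabet.
print(max_chars(128, 64), max_chars(128, 58))

# Fraction of all 128-bit values small enough to fit in 21 Base58 characters.
fits_in_21 = 58 ** 21 / 2 ** 128
print(fits_in_21)
```

So the worst case is 22 characters for both alphabets, and only a small fraction of uniformly random 128-bit values would fit in 21. The example here gets away with 21 because it is a UUIDv7: its first byte is 01, which keeps the number comparatively small.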

How much to throw away ^

In order to help answer that question I wanted to visualise just how big various key spaces would be. Finally, I could no longer avoid writing some code!

I’d just watched Jeremy Chone’s video about UUIDs and Rust, so I knocked together this thing that explores UUIDv7 and Base58 representations of (bits of) it. This is the first Rust I have ever written so there’s no doubt lots of issues with it.

The output looks like this:

Full UUID:
  uuid v7 (36): 018f244b-942b-7007-927b-ace4fadf4a88
Base64URL (22): AY8kS5QrcAeSe6zk-t9KiA
   Base58 (21): CAfx7fLJ3YBDDvuwwEEPH
 
Base58 of bottom 64 bits:
              Hex bytes: [92, 7b, ac, e4, fa, df, 4a, 88]
 
Base58 encodes log2(58) = 5.857980995127572 bits per character
 
IDs from…   Max chars Base58          Can store
…bottom 64b 11        RW53EVp5FnF =   18,446,744,073,709,551,616 keys
…bottom 56b 10        5gqCeG4Uij  =       72,057,594,037,927,936 keys
…bottom 48b  9        2V6bFSkrT   =          281,474,976,710,656 keys
…bottom 40b  7        SqN8A3h     =            1,099,511,627,776 keys
…bottom 32b  6        7QvuWo      =                4,294,967,296 keys
…bottom 24b  5        2J14b       =                   16,777,216 keys
…bottom 16b  3        6fy         =                       65,536 keys

The idea here is that the bottom (right most) 64 bits of the UUIDv7 are used to make a short key, but only as many bytes of it as we decide we need.

So for example, if we decide we only need two bytes (16 bits) of random data then there’ll be 2^16 = 65,536 possible keys which will encode into three characters of Base58; all short links will be 3 characters for a while.
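For anyone who would rather not read my first-ever Rust, the same idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the program that produced the output above: it assumes the Bitcoin Base58 alphabet and big-endian byte order, and the names base58_encode and short_key are my own.

```python
import uuid

# Bitcoin's Base58 alphabet: no 0, O, I or l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as Base58 by treating them as one big-endian integer."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    # Bitcoin-style Base58 keeps leading zero bytes as leading '1' characters.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def short_key(u: uuid.UUID, n_bytes: int) -> str:
    """Short key made from the bottom (right-most) n_bytes of the UUID."""
    return base58_encode(u.bytes[-n_bytes:])


example = uuid.UUID("018f244b-942b-7007-927b-ace4fadf4a88")
for n_bytes in (2, 3, 4):
    print(n_bytes * 8, short_key(example, n_bytes))
```

For the two-byte and three-byte cases this gives the same keys as the 16b and 24b rows of the table.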

When using only a few bytes of the UUID there will of course be collisions. These will be rare so I don’t think it will be an issue to just generate another UUID. As the number of existing keys grows, more bytes can be used.

Using more bytes will also enforce how sparse the key space is.

For example, let’s say we decide that only 1 in 1,000 random guesses should hit upon an existing entry. The first 65 links can be just three characters in length. After that the key space has to increase to 4 characters. That gets us 4 × 5.857980995127572 = 23 and change bits of entropy, which is 2^23 = 8,388,608 keys. Once we get to 8,388 links in the database we have to go to 5 characters, which with 24 bits (2^24 = 16,777,216 keys) sees us through to 16,777 total links.
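Those thresholds are easy to tabulate. The Python sketch below is only a rough aid: it follows the whole-byte steps of the table above (so it skips the 23-bit, 4-character step), the 1 in 1,000 hit rate is just the example figure, and the function names are invented for illustration.

```python
def max_links(bits: int, one_in: int = 1000) -> int:
    """Most links a `bits`-bit key space can hold if only 1 in `one_in` guesses may hit."""
    return 2 ** bits // one_in


def base58_chars(bits: int) -> int:
    """Worst-case number of Base58 characters needed for a `bits`-bit value."""
    k = 0
    while 58 ** k < 2 ** bits:
        k += 1
    return k


for bits in (16, 24, 32, 40):
    print(f"{bits} bits: up to {base58_chars(bits)} chars, "
          f"room for {max_links(bits):,} links")
```

Changing one_in shows how quickly a stricter hit rate eats into the number of links each key length can hold.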

Wrap Up ^

Is that good enough? I don’t know. What do you think?

Ultimately you will not stop determined enumerators. They will use public clouds to request from a large set of sources and they won’t go sequentially.

People should not put links to sensitive-but-publicly-requestable data in link shorteners. People should not put sensitive data anywhere that can be accessed without authentication. Some people will sometimes put sensitive data in places where it can be accessed. I think it’s still worth trying to protect them.

Aside ^

I have some customers who run personal link shorteners that they keep open to the public (i.e. anyone can submit a link), and these constantly get used for linking to malicious content. People link to phishing pages and malware and then put the shortlink into their spam emails so that URL-based antispam is confused. It is a constant source of administrative burden.

If I ever get a minimum viable product it will not allow public link submission.

PowerDNS Truncated SOA Response Problem

I recently upgraded bind9 on my primary nameserver and soon after I noticed that one particular zone would no longer transfer to my secondary nameservers, which run PowerDNS. All the PowerDNS servers were saying:

Nov 18 00:25:26 daiquiri pdns_server[32452]: While checking domain freshness: Query to '2001:ba8:1f1:f085::53' for SOA of 'example.com' did not return a SOA

The confusing thing was that manually using dig to query for this did work fine:

daiquiri$ dig +short -t soa example.com @2001:ba8:1f1:f085::53
ns0.example.com. bind.example.com. 1668670704 28800 14400 3600000 86400

After scratching my head for several hours over this yesterday, I eventually broke out tcpdump and was surprised to see that the response to PowerDNS’s SOA query was indeed empty. And it was also truncated!

Back to dig, I could see that this zone was DNSSEC-signed and the SOA query with DNSSEC info was 2293 bytes in size:

daiquiri$ dig +dnssec -t soa example.com @2001:ba8:1f1:f085::53 | grep MSG
;; MSG SIZE  rcvd: 2293

That’s bigger than a DNS response can be in UDP, so it truncates and the client is supposed to retry over TCP. dig has no problem doing that, but PowerDNS can’t (yet).

Specifically what has changed in bind9 is the EDNS buffer size, down from its previous default of 4096 bytes to 1232 bytes.

I can stop PowerDNS from doing the SOA check at all by upgrading all PowerDNS servers to v4.7.x and using the secondary-check-signature-freshness=no option.

I could put bind9’s EDNS buffer size back up to 4096, but it doesn’t seem advisable to go over about 1400 bytes and so that won’t help.

For now I have enabled the minimal-responses option in bind9, which drops extra records from the Authority and Additional sections of responses unless they are absolutely required. This reduces the response size of that SOA query to 685 bytes, so it no longer truncates and PowerDNS is happy.

I’m not sure if an SOA response can ever go above 1232 bytes now. Maybe as DNSSEC signatures get bigger. So this might not be a permanent solution and hopefully PowerDNS will gain the ability to retry those SOA queries over TCP.

Exim: Adding the Autonomous System Number as a header in received emails

Updates ^

2022-11-05 ^

  • Added a bit about timeouts, as concern was expressed that I am “bonkers”.

The Problem ^

For statistical purposes I wanted to add the Autonomous System Number (ASN) for the IP address of the connecting host as a header in the received email, like this:

X-ASN: AS63949 2a01:7e01::/32

The Answer ^

You can obtain this information through a DNS query to Team Cymru:

$ sipcalc -r 2a01:7e01::f03c:92ff:fe32:a408
-[ipv6 : 2a01:7e01::f03c:92ff:fe32:a408] - 0
 
[IPV6 DNS]
Reverse DNS (ip6.arpa)  -
8.0.4.a.2.3.e.f.f.f.2.9.c.3.0.f.0.0.0.0.0.0.0.0.1.0.e.7.1.0.a.2.ip6.arpa.
 
-
$ dig +short -t txt 8.0.4.a.2.3.e.f.f.f.2.9.c.3.0.f.0.0.0.0.0.0.0.0.1.0.e.7.1.0.a.2.origin6.asn.cymru.com
"63949 | 2a01:7e01::/32 | US | ripencc | 2011-02-01"

Or for legacy Internet addresses:

$ dig +noall +question -x 199.59.150.116
;116.150.59.199.in-addr.arpa.   IN      PTR
$ dig +short -t txt 116.150.59.199.origin.asn.cymru.com
"13414 | 199.59.148.0/22 | US | arin | 2010-11-23"

So for IPv6 addresses the process is:

  1. Expand the address out fully (2a01:7e01::f03c:92ff:fe32:a408 → 2a01:7e01:0000:0000:f03c:92ff:fe32:a408)
  2. Remove the colons (2a01:7e01:0000:0000:f03c:92ff:fe32:a408 → 2a017e0100000000f03c92fffe32a408)
  3. Reverse it (2a017e0100000000f03c92fffe32a408 → 804a23efff29c30f0000000010e710a2)
  4. Add a dot after every hexadecimal digit (804a23efff29c30f0000000010e710a2 → 8.0.4.a.2.3.e.f.f.f.2.9.c.3.0.f.0.0.0.0.0.0.0.0.1.0.e.7.1.0.a.2.)
  5. Add origin6.asn.cymru.com on the end (8.0.4.a.2.3.e.f.f.f.2.9.c.3.0.f.0.0.0.0.0.0.0.0.1.0.e.7.1.0.a.2.origin6.asn.cymru.com)
  6. Query that TXT record and parse out the first two values separated by ‘|’ in the response.

For legacy IP addresses the process is much simpler; reverse the octets, add origin.asn.cymru.com on the end and query that.
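Outside of Exim, the name-building and parsing steps above can be sketched in Python with just the standard library. This does not perform the DNS query itself; the function names are my own, and the TXT record fed to the parser is the one dig returned above.

```python
import ipaddress


def cymru_origin_name(address: str) -> str:
    """Build the Team Cymru IP-to-ASN query name for an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(address)
    if ip.version == 6:
        # Expand fully, drop the colons, reverse, and dot-separate each hex digit.
        nibbles = ip.exploded.replace(":", "")
        return ".".join(reversed(nibbles)) + ".origin6.asn.cymru.com"
    # Legacy IP: just reverse the octets.
    octets = str(ip).split(".")
    return ".".join(reversed(octets)) + ".origin.asn.cymru.com"


def x_asn_header(txt_record: str) -> str:
    """Turn a TXT response into the X-ASN header using its first two fields."""
    fields = txt_record.strip('"').split("|")
    asn, prefix = (field.strip() for field in fields[:2])
    return f"X-ASN: AS{asn} {prefix}"


print(cymru_origin_name("2a01:7e01::f03c:92ff:fe32:a408"))
print(cymru_origin_name("199.59.150.116"))
print(x_asn_header('"63949 | 2a01:7e01::/32 | US | ripencc | 2011-02-01"'))
```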

An Exim Answer ^

In Exim configuration you can do it like this:

(This is meant to go inside an ACL like your check_rcpt or check_data. Maybe near the end of check_data at the point where you’ve already decided to accept the email. No point in doing this for an email you will reject.)

# Add X-ASN: header for IPv6 senders.
  warn message = X-ASN: AS${sg{${extract{1}{|}{$acl_m9}}}{\N\s+\N}{}} ${sg{${extract{2}{|}{$acl_m9}}}{\N\s+\N}{}}
     condition = ${if isip6{$sender_host_address}}
    set acl_m9 = ${lookup dnsdb{txt=${reverse_ip:$sender_host_address}.origin6.asn.cymru.com}}
 
# Add X-ASN: header for legacy IP senders.
  warn message = X-ASN: AS${sg{${extract{1}{|}{$acl_m9}}}{\N\s+\N}{}} ${sg{${extract{2}{|}{$acl_m9}}}{\N\s+\N}{}}
     condition = ${if isip4{$sender_host_address}}
    set acl_m9 = ${lookup dnsdb{txt=${reverse_ip:$sender_host_address}.origin.asn.cymru.com}}

I dislike that I’ve had to use two tests that are almost exactly the same except they query slightly different DNS names (origin6.asn.cymru.com vs origin.asn.cymru.com). I’m sure it could be done in one, but I’m not good enough with the Exim string evaluations. They send me cross-eyed. I couldn’t find a better way so I decided to use the time-honoured tactic of posting what I have in order to provoke people into correcting me. Please let me know if you can improve it!

The amount of nested {} will probably drive you mad, but basically:

  • ${reverse_ip:$sender_host_address} handles expanding and reversing an IP address into the form you would use for a reverse DNS query.
  • That gets queried in DNS with the correct suffix and the full response stored in $acl_m9.
  • warn message = X-ASN: adds a header to the email, the content of which is built from two fields extracted out of $acl_m9 with all whitespace removed (${sg{source}{regex}{replacement}}).

What about timeouts? ^

One piece of feedback I got was that I am “bonkers” to make my email delivery rely on a real time network lookup. I can kind of see the argument, but also not: this is a DNS query exactly the same as a typical DNSBL query (Team Cymru IP-to-ASN service is used exactly like a typical DNSBL).

Most people’s mail servers do multiple DNSBL queries already and nobody really is up in arms saying it’s bonkers to do so. My Exim already does a couple of DNSBL queries and then if it is going to deliver the email it will call out to SpamAssassin which does many DNSBL queries itself. If these hit a timeout then it would slow down my mail delivery.

In the past where a DNSBL has unceremoniously shut down and made its nameservers unresponsive I have seen problems, as it caused the delivery processes to pile up while they waited on their timeouts and then Exim would complain that there’s too many processes. That would be resolved by removing the errant DNSBL(s) from the configuration.

Query load is not a concern as DNS is highly scalable and my system is not going to add noticeable load to Team Cymru’s already public service. The SpamAssassin ASN plugin is already out there, hard coded to use this same service and must have many many users already.

As far as I can tell, in Exim dnsdb queries use the same timeouts and retries as dnslist queries do, that being controlled by the dns_retrans and dns_retry settings. These settings both default to 0, which means “operating system / resolver library default”. If you were worried you could explicitly set these to their minimum value:

…
set acl_m9 = ${lookup dnsdb{retrans_1s,retry_1,txt=${reverse_ip:$sender_host_address}.origin6.asn.cymru.com}}
…

If still worried then you would first have to either turn off all DNSBLs or make sure you had local copies of them (e.g. by arranging AXFR to your own local servers). Then to do the IP-to-ASN locally you’d arrange to have a local BGP feed that you could query. I think you’d need to have an absolutely huge mail server before these issues became real concerns.

As for dnslist, the consequence of a time out is that you get no data, so it would just result in an empty header.

But Why? ^

I’ve actually been doing this for a while with SpamAssassin’s ASN plugin but I’ve changed the way in which I query SpamAssassin and now I don’t directly get the rewritten email that SpamAssassin makes (with its X-Spam-ASN: header in).

I use it for feeding into Bayes to see if there’s a particular prevalence of ASN amongst the email that is classified as spam, and I sometimes add a few points on manually for ASNs that are particularly bad. That is a lot less work than trying to track down all their IP addresses and keep that up to date.

Using Duplicity to back up to Amazon S3 over IPv6 (only)

Scenario ^

I have a server that I use for making backups. I also send backups from that server into Amazon S3 at the “Infrequent Access” storage class. That class is cheaper to store but expensive to access. It’s intended for backups of last resort that you only access in an emergency. I use Duplicity to handle the S3 part.

(I could save a bit more by using one of the “Glacier” classes but at the moment the cost is minimal and I’m not that brave.)

I recently decided to change which server I use for the backups. I noticed that renting a server with only IPv6 connectivity was cheaper, and as all the hosts I back up have IPv6 connectivity I decided to give that a go.

This mostly worked fine. The only thing I really noticed was when I tried to install some software from GitHub. GitHub doesn’t support IPv6, so I had to piggy back that download through another host.

Then I came to set up Duplicity again and found that I needed to make some non-obvious changes to make it work with S3 over IPv6-only.

S3 endpoint ^

The main issue is that the default S3 endpoint URL is https://s3.<region>.amazonaws.com, and this host only has an A (IPv4) record! For example:

$ host s3.us-east-1.amazonaws.com
s3.us-east-1.amazonaws.com has address 52.216.89.254

If you run Duplicity with a target like s3://yourbucketname/path/to/backup then it will try that endpoint, get only an IPv4 address, and return Network unreachable.

S3 does actually support IPv6, but for that to work you need to use a dual stack endpoint! They look like this:

$ host s3.dualstack.us-east-1.amazonaws.com
s3.dualstack.us-east-1.amazonaws.com has address 54.231.129.0
s3.dualstack.us-east-1.amazonaws.com has IPv6 address 2600:1fa0:80dc:5101:34d9:451e::

So we need to specify the S3 endpoint to use.

Specifying the S3 endpoint ^

In order to do this you need to switch Duplicity to the “boto3” backend. Assuming you’ve installed the correct package (python3-boto3 on Debian), this is as simple as changing the target from s3://… to boto3+s3://….

That then allows you to use the command line arguments --s3-region-name and --s3-endpoint-url so you can tell it which host to talk to. That ends up giving you both an IPv4 and an IPv6 address and your system correctly chooses the IPv6 one.

The full script ^

The new, working script now looks something like this:

export PASSPHRASE="highlysecret"
export AWS_ACCESS_KEY_ID="notquiteassecret"
export AWS_SECRET_ACCESS_KEY="extremelysecret"
# Somewhere with plenty of free space.
export TMPDIR=/var/tmp
 
duplicity --encrypt-key ABCDEF0123456789 \
          --asynchronous-upload \
          -v 4 \
          --archive-dir=/path/to/your/duplicity/archives \
          --s3-use-ia \
          --s3-use-multiprocessing \
          --s3-use-new-style \
          --s3-region-name "us-east-1" \
          --s3-endpoint-url "https://s3.dualstack.us-east-1.amazonaws.com" \
          incr \
          --full-if-older-than 30D \
          /stuff/you/want/backed/up \
          "boto3+s3://yourbucketname/path/to/backups"

The previous version of the script looked a bit like:

# All the exports stayed the same
duplicity --encrypt-key ABCDEF0123456789 \
          --asynchronous-upload \
          -v 4 \
          --archive-dir=/path/to/your/duplicity/archives \
          --s3-use-ia \
          --s3-use-multiprocessing \
          incr \
          --full-if-older-than 30D \
          /stuff/you/want/backed/up \
          "s3+http://yourbucketname/path/to/backups"

Keeping firewall logs out of Linux’s kernel log with ulogd2

A few words about iptables vs nft ^

nftables is the new thing and iptables is deprecated, but I haven’t found time to convert everything to nft rules syntax yet.

I’m still using iptables rules but it’s the iptables frontend to nftables. All of this works both with legacy iptables and with nft but with different syntax.

Logging with iptables ^

As a contrived example let’s log inbound ICMP packets at a maximum rate of 1 per second:

-A INPUT -m limit --limit 1/s -p icmp -j LOG --log-level 7 --log-prefix "ICMP: "

The Problem ^

If you have logging rules in your firewall then they’ll log to your kernel log, which is available at /dev/kmsg. The dmesg command displays the contents of /dev/kmsg but /dev/kmsg is a fixed size circular buffer, so after a while your firewall logs will crowd out every other thing.

On a modern systemd system this stuff does get copied to the journal, so if you set that to be persistent then you can keep the kernel logs forever. Or you can additionally run a syslog daemon like rsyslogd, and have that keep things forever.

Either way though your dmesg or journalctl -k commands are only going to display the contents of the kernel’s ring buffer which will be a limited amount.

I’m not that interested in firewall logs. They’re nice to have and very occasionally valuable when debugging something, but most of the time I’d rather they weren’t in my kernel log.

An answer: ulogd2 ^

One answer to this problem is ulogd2. ulogd2 is a userspace logging daemon into which you can feed netfilter data and have it log it in a flexible way, to multiple different formats and destinations.

I actually already use it to log certain firewall things to a MariaDB database for monitoring purposes, but you can also emit plain text, JSON, netflow and all manner of things. Since I’m already running it I decided to switch my general firewall logging to it.

Configuring ulogd2 ^

I added the following to /etc/ulogd.conf:

# This one for logging to local file in emulated syslog format.
stack=log2:NFLOG,base1:BASE,ifi1:IFINDEX,ip2str1:IP2STR,print1:PRINTPKT,emu1:LOGEMU
 
[log2]
group=2
 
[emu1]
file="/var/log/iptables_ulogd2.log"
sync=1

I already had a stack called log1 for logging to MariaDB, so I called the new one log2 with its output being emu1.

The log2 section can then be told to expect messages from netfilter group 2. Don’t worry about this, just know that this is what you refer to in your firewall rules, and you can’t use group 0 because that’s used for something else.

The emu1 section then says which file to write this stuff to.

That’s it. Restart the daemon.

Configuring iptables ^

Now it’s time to make iptables log to netfilter group 2 instead of its normal LOG target. As a reminder, here’s what the rule was like before:

-A INPUT -m limit --limit 1/s -p icmp -j LOG --log-level 7 --log-prefix "ICMP: "

And here’s what you’d change it to:

-A INPUT -m limit --limit 1/s -p icmp -j NFLOG --nflog-group 2 --nflog-prefix "ICMP:"

The --nflog-group 2 needs to match what you put in /etc/ulogd.conf.

You’re now logging with ulogd2 and none of this will be going to the kernel log buffer. Don’t forget to rotate the new log file! Or maybe you’d like to play with logging this as JSON or into a SQLite DB?

Fun With SpamAssassin Meta Rules

I’ve got a ticketing system. Let’s say you open a ticket by emailing support@example.com. You then get an automated response confirming that you’ve opened a ticket, and on my side people get bothered by a notification about this support ticket that needs attention.

A problem here is that absolutely anyone or anything emailing that will open a ticket. And it’s pretty easy to find that email address.

As a result lots of scum of the earth, er, enterprising individuals seem to be passing that email address around to other enterprising individuals who decide to add it to their email marketing mailshots.

A reasonable response to this would perhaps be to move away from email to a web form, and put it behind a login so that only existing, authenticated customers could submit new tickets. Thing is, I still have to have a way for previously-unknown people to create tickets by email, and I kind of like email. So I persevere.

One thing I can do though is block all kinds of newsletters. There is no scenario where people who send newsletters should be trying to open support tickets. I’m prepared to disallow any email from MailJet or SendGrid being sent to my ticketing system, for example.

But how to do it?

Well, I am already able to identify emails from MailJet and SendGrid because I use the ASN plugin. This inserts a header in the email to say which Autonomous System it came from.

MailJet’s ASN is 200069 and SendGrid’s is 11377. I know that because I’ve seen mail from them before, and the ASN plugin put a header in with those numbers.

You can add some custom rules to match mails from these ASNs:

header   LOCAL_ASN_MAILJET X-ASN =~ /\b200069\b/
score    LOCAL_ASN_MAILJET 0.001
describe LOCAL_ASN_MAILJET Sent by MailJet (ASN200069)

What this will do is check the header that the ASN plugin added and if it matches it will add this label LOCAL_ASN_MAILJET with a score of 0.001 to the list of SpamAssassin scores.

Scores that are very close to zero (but not actually zero!) are typically used just to annotate an email. You can’t use zero exactly because that disables the rule entirely.

Now, if you really didn’t want any email from MailJet at all you could crank that score up and it would all be rejected. But my users do actually get quite a lot of wanted email from the likes of MailJet and SendGrid. These senders are sadly too big to block. They know this, and this probably contributes to their noted preference for taking spammers’ money, but that is a rant for another day.

Back to the original goal: I only want to reject mail from these companies if it is destined for my ticketing system. So how to identify mail that’s for the support queue? Well that’s pretty simple:

header   LOCAL_TO_SUPPORT ToCc:addr =~ /^support\@example\.com$/i
score    LOCAL_TO_SUPPORT 0.001
describe LOCAL_TO_SUPPORT Recipient is support queue

This checks just the address part(s) of the To and Cc headers to see if any match support@example.com. The periods (‘.’) and the at symbol (‘@’) need escaping because this is a Perl regular expression. If there’s a match then the LOCAL_TO_SUPPORT tag will be added.

Now all that remains is to make a new rule that only fires if both of these conditions are true, and assigns a real score to that:

meta     LOCAL_MAILSHOT_TO_SUPPORT (LOCAL_TO_SUPPORT && (LOCAL_ASN_MAILJET || LOCAL_ASN_SENDGRID))
score    LOCAL_MAILSHOT_TO_SUPPORT 10.0
describe LOCAL_MAILSHOT_TO_SUPPORT Mailshot sent to support queue

There. Now the support queue will never get emails from these companies, but the rest of my users still can.

Of course you don’t have to match those mails by ASN. There are many other indicators of senders that just shouldn’t be opening support tickets, and if you can find any other sort of rule that matches them reliably then you can chain that with other rules that identify the support queue recipient.

Another way to do it would be to run the support queue as its own SpamAssassin user with its own per-user rules. I have a fairly simple SpamAssassin setup though with only a global set of rules so I didn’t want to do that just for this.

Forcing zone transfers with BIND and PowerDNS

The Problem ^

Today a customer told me that they had messed up the serial numbers on their DNS zones such that their primary server now had a lower serial number than my secondary servers. Once that happens the secondary servers will stop doing zone transfers.

The Fix(es) ^

TL;DR: I chose the last one, “force a zone transfer”. I knew the BIND one but had to look up the PowerDNS way. Having me look things up for you is (sometimes) part of the BitFolk value proposition. 😀

Increment the serial a bit ^

They could fix it by simply incrementing their serial again to make it larger than mine, but they wanted to continue to use a YYYYMMDDXX format for it.

Increment the serial a lot ^

As the serial is a 32-bit unsigned integer that secondaries compare using serial number arithmetic (RFC 1982), if you increment it far enough it will wrap around and become actually smaller than your desired new serial, which you can then set. This is a complicated process which is best described elsewhere.
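For a feel of the arithmetic involved, here is a rough sketch that assumes the comparison rules of RFC 1982 with 32-bit serials. The function names and the example serials are invented for illustration, and this is no substitute for a proper write-up of the procedure.

```python
MOD = 2 ** 32   # serials are 32-bit unsigned integers
HALF = 2 ** 31  # comparisons only look "forward" by less than this


def serial_newer(candidate: int, reference: int) -> bool:
    """True if candidate counts as newer than reference (RFC 1982 style)."""
    diff = (candidate - reference) % MOD
    return 0 < diff < HALF


def biggest_step(serial: int) -> int:
    """Largest single increment that still counts as newer than serial."""
    return (serial + HALF - 1) % MOD


if __name__ == "__main__":
    on_secondaries = 2024010101  # hypothetical serial the secondaries hold
    wanted = 2023010101          # hypothetical lower serial to end up on

    # Setting the wanted serial directly looks older, so it is ignored.
    print(serial_newer(wanted, on_secondaries))

    # Step 1: jump as far forward as allowed.
    step = biggest_step(on_secondaries)
    print(step, serial_newer(step, on_secondaries))

    # Step 2: seen from the intermediate value, the wanted serial is newer.
    print(serial_newer(wanted, step))
```

In practice every secondary has to pick up the intermediate serial before the final one is set, which is a large part of why this is fiddly.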

Delete the zones and re-add them ^

If zones were deleted from all secondary servers then the next update should put them back. This would however cause an outage in between, so it’s not a good idea.

Force a zone transfer ^

Here’s how to force a zone transfer on BIND and PowerDNS.

BIND ^

$ rndc retransfer example.com

PowerDNS ^

$ pdns_control retrieve example.com

Forcing the source address of an SNMP client (e.g. snmpwalk)

I was using snmpwalk earlier and it kept using the “wrong” IP address to send packets from. The destination was firewalled to only accept packets from certain sources, and I didn’t want to poke another hole just because snmpwalk was being stupid.

I read a lot of man pages to try to find out how to specify the source address but couldn’t find anything anywhere. Eventually I uncovered a post from 2004 saying that you can use the clientaddr directive in snmp.conf.

So, just so this is easy to find the next time I need it, you can force it on the command line with:

$ snmpwalk --clientaddr=192.168.0.1 …

And you do have to put the ‘=’ in there.

The Internet of Unprofitable Things

Gather round children ^

Uncle Andrew wants to tell you a festive story. The NTPmare shortly after Christmas.

A modest proposal ^

Nearly two years ago, on the afternoon of Monday 16th January 2017, I received an interesting BitFolk support ticket from a non-customer. The sender identified themselves as a senior software engineer at NetThings UK Ltd.

Subject: Specific request for NTP on IP 85.119.80.232

Hi,

This might sound odd but I need to setup an NTP server instance on IP address 85.119.80.232.

wats 85.119.80.232 precious? ^

85.119.80.232 is actually one of the IP addresses of one of BitFolk’s customer-facing NTP servers. It was also, until a few weeks before this email, part of the NTP Pool project.

“Was” being the important issue here. In late December of 2016 I had withdrawn BitFolk’s NTP servers from the public pool and firewalled them off to non-customers.

I’d done that because they were receiving an unusually large amount of traffic due to the Snapchat NTP bug. It wasn’t really causing any huge problems, but the number of traffic flows was pushing useful information out of Jump’s fixed-size netflow database and I didn’t want to deal with it over the holiday period, so this public service was withdrawn.

NTP? ^

This article was posted to Hacker News and a couple of comments there said they would have liked to have seen a brief explanation of what NTP is, so I’ve now added this section. If you know what NTP is already then you should probably skip this section because it will be quite brief and non-technical.

Network Time Protocol is a means by which a computer can use multiple other computers, often from across the Internet on completely different networks under different administrative control, to accurately determine what the current time is. By using several different computers, a small number of them can be inaccurate or even downright broken or hostile, and still the protocol can detect the “bad” clocks and only take into account the more accurate majority.

NTP is supposed to be used in a hierarchical fashion: A small number of servers have hardware directly attached from which they can very accurately tell the time, e.g. an atomic clock, GPS, etc. Those are called “Stratum 1” servers. A larger number of servers use the stratum 1 servers to set their own time, then serve that time to a much larger population of clients, and so on.

It used to be the case that it was quite hard to find NTP servers that you were allowed to use. Your own organisation might have one or two, but really you should have at least 3 to 7 of them and it’s better if there are multiple different organisations involved. In a university environment that wasn’t so difficult because you could speak to colleagues from another institution and swap NTP access. As the Internet matured and became majority used by corporations and private individuals though, people still needed access to accurate time, and this wasn’t going to cut it.

The NTP Pool project came to the rescue by making an easy web interface for people to volunteer their NTP servers, and then they’d be served collectively in a DNS zone with some basic means to share load. A private individual can just use three names from the pool zone and they will get three different (constantly changing) NTP servers.

Corporations and those making products that need to query the NTP pool are supposed to ask for a “vendor zone”. They make some small contribution to the NTP pool project and then they get a DNS zone dedicated to their product, so it’s easier for the pool administrators to direct the traffic.

Sadly many companies don’t take the time to understand this and just use the generic pool zone. NetThings UK Ltd went one step further in a very wrong direction by taking an IP address from the pool and just using it directly, assuming it would always be available for their use. In reality it was a free service donated to the pool by BitFolk and as it had become temporarily inconvenient for that arrangement to continue, service was withdrawn.

On with the story…

They want what? ^

The Senior Software Engineer continued:

The NTP service was recently shutdown and I am interested to know if there is any possibility of starting it up again on the IP address mentioned. Either through the current holder of the IP address or through the migration of the current machine to another address to enable us to lease 85.119.80.232.

Um…

I realise that this is a peculiar request but I can assure you it is genuine.

That’s not gonna work ^

Obviously what with 85.119.80.232 currently being in use by all customers as a resolver and NTP server I wasn’t very interested in getting them all to change their configuration and then leasing it to NetThings UK Ltd.

What I did was remove the firewalling so that 85.119.80.232 still worked as an NTP server for NetThings UK Ltd until we worked out what could be done.

I then asked some pertinent questions so we could work out the scope of the service we’d need to provide. Questions such as:

  • How many clients do you have using this?
  • Do you know their IP addresses?
  • When do they need to use the NTP server and for how long?
  • Can you make them use the pool properly (a vendor zone)?

Down the rabbit hole ^

The answers to some of the above questions were quite disappointing.

It would be of some use for our manufacturing setup (where the RTCs are initially set) but unfortunately we also have a reasonably large field population (~500 units with weekly NTP calls) that use roaming GPRS SIMs. I don’t know if we can rely on the source IP of the APN for configuring the firewall in this case (I will check though). We are also unable to update the firmware remotely on these devices as they only have a 5MB per month data allowance. We are able to wirelessly update them locally but the timeline for this is months rather than weeks.

Basically it seemed that NetThings UK Ltd made remote controlled thermostats and lighting controllers for large retail spaces etc. And their devices had one of BitFolk’s IP addresses burnt into them at the factory. And they could not be identified or remotely updated.

Facepalm

Oh, and whatever these devices were, without an external time source their clocks would start to noticeably drift within 2 weeks.

By the way, they solved their “burnt into it at the factory” problem by bringing up BitFolk’s IP address locally at their factory to set initial date/time.

Group Facepalm

I’ll admit, at this point I was slightly tempted to work out how to identify these devices and reply to them with completely the wrong times to see if I could get some retail parks to turn their lights on and off at strange times.

Weekly?? ^

We are triggering ntp calls on a weekly cron with no client side load balancing. This would result in a flood of calls at the same time every Sunday evening at around 19:45.

Yeah, they made every single one of their unidentifiable devices contact a hard coded IP address within a two minute window every Sunday night.

The Senior Software Engineer was initially very worried that they were the cause of the excess flows I had mentioned earlier, but I reassured them that it was definitely the Snapchat bug. In fact I never was able to detect their devices above background noise; it turns out that ~500 devices doing a single SNTP query is a pretty light load. They’d been doing it for over 2 years before I received this email.

I did of course point out that they were lucky we caught this early because they could have ended up as the next Netgear vs. University of Wisconsin.

I am feeling really, really bad about this. I’m very, very sorry if we were the cause of your problems.

Bless. I must point out that throughout all of this, their Senior Software Engineer was a pleasure to work with.

We made a deal ^

While NTP service is something BitFolk provides as a courtesy to customers, it’s not something that I wanted to sell as a service on its own. And after all, who would buy it, when the public pool exists? The correct thing for a corporate entity to do is support the pool with a vendor zone.

But NetThings UK Ltd were in a bind and not allowing them to use BitFolk’s NTP server was going to cause them great commercial harm. Potentially I could have asked for a lot of money at this point, but (no doubt to my detriment) that just felt wrong.

I proposed that initially they pay me for two hours of consultancy to cover work already done in dealing with their request and making the firewall changes.

I further proposed that I charge them one hour of consultancy per month for a period of 12 months, to cover continued operation of the NTP server. Of course, I do not spend an hour a month fiddling with NTP, but this unusual departure from my normal business had to come at some cost.

I was keen to point out that this wasn’t something I wanted to continue forever:

Finally, this is not a punitive charge. It seems likely that you are in a difficult position at the moment and there is the temptation to charge you as much as we can get away with (a lot more than £840 [+VAT per year], anyway), but this seems unfair to me. However, providing NTP service to third parties is not a business we want to be in so we would expect this to only last around 12 months. If you end up having to renew this service after 12 months then that would be an indication that we haven’t charged you enough and we will increase the price.

Does this seem reasonable?

NetThings UK Ltd happily agreed to this proposal on a quarterly basis.

Thanks again for the info and help. You have saved me a huge amount of convoluted and throwaway work. This give us enough time to fix things properly.

Not plain sailing ^

I only communicated with the Senior Software Engineer one more time. The rest of the correspondence was with financial staff, mainly because NetThings UK Ltd did not like paying its bills on time.

In the first year, NetThings UK Ltd paid 3 of its 4 invoices late. I made sure to charge them statutory late payment fees for each overdue invoice.

Yearly report card: must try harder ^

As 2017 was drawing to a close, I asked the Senior Software Engineer how NetThings UK Ltd was getting on with ceasing to hard code BitFolk’s IP address in its products.

To give you a quick summary, we have migrated the majority of our products away from using the fixed IP address. There is still one project to be updated after which there will be no new units being manufactured using the fixed IP address. However, we still have around 1000 units out in the field that are not readily updatable and will continue to perform weekly NTP calls to the fixed IP address. So to answer your question, yes we will still require the service past January 2018.

This was a bit disappointing because a year earlier the number had been “about 500” devices, yet despite a year of effort the number had apparently doubled.

That alone would have been enough for me to increase the charge, but I was going to anyway due to NetThings UK Ltd’s aversion to paying on time. I gave them just over 2 months of notice that the price was going to double.

u wot m8 ^

Approximately 15 weeks after being told that the price doubling was going to happen, NetThings UK Ltd’s Financial Controller asked me why it had happened, while letting me know that another of their late payments had been made:

Date: Wed, 21 Feb 2018 14:59:42 +0000

We’ve paid this now, but can you explain why the price has doubled?

I was very happy to explain again in detail why it had doubled. The Financial Controller in response tried to agree a fixed price for a year, which I said I would be happy to do if they paid for the full year in one payment.

My rationale for this was that a large part of the reason for the increase was that I had been spending a lot of time chasing their late payments, so if they wanted to still make quarterly payments then I would need the opportunity to charge more if I needed to. If they wanted assurance then in my view they should pay for it by making one yearly payment.

There was no reply, so the arrangement continued on a quarterly basis.

All good things… ^

On 20 November 2018 BitFolk received a letter from Deloitte:

Netthings Limited – In Administration (“The Company”)

Company Number: SC313913

[…]

Cessation of Trading

The Company ceased to trade with effect from 15 November 2018.

Investigation

As part of our duties as Joint Administrators, we shall be investigating what assets the Company holds and what recoveries if any may be made for the benefit of creditors as well as the manner in which the Company’s business has been conducted.

And then on 21 December:

Under paragraph 51(1)(b) of the Insolvency Act 1986, the Joint Administrators are not required to call an initial creditors’ meeting unless the Company has sufficient funds to make a distribution to the unsecured creditors, or unless a meeting is requested on Form SADM_127 by 10% or more in value of the Company’s unsecured creditors. There will be no funds available to make a distribution to the unsecured creditors of the Company, therefore a creditors’ meeting will not be convened.

Luckily their only unpaid invoice was for service from some point in November, so they didn’t really get anything that they hadn’t already paid for.

So that’s the story of NetThings UK Ltd, a brave pioneer of the Internet of Things wave, who thought that the public NTP pool was just an inherent part of the Internet that anyone could use for free, and that the way to do that was to pick one IP address out of it at random and bake that into over a thousand bits of hardware that they distributed around the country with no way to remotely update.

This coupled with their innovative reluctance to pay for anything on time was sadly not enough to let them remain solvent.

Using a different theme for MediaWiki’s SyntaxHighlight extension

Probably the best syntax highlighting plugin for MediaWiki at the moment is the one simply called SyntaxHighlight. It uses Pygments to do the heavy lifting. What sets it apart from the other extensions is that it supports line numbers and picking out highlighted lines.

Unfortunately the default style (theme) is dark-on-light whereas for most of my syntax highlighting I am giving examples of either shell sessions or code. All my shell sessions and code are viewed as light-on-dark, so I would prefer that the wiki’s syntax highlighting followed suit.

I spent quite a while messing about with editing the extension itself but to little effect, until Robert pointed out that I just needed to edit the Common.css file inside the wiki itself. Then you get some decent results.

I used something like this to generate the correct CSS for the “native” style:

$ ./extensions/SyntaxHighlight_GeSHi/pygments/pygmentize -S native -f html|sed -e 's/^/.mw-highlight > pre /'
.mw-highlight > pre .hll { background-color: #404040 }
.mw-highlight > pre .c { color: #999999; font-style: italic } /* Comment */
.mw-highlight > pre .err { color: #a61717; background-color: #e3d2d2 } /* Error */
.mw-highlight > pre .esc { color: #d0d0d0 } /* Escape */
.mw-highlight > pre .g { color: #d0d0d0 } /* Generic */
.mw-highlight > pre .k { color: #6ab825; font-weight: bold } /* Keyword */
.mw-highlight > pre .l { color: #d0d0d0 } /* Literal */
.mw-highlight > pre .n { color: #d0d0d0 } /* Name */
.mw-highlight > pre .o { color: #d0d0d0 } /* Operator */
.mw-highlight > pre .x { color: #d0d0d0 } /* Other */
.mw-highlight > pre .p { color: #d0d0d0 } /* Punctuation */
.mw-highlight > pre .ch { color: #999999; font-style: italic } /* Comment.Hashbang */
.mw-highlight > pre .cm { color: #999999; font-style: italic } /* Comment.Multiline */
.mw-highlight > pre .cp { color: #cd2828; font-weight: bold } /* Comment.Preproc */
.mw-highlight > pre .cpf { color: #999999; font-style: italic } /* Comment.PreprocFile */
.mw-highlight > pre .c1 { color: #999999; font-style: italic } /* Comment.Single */
.mw-highlight > pre .cs { color: #e50808; font-weight: bold; background-color: #520000 } /* Comment.Special */
.mw-highlight > pre .gd { color: #d22323 } /* Generic.Deleted */
.mw-highlight > pre .ge { color: #d0d0d0; font-style: italic } /* Generic.Emph */
.mw-highlight > pre .gr { color: #d22323 } /* Generic.Error */
.mw-highlight > pre .gh { color: #ffffff; font-weight: bold } /* Generic.Heading */
.mw-highlight > pre .gi { color: #589819 } /* Generic.Inserted */
.mw-highlight > pre .go { color: #cccccc } /* Generic.Output */
.mw-highlight > pre .gp { color: #aaaaaa } /* Generic.Prompt */
.mw-highlight > pre .gs { color: #d0d0d0; font-weight: bold } /* Generic.Strong */
.mw-highlight > pre .gu { color: #ffffff; text-decoration: underline } /* Generic.Subheading */
.mw-highlight > pre .gt { color: #d22323 } /* Generic.Traceback */
.mw-highlight > pre .kc { color: #6ab825; font-weight: bold } /* Keyword.Constant */
.mw-highlight > pre .kd { color: #6ab825; font-weight: bold } /* Keyword.Declaration */
.mw-highlight > pre .kn { color: #6ab825; font-weight: bold } /* Keyword.Namespace */
.mw-highlight > pre .kp { color: #6ab825 } /* Keyword.Pseudo */
.mw-highlight > pre .kr { color: #6ab825; font-weight: bold } /* Keyword.Reserved */
.mw-highlight > pre .kt { color: #6ab825; font-weight: bold } /* Keyword.Type */
.mw-highlight > pre .ld { color: #d0d0d0 } /* Literal.Date */
.mw-highlight > pre .m { color: #3677a9 } /* Literal.Number */
.mw-highlight > pre .s { color: #ed9d13 } /* Literal.String */
.mw-highlight > pre .na { color: #bbbbbb } /* Name.Attribute */
.mw-highlight > pre .nb { color: #24909d } /* Name.Builtin */
.mw-highlight > pre .nc { color: #447fcf; text-decoration: underline } /* Name.Class */
.mw-highlight > pre .no { color: #40ffff } /* Name.Constant */
.mw-highlight > pre .nd { color: #ffa500 } /* Name.Decorator */
.mw-highlight > pre .ni { color: #d0d0d0 } /* Name.Entity */
.mw-highlight > pre .ne { color: #bbbbbb } /* Name.Exception */
.mw-highlight > pre .nf { color: #447fcf } /* Name.Function */
.mw-highlight > pre .nl { color: #d0d0d0 } /* Name.Label */
.mw-highlight > pre .nn { color: #447fcf; text-decoration: underline } /* Name.Namespace */
.mw-highlight > pre .nx { color: #d0d0d0 } /* Name.Other */
.mw-highlight > pre .py { color: #d0d0d0 } /* Name.Property */
.mw-highlight > pre .nt { color: #6ab825; font-weight: bold } /* Name.Tag */
.mw-highlight > pre .nv { color: #40ffff } /* Name.Variable */
.mw-highlight > pre .ow { color: #6ab825; font-weight: bold } /* Operator.Word */
.mw-highlight > pre .w { color: #666666 } /* Text.Whitespace */
.mw-highlight > pre .mb { color: #3677a9 } /* Literal.Number.Bin */
.mw-highlight > pre .mf { color: #3677a9 } /* Literal.Number.Float */
.mw-highlight > pre .mh { color: #3677a9 } /* Literal.Number.Hex */
.mw-highlight > pre .mi { color: #3677a9 } /* Literal.Number.Integer */
.mw-highlight > pre .mo { color: #3677a9 } /* Literal.Number.Oct */
.mw-highlight > pre .sa { color: #ed9d13 } /* Literal.String.Affix */
.mw-highlight > pre .sb { color: #ed9d13 } /* Literal.String.Backtick */
.mw-highlight > pre .sc { color: #ed9d13 } /* Literal.String.Char */
.mw-highlight > pre .dl { color: #ed9d13 } /* Literal.String.Delimiter */
.mw-highlight > pre .sd { color: #ed9d13 } /* Literal.String.Doc */
.mw-highlight > pre .s2 { color: #ed9d13 } /* Literal.String.Double */
.mw-highlight > pre .se { color: #ed9d13 } /* Literal.String.Escape */
.mw-highlight > pre .sh { color: #ed9d13 } /* Literal.String.Heredoc */
.mw-highlight > pre .si { color: #ed9d13 } /* Literal.String.Interpol */
.mw-highlight > pre .sx { color: #ffa500 } /* Literal.String.Other */
.mw-highlight > pre .sr { color: #ed9d13 } /* Literal.String.Regex */
.mw-highlight > pre .s1 { color: #ed9d13 } /* Literal.String.Single */
.mw-highlight > pre .ss { color: #ed9d13 } /* Literal.String.Symbol */
.mw-highlight > pre .bp { color: #24909d } /* Name.Builtin.Pseudo */
.mw-highlight > pre .fm { color: #447fcf } /* Name.Function.Magic */
.mw-highlight > pre .vc { color: #40ffff } /* Name.Variable.Class */
.mw-highlight > pre .vg { color: #40ffff } /* Name.Variable.Global */
.mw-highlight > pre .vi { color: #40ffff } /* Name.Variable.Instance */
.mw-highlight > pre .vm { color: #40ffff } /* Name.Variable.Magic */
.mw-highlight > pre .il { color: #3677a9 } /* Literal.Number.Integer.Long */
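If sed isn’t to hand, the prefixing step is easy enough to reproduce. Here’s a small sketch in Python that does roughly what the sed expression above does; the prefix_rules name is my own invention, and unlike sed it leaves blank lines alone rather than prefixing them too.

```python
def prefix_rules(css: str, selector: str = ".mw-highlight > pre") -> str:
    """Prepend selector to every non-blank line of the generated CSS."""
    prefixed = []
    for line in css.splitlines(keepends=True):
        if line.strip():
            prefixed.append(f"{selector} {line}")
        else:
            # Leave blank lines untouched (sed would prefix these as well).
            prefixed.append(line)
    return "".join(prefixed)


if __name__ == "__main__":
    sample = (
        ".hll { background-color: #404040 }\n"
        ".c { color: #999999; font-style: italic } /* Comment */\n"
    )
    print(prefix_rules(sample), end="")
```

The sample lines are the first two rules of the “native” style with the prefix stripped off again, so the result should match the top of the listing above.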

(Yes, I also need to do the light-on-dark thing here in this blog)

To get a list of available styles:

$ ./extensions/SyntaxHighlight_GeSHi/pygments/pygmentize -L styles
Pygments version 2.2.0, (c) 2006-2017 by Georg Brandl.
 
Styles:
~~~~~~~
* manni:
    A colorful style, inspired by the terminal highlighting style.
* igor:
    Pygments version of the official colors for Igor Pro procedures.
* lovelace:
    The style used in Lovelace interactive learning environment. Tries to avoid the "angry fruit salad" effect with desaturated and dim colours.
* xcode:
    Style similar to the Xcode default colouring theme.
* vim:
    Styles somewhat like vim 7.0
* autumn:
    A colorful style, inspired by the terminal highlighting style.
* abap:
 
* vs:
 
* rrt:
    Minimalistic "rrt" theme, based on Zap and Emacs defaults.
* native:
    Pygments version of the "native" vim theme.
* perldoc:
    Style similar to the style used in the perldoc code blocks.
* borland:
    Style similar to the style used in the borland IDEs.
* arduino:
    The Arduino® language style. This style is designed to highlight the Arduino source code, so exepect the best results with it.
* tango:
    The Crunchy default Style inspired from the color palette from the Tango Icon Theme Guidelines.
* emacs:
    The default style (inspired by Emacs 22).
* friendly:
    A modern style based on the VIM pyte theme.
* monokai:
    This style mimics the Monokai color scheme.
* paraiso-dark:
 
* colorful:
    A colorful style, inspired by CodeRay.
* murphy:
    Murphy's style from CodeRay.
* bw:
 
* pastie:
    Style similar to the pastie default style.
* rainbow_dash:
    A bright and colorful syntax highlighting theme.
* algol_nu:
 
* paraiso-light:
 
* trac:
    Port of the default trac highlighter design.
* default:
    The default style (inspired by Emacs 22).
* algol:
 
* fruity:
    Pygments version of the "native" vim theme.

Although you may find it easier looking at the Pygments style gallery.