I’m very pleased to finally be able to report that some progress has been made recently on improving LUG UK’s services.

For a very long time, everything was running from a machine kindly hosted in Leeds by Energis Squared. We (well, Alasdair) had made an attempt to install Xen on it, but something was clearly not right in the kernel’s networking, because connections were randomly stalling for minutes or hours at a time. Most noticeably for our userbase, emails were sometimes taking 12 hours to go through the server and out to the lists, and the web sites were often unusable.

None of us had official access to the datacentre, which made things even more difficult, so the machine stayed in that state for ages.

Some time in early 2005 I had gone begging to various server hardware distributors asking if any of them would be willing to donate a new server for LUG UK usage, and DNUK kindly offered a basic 2U unit. The hardware arrived at my place and I had a go at installing Xen on it, with moral support from Alan Pope, who helped tremendously by eating curry.

Unfortunately I could get nowhere with Xen; it just didn’t like the hardware. It was puzzling to me because I obviously had Xen working fine elsewhere, but the lack of a serial port on the server made it hard to debug.

Things stayed in that state until late 2005 when I gave Xen 3.0-unstable a try, and lo and behold it worked. At that point I did a proper “production” build and the machine went off to a HantsLUG meet where they sorted out a buggered up screw and donated 2GB of RAM!

In early January 2006 Black Cat Networks made a very generous cheap colo offer, so the server was delivered to Redbus II and put online there. Once the main Xen domain had been moved from the box in Leeds to the new server in London and was under the real load of delivering all those emails, a fairly serious memory leak in the kernel showed up, and the machine needed to be rebooted every 6-8 hours!

I must have compiled 10 different kernels, all with various Xen patches, to try and get rid of this problem up until FOSDEM weekend, when I had to give it a break. We even unpacked a Fedora Core 5 kernel source RPM and compiled a kernel from that (on Debian Sarge!), but no joy.

Finally Alasdair mailed me a patch from Neil Brown (the MD maintainer) that was meant to fix one bug. I applied it, and so far so good. That was about five days ago and none of the same problems have appeared since.

We haven’t had any time to change how things actually work, but the mere fact that the mail is now flowing through in less than a minute instead of half a day has had a dramatic effect on most of the LUG lists we host. It’s really good to see.

In future it seems clear we need to look at the following:

  • Upgrading to Exim 4 in a Xen domain dedicated to handling the email.
  • Reducing the massive Exim queue (that is full of bounces to forged spam) by not accepting the rubbish in the first place: sensible and obvious HELO checks, checking the sender domain actually exists, greylisting, sensible DNSBLs, SMTP-time antispam and antivirus.
  • Moving the user-controlled web content (CGI, PHP) into its own domain or domains, because it’s just too dangerous.
  • More centralised services that all LUGs can use, e.g. wiki, blogs?
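
To make the mail filtering idea a bit more concrete, here is a rough sketch in Python of two of the checks mentioned in the list above: greylisting and an obvious HELO sanity check. This is only an illustration of the logic, not how it would actually be wired into Exim. The names `Greylist`, `helo_is_plausible`, the 300 second delay and the example hostnames are all made up for this sketch.

```python
import time

# Hypothetical delay: how long a new sender must wait before a retry is accepted.
GREYLIST_DELAY = 300  # seconds


class Greylist:
    """Temporarily defer mail from (client IP, sender, recipient) triplets
    that have not been seen before. Real mail servers retry; most spam
    software does not bother."""

    def __init__(self, delay=GREYLIST_DELAY, clock=time.time):
        self.delay = delay
        self.clock = clock          # injectable so the logic can be tested
        self.first_seen = {}        # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient):
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        # Record the first attempt if this triplet is new.
        first = self.first_seen.setdefault(key, now)
        if now - first < self.delay:
            return "defer"          # would map to an SMTP 4xx response
        return "accept"


def helo_is_plausible(helo_name, our_hostname):
    """Reject the most obviously bogus HELO names: empty, our own
    hostname, or an unqualified name with no dot in it."""
    name = helo_name.strip().lower().rstrip(".")
    if not name or name == our_hostname.lower():
        return False
    if name.startswith("[") and name.endswith("]"):
        return True                 # address literal such as [192.0.2.1]
    return "." in name
```

A first delivery attempt from an unknown triplet gets "defer", and a retry after the delay has passed gets "accept". A production setup would also need to expire old entries and keep the table somewhere more durable than a dictionary in memory.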

My thanks to all who helped out during the recent transition and all who will help with the forthcoming improvements.
