Update: It’s GoCardless who moved api.gocardless.com to Google Cloud. Google Cloud has behaved this way for years.
I think that Google App Engine may have recently started requiring every POST request to have a Content-Length header, even if there is no request body.
That will cause you problems if your library doesn’t add one for POST requests that have no content. Perl’s HTTP::Request is one such library.
You might be experiencing this if an API has just started replying to you with:
Error 411 (Length Required)!!1
POST requests require a Content-length header.
(Yes, the title does contain “!!1”.)
You can fix it by adding the header yourself, e.g.:
use LWP::UserAgent;
use HTTP::Request;
use JSON;

my $ua  = LWP::UserAgent->new;
my $req = HTTP::Request->new(
    POST => "https://api.example.com/things/$id/actions/fettle");
$req->header('Accept' => 'application/json');
$req->content_type('application/json');

my $json;
$json = JSON->new->utf8->canonical->encode($params) if $params;
$req->content($json) if $json;

# Explicitly set Content-Length to zero as HTTP::Request doesn't add one
# when there's no content.
$req->header('Content-Length' => 0) unless $json;

my $res = $ua->request($req);
This is a bit far outside of my comfort zone so I’m not sure if I’m 100% correct, but I do know that sending the header fixes things for me.
Yesterday a BitFolk customer tried to cancel their Direct Debit mandate, and it didn’t work. The server logs contained the above message.
For Direct Debit payments we use the Perl module Business::GoCardless for integrating with GoCardless, but the additional HTML styling in the message (which I’ve left out for brevity) made clear that the message was coming from Google. api.gocardless.com is hosted on Google App Engine (or some bit of Google cloud anyway).
After a bit of debugging I established that HTTP::Request was only setting a Content-Length header when there was actually request content. The API for cancelling a Direct Debit mandate is to send an empty POST to https://api.gocardless.com/mandates/$id/actions/cancel.
There was a successful mandate cancellation on 25 October 2018, so the change must have happened some time between then and 12 December 2018. I haven’t looked for any change notice put out by Google as I’m not a Google Cloud user and wouldn’t know where to look.
In September 2014 GoCardless came out with a new version of their API called the “Pro API”. It made a few things nicer but didn’t come with any real new features applicable to BitFolk, and also added a minimum fee of £0.20.
The original API I’d integrated against has a 1% fee capped at £2.00, and as BitFolk’s smallest plan is £10.79 including VAT the fee would generally be £0.11. Having a £0.20 fee on these payments would represent nearly a doubling of fees for many of my payments.
So, no compelling reason to use the Pro API.
Over the years, GoCardless made more noise about their Pro API and started calling their original API the “legacy API”. I could see the way things were going. Sure enough, eventually they announced that the legacy API would be disabled on 31 October 2017. No choice but to move to the Pro API now.
There aren’t normally any limits on Direct Debit payments. When you let your energy supplier or council or whatever do a Direct Debit, they can empty your bank account if they like.
The Direct Debit Guarantee has very strong provisions in it for protecting the payer: essentially, if you dispute anything, any time, you get your money back without question, and the supplier has to pursue you for the money by other means if they still think the charge was correct. A company that repeatedly gets Direct Debit chargebacks is going to be kicked off the service by their bank or payment provider.
The original GoCardless API had the ability to set caps on the mandate which would be enforced their side. A simple “X amount per Y time period”. I thought that this would provide some comfort to customers who may not be otherwise familiar with authorising Direct Debits from small companies like BitFolk, so I made use of that feature by default.
This turned out to be a bad decision.
The main problem with this was that there was no way to change the cap. If a customer upgraded their service then I’d have to cancel their Direct Debit mandate and ask them to authorise a new one because it would cease being possible to charge them the correct amount. Authorising a new mandate was not difficult—about the same amount of work as making any sort of online payment—but asking people to do things is always a pain point.
There was a long-standing feature request with GoCardless to implement some sort of “follow this link to authorise the change” feature, but it never happened.
The Pro API does not support mandates with a capped amount per interval. Given that I’d already established it was a mistake to use them, I wasn’t too bothered by that.
I’ve since discovered however that the Pro API not only does not support setting the caps, it does not have any way to query them either. This is bad because I need to use the Pro API with mandates that were created in the legacy API. And all of those have caps.
Here’s the flow I had using the legacy API.
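Roughly: before creating a bill, look up the mandate’s pre-authorisation, check when it could next be charged and how much headroom was left under the cap, and only then create the bill. As a simplified sketch (the helper functions and field names here are illustrative stand-ins, not the legacy API’s real attributes):

# Simplified sketch of the pre-flight check done before creating a bill.
# get_preauth(), defer_charge(), request_new_mandate() and create_bill()
# are illustrative stand-ins, not real legacy API calls.
use strict;
use warnings;
use POSIX qw(strftime);

sub maybe_charge {
    my ($mandate_id, $amount) = @_;
    my $preauth = get_preauth($mandate_id);
    # ISO 8601 dates compare correctly as strings.
    my $today   = strftime('%Y-%m-%d', localtime);

    if ($preauth->{next_chargeable_date} gt $today) {
        # Too soon: give it some latitude and retry in a couple of days.
        defer_charge($mandate_id, $preauth->{next_chargeable_date});
    }
    elsif ($preauth->{remaining_amount} < $amount) {
        # Cap too low: no option but to cancel the mandate and ask the
        # customer to authorise a new one.
        request_new_mandate($mandate_id);
    }
    else {
        create_bill($mandate_id, $amount);
    }
}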
This way if the charge was coming a little too early, I could give some latitude and let it wait a couple of days until it could be charged. I’d also know if the problem was that the cap was too low. In that case there would be no choice but to cancel the customer’s mandate and ask them to authorise another one, but at least I would know exactly what the problem was.
With the Pro API, there is no way to check timings and charge caps. All I can do is make the charge, and then if it’s too soon or too much I get the same error message:
“Validation failed / exceeds mandate cap”
That’s it. It doesn’t tell me what the cap is, it doesn’t tell me if it’s because I’m charging too soon, nor if I’m charging too much. There is no way to distinguish between those situations.
GoCardless talk about the Pro API being backwards compatible with the legacy API, so that once switched I would still be able to create payments against mandates that were created using the legacy API. I would not need to get customers to re-authorise.
This is true to a point, but my use of caps per interval in the legacy API has severely restricted how compatible things are, and that’s something I wasn’t aware of. Sure, their “Guide to upgrading” does briefly mention that caps would continue to be enforced:
“Pre-authorisation mandates are not restricted, but the maximum amount and interval that you originally specified will still apply.”
That is the only mention of this issue in that entire document, and that statement would be fine by me if there were still a way to tell which failure mode was going to be encountered.
Thinking that I was just misunderstanding, I asked GoCardless support about this. Their reply:
Thanks for emailing.
I’m afraid the limits aren’t exposed within the new API. The only solution as you suggest, is to try a payment and check for failure.
Apologies for the inconvenience caused here and if you have any further queries please don’t hesitate to let us know.
The nuclear option would be to cancel all mandates and ask customers to authorise them again. I would like to avoid this if possible.
I am thinking that most customers continue to be fine on the “amount per interval” legacy mandates as long as they don’t upgrade, so I can leave them as they are until that happens. If they upgrade, or if a DD payment ever fails with “exceeds mandate cap” then I will have to cancel their mandate and ask them to authorise again. I can see if their mandate was created before ~today and advise them on the web site to cancel it and authorise it again.
I’m a little disappointed that GoCardless didn’t think there would need to be a way to query mandate caps, given that the caps are still enforced even though creating new mandates with them is no longer possible.
I can’t really accept that there is a good level of backwards compatibility here if there is a feature that you can’t even tell is in use until it causes a payment to fail, and even then you can’t tell which details of that feature cause the failure.
I understand why they haven’t just stopped honouring the caps: it wouldn’t be in line with the consumer-focused spirit of the Direct Debit Guarantee to alter things against customer expectations, and even sending out a notification to the customer might not be enough. I think they should have gone the other way and allowed querying of things that they are going to continue to enforce, though.
Could I have tested for this? Well, the difficulty there is that the GoCardless sandbox environment for the Pro API starts off clean, with no access to any of your legacy activity from either live or the legacy sandbox. So I couldn’t do something like the following:
Create legacy mandate in legacy sandbox, with amount per interval caps
Try to charge against the legacy mandate from the Pro API sandbox, exceeding the cap
Observe that it fails but with no way to tell why
I did note that there didn’t seem to be attributes of the mandate endpoint that would let me know when it could be charged and what the amount left to charge was, but it didn’t set off any alarm bells. Perhaps it should have.
Also I will admit I’ve had years to switch to the Pro API and am only doing it now that I’ve been forced to. Perhaps if I had made a start on this years ago, I’d have noted what I consider to be a deficiency, asked them to remedy it and they might have had time to do so. I don’t actually think it’s likely they would bump the API version for that though. In my defence, as I mentioned, there is nothing attractive about the Pro API for my use, and it does cost more, so no surprise I’ve been reluctant to explore it.
So, if you are scrambling to update your GoCardless integration before 31 October, do check that you are prepared for payments against capped mandates to fail.
I used MusicBrainz Picard to reorganize hundreds of the files in my music collection. For some reason Banshee spotted that new files appeared, but it didn’t remove the old ones from the library. My library had hundreds of entries relating to files that no longer existed on disk.
I would have hoped that Banshee would just skip past a playlist entry that didn’t exist, but unfortunately (as of v2.4.1 in Ubuntu 12.04) it decides to stop playing at that point. I have to switch to it and double click on the next playlist entry, and even then it thinks it is playing the file that it tried to play before, so I have to double click again.
I have filed a bug about this.
In the mean time I wanted to remove all the dead entries from my library. I could delete my library and re-import it all, but I wanted to keep some of the metadata. I knew it was an SQLite database, so I poked around a bit and came up with this. It seems to work, so maybe it will be useful for someone else.
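Something along these lines does the job (a sketch rather than exactly what I ran: it assumes the Banshee 2.x layout, with the library in ~/.config/banshee-1/banshee.db and tracks in the CoreTracks table holding file:// URIs in the Uri column; close Banshee and back the database up first):

#!/usr/bin/perl
# Delete Banshee library entries whose files no longer exist on disk.
# Assumes the Banshee 2.x schema (CoreTracks table, file:// URIs in the
# Uri column); run it with Banshee closed and the database backed up.
use strict;
use warnings;
use DBI;
use URI::Escape qw(uri_unescape);

my $db  = "$ENV{HOME}/.config/banshee-1/banshee.db";
my $dbh = DBI->connect("dbi:SQLite:dbname=$db", '', '', { RaiseError => 1 });

my $tracks = $dbh->selectall_arrayref('SELECT TrackID, Uri FROM CoreTracks');
my $delete = $dbh->prepare('DELETE FROM CoreTracks WHERE TrackID = ?');

for my $track (@$tracks) {
    my ($id, $uri) = @$track;
    next unless defined $uri and $uri =~ m{^file://};
    (my $path = $uri) =~ s{^file://}{};
    $path = uri_unescape($path);
    next if -e $path;
    print "Removing dead entry: $path\n";
    $delete->execute($id);
}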
A few days ago we unfortunately had some abuse reports regarding customers with DNS resolvers being abused in order to participate in a distributed denial of service attack.
Amongst other issues, DNS servers which are misconfigured to allow arbitrary hosts to do recursive queries through them can be used by attackers to launch an amplified attack on a forged source address.
I try to scan our address space reasonably often but I must admit I hadn’t done so for some time. I kicked off another scan and found one more customer with a misconfigured resolver, which has since been fixed.
After mentioning that I would do a scan I was asked how I do that.
I use a Perl script I’ve hacked together over the last couple of years. I took a few minutes to tidy it up and add a small amount of documentation (run it with --man to read that), so here it is in case anyone finds it useful:
Update: This code has now been moved to GitHub. If you have any comments, problems or improvements please do submit them as an issue and I will get to it much quicker. The gist below is now out of date so please don’t use it.
Using the default 100 concurrent queries it scans a /21 in about 80 seconds (YMMV depending upon how many hosts you have that firewall 53/UDP). That scales sort of linearly with how many you do, so using -q 200 for example will cut that down to about 40 seconds. It’s only a select loop though so it’ll use more CPU if you do that.
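Stripped right down, the approach looks something like this (a heavily simplified sketch rather than the real script: there’s no concurrency limit or proper timeout handling, and it assumes the older Net::DNS behaviour where bgsend() returns a plain socket that IO::Select can watch):

#!/usr/bin/perl
# Simplified sketch: fire off a recursive query at each target with
# bgsend() and harvest the replies in a select loop. The query name and
# timeouts are arbitrary choices for illustration.
use strict;
use warnings;
use Net::DNS;
use IO::Select;

my @targets = @ARGV;    # IP addresses to probe
my $res = Net::DNS::Resolver->new(recurse => 1, retry => 1, udp_timeout => 2);
my $sel = IO::Select->new;
my %pending;

for my $ip (@targets) {
    $res->nameservers($ip);
    my $sock = $res->bgsend('www.example.com', 'A') or next;
    $sel->add($sock);
    $pending{$sock} = $ip;
}

my $deadline = time + 5;
while ($sel->count and time < $deadline) {
    for my $sock ($sel->can_read(1)) {
        my $ip     = delete $pending{$sock};
        my $packet = $res->bgread($sock);
        $sel->remove($sock);
        next unless $packet;
        # A reply with the "recursion available" bit set suggests the
        # host is willing to recurse for strangers.
        print "$ip looks like an open resolver\n" if $packet->header->ra;
    }
}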
Two things I’ve noticed since:
It doesn’t handle failing to create a socket with bgsend so for example if you run up against your limit of file descriptors (commonly ~1024 on Linux) the whole thing will get stuck at 100% CPU.
One person reported a similar situation (bgsend fails, stuck at 100% CPU) when they allowed it to try to send to a broadcast address. I haven’t been able to replicate that one yet.
For ages I’ve been trying to work out how to programmatically add a watermark to an existing PDF using the Perl PDF::API2 module.
In case it’s useful for anyone else, here goes:
use PDF::API2;

my $ifile = "/some/file.pdf";
my $ofile = "/some/newfile.pdf";
my $pdf = PDF::API2->open($ifile);
my $fnt = $pdf->ttfont('verdana.ttf', -encode => 'utf8');
# Only on the front page
my $page = $pdf->openpage(1);
my $text = $page->text;
# Create two extended graphic states; one for transparent content
# and one for normal content
my $eg_trans = $pdf->egstate();
my $eg_norm = $pdf->egstate();
# Go transparent (0 = opaque, 1 = fully transparent)
$eg_trans->transparency(0.5);
$text->egstate($eg_trans);
# Black edges
$text->strokecolor('#000000');
# With light green fill
$text->fillcolor('#90EE90');
# Text render mode: fill text first then put the edges on top
$text->render(2);
# Diagonal transparent text
$text->textlabel(55, 235, $fnt, 56, "i'm in ur pdf", -rotate => 30);
# Back to non-transparent to do other stuff
$eg_norm->transparency(0);
$text->egstate($eg_norm);
$pdf->saveas($ofile);
I think there may be better ways of doing this within Perl now, but I already had some code using PDF::API2.
Having a bunch of Linux servers that run Linux virtual machines I often find myself having to move a virtual machine from one server to another. The tricky thing is that I’m not in a position to be using shared storage, i.e., the virtual machines’ storage is local to the machine they are running on. So, the data has to be moved first.
The naive approach is to shut the virtual machine down and copy its block device to the destination host, for example by piping dd over ssh. Transferring data between hosts via ssh will introduce some overhead since it will be encrypting everything. You may or may not wish to force it to do compression, or pipe it through a compressor (like gzip) first, or even avoid ssh entirely and just use nc.
Personally I don’t care about the ssh overhead; this is on the whole customer data and I’m happier if it’s encrypted. I also don’t bother compressing it unless it’s going over the Internet. Over a gigabit LAN I’ve found it fastest to use ssh with the -c arcfour option.
The above process works, but it has some fairly major limitations:
The virtual machine needs to be shut down for the whole time it takes to transfer data from one host to another. For 10GiB of data that’s not too bad. For 100GiB of data it’s rather painful.
It transfers the whole block device, even the empty bits. For example, if it’s a 10GiB block device with 2GiB of data on it, 10GiB still gets transferred.
Limitation #2 can be mitigated somewhat by compressing the data. But we can do better.
One of the great things about LVM is snapshots. You can do a snapshot of a virtual machine’s logical volume while it is still running, and transfer that using the above method.
But what do you end up with? A destination host with an out of date copy of the data on it, and a source host that is still running a virtual machine that’s still updating its data. How to get just the differences from the source host to the destination?
Again there is a naive approach, which is to shut down the virtual machine and mount the logical volume on the host itself, do the same on the destination host, and use rsync to transfer the differences.
This will work, but again has major issues such as:
It’s technically possible for a virtual machine admin to maliciously construct a filesystem that interferes with the host that mounts it. Mounting random filesystems is risky.
Even if you’re willing to risk the above, you have to guess what the filesystem is going to be. Is it ext3? Will it have the same options that your host supports? Will your host even support whatever filesystem is on there?
What if it isn’t a filesystem at all? It could well be a partitioned disk device, which you can still work with using kpartx, but it’s a major pain. Or it could even be a raw block device used by some tool you have no clue about.
The bottom line is, it’s a world of risk and hassle interfering with the data of virtual machines that you don’t admin.
Sadly rsync doesn’t support syncing a block device. There’s a --copy-devices patch that allows it to do so, but after applying it I found that while it can now read from a block device, it would still only write to a file.
Next I found a --write-devices patch by Darryl Dixon, which provides the other end of the functionality – it allows rsync to write to a block device instead of files in a filesystem. Unfortunately no matter what I tried, this would just send all the data every time, i.e., it was no more efficient than just using dd.
Are you OK? Do you need to have a nice cup of tea and a sit down for a bit? Yeah. I did too.
I’ve rewritten this thing into a single Perl script so it’s a little bit more readable, but I’ll attempt to explain what the above abomination does.
Even though I do refer to this script in unkind terms like “abomination”, I will be the first to admit that I couldn’t have come up with it myself, and that I’m not going to show you my single Perl script version because it’s still nearly as bad. Sorry!
It connects to the destination host and starts a Perl script which begins reading the block device over there, 1024 bytes at a time, running that through md5 and piping the output to a Perl script running locally (on the source host).
The local Perl script is reading the source block device 1024 bytes at a time, doing md5 on that and comparing it to the md5 hashes it is reading from the destination side. If they’re the same then it prints “s” otherwise it prints “c” followed by the actual data from the source block device.
The output of the local Perl script is fed to a third Perl script running on the destination. It takes the sequence of “s” or “c” as instructions on whether to skip 1024 bytes (“s”) of the destination block device or whether to take 1024 bytes of data and write it to the destination block device (“c<1024 bytes of data>“).
The lzop bits are just doing compression and can be changed for gzip or omitted entirely.
Hopefully you can see that this is behaving like a very very dumb version of rsync.
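To make that concrete, here’s a rough standalone sketch of the destination side’s job, assuming 1024-byte blocks (it isn’t the actual abomination, which also has the lzop and ssh plumbing wrapped around it):

#!/usr/bin/perl
# Sketch of the destination-side "skip or write" loop. Reads the
# instruction stream ("s" = block unchanged, "c" plus data = new block
# contents) on STDIN and applies it to the block device given as the
# first argument.
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);

my $dev = shift or die "usage: $0 <block device>\n";
my $blocksize = 1024;

open my $out, '+<', $dev or die "open $dev: $!\n";
binmode $out;
binmode STDIN;

while (read STDIN, my $op, 1) {
    if ($op eq 's') {
        # Identical block: leave the existing contents alone.
        sysseek $out, $blocksize, SEEK_CUR;
    }
    elsif ($op eq 'c') {
        # Changed block: the next $blocksize bytes are the new contents.
        read STDIN, my $buf, $blocksize;
        syswrite $out, $buf;
    }
    else {
        die "unexpected instruction '$op' in stream\n";
    }
}
close $out;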
The thing is, it works really well. If you’re not convinced, run md5sum (or sha1sum or whatever you like) on both the source and destination block devices to verify that they’re identical.
The process now becomes something like:
Take an LVM snapshot of virtual machine block device while the virtual machine is still running.
Create suitable logical volume on destination host.
Use dd to copy the snapshot volume to the destination volume.
Move over any other configuration while that’s taking place.
When the initial copy is complete, shut down the virtual machine.
Run the script of doom to sync over the differences from the real device to the destination.
When that’s finished, start up the virtual machine on the destination host.
Delete snapshot on source host.
1024 bytes seemed like rather a small buffer to be working with so I upped it to 1MiB.
I find that on a typical 10GiB block device there might only be a few hundred MiB of changes between snapshot and virtual machine shut down. The entire device does have to be read through of course, but the down time and data transferred is dramatically reduced.
Is there a better way to do this, still without shared storage?
It’s getting difficult to sell the disk capacity that comes with the number of spindles I need for performance, so maybe I could do something with DRBD so that there’s always another server with a copy of the data?
This seems like it should work, but I’ve no experience of DRBD. Presumably the active node would have to be using the /dev/drbdX devices as disks. Does DRBD scale to having say 100 of those on one host? It seems like a lot of added complexity.
I just thought I would blog this so that people can find it in a search, as I could not when I tried.
I have a perl script on a machine that sends out transactional emails with the from address like firstname.lastname@example.org. The machine it sends from is not in example.org. Let’s call it admin.foobar.example.com. Since some misguided people think that sender verification callouts are a good idea, their MX hosts try to connect to admin.foobar to see if the envelope sender address is there and accepting mails. This fails because admin.foobar does not run a public MTA and I do not want it to run one either.
It’s clear that the correct thing is for the script to be setting its envelope sender to firstname.lastname@example.org, since that should be an email address that does actually work! How to do it with perl’s Mail::Send, however, is not so clear.
After trying to work this out myself for a few hours I asked dg, who is not just a perl monger but a veritable perl Tesco, making mere perl mongers compromise on their margins and putting many of them completely out of business. Every little helps.
Based on his reading of the source of Mail/Util.pm, he thought it might be accomplished by setting the MAILADDRESS environment variable. He was wrong however:
<dg> oh, that only works for SMTP
<dg> how crap
The next suggestion was a no-go also, but third time lucky: