Update 2007-05-05: I sent some strace output to the author of haproxy, Willy Tarreau, and he replied within 24 hours with a full annotation of the strace and a one line patch to fix this issue. That’s what I call support! Here are his comments and the patch:
I think this is caused by the fact that the end of connection from the client was received BEFORE the connection even established to the server, and when the connection status is checked, there is nothing anymore in the buffer because all data was just sent at once.
Could you please apply the following patch (against 1.2.17) and check that it fixes your problem (simply do “patch -p1” on the mail) ?
diff --git a/haproxy.c b/haproxy.c
index 8e57700..357a37a 100644
--- a/haproxy.c
+++ b/haproxy.c
@@ -5589,7 +5589,7 @@ int process_srv(struct session *t) {
     else if (s == SV_STCONN) { /* connection in progress */
         if (c == CL_STCLOSE || c == CL_STSHUTW ||
             (c == CL_STSHUTR &&
-             (t->req->l == 0 || t->proxy->options & PR_O_ABRT_CLOSE))) { /* give up */
+             ((t->req->l == 0 && t->res_sw == RES_SILENT) || t->proxy->options & PR_O_ABRT_CLOSE))) { /* give up */
             tv_eternity(&t->cnexpire);
             fd_delete(t->srv_fd);
             if (t->srv)
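To make the effect of the one line change easier to see, here is a minimal sketch of the give-up test before and after the patch, written as two standalone functions. The enum values are hypothetical stand-ins for haproxy's internal constants, and the sketch assumes that RES_SILENT means nothing has been written to the server side yet, which is how the patch reads in light of the explanation above. It is not haproxy's actual code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for haproxy's internal constants; the real
 * definitions live in haproxy.c and are not reproduced here. */
enum cli_state { CL_STDATA, CL_STSHUTR, CL_STSHUTW, CL_STCLOSE };
enum write_result { RES_SILENT, RES_DATA };

/* Give-up test while the server connection is still in progress,
 * as it was before the patch. */
static bool give_up_before(enum cli_state c, int req_len, bool abort_on_close)
{
    return c == CL_STCLOSE || c == CL_STSHUTW ||
           (c == CL_STSHUTR && (req_len == 0 || abort_on_close));
}

/* Same test after the patch: an empty request buffer only counts as
 * "nothing to forward" if no write happened (assumed meaning of
 * RES_SILENT). */
static bool give_up_after(enum cli_state c, int req_len,
                          enum write_result res_sw, bool abort_on_close)
{
    return c == CL_STCLOSE || c == CL_STSHUTW ||
           (c == CL_STSHUTR &&
            ((req_len == 0 && res_sw == RES_SILENT) || abort_on_close));
}
```

With the client already shut down for reading and the buffer emptied by a single send, give_up_before(CL_STSHUTR, 0, false) is true, while give_up_after(CL_STSHUTR, 0, RES_DATA, false) is false, so the session is no longer abandoned in that case.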
Original problem is described below…
I’m trying to use haproxy to load balance three spamassassin spamd servers. spamd uses a plain text TCP protocol so in theory it should be simple, but I’m getting intermittent connection problems. Here’s my config:
global
    log 127.0.0.1 local0 debug
    maxconn 100
    ulimit-n 512
    uid 999
    gid 999
    daemon
    pidfile /var/run/haproxy-spamd.pid

listen spamd
    bind 212.13.194.5:783
    mode tcp
    option tcplog
    log global
    balance roundrobin
    source 212.13.194.5:0
    clitimeout 150000
    srvtimeout 150000
    contimeout 30000
    server corona 212.13.194.122:783 weight 5
    server curacao 212.13.194.71:783 weight 5
    server islay 212.13.194.96:783 weight 6
The problem is that sometimes the client (in my case my MTA, Exim) drops the connection immediately and logs:
2007-04-28 23:10:32 1Hhw3k-0000xN-G2 spam acl condition: cannot parse spamd output
2007-04-28 23:10:32 1Hhw3k-0000xN-G2 SA: Action: scanned but message isn't spam: score=0.7 required=5.0 (scanned in 0/0 secs | Message-Id: SODIUM3tt4LQsJABSCu000006a6@sodium.lon.periodicnetwork.com). From(host=mail.argon.lon.periodicnetwork.com [83.245.63.194]) for elided@snowblind.net
2007-04-28 23:10:32 1Hhw3k-0000xN-G2 <= noreply@periodicnetwork.com H=mail.argon.lon.periodicnetwork.com (ARGON.lon.periodicnetwork.com) [83.245.63.194] P=esmtp S=2322 id=B0009206781@ARGON.lon.periodicnetwork.com
2007-04-28 23:10:33 1Hhw3k-0000xN-G2 => elided@gmail.com R=dnslookup T=remote_smtp H=gmail-smtp-in.l.google.com [66.249.93.114]
2007-04-28 23:10:33 1Hhw3k-0000xN-G2 Completed
At these times, haproxy’s log will report:
Apr 28 23:10:32 localhost haproxy[22910]: 212.13.194.70:32958 [28/Apr/2007:23:10:32] spamd islay 0/-1/7 0 CC 0/0/0 0/0
Apr 28 23:10:32 localhost haproxy[22910]: 212.13.194.70:32961 [28/Apr/2007:23:10:32] spamd corona 0/0/236 2285 -- 0/0/0 0/0
The “CC” means that the client dropped the connection before a connection to a backend server was made. That’s the first connection in Exim’s spam acl. The second connection from SA-Exim was successful.
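As a rough aid for spotting these entries, here is a minimal sketch that pulls the interesting fields out of the message part of a haproxy log line like the two above (everything after "haproxy[pid]: "). The struct and function names are made up for illustration, and the sketch assumes the three slash-separated timers are queue wait, connect time and total time, with -1 marking a step that never completed. Check the haproxy documentation for your version before relying on that.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of one tcplog message. */
struct tcplog {
    char client[64];    /* client ip:port */
    char listener[32];  /* e.g. "spamd" */
    char server[32];    /* e.g. "islay" */
    int  t_queue;       /* assumed meaning of the first timer */
    int  t_connect;     /* assumed meaning; -1 in the failing entry */
    int  t_total;       /* assumed meaning of the third timer */
    long bytes;         /* bytes returned to the client */
    char term[3];       /* termination flags, e.g. "CC" or "--" */
};

/* Parse the part of the line after "haproxy[pid]: ".
 * Returns 0 on success, -1 if the line does not match. */
static int parse_tcplog(const char *msg, struct tcplog *out)
{
    int n = sscanf(msg, "%63s [%*[^]]] %31s %31s %d/%d/%d %ld %2s",
                   out->client, out->listener, out->server,
                   &out->t_queue, &out->t_connect, &out->t_total,
                   &out->bytes, out->term);
    return n == 8 ? 0 : -1;
}

/* True for the "CC" case described above: the client went away
 * before the connection to the backend was made. */
static int client_gone_before_connect(const struct tcplog *e)
{
    return strcmp(e->term, "CC") == 0;
}
```

Fed the first line above, this yields server "islay", 0 bytes and flags "CC"; the second yields server "corona", 2285 bytes and flags "--".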
(At the moment Exim on 212.13.194.70 is making both a spam acl connection to spamd and then an SA-Exim one as well, so two connections per email accepted. This is just a transitional thing while I move away from SA-Exim and isn’t a long-term plan.)
It’s not always the spamd islay that shows this error, and the problem doesn’t happen every time – both curacao and islay have successful and problem connections. Only corona is always successful. I don’t know why.
Also when this happens, although Exim drops its spamd connection immediately after sending data, haproxy does pass the connection through to a backend spamd which does process it as normal:
Apr 28 23:10:32 admin spamd[11394]: spamd: connection from 212.13.194.5 [212.13.194.5] at port 33761
Apr 28 23:10:32 admin spamd[11394]: spamd: checking messageaka for Debian-exim:102
Apr 28 23:10:33 admin spamd[11394]: spamd: clean message (0.8/5.0) for Debian-exim:102 in 0.7 seconds, 1668 bytes.
Apr 28 23:10:33 admin spamd[11394]: spamd: result: . 0 - AWL,NO_REAL_NAME,PORN_URL_SEX,SPF_PASS scantime=0.7,size=1668,user=Debian-exim,uid=102,required_score=5.0,rhost=212.13.194.5,raddr=212.13.194.5,rport=33761,mid= ,rmid= ,autolearn=no
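For anyone wanting to reproduce the pattern, the client behaviour that seems to trigger this (send the whole request, then signal end of data straight away) can be sketched roughly as below. This is only an illustration under the assumption that the client half-closes its write side after sending; it is not taken from Exim's source, and the function name is made up.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write the whole buffer, then shut down the write side so the peer
 * sees end-of-stream right behind the data. The read side stays open
 * for the reply. Returns 0 on success, -1 on error. */
static int send_and_half_close(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the write */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return shutdown(fd, SHUT_WR);
}
```

A proxy sitting between this client and spamd receives the data and the end-of-stream almost together, possibly before its own connection to the backend has completed, which matches the timing described in the log entries above.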
haproxy doesn’t have a support mailing list, only an IRC channel, which I am reluctant to bring this up in. I mailed the author and he doesn’t know why it should be behaving like this either. Anyone else have any ideas?
Failing that, anyone know a decent, open source software load balancing solution for generic TCP? Bonus if I can direct to least busy backend, or if I can specify a limit of connections per server.
I’ve not tried it myself (at least not recently), but have you looked at LVS/ipvsadm? http://kb.linuxvirtualserver.org/wiki/IPVS
I came across this post in regards to haproxy but found it interesting because you were load balancing spamd, something we have been doing for a number of years now using the above mentioned LVS + heartbeat solution. We are load balancing spamd connections across 8 backend servers and are processing ~100 million spam scans/month, so I can definitely recommend the above solution.