Update 2007-05-05: I sent some strace output to the author of haproxy, Willy Tarreau, and he replied within 24 hours with a full annotation of the strace and a one-line patch to fix this issue. That’s what I call support! Here are his comments and the patch:
I think this is caused by the fact that the end of connection from the client was received BEFORE the connection even established to the server, and when the connection status is checked, there is nothing anymore in the buffer because all data was just sent at once.
Could you please apply the following patch (against 1.2.17) and check that it fixes your problem (simply do “patch -p1” on the mail) ?
diff --git a/haproxy.c b/haproxy.c
index 8e57700..357a37a 100644
--- a/haproxy.c
+++ b/haproxy.c
@@ -5589,7 +5589,7 @@ int process_srv(struct session *t) {
     else if (s == SV_STCONN) { /* connection in progress */
         if (c == CL_STCLOSE || c == CL_STSHUTW ||
             (c == CL_STSHUTR &&
-             (t->req->l == 0 || t->proxy->options & PR_O_ABRT_CLOSE))) { /* give up */
+             ((t->req->l == 0 && t->res_sw == RES_SILENT) || t->proxy->options & PR_O_ABRT_CLOSE))) { /* give up */
             tv_eternity(&t->cnexpire);
             fd_delete(t->srv_fd);
             if (t->srv)
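
To make the effect of the one-line change easier to see, here is a small Python model of the "give up" test before and after the patch. It is only a sketch: the constants are plain strings that mirror names from the diff rather than haproxy's real values, RES_WROTE_DATA is a hypothetical stand-in for "the last write to the server sent data", and the abortonclose option is reduced to a boolean.

```python
# Simplified model of the "give up" test in process_srv() while the
# server connection is still in progress (SV_STCONN).
# The constants only mirror names from the patch; they are not haproxy's
# real values. RES_WROTE_DATA is a hypothetical stand-in, not a real name.
CL_STSHUTR, CL_STSHUTW, CL_STCLOSE = "SHUTR", "SHUTW", "CLOSE"
RES_SILENT, RES_WROTE_DATA = "SILENT", "WROTE_DATA"


def give_up_old(c, req_len, abort_on_close):
    """Condition before the patch: an empty request buffer is enough."""
    return c in (CL_STCLOSE, CL_STSHUTW) or (
        c == CL_STSHUTR and (req_len == 0 or abort_on_close)
    )


def give_up_new(c, req_len, res_sw, abort_on_close):
    """Condition after the patch: the buffer must be empty AND nothing
    may have just been written to the server."""
    return c in (CL_STCLOSE, CL_STSHUTW) or (
        c == CL_STSHUTR
        and ((req_len == 0 and res_sw == RES_SILENT) or abort_on_close)
    )


# The client has shut down its sending side and the request buffer is
# empty because all of the data was just sent to the server in one go.
print(give_up_old(CL_STSHUTR, 0, False))                  # True
print(give_up_new(CL_STSHUTR, 0, RES_WROTE_DATA, False))  # False
```

In this model the old test abandons the session in exactly the situation Willy describes, while the patched test keeps it alive.
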
Original problem is described below…
I’m trying to use haproxy to load balance three spamassassin spamd servers. spamd uses a plain text TCP protocol so in theory it should be simple, but I’m getting intermittent connection problems. Here’s my config:
global
    log 127.0.0.1 local0 debug
    maxconn 100
    ulimit-n 512
    uid 999
    gid 999
    daemon
    pidfile /var/run/haproxy-spamd.pid

listen spamd
    bind 212.13.194.5:783
    mode tcp
    option tcplog
    log global
    balance roundrobin
    source 212.13.194.5:0
    clitimeout 150000
    srvtimeout 150000
    contimeout 30000
    server corona 212.13.194.122:783 weight 5
    server curacao 212.13.194.71:783 weight 5
    server islay 212.13.194.96:783 weight 6
The problem is that sometimes the client (in my case my MTA, Exim) drops the connection immediately and logs:
2007-04-28 23:10:32 1Hhw3k-0000xN-G2 spam acl condition: cannot parse spamd output
2007-04-28 23:10:32 1Hhw3k-0000xN-G2 SA: Action: scanned but message isn't spam: score=0.7 required=5.0 (scanned in 0/0 secs | Message-Id: SODIUM3tt4LQsJABSCu000006a6@sodium.lon.periodicnetwork.com). From (host=mail.argon.lon.periodicnetwork.com [83.245.63.194]) for elided@snowblind.net
2007-04-28 23:10:32 1Hhw3k-0000xN-G2 <= noreply@periodicnetwork.com H=mail.argon.lon.periodicnetwork.com (ARGON.lon.periodicnetwork.com) [83.245.63.194] P=esmtp S=2322 id=B0009206781@ARGON.lon.periodicnetwork.com
2007-04-28 23:10:33 1Hhw3k-0000xN-G2 => elided@gmail.com R=dnslookup T=remote_smtp H=gmail-smtp-in.l.google.com [66.249.93.114]
2007-04-28 23:10:33 1Hhw3k-0000xN-G2 Completed
At these times, haproxy’s log reports:
Apr 28 23:10:32 localhost haproxy[22910]: 212.13.194.70:32958 [28/Apr/2007:23:10:32] spamd islay 0/-1/7 0 CC 0/0/0 0/0
Apr 28 23:10:32 localhost haproxy[22910]: 212.13.194.70:32961 [28/Apr/2007:23:10:32] spamd corona 0/0/236 2285 -- 0/0/0 0/0
The “CC” means that the client dropped the connection before a connection to a backend server was made. That was the first connection, the one from Exim’s spam ACL. The second connection, from SA-Exim, was successful.
(At the moment Exim on 212.13.194.70 makes both a spam ACL connection to spamd and then an SA-Exim one as well, so there are two connections per email accepted. This is just a transitional thing while I move away from SA-Exim and isn’t a long-term plan.)
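
For picking the failed sessions out of a larger log, the haproxy lines above can be parsed with a short script. This is a rough sketch: the field names are my own labels, and reading the three timers as queue/connect/total milliseconds (with -1 meaning the step never completed) is an assumption about the tcplog layout. Only the meaning of "CC" comes from the explanation above.

```python
import re

# Tail of a haproxy "option tcplog" line. The group names are this
# sketch's own labels; the last two slash-separated groups are kept raw
# because their meaning is not needed here.
TCPLOG_RE = re.compile(
    r"(?P<client>\S+) \[(?P<accepted>[^\]]+)\] "
    r"(?P<listener>\S+) (?P<server>\S+) "
    r"(?P<timers>-?\d+/-?\d+/-?\d+) "
    r"(?P<byte_count>\d+) (?P<state>\S{2}) "
    r"(?P<conns>\d+/\d+/\d+) (?P<queues>\d+/\d+)$"
)


def parse_tcplog(line):
    """Return a dict of fields for one tcplog line, or None if no match."""
    m = TCPLOG_RE.search(line.strip())
    if m is None:
        return None
    # Assumed order: queue time, connect time, total time (milliseconds).
    queue_ms, connect_ms, total_ms = (
        int(x) for x in m.group("timers").split("/")
    )
    return {
        "client": m.group("client"),
        "listener": m.group("listener"),
        "server": m.group("server"),
        "queue_ms": queue_ms,
        "connect_ms": connect_ms,
        "total_ms": total_ms,
        "byte_count": int(m.group("byte_count")),
        "state": m.group("state"),
        # "CC": the client dropped before a backend connection was made.
        "dropped_before_connect": m.group("state") == "CC",
    }
```

Feeding each log line to parse_tcplog() and keeping the entries where dropped_before_connect is true gives a list of the affected sessions and the backend each was assigned to.
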
It’s not always the spamd islay that shows this error, and the problem doesn’t happen every time – both curacao and islay have successful and problem connections. Only corona is always successful. I don’t know why.
Also when this happens, although Exim drops its spamd connection immediately after sending data, haproxy does pass the connection through to a backend spamd which does process it as normal:
Apr 28 23:10:32 admin spamd[11394]: spamd: connection from 212.13.194.5 [212.13.194.5] at port 33761
Apr 28 23:10:32 admin spamd[11394]: spamd: checking message aka for Debian-exim:102
Apr 28 23:10:33 admin spamd[11394]: spamd: clean message (0.8/5.0) for Debian-exim:102 in 0.7 seconds, 1668 bytes.
Apr 28 23:10:33 admin spamd[11394]: spamd: result: . 0 - AWL,NO_REAL_NAME,PORN_URL_SEX,SPF_PASS scantime=0.7,size=1668,user=Debian-exim,uid=102,required_score=5.0,rhost=212.13.194.5,raddr=212.13.194.5,rport=33761,mid=,rmid=,autolearn=no
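
One way to try reproducing this without Exim is a small client that writes its whole request and then immediately shuts down its sending side. The sketch below assumes that this half-close is what triggers the problem; the function name and parameters are illustrative, and the request bytes are left to the caller.

```python
import socket


def send_and_half_close(host, port, payload, timeout=10.0):
    """Send payload, shut down the sending side, then read until EOF."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        # Half-close: the peer sees end-of-stream on its read side,
        # but this end can still receive the reply.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling this in a loop against the balancer address (212.13.194.5, port 783 in the config above) with a valid spamd request, and comparing empty replies with “CC” entries in the haproxy log, would show whether the half-close alone is enough to trigger the problem.
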
haproxy doesn’t have a support mailing list, only an IRC channel, and I am reluctant to bring this up there. I mailed the author and he doesn’t know why it should be behaving like this either. Does anyone else have any ideas?
Failing that, does anyone know of a decent open source software load balancer for generic TCP? Bonus points if it can direct connections to the least busy backend, or if I can specify a connection limit per server.