Lingering close

One of the early pieces of code I tackled in httpd was APR-izing lingering_close(). I recall dean gaudet ensuring that I didn’t screw it up. (See an early part of the conversation.)

Perhaps a year or two later, some colleagues in z/OS TCP/IP and SNA, where I had worked before joining the IBM team working on httpd, let me know that a file transfer program I had written long before had stopped working reliably when transferring to or from z/OS after some updates in z/OS TCP/IP.

(Why does one reinvent file transfer? I decided to learn sockets programming but got tired of all the ifdefs to support Windows and OS/2 and VM/CMS and MVS, so I did what everyone else did and wrote a high-level library. A file transfer program comes next, right? Anyway, it was used by a number of colleagues for higher-level features like printing to a workstation printer from VM/CMS, XEDIT macros that interacted with your PC clipboard, and other fun stuff.)

At any rate, it was good that I had learned about lingering close via httpd, because otherwise I would have been shocked at the reason behind the intermittent failures in the file transfer program: every indication was that both client and server modes were doing exactly what they needed to do, yet one of the peers could get ECONNRESET before it had finished reading the response. IIRC, the lingering close logic was then implemented and some small amount of happiness ensued, but I didn't have time to rework the build for the then-available tools on VM/CMS, and a big use case died. Sorry, folks!

Fast forward through most of the life of the web… I’ve been playing recently with httpd and nginx in front of uWSGI and writing up my notes in this in-progress tutorial. I initially hit a bug where uWSGI didn’t consume the empty FCGI_STDIN record that terminates the FastCGI request body; with that fixed, I’m left with a familiar scenario: the server (uWSGI in this case) writes the entire response to the client (the FastCGI gateway) and then calls close(), and sometimes (more often over a slow network) the gateway gets ECONNRESET before it can read everything.

What is the authoritative text on the subject? I don’t know. Most discussions on the Internet about getting ECONNRESET from a read-type call do not mention an RST overtaking response data the server has already handed to the TCP send buffer. Some raise the issue of the client trying to send data after the server has closed. (The lingering close logic as normally implemented in TCP servers helps a lot in those cases.) This ancient Apache httpd 1.3 comment is as succinct a description of the problem as any I know of:

* in a nutshell -- if we don't make this effort we risk causing
* TCP RST packets to be sent which can tear down a connection before
* all the response data has been sent to the client.

Here is a blog page that covers this more disturbing scenario very clearly:

So can I trigger the issue at will with a simple client-or-server program? At first I couldn’t, but after quite a bit of experimentation the answer is definitely yes. The issue isn’t hard to trigger with httpd’s mod_proxy_fcgi in front of uWSGI, given a particular application-level flow and uWSGI configured to run the simple application on multiple processes and threads, but it wasn’t so easy with this simple program.

The final methodology to see the error:

server:
    $ ./a.out server
  (no trick there)
client:
  Run this in 8 or preferably more different terminal sessions:
    $ while ./a.out client 192.168.1.211; do echo ok; done
human:
    # while no-error-on-any-client-terminal-session; do Go clean some room in your house; done

Client and server are on two different machines connected by a relatively slow Wi-Fi.

As long as the server displays no socket call errors but at least one client displays something like read: Connection reset by peer, you’ve encountered the error. In your environment, errors may occur only with more clients. After seeing a few failures in my setup with around twelve client loops, I went for a long walk and found on my return that seven were still happy (many successful transactions with zero errors).

Can SO_LINGER help any, at least on recent Ubuntu talking to recent Ubuntu? I’ll try that, though I think the tried-and-true server logic (shut down the write side, then read from the client until EOF or a timeout before closing) is the safest solution.

Massive fail

Early results were incorrect; I had forgotten to set the listen backlog to a high enough number to avoid TCP resetting connections. More results to be posted later…

The moral of the story: if you get ECONNRESET on a read-type call before having read any data, first check for a listen backlog problem; in that case the server never processed the connection at all before it was reset.