The horrible things people’s routers do to my packets!

Auto-updating software

So, over the years, I’ve released a few programs (all native C/C++) which have included automatic version checking, so they can let the user know when a new version is available. I’ve always done this by making a socket connection to my web server (previously Apache, now entirely Node.js) and asking for a file which just contains a version number. Pretty simple, right? Sure!

SOCKET s = socket(AF_INET, SOCK_STREAM, 0);
connect(s, ...);
char cmd[] = "GET http://www.bigscreensmallgames.com/BestBombermanEver/version.txt HTTP/1.0\r\n\r\n";
send(s, cmd, sizeof(cmd), 0);
recv(s, ...);

And that’s about it. Note:

  • This sends a single packet, and no router’s MTU is small enough to fragment it, so in theory it should arrive as a single packet (though, strictly speaking, TCP is a byte stream and makes no promise that packet boundaries are preserved). Not that most web servers would care…
  • I don’t bother sending any headers, not even the Host header. Why would I, if this works?
  • There’s a bug in the code: sizeof(cmd) counts the string’s terminating null, so it writes a null character into the packet after the final line feed.
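For comparison, here is a minimal sketch of the bytes a more conventional version check could send: an origin-form request line with an explicit version, a Host header, a blank line, and nothing after it. It is written in JavaScript (the server-side language here) purely to show the byte layout; buildVersionRequest is a hypothetical helper, and the host and path are the ones from the C snippet. In the C client itself, the equivalent fix for the stray null would be sending strlen(cmd) (or sizeof(cmd) - 1) bytes instead of sizeof(cmd).

```javascript
// Hypothetical helper: builds a spec-friendly HTTP/1.0 version-check request.
// Host and path are taken from the C example above; adjust as needed.
function buildVersionRequest(host, path) {
  const lines = [
    `GET ${path} HTTP/1.0`, // origin-form request target plus explicit version
    `Host: ${host}`,        // optional in HTTP/1.0, mandatory in HTTP/1.1
    '',                     // these two empty entries produce the blank line
    '',                     // that terminates the header block
  ];
  // Join with CRLF: the result ends in exactly "\r\n\r\n" and has no NUL byte.
  return Buffer.from(lines.join('\r\n'), 'latin1');
}

const req = buildVersionRequest('www.bigscreensmallgames.com', '/BestBombermanEver/version.txt');
// prints "GET /BestBombermanEver/version.txt HTTP/1.0\r\nHost: www.bigscreensmallgames.com\r\n\r\n"
console.log(JSON.stringify(req.toString('latin1')));
```

The length sent on the wire is then simply req.length; there is no terminator to forget about.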

Before this iteration, I wasn’t even sending the HTTP/1.0 part of the request line, which means it wasn’t even a valid HTTP/1.x request. However, this worked wonderfully with Apache (the response didn’t even include any headers I had to parse and ignore, how elegant!), but it broke when I added HAProxy to my server stack.
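Apache’s header-less reply is consistent with it treating a request line that carries no version as an HTTP/0.9-style “simple request”, to which a server answers with just the body. A minimal sketch of that distinction, using a hypothetical classifyRequestLine helper that only looks at the shape of the request line (this is not how Apache or HAProxy actually parse requests):

```javascript
// Hypothetical helper: classifies a request line purely by its shape.
//   "GET <uri>"           -> HTTP/0.9-style simple request (reply is just the body)
//   "GET <uri> HTTP/x.y"  -> HTTP/1.x-style full request (reply has status + headers)
function classifyRequestLine(line) {
  const parts = line.trim().split(' '); // trim() also drops a trailing "\r\n"
  if (parts.length === 2 && parts[0] === 'GET') {
    return 'HTTP/0.9';
  }
  if (parts.length === 3 && /^HTTP\/\d\.\d$/.test(parts[2])) {
    return parts[2];
  }
  return 'invalid';
}

const uri = 'http://www.bigscreensmallgames.com/BestBombermanEver/version.txt';
console.log(classifyRequestLine(`GET ${uri}`));          // prints HTTP/0.9
console.log(classifyRequestLine(`GET ${uri} HTTP/1.0`)); // prints HTTP/1.0
```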

When things went wrong

This all worked fine for many years (as far as I knew), until I recently switched my web server architecture, at which point I noticed some interesting errors in my logs. Admittedly, I’m pretty certain Apache and HAProxy, with the default settings I was running, would not have told me if they were mishandling things in a similar way. I know that at least a few of these problems didn’t occur before (because version checking stopped working for everyone when I updated my server), but I suspect some of them may have been happening all along (in which case I would never have known, and could not have fixed them).

Lack of a host header

Hardware and software routers, and Node.js itself, didn’t seem to care about the missing Host header; AFAIK that’s valid HTTP/1.0 (Host only became mandatory in HTTP/1.1). However, I was using virtual-host routes in Express, which seem to rely on the Host header explicitly, so I had to patch the request before passing it off to Express:

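var http = require('http');
var url = require('url');

// express_app is the Express application, created elsewhere
var server = http.createServer(function (req, res) {
  // A request line like "GET http://host/path HTTP/1.0" with no Host header
  // arrives with req.url set to the absolute URL; move the host into the
  // header and leave only the path, which is the format express/vhost needs
  if (!req.headers.host && req.url.indexOf('://') !== -1) {
    var parsed = url.parse(req.url);
    req.headers.host = parsed.host;
    req.url = parsed.path + (parsed.hash || '');
  }
  express_app(req, res);
});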

Extraneous null

How much damage can an extra zero do?

Node.js completely blew up on the extraneous null at the end of my HTTP request. Node’s HTTP parser is rather strict and gives up on any malformed request; if your handlers are async, that even takes down an earlier, perfectly valid request on the same socket whose response hasn’t been sent yet when the malformed bytes arrive. That wasn’t acceptable: I’ve got apps in the wild, and they need to be able to phone home to check for updates (and, in some cases, actually download and apply the new updates).

So I put in what seemed to be a simple hack: intercept the first packet and, if it starts with “GET http://” and ends in “\r\n” followed by the null, strip off the null.
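A minimal sketch of that first hack, operating on a Node Buffer holding the first chunk read from a connection. fixFirstChunk is a hypothetical name, the example URL is a placeholder, and how the chunk gets intercepted before Node’s HTTP parser sees it is left out; this only shows the byte check itself:

```javascript
// Hypothetical helper implementing the "strip the trailing null" hack on the
// first chunk received from a connection.
const LEGACY_PREFIX = Buffer.from('GET http://', 'latin1');
const CRLF_NUL = Buffer.from('\r\n\0', 'latin1');

function fixFirstChunk(chunk) {
  const looksLegacy =
    chunk.length >= LEGACY_PREFIX.length + CRLF_NUL.length &&
    chunk.subarray(0, LEGACY_PREFIX.length).equals(LEGACY_PREFIX) &&
    chunk.subarray(chunk.length - CRLF_NUL.length).equals(CRLF_NUL);
  // Drop only the final NUL byte; anything that doesn't match passes through.
  return looksLegacy ? chunk.subarray(0, chunk.length - 1) : chunk;
}

const fixed = fixFirstChunk(
  Buffer.from('GET http://example.com/version.txt HTTP/1.0\r\n\r\n\0', 'latin1'));
// prints "GET http://example.com/version.txt HTTP/1.0\r\n\r\n"
console.log(JSON.stringify(fixed.toString('latin1')));
```

Because it only matches the exact byte layout the C client produces, in a single chunk, any middlebox that rewrites or re-splits the request slips straight past it.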

Simple, right? This worked! At least it did for me… Then crashes/exceptions from parsing HTTP streams started showing up in my logs. I found out that, for a lot of users, something between my app and my web server was intercepting and modifying my simple packet. That it’s being intercepted and inspected isn’t surprising, as all sorts of things do that: security software, firewalls, antivirus scanners, that kind of thing. It was a little surprising that my packet was getting modified, and even more surprising to see how many different, significant ways it got modified! I found, and had to put in a fix for, each of the following things happening to my packet:

  • Stripping carriage returns – probably some firewall being clever and saving a few bytes by using bare line feeds. My packet now ended with “\n\n” instead; simple enough to look for that…
  • Splitting into two packets – the second one containing just the null! How weird, and troublesome: I get a first, completely well-formed packet that looks like any other request, and then a second packet containing just a null character! What a waste of a packet! Probably some firewall being clever and splitting what it thought might be multiple requests on a single socket into two logical packets, or something like that. This was slightly more complicated to handle, since I had to keep a bit of state around rather than just looking at the first packet, though a second packet that is just a null is simple enough to detect.
  • Splitting into two packets – the second one containing “\n”! Even weirder: it’s like a combination of the first two gone bad, this time splitting a carriage return/line feed pair in half. I wonder whether this particular firewall splits even regular, well-formed HTTP requests into two packets all the time…
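Pulling those variations together, one way to sketch the per-connection state is a small filter that only looks at the first two chunks of a connection whose first chunk starts with “GET http://”, and drops a null byte only when it is the last byte of a chunk and directly follows a line feed (possibly one that arrived in the previous chunk). This covers the original request, the CR-stripped version, a lone null in its own packet, and the split “\r\n” pair, assuming the null rides along with the “\n” in that last case (if it doesn’t, there is nothing to strip and the chunk passes through). LegacyNullFilter is a hypothetical class, the URL is a placeholder, and wiring it into Node’s socket handling is omitted:

```javascript
// Hypothetical per-connection filter for the legacy version-check request.
// Feed it each chunk read from one socket, in order; it returns the bytes
// that should be handed on to the HTTP parser.
const LEGACY_PREFIX = Buffer.from('GET http://', 'latin1');
const LF = 0x0a;
const NUL = 0x00;

class LegacyNullFilter {
  constructor() {
    this.chunksSeen = 0; // how many chunks this connection has produced
    this.legacy = false; // did the first chunk look like the old request?
    this.lastByte = -1;  // last byte passed through, to bridge chunk splits
  }

  filter(chunk) {
    if (this.chunksSeen === 0) {
      this.legacy =
        chunk.length >= LEGACY_PREFIX.length &&
        chunk.subarray(0, LEGACY_PREFIX.length).equals(LEGACY_PREFIX);
    }
    this.chunksSeen++;
    // Only the first two chunks of a legacy connection are ever touched.
    if (!this.legacy || this.chunksSeen > 2 || chunk.length === 0) {
      return chunk;
    }
    // The byte before the final one, or the last byte of the previous chunk
    // when this chunk is a single byte (e.g. a lone "\0").
    const before = chunk.length >= 2 ? chunk[chunk.length - 2] : this.lastByte;
    let out = chunk;
    if (chunk[chunk.length - 1] === NUL && before === LF) {
      out = chunk.subarray(0, chunk.length - 1);
    }
    if (out.length > 0) {
      this.lastByte = out[out.length - 1];
    }
    return out;
  }
}

// Example: the "split CRLF" variant, with the NUL in the second chunk.
const f = new LegacyNullFilter();
const parts = [
  Buffer.from('GET http://example.com/version.txt HTTP/1.0\r\n\r', 'latin1'),
  Buffer.from('\n\0', 'latin1'),
].map((c) => f.filter(c));
// prints "GET http://example.com/version.txt HTTP/1.0\r\n\r\n"
console.log(JSON.stringify(Buffer.concat(parts).toString('latin1')));
```

Because the filter only removes a trailing null that follows a line feed, and only in the first two chunks of a connection that already looks like the old client, well-formed requests from other clients pass through untouched.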

These are just the results from the routers, antivirus programs, etc. that interfered little enough for my request to actually make it through, however mangled. Who knows what worse things are happening to requests that never reach me, or which poorer-quality routers are totally choking on my slightly malformed request.

Lessons learned

  • Stick to the spec – just a slight deviation from the HTTP spec caused quite a bit of headache. Well, deviation from what I assume the HTTP spec says: I’ve never actually read it thoroughly, as it’s rather large, so I’ve had to deduce what conforms to it by watching what Chrome does and seeing what Node accepts. Which is, I guess, exactly how I got into this mess (except that, at the time, it was what Apache accepted, which was much, much more relaxed).
  • I really don’t want to know how the (internet firewall) sausages are made.

