It’s Node.js All the Way Down

Summary

Node.js can be pretty simple, but things get more complicated when using it to combine serving virtual hosts, static sites, WebSockets (socket.io) everywhere, proxying to other apps, and request rewriting for hosting apps in subdirectories.

The problem

I have a host running a collection of websites and web apps, on various hostnames, running under various users, using various technologies.  I often forget exactly how all of these things work and fit together, and they have some subtle problems I don’t want to bother resolving.

The proposed solution

It’s Node.js Apps all the way down!

It's Turtles All the Way Down

Goals

I’ve acquired some requirements through my years of running a handful of piecemeal sites on this host:

  • Single port (port 80) – Absolutely nothing running publicly on a port other than 80; non-standard ports just cause too many headaches and problems with proxies, routers, firewalls, etc.
  • Serve static websites from a folder trivially with browsable folders (when there’s no index.html) for serving random dumps of things (Apache was doing this fine until I put HAProxy in front of it).
  • Various virtual hosts configurations
    • Direct virtual host to static website (Apache was fine for this)
    • Direct virtual host to another Node.js app listening on another port or on another machine on my network (with WebSocket support) (I previously used HAProxy for this)
  • Node.js apps mapped to be running in sub-folders on any host (e.g. foo.com/ is a static site, but foo.com/app1/ pulls up some single-page web app that probably also uses WebSockets) – this requires a well-written app (all relative paths, or it knows what subfolder it’s served out of) and request rewriting (I previously used HAProxy for these, see config below)
  • Centralized logging – I’m lazy and don’t bother logging in most of my sub-apps, it would be nice to have a single place which logs all requests coming in to my site and apps

For the purposes of this blog post, I’ll consider just one site of each kind of configuration, but it should be clear that in reality I have a few sites under each of these configurations.

Previous Setup

Topology Before

As mentioned above, I had HAProxy running as a reverse proxy, accepting public connections on port 80, with a bunch of rules to forward on to various sub-apps and any virtual hosts which were Node.js apps.  I was also running Apache to serve any static sites, and used Apache’s virtual host features to serve any virtual hosts which were static sites.  For posterity, here’s roughly what the relevant block of my HAProxy config looked like:

   frontend http-in
       bind *:80
       default_backend apache_http

       acl status_query path /haproxy_status
       acl host_fanime hdr_dom(host) fanime.info
       acl host_dashingstrike hdr_dom(host) dashingstrike.com
       acl path_app1 path_beg /app1

       use_backend apache_http if status_query
       use_backend fanime if host_fanime
       use_backend app1 if host_dashingstrike path_app1

       # Need to close otherwise re-used sockets will
       #   not get re-routed
       option http-server-close

       option forwardfor
       # http close so that we get a new connection for each
       #   request and get the forwardfor header added
       #   consistently.
       option forceclose

   backend apache_http
       server server00 localhost:81

       # add monitoring page to http server
       stats enable
       stats uri /haproxy_status
       stats refresh 15s
       stats realm   Haproxy\ Statistics
       stats auth    myadmin:mypass
       stats admin if LOCALHOST

   backend fanime
       server server00 localhost:4001

   backend app1
       # Re-write requests
       reqrep ^([^\ :]*)\ /app1/(.*)     \1\ /\2
       server server00 192.168.0.127:21022

It’s not horrible, but it’s not the best in the world, and I always forget exactly how each of the HAProxy features work.  Luckily I’ve got this posted in a blog now so I can always look it up in the future!  Of course, I’m probably doing something horribly wrong somewhere in there.

Problem details

This is somewhat the opposite problem as my previous blog post, where we wanted the absolute best performance; here I care very little about performance as compared to maintenance and flexibility.  However, this solution written entirely in Node.js can be adapted to perform well, possibly with adding in a few more Express middleware modules for better caching of static content, or by using something like Nginx to serve static files very efficiently, either behind a proxy or in front of my Node.js app.

I’m very comfortable in Node.js, enough that I know I can resolve any problem that arises, so even though it’s not the most efficient for things like serving static websites, I decided it would save me the most headaches in the long term to just set this up as an (in theory) straightforward Node.js app.

In the past, when I put a reverse proxy (HAProxy) in front of my (previously just Apache) setup, I had to re-configure Apache to run on another port, and it then had problems with directory redirects (when you access foo.com/bar and it needs to redirect you to foo.com/bar/ so that relative paths work correctly) going to the wrong port (since Apache thought redirecting to the port it was listening on was a good idea).  I did some searching and it turns out that the internet is full of millions of pages talking about “apache”, “ports”, and “redirects”, absolutely none of which are talking about this particular problem, which I was never able to resolve.  Also, it just seemed like a giant, inelegant mess to maintain, with possible security holes if I didn’t keep both Apache and HAProxy up to date.

I thought I was clever with setting up HAProxy to do some request re-writing for serving single-page apps under a subdirectory, however HAProxy had problems with this since the browser re-uses connections, and HAProxy only evaluates the ACL upon the initial request, so subsequent requests might get proxied to the wrong back-end server.  I was able to work around this with “option http-server-close” to force every request onto a new connection, but that is particularly bad for performance (especially if I ever want to use SSL).

Solution details

Topology After

I created a new, main Node.js app which accepts and routes all initial connections.  I decided to use the Express framework due to its convenient middleware architecture and ecosystem.  I also have this app handling serving of all of my static websites, which could be a performance issue if there is heavy traffic, at which point I could first enable the cluster module, and second move the static serving to a separate process (probably Nginx or something).

As far as Express middleware goes, I took advantage of vhost, morgan (for logging), and serve-index and serve-static (for static sites and directory browsing).

The pieces

The vhost middleware essentially works by mapping a hostname to a handler or another Express app, so I assembled handlers and Express apps for my various back-end configurations:

  • Static sites which I serve with serve-index and serve-static
  • Proxied sites which I proxy using the http-proxy module
  • My “main site”
    • Has a path-based Express route for /app1 that proxies to another Node.js server.  Express’s path-based routes effectively rewrite the request to be relative to the folder they’re bound to, so a request for /app1/ gets sent as a request for / to my proxied server, which is exactly what I need.
    • Also serves a static site if other rules are not matched

WebSockets

The Express router/middleware does not deal with these gracefully, so I set up a second middleware chain explicitly for handling WebSocket “upgrade” requests.  Each of the above pieces which requires WebSockets also registers itself in the same manner (but with a function which proxies WebSocket requests) on this second router.  I then bind this router directly to the “upgrade” event on my HTTP server, which does what is needed.

The pieces in code

NOTE: If you’re not signed in, WordPress is mangling the code blocks below, removing random characters while inserting some advertising tracking stuff. Log in to WordPress or view the code directly on GitHub instead.

Modules from NPM/Node

var express = require('express');
var http = require('http');
var httpProxy = require('http-proxy');
var morgan = require('morgan');
var serveIndex = require('serve-index');
var serveStatic = require('serve-static');
var url = require('url');
var vhost = require('vhost');

Don’t crash our app, just log errors. This is absolutely required (or something like it, using Domains), because, among other things, httpProxy throws uncaught socket hang-up exceptions whenever proxying to another server which disconnects a socket (e.g. if you make a socket.io connection to the wrong endpoint, or your sub-app forcibly disconnects for whatever reason).

function handleError(e) {
  console.error('ERROR', new Date(), e);
}
process.on('uncaughtException', handleError);

Serve a static website by name from a folder named the same

function staticSite(app, dir) {
  app = app || express();
  app.use(serveStatic('/var/web/' + dir + '/'));
  app.use(serveIndex('/var/web/' + dir + '/', { icons: true, view: 'details' }));
  return app;
}

Handlers to proxy regular requests and WebSocket requests

function webProxy(proxy) {
  return proxy.web.bind(proxy);
}

function wsProxy(proxy) {
  return function (req, data) {
    console.log('Proxying ws upgrade request for ' + req.headers.host + ' ' + (req.originalUrl || req.url));
    proxy.ws(req, data.socket, data.head);
  };
}

Create proxies for sub sites running as separate processes

var proxy_fanimeinfo = httpProxy.createProxyServer({
  target: { host: 'localhost', port: 4001 },
  agent: http.globalAgent, // passing agent to prevent connection: close from being added
  xfwd: true, // add x-forwarded-for header so we get the real IP
});

var proxy_app1 = httpProxy.createProxyServer({
  target: { host: '192.168.0.127', port: 21022 },
  agent: http.globalAgent, // passing agent to prevent connection: close from being added
  xfwd: true, // add x-forwarded-for header so we get the real IP
});

Utility middleware to redirect things like foo.com/bar to foo.com/bar/ so relative paths work right in sub-apps.

function directoryize(app, dir) {
  app.get('/' + dir, function (req, res, next) {
    if (req.url !== '/' + dir) {
      return next();
    }
    res.redirect('/' + dir + '/');
  });
}

Putting the pieces together

Within this Node.js app, the primary Express app handling connections chains together morgan for logging, and then a bunch of vhost layers directing to various subsites, followed by my default “main site” app as described above.

My “main” app, including proxying a subdirectory

var main = express();
var main_ws = express.Router();

directoryize(main, 'app1');
main.use('/app1', webProxy(proxy_app1));
main_ws.use('/app1', wsProxy(proxy_app1));

staticSite(main, 'dashingstrike.com');

Primary Express app

// Root vhost app
var vhost_app = express();
var vhost_ws = express.Router();

// Logging
vhost_app.use(morgan('combined'));

// Directory mapped virtual hosts
vhost_app.use(vhost(/(?:www\.)?bigscreensmallgames\.com/, staticSite(null, 'bigscreensmallgames.com')));

// Virtual hosts mapping to other node apps running as separate processes
vhost_app.use(vhost(/(?:www\.)?fanime\.info/, webProxy(proxy_fanimeinfo)));
vhost_ws.use(vhost(/(?:www\.)?fanime\.info/, wsProxy(proxy_fanimeinfo)));

// Default - dashingstrike.com and variants, anything unknown, etc
vhost_app.use(main);
vhost_ws.use(main_ws);

Finally, create an HTTP server, listen, and route WebSocket upgrade requests appropriately; all other requests are handled by the primary app defined above. Also attach an error handler on the socket so that we get a log of errors (otherwise morgan silently suppresses them).

var server = http.createServer(function (req, res) {
  req.socket.on('error', handleError);
  vhost_app(req, res);
});

server.on('upgrade', function (req, socket, head) {
  // Use the same express.Router logic for vhost mapping of the upgrade request
  vhost_ws(req, { socket: socket, head: head }, function () {
    console.log('No one to proxy websocket to for ' + req.headers.host + ' ' + req.url);
  });
});

var port = 80;
server.listen(port);
console.log('Started webroot on port ' + port + ', process id ' + process.pid);

You can view the entire app (including logging and other details not relevant to this post) over on GitHub.

Finishing touches and other comments

I first used IPTables to forward port 80 to my Node.js app, so I didn’t need to deal with process elevation.  I later stumbled upon the much easier option (especially if you don’t already have IPTables configured), which is to use setcap to give the Node.js binary rights to open otherwise privileged ports, like so:
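
For posterity, the IPTables rule looked roughly like this (the unprivileged port 8080 here is illustrative; it would be whatever port the Node.js app actually listens on):

```shell
$ sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8080
```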

$ sudo setcap cap_net_bind_service=+ep `which node`

After that, I can just listen on port 80 directly.

I discovered Node.js’s HTTP request parser is much stricter than Apache’s or HAProxy’s, which led me to find that some legacy native code I have deployed around the world (auto-update features in native apps) must not have been quite following the HTTP spec (but, hey, it worked, mostly…), so I ended up having to add in another layer to deal with those requests (I absolutely needed those requests to work, as they were the means by which the applications would automatically update themselves once I fixed them to be more compliant).  Mostly because it’s amusing how many ways things can go wrong, I posted another blog post about that!

As far as performance goes, for specific tasks like proxying and serving static files, a dynamic, flexible platform like this built on Node.js will never be as fast as a highly optimized piece of software like HAProxy or Nginx.  However, with the right code caching your files and such, you can probably get Node.js running at least close to as fast as some of the more specialized options.  Regardless, the current performance is more than enough for my uses in this situation, and I now have a relatively simple and easily extendable framework for managing the various Node.js applications I run on my host.  Well, it’s not quite as simple as I initially hoped, but what ever is?
