Saturday, December 10, 2011

Reverse proxy node.js websockets with HAProxy

The more I toy with it, the more I love node.js and websockets. Pushing notifications asynchronously has never been this easy. In the little project I use for this post, I have an apache2 hosted app serving my pages, and I built a little notifier with node.js to broadcast various events to clients with socket.io.

Now the small issue here is that I want websockets to communicate with node, while I want my other http requests to be handled by apache. A first solution is to have node listen on another port, with client code looking like this:
<script src="http://www.test.tld:9000/socket.io/socket.io.js"></script>
<script>
    var socket = io.connect('http://www.test.tld:9000');
    ...
</script>

I have two obvious problems here. For one, some network setups only allow traffic on ports 80 and 443 to go through. And then it's just painfully ugly! A more graceful way would be to differentiate the requests based on subdomains rather than ports, and for that I need a reverse proxy.

Apache2 already ships a reverse proxy module (mod_proxy), but it doesn't handle websockets (and doing this with apache2 would somehow defeat the point of using node.js). I played a little with nginx, which is very light and fast, but having to patch the source code with the tcp_proxy module to handle websockets made me uncomfortable about the update process in the future. I finally chose HAProxy, which handles websockets out of the box.

For the reverse proxy to work, we first need to change the ports node and apache listen on. So I changed the apache conf to have it listen locally on port 9010, and kept node on port 9000. HAProxy will now handle the initial requests on port 80 and dispatch them to node and apache. I want the requests sent to the domain "io.test.tld" to be forwarded to node, and the rest to be forwarded to apache. Here's a sample HAProxy configuration that does just that:
global
    daemon
    maxconn 4096
    user haproxy
    group haproxy

defaults
    log global

#this frontend interface receives the incoming http requests
frontend http-in
    mode http
    #process all requests made on port 80
    bind *:80
    #set a large timeout for websockets
    timeout client 86400000
    #default behavior sends the requests to apache
    default_backend www_backend
    #it all happens here: a simple check on the host string
    #when "io.test.tld" is matched, an acl I call arbitrarily
    # "websocket" triggers
    acl websocket hdr_end(host) -i io.test.tld
    #redirect to my node backend if the websocket acl triggered
    use_backend node_backend if websocket

#apache backend, transfer to port 9010
backend www_backend
    mode http
    timeout server 86400000
    timeout connect 5000
    server www_test localhost:9010

#node backend, transfer to port 9000
backend node_backend
    mode http
    timeout server 86400000
    timeout connect 5000
    server io_test localhost:9000
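
For completeness, the Apache side of this setup only needs its listening port changed. A minimal sketch of the matching change (the exact file path and any virtual host adjustments depend on your distribution):

```apache
# /etc/apache2/ports.conf — listen locally only, on port 9010,
# so that HAProxy is the sole process bound to port 80
Listen 127.0.0.1:9010
```

Binding to 127.0.0.1 rather than all interfaces also keeps apache from being reachable directly from outside, which is what we want once HAProxy fronts everything.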

It's fairly straightforward and it just works. Of course the best way to handle this case is to have 2 different IP addresses, but as far as subdomain discrimination is concerned, I'm very satisfied with this solution so far.

Tuesday, November 29, 2011

Listen to Postgresql inserts with node.js

I've been playing with node.js a lot these last few days, and one of the things I've been trying to accomplish is to push live notifications to clients when an insert is done in a Postgresql database.
To start things on a good foot, I found this nice article detailing exactly my use case. I started right away and created a similar trigger on inserts on my Article table, with a stored procedure looking like this:
CREATE FUNCTION notify_article_insert() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('article_watcher', NEW.id || '###' || NEW.name );
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;
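
The stored procedure alone does nothing until it's attached to the table. The trigger declaration I use looks something like this (assuming the table is simply named article):

```sql
-- Fire notify_article_insert() once per inserted row
CREATE TRIGGER article_insert_watcher
  AFTER INSERT ON article
  FOR EACH ROW EXECUTE PROCEDURE notify_article_insert();
```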

Then, with a simple install of node.js 0.6.3 equipped with the node-postgres module, I tried the following script:
var pg = require('pg'),
    pgConnectionString = "postgres://user:pass@localhost/db";

pg.connect(pgConnectionString, function(err, client) {
  client.query('LISTEN "article_watcher"');
  client.on('notification', function(data) {
    console.log(data.payload);
  });
});

It worked just as expected... but only for 15 seconds. The query then seemed to somehow time out, and I couldn't get any more notifications. What's worse, I couldn't find a way to catch this timeout when it happened, so I couldn't reissue the LISTEN query.
I dug around a little and couldn't find any information on the subject. I finally got my answer straight from Brianc on the node-postgres repo: pg.connect is meant to handle pooled connections to the database, not persistent ones. For a LISTEN query, I need a standalone client that stays open at all times:
var pg = require('pg'),
    pgConnectionString = "postgres://user:pass@localhost/db";

var client = new pg.Client(pgConnectionString);
client.connect();
client.query('LISTEN "article_watcher"');
client.on('notification', function(data) {
    console.log(data.payload);
});
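
Since the trigger concatenates the id and the name with a "###" separator, the client has to split the payload back apart. A small helper sketch, assuming article names never contain the separator themselves:

```javascript
// Split an "id###name" payload (as built by notify_article_insert)
// back into its two fields.
function parsePayload(payload) {
  var sep = payload.indexOf('###');
  return {
    id: payload.slice(0, sep),
    name: payload.slice(sep + 3)
  };
}

console.log(parsePayload('42###My article'));
// → { id: '42', name: 'My article' }
```

Note that the payload always arrives as a string, so the id has to be parsed back to a number if needed.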

Problem solved! I ran the script for 2 days straight and it caught every single insert.

One last detail I learned along the way: plain Postgresql NOTIFY queries won't report every single insert. Within a transaction containing multiple inserts, identical notifications are coalesced, so a NOTIFY only fired once in my tests. You have to use pg_notify, available on Postgresql 9+, and specify a distinct payload per row to guarantee one notification per insert.
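
To make the difference concrete, here is a sketch of the behavior I observed (table and column names match my example above):

```sql
BEGIN;
INSERT INTO article (name) VALUES ('first');
INSERT INTO article (name) VALUES ('second');
COMMIT;
-- A trigger issuing a plain NOTIFY article_watcher would be delivered
-- just once for this transaction: identical notifications are coalesced.
-- With pg_notify('article_watcher', NEW.id || '###' || NEW.name),
-- each row carries a distinct payload, so both notifications go out.
```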

More articles on node.js are coming soon about websockets and reverse proxies.