Stickybit

Load balancing PostgreSQL in the Amazon cloud

Posted by Sean Finney on 2010-02-02 | 3 Comments

Tags: , , , , , , , ,

We've been working on an internal project for quite some time that among other things involves a ReSTful web service backed by a PostgreSQL database.

Our "production" installation is deployed in the Amazon EC2 cloud, with the idea that this gives us maximum flexibility to scale up resource investment dynamically with demand. However, in such a deployment PostgreSQL becomes both a potential bottleneck and SPOF (Single Point of Failure). While at its current phase in development we don't see any indication that we will be reaching a performance ceilings any time soon, this is still a concern for any large-scale roll out.

During our development we looked into a number of tools / methods to remove or at least reduce the potential bottleneck and SPOF aspects of a PostgreSQL backend. These included many options from the well established myriad of solutions out there, such as bucardo, pgcluster, pgpool-II, and slony-I. We had a pretty good idea of our requirements:

  • No requirements or major constraints on application/schema design.
  • No SPOF for write operations (asynchronous replication would be okay though).
  • Load-balanced read operations.
  • Support for recent (8.3, 8.4) versions of PostgreSQL and PostGIS.
  • Authentication and encryption support.
  • Open Source.

Unfortunately, most of the tools we evaluated failed at least one of the above criteria. Most notably, many made assumptions about a having secure local physical network for the "clustering". This would
allow for various low-level networking techniques to perform load balancing, and also be necessary for the lax security requirements that many of the solutions have. For example, not all solutions allow for encryption between nodes, and others even require passwordless (i.e. "trust") authentication! Others require configuration of shared block devices or use of other utilities that impose their own set of problems for a cloud environment (like memcached).

We felt that pgpool-II came the closest in this regard, and additionally the application was quite simple conceptually. It didn't, however, have support for the built-in OpenSSL features found in PostgreSQL, but otherwise we were quite happy with its featureset. However, after conferring with the upstream authors it was indicated that this was a feature they wanted to have in a future release, so we decided to continue evaluating it.

In our first deployment, as a proof of concept, we set up a rather complicated system of SSH tunnels (maintained active via upstart jobs spawning autossh sessions) to secure the node-to-node communications. While quite complicated and a pain to administer, it worked and did not produce any noticiable degradation in performance.

So after returning from vacation, I sat down and spent a couple days hacking together a basic implementation and sent it to the pgpool developers' mailing list. After a bit of back and forth and some peer review, the patches have now been committed and the feature will be included in the next release.

I'm now putting the finishing touches and a fairly detailed "how-to" document for setting up an EC2/PostgreSQL/pgpool-II deployment. In the process a few other ideas for new features to pgpool-II have come up (and a I've even found a bug or two!)--but in any event the next release of pgpool-II should be good news for anyone wanting to deploy a scalable PostgreSQL implementation in the cloud.


TrackBacks