Stickybit

Load balancing PostgreSQL in the Amazon cloud

Skrivet av Sean Finney den 2010-02-02 | 3 Kommentarer

Taggar: , , , , , , , ,

We've been working on an internal project for quite some time that among other things involves a ReSTful web service backed by a PostgreSQL database.

Our "production" installation is deployed in the Amazon EC2 cloud, with the idea that this gives us maximum flexibility to scale up resource investment dynamically with demand. However, in such a deployment PostgreSQL becomes both a potential bottleneck and SPOF (Single Point of Failure). While at its current phase in development we don't see any indication that we will be reaching a performance ceilings any time soon, this is still a concern for any large-scale roll out.

During our development we looked into a number of tools / methods to remove or at least reduce the potential bottleneck and SPOF aspects of a PostgreSQL backend. These included many options from the well established myriad of solutions out there, such as bucardo, pgcluster, pgpool-II, and slony-I. We had a pretty good idea of our requirements:

  • No requirements or major constraints on application/schema design.
  • No SPOF for write operations (asynchronous replication would be okay though).
  • Load-balanced read operations.
  • Support for recent (8.3, 8.4) versions of PostgreSQL and PostGIS.
  • Authentication and encryption support.
  • Open Source.

Unfortunately, most of the tools we evaluated failed at least one of the above criteria. Most notably, many made assumptions about a having secure local physical network for the "clustering". This would
allow for various low-level networking techniques to perform load balancing, and also be necessary for the lax security requirements that many of the solutions have. For example, not all solutions allow for encryption between nodes, and others even require passwordless (i.e. "trust") authentication! Others require configuration of shared block devices or use of other utilities that impose their own set of problems for a cloud environment (like memcached).

We felt that pgpool-II came the closest in this regard, and additionally the application was quite simple conceptually. It didn't, however, have support for the built-in OpenSSL features found in PostgreSQL, but otherwise we were quite happy with its featureset. However, after conferring with the upstream authors it was indicated that this was a feature they wanted to have in a future release, so we decided to continue evaluating it.

In our first deployment, as a proof of concept, we set up a rather complicated system of SSH tunnels (maintained active via upstart jobs spawning autossh sessions) to secure the node-to-node communications. While quite complicated and a pain to administer, it worked and did not produce any noticiable degradation in performance.

So after returning from vacation, I sat down and spent a couple days hacking together a basic implementation and sent it to the pgpool developers' mailing list. After a bit of back and forth and some peer review, the patches have now been committed and the feature will be included in the next release.

I'm now putting the finishing touches and a fairly detailed "how-to" document for setting up an EC2/PostgreSQL/pgpool-II deployment. In the process a few other ideas for new features to pgpool-II have come up (and a I've even found a bug or two!)--but in any event the next release of pgpool-II should be good news for anyone wanting to deploy a scalable PostgreSQL implementation in the cloud.


TrackBacks

No TrackBacks have been submitted for this page.

Trackback URL for this page.

Skicka din kommentar

Mollom CAPTCHA

Kommentarer

  • How do your application read/write to pgpool-ii on EC2? I.e. normally, a server where pgpool-ii is running would have a virtual IP for pgpool-ii - to be migrated with heartbeat when one of the servers is down.

    With EC2, it's not possible - so how do you solve it?

    Posted by Edv, 27/07/2010 5:56pm (1 månad ago)

  • Hey,

    Great article - brings a lot of hope to postgresql on ec2. Have you finished the "how-to"? I would love to read it! :-)

    Posted by Mark, 01/07/2010 2:26am (2 månader ago)

  • Good info.
    Any news about the "How-To" document ?

    Posted by Laurent, 11/05/2010 5:35pm (4 månader ago)

RSS-ström för kommentarer på den här sidan | RSS-ström för alla kommentarer