Yes, that surprised me too; I expected a cluster of several Pg servers instead.

justizin · on Aug 18, 2015

Anyone who has run a number of postgres replicas can tell you that their ability to recover from a network partition is directly proportionate to their wal_keep_segments value and there isn't otherwise much to it.

If the network recovers while the last WAL segment your replica read still exists on the primary, your replica will probably be able to catch up, as long as it can read the logs at least as fast as they're being rotated out.
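Here's a toy model of that catch-up argument. It assumes (a simplification) that the primary keeps exactly the newest `wal_keep_segments` completed segments, whereas the real server keeps at least that many. It also assumes WAL is written and replayed at constant rates. In 9.x a segment is 16 MB by default, so `wal_keep_segments = 32` retains roughly 512 MB. The function name and rate parameters are made up for illustration.

```python
def catches_up(lag, keep, produce_rate, replay_rate, max_steps=10_000):
    """Toy model of a streaming replica resuming after a partition heals.

    lag:          segments the replica is behind when the network returns
    keep:         segments the primary retains (stand-in for wal_keep_segments)
    produce_rate: segments the primary writes per time step
    replay_rate:  segments the replica can fetch and replay per time step
    Returns True if the replica reaches the primary's newest segment.
    """
    primary = lag   # newest segment written on the primary
    replica = 0     # last segment the replica has replayed
    for _ in range(max_steps):
        oldest_kept = primary - keep + 1
        if replica + 1 < oldest_kept:
            return False  # the next segment it needs has already been recycled
        if replica >= primary:
            return True   # fully caught up
        primary += produce_rate
        replica = min(primary, replica + replay_rate)
    return False  # gap never closed within max_steps


print(catches_up(lag=10, keep=32, produce_rate=1, replay_rate=3))  # True
print(catches_up(lag=40, keep=32, produce_rate=1, replay_rate=3))  # False
```

The second call fails immediately because the replica is further behind than the retention window. A replica that replays more slowly than the primary writes also eventually falls off the end, even if it starts inside the window.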

Postgres doesn't really try much else in the realm of distributed systems, so there's not much else you can test.

Skytools would be interesting, though. I've never felt particularly comfortable with replication schemes that trust an external process to ship rows and other changes, compared to binary replication, which will absolutely fail if you don't have every bit.

ahachete · on Aug 18, 2015

>Anyone who has run a number of postgres replicas can tell you that their ability to recover from a network partition is directly proportionate to their wal_keep_segments value and there isn't otherwise much to it.

Rather than wal_keep_segments, you may (should) use replication slots (http://www.postgresql.org/docs/9.4/static/warm-standby.html#...), by setting the primary_slot_name parameter in recovery.conf. With replication slots, WAL files will be kept for every replica, regardless of how far the replica lags behind. So you can very directly control the ability to recover from network partitions (although in exchange you will need to monitor the disk usage for pg_xlog at the master and, if necessary, forcibly drop the slot).
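To make the slot trade-off concrete, here's a simplified model. It assumes each slot pins WAL from its `restart_lsn` forward; that column name comes from the `pg_replication_slots` view in 9.4. It ignores `wal_keep_segments` and checkpoint requirements. The `Slot` class, byte positions, and drop threshold are hypothetical. On a real server, the final step would be calling `pg_drop_replication_slot()` on the chosen slot.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    restart_lsn: int  # oldest WAL byte position this replica still needs


def retained_bytes(current_lsn, slots):
    """WAL the primary must keep: everything from the oldest slot's position on."""
    if not slots:
        return 0
    return current_lsn - min(s.restart_lsn for s in slots)


def slots_to_drop(current_lsn, slots, max_lag_bytes):
    """Hypothetical monitoring policy: pick slots whose replica lags too far,
    so pg_xlog can't fill the disk waiting on a replica that may never return."""
    return [s.name for s in slots if current_lsn - s.restart_lsn > max_lag_bytes]


MB = 1024 * 1024
slots = [Slot("replica_a", 900 * MB), Slot("replica_b", 100 * MB)]
print(retained_bytes(1000 * MB, slots) // MB)                    # 900
print(slots_to_drop(1000 * MB, slots, max_lag_bytes=512 * MB))  # ['replica_b']
```

One slow replica is enough to pin 900 MB on the primary, which is exactly why the disk usage for pg_xlog needs watching.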

justizin · on Aug 19, 2015

Neat, I haven't had the pleasure of running postgres 9.4 in production, and I don't run postgres at all at my current gig, but that's pretty dope.

In practice, the last time I ran pg, I was fond of WAL-E, which is handy (especially if you're on EC2) because you can store basically every WAL segment ever, with snapshots, in S3, and you can bring new replicas online without putting read load on any of the existing nodes. It will also bring back online a replica that has been shut off for days or weeks for service.

This is really just an S3 implementation of the WAL archiver, which afaict was originally designed for storing infinite history on NFS. Come to think of it, the new place has NFS. rubs chin
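Here's a rough sketch of why an archive lets a long-offline replica come back. As I understand the standby docs, a standby first replays whatever `restore_command` can fetch from the archive, and only then falls back to streaming from the primary. With WAL-E, that command would call `wal-e wal-fetch`. The model below ignores the local pg_xlog step. It assumes segments are archived up to `archived_through` and that the primary keeps only the newest `keep`. The function and parameter names are hypothetical.

```python
def catch_up_plan(replica_seg, primary_newest, keep, archived_through):
    """Return (segment, source) pairs for every segment the replica still needs,
    or None if some segment is neither archived nor still on the primary."""
    primary_oldest = primary_newest - keep + 1
    plan = []
    for seg in range(replica_seg + 1, primary_newest + 1):
        if seg <= archived_through:
            plan.append((seg, "archive"))  # fetched via restore_command
        elif seg >= primary_oldest:
            plan.append((seg, "stream"))   # still retained on the primary
        else:
            return None                    # recycled and never archived: rebuild
    return plan


# Replica switched off at segment 0 while the primary moved on to segment 5000.
print(catch_up_plan(0, 5000, keep=32, archived_through=0))  # None: no archive
plan = catch_up_plan(0, 5000, keep=32, archived_through=4999)
print(len(plan), plan[0], plan[-1])  # 5000 (1, 'archive') (5000, 'stream')
```

Without the archive, the replica needs a fresh base backup. With it, the replica replays days of history from S3 (or NFS) and only streams the last bit from the primary.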

Anyway, thanks for the pointer! Def something I would look into in the future. Maybe it would be fun to write a Jepsen test for recovering pg replicas.