https://github.com/2ndQuadrant/repmgr 1/23
8/4/2016 GitHub - 2ndQuadrant/repmgr: The Most Popular Replication Manager for PostgreSQL (Postgres)
Overview
The `repmgr` suite provides two main tools: the `repmgr` command-line client, and `repmgrd`, a daemon which actively monitors servers in a replication cluster.
repmgr supports and enhances PostgreSQL's built-in streaming replication, which provides a single read/write master server
and one or more read-only standbys containing near-real time copies of the master server's database.
For a multi-master replication solution, please see 2ndQuadrant's BDR (bi-directional replication) extension.
http://2ndquadrant.com/en-us/resources/bdr/
For selective replication, e.g. of individual tables or databases from one server to another, please see 2ndQuadrant's pglogical
extension.
http://2ndquadrant.com/en-us/resources/pglogical/
Concepts
This guide assumes that you are familiar with PostgreSQL administration and streaming replication concepts. For further
details on streaming replication, see this link:
https://www.postgresql.org/docs/current/interactive/warm-standby.html#STREAMING-REPLICATION
replication cluster

In the `repmgr` documentation, "replication cluster" refers to the network of PostgreSQL servers connected by streaming replication.
node

A node is a server within a replication cluster.
upstream node
This is the node a standby server is connected to; either the master server or in the case of cascading replication, another
standby.
failover

This is the action which occurs if a master server fails and a suitable standby is promoted as the new master. The `repmgrd` daemon supports automatic failover to minimise downtime.
switchover

In certain circumstances, such as hardware or operating system maintenance, it's necessary to take a master server offline; in this case a controlled switchover is necessary, whereby a suitable standby is promoted and the existing master removed from the replication cluster in a controlled manner. The `repmgr` command line client provides this functionality.
witness server
repmgr provides functionality to set up a so-called "witness server" to assist in determining a new master server in a failover
situation with more than one standby. The witness server itself is not part of the replication cluster, although it does contain a
copy of the repmgr metadata schema (see below).
The purpose of a witness server is to provide a "casting vote" where servers in the replication cluster are split over more than
one location. In the event of a loss of connectivity between locations, the presence or absence of the witness server will
decide whether a server at that location is promoted to master; this is to prevent a "split-brain" situation where an isolated
location interprets a network outage as a failure of the (remote) master and promotes a (local) standby.
The `repmgr` metadata schema contains the following objects:

tables:

`repl_nodes`: connection and status information for each server in the replication cluster

views:

`repl_status`: when `repmgrd`'s monitoring is enabled, shows current monitoring status for each node

The `repmgr` metadata schema can be stored in an existing database or in its own dedicated database. A dedicated database superuser is required to own the meta-database as well as carry out administrative actions.
Installation
System requirements
repmgr is developed and tested on Linux and OS X, but should work on any UNIX-like system supported by PostgreSQL
itself.
Current versions of `repmgr` support PostgreSQL from version 9.3. If you are interested in using `repmgr` on earlier versions of PostgreSQL you can download version 2.1, which supports PostgreSQL from version 9.1.

All servers in the replication cluster must be running the same major version of PostgreSQL, and we recommend that they also run the same minor version.

The `repmgr` tools must be installed on each server in the replication cluster.
TIP: We recommend using a session multiplexer utility such as `screen` or `tmux` when performing long-running actions (such as cloning a database) on a remote server - this will ensure the `repmgr` action won't be prematurely terminated if your `ssh` session to the server is interrupted or closed.
Packages
We recommend installing `repmgr` using the available packages for your system.
RedHat/CentOS: RPM packages for repmgr are available via Yum through the PostgreSQL Global Development Group
RPM repository ( http://yum.postgresql.org/ ). Follow the instructions for your distribution (RedHat, CentOS, Fedora, etc.)
and architecture as detailed at yum.postgresql.org.
2ndQuadrant also provides its own RPM packages which are made available at the same time as each `repmgr` release, as it can take some days for them to become available via the main PGDG repository. See here for details: http://repmgr.org/yum-repository.html
See `PACKAGES.md` for details on building .deb and .rpm packages from the `repmgr` source code.
Source installation
`repmgr` source code can be obtained directly from the project GitHub repository:

    git clone https://github.com/2ndQuadrant/repmgr

Released versions are also available from:

https://github.com/2ndQuadrant/repmgr/releases
http://repmgr.org/downloads.php

`repmgr` is compiled in the same way as a PostgreSQL extension using the PGXS infrastructure, e.g.:

    sudo make USE_PGXS=1 install
repmgr can be built from source in any environment suitable for building PostgreSQL itself.
Configuration
`repmgr` and `repmgrd` use a common configuration file, by default called `repmgr.conf` (although any name can be used if explicitly specified), e.g. `/etc/repmgr.conf`. At the very least, `repmgr.conf` must contain the connection parameters for the local `repmgr` database; see `repmgr configuration file` below for more details.
The following parameters in the configuration file can be overridden with command line options:

    -L/--log-level
    -b/--pg_bindir
    -d/--dbname=DBNAME
    -h/--host=HOSTNAME
    -p/--port=PORT
    -U/--username=USERNAME
If `-d/--dbname` contains an `=` sign or starts with a valid URI prefix (`postgresql://` or `postgres://`), it is treated as a `conninfo` string. See the PostgreSQL documentation for further details:

https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-CONNSTRING
Note that if a `conninfo` string is provided, values set in this will override any provided as individual parameters. For example, with `-d 'host=foo' --host bar`, `foo` will be chosen over `bar`.

Like other libpq-based applications, `repmgr` will also fall back to the standard libpq environment variables:

https://www.postgresql.org/docs/current/static/libpq-envars.html
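As a minimal sketch of this fallback behaviour (hostnames and values below are hypothetical), connection defaults can be supplied via the environment; an explicit `-h/--host` etc. on the command line still takes precedence:

```shell
# Hypothetical example: libpq environment variables which repmgr,
# like any libpq client, will pick up when no explicit parameter
# or conninfo value is given.
export PGHOST=repmgr_node1
export PGPORT=5432
export PGUSER=repmgr
export PGDATABASE=repmgr
```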
TIP: for testing `repmgr`, it's possible to use multiple PostgreSQL instances running on different ports on the same computer, with password-less SSH access to `localhost` enabled.
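As a sketch of such a single-machine test setup (ports and paths here are assumptions, not defaults), each node's `repmgr.conf` simply points at a different port on `localhost`:

```
# node 1, e.g. /path/to/node1/repmgr.conf
cluster=test
node=1
node_name=node1
conninfo='host=localhost port=5432 user=repmgr dbname=repmgr'

# node 2, e.g. /path/to/node2/repmgr.conf
cluster=test
node=2
node_name=node2
conninfo='host=localhost port=5433 user=repmgr dbname=repmgr'
```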
The following replication settings in `postgresql.conf` may need to be adjusted:

    # Enable replication connections; set this figure to at least one more
    # than the number of standbys which will connect to this server
    # (note that repmgr will execute `pg_basebackup` in WAL streaming mode,
    # which requires two free WAL senders)
    max_wal_senders = 10

    # Ensure WAL files contain enough information to enable read-only queries
    # on the standby
    wal_level = 'hot_standby'

    # How much WAL to retain on the master to allow a temporarily
    # disconnected standby to catch up again. The larger this is, the
    # longer the standby can be disconnected. This is needed only in
    # 9.3; from 9.4, replication slots can be used instead (see below).
    wal_keep_segments = 5000

    # Enable read-only queries on a standby
    # (Note: this will be ignored on a master but we recommend including
    # it anyway)
    hot_standby = on

    # Enable WAL file archiving
    archive_mode = on

    # Set archive_command to a script or application that will safely store
    # your WALs in a secure place. /bin/true is an example of a command that
    # ignores archiving. Use something more sensible.
    archive_command = '/bin/true'
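On PostgreSQL 9.4 and later, replication slots can replace the `wal_keep_segments` setting shown above; a sketch of the additional parameter required (the value 10 is an arbitrary assumption; see the replication slots section below):

```
# From 9.4, replication slots can be used instead of wal_keep_segments;
# allow at least one slot per standby which will attach to this server
max_replication_slots = 10
```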
Create a dedicated PostgreSQL superuser account and a database in which to store the `repmgr` metadata, e.g.:

    createuser -s repmgr
    createdb repmgr -O repmgr

Ensure the `repmgr` user has appropriate permissions in `pg_hba.conf` and can connect in replication mode; `pg_hba.conf` should contain entries similar to the following:
    local   replication   repmgr                          trust
    host    replication   repmgr    127.0.0.1/32          trust
    host    replication   repmgr    192.168.1.0/24        trust

    local   repmgr        repmgr                          trust
    host    repmgr        repmgr    127.0.0.1/32          trust
    host    repmgr        repmgr    192.168.1.0/24        trust
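The `trust` entries above are convenient for an isolated test environment; for anything reachable over a network, a password-based sketch (assuming a password has been set for the `repmgr` user and made available non-interactively, e.g. via `~/.pgpass`) would look like:

```
host    replication   repmgr    192.168.1.0/24        md5
host    repmgr        repmgr    192.168.1.0/24        md5
```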
On the standby, do not create a PostgreSQL instance, but do ensure an empty directory is available for the `postgres` system user to create a data directory.

Create a `repmgr.conf` file on the master server, e.g.:

    cluster=test
    node=1
    node_name=node1
    conninfo='host=repmgr_node1 user=repmgr dbname=repmgr'
`cluster`: an arbitrary name for the replication cluster; this must be identical on all nodes

`node`: a unique integer identifying the node

`node_name`: a unique string identifying the node; we recommend a name specific to the server (e.g. 'server_1'); avoid names indicating the current replication role like 'master' or 'standby' as the server's role could change.

`conninfo`: a valid connection string for the `repmgr` database on the current server. (On the standby, the database will not yet exist, but `repmgr` needs to know the connection details to complete the setup process).

`repmgr.conf` should not be stored inside the PostgreSQL data directory, as it could be overwritten when setting up or reinitialising the PostgreSQL server. See section `Configuration` above for further details about `repmgr.conf`.
`repmgr` will create a schema named after the cluster and prefixed with `repmgr_`, e.g. `repmgr_test`; we also recommend that you set the `repmgr` user's search path to include this schema name, e.g.:

    ALTER USER repmgr SET search_path TO repmgr_test, "$user", public;

Register the master server:

    $ repmgr -f repmgr.conf master register
    [2016-01-07 16:56:46] [NOTICE] master node correctly registered for cluster test with id 1 (conninfo: host=repmgr_node1 us
    repmgr=# SELECT * FROM repmgr_test.repl_nodes;
     id |  type  | upstream_node_id | cluster | name  |                  conninfo                   | slot_name | priority |
    ----+--------+------------------+---------+-------+---------------------------------------------+-----------+----------+
      1 | master |                  | test    | node1 | host=repmgr_node1 dbname=repmgr user=repmgr |           |      100 |
    (1 row)
Each server in the replication cluster will have its own record and will be updated when its status or role changes.
Create a `repmgr.conf` file on the standby server. It must contain at least the same parameters as the master's `repmgr.conf`, but with the values `node`, `node_name` and `conninfo` adjusted accordingly, e.g.:

    cluster=test
    node=2
    node_name=node2
    conninfo='host=repmgr_node2 user=repmgr dbname=repmgr'
Clone the standby with:

    $ repmgr -h repmgr_node1 -U repmgr -d repmgr -D /path/to/node2/data/ -f /etc/repmgr.conf standby clone
    [2016-01-07 17:21:26] [NOTICE] destination directory '/path/to/node2/data/' provided
    [2016-01-07 17:21:26] [NOTICE] starting backup...
    [2016-01-07 17:21:26] [HINT] this may take some time; consider using the -c/--fast-checkpoint option
    NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
    [2016-01-07 17:21:28] [NOTICE] standby clone (using pg_basebackup) complete
    [2016-01-07 17:21:28] [NOTICE] you can now start your PostgreSQL server
    [2016-01-07 17:21:28] [HINT] for example : pg_ctl -D /path/to/node2/data/ start
This will clone the PostgreSQL data directory files from the master at `repmgr_node1` using PostgreSQL's `pg_basebackup` utility. A `recovery.conf` file containing the correct parameters to start streaming from this master server will be created automatically, and unless otherwise specified the `postgresql.conf` and `pg_hba.conf` files will be copied from the master.
Make any adjustments to the PostgreSQL con guration les now, then start the standby server.
NOTE: `repmgr standby clone` does not require `repmgr.conf`, however we recommend providing this as `repmgr` will set the `application_name` parameter in `recovery.conf` as the value provided in `node_name`, making it easier to identify the node in `pg_stat_replication`. It's also possible to provide some advanced options for controlling the standby cloning process; see next section for details.
On the master, the new standby should now be visible in `pg_stat_replication`:

    repmgr=# SELECT * FROM pg_stat_replication;
    -[ RECORD 1 ]----+------------------------------
    pid              | 7704
    usesysid         | 16384
    usename          | repmgr
    application_name | node2
    client_addr      | 192.168.1.2
    client_hostname  |
    client_port      | 46196
    backend_start    | 2016-01-07 17:32:58.322373+09
    backend_xmin     |
    state            | streaming
    sent_location    | 0/3000220
    write_location   | 0/3000220
    flush_location   | 0/3000220
    replay_location  | 0/3000220
    sync_priority    | 0
    sync_state       | async
Register the standby server:

    $ repmgr -f /etc/repmgr.conf standby register
    [2016-01-08 11:13:16] [NOTICE] standby node correctly registered for cluster test with id 2 (conninfo: host=repmgr_node2 u

    repmgr=# SELECT * FROM repmgr_test.repl_nodes ORDER BY id;
     id |  type   | upstream_node_id | cluster | name  |                  conninfo                   | slot_name | priority |
    ----+---------+------------------+---------+-------+---------------------------------------------+-----------+----------+
      1 | master  |                  | test    | node1 | host=repmgr_node1 dbname=repmgr user=repmgr |           |      100 |
      2 | standby |                1 | test    | node2 | host=repmgr_node2 dbname=repmgr user=repmgr |           |      100 |
    (2 rows)
The standby server now has a copy of the records for all servers in the replication cluster. Note that the relationship between master and standby is explicitly defined via the `upstream_node_id` value, which shows here that the standby's upstream server is the replication cluster master. While of limited use in a simple master/standby replication cluster, this information is required to effectively manage cascading replication (see below).
To use `rsync` instead of `pg_basebackup`, provide the `-r/--rsync-only` option when executing `repmgr standby clone`.

Note that `repmgr` forces `rsync` to use `--checksum` mode to ensure that all the required files are copied. This results in additional I/O on both source and destination server as the contents of files existing on both servers need to be compared, meaning this method is not necessarily faster than making a fresh clone with `pg_basebackup`.
If using `rsync` to clone a standby, additional control over which files not to transfer is possible by configuring `rsync_options` in `repmgr.conf`, which enables any valid `rsync` options to be passed to that command, e.g.:

    rsync_options='--exclude=postgresql.local.conf'
Controlling `primary_conninfo` in `recovery.conf`
`repmgr` will create the `primary_conninfo` setting in `recovery.conf` based on the connection parameters provided to `repmgr standby clone` and PostgreSQL's standard connection defaults, including any environment variables set on the local node.
To include specific connection parameters other than the standard host, port, username and database values (e.g. `sslmode`), include these in a `conninfo`-style string passed to `repmgr` with `-d/--dbname` (see above for details), and/or set appropriate environment variables.
Cascading replication, introduced with PostgreSQL 9.2, enables a standby server to replicate from another standby server rather than directly from the master, meaning replication changes "cascade" down through a hierarchy of servers. This can be used to reduce load on the master and minimize bandwidth usage between sites.
To demonstrate cascading replication, ensure you have a master and standby set up as shown above in the section "Setting up a simple replication cluster with repmgr". Create an additional standby server with `repmgr.conf` looking like this:

    cluster=test
    node=3
    node_name=node3
    conninfo='host=repmgr_node3 user=repmgr dbname=repmgr'
    upstream_node=2
Ensure `upstream_node` contains the node id of the previously created standby. Clone this standby (using the connection parameters for the existing standby) and register it:

    $ repmgr -h repmgr_node2 -U repmgr -d repmgr -D /path/to/node3/data/ -f /etc/repmgr.conf standby clone
    [2016-01-08 13:44:52] [NOTICE] destination directory 'node_3/data/' provided
    [2016-01-08 13:44:52] [NOTICE] starting backup (using pg_basebackup)...
    [2016-01-08 13:44:52] [HINT] this may take some time; consider using the -c/--fast-checkpoint option
    [2016-01-08 13:44:52] [NOTICE] standby clone (using pg_basebackup) complete
    [2016-01-08 13:44:52] [NOTICE] you can now start your PostgreSQL server
    [2016-01-08 13:44:52] [HINT] for example : pg_ctl -D /path/to/node_3/data start

    $ repmgr -f /etc/repmgr.conf standby register
    [2016-01-08 14:04:32] [NOTICE] standby node correctly registered for cluster test with id 3 (conninfo: host=repmgr_node3 d

    repmgr=# SELECT * FROM repmgr_test.repl_nodes ORDER BY id;
     id |  type   | upstream_node_id | cluster | name  |                  conninfo                   | slot_name | priority |
    ----+---------+------------------+---------+-------+---------------------------------------------+-----------+----------+
      1 | master  |                  | test    | node1 | host=repmgr_node1 dbname=repmgr user=repmgr |           |      100 |
      2 | standby |                1 | test    | node2 | host=repmgr_node2 dbname=repmgr user=repmgr |           |      100 |
      3 | standby |                2 | test    | node3 | host=repmgr_node3 dbname=repmgr user=repmgr |           |      100 |
    (3 rows)
Replication slots were introduced with PostgreSQL 9.4 and are designed to ensure that any standby connected to the master using a replication slot will always be able to retrieve the required WAL files. This removes the need to manually manage WAL file retention by estimating the number of WAL files that need to be maintained on the master using `wal_keep_segments`. Do however be aware that if a standby is disconnected, WAL will continue to accumulate on the master until either the standby reconnects or the replication slot is dropped.
To enable `repmgr` to use replication slots, set the boolean parameter `use_replication_slots` in `repmgr.conf`:

    use_replication_slots=1
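For slots to be created, the master's `postgresql.conf` must also allow them; a minimal sketch (the value is an assumption sized generously for the three-node examples in this document):

```
# postgresql.conf on the master: required before replication slots
# can be created; changing this requires a server restart
max_replication_slots = 10
```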
Note that `repmgr` will fail with an error if this option is specified when working with PostgreSQL 9.3.
With replication slots enabled, the `repl_nodes` table will contain the name of the slot created for each standby:

    repmgr=# SELECT * FROM repl_nodes ORDER BY id;
     id |  type   | upstream_node_id | cluster | name  |                 conninfo                 |   slot_name   | priority |
    ----+---------+------------------+---------+-------+------------------------------------------+---------------+----------+
      1 | master  |                  | test    | node1 | host=localhost dbname=repmgr user=repmgr | repmgr_slot_1 |      100 |
      2 | standby |                1 | test    | node2 | host=localhost dbname=repmgr user=repmgr | repmgr_slot_2 |      100 |
      3 | standby |                1 | test    | node3 | host=localhost dbname=repmgr user=repmgr | repmgr_slot_3 |      100 |

    repmgr=# SELECT * FROM pg_replication_slots;
       slot_name   | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn
    ---------------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------
     repmgr_slot_3 |        | physical  |        |          | t      |      26060 |      |              | 0/50028F0
     repmgr_slot_2 |        | physical  |        |          | t      |      26079 |      |              | 0/50028F0
    (2 rows)
Note that a slot name will be created by default for the master but not actually used unless the master is converted to a standby using e.g. `repmgr standby switchover`.
Be aware that when initially cloning a standby, you will need to ensure that all required WAL files remain available while the cloning is taking place. If using the default `pg_basebackup` method, we recommend setting `pg_basebackup`'s `--xlog-method` parameter to `stream` like this:

    pg_basebackup_options='--xlog-method=stream'
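The `pg_basebackup_options` string can carry further valid `pg_basebackup` options in the same way; for example, combining WAL streaming with a fast checkpoint (a sketch, not a recommendation for every environment, since a fast checkpoint adds I/O load on the master):

```
pg_basebackup_options='--xlog-method=stream --checkpoint=fast'
```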
See the `pg_basebackup` documentation for details: https://www.postgresql.org/docs/current/static/app-pgbasebackup.html
Promoting a standby server with repmgr

To demonstrate promotion, set up a replication cluster with a master and two attached standby servers so that the `repl_nodes` table looks like this:

    repmgr=# SELECT * FROM repmgr_test.repl_nodes ORDER BY id;
     id |  type   | upstream_node_id | cluster | name  |                  conninfo                   | slot_name | priority |
    ----+---------+------------------+---------+-------+---------------------------------------------+-----------+----------+
      1 | master  |                  | test    | node1 | host=repmgr_node1 dbname=repmgr user=repmgr |           |      100 |
      2 | standby |                1 | test    | node2 | host=repmgr_node2 dbname=repmgr user=repmgr |           |      100 |
      3 | standby |                1 | test    | node3 | host=repmgr_node3 dbname=repmgr user=repmgr |           |      100 |
    (3 rows)
Stop the current master, e.g.:

    $ pg_ctl -D /path/to/node_1/data -m fast stop

At this point the replication cluster will be in a partially disabled state, with both standbys accepting read-only connections while attempting to connect to the stopped master. Note that the `repl_nodes` table will not yet have been updated and will still show the master as active.
On the standby which is to be promoted, execute:

    $ repmgr -f /etc/repmgr.conf standby promote
    [2016-01-08 16:07:31] [ERROR] connection to database failed: could not connect to server: Connection refused
            Is the server running on host "repmgr_node1" (192.161.2.1) and accepting
            TCP/IP connections on port 5432?
    could not connect to server: Connection refused
            Is the server running on host "repmgr_node1" (192.161.2.1) and accepting
            TCP/IP connections on port 5432?
    [2016-01-08 16:07:31] [NOTICE] promoting standby
    [2016-01-08 16:07:31] [NOTICE] promoting server using '/usr/bin/postgres/pg_ctl -D /path/to/node_2/data promote'
    server promoting
    [2016-01-08 16:07:33] [NOTICE] STANDBY PROMOTE successful
The `repl_nodes` table will now look like this:

     id |  type   | upstream_node_id | cluster | name  |                  conninfo                   | slot_name | priority |
    ----+---------+------------------+---------+-------+---------------------------------------------+-----------+----------+
      1 | master  |                  | test    | node1 | host=repmgr_node1 dbname=repmgr user=repmgr |           |      100 |
      2 | master  |                  | test    | node2 | host=repmgr_node2 dbname=repmgr user=repmgr |           |      100 |
      3 | standby |                1 | test    | node3 | host=repmgr_node3 dbname=repmgr user=repmgr |           |      100 |
    (3 rows)
However the sole remaining standby is still trying to replicate from the failed master; `repmgr standby follow` must now be executed to rectify this situation.

Following the failure or removal of the replication cluster's existing master server, `repmgr standby follow` can be used to make 'orphaned' standbys follow the new master and catch up to its current state.

To demonstrate this, assuming a replication cluster in the same state as the end of the preceding section ("Promoting a standby server with repmgr"), execute this:
    $ repmgr -f /etc/repmgr.conf -D /path/to/node_3/data/ -h repmgr_node2 -U repmgr -d repmgr standby follow
    [2016-01-08 16:57:06] [NOTICE] restarting server using '/usr/bin/postgres/pg_ctl -D /path/to/node_3/data/ -w -m fast resta
    waiting for server to shut down.... done
    server stopped
    waiting for server to start.... done
    server started

The `repl_nodes` table will show that the standby now follows the new master:

     id |  type   | upstream_node_id | cluster | name  |                  conninfo                   | slot_name | priority |
    ----+---------+------------------+---------+-------+---------------------------------------------+-----------+----------+
      1 | master  |                  | test    | node1 | host=repmgr_node1 dbname=repmgr user=repmgr |           |      100 |
      2 | master  |                  | test    | node2 | host=repmgr_node2 dbname=repmgr user=repmgr |           |      100 |
      3 | standby |                2 | test    | node3 | host=repmgr_node3 dbname=repmgr user=repmgr |           |      100 |
    (3 rows)
In some cases however it's desirable to promote the standby in a planned way, e.g. so maintenance can be performed on the master; this kind of switchover is supported by the `repmgr standby switchover` command.
NOTE: `repmgr standby switchover` performs a relatively complex series of operations on two servers, and should therefore be performed after careful preparation and with adequate attention. In particular you should be confident that your network environment is stable and reliable.

We recommend running `repmgr standby switchover` at the most verbose logging level (`--log-level DEBUG --verbose`) and capturing all output to assist troubleshooting any problems.
To demonstrate switchover, we will assume a replication cluster running on PostgreSQL 9.5 or later with a master (`node1`) and a standby (`node2`); after the switchover `node2` should become the master with `node1` following it.
The switchover command must be run from the standby which is to be promoted, and in its simplest form looks like this:
    $ repmgr -f /etc/repmgr.conf -C /etc/repmgr.conf standby switchover -v
    [2016-01-27 16:38:33] [NOTICE] using configuration file "/etc/repmgr.conf"
    [2016-01-27 16:38:33] [NOTICE] switching current node 2 to master server and demoting current master to standby...
    [2016-01-27 16:38:34] [NOTICE] 5 files copied to /tmp/repmgrnode1archive
    [2016-01-27 16:38:34] [NOTICE] connection to database failed: FATAL: the database system is shutting down
    [2016-01-27 16:38:34] [NOTICE] current master has been stopped
    [2016-01-27 16:38:34] [ERROR] connection to database failed: FATAL: the database system is shutting down
    [2016-01-27 16:38:34] [NOTICE] promoting standby
    [2016-01-27 16:38:34] [NOTICE] promoting server using '/usr/local/bin/pg_ctl -D /var/lib/postgresql/9.5/node_2/data promot
    server promoting
    [2016-01-27 16:38:36] [NOTICE] STANDBY PROMOTE successful
    [2016-01-27 16:38:36] [NOTICE] Executing pg_rewind on old master server
    [2016-01-27 16:38:36] [NOTICE] 5 files copied to /var/lib/postgresql/9.5/data
    [2016-01-27 16:38:36] [NOTICE] restarting server using '/usr/local/bin/pg_ctl -w -D /var/lib/postgresql/9.5/node_1/data -m
    pg_ctl: PID file "/var/lib/postgresql/9.5/node_1/data/postmaster.pid" does not exist
    Is server running?
    starting server anyway
    [2016-01-27 16:38:37] [NOTICE] node 1 is replicating in state "streaming"
    [2016-01-27 16:38:37] [NOTICE] switchover was successful
The old master is now replicating as a standby from the new master, and `repl_nodes` should have been updated to reflect this:

    repmgr=# SELECT * FROM repl_nodes ORDER BY id;
     id |  type   | upstream_node_id | cluster | name  |                 conninfo                 | slot_name | priority | act
    ----+---------+------------------+---------+-------+------------------------------------------+-----------+----------+-----
      1 | standby |                2 | test    | node1 | host=localhost dbname=repmgr user=repmgr |           |      100 | t
      2 | master  |                  | test    | node2 | host=localhost dbname=repmgr user=repmgr |           |      100 | t
    (2 rows)
Caveats
The functionality provided by `repmgr standby switchover` is primarily aimed at a two-server master/standby replication cluster and currently does not support additional standbys.

`repmgr standby switchover` is designed to use the `pg_rewind` utility, standard in 9.5 and later and available separately for 9.3 and 9.4 (see note below).

`pg_rewind` requires that either `wal_log_hints` is enabled, or that data checksums were enabled when the cluster was initialized. See the `pg_rewind` documentation for details: https://www.postgresql.org/docs/current/static/app-pgrewind.html
`repmgrd` should not be running when a switchover is carried out, otherwise `repmgrd` may try and promote a standby by itself.

Any other standbys attached to the old master will need to be manually instructed to point to the new master (e.g. with `repmgr standby follow`).
You must ensure that following a server start using `pg_ctl`, log output is not sent to STDERR (the default behaviour). If logging is not configured, we recommend setting `logging_collector=on` in `postgresql.conf` and providing an explicit `-l/--log` setting in `repmgr.conf`'s `pg_ctl_options` parameter.
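A sketch of such a `repmgr.conf` setting (the log file path below is hypothetical):

```
# repmgr.conf: pass an explicit log file to pg_ctl so that server
# output is not sent to STDERR when repmgr (re)starts the server
pg_ctl_options='-l /var/log/postgresql/startup.log'
```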
The `pg_rewind` utility provides an efficient way of resynchronising the old master, however it is not included in the core PostgreSQL distribution for versions 9.3 and 9.4. However, `pg_rewind` is available separately for these versions and we strongly recommend its installation. To use it with versions 9.3 and 9.4, provide the command line option `--pg_rewind`, optionally with the path to the `pg_rewind` binary location if not installed in the PostgreSQL `bin` directory.
If `pg_rewind` is not available, as a fallback `repmgr` will use `repmgr standby clone` to resynchronise the old master's data directory using `rsync`. However, in order to ensure all files are synchronised, the entire data directory on both servers must be scanned, a process which can take some time on larger databases, in which case you should consider making a fresh standby clone.
To unregister a standby, execute the following on the standby server:

    repmgr standby unregister -f /etc/repmgr.conf

This will remove the standby record from `repmgr`'s internal metadata table (`repl_nodes`). A `standby_unregister` event notification will be recorded in the `repl_events` table.
Note that this command will not stop the server itself or remove it from the replication cluster.
If the standby is not running, the command can be executed on another node by providing the id of the node to be unregistered using the command line parameter `--node`, e.g. executing the following command on the master server will unregister the standby with id 3:

    repmgr standby unregister -f /etc/repmgr.conf --node=3
repmgrd is a management and monitoring daemon which runs on standby nodes and which can automate actions such as
failover and updating standbys to follow the new master.
To use `repmgrd` for automatic failover, the following options must be set in `repmgr.conf`:

    failover=automatic
    promote_command='repmgr standby promote -f /etc/repmgr/repmgr.conf'
    follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf'
(See `repmgr.conf.sample` for further `repmgrd`-specific settings).
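The failover countdown shown in the example logs later in this document is governed by `repmgrd`'s reconnection settings; a sketch of the relevant `repmgr.conf` parameters (the values here are illustrative assumptions, check `repmgr.conf.sample` for the actual defaults):

```
# How long (in seconds) to wait for the master to respond, and how
# many reconnection attempts to make (at the given interval) before
# deciding the master has failed and initiating failover
master_response_timeout = 60
reconnect_attempts = 6
reconnect_interval = 10
```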
Additionally, `postgresql.conf` must contain the following line:

    shared_preload_libraries = 'repmgr_funcs'
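If `shared_preload_libraries` is already set, add `repmgr_funcs` to the existing comma-separated list rather than replacing it, e.g. (assuming `pg_stat_statements` is already loaded):

```
shared_preload_libraries = 'repmgr_funcs, pg_stat_statements'
```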
When `failover` is set to `automatic`, upon detecting failure of the current master, `repmgrd` will execute one of `promote_command` or `follow_command`, depending on whether the current server is becoming the new master or needs to follow another server which has become the new master. Note that these commands can be any valid shell script which results in one of these actions happening, but we strongly recommend executing `repmgr` directly.
`repmgrd` can be started like this:

    repmgrd -f /etc/repmgr.conf --verbose >> $HOME/repmgr/repmgr.log 2>&1
To demonstrate automatic failover, set up a 3-node replication cluster (one master and two standbys streaming directly from the master) so that the `repl_nodes` table looks like this:
    repmgr=# SELECT * FROM repmgr_test.repl_nodes ORDER BY id;
     id |  type   | upstream_node_id | cluster | name  |                  conninfo                   | slot_name | priority |
    ----+---------+------------------+---------+-------+---------------------------------------------+-----------+----------+
      1 | master  |                  | test    | node1 | host=repmgr_node1 dbname=repmgr user=repmgr |           |      100 |
      2 | standby |                1 | test    | node2 | host=repmgr_node2 dbname=repmgr user=repmgr |           |      100 |
      3 | standby |                1 | test    | node3 | host=repmgr_node3 dbname=repmgr user=repmgr |           |      100 |
    (3 rows)
Start repmgrd on each standby and verify that it's running by examining the log output, which at log level INFO will look like this:

```
[2016-01-05 13:15:40] [INFO] checking cluster configuration with schema 'repmgr_test'
[2016-01-05 13:15:40] [INFO] checking node 2 in cluster 'test'
[2016-01-05 13:15:40] [INFO] reloading configuration file and updating repmgr tables
[2016-01-05 13:15:40] [INFO] starting continuous standby node monitoring
```
Each repmgrd should also have noted its successful startup in the `repl_events` table:

```
repmgr=# SELECT * FROM repl_events WHERE event = 'repmgrd_start';
 node_id |     event     | successful |        event_timestamp        | details
---------+---------------+------------+-------------------------------+---------
       2 | repmgrd_start | t          | 2016-01-27 18:22:38.080231+09 |
       3 | repmgrd_start | t          | 2016-01-27 18:22:38.08756+09  |
(2 rows)
```
Now stop the current master server with, e.g.:

```
pg_ctl -D /path/to/node1/data -m immediate stop
```

This will force the master node to shut down straight away, aborting all processes and transactions. This will cause a flurry of activity in the repmgrd log files as each repmgrd detects the failure of the master and a failover decision is made. Here are extracts from the standby server promoted to the new master:
```
[2016-01-06 18:32:58] [WARNING] connection to upstream has been lost, trying to recover... 15 seconds before failover decision
[2016-01-06 18:33:03] [WARNING] connection to upstream has been lost, trying to recover... 10 seconds before failover decision
[2016-01-06 18:33:08] [WARNING] connection to upstream has been lost, trying to recover... 5 seconds before failover decision
...
[2016-01-06 18:33:18] [NOTICE] this node is the best candidate to be the new master, promoting...
...
[2016-01-06 18:33:20] [NOTICE] STANDBY PROMOTE successful
```
and here from the standby server which is now following the new master:

```
[2016-01-06 18:32:58] [WARNING] connection to upstream has been lost, trying to recover... 15 seconds before failover decision
[2016-01-06 18:33:03] [WARNING] connection to upstream has been lost, trying to recover... 10 seconds before failover decision
[2016-01-06 18:33:08] [WARNING] connection to upstream has been lost, trying to recover... 5 seconds before failover decision
...
[2016-01-06 18:33:23] [NOTICE] node 2 is the best candidate for new master, attempting to follow...
[2016-01-06 18:33:23] [INFO] changing standby's master
...
[2016-01-06 18:33:25] [NOTICE] node 3 now following new upstream node 2
```
The `repl_nodes` table should have been updated to reflect the new situation, with the original master (node1) marked as inactive, and standby node3 now following the new master (node2):

```
repmgr=# SELECT * from repl_nodes ORDER BY id;
 id |  type   | upstream_node_id | cluster | name  |                 conninfo                 | slot_name | priority | active
----+---------+------------------+---------+-------+------------------------------------------+-----------+----------+--------
  1 | master  |                  | test    | node1 | host=localhost dbname=repmgr user=repmgr |           |      100 | f
  2 | master  |                  | test    | node2 | host=localhost dbname=repmgr user=repmgr |           |      100 | t
  3 | standby |                2 | test    | node3 | host=localhost dbname=repmgr user=repmgr |           |      100 | t
(3 rows)
```
The `repl_events` table will contain a summary of what happened to each server during the failover:

```
repmgr=# SELECT * from repmgr_test.repl_events WHERE event_timestamp >= '2016-01-06 18:30';
 node_id |          event           | successful |        event_timestamp        |                   details
---------+--------------------------+------------+-------------------------------+---------------------------------------------
       2 | standby_promote          | t          | 2016-01-06 18:33:20.061736+09 | node 2 was successfully promoted to master
       2 | repmgrd_failover_promote | t          | 2016-01-06 18:33:20.067132+09 | node 2 promoted to master; old master 1
       3 | repmgrd_failover_follow  | t          | 2016-01-06 18:33:25.331012+09 | node 3 now following new upstream node 2
(3 rows)
```
repmgrd log rotation

To prevent the repmgrd log file from growing indefinitely, configure the system's logrotate to rotate it, e.g.:

```
/var/log/postgresql/repmgr-9.5.log {
    missingok
    compress
    rotate 52
    maxsize 100M
    weekly
    create 0600 postgres postgres
}
```
In addition to the repmgr configuration settings, parameters in the `conninfo` string influence how repmgr makes a network connection to PostgreSQL. In particular, if another server in the replication cluster is unreachable at network level, system network settings will influence the length of time it takes to determine that the connection is not possible. For an overview of the available parameters, see:

https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS
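For example, libpq's standard `connect_timeout` keyword can be added to the conninfo string to limit how long a connection attempt to an unreachable node will block (the host and user values below are just this README's example settings):

```
conninfo='host=repmgr_node2 dbname=repmgr user=repmgr connect_timeout=5'
```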
Monitoring with repmgrd
When repmgrd is running with the option `-m/--monitoring-history`, it will constantly write standby node status information to the `repl_monitor` table, providing a near-real time overview of replication status on all nodes in the cluster.

The view `repl_status` shows the most recent state for each node, e.g.:
```
repmgr=# SELECT * FROM repmgr_test.repl_status;
-[ RECORD 1 ]-------------+-----------------------------
primary_node              | 1
standby_node              | 2
standby_name              | node2
node_type                 | standby
active                    | t
last_monitor_time         | 2016-01-05 14:02:34.51713+09
last_wal_primary_location | 0/3012AF0
last_wal_standby_location | 0/3012AF0
replication_lag           | 0 bytes
replication_time_lag      | 00:00:03.463085
apply_lag                 | 0 bytes
communication_time_lag    | 00:00:00.955385
```
The interval in which monitoring history is written is controlled by the configuration parameter `monitor_interval_secs`; the default is 2.

Note that when a standby node is not streaming directly from its upstream node, i.e. recovering WAL from an archive, `apply_lag` will always appear as `0 bytes`.
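As a sketch of how this monitoring data might be consumed, the snippet below warns when a standby's time lag exceeds a threshold. The threshold logic and function name are hypothetical, and the commented `psql` query assumes the `repmgr_test` schema used in this README's examples:

```shell
#!/bin/sh
# Warn when a standby's replication time lag exceeds a threshold.
check_lag() {  # usage: check_lag <lag_seconds> <threshold_seconds>
  if [ "$1" -gt "$2" ]; then
    echo "WARNING: standby lagging by ${1}s"
  else
    echo "OK"
  fi
}

# On a live node, the lag could be fetched from the repl_status view, e.g.:
#   lag=$(psql -At -d repmgr -c "SELECT COALESCE(EXTRACT(EPOCH FROM
#         replication_time_lag), 0)::int FROM repmgr_test.repl_status
#         WHERE standby_node = 2")
#   check_lag "$lag" 60
check_lag 90 60
```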
In a situation caused e.g. by a network interruption between two data centres, it's important to avoid a "split-brain" situation where both sides of the network assume they are the active segment and the side without an active master unilaterally promotes one of its standbys.

To prevent this happening, it's essential to ensure that one network segment has a "voting majority", so other segments will know they're in the minority and not attempt to promote a new master. Where an odd number of servers exists, this is not an issue. However, if each network has an even number of nodes, it's necessary to provide some way of ensuring a majority, which is where the witness server becomes useful.

This is not a fully-fledged standby node and is not integrated into replication, but it effectively represents the "casting vote" when deciding which network segment has a majority. A witness server can be set up using `repmgr witness create` (see below for details) and can run on a dedicated server or an existing node. Note that it only makes sense to create a witness server in conjunction with running repmgrd; the witness server will require its own repmgrd instance.
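The "voting majority" arithmetic can be illustrated with a small sketch (the function name is hypothetical): with two data centres of two nodes each plus a witness, the segment containing the witness sees 3 of 5 nodes and may promote, while the other segment sees only 2 and must not:

```shell
#!/bin/sh
# A segment may promote only if it can see a strict majority of all nodes.
has_majority() {  # usage: has_majority <visible_nodes> <total_nodes>
  [ $(( $1 * 2 )) -gt "$2" ]
}

# Two data centres of two nodes each, plus a witness = 5 total:
has_majority 3 5 && echo "segment with witness: majority"
has_majority 2 5 || echo "segment without witness: minority"
```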
Cascading replication - where a standby can connect to an upstream node rather than the master server itself - was introduced in PostgreSQL 9.2. repmgr and repmgrd support cascading replication by keeping track of the relationship between standby servers - each node record is stored with the node id of its upstream ("parent") server (except of course the master server).

In a failover situation where the master node fails and a top-level standby is promoted, a standby connected to another standby will not be affected and will continue working as normal (even if the upstream standby it's connected to becomes the master node). If however the node's direct upstream fails, the "cascaded standby" will attempt to reconnect to that node's parent.
Each time repmgr or repmgrd performs a significant event, a record of that event is written into the `repl_events` table together with a timestamp, an indication of failure or success, and further details if appropriate. This is useful for gaining an overview of events affecting the replication cluster. However, note that this table is advisory in character and should be used in combination with the repmgr and PostgreSQL logs to obtain details of any events.
Example output after a master was registered and a standby cloned and registered:

```
repmgr=# SELECT * from repmgr_test.repl_events;
 node_id |      event       | successful |        event_timestamp        |                     details
---------+------------------+------------+-------------------------------+-------------------------------------------------
       1 | master_register  | t          | 2016-01-08 15:04:39.781733+09 |
       2 | standby_clone    | t          | 2016-01-08 15:04:49.530001+09 | Cloned from host 'repmgr_node1', port 5432; bac
       2 | standby_register | t          | 2016-01-08 15:04:50.621292+09 |
(3 rows)
```
Additionally, event notifications can be passed to a user-defined program or script which can take further action, e.g. send email notifications. This is done by setting the `event_notification_command` parameter in `repmgr.conf`. The following format placeholders are provided:

```
%n - node ID
%e - event type
%s - success (1 or 0)
%t - timestamp
%d - details
```

The values provided for "%t" and "%d" will probably contain spaces, so should be quoted in the provided command configuration, e.g.:

```
event_notification_command='/path/to/some/script %n %e %s "%t" "%d"'
```

By default, all notifications will be passed; the notification types can be filtered to explicitly named ones:

```
event_notifications=master_register,standby_register,witness_create
```
The following event types are available:

* master_register
* standby_register
* standby_unregister
* standby_clone
* standby_promote
* standby_follow
* standby_switchover
* witness_create
* repmgrd_start
* repmgrd_shutdown
* repmgrd_failover_promote
* repmgrd_failover_follow
Note that under some circumstances (e.g. when no replication cluster master could be located), it will not be possible to write an entry into the `repl_events` table, in which case `event_notification_command` can serve as a fallback.
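To illustrate, a minimal notification handler might simply format the placeholder values and append them to a log file. The script below is a hypothetical sketch (not part of repmgr) of something that could be referenced from `event_notification_command`:

```shell
#!/bin/sh
# Hypothetical handler, configured as e.g.:
#   event_notification_command='/path/to/repmgr-notify.sh %n %e %s "%t" "%d"'
# Arguments: node_id event success timestamp details
format_event() {
  printf '[%s] node %s: %s (success=%s) %s\n' "$4" "$1" "$2" "$3" "$5"
}

# Append to a log file; this could equally pipe to mail(1) or similar:
#   format_event "$@" >> /var/log/repmgr_events.log
format_event 2 standby_register 1 "2016-01-08 15:04:50" "example"
```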
Upgrading repmgr
repmgr is updated regularly with point releases (e.g. 3.0.2 to 3.0.3) containing bug fixes and other minor improvements. Any substantial new functionality will be included in a feature release (e.g. 3.0.x to 3.1.x).

In general repmgr can be upgraded as-is without any further action required; however, feature releases may require the repmgr database to be upgraded. An SQL script will be provided - please check the release notes for details.
Reference
Default values
For some command line and most configuration file parameters, repmgr falls back to default values if values for these are not explicitly provided.

The file `repmgr.conf.sample` documents the default value of configuration parameters where one is set. Of particular note is the log level, which defaults to NOTICE; particularly when using repmgr from the command line it may be useful to set this to a higher level with `-L/--log-level`, e.g. to INFO.

Execute `repmgr --help` to see the default values for various command line parameters, particularly database connection parameters.
repmgr commands
The repmgr command line tool accepts commands for specific servers in the replication cluster in the format "`server_type` `action`", or for the entire replication cluster in the format "`cluster` `action`". Each command is described below.

In general, each command needs to be provided with the path to `repmgr.conf`, which contains connection details for the local database.
master register
Registers a master in a cluster. This command needs to be executed before any standby nodes are registered.
standby register
Registers a standby with repmgr. This command needs to be executed to enable promote/follow operations and to allow repmgrd to work with the node. An existing standby can be registered using this command.
standby unregister
Unregisters a standby with repmgr. This command does not affect the actual replication; it just removes the standby's entry from the `repl_nodes` table.
standby clone [node to be cloned]
Clones a new standby node from the data directory of the master (or an upstream cascading standby) using `pg_basebackup` or `rsync`. Additionally it will create the `recovery.conf` file required to start the server as a standby.

This command does not require `repmgr.conf` to be provided, but does require connection details of the master or upstream server as command line parameters.

Provide the `-D/--datadir` option to specify the destination data directory; if not provided, the same directory path as on the source server will be used. By default, `pg_basebackup` will be used to copy data from the master or upstream node, but this can only be used for bootstrapping new installations. To update an existing but 'stale' data directory (for example
standby promote
Promotes a standby to a master if the current master has failed. This command requires a valid `repmgr.conf` file for the standby, either specified explicitly with `-f/--config-file` or located in the current working directory; no additional arguments are required.

If the standby promotion succeeds, the server will not need to be restarted. However, any other standbys will need to follow the new server using `standby follow` (see below); if repmgrd is active, it will handle this.
This command will not function if the current master is still running.
standby switchover
Promotes a standby to master and demotes the existing master to a standby. This command must be run on the standby to be promoted, and requires a password-less SSH connection to the current master. Additionally the location of the master's `repmgr.conf` file must be provided with `-C/--remote-config-file`.

repmgrd should not be active if a switchover is attempted. This restriction may be lifted in a later version.
standby follow
This command will force a restart of the standby server. It can only be used to attach a standby to a new master node.
witness create
Creates a witness server as a separate PostgreSQL instance. This instance can be on a separate server or a server running an existing node. The witness server will contain a copy of the repmgr metadata tables but will not be set up as a standby; instead it will update its metadata copy each time a failover occurs.

This command requires a `repmgr.conf` file containing a valid conninfo string for the server to be created, as well as the other minimum required parameters detailed in the section "repmgr configuration file" above.

By default the witness server will use port 5499 to facilitate easier setup on a server running an existing node. To use a different port, supply this explicitly in the `repmgr.conf` conninfo string.

This command also requires the location of the witness server's data directory to be provided (`-D/--datadir`), as well as valid connection parameters for the master server.

By default this command will create a superuser and a repmgr user. The repmgr user name will be extracted from the conninfo string in `repmgr.conf`.
cluster show
Displays information about each active node in the replication cluster. This command polls each registered server and shows its role (master / standby / witness), or "FAILED" if the node doesn't respond. It polls each server directly and can be run on any node in the cluster; this is also useful when analyzing connectivity from a particular node.
Example:

```
$ repmgr -f /etc/repmgr.conf cluster show

Role      | Name  | Upstream | Connection String
----------+-------+----------+-----------------------------------------
* master  | node1 |          | host=db_node1 dbname=repmgr user=repmgr
  standby | node2 | node1    | host=db_node2 dbname=repmgr user=repmgr
  standby | node3 | node2    | host=db_node3 dbname=repmgr user=repmgr
```
To show database connection errors when polling nodes, run the command in `--verbose` mode.

The `cluster show` command now accepts the optional parameter `--csv`, which outputs the replication cluster's status in a simple CSV format, suitable for parsing by scripts:

```
$ repmgr -f /etc/repmgr.conf cluster show --csv
1,-1
2,0
3,1
```

The first column is the node's ID, and the second column represents the node's status (0 = master, 1 = standby, -1 = failed).
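Since this output is designed for scripting, a short sketch (the function name is hypothetical) can flag failed nodes from it; the sample input below mirrors the status codes described above:

```shell
#!/bin/sh
# Read "node_id,status" lines and report any node with status -1 (failed).
report_failed() {
  awk -F, '$2 == -1 { print "node " $1 " FAILED" }'
}

# On a live cluster this would be fed directly:
#   repmgr -f /etc/repmgr.conf cluster show --csv | report_failed
printf '1,-1\n2,0\n3,1\n' | report_failed
```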
cluster cleanup
Error codes
repmgr or repmgrd will return one of the following error codes on program exit:
2ndQuadrant provides 24x7 production support for repmgr, including configuration assistance, installation verification and training for running a robust replication cluster. For further details see:
http://2ndquadrant.com/en/support/
http://groups.google.com/group/repmgr
https://github.com/2ndQuadrant/repmgr
We'd love to hear from you about how you use repmgr. Case studies and news are always welcome. Send us an email at
info@2ndQuadrant.com, or send a postcard to
repmgr
c/o 2ndQuadrant
7200 The Quorum
Oxford Business Park North
Oxford
OX4 2JZ
United Kingdom
Ian Barwick
Jaime Casanova
Abhijit Menon-Sen
Simon Riggs
Cedric Villemain
Further reading
http://blog.2ndquadrant.com/improvements-in-repmgr-3-1-4/
http://blog.2ndquadrant.com/managing-useful-clusters-repmgr/
http://blog.2ndquadrant.com/easier_postgresql_90_clusters/