Description
PGVIP - auto failover service from master to standby PostgreSQL database using virtual (floating) IP address
The service is installed on one server, which is a single point of failure (example, nginx proxy), or on any two servers (hereinafter referred to as web servers). There is no need to install anything on the database servers; the web server connects to them via ssh and executes commands from the configuration file. Administrator performs initial setup of the standby database and reverse switch manually.
Failover schemas:
for one web server

for multiple web servers

Technically, the PGVIP service is implemented as a wrapper that executes operating system commands from a configuration file. It was originally developed as a bash script, then rewritten in C language.
Conditions underling the development:
- small projects in which installing a cluster (for example, Patroni) is impractical
- no requirements for a standby database
- the administrator manually creates a standby database and after failover switches it back using standard tools (pg_basebackup)
- all commands are customizable; the administrator can run any command on the command line to check it
- direct connection to the master database without intermediate layers, VIP is activated (up) on the network interface with a physical address
- compilation with one command without configuration and libraries
- actions are performed only depending on the current state, this allows to avoid saving information about the state of databases and not synchronizing web servers
Main actions performed by the pgvip service:
- checking the availability of the master database and the lag of the standby database, the status of the virtual IP address (up/down)
- managing a virtual IP address, running a script to automatically turn it off (necessary if the connection is lost)
- activation (promotion) of a standby database and transfer of a virtual IP address
Status (pgvip status)
[root@web-server ~]# pgvip status status: SUCCESS (duration: 0 00:00:32) +---------+--------------+-------------------------+---------------+---------------+ | Role | IP address | Database state | 192.168.56.10 | VIP auto down | +---------+--------------+-------------------------+---------------+---------------+ | master | 192.168.56.1 | read-write | up on enp0s3 | executing | | standby | 192.168.56.2 | in recovery, lag 0 min. | down | executing | +---------+--------------+-------------------------+---------------+---------------+ [root@web-server ~]# pgvip status status: ERROR (duration: 0 00:00:14, cause: master db is not read-write) +---------+--------------+-------------------------+---------------+---------------+ | Role | IP address | Database state | 192.168.56.10 | VIP auto down | +---------+--------------+-------------------------+---------------+---------------+ | master | 192.168.56.1 | not available | not available | not available | | standby | 192.168.56.2 | in recovery, lag 0 min. | down | executing | +---------+--------------+-------------------------+---------------+---------------+ [root@web-server ~]# pgvip status status: STANDBY_PROMOTED (duration: 0 00:00:06) +---------+--------------+----------------+---------------+---------------+ | Role | IP address | Database state | 192.168.56.10 | VIP auto down | +---------+--------------+----------------+---------------+---------------+ | master | 192.168.56.1 | not available | not available | not available | | standby | 192.168.56.2 | read-write | up on enp0s3 | executing | +---------+--------------+----------------+---------------+---------------+
System journal (journalctl -fu pgvip or tail -f /var/log/pgsuite/pgvip.log)
[root@web-server ~]# journalctl -fu pgvip Feb 08 09:49:17 web-server systemd[1]: Started pgvip (PostgreSQL master-standby auto failover). Feb 08 09:49:17 web-server systemd[1]: Starting pgvip (PostgreSQL master-standby auto failover)... Feb 08 09:49:17 web-server pgvip[2713]: INFO ACTION_EXECUTOR thread started, thread_id: 2718 Feb 08 09:49:17 web-server pgvip[2713]: INFO ADMINISTRATION thread started, thread_id: 2719 Feb 08 09:49:17 web-server pgvip[2713]: INFO ADMINISTRATION binded to unix socket "/tmp/.s.PGVIP.0" Feb 08 09:49:17 web-server pgvip[2713]: INFO ADMINISTRATION listening Feb 08 09:49:17 web-server pgvip[2713]: INFO CHECKER_MASTER_DB thread started, thread_id: 2714 Feb 08 09:49:17 web-server pgvip[2713]: INFO CHECKER_MASTER_VIP thread started, thread_id: 2715 Feb 08 09:49:17 web-server pgvip[2713]: INFO CHECKER_STANDBY_DB thread started, thread_id: 2716 Feb 08 09:49:17 web-server pgvip[2713]: INFO CHECKER_STANDBY_VIP thread started, thread_id: 2717 ... Feb 08 09:50:15 web-server pgvip[2713]: WARN CHECKER_MASTER_DB PGSUITE-905 OS command executed with error (errno: 124) Feb 08 09:50:15 web-server pgvip[2713]: WARN CHECKER_MASTER_DB command: Feb 08 09:50:15 web-server pgvip[2713]: WARN CHECKER_MASTER_DB timeout 5 ssh 192.168.56.1 "su - postgres -c "psql -Aqtc 'select pg_is_in_recovery() or pg_current_xact_id() is null, extract(epoch from now()-pg_last_xact_replay_timestamp())'"" Feb 08 09:50:15 web-server pgvip[2713]: WARN CHECKER_MASTER_DB output: Feb 08 09:50:15 web-server pgvip[2713]: WARN CHECKER_MASTER_DB Feb 08 09:50:15 web-server pgvip[2713]: WARN CHECKER_MASTER_VIP PGSUITE-905 OS command executed with error (errno: 124) Feb 08 09:50:15 web-server pgvip[2713]: WARN CHECKER_MASTER_VIP command: Feb 08 09:50:15 web-server pgvip[2713]: WARN CHECKER_MASTER_VIP timeout 5 ssh 192.168.56.1 "flock /tmp/pgvip.time -c 'date +%s > /tmp/pgvip.time' ; flock -n /tmp/pgvip.vip_auto_down -c '' ; echo -n $?, ; ip address show | awk '/ inet 192.168.56.10\// {print $NF}'" Feb 08 09:50:15 web-server pgvip[2713]: WARN CHECKER_MASTER_VIP output: Feb 08 09:50:15 web-server pgvip[2713]: WARN CHECKER_MASTER_VIP ... Feb 08 09:50:56 web-server pgvip[2713]: INFO ACTION_EXECUTOR executing action "action_standby_db_promote" Feb 08 09:50:56 web-server pgvip[2713]: INFO ACTION_EXECUTOR condition: (time_now()-status.value_time)>g_time_standby_promote_delay_int && status.master_db_state!=G_DB_STATE_READ_WRITE && status.standby_db_state==G_DB_STATE_IN_RECOVERY && status.standby_db_lag<=g_time_standby_allowable_lag_int Feb 08 09:50:56 web-server pgvip[2713]: INFO ACTION_EXECUTOR status: ERROR (duration: 0 00:00:41, cause: master db is not read-write) Feb 08 09:50:56 web-server pgvip[2713]: INFO ACTION_EXECUTOR Feb 08 09:50:56 web-server pgvip[2713]: INFO ACTION_EXECUTOR +---------+--------------+-------------------------+---------------+---------------+ Feb 08 09:50:56 web-server pgvip[2713]: INFO ACTION_EXECUTOR | Role | IP address | Database state | 192.168.56.10 | VIP auto down | Feb 08 09:50:56 web-server pgvip[2713]: INFO ACTION_EXECUTOR +---------+--------------+-------------------------+---------------+---------------+ Feb 08 09:50:56 web-server pgvip[2713]: INFO ACTION_EXECUTOR | master | 192.168.56.1 | not available | not available | not available | Feb 08 09:50:56 web-server pgvip[2713]: INFO ACTION_EXECUTOR | standby | 192.168.56.2 | in recovery, lag 0 min. | down | executing | Feb 08 09:50:56 web-server pgvip[2713]: INFO ACTION_EXECUTOR +---------+--------------+-------------------------+---------------+---------------+ Feb 08 09:50:56 web-server pgvip[2713]: INFO ACTION_EXECUTOR Feb 08 09:50:56 web-server pgvip[2713]: INFO ACTION_EXECUTOR executing command: Feb 08 09:50:56 web-server pgvip[2713]: INFO ACTION_EXECUTOR timeout 5 ssh 192.168.56.1 "ip address del 192.168.56.10/24 dev `ip address show | awk '/ inet 192.168.56.10\// {print $NF}'`" Feb 08 09:50:59 web-server pgvip[2713]: WARN ACTION_EXECUTOR PGSUITE-905 OS command executed with error (errno: 255) Feb 08 09:50:59 web-server pgvip[2713]: WARN ACTION_EXECUTOR output: Feb 08 09:50:59 web-server pgvip[2713]: WARN ACTION_EXECUTOR Feb 08 09:50:59 web-server pgvip[2713]: INFO ACTION_EXECUTOR executing command: Feb 08 09:50:59 web-server pgvip[2713]: INFO ACTION_EXECUTOR timeout 5 ssh 192.168.56.2 "su - postgres -c "\$(pg_config --bindir)/pg_ctl promote -W -D \$(psql -Aqtc 'show data_directory')"" Feb 08 09:50:59 web-server pgvip[2713]: INFO ACTION_EXECUTOR output: Feb 08 09:50:59 web-server pgvip[2713]: INFO ACTION_EXECUTOR server promoting Feb 08 09:50:59 web-server pgvip[2713]: INFO ACTION_EXECUTOR Feb 08 09:50:59 web-server pgvip[2713]: INFO ACTION_EXECUTOR command executed successfully Feb 08 09:50:59 web-server pgvip[2713]: INFO ACTION_EXECUTOR executing command: Feb 08 09:50:59 web-server pgvip[2713]: INFO ACTION_EXECUTOR timeout 5 ssh 192.168.56.2 "ip address add 192.168.56.10/24 dev `ip address show | awk '/ inet 192.168.56.2\// {print $NF}'`" Feb 08 09:51:00 web-server pgvip[2713]: INFO ACTION_EXECUTOR output: Feb 08 09:51:00 web-server pgvip[2713]: INFO ACTION_EXECUTOR Feb 08 09:51:00 web-server pgvip[2713]: INFO ACTION_EXECUTOR command executed successfully Feb 08 09:51:00 web-server pgvip[2713]: INFO ACTION_EXECUTOR action "action_standby_db_promote" executed successfully ... Feb 08 09:52:29 web-server pgvip[2713]: INFO ACTION_EXECUTOR executing action "action_master_db_break" Feb 08 09:52:29 web-server pgvip[2713]: INFO ACTION_EXECUTOR condition: status.master_db_state==G_DB_STATE_READ_WRITE && status.standby_db_state==G_DB_STATE_READ_WRITE Feb 08 09:52:29 web-server pgvip[2713]: INFO ACTION_EXECUTOR status: STANDBY_PROMOTED (duration: 0 00:01:25) Feb 08 09:52:29 web-server pgvip[2713]: INFO ACTION_EXECUTOR Feb 08 09:52:29 web-server pgvip[2713]: INFO ACTION_EXECUTOR +---------+--------------+----------------+---------------+---------------+ Feb 08 09:52:29 web-server pgvip[2713]: INFO ACTION_EXECUTOR | Role | IP address | Database state | 192.168.56.10 | VIP auto down | Feb 08 09:52:29 web-server pgvip[2713]: INFO ACTION_EXECUTOR +---------+--------------+----------------+---------------+---------------+ Feb 08 09:52:29 web-server pgvip[2713]: INFO ACTION_EXECUTOR | master | 192.168.56.1 | read-write | down | executing | Feb 08 09:52:29 web-server pgvip[2713]: INFO ACTION_EXECUTOR | standby | 192.168.56.2 | read-write | up on enp0s3 | executing | Feb 08 09:52:29 web-server pgvip[2713]: INFO ACTION_EXECUTOR +---------+--------------+----------------+---------------+---------------+ Feb 08 09:52:29 web-server pgvip[2713]: INFO ACTION_EXECUTOR Feb 08 09:52:29 web-server pgvip[2713]: INFO ACTION_EXECUTOR executing command: Feb 08 09:52:29 web-server pgvip[2713]: INFO ACTION_EXECUTOR timeout 5 ssh 192.168.56.1 "su - postgres -c "psql -c 'alter system set listen_addresses=pgvip_broken'" ; ps -e | awk '$4=="postmaster" || $4=="postgres" {print $1}' | xargs -r kill -9" Feb 08 09:52:30 web-server pgvip[2713]: INFO ACTION_EXECUTOR output: Feb 08 09:52:30 web-server pgvip[2713]: INFO ACTION_EXECUTOR ALTER SYSTEM Feb 08 09:52:30 web-server pgvip[2713]: INFO ACTION_EXECUTOR Feb 08 09:52:30 web-server pgvip[2713]: INFO ACTION_EXECUTOR command executed successfully Feb 08 09:52:30 web-server pgvip[2713]: INFO ACTION_EXECUTOR action "action_master_db_break" executed successfully
[root@web-server ~]# mkdir /tmp/pgvip ; cd /tmp/pgvip [root@web-server /tmp/pgvip]# curl -Os https://pgvip.org/files/download/pgvip.tar [root@web-server /tmp/pgvip]# tar -xf pgvip.tar [root@web-server /tmp/pgvip]# make install ip_master=192.168.56.1 ip_standby=192.168.56.2 ip_virtual=192.168.56.10 subnet_mask=24 # Configuration if [ -z "24" ] ; then echo -e ' Usage with parameters: make install ip_master=[ip_master] ip_standby=[ip_standby] ip_virtual=[ip_virtual] subnet_mask=[subnet_mask] Example: make install ip_master=192.168.56.1 ip_standby=192.168.56.2 ip_virtual=192.168.56.10 subnet_mask=24' ; exit 1; fi if [ -e /etc/pgsuite/pgvip.conf ] ; then echo -e ' Configuration file /etc/pgsuite/pgvip.conf already exists, change it' ; exit 1; fi cp pgvip.conf /etc/pgsuite/pgvip.conf sed -i "s/[ip_master]/192.168.56.1/;s/[ip_standby]/192.168.56.2/;s/[ip_virtual]/192.168.56.10/;s/[subnet_mask]/24/" /etc/pgsuite/pgvip.conf # Service if [ -e /etc/systemd/system/pgvip.service ] ; then echo -e ' Service file /etc/systemd/system/pgvip.service already exists, change it' ; exit 1; fi cp pgvip.service /etc/systemd/system/pgvip.service rm -f /usr/bin/pgvip cp pgvip /usr/bin/pgvip systemctl daemon-reload echo -e ' pgvip service has been successfully created. Use command "systemctl --now enable pgvip" to enable and start service, "journalctl -fu pgvip" to view the journal' pgvip service has been successfully created. Use command "systemctl --now enable pgvip" to enable and start service, "journalctl -fu pgvip" to view the journal [root@web-server pgvip]# systemctl --now enable pgvip Created symlink from /etc/systemd/system/multi-user.target.wants/pgvip.service to /etc/systemd/system/pgvip.service. [root@web-server pgvip]# systemctl status pgvip ● pgvip.service - pgvip (PostgreSQL master-standby auto failover) Loaded: loaded (/etc/systemd/system/pgvip.service; enabled; vendor preset: disabled) Active: active (running) since Fri 2025-02-15 10:51:33 EST; 2s ago Main PID: 1546 (pgvip) CGroup: /system.slice/pgvip.service ├─1546 /usr/bin/pgvip execute ├─1553 timeout 5 ssh 192.168.56.1 su - postgres -c "psql -Aqtc 'select pg_is_in_recovery() or pg_current_xact_id() is null, extract(epoch from now()-pg_last_xact_replay_timestamp())'" ├─1554 timeout 5 ssh 192.168.56.1 flock /tmp/pgvip.time -c 'date +%s > /tmp/pgvip.time' ; flock -n /tmp/pgvip.vip_auto_down -c '' ; echo -n $?, ; ip address show | awk '/ inet 192.168.5... ├─1555 timeout 5 ssh 192.168.56.2 su - postgres -c "psql -Aqtc 'select pg_is_in_recovery() or pg_current_xact_id() is null, extract(epoch from now()-pg_last_xact_replay_timestamp())'" ├─1556 timeout 5 ssh 192.168.56.2 flock /tmp/pgvip.time -c 'date +%s > /tmp/pgvip.time' ; flock -n /tmp/pgvip.vip_auto_down -c '' ; echo -n $?, ; ip address show | awk '/ inet 192.168.5... ├─1557 ssh 192.168.56.1 su - postgres -c "psql -Aqtc 'select pg_is_in_recovery() or pg_current_xact_id() is null, extract(epoch from now()-pg_last_xact_replay_timestamp())'" ├─1558 ssh 192.168.56.1 flock /tmp/pgvip.time -c 'date +%s > /tmp/pgvip.time' ; flock -n /tmp/pgvip.vip_auto_down -c '' ; echo -n $?, ; ip address show | awk '/ inet 192.168.56.10// {p... ├─1559 ssh 192.168.56.2 su - postgres -c "psql -Aqtc 'select pg_is_in_recovery() or pg_current_xact_id() is null, extract(epoch from now()-pg_last_xact_replay_timestamp())'" └─1560 ssh 192.168.56.2 flock /tmp/pgvip.time -c 'date +%s > /tmp/pgvip.time' ; flock -n /tmp/pgvip.vip_auto_down -c '' ; echo -n $?, ; ip address show | awk '/ inet 192.168.56.10// {p... Feb 15 10:51:33 web-server systemd[1]: Started pgvip (PostgreSQL master-standby auto failover). Feb 15 10:51:33 web-server systemd[1]: Starting pgvip (PostgreSQL master-standby auto failover)... Feb 15 10:51:33 web-server pgvip[1546]: INFO ACTION_EXECUTOR thread started, thread_id: 1551 Feb 15 10:51:33 web-server pgvip[1546]: INFO ADMINISTRATION thread started, thread_id: 1552 Feb 15 10:51:33 web-server pgvip[1546]: INFO ADMINISTRATION binded to unix socket "/tmp/.s.PGVIP.0" Feb 15 10:51:33 web-server pgvip[1546]: INFO ADMINISTRATION listening Feb 15 10:51:33 web-server pgvip[1546]: INFO CHECKER_MASTER_DB thread started, thread_id: 1547 Feb 15 10:51:33 web-server pgvip[1546]: INFO CHECKER_MASTER_VIP thread started, thread_id: 1548 Feb 15 10:51:33 web-server pgvip[1546]: INFO CHECKER_STANDBY_DB thread started, thread_id: 1549 Feb 15 10:51:33 web-server pgvip[1546]: INFO CHECKER_STANDBY_VIP thread started, thread_id: 1550 [root@web-server pgvip]# pgvip status status: SUCCESS (duration: 0 00:00:00) +---------+--------------+-------------------------+---------------+---------------+ | Role | IP address | Database state | 192.168.56.10 | VIP auto down | +---------+--------------+-------------------------+---------------+---------------+ | master | 192.168.56.1 | read-write | up on enp0s3 | executing | | standby | 192.168.56.2 | in recovery, lag 0 min. | down | executing | +---------+--------------+-------------------------+---------------+---------------+ [root@web-server pgvip]# journalctl -fu pgvip Feb 15 10:54:50 web-server pgvip[1767]: INFO ADMINISTRATION connection accepted Feb 15 10:54:50 web-server pgvip[1767]: INFO ADMINISTRATION received command "status" Feb 15 10:54:50 web-server pgvip[1767]: INFO ADMINISTRATION info sent Feb 15 10:54:50 web-server pgvip[1767]: INFO ACTION_EXECUTOR output: Feb 15 10:54:50 web-server pgvip[1767]: INFO ACTION_EXECUTOR Feb 15 10:54:50 web-server pgvip[1767]: INFO ACTION_EXECUTOR command executed successfully Feb 15 10:54:50 web-server pgvip[1767]: INFO ACTION_EXECUTOR action "action_master_vip_up" executed successfully Feb 15 10:54:52 web-server pgvip[1767]: INFO ADMINISTRATION connection accepted Feb 15 10:54:52 web-server pgvip[1767]: INFO ADMINISTRATION received command "status" Feb 15 10:54:52 web-server pgvip[1767]: INFO ADMINISTRATION info sent [root@web-server pgvip]# pgvip show config PGVIP is a PostgreSQL master-standby auto failover service via a virtual IP address version 25.1.1, linux 64 bits Runtime configuration IP addresses ip_master: 192.168.56.1 ip_standby: 192.168.56.2 ip_virtual: 192.168.56.10 ip_subnet_mask: 24 OS command templates command_ssh: timeout ${time_command_ssh_timeout} ssh ${ip} "${command}" command_db_state: su - postgres -c "psql -Aqtc 'select pg_is_in_recovery() or pg_current_xact_id() is null, extract(epoch from now()-pg_last_xact_replay_timestamp())'" command_db_break: su - postgres -c "psql -c 'alter system set listen_addresses=pgvip_broken'" ; ps -e | awk '$4=="postmaster" || $4=="postgres" {print $1}' | xargs -r kill -9 command_db_promote: su - postgres -c "$(pg_config --bindir)/pg_ctl promote -W -D $(psql -Aqtc 'show data_directory')" command_ifname: ip address show | awk '/ inet ${ip}// {print $NF}' command_vip_state: flock /tmp/pgvip.time -c 'date +%s > /tmp/pgvip.time' ; flock -n /tmp/pgvip.vip_auto_down -c '' ; echo -n $?, ; ${command_ifname} command_vip_up: ip address add ${ip_virtual}/${ip_subnet_mask} dev `${command_ifname}` command_vip_down: ip address del ${ip_virtual}/${ip_subnet_mask} dev `${command_ifname}` command_vip_auto_down_script: delta=0 ; while [ $delta -lt ${time_vip_auto_down_duration} ] ; do set -x; date +%Y-%m-%d_%H:%M:%S ; sleep ${time_check_interval} ; delta=$(($(date +%s)-$(flock /tmp/pgvip.time cat /tmp/pgvip.time))) ; if [ $delta -gt ${time_vip_auto_down_timeout} ] ; then ${command_vip_down} ; fi ; done command_vip_auto_down_execute: flock -w 1 /tmp/pgvip.vip_auto_down -c "${command_vip_auto_down_script} > /dev/null 2>&1 &" Timings in seconds time_command_ssh_timeout: 5 time_check_interval: 5 time_check_not_available_delay: 15 time_standby_allowable_lag: 300 time_standby_promote_delay: 30 time_vip_auto_down_timeout: 30 time_vip_auto_down_duration: 600 time_monitoring_interval: 15 Generated OS commands command_master_db_state: timeout 5 ssh 192.168.56.1 "su - postgres -c "psql -Aqtc 'select pg_is_in_recovery() or pg_current_xact_id() is null, extract(epoch from now()-pg_last_xact_replay_timestamp())'"" command_master_db_break: timeout 5 ssh 192.168.56.1 "su - postgres -c "psql -c 'alter system set listen_addresses=pgvip_broken'" ; ps -e | awk '$4=="postmaster" || $4=="postgres" {print $1}' | xargs -r kill -9" command_master_vip_state: timeout 5 ssh 192.168.56.1 "flock /tmp/pgvip.time -c 'date +%s > /tmp/pgvip.time' ; flock -n /tmp/pgvip.vip_auto_down -c '' ; echo -n $?, ; ip address show | awk '/ inet 192.168.56.10\// {print $NF}'" command_master_vip_up: timeout 5 ssh 192.168.56.1 "ip address add 192.168.56.10/24 dev `ip address show | awk '/ inet 192.168.56.1\// {print $NF}'`" command_master_vip_down: timeout 5 ssh 192.168.56.1 "ip address del 192.168.56.10/24 dev `ip address show | awk '/ inet 192.168.56.10\// {print $NF}'`" command_master_vip_auto_down_execute: timeout 5 ssh 192.168.56.1 "flock -w 1 /tmp/pgvip.vip_auto_down -c "delta=0 ; while [ \$delta -lt 600 ] ; do set -x; date +%Y-%m-%d_%H:%M:%S ; sleep 5 ; delta=\$((\$(date +%s)-\$(flock /tmp/pgvip.time cat /tmp/pgvip.time))) ; if [ \$delta -gt 30 ] ; then ip address del 192.168.56.10/24 dev \`ip address show | awk '/ inet 192.168.56.10\\// {print \$NF}'\` ; fi ; done > /dev/null 2>&1 &"" command_standby_db_state: timeout 5 ssh 192.168.56.2 "su - postgres -c "psql -Aqtc 'select pg_is_in_recovery() or pg_current_xact_id() is null, extract(epoch from now()-pg_last_xact_replay_timestamp())'"" command_standby_db_promote: timeout 5 ssh 192.168.56.2 "su - postgres -c "\$(pg_config --bindir)/pg_ctl promote -W -D \$(psql -Aqtc 'show data_directory')"" command_standby_vip_state: timeout 5 ssh 192.168.56.2 "flock /tmp/pgvip.time -c 'date +%s > /tmp/pgvip.time' ; flock -n /tmp/pgvip.vip_auto_down -c '' ; echo -n $?, ; ip address show | awk '/ inet 192.168.56.10\// {print $NF}'" command_standby_vip_up: timeout 5 ssh 192.168.56.2 "ip address add 192.168.56.10/24 dev `ip address show | awk '/ inet 192.168.56.2\// {print $NF}'`" command_standby_vip_down: timeout 5 ssh 192.168.56.2 "ip address del 192.168.56.10/24 dev `ip address show | awk '/ inet 192.168.56.10\// {print $NF}'`" command_standby_vip_auto_down_execute: timeout 5 ssh 192.168.56.2 "flock -w 1 /tmp/pgvip.vip_auto_down -c "delta=0 ; while [ \$delta -lt 600 ] ; do set -x; date +%Y-%m-%d_%H:%M:%S ; sleep 5 ; delta=\$((\$(date +%s)-\$(flock /tmp/pgvip.time cat /tmp/pgvip.time))) ; if [ \$delta -gt 30 ] ; then ip address del 192.168.56.10/24 dev \`ip address show | awk '/ inet 192.168.56.10\\// {print \$NF}'\` ; fi ; done > /dev/null 2>&1 &"" [root@web-server /tmp/pgvip]#
Patroni is a cluster and is better in every aspect than PGVIP, which performs a one-time switchover and is not a cluster.
However, may be considered using PGVIP instead of Patroni in the following situations:
- no database administrator (DBA), entry-level system administrator (junior)
- unacceptability of transaction data loss when using non-synchronous standby, the need to manually transfer data after failover
- standby database with a lag (recovery_min_apply_delay parameter)
- applying WAL segment files from backup to standby (restore_command parameter) without creating a replication slot on master
- direct network connection to the database without using intermediate levels (HAProxy)
- lack of 3 physical devices required to create a quorum from etcd/Zookeeper/Consul
- lower computing power of the standby server compared to the master
It is possible to configure Keepalived to auto-switch master-standby.
Keepalived transfers the virtual IP address based on service availability, so different situations need to be considered when configuring:
- standby database was activated (promoted), after the reboot the master server booted first
- loss of connection between servers
- large lag standby
- and the like
PGVIP performs actions based on information from the master and standby servers. Parametrs in one configuration file that contains all commands.
When PGVIP installed on one web server, split-brain is impossible by definition.
When installed on two web servers, split-brain is unlikely, because when two databases are detected from any web server in read-write state, the master database stops without the ability to start:
- listen_addresses configuration parameter is set to an invalid value
- postmaster and postgres processes terminate in the operating system
- Any standby database - there are no requirements for a standby database. The standby database may have a lag (recovery_min_apply_delay parameter), using WAL segment files from backup (restore_command parameter) without creating a replication slot
- Direct connection - the virtual IP address is activated (up) on the network interface with the physical address, connection to the database occurs without intermediate layers
- No DBA required - installation and configuration can be performed by a system administrator at the junior level, modification - by a system administrator at the middle level
- Easy installation - installation consists of two operations: setting up ssh keys and executing the make install command
- Making is optional - making has already been completed, because does not depend on Linux versions and does not require additional libraries. If necessary, executed with one gcc command
- All commands are customizable - pgvip service only executes operating system commands from the configuration file. To check, the administrator can execute any in command line
- Obvious action log - When an action is executed, printed to log: action name, condition, status (pgvip status), OS command with output. The action name and condition are actual C code snippets from the action.c file, obtained by the macro #define #[name]
- Actions without history - actions are performed only depending on the current status, it is possible not to save information about the state of databases and do not synchronize web servers
- Large projects - use on large projects is not recommended due PGVIP is not a cluster and requires manual recovery. Large projects involve DBA who will set up the cluster (for example, Patroni)
- Delay before switching - In case of temporary unavailability of the master server (for example, during a reboot), there is a delay before activation standby database
- Manual setup and recovery - administrator performs initial setup of the standby database and reverse switch manually, usually pg_basebackup is used
- Linux - only Linux operating system is supported
- Using WEB servers - installation is performed on web servers, not on database servers