Use keepalived health checks with BGP-based failover
Keepalived is one of the most commonly used applications that implements VRRP, a networking protocol that manages IP address assignment and ARP-based failover. It can be configured with additional health checks, such as checking the status of a service or running a custom script. When one of these health checks detects an issue, the Compute Instance changes to a fault state and failover is triggered. During these state transitions, additional task can be performed through custom scripts.
Our platform is currently undergoing network infrastructure upgrades, which affects IP address assignment and failover. Once this upgrade occurs for the data center and hardware that your Compute Instances reside on, VRRP software like keepalived can no longer directly manage failover. However, other features of keepalived can still be used. For instance, keepalived can continue to run health checks or VRRP scripts. It can then be configured to interact with whichever BGP daemon your system is using to manage IP address assignment and failover.
This guide covers how to configure keepalived with a simple health check and enable it to control lelastic, a BGP daemon created for the platform.
If you are migrating to BGP-based failover and currently have health checks configured with keepalived, you can modify the steps in this guide to include your own settings.
Configure IP sharing and BGP failover
Before continuing, IP Sharing and BGP failover must be properly configured on both Compute Instances. To do this, follow the Configure failover on a Compute Instance guide, which walks you through the process of configuring failover with lelastic. If you decide to use a tool other than lelastic, you will need to make modifications to some of the commands or code examples provided in some of the following sections.
Install and configure keepalived
This section covers installing the keepalived software from your distribution's repository. See Installing Keepalived on the official documentation if you prefer to install it from source.
-
Log in to your Compute Instance over SSH. See Connecting to a remote server over SSH for assistance.
-
Install keepalived by following the instructions for your system's distribution.
Ubuntu and Debian:
sudo apt update && sudo apt upgrade sudo apt install keepalived
CentOS 8 Stream, CentOS/RHL 8 (including derivatives such as AlmaLinux 8 and Rocky Linux 8), Fedora:
sudo dnf upgrade sudo dnf install keepalived
CentOS 7:
sudo yum update sudo yum install keepalived
-
Create and edit a new keepalived configuration file.
sudo nano /etc/keepalived/keepalived.conf
-
Enter the following settings for your configuration into this file. Use the example below as a starting point, replacing each item below with the appropriate values for your Compute Instance.
-
$password: A secure password to use for this keepalived configuration instance. The same password must be used for each Compute Instance you configure.
-
$ip-a: The IP address of this Compute Instance.
-
$ip-b: The IP address of the other Compute Instance.
-
$ip-shared: The Shared IP address.
vrrp_instance example_instance { state BACKUP nopreempt interface eth0 virtual_router_id 10 priority 100 advert_int 1 authentication { auth_type PASS auth_pass $password } unicast_src_ip $ip-a unicast_peer { $ip-b } virtual_ipaddress { $ip-shared/32 } }
In the above configuration file, the state is set to BACKUP and the parameter
nopreempt
is included. When each Compute Instance uses these settings, failover is sticky. This means the Shared IP address remains routed to a Compute Instance until it enters a FAULT state, even if it is lower priority than the other Compute Instance. If you wish to prioritize one Compute Instance over the other, remove thenopreempt
parameter, set one of the Compute Instances to a MASTER state, and adjust thePRIORITY
parameter as desired. -
-
Enable and start the keepalived service.
sudo systemctl enable keepalived sudo systemctl start keepalived
-
Perform these steps again on the other Compute Instance you would like to configure.
Create the notify script
Keepalived can be configured to run notification scripts when the Compute Instance changes state (such as when entering a MASTER, BACKUP ,or FAULT state). These scripts can perform any action and are commonly used to interact with a service or modify network configuration files. For this guide, the scripts are used to update a log file and start or stop the BGP daemon that controls BGP failover on your Compute Instance.
-
Create and edit the notify script.
sudo nano /etc/keepalived/notify.sh
-
Copy and paste the following bash script into the newly created file. If you wish to control a BGP daemon other than lelastic, replace
sudo systemctl restart lelastic
andsudo systemctl stop lelastic
with the appropriate commands for your service.#!/bin/bash keepalived_log='/tmp/keepalived.state' function check_state { local state=$1 cat << EOF >> $keepalived_log =================================== Date: $(date +'%d-%b-%Y %H:%M:%S') [INFO] Now $state EOF if [[ "$state" == "Master" ]]; then sudo systemctl restart lelastic else sudo systemctl stop lelastic fi } function main { local state=$1 case $state in Master) check_state Master;; Backup) check_state Backup;; Fault) check_state Fault;; *) echo "[ERR] Provided arguement is invalid" esac } main $1
-
Make the file executable.
sudo chmod +x /etc/keepalived/notify.sh
-
Modify the keepalived configuration files so that the notify script is used for each state change.
vrrp_instance example_instance { ... notify_master "/etc/keepalived/notify.sh Master" notify_backup "/etc/keepalived/notify.sh Backup" notify_fault "/etc/keepalived/notify.sh Fault" }
-
Restart your BGP daemon and keepalived.
sudo systemctl restart lelastic sudo systemctl restart keepalived
-
View the log file to see if it was properly created and updated. If the notification script was successfully used, this log file should have an accurate timestamp and the current state of the Compute Instance.
cat /tmp/keepalived.state
=================================== Date: 14-Oct-2022 14:30:54 [INFO] Now Master
Configure the health check (VRRP script)
The next step is to configure keepalived with a health check so that it can failover if it ever detects an issue. This is the primary reason you may want to use keepalived alongside a BGP daemon. Keepalived can be configured to track a file (track_file
), track a process (track_process
), or run a custom script so that you can preform more complex health checks. When using a script, like is shown in this example, the script should return a 0
to indicate success and return any other value to indicate a failure. When a failure is detected, the state is changed to FAULT and the notify script runs.
This guide helps you configure a custom script that detects if a file is present or not. If the file is present, the script returns a 1 to indicate a failure.
-
Create and edit the health check script.
sudo nano /etc/keepalived/check.sh
-
Copy the following script and paste it into the file.
#!/bin/bash trigger='/etc/keepalived/trigger.file' if [ -f $trigger ]; then exit 1 else exit 0 fi
-
Make the file executable.
sudo chmod +x /etc/keepalived/check.sh
-
Update the keepalived configuration file to define the VRRP script and enable your VRRP instance to use the script. The interval determines how often the script is run, fall determines how many times the script must return a failure before the state is changed to FAULT, and rise determines how many times a success is returned before the instance goes back to a BACKUP or MASTER state.
vrrp_script check_for_file { script "/etc/keepalived/check.sh" interval 5 fall 2 rise 2 } vrrp_instance example_instance { ... track_script { check_for_file } ... }
-
Restart your BGP daemon and keepalived.
sudo systemctl restart lelastic sudo systemctl restart keepalived
-
To test this health check, create the trigger file on whichever Compute Instance is in a MASTER state.
touch /etc/keepalived/trigger.file
-
Check the log file on that Compute Instance to make sure it enters a FAULT state. Once it does, check the log file on the other Compute Instance to verify that it enters a MASTER state.
tail -F /tmp/keepalived.state
=================================== Date: 14-Oct-2022 14:30:54 [INFO] Now Master
Additional recommended security settings
By default, keepalived attempts to run the scripts using a keepalived_script user. If that doesn't exist, it uses the root user. Since running these scripts as the root user introduces many security concerns, this section discusses creating the keepalived_script user.
-
Create a limited user account called keepalived_script. Since it is never used to log in, that feature can be disabled.
sudo useradd -r -s /sbin/nologin -M keepalived_script
-
Edit the
sudoers
file.visudo /etc/sudoers
-
Within this file, grant permission for the new user to restart and stop the BGP daemon. The example below uses lelastic.
# User privilege specification root ALL=(ALL:ALL) ALL keepalived_script ALL=(ALL:ALL) NOPASSWD: /usr/bin/systemctl restart lelastic, /usr/bin/systemctl stop lelastic
-
Update the ownership of the
/etc/keepalived
directory (and all of the files within it).sudo chown -R keepalived_script:keepalived_script /etc/keepalived
-
Once again, edit the keepalived configuration file and paste the following snippet to the top of that file.
global_defs { enable_script_security } ...
Example configuration files
The code samples below contain complete working configuration files. Please review them if you would like to see all of the recommended settings for each Compute Instance combined into a single file.
- Shared IP: 203.0.113.57 (configured on the loopback interface)
- Compute Instance A: 192.0.2.173
- Compute Instance B: 198.51.100.49
global_defs {
enable_script_security
}
vrrp_script check_for_file {
script "/etc/keepalived/check.sh"
interval 5
fall 2
rise 2
}
vrrp_instance example_instance {
state BACKUP
nopreempt
interface eth0
virtual_router_id 10
priority 99
advert_int 1
track_script {
check_for_file
}
authentication {
auth_type PASS
auth_pass dT409gtNjMiS
}
unicast_src_ip 192.0.2.173
unicast_peer {
198.51.100.49
}
virtual_ipaddress {
203.0.113.57/32 dev lo
}
notify_master "/etc/keepalived/notify.sh Master"
notify_backup "/etc/keepalived/notify.sh Backup"
notify_fault "/etc/keepalived/notify.sh Fault"
}
global_defs {
enable_script_security
}
vrrp_script check_for_file {
script "/etc/keepalived/check.sh"
interval 5
fall 2
rise 2
}
vrrp_instance example_instance {
state BACKUP
nopreempt
interface eth0
virtual_router_id 10
priority 99
advert_int 1
track_script {
check_for_file
}
authentication {
auth_type PASS
auth_pass dT409gtNjMiS
}
unicast_src_ip 198.51.100.49
unicast_peer {
192.0.2.173
}
virtual_ipaddress {
203.0.113.57/32 dev lo
}
notify_master "/etc/keepalived/notify.sh Master"
notify_backup "/etc/keepalived/notify.sh Backup"
notify_fault "/etc/keepalived/notify.sh Fault"
}
Updated about 1 month ago