Use keepalived health checks with BGP-based failover

Keepalived is one of the most commonly used applications that implement VRRP, a networking protocol that manages IP address assignment and ARP-based failover. It can be configured with additional health checks, such as checking the status of a service or running a custom script. When one of these health checks detects an issue, the Linode changes to a fault state and failover is triggered. During these state transitions, additional tasks can be performed through custom scripts.

Our platform is currently undergoing network infrastructure upgrades, which affect IP address assignment and failover. Once this upgrade occurs for the data center and hardware that your Linodes reside on, VRRP software like keepalived can no longer directly manage failover. However, other features of keepalived can still be used. For instance, keepalived can continue to run health checks or VRRP scripts. It can then be configured to interact with whichever BGP daemon your system is using to manage IP address assignment and failover.

This guide covers how to configure keepalived with a simple health check and enable it to control lelastic, a BGP daemon created for the platform.

Note: If you are migrating to BGP-based failover and currently have health checks configured with keepalived, you can modify the steps in this guide to include your own settings.

Configure IP sharing and BGP failover

Before continuing, IP Sharing and BGP failover must be properly configured on both Linodes. To do this, follow the Configure failover on a Linode guide, which walks you through the process of configuring failover with lelastic. If you decide to use a tool other than lelastic, you will need to modify some of the commands and code examples in the following sections.

Install and configure keepalived

This section covers installing the keepalived software from your distribution's repository. See Installing Keepalived in the official documentation if you prefer to install it from source.

Log in to your Linode over SSH. See Connecting to a remote server over SSH for assistance.
Install keepalived by following the instructions for your system's distribution.

Ubuntu and Debian:
```
sudo apt update && sudo apt upgrade
sudo apt install keepalived
```
CentOS Stream 8, CentOS/RHEL 8 (including derivatives such as AlmaLinux 8 and Rocky Linux 8), Fedora:
```
sudo dnf upgrade
sudo dnf install keepalived
```
CentOS 7:
```
sudo yum update
sudo yum install keepalived
```
Create and edit a new keepalived configuration file.
```
sudo nano /etc/keepalived/keepalived.conf
```
Enter the following settings into this file. Use the example below as a starting point, replacing each of the following placeholders with the appropriate value for your Linode.
- $password: A secure password to use for this keepalived configuration instance. The same password must be used for each Linode you configure.
- $ip-a: The IP address of this Linode.
- $ip-b: The IP address of the other Linode.
- $ip-shared: The Shared IP address.
```
vrrp_instance example_instance {
    state BACKUP
    nopreempt
    interface eth0
    virtual_router_id 10
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass $password
    }
    unicast_src_ip $ip-a
    unicast_peer {
        $ip-b
    }
    virtual_ipaddress {
        $ip-shared/32
    }
}
```
In the above configuration file, the state is set to BACKUP and the nopreempt parameter is included. When both Linodes use these settings, failover is sticky: the Shared IP address remains routed to a Linode until that Linode enters a FAULT state, even if it has a lower priority than the other Linode. If you wish to prioritize one Linode over the other, remove the nopreempt parameter, set the state of the preferred Linode to MASTER, and adjust the priority parameter as desired.
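If you configure several Linodes, the placeholder substitution described in this step can be scripted. The following is a minimal sketch rather than part of the documented procedure. The function name render_keepalived_conf and the idea of keeping the example configuration in a separate template file are assumptions made for illustration, and the sed expressions assume that none of the values contain |, &, or backslash characters.

```shell
#!/bin/bash
# Hypothetical helper: fill in the $password, $ip-a, $ip-b, and $ip-shared
# placeholders of a template file and print the result to stdout.
# Assumes the values contain no '|', '&', or backslash characters.
render_keepalived_conf() {
    local template=$1 ip_a=$2 ip_b=$3 ip_shared=$4 password=$5
    # The patterns are single-quoted so the shell leaves the "$" alone;
    # "\$" makes sed match a literal dollar sign.
    sed -e 's|\$password|'"$password"'|g' \
        -e 's|\$ip-shared|'"$ip_shared"'|g' \
        -e 's|\$ip-a|'"$ip_a"'|g' \
        -e 's|\$ip-b|'"$ip_b"'|g' \
        "$template"
}
```

For example, calling render_keepalived_conf with a template file, the two Linode IP addresses, the Shared IP address, and the password prints the completed configuration, which you can review before saving it as /etc/keepalived/keepalived.conf.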

Enable and start the keepalived service.

```
sudo systemctl enable keepalived
sudo systemctl start keepalived
```

Perform these steps again on the other Linode you would like to configure.

Create the notify script

Keepalived can be configured to run notification scripts when the Linode changes state (such as when entering a MASTER, BACKUP, or FAULT state). These scripts can perform any action and are commonly used to interact with a service or modify network configuration files. For this guide, the scripts are used to update a log file and start or stop the BGP daemon that controls BGP failover on your Linode.

Create and edit the notify script.
```
sudo nano /etc/keepalived/notify.sh
```

Copy and paste the following bash script into the newly created file. If you wish to control a BGP daemon other than lelastic, replace sudo systemctl restart lelastic and sudo systemctl stop lelastic with the appropriate commands for your service.

```
#!/bin/bash

keepalived_log='/tmp/keepalived.state'

# Append the new state to the log, then start or stop the BGP daemon.
function check_state {
        local state=$1
        cat << EOF >> "$keepalived_log"
===================================
Date:  $(date +'%d-%b-%Y %H:%M:%S')
[INFO] Now $state

EOF
        if [[ "$state" == "Master" ]]; then
                sudo systemctl restart lelastic
        else
                sudo systemctl stop lelastic
        fi
}

# Validate the state argument passed in from the keepalived configuration.
function main {
        local state=$1
        case $state in
        Master|Backup|Fault)
                check_state "$state";;
        *)
                echo "[ERR] Provided argument is invalid";;
        esac
}
main "$1"
```
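When adapting the notify script to a different BGP daemon, it can help to test the state-to-action mapping without touching any service. The sketch below mirrors the branching of the script above as a dry run. The function name action_for_state is hypothetical, and the function only prints the systemctl action the script would take.

```shell
#!/bin/bash
# Hypothetical dry-run helper: print the systemctl action that the notify
# script takes for a given state, without running systemctl.
action_for_state() {
    case "$1" in
        Master)       echo "restart" ;;  # notify.sh restarts the BGP daemon
        Backup|Fault) echo "stop" ;;     # notify.sh stops the BGP daemon
        *)            echo "[ERR] Provided argument is invalid" >&2
                      return 1 ;;
    esac
}
```

Running action_for_state with Master, Backup, or Fault shows which command to substitute for your own daemon in each branch.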

Make the file executable.

```
sudo chmod +x /etc/keepalived/notify.sh
```

Modify the keepalived configuration file so that the notify script is used for each state change.
```
vrrp_instance example_instance {
    ...
    notify_master "/etc/keepalived/notify.sh Master"
    notify_backup "/etc/keepalived/notify.sh Backup"
    notify_fault "/etc/keepalived/notify.sh Fault"
}
```

Restart your BGP daemon and keepalived.

```
sudo systemctl restart lelastic
sudo systemctl restart keepalived
```

View the log file to see if it was properly created and updated. If the notification script was successfully used, this log file should have an accurate timestamp and the current state of the Linode.
```
cat /tmp/keepalived.state
```
```
===================================
Date:  14-Oct-2022 14:30:54
[INFO] Now Master
```
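Because the log file grows with every state change, it can be convenient to print only the most recent state. The following is a small sketch that relies on the "[INFO] Now" lines written by the notify script in this guide. The function name latest_state is an assumption made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the most recent state recorded by notify.sh.
# Relies on the "[INFO] Now <state>" lines that the notify script writes.
latest_state() {
    local log=${1:-/tmp/keepalived.state}
    # Keep only the state name from each "[INFO] Now" line, then take the last.
    sed -n 's/^\[INFO] Now //p' "$log" | tail -n 1
}
```

Called without arguments, latest_state reads /tmp/keepalived.state, which is the log path used by the notify script.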

Configure the health check (VRRP script)

The next step is to configure keepalived with a health check so that it can fail over whenever it detects an issue. This is the primary reason you may want to use keepalived alongside a BGP daemon. Keepalived can be configured to track a file (track_file), track a process (track_process), or run a custom script so that you can perform more complex health checks. When using a script, as shown in this example, the script should return 0 to indicate success and any other value to indicate a failure. When a failure is detected, the state is changed to FAULT and the notify script runs.

This guide helps you configure a custom script that detects if a file is present or not. If the file is present, the script returns a 1 to indicate a failure.
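To make the exit status convention concrete, here is a minimal sketch of a slightly more general check. It is not the script used in the rest of this guide: the function name health_check and the optional probe command are assumptions added for illustration. It fails when the trigger file exists or when the probe command returns a non-zero status.

```shell
#!/bin/bash
# Hypothetical health check: fail if the trigger file exists or if an
# optional probe command fails. A return status of 0 means healthy.
health_check() {
    local trigger=$1
    shift
    # A present trigger file forces a failure, as in this guide's check.
    if [ -f "$trigger" ]; then
        return 1
    fi
    # Any remaining arguments are run as a probe command.
    if [ "$#" -gt 0 ] && ! "$@" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

A check script could end with a call such as health_check /etc/keepalived/trigger.file systemctl is-active --quiet nginx followed by exit $?, where nginx is only an example service name.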

Create and edit the health check script.
```
sudo nano /etc/keepalived/check.sh
```

Copy the following script and paste it into the file.

```
#!/bin/bash

# Report a failure (exit 1) while the trigger file exists.
trigger='/etc/keepalived/trigger.file'
if [ -f "$trigger" ]; then
  exit 1
else
  exit 0
fi
```

Make the file executable.
```
sudo chmod +x /etc/keepalived/check.sh
```
Update the keepalived configuration file to define the VRRP script and enable your VRRP instance to use the script. The interval determines how often the script is run, fall determines how many times the script must return a failure before the state is changed to FAULT, and rise determines how many times a success is returned before the instance goes back to a BACKUP or MASTER state.
```
vrrp_script check_for_file {
    script "/etc/keepalived/check.sh"
    interval 5
    fall 2
    rise 2
}
vrrp_instance example_instance {
    ...
    track_script {
        check_for_file
    }
    ...
}
```
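As a back-of-the-envelope illustration of how interval and fall interact, the sketch below estimates how long a persistent failure takes to push the instance into FAULT. It assumes the script runs exactly every interval seconds and finishes quickly. Under those assumptions, a failure that begins just before a check is detected after about (fall - 1) × interval seconds, and one that begins just after a check takes about fall × interval seconds. The function name fault_detection_window is hypothetical.

```shell
#!/bin/bash
# Rough estimate of how long a persistent failure takes to trigger FAULT.
# Prints "<min> <max>" in seconds and ignores the script's own run time.
fault_detection_window() {
    local interval=$1 fall=$2
    echo "$(( (fall - 1) * interval )) $(( fall * interval ))"
}

fault_detection_window 5 2   # prints "5 10" for the values used above
```

The time the other Linode needs to take over and for BGP routes to update comes on top of this estimate and is not modeled here.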

Restart your BGP daemon and keepalived.

```
sudo systemctl restart lelastic
sudo systemctl restart keepalived
```

To test this health check, create the trigger file on whichever Linode is in a MASTER state.
```
sudo touch /etc/keepalived/trigger.file
```
Check the log file on that Linode to make sure it enters a FAULT state. Once it does, check the log file on the other Linode to verify that it enters a MASTER state.
```
tail -F /tmp/keepalived.state
```
```
===================================
Date:  14-Oct-2022 14:30:54
[INFO] Now Master
```

Additional recommended security settings

By default, keepalived attempts to run the scripts using a keepalived_script user. If that doesn't exist, it uses the root user. Since running these scripts as the root user introduces many security concerns, this section discusses creating the keepalived_script user.
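To see which account applies on a given Linode, you can check whether the dedicated user exists. The sketch below follows the default described above. The function name effective_script_user is hypothetical, and the function only reports the result of the lookup without querying keepalived itself.

```shell
#!/bin/bash
# Hypothetical helper: print the account keepalived would use for scripts,
# following the default described above (the dedicated user if it exists,
# otherwise root).
effective_script_user() {
    local candidate=${1:-keepalived_script}
    if id -u "$candidate" >/dev/null 2>&1; then
        echo "$candidate"
    else
        echo "root"
    fi
}
```

If effective_script_user prints root, the scripts in this guide would run with full privileges until the keepalived_script user is created.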

Create a limited user account called keepalived_script. Since it is never used to log in, that feature can be disabled.
```
sudo useradd -r -s /sbin/nologin -M keepalived_script
```
Edit the sudoers file.
```
sudo visudo
```

Within this file, grant permission for the new user to restart and stop the BGP daemon. The example below uses lelastic.

```
# User privilege specification
root    ALL=(ALL:ALL) ALL
keepalived_script ALL=(ALL:ALL) NOPASSWD: /usr/bin/systemctl restart lelastic, /usr/bin/systemctl stop lelastic
```

Update the ownership of the /etc/keepalived directory (and all of the files within it).
```
sudo chown -R keepalived_script:keepalived_script /etc/keepalived
```
Once again, edit the keepalived configuration file and paste the following snippet at the top of that file.
```
global_defs {
    enable_script_security
}
...
```

Example configuration files

The code samples below contain complete working configuration files. Please review them if you would like to see all of the recommended settings for each Linode combined into a single file.

Shared IP: 203.0.113.57 (configured on the loopback interface)
Linode A: 192.0.2.173
Linode B: 198.51.100.49

Configuration file for Linode A:
```
global_defs {
    enable_script_security
}
vrrp_script check_for_file {
    script "/etc/keepalived/check.sh"
    interval 5
    fall 2
    rise 2
}
vrrp_instance example_instance {
    state BACKUP
    nopreempt
    interface eth0
    virtual_router_id 10
    priority 99
    advert_int 1
    track_script {
        check_for_file
    }
    authentication {
        auth_type PASS
        auth_pass dT409gtNjMiS
    }
    unicast_src_ip 192.0.2.173
    unicast_peer {
        198.51.100.49
    }
    virtual_ipaddress {
        203.0.113.57/32 dev lo
    }
    notify_master "/etc/keepalived/notify.sh Master"
    notify_backup "/etc/keepalived/notify.sh Backup"
    notify_fault "/etc/keepalived/notify.sh Fault"
}
```

Configuration file for Linode B:
```
global_defs {
    enable_script_security
}
vrrp_script check_for_file {
    script "/etc/keepalived/check.sh"
    interval 5
    fall 2
    rise 2
}
vrrp_instance example_instance {
    state BACKUP
    nopreempt
    interface eth0
    virtual_router_id 10
    priority 99
    advert_int 1
    track_script {
        check_for_file
    }
    authentication {
        auth_type PASS
        auth_pass dT409gtNjMiS
    }
    unicast_src_ip 198.51.100.49
    unicast_peer {
        192.0.2.173
    }
    virtual_ipaddress {
        203.0.113.57/32 dev lo
    }
    notify_master "/etc/keepalived/notify.sh Master"
    notify_backup "/etc/keepalived/notify.sh Backup"
    notify_fault "/etc/keepalived/notify.sh Fault"
}
```
