Rescue and rebuild
Even the best system administrators may need to deal with unplanned events in the operation of their services. Cloud Manager provides recovery tools that you can leverage if you are having trouble connecting to one of the Compute Instances, and this guide describes those tools:
-
You can boot your Compute Instance in Rescue Mode to perform system recovery tasks and transfer data off the disks, if necessary.
-
If you are unable to resolve the system issues, you can rebuild the Compute Instance from a backup or start over with a fresh Linux distribution.
Troubleshooting resources
While this guide outlines the recovery tools available to you, it does not provide a specific troubleshooting strategy. Our other guides offer a logical progression of steps you can follow when troubleshooting different symptoms:
-
If you are not able to establish basic network connections with the Compute Instance, then review the Troubleshooting basic connection issues on Compute Instances guide.
-
If you can ping the Compute Instance but can't access SSH, follow the Troubleshooting SSH on Compute Instances guide.
-
If you can access SSH but are experiencing an outage with a web server or other service, review Troubleshooting web servers, databases, and other services.
-
For an overview of all these issues and answers to other questions, see the Troubleshooting overview.
Rescuing
Rescue Mode is a safe environment for performing many system recovery and disk management tasks. Rescue Mode is based on the Finnix recovery distribution, a self-contained and bootable Linux distribution. Within Rescue Mode, you can mount a Compute Instance's disks and run Linux commands to further troubleshoot issues. You can also use Rescue Mode for tasks other than disaster recovery, such as:
-
Formatting disks to use different filesystems
-
Copying data between disks
-
Downloading files from a disk through SSH and SFTP
Rescue Mode overview
To access Rescue Mode, you need to reboot your Compute Instance from Cloud Manager and then connect through Lish or SSH. After you connect, you can perform a check on your file system if you suspect that it is corrupted. If you need access to a certain software package to troubleshoot the system, you can install it.
Disks are not mounted by default and need to be mounted manually before you can access your files. After you mount the primary filesystem, you can change root to have Rescue Mode emulate your normal Linux distribution.
Boot into Rescue Mode
To boot a Compute Instance into Rescue Mode, follow the instructions below.
-
Log in to Cloud Manager.
-
Click the Linodes link in the sidebar:
-
Click on the more options ellipsis next to the Compute Instance that you wish to boot into Rescue Mode, and click on the Rescue option to open the Rescue form:
-
In the Rescue form, select the disks you want to mount:
Make a note of which devices the disks are assigned to (e.g. /dev/sda, /dev/sdb, etc.). For example, in the screenshot shown above, the Ubuntu disk corresponds to /dev/sda. These assignments are where you can mount the disks from inside Rescue Mode.
-
If you need to assign additional disks to be accessible inside Rescue Mode, click the Add Disk option:
You can assign up to 7 disks in Rescue Mode. /dev/sdh is always assigned to the Finnix recovery distribution.
As a best practice, review the names that your Compute Instance's disks are using in your configuration profile (/dev/sda, /dev/sdb, etc.) and match those names to the device assignments you specify in the Rescue form before starting Rescue Mode.
Matching these names is especially important if you need to change root within Rescue Mode. The chroot will be able to read your Compute Instance's /etc/fstab file, which defines where and how your Compute Instance mounts its disks when booting up, to automatically apply the correct mount options and mount directories to your disks.
A mismatch in the names of your disks between your Compute Instance's configuration profile and your Rescue Mode environment may cause the chroot to mount these disks in the wrong location or with the wrong mount options. As a result, it is important to ensure that these names match.
Disks are not mounted by default in Rescue Mode and will need to be mounted manually. See Mounting disks for instructions on mounting individual disks.
-
Click the Reboot into Rescue Mode button. The Compute Instance reboots into Rescue Mode, and the progress percentage appears. When the Compute Instance appears as Running again, proceed to Connecting to a Compute Instance running in Rescue Mode.
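As a rough aid for the name-matching check described above, the following sketch lists the /dev/ device names that an fstab file references, so you can compare them with the device assignments in the Rescue form. It assumes the root disk has already been mounted inside Rescue Mode (for example at /media/sda, as in the Mounting disks section) and that the fstab identifies disks by device name rather than by UUID or label. The function name fstab_devices is hypothetical and not part of Finnix or Cloud Manager.

```shell
#!/bin/sh
# fstab_devices FSTAB_FILE
# Hypothetical helper: prints the first field of every fstab entry
# whose source starts with /dev/. Comment lines, blank lines, and
# entries such as "proc" or "UUID=..." are skipped.
fstab_devices() {
    awk '$1 ~ /^\/dev\// { print $1 }' "$1"
}

# Example (run after mounting the root disk at /media/sda):
#   fstab_devices /media/sda/etc/fstab
```

If the names printed differ from the assignments you chose in the Rescue form, starting a new Rescue Mode session with matching assignments is likely the simplest fix.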
Connecting to a Compute Instance running in Rescue Mode
By default, Rescue Mode's Finnix environment does not accept SSH connections. To access the Compute Instance when it's running in Rescue Mode, connect to it through the Lish console.
It is possible to enable SSH for Rescue Mode by manually starting the SSH daemon. Using SSH can provide a better experience and lets you copy files off of the server. Review the Starting SSH section for instructions. You need to use Lish at least once in order to start SSH.
To connect with Lish:
-
From the Compute Instance's detail page, click the Launch Console button:
-
A new window appears which displays your Lish console, a Welcome to Finnix! message, and a root prompt:
See Access your system console using Lish for further explanation of the Lish console and alternative methods for accessing it, including from your computer's terminal application.
Starting SSH
The Finnix recovery distribution does not automatically start an SSH server, but you can enable one manually. This is useful if your Compute Instance does not boot and you need to copy files off of the disks. You can also copy entire disks over SSH. To start SSH:
-
Open the Lish Console for your Compute Instance.
-
Set the root password for the Finnix rescue environment by entering the following command:
passwd
This root password is separate from the root password of the disk that you normally boot from. Setting the root password for Finnix does not affect the root account of the distribution.
-
Enter the new password for the root user.
-
Start the SSH server:
service ssh start
You can now connect to the server as root with the SSH client on a local computer. You can also access mounted disks with an SFTP client:
-
For instructions on connecting with an SFTP client, see the file transfer reference manuals.
-
For instructions on copying an entire disk over SSH, see Copy a disk over SSH.
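As an illustration of copying files off a mounted disk once SSH is running, the sketch below wraps scp in a small function that you would run on your local computer. The function name copy_from_rescue, the SCP override variable, and the example address 192.0.2.10 are assumptions made for this example. Substitute the IP address of your Compute Instance and a path under the directory where you mounted the disk.

```shell
#!/bin/sh
# copy_from_rescue HOST REMOTE_PATH LOCAL_DIR
# Hypothetical helper, run on your local computer. Recursively copies
# REMOTE_PATH from the rescue environment into LOCAL_DIR as root.
# The SCP variable (an assumption of this sketch) lets you substitute
# another command for scp, for example "echo" to preview the arguments.
copy_from_rescue() {
    host=$1
    remote_path=$2
    local_dir=$3
    mkdir -p "$local_dir" &&
    "${SCP:-scp}" -r "root@${host}:${remote_path}" "$local_dir"
}

# Example (192.0.2.10 is a placeholder address):
#   copy_from_rescue 192.0.2.10 /media/sda/etc ./rescue-backup
```

You are prompted for the Finnix root password that you set with passwd, not the root password of your normal distribution.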
Performing a file system check
The e2fsck system utility (short for "ext file system check") checks the consistency of ext filesystems and repairs any damage it detects. If you suspect that the Compute Instance's filesystem is corrupted, run e2fsck to check for and repair any damage on most disks:
-
Enter the df -h command to verify that the primary disks are not currently mounted:
Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G     0  1.9G   0% /dev
tmpfs           395M  516K  394M   1% /run
/dev/sr0        503M  503M     0 100% /run/live/medium
/dev/loop0      426M  426M     0 100% /run/live/rootfs/filesystem.squashfs
tmpfs           2.0G   17M  2.0G   1% /run/live/overlay
overlay         2.0G   17M  2.0G   1% /
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
tmpfs           2.0G     0  2.0G   0% /tmp
tmpfs           395M     0  395M   0% /run/user/0
The primary disks should not appear in the list. In the example screenshot from the Boot into Rescue Mode section, the Ubuntu 18.04 disk is assigned to /dev/sda. Because this device does not appear in the example output from df -h, run a filesystem check on it.
Never run e2fsck on a mounted disk. Do not continue unless you're sure that the target disk is unmounted.
-
Run e2fsck by entering the following command, replacing /dev/sda with the location of the disk you want to check and repair:
e2fsck -f /dev/sda
-
If no problems are detected, e2fsck displays the tests it performed:
e2fsck 1.45.6 (20-Mar-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sda: 44611/2564096 files (0.1% non-contiguous), 602550/10240000 blocks
-
If e2fsck determines that there is a problem with the filesystem, it prompts you to fix problems as they are found during each test:
e2fsck 1.45.6 (20-Mar-2020)
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
Resize inode not valid. Recreate<y>?
Press enter to automatically attempt to fix the problems.
After the filesystem check completes, any problems detected should be fixed. Try rebooting the Compute Instance from Cloud Manager. If e2fsck fixed the issues, the Compute Instance should boot normally.
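Because e2fsck must never run on a mounted disk, it can help to make the "is it mounted?" check explicit before starting. The following is a minimal sketch, not an official tool: the function names is_mounted and check_disk are hypothetical, and the check assumes the device appears under the same name (such as /dev/sda) in the first field of /proc/mounts.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Hypothetical helper: returns 0 if DEVICE is listed as a mount source
# in MOUNTS_FILE (default: /proc/mounts), and 1 otherwise.
is_mounted() {
    device=$1
    mounts_file=${2:-/proc/mounts}
    # The first field of each line is the mounted device or source.
    awk -v dev="$device" '$1 == dev { found = 1 } END { exit !found }' "$mounts_file"
}

# check_disk DEVICE
# Hypothetical wrapper: runs "e2fsck -f" only when DEVICE is not mounted.
check_disk() {
    if is_mounted "$1"; then
        echo "Refusing to check $1: it is mounted. Unmount it first." >&2
        return 1
    fi
    e2fsck -f "$1"
}

# Example (inside Rescue Mode):
#   check_disk /dev/sda
```

This guard is only a convenience. Reviewing the df -h output yourself, as described in the steps above, remains the safer confirmation.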
Installing packages
The Finnix recovery distribution is based on Debian, so you can use the APT package management system to install additional software packages in the temporary rescue environment. For example, you could install and run the nmon utility by using the following commands:
apt update
apt install nmon
nmon
The software packages you install are available only as long as the Compute Instance is running in Rescue Mode.
Mounting disks
Before you mount the disk, check the location of the root partition in the /etc/fstab file and update it accordingly. In the following example, /dev/sda is the location of the disk. For more information, see the Update your fstab guide.
By default, your disks are not mounted when your Compute Instance boots into Rescue Mode. However, you can manually mount a disk under Rescue Mode to perform system recovery and maintenance tasks.
These instructions mount the /dev/sda disk. If you are mounting a different disk, replace sda with the name of your disk throughout these instructions.
-
Create a new directory for your disk:
mkdir -p /media/sda
-
Mount the disk to make its contents available at the /media/sda directory:
mount -o barrier=0 /dev/sda /media/sda
-
View the contents of the disk to confirm you can access them:
ls /media/sda
You can now read and write to files on the mounted disk.
You can unmount your disk by running the umount command. You may want to unmount your disk to run a file system check, for example.
The umount command requires you to specify the device you want to unmount. You may specify this device in one of two ways:
Specify the device name itself:
umount /dev/sda
Specify the mount directory:
umount /media/sda
If you would like to mount or unmount additional disks on your system, repeat these instructions with the appropriate substitutions.
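When repeating these steps for several disks, the substitutions can be captured in small shell functions. The sketch below follows this guide's convention of mounting /dev/NAME at /media/NAME and reuses the same mount options. The function names mount_point_for, mount_disk, and unmount_disk are hypothetical.

```shell
#!/bin/sh
# mount_point_for DEVICE
# Hypothetical helper: prints the mount directory used in this guide's
# convention, for example /dev/sdb -> /media/sdb.
mount_point_for() {
    printf '/media/%s\n' "$(basename "$1")"
}

# mount_disk DEVICE
# Creates the mount directory and mounts DEVICE there with the same
# option used in the steps above.
mount_disk() {
    dir=$(mount_point_for "$1")
    mkdir -p "$dir" && mount -o barrier=0 "$1" "$dir"
}

# unmount_disk DEVICE
# Unmounts DEVICE by its mount directory.
unmount_disk() {
    umount "$(mount_point_for "$1")"
}

# Example (inside Rescue Mode):
#   mount_disk /dev/sdb
#   ls /media/sdb
#   unmount_disk /dev/sdb
```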
Change root
Changing root is the process of changing your working root directory. When you change root (abbreviated as chroot) to your root disk, you are able to run commands as though you are logged in to that system.
Chroot lets you change user passwords, remove/install packages, and do other system maintenance and recovery tasks in your Compute Instance's normal Linux environment.
-
Create a new directory for your disk:
mkdir -p /media/sda
-
Before you use chroot, mount the root disk:
mount -o exec,barrier=0 /dev/sda /media/sda
If you mounted your disk without using the exec option before reviewing this section, include the remount option in your mount command:
mount -o remount,exec,barrier=0 /dev/sda /media/sda
-
To create the chroot, you need to mount the temporary filesystems:
cd /media/sda
mount -t proc proc proc/
mount -t sysfs sys sys/
mount -o bind /dev dev/
mount -t devpts pts dev/pts/
-
Chroot to your disk:
chroot /media/sda /bin/bash
-
Your Compute Instance may expect its other disks to be mounted in specific directories during its regular operations. These disks and their expected directories are defined in the /etc/fstab file. In order for these directories to be accessible within the chroot, you need to mount them from within the chroot. For example, if your Compute Instance defines /dev/sdc in its /etc/fstab, you can use the following command to mount it:
mount /dev/sdc
This mount command only specifies a disk name without specifying a mount point. This causes mount to use the /etc/fstab file in the chroot to determine the mount point and apply the correct mount options.
As a result, this command depends on you having made these disks available to your Rescue Mode environment under the same names that they use in your configuration profile.
If these names do not match, mounting your disks using only a device name will fail completely, mount them at the wrong directory, or apply the wrong mount options to them.
The easiest way to resolve this problem is to start a new Rescue Mode session from Cloud Manager that matches the disk names between your Rescue Mode environment and your configuration profile.
-
To exit the chroot and return to Finnix, enter the following command:
exit
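The chroot steps above can be collected into a pair of functions, sketched here under a few assumptions. The function names run, setup_chroot, and teardown_chroot are hypothetical, as is the DRY_RUN variable, which makes the script print each command instead of executing it so you can review the sequence first. The teardown step, which unmounts the temporary filesystems in reverse order after you exit the chroot, is not part of the procedure above and is included only as a suggested cleanup.

```shell
#!/bin/sh
# run CMD...
# Executes CMD, or only prints it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_chroot DEVICE DIR
# Mounts DEVICE at DIR with exec enabled, then mounts the temporary
# filesystems that the chroot needs, in the order used by this guide.
setup_chroot() {
    dev=$1
    dir=$2
    run mkdir -p "$dir" &&
    run mount -o exec,barrier=0 "$dev" "$dir" &&
    run mount -t proc proc "$dir/proc" &&
    run mount -t sysfs sys "$dir/sys" &&
    run mount -o bind /dev "$dir/dev" &&
    run mount -t devpts pts "$dir/dev/pts"
}

# teardown_chroot DIR
# Suggested cleanup (an addition of this sketch): unmounts everything
# that setup_chroot mounted, in reverse order.
teardown_chroot() {
    dir=$1
    run umount "$dir/dev/pts" &&
    run umount "$dir/dev" &&
    run umount "$dir/sys" &&
    run umount "$dir/proc" &&
    run umount "$dir"
}

# Example (inside Rescue Mode):
#   setup_chroot /dev/sda /media/sda
#   chroot /media/sda /bin/bash    # work in the chroot, then type exit
#   teardown_chroot /media/sda
```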
Rebuilding
If you can't rescue and resolve issues on an existing disk or if you want to enable or disable disk encryption, you can rebuild the Compute Instance. Rebuilding the Compute Instance is the process of starting over with a set of known-good disks that you can boot from. There are a few different ways you can do this:
-
If you are subscribed to the Backups service, you can restore from an existing backup and return the Compute Instance to a previous state.
-
If you aren't subscribed to the Backups Service, you can copy files off an existing disk and then use the Rebuild feature of Cloud Manager to erase everything and start over again from scratch.
-
If you have a backup system other than the Backups Service in place, you can rebuild your Compute Instance and then restore the data from that backup service. The methods for restoring data vary by the kind of backup system that you use.
Did an unauthorized intruder gain access to your Compute Instance? Since it is virtually impossible to determine the full scope of an attacker's reach into a compromised system, you should never continue using a compromised system.
For recovery instructions, see Recovering from a system compromise. You need to create a new Compute Instance, copy your existing data from the old Compute Instance to the new one, and then swap IP addresses.
Restoring from a backup
If you previously enabled the Backups service, you may be able to restore one of the backups to the Compute Instance. See Restore a backup to an existing Compute Instance for instructions.
If you created backups with an application other than the Backups Service, review the application's instructions to restore a backup to the Compute Instance.
Use the rebuild feature
Cloud Manager provides a Rebuild feature that performs the following two actions:
-
The current disks are removed.
-
A new set of disks is provisioned from one of Cloud Manager's built-in Linux images, or from one of the saved images.
If you use the Rebuild feature, the data on the deleted disks is not retrievable. To preserve data before using the Rebuild feature, back it up manually or create a snapshot using the Backups service.
If you'd like to deploy a new Linux distribution without erasing the existing disks, follow the instructions in the Creating a Disk guide. This is a better option if you need to create a new distribution, but also need to save the existing data.
The Compute Instance needs to have some amount of unallocated disk space in order to provision a new distribution. If the Compute Instance does not have enough unallocated space, you can shrink your existing disks to free up space or resize your Compute Instance to a higher resource tier.
If you need to copy files from your existing disk to another location before rebuilding, you can start SSH under Rescue Mode and then use an SFTP client to copy files to your computer.
To use the Rebuild feature:
-
If you need to copy files from an existing disk to another location before rebuilding, you can start SSH under Rescue Mode and then use an SFTP client to copy files to your computer, another server, or somewhere else.
-
Log in to Cloud Manager.
-
Click on the Linodes link in the sidebar:
-
Click on the more options ellipsis next to the Compute Instance that will be rebuilt, and click on the Rebuild option to open the Rebuild form:
-
Complete the Rebuild form. Select an image or StackScript to deploy and enter a root password. Optionally, select one or more SSH keys (if you have not added any SSH Keys via Cloud Manager, this option does not appear). The Encrypt Disk setting for Compute Instances attached to an LKE node pool cannot be changed. For Compute Instances in distributed regions, the disk encryption setting is always enabled. For more information on this feature, see Disk encryption.
-
Click the Rebuild button after completing the form:
-
The Compute Instance may take several minutes to complete the rebuild process. Select the Compute Instance that is being rebuilt and select the Activity Feed tab to monitor rebuild progress and confirm that the rebuild has been completed: