Monitor and maintain a Compute Instance
Once you have a Compute Instance up and running, it's time to think about monitoring and maintaining your server. This guide introduces the essential tools and skills you'll need to keep your server up to date and minimize downtime. You'll learn how to monitor the availability and performance of your system, manage your logs, and update your server's software.
Availability monitoring
The availability of your servers, and the websites and web applications you host on them, can be critically important. If you generate income from a blog or charge subscription fees for your web application, downtime can have a severe impact on your bottom line. Using an availability monitoring tool can help you rapidly detect and resolve service disruptions, thereby mitigating the impact on your websites and web applications.
Assess your needs
Not everyone needs to monitor the availability of their server. For example, if you use your Compute Instance to host a personal picture gallery website for friends and family, the occasional service interruption probably won't bother you. The small inconvenience of your website going offline for a few minutes doesn't justify the time it would take to set up and configure an availability monitoring tool.
If you depend on your website or web application for your livelihood, an availability monitoring tool is practically a necessity. Once set up, the tool actively watches your servers and services and alerts you when they're unavailable. You'll be able to troubleshoot the problem and restore service as quickly as possible.
Whether you use one Compute Instance or dozens of them, mission-critical servers and services should be watched by an independent monitoring tool that can keep tabs on their availability. The tool should detect service-related incidents automatically and be able to notify you via email or text message (SMS). That way you'll know that a server or service is down within minutes of a failure.
Find the right tool
Several availability monitoring tools exist. Base your decision on how many servers you'll be monitoring:
- Multiple Servers: If you run more than one server, the Elastic Stack is an excellent monitoring tool.
- Single Server: If you only run a single server, you might want to use a third-party service to monitor your Compute Instance. You could also use a network diagnostic tool like MTR to diagnose and isolate networking errors.
- Managed: The Managed service lets us manage your infrastructure and provides incident response around the clock.
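For a single server, the core of an availability check can be sketched in a few lines of shell. This is a minimal illustration, not a replacement for a monitoring service: the function names and the example URL are hypothetical, and the sketch assumes `curl` is installed on the machine running the check. To be meaningful, the check should run from a machine other than the server being watched.

```shell
#!/bin/sh
# Minimal availability probe (a sketch under the assumptions stated above).

# Treat 2xx and 3xx HTTP status codes as "up"; anything else counts as "down".
status_is_healthy() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Fetch only the HTTP status code. curl prints 000 when the connection
# fails or the 10-second timeout expires.
fetch_status() {
    curl --silent --output /dev/null --max-time 10 --write-out '%{http_code}' "$1"
}

# Print a timestamped line to stderr and return non-zero when the URL is unhealthy.
check_url() {
    code="$(fetch_status "$1")"
    if status_is_healthy "$code"; then
        return 0
    fi
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) DOWN $1 (HTTP $code)" >&2
    return 1
}

# Example (hypothetical URL):
#   check_url "https://example.com/"
```

Run on a schedule (for example, from cron on a separate machine), a probe like this stays silent while the site is up and produces output only on failure, which is the behavior that email-based alerting typically relies on.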
Configure Shutdown Watchdog (Lassie)
Shutdown Watchdog, also known as Lassie, is a Cloud Manager feature capable of automatically rebooting your Compute Instance if it powers off unexpectedly. Lassie is not technically an availability monitoring tool, but it can help get your Compute Instance back online fast if it's accidentally powered off.
To turn Lassie on and off, see the Recover from unexpected shutdowns with Lassie guide. Once Lassie is enabled, your Compute Instance will automatically reboot if it is unexpectedly powered off in the future.
Performance monitoring
Performance monitoring tools record vital server and service performance metrics. Similar to a vehicle's dashboard, which has gauges for things like speed and oil pressure, performance monitoring tools provide valuable insight into the inner workings of your virtual server. With practice, you'll be able to review this information and determine whether your server is in good health.
Cloud Manager
If you're new to performance monitoring, you can get started by logging in to the Cloud Manager. There are four simple graphs available on the Dashboard and in the Graphs section:
- CPU %: Monitor how your Compute Instance's CPU cores are being utilized. Note that each of your Compute Instance's CPU cores is capable of 100% utilization, which means you could see this graph spike well over 100%, depending on your plan size.
- IPv4 Network Traffic: Keep tabs on how much incoming and outgoing bandwidth your server is using.
- IPv6 Network Traffic: Wondering if any of your visitors are using IPv6? Check this graph to see how much bandwidth has been transferred over IPv6.
- Disk I/O: Watch for disk input/output bottlenecks.
When you first start monitoring the graphs, you won't know what numbers are normal. Don't worry. With time and practice, you'll learn what the graphs are supposed to look like when your server is operating normally. Then you'll be able to spot performance abnormalities before they turn into full-blown problems.
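To put the CPU % graph in context, it helps to know the ceiling for your plan: 100% per core. The sketch below assumes a Linux system with `nproc` (GNU coreutils) available; the function name `cpu_graph_ceiling` is made up for illustration.

```shell
# Upper bound of the CPU % graph for a given core count: 100% per core.
cpu_graph_ceiling() {
    echo $(( $1 * 100 ))
}

# nproc prints the number of processing units available to this system.
cores="$(nproc)"
echo "cores: $cores, CPU % graph ceiling: $(cpu_graph_ceiling "$cores")%"

# 1-, 5-, and 15-minute load averages, useful to compare against the core count.
if [ -r /proc/loadavg ]; then
    cut -d ' ' -f 1-3 /proc/loadavg
fi
```

Recording these numbers while the server is healthy gives you a baseline to compare against when a graph looks unusual.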
Configure Cloud Manager email alerts
The Cloud Manager allows you to configure email alerts that automatically notify you when certain performance thresholds are reached. To learn how to configure these email alerts, see Configure email alerts for resource usage on Compute Instances.
Use third-party tools
The graphs in the Cloud Manager provide basic information for things like CPU utilization and bandwidth consumption. That's good information as far as it goes, but it won't sate the appetite of true geeks who crave detailed statistics on a server's disk, network, system, and service performance. For that kind of information, you'll need to install and configure a third-party performance monitoring tool.
There are several free third-party performance monitoring tools available for your Compute Instance:
- Munin: Munin is a system and network monitoring tool that generates graphs of resource usage in an accessible web-based interface. Munin also makes it possible to monitor multiple Compute Instances with a single installation.
- Cacti: If you have advanced monitoring needs, try Cacti. It allows you to monitor larger systems and more complex deployments with its plugin framework and web-based interface.
Managed services
Managed is our monitoring service that offers 24x7 incident response, dashboard metrics for your Compute Instances, free cPanel, and an automatic backup service. If you are running more than one Compute Instance, not all are required to be managed. You can establish separate accounts (e.g., production and development) and monitor only the most critical services running on designated Compute Instance(s). Existing customers can sign up for Managed by contacting Support.
Manage logs
Important events that occur on your system — things like login attempts or services being restarted — are recorded in your server's logs. Similar to car maintenance records and completed tax forms, which provide a paper trail if a problem or discrepancy occurs, log files keep track of system events. You might review logs when troubleshooting errors, tracking usage, or investigating unusual behavior on your system.
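As a small example of what reviewing a log can look like, the sketch below counts failed SSH password attempts. It assumes OpenSSH's usual "Failed password" log message; the log path in the usage comment is typical for Ubuntu and Debian and may differ on other distributions. The function name is hypothetical.

```shell
# Count failed SSH password attempts recorded in an authentication log.
# grep -c prints the number of matching lines; "|| true" keeps the exit
# status at zero when there are no matches (grep itself exits 1 in that case).
count_failed_logins() {
    grep -c 'Failed password' "$1" || true
}

# Example (path is an assumption; reading it usually requires root):
#   count_failed_logins /var/log/auth.log
```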
Rotate logs
As more and more events are logged, the log files on your server get bigger and bigger. Left unchecked, those files can start consuming a surprising amount of disk space. You can mitigate this problem by using logrotate, a utility that automatically archives and compresses current log files after a certain interval, creates new log files, and deletes old log files after a specified amount of time.
Use the logrotate guide to get started.
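For a sense of what a logrotate policy looks like, the sketch below writes a sample stanza to a temporary file so you can inspect it safely. The log path `/var/log/myapp/*.log` and the function name are hypothetical, and the retention values are only an illustration, not a recommendation.

```shell
# Write a sample logrotate stanza to the given path.
write_sample_logrotate_conf() {
    cat > "$1" <<'EOF'
/var/log/myapp/*.log {
    weekly
    rotate 4
    compress
    delaycompress
    missingok
    notifempty
}
EOF
}

# weekly        rotate once a week
# rotate 4      keep four rotated files, then delete the oldest
# compress      compress rotated files (gzip by default)
# delaycompress leave the most recently rotated file uncompressed until the next cycle
# missingok     do not report an error if the log file is missing
# notifempty    skip rotation when the log is empty
conf="$(mktemp)"
write_sample_logrotate_conf "$conf"
cat "$conf"

# Dry run: -d (debug) reports what would happen without changing any files.
#   logrotate -d "$conf"
```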
Monitor system logs
It's important to keep an eye on the events recorded in your system logs. But unless you're the type of person who loves scanning through hundreds of lines of log entries, you won't want to open log files unless absolutely necessary. Fortunately, there's an easier way to learn about the most important system events fast. Logwatch is a customizable utility that can automatically parse system logs and email you detailed reports highlighting notable events.
Use the Logwatch guide to get started.
Update software
Linux distributions are frequently updated to fix bugs, add new features, and patch security vulnerabilities. To take advantage of the new packages and patches, you'll need to remember to perform some simple steps every once in a while. This section shows you what to do.
Update installed packages
You learned about the importance of regularly updating your server's packages in the Set up and secure a Compute Instance guide. If nothing else, installing updates is a fast and easy way to mitigate vulnerabilities on your server.
To check for software updates and install them in Ubuntu or Debian, enter the following commands, one by one:
apt-get update
apt-get upgrade --show-upgraded
If you're using a distribution other than Ubuntu or Debian, you can learn more about package management by reading An Overview of Package Management in Linux.
There are ways to automate the installation of software updates, but this is not recommended. You should always manually review the lists of available patches before installing updates.
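One way to review pending updates before installing them is to ask `apt-get` for a simulation. The sketch below assumes Ubuntu or Debian; the helper name `count_pending_upgrades` is hypothetical. It relies on the simulation output listing each package to be installed or upgraded on a line beginning with `Inst`.

```shell
# Count the packages a simulated upgrade would install or upgrade.
# Reads apt-get's simulation output on standard input.
count_pending_upgrades() {
    grep -c '^Inst ' || true
}

# Usage on Ubuntu or Debian (-s simulates; nothing is installed):
#   apt-get update
#   apt-get -s upgrade                            # full list for manual review
#   apt-get -s upgrade | count_pending_upgrades   # just the count
#   apt list --upgradable                         # alternative listing
```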
Apply kernel updates
When you first sign up for an account and create a Compute Instance, Cloud Manager automatically creates a configuration profile that uses either the distribution's system kernel (in most cases) or the latest available Akamai-supplied kernel.
If your system is using an Akamai-supplied kernel, it's important to know that we update the kernels as necessary and make them available in Cloud Manager. In most cases, new kernels are automatically selected and, once a new kernel is released, all you have to do is reboot your Compute Instance to start using it.
To check for a new kernel and start using it on your Compute Instance:
1. First, check which kernel version your Compute Instance is currently using. Log in to your Compute Instance and execute the following command:
   cat /proc/version
2. Examine the output and note the version number:
   Linux version 4.15.12-x86_64-linode105 (maker@build.linode.com) (gcc version 4.9.2 (Debian 4.9.2-10+deb8u1)) #1 SMP Thu Mar 22 02:13:40 UTC 2018
3. Log in to Cloud Manager.
4. Click the Linodes link in the sidebar.
5. Select your Compute Instance. The Compute Instance's details page appears.
6. Select the active configuration profile by clicking the Edit link.
7. From the Kernel menu, verify that GRUB 2 is selected.
8. If you selected a new kernel, click Submit. The Compute Instance's dashboard appears.
9. Select Reboot from the status menu to reboot your Compute Instance and start using the new kernel.
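To confirm that the reboot picked up a new kernel, compare the kernel release before and after. The sketch below pulls the release string out of a `/proc/version`-style line; the helper name `kernel_release` is hypothetical.

```shell
# Extract the kernel release (the third whitespace-separated field)
# from a /proc/version-style line read on standard input.
kernel_release() {
    awk '{ print $3; exit }'
}

# On Linux, both of these print the release of the running kernel.
if [ -r /proc/version ]; then
    kernel_release < /proc/version
fi
uname -r
```

Note the value before rebooting, then run the same command afterward. If the release string is unchanged, the Compute Instance is still running the old kernel.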
Upgrade to a new release
Linux distributions such as Ubuntu and Fedora use version numbers to identify the individual versions, or releases, of the operating system. It's important to know which release your server is running, as releases are usually supported for one or more years. After support for your release is discontinued, you won't be able to download or apply critical security packages, which can put your server at risk.
There are two ways to upgrade a Compute Instance running an unsupported release. You can upgrade your existing server to the next release, or you can create a new Compute Instance with the newest release available and transfer your files from the old server. See our upgrading guides for more information.
Check the distribution's website to learn when support for your release will be discontinued. Ubuntu offers a long-term support (LTS) release that is supported for five years.
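To find out which release your server is running, most current distributions provide `/etc/os-release`, a file of shell-style variable assignments. The sketch below reads it; the helper name `release_of` is hypothetical, and the fallback value `unknown` is an assumption for systems that omit a field.

```shell
# Print "<ID> <VERSION_ID>" from an os-release file (default: /etc/os-release).
# The file is sourced inside a subshell so its variables do not leak
# into the calling shell.
release_of() {
    (
        . "${1:-/etc/os-release}"
        echo "${ID:-unknown} ${VERSION_ID:-unknown}"
    )
}

# Example:
#   release_of
```

With the release name and version in hand, you can look up its end-of-support date on the distribution's website.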