Data durability

Data durability describes the ability of stored data to remain intact, complete, and uncorrupted over time. When designing a storage solution, durability is typically evaluated alongside, and is sometimes confused with, another important concept: data availability. Data availability describes the ability to provide reliable access to data.

For example, if a network outage prevents access to a storage endpoint, availability is affected, but durability is not. When the network is restored, access to the data returns, and the data is complete and intact.

Importance of data durability for your data

The importance of data durability within a deployment is determined by the importance of that specific data to your business. If the loss or corruption of that data would cause a fundamental, unrecoverable business impact, then its durability is critical.

In a deployment where copies of data are stored in multiple locations to improve performance, data durability may be less important. In this case, because a lost copy can be replaced from a copy in another location, the durability of each individual copy may not be as critical.

What level of data durability does Akamai Object Storage provide?

Akamai Object Storage type E2 and E3 endpoints are designed to provide 99.999999999% (11 9's) of data durability for data stored within each endpoint. Each Akamai Object Storage endpoint is hosted in a data center within its region.
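A common way to build intuition for an 11 9's figure is to compute the expected number of objects lost per year. The sketch below assumes the figure is interpreted as an annual per-object loss probability, as is conventional for object stores; the fleet size is a made-up example.

```python
# Back-of-envelope interpretation of 11 9's of durability (assumption:
# the figure is an annual per-object survival probability).
durability = 0.99999999999          # 11 nines
objects = 10_000_000                # hypothetical number of stored objects
expected_annual_loss = objects * (1 - durability)
print(expected_annual_loss)  # ~0.0001 objects/year, i.e. roughly one
                             # object lost every 10,000 years on average
```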

How is 11 9’s of data durability achieved?

Within any multi-petabyte storage system with thousands of storage devices, hard drive failures become a statistical daily occurrence. A common technique for dealing with this problem is erasure coding. Akamai Object Storage uses erasure coding in an 8+4 configuration to achieve 11 9's of data durability.
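The "daily occurrence" claim follows from simple arithmetic. The fleet size and annualized failure rate (AFR) below are assumed, illustrative values:

```python
# Back-of-envelope check: expected drive failures per day in a large
# fleet, assuming a typical drive AFR of ~1.5% (assumed values).
drives = 100_000              # hypothetical multi-petabyte fleet size
afr = 0.015                   # assumed 1.5% annualized failure rate
failures_per_day = drives * afr / 365
print(round(failures_per_day, 1))  # ~4.1 drive failures per day
```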

What is erasure coding?

Erasure coding (EC) is a process in which a file is split into multiple parts, not all of which are required to reconstruct the file. This means that one or more parts can be lost, yet the complete original file can still be reconstructed.

This is normally expressed as two numbers: the original data parts plus the extra parity parts. For example, 8+4 means that 12 parts are created, but any 8 of them are enough to reconstruct the file.
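The "any 8 of 12" property can be sketched with a toy Reed-Solomon-style code built on polynomial interpolation over a prime field. This is illustrative only: production object stores use optimized Reed-Solomon implementations over GF(2^8), and the details of Akamai's encoder are not public.

```python
# Toy "any k of n" erasure code (Reed-Solomon style, via Lagrange
# interpolation mod a prime). Illustrative sketch only.
P = 257  # smallest prime > 255, so one byte fits in each symbol

def interp_at(x, points):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through every (xi, yi) in points, mod P."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total

def encode(data, m):
    """Systematic k+m code: shards 1..k hold the data symbols; shards
    k+1..k+m hold parity evaluations of the interpolating polynomial."""
    k = len(data)
    points = [(x, data[x - 1]) for x in range(1, k + 1)]
    parity = [(x, interp_at(x, points)) for x in range(k + 1, k + m + 1)]
    return dict(points + parity)  # shard index -> symbol

def reconstruct(shards, k):
    """Rebuild the k original data symbols from ANY k surviving shards."""
    pts = sorted(shards.items())[:k]
    return [shards[x] if x in shards else interp_at(x, pts)
            for x in range(1, k + 1)]

data = [104, 101, 108, 108, 111, 33, 0, 255]   # 8 data symbols
shards = encode(data, 4)                       # 8+4 -> 12 shards
survivors = {x: s for x, s in shards.items() if x not in {2, 5, 9, 12}}
assert reconstruct(survivors, 8) == data       # any 8 of 12 suffice
```

Losing four shards, whether data or parity, leaves eight points on a degree-7 polynomial, which is exactly enough to determine it uniquely and recover the file.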

Placement groups

Even with erasure coding in an 8+4 configuration, if all parts were stored on a single disk and that disk failed, data would be lost. Placement groups help prevent data loss from a single disk failure by ensuring that the system distributes and stores content across available disks, servers, and racks.
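A minimal sketch of the idea, using hypothetical rack names and a simple round-robin policy (not Akamai's actual placement algorithm): spread the 12 shards of an 8+4 object across failure domains so that losing any one domain destroys fewer shards than the 4 the code can tolerate.

```python
# Illustrative shard placement across failure domains (round-robin).
from itertools import cycle

def place(shards, domains):
    """Assign each shard to a failure domain, round-robin."""
    layout = {d: [] for d in domains}
    for shard, domain in zip(shards, cycle(domains)):
        layout[domain].append(shard)
    return layout

racks = ["rack-a", "rack-b", "rack-c", "rack-d", "rack-e", "rack-f"]
layout = place(range(12), racks)                 # 12 shards, 6 racks
worst_case_loss = max(len(s) for s in layout.values())  # 2 shards/rack
assert 12 - worst_case_loss >= 8  # even a whole-rack failure is survivable
```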

Bit rot

The term bit rot describes data loss that can occur over time due to gradual degradation of the storage media. Modern NVMe drives reduce the risk of bit rot, but sectors can still become corrupted. Sector corruption is handled like any other failure: the affected sectors are marked as unhealthy, and the data is rebuilt elsewhere in the distributed system to restore full redundancy.
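Detecting silent corruption typically relies on checksums recorded at write time and verified by a background scrub. The sketch below illustrates that general mechanism; the specifics of Akamai's scrubber are an assumption here.

```python
# Sketch of a checksum scrub pass (generic mechanism, not Akamai's
# actual implementation): re-hash each shard and flag mismatches.
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

stored = {  # shard id -> (data on disk, checksum recorded at write time)
    "shard-0": (b"hello", checksum(b"hello")),
    "shard-1": (b"wor1d", checksum(b"world")),  # simulated bit rot
}

corrupted = [sid for sid, (data, chk) in stored.items()
             if checksum(data) != chk]
print(corrupted)  # ['shard-1'] -- queued for rebuild from other shards
```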

What is not in scope for the data durability calculation?

As described in the previous section, the 11 9’s data durability calculation is a mathematical analysis of hardware failures and the ability of erasure coding to ensure that data remains intact, complete, and uncorrupted over time.

The data durability calculation does not take into consideration events such as natural disasters, fire, water damage, or human error.

The data durability calculation also does not protect against unplanned or unexpected use of your account, such as application software defects, or a bad actor gaining access to your cloud account and compromising your data.

What can I do to protect my data for factors outside of the data durability calculation?

To help protect your data against factors not included in the data durability calculation, consider the following:

  • Follow best practices for managing users within your cloud account, and limit the scope of access keys
  • Store copies of your data in multiple Akamai Object Storage regions, or with a secondary storage provider
  • Use versioning to keep multiple copies of your data
  • Use Object Lock to protect against accidental deletion of data
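The protection that versioning provides can be sketched with a toy model (conceptual only, not the S3-compatible API): every overwrite appends a new version, and a delete only adds a marker, so earlier versions remain recoverable after an accidental deletion.

```python
# Toy model of object versioning (conceptual sketch, not a real API).
class VersionedBucket:
    def __init__(self):
        self.versions = {}  # key -> list of versions (None = delete marker)

    def put(self, key, data):
        self.versions.setdefault(key, []).append(data)

    def delete(self, key):
        self.versions.setdefault(key, []).append(None)  # delete marker only

    def get(self, key, version=-1):
        return self.versions[key][version]

b = VersionedBucket()
b.put("report.csv", b"v1")
b.put("report.csv", b"v2")
b.delete("report.csv")                    # accidental delete
assert b.get("report.csv") is None        # latest "version" is the marker
assert b.get("report.csv", -2) == b"v2"   # previous data still recoverable
```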