Best practices for high-throughput applications
Application design best practices
Scale the number of TCP connections
A fundamental design pattern for high-throughput applications is to distribute content over multiple TCP connections. Consider this pattern whenever you access Akamai Object Storage. For client applications, multiple TCP connections across multiple threads can increase overall performance by minimizing the impact of blocking API calls and making better use of CPU resources. Multiple TCP connections also allow multiple network paths to be used, which optimizes throughput across the network.
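As an illustration, boto3's transfer manager can split a single large upload across several concurrent connections. The following is a minimal sketch; the endpoint, bucket, and file names are placeholders.

```python
import boto3
from boto3.s3.transfer import TransferConfig

# Hypothetical endpoint and names; substitute your own.
ENDPOINT = "https://us-sea-9.linodeobjects.com"

# max_concurrency controls how many threads (and therefore TCP
# connections) the transfer manager uses for a single transfer.
config = TransferConfig(
    multipart_threshold=16 * 1024 * 1024,  # switch to multipart above 16 MiB
    multipart_chunksize=16 * 1024 * 1024,  # 16 MiB parts
    max_concurrency=8,                     # 8 parallel connections
)

s3 = boto3.client("s3", endpoint_url=ENDPOINT)
s3.upload_file("large-file.bin", "my-bucket", "large-file.bin", Config=config)
```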
Distribute requests across multiple IP addresses (for each endpoint)
Each DNS lookup for an Akamai Object Storage S3 hostname (e.g., us-sea-9.linodeobjects.com) randomly returns 12 IP addresses from a larger pool of addresses. Each DNS A record has a TTL of 30 seconds. Connecting to Object Storage through a diversity of IP addresses helps ensure optimal throughput rates by distributing content across multiple ingress points to the Object Storage service. It’s important to ensure that applications respect the TTL of these A records and resolve again when the TTL expires. Additionally, review any local libraries, SDKs, or caches to ensure that connection requests are spread across all of the IP addresses available for the Akamai Object Storage endpoint.
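A quick way to observe this behavior is to resolve the endpoint yourself. The sketch below (using a hypothetical endpoint hostname) prints the set of addresses returned by a single lookup; repeating it after the 30-second TTL expires may yield a different subset of the pool.

```python
import socket

# Resolve the Object Storage endpoint; each lookup returns a subset
# (up to 12 addresses) of the larger pool behind the hostname.
host = "us-sea-9.linodeobjects.com"
infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
addresses = sorted({info[4][0] for info in infos})
print(f"{host} currently resolves to {len(addresses)} addresses:")
for addr in addresses:
    print(" ", addr)
```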
We recommend that applications model for a maximum throughput of 1 Gbps (gigabit per second) per connection. At higher rates, individual connections may be throttled. For uploads (ingress), throttling results in TCP backpressure on the sender, which limits the amount of data being transmitted. Spreading uploads across multiple connections, each modeled at 1 Gbps, avoids the risk of throttling. For example, an application may issue 8 simultaneous GET requests to 8 distinct IP addresses to target a download (egress) rate of 8 Gbps, as sketched below.
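One way to realize this pattern is to split a large download into ranged GET requests issued by parallel workers, each with its own client (and therefore its own connections, which spread across endpoint IPs as DNS rotates). This is a simplified sketch with a hypothetical bucket and key, not a complete downloader.

```python
import boto3
from concurrent.futures import ThreadPoolExecutor

ENDPOINT = "https://us-sea-9.linodeobjects.com"
BUCKET, KEY = "my-bucket", "large-object.bin"
WORKERS = 8

def fetch_range(byte_range):
    # A per-worker client gives each worker its own connection pool,
    # so downloads are not funneled through one connection.
    s3 = boto3.client("s3", endpoint_url=ENDPOINT)
    resp = s3.get_object(
        Bucket=BUCKET, Key=KEY,
        Range=f"bytes={byte_range[0]}-{byte_range[1]}",
    )
    return resp["Body"].read()

# Determine the object size, then split it into one range per worker.
size = boto3.client("s3", endpoint_url=ENDPOINT).head_object(
    Bucket=BUCKET, Key=KEY)["ContentLength"]
chunk = -(-size // WORKERS)  # ceiling division
ranges = [(i * chunk, min((i + 1) * chunk, size) - 1) for i in range(WORKERS)]

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    parts = list(pool.map(fetch_range, ranges))
data = b"".join(parts)
```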
Use larger objects (> 1 MiB) when possible
Prefer larger objects (greater than 1 MiB) over smaller objects for high-throughput applications. Larger objects amortize the TCP connection and per-operation overhead of individual S3 API requests. For workloads with large numbers of small objects, this overhead can meaningfully reduce overall application throughput.
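Where your data model allows it, one approach is to bundle many small files into a single archive before upload, so one PUT replaces many small requests. A minimal sketch, with hypothetical endpoint, file, and key names:

```python
import io
import tarfile
import boto3

ENDPOINT = "https://us-sea-9.linodeobjects.com"
s3 = boto3.client("s3", endpoint_url=ENDPOINT)

# Pack many small files into one in-memory tar archive so a single
# PUT replaces hundreds of per-object requests.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ["a.json", "b.json", "c.json"]:  # hypothetical local files
        tar.add(name)
buf.seek(0)

s3.put_object(Bucket="my-bucket", Key="batch-0001.tar", Body=buf.getvalue())
```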
Examples when using Amazon AWS S3 SDKs
Increase connection pool size
Most Amazon AWS S3 SDKs default to a small connection pool size. For instance, the Java SDK defaults to 50 while Python/boto3 defaults to 10. This means that at high throughput, most requests funnel into a handful of persistent connections, potentially all to the same IP address. Here are some example SDK configurations to increase the connection pool size:
- Java SDK v2: `S3Client.builder().httpClientBuilder(ApacheHttpClient.builder().maxConnections(200))`
- Golang SDK: Tune `MaxIdleConns` and `MaxIdleConnsPerHost` in the underlying `http.Transport`
- Python/boto3: `botocore.config.Config(max_pool_connections=100)`
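For example, a boto3 client with an enlarged pool might be constructed as follows (the endpoint URL is a placeholder):

```python
import boto3
from botocore.config import Config

# Raise the pool from boto3's default of 10 so parallel workers are
# not serialized onto a handful of persistent connections.
cfg = Config(max_pool_connections=100)
s3 = boto3.client(
    "s3",
    endpoint_url="https://us-sea-9.linodeobjects.com",
    config=cfg,
)
```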
Increase worker (or thread) count
For serial uploads and downloads (one at a time), connection reuse or local DNS caching may result in the same TCP connection or destination IP address being used for every request. If this is an issue, use parallel workers (or threads) in your application to spread requests across multiple connections and IP addresses. This may also be accomplished by running multiple instances of your client.
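A minimal sketch of this pattern with boto3 and a thread pool, using hypothetical endpoint, bucket, and key names:

```python
import boto3
from concurrent.futures import ThreadPoolExecutor

ENDPOINT = "https://us-sea-9.linodeobjects.com"
BUCKET = "my-bucket"

def download(key):
    # One client per worker avoids funneling every request through
    # a single cached connection or destination IP.
    s3 = boto3.client("s3", endpoint_url=ENDPOINT)
    s3.download_file(BUCKET, key, f"/tmp/{key}")

keys = [f"object-{i:04d}.bin" for i in range(100)]  # hypothetical keys
with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(download, keys))
```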
Adjust retry behavior on failures
While retries or failovers are essential, it’s important to make sure they don’t all target the same IP address. We recommend implementing exponential backoff with jitter, using standard mode in the SDK retry configuration. Some example configurations in the Amazon AWS S3 SDKs are:
- Java SDK v2: `RetryPolicy.builder().numRetries(3).backoffStrategy(BackoffStrategy.defaultStrategy())`
- Python/boto3: `botocore.config.Config(retries={'max_attempts': 5, 'mode': 'standard'})`
The default retry behavior of standard mode is recommended in most use cases. However, if your application has a particular sensitivity to failed requests and can accept slightly higher latency as a trade-off, adaptive retry is an option.
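For reference, both modes can be selected through the same boto3 retry configuration; the endpoint URL below is a placeholder.

```python
import boto3
from botocore.config import Config

# Standard mode: capped exponential backoff with jitter.
standard = Config(retries={"max_attempts": 5, "mode": "standard"})

# Adaptive mode: adds client-side rate limiting, for workloads that
# are especially sensitive to failed requests.
adaptive = Config(retries={"max_attempts": 5, "mode": "adaptive"})

s3 = boto3.client(
    "s3",
    endpoint_url="https://us-sea-9.linodeobjects.com",
    config=standard,  # or config=adaptive
)
```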