The API throttling feature available in the API Keys and Traffic Management app lets you define throttling counters to limit incoming API traffic on a per-second basis. This helps every API consumer have a high-quality experience when interacting with your API and prevents any single consumer from dominating the capacity of your backend API infrastructure.

A single API consumer sending more than a thousand requests to your API within one second could negatively impact the experience of other API consumers. For example, your API could respond slowly to their requests or not send a response at all. API throttling ensures such problems do not occur by rejecting excessive requests before they reach your API server.

Other advantages of API throttling include:

Preventing system outages as a result of extreme spikes in traffic.
Protecting against excessive automated API calls by limiting the incoming request rate to a value you consider typical for real-user traffic.

📘
To deal with this specific use case efficiently, Akamai recommends the Cloud Security products in tandem with API Gateway. To learn more, see the Cloud Security documentation in the Protect section.

To throttle your API traffic, you create throttling counters that increment based on the incoming requests to your APIs.

A throttling counter is an object composed of a set of conditions that, when matched by an incoming client request, cause the counter to increase. For each counter, you define a limit of allowed requests per second, and if that limit is reached, the edge server will reject any subsequent requests that match the counter's associated conditions. A throttling counter operates based on a moving average of received requests during the last 5 seconds. If the average decreases below the specified requests-per-second limit, API consumers regain the capability to make requests that match the counter's associated conditions.
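
The moving-average behavior described above can be sketched as a toy model. This is not Akamai's implementation: the class name `ThrottlingCounter`, its methods, and the one-bucket-per-second layout are hypothetical, and the sketch assumes that every matching request (admitted or rejected) counts toward the average.

```python
from collections import deque


class ThrottlingCounter:
    """Toy model of a counter with a 5-second moving average (hypothetical names)."""

    def __init__(self, limit, window=5):
        self.limit = limit      # allowed requests per second
        self.window = window    # length of the moving average in seconds
        # One bucket per second, oldest first; the last bucket is the current second.
        self.buckets = deque([0] * window, maxlen=window)

    def tick(self):
        """Advance the clock by one second; the oldest bucket falls out."""
        self.buckets.append(0)

    def moving_average(self):
        return sum(self.buckets) / self.window

    def allow(self):
        """Decide on one matching request, then count it.

        Assumption: rejected requests are counted too, because the average
        is described as being based on received requests.
        """
        allowed = self.moving_average() < self.limit
        self.buckets[-1] += 1
        return allowed
```

In this model, with `limit=2`, a burst of 20 requests in one second has 10 admitted before the average reaches the limit. If the client then sends one probe request per second, the probes are rejected for four seconds and admitted in the fifth, once the burst has left the window.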

📘
Cached requests also count towards your throttling quota.

API throttling vs. user quota

API throttling is similar to another API Gateway feature called user quota. Both features limit the number of requests an API consumer can send to your API within a specific time period. The table below helps you understand the main differences between user quota and API throttling.

User quota	API throttling
You can only associate a quota with a key collection. The quota count increases whenever API consumers include API keys from that key collection in requests to your registered APIs.	You can associate a throttling counter with the following conditions: API keys, API key collections, and endpoints and resources, including HTTP methods. Depending on your configuration, a throttling counter may increase whenever an incoming request meets any one of these conditions or a combination of them. Depending on how the conditions are combined, a throttling counter may apply only to a specific resource. For example, if you configure throttling for an endpoint with two resources, the counter applies to requests for either resource (logical operator: OR). If you add another condition, for a key, the counter applies only to requests that also include that key (logical operator: AND).
You can schedule a quota window for the time periods of 1 hour, 6 hours, 12 hours, 1 day, 1 week, or 1 month.	The time period for API throttling always equals one second.
Once quota is full, it requires an automatic or manual reset to allow any subsequent requests with a given API key.	Throttling does not require a reset. It operates based on a moving average. If a throttling counter reaches its limit, an API consumer will wait for a maximum of 5 seconds to regain the capability to make subsequent requests.
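
The OR/AND behavior described in the table can be illustrated with a minimal sketch. The `Request` and `CounterConditions` types and their fields are hypothetical and do not reflect API Gateway's actual data model. The sketch assumes that values within one condition type are alternatives (OR) and that different condition types must all match (AND).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    api_key: str
    resource: str
    method: str


@dataclass
class CounterConditions:
    # An empty set means no condition of that type is configured.
    resources: set = field(default_factory=set)  # (resource path, HTTP method) pairs
    api_keys: set = field(default_factory=set)

    def matches(self, req):
        # Within one condition type, any listed value matches (OR).
        resource_ok = not self.resources or (req.resource, req.method) in self.resources
        key_ok = not self.api_keys or req.api_key in self.api_keys
        # Across condition types, every configured type must match (AND).
        return resource_ok and key_ok


# Counter on two resources: a request for either one matches.
cond = CounterConditions(resources={("/users", "GET"), ("/orders", "GET")})
either_resource = cond.matches(Request("k2", "/orders", "GET"))

# Adding a key narrows the counter to requests that also carry that key.
cond.api_keys = {"k1"}
other_key = cond.matches(Request("k2", "/orders", "GET"))
```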

When you configure both quota and throttling for a given API key, API Gateway first applies throttling conditions, and based on whether the request was successful, increases the quota count for the API key.
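
The evaluation order can be sketched as follows. The function name, arguments, and status labels are hypothetical. The sketch also assumes one reading of "successful": a request rejected by throttling does not consume quota.

```python
def handle_request(throttle_allows, quota_used, quota_limit):
    """Return (status, new_quota_used) for one request (illustrative only)."""
    # Throttling is evaluated first.
    if not throttle_allows:
        return "throttled", quota_used       # assumption: quota is left untouched
    # Quota is checked only for requests that passed throttling.
    if quota_used >= quota_limit:
        return "quota_exceeded", quota_used
    return "ok", quota_used + 1              # request passed, so quota usage grows
```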

To learn more about quota, see User quota.

API throttling error margin

A throttling counter calculates the throttling limit by measuring the counter value every second and updating a moving average of 5 seconds.

Edge servers within the network share updated counter values with a delay ranging from 1 to 3 seconds. This latency on counter updates and the use of a moving average may affect the throttling limit calculation and result in allowing requests over or denying requests under the current throttling limit.
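
To see why delayed counter sharing can admit requests over the limit, consider a simplified sketch with only two edge servers and a fixed delay. The function `observed_average` and its parameters are hypothetical. Real networks involve many servers and variable delays.

```python
def observed_average(local, remote, now, delay, window=5):
    """Moving average as seen by one edge server at second `now`.

    `local` and `remote` are lists of per-second request counts. Local
    counts are known immediately; remote counts arrive `delay` seconds late.
    """
    total = 0
    for t in range(now - window + 1, now + 1):
        if t < 0:
            continue
        total += local[t]
        if t <= now - delay:  # remote counts newer than this have not arrived yet
            total += remote[t]
    return total / window


# Steady local traffic; the other server sees a sudden spike at second 5.
local = [10] * 10
remote = [0] * 5 + [50] * 5

seen = observed_average(local, remote, now=7, delay=2)    # 20.0
actual = observed_average(local, remote, now=7, delay=0)  # 40.0
```

With a limit of 30 requests per second, this server would still admit requests at second 7 because it sees an average of 20, although the actual combined average is already 40.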

Depending on the distribution of your traffic on the network, the error margin over the last minute may range from a median of less than 10% for steady traffic rates up to about 20% for large, sudden changes in traffic rates.

In rare cases, when the throttling limit is low, a burst of traffic may get through because a widely distributed region of edge servers shares its counter updates with extra latency. If the late updates from that under-synchronized region then push the previously undercounted moving average above the throttling limit, edge servers keep denying incoming requests until the moving average stabilizes and drops below the specified limit.