Why we need rate limiting for APIs

Assume that an attacker has launched a DDoS attack against your server. What can you do? Your API will not be available to users. Why? A DDoS attack can send a huge number of requests to your API every second, and the API cannot handle them all. We could overcome this with auto-scaling, but auto-scaling can be costly, and in this scenario the requests came from an attack, not from real customers. Instead, we can use a mechanism that limits the rate at which requests reach the server. This feature is called rate limiting. Rate limiting is the easiest way to prevent server overload. Beyond that, rate limiting can also be used to give an excellent customer experience and to support commercial use of APIs.

  • Good customer experience
  • Commercial use

Now that we know the significance of rate limiting, how can we implement it? Many algorithms are used for rate limiting; they are described one by one below.

Leaky Bucket Algorithm


Imagine a bucket with a small hole in the bottom. No matter the rate at which water enters the bucket, the outflow happens at a constant rate, and when the bucket is full, excess water overflows. This is the leaky bucket algorithm. Here we can picture a request queue as the bucket and the requests as the water. When requests exceed the bucket's capacity, the excess requests are dropped.
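The idea can be sketched as a fixed-size queue drained at a constant rate. This is a minimal illustration; the class and parameter names are my own, not from any particular library:

```python
from collections import deque

class LeakyBucket:
    """Requests queue up in a fixed-size bucket and leak out at a constant rate."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity    # maximum number of queued requests
        self.leak_rate = leak_rate  # requests processed per leak() call
        self.queue = deque()

    def add_request(self, request):
        # Drop the request when the bucket is already full (the "overflow").
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def leak(self):
        # Called at a fixed interval: drain up to leak_rate requests.
        processed = []
        for _ in range(min(self.leak_rate, len(self.queue))):
            processed.append(self.queue.popleft())
        return processed

bucket = LeakyBucket(capacity=3, leak_rate=2)
print([bucket.add_request(i) for i in range(5)])  # [True, True, True, False, False]
print(bucket.leak())                              # [0, 1]
```

The last two requests overflow because the bucket holds only three, and each `leak()` tick releases at most two, regardless of how fast requests arrived.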

Token Bucket Algorithm

This is a more flexible algorithm, and requests are not simply lost. Tokens are added to a token bucket at regular intervals, which helps deal with bursty traffic. The token bucket has a limited capacity, so when it is full, new tokens are discarded. When a request reaches the rate limiter, it takes one token and is processed by the API; that token is then consumed. The leaky bucket releases requests at a constant average rate, whereas the token bucket does not enforce a regular output rate and can allow bursts.

How can we use this algorithm for rate limiting? I break it into three steps.

  • Fetch the token
  • Access the token
  • Update the token

Suppose we have decided to limit our API to 10 requests/minute. For every unique user, we can track the last time they accessed the API and the number of tokens they have available. These values can be stored in a Redis server, keyed by user id.

Fetch the token
When a request comes from a user, the rate-limiting algorithm first fetches that user's token count using the user id.

Access the token
If the user has tokens remaining, the request is allowed through to the API; otherwise the request is blocked. The last access time and the remaining token count for that user then change.

Update the token
The rate-limiting algorithm writes the updated values back to Redis. One more thing: the tokens are restored after each time interval. In our scenario, the token value is reset to ten once the minute has passed.
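The three steps above can be sketched as follows. A plain dictionary stands in for Redis here so the example is self-contained; against real Redis you would use GET/SET (ideally wrapped in a Lua script for atomicity), and all names are illustrative:

```python
import time

store = {}      # stand-in for Redis: user_id -> {"tokens": int, "last": float}
LIMIT = 10      # 10 requests per minute
INTERVAL = 60.0

def allow_request(user_id, now=None):
    now = time.time() if now is None else now
    entry = store.get(user_id)                    # 1. fetch the token
    if entry is None or now - entry["last"] >= INTERVAL:
        entry = {"tokens": LIMIT, "last": now}    # restore tokens after the interval
    if entry["tokens"] > 0:                       # 2. access the token
        entry["tokens"] -= 1
        allowed = True
    else:
        allowed = False
    store[user_id] = entry                        # 3. update the token in the store
    return allowed

print([allow_request("user_1", now=0) for _ in range(11)])  # first 10 True, 11th False
print(allow_request("user_1", now=61))                      # True — bucket refilled
```

Fetch, access, and update happen as three separate operations here, which is exactly what opens the door to the race condition discussed later in the distributed setup.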

Fixed Windows Counter Algorithm

The fixed window counter algorithm keeps a per-user request counter for each fixed time window. Here is an example of the key-value pair in the Redis server. We take the interval as one minute and the limit as 20 requests/minute. So in the beginning, we have a simple entry like this for user_1:

{user_1_1563177600: 0}

When a request hits the server, the value is incremented. If eight requests arrive in that interval, the value rises to eight, and the key-value pair becomes {user_1_1563177600: 8}. Once the minute has passed, the old key-value pair in Redis is discarded and a new one is created, such as {user_1_1563177660: 0}.
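A sketch of the fixed window counter, again with a dictionary standing in for Redis and its INCR command; the key format follows the user_1_1563177600 example above:

```python
counters = {}  # stand-in for Redis; real Redis would use INCR with an EXPIRE

def window_key(user_id, now, interval=60):
    # Key embeds the start of the current window, e.g. user_1_1563177600.
    return f"{user_id}_{int(now // interval) * interval}"

def allow(user_id, now, limit=20, interval=60):
    key = counters_key = window_key(user_id, now, interval)
    counters[key] = counters.get(key, 0) + 1  # increment the window's counter
    return counters[key] <= limit

print(window_key("user_1", 1563177601))  # user_1_1563177600
print(allow("user_1", 1563177601))       # True
```

With real Redis, setting an expiry of one interval on each key replaces the manual discarding of old windows.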

However, this simplest fixed window algorithm has one drawback: it can sometimes let through twice the allowed number of requests in a given interval. For example, suppose the rate limit is 200 requests/minute, 200 requests are made at the end of one minute, and another 200 requests are made at the beginning of the next minute. Within a single minute, the server then receives 400 requests, double our 200 requests/minute limit. How can we overcome this issue? For that, we can look at another algorithm, called the sliding window log algorithm.

Sliding Window Log Algorithm

The sliding window log algorithm keeps a timestamped log of each user's requests and, for every new request, counts only the entries that fall within the last interval. Because the window slides with time instead of resetting at fixed boundaries, it avoids the doubled-burst problem of the fixed window counter, at the cost of storing one log entry per request.
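A minimal sketch of the sliding window log, keeping a per-user list of timestamps in place of the Redis sorted set a production implementation would typically use (names are illustrative):

```python
logs = {}  # user_id -> list of request timestamps, oldest first

def allow(user_id, now, limit=200, interval=60.0):
    ts = logs.setdefault(user_id, [])
    # Drop timestamps that have slid out of the window.
    cutoff = now - interval
    while ts and ts[0] <= cutoff:
        ts.pop(0)
    # Admit the request only if the log within the window is below the limit.
    if len(ts) < limit:
        ts.append(now)
        return True
    return False

print(allow("user_1", now=0.0, limit=2))   # True
print(allow("user_1", now=1.0, limit=2))   # True
print(allow("user_1", now=2.0, limit=2))   # False — two requests already in the window
print(allow("user_1", now=61.0, limit=2))  # True — old entries slid out
```

Unlike the fixed window example, two back-to-back bursts straddling a minute boundary cannot exceed the limit, because the count always covers exactly the trailing interval.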

We can implement rate limiting with any of the algorithms discussed above. Every algorithm has its pros and cons. However, these algorithms work well in a single-server setup; with a distributed server setup, we face a race condition. The following example makes this race condition easy to understand.

In this setup, we have three APIs and two load balancers in different regions. The rate-limiting keys are stored in Redis, and the Redis servers are connected to maintain consistency. A user can send multiple requests, and the requests can be forwarded to different APIs. Suppose we have set the rate limit to 10 requests/minute, and the user has already sent nine requests in the current minute.

If three requests are fired by the user at the same time and routed to three different rate limiters, each rate limiter queries its Redis server, and each server reports that one more request can be allowed. We end up serving twelve requests, even though the limit is ten requests/minute. This is called a race condition. One workaround is to use sticky sessions in the load balancers, so the same user is always routed to the same region. This is not a good design, because when that user sends many requests, all of them go to the same API and increase its load. We can also resolve the race condition using synchronised locks: when a rate limiter wants to access a user's key-value pair, it locks that pair in Redis, and after updating the data it releases the lock. This, however, may add some latency to our server.
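The synchronised-lock idea can be illustrated in-process with a threading lock guarding the shared counter. This is a simplified sketch; a real distributed deployment would need a distributed lock or an atomic Redis operation (such as a Lua script) rather than an in-process lock:

```python
import threading

tokens = {"user_1": 1}  # only one token left for this minute
lock = threading.Lock()
allowed = []

def rate_limiter():
    # Each limiter locks the shared entry before reading and updating it,
    # so only one of the concurrent requests can consume the last token.
    with lock:
        if tokens["user_1"] > 0:
            tokens["user_1"] -= 1
            allowed.append(True)
        else:
            allowed.append(False)

threads = [threading.Thread(target=rate_limiter) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(allowed))  # [False, False, True] — exactly one request gets through
```

Without the lock, all three threads could read the counter as 1 before any of them decrements it, which is precisely the twelve-requests-instead-of-ten scenario described above.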

In all the scenarios above, we stored the rate limiter's data in a Redis server. Alternatively, we can keep this data on the local machine. Keeping the data inside the node is very fast, but accuracy decreases; Redis gives higher accuracy but adds some latency. If you are using rate limiting for back-end protection, where performance is essential, you can store the data locally for higher performance. For commercial use, we want a distributed rate limiter, so the data should live in Redis. For better performance overall, the rate limiter can work entirely in-memory on the instance and then move its data to Redis; this gives good performance along with synchronisation between all servers.

In a real scenario, companies have multiple APIs, so they want to manage them in a proper manner. For that, they use an API gateway: a layer between the user and the back-end APIs. API gateways are well explained in the article referenced below. One of the features of an API gateway is rate limiting.

API gateways can limit requests based on API, user, IP, or a group of users. The Kong community edition API gateway is open source, and its rate limiting is based on the fixed window counter algorithm. Kong also has a paid edition, whose rate limiting is based on the sliding window algorithm. The Tyk API gateway's rate limiting is based on the leaky bucket algorithm.

Finally, rate limiting can be implemented with multiple algorithms. Choosing an algorithm depends on our requirements, such as performance, accuracy, and other factors.


Medium. (2019). What is an API gateway?. [online] Available at: https://medium.com/ingeniouslysimple/what-is-an-api-gateway-8f0c4a1a1d39 [Accessed 15 Aug. 2019].

Tyk.io. (2019). Rate Limiting. [online] Available at: https://tyk.io/docs/control-limit-traffic/rate-limiting/ [Accessed 15 Aug. 2019].

Kong. (2019). Open-Source API Management and Microservice Management. [online] Available at: https://docs.konghq.com/hub/kong-inc/rate-limiting/ .

Undergraduate, Department of Computer Science and Engineering, University of Moratuwa.
