Why we need rate limiting for APIs
Assume an attacker launches a DDoS attack against your server. What can you do? Your API becomes unavailable to users, because a DDoS attack can send thousands of requests to your API per second, more than the API can handle. We could overcome this with auto-scaling, but auto-scaling is costly, and in this scenario the requests come from an attack, not from real customers. Instead, we can use a mechanism that limits the rate at which requests reach the server. We call this feature rate limiting. Rate limiting is the simplest way to prevent server overload. Beyond that, rate limiting can be used to give an excellent customer experience and to support commercial use of APIs.
Good Customer Experience
Generally, APIs are developed for customers to use, so a customer who feels uncomfortable with the API is bad for the company as well. Think of a scenario: your API has 1,000 users, and three hundred of them send requests as fast as they can. The API stays busy with these requests, so when a user outside those three hundred comes into the picture, the server appears overloaded to them. This bad customer experience can be avoided using rate limiting.
Commercial Use
If you are developing an API for commercial use, you want to track users and their usage of the API. For example, you may allow customers to test your API with 100 requests, and if they want to exceed that limit, you can charge them for a subscription package with higher rate limits. This can be done using rate limiting.
Now that we know the significance of rate limiting, how can we implement it? Many algorithms are used to implement rate limiting; they are described one by one below.
Leaky Bucket Algorithm
Imagine a bucket with a small hole in the bottom. No matter the rate at which water enters the bucket, the outflow stays constant, and when the bucket is full, excess water overflows. This is the leaky bucket algorithm: here we can imagine the request queue as the bucket and the requests as the water. Requests that arrive when the bucket is full are dropped, while queued requests leave the bucket at a constant rate.
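As a concrete illustration, here is a minimal in-memory Python sketch of the leaky bucket used as a meter; the class name, capacity, and leak rate are my own illustrative choices, not a canonical implementation. Water drains at a constant rate, and a request that would overflow the bucket is dropped.

```python
import time

class LeakyBucket:
    """Leaky bucket as a meter: water drains at a constant rate;
    a request that would overflow the bucket is dropped."""

    def __init__(self, capacity: int, leak_rate_per_sec: float):
        self.capacity = capacity            # bucket size
        self.leak_rate = leak_rate_per_sec  # constant outflow rate
        self.water = 0.0                    # current bucket level
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain water for the time elapsed since the last check.
        self.water = max(0.0, self.water - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.water + 1 <= self.capacity:
            self.water += 1                 # the request fits in the bucket
            return True
        return False                        # bucket full: drop the request
```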
Token Bucket Algorithm
This is a flexible algorithm in which requests are not lost. Tokens are added to the token bucket at regular intervals, which lets the system absorb bursty traffic. The token bucket has a limited capacity, so when the bucket is full, new tokens are discarded. When a request arrives at the rate limiter, it takes one token from the bucket and is processed by the API; that token is then discarded. The leaky bucket releases requests at a constant average rate, whereas the token bucket does not force requests through at a regular speed, so short bursts can be served immediately.
How can we use this algorithm for rate limiting? I break it into three steps.
- Fetch the token
- Access the token
- Update the token
Suppose we plan to limit our API to 10 requests/minute. For every unique user, we can track the last time they accessed the API and the number of tokens they have available. Both values can be stored in the Redis server, keyed by the user id.
Fetch the token
When a request comes from a user, our rate-limiting algorithm first fetches that user's token record using the user id.
Access the token
If the particular user has tokens available, the request is allowed through to the API; otherwise, the request is blocked. The last access time and remaining tokens of that specific user are then changed.
Update the token
The rate-limiting algorithm then writes the updated token count back to Redis. One more thing: the tokens are restored after each time interval. In our scenario, the token value is reset to ten after each minute.
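Putting the three steps together, here is a minimal Python sketch of this flow, assuming the redis-py client; the key format and hash fields are my own illustration, and I track the last refill time (rather than the last access time) so the bucket reliably resets to ten tokens each minute.

```python
import time
import redis

r = redis.Redis()  # assumes a Redis server on localhost

RATE_LIMIT = 10    # tokens granted per interval
INTERVAL = 60      # refill interval in seconds

def allow_request(user_id: str) -> bool:
    key = f"bucket:{user_id}"
    now = time.time()

    # Step 1: fetch the token record for this user.
    record = r.hgetall(key)
    if record:
        tokens = int(record[b"tokens"])
        last_refill = float(record[b"last_refill"])
    else:
        tokens, last_refill = RATE_LIMIT, now

    # Restore the full token count once the interval has elapsed.
    if now - last_refill >= INTERVAL:
        tokens, last_refill = RATE_LIMIT, now

    # Step 2: access, allowing the request only if a token is available.
    allowed = tokens > 0
    if allowed:
        tokens -= 1

    # Step 3: update the record in Redis.
    r.hset(key, mapping={"tokens": tokens, "last_refill": last_refill})
    return allowed
```

Note that this fetch-then-update sequence is not atomic; that gap is exactly the race condition discussed later for distributed setups.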
Fixed Window Counter Algorithm
This is a simple, memory-efficient algorithm for limiting requests. Here, the key is the user's id concatenated with a window timestamp; the value starts at zero and is incremented with each request.
Here is an example key-value pair in the Redis server, taking the interval as one minute and the limit as 20 requests/minute. At the start of a window, user_1 has a simple counter like this:
{user_1_1563177600: 0}
Each time a request hits the server, the value is incremented. If eight requests arrive in that interval, the key-value pair becomes {user_1_1563177600: 8}. Once the minute passes, this key-value pair is discarded from Redis, and a new one is generated, such as {user_1_1563177660: 0}.
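A minimal sketch of this counter in Python, assuming the redis-py client; the key format is my own illustration. Redis's atomic INCR maintains the count, and EXPIRE discards old windows for us.

```python
import time
import redis

r = redis.Redis()  # assumes a Redis server on localhost

LIMIT = 20     # requests per window
WINDOW = 60    # window length in seconds

def allow_request(user_id: str) -> bool:
    # Align the key to the start of the current one-minute window.
    window_start = int(time.time() // WINDOW) * WINDOW
    key = f"{user_id}_{window_start}"

    count = r.incr(key)          # atomic increment; creates the key at 1
    if count == 1:
        r.expire(key, WINDOW)    # let Redis discard the old window for us
    return count <= LIMIT
```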
However, this simple fixed window algorithm has one drawback: it can sometimes let through twice the number of allowed requests in a given interval. For example, suppose we set the rate limit to 200 requests/minute, 200 requests are made at the end of one minute, and another 200 requests are made at the beginning of the next minute. Within that one-minute span the server actually receives 400 requests, double our limit of 200 requests/minute. How can we overcome this issue? For that, we can look at another algorithm, called the sliding window log algorithm.
Sliding Window Log Algorithm
Here we log the user's request timestamps in an array, stored as the value for that user id. When a new request comes from the user, the algorithm first removes all timestamps older than the time interval, then appends the new request time. It then decides whether the request should be processed based on whether the log size exceeds the limit. This algorithm overcomes the boundary issue of the fixed window algorithm; however, it is not memory efficient, because every timestamp is kept.
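One common way to hold such a log in Redis is a sorted set, with each request timestamp serving as both member and score. Here is a minimal Python sketch, assuming the redis-py client; the key format is my own illustration.

```python
import time
import redis

r = redis.Redis()  # assumes a Redis server on localhost

LIMIT = 10     # requests per window
WINDOW = 60    # window length in seconds

def allow_request(user_id: str) -> bool:
    key = f"log:{user_id}"
    now = time.time()

    r.zremrangebyscore(key, 0, now - WINDOW)  # remove outdated timestamps
    r.zadd(key, {str(now): now})              # append the new request time
    r.expire(key, WINDOW)                     # garbage-collect idle users

    # Process the request only if the log size is within the limit.
    return r.zcard(key) <= LIMIT
```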
Sliding Window Algorithm
This is a hybrid of the fixed window algorithm and the sliding window log algorithm. Here, the value for each user id key in Redis is a set of timestamps with counts, so every user has a set of timestamps and respective counters. We sort the array, keep only the timestamps from the last time interval, and discard the rest; the requests are then counted using those timestamps and counters. Suppose the rate limit interval is one minute and the limit is ten requests/minute. When a user sends a request, say we have three timestamps within the last minute relative to the new request's time: the first timestamp has two counts, the second has five, and the third has two, so the total for the last minute is 2 + 5 + 2 = 9. Since only nine requests fall in the last minute, this request is processed by the API; had the total already reached ten, the rate limiter would have rejected the request. Sorting out the array over the interval overcomes the fixed window boundary issue, and the counters reduce the size of the entry for each key, so the memory usage is lower than the sliding window log. In the sliding log we save every timestamp, but here we keep counts per timestamp: a request arriving in the same timestamp increments the count instead of creating a new entry.
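Here is a minimal Python sketch of this hybrid, assuming the redis-py client and a Redis hash whose fields are one-second timestamps and whose values are counts; all names are my own illustration.

```python
import time
import redis

r = redis.Redis()  # assumes a Redis server on localhost

LIMIT = 10     # requests per window
WINDOW = 60    # window length in seconds

def allow_request(user_id: str) -> bool:
    key = f"counts:{user_id}"
    now = int(time.time())

    # Sum counters inside the window; discard the outdated ones.
    total = 0
    for ts, count in r.hgetall(key).items():
        if int(ts) <= now - WINDOW:
            r.hdel(key, ts)        # timestamp slid out of the window
        else:
            total += int(count)

    if total >= LIMIT:
        return False

    # A same-second request increments one counter instead of adding an entry.
    r.hincrby(key, now, 1)
    r.expire(key, WINDOW)
    return True
```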
We can implement rate limiting based on any of the algorithms discussed above. Every algorithm has its pros and cons. However, these algorithms work well in a single-server setup; in a distributed server setup, we face a race condition. We can easily understand this race condition problem with the example below.
In this setup, we have three API servers and two load balancers in different regions. The rate-limiting keys are stored in Redis, and the Redis servers are kept in sync to maintain consistency. Now suppose a user sends multiple requests, and the requests are forwarded to different API servers. We have set the rate limit to 10 requests/minute, and the user has already sent nine requests in the current minute.
If three requests are fired from the user at the same time and routed to three different rate limiters, each rate limiter queries its Redis server, and each Redis server reports that one more request can be allowed for this user. We end up serving twelve requests against a limit of ten requests/minute. This is called a race condition. One remedy is sticky sessions at the load balancers, so the same user is always routed to the same region; but this is not a good design, because when that user sends many requests, they all go to the same API server, increasing its load. We can also resolve the race condition using synchronised locks: when a rate limiter wants to access a user's key-value pair, it locks that pair in Redis, and releases the lock after updating the data, though this adds some latency to our server.
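As an illustration of the locking remedy, here is a Python sketch of a per-user lock built on the common Redis SET-with-NX pattern via redis-py; the key names, timeout, and busy-wait retry are my own choices, and the check-then-delete release here is not fully atomic (Redis documents a Lua-based release for that), so treat this as a sketch rather than a production-grade distributed lock.

```python
import time
import uuid
import redis

r = redis.Redis()  # assumes a Redis server on localhost

def with_user_lock(user_id: str, critical_section, expiry_secs: int = 2):
    """Run critical_section() while holding a per-user lock in Redis."""
    lock_key = f"lock:{user_id}"
    token = str(uuid.uuid4())  # so we only release a lock we own

    # Spin until SET NX succeeds; EX guards against a crashed holder.
    while not r.set(lock_key, token, nx=True, ex=expiry_secs):
        time.sleep(0.01)
    try:
        # e.g. the read-check-update of the user's rate-limit counters
        return critical_section()
    finally:
        # Release only if we still own the lock.
        if r.get(lock_key) == token.encode():
            r.delete(lock_key)
```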
In all the scenarios above, we saved the rate limiter's data in the Redis server. Alternatively, we can keep this data on the local machine. If we keep the data inside the node, it is very fast, but accuracy decreases; Redis gives higher accuracy but adds some latency. If you are using rate limiting for back-end protection, you can store the data locally for higher performance, since performance is what matters most for back-end protection. For commercial use, we want a distributed rate limiter, so store the data in Redis. For still better performance, we can combine the two: the rate limiter works entirely in memory on the instance (locally) and then moves the data to Redis. This gives both good performance and synchronisation between all servers.
In a real-world scenario, companies have multiple APIs and want to manage them in a proper manner. For that, they use an API gateway, a layer between the user and the back-end APIs; API gateways are explained well here. One of the features of an API gateway is rate limiting.
API gateways can limit requests per API, per user, per IP, or per group of users. The Kong Community Edition API gateway is open source, and its rate limiting is based on the fixed window counter algorithm; Kong also has a paid edition, whose rate limiting is based on the sliding window algorithm. The Tyk API gateway offers rate limiting as well, based on the leaky bucket algorithm.
Finally, rate limiting can be implemented with multiple algorithms, and choosing one depends on our requirements, such as performance, accuracy, and other factors.
References
Helpx.adobe.com. (2019). Throttling and rate limiting. [online] Available at: https://helpx.adobe.com/coldfusion/api-manager/throttling-and-rate-limiting.html.
Medium. (2019). What is an API gateway?. [online] Available at: https://medium.com/ingeniouslysimple/what-is-an-api-gateway-8f0c4a1a1d39 [Accessed 15 Aug. 2019].
Tyk.io. (2019). Rate Limiting. [online] Available at: https://tyk.io/docs/control-limit-traffic/rate-limiting/ [Accessed 15 Aug. 2019].
Kong. (2019). Open-Source API Management and Microservice Management. [online] Available at: https://docs.konghq.com/hub/kong-inc/rate-limiting/.