Bots can be intelligent, but they can also be aggressive, and sometimes they become so annoying that they affect the performance of your system.
If you're using NGINX, it's not always obvious how to handle this. I've seen several examples that block aggressive behavior by IP address, but a bot is most likely crawling your site from several IPs. The following solution rate-limits bots based on their User-Agent instead.
conf.d/user-agent-rate-limit.conf
# 1 = soft, 2 = medium, 3 = hard
map $http_user_agent $rate_bot {
    default "";
    "~*\bgooglebot\b" 1;
    "~*\bbingbot\b" 3;
}
# HTTP status to return when a request is rejected by a limit
limit_req_status 429;
# soft rate limit
map $rate_bot $rate_bot_soft {
    default "";
    1 $http_user_agent;
}
limit_req_zone $rate_bot_soft zone=ratebot_soft:16m rate=5r/s;

# medium rate limit
map $rate_bot $rate_bot_medium {
    default "";
    2 $http_user_agent;
}
limit_req_zone $rate_bot_medium zone=ratebot_medium:16m rate=3r/s;

# hard rate limit
map $rate_bot $rate_bot_hard {
    default "";
    3 $http_user_agent;
}
limit_req_zone $rate_bot_hard zone=ratebot_hard:16m rate=1r/s;
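Why the two layers of map? The first map classifies the User-Agent into a level; each per-level map then yields a non-empty key only for its own level, so only the matching zone counts the request. The selection logic can be emulated in plain shell (an illustration only: simple substring matching stands in for nginx's regexes, and the function names are mine):

```shell
# Emulation of the map cascade above. classify() plays the role of
# the $rate_bot map; zone_key() plays the role of a per-level map.
classify() {
  # case-insensitive match, like nginx's "~*" map patterns
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *googlebot*) echo 1 ;;  # soft
    *bingbot*)   echo 3 ;;  # hard
    *)           echo "" ;; # no match: empty key, zone not applied
  esac
}

zone_key() {  # $1 = the level this zone handles, $2 = User-Agent
  if [ "$(classify "$2")" = "$1" ]; then
    printf '%s\n' "$2"   # non-empty key: this zone counts the request
  fi
}

zone_key 1 "Mozilla/5.0 (compatible; Googlebot/2.1)"  # prints the UA: soft zone applies
zone_key 3 "Mozilla/5.0 (compatible; Googlebot/2.1)"  # prints nothing: hard zone skips it
```

An empty key is the whole trick: `limit_req_zone` does not account requests whose key evaluates to an empty string, so each User-Agent is only ever counted in the one zone that matches its level.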
conf.d/example-com.conf
server {
    server_name example.com;
    ...

    # apply the ratebot zones; burst allows a short spike before
    # requests are rejected with 429 (nodelay has no effect without it)
    limit_req zone=ratebot_soft burst=10 nodelay;
    limit_req zone=ratebot_medium burst=6 nodelay;
    limit_req zone=ratebot_hard burst=2 nodelay;
}
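To check that a zone actually kicks in, you can replay requests with a spoofed User-Agent and watch for 429s. A sketch, assuming NGINX with the config above is listening on localhost (hostname and path are placeholders):

```shell
# Send 10 quick requests pretending to be bingbot (the hard limit, 1r/s):
# after the first request plus the burst allowance, expect 429s.
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -H "User-Agent: Mozilla/5.0 (compatible; bingbot/2.0)" \
    http://localhost/
done
```

Requests whose User-Agent matches none of the map patterns resolve to an empty key, and `limit_req` ignores empty keys, so regular visitors are never throttled.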
Happy blocking :)