HOW TO BLOCK BAD ROBOTS

Cryptography and protocol

Older, simply built website scrapers can't establish an SSL/TLS connection over HTTPS at all, and only a small share of them support the new HTTP/2 binary framing, since it is still a fairly new thing. The majority of spamming and scraping bots are programs that are 7+ years old. How old is TLS? TLS 1.2 dates from 2008 and TLS 1.3 from 2018.
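As a practical consequence, simply forcing HTTPS and a modern TLS version already filters out a share of this old software. Below is a minimal sketch, assuming Apache 2.4 with mod_rewrite and mod_ssl; note that SSLProtocol belongs in the server or virtual-host configuration (not in .htaccess), and the TLSv1.3 keyword needs Apache 2.4.37+ built against OpenSSL 1.1.1.

# .htaccess or vhost: redirect all plain-HTTP traffic to HTTPS.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
</IfModule>

# Server/vhost config: allow only TLS 1.2 and 1.3, which many old
# scraper programs cannot negotiate.
SSLProtocol -all +TLSv1.2 +TLSv1.3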

DNS, webhost and CDN

You can actually find anti-DDoS hosting, free anti-DDoS DNS services and CDNs. All of them can be used to filter bad bots, and you can expect their robust hardware to perform far better than a single server. The reality is that if your own machine has to work through a list of IP addresses and rules several MB in size, your page speed can drop radically. Any mod_rewrite rule, and so on, can slow down your page, but we need to stop bad robots without decreasing page speed even more than our rules already do (.htaccess forces the web server to do extra processing, as any directive does).

Block bots

Prevent bad bots, SQL, XSS and HTTP injections with mod_rewrite

mod_rewrite can protect a website from malicious bots as well as from some automated vulnerability scanners, such as Nmap, which does this (and much more) over the HTTP protocol. Dynamic URLs can disappear from public visibility, and since static URLs don't contain any ".php" string (as in index.php), any "<script", or, going further, absolutely any symbols like "<", ">", "$", "\", all of those can be blocked in our mod_rewrite rules. Or we can disallow query strings used especially for SQL-sensitive data like "UserId", "Name", "Password"... or whatever else is used within your SQL database. In particular, if you use WordPress there's a good reason to block strings like "$wpdb", "_user", "user_" (this is part of "user_id", "user_login", "user_pass", "user_activation_key" and so on), "wordpress_logged_in_", "127.0.0.1" and "localhost" (don't lock yourself out), or your path such as /path/to/downloaded/wordpress...

Here's an example of one malicious request: "http://your_domain_or_IP/wp-admin/admin-ajax.php?action=umm_switch_action&umm_sub_action=[umm_delete_user_meta|umm_edit_user_meta]&umm_user=..." (the real URL is even longer).

mod_rewrite

<IfModule mod_rewrite.c>
RewriteEngine On

# Block query strings containing injection characters, SQL keywords,
# WordPress internals and other suspicious values.
RewriteCond %{QUERY_STRING} ^.*(\.|\*|;|>|\$|<|'|"|\)|%0A|%0D|scr=|=http|=ftp|user|WPDB|%22|%27|%3C|%3E|%00|benchmark|union|select|insert|md5|cast|set|declare|drop|update|wordpress_logged_in_|twitter\.com|www\.facebook\.com|google\.com|maps\.google|localhost|loopback|127\.0\.0\.1).* [NC,OR]

# Block a whole IP range (fill in a real prefix; this one is only an example).
RewriteCond %{REMOTE_ADDR} ^fill_number_instead_of_this_because_its_only_an_example\.124\.456\. [OR]

# Block requests with an empty User-Agent header.
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]

# Block known scraping/downloading tools and injection attempts in the User-Agent.
RewriteCond %{HTTP_USER_AGENT} ^.*(<|>|'|%0A|%0D|%27|%3C|%3E|%00|HTTrack|clshttp|archiver|python|fill_more_or_download_mine).* [NC,OR]

# Block suspicious Referer values.
RewriteCond %{HTTP_REFERER} ^(.*)(<|>|'|bot|hack|test|exploit|vulnerab|inject|cyber|penetrat|kali|adult|%0A|%0D|%27|%3C|%3E|%00).* [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?.*(-|\.)?pastehtml(-|\.).*$ [NC]

RewriteRule ^(.*)$ - [F,L]
</IfModule>

Bandwidth is not the only thing in the game. An SQL database is vulnerable through input fields and HTTP requests, so we can do something about that too. Anyway, you can block bots from whole IP ranges with REMOTE_ADDR. Often, if you block only the IPs of data centers such as DigitalOcean, Amazon and other big providers, you cut off 90% of all incoming robots with a few prefixes and get exactly the same result as with large lists of IPs downloaded from public blacklists.
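A minimal sketch of such a range block follows; the prefixes below are documentation placeholders only (TEST-NET ranges), so look up the providers' currently published ranges before using anything like this.

<IfModule mod_rewrite.c>
RewriteEngine On
# Placeholder prefixes - replace with real data-center ranges.
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\. [OR]
RewriteCond %{REMOTE_ADDR} ^198\.51\.100\. [OR]
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.
RewriteRule ^(.*)$ - [F,L]
</IfModule>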

# Hide the Apache version in server-generated pages and allow symlinks
# (FollowSymLinks is required for rewriting in .htaccess).
ServerSignature Off
Options +FollowSymLinks

<IfModule mod_rewrite.c>
RewriteEngine On
# Deny the HEAD, TRACK and TRACE request methods.
RewriteCond %{REQUEST_METHOD} ^(head|track|trace) [NC]
RewriteRule ^(.*)$ - [F,L]
</IfModule>

Possibly some of you also have no need for request methods such as OPTIONS, CONNECT, PUT and DELETE, and can add those to the condition as well, as sketched below.
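Here is the same rule sketched with a longer method list; which methods your site really never uses is an assumption on your side, since PUT, DELETE or OPTIONS are required by some APIs and REST clients.

<IfModule mod_rewrite.c>
RewriteEngine On
# Deny request methods the site never uses (adjust the list to your needs).
RewriteCond %{REQUEST_METHOD} ^(head|track|trace|options|connect|put|delete)$ [NC]
RewriteRule ^(.*)$ - [F,L]
</IfModule>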

<IfModule mod_rewrite.c>
RewriteEngine On
# Deny requests whose Cloudflare CF-IPCountry header matches these country codes.
RewriteCond %{HTTP:CF-IPCountry} ^(RU|CN|SK)$
RewriteRule ^ - [F,L]
</IfModule>

An HTTP header carrying the visitor's country is not a standard, so you will need a custom header from a third party such as Cloudflare (other providers offer similar ones). The code above blocks visitors from Russia, China and Slovakia. There are many other reasons to use it: if you sell digital goods, for example, you may need to mind an embargo.

If you want to fight back, you may also be interested in spider traps (a spider is a synonym for a bot or web crawler) and list poisoning. But not all robots are that bad for you...
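For the curious, here is a minimal spider-trap sketch: a URL that is disallowed in robots.txt and never linked visibly, so only rule-ignoring bots ever request it. The /bot-trap/ path and the hidden link are illustrative assumptions, not part of any standard.

# robots.txt - well-behaved crawlers will stay away from the trap:
# User-agent: *
# Disallow: /bot-trap/

# Hidden link somewhere in your HTML (invisible to human visitors):
# <a href="/bot-trap/" style="display:none" rel="nofollow">&nbsp;</a>

# .htaccess - anything that still requests the trap gets 403 and shows up
# in the access log, ready to be added to your REMOTE_ADDR rules.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^bot-trap/ - [F,L]
</IfModule>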