Overview
Where a site has been compromised, we have seen odd traffic from web crawlers indexing the site continue. Many of these requests include unusual URLs carrying a number of GET variables. An example of this would be:
"GET /test/?s=phim%20anime%202d%E3%80%90<BADURL>%E3%80%91Watch%20Free HTTP/1.0"
This can lead to high resource usage for your site, especially where each request generates a WordPress 404, and can even prevent real users from accessing your site. Ideally, you should remove or block these URLs directly within each bot’s own control area, such as the Google Search Console, and only use this method as a fallback.
Instructions
- Via the Plesk File Manager, open the .htaccess file.
- Add the following, customising the bot names, path and GET variable to suit your specific requirements:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} BLEXBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PetalBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Adsbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bingbot [NC]
RewriteCond %{QUERY_STRING} s=.* [NC]
RewriteRule test/ - [F,L]
</IfModule>
In this example, we’re targeting all URLs containing test/ that carry the GET variable s= and are requested by one of the listed bots. Any matching request, such as the one in the Overview, receives a 403 Forbidden response. While this won’t remove the content that has already been indexed in search engine results (e.g., in Google), it should at least remove the resource overhead so that your site is accessible again.
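If the rule above turns out to be too broad for your site, a tighter variant can be sketched as follows. This is only an illustration, and it assumes the .htaccess file sits in the site's document root. As written above, s= is matched anywhere in the query string (so a parameter such as pages= would also match) and test/ is matched anywhere in the path. The variant below anchors both and folds the bot names into a single condition:

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
# One alternation instead of a separate RewriteCond per bot (same bot list as above)
RewriteCond %{HTTP_USER_AGENT} (BLEXBot|PetalBot|Adsbot|Googlebot|SemrushBot|bingbot) [NC]
# Match "s" only as a whole parameter name: at the start of the query string or after "&"
RewriteCond %{QUERY_STRING} (^|&)s= [NC]
# Assumption: .htaccess is in the document root, where the leading slash is stripped,
# so ^test/ only matches paths beginning with /test/
RewriteRule ^test/ - [F]
</IfModule>
```

The [F] flag implies [L], so the latter is omitted here. With this in place, a request for /test/?s=... from one of the listed user agents should receive a 403, while the same URL requested by a normal browser is unaffected.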