Overview

Where a site has been compromised, we have often seen odd traffic continue from web crawlers as they index the site. Many of these requests use strange URLs containing a number of GET variables. An example of this would be:

"GET /test/?s=phim%20anime%202d%E3%80%90<BADURL>%E3%80%91Watch%20Free HTTP/1.0"

Especially where each request generates a WordPress 404, this can lead to high resource usage for your site and can even prevent real users from accessing it. Ideally, you should remove or block these URLs directly within each bot’s own control area, such as Google Search Console, and only use this method as a fallback.
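As a complement to blocking in each bot’s control area, well-behaved crawlers can also be asked not to crawl these URLs at all via a robots.txt file in your site root. This is a minimal sketch, assuming (as in the example above) that the unwanted requests all target /test/ with an s= query string; note that robots.txt is advisory only and abusive bots will ignore it, so it does not replace the .htaccess block described in the Instructions:

# robots.txt - ask compliant crawlers to skip the spammy search URLs.
User-agent: *
Disallow: /test/?s=
Disallow: /*?s=

The wildcard (*) pattern in the second Disallow line is supported by the major search engines (Google, Bing), though not by every crawler.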

Warning

This is intended for advanced users and developers only.

Instructions

  1. Via the Plesk File Manager, open the .htaccess file.
  2. Add the following, customising to suit your specific requirement:
<IfModule mod_rewrite.c>
RewriteEngine On
# Match any of the listed crawler user agents (case-insensitive)...
RewriteCond %{HTTP_USER_AGENT} BLEXBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PetalBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Adsbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bingbot [NC]
# ...and only when the query string contains an s= variable.
RewriteCond %{QUERY_STRING} s= [NC]
# Return 403 Forbidden for matching requests to /test/.
RewriteRule test/ - [F,L]
</IfModule>

Advice

You can use the htaccess tester website to evaluate your rules before using them on a production server.

In this example, we’re targeting all URLs under /test/ that carry the s= GET variable. This sends a 403 Forbidden response to any matching request, such as the one in the introduction. While this won’t remove content that has already been indexed by search engines (e.g. Google), it should at least remove the resource overhead so that your site is accessible again.
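If you also want search engines to drop these URLs from their index more quickly, a common variant is to answer with 410 Gone instead of 403, which signals that the content has been permanently removed. This is a sketch only, reusing the same user-agent and query-string conditions as the block in the Instructions:

<IfModule mod_rewrite.c>
RewriteEngine On
# The same RewriteCond user-agent lines as above would go here.
RewriteCond %{QUERY_STRING} s= [NC]
# The G flag returns 410 Gone, hinting that the URL should be de-indexed.
RewriteRule test/ - [G,L]
</IfModule>

Whether you prefer 403 or 410 depends on whether you simply want to shed the load or also want the URLs removed from search results.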
