Overview
We have observed an increasing number of sites where Googlebot generates excessive traffic and resource usage. Generally, this occurs after a site has been exploited. Even after the malicious code has been cleaned up, Google may still have the injected links indexed and will therefore continue trying to crawl these URLs for months afterwards. If there are more than a few thousand such pages (we've observed as many as 3 million), this can cause high traffic and resource usage, especially on WordPress, where 404 responses aren't cached.
To remove pages from Google Search and prevent further issues, you can follow the steps below.
Confirming pages are indexed
There are two checks which need to be completed. The first is via the access logs of your website.
- Use our Viewing website access logs via Plesk article to log in and read your website logs.
- Review requests filtered by 404 status codes or by bot activity. Bots are identified by the robot icon shown for the User Agent:
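The same log review can be done from the command line, assuming a standard combined-format access log. This is a minimal sketch only: the log filename, sample log lines, and `cheap-widgets` URL pattern below are made up for illustration, so substitute your own log path and the patterns you actually see.

```shell
#!/bin/sh
# Illustrative only: create a small sample log in combined format.
# On a real server, point LOG at your site's actual access log instead.
LOG=access_log_sample
cat > "$LOG" <<'EOF'
66.249.66.1 - - [01/Jan/2024:10:00:00 +1000] "GET /cheap-widgets-123.html HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.1 - - [01/Jan/2024:10:00:01 +1000] "GET /cheap-widgets-456.html HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.5 - - [01/Jan/2024:10:00:02 +1000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
EOF

# In the combined log format, $7 is the request path and $9 is the status code.
# List the most frequently requested 404 URLs hit by Googlebot:
grep -i "Googlebot" "$LOG" | awk '$9 == 404 {print $7}' | sort | uniq -c | sort -rn
```

The repeated paths at the top of this output are good candidates for the URL pattern used in the next step.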
Secondly, you can complete a Google search combining your website URL with any of the URL patterns discovered in the logs. In either the address bar of your Google Chrome browser or the search field of google.com.au, search for:
site:<yourwebsite.com.au> inurl:<themaliciousurlpattern>
For example, on a recent site where this was an issue, the search returned 24,100 results for malicious content which Google still had indexed under an identifiable URL pattern.
Google Search Removal Request
- Log in to your Google Search Console.
- Under Index within the left-hand menu, select Removals:
- Click on the New Request button and then enter the URL pattern analysed in the previous step:
- Click Next, then confirm you wish to remove by clicking Submit Request.
- Google will crawl the URLs again and, if they receive a 404 response, remove them from the index.
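Since removal relies on the URLs now returning a 404, it may be worth spot-checking a few with curl before submitting the request. This is a minimal sketch; the helper name and the URL in the loop are hypothetical, so substitute URLs matching the malicious pattern found in your logs.

```shell
#!/bin/sh
# check_status: print only the HTTP status code for a URL (hypothetical helper).
check_status() {
    curl -s -o /dev/null -w "%{http_code}" "$1"
}

# Replace with real URLs matching the pattern you found in the access logs.
for url in "https://yourwebsite.com.au/page-matching-malicious-pattern.html"; do
    echo "$(check_status "$url")  $url"
done
```

Each cleaned-up URL should print 404 (or 410). If any still return 200, the malicious content has not been fully removed from the site.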