Googlebot flooding server with requests for junk URLs with random data

@Shelton105

Posted in: #Google #Googlebot #Redirects #SearchEngines #Seo

I'm having some trouble with Googlebot. It keeps requesting random URLs that don't exist. It is trying to access: www.example.com/index.php/{TOKEN}

The {TOKEN} part is completely random, and I have no idea where it comes from. I'm currently answering these requests with a 301 redirect to the home page to signal that the pages don't exist (not sure if this is a good idea).

This is overloading my server, because it is TONS OF REQUESTS! What should I do to stop this?

Access Log:


example.com 66.249.64.28 - - [21/Feb/2018:12:13:48 -0300] "GET /index.php/66t-2nkznwh_91f4690bjij1wbgziq- HTTP/1.1" 301 178 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"

@Chiappetta492

I've found that Googlebot crawls URLs on my site that don't exist, have no content, and aren't linked from any page. Reportedly, Google types words into websites' search boxes and crawls the search results.

You can limit the crawl requests that Googlebot makes to your site in Google's webmaster console.

If you feel that 301 redirecting these URLs to the homepage isn't helping Google crawl your site, you can return a 403 Forbidden status for them instead. This may stop Googlebot from requesting them. If the URLs all sit under a specific directory, you can also disallow that path in robots.txt.
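
Before deploying a robots.txt rule, it can be worth checking that it blocks the junk URLs without blocking the rest of the site. Below is a minimal sketch using Python's standard urllib.robotparser. The robots.txt body is hypothetical: the Disallow prefix /index.php/ is an assumption based on the URL pattern in the question, and it would need adjusting if the site serves real pages under that prefix.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. The "/index.php/" prefix is an assumption
# taken from the URL pattern in the question (/index.php/{TOKEN}).
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

junk_url = "http://www.example.com/index.php/66t-2nkznwh_91f4690bjij1wbgziq-"
home_url = "http://www.example.com/"

# True means the named crawler is allowed to fetch the URL.
junk_allowed = parser.can_fetch("Googlebot", junk_url)
home_allowed = parser.can_fetch("Googlebot", home_url)

print("junk URL allowed:", junk_allowed)
print("home page allowed:", home_allowed)
```

Note that the rule only covers paths that continue after /index.php/, so a plain /index.php request would still be crawlable.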

@Michele947

"What should I do?" As an immediate action, I'd add a rule to the web-server config (e.g. .htaccess) that responds to those URLs with 404. (404 is the right choice if /index.php is not a valid path on your server.) Doing so will at least take the load off your interpreter (I assume it's PHP).
Next I'd add a rule to robots.txt that disallows crawling of such paths. That should stop Google from crawling those URIs and spending crawl budget on them.
After that I'd search for links to your site that use one of those URIs. Who knows, maybe it will help you find out where Google is getting those links from. What if it's your own site?
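
To see how much traffic these URLs account for, and whether any request carries a referrer that hints at where the links come from, one could tally the access log. This is only a rough sketch in Python: the regular expression is inferred from the single log line in the question, the function name tally_junk_requests and the /index.php/ prefix are made up for illustration, and matching on the user-agent string alone does not prove a request really came from Google.

```python
import re
from collections import Counter

# Pattern inferred from the one log line in the question: virtual host first,
# then the usual combined-log fields. Other log formats will need changes.
LOG_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def tally_junk_requests(lines, prefix="/index.php/"):
    """Count Googlebot requests under `prefix` by status code and referrer."""
    by_status = Counter()
    referers = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        if "Googlebot" not in m["agent"] or not m["path"].startswith(prefix):
            continue
        by_status[m["status"]] += 1
        referers[m["referer"]] += 1
    return by_status, referers

# The log line from the question, used as sample input.
sample = [
    'example.com 66.249.64.28 - - [21/Feb/2018:12:13:48 -0300] '
    '"GET /index.php/66t-2nkznwh_91f4690bjij1wbgziq- HTTP/1.1" 301 178 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"',
]

by_status, referers = tally_junk_requests(sample)
print(by_status)
print(referers)
```

Running the same tally before and after changing the server rule should show the status codes shift from 301 to 404 for these paths.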


That's it I think.

PS: I don't think 301 is a good idea. In my experience the bot will keep coming back from time to time to confirm that the redirect is still there, which I guess is not what you want. Moreover, 404 simply fits better by definition.

