On my sites I have set up a script that sends me an email every time a new IP claiming to be Google visits the site.
When I see the email, I check (for example on whois.com) whether the IP that claims to be Google really is Google, and if not, I block it with the firewall.
Normally I find one or two fake Googles a week, but in the last few days Google has been hammering my server.
103432 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.130 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
1022802 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.80 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
1063366 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
1178083 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.127 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
(Number of Google accesses to my server in the last 24 hours, per user agent.)
Something is happening: Google is hitting my server much more than usual, and along with the Google accesses, the "fake Google" visits have increased a lot. It is strange that they have increased together...
Could I be blocking the IPs of some Google service?
(These are the ones I blocked in the last 24 hours.)
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='193.203.11.230' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='45.80.104.189' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='45.148.124.171' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='37.44.196.194' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='88.218.45.98' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='46.161.60.168' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='212.60.21.63' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='85.202.195.178' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='45.148.124.139' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='84.54.58.80' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='45.148.234.198' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='212.119.46.111' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='195.133.24.218' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='83.142.55.37' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='194.87.112.182' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='193.93.195.206' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='194.156.125.92' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='212.119.46.82' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='45.140.206.107' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='45.66.208.145' reject"
$ sudo firewall-cmd --reload
Almost all of them seem to come from the same sources (usually the whois.com results differ from one another).
So the doubt arises: am I blocking something that is part of Google, like Lighthouse, PageSpeed, or some other Google scanner? Or are these just the IPs of scammers who pretend to be Google in order to hack my server or clone my sites?
Could it be that Google-related services claim to be Google in HTTP_USER_AGENT, yet show no link to Google when I check the IP's ownership on whois.com? (Even though Google itself says that the only way to verify that an IP belongs to them is to check its ownership, for example with a reverse DNS lookup?)
Can you help me understand who they are, and what I should do with these IPs?
Thanks
----------------------update------------------
We may be drifting a little off topic, perhaps because some information is missing.
I manually check every single IP that claims to be Google, and if it is not Google (according to DNS) I block it with the firewall. I don't use an automatic reverse DNS lookup, because it is too slow and heavy.
I use fail2ban with many filters that block a lot of scam IPs; these are the ones that survive my fail2ban filters.
I don't want to block or hide my content. I want it to be seen and understood by search engines, but I would rather not have entire sites cloned.
The "fake Google" IPs that I usually find on my server are behind SQL injection attempts and advertising comments posted on my sites, and unfortunately in the past I have also found entire sites cloned (I later added some tricks that prevent cloning).
Further checks show that these new "fake Google" IPs all visit the same site on my server, each one a different page, and each enters my server only once.
Only once: I think that's why they manage to survive my fail2ban.
Everything points to a cloning attempt, even if it is strange that it is done with so many IPs (no?).
But what puzzles me most is that the intrusions by these "fake Google" IPs have increased sharply together with the scans from the "real Google" IPs.
This makes me think they are somehow connected, even though I can't find a connection between the real and fake Google IPs.
The help I would like is to understand who they actually are and, if they are just scammers, how to block them.
These are today's IPs:
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='85.202.194.0/24' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='85.202.194.214' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='62.76.232.248' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='212.193.13.168' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='194.104.11.51' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='194.58.68.26' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='5.133.123.198' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='45.140.207.224' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='212.192.28.155' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='83.142.52.178' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='185.89.100.216' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='193.93.192.236' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='195.133.24.190' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='45.140.206.79' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='185.68.184.215' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='185.250.46.119' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='31.40.249.208' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='185.233.187.174' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='37.44.253.226' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='93.177.118.162' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='37.44.253.229' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='94.158.22.50' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='83.171.226.158' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='185.250.45.51' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='212.193.15.194' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='194.99.26.102' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='83.142.55.48' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='185.14.194.78' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='77.220.193.69' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='193.203.10.224' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='193.202.82.135' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='193.31.126.107' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='83.142.54.110' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='77.220.194.114' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='193.151.189.82' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='212.60.21.72' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='195.133.24.68' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='185.250.45.43' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='185.88.100.212' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='212.193.14.102' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='45.80.104.253' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='194.58.68.204' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='194.58.33.128' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='185.68.247.216' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='83.171.227.121' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='212.192.28.169' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='77.243.91.234' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='194.87.116.145' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='194.58.33.192' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='45.80.104.160' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='91.222.239.251' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='194.87.52.175' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='185.68.185.126' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='193.31.126.78' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='212.119.46.226' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='141.98.87.44' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='194.99.26.51' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='31.40.248.142' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='212.193.14.15' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='83.171.252.202' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='193.124.9.221' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='194.58.34.69' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='83.142.52.224' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='77.220.193.239' reject"
$ sudo firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='185.233.187.46' reject"
It is not hard to verify whether a Googlebot is fake. Google suggests using a reverse plus forward DNS lookup for this (see Verifying Googlebot).
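As a rough illustration of that reverse plus forward check, here is a minimal Python sketch. The function name `is_real_googlebot`, the injectable `reverse`/`forward` parameters, and the suffix list are my own assumptions for the example; check the accepted host name suffixes against Google's current Verifying Googlebot page. Note that `socket.gethostbyname_ex` only handles IPv4.

```python
import socket

# Assumed list of host name suffixes; confirm it against Google's documentation.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Return True only if ip passes the reverse + forward DNS check."""
    try:
        hostname = reverse(ip)[0]         # reverse lookup: IP -> host name
    except OSError:                       # no PTR record, timeout, etc.
        return False
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False                      # PTR does not point at Google
    try:
        addresses = forward(hostname)[2]  # forward lookup: host name -> IPs
    except OSError:
        return False
    return ip in addresses                # the name must map back to the IP
```

Calling `is_real_googlebot("193.203.11.230")` performs two live DNS queries, so it can block for as long as the resolver takes to answer or time out.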
The catch is that you can easily run into issues like fail2ban #2951, where a DNS lookup takes a very long time (or even fails with a timeout).
So if you want to implement some kind of banning for fake Googlebots, it would be better to set up a cache, either in a local DNS service or at script level, e.g. in fail2ban (or whatever tool you use), to avoid long hangs, especially when there are a lot of requests from them.
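A script-level cache could look roughly like this in Python. This is only a sketch: `VerdictCache`, the `verify` callable (whatever slow validator you use), and the 24-hour default lifetime are assumptions made for the example, not part of fail2ban.

```python
import time


class VerdictCache:
    """Remember per-IP verification results for a limited time."""

    def __init__(self, verify, ttl_seconds=86400, clock=time.monotonic):
        self._verify = verify    # slow check, e.g. a DNS-based validator
        self._ttl = ttl_seconds  # how long a verdict stays valid
        self._clock = clock      # injectable so the cache can be tested
        self._entries = {}       # ip -> (verdict, expiry time)

    def check(self, ip):
        now = self._clock()
        entry = self._entries.get(ip)
        if entry is not None and entry[1] > now:
            return entry[0]      # fresh cached verdict, no lookup needed
        verdict = self._verify(ip)
        self._entries[ip] = (verdict, now + self._ttl)
        return verdict
```

With this, the slow lookup runs at most once per IP per lifetime, so repeated requests from the same address no longer wait on DNS.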
Another way would be to rate-limit those agents (unless validation has whitelisted the IP as a real Googlebot).
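To make the rate-limiting idea concrete, here is a small token-bucket sketch in Python. The class name `AgentRateLimiter`, its parameters, and the `whitelist` set are hypothetical. It uses one shared bucket for every unverified client claiming to be Googlebot, because the fake IPs in the question each appear only once and a per-IP limit would never trigger.

```python
import time


class AgentRateLimiter:
    """Shared token bucket for unverified clients claiming one user agent."""

    def __init__(self, rate_per_second, burst, clock=time.monotonic):
        self._rate = rate_per_second  # tokens added per second
        self._burst = burst           # maximum number of stored tokens
        self._clock = clock           # injectable so the limiter can be tested
        self._tokens = float(burst)
        self._last = clock()
        self.whitelist = set()        # IPs already validated as real Googlebot

    def allow(self, ip):
        if ip in self.whitelist:
            return True               # validated crawlers are never throttled
        now = self._clock()
        # Refill in proportion to the elapsed time, capped at the burst size.
        self._tokens = min(self._burst,
                           self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False                  # over the limit: reject or delay
```

A request handler would call `allow(ip)` only for requests whose user agent contains Googlebot, and add an IP to `whitelist` once it has passed validation.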
You have to define what you want to protect. Create scenarios and make clear what is important and what is not.
If you need to protect your site from copying, you have 2 options:
Use Google Search Console to find out when Google cannot index your site, and debug the issue.