I installed Apache a while ago, and a quick look at my access.log shows all sorts of unknown IPs connecting, mostly getting status codes 403, 404, 400 and 408. I have no idea how they're finding my IP, because I only use the server for personal use, and I added a robots.txt hoping it would keep search engines away. I block directory indexes and there's nothing really important on it.
How are these bots (or people) finding the server? Is it common for this to happen? Are these connections dangerous, and what can I do about them?
Also, lots of the IPs come from all sorts of countries and don't resolve to a hostname.
Here's a bunch of examples of what comes through:
In one large sweep, this bot tried to find phpMyAdmin:
"GET /w00tw00t.at.blackhats.romanian.anti-sec:) HTTP/1.1" 403 243 "-" "ZmEu"
"GET /3rdparty/phpMyAdmin/scripts/setup.php HTTP/1.1" 404 235 "-" "ZmEu"
"GET /admin/mysql/scripts/setup.php HTTP/1.1" 404 227 "-" "ZmEu"
"GET /admin/phpmyadmin/scripts/setup.php HTTP/1.1" 404 232 "-" "ZmEu"
I get plenty of these:
"HEAD / HTTP/1.0" 403 - "-" "-"
Lots of "proxyheader.php"; I get quite a few requests with http:// links in the GET:
"GET http://www.tosunmail.com/proxyheader.php HTTP/1.1" 404 213 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
"CONNECT"
"CONNECT 213.92.8.7:31204 HTTP/1.0" 403 - "-" "-"
"soapCaller.bs"
"GET /user/soapCaller.bs HTTP/1.1" 404 216 "-" "Morfeus Fucking Scanner"
and this really sketchy hex crap..
"\xad\r<\xc8\xda\\\x17Y\xc0@\xd7J\x8f\xf9\xb9\xc6x\ru#<\xea\x1ex\xdc\xb0\xfa\x0c7f(" 400 226 "-" "-"
empty
"-" 408 - "-" "-"
That's just the gist of it. I get all sorts of junk, even with win95 user-agents.
Thanks.
Welcome to the internet :)
These are just people trying to find vulnerabilities in servers. The scans are almost certainly run from compromised machines.
It'll just be people scanning certain IP ranges -- you can see from the phpMyAdmin one that it is trying to find a badly secured, pre-install version of PMA. Once it has found one, it can get a surprising amount of access to the system.
Ensure that your system is kept up to date, and that you don't run any services that aren't required.
These are robots scanning for known security exploits. They simply scan entire network ranges and will therefore find unadvertised servers like yours. They're not playing nice and don't care about your robots.txt. If they find a vulnerability, they'll either log it (and you can expect a manual attack shortly) or will automatically infect your machine with a rootkit or similar malware. There is very little you can do about this and it's just normal business on the internet. They are the reason why it's important to always have the latest security fixes for your software installed.
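To get a feel for how much of your log is this kind of scanning, you can tally requests by status code and user agent. The following is only a rough sketch: it assumes log lines end in the request / status / size / referer / user-agent layout shown in the question, and the names `LINE_RE`, `summarize` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Matches the tail of a log line in the layout shown in the question:
# "request" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?P<request>.*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$'
)


def summarize(lines):
    """Count requests per status code and per user agent."""
    by_status = Counter()
    by_agent = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        by_status[m.group("status")] += 1
        by_agent[m.group("agent")] += 1
    return by_status, by_agent


# A few of the entries quoted in the question, used as sample input.
SAMPLE = [
    '"GET /admin/mysql/scripts/setup.php HTTP/1.1" 404 227 "-" "ZmEu"',
    '"GET /admin/phpmyadmin/scripts/setup.php HTTP/1.1" 404 232 "-" "ZmEu"',
    '"HEAD / HTTP/1.0" 403 - "-" "-"',
    '"CONNECT 213.92.8.7:31204 HTTP/1.0" 403 - "-" "-"',
    '"-" 408 - "-" "-"',
]

if __name__ == "__main__":
    by_status, by_agent = summarize(SAMPLE)
    print(by_status.most_common())
    print(by_agent.most_common())
```

To run it against a real log, pass an open file object to `summarize` instead of `SAMPLE`. A long list of 403/404 responses to agents like "ZmEu" mostly tells you the scanners are not finding anything.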
As others have noted, they are likely doing brute-force scanning. If you are on a dynamic IP address, they might be more likely to scan your address. (The following advice assumes Linux/UNIX, but most of it also applies to Windows servers.)
The easiest ways to block them are:
To limit the damage they can do to your system, make sure that the Apache process can only write to the directories and files that it should be able to change. In most cases the server only needs read access to the content it serves.
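One quick way to spot overly loose permissions is to look for anything under the document root that is world-writable. This is a minimal sketch, not a full audit: it only checks the "other" write bit, so it will not catch files that are writable because they are owned by the Apache user or its group. The function name `find_world_writable` and the default path `/var/www` are assumptions for illustration.

```python
import os
import stat


def find_world_writable(root):
    """Return paths under root whose mode grants write permission to 'other'."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # vanished or unreadable; skip it
            if stat.S_ISLNK(mode):
                continue  # a symlink's own mode bits are not meaningful here
            if mode & stat.S_IWOTH:
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    import sys
    # "/var/www" is only a placeholder default; pass your own DocumentRoot.
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/www"
    for path in find_world_writable(root):
        print(path)
```

Anything it prints is worth a second look; content that is only served should normally not need to be writable by the server at all.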
The internet is public space, thus the term public IP. You can't hide except by setting up some way to deny the public (VPN, ACL on a firewall, DirectAccess, etc.). These connections are dangerous, as eventually someone will be quicker at exploiting you than you are at patching. I would consider requiring some sort of authentication before responding.
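The idea behind an ACL is simply "answer only clients from networks I trust". In practice you would enforce this in the firewall or the web server rather than in a script, but the logic can be sketched as below. The networks in `ALLOWED_NETWORKS` and the function name `is_allowed` are hypothetical examples, not recommended values.

```python
import ipaddress

# Hypothetical allowlist; replace with the networks you actually trust.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.168.1.0/24"),   # e.g. a home LAN
    ipaddress.ip_network("203.0.113.7/32"),   # e.g. one known remote address
]


def is_allowed(client_ip, networks=ALLOWED_NETWORKS):
    """Return True only if client_ip parses and falls inside an allowed network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # anything that is not a valid IP address is denied
    return any(addr in net for net in networks)


if __name__ == "__main__":
    for ip in ("192.168.1.20", "213.92.8.7"):
        print(ip, is_allowed(ip))
```

The important design choice is default deny: an address is refused unless it matches an entry, rather than accepted unless it matches a blocklist.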