I have found out that McAfee SiteAdvisor has reported that my website "may be having security issues".
I care little about whatever McAfee thinks of my website (I can secure it myself, and if I can't, McAfee is definitely not the company I'd be asking for help, thank you very much). What bothers me, though, is that they have apparently crawled my website without my permission.
To clarify: there's almost no content on my website yet, just a placeholder page and some files for my personal use. There are no ToS.
My question is: does McAfee have the right to download content from / crawl my website? Can I forbid them from doing so? I have a feeling there should be some kind of "my castle, my rules" principle, but I basically know nothing about the legal side of this.
Update: I probably should have mentioned that my server provider sends me emails about SiteAdvisor's findings on a regular basis - that's how I found out about their 'rating', and that's why I'm annoyed.
Yes, they have the right to do so - you've created a public website; what makes you think they don't?
You too, of course, have the right to stop them: you can ask them not to crawl your website with robots.txt, or actively prevent them from accessing it with something like fail2ban.
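For instance, a minimal robots.txt sketch - note that the user-agent string for SiteAdvisor's crawler is an assumption here; check your access logs for the name their bot actually sends:

    # Ask one specific crawler to stay away entirely.
    # "SiteAdvisor" is a guess at the bot's user-agent -
    # verify it against your server logs before relying on it.
    User-agent: SiteAdvisor
    Disallow: /

    # Everyone else may crawl normally.
    User-agent: *
    Disallow:

Keep in mind that robots.txt is purely advisory - a crawler is free to ignore it, which is why an enforcement tool like fail2ban is the stronger option.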
Alternatively, don't worry about it and continue on with your life. It's not hurting anything and is definitely on the benign side of Internet probing.
There is legal precedent for this: Field v. Google Inc., 412 F. Supp. 2d 1106 (U.S. Dist. Ct. Nevada 2006). Google won summary judgment based on several factors, most notably that the author did not utilize a robots.txt file or the no-archive meta tag on his website, either of which would have prevented Google from crawling and caching pages the website owner did not want indexed.
Ruling (PDF)
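For reference, the preventive measure at issue in that case looks roughly like this - a generic sketch, not language from the ruling itself:

    <!-- In each page's <head>: ask crawlers not to index or cache it -->
    <meta name="robots" content="noindex, noarchive">

Together with a robots.txt file, this is the sort of signal the court found the plaintiff had chosen not to use.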
There is no U.S. law specifically dealing with robots.txt files; however, another court case has set some precedent that could eventually lead to a robots.txt file being treated as an intentional electronic measure taken to protect content, so that ignoring one counts as circumvention. In Healthcare Advocates, Inc. v. Harding, Earley, Follmer & Frailey, et al., Healthcare Advocates argued that Harding et al. essentially hacked the capabilities of the Wayback Machine in order to gain access to cached copies of pages whose current versions were protected by robots.txt files. While Healthcare Advocates lost this case, the District Court noted that the problem was not that Harding et al. "picked the lock", but that they gained access to the files because of a server-load problem with the Wayback Machine, which granted access to the cached files when it shouldn't have - and therefore there was "no lock to pick."
Court ruling (PDF)
It is only a matter of time, IMHO, until someone takes this ruling and turns it around: the court's reasoning implies that a robots.txt file is a lock that prevents crawling, and that circumventing it is picking the lock.
Many of these lawsuits, unfortunately, aren't as simple as "I tried to tell your crawler that it is not allowed here, and your crawler ignored those settings/commands." There are a host of other issues in all these cases that ultimately affect the outcome more than the core question of whether or not a robots.txt file should be considered an electronic protection measure under US DMCA law.
That having been said, this is US law, and someone in China can do what they want - not because of the legal issue, but because China won't enforce US trademark and copyright protection, so good luck going after them.
Not a short answer, but there really isn't a short, simple answer to your question!
Whether this behaviour is ethical isn't perfectly clear-cut.
The act of crawling a public site is, itself, not unethical (unless you've forbidden it explicitly using a robots.txt or other technological measures, and they're circumventing them).
What they are doing is the rough equivalent of cold-calling you while announcing to the world that you are possibly not secure. If that damages your reputation and is unjustified, it's unethical; if it does that and the only resolution involves you paying them, it's racketeering. But I don't think that is what is going on here.
The other time this becomes unethical is when someone crawls your site to appropriate your content or data and then represents it as their own. But, that too isn't what is going on.
So, I suggest that their behaviour in this case is ethical, and you can also most likely ignore it.
Their related behaviour of spamming you is unethical if you have no relationship with them and didn't request the emails, but I suspect they have a working unsubscribe link.
A technical approach to blocking certain people or companies from accessing your website:
You can block specific IP addresses, or ranges of addresses, from accessing the pages of your site. This is done in the .htaccess file (if your site is running on the Apache web server).
http://www.htaccess-guide.com/deny-visitors-by-ip-address/
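For example, a minimal .htaccess sketch - the IP addresses below are documentation placeholders, not actual McAfee ranges:

    # Apache 2.2 (mod_access) syntax: allow everyone, then deny
    # specific addresses. These addresses are placeholders -
    # substitute the ones you find in your logs.
    Order Allow,Deny
    Allow from all
    Deny from 192.0.2.15
    Deny from 198.51.100.0/24

    # Apache 2.4 (mod_authz_core) equivalent:
    # <RequireAll>
    #     Require all granted
    #     Require not ip 192.0.2.15 198.51.100.0/24
    # </RequireAll>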
Have your web server log the IP addresses it is accessed from, and look up those addresses to find the ones associated with McAfee. That's probably easy to do now, while you don't have many regular visitors.
Of course, they might change IP addresses in the future. Still, if you look up the owners of the addresses you find, you might discover a whole block of addresses owned by McAfee and block them all.
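As a sketch, ownership of a logged address can be checked with whois (the address below is a documentation placeholder, not a real McAfee IP):

    # Look up who owns an address seen in your logs. On ARIN-managed
    # addresses, the OrgName and NetRange/CIDR fields reveal the
    # owning company and the full block it controls.
    whois 192.0.2.15 | grep -iE 'orgname|netrange|cidr'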
For a legal basis for doing so:
"Website owners can legally block some users, court rules"
http://www.computerworld.com/s/article/9241730/Website_owners_can_legally_block_some_users_court_rules
(If your website is a personal one, no one would contest your right to block some users. But if it is a website for a business, there are legal and moral arguments on both sides of that discussion. The smaller your business, the easier it is to be legally protected - and the less anyone else would care enough to complain anyway.)
You might also be interested in "Deny visitors by referrer".
http://www.htaccess-guide.com/deny-visitors-by-referrer/
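A sketch of that technique, with a placeholder domain standing in for whatever referrer you want to block:

    # Refuse (403 Forbidden) any request whose Referer header
    # matches the given domain. "example.com" is a placeholder.
    # Requires mod_rewrite to be enabled.
    RewriteEngine On
    RewriteCond %{HTTP_REFERER} example\.com [NC]
    RewriteRule .* - [F]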