They are more accessible here on HN, though: crawlers that obey robots.txt will not see it on git.savannah.gnu.org, but they will see it here.


I suspect that if a crawler is being used to farm email addresses for spamming, it's highly unlikely that robots.txt would be any deterrent whatsoever.


Spammers' crawlers use URLs obtained from search engines and public sources. If the whole directory is blocked in robots.txt, it WILL reduce crawling activity massively.
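Both comments depend on how robots.txt works. A compliant crawler checks each candidate URL against the site's rules before fetching it and skips anything disallowed. Nothing enforces this, so a crawler that never checks is unaffected. The sketch below uses Python's standard `urllib.robotparser` to show that check. The host `example.org`, the path `/blocked-dir/`, and the agent name `ExampleBot` are hypothetical placeholders, not the actual rules on git.savannah.gnu.org.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one whole directory for every user agent.
# The path is illustrative only; it is not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blocked-dir/
"""


def crawlable(robots_txt: str, urls: list[str], agent: str = "ExampleBot") -> list[str]:
    """Return only the URLs a robots.txt-compliant crawler would fetch."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    # can_fetch() applies the rules for this agent to each URL's path.
    return [u for u in urls if parser.can_fetch(agent, u)]


if __name__ == "__main__":
    urls = [
        "https://example.org/blocked-dir/ChangeLog",  # under the disallowed directory
        "https://example.org/public/index.html",      # not covered by any rule
    ]
    print(crawlable(ROBOTS_TXT, urls))  # prints ['https://example.org/public/index.html']
```

The filtering happens entirely on the crawler's side. That is why blocking a directory can reduce traffic from crawlers that honor the file, such as search engines, while a harvester that ignores it still sees everything.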