Re: robots.txt: Good, Bad, Ugly?



caveman@xxxxxxxxxxxxxxxxx said:
Do civilized modern web crawlers still use robots.txt? In a recent
debate a friend was suggesting that robots.txt should not be used
anymore since there are other means of authorizing/restricting access
to a web site. After the debate I was left with the impression that
only malicious individuals seek the contents of robots.txt.

Well, robots.txt was never meant to restrict access; it is just
a hint to crawlers that certain parts of the site do not contain
material worth indexing (whatever that might be on a given site).

As was said in the other response, "good" crawlers still honor it;
"bad" ones apparently either ignore it or read it in an attempt to
find non-public information. The latter is not a problem as long
as you don't try to use robots.txt as a mechanism for protecting
content: the file is publicly readable, so anything it lists is
visible to everyone.
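
For illustration, here is roughly what a "good" crawler does before
fetching a page. This is only a minimal sketch using Python's standard
urllib.robotparser module; the robots.txt content, the "ExampleBot"
user agent and the www.example.com URLs are all made up for the
example. Note that the check is entirely voluntary on the crawler's
side, and nothing on the server enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/
Disallow: /cgi-bin/
"""


def polite_can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/tmp/scratch.html",
    ):
        # A well-behaved crawler skips URLs for which this prints False.
        print(url, polite_can_fetch(ROBOTS_TXT, "ExampleBot", url))
```

A crawler that ignores the file simply never makes this call, and
anyone can read the Disallow lines to see which paths the site owner
would rather not have indexed.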
--
Wolf a.k.a. Juha Laiho Espoo, Finland
(GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
"...cancel my subscription to the resurrection!" (Jim Morrison)