Re: robots.txt: Good, Bad, Ugly?
- From: Juha Laiho <Juha.Laiho@xxxxxx>
- Date: Thu, 21 Sep 2006 15:37:03 GMT
caveman@xxxxxxxxxxxxxxxxx said:
> Do civilized modern web crawlers still use robots.txt? In a recent
> debate a friend was suggesting that robots.txt should not be used
> anymore since there are other means of authorizing/restricting access
> to a web site. After the debate I was left with the impression that
> only malicious individuals seek the contents of robots.txt.
Well, robots.txt was never meant to restrict access; it is just
a hint to the crawlers that a certain part of the site does not
contain material worth indexing (whatever that might be on a given
site).
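
To make the "hint" idea concrete, here is a rough sketch of the check a well-behaved crawler might run before fetching a page, using Python's standard urllib.robotparser module. The robots.txt content, the "ExampleBot" agent name and the example.com URLs are made up for illustration. Note that nothing here is enforced by the server; the crawler simply chooses to obey.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site (not from any real host).
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/
Disallow: /cgi-bin/
"""


def may_crawl(robots_txt, agent, url):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/tmp/scratch.html"):
        print(url, may_crawl(ROBOTS_TXT, "ExampleBot", url))
```

A crawler that skips this check can still request /tmp/scratch.html; the server will serve it unless some real access control says otherwise.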
As was said in the other response, "good" crawlers still honor it;
"bad" ones apparently either just ignore it, or use it in an attempt
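to find non-public information. The latter is not a problem as long
as you don't try to use robots.txt as a mechanism to protect
content: the file is itself publicly readable, so anything it lists
has to be protected by real access control on the server.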
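
As a small illustration of why robots.txt cannot protect anything, the sketch below shows how little work it takes to pull the Disallow entries out of the file. The sample content and path names are invented for the example, and the parsing is deliberately simplified (it ignores which User-agent group a line belongs to).

```python
# Hypothetical robots.txt content; the paths are invented for illustration.
SAMPLE = """\
# keep crawlers out of these
User-agent: *
Disallow: /private-reports/
Disallow: /old-admin/   # legacy
Disallow:
"""


def disallowed_paths(robots_txt):
    """Collect every non-empty Disallow value, regardless of user-agent group."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    print(disallowed_paths(SAMPLE))
```

Anyone can fetch the file and do the same, so a path listed there should be treated as advertised rather than hidden.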
--
Wolf a.k.a. Juha Laiho Espoo, Finland
(GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
"...cancel my subscription to the resurrection!" (Jim Morrison)