Re: IIS Logfile

From: Miles Stevenson (miles_at_mstevenson.org)
Date: 10/26/04

    To: security-basics@securityfocus.com
    Date: Tue, 26 Oct 2004 01:27:26 -0400
    
    
    

    Hello mfernandez,

    <snip>
    > 2004-10-25 04:16:46 64.246.165.10 - W3SVC1 FILESERVER xxx.xxx.xxx.xxx 80
    > GET /robots.txt - 401 5 0 www.mydomain.com SurveyBot/2.3+(Whois+Source)
    > http://www.whois.sc/
    > 2004-10-25 04:16:46 64.246.165.10 - W3SVC1 FILESERVER xxx.xxx.xxx.xxx 80
    > GET / - 401 5 0 www.mydomain.com SurveyBot/2.3+(Whois+Source)
    > http://www.whois.sc/mydomain.com
    <snip>

    First I'll explain what this means, then I'll answer your questions:

    These log entries were generated by a script identifying itself as
    "SurveyBot/2.3+". Scripts like this are web "crawlers" or "robots". Basically,
    they follow links throughout the internet, grabbing information, and indexing
    the information for search engine use. Expect to be hit regularly by the
    bigger search engines such as Google and Yahoo, especially if your site is
    large and popular. If you want to find out more about this particular bot,
    then Google is your friend (and mine too:
    http://www.whois.sc/info/webmasters/surveybot.html).
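
    If you want to pull entries like these apart yourself, here is a rough
    Python sketch. Note that the field names are my assumption: the quoted log
    does not include its "#Fields:" header, so the order below is inferred from
    the sample entry and may not match your server's logging configuration.

```python
# Field names are an assumption: the post does not show the "#Fields:" header,
# so this order is inferred from the sample entry (17 space-separated values).
FIELDS = [
    "date", "time", "c-ip", "cs-username", "s-sitename", "s-computername",
    "s-ip", "s-port", "cs-method", "cs-uri-stem", "cs-uri-query",
    "sc-status", "sc-substatus", "sc-win32-status", "cs-host",
    "cs(User-Agent)", "cs(Referer)",
]


def parse_entry(line):
    """Split one log entry into a dict keyed by the assumed field names."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


# The first entry from the quoted log, joined back onto one line.
SAMPLE = (
    "2004-10-25 04:16:46 64.246.165.10 - W3SVC1 FILESERVER xxx.xxx.xxx.xxx 80 "
    "GET /robots.txt - 401 5 0 www.mydomain.com "
    "SurveyBot/2.3+(Whois+Source) http://www.whois.sc/"
)

entry = parse_entry(SAMPLE)
print(entry["c-ip"], entry["cs-uri-stem"], entry["cs(User-Agent)"])
```

    Splitting on whitespace works here because IIS writes "+" in place of
    spaces inside a field, which is why the user agent shows up as
    "SurveyBot/2.3+(Whois+Source)".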

    You will notice that the first file it requests is "robots.txt". This means
    that this particular crawler (SurveyBot) is playing nice and behaving the
    way it should. The "robots.txt" file is the standard way for website
    administrators to tell robots which pages they do NOT want indexed. For
    example, if you didn't want "network-info.html" to end up in search
    engines, you would add a "Disallow" line for it to your "robots.txt", and
    the crawlers that behave nicely will skip your network-info.html page.
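
    To make that concrete, here is a small sketch of the check a well-behaved
    crawler performs, using Python's standard urllib.robotparser module. The
    robots.txt contents and the "/index.html" path are made up for
    illustration; only "network-info.html" comes from the example above.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that asks every crawler to skip one page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /network-info.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler runs a check like this before requesting a URL.
print(parser.can_fetch("SurveyBot", "/network-info.html"))  # False
print(parser.can_fetch("SurveyBot", "/index.html"))         # True
```

    Nothing enforces this check. It is entirely up to the crawler to run it
    and respect the answer.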

    Take note, though, that the bad guys will NOT honor this (duh!). In fact,
    the bad guys know that any URLs you put in your "robots.txt" file are pages
    that you don't want lots of people to see, which is exactly why the bad guys
    want to see them. Lots of "newbie" blackhats will scan for robots.txt files,
    looking for interesting web pages (there are lots of automated scripts that
    do this for the "kiddies"). Of course, every skilled administrator should
    know better than to put sensitive material on publicly accessible web pages,
    robots.txt or no robots.txt!

    Moral of the story: If you don't want people to see it, don't make it public,
    and you won't need to worry about it in the first place.

    Here is a fun trick that my company uses (I wish I could take credit for the
    idea but I can't) and finds very effective: use the robots.txt concept as a
    "honeytoken". Here is what you do:

    1. Set up a dummy html page publicly accessible on your site, and give it an
    interesting but hard to guess filename, such as "admin44687-secret.html". You
    don't even need to put any info on the page. You can just leave it blank. But
    you should have it call a script (we'll get to that script in step 3).

    2. Add this page to your robots.txt file. You now know that anyone who accesses
    "admin44687-secret.html" is trying to look at something they KNOW they are
    not supposed to. There are NO false positives here (hence "honeytoken").
    Anyone who accesses this is BAD. Period. Valid web-crawlers will ignore this
    page since you listed it in robots.txt. When the kiddies DO go to this page,
    your script is called:

    3. Your magic script gets the bad guy's source IP address, and automatically
    adds the IP to a temporary "blacklist". Maybe he gets blocked at your
    firewall for a week, a month, whatever you want (although it sounds like you
    are a Windows shop, which limits your flexibility quite a bit. I wouldn't
    even begin to know how to accomplish this with Windows, anyone else on the
    list care to make a suggestion?). You could even have your fake "admin" page
    display a message along the lines of:

     "You are now blocked from our site. If you were just screwing around and
    don't want to be blocked, send us an email and we MIGHT let you back in by
    our good graces."
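
    The three steps above could be sketched roughly like this in Python, as a
    tiny WSGI application. This is only an illustration under some assumptions,
    not the setup my company runs: the in-memory dict stands in for whatever
    firewall or blocklist you would really update, and the one-week duration
    and the names (blocklist, is_blocked, app) are made up for the example.

```python
import time

HONEYTOKEN_PATH = "/admin44687-secret.html"   # filename from step 1
BLOCK_SECONDS = 7 * 24 * 3600                 # one week; pick any duration

BLOCK_MESSAGE = b"You are now blocked from our site."

# Maps client IP -> time (seconds since the epoch) at which the block expires.
# A real deployment would push this to the firewall; a dict keeps the sketch
# self-contained.
blocklist = {}


def is_blocked(ip, now=None):
    """Return True while the IP has an unexpired blocklist entry."""
    now = time.time() if now is None else now
    expires = blocklist.get(ip)
    if expires is None:
        return False
    if expires <= now:
        del blocklist[ip]      # block has run out, so forget the address
        return False
    return True


def app(environ, start_response):
    """WSGI application: trip the honeytoken, refuse blocked clients."""
    ip = environ.get("REMOTE_ADDR", "")
    path = environ.get("PATH_INFO", "")

    # Step 3: anyone requesting the honeytoken page goes on the blocklist.
    if path == HONEYTOKEN_PATH:
        blocklist[ip] = time.time() + BLOCK_SECONDS

    if is_blocked(ip):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [BLOCK_MESSAGE]

    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Welcome.\n"]
```

    To try it locally you can serve "app" with wsgiref.simple_server from the
    standard library. Keep in mind that behind a proxy REMOTE_ADDR is the
    proxy's address, so you would block the proxy rather than the visitor.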

    This is fun stuff!

    And now for your question (which you can probably answer yourself by now if
    you've been paying attention):

    > I understand that some "whois" site is checking my server, but Is this
    > dangerous? Should I block this IP?

    Dangerous: No, not really. Not unless you are actually putting VALID pages in
    your robots.txt file that you really DON'T want others to see. Remember, you
    shouldn't be doing this. If you don't want people to see it, don't put it on
    the Internet! Otherwise, treat this as normal web traffic.

    Should you block the IPs? No, not really. Most of these are valid web
    crawlers like the Googlebot. You DO want people to be able to find your site
    via Google and Yahoo and all the others, don't you? Again, if you WANT to get
    fancy and set up a "honeytoken" with this, then it can be a lot of fun. But
    it seems to me that this would be difficult or near impossible on Windows
    platforms. And while fun, this is definitely NOT a "necessary" defense tool.
    This is more like "icing on the cake", a cake made out of a very solid
    foundation of effective security measures. Concentrate on the fundamental
    stuff first, like strong firewall filters, good network design, system
    hardening, patching, anti-virus, and all the other REALLY important stuff
    that really boring and geeky security people (like me) keep trying to drill
    into the public. Don't get fancy until you are really good at the fundamental
    stuff, because this is where your biggest "bang for your buck" is. The fancy
    stuff on top brings much smaller gains.

    Have fun.

    -- 
    Miles Stevenson
    miles@mstevenson.org
    PGP FP: 035F 7D40 44A9 28FA 7453 BDF4 329F 889D 767D 2F63
    
    


