RE: Possible DOS against search engines?
From: Rob Shein (shoten@starpower.net)
Date: 02/04/03
- Previous message: Philip Stoev: "Possible DOS against search engines?"
- In reply to: Philip Stoev: "Possible DOS against search engines?"
- Next in thread: jasonk: "RE: Possible DOS against search engines?"
- Reply: jasonk: "RE: Possible DOS against search engines?"
From: "Rob Shein" <shoten@starpower.net> To: "'Philip Stoev'" <philip@stoev.org>, <vuln-dev@securityfocus.com> Date: Mon, 3 Feb 2003 18:45:00 -0500
I see a few problems here. I've listed them below each point, for
clarity, and I'm assuming a decent web crawler.
>
> 1. You create a generator for fake web pages, whose purpose
> is to spit out HTML containing a huge amount of (pseudo)
> random _non-existing_ words, as well as links to other pages
> within the generator;
I doubt this would make even a slight dent in things. Web crawlers already
walk the entire Internet, with its many languages, enormous expanse, and
endless misspellings, so I think anything you could create would end up
being a drop in the bucket.
>
> 2. You place that generator somewhere and submit the URL to
> search engines for crawling;
>
> 3. The search engine then crawls the site, possibly reaching
> its pre-defined maximum crawling depth (or, if badly
> broken, crawling the site indefinitely, jumping from one freshly
> generated page to another);
But they don't crawl indefinitely. What do they do if they hit two sites
that link to each other? They notice this, and move on.
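
To illustrate the point, here is a minimal sketch of how a crawler might avoid looping. It is not any real engine's code: `fetch_links` is a hypothetical callable that returns the links on a page, and the depth limit and per-host page budget are assumed values. The visited set handles two sites that link to each other; the depth limit and host budget are what would stop a generator that serves a fresh URL every time.

```python
from collections import deque
from urllib.parse import urlsplit


def crawl(seed, fetch_links, max_depth=5, max_pages_per_host=1000):
    """Breadth-first crawl that cannot loop forever.

    fetch_links(url) is a hypothetical callable returning the URLs
    linked from the page at url. The limits are illustrative only.
    """
    seen = {seed}            # URLs already queued, so cycles are skipped
    pages_per_host = {}      # pages fetched so far, per host name
    queue = deque([(seed, 0)])
    crawled = []
    while queue:
        url, depth = queue.popleft()
        host = urlsplit(url).netloc
        if pages_per_host.get(host, 0) >= max_pages_per_host:
            continue         # host budget spent: stop feeding on this site
        pages_per_host[host] = pages_per_host.get(host, 0) + 1
        crawled.append(url)
        if depth >= max_depth:
            continue         # do not follow links below the depth limit
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return crawled
```

With two pages that only link to each other, the crawl ends after two fetches. With a generator that always returns a new link on the same host, it ends when the depth limit or the host budget is hit, whichever comes first.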
> 4. Upon adding the gathered words to the search engine's
> index, the index becomes heavily overloaded with the newly
> added words, as they are outside of the real-language words
> already present in the index. The following should be
> theoretically possible:
But who would search on them?
> - craft fake words so that they attack a specific hash
> function. Make a bunch of fakes that hash to the same value
> as a legitimate word in the English language. This will
> possibly impact the performance of search engines using that
> particular hash function when they try to look up the
> legitimate words that are being targeted.
This would be noticed by the search engine long before it became a real
problem, and it would be addressed. This is how they deal with many things,
including people who try to influence their ranking using various means.
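
As a rough sketch of how such a thing might get noticed (my own illustration, not how any particular engine does it): count the distinct words per hash bucket and flag the buckets that are far fuller than average. `toy_hash`, the bucket count, and the factor of 10 are all made up for the example; `toy_hash` is deliberately weak so that collisions are easy to show.

```python
from collections import defaultdict


def toy_hash(word):
    # Deliberately weak, illustrative hash: anagrams always collide.
    return sum(ord(c) for c in word)


def overloaded_buckets(words, hash_fn, n_buckets, factor=10):
    """Return {bucket: count} for buckets holding far more distinct
    words than the average bucket. factor is an assumed threshold."""
    sizes = defaultdict(int)
    for word in set(words):
        sizes[hash_fn(word) % n_buckets] += 1
    mean = sum(sizes.values()) / n_buckets
    limit = factor * max(mean, 1)   # never flag below `factor` entries
    return {b: n for b, n in sizes.items() if n > limit}
```

A bucket that suddenly holds hundreds of never-before-seen words next to one ordinary English word would stand out in a report like this well before lookups slowed down noticeably.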
> - craft fake words so that they unbalance a b-tree
> index, if one is used. I am not entirely sure, but it
> appears to me that it is possible to craft words in such a
> way as to alter the shape of the b-tree and thus impact the
> performance of the lookups where it is used.
>
> - craft fake words randomly so that the index just grows.
> To the best of my understanding, most search engines will
> index and retain keywords that are only seen on one web page
> in the entire Internet. However, I think the capacity of the
> search engines to keep track of such one-time non-English
> letter sequences is limited and can be eventually exhausted.
It is my belief that, again, they will notice the impact on their database
and quickly address the issue. What about a bit of code that drops a page
if more than 5% of the words in it are unique in the database?
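
Something along these lines, as a minimal sketch. I am assuming "unique in the database" means "not already present in the index", that `known_words` is a set standing in for the index vocabulary, and that splitting on ASCII letters is good enough for a demonstration; a real indexer would need proper tokenizing for other languages.

```python
import re


def should_drop(page_text, known_words, threshold=0.05):
    """Return True if too many of the page's distinct words are
    absent from the index vocabulary (known_words, a set).

    threshold=0.05 is the 5% figure suggested above.
    """
    # Distinct lowercase ASCII words on the page (illustrative tokenizer).
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    if not words:
        return False         # nothing to judge on an empty page
    unseen = sum(1 for w in words if w not in known_words)
    return unseen / len(words) > threshold
```

For example, with `known_words = {"the", "quick", "brown", "fox"}`, the page "the quick brown fox" is kept, while "the xqzv brown fox" is dropped because one of its four distinct words (25%) is new. The threshold would obviously need tuning so that pages full of legitimate new names or jargon are not thrown away.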
> If the above-mentioned things are feasible, then one can even
> construct a worm of some sort, that will auto-install such
> fake page generators on valid sites, thus increasing the
> traffic to the crawler even more. Writing a short Apache
> handler meant to be silently installed in httpd.conf at
> root-kit installation should not be that difficult. When was
> the last time you reviewed the module list of your Apache?
> Will you spot a malicious module if it is called
> mod_ip_vhost_alias, loaded in between two other modules that
> you never knew were vital or not?
No, but I'd notice an abrupt lack of space on my web server. And the sudden
oddly-named URLs in my logs. And the corresponding oddly-named pages on my
site. And if I didn't notice, my hosting provider would.
> Please note that the setup described differs from the
> practice of generating fake pages containing a lot of real
> (mostly adult) keywords. After all, such real-language words
> already exist in the index, whereas I suggest bombing the
> index with a huge number of not-previously-existing
> freshly-generated random letter sequences. Also, please note
> that the purpose of the attack is to damage the index, and
> not to make the crawler consume bandwidth by going in an
> endless loop or something like that (though, the crawler has
> to scan the pages first so that the generated keywords are
> ultimately delivered to the index).
>
> I will appreciate any and all thoughts on the issue.
>
> Philip Stoev
>