Re: Statistical Anomaly Analysis?
From: Xiaoyong Wu (xwu@anr.mcnc.org)
Date: 03/15/02
- Previous message: switched: "Re: Possibility to cheat integrity checking?"
- In reply to: Marcus J. Ranum: "Re: Statistical Anomaly Analysis?"
- Next in thread: Chad Schieken: "Re: Statistical Anomaly Analysis?"
Date: Fri, 15 Mar 2002 13:57:06 -0500 (EST)
From: Xiaoyong Wu <xwu@anr.mcnc.org>
To: "Marcus J. Ranum" <mjr@nfr.com>
On Fri, 15 Mar 2002, Marcus J. Ranum wrote:
<snip>
>
> It's probably a safe assumption that short term distribution is close to long
> term distribution but the question becomes "how short?" and also I would
> suggest that the effectiveness of such a measure depends largely on the
> size of the network whose traffic you're looking at. Within a smaller
> subnet or a network isolated behind a firewall, I suspect your traffic may
> remain rather similar over time but what about within a larger context
> or an unfiltered context? Vern Paxson's done some interesting work on
> statistical properties of networks ("critically examining criticality") and
> I think there's still too much to be learned before systems for
> statistical anomaly detection will be useful in the large scale.
Interesting. I thought that it would be harder for small scale networks.
For large scale networks, the aggregation of traffic should smooth the
variations and yield more statistically predictable distributions.
This is also the basic assumption behind Poisson packet generators.
It might not hold, though, given the growing discussion of self-similarity
in Internet traffic. I agree that the answer to the question "how short?"
varies from site to site. The system has to be trained for those
parameters as well.
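
To make the smoothing argument concrete, here is a minimal sketch under the
Poisson assumption only. The function name and the example rates are
hypothetical. The sum of independent Poisson sources is itself Poisson, so
the relative variation (standard deviation divided by mean) of the aggregate
shrinks as more sources are added:

```python
import math

def aggregate_cv(num_sources, mean_rate_per_source):
    """Coefficient of variation (std / mean) of the per-interval packet
    count when num_sources independent Poisson sources are aggregated.

    The sum of independent Poisson variables is Poisson, so the variance
    of the aggregate equals its mean.
    """
    total_mean = num_sources * mean_rate_per_source
    return math.sqrt(total_mean) / total_mean

# Relative variation for a small subnet vs. a large aggregate
# (hypothetical source counts, 1 packet per interval per source).
for n in (1, 100, 10000):
    print(n, aggregate_cv(n, 1.0))
```

This only holds if the sources really are independent and Poisson. For
self-similar traffic the variation does not shrink this quickly with
aggregation, which is exactly the caveat raised above.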
>
> >This technique was first invented in SRI's NIDES IDS and our group adopted
> >it to detect some intrusions or attacks against a routing protocol,
> >specifically, OSPF. It was quite effective, so to speak, but the false alarm
> >rate was also high. Currently, we are using this technique to detect
> >intrusions or misbehavior in QoS networks.
>
> This is a typical approach to statistical anomaly detection, and, to me,
> it isn't very interesting. (No offense intended!) When someone talks about
> doing statistical anomaly detection, one usually thinks "cool, we'll just
> model _all_ the traffic on the network and then look for stuff that's not
> normal." As you no doubt know, that doesn't work because _all_ the traffic
> doesn't fit into a convenient model. So you can look at modelling the various
> sub-types of traffic and then it's possible to get useful results in the smaller
> scale..... BUT....
>
> Here's why it's a yawner: basically you're just building an expert system
> that has some statistical signature engines at the leaf nodes of your
> decision tree. By deciding (based on convenience or expert knowledge)
> to do statistical anomaly detection on OSPF, you've narrowed the field
> down so far that it's probably like shooting fish in a barrel. I've also seen
> a lot of cases where what passes as "statistical anomaly detection" is
> based on so much a priori knowledge of the protocols involved that the
> statistical method used has been tailored to the point of being a pretty
> straightforward signature.
My experience gives me a slightly different perspective. All traffic, or
aggregated traffic, might be easier to model than a single sub-type of
traffic because of the smoothing effect, although both are difficult
problems. Well, yes, statistical anomaly analysis is a simplified version
of an expert system, or just a small piece for building one. As for the
OSPF case, the distribution of the different types of OSPF packets varies
greatly from one topology to another. As long as the topology stays
stable, the distribution will be more or less constant. It might not be
easy to tailor this into a straightforward signature, but a small expert
system would be enough to describe this property. Also, there doesn't seem
to be much IDS work on network infrastructure such as OSPF, as I mentioned,
or the more recent hot topic, BGP.
>
> Consider if we're doing "statistical anomaly detection" to detect a certain
> class of denial of service attacks.. Let's say we decide to build a statistical
> model of the normal relationship of SYN packets to FIN packets and RST
> packets. .. Then we discover that if SYN packets are more than 2 standard
> deviations from the norm, something is going wrong. Duh. That's shooting
> fish in a barrel.
Well, yes, this is the simplest possible case. But let's consider a
slightly different one. If we are building a statistical model for all
the IP packet types, it might not be straightforward to see which one
will deviate from the norm. So, when the barrel becomes big enough, the
fish become harder and harder to shoot. :)
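
As a rough sketch of what comparing a whole distribution against the norm
could look like, the function below computes a chi-square-style score between
a long-term profile and a short-term sample. This is an illustration only and
not necessarily the exact statistic NIDES uses. The function name, the
category labels, and the numbers are all hypothetical.

```python
def chi_square_deviation(long_term_freq, short_term_counts):
    """Chi-square-style distance between a long-term profile and a
    short-term sample.

    long_term_freq:    dict category -> expected fraction (sums to ~1)
    short_term_counts: dict category -> observed count in the recent window
    Categories missing from the long-term profile are ignored in this sketch.
    """
    total = sum(short_term_counts.get(c, 0) for c in long_term_freq)
    score = 0.0
    for category, fraction in long_term_freq.items():
        expected = fraction * total
        observed = short_term_counts.get(category, 0)
        if expected > 0:
            score += (observed - expected) ** 2 / expected
    return score

# Hypothetical long-term profile and two short-term windows.
profile = {"tcp": 0.80, "udp": 0.15, "icmp": 0.05}
normal_window = {"tcp": 800, "udp": 150, "icmp": 50}
odd_window = {"tcp": 600, "udp": 150, "icmp": 250}

print(chi_square_deviation(profile, normal_window))
print(chi_square_deviation(profile, odd_window))
```

A single score like this says that the mix has shifted, but not which
category is responsible or why. The per-category terms of the sum would have
to be inspected to find the fish in the barrel.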
>
> As you say, the alarm rate is also quite high. You either get too many
> false positives or too many false negatives. There was a recent article
> (some university's P.R. department...?) about someone who did a survey
> of some statistical IDS methods:
> http://unisci.com/stories/20021/0307023.htm
> http://abcnews.go.com/sections/scitech/CuttingEdge/cuttingedge020308.html
> http://www.decisionsciences.org/dsj/Vol32_4/32_4_635.htm
> then hyped the hell out of it.
>
> According to the article, the best they got was a ~75% accuracy rate.
> On a typical commercial IDS deployment you might get a few hundred alerts
> a day of which a handful are significant. Increasing the noise factor by
> 25% would render the system completely unmanageable...
A 75% accuracy rate is really not good enough for commercial products.
As far as I can see, the results are improving over time. But I have
some doubts about using false positive rates to evaluate an IDS. Consider
fire alarm systems, which are similar enough to an IDS. Most of the
alarms from a fire alarm system are false positives until there's a real
fire. But we cannot say that fire alarm systems are not effective.
>
> >Consider a company network environment: the percentages of HTTP, SMTP,
> >NNTP, and SNMP traffic should be statistically predictable over a long
> >period of time. Email worms/viruses such as Love Letter definitely
> >introduce a spike in SMTP traffic. Attacks against web servers
> >such as Code Red introduce a spike in HTTP traffic. The recent
> >attacks exploiting SNMP buffer overflows will cause a spike in SNMP
> >traffic. Taking a look at the statistical service port distribution or
> >protocol distribution will reveal some anomalies in the network. One
> >problem is that this technique won't be able to tell what exactly the
> >intrusion is.
>
> I'd almost call those "statistical signatures" - If you've decided in advance
> that load spikes in SMTP and HTTP are interesting events it's not
> exactly rocket science to fire an alarm when you see them!! In fact, that's a
> pretty reasonable happy medium between classical "statistical anomaly detection"
> and pure pattern matching signatures: identify floor and ceiling conditions on
> certain well-known areas and generate alerts if those are exceeded. It's also
> MUCH more useful because at that point you have _some_ idea of what
> the traffic _means_ - you can trigger an alert and say:
> "mail traffic is 2 standard deviations from the norm: be afraid!"
> but even that'd be less useful than an alert reading:
> "unusually large number of mail messages containing the attachment foozl.exe!"
> See - if you already know that mail volumes are an interesting signature, then
> cut to the chase and get right into that dataset and forget about doing the
> fancy adaptive statistics. ;)
Well, as I described above, I wouldn't decide in advance that load spikes
in SMTP or HTTP are the interesting events; rather, a load spike in any
protocol or towards any specific port is a possible alarm event. I have
always thought of signature based intrusion detection as pure pattern
matching, and I am not so sure about "statistical signatures". Perhaps
there is some point at which the two can be merged and combined. I
totally agree that statistical results are not intuitive or easily
comprehensible to humans. I wouldn't consider any IDS with only
statistical anomaly analysis, but statistical anomaly analysis can
provide a different viewpoint that complements other techniques.
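
A minimal sketch of the "spike in any protocol" idea, assuming per-interval
packet counts are already available for each protocol. The function name, the
data layout, and the sample numbers are hypothetical. The threshold of 2
standard deviations is simply the figure used in this thread, not a
recommended value.

```python
import statistics

def find_spikes(history, current, threshold=2.0):
    """Return {protocol: z_score} for every protocol whose current count
    exceeds its historical mean by more than `threshold` sample standard
    deviations.

    history: dict protocol -> list of past per-interval counts
    current: dict protocol -> count in the latest interval
    """
    spikes = {}
    for proto, past in history.items():
        if len(past) < 2:
            continue  # not enough training data for a standard deviation
        mean = statistics.mean(past)
        stdev = statistics.stdev(past)
        if stdev == 0:
            continue  # perfectly flat history; z-score is undefined
        z = (current.get(proto, 0) - mean) / stdev
        if z > threshold:
            spikes[proto] = z
    return spikes

# Hypothetical training data and one new interval.
history = {
    "smtp": [100, 110, 90, 105, 95],
    "http": [1000, 1020, 980, 1010, 990],
}
current = {"smtp": 400, "http": 1005}
print(find_spikes(history, current))  # only smtp is flagged here
```

No protocol is singled out in advance, so every category is watched equally.
The cost is the one already noted: the output says which protocol spiked, not
what the intrusion is.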
>
> The last problem you refer to is the really interesting one. Statistics don't
> "understand" anything, all they can do is describe relationships between
> things. So as your "statistical anomaly detection" system looks at increasingly
> broad parts of your data spectrum its ability to tell you what it's looking at
> becomes comparatively smaller. Suppose you looked at statistics about packet
> sizes - then you actually might flag a CODE RED attack, but it'd be nearly
> worthless information since all the IDS could do is tell you "there's an unusually
> large number of packets of size X - go figure out why!" - of course you can narrow
> it down to "there's an unusually large number of packets of size X on port 25!"
> but then you're walking right back down into building an expert system...
>
> My guess is that few commercial IDS customers would be excited by a product
> that sent them descriptive statistics about things they had to research manually.
> Few people have the time and inclination to get all worried, drive into the office,
> snarf down a bunch of packets, and figure out what's going on. There are probably
> a few dozen organizations that _would_ want to do such a thing but they're
> already doing it using in-house developed tools. They're mostly researchers, too.
> I think the conventional operations manager wants less "here: go figure it out"
> information and more "here's what's wrong" information. It's because of that that
> I don't think statistical anomaly detection (unless it's at the leaf node of a
> decision tree) is going to take off commercially...
>
> >I am not sure if any commercial NIDS product implements this technique.
>
> I don't think any successful ones do. :) But many of our customers ask about it. :)
> Usually when you start to explain the problem they go "oh, yeah. nevermind."
>
Yeah, so does that mean statistical anomaly detection is not a concern for
customers, or is it just that the results it provides are not useful
_yet_ for customers?
> >Will the high false alarm rate or other defects in this type of analysis
> >annoy the customers? Or, they might just turn this detection off to avoid
> >being swamped in false alarms? What is an acceptable level of false alarms
> >for any commercial product?
>
> I think that one of the greatest values in an IDS is its ability to diagnose
> what's going on. What got me into building IDS, in fact, was the large number
> of customers who used to buy my firewalls and ask "ok, so it dropped that
> connection - what does that _mean_??!" ("I don't know" is not an acceptable
> answer...) So one of the value propositions of an IDS is that it'll take this
> blast of data and turn it into a neat diagnosis:
> "code red attack XXX to XXX"
> and the customer is happy. This is absolutely critical to commercial usefulness,
> and therefore success.
>
> Imagine for a second that we're all presales support engineers for an IDS
> vendor. :) Let's pretend that we have an IDS that does _only_ statistical
> anomaly detection - no signatures or expert logic. So we go to our prospective
> customer and our sales cycle looks like this:
> 1) "Hi! Let's plug it in and give it a week to train itself, OK?"
> 2) come back a week later
> 3) puzzle over some output
> 4) Run an ISS scan and _maybe_ something will come out of the IDS
> or maybe it won't. Or, maybe the IDS will just burp out an incomprehensible
> message like:
> "host XXX is generating 2x variance connection requests in last 4 minutes."
> uh...
>
> Then imagine our competitor walks in with a misuse detection system based on
> signatures. Their sales cycle looks like:
> 1) "Hi! Let's plug it in, OK?"
> 2) Run an ISS scan
> 3) "LOOK! SEE!? It's BEEPING! It says it sees an ISS SCAN! COOL!"
>
> I'm being slightly facetious but obviously you can imagine how people will
> tend to reach for a solution that provides information that's more immediately
> accessible.
Yes, that's true. As I said before, an IDS with only statistical anomaly
analysis is not the way to go. But surely there are some attacks,
previously unknown to the community, that could only be caught by a
statistical method, and of course there is no way to show that off to
customers?
>
> Over time I think the current crop of IDS are going to incorporate some form(s)
> of statistical anomaly detection. But let me make a guess: they will do the
> anomaly detection against the alert outputs of the IDS sensors. So instead of
> the individual IDS sensor doing anomaly detection, there will be anomaly detection
> capabilities in the post-processor that manages alerts. This is/will be critical
> since at that point the data has already been turned into diagnosed information
> that is accessible to the end user. Of course, the really weird stuff will be below
> the radar screens of such systems and will be missed... There are already a
> number of researchers using IDS sensors outputs as inputs into statistical
> anomaly detection systems. In other words, they'll tell you:
> "the number of CODE RED alerts is 2x the standard deviation of alerts.
> you've got an abnormally high number of CODE RED alerts!!!"
That kind of work might be necessary when you get lots of alerts, or
when you have lots of IDSes deployed in your system. Don't you think
there should be some higher layer that combines the results from
different types of IDSes?
>
> Frankly, I think that's such obvious "duh!" stuff as to hardly be worth calling
> "research", but they'll get some publications out of it and they'll be happy.
Well, this might be because of the different goals of academia vs.
industry. Industry builds things that work, while academia wants the
things that work best.
From my point of view, signature based, protocol based, statistical anomaly
based, etc. should all be different types of agents deployed in the
network. An expert system is required to take all of their outputs as
inputs and then provide a final result. Here, signature based means the
pure pattern matching technique. Protocol based means techniques that
follow an RFC, define an automaton based on it, and treat unreachable or
dead states and unparsable packets as anomalies.
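
To illustrate the protocol based approach, here is a toy sketch of an
automaton that flags events with no valid transition. The three-state
handshake, the state and event names, and the function name are all
hypothetical. They are not taken from any real RFC, and a real agent would
derive its transition table from the protocol specification.

```python
# Hypothetical transition table: (current_state, event) -> next_state.
TRANSITIONS = {
    ("idle", "hello"): "greeted",
    ("greeted", "exchange"): "syncing",
    ("syncing", "done"): "idle",
}

def find_protocol_anomalies(events, start="idle"):
    """Walk the event sequence through the automaton and return a list of
    (index, state, event) tuples for events that have no valid transition
    from the current state. The state is left unchanged after an anomaly.
    """
    state = start
    anomalies = []
    for index, event in enumerate(events):
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            anomalies.append((index, state, event))
        else:
            state = next_state
    return anomalies

print(find_protocol_anomalies(["hello", "exchange", "done"]))  # valid run
print(find_protocol_anomalies(["hello", "done"]))  # "done" arrives too early
```

Each anomaly tuple is the kind of output that could be fed, alongside
signature and statistical alerts, into the higher-level expert system
described above.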
-Xiaoyong
-----------------------------------
Network Research Engineer, 919.248.1469
Advanced Network Research Group,MCNC xwu@anr.mcnc.org