RE: Statistical Anomaly Analysis?
From: Xiaoyong Wu (xwu@anr.mcnc.org)
Date: 03/18/02
- Previous message: Xiaoyong Wu: "Re: Statistical Anomaly Analysis? (Was: [more specific] Signature vs. Protocol Analysis)"
- In reply to: eddonega@WellsFargo.COM: "RE: Statistical Anomaly Analysis?"
- Next in thread: Bill Royds: "RE: Statistical Anomaly Analysis?"
Date: Mon, 18 Mar 2002 11:46:11 -0500 (EST)
From: Xiaoyong Wu <xwu@anr.mcnc.org>
To: eddonega@WellsFargo.COM
Agreed. Networks are hard to model with statistical distributions; this
is the same hard problem faced in network simulation and emulation. An
expert system with some statistical functionality, or with statistical
signatures, will be more successful than a purely statistical system.
Because of the variance in networks, such a system needs some learning
capability so that it can learn the network well. Although current
statistical systems come with training techniques, the training process
is too naive to be deployable.
The statistical training process assumes that the network is free of
intrusions during training. That holds in a lab environment but not in a
real one. In the lab, the control case can be finely tuned to guarantee
clean training. In a real environment, intrusions come and go, so the
training process can be contaminated.
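A minimal sketch of the contamination problem, assuming a simple
mean/standard-deviation baseline with a z-score alarm. The sample rates,
the threshold, and the function name are made up for illustration and do
not come from any particular product.

```python
from statistics import mean, pstdev

Z_THRESHOLD = 3.0  # hypothetical alarm threshold, in standard deviations


def z_score(training, observed):
    """Distance of one observation from the trained mean, in standard deviations."""
    return abs(observed - mean(training)) / pstdev(training)


# Made-up SNMP packets-per-minute samples, for illustration only.
clean_training = [100, 104, 98, 102, 96]
dirty_training = [100, 900, 98, 950, 96]  # two attack bursts slipped into training

attack_rate = 900
flagged_by_clean = z_score(clean_training, attack_rate) > Z_THRESHOLD
flagged_by_dirty = z_score(dirty_training, attack_rate) > Z_THRESHOLD
print(flagged_by_clean, flagged_by_dirty)
```

With these made-up numbers the clean baseline flags the burst, while the
baseline trained on contaminated data treats the same burst as normal.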
Another issue is that current statistical systems have too many parameters
to train. Without knowing your network well in advance, it is hard even to
define the parameters. The time scale, "how to calculate the long term" and
"how to calculate the short term", is one parameter for sure. By using
different long-term and short-term windows, it might be possible for the
statistics to cover the system's intrinsic dynamics over time, such as
the "batchy" statistical spikes you mentioned. The topology and traffic
load, "how to characterize or categorize different kinds of network
traffic", is also an important parameter. There are many other parameters
to consider. So the involvement of a system admin is inevitable, but there
is still too much for the system admin to handle.
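The long-term versus short-term comparison can be sketched as follows,
assuming exponentially weighted protocol-share profiles and a
chi-square-style distance between them. The two smoothing factors, the
threshold, the traffic counts, and all function names are hypothetical;
they stand in for exactly the parameters that are hard to choose.

```python
ALPHA_LONG = 0.01   # hypothetical: slow-moving long-term profile
ALPHA_SHORT = 0.3   # hypothetical: fast-moving short-term profile
THRESHOLD = 0.1     # hypothetical alarm threshold


def shares(counts):
    """Convert per-protocol packet counts into fractions of total traffic."""
    total = sum(counts.values()) or 1  # avoid dividing by zero on an idle interval
    return {proto: n / total for proto, n in counts.items()}


def blend(profile, interval_shares, alpha):
    """Exponentially weighted update of a profile with one interval's shares."""
    protos = set(profile) | set(interval_shares)
    return {p: (1 - alpha) * profile.get(p, 0.0) + alpha * interval_shares.get(p, 0.0)
            for p in protos}


def deviation(short, long, floor=1e-6):
    """Chi-square-style distance of the short-term profile from the long-term one."""
    protos = set(short) | set(long)
    return sum((short.get(p, 0.0) - long.get(p, 0.0)) ** 2 / max(long.get(p, 0.0), floor)
               for p in protos)


# Made-up per-interval packet counts: steady traffic, then an SNMP flood.
normal = {"http": 600, "smtp": 300, "snmp": 100}
spike = {"http": 600, "smtp": 300, "snmp": 2000}

long_term = short_term = shares(normal)
scores = []
for counts in [normal] * 50 + [spike]:
    s = shares(counts)
    long_term = blend(long_term, s, ALPHA_LONG)
    short_term = blend(short_term, s, ALPHA_SHORT)
    scores.append(deviation(short_term, long_term))

print(scores[-2] > THRESHOLD, scores[-1] > THRESHOLD)
```

Even this toy version has three tunable values and an implicit choice of
interval length and traffic categories. Changing any of them changes what
counts as a deviation.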
Well, so, how about a small system that is easy to model, where it is easy
to tell whether it is under intrusion or not? Will any complex statistical
analysis algorithms work better than the simple statistical signatures that
exist now at catching those well chosen cracks tried against a few select
machines?
We have to wait and see.
-Xiaoyong
On Fri, 15 Mar 2002 eddonega@WellsFargo.COM wrote:
> The problem with IDS via RMON, which this discussion almost borders, is that
> a few things have to be true. First, the "intrusion" has to be noisy enough
> to have statistical consequence. The heavy duty worms running through the
> street blasting away with network floods or application hijacking en masse
> might register on a carefully thought through statistical algorithm, but the
> noisy ones are not where I would consider the challenge of IDS to be. That
> assumption was listed early on in the start of the thread though as a given.
>
> More troubling with this line of development is the challenge of trying to
> define statistical concepts of network normalcy. Security managers seem
> eager to try, but network performance managers have been down this road
> before without huge success. Networks are very different not only between
> companies, but also at different times. Many applications, systems, and
> networks are very "batchy" by nature creating statistical spikes, and
> applications being turned up or down, networks being joined etc., just make
> it impossible to effectively statistically baseline with any precision.
> SNMP "learning" devices spent huge processing power to model networks on the
> fly and rarely made much inroad to effectiveness.
>
> So it seems a good idea at the outset, but I am not holding my breath
> for either a deployable system, or anything that can detect a few
> well chosen cracks tried against a few select machines.
>
> -----Original Message-----
> From: Blake Matheny [mailto:matheny@dbaseiv.net]
> Sent: Friday, March 15, 2002 12:05 PM
> To: Xiaoyong Wu
> Cc: focus-ids@securityfocus.com
> Subject: Re: Statistical Anomaly Analysis?
>
>
> Comments inlaid below.
>
> Whatchu talkin' 'bout, Willis?
> > In this type of analysis, we look at the distribution of network
> > traffic or the total amount of network traffic. The assumption is that
> > the recent short-term distribution should be close to the long-term
> > distribution. Thus we compare the short-term behavior with the long-term
> > behavior and detect any deviation that is over some threshold.
> Unfortunately without a significant amount of hand holding this method
> of anomaly detection is typically vulnerable to data set poisoning. That
> is to say that many of these statistical methods rely on varying types
> of regression analysis, which do a best fit type of match for data. For
> example, if you are getting statistics about a user's login habits over
> the course of 6 weeks, and someone other than that user is illegally
> using the account, your data set will be poisoned. When you go to
> analyze recent data, compared with your long term data set, the behavior
> will appear normal. This example can be applied to many types of data
> analysis.
>
> In addition to that method of data poisoning, it is often possible to
> inject anomalous data slowly over a long period of time, so that it
> becomes part of the normal distribution of data. Again, this becomes a
> place where a lot of monitoring is necessary, what is the benefit?
>
> > Consider a company network environment: the percentages of HTTP, SMTP,
> > NNTP, and SNMP traffic should be statistically predictable over a long
> > period of time. Email worms/viruses such as Love Letter definitely
> > introduce a spike in SMTP traffic. Attacks against web servers such as
> > Code Red introduce a spike in HTTP traffic. With the recent attacks
> > exploiting SNMP buffer overflows, there will be a spike in SNMP
> > traffic. Taking a look at the statistical service port distribution or
> > protocol distribution will uncover some anomalies in the network. One
> > problem is that this technique won't be able to tell what exactly the
> > intrusion is.
> Sure. That all makes sense. But I'm sure before your advanced anomaly
> detection method picks up the newly acquired problem your advanced
> secretary system will complain that they can't get to their favorite
> web site, or are having mail problems. Obviously that doesn't apply so
> much to more specialized worms with a more devious intention than "get
> into every server we can".
>
> > I am not sure if any commercial NIDS product implements this technique.
> > Will the high false alarm rate or other defects in this type of analysis
> > annoy the customers? Or, they might just turn this detection off to avoid
> > being swamped in false alarms? What is an acceptable level of false alarms
> > for any commercial product?
> I thought that Dragon did something like this, but it's been a long
> time since I looked at their product. In any case, to get to some
> semblance of a point, this type of anomaly detection (effective
> anomaly detection) is still currently an academic case. Some people
> will argue that it's ready for prime-time, but if so where is it? If
> this is something you're interested in, you may want to read some
> papers on neural nets. NNs could be there, in my opinion, if more
> people in the security realm were getting into them.
>
> -Blake
>
-----------------------------------
Network Research Engineer, 919.248.1469
Advanced Network Research Group, MCNC xwu@anr.mcnc.org