RE: Statistical Anomaly Analysis?

From: eddonega@WellsFargo.COM
Date: 03/15/02


To: matheny@dbaseiv.net, xwu@anr.mcnc.org
Date: Fri, 15 Mar 2002 13:15:17 -0800

The problem with IDS via RMON, which this discussion borders on, is that
a few things have to be true. First, the "intrusion" has to be noisy enough
to have statistical consequence. The heavy-duty worms running through the
streets blasting away with network floods or mass application hijacking
might register on a carefully thought-through statistical algorithm, but the
noisy ones are not where I would consider the challenge of IDS to be. That
assumption, though, was taken as a given early in the thread.

More troubling with this line of development is the challenge of trying to
define statistical concepts of network normalcy. Security managers seem
eager to try, but network performance managers have been down this road
before without much success. Networks differ not only between
companies but also at different times. Many applications, systems, and
networks are very "batchy" by nature, creating statistical spikes, and
applications being turned up or down, networks being joined, etc., simply make
it impossible to baseline statistically with any precision.
SNMP "learning" devices spent huge processing power modeling networks on the
fly and rarely made much headway toward effectiveness.

So it seems a good idea at the outset, but I am not holding my breath
for either a deployable system or anything that can detect a few
well-chosen cracks tried against a few select machines.

-----Original Message-----
From: Blake Matheny [mailto:matheny@dbaseiv.net]
Sent: Friday, March 15, 2002 12:05 PM
To: Xiaoyong Wu
Cc: focus-ids@securityfocus.com
Subject: Re: Statistical Anomaly Analysis?

Comments inline below.

Whatchu talkin' 'bout, Willis?
> In this type of analysis, we look at the distributions of network
> traffic or the total amount of network traffic. The assumption is that
> the recent short-term distribution should be close to the long-term
> distribution. Thus we compare the short-term behavior with the long-term
> behavior and detect any deviation that exceeds some threshold.
Unfortunately, without a significant amount of hand-holding, this method
of anomaly detection is typically vulnerable to data set poisoning. That
is to say, many of these statistical methods rely on various types
of regression analysis, which do a best-fit match to the data. For
example, if you are gathering statistics about a user's login habits over
the course of 6 weeks, and someone other than that user is illegally
using the account, your data set will be poisoned. When you go to
analyze recent data against your long-term data set, the intruder's behavior
will appear normal. This example can be applied to many types of data
analysis.
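A rough sketch of the short-term versus long-term comparison from the quoted text, and of how a poisoned baseline defeats it. It uses total variation distance between login-hour histograms; that distance measure, the 0.25 threshold, and all of the login data are assumptions chosen for illustration, not anything proposed in the thread.

```python
from collections import Counter


def distribution(events):
    """Normalize categorical events (here, login hours) into proportions."""
    counts = Counter(events)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0.0 for identical distributions, 1.0 for disjoint ones."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def is_anomalous(long_term, short_term, threshold=0.25):
    """Flag the short-term window if it drifts too far from the long-term baseline."""
    return total_variation(distribution(long_term), distribution(short_term)) > threshold


# Hypothetical login hours for one account.
owner_week = [9, 9, 10, 10, 11, 14, 15, 16]   # daytime logins by the real user
intruder_week = [2, 3, 2, 3]                 # night-time logins by someone else

clean_history = owner_week * 6                        # six weeks, owner only
poisoned_history = (owner_week + intruder_week) * 6   # intruder active during training
recent = owner_week + intruder_week                   # this week

print(is_anomalous(clean_history, recent))     # True  (distance 1/3)
print(is_anomalous(poisoned_history, recent))  # False (distance 0.0)
```

The same recent week is flagged against a clean six-week baseline and looks perfectly normal against a baseline that already contains the intruder's logins.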

In addition to that method of data poisoning, it is often possible to
inject anomalous data slowly over a long period of time, so that it
becomes part of the normal distribution of data. Again, this is a
place where a lot of monitoring is necessary, so what is the benefit?
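The slow-injection problem can be sketched with an adaptive baseline, here an exponentially weighted moving average (EWMA) that keeps learning from every observation. The choice of EWMA, the smoothing factor, the 20% tolerance, and both traffic series are hypothetical.

```python
def ewma_alarms(series, alpha=0.1, tolerance=0.2):
    """Flag points more than `tolerance` (as a fraction) above an EWMA baseline.

    The baseline is updated with every observation, anomalous or not, which is
    what lets slowly injected traffic become "normal".
    """
    baseline = series[0]
    alarms = []
    for i, x in enumerate(series[1:], start=1):
        if x > baseline * (1 + tolerance):
            alarms.append(i)
        baseline = alpha * x + (1 - alpha) * baseline
    return alarms


normal = [100.0] * 20
sudden = normal + [200.0] * 5                                # abrupt doubling
slow = normal + [100.0 * 1.01 ** i for i in range(1, 101)]   # +1% per step

print(ewma_alarms(sudden))  # [20, 21, 22, 23, 24]
print(ewma_alarms(slow))    # []
```

With these settings, the ratio of each new point to the lagging baseline converges to 1.1 for a 1% per-step ramp, so it never crosses the 1.2 alarm line, even though traffic ends up roughly 2.7 times its starting level.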

> Consider a company network environment: the percentages of HTTP, SMTP,
> NNTP, and SNMP traffic should be statistically predictable over a long
> period of time. Email worms/viruses such as Love Letter definitely
> introduce a spike in SMTP traffic. Attacks against web servers
> such as Code Red introduce a spike in HTTP traffic. The recent
> SNMP buffer overflow attacks will produce a spike in SNMP
> traffic. Looking at the statistical distribution of service ports or
> protocols will reveal some anomalies in the network. One
> problem is that this technique won't be able to tell what exactly the
> intrusion is.
Sure. That all makes sense. But I'm sure that before your advanced anomaly
detection method picks up the newly acquired problem, your advanced
secretary system will complain that they can't get to their favorite
web site, or are having mail problems. Obviously that doesn't apply so
much to more specialized worms with a more devious intention than "get
into every server we can".
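For the service-mix idea in the quoted text, a minimal sketch might bucket destination ports into services and flag any service whose share of traffic has at least doubled. The port-to-service table, the 2x ratio, the 5% floor, and the packet counts are assumptions for illustration.

```python
from collections import Counter

# Hypothetical port-to-service table; real traffic classification is messier.
SERVICES = {25: "SMTP", 80: "HTTP", 119: "NNTP", 161: "SNMP"}


def service_shares(dst_ports):
    """Fraction of packets per service; ports not in SERVICES are lumped into OTHER."""
    counts = Counter(SERVICES.get(port, "OTHER") for port in dst_ports)
    total = sum(counts.values())
    return {svc: n / total for svc, n in counts.items()}


def spiking_services(baseline_ports, recent_ports, ratio=2.0, min_share=0.05):
    """Services whose recent share is at least `ratio` times their baseline share."""
    base = service_shares(baseline_ports)
    recent = service_shares(recent_ports)
    return sorted(
        svc for svc, share in recent.items()
        if share >= min_share and share >= ratio * base.get(svc, 0.0)
    )


# Hypothetical destination ports for 100 packets in each window.
baseline = [80] * 60 + [25] * 20 + [119] * 5 + [161] * 5 + [443] * 10
during_worm = [80] * 30 + [25] * 60 + [119] * 2 + [161] * 3 + [443] * 5

print(spiking_services(baseline, during_worm))  # ['SMTP']
```

The result names a service, not an intrusion, which is the limitation the original poster points out.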

> I am not sure if any commercial NIDS product implements this technique.
> Will the high false alarm rate or other defects in this type of analysis
> annoy the customers? Or might they just turn this detection off to avoid
> being swamped in false alarms? What is an acceptable level of false alarms
> for any commercial product?
I thought that Dragon did something like this, but it's been a long
time since I looked at their product. In any case, to get to some
semblance of a point, this type of anomaly detection (effective
anomaly detection) is still largely academic. Some people
will argue that it's ready for prime time, but if so, where is it? If
this is something you're interested in, you may want to read some
papers on neural nets. In my opinion, NNs could get there if more
people in the security realm were working with them.
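On the question of acceptable false alarm levels, a back-of-the-envelope calculation shows why even a small per-check false positive rate hurts. All numbers below are hypothetical, and treating one attack as one flagged check window is a simplifying assumption.

```python
def daily_alert_load(checks_per_day, false_positive_rate, attacks_per_day, detection_rate):
    """Expected false alarms, true alarms, and the fraction of alarms that are real."""
    false_alarms = (checks_per_day - attacks_per_day) * false_positive_rate
    true_alarms = attacks_per_day * detection_rate
    precision = true_alarms / (true_alarms + false_alarms)
    return false_alarms, true_alarms, precision


# Hypothetical: 50 monitored metrics, each checked every 5 minutes (288 windows/day),
# a 1% false positive rate per check, and one attack-bearing window per day.
false_alarms, true_alarms, precision = daily_alert_load(
    checks_per_day=288 * 50, false_positive_rate=0.01,
    attacks_per_day=1, detection_rate=0.9,
)
print(f"{false_alarms:.0f} false vs {true_alarms:.1f} true alarms per day; "
      f"{precision:.1%} of alarms are real")
```

Under these assumptions that is about 144 false alarms a day against fewer than one real one, which is the "swamped in false alarms" scenario.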

-Blake


