Re: Statistical Anomaly Analysis?


Date: Fri, 15 Mar 2002 15:05:15 -0500
From: Blake Matheny <>
To: Xiaoyong Wu <>

Comments inline below.

Whatchu talkin' 'bout, Willis?
> In this type of analysis, we look at the distribution of network
> traffic or the total amount of network traffic. The assumption is that
> the recent short-term distribution should be close to the long-term
> distribution. Thus we compare the short-term behavior with the long-term
> behavior and detect any deviation that exceeds some threshold.
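For concreteness, the comparison you're describing might be sketched like this (the categories, counts, and 10% threshold are all just illustrative numbers, not anything from a real product):

```python
# Compare a short-term traffic distribution against a long-term baseline
# and flag any category whose share deviates by more than a threshold.

def deviation_alerts(long_term, short_term, threshold=0.10):
    """Return categories whose short-term share differs from the
    long-term share by more than `threshold` (absolute difference)."""
    def shares(counts):
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()}

    base = shares(long_term)
    recent = shares(short_term)
    alerts = {}
    for proto in base:
        diff = abs(recent.get(proto, 0.0) - base[proto])
        if diff > threshold:
            alerts[proto] = diff
    return alerts

# Example: SMTP jumps from 20% to 60% of traffic (a mail-worm spike).
long_term = {"HTTP": 600, "SMTP": 200, "NNTP": 100, "SNMP": 100}
short_term = {"HTTP": 300, "SMTP": 600, "NNTP": 50, "SNMP": 50}
print(deviation_alerts(long_term, short_term))
```

Note that this only tells you *which* service deviated, not what the intrusion actually is, which is exactly the limitation you point out below.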
Unfortunately, without a significant amount of hand holding, this method
of anomaly detection is typically vulnerable to data set poisoning. That
is to say, many of these statistical methods rely on varying types of
regression analysis, which do a best-fit type of match for the data. For
example, if you are gathering statistics about a user's login habits over
the course of six weeks, and someone other than that user is illegally
using the account, your data set will be poisoned. When you go to
analyze recent data against your long-term data set, the behavior
will appear normal. This example applies to many types of data.
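A toy illustration of the poisoned-baseline problem (the login hours and the 2-sigma rule here are made up for the example): if the intruder's logins are already in the training window, the learned "normal" range covers them.

```python
# The six weeks of "normal" login hours already include the intruder's
# 2-4 a.m. logins, so the learned baseline absorbs them.
import statistics

legit_logins = [9, 10, 9, 11, 10, 9, 10] * 6   # user logs in mid-morning
intruder_logins = [3, 3, 4, 2] * 6             # intruder logs in pre-dawn
training = legit_logins + intruder_logins      # both end up in the baseline

mean = statistics.mean(training)
stdev = statistics.stdev(training)

def is_anomalous(hour, k=2.0):
    """Flag login hours more than k standard deviations from the mean."""
    return abs(hour - mean) > k * stdev

# A fresh 3 a.m. login is within 2 sigma of the poisoned mean,
# so it is not flagged.
print(is_anomalous(3))
```

Against a clean baseline (the legit logins alone) a 3 a.m. login would be many standard deviations out; the poisoning both shifts the mean and inflates the variance.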

In addition to that method of data poisoning, it is often possible to
inject anomalous data slowly over a long period of time, so that it
becomes part of the normal distribution. Again, this becomes a
place where a lot of monitoring is necessary, so what is the benefit?
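The slow-injection attack is easy to demonstrate against an adaptive baseline. In this sketch (all constants are hypothetical: a 25% deviation threshold, a baseline updated as an exponentially weighted moving average), the attacker quintuples a traffic level in small steps and never trips the alarm, because the baseline adapts faster than the ramp:

```python
# An attacker ramps a traffic level from `start` to `target` in small
# steps; the detector compares each observation against an EWMA baseline
# and counts how many times the relative deviation exceeds `threshold`.

def poisoning_demo(start=100.0, target=500.0, step=10.0,
                   alpha=0.3, threshold=0.25):
    baseline = start
    level = start
    alerts = 0
    while level < target:
        level += step                      # attacker nudges traffic up
        if abs(level - baseline) / baseline > threshold:
            alerts += 1                    # detector would have fired here
        baseline = (1 - alpha) * baseline + alpha * level  # baseline adapts
    return alerts, baseline

alerts, final_baseline = poisoning_demo()
print(alerts, final_baseline)
```

With these numbers the detector fires zero times, yet by the end the "normal" baseline has been dragged up to nearly five times its starting value.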

> Consider a company network environment: the percentages of HTTP, SMTP,
> NNTP, and SNMP traffic should be statistically predictable over a long
> period of time. Email worms/viruses such as Love Letter definitely
> introduce a spike in SMTP traffic. Attacks against web servers
> such as Code Red introduce a spike in HTTP traffic. For the recent
> attacks against SNMP buffer overflows, there will be a spike in SNMP
> traffic. Taking a look at the statistical service port distribution or
> protocol distribution will reveal some anomalies in the network. One
> problem is that this technique won't be able to tell what exactly the
> intrusion is.
Sure. That all makes sense. But I'm sure that before your advanced anomaly
detection method picks up the newly acquired problem, your advanced
secretary system will complain that they can't get to their favorite
web site, or are having mail problems. Obviously that doesn't apply so
much to more specialized worms with a more devious intention than "get
into every server we can".

> I am not sure if any commercial NIDS product implements this technique.
> Will the high false alarm rate or other defects in this type of analysis
> annoy the customers? Or, they might just turn this detection off to avoid
> being swamped in false alarms? What is an acceptable level of false alarms
> for any commercial product?
I thought that Dragon did something like this, but it's been a long
time since I looked at their product. In any case, to get to some
semblance of a point, this type of anomaly detection (effective
anomaly detection) is still an academic exercise. Some people
will argue that it's ready for prime time, but if so, where is it? If
this is something you're interested in, you may want to read some
papers on neural nets. NNs could get there, in my opinion, if more
people in the security realm were getting into them.