RE: A question for user behaviour profile based IDS

From: Bill Royds (sf-lists@royds.net)
Date: 03/27/02


From: "Bill Royds" <sf-lists@royds.net>
To: "fengli" <lfeng@sei.xjtu.edu.cn>, "Focus-Ids" <focus-ids@securityfocus.com>
Date: Wed, 27 Mar 2002 14:25:29 -0500

A little explanation about the chi-square statistic and how it works.
The chi-square statistic measures whether observed behaviour is significantly different from expected behaviour by calculating a number that will be 0 if the two are the same and very large if the observed behaviour is very different from expected.
  To do this it assumes that random (unimportant) deviations follow the normal distribution, so some mathematical properties are available to give a useful measure. Notice it doesn't imply that the expected behaviour itself is normally distributed, just the deviations from the expected values, a much safer assumption. It also, of course, assumes that you can put a meaningful number on expected and observed behaviour. This is actually the harder part.
  More precisely, chi-square assumes that each deviation between observed and expected values, once scaled by the square root of the expected value, follows a normal(0,1) distribution (mean of 0, variance of 1), that it is a continuous number (any value is allowed, but adjustments can be made for integer values), and that each deviation is independent of every other deviation.

If what you are measuring in your HIDS is packet counts per port, then some of these assumptions hold.
We are measuring numbers; the numbers should be independent of each other (incoming traffic on one port is, within reason, fairly independent of traffic on another port); and, if we get enough packets, the counts come close to a smooth distribution.

So, once we have created an expected packet count per time interval E[i] for ports i=0 to n, we can watch our system for that time interval and get an observed packet count O[i]. We know that we won't get exactly the same observed count as expected, but the deviation should be small on average if the difference is truly random.
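One simple way to arrive at E[i] can be sketched as follows. This is only an illustration, assuming you already have per-port packet counts for several past intervals of equal length that you trust to be attack-free; the function name `expected_counts` and the input layout are hypothetical, and averaging is just one of many possible baselines.

```python
from collections import defaultdict


def expected_counts(baseline_intervals):
    """Estimate E[i] as the mean packet count per port over baseline intervals.

    baseline_intervals is a list of dicts, one per past interval, each
    mapping a port number to the packet count seen in that interval
    (hypothetical layout). A port missing from an interval counts as 0.
    """
    if not baseline_intervals:
        raise ValueError("need at least one baseline interval")

    totals = defaultdict(float)
    for interval in baseline_intervals:
        for port, count in interval.items():
            totals[port] += count

    n_intervals = len(baseline_intervals)
    return {port: total / n_intervals for port, total in totals.items()}


baseline = [
    {22: 10, 80: 100},
    {22: 14, 80: 90},
    {80: 110, 443: 6},
]
E = expected_counts(baseline)
```

The result keeps actual counts per interval rather than proportions, which is what the chi-square calculation needs.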
  We calculate the chi-square statistic, X^2 (X standing in for the Greek letter chi), as sum(i=0 to n) {(O[i]-E[i])^2/E[i]}, that is, the sum over ports of (deviation squared)/(expected number). Since these are discrete values (packet received or not), we should adjust for continuity by grouping ports with small expected counts together until each group has an expected value of at least 5 (a rule of thumb), so that no expected value is very small.
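The calculation and the grouping rule can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the function names are hypothetical, the sample counts are made up for illustration, and pooling adjacent ports is only one possible way to group small expected counts.

```python
def pool_small_cells(observed, expected, min_expected=5.0):
    """Merge adjacent cells until each pooled cell has expected >= min_expected."""
    pooled_obs, pooled_exp = [], []
    acc_obs, acc_exp = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs, acc_exp = 0.0, 0.0
    # Fold any leftover small cells into the last pooled cell.
    if acc_obs > 0 or acc_exp > 0:
        if pooled_exp:
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


def chi_square(observed, expected):
    """Sum of (O[i] - E[i])^2 / E[i]; every expected count must be > 0."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up packet counts per port for one interval (actual counts, not percentages).
expected = [50, 30, 2, 1, 3, 14]
observed = [48, 35, 1, 0, 9, 7]

obs_pooled, exp_pooled = pool_small_cells(observed, expected)
x2 = chi_square(obs_pooled, exp_pooled)
```

Here the three ports with expected counts 2, 1 and 3 are pooled into one group with an expected count of 6, leaving four groups.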
We must use actual counts, not percentages or proportions. If X^2 is very large (basically greater than the chi-square critical value for the given probability, with degrees of freedom one less than the number of groups), we have observed packet counts different from what we expect, but we can't prove that they are malicious or an attack. It is just a heads-up to look at the data further. Statistics never indicates truth or falsehood; it just gives a probability that what we observe is as expected or not.
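The comparison against a critical value might look like the sketch below. It assumes degrees of freedom equal to the number of groups minus one, and it uses the Wilson-Hilferty approximation so that only the Python standard library is needed; an exact value could instead come from a statistics table or from `scipy.stats.chi2.ppf`. The function names are hypothetical.

```python
from statistics import NormalDist


def chi2_critical_approx(df, confidence=0.95):
    """Approximate chi-square critical value (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(confidence)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * a ** 0.5) ** 3


def is_suspicious(chi2_stat, n_groups, confidence=0.95):
    """True if the statistic exceeds the approximate critical value.

    Assumes degrees of freedom = n_groups - 1. A True result is only
    a heads-up to look at the data further, not proof of an attack.
    """
    df = n_groups - 1
    return chi2_stat > chi2_critical_approx(df, confidence)


# With 4 groups, the approximate 95% critical value is a little under 7.8.
flag_small = is_suspicious(7.08, n_groups=4)
flag_large = is_suspicious(20.0, n_groups=4)
```

The approximation is slightly below the tabulated value for small degrees of freedom (about 7.78 versus 7.81 for 3 degrees of freedom), so borderline results deserve a check against exact tables.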

  So chi-square seems useful as a rough measure of "in control" versus "possible attack", but it should be coupled with other measures for any full decision making.
 
  

-----Original Message-----
From: fengli [mailto:lfeng@sei.xjtu.edu.cn]
Sent: Wed March 27 2002 08:43
To: Focus-Ids
Subject: A question for user behavior profile based IDS

  Hi all!
   I am doing research on HIDS, and I want to analyze users' behavior to build their normal profiles. If the behavior of intruders or masqueraders deviates from the normal profiles, then we can catch it!
   My question is: how can we measure the deviation? Many years ago SRI put forward the "chi-square" statistical method to measure it in NIDES. Does it really work?
   By the way, is research on user behavior profile based IDS promising or not? And can you suggest promising methods for it?
  Any discussion or advice is appreciated!
  stonefeng