Re: [BULK] Applying data mining to Intrusion Detection System

From: Sanjay Rawat (sanjayr_at_intoto.com)
Date: 07/18/05

    Date: Mon, 18 Jul 2005 11:24:09 +0530
    To: trantichphuoc@yahoo.com, focus-ids@securityfocus.com
    
    

    Hi Patrick,

    I have used the KDD'99 dataset. Each of the "10%" files is a 10% subset
    of the corresponding full data set. The testdata file is the full data
    set for testing your algorithm. The corrected data set (corrected.gz in
    your list) is the 10% test data with labels. If you don't have any
    labeled data, how can you measure the accuracy of your algorithm?
    Regarding the script, I have not used it; I did my own calculation.
    Mainly, you have to report results in terms of an ROC curve, which is a
    plot of the detection rate (DR) against the false positive rate (FP).
    You can always calculate FP and DR for the data you are using, as follows:

    FP = (# normal classified as abnormal) / (total # normal)

    DR = (# abnormal classified as abnormal) / (total # abnormal)
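
    The two ratios above can be computed in a few lines. This is only a
    minimal sketch, not the official scoring script: the function names, the
    toy labels and the score threshold are all assumptions made for
    illustration. If the labels in your files carry a trailing period
    (e.g. "normal."), pass that string as the `normal` argument.

```python
def fp_and_dr(true_labels, predicted_labels, normal="normal"):
    """Return (FP, DR) for two parallel sequences of labels.

    Any label different from `normal` is treated as abnormal (an attack).
    """
    normal_total = abnormal_total = 0
    false_alarms = detections = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        flagged = predicted != normal  # classifier called this record abnormal
        if truth == normal:
            normal_total += 1
            false_alarms += flagged
        else:
            abnormal_total += 1
            detections += flagged
    if normal_total == 0 or abnormal_total == 0:
        raise ValueError("need at least one normal and one abnormal record")
    return false_alarms / normal_total, detections / abnormal_total


def roc_points(true_labels, scores, thresholds, normal="normal"):
    """Return one (FP, DR) pair per threshold.

    Assumes the model emits an anomaly score per record and that a record
    is flagged as abnormal when its score is >= the threshold.
    """
    points = []
    for threshold in thresholds:
        predicted = ["abnormal" if s >= threshold else normal for s in scores]
        points.append(fp_and_dr(true_labels, predicted, normal))
    return points


# Toy data (made up): 4 normal records and 2 attacks.
truth = ["normal", "normal", "normal", "normal", "smurf", "neptune"]
guess = ["normal", "smurf", "normal", "normal", "smurf", "normal"]
fp, dr = fp_and_dr(truth, guess)
print(fp, dr)  # 1 of 4 normals flagged, 1 of 2 attacks detected
```

    Plotting the pairs returned by roc_points, with FP on the x-axis and DR
    on the y-axis, gives the ROC curve.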

    At 06:03 PM 7/16/2005, trantichphuoc@yahoo.com wrote:
    >Hi all,
    >I am a newbie in network security. I have looked at a website about KDD 99
    >(http://www-cse.ucsd.edu/users/elkan/clresults.html ) and I found it
    >very interesting.
    >I would like to try the dataset and use some data mining tools to mine
    >it. However, I am having a few problems.
    >
    >
    >1. The data I downloaded from
    >(http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html)
    >
    >kddcup.data.gz The full data set (18M; 743M Uncompressed) -> I need the
    >output (classified as normal or an intrusion) so that supervised
    >learning can be done. This file is too big, so I can't even open it to see
    >if it contains the output.
    >
    >kddcup.data_10_percent.gz A 10% subset. (2.1M; 75M Uncompressed) -> is
    >this 10% extracted from the above whole data?
    >
    >kddcup.newtestdata_10_percent_unlabeled.gz (1.4M; 45M Uncompressed) -> is
    >it true that the test data is not extracted from the training data (743 MB)?
    >
    >kddcup.testdata.unlabeled.gz (11.2M; 430M Uncompressed) -> is this test
    >data the same as the test data above? If not, how does it differ?
    >
    >kddcup.testdata.unlabeled_10_percent.gz (1.4M;45M Uncompressed)
    >
    >corrected.gz Test data with corrected labels.
    >
    >I see so many test sets and have no clue which one to use.
    >
    >2. What tool would you recommend me to use to mine these data?
    >
    >3. How can I run the scoring script at
    >http://www-cse.ucsd.edu/users/elkan/awkscript.html
    >I don't know how to evaluate my model after I finish training. Do I have to
    >send my model to the committee in order to have it evaluated, or do I just
    >run the script myself? I really want to evaluate my model the way
    >described in http://www-cse.ucsd.edu/users/elkan/clresults.html
    >
    >
    >Could anyone please give me some advice about this?
    >Thanks
    >Have a nice day
    >Patrick Tran

    Sanjay Rawat
    Senior Software Engineer
    INTOTO Software (India) Private Limited
    Uma Plaza, Above HSBC Bank, Nagarjuna Hills
    PunjaGutta,Hyderabad 500082 | India
    Office: + 91 40 23358927/28 Extn 423
    Website : www.intoto.com
       Homepage: http://sanjay-rawat.tripod.com


