[fw-wiz] Re: Flawed Surveys [was: VPN endpoints]

From: Marcus J. Ranum (mjr_at_ranum.com)
Date: 09/01/04

  • Next message: Devdas Bhagat: "Logs (was Re: [fw-wiz] VPN endpoint)"
    To: "Paul D. Robertson" <paul@compuwar.net>
    Date: Wed, 01 Sep 2004 12:52:44 -0400
    
    

    Paul D. Robertson wrote:
    >> or the CIO magazine survey on security) - a lot of these surveys are
    >> fundamentally flawed. They yield results but it's hard to say what the
    >> results actually _measured_.
    >
    >So long as they're flawed approximately the same way from survey to
    >survey, they're often both "better than nothing[1]" and a good relative
    >metric.

    Sorry, but you're completely wrong about that.

    The reason is that if a survey has an unknown bias, you can't
    assume the bias stays the same from one survey to the next:
    precisely because the bias is unknown, you don't know what
    factors drive it. In other words, unless you know how wrong it
    is and why, you can't be sure it's wrong the same way twice.
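    A quick sketch of the problem (Python, with invented numbers):
    suppose true spending never changes at all, but each year's
    self-selected respondents carry an unknown bias that drifts
    between surveys. The "relative metric" then measures nothing
    but the bias drift:

```python
import random

random.seed(1)

# Hypothetical illustration: true security spending is identical in
# both years, but each year's self-selected respondents overstate it
# by an unknown amount that drifts between surveys.
TRUE_SPEND = 100.0

def survey(bias_mean, n=500):
    """Average of n responses, each inflated by an unknown bias."""
    return sum(TRUE_SPEND + random.gauss(bias_mean, 20) for _ in range(n)) / n

year1 = survey(bias_mean=10)   # last year's crowd overstates a little
year2 = survey(bias_mean=35)   # this year's crowd overstates more

relative_change = (year2 - year1) / year1
print(f"apparent change: {relative_change:+.1%}")
# The true change was 0% -- the "relative metric" measured only the
# drift in the unknown bias, not anything about spending.
```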

    >We often don't need absolute metrics, relative metrics will do
    >just fine.

    Be careful; polls are opinion measures, not metrics. Metrics would
    be if you were (for example) pulling actual data from corporate
    financials regarding security expenditures. Measuring the opinion
    of someone who *claims* to be a CIO about what their expenditures
    either {are|should be} is not even good enough to give a relative
    metric.

    What I think you're saying, unfortunately, is "having some 'gee wow'
    numbers is good enough to blow some basic FUD and we need
    basic FUD so it's OK."

    > I know what my $foo risk was last year, and I know what it was
    >the year before, and I can compare to the survey and see the relative
    >differences and the relative change- therefore, I can figure out my
    >approximate relative change for this year.

    But that's the problem. You don't actually "know" anything. You
    have some information that is based on a self-selected sample
    which I guarantee you will change next year. Different people
    will be bored enough to answer the survey, and the answers
    they give will be either more or less misinformed than they
    were last year. There are no constants *whatsoever* in these
    surveys.

    Now, if you said you were going to take the same self-selected
    sample and poll those same people next year, you're starting
    to apply some controls to your survey, but they're still not going
    to be good enough to give you a result worth having.

    >> - How much the person cared about the topic (motive to respond)
    >> - How honest the respondent is (hard to verify)
    >> - Other factors (hard to predict)
    >
    >You can also (a) drop outliers

    You can't drop outliers because, since you actually know nothing
    about your data's provenance, you don't know what an "outlier"
    is when you're dealing with a self-selected sample. You might,
    for example, discard the survey response from the one *REAL*
    CIO who answers the survey! You simply do not know.
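    To make that concrete, here is a small Python sketch (every number
    is invented for illustration): nine bored respondents make up
    modest budgets, one real CIO reports a genuinely large figure, and
    a naive two-standard-deviation outlier filter throws away the only
    truthful answer:

```python
import statistics

# Hypothetical illustration: nine bogus responses invented by bored
# respondents, plus the one real CIO's genuinely large true figure.
bogus = [40_000, 55_000, 60_000, 48_000, 52_000,
         45_000, 58_000, 50_000, 47_000]
real_cio = 4_000_000
responses = bogus + [real_cio]

mean = statistics.mean(responses)
stdev = statistics.stdev(responses)

# A naive "drop anything more than 2 standard deviations out" rule:
kept = [r for r in responses if abs(r - mean) <= 2 * stdev]

# With a self-selected sample you can't tell outlier from signal;
# the filter happily discards the one truthful response.
print("real CIO kept?", real_cio in kept)
```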

    What you're trying to do is apply science to pseudoscience. The
    result is comparable to polishing a turd: if you work at it hard enough,
    it still won't get shiny.

    >, (b) have cross-conflicting questions

    That simply measures consistency of response, not whether the
    response is truthful or whether your sample is biased.

    >(c) answer the questions on behalf of a known quantity and still be able
    >to validate polls pretty well. You obviously don't get people who don't
    >care to respond, but if the number of people who do respond is
    >significant, that's ok.

    NO IT IS NOT OK!
    ________________

    I am sorry, Paul - if you believe the statement you made above, you
    really really really need to read a few introductory texts on statistics,
    the scientific method, and research methods. To a trained statistician,
    your statements above are comparable to a declaration that not only
    is the earth flat, but it rests on the back of a turtle.

    I wasn't originally aiming my rant at Paul (I seem to be ranting
    at my buddies a lot these days...) but it is exactly the kind of
    tolerance of pseudo-science that Paul is advocating above
    that keeps security a "social science" rather than something
    measurable or quantifiable. Security practitioners are on the
    verge of understanding that we need to sell security in terms
    of ROI and risk, and it's just BEGINNING to sink in that
    risk requires real metrics and statistics. But we're still stuck
    with a lot of pseudo-science.

    >> I'm sure nobody on this list has ever filled out one of those surveys
    >> from a magazine in which they asked you your job position, whether
    >> you were a decision-maker, company size, etc... And I'm sure you
    >> all fill them out EXACTLY right. I used to enjoy periodically asserting
    >> that I was the CEO of a 1 person company with a $4,000,000 IT
    >> budget (well, a guy can dream, huh?) Unfortunately, sometimes
    >
    >You're out of the range of the mean by orders of magnitude, anyone doing
    >it even half-way should be throwing that response away (assuming they
    >*want* correct data,)

    ARRGH!! NO! NO MORE PSEUDO-SCIENCE!
    YOU ARE HURTING MY BRAIN!!!!!! MY HEAD IS
    GOING TO EXPLODE!!!

    Paul, if you are a scientist and you measure data, and then
    decide to throw away values that don't match your expectations,
    that's called "experimental fraud"!! That's um, bad!

    See, the problem is that you can't a priori decide you know
    what your mean _is_ until you know what your data is. So
    what if 50% of your self-selected sample all were feeling
    frisky that day and entered bogus figures? How _many_
    values around the mean will you throw away until you get
    a number that "feels right"??? That's how psychic researchers
    get their results: they know what they want to find, and they
    throw away data until it "feels right".
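    Here's what that procedure does, sketched in Python (the data and
    the "expected" figure are invented): half the sample is bogus, and
    the researcher keeps trimming whichever point sits farthest from
    the answer he wants until the mean "feels right". The procedure
    converges on the preconceived answer regardless of what the data
    said:

```python
import random
import statistics

random.seed(7)

# Hypothetical illustration: half the sample entered bogus, inflated
# figures; the "researcher" expects a mean near 50.
honest = [random.gauss(50, 5) for _ in range(50)]
bogus = [random.gauss(120, 30) for _ in range(50)]
data = honest + bogus

EXPECTED = 50.0
trimmed = list(data)
while len(trimmed) > 2 and abs(statistics.mean(trimmed) - EXPECTED) > 1:
    # throw away whichever point sits farthest from the *expected* mean
    trimmed.remove(max(trimmed, key=lambda x: abs(x - EXPECTED)))

print(f"raw mean:     {statistics.mean(data):.1f}")
print(f"trimmed mean: {statistics.mean(trimmed):.1f}")
# The trimmed mean lands near 50 no matter what the data said --
# the answer was chosen before the data were consulted.
```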

    There is no amount of compensating controls you can use
    to polish a turd into a useful result. And, more importantly,
    at a certain point, the cost of polish exceeds the cost of
    doing it right in the first place!!

    Reading list:
            - "How to Lie with Statistics" - Darrell Huff
                    ISBN: 0393310728
            - "Research Design and Methods" (4th ed) Bordens and Abbott
                    ISBN: 0767421523
            - Richard Feynman's article on experimental controls and their
                    mis-application in social "sciences" from "The Pleasure
                    of Finding Things Out" (I think it's that book..)

    mjr.

    _______________________________________________
    firewall-wizards mailing list
    firewall-wizards@honor.icsalabs.com
    http://honor.icsalabs.com/mailman/listinfo/firewall-wizards

