Re: Building Sentry Systems

From: Benedikt Stockebrand (me_at_benedikt-stockebrand.de)
To: focus-sun@securityfocus.com
Date: Wed, 10 Sep 2003 14:02:02 +0200


    Hello focus-sun,

    Hal Flynn <flynn@securityfocus.com> writes:

    > [...] what I started thinking
    > about is creating a self-contained system that functions independent of
    > the cluster, and has access to all drives in the cabinet. The system
    > stores integrity information locally, and acts as an audit host to monitor
    > the integrity of files within the storage cabinet.
    >
    > Is anybody familiar with any research in this area? Has anybody
    > experimented with anything like this? I don't have access to any large
    > cabinets to tinker with these days, so I don't have the ability to play
    > with this on my own. I'd be interested in hearing about other similar
    > research and experimentation.

    How would you want to check integrity while the cluster nodes are
    actually working on the data? It sounds like you either have to
    notify your monitor about every intended write operation, which by
    itself looks like a major performance hit and forces you to consider
    what happens when the monitor box is down, or you wind up with a
    load of race conditions and temporary inconsistencies.
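
    To make the race concrete, here is a rough sketch (my own
    illustration, not anything from Hal's setup; the re-read heuristic
    and the chunked hashing are made up for the example) of what an
    out-of-band audit host runs into when it hashes a file that a
    cluster node may be writing to at the same time:

        # Sketch: why an out-of-band integrity monitor races with live
        # writers. Hypothetical example; the retry heuristic is
        # illustrative only.

        import hashlib
        import time

        def sha256_of(path):
            """Hash a file in chunks so large volumes don't exhaust memory."""
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            return h.hexdigest()

        def audit(path, expected):
            """Compare against the stored baseline, re-reading once to
            spot a file that changed *while* we were hashing it."""
            first = sha256_of(path)
            time.sleep(1)
            second = sha256_of(path)
            if first != second:
                # A node wrote to the file mid-scan: without being told
                # about intended writes, the monitor cannot tell this
                # apart from tampering.
                return "INCONCLUSIVE: file changed during scan"
            return "OK" if first == expected else "MISMATCH: possible tampering"

    The best such a monitor can do on its own is flag the scan as
    inconclusive and retry later; actually telling legitimate writes
    from tampering needs exactly the write notifications mentioned
    above, with all the cost that implies.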

    As an additional note from some ("management") experience with a
    range of cluster systems (I was somewhere between the
    customer/project and the data center/operations sides): these
    beasts are complex enough as they are; adding more complexity only
    makes it more likely that some sysadmin on call duty, called out in
    the middle of the night and unfamiliar with the cluster in
    question, will end up with a major fsckup.

    In my opinion a more reasonable approach, if you can't avoid using a
    cluster, is to make really sure a split brain won't happen. The
    first thing to do is make sure you actually follow the cluster
    specs and don't take "shortcuts" such as using only a single
    heartbeat link. Then make sure the redundancy itself is properly
    monitored: it is not enough to verify that, yes, the database is up
    and running. Check every heartbeat link, every redundant
    host-storage connector, every storage cabinet, every disk in the
    RAIDs and all cluster nodes. Then make sure only qualified staff
    touch the system (and, I am sorry to say, this is sometimes
    difficult to enforce if the vendor's support staff are plain
    incompetent). Finally, make sure you test things periodically. If
    this still isn't good enough for you, disable any automatic
    failover and only do a manual switchover after someone clueful has
    taken a look at the whole situation.
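
    As a sketch of what "monitor the redundancy, not just the service"
    could look like in practice (the component inventory and the check
    commands below are entirely hypothetical; substitute whatever your
    platform actually provides):

        # Sketch: alert on *degraded redundancy*, not just on service
        # failure. Component names and check commands are hypothetical.

        import subprocess

        # Each entry: (description, command exiting 0 when healthy).
        COMPONENTS = [
            ("heartbeat link 1", ["ping", "-c", "1", "10.0.0.1"]),
            ("heartbeat link 2", ["ping", "-c", "1", "10.0.1.1"]),
            ("storage path A",   ["/usr/local/bin/check_fc_path", "A"]),
            ("storage path B",   ["/usr/local/bin/check_fc_path", "B"]),
        ]

        def check(desc, cmd):
            """Run a health check; non-zero exit means the part is down."""
            ok = subprocess.run(cmd, capture_output=True).returncode == 0
            if not ok:
                # The cluster may still serve requests, but one more
                # failure is no longer survivable: page someone now.
                print(f"REDUNDANCY LOST: {desc}")
            return ok

        if __name__ == "__main__":
            healthy = [check(d, c) for d, c in COMPONENTS]
            print("all redundant components healthy" if all(healthy)
                  else "cluster is running degraded")

    The point of the design is that a lost heartbeat link or failed
    disk raises an alarm immediately, while the service is still up,
    instead of surfacing only when the second failure takes the whole
    cluster down.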

    Of course I assume you don't intend to run the cluster without
    proper backup (like the old "why do I need backup? I've got a RAID
    system!" line) and don't expect the cluster to be the solution to
    all the IT problems in the world.

    Cheers,

        Ben

    -- 
    Dipl. Inform.                  Tel.:  +49 (0) 6151 - 971 823
    Benedikt Stockebrand           Mobil: +49 (0) 177 - 41 73 985
    Am Karlshof 1a                 Mail:  me@benedikt-stockebrand.de
    D-64287 Darmstadt              WWW:   http://www.benedikt-stockebrand.de
    
