Re: IPS Reliability/Availability




My original message was prompted by a similar thread
posted on the TippingPoint list. Please read below:
at the end of this message is the original thread I
compiled from the TippingPoint user list.

Indeed, ASICs and FPGAs may be able to sustain more
throughput, but I honestly think that for lower
bandwidth/high latency networks, a mature, well
tested *NIX/Intel/RISC product can be more stable than
ASIC/FPGA technologies that are still working to iron
out their problems. TippingPoint is an example.

Thanks,
Mike

#############
ORIGINAL MESSAGE asking the TippingPoint user list
about hardware problems:

I would like to query the community about any
experiences with hardware issues seen with the 2400
series IPSs.
We have had 2 of them go bad on us weeks apart. One
set of ports on a segment was giving us problems, and
the other went into layer-2 fallback with a MZDM
error. Just trying to get a feel for how the hardware
is holding up.
Thanks,
Sr. Network Security Engineer
Enterprise Security
T. Rowe Price
----------------------------------------------------

We have replaced our 2400 twice already. The first time
it was an HD problem, the second time a port problem.
The second time, one segment stopped working and another
would not forward layer 3 traffic in an SX/LX GBIC
combination after the latest TOS upgrade.

XXX XXX
Lead Information Security Officer/Engineer
Super Computing Technology Coordinator
Norfolk State University

-----------------------------------

We have 4 of the 2400s at one location and have had
3 RMA'd (one of them replaced twice). All have been
within the past 6 months. Problems ranging from:

- device becomes unmanageable but still passes traffic
(no https, ssh, console or LCD functionality)
- suspended task errors which cause about 80% packet
loss on all segments
- bad disk

Devices affected are on both copper and fiber (LX and
SX) segments

Another major problem we face is the fact that after a
TOS upgrade, the segment interfaces revert back to
auto/auto speed/duplex settings. This causes issues
with our copper segments and makes it very risky to
upgrade remote sites with no tech support on-site.
It's in the release notes as an upgrade issue but
has been outstanding for months now.


Systems Administrator
Avid Technology

--------------------------------

We have 4 of the 400s and 3 RMA occurrences in two
years. The units did not fail, but went into a degraded
performance state, due to a thermal alarm threshold
trigger.

We have the same issue with the segments reverting
back to auto/auto during TOS upgrade. Hopefully this
will finally be corrected with the next release.

Sr Network Analyst
Cooper Cameron

-----------------------------------


out of several 2400s and 2+ years we've seen a couple
cpu fans and one disk go bad, but these came at pretty
long intervals and weren't particularly surprising. the
rmas went fine, even over weekends... it seems each one
was better than the one before so progress in the right
direction.

i imagine the 5000e's (with their no moving parts)
will be even better, and i'd love to see the same
non-moving parts spread out to the other platforms like
our beloved 2400s

Manager of Security Resources
UNC Chapel Hill

---------------------------------------------

Our 2400 IPS suffered a problem whereby the unit would
drop packets in long TCP sessions. It was not noticed
during short file transfers, e.g. most web traffic. But
if you tried to FTP lots of data, like our Physics
department was doing, it would drop a packet here and
there and the TCP retransmission logic was unable to
recover. Our only workaround was to drop the box back
into Layer 2 Fallback mode and just do bridging.

Unfortunately, when we sent the box back, we never
found out the root cause of the device failure. Tsk. Tsk.

Thankfully, our new 5000 IPS has been running smoothly
without a hitch.

College of William and Mary
Information Technology - Network Engineering


-------------
We lost one 2400 a few months ago due to thermal
failure.
The RMA box was DOA. The 2nd RMA worked, but we lost a
power supply on that one a few weeks ago.

The 2400 does not log events for power supply issues.



Senior Network Engineer
WakeMed


----------------------------------------------------------------------

We have had 2 of the 200s for a little over a year and
I'm in the process of an RMA for one of them for a bad
disk right now.

Does anybody else find it unacceptable that
TippingPoint is unable to process an RMA on a weekend? I
confirmed the disk errors on Saturday and was unable
to request a new unit be shipped until Monday, and it
looks like the unit won't be in house until tomorrow.
Luckily I have a secondary unit, but that's a long time
to run with downgraded redundancy.


Network Security Admin III
http://www.elementk.com

#############################

Reply from Don Ward, TippingPoint VP
Engineering:

From: don_ward@xxxxxxxx [mailto:don_ward@xxxxxxxx]
Sent: Tuesday, November 08, 2005 6:00 PM
To: Tipping Point Users Group
Cc: Tipping Point Users Group
Subject: RE: [tippingpoint] Issues with 2400 Series


All--

Based on the great feedback to this thread (all
honest, all healthy to bring forth), I must share with
everyone what TippingPoint has been doing for the past
8 months to address both hardware and software quality
issues.

The majority of RMAs in the past 2 years have been
the result of faulty HDDs and CPU fans.
HDDs have failed for two primary reasons:

1. High levels of ad hoc read/write cycles (wearing
drives out)
2. Overwriting non-protected areas of the drive
leading to file corruption, non-bootable drives,
and/or drive failure

We have addressed both of the above issues to date as
follows:

1. Invoked a scheduled process for writes (via the
RAMDISK function - starting in TOS 1.4.2 and beyond)
so drives do not wear out as quickly
2. Invoked an HDD patch in 2.1.3.6321 that protects
the HDD from file corruption/overwrites so drives
neither report superfluous error messages nor die
due to file/boot sector corruption

We have started addressing CPU fan failures (thermal
event issues) by phasing out CPU fans (which are prone
to fail throughout the industry) in favor of heat pipes
(removing the need for fans), and by moving to
solid-state drives (removing spinning parts). This
change of materials has been instituted in the 5000E
product and will phase into other IPS hardware models
over the next couple of months.

The variety of software bugs around memory corruption
(e.g. non-responsive management access, page faults)
that have very frequently resulted in RMAs have been
resolved with TOS 2.1.3.6321 as well.

Over 700 customers have upgraded to 2.1.3.6321 in the
past 5 weeks and the code has been running very stably
to date. We have witnessed a large reduction in the
number of RMAs as well.

Best Regards,
/dw


