Re: Sun single-CPU DOS
- From: Doug Hughes <doug@xxxxxxxxxxxxxx>
- Date: Fri, 26 May 2006 12:39:19 -0500 (CDT)
On Wed, 24 May 2006, Mike O'Connor wrote:
:Sun says it is jabber, which is why I put it in quotes. Since they have not
:replicated in lab, they are jumping to conclusions. Yes, I agree,
:it is very specific and the backline engineer usage appears 'stretching things'
Most Sun adapters have an actual jabber counter that netstat -k will
spew out for you. You can eliminate ambiguity easily enough. Here's
an example I Google'd for:
netstat -k eri0
ipackets 525571 ierrors 365 opackets 8446 oerrors 0 collisions 85
ifspeed 10000000 rbytes 73324309 obytes 1118022 multircv 99205 multixmt
6 brdcstrcv 415863
brdcstxmt 10 norcvbuf 0 noxmtbuf 0 inits 4 rx_inits 8 tx_inits 1
nocarrier 1 nocanput 0 allocbfail 0 drop 321 pasue_rcv_cnt 0
pasue_on_cnt 0 pasue_off_cnt 0 pasue_time_cnt 0 txmac_urun 0
txmac_maxpkt_err 0 excessive_coll 0 late_coll 0 first_coll 35
defer_timer_exp 0 peak_attempt_cnt 0 jabber 0 no_tmds 0
tx_hang 0 rx_corr 0 no_free_rx_desc 0 rx_overflow 0 rx_hang 0
rx_align_err 64 rx_crc_err 19 rx_length_err 0 rx_code_viol_err 0
bad_pkts 321 runt 40 toolong_pkts 279 rxtag_error 0 parity_error 0
pci_error_interrupt 0 unknown_fatal 0 pci_data_parity_err 0
pci_signal_target_abort 0 pci_rcvd_target_abort 0 pci_rcvd_master_abort 0
pci_signal_system_err 0 pci_det_parity_err 0 ipackets64 525571
opackets64 8446 rbytes64 73324309 obytes64 1118022 pmcap 4
indeed, and using kstat shows count of 0. more ammo in my favor and presented
back to irritating backline.
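If you want to pull the jabber counter out of a dump like that without eyeballing it, here is a minimal Python sketch. It assumes the output is nothing but whitespace-separated "name value" pairs, as in the eri0 example; the function name parse_counters and the shortened sample string are made up for illustration and are not part of any Sun tool.

```python
def parse_counters(text):
    """Turn 'name value name value ...' text into a dict of integer counters.

    Assumption: every counter is a name followed by a non-negative integer,
    possibly wrapped across lines (as 'multixmt' / '6' is in the example).
    """
    tokens = text.split()
    counters = {}
    i = 0
    while i + 1 < len(tokens):
        name, value = tokens[i], tokens[i + 1]
        if value.isdigit():
            counters[name] = int(value)
            i += 2
        else:
            # Not a name/value pair; skip one token and try to resync.
            i += 1
    return counters


# A few fields copied from the eri0 example, including the wrapped line.
sample = """
ipackets 525571 ierrors 365 multircv 99205 multixmt
6 brdcstrcv 415863
defer_timer_exp 0 peak_attempt_cnt 0 jabber 0 no_tmds 0
rx_align_err 64 rx_crc_err 19 runt 40 toolong_pkts 279
"""

counters = parse_counters(sample)
if counters.get("jabber", 0) == 0:
    print("jabber counter is 0; the errors are elsewhere:")
    for name in ("rx_align_err", "rx_crc_err", "runt", "toolong_pkts"):
        print(f"  {name} = {counters[name]}")
```

Run against real output, a zero jabber count next to a pile of other nonzero error counters is the sort of thing you can hand straight back to support.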
:In this case it's tcp/ip.
:step 1) telnet to router
:step 2) ping some remote device on a fast link (like 2GB IP/Sonet)
:step 3) watch as returning tcp/ip telnet stream DOS's the sun.
:it is not the cisco ping that is DOS'ing the sun, it is the return stream
:of !!..!.!!!....!!!..!!!... (ad infinitum)
Ahhh, so it's just the return traffic from the Cisco printing out all
those !!..!.!!! stuff (corresponding to whatever it is the Cisco is
pinging) that causes all this? Nifty! I didn't think that the Cisco
could print that fast! I'm fairly certain it should rate-limit/sample
that output (unless some automated thingy actually cares about that
output coming from the Cisco).
you'd be surprised how fast a gsr can spit out streams of !.!..!..!
(30,000 pps before sun craps out. ;)
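To put that 30,000 pps figure in perspective, here is a back-of-the-envelope calculation in Python. The framing numbers are assumptions, not measurements from this setup: plain Ethernet, 20-byte IP and TCP headers with no options, one character of payload per packet, and a 1460-byte MSS for the coalesced case.

```python
import math

PPS = 30_000            # packet rate quoted above
PAYLOAD_BYTES = 1       # one '!' or '.' per segment
IP_TCP_HEADERS = 40     # assumed: 20-byte IP + 20-byte TCP, no options
ETH_OVERHEAD = 18       # assumed: 14-byte Ethernet header + 4-byte FCS
MIN_ETH_FRAME = 64      # Ethernet pads short frames up to 64 bytes
MSS = 1460              # assumed MSS if the stream were coalesced

# 1 + 40 + 18 = 59 bytes, which Ethernet pads to the 64-byte minimum.
frame_bytes = max(PAYLOAD_BYTES + IP_TCP_HEADERS + ETH_OVERHEAD, MIN_ETH_FRAME)

wire_bits_per_sec = PPS * frame_bytes * 8
payload_bytes_per_sec = PPS * PAYLOAD_BYTES

# Same payload packed into full-size segments instead.
coalesced_pps = math.ceil(payload_bytes_per_sec / MSS)

print(f"frame size on the wire: {frame_bytes} bytes")
print(f"wire rate: {wire_bits_per_sec / 1e6:.2f} Mbit/s")
print(f"useful payload: {payload_bytes_per_sec} bytes/s")
print(f"packets/s if coalesced to {MSS}-byte segments: {coalesced_pps}")
```

Under those assumptions the receiver handles tens of thousands of packets a second to take in about 30 KB/s of actual characters, where a coalesced stream would need only a couple of dozen packets.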
:the nagle comes into play in the tcp-stream not coalescing all the
:single char tcp/ip packets each with a single ! or . in it.
Makes perfect sense now that I get what the traffic is. As an aside,
the Nagle algorithm was designed with telnet explicitly in mind, per
RFC 896. But, a lot of folks these days use telnet for stuff apart
from interactive use, and I could see someone wanting to disable it
for performance' sake. For bare-bones stack implementations, Nagle
may not be there at all.
yep. It's just not turned on on routers by default, so this one caught
us a little bit by surprise when engineers were running a burn in
test in the lab on an OC-192 card.
(_usually_ you don't cream a router with lots of little packets via
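For anyone who has not met Nagle outside of router configs: on an ordinary host it is usually on by default, and an application turns it off per socket with the TCP_NODELAY option. A minimal Python sketch of checking and flipping that option follows. This shows the host-side socket API only, not how the Cisco is configured, and the helper name nagle_disabled is made up for illustration.

```python
import socket


def nagle_disabled(sock):
    """Return True if TCP_NODELAY is set, i.e. Nagle coalescing is off."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


# No connection is needed to read or set the option.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

default_state = nagle_disabled(sock)    # typically False: Nagle is on

# Setting TCP_NODELAY sends each small write right away instead of
# holding it to coalesce with later data while an ACK is outstanding.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
after_state = nagle_disabled(sock)

sock.close()

print("Nagle disabled by default:", default_state)
print("Nagle disabled after setsockopt:", after_state)
```

With the option set, a sender that writes one character at a time can put one character in each segment, which is the traffic pattern described above.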
:right. totally agreed. it should not cause the machine to totally lock up.
:(I specified wrong earlier, btw. Break still works, just nothing else does)
That makes it sound even more like an interrupt issue rather than some
overall system lock.
also to me.
:> In this particular case, if you're talking about ICMP, and there
:> really isn't a "jabber"/physical layer issue afoot, the idea is for
:getting that someone to not slap a 'jabber' label on things and
:dismiss it out of hand is where I am currently frustrated beyond
Beyond netstat -k, you can probably use lockstat or other kernel
profiling tools as I mentioned in my earlier post to give them a
good idea of where the bug really is. Interrupt issues aren't
always going to be cut and dried. There could be some particular
flavor of IOS, network adapter, media type, CPU, OS, etc. that
is more prone or less prone to the problem.
:well, yes, this was all quite accidental in the first place.
:The solution is really quite easy, don't disable nagle on the
:cisco in the first place. However, I'm much more concerned about
:the implications of a normal user being able to DOS the machine and
:Sun not caring enough to do due diligence to address the issue.
Judging from the number of times we've exchanged emails (I should
have asked for a network diagram sooner to help visualize this :) ),
sometimes it's not so easy. And "what is or isn't a DoS" can be a
grey line where reasonable people may differ. I could readily see
someone saying "if you point a stupid amount of traffic at something
it dies, have you considered just not doing that?".
yup. I've got plenty of ammo to throw at irritating and dubiously
self-righteous backline, but sometimes the only way to raise matters
above somebody who doesn't want to admit there is a problem, is
to provide a little community pressure to fix it. (even if it
isn't critical or may be hard to reproduce without appreciably
fast equipment on hand).
A DOS that makes a machine unusable is a DOS. Mis-categorizing it
(on their part) as jabber is wrong as well as condescending (left
that part out) and just plain irritating from a company that usually
takes operating system availability much more seriously.