Re: The cat came and stayed..

I'm going to ask some questions to clarify my spotty networking
knowledge. Essentially, you have routers connecting buildings "A" and
"B", and when you turn off routing and make them layer two devices
(bridging mode) things work as expected. To me this points to a layer
three problem: perhaps an IP conflict with the router, a machine
masquerading as the gateway (for instance, responding to ARP requests
for the gateway IP), or a bad route. I would start by looking for layer
three misconfigurations. Maybe a DHCP server is handing out a bad
gateway or some such. What happens when you traceroute between the
networks? Do you see extra hops? Are there specific hops where the
latency jumps?
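
To make the traceroute comparison concrete, here is a minimal Python
sketch that parses saved traceroute output, counts the hops, and flags
hops where the best round-trip time jumps. It assumes Unix-style
traceroute output; the addresses in SAMPLE, the 5 ms threshold, and the
function names are made up for illustration and do not come from the
network described in this thread.

```python
import re

# One RTT sample such as "0.412 ms" inside a traceroute hop line.
RTT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*ms")
# A hop line starts with the hop number; header lines do not match.
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")

# Hypothetical output, assumed format of Unix traceroute.
SAMPLE = """\
traceroute to 10.2.0.10 (10.2.0.10), 30 hops max, 60 byte packets
 1  10.1.0.1 (10.1.0.1)  0.412 ms  0.398 ms  0.377 ms
 2  10.1.0.254 (10.1.0.254)  0.913 ms  0.871 ms  0.902 ms
 3  10.2.0.1 (10.2.0.1)  48.120 ms  47.993 ms  51.204 ms
 4  10.2.0.10 (10.2.0.10)  48.501 ms  48.377 ms  49.010 ms
"""


def parse_traceroute(text):
    """Return a list of (hop_number, best_rtt_ms) tuples.

    best_rtt_ms is None for hops that did not answer ("* * *").
    """
    hops = []
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # header or blank line
        rtts = [float(value) for value in RTT_RE.findall(match.group(2))]
        hops.append((int(match.group(1)), min(rtts) if rtts else None))
    return hops


def latency_jumps(hops, threshold_ms=5.0):
    """Return hop numbers whose best RTT exceeds the previous answering
    hop's best RTT by more than threshold_ms (an arbitrary cutoff)."""
    jumps = []
    previous = 0.0
    for number, rtt in hops:
        if rtt is None:
            continue  # silent hop, nothing to compare
        if rtt - previous > threshold_ms:
            jumps.append(number)
        previous = rtt
    return jumps


if __name__ == "__main__":
    hops = parse_traceroute(SAMPLE)
    print("hop count:", len(hops))
    print("latency jumps at hops:", latency_jumps(hops))
```

On the sample this reports 4 hops and a latency jump at hop 3.
Comparing the hop count in routed mode against what the topology should
produce would show whether traffic is taking an unexpected path.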


On 3/28/07, WALI <hkhasgiwale@xxxxxxxxx> wrote:

By the time you have finished reading this, I am sure you will have come
across the most fascinating networking issue, haunted by our friendly ghost.

With reference to my earlier thread (Re: When cat comes chasing...), this
time the cat came and stayed. Having exhausted most of my resources, I
finally decided to involve our ISP, hoping that this would be the end of
it... but it was not to be.

So, to cut a long story short, our ISP had provided us with an EoATM
100 Mbps link between two locations, say A and B.

But ever since the line was provided, we have felt not only that we were
having intermittent problems that required a switch reset, but also that we
were not getting the right speed: the data transfer rates (FTP copy and
other stuff) were really not befitting a 100 Mbps link.

To make sure, this time the ISP engineer brought some equipment to our
premises and confirmed that the speed at Layer 2 is indeed 100 Mbps.

There are two Cisco routers across Sites A and B and two media converters at
each end converting fiber to UTP. The media converters are also set at
100 Mbps.

Now the strange thing is that when we configure the two routers (Sites A
and B) in 'bridging' mode and start a data transfer across the link, the
speed becomes incrementally faster (which is what should be normal at all
times). There is also another 100 Mbps link provided to us by the same ISP
between Buildings A and C, which works just fine, as it should.

The moment we put our routers at Sites A and B into routing mode, we suffer
delays and all data transfers slow down, even without bringing any
core/edge switches into the picture.

Various things have been done to reach some conclusion:

1. The IP router configurations have been reset and reduced to the bare
minimum needed, with ip cef enabled and all QoS commands disabled.
2. The configurations have been checked with all combinations of speed
Auto/100 and duplex Full/Auto, with the best results coming from 100/Full
duplex but still far below expectations.
3. The equipment that serves the link between Sites A and C has been
temporarily put between Sites A and B, with the same unsatisfactory results.
4. Earthing issues and electrical disruption in the room where the routers
are located have been looked into. The routers on both sides have been
changed to rule out hardware issues. We also tested the line with our
routers moved into another room, to rule out electrical disturbance of any
sort.

It seems that although Layer 2 shows us the full 100 Mbps, transfers at
Layer 3 and above are unable to provide the required service. Opening
applications across the two buildings is very slow, as most of our servers
reside at Site A with the user base at Site B.

Currently the ISP engineer has provided us with a patched pure fibre link
between Sites A and B, with no intervening ISP equipment, and we have
connected our two core switches in both buildings directly to the UTP
interface of the media converter, but that's not a permanent solution. The
ISP engineer is also trying hard to find this ghost problem. He says that he
has found no problems on his side and that the only thing in the middle is
an MPLS-enabled router. But even he is a bit baffled.

What else can we look at?

Thanks for taking time to read this whole ghost story. If you have read
this all, I am sure you won't stop thinking ;)

At 12:57 AM 3/24/2007 +0100, Antonin Kral wrote:
>Hi Wali,
>* WALI <hkhasgiwale@xxxxxxxxx> [2007-03-24 00:50] wrote:
> > Crazy Solution: I take out any patch cable and re-insert it, the problem
> > gets resolved. I reset any switch, the problem gets resolved. I disconnect
> > any uplink cable between the four switches or do an ARP reset through the
> > command line, the problem gets resolved for a couple of hours or even days.
>This sounds like a spanning tree problem in the network. Do you run
>STP? Take a look at the topology changes reported by STP. One more
>thing: this could happen because the CAM (switching) table of a
>particular switch is overflowing. Check that you are not running out of
>memory.
> Cheers,
> Antonin
>This List Sponsored by: Cenzic
>Need to secure your web apps?
>Cenzic Hailstorm finds vulnerabilities fast.
>Click the link to buy it, try it or download Hailstorm for FREE.

Buz Dale buz.dale@xxxxxxx
IT Security Specialist 1-888-875-3697 (In GA)
Office of Information and Instructional Technology
University System of Georgia
GMT -5:00
