Re: Possible AIM Hack?

On Wed, 15 Mar 2006 19:48:01 EST, Steven said:

later, 6 hours later, 5 days later, etc. Additionally, if some server that
gives a yea/nay is on a coffee + donut break -- what would that have to do
with kicking you offline after already being authenticated?

"A distributed system is one in which the failure of a computer you didn't even
know existed can render your own computer unusable." -- Leslie Lamport

When designing very large and complex systems, it gets harder and harder to
avoid designing into them all sorts of odd dependencies and cascading failure
modes. For instance, the last 3 times our modem pool terminal servers got
hosed up, it was due to the TACACS server not being able to contact our LDAP
server. And the connection timeouts to the LDAP server were caused by some
misbehaving software beating up on *another* one of our servers and making that
server create a flood of non-optimized LDAP queries (most LDAP server software
goes into severe oink mode when it has to do queries it doesn't have a pre-built
index for).

Got that? The terminal server got indigestion waiting for the TACACS server,
which was trying to talk to the LDAP server, but couldn't get a word in edgewise
because some OTHER server was spewing broken queries at the LDAP server.
Counting the user's machine, and the machine originating the broken query to the
other server, there were a total of *7* logical machines in the chain (actually
more, as several of these were really multiple machines behind a load balancer).
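
For anyone who wants to poke at that chain, here is a rough sketch of it as a "who can hurt whom" graph, with a breadth-first walk to list everything downstream of one failure. The node names are just labels I made up for the hops named above (the full count of 7 includes at least one hop not spelled out here), and blast_radius is a hypothetical helper, not anything we actually run.

```python
from collections import deque

# "X can hurt Y" edges. Labels are illustrative, not real hostnames.
CAN_HURT = {
    "misbehaving_software": ["other_server"],
    "other_server": ["ldap_server"],      # floods LDAP with unindexed queries
    "ldap_server": ["tacacs_server"],     # TACACS lookups time out
    "tacacs_server": ["terminal_server"], # terminal server waits on TACACS
    "terminal_server": ["user_machine"],  # dial-in users cannot log in
    "user_machine": [],
}

def blast_radius(origin, edges=CAN_HURT):
    """Return every node downstream of a failure at origin, in BFS order."""
    seen = {origin}
    queue = deque([origin])
    victims = []
    while queue:
        node = queue.popleft()
        for victim in edges.get(node, []):
            if victim not in seen:
                seen.add(victim)
                victims.append(victim)
                queue.append(victim)
    return victims

if __name__ == "__main__":
    for hop in blast_radius("misbehaving_software"):
        print(hop)
```

The point of writing it down this way is that the dial-in user shows up in the blast radius of a box they have never heard of, which is pretty much the Lamport quote in graph form.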

This sort of thing is just *loads* of fun to unsnarl at 10PM, when none of
the system architects are handy, but plenty of people are still trying to use
the modem pool... ;)

I'd not be at *all* surprised if an unexpected failure of AOL's auth server
caused the main AIM servers to hiccup when they got a new inbound request and
were unable to deal with the failure mode, dropping all the already-existing
sessions during the hiccup reset.
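
To be clear, I have no idea how AIM's servers are actually built. But as a toy illustration of the failure mode being guessed at, here is a sketch of a session server that treats "auth backend unreachable" as a problem for the one new request only, and leaves established sessions alone. SessionServer, AuthUnavailable, and the auth_check callable are all hypothetical names invented for this example.

```python
class AuthUnavailable(Exception):
    """Raised by the (hypothetical) auth backend when it cannot be reached."""

class SessionServer:
    def __init__(self, auth_check):
        # auth_check(user, password) -> bool, and may raise AuthUnavailable.
        self.auth_check = auth_check
        self.sessions = set()

    def login(self, user, password):
        try:
            ok = self.auth_check(user, password)
        except AuthUnavailable:
            # Fail this one request. Do NOT reset or touch self.sessions:
            # users who already authenticated should stay connected.
            return "try_again_later"
        if ok:
            self.sessions.add(user)
            return "ok"
        return "denied"

if __name__ == "__main__":
    backend_up = True

    def auth_check(user, password):
        if not backend_up:
            raise AuthUnavailable()
        return password == "secret"

    server = SessionServer(auth_check)
    print(server.login("alice", "secret"))
    backend_up = False                      # auth server goes on its donut break
    print(server.login("bob", "secret"))
    print(sorted(server.sessions))
```

The buggy version of this is the one where the exception handler (or the lack of one) takes the whole process through a reset, and everybody who was already logged in goes down with it.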

And to tie it all back into the INCIDENTS charter - one of the most common causes
of a corporate server getting exploited is when a hacker finds some code that
glues 2 server systems together, and the code isn't perfect. Of course, this
also means that it's usually very difficult to figure out what happened, which
is why you often see "The hackers had been in the system for at least 3 weeks
before they were detected".....
