Re: Possible AIM Hack?

On Wed, 15 Mar 2006 19:48:01 EST, Steven said:

> later, 6 hours later, 5 days later, etc. Additionally, if some server that
> gives a yea/nay is on a coffee + donut break -- what would that have to do
> with kicking you offline after already being authenticated?

"A distributed system is one in which the failure of a computer you didn't even
know existed can render your own computer unusable." -- Leslie Lamport

When designing very large and complex systems, it gets harder and harder to
avoid designing into them all sorts of odd dependencies and cascading failure
modes. For instance, the last 3 times our modem pool terminal servers got
hosed up, it was due to the TACACS server not being able to contact our LDAP
server. And the connection timeouts to the LDAP server were caused by some
misbehaving software beating up on *another* one of our servers and making that
server create a flood of non-optimized LDAP queries (most LDAP server software
goes into severe oink mode when it has to do queries it doesn't have a pre-built
index for).

Got that? The terminal server got indigestion waiting for the TACACS server,
which was trying to talk to the LDAP server, but couldn't get a word in edgewise
because some OTHER server was spewing broken queries at the LDAP server.
Counting the user's machine and the machine originating the broken queries to
the other server, there were a total of *7* logical machines in the chain (actually
more, as several of these were really multiple machines behind a load balancer).
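To make that chain a bit more concrete, here's a toy Python model. It's purely illustrative: the hop names, latencies and timeout values are made-up assumptions, not our actual config. Each hop waits on the one behind it, and if the answer doesn't come back within that hop's own timeout, it gives up and reports failure upstream. A slowdown at the far end (LDAP) then shows up as a login failure at the near end, even though the terminal server itself is perfectly healthy:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Hop:
    """One server in a dependency chain (toy model, values are hypothetical)."""
    name: str
    own_latency: float              # seconds of local work at this hop
    timeout: float = 0.0            # how long this hop waits on `downstream`
    downstream: Optional["Hop"] = None


def call(hop: Hop) -> Tuple[bool, float]:
    """Return (success, elapsed_seconds) for a request entering at `hop`."""
    if hop.downstream is None:
        # End of the chain: just do the local work.
        return True, hop.own_latency
    ok, elapsed = call(hop.downstream)
    if elapsed > hop.timeout:
        # Gave up waiting: the caller sees a failure after our timeout.
        return False, hop.own_latency + hop.timeout
    # Downstream answered in time; pass its verdict upstream.
    return ok, hop.own_latency + elapsed


def build_chain(ldap_latency: float) -> Hop:
    """terminal server -> TACACS -> LDAP, with made-up timeouts."""
    ldap = Hop("ldap", own_latency=ldap_latency)
    tacacs = Hop("tacacs", own_latency=0.01, timeout=5.0, downstream=ldap)
    return Hop("terminal-server", own_latency=0.01, timeout=10.0, downstream=tacacs)


print(call(build_chain(0.05)))  # healthy LDAP: success after roughly 0.07 s
print(call(build_chain(30.0)))  # LDAP bogged down: failure after roughly 5 s
```

Note that the user sees a failure after roughly the TACACS timeout, not after the LDAP delay, which is part of why the symptom and the cause end up looking so unrelated.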

This sort of thing is just *loads* of fun to unsnarl at 10PM, when none of
the system architects are handy, but plenty of people are still trying to use
the modem pool... ;)

I'd not be at *all* surprised if an unexpected failure of AOL's auth server
caused the main AIM servers to hiccup when they got a new inbound request and
were unable to deal with the failure mode, dropping all the already-existing
sessions when the hiccup forced a reset.
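As a sketch of the kind of failure mode I mean (a guess, not anything known about AOL's code; the class and function names are made up), compare a server that treats an auth-backend error as one failed login with one where the error effectively restarts the process and takes every in-memory session with it:

```python
class AuthUnavailable(Exception):
    """The (hypothetical) auth backend could not be reached."""


class ChatServer:
    def __init__(self, auth, isolate_failures: bool):
        self.auth = auth                      # callable: username -> bool
        self.isolate_failures = isolate_failures
        self.sessions = set()                 # users currently signed on

    def serve_login(self, user: str) -> bool:
        """Top-level handler for one inbound sign-on request."""
        try:
            ok = self.auth(user)              # may raise AuthUnavailable
        except AuthUnavailable:
            if not self.isolate_failures:
                # Unanticipated failure mode: the process falls over and
                # restarts, losing all in-memory session state.
                self.sessions.clear()
            return False                      # this sign-on fails either way
        if ok:
            self.sessions.add(user)
        return ok


def flaky_auth(user: str) -> bool:
    # Pretend the auth server goes away just as a new user signs on.
    if user == "newcomer":
        raise AuthUnavailable("auth server unreachable")
    return True


fragile = ChatServer(flaky_auth, isolate_failures=False)
robust = ChatServer(flaky_auth, isolate_failures=True)
for server in (fragile, robust):
    for user in ("alice", "bob", "newcomer"):
        server.serve_login(user)

print(sorted(fragile.sessions))  # []
print(sorted(robust.sessions))   # ['alice', 'bob']
```

Same backend failure, same inbound request; the only difference is whether the error stays contained to the one request that triggered it.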

And to tie it all back into the INCIDENTS charter - one of the most common causes
of a corporate server getting exploited is when a hacker finds some code that
glues 2 server systems together, and the code isn't perfect. Of course, this
also means that it's usually very difficult to figure out what happened, which
is why you often see "The hackers had been in the system for at least 3 weeks
before they were detected".....
