Strange IIS 5 problem with client certificates

From: rlusian (rlusian_at_nospam.nospam)
Date: 04/06/05


Date: Wed, 6 Apr 2005 10:53:01 -0700

Hello!

We are having a strange IIS 5.0 problem involving client certificates.

The situation is this:

We have a system with a central server running Win2K and IIS 5.0, with the
web site secured using client certificates and a CTL. Remote Linux-based
embedded devices running OpenSSL and Tcl connect to the central server via
HTTPS, authenticate themselves via client certificate, and then POST data to
the server.

The PKI is all Microsoft. We used OpenSSL to convert the certificates to a
form that it can use. The 'bootstrap' certificate has been in use for over a
year.

Many of these sites exist and work properly, but one site behaves
differently from the others. Here are the symptoms:

* When the remote device tries to contact the web site using HTTPS and the
bootstrap client certificate, it fails. IIS breaks off communications and
the device gets a socket error.

* The remote device can connect if certificates are ignored, but not if they
are accepted or required.

* The remote device can connect to other servers using the same certificate
with certificates required.

* Multiple remote devices exhibit the exact same behavior.

* Windows clients (IE and Firefox) can connect to the web site using the
same certificate. Both the Windows clients and the remote device are on the
Internet, so any firewall settings apply to both.

* Multiple people have tweaked the IIS settings and manually compared the
data in the IIS MMC console with a working installation to no avail.

* When the device fails to connect, nothing appears in the IIS logs (whereas
an attempt that fails because of a certificate problem does get logged).

* We have seen such behavior one other time; it was fixed by reinstalling
the OS and the server application. After reinstallation, the problem could
still be induced by turning off Anonymous Access, even though in that
particular case we were using many-to-one certificate mapping and thus were
never using anonymous access. This behavior has not been repeatable. In the
first installation, the problem occurred whether anonymous access was enabled
or not, and other servers work fine with many-to-one cert mapping and
anonymous access turned off.

* I ran IISDump on both servers and looked at the system XML dump and the
metabase XML dump. The DLLs in the system dump match up (same version
numbers) except for explainable differences (such as the presence of other
applications). Nothing in the metabase XML dump looked particularly
problematic.

* If I load SSL Diagnostics 1.0 (latest version: 3/30/05) on the problematic
server and a working server, turn on the Client Certificate Monitor, and then
successfully access the web page using IE, CCM logs the certificate on the
working server but not on the problematic server.

* An updated version of the remote device (with the latest OpenSSL
libraries) fails.

* An updated version with the latest OpenSSL and the TLS Tcl library
succeeds. Unfortunately, we don't have the option to update all the embedded
devices in a timely manner. The versions that don't work are OpenSSL 0.9.6b
and TLS library 1.41. The versions that do are OpenSSL 0.9.7e and TLS 1.50.
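
To separate a dropped connection from an ordinary certificate rejection, a
probe from a third client stack may help. The following is a minimal Python
sketch, not the Tcl code running on the devices. The names probe and
classify_failure, and the host and file names in the usage comment, are
hypothetical. It also assumes a Python whose OpenSSL build still permits the
older protocol versions that IIS 5.0 speaks; many current builds do not, in
which case every probe reports a handshake failure for an unrelated reason.

```python
import socket
import ssl


def classify_failure(exc):
    """Map an exception from the probe to the stage where the connection died."""
    # The peer closed the socket mid-handshake: the symptom the devices report.
    if isinstance(exc, (ssl.SSLEOFError, ConnectionResetError,
                        ConnectionAbortedError, BrokenPipeError)):
        return "connection-dropped"
    # Any other TLS-level error (alert from the server, protocol mismatch, ...).
    if isinstance(exc, ssl.SSLError):
        return "tls-handshake"
    if isinstance(exc, socket.timeout):
        return "timeout"
    if isinstance(exc, OSError):
        return "tcp-connect"
    return "unknown"


def probe(host, port=443, certfile=None, keyfile=None, timeout=10.0):
    """Attempt one handshake and return (stage, detail).

    certfile/keyfile are PEM files; pass None to connect without a client
    certificate, which mirrors the "certificates ignored" case.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Diagnostic use only: server certificate checks are switched off so that
    # only the client-certificate path is being exercised.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return "tcp-connect", repr(exc)
    try:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return "ok", tls.version()
    except Exception as exc:
        raw.close()
        return classify_failure(exc), repr(exc)


# Hypothetical usage (placeholder host and file names):
#   print(probe("server.example", certfile="bootstrap.pem", keyfile="bootstrap.key"))
#   print(probe("server.example"))   # no client certificate
```

Running the probe with and without the client certificate against both the
problematic server and a working one would show whether the connection is
dropped only when a certificate is presented.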

My inferences are as follows:

The fact that the remote device can contact other servers using the same
certificate implies that the certificate on the device has not been corrupted.

The fact that Windows clients can contact that server with the same
certificate implies that the certificate validation process is functioning
properly.

The fact that an updated device with the latest OpenSSL fails, while one with
the latest OpenSSL and the latest TLS Tcl library succeeds, implies that
there is something in the SSL handshake as managed by the older TLS library
that the problematic installation of IIS doesn't like. This in turn suggests
that some IIS setting on that server is not configured properly.

On the other hand, the fact that SSL Diagnostics's Client Certificate
Monitor doesn't work on the problematic installation of IIS implies that
there may be some sort of problem with the CryptoAPI system.

The experience with the previous occurrence - where installation #1 first
failed at all times and then installation #2 succeeded only when anonymous
access was enabled - is rather puzzling and could point to either a bad
installation or a configuration problem or both.
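
The observations behind these inferences can be tabulated and compared one
factor at a time. The sketch below (Python, illustrative only) encodes the
device results reported above and, for each failure, lists the successful
runs that differ from it in exactly one factor. Three assumptions are labeled
in the code: "latest OpenSSL" is read as 0.9.7e, the device updated with only
OpenSSL is assumed to still carry TLS 1.41, and "required" stands in for
"accepted or required". The Windows clients are left out because they do not
use OpenSSL or the Tcl TLS library.

```python
from collections import namedtuple

Obs = namedtuple("Obs", "openssl tcl_tls server cert_mode ok")

OBSERVATIONS = [
    Obs("0.9.6b", "1.41", "problem", "ignored", True),
    # Assumption: "required" stands in for "accepted or required".
    Obs("0.9.6b", "1.41", "problem", "required", False),
    Obs("0.9.6b", "1.41", "other", "required", True),
    # Assumptions: "latest OpenSSL" is 0.9.7e, and the TLS library is still 1.41.
    Obs("0.9.7e", "1.41", "problem", "required", False),
    Obs("0.9.7e", "1.50", "problem", "required", True),
]

FACTORS = ("openssl", "tcl_tls", "server", "cert_mode")


def single_factor_fixes(failure, observations):
    """Return (factor, value) pairs for successes one factor away from failure."""
    fixes = []
    for obs in observations:
        if not obs.ok:
            continue
        changed = [f for f in FACTORS if getattr(obs, f) != getattr(failure, f)]
        if len(changed) == 1:
            fixes.append((changed[0], getattr(obs, changed[0])))
    return fixes


for obs in OBSERVATIONS:
    if not obs.ok:
        print(obs, "->", single_factor_fixes(obs, OBSERVATIONS))
```

For the original device configuration, changing either the server or the
certificate mode is enough to make the connection succeed. For the device
with only OpenSSL upgraded, the one single-factor change that succeeds is
the newer TLS library, which is consistent with the inference that the older
TLS library's handshake is what this IIS installation objects to.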

We are going to analyze the Tcl code in more detail, but we hope that an IIS
guru can point us in the right direction and help us either fix the server or
prevent such problems in the future. If you have any questions or wish us to
gather more data, please let us know. Sorry we don't have more data for you,
but troubleshooting SSL is tricky and the problem is rather subtle.

Any pointers or suggestions would be greatly appreciated.


