Those SSE2 patches ;-)

From: Tom St Denis (tom_at_securescience.net)
Date: 07/14/04


Date: Wed, 14 Jul 2004 11:04:50 GMT


[This post contains 50% post-consumer recyclable materials and 50% pure
Canadian outback handmade oak plugging action]

A friend of mine from PeerSec [dudes who make MatrixSSL] sent me this link
[a bit dated I guess]

http://www.arctic.org/~dean/crypto/rsa.html

So I took a look. My SSE2 patched LTM is not only still faster [by nearly
3M cycles on 1024-bit RSA on a P4] but the patches are much cleaner.

For example, they unroll their SSE2 loop and use ugly ugly perl statements
to generate the asm. I simply added a few dozen lines per file without the
ugly unrolling, etc... Actually, unrolling it manually makes it slower
[GCC will unroll it itself though].
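
[To make the "no manual unrolling" point concrete, here is a minimal sketch
of the kind of multiply-accumulate inner loop a bignum library spends its
time in. It is NOT the actual LTM patch: the name mul_add_row, the 32-bit
digit size and the plain C form are assumptions for illustration only. The
32x32->64 multiply in the loop body is the operation SSE2's PMULUDQ
provides; the loop is left rolled so the compiler can decide on unrolling.]

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from LTM or OpenSSL.
 * Computes acc[0..n-1] += a[0..n-1] * b over 32-bit digits
 * (least significant digit first) and returns the final carry digit. */
static uint32_t mul_add_row(uint32_t *acc, const uint32_t *a, size_t n, uint32_t b)
{
    uint32_t carry = 0;

    /* Deliberately not unrolled by hand; the compiler is free to do it. */
    for (size_t i = 0; i < n; i++) {
        /* 32x32->64 product plus two 32-bit addends cannot overflow 64 bits:
         * (2^32-1)^2 + 2*(2^32-1) == 2^64 - 1. */
        uint64_t t = (uint64_t)a[i] * b + acc[i] + carry;
        acc[i] = (uint32_t)t;          /* low digit stays in place */
        carry  = (uint32_t)(t >> 32);  /* high digit moves to the next column */
    }
    return carry;
}
```

[Calling mul_add_row once per digit of the second operand, shifting acc by
one digit each time, gives plain schoolbook multiplication; a real library
would layer comba or Montgomery tricks on top of a loop like this.]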

The perl stuff aside [why? they could just use NASM if portable assembler
[hehe] was a concern] they approach bignum math from all the wrong angles.
Hence my math-fu is faster. Muahahaha.

The PeerSec guys recently [on my box I might add] tested the SSE2 patches in
their MatrixSSL. From what they told me they can now perform session
creation [from scratch] 37% faster than with OpenSSL on the P4. So now you
too can have a ~50KB SSL library that's faster than OpenSSL muahahahaha.

Tom