Re: Another SHA2 implementation

From: Olivier Gay (olivier.gay_at_a3.epfl.ch)
Date: 05/02/05


Date: Mon, 2 May 2005 09:11:38 +0200


> LibTomCrypt [with what will be 1.03], ICC v8 with "-O3 -xP -ip" I get
>
> sha224 : Process at 26
> sha256 : Process at 26
> sha384 : Process at 58
> sha512 : Process at 58
>
> Cycles/byte @ 4KiB blocks on a P4 Prescott processor.

I tested libtomcrypt 1.02 (the latest public release) with my own
benchmarking tool and obtained the following cycles/byte
(icc with the -O3 -xP -ip options):

SHA-256: 26.57
SHA-384: 61.72
SHA-512: 61.70

(My benchmarking tool tries to report the minimum value, doesn't use cpuid
serialization, and tries to avoid cache misses.)
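
A minimal sketch of a harness along those lines, not the actual tool used
for the numbers above. The names hash_fn, read_ticks, min_ticks,
ticks_per_byte and toy_checksum are all hypothetical, and toy_checksum is
only a stand-in workload, not a hash. It assumes GCC or Clang; on x86 it
reads the TSC without a serializing cpuid, and on other targets it falls
back to a nanosecond clock, so the result there is not in cycles.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical shape of the function under test. */
typedef void (*hash_fn)(const unsigned char *msg, size_t len,
                        unsigned char *digest);

/* Read a tick counter: the TSC on x86 (no cpuid serialization),
 * otherwise a monotonic clock in nanoseconds (not cycles). */
static uint64_t read_ticks(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    return __builtin_ia32_rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Time `runs` calls and keep the minimum. One untimed warm-up call
 * pulls code and data into the cache before measuring. */
uint64_t min_ticks(hash_fn fn, const unsigned char *msg, size_t len,
                   unsigned char *digest, int runs)
{
    uint64_t best = UINT64_MAX;
    fn(msg, len, digest);                 /* warm-up, not timed */
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = read_ticks();
        fn(msg, len, digest);
        uint64_t t1 = read_ticks();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Convert a tick count for one call into ticks per input byte. */
double ticks_per_byte(uint64_t ticks, size_t len)
{
    assert(len > 0);
    return (double)ticks / (double)len;
}

/* Stand-in workload: byte sum modulo 256 written to digest[0]. */
void toy_checksum(const unsigned char *msg, size_t len,
                  unsigned char *digest)
{
    unsigned char sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (unsigned char)(sum + msg[i]);
    digest[0] = sum;
}
```

To time a real hash, wrap it in a function matching hash_fn and call
min_ticks with a 4 KiB buffer, then pass the result to ticks_per_byte.
Taking the minimum rather than the mean discards runs disturbed by
interrupts or cold caches.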

If I add the -ip option when compiling my code, I get 27.05 cycles/byte for
SHA-256, but I then lose a few cycles/byte on SHA-384/SHA-512.

> So we're close. The difference on SHA-512 could be code related but
> also could be processor related. If you're using a Northwood core that
> may make the difference...

By disabling my UNROLL_LOOPS option I can improve SHA-384/SHA-512 from 97.97
to 83.76 cycles/byte. I think this is code- and cache-related: my object
code is too big for the cache, so accesses miss it. I will try to find a
tradeoff that still unrolls the loops but also fits in the cache; the
results for SHA-384/SHA-512 should then be more competitive.
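
One possible shape for such a tradeoff is partial unrolling, sketched below
under stated assumptions. This is not the code discussed in this post:
toy_round, mix_rolled and mix_unrolled8 are hypothetical names, the round
body is a toy rotate-and-add rather than a SHA-512 round, and the unroll
factor of 8 is arbitrary. Only the round count of 80 matches SHA-512.

```c
#include <assert.h>
#include <stdint.h>

#define ROUNDS 80   /* SHA-384/SHA-512 round count; the round body is a toy */

static_assert(ROUNDS % 8 == 0, "unroll factor must divide the round count");

/* Toy round: rotate left by 7 and add a constant. */
static uint64_t toy_round(uint64_t s, uint64_t k)
{
    return ((s << 7) | (s >> 57)) + k;
}

/* Fully rolled: smallest code, one loop branch per round. */
uint64_t mix_rolled(uint64_t s, const uint64_t k[ROUNDS])
{
    for (int i = 0; i < ROUNDS; i++)
        s = toy_round(s, k[i]);
    return s;
}

/* Unrolled by 8: one loop branch per 8 rounds, with a loop body
 * one tenth the size of a full 80-round unroll. */
uint64_t mix_unrolled8(uint64_t s, const uint64_t k[ROUNDS])
{
    for (int i = 0; i < ROUNDS; i += 8) {
        s = toy_round(s, k[i + 0]);
        s = toy_round(s, k[i + 1]);
        s = toy_round(s, k[i + 2]);
        s = toy_round(s, k[i + 3]);
        s = toy_round(s, k[i + 4]);
        s = toy_round(s, k[i + 5]);
        s = toy_round(s, k[i + 6]);
        s = toy_round(s, k[i + 7]);
    }
    return s;
}
```

Both functions compute the same value; which one is faster on a given
processor has to be measured, since it depends on code size relative to the
cache and on how the compiler schedules the unrolled body.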

Olivier