Re: Another SHA2 implementation
From: Olivier Gay (olivier.gay_at_a3.epfl.ch)
Date: 05/02/05
- Next message: abidik_gubi_at_hotmail.com: "piccadilly_75641"
- Previous message: gregofiesh_at_yahoo.com: "Re: ECC encoding?"
- Maybe in reply to: Paul Rubin: "Re: Another SHA2 implementation"
- Next in thread: Ilmari Karonen: "Re: Another SHA2 implementation"
Date: Mon, 2 May 2005 09:11:38 +0200
> LibTomCrypt [with what will be 1.03], ICC v8 with "-O3 -xP -ip" I get
>
> sha224 : Process at 26
> sha256 : Process at 26
> sha384 : Process at 58
> sha512 : Process at 58
>
> Cycles/byte @ 4KiB blocks on a P4 Prescott processor.
I tested LibTomCrypt 1.02 (the latest public release) with my own benchmarking
tool and obtained the following, in cycles/byte (icc with the -O3 -xP -ip
options):
SHA-256: 26.57
SHA-384: 61.72
SHA-512: 61.70
(my benchmarking tool tries to report the minimum value, doesn't use cpuid
serialization, and tries to avoid cache misses)
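
To illustrate what "report the min value" and "avoid cache misses" can look like in practice, here is a minimal sketch in C. It is not the benchmarking tool described above. All names (`min_ticks`, `ticks_now`, `dummy_work`, `ticks_per_byte`) are hypothetical, the workload is a placeholder rather than a hash function, and `clock()` is used only as a portable stand-in: a real cycles/byte figure would need the CPU's timestamp counter instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t (*tick_fn)(void);
typedef void (*work_fn)(const unsigned char *buf, size_t len);

/* Assumption: portable stand-in tick source. clock() counts clock ticks,
 * not CPU cycles; swap in a cycle counter to get cycles/byte. */
static uint64_t ticks_now(void)
{
    return (uint64_t)clock();
}

/* Hypothetical workload standing in for a hash update call. */
static volatile unsigned char sink;

static void dummy_work(const unsigned char *buf, size_t len)
{
    unsigned char acc = 0;
    for (size_t i = 0; i < len; i++)
        acc ^= buf[i];
    sink = acc; /* volatile store keeps the loop from being optimized away */
}

/* Time `work` over `runs` runs and return the smallest elapsed tick count.
 * The minimum filters out runs disturbed by interrupts or scheduling. */
static uint64_t min_ticks(tick_fn now, work_fn work,
                          const unsigned char *buf, size_t len, int runs)
{
    uint64_t best = UINT64_MAX;

    work(buf, len); /* untimed warm-up so the timed runs start with warm caches */

    for (int i = 0; i < runs; i++) {
        uint64_t t0 = now();
        work(buf, len);
        uint64_t dt = now() - t0;
        if (dt < best)
            best = dt;
    }
    return best;
}

/* Convert an elapsed tick count into ticks per processed byte. */
static double ticks_per_byte(uint64_t ticks, size_t len)
{
    return (double)ticks / (double)len;
}
```

With a 4 KiB buffer, a call such as `ticks_per_byte(min_ticks(ticks_now, dummy_work, buf, 4096, 100), 4096)` gives the per-byte cost of the best run. Like the tool described above, this sketch does no serialization around the timer reads.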
If I add the -ip option when compiling my code, I get 27.05 cycles/byte for
SHA-256, but then I lose a few cycles for SHA-384/SHA-512.
> So we're close. The difference on SHA-512 could be code related but
> also could be processor related. If you're using a Northwood core that
> may make the difference...
By disabling my UNROLL_LOOPS option I can improve SHA-384/SHA-512 performance
from 97.97 to 83.76 cycles/byte. I think this is code and cache related: my
object code is too big for the cache, so accesses don't hit the cache. I will
try to find a tradeoff so that I can still unroll loops but also hit the
cache; the results for SHA-384/SHA-512 should then be more competitive.
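
One possible middle ground between a fully rolled loop and a fully unrolled one is partial unrolling. The sketch below is only an illustration, not the code discussed in this thread. It applies the idea to the SHA-384/SHA-512 message schedule from FIPS 180-4, and the function names and the unroll factor of 4 are assumptions. Both versions compute the same 80 words; the second trades a little extra code size for fewer loop iterations, without the size of a fully unrolled loop.

```c
#include <stdint.h>

/* Rotate a 64-bit word right by n bits (0 < n < 64). */
static inline uint64_t rotr64(uint64_t x, unsigned n)
{
    return (x >> n) | (x << (64 - n));
}

/* Small sigma functions of SHA-384/SHA-512 (FIPS 180-4). */
static inline uint64_t ssig0(uint64_t x)
{
    return rotr64(x, 1) ^ rotr64(x, 8) ^ (x >> 7);
}

static inline uint64_t ssig1(uint64_t x)
{
    return rotr64(x, 19) ^ rotr64(x, 61) ^ (x >> 6);
}

/* One message-schedule step: W[t] from W[t-2], W[t-7], W[t-15], W[t-16]. */
#define EXPAND(w, t) \
    ((w)[(t)] = ssig1((w)[(t) - 2]) + (w)[(t) - 7] + \
                ssig0((w)[(t) - 15]) + (w)[(t) - 16])

/* Rolled version: smallest code, one step per iteration. */
static void schedule_rolled(uint64_t w[80])
{
    for (int t = 16; t < 80; t++)
        EXPAND(w, t);
}

/* Partially unrolled version (hypothetical factor of 4): four steps per
 * iteration. The 64 steps from t = 16 to t = 79 divide evenly by 4. */
static void schedule_unrolled4(uint64_t w[80])
{
    for (int t = 16; t < 80; t += 4) {
        EXPAND(w, t);
        EXPAND(w, t + 1);
        EXPAND(w, t + 2);
        EXPAND(w, t + 3);
    }
}
```

Which unroll factor actually performs best depends on the processor and compiler, so it would have to be measured rather than assumed.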
Olivier