Re: RC4 on AMD64

From: Arnaud Carré (arnaud.carreNOSPAM_at_freesurf.fr)
Date: 11/03/04


Date: Wed, 3 Nov 2004 17:28:34 +0100


Hey, ultra-optimisation rules, I agree!

But, as always with "this is the fastest routine in the world" claims, the
test protocol should be described carefully.

First of all, I looked at the asm routine, and there is *no* special AMD
feature used, except that the cipher is XORed 8 bytes at a time. That could
be done on Intel since the Pentium 3 using SSE instructions. (You can even
write 16 bytes at a time on the Intel P4.)
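
To make the "8 bytes at a time" point concrete, here is a minimal sketch in
portable C rather than the asm from the article. It assumes the keystream
has already been generated into a separate buffer; the function name
xor_wide and that layout are illustrative only and are not taken from the
routine under discussion.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XOR `len` bytes of `key` into `buf`, eight bytes per step where possible.
 * Hypothetical helper: assumes the keystream is already sitting in `key`. */
void xor_wide(uint8_t *buf, const uint8_t *key, size_t len)
{
    size_t i = 0;

    /* Wide part: one 64-bit XOR per eight bytes. memcpy avoids unaligned
     * access problems; compilers normally turn it into plain loads/stores. */
    for (; i + 8 <= len; i += 8) {
        uint64_t b, k;
        memcpy(&b, buf + i, 8);
        memcpy(&k, key + i, 8);
        b ^= k;
        memcpy(buf + i, &b, 8);
    }

    /* Tail: whatever is left (fewer than eight bytes), one byte at a time. */
    for (; i < len; i++)
        buf[i] ^= key[i];
}
```

Nothing here is vendor specific: any CPU with a 64-bit integer or vector
register can do the wide part, and a 128-bit register would simply double
the step.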

And second, and far more important, the protocol runs RC4 on a 1024-byte
buffer, several times. If you are a PC optimizer, you know there is ONLY one
thing you can optimize on these shitty machines: RAM access. Writing data to
the same 1024-byte area uses only the primary cache, and it's Faaaaaaasssttt !!

Just use the same routine with a 128 MB RAM buffer and you'll see the
bytes/sec speed decrease a lot!!
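
The comparison is easy to try. Below is a rough sketch in plain C, assuming
a textbook RC4 rather than the asm routine from the article; the names
rc4_state, rc4_init, rc4_crypt and rc4_mb_per_s, the key string and the
clock()-based timing are all choices made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef struct { uint8_t s[256]; uint8_t i, j; } rc4_state;

/* Standard RC4 key schedule. */
void rc4_init(rc4_state *st, const uint8_t *key, size_t key_len)
{
    uint8_t j = 0;
    for (int n = 0; n < 256; n++)
        st->s[n] = (uint8_t)n;
    for (int n = 0; n < 256; n++) {
        j = (uint8_t)(j + st->s[n] + key[n % key_len]);
        uint8_t t = st->s[n]; st->s[n] = st->s[j]; st->s[j] = t;
    }
    st->i = 0;
    st->j = 0;
}

/* Encrypt/decrypt `len` bytes in place, one keystream byte per step. */
void rc4_crypt(rc4_state *st, uint8_t *buf, size_t len)
{
    uint8_t i = st->i, j = st->j;
    for (size_t n = 0; n < len; n++) {
        i = (uint8_t)(i + 1);
        j = (uint8_t)(j + st->s[i]);
        uint8_t t = st->s[i]; st->s[i] = st->s[j]; st->s[j] = t;
        buf[n] ^= st->s[(uint8_t)(st->s[i] + st->s[j])];
    }
    st->i = i;
    st->j = j;
}

static volatile uint8_t sink; /* keeps the compiler from dropping the work */

/* Run RC4 over a buffer of `buf_len` bytes again and again until at least
 * `total_bytes` have been processed; return MB/s (10^6 bytes per second).
 * Pick `total_bytes` as a multiple of `buf_len` for an exact figure.
 * Returns -1.0 if allocation fails, 0.0 if the run was too short to time. */
double rc4_mb_per_s(size_t buf_len, size_t total_bytes)
{
    static const uint8_t key[] = "benchmark key"; /* arbitrary test key */
    uint8_t *buf = calloc(buf_len, 1);
    if (buf == NULL)
        return -1.0;

    rc4_state st;
    rc4_init(&st, key, sizeof key - 1);

    clock_t t0 = clock();
    for (size_t done = 0; done < total_bytes; done += buf_len)
        rc4_crypt(&st, buf, buf_len);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    sink = buf[buf_len - 1];
    free(buf);
    return secs > 0.0 ? (double)total_bytes / 1e6 / secs : 0.0;
}
```

Calling rc4_mb_per_s(1024, (size_t)1 << 27) and then
rc4_mb_per_s((size_t)128 << 20, (size_t)128 << 20) pushes the same 128 MB
through the cipher, first inside a cache-resident 1 KB area and then across
a buffer far larger than any cache. How big the gap turns out to be depends
on the machine, so a published figure should state which case was measured.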

So my conclusion is that there are simply no "nice optimizing tips" in that
RC4 routine.

Arnaud

"Jose Castejon-Amenedo" <Jose.Castejon-Amenedo@hp.com> wrote in message
news:pan.2004.11.02.16.15.58.64546@hp.com...
> People in this group might be interested in this:
>
>
> http://developers.slashdot.org/developers/04/11/02/050232.shtml?tid=93&tid=142&tid=185&tid=8
>
> This result, while nice, is not very impressive. HP's RC4
> implementation for IA64 delivers 381 MB/s - on a 1.3 GHz Madison. Use
> a beefier, 1.7 GHz IA64 CPU and the throughput is 499 MB/s. Just setting
> the record straight here.
>
>
>
>

