Re: Salsa20 altivec timings

tomstdenis_at_gmail.com
Date: 28 Sep 2005 05:57:30 -0700


Paul Rubin wrote:
> "xmath" <xmath.news@gmail.com> writes:
> > The ppc 7450 can execute those operations in a single cycle each, even
> > though they are all data-dependent on the immediately preceding
> > operation.
>
> Yeah, this is the problem, XMM has much more latency. I also think it
> doesn't really have four parallel execution paths. It works on 64
> bits per cycle underneath, i.e. it's just plain slower. At least I
> think this is the case for multiplication-using instructions.
>
> Also, the Athlon 64 does have 16 XMM registers, but the regular
> x86s only have eight. But I think the obvious XMM code uses seven.

Stop riding on x86... at least it can do 32x32 multiplies ;-)

hehehehe

tom
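
For anyone reading the thread later: a minimal C sketch of the kind of dependency chain under discussion, assuming "those operations" in xmath's post means the add, rotate and xor steps of the Salsa20 quarter-round. The function names here (rotl32, salsa20_quarterround, salsa20_quarterround_selftest) are made up for illustration and are not taken from anyone's AltiVec or XMM code. Each line uses the result of the line before it, which is why per-operation latency matters more than raw throughput.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits; n must be in 1..31. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}

/* Salsa20 quarter-round on four 32-bit words, updated in place.
 * Every step consumes the word produced by the step before it,
 * so the add, rotate and xor operations form one serial chain. */
static void salsa20_quarterround(uint32_t y[4])
{
    y[1] ^= rotl32(y[0] + y[3], 7);
    y[2] ^= rotl32(y[1] + y[0], 9);
    y[3] ^= rotl32(y[2] + y[1], 13);
    y[0] ^= rotl32(y[3] + y[2], 18);
}

/* Returns 1 if the quarter-round reproduces the example from the
 * Salsa20 specification for the input (1, 0, 0, 0), else 0. */
static int salsa20_quarterround_selftest(void)
{
    uint32_t y[4] = { 0x00000001u, 0u, 0u, 0u };
    salsa20_quarterround(y);
    return y[0] == 0x08008145u && y[1] == 0x00000080u &&
           y[2] == 0x00010200u && y[3] == 0x20500000u;
}
```

A vector version would run four such quarter-rounds side by side, one per 32-bit lane, but the chain inside each lane stays serial, so this scalar form is enough to show where the latency goes.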