Re: Salsa20 altivec timings
tomstdenis_at_gmail.com
Date: 09/28/05
- Next message: Paul Rubin: "Re: Salsa20 altivec timings"
- Previous message: tomstdenis_at_gmail.com: "Re: Salsa20 altivec timings"
- In reply to: Paul Rubin: "Re: Salsa20 altivec timings"
- Next in thread: Twittering One: "Re: Salsa20 altivec timings"
- Reply: Twittering One: "Re: Salsa20 altivec timings"
Date: 28 Sep 2005 05:57:30 -0700
Paul Rubin wrote:
> "xmath" <xmath.news@gmail.com> writes:
> > The ppc 7450 can execute those operations in a single cycle each, even
> > though they are all data-dependent on the immediately preceding
> > operation.
>
> Yeah, this is the problem, XMM has much more latency. I also think it
> doesn't really have four parallel execution paths. It works on 64
> bits per cycle underneath, i.e. it's just plain slower. At least I
> think this is the case for multiplication-using instructions.
>
> As well: the Athlon 64 does have 16 XMM registers, but the regular
> x86's only have eight. But I think the obvious XMM code uses seven.
Stop riding on x86... at least it can do 32x32 multiplies ;-)
hehehehe
tom
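
For context on the dependency chain xmath describes in the quote above: the
Salsa20 quarter-round is a sequence of add, rotate, and xor steps in which each
step consumes the result of the one just before it. The following is a minimal
scalar C sketch, not the AltiVec or XMM code discussed in this thread. The
function names rotl32 and salsa20_quarterround are illustrative choices, not
names from any particular implementation.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/*
 * Salsa20 quarter-round on four 32-bit words, updated in place.
 * Each line is add -> rotate -> xor, and every line needs the word
 * produced by the previous line, so the steps cannot overlap within
 * one quarter-round. A vector version would run four such chains
 * side by side, one per lane.
 */
static void salsa20_quarterround(uint32_t y[4])
{
    y[1] ^= rotl32(y[0] + y[3], 7);
    y[2] ^= rotl32(y[1] + y[0], 9);
    y[3] ^= rotl32(y[2] + y[1], 13);
    y[0] ^= rotl32(y[3] + y[2], 18);
}
```

Because every step waits on the previous result, per-instruction latency (and
not just throughput) sets the speed of a single chain, which is the point being
argued in the quoted text.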