Re: Salsa20 altivec timings
From: Milan VXdgsvt (milan_vxdgsvt_at_seznam.cz)
Date: 09/28/05
- Next message: Milan VXdgsvt: "Re: Re-rolled Salsa20 function"
- Previous message: Crypto_at_S.M.S: "Re: How regularly is the GnuPG source code examined?"
- In reply to: xmath: "Salsa20 altivec timings"
- Next in thread: xmath: "Re: Salsa20 altivec timings"
- Reply: xmath: "Re: Salsa20 altivec timings"
Date: Wed, 28 Sep 2005 07:29:37 +0000 (UTC)
xmath wrote:
> http://cds.xs4all.nl:8081/salsa/
>
> I get 276 cycles, or 4.31 cycles/byte. That's actually a bit more
> than twice as fast as djb's scalar implementation of Salsa20 on a G4.
Did you check the outputs? I believe that
    const vu32 vrr18 = vrr07 + vrr07;
should have been
    const vu32 vrr18 = vrr09 + vrr09;
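For reference, the Salsa20 quarterround rotates by 7, 9, 13 and 18 bits, which is why the last constant has to be 9 + 9 = 18 rather than 7 + 7 = 14. Below is a minimal scalar sketch of the quarterround in plain C that Altivec output could be compared against. The names rotl32 and quarterround are chosen here for illustration and are not taken from xmath's code.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits, for 0 < n < 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Scalar Salsa20 quarterround, updating the four words in place.
   The rotation distances are 7, 9, 13 and 18, in that order. */
static void quarterround(uint32_t *y0, uint32_t *y1,
                         uint32_t *y2, uint32_t *y3)
{
    *y1 ^= rotl32(*y0 + *y3, 7);
    *y2 ^= rotl32(*y1 + *y0, 9);
    *y3 ^= rotl32(*y2 + *y1, 13);
    *y0 ^= rotl32(*y3 + *y2, 18);
}
```

Running the vector code and this scalar version on the same input words should give identical results if the rotation constants are right.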
I also find the reordering quite suspicious, but I don't have an
Altivec compiler to check it for sure:
    //  0 1 2 3         0 5 2 7         0 5 a f
    //  4 5 6 7  ---->  4 9 6 b  ---->  4 9 e 3
    //  8 9 a b         8 d a f         8 d 2 7
    //  c d e f         c 1 e 3         c 1 6 b
    for (int i = 0; i < 20; i++) {
        z1 = y1 ^ vec_rl(y0 + y3, vrr07);
Here y0 + y3 combines, in the second column, words 5 and 1, while I
think it should combine words 1 and d.
Today is the first time I have seen Altivec, so I may simply be mistaken.
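One way to check the reordering without an Altivec compiler is to work out the word indices by hand. The sketch below assumes the final layout in the diagram follows the pattern index = 4*((row + col) mod 4) + col, which reproduces all sixteen entries shown there. The function names are hypothetical. For comparison, the Salsa20 specification's columnround applies the quarterround to words (0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6) and (15, 3, 7, 11), and the first quarterround step adds the first and last word of each group.

```c
#include <stdio.h>

/* Word index held at (row, col) of the final layout in the diagram,
   assuming the pattern index = 4*((row + col) mod 4) + col. */
static unsigned layout_index(unsigned row, unsigned col)
{
    return 4 * ((row + col) % 4) + col;
}

/* Print, for each column, which two words y0 + y3 adds together. */
static void print_first_step_operands(void)
{
    for (unsigned col = 0; col < 4; col++)
        printf("column %u: y0 holds word %x, y3 holds word %x\n",
               col, layout_index(0, col), layout_index(3, col));
}
```

Calling print_first_step_operands() lists the operand pair for every column, which can then be read against the groups from the specification.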
Milan