Re: Salsa20 and SSE2?
From: D. J. Bernstein (djb_at_cr.yp.to)
Date: 08/29/05
- Next message: Stian Karlsen: "Re: Fast implementation of AES, Rijndael"
- Previous message: contini_at_matmail.com: "Re: Salsa20 and SSE2?"
- In reply to: Paul Rubin: "Re: Salsa20 and SSE2?"
- Next in thread: contini_at_matmail.com: "Re: Salsa20 and SSE2?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Date: Mon, 29 Aug 2005 06:49:46 +0000 (UTC)
Paul Rubin wrote:
> It's disappointing that the SSE2 instructions are so slow.
The biggest part of the problem is the number of instructions required:
* XMM instructions can't simply add the way that LEA can. They have
to copy, then add.
* XMM instructions can't simply rotate the way that ROL can. They
have to copy, then shift, then shift, then xor.
* XMM instructions can't even copy in one instruction without serious
P4 latency problems. They have to copy a 0, then xor.
Even if you aren't worried about P4 latency, the simple add-rotate-xor
in Salsa20 turns into at least seven XMM instructions. Each round has
four add-rotate-xor operations and three shuffles, for a total of at
least 31 XMM instructions per round.
For comparison, traditional non-XMM code takes 26.75 cycles per round on
the Athlon, and completely straightforward serial code takes 24.5 cycles
per round on the PowerPC G4.
> Alternatively, any chance of a "salsa40" function that uses 64-bit
> scalar operations? What are the obvious pitfalls in designing one
> that more or less imitates salsa20?
The main problem is that the 64-bit speedup is accompanied by a severe
32-bit slowdown. Shuffling sixteen 64-bit variables into x86 registers
takes a lot of time; also, add-with-carry is often fairly slow.
---D. J. Bernstein, Professor, Mathematics, Statistics,
and Computer Science, University of Illinois at Chicago
- Next message: Stian Karlsen: "Re: Fast implementation of AES, Rijndael"
- Previous message: contini_at_matmail.com: "Re: Salsa20 and SSE2?"
- In reply to: Paul Rubin: "Re: Salsa20 and SSE2?"
- Next in thread: contini_at_matmail.com: "Re: Salsa20 and SSE2?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Relevant Pages
|
|