Re: Salsa20 altivec timings
Date: 27 Sep 2005 17:32:24 -0700
> firstname.lastname@example.org wrote:
> > xmath wrote:
> > > It's interesting to note that, due to unavoidable data-dependencies, a
> > > Salsa20 round must take at least 12 cycles, as absolute CPU-independent
> > > minimum (unless it offers combined add-rotate, rotate-xor, or xor-add
> > > instructions that execute in one cycle, something I've never seen in
> > > any CPU).
> > ARM offers add-rotate but not the others [it can xor-rotate but not
> > rotate-xor].
> Interesting, didn't know that :-)
ARMs are crazy that way. I mean, come on,
A = (B + C) >>> D
is a *common* operation :-)
To be fair,
A = (B + C) >> D
is common enough (e.g. signal averaging), so I guess supporting rotates
wasn't much of a stretch.
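For reference, here is why the 12-cycle figure falls out. The Salsa20 quarterround is four lines of exactly this add-then-rotate pattern, each followed by an xor, and every line needs the word the previous line just produced. Below is a minimal C sketch written from the Salsa20 spec's quarterround definition; the helper names (rotl32, salsa20_quarterround, salsa20_quarterround_selftest) are my own, not from any library. A column or row round is four independent quarterrounds, so with four 32-bit lanes in parallel the critical path per round is still this one chain.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit left rotate; r must be in 1..31 (Salsa20 only uses 7, 9, 13, 18). */
static uint32_t rotl32(uint32_t x, unsigned r)
{
    return (x << r) | (x >> (32u - r));
}

/* Salsa20 quarterround on y[0..3].
 * Each line is add -> rotate -> xor: 3 dependent operations.
 * Each line also consumes the word written by the line before it,
 * so the whole thing is 4 x 3 = 12 strictly sequential operations. */
static void salsa20_quarterround(uint32_t y[4])
{
    y[1] ^= rotl32(y[0] + y[3], 7);   /* uses only input words   */
    y[2] ^= rotl32(y[1] + y[0], 9);   /* waits for the new y[1]  */
    y[3] ^= rotl32(y[2] + y[1], 13);  /* waits for the new y[2]  */
    y[0] ^= rotl32(y[3] + y[2], 18);  /* waits for the new y[3]  */
}

/* Check against the quarterround example given in the Salsa20 spec. */
static void salsa20_quarterround_selftest(void)
{
    uint32_t y[4] = {0x00000001u, 0u, 0u, 0u};
    salsa20_quarterround(y);
    assert(y[0] == 0x08008145u && y[1] == 0x00000080u &&
           y[2] == 0x00010200u && y[3] == 0x20500000u);
}
```

If add-rotate were a single one-cycle instruction, each line would shrink to two dependent steps (8 for the chain). That is why fused ops of the kind ARM offers are the only way under 12, assuming one cycle per simple op.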
> (Of course, to be relevant in this context it also needs to be able to
> do it on four 32-bit words in parallel in a single cycle, but still...)
There is a four-core ARM11 design out there... hehehe, I know what you
mean. The best ARM does (iirc) is SIMD within a 32-bit register, so it's not
quite up to this.
It would be cool to see an FPGA implementation and compare its size/speed
to other "typical" hardware ciphers.