Re: Salsa20 altivec timings

From: xmath (xmath.news_at_gmail.com)
Date: 09/28/05


Date: 27 Sep 2005 17:10:27 -0700

tomstdenis@gmail.com wrote:
> xmath wrote:
> > It's interesting to note that, due to unavoidable data dependencies, a
> > Salsa20 round must take at least 12 cycles, as an absolute CPU-independent
> > minimum (unless the CPU offers combined add-rotate, rotate-xor, or xor-add
> > instructions that execute in one cycle, something I've never seen in
> > any CPU).
>
> ARM offers add-rotate but not the others [it can xor-rotate but not
> rotate-xor].

Interesting, didn't know that :-)

(Of course, to be relevant in this context it also needs to be able to
do it on four 32-bit words in parallel in a single cycle, but still...)
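To see where the 12-cycle figure comes from, here is a small Python model of one round's dependency chain. It is my own sketch, not from the thread, and it assumes three things: every instruction has one-cycle latency, issue width is unlimited, and the four quarterrounds of a round run in lockstep in 4-wide vector registers (AltiVec style). The names `STEPS`, `quarterround` and `round_latency`, and the "add-rot"/"rot-xor" options, are hypothetical and chosen for illustration. Only the add-rotate and rotate-xor fusions are modeled.

```python
# Hypothetical model: critical path of one Salsa20 round, assuming 1-cycle
# latency per instruction and unlimited issue width, so that only data
# dependencies limit the schedule.

MASK = 0xFFFFFFFF

# One quarterround as four steps: y[target] ^= (y[a] + y[b]) <<< r
# (word order and rotation counts follow the Salsa20 quarterround definition).
STEPS = [(1, 0, 3, 7), (2, 1, 0, 9), (3, 2, 1, 13), (0, 3, 2, 18)]


def rotl32(x, r):
    """Rotate a 32-bit word left by r bits (0 < r < 32)."""
    return ((x << r) | (x >> (32 - r))) & MASK


def quarterround(y):
    """Evaluate the quarterround on a 4-tuple of 32-bit words using STEPS."""
    y = list(y)
    for target, a, b, r in STEPS:
        y[target] ^= rotl32((y[a] + y[b]) & MASK, r)
    return tuple(y)


def round_latency(fused=None):
    """Cycles on the longest dependency chain of one round.

    With 4-wide SIMD the four quarterrounds of a round run in lockstep, so
    each entry of `ready` stands for a whole vector of four words and the
    round's critical path equals that of a single quarterround.
    `fused` is None, "add-rot" or "rot-xor" (a hypothetical one-cycle
    combined instruction).
    """
    ready = [0, 0, 0, 0]  # cycle at which each word (vector) becomes available
    for target, a, b, _ in STEPS:
        inputs = max(ready[a], ready[b])  # the add needs both operands
        if fused == "add-rot":
            rot_done = inputs + 1                    # add+rotate as one op
            done = max(rot_done, ready[target]) + 1  # xor
        elif fused == "rot-xor":
            add_done = inputs + 1                    # add
            done = max(add_done, ready[target]) + 1  # rotate+xor as one op
        elif fused is None:
            rot_done = inputs + 2                    # add, then rotate
            done = max(rot_done, ready[target]) + 1  # xor
        else:
            raise ValueError(f"unknown fusion: {fused!r}")
        ready[target] = done
    return max(ready)


# Known-answer check of the quarterround (verified by hand against STEPS).
assert quarterround((1, 0, 0, 0)) == (0x08008145, 0x80, 0x10200, 0x20500000)

print(round_latency())           # 12: add, rotate, xor, four times in a row
print(round_latency("add-rot"))  # 8
print(round_latency("rot-xor"))  # 8
```

The model ignores the permutes a vectorized layout needs between column and row rounds. It leaves out xor-add fusion because that would merge one step's xor with the next step's add, while the xor result must still be kept as a state word.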

 -xmath

PS. In the same directory as the other files, I've now also put a text file
with a detailed cycle-accurate simulation of the salsa20_xor function
running in a loop on a PowerPC 7450 (G4), in case anyone is interested.


