Re: Pentanomial better than trinomial




"Sebastian Gottschalk" <seppi@xxxxxxxxx> wrote in message news:56tc8iF2agll1U1@xxxxxxxxxxxxxxxx
Michael Scott wrote:

"Sebastian Gottschalk" <seppi@xxxxxxxxx> wrote in message
news:56t7cmF2admfrU1@xxxxxxxxxxxxxxxx
Michael Scott wrote:

"Table C-3 in Annex C lists an irreducible pentanomial of degree m over
F2
for each m, 160 < m < 2000, for which an irreducible trinomial of degree
m
does not exist. For each such m, the table lists the triple (k1, k2, k3)
for
which (i) x^m + x^k3 + x^k2+ x^k1 + 1 is irreducible over F2; (ii) k1 is
as
small as possible; (iii) for this particular value of k1, k2 is as small
as
possible; and (iv) for these particular values of k1 and k2, k3 is as
small
as possible."

Is this choice basically arbitrary, or was there some reason for it?

In the reduction algorithm (above) each element of C[i] is written to and
read from just once - one can unroll the loop to achieve this, so the
work
associated with memory accesses in an invariant. Its the number of XORs
and
shifts which count. The cost is therefore architecture dependent - on an
ARM
processor with its barrel shifter, shifts are (as you say) free. On a
Pentium they are not,

A Pentium does pipelining and register-renaming, has multiple execution
units. Except for dependencies, shifts are free.

No, I don't agree. The sequence

shl eax,11
shr ebx,13
shl ecx,24
shr edx,17

isn't "free". These are instructions which need to go down one pipeline or
another, and while they are being executed, another instruction isn't.

You have multiple execution units. Since these instructions have no
dependencies on each other, they'll be executed in parallel.

In fact most compilers will reorder instructions such that every expensive
instruction is paired with one or more non-dependent shifts. While the
expensive one executes, the other execution units are fed with the shifts.

Thus, one can reasonably claim that shifts are essentially free since they
don't cost you any more cycles.

No, that's not reasonable. Sure a shift can easily be paired down a second pipeline. But while its being executed in a second pipeline another independent instruction is not being executed down that pipeline which otherwise might be.


For this type of algorithm there is a lot of "Instruction Level Parallelism" and
a good compiler could probably schedule instructions so as to avoid having
to pay a price for any dependencies. So I would maintain that XORs and
shifts would cost the same in this setting.

May I remind you of "C[i-7]^=(T>>9)^T;"? There is a fat obvious dependency,
which will bog you down to three cycles whereas the rest would only take
two cycles. And since they're all executed in parallel, the highest makes
the total cost.

OK lets compile it. Assume T is in R1 and R4 points at C and lets use a MIPS instruction set...

dslr r2,r1,9
ld r3,-28(r4)
xor r1,r2,r1
xor r3,r1,r3
sd r3,-28(r4)

On a standard MIPS the types of dependencies which arise here do not cause stalls, because of forwarding. Even if there are multiple pipelines, RAW dependencies can be broken up by loop unrolling, and mixing instructions from different iterations.


I don't know what you mean by "XORing one value two times". You can read a
register as often as you like without creating a dependency.

I meant a hypothetical instruction r4 = r1 XOR r2 XOR r3 which gets
executed in one cycle.

Some sources (for example "Guide to Elliptic Curve Cryptography"), state
that it is advantageous if "the middle terms" of the pentanomial are
"close
to eachother". I don't get this. In my GF(2^233) example above the
middle
terms (201, 105 and 9) are not at all close.

201-105 = 105-9. I'd say this is very "close", maybe a bit too close.

Interesting observation! However this is not what was meant - 201 is not
close to 105.

Take a look at how this simple relation bogs down to the number of shifts
you need.

Don't know what you mean by that - "bogs down to the number of shift you need" means nothing to me.

Perhaps you might demonstrate what you mean?


Mike


.



Relevant Pages

  • Re: Pentanomial better than trinomial
    ... shifts are free. ... For this type of algorithm there is a lot of "Instruction Level Parallelism" and a good compiler could probably schedule instructions so as to avoid having to pay a price for any dependencies. ... So I would maintain that XORs and shifts would cost the same in this setting. ...
    (sci.crypt)
  • Re: 10 bit per channel YUV with alpha
    ... Aren't those, like, Pentium specific details which aren't really relevant anymore..? ... Intel has like 3 major architecture designs since then for implementing the IA32.. ... Then AMD has their own architecture, which again doesn't benefit much if at all from pairing instructions in specified manner. ... memory read requests can create arbitrary bottlenecks and dependencies.. ...
    (microsoft.public.win32.programmer.directx.video)
  • Re: [Lit.] Buffer overruns
    ... A CPU still executes only one instruction at a time. ... > Of all the nonsense you've ever come out with, ... > have many dozens of instructions at various phases of being executed ...
    (sci.crypt)
  • Re: OOA?
    ... >I am not saying that managing dependencies is not important. ... >about modeling the problem domain than... ... Managing dependencies in a program is remarkably important, ... A model of the solution executes, ...
    (comp.object)
  • Re: Trouble upgrading ports collection
    ... >> distfiles, just follow the instructions that pop up when you make it. ... it's build dependencies can be missing and it will still ... > Ran make install again ...
    (freebsd-questions)