Re: Pentanomial better than trinomial




"Sebastian Gottschalk" <seppi@xxxxxxxxx> wrote in message news:56t7cmF2admfrU1@xxxxxxxxxxxxxxxx
Michael Scott wrote:

"Table C-3 in Annex C lists an irreducible pentanomial of degree m over F2
for each m, 160 < m < 2000, for which an irreducible trinomial of degree m
does not exist. For each such m, the table lists the triple (k1, k2, k3) for
which (i) x^m + x^k3 + x^k2+ x^k1 + 1 is irreducible over F2; (ii) k1 is as
small as possible; (iii) for this particular value of k1, k2 is as small as
possible; and (iv) for these particular values of k1 and k2, k3 is as small
as possible."

Is this choice basically arbitrary, or was there some reason for it?

In the reduction algorithm (above) each element of C[i] is written to and
read from just once - one can unroll the loop to achieve this, so the work
associated with memory accesses in an invariant. Its the number of XORs and
shifts which count. The cost is therefore architecture dependent - on an ARM
processor with its barrel shifter, shifts are (as you say) free. On a
Pentium they are not,

A Pentium does pipelining and register-renaming, has multiple execution
units. Except for dependencies, shifts are free.

No, I don't agree. The sequence

shl eax,11
shr ebx,13
shl ecx,24
shr edx,17

isn't "free". These are instructions which need to go down one pipeline or another, and while they are being executed, another instruction isn't. For this type of algorithm there is a lot of "Instruction Level Parallelism" and a good compiler could probably schedule instructions so as to avoid having to pay a price for any dependencies. So I would maintain that XORs and shifts would cost the same in this setting.


and on some smaller processors shifts depend on the
amount of the shift and are thus potentially much more expensive than XORs.

Hm... this might be exactly the answer for your question. If k1, k2 and k3
are small, the shifts are smaller.

No. For the GF(2^283) case despite the "closeness" of 12, 7 and 5, the shifts are by 5, 10, 12, 17, 22, 27, 20 and 15 - Guide to Elliptic Curve Cryptography Algorithm 2.43.


Consider the NIST pentanomial for GF(2^283) on a 32-bit processor,
x^283+x^12+x^7+x^5+1. Per iteration its cost is 8 XORs and 8 shifts. Using
for example the pentanomial x^283+x^119+x^91+x^59+1 the cost is 6 XORs and 4
shifts, which has got to be faster any way you look at it - nearly twice as
fast on some architectures.

Hm... what about the dependencies? As I already pointed out, the dependency
of XORing one value two times make this extend to 3 cycles whereas all
other operations only took two cycles (and thus their 4-fold parallelized
version took 3 cycles in total).

A decent compiler (like Intel's) will automatically loop unroll and hence be able to mix instructions from different iterations. By doing this (and register renaming - I believe the Pentium has some 40 renaming registers), only RAW dependencies will remain, and these can be scheduled.

I don't know what you mean by "XORing one value two times". You can read a register as often as you like without creating a dependency.


Some sources (for example "Guide to Elliptic Curve Cryptography"), state
that it is advantageous if "the middle terms" of the pentanomial are "close
to eachother". I don't get this. In my GF(2^233) example above the middle
terms (201, 105 and 9) are not at all close.

201-105 = 105-9. I'd say this is very "close", maybe a bit too close.

Interesting observation! However this is not what was meant - 201 is not close to 105.


Mike

.



Relevant Pages

  • Re: Pentanomial better than trinomial
    ... shifts are free. ... Except for dependencies, shifts are free. ... In fact most compilers will reorder instructions such that every expensive ... expensive one executes, the other execution units are fed with the shifts. ...
    (sci.crypt)
  • Re: Pentanomial better than trinomial
    ... but 2 less shifts per iteration. ... With 4 pipelines and register renaming, each loop takes 2 cycles, whereas ... The cost is therefore architecture dependent - on an ARM processor with its barrel shifter, shifts are free. ... On a Pentium they are not, and on some smaller processors shifts depend on the amount of the shift and are thus potentially much more expensive than XORs. ...
    (sci.crypt)
  • Auto Gearbox Question
    ... though when I temporarily move the gear lever to the "3" position the ... box shifts up as normal, and when I then change back to the normal Drive ... position, the problem stays gone, regardless of how briefly it was in ... instructions. ...
    (uk.rec.cars.maintenance)
  • RE: A whopping 50 percent... ???
    ... > But it does make a difference, speaking as a compiler writer. ... > shifts and masks. ... And adding the instructions to do 16 bit accesses on ... to make those fewer ticks much slower ticks. ...
    (comp.os.vms)