Re: Pentanomial better than trinomial
- From: "Michael Scott" <mscott@xxxxxxxxx>
- Date: Tue, 27 Mar 2007 20:41:10 +0100
"Sebastian Gottschalk" <seppi@xxxxxxxxx> wrote in message news:56t7cmF2admfrU1@xxxxxxxxxxxxxxxx
Michael Scott wrote:
"Table C-3 in Annex C lists an irreducible pentanomial of degree m over F2
for each m, 160 < m < 2000, for which an irreducible trinomial of degree m
does not exist. For each such m, the table lists the triple (k1, k2, k3) for
which (i) x^m + x^k3 + x^k2+ x^k1 + 1 is irreducible over F2; (ii) k1 is as
small as possible; (iii) for this particular value of k1, k2 is as small as
possible; and (iv) for these particular values of k1 and k2, k3 is as small
as possible."
Is this choice basically arbitrary, or was there some reason for it?
In the reduction algorithm (above) each element of C[i] is written to and
read from just once - one can unroll the loop to achieve this, so the work
associated with memory accesses in an invariant. Its the number of XORs and
shifts which count. The cost is therefore architecture dependent - on an ARM
processor with its barrel shifter, shifts are (as you say) free. On a
Pentium they are not,
A Pentium does pipelining and register-renaming, has multiple execution
units. Except for dependencies, shifts are free.
No, I don't agree. The sequence
shl eax,11
shr ebx,13
shl ecx,24
shr edx,17
isn't "free". These are instructions which need to go down one pipeline or another, and while they are being executed, another instruction isn't. For this type of algorithm there is a lot of "Instruction Level Parallelism" and a good compiler could probably schedule instructions so as to avoid having to pay a price for any dependencies. So I would maintain that XORs and shifts would cost the same in this setting.
and on some smaller processors shifts depend on the
amount of the shift and are thus potentially much more expensive than XORs.
Hm... this might be exactly the answer for your question. If k1, k2 and k3
are small, the shifts are smaller.
No. For the GF(2^283) case despite the "closeness" of 12, 7 and 5, the shifts are by 5, 10, 12, 17, 22, 27, 20 and 15 - Guide to Elliptic Curve Cryptography Algorithm 2.43.
Consider the NIST pentanomial for GF(2^283) on a 32-bit processor,
x^283+x^12+x^7+x^5+1. Per iteration its cost is 8 XORs and 8 shifts. Using
for example the pentanomial x^283+x^119+x^91+x^59+1 the cost is 6 XORs and 4
shifts, which has got to be faster any way you look at it - nearly twice as
fast on some architectures.
Hm... what about the dependencies? As I already pointed out, the dependency
of XORing one value two times make this extend to 3 cycles whereas all
other operations only took two cycles (and thus their 4-fold parallelized
version took 3 cycles in total).
A decent compiler (like Intel's) will automatically loop unroll and hence be able to mix instructions from different iterations. By doing this (and register renaming - I believe the Pentium has some 40 renaming registers), only RAW dependencies will remain, and these can be scheduled.
I don't know what you mean by "XORing one value two times". You can read a register as often as you like without creating a dependency.
Some sources (for example "Guide to Elliptic Curve Cryptography"), state
that it is advantageous if "the middle terms" of the pentanomial are "close
to eachother". I don't get this. In my GF(2^233) example above the middle
terms (201, 105 and 9) are not at all close.
201-105 = 105-9. I'd say this is very "close", maybe a bit too close.
Interesting observation! However this is not what was meant - 201 is not close to 105.
Mike
.
- References:
- Pentanomial better than trinomial
- From: Michael Scott
- Re: Pentanomial better than trinomial
- From: Michael Scott
- Pentanomial better than trinomial
- Prev by Date: Re: Pentanomial better than trinomial
- Next by Date: Re: Surrogate factoring, top to bottom
- Previous by thread: Re: Pentanomial better than trinomial
- Next by thread: Re: Pentanomial better than trinomial
- Index(es):
Relevant Pages
|
|