Re: Large-numbers division way too slow -- what to do?

From: Francois Grieu (fgrieu_at_francenet.fr)
Date: 03/10/05


Date: Thu, 10 Mar 2005 19:03:41 +0100

In article <1110456114.739247.237380@o13g2000cwo.googlegroups.com>,
 tomstdenis@gmail.com wrote:

> Francois Grieu wrote:

> > 2) Apply the following technique simple technique for modular
> > arithmetic modulo n, when n does not change often (this is the
> > case in RSA).
> > When n is exressed a an m-digit number in base b,
> > pre-compute inv = ceil((b**(2*m))/n), which has m+1
> > digits.
> > Then u*v mod n (with 0<=u<n, 0<=v<n) is computed as
> > w = u*v
> > q = floor(w*inv/(b**(2*m))) // no division here !
> > r = w-q*n
> > while r<0
> > r=r+n
> > q=q-1 // only if quotient q is needed also
>
> This is barrett reduction

This is close to the technique in Paul Barett's Crypto '86
"Implementing the Rivest Shamir and Adleman public key encryption
algorithm on a standard digital signal processor" [*], but it is
not quite it; in particular, my inv is one more than his R.

> and it's generally NOT faster than Montgomery.

Right. One modular reduction require over 3 times as many
elementary multiplications as in Montgomery's. This can be
reduced to 2, using the technique I outline in my point Q4
(which of course is in Barett's article).
Still it is much faster than what the OP does, and it is
easier to implement than Knuth's algorithm D.

> That's why Montomgery is popular..

Why the Motgomery method finds its way in so many software
implementation of RSA escapes me. Algorithm D, or other
variants, asymptotically have the same cost in term of number
of elementary operations, and require less pre-computation.
The only drawback that I know with the classical algorithm
is the risk of misguessing a digit, which complicates hardware
implementions, and making implementations secure against side
channel attacks.
I have spent months optimising (the public key side of) RSA
on several slow microprocessors (with drastic adressing and
RAM size constraints), and invariably found that whatever
implementation trick applies to the Montgomery inner loop
can be transposed to the classic algorithm.

  Francois Grieu

[*] maybe available from
http://dsns.csie.nctu.edu.tw/research/crypto/HTML/PDF/C86/311.PDF



Relevant Pages

  • Re: What is exponent?
    ... For simple description of RSA algorithm ... I also have the receiver's certificate (public key only). ... Use RSA to encrypt the session key ...
    (microsoft.public.dotnet.security)
  • Re: Dummy questions from a newbie
    ... does this in a reasonable amount of time. ... However RSA is used widely enough that most cryptosystems ... hashing algorithm should i choose in your opinion? ... Increasing the number of Deep Crack boards could it make it simpler to ...
    (sci.crypt)
  • Re: RSA Challenge Question
    ... unless the algorithm required some deep ... RSA is not used for encryption. ... *them* (Sally Jane) submit the results, wherein Chappy calls John Doe, ...
    (sci.crypt)
  • Re: Quantum slip - Quantum conspiracy
    ... RSA and DH are done in by Shor's algorithm. ... So you can look on ECC as slightly more vulnerable ... As far as NTRU goes, ...
    (sci.crypt)
  • Re: A very fast Fermat factoring algorithm
    ... > Another is a very old technique that uses exclusion moduli to ... I'd rather not share the technique just yet. ... I have heard of several Fermat speedups, but none that are nearly this ... nothing like Fermat's algorithm, but it has the same runtime ...
    (sci.crypt)