Re: The Crypto Gardening Guide and Planting Tips
From: dave anonymous (i_have_a_few_questions@yahoo.com)
Date: 02/06/03
- Next message: flip: "Re: DIEHARD battery of tests: new version"
- Previous message: JohnTromaville: "Re: More one NASA management"
- In reply to: Michael Amling: "Re: The Crypto Gardening Guide and Planting Tips"
- Next in thread: Thomas Pornin: "Re: The Crypto Gardening Guide and Planting Tips"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
From: "dave anonymous" <i_have_a_few_questions@yahoo.com> Date: Thu, 6 Feb 2003 10:11:18 -0800
I think you missed the point of the restructuring I
proposed. The total delay of the adder is the
result of two factors: the width of the adder, and
the number of values being added. We want to
minimize both the width and number of values
being added at a given stage.
Think of it this way:
A "2 input 32 bit adder" does the following:
z[31:0] = a[31:0] + b[31:0]
Here the width is 32 bits, and 2 values are being
added. Assume it takes 5 nanoseconds for
z to settle after a and b change.
A "5 input 32 bit adder":
z[31:0] = a[31:0] + b[31:0] + ... + e[31:0]
You could build a 5 input adder using 2 input
adders like this:
z = ((((a+b) + c) + d) + e)
This results in 4 levels of adders and takes 20 ns.
Or you could build it like this:
t1 = a+b; t2 = c+d # first level of adders
t3 = t1+t2 # second level
z = t3+e # third level
which is the same as:
z = ((a+b) + (c+d)) + e
This results in 3 levels of adders and takes 15 ns.
The restructuring I proposed adds only two 32 bit
numbers in each step, so each step takes only 5 ns.
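To make the level counting concrete, here is a rough Python sketch. The function names are made up for illustration, and the 5 ns per 2 input adder is just the assumed figure from above, not a measured number.

```python
ADDER_DELAY_NS = 5  # assumed settle time of one 2 input 32 bit adder


def chain_levels(n_inputs):
    """Adder levels when operands are added one after another: (((a+b)+c)+d)+e."""
    return n_inputs - 1


def tree_levels(n_inputs):
    """Adder levels when operands are paired up at each stage: ((a+b)+(c+d))+e.

    Equals ceil(log2(n_inputs)), computed with integers to avoid float rounding.
    """
    return (n_inputs - 1).bit_length()


for name, levels in (("chain", chain_levels(5)), ("tree", tree_levels(5))):
    print(f"{name}: {levels} levels, {levels * ADDER_DELAY_NS} ns")
```

For 5 inputs this gives 4 levels for the chain and 3 for the tree, matching the 20 ns and 15 ns figures. A 2 input stage is a single level either way.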
So what? What is the Mb/s performance if the
design uses these three different adder structures?
SHA1 takes about 80 clock ticks to process a 512 bit block.
With a 20 ns clock the bit rate is: 320 Mb/s
15 ns clock gives: 427 Mb/s
5 ns clock gives: 1.28 Gb/s
So a simple algorithm change results in a 4x performance improvement.
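The arithmetic behind those bit rates can be checked with a short Python sketch. It assumes exactly 80 clocks per 512 bit block and ignores padding, message setup and i/o overhead, so treat the results as upper bounds.

```python
BLOCK_BITS = 512        # bits consumed per SHA1 compression
CLOCKS_PER_BLOCK = 80   # assumed: one round per clock tick


def throughput_mbps(clock_ns):
    """Bit rate in Mb/s for a given clock period in nanoseconds."""
    # bits per nanosecond is Gb/s, so multiply by 1000 to get Mb/s
    return BLOCK_BITS * 1000 / (CLOCKS_PER_BLOCK * clock_ns)


for clock_ns in (20, 15, 5):
    print(f"{clock_ns} ns clock: {throughput_mbps(clock_ns):.0f} Mb/s")
```

The ratio between the 5 ns and 20 ns cases is exactly 4, because the bit rate is inversely proportional to the clock period.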
Is it possible to build a good compression function with the
requirement of no more than a 2 input adder in each stage?
I don't know. Maybe someone on sci.crypt can speculate.
A slow design forces the hardware to grow in other ways
to meet a given performance level. For example, if I needed
to build a 1 Gb/s SHA1 engine I might need 4 instances of
the slow design, plus a scheduler and i/o handler that keeps the
4 execution units busy. The result? Longer design times, longer
design verification, more expensive parts (larger die sizes),
and maybe higher power dissipation.
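The instance count follows from dividing the target rate by the per engine rate and rounding up. A small Python sketch, assuming perfect scheduling with no overhead (the helper name is hypothetical):

```python
import math


def engines_needed(target_mbps, per_engine_mbps):
    """Parallel engines required to reach the target rate, assuming ideal scheduling."""
    return math.ceil(target_mbps / per_engine_mbps)


print(engines_needed(1000, 320))   # slow design, 20 ns clock
print(engines_needed(1000, 1280))  # fast design, 5 ns clock
```

With the 320 Mb/s design this comes to 4 engines, while a single 1.28 Gb/s engine already exceeds the target. A real scheduler would lose some efficiency, so 4 is a lower bound for the slow design.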
-Dave