Re: The Crypto Gardening Guide and Planting Tips

From: "dave anonymous" <i_have_a_few_questions@yahoo.com>
Date: Thu, 6 Feb 2003 10:11:18 -0800

I think you missed the point of the restructuring I
proposed. The total delay of the adder is the
result of two factors: the width of the adder, and
the number of values being added. We want to
minimize both the width and number of values
being added at a given stage.

Think of it this way:

A "2 input 32 bit adder" does the following:

z[31:0] = a[31:0] + b[31:0]

Here the width is 32 bits, and 2 values are being
added. Assume it takes 5 nanoseconds for
z to settle after a and b change.

A "5 input 32 bit adder":

z[31:0] = a[31:0] + b[31:0] + ... e[31:0]

You could build a 5 input adder using 2 input
adders like this:

z = ((((a+b) + c) + d) + e)

This results in 4 levels of adders and takes 20 ns.

Or you could arrange them as a tree:

z = ((a+b) + (c+d)) + e

t1 = a+b; t2 = c+d   # first level of adders
t3 = t1+t2           # second level
z  = t3+e            # third level

This results in 3 levels of adders and takes 15 ns.
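
The level counts above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes every 2 input adder settles in the same 5 ns regardless of where it sits in the structure.

```python
ADDER_DELAY_NS = 5  # assumed settle time of one 2 input 32 bit adder


def chain_levels(n_inputs: int) -> int:
    """Levels when operands are added one after another: (((a+b)+c)+d)+e."""
    return n_inputs - 1


def tree_levels(n_inputs: int) -> int:
    """Levels when operands are paired up at each stage: ((a+b)+(c+d))+e.

    Equals ceil(log2(n_inputs)), computed with integers only.
    """
    return (n_inputs - 1).bit_length()


for name, levels in (("chain", chain_levels(5)), ("tree", tree_levels(5))):
    print(f"{name}: {levels} levels, {levels * ADDER_DELAY_NS} ns")
```

The gap between the two structures widens as the number of inputs grows, since the chain is linear in the input count and the tree is logarithmic.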

The restructuring I proposed adds only two 32 bit
numbers in each step, so each step takes only 5 ns.

So what? What is the Mb/s performance if the
design uses these three different adder structures?

SHA-1 takes about 80 clock ticks to process 512 bits, so the
bit rate is 512 bits / (80 * clock period):

20 ns clock gives: 320 Mb/s
15 ns clock gives: 427 Mb/s
 5 ns clock gives: 1.28 Gb/s
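
Here is the same arithmetic as a small Python sketch, in case anyone wants to plug in other clock periods. The function name is mine, and it assumes exactly 80 ticks per 512 bit block with no overhead between blocks.

```python
BLOCK_BITS = 512       # bits processed per SHA-1 block
TICKS_PER_BLOCK = 80   # approximate clock ticks per block (from the figures above)


def throughput_mbps(clock_ns: float) -> float:
    """Bit rate in Mb/s for a given clock period in nanoseconds."""
    # bits per nanosecond is Gb/s; multiply by 1000 to get Mb/s
    return BLOCK_BITS * 1000 / (TICKS_PER_BLOCK * clock_ns)


for clock_ns in (20, 15, 5):
    print(f"{clock_ns} ns clock: {throughput_mbps(clock_ns):.0f} Mb/s")
```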

So a simple algorithm change results in a 4x performance improvement.
Is it possible to build a good compression function with the
requirement of no more than a 2 input adder in each stage?
I don't know. Maybe someone on sci.crypt can speculate.

A slow design forces the hardware to grow in other ways
to meet a given performance level. For example, if I needed
to build a 1 Gb/s SHA-1 engine I might need 4 instances of
the slow design, plus a scheduler and I/O handler that keeps the
4 execution units busy. The result? Longer design times, longer
design verification, more expensive parts (larger die sizes),
and maybe higher power dissipation.
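
The instance count is just a ceiling division. A rough sketch, assuming the parallel engines scale perfectly and the scheduler costs nothing (which it won't in practice); the function name is hypothetical:

```python
import math


def engines_needed(target_mbps: float, per_engine_mbps: float) -> int:
    """Parallel engines required to reach a target bit rate.

    Assumes ideal scaling: no scheduler or I/O overhead.
    """
    return math.ceil(target_mbps / per_engine_mbps)


# Per-engine rates taken from the 20 ns, 15 ns and 5 ns clock cases.
for rate_mbps in (320, 427, 1280):
    print(f"{rate_mbps} Mb/s engine: {engines_needed(1000, rate_mbps)} needed for 1 Gb/s")
```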

-Dave


