Re: The Crypto Gardening Guide and Planting Tips

From: dave anonymous (i_have_a_few_questions@yahoo.com)
Date: 02/06/03


From: "dave anonymous" <i_have_a_few_questions@yahoo.com>
Date: Thu, 6 Feb 2003 10:29:53 -0800


"Thomas Pornin" <pornin@nerim.net> wrote in message
news:b1t76h$2pgp$1@norfair.nerim.net...
> According to Dave Anonymous <i_have_a_few_questions@yahoo.com>:
> > This would allow the design to operate at much higher frequencies. I
> > have no idea what it does to the compression function - but that's not
> > the point - I'm only trying to illustrate a way to minimize the total
> > propagation delay between any two registers.
>
> Well, roughly speaking, when designing a block cipher or a hash
> function, fast propagation of data is actually a property which is
> actively looked for. What I mean is that it is part of the design that
> changing one bit somewhere affects many other bits in the next algorithm
> stages. So minimizing the propagation delay as you suggest is likely to
> weaken the function.

I understand that and would like to clarify your point a little.
Propagation delay and "data avalanche" are two separate issues.

The first is a physical issue. How long does it take for a voltage
change on the input of a adder to cause a voltage change on the
output of the adder.

The second is a logical issue. If I toggle this bit, what other bits
will toggle. Independent of time.

The question I pose is it possible to rearrange the adders used
in a compression function so that a "chain of adders" is not
required. Is it possible to make a compression function that
only adds two numbers in each stage. Hardware performance
is limited by one thing only - how long does it take (i.e. what
is the propagation time) to calculate each "stage's" values. Requiring
a 5 or 7 way add in a given stage is a bad practice if you
want high performance. From a sw perspective it makes
no difference, from a hw perspective it makes a huge difference.

> Cryptographic functions are designed with an implementation goal in
> mind, and SHA-1 was clearly designed so as to be quite fast on a modern
> workstation CPU, which performs 32-bit adds in less than a clock cycle,
> but are really bad at parallelizing. It is highly difficult to design a
> function which will be very efficient on both hardware and software.
>
Agreed, which is why Peter started this thread in part. I'm
speculating the designers of SHA could have made it much more
efficient in hw (and no less efficient in sw) if the design constraint
I raise had been considered. It may be it was and you can't
make a good compression function this way. That's for the
crypto experts to answer.

>
> > Other suggestion: always provide a good set of test
> > vectors and a C source code implementation.
>
> For that SHA-1 is not the worse. The test vectors are quite clearly
> described in FIPS 180-1 (with all details on intermediate values) and
> the algorithm description, while not being C, is easily translated. The
> MD5 hash function, which is closely related to SHA-1, is described in
> RFC 1321 and comes with a sample C code, but I personnaly find that C
> code much less informative than the actual algorithmic description.
>

Agreed, FIPS 180-1 is an example for all to follow. The algorithm is
well described, and the test vectors are extensive (and were very helpful
during hw debug). Have the C code is nice because it shortens the
time to build an automatic test bench. This is used to generate "known
good answers" to compare against the simulated hardware. It's
one less thing the hw designer needs to code and verify.

-Dave



Relevant Pages

  • Re: Two stage synchroniser,how does it work?
    ... It reduces the probability that the meta-stable event will ... the available time is the clock period minus ... For high speed designs, the added propagation delay ... propagation delay elsewhere in the design, ...
    (comp.arch.fpga)
  • Re: problem with a shift register
    ... Clk05 does not have any propagation delay. ... relative to Clk. ... I'm sorry but the design is not correct. ...
    (comp.lang.vhdl)
  • Re: The Crypto Gardening Guide and Planting Tips
    ... > This would allow the design to operate at much higher frequencies. ... So minimizing the propagation delay as you suggest is likely to ... For that SHA-1 is not the worse. ... the algorithm description, while not being C, is easily translated. ...
    (sci.crypt)
  • Re: Test vectors for emulator.
    ... What is the most popular technique of producing and applying test vectors to ... Dumping test vectors from behavioral testbench during simulation. ... Applying from a file to emulated design. ... PASSED/FAILED status from emulator. ...
    (comp.lang.verilog)