Re: AES Timing Attack Implementation & Karl Malbrain code...

BRG wrote:
karl malbrain wrote:
BRG wrote:
rohit wrote:
Dear All,

I was analysing "Cache-timing attacks on AES" by Daniel J. Bernstein, and
tried running the source posted by Karl Malbrain at the following URL:
This code implements AES without using large tables. It will be much
slower, but will not (typically) be vulnerable to the DJB attack, since
that attack depends on a large amount of cache space being used for
tables.

The encryption timing results are 130 cycles per byte for a large-table
C version, and 229 cycles per byte for the immune small-table C
version. I believe the current BRG assembly code runs at about 30 cycles
per byte.

Are you including or excluding the cost of setting up the key schedule?
If you are including the key schedule cost, over how many encryption or
decryption blocks is this cost being spread?

That figure was an error I attempted to correct in a subsequent post.
When compiling for maximum optimization I get 40 cycles per byte without
the key scheduling cost.

On the 32-bit Intel x86 architecture, without including the cost of key
scheduling, my encryption and decryption routines for 128-bit keys run
in 21 cycles per byte for C code and 18 cycles per byte for assembler.

It is also worth remembering that Daniel Bernstein found that modern
(post-Pentium) x86 systems allow misaligned accesses to memory with no
appreciable penalty. This allows AES to be implemented using more
compact tables, with a much reduced vulnerability to his attack and
without any significant speed penalty. In fact, it seems highly likely
that such implementations are faster in practice than large-table
versions because of the reduced load on the cache.

This is an interesting idea. I'll have to try implementing it.
Have you tried doing anything with this observation yet?
karl m