AES trick...



The most common way to implement AES on 32/64-bit words is with the
8x32 tables, using register renaming to handle ShiftRows.

The trick I want to explore [and I'm curious whether anyone else has
tried this] is to use a static 16-tuple of 8x128 tables [64KB], one
table per state byte, for the entire round function. SubBytes works on
one byte at a time, and ShiftRows and MixColumns are linear, so you
could implement the entire round function as 16 lookups and 15 xors
[plus one more xor for the round key].
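
To make the idea concrete, here is a rough sketch of how the 16 tables
could be built and used for one round. This is only my reading of the
trick, not tuned code: the names T, build_tables, aes_round and
aes_round_selftest are made up for illustration, the S-box is computed
from its definition to keep the listing short, and the state is assumed
to be stored column-major as in FIPS-197. The xors are done a byte at a
time here; a real version would use wide words.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 16 tables x 256 entries x 16 bytes = 64KB (hypothetical layout). */
static uint8_t T[16][256][16];

/* Multiply in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1. */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
        b >>= 1;
    }
    return p;
}

/* AES S-box from its definition: field inverse, then the affine map. */
static uint8_t sbox(uint8_t x)
{
    uint8_t inv = 0, s;
    unsigned y;
    for (y = 1; y < 256 && x; y++)
        if (gmul(x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
    s = inv;
    for (y = 1; y <= 4; y++)
        s ^= (uint8_t)((inv << y) | (inv >> (8 - y)));
    return (uint8_t)(s ^ 0x63);
}

/* T[i][x] = MixColumns(ShiftRows(state with S(x) at byte i, 0 elsewhere)). */
static void build_tables(void)
{
    static const uint8_t M[4][4] = {
        {2, 3, 1, 1}, {1, 2, 3, 1}, {1, 1, 2, 3}, {3, 1, 1, 2}
    };
    int r, c, k, x;
    memset(T, 0, sizeof T);
    for (x = 0; x < 256; x++) {
        uint8_t s = sbox((uint8_t)x);
        for (c = 0; c < 4; c++)
            for (r = 0; r < 4; r++) {
                int dst = (c - r + 4) % 4; /* column after ShiftRows */
                for (k = 0; k < 4; k++)
                    T[r + 4 * c][x][k + 4 * dst] = gmul(M[k][r], s);
            }
    }
}

/* One full round: 16 lookups, 15 xors to combine, 1 xor for the key. */
static void aes_round(uint8_t out[16], const uint8_t in[16],
                      const uint8_t rk[16])
{
    uint8_t acc[16];
    int i, j;
    memcpy(acc, T[0][in[0]], 16);
    for (i = 1; i < 16; i++)
        for (j = 0; j < 16; j++)
            acc[j] ^= T[i][in[i]][j];
    for (j = 0; j < 16; j++)
        out[j] = (uint8_t)(acc[j] ^ rk[j]);
}

/* Check round 1 of the FIPS-197 Appendix B example; returns 1 on match. */
int aes_round_selftest(void)
{
    static const uint8_t in[16] = {
        0x19, 0x3d, 0xe3, 0xbe, 0xa0, 0xf4, 0xe2, 0x2b,
        0x9a, 0xc6, 0x8d, 0x2a, 0xe9, 0xf8, 0x48, 0x08 };
    static const uint8_t rk[16] = {
        0xa0, 0xfa, 0xfe, 0x17, 0x88, 0x54, 0x2c, 0xb1,
        0x23, 0xa3, 0x39, 0x39, 0x2a, 0x6c, 0x76, 0x05 };
    static const uint8_t want[16] = {
        0xa4, 0x9c, 0x7f, 0xf2, 0x68, 0x9f, 0x35, 0x2b,
        0x6b, 0x5b, 0xea, 0x43, 0x02, 0x6a, 0x50, 0x49 };
    uint8_t out[16];
    build_tables();
    aes_round(out, in, rk);
    return memcmp(out, want, 16) == 0;
}
```

The final AES round has no MixColumns, so it would need either its own
set of tables or a fallback to the usual S-box path.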

Sure, this kills the cache, but with SSE2 128-bit xors the trick may pay
off. GCM gets ~27 cycles/byte on my Opteron with the 64KB trick
[for the GF mult], so I know the approach can work.
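
For the SSE2 part, the combining step might look roughly like the
sketch below. It assumes the 16 looked-up table rows have already been
gathered into an array; xor16 and xor16_check are hypothetical names,
and there is a plain byte-wise fallback so the code still builds where
SSE2 is not available. Unaligned loads are used for simplicity; real
tables would presumably be 16-byte aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* XOR sixteen 16-byte rows together: 15 xors in total. */
void xor16(uint8_t out[16], const uint8_t rows[16][16])
{
#ifdef __SSE2__
    /* One 128-bit xor per row instead of four 32-bit xors. */
    __m128i acc = _mm_loadu_si128((const __m128i *)rows[0]);
    int i;
    for (i = 1; i < 16; i++)
        acc = _mm_xor_si128(acc,
                            _mm_loadu_si128((const __m128i *)rows[i]));
    _mm_storeu_si128((__m128i *)out, acc);
#else
    /* Portable fallback: same result, one byte at a time. */
    int i, j;
    memcpy(out, rows[0], 16);
    for (i = 1; i < 16; i++)
        for (j = 0; j < 16; j++)
            out[j] ^= rows[i][j];
#endif
}

/* Compare xor16 against a straightforward byte loop; 1 means they agree. */
int xor16_check(void)
{
    uint8_t rows[16][16], got[16], want[16];
    int i, j;
    memset(want, 0, sizeof want);
    for (i = 0; i < 16; i++)
        for (j = 0; j < 16; j++) {
            rows[i][j] = (uint8_t)(i * 31 + j * 7 + i * j);
            want[j] ^= rows[i][j];
        }
    xor16(got, (const uint8_t (*)[16])rows);
    return memcmp(got, want, 16) == 0;
}
```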

In particular, the performance on non-x86_64 processors, where registers
are scarce, may be interesting.

Hmm...

Tom
