On a true x86 machine AES in assembler runs in about 18 cycles per byte
(with some really arcane tweaking this can be cut to 15 cycles per byte).
This compares with C code that runs at about 21 cycles per byte (the
figures I am giving are all for AES-128 and do not count key scheduling
costs). This x86 assembler code runs in x86 mode on AMD64 in 14 cycles
per byte.
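
To put the cycles-per-byte figures above in more familiar terms, here is a rough sketch that converts them to throughput. The 2 GHz clock is purely an assumption for illustration (no clock speed is given above), and the function name is made up for this example.

```python
def throughput_mb_per_s(cycles_per_byte, clock_hz):
    """Convert a cycles-per-byte cost into MB/s (1 MB = 10**6 bytes)."""
    # bytes per second = cycles per second / cycles per byte
    return clock_hz / cycles_per_byte / 1e6


# Assumed clock rate, for illustration only; not taken from the figures above.
CLOCK_HZ = 2.0e9

# AES-128 figures quoted above, excluding key scheduling.
figures = {
    "C code": 21,
    "x86 assembler": 18,
    "x86 assembler, tweaked": 15,
    "x86 assembler in x86 mode on AMD64": 14,
}

for name, cpb in figures.items():
    mb_s = throughput_mb_per_s(cpb, CLOCK_HZ)
    print(f"{name}: {cpb} cycles/byte -> {mb_s:.1f} MB/s")
```

At any fixed clock the ratios are what matter: going from 21 to 18 cycles per byte is roughly a 17% throughput gain, whatever the actual clock speed.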

DJB style response: That's nothing, I have umac that takes 7 cycles...

Sorry...Had to...

No you didn't.

Dan Bernstein (DJB) does meaningful work, and when you make fun of him
in the way you normally do it makes you look jealous and spiteful.

Definition 2 at

I'm making a point that DJB's recent "curve255 is faster than PKCS" is a
ludicrous comparison. So here I compare his umac [hash127 etc] work to