On a true x86 machine, AES in assembler runs in about 18 cycles per byte
(with some really arcane tweaking this can be cut to 15 cycles per byte).
This compares with C code that runs at about 21 cycles per byte (the
figures I am giving are all for AES-128 and do not count key scheduling
costs). This x86 assembler code runs in x86 mode on AMD64 in 14 cycles
per byte.

DJB style response: That's nothing, I have umac that takes 7 cycles...

Dan Bernstein (DJB) does meaningful work, and when you make fun of him
in the way you normally do, it makes you look jealous and spiteful.

Definition 2 at

I'm making the point that DJB's recent "curve255 is faster than PKCS" is a
ludicrous comparison. So here I compare his umac [hash127 etc] work to