Re: LibTomMath forked [SSE2 addons]

From: Tom St Denis (tom_at_securescience.net)
Date: 06/30/04


Date: Wed, 30 Jun 2004 03:19:12 GMT

Updated the patches a bit more...

http://math.libtomcrypt.org/files/patch-0.30/ltmsse_patches3.zip

On my P4 2.8C I managed 1170 512-bit exptmods per second [roughly 2.4M
cycles each, down from 2.5M cycles in the previous patch set]. Roughly
speaking this is around 2.4x faster than OpenSSL.
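
For anyone wanting to check the cycle figure: it is just the clock rate divided by the throughput. A minimal sketch in C, assuming the P4 2.8C runs at its nominal 2.8 GHz; the helper names below are made up for illustration and are not part of LTM.

```c
#include <stdio.h>

/* Hypothetical helper (not in LibTomMath): cycles spent per operation,
 * given a clock rate in Hz and a measured throughput in operations/sec. */
double cycles_per_op(double clock_hz, double ops_per_sec)
{
    return clock_hz / ops_per_sec;
}

/* Hypothetical helper: the inverse conversion, from a cycle count per
 * operation back to operations per second at the same clock rate. */
double ops_per_sec(double clock_hz, double cycles)
{
    return clock_hz / cycles;
}

/* Print both conversions for the figures quoted in the post.
 * Assumption: 2.8e9 Hz is the nominal clock of a P4 2.8C. */
void print_exptmod_figures(void)
{
    const double clock_hz = 2.8e9;

    printf("cycles per exptmod at 1170/sec: %.0f\n",
           cycles_per_op(clock_hz, 1170.0));
    printf("exptmods/sec at 2.5M cycles:    %.0f\n",
           ops_per_sec(clock_hz, 2.5e6));
}
```

Calling print_exptmod_figures() from a main() gives a bit under 2.4M cycles for the new patch set, and about 1120 exptmods per second for the previous 2.5M-cycle figure.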

The zip file has the patches [you can apply against LTM 0.30] as well as a
"mpi.c" file you can drop into LibTomCrypt [or use on its own I
guess ;-)]. As with the previous patch set, you have to define LTMSSE to get
the SSE2 optimizations.

As I understand it the AMD K8 processor has SSE2 as well. I was wondering
if anyone with access to one could apply the patches and run the timing
demo? [e.g. make timing ; ./ltmtest] and gimme the exptmod outputs. Would
be interesting to get the cycle count for a 512-bit exptmod so I can
compare them ;-)

Tom


