Re: LibTomMath forked [SSE2 addons]
From: Tom St Denis (tom_at_securescience.net)
Date: 06/30/04
- Previous message: Michael Amling: "IND-CCA2 public only?"
- In reply to: Tom St Denis: "Re: LibTomMath forked [SSE2 addons]"
- Next in thread: Marcel Martin: "Re: LibTomMath forked [SSE2 addons]"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Date: Wed, 30 Jun 2004 16:08:18 GMT
This is funny.... check this out
> Now with SSE2 [and ICC -O3 -xN -ip]
>
> Timing Multiplying:
> 4 digits: 400 cycles
> 8 digits: 764 cycles
> 12 digits: 1280 cycles
> 16 digits: 1868 cycles
> 20 digits: 3068 cycles
> 24 digits: 4020 cycles
> 28 digits: 5146 cycles
> 32 digits: 6382 cycles
> 36 digits: 7766 cycles
> Timing Squaring:
> 4 digits: 344 cycles
> 8 digits: 644 cycles
> 12 digits: 1056 cycles
> 16 digits: 1550 cycles
> 20 digits: 2202 cycles
> 24 digits: 2866 cycles
> 28 digits: 3566 cycles
> 32 digits: 4310 cycles
> 36 digits: 5122 cycles
Now with GCC [v3.3.3, -march=pentium4 -mcpu=pentium4 -O3
-fomit-frame-pointer -funroll-loops]
Timing Multiplying:
4 digits: 464 cycles
8 digits: 740 cycles
12 digits: 1118 cycles
16 digits: 1616 cycles
20 digits: 2162 cycles
24 digits: 2858 cycles
28 digits: 3590 cycles
32 digits: 4400 cycles
36 digits: 5478 cycles
Timing Squaring:
4 digits: 460 cycles
8 digits: 766 cycles
12 digits: 1076 cycles
16 digits: 1460 cycles
20 digits: 1848 cycles
24 digits: 2300 cycles
28 digits: 2824 cycles
32 digits: 3514 cycles
36 digits: 4196 cycles
Seems GCC optimizes the code better than Intels own C compiler...
Ok I'm done now ;-) Sorry about all the posts... but it's just fun hacking
and might benefit someone.
Tom
- Previous message: Michael Amling: "IND-CCA2 public only?"
- In reply to: Tom St Denis: "Re: LibTomMath forked [SSE2 addons]"
- Next in thread: Marcel Martin: "Re: LibTomMath forked [SSE2 addons]"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Relevant Pages
|