Re: LibTomCrypt ASN.1...
- From: tomstdenis@xxxxxxxxx
- Date: 16 Apr 2006 05:07:34 -0700
Phil Carmody wrote:
> Shame - in competent hands G5s are noticeably faster than any
> x86 variant per clock tick. (30% faster, I'd say, and that's
> both for memory-bound and compute-bound tasks.)
I don't recall that being the general case. Overall the G5 has fewer
execution units, less cache with fewer ways and, IIRC, uses an FSB.
Apple's "G4 vs G5" comparison says best-case memory access latency
is 135 ns. Opterons at a similar clock have lower latency.
The G5's only significant advantage is the ability to run two
double-precision floating-point ops per cycle. Even with SSE2 the
Opterons issue two macro-ops and take at least two cycles, since both
target the same port.
As for instruction flow, it looks like the G5 groups things in lines
of up to five ops (each to one of up to five ports), whereas the AMD
core does so in groups of three. As a result the ICU window is larger
on the G5 (200 ops, or 40 lines of 5 ops), whereas the AMD has 24
lines of 3 macro-ops (72 in total).
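
As a quick check of the window arithmetic, here is a minimal Python
sketch. It uses only the line counts and widths quoted in this post;
the function name is made up for illustration.

```python
def window_ops(lines: int, ops_per_line: int) -> int:
    """Total ops that can be tracked in flight: lines times ops per line."""
    return lines * ops_per_line


g5 = window_ops(40, 5)  # figures quoted in the post for the G5
k8 = window_ops(24, 3)  # figures quoted in the post for the AMD core

if __name__ == "__main__":
    print(f"G5: {g5} ops, AMD: {k8} macro-ops, ratio {g5 / k8:.2f}")
```

A bigger window only helps if the code has enough independent work to
fill it, so the ratio is an upper bound on the benefit, not a speedup.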
For raw ALU performance AMD still wins. It can issue up to 6 micro-ops
per cycle to the AGU and ALU units, provided their dependencies have
been satisfied. Most register/register instructions are single cycle,
and the load/store unit is accessible from all three ALU pipes.
According to a review at Ars Technica, on the G5 two dependent ALU ops
cannot run back to back without a 1-cycle penalty. This is not true in
the AMD world.
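
To put a number on the back-to-back penalty, here is a toy Python
model of a single dependency chain. It is only a sketch under stated
assumptions: every op has the same latency, nothing else competes for
the pipes, and the function and parameter names are invented for this
example. It is not a model of either real core.

```python
def chain_cycles(n_ops: int, op_latency: int = 1, bubble: int = 0) -> int:
    """Cycles to finish a chain of n_ops where each op needs the previous result.

    bubble is the extra delay (in cycles) between one op finishing and a
    dependent op being able to start; 0 means back-to-back issue.
    """
    if n_ops <= 0:
        return 0
    return n_ops * op_latency + (n_ops - 1) * bubble


no_penalty = chain_cycles(8)              # back-to-back dependent ops
with_penalty = chain_cycles(8, bubble=1)  # 1-cycle gap between dependent ops

if __name__ == "__main__":
    print(no_penalty, with_penalty)
```

For long chains of single-cycle ops the 1-cycle gap approaches a 2x
slowdown, which is why it matters for serial integer code such as
carry chains in bignum math. Code with enough independent ops to
interleave can hide it.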
For raw FPU performance the G5 has more issue ports per cycle. Assuming
they are symmetric and have the same latencies as AMD [or better], it
would perform better.
The L1 cache could be faster in the G5 world [I haven't seen any
latency claims] as it is direct mapped, but it is also more likely to
be thrashed by large unrolled code (of the sort that G5 and AMD like).
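
The thrashing risk of a direct-mapped cache can be shown with a small
simulator. This is a sketch only: the cache and line sizes below are
hypothetical, not the real G5 or Opteron parameters, and LRU
replacement is an assumption.

```python
from collections import OrderedDict


def count_misses(addresses, cache_bytes, line_bytes, ways):
    """Count misses in a set-associative cache with LRU replacement.

    ways=1 gives a direct-mapped cache.
    """
    n_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(n_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        s = sets[line % n_sets]
        if line in s:
            s.move_to_end(line)  # mark as most recently used
        else:
            misses += 1
            if len(s) >= ways:
                s.popitem(last=False)  # evict the least recently used line
            s[line] = True
    return misses


# Hypothetical sizes chosen for illustration.
CACHE, LINE = 64 * 1024, 128
# Two addresses exactly one cache size apart, touched alternately.
trace = [0, CACHE] * 100
direct = count_misses(trace, CACHE, LINE, ways=1)
two_way = count_misses(trace, CACHE, LINE, ways=2)

if __name__ == "__main__":
    print(f"direct mapped: {direct} misses, 2-way: {two_way} misses")
```

With one way, the two lines keep evicting each other and every access
misses. With two ways, both fit in the same set and only the first
touch of each line misses.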
The L2 cache on the AMD is both larger and has eight times the ways.
This means you're far more likely to get an L2 hit than in the G5
world. The memory bus is lower latency in the Opteron world, but both
have roughly the same bandwidth.
Overall I don't doubt there are specific algorithms that work better on
the G5 than on the Opteron. I seriously doubt the "general case" of it
being 30% more efficient, especially since the G5 was normally only
compared against the P4, which is vastly less efficient even than the
Opteron.
Now Intel is coming out with better cores. The "Core" series [I dunno
if it's what they have now or part of the MCW series] looks to copy a
lot from K8, except that they widen the macro-op line to 4 and appear
to have two full 128-bit SSE ports and what looks like [IIRC] three
decently full ALU paths. I don't know if that means all three ALUs
will do the int/shift/rotate opcodes. The P4 had "two" ALUs, and that
only meant simple integer ops, not shifts/rotates.
Either way, it looks to compare well against the Opteron and would
easily beat the G5 on both execution resources and memory bandwidth
(something Intel is sadly king at).
Tom