Re: Intel core 2 quad - faster XMM?
- From: tomstdenis@xxxxxxxxx
- Date: 12 Jan 2007 10:10:06 -0800
Paul Rubin wrote:
> tomstdenis@xxxxxxxxx writes:
>> Bignum no. XMM only has a 32x32 => 64 multiplier. So even though you
>> can pack two of them (and it takes more than a cycle to complete) you
>> still have twice as many of them as just using MULQ.
> But I thought XMM bignum was already faster than MULQ even in the old
> processors. Here's some old P4 timings from Eric Young:
On the P4 Prescott this was the case. It is not so on the later series,
and especially not on the C2D and AMD64 processors. MULQ is now fast
enough that the XMM multiply would basically have to complete in
1-2 cycles to break even.
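To make the multiply-count argument concrete, here is a rough C sketch of a 64x64 => 128 product built only from 32x32 => 64 products, which is the only multiply shape the XMM unit offers. It is an illustration under my own assumptions, not Eric Young's code or any library's routine; the names u128_parts and mul64_via_32 are made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper type: the two 64-bit halves of a 128-bit product. */
typedef struct { uint64_t hi, lo; } u128_parts;

/* 64x64 => 128 multiply using only 32x32 => 64 partial products.
 * A single MULQ yields the same hi:lo pair in one instruction. */
static u128_parts mul64_via_32(uint64_t a, uint64_t b)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;   /* low and high 32-bit limbs */
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;

    /* Four partial products; a packed XMM multiply could compute two
     * of these per instruction, so at least two instructions are needed. */
    uint64_t p00 = a0 * b0;
    uint64_t p01 = a0 * b1;
    uint64_t p10 = a1 * b0;
    uint64_t p11 = a1 * b1;

    /* Sum the terms that land in bits 32..63; this cannot overflow
     * 64 bits because each addend is below 2^32. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    u128_parts r;
    r.lo = (mid << 32) | (uint32_t)p00;
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}
```

That is four multiplies plus the shifts and adds to recombine them, against one MULQ. Packing two products per XMM instruction still leaves two multiply instructions and the carry handling, which is why the XMM multiply has to be very cheap before it wins.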
As for bit-slicing, I imagine latency would be lower so yeah it'd help
there.
> Designing new primitives is almost always a bad idea (since we already
> have so many), but XMM is so ubiquitous that maybe there's some
> justification for trying to find a way to use it.
> Really though, they should just add AES operations to the XMM
> instruction set. ;)
No, they should add AES to the ISA. When I was at AMD we looked at
that [briefly], and the result was that, with an FPU opcode every 2
cycles, AES done as discrete steps would be slower than a pure integer
AES. The best way to speed AES up and also secure it against side
channels is to just have a one-shot AES instruction.
Tom