Aceshardware

(not so) temporary home for the aceshardware community
It is currently Fri Nov 27, 2009 3:45 pm

All times are UTC + 1 hour





[ 163 posts ]
Posted: Fri Aug 29, 2008 9:13 pm

Joined: Thu Aug 28, 2008 8:50 pm
Posts: 3
I have read your info on how Intel cheats, but I was more interested in what a "CPU dispatcher" is. It sounds like it is just compiler support for generating a cpuid instruction plus a switch statement for a given function marked with __declspec(cpu_dispatch).

(I was hoping it was somehow some kind of hardware support for accessing different SSE instruction fallbacks at runtime.)

Thanks,
William Pfeil

8)


Posted: Sat Aug 30, 2008 7:21 am

Joined: Fri Sep 07, 2007 10:31 am
Posts: 25
Location: Denmark
aceupsleeve wrote:
Sounds like it is just compiler support for generating a cpuid instruction + switch statement, for a given function marked with __declspec(cpu_dispatch).

Yes, it is.
Some function libraries use dispatching for their most time-consuming functions.
Dispatching individual instructions can be done by emulation, but this is very slow.
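A minimal hand-written sketch of what such a dispatcher does, in C. It is not the code the Intel compiler generates for __declspec(cpu_dispatch); all function names are hypothetical, and the CPUID check uses the GCC/Clang builtin __builtin_cpu_supports, which exists only on x86 targets (elsewhere the sketch falls back to the generic path). Both implementations are plain C so the example builds anywhere; in real code the fast path would be compiled for AVX.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic implementation: runs on any CPU. */
static double sum_generic(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += x[i];
    return s;
}

/* Hypothetical "fast" implementation: four independent accumulators,
   the shape a 4-wide vector loop would have. Real code would build this
   with AVX enabled or write it with intrinsics. */
static double sum_avx_path(const double *x, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]; s1 += x[i + 1]; s2 += x[i + 2]; s3 += x[i + 3];
    }
    for (; i < n; i++) s0 += x[i];   /* leftover elements */
    return (s0 + s1) + (s2 + s3);
}

/* CPUID-based feature test via the GCC/Clang x86 builtins;
   on other targets or compilers it simply reports "no AVX". */
static int cpu_has_avx(void) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx");
#else
    return 0;
#endif
}

typedef double (*sum_fn)(const double *, size_t);

/* The dispatcher: test the CPU once, cache the choice in a function
   pointer, then call through it. Not thread-safe as written. */
static double sum_dispatch(const double *x, size_t n) {
    static sum_fn chosen = NULL;
    assert(x != NULL || n == 0);
    if (chosen == NULL)
        chosen = cpu_has_avx() ? sum_avx_path : sum_generic;
    return chosen(x, n);
}
```

The CPU check runs only once; after that each call costs one indirect jump, which is why dispatching whole time-consuming functions is cheap while dispatching individual instructions is not.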


Post subject: Intel removed the 4-operand FMA support from AVX spec
Posted: Mon Jan 19, 2009 4:54 am

Joined: Tue Oct 09, 2007 6:49 pm
Posts: 5
It seems Intel has removed the 4-operand FMA support from the AVX specification. The latest version (319433-004) lists the SSE5-like

Code:
VFMADD132PD xmm0, xmm1, xmm2/m128   ; xmm0 = xmm0*xmm2 + xmm1
VFMADD213PD xmm0, xmm1, xmm2/m128   ; xmm0 = xmm1*xmm0 + xmm2
VFMADD231PD xmm0, xmm1, xmm2/m128   ; xmm0 = xmm1*xmm2 + xmm0


instead of the

Code:
VFMADDPD xmm0, xmm1, xmm2/m128, xmm3   ; xmm0 = xmm1*xmm2 + xmm3


The new version destroys one of the source operands. :(
The VPERMIL2PS and VPERMIL2PD instructions were also removed. VPBLENDVB and VBLENDVPS remain 4-operand in the new spec.
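A scalar C model of the operand semantics makes the "destroys one of the source operands" point concrete. It is a sketch under assumptions: an XMM register is modelled as two doubles, the function names are hypothetical, the 4-operand form is assumed to compute dst = src1*src2 + src3, and the plain a*b + c expression may round twice where a real fused multiply-add rounds only once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a 128-bit XMM register holding two doubles. */
typedef struct { double v[2]; } xmm_pd;

/* Destructive 3-operand forms: op1 is both a source and the destination,
   so its old value is lost. The digits give the operand order:
   132 = op1*op3 + op2, 213 = op2*op1 + op3, 231 = op2*op3 + op1.
   Note: a*b + c may round twice here; a real FMA rounds once. */
static void vfmadd132pd(xmm_pd *op1, xmm_pd op2, xmm_pd op3) {
    for (size_t i = 0; i < 2; i++) op1->v[i] = op1->v[i] * op3.v[i] + op2.v[i];
}

static void vfmadd213pd(xmm_pd *op1, xmm_pd op2, xmm_pd op3) {
    for (size_t i = 0; i < 2; i++) op1->v[i] = op2.v[i] * op1->v[i] + op3.v[i];
}

static void vfmadd231pd(xmm_pd *op1, xmm_pd op2, xmm_pd op3) {
    for (size_t i = 0; i < 2; i++) op1->v[i] = op2.v[i] * op3.v[i] + op1->v[i];
}

/* Non-destructive 4-operand form (operand order assumed):
   the result goes to a separate destination and every source survives. */
static xmm_pd vfmaddpd_4op(xmm_pd src1, xmm_pd src2, xmm_pd src3) {
    xmm_pd dst;
    for (size_t i = 0; i < 2; i++) dst.v[i] = src1.v[i] * src2.v[i] + src3.v[i];
    return dst;
}
```

With the 3-operand forms, keeping the old value of op1 for later use requires an extra register copy before the FMA; the 4-operand form avoids that copy.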

What happened? Technical difficulties? A marketing decision? Would it be too good for x86? Or just the drawback of releasing the specification early?


Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Tue Jan 20, 2009 5:31 pm

Joined: Tue Aug 07, 2007 11:57 am
Posts: 181
BLL wrote:
The new version destroys one of the source operands. :(
VPERMIL2PS, VPERMIL2PD instructions also removed. VPBLENDVB and VBLENDVPS remained 4-operand in the new spec, too.

What happened? Technical difficulties? Marketing decision? It would be too good for x86? Just the drawback of the early specification release?



Intel mentioned a possible rescheduling of Sandy Bridge a while ago:
"In order to ensure sufficient lifecycle for Nehalem (and family)..."

This sounds like a marketing way of saying that they are thinking about
a serious rescheduling of more than just a few months.

(BTW, this was before the credit crunch.)



[Image: Intel slide]


Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Tue Jan 20, 2009 6:50 pm

Joined: Wed Aug 29, 2007 3:55 pm
Posts: 800
Location: Great white north
Hans de Vries wrote:
Intel has mentioned an optional rescheduling of Sandy Bridge a while ago,
"In order to ensure sufficient lifecycle for Nehalem (and family)..."

Which sounds like a Marketing way of saying that they are thinking about
a serious rescheduling, more than just a few months.


In other words, the Nehalem uarch is so far ahead of anything on AMD's
roadmap that there is no reason not to slow down the mainstream x86
tick-tock and either reduce R&D spending in the severe downturn or divert
engineering resources to other Intel businesses where competitors are
actually putting up a sporting fight.


Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Tue Jan 20, 2009 7:02 pm

Joined: Fri Mar 21, 2008 4:07 pm
Posts: 51
Paul DeMone wrote:
Hans de Vries wrote:
[...]


In other words the Nehalem uarch is so far ahead of anything on AMD's
roadmap there is no reason not to slow down on mainstream x86 tick
tock and either reduce R&D spending in the severe downturn or divert
engineering resources to other Intel businesses where competitors are
actually putting up a sporting fight.


While AMD might seem the clear loser in this cycle, they should not be underestimated.
Intel should finish Sandy Bridge as envisioned and simply delay its release. Having an ace in your hand is always better than expecting your competitor not to perform.

What about Tuk Paul, any news on the horizon?


Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Tue Jan 20, 2009 11:43 pm

Joined: Wed Sep 26, 2007 11:11 pm
Posts: 33
savantu wrote:
[...]

While AMD might seem the clear loser in this cycle, they should not be underestimated.
Intel should finish Sandy Bridge as envisioned and simply delay its release. Having an ace in your hand is always better than expecting your competitor not to perform.


Keep in mind that this slide talks about mainstream products. I doubt Intel would break their clean streak of tick-tocks out of complacency. Nehalem was launched in low volume at the very end of the year, and I would expect the next generations to have similar launches.

The problem with the mainstream is that it's slow to make big transitions. 'Extreme' desktop and server can skip to a new socket in perhaps six months, but desktop and mobile can take well over a year. Mainstream Nehalems have been scheduled for H2 for a long time now, and thinking ahead from there you can already see the problem coming. OEMs will not even be close to the Penryn/Nehalem crossover when Westmere is released and the products with an integrated GPU show up. Piling another new platform on top just after those might be too much for them to take; margins were already thin before the economy went bad.

To solve this, perhaps in the future mainstream products will only be created from ticks, not tocks. This would take pressure off the value segment while Intel could still keep the engines rolling at full speed for customers who care about performance.


Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Wed Jan 21, 2009 11:19 am

Joined: Fri Sep 07, 2007 10:31 am
Posts: 25
Location: Denmark
BLL wrote:
It seems, Intel removed the 4-operand FMA support from the AVX specification. [...]
The new version destroys one of the source operands. :(
VPERMIL2PS, VPERMIL2PD instructions also removed. VPBLENDVB and VBLENDVPS remained 4-operand in the new spec, too.

What happened? Technical difficulties? Marketing decision? It would be too good for x86? Just the drawback of the early specification release?


The 4-operand non-destructive opcode would give more flexibility to the programmer, but at a cost to the microarchitecture. We can only guess why Intel changed their plans, but I can think of two likely reasons:

(1) Support for 4-operand microoperations would be quite costly in terms of die space and power consumption. The extra flexibility of 4-operand instructions is not worth the extra cost. The few remaining 4-operand instructions are probably split into multiple 3-operand microoperations at the decoding stage.

(2) The 4-operand instructions are one byte longer than the 3-operand instructions. It looks like the next generation of Intel processors will still have a decoding rate of only 16 bytes per clock cycle where AMD have 32 bytes per clock. In other words, instruction size is a serious bottleneck for Intel but not for AMD. This could make AMD SSE5 win over Intel FMA on some benchmark tests.
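A back-of-the-envelope model of reason (2). It assumes register-only operands, a 3-byte VEX prefix, one opcode byte and one ModRM byte for the 3-operand FMA (5 bytes), plus one immediate byte carrying the fourth register in the 4-operand form (6 bytes). It ignores the number of decoders, alignment and mixed instruction streams, so it shows only the arithmetic, not real throughput; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many consecutive instructions (byte lengths in len[]) fit
   entirely inside one decode window of window_bytes bytes.
   Simplified model: ignores decoder count, alignment and prefix limits. */
static size_t insns_in_window(const unsigned *len, size_t n, unsigned window_bytes) {
    unsigned used = 0;
    size_t count = 0;
    assert(len != NULL || n == 0);
    while (count < n && used + len[count] <= window_bytes) {
        used += len[count];
        count++;
    }
    return count;
}
```

Under these assumptions a 16-byte window holds 3 of the 5-byte instructions but only 2 of the 6-byte ones, while a 32-byte window holds 6 versus 5.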

This change makes Intel FMA and AMD SSE5 even more similar. The software community will be quite unwilling to support two different codes for identical instructions. Fortunately, the delay in Intel's schedule gives AMD more time to catch up and hopefully make their processors compatible with Intel. I see no hope that Intel will ever support AMD SSE5.


Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Wed Jan 21, 2009 2:23 pm

Joined: Fri Aug 31, 2007 10:08 pm
Posts: 208
Location: Switzerland
Agner wrote:
The few remaining 4-operand instructions are probably split into multiple 3-operand microoperations at the decoding stage.


I really hope that's not the case; there won't be much point to VBLENDVPS if its throughput is no better than the VANDPS+VANDNPS+VORPS equivalent.
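A bit-level model of one 32-bit lane shows what the three-instruction equivalent computes. The function names are hypothetical; the model assumes, as is usual, that the mask comes from a compare instruction and is therefore all-ones or all-zeros per lane, which is the only case in which the two forms agree (VBLENDVPS looks only at the sign bit).

```c
#include <assert.h>
#include <stdint.h>

/* One lane of VBLENDVPS: the sign bit of the mask picks b (set) or a (clear). */
static uint32_t blendv_lane(uint32_t a, uint32_t b, uint32_t mask) {
    return (mask & 0x80000000u) ? b : a;
}

/* The same selection with three bitwise operations. */
static uint32_t and_andn_or_lane(uint32_t a, uint32_t b, uint32_t mask) {
    uint32_t t1 = mask & b;     /* VANDPS                                    */
    uint32_t t2 = ~mask & a;    /* VANDNPS: NOT(first operand) AND second    */
    return t1 | t2;             /* VORPS                                     */
}
```

The sign-bit-only mask 0x80000000 is where the two diverge: the blend selects b, while the bitwise version mixes bits of both inputs.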

Agner wrote:
It looks like the next generation of Intel processors will still have a decoding rate of only 16 bytes per clock cycle


How do you know that? Remember, we are talking about the "tock" after Sandy Bridge (2012) for FMA in AVX.

Agner wrote:
This could make AMD SSE5 win over Intel FMA on some benchmark tests.


Hey! AVX YMM registers are 2x wider than the legacy XMM registers used by "AMD SSE5", aren't they?
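The width argument as arithmetic: a 128-bit XMM register holds two doubles and a 256-bit YMM register holds four, so the number of vector FMA instructions for a loop over n doubles halves. This sketch ignores decode limits, memory bandwidth and latency; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Vector instructions needed for n doubles when each register holds
   reg_bits / 64 of them; the division rounds up to cover the tail. */
static size_t fma_insns_needed(size_t n, unsigned reg_bits) {
    size_t lanes = reg_bits / 64;
    assert(lanes > 0);
    return (n + lanes - 1) / lanes;
}
```

For 1000 doubles that is 500 instructions with XMM versus 250 with YMM.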

Agner wrote:
compatible with Intel. I see no hope that Intel will ever support AMD SSE5.


Honestly, at this point we really aren't sure whether AMD itself will support "AMD SSE5".


Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Wed Jan 21, 2009 2:29 pm

Joined: Tue Aug 07, 2007 11:57 am
Posts: 181
Agner wrote:
This change makes Intel FMA and AMD SSE5 even more similar. The software community will be quite unwilling to support two different codes for identical instructions. Fortunately, the delay in Intel's schedule gives AMD more time to catch up and hopefully make their processors compatible with Intel. I see no hope that Intel will ever support AMD SSE5.


The more interesting speculation is that the delay could (once more) be
"Microsoft-inspired", in order to avoid such a fork. This was my reason for
posting the slide in the first place.


Regards, Hans


Posted: Wed Jan 21, 2009 3:31 pm

Joined: Sun Jan 18, 2009 10:35 pm
Posts: 81
BTW, does anybody really need AVX on the desktop? Chips like Larrabee, and probably the next GPU from NV, will have registers that long. A 4-double vector may be useful in scientific calculations, but how would you utilize it on the desktop? Most likely, video decoding will soon use the GPU.
SSE5 looks better in this respect: you only need to recompile an existing vectorized application.


Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Wed Jan 21, 2009 3:33 pm

Joined: Wed Aug 29, 2007 3:55 pm
Posts: 800
Location: Great white north
Agner wrote:
Fortunately, the delay in Intel's schedule gives AMD more time to catch up and hopefully make their processors compatible with Intel. I see no hope that Intel will ever support AMD SSE5.


That presumes AMD's current business model survives long enough to
bring an all-new x86 microarchitecture to market. AMD's roadmap shows
only K10-core-based products for the next few years.


Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Thu Jan 22, 2009 12:38 am

Joined: Sat Mar 22, 2008 5:10 pm
Posts: 220
Agner wrote:
The 4-operand non-destructive opcode would give more flexibility to the programmer, but at a cost to the microarchitecture. We can only guess why Intel changed their plans, but I can think of two likely reasons:

[...]

A third theory: Intel may be leaving room for predicates. A non-destructive FMA with a predicate would be a 5-source-operand instruction. On the other hand, a destructive FMA with a predicate has the same 4 source operands as a non-destructive MUL or ADD with a predicate. Predicates would make BLENDs obsolete, so Intel doesn't have to care about their future.
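A C model of the operand counting in this theory. The lane count, function names and merge behaviour (masked-off lanes keep the old destination value) are assumptions for illustration, not any announced encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LANES 4

/* Hypothetical predicated, non-destructive ADD.
   Sources read: dst (old value, kept in masked-off lanes), a, b, pred = 4. */
static void add_pred(double dst[LANES], const double a[LANES],
                     const double b[LANES], const bool pred[LANES]) {
    for (size_t i = 0; i < LANES; i++)
        if (pred[i]) dst[i] = a[i] + b[i];
}

/* Hypothetical predicated, destructive FMA: acc is addend and destination.
   Sources read: acc, a, b, pred = 4, the same count as add_pred. */
static void fma_pred_destructive(double acc[LANES], const double a[LANES],
                                 const double b[LANES], const bool pred[LANES]) {
    for (size_t i = 0; i < LANES; i++)
        if (pred[i]) acc[i] = a[i] * b[i] + acc[i];
}

/* Hypothetical predicated, non-destructive FMA.
   Sources read: dst (old value), a, b, c, pred = 5. */
static void fma_pred_nondestructive(double dst[LANES], const double a[LANES],
                                    const double b[LANES], const double c[LANES],
                                    const bool pred[LANES]) {
    for (size_t i = 0; i < LANES; i++)
        if (pred[i]) dst[i] = a[i] * b[i] + c[i];
}
```

The merge step inside each function is what a separate BLEND would otherwise do after an unpredicated operation.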


Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Thu Jan 22, 2009 6:20 pm

Joined: Fri Sep 07, 2007 10:31 am
Posts: 25
Location: Denmark
Hans de Vries wrote:
The more interesting speculations are that the delay could be (once more) "Microsoft inspired" in order to avoid such a fork. This was my reason for posting the slide in the first place.

Sorry, I don't get it. You posted an Intel slide. Do you have any information about a delay in AMD's schedule or in SSE5?

EduardoS wrote:
A third theory, Intel may be leaving space for predicates. A non-destructive FMA+predicate means a 5 source operand instruction, in the other hand, a destructive FMA+predicate has the same 4 source operands as a non-destructive MUL or ADD + predicate, predicates would make BLENDs obsolete, so they don't have to care about their future.

The 4-operand AVX format has 4 unused bits, probably reserved for future 5-operand instructions, so there is no need to reserve space for a predicate operand.
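How the fourth register fits into the 4-operand encoding, as a sketch: in the VEX "is4" form the register number is carried in bits 7:4 of an 8-bit immediate, and for instructions such as VBLENDVPS the low four bits are not used. The helper names are hypothetical, and what the spare bits might encode in future is speculation, as in the post.

```c
#include <assert.h>
#include <stdint.h>

/* Register number carried in bits 7:4 of the is4 immediate. */
static unsigned is4_register(uint8_t imm8) { return imm8 >> 4; }

/* The low four bits, unused by VBLENDVPS-style instructions. */
static unsigned is4_spare_bits(uint8_t imm8) { return imm8 & 0x0Fu; }

/* Build an is4 immediate for register reg (0..15) plus an optional
   low-nibble payload (hypothetical use, e.g. a future fifth operand). */
static uint8_t make_is4(unsigned reg, unsigned low4) {
    assert(reg < 16 && low4 < 16);
    return (uint8_t)((reg << 4) | low4);
}
```

With register-only encodings, a fifth register operand could fit in the spare nibble without making the instruction any longer.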

Predicated instructions would be nice. In fact, I wonder why they didn't introduce predicates long ago. BTW, it would be possible to make predicated instructions without changing the instruction set. Intel have something they call "macro-op fusion": the decoder fuses a compare + a branch instruction into a single micro-operation. In the same way, they could convert a short branch + an ALU instruction into a predicated micro-operation. They could convert an FP-MUL + FP-ADD into an FMA micro-operation. And they could convert a MOV + a 2-operand ALU instruction into a 3-operand ALU operation. But they prefer to extend the already burgeoning instruction set :-)
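A C-level illustration of the "short branch + ALU instruction" pattern and its predicated equivalent. It is not a model of how a decoder would fuse anything, and the function names are hypothetical. For the first form, compilers often already emit a conditional move (CMOV), which is x86's existing, limited form of predication.

```c
#include <assert.h>

/* Branchy form: a short forward branch that skips one ALU instruction. */
static int add_if_branch(int x, int y, int cond) {
    if (cond)
        x += y;
    return x;
}

/* Predicated form: the add always executes; the condition only decides
   whether y contributes, via an all-ones or all-zeros mask. */
static int add_if_predicated(int x, int y, int cond) {
    int mask = -(cond != 0);   /* -1 (all ones) if cond is true, else 0 */
    return x + (y & mask);
}
```

The predicated version has no branch to mispredict; the price is that the add is always executed.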


Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Thu Jan 22, 2009 10:21 pm

Joined: Tue Aug 07, 2007 11:57 am
Posts: 181
Agner wrote:
Hans de Vries wrote:
The more interesting speculations are that the delay could be (once more) "Microsoft inspired" in order to avoid such a fork. This was my reason for posting the slide in the first place.

Sorry, I don't get it. You posted an Intel slide. Do you have any information about delay in AMD schedule or SSE5?


Bulldozer (SSE5) is now due around Q1 2011.

[Image]


Regards, Hans

