I have read your info on how Intel cheats, but was more concerned about what a "CPU dispatcher" is. Sounds like it is just compiler support for generating a cpuid instruction + switch statement, for a given function marked with __declspec(cpu_dispatch).
(I was hoping somehow that it was some kind of hardware support for accessing different SSE instruction fallbacks at runtime.)
Sounds like it is just compiler support for generating a cpuid instruction + switch statement, for a given function marked with __declspec(cpu_dispatch).
Yes it is.
Some function libraries have dispatching of the most time-consuming functions.
Dispatching the individual instructions can be done with emulation but this is very slow.
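To make the "cpuid + switch statement" idea concrete, here is a minimal sketch of the dispatch pattern, written in Python only to show the control flow. The feature name "sse2", the `detect_features` callback, and both `dot_*` functions are hypothetical stand-ins; a real dispatcher would run CPUID once and store a function pointer to the chosen version.

```python
from typing import Callable, FrozenSet, List

def dot_generic(a: List[float], b: List[float]) -> float:
    """Baseline version that runs on any CPU."""
    return sum(x * y for x, y in zip(a, b))

def dot_sse2(a: List[float], b: List[float]) -> float:
    """Stand-in for a vectorized version: accumulates two lanes at a time."""
    lanes = [0.0, 0.0]
    n = min(len(a), len(b))
    for i in range(0, n - 1, 2):
        lanes[0] += a[i] * b[i]
        lanes[1] += a[i + 1] * b[i + 1]
    if n % 2:  # leftover element when the length is odd
        lanes[0] += a[n - 1] * b[n - 1]
    return lanes[0] + lanes[1]

def select_dot(features: FrozenSet[str]) -> Callable[[List[float], List[float]], float]:
    """The 'switch statement': pick the best version the CPU reports support for."""
    if "sse2" in features:
        return dot_sse2
    return dot_generic

def make_dispatched_dot(detect_features: Callable[[], FrozenSet[str]]):
    """Return a function that detects features on its first call only."""
    chosen = None

    def dot(a: List[float], b: List[float]) -> float:
        nonlocal chosen
        if chosen is None:  # first call: run detection and remember the choice
            chosen = select_dot(detect_features())
        return chosen(a, b)

    return dot
```

The point of caching the choice is that the detection cost is paid once, not on every call to the dispatched function.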
The new version destroys one of the source operands. :(
The VPERMIL2PS and VPERMIL2PD instructions were also removed. VPBLENDVB and VBLENDVPS remain 4-operand in the new spec, though.
What happened? Technical difficulties? Marketing decision? It would be too good for x86? Just the drawback of the early specification release?
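For readers unfamiliar with the distinction, here is a rough sketch of what "destroys one of the source operands" means in practice. It models registers as a Python dict. The function names and the choice of which operand gets overwritten are assumptions made for illustration, not the encoding in either version of the spec.

```python
from typing import Dict

Regs = Dict[str, float]

def fma4(regs: Regs, dst: str, a: str, b: str, c: str) -> None:
    """Non-destructive form: dst = a*b + c, all three sources preserved."""
    regs[dst] = regs[a] * regs[b] + regs[c]

def fma3(regs: Regs, dst_src: str, b: str, c: str) -> None:
    """Destructive form: dst_src = dst_src*b + c, overwriting one source."""
    regs[dst_src] = regs[dst_src] * regs[b] + regs[c]

def mov(regs: Regs, dst: str, src: str) -> None:
    """Register copy, needed to save a source before a destructive op."""
    regs[dst] = regs[src]

def fma_keep_sources_with_fma3(regs: Regs, dst: str, a: str, b: str, c: str) -> None:
    """Same result as fma4, but it takes two instructions: a copy plus fma3."""
    mov(regs, dst, a)
    fma3(regs, dst, b, c)
```

With the destructive form, keeping all three inputs alive costs an extra register copy, which is the flexibility the 4-operand encoding would have provided for free.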
Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Tue Jan 20, 2009 5:31 pm
Joined: Tue Aug 07, 2007 11:57 am Posts: 181
BLL wrote:
The new version destroys one of the source operands. :( The VPERMIL2PS and VPERMIL2PD instructions were also removed. VPBLENDVB and VBLENDVPS remain 4-operand in the new spec, though.
What happened? Technical difficulties? Marketing decision? It would be too good for x86? Just the drawback of the early specification release?
Intel has mentioned an optional rescheduling of Sandy Bridge a while ago,
"In order to ensure sufficient lifecycle for Nehalem (and family)..."
Which sounds like a Marketing way of saying that they are thinking about
a serious rescheduling, more than just a few months.
Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Tue Jan 20, 2009 6:50 pm
Joined: Wed Aug 29, 2007 3:55 pm Posts: 800 Location: Great white north
Hans de Vries wrote:
Intel has mentioned an optional rescheduling of Sandy Bridge a while ago, "In order to ensure sufficient lifecycle for Nehalem (and family)..."
Which sounds like a Marketing way of saying that they are thinking about a serious rescheduling, more than just a few months.
In other words the Nehalem uarch is so far ahead of anything on AMD's
roadmap there is no reason not to slow down on mainstream x86 tick
tock and either reduce R&D spending in the severe downturn or divert
engineering resources to other Intel businesses where competitors are
actually putting up a sporting fight.
Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Tue Jan 20, 2009 7:02 pm
Joined: Fri Mar 21, 2008 4:07 pm Posts: 51
Paul DeMone wrote:
Hans de Vries wrote:
Intel has mentioned an optional rescheduling of Sandy Bridge a while ago, "In order to ensure sufficient lifecycle for Nehalem (and family)..."
Which sounds like a Marketing way of saying that they are thinking about a serious rescheduling, more than just a few months.
In other words the Nehalem uarch is so far ahead of anything on AMD's roadmap there is no reason not to slow down on mainstream x86 tick tock and either reduce R&D spending in the severe downturn or divert engineering resources to other Intel businesses where competitors are actually putting up a sporting fight.
While AMD might seem the clear loser in this cycle, they should not be underestimated.
Intel should finish Sandy Bridge as envisioned and simply delay its release. Having an ace in your hand is always better than expecting your competitor not to perform.
Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Tue Jan 20, 2009 11:43 pm
Joined: Wed Sep 26, 2007 11:11 pm Posts: 33
savantu wrote:
Paul DeMone wrote:
Hans de Vries wrote:
Intel has mentioned an optional rescheduling of Sandy Bridge a while ago, "In order to ensure sufficient lifecycle for Nehalem (and family)..."
Which sounds like a Marketing way of saying that they are thinking about a serious rescheduling, more than just a few months.
In other words the Nehalem uarch is so far ahead of anything on AMD's roadmap there is no reason not to slow down on mainstream x86 tick tock and either reduce R&D spending in the severe downturn or divert engineering resources to other Intel businesses where competitors are actually putting up a sporting fight.
While AMD might seem the clear loser in this cycle, they should not be underestimated. Intel should finish Sandy Bridge as envisioned and simply delay its release. Having an ace in your hand is always better than expecting your competitor not to perform.
Keep in mind that this slide talks about mainstream products. I doubt Intel would break their clean streak of tick-tocks out of complacency. Nehalem was launched in low volume at the very end of the year, and I would expect the next generations to have similar launches.
The problem with the mainstream is that it's slow to make big transitions. 'Extreme' desktop and server can skip to a new socket in perhaps six months, but desktop and mobile can take well over a year. Mainstream Nehalems have been scheduled for H2 for a long time now, and thinking ahead from there you can already see the problem coming. OEMs will not even be close to Penryn/Nehalem crossover when Westmere is released and the products with integrated GPU show up. Piling up another new platform just after those might be too much to take for them; margins were already thin before the economy went bad.
To solve this, perhaps in the future mainstream products will only be created from ticks, not tocks. This would take pressure off the value segment while Intel could still keep the engines rolling at full speed for customers who care about performance.
It seems Intel removed the 4-operand FMA support from the AVX specification. [...] The new version destroys one of the source operands. :( The VPERMIL2PS and VPERMIL2PD instructions were also removed. VPBLENDVB and VBLENDVPS remain 4-operand in the new spec, though.
What happened? Technical difficulties? Marketing decision? It would be too good for x86? Just the drawback of the early specification release?
The 4-operand non-destructive opcode would give more flexibility to the programmer, but at a cost to the microarchitecture. We can only guess why Intel changed their plans, but I can think of two likely reasons:
(1) Support for 4-operand microoperations would be quite costly in terms of die space and power consumption. The extra flexibility of 4-operand instructions is not worth the extra cost. The few remaining 4-operand instructions are probably split into multiple 3-operand microoperations at the decoding stage.
(2) The 4-operand instructions are one byte longer than the 3-operand instructions. It looks like the next generation of Intel processors will still have a decoding rate of only 16 bytes per clock cycle where AMD have 32 bytes per clock. In other words, instruction size is a serious bottleneck for Intel but not for AMD. This could make AMD SSE5 win over Intel FMA on some benchmark tests.
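The effect of one extra byte on a byte-limited decoder can be estimated with simple arithmetic. In the sketch below, the 5-byte and 6-byte instruction lengths are illustrative assumptions; the post gives only the one-byte difference and the 16 and 32 bytes per clock figures. The result is an upper bound that ignores every other decoder limit.

```python
def max_decoded_per_clock(fetch_bytes_per_clock: int, instr_len_bytes: int) -> int:
    """Upper bound on whole instructions decoded per clock from fetch width alone."""
    if instr_len_bytes <= 0:
        raise ValueError("instruction length must be positive")
    return fetch_bytes_per_clock // instr_len_bytes

# Hypothetical encodings: a 3-operand form of 5 bytes, a 4-operand form of 6 bytes.
LEN_3_OPERAND = 5
LEN_4_OPERAND = 6

def compare(fetch_bytes_per_clock: int) -> dict:
    """Instructions per clock for both hypothetical encodings at a given fetch width."""
    return {
        "3-operand": max_decoded_per_clock(fetch_bytes_per_clock, LEN_3_OPERAND),
        "4-operand": max_decoded_per_clock(fetch_bytes_per_clock, LEN_4_OPERAND),
    }
```

Under these assumed lengths, `compare(16)` drops from 3 to 2 instructions per clock when the extra byte is added, while `compare(32)` only drops from 6 to 5, which illustrates why the longer encoding would hurt more at 16 bytes per clock.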
This change makes Intel FMA and AMD SSE5 even more similar. The software community will be quite unwilling to support two different codes for identical instructions. Fortunately, the delay in Intel's schedule gives AMD more time to catch up and hopefully make their processors compatible with Intel. I see no hope that Intel will ever support AMD SSE5.
The few remaining 4-operand instructions are probably split into multiple 3-operand microoperations at the decoding stage.
I really hope that's not the case. There won't be much purpose for VBLENDVPS if its throughput is no better than the VANDPS+VANDNPS+VORPS equivalent.
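As a sketch of the equivalence mentioned here, the following models 32-bit lanes as Python integers and compares a sign-bit-driven blend with the AND/ANDN/OR sequence. The function names and the simplified operand order are assumptions for illustration. Note that the three-instruction version only matches when every mask lane is all ones or all zeros.

```python
from typing import List

MASK32 = 0xFFFFFFFF  # one 32-bit lane

def blend_by_sign(a: List[int], b: List[int], mask: List[int]) -> List[int]:
    """Per-lane select on the top bit of each mask lane: b where set, else a."""
    return [y if (m >> 31) & 1 else x for x, y, m in zip(a, b, mask)]

def blend_and_andn_or(a: List[int], b: List[int], mask: List[int]) -> List[int]:
    """The same select built from three bitwise steps: AND, ANDN, then OR."""
    result = []
    for x, y, m in zip(a, b, mask):
        take_b = m & y              # AND: keep b's bits where the mask is set
        take_a = (~m & MASK32) & x  # ANDN: keep a's bits where the mask is clear
        result.append(take_b | take_a)  # OR: merge the two halves
    return result
```

For a full-width mask such as `[0xFFFFFFFF, 0]` both functions agree; for a mask with only the sign bit set, the bitwise version mixes bits from both inputs, so the single blend instruction is also the more forgiving of the two.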
Agner wrote:
It looks like the next generation of Intel processors will still have a decoding rate of only 16 bytes per clock cycle
How do you know that? Remember, we are talking about the "tock" after Sandy Bridge (2012) for FMA in AVX.
Agner wrote:
This could make AMD SSE5 win over Intel FMA on some benchmark tests.
Hey! AVX YMM registers are twice as wide as the legacy XMM registers used by "AMD SSE5", aren't they?
Agner wrote:
compatible with Intel. I see no hope that Intel will ever support AMD SSE5.
Honestly, at this point we are not even sure whether AMD itself will support "AMD SSE5".
Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Wed Jan 21, 2009 2:29 pm
Joined: Tue Aug 07, 2007 11:57 am Posts: 181
Agner wrote:
This change makes Intel FMA and AMD SSE5 even more similar. The software community will be quite unwilling to support two different codes for identical instructions. Fortunately, the delay in Intel's schedule gives AMD more time to catch up and hopefully make their processors compatible with Intel. I see no hope that Intel will ever support AMD SSE5.
The more interesting speculations are that the delay could be (once more)
"Microsoft inspired" in order to avoid such a fork. This was my reason for
posting the slide in the first place.
Btw, does anybody really need AVX on the desktop? Chips like Larrabee, and probably the next GPU from NV, will have registers that long. A vector of 4 doubles may be useful in scientific calculations, but how would you use it on the desktop? Most likely video decoding will move to the GPU soon.
SSE5 looks better in this respect: you only need to recompile an existing vectorized application.
Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Wed Jan 21, 2009 3:33 pm
Joined: Wed Aug 29, 2007 3:55 pm Posts: 800 Location: Great white north
Agner wrote:
Fortunately, the delay in Intel's schedule gives AMD more time to catch up and hopefully make their processors compatible with Intel. I see no hope that Intel will ever support AMD SSE5.
That presumes AMD's current business model survives long enough to
bring an all new x86 microarchitecture to market. AMD's roadmap shows
only K10 core based products for the next few years.
Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Thu Jan 22, 2009 12:38 am
Joined: Sat Mar 22, 2008 5:10 pm Posts: 220
Agner wrote:
The 4-operand non-destructive opcode would give more flexibility to the programmer, but at a cost to the microarchitecture. We can only guess why Intel changed their plans, but I can think of two likely reasons:
(1) Support for 4-operand microoperations would be quite costly in terms of die space and power consumption. The extra flexibility of 4-operand instructions is not worth the extra cost. The few remaining 4-operand instructions are probably split into multiple 3-operand microoperations at the decoding stage.
(2) The 4-operand instructions are one byte longer than the 3-operand instructions. It looks like the next generation of Intel processors will still have a decoding rate of only 16 bytes per clock cycle where AMD have 32 bytes per clock. In other words, instruction size is a serious bottleneck for Intel but not for AMD. This could make AMD SSE5 win over Intel FMA on some benchmark tests.
This change makes Intel FMA and AMD SSE5 even more similar. The software community will be quite unwilling to support two different codes for identical instructions. Fortunately, the delay in Intel's schedule gives AMD more time to catch up and hopefully make their processors compatible with Intel. I see no hope that Intel will ever support AMD SSE5.
A third theory: Intel may be leaving space for predicates. A non-destructive FMA+predicate means a 5-source-operand instruction. On the other hand, a destructive FMA+predicate has the same 4 source operands as a non-destructive MUL or ADD + predicate. Predicates would make BLENDs obsolete, so they don't have to care about their future.
The more interesting speculations are that the delay could be (once more) "Microsoft inspired" in order to avoid such a fork. This was my reason for posting the slide in the first place.
Sorry, I don't get it. You posted an Intel slide. Do you have any information about delay in AMD schedule or SSE5?
EduardoS wrote:
A third theory: Intel may be leaving space for predicates. A non-destructive FMA+predicate means a 5-source-operand instruction. On the other hand, a destructive FMA+predicate has the same 4 source operands as a non-destructive MUL or ADD + predicate. Predicates would make BLENDs obsolete, so they don't have to care about their future.
The 4-operand AVX format has 4 unused bits probably reserved for future 5-operand instructions. No need to reserve space for a predicate operand.
Predicate instructions would be nice. In fact, I wonder why they haven't made predicates long ago. BTW, it would be possible to make predicated instructions without changing the instruction set. Intel have something they call "macro-op fusion": the decoder fuses a compare + a branch instruction into a single micro-operation. In the same way they could convert a short branch + an ALU instruction into a predicated micro-operation. They could convert an FP-MUL + FP-ADD into an FMA microoperation. And they could convert a MOV + a 2-operand ALU instruction into a 3-operand ALU operation. But they prefer to extend the already burgeoning instruction set :-)
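To illustrate the MOV + 2-operand ALU case, here is a toy peephole pass over a list of instruction tuples. The tuple format, the `add3`-style names for the fused result, and the set of ALU mnemonics are all invented for this sketch; it says nothing about how a real decoder would implement fusion.

```python
from typing import List, Tuple

Instr = Tuple[str, ...]  # ("mov", dst, src) or ("add", dst, src), meaning dst = dst op src

TWO_OPERAND_ALU = {"add", "sub", "and", "or", "xor"}

def fuse_mov_alu(program: List[Instr]) -> List[Instr]:
    """Fuse 'mov d, s1' followed by 'op d, s2' into a single 'op3 d, s1, s2'."""
    out: List[Instr] = []
    i = 0
    while i < len(program):
        cur = program[i]
        nxt = program[i + 1] if i + 1 < len(program) else None
        if (
            cur[0] == "mov"
            and nxt is not None
            and nxt[0] in TWO_OPERAND_ALU
            and nxt[1] == cur[1]   # the ALU op overwrites the register MOV just wrote
            and nxt[2] != cur[1]   # and does not also read that register as its source
        ):
            out.append((nxt[0] + "3", cur[1], cur[2], nxt[2]))
            i += 2  # both instructions consumed by the fused operation
        else:
            out.append(cur)
            i += 1
    return out
```

For example, `[("mov", "r3", "r0"), ("add", "r3", "r1")]` becomes `[("add3", "r3", "r0", "r1")]`, a non-destructive 3-operand operation produced without any new instruction encoding.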
Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Thu Jan 22, 2009 10:21 pm
Joined: Tue Aug 07, 2007 11:57 am Posts: 181
Agner wrote:
Hans de Vries wrote:
The more interesting speculations are that the delay could be (once more) "Microsoft inspired" in order to avoid such a fork. This was my reason for posting the slide in the first place.
Sorry, I don't get it. You posted an Intel slide. Do you have any information about delay in AMD schedule or SSE5?