Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Fri Jan 23, 2009 6:05 am
Joined: Sat Mar 22, 2008 5:10 pm Posts: 220
Agner wrote:
The 4-operand AVX format has 4 unused bits probably reserved for future 5-operand instructions. No need to reserve space for a predicate operand.
I wasn't talking about reserving space in the instruction opcode but about on-chip logic: scheduling an instruction with 5 source operands is more difficult than scheduling one with 4.
Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Fri Jan 23, 2009 2:13 pm
Joined: Wed Jun 27, 2007 1:38 pm Posts: 475
Gabriele Svelto wrote:
no@spam.com wrote:
If so, then BD better deliver FMA in Intel-compatible form and the remainder (SSE5-FMA) with VEX encodings.
Whatever happens it is very likely that the final decision on this matter will be taken by Microsoft.
Is supporting these new instructions in the OS significantly harder than supporting, e.g., SSE? If not, then why would Microsoft care? They would just add support for both, just like they did for 3DNow and SSE.
Last edited by jack on Fri Jan 23, 2009 4:04 pm, edited 2 times in total.
Whatever happens it is very likely that the final decision on this matter will be taken by Microsoft.
The new 256-bit vector registers need OS support. Everything else will work in existing operating systems, including SSE5 as well as AVX and FMA instructions as long as they are used on 128-bit registers. The final decision will be taken by software producers supporting Intel or AMD instructions or both. Microsoft cares more about C# and Visual Basic than about native C++, which is the platform most likely to use the new instruction sets.
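As a rough illustration of what "OS support" means here: the operating system has to save and restore the upper halves of the YMM registers on context switches, and software can check whether it has opted in before using 256-bit instructions. The following is a minimal sketch, assuming a GCC or Clang toolchain on x86 (it relies on `<cpuid.h>` and inline assembly). The function names `xcr0_enables_ymm` and `os_supports_avx` are made up for this example. On other architectures the sketch simply reports no support.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* XCR0 bit 1 = SSE (XMM) state, bit 2 = AVX (upper YMM halves) state.
 * The OS sets both bits when it saves and restores the full registers. */
static int xcr0_enables_ymm(uint64_t xcr0) {
    return (xcr0 & 0x6u) == 0x6u;
}

/* Hypothetical helper: returns 1 if both the CPU and the OS allow
 * 256-bit AVX registers to be used, 0 otherwise. */
static int os_supports_avx(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    const unsigned int OSXSAVE = 1u << 27; /* OS has enabled XSAVE/XGETBV */
    const unsigned int AVX     = 1u << 28; /* CPU implements AVX */
    if ((ecx & (OSXSAVE | AVX)) != (OSXSAVE | AVX))
        return 0;

    /* Read XCR0 (selected by ECX = 0); only legal once OSXSAVE is set. */
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return xcr0_enables_ymm(((uint64_t)hi << 32) | lo);
#else
    return 0; /* not an x86 build: nothing to detect */
#endif
}
```

A program would typically call `os_supports_avx()` once at startup and fall back to a 128-bit code path when it returns 0.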
Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Mon Jan 26, 2009 5:16 am
Joined: Tue Jul 24, 2007 10:12 pm Posts: 59
Agner wrote:
Gabriele Svelto wrote:
Whatever happens it is very likely that the final decision on this matter will be taken by Microsoft.
The new 256-bit vector registers need OS support. Everything else will work in existing operating systems, including SSE5 as well as AVX and FMA instructions as long as they are used on 128-bit registers. The final decision will be taken by software producers supporting Intel or AMD instructions or both. Microsoft cares more about C# and Visual Basic than about native C++, which is the platform most likely to use the new instruction sets.
I am not sure why we need the 256-bit register expansion, or a whole new instruction set extension that will probably have a short useful life. Why the "in-between" stage? Why not jump to supporting something like a Larrabee (or other GPU-style) co-processor or the instruction extensions? If code really benefits from 256-bit registers, then wouldn't it run a whole lot faster on a Larrabee co-processor? I guess supporting asymmetric cores in the OS may take some time, but I don't think most people need 4 CPU cores, much less the 8-core beasts that are coming soon. Four cores and a vector co-processor make a lot more sense.
With AMD continuing to lose money, does anyone think Intel will be pushing IA-64 for everyone again, if AMD gets too weak? They seem to have taken AMD's strategy of x86 everywhere, but I don't know if everyone at Intel is happy about that.
Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Mon Jan 26, 2009 6:16 am
Joined: Sat Sep 01, 2007 8:01 am Posts: 652
jim_cox wrote:
Agner wrote:
Gabriele Svelto wrote:
Whatever happens it is very likely that the final decision on this matter will be taken by Microsoft.
The new 256-bit vector registers need OS support. Everything else will work in existing operating systems, including SSE5 as well as AVX and FMA instructions as long as they are used on 128-bit registers. The final decision will be taken by software producers supporting Intel or AMD instructions or both. Microsoft cares more about C# and Visual Basic than about native C++, which is the platform most likely to use the new instruction sets.
I am not sure why we need the 256-bit register expansion, or a whole new instruction set extension that will probably have a short useful life. Why the "in-between" stage? Why not jump to supporting something like a Larrabee (or other GPU-style) co-processor or the instruction extensions? If code really benefits from 256-bit registers, then wouldn't it run a whole lot faster on a Larrabee co-processor? I guess supporting asymmetric cores in the OS may take some time, but I don't think most people need 4 CPU cores, much less the 8-core beasts that are coming soon. Four cores and a vector co-processor make a lot more sense.
With AMD continuing to lose money, does anyone think Intel will be pushing IA-64 for everyone again, if AMD gets too weak? They seem to have taken AMD's strategy of x86 everywhere, but I don't know if everyone at Intel is happy about that.
I am happy :)
Well, if you mean adding Larrabee-like cores into the CPU, you need to reach a certain financial position before you can do this: your die size and your average selling price have to make sense.
The success of Intel is mainly based on those kinds of choices. For example, at 65nm a native quad core did not make any sense financially, and we all saw what happened to AMD when they tried. A native quad core with serious single-threaded performance only became feasible at 45nm; AMD learned it the hard way.
I am not sure why we need the 256-bit register expansion, or a whole new instruction set extension that will probably have a short useful life.
I think exactly the same: it would be far better to have one single ISA for Larrabee and Sandy Bridge. They say Larrabee is x86 "compatible", but in fact we must plan for two more code paths, one for x86 CPUs and one for the x86 "compatible" GPU/GPGPU/whatever.
if the die area argument stands, they should have done as with SSE in Katmai: process each vector in two 256-bit parts in the 1st generation of 512-bit CPUs
if the die area argument stands, they should have done as with SSE in Katmai: process each vector in two 256-bit parts in the 1st generation of 512-bit CPUs
That's indeed what they are going to do, except divide by 2:
The first CPUs with 256-bit vector instructions will have only 128-bit ALUs and divide each 256-bit vector in two halves.
The AVX instruction format has plenty of room for expansion into bigger vectors. Hopefully, they will use the same instruction codes on the GPU.
The first CPUs with 256-bit vector instructions will have only 128-bit ALUs and divide each 256-bit vector in two halves.
Where do you get that info from? I remember well the IDF slides where it's clearly stated that the throughput is already doubled in the 1st AVX implementation.
It's not my logic, but the logic behind nearly all vector processors for 30+ years: decouple the width of the vectors in the ISA from the width of the hardware vector units. I provided an example: 2-wide units (FP32) in Katmai to process 4-wide SSE. It's easy to see what I mean for Sandy Bridge: define the AVX ISA initially as 16-wide but process with 8-wide execution units.
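The decoupling described above can be sketched in plain C. This is only a software model under stated assumptions, not how any particular CPU is built: `ISA_WIDTH`, `HW_WIDTH`, `vec_t`, `hw_add` and `isa_add` are hypothetical names chosen for the illustration, and the 8/4 split is just one example of an architectural vector that is wider than the execution unit.

```c
#include <assert.h>
#include <stddef.h>

#define ISA_WIDTH 8  /* FP32 lanes visible to software (assumed for the example) */
#define HW_WIDTH  4  /* FP32 lanes the execution unit handles per pass (assumed) */

/* One architectural vector register as software sees it. */
typedef struct { float lane[ISA_WIDTH]; } vec_t;

/* One pass through the narrow hardware unit: HW_WIDTH lanes at a time. */
static void hw_add(const float *a, const float *b, float *out) {
    for (size_t i = 0; i < HW_WIDTH; i++)
        out[i] = a[i] + b[i];
}

/* One architectural add. The wide vector is fed through the narrow
 * unit in ISA_WIDTH / HW_WIDTH passes; the return value is the number
 * of passes, i.e. the throughput cost of the narrower hardware. */
static int isa_add(const vec_t *a, const vec_t *b, vec_t *out) {
    assert(ISA_WIDTH % HW_WIDTH == 0);
    int passes = 0;
    for (size_t off = 0; off < ISA_WIDTH; off += HW_WIDTH) {
        hw_add(a->lane + off, b->lane + off, out->lane + off);
        passes++;
    }
    return passes;
}
```

With these values, `isa_add` makes two passes per instruction. Setting `HW_WIDTH` equal to `ISA_WIDTH` models a later full-width implementation: code that calls `isa_add` does not change, it just needs one pass instead of two.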
Post subject: Re: Intel removed the 4-operand FMA support from AVX spec
Posted: Mon Jan 26, 2009 9:00 am
Joined: Sat Sep 01, 2007 8:01 am Posts: 652
Eric Bron wrote:
who? wrote:
I don't follow your logic, can you explain ?
It's not my logic, but the logic behind nearly all vector processors for 30+ years: decouple the width of the vectors in the ISA from the width of the hardware vector units. I provided an example: 2-wide units (FP32) in Katmai to process 4-wide SSE. It's easy to see what I mean for Sandy Bridge: define the AVX ISA initially as 16-wide but process with 8-wide execution units.
I see what you are saying. I can't comment on it yet ... I wish I could.
I don't know how big (in square microns) register files are in contemporary hardware. Sixteen 512-bit registers is a kilobyte (16 x 64 bytes); with three read ports and one write port per AVX unit, it feels as if it might be visible to the naked eye even at 32nm.
Various discussions on the linux-kernel mailing list suggest that 512-bit vector registers are thought of as an explicit but dim possibility; there's a medium bag on the side of AVX so that code which doesn't know about the top halves of YMM registers doesn't destroy them, combined with an indication that there won't be this special case for ZMM.
I really hope the wheel of reincarnation doesn't bring us back to the Cray world which includes registers (the 64x64 matrix in the BMM unit) too large to save automatically and defined to be destroyed by function calls.