The CPUID instruction in modern PC microprocessors works in an awkward way. Originally, the CPUID gave 4 bits for family number and 4 bits for model number. This means that the maximum numbers are family 15 and model 15. When these numbers were exhausted, they added 4 more bits for the model number. The new 4 bits are concatenated with the old 4 bits to make an 8-bit number, so the maximum value for model number is 255, or FF hexadecimal.
It would be logical to do the same with the family number, but instead they have added 8 more bits called "extended family". The new 8 bits are not concatenated with the old 4 bits to make a 12-bit number. Instead they have specified that we must calculate the sum of the old 4-bit "family" number and the new 8-bit "extended family" number. This means that the same family number can be specified in more than one way - and I think I know why!
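For reference, here is a minimal sketch of that decoding, assuming GCC or Clang and the <cpuid.h> helper. The bit positions come from the public CPUID documentation; the "sum" rule for the family is the one described above (strictly, the extended family field is only meant to be used when the base family is 0xF, but the arithmetic is the same).

Code:
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 1;                                   /* CPUID leaf 1 not available */

    unsigned int base_family = (eax >> 8)  & 0xF;   /* old 4-bit family field        */
    unsigned int base_model  = (eax >> 4)  & 0xF;   /* old 4-bit model field         */
    unsigned int ext_family  = (eax >> 20) & 0xFF;  /* 8-bit "extended family" field */
    unsigned int ext_model   = (eax >> 16) & 0xF;   /* 4-bit "extended model" field  */

    /* Model: the new 4 bits are concatenated with the old 4 bits. */
    unsigned int model  = (ext_model << 4) | base_model;

    /* Family: the extended field is *added* to the old 4-bit field,
       which is why the same family can be encoded in more than one way. */
    unsigned int family = base_family + ext_family;

    printf("family %u, model %u\n", family, model);
    return 0;
}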
Here's my theory:
Intel have made a compiler to support the ever-growing extensions to the instruction set. The Intel compiler puts a CPU-dispatcher into your code to check whether the CPU supports SSE2, SSE3, SSE4 or whatever instruction set. The compiled program can contain more than one version of the critical parts of the code, and the CPU-dispatcher automatically chooses the version that fits the available instruction set. So far so good. The problem is that the CPU-dispatcher makes its choice based on family numbers, not only on the feature bits that tell whether SSE2 etc. is supported. And it will not recognize unknown family numbers. The consequence is that any future Intel CPU with a family number different from 6 or 15 will not be recognized and will run with all SSE instruction sets disabled, or will not run at all. There is a lot of software on the market compiled with the Intel compiler. All this software would fail to run efficiently on a new Intel CPU with a different family number unless it is recompiled. The CPU-dispatcher checks only the old 4-bit family number. So they can keep the old family number = 6 in order to fool the CPU-dispatcher, and then set the extended family number to, say, 10 to make the sum 16, or whatever number the marketing department dictates.
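To make the distinction concrete, here is a minimal sketch (not Intel's actual dispatcher; the kernel names are made up) of a dispatcher that decides purely from the CPUID feature bits, which is all the argument above says is needed. It assumes GCC or Clang and <cpuid.h>.

Code:
#include <cpuid.h>
#include <stdio.h>

typedef void (*kernel_fn)(void);

static void kernel_generic(void) { puts("generic scalar path"); }
static void kernel_sse2(void)    { puts("SSE2 path");   }
static void kernel_sse3(void)    { puts("SSE3 path");   }
static void kernel_sse41(void)   { puts("SSE4.1 path"); }

/* Pick the best kernel from the CPUID feature bits only;
   the family number and vendor string are never consulted. */
static kernel_fn pick_kernel(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return kernel_generic;

    if (ecx & (1u << 19)) return kernel_sse41;   /* ECX bit 19: SSE4.1 */
    if (ecx & (1u <<  0)) return kernel_sse3;    /* ECX bit  0: SSE3   */
    if (edx & (1u << 26)) return kernel_sse2;    /* EDX bit 26: SSE2   */
    return kernel_generic;
}

int main(void)
{
    pick_kernel()();   /* dispatch once, e.g. at program start */
    return 0;
}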
So the awkward implementation of the CPUID instruction is to make up for a serious blunder made by the people who designed the Intel compiler.
Funny that AMD have accepted this scheme, but they probably had no choice. BTW, the CPU-dispatcher in the Intel compiler also checks the brand name in the CPU and disables all SSE extensions if the brand is anything but Intel. See
http://www.agner.org/optimize/optimizing_cpp.pdf for how to circumvent this and make the code compiled with the Intel compiler work on AMD processors.
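For completeness, a minimal sketch of where that brand/vendor string comes from, again assuming GCC or Clang and <cpuid.h>; it only shows how "GenuineIntel" vs. "AuthenticAMD" is read, not how the Intel dispatcher actually uses it.

Code:
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;
    char vendor[13] = {0};

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 1;

    /* CPUID leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX. */
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);

    printf("vendor: %s\n", vendor);   /* e.g. "GenuineIntel" or "AuthenticAMD" */
    printf("is_intel: %d\n", strcmp(vendor, "GenuineIntel") == 0);
    return 0;
}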
Agner wrote:
This means that the same family number can be specified in more than one way
Ugh! Shades of the infamous 8086 segmented memory.
Agner wrote:
Intel have made a compiler to support the ever growing extensions to the instruction set. The Intel compiler puts a CPU-dispatcher into your code to check whether the CPU supports the SSE2, SSE3, SSE4 or whatever instruction set.
Yes, but only with some compilation flags. Though if you link with Intel libraries like MKL or IPP, you do indeed always have the problem.
Agner wrote:
based on family numbers and not only based on the feature bits that tell whether SSE2 etc. is supported.
It looks like a very bad decision; it is much as if they were saying the feature flags are not reliable. In fact there is probably no technical justification, just an attempt to make the code fast for Intel targets only instead of fast for all industry-standard CPUs.
Thanks for the link to http://www.swallowtail.org/naughty-intel.html. He has found exactly the same as I have found in Intel's C++ compiler.
The issue is still the same in later versions of the compiler. I have complained to Intel. At first they tried to cover up and explain away my findings, but after I had presented piles of evidence they admitted it. And they refused to change the cripple-AMD feature.
Agner wrote:
The CPUID instruction in modern PC microprocessors works in an awkward way. [...] This means that the same family number can be specified in more than one way - and I think I know why! [...] So the awkward implementation of the CPUID instruction is to make up for a serious blunder made by the people who designed the Intel compiler.
The truth is out there!
Agent Mulder
One more conspiracy theory . . . I wonder why AMD's math library (ACML) does not support the Conroe core??? Probably because AMD designed their CPUID specially to stop it from running on Conroe???? OR VICE VERSA????
Agner wrote:
Thanks for the link to http://www.swallowtail.org/naughty-intel.html. [...] The issue is still the same in later versions of the compiler.
You are welcome :) Just one last question: with "later versions", do you really mean ICC 10.X?
As mentioned before, I thought that Intel had stopped the "Intel only" coding with that version, as AMD is using it against Intel in the ongoing lawsuit:
Quote:
125. Intel has designed its compiler purposely to degrade performance when a program is run on an AMD platform. To achieve this, Intel designed the compiler to compile code along several alternate code paths. Some paths are executed when the program runs on an Intel platform and others are executed when the program is operated on a computer with an AMD microprocessor. (The choice of code path is determined when the program is started, using a feature known as “CPUID” which identifies the computer’s microprocessor.) By design, the code paths were not created equally. If the program detects a “Genuine Intel” microprocessor, it executes a fully optimized code path and operates with the maximum efficiency. However, if the program detects an “Authentic AMD” microprocessor, it executes a different code path that will degrade the program’s performance or cause it to crash.
If it is still in ICC 10.X then AMD would love that ^^
However, it is not that severe anymore. I just re-read a print article on the ICC 10 suite. It mentions that one should use the flag -[Q]xO for AMD CPUs. The difference compared to the patched code of a -[Q]xP-optimized binary would be interesting, though, if there is any.
I don't think AMD really has any right to complain that Intel's compiler doesn't optimize for their CPUs, or even that it de-optimizes. That's just a part of life and competing with Intel.
There are third-party compilers (Sun, PGI; not sure about Pathscale any more, though), which I'm sure AMD can contribute to, as well as the ever-present GCC and MSVC.
I don't really see that as anti-competitive, especially given how commonly ICC is used. If ICC were the most commonly used compiler, then I might have an issue with it, but I think in reality that GCC and MSVC are much more commonly used.
David,
I think the question is how it's done. The compiler generates many paths, one with SSE2 and others without (for example), and uses the family and feature flags to determine the best path for Intel processors. The problem is that non-Intel processors with SSE2 could benefit from this path, but the compiler explicitly (and unnecessarily, since SSE2 was checked first and the code would work) avoids it.
I agree if someone says they don't have to optimize for competitors, but is that "if (Intel)" check really necessary? (See the sketch after the links below.)
OK, ICC is rarely used, so AMD customers won't be hurt that much, but Intel benchmarks AMD's processors using ICC and uses this information for marketing, which may hurt AMD's business. For example:
http://www.spec.org/cpu2006/results/res ... 00080.html http://www.spec.org/cpu2006/results/res ... 00078.html http://www.spec.org/cpu2006/results/res ... 00079.html http://www.spec.org/cpu2006/results/res ... 00081.html
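A runnable sketch of the difference being argued here (illustration only, not decompiled Intel code; the helper names are made up), assuming GCC or Clang and <cpuid.h>:

Code:
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers for illustration; detection reuses the CPUID
   leaves shown earlier in the thread. */
static int has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    return __get_cpuid(1, &eax, &ebx, &ecx, &edx) && (edx & (1u << 26));
}

static int is_genuine_intel(void)
{
    unsigned int eax, ebx, ecx, edx;
    char vendor[13] = {0};

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    return strcmp(vendor, "GenuineIntel") == 0;
}

int main(void)
{
    /* The policy the thread attributes to the dispatcher: vendor-gated. */
    const char *gated = (is_genuine_intel() && has_sse2()) ? "SSE2" : "generic";

    /* The policy argued for above: the feature bit alone already
       guarantees that the SSE2 path will execute correctly. */
    const char *open  = has_sse2() ? "SSE2" : "generic";

    printf("vendor-gated dispatch would pick: %s path\n", gated);
    printf("feature-only dispatch would pick: %s path\n", open);
    return 0;
}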
Quote:
OK, ICC is rarely used, so AMD customers won't be hurt that much, but Intel benchmarks AMD's processors using ICC and uses this information for marketing, which may hurt AMD's business.
Until relatively recently AMD benchmarked their own chips with ICC, so I don't see how CPU2006 benchmarks by Intel are disingenuous. The first result linked is clock-for-clock faster than AMD's own submissions in the same year and quarter.
I just tried Intel C++ compiler version 10.1 with option /QxO as you suggested. It generates the following versions of code for common mathematical functions: SSE2, SSE3, SSE4.1 and non-Intel SSE2. It doesn't work on any CPU prior to SSE2. This is the only compiler option that makes it run reasonably on an AMD, but why are there two different SSE2 versions, one for Intel and one for AMD? When I hack the CPU-dispatcher and make it believe that it is an Intel, it runs 50-100% faster. This means that the Intel-SSE2 version is faster than the AMD-SSE2 version when running on an AMD processor!
There are also options that work on any processor, for example /QaxB. This option runs non-vectorized SSE2 code on Intel processors and old 8087 code on AMD processors. I measured this to be 5-10 times slower than the /QxO option on an AMD Opteron.
AMD does optimize its software for its own processors, nothing wrong with this!
"Announcing ACML Version 4.0!
New features in Version 4.0 include:
Update of LAPACK to version 3.1.1
Optimizations for Third-Generation AMD Opteron™ Processors "
Why is it wrong when Intel does the same?
This is old FUD. And by the way, when you don't disclose your origin, make sure you don't post on a forum with a public log ... At least have the decency not to use your laptop from the office.
AMD does optimize its software for its own processors, nothing wrong with this!
"Announcing ACML Version 4.0! New features in Version 4.0 include: Update of LAPACK to version 3.1.1 Optimizations for Third-Generation AMD Opteron™ Processors "
Why is it wrong when Intel does the same?
You can't see the difference between providing binaries and a compiler?
Quote:
This is old FUD. And by the way, when you don't disclose your origin, make sure you don't post on a forum with a public log ... At least have the decency not to use your laptop from the office.
Agner's site states who he is; everything is there, down to his private address. Where is your disclaimer?