Aceshardware

(not so) temporary home for the aceshardware community
 Post subject: CPUID family bits added because of flaw in Intel compiler
PostPosted: Sat Mar 22, 2008 9:35 am 
Joined: Fri Sep 07, 2007 10:31 am
Posts: 41
Location: Denmark
The CPUID instruction in modern PC microprocessors works in an awkward way. Originally, the CPUID gave 4 bits for family number and 4 bits for model number. This means that the maximum numbers are family 15 and model 15. When these numbers were exhausted, they added 4 more bits for the model number. The new 4 bits are concatenated with the old 4 bits to make an 8-bit number, so the maximum value for model number is 255, or FF hexadecimal.
It would be logical to do the same with the family number, but instead they have added 8 more bits called "extended family". The new 8 bits are not concatenated with the old 4 bits to make a 12-bit number. Instead they have specified that we must calculate the sum of the old 4-bit "family" number and the new 8-bit "extended family" number. This means that the same family number can be specified in more than one way - and I think I know why!
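
A minimal Python sketch of the decoding described above, assuming the usual bit layout of the EAX value returned by CPUID leaf 1. The helper names and the sample EAX values are illustrative only. As far as I can tell from the vendor manuals, the extended family field is only meant to be added when the base family field is 15; the sketch applies the plain sum described in the post so that the aliasing is visible.

```python
def cpuid_fields(eax: int) -> dict:
    """Split the EAX value of CPUID leaf 1 into its raw bit fields."""
    return {
        "stepping": eax & 0xF,             # bits 3:0
        "model": (eax >> 4) & 0xF,         # bits 7:4
        "family": (eax >> 8) & 0xF,        # bits 11:8
        "ext_model": (eax >> 16) & 0xF,    # bits 19:16
        "ext_family": (eax >> 20) & 0xFF,  # bits 27:20
    }


def effective_model(fields: dict) -> int:
    # The extended model is concatenated: it becomes the high nibble.
    return (fields["ext_model"] << 4) | fields["model"]


def effective_family(fields: dict) -> int:
    # The extended family is added to the base family, not concatenated.
    return fields["family"] + fields["ext_family"]


# Two hypothetical encodings that both sum to family 16.
for eax in ((1 << 20) | (15 << 8), (10 << 20) | (6 << 8)):
    f = cpuid_fields(eax)
    print(hex(eax), f["family"], f["ext_family"], effective_family(f))
```

Both sample values report an effective family of 16, one with base family 15 and one with base family 6, which is the ambiguity the post is pointing at.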

Here's my theory:
Intel have made a compiler to support the ever-growing extensions to the instruction set. The Intel compiler puts a CPU-dispatcher into your code to check whether the CPU supports the SSE2, SSE3, SSE4 or whatever instruction set. The compiled program can contain more than one version of critical parts of the code, and the CPU-dispatcher automatically chooses the version that fits the available instruction set. So far so good. The problem is that the CPU-dispatcher makes its choice based on family numbers and not only based on the feature bits that tell whether SSE2 etc. is supported. And it will not recognize unknown family numbers. The consequence is that any future Intel CPU with a family number different from 6 or 15 will not be recognized and will run with all SSE instruction sets disabled or will not run at all. There is a lot of software on the market that is compiled with the Intel compiler. All this software would fail to run efficiently on a new Intel CPU with a family number different from 6 unless it is recompiled. The CPU-dispatcher checks only the old 4-bit family number. They can make the old family number = 6 in order to fool the CPU-dispatcher and then make the extended family number = e.g. 10 to make the sum = 16 or whatever number the marketing department dictates.

So the awkward implementation of the CPUID instruction is to make up for a serious blunder made by the people who designed the Intel compiler.

Funny that AMD have accepted this scheme, but they probably had no choice. BTW, the CPU-dispatcher in the Intel compiler also checks the brand name in the CPU and disables all SSE extensions if the brand is anything but Intel. See
http://www.agner.org/optimize/optimizing_cpp.pdf for how to circumvent this and make the code compiled with the Intel compiler work on AMD processors.
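
To make the difference concrete, here is a rough Python sketch of the two dispatching styles discussed in this post. It is not Intel's actual dispatcher code. The `Cpu` record, the function names and the code path labels are all made up for illustration, and the gate on vendor and family simply follows the behaviour described above.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    vendor: str        # CPUID vendor string, e.g. "GenuineIntel"
    base_family: int   # the old 4-bit family field only
    features: set      # instruction sets reported by the feature bits


# Code paths in order of preference, best first.
CODE_PATHS = ["SSE3", "SSE2"]


def dispatch_on_features(cpu: Cpu) -> str:
    """Pick the best code path using nothing but the feature bits."""
    for path in CODE_PATHS:
        if path in cpu.features:
            return path
    return "generic"


def dispatch_on_family(cpu: Cpu) -> str:
    """Hypothetical dispatcher that also gates on vendor and known families."""
    if cpu.vendor != "GenuineIntel" or cpu.base_family not in (6, 15):
        return "generic"
    return dispatch_on_features(cpu)


amd = Cpu("AuthenticAMD", 15, {"SSE2", "SSE3"})
unknown_intel = Cpu("GenuineIntel", 7, {"SSE2", "SSE3"})  # hypothetical family
for cpu in (amd, unknown_intel):
    print(cpu.vendor, cpu.base_family,
          dispatch_on_family(cpu), dispatch_on_features(cpu))
```

In this sketch both CPUs report SSE3 in their feature bits, yet the family-gated dispatcher sends them down the generic path, while a CPU that keeps base family 6 and moves the rest into the extended family would still pass the gate.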


 Post subject: Re: CPUID family bits added because of flaw in Intel compiler
PostPosted: Sat Mar 22, 2008 9:53 am 
Joined: Fri Aug 31, 2007 10:08 pm
Posts: 238
Location: Switzerland
thanks for your detailed analysis

Agner wrote:
This means that the same family number can be specified in more than one way

Ouch! Shades of the infamous 8086 segmented memory, where many segment:offset pairs point to the same address.

Agner wrote:
Intel have made a compiler to support the ever growing extensions to the instruction set. The Intel compiler puts a CPU-dispatcher into your code to check whether the CPU supports the SSE2, SSE3, SSE4 or whatever instruction set.


Yes, but only with some compilation flags. If you link with Intel libraries like MKL or IPP, though, you always have the problem.

Agner wrote:
based on family numbers and not only based on the feature bits that tell whether SSE2 etc. is supported.


It looks like a very bad decision. It is as if they were saying that the feature flags are not reliable. There is probably no technical justification, just an attempt to make it fast for Intel targets only instead of fast for all industry-standard CPUs.


 Post subject:
PostPosted: Sat Mar 22, 2008 12:08 pm 
Joined: Sun Mar 16, 2008 3:20 pm
Posts: 86
What version of ICC do you have?

Stuff like that is quite old, look e.g. here:

http://www.swallowtail.org/naughty-intel.html

But I thought that Intel changed / removed it, due to the recent AMD complaint against Intel.

As far as I know, the newer ICC versions also offer special SSE optimization compiler flags for non-Intel CPUs. However, I have not seen any comparisons of them.

cheers

Opteron


 Post subject: Re: CPUID family bits added because of flaw in Intel compile
PostPosted: Sat Mar 22, 2008 4:19 pm 
Joined: Sun Oct 07, 2007 6:22 pm
Posts: 119
> [family 6 or 15]

Certain past versions of Microsoft Windows only looked
at the lowest 3 of the 4 family bits... which certainly did
not help either.
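
The arithmetic behind that remark, as a tiny Python sketch. The function name is made up, and the mask only models the behaviour described here, not any actual Windows code.

```python
def family_low_3_bits(family: int) -> int:
    """Return the family number as seen by code that keeps only bits 2:0."""
    return family & 0b111


for family in (6, 7, 15):
    print(family, "->", family_low_3_bits(family))
```

With only three bits, family 15 becomes indistinguishable from family 7, while family 6 is unaffected.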


 Post subject: Re: CPUID family bits added because of flaw in Intel compile
PostPosted: Sat Mar 22, 2008 5:14 pm 
Joined: Sat Mar 22, 2008 5:10 pm
Posts: 370
no@spam.com wrote:
> [family 6 or 15]

Certain past versions of Microsoft Windows only looked
at the lowest 3 of the 4 family bits... which certainly did
not help either.


Thank you for the info, it explains why they jumped from family 6 (7 with Itanium) to 15.

The extended family issue didn't affect just ICC...


 Post subject:
PostPosted: Sun Mar 23, 2008 8:02 am 
Joined: Fri Sep 07, 2007 10:31 am
Posts: 41
Location: Denmark
Thanks for the link to http://www.swallowtail.org/naughty-intel.html. He has found exactly the same thing that I found in Intel's C++ compiler.
The issue is still the same in later versions of the compiler. I have complained to Intel. First they tried to cover up and explain away my findings, but after I had presented piles of evidence to them, they admitted it. And they refused to change the cripple-AMD feature.


 Post subject: Re: CPUID family bits added because of flaw in Intel compiler
PostPosted: Sun Mar 23, 2008 9:34 am 
Joined: Sat Sep 01, 2007 8:01 am
Posts: 652
The truth is out there!

Agent Mulder
One more conspiracy theory... I wonder why AMD's math library does not support the Conroe core? Probably because AMD designed their CPUID specially to stop it from running on Conroe? Or vice versa?


 Post subject:
PostPosted: Sun Mar 23, 2008 3:10 pm 
Joined: Sun Mar 16, 2008 3:20 pm
Posts: 86
Agner wrote:
Thanks for the link to http://www.swallowtail.org/naughty-intel.html. He has found exactly the same thing that I found in Intel's C++ compiler.
The issue is still the same in later versions of the compiler. I have complained to Intel. First they tried to cover up and explain away my findings, but after I had presented piles of evidence to them, they admitted it. And they refused to change the cripple-AMD feature.


You are welcome :)
Just one last question: by "later versions", do you really mean ICC 10.X?

As mentioned before, I thought that Intel stopped the "Intel only" coding with that version, as AMD is using it against Intel in the ongoing lawsuit:
Quote:
125. Intel has designed its compiler purposely to degrade performance when a program is run on an AMD platform. To achieve this, Intel designed the compiler to compile code along several alternate code paths. Some paths are executed when the program runs on an Intel platform and others are executed when the program is operated on a computer with an AMD microprocessor. (The choice of code path is determined when the program is started, using a feature known as “CPUID” which identifies the computer’s microprocessor.) By design, the code paths were not created equally. If the program detects a “Genuine Intel” microprocessor, it executes a fully optimized code path and operates with the maximum efficiency. However, if the program detects an “Authentic AMD” microprocessor, it executes a different code path that will degrade the program’s performance or cause it to crash.

http://www.amd.com/us-en/assets/content ... plaint.pdf p. 40

If it is still in ICC 10.X then AMD would love that ^^

However, it is not that severe anymore. I just re-read a print article on the ICC 10 suite. It mentions that one should use the -[Q]xO flag for AMD CPUs. It would be interesting to see how the result differs from the patched code of a -[Q]xP optimized binary, if at all.

cheers

Opteron


 Post subject:
PostPosted: Sun Mar 23, 2008 7:21 pm 
Joined: Thu Sep 20, 2007 10:47 am
Posts: 163
I don't think AMD really has any right to complain that Intel's compiler doesn't optimize for their CPUs, or even that it de-optimizes. That's just a part of life and competing with Intel.

There are 3rd party compilers (Sun, PGI, not sure about Pathscale any more though), which I'm sure AMD can contribute to, as well as the ever present GCC and MSVC.

I don't really see that as anti-competitive, especially given how uncommonly ICC is used. If ICC were the most commonly used compiler, then I might have an issue with it, but I think in reality that GCC and MSVC are much more commonly used.

David


 Post subject:
PostPosted: Sun Mar 23, 2008 7:55 pm 
Joined: Sat Mar 22, 2008 5:10 pm
Posts: 370
David,
I think the question is how it's done. The compiler generates several code paths, for example one with SSE2 and others without, and uses the family and feature flags to pick the best path on Intel processors. The problem is that non-Intel processors with SSE2 could benefit from this path too, but the compiler explicitly avoids it, and unnecessarily so, since SSE2 support was already checked and the code would work.
I agree with anyone who says they don't have to optimize for competitors, but is that "if(Intel)" really necessary?
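
For reference, an "if(Intel)" test of this kind comes down to comparing the CPUID vendor string. A small Python sketch of how that string is assembled from the registers returned by CPUID leaf 0; `vendor_string` is a made-up helper name, and the register values are simply the ASCII encodings of the two well-known vendor strings.

```python
import struct


def vendor_string(ebx: int, edx: int, ecx: int) -> str:
    """Build the 12-character vendor string from CPUID leaf 0.

    The characters are stored little-endian in EBX, EDX, ECX, in that order.
    """
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


intel = vendor_string(0x756E6547, 0x49656E69, 0x6C65746E)
amd = vendor_string(0x68747541, 0x69746E65, 0x444D4163)
print(intel, amd)
```

A dispatcher that gates on this string makes its decision before the SSE2 feature bit is ever consulted, which is the part being questioned here.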
OK, ICC is rarely used, so AMD customers won't be hurt much, but Intel benchmarks AMD's processors using ICC and uses the results for marketing, which may hurt AMD's business. For example:
http://www.spec.org/cpu2006/results/res ... 00080.html
http://www.spec.org/cpu2006/results/res ... 00078.html
http://www.spec.org/cpu2006/results/res ... 00079.html
http://www.spec.org/cpu2006/results/res ... 00081.html


 Post subject:
PostPosted: Mon Mar 24, 2008 1:00 am 
Joined: Fri Aug 17, 2007 2:55 pm
Posts: 369
Quote:
OK, ICC is rarely used, so AMD customers won't be hurt much, but Intel benchmarks AMD's processors using ICC and uses the results for marketing, which may hurt AMD's business. For example:


Until relatively recently AMD benchmarked their own chips with ICC, so I don't see how 2006 benchmarks by Intel are disingenuous. The first result linked is clock for clock faster than AMD's own submissions in the same year and quarter.


 Post subject: Intel compiler
PostPosted: Mon Mar 24, 2008 12:32 pm 
Joined: Fri Sep 07, 2007 10:31 am
Posts: 41
Location: Denmark
I just tried Intel C++ compiler version 10.1 with option /QxO as you suggested. It generates the following versions of code for common mathematical functions: SSE2, SSE3, SSE4.1 and non-Intel SSE2. It doesn't work on any CPU without SSE2. This is the only compiler option that makes it run reasonably on an AMD, but why are there two different SSE2 versions, one for Intel and one for AMD? When I hack the CPU-dispatcher to make it believe that it is running on an Intel, it runs 50-100% faster. This means that the Intel-SSE2 version is faster than the AMD-SSE2 version when running on an AMD processor!

There are also options that work on any processor, for example /QaxB. This option runs non-vectorized SSE2 code on Intel processors and old 8087 code on AMD processors. I measured this to be 5-10 times slower than the /QxO option on an AMD Opteron.


 Post subject:
PostPosted: Mon Mar 24, 2008 2:25 pm 
Joined: Sun Jul 22, 2007 12:53 am
Posts: 256
Makes you wonder if the source code would point to any shenanigans.


 Post subject:
PostPosted: Mon Mar 24, 2008 5:15 pm 
Joined: Sat Sep 01, 2007 8:01 am
Posts: 652
http://developer.amd.com/tools/acml/features/Pages/default.aspx

AMD does optimize its software for its own processors, nothing wrong with this!

"Announcing ACML Version 4.0!
New features in Version 4.0 include:
Update of LAPACK to version 3.1.1
Optimizations for Third-Generation AMD Opteron™ Processors "

Why is it wrong when Intel does the same?

This is old FUD. And by the way, when you don't disclose your origin, make sure you don't post on a forum with a public log... At least have the decency not to use your laptop from the office.

who?


 Post subject:
PostPosted: Mon Mar 24, 2008 6:32 pm 
Joined: Sun Aug 12, 2007 11:23 pm
Posts: 38
who? wrote:
http://developer.amd.com/tools/acml/features/Pages/default.aspx

AMD does optimize its software for its own processors, nothing wrong with this!

"Announcing ACML Version 4.0!
New features in Version 4.0 include:
Update of LAPACK to version 3.1.1
Optimizations for Third-Generation AMD Opteron™ Processors "

Why is it wrong when Intel does the same?


You can't see the difference between providing binaries and a compiler?

Quote:
This is old FUD. And by the way, when you don't disclose your origin, make sure you don't post on a forum with a public log... At least have the decency not to use your laptop from the office.


Agner's site states who he is; everything is there, down to his private address. Where is your disclaimer?

