Aceshardware Forum Index
(not so) temporary home for the aceshardware community
 

Nehalem Preview
martinw



Joined: 06 Sep 2007
Posts: 106

Posted: Thu Jun 12, 2008 3:41 pm

stupid dog wrote:
Hans de Vries wrote:
The performance increase from Hyper Threading seems to be in the
same range as that of the Pentium 4 (see table at the end of)
http://www.xbitlabs.com/articles/cpu/display/pentium4-3066_2.html

And why that? The table is showing performance gain while running multiple task simultaneously. Anand didn't check this. On the other side, the "old HT" shows little to no performance gain in multithreaded apps. So, how can you conclude that this is "same range as that of the Pentium 4"?


Not speaking for Hans, but IIRC the original claim for P4 HT was "up to 30% faster" and the claim for Nehalem is also "up to 30% faster". The Xbitlabs article could be used as evidence to support the original P4 claim, so it seems like a reasonable conclusion that both HT implementations are in the same range. The Xbitlabs article also shows how useless marketing phrases like "up to 30%" really are for performance planning purposes, so I await the real world numbers for Nehalem with interest.

Speaking for myself, we never got any significant benefit from HT in our application (3D animation/rendering) and often got slowdowns. I'd be very happy if Nehalem showed significant positive HT benefits, but I'd even be OK with zero benefit some of the time. The real problem is when things slow down and customers start beating down your door.
lux_interior



Joined: 26 Jul 2007
Posts: 235

Posted: Thu Jun 12, 2008 7:53 pm

who? wrote:
For the hardware prefetcher ... that about Games, office applications, video edition, video encoding, 3D rendering, Java engines, sequencial database, Bio-computing, financial analysis... it is an endless list...

Your statement to "turn it off" is only true for pointer chasing applications


Pointer chasing applications? You mean, like Java engines? :)
Johan



Joined: 23 Jul 2007
Posts: 128
Location: Belgium

Posted: Thu Jun 12, 2008 8:56 pm

Johan wrote:

Then I would really like to hear which kind of applications really benefit from the HW prefetcher. In many Server apps, it is best to leave it off as gets in the way of the "normal" bandwidth.

It makes for example a big negative impact on Specjbb, allmost all database benchmarking and several others I - i admit - have to retest to be sure.

99% of the time faster with prefetcher, I hope you are talking about Nehalem with it's ample bandwidth, because it is complete crap for the Core 2 generation.

So far the HW prefetcher should be enabled dynamically (by the OS) to be really efficient IMHO.


who? wrote:

For the hardware prefetcher ... that about Games, office applications, video edition, video encoding, 3


I was talking about server apps.


Quote:


Your statement to "turn it off" is only true for pointer chasing applications, and with Nehalem, it is not going to be an issue anymore. sooooo?


Sooooo ... there are probably a few tens of millions of systems out there for which it is an issue.


Quote:

Just one of your regular posting, a lot of opinion, and very little data to back it up.
who?


https://www-304.ibm.com/systems/support/supportsite.wss/docdisplay?lndocid=MIGR-5070865&brandind=5000008

"The default setting for the "Processor Hardware Prefetcher" feature is changed to "Disabled" as recommended by Intel for most servers. This setting will become "Disabled" if the user loads the default settings for all configuration settings or just the single line item setting."

Enough data to back it up for you?
who?



Joined: 01 Sep 2007
Posts: 464

Posted: Thu Jun 12, 2008 11:29 pm

Johan wrote:
Johan wrote:

Then I would really like to hear which kind of applications really benefit from the HW prefetcher. In many Server apps, it is best to leave it off as gets in the way of the "normal" bandwidth.

It makes for example a big negative impact on Specjbb, allmost all database benchmarking and several others I - i admit - have to retest to be sure.

99% of the time faster with prefetcher, I hope you are talking about Nehalem with it's ample bandwidth, because it is complete crap for the Core 2 generation.

So far the HW prefetcher should be enabled dynamically (by the OS) to be really efficient IMHO.


who? wrote:

For the hardware prefetcher ... that about Games, office applications, video edition, video encoding, 3


was talking about server apps.


Quote:


Your statement to "turn it off" is only true for pointer chasing applications, and with Nehalem, it is not going to be an issue anymore. sooooo?


Sooooo ... there are probably a few tens of millions systems out there for which it is an issue.


Quote:

Just one of your regular posting, a lot of opinion, and very little data to back it up.
who?


https://www-304.ibm.com/systems/support/supportsite.wss/docdisplay?lndocid=MIGR-5070865&brandind=5000008

"The default setting for the "Processor Hardware Prefetcher" feature is changed to "Disabled" as recommended by Intel for most servers. This setting will become "Disabled" if the user loads the default settings for all configuration settings or just the single line item setting."

Enough data to back it up for you?


Compare that to the Core 2 L1 hit rate on DivX (99.9%), POV-Ray (99.9%), FEAR (99.0%) and 3DSMax, or the L2 hit rate on Flash video encoding (99.9%) and all the other desktop applications; your data looks pretty slim. (Sorry, I am in HK, so I only have those data points in mind.)

You are trying to use a corner case to make your point. Depending on your application, you may turn off the prefetcher. The IBM case is a very special one: it holds for non-sequential database access.

So, if you don't count the desktop workloads and you reduce the world to non-sequential database access, then maybe you can run into situations where you have to turn the prefetcher OFF. By the way, if you have the SQL source code of your database, it takes 5 min to add SQL code that makes the access more sequential. It is very rare that you access only one field of a data array; after 2 accesses, the prefetcher can help.

Let me guess ... you are using a synthetic test to back up your statement. What about a real database application? We are back to the same dance again. I am not an expert in server benchmarks, but I have seen my peers say that a lot of server benchmarks artificially increase the unpredictability of memory accesses, to test the corner case, compared to real-life databases.

So, for desktop, the prefetcher is certainly not crap. I know some people wearing green badges who would like to have it :)

And just for your information, there are some prefetcher mechanisms you can't turn OFF; the one you are talking about is the hardware prefetcher of the CPU/chipset. Without the L1 prefetcher, your statement would be funny ... trust me, you don't want to see it with this one OFF.

Anyway, I'll stop the discussion here, since you are convinced that the world is non-sequential database access.

who?
EduardoS



Joined: 22 Mar 2008
Posts: 75

Posted: Thu Jun 12, 2008 11:49 pm

who? wrote:

Let me guest ... you are using a synthetic test to back up your statement, what about a real database application???? We are back to the same dancing again. I am not an expert in Server benchmarks, but I saw my pear saying that a lot of server benchmark artificially increase the unpredictability of the mem access, to test the corner case compare to real life database.

Real-life databases usually have much worse code than those server benchmarks and are even more unpredictable. It's not that easy to make them sequential; or better, there is no way to make a HASH JOIN sequential...

who? wrote:

so, for Desktop, the prefetcher is certainly not a crap, i know some people wearing green badges who would like to have it :)

They have had it since September 10th, 2007 ;)
And they have it for servers too, and in their case it's not bandwidth limited.
who?



Joined: 01 Sep 2007
Posts: 464

Posted: Fri Jun 13, 2008 2:04 am

EduardoS wrote:
who? wrote:

Let me guest ... you are using a synthetic test to back up your statement, what about a real database application???? We are back to the same dancing again. I am not an expert in Server benchmarks, but I saw my pear saying that a lot of server benchmark artificially increase the unpredictability of the mem access, to test the corner case compare to real life database.

Real life databases usually have a much worse code than those server benchmarks and even more unpredictable, it's not that easy to make them sequential, or better, there is no way to make a HASH JOIN sequential...

who? wrote:

so, for Desktop, the prefetcher is certainly not a crap, i know some people wearing green badges who would like to have it :)

They have it since September 10th, 2007 ;)
And they have it for servers too, and in their case it's not bandwidth limited.


Ohhh, you forgot... it is as fast as a snail too!

Are you telling me that modern databases never access consecutive cache lines? And are you telling me that linear pointer searching is not used in SAP or Oracle? You are just full of .........
Check the length of your name and first name, then come back later and tell me whether there is no opportunity for prefetch. Stop drinking!



who?
EduardoS



Joined: 22 Mar 2008
Posts: 75

Posted: Fri Jun 13, 2008 2:52 am

who? wrote:

are you telling me that modern database never access the concecutive cache line???? and you telling me that linear pointer searching is not used in SAP or Oracle? just are just full of .........
check the lenght of your name and 1st name, and come back later and tell me if they are no opportunity for prefectch. stop drinking!


Hey... Where did you read "never"? Do you have a problem with text interpretation? Sequential accesses may even outnumber unpredictable ones (and actually do), but a single unpredictable access may result in a slow cache miss that takes more time than many predicted accesses. My point here was just that real-life databases have more unpredictable accesses due to poor code; they hurt performance and are not easy to avoid.

And please, before you think about using those sequential accesses to support your claims, remember that even old processors/chipsets are able to do basic forward/backward prefetching. The context here is the more advanced prefetchers that your company recommends disabling.
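
The cost argument above can be put in numbers. This is only a rough sketch: the 95/5 split between predictable and unpredictable accesses and the latencies (3 ns for a predicted access, 100 ns for a miss) are made-up illustration values, not measurements from any system discussed in this thread.

```python
def average_access_ns(predictable_fraction, hit_ns, miss_ns):
    """Average cost per access if predictable accesses hit and the rest miss."""
    unpredictable_fraction = 1.0 - predictable_fraction
    return predictable_fraction * hit_ns + unpredictable_fraction * miss_ns


def miss_time_share(predictable_fraction, hit_ns, miss_ns):
    """Fraction of total memory time spent on the unpredictable accesses."""
    miss_part = (1.0 - predictable_fraction) * miss_ns
    return miss_part / average_access_ns(predictable_fraction, hit_ns, miss_ns)


# Hypothetical workload: 95% predictable accesses, 3 ns hit, 100 ns miss.
share = miss_time_share(0.95, 3.0, 100.0)
print(f"Unpredictable accesses: 5% of the count, {share:.0%} of the time")
```

With these assumed numbers the minority of unpredictable accesses accounts for the majority of the memory time, which is the shape of the argument; the real ratio depends entirely on the workload and the platform.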
Gabriele Svelto



Joined: 27 Jun 2007
Posts: 263
Location: Milano, Italy

Posted: Fri Jun 13, 2008 7:44 am

who? wrote:
You try to use a cornet case to make your point. Based on your application, you may turn off the prefetcher, in the IBM case, it is a very special case, it is true for non sequencial database access.

who? I think you didn't understand the problem. SPECjbb scores are significantly higher on C2-derived Xeons with prefetching OFF. SPECjbb has a clean, nice sequential access pattern, and well managed with very high cache hit-rates so it has nothing to do with the kind of workload you are describing. The IP-based prefetcher of C2s lowers performance because - just as with all other well behaved applications - SPECjbb depends on *bandwidth* and the prefetcher is *wasting* it for data which will not be needed thus lowering performance. In the case of C2Q this is particularly nasty because the prefetcher has no way to know that there's another couple of cores hanging off the same bus and thus it tends to starve the other cores (and vice-versa).

Quote:
So, if you don't count the desktop workloads, and you reduce the world to non sequencial access database, then may be you can run into situation you got to turn the prefetcher OFF.

I think you've got it backwards, non-sequential access is exactly the kind of workload which benefits a lot from the IP-based prefetcher and - as you pointed out - that's one of the reasons why games run so well on C2. Naturally if you're running out of bandwidth you can get hurt also under this kind of workload but this tends to happen more on C2Q than on C2D.

Quote:
By the way, if you have the SQL source code of your database, it takes 5 min to add SQL code to help the code to be more sequencial. It is usually very rare that you access only one field out of a data array, after 2 accesses, the prefetcher can help.

How do you perform a binary search with sequential access?
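
To make the question concrete, here is a small sketch that records which array indices a textbook binary search touches. The array size and target are arbitrary illustration values; the point is only that the distance between consecutive probes keeps changing in size and sign, so there is no fixed stride for a simple sequential prefetcher to follow.

```python
def binary_search_probes(sorted_values, target):
    """Return the list of indices examined while searching for target."""
    probes = []
    lo, hi = 0, len(sorted_values) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        probes.append(mid)
        if sorted_values[mid] == target:
            break
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


values = list(range(1024))            # arbitrary sorted array
probes = binary_search_probes(values, 700)
strides = [b - a for a, b in zip(probes, probes[1:])]
print("indices probed:", probes)
print("strides       :", strides)
```

A linear scan over the same array would show a constant stride of 1, which is the pattern a next-line prefetcher is built for.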

Quote:
Let me guest ... you are using a synthetic test to back up your statement, what about a real database application???? We are back to the same dancing again. I am not an expert in Server benchmarks, but I saw my pear saying that a lot of server benchmark artificially increase the unpredictability of the mem access, to test the corner case compare to real life database.

As we repeatedly pointed out, SPECjbb, which shows this effect very well, has a very *predictable* memory access pattern. You've got it backwards.

Quote:
so, for Desktop, the prefetcher is certainly not a crap, i know some people wearing green badges who would like to have it :)

No it is not, it's one of the reasons why C2 is so fast on many applications, but we weren't talking about desktops.

Quote:
And just for your information, there are some prefetcher mecanism you can't turn OFF, the one you are talking about is the hardware prefetcher of the CPU/chipset. Without the L1 prefetcher, you statement would be funny ... trust me, you don t want to see with this one OFF.

We were talking about the IP-based prefetcher, which you can turn off, thus increasing performance. Why would we discuss something which cannot be turned off?
who?



Joined: 01 Sep 2007
Posts: 464

Posted: Fri Jun 13, 2008 9:29 am

Gabriele Svelto wrote:
who? wrote:
You try to use a cornet case to make your point. Based on your application, you may turn off the prefetcher, in the IBM case, it is a very special case, it is true for non sequencial database access.

who? I think you didn't understand the problem. SPECjbb scores are significantly higher on C2-derived Xeons with prefetching OFF. SPECjbb has a clean, nice sequential access pattern, and well managed with very high cache hit-rates so it has nothing to do with the kind of workload you are describing. The IP-based prefetcher of C2s lowers performance because - just as with all other well behaved applications - SPECjbb depends on *bandwidth* and the prefetcher is *wasting* it for data which will not be needed thus lowering performance. In the case of C2Q this is particularly nasty because the prefetcher has no way to know that there's another couple of cores hanging off the same bus and thus it tends to starve the other cores (and vice-versa).

Quote:
So, if you don't count the desktop workloads, and you reduce the world to non sequencial access database, then may be you can run into situation you got to turn the prefetcher OFF.

I think you've got it backwards, non-sequential access is exactly the kind of workload which benefits a lot from the IP-based prefetcher and - as you pointed out - that's one of the reasons why games run so well on C2. Naturally if you're running out of bandwidth you can get hurt also under this kind of workload but this tends to happen more on C2Q than on C2D.

Quote:
By the way, if you have the SQL source code of your database, it takes 5 min to add SQL code to help the code to be more sequencial. It is usually very rare that you access only one field out of a data array, after 2 accesses, the prefetcher can help.

How do you perform a binary search with sequential access?

Quote:
Let me guest ... you are using a synthetic test to back up your statement, what about a real database application???? We are back to the same dancing again. I am not an expert in Server benchmarks, but I saw my pear saying that a lot of server benchmark artificially increase the unpredictability of the mem access, to test the corner case compare to real life database.

As we repeatedly pointed out SPECjbb which shows this effect very well has very *predictable* memory access pattern. You've got it backwards.

Quote:
so, for Desktop, the prefetcher is certainly not a crap, i know some people wearing green badges who would like to have it :)

No it is not, it's one of the reasons why C2 is so fast on many applications, but we weren't talking about desktops.

Quote:
And just for your information, there are some prefetcher mecanism you can't turn OFF, the one you are talking about is the hardware prefetcher of the CPU/chipset. Without the L1 prefetcher, you statement would be funny ... trust me, you don t want to see with this one OFF.

We were talking about the IP-based prefetcher which you can turn off thus increasing performance, why would we discuss about something which cannot be turned off?


I know about SPECjbb. It does not give anybody the right to call the prefetcher "crap", given the rest of the benefit you get with it (all the desktop applications). Especially when those people have never tried to do something by themselves.
Give a little respect to the people doing the thing!

Always the same guys spitting on the work of others.
I propose that they sh.t up and do something with their own brain before they pass any judgment on the life achievements of other people, or at least temper their words. I am sick and tired of jerks. Try to do better if you can.



who?
EduardoS



Joined: 22 Mar 2008
Posts: 75

Posted: Fri Jun 13, 2008 12:31 pm

who? wrote:

I know about SPECjbb , it does not give the right for somebody to call the prefetcher "crap", based on the rest of the benefit you get with it (All the desktop applications). Especially when those people never ever tried to do something by themselve.
Give a little respect to the people doing the thing!

Always the same guys spitting on the work of others.
I propose that they sh.t up and do something with their own brain before they put any judgment on life archivement of other people, or at least temper their words, sick and tired of Jerks. Try to do better if you can.

The prefetcher isn't crap, the FSB is. The other company doesn't have any problems with prefetcher+servers, even on workloads that make little use of it.
Johan



Joined: 23 Jul 2007
Posts: 128
Location: Belgium

Posted: Fri Jun 13, 2008 1:09 pm

who? wrote:

I know about SPECjbb , it does not give the right for somebody to call the prefetcher "crap", based on the rest of the benefit you get with it (All the desktop applications). Especially when those people never ever tried to do something by themselve.
Give a little respect to the people doing the thing!

Always the same guys spitting on the work of others.
I propose that they sh.t up and do something with their own brain before they put any judgment on life archivement of other people, or at least temper their words, sick and tired of Jerks. Try to do better if you can.
who?


There is no reason to start a Greek drama; nobody called the HW prefetcher crap. I called your statement " If you let go few cycles to get 500% faster in 99% of the case with the prefetcher help, your statement is wrong" crap, especially on the current generation of Xeons with limited bandwidth. It is an overgeneralization and it is too optimistic.

The prefetcher does not make 99% of the cases faster. It helps in some cases, but even Intel admits that in most server apps it lowers performance instead of raising it. So it is not a silver bullet right now.
who?



Joined: 01 Sep 2007
Posts: 464

Posted: Sat Jun 14, 2008 3:58 am

Johan wrote:
who? wrote:

I know about SPECjbb , it does not give the right for somebody to call the prefetcher "crap", based on the rest of the benefit you get with it (All the desktop applications). Especially when those people never ever tried to do something by themselve.
Give a little respect to the people doing the thing!

Always the same guys spitting on the work of others.
I propose that they sh.t up and do something with their own brain before they put any judgment on life archivement of other people, or at least temper their words, sick and tired of Jerks. Try to do better if you can.
who?


There is no reason to start a Greek drama, nobody called the HW prefetcher crap. I called your statement " If you let go few cycles to get 500% faster in 99% of the case with the prefetcher help, your statement is wrong" crap, especially on the current generation of xeons with limited bandwidth. It is an overgeneralization and it is too optimistic.

The prefetcher does not make 99% of the cases faster. It helps in some cases, but even Intel admits that in most server apps it lower performance instead of raising it. So it not a silver bullet right now.


Dude, if I told you that your articles are a piece of "cr.p", you'd get really pissed off. This applies to us too: we work endless hours to make those CPUs. Myself, on Nehalem or Core 2, I am doing an average of 14 hours per day. Remember, there are 731 million transistors to tune and to check for issues. I am the vocal part of the team, but many people read here, and trust me, they don't feel good when they read the way you talk about things, and as you told me before, "it does not really help your company" when you do so.

If you could turn off the L1/L2 prefetcher, you would see serious slowdowns. The only one you can turn OFF is the CPU/Chipset one. so, the only one you can quantify is this one. Prefetcher benefits are bigger than you can measure.

Just run VTune on video apps and 3D apps, and you'll understand what I mean. FSB utilization is low and outstanding requests from L1 to L2 are low, but fetch requests between L1 and L2 are high; hence the L1 hit rate is high.
But it is OK, I know you do not want to go to that level of detail; otherwise, there is no "juicy story".

who?
no@spam.com



Joined: 07 Oct 2007
Posts: 53

Posted: Sat Jun 14, 2008 8:02 am

> If you could turn off the L1/L2 prefetcher, you would see serious
> slowdowns. The only one you can turn OFF is the CPU/Chipset one.

You are not aware of bits 9, 19, 37, and 39 of the MISC_ENABLE MSR?
In particular, bits 37 and 39 let you disable both L2->L1D prefetchers.

> so, the only one you can quantify is this one.

With the above four bits, one can quantify 16 settings.
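
For anyone who wants to tabulate those 16 settings, here is a sketch that only computes the bit patterns; it does not read or write any MSR. The bit positions are the ones quoted above. The descriptive names attached to them are my assumption (check the Intel manual for your CPU before relying on them), and setting a bit is taken to mean "disable".

```python
from itertools import product

# Bit positions from the post above; the names are assumed labels.
PREFETCH_DISABLE_BITS = {
    "hardware_prefetcher": 9,
    "adjacent_cache_line": 19,
    "dcu_prefetcher": 37,
    "ip_prefetcher": 39,
}


def apply_settings(msr_value, disabled):
    """Return msr_value with each named disable bit set or cleared."""
    for name, bit in PREFETCH_DISABLE_BITS.items():
        mask = 1 << bit
        if name in disabled:
            msr_value |= mask      # bit set   -> prefetcher disabled
        else:
            msr_value &= ~mask     # bit clear -> prefetcher enabled
    return msr_value


def all_settings():
    """Yield every on/off combination of the four prefetchers."""
    names = list(PREFETCH_DISABLE_BITS)
    for flags in product([False, True], repeat=len(names)):
        yield frozenset(n for n, f in zip(names, flags) if f)


settings = list(all_settings())
print(len(settings), "combinations")
for disabled in settings:
    print(f"{apply_settings(0, disabled):#018x}", sorted(disabled))
```

In a real experiment you would start from the value currently in the register rather than 0, so that the unrelated bits are preserved; apply_settings leaves all other bits untouched for that reason.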
who?



Joined: 01 Sep 2007
Posts: 464

Posted: Sat Jun 14, 2008 2:40 pm

no@spam.com wrote:
> If you could turn off the L1/L2 prefetcher, you would see serious
> slowdowns. The only one you can turn OFF is the CPU/Chipset one.

You are not aware of bits 9, 19, 37, and 39 of the MISC_ENABLE MSR?
In particular, bits 37 and 39 let you disable both L2->L1D prefetchers.

> so, the only one you can quantify is this one.

With the above four bits, one can quantify 16 settings.


Dude, read my title and give yourself 5 min of reflection: do you really think you can teach me anything about MSRs?
Those are not BIOS options, are they? :) Not on Intel desktop boards at least!

So, you just have to try to do the experimentation ... try it :)
Do you think an IT guy will write a program to turn it OFF? I don't think so (if you know of a case, let me know). So, we are back to you Aceshardware nuts highlighting a corner case.
I think I'll be dancing in my cubicle very shortly.

who?
lux_interior



Joined: 26 Jul 2007
Posts: 235

Posted: Sat Jun 14, 2008 5:56 pm

who? wrote:
Johan wrote:
"The default setting for the "Processor Hardware Prefetcher" feature is changed to "Disabled" as recommended by Intel for most servers. This setting will become "Disabled" if the user loads the default settings for all configuration settings or just the single line item setting."

Enough data to back it up for you?


compare to the success rate of the Core 2 L1 hit on DivX (99.9%) Povvray (99.9%) , on FEAR (99.0%) , 3DSMax , or L2 success rate on FLASH video encoding (99.9%) and all the other desktop application, your data looks pretty slim


IBM's and Intel's data must be pretty slim for them to recommend disabling the HW prefetcher.
I assume you hide all the interesting data from your own employer? Because apparently they don't take your recommendations seriously...
All times are GMT + 1 Hour