Technically, this will probably be correct: if/when Shanghai is delayed or has trouble with the scaling, AMD may introduce 3.2GHz quad core 65nm Barcelona in 2009. Just like they introduced 3.2GHz K8 in 2007.
When you spoke to the Nehalem architect, did he mention the cache structure by chance? My strong impression is that with the possible exception of the 8-core MP part, the "last level shared cache" is an L2, not an L3 as was assumed by most people prior to the die photo. Intel was always careful to say only "multi"-level, and "last level shared".
So, how much faster will Nehalem be than the already oh-so-fast Penryn, focusing specifically on the core's single-thread IPC? I asked Kirk Skaugen, Intel's Digital Enterprise Group VP and GM of Server Platforms Group.
Kirk is always friendly, even to the point of having had a fun chat during the previous Taipei IDF on how great it would have been if Nvidia actually did come out with a dual socket Xeon Nforce SLI chipset - which it didn't up to now - so I expected a fun answer this time too.
Well, no fun here - he didn't want to comment on the CPU core-specific performance expectations beyond the well known integrated memory controllers and interconnects. But he did say that the CPU core performance jump from the same process Core 2 (Penryn) to Nehalem would be higher than the jump for Netburst to Core 2 itself.
Sorry, I didn't ask him about the cache structure, but he probably wouldn't have said much; he was obviously only willing to talk in very broad-brush terms.
We'll see. Personally I'm not expecting Nehalem to be all that huge a jump in IPC. Netburst vs Conroe was HUGE for IPC. I just can't see that happening, except perhaps for some specific bandwidth-hungry server workloads.
Paul DeMone wrote:
Given Core 2 Duo is a four issue core the odds are you don't understand very well.
Given that I even mentioned that Core and Nehalem are any "issue" at all the odds are that I made a silly mistake / typo - it's not a complicated concept, three vs four issue - which is in fact what happened, so if you can all find it in your cold, poisoned hearts to forgive me, that would be swell.
One other comment I can make is that the Nehalem architect was very quick to say how wonderful he thought the improvements in IPC were for Penryn over Conroe. It was probably just a political gesture, and perhaps said to make him and his team look good when Nehalem arrives and blows away Penryn. But honestly, in the vast majority of instances Penryn is a pretty modest improvement IPC-wise. I am aware that Penryn is a tick (a process shrink) and not a tock (a new architecture), so let's not argue about that. Process is obviously another matter.
caboosemoose wrote:
We'll see. Personally I'm not expecting Nehalem to be all that huge a jump in IPC. Netburst vs Conroe was HUGE for IPC. I just can't see that happening, except perhaps for some specific bandwidth-hungry server workloads.
I wouldn't discount Nehalem improvements coming from the IMC as just something which will improve 'bandwidth' sensitive workloads. That's because even on non-bandwidth-intensive workloads the C2x hardware prefetchers are perfectly able to generate a significant amount of traffic and reduce apparent memory latency. However, on quad-core models this doesn't reap as many benefits as it does on dual-core variants, and the effect is visible in benchmarks with well-behaved memory patterns or significant bandwidth demands, where turning off prefetching usually leads to higher performance.
Quad-core Nehalem variants will have both significantly lower latency and higher per-core bandwidth than C2Q, which will in turn allow for even more aggressive prefetching strategies. This could provide a significant boost even on integer applications, especially the ones with larger working sets, and will have the benefit of providing this performance with less tuning compared to C2Q.
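To put the argument above in rough numerical terms, here is a toy model in Python. It is only a sketch under loudly stated assumptions: the function names, the linear scaling of prefetch coverage with available bandwidth, and every figure in the usage lines are hypothetical placeholders, not measurements of any Core 2 or Nehalem part.

```python
def apparent_miss_latency(miss_latency_ns, coverage, covered_latency_ns=0.0):
    """Average latency a load sees when a fraction `coverage` of its cache
    misses was already fetched by the hardware prefetcher (toy model)."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    return (1.0 - coverage) * miss_latency_ns + coverage * covered_latency_ns


def achievable_coverage(ideal_coverage, shared_bandwidth_gbs, cores,
                        demand_per_core_gbs):
    """Scale prefetch coverage down when the cores sharing one memory path
    ask for more bandwidth than it can deliver (assumed linear falloff)."""
    per_core_gbs = shared_bandwidth_gbs / cores
    return ideal_coverage * min(1.0, per_core_gbs / demand_per_core_gbs)


if __name__ == "__main__":
    # Hypothetical placeholder figures, chosen only to show the trend.
    for cores in (2, 4):
        cov = achievable_coverage(0.8, shared_bandwidth_gbs=10.0,
                                  cores=cores, demand_per_core_gbs=5.0)
        print(cores, "cores:", apparent_miss_latency(100.0, cov), "ns")
```

With the same shared memory path, doubling the core count halves the bandwidth each prefetcher can use, so less of the miss latency gets hidden. Lowering `miss_latency_ns` and raising the per-core bandwidth, as an IMC is expected to do, moves both terms in the favourable direction.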
The point of the link was (obviously) to provide a reference to the different models of Nehalem, as it is only the EX part that appears to have 3 levels of cache indicated.
Besides the die photo, and speculation on a couple sites, another reason I suspect Nehalem does not simply have a shared L3 like Barcelona for all parts in the range is the way Intel has been so deliberately vague about it.
It seems to me there'd be little reason to do that with a "standard" L1/L2 shared L3 structure. But if they have "leveraged SmartCache" in an innovative manner to be able to have a large 4-way shared L2, well, that WOULD be something to keep quiet about for a longer time.
The point of the link was (obviously) to provide a reference to the different models of Nehalem, as it is only the EX part that appears to have 3 levels of cache indicated.
What you are seeing there with the EX part is probably an L4, with the L1 and L2 being considered part of the exclusive core cache.
Besides the die photo, and speculation on a couple sites, another reason I suspect Nehalem does not simply have a shared L3 like Barcelona for all parts in the range is the way Intel has been so deliberately vague about it.
It seems to me there'd be little reason to do that with a "standard" L1/L2 shared L3 structure. But if they have "leveraged SmartCache" in an innovative manner to be able to have a large 4-way shared L2, well, that WOULD be something to keep quiet about for a longer time.
The chip experts are going for a 512KB L2 and 8MB L3
David also has direct access to Intel architects, I think you are fighting a heavy tide here ;-).
Hans' analysis is not convincing at all. (I believe he's got some other errors there, in addition to the cache analysis.) Still waiting for David to reply. I'm sticking with my position, as it stands: