Aceshardware

(not so) temporary home for the aceshardware community

All times are UTC + 1 hour



 Post subject:
PostPosted: Sun Mar 30, 2008 3:16 pm 
Joined: Thu Mar 13, 2008 10:53 am
Posts: 11
I still haven't seen a die-size comparison of Shanghai, so I decided to do one myself:

Image

It took a good deal of work to get the scales to match exactly (or nearly so) -- I hope it's representative enough. ;)

On the right side you can see the two single-core excerpts, for easier comparison.

One thing to note: the I/O interface logic (HT & DRAM) in Shanghai takes up the same area as in its 65nm predecessor.


 Post subject:
PostPosted: Sun Mar 30, 2008 4:19 pm
Joined: Wed Jun 27, 2007 10:19 am
Posts: 320
Location: Milano, Italy
fellix wrote:
I still haven't seen a die-size comparison of Shanghai, so I decided to do one myself:

Good stuff :)

Quote:
One thing to note: the I/O interface logic (HT & DRAM) in Shanghai takes up the same area as in its 65nm predecessor.

I think that was expected. It would be interesting to make a die-size comparison of the I/O part with Intel's Nehalem DDR3+QPI implementation.


 Post subject:
PostPosted: Sun Mar 30, 2008 5:50 pm
Joined: Sat Mar 22, 2008 5:10 pm
Posts: 209
JumpingJack wrote:
CPU/NB (GHz) | L1/L2/L3 latency, Everest (ns) | L1/L2/L3 latency, CPUID (cycles)
1.8/1.8      | 1.6/5.1/10.8                   | 3/15/51
1.8/2.0      | 1.6/5.1/9.8                    | 3/15/48 (oddly, CPUID reports 4 levels of cache at these settings: one L1 at 8K and one L1 at 64K, weird)
2.0/2.0      | 1.5/4.6/9.2                    | 3/16/50
2.0/1.8      | 1.5/4.6/9.8                    | 3/16/45
2.0/1.6      | 1.5/4.6/10.5                   | 3/15/55

It seems Barcelona's prefetcher fooled Everest and CPUID... I'm not sure whether it is possible to disable the prefetcher. Anyway, RightMark is a bit more complete.

http://cpu.rightmark.org/download.shtml
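To make the mismatch between the two tools easier to see, the Everest figures can be converted from nanoseconds to cycles. This is only a rough sketch: it assumes the cycle counts are in core clock cycles (the table does not say), and the helper names `ns_to_cycles` and `compare` are made up for illustration.

```python
# Rows copied from the quoted table:
# (core GHz, NB GHz, Everest L1/L2/L3 in ns, CPUID L1/L2/L3 in cycles)
ROWS = [
    (1.8, 1.8, (1.6, 5.1, 10.8), (3, 15, 51)),
    (1.8, 2.0, (1.6, 5.1, 9.8), (3, 15, 48)),
    (2.0, 2.0, (1.5, 4.6, 9.2), (3, 16, 50)),
    (2.0, 1.8, (1.5, 4.6, 9.8), (3, 16, 45)),
    (2.0, 1.6, (1.5, 4.6, 10.5), (3, 15, 55)),
]


def ns_to_cycles(latency_ns, clock_ghz):
    """Convert nanoseconds to clock cycles (at 1 GHz, one cycle lasts 1 ns)."""
    return latency_ns * clock_ghz


def compare(rows):
    """Return (core GHz, NB GHz, Everest latencies in core cycles, CPUID cycles)."""
    result = []
    for core_ghz, nb_ghz, everest_ns, cpuid_cycles in rows:
        # Assumption: cycles are counted against the core clock, not the NB clock.
        converted = tuple(round(ns_to_cycles(ns, core_ghz), 1) for ns in everest_ns)
        result.append((core_ghz, nb_ghz, converted, cpuid_cycles))
    return result


if __name__ == "__main__":
    for core_ghz, nb_ghz, converted, reported in compare(ROWS):
        print(f"{core_ghz}/{nb_ghz} GHz: Everest -> {converted} cycles, CPUID -> {reported} cycles")
```

Under that assumption the L1 numbers roughly agree (about 3 cycles either way), while Everest's L2 and L3 figures come out far lower than what CPUID reports, so at least one of the two tools is not measuring what it claims to.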


 Post subject:
PostPosted: Fri Apr 11, 2008 12:20 pm
Joined: Sun Mar 16, 2008 3:20 pm
Posts: 82
We had some instruction set discussion on the previous pages. I have now found some new evidence in the updated AMD CPUID specification:
Quote:
April 2008, revision 2.28
• 3.1 [Legacy Method] on page 23: Clarified.
• CPUID Fn0000_0001_ECX[SSE41]: Added.
• CPUID Fn0000_0001_ECX[SSSE3]: Added.
• CPUID Fn8000_0001_ECX[SSE5]: Added.
• CPUID Fn8000_0001_ECX[IBS]: Added.
• CPUID Fn8000_0008_EAX[GuestPhysAddrSize]: Added.
• CPUID Fn8000_0008_EAX[PhysAddrSize]: Updated.
• CPUID Fn8000_000A_EDX[Ssse3Sse5Dis]: Added.


http://www.amd.com/us-en/assets/content ... /25481.pdf

The last update before that was in July 2007 and had all the K10 information on SSE4A, L3 caches, etc.

So I guess one can assume that AMD's 45nm K10 edition will indeed feature SSE5 plus some older extensions.

cheers

Opteron

P.S: Several other AMD guides also got an update, FYI: http://www.amd.com/us-en/Processors/Tec ... 43,00.html
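For anyone who wants to check these flags once silicon is available, here is a small sketch that decodes the relevant bits from raw ECX values. The bit positions are recalled from the CPUID documentation rather than taken from the quote above, so treat them as assumptions and verify them against the PDF. The `decode` helper and the sample register values are hypothetical.

```python
# Assumed bit positions; verify against the AMD CPUID specification before use.
FN0000_0001_ECX = {"SSSE3": 9, "SSE41": 19}
FN8000_0001_ECX = {"SSE4A": 6, "IBS": 10, "SSE5": 11}


def decode(ecx, bitmap):
    """Return {feature name: True/False} for each named bit in a raw ECX value."""
    return {name: bool((ecx >> bit) & 1) for name, bit in bitmap.items()}


if __name__ == "__main__":
    # Hypothetical register contents, not read from real hardware.
    sample_std_ecx = (1 << 9) | (1 << 19)
    sample_ext_ecx = (1 << 6) | (1 << 10)
    print(decode(sample_std_ecx, FN0000_0001_ECX))
    print(decode(sample_ext_ecx, FN8000_0001_ECX))
```

The raw ECX values would have to come from an actual CPUID call (for example via a small C helper or an existing CPUID dump tool); this snippet only does the bit test.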


 Post subject:
PostPosted: Fri Apr 11, 2008 1:23 pm
Joined: Sat Sep 01, 2007 8:01 am
Posts: 650
It is going to be interesting. Let's see what speed they come up with on MPSADBW; that is the one that matters the most in SSE4.1.

who?


 Post subject:
PostPosted: Fri Apr 11, 2008 5:38 pm
Joined: Sun Oct 07, 2007 6:22 pm
Posts: 102
Also, it is going to be interesting to see who gets to FMA first -- Intel, with the successor to Sandy Bridge, or AMD, with SSE5 in Bulldozer.

And then there is the SSE5 vs AVX vs Larrabee conundrum. Will AMD adopt AVX? Will it drop SSE5 in favor of AVX? Will Larrabee introduce yet another extension for the same functionality, and how will developers cope with having to develop/test/support all those different code paths?


 Post subject:
PostPosted: Fri Apr 11, 2008 8:47 pm
Joined: Mon Jul 23, 2007 1:48 am
Posts: 81
Would it be possible for AVX and SSE5 to share a common subset? Given that they do essentially the same things, an overlap in opcodes would seem logical. After looking at the two instruction set listings, I thought there was some overlap (similar names, similar descriptions, AVX able to handle both 128 and 256 bits and 3 or 4 operands), but that doesn't look like the case. Could someone verify that they are disjoint?

http://softwarecommunity.intel.com/isn/ ... 943302.pdf

http://www.amd.com/us-en/assets/content ... /43479.pdf


 Post subject:
PostPosted: Thu Apr 17, 2008 9:00 am
Joined: Sun Mar 23, 2008 7:11 pm
Posts: 17
Location: Tarragona, Spain
The latency measurements from my previous post were taken with RMMA, which I think is the best memory and microarchitecture test suite. http://cpu.rightmark.org/download.shtml

The frequency of the NB/L3/memory controllers in Barcelona Opterons does not influence performance all that much. I've tested NB frequencies of 1.8, 2.0, 2.2 and 2.4 GHz, and the best result is a 9% increase in WinRAR/7-Zip.

The L3 latencies are reasonable considering:
- The asynchronous nature of the chip.
- The two clock domains with complex dividers.
- The semi-exclusive nature of the L3, with two narrow 128-bit buses (one for each direction, to and from L2).
- The high associativity of 32 ways, which also adds latency.

I don't think Shanghai changes many of these facts. I expect it to be a die shrink without many new features.

A big step forward would be a fully synchronous Shanghai, but what about power? The current thermal envelopes probably don't allow a design like that.

It seems Intel's Nehalem L3 will be asynchronous too, with significant latency. We'll see.


 Post subject:
PostPosted: Thu Apr 17, 2008 11:08 am
Joined: Fri Sep 07, 2007 8:41 am
Posts: 12
@alavo03

There are already some numbers out on Nehalem's L3 latency:
http://realworldtech.com/page.cfm?Artic ... 182719&p=7

Quote:
As a result, the load to use latency for Nehalem varies depending on the relative frequency and phase alignment of the cores and the L3 itself and the latency of arbitration for access to the L3. In the best case, i.e. phase aligned operation and frequencies that differ by an integer multiple, Nehalem’s L3 load to use latency is somewhere in the range of 30-40 cycles according to Intel architects.


That is in much the same range as Barcelona's L3. I'm wondering how the small L2 caches will work in that context. I think Intel will face more problems than they expected. Core 2 profits a lot from its big, low-latency L2.

Regards,
Tom


 Post subject:
PostPosted: Thu Apr 17, 2008 12:03 pm
Joined: Sun Mar 23, 2008 7:11 pm
Posts: 17
Location: Tarragona, Spain
I'm aware of the excellent Real World Technologies article, but I meant real measurements, not estimates from the manufacturer. Still, 30-40 cycles in the best case seems quite high considering Intel's expertise in caches.

From AMD's Barcelona we can learn a few things about this kind of complex asynchronous device. Putting the L3/NB/memory controllers in their own clock domain creates latency problems, aggravated by the fact that the core clock changes according to the performance requirements of the system.

Regards, Carlos.


 Post subject:
PostPosted: Thu Apr 17, 2008 12:55 pm
Joined: Fri Sep 07, 2007 8:41 am
Posts: 12
>seems quite high considering the expertise in caches from Intel
Even their expertise cannot fight against physical laws.

I think the decision to integrate the L2 so "borderlessly" into the lower right corner of the core could be a mistake. It is not possible to move to bigger L2 caches without changing the position of some pipeline stages. Montreal will have 1 MB of dedicated L2 cache per core instead of 512 KB (Shanghai and Barcelona), and that won't change the look of the core.

I'm quite sure that benchmarks like SuperPI won't run that fast on Nehalem compared to Penryn (on an IPC basis) because of the slower cache hierarchy (small L2 cache, slow L3 cache).

Regards,
Tom


 Post subject:
PostPosted: Wed Apr 23, 2008 6:55 am
Joined: Mon Jul 23, 2007 12:34 am
Posts: 107
Xbit are reporting that AMD's 45nm process is delayed:

http://www.xbitlabs.com/news/cpu/displa ... arter.html

"We'll start the production ramp in the summertime and start to ship products in volume in Q4 2008," said Dirk Meyer, president and chief operating officer at AMD, during a conference call with financial analysts.


 Post subject:
PostPosted: Wed Apr 23, 2008 5:43 pm
Joined: Wed Sep 26, 2007 11:11 pm
Posts: 33
LiamC wrote:
Xbit are reporting that AMD's 45nm process is delayed:

http://www.xbitlabs.com/news/cpu/displa ... arter.html

"We'll start the production ramp in the summertime and start to ship products in volume in Q4 2008," said Dirk Meyer, president and chief operating officer at AMD, during a conference call with financial analysts.


That is not a delay; it is just one of the first truly upfront statements about the 45nm schedule.

Anything you heard from official sources about mid-2008 was cleverly deceptive, trying to make it look like they were less far behind than they really are. They used phrases like "we'll start up the ramp" and such, which are quite meaningless to outsiders and can therefore always be wormed out of if anybody actually calls them out on it.

Those with good will towards AMD (or less experience with their marketing) could interpret those mid-2008 statements as them closing the process technology gap with Intel by up to half a year. But the realists already knew it would very likely be the end of the year, especially when word got out that they only got first silicon for Shanghai in Q1.

This is a recurring phenomenon by the way. They did it with 65nm and 45nm and they'll do it again with 32nm. Look out for it next time - always remember that "in the second half of the year" can mean December 31st without it being a lie ;).


 Post subject: Re: Time for an update.
PostPosted: Thu Aug 21, 2008 10:01 am
Joined: Tue Aug 07, 2007 11:57 am
Posts: 169
Hans de Vries wrote:
Phenom wrote:
Hans, could "Bridge to 2nd Die?" really be a PCIe link to an on-package GPU, as in this schematic?
Image


Yes, as far as the 8-core version is concerned. After having seen Dunnington,
I think we may expect another monster monolithic die (~700 mm2)
rather than a dual-die package.


Regards, Hans


Eight core Nehalem has a mammoth die indeed...
http://download.intel.com/pressroom/kit ... G_0895.JPG

31.5mm x 21.5mm ~ 680mm^2

83 dies on a 300 mm wafer. No wonder they want 450 mm wafers...

No details visible unfortunately.
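For reference, the arithmetic can be sketched as follows. The dies-per-wafer function uses a common first-order approximation (wafer area divided by die area, minus an edge-loss term), which is not necessarily how the 83 figure above was obtained; it ignores scribe lines and edge exclusion, and the function names are only illustrative.

```python
import math


def die_area_mm2(width_mm, height_mm):
    """Area of a rectangular die in square millimetres."""
    return width_mm * height_mm


def gross_dies_per_wafer(wafer_diameter_mm, area_mm2):
    """First-order estimate: wafer area / die area, minus an edge-loss term."""
    d = wafer_diameter_mm
    whole = math.pi * d ** 2 / (4 * area_mm2)
    edge_loss = math.pi * d / math.sqrt(2 * area_mm2)
    return math.floor(whole - edge_loss)


if __name__ == "__main__":
    area = die_area_mm2(31.5, 21.5)
    print(f"die area: {area} mm^2")
    print(f"300 mm wafer: ~{gross_dies_per_wafer(300, area)} gross dies")
    print(f"450 mm wafer: ~{gross_dies_per_wafer(450, area)} gross dies")
```

The exact product is 677.25 mm^2, and this approximation gives about 78 gross dies on a 300 mm wafer, in the same ballpark as the 83 quoted above. The gap is within what different edge and scribe assumptions would produce.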


Regards, Hans


 Post subject:
PostPosted: Thu Aug 21, 2008 10:47 am
Joined: Wed Jun 27, 2007 1:38 pm
Posts: 468
Quad-core Nehalem is only 246mm^2. Does the 8-core version have a huge L3 cache?

