| Aceshardware http://aceshardware.freeforums.org/ |
|
| Finally an image of Shanghai http://aceshardware.freeforums.org/finally-an-image-of-shanghai-t405-120.html |
Page 9 of 10 |
| Author: | fellix [ Sun Mar 30, 2008 3:16 pm ] |
| Post subject: | |
I haven't seen a die-size comparison of Shanghai yet, so I decided to do one myself:
It took a great deal of effort to get the scale matching exact (or nearly so), so I hope it's representative enough. ;) On the right side you can see the two single-core excerpts, for easier comparison. One thing to note: the I/O interface logic (HT & DRAM) in Shanghai takes up the same area as in its 65nm predecessor. |
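To see why a non-shrinking I/O block matters, here is a rough sketch of a shrink estimate. It assumes ideal area scaling of (45/65)² for logic and SRAM and no scaling at all for the HT/DRAM interface; the area split used below is purely hypothetical, not measured from the die shots.

```python
def shrunk_area(scalable_mm2: float, io_mm2: float,
                old_node_nm: float = 65.0, new_node_nm: float = 45.0) -> float:
    """Estimate die area after an optical shrink.

    Assumes ideal scaling: the scalable area (cores, caches) shrinks by
    (new/old)^2, while the I/O area (HT links, DRAM PHYs) keeps its size.
    """
    factor = (new_node_nm / old_node_nm) ** 2
    return scalable_mm2 * factor + io_mm2


if __name__ == "__main__":
    # Hypothetical split of a 65nm die: 240 mm^2 of logic/SRAM, 40 mm^2 of I/O.
    scalable, io = 240.0, 40.0
    with_fixed_io = shrunk_area(scalable, io)
    if_io_shrank = (scalable + io) * (45.0 / 65.0) ** 2
    print(f"fixed I/O: {with_fixed_io:.1f} mm^2, I/O shrunk too: {if_io_shrank:.1f} mm^2")
```

The fixed I/O term is why the I/O share of the die grows with every shrink, which matches what the comparison shows.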
|
| Author: | Gabriele Svelto [ Sun Mar 30, 2008 4:19 pm ] |
| Post subject: | |
fellix wrote: I haven't seen still an die-size comparison of Shanghai, so I decided to do it myself:
Good stuff :)
Quote: One note to pay attention for: the I/O interface logic (HT & DRAM) in Shanghai takes the same size, as in its 65nm predecessor.
I think that was expected. It would be interesting to make a die-size comparison of the I/O part with Intel's Nehalem DDR3+QPI implementation. |
|
| Author: | EduardoS [ Sun Mar 30, 2008 5:50 pm ] |
| Post subject: | |
JumpingJack wrote: CPU/NB clocks vs. L1/L2/L3 latency (Everest, in ns) and L1/L2/L3 latency (CPUID, in cycles):
| CPU/NB (GHz) | L1/L2/L3 Everest (ns) | L1/L2/L3 CPUID (cycles) |
| 1.8/1.8 | 1.6/5.1/10.8 | 3/15/51 |
| 1.8/2.0 | 1.6/5.1/9.8 | 3/15/48 (see note) |
| 2.0/2.0 | 1.5/4.6/9.2 | 3/16/50 |
| 2.0/1.8 | 1.5/4.6/9.8 | 3/16/45 |
| 2.0/1.6 | 1.5/4.6/10.5 | 3/15/55 |
Note: oddly, at 1.8/2.0 CPUID reports four levels of cache, one L1 at 8K and one L1 at 64K. Weird.
It seems Barcelona's prefetcher fooled Everest and CPUID... I'm not sure whether it is possible to disable the prefetcher; anyway, RightMark is a bit more complete: http://cpu.rightmark.org/download.shtml |
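A prefetcher can only hide latency when the access pattern is predictable. A common way for latency benchmarks to defeat it is pointer chasing through a random cyclic permutation, so every load address depends on the previous load. Whether Everest or RMMA use exactly this scheme is an assumption. The sketch below only builds such a chain with Sattolo's algorithm (which always yields a single cycle through every slot); the timing loop itself would have to be written in C or assembly, with each slot on its own cache line, because Python's interpreter overhead would swamp any cache latency.

```python
import random


def sattolo_chain(n: int, seed: int = 0) -> list[int]:
    """Return next_index[i] forming one random cycle through all n slots.

    Sattolo's algorithm: like Fisher-Yates, but j is drawn from [0, i)
    instead of [0, i], which guarantees a single n-cycle.
    """
    rng = random.Random(seed)
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def walk(chain: list[int], steps: int, start: int = 0) -> int:
    """Follow the chain; each next index is only known after the previous load."""
    idx = start
    for _ in range(steps):
        idx = chain[idx]
    return idx


if __name__ == "__main__":
    chain = sattolo_chain(16, seed=42)
    print(chain)
    print(walk(chain, 16))  # a full lap returns to the start: 0
```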
|
| Author: | Opteron [ Fri Apr 11, 2008 12:20 pm ] |
| Post subject: | |
We had some discussion of instruction sets on the previous pages, and I have now found some new evidence in the updated AMD CPUID specification:
Quote: April 2008, revision 2.28:
• 3.1 [Legacy Method] on page 23: Clarified.
• CPUID Fn0000_0001_ECX[SSE41]: Added.
• CPUID Fn0000_0001_ECX[SSSE3]: Added.
• CPUID Fn8000_0001_ECX[SSE5]: Added.
• CPUID Fn8000_0001_ECX[IBS]: Added.
• CPUID Fn8000_0008_EAX[GuestPhysAddrSize]: Added.
• CPUID Fn8000_0008_EAX[PhysAddrSize]: Updated.
• CPUID Fn8000_000A_EDX[Ssse3Sse5Dis]: Added.
http://www.amd.com/us-en/assets/content ... /25481.pdf
The last update before that was in July 2007 and had all the K10 information on SSE4A, L3 caches, etc. So I guess one can assume that AMD's 45nm K10 edition will indeed feature SSE5 plus some older extensions (SSSE3, SSE4.1).
Cheers,
Opteron
P.S.: Several other AMD guides also got an update, FYI: http://www.amd.com/us-en/Processors/Tec ... 43,00.html |
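For anyone wanting to test for these extensions in software, here is a small sketch that decodes the relevant ECX feature bits from raw CPUID register values. The bit positions (SSSE3 = bit 9 and SSE4.1 = bit 19 of Fn0000_0001_ECX; SSE4A = bit 6, IBS = bit 10 and SSE5 = bit 11 of Fn8000_0001_ECX) are given from memory and should be checked against the 25481 specification linked above. Actually executing CPUID needs C or assembly (for example GCC's `<cpuid.h>`), so the register values here are supplied by hand.

```python
# Bit positions as recalled from the AMD CPUID Specification (#25481);
# verify against the document before relying on them.
STD_ECX_BITS = {"SSSE3": 9, "SSE4.1": 19}           # CPUID Fn0000_0001_ECX
EXT_ECX_BITS = {"SSE4A": 6, "IBS": 10, "SSE5": 11}   # CPUID Fn8000_0001_ECX


def decode_features(std_ecx: int, ext_ecx: int) -> set[str]:
    """Return the names of the extensions whose feature bits are set."""
    found = {name for name, bit in STD_ECX_BITS.items() if (std_ecx >> bit) & 1}
    found |= {name for name, bit in EXT_ECX_BITS.items() if (ext_ecx >> bit) & 1}
    return found


if __name__ == "__main__":
    # Hypothetical register values, not read from real hardware:
    # SSE4A and IBS set; SSE5, SSSE3 and SSE4.1 clear.
    print(sorted(decode_features(std_ecx=0, ext_ecx=(1 << 6) | (1 << 10))))
    # ['IBS', 'SSE4A']
```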
|
| Author: | who? [ Fri Apr 11, 2008 1:23 pm ] |
| Post subject: | |
It is going to be interesting; let's see what speed they come up with on MPSADBW, which is the one that matters most in SSE4.1.
who? |
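For reference, MPSADBW (SSE4.1) computes eight sums of absolute differences at once: a fixed 4-byte block of the source, selected by imm8 bits 1:0, is compared against eight overlapping 4-byte windows of the destination, starting at byte 0 or 4 depending on imm8 bit 2. That makes it useful for block-matching motion estimation in video encoding. Below is a plain-Python model of the semantics, meant for understanding only, not as a fast implementation.

```python
def mpsadbw(dst: bytes, src: bytes, imm8: int) -> list[int]:
    """Model of SSE4.1 MPSADBW on 16-byte operands; returns eight 16-bit sums."""
    assert len(dst) == 16 and len(src) == 16
    src_off = (imm8 & 0b11) * 4          # which 4-byte block of src is held fixed
    dst_off = ((imm8 >> 2) & 1) * 4      # where the sliding windows start in dst
    block = src[src_off:src_off + 4]
    result = []
    for i in range(8):                   # eight windows, each shifted by one byte
        window = dst[dst_off + i:dst_off + i + 4]
        result.append(sum(abs(a - b) for a, b in zip(window, block)))
    return result


if __name__ == "__main__":
    print(mpsadbw(bytes(range(16)), bytes(16), 0))
    # [6, 10, 14, 18, 22, 26, 30, 34]
```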
|
| Author: | no@spam.com [ Fri Apr 11, 2008 5:38 pm ] |
| Post subject: | |
Also, it is going to be interesting to see who gets to FMA first: Intel, with the successor to Sandy Bridge, or AMD, with SSE5 in Bulldozer. And then there is the SSE5 vs. AVX vs. Larrabee conundrum. Will AMD adopt AVX? Will it drop SSE5 in favor of AVX? Will Larrabee introduce yet another extension for the same functionality, and how will developers cope with having to develop, test and support all those different code paths? |
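Besides throughput, the practical difference FMA makes is a single rounding: a*b + c is computed exactly and rounded once, instead of rounding the product first. The sketch below emulates that in Python with exact rational arithmetic (`fractions.Fraction`); it illustrates the numerics only and says nothing about how SSE5 or Intel's FMA encode the operation.

```python
from fractions import Fraction


def fma_emulated(a: float, b: float, c: float) -> float:
    """a*b + c with a single rounding: exact rational math, then one conversion."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def mul_then_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the addition."""
    return a * b + c


if __name__ == "__main__":
    a, b, c = 1 + 2**-30, 1 - 2**-30, -1.0
    # The exact product is 1 - 2**-60, which rounds to 1.0 in double precision.
    print(mul_then_add(a, b, c))              # 0.0
    print(fma_emulated(a, b, c) == -2**-60)   # True
```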
|
| Author: | cornelius785 [ Fri Apr 11, 2008 8:47 pm ] |
| Post subject: | |
Would it be possible for AVX and SSE5 to implement some common subset of each other? Given that they do essentially the same things, it seems logical for their opcodes to overlap. After looking at the two instruction set listings, I thought there was some overlap (similar names, similar descriptions, AVX able to handle both 128 and 256 bits and 3 or 4 operands), but that doesn't look like the case. Could someone verify that they are disjoint? http://softwarecommunity.intel.com/isn/ ... 943302.pdf http://www.amd.com/us-en/assets/content ... /43479.pdf |
|
| Author: | alavo03 [ Thu Apr 17, 2008 9:00 am ] |
| Post subject: | |
The latency measurements in my previous post were made with RMMA, which I think is the best memory and microarchitecture test suite: http://cpu.rightmark.org/download.shtml
The frequency of the NB/L3/memory controllers in Barcelona Opterons has a fairly small influence on performance. I've tested NB frequencies of 1.8, 2.0, 2.2 and 2.4 GHz, and the best result is a 9% increase in WinRAR/7-Zip.
The L3 latencies are reasonable considering:
- The asynchronous nature of the chip.
- The two clock domains with complex dividers.
- The semi-exclusive nature of the L3, with two narrow 128-bit buses (one in each direction, to and from the L2).
- The high associativity of 32 ways, which also adds latency.
I don't think Shanghai changes much of this. I suppose it will be a die shrink without many novelties. A big step forward would be a fully synchronous Shanghai, but what about power? The current thermal envelopes probably don't allow a design like that. It seems Intel's Nehalem L3 will be asynchronous too, and its latency significant. We'll see. |
|
| Author: | mocad_tom [ Thu Apr 17, 2008 11:08 am ] |
| Post subject: | |
@alavo03 There are already some numbers out on Nehalem's L3 latency: http://realworldtech.com/page.cfm?Artic ... 182719&p=7
Quote: As a result, the load to use latency for Nehalem varies depending on the relative frequency and phase alignment of the cores and the L3 itself and the latency of arbitration for access to the L3. In the best case, i.e. phase aligned operation and frequencies that differ by an integer multiple, Nehalem’s L3 load to use latency is somewhere in the range of 30-40 cycles according to Intel architects.
That is in much the same range as Barcelona's L3. I'm wondering how the small L2 caches will work in that context. I think Intel will face more problems than they expect; Core 2 profits a lot from its big, low-latency L2.
Regards,
Tom |
|
| Author: | alavo03 [ Thu Apr 17, 2008 12:03 pm ] |
| Post subject: | |
I'm aware of the excellent Real World Technologies article, but I mean real measurements, not estimates from the manufacturer. Still, 30-40 cycles in the best case seems quite high considering Intel's expertise in caches. From AMD's Barcelona we can learn a few things about this kind of complex asynchronous design: putting the L3/NB/memory controllers in their own clock domain creates latency problems, aggravated by the fact that the core clock changes according to the performance requirements of the system.
Regards,
Carlos. |
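Carlos's point about the separate NB clock domain can be made concrete: if part of an L3 access costs a fixed number of NB cycles, its cost in core cycles scales with the core/NB clock ratio, on top of some synchronization overhead for the crossing. The sketch below uses a made-up core-side cost, NB-side cost and crossing penalty purely for illustration; none of these figures come from AMD or from the measurements in this thread.

```python
def l3_latency_core_cycles(core_fixed: float, nb_cycles: float,
                           core_ghz: float, nb_ghz: float,
                           sync_penalty_core_cycles: float = 0.0) -> float:
    """Load-to-use latency in core cycles for an L3 in a separate clock domain.

    core_fixed: part of the path clocked by the core (hypothetical)
    nb_cycles: part clocked by the NB/L3 domain, converted by the clock ratio
    sync_penalty_core_cycles: hypothetical cost of crossing the clock domains
    """
    return core_fixed + nb_cycles * (core_ghz / nb_ghz) + sync_penalty_core_cycles


def cycles_to_ns(cycles: float, ghz: float) -> float:
    """Convert a cycle count at the given clock into nanoseconds."""
    return cycles / ghz


if __name__ == "__main__":
    # Hypothetical costs: 15 core cycles, 20 NB cycles, 4-cycle crossing penalty.
    for core, nb in [(2.0, 2.0), (2.0, 1.6), (1.0, 2.0)]:
        cyc = l3_latency_core_cycles(15, 20, core, nb, 4)
        print(f"core {core} GHz / NB {nb} GHz: {cyc:.1f} core cycles, "
              f"{cycles_to_ns(cyc, core):.1f} ns")
```

Lowering the NB clock relative to the core raises the latency in core cycles, the same direction as JumpingJack's CPUID figures earlier in the thread (50 cycles at 2.0/2.0 vs. 55 at 2.0/1.6), although this toy model is not fitted to them. Dropping the core to 1.0 GHz lowers the cycle count but raises the latency in nanoseconds.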
|
| Author: | mocad_tom [ Thu Apr 17, 2008 12:55 pm ] |
| Post subject: | |
>seems quite high considering the expertise in caches from Intel
Even their expertise cannot fight the laws of physics. I think the decision to integrate the L2 so seamlessly ("borderless") into the lower-right corner of the core could be a mistake: it is not possible to move to bigger L2 caches without relocating some pipeline stages. Montreal will have 1 MB of dedicated L2 cache per core instead of 512 KB (Shanghai & Barcelona), and that won't change the look of the core. I'm quite sure that benchmarks like Super PI won't run as fast on Nehalem as on Penryn (on an IPC basis), because of the slower cache hierarchy (small L2, slow L3).
Regards,
Tom |
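Tom's argument (small L2, slower L3) can be framed with the usual average memory access time recurrence, AMAT = t_L1 + m_L1 (t_L2 + m_L2 (t_L3 + m_L3 t_mem)), where t are hit latencies and m are local miss rates. All latencies and miss rates below are hypothetical placeholders chosen only to show the trade-off; they are not Penryn or Nehalem figures.

```python
def amat(levels: list[tuple[float, float]], memory_latency: float) -> float:
    """Average memory access time in cycles.

    levels: (hit_latency_cycles, local_miss_rate) from L1 outwards.
    The recurrence is evaluated from the outermost level inwards.
    """
    t = memory_latency
    for hit, miss in reversed(levels):
        t = hit + miss * t
    return t


if __name__ == "__main__":
    # Hypothetical: big, fast L2 and no L3 vs. small L2 backed by a slower L3.
    big_l2 = amat([(3, 0.05), (15, 0.10)], memory_latency=200)
    small_l2_l3 = amat([(4, 0.05), (10, 0.30), (40, 0.20)], memory_latency=200)
    print(f"big L2: {big_l2:.2f} cycles, small L2 + L3: {small_l2_l3:.2f} cycles")
```

With these made-up numbers the small-L2 design loses for a working set that fits in the big L2; a workload that spills out of the big L2 would change the miss rates and could reverse the result.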
|
| Author: | LiamC [ Wed Apr 23, 2008 6:55 am ] |
| Post subject: | |
Xbit are reporting that AMD's 45nm process is delayed: http://www.xbitlabs.com/news/cpu/displa ... arter.html
Quote: ..."We’ll start the production ramp in the summertime and start to ship products in volume in Q4 2008," said Dirk Meyer, president and chief operating officer at AMD, during a conference call with financial analysts... |
|
| Author: | Wouter Tinus [ Wed Apr 23, 2008 5:43 pm ] |
| Post subject: | |
LiamC wrote: Xbit are reporting that AMD's 45nm process is delayed:
http://www.xbitlabs.com/news/cpu/displa ... arter.html ..."“We’ll start the production ramp in the summertime and start to ship products in volume in Q4 2008,” said Dirk Meyer, president and chief operating officer at AMD during a conference call with financial analysts."...
That is not a delay; it's just one of the first truly upfront statements about the 45nm schedule. Anything you heard from official sources about mid-2008 was cleverly deceptive, meant to make it look like they were less far behind than they really are. They used phrases like "we'll start up the ramp", which are quite meaningless to outsiders and can therefore always be wormed out of if anybody actually calls them on it. Those with goodwill towards AMD (or less experience with their marketing) could interpret those mid-2008 statements as AMD closing the process technology gap with Intel by up to half a year. But the realists already knew it would very likely be the end of the year, especially when word got out that they only got first silicon for Shanghai in Q1. This is a recurring phenomenon, by the way. They did it with 65nm and 45nm, and they'll do it again with 32nm. Look out for it next time: always remember that "in the second half of the year" can mean December 31st without it being a lie ;). |
|
| Author: | Hans de Vries [ Thu Aug 21, 2008 10:01 am ] |
| Post subject: | Re: Time for an update. |
Hans de Vries wrote: Phenom wrote: Yes. As far as the 8-core version is concerned: after having seen Dunnington, I think we may expect another monster monolithic die (~700 mm²) rather than a dual-die package. Regards, Hans
Eight-core Nehalem has a mammoth die indeed... http://download.intel.com/pressroom/kit ... G_0895.JPG
31.5 mm × 21.5 mm ≈ 680 mm²; 83 dies on a 300 mm wafer. No wonder they want 450 mm wafers... No details are visible, unfortunately.
Regards,
Hans |
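Hans's count can be sanity-checked with the common gross-dies-per-wafer approximation, DPW ≈ π(d/2)²/S − πd/√(2S), for wafer diameter d and die area S. It ignores scribe lines, edge exclusion and defects, so it is only a rough estimate.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Approximate whole dies per wafer: wafer area term minus an edge-loss term."""
    d, s = wafer_diameter_mm, die_area_mm2
    dpw = math.pi * (d / 2) ** 2 / s - math.pi * d / math.sqrt(2 * s)
    return math.floor(dpw)


if __name__ == "__main__":
    die = 31.5 * 21.5  # 677.25 mm^2, the "~680 mm^2" eight-core Nehalem
    for wafer in (300, 450):
        print(wafer, "mm wafer:", gross_dies_per_wafer(wafer, die), "dies")
```

This gives about 78 dies on a 300 mm wafer, in the same ballpark as the 83 Hans quotes (the exact count depends on how the die grid is placed), and about 196 on a 450 mm wafer, roughly 2.5× as many.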
|
| Author: | jack [ Thu Aug 21, 2008 10:47 am ] |
| Post subject: | |
Quad-core Nehalem is only 246 mm², so does the 8-core version have a huge L3 cache? |
|
| All times are UTC + 1 hour |
|