>> So Deerfield wasn't 180mm2?
> No. Deerfield was a Madison 6M with 3/4 of its L3 disabled.
(In answer to the actual question: all of these were 130nm.)
I have seen Deerfield cited as 266 mm^2 [1] and 250 million
transistors [can't remember where]. By contrast, Madison 6M
was 374 mm^2 and 410 million, while Madison II 9M was 432
mm^2 and 592 million.
On the other hand, the Itanium II spec update suggests just a
single stepping for Madison.
Mmh. I wonder where that 266 mm^2 number came from then.
no@spam.com wrote:
> Mmh. I wonder where that 266 mm^2 number came from then.
A factual error about IPF on the net? Is that possible? :-P
There was never a unique device for Deerfield. It was a fuse-programmed
variant of Madison 6M binned for operation at lower voltage and clock
frequency than mainstream Madison SKUs.
Back to topic please. This is not an IA64 thread ;-)
Some first-hand info on why AMD is sticking with the old FMA definition:
Quote:
Since we don't control the definition of AVX, all we can say for sure is that we expect our initial products to be compatible with version 5 of the specification (the most recent one, as of this writing, published in January of 2009), except for the FMA instructions, which we expect will be compatible with version 3 (published in August of 2008).
Why the FMA difference? This was not something we did lightly. In December of 2008, Intel made significant changes to the FMA definition, which we found we could not accommodate without unacceptable risk to our product schedules. Yet we did not want to deprive customers of the significant performance benefits of FMA. So we decided to stick with the earlier definition, renaming it FMA4 (for four-operand FMA - Intel's newer definition uses what we believe to be a less capable three-operand, destructive-destination format). It will have a different CPUID feature flag from Intel's FMA extension. At some future point, we will likely adopt Intel's newer FMA definition as well, coexisting with FMA4. But as you might imagine, we may wait until we're sure the specification is stable.
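Since the two FMA flavours are announced by different CPUID feature flags, software has to test for each one separately. Below is a minimal sketch in C of how that decoding might look. The bit positions are assumptions taken from the vendors' published CPUID documentation, not from the post above, and `decode_features` and `struct cpu_features` are hypothetical names used only for illustration. The function works on raw ECX values that the caller has already read, so it stays portable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions (from the vendors' CPUID documentation,
   not from the quoted post). */
#define CPUID_1_ECX_FMA3    (1u << 12)  /* leaf 1: three-operand FMA */
#define CPUID_1_ECX_AVX     (1u << 28)  /* leaf 1: AVX */
#define CPUID_EXT1_ECX_XOP  (1u << 11)  /* leaf 0x80000001: XOP */
#define CPUID_EXT1_ECX_FMA4 (1u << 16)  /* leaf 0x80000001: four-operand FMA */

struct cpu_features {
    bool avx;
    bool fma3;
    bool fma4;
    bool xop;
};

/* Hypothetical helper: decode feature flags from ECX values the caller
   has already obtained with CPUID (leaf 1 and leaf 0x80000001).
   It does not check whether the OS saves the YMM state
   (OSXSAVE/XGETBV), which real dispatch code would also need. */
void decode_features(uint32_t leaf1_ecx, uint32_t ext1_ecx,
                     struct cpu_features *out)
{
    assert(out != NULL);
    out->avx  = (leaf1_ecx & CPUID_1_ECX_AVX) != 0;
    out->fma3 = (leaf1_ecx & CPUID_1_ECX_FMA3) != 0;
    out->fma4 = (ext1_ecx & CPUID_EXT1_ECX_FMA4) != 0;
    out->xop  = (ext1_ecx & CPUID_EXT1_ECX_XOP) != 0;
}
```

A dispatcher would call this once at startup and then pick an FMA4 code path, an FMA3 code path, or a plain multiply-and-add fallback, since a processor may report either flag, both, or neither.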
The conclusion from the blog post Opteron linked is that AMD will support AVX and its own XOP, CVT16, and FMA4 specifications. The only difference between Sandy Bridge and Bulldozer will be that Sandy Bridge will not have FMA at all, and its successor will either have FMA3, or Intel will adopt the more powerful FMA4 format that will be in K11.
They must have been working very hard to implement all that on such a short schedule. I am sure they will implement the 256-bit vector registers as two 128-bit registers, but so will Intel.
> They must have been working very hard to implement all that on such a short schedule. I am sure they will implement the 256-bit vector registers as two 128-bit registers, but so will Intel.
So do your earlier criticisms of Intel still apply?
why do you think Intel will not simply do what they have stated: true 256-bit registers with the peak throughput effectively doubled ?
Because potential 256-bit instructions that cannot be split into two 128-bit instructions are totally missing in the current AVX spec. All shuffle instructions, horizontal add instructions, etc. are constructed so that there is no data communication across the middle of the 256-bit register, even though such instructions would be highly needed.
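To make the "no communication across the middle" point concrete, here is a rough scalar model in C of a 256-bit horizontal add, assuming the in-lane behaviour the AVX spec gives for VHADDPS. The function names `hadd128` and `hadd256` are made up for this sketch. The 256-bit result is nothing more than two independent 128-bit horizontal adds, which is why such an instruction could be split into two 128-bit operations.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of a 128-bit horizontal add of packed floats
   (HADDPS-style): adjacent pairs of a, then adjacent pairs of b. */
void hadd128(const float a[4], const float b[4], float out[4])
{
    out[0] = a[0] + a[1];
    out[1] = a[2] + a[3];
    out[2] = b[0] + b[1];
    out[3] = b[2] + b[3];
}

/* Scalar model of the 256-bit form: the low and high 128-bit halves are
   processed separately, and no element moves from one half to the other.
   out must not alias a or b in this sketch. */
void hadd256(const float a[8], const float b[8], float out[8])
{
    assert(out != NULL && out != a && out != b);
    hadd128(a, b, out);              /* low lane:  elements 0..3 */
    hadd128(a + 4, b + 4, out + 4);  /* high lane: elements 4..7 */
}
```

With a = {1,...,8}, the sums of a's pairs end up in out[0], out[1], out[4] and out[5] rather than in four consecutive slots, so summing all eight elements takes extra steps to combine the two halves.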
Both Intel and AMD did the same trick with the first implementations of 128-bit registers. They were split into two 64-bit registers. It took several years before we got true 128-bit registers and execution units.
I expect to see little or no gain in performance at the execution unit stage, but a moderate improvement at the instruction fetch/decode stage. Instruction fetching and decoding is often a bottleneck on Intel processors, but not on AMD processors. What AMD will gain from supporting 256-bit registers in Bulldozer is probably compatibility more than speed.
We will have to wait several years before we see true 256-bit registers and execution units, but it will also take several years before software that utilizes the 256-bit capabilities becomes mainstream.
I don't know whether memory access will have a throughput of 128 or 256 bits per clock, though.
> Because potential 256-bit instructions that cannot be split into two 128-bit instructions are totally missing in the current AVX spec.
It's just like *all* SSEx instructions: they can be split into 2 x 64-bit halves, yet they have true 128-bit support in Conroe/Penryn/Nehalem/Westmere. If the width was doubled from the P4 four cores ago (2 "tocks" ago), it's not difficult to trust Intel when they say it will be doubled again in 32nm Sandy Bridge, just as stated in the IDF slides. The peak throughput will be doubled, and I'd say the effective throughput will be roughly 1.5x-1.8x for 100% vectorized kernels.
[edit]
reference : IDF Spring 2008 slides in "SP_NGMS002_100r_eng.pdf", see p. 9, 3 last lines :
"
Intel®AVX targets a high-performance first implementation
-256-bit Multiply, Add and Shuffle engines (2X today)
-2nd load port
"
Post subject: Re: AMD to support all Intel instructions [edited]
Posted: Fri May 08, 2009 7:34 pm
Eric Bron wrote:
>> Agner wrote:
>> Because potential 256-bit instructions that cannot be split into two 128-bit instructions are totally missing in the current AVX spec.
> It's just like *all* SSEx instructions: they can be split into 2 x 64-bit halves, yet they have true 128-bit support in Conroe/Penryn/Nehalem/Westmere. If the width was doubled from the P4 four cores ago (2 "tocks" ago), it's not difficult to trust Intel when they say it will be doubled again in 32nm Sandy Bridge, just as stated in the IDF slides. The peak throughput will be doubled, and I'd say the effective throughput will be roughly 1.5x-1.8x for 100% vectorized kernels.
> [edit] reference : IDF Spring 2008 slides in "SP_NGMS002_100r_eng.pdf", see p. 9, 3 last lines :
> " Intel®AVX targets a high-performance first implementation -256-bit Multiply, Add and Shuffle engines (2X today) -2nd load port "
Do you have a link to this file you can share? It's quite annoying when Intel takes down their IDF archives.