Aceshardware

(not so) temporary home for the aceshardware community

All times are UTC + 1 hour



Posted: Sun Mar 16, 2008 8:42 pm

Joined: Sun Mar 16, 2008 3:20 pm
Posts: 82
P4man wrote:
Are you telling me there are any non-geeks who have any idea what SSEx is, let alone who buy a CPU to get a faster "PMSADBW"? Furthermore, what law says SSE5 must be 100% binary backwards compatible with SSE4/3/2/1? I'm not a fan of this naming mess either, but let's not be ridiculous here.
I don't get his problem either ... Intel is no better. Does anyone remember the "Supplemental Streaming SIMD Extensions 3", called SSSE3? In the first reports these instructions were called SSE4, then suddenly they were renamed SSSE3, and a different extension became SSE4. Or the Pentium 4: the first batch for Socket 423 was not really faster than a high-clocked Pentium III. Or the first DDR2-400 memory, which was slower than DDR-400 despite the "2". Naming confusion has always been common in the computer business, still is, and will stay that way ...

Not good for the customer, but one gets used to it.

Edit:
Thx @Hans for the link, 2004 seems sufficient to have "enough" support, but as you wrote, we do not know ;-)

cheers

Opteron

P.S.: The PCGH article also states that AMD would like to include the SSE4.1 instructions, but says that this depends solely on Intel.


Last edited by Opteron on Sun Mar 16, 2008 8:47 pm, edited 4 times in total.

Post subject: Re: Time for an update.
Posted: Sun Mar 16, 2008 8:43 pm

Joined: Thu Sep 20, 2007 10:47 am
Posts: 130
Gabriele Svelto wrote:
Michael Westman wrote:
Excellent work Hans! What strikes me is the L3 cache density. Why is the AMD L3 cache about the same as L2? It's not like it's very fast, if you think of Barcelona...?

I seem to remember that they basically reused the cells designed for the L2 and yeah, it's not very fast.


Barcelona uses the same SRAM cells for L2 and L3. You have to remember that AMD is not nearly as aggressive on cache design as Intel, and they have to contend with hysteresis from SOI.

Since they are using the same cells for L2 and L3, I would expect that there is timing slack in the L3 design which is traded away to lower the power draw.

DK


Posted: Sun Mar 16, 2008 9:00 pm

Joined: Sat Sep 01, 2007 8:01 am
Posts: 650
Opteron wrote:
P4man wrote:
Are you telling me there are any non-geeks who have any idea what SSEx is, let alone who buy a CPU to get a faster "PMSADBW"? Furthermore, what law says SSE5 must be 100% binary backwards compatible with SSE4/3/2/1? I'm not a fan of this naming mess either, but let's not be ridiculous here.
I don't get his problem either ... Intel is no better. Does anyone remember the "Supplemental Streaming SIMD Extensions 3", called SSSE3? In the first reports these instructions were called SSE4, then suddenly they were renamed SSSE3, and a different extension became SSE4. Or the Pentium 4: the first batch for Socket 423 was not really faster than a high-clocked Pentium III. Or the first DDR2-400 memory, which was slower than DDR-400 despite the "2". Naming confusion has always been common in the computer business, still is, and will stay that way ...

Not good for the customer, but one gets used to it.

Edit:
Thx @Hans for the link, 2004 seems sufficient to have "enough" support, but as you wrote, we do not know ;-)

cheers

Opteron

P.S.: The PCGH article also states that AMD would like to include the SSE4.1 instructions, but says that this depends solely on Intel.


You are trying to mix everything together, a very effective FUD technique.
Back to the point: AMD's SSE5 does not include SSE4. This is misleading.

who?
again, this is my own opinion.


Posted: Sun Mar 16, 2008 9:06 pm

Joined: Sun Mar 16, 2008 3:20 pm
Posts: 82
who? wrote:
Back to the point: AMD's SSE5 does not include SSE4. This is misleading.
Huh ... now you are mixing things up. Each SSEx standard denotes just a handful of instructions; nobody says that it has to include its predecessors. Most CPUs do support the earlier sets as well, so please do not confuse processors that support several instruction sets with the name of a single instruction subset.

cheers

Opteron


Posted: Sun Mar 16, 2008 9:09 pm

Joined: Sat Sep 01, 2007 8:01 am
Posts: 650
Opteron wrote:
who? wrote:
Back to the point: AMD's SSE5 does not include SSE4. This is misleading.
Huh ... now you are mixing things up. Each SSEx standard denotes just a handful of instructions; nobody says that it has to include its predecessors. Most CPUs do support the earlier sets as well, so please do not confuse processors that support several instruction sets with the name of a single instruction subset.

cheers

Opteron


As part of the group of people who designed SSE through SSE4, I am telling you, this is the golden rule: you have to include the previous instruction set before moving to the next level.

Will you tell me you are more qualified than the people who named it and did the work?

who?/ Francois
This is my own opinion, my employer is not responsible for this posting.


Posted: Sun Mar 16, 2008 9:48 pm

Joined: Tue Aug 07, 2007 11:57 am
Posts: 169
Since it looks more and more that the previously leaked Gesher details
were indeed for the processor now codenamed Nehalem we actually may
have some numbers for the cache access times of Nehalem.

L1: 3 cycles.
L2: 9 cycles.
L3: 33 cycles.

The sheet lists the cache per core as L1=32kB, L2=512kB and L3=2-3 MB.
The Nehalem numbers are L1=32kB, L2=256kB and L3=2MB, with the L3
shared and the L1 possibly split into two 16kB halves per thread in SMT mode.
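
To put the three latencies above into perspective, here is a rough sketch of an average access time calculation. Only the 3/9/33 cycle figures come from the leaked sheet; the hit rates, the memory latency and all names in the code are hypothetical placeholders, so treat the result as an illustration of the arithmetic and nothing more.

```python
def average_access_cycles(levels, memory_cycles):
    """Expected load latency in cycles.

    levels: list of (latency_cycles, hit_rate) from L1 outward, where
    latency is the total latency of a hit at that level and hit_rate is
    the local hit rate of that level.
    """
    expected = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for latency, hit_rate in levels:
        expected += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    # Whatever misses every cache level pays the memory latency.
    return expected + reach * memory_cycles


LEAKED_LATENCIES = [3, 9, 33]                # cycles, from the sheet
HYPOTHETICAL_HIT_RATES = [0.95, 0.80, 0.50]  # assumption, not measured
HYPOTHETICAL_MEMORY_CYCLES = 200             # assumption, not measured

levels = list(zip(LEAKED_LATENCIES, HYPOTHETICAL_HIT_RATES))
print(average_access_cycles(levels, HYPOTHETICAL_MEMORY_CYCLES))
```

With other hit rates the balance between the levels shifts quickly, which is why the L3 figure matters mostly for workloads that spill out of the 256kB L2.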

Image


Regards, Hans


Posted: Sun Mar 16, 2008 10:48 pm

Joined: Fri Aug 17, 2007 2:55 pm
Posts: 346
Quote:
Since it looks more and more that the previously leaked Gesher details were indeed for the processor now codenamed Nehalem we actually may have some numbers for the cache access times of Nehalem.


There is at least one big thing on the Gesher slide that says that it is not Nehalem: 7FP/cycle using SSEx. Does Nehalem have a ring internal bus?


Posted: Sun Mar 16, 2008 11:05 pm

Joined: Tue Aug 07, 2007 11:57 am
Posts: 169
TacoBell wrote:
Quote:
Since it looks more and more that the previously leaked Gesher details were indeed for the processor now codenamed Nehalem we actually may have some numbers for the cache access times of Nehalem.


There is at least one big thing on the Gesher slide that says that it is not Nehalem: 7FP/cycle using SSEx. Does Nehalem have a ring internal bus?


The SSE units on Nehalem are redesigned. They are not the same as
on Penryn. I would expect accumulate extensions to the multiplier, like
in SSE5, that alone would bring the number of double precision FP ops to 6.

The reason to bring the FP accu inside the multiplier is that you then
can do effectively single cycle FP adds (instead of 4 to 5 cycles) using
a few tricks. The whole idea is that one can start a new MAC every cycle,
accumulating all products together. I designed FP hardware like that
already 15 years or so ago.

The ring unit would be the L3 intercommunication: 4 x 64B for 4 cores
each hopping from one read/write buffer section to the other.
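
A rough sketch of the accumulate argument, as a toy latency model. The 4-cycle add latency is taken from the "4 to 5 cycles" above and the single-cycle accumulate step is the claimed benefit; the function names are made up for illustration, and the model ignores multiplier pipeline fill and everything else a real core does.

```python
def cycles_separate_add(n_products, add_latency=4):
    """Cycles spent in the accumulation chain with a separate FP adder.

    Every add needs the previous sum, so the add latencies serialize.
    """
    return n_products * add_latency


def cycles_fused_accumulate(n_products, accumulate_latency=1):
    """Same chain when the accumulator inside the multiplier takes one
    cycle per step, so a new MAC can start every cycle."""
    return n_products * accumulate_latency


def dot_product_mac(xs, ys):
    """Functional view of the MAC loop: acc <- acc + x * y per element."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc


print(cycles_separate_add(100), cycles_fused_accumulate(100))
print(dot_product_mac([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

In software the usual workaround for the serialized chain is to keep several independent partial sums in flight; an accumulator in the multiplier would make that unnecessary for a single running sum.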


Regards, Hans


Last edited by Hans de Vries on Mon Mar 17, 2008 12:29 am, edited 1 time in total.

Posted: Sun Mar 16, 2008 11:34 pm

Joined: Sun Mar 16, 2008 3:20 pm
Posts: 82
who? wrote:
Will you tell me you are more qualified than the people who named it and did the work?
Sure I will, because you are top professionals in processor design and instruction set implementation, but not in naming, semantics or technical writing ;-)

Just have a look at that official intel illustration:
Image
source: ftp://download.intel.com/technology/arc ... -paper.pdf

Here we can see all the new instructions added to the Intel ISA since the Pentium era.

The first thing that catches the eye is that it says "Intel processor ... additions", so the context of these extensions is Intel CPUs.

Second thing: the instructions are counted independently of the predecessors' additions.

Then there are text passages like these:
1.
Quote:
Intel will also introduce new sets of instructions designed to optimize the performance and lower the power needs of a broad range
of existing and new applications. To effectively get the benefit of these new instruction, existing applications will need to be recompiled with
....
"New set of instructions" does not sound like these instructions are dependent on any other instructions. They are new, that's it ...

2.
Quote:
SSE4 is Intel’s largest ISA extension in terms of scope and
impact since SSE2.
So it is an "ISA" extension. What is the ISA? I interpret ISA as "standard IA-32" or Intel 64 (formerly known as IA-32e, not to be confused with IA-64). Your argument would imply that "ISA" always means the latest and greatest, with all the previous SSEx bells and whistles ...


All in all, the only conclusion one can draw from the picture above is that Intel supports all SSEx instructions on its own processors. That any other company does the same with its processors is not a universally valid assumption. All SSE instructions are just independent ISA additions; one can implement all of them, some of them, or none ... anything is possible ...

Your golden rule is nice, makes sense, and I naturally believe you too, but as long as it is not written anywhere that *any* CPU has to implement all SSEx instructions to be "officially" compatible with the SSE standard (if there is one), its area of influence is quite limited.

The whole thing boils down to a matter of interpretation, or in other words, nitpicking. Quite annoying ... let's agree that several views of a text or a problem are always possible, and let's stop the discussion at this point. I prefer technical arguments to nitpicking ;-)

For example Hans' findings / speculations on Nehalem's FP Units are much more interesting ;-)

cheers

Opteron


Posted: Mon Mar 17, 2008 1:06 am

Joined: Fri Aug 17, 2007 2:55 pm
Posts: 346
Hans de Vries wrote:
TacoBell wrote:
Quote:
Since it looks more and more that the previously leaked Gesher details were indeed for the processor now codenamed Nehalem we actually may have some numbers for the cache access times of Nehalem.


There is at least one big thing on the Gesher slide that says that it is not Nehalem: 7FP/cycle using SSEx. Does Nehalem have a ring internal bus?


The SSE units on Nehalem are redesigned. They are not the same as
on Penryn. I would expect accumulate extensions to the multiplier, like
in SSE5, that alone would bring the number of double precision FP ops to 6.

The reason to bring the FP accu inside the multiplier is that you then
can do effectively single cycle FP adds (instead of 4 to 5 cycles) using
a few tricks. The whole idea is that one can start a new MAC every cycle,
accumulating all products together. I designed FP hardware like that
already 15 years or so ago.

The ring unit would be the L3 intercommunication: 4 x 64B for 4 cores
each hopping from one read/write buffer section to the other.


Regards, Hans


I'm still skeptical that Nehalem will do >4 FP/cycle. There are tons of presentations from mid 2007 that all show Gesher distinct from Nehalem.


Posted: Mon Mar 17, 2008 7:29 am

Joined: Wed Aug 15, 2007 3:06 am
Posts: 46
There is a PCWatch article that says that the 7 DP FP/cycle will be present in the generation after Nehalem.


Posted: Mon Mar 17, 2008 7:38 am

Joined: Fri Aug 31, 2007 10:08 pm
Posts: 197
Location: Switzerland
Opteron wrote:

Just have a look at that official intel illustration:
Image
source: ftp://download.intel.com/technology/arc ... -paper.pdf

Here we can see all the new instructions added to the Intel ISA since the Pentium era.



In the Pentium era things were not so simple. I can remember the Pentium MMX and the Pentium Pro being available in the same period, with the PPro lacking MMX and the PMMX lacking CMOV and other instructions new in the PPro.

Talking about SSEx, these are indeed clearly segregated (i.e. SSEn is not included in SSEn+1). It's like that in the CPUID feature flags and in all official Intel technical documentation I'm aware of. In fact, it's quite common for developers to say things like "I use SSE, SSE2 and SSE4.1" (i.e. neither SSE3 nor SSSE3).

If who? were right, the feature flags would be replaced by a simple version number and things would be way simpler for developers.
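
To illustrate the feature-flag point, a minimal sketch that decodes the SSE-family bits of CPUID leaf 1 from raw ECX/EDX values. The bit positions are the documented ones; the function names and the example register values are made up for illustration, and the code only interprets numbers handed to it, it does not execute CPUID itself.

```python
# Bit positions in CPUID leaf 1 (EAX=1). Each extension has its own
# flag; no flag implies any other.
EDX_BITS = {"SSE": 25, "SSE2": 26}
ECX_BITS = {"SSE3": 0, "SSSE3": 9, "SSE4.1": 19, "SSE4.2": 20}


def decode_sse_flags(ecx, edx):
    """Return a dict telling which SSE-family extensions are advertised."""
    flags = {name: bool((edx >> bit) & 1) for name, bit in EDX_BITS.items()}
    flags.update({name: bool((ecx >> bit) & 1) for name, bit in ECX_BITS.items()})
    return flags


def supports(flags, required):
    """True only if every extension the code path needs is present."""
    return all(flags[name] for name in required)


# Hypothetical register values: SSE, SSE2 and SSE3 set, the rest clear.
example_edx = (1 << 25) | (1 << 26)
example_ecx = 1 << 0
flags = decode_sse_flags(example_ecx, example_edx)

print(flags)
print(supports(flags, ["SSE", "SSE2"]))
print(supports(flags, ["SSE", "SSE2", "SSE4.1"]))
```

A code path has to test exactly the flags it uses, because no single "SSE level" number exists that it could compare against.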


Posted: Mon Mar 17, 2008 8:31 am

Joined: Tue Oct 09, 2007 5:43 pm
Posts: 17
Eric Bron wrote:
Opteron wrote:

Just have a look at that official intel illustration:
Image
source: ftp://download.intel.com/technology/arc ... -paper.pdf

Here we can see all the new instructions added to the Intel ISA since the Pentium era.



In the Pentium era things were not so simple. I can remember the Pentium MMX and the Pentium Pro being available in the same period, with the PPro lacking MMX and the PMMX lacking CMOV and other instructions new in the PPro.

Talking about SSEx, these are indeed clearly segregated (i.e. SSEn is not included in SSEn+1). It's like that in the CPUID feature flags and in all official Intel technical documentation I'm aware of. In fact, it's quite common for developers to say things like "I use SSE, SSE2 and SSE4.1" (i.e. neither SSE3 nor SSSE3).

If who? were right, the feature flags would be replaced by a simple version number and things would be way simpler for developers.


And things are even less simple now: SSE4a is a subset of SSE4, and SSE5 is AMD's new instruction set that adds better profiling support and 3-operand instructions. AMD will call it SSE5 and it may or may not be supported by Intel.


Posted: Mon Mar 17, 2008 12:28 pm

Joined: Fri Sep 07, 2007 1:01 pm
Posts: 18
shank15217 wrote:
SSE4a is a subset of SSE4


Worse: AMD SSE4a doesn't intersect with Intel SSE4 at all.

BTW, I also don't understand why Intel split SSE4 into SSE4.1 and SSE4.2 ...


Posted: Mon Mar 17, 2008 12:43 pm

Joined: Sat Sep 01, 2007 4:11 pm
Posts: 170
Isn't x86 becoming incredibly convoluted now? Intel and AMD are battling to inject the most instructions, and Larrabee is coming with a subset of x86 instructions. I guess Atom will only support a subset too, plus the various AMD subset implementations. And all this to get data-parallel performance boosts that are just as likely to eventually come from circuitry other than general-purpose cores.

I wouldn't be surprised if either AMD or Intel came out with chips codenamed Nimrod or Tower.

