Aceshardware

(not so) temporary home for the aceshardware community
It is currently Thu Dec 17, 2009 8:47 am

All times are UTC + 1 hour





Post subject: Shared FP unit
Posted: Thu Nov 12, 2009 1:42 pm
Joined: Sat Jul 25, 2009 2:08 pm
Posts: 48
"Never say never".
During Analysts Day AMD confirmed that Bulldozer has an independent FP unit shared between two Int cores.
For the last year AMD assured us that the independent cores of Shanghai/Istanbul are a much better approach than the Hyperthreading of Intel's Nehalem.
A shared FP unit can save die space, but at the same time it adds logic complexity and a potential bottleneck (for very FP-intensive programs). It's also a new task for the OS dispatcher.
I see it as a step back vs. the current independent Int & FP cores. How much space do you think AMD gains vs. dedicated FP units? Is the gain worth the cost?


Post subject: Re: Shared FP unit
Posted: Thu Nov 12, 2009 3:31 pm
Joined: Tue Jul 31, 2007 1:25 pm
Posts: 200
Aren't they also going to a 256 bit FPU?

Could they be planning siamesed 128 bit FPUs? The two FPUs could be "independent" for 128 bit operations and shared for 256 bit.

Bulldozer is supposed to be very modular. A later version could have 4 of the FPU blocks (for dual 256 bit FPUs).


Post subject: Re: Shared FP unit
Posted: Thu Nov 12, 2009 5:00 pm
Joined: Wed Jun 27, 2007 1:38 pm
Posts: 479
Their terminology is really confusing. Simply put, a single core contains two integer schedulers and one FP scheduler. I don't see how this is "sharing" an FP unit; after all, every core will have a single FP scheduler with two 128-bit FPUs.


Post subject: Re: Shared FP unit
Posted: Thu Nov 12, 2009 7:55 pm
Joined: Tue Aug 18, 2009 5:57 pm
Posts: 30
I certainly hope they are not going to do a Niagara thing. Perhaps AMD is, as said above, just taking a leaf from NVIDIA's playbook and calling an ALU a "core".


Post subject: Re: Shared FP unit
Posted: Thu Nov 12, 2009 8:05 pm
Joined: Tue Sep 04, 2007 4:40 pm
Posts: 126
jack wrote:
Their terminology is really confusing. Simply put, a single core contains two integer schedulers and one FP scheduler. I don't see how this is "sharing" an FP unit; after all, every core will have a single FP scheduler with two 128-bit FPUs.

No it's not. You'll have 4 modules, each having 2 integer mini-cores (clusters) and one shared SIMD unit (shared à la SMT, but with the ability to be dynamically assigned to each cluster in various combinations: 1x256b, 2x128b, etc., depending on the workload). Those 4 modules will in essence be seen by the OS as 8 physical cores (which they actually are), and that is how they will be marketed/branded: X8 for the 4-module version and X4 for the 2-module version. Not to mention there is room for growth by simply adding Bulldozer modules to the design, so >8-core versions are quite possible.
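
To put numbers on that layout, here is a rough Python sketch. It only encodes what is claimed in this post (2 integer clusters and one shared 256-bit FP/SIMD unit per module). The names `Module`, `fp_width_per_cluster` and `cores_seen_by_os`, and the even-split rule, are illustrative assumptions, not AMD's actual arbitration logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """One Bulldozer-style module as described in the post (assumed layout)."""
    int_clusters: int = 2      # integer "mini-cores" per module
    fp_width_bits: int = 256   # shared FP/SIMD unit, i.e. two 128-bit halves

    def fp_width_per_cluster(self, clusters_using_fp: int) -> int:
        """FP width each cluster gets if the shared unit is split evenly.

        Assumption: one active cluster gets 1x256b, two get 128b each.
        """
        if not 1 <= clusters_using_fp <= self.int_clusters:
            raise ValueError("clusters_using_fp out of range")
        return self.fp_width_bits // clusters_using_fp


def cores_seen_by_os(modules: int, module: Module = Module()) -> int:
    """Core count if every integer cluster is counted as a core."""
    return modules * module.int_clusters
```

With these assumptions, `cores_seen_by_os(4)` gives the "X8" part and `cores_seen_by_os(2)` the "X4" part, while `Module().fp_width_per_cluster(2)` shows the 128-bit share each cluster is left with when both are doing FP work.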


Post subject: Re: Shared FP unit
Posted: Thu Nov 12, 2009 9:17 pm
Joined: Wed Jun 27, 2007 1:38 pm
Posts: 479
inf64 wrote:
jack wrote:
Their terminology is really confusing. Simply put, a single core contains two integer schedulers and one FP scheduler. I don't see how this is "sharing" an FP unit; after all, every core will have a single FP scheduler with two 128-bit FPUs.

No it's not. You'll have 4 modules, each having 2 integer mini-cores (clusters) and one shared SIMD unit (shared à la SMT, but with the ability to be dynamically assigned to each cluster in various combinations: 1x256b, 2x128b, etc., depending on the workload). Those 4 modules will in essence be seen by the OS as 8 physical cores (which they actually are), and that is how they will be marketed/branded: X8 for the 4-module version and X4 for the 2-module version. Not to mention there is room for growth by simply adding Bulldozer modules to the design, so >8-core versions are quite possible.


In my opinion it's very confusing to redefine the meaning of "core". When talking about CPUs, a core is much more than a collection of ALUs.

I really hope that Bulldozer will have 8 real (FP) cores. After all, it's a high end CPU to be released in 2011.

Quote:
I certainly hope they are not going to do a Niagara thing. Perhaps AMD is, as said above, just taking a leaf from NVIDIA's playbook and calling an ALU a "core".


Actually AMD itself has taken this much further: they call each 32-bit component of an ALU a "stream processor". For example, the newest Radeon chip has 320 ALUs that can each operate on a 4-component vector plus a scalar, and AMD markets it as having 1600 stream processors.
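
The marketing arithmetic can be checked in a couple of lines of Python. The 4 + 1 lane split comes straight from the description above; the function name `stream_processors` is made up for the example.

```python
def stream_processors(alus: int, vector_lanes: int = 4, scalar_lanes: int = 1) -> int:
    """Count every 32-bit lane of every ALU as one 'stream processor'."""
    return alus * (vector_lanes + scalar_lanes)


# 320 ALUs, each with a 4-component vector part plus one scalar part
print(stream_processors(320))
```

So the same silicon is "320" or "1600" depending only on whether you count ALUs or lanes.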


Post subject: Re: Shared FP unit
Posted: Thu Nov 12, 2009 11:04 pm
Joined: Tue Jul 31, 2007 1:25 pm
Posts: 200
It's almost starting to sound like an x86 spin on the Itanium (wide core with instruction bundles). 8 execution units sound familiar to anyone else? Did AMD's recent licensing agreement extend to the Itanium instruction set?

Quote:
A single Bulldozer "module" looks to the OS like a single processor core with simultaneous multithreading (SMT) enabled, which makes sense, because that's essentially what it is. But unlike a normal SMT core, instructions from each thread are dispatched, tracked throughout the execution process, and retired by a dedicated instruction window. And when instructions from one thread retire, they write their results out to a dedicated data cache (so each module has two d-caches).

AMD has not said how many instructions per cycle the front-end can dispatch, but it can't be less than four, and it may be as high as six or eight, depending on the amount of decode hardware.

As you can see in the diagram above, there are two integer schedulers, each of which feeds four pipelines: two integer pipes and two memory pipes (load and store). Right now, AMD is referring to each integer scheduler and the pipelines associated with it as a "core," making each Bulldozer module "dual-core." I think this terminology is a huge mistake, and I hope AMD rethinks it. It's probably better to call each back-end an "execution core"—a term that I actually use in my book—in contrast to a "processor core" or just a "core," which is the front-end and everything behind it.

http://arstechnica.com/hardware/news/20 ... mpaign=rss


Post subject: Re: Shared FP unit
Posted: Fri Nov 13, 2009 4:38 am
Joined: Sat Mar 22, 2008 5:10 pm
Posts: 224
jack wrote:
Actually AMD itself has taken this much further: they call each 32-bit component of an ALU a "stream processor". For example, the newest Radeon chip has 320 ALUs that can each operate on a 4-component vector plus a scalar, and AMD markets it as having 1600 stream processors.

Do you know what an ALU is? Under any plausible definition of it, there is no way to arrive at 320 for Cypress...


Post subject: Re: Shared FP unit
Posted: Fri Nov 13, 2009 5:13 am
Joined: Sat Jul 25, 2009 2:08 pm
Posts: 48
inf64 wrote:
No it's not. You'll have 4 modules, each having 2 integer mini-cores (clusters) and one shared SIMD unit (shared à la SMT, but with the ability to be dynamically assigned to each cluster in various combinations: 1x256b, 2x128b, etc., depending on the workload).

What do you mean by "mini-core"? Are they simpler than in K10?


Post subject: Re: Shared FP unit
Posted: Fri Nov 13, 2009 8:32 am
Joined: Wed Jun 27, 2007 1:38 pm
Posts: 479
yfe wrote:
inf64 wrote:
No it's not. You'll have 4 modules, each having 2 integer mini-cores (clusters) and one shared SIMD unit (shared à la SMT, but with the ability to be dynamically assigned to each cluster in various combinations: 1x256b, 2x128b, etc., depending on the workload).

What do you mean by "mini-core"? Are they simpler than in K10?


Yes. A K10 core has 3 integer ALUs, while a Bulldozer "cluster" has 2 integer ALUs (a Bulldozer core has 4 integer ALUs in total).


Post subject: Re: Shared FP unit
Posted: Fri Nov 13, 2009 9:56 am
Joined: Thu Mar 13, 2008 10:53 am
Posts: 15
The initial K8 design had two "asymmetrical" load/store units attached to the dL1 cache, three similar complex INT pipes and an FP/SIMD box. K10 changed that very generalized configuration by side-strapping a second 64-bit SIMD unit to the existing 80-bit FPU, doubling its 128-bit SSE throughput, but technically it was still a 2*64-bit pipe.
Now Bulldozer seems to streamline that whole architecture by doubling the LS capacity (one LS couple per cluster/core), sawing off the rarely utilized third INT pipe and fusing two native 128-bit FMAC/SIMD units in between the two new "slimmed" INT boxes. There are many more questions about the details, like the cache structure, hierarchy and sizes, etc.


Post subject: Re: Shared FP unit
Posted: Fri Nov 13, 2009 1:23 pm
Joined: Sat Jul 25, 2009 2:08 pm
Posts: 48
jack wrote:
Yes. A K10 core has 3 integer ALUs, while a Bulldozer "cluster" has 2 integer ALUs (a Bulldozer core has 4 integer ALUs in total).

Where did you get that from? Aren't you mixing it up with Bobcat?
The Bulldozer picture shows only four pipelines per core, without any details.


Post subject: Re: Shared FP unit
Posted: Fri Nov 13, 2009 1:57 pm
Joined: Sat Jul 25, 2009 2:08 pm
Posts: 48
BTW, this is just my guess, but is it possible that AMD has devised a new thread dispatching scheme?
I'll explain. Today, for Nehalem and Opteron, the OS is well aware of cores and per-core threading. Is it possible that for Bulldozer the OS will NOT be aware of it? For Bulldozer, the OS would only know how many threads it can execute simultaneously. Dispatching threads inside the CPU would be done by Bulldozer's dispatcher, i.e. by the hardware itself!
In that case Bulldozer's dispatcher could move threads from one module to another based on their on-the-fly execution behavior, including FP execution! The best scenario is when an "INT thread" sits next to an "FP thread" in one module.
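
As a toy illustration of that "INT thread next to FP thread" idea, here is a minimal Python sketch. It is purely hypothetical: nothing here is a known Bulldozer or OS mechanism, and `pair_threads` and the per-thread "FP share" numbers are invented for the example. It simply pairs the least FP-heavy thread with the most FP-heavy one, so each module's shared FP unit sees as little contention as possible.

```python
from typing import Optional


def pair_threads(fp_share: dict[str, float]) -> list[tuple[str, Optional[str]]]:
    """Pair threads two per module, mixing low and high FP usage.

    fp_share maps a thread name to the (assumed, measured elsewhere)
    fraction of its instructions that are FP/SIMD.
    """
    ordered = sorted(fp_share, key=fp_share.get)  # least FP-heavy first
    pairs: list[tuple[str, Optional[str]]] = []
    lo, hi = 0, len(ordered) - 1
    while lo < hi:
        # lightest remaining FP user shares a module with the heaviest
        pairs.append((ordered[lo], ordered[hi]))
        lo += 1
        hi -= 1
    if lo == hi:
        # odd thread count: the middle thread gets a module to itself
        pairs.append((ordered[lo], None))
    return pairs
```

For example, `pair_threads({"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.0})` puts "d" with "a" and "b" with "c". Whether such placement would live in hardware or in the OS scheduler is exactly the open question here.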


Post subject: Re: Shared FP unit
Posted: Fri Nov 13, 2009 2:48 pm
Joined: Fri Aug 17, 2007 2:55 pm
Posts: 357
Quote:
Is it possible that for Bulldozer the OS will NOT be aware of it?


Bulldozer will look like a single CPU with SMT to the OS, so not possible.


Post subject: Re: Shared FP unit
Posted: Fri Nov 13, 2009 2:50 pm
Joined: Fri Aug 17, 2007 2:55 pm
Posts: 357
yfe wrote:
jack wrote:
Yes. A K10 core has 3 integer ALUs, while a Bulldozer "cluster" has 2 integer ALUs (a Bulldozer core has 4 integer ALUs in total).

Where did you get that from? Aren't you mixing it up with Bobcat?
The Bulldozer picture shows only four pipelines per core, without any details.


Ars claims 4 pipes = 2 ALU + load + store

