Aceshardware Forum
(not so) temporary home for the aceshardware community

45nm Phenom samples in the wild
slacker

Joined: 02 Nov 2007
Posts: 26

Posted: Fri Jul 18, 2008 8:51 pm

Paul DeMone wrote:

2) The Montecito L3 consists of two distinct and separate halves,
each half servicing only one core with its 1.25 MB of L2 reducing
traffic to each half of the L3. The Shanghai uses a single unified
L3 servicing traffic from four cores with only a 0.5 MB L2 reducing
traffic between each core and the L3. Activity factor and power
dissipation? well gosh!

Do you have an activity factor comparison which accounts for the 2048 data bits transferred in 5 clocks on Montecito for an interleaved read/write vs. the 128 (256?) bits transferred on each clock for Barcelona? It looks to me as though the cache transistor activity factor for Montecito would be higher.

Paul DeMone wrote:
dkanter wrote:

The main objective of the cell shrink was to use as little man power as possible. IIRC, the shrink only took around 10 man years to complete at the cost of suboptimal area and power scaling.


That would be a change in strategy vs the 90 nm to 65 nm shrink.
IBM took the opportunity to use the device performance gain and
more or less fixed target frequency to replace some control logic
in the SPEs that was implemented with dynamic circuitry in 90 nm
to meet the cycle time goal with static circuitry in 65 nm to reduce
power.

No one but no one dicks around changing working control logic let
alone time critical circuit topologies when you are trying to use "as
little manpower as possible". :roll:


Cell Processor: 3 Years, 3 Process Generations
  • "Quick shrink – choosing between smaller, faster or lower power – IBM opted for lower power."

  • "Power consumption of the 45 nm CELL processor is less than forty-percent that of the 90 nm CELL processor – now less than 20 watts."

  • "No High-K gate-oxide or Metal-Gate electrode in the 45 nm SOI process used by the 45 nm CELL processor."

  • "IBM converted dynamic circuits used in the 90 nm and 65 nm designs to fully static circuits."

  • "Re-optimized signal paths removed the requirement for high-speed, high-slew-rate drivers, the high-speed, high-slew-rate drivers are then replaced with slower and lower power drivers."

  • "In the 45 nm CELL processor presentation, IBM was rightfully proud to boast that it was able to dramatically lower power consumption of the processor with minimal design resources – estimated at (low) ten’s of man-years of design effort spent in the successful port of a highly complex modern processor from one process technology to another process technology."

Paul DeMone

Joined: 29 Aug 2007
Posts: 376
Location: Great white north

Posted: Sat Jul 19, 2008 2:16 am

slacker wrote:

Do you have an activity factor comparison which accounts for the 2048 data bits transferred in 5 clocks on Montecito for an interleaved read/write vs. the 128 (256?) bits transferred on each clock for Barcelona?


Huh? The L3 will need to service a certain number of read and write
transfers per X million instructions executed. A 2x wider data path
needs to be active for only half as many cycles to accomplish this.

no@spam.com

Joined: 07 Oct 2007
Posts: 53

Posted: Sat Jul 19, 2008 6:36 am

> Huh? The L3 will need to service a certain number of read and write
> transfers per X million instructions executed. A 2x wider data path
> needs to be active for only half as many cycles to accomplish this.

Only if the locality of those accesses trends towards "streaming" -- if it
does not, then the 128-bit-per-1-cycle bus will beat the 2,048-bits-per-
5-cycles bus, both on performance and power, at some point.
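
To put rough numbers on the locality argument, here is a minimal sketch in Python. It only uses the two bus figures quoted in this thread (2,048 bits per 5 clocks, 128 bits per clock). The "useful fraction" parameter is an assumption introduced for illustration: it stands for the share of each wide transfer that the core actually ends up consuming, and the narrow bus is assumed to waste nothing. It models throughput only, not power, although bits that are moved and never used still cost switching energy.

```python
def useful_bits_per_clock(bus_bits, clocks, useful_fraction):
    """Bits per clock that the requester actually consumes."""
    return bus_bits * useful_fraction / clocks


def breakeven_useful_fraction(wide_bits, wide_clocks, narrow_bits, narrow_clocks=1):
    """Useful fraction at which the wide bus merely matches the narrow one."""
    return (narrow_bits * wide_clocks) / (narrow_clocks * wide_bits)


# Figures quoted in the thread; everything else is hypothetical.
WIDE = (2048, 5)    # bits per transfer, clocks per transfer
NARROW = (128, 1)

breakeven = breakeven_useful_fraction(*WIDE, *NARROW)
print(f"break-even useful fraction for the wide bus: {breakeven:.4f}")

for fraction in (1.0, 0.5, 0.25):
    wide = useful_bits_per_clock(*WIDE, fraction)
    narrow = useful_bits_per_clock(*NARROW, 1.0)  # assumed fully useful
    winner = "wide" if wide > narrow else "narrow"
    print(f"useful fraction {fraction:.2f}: wide {wide:6.1f} vs narrow {narrow:6.1f} -> {winner}")
```

Under these assumptions the wide bus is ahead as long as more than roughly 31% of every 2,048-bit transfer is used, which is the "streaming" case; below that the narrow bus delivers more useful bits per clock.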

Hans de Vries

Joined: 07 Aug 2007
Posts: 74

Posted: Wed Jul 23, 2008 2:45 pm

Significant power savings for the 45nm Deneb (6MB-L3) versus
the 65nm Barcelona (2MB-L3) if this is indicative.

Power consumption at 2.3 GHz:

Deneb (45nm): Full minus Idle Load = 29W
Barcelona (65nm): Full minus Idle Load = 46W

This is a pre-production C0-stepping.

http://translate.google.nl/translate?u=http%3A%2F%2Fwww.itocp.com%2Fredirect.php%3Ftid%3D11603%26goto%3Dnewpost%23newpost&sl=zh-CN&tl=en&hl=en&ie=UTF-8

original page:
http://www.itocp.com/redirect.php?tid=11603&goto=newpost#newpost


Regards, Hans

Opteron

Joined: 16 Mar 2008
Posts: 44

Posted: Wed Jul 23, 2008 10:02 pm

Hans de Vries wrote:
original page:
http://www.itocp.com/redirect.php?tid=11603&goto=newpost#newpost
Thanks for the link. It seems socket AM3 will have 938 pins, as one can see 2 are missing:

Note:
The Chinese site mixed up the CPU types; here is a Phenom 9850 picture for comparison:


cheers

Opteron

AtWork

Joined: 31 Jul 2007
Posts: 116

Posted: Mon Jul 28, 2008 2:17 pm

Hans de Vries wrote:
Significant power savings for the 45nm Deneb (6MB-L3) versus
the 65nm Barcelona (2MB-L3) if this is indicative.

Power consumption at 2.3 GHz:

Deneb (45nm): Full minus Idle Load = 29W
Barcelona (65nm): Full minus Idle Load = 46W

This is a pre-production C0-stepping.

Regards, Hans


So a 6 core 45nm part should use somewhat less power than a 4 core 65nm part (given that memory controllers, etc. don't increase as much).
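
The arithmetic behind that, as a minimal Python sketch. It assumes the load-minus-idle delta scales linearly with core count, which ignores the memory controller, L3 and other shared logic, so treat the 6 core figure as a rough estimate rather than a prediction.

```python
# Load-minus-idle deltas quoted above (watts, 2.3 GHz, four cores each).
DENEB_DELTA_W = 29.0
BARCELONA_DELTA_W = 46.0
CORES_MEASURED = 4


def scaled_delta(delta_w, cores_measured, cores_target):
    """ASSUMPTION: the delta scales linearly with the number of cores."""
    return delta_w / cores_measured * cores_target


six_core_45nm = scaled_delta(DENEB_DELTA_W, CORES_MEASURED, 6)
saving = 1 - DENEB_DELTA_W / BARCELONA_DELTA_W

print(f"45nm delta per core: {DENEB_DELTA_W / CORES_MEASURED:.2f} W")
print(f"6 core 45nm estimate: {six_core_45nm:.1f} W vs 4 core 65nm: {BARCELONA_DELTA_W:.1f} W")
print(f"45nm vs 65nm reduction at 4 cores: {saving:.0%}")
```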

Still seems like the 12 core part will be thermally constrained.

When I run cpuburn on a 32 core box (vs. idle, with PowerNow on), power at the wall goes up by just under 800W. Power use per core at idle (checked by adding or removing 1 CPU on a 2 CPU box) is a couple of watts (it's essentially noise when measured at the wall). Factoring in PS and on-board voltage regulation, it seems like the current quad cores max out at ~80W under burn. So a 12 core part would need that 140W they've been talking about and still be limited to < 2.5GHz.
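
Spelled out as a minimal Python sketch. The 0.80 delivery efficiency is an assumption standing in for combined power supply and on-board regulator losses; it was not measured separately. The 800W figure is the burn-minus-idle delta at the wall, so the couple of watts per core at idle are ignored.

```python
WALL_DELTA_W = 800.0         # "just under 800W" at the wall, used as an upper bound
CORES = 32
CORES_PER_SOCKET = 4
DELIVERY_EFFICIENCY = 0.80   # ASSUMPTION: PSU and VRM losses combined
TWELVE_CORE_BUDGET_W = 140.0

per_core_wall_w = WALL_DELTA_W / CORES
per_socket_wall_w = per_core_wall_w * CORES_PER_SOCKET
per_socket_cpu_w = per_socket_wall_w * DELIVERY_EFFICIENCY
per_core_cpu_w = per_socket_cpu_w / CORES_PER_SOCKET

# Twelve of today's cores at today's per-core burn power, with no scaling.
twelve_core_unscaled_w = per_core_cpu_w * 12
budget_fraction = TWELVE_CORE_BUDGET_W / twelve_core_unscaled_w

print(f"per quad core at the CPU: ~{per_socket_cpu_w:.0f} W")
print(f"12 cores unscaled: ~{twelve_core_unscaled_w:.0f} W")
print(f"140 W allows ~{budget_fraction:.0%} of today's per-core burn power")
```

So under these assumptions each core in a 140W 12 core part gets a bit under 60% of the burn power of a current core, and that gap has to come from the process, lower voltage or lower clocks.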

OTOH, a single-image 96 core box would be capable of some significant processing.....

inf64

Joined: 04 Sep 2007
Posts: 59

Posted: Wed Jul 30, 2008 10:56 am

AtWork wrote:
Hans de Vries wrote:
Significant power savings for the 45nm Deneb (6MB-L3) versus
the 65nm Barcelona (2MB-L3) if this is indicative.

Power consumption at 2.3 GHz:

Deneb (45nm): Full minus Idle Load = 29W
Barcelona (65nm): Full minus Idle Load = 46W

This is a pre-production C0-stepping.

Regards, Hans


So a 6 core 45nm part should use somewhat less power than a 4 core 65nm part (given that memory controllers, etc. don't increase as much).

Still seems like the 12 core part will be thermally constrained.

When I run cpuburn on a 32 core box (vs. idle, with PowerNow on), power at the wall goes up by just under 800W. Power use per core at idle (checked by adding or removing 1 CPU on a 2 CPU box) is a couple of watts (it's essentially noise when measured at the wall). Factoring in PS and on-board voltage regulation, it seems like the current quad cores max out at ~80W under burn. So a 12 core part would need that 140W they've been talking about and still be limited to < 2.5GHz.

OTOH, a single-image 96 core box would be capable of some significant processing.....

AFAIK, AMD plans to use High-K for Magny Cours (an MCM of two 6 core chips).
High-K would help them control the leakage at higher clocks. So the 12C will not be made of the same C1 (or C2) stepping Shanghai cores, but from more advanced ones (from a process POV), presumably RevD.

Alessandro

Joined: 31 Jul 2007
Posts: 10
Location: Italy

Posted: Wed Jul 30, 2008 2:28 pm

As far as I knew, the present stepping was C0 and it was the production stepping, at least for Q4 2008. Is that correct?

Groo

Joined: 22 Jul 2007
Posts: 127

Posted: Wed Jul 30, 2008 3:13 pm    Post subject: Where did you get that?

inf64 wrote:
AFAIK, AMD plans to use High-K for Magny Cours (an MCM of two 6 core chips). High-K would help them control the leakage at higher clocks. So the 12C will not be made of the same C1 (or C2) stepping Shanghai cores, but from more advanced ones (from a process POV), presumably RevD.


Where did you hear this? I have heard the High-K on 45 both ways, just trying to narrow things down. Is it from a public or private source?

-Charlie

inf64

Joined: 04 Sep 2007
Posts: 59

Posted: Wed Jul 30, 2008 3:27 pm

Alessandro wrote:
As far as I knew, the present stepping was C0 and it was the production stepping, at least for Q4 2008. Is that correct?

C0 was from April. I don't think this will be the retail stepping.
Here you can see a C2 OCed (CPU-Z shows it as a C1 and doesn't read the vcore properly; you can see the real vcore in AMD OverDrive):
http://www.overclock.net/4285114-post9.html



Groo wrote:

Where did you hear this? I have heard the High-K on 45 both ways, just trying to narrow things down. Is it from a public or private source?

-Charlie


Public source:
http://www.amd.com/us-en/assets/content_type/DownloadableAssets/AMD__45nm_Press_Q-A.pdf

Quote:
• Ultra-low-k Dielectrics. In some later 45nm products, AMD plans on using ultralow-k
dielectrics to reduce wire delays by as much as 15 percent and enable greater
overall processor performance.
• High-k/metal Gates. As part of AMD’s Continuous Transistor Improvement (CTI)
approach, AMD has the option to introduce high-k/metal gates into 45nm production
to further enhance transistor performance. The “gate first” approach, developed with
IBM, is designed to provide a simpler, less time consuming way to migrate to high-k
metal gate technology and secure benefits that include improved performance and
reduced power consumption.


It (high-k) was part of the "gate first" initiative. Presumably RevD was supposed to be the first rev to receive the high-k treatment.

From this link:
http://www.amd.com/us-en/assets/content_type/DownloadableAssets/AMD_45nm_Press_Presentation2.pdf (45nm press kit, page 4):

Quote:
Performance and Power Efficiency
- 20 percent performance improvement from strained silicon and
ultra-low-K dielectrics
- High-k/metal gate / future 45nm option
- Continuous Transistor Improvements (CTI) applied throughout
life-cycle of the process

no@spam.com

Joined: 07 Oct 2007
Posts: 53

Posted: Wed Jul 30, 2008 6:31 pm

> Here you can see a C2 OCed (CPU-Z shows it as a C1 and doesn't read
> the vcore properly; you can see the real vcore in AMD OverDrive):
> http://www.overclock.net/4285114-post9.html

stepping=1 is C1, not C2

Alessandro

Joined: 31 Jul 2007
Posts: 10
Location: Italy

Posted: Wed Jul 30, 2008 7:10 pm

How do you know it is a C2 and not a C1 as reported in CPU-Z?
Public or private source?
Edit: 2 more steppings (counting C0 in April) in 3 months? That seems unlikely to me.
Groo, can you tell us something about that?

inf64

Joined: 04 Sep 2007
Posts: 59

Posted: Thu Jul 31, 2008 5:40 pm

no@spam.com wrote:


stepping=1 is C1, not C2

CPUz isn't reading it right...

no@spam.com

Joined: 07 Oct 2007
Posts: 53

Posted: Thu Jul 31, 2008 9:05 pm

>> stepping=1 is C1, not C2
> CPUz isn't reading it right...

C2 reports F-4-2.

This screen shot shows F-4-1 -- not because CPUz
is broken, but because it is running on a C1.
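
For anyone following along, the F-4-1 / F-4-2 strings are the base family, base model and stepping fields of CPUID function 1 EAX. A minimal Python sketch of the decoding follows. The two raw EAX values are an assumption, picked so that they reproduce the strings quoted above; they were not read from the screenshot or from hardware.

```python
def decode_cpuid_eax(eax):
    """Split CPUID function 1 EAX into its family/model/stepping fields."""
    return {
        "stepping": eax & 0xF,            # bits 3:0
        "base_model": (eax >> 4) & 0xF,   # bits 7:4
        "base_family": (eax >> 8) & 0xF,  # bits 11:8
        "ext_model": (eax >> 16) & 0xF,   # bits 19:16
        "ext_family": (eax >> 20) & 0xFF, # bits 27:20
    }


def fms_string(eax):
    """Format the base fields the way they are quoted in this thread."""
    f = decode_cpuid_eax(eax)
    return f"{f['base_family']:X}-{f['base_model']:X}-{f['stepping']:X}"


# Hypothetical raw values chosen to match the strings in the thread.
EAX_STEPPING_1 = 0x00100F41
EAX_STEPPING_2 = 0x00100F42

for eax in (EAX_STEPPING_1, EAX_STEPPING_2):
    fields = decode_cpuid_eax(eax)
    # When the base family is 0xF, the extended family is added to it.
    family = fields["base_family"] + fields["ext_family"]
    print(f"EAX={eax:08X} -> {fms_string(eax)} (effective family {family:X}h)")
```

The last digit is the stepping field, which is the only place the two values differ.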

up

Joined: 06 Oct 2007
Posts: 35

Posted: Thu Aug 07, 2008 12:51 pm

All times are GMT + 1 Hour