| Author |
Message |
slacker
Joined: 02 Nov 2007 Posts: 26
|
Posted: Fri Jul 18, 2008 8:51 pm Post subject: |
|
|
| Paul DeMone wrote: |
2) The Montecito L3 consists of two distinct and separate halves,
each half servicing only one core with its 1.25 MB of L2 reducing
traffic to each half of the L3. The Shanghai uses a single unified
L3 servicing traffic from four cores with only a 0.5 MB L2 reducing
traffic between each core and the L3. Activity factor and power
dissipation? well gosh! |
Do you have an activity factor comparison which accounts for the 2048 data bits transferred in 5 clocks on Montecito for an interleaved read/write vs. the 128 (256?) bits transferred on each clock for Barcelona? It looks to me as though the cache transistor activity factor for Montecito would be higher.
| Paul DeMone wrote: | | dkanter wrote: |
The main objective of the cell shrink was to use as little man power as possible. IIRC, the shrink only took around 10 man years to complete at the cost of suboptimal area and power scaling. |
That would be a change in strategy vs the 90 nm to 65 nm shrink.
IBM took the opportunity to use the device performance gain and
more or less fixed target frequency to replace some control logic
in the SPEs that was implemented with dynamic circuitry in 90 nm
to meet the cycle time goal with static circuitry in 65 nm to reduce
power.
No one but no one dicks around changing working control logic let
alone time critical circuit topologies when you are trying to use "as
little manpower as possible". :roll: |
Cell Processor: 3 Years, 3 Process Generations
- "Quick shrink – choosing between smaller, faster or lower power – IBM opted for lower power."
- "Power consumption of the 45 nm CELL processor is less than forty-percent that of the 90 nm CELL processor – now less than 20 watts."
- "No High-K gate-oxide or Metal-Gate electrode in the 45 nm SOI process used by the 45 nm CELL processor."
- "IBM converted dynamic circuits used in the 90 nm and 65 nm designs to fully static circuits."
- "Re-optimized signal paths removed the requirement for high-speed, high-slew-rate drivers, the high-speed, high-slew-rate drivers are then replaced with slower and lower power drivers."
- "In the 45 nm CELL processor presentation, IBM was rightfully proud to boast that it was able to dramatically lower power consumption of the processor with minimal design resources – estimated at (low) ten’s of man-years of design effort spent in the successful port of a highly complex modern processor from one process technology to another process technology."
|
|
| Back to top |
|
 |
Paul DeMone
Joined: 29 Aug 2007 Posts: 376 Location: Great white north
|
Posted: Sat Jul 19, 2008 2:16 am Post subject: |
|
|
| slacker wrote: |
Do you have an activity factor comparison which accounts for the 2048 data bits transferred in 5 clocks on Montecito for an interleaved read/write vs. the 128 (256?) bits transferred on each clock for Barcelona? |
Huh? The L3 will need to service a certain number of read and write
transfers per X million instructions executed. A 2x wider data path
needs to be active for only half as many cycles to accomplish this.
|
|
|
 |
no@spam.com
Joined: 07 Oct 2007 Posts: 53
|
Posted: Sat Jul 19, 2008 6:36 am Post subject: |
|
|
> Huh? The L3 will need to service a certain number of read and write
> transfers per X million instructions executed. A 2x wider data path
> needs to be active for only half as many cycles to accomplish this.
Only if the locality of those accesses trends towards "streaming" -- if it
does not, then the 128-bit-per-1-cycle bus will beat the 2,048-bits-per-
5-cycles bus, both on performance and power, at some point.
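
To put rough numbers on the two positions above, here is a toy model of my own, not anything measured. It uses only the bus figures quoted in the thread (2,048 bits per 5 clocks vs. 128 bits per clock) and assumes a request always costs whole transfers. It ignores latency, line-fill granularity, and L2 filtering, and the function name and "streaming"/"scattered" labels are made up for illustration.

```python
import math

def transfer_cost(useful_bits, bus_bits, clocks_per_transfer):
    """Return (clocks, bits_moved) needed to deliver useful_bits over a bus
    that moves bus_bits per transfer, each transfer taking clocks_per_transfer."""
    transfers = math.ceil(useful_bits / bus_bits)
    return transfers * clocks_per_transfer, transfers * bus_bits

# Bus figures as quoted in the thread; everything else is a simplification.
WIDE = dict(bus_bits=2048, clocks_per_transfer=5)    # Montecito-style L3 path
NARROW = dict(bus_bits=128, clocks_per_transfer=1)   # Barcelona-style L3 path

for label, useful in (("streaming", 2048), ("scattered", 128)):
    wide = transfer_cost(useful, **WIDE)
    narrow = transfer_cost(useful, **NARROW)
    print(f"{label}: wide (clocks, bits)={wide}  narrow (clocks, bits)={narrow}")
```

In this sketch the wide path wins when all 2,048 bits are wanted (5 clocks vs. 16), but when only 128 bits are useful it still spends 5 clocks and moves 2,048 bits where the narrow path spends 1 clock and moves 128. Where real workloads fall between those two extremes is exactly what is being argued about.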
|
|
|
 |
Hans de Vries
Joined: 07 Aug 2007 Posts: 74
|
|
|
 |
Opteron
Joined: 16 Mar 2008 Posts: 44
|
|
|
 |
AtWork
Joined: 31 Jul 2007 Posts: 116
|
Posted: Mon Jul 28, 2008 2:17 pm Post subject: |
|
|
| Hans de Vries wrote: | Significant power savings for the 45nm Deneb (6MB-L3) versus
the 65nm Barcelona (2MB-L3) if this is indicative.
Power consumption at 2.3 GHz:
Deneb (45nm):     Full minus Idle Load = 29W
Barcelona (65nm): Full minus Idle Load = 46W
This is a pre-production C0-stepping.
Regards, Hans |
So a 6 core 45nm part should use somewhat less power than a 4 core 65nm part (given that memory controllers, etc. don't increase as much).
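
As a back-of-envelope check of that estimate (a sketch only): it assumes the load-minus-idle delta scales linearly with core count and that both measured parts are quad-cores. The function name is hypothetical, and the uncore is treated as free, which is optimistic.

```python
def scaled_delta_watts(delta_watts, cores_measured, cores_target):
    """Scale a full-minus-idle power delta linearly by core count (assumption)."""
    return delta_watts / cores_measured * cores_target

DENEB_4C_DELTA = 29.0      # W, 45nm, from the quoted measurement
BARCELONA_4C_DELTA = 46.0  # W, 65nm, from the quoted measurement

six_core_45nm = scaled_delta_watts(DENEB_4C_DELTA, 4, 6)
print(f"estimated 6-core 45nm delta: {six_core_45nm} W")
print("below 4-core 65nm delta:", six_core_45nm < BARCELONA_4C_DELTA)
```

Under those assumptions the 6-core figure comes out at 43.5 W against 46 W, so "somewhat less" holds, but only just.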
Still seems like the 12 core part will be thermally constrained.
When I run cpuburn on a 32 core box (vs. idle with PowerNow! on), power at the wall goes up by just under 800 W. Power use per core at idle (checked by adding or removing 1 CPU in a 2 CPU box) is a couple of watts (it's essentially noise when measured at the wall). Factoring in PS and on-board voltage regulation, it seems like the current quad cores max out at ~80 W under burn. So a 12 core part would need that 140 W they've been talking about and still be limited to < 2.5 GHz.
OTOH, a single-image 96 core box would be capable of some significant processing.....
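
For anyone who wants to redo the wall-power arithmetic, a rough sketch follows. The 90% power supply and 90% voltage regulator efficiencies are assumed values, not measurements. The 32 core box is taken to be 8 quad-core sockets, and the naive 12-core figure assumes unchanged per-core power at the same clock and process.

```python
def per_socket_watts(wall_delta_watts, sockets, psu_eff, vrm_eff):
    """Estimate power delivered to each socket from a wall-power delta.
    psu_eff and vrm_eff are assumed conversion efficiencies (0..1)."""
    return wall_delta_watts / sockets * psu_eff * vrm_eff

WALL_DELTA = 800.0  # W, "just under 800 W" burn-vs-idle delta at the wall
SOCKETS = 8         # assumption: 32 cores as 8 quad-core parts

est = per_socket_watts(WALL_DELTA, SOCKETS, psu_eff=0.90, vrm_eff=0.90)
print(f"estimated per-socket power under burn: {est:.0f} W")

# Naive scaling to 12 cores with no process or clock changes (assumption).
naive_12_core = est / 4 * 12
print(f"naive 12-core estimate: {naive_12_core:.0f} W vs. a 140 W budget")
```

With those efficiencies the per-socket number lands close to the ~80 W above. The naive 12-core number comes out well over 140 W, which is why the part would have to give up clock speed, lean on the 45nm savings, or both.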
|
|
|
 |
inf64
Joined: 04 Sep 2007 Posts: 59
|
Posted: Wed Jul 30, 2008 10:56 am Post subject: |
|
|
| AtWork wrote: | | Hans de Vries wrote: | Significant power savings for the 45nm Deneb (6MB-L3) versus
the 65nm Barcelona (2MB-L3) if this is indicative.
Power consumption at 2.3 GHz:
Deneb (45nm):     Full minus Idle Load = 29W
Barcelona (65nm): Full minus Idle Load = 46W
This is a pre-production C0-stepping.
Regards, Hans |
So a 6 core 45nm part should use somewhat less power than a 4 core 65nm part (given that memory controllers, etc. don't increase as much).
Still seems like the 12 core part will be thermally constrained.
When I run cpuburn on a 32 core box (vs. idle with PowerNow! on), power at the wall goes up by just under 800 W. Power use per core at idle (checked by adding or removing 1 CPU in a 2 CPU box) is a couple of watts (it's essentially noise when measured at the wall). Factoring in PS and on-board voltage regulation, it seems like the current quad cores max out at ~80 W under burn. So a 12 core part would need that 140 W they've been talking about and still be limited to < 2.5 GHz.
OTOH, a single-image 96 core box would be capable of some significant processing..... |
AFAIK, AMD plans to use High-K for Magny Cours (an MCM of two 6 core chips).
High-K would help them control the leakage at higher clocks. So the 12C will not be made of the same C1 (or C2) stepping Shanghai cores, but from more advanced ones (from a process POV), presumably RevD.
|
|
|
 |
Alessandro
Joined: 31 Jul 2007 Posts: 10 Location: Italy
|
Posted: Wed Jul 30, 2008 2:28 pm Post subject: |
|
|
As far as I knew, the present stepping was C0 and it was the production stepping, at least for Q4 2008. Is that correct?
|
|
|
 |
Groo
Joined: 22 Jul 2007 Posts: 127
|
Posted: Wed Jul 30, 2008 3:13 pm Post subject: Where did you get that? |
|
|
| inf64 wrote: | | AFAIK, AMD plans to use High-K for Magny Cours (an MCM of two 6 core chips). High-K would help them control the leakage at higher clocks. So the 12C will not be made of the same C1 (or C2) stepping Shanghai cores, but from more advanced ones (from a process POV), presumably RevD. |
Where did you hear this? I have heard the High-K on 45 both ways, just trying to narrow things down. Is it from a public or private source?
-Charlie
|
|
|
 |
inf64
Joined: 04 Sep 2007 Posts: 59
|
|
|
 |
no@spam.com
Joined: 07 Oct 2007 Posts: 53
|
Posted: Wed Jul 30, 2008 6:31 pm Post subject: |
|
|
> Here you can see a C2 OCed(cpuz shows it as a C1 and doesn't read
> the vcore properly-you can see the real vcore in AMD overdrive):
> http://www.overclock.net/4285114-post9.html
>
>
stepping=1 is C1, not C2
|
|
|
 |
Alessandro
Joined: 31 Jul 2007 Posts: 10 Location: Italy
|
Posted: Wed Jul 30, 2008 7:10 pm Post subject: |
|
|
How do you know it is a C2 and not a C1 as reported in CPUZ?
Public or private source?
Edit: 2 more steppings (considering C0 in April) in 3 months? It seems unlikely to me.
Groo, can you tell us something about that?
|
|
|
 |
inf64
Joined: 04 Sep 2007 Posts: 59
|
Posted: Thu Jul 31, 2008 5:40 pm Post subject: |
|
|
| no@spam.com wrote: |
stepping=1 is C1, not C2 |
CPUz isn't reading it right...
|
|
|
 |
no@spam.com
Joined: 07 Oct 2007 Posts: 53
|
Posted: Thu Jul 31, 2008 9:05 pm Post subject: |
|
|
>> stepping=1 is C1, not C2
> CPUz isn't reading it right...
C2 reports F-4-2.
This screen shot shows F-4-1 -- not because CPUz
is broken, but because it is running on a C1.
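
For reference, a small sketch of how a string like F-4-2 falls out of the CPUID leaf 1 EAX signature. The bit layout is the standard x86 one. The two signature values are illustrative, chosen to reproduce the strings in this thread. That the F-4-x notation is base family, base model, stepping is my assumption, and the function names are hypothetical.

```python
def decode_signature(eax):
    """Split a CPUID leaf 1 EAX value into its family/model/stepping fields."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    # Extended fields only contribute when the base family is 0xF.
    if base_family == 0xF:
        display_family = base_family + ext_family
        display_model = (ext_model << 4) | base_model
    else:
        display_family = base_family
        display_model = base_model
    return {
        "base_family": base_family,
        "base_model": base_model,
        "stepping": stepping,
        "display_family": display_family,
        "display_model": display_model,
    }

def short_form(eax):
    """Format as base family-model-stepping in hex, e.g. 'F-4-2' (assumed notation)."""
    f = decode_signature(eax)
    return f"{f['base_family']:X}-{f['base_model']:X}-{f['stepping']:X}"

# Illustrative signatures only, differing in the stepping nibble.
for sig in (0x00100F41, 0x00100F42):
    print(f"{sig:#010x} -> {short_form(sig)}, "
          f"display family {decode_signature(sig)['display_family']:#x}")
```

The stepping is just the low four bits of the value the CPU itself reports, so a tool would have to mask the wrong bits to turn a 2 into a 1.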
|
|
|
 |
up
Joined: 06 Oct 2007 Posts: 35
|
Posted: Thu Aug 07, 2008 12:51 pm Post subject: |
|
|
|
|
|
|
 |
|