| Author |
Message |
Michael Westman
Joined: 27 Jun 2007 Posts: 24 Location: Amsterdam
|
Posted: Tue Nov 20, 2007 3:43 pm Post subject: Re: Phenom review is available |
|
|
| HighTech4US wrote: | | jack wrote: | http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=3153
Very sad situation for AMD and the industry |
Have the engineers whom Charlie saw dancing in the aisles been fired yet?
If this is what they were dancing about, they should be gone. |
Considering this is your first post here: did you by any chance consider that K10 clocked quite well, and that this might have been the reason why they were dancing? The L3 TLB bug was not discovered until recently, and AFAIK this bug is the reason why we have 2.3 GHz versions today...
If you have some inside information, why don't you share it with the rest of us?
|
|
|
 |
up
Joined: 06 Oct 2007 Posts: 38
|
Posted: Tue Nov 20, 2007 4:17 pm Post subject: |
|
|
Rumor has it that the "L3 TLB bug" took the Phenom crew by surprise in Warsaw, so there was no time to do anything better right away...
|
|
|
 |
Blitzkrieg
Joined: 31 Jul 2007 Posts: 64 Location: New Zealand
|
Posted: Tue Nov 20, 2007 7:35 pm Post subject: |
|
|
Considering the 125 W TDP for a low speed bin, I would like to know what the 3 GHz samples were chewing up.
I don't think it is just the NB problem.
|
|
|
 |
Alexko
Joined: 20 Sep 2007 Posts: 18
|
Posted: Tue Nov 20, 2007 8:35 pm Post subject: |
|
|
| Gabriele Svelto wrote: |
Yeah, that's why I said that it is not possible that the added memory latency comes from it. If the full load-to-use latency of the L3 cache is ~20 ns, then the time needed to check the tags (and thus initiate a memory request on a miss) should be a fraction of that. If The Inq's article on a problem regarding TLB & L3 cache interaction is true, it might just be possible that, in order to fix it, memory requests have to go through a longer delay before being issued.
That's all speculation on my part, obviously, but I think we're starting to have enough data points to understand why K10 is making such a poor showing (lack of high-clocked parts aside). |
Are we, really? Sure, the extra memory latency doesn't help, which means that in single-socket configuration, Phenoms are at a disadvantage against C2Qs and their huge --- and fast --- L2 caches, but that doesn't explain everything. On some benchmarks, such as games, they take such a huge beating that I can't believe this latency problem is the only thing hurting Phenoms.
|
|
|
 |
lux_interior
Joined: 26 Jul 2007 Posts: 252
|
Posted: Tue Nov 20, 2007 11:11 pm Post subject: |
|
|
| Blitzkrieg wrote: | Considering the 125 W TDP for a low speed bin, I would like to know what the 3 GHz samples were chewing up.
I don't think it is just the NB problem. |
Agreed. Hardware.fr shows the Phenom 9600 (2.3 GHz) chewing 110.8 W at full load (VRMs included). Accounting for VRM inefficiency, this leaves the CPU close to the official 95 W TDP limit.
http://www.hardware.fr/medias/photos_news/00/21/IMG0021456.gif
No wonder the 2.5GHz K10 Opteron is supposed to be an "SE" (i.e. higher TDP) version.
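To put rough numbers on the VRM point, here is a minimal sketch. The 110.8 W reading and the 95 W TDP come from the post above; the VRM efficiency values are purely illustrative assumptions, since no measured efficiency is given in this thread, and `cpu_power` is a hypothetical helper name.

```python
# Figures quoted in the post: 110.8 W at full load with VRM losses included,
# and an official TDP of 95 W.
MEASURED_W = 110.8
TDP_W = 95.0


def cpu_power(measured_w: float, vrm_efficiency: float) -> float:
    """Estimate the power reaching the CPU for an assumed VRM efficiency."""
    if not 0.0 < vrm_efficiency <= 1.0:
        raise ValueError("vrm_efficiency must be in (0, 1]")
    return measured_w * vrm_efficiency


# ASSUMPTION: these efficiencies are illustrative guesses, not measurements.
for eff in (0.80, 0.85, 0.90):
    watts = cpu_power(MEASURED_W, eff)
    print(f"VRM efficiency {eff:.0%}: CPU ~{watts:.1f} W ({watts - TDP_W:+.1f} W vs TDP)")
```

Under these assumed efficiencies the estimate lands within a few watts of the 95 W figure on either side, which is consistent with "close to the TDP limit" but does not pin down the real CPU draw.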
|
|
|
 |
Alberto
Joined: 04 Sep 2007 Posts: 111 Location: Italy
|
Posted: Tue Nov 20, 2007 11:40 pm Post subject: |
|
|
| Blitzkrieg wrote: | Considering the 125 W TDP for a low speed bin, I would like to know what the 3 GHz samples were chewing up.
I don't think it is just the NB problem. |
A partial answer to your question:
http://www.extremetech.com/article2/0,1697,2218304,00.asp
The 9900 (B3 stepping) is a 140 W CPU.
IMO faster FX CPUs will be in the range of 150 W ;-).
Intel can comfortably relax the TDP to get better yields and faster CPUs up to....?????.
Alberto.
|
|
|
 |
Gabriele Svelto
Joined: 27 Jun 2007 Posts: 290 Location: Milano, Italy
|
Posted: Wed Nov 21, 2007 9:06 am Post subject: |
|
|
| Alexko wrote: | | Are we, really? Sure, the extra memory latency doesn't help, which means that in single-socket configuration, Phenoms are at a disadvantage against C2Qs and their huge --- and fast --- L2 caches, but that doesn't explain everything. On some benchmarks, such as games, they take such a huge beating that I can't believe this latency problem is the only thing hurting Phenoms. |
Modern game code is very sensitive to latency and profoundly in love with large, fast caches. C2Q proves this point quite well; K10's cache subsystem is inferior to C2Q's, so higher memory latency has a larger impact on it. I'm not surprised games perform poorly on it: one of the reasons K8 wiped the floor with P4s in games was precisely its significantly lower memory latency. That doesn't mean K10 hasn't got other problems, and everything I pointed to is speculation on my part, based on data which can be wrong. However, I'm fairly confident that for non-cache-friendly workloads we're really seeing the higher latency dragging down K10s. That doesn't hold true for everything; naturally there are non-memory-bound benchmarks where K10 still loses clock-for-clock to C2Q, proving that its architecture is unable to match it in other workloads anyway.
|
|
|
 |
jack
Joined: 27 Jun 2007 Posts: 358
|
Posted: Wed Nov 21, 2007 9:26 am Post subject: |
|
|
I wonder how AMD managed to create such a poor L3 cache. It's small, runs at a low clock speed, and has very high latency (even when measured in cycles).
For example:
It seems that the 2.4 GHz Phenom will have its L3 cache running at 2 GHz, so the latency will be about 19 ns (according to The Tech Report).
Core 2 has, IIRC, a 12-cycle L2 latency, so the overall latency for a 2.4 GHz Core 2 is 12/2.4 = 5 ns!
Basically, Phenom's L3 cache is 50% smaller and its latency is almost four times as high. No wonder the performance is poor.
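The arithmetic in this post can be checked with a small sketch. The inputs (12 cycles at 2.4 GHz for Core 2's L2, about 19 ns with the L3 at 2 GHz for Phenom) are the figures quoted above, which the poster gives from memory; the helper names are hypothetical.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds (1 GHz = 1 cycle per ns)."""
    return cycles / clock_ghz


def ns_to_cycles(ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to cycles of the given clock."""
    return ns * clock_ghz


# Figures as quoted in the post (not independently verified here).
core2_l2_ns = cycles_to_ns(12, 2.4)                 # Core 2 L2 latency in ns
phenom_l3_ns = 19.0                                 # Phenom L3 latency in ns
phenom_l3_cycles = ns_to_cycles(phenom_l3_ns, 2.0)  # in 2 GHz L3 clock cycles
ratio = phenom_l3_ns / core2_l2_ns

print(f"Core 2 L2: {core2_l2_ns:.1f} ns")
print(f"Phenom L3: {phenom_l3_ns:.1f} ns = {phenom_l3_cycles:.0f} cycles at 2 GHz")
print(f"Phenom L3 / Core 2 L2 latency ratio: {ratio:.1f}x")
```

With the quoted inputs this gives about 5 ns for Core 2, about 38 L3 clock cycles for Phenom, and a ratio of roughly 3.8, which matches the "almost four times" claim and shows that the latency is high in cycles as well as in nanoseconds.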
|
|
|
 |
Pjotr
Joined: 06 Aug 2007 Posts: 159
|
Posted: Wed Nov 21, 2007 12:52 pm Post subject: |
|
|
| jack wrote: | | I wonder how AMD managed to create that poor L3 cache. It's small, running at a low clockspeed and has a very high latency (even when measured in cycles). |
Isn't the L3 built from the company AMD bought a few years ago, Z... something?
|
|
|
 |
jack
Joined: 27 Jun 2007 Posts: 358
|
Posted: Wed Nov 21, 2007 2:16 pm Post subject: |
|
|
| Pjotr wrote: | | jack wrote: | | I wonder how AMD managed to create that poor L3 cache. It's small, running at a low clockspeed and has a very high latency (even when measured in cycles). |
Isn't the L3 built from the company AMD bought a few years ago, Z... something? |
That technology was ZRAM. As far as I know, Phenom's L3 cache doesn't use it (ZRAM is supposed to give much better cache density).
|
|
|
 |
Gabriele Svelto
Joined: 27 Jun 2007 Posts: 290 Location: Milano, Italy
|
Posted: Wed Nov 21, 2007 2:17 pm Post subject: |
|
|
| Pjotr wrote: | | Isn't the L3 built from the company AMD bought a few years ago, Z... something? |
AMD has licensed Z-RAM from Innovative Silicon, but they haven't used it in their products yet.
|
|
|
 |
up
Joined: 06 Oct 2007 Posts: 38
|
Posted: Wed Nov 21, 2007 6:57 pm Post subject: |
|
|
|
|
|
|
 |
who?
Joined: 01 Sep 2007 Posts: 540
|
Posted: Wed Nov 21, 2007 9:20 pm Post subject: |
|
|
Z-RAM ... the masked technology ...
You know what Z stands for?
Zorro! Very soon the revenge?
who?
PS: this is humour; if you don't get it, check the TV program.
|
|
|
 |
dkanter
Joined: 20 Sep 2007 Posts: 59
|
Posted: Wed Nov 21, 2007 10:19 pm Post subject: |
|
|
AMD is not using ZRAM. Nobody has currently used ZRAM in a production design.
There are a couple of issues relating to the L3 cache.
1. Relatively high access latency
2. Clock and voltage domain crossings
3. Additional memory latency
I don't know why the L3 latency is so high. Fundamentally, they should be able to hit about the same cycle times as they did on the K8's L2 caches. The main issue is power consumption. For the L3 cache, you probably want it to be somewhat slower, but more power efficient. It could be that thermal problems forced them to slow it down...
Clock crossings - these always add latency, which is pretty nasty. It's just an unavoidable aspect of AMD's architecture. Probably the clock ratio between the cores and the uncore is a non-integer, which means there is a variable delay in accessing and returning data.
Lastly, adding an L3 cache adds to overall memory access latency. IIRC, the latency for AMD's hierarchy is something like 3 cycles, ~10 cycles and ~23 cycles. To access memory, Barcelona first has to check the tags in the L1 and L2 caches, then send a request which crosses a clock domain, check the L3 tag, and then, if it misses, finally send the request to the memory controller. So however long it takes to check the L3 tags is all additional memory latency...
Of course to compensate, I think the L2 latency is lower, but I don't know how long it takes to determine a miss.
DK
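A toy model may help picture the variable delay at a clock domain crossing described above. This is only a sketch under loud assumptions: the 2.4 GHz core and 2.0 GHz L3/uncore clocks are the figures mentioned earlier in this thread, the function name is hypothetical, both clocks are assumed to start in phase, and real synchronizers add further fixed cycles that are ignored here.

```python
from fractions import Fraction
from math import ceil


def wait_for_uncore_edge(core_cycle: int, core_ghz: Fraction, uncore_ghz: Fraction) -> Fraction:
    """Time in ns from a given core clock edge to the next uncore clock edge.

    Toy model: both clocks start in phase at t = 0 and a request can only
    enter the uncore domain on an uncore edge (0 ns wait if already aligned).
    """
    t = Fraction(core_cycle) / core_ghz                      # core edge time, ns
    next_edge = Fraction(ceil(t * uncore_ghz)) / uncore_ghz  # first uncore edge >= t
    return next_edge - t


# ASSUMPTION: clocks taken from figures quoted in this thread, not from AMD docs.
CORE = Fraction(24, 10)  # 2.4 GHz
UNCORE = Fraction(2)     # 2.0 GHz

waits = [wait_for_uncore_edge(k, CORE, UNCORE) for k in range(6)]
for k, w in enumerate(waits):
    print(f"core cycle {k}: wait {float(w):.3f} ns")
```

In this model the wait depends on which core cycle the request is issued in, and the pattern repeats every 6 core cycles (5 uncore cycles), so two otherwise identical accesses can see different crossing delays.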
|
|
|
 |
up
Joined: 06 Oct 2007 Posts: 38
|
Posted: Wed Nov 21, 2007 11:51 pm Post subject: |
|
|
|
|
|
|
 |
|