Aceshardware Forum Index Aceshardware
(not so) temporary home for the aceshardware community
 

Phenom review is available
Michael Westman



Joined: 27 Jun 2007
Posts: 24
Location: Amsterdam

Posted: Tue Nov 20, 2007 3:43 pm    Post subject: Re: Phenom review is available

HighTech4US wrote:
jack wrote:
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=3153
Very sad situation for AMD and the industry

Have the engineers whom Charlie saw dancing in the aisles been fired yet?

If this was what they were dancing about they should be gone.


Considering this is your first post here: did you by any chance consider that K10 clocked quite well, and that this might have been the reason why they were dancing? The L3 TLB bug was not discovered until recently, and AFAIK this bug is the reason why we only have 2.3 GHz versions today...

If you have some inside information why don't you share with the rest of us?
up



Joined: 06 Oct 2007
Posts: 38

Posted: Tue Nov 20, 2007 4:17 pm

Rumor says the "L3 TLB bug" took the Phenom crew by surprise in Warsaw, so there was no time to do anything better right away...
Blitzkrieg



Joined: 31 Jul 2007
Posts: 64
Location: New Zealand

Posted: Tue Nov 20, 2007 7:35 pm

Considering the 125 W TDP for a low speed bin, I would like to know what the 3 GHz samples were chewing up.
I don't think it is just the NB problem.
Alexko



Joined: 20 Sep 2007
Posts: 18

Posted: Tue Nov 20, 2007 8:35 pm

Gabriele Svelto wrote:
Michael Westman wrote:
According to Tech Report, the L3 cache latency is ~23 ns when the northbridge is running at 1.8 GHz (CPU @ 2.0 GHz) and ~19 ns when the NB is running at 2.0 GHz (CPU @ 2.5 GHz).

http://techreport.com/articles.x/13176/3

That would be the load to use latency IIRC.

Tech ARP says the latency should be less than 38 cycles.

http://www.techarp.com/showarticle.aspx?artno=424&pgno=2

http://www.techreport.com/articles.x/13633/4

In the Phenom review, with the NB running at 2.0 GHz, the L3 latency seems to be ~22 ns. http://www.techreport.com/articles.x/13633/4

Yeah, that's why I said that it is not possible that the added memory latency comes from it. If the full load-to-use latency of the L3 cache is ~20ns then the time needed to check the tags (and thus initiate a memory request on a miss) should be a fraction of that. If theInq's article on a problem regarding TLB & L3 cache interaction is true it might just be possible that in order to fix it memory requests have to go through a longer delay before being issued.

That's all speculation on my part, obviously, but I think we're starting to have enough data points to understand why K10 is making such a poor showing (lack of high-clocked parts aside).


Are we, really? Sure, the extra memory latency doesn't help, which means that in single-socket configurations Phenoms are at a disadvantage against C2Qs and their huge, fast L2 caches, but that doesn't explain everything. On some benchmarks, such as games, they take such a huge beating that I can't believe this latency problem is the only thing hurting Phenoms.
lux_interior



Joined: 26 Jul 2007
Posts: 252

Posted: Tue Nov 20, 2007 11:11 pm

Blitzkrieg wrote:
Considering the 125 W TDP for a low speed bin, I would like to know what the 3 GHz samples were chewing up.
I don't think it is just the NB problem.


Agreed. Hardware.fr shows the Phenom 9600 (2.3 GHz) chewing 110.8W at full load (VRMs included). Accounting for VRM inefficiency this leaves the CPU close to the official 95W TDP limit.
http://www.hardware.fr/medias/photos_news/00/21/IMG0021456.gif

No wonder the 2.5GHz K10 Opteron is supposed to be an "SE" (i.e. higher TDP) version.
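
To put a rough number on the VRM correction, here is a minimal sketch. The 110.8 W reading and the 95 W TDP are the figures above; the VRM efficiency values are assumptions picked for illustration (hardware.fr does not state one), and the function name is hypothetical.

```python
def cpu_power_watts(measured_watts: float, vrm_efficiency: float) -> float:
    """Estimate the power drawn by the CPU itself from a reading taken
    upstream of the VRMs, i.e. a reading that includes VRM losses."""
    if not 0.0 < vrm_efficiency <= 1.0:
        raise ValueError("vrm_efficiency must be in (0, 1]")
    return measured_watts * vrm_efficiency


MEASURED_W = 110.8  # hardware.fr full-load figure, VRMs included
TDP_W = 95.0        # official TDP of the Phenom 9600

# Assumed efficiencies, not measured values.
for eff in (0.80, 0.85, 0.90):
    est = cpu_power_watts(MEASURED_W, eff)
    print(f"VRM efficiency {eff:.0%}: CPU ~{est:.1f} W ({est / TDP_W:.0%} of TDP)")
```

With an assumed 85% efficiency the estimate comes out at about 94 W, which is what "close to the official 95W TDP limit" looks like in numbers. A more efficient VRM would put the chip over the limit, a less efficient one comfortably under it.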
Alberto



Joined: 04 Sep 2007
Posts: 111
Location: Italy

Posted: Tue Nov 20, 2007 11:40 pm

Blitzkrieg wrote:
Considering the 125 W TDP for a low speed bin, I would like to know what the 3 GHz samples were chewing up.
I don't think it is just the NB problem.


A partial answer to your question:

http://www.extremetech.com/article2/0,1697,2218304,00.asp

The 9900 (B3 stepping) is a 140 W CPU.

IMO faster FX CPUs will be in the range of 150 W ;-).

Intel can comfortably relax the TDP to get better yields and faster CPUs, up to...?

Alberto.
Gabriele Svelto



Joined: 27 Jun 2007
Posts: 290
Location: Milano, Italy

Posted: Wed Nov 21, 2007 9:06 am

Alexko wrote:
Are we, really? Sure, the extra memory latency doesn't help, which means that in single-socket configurations Phenoms are at a disadvantage against C2Qs and their huge, fast L2 caches, but that doesn't explain everything. On some benchmarks, such as games, they take such a huge beating that I can't believe this latency problem is the only thing hurting Phenoms.

Modern game code is very sensitive to latency and profoundly in love with large, fast caches. C2Q proves this point quite well. K10's cache subsystem is inferior to C2Q's, so higher memory latency has a larger impact on it. I'm not surprised games perform poorly on it; one of the reasons K8 wiped the floor with P4s in games was precisely its significantly lower memory latency. That doesn't mean K10 hasn't got other problems, and everything I pointed to is speculation on my part, based on data which can be wrong. However, I'm fairly confident that for non-cache-friendly workloads we're really seeing the higher latency dragging down K10s. That doesn't hold true for everything: naturally there are non-memory-bound benchmarks where K10 still loses clock-for-clock to C2Q, showing that its architecture is unable to match it in other workloads anyway.
jack



Joined: 27 Jun 2007
Posts: 358

Posted: Wed Nov 21, 2007 9:26 am

I wonder how AMD managed to create that poor L3 cache. It's small, running at a low clockspeed and has a very high latency (even when measured in cycles).

For example:
It seems that the 2.4 GHz Phenom will have its L3 cache running at 2 GHz, so the latency will be about 19 ns (according to Tech Report).

Core 2 has, IIRC, a 12-cycle L2 latency, so the overall latency for a 2.4 GHz Core 2 is 12 / 2.4 = 5 ns!

Basically, Phenom's L3 cache is 50% smaller and its latency is almost four times as high. No wonder the performance is poor.
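
The arithmetic behind that comparison, as a small sketch. The inputs are the figures quoted above (a 12-cycle L2 at 2.4 GHz recalled from memory, and Tech Report's ~19 ns with the NB at 2.0 GHz); the helper names are only for illustration.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Latency in nanoseconds for a given cycle count at a given clock."""
    return cycles / clock_ghz


def ns_to_cycles(ns: float, clock_ghz: float) -> float:
    """Number of clock cycles that elapse in a given time at a given clock."""
    return ns * clock_ghz


# Core 2: 12-cycle L2 at a 2.4 GHz core clock (figure quoted IIRC).
core2_l2_ns = cycles_to_ns(12, 2.4)

# Phenom: ~19 ns L3 latency with the northbridge at 2.0 GHz.
phenom_l3_ns = 19.0
phenom_l3_nb_cycles = ns_to_cycles(phenom_l3_ns, 2.0)    # in NB clocks
phenom_l3_core_cycles = ns_to_cycles(phenom_l3_ns, 2.4)  # in core clocks

print(f"Core 2 L2: {core2_l2_ns:.1f} ns")
print(f"Phenom L3: {phenom_l3_ns:.1f} ns "
      f"= {phenom_l3_nb_cycles:.0f} NB cycles "
      f"= {phenom_l3_core_cycles:.1f} core cycles")
print(f"Ratio:     {phenom_l3_ns / core2_l2_ns:.1f}x")
```

The ratio works out to 3.8x, hence "almost four times". The 19 ns figure also corresponds to 38 northbridge cycles at 2.0 GHz, so the gap is large whether it is measured in time or in cycles.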
Pjotr



Joined: 06 Aug 2007
Posts: 159

Posted: Wed Nov 21, 2007 12:52 pm

jack wrote:
I wonder how AMD managed to create that poor L3 cache. It's small, running at a low clockspeed and has a very high latency (even when measured in cycles).


Isn't the L3 built from the company AMD bought a few years ago, Z... something?
jack



Joined: 27 Jun 2007
Posts: 358

Posted: Wed Nov 21, 2007 2:16 pm

Pjotr wrote:
jack wrote:
I wonder how AMD managed to create that poor L3 cache. It's small, running at a low clockspeed and has a very high latency (even when measured in cycles).


Isn't the L3 built from the company AMD bought a few years ago, Z... something?


That technology was ZRAM. As far as I know, Phenom's L3 cache doesn't use it (ZRAM is supposed to give much better cache density).
Gabriele Svelto



Joined: 27 Jun 2007
Posts: 290
Location: Milano, Italy

Posted: Wed Nov 21, 2007 2:17 pm

Pjotr wrote:
Isn't the L3 built from the company AMD bought a few years ago, Z... something?

AMD has licensed Z-RAM from Innovative Silicon, but they haven't used it in their products yet.
up



Joined: 06 Oct 2007
Posts: 38

Posted: Wed Nov 21, 2007 6:57 pm

Nice read about zram!
Interview by MS
http://www.lostcircuits.com/memory/zram/
who?



Joined: 01 Sep 2007
Posts: 540

Posted: Wed Nov 21, 2007 9:20 pm

up wrote:
Nice read about zram!
Interview by MS
http://www.lostcircuits.com/memory/zram/


Z-RAM ... the masked technology ...
You know what the Z stands for?

Zorro! very soon the revenge?


who?
PS: this is humour; if you don't get it, check the TV program.
dkanter



Joined: 20 Sep 2007
Posts: 59

Posted: Wed Nov 21, 2007 10:19 pm

AMD is not using ZRAM. Nobody has currently used ZRAM in a production design.

There are a couple of issues relating to the L3 cache.

1. Relatively high access latency
2. Clock and voltage domain crossings
3. Additional memory latency

I don't know why the L3 latency is so high. Fundamentally, they should be able to hit about the same cycle times as they did on the K8's L2 caches. The main issue is power consumption. For the L3 cache, you probably want it to be somewhat slower, but more power efficient. It could be that thermal problems forced them to slow it down...

Clock crossings - these always add latency, pretty nasty. It's just an unavoidable aspect of AMD's architecture. The clock ratio between the cores and the uncore is probably a non-integer, which means there is a variable delay both in issuing an access and in returning data.

Lastly, adding an L3 cache adds to overall memory access latency. IIRC, the latency for AMD's hierarchy is something like 3 cycles, ~10 cycles and ~23 cycles. To access memory, Barcelona first has to check the tags in the L1 and L2 caches, then send a request which crosses a clock domain, check the L3 tag, and then, if it misses, finally send the request to the memory controller. So however long it takes to check the L3 tags is all additional memory latency...

Of course to compensate, I think the L2 latency is lower, but I don't know how long it takes to determine a miss.
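
A toy model of that miss path, to show how the extra stages add up. Every cycle count below is a made-up placeholder chosen for illustration, not a measurement or a published figure, and the stage breakdown is a simplification of the sequence described above.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    cycles: float  # cost in core clock cycles (hypothetical values)


def memory_latency(stages: list[Stage]) -> float:
    """Latency of a load that misses every cache level: the stages are
    traversed one after another, so their costs simply add up."""
    return sum(s.cycles for s in stages)


# Path without an L3 cache. All numbers are placeholders.
without_l3 = [
    Stage("L1 tag check", 2),
    Stage("L2 tag check", 8),
    Stage("memory controller + DRAM", 150),
]

# Same path with an L3 cache sitting in a separate clock domain.
with_l3 = [
    Stage("L1 tag check", 2),
    Stage("L2 tag check", 8),
    Stage("clock domain crossing (request)", 4),
    Stage("L3 tag check", 12),
    Stage("memory controller + DRAM", 150),
    Stage("clock domain crossing (data return)", 4),
]

extra = memory_latency(with_l3) - memory_latency(without_l3)
print(f"without L3: {memory_latency(without_l3)} cycles")
print(f"with L3:    {memory_latency(with_l3)} cycles")
print(f"added by the L3 path: {extra} cycles")
```

The point is only structural: on a miss, the L3 tag check and the domain crossings sit in series with the DRAM access, so whatever they cost is paid on top of the memory latency.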

DK
All times are GMT + 1 Hour