Aceshardware

(not so) temporary home for the aceshardware community
It is currently Thu Dec 17, 2009 8:39 am

All times are UTC + 1 hour



 Post subject: Re: AMD to support all Intel instructions [edited]
Posted: Fri May 08, 2009 8:35 pm

Joined: Tue Aug 07, 2007 11:57 am
Posts: 190
dkanter wrote:

Do you have a link to this file you can share? It's quite annoying when Intel takes down their IDF archives.

DK


http://www.cse.scitech.ac.uk/disco/mew1 ... lHoran.pdf


Regards, Hans


 Post subject:
Posted: Wed May 20, 2009 3:39 pm

Joined: Fri Sep 07, 2007 7:48 am
Posts: 28
Location: Berlin, Germany
Hiroshige Goto wrote an article regarding AVX, Bulldozer architecture (maybe inspired by my blog, since he quotes some of the same patent applications), new instructions and their encoding:

http://pc.watch.impress.co.jp/docs/colu ... 68661.html


 Post subject: Re: AMD to support all Intel instructions [edited]
Posted: Wed May 20, 2009 4:16 pm

Joined: Fri Aug 31, 2007 10:08 pm
Posts: 217
Location: Switzerland
dkanter wrote:
Eric Bron wrote:
"
Intel®AVX targets a high-performance first implementation
-256-bit Multiply, Add and Shuffle engines (2X today)
-2nd load port
"


Do you have a link to this file you can share? It's quite annoying when Intel takes down their IDF archives.

DK


Alternate source for my quote:

http://pc.watch.impress.co.jp/img/pcw/d ... 3.jpg.html

Thanks to Dresdenboy.


 Post subject:
Posted: Tue Jul 07, 2009 12:31 pm

Joined: Fri Sep 07, 2007 7:48 am
Posts: 28
Location: Berlin, Germany
A few more interesting patent applications have appeared, and my first take on them can be read here (as always):
http://citavia.blog.de/

They cover the multithreading aspect of Bulldozer.


 Post subject:
Posted: Wed Jul 08, 2009 3:11 am

Joined: Tue Aug 07, 2007 11:57 am
Posts: 190
Dresdenboy wrote:
A few more interesting patent applications have appeared, and my first take on them can be read here (as always):
http://citavia.blog.de/

They cover the multithreading aspect of Bulldozer.


Thank you for posting this, very interesting. Some of these architectural features might actually be meant for improved versions of Bulldozer in 2012 or 2013.

Regards, Hans


 Post subject:
Posted: Thu Jul 09, 2009 7:56 am

Joined: Fri Sep 07, 2007 7:48 am
Posts: 28
Location: Berlin, Germany
Hans de Vries wrote:
Dresdenboy wrote:
A few more interesting patent applications have appeared, and my first take on them can be read here (as always):
http://citavia.blog.de/

They cover the multithreading aspect of Bulldozer.


Thank you for posting this, very interesting. Some of these architectural features might actually be meant for improved versions of Bulldozer in 2012 or 2013.

Regards, Hans


Thanks for your comment. I will include it in my blog soon.

Do you have any idea what all this decoding power might be good for? On P3DNow we came up with a theory, prompted by the several patent applications that already exist on µCode caching, hierarchical µCode, safe known-good code execution etc. to implement architectural changes:
With that much µCode, they had better have some way to decode it without the limitations of K7 through K10. The optimum would be to be able to use up to all available decoders (µCode + fast path) for decoding. I think that's an important point in those new patent applications. They'd essentially maximize the input bandwidth to the decoders (both from instruction fetch to fast-path decode and from µOp code storage to µOp decode) by increasing the output bandwidth, since the paths to the clusters are already there.


 Post subject:
Posted: Thu Jul 09, 2009 2:15 pm

Joined: Sun Mar 16, 2008 3:20 pm
Posts: 82
Just some additional info for those who have not read the patent(s):

The described (Bulldozer?) core has 4 fast-path and 4 µCode decoders, which could be used *simultaneously*.

That's a change from today's 3+1 design, where either the three direct-path decoders or the one and only µCode decoder is active.
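
To put rough numbers on that difference, here is a toy Python sketch. It only counts how many decoders could be busy in the same cycle under the either/or versus simultaneous rule described above; it says nothing about real instructions per clock, and the function name `peak_active_decoders` is made up for illustration.

```python
def peak_active_decoders(fast_path, microcode, simultaneous):
    """Upper bound on decoders that can be busy in the same cycle."""
    if simultaneous:
        # Both decoder groups may work at once (as in the patent application).
        return fast_path + microcode
    # Either/or design: only one of the two decoder groups runs at a time.
    return max(fast_path, microcode)


# Today's 3+1 design versus the described 4+4 design.
today = peak_active_decoders(3, 1, simultaneous=False)
described = peak_active_decoders(4, 4, simultaneous=True)
print(today, described)
```

Under these assumptions the bound goes from 3 to 8 decoders, which is the "decoding power" being discussed.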


 Post subject:
Posted: Thu Jul 16, 2009 2:06 pm

Joined: Fri Sep 07, 2007 7:48 am
Posts: 28
Location: Berlin, Germany
I wrote about that µCode stuff on my blog today.

Currently I'm checking academic research papers regarding CMT architectures (benefits, power profile, performance profile etc.) and updating my blog again ;)


 Post subject:
Posted: Fri Jul 17, 2009 12:01 pm

Joined: Tue Sep 04, 2007 4:40 pm
Posts: 126
Dresdenboy wrote:
I wrote about that µCode stuff on my blog today.

Currently I'm checking academic research papers regarding CMT architectures (benefits, power profile, performance profile etc.) and updating my blog again ;)


Thanks Dresdenboy, very good and informative blog posts about BD you wrote there ;)


 Post subject: AVX on Sandy Bridge
Posted: Wed Sep 23, 2009 11:05 am

Joined: Fri Aug 31, 2007 10:08 pm
Posts: 217
Location: Switzerland
dkanter wrote:
Thanks Eric, I owe you one!
DK


Some new public information about the first AVX incarnation in Sandy Bridge is in the SF09_ARCS002_FIN.pdf document, available here:

http://www.intel.com/idf/technology-tracks/ -> "32 nm Implementation of... " -> "See sessions within this track" -> ARCS002 PDF Icon

1) slide 6: one "AVX HIGH" unit on port 0 and one "AVX LOW" unit on port 1 => it looks like 256-bit AVX will have the same throughput as 128-bit SSE on current cores: 2 clocks for a 256-bit vmulps/pd + a 256-bit vaddps/pd, instead of 1 clock for a 128-bit mulps/pd + a 128-bit addps/pd (i.e. the same SP/DP flops per clock with balanced add/mul). So if Sandy Bridge, unlike Conroe, Penryn and Nehalem, can't issue a mul and an add in the same clock, will it actually be less efficient than these previous cores on the large amount of legacy 128-bit SSE code? If that's indeed the case, Agner was pretty much right after all.

2) slide 6: only 128-bit paths from the L1D cache to the execution units (I was hoping for full-featured 256-bit paths). A few consequences:
- the extra load port will help legacy 128-bit SSE or 128-bit AVX as much as 256-bit AVX; same 48 B / clock maximum L1 bandwidth
- loop fission will probably no longer be a good optimization if intermediate results are stored in L1D; it is probably better to overflow the LSD than the L1D, particularly with multiple threads fighting for L1D access
- more incentive to use 64-bit code, to have 16 ymm registers instead of 8 and so minimize L1D accesses

3) slide 54: 64 B cache lines (unchanged), so:
- memory alignment is still important (more important than on Nehalem); otherwise 1 access in 2 will incur a cache line split

4) slide 58: masked moves considered harmful; replace vmaskmovps by vblendvps + vmovaps, just like in legacy SSE4 code?
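
For anyone who wants to check the arithmetic behind points 1) and 3), here is a rough Python sketch. The clock counts (1 clock per 128-bit mul + add pair on current cores, 2 clocks per 256-bit pair on Sandy Bridge) are only the reading of slide 6 given above, not measured figures, and the helper names `flops_per_clock` and `split_fraction` are made up for illustration. The split model simply walks consecutive accesses from a misaligned start address and counts how many straddle a 64 B line.

```python
def flops_per_clock(vector_bits, element_bits, clocks_per_mul_add_pair):
    """Flops per clock for one packed mul plus one packed add (balanced mix)."""
    lanes = vector_bits // element_bits       # elements per vector register
    flops_per_pair = 2 * lanes                # one mul and one add per lane
    return flops_per_pair / clocks_per_mul_add_pair


def split_fraction(access_bytes, line_bytes=64, misalignment=1, count=1024):
    """Fraction of back-to-back accesses that straddle a cache line boundary."""
    splits = 0
    for i in range(count):
        start = misalignment + i * access_bytes
        end = start + access_bytes - 1        # last byte touched by this access
        if start // line_bytes != end // line_bytes:
            splits += 1
    return splits / count


# Point 1): assumed 1 clock per 128-bit pair, 2 clocks per 256-bit pair.
print("SSE SP:", flops_per_clock(128, 32, 1), "AVX SP:", flops_per_clock(256, 32, 2))
print("SSE DP:", flops_per_clock(128, 64, 1), "AVX DP:", flops_per_clock(256, 64, 2))

# Point 3): 32-byte (256-bit) accesses on 64 B lines, aligned vs. misaligned.
print("aligned 32 B:", split_fraction(32, misalignment=0))
print("misaligned 32 B:", split_fraction(32, misalignment=1))
print("misaligned 16 B:", split_fraction(16, misalignment=1))
```

Under these assumptions the SP and DP flops per clock come out identical for both widths, and a misaligned stream of 32-byte accesses splits on every second access. The same model gives one split in four for 16-byte accesses at a 1-byte offset, which fits the remark that alignment matters more than on Nehalem.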


 Post subject: Re: AVX on Sandy Bridge
Posted: Wed Sep 23, 2009 2:03 pm

Joined: Tue Aug 07, 2007 11:57 am
Posts: 190
Eric Bron wrote:
dkanter wrote:
Thanks Eric, I owe you one!
DK


Some new public information about the first AVX incarnation in Sandy Bridge is in the SF09_ARCS002_FIN.pdf document, available here:

http://www.intel.com/idf/technology-tracks/ -> "32 nm Implementation of... " -> "See sessions within this track" -> ARCS002 PDF Icon

1) slide 6: one "AVX HIGH" unit on port 0 and one "AVX LOW" unit on port 1 => it looks like 256-bit AVX will have the same throughput as 128-bit SSE on current cores: 2 clocks for a 256-bit vmulps/pd + a 256-bit vaddps/pd, instead of 1 clock for a 128-bit mulps/pd + a 128-bit addps/pd (i.e. the same SP/DP flops per clock with balanced add/mul). So if Sandy Bridge, unlike Conroe, Penryn and Nehalem, can't issue a mul and an add in the same clock, will it actually be less efficient than these previous cores on the large amount of legacy 128-bit SSE code? If that's indeed the case, Agner was pretty much right after all.

2) slide 6: only 128-bit paths from the L1D cache to the execution units (I was hoping for full-featured 256-bit paths). A few consequences:
- the extra load port will help legacy 128-bit SSE or 128-bit AVX as much as 256-bit AVX; same 48 B / clock maximum L1 bandwidth
- loop fission will probably no longer be a good optimization if intermediate results are stored in L1D; it is probably better to overflow the LSD than the L1D, particularly with multiple threads fighting for L1D access
- more incentive to use 64-bit code, to have 16 ymm registers instead of 8 and so minimize L1D accesses

3) slide 54: 64 B cache lines (unchanged), so:
- memory alignment is still important (more important than on Nehalem); otherwise 1 access in 2 will incur a cache line split

4) slide 58: masked moves considered harmful; replace vmaskmovps by vblendvps + vmovaps, just like in legacy SSE4 code?


That would correspond with the image below, which doesn't show any doubled FP resources.

Image

Regards, Hans


 Post subject: Re: AVX on Sandy Bridge
Posted: Wed Sep 23, 2009 5:53 pm

Joined: Fri Aug 31, 2007 10:08 pm
Posts: 217
Location: Switzerland
Hans de Vries wrote:
That would correspond with the image below, which doesn't show any doubled FP resources.
Image
Regards, Hans


Thanks Hans. FYI, I'm trying to understand it further here: http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/68554/


 Post subject: Re: Intel AVX kills AMD SSE5
Posted: Wed Sep 23, 2009 7:23 pm

Joined: Wed Sep 23, 2009 7:20 pm
Posts: 1
Response here ... http://software.intel.com/en-us/forums/ ... e/1/#96178

Regards,
Mark


 Post subject: Re: AVX on Sandy Bridge
Posted: Wed Sep 23, 2009 7:28 pm

Joined: Wed Aug 08, 2007 1:43 pm
Posts: 10
Hans de Vries wrote:

That would correspond with the image below, which doesn't show any doubled FP resources.

Image

Regards, Hans


So Hans is wrong again... What a surprise!


 Post subject: Westmere wafer
Posted: Wed Sep 23, 2009 10:53 pm

Joined: Wed Aug 22, 2007 9:24 am
Posts: 28
for further die analysis: a Westmere wafer close-up.

ftp://download.intel.com/pressroom/kits/events/idffall_2009/images/Westmere_wafer.jpg

