Aceshardware

(not so) temporary home for the aceshardware community

All times are UTC + 1 hour



PostPosted: Mon Mar 17, 2008 1:01 pm 
Joined: Mon Jul 23, 2007 2:44 pm
Posts: 267
Location: Belgium
ajensen wrote:
Isn't x86 becoming incredibly convoluted now? Intel and AMD battling to inject the most instructions and Larrabee coming with a subset of x86 instructions. I guess Atom will only support a subset too, plus the various AMD subset implementations. And all this to get data parallel performance boosts that are just as likely to eventually come from other circuitry than GP cores.

I wouldn't be surprised if either AMD or Intel came out with chips codenamed Nimrod or Tower.


I agree: it is a pure waste of money. Just consider how much money gets lost because VMware has to work around this (VMotion). And all this for a gain of a few percent in a few applications that will only appear a few years from now. AMD, Intel: stop this right now.


PostPosted: Mon Mar 17, 2008 2:23 pm 
Joined: Wed Jun 27, 2007 10:19 am
Posts: 331
Location: Milano, Italy
Johan wrote:
I agree: it is a pure waste of money. Just consider how much money gets lost because VMware has to work around this (VMotion). And all this for a gain of a few percent in a few applications that will only appear a few years from now. AMD, Intel: stop this right now.

Actually this isn't much different from what has been happening in other ISAs. ARM has been leading the way in confusing & incompatible ISA extensions, POWER has four distinct sets of vector extensions depending on the target market (AltiVec, VMX128, BlueGene paired FP instructions and the 750CL/Gekko paired FP instructions), etc... Actually on x86 this has been less painful because of the need for backwards compatibility, but since the extensions after SSE2 are largely left unused I think we're going to see more CPUs supporting-this-but-not-that-and-part-of-something-else. I agree that this sucks and actually turns developers away from providing support for this mess.


PostPosted: Mon Mar 17, 2008 2:37 pm 
Joined: Sat Sep 01, 2007 4:11 pm
Posts: 173
Is VMware planning to support VMotion across different CPU architectures? I got the feeling that you need the same stepping or all bets are off.

Haven't tried VMotion yet though, but it should be running soon. The sales guy just shook his head sadly when asked about AMD+VMotion. We weren't really planning on AMD anyway for this setup so I didn't dig into it.

Shouldn't there be some conspiracy theories as to why x86 is being obfuscated so much by its guardians?


PostPosted: Mon Mar 17, 2008 3:38 pm 
Joined: Tue Aug 07, 2007 11:57 am
Posts: 190
The never-ending instruction set extension mess... With an unknown
amount of behind-the-scenes quarreling between Microsoft, Intel and AMD.

These are Intel's official ISA extensions.

45nm - Penryn: SSE4.1 - media acceleration
45nm - Nehalem: SSE4.2 - string and text processing.
32nm - Westmere: AES NI - encryption

One could sense a tendency to avoid or delay the use of the term "SSE5"?

SSE5 showed up first in Microsoft documents about its Phoenix future
compiler project.

http://www.msakademik.net/academicdays2 ... tazavi.ppt
http://research.microsoft.com/phoenix/

Then (Aug 2007) AMD announced SSE5 support for future processors.

SSE5 is mostly fused Multiply/Add-Accumulate, but it is also AES
encryption like in Intel's Westmere, the 32nm shrink of Nehalem.

Intel claims a 3x AES performance improvement for Westmere with AES-NI.
AMD claims a 5x AES performance improvement due to SSE5 for
Bulldozer (or with future 45nm Phenoms now??)

The question is if Microsoft is trying to 'promote' SSE5 to Intel and
if Intel is playing hard-to-get in order to constrain (developer)
ambitions to manageable proportions.

The rest of the developer community must hope that any
behind-the-scenes negotiations do lead to convergence. The ironic thing
is that all these extensions are much better done by ASICs or FPGAs.

Standardized high-level algorithmic interfaces for commonly used special
purpose functions are of much more value than low-level instruction set
extensions, and they can be implemented in microcode as well, using
undocumented, non-legacy-polluting instruction set extensions
which can be changed/improved from one generation to the next.

Or better, if gate budgets allow, delegated to special purpose high
level units, like we have now for video encoding and 3D graphics.



Regards, Hans.


PostPosted: Mon Mar 17, 2008 5:26 pm 
Joined: Sat Sep 01, 2007 8:01 am
Posts: 652
HenryWince wrote:
shank15217 wrote:
SSE4a is a subset of SSE4


Worse: AMD SSE4a doesn't intersect with Intel SSE4 at all.

BTW, I also don't understand why Intel split SSE4 into SSE4.1 and SSE4.2 ....


Some of the SSE4.2 instructions only make sense for the next gen.
You have to understand that instruction sets very often reflect a change or addition to the architecture. In the case of SSE4, the super shuffle is the main deal.
And you are correct, SSE4a has nothing to do with SSE4, neither 4.1 nor 4.2. It is an attempt to make consumers believe they get SSE4.

When we did SSSE3, it was too small to do a major number release, so we called it "supplemental". I am not a fan of the naming convention for the instruction sets, but the guys doing this follow a set of rules, to avoid problems in the shop.

I can't speak too much about the next gen here, obviously, and those who do should be ashamed of what they are doing (NDA material), but I can tell you that there is no big mess coming in the instruction set; everything is logical and nicely uniform, as always.

Then, about AMD-SSE5, well, time will tell, but if it is as used as 3DNow or SSE4a, I can sleep well.

who?
this is my personal opinion.


PostPosted: Mon Mar 17, 2008 5:48 pm 
Joined: Sat Sep 01, 2007 8:01 am
Posts: 652
Gabriele Svelto wrote:
Johan wrote:
I agree: it is a pure waste of money. Just consider how much money gets lost because VMware has to work around this (VMotion). And all this for a gain of a few percent in a few applications that will only appear a few years from now. AMD, Intel: stop this right now.

Actually this isn't much different from what has been happening in other ISAs. ARM has been leading the way in confusing & incompatible ISA extensions, POWER has four distinct sets of vector extensions depending on the target market (AltiVec, VMX128, BlueGene paired FP instructions and the 750CL/Gekko paired FP instructions), etc... Actually on x86 this has been less painful because of the need for backwards compatibility, but since the extensions after SSE2 are largely left unused I think we're going to see more CPUs supporting-this-but-not-that-and-part-of-something-else. I agree that this sucks and actually turns developers away from providing support for this mess.




Try to find a video decoder not using SSE2, it is fairly hard. Try to find a video encoder not using SSE3, it is very difficult: LDDQU is very nice for the unaligned loads of the motion estimation; the CPU can deal with this now, but it is still a good help at the microcode level.

SSE4.1, with the dot products (DPPS, DPPD, etc.), was requested by video game programmers for years; now they have it, and they are using it. It takes 12 to 18 months to get to the market.

Saying that nobody uses the instructions since SSE2 is not right. Intel promotes the instruction sets less with marketing, so you see less about it, but it is still happening in the background, as before.

One of the reasons for the decrease in visibility is threading. Programmers all around the world understood that we are serious about going many-core, and we are not joking.
In the case of the modern video encoder, custom hardware can't really get close to a good instruction set. I am working on one encoder, this encoder is pretty famous, and it does check for faces in the encoding phase, to put more bits where it matters, using HMM systems. This can only be done with an instruction set; if you try to design custom hardware for all of those cases, you'll end up with a die as big as a stadium.
In the case of the video encoder I described, you can encode with half of the bandwidth of other encoders, and you still get the same level of quality.

I don't know why you guys call that a mess, it is actually very well organized if you understand the logic of it. Did you spend enough time looking at it?
Separate the MMX-like and SIMD FP kinds, a few system management related instructions, and that's it. No big mess ...

What makes it a big mess for you?

who?
this is my personal opinion


PostPosted: Mon Mar 17, 2008 7:59 pm 
Joined: Sat Sep 01, 2007 8:01 am
Posts: 652
Hans de Vries wrote:
The never-ending instruction set extension mess... With an unknown
amount of behind-the-scenes quarreling between Microsoft, Intel and AMD.

These are Intel's official ISA extensions.

45nm - Penryn: SSE4.1 - media acceleration
45nm - Nehalem: SSE4.2 - string and text processing.
32nm - Westmere: AES NI - encryption

One could sense a tendency to avoid or delay the use of the term "SSE5"?

SSE5 showed up first in Microsoft documents about its Phoenix future
compiler project.

http://www.msakademik.net/academicdays2 ... tazavi.ppt
http://research.microsoft.com/phoenix/

Then (Aug 2007) AMD announced SSE5 support for future processors.

SSE5 is mostly fused Multiply/Add-Accumulate, but it is also AES
encryption like in Intel's Westmere, the 32nm shrink of Nehalem.

Intel claims a 3x AES performance improvement for Westmere with AES-NI.
AMD claims a 5x AES performance improvement due to SSE5 for
Bulldozer (or with future 45nm Phenoms now??)

The question is if Microsoft is trying to 'promote' SSE5 to Intel and
if Intel is playing hard-to-get in order to constrain (developer)
ambitions to manageable proportions.

The rest of the developer community must hope that any
behind-the-scenes negotiations do lead to convergence. The ironic thing
is that all these extensions are much better done by ASICs or FPGAs.

Standardized high-level algorithmic interfaces for commonly used special
purpose functions are of much more value than low-level instruction set
extensions, and they can be implemented in microcode as well, using
undocumented, non-legacy-polluting instruction set extensions
which can be changed/improved from one generation to the next.

Or better, if gate budgets allow, delegated to special purpose high
level units, like we have now for video encoding and 3D graphics.



Regards, Hans.


One of the issues with AMD designing an instruction set is their short-sighted approach. For example, http://developer.amd.com/assets/Develop_Brighton_Justin_Boggs-2.pdf
In this document, they explain that you have to code with the /Blend flags and:
For best performance on 128-bit SSE:
– Replace MOVLPD / MOVHPD pairs with MOVUPD or MOVDDUP
– Replace MOVLPD-mem with MOVSD-mem (upper 64 bits are zeroed)
– Replace MOVSD-reg with MOVAPD
– Take advantage of misaligned load-op mode (special code path required)

What they do not explain is that for 3 years they advocated splitting the 128-bit instructions into two 64-bit ones, going against the natural evolution of the platform:
In 2005, "The MOVUPS, MOVUPD and MOVDQU instructions are VectorPath when one of the operands is a
memory location. It is better to use one of the MOVLPx/MOVHPx or MOVQ/MOVHPD pairs. It is
preferable to load or store the 64-bit halves of an XMM register separately when the memory
location cannot be guaranteed to be aligned."
Page 198 of http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
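
To make the two load strategies concrete, here is a minimal sketch in C using the SSE2 intrinsics from `<emmintrin.h>`. The function names `load_split` and `load_unaligned` are made up for illustration and do not come from either AMD document. Which variant is faster depends on the CPU generation, and a compiler may emit different instructions than the intrinsic names suggest. A plain `memcpy` fallback is included so the sketch also builds on non-x86 targets.

```c
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: load two doubles as two separate 64-bit halves
   (the MOVLPD + MOVHPD style from the 2005 recommendation). */
void load_split(const double *src, double dst[2])
{
#if defined(__SSE2__)
    __m128d v = _mm_setzero_pd();
    v = _mm_loadl_pd(v, src);      /* fill the low 64 bits  */
    v = _mm_loadh_pd(v, src + 1);  /* fill the high 64 bits */
    _mm_storeu_pd(dst, v);
#else
    memcpy(dst, src, 2 * sizeof(double));
#endif
}

/* Hypothetical helper: one unaligned 128-bit load (the MOVUPD style
   from the later recommendation). */
void load_unaligned(const double *src, double dst[2])
{
#if defined(__SSE2__)
    _mm_storeu_pd(dst, _mm_loadu_pd(src));
#else
    memcpy(dst, src, 2 * sizeof(double));
#endif
}
```

Both helpers produce the same result; only the way the 128-bit register is filled differs.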

They changed their mind many times like this, making it very confusing for the small number of 3DNow followers. I got hundreds of complaints about this in 64-bit mode.

When you do an instruction set, you have to follow logic, and the natural flow of technology. This little accident explains the lack of performance of Barcelona on x86-64 code: you don't get the benefit of the 128 bits. If they had looked a little further ahead, they could have avoided it.

Where I work, we have people checking for this kind of accident before it happens in the instruction set.

who?
this is my personal opinion.


PostPosted: Mon Mar 17, 2008 8:57 pm 
Joined: Sat Sep 01, 2007 8:01 am
Posts: 652
Speaking of new instructions and new architecture, you probably want to take a look at this:

http://www.intel.com/pressroom/archive/releases/20080317fact.htm

Who?


PostPosted: Mon Mar 17, 2008 10:32 pm 
Joined: Wed Jun 27, 2007 10:19 am
Posts: 331
Location: Milano, Italy
who? wrote:
Try to find a video decoder not using SSE2, it is fairly hard.

Did you actually read what I wrote? I didn't say that SSE2 is not used, I said that SSE2 *is* used, especially since it is mandated by x86-64. What I said is that the stuff introduced after is pretty much absent.

Quote:
try to find a video encoder not using SSE3, it is very difficult: LDDQU is very nice for the unaligned loads of the motion estimation, the CPU can deal with this now, but it is still a good help at the micro-code level.

I have disassembled the whole contents of /usr/bin and /usr/lib of my Fedora 8 installation (x86-64 naturally). Not a single trace of SSE3+ instructions and I've got all of the video/audio codecs installed.

Quote:
SSE4.1, with the dot products (DPPS, DPPD, etc.), was requested by video game programmers for years; now they have it, and they are using it. It takes 12 to 18 months to get to the market.

What would you use an inner dot-product for? What would be the benefit? Optimized game engines already pack data in SoA format because it is better suited to both SSE1/2 vectorization *and* for sending it to the GPU under both DirectX and OpenGL. That is if the processor touches the geometry, something which happens less and less considering the flexibility of the latest shader models for vertex shading (and yes, that also includes DX9).
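
As a rough illustration of that point, the sketch below computes 3-component dot products over SoA data using only vertical SSE1 multiplies and adds, four results per iteration, with no horizontal instruction involved. The function name `dot3_soa` and its parameters are assumptions made for this example, not code from any engine; a scalar loop handles the tail and serves as a fallback where SSE is unavailable.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* SoA inputs: ax/ay/az and bx/by/bz each hold n floats.
   out[i] = ax[i]*bx[i] + ay[i]*by[i] + az[i]*bz[i] */
void dot3_soa(const float *ax, const float *ay, const float *az,
              const float *bx, const float *by, const float *bz,
              float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Four dot products at once, one per SIMD lane. */
    for (; i + 4 <= n; i += 4) {
        __m128 r = _mm_mul_ps(_mm_loadu_ps(ax + i), _mm_loadu_ps(bx + i));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(ay + i), _mm_loadu_ps(by + i)));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(az + i), _mm_loadu_ps(bz + i)));
        _mm_storeu_ps(out + i, r);
    }
#endif
    /* Scalar tail (and full fallback without SSE). */
    for (; i < n; ++i)
        out[i] = ax[i] * bx[i] + ay[i] * by[i] + az[i] * bz[i];
}
```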

Quote:
Saying that nobody uses the instructions since SSE2 is not right. Intel promotes the instruction sets less with marketing, so you see less about it, but it is still happening in the background, as before.

You might be right, it is not true that nobody uses stuff above SSE2. There are a couple of benchmarks around which do use that kind of stuff, but you know, computers usually don't spend most of the time running benchmarks.

Quote:
In the case of the modern video encoder, custom hardware can't really get close to a good instruction set. I am working on one encoder, this encoder is pretty famous, and it does check for faces in the encoding phase, to put more bits where it matters, using HMM systems. This can only be done with an instruction set; if you try to design custom hardware for all of those cases, you'll end up with a die as big as a stadium.

You are joking right? Are you aware that most of the recent high-end mobile phone oriented SoCs do offer hardware acceleration for encoding/decoding HD streams in various formats using less area and power compared to any software solution around?

Quote:
I don't know why you guys call that a mess, it is actually very well organized if you understand the logic of it. Did you spend enough time looking at it?
Separate the MMX-like and SIMD FP kinds, a few system management related instructions, and that's it. No big mess ...

Oh, sure, keeping and testing 5 or 6 different code paths in a large application is not a mess.
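
For readers who have not maintained such code, here is a minimal sketch of what one of those multi-path setups tends to look like: a table of kernels, each tagged with the features it needs, and a selector that picks the first one the CPU satisfies. Everything here (`FEAT_*`, `path_id`, `select_kernel`, the stand-in `sum_generic`) is hypothetical; in a real application every table entry would be a separately written and separately tested implementation, which is exactly the maintenance burden in question.

```c
#include <stddef.h>

/* Hypothetical feature bits; a real program would derive them from CPUID. */
enum { FEAT_SSE2 = 1 << 0, FEAT_SSE3 = 1 << 1, FEAT_SSE41 = 1 << 2 };

/* Identifies which code path was chosen (illustrative only). */
enum path_id { PATH_GENERIC, PATH_SSE2, PATH_SSE3, PATH_SSE41 };

typedef float (*sum_fn)(const float *v, size_t n);

/* Stand-in kernel. Each path would normally have its own tuned version. */
static float sum_generic(const float *v, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += v[i];
    return s;
}

struct kernel {
    unsigned required;   /* feature bits this path needs */
    enum path_id id;
    sum_fn fn;
};

/* Ordered from most to least demanding; the last entry always matches. */
static const struct kernel kernels[] = {
    { FEAT_SSE41, PATH_SSE41,   sum_generic },
    { FEAT_SSE3,  PATH_SSE3,    sum_generic },
    { FEAT_SSE2,  PATH_SSE2,    sum_generic },
    { 0,          PATH_GENERIC, sum_generic },
};

/* Return the first kernel whose required features are all present. */
const struct kernel *select_kernel(unsigned features)
{
    const struct kernel *k = kernels;
    while ((features & k->required) != k->required)
        ++k;
    return k;
}
```

Every additional extension adds a row to the table and another implementation to keep correct.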


PostPosted: Mon Mar 17, 2008 10:47 pm 
Joined: Sat Sep 01, 2007 8:01 am
Posts: 652
Gabriele Svelto wrote:
who? wrote:
Try to find a video decoder not using SSE2, it is fairly hard.

Did you actually read what I wrote? I didn't say that SSE2 is not used, I said that SSE2 *is* used, especially since it is mandated by x86-64. What I said is that the stuff introduced after is pretty much absent.

Quote:
try to find a video encoder not using SSE3, it is very difficult: LDDQU is very nice for the unaligned loads of the motion estimation, the CPU can deal with this now, but it is still a good help at the micro-code level.

I have disassembled the whole contents of /usr/bin and /usr/lib of my Fedora 8 installation (x86-64 naturally). Not a single trace of SSE3+ instructions and I've got all of the video/audio codecs installed.

Quote:
SSE4.1, with the dot products (DPPS, DPPD, etc.), was requested by video game programmers for years; now they have it, and they are using it. It takes 12 to 18 months to get to the market.

What would you use an inner dot-product for? What would be the benefit? Optimized game engines already pack data in SoA format because it is better suited to both SSE1/2 vectorization *and* for sending it to the GPU under both DirectX and OpenGL. That is if the processor touches the geometry, something which happens less and less considering the flexibility of the latest shader models for vertex shading (and yes, that also includes DX9).

Quote:
Saying that nobody uses the instructions since SSE2 is not right. Intel promotes the instruction sets less with marketing, so you see less about it, but it is still happening in the background, as before.

You might be right, it is not true that nobody uses stuff above SSE2. There are a couple of benchmarks around which do use that kind of stuff, but you know, computers usually don't spend most of the time running benchmarks.

Quote:
In the case of the modern video encoder, custom hardware can't really get close to a good instruction set. I am working on one encoder, this encoder is pretty famous, and it does check for faces in the encoding phase, to put more bits where it matters, using HMM systems. This can only be done with an instruction set; if you try to design custom hardware for all of those cases, you'll end up with a die as big as a stadium.

You are joking right? Are you aware that most of the recent high-end mobile phone oriented SoCs do offer hardware acceleration for encoding/decoding HD streams in various formats using less area and power compared to any software solution around?

Quote:
I don't know why you guys call that a mess, it is actually very well organized if you understand the logic of it. Did you spend enough time looking at it?
Separate the MMX-like and SIMD FP kinds, a few system management related instructions, and that's it. No big mess ...

Oh, sure, keeping and testing 5 or 6 different code paths in a large application is not a mess.


As usual, you speak of your little lalaland and generalize from it:
Check DivX and Windows Media for LDDQU, together they represent more than half the video codec market. Your other comments are not any better.
Please give me a game that uses SoA... Please!!! lol! Most of them use Array of Structures, not Structure of Arrays, dude! I guess you never put your hands on any game code, and if you did, please tell me which one, because I think you did not. Talking about stuff without knowing again? I know many of those engines for sure, I work with them almost every day.

For your cellphone, well, look at the result of the encoded video, and the bit rate ... lol!

[EDIT]
I forgot, Structure of Arrays is a stupid system, it gives you something like
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
(here you open a lot of memory channels ... load ports don't like it!)
in memory,
Array of Structures is xyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyz
Better for memory ports, but bad for SIMD.

The solution is SOS (Thanks AlexK) (Structure of Structures)
XXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZ

This gives you perfect data locality, very good for load ports, and awesome for SIMD...

At least, you learned something today :)
[/EDIT]
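
The three layouts from the edit above can be written down as C types, which makes the memory order explicit. This is only a sketch: `NUM_VERTS`, the struct names, and the `*_index` helpers are invented for illustration, and the block width of four floats assumes 128-bit vectors of FP32 as in the XXXXYYYYZZZZ picture.

```c
#include <stddef.h>

#define NUM_VERTS 8   /* toy size, assumed to be a multiple of 4 */

/* Array of Structures: xyzxyzxyz... */
struct vec3 { float x, y, z; };
struct aos  { struct vec3 v[NUM_VERTS]; };

/* Structure of Arrays: xxxx... yyyy... zzzz... */
struct soa  { float x[NUM_VERTS], y[NUM_VERTS], z[NUM_VERTS]; };

/* "SOS": blocks of four x, four y, four z, repeated. */
struct block4 { float x[4], y[4], z[4]; };
struct sos    { struct block4 b[NUM_VERTS / 4]; };

/* Position, counted in floats from the start of the buffer, of
   component c (0 = x, 1 = y, 2 = z) of vertex i in each layout. */
size_t aos_index(size_t i, size_t c) { return i * 3 + c; }
size_t soa_index(size_t i, size_t c) { return c * NUM_VERTS + i; }
size_t sos_index(size_t i, size_t c) { return (i / 4) * 12 + c * 4 + (i % 4); }
```

In the blocked layout, the x, y and z values of any four consecutive vertices sit within one 48-byte block, while each group of four still loads as a full SIMD register.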

who?
PS: I copied your posting style, hirritating , isn't it?


PostPosted: Tue Mar 18, 2008 12:40 am 
Joined: Sun Mar 16, 2008 3:20 pm
Posts: 82
who? wrote:
Speaking of new instructions and new architecture, you probably want to take a look at this:

http://www.intel.com/pressroom/archive/releases/20080317fact.htm
Thanks for the link, I guess Hans will like the high-res Nehalem die picture ;-)

In the pdfs one can also find hard Nehalem info now:
32+32 KB L1
256 KB L2
8 MB L3
2-level TLB


Seems like "the mess" goes on... Intel is introducing 3-operand 256-bit instructions in 2010, and AMD is introducing 128-bit 3-operand instructions at the end of 2008, if the PCGH information is correct.

Grrrrreat ....


As for your statement about AMD's short-sighted instruction sets, I really like your sentence here:
Quote:
Where I work, we have people checking for this kind of accident before it happens in the instruction set.
Seems like you work for VIA then ...

Intel's general optimization guidelines are full of advice like: on the Core architecture do this, but that will degrade performance on NetBurst CPUs, so use that instead ...

CPU architectures evolve, that's nothing new. Intel introduced 128-bit SSE execution with Core 2 CPUs (named "Digital Media Boost"), AMD did it now with Barcelona, thus the old implementations are not 100% perfect, business as usual.

The performance deficits of Barcelona are in my opinion due to "old" software which does not know the K10. It just recognizes an AMD CPU and chooses the 3DNow/x87 code path. Good idea before K10, bad idea now, because the 128-bit units won't be used then :( ..

Another thing ... which I just noticed (probably already a very old joke, sorry if anybody is bored by this): What is the "Core architecture"?

I really had a good laugh: "The Core CPUs are not part of Intel's Core architecture."

cheers

Opteron


PostPosted: Tue Mar 18, 2008 1:51 am 
Joined: Fri Aug 17, 2007 2:55 pm
Posts: 357
Quote:
The performance deficits of Barcelona are in my opinion due to "old" software which does not know the K10. It just recognizes an AMD CPU and chooses the 3DNow/x87 code path. Good idea before K10, bad idea now, because the 128-bit units won't be used then :( ..


No performance sensitive code is going to fail to use SSE2 on K10s. There is a reason why there is a feature flag.
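
As a small sketch of what checking the feature flag rather than the vendor string can look like: the example below uses the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, which are one convenient way to read the CPUID feature bits on x86. The wrapper name `cpu_has_sse2` and the `-1` "unknown" return value are assumptions made for this example; other compilers need a different mechanism.

```c
/* Returns 1 if the CPU reports SSE2, 0 if it does not, and -1 if this
   toolchain/target offers no way to ask (hypothetical convention). */
int cpu_has_sse2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();   /* harmless if already initialized */
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return -1;
#endif
}
```

Code that branches on this result behaves the same on any vendor's CPU that sets the flag, which is the point being made above.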


PostPosted: Tue Mar 18, 2008 8:02 am 
Joined: Fri Aug 31, 2007 10:08 pm
Posts: 217
Location: Switzerland
who? wrote:

XXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZ
[/EDIT]



yeah sure, that's a very good layout to port from 128-bit vectors to 256-bit vectors, all structures mired in a 4*FP32 static layout, wow really, great idea !
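
To spell out the porting concern: in the blocked layout the vector width is baked into every address computation. The sketch below makes the block width a parameter, so the same vertex lands at different positions for 4 lanes (128-bit FP32) and 8 lanes (256-bit FP32). `blocked_index` and `lanes` are hypothetical names used only for this illustration.

```c
#include <stddef.h>

/* Position, counted in floats, of component c (0 = x, 1 = y, 2 = z) of
   vertex i in a blocked layout whose blocks hold `lanes` x values, then
   `lanes` y values, then `lanes` z values. */
size_t blocked_index(size_t i, size_t c, size_t lanes)
{
    size_t block = i / lanes;   /* which block the vertex falls in */
    size_t lane  = i % lanes;   /* its slot inside that block      */
    return block * 3 * lanes + c * lanes + lane;
}
```

Data stored with `lanes = 4` has to be rewritten, not just reinterpreted, before code written for `lanes = 8` can use it.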


PostPosted: Tue Mar 18, 2008 9:29 am 
Joined: Tue Aug 07, 2007 11:57 am
Posts: 190
Opteron wrote:
Thanks for the link, I guess Hans will like the high-res Nehalem die picture ;-) cheers Opteron


This one is better now. The triple DDR3 shows up clearly. The "one-row
DDR3-I/O-die" must have been a Photoshop cut rather than a real die.
one row DDR3 "die"

Regards, Hans

Image


PostPosted: Tue Mar 18, 2008 9:58 am 
Joined: Wed Jun 27, 2007 10:19 am
Posts: 331
Location: Milano, Italy
who? wrote:
As usual, you speak of your little lalaland and generalize from it:
Check DivX and Windows Media for LDDQU, together they represent more than half the video codec market.

My lalaland is a fairly popular Linux distro. BTW, I did the same thing here at work with Debian and found no SSE3 code either, which means that there isn't any in Ubuntu either. But I guess that your employer doesn't care about Linux, does it? Oh, and would you care to point to some hard data proving your claim that DivX and WMV (using which codec? and which version?) represent half of the market (what market? Stuff downloaded from P2P networks?).

Quote:
Your other comments are not any better.
Please give me a game that uses SoA... Please!!! lol! Most of them use Array of Structures, not Structure of Arrays, dude!
I guess you never put your hands on any game code, and if you did, please tell me which one, because I think you did not. Talking about stuff without knowing again? I know many of those engines for sure, I work with them almost every day.

Sure you do. And how exactly do engines using SSEx deal with arrays of structures? Do they unpack the vertexes (and their attributes, something you curiously forgot) every time they have to deal with them?

Quote:
For your cellphone, well, look at the result of the encoded video, and the bit rate ... lol!

LOL? You should pay attention to what you say: recent SoCs are capable of encoding 720p HD video and that's what is expected from them, since they will not be used as 'just' phones anymore. And your company is quite aware of it even if you aren't.

Quote:
I forgot, Structure of Arrays is a stupid system

That 'stupid system' is warmly recommended by your employer's optimization manuals.
Quote:
it gives you something like
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
(here you open a lot of memory channels ... load ports don't like it!)
in memory,

That's why your company designs processors with multi-way set-associative caches and multiple memory prefetchers, in case you were wondering about those.

Quote:
Array of Structures is xyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyzxyz
Better for memory ports, but bad for SIMD.

The solution is SOS (Thanks AlexK) (Structure of Structures)
XXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZXXXXYYYYZZZZ

This gives you perfect data locality, very good for load ports, and awesome for SIMD...

Eric already pointed out how wonderful it is to assume the size of your vectors (what about your comments on optimizing for AMD processors? If it is so bad to choose a granularity of 64-bit, why choose 128-bit when you know you're going to port it to 256?). Oh, and finally, how can your SoS structure be used with inner-vector dot products, since that was what we were talking about? How do you use those instructions with SoS? Do you de-interleave your data every time you use it? Vertex data can also be 8- or 16-bit integers, how do you interleave those? Do you throw away the potential memory savings and extend them to 32-bit? And in which part of your 3D pipeline do you use data formatted that way, considering that most of the grunt work on vertexes today is done on the GPU *anyway*?

Quote:
PS: I copied your posting style, hirritating , isn't it?

I usually answer your posts by addressing every point you make; you quoted the whole text of my post and didn't address many of the points I raised, so I don't understand exactly what you copied. Is coping with 5 or 6 different code paths easy? What are inner-vector dot-products used for, since the way you store data in memory to use them prevents vectorization? Oh, and the word you were looking for is irritating. Without an 'h'.

