Aceshardware Forum
(not so) temporary home for the aceshardware community
 

Nvidia's CUDA
TacoBell



Joined: 17 Aug 2007
Posts: 266

Posted: Wed Jul 09, 2008 11:12 am

Quote:
It's certainly not video encoding.


Video encoding is exactly the kind of task that can be trivially parallelized across N cores. There are already products that will split the encode of a large job across N computers, where N is large, and can generate an encode bit-wise identical to the one you would get on a single computer (i.e. no loss in compression efficiency) by pre-searching for keyframes (e.g. a fade to black or a cut).

CUDA or Larrabee or whatever would let you store decent-quality video files and convert them in near real time when moving them onto a portable media player.
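
The segment-splitting idea can be sketched roughly as follows. This is a toy model, not how any shipping product works: frames are plain lists of pixel intensities, and `find_cuts`, `split_segments`, `encode_segment` and the threshold of 50 are all made-up names and values for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def find_cuts(frames, threshold=50):
    """Return the indices where a new segment starts (hypothetical cut detector).

    A cut is declared when the mean absolute difference between a frame
    and the previous one exceeds `threshold`.
    """
    cuts = [0]
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            cuts.append(i)
    return cuts

def split_segments(frames, cuts):
    """Slice the frame list into independent segments at the cut points."""
    bounds = cuts + [len(frames)]
    return [frames[s:e] for s, e in zip(bounds, bounds[1:])]

def encode_segment(segment):
    """Stand-in encoder: first frame is the keyframe, the rest are deltas."""
    deltas = [[c - p for p, c in zip(prev, cur)]
              for prev, cur in zip(segment, segment[1:])]
    return {"key": segment[0], "deltas": deltas}

def encode_parallel(frames, workers=4):
    """Encode each segment on its own worker; segments share no state."""
    segments = split_segments(frames, find_cuts(frames))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_segment, segments))

# Two hard cuts: at frame 3 and at frame 5.
frames = [[10] * 4, [12] * 4, [11] * 4, [200] * 4, [198] * 4, [0] * 4, [1] * 4]
cuts = find_cuts(frames)
encoded = encode_parallel(frames)
```

Because every segment starts with its own keyframe and references nothing outside itself, the parallel result in this sketch is the same as encoding the segments one after another.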
who?



Joined: 01 Sep 2007
Posts: 531

Posted: Mon Jul 14, 2008 12:12 am

TacoBell wrote:
Quote:
It's certainly not video encoding.


Video encoding is exactly the kind of task that can be trivially parallelized across N cores.


Well, video encoding is not as parallel as you may think. When you use N threads to encode video, you slice the encoding, and this generates N block-match buffers. After you are done with the motion estimation, you have to recombine the N lists of macroblock matches into one single list and reconcile the blocks that were detected twice or more.
This is extremely MIPS-consuming and not very parallel. If you do not do this, you get CUDA-BADABOOM: a very mediocre video compression ratio.
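
A rough sketch of that recombination step, under heavy assumptions: each match is a `(block_id, ref_position, cost)` tuple, and a block found by more than one thread is resolved by keeping the cheapest match. The tuple layout and the name `merge_matches` are invented for this example and do not come from any real encoder.

```python
def merge_matches(per_thread_matches):
    """Merge per-thread match lists, keeping the lowest-cost match per block.

    The merge runs on a single thread over all N lists, which is the
    serial part of the work.
    """
    best = {}
    for matches in per_thread_matches:
        for block_id, ref_pos, cost in matches:
            if block_id not in best or cost < best[block_id][1]:
                best[block_id] = (ref_pos, cost)
    return [(b, ref, cost) for b, (ref, cost) in sorted(best.items())]

# Block 1 was matched by both threads; thread B found the cheaper match.
thread_a = [(0, (0, 0), 12), (1, (16, 0), 40)]
thread_b = [(1, (18, 2), 7), (2, (32, 0), 3)]
merged = merge_matches([thread_a, thread_b])
```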

I wish the H264 code was more parallel, but if you push the envelope, you get the "Comcast-like" blocks as soon as motion increases (Comcast uses hardware parallel encoders).
Take CUDA-BADABOOM and try to encode a scene with high motion, like a scene from the movie "Crank". You'll understand what I mean: mosaic is not part of "Crank", but after encoding, you get some.

Being H264 compliant and compressing well is not easy. CUDA did BADABOOM, literally! Hehehe!

who?
TacoBell



Joined: 17 Aug 2007
Posts: 266

Posted: Mon Jul 14, 2008 10:01 am

Quote:

Well, video encoding is not as parallel as you may think. When you use N threads to encode video, you slice the encoding, and this generates N block-match buffers. After you are done with the motion estimation, you have to recombine the N lists of macroblock matches into one single list and reconcile the blocks that were detected twice or more.
This is extremely MIPS-consuming and not very parallel. If you do not do this, you get CUDA-BADABOOM: a very mediocre video compression ratio.


This is only going to be true within the span between keyframes. The large distributed encoding systems I've read about split at keyframes using a pre-encode search (for example cut scenes and fade-to-black), and then each segment is treated as if it were independent. If the source is long enough, it would seem this should scale very well, subject to memory limitations.
MadRat



Joined: 22 Jul 2007
Posts: 137

Posted: Mon Jul 14, 2008 12:37 pm

Memory throughput is the key for the majority of the editing, yes.
who?



Joined: 01 Sep 2007
Posts: 531

Posted: Mon Jul 14, 2008 2:40 pm

TacoBell wrote:
Quote:

Well, video encoding is not as parallel as you may think. When you use N threads to encode video, you slice the encoding, and this generates N block-match buffers. After you are done with the motion estimation, you have to recombine the N lists of macroblock matches into one single list and reconcile the blocks that were detected twice or more.
This is extremely MIPS-consuming and not very parallel. If you do not do this, you get CUDA-BADABOOM: a very mediocre video compression ratio.


This is only going to be true within the span between keyframes. The large distributed encoding systems I've read about split at keyframes using a pre-encode search (for example cut scenes and fade-to-black), and then each segment is treated as if it were independent. If the source is long enough, it would seem this should scale very well, subject to memory limitations.


This is valid for all frames; not matching the sub-threaded buffers decreases the compression ratio in EVERY case. What you read was not accurate. (Think about it: you have a gray background, and every thread will send the gray blocks to the delta frames. That is a simple example of a predictable catastrophe if you have a background of grass. You have to be able to use the pattern found in the first thread for the other threads, at least.)

What you read is accurate for H263, but in H264, depending on the level chosen, you have to be able to match blocks within +/- 16 frames. The profiles mostly used are 2.0 and 2.2, and the requirement is +/- 4 frames.
The CUDA encoder advertises 2.0, but I think it is only the 1.0 profile.
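
The gray-background example can be put into numbers with a toy model. It assumes a block pattern costs something only the first time it is seen and is free afterwards, which is far simpler than real H264 prediction; `blocks_emitted` and the pattern labels are invented for this sketch.

```python
def blocks_emitted(slices, shared):
    """Count blocks that must be sent in full (toy model).

    With shared=False every slice (thread) keeps its own set of seen
    patterns, so the same background block is sent once per slice.
    With shared=True later slices reuse what earlier slices found.
    """
    emitted = 0
    seen = set()
    for blocks in slices:
        if not shared:
            seen = set()  # each thread starts with no knowledge
        for pattern in blocks:
            if pattern not in seen:
                seen.add(pattern)
                emitted += 1
    return emitted

# Four slices of one frame, mostly gray background.
slices = [
    ["gray", "gray", "car"],
    ["gray", "gray", "tree"],
    ["gray", "sky", "gray"],
    ["gray", "gray", "gray"],
]
independent = blocks_emitted(slices, shared=False)
with_sharing = blocks_emitted(slices, shared=True)
```

In this model the independent threads send the gray block four times, once per slice, while the shared version sends it once.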

who?
who?



Joined: 01 Sep 2007
Posts: 531

Posted: Mon Jul 14, 2008 5:23 pm

Oh, I forgot: because you do not match in the keyframe properly, you will increase the size of the keyframe, forcing the deltas to be smaller, which means more good matches get rejected for bandwidth reasons.

A good compression ratio at a high level of quality is hard to get. If you can get your hands on a 280 and BADABOOM, test it yourself.

who?
P4man



Joined: 26 Jun 2007
Posts: 522

Posted: Mon Jul 14, 2008 6:32 pm

This is not exactly my field of expertise so I have no idea what you are talking about, but one (well, me) would think that different threads encoding different parts of a video stream (different *scenes*, and I believe pretty much all encoders detect a scene change) would share nothing, so for 99.9% of the streams out there, there wouldn't really be a problem, or would there? Perhaps, not sure, this approach eats more bandwidth, but that is what GPUs excel at anyhow.
TacoBell



Joined: 17 Aug 2007
Posts: 266

Posted: Mon Jul 14, 2008 10:12 pm

P4man wrote:
This is not exactly my field of expertise so I have no idea what you are talking about, but one (well, me) would think that different threads encoding different parts of a video stream (different *scenes*, and I believe pretty much all encoders detect a scene change) would share nothing, so for 99.9% of the streams out there, there wouldn't really be a problem, or would there? Perhaps, not sure, this approach eats more bandwidth, but that is what GPUs excel at anyhow.


I see his point, which was that even cut scenes have some overlap, even if by accident. Suppose a scene has some black and is then faded through black. If all of the compression is based on the nearest N frames, then to achieve bit-identical output you need real MT code, not the many-segment model, which works when you don't have compression across keyframes.
MadRat



Joined: 22 Jul 2007
Posts: 137

Posted: Mon Jul 14, 2008 11:26 pm

Without keyframes you can't seamlessly jump around the video. Rewinding and fast forwarding would screw up the synchronization.
P4man



Joined: 26 Jun 2007
Posts: 522

Posted: Tue Jul 15, 2008 8:31 am

TacoBell wrote:

I see his point, which was that even cut scenes have some overlap, even if by accident. Suppose.


Fair enough. But then why not have each thread render a few extra frames and then blend them together?
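
As a sketch of what "a few extra frames" per thread could look like, the snippet below only computes the overlapping frame ranges each thread would read. How the overlapping output would then be blended is left open, and `overlapped_ranges` and the `extra` parameter are hypothetical names for this example.

```python
def overlapped_ranges(cuts, total_frames, extra=2):
    """Return (start, end) frame ranges, each padded by `extra` frames.

    `cuts` holds the first frame index of every segment; ranges are
    half-open and clamped to the bounds of the video.
    """
    bounds = cuts + [total_frames]
    ranges = []
    for start, end in zip(bounds, bounds[1:]):
        ranges.append((max(0, start - extra), min(total_frames, end + extra)))
    return ranges

# Three segments starting at frames 0, 3 and 5 in a 7-frame clip.
ranges = overlapped_ranges([0, 3, 5], total_frames=7, extra=2)
```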
All times are GMT + 1 Hour