Maybe We Don't Understand the implications of OCL and HSA

Enigmoid · Jun 23, 2013

Quicksync is usefull and intel has the backing to push it.

Without quicksync or openCL acceleration the test takes 226 seconds on the APU and 113 on the 4770k.

Cerb · Jun 23, 2013

monstercameron said:
this isnt necessarily about amd v. intel, rather apu v cpu, also if we are talking about the future, there is always kaveri that has steamroller cores with avx2 support and gcn core with massive ocl perf boost over vliw4...so lets keep the flame baiting to a minimum, please.

Yes, bring on the dual 128-bit FPUs, again (so their big 4-module CPU can compete with an i3). And, AFAIK, we're still very much in the dark as to how they are implementing non-FP AVX2. Intel's IGP is also already almost at par.

And yes, it is very much about AMD v. Intel, because they are the only two companies with CPUs that run Windows and/or OS X, which are where this tech is mostly being used. The software support infrastructure is still not quite there for non-x86, ATM.

SiliconWars · Jun 23, 2013

Cerb said:
Intel's IGP is also already almost at par.

If by "already" you mean "many years later" and by "almost at par" you mean "30%-40% slower", then yes you're right.

ViRGE · Jun 23, 2013

monstercameron said:
found theses benchmarks

you can see how the a10-6800k performs in a straight up cpu benchmark, not bad. The Intel parts do out perform it.

now if we look at the benchmark with the a10 6.8k having enabled avx and opencl acceleration, it outperforms the i7-3960x -in cpu transcoding.

This has me thinking that maybe we [by that I mean I] dont fully comprehend what a fully hsa compliant apu is capable of. Although it wont matter if devs dont take advantage of this.

I havent found any other benchmarks like this so its pretty much a cherrypick. In any case what do you think are the implications of HSA?

It's been a while since I've used MediaEspresso, but isn't "OpenCL" just their very poorly named option for using AMD's hardware encoder? I'm almost positive turning on OpenCL is actually turning on VCE, and turning on CUDA turns on NVENC for NVIDIA cards.

In which case that's not an example of HSA nor GPU compute; rather it's an example of using a dedicated DSP.

monstercameron · Jun 23, 2013

another one from the cherry picker.

Imouto · Jun 23, 2013

OpenCL performance is stellar with the new Haswell chips.

http://www.tomshardware.com/reviews/core-i7-4770k-haswell-review,3521-3.html

Dunno why you have to bash it so badly. You'll get OCL in virtually any device from now on and HSA in almost any non Intel/Nvidia stuff. As far as I got it, HSA is only a way to speed up even further OpenCL and the likes.

monstercameron · Jun 23, 2013

ViRGE said:
It's been a while since I've used MediaEspresso, but isn't "OpenCL" just their very poorly named option for using AMD's hardware encoder? I'm almost positive turning on OpenCL is actually turning on VCE, and turning on CUDA turns on NVENC for NVIDIA cards.

In which case that's not an example of HSA nor GPU compute; rather it's an example of using a dedicated DSP.

not so sure about that

MediaEspresso 6.5's support for AMD® Accelerated Parallel Processing (APP) and Fusion E-series & C-series Accelerated Processors technologies lets users leverage advanced hardware and software technologies that enable AMD graphics processors (GPU), working in concert with the system's central processor (CPU), to accelerate the video conversion process. This ensures more balanced system performance for faster handling of HD video transcoding.
In conjunction with AMD APP technology, MediaEspresso 6.5 accelerates the conversion of standard and HD video into multiple formats for use on various consumer electronics devices. Support is also available for UVD (Unified Video Decoder), a video decoding unit of AMD APP that supports the hardware decode of H.264 and VC-1 video codec, enabling MediaEspresso 6.5 to provide quick output of video content for playback on PSP, iPad, iPod, and iPhone platforms.

makes no mention of the fixed function encoders(also note that the fusion e&c-series dont have vce) It also specifically mentions amd app and nvidia cuda.

also note that the only hardware encoder it support is intel quicksync.

Super Performance

Intel Quick Sync Video enables MediaEspresso 6.5 to improve overall performance including encoding, previewing and simultaneous conversion of multiple video file formats.
The graph on the right shows the performance of MediaEspresso 6.5 in converting HD video content for output to a Sony PS3 game console using a 2nd generation Intel Core i7 processor with Intel Quick Sync Video technology compared to the same conversion using the previous generation Intel hardware. Optimizations for the new hardware within MediaEspresso 6 result in significantly shorter video rendering times.

http://www.cyberlink.com/products/mediaespresso/overview_en_US.html?&r=1

Khato · Jun 23, 2013

Imouto said:
Dunno why you have to bash it so badly. You'll get OCL in virtually any device from now on and HSA in almost any non Intel/Nvidia stuff. As far as I got it, HSA is only a way to speed up even further OpenCL and the likes.

The only reason why HSA enabled products will be faster than OpenCL is the unified memory feature. The rest of HSA is just AMD's latest promise to provide a decent programming framework.

As for unified memory, I've already seen a few rumors that it's something Broadwell also supports. So sure AMD might beat into to the punch with Kaveri, but that doesn't mean it's going to be the superior solution.

monstercameron · Jun 23, 2013

Khato said:
The only reason why HSA enabled products will be faster than OpenCL is the unified memory feature. The rest of HSA is just AMD's latest promise to provide a decent programming framework.

As for unified memory, I've already seen a few rumors that it's something Broadwell also supports. So sure AMD might beat into to the punch with Kaveri, but that doesn't mean it's going to be the superior solution.

stop focusing on intel v. amd, we are disqusing the current, possible and unseen advantages of hsa in an apu v. cpu.

do you have any idea how exactly unified address space will be advantageous?

Exophase · Jun 23, 2013

There are compute tasks where GPUs are faster than CPUs. That's not something we don't understand the implications of. If that weren't well established by now there wouldn't have ever been a compute hardware market.

Putting aside that encoding is a bad use case to highlight the benefits of GPU compute, the real question is if memory unification and hardware with tight integration for faster communication (IGPs) is enabling any real big compute benefits. This benchmark doesn't do anything to highlight that.

ViRGE · Jun 23, 2013

monstercameron said:
not so sure about that
makes no mention of the fixed function encoders(also note that the fusion e&c-series dont have vce) It also specifically mentions amd app and nvidia cuda.

also note that the only hardware encoder it support is intel quicksync.

http://www.cyberlink.com/products/mediaespresso/overview_en_US.html?&r=1

Without going too far into the gory details here, Cyberlink doesn't actually implement any of this stuff themselves (i.e. they haven't written any OpenCL/CUDA code for Espresso). What they do is call the appropriate video encode API the video drivers provide, and pass off the bulk of the work in that manner.

For AMD's very early parts without VCE, this was a GPU (but technically not OpenCL) accelerated path, which attempted to use the shaders to offload some of the encoding work without much success (AMD's encoder was single threaded, so it was always CPU bottlenecked). When VCE was introduced the API was kept, however using it shunted the job to the VCE block instead.

So what does this have to do with MediaEspresso? The description you've pulled was written before VCE was introduced (6.5 was launched in 2011, IIRC). It's not wrong per-se, but it's only applicable for AMD parts without the VCE encoder. So if Guru3D has turned on "OpenCL", what they've actually done is turn on VCE. Which given the scores makes far more sense anyhow.

Basically Cyberlink is being lazy here. When they use AMD's video encode API they call it OpenCL, when they use NVIDIA's they call it CUDA, and when they use Intel's they call it QuickSync. In practice with current drivers ME supports NVENC, VCE, and QuickSync.

monstercameron · Jun 23, 2013

Exophase said:
There are compute tasks where GPUs are faster than CPUs. That's not something we don't understand the implications of. If that weren't well established by now there wouldn't have ever been a compute hardware market.

Putting aside that encoding is a bad use case to highlight the benefits of GPU compute, the real question is if memory unification and hardware with tight integration for faster communication (IGPs) is enabling any real big compute benefits. This benchmark doesn't do anything to highlight that.

that was what i was getting at, I couldn't find any better examples to demonstrate it though.

frozentundra123456 · Jun 23, 2013

monstercameron said:
stop focusing on intel v. amd, we are disqusing the current, possible and unseen advantages of hsa in an apu v. cpu.

do you have any idea how exactly unified address space will be advantageous?

Come on, with your known support of a particular company, you had to know it was going to turn into AMD vs Intel debate, especially since you were touting how a cheap AMD processor was faster than a 1000.00 intel processor.

ViRGE · Jun 23, 2013

Khato said:
The only reason why HSA enabled products will be faster than OpenCL is the unified memory feature. The rest of HSA is just AMD's latest promise to provide a decent programming framework.

It's a bit more complex than that. Although shared memory is one of the big features necessary to enable HSA, the other "huge" feature for HSA is that HSA compliant GPUs will have to offer fast context switching. Currently GPUs are very slow at context switching (with all the registers and cache there's several MB of context to save). Full HSA will bring with it a new iteration of AMD's GCN design that will support fast context switching, which in turn will hopefully allow workloads that previously weren't possible to efficiently process due to the slow context switching.

monstercameron · Jun 23, 2013

frozentundra123456 said:
Come on, with your known support of a particular company, you had to know it was going to turn into AMD vs Intel debate, especially since you were touting how a cheap AMD processor was faster than a 1000.00 intel processor.

I havent turned it in to a troll thread, i was trying not to sound biased in my op

now if we look at the benchmark with the a10 6.8k having enabled avx and opencl acceleration, it outperforms the i7-3960x -in cpu transcoding.

This has me thinking that maybe we [by that I mean I] dont fully comprehend what a fully hsa compliant apu is capable of. Although it wont matter if devs dont take advantage of this.

I havent found any other benchmarks like this so its pretty much a cherrypick. In any case what do you think are the implications of HSA?

as virge pointed out the 1st 2 benches might actually be even further misrepresenting my point

frozentundra123456 · Jun 23, 2013

ViRGE said:
It's a bit more complex than that. Although shared memory is one of the big features necessary to enable HSA, the other "huge" feature for HSA is that HSA compliant GPUs will have to offer fast context switching. Currently GPUs are very slow at context switching (with all the registers and cache there's several MB of context to save). Full HSA will bring with it a new iteration of AMD's GCN design that will support fast context switching, which in turn will hopefully allow workloads that previously weren't possible to efficiently process due to the slow context switching.

So I dont completely understand this. Will HSA work with a discrete card? If so will it work with an intel chip or FX, or will it have to be paired with an APU? And if a powerful discrete card is paired with an APU and the IGP is not being used, will HSA still work?

Cerb · Jun 23, 2013

monstercameron said:
stop focusing on intel v. amd, we are disqusing the current, possible and unseen advantages of hsa in an apu v. cpu.

An AMD CPU with integrated IGP is an APU.
HSA was largely an initiative by AMD, and was certainly hyped most by AMD.

It's frankly impossible for this not to be an AMD v. ??? thread.

do you have any idea how exactly unified address space will be advantageous?

To work with the same memory in any practical capacity, the CPU or GPU must be able to get exclusive access to a cache line. To do so with dissimilar address spaces requires translation between the address spaces, which may have very different semantics for cache coherency (often mostly-software on the GPU side), different cache line sizes, if any concept of that at all, and different virtual memory management, if any at all, is going to be complicated, to such a point that right-minded people will just avoid it. More generally, just treating everything as non-sharable, if it is writable, such that buffers get copied back and forth a lot, which can result in being slower than just doing it all on the CPU.

By using the host CPU's virtual and physical address space, and cache coherency protocol, exclusive access becomes similar to using 2 CPUs. I'm not even sure how to describe it in generally accessible terms, so it would probably be wrong to say it's not complicated, but it should be no more complicated than software made to use multiple CPUs, which is all well-understood. It's the difference between, "sorry, but that just can't be done, without a government money pit budget," and, "hmmm, will it be worth the effort?"

BrightCandle · Jun 23, 2013

Speaking as as a developer I am not so sure I care too much about unified memory access. On the one hand it might allow me to do some operations on the CPU and others on the GPU, but that is going to get very messy very quickly and synchronization is going to really slow it down.

I care more about heterogeneous computing where I have some cores that are slower but more abundent and some faster cores that are good at dealing with branches and general code not compute heavy code. But for that to work I don't just need UMA I also need a similar instruction set or some way for two paths of code to be compiled from the same code such that it targets both platforms. So far the only company going this way is Intel with its Knights Bridge implementation. They seem to understand I don't want to recode everything just to make it run on a GPU, I want just to tweak it to make it perform better.

Its for this reason I think the GPU coprocessor approach will ultimately die out and become replaced with big and little cores using the same instruction set, x86.

ViRGE · Jun 23, 2013

frozentundra123456 said:
So I dont completely understand this. Will HSA work with a discrete card? If so will it work with an intel chip or FX, or will it have to be paired with an APU? And if a powerful discrete card is paired with an APU and the IGP is not being used, will HSA still work?

You would need an HSA compliant CPU and an HSA compliant dGPU. If AMD doesn't roll out a stand-alone Steamroller product, then I'm not sure if we'll ever see the former. In which case you'd in theory be able to use a dGPU + AMD APU as part of an HSA setup, but the big question about the potential performance hit (due to their vast distance apart) remains unanswered.

mrmt · Jun 23, 2013

monstercameron said:
stop focusing on intel v. amd, we are disqusing the current, possible and unseen advantages of hsa in an apu v. cpu.

do you have any idea how exactly unified address space will be advantageous?

It is clear by now that the GPU can be faster than the CPU in some tasks, specially in parallel ones. What Intel is doing with vector instructions is to bridge this gap. Sure, GPUs will still be faster, but you can have some benefits of GPGPU for free and offering a much larger user base, far larger than AMD and Nvidia will have to offer.

This is the crucial point of HSA. No matter how good it is, will developers recode all software just for the sake of a small minority on the market? I guess the answer will be no for most software you can think. If they are going to recode, it will be to take advantage of AVX2 and FMA3 instructions that AMD will have to support if they continue to work with x86.

It is counter-intuitive but on specific niches, where someone might have a budget and the will to recode the software for HSA, things will be even worse. Intel is already beating AMD on openCL, and this on the desktop, where Intel SKUs aren't gaining many cores as in servers. Can you imagine the OpenCL monster that Haswell-EP will be with 12 cores? It will not even need a GPU to be a monster. But... you need the best performance on the market, right? Why not optimize to Intel latest instruction set and Xeon Phi?

fusion238 · Jun 23, 2013

Adobe is a major developer collaborating with AMD and supporting OpenCL acceleration. With a great number of new Kaveri mobiles and desktops about to ship, they signal what will be a flood of broadbased HSA/OpenCL app support.

http://www.techpowerup.com/185977/a...th-adobe-photoshop-cc-and-premier-pro-cc.html

galego · Jun 23, 2013

monstercameron said:
found theses benchmarks

you can see how the a10-6800k performs in a straight up cpu benchmark, not bad. The Intel parts do out perform it.

now if we look at the benchmark with the a10 6.8k having enabled avx and opencl acceleration, it outperforms the i7-3960x -in cpu transcoding.

This has me thinking that maybe we [by that I mean I] dont fully comprehend what a fully hsa compliant apu is capable of. Although it wont matter if devs dont take advantage of this.

I havent found any other benchmarks like this so its pretty much a cherrypick. In any case what do you think are the implications of HSA?

Adobe is optimizing applications such as Photoshop and Premiere for AMD APUs + OpenCL, as shown by the poster above.

HSA is radically different to OpenCL. HSA introduces heterogeneous computing at the hardware level. You must read the next link. Developers working in HSA are finding a 500% increase in performance in their applications

http://www.expertreviews.co.uk/processors/1299913/the-big-interview-apus-hsa-and-where-next-for-amd

BallaTheFeared · Jun 23, 2013

galego said:
http://www.expertreviews.co.uk/processors/1299913/the-big-interview-apus-hsa-and-where-next-for-amd

I like their web browser and flash examples!

I have a problem taking cooperate slides at face value, it's the definition of cherry picking.

Vesku · Jun 23, 2013

Imouto said:
OpenCL performance is stellar with the new Haswell chips.

http://www.tomshardware.com/reviews/core-i7-4770k-haswell-review,3521-3.html

Dunno why you have to bash it so badly. You'll get OCL in virtually any device from now on and HSA in almost any non Intel/Nvidia stuff. As far as I got it, HSA is only a way to speed up even further OpenCL and the likes.

Intel may end up with HSA products or at least their own version. Look at Iris with its shared eDram.

ShintaiDK · Jun 24, 2013

fusion238 said:
Adobe is a major developer collaborating with AMD and supporting OpenCL acceleration. With a great number of new Kaveri mobiles and desktops about to ship, they signal what will be a flood of broadbased HSA/OpenCL app support.

http://www.techpowerup.com/185977/a...th-adobe-photoshop-cc-and-premier-pro-cc.html

Did OpenCL slow down that IB? Else it makes no sense.

Maybe We Don't Understand the implications of OCL and HSA

Platinum Member

Elite Member

Platinum Member

Elite Member, Moderator Emeritus

Diamond Member

Golden Member

Diamond Member

Golden Member

Diamond Member

Diamond Member

Elite Member, Moderator Emeritus

Diamond Member

Lifer

Elite Member, Moderator Emeritus

Diamond Member

Lifer

Elite Member

Diamond Member

Elite Member, Moderator Emeritus

Diamond Member

Member

Golden Member

Diamond Member

Diamond Member

Lifer