Maybe We Don't Understand the Implications of OCL and HSA


Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
QuickSync is useful and Intel has the backing to push it.



Without QuickSync or OpenCL acceleration the test takes 226 seconds on the APU and 113 seconds on the 4770K.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
This isn't necessarily about AMD vs. Intel, but rather APU vs. CPU. Also, if we are talking about the future, there is always Kaveri, which has Steamroller cores with AVX2 support and a GCN core with a massive OpenCL performance boost over VLIW4... so let's keep the flame baiting to a minimum, please.
Yes, bring on the dual 128-bit FPUs, again (so their big 4-module CPU can compete with an i3). And, AFAIK, we're still very much in the dark as to how they are implementing non-FP AVX2. Intel's IGP is also already almost on par.

And yes, it is very much about AMD v. Intel, because they are the only two companies with CPUs that run Windows and/or OS X, which are where this tech is mostly being used. The software support infrastructure is still not quite there for non-x86, ATM.
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
Found these benchmarks.

You can see how the A10-6800K performs in a straight-up CPU benchmark: not bad, though the Intel parts do outperform it.


Now if we look at the benchmark with the A10-6800K having AVX and OpenCL acceleration enabled, it outperforms the i7-3960X in CPU transcoding.

This has me thinking that maybe we [by that I mean I] don't fully comprehend what a fully HSA-compliant APU is capable of. Although it won't matter if devs don't take advantage of this.

I haven't found any other benchmarks like this, so it's pretty much a cherry-pick. In any case, what do you think the implications of HSA are?
It's been a while since I've used MediaEspresso, but isn't "OpenCL" just their very poorly named option for using AMD's hardware encoder? I'm almost positive turning on OpenCL is actually turning on VCE, and turning on CUDA turns on NVENC for NVIDIA cards.

In which case that's not an example of HSA nor GPU compute; rather it's an example of using a dedicated DSP.
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
It's been a while since I've used MediaEspresso, but isn't "OpenCL" just their very poorly named option for using AMD's hardware encoder? I'm almost positive turning on OpenCL is actually turning on VCE, and turning on CUDA turns on NVENC for NVIDIA cards.

In which case that's not an example of HSA nor GPU compute; rather it's an example of using a dedicated DSP.

Not so sure about that:
MediaEspresso 6.5's support for AMD® Accelerated Parallel Processing (APP) and Fusion E-series & C-series Accelerated Processors technologies lets users leverage advanced hardware and software technologies that enable AMD graphics processors (GPU), working in concert with the system's central processor (CPU), to accelerate the video conversion process. This ensures more balanced system performance for faster handling of HD video transcoding.
In conjunction with AMD APP technology, MediaEspresso 6.5 accelerates the conversion of standard and HD video into multiple formats for use on various consumer electronics devices. Support is also available for UVD (Unified Video Decoder), a video decoding unit of AMD APP that supports the hardware decode of H.264 and VC-1 video codec, enabling MediaEspresso 6.5 to provide quick output of video content for playback on PSP, iPad, iPod, and iPhone platforms.
It makes no mention of the fixed-function encoders (also note that the Fusion E- and C-series don't have VCE). It also specifically mentions AMD APP and Nvidia CUDA.

Also note that the only hardware encoder it supports is Intel Quick Sync.
Super Performance

Intel Quick Sync Video enables MediaEspresso 6.5 to improve overall performance including encoding, previewing and simultaneous conversion of multiple video file formats.
The graph on the right shows the performance of MediaEspresso 6.5 in converting HD video content for output to a Sony PS3 game console using a 2nd generation Intel Core i7 processor with Intel Quick Sync Video technology compared to the same conversion using the previous generation Intel hardware. Optimizations for the new hardware within MediaEspresso 6 result in significantly shorter video rendering times.

http://www.cyberlink.com/products/mediaespresso/overview_en_US.html?&r=1
 
Last edited:

Khato

Golden Member
Jul 15, 2001
1,225
281
136
Dunno why you have to bash it so badly. You'll get OCL in virtually any device from now on and HSA in almost any non-Intel/Nvidia stuff. As far as I understand it, HSA is just a way to speed up OpenCL and the like even further.

The only reason why HSA enabled products will be faster than OpenCL is the unified memory feature. The rest of HSA is just AMD's latest promise to provide a decent programming framework.

As for unified memory, I've already seen a few rumors that it's something Broadwell also supports. So sure, AMD might beat Intel to the punch with Kaveri, but that doesn't mean it's going to be the superior solution.
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
The only reason why HSA enabled products will be faster than OpenCL is the unified memory feature. The rest of HSA is just AMD's latest promise to provide a decent programming framework.

As for unified memory, I've already seen a few rumors that it's something Broadwell also supports. So sure, AMD might beat Intel to the punch with Kaveri, but that doesn't mean it's going to be the superior solution.

Stop focusing on Intel vs. AMD; we are discussing the current, possible, and as-yet-unseen advantages of HSA in an APU vs. a CPU.

Do you have any idea how exactly a unified address space will be advantageous?
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
There are compute tasks where GPUs are faster than CPUs. That's not something we don't understand the implications of. If that weren't well established by now there wouldn't have ever been a compute hardware market.

Putting aside that encoding is a bad use case to highlight the benefits of GPU compute, the real question is whether memory unification and tight hardware integration for faster communication (IGPs) enable any really big compute benefits. This benchmark doesn't do anything to highlight that.
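
(For anyone unfamiliar with where that cost shows up in practice, here is a rough, hypothetical OpenCL host-side sketch, not taken from any of the benchmarks in this thread, of the two buffer strategies in play: an explicit copy across the bus versus handing the runtime a host pointer that an integrated GPU's driver can often map without a bulk copy. Error handling is omitted, and whether the second path is truly zero-copy depends on the driver.)

Code:
/* Sketch of the copy overhead under discussion. Assumes an OpenCL 1.1+
   platform with a GPU device; error handling omitted for brevity. */
#include <CL/cl.h>
#include <stdlib.h>

#define N (1 << 24)

int main(void) {
    cl_platform_id plat;  cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    float *data = malloc(N * sizeof(float));

    /* Discrete-GPU style: allocate device memory and push the data across
       the bus before any kernel can touch it. */
    cl_mem d_copy = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                   N * sizeof(float), NULL, NULL);
    clEnqueueWriteBuffer(q, d_copy, CL_TRUE, 0, N * sizeof(float),
                         data, 0, NULL, NULL);

    /* Integrated-GPU style: hand the driver the host pointer. On an APU the
       runtime can often map this without a bulk copy ("zero-copy"). */
    cl_mem d_shared = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                                     N * sizeof(float), data, NULL);

    clReleaseMemObject(d_copy);
    clReleaseMemObject(d_shared);
    clReleaseCommandQueue(q);
    clReleaseContext(ctx);
    free(data);
    return 0;
}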
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
Not so sure about that:
It makes no mention of the fixed-function encoders (also note that the Fusion E- and C-series don't have VCE). It also specifically mentions AMD APP and Nvidia CUDA.

Also note that the only hardware encoder it supports is Intel Quick Sync.


http://www.cyberlink.com/products/mediaespresso/overview_en_US.html?&r=1
Without going too far into the gory details here, Cyberlink doesn't actually implement any of this stuff themselves (i.e. they haven't written any OpenCL/CUDA code for Espresso). What they do is call the appropriate video encode API the video drivers provide, and pass off the bulk of the work in that manner.

For AMD's very early parts without VCE, this was a GPU (but technically not OpenCL) accelerated path, which attempted to use the shaders to offload some of the encoding work without much success (AMD's encoder was single threaded, so it was always CPU bottlenecked). When VCE was introduced the API was kept, however using it shunted the job to the VCE block instead.

So what does this have to do with MediaEspresso? The description you've pulled was written before VCE was introduced (6.5 was launched in 2011, IIRC). It's not wrong per se, but it's only applicable to AMD parts without the VCE encoder. So if Guru3D has turned on "OpenCL", what they've actually done is turn on VCE. Which, given the scores, makes far more sense anyhow.

Basically Cyberlink is being lazy here. When they use AMD's video encode API they call it OpenCL, when they use NVIDIA's they call it CUDA, and when they use Intel's they call it QuickSync. In practice with current drivers ME supports NVENC, VCE, and QuickSync.
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
There are compute tasks where GPUs are faster than CPUs. That's not something we don't understand the implications of. If that weren't well established by now there wouldn't have ever been a compute hardware market.

Putting aside that encoding is a bad use case to highlight the benefits of GPU compute, the real question is whether memory unification and tight hardware integration for faster communication (IGPs) enable any really big compute benefits. This benchmark doesn't do anything to highlight that.

That was what I was getting at; I couldn't find any better examples to demonstrate it, though.
 
Aug 11, 2008
10,451
642
126
Stop focusing on Intel vs. AMD; we are discussing the current, possible, and as-yet-unseen advantages of HSA in an APU vs. a CPU.

Do you have any idea how exactly a unified address space will be advantageous?

Come on, with your known support of a particular company, you had to know it was going to turn into an AMD vs. Intel debate, especially since you were touting how a cheap AMD processor was faster than a $1,000 Intel processor.
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
The only reason why HSA enabled products will be faster than OpenCL is the unified memory feature. The rest of HSA is just AMD's latest promise to provide a decent programming framework.
It's a bit more complex than that. Although shared memory is one of the big features necessary to enable HSA, the other "huge" feature for HSA is that HSA-compliant GPUs will have to offer fast context switching. Currently GPUs are very slow at context switching (with all the registers and cache, there are several MB of context to save). Full HSA will bring with it a new iteration of AMD's GCN design that supports fast context switching, which in turn should allow workloads that previously couldn't be processed efficiently because of the slow context switching.
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
Come on, with your known support of a particular company, you had to know it was going to turn into an AMD vs. Intel debate, especially since you were touting how a cheap AMD processor was faster than a $1,000 Intel processor.

I haven't turned it into a troll thread; I was trying not to sound biased in my OP:

Now if we look at the benchmark with the A10-6800K having AVX and OpenCL acceleration enabled, it outperforms the i7-3960X in CPU transcoding.

This has me thinking that maybe we [by that I mean I] don't fully comprehend what a fully HSA-compliant APU is capable of. Although it won't matter if devs don't take advantage of this.

I haven't found any other benchmarks like this, so it's pretty much a cherry-pick. In any case, what do you think the implications of HSA are?

As ViRGE pointed out, the first two benches might actually be misrepresenting my point even further.
 
Aug 11, 2008
10,451
642
126
It's a bit more complex than that. Although shared memory is one of the big features necessary to enable HSA, the other "huge" feature for HSA is that HSA-compliant GPUs will have to offer fast context switching. Currently GPUs are very slow at context switching (with all the registers and cache, there are several MB of context to save). Full HSA will bring with it a new iteration of AMD's GCN design that supports fast context switching, which in turn should allow workloads that previously couldn't be processed efficiently because of the slow context switching.

So I don't completely understand this. Will HSA work with a discrete card? If so, will it work with an Intel chip or an FX, or will it have to be paired with an APU? And if a powerful discrete card is paired with an APU and the IGP is not being used, will HSA still work?
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Stop focusing on Intel vs. AMD; we are discussing the current, possible, and as-yet-unseen advantages of HSA in an APU vs. a CPU.
An AMD CPU with an integrated GPU is an APU.
HSA was largely an initiative by AMD, and was certainly hyped most by AMD.

It's frankly impossible for this not to be an AMD v. ??? thread.

Do you have any idea how exactly a unified address space will be advantageous?
To work with the same memory in any practical capacity, the CPU or GPU must be able to get exclusive access to a cache line. Doing that across dissimilar address spaces requires translation between them, and the two sides may have very different cache coherency semantics (often mostly software on the GPU side), different cache line sizes (if that concept exists at all), and different virtual memory management (if any at all). That gets complicated enough that right-minded people will just avoid it. More commonly, everything writable simply gets treated as non-sharable, so buffers get copied back and forth a lot, which can end up slower than just doing it all on the CPU.

By using the host CPU's virtual and physical address space, and its cache coherency protocol, exclusive access becomes similar to using two CPUs. I'm not even sure how to describe it in generally accessible terms, so it would probably be wrong to say it's not complicated, but it should be no more complicated than software written to use multiple CPUs, which is all well understood. It's the difference between, "sorry, but that just can't be done without a government money-pit budget," and, "hmmm, will it be worth the effort?"
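
(To make that concrete, here is a rough sketch of what "same pointer on both sides" looks like, using the shared virtual memory that OpenCL 2.0 later standardized. The kernel is hypothetical, error handling is omitted, and it assumes a device that reports fine-grained buffer SVM; the point is only that there is no clEnqueueWriteBuffer/ReadBuffer copy step at all.)

Code:
/* Sketch of a unified address space: CPU and GPU share one allocation.
   Assumes an OpenCL 2.0 device with fine-grained buffer SVM support. */
#include <CL/cl.h>

static const char *src =
    "__kernel void scale(__global float *p) {"
    "    size_t i = get_global_id(0);"
    "    p[i] *= 2.0f;"
    "}";

int main(void) {
    cl_platform_id plat;  cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueueWithProperties(ctx, dev, NULL, NULL);

    size_t n = 1 << 20;
    /* One allocation, visible to both the CPU and the GPU. */
    float *buf = clSVMAlloc(ctx, CL_MEM_READ_WRITE | CL_MEM_SVM_FINE_GRAIN_BUFFER,
                            n * sizeof(float), 0);
    for (size_t i = 0; i < n; i++) buf[i] = 1.0f;   /* CPU writes it... */

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "scale", NULL);
    clSetKernelArgSVMPointer(k, 0, buf);            /* ...GPU gets the same pointer... */
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);
    clFinish(q);

    float check = buf[0];                           /* ...CPU reads the result in place. */
    (void)check;

    clSVMFree(ctx, buf);
    clReleaseKernel(k);  clReleaseProgram(prog);
    clReleaseCommandQueue(q);  clReleaseContext(ctx);
    return 0;
}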
 
Last edited:

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
Speaking as a developer, I am not so sure I care too much about unified memory access. On the one hand it might allow me to do some operations on the CPU and others on the GPU, but that is going to get very messy very quickly, and synchronization is going to really slow it down.

I care more about heterogeneous computing where I have some cores that are slower but more abundant, and some faster cores that are good at dealing with branches and general code rather than compute-heavy code. But for that to work I don't just need UMA; I also need a similar instruction set, or some way for two paths of code to be compiled from the same source so that it targets both platforms. So far the only company going this way is Intel with its Knights Corner (Xeon Phi) implementation. They seem to understand that I don't want to recode everything just to make it run on a GPU; I just want to tweak it to make it perform better.

It's for this reason that I think the GPU coprocessor approach will ultimately die out and be replaced with big and little cores using the same instruction set, x86.
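
(As a rough sketch of "tweak it" rather than "recode it": a hypothetical saxpy loop, parallelized for ordinary multi-core x86 with a single OpenMP pragma. The same source recompiles unchanged for a many-core x86 part; the GPU route would instead mean rewriting it in a separate kernel language plus host-side buffer and queue management.)

Code:
/* Hypothetical example: same source for few big cores or many small x86 cores.
   Compile with -fopenmp (or equivalent); no GPU-specific rewrite needed. */
#include <stddef.h>

void saxpy(size_t n, float a, const float *x, float *y) {
    #pragma omp parallel for        /* the whole "port" is this one line */
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}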
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
So I don't completely understand this. Will HSA work with a discrete card? If so, will it work with an Intel chip or an FX, or will it have to be paired with an APU? And if a powerful discrete card is paired with an APU and the IGP is not being used, will HSA still work?
You would need an HSA compliant CPU and an HSA compliant dGPU. If AMD doesn't roll out a stand-alone Steamroller product, then I'm not sure if we'll ever see the former. In which case you'd in theory be able to use a dGPU + AMD APU as part of an HSA setup, but the big question about the potential performance hit (due to their vast distance apart) remains unanswered.
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
Stop focusing on Intel vs. AMD; we are discussing the current, possible, and as-yet-unseen advantages of HSA in an APU vs. a CPU.

Do you have any idea how exactly a unified address space will be advantageous?

It is clear by now that the GPU can be faster than the CPU in some tasks, especially parallel ones. What Intel is doing with vector instructions is bridging that gap. Sure, GPUs will still be faster, but you get some of the benefits of GPGPU for free, and on a much larger user base, far larger than anything AMD and Nvidia have to offer.

This is the crucial point of HSA. No matter how good it is, will developers recode all their software just for the sake of a small minority of the market? I guess the answer will be no for most software you can think of. If they are going to recode, it will be to take advantage of the AVX2 and FMA3 instructions that AMD will have to support anyway if they continue to work with x86.

It is counter-intuitive, but in the specific niches where someone might have the budget and the will to recode their software for HSA, things will be even worse. Intel is already beating AMD in OpenCL, and that's on the desktop, where Intel SKUs aren't gaining as many cores as in servers. Can you imagine the OpenCL monster that Haswell-EP will be with 12 cores? It won't even need a GPU to be a monster. But... if you need the best performance on the market, why not optimize for Intel's latest instruction set and Xeon Phi?
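
(For the sake of illustration, this is roughly what the AVX2/FMA3 path looks like: a hypothetical hand-written kernel doing eight fused multiply-adds per instruction on Haswell. Compile with -mavx2 -mfma; it assumes n is a multiple of 8 just to keep the sketch short.)

Code:
/* Hypothetical FMA3 kernel: y = a*x + y, eight floats per instruction. */
#include <immintrin.h>
#include <stddef.h>

void saxpy_fma(size_t n, float a, const float *x, float *y) {
    __m256 va = _mm256_set1_ps(a);
    for (size_t i = 0; i < n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        vy = _mm256_fmadd_ps(va, vx, vy);   /* fused multiply-add (FMA3) */
        _mm256_storeu_ps(y + i, vy);
    }
}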
 

galego

Golden Member
Apr 10, 2013
1,091
0
0
Found these benchmarks.

You can see how the A10-6800K performs in a straight-up CPU benchmark: not bad, though the Intel parts do outperform it.


Now if we look at the benchmark with the A10-6800K having AVX and OpenCL acceleration enabled, it outperforms the i7-3960X in CPU transcoding.

This has me thinking that maybe we [by that I mean I] don't fully comprehend what a fully HSA-compliant APU is capable of. Although it won't matter if devs don't take advantage of this.

I haven't found any other benchmarks like this, so it's pretty much a cherry-pick. In any case, what do you think the implications of HSA are?

Adobe is optimizing applications such as Photoshop and Premiere for AMD APUs + OpenCL, as shown by the poster above.

HSA is radically different from OpenCL; it introduces heterogeneous computing at the hardware level. You should read the link below. Developers working with HSA are finding a 500% increase in performance in their applications:

http://www.expertreviews.co.uk/processors/1299913/the-big-interview-apus-hsa-and-where-next-for-amd
 
Last edited:

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Last edited: