OpenCL OpenGL review interview at toms

gorobei · Jun 11, 2012

http://www.tomshardware.com/reviews/photoshop-cs6-gimp-aftershot-pro,3208.html

Very nice coverage of developments in the OCL arena. The interview with Adobe Photoshop's head scientist, Russell Williams, suggests more possibilities on the APU side than on the discrete GPU side. Moving data across to the GPU's memory seems to be the bigger issue with OCL on a discrete GPU.

The benchmarks indicate the OCL benefits are significant even though it's still in the baby-steps stage.

BenchPress · Jun 11, 2012

Pretty worthless when it depends on whether you have an APU or a discrete GPU. And even with an APU it only speeds up a fraction of the operations.

AVX2 will speed up everything.

Olikan · Jun 11, 2012

BenchPress said:
Pretty worthless when it depends on whether you have an APU or a discrete GPU. And even with an APU it only speeds up a fraction of the operations.

AVX2 will speed up everything.

like opencl can't use avx2

Olikan · Jun 11, 2012

Sorry for the double post, but there is an awesome interview about OpenCL at SemiAccurate:

http://semiaccurate.com/2012/06/11/andrew-richards-talks-about-opencls-future/

ShintaiDK · Jun 12, 2012

OpenCL is the new hype.

pelov · Jun 12, 2012

ShintaiDK said:
OpenCL is the new hype.

Being an open standard helps. People used CUDA because it was the only option for years but now OpenCL is gaining faster traction and pulling ahead because it isn't hardware specific. That's huge.

CUDA might still be the better option for HPC because it allows closer-to-metal programming and you can really squeeze out every little bit of performance possible. It also helps that AMD thought they could just ship out retail gaming-grade GPUs as HPC co-processors.

I'll bet that Intel begins to support OpenCL more over the coming years if they want to sleep in the same bed as Apple.

BenchPress · Jun 12, 2012

Olikan said:
like opencl can't use avx2

Of course it can. And every other language and framework can use AVX2. OpenCL is limited to the lowest common denominator, while AVX2 is not.

But AVX2 isn't even mentioned in that article. It's clearly just an advertisement for AMD. And their heterogeneous computing will fail miserably against homogeneous high throughput computing using AVX2.

bronxzv · Jun 12, 2012

BenchPress said:
Of course it can. And every other language and framework can use AVX2. OpenCL is limited to the lowest common denominator, while AVX2 is not.

But AVX2 isn't even mentioned in that article. It's clearly just an advertisement for AMD. And their heterogeneous computing will fail miserably against homogeneous high throughput computing using AVX2.

This OpenCL thingy looks more and more like a solution desperately seeking a problem. Good luck to them convincing ISVs to port big applications to this toy (not just a module or two, with the development paid for by the hardware vendor to have at least a few showcases), with the primary side effects being a performance regression vs. native code on mainstream targets, more difficult maintenance (language restrictions, subpar development tools, lack of OpenCL-fluent coders), and stalled projects during the migration. I've probably missed a thing or two.

CPUarchitect · Jun 12, 2012

gorobei said:
http://www.tomshardware.com/reviews/photoshop-cs6-gimp-aftershot-pro,3208.html

That has to be one of the most biased articles I've seen in years:

Tom's Hardware said:
...today's Sandy Bridge-based HD Graphics engines don't support OpenCL (and we still haven't been able to get our hands on any Ivy Bridge-based Core i5 machines).

Reminds me of the PhysX87 fiasco, in which NVIDIA tried to convince the world that its GPUs are faster than CPUs by having the CPU run horrendously unoptimized code.

The least they could have done was run OpenCL on the CPU, which every Intel CPU produced in the last decade supports. Anand's tests show that Intel's CPUs are faster than AMD's iGPUs in that case.

And that's with an API designed for the GPU, not the CPU! Furthermore, next year we'll have CPUs with AVX2 which double the throughput and add gather support! It will change the world of computing.

Tom's article looks like a desperate attempt to sell heterogeneous computing on AMD's behalf, before homogeneous computing wipes the floor with it.

Olikan · Jun 12, 2012

CPUarchitect said:
The least they could have done was run OpenCL on the CPU, which every Intel CPU produced in the last decade supports. Anand's tests show that Intel's CPUs are faster than AMD's iGPUs in that case.

I didn't get your point...
OpenCL is better than pure CPU for Intel as well, in those tests.

CPUarchitect · Jun 12, 2012

Olikan said:
I didn't get your point...

The point is that the Tom's Hardware article is horribly biased. They either should run optimized CPU code, or at the very least run OpenCL on the CPU.

Olikan said:
OpenCL is better than pure CPU for Intel as well, in those tests

I don't see any tests comparing OpenCL on the CPU versus pure CPU code.

Olikan · Jun 12, 2012

CPUarchitect said:
I don't see any tests comparing OpenCL on the CPU versus pure CPU code.

CPUarchitect · Jun 12, 2012

Olikan said:

Ah, he might have mixed up the second and fifth result. Note that he says that "obviously Sandy Bridge saw no benefit from the OpenCL optimizations". It makes no sense that using generic OpenCL on the CPU defeats hand-tuned assembly code. Unless this compares pure CPU transcoding against GPU decode + CPU encode or something. :hmm:

Pardon me for the confusion. In any case the fact is that the CPU is way faster than Tom's article would have people believe. Comparing optimized OpenCL on the GPU against unoptimized CPU code is just shameful, especially since using OpenCL on the CPU is straightforward and would offer quite reasonable use of multi-core vector processing.

It makes me wonder whether AMD is so afraid of AVX2 that they need to cheat. They're clearly dodging any attempt at making a fair(er) comparison.

Riek · Jun 12, 2012

What is it with some guys and AVX2??
There doesn't exist a CPU with AVX2 support.
There doesn't exist 1 program that supports AVX2.
There don't exist compilers that support AVX2.

Also, AVX2 is part of the x86 extension chain that will be supported by both Intel and AMD... so using it to draw a rift between Intel and AMD is plain stupid, not even mentioning the fact that it doesn't exist yet.

OpenCL vs AVX2 is also a meaningless discussion... AVX2 is an instruction set.

People who believe AVX2 will outpace the GPU in raw power... are idiots... AVX2 is an instruction set... It's completely disconnected from the speed of the hardware below it... just as OpenCL is completely disconnected from the operations it will eventually use to reach its goal.

If OpenCL does deliver such a huge increase, then it is worth investing in it... seeing as most people already have unused resources that support OpenCL. It would have been great to have some more diversity in the tests, though.

bronxzv · Jun 12, 2012

Olikan said:

ROTFL, OpenCL faster than native code? It shows just how unoptimized the native baseline is, not even using AVX I suppose.

bronxzv · Jun 12, 2012

Riek said:
There doesn't exist 1 program that supports AVX2

we have one http://software.intel.com/en-us/forums/showthread.php?t=103133&o=a&s=lr and we are a small ISV

MKL http://software.intel.com/en-us/articles/intel-mkl/
has had AVX2 support for several months, and so has IPP
http://software.intel.com/en-us/articles/intel-ipp/
which means that all software linked with these is actually shipping with AVX2 code paths. That's probably more software already than all the OpenCL technology demos.
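The "AVX2 code paths" that libraries like MKL and IPP ship rely on picking an implementation at run time based on what the CPU supports. Below is a minimal Python sketch of that dispatch idea, assuming the CPU feature set has already been detected by some other means; the feature names, `dot_avx2`, and the table layout are hypothetical illustrations, not how MKL or IPP actually implement it.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def dot_generic(a: Sequence[float], b: Sequence[float]) -> float:
    # Baseline path: a plain scalar loop that runs on any CPU.
    return sum(x * y for x, y in zip(a, b))


def dot_avx2(a: Sequence[float], b: Sequence[float]) -> float:
    # Hypothetical stand-in for an AVX2/FMA-tuned path. It only mimics the
    # shape of such code by consuming 8 floats (one 256-bit register) per step.
    total = 0.0
    for i in range(0, len(a), 8):
        total += sum(x * y for x, y in zip(a[i:i + 8], b[i:i + 8]))
    return total


# Ordered from most to least specialized; each entry lists the CPU features it needs.
DISPATCH_TABLE: List[Tuple[FrozenSet[str], Kernel]] = [
    (frozenset({"avx2", "fma"}), dot_avx2),
    (frozenset(), dot_generic),  # empty requirement set: always usable
]


def select_kernel(cpu_features: FrozenSet[str]) -> Kernel:
    # Return the first (most specialized) kernel whose requirements are all present.
    for required, kernel in DISPATCH_TABLE:
        if required <= cpu_features:
            return kernel
    raise RuntimeError("no compatible kernel found")
```

The point of this design is that one binary runs everywhere: a CPU reporting `avx2` and `fma` gets the fast path, and anything older falls back to the generic loop.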

Riek said:
There don't exist compilers that support AVX2

Huh? Are you sure? How do you think we spit out code?

mikk · Jun 12, 2012

CPUarchitect said:
The least they could have done was run OpenCL on the CPU, which every Intel CPU produced in the last decade supports. Anand's tests show that Intel's CPUs are faster than AMD's iGPUs in that case.

That's a bad example, since the Handbrake OpenCL version is developed by AMD, and in this beta OpenCL didn't work on Intel hardware in this test. It's boosted by DXVA, not OpenCL, so what you see here is Intel DXVA vs. AMD OpenCL. You cannot expect OpenCL support for Intel if AMD is the developer of the software.

BenchPress · Jun 12, 2012

Riek said:
What is it with some guys and AVX2??
There doesn't exist a CPU with AVX2 support.
There doesn't exist 1 program that supports AVX2.
There don't exist compilers that support AVX2.

AVX2 brings GPU technology into the CPU cores. It offers the same computing power, without the overhead or limitations. So there's plenty of reason to get excited over AVX2.

And yes, no CPU supports it yet. But neither does any APU today support a unified address space and context switches. That's only planned to be complete by 2014. So AVX2 will get there sooner.

GCC 4.7 supports AVX2, LLVM 3.1 supports AVX2 and Visual Studio 2012 supports AVX2. So compilers are well ahead of schedule too.

Riek said:
Also, AVX2 is part of the x86 extension chain that will be supported by both Intel and AMD... so using it to draw a rift between Intel and AMD is plain stupid, not even mentioning the fact that it doesn't exist yet.

Because AMD has yet to announce that they'll support AVX2. It's inevitable that they will, but they'd rather have people use HSA instead. In other words they're betting the farm on other technology. Looking at what can already be achieved with AVX, and all the phenomenal things added by AVX2, that's really going to turn out to be a big mistake on AMD's part.

Just like NVIDIA realized, they should back away from making compromises to graphics performance for the sake of GPGPU. General purpose computing is what the CPU is for, and AVX2 adds a lot more oomph to it. Heterogeneous computing doesn't scale, due to the round-trip latency and bandwidth bottleneck. So the GPU should concentrate on pure graphics only, which is a one-way process.

Riek said:
OpenCL vs AVX2 is also a meaningless discussion... AVX2 is an instruction set

It's not really OpenCL versus AVX2. It's homogeneous versus heterogeneous general purpose throughput computing. OpenCL is just one way to get code auto-vectorized. But AVX2 supports many more programming languages and frameworks. So it's not a question of one or the other. Indeed as you indicate, one is hardware and the other is software. That said, OpenCL may not survive long after homogeneous computing proves to be superior, since it will have to compete against other languages which have fewer restrictions.

AVX2 can be used by any language as-is. All you need is loops with independent iterations to auto-vectorize them. AVX2's gather support is critical in enabling that. And it means developers can use languages they already know and love, instead of trying to shoehorn things into the OpenCL framework and losing performance on heterogeneous architectures.
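To make "loops with independent iterations" and gather concrete, here is a small Python emulation. It is not real SIMD: the 8-lane width simply assumes 256-bit registers holding 32-bit floats, and the function names are illustrative. The table-lookup loop has no dependency between iterations, so a vectorizer can run 8 of them at once, but its indexed load `table[idx[i]]` needs a gather. The running sum, by contrast, carries a dependency from each iteration to the next, so it can't be vectorized this naively.

```python
from typing import List, Sequence

LANES = 8  # assumed: 256-bit register / 32-bit float = 8 lanes


def lookup_scalar(table: Sequence[float], idx: Sequence[int], scale: float) -> List[float]:
    # Reference loop: every iteration is independent of the others.
    return [table[i] * scale for i in idx]


def gather(table: Sequence[float], indices: Sequence[int]) -> List[float]:
    # Emulates a vector gather: one indexed load per lane.
    return [table[i] for i in indices]


def lookup_vectorized(table: Sequence[float], idx: Sequence[int], scale: float) -> List[float]:
    # Strip-mined version: full LANES-wide chunks first, then a scalar remainder loop.
    out: List[float] = []
    main = len(idx) - len(idx) % LANES
    for start in range(0, main, LANES):
        lanes = gather(table, idx[start:start + LANES])  # one "gather"
        out.extend(v * scale for v in lanes)             # one "vector multiply"
    for i in idx[main:]:
        out.append(table[i] * scale)
    return out


def running_sum(xs: Sequence[float]) -> List[float]:
    # Loop-carried dependency: iteration i needs the result of iteration i - 1.
    acc = 0.0
    out: List[float] = []
    for x in xs:
        acc += x
        out.append(acc)
    return out
```

Without a gather instruction, the vectorized version would have to fall back to eight separate scalar loads per chunk, which is why gather matters for auto-vectorizing lookups like this.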

Riek said:
People who believe AVX2 will outpace the GPU in raw power... are idiots... AVX2 is an instruction set... It's completely disconnected from the speed of the hardware below it...

Sure, it depends on the underlying hardware whether it's a high performance implementation or not. But that's equally true for GPUs!

Haswell's implementation of AVX2 will have three 256-bit execution units per core. Two of these will be capable of FMA operations, resulting in a peak performance of 500 GFLOPS for a quad-core. On a performance/area metric that's actually quite close to any GPU. And you don't lose any of the existing CPU qualities like far superior sequential speed, large cache space per thread, branch prediction to prevent stalls, etc.
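The 500 GFLOPS figure can be checked with simple arithmetic. This sketch assumes single-precision floats (8 lanes per 256-bit register), counts an FMA as two FLOPs, and counts only the two FMA-capable units. It also assumes a clock of about 3.9 GHz: the post doesn't state a clock, and 3.9 GHz is simply the value at which the product lands near 500.

```python
def peak_gflops(cores: int, fma_units_per_core: int, vector_bits: int,
                element_bits: int, clock_ghz: float) -> float:
    lanes = vector_bits // element_bits  # 256 / 32 = 8 single-precision lanes
    flops_per_fma = 2                    # a fused multiply-add counts as a multiply plus an add
    flops_per_cycle = cores * fma_units_per_core * lanes * flops_per_fma
    return flops_per_cycle * clock_ghz


# Quad-core, 2 FMA units per core, 256-bit vectors, 32-bit floats, assumed 3.9 GHz.
haswell_sp = peak_gflops(4, 2, 256, 32, 3.9)  # about 499.2 GFLOPS
```

That works out to 128 single-precision FLOPs per cycle for the whole chip; with 64-bit doubles the lane count halves, so the peak halves too.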

Last but not least, AVX2 is not the end of the road. The encoding format supports extending it up to 1024-bit registers. This can be used to lower the power consumption of the CPU's front-end and out-of-order execution, by executing AVX-1024 instructions in four cycles (i.e. same ALU throughput for four times less power consumption in the rest of the pipeline). This would effectively make the CPU behave much more like a GPU in terms of power consumption. So heterogeneous computing won't have any benefits left.
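The four-cycle AVX-1024 idea can be put in numbers with a simplified model. This is only an illustration: AVX-1024 is hypothetical, and the model ignores latency, scheduling and register-file costs. A 1024-bit instruction is sequenced over four cycles on a 256-bit unit, so the ALU time for a given amount of work stays the same while the number of instructions passing through fetch, decode and out-of-order tracking drops by a factor of four.

```python
import math

ALU_BITS = 256  # width of each execution unit, as described for Haswell in the post


def instructions_needed(n_elements: int, vector_bits: int, element_bits: int = 32) -> int:
    # Number of vector instructions needed to cover n_elements.
    lanes = vector_bits // element_bits
    return math.ceil(n_elements / lanes)


def alu_cycles_per_instruction(vector_bits: int, alu_bits: int = ALU_BITS) -> int:
    # A vector wider than the ALU is executed in several back-to-back passes.
    return vector_bits // alu_bits


def alu_cycles(n_elements: int, vector_bits: int) -> int:
    # Total cycles the execution unit is kept busy.
    return instructions_needed(n_elements, vector_bits) * alu_cycles_per_instruction(vector_bits)
```

For 4,096 floats, AVX2 needs 512 instructions and AVX-1024 needs 128, yet both keep a 256-bit unit busy for 512 cycles.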

bronxzv · Jun 12, 2012

BenchPress said:
since it will have to compete against other languages which have fewer restrictions.

Exactly this, plus the enormous cost of porting legacy applications: a lot of pain with no gain.

gorobei · Jun 12, 2012

the benchmarks are slightly cherry picked, but rather than seeing it as endorsing a particular brand I view it as confirmation that there are enough applications that will benefit from parallelization regardless of hardware vendor. they are getting significant enough results to justify coding to OCL.

The more relevant part was the interview with Russell Williams on the pros/cons of keeping the data on chip versus shipping it out over the PCIe bus or to memory. In the S|A video, Andrew Richards also reiterates the issue: the power and latency costs of going from CPU to GPU vs. staying on a single piece of silicon (APU). John Carmack touched on it in his id Tech speech last year as well.

There are some apps where the sheer number of units (GPU) is faster/better. There are others where it's better to keep the data on die and not wait while you pack it all up, ship it out over the bus, and wait for the results to come back (APU).

Russell Williams: I don't have numbers off the top of my head, but think of a 16-megapixel DSLR image. Say you want to do something, like modifying the tilt of the blur plane in the blur gallery, and you want to get feedback in real-time: 30 to 60 FPS. Then you have to composite the result with 50 other layers, and that compositing needs to be done back on the CPU, because the entire compositing engine isn't done on the GPU. So copying data back at 60 FPS, you're copying the full image that's being processed two or three times per frame. Suddenly, that PCIe doesn't look as fast as you originally thought.
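Williams's example can be turned into a rough bandwidth estimate. This sketch assumes 4 bytes per pixel (8-bit RGBA; higher bit depths would double or quadruple it) and counts each full-image copy across the bus once. The 8 GB/s comparison figure is the theoretical per-direction peak of a PCIe 2.0 x16 link; sustained rates are lower in practice.

```python
PCIE2_X16_GB_S = 8.0  # theoretical per-direction peak of PCIe 2.0 x16 (16 lanes x 500 MB/s)


def required_bandwidth_gb_s(megapixels: float, bytes_per_pixel: int,
                            copies_per_frame: int, fps: int) -> float:
    # Bytes moved per second for the given number of full-image copies per frame.
    bytes_per_copy = megapixels * 1e6 * bytes_per_pixel
    return bytes_per_copy * copies_per_frame * fps / 1e9


two_copies = required_bandwidth_gb_s(16, 4, 2, 60)    # about 7.68 GB/s
three_copies = required_bandwidth_gb_s(16, 4, 3, 60)  # about 11.52 GB/s
```

Under these assumptions, two copies per frame would eat nearly the whole link and three would exceed it, before any compute time is counted.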

....

Russell Williams: If you want to make a sandwich, and you invent a machine that can make your sandwich in two seconds, it still doesn't make sense to drive to New York to use the machine when you live in California. The shorter latency of the APU empowers us to use the GPU in all sorts of ways that don't make sense for discrete graphics.
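The sandwich analogy amounts to a break-even test: offloading only pays off when the accelerator's compute time plus the cost of getting the data there and back beats doing the work locally. Below is a minimal model of that trade-off. The `Link` type and all numbers are illustrative assumptions, not measurements, and modeling the on-die case as moving zero bytes is an idealization, since APUs of this era still lacked a fully unified address space.

```python
from dataclasses import dataclass


@dataclass
class Link:
    bandwidth_gb_s: float  # sustained transfer rate in GB/s
    latency_us: float      # fixed cost per transfer (setup + synchronization), microseconds


def offload_time_ms(bytes_in: float, bytes_out: float, compute_ms: float, link: Link) -> float:
    # Total time: ship inputs out, compute remotely, ship results back.
    transfer_s = (bytes_in + bytes_out) / (link.bandwidth_gb_s * 1e9)
    fixed_s = 2 * link.latency_us * 1e-6  # one trip out, one trip back
    return compute_ms + (transfer_s + fixed_s) * 1e3


def worth_offloading(local_ms: float, remote_compute_ms: float,
                     bytes_in: float, bytes_out: float, link: Link) -> bool:
    return offload_time_ms(bytes_in, bytes_out, remote_compute_ms, link) < local_ms


# Illustrative case: the accelerator computes 4x faster (5 ms vs 20 ms),
# but 64 MB must go out and 64 MB must come back.
bus = Link(bandwidth_gb_s=8.0, latency_us=10.0)
discrete = worth_offloading(20.0, 5.0, 64e6, 64e6, bus)  # False: about 21 ms total
on_die = worth_offloading(20.0, 5.0, 0, 0, bus)          # True: about 5 ms total
```

In this model the same 4x compute speedup turns into a net loss once the transfers dominate, which is the gist of both Williams's and Richards's points.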

The other issue was ubiquity. Coding for AVX only covers Intel users. Coding for CUDA eliminates AMD GPU users. Coding for DirectX compute eliminates Mac. OpenCL is open, and everyone (AMD/Intel/ARM) can be targeted by app writers. You may sacrifice some performance to the lowest common denominator, but it is a bigger customer base.

The real issue is whether it will be another Java situation, where you code once and optimize 20 times for all the different architectures.

piesquared · Jun 12, 2012

Wow, gotta love it. Everything gets pounded by OpenCL. Traditional CPUs are dinosaurs. Heterogeneous computing is where the future is at, guaranteed. Lots of big names are signing on to the new HSA Foundation, including ARM, TI, Imagination and MediaTek.

BenchPress · Jun 12, 2012

gorobei said:
the benchmarks are slightly cherry picked, but rather than seeing it as endorsing a particular brand I view it as confirmation that there are enough applications that will benefit from parallelization regardless of hardware vendor. they are getting significant enough results to justify coding to OCL.

Are you kidding? There are no OpenCL results for NVIDIA nor for Intel. You can't say this is significant enough to justify coding for OpenCL when the vast majority of systems isn't even represented.

gorobei said:
The real issue is whether it will be another Java situation, where you code once and optimize 20 times for all the different architectures.

That's exactly the big concern here. With AVX2 there is no concern because it's guaranteed to be faster since there's no heterogeneous latency or bandwidth issue.

OpenCL is a new "standard" where there is no need for one, leading to fragmentation and wildly varying performance. Just look at how the GTX 680 fails against a quad-core CPU! With homogeneous computing like AVX2, developers can use existing languages, and higher performance across the board with less effort.

Olikan · Jun 12, 2012

BenchPress said:
OpenCL is a new "standard" where there is no need for one, leading to fragmentation and wildly varying performance. Just look at how the GTX 680 fails against a quad-core CPU! With homogeneous computing like AVX2, developers can use existing languages, and higher performance across the board with less effort.

...fragmentation and varying performance....

while avx2 = intel and maybe amd, one day

bronxzv · Jun 12, 2012

BenchPress said:
OpenCL is a new "standard" where there is no need for one, leading to fragmentation and wildly varying performance. Just look at how the GTX 680 fails against a quad-core CPU!

I love this example because it's a 3D renderer. If they had spent more time optimizing a native path with AVX instead of porting to OCL, I'm quite sure it would even be competitive with AMD's single-chip GPUs, not to mention that high-end DCC systems are dual-socket 8-core Xeon workstations, not a single quad-core.

bronxzv · Jun 12, 2012

Olikan said:
...fragmentation and varying performance....

while avx2 = intel and maybe amd, one day

more generally the question is about Open CL vs native code (ARM + Neon, x86 + AVXn, other...)

OpenCL promoters
http://forums.anandtech.com/showpost.php?p=33559491&postcount=4
now talk about adding another layer on top of OpenCL to support more languages and to obfuscate OCL code, i.e. they acknowledge that the "L" part of OpenCL isn't important, so the only aim is to become yet another IL/VM.
