OpenCL OpenGL review interview at toms

BenchPress

Senior member
Nov 8, 2011
392
0
0
Pretty worthless when it depends on whether you have an APU or a discrete GPU. And even with an APU it only speeds up a fraction of the operations.

AVX2 will speed up everything.
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
Pretty worthless when it depends on whether you have an APU or a discrete GPU. And even with an APU it only speeds up a fraction of the operations.

AVX2 will speed up everything.

like opencl can't use avx2
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
OpenCL is the new hype.

Being an open standard helps. People used CUDA because it was the only option for years, but now OpenCL is gaining traction faster and pulling ahead because it isn't hardware specific. That's huge.

CUDA might still be the better option for HPC because it allows closer-to-metal programming and you can really squeeze out every little bit of performance possible. It also helps that AMD thought they could just ship out retail gaming-grade GPUs as HPC co-processors.

I'll bet that Intel begins to support OpenCL more over the coming years if they want to sleep in the same bed as Apple.
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
like opencl can't use avx2
Of course it can. And every other language and framework can use AVX2. OpenCL is limited to the lowest common denominator, while AVX2 is not.

But AVX2 isn't even mentioned in that article. It's clearly just an advertisement for AMD. And their heterogeneous computing will fail miserably against homogeneous high throughput computing using AVX2.
 

bronxzv

Senior member
Jun 13, 2011
460
0
71
Of course it can. And every other language and framework can use AVX2. OpenCL is limited to the lowest common denominator, while AVX2 is not.

But AVX2 isn't even mentioned in that article. It's clearly just an advertisement for AMD. And their heterogeneous computing will fail miserably against homogeneous high throughput computing using AVX2.

this OpenCL thingy looks more and more like a solution desperately seeking a problem. Good luck to them convincing ISVs to port big applications to this toy (not just a module or two, with the development paid for by the hardware vendor to have at least a few showcases). The primary side effects would be a performance regression vs native code on mainstream targets, more difficult maintenance (language restrictions, subpar development tools, lack of OpenCL-fluent coders), and stalled projects during the migration. I have probably missed a thing or two.
 

CPUarchitect

Senior member
Jun 7, 2011
223
0
0
That has to be one of the most biased articles I've seen in years:
Tom's Hardware said:
...today’s Sandy Bridge-based HD Graphics engines don’t support OpenCL (and we still haven't been able to get our hands on any Ivy Bridge-based Core i5 machines).
Reminds me of the PhysX87 fiasco, in which NVIDIA tried to convince the world that its GPUs are faster than CPUs when having the CPU run horrendously unoptimized code.

The least they could have done was run OpenCL on the CPU, which every Intel CPU produced in the last decade supports. Anand's tests show that Intel's CPUs are faster than AMD's iGPUs in that case.

And that's with an API designed for the GPU, not the CPU! Furthermore, next year we'll have CPUs with AVX2 which double the throughput and add gather support! It will change the world of computing.

Tom's article looks like a desperate attempt for AMD to sell heterogeneous computing, before homogeneous computing wipes the floor with it.
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
The least they could have done was run OpenCL on the CPU, which every Intel CPU produced in the last decade supports. Anand's tests show that Intel's CPUs are faster than AMD's iGPUs in that case.

i didn't get your point...
OpenCL is better than pure CPU for Intel as well in those tests
 

CPUarchitect

Senior member
Jun 7, 2011
223
0
0
i didn't get your point...
The point is that the Tom's Hardware article is horribly biased. They either should run optimized CPU code, or at the very least run OpenCL on the CPU.
OpenCL is better than pure CPU for Intel as well in those tests
I don't see any tests comparing OpenCL on the CPU versus pure CPU code.
 

CPUarchitect

Senior member
Jun 7, 2011
223
0
0
Ah, he might have mixed up the second and fifth result. Note that he says that "obviously Sandy Bridge saw no benefit from the OpenCL optimizations". It makes no sense that using generic OpenCL on the CPU defeats hand-tuned assembly code. Unless this compares pure CPU transcoding against GPU decode + CPU encode or something. :hmm:

Pardon me for the confusion. In any case the fact is that the CPU is way faster than Tom's article would have people believe. Comparing optimized OpenCL on the GPU against unoptimized CPU code is just shameful, especially since using OpenCL on the CPU is straightforward and would offer quite reasonable use of multi-core vector processing.

It makes me wonder whether AMD is so afraid of AVX2 that they need to cheat. They're clearly dodging any attempt at making a fair(er) comparison.
 

Riek

Senior member
Dec 16, 2008
409
14
76
What is it with some guys and AVX2??
There doesn't exist a cpu with AVX2 support.
There doesn't exist 1 program that supports AVX2
There doesn't exist compilers that support AVX2

Also, AVX2 is part of the chain that will be supported by both Intel and AMD... so using it to draw a rift between Intel and AMD is plain stupid, not even mentioning the fact that it doesn't exist yet.

OpenCL vs AVX2 is also a meaningless discussion... AVX2 is an instruction set.

People who believe AVX2 will outpace GPUs in raw power... are idiots.... AVX2 is an instruction set... It's completely disconnected from the speed of the hardware below it... just as OpenCL is completely disconnected from the operations it will eventually use to reach its goal.

If OpenCL does deliver such a huge increase, then it is worth investing in... seeing as most people already have unused resources that support OpenCL. It would have been great to have some more diversity in the test though.
 

bronxzv

Senior member
Jun 13, 2011
460
0
71
There doesn't exist 1 program that supports AVX2

we have one http://software.intel.com/en-us/forums/showthread.php?t=103133&o=a&s=lr and we are a small ISV

MKL http://software.intel.com/en-us/articles/intel-mkl/
AVX2 support has been there for several months
and IPP
http://software.intel.com/en-us/articles/intel-ipp/
as well. It means that all software linked with these is actually shipping with AVX2 code paths; that's probably more software already than all the OpenCL technology demos

There doesn't exist compilers that support AVX2

huh? are you sure ? how do you think that we spit out code ?
 

mikk

Diamond Member
May 15, 2012
4,180
2,213
136
The least they could have done was run OpenCL on the CPU, which every Intel CPU produced in the last decade supports. Anand's tests show that Intel's CPUs are faster than AMD's iGPUs in that case.


That's a bad example, since the Handbrake OpenCL version is developed by AMD and OpenCL didn't work on Intel hardware in the beta used for this test. It's boosted by DXVA, not OpenCL. So you see Intel DXVA vs AMD OpenCL here. You cannot expect OpenCL support for Intel if AMD is the developer of the software.
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
What is it with some guys and AVX2??
There doesn't exist a cpu with AVX2 support.
There doesn't exist 1 program that supports AVX2
There doesn't exist compilers that support AVX2
AVX2 brings GPU technology into the CPU cores. It offers the same computing power, without the overhead or limitations. So there's plenty of reason to get excited over AVX2.

And yes, no CPU supports it yet. But neither does any APU today support a unified address space and context switches. That's only planned to be complete by 2014. So AVX2 will get there sooner.

GCC 4.7 supports AVX2, LLVM 3.1 supports AVX2 and Visual Studio 2012 supports AVX2. So compilers are well ahead of schedule too.
Also, AVX2 is part of the chain that will be supported by both Intel and AMD... so using it to draw a rift between Intel and AMD is plain stupid, not even mentioning the fact that it doesn't exist yet.
Because AMD has yet to announce that they'll support AVX2. It's inevitable that they will, but they'd rather have people use HSA instead. In other words they're betting the farm on other technology. Looking at what can already be achieved with AVX, and all the phenomenal things added by AVX2, that's really going to turn out to be a big mistake on AMD's part.

Just like NVIDIA realized, they should back away from making compromises to graphics performance for the sake of GPGPU. General purpose computing is what the CPU is for, and AVX2 adds a lot more oomph to it. Heterogeneous computing doesn't scale, due to the round-trip latency and bandwidth bottleneck. So the GPU should concentrate on pure graphics only, which is a one-way process.
OpenCL vs AVX2 is also a meaningless discussion... AVX2 is an instruction set.
It's not really OpenCL versus AVX2. It's homogeneous versus heterogeneous general purpose throughput computing. OpenCL is just one way to get code auto-vectorized. But AVX2 supports many more programming languages and frameworks. So it's not a question of one or the other. Indeed as you indicate, one is hardware and the other is software. That said, OpenCL may not survive long after homogeneous computing proves to be superior, since it will have to compete against other languages which have fewer restrictions.

AVX2 can be used by any language as-is. All you need is loops with independent iterations to auto-vectorize them. AVX2's gather support is critical in enabling that. And it means developers can use languages they already know and love, instead of trying to shoehorn things into the OpenCL framework and losing performance on heterogeneous architectures.
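
To make the gather point concrete: a loop like out[i] = table[idx[i]] has independent iterations, but its loads are indexed rather than contiguous, which is where a gather instruction comes in. The following is only a pure-Python model of the idea, not real SIMD code. The function names and the 8-lane width (a 256-bit register holding 32-bit elements) are assumptions made for illustration.

```python
LANES = 8  # assumed: 256-bit register / 32-bit elements


def scalar_lookup(table, idx):
    """Plain loop: one indexed load per iteration, iterations are independent."""
    out = []
    for i in idx:
        out.append(table[i])
    return out


def gather(table, indices):
    """Model of a single gather: lane k receives table[indices[k]]."""
    return [table[i] for i in indices]


def vectorized_lookup(table, idx):
    """Same loop processed LANES indices at a time, one gather per chunk.

    The final chunk may be shorter than LANES; real vector code would
    handle it with a mask or a scalar remainder loop.
    """
    out = []
    for start in range(0, len(idx), LANES):
        out.extend(gather(table, idx[start:start + LANES]))
    return out


table = [x * 0.5 for x in range(100)]
idx = [3, 97, 42, 0, 7, 7, 15, 64, 1, 99, 50]
print(vectorized_lookup(table, idx) == scalar_lookup(table, idx))
```

Because no iteration depends on another, the chunked version produces the same result as the scalar loop, which is the property a vectorizer needs.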
People who believe AVX2 will outpace GPUs in raw power... are idiots.... AVX2 is an instruction set... It's completely disconnected from the speed of the hardware below it...
Sure, it depends on the underlying hardware whether it's a high performance implementation or not. But that's equally true for GPUs!

Haswell's implementation of AVX2 will have three 256-bit execution units per core. Two of these will be capable of FMA operations, resulting in a peak performance of 500 GFLOPS for a quad-core. On a performance/area metric that's actually quite close to any GPU. And you don't lose any of the existing CPU qualities like far superior sequential speed, large cache space per thread, branch prediction to prevent stalls, etc.
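
The 500 GFLOPS figure can be checked with a quick back-of-the-envelope calculation. This is only a sketch: the 3.9 GHz clock is an assumption (no clock speed is given above), the function name is made up for illustration, and it counts single-precision lanes with each FMA treated as two floating-point operations.

```python
def peak_gflops(cores, fma_units, vector_bits, clock_ghz, float_bits=32):
    """Theoretical peak in GFLOPS: cores x FMA units x lanes x 2 ops x GHz."""
    lanes = vector_bits // float_bits          # 256 / 32 = 8 floats per register
    flops_per_cycle = fma_units * lanes * 2    # an FMA is a multiply plus an add
    return cores * flops_per_cycle * clock_ghz


# Quad-core, two 256-bit FMA units per core, assumed 3.9 GHz clock.
print(peak_gflops(cores=4, fma_units=2, vector_bits=256, clock_ghz=3.9))
```

Under those assumptions a core does 32 single-precision operations per cycle, so a quad-core lands just under 500 GFLOPS. This is a theoretical peak that only holds when every cycle issues two FMAs.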

Last but not least, AVX2 is not the end of the road. The encoding format supports extending it up to 1024-bit registers. This can be used to lower the power consumption of the CPU's front-end and out-of-order execution, by executing AVX-1024 instructions in four cycles (i.e. same ALU throughput for four times less power consumption in the rest of the pipeline). This would effectively make the CPU behave much more like a GPU in terms of power consumption. So heterogeneous computing won't have any benefits left.
 

gorobei

Diamond Member
Jan 7, 2007
3,716
1,078
136
the benchmarks are slightly cherry picked, but rather than seeing it as endorsing a particular brand I view it as confirmation that there are enough applications that will benefit from parallelization regardless of hardware vendor. they are getting significant enough results to justify coding to OCL.

the more relevant part was the interview with Russell Williams on the pros/cons of keeping the data on chip versus shipping it out over the pcie bus or to memory. in the s/a video, Andrew Richards also reiterates the issue: power and latency costs of going from cpu to gpu vs staying on single silicon (apu). John Carmack touched on it in his ID tech speech last year as well.

There are some apps where the sheer number of units (gpu) is faster/better. There are others where it's better to keep it on die and not wait while you pack up all the data, ship it out over the bus, and wait for the results to come back (apu).
Russell Williams: I don't have numbers off the top of my head, but think of a 16-megapixel DSLR image. Say you want to do something, like modifying the tilt of the blur plane in the blur gallery, and you want to get feedback in real-time—30 to 60 FPS. Then you have to composite the result with 50 other layers, and that compositing needs to be done back on the CPU, because the entire compositing engine isn't done on the GPU. So copying data back at 60 FPS, you're copying the full image that's being processed two or three times per frame. Suddenly, that PCIe doesn't look as fast as you originally thought.

....

Russell Williams: If you want to make a sandwich, and you invent a machine that can make your sandwich in two seconds, it still doesn't make sense to drive to New York to use the machine when you live in California. The shorter latency of the APU empowers us to use the GPU in all sorts of ways that don't make sense for discrete graphics.
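
To put rough numbers on the copy traffic described in the quote, here is a small sketch. The interview gives no figures, so the 4 bytes per pixel (8-bit RGBA) and the roughly 8 GB/s for one direction of a PCIe 2.0 x16 link are assumptions, and the function name is made up for illustration.

```python
def copy_traffic_gb_per_s(megapixels, bytes_per_pixel, fps, copies_per_frame):
    """Sustained transfer rate needed to copy the full image several times per frame."""
    bytes_per_frame = megapixels * 1_000_000 * bytes_per_pixel
    return bytes_per_frame * copies_per_frame * fps / 1e9


PCIE2_X16_GB_PER_S = 8.0  # assumed: PCIe 2.0 x16, one direction, before protocol overhead

for copies in (2, 3):
    need = copy_traffic_gb_per_s(16, 4, 60, copies)
    share = need / PCIE2_X16_GB_PER_S
    print(f"{copies} copies/frame: {need:.2f} GB/s, {share:.0%} of the link")
```

With these assumptions, two copies per frame at 60 FPS already use nearly the whole link, and three copies exceed it, before any compute time is counted.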

the other issue was ubiquity. coding for avx only covers intel users. coding for cuda eliminates amd gpu users. coding for directX-compute eliminates mac. OpenCL is open and everyone (amd/intel/arm) can be targeted by app writers. you may sacrifice some performance to the lowest common denominator, but it is a bigger customer base.

the real issue is will it be another java situation where you code once and optimize 20 times for all the different architectures.
 

piesquared

Golden Member
Oct 16, 2006
1,651
473
136
Wow, gotta love it. Everything gets pounded by OpenCL. Traditional CPUs are dinosaurs. Heterogeneous computing is where the future is at, guaranteed. Lots of big names signing on to the new HSA Foundation including ARM, TI, Imagination and MediaTek.
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
the benchmarks are slightly cherry picked, but rather than seeing it as endorsing a particular brand I view it as confirmation that there are enough applications that will benefit from parallelization regardless of hardware vendor. they are getting significant enough results to justify coding to OCL.
Are you kidding? There are no OpenCL results for NVIDIA nor for Intel. You can't say this is significant enough to justify coding for OpenCL when the vast majority of systems isn't even represented.
the real issue is will it be another java situation where you code once and optimize 20 times for all the different architectures.
That's exactly the big concern here. With AVX2 there is no concern because it's guaranteed to be faster since there's no heterogeneous latency or bandwidth issue.

OpenCL is a new "standard" where there is no need for one, leading to fragmentation and wildly varying performance. Just look at how the GTX 680 fails against a quad-core CPU! With homogeneous computing like AVX2, developers can use existing languages and get higher performance across the board with less effort.
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
OpenCL is a new "standard" where there is no need for one, leading to fragmentation and wildly varying performance. Just look at how the GTX 680 fails against a quad-core CPU! With homogeneous computing like AVX2, developers can use existing languages and get higher performance across the board with less effort.

...fragmentation and varying performance....

while avx2 = intel and maybe amd, one day
 

bronxzv

Senior member
Jun 13, 2011
460
0
71
OpenCL is a new "standard" where there is no need for one, leading to fragmentation and wildly varying performance. Just look at how the GTX 680 fails against a quad-core CPU!

I love this example because it's a 3D renderer. If they spent more time optimizing a native path with AVX instead of porting to OCL, I'm quite sure it would even be competitive with AMD single-chip GPUs, not to mention that high-end systems in DCC are 2-socket 8-core Xeon workstations, not a single quad-core.
 

bronxzv

Senior member
Jun 13, 2011
460
0
71
...fragmentation and varying performance....

while avx2 = intel and maybe amd, one day

more generally the question is about Open CL vs native code (ARM + Neon, x86 + AVXn, other...)

Open CL promoters
http://forums.anandtech.com/showpost.php?p=33559491&postcount=4
now talk about adding another layer on top of OpenCL to support more languages and to obfuscate OCL code, i.e. they acknowledge that the "L" part of OpenCL isn't important, so the only aim is to become yet one more IL/VM
 