VISC CPU 3X the IPC?

Page 3 - AnandTech community

witeken

Diamond Member
Dec 25, 2013
The most useful part of that slide, IMO, is the comparison of the Apple Cyclone, ARM A15, ARM A57, and Haswell.

How accurate is it, though? IIRC, Saltwell's pipeline (16 stages) is at least as long as Haswell's. And what are the clock speeds?

Edit: nvm the pipeline comment because they used Silvermont, not Saltwell, but I think Cyclone's is longer.
 

Nothingness

Platinum Member
Jul 3, 2013
How accurate is it, though? IIRC, Saltwell's pipeline (16 stages) is at least as long as Haswell's. And what are the clock speeds?

Edit: nvm the pipeline comment because they used Silvermont, not Saltwell, but I think Cyclone's is longer.
Interesting comment about pipeline length... Haswell's branch misprediction penalty is 15-20 cycles, while Cyclone's is 14-19 cycles. To me this means two things: both CPUs have about the same depth (so the slide is not accurate), and Cyclone's pipe stages are rather short, which probably means its frequency could be raised well beyond the current level (though probably at too high a power cost).
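Nothingness's reading rests on the fact that a branch misprediction forces the core to discard the work in the stages between fetch and branch resolution, so the penalty in cycles tracks that part of the pipeline's depth. A minimal sketch of how that penalty feeds into average cycles per instruction, assuming a simple additive stall model; the workload figures (base CPI, branch fraction, mispredict rate) are made-up illustrative values, not measurements of either core:

```python
def cpi_with_mispredicts(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction after adding branch-misprediction stalls.

    Simple additive model (an assumption): every mispredicted branch stalls the
    core for `penalty_cycles` while the pipeline refills.
    """
    stalls_per_instruction = branch_fraction * mispredict_rate * penalty_cycles
    return base_cpi + stalls_per_instruction


# Illustrative workload only: 20% branches, 5% of them mispredicted, base CPI 0.5.
for name, penalties in (("Haswell", (15, 20)), ("Cyclone", (14, 19))):
    low, high = (cpi_with_mispredicts(0.5, 0.2, 0.05, p) for p in penalties)
    print(f"{name}: CPI {low:.3f} to {high:.3f}")
```

With penalty ranges this close, the two cores' stall cost differs by only about 0.01 CPI under these assumed numbers.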
 
Mar 10, 2006
Interesting comment about pipeline length... Haswell's branch misprediction penalty is 15-20 cycles, while Cyclone's is 14-19 cycles. To me this means two things: both CPUs have about the same depth (so the slide is not accurate), and Cyclone's pipe stages are rather short, which probably means its frequency could be raised well beyond the current level (though probably at too high a power cost).

Sounds like extra gas in the tank for future chips, particularly those built on 14/16nm FinFET.
 

Atreidin

Senior member
Mar 31, 2011
I'm skeptical. However I don't doubt that we'll see super-fans touting how awesome it is, even after real-world products don't show the predicted theoretical performance.
 

Enigmoid

Platinum Member
Sep 27, 2012
Interesting comment about pipeline length... Haswell's branch misprediction penalty is 15-20 cycles, while Cyclone's is 14-19 cycles. To me this means two things: both CPUs have about the same depth (so the slide is not accurate), and Cyclone's pipe stages are rather short, which probably means its frequency could be raised well beyond the current level (though probably at too high a power cost).

Good find, link is broken.

http://www.agner.org/optimize/microarchitecture.pdf
 

Spungo

Diamond Member
Jul 22, 2012
This is one of those images that looks nice in theory, but it doesn't really work in practice. Not yet, anyway. If there were a way to make a single virtual core from 1000 GPU cores or 8 CPU cores, AMD and Nvidia would have done it by now. They realize this is ridiculously hard to do, so they're approaching the problem from the opposite end: instead of making single-threaded code run better on multiple cores, make it easier to write multithreaded code. This is done by creating code libraries. AMD had their Stream project, Nvidia had CUDA. The open standard is OpenCL. Microsoft's is DirectCompute. C++ AMP is a GPU-focused library for Microsoft Visual C++. Visual C# has some multithreading libraries as well.

I'll give an example of how easy it is to create threaded software if the proper code libraries are available. Let's say I have an array of 100 numbers. I want to add 5 to every value in the array. Using current code libraries without multithreading, it would be something like this (pseudocode):
Code:
foreach (item in array) {
   item += 5;
}
This will go through each item of the array, one at a time, and add 5.

Without a library, it's difficult to split this up to use 5 or 10 processor cores at one time. If I have a library, it's very easy:
Code:
use threading;                       // hypothetical threading library
foreach.threaded.threads = 10;       // split the loop across 10 threads
foreach.threaded (item in array) {
   item += 5;
}
Now it's processing 10 at a time. Easy. Not everything can be split this easily, but it's a heck of a lot easier to do in software than it is to do in hardware.
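For comparison, here is the same split in a real language rather than pseudocode: a minimal sketch using Python's standard `concurrent.futures.ThreadPoolExecutor` in place of the hypothetical `foreach.threaded` library, with chunking into 10 pieces as an assumed strategy. One caveat: in CPython the global interpreter lock keeps threads from running pure-Python arithmetic in parallel, so this shows how little code the split takes rather than a real speedup; `ProcessPoolExecutor` or a language without a GIL would be needed for that.

```python
from concurrent.futures import ThreadPoolExecutor


def add_five_serial(values):
    """The single-threaded loop: one element at a time."""
    return [v + 5 for v in values]


def add_five_threaded(values, workers=10):
    """Split the array into up to `workers` chunks and process them on a thread pool."""
    chunk_size = max(1, (len(values) + workers - 1) // workers)  # ceiling division
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order the chunks were submitted,
        # so joining them back together preserves the original order.
        parts = list(pool.map(add_five_serial, chunks))
    return [v for part in parts for v in part]


numbers = list(range(100))
print(add_five_threaded(numbers)[:3])  # [5, 6, 7]
```

The chunking keeps each thread's work contiguous, which is roughly what a `threads = 10` setting would have to do behind the scenes.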
 

Idontcare

Elite Member
Oct 10, 1999
Extraordinary claims require extraordinary proof.

Surely the VISC folks know this and are scrambling to get their proof into the hands of credible people who can then validate the VISC claims.
 

Vesku

Diamond Member
Aug 25, 2005
Many cores working on the same thread, do they say if this is working on a popular instruction set such as ARM, x86, or MIPS? The design aspect is impressive if it's not running some customized instruction set designed specifically to make it easier to manage that single thread. If it's a custom instruction set then to me it downgrades from impressive to interesting. Notable that I don't see any perf/W information nor comparisons of die size.
 

Paul98

Diamond Member
Jan 31, 2010
Many cores working on the same thread, do they say if this is working on a popular instruction set such as ARM, x86, or MIPS? The design aspect is impressive if it's not running some customized instruction set designed specifically to make it easier to manage that single thread. If it's a custom instruction set then to me it downgrades from impressive to interesting. Notable that I don't see any perf/W information nor comparisons of die size.

I thought I read that it does both ARM and x86.
 

pw257008

Senior member
Jan 11, 2014
Many cores working on the same thread, do they say if this is working on a popular instruction set such as ARM, x86, or MIPS? The design aspect is impressive if it's not running some customized instruction set designed specifically to make it easier to manage that single thread. If it's a custom instruction set then to me it downgrades from impressive to interesting. Notable that I don't see any perf/W information nor comparisons of die size.
One of the articles cites the VISC folks as saying perf/W was about on par with the competing cores. That seems impressive, especially at the prototype stage and assuming further improvements with refinement, if this ends up workable and the numbers hold up close to this well under independent testing.
 

ElFenix

Elite Member
Super Moderator
Mar 20, 2000
Many cores working on the same thread, do they say if this is working on a popular instruction set such as ARM, x86, or MIPS? The design aspect is impressive if it's not running some customized instruction set designed specifically to make it easier to manage that single thread. If it's a custom instruction set then to me it downgrades from impressive to interesting. Notable that I don't see any perf/W information nor comparisons of die size.

it virtualizes the instruction set. so it can run anything.
 

Idontcare

Elite Member
Oct 10, 1999
Many cores working on the same thread, do they say if this is working on a popular instruction set such as ARM, x86, or MIPS? The design aspect is impressive if it's not running some customized instruction set designed specifically to make it easier to manage that single thread. If it's a custom instruction set then to me it downgrades from impressive to interesting. Notable that I don't see any perf/W information nor comparisons of die size.

Others have directly answered your question but here is the relevant passage of text, if it helps:

As you can see from the image above, there is a very specific way that they achieve this virtualized CPU architecture, whose goal is not only to abstract the cores but also to abstract the ISA, as they claim to be able to run virtually any ISA on their cores if needed, which is how they are able to demonstrate that they are running Android ICS (not KitKat) on their demonstration machine today at Linley.
 

DrMrLordX

Lifer
Apr 27, 2000
Extraordinary claims require extraordinary proof.

Surely the VISC folks know this and are scrambling to get their proof into the hands of credible people who can then validate the VISC claims.

There's a lot of good money riding on those claims. I don't remember Bitboys having backers quite like that. Maybe my memory is just a bit hazy.

It could still be a scam, or just a good idea gone bad. Just sayin.
 

MisterMac

Senior member
Sep 16, 2011
I'm skeptical: if all it took was a few hundred million dollars to find a way to increase IPC dramatically, surely even MIPS would have managed it before these guys.

Even Intel has to be researching new design approaches, or experimenting with new theoretical ISAs.

If they eventually find one they can patent, and it increases performance by 100% or more without breaking too much backward compatibility, I bet they'd use it.
 

sm625

Diamond Member
May 6, 2011
I'm skeptical: if all it took was a few hundred million dollars to find a way to increase IPC dramatically, surely even MIPS would have managed it before these guys.

Even Intel has to be researching new design approaches, or experimenting with new theoretical ISAs.

If they eventually find one they can patent, and it increases performance by 100% or more without breaking too much backward compatibility, I bet they'd use it.

Don't forget that Intel sat on the P4 for 3 years longer than it should have. Intel only cares about its margins, profits, dividends, etc. It will not innovate unless forced.
 

sm625

Diamond Member
May 6, 2011
http://www.extremetech.com/extreme/...nceptual-breakthrough-weve-been-waiting-for/2



That's all I have seen regarding performance and power. I would have expected lower power.

We got people on this thread saying this design is useless due to the 350MHz clock speed. But this image seems to negate that, no? I see no mention of any "normalization" of clock speeds. Clearly it would not perform this well at 350MHz unless IPC was an order of magnitude higher. Which means, to me anyway, that if it runs this well at 350MHz, then it could at least conceivably scale to 3GHz, which would be totally revolutionary.
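The arithmetic behind this reasoning is performance = IPC × clock frequency. A minimal sketch using the 350 MHz vs 1 GHz Haswell comparison discussed in this thread; `speedup` is a hypothetical parameter for how far ahead the slower-clocked chip finishes:

```python
def required_ipc_ratio(slow_clock_mhz, fast_clock_mhz, speedup=1.0):
    """IPC the slower-clocked core needs, relative to the faster-clocked one,
    to deliver `speedup` times its performance.

    From performance = IPC * frequency:
        IPC_slow * f_slow = speedup * IPC_fast * f_fast
    """
    return speedup * fast_clock_mhz / slow_clock_mhz


print(f"{required_ipc_ratio(350, 1000):.2f}x")  # 2.86x just to match a 1 GHz core
```

Under this model, matching a 1 GHz core at 350 MHz takes roughly 2.9x the IPC, and beating it takes proportionally more.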
 

Khato

Golden Member
Jul 15, 2001
We got people on this thread saying this design is useless due to the 350MHz clock speed. But this image seems to negate that, no? I see no mention of any "normalization" of clock speeds. Clearly it would not perform this well at 350MHz unless IPC was an order of magnitude higher. Which means, to me anyway, that if it runs this well at 350MHz, then it could at least conceivably scale to 3GHz, which would be totally revolutionary.

Eh, immediately below that performance chart the article states that 'Power consumption is listed as “about” the same', which isn't exactly a good sign. Does that mean that even at such a low clock frequency they still have to run at a voltage comparable to the other designs? That could mean they have just as much or more total logic but are simply dividing the pipeline into fewer stages, which is a great way to 'improve' IPC. It could also easily mean that their 'virtualization' logic has dependencies which can't be broken up into multiple pipeline stages, and hence they're limited to a low frequency regardless of voltage.

Or it might be that they aren't so great at actually designing a processor and have some horrible timing paths that are killing their potential efficiency... But given their hype, you'd think they'd be loudly advertising that if it were the case, since fixing it would yield markedly better results than what they're showing.
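Khato's two scenarios map onto a textbook-style model of pipelining: splitting a fixed amount of logic into more stages shortens each cycle, but every stage adds latch overhead, and a path that cannot be split puts a floor under the cycle time no matter how many stages are added. A minimal sketch under those assumptions; the picosecond figures are hypothetical round numbers, not data about VISC or any other design:

```python
def max_clock_ghz(total_logic_ps, stages, latch_overhead_ps, unsplittable_ps=0.0):
    """Rough upper bound on clock frequency for a pipeline with `stages` stages.

    Each stage gets an equal share of the total logic delay, but never less than
    the longest path that cannot be divided (`unsplittable_ps`); every stage also
    pays a fixed latch/flip-flop overhead. 1000 ps per ns, so 1000 / ps = GHz.
    """
    stage_ps = max(total_logic_ps / stages, unsplittable_ps) + latch_overhead_ps
    return 1000.0 / stage_ps


# Hypothetical: 3000 ps of logic, 50 ps of latch overhead per stage.
print(max_clock_ghz(3000, 6, 50))                          # few, long stages
print(max_clock_ghz(3000, 15, 50))                         # deeper pipeline
print(max_clock_ghz(3000, 15, 50, unsplittable_ps=1000))   # one path can't be split
```

Fewer stages also mean fewer cycles lost per misprediction, which is why a short pipeline can post high IPC at a low clock.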
 

witeken

Diamond Member
Dec 25, 2013
We got people on this thread saying this design is useless due to the 350MHz clock speed. But this image seems to negate that, no? I see no mention of any "normalization" of clock speeds. Clearly it would not perform this well at 350MHz unless IPC was an order of magnitude higher. Which means, to me anyway, that if it runs this well at 350MHz, then it could at least conceivably scale to 3GHz, which would be totally revolutionary.
Yes, Intel must be very scared now. AMD could suddenly release a CPU with 3.5x Haswell's IPC and obliterate Intel's market share.

Eh, immediately below that performance chart the article states that 'Power consumption is listed as “about” the same', which isn't exactly a good sign. Does that mean that even at such a low clock frequency they still have to run at a voltage comparable to the other designs? That could mean they have just as much or more total logic but are simply dividing the pipeline into fewer stages, which is a great way to 'improve' IPC. It could also easily mean that their 'virtualization' logic has dependencies which can't be broken up into multiple pipeline stages, and hence they're limited to a low frequency regardless of voltage.

Or it might be that they aren't so great at actually designing a processor and have some horrible timing paths that are killing their potential efficiency... But given their hype, you'd think they'd be loudly advertising that if it were the case, since fixing it would yield markedly better results than what they're showing.
You should be impressed if this is true. I never hear anyone saying we'll ever get 30GHz silicon, but if you can scale this chip, which is at 0.35GHz actually faster than 1GHz Haswell, to even 2GHz, it will actually be 60% faster than Devil's Canyon, with a lot of frequency headroom left. Or is that too simplistic?
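The projection can be made explicit. A minimal sketch assuming perfectly linear clock scaling, which later replies dispute; `margin` (how much faster than the 1 GHz Haswell the demo chip ran) is a hypothetical input, and the Devil's Canyon Core i7-4790K's 4.0 GHz base and 4.4 GHz turbo clocks serve as the comparison points:

```python
def projected_speedup(chip_ghz, haswell_ghz_matched, target_ghz, rival_haswell_ghz, margin=1.0):
    """Speedup of a clock-scaled chip over a Haswell running at `rival_haswell_ghz`.

    Assumes performance scales linearly with clock (optimistic). `margin` is how
    much faster the chip was than the Haswell it was compared against
    (1.0 = merely matched it).
    """
    haswell_ghz_per_chip_ghz = margin * haswell_ghz_matched / chip_ghz
    return target_ghz * haswell_ghz_per_chip_ghz / rival_haswell_ghz


for rival in (4.0, 4.4):  # i7-4790K base and max turbo clocks
    print(rival, round(projected_speedup(0.35, 1.0, 2.0, rival), 2))
```

At parity (margin 1.0) a 2 GHz part lands roughly 30 to 43 percent ahead, so a 60 percent figure implies the demo chip was somewhat faster than the 1 GHz Haswell, not just level with it.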
 

Nothingness

Platinum Member
Jul 3, 2013
You should be impressed if this is true. I never hear anyone saying we'll ever get 30GHz silicon, but if you can scale this chip, which is at 0.35GHz actually faster than 1GHz Haswell, to even 2GHz, it will actually be 60% faster than Devil's Canyon, with a lot of frequency headroom left. Or is that too simplistic?
What makes you think it will scale? I am with Khato: they probably have quite large pipe stages (which would explain why they only reached 350 MHz on 28 nm), and if that's correct, the frequency uplift will be limited.
 

ElFenix

Elite Member
Super Moderator
Mar 20, 2000
So it's possible that K12/Zen is actually just one chip?

if this were k12/zen, then yes. but i very much doubt that k12/zen are this.


We got people on this thread saying this design is useless due to the 350MHz clock speed. But this image seems to negate that, no? I see no mention of any "normalization" of clock speeds. Clearly it would not perform this well at 350MHz unless IPC was an order of magnitude higher. Which means, to me anyway, that if it runs this well at 350MHz, then it could at least conceivably scale to 3GHz, which would be totally revolutionary.

i don't know why that would necessarily be the case. you can't get a graphics card to 3GHz, and they run real well at 350MHz as well.


Eh, immediately below that performance chart the article states that 'Power consumption is listed as “about” the same', which isn't exactly a good sign. Does that mean that even at such a low clock frequency they still have to run at a voltage comparable to the other designs? That could mean they have just as much or more total logic but are simply dividing the pipeline into fewer stages, which is a great way to 'improve' IPC. It could also easily mean that their 'virtualization' logic has dependencies which can't be broken up into multiple pipeline stages, and hence they're limited to a low frequency regardless of voltage.

Or it might be that they aren't so great at actually designing a processor and have some horrible timing paths that are killing their potential efficiency... But given their hype, you'd think they'd be loudly advertising that if it were the case, since fixing it would yield markedly better results than what they're showing.
if that's power consumption of an experimental processor with probably limited power tuning and built out of transistors designed to minimize the rate of failures rather than other concerns, it may not be an issue.
 

DrMrLordX

Lifer
Apr 27, 2000
Until we see more tests done on the demo hardware and get more information about the 80% of the die area allegedly not concerned with implementing speculative threading (or whatever it is you care to call it), it will be difficult to make any accurate comments regarding its efficiency; ability to scale to higher clockspeeds; or really anything else of substance.

One thing that seems interesting, and a bit odd, is that the provided JPEG/color compression demo that they have released would seem to indicate that there is more than just speculative threading providing an IPC boost to the prototype VISC CPU. Correct me if I am wrong, but the demo shows that two "actual" VISC cores operating as one virtual core @ real clockspeeds of 350 MHz can beat a 1 GHz Haswell core in the same task. If the speculative threading feature is 100% efficient and makes the two physical cores function as a 700 MHz virtual core, you still have a 700 MHz virtual core beating a 1 GHz Haswell core.

Or, to put it a different way, the test seems to indicate that a single "actual" VISC core @ 350 MHz would beat a single 500 MHz Haswell core in JPEG/color compression.

That makes one heck of a statement with or without speculative threading. Can the VISC cores manage such a feat in other computational tasks?
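DrMrLordX's per-core figures follow from the same performance = IPC × clock relation, applied to several physical cores pooled into one virtual core. A minimal sketch; the `efficiency` parameter (1.0 = the pooled cores deliver their full combined throughput) is an assumption for illustration, since the actual pooling efficiency isn't given in the thread:

```python
def per_core_ipc_ratio(core_mhz, cores, haswell_mhz, efficiency=1.0):
    """Per-clock advantage each physical core needs over one Haswell core so that
    `cores` of them, pooled into one virtual core at `efficiency`, match it.

    Virtual-core throughput = cores * efficiency * core_mhz * IPC_core.
    Setting that equal to haswell_mhz * IPC_haswell and solving for
    IPC_core / IPC_haswell gives the expression below.
    """
    return haswell_mhz / (cores * efficiency * core_mhz)


# Two 350 MHz cores vs one 1 GHz Haswell core.
print(round(per_core_ipc_ratio(350, 2, 1000), 2))                   # perfect pooling
print(round(per_core_ipc_ratio(350, 2, 1000, efficiency=0.75), 2))  # imperfect pooling
```

With perfect pooling each 350 MHz core needs about 1.43x Haswell's IPC, the same as equalling a 500 MHz Haswell core; any loss in pooling raises that figure, and since the demo beat the Haswell rather than tying it, these are lower bounds.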
 

witeken

Diamond Member
Dec 25, 2013
What makes you think it will scale? I am with Khato: they probably have quite large pipe stages (which would explain why they only reached 350 MHz on 28 nm), and if that's correct, the frequency uplift will be limited.
That's just an assumption. Most modern processors can reach 3-5GHz, so it seems quite extraordinary that this one can reach only a tenth of that; to me, that claim also needs quite extraordinary evidence. Maybe they simply wanted the same power consumption as the Apple A7, A15, etc.?

i don't know why that would necessarily be the case. you can't get a graphics card to 3GHz, and they run real well at 350MHz as well.
As far as I know, GPUs run at about 1GHz simply because that is the best trade-off between voltage/power, performance, and die area. By keeping the clock around 1GHz rather than pushing it higher, you can fit a lot more shaders within the power budget and improve performance that way, instead of paying the quadratic power cost of the higher voltage a higher clock requires.
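The trade-off described here comes from the dynamic-power relation P ≈ C·V²·f: a higher clock generally needs a higher supply voltage, so power rises much faster than clock speed, while adding more units at a lower clock and voltage raises throughput more cheaply. A minimal sketch comparing two configurations with equal throughput; the voltages and unit counts are hypothetical, per-unit capacitance is assumed equal, and leakage is ignored:

```python
def dynamic_power(units, volts, ghz, cap_per_unit=1.0):
    """Relative dynamic power: P = C * V^2 * f, summed over identical units."""
    return units * cap_per_unit * volts ** 2 * ghz


def throughput(units, ghz):
    """Relative throughput, assuming work scales with units * clock."""
    return units * ghz


# Hypothetical: one unit at 2 GHz / 1.2 V vs two units at 1 GHz / 0.9 V.
narrow_fast = dict(units=1, volts=1.2, ghz=2.0)
wide_slow = dict(units=2, volts=0.9, ghz=1.0)

for name, cfg in (("narrow/fast", narrow_fast), ("wide/slow", wide_slow)):
    perf = throughput(cfg["units"], cfg["ghz"])
    power = dynamic_power(**cfg)
    print(f"{name}: throughput {perf:.1f}, power {power:.2f}, perf/W {perf / power:.2f}")
```

Under these assumed numbers the wide, slow configuration delivers the same throughput for a little over half the dynamic power, which is the logic that lets a GPU trade clock speed for width.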
 

positivedoppler

Golden Member
Apr 30, 2012
So it's possible that K12/Zen is actually just one chip?

That is an excellent point. The fact that AMD specified they are two different chips means neither is using VISC. If people are correct in assuming that perfectly threaded software will see no performance increase, this might not even see the light of day.
 