Should Intel design a CPU core with a performance level in between Core and Atom?

cbn · Nov 3, 2015

Currently Intel has two CPU core designs:

1. Core, which is a (4 wide) high frequency (4.2 GHz turbo for i7 6700K), high IPC design slotted for use from 4.5W (Core M dual core) to 95W (i7 6700K).

2. Atom, which is a (2 wide) lower frequency (2.4 GHz turbo for Braswell N3700), low IPC design slotted for use from 2W (SDP) to 6W.

IPC on the Core line is about 2x greater than on the Atom line, but the Core design is also capable of much higher frequency, widening the single thread gap to about 3.5X for Core (at max clocks) compared to Atom (at max clocks). That is a pretty large gap in single thread performance.
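The roughly 3.5X figure above is just the IPC ratio multiplied by the clock ratio. A minimal sketch of that arithmetic, using only the numbers quoted in this post (the function name is illustrative, and this is a first-order model that ignores memory and turbo behavior):

```python
def single_thread_gap(ipc_ratio, fast_ghz, slow_ghz):
    """First-order single-thread speedup: IPC ratio times clock ratio."""
    return ipc_ratio * (fast_ghz / slow_ghz)

# Figures from the post: ~2x IPC, 4.2 GHz (i7 6700K turbo) vs 2.4 GHz (N3700 turbo)
gap = single_thread_gap(2.0, 4.2, 2.4)
print(f"Core vs Atom single-thread gap: {gap:.2f}x")  # 3.50x
```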

Now granted the Core line does have SMT to help narrow that gap a bit, but I still wonder if Intel is planning a mid size core.

Some reasons for the mid size core:

1. Better performance at low wattage (basically a replacement for Core M)
2. Better performance per watt (at the same level of single thread performance) compared to Core in high core count Xeon Servers.

EDIT (11/4/2015 @ 9:10 pm): Changed title from "ITT: We discuss the possibility of Intel designing a third CPU core" to "Should Intel design a CPU core with a performance level in between Core and Atom?"

NTMBK · Nov 3, 2015

Already started. Skylake for server is a different core.

Qwertilot · Nov 3, 2015

Although that is of course a new big core. Needed too - the standard core stuff isn't really getting larger over time. It's probably already medium now and only going one way from here.

dark zero · Nov 3, 2015

Atom needs to get Core instructions and a big revamp in order to become useful.

cbn · Nov 3, 2015

Regarding the high core count servers....

For every new node the potential core count basically doubles. But the node typically only offers a ~30% power reduction, rather than the 50% power reduction needed to keep clocks the same (for X TDP) at this hypothetical doubling of core count.

So in order to get around this problem, uarch efficiency needs to increase some way and/or the core count increase needs to be less than double.
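The power-budget argument above can be sketched numerically. This is a rough model that assumes chip power scales linearly with core count at fixed clocks; the function name is illustrative, and the 30% and 50% figures are the ones from this post:

```python
def relative_chip_power(core_multiplier, per_core_power_reduction):
    """Chip power relative to the previous node, at unchanged clocks."""
    return core_multiplier * (1.0 - per_core_power_reduction)

typical = relative_chip_power(2, 0.30)  # doubling cores with a ~30% reduction
needed = relative_chip_power(2, 0.50)   # doubling cores with a 50% reduction

# Largest core-count increase that still fits the old TDP with a 30% reduction
max_core_multiplier = 1.0 / (1.0 - 0.30)

print(f"2x cores, 30% reduction: {typical:.2f}x the old TDP")
print(f"2x cores, 50% reduction: {needed:.2f}x the old TDP")
print(f"Core count that fits the old TDP: {max_core_multiplier:.2f}x")
```

Under these assumptions, doubling the cores overshoots the TDP by about 40%, so either the core count grows by less than about 1.43x or the efficiency has to come from somewhere else (lower clocks or a more efficient uarch).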

Intel up to this point has been keeping the core count increase rather modest, but I just have to wonder if this begins to change in the near future.

Maybe a "mid size" low frequency core design that is high IPC (wide) to replace a future "big core" high frequency core design (being used at low frequency) that is also high IPC (wide)?

Or maybe a "mid size" mid frequency core design that is medium IPC to replace a future "big core" high frequency core design (being used at low frequency) that is high IPC (wide)?

jpiniero · Nov 3, 2015

Intel seems pretty intent on having one core design for everything, and using their fabs to cover up the gaps. MorphCore would be pretty useful in maintaining MT perf/W without having to eat up too much die.

Not sure how a Big Core Phi would work though.

NTMBK said:
Already started. Skylake for server is a different core.

Not quite. Skylake server is still derived from an earlier version of Skylake mainstream which had AVX3 among other things.

Phynaz · Nov 3, 2015

Core
Atom
Phi
Quark
Itanium
Various controller cores, such as the xl170

kimmel · Nov 3, 2015

jpiniero said:
Not quite. Skylake server is still derived from an earlier version of Skylake mainstream which had AVX3 among other things.

I love how people make things up on the internet.

frozentundra123456 · Nov 3, 2015

Well, for mobile, I am afraid it is too late already. Bay Trail was a nice improvement, but 14nm Cherry Trail is awful on the CPU side, and they needed it to be a home run.
Yes, it has better graphics, but do you have the CPU power to utilize it? And where are the true SoC designs? I just can't believe a company so good at designing desktop and server chips can struggle so much with mobile.

cbn · Nov 3, 2015

frozentundra123456 said:
Bay Trail was a nice improvement, but 14nm Cherry Trail is awful on the CPU side, and they needed it to be a home run.
Yes, it has better graphics, but do you have the CPU power to utilize it? And where are the true SoC designs? I just can't believe a company so good at designing desktop and server chips can struggle so much with mobile.

Yes, with Cherry Trail/Braswell I remember thinking to myself that the CPU was very weak in relation to the GPU. And apparently when both are used together (at the lowest TDP specification) they throttle.

In contrast, I wish the CPU was better performing and the GPU more modest. Then integrate some lower power IP (of various sorts) to complement that.

Zodiark1593 · Nov 3, 2015

How about we start with AVX across the board?

Nothingness · Nov 3, 2015

kimmel said:
I love how people make things up on the internet.

How is server Skylake different from consumer Skylake then?

Nothingness · Nov 3, 2015

Zodiark1593 said:
How about we start with AVX across the board?

Like it

jhu · Nov 4, 2015

Phynaz said:
Core
Atom
Phi
Quark
Itanium
Various controller cores, such as the xl170

/thread

beginner99 · Nov 4, 2015

Zodiark1593 said:
How about we start with AVX across the board?

Why would Atom even need that? The only software I know of that makes at least a little bit of sense to run on an Atom is Handbrake. But anything else? Waste of die space and energy.

kimmel · Nov 4, 2015

Nothingness said:
How is server Skylake different from consumer Skylake then?

Officially, how does anyone outside of Intel or an NDA know what server Skylake is? Yes, we have AVX512 leaks aplenty, and since we know consumer Skylake doesn't have AVX512, that would seem to be a big difference. However, positing the development history of an unreleased core on an unreleased product seems a bit premature.

Arachnotronic · Nov 4, 2015

Nothingness said:
How is server Skylake different from consumer Skylake then?

We don't know yet other than AVX-512 support in SKL-server. However, from what I have been told privately, the differences between consumer and server Skylake should be fairly interesting (although I haven't been able to get actual details).

NTMBK · Nov 4, 2015

Arachnotronic said:
We don't know yet other than AVX-512 support in SKL-server. However, from what I have been told privately, the differences between consumer and server Skylake should be fairly interesting (although I haven't been able to get actual details).

David Kanter was speculating that L2 will be doubled on SKX- based on the fact that SKL halved L2 associativity. Doubling would get them back to 8-way, and reduce cache pressure from those enormous AVX-512 loads.
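The associativity argument can be checked with the usual set-count formula, sets = size / (ways × line size). A small sketch follows; note that the 256 KiB client L2 size and the 64-byte line are assumptions added for illustration and are not stated in this thread:

```python
KIB = 1024

def cache_sets(size_bytes, ways, line_bytes=64):
    """Number of sets in a set-associative cache (line size is an assumption)."""
    return size_bytes // (ways * line_bytes)

# Assumed client configuration: 256 KiB, 4-way (the halved associativity)
client_sets = cache_sets(256 * KIB, 4)
# Speculated server configuration: double the capacity, back to 8-way
server_sets = cache_sets(512 * KIB, 8)

print(client_sets, server_sets)  # 1024 1024
```

With these numbers the set count is unchanged, which is why doubling capacity by doubling the ways is a natural fit: the indexing stays the same and only the number of ways per set grows.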

My personal guess is that SKX's interconnect will move to a mesh topology, similar to the one used on Knight's Landing. The ring interconnect was already stretched to breaking point on top end Haswell- the cores were no longer connected by a single bidirectional ring, but by a pair of bidirectional rings connected by switches. Seems like the ring does not scale well beyond ~10 stops. If they want to continue to ramp up core counts (which seems likely), they will need a new topology proven to scale better- and the one from KNL looks like a good candidate.
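One way to see why a ring scales worse than a mesh is to compare average hop counts. The sketch below is a toy model only: it counts shortest-path hops, ignores bandwidth, per-hop latency, and the real physical layout, and the 36-stop example is a hypothetical size chosen for illustration.

```python
from itertools import product

def ring_avg_hops(n):
    """Mean shortest-path hops between distinct stops on a bidirectional ring."""
    total = sum(min(d, n - d) for d in range(1, n))
    return total / (n - 1)

def mesh_avg_hops(rows, cols):
    """Mean Manhattan distance between distinct nodes on a 2D mesh."""
    nodes = list(product(range(rows), range(cols)))
    total = sum(abs(r1 - r2) + abs(c1 - c2)
                for (r1, c1) in nodes for (r2, c2) in nodes)
    pairs = len(nodes) * (len(nodes) - 1)  # ordered pairs of distinct nodes
    return total / pairs

# Hypothetical 36-stop design: one big ring vs a 6x6 mesh
print(f"ring of 36: {ring_avg_hops(36):.2f} hops")    # 9.26
print(f"6x6 mesh:   {mesh_avg_hops(6, 6):.2f} hops")  # 4.00
```

In this model the ring's average distance grows roughly linearly with the number of stops, while the mesh's grows with the square root, so the difference widens as core counts go up.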

Arachnotronic · Nov 4, 2015

NTMBK said:
David Kanter was speculating that L2 will be doubled on SKX- based on the fact that SKL halved L2 associativity. Doubling would get them back to 8-way, and reduce cache pressure from those enormous AVX-512 loads.

My personal guess is that SKX's interconnect will move to a mesh topology, similar to the one used on Knight's Landing. The ring interconnect was already stretched to breaking point on top end Haswell- the cores were no longer connected by a single bidirectional ring, but by a pair of bidirectional rings connected by switches. Seems like the ring does not scale well beyond ~10 stops. If they want to continue to ramp up core counts (which seems likely), they will need a new topology proven to scale better- and the one from KNL looks like a good candidate.

I agree with the L2$. Intel is doing the same trick with Goldmont -- consumer Goldmont will have 1MB L2$ per core pair but server Goldmont will have 2MB L2$ per core pair.

Also agree with a move to a mesh topology in SKX -- this should help them scale to ~40 cores which we will likely see in CNL-EP/EX.

Insert_Nickname · Nov 4, 2015

Zodiark1593 said:
How about we start with AVX across the board?

That'd be a start.

Though it might be problematic for the atom cores.

beginner99 said:
Why would Atom even need that? The only software I know of that makes at least a little bit of sense to run on an Atom is Handbrake. But anything else? Waste of die space and energy.

One of the big features of .NET 4.6 is a new JIT compiler with support for, you guessed it, AVX2. So while not critical right now, it'll become more important going forward.

Nothingness · Nov 4, 2015

Arachnotronic said:
We don't know yet other than AVX-512 support in SKL-server. However, from what I have been told privately, the differences between consumer and server Skylake should be fairly interesting (although I haven't been able to get actual details).

I guess this will amount to increasing some internal structures (more TLB entries, larger predictor arrays, etc.), but no change in the micro-architecture itself, nothing like tick-tock changes.

But as you and kimmel say, we won't know for sure until Intel releases some information.

Nothingness · Nov 4, 2015

NTMBK said:
David Kanter was speculating that L2 will be doubled on SKX- based on the fact that SKL halved L2 associativity. Doubling would get them back to 8-way, and reduce cache pressure from those enormous AVX-512 loads.

There are good reasons why Intel could have halved L2 associativity: latency and power. I fail to see how it could mean a larger L2 cache for server parts, especially if that increased cache size comes with more power consumption due to higher associativity. Speculating, of course.

NTMBK said:
My personal guess is that SKX's interconnect will move to a mesh topology, similar to the one used on Knight's Landing. The ring interconnect was already stretched to breaking point on top end Haswell- the cores were no longer connected by a single bidirectional ring, but by a pair of bidirectional rings connected by switches. Seems like the ring does not scale well beyond ~10 stops. If they want to continue to ramp up core counts (which seems likely), they will need a new topology proven to scale better- and the one from KNL looks like a good candidate.

This looks very plausible.

Essence_of_War · Nov 4, 2015

beginner99 said:
Why would Atom even need that? The only software I know of that makes at least a little bit of sense to run on an Atom is Handbrake. But anything else? Waste of die space and energy.

Because Atom cores are used in the Xeon Phi, which needs wide SIMD for HPC.

kimmel · Nov 4, 2015

Essence_of_War said:
Because Atom cores are used in the Xeon Phi, which needs wide SIMD for HPC.

Atom cores are not used in Xeon Phi. They may have started with that design but it's not even close after modifications.

http://www.realworldtech.com/knights-landing-details/

The custom core in Knights Landing is derived from Silvermont, but with substantial modifications. KNL probably uses 4-way multi-threading for latency tolerance and also performance compatibility with the previous generation. The rumors also state that the KNL core will replace each of the floating point pipelines in Silvermont with a full blown AVX-512 vector unit, doubling the FLOPs/clock to 32.
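The "32 FLOPs/clock" figure in the quoted article is consistent with simple peak-throughput arithmetic. The sketch below assumes two vector units per core, double-precision (64-bit) elements, and a fused multiply-add counted as 2 FLOPs; those assumptions are mine and are not spelled out in the quote.

```python
def flops_per_clock(vector_units, vector_bits, element_bits=64, flops_per_lane=2):
    """Peak FLOPs per clock per core; flops_per_lane=2 assumes FMA."""
    lanes = vector_bits // element_bits
    return vector_units * lanes * flops_per_lane

two_units = flops_per_clock(vector_units=2, vector_bits=512)
one_unit = flops_per_clock(vector_units=1, vector_bits=512)

print(two_units, one_unit)  # 32 16
```

Under the same assumptions a single 512-bit unit gives 16, which matches the "doubling" wording.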

sm625 · Nov 4, 2015

There is no way Intel can design a mobile CPU core that is worth a damn without it cannibalizing their high margin overpriced offerings. So they won't. Atom will remain useless and Core M will remain overpriced. Even if it costs them their business at some point.
