Discussion Intel current and future Lakes & Rapids thread


ajc9988

Senior member
Apr 1, 2015
278
171
116
Lack of understanding, you said...?

What is the purpose of comparing at the same frequency other than to check the arch's logical efficiency without looking at its electrical efficiency?

Because there's no use for a CPU that would have 10% higher IPC at the expense of 20% more power.
You think that, say, 10% more throughput/Hz is free of power cost?

So testing at the same power will give an indication of IPC and the intrinsic efficiency of the uarch. Ultimately the only thing that matters is perf/watt, exactly what I'm stating, and in this respect one has to admit that a 10% better Cinebench score at the same 15W than a 12nm-based product is nothing to brag about. Looking at Intel vs Intel, 10nm ICL has 20% better perf (and perf/watt) at isopower than the most refined 14nm SKL parts.
No, testing at the same power DOES NOT give IPC. IPC is a DEFINED term. Do you not understand that defined terms, and the way they are tested, are what they are?

You say such a measurement is useless. That's fine, and it's your opinion. But what you cannot say is that measuring at a fixed power draw measures IPC! IT DOES NOT! Instead, it gives the performance at a specific power draw, which is performance per watt. THAT IS A DIFFERENT METRIC. Sure, it is more apt for determining performance within a specific power envelope, which can make it more useful, and IPC is baked in as a contributing factor, but it does NOT, in and of itself, give the IPC number.

So you are wrong that it gives IPC, because it doesn't. It is performance per watt. IPC x frequency gives the performance, or rather it tells you how many instructions are performed per second.

You are NOT talking IPC, you are talking overall performance or performance at a specific power draw. That is not the same as IPC.

So you are mixing measurements, trying to steer people away from examining IPC, then pushing a different metric. If you only pushed the different metric as more accurate, that would be fine. But grossly misrepresenting what IPC is IS what I take issue with.

From Wikipedia (https://en.wikipedia.org/wiki/Instructions_per_cycle):

Calculation of IPC
The calculation of IPC is done through running a set piece of code, calculating the number of machine-level instructions required to complete it, then using high-performance timers to calculate the number of clock cycles required to complete it on the actual hardware. The final result comes from dividing the number of instructions by the number of CPU clock cycles.

The number of instructions per second and floating point operations per second for a processor can be derived by multiplying the number of instructions per cycle with the clock rate (cycles per second given in Hertz) of the processor in question. The number of instructions per second is an approximate indicator of the likely performance of the processor.

The number of instructions executed per clock is not a constant for a given processor; it depends on how the particular software being run interacts with the processor, and indeed the entire machine, particularly the memory hierarchy. However, certain processor features tend to lead to designs that have higher-than-average IPC values: the presence of multiple arithmetic logic units (an ALU is a processor subsystem that can perform elementary arithmetic and logical operations), and short pipelines. When comparing different instruction sets, a simpler instruction set may lead to a higher IPC figure than an implementation of a more complex instruction set using the same chip technology; however, the more complex instruction set may be able to achieve more useful work with fewer instructions.

Factors governing IPC
A given level of instructions per second can be achieved with a high IPC and a low clock speed (like the AMD Athlon and Intel's early Core series), or from a low IPC and high clock speed (like the Intel Pentium 4 and to a lesser extent the AMD Bulldozer). Both are valid processor designs, and the choice between the two is often dictated by history, engineering constraints, or marketing pressures.[original research?] However, a high IPC with a high frequency will always give the best performance.
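The Wikipedia calculation quoted above can be sketched in a few lines of Python. All the numbers below are invented illustration values, not measurements of any real chip:

```python
# IPC per the definition above: instructions retired divided by the
# cycles consumed, for one fixed piece of code on one piece of hardware.

def ipc(instructions_retired: int, clock_cycles: int) -> float:
    """Instructions per cycle for a known workload."""
    return instructions_retired / clock_cycles

def instructions_per_second(ipc_value: float, frequency_hz: float) -> float:
    """Approximate throughput: IPC multiplied by clock rate."""
    return ipc_value * frequency_hz

# Hypothetical run: 8 billion instructions completed in 4 billion cycles.
workload_ipc = ipc(8_000_000_000, 4_000_000_000)
print(workload_ipc)  # -> 2.0

# The same IPC at two different clocks gives different performance,
# which is why IPC alone is not a performance number.
print(instructions_per_second(workload_ipc, 3.0e9))  # -> 6e9 instr/s at 3 GHz
print(instructions_per_second(workload_ipc, 4.0e9))  # -> 8e9 instr/s at 4 GHz
```

Note that the IPC value is tied to the specific code run: a different workload (or a different instruction set compiled for the same workload) yields a different IPC on the same chip.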
 
Last edited:

Abwx

Lifer
Apr 2, 2011
11,167
3,862
136
Who said that I'm talking about IPC alone?
I'm saying that an IPC improvement alone is no guarantee of significantly better perf.

Power is basically IPC weighted by execution efficiency, counted in watts/flop, because executing 10% more ops at the same frequency will require at least 10% more power.

There's no free lunch: if FPU usage is increased by 10%, the FPU's power consumption will increase accordingly. Or do you think one can increase IPC without increasing power?

If you think so then all debate is useless...
 
Reactions: esquared

ajc9988

Senior member
Apr 1, 2015
278
171
116
Who said that I'm talking about IPC alone?
I'm saying that an IPC improvement alone is no guarantee of significantly better perf.

Power is basically IPC weighted by execution efficiency, counted in watts/flop, because executing 10% more ops at the same frequency will require at least 10% more power.

There's no free lunch: if FPU usage is increased by 10%, the FPU's power consumption will increase accordingly. Or do you think one can increase IPC without increasing power?

If you think so then all debate is useless...


On the IPC front, Intel's 18% is BS. From the PCWorld review we can deduce that in Geekbench, 5% of the better "IPC" AVERAGE is brought by AES and AVX2.

In CB R15 ST it is 3% faster clock-for-clock than SKL and roughly 4% slower than Zen 2. I guess that to get their 18% they also counted Quick Sync as another IPC provider...

Those are scores at 15W TDP, so everything is included: IPC and the intrinsic efficiency of the uarch at a given node...

So, look at the Wikipedia definition: calculating IPC "is done through running a set piece of code, calculating the number of machine-level instructions required to complete it, then using high-performance timers to calculate the number of clock cycles required to complete it on the actual hardware. The final result comes from dividing the number of instructions by the number of CPU clock cycles." Pair that with the definition of instructions per second, which "is an approximate indicator of the likely performance of the processor." You keep talking about IPC, and you even said IPC was measured by PCWorld, which it was not, thereby misleading people here about what IPC is.

You then say you are not looking only at IPC. That is fine. Then get IPC out of your mouth. It is a very specific measurement. How do you not realize that?

Also, if a different instruction set is used on the same test with different hardware, then you are not measuring IPC at all. You need the instruction set to be the same, and then you try to control the frequency (cycles per second), the memory bandwidth and latency, and any other factors you can, to arrive at the actual IPC for a given instruction set.

That means that if you change the instruction set by running a different task, you will arrive at a different value. This is where understanding your use cases, the instructions, the performance on different hardware, etc. is vital to arriving at a proper determination. That is why Intel made a stink about AVX-512 optimizations when AMD compared its 64-core Rome chip to Intel's 28-core Cascade Lake-SP chip at Computex. AMD retorted that it did use those optimizations in the comparison, and added that Intel's 56-core chip was not commercially available, so they could not build a test system with it, striking back at Intel's retort. This was discussed in Ian Cutress's interview with Wendell of Level1Techs on YouTube.

My point is, stop using IPC incorrectly. Say the metric you are using directly. Defend the use of that metric. But stop trying to say you can get IPC isolated in some way, whether it be you misusing the term or using imprecise language when discussing it.

Also, performance per watt isn't the best metric for a number of reasons. Some uarchs and processes run hotter at the same power draw; that's a fact. This is why we compare instructions per second, or tasks completed in a given period of time, ignoring that they may use different instruction sets. That gives us an estimate of the expected overall performance between two CPUs, while letting each program use whatever instruction set it was coded for. How similar our daily programs are to that task, using the same instruction set, then informs purchasing decisions.

But, by finding the IPC for each instruction set and then looking at the cycles per second, we can start getting closer to an idea of our actual expected performance. If you have power constraints, then those need to be factored in, and performance per watt becomes more viable for comparison purposes.

As such, I am trying to inform you that you are wrong in your explanation and use of the term IPC.
 

ajc9988

Senior member
Apr 1, 2015
278
171
116
Last time I checked [higher] IPC is exactly being able to run more ops using the same power.
Not necessarily. That is an isopower comparison, or performance per watt. You isolate power consumption, not IPC. Because of that, the frequencies of the two CPUs can differ, which means you have left the IPC measurement behind and are looking at instructions per second at isopower. That can give you the performance-per-watt figure, but not the IPC figure.
 

ondma

Platinum Member
Mar 18, 2018
2,770
1,351
136
You are in circular reasoning where you think you can harvest one thing several times.

If it's tested at 15W, it means the IPC advantage is accounted for. Run them at the same frequency and one chip will consume more than the other, namely the one with higher IPC; you can then downclock the higher-IPC chip so that it consumes the same, and it will still provide better throughput because the frequency is reduced by a smaller amount than the IPC difference.

Once all that is done you can compare the chips at equal power, which is the only metric that matters. Or should we compare them at different powers, that is, at the same frequency?

Really, I wonder at the lengths some people will go to once the numbers do not reflect their hopes...
You should know!!
 

ajc9988

Senior member
Apr 1, 2015
278
171
116
IPC is not connected to power at all. It's precisely what the name implies: the measure of how many instructions can be executed per cycle. It tells you nothing about how much energy is consumed per cycle.
Thank you for that. I also don't want to get into the efficiency curves for isopower and isoperformance with them, or the point where efficiency drops off a cliff and performance per watt tanks. Nor how that can vary with the process used, uarch decisions, etc.
 

birdie

Member
Jan 12, 2019
98
71
51
I don't know what's up with people here who downvote me left and right, I'm looking at @coercitiv and @Space Tyrant, but
almost every CPU/GPU in the past 30 years with better IPC could do more work in the same power envelope than its counterparts with worse IPC.
Feel free to downvote this message as well if that's your knee-jerk reaction, made without using any grey matter. I won't leave any more comments regarding IPC since there are too many theorists around who just love to disagree and downvote. God, I've never downvoted a single person here, but it looks like some people just cannot contain themselves.
 
Last edited:

Space Tyrant

Member
Feb 14, 2017
149
115
116
I don't know what's up with people here who downvote me left and right, I'm looking at @coercitiv and @Space Tyrant, but
almost every CPU/GPU in the past 30 years with better IPC could do more work in the same power envelope than its counterparts with worse IPC.
Feel free to downvote this message as well if that's your knee-jerk reaction, made without using any grey matter. I won't leave any more comments regarding IPC since there are too many theorists around who just love to disagree and downvote. God, I've never downvoted a single person here, but it looks like some people just cannot contain themselves.
I didn't downvote you. I was just a little "surprised" by your fine redefinition of IPC.
 
Reactions: ajc9988

birdie

Member
Jan 12, 2019
98
71
51
It might be defined differently, but it has a common, almost ubiquitous physical manifestation in the real world, which is what I expressed. Perhaps I had to be more precise with my words, 'cause some people here are nit-picky as hell.
 

jur

Junior Member
Nov 23, 2016
23
4
81
What's going on with 3DPM AVX-512/AVX2? Four times as fast as Whiskey Lake, but the theoretical FLOPS are the same. Ice Lake does have one more shuffle unit and much improved load/store, but that does not explain such differences.
 

Abwx

Lifer
Apr 2, 2011
11,167
3,862
136
What's going on with 3DPM AVX-512/AVX2? Four times as fast as Whiskey Lake, but the theoretical FLOPS are the same. Ice Lake does have one more shuffle unit and much improved load/store, but that does not explain such differences.

With AVX-512, power also increased hugely for the time needed to boost "IPC" via frequency: up to 45W, for 8s in "15W" mode and 20s in "25W" mode.

And I thought the benches were made at a stable 15/25W; that was counting without Intel's usual trickery...



https://www.anandtech.com/show/14664/testing-intel-ice-lake-10nm/5
 
Reactions: lightmanek

itsmydamnation

Platinum Member
Feb 6, 2011
2,863
3,417
136
I don't mind high-wattage bursting, but it's on the reviewers to ensure that they run valid tests to really determine what real-world performance looks like.

But these U parts are really showing why there will be no 10nm desktop part soon.
 

Abwx

Lifer
Apr 2, 2011
11,167
3,862
136
How can they extract perf/Hz from SPEC tests under those conditions?

Frequency bursts are not problematic in a real product, but in a review we would expect the metrics to be accurately measured.

If anything, those numbers suggest that Intel will be back to MT perf as the saving grace; those figures should be way more impressive than the frequency-limited ST ones.

Also the process, despite its mediocre max frequency, seems largely up to TSMC's in its linear part. The turbo somewhat blurs the computation, but at first glance the power/frequency curve (in the favourable segment) is not as steep as TSMC's 7nm.
 
Reactions: lightmanek

ondma

Platinum Member
Mar 18, 2018
2,770
1,351
136
With AVX-512, power also increased hugely for the time needed to boost "IPC" via frequency: up to 45W, for 8s in "15W" mode and 20s in "25W" mode.

And I thought the benches were made at a stable 15/25W; that was counting without Intel's usual trickery...



https://www.anandtech.com/show/14664/testing-intel-ice-lake-10nm/5
Again, you can't measure "IPC" with turbo engaged. It has to be done at a fixed frequency. Are you being deliberately obtuse, or do you just not get it?
 

ajc9988

Senior member
Apr 1, 2015
278
171
116
It might be defined differently but it has a common, almost ubiquitous physical manifestation in the real world which I expressed. Perhaps I had to be more precise with my words 'cause some people here are nit-picky as hell.
With AVX-512, power also increased hugely for the time needed to boost "IPC" via frequency: up to 45W, for 8s in "15W" mode and 20s in "25W" mode.

And I thought the benches were made at a stable 15/25W; that was counting without Intel's usual trickery...



https://www.anandtech.com/show/14664/testing-intel-ice-lake-10nm/5
IPC is Instructions Per Cycle. Frequency is cycles per second. If you control for the cycles per second (frequency) and use a known set of instructions so that you have an absolute value, then, by knowing the number of seconds, and thereby the number of cycles, you can figure out the number of instructions executed per cycle.

Most people care more about overall instructions per second, which is the performance of the chip at a specific task with a specific instruction set. But IPC is a SET VALUE for that task. It is one factor in the performance equation.

IPC cannot be used cavalierly like marketers use it. It has always been defined like this. It is not nitpicking, it is correct.

If you are unwilling to learn what these values are and mean, or how to use them for approximation of performance, then just don't share your opinions on it so as not to mislead others.

Now that you know you multiply IPC by the frequency to get instructions per second, the performance number you care about, you can then divide instructions per second by the power consumption during the period (in watts) to get instructions per second per watt, an approximation of performance per watt.

Considering boost behavior, you have to use averaging to get there, which is why reviewers time how long it takes to perform a task of known size, then compare those numbers across different hardware to estimate performance. The performance varies by task, instruction set, etc. Even coding optimizations can change the values for similar tasks.

This is why we "nitpick," because words matter, especially defined words with set equations attached to them!
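The chain of metrics described in this post (IPC, then IPC x frequency for throughput, then throughput per watt) can be sketched with two imaginary chips. Every figure below is invented purely for illustration:

```python
def perf(ipc: float, freq_ghz: float) -> float:
    """Throughput in billions of instructions per second: IPC x frequency."""
    return ipc * freq_ghz

def perf_per_watt(ipc: float, freq_ghz: float, watts: float) -> float:
    """A different metric from IPC: throughput divided by power draw."""
    return perf(ipc, freq_ghz) / watts

# Chip A: higher IPC, lower clock. Chip B: lower IPC, higher clock.
# Both are held at the same (hypothetical) 15 W.
a = perf_per_watt(2.2, 3.0, 15.0)  # 6.6 Ginstr/s / 15 W = 0.44
b = perf_per_watt(2.0, 3.5, 15.0)  # 7.0 Ginstr/s / 15 W ~ 0.467
print(a, b)
```

At the same 15 W, chip B comes out ahead on throughput and on performance per watt even though chip A has the higher IPC, which is the point being argued: an isopower comparison ranks performance per watt, it does not isolate IPC.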
 

Abwx

Lifer
Apr 2, 2011
11,167
3,862
136
Again, you cant measure "IPC" with turbo engaged. It has to be done at a fixed frequency. Are you being deliberately obtuse, or do you just not get it?

I was being sarcastic about the thing, but it seems it was taken at face value.

That being said, IPC can be measured quite accurately even with turbo enabled by using a single thread: the CPU will reach its max turbo and sustain it without difficulty. Two threads on a single core may not reach the max frequency and are hence not relevant.
 

ajc9988

Senior member
Apr 1, 2015
278
171
116
I was sarcastic about the thing but seems that it was taken at face value.

That being said IPC can be measured quite accurately even with the turbo enabled by using a single thread, the CPU will get to the max turbo and sustain it without difficulty, 2 threads on a single core may not reach the max frequency and is hence not relevant.
That isn't how Intel's boost operates. At stock, Intel sets the max boost to drop after a certain amount of time. If the test runs longer than that boost window, the frequency drops, the observed clock becomes an average, and that average makes the IPC calculation less reliable. This is why you lock the frequency.
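The boost-averaging problem described above can be shown with a toy model. The 28-second window and the clock speeds below are invented numbers, not the actual boost parameters of any Intel SKU:

```python
# Toy model: a CPU boosts to a high clock for a fixed window, then drops
# to its sustained clock. A benchmark that outlasts the window sees only
# an *average* frequency, which skews any IPC-from-wall-clock calculation
# unless the frequency is locked.

def average_freq_ghz(test_seconds: float, boost_window_s: float,
                     boost_ghz: float, sustained_ghz: float) -> float:
    """Time-weighted average clock over a test of a given length."""
    if test_seconds <= boost_window_s:
        return boost_ghz
    boosted = boost_window_s * boost_ghz
    sustained = (test_seconds - boost_window_s) * sustained_ghz
    return (boosted + sustained) / test_seconds

# A 10 s run stays inside the 28 s boost window: a clean 3.9 GHz.
print(average_freq_ghz(10, 28, 3.9, 2.3))  # -> 3.9

# A 120 s run averages the two clocks (~2.67 GHz here), and the exact
# value shifts with the test length; hence: lock the frequency.
print(average_freq_ghz(120, 28, 3.9, 2.3))
```

Since the average depends on how long the benchmark runs, two tests of the same code at different lengths would back out two different "IPC" numbers from wall-clock time, which is exactly the unreliability being described.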
 

Yotsugi

Golden Member
Oct 16, 2017
1,029
487
106
You are comparing two different process nodes, one of which is significantly denser than the other. What you may or may not know is that Intel's 14nm process for Skylake was MORE dense than the 14nm++ process used by Coffee Lake and likely Comet Lake CPUs.
You're implying ICL-U isn't using tons of HP libs where needed.
Don't.
 

ajc9988

Senior member
Apr 1, 2015
278
171
116
You're implying ICL-U isn't using tons of HP libs where needed.
Don't.
I'm assuming that ICL-U is using the high-performance libraries, but not the ultra-performance libraries, where needed. With 10nm, Intel went from two library types to three, each of a different density. That is separate from the density reductions made across iterations of a process.

So you are missing what was being said there. I was NOT referring to libraries as the sole means of making a process iteration less dense.

Edit: Also, 10nm has a significant transistor density advantage over 14nm. That increases the heat concentration per mm², which means heat becomes a factor, necessitating a reduction in cycles per second to stay within a temperature limit.

So, since the transistor count is drastically up on 10nm over any iteration of 14nm, that was more my point. I just wanted to also point out that density on Skylake was higher than on Kaby or Coffee.
 
Last edited:

amdfan111

Junior Member
Feb 9, 2018
19
18
81
What's going on with 3DPM avx-512/avx2 ? 4 times as fast as Whiskey Lake, but theoretical flops are the same. Icelake does have one more shuffle and much improved load/store but it does not explain such differences.

3DPM is a useless benchmark that does nothing but division. CNL/ICL has some special acceleration for divide.

Man, the Sunny Cove core is a beast! How long before it makes it to desktop?

It's not happening... The next two years of the roadmap are already confirmed, and it's Skylake, Skylake, Skylake. Only AMD releases desktop processors today, whereas Intel dumps rejected laptop parts as "desktop."

We can see today that Intel's 10nm health "improvements" are nonsense. The new architecture loses as much frequency as it gains in "IPC"? That means the overall performance will be worse than Skylake, since IPC is an average across workloads, whereas gigahertz applies to every workload.

On the other hand, we have AMD and Zen 2 showing how a competent firm designs microprocessors. Not only did AMD increase the IPC by 25% in Zen 2 (leading to 10% more IPC than Skylake), they did so while also increasing the frequency, leading to multiplicative gains. Meanwhile Intel can't even keep performance from dropping with a "new" core.
 
Reactions: Drazick

Yotsugi

Golden Member
Oct 16, 2017
1,029
487
106
I'm assuming that ICL-U is using the High performance libraries but not the Ultra Performance libraries where needed
You don't know and we won't know until Intel does a Broadwell-like disclosure of xtor composition of their mobile die.
Intel, with 10nm, went to using three different library types instead of two, each of different densities
As if I don't read foundry papers.
Edit: Also, 10nm has a significant transistor density advantage over 14nm. That increases the heat concentration per mm, which means heat will become a factor necessitating reduction in cycles per second to fit within a temp.
Sorry to disappoint you, but Zen 2 CCDs are even denser in heat per mm², and they clock higher than ICL-U, too.
I just wanted to also point out that the density on Skylake was more than on Kaby or Coffee.
Neither Kaby nor mobile Kaby parts did anything to reduce density.
And both of these clocked way above ICL-U too.
 