Averaging 4790K performance with the same core count at similar power would be well beyond my expectations, and make Zen a serious competitor.
Sent from HTC 10
4C/8T Zen should be 65W vs 4790K at 88W.
Except that the competition is actually Kaby Lake.
If Zen is not quite up to Haswell, then it is probably Ivy Bridge level, and AMD is taking too long to release it, imo.
I think we are looking at Ivy Bridge vs Kaby Lake in terms of performance, but at similar TDP.
How difficult is it to design a good CPU at the RTL level these days vs the post-RTL verification process?
BD as a solution was bad at the RTL level and in implementation. Or the implementation was near impossible due to wrong architectural decisions. Perhaps those issues overshadow the basic problem: BD was trying to take Intel head on.
Intel has the resources to implement an RTL design at a quality nobody can approach. Fortunately the design of Zen seems so different, e.g. being non-AVX2, that it might get a solid competitive edge in some server segments and high-end workstation laptops: segments where all the SMT capability will be useful, and where there is profit to be made.
I am not an engineer, but it doesn't take much knowledge to know AMD could easily use 2000 extra man-years on implementation to reach higher frequencies at acceptable efficiency. These are issues that hit high-end desktop users especially.
My tests indicate that complex repetitive patterns are predicted well after a certain learning period. There appears to be no sharp limit to the length of branch patterns that can be predicted, and even very long patterns can be predicted. There seems to be no loop counter, and nested loops are not predicted particularly well. Indirect branches are predicted well. The prediction success rate is somewhat higher in the Steamroller than in the previous models.
The processor is able to predict very long repetitive jump patterns with few or no mispredictions. I found no specific limit to the length of jump patterns that could be predicted. Loops are successfully predicted up to a count of 32 or a little more. Nested loops and branches inside loops are predicted reasonably well.
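For anyone wondering how a predictor can learn a long repetitive pattern at all, here is a minimal sketch of a textbook two-level adaptive predictor: a table of 2-bit saturating counters indexed by the most recent branch outcomes. This is only an illustration under my own assumptions, not the actual AMD or Intel design (those are not public). The names `simulate` and `history_bits` and the example pattern are made up.

```python
from collections import defaultdict


def simulate(pattern, repeats, history_bits=8):
    """Run a toy two-level predictor over a repeating branch pattern.

    Returns a list of booleans, True wherever the prediction was wrong.
    """
    # One 2-bit saturating counter per history value, starting at
    # "weakly not taken" (1). Values 2 and 3 mean "predict taken".
    counters = defaultdict(lambda: 1)
    history = 0  # last `history_bits` outcomes packed into an int
    mask = (1 << history_bits) - 1
    misses = []
    for _ in range(repeats):
        for taken in pattern:
            predicted_taken = counters[history] >= 2
            misses.append(predicted_taken != taken)
            # Train the counter for this history toward the real outcome.
            if taken:
                counters[history] = min(3, counters[history] + 1)
            else:
                counters[history] = max(0, counters[history] - 1)
            # Shift the real outcome into the history register.
            history = ((history << 1) | int(taken)) & mask
    return misses


# Hypothetical 7-branch repeating pattern (True = taken).
pattern = [True, True, False, True, False, False, True]

long_history = simulate(pattern, repeats=200, history_bits=8)
short_history = simulate(pattern, repeats=200, history_bits=2)

print("misses in last 700 branches, 8-bit history:", sum(long_history[-700:]))
print("misses in last 700 branches, 2-bit history:", sum(short_history[-700:]))
```

With a history longer than the pattern, every history value maps to exactly one next outcome, so after the learning period the toy predictor stops missing. With a 2-bit history the same history is followed by different outcomes at different points in the pattern, so it keeps mispredicting. Real predictors are far more elaborate, but the "learning period, then good prediction" behaviour is the same idea.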
That really depends on the frequencies they require to be competitive.
In the past, it has been FAR easier to double lower clocked cores than increase clocks with half the core count at a given TDP.
This depends more on their process implementation.
The question is, what clocks does 4C/8T Zen need to be competitive with the 4790K at 4.0-4.4GHz?
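To put a rough number on the cores-versus-clocks trade-off, here is a back-of-the-envelope sketch. It assumes dynamic power per core scales roughly with the cube of frequency (P ~ C * V^2 * f, with voltage rising about linearly with clock). That cubic rule is a simplifying assumption: it ignores leakage and uncore power, and the real exponent depends on the process. The function name is made up for illustration.

```python
def clock_scale_for_core_ratio(core_ratio, exponent=3.0):
    """Per-core clock multiplier that keeps total dynamic power constant.

    core_ratio: new core count divided by old core count.
    exponent:   assumed power-vs-frequency exponent (3.0 = cubic rule).
    """
    if core_ratio <= 0 or exponent <= 0:
        raise ValueError("core_ratio and exponent must be positive")
    # Total power ~ cores * f**exponent, so holding it constant gives
    # f_new / f_old = (1 / core_ratio) ** (1 / exponent).
    return (1.0 / core_ratio) ** (1.0 / exponent)


double_cores = clock_scale_for_core_ratio(2.0)
half_cores = clock_scale_for_core_ratio(0.5)

print(f"2x cores at equal power:   {double_cores:.3f}x clock")
print(f"0.5x cores at equal power: {half_cores:.3f}x clock")
```

Under this assumption, doubling the cores costs only about 21% of the clock, while halving the cores buys only about 26% more clock. That asymmetry is why adding lower-clocked cores tends to be the easier route at a fixed TDP.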
Well, the leak said close to Haswell, so I said Ivy Bridge since that's close to Haswell.
I don't see why people keep saying IVB for Zen ILP; it's wider, has more resources, will have faster memory, more modern predictors, etc. I don't see AMD beating Skylake for ILP, but I also don't see it being at IVB levels; there would have to be some big glass jaws in the architecture. With better cache and wider execution resources, the two big ones are already gone, and the uop cache helps with the third (mispredict penalty).
Extremely difficult when you have tight budgets that only allow the very basics of research, resources, manpower and testing.
It's all about the resources.
It seems you missed the 4C/8T ES on Zauba months ago.
That there is a 4C/8T Zen at all to test at this point is surprising to me. I was expecting Zen to be 8C only for a little while, at least.
If there are already 4C Zen chips being tested, then I am puzzled by the lack of a release.
Some of the best in the business worked on this thing (Jim Keller and the team he assembled, the guys who like a challenge). Also, on the process side it's not just AMD who does high-frequency research. A ton of semiconductor companies use these nodes as well, and a lot of them deal with high-speed interconnect stuff and microwave. A lot of IBM research went into GlobalFoundries and the Samsung venture as well. These aren't small players.
I do think that Intel has an edge since they own their own fabs which allows them further process level customization for a given product. But I wouldn't count AMD out on this one.
I think Zen will likely have Haswell or better IPC.. in some cases better than Broadwell-E (workload dependent, like the blender demo showed). But it will probably not be able to reach the OC clocks we're getting on Intel side. It could however have more perf/watt on stock clocks. We shall see.. the day is approaching.
I work for one of the firms you mentioned. The reality in these corps is, decisions aren't made by superstar engineers but by their management and the execs, most of whom are non-technical, businessy snake-oil salesmen.
Excellent decisions worth easy billions in profit get canned. Viable, cheaper solutions get overlooked. In-house excellence gets buried. Products far ahead of their time don't even get a chance. All without explanation to the superstar engineers.
That's the reality of international corporations, ESPECIALLY when things are not going well. Corporate politics is absolutely crazy.
Engineering/Science != Business/Politics.
Trust me, it's not the engineering I doubt.
Intel's process is typically FAR better than whatever AMD uses for the respective pMOS/nMOS values...
Did Intel just drastically change their architecture (hint: much wider and more powerful)?
Do you think IPC increases + SMT are free?
That happened from Core to Nehalem. Remember that?
Do you realize same chip, one running default and one tuned to extract higher IPC on a specific code = latter burns more power?
So you've designed nanotech ICs and you say there is no scientific chance possible that can make Zen clock lower or equal to Excavator at 95W.
Strong statements.
So if Zen turns out to clock lower than Exc at 95W, you seriously have no clue about processor design or nanotechnology.
I've already explained to you that your comparisons and deductions are logically invalid, but you just keep repeating the same words. You are missing most of the crucial factors in order to create crazy hype around Zen.
Your 4.0/4.3GHz 8C/16T belief is... we'll see by how much you're off.
Kaby Lake, so for all intents and purposes Haswell.
Which 4C/8T: the APU or a cut-down Zeppelin? The TDP can be whatever they want; just make it configurable.
There is nothing architecturally that suggests IVB performance. It has Broadwell/Skylake-sized structures, it has a much improved cache system over the CON cores (that Blender demo is one big cache test; you can tell because SMT gives such a big perf increase and it's 128-bit vectors), and it follows on from the CAT/CON core predictors*. As I said before, the load/store system is hardest to gauge, but that was an area that saw continued improvement in Steamroller and Excavator and will likely continue to improve with Zen.**
* Agner on AMD predictors, and for Intel: see the two passages quoted earlier in the thread.
So even though it's soft wording, they are pretty equal, and you would expect AMD to continue to improve them with Zen.
** From Agner's data they look pretty similar, with AMD having more edge cases. The big thing that hurts AMD is when store forwarding fails: it's 25 cycles vs up to 11 for Intel, which looks very much like a cache latency issue, and that should be much better for Zen compared to CON.
edit: repeat after me, AVX2 and AVX can both be 128 or 256 bits! Zen fully supports AVX just like SR and EX. Zen even supports 256-bit AVX, AVX2 and FMA (again, different from both AVX and AVX2) as one uop, even though it needs to be issued to the execution units twice (once for the lower half, once for the upper half).
Zen only has 128-bit-wide datapaths and execution units, not 256-bit like Haswell and later; it has nothing to do with AVX2 specifically.
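A toy model of the point above, assuming nothing beyond what is stated: a 256-bit operation handled on 128-bit units in two passes gives the same result as a single 256-bit pass, it just occupies the unit twice. Lanes here are 32-bit elements; the function names are hypothetical and this says nothing about real Zen latencies or scheduling.

```python
LANES_128 = 4  # a 128-bit unit holds four 32-bit lanes
LANES_256 = 8  # a 256-bit vector holds eight 32-bit lanes


def add_128(a, b):
    """One pass through a 128-bit adder: four lanes at a time."""
    if len(a) != LANES_128 or len(b) != LANES_128:
        raise ValueError("expected 4 lanes per operand")
    return [x + y for x, y in zip(a, b)]


def add_256_on_128_units(a, b):
    """A 256-bit add done as two 128-bit passes (lower half, upper half).

    Returns (result, passes).
    """
    if len(a) != LANES_256 or len(b) != LANES_256:
        raise ValueError("expected 8 lanes per operand")
    lower = add_128(a[:LANES_128], b[:LANES_128])
    upper = add_128(a[LANES_128:], b[LANES_128:])
    return lower + upper, 2


def add_256_native(a, b):
    """The same add on a full-width 256-bit unit: one pass."""
    if len(a) != LANES_256 or len(b) != LANES_256:
        raise ValueError("expected 8 lanes per operand")
    return [x + y for x, y in zip(a, b)], 1


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]

split_result, split_passes = add_256_on_128_units(a, b)
native_result, native_passes = add_256_native(a, b)

print(split_result == native_result, split_passes, native_passes)
```

So the instruction set support is identical either way; what differs is how many passes through the execution unit a 256-bit operation costs.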
Zen can decode 4, issue 6, dispatch 10 and retire 8 uops... It's better than IVB, HSW, BDW and on par with or slightly better than Skylake (except decode)... The only unknowns are cache and branch prediction efficiency...
The shared int and FP pipeline on Intel is far worse than separate int and FP schedulers. Intel performs well only because it has many memory ports, and the FP units are 256-bit and can do up to 2x256-bit FMAC. On 128-bit code without FMACs, Zen has more potential throughput...
Yes, it's so much worse that in Zen's best case, with Blender / SSE / 128-bit, Intel takes only 2% more time.
Proof?
Yeah, that's the ONLY reason Intel CPUs perform well. Yeesh.
Well, the scene used by AMD for the Blender test doesn't seem complicated enough to stress the L/S unit much. On the contrary: it's quite simple.
And you would expect both cores to be load/store limited in that workload, which is why I kind of think all these port-counting things get a little silly.
Well, the processors are there with their optimization manuals. I don't know what kind of "secret sauce" is still missing...
There are other things that are better in Intel, that we don't know, evidently...
I want to see Intel be less lazy again.
In science, a hypothesis can be pretty absurd without empirical test results.