[Techpowerup] AMD "Zen" CPU Prototypes Tested, "Meet all Expectations"

Page 16 - AnandTech community forums

erunion

Senior member
Jan 20, 2013
765
0
0
Abwx,
I see you were utterly defeated on Semi-Accurate by the Internet STRONGMAN, Juanrga. :biggrin:

I'm convinced juanrga is stuck in a feedback loop but doesn't realize it.
Several times he's speculated about Zen, and within a few days that same information has appeared on one of the several copy-paste tech news sites.
Rather than realizing those sites are just plagiarizing his forum posts, he believes independent sources are providing the same info. Thus he believes he's on the right track.
 

TechGod123

Member
Oct 30, 2015
94
1
0
I'm convinced juanrga is stuck in a feedback loop but doesn't realize it.
Several times he's speculated about Zen, and within a few days that same information has appeared on one of the several copy-paste tech news sites.
Rather than realizing those sites are just plagiarizing his forum posts, he believes independent sources are providing the same info. Thus he believes he's on the right track.

This is funny.
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
You appear to have no idea what it costs to add a new execution unit. The cost isn't die area (ALUs take minimal area on modern chips), but clock speed. The clock speed of chips is limited by the longest path, both in transistors that need to switch and wire distance that needs to be crossed, that has to be traversable in a single clock cycle. This longest path is almost invariably found in the forwarding network that shuffles results from one execution unit to another. Adding more execution units is very easy, but it also directly reduces the maximum clock speed of your chip.
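A toy model of the argument above, offered only as a sketch. It assumes (hypothetically) a full bypass network in which every unit's result can feed every operand input of every unit, and a critical path made of a fixed logic delay plus a wire delay that grows linearly with the number of units the forwarding bus spans. The function names and the picosecond figures are made-up parameters, not measurements of any real chip.

```python
def bypass_paths(n_units: int, operands_per_unit: int = 2) -> int:
    """Forwarding paths in a hypothetical full bypass network.

    Every unit's result can be routed to every operand input of every unit,
    so the path count grows with the square of the number of units.
    """
    return n_units * n_units * operands_per_unit


def max_clock_ghz(logic_ps: float, wire_ps_per_unit: float, n_units: int) -> float:
    """Clock ceiling set by a hypothetical critical path through the bypass network.

    Assumption: critical path = fixed logic delay + wire delay proportional to
    the number of units the forwarding bus has to span.
    """
    critical_path_ps = logic_ps + wire_ps_per_unit * n_units
    return 1000.0 / critical_path_ps  # a 1 ps period corresponds to 1000 GHz


if __name__ == "__main__":
    for n in (4, 6, 8):
        print(n, bypass_paths(n), round(max_clock_ghz(200.0, 10.0, n), 2))
```

With these placeholder numbers, going from 4 to 8 units quadruples the bypass paths (32 to 128) and lowers the clock ceiling from about 4.17 GHz to about 3.57 GHz. The real numbers depend entirely on the process and layout.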

If you are stuck on the same process, yes, that's true. But Intel hasn't exactly been stuck on 45nm. They are on 14nm.

Intel has been adding very little to the performance since Penryn:

http://files.looncraz.net/intel_ipc_claimsvbench.jpg

Of course, AMD went the other way.
 

CHADBOGA

Platinum Member
Mar 31, 2009
2,135
832
136
I'm convinced juanrga is stuck in a feedback loop but doesn't realize it.
Several times he's speculated about Zen, and within a few days that same information has appeared on one of the several copy-paste tech news sites.
Rather than realizing those sites are just plagiarizing his forum posts, he believes independent sources are providing the same info. Thus he believes he's on the right track.
LOL

I have invited him to these forums, but so far he hasn't taken up the invitation. :biggrin:
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,701
1,228
136
Would be more believable if you said they fixed their construction cores by finding better cache implementations.
Well, from what I gathered, these are the changes:

'New' 2-Level Branch Predictor;
L2 BP -> Predicts branches that hit both cores.
L1 BP -> Predicts branches that hit a single core.

'New' L1 Instruction Cache;
- Dual-ported
- More associativity
x Possible larger size

'New' Fetch/Decode;
- A 32B/2 fetch unit per core. (32B/2 -> 32B every other cycle)
- 2 * small Decode + 2 * big Decode unit per core. (Small -> 1x AMD64 or 128-bit FPU macro-op // Big -> 1x AMD64 or 128-bit FPU or 256-bit FPU macro-op)

'New' IEU/LSU arrangement;
- 2 LD AGLUs // 1 PipelinedMul/Load AGLU - 1 PipelinedDiv/Load AGLU
- 2 LD/ST AGLUs // 1 Branch/LD/ST AGLU - 1 POPCNT/LD/ST AGLU
- Eqv to 4 ALUs + 4 AGUs

'New' L1D Cache;
- Quadported
- 2 Load ports
- 2 Load or Store ports

'New' FPU arrangement;
- p2 is fused into p1
- p0/p1 have been increased from 128-bit width to 256-bit width

L2 Cache / L2 Interface changes;
- No more 4KB WCB // Don't really know -> 512KB L2_0/512KB L2_1, Crossbar, Shifters, bleh.

The core name for the 22FDSOI version is "Harvester", the 14FF version is "Crane".
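The "32B/2" fetch figure in the list above reduces to simple arithmetic. A minimal sketch, assuming an average x86-64 instruction length of 4 bytes (a placeholder; real code varies) and treating the four decoders listed (2 small + 2 big) as the decode width. The function names are hypothetical.

```python
def avg_fetch_bytes_per_cycle(fetch_bytes: int, cycles_per_fetch: int) -> float:
    """Average fetch bandwidth: 32B every other cycle -> 16 B/cycle."""
    return fetch_bytes / cycles_per_fetch


def sustainable_ipc(fetch_bytes: int, cycles_per_fetch: int,
                    avg_insn_bytes: float, decode_width: int) -> float:
    """Instructions per cycle the front end could sustain.

    The front end is limited by whichever is lower: instructions delivered
    by fetch, or instructions the decoders can handle per cycle.
    """
    fetched_insns = avg_fetch_bytes_per_cycle(fetch_bytes, cycles_per_fetch) / avg_insn_bytes
    return min(fetched_insns, decode_width)


if __name__ == "__main__":
    # 4 decoders per core per the list; the instruction lengths are assumptions.
    print(sustainable_ipc(32, 2, 4.0, 4))  # 16 / 4 = 4.0, matches the 4 decoders
    print(sustainable_ipc(32, 2, 5.0, 4))  # 3.2: longer instructions starve decode
```

If the average instruction is longer than 4 bytes, fetch rather than decode becomes the limit under this model.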
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
I'm convinced juanrga is stuck in a feedback loop but doesn't realize it.
Several times he's speculated about Zen, and within a few days that same information has appeared on one of the several copy-paste tech news sites.
Rather than realizing those sites are just plagiarizing his forum posts, he believes independent sources are providing the same info. Thus he believes he's on the right track.

He's stuck, that's for sure. An entire forum trying to explain to him that the AGU:ALU ratio, by itself, means almost nothing for CPU performance and he just keeps saying they don't understand, it's about the ratio, not how many there are... I've been in arguments with him before, there's only one way to handle him... which is to search for the "Ignore" option :thumbsup:
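To illustrate why the AGU:ALU ratio alone says little, here is a minimal throughput bound. It assumes a loop limited only by execution ports (no dependencies, cache misses, or front-end limits), and the loop mix of 6 ALU ops and 3 memory ops is hypothetical.

```python
from fractions import Fraction


def cycles_per_iteration(alu_ops: int, mem_ops: int, n_alu: int, n_agu: int) -> Fraction:
    """Lower bound on cycles per loop iteration from execution ports alone.

    Each ALU handles one integer op per cycle and each AGU one address
    calculation per cycle; the busier resource sets the bound.
    """
    return max(Fraction(alu_ops, n_alu), Fraction(mem_ops, n_agu))


if __name__ == "__main__":
    loop = (6, 3)  # hypothetical loop body: 6 ALU ops, 3 loads/stores
    for n_alu, n_agu in [(2, 2), (4, 4), (4, 2)]:
        ratio = Fraction(n_agu, n_alu)
        print(f"{n_alu} ALU + {n_agu} AGU (AGU:ALU = {ratio}):",
              cycles_per_iteration(*loop, n_alu, n_agu))
```

The 2+2 and 4+4 machines share the same 1:1 ratio yet differ by 2x (3 vs 3/2 cycles per iteration), while the 4+2 machine, with a different ratio, matches the wider one on this particular mix.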

I just wish SemiAccurate had open forums; I'd love to get in on that conversation (what can I say, I'm a glutton for punishment :twisted:).
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
Well, from what I gathered, these are the changes:

'New' 2-Level Branch Predictor;
L2 BP -> Predicts branches that hit both cores.
L1 BP -> Predicts branches that hit a single core.

'New' L1 Instruction Cache;
- Dual-ported
- More associativity
x Possible larger size

'New' Fetch/Decode;
- A 32B/2 fetch unit per core. (32B/2 -> 32B every other cycle)
- 2 * small Decode + 2 * big Decode unit per core. (Small -> 1x AMD64 or 128-bit FPU macro-op // Big -> 1x AMD64 or 128-bit FPU or 256-bit FPU macro-op)

'New' IEU/LSU arrangement;
- 2 LD AGLUs // 1 PipelinedMul/Load AGLU - 1 PipelinedDiv/Load AGLU
- 2 LD/ST AGLUs // 1 Branch/LD/ST AGLU - 1 POPCNT/LD/ST AGLU
- Eqv to 4 ALUs + 4 AGUs

'New' L1D Cache;
- Quadported
- 2 Load ports
- 2 Load or Store ports

'New' FPU arrangement;
- p2 is fused into p1
- p0/p1 have been increased from 128-bit width to 256-bit width

L2 Cache / L2 Interface changes;
- No more 4KB WCB // Don't really know -> 512KB L2_0/512KB L2_1, Crossbar, Shifters, bleh.

The core name for the 22FDSOI version is "Harvester", the 14FF version is "Crane".

This would be quite interesting. Not so sure about predicting branches across cores - it seems these should be largely unrelated as branches exist in single threads of execution. The new FPU arrangement would also seem to be a regression, performance wise, though may well save power. Same with the decoder arrangements.

This could make sense for a small, low clocked, CMT design, though. Losing the write coalescing cache (what you call WCB?) would certainly be a problem with permanently segmenting the L2. Smaller should mean lower latency, which would certainly help.

The question is, are you just making this up for kicks, or do you have a valid reason to believe this to be the case?
 

TechGod123

Member
Oct 30, 2015
94
1
0
Well, from what I gathered, these are the changes:

'New' 2-Level Branch Predictor;
L2 BP -> Predicts branches that hit both cores.
L1 BP -> Predicts branches that hit a single core.

'New' L1 Instruction Cache;
- Dual-ported
- More associativity
x Possible larger size

'New' Fetch/Decode;
- A 32B/2 fetch unit per core. (32B/2 -> 32B every other cycle)
- 2 * small Decode + 2 * big Decode unit per core. (Small -> 1x AMD64 or 128-bit FPU macro-op // Big -> 1x AMD64 or 128-bit FPU or 256-bit FPU macro-op)

'New' IEU/LSU arrangement;
- 2 LD AGLUs // 1 PipelinedMul/Load AGLU - 1 PipelinedDiv/Load AGLU
- 2 LD/ST AGLUs // 1 Branch/LD/ST AGLU - 1 POPCNT/LD/ST AGLU
- Eqv to 4 ALUs + 4 AGUs

'New' L1D Cache;
- Quadported
- 2 Load ports
- 2 Load or Store ports

'New' FPU arrangement;
- p2 is fused into p1
- p0/p1 have been increased from 128-bit width to 256-bit width

L2 Cache / L2 Interface changes;
- No more 4KB WCB // Don't really know -> 512KB L2_0/512KB L2_1, Crossbar, Shifters, bleh.

The core name for the 22FDSOI version is "Harvester", the 14FF version is "Crane".

Source?
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,701
1,228
136
Do you have a valid reason to believe this to be the case
AMD Boston has all the eggs.

- The Cache-Coherent Data Fabric based on Hypertransport.
- The second GPU team, whose development followed Gustafson and Larrabee.
- Offloading of the Bulldozer team as well.

So, you have a single team working on a CPU, a GPU, and a data fabric that combines CPUs and GPUs. Huh.

Then, everyone realizes Zen/K12 was a "global effort," even though it was done only by AMD Boston and contractors on an extremely low budget.
it seems these should be largely unrelated as branches exist in single threads of execution.
If both cores are operating on the same codeset, the L2 BP predicts branches. If the cores are operating on different codesets, the L1 BPs predict branches.
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
Which suggests that it is actually not too easy to get a large performance gain?

No, Intel has been doing mostly targeted improvements based on the highest paying customers. This is part of why they now have 95% of the server market (and part of why the most popular benchmarks show the biggest gains).

While it isn't a cakewalk to get more performance, Intel has had what it takes to do so at their fingertips for years.

Merging the assets of two cores to work on one thread is entirely feasible, though they will certainly need to group execution resources carefully and add an extra stage or two to keep clock rate high.

It's all about organizing what is going to be executed. In this scenario there will be two ALUs which can do multiplication and two that can do division, so you put them next to each other and give them an end-of-line scheduler. Side ports between the scheduler stages will allow non-specialized instructions to move to another ALU scheduler if a slot becomes available while specialized instructions are queued. It is, indeed, at least one more stage, but you have twice the computational resources available, so who cares?
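A minimal sketch of that issue rule for a single cycle, assuming four ALUs that all handle simple ops, two of which can also multiply and two of which can also divide. The greedy policy (place specialized ops first, then let simple ops spill into any free ALU, standing in for the side ports) and the names `ALU_CAPS` and `issue_one_cycle` are hypothetical simplifications, not a real scheduler design.

```python
ALU_CAPS = [
    {"add", "mul"},  # ALU 0
    {"add", "mul"},  # ALU 1
    {"add", "div"},  # ALU 2
    {"add", "div"},  # ALU 3
]


def issue_one_cycle(ready_ops: list[str], alu_caps: list[set[str]] = ALU_CAPS) -> dict[int, int]:
    """Map op index -> ALU index for one cycle; ops left out wait for the next cycle."""
    free = list(range(len(alu_caps)))
    issued: dict[int, int] = {}
    # Pass 1: specialized ops can only go to a capable ALU, so place them first.
    for i, op in enumerate(ready_ops):
        if op == "add":
            continue
        for alu in free:
            if op in alu_caps[alu]:
                issued[i] = alu
                free.remove(alu)
                break
    # Pass 2: simple ops spill over ("side ports") into any ALU still free.
    for i, op in enumerate(ready_ops):
        if op == "add" and free:
            issued[i] = free.pop(0)
    return issued


if __name__ == "__main__":
    print(issue_one_cycle(["add", "add", "mul", "div"]))  # {2: 0, 3: 2, 0: 1, 1: 3}
    print(issue_one_cycle(["mul", "mul", "mul", "add"]))  # {0: 0, 1: 1, 3: 2}
```

In the second call the third multiply has no free multiply-capable ALU and waits, while the add still slips into ALU 2, which is the kind of slot reuse the side ports are meant to provide.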

Aside from just the lack of motivation, the valid arguments against this are power draw, scaling, and marketing. Power draw should be something that can be managed, whereas scaling will require far more work (though very traditional efforts will do the trick). Marketing, however, is an issue. This, by no means, could be called anything other than a single core. I'd call it a 4DMacroCore when using double the resources and executing four threads.

So we'd have dual-4DMacroCore CPUs which can handle eight threads, perform like an i7 in multithreaded loads, and spank it in single threads.
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
No, Intel has been doing mostly targeted improvements based on the highest paying customers. This is part of why they now have 95% of the server market (and part of why the most popular benchmarks show the biggest gains).

While it isn't a cakewalk to get more performance, Intel has had what it takes to do so at their fingertips for years.

Then you are claiming that Intel can best Zen easily whenever they need to, even if Zen is faster than Skylake.

So it seems like you are basically saying Zen has no chance.

Either AMD can't get the performance up there, or if they do, Intel can easily crank out a faster chip.
 

TechGod123

Member
Oct 30, 2015
94
1
0
Then you are claiming that Intel can best Zen easily whenever they need to, even if Zen is faster than Skylake.

So it seems like you are basically saying Zen has no chance.

Either AMD can't get the performance up there, or if they do, Intel can easily crank out a faster chip.

The latter may be true. But wait! That would only be true if AMD wasn't ALSO going to be releasing Zen+.

You assume AMD will stand still, don't you think they've learnt by now?
 

Azuma Hazuki

Golden Member
Jun 18, 2012
1,532
866
131
And where exactly are the scary catmen getting this information from? It SOUNDS interesting, especially since Harvester and Crane appear to be what Orochi always should have been, but this is completely out of left field.
 

TechGod123

Member
Oct 30, 2015
94
1
0
And where exactly are the scary catmen getting this information from? It SOUNDS interesting, especially since Harvester and Crane appear to be what Orochi always should have been, but this is completely out of left field.

I have a hard time believing Zen was cancelled.
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
Then you are claiming that Intel can best Zen easily whenever they need to, even if Zen is faster than Skylake.

So it seems like you are basically saying Zen has no chance.

Either AMD can't get the performance up there, or if they do, Intel can easily crank out a faster chip.

Intel can best Zen easily and has already done so, unless Zen is an astonishing success and the 40% claim was a gross underrepresentation of the improvement over Excavator.

Zen will be Haswell +/- 10%, IPC-wise.

I had a friend who worked at Intel doing CPU microcode (he worked on Pentium 4, IIRC). He said, even then, that Intel could easily double the performance of their CPUs by doing away with a policy he called "copy exact." He said that they were not allowed to optimize existing microcode at all, they could only add to it.

Apparently this meant there was guaranteed compatibility, but that the microcode was full of workarounds and performance issues aplenty.

This is not surprising from a corporation like Intel; they have a LOT riding on maintaining proper compatibility and they don't really have to worry about optimal performance. I have no clue if this policy still exists, or even if I fully understood what he meant, but it is a good example of the type of thing that can happen at Intel that could never happen at a more vulnerable company. You don't leave a doubling of performance on the table.

If you compare the die sizes of Intel CPUs and their competition, the situation becomes rather clear. Intel advances at the cadence it does precisely so it can give the appearance of progress without becoming too efficient too fast. They do NOT want the holy grail of performance; that would hurt sales immensely. Already, the "fast enough for the average Joe" computers are only being upgraded after they have a failure... many years, or even a decade or more, later.

In addition, Intel knows that they can't get too far ahead of AMD. If they were to make AMD any more irrelevant than they already have, they would be facing massive legal battles in several countries, even without AMD initiating anything (since AMD has agreed not to). This is probably yet another disincentive to make genuine attempts at significant progress.

Just imagine the mainstream socket getting a 6 core CPU option without an iGPU from Intel... they won't do that, even though there is no doubt there is a significant enough market to make it profitable.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
I had a friend who worked at Intel doing CPU microcode (he worked on Pentium 4, IIRC). He said, even then, that Intel could easily double the performance of their CPUs by doing away with a policy he called "copy exact." He said that they were not allowed to optimize existing microcode at all, they could only add to it.

Apparently this meant there was guaranteed compatibility, but that the microcode was full of workarounds and performance issues aplenty.

I am very skeptical of this claim. Unless the microcode was contributing to something very, very broken there's no way any change in it would result in a 2x performance difference. Even a CPU like Pentium 4 wouldn't be spending that much time in microcode.
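This skepticism can be made concrete with Amdahl's law. A minimal sketch; the fractions of run time spent in microcode used below are illustrative assumptions, not measurements of Pentium 4.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of run time is sped up by `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def fraction_needed(target_speedup: float, local_speedup: float) -> float:
    """Share of run time that must be sped up to reach `target_speedup` overall."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / local_speedup)


if __name__ == "__main__":
    # Even infinitely fast microcode only doubles performance if it was >= 50% of run time.
    print(fraction_needed(2.0, float("inf")))  # 0.5
    # Assumed 10% of time in microcode, made 2x faster: about a 5% overall gain.
    print(round(amdahl_speedup(0.10, 2.0), 3))  # 1.053
```

For a 2x overall gain, microcode would have to dominate execution time, which is the "something very, very broken" case.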
 

coercitiv

Diamond Member
Jan 24, 2014
6,593
13,907
136
You sound like you really know your stuff, showing a technical grasp I've only ever seen from Looncraz before (and both of you have disturbing cat avatars...weird). But this is so far beyond the pale.
Nosta is a strange cat; some of us believe he's not a real cat at all. He tends to post lists of technical details of upcoming architectures, but has little to contribute after product launches in terms of evaluation/confirmation of said technical details.
 

DrMrLordX

Lifer
Apr 27, 2000
21,991
11,538
136
Something to pay attention to here, as well, is that Excavator is pretty much dead even with Penryn for IPC.

I'm not sure that's accurate. The_Stilt was nice enough to post some XV numbers @ 3.4 GHz (no throttle/turbo) using his dev platform, and adding 40% to his Cinebench R10 numbers would put Zen ahead of Skylake per clock (assuming the same number of cores). 8c/16t Zen would be an R10 monster. Now take into account that R10 is mostly an FP SSE2 benchmark, and that much of Zen's IPC improvement over XV will come from:

1). Faster cache
2). Presumably shorter pipeline (less performance loss from stalls)
3). additional fp resources

you will probably see improvements from Zen on the high side in a benchmark like Cinebench R10. The open question is: how high will Zen clock? Maybe not all that high (at least as a base clock), but we'll find out soon enough.
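The per-clock comparison above comes down to normalizing a score by clock and core count, then applying the claimed 40% IPC uplift. A minimal sketch; the function names are hypothetical, the scores are placeholders for whatever results get plugged in, and it assumes the benchmark score scales linearly with IPC at a fixed clock and core count.

```python
def per_clock_per_core(score: float, clock_ghz: float, cores: int) -> float:
    """Benchmark score normalized to one core running at 1 GHz."""
    return score / (clock_ghz * cores)


def projected_zen_per_clock(xv_score: float, xv_clock_ghz: float, xv_cores: int,
                            ipc_uplift: float = 0.40) -> float:
    """Excavator's per-clock, per-core score scaled by the claimed IPC uplift.

    Assumes the score scales linearly with IPC at a fixed clock and core count.
    """
    return per_clock_per_core(xv_score, xv_clock_ghz, xv_cores) * (1.0 + ipc_uplift)


def zen_ahead_per_clock(xv_score: float, xv_clock_ghz: float, xv_cores: int,
                        sky_score: float, sky_clock_ghz: float, sky_cores: int) -> bool:
    """True if projected Zen beats Skylake once both are normalized per clock and core."""
    return (projected_zen_per_clock(xv_score, xv_clock_ghz, xv_cores)
            > per_clock_per_core(sky_score, sky_clock_ghz, sky_cores))
```

This only answers the per-clock question; absolute performance still depends on how high Zen clocks.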
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Intel can best Zen easily and has already done so, unless Zen is an astonishing success and the 40% claim was a gross underrepresentation of the improvement over Excavator.

Zen will be Haswell +/- 10%, IPC-wise.

I had a friend who worked at Intel doing CPU microcode (he worked on Pentium 4, IIRC). He said, even then, that Intel could easily double the performance of their CPUs by doing away with a policy he called "copy exact." He said that they were not allowed to optimize existing microcode at all, they could only add to it.

Apparently this meant there was guaranteed compatibility, but that the microcode was full of workarounds and performance issues aplenty.

This is not surprising from a corporation like Intel; they have a LOT riding on maintaining proper compatibility and they don't really have to worry about optimal performance. I have no clue if this policy still exists, or even if I fully understood what he meant, but it is a good example of the type of thing that can happen at Intel that could never happen at a more vulnerable company. You don't leave a doubling of performance on the table.

If you compare the die sizes of Intel CPUs and their competition, the situation becomes rather clear. Intel advances at the cadence it does precisely so it can give the appearance of progress without becoming too efficient too fast. They do NOT want the holy grail of performance; that would hurt sales immensely. Already, the "fast enough for the average Joe" computers are only being upgraded after they have a failure... many years, or even a decade or more, later.

In addition, Intel knows that they can't get too far ahead of AMD. If they were to make AMD any more irrelevant than they already have, they would be facing massive legal battles in several countries, even without AMD initiating anything (since AMD has agreed not to). This is probably yet another disincentive to make genuine attempts at significant progress.

Just imagine the mainstream socket getting a 6 core CPU option without an iGPU from Intel... they won't do that, even though there is no doubt there is a significant enough market to make it profitable.

Copy Exactly is used in manufacturing, not design. It means every line in every fab is an exact clone of every other. If an optimization is developed at one location, it is rolled out to the other locations. This is in contrast to other fabs that use SPM on a per-line basis. The purpose is to reduce cost.

I believe I read somewhere that Intel syncs their manufacturing processes every 90 days.

Intel page.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Broadwell-E will be 3.3 GHz for 8 cores and 3 GHz for 10 cores, all at 140 W.

Anyone in their right mind still believing in 4 GHz, Haswell IPC, 8C/16T at 95 W on 14LPP for Zen?
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,701
1,228
136
1). Faster cache
2). Presumably shorter pipeline (less performance loss from stalls)
3). additional fp resources
1). Inclusive 8 MB L3 Cache per 4 cores. Kind of shoots down that faster cache notion.
2). BD -> XV has 15 pipeline stages for the Integer side. ZN/ZN+ has 17 pipeline stages for the Integer side.
3). The additional FP resources are negated by having MAC units rather than FMAC units: 2 128b AVX Adds + 2 128b AVX Muls, or 1 128b/256b AVX Muladd.
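Point 3 can be checked with peak-throughput arithmetic. A minimal sketch, assuming double-precision (64-bit) lanes and counting a fused multiply-add as two floating-point operations; the unit mix is taken from the post, not from AMD documentation, and the function names are hypothetical.

```python
def lanes(width_bits: int, element_bits: int = 64) -> int:
    """SIMD lanes per operation (double precision by default)."""
    return width_bits // element_bits


def peak_flops_separate(n_add: int, n_mul: int, width_bits: int) -> int:
    """Peak FLOPs/cycle with separate add and multiply pipes (1 FLOP per lane each)."""
    return (n_add + n_mul) * lanes(width_bits)


def peak_flops_fma(n_fma: int, width_bits: int) -> int:
    """Peak FLOPs/cycle with fused multiply-add pipes (2 FLOPs per lane)."""
    return n_fma * lanes(width_bits) * 2


if __name__ == "__main__":
    print(peak_flops_separate(2, 2, 128))  # 2 adds + 2 muls, 128-bit: 8
    print(peak_flops_fma(1, 256))          # 1 FMA, 256-bit: 8
```

On paper both options peak at 8 DP FLOPs per cycle. The separate pipes only reach that with an even mix of adds and multiplies, while the single FMA pipe only reaches it on actual multiply-add operations.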

Summit Ridge vs Cannonlake SoC
- CL at least has an iGPU

Raven Ridge vs Cannonlake SoC
- CL has moar coars.
- CL uses MCDRAM.
 