Anandtech: Intel's Skylake-SP Xeon VS AMD's EPYC 7000

JoeRambo · Jul 12, 2017

krumme said:
Now how is AMD going to improve on the performance of the distributed L3?
It's a disadvantage for them and shows the downside of the design.
A simple way to look at it: they can't. But so what? Where that matters, it's a lost market.
IMO the best strategy going forward is to entrench and build on your strength, not try to boil the ocean.
7nm high-performance with just a tuned core and uncore, and they are in a very good position. It looks balanced as it is. No need to invent a new BD.

They have PLENTY of options in fact:

1) Keep playing to the strong L2 performance by increasing it to 1MB per core. An obvious path with time-proven gains: fewer misses, less L3 traffic. ( just keep those morons who designed previous AMD caches away please )
2) Increase L3 per CCX to 12 or 16MB to help keep traffic local to CCX.
3) Keep pumping clocks on the interconnects, and keep raising memory clocks / lowering latency.
4) Increase core count per CCX to 6 ( or one can dream - to 8 ).

wahdangun · Jul 12, 2017

JoeRambo said:
They have PLENTY of options in fact:

1) Keep playing to the strong L2 performance by increasing it to 1MB per core. An obvious path with time-proven gains: fewer misses, less L3 traffic. ( just keep those morons who designed previous AMD caches away please )
2) Increase L3 per CCX to 12 or 16MB to help keep traffic local to CCX.
3) Keep pumping clocks on the interconnects, and keep raising memory clocks / lowering latency.
4) Increase core count per CCX to 6 ( or one can dream - to 8 ).

If AMD increases the core count per CCX, it will lead to increased latency.

I think just removing the 4GHz barrier will be enough, and maybe decreasing inter-CCX latency.

Especially since Intel regressed in IPC with their new mesh topology, an increase in IPC is not really crucial.

JoeRambo · Jul 12, 2017

wahdangun said:
If AMD increases the core count per CCX, it will lead to increased latency.

How will it increase latency? Intra-core latency is constant. Even IF you keep the L3 size and interconnect speed the same, you still have less inter-CCX traffic just by virtue of having more cores in the CCX and not needing to communicate outside it. That is of course amplified if the OS scheduler / app is CCX-aware.
The 4GHz barrier is irrelevant for servers; optimal clocks will not rise to 4GHz+ for what are 32/48C monsters.
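
To put a rough number on the "less inter-CCX traffic" point, here is a minimal sketch. It assumes each core talks to a uniformly random other core in the system, which is a simplification (real workloads and schedulers are not uniform). The 48-core total is taken from the "32/48C" figure above; the function name is made up for illustration.

```python
from fractions import Fraction


def local_fraction(total_cores: int, cores_per_ccx: int) -> Fraction:
    """Chance that a uniformly random *other* core sits in the same CCX.

    Assumption: traffic is spread evenly over all other cores.
    """
    if total_cores < 2:
        raise ValueError("need at least two cores")
    if total_cores % cores_per_ccx != 0:
        raise ValueError("total_cores must be a multiple of cores_per_ccx")
    # Of the (total_cores - 1) possible partners, (cores_per_ccx - 1) are local.
    return Fraction(cores_per_ccx - 1, total_cores - 1)


if __name__ == "__main__":
    for n in (4, 6, 8):
        share = float(local_fraction(48, n))
        print(f"{n} cores/CCX: {share:.1%} of random core-to-core traffic stays local")
```

Under that assumption the local share grows in direct proportion to (cores per CCX - 1), before any help from a CCX-aware scheduler.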

wahdangun · Jul 12, 2017

JoeRambo said:
How will it increase latency? Intra-core latency is constant. Even IF you keep the L3 size and interconnect speed the same, you still have less inter-CCX traffic just by virtue of having more cores in the CCX and not needing to communicate outside it. That is of course amplified if the OS scheduler / app is CCX-aware.
The 4GHz barrier is irrelevant for servers; optimal clocks will not rise to 4GHz+ for what are 32/48C monsters.

Because the L3 in the CCX is actually split into 4 modules. That is why, once the data spills to L3 and crosses a certain size, the latency increases substantially.

JoeRambo · Jul 12, 2017

wahdangun said:
Because the L3 in the CCX is actually split into 4 modules. That is why, once the data spills to L3 and crosses a certain size, the latency increases substantially.

While technically you are correct about the four L3 slices, the graph does not support the notion. Each slice is 2MB, but notice how 4MB accesses are still 15ns and only 6MB-8MB is deteriorating*. So a CCX with 6 or 8 cores would have 6 or 8 slices of L3 and the same "characteristics". If the L3 size per slice is increased, that would actually result in a performance gain. See what happens @ 8->12? Imagine a CCX having 6 cores and 12MB of L3 and pushing that point further out.
That's what I meant with "2) Increase L3 per CCX to 12 or 16MB to help keep traffic local to CCX." -> by increasing the L3 of the CCX you are in fact boosting the actual LLC that a core can use.

* I think the deterioration happens because the L3 is a victim (eviction) cache and prefers the local slice, so once you go beyond a certain point you start evicting older cache lines from the local slice instead of using the full size of the L3 in the CCX.

wahdangun · Jul 12, 2017

JoeRambo said:
While technically you are correct about the four L3 slices, the graph does not support the notion. Each slice is 2MB, but notice how 4MB accesses are still 15ns and only 6MB-8MB is deteriorating*. So a CCX with 6 or 8 cores would have 6 or 8 slices of L3 and the same "characteristics". If the L3 size per slice is increased, that would actually result in a performance gain. See what happens @ 8->12? Imagine a CCX having 6 cores and 12MB of L3 and pushing that point further out.
That's what I meant with "2) Increase L3 per CCX to 12 or 16MB to help keep traffic local to CCX." -> by increasing the L3 of the CCX you are in fact boosting the actual LLC that a core can use.

* I think the deterioration happens because the L3 is a victim (eviction) cache and prefers the local slice, so once you go beyond a certain point you start evicting older cache lines from the local slice instead of using the full size of the L3 in the CCX.

But by increasing the L3 cache you will also increase L3 latency across the board, and the topology will not be square, so the cores will not be balanced, especially if the data was evicted from the L2 of a core located far from the center of the CCX.

JoeRambo · Jul 12, 2017

wahdangun said:
But by increasing the L3 cache you will also increase L3 latency across the board, and the topology will not be square, so the cores will not be balanced, especially if the data was evicted from the L2 of a core located far from the center of the CCX.

Still not quite sure what you mean.
http://www.chip-architect.com/news/Zen_Summit_Ridge_First.jpg

take this, add 2 more cores to CCX, things are still "square", L3 can be 8MB, 9MB or 12MB or whatever AMD desires and can afford. And L3 is still in the center, with ~same characteristics. SURE average latency could increase a bit due to distance.

krumme · Jul 12, 2017

Regarding L3 latency: I don't know what major value the AT MySQL DB test has anyway.
What is the likelihood of a database just fitting in L3 for SKL-X but not EPYC? Isn't it RAM size we are talking about here, nearly always?

NTMBK · Jul 12, 2017

JoeRambo said:
Still not quite sure what you mean.
http://www.chip-architect.com/news/Zen_Summit_Ridge_First.jpg

take this, add 2 more cores to CCX, things are still "square", L3 can be 8MB, 9MB or 12MB or whatever AMD desires and can afford. And L3 is still in the center, with ~same characteristics. SURE average latency could increase a bit due to distance.

If they keep L3/CCX at 8MB, then the L3/core goes down from 2MB to 1.33MB, quite a big drop. Or if they boost L3 to 12MB, then there will be a matching increase in latency.
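
The per-core figures are easy to check. A minimal sketch follows; the list of CCX configurations is just the set of options floated in this thread, not anything AMD has announced.

```python
def l3_per_core_mb(l3_per_ccx_mb: float, cores_per_ccx: int) -> float:
    """L3 capacity per core, assuming the L3 is shared evenly across the CCX."""
    return l3_per_ccx_mb / cores_per_ccx


if __name__ == "__main__":
    # (cores per CCX, L3 per CCX in MB): the current layout plus options from the thread.
    for cores, l3 in [(4, 8), (6, 8), (6, 12), (8, 16)]:
        print(f"{cores} cores, {l3}MB L3: {l3_per_core_mb(l3, cores):.2f}MB per core")
```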

JoeRambo · Jul 12, 2017

NTMBK said:
Or if they boost L3 to 12MB, then there will be a matching increase in latency.

Latency will increase by maybe 1ns, but since memory is a further 75ns away, there is quite a lot of room for an improvement on average?
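
That trade-off can be sketched with the usual average-latency formula. The 15ns L3 latency, the +1ns penalty and the 75ns extra trip to memory are the figures quoted in this thread; the hit rates are purely hypothetical placeholders, as are the function names.

```python
def avg_latency_ns(l3_latency_ns: float, l3_hit_rate: float,
                   memory_penalty_ns: float = 75.0) -> float:
    """Average latency of an access that reaches L3.

    Every access pays the L3 latency; misses pay the extra trip to memory.
    """
    return l3_latency_ns + (1.0 - l3_hit_rate) * memory_penalty_ns


def break_even_hit_gain(extra_l3_latency_ns: float,
                        memory_penalty_ns: float = 75.0) -> float:
    """Hit-rate increase needed to cancel out a slower (bigger) L3."""
    return extra_l3_latency_ns / memory_penalty_ns


if __name__ == "__main__":
    # Hit rates below are assumptions for illustration only.
    baseline = avg_latency_ns(15.0, 0.80)   # 8MB L3 at 15ns
    bigger = avg_latency_ns(16.0, 0.85)     # 12MB L3 at 15ns + 1ns
    print(f"baseline: {baseline:.2f}ns, bigger L3: {bigger:.2f}ns")
    print(f"break-even hit-rate gain: {break_even_hit_gain(1.0):.2%}")
```

With these numbers a 1ns slower L3 pays for itself as soon as the hit rate rises by 1/75, a bit over one percentage point. Whether a 12MB L3 actually delivers that depends on the workload.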

moinmoin · Jul 12, 2017

German site ComputerBase released an article of their own. No benchmarks, but AFAIK they are the first to release more slides from Intel, and those are quite telling.
https://www.computerbase.de/2017-07/intel-xeon-skylake-sp-purley/3/
Looks like Intel spent quite some time to disparage AMD's approach in its Xeon presentation (around 20 slides focus on that), so the spokesperson previously dissing Epyc as "stitching together 4 desktop dies" is apparently company PR policy now.
http://www.barrons.com/articles/amd-reveals-epyc-details-intel-vows-to-top-it-1497997334

Some of the slides:

Thanks for the laughs, Intel. They don't seem to be as confident in their own products anymore to have to go this low.

formulav8 · Jul 12, 2017

jpiniero said:
Double the articles, double the clicks!

Ha, sounds about right for Toms

itsmydamnation · Jul 12, 2017

So according to Intel their own desktop processors are now not even desktop worthy*

* Logic path: Ryzen supports ECC, EPYC is repurposed desktop, regular Skylake doesn't support ECC, ECC is desktop-level stuff ( see Intel's slide dissing AMD ), therefore Skylake is trash not even worthy of desktop........

/s ...... sigh......

plopke · Jul 12, 2017

As someone who has used only Intel CPUs for the last decade, them saying "4 glued-together desktop dies" just comes across as childish and stupid, and for some reason it annoys me. If AMD glued together 4 "crap" CPUs, how come you get beaten in quite a few scenarios, ones that might be very valid for me as a customer who has been asking for more cores at lower price points for around 7 years? Urgh, if AMD had a proper marketing team they could make Intel look so disrespectful/greedy/dumb with just that one line.

PS: I will still be buying many Intel CPUs.

raghu78 · Jul 12, 2017

moinmoin said:
Thanks for the laughs, Intel. They don't seem to be as confident in their own products anymore to have to go this low.

Intel is in panic mode. They know the writing is on the wall. Their server market share and margins are going down.

wahdangun · Jul 12, 2017

JoeRambo said:
Still not quite sure what you mean.
http://www.chip-architect.com/news/Zen_Summit_Ridge_First.jpg

take this, add 2 more cores to CCX, things are still "square", L3 can be 8MB, 9MB or 12MB or whatever AMD desires and can afford. And L3 is still in the center, with ~same characteristics. SURE average latency could increase a bit due to distance.

No, it won't be square. And with increasing latency the IPC will also decrease.

The real problem is not latency between CCXs; it is that Windows and the software need to be aware of the Ryzen topology.

So they don't need more cores per CCX, but more GHz and lower latency between CCXs.

Atari2600 · Jul 12, 2017

Why so surprised? Sure Intel are back to bribing suppliers to offer Ryzen with only lethargic memory speeds (well below max supported).

Ancalagon44 · Jul 12, 2017

Atari2600 said:
Why so surprised? Sure Intel are back to bribing suppliers to offer Ryzen with only lethargic memory speeds (well below max supported).

Luckily companies like Microsoft, Google and Facebook are large enough that this tactic doesn't work so well anymore. Google has engineers who would not accept EPYC configured in a sub-par way, and they also will not be swayed by rebates from Intel since they are not reselling the processors.

Once the massive companies adopt EPYC, smaller companies will then ask the server suppliers for properly configured EPYC servers.

Anyway, the best part about this is that AMD sells their top end EPYC for $4200, when it costs much less to produce than any of Intel's chips. Their yields will be dramatically higher. I think AMD will be having a good quarter.

Tuna-Fish · Jul 12, 2017

Markfw said:
That review is BS IMO, its only benchmarked against older Xeon processors, how lame is that !

jpiniero said:
They'll probably do an epyc article later. Double the articles, double the clicks!

formulav8 said:
Ha, sounds about right for Toms

To be fair, according to the Anandtech article, AMD sent their processor really late. For a system as complex as a 2S server, running all the relevant tests properly in a week can be really hard. Anandtech chose to try, Tom's decided to delay.

jpiniero said:
1S is pretty much workstations. Servers are more or less 2S. You do have things like Xeon D though; but that's much lower power and footprint than either Epyc or the typical Xeons.

This is more because of Intel's market segmentation than because of business demands. What I (and I expect many more!) would like to see is as many cores as possible, in multiple 1S systems fit into a single 2U enclosure. Basically, density is king, but I see no need for having that many CPUs in a single system.

Ancalagon44 said:
I think AMD will be having a good quarter.

Well, not this quarter. Server market sales move slower than retail sales, I expect high volume on AMD servers no earlier than Q4.

Technotronic · Jul 12, 2017

Intel is acting pathetically. If they are being this underhanded in their slide deck, I can well imagine they are back to their old slimeball ways of bribes and threats behind closed doors. Brian Krzanich even looks the part..

IEC · Jul 12, 2017

Chip segmentation, compared:

krumme · Jul 12, 2017

moinmoin said:
German site ComputerBase released an article of their own. No benchmarks, but AFAIK they are the first to release more slides from Intel, and those are quite telling.
https://www.computerbase.de/2017-07/intel-xeon-skylake-sp-purley/3/
Looks like Intel spent quite some time to disparage AMD's approach in its Xeon presentation (around 20 slides focus on that), so the spokesperson previously dissing Epyc as "stitching together 4 desktop dies" is apparently company PR policy now.
http://www.barrons.com/articles/amd-reveals-epyc-details-intel-vows-to-top-it-1497997334

Some of the slides:

Thanks for the laughs, Intel. They don't seem to be as confident in their own products anymore to have to go this low.

Selling to Amazon, Google and Facebook with this marketing from the '80s only hurts brand value, and it backfires even in uninformed B2B segments. Do those "you don't get fired for buying IBM" customers even exist today?? I thought they were extinct. Intel talks to its customers as if they were braindead and in a B2C market.
More idiotic and weird than the Vega FE launch. Where do these tech firms get their marketing folks? It sure isn't from e.g. Heineken or Apple.
I find it hard to believe these are legit slides.

flash-gordon · Jul 12, 2017

Beaten by a glue.... infinity glue...

wildhorse2k · Jul 12, 2017

Congratulations to AMD. The review started with mixed results for Epyc in SPEC CPU2006 and database performance, but then it had a respectable lead in many other benchmarks, or was even. A very good alternative to Xeons for cases where it performs well.

Phynaz · Jul 12, 2017

IMHO Anandtech's testing isn't all that valid. Who doesn't virtualize their servers now? Would it have been difficult to install Hyper-V?
