AnandTech: Intel's Skylake-SP Xeon vs AMD's EPYC 7000


JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Now how is AMD going to improve on the performance of that distributed L3?
It's a disadvantage for them and shows the downside of the design.
A simple way to look at it: they can't. But so what? Where that matters, it's a lost market.
IMO the best strategy going forward is to entrench and build on your strengths, not try to boil the ocean.
7nm high-performance with just a tuned core and uncore, and they are in a very good position. It looks balanced as it is and will do. No need to invent a new Bulldozer.

They have PLENTY of options in fact:

1) Continue playing into strong L2 performance by increasing it to 1MB per core. Obvious path with time-proven gains, reduced misses, less L3 traffic. (Just keep those morons who designed previous AMD caches away, please.)
2) Increase L3 per CCX to 12 or 16MB to help keep traffic local to the CCX.
3) Increase core count per CCX to 6 (or, one can dream, to 8).
4) Keep pumping up interconnect clocks, and keep raising memory clocks / lowering latency.
 

wahdangun

Golden Member
Feb 3, 2011
1,007
148
106
They have PLENTY of options in fact:

1) Continue playing into strong L2 performance by increasing it to 1MB per core. Obvious path with time-proven gains, reduced misses, less L3 traffic. (Just keep those morons who designed previous AMD caches away, please.)
2) Increase L3 per CCX to 12 or 16MB to help keep traffic local to the CCX.
3) Increase core count per CCX to 6 (or, one can dream, to 8).
4) Keep pumping up interconnect clocks, and keep raising memory clocks / lowering latency.

If AMD increases the core count per CCX, it will lead to increased latency.

I think just removing the 4 GHz barrier will be enough, and maybe decreasing inter-CCX latency.

Especially since Intel regressed in IPC with their new mesh topology, an increase in IPC is not really crucial.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
If AMD increases the core count per CCX, it will lead to increased latency.

How will it increase latency? Core-to-core latency inside a CCX is constant. Even if you keep L3 size and interconnect speed the same, you still have less inter-CCX traffic just by virtue of having more cores in a CCX that do not need to communicate outside it. That is of course amplified if the OS scheduler / app is CCX-aware.
The 4 GHz barrier is irrelevant for servers; optimal clocks will not rise to 4 GHz+ for what are 32/48C monsters.
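
A rough way to put numbers on the "more cores per CCX means less inter-CCX traffic" point. This is only a sketch: it assumes every core talks to every other core on the socket with equal probability, which real workloads and CCX-aware schedulers will not do. The function name and the 48-core total are assumptions made for the example, not anything AMD has published.

```python
def same_ccx_probability(total_cores: int, cores_per_ccx: int) -> float:
    """Probability that a uniformly chosen *other* core shares the caller's CCX.

    Assumption (not from the thread): communication partners are picked
    uniformly at random from all other cores on the socket.
    """
    if total_cores < 2 or not 1 <= cores_per_ccx <= total_cores:
        raise ValueError("need total_cores >= 2 and 1 <= cores_per_ccx <= total_cores")
    return (cores_per_ccx - 1) / (total_cores - 1)


if __name__ == "__main__":
    total = 48  # hypothetical socket size, divisible by 4, 6 and 8
    for ccx in (4, 6, 8):
        share = same_ccx_probability(total, ccx)
        print(f"{ccx} cores/CCX: {share:.1%} of peers are CCX-local")
```

Under that uniform-traffic assumption the CCX-local share grows with CCX size (3 of 47 peers at 4 cores per CCX, 7 of 47 at 8), and a CCX-aware scheduler would push it higher still.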
 

wahdangun

Golden Member
Feb 3, 2011
1,007
148
106
How will it increase latency? Core-to-core latency inside a CCX is constant. Even if you keep L3 size and interconnect speed the same, you still have less inter-CCX traffic just by virtue of having more cores in a CCX that do not need to communicate outside it. That is of course amplified if the OS scheduler / app is CCX-aware.
The 4 GHz barrier is irrelevant for servers; optimal clocks will not rise to 4 GHz+ for what are 32/48C monsters.

Because the L3 in the CCX is actually split into 4 modules. That is why latency increases substantially once the data spills into L3 and passes a certain size.

 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Because the L3 in the CCX is actually split into 4 modules. That is why latency increases substantially once the data spills into L3 and passes a certain size.



While technically you are correct about the four L3 slices, the graph does not support that notion. Each slice is 2MB, but notice how 4MB accesses are still at 15ns and latency only deteriorates at 6MB-8MB*. So a CCX with 6 or 8 cores would have 6 or 8 slices of L3 and the same "characteristics". If L3 size per slice is increased, that would actually result in a performance gain. See what happens at 8->12? Imagine a CCX having 6 cores and 12MB of L3 and pushing that point further out.
That's what I mean by "2) Increase L3 per CCX to 12 or 16MB to help keep traffic local to CCX." -> by increasing the L3 of a CCX you are in fact boosting the actual LLC that a core can use.


* I think the deterioration happens because the L3 is a victim cache and prefers the local slice, so once you go beyond a certain point you start evicting older cache lines from the local slice instead of using the full size of the L3 in the CCX.
 

wahdangun

Golden Member
Feb 3, 2011
1,007
148
106
While technically you are correct about the four L3 slices, the graph does not support that notion. Each slice is 2MB, but notice how 4MB accesses are still at 15ns and latency only deteriorates at 6MB-8MB*. So a CCX with 6 or 8 cores would have 6 or 8 slices of L3 and the same "characteristics". If L3 size per slice is increased, that would actually result in a performance gain. See what happens at 8->12? Imagine a CCX having 6 cores and 12MB of L3 and pushing that point further out.
That's what I mean by "2) Increase L3 per CCX to 12 or 16MB to help keep traffic local to CCX." -> by increasing the L3 of a CCX you are in fact boosting the actual LLC that a core can use.


* I think the deterioration happens because the L3 is a victim cache and prefers the local slice, so once you go beyond a certain point you start evicting older cache lines from the local slice instead of using the full size of the L3 in the CCX.


But by increasing the L3 cache you will also increase L3 latency across the board, and the topology will not be square, so the cores will not be balanced, especially if the data was evicted from the L2 of a core located far from the center of the CCX.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
But by increasing the L3 cache you will also increase L3 latency across the board, and the topology will not be square, so the cores will not be balanced, especially if the data was evicted from the L2 of a core located far from the center of the CCX.

Still not quite sure what you mean.
http://www.chip-architect.com/news/Zen_Summit_Ridge_First.jpg

Take this, add 2 more cores to the CCX, and things are still "square". The L3 can be 8MB, 9MB, 12MB or whatever AMD desires and can afford. And the L3 is still in the center, with roughly the same characteristics. Sure, average latency could increase a bit due to distance.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
Regarding L3 latency: I don't know what major value the AT MySQL DB test has anyway.
What is the likelihood of a database just fitting in L3 on SKL-X but not on EPYC? Isn't it nearly always RAM sizes we are talking about here?
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
Still not quite sure what you mean.
http://www.chip-architect.com/news/Zen_Summit_Ridge_First.jpg

Take this, add 2 more cores to the CCX, and things are still "square". The L3 can be 8MB, 9MB, 12MB or whatever AMD desires and can afford. And the L3 is still in the center, with roughly the same characteristics. Sure, average latency could increase a bit due to distance.

If they keep L3/CCX at 8MB, then the L3/core goes down from 2MB to 1.33MB, quite a big drop. Or if they boost L3 to 12MB, then there will be a matching increase in latency.
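
The per-core figures above are plain division; a minimal sketch to check them (the helper name is made up for illustration):

```python
def l3_per_core_mb(l3_per_ccx_mb: float, cores_per_ccx: int) -> float:
    """L3 capacity per core, in MB, for a CCX that shares one L3."""
    if cores_per_ccx <= 0:
        raise ValueError("cores_per_ccx must be positive")
    return l3_per_ccx_mb / cores_per_ccx


# Current 4-core CCX, a 6-core CCX with unchanged L3, and a 6-core CCX with 12MB.
for l3_mb, cores in ((8, 4), (8, 6), (12, 6)):
    print(f"{l3_mb}MB L3 / {cores} cores = {l3_per_core_mb(l3_mb, cores):.2f}MB per core")
```

So a 6-core CCX needs 12MB of L3 just to hold the current 2MB per core.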
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Or if they boost L3 to 12MB, then there will be a matching increase in latency.

Latency will increase by maybe 1ns, but since memory is a further 75ns away, isn't there quite a gap left for an improvement on average?
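
To make that trade-off concrete, here is a back-of-the-envelope sketch. The 15ns L3 figure and the extra 75ns to memory are the numbers quoted in this thread; the miss rates are purely hypothetical, picked only to show the shape of the calculation.

```python
MEMORY_PENALTY_NS = 75.0  # extra time to reach DRAM after an L3 miss (figure from the thread)


def average_latency_ns(l3_latency_ns: float, l3_miss_rate: float) -> float:
    """Average latency of an access that reaches L3: every access pays the
    L3 latency, and the missing fraction pays the memory penalty on top."""
    if not 0.0 <= l3_miss_rate <= 1.0:
        raise ValueError("l3_miss_rate must be between 0 and 1")
    return l3_latency_ns + l3_miss_rate * MEMORY_PENALTY_NS


def break_even_miss_rate_drop(extra_l3_latency_ns: float) -> float:
    """How far the miss rate has to fall to pay for a slower L3."""
    return extra_l3_latency_ns / MEMORY_PENALTY_NS


if __name__ == "__main__":
    # Hypothetical miss rates: 30% with 8MB of L3, 25% with 12MB.
    print(f"8MB  @ 15ns: {average_latency_ns(15.0, 0.30):.2f}ns average")
    print(f"12MB @ 16ns: {average_latency_ns(16.0, 0.25):.2f}ns average")
    print(f"break-even miss-rate drop for +1ns: {break_even_miss_rate_drop(1.0):.2%}")
```

With these numbers the extra 1ns pays for itself as soon as the bigger L3 cuts the miss rate by about 1.3 percentage points. Whether it does depends entirely on the workload.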
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
German ComputerBase released an article of their own. No benchmarks, but AFAIK they are the first to release more slides from Intel, which are quite telling.
https://www.computerbase.de/2017-07/intel-xeon-skylake-sp-purley/3/
Looks like Intel spent quite some time disparaging AMD's approach in its Xeon presentation (around 20 slides focus on that), so the spokesperson's earlier dissing of Epyc as "stitching together 4 desktop dies" is apparently company PR policy now.
http://www.barrons.com/articles/amd-reveals-epyc-details-intel-vows-to-top-it-1497997334

Some of the slides:



Thanks for the laughs, Intel. They don't seem to be as confident in their own products anymore if they have to go this low.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,867
3,418
136
So according to Intel their own desktop processors are now not even desktop-worthy*

* Logic path: Ryzen supports ECC, EPYC is repurposed desktop, regular Skylake doesn't support ECC, ECC is desktop-level stuff (see Intel's slide dissing AMD), therefore Skylake is trash, not even worthy of the desktop...


/s ...... sigh......
 

plopke

Senior member
Jan 26, 2010
238
74
101
As someone who has used only Intel CPUs for the last decade, them saying "4 glued-together desktop dies" just comes across as childish and stupid, and for some reason it annoys me. If AMD glued together 4 "crap" CPUs, how come you get beaten in quite a few scenarios that might be very valid for me, as a customer who has been asking for more cores at lower price points for around 7 years? Urgh, if AMD had a proper marketing team they could make Intel look so disrespectful/greedy/dumb with just that one line.

PS: I will still be buying many Intel CPUs.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Thanks for the laughs, Intel. They don't seem to be as confident in their own products anymore if they have to go this low.
Intel is in panic mode. They know the writing is on the wall. Their server market share and margins are going down.
 

wahdangun

Golden Member
Feb 3, 2011
1,007
148
106
Still not quite sure what you mean.
http://www.chip-architect.com/news/Zen_Summit_Ridge_First.jpg

Take this, add 2 more cores to the CCX, and things are still "square". The L3 can be 8MB, 9MB, 12MB or whatever AMD desires and can afford. And the L3 is still in the center, with roughly the same characteristics. Sure, average latency could increase a bit due to distance.

No, it won't be square. And with increasing latency the IPC will also decrease.


The real problem is not the latency between CCXs, but that Windows and the software need to be aware of the Ryzen topology.

So they don't need more cores per CCX, but more GHz and lower latency between them.
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
Why so surprised? Sure Intel are back to bribing suppliers to offer Ryzen with only lethargic memory speeds (well below max supported).
 
Reactions: utmode and Drazick

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
Why so surprised? Sure Intel are back to bribing suppliers to offer Ryzen with only lethargic memory speeds (well below max supported).

Luckily, companies like Microsoft, Google and Facebook are large enough that this tactic doesn't work so well anymore. Google has engineers who would not accept EPYC configured in a sub-par way, and they also will not be swayed by rebates from Intel, since they are not reselling the processors.

Once the massive companies adopt EPYC, smaller companies will then ask the server suppliers for properly configured EPYC servers.

Anyway, the best part about this is that AMD sells their top end EPYC for $4200, when it costs much less to produce than any of Intel's chips. Their yields will be dramatically higher. I think AMD will be having a good quarter.
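
The yield argument can be illustrated with the simple Poisson yield model, Y = exp(-A * D). This is a sketch, not a cost estimate: the die areas and the defect density below are hypothetical round numbers, not figures from AMD, Intel or any foundry.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the Poisson model Y = exp(-A * D)."""
    if die_area_mm2 <= 0 or defects_per_cm2 < 0:
        raise ValueError("area must be positive and defect density non-negative")
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    defect_density = 0.1  # hypothetical defects per cm^2
    # Hypothetical areas: a small die used four per package vs. one large monolithic die.
    for label, area in (("small die", 200.0), ("large die", 700.0)):
        y = poisson_yield(area, defect_density)
        print(f"{label} ({area:.0f} mm^2): {y:.1%} yield")
```

The point is only that yield falls off exponentially with die area, so several small dies waste less silicon than one large die of the same total area. Real cost also depends on packaging, binning and salvaging partially defective dies.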
 
Reactions: DeeJayBump and Yakk

Tuna-Fish

Golden Member
Mar 4, 2011
1,422
1,759
136
That review is BS IMO, it's only benchmarked against older Xeon processors, how lame is that!

They'll probably do an epyc article later. Double the articles, double the clicks!

Ha, sounds about right for Toms

To be fair, according to the Anandtech article, AMD sent their processor really late. For a system as complex as a 2S server, running all the relevant tests properly in a week can be really hard. Anandtech chose to try, Tom's decided to delay.

1S is pretty much workstations. Servers are more or less 2S. You do have things like Xeon D though; but that's much lower power and footprint than either Epyc or the typical Xeons.

This is more because of Intel's market segmentation than because of business demands. What I (and, I expect, many more!) would like to see is as many cores as possible in multiple 1S systems that fit into a single 2U enclosure. Basically, density is king, but I see no need for having that many CPUs in a single system.

I think AMD will be having a good quarter.

Well, not this quarter. Server market sales move slower than retail sales, I expect high volume on AMD servers no earlier than Q4.
 

Technotronic

Junior Member
Jul 12, 2017
23
78
41
Intel is acting pathetically. If they are being this underhanded in their slide deck, I can only imagine they are back to their old slimeball ways of bribes and threats behind closed doors. Brian Krzanich even looks the part..
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
German ComputerBase released an article of their own. No benchmarks, but AFAIK they are the first to release more slides from Intel, which are quite telling.
https://www.computerbase.de/2017-07/intel-xeon-skylake-sp-purley/3/
Looks like Intel spent quite some time disparaging AMD's approach in its Xeon presentation (around 20 slides focus on that), so the spokesperson's earlier dissing of Epyc as "stitching together 4 desktop dies" is apparently company PR policy now.
http://www.barrons.com/articles/amd-reveals-epyc-details-intel-vows-to-top-it-1497997334

Some of the slides:



Thanks for the laughs, Intel. They don't seem to be as confident in their own products anymore if they have to go this low.
Selling to Amazon, Google and Facebook with this marketing from the '80s only hurts brand value, and it backfires even in uninformed B2B segments. Do those "you don't get fired for buying IBM" customers even exist today? I thought they were extinct. They talk to their customers as if they are braindead and in a B2C market.
More idiotic and weird than the Vega FE launch. Where do these tech firms get their marketing folks? It sure isn't from e.g. Heineken or Apple.
I find it hard to believe these are legit slides.
 

wildhorse2k

Member
May 12, 2017
180
83
71
Congratulations to AMD. The review started with mixed results for Epyc in SPEC CPU 2006 and database performance, but then it had a respectable lead in many other benchmarks or was even. A very good alternative to Xeons for cases where it performs well.
 
Reactions: Technotronic