AMD EPYC Server Processor Thread - EPYC 7000 series specs and performance leaked


Topweasel

Diamond Member
Oct 19, 2000
5,436
1,655
136
You want to see the latency Intel has on this gen?

This is what you show those who complained about the CCX latency!

Edit: Sorry, forgot to link the article-
https://www.pcper.com/reviews/Proce...X-Processor-Review/Thread-Thread-Latency-and-

Am I reading that right, and the latency to every single core from any core is +100ns? If so, then yeah, the complaints of CCX cross-talk are kind of unfounded. Though it probably explains why they decided to up the L2 cache by so much on each core.
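For anyone curious how these thread-to-thread numbers are generated, here's a minimal sketch of a cache-line ping-pong test in the spirit of what PCPer's tool appears to do. The core numbers and iteration count are my own arbitrary picks (Linux-only); on Ryzen you'd run it once with both cores inside one CCX and once across CCXs, and the difference is the fabric hop:

```c
/* Minimal core-to-core "ping-pong" latency sketch.
 * Build: gcc -O2 -pthread pingpong.c */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ITERS 1000000

static _Atomic int flag = 0;    /* cache line bounced between the two cores */

static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *ponger(void *arg) {
    pin_to_core(*(int *)arg);
    for (int i = 0; i < ITERS; i++) {
        while (atomic_load(&flag) != 1) ;   /* wait for ping */
        atomic_store(&flag, 0);             /* pong back */
    }
    return NULL;
}

int main(void) {
    int ping_core = 0, pong_core = 1;   /* arbitrary; pick cores in different CCXs to see the IF hop */
    pthread_t t;
    pthread_create(&t, NULL, ponger, &pong_core);
    pin_to_core(ping_core);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERS; i++) {
        atomic_store(&flag, 1);             /* ping */
        while (atomic_load(&flag) != 0) ;   /* wait for pong */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    pthread_join(t, NULL);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    /* one iteration = two one-way hops, so divide by 2*ITERS */
    printf("avg one-way latency core %d -> core %d: %.1f ns\n",
           ping_core, pong_core, ns / (2.0 * ITERS));
    return 0;
}
```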
 

ajc9988

Senior member
Apr 1, 2015
278
171
116
Am I reading that right, and the latency to every single core from any core is +100ns? If so, then yeah, the complaints of CCX cross-talk are kind of unfounded. Though it probably explains why they decided to up the L2 cache by so much on each core.
They also talked crap about AMD using the L3 as a victim cache, but said nothing of it when Intel did it, except praise! Duplicity, HA!

But yes, you read that right. Now, even though inter-core comms are slower, it supposedly lowers the latency from the cores to other elements on the chip, which is worse with a dual-ring design. That could also be why AMD designed their chip the way they did with IF. Just love showing the duplicity!!! Thought you would love this news, though!
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,655
136
They also talked crap about AMD using the L3 as a victim cache, but said nothing of it when Intel did it, except praise! Duplicity, HA!

But yes, you read that right. Now, even though inter-core comms are slower, it supposedly lowers the latency from the cores to other elements on the chip, which is worse with a dual-ring design. That could also be why AMD designed their chip the way they did with IF. Just love showing the duplicity!!! Thought you would love this news, though!
I love it. It really shows the genius of the CCX design. If AMD ever decided to go big-die, it's obvious that treating the CPUs and some of the other useful portions, like memory controllers, as one module was an intelligent decision. They could easily go with a 16-core or even 32-core die and have far better latency within the modules. I think MCM is a great choice for AMD here, and probably will be for a while. But they have options and are obviously on the right track.
 

inf64

Diamond Member
Mar 11, 2011
3,764
4,223
136
I love it. It really shows the genius of the CCX design. If AMD ever decided to go big-die, it's obvious that treating the CPUs and some of the other useful portions, like memory controllers, as one module was an intelligent decision. They could easily go with a 16-core or even 32-core die and have far better latency within the modules. I think MCM is a great choice for AMD here, and probably will be for a while. But they have options and are obviously on the right track.
AMD has smart engineers, just like Intel. They knew they couldn't make a monolithic design that would be economically feasible and clock just the same as an MCM one. Now they can just pick the best Summit Ridge dies and mix and match them to form the best perf/watt/mm^2 SKUs; a genius decision, I must say. They will likely up the core count per CCX in Zen 2, to 6 cores per CCX. This should enable them to compete with whatever Intel comes up with via its own EMIB solutions. Fun times ahead!
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
I love it. It really shows the genius of the CCX design. If AMD ever decided to go big-die, it's obvious that treating the CPUs and some of the other useful portions, like memory controllers, as one module was an intelligent decision. They could easily go with a 16-core or even 32-core die and have far better latency within the modules. I think MCM is a great choice for AMD here, and probably will be for a while. But they have options and are obviously on the right track.

The era of the massive monolithic die is over. AMD just spotted this fact a bit earlier than Intel and designed Zen with Infinity Fabric for this reality. I think the areas AMD will work on to improve future generations of Zen are:

1. Higher fabric speed
2. Improved cache latency
3. More execution resources and higher IPC
4. Improved branch prediction
5. Higher clocks

I think Zen on 14nm+ will start by addressing higher clocks. I don't know if Pinnacle Ridge could sport higher fabric speeds, but it's something AMD might look at before 7nm Zen 2. With 7nm, AMD will have access to a true high-performance node designed for 5 GHz operation. 7nm Zen 2 should allow AMD to really address a lot of the shortcomings of the current Zen core. I think Intel Ice Lake on 10nm+ vs Zen 2 on GF 7LP will probably be the most competitive contest since the Athlon K7 and K8 days.
 

ajc9988

Senior member
Apr 1, 2015
278
171
116
The era of the massive monolithic die is over. AMD just spotted this fact a bit earlier than Intel and designed Zen with Infinity Fabric for this reality. I think the areas AMD will work on to improve future generations of Zen are:

1. Higher fabric speed
2. Improved cache latency
3. More execution resources and higher IPC
4. Improved branch prediction
5. Higher clocks

I think Zen on 14nm+ will start by addressing higher clocks. I don't know if Pinnacle Ridge could sport higher fabric speeds, but it's something AMD might look at before 7nm Zen 2. With 7nm, AMD will have access to a true high-performance node designed for 5 GHz operation. 7nm Zen 2 should allow AMD to really address a lot of the shortcomings of the current Zen core. I think Intel Ice Lake on 10nm+ vs Zen 2 on GF 7LP will probably be the most competitive contest since the Athlon K7 and K8 days.

What you are missing is that the latency hit from cross-CCX movement in the cache is the Infinity Fabric latency. They are already working on Infinity Fabric 2, which addresses your numbers 1 and 2. Use of the IBM-derived 7nm process addresses numbers 3 and 5. Number 4 every company works on continuously. So it is already being addressed; it's just a question of IF2 being ready, or of intermediate advances on IF.
 
Last edited:

ajc9988

Senior member
Apr 1, 2015
278
171
116
New information on latency:
https://www.pcper.com/reviews/Proce...Core-i5/CCX-Latency-Testing-Pinging-between-t
https://www.pcper.com/reviews/Proce...X-Processor-Review/Thread-Thread-Latency-and-

It should be noted that the Alienware Area 51 is built with 2933MHz RAM. So this means the gap is practically neutral moving between cores, with Intel having a slight advantage. Home-built rigs may see RAM speeds even higher than 3200MHz, AND TR and EPYC are supposed to have an improved interconnect over Ryzen!
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,655
136
What you are missing is that the latency hit from cross-CCX movement in the cache is the Infinity Fabric latency. They are already working on Infinity Fabric 2, which addresses your numbers 1 and 2. Use of the IBM-derived 7nm process addresses numbers 3 and 5. Number 4 every company works on continuously. So it is already being addressed; it's just a question of IF2 being ready, or of intermediate advances on IF.

I am not sure AMD has much room to grow on branch prediction. Ryzen supposedly has a high-90s% efficient predictor.
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,655
136
It should be noted that the Alienware Area 51 is built with 2933MHz RAM. So this means the gap is practically neutral moving between cores, with Intel having a slight advantage. Home-built rigs may see RAM speeds even higher than 3200MHz, AND TR and EPYC are supposed to have an improved interconnect over Ryzen!

TR and EPYC will not have an improved interconnect. Honestly, you should probably expect latency to increase when dealing with communication between dies.
 

Veradun

Senior member
Jul 29, 2016
564
780
136
http://www.servers-maintenance.com/...-processors-to-the-worlds-bestselling-server/

New Dell EMC PowerEdge servers deliver adaptable configurations to support these new workloads. With the high PCIe lane count (128 lanes) of the new AMD EPYC and its ability to support up to 24 NVMe devices from a single processor, we now offer some truly unique server innovation in software-defined storage and big data/data analytics at an outstanding TCO.

Dell EMC welcomes AMD's re-entry into the enterprise space by making the EPYC processor a part of our PowerEdge server technology. We're excited to make this technology available to you in the second half of 2017.
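The 24-drive claim checks out against the lane count; a quick sketch of the arithmetic, assuming the usual x4 link per NVMe drive:

```c
/* How 24 NVMe devices fit on one socket: assuming the typical x4 link per
 * NVMe drive, 24 drives consume 96 of EPYC's 128 PCIe lanes. */
#include <stdio.h>

int main(void) {
    int total_lanes = 128, drives = 24, lanes_per_drive = 4;
    int used = drives * lanes_per_drive;
    printf("%d drives x%d = %d lanes, %d left for NICs/GPUs\n",
           drives, lanes_per_drive, used, total_lanes - used);
    return 0;
}
```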
 

ajc9988

Senior member
Apr 1, 2015
278
171
116
I am not sure AMD has much room to grow on branch prediction. Ryzen supposedly has a high-90s% efficient predictor.
Even so, that doesn't mean they are not working to improve it.

TR and EPYC will not have an improved interconnect. Honestly, you should probably expect latency to increase when dealing with communication between dies.
When I said this, I was referring to the rumor from March claiming 100GB/s Infinity Fabric speeds (same one I pointed to for the B2 revision). The latency is tied to clocks, not absolute time. So, as the speed increases, the latency decreases (in absolute time, not per clock). We also discussed the ES chips for Ryzen running the fabric at 1:1 with the DR of the RAM speed. If they have the 1:1 with DR, not SR, then the increased speed of the Infinity Fabric will make the latency less of an impact, even though I do agree that the latency should increase to a degree per clock (we'll find out soon). Does what I was trying to say previously make more sense now?
http://www.tweaktown.com/news/56891/amds-12c-24t-16c-32t-cpus-called-threadripper/index.html
"Infinity Fabric can have a bandwidth up to 100GB/S"
 
Last edited:
Reactions: DarthKyrie

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,655
136
Even so, that doesn't mean they are not working to improve it.


When I said this, I was referring to the rumor from March claiming 100GB/s Infinity Fabric speeds (same one I pointed to for the B2 revision). The latency is tied to clocks, not absolute time. So, as the speed increases, the latency decreases (in absolute time, not per clock). We also discussed the ES chips for Ryzen running the fabric at 1:1 with the DR of the RAM speed. If they have the 1:1 with DR, not SR, then the increased speed of the Infinity Fabric will make the latency less of an impact, even though I do agree that the latency should increase to a degree per clock (we'll find out soon). Does what I was trying to say previously make more sense now?
http://www.tweaktown.com/news/56891/amds-12c-24t-16c-32t-cpus-called-threadripper/index.html
"Infinity Fabric can have a bandwidth up to 100GB/S"

Well, if that is true, I think the 100GB/s reference is probably going to turn out to be the link between the two dies, not the on-die interconnect. It's probably running at a lower speed than memory, but because it doesn't have a silicon design cost, it's probably wider. So: higher throughput at a lower speed, and therefore higher latency.
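The wide-but-slow trade-off is easy to put numbers on. This sketch uses purely hypothetical link widths and clocks, not AMD's actual parameters:

```c
/* Width-vs-clock trade-off: a wider link at a lower clock can match
 * throughput while each transfer cycle takes longer. Numbers hypothetical. */
#include <stdio.h>

int main(void) {
    double width_bits_a = 32,  clock_ghz_a = 2.0;   /* hypothetical narrow, fast on-die link */
    double width_bits_b = 128, clock_ghz_b = 0.5;   /* hypothetical wide, slow die-to-die link */
    printf("A: %.0f Gb/s, cycle time %.2f ns\n", width_bits_a * clock_ghz_a, 1.0 / clock_ghz_a);
    printf("B: %.0f Gb/s, cycle time %.2f ns\n", width_bits_b * clock_ghz_b, 1.0 / clock_ghz_b);
    /* both move 64 Gb/s, but each clock of B takes 4x longer -> higher latency */
    return 0;
}
```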
 
Reactions: ajc9988

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
Regarding the interconnect speed between Ryzen dies, we already know that for dual-socket systems EPYC will use 64 PCIe 3.0 lanes to extend the IF between the two sockets. Assuming this is neither a bottleneck nor underused, that amounts to 512 GT/s between the two sets of four Ryzen dies.
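The 512 GT/s figure falls straight out of the lane math (64 lanes at PCIe 3.0's 8 GT/s per lane), assuming the socket-to-socket links really do run at the full PCIe 3.0 rate:

```c
/* 64 PCIe 3.0 lanes at 8 GT/s = 512 GT/s raw. With 128b/130b encoding
 * that is roughly 63 GB/s of usable bandwidth per direction. */
#include <stdio.h>

int main(void) {
    int lanes = 64;
    double gt_per_lane = 8.0;                           /* PCIe 3.0 raw transfer rate */
    double total_gt = lanes * gt_per_lane;              /* 512 GT/s */
    double usable_gbs = total_gt * (128.0 / 130.0) / 8.0; /* encoding overhead, bits -> bytes */
    printf("%d lanes x %.0f GT/s = %.0f GT/s (~%.0f GB/s usable per direction)\n",
           lanes, gt_per_lane, total_gt, usable_gbs);
    return 0;
}
```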
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
So, why are the claims in the live report and the table different? And the claims in the live report actually make sense (check against the SPECint database).
What are the claims in the live report? I can't see it.

Table from AT: 2P EPYC 7601 vs 2P E5-2699A V4

                            SPECint   SPECfp
Performance                 1.47x     1.75x
Average power               0.96x     0.99x
Total system-level energy   0.88x     0.78x
Overall perf/watt           1.54x     1.76x
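As a quick internal-consistency check on that table, perf/watt should be roughly performance divided by average power, and the numbers line up within rounding:

```c
/* Sanity check: overall perf/watt ~= performance / average power. */
#include <stdio.h>

int main(void) {
    double perf_int = 1.47, power_int = 0.96;
    double perf_fp  = 1.75, power_fp  = 0.99;
    printf("SPECint perf/watt: %.2fx (table says 1.54x)\n", perf_int / power_int);
    printf("SPECfp  perf/watt: %.2fx (table says 1.76x)\n", perf_fp / power_fp);
    return 0;
}
```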
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
What are the claims in the live report? I can't see it.
Wait, never mind, I need a cup of coffee; I could have sworn the numbers on the event slide were different.

Oh, I got it, I confused AMD's slides with the AMD/HPE one:

For reference, these numbers for int_rate/fp_rate are what I can find in the SPEC database, while AMD's fp_rate is significantly off the mark compared to other results. Probably icc vs gcc shenanigans (since you can notice that the Naples fp_rate is much higher here too), but nonetheless.
 
Reactions: lightmanek

ajc9988

Senior member
Apr 1, 2015
278
171
116
"The connectivity here is set at a bidirectional 42.6 GB/sec per link, at around an average energy of ~2 pJ per bit (or 0.672W per link, 0.336W per die per link, totaling 4.032W for the chip). It is worth noting that Intel’s eDRAM for Broadwell was set as a 50 GB/s bidirectional link, so in essence moving off die in EPYC has a slightly slower bandwidth than Crystalwell. With a total of six links within the silicon, that provides a total of 2 terabits per second of data movement, although AMD didn’t state what the bottlenecks or latency values were."

"Socket-to-socket communication is designed at the die level, rather than going through a singular interface. One die in each processor is linked to the same die in the other processor, meaning that for the worst-case scenario data has to make two hops to reach a core or memory controller on the other side of the system. Each link has a bidirectional 37.9 GB/s bandwidth, which is only slightly less than the intra-socket communication bandwidth, although we would expect socket-to-socket to have a slightly higher latency based on distance. AMD has not shared latency numbers at this time."

"It is worth noting that the 42.6 GB/s die-to-die bandwidth is identical to the dual-channel memory bandwidth quoted per die:

http://images.anandtech.com/doci/11...or_press_and_analysts_06_19_2017-page-077.jpg

Time will tell if these become bottlenecks. Latency numbers please, I’d love to fill in that table above."
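If you want to check the article's arithmetic, the pieces fit together like this. A quick sketch; the 2 pJ/bit input is the article's "~2 pJ", so the per-link wattage lands within a couple percent of the quoted 0.672 W (the article presumably rounded somewhere):

```c
/* Reproducing the article's numbers: ~2 pJ/bit at 42.6 GB/s bidirectional
 * per link, six links per package. */
#include <stdio.h>

int main(void) {
    double gbs = 42.6;                           /* GB/s per link, bidirectional */
    double bits_per_s = gbs * 1e9 * 8;           /* bytes -> bits */
    double watts_link = bits_per_s * 2e-12;      /* ~2 pJ per bit */
    double total_w = watts_link * 6;             /* six links in the package */
    double total_tbits = gbs * 6 * 8 / 1000.0;   /* aggregate, Tb/s: ~2 terabits */
    printf("per link: %.3f W, package: %.3f W, aggregate: %.2f Tb/s\n",
           watts_link, total_w, total_tbits);
    return 0;
}
```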

"As part of the launch today, AMD is announcing partners working with them to optimize the platform for various workloads. Sources say that this includes all the major cloud providers, as well as all the major OEMs. We saw several demo systems at the launch event with partners as well, such as HPE and Dell."

Good info here! Thanks!
 

formulav8

Diamond Member
Sep 18, 2000
7,004
522
126
Looks like AMD reduced the Intel numbers due to the apparently unfair compiler advantage in the Intel scores, if TH is correct. I could see it being proper to some degree, but not 46%. Wacky.

Need real reviews.
 