News Amazon announces new cloud processors

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,258
15,390
136
A lot of marketing, but not even a suggestion of performance.
 

Saylick

Diamond Member
Sep 10, 2012
3,605
8,075
136
Not much detail, but Amazon is announcing Graviton4 (general-purpose CPU) and Trainium2 (AI). Graviton4 has up to 50% more cores than Graviton3 (which works out to up to 96 cores), and Amazon claims it provides 30% more compute performance.
We'll just need to see actual benchmarking to confirm those claims.
 

Markfw

Graviton3.

The benchmark I found said Graviton3 was about 10% faster than Milan, and we know that Genoa is a LOT faster than Milan, so it would be interesting to see Genoa benchmarked against Graviton 4.


I just found it. Genoa is 43% faster than Milan, so adding the 10% and the 30%, Genoa should still be faster than Graviton4. At least it will be close.
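Relative speedups compound by multiplication rather than addition, so chaining Graviton3's ~10% lead over Milan with Amazon's claimed 30% uplift gives about 1.10 × 1.30 ≈ 1.43x Milan, essentially level with the quoted Genoa figure. A quick sketch, assuming all three percentages are comparable throughput numbers (they come from different sources and workloads, so this is only a rough sanity check):

```python
# Rough sanity check of the numbers quoted in this thread (Milan = 1.00x baseline).
# Assumption: all three percentages measure comparable throughput; in reality they
# come from different sources and workloads.
GRAVITON3_VS_MILAN = 1.10      # "Graviton3 was about 10% faster than Milan"
GRAVITON4_VS_GRAVITON3 = 1.30  # Amazon's claimed "30% more compute performance"
GENOA_VS_MILAN = 1.43          # "Genoa is 43% faster than Milan"


def compound(*ratios: float) -> float:
    """Chain relative speedups by multiplying them together."""
    result = 1.0
    for ratio in ratios:
        result *= ratio
    return result


# Adding the percentages (10% + 30%) slightly understates the chained result.
additive_estimate = 1.0 + (GRAVITON3_VS_MILAN - 1.0) + (GRAVITON4_VS_GRAVITON3 - 1.0)
graviton4_vs_milan = compound(GRAVITON3_VS_MILAN, GRAVITON4_VS_GRAVITON3)

print(f"Graviton4, additive:   {additive_estimate:.2f}x Milan")
print(f"Graviton4, compounded: {graviton4_vs_milan:.2f}x Milan")
print(f"Genoa:                 {GENOA_VS_MILAN:.2f}x Milan")
```

On these inputs the two chips land within rounding of each other, which fits the "at least it will be close" conclusion; the result is only as good as the inputs.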

Also, we don't know how much lower Graviton4's power usage is.
 

Saylick

Right, I don't expect it to be faster outright than the leader of the pack, but Amazon doesn't need it to have the performance crown. They just need it to be price competitive, which is the whole ethos of their Graviton line to begin with. If one day Graviton becomes performance competitive with Intel or AMD, that'd be a BAD look for Intel/AMD. It would spell disaster considering that Graviton is not even fully custom.
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
I've run EDA workloads on Gravs and been very, very impressed - especially for single-thread. That V1 core has some oomph and V2 should be a nice boost.

They also mentioned adding Grace Hopper instances (though not, I think, standalone Grace - but its perf should be generally similar to Grav4 anyway.)
 

mikegg

Golden Member
Jan 30, 2010
1,847
471
136

Power consumption should be much lower than Zen4 and it should be cheaper per EC2 server.

Most companies don't need the fastest ST server CPUs. They need adequate speeds and then all that matters is $/perf.
 
Reactions: Nothingness

Markfw

And you know this because ???? Link ??
 

mikegg

I know this because ARM CPUs are generally more efficient than AMD CPUs, and because Graviton1, 2, and 3 have been more power efficient than equivalent AMD CPUs.

and is the most powerful and energy efficient chip we have ever built for a broad range of workloads.
This quote from Amazon suggests Graviton4 is even more efficient. No reason to think otherwise.
 

uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
In the DC space, ARM only really holds the power efficiency crown when you look at specific workloads, not overall. Genoa and especially Bergamo are both extremely competitive on power, even from a "cloud-native" workload perspective.

Where Graviton is "cheaper" is that you don't pay a different company for the product, so your costs are pretty much just R&D, tapeout and manufacturing. If you can produce enough of them, you can get them cheaper overall than going to a vendor... but only if you can make enough of them.
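The "only if you can make enough of them" point is a fixed-cost amortization argument: design, R&D and tapeout are paid once, so the in-house cost per chip falls with volume until it drops below the vendor's price. A minimal sketch; every dollar figure below is a made-up placeholder, not a real AWS, AMD or TSMC number:

```python
import math


def in_house_unit_cost(fixed_cost_usd: float, unit_cost_usd: float, volume: int) -> float:
    """Per-chip cost once one-time (R&D, tapeout) costs are spread over `volume` chips."""
    return fixed_cost_usd / volume + unit_cost_usd


def break_even_volume(fixed_cost_usd: float, unit_cost_usd: float,
                      vendor_price_usd: float) -> float:
    """Smallest volume at which building in-house costs no more than buying.

    Returns math.inf if manufacturing cost alone is not below the vendor price.
    """
    saving_per_chip = vendor_price_usd - unit_cost_usd
    if saving_per_chip <= 0:
        return math.inf
    return math.ceil(fixed_cost_usd / saving_per_chip)


# Hypothetical placeholder figures, for illustration only.
FIXED = 500_000_000   # one-time design + tapeout cost
UNIT = 1_500          # manufacturing + packaging cost per chip
VENDOR = 5_000        # price per chip if bought from a vendor

for volume in (50_000, 500_000, 5_000_000):
    cost = in_house_unit_cost(FIXED, UNIT, volume)
    print(f"{volume:>9,} chips: ${cost:,.0f} per chip vs ${VENDOR:,} from vendor")
print("break-even volume:", break_even_volume(FIXED, UNIT, VENDOR))
```

Below the break-even volume the one-time costs dominate and buying from a vendor wins; well above it, the in-house chip approaches its bare manufacturing cost.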
 
Reactions: Tlh97 and moinmoin

StefanR5R

Elite Member
Dec 10, 2016
5,968
8,941
136
Reactions: Tlh97 and Elfear

Nothingness

Diamond Member
Jul 3, 2013
3,134
2,145
136
uzzi38 said:
In the DC space, ARM only really holds power efficiency crown when you look at specific workloads, not overall. Genoa and especially Bergamo are both extremely power competitive from a power perspective, even when looking from a "cloud-native" workload perspective.
Do you have some link to share to give some substance to your claim? Has anyone ever measured power consumption on AWS or other cloud machines, Arm-based or not?

EDIT: I forgot to say I definitely agree with the second part of your post
 

uzzi38

I'm pretty sure people have put it to the test against Ampere Altra Max specifically. Obviously AmpereOne is still vapourware, so we're stuck comparing against Altra Max only, and in ServeTheHome's words:

Power consumption is perhaps the most shocking. We often hear that Arm servers will always be better on power consumption than x86, but in the cloud native space, that is only part of the story. With our AMD EPYC 9754, we had SPEC CPU2017 figures that were roughly 3x its only 128-core competitor, the Ampere Altra Max M128-30. Power consumption was nowhere near 3x. In our recent HPE ProLiant RL300 Gen11 Review, we were seeing a server maximum of around 350-400W. In our 2U Supermicro ARS-210ME-FNR 2U Edge Ampere Altra Max Arm Server Review we saw idle at 132W and 365W-400W. We tested the Bergamo part in several single-socket 2U Supermicro servers that we have including the Supermicro CloudDC AS-2015CS-TNR and we saw idle in the 117-125W range and a maximum of 550-600W.

The impact of this is that AMD is now offering 3x the SPEC CPU2017 performance at similar idle but only around 50% higher power consumption. We fully expect Ampere AmpereOne will rebalance this, but for those who have counted x86 out in the cloud native space, it is not that simple.
Link to the aforementioned article is here. ServeTheHome doesn't expect AmpereOne to completely close the gap, and it's pretty understandable why.
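Taking the quoted ServeTheHome numbers at face value, the gap at full load works out to roughly 2x in SPEC CPU2017 performance per watt. A back-of-the-envelope sketch, assuming the midpoints of the quoted maximum-power ranges (365-400W for the Supermicro Altra Max server, 550-600W for Bergamo) are representative, and ignoring idle power and the fact that these are whole-server measurements:

```python
# Figures quoted from the ServeTheHome excerpt; using range midpoints is an assumption.
SPEC_PERF_RATIO = 3.0                  # EPYC 9754 vs Altra Max M128-30, "roughly 3x"
altra_max_peak_w = (365 + 400) / 2     # Supermicro Altra Max server, max power range
bergamo_peak_w = (550 + 600) / 2       # single-socket Bergamo servers, max power range

power_ratio = bergamo_peak_w / altra_max_peak_w
perf_per_watt_ratio = SPEC_PERF_RATIO / power_ratio

print(f"Bergamo draws {power_ratio:.2f}x the peak power")
print(f"Bergamo delivers ~{perf_per_watt_ratio:.1f}x the SPEC CPU2017 perf/W")
```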

Graviton will likely fare a lot better in power efficiency thanks to the drastically reduced clock speeds (Graviton3 runs at 2.5GHz; given the perf vs core count numbers provided, Graviton4 must be running at around 2.2GHz), but we don't actually know for certain because those perf counters aren't exposed. But such reduced clock speeds pose a totally different question: the cost advantage might now be at risk again. After all, each Graviton4 is going to be a good bit weaker than a single Bergamo (96 cores vs 128 cores, sustained 2.2GHz vs 3.1GHz). Bergamo has more silicon, but the compute die is a big chunk of N5 for Ampere, and I/O is connected using EMIB (or maybe it's CoWoS or something this time - packaging looks different to Graviton3).
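The ~2.2GHz figure follows from assuming that throughput scales with cores × clock at unchanged per-core, per-clock performance: 50% more cores delivering only 30% more performance implies the clock dropped to about 2.5 × 1.3 / 1.5 ≈ 2.17GHz. The Bergamo comparison rests on the same crude cores × clock proxy. A sketch under exactly those assumptions; the `ipc_uplift` parameter is a hypothetical knob, not a published figure, showing how a per-clock gain from the V2 core would shift the estimate:

```python
def implied_clock_ghz(base_clock_ghz: float, core_scale: float,
                      perf_scale: float, ipc_uplift: float = 1.0) -> float:
    """Clock needed for `perf_scale` more throughput with `core_scale` more cores.

    Assumes throughput ~ cores * clock * per-clock performance, scaling perfectly.
    `ipc_uplift` is a hypothetical per-clock, per-core gain (1.0 = no change).
    """
    return base_clock_ghz * perf_scale / (core_scale * ipc_uplift)


def core_ghz(cores: int, clock_ghz: float) -> float:
    """Crude aggregate-throughput proxy that ignores IPC, memory and SMT."""
    return cores * clock_ghz


g4_clock = implied_clock_ghz(2.5, core_scale=1.5, perf_scale=1.3)
print(f"Implied Graviton4 clock (same IPC): {g4_clock:.2f} GHz")
print(f"With a hypothetical 10% IPC gain:  {implied_clock_ghz(2.5, 1.5, 1.3, 1.10):.2f} GHz")

graviton4 = core_ghz(96, 2.2)
bergamo = core_ghz(128, 3.1)
print(f"Core-GHz: Graviton4 {graviton4:.1f}, Bergamo {bergamo:.1f} ({bergamo / graviton4:.2f}x)")
```

Note the direction: any per-clock gain pushes the implied clock down rather than up, so the estimate is quite sensitive to what the 30% figure actually measures.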

I'm actually a bit curious about Graviton4, because from AWS's figures it's probably not much faster than Graviton3E, if at all (claimed 30% faster than G3 for G4 vs 35% faster than G3 for G3E). It likely just uses less power. That really says a lot when everything points to G4 being the more expensive product. Aside from the aforementioned difference in packaging, there's no considerable size improvement with V1 vs V2 (or at least with their respective X variants), plus you have 50% more cores, DDR5 channels and PCIe lanes vs G3E. It's all going to add up.
 

mikegg

Shouldn't we compare the same generation? It's a 2+ year old chip vs AMD's latest.


When Ampere had talked about their plans to put to market a 128-core variant of the Neoverse N1, a 60% increase in cores over their first generation 80-core attempt, we were of course perplexed on how they would achieve this, especially considering the chip is meant to be used on the very same platform with same memory resources, and also on the same fundamental technology – same core microarchitecture, same mesh IP, and same process node.

The Altra Max is a lot more dual-faced than other chips on the market. On one hand, the increase of core count to 128 cores in some cases ends up with massive performance gains that are able to leave the competition in the dust. In some cases, the M128-30 outperforms the EPYC 7763 by 45 to 88% in edge cases, let’s not mention Intel’s solutions.
 

mikegg

uzzi38 said:
Where Graviton is "cheaper" is that you don't pay a different company for the product, so your costs are whatever R&D costs alongside tapeout and manufacturing costs pretty much. If you can produce enough of them, you can get them cheaper overall than going to a vendor... but only if you can make enough of them.
There are many possible reasons why Graviton *is* cheaper on AWS than AMD CPUs:
  • Paying for transistors that you need for the exact workload instead of a more general-purpose CPU
  • Using ARM's core designs means lower cost
  • Higher number of cores per chip
  • Higher number of VMs per rack
  • Lower power consumption
  • Lower cooling requirements
I don't think it's as simple as designing volume chips in-house automatically making them cheaper.

That said, let's not downplay that the biggest advantage for Graviton is its lower $/perf. That's all that matters for most cloud workloads.
 

Nothingness

Yes TCO is what matters in the end. So Graviton has to be as power efficient as the competition (or close enough) to be worth it.
 

moinmoin

Diamond Member
Jun 1, 2017
5,094
8,098
136
Considering only AWS can own Graviton chips the whole TCO discussion is kind of moot though.
 

Nothingness

I agree we can't deduce real values (that's why I asked above whether power consumption had been measured), but if TCO were higher, they wouldn't deploy that many Graviton instances. They're not a charity.
 

uzzi38

mikegg said:
Shouldn't we compare the same generation? It's a 2+ year old chip vs AMD's latest.

You literally couldn't read the second sentence I wrote, could you?

You can't compare against a product that hasn't sampled to any media because it is functionally vapourware.

mikegg said:
There are many possible reasons why Graviton *is* cheaper on AWS than AMD CPUs:
  • Paying for transistors that you need for the exact workload instead of a more general-purpose CPU
  • Using ARM's core designs means lower cost
  • Higher number of cores per chip
  • Higher number of VMs per rack
  • Lower power consumption
  • Lower cooling requirements
I don't think it's as simple as designing volume chips inhouse automatically means it's cheaper.

That said, let's not downplay that the biggest advantage for Graviton is that it is cheaper/perf. That's all it matters for most cloud workloads.

1 and 2 are effectively the same thing - that the ARM core is supposedly smaller. At least in theory, because in practice Zen 4C is very close in size to X/V ARM core designs. 1 would make more sense if AWS were designing their own CPU cores, but they're not - they're using the standard ARM cores with the largest L2 size available to them afaik. Granted, I don't know what Graviton4 looks like from a shared cache perspective, so perhaps there's some die area saved on this. If you're referring to other parts of the design... well Graviton4 sports just as many DDR5 channels and PCIe lanes as Bergamo I believe, so it's not really cut there. Maybe they shaved a couple of mm^2 off for some USB or something? Bit of a stretch here.

3 literally isn't even true unless you're talking about individual dies, which is a frankly useless metric. Per package, Graviton4 sports fewer cores than Bergamo (96 vs 128), and it only now introduces support for 2P servers (something x86 land has supported for many years).

4 is a decent point now, but it's true thanks to point 5, lower power consumption. Mind you, this isn't an advantage of the ARM cores, but rather of the exceptionally low power target Graviton4 likely has, which allows less space to be used for cooling. The obvious trade-off here, as I mentioned before, is a reduction in per-core performance (significantly lower clocks, even lower than the prior-generation Graviton3) and a lower core count on top. The same goes for point 6, which is tied to point 5 again.

So all in all, you have 2 potentially valid points. The core could potentially be smaller (though I'd chalk any differences here up to area spent on cache rather than on cores), and power consumption is the real benefit. But like I mentioned before, that also has its own trade-offs, so whatever.

It's really obvious you didn't read what I wrote, because you wouldn't have said half the stuff you wrote if you did. Which I just find very funny honestly.
 
Reactions: Tlh97

moinmoin

Nothingness said:
I agree we can't deduce real values (and that's why I asked above if power consumption was measured), but if TCO was higher, they wouldn't deploy that many Graviton instances. They're not a charity organization
AWS is actually Amazon's huge money maker, with its traditional storefront losing money. Graviton is obviously an attempt to increase AWS' margins even more, but that doesn't mean it's already doing so now.
 

StefanR5R

Timothy Prickett Morgan of The Next Platform speculates that Graviton4 might be manufactured on TSMC N4, might consist of two rather than one CPU dies, is clocked at 2.7 GHz using 130 W, and provides 96 MB system level cache (L3$).
https://www.nextplatform.com/2023/11/28/aws-adopts-arm-v2-cores-for-expansive-graviton4-server-cpu/

uzzi38 said:
Granted, I don't know what Graviton4 looks like from a shared cache perspective, so perhaps there's some die area saved on this.
Andrei Frumusanu pointed out that the L2$ snoop filters take up a lot of space, which is why he expects that implementations of the Demeter platform design would tend to be frugal WRT system level cache size.
https://www.anandtech.com/show/16640/arm-announces-neoverse-v1-n2-platforms-cpus-cmn700-mesh/7
 
Reactions: Gideon

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
I think the mentality of some of the posters here is stuck in the era where wannabe "i hope clouderinos buy out our company" guys took 64 or 80 cores from R-PI, combined them with a record-breaking 8MB of L3 and a system interconnect meant for mobile phones.

Graviton 3 and now Graviton 4 have nothing to do with these early efforts and are in fact purpose built for their cloud niche.

1) 100-130W power consumption ballpark for a chip that packs 64 and now 96 cores
2) Competent IPC, no longer a fraction of Intel/AMD stuff
3) Cloud-optimized system architecture: 2MB of L2 keeps traffic local, 1MB of L3 per core helps inter-process comms and performance overall, and memory bandwidth is proper even if latency is not that great. Things like JVMs just work on these machines, with no hidden gotchas due to non-existent system-level integration.
4) Personally I love the predictable performance, since these are fixed-clock, single-NUMA-domain chips with a monolithic core complex and no HT: you get what you pay for, and the price is adjusted relative to x86 performance.
5) The software ecosystem is getting better every day: gone are the days of phone-chip compiler optimizations from 2014, and proper ARM core optimizations are now in LLVM and GCC.

So x86's "per-package core advantage" and "ST advantages" mean nothing unless the customer is specifically looking for them, and those niches are shrinking steadily.
 

Harry_Wild

Senior member
Dec 14, 2012
841
152
106
Nvidia’s H200 will be released in 2024! Amazon AWS management wants to challenge Microsoft's Azure division. Microsoft wants to be the king of AI. Sam Altman wants to build an AI chip too!😱🫣🤡
 