Not much detail, but Amazon is announcing Graviton4 (general CPU) and Trainium2 (AI). Graviton4 has up to 50% more cores than Graviton3 (which would make it up to 96 cores) and Amazon claims it can provide 30% more compute performance.
A lot of marketing, but not even a suggestion of performance.
We'll just need to see actual benchmarking to confirm those claims.
30% more than what? Hence my marketing claim.
Graviton3.
Right, I don't expect it to be faster outright than the leader of the pack, but Amazon doesn't need it to have the performance crown. They just need it to be price competitive, which is the whole ethos of their Graviton line to begin with. If one day Graviton becomes performance competitive with Intel or AMD, that'd be a BAD look for Intel/AMD. It would spell disaster considering that Graviton is not even fully custom.
The benchmark I found said Graviton3 was about 10% faster than Milan, and we know that Genoa is a LOT faster than Milan, so it would be interesting to see Genoa benchmarked against Graviton4.
I just found it. Genoa is 43% faster than Milan, so adding that 10% + 30%, Genoa should still be faster than Graviton4. At least it will be close.
Also, we don't know what the power usage of Graviton4 is.
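For what it's worth, successive percentage gains compound rather than add. Here is a minimal sketch of that arithmetic, using only the figures quoted in this thread. The constant and function names are made up for illustration, and it assumes the 10% and 43% benchmark figures are comparable with Amazon's 30% claim, which nobody here has verified.

```python
# Relative throughput, normalised to AMD Milan = 1.0.
# All three ratios are figures quoted in this thread, not measurements.
GRAVITON3_VS_MILAN = 1.10      # "about 10% faster than Milan"
GRAVITON4_VS_GRAVITON3 = 1.30  # Amazon's "30% more compute performance"
GENOA_VS_MILAN = 1.43          # "43% faster than Milan"


def chain(*ratios: float) -> float:
    """Multiply successive speedup ratios (gains compound, they do not add)."""
    result = 1.0
    for ratio in ratios:
        result *= ratio
    return result


graviton4_vs_milan = chain(GRAVITON3_VS_MILAN, GRAVITON4_VS_GRAVITON3)
genoa_vs_graviton4 = GENOA_VS_MILAN / graviton4_vs_milan

print(f"Graviton4 vs Milan: {graviton4_vs_milan:.2f}x")  # 1.43x
print(f"Genoa vs Graviton4: {genoa_vs_graviton4:.2f}x")  # 1.00x
```

Under these assumptions the two land at roughly the same level, so "close" looks about right, but only real benchmarks can settle it.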
Power consumption should be much lower than Zen4 and it should be cheaper per EC2 server.
And you know this because? Link?
Most companies don't need the fastest ST server CPUs. They need adequate speeds and then all that matters is $/perf.
I know this because ARM CPUs are generally more efficient than AMD CPUs, and because Graviton1, 2, and 3 have been more power efficient than equivalent AMD CPUs.
Quote here from Amazon suggests Graviton4 is even more efficient. No reason to think otherwise.
"...and is the most powerful and energy efficient chip we have ever built for a broad range of workloads."
In the DC space, ARM only really holds the power efficiency crown when you look at specific workloads, not overall. Genoa and especially Bergamo are both extremely competitive from a power perspective, even when looking from a "cloud-native" workload perspective.
Do you have some link to share to give some substance to your claim? Has anyone ever measured power consumption on AWS or other cloud machines, Arm-based or not?
I'm pretty sure people put it to the test against Ampere Altra Max specifically. Obviously, AmpereOne is still vapourware, so we're stuck comparing against Altra Max only. In ServeTheHome's words:
"Power consumption is perhaps the most shocking. We often hear that Arm servers will always be better on power consumption than x86, but in the cloud native space, that is only part of the story. With our AMD EPYC 9754, we had SPEC CPU2017 figures that were roughly 3x its only 128-core competitor, the Ampere Altra Max M128-30. Power consumption was nowhere near 3x. In our recent HPE ProLiant RL300 Gen11 Review, we were seeing a server maximum of around 350-400W. In our 2U Supermicro ARS-210ME-FNR 2U Edge Ampere Altra Max Arm Server Review we saw idle at 132W and 365W-400W. We tested the Bergamo part in several single-socket 2U Supermicro servers that we have including the Supermicro CloudDC AS-2015CS-TNR and we saw idle in the 117-125W range and a maximum of 550-600W.
"The impact of this is that AMD is now offering 3x the SPEC CPU2017 performance at similar idle but only around 50% higher power consumption. We fully expect Ampere AmpereOne will rebalance this, but for those who have counted x86 out in the cloud native space, it is not that simple."
Link to the aforementioned article is here. ServeTheHome doesn't expect AmpereOne to completely close the gap, and it's pretty understandable why.
EDIT: I forgot to say I definitely agree with the second part of your post.
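The ServeTheHome figures quoted above can be turned into a rough performance-per-watt ratio. This is only a sketch. Taking the midpoint of each quoted power range is my own simplification, the names are illustrative, and these are whole-server numbers from different chassis, so treat the result as ballpark.

```python
# Figures taken from the ServeTheHome passage quoted in this thread.
PERF_RATIO = 3.0                # EPYC 9754 vs Altra Max M128-30, SPEC CPU2017 ("roughly 3x")
ALTRA_MAX_W = (365.0, 400.0)    # quoted server power range under load, Altra Max
BERGAMO_MAX_W = (550.0, 600.0)  # quoted server maximum power range, Bergamo


def midpoint(power_range):
    """Midpoint of a (low, high) range; an assumption, not an STH figure."""
    low, high = power_range
    return (low + high) / 2


power_ratio = midpoint(BERGAMO_MAX_W) / midpoint(ALTRA_MAX_W)
perf_per_watt_ratio = PERF_RATIO / power_ratio

print(f"Bergamo power vs Altra Max: {power_ratio:.2f}x")             # 1.50x
print(f"Bergamo perf/W vs Altra Max: {perf_per_watt_ratio:.2f}x")    # 2.00x
```

So on these numbers Bergamo comes out around 2x the SPEC CPU2017 throughput per watt at full load, which matches the "3x performance at around 50% higher power" summary.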
Shouldn't we compare the same generation? It's a 2+ year old chip vs AMD's latest.
Graviton will likely fare a lot better in power efficiency thanks to the drastically reduced clock speeds (Graviton3 runs at 2.5GHz; given the perf vs core count numbers provided, Graviton4 must be running at 2.2GHz), but we actually don't know for certain because those perf counters aren't exposed. But such reduced clock speeds pose a totally different question: the cost advantage might be at risk again. Each Graviton4 is going to be a good bit weaker than a single Bergamo (96 cores vs 128 cores, sustained 2.2GHz vs 3.1GHz) after all. Bergamo has more silicon, but the compute die is a big chunk of N5 for Ampere, and I/O is connected using EMIB (or maybe it's CoWoS or something this time - packaging looks different to Graviton3).
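The clock estimate above can be reproduced with a back-of-the-envelope calculation. This sketch assumes throughput scales as cores times clock and that per-clock performance is unchanged between Graviton3 and Graviton4, which is almost certainly too simple given the core change. "Core-GHz" is a crude proxy that ignores IPC differences between architectures, and all names are illustrative.

```python
# Inputs as stated in this thread; none of them are measured values.
G3_CLOCK_GHZ = 2.5     # Graviton3 clock as given in the post
CORE_RATIO = 1.5       # "up to 50% more cores" (64 -> 96)
PERF_RATIO = 1.3       # Amazon's "30% more compute performance"
G4_CORES = 96
BERGAMO_CORES = 128
BERGAMO_CLOCK_GHZ = 3.1  # sustained clock as given in the post

# Assumption: performance ~ cores * clock, with identical per-clock throughput.
per_core_ratio = PERF_RATIO / CORE_RATIO
g4_clock_est = G3_CLOCK_GHZ * per_core_ratio

# Crude aggregate proxy; ignores IPC, memory bandwidth, SMT, and so on.
g4_core_ghz = G4_CORES * g4_clock_est
bergamo_core_ghz = BERGAMO_CORES * BERGAMO_CLOCK_GHZ
core_ghz_ratio = bergamo_core_ghz / g4_core_ghz

print(f"Estimated Graviton4 clock: {g4_clock_est:.2f} GHz")              # 2.17 GHz
print(f"Bergamo vs Graviton4 (core-GHz): {core_ghz_ratio:.2f}x")         # 1.91x
```

That is where the roughly 2.2GHz figure comes from. If Graviton4 has better per-clock performance than Graviton3, the real clock would be lower for the same claimed gain, and if Amazon's 30% figure is not a pure throughput number, the estimate does not hold at all.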
I'm actually a bit curious about Graviton4, because from AWS's figures it's probably not really much faster than Graviton3E, if at all (claimed 30% faster than G3 for G4 vs 35% faster than G3 for G3E). It just likely uses less power. That really says a lot when everything points to G4 being the more expensive product, aside from the aforementioned difference in packaging: no considerable size improvement with V1 vs V2 (or at least, with their respective X variants), plus you have 50% more cores, DDR5 channels and PCIe lanes vs G3E. It's all going to add up.
When Ampere had talked about their plans to put to market a 128-core variant of the Neoverse N1, a 60% increase in cores over their first generation 80-core attempt, we were of course perplexed on how they would achieve this, especially considering the chip is meant to be used on the very same platform with same memory resources, and also on the same fundamental technology – same core microarchitecture, same mesh IP, and same process node.
The Altra Max is a lot more dual-faced than other chips on the market. On one hand, the increase of core count to 128 cores in some cases ends up with massive performance gains that are able to leave the competition in the dust. In some cases, the M128-30 outperforms the EPYC 7763 by 45 to 88% in edge cases, let’s not mention Intel’s solutions.
Where Graviton is "cheaper" is that you don't pay a different company for the product, so your costs are pretty much just R&D plus tapeout and manufacturing. If you can produce enough of them, you can get them cheaper overall than going to a vendor... but only if you can make enough of them.
Considering only AWS can own Graviton chips, the whole TCO discussion is kind of moot though.
I agree we can't deduce real values (and that's why I asked above if power consumption was measured), but if TCO was higher, they wouldn't deploy that many Graviton instances. They're not a charity organization.
You literally couldn't read the second sentence I wrote, could you?
The Ampere Altra Max Review: Pushing it to 128 Cores per Socket
www.anandtech.com
I don't think it's as simple as designing volume chips in-house automatically meaning it's cheaper.
There are many possible reasons why Graviton *is* cheaper on AWS than AMD CPUs:
- Paying for transistors that you need for the exact workload instead of a more general-purpose CPU
- Using ARM's core designs means lower cost
- Higher number of cores per chip
- Higher number of VMs per rack
- Lower power consumption
- Lower cooling requirements
That said, let's not downplay that the biggest advantage for Graviton is that it is cheaper per unit of performance. That's all that matters for most cloud workloads.
AWS is actually Amazon's huge money maker, with its traditional storefront losing money. Graviton is obviously an attempt at increasing AWS' margins even more. But that doesn't have to mean that it's already doing so now.
Granted, I don't know what Graviton4 looks like from a shared cache perspective, so perhaps there's some die area saved on this.
Andrei Frumusanu pointed out that the L2$ snoop filters take up a lot of space, which is why he expects that implementations of the Demeter platform design would tend to be frugal WRT system level cache size.