Discussion Future ARM Cortex + Neoverse µArchs Discussion


SarahKerrigan

Senior member
Oct 12, 2014
609
1,489
136
I say probably because the RGCloudS guy on twitter has been saying Samsung is cooking some exotic ARM cores. Apparently they are going to use some special modified Cortex X5s in their 2025 Exynos 'Dream Chip'.

Since when does ARM allow deep modification of its licensed cores?

I am always very skeptical when I see claims like this, because to the best of my knowledge, it has literally never happened.
 

FlameTail

Diamond Member
Dec 15, 2021
3,209
1,847
106
Since when does ARM allow deep modification of its licensed cores?

I am always very skeptical when I see claims like this, because to the best of my knowledge, it has literally never happened.
Indeed.

The Built-on-Cortex program only allows for slight modifications.

That guy's credibility isn't very high though, and one would be right to be skeptical.
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
So now that so many players are jumping into the ARM PC bandwagon
But are they? Do we already have confirmations? I'm not sure I'd hold my breath yet for actually seeing multiple ARM-based PC devices, beyond maybe select QC devices.
 

Panino Manino

Senior member
Jan 28, 2017
847
1,061
136
Cardyak on twitter. I've found the block diagrams to be very accurate (see the link in the post below). There are block diagrams of a large variety of cores from AMD/Intel/Apple/ARM.


I'm only an "average user", very ignorant, who knows nothing about these things.
But after reading Anandtech at the beginning of the 2000s and now seeing CPUs with 9-wide decode and 8 ALUs... wow 😮
 
Reactions: NTMBK

FlameTail

Diamond Member
Dec 15, 2021
3,209
1,847
106
I'm only an "average user", very ignorant, who knows nothing about these things.
But after reading Anandtech at the beginning of the 2000s and now seeing CPUs with 9-wide decode and 8 ALUs... wow 😮
I remember reading for the first time Andrei's review of the A13 Bionic, as a 13 year old. It blew me away. How far we have come...
 

SpudLobby

Senior member
May 18, 2022
963
658
106
Crazy a15 is still better in performance and energy efficiency.. so the x5 core will reach a16 while apple a19 will increase the lead
Not surprising whatsoever. Arm designs for low power + performance, but they can only go so far with the area constraints at hand.

The X4 is a generic Firestorm core. Matches an M1/A15 roughly on GB5 and close on Spec, albeit 26% more power draw (in the Spec example and probably similar for GB5 with the 8 Gen 3).

In a laptop SoC on the same process, I'd bet this thing would still draw less power than e.g. Phoenix would at that performance, and certainly they have a good 25% ish perf/GHz lead on AMD.

But it's not going to close a gap with Apple or Qualcomm without more cache and probably a different cache hierarchy with respect to the L1, which might be what they'll change with the next core. They'll have to keep growing it a bit to keep pace.

Qualcomm did go with 2MB of L2 on the X4 per Geekerwan's review btw, which is good - doesn't seem like that's restricted to a datacenter thing at all. A lot of predictions look silly now on that end.
 

Henry swagger

Senior member
Feb 9, 2022
449
284
106
Not surprising whatsoever. Arm designs for low power + performance, but they can only go so far with the area constraints at hand.

The X4 is a generic Firestorm core. Matches an M1/A15 roughly on GB5 and close on Spec, albeit 26% more power draw (in the Spec example and probably similar for GB5 with the 8 Gen 3).

In a laptop SoC on the same process, I'd bet this thing would still draw less power than e.g. Phoenix would at that performance, and certainly they have a good 25% ish perf/GHz lead on AMD.

But it's not going to close a gap with Apple or Qualcomm without more cache and probably a different cache hierarchy with respect to the L1, which might be what they'll change with the next core. They'll have to keep growing it a bit to keep pace.

Qualcomm did go with 2MB of L2 on the X4 per Geekerwan's review btw, which is good - doesn't seem like that's restricted to a datacenter thing at all. A lot of predictions look silly now on that end.
Yeah, Firestorm is an impressive core in PPW.
 

soresu

Platinum Member
Dec 19, 2014
2,970
2,201
136
Which puts Nvidia and AMD in a weird position if they are also entering the space themselves (!?).
If by "the space" you mean SoC's purely for WoA laptops and NUC/Mac Mini style desktops then it's not really entering for either of them.

On AMD's side their APUs are basically doing that already just on a different CPU ISA.

While nVidia has been milling about with Tegra for over a decade already, so it's just a matter of software platform migration in their case.

The only question is will nVidia's effort be based on Cortex or Neoverse IP given the last Tegra chip (Orin?) was based on A78, and Grace is based on V2.
 

SpudLobby

Senior member
May 18, 2022
963
658
106
If by "the space" you mean SoC's purely for WoA laptops and NUC/Mac Mini style desktops then it's not really entering for either of them.

On AMD's side their APUs are basically doing that already just on a different CPU ISA.

While nVidia has been milling about with Tegra for over a decade already, so it's just a matter of software platform migration in their case.

The only question is will nVidia's effort be based on Cortex or Neoverse IP given the last Tegra chip (Orin?) was based on A78, and Grace is based on V2.
Uh, Cortex. Neoverse is for servers. It won't be custom either IMHO. It's possible they'll do a MediaTek co-packaged Windows solution, but based on the rumor about the Surface contract and what Qualcomm lost, I think it's likely Nvidia has a real laptop SoC with Cortex reference cores and a low power adaptation of new GPU IP on something like N3E for 2025.
 

soresu

Platinum Member
Dec 19, 2014
2,970
2,201
136
Uh, Cortex. Neoverse is for servers
There's nothing intrinsically stopping an SoC designer from using Neoverse IP for consumer uses, though I imagine that licensing Neoverse is more expensive as generally server stuff is more profitable.
It won't be custom either IMHO
That's a given.

nVidia tried custom with Denver 1/2 and Carmel, but ultimately they couldn't match (let alone surpass) ARM's own designs well enough for the R&D cost to be worth it.

Same thing with Qualcomm's OG Kryo.

Even worse with Samsung's Mongoose3/M3/Meerkat.

A 6 wide µArch getting curb stomped by a 4 wide µArch (A76+) is not a good look.

It truly just goes to show how good the ARM Ltd design teams are to be able to compete with such comparatively huge companies.
 
Reactions: Tlh97

Geddagod

Golden Member
Dec 28, 2021
1,214
1,177
106
The X4 is a generic Firestorm core. Matches an M1/A15 roughly on GB5 and close on Spec, albeit 26% more power draw (in the Spec example and probably similar for GB5 with the 8 Gen 3).
Huh, really? Iso node too?
 

SpudLobby

Senior member
May 18, 2022
963
658
106
Huh, really? Iso node too?
N4P. But there's not that much of a gap between it and the N5P the A15 was on, and it draws about 26% more power (5.7W vs 4.5W) on Spec while trailing by about 8-10%. That’s still really, really good for the area & cache Arm are working with. On GB5 it’s about 1693 @ 3.3GHz and probably similar power. Check the latest Geekerwan video.

Generic Firestorm albeit without the Apple L1, L2 roughly matches what I think this thing is (and that cache deficit is likely why power is still higher). But still great.
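
For anyone who wants the numbers above in one place, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted in this post (5.7W vs 4.5W, an 8-10% deficit, 1693 @ 3.3GHz); taking 9% as the midpoint of the deficit is an assumption, and the variable names are made up for illustration.

```python
# Figures quoted in the post; the 9% deficit is an assumed midpoint of "8-10%".
x4_power_w = 5.7
a15_power_w = 4.5
x4_relative_perf = 0.91  # X4 Spec performance relative to the A15 (assumption)

# How much more power the X4 draws, and what that does to perf per watt.
power_ratio = x4_power_w / a15_power_w
perf_per_watt_ratio = x4_relative_perf / power_ratio

# GB5 score normalised by clock, from the quoted 1693 @ 3.3GHz.
gb5_score = 1693
clock_ghz = 3.3
gb5_per_ghz = gb5_score / clock_ghz

print(f"power ratio (X4 / A15): {power_ratio:.2f}x")
print(f"perf/W relative to A15: {perf_per_watt_ratio:.2f}x")
print(f"GB5 points per GHz:     {gb5_per_ghz:.0f}")
```

Under those assumptions the X4 lands at a bit over 70% of the A15's perf/W on Spec, which fits the "really good for the area, but still behind" reading above.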
 

SpudLobby

Senior member
May 18, 2022
963
658
106
There's nothing intrinsically stopping an SoC designer from using Neoverse IP for consumer uses, though I imagine that licensing Neoverse is more expensive as generally server stuff is more profitable.
It wouldn’t make much sense. Sure in principle but why? They have extra area penalties and are built for different workloads and are usually lagging the real Cortex cores. They’d sooner just do X5 + smaller X5, Arm is flexible deliberately.
That's a given.

nVidia tried custom with Denver 1/2 and Carmel, but ultimately they couldn't match (let alone surpass) ARM's own designs well enough for the R&D cost to be worth it.

Same thing with Qualcomm's OG Kryo.

Even worse with Samsung's Mongoose3/M3/Meerkat.

A 6 wide µArch getting curb stomped by a 4 wide µArch (A76+) is not a good look.

It truly just goes to show how good the ARM Ltd design teams are to be able to compete with such comparatively huge companies.
 

soresu

Platinum Member
Dec 19, 2014
2,970
2,201
136
They’d sooner just do X5 + smaller X5
Lower clock yes, smaller unlikely.

Xn are designed with raw perf in mind over area and power vs the PPA balance of A7x/7xx.

When you already have a good balance it's easier to trade off something in the design to get a bit more area or power as ARM offers with the A720 lite variant.

I'm pretty confident now that X5/Blackhawk will be the basis for V3/Poseidon - that's why I was thinking a V3 based hi end WoA SoC might be possible being as the Grace successor will almost certainly be using it anyway.

Very unlikely I know, but still a possibility - and to me a better choice as we still lack for a System Ready type of spec for Cortex based SoCs.

(unless I missed a PR slide SDXE is not System Ready either?)
 

SpudLobby

Senior member
May 18, 2022
963
658
106
Lower clock yes, smaller unlikely.

Xn are designed with raw perf in mind over area and power vs the PPA balance of A7x/7xx.

When you already have a good balance it's easier to trade off something in the design to get a bit more area or power as ARM offers with the A720 lite variant.

I'm pretty confident now that X5/Blackhawk will be the basis for V3/Poseidon - that's why I was thinking a V3 based hi end WoA SoC might be possible being as the Grace successor will almost certainly be using it anyway.

Very unlikely I know, but still a possibility - and to me a better choice as we still lack for a System Ready type of spec for Cortex based SoCs.

(unless I missed a PR slide SDXE is not System Ready either?)
Nah, the X4’s smaller version has 1/2-1/4 the L2 and smaller SIMD units (2x128b like the A720), which is huge in reducing the overall area, but you’d still get better performance than the A720s. I absolutely could see them doing a 4+4 design of that kind. Lower clock is just the other part of it. Much like MediaTek’s 9300 but adapted for a PC: replace the A720s with the smaller X cores and make the first four all big.
 

FlameTail

Diamond Member
Dec 15, 2021
3,209
1,847
106
Hey.

So you know how Intel has faced issues with its implementation of hybrid CPUs in recent generations (P and E cores). They had to disable a bunch of stuff like AVX-512 due to architectural mismatch, as the E-cores don't support it.
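
A minimal sketch of why that happens, assuming threads can be migrated freely between core types: software can only rely on the features every core supports. The feature sets and the `usable_features` helper below are hypothetical, purely for illustration.

```python
# Hypothetical per-core feature sets, for illustration only.
P_CORE = {"sse4.2", "avx2", "avx512f"}
E_CORE = {"sse4.2", "avx2"}


def usable_features(*core_types):
    """Features that are safe to use if a thread may land on any core type."""
    return set.intersection(*core_types)


common = usable_features(P_CORE, E_CORE)
print(sorted(common))        # features present on both core types
print("avx512f" in common)   # whether AVX-512 survives the intersection
```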

Don't ARM cores face this conundrum?
 

DrMrLordX

Lifer
Apr 27, 2000
21,813
11,168
136
Don't ARM cores face this conundrum?
No.

On older ARM gens, all the cores have at least barebones support for NEON instructions. On newer ARM versions you have SVE2 which (long story short) can be supported on all compliant ARM cores regardless of vector width for that specific core.

Note that for anything ARMv8.x, it's pretty much just NEON or in-house instruction sets.
 
Reactions: soresu
Sep 18, 2023
26
13
41
Samsung already uses AMD's RDNA GPU IP in their Exynos Mobile SoCs.

Keep in mind that Samsung's agreement with AMD is clear that they are meant to use the Radeon IP only in markets where AMD is not competing.

They have the license to exploit the Radeon IP in smartphone SOCs for sure, on tablets not so sure, and definitely they can't use them for laptop SOCs.

AMD themselves have the flexibility to start producing ARM-based laptop processors, they have commented about this in the past, but at least as of now, it's not a sound strategy when they have the current duopoly with Intel.

What I guess could be in their interest is something "hybrid": x86 cores, which only they and Intel can manufacture and which remain compatible with the vast majority of Windows software, alongside ARM-based cores for energy efficiency and background tasks.

But I don't see AMD abandoning x86 anytime soon.
 

SarahKerrigan

Senior member
Oct 12, 2014
609
1,489
136
No.

On older ARM gens, all the cores have at least barebones support for NEON instructions. On newer ARM versions you have SVE2 which (long story short) can be supported on all compliant ARM cores regardless of vector width for that specific core.

Note that anything ARMv8.x it's pretty much just NEON or in-house instruction sets.

Actually, there are some gotchas with heterogeneity on SVE - namely, process migration to a core with a different vector length during a vectorized loop body is unsafe.

The same binary can run on both, though - there are just risks involved in migration at runtime. I assume this is why AFAIK every consumer SVE-capable core has had vlen defined as 128b, and opted for multiple units, rather than wider vectors, to increase throughput.
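
To make the "same binary can run on both" point concrete, here is a minimal Python sketch of a vector-length-agnostic loop. It is a model, not SVE code: `vla_add` and its `lanes` parameter are names invented for illustration, with `lanes` standing in for the vector length the hardware would report at run time.

```python
def vla_add(a, b, lanes):
    """Model of a vector-length-agnostic loop computing out[i] = a[i] + b[i].

    `lanes` is the number of elements per vector. Real code would query it
    from the hardware; it is a parameter here only so the same loop can be
    replayed at different widths.
    """
    n = len(a)
    out = [0] * n
    i = 0
    while i < n:
        # Predicate: only lanes that still map to valid elements are active
        # (similar in spirit to a WHILELT-generated mask).
        active = min(lanes, n - i)
        for lane in range(active):
            out[i + lane] = a[i + lane] + b[i + lane]
        i += lanes  # advance by one vector, whatever its width is
    return out


a = list(range(10))
b = [10 * x for x in a]
narrow = vla_add(a, b, lanes=4)  # e.g. 128b vectors of 32-bit elements
wide = vla_add(a, b, lanes=8)    # e.g. 256b vectors of 32-bit elements
print(narrow == wide)
```

The loop never hard-codes a width, so the result is the same at either vector length as long as the width stays fixed for the whole run.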
 
Reactions: Tlh97 and moinmoin

SarahKerrigan

Senior member
Oct 12, 2014
609
1,489
136
That's interesting. Unsafe how? Are we talking execution failure or security issues?

So since they don't exist, I haven't tested any of this on a real vector-width-heterogeneous machine - I've only ever worked with SVE1 and only on server-class silicon. But my thinking is that if you have a migration midway through an SVE iteration - at least, after any vector op that affects state (so, reduction ops, vector stores) based on the lanes only available in the bigger core - onto a narrower vector length, you're likely to have problems of the "your program no longer works correctly" variety. With any luck, it would just see that the generated predicate is out of range and crash, rather than continuing on an Undefined Behavior Adventure.

It's been a minute, so I'll look at the docs later and see if ARM comments on this particular scenario.
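
Here is a toy Python model of the failure mode I have in mind. It is entirely hypothetical: it does not describe real hardware or any OS scheduler, and `sum_with_migration` and its parameters are invented for illustration. The assumption is that the loop reads its step size on the wide core, gets moved to a narrower core before the vector op runs, and then advances by the stale step.

```python
def sum_with_migration(data, lanes_before, lanes_after, migrate_at):
    """Toy reduction where the vector width changes at iteration `migrate_at`.

    Hypothetical model: in the migrating iteration the step size was already
    read on the wide core, but the vector op only covers `lanes_after` lanes.
    """
    total = 0
    i = 0
    iteration = 0
    while i < len(data):
        # Step size as seen by the program (stale during the migrating iteration).
        step = lanes_before if iteration <= migrate_at else lanes_after
        # Lanes the current core can actually process.
        hw_lanes = lanes_before if iteration < migrate_at else lanes_after
        active = min(hw_lanes, len(data) - i)
        total += sum(data[i:i + active])
        i += step
        iteration += 1
    return total


data = list(range(1, 17))
# No width change: both "cores" have 8 lanes.
correct = sum_with_migration(data, lanes_before=8, lanes_after=8, migrate_at=1)
# Width drops from 8 to 4 lanes during the second iteration.
broken = sum_with_migration(data, lanes_before=8, lanes_after=4, migrate_at=1)
print(correct, broken, sum(data))
```

In this model the migrating iteration silently skips the lanes that only existed on the wide core, so the reduction comes out short instead of crashing.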
 

DrMrLordX

Lifer
Apr 27, 2000
21,813
11,168
136
So since they don't exist, I haven't tested any of this on a real vector-width-heterogeneous machine - I've only ever worked with SVE1 and only on server-class silicon. But my thinking is that if you have a migration midway through an SVE iteration - at least, after any vector op that affects state (so, reduction ops, vector stores) based on the lanes only available in the bigger core - onto a narrower vector length, you're likely to have problems of the "your program no longer works correctly" variety. With any luck, it would just see that the generated predicate is out of range and crash, rather than continuing on an Undefined Behavior Adventure.

It's been a minute, so I'll look at the docs later and see if ARM comments on this particular scenario.

Thanks for taking the time to elaborate. I've looked into it briefly and it seems that consumer SVE2 SoCs are all probably 128b vector length for the time being, so yes, this hypothetical situation doesn't exist - yet.
 
Reactions: igor_kavinski