Discussion: ARM Cortex/Neoverse IP + SoCs (no custom cores)


DrMrLordX

Lifer
Apr 27, 2000
22,491
12,364
136
RK3588 was announced long before it actually went to fabs I think.

The specs they originally announced were different to what they later made.
Yup, there was a long wait for it since people were excited for an SBC SoC with something other than A72 on it. Took way too long to get to market, hence my comment about the O6 and RK3688.
 
Reactions: Tlh97 and soresu

Nothingness

Diamond Member
Jul 3, 2013
3,277
2,329
136
A bit of reading seems to imply that, by SIMD's basic definition, it is a vector, depending on what you classify as a vector:

For some, a vector can only look like the CDC and Cray vector implementations. Anyway, the point is moot: the article under discussion clearly classifies both the RISC-V vector extension and Arm SVE as vector extensions:

Vector instructions such as RISC-V vector extension [9] and ARM SVE [18] have recently been introduced in general-purpose CPUs [1,17]. A vector instruction processes multiple data elements in a time-division manner, achieving high performance with lower hardware cost.

They used RISC-V for their study for obvious reasons (the main one being the availability of a cycle-accurate model, Spike).
 

soresu

Diamond Member
Dec 19, 2014
3,689
3,026
136
Yup, there was a long wait for it since people were excited for an SBC SoC with something other than A72 on it. Took way too long to get to market, hence my comment about the O6 and RK3688.
Ye, the Amlogic 'competitor' S928X took even longer to get to market and on a larger node to boot.
 

DZero

Senior member
Jun 20, 2024
769
291
96
IIRC the GPU spec changed from one more contemporary with the A76 to the G610.

I might be misremembering things though.

Rockchip have an annoying tendency to be ambiguous with the specs of future SoCs sometimes; the RK3688 mentions a v9.3-A CPU core, but according to the latest rumours the X930 and Ax30 are v9.4-A ISA instead 😒
Wait, there isn't a v9.3-A CPU core?
 

naukkis

Senior member
Jun 5, 2002
991
841
136
A bit of reading seems to imply that, by SIMD's basic definition, it is a vector, depending on what you classify as a vector:


Vector CPU vectors are loops of independent scalar operations which can take different paths. Direct, non-abstracted SIMD hardware uses packed vectors: without shuffle instructions the hardware is identical to scalar hardware, it just operates on packed vectors instead of scalars. OoO is therefore the same for SIMD hardware as for scalar hardware: one op per vector. Vector ISA code can instead have hundreds of possible ops per vector.
 
Last edited:

naukkis

Senior member
Jun 5, 2002
991
841
136
Here is a link to the Hot Chips slides for the SX-Aurora Vector Engine processor https://old.hotchips.org/hc30/2conf/2.14_NEC_vector_NEC_SXAurora_TSUBASA_HotChips30_finalb.pdf, and AnandTech did a live blog on it here https://www.anandtech.com/show/13259/hot-chips-2018-nec-vector-processor-live-blog. The blog post mentions OoO, and the slides show OoO scheduling. You can also find the SX-ACE slides here https://old.hotchips.org/wp-content...e-epub/HC26.11.110-SX-ACE-MOMOSE-NEC-v004.pdf

From a high-level point of view they seem similar; the SX-ACE Hot Chips slides don't mention OoO explicitly as far as I can tell. But Aurora seems like an evolution of ACE, so they evidently thought that adding OoO scheduling was important.

My vector CPU knowledge might be from the 80s, but the NEC designers in that ACE document mention that they consider their vector design OoO. It's just software-based, since resolving memory dependencies in hardware, in a system where the programmer/compiler has already packed massive, mostly independent instructions into vectors, seems like a way to wreck the whole design. But maybe Intel EPIC would eventually also have developed into hardware-OoO machinery...
 

naukkis

Senior member
Jun 5, 2002
991
841
136
Intel experimented with it. The final gen, Poulson, was also dynamically scheduled and could have been extended to full OoO relatively easily.

For EPIC it was the only potential way forward. For a vector CPU it isn't. All vector CPUs also have a scalar side which can handle situations where the vector side performs poorly. There hasn't yet been a vector CPU whose scalar side has packed SIMD support, but RISC-V might evolve in that direction.
 

soresu

Diamond Member
Dec 19, 2014
3,689
3,026
136
Vector CPU vectors are loops of independent scalar operations which can take different paths. Direct, non-abstracted SIMD hardware uses packed vectors: without shuffle instructions the hardware is identical to scalar hardware, it just operates on packed vectors instead of scalars. OoO is therefore the same for SIMD hardware as for scalar hardware: one op per vector. Vector ISA code can instead have hundreds of possible ops per vector.
Too many vectors.....



Sorry, had to be done 😆
 

soresu

Diamond Member
Dec 19, 2014
3,689
3,026
136
I invite you to show us some RISC-V vector code that demonstrates that it fits that "definition" which I don't understand.
I can only assume they mean that, instead of SIMD's "every problem must fit 4 hammers at once" approach, 'vector ISA' code can do a lot more than 4 operations per instruction, or possibly any number of operations from 2 up to whatever the limit is.

I can only assume that the limit is determined by the number of ALUs and their size.

So if you had a 128-bit ALU you could do 64 x 2-bit ops, or 32 x 4-bit, or 16 x 8-bit, and so on.

IIRC rapid packed math in GPUs required actually changing the SIMD ALUs so that you could get double-rate FP16, whereas prior to that you just got full-rate FP32, with FP16 at the same speed despite the halved precision.

My guess would be that he is implying a true vector ISA is built to do all of these possible variations from the ground up.
 

naukkis

Senior member
Jun 5, 2002
991
841
136
I can only assume they mean that, instead of SIMD's "every problem must fit 4 hammers at once" approach, 'vector ISA' code can do a lot more than 4 operations per instruction, or possibly any number of operations from 2 up to whatever the limit is.

Don't confuse a vector ISA with SIMD. A vector CPU's binary language presents loops of scalar instructions as vectors. The RVV maximum vector length is 64 KB. Vector CPUs don't have to have SIMD hardware at all; all the code could be executed just fine with scalar execution units, as was done on the first vector CPUs. But code presented as vectors is also executable with SIMD hardware, at any SIMD width, with the same binaries. It really seems that most people don't even understand the whole basic idea behind vector CPUs.
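
For illustration, a strip-mined RVV loop looks roughly like this (a minimal sketch using the RVV v1.0 C intrinsics from riscv_vector.h, not code from any of the linked articles). The same binary runs unchanged whether the hardware has 128-bit or 1024-bit vectors, because vsetvl asks the hardware at run time how many elements each pass handles.

Code:
#include <riscv_vector.h>
#include <stddef.h>

/* c[i] = a[i] + b[i] for n elements, vector-length agnostic.
   __riscv_vsetvl_e32m1() returns how many 32-bit elements the
   hardware will process this iteration, whatever VLEN is. */
void vec_add(float *c, const float *a, const float *b, size_t n)
{
    while (n > 0) {
        size_t vl = __riscv_vsetvl_e32m1(n);            /* elements this pass */
        vfloat32m1_t va = __riscv_vle32_v_f32m1(a, vl);  /* load a[0..vl) */
        vfloat32m1_t vb = __riscv_vle32_v_f32m1(b, vl);  /* load b[0..vl) */
        vfloat32m1_t vc = __riscv_vfadd_vv_f32m1(va, vb, vl);
        __riscv_vse32_v_f32m1(c, vc, vl);                /* store c[0..vl) */
        a += vl; b += vl; c += vl; n -= vl;
    }
}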
 

naukkis

Senior member
Jun 5, 2002
991
841
136
I invite you to show us some RISC-V vector code that demonstrates that it fits that "definition" which I don't understand.

A vector CPU's vectors are independent scalar ops. Vector CPUs can usually chain those ops between execution units, and if the data addresses are known there's no need for hardware out-of-ordering. OoO is needed when addressing is calculated dynamically on the fly, and then there are as many tracked ops as there are scalar ops in the vectors, since they are independent ops rather than solid packed vectors. With low-width SIMD hardware it's still possible to do hardware data tracking and reordering, but as those NEC engineers noted, it's pretty impractical to track and rearrange ops on a wide-ALU-count vector machine, like the 256 per cycle on that NEC design. The hardware limitation forces a choice between wide in-order execution units and much narrower OoO units.
 

naukkis

Senior member
Jun 5, 2002
991
841
136
It really wasn't.

Hardware can easily detect data patterns in runtime execution which cannot be predicted at compile time. When it detects those, and they are critical for code timing, it's only logical to add out-of-order hardware to execute them in advance. Intel's hardware designers did know what to do, but such a complex ISA made hardware implementations too complex to be competitive against simpler designs.
 

Nothingness

Diamond Member
Jul 3, 2013
3,277
2,329
136
A vector CPU's vectors are independent scalar ops. Vector CPUs can usually chain those ops between execution units, and if the data addresses are known there's no need for hardware out-of-ordering. OoO is needed when addressing is calculated dynamically on the fly, and then there are as many tracked ops as there are scalar ops in the vectors, since they are independent ops rather than solid packed vectors. With low-width SIMD hardware it's still possible to do hardware data tracking and reordering, but as those NEC engineers noted, it's pretty impractical to track and rearrange ops on a wide-ALU-count vector machine, like the 256 per cycle on that NEC design. The hardware limitation forces a choice between wide in-order execution units and much narrower OoO units.
So can you exhibit R-V code that demonstrates that it is more of a vector ISA than SVE?
 

naukkis

Senior member
Jun 5, 2002
991
841
136
So can you exhibit R-V code that demonstrates that it is more of a vector ISA than SVE?

I don't understand why. RVV is a vector ISA, pure and clean. SVE instead is scalable packed SIMD, which doesn't actually work beyond academic use cases. In a vector ISA the underlying hardware is totally abstracted; the code just works on any hardware if the ISA and hardware are bug-free. SVE instead relies on software support for different-width SIMD hardware, and that has never worked and probably never will. I really don't know why ARM wants to push that braindead solution which both software and hardware vendors don't want to use.

And about where this discussion started: no, hardware vendors don't want to, and shouldn't, make an OoO ARM core with in-order SVE. SVE is OoO-friendly, and removing OoO would just make it slower, especially when running with 128-bit vectors, where SVE performs well.
 
Last edited:

LightningDust

Member
Sep 3, 2024
40
67
51
Hardware can easily detect data patterns in runtime execution which cannot be predicted at compile time. When it detects those, and they are critical for code timing, it's only logical to add out-of-order hardware to execute them in advance. Intel's hardware designers did know what to do

I'm not saying out-of-order isn't useful. I'm saying EPIC performance could have been scaled fairly easily, and that Intel microarchitects had a clear idea of how they would go about it if there was going to be an extended IPF roadmap.

but such a complex ISA made hardware implementations too complex to be competitive against simpler designs.

On the contrary, IPF cores were fairly small; Itanium silicon was dominated by SRAM (and the small cores compared to, say, Power meant that IPF was able to bring LLC on-die almost a decade prior to IBM.) Additionally, Itanium was competitive against comparable RISC and x86 server processors as long as Intel and HP were continuing to seriously invest in it. Even after the Montecito fiasco, where a late-breaking erratum in novel power management features caused Montecito to be delayed by a year and to lose 15% of its projected clock speed, the resulting part was essentially performance-competitive at release. Itanium silicon only became uncompetitive when RISC/UNIX as a whole had started to decline. By that stage, there were non-technical considerations in play - I have an informed suspicion that Poulson, a massive improvement, was deliberately held back by at least a year so that the hilariously bad Tukwila could have a full sales lifecycle.
 
Jul 27, 2020
23,540
16,535
146
On the contrary, IPF cores were fairly small; Itanium silicon was dominated by SRAM (and the small cores compared to, say, Power meant that IPF was able to bring LLC on-die almost a decade prior to IBM.) Additionally, Itanium was competitive against comparable RISC and x86 server processors as long as Intel and HP were continuing to seriously invest in it. Even after the Montecito fiasco, where a late-breaking erratum in novel power management features caused Montecito to be delayed by a year and to lose 15% of its projected clock speed, the resulting part was essentially performance-competitive at release. Itanium silicon only became uncompetitive when RISC/UNIX as a whole had started to decline. By that stage, there were non-technical considerations in play - I have an informed suspicion that Poulson, a massive improvement, was deliberately held back by at least a year so that the hilariously bad Tukwila could have a full sales lifecycle.
Welcome back, Sarah Kerrigan
 

Nothingness

Diamond Member
Jul 3, 2013
3,277
2,329
136
I don't understand why. RVV is a vector ISA, pure and clean. SVE instead is scalable packed SIMD, which doesn't actually work beyond academic use cases. In a vector ISA the underlying hardware is totally abstracted; the code just works on any hardware if the ISA and hardware are bug-free. SVE instead relies on software support for different-width SIMD hardware, and that has never worked and probably never will. I really don't know why ARM wants to push that braindead solution which both software and hardware vendors don't want to use.
I think we already went through this. I'd really like to see VL-agnostic code and make comparisons of SVE vs the RISC-V vector extension.

I've seen VL-agnostic SVE code that doesn't need a single change for different VLs. But I guess there are cases where that doesn't work (shuffles?) and I'd be interested in seeing how RVV handles those.
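
For reference, a VL-agnostic SVE loop looks roughly like this (a minimal sketch using the standard ACLE intrinsics from arm_sve.h, doing the same c[i] = a[i] + b[i] kernel as the RVV sketch earlier in the thread, not code from any article). The whilelt predicate masks off the tail and svcntw() returns however many 32-bit lanes the implementation actually has, so one binary covers 128-bit through 2048-bit hardware.

Code:
#include <arm_sve.h>
#include <stdint.h>

/* c[i] = a[i] + b[i], vector-length agnostic SVE.
   svcntw() = number of 32-bit lanes on this implementation;
   the whilelt predicate disables excess lanes on the last pass,
   so no scalar clean-up loop is needed. */
void vec_add(float *c, const float *a, const float *b, int64_t n)
{
    for (int64_t i = 0; i < n; i += svcntw()) {
        svbool_t pg = svwhilelt_b32_s64(i, n);   /* active lanes this pass */
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, c + i, svadd_f32_x(pg, va, vb));
    }
}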
 