Question Are scalable vectors the ultimate solution to fixed width SIMD?

Jul 27, 2020
17,912
11,683
116
This thread was inspired by the following quote from http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=threads/qualcomm-snapdragon-thread.2616013/post-41241585

And for SIMD, the RV style of making it a vector machine instead of hardwired SIMD makes the hardware totally agnostic to SIMD register width.

@naukkis @SarahKerrigan @Nothingness

Is the scalable vector approach going to be a game changer when it's already possible to do it more or less in software with multiple supported targets? https://github.com/google/highway
 
Jul 27, 2020
17,912
11,683
116
So RISC-V is stupid? http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=threads/qualcomm-snapdragon-thread.2616013/post-41241618

Jim Keller has been very loud in defending that RV approach.
Jim has been very vocal about any ISA not mattering for high perf CPU.
Anyway, no matter what he said, what do you expect from the CEO of a company doing RISC-V chips?
Hmmm...why would Jim Keller put his reputation on the line for RISC-V in such a public way? Or maybe his RISC-V designs are more sensible?
 

Nothingness

Platinum Member
Jul 3, 2013
2,751
1,397
136

View attachment 102087
That highlighted part. Can RISC-V really do it and break all performance records?
There's a typo in the table: SVE can go up to 2048 bits.

Regarding the 64k max of R-V, two things make this unrealistic: hardware size, and extracting that width from existing code. Wide vectors are only useful for very particular HPC workloads.

Anyway I can't go into more details as I don't know the RVV extension.
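To put the hardware-size point in rough numbers, here is a small sketch of the architectural vector state implied by the two maximum widths mentioned above (2048 bits for SVE, 65536 bits for RVV). It assumes 32 architectural vector registers for both ISAs and ignores predicate/mask registers and any physical registers added for renaming, so real designs would need more storage than this. The function names are made up for illustration.

```python
NUM_VREGS = 32  # assumption: 32 architectural vector registers (v0-v31 / z0-z31)


def vector_state_bytes(vlen_bits, num_regs=NUM_VREGS):
    """Bytes of architectural vector state for a given register width."""
    return vlen_bits * num_regs // 8


def lanes(vlen_bits, elem_bits):
    """How many elements of elem_bits fit in one vector register."""
    return vlen_bits // elem_bits


for name, vlen in (("SVE max", 2048), ("RVV max", 65536)):
    kib = vector_state_bytes(vlen) // 1024
    print(f"{name}: {vlen} bits, {lanes(vlen, 64)} x 64-bit lanes, {kib} KiB of vector state")
```

At the RVV maximum that is 1024 double-precision lanes per register and 256 KiB of architectural state, which is why a loop needs a lot of data parallelism before such a width pays off.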
 

Nothingness

Platinum Member
Jul 3, 2013
2,751
1,397
136
RISC-V is not stupid. It's just based on an archaic university vision of RISC that evolved into a marketing war machine.

Hmmm...why would Jim Keller put his reputation on the line for RISC-V in such a public way? Or maybe his RISC-V designs are more sensible?
This was my answer:
Jim has been very vocal about any ISA not mattering for high perf CPU.
Anyway, no matter what he said, what do you expect from the CEO of a company doing RISC-V chips?
I agree with Jim on that: when you are targeting high performance, the ISA stops mattering.
I doubt he ever said R-V was magic sauce that had a technical advantage. The only interesting point is that people can do whatever they want with the ISA, unlike with x86 or Arm.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,422
1,759
136
already possible to do it more or less in software with multiple supported targets? https://github.com/google/highway
Industry has consistently rejected multiple supported targets for a cool 30 years now.

Compiler vendors sometimes advertise that using their compiler, you can do multiple supported targets and not have to increase testing to test every path. The industry response to that has been more or less this.
 

naukkis

Senior member
Jun 5, 2002
779
636
136
A vector ISA like RV or those '70s-'80s Crays expresses vectors as loops of scalar instructions. Vectors can be any length; with RV there's that practical 64K limit. And those vectors can be executed on any kind of hardware, from scalar up to 64K-wide SIMD. Whether wider SIMD execution units bring a performance uplift depends on loop length, but existing code can extract more performance from wider SIMD hardware if the parallelism is in the code. I actually wonder why only RV uses a full vector ISA. It seems every other ISA vendor wants to stay on fixed-length SIMD for cheap hardware implementations instead of code reusability. SVE sucks as badly as any other fixed-length SIMD instruction set, or even more with that braindead scheme to support variable SIMD hardware. No wonder nobody wants to use it instead of NEON.
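The "same code on any hardware width" idea can be sketched as a strip-mined loop. This is only a Python model, not RVV code: `vlmax` is a hypothetical stand-in for the hardware vector length, and the `min(...)` line plays the role that a `vsetvli`-style instruction plays in RVV, where the hardware reports how many elements it will handle this pass.

```python
def vector_add(a, b, vlmax):
    """Add two equal-length lists in chunks of at most vlmax elements.

    Returns (result, passes). The loop body never hard-codes the width,
    so the same code runs unchanged for any vlmax >= 1.
    """
    assert len(a) == len(b) and vlmax >= 1
    n = len(a)
    out = []
    i = 0
    passes = 0
    while i < n:
        vl = min(n - i, vlmax)  # hardware grants up to vlmax elements per pass
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
        passes += 1
    return out, passes


a = list(range(10))
b = list(range(10, 20))
for width in (1, 4, 16):
    result, passes = vector_add(a, b, width)
    print(width, passes, result)  # passes: 10, 3, 1; result identical each time
```

The result is the same for every width; only the number of passes changes. Note that once the width exceeds the loop length (16 vs. 10 elements here) there is nothing more to gain, which is the "it's up to loop length" caveat.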
 
Reactions: Nothingness
Jul 27, 2020
17,912
11,683
116
But CPU designs are about to get such big OoO windows that separating address generation from the actual load/store will become beneficial for extreme performance designs. RV is right there because it lacks that cheap implementation currently used.
Which CPU designs are you referring to? Zen 6? Apple M5? Upcoming ARM designs?
 

naukkis

Senior member
Jun 5, 2002
779
636
136
You mean cheap in terms of transistor count?

Yeah, those simple predictable addressing modes can be handled pretty much with fixed-function logic. But the hardware needs a massive out-of-order window, heading towards a thousand instructions, to be able to pick those ld instructions far enough ahead of the rest of the code that data loads won't stall execution. In the RV model it's also possible just to do the address calculations ahead of the load instructions to achieve the same effect, and if used wisely that could greatly outperform those fixed-function designs. There aren't hardware/software implementations of that kind out yet, so this is of course only speculation about what may come. There might be some really well-performing RV designs in a few years.
 

Nothingness

Platinum Member
Jul 3, 2013
2,751
1,397
136
Every high-performance design is heading to a thousand-instruction window and beyond.
Indeed and all high perf CPUs have uop split, uop fusion, large OoOE windows, etc.

That's why having poor addressing modes is silly. You gain exactly nothing (except larger code size and increased register pressure due to the extra reg needed for address computation) since your uarch is already very complex. There's nothing to gain by being as simplistic as RISC-V for high performance. I wonder why R-V has reg + imm addressing mode since you could compute that before doing the memory access 🙄
 
Reactions: SarahKerrigan

naukkis

Senior member
Jun 5, 2002
779
636
136
I wonder why R-V has reg + imm addressing mode since you could compute that before doing the memory access 🙄

That's a near pointer from the CPU hardware's point of view, operating within a 4KB range. The RV design is very well done; there's not much that could be done better.
 

naukkis

Senior member
Jun 5, 2002
779
636
136
You still need a full adder for that, no matter what the range is.

The fast path only needs to calculate the lower address bits. Whether or not the ISA supports longer immediates, well-written code is optimized for pages. The 4KB immediate range forces coders and compilers to produce better code than they would if a larger operating range were allowed.
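The fast-path claim and the "you still need a full adder" objection can both be seen in a small model. This is a sketch under assumptions, not a description of any real core: it treats the low 12 bits as the "fast" part of a base + signed 12-bit immediate add and checks whether the carry or borrow spills into the page number. `PAGE_BITS` and the function names are illustrative.

```python
PAGE_BITS = 12
PAGE_MASK = (1 << PAGE_BITS) - 1
ADDR_MASK = (1 << 64) - 1


def effective_address(base, imm):
    """Full 64-bit base + signed 12-bit immediate."""
    assert -2048 <= imm <= 2047
    return (base + imm) & ADDR_MASK


def low_bits_fast_path(base, imm):
    """Return (low12, stays_in_page) using only the low 12 bits of base."""
    assert -2048 <= imm <= 2047
    low_sum = (base & PAGE_MASK) + imm   # may be negative or exceed 4095
    low12 = low_sum & PAGE_MASK          # always the correct low 12 bits
    stays_in_page = 0 <= low_sum <= PAGE_MASK
    return low12, stays_in_page


for base, imm in ((0x10000FF0, 8), (0x10000FF0, 32), (0x10000004, -8)):
    low12, stays = low_bits_fast_path(base, imm)
    ea = effective_address(base, imm)
    print(hex(ea), hex(low12), stays, (ea >> PAGE_BITS) == (base >> PAGE_BITS))
```

The low 12 bits always come out right from the narrow add, but whenever the sum carries or borrows across the 4KB boundary the upper bits change too, so the full-width add is still required for correctness. The narrow add can at best give an early answer for the page offset.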
 

Nothingness

Platinum Member
Jul 3, 2013
2,751
1,397
136
The fast path only needs to calculate the lower address bits. Whether or not the ISA supports longer immediates, well-written code is optimized for pages. The 4KB immediate range forces coders and compilers to produce better code than they would if a larger operating range were allowed.
That's so naive I'm speechless.

Do you have any experience in programming larger programs?
 
Reactions: SarahKerrigan

naukkis

Senior member
Jun 5, 2002
779
636
136
That's so naive I'm speechless.

Do you have any experience in programming larger programs?

Those general code optimization rules will stay as long as hardware is page-based. Optimal data access patterns are full pages, as those are easy to cache. Those indexed addressing modes are usually the worst for cache optimization, as scaled addresses very easily result in running out of cache ways, though code optimization nowadays is pretty much the compiler's problem.
 

Nothingness

Platinum Member
Jul 3, 2013
2,751
1,397
136
Those general code optimization rules will stay as long as hardware is page-based. Optimal data access patterns are full pages, as those are easy to cache. Those indexed addressing modes are usually the worst for cache optimization, as scaled addresses very easily result in running out of cache ways, though code optimization nowadays is pretty much the compiler's problem.
That's utter nonsense.

Will you do all memory allocations so that they're 4KB page aligned? And will you ensure all your data structure sizes will be <4KB? Can't wait to see how you achieve that in real life. If you have to do so because your ISA is limited, it's a dead-end.
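What happens when a field sits more than 4KB into a structure can be sketched like this. The model assumes RISC-V's signed 12-bit load/store immediate (-2048 to 2047) and shows roughly the kind of sequence a compiler might fall back to for a larger offset: split it into an upper part for `lui` and a signed low part for the load. The register names and the `lower_load` helper are hypothetical, and real compilers may choose other sequences.

```python
def fits_imm12(offset):
    """True if offset fits the signed 12-bit load/store immediate."""
    return -2048 <= offset <= 2047


def split_hi_lo(offset):
    """Split offset so that offset == (hi << 12) + lo with lo in [-2048, 2047]."""
    hi = (offset + 0x800) >> 12
    lo = offset - (hi << 12)
    return hi, lo


def lower_load(offset, base="a1", dest="a0", tmp="t0"):
    """Illustrative instruction sequence for loading from base + offset."""
    if fits_imm12(offset):
        return [f"ld {dest}, {offset}({base})"]
    hi, lo = split_hi_lo(offset)
    assert -(1 << 19) <= hi < (1 << 19), "offset outside this sketch's range"
    return [
        f"lui {tmp}, {hi & 0xFFFFF}",   # upper 20 bits into a temporary
        f"add {tmp}, {tmp}, {base}",    # extra register, extra instruction
        f"ld {dest}, {lo}({tmp})",      # low part fits the immediate again
    ]


for off in (16, 2047, 2048, 5000):
    print(off, lower_load(off))
```

A field at offset 16 takes one instruction; a field at offset 5000 takes three and a temporary register, which is the code-size and register-pressure cost being argued about.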
 