Schmide
RVV is a vector ISA, not SIMD. You are talking about a SIMD architecture; a vector ISA is totally different. Vectors in a vector ISA aren't fixed size but variable, so your example cannot be turned 1:1 into a vector example. BTW, the second permute seems to have a typo; you probably meant dh instead of df. If a vector ISA needs to rearrange data in vector registers, it either compresses masked data into a different vector or gathers elements with an index vector. RVV actually takes a pretty RISC view of those kinds of permutations: traditional vector CPUs have compress/expand and gather/scatter on vector registers, but RVV only has compress and gather. So there is no easy way to program by shuffling vectors back and forth; the only way is always to generate code for the high-performance path.
Nice catch on the df.
The way I see it, a vector ISA is just variable-sized SIMD. The point I am trying to make still stands: if you exceed the size of the lane (or operational element), the system has to either make more passes to achieve the same result or reorder the data before or after the operation. Take the interleave example on ARM: vzip takes 2 vectors and spits out 1 vector double the size. This simplifies the operation but somewhat handicaps the functionality. The work is generally the same.
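To make the "more passes" point concrete, here is a rough scalar model in Python. It is only a sketch of the semantics, not real NEON or RVV code: `zip_interleave`, `interleave_in_chunks`, and the `vlen` parameter (elements per register) are names made up for illustration.

```python
def zip_interleave(a, b):
    """Interleave two equal-length vectors: a0, b0, a1, b1, ..."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def interleave_in_chunks(a, b, vlen):
    """Interleave long arrays using registers that hold `vlen` elements.

    `vlen` is a hypothetical register width in elements. Returns the
    interleaved data and the number of passes that were needed.
    """
    if vlen <= 0:
        raise ValueError("vlen must be positive")
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    out = []
    passes = 0
    for start in range(0, len(a), vlen):
        # Each pass handles at most one register's worth of each input.
        out.extend(zip_interleave(a[start:start + vlen], b[start:start + vlen]))
        passes += 1
    return out, passes
```

In this model a wider register only changes the pass count, not the result: the same 8-element inputs take two passes at `vlen=4` and one at `vlen=8`.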
Gather and scatter seem very useful, but if you think about what they are actually doing, they read from or write to memory multiple times to achieve the same result.
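A minimal Python sketch of what I mean, counting one memory access per element. This is a model of the semantics only: the `gather` and `scatter` helpers are hypothetical, and real hardware may well coalesce accesses that land in the same cache line.

```python
def gather(memory, indices):
    """Indexed load: one read per index. Returns (values, load_count)."""
    values = []
    loads = 0
    for i in indices:
        values.append(memory[i])  # a separate access for every element
        loads += 1
    return values, loads


def scatter(memory, indices, values):
    """Indexed store: one write per index. Returns store_count."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    stores = 0
    for i, v in zip(indices, values):
        memory[i] = v  # a separate access for every element
        stores += 1
    return stores
```

A contiguous load of the same number of elements would be a single wide access in this model, which is where the cost difference comes from.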
For RVV, it sure seems programming will be easier (I hate lanes). Whether that translates to better performance is unknown.