Absolutely. Look at the SPECint scores, and AnandTech's own review. Try to commit the entire CPU to one working set and performance tanks. Run a bunch of small VMs with low core counts and performance is much better. Either the cache is too small (half the reference Neoverse size) or the interconnect is problematic. Possibly both.
Yes, for large instances there is a performance hit of about 30%. But from an economic point of view, that means you get only about 40% cost savings instead of the 53% you see with small VM instances. So it's a huge win rather than a super huge win.
That is still a huge economic win for Graviton2, and a pretty nice win in per-thread performance for G2 too. Not to mention 1/10th the CPU cost and half the power consumption.
Regarding SVE:
- vector length of 128-2048 bits
- sizeless vector type
- optional functions include BFloat16 for ML and matrix multiplication
SVE2
- is an instruction extension; vector length stays 128-2048 bits
- a lot of DSP functions
- optional functions include cryptography functions
I think the Fujitsu ARM CPU will be fine with SVE1 plus the optional functions for matrix multiplication.
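To make the "sizeless vector type" point concrete, here is a rough sketch of a vector-length-agnostic loop in C. The function name saxpy_vla is made up for illustration, and the plain scalar fallback is my own addition so the snippet still builds on a compiler without SVE support. The SVE branch assumes the ACLE header arm_sve.h and never hard-codes a vector width, so the same binary should work on anything from 128 to 2048 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* y[i] += a * x[i] for i in [0, n).
   Hypothetical helper, for illustration only. */
void saxpy_vla(float a, const float *x, float *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
#if defined(__ARM_FEATURE_SVE)
    /* Number of 32-bit lanes per vector; only known at run time. */
    uint64_t lanes = svcntw();
    svfloat32_t va = svdup_n_f32(a);
    for (uint64_t i = 0; i < (uint64_t)n; i += lanes) {
        /* Predicate is true for lanes where i + lane < n,
           so the tail needs no separate scalar loop. */
        svbool_t pg = svwhilelt_b32_u64(i, (uint64_t)n);
        svfloat32_t vx = svld1_f32(pg, x + i);
        svfloat32_t vy = svld1_f32(pg, y + i);
        vy = svmla_f32_x(pg, vy, va, vx);   /* vy + va * vx */
        svst1_f32(pg, y + i, vy);
    }
#else
    /* Assumed fallback for builds without SVE: plain scalar loop. */
    for (size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
#endif
}
```

The intrinsics used map onto the loads, stores, while comparisons, vector creation, counting elements and floating-point arithmetic groups of the base SVE functions. Building the SVE path needs something like -march=armv8-a+sve on a recent GCC or Clang.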
Take a look at the table of contents:
6. List of base SVE functions
6.2. Loads
6.3. Stores
6.4. Prefetches
6.5. Address calculations
6.6. Scalar to vector operations
6.7. Integer arithmetic
6.8. Logical operations
6.9. Shifts
6.10. Integer reductions
6.11. Integer comparisons
6.12. While comparisons
6.13. Counting bits
6.14. Conversion
6.15. Reversal
6.16. Floating-point arithmetic
6.17. Floating-point reductions
6.18. Floating-point comparisons
6.19. Floating-point conversions
6.20. Permutation and selection
6.21. Vector creation
6.22. Vector insertion and extraction
6.23. Predicate creation
6.24. Predicate operations
6.25. Testing predicates
6.26. FFR manipulation
6.27. Counting elements
6.28. Saturating scalar arithmetic
6.29. Reinterpreting data
7. List of optional SVE functions
7.2. BFloat16 extensions
7.3. INT8 matrix multiply extensions
7.4. FP32 matrix multiply extensions
7.5. FP64 matrix multiply extensions
8. List of base SVE2 functions
8.2. While greater comparisons
8.3. Uniform DSP operations
8.4. Widening DSP operations
8.5. Narrowing DSP operations
8.6. Unary narrowing operations
8.7. Non-widening pairwise arithmetic
8.8. Widening pairwise arithmetic
8.9. Bitwise ternary logical instructions
8.10. Large integer arithmetic
8.11. Multiplication by indexed elements
8.12. Uniform complex integer arithmetic
8.13. Widening complex integer arithmetic
8.14. Complex integer dot product
8.15. Extra floating-point conversions
8.16. Floating-point widening multiply-accumulate
8.17. Floating-point integer binary logarithm
8.18. Vector histogram count
8.19. Character match
8.20. Contiguous conflict detection
8.21. Polynomial arithmetic
8.22. Extended table lookup/permute
8.23. Non-temporal gather/scatter
9. List of optional SVE2 functions
9.2. Bit permutation
9.3. AES-128 functions
9.4. SHA-3 functions
9.5. SM4 functions