600W is actually less than I expected: 100B transistors and 8 stacks of HBM, with bandwidth probably well north of 2TB/s.
A100 is 400W @312TF/s FP16
If they can really do 1PF/s FP16 in just 600W without any "sparsity" tricks it would be awesome.
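For a rough sense of what that would mean in perf per watt, here is a quick back-of-envelope sketch. It only uses the figures quoted above (312TF/s at 400W for A100, a claimed 1PF/s at 600W); the 1PF/s and 600W numbers are unconfirmed, and the variable names are just for illustration.

```python
# Figures as quoted in the thread; the 1 PF/s @ 600 W claim is unconfirmed.
a100_tflops, a100_watts = 312.0, 400.0      # A100 FP16 Tensor
claimed_tflops, claimed_watts = 1000.0, 600.0  # rumoured 1 PF/s part


def tflops_per_watt(tflops, watts):
    """Throughput per watt in TF/s per W."""
    return tflops / watts


a100_eff = tflops_per_watt(a100_tflops, a100_watts)
claimed_eff = tflops_per_watt(claimed_tflops, claimed_watts)
ratio = claimed_eff / a100_eff

print(f"A100:    {a100_eff:.2f} TF/s per W")
print(f"Claimed: {claimed_eff:.2f} TF/s per W")
print(f"Ratio:   {ratio:.2f}x")
```

So if the claim holds for dense FP16, it would be a bit over 2x the A100's FP16 Tensor efficiency.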
That's 312TF/s in FP16 Tensor operations though. If you go by non-Tensor FP16 being double the FP32 rate, A100 should be at about 40TFlops.
They also call it "Peta Ops", meaning it probably takes something like sparsity or Tensor ops to get that.
I wouldn't call 1PF/s FP16 Tensor impressive either. It'll be going up against next-generation Nvidia parts. So 312TF/s x 1.5 (TDP, 400W to 600W) x 2 (next gen) roughly equals the 1PF/s figure.
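Spelling that estimate out as a sketch: the 1.5x is just 600W over 400W, and the 2x generational gain is a guess from this post, not a known spec.

```python
# Scaling A100's FP16 Tensor figure to a hypothetical next-gen part at 600 W.
a100_tensor_tflops = 312.0
tdp_scale = 600.0 / 400.0   # same power budget as the 600 W part -> 1.5x
next_gen_gain = 2.0         # assumed generational uplift (a guess, not a spec)

estimate_tflops = a100_tensor_tflops * tdp_scale * next_gen_gain
print(f"Estimated next-gen FP16 Tensor: {estimate_tflops:.0f} TF/s")
```

That lands at 936TF/s, which is close enough to call 1PF/s.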
Remember they were also boasting 40 TFlops in FP32 using 4 tiles. Ponte Vecchio has 16 tiles, so that's 160 TFlops FP32. Another doubling gets us to 320 TFlops. Higher clocks and something like Int8 numbers get us to 1PF/s?
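The same chain as a quick sketch, assuming throughput scales linearly with tile count (that is the assumption in the post, not a measured result). It also shows how much is still left to be covered by clocks and lower precision.

```python
# Tile-scaling estimate; linear scaling with tile count is an assumption.
demo_tflops_fp32 = 40.0   # figure quoted for the 4-tile demo
demo_tiles = 4
pvc_tiles = 16            # tile count quoted for Ponte Vecchio
target_tflops = 1000.0    # the 1 PF/s claim

per_tile = demo_tflops_fp32 / demo_tiles   # TFlops per tile
pvc_fp32 = per_tile * pvc_tiles            # scale up to 16 tiles
pvc_doubled = pvc_fp32 * 2                 # "another doubling"
remaining_factor = target_tflops / pvc_doubled

print(f"16 tiles FP32:  {pvc_fp32:.0f} TFlops")
print(f"After doubling: {pvc_doubled:.0f} TFlops")
print(f"Still needed:   {remaining_factor:.3f}x to reach 1 PF/s")
```

So clocks plus a lower-precision format would have to supply a bit over 3x on top of the 320 TFlops.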
Not really interested in Ponte Vecchio or any of the HP/HPC parts for that matter.
I'll probably get the DG2 512EU version unless it's really, really bad. The compatibility for games on Xe-LP is acceptable. I feel as though it may arrive after EIP-1559, and then they'll need to sell it on how it performs, not just on being available.