I just read the write-up of Xe on AnandTech, and I'm feeling... skeptical. There are lots of implausible numbers thrown around, like "50x increase!".
Hmm, maybe not, though.
They didn't specify what they were comparing it to. They said 40x the DP FP FLOPs per EU.
Intel's Gen architectures have a 1:4 DP:SP ratio, so 40x the DP per EU would mean Ponte Vecchio is 10x faster per EU even at a full 1:1 ratio. That seems crazy, right? But nope.
It's because Gen 11 doesn't have DP FP hardware! That's part of Intel's quest to make it power- and area-efficient for client and gaming.
So 40x could simply mean going back to 1:2, if Gen 11 effectively performs at around 1:80 due to emulation.
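Here's a quick way to see why the emulation explanation works. This is a sketch under one assumption that isn't from the article: SP throughput per EU stays the same, so a 40x DP gain just multiplies the DP:SP ratio by 40. The function name is made up for illustration.

```python
from fractions import Fraction

def implied_dp_ratio(baseline_dp_ratio: Fraction, dp_speedup_per_eu: int) -> Fraction:
    """DP:SP ratio (as DP/SP) after a per-EU DP speedup, ASSUMING SP per EU is unchanged."""
    return baseline_dp_ratio * dp_speedup_per_eu

# Native 1:4 baseline: DP would end up 10x the SP rate, which is implausible.
print(implied_dp_ratio(Fraction(1, 4), 40))   # 10
# Emulated ~1:80 baseline (a guess for Gen 11): 40x lands back at 1:2.
print(implied_dp_ratio(Fraction(1, 80), 40))  # 1/2
```

Starting from a real 1:4 part, the 40x figure makes no sense. Starting from a slow emulated baseline, it lands on an ordinary 1:2 HPC ratio.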
And Ponte Vecchio has 8 GPU dies per card, times two for the two boards, making a total of 16 dies. If Ian is right that each Ponte Vecchio is ~66 TFLOPS DP with 2,400 nodes, then each die has about 4 TFLOPS of DP compute.
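The per-die number is just Ian's per-GPU estimate divided by the die count. This sketch assumes the work splits evenly across all 16 dies, which is a simplification. The helper name is hypothetical.

```python
def tflops_per_die(tflops_per_gpu: float, dies_per_board: int, boards: int) -> float:
    """Split a per-GPU DP figure evenly across its compute dies."""
    return tflops_per_gpu / (dies_per_board * boards)

# Figures from the post: ~66 TFLOPS DP per Ponte Vecchio (2,400-node case), 8 dies x 2 boards.
print(f"{tflops_per_die(66.0, 8, 2):.3f} TFLOPS DP per die")  # 4.125 TFLOPS DP per die
```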
That, coincidentally, equals 512 EUs running at 1 GHz with a 1:2 DP ratio. But in a real product even Linpack isn't 100% efficient, so maybe it'll need 1.2 GHz to reach that performance. The Sapphire Rapids chips will contribute some too.
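To check the 512 EU / 1 GHz coincidence and the ~1.2 GHz guess, here's a rough peak-FLOPS calculator. It ASSUMES each EU does 16 FP32 FLOPs per clock (two SIMD-4 FMA ALUs, as on Gen9/Gen11); Ponte Vecchio's actual EU layout may differ. The 80% Linpack efficiency is a hypothetical value picked for illustration, and the function names are made up.

```python
SP_FLOPS_PER_EU_PER_CLOCK = 16  # ASSUMPTION: Gen9/Gen11-style EU, 8 FMA lanes x 2 FLOPs

def peak_dp_tflops(eus: int, clock_ghz: float, dp_ratio: float = 0.5) -> float:
    """Peak DP TFLOPS = EUs x SP FLOPs/clock x DP:SP ratio x clock (GHz) / 1000."""
    return eus * SP_FLOPS_PER_EU_PER_CLOCK * dp_ratio * clock_ghz / 1000

def clock_needed_ghz(target_tflops: float, eus: int, efficiency: float,
                     dp_ratio: float = 0.5) -> float:
    """Clock needed to deliver target_tflops when only `efficiency` of peak is achieved."""
    return target_tflops * 1000 / (eus * SP_FLOPS_PER_EU_PER_CLOCK * dp_ratio * efficiency)

print(f"{peak_dp_tflops(512, 1.0):.3f} TFLOPS peak")                      # 4.096 TFLOPS peak
print(f"{clock_needed_ghz(4.0, 512, 0.8):.2f} GHz at 80% efficiency")     # 1.22 GHz at 80% efficiency
```

At 1 GHz the peak is almost exactly 4 TFLOPS, and if only ~80% of peak shows up in Linpack you land right around 1.2 GHz.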
He also thinks it's possible there are only 1,200 nodes. Then each GPU die needs to deliver double that, about 8 TFLOPS DP. That means 1024 EUs at 1.2 GHz, or 512 EUs at 2.4 GHz. The former seems more likely.
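For the 1,200-node case, comparing the two candidate configurations side by side. ASSUMPTIONS (not from the article): 16 FP32 FLOPs per clock per EU and a 1:2 DP:SP ratio. The function name is hypothetical.

```python
SP_FLOPS_PER_EU_PER_CLOCK = 16  # ASSUMPTION: Gen9/Gen11-style EU
DP_RATIO = 0.5                  # ASSUMPTION: 1:2 DP:SP

def peak_dp_tflops(eus: int, clock_ghz: float) -> float:
    """Peak DP TFLOPS for a die with `eus` EUs at `clock_ghz`."""
    return eus * SP_FLOPS_PER_EU_PER_CLOCK * DP_RATIO * clock_ghz / 1000

# The two configurations from the post; both need to clear ~8 TFLOPS DP delivered.
candidates = {"1024 EU @ 1.2 GHz": (1024, 1.2), "512 EU @ 2.4 GHz": (512, 2.4)}
for name, (eus, ghz) in candidates.items():
    print(f"{name}: {peak_dp_tflops(eus, ghz):.2f} TFLOPS DP peak")  # 9.83 for both
```

Both give the same ~9.8 TFLOPS peak, so the difference is width versus clock speed. At equal peak, the wider, lower-clocked part is usually the more power-efficient choice, which fits the hunch that 1024 EUs is more likely.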
It could well be 1,200 nodes. That's based on the article saying Aurora has 10 PB of memory, which means each node with 2x Sapphire Rapids has about 8 TB of memory.
4 TB per CPU makes sense when you consider 8 memory channels populated with 512 GB, 3rd-generation Optane DC PMM devices. That seems to make slightly more sense than the 2 TB per CPU using 256 GB Optane modules that a total of 2,400 nodes would imply?
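Here's the memory arithmetic for both node counts. ASSUMPTIONS, for illustration only: decimal units (1 PB = 1000 TB, 1 TB = 1000 GB), 2 CPUs per node, 8 channels per CPU, and the whole 10 PB served by one module per channel. In reality the 10 PB may also include DRAM, so treat this as a rough check. The helper names are hypothetical.

```python
TOTAL_MEMORY_TB = 10 * 1000  # 10 PB from the article, in decimal TB
CPUS_PER_NODE = 2            # 2x Sapphire Rapids per node
CHANNELS_PER_CPU = 8         # ASSUMPTION: one memory module per channel

def per_cpu_tb(nodes: int) -> float:
    """Memory per CPU if 10 PB is spread evenly over all nodes."""
    return TOTAL_MEMORY_TB / nodes / CPUS_PER_NODE

def module_size_needed_gb(nodes: int) -> float:
    """Capacity each channel's module would need to reach per_cpu_tb(nodes)."""
    return per_cpu_tb(nodes) * 1000 / CHANNELS_PER_CPU

for nodes in (1200, 2400):
    print(f"{nodes} nodes: {per_cpu_tb(nodes):.2f} TB/CPU, "
          f"~{module_size_needed_gb(nodes):.0f} GB per channel")
# 1200 nodes: 4.17 TB/CPU, ~521 GB per channel
# 2400 nodes: 2.08 TB/CPU, ~260 GB per channel
```

The needed per-channel capacity lands just above 512 GB for 1,200 nodes and just above 256 GB for 2,400 nodes, so each node count lines up with one of those module sizes.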