6 WGPs (12 CUs) with 4x SIMD32 per CU and 2x vALU32 per SIMD32 would be 3072 ALUs, or 512 ALUs per WGP. Just check out the last 2-3 pages in this thread.
The question is which of these is true.
Does RDNA3 have 2x more SIMD32 per CU, or 2x more vALU32 per SIMD32, or are both true, making theoretical FP32 throughput 4x better?
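To make the three possibilities concrete, here is a rough sketch of the arithmetic. The baseline of 2x SIMD32 per CU with 1x vALU32 per SIMD32 is my assumption based on the RDNA1/2 layout, and the function and scenario names are made up for illustration; only the 6 WGP / 12 CU / 3072 ALU figures come from the post above.

```python
LANES_PER_VALU32 = 32  # a vALU32 is 32 lanes wide


def alu_count(wgps, simd32_per_cu, valu32_per_simd32, cus_per_wgp=2):
    """Total FP32 lanes for a hypothetical configuration."""
    cus = wgps * cus_per_wgp
    return cus * simd32_per_cu * valu32_per_simd32 * LANES_PER_VALU32


# (SIMD32 per CU, vALU32 per SIMD32); the baseline row is an assumption
# taken from the RDNA1/2 layout, not something confirmed for RDNA3.
scenarios = {
    "baseline, RDNA2-style": (2, 1),
    "2x SIMD32 per CU only": (4, 1),
    "2x vALU32 per SIMD32 only": (2, 2),
    "both": (4, 2),
}

WGPS = 6
baseline = alu_count(WGPS, *scenarios["baseline, RDNA2-style"])
for name, (simd, valu) in scenarios.items():
    total = alu_count(WGPS, simd, valu)
    print(f"{name}: {total} ALUs, {total // WGPS} per WGP, "
          f"{total // baseline}x baseline")
```

Only the "both" case gives the 3072 ALUs (512 per WGP) mentioned above; either change alone would land at half of that.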
I think 2x 32-lane VALU per SIMD32 is quite certain; people don't play around with LLVM. It is incredibly hard to optimize correctly, and touching anything else is just asking for trouble, since it could produce some random code. That actually happened with Blender: AMD's HIP compiler produced code that crashed the compute kernel, and it was fixed some months ago.
Regarding 4x SIMD32 per CU, I believe it is possible. I think the enum value NUM_SIMD_PER_CU = 4 is actually correct.
The gfx_v11.c code is just copied from Navi10, so it has not been updated yet in many places. They may indeed use the actual SIMD count from soc21; fingers crossed, but there is no reason to think otherwise.
Golden registers are not there, RAS is not there, and the firmware is not there and has not been uploaded to LVFS yet. Still a long way to go.
I think they are just doing final bring-up and have not got everything done. After that they will push the rest of the code.
Just a reminder: the driver for every new ASIC is initially developed on an emulator, as usual.
Once the next merge window opens for Linux 5.20, a couple of weeks from now, expect another burst of changes.
Another thing I have noticed is that N31 has no XGMI; no support has been added so far. N21 spent a good chunk of die area on XGMI alone, and it is only used in the Radeon Pro W6800X for Apple.
Regarding the MCD, I think 32 MiB is just too small, at something like 35 mm2 per MCD including UMC+PHY. It is quite weird to have such small chiplets, and packaging overhead could be a thing.
It should be at least 64 MiB each, which would take the MCD to around 60-65 mm2 in size. Otherwise they might as well make a single 64 MiB MCD instead of 2x 32 MiB chiplets.
That would also make N32 carry more cache than N21 (i.e. 256 MiB, rather than 128 MiB if it were 32 MiB per MCD), considering it is so much more potent and needs the BW.
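The cache numbers can be checked the same way. This is just a sketch: the MCD count of 4 for N32 is inferred from the figures above (128 / 32 = 256 / 64 = 4) and is not confirmed anywhere, and the helper name is made up for illustration.

```python
N21_CACHE_MIB = 128  # N21 total, as quoted above
N32_MCD_COUNT = 4    # assumption: inferred from 128/32 = 256/64 = 4


def total_cache_mib(mcd_count, mib_per_mcd):
    """Total cache if every MCD carries the same amount."""
    return mcd_count * mib_per_mcd


for mib_per_mcd in (32, 64):
    total = total_cache_mib(N32_MCD_COUNT, mib_per_mcd)
    verdict = "more than" if total > N21_CACHE_MIB else "no more than"
    print(f"{mib_per_mcd} MiB per MCD: {total} MiB total, {verdict} N21")
```

With 32 MiB per MCD, N32 would only match N21; it takes 64 MiB per MCD to actually exceed it.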