If it's close to the N3E reticle limit and 512-bit, it's going to be stupid expensive. Even the cut-down model will probably be a Titan and priced like one.
What makes you think it's near the reticle limit?
Look at the changes Nvidia made from A100 to H100 with only a 50% transistor density budget. They doubled the number of 32-bit and 64-bit ALUs, improved tensor cores by 2x, increased L1 cache by 30% per SM and L2 cache by 20%, and increased the SM count by 20% (so total L1 cache went up by 56%, since 1.3 x 1.2 = 1.56). They still reduced the overall die size even
On client, with the same 50% budget, they could just take H100, cut the ALU count by dropping the 64-bit units, reduce L1 cache from 256 KB to the A100-level 192 KB (they seem to offer 1 KB per CUDA core since Turing), and reduce the scale of the hype bus link in Hopper. That should keep the die size at 620 mm² or less.
Considering that kopite7kimi never mentioned or agreed with the other guy's 128 MB L2 cache, the L2 cache may stay the same size or even shrink to keep the die size in check.
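For anyone who wants to check the 56% total L1 figure above, here is a quick sketch in Python. It assumes the rough percentages quoted in this post (+30% L1 per SM, +20% SM count), not official spec-sheet numbers, and the variable names are just for illustration.

```python
# Rough figures quoted in the post above (assumed, not official specs).
l1_per_sm_growth = 0.30  # L1 cache per SM: +30%
sm_count_growth = 0.20   # SM count: +20%

# Total L1 = (L1 per SM) x (number of SMs), so the growth factors multiply
# rather than add.
total_l1_growth = (1 + l1_per_sm_growth) * (1 + sm_count_growth) - 1

print(f"Total L1 growth: {total_l1_growth:.0%}")  # prints "Total L1 growth: 56%"
```

The point is that the two increases compound: simply adding them would give 50%, which understates the total.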