We're back in business baby!
Yeah, I had some questions when I saw how much they beefed up the L1 -> FP PRF bandwidth and FP register file size compared to INT, so larger FP gains were expected, but perhaps not to this extent? I'm very interested in learning why they made the choices they did.
Strix Point is a small step forward. It is more competitive with SDXE in many areas. And it is shipping before M4 shows up in laptops.
But it is almost exactly what I didn't want. 9-10% higher INT 1T with a bigger FP and MT performance increase. I would prefer the opposite.
"When we said nT specint, we meant 1t specfp!"
Time to rewrite software to use FLOATs instead of INTs !!!
(Note: None of the above is me genuinely suggesting software be rewritten to use FP types. That sounds horrifying.)
Now they don't have to avoid FP. Zen 5 will save the day!
Also, JavaScript and Lua (prior to the int types introduced in... 5.3?) use FP as their primary scalar data type, though implementations generally try to avoid actually executing FP code for them when possible.
Why? Let's do it and deal with the fallout later
Obama: Everybody's Got to Learn How to Code
Colors, letters and coding? President Obama has new homework for America’s students. (www.vox.com)
Maybe Kamala's new initiative upon getting elected should be, Everybody learn to rewrite software in FLOATs!
If only they had bothered to do GMI-Wide, then Zen 5 would probably interest me a bit; otherwise the calc speed is limited by RAM bandwidth. Hopefully X3D SKUs won't be as locked as they are now and will be more coolable, otherwise it's not something that stirs my upgrade-ido.
Time to rewrite software to use FLOATs instead of INTs !!!
Improved load/store is also for FP. You can't be as granular, so both 512-bit and 256-bit AVX cores have the same thing. Every time Intel doubled FP execution, they doubled Load/Store.
The thing is, in Strix Point there is really little that could boost the FP performance. The only things worthy of note are the updated scheduler layout, bigger register file and lower add latency, whereas the INT side was boosted with improvements to load, store, and more execution units. I mean, the INT scalar side has seen many more changes, comparatively speaking.
Where is the proof?
For me, 9950X is the proof for Zen 5's frequency scaling. Where is Apple's proof of their silicon's frequency scaling superiority?
Who cares?
This is proof that both Intel and AMD are smoking the high-frequency drug again, à la Netburst/Bulldozer, for dubious marketing benefits when a better core would do better for sales.
They need a complete rejig of the architecture with clocks aimed at under 5GHz. A mini Conroe/Zen transition.
No, it also matters when you can't achieve the IPC needed to be competitive. And Zen 5 probably did need 250-300MHz more.
Yeah, "muh 6GHz" only really matters to those looking for something to latch on to.
Gonna happen with Zen 7. Don't see how it can't. They have to compete with Nova Lake. The Jim Keller complete re-design.
They need a complete rejig of the architecture with clocks aimed at under 5GHz. A mini Conroe/Zen transition.
It's not me:
Improved load/store is also for FP. You can't be as granular, so both 512-bit and 256-bit AVX cores have the same thing. Every time Intel doubled FP execution, they doubled Load/Store.
Integer is that much HARDER than FP to improve. So things such as increased core-to-core latency impact INT much more than FP.
Also, this is another indication to me that both P-core teams (Intel and AMD) are struggling.
That's from the C&C Mike Clark interview, and since Strix doesn't have 512-bit data paths, it did not get improved FP bandwidth vs Zen 4 but can do 4 loads on the scalar side. At least this is how I understand this quote.
Yeah, so I’ll try, I mean I know it is a little bit complex. So, we like to think about it of the data cache can handle 4 memory operations per cycle and so starting from that baseline on the load side, they can all be loads, 4 loads, we can do. Now based on the size of the load, because we only have a data path to the floating-point unit that’s you know 512 bits on two of the ports. You can only do 2 loads that are floating point.
Please don't cry when AMD hits 6 GHz with one hand tied behind their back using TSMC A16. Apple might be lucky to get close to 5.5 GHz on the same process but even that seems very optimistic.
Yeah, "muh 6GHz" only really matters to those looking for something to latch on to.
Please say it one more time, but with the confidence the Tejas team had!
Please don't cry when AMD hits 6 GHz with one hand tied behind their back using TSMC A16.
I think Geekerwan's FP testing for Zen 4 is unrealistic. Even in their original Zen 4 testing, they found a <5% increase in IPC from spec2017 FP vs Zen 3, which doesn't make much sense.
OK!! The fabled 35% has finally reared its head! lol
That team could've done it on Intel 7 Ultra process! They were just too ahead of their time.
Please say it one more time, but with the confidence the Tejas team had!
Zen 4 could only do 0.5 512b stores per cycle. Or rather, it could only do one 256b store per cycle. Now Zen 5 can do 2x256 stores per cycle [or 1x512] according to the same interview. So Strix Point got the improvement to store bandwidth, not to load bandwidth.
This seems to verify only 1 512b load/store per cycle for Strix Point. But store bandwidth is doubled compared to Zen 4. https://blog.hjc.im/zen-5-more-details-1.html
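For anyone who wants the store/load figures above spelled out as bytes per cycle, here is a rough Python sketch. It only multiplies the op counts and widths quoted in this thread (256b store on Zen 4, 2x256 or 1x512 on Zen 5, two 512b FP loads on parts with 512-bit data paths). The helper name `bytes_per_cycle` is made up for illustration, and these are peak numbers on paper, not measurements.

```python
# Back-of-the-envelope peak L1 bandwidth from the op counts and widths quoted
# in this thread. bytes_per_cycle is a hypothetical helper, not an AMD API.

def bytes_per_cycle(ops_per_cycle: int, width_bits: int) -> int:
    """Peak bytes moved per cycle for a given op count and op width."""
    return ops_per_cycle * width_bits // 8

zen4_store = bytes_per_cycle(1, 256)        # one 256b store per cycle
zen5_store = bytes_per_cycle(2, 256)        # 2x256b stores per cycle
zen5_store_512 = bytes_per_cycle(1, 512)    # or, equivalently, 1x512b
zen5_fp_load_512 = bytes_per_cycle(2, 512)  # two 512b FP loads (512b data paths only)

print(zen4_store, zen5_store, zen5_store_512, zen5_fp_load_512)
```

On these assumptions the store side doubles from Zen 4 to Zen 5, which matches the "store bandwidth is doubled" reading of the interview.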
The thing with these desktop designs is that they are extremely bandwidth starved. If Zen 5 had the kind of membw that M4 has, its IPC would really fly. Apple has overbuilt their memory subsystem so much that even all processing blocks running at 100% usage can't saturate their available membw because they are running too slow to crunch more data. Zen 5/6/7 have a better chance of scaling with better-performing memory in future, especially DDR6. Just look at how much Zen 4 flies with X3D's cache bandwidth. Zen5X3D will be an INT/FP data devouring monster.
Right now, 5.7GHz Zen4 (I'll withhold judgment on Zen5 until GNR launches) looks just okay against 4.4GHz M4. It doesn't get treated as magically superior because it's running at a higher clock.
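To put the 5.7GHz vs 4.4GHz comparison in numbers, here is a tiny Python sketch. It assumes the usual simplification that single-thread performance is roughly IPC times clock, which ignores memory, boost behavior and workload mix. The function name `required_ipc_ratio` is hypothetical, and the only inputs are the two clocks mentioned above.

```python
# Simplified model (an assumption, not a benchmark): performance ~ IPC * clock.
# required_ipc_ratio is a hypothetical helper for illustration only.

def required_ipc_ratio(fast_clock_ghz: float, slow_clock_ghz: float) -> float:
    """IPC advantage the lower-clocked core needs to match the higher-clocked one."""
    return fast_clock_ghz / slow_clock_ghz

ratio = required_ipc_ratio(5.7, 4.4)
print(f"{ratio:.2f}x")  # roughly 1.30x
```

So under this model a 4.4GHz core that merely ties a 5.7GHz one is already doing about 30% more work per clock, which is why the higher clock alone doesn't make a part look superior.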