> As I said before, we need to wait for APX instruction set implementation before we see a huge IPC increase.
It's not a given.
> We'll have our preview with STX Halo soon enough, I guess.
> The uncore is just pure cope at this point.
Discussing the bits and pieces of architectural weaknesses and how to overcome them is not coping. I don't know why this word is used so much in the shillicon Twitter universe.
> Discussing the bits and pieces of architectural weaknesses and how to overcome them is not coping. I don't know why this word is used so much in the shillicon Twitter universe.
I don't mean that discussing it is cope; it isn't. I meant that the uncore is just poor. It's downright funny that a CCD is unable to use all of the memory bandwidth because of a single GMI3 link.
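To put rough numbers on the single-link complaint, here is a back-of-the-envelope sketch. The link widths (32 B/cycle read, 16 B/cycle write per CCD), the 2000 MHz FCLK, and the dual-channel DDR5-6000 setup are assumptions based on commonly reported Zen 4 desktop figures, not numbers from this thread, so treat the output as illustrative only.

```python
# Rough comparison of one CCD's fabric link against peak DRAM bandwidth.
# All hardware figures below are ASSUMPTIONS (commonly reported for Zen 4
# desktop parts), not values taken from this thread.

def link_bandwidth_gbs(bytes_per_cycle: int, fclk_mhz: int) -> float:
    """Peak link bandwidth in GB/s: bytes per fabric clock times clock rate."""
    return bytes_per_cycle * fclk_mhz / 1000

def dram_peak_gbs(mt_per_s: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (64-bit bus per channel)."""
    return mt_per_s * bus_bytes * channels / 1000

FCLK_MHZ = 2000          # assumed fabric clock
READ_BYTES_PER_CLK = 32  # assumed per-CCD read width
WRITE_BYTES_PER_CLK = 16 # assumed per-CCD write width

ccd_read = link_bandwidth_gbs(READ_BYTES_PER_CLK, FCLK_MHZ)
ccd_write = link_bandwidth_gbs(WRITE_BYTES_PER_CLK, FCLK_MHZ)
dram = dram_peak_gbs(6000)  # assumed dual-channel DDR5-6000

print(f"CCD read link : {ccd_read:.1f} GB/s")
print(f"CCD write link: {ccd_write:.1f} GB/s")
print(f"DRAM peak     : {dram:.1f} GB/s")
print(f"One CCD can read at most {ccd_read / dram:.0%} of peak DRAM bandwidth")
```

Under those assumptions a single CCD tops out well below what the memory controller could deliver in theory, which is the bottleneck being described. Real sustained figures will be lower on both sides.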
I am saying this based on the visible improvements in SPEC int, at fixed clock, when Z4 is equipped with 3D V-Cache. I would think removing the uncore bottlenecks that 3D V-Cache attempts to work around would improve the situation, until the next bottleneck at least.
Also Z4 in MI300A benefits from the LLC prefetching as per AMD themselves.
Additionally, they have prefetching updates for L1/L2 instead of just stream, stride, burst, nextline
> Well, it is not exactly stellar; calling it a mild improvement is being too generous considering the time frame involved.
> I am mostly looking at Alexander Yee's blog to make this statement.
> Other than AVX512 there is not much improvement.
Yee did point out huge improvements in scalar integer though. It's everything in between that's stagnating.
> So this is another typical AMD launch. A couple of users overhype the product. Others fall for this hype. When the product is actually released, everyone feels disappointed. For me the performance meets the expectation from the architectural perspective.
> It is well known that it is very difficult to increase integer IPC. The number of general-purpose registers is a bottleneck. More read/write ports will help, but they may also increase power usage. As I said before, we need to wait for APX instruction set implementation before we see a huge IPC increase.
> Having said that, there is still lots of potential left in AVX. With AVX512 they can probably go over 16 execution units.
> My real disappointment is that there is no 24/32-core AM5 Zen 5 CPU.
How would you feed these cores in AM5?
> Wouldn't matter as much if there were a realistic chance of the cadence catching up, but it seems to get worse instead.
It's about the same time-between-releases as Zen 4. But for this length of time people expect bigger gains (even if the process uplift was less).
> Yee did point out huge improvements in scalar integer though.
But it is constrained by memory bandwidth; that is, they have throughput as long as there is no data to be fetched from somewhere lower in the memory hierarchy.
> It's about the same time-between-releases as Zen 4.
Which was known to be delayed to account for CXL. So Zen 5 spending the same time means it is actually doubly delayed instead of catching up with the intended cadence.
> But it is constrained by memory bandwidth; that is, they have throughput as long as there is no data to be fetched from somewhere lower in the memory hierarchy.
That was to be expected, though, considering we already knew bigger uncore/IO changes would only happen with Zen 6, going by previous gens.
But they kept the L2 at 1MiB and kept the L2 to L3 at 32B/cycle. So no respite there too.
> Which was known to be delayed to account for CXL. So Zen 5 spending the same time means it is actually doubly delayed instead of catching up with the intended cadence.
Let's just put it this way. Only once did a Zen land on time, and that was Zen 3. And Zen 5 is right on average. If all Zens but one are delayed then, well, what's the exception? It isn't Zen 5.
Maybe you need to know your stuff to interpret what any one particular result means, but if you're just looking at the average it really doesn't matter. When it comes to distilling performance to a single number, regression to the mean is real, and sample size is king.
Looking at Phoronix results today, it seems like Turin-D is going to be really, really good for cloud customers. It's a shame these kinds of improvements didn't translate well to the consumer side (excluding a couple of areas like browsers).
> How would you feed these cores in AM5?
If DDR4 is enough for 16 cores, I am sure DDR5 with double the bandwidth is enough for 32 cores. At least it should be enough for 24 cores.
> Soo... hype train back on tracks?
Yes, my body is ready again.
> More good news about the 9600X.
AMD just released their new architecture, ZEN 5%
> For Zen 3, DDR4 4000 was the sweet spot; for Zen 5 it's 6000. So that's 50% more bandwidth. The cores are also higher performance, so they need more memory bandwidth. You can't have more than 26 cores without increasing the memory channels, which taxes everyone on the platform. If you need more cores or bandwidth you go to TR.
I am using DDR4 3200 with a 5950X, so DDR5 6400 should be enough for 24 if not 32 Zen 5 cores. Remember L2 was increased in Zen 4, so it will relieve some of the memory pressure. Sure, more memory bandwidth will help, but what matters is whether 24/32 cores will outperform 16 cores in multithreaded applications. If, even with limited bandwidth, a 24-core Zen 5 beats a 16-core Zen 5 within the same power/bandwidth envelope, it is a win for the user.
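For anyone who wants the arithmetic behind both positions, here is a minimal sketch of theoretical peak bandwidth per core. It assumes dual-channel memory with a 64-bit bus per channel and uses peak transfer rates only; the function names are made up for illustration, and real sustained bandwidth and per-core demand will differ.

```python
# Theoretical peak DRAM bandwidth and the share available per core.
# ASSUMPTIONS: dual channel, 64-bit (8-byte) bus per channel, peak rates only.

def peak_gbs(mt_per_s: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s for a given transfer rate (MT/s)."""
    return mt_per_s * bus_bytes * channels / 1000

def per_core_gbs(mt_per_s: int, cores: int) -> float:
    """Peak bandwidth divided evenly across all cores, in GB/s."""
    return peak_gbs(mt_per_s) / cores

configs = [
    ("DDR4-3200", 3200, 16),
    ("DDR4-4000", 4000, 16),
    ("DDR5-6000", 6000, 16),
    ("DDR5-6000", 6000, 24),
    ("DDR5-6400", 6400, 24),
    ("DDR5-6400", 6400, 32),
]

for name, rate, cores in configs:
    print(f"{name} x {cores:2d} cores: "
          f"{peak_gbs(rate):6.1f} GB/s total, "
          f"{per_core_gbs(rate, cores):.2f} GB/s per core")

# The "50% more bandwidth" figure from the quoted post:
print(f"DDR5-6000 vs DDR4-4000: {peak_gbs(6000) / peak_gbs(4000):.2f}x")
```

On paper, 32 cores on DDR5-6400 get the same peak share per core as 16 cores on DDR4-3200, and DDR5-6000 is 1.5x DDR4-4000. Whether that share is enough depends on how much more bandwidth each newer core actually demands, which is the part the two posts disagree on.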
> For Zen 3, DDR4 4000 was the sweet spot; for Zen 5 it's 6000. So that's 50% more bandwidth. The cores are also higher performance, so they need more memory bandwidth. You can't have more than 26 cores without increasing the memory channels, which taxes everyone on the platform. If you need more cores or bandwidth you go to TR.
You know you don't have to buy a hypothetical 24-core product if you feel it's too constrained by memory bandwidth, right?
> I am using DDR4 3200 with a 5950X, so DDR5 6400 should be enough for 24 if not 32 Zen 5 cores. Remember L2 was increased in Zen 4, so it will relieve some of the memory pressure. Sure, more memory bandwidth will help, but what matters is whether 24/32 cores will outperform 16 cores in multithreaded applications. If, even with limited bandwidth, a 24-core Zen 5 beats a 16-core Zen 5 within the same power/bandwidth envelope, it is a win for the user.
It would be a pointless product that nobody would buy. Partly because no one buys the 16-core chips anyway, but additionally because they would be severely bandwidth constrained. If you've read this thread you can see that the current chips already seem bandwidth constrained as they are.
> You know you don't have to buy a hypothetical 24-core product if you feel it's too constrained by memory bandwidth, right?
I wouldn't, and I'm not going to because AMD isn't releasing it. How many people are going to be interested in underperforming 24- or 32-core chips that are so memory constrained? Either you want the chip to work (get EPYC or Threadripper) or you don't. Not many people are in the "I want to buy a high core count processor and would buy it even if it's memory starved" camp.
> I wouldn't, and I'm not going to because AMD isn't releasing it. How many people are going to be interested in underperforming 24- or 32-core chips that are so memory constrained? Either you want the chip to work (get EPYC or Threadripper) or you don't. Not many people are in the "I want to buy a high core count processor and would buy it even if it's memory starved" camp.
Aren't you constrained by memory bandwidth only when you saturate the entire available memory?
> Aren't you constrained by memory bandwidth only when you saturate the entire available memory?
No? You're memory bandwidth constrained when you can't get things out of the memory fast enough.
> No? You're memory bandwidth constrained when you can't get things out of the memory fast enough.
All right then. There are still tasks, like 3D rendering, that do not benefit significantly from faster RAM but would benefit immensely from additional cores. RAM speed perhaps becomes an important factor when you run out of RAM and data needs to be fetched from the drive, but that's better resolved by more RAM anyway.