> And claims that software will grow enough over time (which apparently doesn't mean only Zen 5 but also 6/7 possibly).

@Nothingness any comments?
> @Nothingness any comments?

Yep: Zen5 is a failure 😉
Agreed. But it is interesting how he mentions that essentially the problem is the shift from the 6-wide decode to 4-wide, and also the former 4 ALUs. He isn't asked about FP or INT or anything, just "why is the IPC only that?", and the answer jumps specifically to decode width and ALU count, which IIRC only went from 4 to 6 in INT. So my little ear got attentive there. Seems like even he knows what the crux of the complaints is. And claims that software will grow enough over time (which apparently doesn't mean only Zen 5 but also 6/7 possibly).
> Problem is influencers
> ...
> Mahesh Subramony was a lead SoC guy for Strix, but nobody asked him anything

Umm ackshually
> AMD is adding a possibility to directly fill L3 from I/O devices.

Faster network I/O could mean reduced latency in multiplayer games, amirite?
> Faster network I/O could mean reduced latency in multiplayer games, amirite?

The effect is completely negligible in that use. It matters for high-throughput web servers and such, as it reduces the overhead of handling a packet. I think it requires special support from both hardware and the OS, and is unlikely to be supported outside specialist server-grade stuff and Linux.
> The effect is completely negligible in that use. It matters for high-throughput web servers and such, as it reduces the overhead of handling a packet. I think it requires special support from both hardware and the OS, and is unlikely to be supported outside specialist server-grade stuff and Linux.

I don't know. Seems like something the Killer NICs could leverage, if Intel still owns them and they are looking for another marketing bullet point to help sell their NICs to mobo makers. It would be weird for them to support an AMD feature, but they just might do it, considering they badly need sales at the moment.
> The effect is completely negligible in that use. It matters for high-throughput web servers and such, as it reduces the overhead of handling a packet. I think it requires special support from both hardware and the OS, and is unlikely to be supported outside specialist server-grade stuff and Linux.

Yeah, games typically do the initial handshake over TCP, then stream over UDP, with a regular keep-alive check to see if you're still on.
> There is no computation overhead

Not computation, but latency can be reduced further, since packet data won't need to be copied to RAM first; it will remain in cache the whole time. It could increase the number of players serviced by a single server.
> Seems like something the Killer NICs could leverage,

If Killer were to enter the market of ≥400Gb/s NICs, then yes. :-)
> Not computation, but latency can be reduced further, since packet data won't need to be copied to RAM first; it will remain in cache the whole time. It could increase the number of players serviced by a single server.

I don't think cutting 100-200 nanoseconds from the 1-150 milliseconds of typical game server latency will make you hit that head-shot more often ...
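For a rough sense of scale (assuming ~100 ns saved per packet by the cache injection and a 400 Gb/s NIC with full-size frames; both are illustrative figures, not measurements), a quick back-of-the-envelope in C++:

```cpp
// Back-of-the-envelope scale check for the discussion above.
// ASSUMPTIONS: ~100 ns saved per packet by L3 injection, 30 ms game RTT,
// 400 Gb/s NIC with 1500-byte frames. Illustrative figures only.
#include <cstdio>

int main() {
    const double saved_ns    = 100.0;        // assumed DRAM round-trip avoided
    const double game_rtt_ns = 30e6;         // 30 ms typical game round trip
    const double line_bps    = 400e9;        // server-class NIC line rate
    const double frame_bits  = 1500.0 * 8.0; // full-size Ethernet frame

    const double pps       = line_bps / frame_bits; // ~33M packets/s
    const double budget_ns = 1e9 / pps;             // ~30 ns to handle each packet

    std::printf("share of a 30 ms game RTT:     %.5f%%\n", 100.0 * saved_ns / game_rtt_ns);
    std::printf("per-packet budget at 400 Gb/s: %.0f ns\n", budget_ns);
    std::printf("saving vs. that budget:        %.0f%%\n", 100.0 * saved_ns / budget_ns);
}
```

The saving is noise next to a game round trip, but it exceeds the entire per-packet time budget of a NIC running at line rate, which is exactly the split described above.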
> Not when EU players still go to NA servers for fun and have a latency leap from 30ms to 120ms and find that "bad, but bearable".

Because there aren't enough EU servers or players???
> Can someone here come up with an easy-to-run Windows benchmark (even just console based) and release it on GitHub, so people here can run it on their 7950X and 9950X PCs and have the results analyzed to figure out what's holding back Zen 5? Something like a staggered workload that starts with low core count and low data requirements and then progresses to higher and higher core counts and becomes increasingly memory bound?

I think it would be more meaningful to hook up a profiler and profile the software you are actually concerned with than to write yet another synthetic program that tries its best to pretend it's the average workload of type A, or B, or C. So something like what C&C is doing. But that takes time, and you also need time to get to know the documentation well enough to know what the performance counters mean. And since there isn't one available for Zen 5 yet, you need to read the older ones and hope the counters still behave the same way. But it's not guaranteed...
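For what it's worth, a minimal sketch of the staggered idea being asked for, under my own assumptions (the thread counts, working-set sizes, and iteration counts are arbitrary placeholders, and a plain pointer chase only probes the memory side, not the frontend). It builds anywhere std::thread works, Windows included:

```cpp
// Sketch of a staggered benchmark: sweep thread count and working-set size,
// do dependent (pointer-chasing) loads, and report aggregate load throughput.
// All sizes/counts below are placeholders to tune for the machine under test.
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <numeric>
#include <random>
#include <thread>
#include <vector>

constexpr long STEPS = 20'000'000; // dependent loads per thread

void chase(std::size_t bytes, double& seconds) {
    // Build a random cyclic permutation: every load depends on the previous
    // one, so hardware prefetching can't hide the latency.
    std::size_t n = bytes / sizeof(std::size_t);
    std::vector<std::size_t> order(n), next(n);
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), std::mt19937_64{42});
    for (std::size_t k = 0; k < n; ++k)
        next[order[k]] = order[(k + 1) % n];

    volatile std::size_t i = 0; // volatile so the chase isn't optimized away
    auto t0 = std::chrono::steady_clock::now();
    for (long s = 0; s < STEPS; ++s) i = next[i];
    seconds = std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

int main() {
    for (unsigned threads : {1u, 2u, 4u, 8u, 16u}) {
        for (std::size_t kib : {32u, 1024u, 32u * 1024u}) { // ~L1-, ~L2-, past-L3-sized
            std::vector<double> secs(threads);
            std::vector<std::thread> pool;
            for (unsigned t = 0; t < threads; ++t)
                pool.emplace_back(chase, kib * 1024, std::ref(secs[t]));
            for (auto& th : pool) th.join();
            double worst = *std::max_element(secs.begin(), secs.end());
            std::printf("%2u threads, %6zu KiB: %7.1f M loads/s\n",
                        threads, kib, threads * STEPS / worst / 1e6);
        }
    }
}
```

One crude way to read the output: if a 9950X and a 7950X track each other until the working set spills out of a cache level, raw memory latency isn't what separates them, and the frontend becomes a more interesting suspect.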
> I haven't seen this mentioned here yet:
> AMD is adding a possibility to directly fill L3 from I/O devices.

I'm guessing this will be super useful for InfiniBand HPC stuff.
So it's the most frontend-limited design around.

Zen 5 Variants and More, Clock for Clock (chipsandcheese.com): "Zen 5 is AMD's newest core architecture."
> Problem is influencers gave "celebrity status" to core uarch guys, as if the rest don't make a difference.
> Mike Clark, the uarch chief architect and leader of the core roadmap, will absolutely say the Zen 5 core is great, or that it is great in simulation, etc.
> Nobody bothered to ask Sam Naffziger, the fabric and chiplet lead, what is up with Infinity Fabric or the chiplet tech in Zen 5.
> Nobody was seeking out the SoC guys or the product guys to ask why their chip performed the way it did.
> Mahesh Subramony was a lead SoC guy for Strix, but nobody asked him anything; everybody asked only Mike Clark.
> The product guys assemble all the IPs together to make the final purchasable product, so they are definitely responsible for the final performance of the product, not just the uarch folks.

Nobody asked MLID for his opinion on the Zen 5 "failure", yet he gave it anyway. Perhaps a good teaching lesson for AMD's CPU department.
> So it's the most frontend-limited design around

Yeah, the growth in backend resources outpaced the front-end. Clam pretty much said this in the GNR review:

> Widening the core may have been premature too. Much of the potential throughput offered by Zen 5’s wider pipeline is lost to latency, either with backend memory accesses or frontend delays.
> So it's the most frontend-limited design around

I will let myself quote the original article:

> libx264 is backend bound (...) Zen 5 loses more throughput, but that’s because it has a wider pipeline and thus more potential throughput to lose. Zen 5 is still the leader here, but that’s because of its increased reordering capacity and better frontend rather than core width.

> Kernel compilation is a very frontend bound workload (...) Again Zen 5 loses the most potential throughput to frontend reasons, despite having arguably the most advanced frontend of all CPUs here. Feeding an 8-wide core is hard when branches are everywhere. Despite not looking so good in this graph, it’s important to remember that Zen 5 outperforms every other core architecture here, even when limited to four cores at 3 GHz.

> Still, the fundamental limiters for CPU performance have remained the same over the past decade and more. (...) I’ll have fun watching engineers try their best to tackle those challenges. It’s like watching a new player learn to play Dark Souls. Tragedy is inevitable. With that in mind, I wish them the best.
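To put the "branches are everywhere" remark in rough numbers (one branch per ~5 instructions is a typical textbook density for integer code, not a figure from the article): feeding 8 instructions per cycle means getting through 8 / 5 ≈ 1.6 branches per cycle, so the frontend has to predict one or two branches correctly, and redirect fetch across any taken ones, every single cycle just to keep the core fed.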
> Memory latency and spaghetti code will claim your performance. Resistance is futile.

In other words: try as hard as you want, software engineers will manage to nullify whatever performance improvements you have come up with.
Zen5 is exploring entirely new thresholds of being front-end bound.

Zen 5 Variants and More, Clock for Clock (chipsandcheese.com): "Zen 5 is AMD's newest core architecture."