As I understand it from the Tremont discussion, the second decoder should decode the second path of a branch.
So how would the 2nd decoder be used? It would decode the instruction immediately after the one being decoded by the first decoder?
AMD's laptop firmware hasn't allowed disabling SMT ever since Renoir, from what I've heard.
Maybe if someone with a 370 in hand could disable SMT and the e-cores and run some tests, then we could compare the results with a 4-core Zen 4.
It would also be interesting to see a core-to-core comparison with SMT on and off.
To save area.
I'm not sure it was mentioned, but why did AMD split STX into 2 CCXs? I think that totally killed the core-to-core latency. I mean, it can't be a technical problem, because after all, the 8500G has 2x Zen 4 and 4x Zen 4c sharing the same CCX without the latency problem. Why, AMD? Why?
That is another thing that would be nice if AT investigated. I mean, it would be nice if they could comment on both the core performance in isolation (pin the workload to a single core, double-check it stays there, and measure whatever needs measuring) and the MT/SoC performance, where scheduling issues could be confirmed and pointed out. Right now we get a visual representation that something was not exactly OK [but was it the test procedure, the CPU itself, or Windows?], and we are left guessing what. Of course I understand those things take time, but they could be highlighted in the review, with a note that they will be investigated in another piece [bonus points for keeping the promise and actually doing the piece].
The encoding tests are also all over the place (looking at AV1): there is a large gap between Phoenix and Strix, and then the difference gets swapped in the same test at a different resolution? Feels like something doesn't quite work in the scheduling or in clock management. Could also be SVT having an awful threading model, perhaps. AV1 encoding via Handbrake seemed to suck in the ComputerBase review too; perhaps the encoder sucks on big.LITTLE (although in a massively threaded workload it should effectively stop being big.LITTLE, except for the caches). Maybe the encoder is prone to some problem with threading or with dispatching SIMD on Zen 5 / Strix for some reason.
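The "pin the workload to a single core and double-check it stays there" step can be sketched on Linux with Python's `os.sched_setaffinity` and `os.sched_getaffinity`. This is a minimal sketch assuming a Linux host; the helper names `pin_to_cpu` and `smt_siblings` are hypothetical. Pinning to one logical CPU does not keep its SMT sibling idle, so for a clean 1T measurement the sibling should be left unused (or SMT disabled in BIOS where the firmware allows it).

```python
import os
from typing import Optional


def pin_to_cpu(cpu: int) -> set:
    """Restrict the calling process to one logical CPU (Linux only).

    Returns the mask read back from the kernel so the caller can verify it.
    """
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


def smt_siblings(cpu: int) -> Optional[str]:
    """Logical CPUs sharing this CPU's physical core, per sysfs (None if unavailable)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    cpu = min(os.sched_getaffinity(0))  # lowest CPU this process may use
    assert pin_to_cpu(cpu) == {cpu}, "pin did not take effect"
    # ... run the single-threaded workload here ...
    assert os.sched_getaffinity(0) == {cpu}, "affinity changed during the run"
    print(f"pinned to CPU {cpu}; SMT siblings: {smt_siblings(cpu)}")
```

Reading the mask back after the run is the "double-check" part; the sibling list shows which other logical CPU would share the core's front end.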
My understanding is that when there's a branch, each decoder will work on a separate branch. Here's the Chips and Cheese article on the matter: https://chipsandcheese.com/2024/07/...t-how-30-year-old-idea-allows-for-new-tricks/
So how would the 2nd decoder be used? It would decode the instruction immediately after the one being decoded by the first decoder?
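To make the "separate branch" idea concrete, here is a toy Python sketch of clustered decode, assuming a Tremont-style model in which the branch predictor hands the second decoder the predicted target of a taken branch. Under that assumption both clusters work along the one predicted path, each on a different fetch block, and the not-taken side of a branch is never decoded. The `Insn` type and the round-robin block assignment are hypothetical simplifications, not the actual Intel or AMD policy.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    addr: int
    is_taken_branch: bool = False  # branch predictor says "taken"


def split_fetch_blocks(stream):
    """Cut a predicted-path instruction stream into fetch blocks.

    Each block ends at a predicted-taken branch; the next block starts
    at that branch's predicted target.
    """
    blocks, current = [], []
    for insn in stream:
        current.append(insn)
        if insn.is_taken_branch:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def assign_to_clusters(blocks, n_clusters=2):
    """Hand whole fetch blocks to decode clusters round-robin (hypothetical policy)."""
    clusters = [[] for _ in range(n_clusters)]
    for i, block in enumerate(blocks):
        clusters[i % n_clusters].append([insn.addr for insn in block])
    return clusters


# Predicted path: 0x108 jumps to 0x200, 0x204 jumps to 0x300.
stream = [Insn(0x100), Insn(0x104), Insn(0x108, True),
          Insn(0x200), Insn(0x204, True), Insn(0x300)]
for n, work in enumerate(assign_to_clusters(split_fetch_blocks(stream))):
    print(f"cluster {n}:", [[hex(a) for a in block] for block in work])
```

In this model cluster 0 gets the blocks at 0x100 and 0x300 while cluster 1 decodes the block at 0x200 in parallel, so the second decoder starts at the branch target rather than at the instruction right after the first decoder's.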
Which Zen 5 products will be on 3nm?
Maybe Strix Halo too?
Certainly PBO or a manual OC, though. If it achieved it at 230W, it's so odd that AMD chose to require PBO to reach that perf instead of letting the proc handle it itself. The only thing I can think of is that it's a reliability/culpability play, and the voltage and current requirements to hit this ~45K are just not something they are comfortable having to warranty.
That is why it would be nice if somebody would test decode behaviour with SMT on/off in BIOS for a 1T load, to put the doubts to rest.
AMD's Mike Clark gave interviews last week to Chips and Cheese and Ian Cutress. He said that practically all core resources can be used by a single thread.
Interesting post by David Huang:
"The consequence of Zen 5's initial release to most media outlets for testing on ultra-thin notebooks is that you can't even find a few Cinebench tests where a single core ran at full frequency without being throttled..."
No wonder AT couldn't measure any ST IPC increase in SPECint, while David measured around a 10% jump vs the Zen 4 mobile part.
Another comment (spicy language):
Edit:
One more:
"I suggest you wait until I finish running SPEC and GB under Linux in a few days before drawing any conclusions. In addition, if you have read my previous analysis of performance bottlenecks, you will know that even for a 6-wide 4ALU x86 processor, the performance bottleneck is mostly not in the decoding width or the number of ALUs."
Almost certainly this. If it could be done with reliability guaranteed, there's no way they wouldn't have done it with Arrow Lake looming.
2. AMD is also afraid of silicon degradation like what happened with Raptor Lake.
There is still time for BIOS updates, I hope.
Problem is, from leaks we've seen in R23 ST, even desktop silicon is also not holding its full single-core boost freqs.
This also happens with Apple's chips: they don't reach max clocks in the R23 ST test, but they do in R2024.
Problem is, from leaks we've seen in R23 ST, even desktop silicon is also not holding its full single-core boost freqs.
Strange how it's reporting 3.4GHz as the clock speed.
Strix Halo
HP - Geekbench
Benchmark results for a HP with an AMD Eng Sample: 100-000001422-31_N processor.
browser.geekbench.com
Have you seen the clocks next to the score? Or do you base it on the fact that the score is less than expected? These two things don't need to go hand in hand [although it would be better if boost was not reached for the leaked scores, since then there would be a chance that something can be tuned in BIOS to boost the clock to advertised values].
Problem is, from leaks we've seen in R23 ST, even desktop silicon is also not holding its full single-core boost freqs.
But this also shows that ZEN5 needs more juice than ZEN4. The advantage at low wattages is pretty meh for 50% more threads; it only starts getting decent at power levels above ZEN4's sweet spot. I'm 80% sure we will see every ZEN5 desktop SKU be slower than its predecessor at low wattages. This is similar to Igor's leaks, where the ZEN5 ES was beaten by the 7950X pretty much everywhere below 100W. ZEN5 needs juice to run properly.
The performance at the same wattage is 17-34% higher. Notebookcheck compared against the Z1 Extreme and 8945HS:
It's because the scores are not equaling +17% vs known Zen 4 SKUs at known clocks. We are seeing ~9-14% for the 9600X and 9700X vs their corresponding Zen 4 SKUs, depending on whether PBO is on or off.
Have you seen the clocks next to the score? Or do you base it on the fact that the score is less than expected? These two things don't need to go hand in hand [although it would be better if boost was not reached for the leaked scores, since then there would be a chance that something can be tuned in BIOS to boost the clock to advertised values].
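Whether a score gap comes from clocks or from IPC can be checked with simple arithmetic, assuming single-thread score scales linearly with clock (a simplification that ignores memory latency and other clock-independent effects). The `ipc_uplift` helper and the numbers below are hypothetical illustrations, not leaked results.

```python
def ipc_uplift(score_new: float, score_old: float,
               clock_new_ghz: float, clock_old_ghz: float) -> float:
    """Per-clock gain implied by two single-thread scores.

    Assumes score is proportional to IPC x clock, so
    IPC ratio = (score ratio) / (clock ratio).
    """
    return (score_new / score_old) / (clock_new_ghz / clock_old_ghz) - 1.0


# Hypothetical example: a 12.32% higher score at 4.8 GHz vs 5.0 GHz
# still works out to about 17% more work per clock.
print(f"{ipc_uplift(112.32, 100.0, 4.8, 5.0):.1%}")
```

So a score that looks only ~12% better could still hide the full per-clock gain if the chip is not holding its boost clock, which is why the reported clock next to the score matters.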
So ideally, there should be 4 decoders, to handle the case of two branching pathways and the instruction to be executed after the branch is entered.
My understanding is that when there's a branch, each decoder will work on a separate branch. Here's the Chips and Cheese article on the matter: https://chipsandcheese.com/2024/07/...t-how-30-year-old-idea-allows-for-new-tricks/
IOD is said to be N3E, yes.
Maybe Strix Halo too?