Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 446

Ajay

Lifer
Jan 8, 2001
16,069
8,098
136
That's setting yourself up for disappointment.

I don't expect both a 40% IPC gain (already very high) and an fMax boost in the same generation. If they manage to pull a rabbit out of a hat and complete a magic trick to do that, Zen 5 will be unobtanium.
Eh, all that needs to happen is for *one* benchmark to increase by more than 30% and it's winner, winner, chicken dinner. All the leakers will simultaneously orgasm.
 

cherullo

Member
May 19, 2019
48
115
106
A few months back, an AMD patent about an op-cache that could spill its contents into the L1I made the rounds in the forums:

Method and apparatus for virtualizing the micro-op cache - https://patents.justia.com/patent/11586441
Rather than dropping the evicted micro-operations, the evicted micro-operations are written to the conventional cache subsystem.
There is a "pre-decode cache" which stores whether a cache line stores instructions or uops. There may be distinct pre-decode caches covering each cache level (L1I, L2, L3) or a single, global cache.
Since instructions are usually denser than uops, the patent lists some possibilities upon eviction of uops from the uop-cache: compressing immediate values in the uop cache entry, using two cache lines (two ways) simultaneously, or simply discarding the decoded uops.
When uops are evicted from the L3, the patent doesn't say whether they should be written to memory. So this is volatile and not Denver-like.
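
To make the eviction path above concrete, here is a minimal toy sketch in Python. It is not taken from the patent: the class names, the 64-byte line, the 8 bytes per uop, and the LRU policy are all assumptions picked for illustration. It only shows the idea of spilling evicted uops into a conventional cache level, tagging the line in a "pre-decode cache", and discarding entries that don't fit in one line.

```python
from collections import OrderedDict
from dataclasses import dataclass, field

LINE_BYTES = 64   # assumed cache-line size
UOP_BYTES = 8     # assumed storage per uop (hypothetical number)


@dataclass
class BackingCache:
    """Stand-in for a conventional cache level such as the L1I (toy model)."""
    lines: dict = field(default_factory=dict)       # addr -> stored payload
    holds_uops: dict = field(default_factory=dict)  # "pre-decode cache": addr -> bool

    def write_uops(self, addr, uops):
        # Spill only if the decoded form fits in one line. Otherwise discard,
        # which is one of the options listed in the summary above.
        if len(uops) * UOP_BYTES > LINE_BYTES:
            return False
        self.lines[addr] = list(uops)
        self.holds_uops[addr] = True
        return True

    def read_uops(self, addr):
        # The pre-decode tag says whether this line holds uops at all.
        if self.holds_uops.get(addr):
            return self.lines[addr]
        return None


class UopCache:
    """Tiny LRU uop cache that spills victims into a backing cache."""

    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.entries = OrderedDict()  # LRU order, oldest entry first
        self.backing = backing

    def insert(self, addr, uops):
        self.entries[addr] = list(uops)
        self.entries.move_to_end(addr)
        if len(self.entries) > self.capacity:
            victim_addr, victim_uops = self.entries.popitem(last=False)
            # Instead of dropping the victim, try to write it back.
            self.backing.write_uops(victim_addr, victim_uops)

    def lookup(self, addr):
        if addr in self.entries:
            self.entries.move_to_end(addr)
            return self.entries[addr], "uop-cache hit"
        spilled = self.backing.read_uops(addr)
        if spilled is not None:
            self.insert(addr, spilled)  # refill without re-decoding
            return spilled, "refilled from backing cache"
        return None, "miss: decode needed"
```

In this sketch a refill skips the decoders entirely, which is the whole appeal. Nothing here models the immediate-compression or two-way options, and nothing is written past the last cache level, matching the "volatile" reading above.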

While I was looking for it, I also found this:
Processor with multiple op cache pipelines - https://patents.justia.com/patent/11907126
Basically, the op-cache would be able to retrieve uops from two different addresses on the same clock. One possibility would be to fetch the instructions up to a branch and the instructions following the target address.
AFAIK current Zens can't do this (even for branches not taken), so branches do reduce the instruction throughput out of the uop-cache. Could this be what was called Zero-Bubble Branch Predictor in the Zen5 slides?
Another interesting possibility that Zen4 can't do is to fetch uops from different threads in the same cycle. This behavior can be steered by external policies (QoS) and how full each downstream uop queue is. Could make SMT even more efficient.
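
As a rough illustration of why reading from two addresses per clock could help, here is a toy cycle counter in Python. Everything in it is an assumption: the function name, the 8-uop width, and the block sizes are made up, and it is not how Zen or the patent actually schedules fetches. It only models "one op-cache read stops at a taken branch" versus "a second pipeline continues at the branch target in the same cycle".

```python
def cycles_to_fetch(block_sizes, width=8, pipelines=1):
    """Count cycles needed to deliver all uops out of the op cache.

    block_sizes: positive uop counts of consecutive runs, each run ending
                 in a taken branch (so the next run sits at another address).
    width:       max uops delivered per cycle in total (assumed value).
    pipelines:   how many different addresses can be read per cycle.
    """
    remaining = list(block_sizes)
    i = 0          # index of the run currently being fetched
    cycles = 0
    while i < len(remaining):
        slots = width
        for _ in range(pipelines):
            if i >= len(remaining) or slots == 0:
                break
            take = min(slots, remaining[i])
            remaining[i] -= take
            slots -= take
            if remaining[i] == 0:
                i += 1   # run finished; another pipeline may start the next run
            else:
                break    # out of slots; continue this run next cycle
        cycles += 1
    return cycles


# Branchy code: four short runs of 3 uops each.
single = cycles_to_fetch([3, 3, 3, 3], width=8, pipelines=1)
dual = cycles_to_fetch([3, 3, 3, 3], width=8, pipelines=2)
```

With these made-up numbers the single-pipeline case wastes most of each cycle's slots after every taken branch, while the dual-pipeline case packs two runs into one cycle. For long straight-line runs the two cases come out the same, so any gain would depend on how branchy the code is.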

So, did we get any more information about it? Maybe LLVM or Linux kernel patches with related info?
Do you guys think that any of this made it into Zen5?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,736
14,765
136
A few months back, an AMD patent about an op-cache that could spill its contents into the L1I made the rounds in the forums:

Method and apparatus for virtualizing the micro-op cache - https://patents.justia.com/patent/11586441

There is a "pre-decode cache" which stores whether a cache line stores instructions or uops. There may be distinct pre-decode caches covering each cache level (L1I, L2, L3) or a single, global cache.
Since instructions are usually denser than uops, the patent lists some possibilities upon eviction of uops from the uop-cache: compressing immediate values in the uop cache entry, using two cache lines (two ways) simultaneously, or simply discarding the decoded uops.
When uops are evicted from the L3, the patent doesn't say whether they should be written to memory. So this is volatile and not Denver-like.

While I was looking for it, I also found this:
Processor with multiple op cache pipelines - https://patents.justia.com/patent/11907126
Basically, the op-cache would be able to retrieve uops from two different addresses on the same clock. One possibility would be to fetch the instructions up to a branch and the instructions following the target address.
AFAIK current Zens can't do this (even for branches not taken), so branches do reduce the instruction throughput out of the uop-cache. Could this be what was called Zero-Bubble Branch Predictor in the Zen5 slides?
Another interesting possibility that Zen4 can't do is to fetch uops from different threads in the same cycle. This behavior can be steered by external policies (QoS) and how full each downstream uop queue is. Could make SMT even more efficient.

So, did we get any more information about it? Maybe LLVM or Linux kernel patches with related info?
Do you guys think that any of this made it into Zen5?
Zen 5 was sampling months ago, no way any of that is in it. Zen 6, maybe even Zen 7 is where it might be.
 

leoneazzurro

Golden Member
Jul 26, 2016
1,003
1,590
136
Since 12th Gen Core laptops are still being sold, I think we can find Zen 4 laptops for the foreseeable future.
True, I just got an MSI Katana 15 with Alder Lake and a 4070 for 1100€. These machines and the Zen 4 ones are still very capable, so with the right pricing (and given that N5 and N4 process costs will come down over time) they will easily occupy the lower-end/mainstream part of the market.
 
Reactions: Tlh97 and biostud

Ajay

Lifer
Jan 8, 2001
16,069
8,098
136
The leaker that said 39% said it was in one specific benchmark. So ????
So that one person, if that's the number, will prove that he actually has an excellent source compared to the rest. The rest will still boast about getting it right because they have held 8 different positions on desktop Zen 5 CPUs and will claim victory, aka that one piece of spaghetti actually did stick to the wall.
 

Timorous

Golden Member
Oct 27, 2008
1,726
3,141
136
Reactions: Tlh97 and Joe NYC

Timorous

Golden Member
Oct 27, 2008
1,726
3,141
136
Why use 2x8c instead of 1x16c, for the 16c client CPU variants of Zen6?

Assuming there will be 16c CCD of Zen6 available anyway, why not use them on client too? Also opens up for 2x16c on client CPUs, and 1x16c + 1x8c.

Clockspeed, yields, node? 8c could be on an older node than 16c and 32c so would be quite a bit cheaper. V-cache compatibility.

Those are just off of the top of my head.
 
Reactions: Tlh97 and Mopetar

Thunder 57

Platinum Member
Aug 19, 2007
2,794
4,075
136
Why use 2x8c instead of 1x16c, for the 16c client CPU variants of Zen6?

Assuming there will be 16c CCD of Zen6 available anyway, why not use them on client too? Also opens up for 2x16c on client CPUs, and 1x16c + 1x8c.

Maybe they can only do 16 cores with dense variants which would hurt performance in desktop models.
 

StefanR5R

Elite Member
Dec 10, 2016
5,679
8,218
136
Apparently it's not just 8c/16c/32c CCDs, but 8c/16c/32c CCXs even.

As for the client CCDs, remember that it has been claimed that Zen 6 desktop will no longer be server derived, but mobile derived.
Regarding client CCXs, the larger and more complex the last-level cache, the higher its latency tends to be.

Also notable is that InstLatX64 claims that EPYC 9006 is on socket SP7.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,294
4,703
96