Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 664 - AnandTech Forums

Hitman928

Some people on Xitter are saying it might be a packaging issue, whatever that means. Obviously, that's not in reference to the box the CPU comes in, although it would be funny if the issue was to fix a typo on the box.

According to AMD, it is not a design or packaging issue, but that they discovered that not all chips that were sent out went through QA, so they are sending out new chips to make sure they were properly tested before being sold/reviewed. See post #16,557.

Edit: If it was an actual issue with the chips, there's no chance they would be able to get them fixed and new ones out the door within a week or two. It would have to be either the QA testing miss as explained, or something wrong with the microcode/firmware that they could fix and push out quickly.
 

poke01

According to AMD, it is not a design or packaging issue, but that they discovered that not all chips that were sent out went through QA, so they are sending out new chips to make sure they were properly tested before being sold/reviewed. See post #16,557.

Edit: If it was an actual issue with the chips, there's no chance they would be able to get them fixed and new ones out the door within a week or two. It would have to be either the QA testing miss as explained, or something wrong with the microcode/firmware that they could fix and push out quickly.
That’s not what hardwareluxx is reporting. It’s also a hardware issue.
 

Saylick

According to AMD, it is not a design or packaging issue, but that they discovered that not all chips that were sent out went through QA, so they are sending out new chips to make sure they were properly tested before being sold/reviewed. See post #16,557.

Edit: If it was an actual issue with the chips, there's no chance they would be able to get them fixed and new ones out the door within a week or two. It would have to be either the QA testing miss as explained, or something wrong with the microcode/firmware that they could fix and push out quickly.
Maybe. Ryan seems to think it's a packaging issue as well:
 

poke01

That’s not what hardwareluxx is reporting. It’s also a hardware issue.
More info, translated:

Quality problems have led to a complete recall of the samples and also of the processors already delivered to retailers. All processors already delivered will therefore be replaced with a fresh production batch. AMD has not said exactly which quality problems occurred, but apparently it is a hardware problem that cannot be fixed by software.
 

SK10H

Do you remember the launch problems of Zen 4 and the burned sockets? Maybe AMD simply prefers a launch without bugs, as in September no one will remember whether it was a late July or early August launch. They will, however, remember if their CPU is not working.
Not sure if this affects later Zen 4, but the three 7900X retail samples I got in Feb 2023, which had IHSes dated Jul-Aug 2022, all failed single-thread CoreCycler AVX2 y-cruncher/P95 at stock clocks with no PBO or Curve Optimizer.
I just live with the last one, with a +10 curve on some cores, but I run a static clock almost all the time. I recently sidegraded to a 7800X3D at no cost for power-efficient 24/7 V/F operation, since the Zen 4 reg voltage is stupidly unoptimized below the 4.8 GHz range, as I tested last year. The X3D I have is obviously a better-quality die at lower clocks, so it passes this AVX2 test just fine at a ~-20 curve.

Looking forward to people testing single-thread AVX2 CoreCycler on Zen 5 at stock clocks with no PBO/curve, and to seeing what the V/F curve looks like, since they sure know how to tweak the X3D die. 😏
 

Abwx

Conductor resistance is a big deal on advanced nodes, and channel mobility increases at lower temperature; it's not just the conductors.

I mean, we have direct tests of power use vs. temperature and decades of practical overclocking experience to tell us that your theory is not correct. I honestly thought this was just established knowledge at this point, at least in overclocking communities.

They said that the cold bug for the 9950X occurs at -130°C, which means that at this temperature the device is just too slow to work, which says that at extremely low temperatures the lowered transconductance has more impact than the lower resistances.

It's just that under LN2 they must make sure that the silicon reaches a minimum temperature to be functional, because even with LN2 it will be way over this temperature once it has booted and is somewhat loaded.
 

Hitman928

Maybe. Ryan seems to think it's a packaging issue as well:

More info, translated:

Quality problems have led to a complete recall of the samples and also of the processors already delivered to retailers. All processors already delivered will therefore be replaced with a fresh production batch. AMD has not said exactly which quality problems occurred, but apparently it is a hardware problem that cannot be fixed by software.

I mean, this is just them speculating based upon the fact that mobile isn't being recalled and it can't be fixed in software. Whereas you have an AMD rep directly stating that it isn't a hardware issue but a testing one. That makes the most sense (if they're not sending out new firmware) because, like I said, if it was something in the chip, there is zero chance they could get replacements out this quickly. It's possible that some bad samples went out because they were damaged during packaging (packaging has defects and yields too) and didn't go through the proper QA testing to catch it before shipping, but that would still be what the AMD rep said, that some chips slipped through QA and so they were sending out new chips they know went through the proper testing.
 

Abwx

That could be as trivial as badly aligned SMD caps on the CPU substrate; the chips would still work reliably, but that's something to be corrected because it wouldn't look professional.
 

Hitman928

They said that the cold bug for the 9950X occurs at -130°C, which means that at this temperature the device is just too slow to work, which says that at extremely low temperatures the lowered transconductance has more impact than the lower resistances.

It's just that under LN2 they must make sure that the silicon reaches a minimum temperature to be functional, because even with LN2 it will be way over this temperature once it has booted and is somewhat loaded.

Cold bugs aren't caused by the chip being too slow; they happen because of either timing violations or analog parts of the CPU failing with the increased Vth at cold temperatures. I don't think the analog part is really a concern with modern CPUs, so it's most likely a hold time violation: the timing paths shift too far at extreme temperatures, and the data misses the edge window of the flip-flop and fails to propagate to the next stage. It's not running too slow; the timings just weren't designed for operation that cold.
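For readers following the setup/hold argument, the textbook static-timing constraints for a path between two flip-flops are sketched below (simplified generic definitions, not anything specific to Zen 5). Here $T$ is the clock period, $t_{cq}$ the launch flop's clock-to-Q delay, $t_{comb}^{max}$ and $t_{comb}^{min}$ the longest and shortest combinational delays, $t_{su}$ and $t_{h}$ the capture flop's setup and hold times, and $t_{skew}$ the capture clock's arrival time minus the launch clock's arrival time:

```latex
\begin{aligned}
\text{setup:}\quad & t_{cq} + t_{comb}^{max} + t_{su} \le T + t_{skew} \\
\text{hold:}\quad  & t_{cq} + t_{comb}^{min} \ge t_{h} + t_{skew}
\end{aligned}
```

$T$ appears only in the setup inequality, which is why lowering the clock can rescue a setup failure but not a hold failure.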
 

Abwx

I don't think the analog part is really a concern with modern CPUs, so it's most likely a hold time violation as the timing paths shift too far with the extreme temperatures and the data misses the edge window of the flip flop and fails to propagate to the next stage. It's not running too slow, the timings just weren't designed for that cold of operation.

But for a timing violation to occur, or for interstage propagation to be too slow, something has to limit the speed at which the transistors switch, since lower resistances are supposed to help...

This means that the parasitic capacitances can't be charged fast enough, that is, that the available current is too low, which brings us back to too-low transistor conductance. Actually, low temperature would be an advantage for higher speed if it weren't for the transistors' worse characteristics under this condition.
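This line of reasoning corresponds to the usual first-order switching-delay approximation, in which a gate's delay is set by how quickly its drive current charges the load capacitance. As a rough sketch, with $C_L$ the load capacitance, $V_{DD}$ the supply voltage and $I_{on}$ the transistor on-current:

```latex
t_d \;\propto\; \frac{C_L \, V_{DD}}{I_{on}}
```

In this model anything that reduces $I_{on}$ (such as a higher $V_{th}$ at very low temperature) lengthens $t_d$, while lower interconnect resistance shortens the wire (RC) portion of the path delay, which this expression does not include.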
 

Hitman928

But for a timing violation or interstage propagation to be too slow, something has to limit the speed at which the transistors switch, since lower resistances are supposed to help...

This means that the parasitic capacitances can't be charged fast enough, that is, that the available current is too low, which brings us back to too-low transistor conductance. Actually, low temperature would be an advantage for higher speed if it weren't for the transistors' worse characteristics under this condition.

Timing violation does not mean too slow, it just means off. It can also be too fast. Flip flops need a narrow window for the signal to be present and held in. If the signal is too early, it will also be a timing violation. A hold time violation cannot be fixed by lowering the frequency (i.e., the signal is propagating too quickly), hence a cold bug will still be there even if you down clock as low as possible. Again, your theory is wrong. You can argue all you want, but real world tests have shown that it is not correct.
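To illustrate the point numerically, here is a small Python sketch using the standard setup/hold slack formulas. All numbers are made up, and `Path`, `setup_slack` and `hold_slack` are hypothetical names chosen for this example. The path is deliberately given both a setup failure at 5 GHz and a hold failure, so the effect of down-clocking on each is visible:

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Hypothetical flop-to-flop path; all delays in picoseconds."""
    clk_to_q: float
    comb_min: float   # shortest combinational delay (governs hold)
    comb_max: float   # longest combinational delay (governs setup)
    t_setup: float
    t_hold: float
    skew: float       # capture-clock arrival minus launch-clock arrival


def setup_slack(p: Path, period_ps: float) -> float:
    # Data must arrive t_setup before the NEXT capture edge, so the period helps.
    return (period_ps + p.skew) - (p.clk_to_q + p.comb_max + p.t_setup)


def hold_slack(p: Path) -> float:
    # New data must not arrive before t_hold after the SAME capture edge.
    # The clock period does not appear anywhere in this check.
    return (p.clk_to_q + p.comb_min) - (p.t_hold + p.skew)


# Made-up numbers chosen so that both checks fail at 5 GHz.
path = Path(clk_to_q=20.0, comb_min=5.0, comb_max=200.0,
            t_setup=15.0, t_hold=10.0, skew=25.0)

for ghz in (5.0, 2.0, 0.4):
    period = 1000.0 / ghz
    print(f"{ghz:>4} GHz: setup slack {setup_slack(path, period):7.1f} ps, "
          f"hold slack {hold_slack(path):6.1f} ps")
```

With these numbers, down-clocking turns the setup slack positive, while the hold slack stays at -10 ps at every frequency because the clock period never enters the hold check.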
 

poke01

All I can add is that after the Intel fiasco, AMD wants to be SURE there is nothing at all wrong with what they send out, even if it causes a slight delay. Two weeks is a slight delay. You can't get pissed about that.
Yep, I'm happy AMD is doing this. I've cooled down a bit, and yeah, better to do it now and have a smooth launch.
 

Abwx

Timing violation does not mean too slow, it just means off. It can also be too fast. Flip flops need a narrow window for the signal to be present and held in. If the signal is too early, it will also be a timing violation.

It doesn't matter if it's too early as long as the clock's rising and falling edges are fast enough; once triggered, the flip-flop will keep its state for at least the duration of a clock cycle.

A hold time violation cannot be fixed by lowering the frequency (i.e., the signal is propagating too quickly), hence a cold bug will still be there even if you down clock as low as possible. Again, your theory is wrong.

Same as above: if the signal propagates swiftly, this allows for better level validation. What is actually a problem is when the clock signal edges are not fast enough, at which point level coherency can no longer be maintained, since the flip-flops can't be switched on/off correctly if the clock signals are not well formed, no matter what the data signal levels and shapes are.

You can argue all you want, but real world tests have shown that it is not correct.
I never use such sentences, I mean such arguments, or rather the lack of them: things like "it's well known that", "it's shown in real-world tests" and so on.
 

Josh128

The Arrow Lake leak got somebody excited enough to tune up a 9950X on Geekbench. Multiple different runs at a 5950 MHz all-core OC today, probably DI or LN2.



 


Hitman928

It doesn't matter if it's too early as long as the clock's rising and falling edges are fast enough; once triggered, the flip-flop will keep its state for at least the duration of a clock cycle.

Same as above: if the signal propagates swiftly, this allows for better level validation. What is actually a problem is when the clock signal edges are not fast enough, at which point level coherency can no longer be maintained, since the flip-flops can't be switched on/off correctly if the clock signals are not well formed, no matter what the data signal levels and shapes are.

I never use such sentences, I mean such arguments, or rather the lack of them: things like "it's well known that", "it's shown in real-world tests" and so on.

You may not like that real world tests prove your theory wrong, but that is the ultimate evidence. You can theorize all you want, but if the real life tests show something very different or even the opposite, then your theory is clearly wrong. The proof is in the pudding.

Hold time violations are also called minimum delay violations because the signal is propagating too fast, so saying it doesn't matter if it is too early is, again, wrong. These types of timing violations are frequency independent. A quick Google search will show this is true. I've never met someone who is so confidently wrong over and over again. If you want to prove me, and every digital designer out there, wrong, show some proof of what you say is true in working designs. Outside of that, best of luck to you; I won't be wasting any more time on this.
 

Abwx

You may not like that real world tests prove your theory wrong, but that is the ultimate evidence. You can theorize all you want, but if the real life tests show something very different or even the opposite, then your theory is clearly wrong. The proof is in the pudding.

Hold time violations are also called minimum delay violations because the signal is propagating too fast,

The pudding's interior says that timing violations occur mainly when the data signal is too late.

It can occur if the signal comes too early, but in this case it's only if the clock is too high and, as a consequence, there's not enough time for the stage to be triggered during the relevant clock cycle so as to hold the desired value.

So, assuming that the frequency is low enough to start with, there will be no timing violation by any other means than the transistors not switching fast enough, that is, too-low transconductance to charge the parasitic capacitances in due time, i.e., the signal being too late as a result.
 

branch_suggestion

Gives AMD enough time to launch a new AGESA for reviews.
N3B being a complete mess has really led to lots of chaos, thankfully Zen 6 development is going nicely.
But for this gen rollout, things are rough, same with GPUs for all players.
 

Hitman928

The pudding's interior says that timing violations occur mainly when the data signal is too late.

It can occur if the signal comes too early, but in this case it's only if the clock is too high and, as a consequence, there's not enough time for the stage to be triggered during the relevant clock cycle so as to hold the desired value.

So, assuming that the frequency is low enough to start with, there will be no timing violation by any other means than the transistors not switching fast enough, that is, too-low transconductance to charge the parasitic capacitances in due time, i.e., the signal being too late as a result.

Prove it, otherwise. . .

Hold violations happen when data arrives too fast relative to the clock. To fix a hold violation, delay should be added to the data path.

*Note:* Hold violations are critical and take priority over setup violations: if they are not fixed before the chip is made, there is nothing that can be done post-fabrication to fix hold problems, unlike setup violations, where the clock speed can be reduced. The designer needs to simply add more delay to the data path.
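The fix described in the note, adding delay to the short data path, can be sketched as follows. This is a hypothetical helper with made-up numbers, not any EDA tool's API: it computes how many fixed-delay buffers a path with negative hold slack needs. In a real flow the added delay must also be checked against setup on the same path.

```python
import math


def hold_slack_ps(clk_to_q: float, comb_min: float, t_hold: float, skew: float) -> float:
    """Hold slack of a flop-to-flop path in ps; independent of clock period."""
    return (clk_to_q + comb_min) - (t_hold + skew)


def buffers_to_fix_hold(clk_to_q: float, comb_min: float, t_hold: float,
                        skew: float, buf_delay: float) -> int:
    """Hypothetical helper: number of delay buffers to add to the short
    path so that its hold slack becomes non-negative."""
    slack = hold_slack_ps(clk_to_q, comb_min, t_hold, skew)
    if slack >= 0:
        return 0  # already meets hold, nothing to insert
    return math.ceil(-slack / buf_delay)


# Made-up path with -10 ps of hold slack; each buffer adds an assumed 4 ps.
n = buffers_to_fix_hold(clk_to_q=20.0, comb_min=5.0, t_hold=10.0, skew=25.0, buf_delay=4.0)
fixed = hold_slack_ps(20.0, 5.0 + n * 4.0, 10.0, 25.0)
print(n, fixed)  # prints "3 2.0"
```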

 