Ryzen: Strictly technical

Borf · Apr 6, 2017

Kromaatikse said:
Of course not. There's no iGPU, so there's no display signal to send.

Of course I'm referring to the APUs.
I had incorrectly assumed that the HDMI on AM4 was limited to 1.4. For an HTPC this is not an option, but I now see some boards support HDMI 2.0.

Kromaatikse · Apr 6, 2017

Given that this is a desktop platform, I would expect only dedicated display output ports to be used.

I honestly don't know how "alternate mode" outputs work in laptops. In all probability, they just multiplex an actual display output onto the same physical wires.

gtbtk · Apr 6, 2017

CatMerc said:
That would mean DF running at double memory clock would it not?
3200MT/s memory, AKA 1600MHz, would be 51.2GB/s right now. I don't see 6400MT/s memory coming anytime soon, so we have to be talking higher data fabric speeds.

Problem is that the frequency is 1/2 of actual memory frequency, in spite of what the OP stated in this post - please see LClk in the AMD produced slide below.

Using 3200MHz RAM would give you a DF bandwidth of 32 bytes * 800 million cycles per second = 25.6GB/s. That is the maximum throughput this design has to the memory, to the PCIe bus, or between the CCX modules.

Those bottlenecks are the reason that gaming performance hits a ceiling with fast GPUs, particularly when you only have 2133 or 2400MHz RAM installed. The IO hub expects to be getting 22.5GB/s, but if you are using 2400MHz RAM there is only 19.5GB/s available between the CPU and the IO hub, and that is assuming that the CPU/GPU is not actually trying to access memory, storage or swap threads at the same time as it is trying to send data to the GPU.

It also indicates that the Aida64 memory benchmarks are including L1, L2 and L3 cache performance in what they claim are memory results, inflating the scores over what can actually be written to the RAM sticks. I can only assume that it is also doing that with Intel chips.

If you increase the design to use 8 memory controllers (4 x the 2 that exist now), you get the 100GB/s, but each module is still going to be connected with 32 bytes per cycle interconnects unless there is a way to overclock the Data Fabric to run at a higher ratio of RAM frequency.

JimmiG · Apr 6, 2017

w3rd said:
Have you thought about trying a few consecutive cold reboots in a row..?

Yeah sometimes it will even cold boot on the first try too, but I don't want my computer to behave like an old lawnmower, sometimes requiring multiple attempts to start.
Maybe the new AGESA code will solve these kinds of problems, though it might just be an inherent problem with Ryzen's memory controller and higher speed RAM, requiring it to "warm up" before it accepts the highest RAM speeds/lowest latencies.

imported_jjj · Apr 6, 2017

gtbtk said:
Problem is that the frequency is 1/2 of actual memory frequency, in spite of what the OP stated in this post - please see LClk in the AMD produced slide below.

Using 3200MHz RAM would give you a DF bandwidth of 32 bytes * 800 million cycles per second = 25.6GB/s. That is the maximum throughput this design has to the memory, to the PCIe bus, or between the CCX modules.

Those bottlenecks are the reason that gaming performance hits a ceiling with fast GPUs, particularly when you only have 2133 or 2400MHz RAM installed. The IO hub expects to be getting 22.5GB/s, but if you are using 2400MHz RAM there is only 19.5GB/s available between the CPU and the IO hub, and that is assuming that the CPU/GPU is not actually trying to access memory, storage or swap threads at the same time as it is trying to send data to the GPU.

It also indicates that the Aida64 memory benchmarks are including L1, L2 and L3 cache performance in what they claim are memory results, inflating the scores over what can actually be written to the RAM sticks. I can only assume that it is also doing that with Intel chips.

If you increase the design to use 8 memory controllers (4 x the 2 that exist now), you get the 100GB/s, but each module is still going to be connected with 32 bytes per cycle interconnects unless there is a way to overclock the Data Fabric to run at a higher ratio of RAM frequency.

Lclk has nothing to do with the data fabric clocks, as the slide shows you (lower right corner).

Borf · Apr 6, 2017

Kromaatikse said:
Given that this is a desktop platform, I would expect only dedicated display output ports to be used.

I honestly don't know how "alternate mode" outputs work in laptops. In all probability, they just multiplex an actual display output onto the same physical wires.

Yeah that would make sense. I thought that AMD might have done it from the SOC, but multiplexing off the chip would allow motherboard makers to add it as an optional feature.

CatMerc · Apr 6, 2017

gtbtk said:
Problem is that the frequency is 1/2 of actual memory frequency, in spite of what the OP stated in this post - please see LClk in the AMD produced slide below.

Using 3200MHz RAM would give you a DF bandwidth of 32 bytes * 800 million cycles per second = 25.6GB/s. That is the maximum throughput this design has to the memory, to the PCIe bus, or between the CCX modules.

Those bottlenecks are the reason that gaming performance hits a ceiling with fast GPUs, particularly when you only have 2133 or 2400MHz RAM installed. The IO hub expects to be getting 22.5GB/s, but if you are using 2400MHz RAM there is only 19.5GB/s available between the CPU and the IO hub, and that is assuming that the CPU/GPU is not actually trying to access memory, storage or swap threads at the same time as it is trying to send data to the GPU.

It also indicates that the Aida64 memory benchmarks are including L1, L2 and L3 cache performance in what they claim are memory results, inflating the scores over what can actually be written to the RAM sticks. I can only assume that it is also doing that with Intel chips.

If you increase the design to use 8 memory controllers (4 x the 2 that exist now), you get the 100GB/s, but each module is still going to be connected with 32 bytes per cycle interconnects unless there is a way to overclock the Data Fabric to run at a higher ratio of RAM frequency.

Lclk is just the IO Hub Controller clock. The slide clearly shows that the data fabric does in fact run at memory clock.

51GB/s for 3200MT/s RAM.

TerionX6 · Apr 7, 2017

Let's review some basics, there's no need to waste pages and posts on definition in a thread meant for technical discussion.

I think when people discuss the data fabric they forget, or forget to mention, that it is simultaneously bi-directional. Going by AMD's SoC bandwidth statements we must assume it is 32B/cycle in both directions. We can deduce this by seeing that the DF supports 32B/cycle, while the memory controller supports 16B/cycle for each memory channel, two per Zeppelin die. By necessity then the Zeppelin data fabric must be able to send and receive 16B/cycle*2 between the controller and CCXs.
Therefore:
DDR4-2400 = memClk@1200Mhz = DF@38.4GB/s*2 = 76.8GB/s total SoC bandwidth
DDR4-3600 = memClk@1800Mhz = DF@57.6GB/s*2 = 115.2GB/s total SoC bandwidth
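
The figures above can be reproduced with a short sketch. It assumes, as the post does, that the data fabric runs at memclk (half the DDR4 transfer rate) and moves 32 bytes per cycle in each direction. The function name and parameters are hypothetical, and the bidirectional width is an assumption that AMD has not confirmed.

```python
def df_bandwidth_gbs(transfer_rate_mts, bytes_per_cycle=32, directions=2):
    """Return (per-direction, total) data fabric bandwidth in GB/s.

    Assumptions taken from the post, not from AMD documentation:
    - the DF clock equals memclk, i.e. half the DDR4 transfer rate
    - the DF moves bytes_per_cycle in each direction at the same time
    """
    memclk_mhz = transfer_rate_mts / 2
    per_direction = memclk_mhz * 1e6 * bytes_per_cycle / 1e9
    return per_direction, per_direction * directions


for rate in (2400, 3200, 3600):
    one_way, total = df_bandwidth_gbs(rate)
    print(f"DDR4-{rate}: {one_way:.1f} GB/s per direction, {total:.1f} GB/s total")
```

Passing `directions=1` gives the one-way figure only, which is the number to compare against dual-channel DRAM bandwidth.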

I believe, with what we've seen of Ryzen, that bandwidth is much less important than how latency can be reduced by increasing memclk. Successfully running 4000MT/s RAM would give the chip a 40% reduction in latency, nearly closing the tested core-to-core latency gaps we've seen between Ryzen and Broadwell-E.
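
One way to read the 40% figure: if the baseline is DDR4-2400 (an assumption, since the post does not name the baseline), the memclk period at DDR4-4000 is 40% shorter. The sketch below only compares clock periods; any part of the latency that is not tied to memclk would not shrink by the same amount. The function name is hypothetical.

```python
def memclk_period_ns(transfer_rate_mts):
    """Length of one memclk cycle in nanoseconds (memclk = transfer rate / 2)."""
    memclk_mhz = transfer_rate_mts / 2
    return 1000.0 / memclk_mhz


baseline = memclk_period_ns(2400)  # assumed baseline, not stated in the post
target = memclk_period_ns(4000)
reduction = 1 - target / baseline
print(f"{baseline:.3f} ns -> {target:.3f} ns per cycle ({reduction:.0%} shorter)")
```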

What also gets either ignored or misrepresented is that the data fabric clock IS the memory controller clock, nothing more and nothing less. There is no "half of RAM speed" for the DF. It simply is running at memclk, as we can see in AMD's clock domain slide.

Ciao,
Terion

KTE · Apr 7, 2017

Timur Born said:
Many discussions tried to make sense of Ryzen's Tctl temperature readings. Here is my interpretation of how CPU Tctl temperature reading and offsets (plural) work on a *stock* Ryzen 1800X and how this affects stability.

- There are three (3) different *dynamic* offsets to Tctl, +0°C (aka base), +10°C and +20°C.

Not so. AMD posted the details a while back.

The 1700X/1800X have a +20°C offset (Tctl) over the real junction temp, so Tctl minus 20°C is the real temp.

You seem to be describing the chips heating per load instead.

Sent from my HTC 10 using Tapatalk

Timur Born · Apr 7, 2017

KTE said:
Not so. AMD posted the details a while back.

The 1700X/1800X have a +20°C offset (Tctl) over the real junction temp, so Tctl minus 20°C is the real temp.

No, my CPU is not running at 10°C idle on an AIO cooler. But I am also playing with Sense Skew now. So this might solve the far too low readings.

You seem to be describing the chips heating per load instead.

No, I reproduced that this should not be heating per load. It's also hard to believe that heating always happens at exactly +10C and +20C intervals with every software I tested. That even happens when the CPU is pre-heated.

There seems to be a 95°C ceiling, but it can be broken by offset jump (like: 88C -> 98C and then it settles down to 95C again).

KTE · Apr 7, 2017

Timur Born said:
No, my CPU is not running at 10°C idle on an AIO cooler. But I am also playing with Sense Skew now. So this might solve the far too low readings.

No, I reproduced that this should not be heating per load. It's also hard to believe that heating always happens at exactly +10C and +20C intervals with every software I tested. That even happens when the CPU is pre-heated.

There seems to be a 95°C ceiling, but it can be broken by offset jump (like: 88C -> 98C and then it settles down to 95C again).

I find what you're relating very strange... Broken sensors if that's true.

What you describe is not an engineering practise.

What do you idle at, and what are your ambients?

Sent from my HTC 10 using Tapatalk

looncraz · Apr 7, 2017

Timur Born said:
No, my CPU is not running at 10°C idle on an AIO cooler. But I am also playing with Sense Skew now.

No, I reproduced that this is not heating per load, unless you believe that heating always happens at exactly +10C and +20C intervals, even when the CPU is already pre-heated by said load.

I observe none of these temperature jumps with my 1700X. I can change that behavior, though, using the temperature compensation value in the BIOS on the C6H. If I use 63, the default, then I get actual temperatures. If I set 62, then I see temperatures 20C higher (the default AMD behavior). If I set 64, it seems to read 10C too cold... and it does demonstrate some strange jumpiness in the values.

This is maintained pretty well:

The "CPU Power" also seems to be fairly accurate, though is the full SoC power... and, if anything, overstates the power being used slightly.

Running Intel Burn Test "Very High" with 16 threads, 10 minutes, 1700X stock:

My "Intake Fans" and pump are driven by the "Radiator Intake" temperature; the "Exhaust Fans" and "Rear Intake" are driven by CPU temperature (I would just drive it by Water Out temperature if I could...).

Timur Born · Apr 7, 2017

I am using stock (Optimized Defaults) settings, so any Sense Skew settings are what Asus set them to. I have now disabled this completely and the idle temps make more sense. I have not checked load yet, as I am too busy with too many things at the same time.

Furthermore I stumbled over another serious temp anomaly that I am not yet able to reproduce (aka switching between two states of how temps are handled, one of which can crash the CPU to Code 8 before any throttling happens).

CatMerc · Apr 7, 2017

TerionX6 said:
Let's review some basics, there's no need to waste pages and posts on definition in a thread meant for technical discussion.

I think when people discuss the data fabric they forget, or forget to mention, that it is simultaneously bi-directional. Going by AMD's SoC bandwidth statements we must assume it is 32B/cycle in both directions. We can deduce this by seeing that the DF supports 32B/cycle, while the memory controller supports 16B/cycle for each memory channel, two per Zeppelin die. By necessity then the Zeppelin data fabric must be able to send and receive 16B/cycle*2 between the controller and CCXs.
Therefore:
DDR4-2400 = memClk@1200Mhz = DF@38.4GB/s*2 = 76.8GB/s total SoC bandwidth
DDR4-3600 = memClk@1800Mhz = DF@57.6GB/s*2 = 115.2GB/s total SoC bandwidth

I believe, with what we've seen of Ryzen, that bandwidth is much less important than how latency can be reduced by increasing memclk. Successfully running 4000MT/s RAM would give the chip a 40% reduction in latency, nearly closing the tested core-to-core latency gaps we've seen between Ryzen and Broadwell-E.

What also gets either ignored or misrepresented is that the data fabric clock IS the memory controller clock, nothing more and nothing less. There is no "half of RAM speed" for the DF. It simply is running at memclk, as we can see in AMD's clock domain slide.

Ciao,
Terion

I actually sent a message along to AMD to confirm if it's 32B/cycle in both directions when I wrote about the Ryzen clock domains.

They haven't responded :/

TerionX6 · Apr 7, 2017

CatMerc said:
I actually sent a message along to AMD to confirm if it's 32B/cycle in both directions when I wrote about the Ryzen clock domains.

They haven't responded :/

I was thinking last week that if the controller or fabric have to wait before sending data a different direction, that might explain the crazy latency we see when a CCX crosstalks to another CCX's L3.

iBoMbY · Apr 7, 2017

CatMerc said:
I actually sent a message along to AMD to confirm if it's 32B/cycle in both directions when I wrote about the Ryzen clock domains.

They haven't responded :/

It's pretty obvious, because if it wasn't 32B in every direction Ryzen would be limited to Single Channel DDR4 speed ...

Timur Born · Apr 7, 2017

Some more information on Tctl offset and throttling (based on CH6 + 1800X):

- Offsets are not applied over 95C Tctl.
- If an offset shoots Tctl over 95C then the offset is dialed back quickly to match Tctl = 95C.
- As real CPU temperature keeps rising the offset decreases accordingly to match 95C.
- When real CPU temperature increases over 95C Tctl increases accordingly without offset.

- "Soft" throttling (down to around x30) is applied when Tctl + offset = 95C. The higher the real CPU temp increases towards 95C the more soft throttling is applied.
- "Hard" throttling (down to x0.5) is applied when real CPU temp hits 95C.

- Emergency temperature shutdown on my CH6 happens when SIO CPU temp hits 110C, Tctl can increase to slightly higher than SIO CPU over 95C. I saw 113C Tctl before shutdown hit.
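
The offset and throttling rules in the list above can be written down as a small model. This is only a sketch of the observations on one CH6 + 1800X system, not AMD's documented behavior. The function and constant names are hypothetical, and the brief overshoot past 95C before the offset is dialed back is not modeled.

```python
TCTL_CEILING_C = 95.0  # observed ceiling for Tctl while an offset is applied


def reported_tctl(real_temp_c, offset_c):
    """Tctl reading for a given real CPU temperature and nominal offset."""
    if real_temp_c >= TCTL_CEILING_C:
        # Above the ceiling no offset is applied; Tctl follows the real temperature.
        return real_temp_c
    # Below the ceiling the offset is reduced as needed so Tctl does not pass 95C.
    return min(real_temp_c + offset_c, TCTL_CEILING_C)


def throttle_state(real_temp_c, offset_c):
    """Return 'none', 'soft' or 'hard' following the observations above."""
    if real_temp_c >= TCTL_CEILING_C:
        return "hard"  # real temperature has reached 95C
    if reported_tctl(real_temp_c, offset_c) >= TCTL_CEILING_C:
        return "soft"  # Tctl plus offset is pinned at 95C
    return "none"


for real, offset in ((60, 20), (80, 20), (88, 10), (96, 20)):
    print(real, offset, reported_tctl(real, offset), throttle_state(real, offset))
```

Emergency shutdown is left out on purpose, because the post ties it to the SIO CPU temperature rather than to Tctl.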

KeMuケミュー · Apr 8, 2017

● Ryzen God mode (sleep revert boost bug)

How to: HPET disabled & sleep revert ⇒ Ryzen boost bug mode (God mode)
Trade-off: Error rate?? (Steady state??)
Performance merit: Skylake (Intel) or over Skylake
BIOS: Before April 2017 (before 04/2017)
Power plan: Performance

looncraz · Apr 8, 2017

I finally have tested the latest AGESA (1.0.0.4) thoroughly... I am very glad I waited for this BIOS before posting my Zen architectural review.

SMT penalties are now nearly completely gone. I had previously seen a few areas with penalties as high as 15%... now there are just a few small single digit penalties.

Frequency scaling is now also as it should be - with only bandwidth-constrained benchmarks falling behind... and quite a few showing slightly better scaling with frequency as memory bandwidth is going up along with it as latency drops.

I've tested the Ryzen 5 1400 at 3Ghz, 3.4Ghz, and 3.8Ghz with and without SMT enabled using DDR4-2400 CL14 and I've tested the 1700X at those same frequencies with various CCX configurations. The latest microcode is a BIG help.

I can finally boot my system with DDR4-3200 speeds, though I have been unable to get it into Windows. I've still found that DDR4-2667 CL14 brings the best overall stability for me... but 2933+ will be quite nice to get working fully.

I've also noticed that L1 and L2 bandwidth results are somehow impacted by memory clocks... it's very confusing that DDR4-2933 CL16 results in > 1100GB/s at 3.9Ghz when just dropping to DDR4-2667 CL14 results in 966GB/s of L1 bandwidth... according to AIDA.

You can browse my various raw results here.

I should have my review online within 48 hours. Almost thought I was going to manage it tonight, but I have to create new charts and some of the conclusions have changed, so I have to ponder the implications before a rewrite.

imported_jjj · Apr 8, 2017

looncraz said:
I've also noticed that L1 and L2 bandwidth results are somehow impacted by memory clocks... it's very confusing that DDR4-2933 CL16 results in > 1100GB/s at 3.9Ghz when just dropping to DDR4-2667 CL14 results in 966GB/s of L1 bandwidth... according to AIDA.

At 2933 you are getting higher memory BW than theoretical so you can't count on those results.
Is HPET disabled?

Timur Born · Apr 8, 2017

According to Elmor temperature shutdown is not handled by SIO CPU temp, but by the CPU based on Tctl. If this is the case then the shutdown temperature seems to be 115C Tctl.

CatMerc · Apr 8, 2017

imported_jjj said:
At 2933 you are getting higher memory BW than theoretical so you can't count on those results.
Is HPET disabled?

Seems like it's just getting 100% of theoretical rather than more than theoretical?

I say "just", but that's already impressive as all hell.

mtcn77 · Apr 8, 2017

I haven't noticed if it had been mentioned - what do you make of the quad single-rank Hardware.fr benchmarks? This is their summary of 4x4GB original benchmarks(blue) in comparison to faster 2x8GB results... pretty fast for 2400CL15 DIMMs:

We then noticed several things going from 4 x 4 GB to 2 x 8 GB:

The Command Rate changes from 2T to 1T (it is not adjustable at the motherboard level)
Reading bandwidth under Aida64 increases by one GB / s
The performance in applications where the memory subsystem is limited decreases by about 10%. This is the case for example under 7-Zip and WinRAR
http://www.hardware.fr/articles/956-4/ryzen-7-1800x-gamme-ddr4.html
http://www.hardware.fr/articles/958-8/choix-complexifie-par-support-complique.html

What am I missing, Captain?

imported_jjj · Apr 8, 2017

CatMerc said:
Seems like it's just getting 100% of theoretical rather than more than theoretical?

I say "just", but that's already impressive as all hell.

Theoretical is just under 47GB/s and his result with the cores at 3.9GHz gets to almost 49GB/s.
With Ryzen he should be getting 44-45GB/s with the DRAM at 2933.
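
The "just under 47GB/s" figure follows from the usual peak formula for dual-channel DDR4: transfers per second times 8 bytes per 64-bit channel times 2 channels. A minimal sketch, with a hypothetical function name, that also shows how the numbers mentioned here compare to that peak:

```python
def ddr4_peak_gbs(transfer_rate_mts, channels=2, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (a 64-bit channel moves 8 bytes per transfer)."""
    return transfer_rate_mts * 1e6 * bus_bytes * channels / 1e9


peak = ddr4_peak_gbs(2933)
for measured in (44.0, 45.0, 49.0):
    print(f"{measured:.1f} GB/s is {measured / peak:.0%} of the {peak:.1f} GB/s peak")
```

Any reading above 100% of the peak points to a measurement problem or to the memory running at a different speed than reported.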

looncraz · Apr 8, 2017

imported_jjj said:
At 2933 you are getting higher memory BW than theoretical so you can't count on those results.
Is HPET disabled?

HPET is enabled, absolutely. I will have to retest, though, as I did not realize it at the time - so thanks for catching that.

I think the memory could have been running at 3200, despite what the screenshot says. Others on the forums have stated seeing similar behavior with the 0083 BIOS. I ran a couple other simple benchmarks and they were entirely consistent (2300 in CPU-z ST, for example), but I pushed overclocking too far with Ryzen Master and had to clear the CMOS.
