Question: WD SN850X performance issue

stanzlavos

Member
May 21, 2016
65
5
71
Hi all,

My current setup:

Proc: AMD Ryzen 9 5950X
Mobo: Asus ROG Crosshair VIII Dark Hero
RAM: 2 x 16GB DDR4 G.Skill 3600 CL16
GPU: Gigabyte RTX 4090 Gaming OC
OS: Windows 11

I just got a 2TB WD SN850X. It is installed in the second M.2 slot on the motherboard (the first slot has a 1TB WD SN850). I am seeing lower than expected speeds in CrystalDiskMark.

CrystalDiskMark results (screenshots not preserved): 1 GB, 4 GB and 16 GB test sizes

All the numbers seem to hit a cap: sequential read/write (SEQ1M) hovers around 6200 MB/s. Is there a bottleneck somewhere?

I am comparing my results with this review:

https://nascompares.com/review/wd-black-sn850x-ssd-review-testing/

There, the 2TB SN850X is tested in the secondary M.2 slot (though on a different motherboard/CPU combo and on Windows 10). Sequential read (SEQ1M) hovers around 6900 MB/s and sequential write (SEQ1M) around 6630 MB/s.

What could be the problem? Help!!!
 

stanzlavos

Well, I just came across this thread:

<< LINK >>

Boards like the Crosshair VIII Hero (X570) have the 2nd M.2 slot connected to the chipset. Although this slot is supposedly PCIe Gen4 x4, it bottlenecks new, fast SSDs like the 980 Pro and SN850. In the primary M.2 slot you may get 7000/5000 MB/s read/write, but in the 2nd M.2 slot you may get 6500/3200, way below the PCIe Gen4 x4 spec.

Asus support confirmed in my support case that using the 2nd M.2 slot can add significant latency because of the chipset, and this causes the lower speed. I was surprised to hear this.

I guess that answers my question?
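For reference, the theoretical ceiling of a PCIe 4.0 x4 link can be worked out from the standard per-lane rate (16 GT/s) and the 128b/130b line encoding used since PCIe 3.0. A minimal Python sketch; it ignores packet and protocol overhead, so real drives top out noticeably below the figure it prints, and the two "observed" rates are simply the SEQ1M numbers from this post converted to GB/s:

```python
# Standard per-lane signalling rates in gigatransfers per second (GT/s).
GT_PER_LANE = {3: 8.0, 4: 16.0, 5: 32.0}


def pcie_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s (decimal), after
    128b/130b line encoding but before packet/protocol overhead."""
    raw_gbit = GT_PER_LANE[gen] * lanes  # one bit per transfer per lane
    payload_gbit = raw_gbit * 128 / 130  # 128b/130b encoding overhead
    return payload_gbit / 8              # bits -> bytes


if __name__ == "__main__":
    ceiling = pcie_bandwidth_gbs(4, 4)
    print(f"PCIe 4.0 x4 ceiling: {ceiling:.2f} GB/s")
    # SEQ1M results quoted in this post, converted to GB/s.
    for label, observed in (("this setup", 6.2), ("review", 6.9)):
        print(f"{label}: {observed} GB/s = {observed / ceiling:.0%} of the link")
```

The link works out to roughly 7.88 GB/s, so even the review's number leaves some headroom; the sketch cannot say how much of the remaining gap is chipset latency versus normal protocol overhead.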
 

Tech Junky

Diamond Member
Jan 27, 2022
3,825
1,342
106
There have been some issues with the X version vs. the non-X when it comes to speed. I recall a thread where someone went through three of them before switching to a different model.

I use the SN850 and SN770, though for different things, and they're good drives overall. The odd thing is that the 770 performs a good 30% better than the 850 in my TB4 enclosure. That's a bit counterintuitive, since the 770 has no DRAM cache.

In reality, though, the only time you use their full potential is when benchmarking them with CDM (CrystalDiskMark) or transferring huge amounts of data. Once something like a game is loaded into the GPU, the drive sits idle, and transfers to the GPU rarely exceed 100 MB/s anyway. Drive-to-drive transfers across the motherboard also tend to hit a cap of about 1.5 GB/s.
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,709
2,970
136
It's not an issue with the SN850X. It's precisely what the thread the OP found describes: only the top/primary M.2 slot is connected directly to CPU PCIe lanes.

All chipset-derived lanes will be limited in absolute throughput, exactly as demonstrated. I have personally tested this on two X570 boards, a B650E and an X670E. I have also helped several people with various other AMD boards that use chipset-derived lanes.

Overall, real-world performance should not take a significant hit, nowhere near as bad as CrystalDiskMark makes it look. However, I understand the disappointment from my own experience with chipset M.2 slots.
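To put the CrystalDiskMark gap in perspective, the arithmetic below compares how long a large transfer takes at ~6200 MB/s (this setup) vs. ~6900 MB/s (the review). It is a sketch that assumes the drive holds its peak sequential rate for the whole transfer, which real copies often don't; the 30 GB size is just an example:

```python
def transfer_seconds(size_gb: float, rate_mb_s: float) -> float:
    """Seconds to move size_gb (decimal GB) at a constant rate_mb_s (MB/s)."""
    return size_gb * 1000 / rate_mb_s


if __name__ == "__main__":
    size_gb = 30  # example size, e.g. a large video file
    chipset_slot = transfer_seconds(size_gb, 6200)  # ~SEQ1M in this thread
    review = transfer_seconds(size_gb, 6900)        # ~SEQ1M in the review
    print(f"{chipset_slot:.2f} s vs {review:.2f} s "
          f"(difference {chipset_slot - review:.2f} s)")
```

Even under that best-case assumption, the difference for 30 GB is about half a second.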
 

Hail The Brain Slug

> stanzlavos said:
> I guess that answers my question?

It does. AM4 only has 20 usable PCIe lanes from the CPU: 16 are used for the primary PCIe slot (or split 8x/8x between the two slots).

That leaves only 4 lanes for a single M.2 drive. The rest are chipset-driven and will have the issue you describe.

AM5 offers 24 CPU lanes, allowing two M.2 slots with direct CPU lanes for maximum throughput.

This isn't really a big deal in practice. As I said, when you run real-world tests or benchmarks that aren't purely synthetic throughput tests, the performance hit is very small. It just looks like a big deal when you see it in CrystalDiskMark.
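The lane arithmetic above can be written as a tiny budget check. The lane counts are the usable CPU lanes quoted in this post (20 on AM4, 24 on AM5, not counting the lanes used for the CPU-to-chipset link); the device list is a hypothetical example, and real boards wire their slots in fixed ways rather than assigning lanes dynamically like this:

```python
from typing import Dict

# Usable CPU PCIe lanes per platform, as stated in the post above
# (excluding the lanes reserved for the CPU-to-chipset link).
CPU_LANES = {"AM4": 20, "AM5": 24}


def cpu_attached(platform: str, devices: Dict[str, int]) -> Dict[str, bool]:
    """Hand out CPU lanes in the given order and report which devices fit.
    Devices that don't fit would have to hang off the chipset instead."""
    remaining = CPU_LANES[platform]
    result = {}
    for name, lanes in devices.items():
        fits = lanes <= remaining
        if fits:
            remaining -= lanes
        result[name] = fits
    return result


if __name__ == "__main__":
    # Hypothetical build: one x16 GPU and two x4 NVMe drives.
    build = {"GPU": 16, "NVMe #1": 4, "NVMe #2": 4}
    for platform in CPU_LANES:
        print(platform, cpu_attached(platform, build))
```

With that example build, the second NVMe drive only gets CPU lanes on AM5.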
 

stanzlavos

> Tech Junky said:
> In reality, though, the only time you use their full potential is when benchmarking them with CDM or transferring huge amounts of data. [...]
I am sure that I wouldn't notice any problems in "real world" performance. Anyhow, it is what it is...!
 

stanzlavos

> Hail The Brain Slug said:
> It does. AM4 only has 20 usable PCIe lanes from the CPU [...]
> This isn't really a big deal in practice. [...] It just looks like a big deal when you see it in CrystalDiskMark.
I was aware of the PCIe lane constraints. However, I did not expect a bottleneck (unless other PCIe devices were running flat out at the same time). Anyhow, I am sure I wouldn't notice any problems in "real-world" performance.

During the benchmarks, the SN850X runs much cooler than the SN850. I am assuming this is because the drive never gets pushed to its peak performance because of the bottleneck. I tried copying a 30GB file from the SN850 to the SN850X and vice versa. The transfer speed (hovering around 3 GB/s, IIRC) hits a wall after a few seconds, and on checking, this coincides with the SN850 going past the 70-72C mark. I am assuming this is also expected?

Then again, copying such large files is not something I need to do very often. Even so, seeing transfer speeds of 100-200 MB/s between two Gen4 NVMe SSDs hurts! 😜
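To see exactly when a copy "hits a wall", one option is to copy in fixed-size chunks and record the throughput of each interval. A minimal Python sketch; the chunk size, interval and paths are assumptions, and because the Windows file cache sits in between, the numbers reflect what the copy loop sees rather than raw NAND speed:

```python
import sys
import time
from typing import List, Tuple

CHUNK = 64 * 1024 * 1024  # 64 MiB per read/write; an arbitrary choice


def copy_with_samples(src: str, dst: str,
                      interval_s: float = 1.0) -> List[Tuple[float, float]]:
    """Copy src to dst chunk by chunk and return (elapsed_s, MB/s)
    samples, one per interval of at least interval_s seconds."""
    samples = []
    start = last = time.perf_counter()
    pending = 0  # bytes written since the last sample
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(CHUNK)
            if not buf:
                break
            fout.write(buf)
            pending += len(buf)
            now = time.perf_counter()
            if now - last >= interval_s:
                samples.append((now - start, pending / (now - last) / 1e6))
                last, pending = now, 0
    if pending:  # record the final, partial interval
        now = time.perf_counter()
        samples.append((now - start, pending / max(now - last, 1e-9) / 1e6))
    return samples


if __name__ == "__main__" and len(sys.argv) == 3:
    for elapsed, rate in copy_with_samples(sys.argv[1], sys.argv[2]):
        print(f"{elapsed:7.1f} s  {rate:8.0f} MB/s")
```

Run it as `python copy_probe.py <source> <destination>` (the script name is hypothetical); a sudden drop between consecutive samples shows where the wall is, which can then be lined up with the temperature log.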
 

Tech Junky

> stanzlavos said:
> ...hits a wall
It's the cache being depleted. Once the cache is depleted, they typically level out at around 600 MB/s. If you want better performance, you need to multi-thread the operation. With the 770, a single-threaded copy drops to the 600 MB/s range, but dropping folders separately yields higher combined throughput. Multi-threaded performance reaches 2.8/3.1 GB/s in my TB4 enclosure. Sustained single-threaded performance just isn't good on most drives. Only when temperatures reach the 90C+ range will you see actual throttling.
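The "drop folders separately" approach boils down to keeping several file copies in flight at once. A minimal Python sketch using `concurrent.futures`; the worker count is an assumption, and whether it helps depends on the drives, cache state and file sizes (a single large file gains nothing here, since each file is still copied by one thread):

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def parallel_copytree(src: str, dst: str, workers: int = 8) -> int:
    """Copy every file under src to the same relative path under dst,
    several files at a time. Returns the number of files copied."""
    jobs = []
    for root, _dirs, files in os.walk(src):
        target_dir = os.path.join(dst, os.path.relpath(root, src))
        os.makedirs(target_dir, exist_ok=True)  # create folders up front
        for name in files:
            jobs.append((os.path.join(root, name), os.path.join(target_dir, name)))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every copy and re-raises the first error, if any.
        list(pool.map(lambda job: shutil.copy2(job[0], job[1]), jobs))
    return len(jobs)
```

For example, `parallel_copytree(r"D:\Videos", r"E:\Videos", workers=16)` (hypothetical paths).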
 

stanzlavos

> Tech Junky said:
> It's the cache being depleted. Once the cache is depleted, they typically level out at around 600 MB/s. [...]
IIRC, the first time I copied the 30GB file (SN850 to SN850X), it went through fine at ~3 GB/s. On subsequent runs, I started seeing the throttling, and the speed would drop to ~100-200 MB/s (I assumed because the temps went up on the SN850). Anyhow, let me retest...
 

Tech Junky

I mean, it's possible, but repeated testing / purging of the cache between tests will impact throughput more than temps in the 70s. These things are more robust than that and will typically go full tilt up to 95C. My idle temps right now are 45C on the 850 and 37C on the 770.

The other issue is going to be the link between the CPU and the chipset (PCH, or whatever AMD wants to call it).

If you don't have it already, download the Dashboard app WD provides for monitoring the drives; its performance section will graph the throughput while testing. Alternatively, opening HWiNFO's sensors window before testing will log the watermarks (peak values) for temperature and throughput.
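If you also log HWiNFO's sensors to a CSV file, the peak values can be pulled out afterwards. A minimal Python sketch; it assumes a CSV with a single header row whose column names contain the sensor labels, and the column names in the example are hypothetical, so match the keyword to whatever your log actually contains:

```python
import csv
from typing import Dict


def column_peaks(path: str, keyword: str) -> Dict[str, float]:
    """Return the maximum numeric value of every CSV column whose header
    contains keyword (case-insensitive). Non-numeric cells are skipped."""
    peaks: Dict[str, float] = {}
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        for row in csv.DictReader(f):
            for col, cell in row.items():
                # Overflow fields land under the key None; ignore them.
                if col is None or keyword.lower() not in col.lower():
                    continue
                try:
                    value = float(cell)
                except (TypeError, ValueError):
                    continue  # blank, text, or missing cell
                peaks[col] = max(value, peaks.get(col, value))
    return peaks
```

For example, `column_peaks("hwinfo_log.csv", "temperature")` (hypothetical file name) would return the highest reading of every temperature column in the log.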

 

stanzlavos

> Tech Junky said:
> [...] opening HWiNFO's sensors window before testing will log the watermarks (peak values) for temperature and throughput.

This is exactly what I do. Let me do one more round of testing.
 

Hail The Brain Slug

> stanzlavos said:
> I was aware of the PCIe lane constraints. However, I did not expect a bottleneck (unless other PCIe devices were running flat out at the same time). [...]
> Even so, seeing transfer speeds of 100-200 MB/s between two Gen4 NVMe SSDs hurts! 😜
Something about the way the AMD chipset is set up results in this. I also thought it should get close to maximum throughput if no other I/O is active during testing, but there's some kind of overhead somewhere in the chain.

Testing with something like 3DMark's SSD benchmark, which runs traces of game installation, game loading, file copying, video recording, etc., shows a minimal performance hit in my experience.

The wall could be Windows file transfer acting up. If you can, try robocopy with multiple threads and see what kind of throughput you get; that would be a more reliable indicator.
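A minimal sketch of scripting that from Python on Windows, assuming robocopy's standard `/E` (include subfolders) and `/MT:n` (1 to 128 threads, default 8) switches; the thread count defaults to the logical CPU count, and the paths are hypothetical. Robocopy's exit codes 0 to 7 mean success, so a nonzero code is not by itself an error:

```python
import os
import subprocess
from typing import List, Optional


def robocopy_args(src: str, dst: str, threads: Optional[int] = None) -> List[str]:
    """Build a robocopy command that copies src (including subfolders)
    to dst with /MT. threads defaults to the logical CPU count."""
    n = threads or os.cpu_count() or 8  # 8 is robocopy's own /MT default
    n = max(1, min(128, n))             # /MT accepts 1..128
    return ["robocopy", src, dst, "/E", f"/MT:{n}"]


def run_robocopy(src: str, dst: str, threads: Optional[int] = None) -> int:
    """Run robocopy (Windows only) and return its exit code.
    Codes 0-7 mean success; 8 or higher means at least one failure."""
    result = subprocess.run(robocopy_args(src, dst, threads))
    if result.returncode >= 8:
        raise RuntimeError(f"robocopy reported failures (exit code {result.returncode})")
    return result.returncode
```

For example, `run_robocopy(r"D:\Test", r"E:\Test", threads=16)` (hypothetical paths).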
 

stanzlavos

Well, I am not able to reproduce the issue anymore. Copying across the SSDs seems to be working fine; it drops occasionally but ramps right back up.

BTW, I tried TeraCopy for the first time in a loooong time! But copying with it is extremely slow: ~96 MB/s (v2.3) and ~600 MB/s (v3.9.7), while Windows copy does 3+ GB/s!

Why is that? Is TeraCopy doing some "verification"?
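For context on the verification question: a copy tool that verifies has to read the destination back and compare it with the source (usually via a hash), which adds at least one more full pass over the data. Whether TeraCopy does this by default is not established in this thread; the sketch below only illustrates what copy-plus-verify costs, using SHA-256 as an arbitrary choice of hash:

```python
import hashlib
import shutil


def file_digest(path: str, chunk: int = 8 * 1024 * 1024) -> str:
    """SHA-256 of a file, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def copy_and_verify(src: str, dst: str) -> bool:
    """Copy src to dst, then read both back and compare hashes.
    I/O cost: the source is read twice and the destination is read
    once after being written, i.e. up to 3x the reads of a plain copy."""
    shutil.copyfile(src, dst)
    return file_digest(src) == file_digest(dst)
```

Hashing the source while copying it would save one of the two source reads; the sketch keeps the steps separate for clarity.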
 

stanzlavos

> Hail The Brain Slug said:
> The wall could be Windows file transfer acting up. If you can, try robocopy with multiple threads and see what kind of throughput you get; that would be a more reliable indicator.
Let me check out Robocopy.
 
Jul 27, 2020
23,635
16,596
146
> stanzlavos said:
> Let me check out Robocopy.

Try using as many threads as available on your system.
 

stanzlavos


> Try using as many threads as are available on your system.

> stanzlavos said:
> BTW, I tried TeraCopy for the first time in a loooong time! But copying with it is extremely slow [...]
> Why is that? Is TeraCopy doing some "verification"?

Any idea why? I somehow remember TeraCopy being faster!
 

stanzlavos

> Hail The Brain Slug said:
> The wall could be Windows file transfer acting up. If you can, try robocopy with multiple threads and see what kind of throughput you get; that would be a more reliable indicator.

I tried out Robocopy and did two tests:

1) A single video file (~30 GB), which I was already using for testing. Since it's a single file, I am assuming the "/MT:n" option will not have any impact? (I did not notice any.) All default options.
  • Anyhow, I am getting higher speeds than what Windows copy shows (5+ GB/s on average, I suppose), and once or twice it even hit the cap (~6.2 GB/s) seen in the CrystalDiskMark results in the first post.
  • SN850 (primary M.2 slot) --> SN850X was always faster (IIRC, by at least 1 GB/s) than the other way around.
2) Folder copy (7 subdirectories; ~1000 files of ~30MB each, ~28GB combined). Default options except for "/MT" and "/MIR" (also tried "/E" instead of "/MIR").
  • The "speed" shown at the end of the copy seems to be off! "/MT:1" consistently showed higher values than "/MT:16", yet the higher thread count run took less time to complete.
  • Looking at the copy output and times, the average speeds were ~1 GB/s or lower.
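Since the speed line at the end of a /MT run looked unreliable, the wall-clock average is easier to trust: total bytes copied divided by elapsed time. A minimal sketch; the elapsed time you plug in (e.g. measured around the robocopy call) is up to you:

```python
import os


def tree_size_bytes(path: str) -> int:
    """Total size in bytes of all files under path."""
    return sum(
        os.path.getsize(os.path.join(root, name))
        for root, _dirs, files in os.walk(path)
        for name in files
    )


def average_mb_s(total_bytes: int, elapsed_s: float) -> float:
    """Wall-clock average throughput in MB/s (decimal)."""
    return total_bytes / elapsed_s / 1e6
```

For example, `average_mb_s(tree_size_bytes(r"E:\Test"), 30.0)` for a copy that took 30 s (hypothetical path and time); 28 GB in 30 s works out to roughly 933 MB/s.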
 

00Logic

Junior Member
Oct 29, 2016
17
8
81
I am using modded and signed Samsung drivers on my Corsair MP600, as they perform better and will make up the losses you are seeing through the chipset, and then some!
NB: only the .inf file is modded, NOT the actual driver .sys file...

A search for "win-raid.com mod signed drivers" will bring you to a wide choice of such drivers, AND the procedure to install the certificate for them, so READING is required! (That thins the herd to 1 in 20, doesn't it!)

As an aside:

I am getting an extra 100 MB/s of R4K (random 4K) performance on a 58GB Optane on an AMD system.
NB: R4K is the I/O an OS uses over 60% of the time, vs. less than 1% for large sequential transfers!!!
Most of that extra 100 MB/s is thanks to the above drivers.
The rest is due to a properly aligned (8-sector) partition and a properly aligned exFAT file system (4K cluster size).
Thanks to far less overhead than NTFS, and Windows 'considering' exFAT un-trimmable, it's ideal for Optane.
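The alignment check is plain arithmetic: with 512-byte sectors, 8 sectors is 4096 bytes, so a partition is 4K-aligned when its starting byte offset is a multiple of 4096. A small sketch; the two example offsets are the classic 63-sector start used by older Windows versions and the 1 MiB (2048-sector) start common today, and reading the real offset from your disk is left to a partition tool:

```python
SECTOR_BYTES = 512
ALIGNMENT = 8 * SECTOR_BYTES  # 4096 bytes, i.e. 8 x 512-byte sectors


def is_aligned(start_offset_bytes: int, alignment: int = ALIGNMENT) -> bool:
    """True if a partition starting at this byte offset is 4K-aligned."""
    return start_offset_bytes % alignment == 0


if __name__ == "__main__":
    for start_sector in (63, 2048):  # legacy start vs. 1 MiB start
        offset = start_sector * SECTOR_BYTES
        print(f"sector {start_sector}: offset {offset} bytes, aligned={is_aligned(offset)}")
```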
 