Info DirectStorage 1.1 benchmark

Makaveli · Jan 16, 2023

You can download the benchmark here.

https://drive.google.com/file/d/194zc0kSJxlZpCeT9Jwr2eD9bi4TizcTG/view

Win 10 22H2
Ryzen 5800X
6800XT PCIe 4.0
Adrenalin 22.11.2
Corsair MP600 1TB

Aapje · May 4, 2023

psolord said:
Seriously, we made fun of Forspoken and it's the only game that used this tech properly if not at all. I did recent runs with the demo and it runs like a champ!

I think that the issue is that they pretty much need to start using it from the beginning, so with development taking multiple years and Direct Storage only being ready in 2022, it's mostly limited to demos for now.

Forspoken was made on the Luminous Engine, which is used for almost nothing else, but whose developers seem to have been working with MS to implement this before it was actually released. Meanwhile, Unreal Engine doesn't even have support yet, so it could be a good long while until games on that engine have Direct Storage.

soresu · May 5, 2023

Aapje said:
Forspoken was made on the Luminous Engine, which is used for almost nothing else, but whose developers seem to have been working with MS to implement this before it was actually released. Meanwhile, Unreal Engine doesn't even have support yet, so it could be a good long while until games on that engine have Direct Storage.

UE5 virtualises most assets like textures and geometry so that they are only streamed as needed. Consequently it doesn't need Direct Storage as much, and Epic's main focus currently seems to be on improving Lumen and Nanite, which are still pretty rough around the edges.

Aapje · May 5, 2023

soresu said:
UE5 virtualises most assets like textures and geometry so that they are only streamed as needed. Consequently it doesn't need Direct Storage as much, and Epic's main focus currently seems to be on improving Lumen and Nanite, which are still pretty rough around the edges.

Streaming on demand is actually exactly what Direct Storage is intended to greatly improve. The classic storage APIs have a lot of overhead, but this wasn't a big deal when we had HDDs, because those have a lot of overhead for every request as well. After all, they need to move the head to the right location on the disk. This is a physical operation that takes a relatively long time. So HDDs aren't suitable for reading lots of small things from random locations, which is why games have traditionally read an entire collection of assets and then kept it in memory.

This is not particularly efficient since you often read and store much more than is needed at that time, but it is more efficient than reading all assets separately and having huge seek costs for each asset.

In contrast, NVMe drives have seek times that are way faster, so the entire calculation changes: it now makes sense to just get what you need, which also means that you use the bus and VRAM more efficiently.
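The trade-off described above can be put into a rough back-of-the-envelope model. This is only a sketch: the latency and bandwidth figures, the pack size, and the asset counts are illustrative assumptions rather than measurements, and the model treats requests as strictly serial.

```python
# Rough cost model: total time = requests * per-request latency + bytes / bandwidth.
# Every device and workload figure below is an illustrative assumption.

MIB = 1024 * 1024

def read_time_s(requests: int, total_bytes: int, latency_s: float, bandwidth_bps: float) -> float:
    """Time to service `requests` serial reads that together move `total_bytes`."""
    return requests * latency_s + total_bytes / bandwidth_bps

# Hypothetical devices: (per-request latency in seconds, bandwidth in bytes/s)
HDD = (10e-3, 200 * MIB)      # assumed ~10 ms seek, ~200 MiB/s
NVME = (0.1e-3, 5000 * MIB)   # assumed ~0.1 ms access, ~5000 MiB/s

# Hypothetical workload: a pack of 4096 assets of 256 KiB each, of which 512 are needed now.
ASSET_BYTES = 256 * 1024
PACK_ASSETS = 4096
NEEDED_ASSETS = 512

def compare(device):
    """Return (bulk_read_seconds, on_demand_seconds) for one device."""
    latency, bandwidth = device
    bulk = read_time_s(1, PACK_ASSETS * ASSET_BYTES, latency, bandwidth)
    on_demand = read_time_s(NEEDED_ASSETS, NEEDED_ASSETS * ASSET_BYTES, latency, bandwidth)
    return bulk, on_demand

hdd_bulk, hdd_on_demand = compare(HDD)
nvme_bulk, nvme_on_demand = compare(NVME)
print(f"HDD : bulk {hdd_bulk:.2f} s, on demand {hdd_on_demand:.2f} s")
print(f"NVMe: bulk {nvme_bulk:.3f} s, on demand {nvme_on_demand:.3f} s")
```

With these assumed numbers the bulk read wins on the HDD, because seek time dominates the 512 separate reads, while on the NVMe drive reading only the needed assets wins.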

However, the classic storage APIs were designed for HDDs where you only have a few read operations at a time and where each important read operation is a bulk operation with large seek times, so it doesn't matter that much if the storage API is a bit slow or if you store a relatively large amount of data in memory to keep track of the request.

In contrast, with NVMe drives, you ideally want to be able to just fire off a ton of separate IO requests with low overhead. This is why Direct Storage was developed: it can very efficiently handle a huge number of requests without bogging down.
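To make the access pattern concrete, here is a minimal sketch of firing off many small reads at random offsets in parallel. It is not DirectStorage (that is a native Windows API); it uses ordinary Python file calls and a thread pool purely to illustrate the pattern. The chunk size, file size, and worker count are arbitrary assumptions, and reopening the file for every request is exactly the kind of per-request overhead a dedicated API tries to avoid.

```python
import os
import random
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024    # size of each small read (assumed)
FILE_CHUNKS = 256    # test file holds 256 chunks = 16 MiB

def make_test_file() -> str:
    """Write a temp file in which chunk i is filled with the byte value i."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(FILE_CHUNKS):
            f.write(bytes([i]) * CHUNK)
    return path

def read_chunk(path: str, index: int) -> bytes:
    """One independent small request: open, seek to the chunk, read it."""
    with open(path, "rb") as f:
        f.seek(index * CHUNK)
        return f.read(CHUNK)

def read_many(path: str, indices, workers: int = 16):
    """Issue all requests through a thread pool; results keep the request order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: read_chunk(path, i), indices))

path = make_test_file()
try:
    indices = random.sample(range(FILE_CHUNKS), 64)  # 64 random "assets"
    chunks = read_many(path, indices)
finally:
    os.remove(path)

print(len(chunks), "chunks read")
```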

It is possible for UE5 to stream assets from RAM to the GPU without having Direct Storage, but I don't see how they can stream from the NVMe to the RAM. They will still have to do bulk reads and thus have long load times.

Note that the end goal, which we'll probably only achieve step by step, is for assets to go directly from the NVMe to the GPU. To do that, we need GPU decompression (and compression formats optimized for GPUs) so the GPU can unpack the textures without needing the CPU to do it, and we need GPUs to have their own Direct Storage implementation, so they can retrieve individual textures and such with very low overhead.

This removes a lot of very inefficient data movement. Right now a large collection of textures goes from the NVMe to the CPU, which then sends it to the RAM. When the GPU needs a texture, it asks the CPU for it, which then retrieves it from RAM, decompresses it and sends it to the VRAM. Because the texture is uncompressed at that point, it takes a long time to send. In the future, the compressed texture will go directly from the NVMe to VRAM, which will mean less load on the CPU and much faster load times.
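A small sketch of the two data paths described above, counting how much data crosses each hop. The 1 GiB texture set and the 2:1 compression ratio are assumptions chosen only to make the arithmetic easy, and the hop names are just labels for the steps in the post.

```python
def bus_traffic_mib(uncompressed_mib: float, ratio: float, gpu_decompression: bool) -> dict:
    """Return the MiB moved over each hop for one load of a texture set.

    ratio is the assumed compression ratio (uncompressed size / compressed size).
    """
    compressed = uncompressed_mib / ratio
    if gpu_decompression:
        # End goal from the post: compressed data goes straight to VRAM.
        return {"nvme_to_vram": compressed}
    # Path described for today: compressed into RAM, CPU decompresses,
    # then the uncompressed data is sent to VRAM.
    return {"nvme_to_ram": compressed, "ram_to_vram": uncompressed_mib}

# Hypothetical workload: 1024 MiB of textures once unpacked, compressed 2:1.
cpu_path = bus_traffic_mib(1024, 2.0, gpu_decompression=False)
gpu_path = bus_traffic_mib(1024, 2.0, gpu_decompression=True)

print("CPU decompression:", cpu_path, "total", sum(cpu_path.values()), "MiB")
print("GPU decompression:", gpu_path, "total", sum(gpu_path.values()), "MiB")
```

Under these assumptions the CPU path moves 1536 MiB in total against 512 MiB for the direct path, and the decompression work moves off the CPU as well.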

soresu · May 5, 2023

Aapje said:
This removes a lot of very inefficient data movement. Right now a large collection of textures goes from the NVMe to the CPU, which then sends it to the RAM. When the GPU needs a texture, it asks the CPU for it, which then retrieves it from RAM, decompresses it and sends it to the VRAM. Because the texture is uncompressed at that point, it takes a long time to send. In the future, the compressed texture will go directly from the NVMe to VRAM, which will mean less load on the CPU and much faster load times.

On that subject, this recent move from NVIDIA should alleviate some of the load:

Random Access Neural Texture Compression

NVIDIA Introduces Revolutionary Neural Texture Compression for Material Textures

NVIDIA has unveiled a cutting-edge compression algorithm for material textures, dubbed Neural Texture Compression (NTC). This innovative algorithm tackles the escalating memory requirements of high-re...

www.guru3d.com

Random-Access Neural Compression of Material Textures | Research

The continuous advancement of photorealism in rendering is accompanied by a growth in texture data and, consequently, increasing storage and memory demands. To address this issue, we propose a novel neural compression technique specifically designed for material textures. We unlock two more...

research.nvidia.com

NikosD · Jun 17, 2023

utahraptor said:
I am really struggling to find a free site to upload something to share, but I just rebuilt it from the current code which seems even faster so please try this link: https://uploadnow.io/f/YwWxh03

Please let me know if that link works or please suggest another site to try.

The link has expired.

Also the source has been updated to DirectStorage 1.2.1

Maybe it's a good time to recompile it and upload it to a more stable repository like OneDrive with no expiration date.

igor_kavinski · Jun 17, 2023

Don't understand why Microsoft is being so difficult regarding this benchmark: https://github.com/microsoft/DirectStorage/blob/main/Samples/GpuDecompressionBenchmark/README.md

Why not provide a binary and include a standard data file so everyone can benchmark their GPU easily?

utahraptor · Jun 20, 2023

NikosD said:
The link has expired.

Also the source has been updated to DirectStorage 1.2.1

Maybe it's a good time to recompile it and upload it to a more stable repository like One Drive with no expiration date.

I'll try to reinstall Visual Studio and fight it again after I mow the lawn 🤣

utahraptor · Jun 20, 2023

Ok, here is the new version compiled: BulkLoadDemo 1.2.1

AdamK47 · Jun 20, 2023

utahraptor said:
Ok, here is the new version compiled: BulkLoadDemo 1.2.1

Missing some dependencies.

utahraptor · Jun 20, 2023

AdamK47 said:
Missing some dependencies.

ChatGPT tells me you are getting that error because I compiled it in Debug mode rather than Release mode. I have recompiled it in Release Mode:

BulkLoadDemo 1.2.1

Makaveli · Jun 20, 2023

utahraptor said:
ChatGPT tells me you are getting that error because I compiled it in Debug mode rather than Release mode. I have recompiled it in Release Mode:

BulkLoadDemo 1.2.1

AdamK47 · Jun 20, 2023

utahraptor said:
ChatGPT tells me you are getting that error because I compiled it in Debug mode rather than Release mode. I have recompiled it in Release Mode:

BulkLoadDemo 1.2.1

It works now.

About the same for me as the original 1.1 benchmark. Not much room for improvement for me it seems.

Inland 2TB TD510 PCI-E 5.0 NVMe:

Three 8TB Sabrent Rocket Q PCI-E 3.0 NVMe in 24TB RAID-0:

NikosD · Jun 20, 2023

utahraptor said:
Ok, here is the new version compiled: BulkLoadDemo 1.2.1

It works like a charm.
Thank you.

psolord · Jul 13, 2023

Semi off/on topic, but it seems Ratchet and Clank will be the first DirectStorage 1.2 game for the PC.

Ratchet & Clank: Rift Apart PC Specs Are Out - First DirectStorage 1.2 Game with GPU Decompression, No SSD Req

Ratchet & Clank: Rift Apart PC specs are out. This is the first DirectStorage 1.2 game with GPU decompression, but it doesn't require SSD.

wccftech.com

No SSD required they say, lol.

I swear to god, some console... fans... were saying that something like R&C would need 8GB/sec storage and a 3950X to play properly, rofl.

And there's a related story about an NVIDIA driver that improves DirectStorage performance, something something...

Aapje · Jul 13, 2023

psolord said:
No SSD required they say, lol.

Yes, at 720p @ 30FPS. And that is average, so the FPS will dip below that...

CakeMonster · Jul 14, 2023

19GB/s on my 7950X + 4090 + SN850X 4TB.
1.8GB/s on my 16TB ST Exos HDD on the same computer.

Does the CPU usage number have any practical use?

Edit: Tried closing and running it again and now it ends up at 13GB/s on every run. Something weird is going on. I might update this when I reboot, in case something is interfering.

igor_kavinski · Jul 14, 2023

CakeMonster said:
Edit: Tried closing and running it again and now it ends up at 13GB/s on every run. Something weird is going on.

SN850X throttling?

CakeMonster · Jul 14, 2023

Rebooted. Now it gives me consistently 25GB/s. Left it on for 20 minutes, and no degradation. I'm guessing there was something keeping the GPU or SSD busy even though it wasn't showing up in task manager. Possibly some of the AI applications I play with that take up VRAM.
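For what it's worth, figures like 19GB/s or 25GB/s are well above what a PCIe 4.0 drive can deliver raw. A rough sanity check, assuming the benchmark reports decompressed bytes per second and assuming the drive runs at the 7,300 MB/s rated sequential read quoted for the SN850X elsewhere in this thread (both are assumptions, not confirmed by the posts), gives the compression ratio the numbers would imply:

```python
def implied_ratio(reported_gb_s: float, rated_mb_s: float) -> float:
    """Decompressed throughput divided by raw drive throughput.

    Assumes the reported figure counts decompressed bytes and that the
    drive sustains its rated sequential read speed for the whole run.
    """
    return reported_gb_s * 1000 / rated_mb_s

RATED_SN850X_MB_S = 7300  # rated sequential read, as quoted in the thread

for reported in (19, 25):
    ratio = implied_ratio(reported, RATED_SN850X_MB_S)
    print(f"{reported} GB/s reported -> implied ratio about {ratio:.1f}:1")
```

If either assumption is off (for example if part of the data is served from a cache), the implied ratio will be off too, so treat it as a plausibility check only.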

BFG10K · Jul 31, 2023

Ratchet & Clank load times, the first game to use Direct Storage 1.2:

Virtually no difference between NVMe and SATA, and even the HDD is okay once the initial long load is done. This is much more realistic than a meaningless synthetic benchmark.

If I was interested in getting the game, it'd be fun to test it on my 10,000 RPM VelociRaptor

Makaveli · Jul 31, 2023

BFG10K said:
Ratchet & Clank load times, the first game to use Direct Storage 1.2:

Virtually no difference between NVMe and SATA, and even the HDD is okay once the initial long load is done. This is much more realistic than a meaningless synthetic benchmark.

If I was interested in getting the game, it'd be fun to test it on my 10,000 RPM VelociRaptor

It's only playable on an HDD once you have played it once and cached everything. There will still be intermittent pauses.

Even a regular SATA SSD isn't immune to some pausing.

BFG10K · Aug 1, 2023

Makaveli said:
It's only playable on an HDD once you have played it once and cached everything. There will still be intermittent pauses.

I never said playing on an HDD was the best choice, just that it was "okay". It's completely possible to finish the game, as shown by numerous videos.

Also DirectStorage provides a performance gain on HDDs in at least one game.

Makaveli said:
Even a regular SATA SSD isn't immune to some pausing.

NVMe pauses as well. If it didn't there'd be no load screens.

Even when it's cached in RAM there are still load screens, which suggests NVMe speed makes no difference, because RAM is far faster than any of them.

Tup3x · Aug 1, 2023

BFG10K said:
NVMe pauses as well. If it didn't there'd be no load screens.

The fact that there's no difference between gen 4 and gen 3 NVMe drives makes me wonder if it's a deliberate delay.

I wouldn't say that HDD is even okay. It has to be really fast or stutters are severe and you might end up falling through the floor etc. SATA SSD would offer okay experience.

Makaveli · Aug 3, 2023

I just added a Western Digital 2TB SN850X to my system and reran this.

Adrenalin 23.7.2 drivers

Corsair 1TB MP600: Sequential Read 4,950 MBps Sequential Write 4,250 MBps

Western Digital 2TB SN850X: Sequential Read 7,300 MBps Sequential Write 6,600 MBps

Aapje · Aug 3, 2023

Makaveli said:
I just added a Western Digital 2TB SN850X to my system and reran this.

Someone got the same Prime deal I got...

Makaveli · Aug 3, 2023

Aapje said:
Someone got the same Prime deal I got...

lol I actually missed the Prime sale but my local store has these with big discounts so thought it was a good purchase.
