I understand the enthusiasm, but at this pace Skymont will turn out to be better than the yet-to-be-announced Apple M5. So let's pay some attention to what Intel actually claimed,
and under which conditions.
So first, the claimed 2% IPC parity comes with a +/- 10% margin of error.
Next, Intel used GCC 12.1 at -O2 optimization for the comparison...
Lion Cove will be faster, especially in gaming, as it simply has more cache available per core than a Skymont core. Is it 6x the L2 cache per core? [3MB vs 0.5MB] Not to mention the L1 and L0 caches. I mean, this is simply the conclusion you can draw from seeing that the 7800X3D outperforms faster...
Until the current console generation dies, a 12 + 0 die is a gimmick for gaming, really. You would be better off spending the additional die space on bigger caches so they could catch up to X3D. By the time more than 8 cores are a must for gaming, your 12 + 0 Arrow Lake will be obsolete. Unless of course...
It's different in what it stores, and it has higher throughput, but the overall mechanism for getting it filled looks the same judging from the Software Optimization Guides, so if either one misses, the core will need to wait, unless I have misunderstood something.
I think the problem is how long you have to wait when it misses. Games reach around 75% uop-cache utilization on Zen 4 according to the C&C analysis, and if you are unlucky, it seems the core will sit there doing nothing for ~40 cycles while it tries to fetch instructions to decode from L3, even worse if from...
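To get a feel for the argument above, here is a back-of-envelope sketch. The 75% op-cache utilization and the ~40-cycle L3 fetch penalty come from the post; the fraction of fetches that actually go all the way to L3 is a purely hypothetical number plugged in for illustration.

```python
# Hedged back-of-envelope: average front-end stall cost per uop.
op_cache_hit = 0.75        # ~75% of uops served from the op cache (from the C&C analysis cited)
l3_fetch_penalty = 40      # cycles for an unlucky fetch from L3 (from the post)
miss_to_l3_fraction = 0.05 # hypothetical share of op-cache misses that go to L3

# Only op-cache misses pay the fetch path, and only some of those reach L3.
expected_stall_per_uop = (1 - op_cache_hit) * miss_to_l3_fraction * l3_fetch_penalty
print(f"{expected_stall_per_uop:.2f} stall cycles per uop on average")
```

Even with a small assumed L3 fraction, the penalty per uop is non-trivial, which is why making misses cheaper can matter more than widening the decoders.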
Have you observed thread utilization? OpenVINO might limit itself to physical cores, since HT won't give you much benefit in backend-bound code. What you might see is noticeable performance scaling with DDR MT/s if the benchmark is using LLMs underneath.
Not at all, I'm afraid. I mean, according to AMD a single thread gets a 4-wide decoder, and the design is more latency-bound than throughput-bound anyway. So while I guess the compiler guys can do wonders, I wouldn't keep my hopes high.
Well, servers are also used for compiling ;) The problem with compilation...
Hadn't heard, thanks for sharing, but tbh it was a bit doomed to fail; we have too many too-small tools in the ecosystem. I mean, even projects like mold, which are basically drop-in replacements for existing build systems, face similar hurdles: https://github.com/rui314/mold but well, for...
It can be there, but the score is not very meaningful if we don't know the power envelope needed to achieve it. OnePlus was known for using special power modes to boost Geekbench performance, and the test platform itself doesn't need to be in a sealed phone form factor, so it might have...
Do you know which method Intel used to produce their latency numbers? As I recall, Cheese from Chips and Cheese suggested they are comparing apples to oranges, since they used the C&C number for Meteor Lake but did not say how they obtained the Lunar Lake numbers.
But aren't games latency-bound rather than throughput-bound? I mean, more decoder throughput won't help you there. For example, take a look at https://chipsandcheese.com/2023/09/06/hot-chips-2023-characterizing-gaming-workloads-on-zen-4/ — the uop cache is already serving over 75% of instruction needs. The problem...
Which queue size do you mean?
And do you know any workload that is affected by this inter-CCD latency issue other than the synthetic benchmark that is measuring it?
I am getting lost in the hazards. Do you mean the scheduler hazard that adds 1 cycle of latency to ops that are supposed to take only 1 cycle, or is there some otherwise undocumented FADD-specific scheduler hazard that prevents 2-cycle FADD latency from being achieved every time? Because according to all the official info on...
If we are to believe the measurements done by Alex Yee and David Huang, only Granite Ridge is able to do 2c, and only under specific circumstances: http://www.numberworld.org/blogs/2024_8_7_zen5_avx512_teardown/#fadd_latency
Rumors are pointing in that direction. Server people won't complain this way, as they will get what they are used to. Marketing has already prepared an easy explanation, maximizing throughput rather than energy saving, to justify the difference, and Client gets a simpler scheduling model.
While I can only guess, based on their own press materials they wanted to save transistor budget. After all, you need to duplicate some structures to support SMT, and seeing that professional software still has issues with Raptor Lake due to scheduling problems, it might be for the better for them to...
The number of 512b entries is noticeably smaller. For 128/256b, the number is either exactly the same as Granite Ridge or very close, depending on the source.
For what it is worth, the desktop parts have twice the L3 cache [32MB vs 16MB], but L3 is a victim cache, so I'm not sure how well R24 can make use of it.
Let me quote myself once again:
I have never said timings are not important. In fact, they are more important than pure bandwidth for single-CCD SKUs due to the IF limitation. Second, if you read through his article with some attention, you will see that:
So this 20% AI benchmark is not thanks...
Ok, maybe I wasn't clear enough. The CCD-to-IOD interface limits you to 64GB/s, while a 6000MT/s DDR5 setup provides a theoretical 96GB/s. Since CCD-to-IOD bandwidth is the limiting factor here, it doesn't matter how fast your DRAM is if you saturate the CCD-to-IOD link first [probably better to have a...
The fabric clock sweet spot is 2000MHz, and the CCD-to-IOD link is 32B/cycle -> max bandwidth 64GB/s. If you are lucky, you get 70.4GB/s at 2200MHz.
For DDR: 6400MT/s on a 128-bit (16B) bus is 102.4GB/s. At the current sweet spot of 6000MT/s you are at 96GB/s.
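The arithmetic above can be sanity-checked in a few lines. The 32B/cycle link width, FCLK values, and DDR5 bus width are taken from the posts; the helper function names are just for illustration.

```python
# CCD->IOD link bandwidth: bytes per fabric clock cycle times FCLK.
def ccd_iod_bw(fclk_mhz, bytes_per_cycle=32):
    return fclk_mhz * 1e6 * bytes_per_cycle / 1e9  # GB/s

# DDR5 bandwidth: transfers per second times bus width in bytes.
def ddr5_bw(mts, bus_bits=128):
    return mts * 1e6 * (bus_bits // 8) / 1e9  # GB/s

print(f"{ccd_iod_bw(2000):.1f} GB/s")  # 64.0 GB/s at the 2000MHz sweet spot
print(f"{ccd_iod_bw(2200):.1f} GB/s")  # 70.4 GB/s if you are lucky with FCLK
print(f"{ddr5_bw(6000):.1f} GB/s")     # 96.0 GB/s at 6000MT/s
print(f"{ddr5_bw(6400):.1f} GB/s")     # 102.4 GB/s at 6400MT/s
```

The point is visible in the numbers: DRAM-side bandwidth outruns the CCD-to-IOD link by a wide margin, so the link saturates first.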
In other words, the new sweet spot will be meaningless to...
And what needs fixing? CCD-to-CCD latency? Has anyone been able to correlate poor performance in this synthetic test with an impact on any particular workload? Or is there something else that needs fixing? Chips and Cheese did an article a long time ago...
R24 is memory-subsystem dependent, which makes it different from R23. I suggest checking out the Chips and Cheese review of the benchmark. AT uses stock memory, while other outlets use 6000MT/s+ RAM. In the case of Lunar Lake vs Strix, Lunar enjoys both a bandwidth and a latency advantage. Since there...
We would need an AMD engineer to answer that question. To the best of my knowledge, there is one register file per domain. We don't know if it is internally divided. What we do know is that the latency penalty applies only to a subset of instructions, and only under specific conditions, while what...
Wasn't this assuming some OS-side patches, or something like that? I remember that in the footnotes they somehow alluded to tuned software for the highest performance improvement they quoted.
Ah, that would be unfortunate, as it would suggest a cut-down die, to be honest, or something that did not pass tests to...
No. You should read it as: "We are claiming some instructions have a best-case latency of 1 cycle, but your standard latency test might measure 2 cycles. To measure 1 cycle, do this." That is the only purpose of the testing procedure they propose. It's not meant for performance comparisons between...
The Zen 5 Software Optimization Guide also contains an Excel file listing instruction latencies. In there, there is a Notes sheet containing the following note:
This is about the 1-cycle latency regression for single-cycle SIMD ops that people were discussing here previously.
I was looking at the AT review of Skylake before posting ;) I haven't calculated the percentages, but the bars looked within a 5% range. Anyway, my point was that 5% gaming improvements weren't something unheard of before, even across two generations. The whole media coverage of Zen 5 sounds like it is an...
The issue is that IPC might be misleading and not exactly what people are looking for. For example, in the Zen 4 case of AVX-512 vs AVX2, you would get the same throughput, but the AVX-512 code would retire only half the instructions the AVX2 loop would, and maybe shave off a few cycles saved on the...
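A quick hypothetical illustrates why IPC alone misleads here. The instruction and cycle counts below are made up for the sketch; only the 8-vs-16 floats-per-op ratio between 256-bit AVX2 and 512-bit AVX-512 is real.

```python
# Hypothetical loop processing 1M floats, compiled for AVX2 vs AVX-512.
elements = 1_000_000

avx2_instructions   = elements // 8    # 8 floats per 256-bit vector op
avx512_instructions = elements // 16   # 16 floats per 512-bit vector op

cycles = 70_000  # assume both loops take roughly the same number of cycles

ipc_avx2   = avx2_instructions / cycles
ipc_avx512 = avx512_instructions / cycles

print(f"AVX2    IPC: {ipc_avx2:.2f}")
print(f"AVX-512 IPC: {ipc_avx512:.2f}")
print(f"elements/cycle (both): {elements / cycles:.1f}")
```

Same work done in the same time, yet the AVX-512 version shows half the IPC, so comparing IPC across ISA levels says little about actual throughput.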
The packaging should also bring latency benefits, I guess. I mean, the memory is on package and doesn't need to go through the motherboard PCB; it's physically much closer to the CPU than would otherwise be possible.