igor_kavinski
Lifer
- Jul 27, 2020
-Guess we'll start seeing IC after AMD's APUs get squeezed by Intel.
They have enough of it too. Just need to cancel the current 7900X3D and future 9900X3D SKUs.
-I've always been perplexed by AMD's unwillingness to add IC to their APUs. Figure it would have an outsized effect in the most bandwidth-constrained scenarios.
Guess we'll start seeing IC after AMD's apu's get squeezed by Intel.
-IMO the plan is probably to match the previous generation's peak performance in the U series while using much less area to do so, and the H series gets a wider GPU to keep up the 2x gen-on-gen gains while also creeping up into Nvidia xx50 performance.
This is always the case. Do not assume SIMD32 will save huge space. We knew RDNA3 wouldn't perform well after Angstronomics revealed the compactness of the die. Features take transistors, and it is hardware, so if a die is too compact, that's suspicious. It might save some space, but if it were, say, 40% more compact, then they would have had to remove features.
-AFAIK Intel had a perfect storm on their hands: a piece of hardware that required special software attention... and a software team paralyzed by the recent war.
I think this is a better way of looking at it than just politics. They are still inexperienced.
-Let's hope they take out this stupid Resizable BAR requirement this time. For older systems, the performance uplift between Arc 1 and Arc 2 would be +130%.
Resizable BAR requirements exist because they had the iGPU mentality for so long; they didn't need to care. Now that they have a dGPU, they'll really understand what is needed.
-...have long suspected that Intel is large enough for plenty of internal politics
Especially with the manager of that time, who seemed to mostly play stupid political games instead of delivering.
-Especially with the manager of that time, who seemed to mostly play stupid political games instead of delivering.
CEO Brian Krzanich was responsible for the company's complacency during his tenure, and Intel TMG's Sohail for the 10nm delays.
Intel TMG's Sohail was the biggest borderline-criminal executive; he was forced out a couple of years ago and was responsible for the 10nm delays.
-3D cache APU? Who's here for that?
I was honestly surprised the PS5 Pro didn't add an Infinity Cache. They blew the transistor budget on extra shaders, ray tracing, etc., but didn't give it any more memory bandwidth.
Heck, AFAIK they don't have an LLC/IC in the console APUs either, where you would think it would make so much sense: it's so power efficient, which is a huge boon to getting performance out of a tiny box, and it reduces costs elsewhere in size, cooling capacity, power delivery, etc.
Thinking next gen is when we'll see it.
-I was honestly surprised the PS5 Pro didn't add an Infinity Cache. They blew the transistor budget on extra shaders, ray tracing, etc., but didn't give it any more memory bandwidth.
There's more memory bandwidth in the form of higher-clocked GDDR6: from 448 GB/s to 576 GB/s.
Regardless, the PS5 Pro doesn't need a lot more memory bandwidth, because it will actually be targeting a lower base resolution: it'll render 1080p upscaled to 4K using the AI-based PSSR, whereas the PS5 renders 1440p upscaled to 4K using temporal FSR2 or similar.
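The bandwidth and resolution numbers above can be sanity-checked in a few lines. This is a rough sketch; the base render resolutions are the ones claimed in this thread, not confirmed figures:

```python
# Figures quoted in the thread: PS5 = 448 GB/s, PS5 Pro = 576 GB/s GDDR6.
ps5_bw, pro_bw = 448, 576
bw_uplift = pro_bw / ps5_bw - 1           # raw bandwidth gain

# Claimed base render resolutions before upscaling to 4K:
ps5_pixels = 2560 * 1440                  # PS5: 1440p + temporal FSR2-style upscale
pro_pixels = 1920 * 1080                  # PS5 Pro: 1080p + PSSR
pixel_ratio = pro_pixels / ps5_pixels     # fraction of the PS5's base pixel load

print(f"bandwidth uplift: {bw_uplift:.1%}")          # 28.6%
print(f"Pro pixel load vs PS5: {pixel_ratio:.2%}")   # 56.25%
```

So the Pro gets ~29% more raw bandwidth while shading ~44% fewer base pixels per frame, which is consistent with the argument that it doesn't need a big bandwidth jump.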
-There's more memory bandwidth in the form of higher-clocked GDDR6: from 448 GB/s to 576 GB/s.
Yes, but a cache will speed it up in places where lower latency is needed, such as instruction fetches. A cache is also much better at extracting the theoretical bandwidth, for the same reason.
-Resizable BAR requirements exist because they had the iGPU mentality for so long; they didn't need to care. Now that they have a dGPU, they'll really understand what is needed.
The Draw/ExecuteIndirect speed-up for Battlemage is another one of these cases.
-Hardware team has been bottlenecking the driver team.
-NOT
-Driver team has been bottlenecking the hardware team.
Well, as long as the hardware team was able to blame the driver team!
-Well, as long as the hardware team was able to blame the driver team!
Raja getting sidelined confirmed Intel was not happy with the hardware. The blame game may have been real, but I doubt they were convincing enough.
-The blame game may have been real, but I doubt they were convincing enough.
He blamed Lisa, the name that is a success even when it's a failure!
Alchemist needs hand-tuning by the driver writers to optimize for weak APIs and for engines such as Unreal Engine 5, because Intel said Alchemist emulates a feature that UE5 uses widely.
Hardware team has been bottlenecking the driver team.
NOT
Driver team has been bottlenecking the hardware team.
-The idea I get from Chips and Cheese's microbenchmarks of the A770 is that execution latencies are high and bandwidth at low workgroup counts is low. So the hardware does look highly dependent on hand-tuned driver optimizations to keep many ALUs occupied and thus hide the low effective bandwidth. It looks a bit like the same problems GCN used to have.
Almost immediately, Intel stated that the design suffered from memory bandwidth issues. I'm pretty sure Raja said that out loud in a post-launch interview. Based on that, I assume it was already being addressed in the hardware design of the next-generation parts.
-I wonder which CPU benefits Arc the most, helping to keep it busy.
Ironically, it would be the one that is best at games: the Ryzen X3D series.
-Almost immediately, Intel stated that the design suffered from memory bandwidth issues. I'm pretty sure Raja said that out loud in a post-launch interview. Based on that, I assume it was already being addressed in the hardware design of the next-generation parts.
Saying that is akin to saying Vega suffered from memory bandwidth issues. It's just that both have a difficult time utilizing said bandwidth.
-I wonder which CPU benefits Arc the most, helping to keep it busy.
How about some ARC with your Arc? https://en.m.wikipedia.org/wiki/ARC_(processor)
I found this Mesa code today. It points to ARL-H with Xe2, as gfx20 = Xe2, but I'm not sure.
-I was disappointed with the recent rumors pointing to the same 32 Xe cores as the A770, but the architectural reveal shows that I might not need to worry.
Xe2/Battlemage current specs:
-32 Xe cores
-Higher clocks, 2.8-3 GHz
Lunar Lake shows that it's roughly 50% faster at the same basic shader/TMU/ROP specs, and that's without even counting the low-level details that might make it faster in games.
Now factor in the clocks: combine that gain with 2.8 GHz for, say, a B970 versus 2.4 GHz for the A770, and it comes out about 73% faster, also thanks to faster GDDR memory; at 3 GHz it works out to 87.5% faster than the A770. If it's also better in actual games than Alchemist, that could be enough to push it over the 2x mark and rival the RTX 4070S/4070 Ti.
32 Xe cores at 3 GHz is only 24.5 TFLOPS, so it'll turn out to be the "most powerful" GPU in its TFLOPS class!*
*OK, that's expected, as based on the rumored die size it's said to be AD103-class.
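The uplift arithmetic in the post above can be reproduced as a quick sketch. The 1.5x per-clock architectural gain is this poster's reading of the Lunar Lake reveal, and "B970" is a speculative SKU name, not a confirmed product; the post's 73% figure implies an architectural gain slightly under a clean 1.5x.

```python
# Scaling estimate: (architectural gain) x (clock ratio) relative to the A770.
arch_gain = 1.5                  # assumed Xe2-over-Xe gain at the same clock
a770_clk = 2.4                   # A770 clock in GHz
for bmg_clk in (2.8, 3.0):
    uplift = arch_gain * bmg_clk / a770_clk - 1
    print(f"{bmg_clk} GHz -> {uplift:.1%} faster than A770")
# 2.8 GHz -> 75.0% (the post rounds to 73%), 3.0 GHz -> 87.5%

# FP32 throughput, assuming 128 FP32 lanes per Xe2 core and 2 ops/clock (FMA):
tflops = 32 * 128 * 2 * 3.0e9 / 1e12
print(f"{tflops:.2f} TFLOPS")    # 24.58, matching the ~24.5 figure above
```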
64 cores would not be needed, as 32 Xe2 cores at a 3 GHz clock plus the architectural gains in games is enough to reach the 4070S. 64 would mean 2x on top of that, which I don't believe they'll reach on N5, especially with just a 1.5x perf/watt improvement.
-I found this Mesa code today. It points to ARL-H with Xe2, as gfx20 = Xe2, but I'm not sure.
Also, ONE Xe1 EU is 2x slower than ONE Xe2 EU at half the power, at least in Time Spy. But 32 Xe2 cores = 256 EUs, versus the A770's 512 EUs. At worst a 4060 Ti, at best a 4070 Super.
Also, I still think the 56/64 variant is coming, judging from the Mesa/Linux kernel dev patches I'm reading daily.
LNL Xe2 has already been above B0 stepping since four days ago.
Found kernel patches for G21 today, for example. That's the first mention of G21 in Patchwork.