AMD Titan equivalent with Navi

davide445

Member
May 29, 2016
132
11
81
Researching a bit, it appears the last time AMD released a single-GPU card beating the Titan was the R9 290X back in 2013.
I understand AMD has a massive gap in R&D budget, but I was wondering why it was impossible to achieve that goal except for specific workloads.
My idea is that power consumption was the biggest limitation; maybe someone can confirm.
With Navi in fact being a multi-GPU solution with dedicated AI hardware, will the goal be achievable?
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
Basically power consumption yes - there are some quite hard limits to how much power a high end card can draw and AMD are already pushing those (hard) with Vega.

There isn’t any obvious reason I can think of that multi die would help with that. They just need a rather better architecture/the budget to do the fine tweaking to refine what they have.

The realistic hopes for that involve Ryzen/Epyc etc. getting really established, but that’ll take a bit of time, and then a fair while for the results to come down the R&D pipeline.
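As a rough, hedged illustration of the power-limit point, here's a minimal sketch of the board power budget implied by the PCIe CEM connector limits (75 W from the slot, 75 W per 6-pin, 150 W per 8-pin); the connector configurations are just examples.

```python
# Nominal PCIe CEM power limits: 75 W from the x16 slot,
# 75 W per 6-pin and 150 W per 8-pin auxiliary connector.
SLOT_W, SIX_PIN_W, EIGHT_PIN_W = 75, 75, 150

def board_power_limit(six_pins: int = 0, eight_pins: int = 0) -> int:
    """Spec ceiling for a card with the given auxiliary connectors."""
    return SLOT_W + six_pins * SIX_PIN_W + eight_pins * EIGHT_PIN_W

for six, eight in [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2)]:
    print(f"{six}x6-pin + {eight}x8-pin -> {board_power_limit(six, eight)} W")
```

A 6-pin + 8-pin card nominally tops out at 300 W, which is about where Vega 64's 295 W board power already sits; that's the "hard limit" being pushed.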
 

sandorski

No Lifer
Oct 10, 1999
70,231
5,806
126
Basically power consumption yes - there are some quite hard limits to how much power a high end card can draw and AMD are already pushing those (hard) with Vega.

There isn’t any obvious reason I can think of that multi die would help with that. They just need a rather better architecture/the budget to do the fine tweaking to refine what they have.

The realistic hopes for that involve Ryzen/Epyc etc. getting really established, but that’ll take a bit of time, and then a fair while for the results to come down the R&D pipeline.

Each chip could operate at optimal power usage. Performance can be achieved just by throwing enough chips at it... I suspect that's the reasoning, anyway.
 

Muhammed

Senior member
Jul 8, 2009
453
199
116
Researching a bit, it appears the last time AMD released a single-GPU card beating the Titan was the R9 290X back in 2013.
Only the castrated Titan; the Titan Black (full die) was unchallenged, till Kepler aged badly that is.

And FWIW, I don't think Navi will be a multi die solution just yet.
 

Mopetar

Diamond Member
Jan 31, 2011
8,104
6,740
136
I could see AMD going with a multi-die approach much like they did with Ryzen. It would make a lot of economic sense and help them greatly in the data center, but it doesn’t fix their consumer GPU issues, as games have struggled to fully utilize all of the resources on AMD's flagship cards as well as they do the fewer, but beefier, units on the GeForce cards.

AMD still would need a good consumer die, but perhaps they could take NV’s approach and do more to separate compute from the consumer line so that gamers can actually get AMD cards. A sale is a sale as far as they’re concerned, but for developers there’s less incentive to build customized code paths for a decreasing percentage of the total market.

That costs money, but a multi-die approach can give them that funding.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
I could see AMD going with a multi-die approach much like they did with Ryzen. It would make a lot of economic sense and help them greatly in the data center, but it doesn’t fix their consumer GPU issues, as games have struggled to fully utilize all of the resources on AMD's flagship cards as well as they do the fewer, but beefier, units on the GeForce cards.

AMD still would need a good consumer die, but perhaps they could take NV’s approach and do more to separate compute from the consumer line so that gamers can actually get AMD cards. A sale is a sale as far as they’re concerned, but for developers there’s less incentive to build customized code paths for a decreasing percentage of the total market.

That costs money, but a multi-die approach can give them that funding.

I like that idea. A multi-die glue-together-with-Infinity-Fabric GPU where each is an independent GPU that is addressed separately would still be a huge hit with miners if they focused it on perf/W. Could do a bunch of small efficient dies and save the bigger dies for gamers that way.
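To make the mining angle concrete: if each die on such a card simply enumerates as its own GPU, existing miners that already launch one worker per device need no multi-GPU awareness at all. Here is a minimal sketch of that per-device dispatch pattern, with a stub work function standing in for a real OpenCL/CUDA kernel (the device indices and `mine_on_device` helper are hypothetical placeholders):

```python
import multiprocessing as mp

def mine_on_device(device_index: int, nonces: range) -> int:
    """Stub worker: pretend to search a nonce range on one GPU.
    In a real miner this would be an OpenCL/CUDA kernel bound to device_index."""
    # Placeholder 'work' so the sketch runs: count nonces whose hash-ish value ends in 000.
    return sum(1 for n in nonces if hash((device_index, n)) % 1000 == 0)

if __name__ == "__main__":
    # An MCM card exposing 4 independent dies just looks like 4 devices here.
    device_indices = [0, 1, 2, 3]   # hypothetical enumeration order
    chunks = [range(i * 100_000, (i + 1) * 100_000) for i in device_indices]
    with mp.Pool(len(device_indices)) as pool:
        results = pool.starmap(mine_on_device, zip(device_indices, chunks))
    print(dict(zip(device_indices, results)))
```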
 

Mopetar

Diamond Member
Jan 31, 2011
8,104
6,740
136
Ideally they’d figure out a way to have the multiple small dies treated as one monolithic GPU, but if that were easy, CF and SLI would be a lot better at the present moment.

Once you can make a small die that gets hooked together via something like Infinity Fabric, it doesn’t make a lot of sense to make a bigger die, even if it’s only Polaris-sized. I think the trick is artificial segmentation: selling more efficient cards for mining by stripping out the display ports and selling in bulk. Driver availability could factor in as well.
 

davide445

Member
May 29, 2016
132
11
81
That Navi will be an MCM design appears very probable to me, considering the introduction of Infinity Fabric and the fact that Vega uses it in the new APU.
Here is an interesting article about the advantages:
https://wccftech.com/amd-navi-gpu-launching-siggraph-2018-monolithic-mcm-die-yields-explored/
About the topic title: it probably makes more sense to distinguish between gaming-oriented GPUs such as the 1080 Ti and compute-oriented ones such as the Titan (the latest Titan V being much more compute than gaming, it's clear NV segments its designs into these categories).
AMD appears to have always combined both roles in the same design, for volume reasons I suppose.
Competing with NV's top gaming x80 Ti was a goal never really reached; on the compute side things were probably better.
 

nathanddrews

Graphics Cards, CPU Moderator
Aug 9, 2016
965
534
136
Excuse my ignorance, but couldn't multiple dies be presented at the hardware level as ROP clusters sharing a common memory pool? Or would that introduce too much latency?
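A rough, back-of-envelope sketch of why that shared-pool arrangement is hard: any access that has to cross to the other die is capped by the die-to-die link and pays an extra latency hop. All numbers below are assumed, illustrative values, not figures for any real GPU:

```python
# Illustrative (assumed) numbers only -- not measured figures for any real GPU.
LOCAL_HBM_BW_GBS   = 480    # per-die HBM2 bandwidth, e.g. two stacks
DIE_LINK_BW_GBS    = 100    # assumed die-to-die link bandwidth
LOCAL_LAT_NS       = 300    # assumed local DRAM access latency
LINK_EXTRA_LAT_NS  = 60     # assumed extra hop latency for a remote access

def effective_bandwidth(remote_fraction: float) -> float:
    """Crude model: remote traffic is capped by the die-to-die link."""
    local = (1 - remote_fraction) * LOCAL_HBM_BW_GBS
    remote = min(remote_fraction * LOCAL_HBM_BW_GBS, DIE_LINK_BW_GBS)
    return local + remote

for frac in (0.0, 0.25, 0.5):
    avg_lat = LOCAL_LAT_NS + frac * LINK_EXTRA_LAT_NS
    print(f"{frac:.0%} remote: ~{effective_bandwidth(frac):.0f} GB/s, ~{avg_lat:.0f} ns avg latency")
```

Even with a generous link, a workload touching half its data on the other die loses a large chunk of effective bandwidth, which is the latency/bandwidth concern in a nutshell.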
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
Even if they are just a lot of dies connected on one card but not monolithic (more like 4 GPUs on a card), it would be a smash hit with miners.
 

Suijin

Junior Member
Aug 19, 2015
20
0
16
I have been thinking that for VR, if you could dedicate one GPU to each eye, then even with Vega 64-level GPUs it should beat the pants off almost anything available now. I guess the question is whether it can be done with something like Infinity Fabric.
 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
136
Researching a bit, it appears the last time AMD released a single-GPU card beating the Titan was the R9 290X back in 2013.
I understand AMD has a massive gap in R&D budget, but I was wondering why it was impossible to achieve that goal except for specific workloads.
My idea is that power consumption was the biggest limitation; maybe someone can confirm.
With Navi in fact being a multi-GPU solution with dedicated AI hardware, will the goal be achievable?

Power and Die size both. AMD is behind NVidia in efficiency. They need more power and more die space to match performance.

Vega 64 already has a bigger, more power-hungry die than the Titan/1080 Ti, yet it only performs like a GTX 1080.

That is a large deficit.

Going to MCM is not a panacea, and it is much more problematic than using multiple CPU dies.

Even if you magically solved all the MCM GPU issues, the same design split over multiple dies would always be slower than on one monolithic die. It won't save power either; in fact it would be worse there as well, since off-chip connections use more power.

I am quite certain Navi won't be MCM, except for the possible Crossfire dual chip cards we have had for years.

GPU MCM will really only make sense if the die target size is ridiculously large.
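For a rough sense of the deficit being described, here's a quick perf-per-watt / perf-per-area comparison using commonly cited (approximate) die sizes and board powers, and assuming Vega 64 and the GTX 1080 land at roughly the same average gaming performance. The performance indices are ballpark assumptions, not benchmark data:

```python
# Approximate public figures (die area in mm^2, board power in W); treat as ballpark.
cards = {
    "Vega 64":     {"area": 486, "power": 295},
    "GTX 1080":    {"area": 314, "power": 180},
    "GTX 1080 Ti": {"area": 471, "power": 250},
}

# Assume Vega 64 ~ GTX 1080 in average gaming performance (normalized to 1.0),
# and ~1.3x for the 1080 Ti -- rough review-average assumptions, not measurements.
perf = {"Vega 64": 1.0, "GTX 1080": 1.0, "GTX 1080 Ti": 1.3}

for name, spec in cards.items():
    print(f"{name:12s} perf/W: {perf[name]/spec['power']*1000:4.1f}  "
          f"perf/mm^2: {perf[name]/spec['area']*1000:4.2f}  (arbitrary units)")
```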
 

Zero the hero

Junior Member
Jan 7, 2018
2
1
36
Is anyone else just annoyed at the state of the industry at the moment?

VESA just announced a new DP standard, and HDMI is developing theirs. Display manufacturers have revealed 144 Hz 4K displays and then delayed them to infinity.
Things are just stagnating on the display front.

Just give me that Navi/2080 Ti and a 144 Hz 4K display and do it now! Take my money!

//rant off
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
The refreshes are coming at roughly the speed they have for a few years now.

If you’re getting angsty now, the period between silicon shrinking finally failing and them finding some other thing to do instead won’t be at all fun.
 

Guru

Senior member
May 5, 2017
830
361
106
It might be MCM in a server-type deal. You build a custom arrangement of, say, 10 or 20 Navi GPUs linked together to provide hundreds of teraflops in server-type environments. It could possibly be used in supercomputers; imagine an array of 100 Navi GPUs all working in unison to provide petaflops of compute power.

I don't think desktop MCM is a possibility or a reality, but for high-end, computational, research, supercomputer-type deals it makes a lot of sense. Nvidia has one Volta at $10k (pro drivers and all) or the $3000 prosumer Volta which brings 1x performance; AMD could fuse together 4x Navi for 2x performance.
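As a worked version of that arithmetic (with a hypothetical per-GPU throughput figure and an assumed scaling-efficiency factor, since multi-GPU jobs rarely scale perfectly), the natural units end up being teraflops and petaflops:

```python
def aggregate_tflops(n_gpus: int, tflops_per_gpu: float, scaling: float = 0.8) -> float:
    """Aggregate throughput with a crude, assumed scaling-efficiency factor."""
    return n_gpus * tflops_per_gpu * scaling

# Hypothetical 12 TFLOPS-class die (placeholder figure, not a Navi spec):
for n in (10, 20, 100):
    print(f"{n:3d} GPUs -> ~{aggregate_tflops(n, 12.0):.0f} TFLOPS "
          f"({aggregate_tflops(n, 12.0)/1000:.2f} PFLOPS)")
```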
 
Mar 11, 2004
23,280
5,722
146
Power and Die size both. AMD is behind NVidia in efficiency. They need more power and more die space to match performance.

Vega 64 already has a bigger, more power-hungry die than the Titan/1080 Ti, yet it only performs like a GTX 1080.

That is a large deficit.

Going to MCM is not a panacea, and it is much more problematic than using multiple CPU dies.

Even if you magically solved all the MCM GPU issues, the same design split over multiple dies would always be slower than on one monolithic die. It won't save power either; in fact it would be worse there as well, since off-chip connections use more power.

I am quite certain Navi won't be MCM, except for the possible Crossfire dual chip cards we have had for years.

GPU MCM will really only make sense if the die target size is ridiculously large.

That's because AMD is trying to compete with Titan and Nvidia's larger compute chips using a single chip. That was really all AMD could do to try to compete; they just weren't (and still aren't) in a good place to make effective use of it. They don't have the leverage in the compute/pro space to capitalize on their chip there, and they don't have the resources to get all of its graphics/gaming capability exploited on the consumer side. Nvidia did a better job of timing how strongly to go in the compute direction, enabling them to have a smaller, more efficient chip and to just focus on getting the most out of it. AMD had to put out the chips they did because they had to lure developers to them, much like they pretty much had to start moving to open source, as they'd have had no chance of competing while falling further and further behind on the software side. So even if AMD had done two chips, one compute-focused and one graphics-focused (by focused I mean basically all about it, not the Nvidia situation where they've got the leaner/smaller consumer chip and then the larger pro/enterprise one), I don't think that'd have been a good setup either, since they were competing against Nvidia chips that did both while waiting on the industry. Plus, it would inevitably lead them to doing a multi-chip setup anyway (and especially in the consumer space that'd probably be a killer, since that's a lower-margin area where they'd need to be putting in two chips).

Absolutely it has issues that will need to be overcome. I'm not entirely sure I'd agree with that, though, as lots of the tasks GPUs are used for are more parallel than CPU tasks. Not saying it'll be easy, but I don't think there's necessarily anything inherent that makes it worse. Perhaps latency issues (latency matters for game rendering, especially as we move to VR, where you need to have everything synced well and minimize latency), but the general tasks themselves are pretty well suited to heavy parallelization. And they're working on dealing with latency (looking into light transmission for interconnects, even at the chip level).

Er, you "magically solved all the MCM GPU issues" but then here are several other problems? Doesn't sound like you magically solved all the issues then?

The problem is, you act like you'd always be able to make such a monolithic die, when the growing difficulty of doing that is exactly why the industry as a whole is starting to look at this type of setup. They're hitting limits on the physics of chip production. I'm not sure the power use will be that much different, as realistically it'd just mean a bit of extra wiring between the components, but it might be possible to do power gating more easily? Plus, it's not like there aren't potential benefits: by spreading out the processor, it could actually help thermal performance by spreading the heat out over a larger area, enabling less throttling. Plus, as chips start to move towards adding more co-processing, and they develop new memory (even HBM), they can potentially arrange chips in a better manner, like putting shared memory between components, or putting processors that might share pieces closer together. It could also be used to improve efficiency by having multiple chips operating at their peak-efficiency clocks, because the combined chips have more compute units. And because they're also at a more optimal manufacturing size than a single large monolithic design, it could be more cost efficient as well.

Mostly though, I'd say stop thinking of it as solely breaking a GPU into multiple pieces, and think of it as potentially enabling various chips to go closer together. Plus it offers flexibility, so that you're not stuck trying to figure out balances as you make a monolithic chip. You can better adjust components for different uses, which will help a lot. Sure, if you could make a giant chip with all of that, it could probably beat it somewhat, but you'd either have to make more specific chips (whose engineering expenses will be higher), or make them more general purpose to suit more uses. And they're moving to start using more specialized processing units, which means they'll have even more need to slot a greater variety of chips in. And instead of doing the old method of add-in cards or the like, they can slot those closer to the other chips. This enables more granular control over mixing and matching processing components than what we currently have.

I am quite certain that it will, because they're specifically developing it as such. I'm not expecting it to be some mega conquering wonderchip or anything (heck, I won't be surprised if it's a bit of a dud or has some major problem, but I think it won't be anything that couldn't be fixed, even if that means Nvidia or someone else properly implements an MCM GPU setup first), but I expect even a 2-module Navi GPU will give AMD a better GPU than trying to compete by making a monolithic one of similar capability. Plus eventually it helps them do a complete stack of GPUs (and thus avoid the situation where we've ended up with a bunch of GPUs with minor differences like the various GCN versions, or situations like Polaris and Vega where there are fairly substantial differences). Initially I think it'll be simpler: we'll see something like Polaris level with 1 module and 1 stack of HBM, then a higher chip with 2 modules and 2 stacks placed between them.

I don't entirely agree. I think it will enable that (effectively we'll get more GPU), but I think even now it has financial benefits from focusing on a single module size to mass produce, which will help on the chip design/engineering side (a substantial part of the costs; absolutely those resources will need to go to getting the modules to work together now, so short term you're probably not going to save any overall resources, but if you get it working well it would bring big savings and/or let you reallocate resources to providing more variability in products, or to chip design or improving the chip, like tweaking it for higher clocks, etc.). I do think that doing that, though, will require figuring out an optimal size for the module. I think the reason Ryzen was a success is that they figured out the right size to let them do an entire product stack. If they'd decided to go with 1 or 2 cores per module max, it would probably not have been as beneficial for them (unless each core brought a ton of performance, which I don't think it would). Plus there's balancing the other aspects (another thing they did very well with Ryzen is the input/output and the memory; if they'd only gotten it to do 1-4 memory channels versus 2-8, or hadn't gotten the I/O good enough, say topping out at 64 PCIe lanes, things wouldn't be as rosy).

Something else I wonder is whether this could start enabling something a bit new (in this space at least). Like stacking, say putting the memory sandwiched between the two processors (not horizontally, but vertically). For a video card, think of a chip where there are two GPUs, one on either side of the card, with the memory between them. Maybe that'd have benefits for the interposer (instead of having to have a large one, you could have a smaller one and have direct connections between the GPUs, helping with latency)?

It might be MCM in a server-type deal. You build a custom arrangement of, say, 10 or 20 Navi GPUs linked together to provide hundreds of teraflops in server-type environments. It could possibly be used in supercomputers; imagine an array of 100 Navi GPUs all working in unison to provide petaflops of compute power.

I don't think desktop MCM is a possibility or a reality, but for high-end, computational, research, supercomputer-type deals it makes a lot of sense. Nvidia has one Volta at $10k (pro drivers and all) or the $3000 prosumer Volta which brings 1x performance; AMD could fuse together 4x Navi for 2x performance.

I don't know, I feel like Ryzen is showing it is possible, and I think graphics should be well suited to it, and then in the future they'll adjust the whole approach (mixing and matching units based on the needs of the customer). The real question is whether they will actually implement it well in the end product. Both Intel and Nvidia seem to be following suit, or are at least strongly looking at it. Intel is doing more than just looking at it, as the chips with Vega are kind of what I'm talking about: mixing processing components and memory. It enables companies to be flexible in ways that wouldn't have made sense in the past.
 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
136
Er, you "magically solved all the MCM GPU issues" but then here are several other problems? Doesn't sound like you magically solved all the issues then?

My mistake. It was late. I should have said, even if you magically solve the two major problems:
A: Needing duplicate Memory Pools for each Chip.
B: Needing some kind of driver software to handle the partitioning (Like CF/SLI).

Those are NOT solved problems and MCM GPU for gaming is pretty much DOA until solved.

Even when they are someday solved, putting the same compute units in multiple chips will ALWAYS be slower than having them in a monolithic design for GPU gaming. The latency penalty for going off chip will NEVER be zero.



The problem is, you act like you'd always be able to make such a monolithic die, when the growing difficulty of doing that is exactly why the industry as a whole is starting to look at this type of setup. They're hitting limits on the physics of chip production. I'm not sure the power use will be that much different, as realistically it'd just mean a bit of extra wiring between the components, but it might be possible to do power gating more easily? Plus, it's not like there aren't potential benefits: by spreading out the processor, it could actually help thermal performance by spreading the heat out over a larger area, enabling less throttling.

No, I don't. I specifically said it only makes sense if your chip is ridiculously large. That isn't remotely the case for gaming GPUs. It really only comes into play in the HPC data center. These are actually two separate problem spaces. If MCM designs appear, they will be for the data center ONLY. The appetite and pricing for HPC components can push designs large enough to consider MCM at some point in the foreseeable future. That foreseeable future does not exist for gaming GPUs.

Multiple chips do not save power. They will increase it, as off-chip interconnects increase power usage, not decrease it. Idle power is NOT the issue; full-load power is, so gating by turning off chips is not a real benefit.


Something else I wonder is whether this could start enabling something a bit new (in this space at least). Like stacking, say putting the memory sandwiched between the two processors (not horizontally, but vertically). For a video card, think of a chip where there are two GPUs, one on either side of the card, with the memory between them.

You keep mentioning having the memory between them. Do you think both GPU chips would somehow be wired to the same memory chips? That is NOT how this works. The GPU chips will be in the center, each with its own pool of memory on the outside, and the GPUs close together with their high-speed interconnects between them. I expect we will see that centered-GPU layout within a couple of years for datacenter HPC. What you describe (memory in between) will never happen.



I don't know, I feel like Ryzen is showing it is possible...

Ryzen is what I think spawns most of the wrong-headed thinking on this issue. CPUs and gaming GPUs are completely different problem spaces. Do you remember all the issues with Crossfire/SLI-like drivers for CPUs when people had dual-socket motherboards? You don't, because that kind of issue never existed.

Gaming GPUs are a unique problem space with unique issues. Nothing about Ryzen is applicable to solving the issues with MCM gaming GPUs.
 

24601

Golden Member
Jun 10, 2007
1,683
39
86
Ryzen is what I think spawns most of the wrong-headed thinking on this issue. CPUs and gaming GPUs are completely different problem spaces. Do you remember all the issues with Crossfire/SLI-like drivers for CPUs when people had dual-socket motherboards? You don't, because that kind of issue never existed.

Gaming GPUs are a unique problem space with unique issues. Nothing about Ryzen is applicable to solving the issues with MCM gaming GPUs.
Ryzen uses a glorified "custom" PCI-E 3.0 x16-class link to communicate between dies, as well as having a "northbridge" between the 4+4-core setup (and the design suffers massively due to those compromises).

Have fun trying to do the same with GPUs that need 500-1000 GB/s.
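To put numbers on why sustaining that kind of die-to-die bandwidth is a power problem: interconnect power is roughly bits per second times energy per bit. The pJ/bit figures below are assumed, illustrative values; real costs depend on the PHY and packaging:

```python
def link_power_watts(bandwidth_gbytes_s: float, picojoule_per_bit: float) -> float:
    """Interconnect power = bits per second * energy per bit."""
    bits_per_s = bandwidth_gbytes_s * 1e9 * 8
    return bits_per_s * picojoule_per_bit * 1e-12

# Assumed, illustrative energy costs (pJ/bit) for different link types.
for label, pj in [("on-die wires", 0.2), ("on-package link", 1.0), ("board-level SerDes", 5.0)]:
    print(f"1 TB/s over {label:18s}: ~{link_power_watts(1000, pj):5.1f} W")
```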
 

Bouowmx

Golden Member
Nov 13, 2016
1,147
551
146
I see MCM GPUs making the biggest even bigger: e.g. two 600 mm² dies (issue: a dense package of 500-600 W to dissipate) forming a married pair that shares various PCB components.

Why would one go the route of two 300 mm² GPUs in an MCM rather than one 600 mm² die, for any reason other than yields?
 

Guru

Senior member
May 5, 2017
830
361
106
It's more effective in the sense that a 600 mm² chip ends up faulty more of the time; it's a bigger die, so a 300 mm² die has a much better chance of being 100% operational. Basically most 1080 Tis are partially defective Titan Xps. So putting two 300 mm² chips together makes more sense.

Going for a big chip like Nvidia's Volta is also riskier: what if there is little to no market for it? You've set yourself up for a certain market and cost, and you can't really reduce that ~815 mm² die's cost; you have to sell the chip in a certain price range, and again you will get a lot of defective chips, probably the ones Nvidia is selling at $3000 in their desktop Volta variant.

It's a good thing that they can sell it at $3000; I think it wouldn't make sense for them if it were anything in the price range of the Titan Xp, like $1300.
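The yield argument can be made concrete with a simple Poisson defect model and the standard dies-per-wafer approximation; the defect density here is an assumed placeholder, not a real foundry figure:

```python
import math

WAFER_DIAMETER_MM = 300
DEFECT_DENSITY_PER_CM2 = 0.1   # assumed placeholder, not a real foundry figure

def dies_per_wafer(die_area_mm2: float) -> float:
    """Standard approximation accounting for edge loss."""
    d = WAFER_DIAMETER_MM
    return math.pi * (d / 2) ** 2 / die_area_mm2 - math.pi * d / math.sqrt(2 * die_area_mm2)

def poisson_yield(die_area_mm2: float) -> float:
    """Fraction of dies with zero defects under a Poisson model."""
    return math.exp(-DEFECT_DENSITY_PER_CM2 * die_area_mm2 / 100)

for area in (300, 600):
    good = dies_per_wafer(area) * poisson_yield(area)
    print(f"{area} mm^2: {dies_per_wafer(area):5.0f} candidates, "
          f"{poisson_yield(area):.0%} yield, ~{good:.0f} good dies/wafer")
```

Under these assumed numbers a wafer gives roughly 73 "two-die GPUs" worth of good 300 mm² silicon versus about 50 good 600 mm² dies, which is the economic pull toward MCM.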
 

24601

Golden Member
Jun 10, 2007
1,683
39
86
It's more effective in the sense that a 600 mm² chip ends up faulty more of the time; it's a bigger die, so a 300 mm² die has a much better chance of being 100% operational. Basically most 1080 Tis are partially defective Titan Xps. So putting two 300 mm² chips together makes more sense.

Going for a big chip like Nvidia's Volta is also riskier: what if there is little to no market for it? You've set yourself up for a certain market and cost, and you can't really reduce that ~815 mm² die's cost; you have to sell the chip in a certain price range, and again you will get a lot of defective chips, probably the ones Nvidia is selling at $3000 in their desktop Volta variant.

It's a good thing that they can sell it at $3000; I think it wouldn't make sense for them if it were anything in the price range of the Titan Xp, like $1300.

JHH said on record that the manufacturing cost alone for GV100 is $1000 per card (and that was with lower memory spot prices at the time; it's probably much higher than that now).

Add in R&D etc. and the costs are astronomical.

No amount of wishful thinking will make MCM GPUs a good idea for consumer level chips though, as it means more total die area, an interposer or EMIB, and HBM (or something like it), plus much lower efficiency than a monolithic die due to the 1000+ GB/s low-latency connection needed between the dies; off-die communication is always far more expensive in power and latency than on-die (look at NVLink and how much die space it takes for just 80 GB/s links to 3 other GV100s/POWER9 CPUs).
 
Last edited:

davide445

Member
May 29, 2016
132
11
81
I spent some time gathering and analyzing data; here are my results.





The GPUs per year are:

2012: HD7970, GTX680
2013: R9 290X, GTX780Ti
2015: Fury, Titan X
2016: RX 480, GTX 1060 6GB
2017: Vega 64, Titan Xp

The major correlations I find for the differences are:

Watt/surface vs Passmark Compute: 0.63
ROP%Shaders vs Passmark: 0.44

Curiously, Watt/surface is the metric with the lowest average AMD/NV difference (10%), whereas for example the Passmark/Watt difference is on average more than 30%.

So it appears NV is in fact able to shrink the die area while lowering power consumption, with huge effects on compute performance. The only AMD GPU with a smaller die than its NV equivalent is the R9 290X Hawaii (considered AMD's most competitive GPU in recent history), and it is the only one with exactly the same TDP as its 780 Ti competitor.

For graphics performance, the ROP-to-shader ratio appears to have the major effect.

I suppose the two effects might be related, with AMD using on average 25% more shaders (again except in the R9 290X) and 50% fewer ROPs (again except in the R9 290X).

So from this analysis it appears AMD needs to rebalance its shader/ROP architecture.
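For anyone wanting to reproduce this kind of analysis, a Pearson correlation over per-year AMD/NV spec deltas is a few lines of Python. The input rows below are illustrative placeholders, not the actual data behind the 0.63 and 0.44 figures above:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Placeholder rows (watt-per-mm^2 delta vs. compute-score delta per year);
# these are illustrative values, NOT the real data behind the 0.63 figure.
watt_per_area_delta = [0.10, 0.02, 0.15, 0.12, 0.18]
compute_score_delta = [0.20, 0.05, 0.30, 0.25, 0.40]

print(f"correlation: {pearson(watt_per_area_delta, compute_score_delta):.2f}")
```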
 
Last edited:

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
No amount of wishful thinking will make MCM GPUs a good idea for consumer level chips though

Good thing both AMD and Nvidia are using the best graphics ASIC designers in the world instead of 'wishful thinking', then.
 