Vega/Navi Rumors (Updated)

Status
Not open for further replies.
May 11, 2008
20,055
1,290
126
That would be doubtful, since none of the major current APIs would make use of it.
That is why it is most likely just a generic "black box": it takes geometry as input, checks whether it needs culling, and discards what isn't seen.

While new functions can be exposed to things like Vulkan, they won't make it into DX anytime soon.

I agree, the effort would then be in the driver, hiding any changes.
I still think that Vega can switch between modes.
I have read that it has more transistors than Fury has.
I am thinking and hoping Vega can switch between legacy immediate mode and tile-based rendering.
Right now, I am assuming we have only seen the legacy immediate mode.
 

french toast

Senior member
Feb 22, 2017
988
825
136
Not sure how anyone can question HBM. It's just expensive for now because it's new but even HBM2 can slam dunk over GDDR6's projected bandwidth with a 4-stack at full speed (1000MHz) IIRC. Something like 1024GB/s with a 4-stack @1000MHz and at lower than GDDR5X power consumption. Price is the only issue for now, but I'm sure that'll change considering Samsung is starting production too.
Theoretically yes, but so far HBM2 has been a massive disappointment. It's running much slower than original forecasts for this timeline (756mb-1gb P/s), it actually has worse bandwidth than HBM1 from 2 years earlier, likely with much higher latency, and it is pushing more voltage than it should to reach those speeds, which makes it less efficient.
Oh, and it's late, very very late.
AMD should have gone for GDDR5X for the short term until HBM2 was ready for prime time; Vega would have been out in Q3 2016, I feel.
 
Mar 10, 2006
11,715
2,012
126
Theoretically yes, but so far HBM2 has been a massive disappointment. It's running much slower than original forecasts for this timeline (756mb-1gb P/s), it actually has worse bandwidth than HBM1 from 2 years earlier, likely with much higher latency, and it is pushing more voltage than it should to reach those speeds, which makes it less efficient.
Oh, and it's late, very very late.
AMD should have gone for GDDR5X for the short term until HBM2 was ready for prime time; Vega would have been out in Q3 2016, I feel.

I doubt AMD will stick with HBM for future consumer GPUs. GDDR6 for all.
 
Reactions: french toast

advt.naveen

Junior Member
May 17, 2013
20
7
81
Maybe the modules are created by new teams, but their lead engineers are highly knowledgeable and are mostly stateside.

When an SoC is designed, they have a rough idea of transistor count, area, and power. For performance, too, they have a few tests to analyze before it goes to RTL logic freeze.

A lot of reviews are done. Modules are designed and reviewed after the top-level architecture is designed. When modules are designed, the RTL designer clearly knows what to write inside the blocks. Once a module is completed, it is reviewed again. Reviews are done with a team of high-level technical leads, not just local team members.

AMD has the ability to choose engineers for the new teams according to its requirements.
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,153
136
Theoretically yes, but so far HBM2 has been a massive disappointment. It's running much slower than original forecasts for this timeline (756mb-1gb P/s), it actually has worse bandwidth than HBM1 from 2 years earlier, likely with much higher latency, and it is pushing more voltage than it should to reach those speeds, which makes it less efficient.
Oh, and it's late, very very late.
AMD should have gone for GDDR5X for the short term until HBM2 was ready for prime time; Vega would have been out in Q3 2016, I feel.
It's worse bandwidth because AMD chose half the bus width for Vega. Had they gone with 4096 bit again, we're talking near 1TB/s of bandwidth. It has lower latency than HBM1 by virtue of being clocked twice as fast.
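For anyone who wants to check the numbers in this exchange: peak memory bandwidth is just bus width times per-pin data rate, divided by 8 bits per byte. Below is a minimal sketch, assuming 1024 bits per HBM stack and double data rate (two transfers per clock). The function name and the 945 MHz figure for the two-stack card are illustrative assumptions, not numbers taken from this thread.

```python
def peak_bandwidth_gbs(bus_width_bits: int, clock_mhz: float,
                       transfers_per_clock: int = 2) -> float:
    """Peak bandwidth in GB/s = bus width * per-pin data rate / 8."""
    # Per-pin data rate in Gbit/s (DDR signalling assumed by default).
    gbps_per_pin = clock_mhz * transfers_per_clock / 1000.0
    return bus_width_bits * gbps_per_pin / 8.0


# HBM1 as used on Fiji: 4 stacks (4096-bit) at 500 MHz.
fiji = peak_bandwidth_gbs(4096, 500)
# HBM2 with 2 stacks (2048-bit) at the full 1000 MHz.
hbm2_two_stacks = peak_bandwidth_gbs(2048, 1000)
# HBM2 with 4 stacks (4096-bit) at the full 1000 MHz.
hbm2_four_stacks = peak_bandwidth_gbs(4096, 1000)
# Assumed example: 2 stacks clocked below spec at 945 MHz.
hbm2_two_stacks_945 = peak_bandwidth_gbs(2048, 945)

print(fiji, hbm2_two_stacks, hbm2_four_stacks, hbm2_two_stacks_945)
```

With these assumptions, a 2048-bit bus only matches the 4096-bit HBM1 configuration if it reaches the full 1000 MHz, and it falls slightly below it at anything less. A 4096-bit bus at full speed gives the roughly 1TB/s figure mentioned above.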
 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
So by your logic, RX VEGA could be 15-20% faster than GTX1080 and this goes in line with what AMD said about being close to GTX1080 and why they used the GTX1080 on the last event . Ok thanks for clearing the misleading RX performance shown by AMD.

You inferred all this from my post? Is this how you inferred 90% of GTX 1080 in DX12 for half the price reading AMD's Polaris estimates?

My post was simple and made no estimates of raw performance.

EDIT:

If true, then that whole excursion into a HBCC world is wasted and will have to be abandoned.

GDDR will not work for an effective HBCC implementation.

Well, no, I meant more so along the lines of GDDR for their consumer flavors while keeping HBM for their prosumer and professional flavors.

IE more along the lines of Nvidia. I'm even putting this on Nvidia, since I personally don't see them using HBM for a consumer product at least until well into 2018. I wouldn't be surprised if the GV102 equivalent is not HBM based either.
 

mohit9206

Golden Member
Jul 2, 2013
1,381
511
136
Seeing as how AMD is mum about Vega performance, I don't see how people are expecting the flagship Vega to be faster than the 1080.
And no, a select few AMD-optimized games do not count.
I mean faster than the 1080 in most games. I just don't see that happening.
The grave is already dug and Vega is ready to be laid to rest come launch reviews.
Worse than Fermi is quite an achievement in this day and age.
 

Elixer

Lifer
May 7, 2002
10,376
762
126
Here is some video of the Vega demo @PDXLAN


*edit It was recorded last night, so, sometime today, we should know the reveal.
 

Elixer

Lifer
May 7, 2002
10,376
762
126
You can't use GDDR5X/6 on APU's. It's not all about DGPU. They are working on making them extinct.
True, however, not sure HBM is the answer there either.
They built the new cache controller to speed up operations, so, it is possible they will switch to something like eDRAM.
OEMs are not likely to pay a premium just for having an HBM package.
It all boils down to cost.
 

Mopetar

Diamond Member
Jan 31, 2011
8,015
6,465
136
If smaller Vega is aimed at a more mainstream segment, it just makes more sense for AMD to use a cheaper memory type.

I'm not sure small Vega exists at this point or is meant to be a discrete card solution. If anything it might be the Vega part that ends up being used in AMD's upcoming APUs.

I suspect that if it were a smaller (say ~36 CU) part then they could probably get away with using a single stack of 2 or 4 GB HBM assuming that the HBCC and memory compression tech they have in Vega reduces the need for larger amounts of memory. The demos showing solid frame rates in modern games when restricting Vega to 2 GB of VRAM suggest they may be able to get by with using smaller quantities of HBM.

One question I have is how salvageable is HBM. Can a failed 8-Hi stack of HBM be salvaged to work as a 4-Hi or 2-Hi? If that's the case I think it could be financially feasible for AMD to use it in a stopgap Vega replacement for Polaris.

Otherwise they need to get to Navi where they can use a modular approach to lower their costs and be more competitive with NVidia. I suspect that means using more CUs at a lower clock speed where their architecture doesn't require an attached diesel generator to run the card.

Long term they still need a GCN replacement because it's fairly clear there's some kind of hurdle that they can't seem to get over that may require a fresh start.
 

french toast

Senior member
Feb 22, 2017
988
825
136
It's worse bandwidth because AMD chose half the bus width for Vega. Had they gone with 4096 bit again, we're talking near 1TB/s of bandwidth. It has lower latency than HBM1 by virtue of being clocked twice as fast.
Oh, I thought a narrower but higher-clocked bus meant more latency? Perhaps I'm wrong there.
Still, I'm sure the whole point of HBM2 was having much higher clocks to achieve >512GB/s bandwidth whilst using a smaller bus and increasing capacity?
I'm just trawling through various product announcements and articles to relearn about this.
I think the problem is that the yields and/or chip quality of HBM2 were below AMD's expectations, which is why they only went with 2 stacks rather than the 4 stacks and 4096-bit bus of Fiji. I think they expected higher clocks and lower voltages, and I also think they expected enough chips to launch in late 2016, as last year's roadmap vaguely pointed to. (2015? Roadmap, Q4 16 for Vega).

I think AMD was too ambitious with die size and HBM; had they known HBM2 would have issues, I'm sure they would have gone for 4 lower-clocked stacks and more ROPs and shaders at a Fiji-like die size.
 

Mopetar

Diamond Member
Jan 31, 2011
8,015
6,465
136
I think they ran into serious issues with Vega and needed time to fix it (re-spin or more radical change). The now-useless-for-gaming cards are sold as 'Prosumer' and potentially lower end models.

It seems to me that if that were the case, there should be a bigger gap between the FE and RX cards in terms of launch (though if it is a paper launch, as some predict, this may be the case), as it is going to take at least three months to diagnose and fix the problem and to get new silicon back.

On further thought, there may be some validity to AMD sandbagging, but not for the reasons most think. If they know Vega isn't going to live up to expectations and they make it seem a lot worse than it is, some people will be delighted when it turns out better than that worst case. If it's really between a 1080 and a Ti at obscene power levels, but a lot of people think it's only barely matching a 1080 going in, they'll come out with a more favorable opinion than if they had more realistic expectations.

The card is still objectively less than great for gaming, but people feel as though it's not as awful as it really is because they expected much worse.
 

eek2121

Diamond Member
Aug 2, 2005
3,051
4,276
136
There has been a ton of speculation in this thread, and much of it has been negative. So I'm posting this to hopefully pull people back to reality.

RX Vega Performance:
We know NOTHING of RX Vega. It could be a completely different version of the Vega architecture. We don't even know if it uses HBM2. (It likely does, for reasons I'll get to in a minute). What we do know is:
  • RX Vega will be faster than Vega FE in gaming.
  • AMD has a large team working on drivers for RX Vega.
  • 3rd parties won't receive a final BIOS until August.
  • AMD is making an argument for value during the Vega Roadtrip
  • AMD Anticipates increasingly higher profit margins from both their CPU and GPU divisions as we go into 2018-2020. They would not have made this projection if they were expecting a failure.
HBM2:
I see a lot of people claiming that HBM2 is expensive. This is incorrect. AMD would have signed a multi-year deal based on the purchase of X amount of units. This deal would have a HUGE volume discount built in. Nobody outside of AMD and its supplier knows what the final number is. I CAN tell you from experience that the number they got was acceptable to them. This means that pricing is likely very good. Much better pricing than you or I can get.

Pascal:
I've seen people all over the internet saying that AMD was 'blown away' by the performance of Pascal, particularly the 1080ti. AMD has been in this business quite a while, and I seriously doubt anything 'blew them away'. Even if the 1080ti was unexpected (it wasn't), they had time to retool Vega and make it faster. Nvidia's performance bumps have been anything BUT "surprising" for the past few years. Most of us have predicted performance across multiple generations of parts.

Alternate Reality:
Given what I've stated above, let me present to you 2 entirely different, but equally possible scenarios (with healthy amounts of speculation of course):
  1. AMD is releasing a 1070 and 1080 competitor, and holding back on the flagship until next year. As an example of why this could be the case: They could be planning a 4 stack version of RX Vega, but initially want to only release less expensive 2-stack versions. A reason for doing this is that their contract could be front loaded (higher price for the first X units. Lower price for the following units). Another reason they could be doing this is that they expect Nvidia to drop a refresh pretty soon and they don't want to show all their 'cards' yet.
  2. RX Vega could be a new spin that is much faster and clocked higher than the original. Note that the development of RX Vega could have been taking place parallel to Vega FE, so the two could be following entirely different development tracks.
Final Thoughts:
We don't really know much about RX Vega. So it's best we temper both positive and negative expectations until we have more information. AMD is not incompetent and they don't live in a bubble. They would not be releasing RX Vega if they did not believe they could provide excellent value for the consumer while maintaining healthy margins.
 

Elixer

Lifer
May 7, 2002
10,376
762
126
There has been a ton of speculation in this thread, and much of it has been negative. So I'm posting this to hopefully pull people back to reality.
...
You just posted more speculation...unless you have sources inside AMD?

Final Thoughts:
We don't really know much about RX Vega. So it's best we temper both positive and negative expectations until we have more information.
That much is true...
AMD is not incompetent and they don't live in a bubble. They would not be releasing RX Vega if they did not believe they could provide excellent value for the consumer while maintaining healthy margins.
Again, you don't know what AMD's thinking is here, so it is more speculation.
For all we know, this whole fiasco could just be a stopgap measure to hold them off until Navi arrives. It was too late to pull the plug, and Navi can't arrive fast enough, so they were stuck with Vega.
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,153
136
Oh, I thought a narrower but higher-clocked bus meant more latency? Perhaps I'm wrong there.
Still, I'm sure the whole point of HBM2 was having much higher clocks to achieve >512GB/s bandwidth whilst using a smaller bus and increasing capacity?

I'm just trawling through various product announcements and articles to relearn about this.
I think the problem is that the yields and/or chip quality of HBM2 were below AMD's expectations, which is why they only went with 2 stacks rather than the 4 stacks and 4096-bit bus of Fiji. I think they expected higher clocks and lower voltages, and I also think they expected enough chips to launch in late 2016, as last year's roadmap vaguely pointed to. (2015? Roadmap, Q4 16 for Vega).

I think AMD was too ambitious with die size and HBM; had they known HBM2 would have issues, I'm sure they would have gone for 4 lower-clocked stacks and more ROPs and shaders at a Fiji-like die size.
Higher clocks means lower latency. Latency is measured in cycles, so higher clock means the same amount of cycles is done quicker.

I think it was simply a cost measure. Two stacks means a much smaller interposer than for four stacks. It's also far easier to do traces for a 2048 bits bus vs a 4096 bit bus. It's basically double the wiring. The die size for Vega itself is also completely reasonable, considering it took a while for 28nm to be mature enough to go 600mm^2.

With TBR and improved compression, it wouldn't be out of the question for Vega to perform much better than Fiji even with the same bandwidth. You can look at 1080 Ti bandwidth to see an example.
 
Reactions: tonyfreak215

french toast

Senior member
Feb 22, 2017
988
825
136
Higher clocks means lower latency. Latency is measured in cycles, so higher clock means the same amount of cycles is done quicker.

I think it was simply a cost measure. Two stacks means a much smaller interposer than for four stacks. It's also far easier to do traces for a 2048 bits bus vs a 4096 bit bus. It's basically double the wiring. The die size for Vega itself is also completely reasonable, considering it took a while for 28nm to be mature enough to go 600mm^2.

With TBR and improved compression, it wouldn't be out of the question for Vega to perform much better than Fiji even with the same bandwidth. You can look at 1080 Ti bandwidth to see an example.
OK thanks, but what about DDR4 running faster than DDR3 while having higher latency?
 

Zstream

Diamond Member
Oct 24, 2005
3,396
277
136
Higher clocks means lower latency. Latency is measured in cycles, so higher clock means the same amount of cycles is done quicker.

I think it was simply a cost measure. Two stacks means a much smaller interposer than for four stacks. It's also far easier to do traces for a 2048 bits bus vs a 4096 bit bus. It's basically double the wiring. The die size for Vega itself is also completely reasonable, considering it took a while for 28nm to be mature enough to go 600mm^2.

With TBR and improved compression, it wouldn't be out of the question for Vega to perform much better than Fiji even with the same bandwidth. You can look at 1080 Ti bandwidth to see an example.

Just stop talking. Higher clocks do not mean lower latency. That's total crap.
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,153
136
OK thanks, but what about DDR4 running faster than DDR3 while having higher latency?
The amount of cycles is higher, but it's made up for by higher clocks. Notice how the CL number after the speed tends to be higher with DDR4. That's the CAS latency, and the lower the number the better. But higher clocks means you can go through those cycles faster, negating the penalty.
Early DDR4 indeed had higher overall latency due to higher CAS latency while not being very fast (1200MHz or so).

DDR3 at 2133 with CAS latency of 9 would have lower latency than DDR4 at 2400 with CAS latency of 15. 8.4ns vs 12.5ns for column access.

Increasing DDR4 speed to 3600 while maintaining the same CAS latency would reduce it to 8.3ns.
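The arithmetic behind those figures can be sketched as follows. This is a minimal example that assumes the speed rating is the DDR data rate in MT/s, so the actual clock is half of it, and that CAS latency is counted in cycles of that clock. The function name is made up for illustration.

```python
def cas_latency_ns(data_rate_mts: float, cl_cycles: int) -> float:
    """Convert a CAS latency in clock cycles to nanoseconds."""
    # DDR memory performs two transfers per clock, so the I/O clock
    # in MHz is half of the advertised data rate in MT/s.
    clock_mhz = data_rate_mts / 2.0
    # cycles / MHz gives microseconds; multiply by 1000 for nanoseconds.
    return cl_cycles / clock_mhz * 1000.0


ddr3_2133_cl9 = cas_latency_ns(2133, 9)
ddr4_2400_cl15 = cas_latency_ns(2400, 15)
ddr4_3600_cl15 = cas_latency_ns(3600, 15)

for name, ns in [("DDR3-2133 CL9", ddr3_2133_cl9),
                 ("DDR4-2400 CL15", ddr4_2400_cl15),
                 ("DDR4-3600 CL15", ddr4_3600_cl15)]:
    print(f"{name}: {ns:.1f} ns")
```

The same cycle count at a higher clock takes less wall-clock time, and a higher cycle count only hurts if the clock does not rise enough to compensate. Note that this covers column access only, not total memory latency.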

Just stop talking. Higher clocks do not mean lower latency. That's total crap.
Before you call something total crap I suggest you read up and learn about the subject. Have a nice day
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Interposers are not very expensive, so we can please drop that recurring statement. Passive interposers are very simple (copper conductors), made on old processes and I doubt there is a yield problem. A 2012 article.

http://electroiq.com/blog/2012/12/lifting-the-veil-on-silicon-interposer-pricing/
"Sesh Ramaswami, managing director at Applied Materials, showed a cost analysis which resulted in 300mm interposer wafer costs of $500-$650 / wafer. His cost analysis showed the major cost contributors are damascene processing (22%), front pad and backside bumping (20%), and TSV creation (14%).

Ramaswami noted that the dual damascene costs have been optimized for front-end processing, so there is little chance of cost reduction there; whereas cost of backside bump could be lowered by replacing polymer dielectric with oxide, and the cost of TSV formation can be addressed by increasing etch rate, ECD (plating) rate, and increasing PVD step coverage.

Since one can produce ~286 200mm2 die on a 300mm wafer, at $575 (his midpoint cost) per wafer, this results in a $2 200mm2 silicon interposer."



HBM stacks can be costly due to manufacturing difficulties. You have to assemble 5 die at a minimum to very small tolerances, with thousands of micro-bumps at 55 micrometer spacing. An error at any one stage wastes all 5 die. This is what I believe is restraining quicker adoption.
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,153
136
Interposers are not very expensive, so we can please drop that recurring statement. Passive interposers are very simple (copper conductors), made on old processes and I doubt there is a yield problem. A 2012 article.

http://electroiq.com/blog/2012/12/lifting-the-veil-on-silicon-interposer-pricing/
"Sesh Ramaswami, managing director at Applied Materials, showed a cost analysis which resulted in 300mm interposer wafer costs of $500-$650 / wafer. His cost analysis showed the major cost contributors are damascene processing (22%), front pad and backside bumping (20%), and TSV creation (14%).

Ramaswami noted that the dual damascene costs have been optimized for front-end processing, so there is little chance of cost reduction there; whereas cost of backside bump could be lowered by replacing polymer dielectric with oxide, and the cost of TSV formation can be addressed by increasing etch rate, ECD (plating) rate, and increasing PVD step coverage.

Since one can produce ~286 200mm2 die on a 300mm wafer, at $575 (his midpoint cost) per wafer, this results in a $2 200mm2 silicon interposer."



HBM stacks can be costly due to manufacturing difficulties. You have to assemble 5 die at a minimum to very small tolerances, with thousands of micro-bumps at 55 micrometer spacing. An error at any one stage wastes all 5 die. This is what I believe is restraining quicker adoption.
Aye, the interposer itself isn't too costly. The problem is assembly. Fail to properly connect one of the thousands of micro bumps and the entire chip, including memory and GPU, becomes junk.

Mind you, the interposer for Fiji was around 1000mm^2, while for Vega it should be 800mm^2 ish. This puts the cost per die with perfect yields at around 10 dollars a pop. Not impossible to manage but also not very fun for margins.
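A rough way to reproduce that estimate is sketched below. It assumes the $575 midpoint wafer cost from the quoted article, perfect yield, and a common textbook approximation for dies per wafer (gross area count minus an edge-loss term). That approximation is an assumption on my part and not necessarily what the article used; it gives about 306 candidates for a 200mm^2 die rather than the ~286 quoted, so the article's method is evidently more conservative.

```python
import math


def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Approximate whole dies per wafer: area ratio minus an edge-loss term."""
    gross = math.pi * wafer_diameter_mm ** 2 / (4.0 * die_area_mm2)
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(gross - edge_loss)


def cost_per_die(die_area_mm2: float, wafer_cost_usd: float = 575.0) -> float:
    """Cost per interposer assuming perfect yield (every die is good)."""
    return wafer_cost_usd / dies_per_wafer(die_area_mm2)


for area in (200, 800, 1000):
    print(f"{area} mm^2: {dies_per_wafer(area)} per wafer, "
          f"${cost_per_die(area):.2f} each")
```

Under these assumptions an 800mm^2 interposer lands at roughly $9 and a 1000mm^2 one at just under $12, which is in line with the ballpark above. Real costs would be higher once yield and assembly losses are included.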
 

eek2121

Diamond Member
Aug 2, 2005
3,051
4,276
136
You just posted more speculation...unless you have sources inside AMD?


That much is true...

Again, you don't know what AMD's thinking is here, so it is more speculation.
For all we know, this whole fiasco could just be a stopgap measure to hold them off until Navi arrives. It was too late to pull the plug, and Navi can't arrive fast enough, so they were stuck with Vega.

What part of my post is speculation? The only part that was speculation is the part I said had healthy amounts of speculation. Everything I stated about RX Vega came DIRECTLY from comments that AMD has made (reddit AMA, twitter comments from Raja, AMD earnings report).
 
Reactions: tonyfreak215

Zstream

Diamond Member
Oct 24, 2005
3,396
277
136
The amount of cycles is higher, but it's made up for by higher clocks. Notice how the CL number after the speed tends to be higher with DDR4. That's the CAS latency, and the lower the number the better. But higher clocks means you can go through those cycles faster, negating the penalty.
Early DDR4 indeed had higher overall latency due to higher CAS latency while not being very fast (1200MHz or so).

DDR3 at 2133 with CAS latency of 9 would have lower latency than DDR4 at 2400 with CAS latency of 15. 8.4ns vs 12.5ns for column access.

Increasing DDR4 speed to 3600 while maintaining the same CAS latency would reduce it to 8.3ns.


Before you call something total crap I suggest you read up and learn about the subject. Have a nice day

You're just making a fool of yourself. You stated higher clocks mean less latency. That's one of the most unintelligent technical comments I've seen on this forum in a while.

Higher clocks do NOT = less latency. Your backtracking on the issue is laughable.
 