[WCCF] AMD Radeon R9 390X Pictured

Status
Not open for further replies.

Cloudfire777

Golden Member
Mar 24, 2013
1,787
95
91
"Process doesnt matter"?
lol seriously? I bet $50 that AMD wouldn't have wanted to build this 500-600mm2 Fiji if they had another process where they could go down in size

70x70 isn't the upper limit for the silicon. It's much lower, since it includes the border around the package.

Keep hoping for 8 stacks, but there are too many potential issues with it, so I don't think it will happen. Read the HBM slides over time. Increasing height and stacks are the only way, and the only options that have been presented.

35mm2 x 8 for HBM is an extra 280mm2. Unlike GDDR5, they will be part of the silicon and the TSV. It's not sitting on a big PCB like current cards, so you can't compare available space like that
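
As a rough check on the area arithmetic in this post, here is a minimal Python sketch. The 35mm2 per stack figure comes from the post; the 550mm2 die is an assumed midpoint of the 500-600mm2 range, and the function names are made up for illustration.

```python
# Rough silicon-area budget for HBM stacks placed next to the GPU die.
STACK_AREA_MM2 = 35    # per-stack footprint quoted in the post
ASSUMED_DIE_MM2 = 550  # assumption: midpoint of the 500-600mm2 range


def hbm_area(stacks: int) -> int:
    """Total footprint of the HBM stacks in mm2."""
    return stacks * STACK_AREA_MM2


def total_area(stacks: int, die_mm2: int = ASSUMED_DIE_MM2) -> int:
    """GPU die plus HBM stacks in mm2 (ignores spacing and keep-out zones)."""
    return die_mm2 + hbm_area(stacks)


if __name__ == "__main__":
    for stacks in (4, 8):
        print(stacks, "stacks:", hbm_area(stacks), "mm2 of HBM,",
              total_area(stacks), "mm2 including the die")
```

This only sums footprints. It says nothing about spacing between components or how much of the package is actually usable.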
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
Seems reasonable for NV to be testing working boards for a year or more, given the product.
 

Azix

Golden Member
Apr 18, 2014
1,438
67
91
"Process doesnt matter"?
lol seriously? I bet $50 that AMD wouldn't have wanted to build this 500-600mm2 Fiji if they had another process where they could go down in size

70x70 isn't the upper limit for the silicon. It's much lower, since it includes the border around the package.

Keep hoping for 8 stacks, but there are too many potential issues with it, so I don't think it will happen. Read the HBM slides over time. Increasing height and stacks are the only way, and the only options that have been presented.

35mm2 x 8 for HBM is an extra 280mm2. Unlike GDDR5, they will be part of the silicon and the TSV. It's not sitting on a big PCB like current cards, so you can't compare available space like that

Not really getting your reasoning. Based on the dimensions and areas, it seems there is tons of space on the whole package (GPU, interposer and HBM + whatever else). The interposer will be acting as what a PCB normally would. That area seems potentially large enough for 8. And actually, if you can have 4 you can have 8 if you're dealing with a square area and mostly square objects. 2 on 2 sides means you can have 2 on 4 sides. 2 at the corners means you can have 1 between them.

In terms of area it seems a done deal, like what was said in that interview. You can put more than four.

Seems reasonable for NV to be testing working boards for a year or more, given the product.

Maybe but then why assume things are accurately dimensioned. It doesn't even have power connectors or a slot to interface with a PC.

I would think this below was enough to show that more stacks are possible.

“You’re not limited in this world [die stacking world] to any number of stacks, but from a capacity point of view, this generation-one HBM, each DRAM is a two-gigabit DRAM, so yeah, if you have four stacks you’re limited to four gigabytes. You could build things with more stacks, you could build things with less stacks. Capacity of the frame buffer is just one of our concerns. There are many things you can do to utilise that capacity better. So if you have four stacks you’re limited to four [gigabytes], but we don’t really view that as a performance limitation from an AMD perspective.”

Read more: http://wccftech.com/amd-addresses-capacity-limitation-concern-hbm/#ixzz3aigCvR2v
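
The capacity figure in that quote can be reproduced with a short sketch. The 2 gigabit per DRAM die comes from the quote; the 4 dies per stack is an assumption (the quote does not state the stack height), and the function name is hypothetical.

```python
# Frame buffer capacity as a function of HBM stack count.
GBIT_PER_DIE = 2    # from the quote: each DRAM is a two-gigabit DRAM
DIES_PER_STACK = 4  # assumption: four DRAM dies in each stack
BITS_PER_BYTE = 8


def capacity_gbytes(stacks: int,
                    dies_per_stack: int = DIES_PER_STACK,
                    gbit_per_die: int = GBIT_PER_DIE) -> float:
    """Total capacity in gigabytes for the given number of stacks."""
    return stacks * dies_per_stack * gbit_per_die / BITS_PER_BYTE


if __name__ == "__main__":
    for stacks in (4, 8):
        print(stacks, "stacks ->", capacity_gbytes(stacks), "GB")
```

Under those assumptions four stacks give 4 GB, matching the quote, and capacity scales linearly with the stack count.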
 

kagui

Member
Jun 1, 2013
78
0
0
i think the problem is the routing
1024 pins per stack
1024 * 4 = 4096 traces

now double that and you get

8192 traces, that have to be routed in a 70mm x 70mm square

and probably also doubling power consumption
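
The trace count above is a simple multiplication, sketched here in Python. The 1024 pins per stack is the figure from this post; it covers only those pins, so any extra control or power connections are not counted. The function name is made up for illustration.

```python
# Trace count between the GPU and its HBM stacks, using the post's figure.
PINS_PER_STACK = 1024  # from the post


def traces(stacks: int, pins_per_stack: int = PINS_PER_STACK) -> int:
    """Number of traces needed if every pin gets its own trace."""
    return stacks * pins_per_stack


if __name__ == "__main__":
    for stacks in (4, 8):
        print(stacks, "stacks ->", traces(stacks), "traces")
```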
 

boozzer

Golden Member
Jan 12, 2012
1,549
18
81
Have you read the documentation of Mantle and OpenCL 2.0?

HBM looks like it was designed exactly for that.

Second thing is that OoOE is only about how the data is handled. It's not STORED in VRAM, it's a constant stream of data to the core.

THAT makes a gigantic difference. Let's just say that with this ability, the Hawaii chip would be faster than the GTX 980.
I love how you got ignored
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
no, he didn't. It was questioned whether that meant anything in the grand scheme of things since system RAM will still be a bottleneck.

And in that I think we both agree. Unless the System RAM will become HBM or HMC.

Then it will not be an issue. But that is a totally different story.
 

RampantAndroid

Diamond Member
Jun 27, 2004
6,591
3
81
Maybe but then why assume things are accurately dimensioned. It doesn't even have power connectors or a slot to interface with a PC.

You don't see the opposite side of the board, where there are likely pins or pads. The GPU itself sits on top of the green PCB; you see no connectors, but they are on the bottom side.......

And in that I think we both agree. Unless the System RAM will become HBM or HMC.

Then it will not be an issue. But that is a totally different story.

I think bottleneck is both the system RAM speed *and* the interlink speed (in this case PCIe) - neither will match the speeds of the cache that is on the card, either due to latency, interlink speed or simply contention. It's why I question whether saying "4GB is enough" is a good idea. I think that's what you say when you're unable to go above 4GB, and know that you probably should be going above 4GB, if that makes any sense. I think AMD would be going to 6GB if it were an option, to avoid any driver dodginess.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
Or it would make sense if the application were able to store big, less important data in System RAM, and the most important data executed in VRAM. The emphasis on stored vs. executed is important.

But for that, the application must know where the data is, and which data is more and less important, therefore: drivers. That is my opinion.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Not really getting your reasoning. Based on the dimensions and areas, it seems there is tons of space on the whole package (GPU, interposer and HBM + whatever else). The interposer will be acting as what a PCB normally would. That area seems potentially large enough for 8. And actually, if you can have 4 you can have 8 if you're dealing with a square area and mostly square objects. 2 on 2 sides means you can have 2 on 4 sides. 2 at the corners means you can have 1 between them.

In terms of area it seems a done deal, like what was said in that interview. You can put more than four.



Maybe but then why assume things are accurately dimensioned. It doesn't even have power connectors or a slot to interface with a PC.

I would think this below was enough to show that more stacks are possible.

I wish you good luck in presenting this and being taken seriously.

First you will be told 4 stacks is the limit, and then if you show why that is wrong [per your statement from AMD's Joe Macri], you will be told about complexity and size on the interposer, disregarding the fact that the interposer traces are many times smaller than PCB traces and 8192 can fit if needed. Or you will be told that the interposer is too small, which I see you have already disproved.
 

RampantAndroid

Diamond Member
Jun 27, 2004
6,591
3
81
Or it would make sense if the application were able to store big, less important data in System RAM, and the most important data executed in VRAM. The emphasis on stored vs. executed is important.

But for that, the application must know where the data is, and which data is more and less important, therefore: drivers. That is my opinion.

I suggest you read the article I linked to on MSDN. This is already what goes on. What I am questioning is what resources sit in VRAM and how much is needed in a given period of time.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
You don't see the opposite side of the board, where there are likely pins or pads. The GPU itself sits on top of the green PCB; you see no connectors, but they are on the bottom side.......



I think bottleneck is both the system RAM speed *and* the interlink speed (in this case PCIe) - neither will match the speeds of the cache that is on the card, either due to latency, interlink speed or simply contention. It's why I question whether saying "4GB is enough" is a good idea. I think that's what you say when you're unable to go above 4GB, and know that you probably should be going above 4GB, if that makes any sense. I think AMD would be going to 6GB if it were an option, to avoid any driver dodginess.

Once you use the 4GB and go through the PCIe bus, you are limited by the slowest link in the chain. The solution has to be making 4GB seem to be more by using it more efficiently, packing-wise.

Do you think this could be one way to better utilize cache space?

http://www.memcon.com/pdfs/proceedings2014/NET104.pdf

Pg 26
Pseudo channel has reduced page size compared to Legacy mode. : 2KB -> 1KB
 

Abwx

Lifer
Apr 2, 2011
11,166
3,862
136
What I am questioning is what resources sit in VRAM and how much is needed in a given period of time.

And how they are needed: is the bus used in a bidirectional fashion, is the GPU mainly reading from the memory pools, or are there substantial amounts of writes being performed...

Depending on the answers, parameters such as latencies could be instrumental and explain the need for huge frame buffers and caching techniques.
 

twjr

Senior member
Jul 5, 2006
627
207
116
My hype just increased for the 390X as the Nvidia PR campaign begins: http://www.techpowerup.com/212724/its-now-been-over-160-days-since-a-catalyst-whql-release.html

No word when AMD went 3 months without a driver (14.12 in Dec to 15.3 in March). Just waiting for pcper and others to do the same....

Wow that is a pretty bad article from a tech site. Scaremongering at best. Also pretty amusing that they are using driver releases as a means to predict future support given how well GCN has aged vs Kepler.
 

SimianR

Senior member
Mar 10, 2011
609
16
81
My hype just increased for the 390X as the Nvidia PR campaign begins: http://www.techpowerup.com/212724/its-now-been-over-160-days-since-a-catalyst-whql-release.html

No word when AMD went 3 months without a driver (14.12 in Dec to 15.3 in March). Just waiting for pcper and others to do the same....

The whole releasing a driver for the sake of releasing a driver every time a game comes out seems silly. Clearly AMD needs driver improvements for Project CARS and they missed the mark on that. But most of the benchmarks were showing the R9 290/290X around the 970 performance level when HairWorks was turned off in Witcher 3. It almost seems like a psychological thing: had AMD just taken the 15.4.1 beta driver, changed it to .2 and called it "Witcher 3 game ready driver", you wouldn't have all these people whining that they didn't release anything. But I do think they should be taken to task over the fact that Crossfire profiles and support are lacking and some games go months without any support.
 

MrTeal

Diamond Member
Dec 7, 2003
3,584
1,743
136
I wish you good luck in presenting this and being taken seriously.

First you will be told 4 stacks is the limit, and then if you show why that is wrong [per your statement from AMD's Joe Macri], you will be told about complexity and size on the interposer, disregarding the fact that the interposer traces are many times smaller than PCB traces and 8192 can fit if needed. Or you will be told that the interposer is too small, which I see you have already disproved.

The physical arrangement limit seems silly, especially since in that image of the HBM die you can see the edge of the chip in the bottom left, then some capacitors on the PCB substrate, then a 5x7mm HBM sitting on the silicon interposer next to the middle of the die. What's the spacing to the next HBM in the top left, 5mm or so? If a 550mm^2 Fiji die was 20mmx27.5mm, you could pretty easily fit three HBM along the top as 7mm+5mm+7mm+5mm+7mm = 31mm and just have them poking out the corners.

That chip could very likely just be a process test piece and not actually Fiji, but there's no real physical constraint to fitting 8 stacks around a GPU.
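
The edge-fit estimate above can be written out as a small sketch. The 7mm stack length, 5mm gap and 20mm x 27.5mm die are the figures from this post (the die dimensions are themselves a guess there); the function names are hypothetical.

```python
# Does a row of HBM stacks fit along one edge of the GPU die?
STACK_LEN_MM = 7.0   # long side of a 5x7mm stack, from the post
GAP_MM = 5.0         # estimated spacing between stacks, from the post
DIE_EDGE_MM = 27.5   # long edge of the assumed 20mm x 27.5mm die


def row_length(stacks: int, stack_len: float = STACK_LEN_MM,
               gap: float = GAP_MM) -> float:
    """Length of a row of stacks with a gap between each neighbouring pair."""
    if stacks <= 0:
        return 0.0
    return stacks * stack_len + (stacks - 1) * gap


def overhang(stacks: int, die_edge: float = DIE_EDGE_MM) -> float:
    """How far the row extends past the die edge in total (0 if it fits)."""
    return max(0.0, row_length(stacks) - die_edge)


if __name__ == "__main__":
    print("row of 3:", row_length(3), "mm, overhang:", overhang(3), "mm")
```

With three stacks the row is longer than the die edge, which is the "poking out the corners" case described in the post.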
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
The lower right pic would only need 8 chips with the newer Samsung high density GDDR5 chips.
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
Not if you want to keep the 512-bit bus.

Well, it wouldn't be a 290X, but a new card using the new faster and higher bandwidth GDDR5 chips.

Maybe Fiji will have 4GB HBM and 8GB GDDR5 versions.

Maybe not...
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
Both Micron and Samsung have managed to make the new GDDR5 chips, and I wouldn't be surprised to find them on an NV card soon.
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
i think the problem is the routing
1024 pins per stack
1024 * 4 = 4096 traces

now double that and you get

8192 traces, that have to be routed in a 70mm x 70mm square

and probably also doubling power consumption

Do you even know how many "traces" we have in a 20mm x 20mm square CPU? An interposer is not a PCB
 