[WCCF] AMD Radeon R9 390X Pictured

Cloudfire777 · May 20, 2015

"Process doesnt matter"?
lol seriously? I bet $50 that AMD wouldn't have wanted to build this 500-600mm2 Fiji if they had another process that let them go down in size.

70x70 isn't the upper limit for the silicon. It's much lower, since it includes the border around the package.

Keep hoping for 8 stacks, but there are too many potential issues with it, so I don't think it will happen. Read the HBM slides over time. Increasing height and stacks are the only way and the only options that have been presented.

35mm2 x 8 for HBM is an extra 280mm2. Unlike GDDR5, they will be part of the silicon and the TSV. They're not sitting on a big PCB like on current cards, so you can't compare available space like that.
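
For anyone who wants to check the area arithmetic in this post, here is a minimal sketch. It only uses the figures quoted above (35mm2 per stack, a 500-600mm2 die, a 70x70 package that includes the border). The function names are made up for illustration, and the actual interposer size is not known from the thread, so the package area is just an upper bound.

```python
HBM_STACK_AREA_MM2 = 35      # per-stack footprint quoted in the post
PACKAGE_EDGE_MM = 70         # package edge quoted in the post (includes the border)


def hbm_area_mm2(stacks, stack_area_mm2=HBM_STACK_AREA_MM2):
    """Total footprint of all HBM stacks, in mm^2."""
    return stacks * stack_area_mm2


def total_silicon_mm2(gpu_die_mm2, stacks):
    """GPU die plus HBM stacks; ignores gaps, keep-out zones and routing."""
    return gpu_die_mm2 + hbm_area_mm2(stacks)


package_area = PACKAGE_EDGE_MM ** 2  # upper bound only, not the interposer size
for die in (500, 600):
    for stacks in (4, 8):
        total = total_silicon_mm2(die, stacks)
        print(f"{die} mm2 die + {stacks} stacks = {total} mm2 "
              f"({100 * total / package_area:.0f}% of the {package_area} mm2 package)")
```

This says nothing about routing or spacing, which is where the actual disagreement in the thread lies.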

LTC8K6 · May 20, 2015

Seems reasonable for NV to be testing working boards for a year or more, given the product.

Azix · May 20, 2015

Cloudfire777 said:
"Process doesnt matter"?
lol seriously? I bet $50 that AMD wouldn't have wanted to build this 500-600mm2 Fiji if they had another process that let them go down in size.

70x70 isn't the upper limit for the silicon. It's much lower, since it includes the border around the package.

Keep hoping for 8 stacks, but there are too many potential issues with it, so I don't think it will happen. Read the HBM slides over time. Increasing height and stacks are the only way and the only options that have been presented.

35mm2 x 8 for HBM is an extra 280mm2. Unlike GDDR5, they will be part of the silicon and the TSV. They're not sitting on a big PCB like on current cards, so you can't compare available space like that.

Not really getting your reasoning. Based on the dimensions and areas, there seems to be tons of space for the whole package (GPU, interposer, HBM and whatever else). The interposer will be acting as what a PCB normally would. That area seems potentially large enough for 8. And actually, if you can have 4 you can have 8 if you're dealing with a square area and mostly square objects. 2 on 2 sides means you can have 2 on 4 sides. 2 at the corners means you can have 1 between them.

In terms of area it seems a done deal, as was said in that interview: you can put on more than four.

LTC8K6 said:
Seems reasonable for NV to be testing working boards for a year or more, given the product.

Maybe, but then why assume things are accurately dimensioned? It doesn't even have power connectors or a slot to interface with a PC.

I would think this below was enough to show that more stacks are possible.

“You’re not limited in this world [die stacking world] to any number of stacks, but from a capacity point of view, this generation-one HBM, each DRAM is a two-gigabit DRAM, so yeah, if you have four stacks you’re limited to four gigabytes. You could build things with more stacks, you could build things with less stacks. Capacity of the frame buffer is just one of our concerns. There are many things you can do to utilise that capacity better. So if you have four stacks you’re limited to four [gigabytes], but we don’t really view that as a performance limitation from an AMD perspective.”

Read more: http://wccftech.com/amd-addresses-capacity-limitation-concern-hbm/#ixzz3aigCvR2v
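
The capacity numbers in that quote can be sketched as below. The quote gives 2-gigabit DRAM dies and 4 gigabytes for four stacks, which implies four dies per stack; that `dies_per_stack=4` default is my inference from those numbers, and the function name is hypothetical.

```python
def hbm_capacity_gb(stacks, dies_per_stack=4, die_gbit=2):
    """Frame buffer size in gigabytes.

    die_gbit=2 comes from the quote; dies_per_stack=4 is inferred from
    "four stacks -> four gigabytes" and is an assumption here.
    """
    total_gbit = stacks * dies_per_stack * die_gbit
    return total_gbit / 8  # 8 bits per byte


for stacks in (4, 8):
    print(f"{stacks} stacks -> {hbm_capacity_gb(stacks):.0f} GB")
```

With these assumptions, going past 4GB on generation-one HBM means adding stacks, which is why the stack count keeps coming up in this thread.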


kagui · May 20, 2015

i think the problem is the routing
1024 pins per stack
1024 * 4 = 4096 traces

now double that and you get

8192 traces, that have to be routed in a 70mm x 70mm square

and probably also doubling power consumption
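
A rough sketch of the trace count above, plus one crude way to put a number on the routing density. The 1024 pins per stack is the figure from this post. The pitch estimate assumes every signal of one stack leaves across a single 7mm stack edge on a single routing layer (7mm being the stack dimension mentioned elsewhere in the thread); real designs can use several layers, so treat it as a worst-case illustration and not a statement about the actual interposer.

```python
PINS_PER_STACK = 1024  # per-stack figure given in the post


def data_traces(stacks, pins_per_stack=PINS_PER_STACK):
    """Number of traces if every pin gets its own trace."""
    return stacks * pins_per_stack


def single_layer_pitch_um(traces, edge_mm):
    """Trace pitch in micrometres if all traces cross one edge on one layer.

    Assumption for illustration only: one routing layer, evenly spaced traces.
    """
    return edge_mm * 1000 / traces


for stacks in (4, 8):
    print(f"{stacks} stacks -> {data_traces(stacks)} traces")

pitch = single_layer_pitch_um(PINS_PER_STACK, edge_mm=7.0)
print(f"1024 traces across a 7 mm edge on one layer: {pitch:.1f} um pitch")
```

Note that the per-stack pitch does not change when the stack count doubles; what doubles is the total number of traces arriving at the GPU die.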

boozzer · May 20, 2015

Glo. said:
Have you read documentation of Mantle and OpenCL 2.0?

HBM looks like it was designed exactly for that.

Second thing is that OoOE is only for how the data is handled. It's not STORED in VRAM, it's a constant stream of data to the core.

THAT makes a gigantic difference. Let's just say that with this ability, the Hawaii chip would be faster than the GTX 980.

I love how you got ignored

RampantAndroid · May 20, 2015

boozzer said:
I love how you got ignored

no, he didn't. It was questioned whether that meant anything in the grand scheme of things since system RAM will still be a bottleneck.

Glo. · May 20, 2015

RampantAndroid said:
no, he didn't. It was questioned whether that meant anything in the grand scheme of things since system RAM will still be a bottleneck.

And on that I think we both agree. Unless the system RAM becomes HBM or HMC.

Then it will not be an issue. But that is a totally different story.

RampantAndroid · May 20, 2015

Azix said:
Maybe but then why assume things are accurately dimensioned. It doesn't even have power connectors or a slot to interface with a PC.

You don't see the opposite side of the board, where there are likely pins or pads. The GPU itself sits on top of the green PCB; you see no connectors, but they are on the bottom side.......

Glo. said:
And on that I think we both agree. Unless the system RAM becomes HBM or HMC.

Then it will not be an issue. But that is a totally different story.

I think the bottleneck is both the system RAM speed *and* the interlink speed (in this case PCIe) - neither will match the speed of the memory that is on the card, whether due to latency, interlink speed or simply contention. It's why I question whether saying "4GB is enough" is a good idea. I think that's what you say when you're unable to go above 4GB, and know that you probably should be going above 4GB, if that makes any sense. I think AMD would be going to 6GB if it were an option, to avoid any driver dodginess.

Glo. · May 20, 2015

Or it would make sense if the application were able to store big, less important data in system RAM, and have the most important data executed from VRAM. The emphasis on stored vs. executed is important.

But for that, the application must know where the data is, and which data is more or less important; therefore, drivers. That is my opinion.

maddie · May 20, 2015

Azix said:
Not really getting your reasoning. Based on the dimensions and areas, there seems to be tons of space for the whole package (GPU, interposer, HBM and whatever else). The interposer will be acting as what a PCB normally would. That area seems potentially large enough for 8. And actually, if you can have 4 you can have 8 if you're dealing with a square area and mostly square objects. 2 on 2 sides means you can have 2 on 4 sides. 2 at the corners means you can have 1 between them.

In terms of area it seems a done deal, as was said in that interview: you can put on more than four.

Maybe, but then why assume things are accurately dimensioned? It doesn't even have power connectors or a slot to interface with a PC.

I would think this below was enough to show that more stacks are possible.

I wish you good luck in presenting this and being taken seriously.

First you will be told 4 stacks is the limit, and then if you show why that is wrong [per the statement you quoted from AMD's Joe Macri], you will be told about complexity and size on the interposer, disregarding the fact that interposer traces are many times smaller than PCB traces and 8192 can fit if needed. Or you will be told that the interposer is too small, which I see you have already disproved.

RampantAndroid · May 20, 2015

Glo. said:
Or it would make sense if the application were able to store big, less important data in system RAM, and have the most important data executed from VRAM. The emphasis on stored vs. executed is important.

But for that, the application must know where the data is, and which data is more or less important; therefore, drivers. That is my opinion.

I suggest you read the article I linked to on MSDN. This is already what goes on. What I am questioning is what resources sit in VRAM and how much is needed in a given period of time.

maddie · May 20, 2015

RampantAndroid said:
You don't see the opposite side of the board, where there are likely pins or pads. The GPU itself sits on top of the green PCB; you see no connectors, but they are on the bottom side.......

I think the bottleneck is both the system RAM speed *and* the interlink speed (in this case PCIe) - neither will match the speed of the memory that is on the card, whether due to latency, interlink speed or simply contention. It's why I question whether saying "4GB is enough" is a good idea. I think that's what you say when you're unable to go above 4GB, and know that you probably should be going above 4GB, if that makes any sense. I think AMD would be going to 6GB if it were an option, to avoid any driver dodginess.

Once you use up the 4GB and go through the PCIe bus, you are limited by the slowest link in the chain. The solution has to be making 4GB seem like more by using it more efficiently, packing-wise.

Do you think this could be one way to better utilize cache space?

http://www.memcon.com/pdfs/proceedings2014/NET104.pdf

Pg 26
Pseudo channel mode has a reduced page size compared to legacy mode: 2KB -> 1KB

Abwx · May 20, 2015

RampantAndroid said:
What I am questioning is what resources sit in VRAM and how much is needed in a given period of time.

And how they are needed: is the bus used in a bidirectional fashion, is the GPU mainly reading from the memory pools, or are there substantial amounts of writes being performed...

Depending on the answers, parameters such as latency could be instrumental and explain the need for huge frame buffers and caching techniques.

DiogoDX · May 20, 2015

My hype just increased for the 390X as the Nvidia PR campaign begins: http://www.techpowerup.com/212724/its-now-been-over-160-days-since-a-catalyst-whql-release.html

Not a word when AMD went 3 months without a driver (14.12 in Dec to 15.3 in March). Just waiting for pcper and others to do the same....

LTC8K6 · May 20, 2015

But AMD recently said new driver this week...

boozzer · May 20, 2015

LTC8K6 said:
But AMD recently said new driver this week...

I think you don't understand his post

LTC8K6 · May 20, 2015

boozzer said:
I think you don't understand his post

But what if no driver this week...

twjr · May 20, 2015

DiogoDX said:
My hype just increased for the 390X as the Nvidia PR campaign begins: http://www.techpowerup.com/212724/its-now-been-over-160-days-since-a-catalyst-whql-release.html

Not a word when AMD went 3 months without a driver (14.12 in Dec to 15.3 in March). Just waiting for pcper and others to do the same....

Wow, that is a pretty bad article from a tech site. Scaremongering at best. Also pretty amusing that they are using driver releases as a means to predict future support, given how well GCN has aged vs Kepler.

SimianR · May 20, 2015

DiogoDX said:
My hype just increased for the 390X as the Nvidia PR campaign begins: http://www.techpowerup.com/212724/its-now-been-over-160-days-since-a-catalyst-whql-release.html

Not a word when AMD went 3 months without a driver (14.12 in Dec to 15.3 in March). Just waiting for pcper and others to do the same....

The whole idea of releasing a driver for the sake of releasing a driver every time a game comes out seems silly. Clearly AMD needs driver improvements for Project CARS, and they missed the mark on that. But most of the benchmarks were showing the R9 290/290X around the 970 performance level when HairWorks was turned off in Witcher 3. It almost seems like a psychological thing: had AMD just taken the 15.4.1 beta driver, changed it to .2 and called it "Witcher 3 game ready driver", you wouldn't have all these people whining that they didn't release anything. But I do think they should be taken to task over the fact that Crossfire profiles and support are lacking and some games go months without any support.

MrTeal · May 20, 2015

maddie said:
I wish you good luck in presenting this and being taken seriously.

First you will be told 4 stacks is the limit, and then if you show why that is wrong [per the statement you quoted from AMD's Joe Macri], you will be told about complexity and size on the interposer, disregarding the fact that interposer traces are many times smaller than PCB traces and 8192 can fit if needed. Or you will be told that the interposer is too small, which I see you have already disproved.

The physical arrangement limit seems silly, especially since in that image of the HBM die you can see the edge of the chip in the bottom left, then some capacitors on the PCB substrate, then a 5x7mm HBM sitting on the silicon interposer next to the middle of the die. What's the spacing to the next HBM in the top left, 5mm or so? If a 550mm^2 Fiji die was 20mmx27.5mm, you could pretty easily fit three HBM along the top as 7mm+5mm+7mm+5mm+7mm = 31mm and just have them poking out the corners.

That chip could very likely just be a process test piece and not actually Fiji, but there's no real physical constraint to fitting 8 stacks around a GPU.
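
The fit estimate above can be sketched as follows, using only the numbers in this post: 7mm along the long side of a stack, roughly 5mm between stacks, and a hypothetical 20mm x 27.5mm die (550mm^2). The function names are illustrative and the 5mm gap is an eyeballed guess from the photo, not a specification.

```python
def row_length_mm(count, stack_mm=7.0, gap_mm=5.0):
    """Length of a row of `count` stacks with a gap between neighbours."""
    if count <= 0:
        return 0.0
    return count * stack_mm + (count - 1) * gap_mm


def overhang_mm(count, die_edge_mm, stack_mm=7.0, gap_mm=5.0):
    """How far the row sticks out past the die edge in total (0 if it fits)."""
    return max(0.0, row_length_mm(count, stack_mm, gap_mm) - die_edge_mm)


die_w, die_h = 20.0, 27.5
print(f"die area: {die_w * die_h} mm^2")
print(f"3 stacks in a row: {row_length_mm(3)} mm")
print(f"overhang past the 27.5 mm edge: {overhang_mm(3, die_h)} mm")
```

Three stacks along the 27.5mm edge overhang by 3.5mm in total, so about 1.75mm at each corner if centred.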

LTC8K6 · May 20, 2015

The lower right pic would only need 8 chips with the newer Samsung high density GDDR5 chips.

MrTeal · May 20, 2015

LTC8K6 said:
The lower right pic would only need 8 chips with the newer Samsung high density GDDR5 chips.

Not if you want to keep the 512-bit bus.
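
To spell out the bus-width point, here is a small sketch. It assumes each GDDR5 chip contributes a 32-bit interface, which is the usual configuration; that per-chip width is my assumption and is not stated in the thread.

```python
BITS_PER_GDDR5_CHIP = 32  # assumption: each chip runs in its normal x32 mode


def bus_width_bits(chips, bits_per_chip=BITS_PER_GDDR5_CHIP):
    """Total memory bus width for a given chip count."""
    return chips * bits_per_chip


def chips_for_bus(bus_bits, bits_per_chip=BITS_PER_GDDR5_CHIP):
    """Chips needed to populate a bus of the given width."""
    return bus_bits // bits_per_chip


print(f"8 chips  -> {bus_width_bits(8)}-bit bus")
print(f"512-bit bus -> {chips_for_bus(512)} chips")
```

Under that assumption, 8 higher-density chips would keep the capacity but halve the bus to 256-bit.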

LTC8K6 · May 20, 2015

MrTeal said:
Not if you want to keep the 512-bit bus.

Well, it wouldn't be a 290X, but a new card using the new faster and higher bandwidth GDDR5 chips.

Maybe Fiji will have 4GB HBM and 8GB GDDR5 versions.

Maybe not...

LTC8K6 · May 20, 2015

Both Micron and Samsung have managed to make the new GDDR5 chips, and I wouldn't be surprised to find them on an NV card soon.

Erenhardt · May 21, 2015

kagui said:
i think the problem is the routing
1024 pins per stack
1024 * 4 = 4096 traces

now double that and you get

8192 traces, that have to be routed in a 70mm x 70mm square

and probably also doubling power consumption

Do you even know how many "traces" we have in a 20mm x 20mm square CPU? An interposer is not a PCB.
