1000 shader APU - when?

Mac29

Member
Jun 2, 2010
131
12
81
Not finding anything on this, even on other sites.

Can anyone venture a guess how long it may take for AMD to make an
APU that delivers 1000 shaders? Their next iteration is supposed to still
use DDR3 so I'm thinking sometime when DDR4 is mainstream but when?


Thanks,

Mac
 

lyssword

Diamond Member
Dec 15, 2005
5,630
25
91
Yeah, I think it's a memory issue. I think even 512 cores are currently limited by slow RAM.
 

Blitzvogel

Platinum Member
Oct 17, 2010
2,012
23
81
And probably a die shrink or even two. Stacked memory plus twice the number of current shaders would be a huge, hot chip.

Taking the PS4 APU as a starting point, you could replace the 2 Jaguar clusters with two Carrizo modules, have the full 20 CUs or go with 16 CUs (1024 Shaders), implement 2 GB on board HBM and also have an external DDR4 interface and do it on 20 nm. It would be a compelling low end gaming APU.
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
Taking the PS4 APU as a starting point, you could replace the 2 Jaguar clusters with two Carrizo modules, have 16 CUs (1024 Shaders), implement 2 GB on board HBM and also have an external DDR4 interface and do it on 20 nm. It would be a compelling low end gaming APU.


You could put a lot of Puma cores in place of a Carrizo module.
 

Roland00Address

Platinum Member
Dec 17, 2008
2,196
260
126
Let me put it this way: there is no real-world difference between the 384-shader parts and the 512-shader parts. Sure, there is a small percentage increase in fps, but the games run at the same playable settings.

The reason is that they are memory-bandwidth starved.

Since AMD groups its shaders in units of 128 (384 is 128*3, 512 is 128*4), they would not do 1000 shaders but 1024 instead.

So until they can fix the memory bandwidth issue, why would they increase the shaders from 384 to 1024, a factor of 8/3 (about 2.67x, i.e. a 167% increase)? No one would double the number of shaders, let alone nearly triple them, until the memory bandwidth issue is fixed.
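A rough sketch of that arithmetic, for anyone who wants to play with the numbers. The function names are made up for illustration, and the dual-channel DDR3-2133 (128-bit total) memory setup is an assumed example configuration, not something stated in this thread.

```python
# Rough sketch: per-shader memory bandwidth as the shader count grows while
# the memory system stays the same.
# ASSUMPTION: dual-channel DDR3-2133 (2 x 64-bit), picked only as an example.

def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_sec: float) -> float:
    """Peak bandwidth in GB/s = bytes per transfer * transfers per second."""
    return bus_width_bits / 8 * mega_transfers_per_sec / 1000


def scaling_factor(old_shaders: int, new_shaders: int) -> float:
    """How many times larger the new shader count is than the old one."""
    return new_shaders / old_shaders


ddr3_gbs = peak_bandwidth_gbs(128, 2133)   # shared by CPU and GPU

factor = scaling_factor(384, 1024)         # 8/3
percent_increase = (factor - 1) * 100      # increase over the 384-shader part

print(f"384 -> 1024 shaders: {factor:.2f}x ({percent_increase:.0f}% increase)")
for shaders in (384, 512, 1024):
    per_shader_mbs = ddr3_gbs * 1000 / shaders
    print(f"{shaders:5d} shaders: {per_shader_mbs:6.1f} MB/s of peak bandwidth each")
```

With the memory system held constant, each shader in a 1024-shader part would get 3/8 of the peak bandwidth a shader gets in the 384-shader part, which is the point being made above.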
 

Yuriman

Diamond Member
Jun 25, 2004
5,530
141
106
Unfortunately, building a big, complex memory bus that is connected to both the CPU and GPU is expensive. It's far cheaper and easier to give a large and powerful GPU its own memory bus, considering how different and non-overlapping the bandwidth needs of CPUs and large GPUs are. I expect that beyond the point of providing ample memory bandwidth to the CPU, APUs will always be more expensive than a discrete CPU and GPU at a given performance level, making them uneconomical until we start to see true heterogeneous computing.

So far there have been very few advantages of sticking a CPU and GPU on the same chip aside from package size.
 

SPBHM

Diamond Member
Sep 12, 2012
5,058
410
126
question, does Kaveri use similar memory bandwidth compression methods to Maxwell? because it could bring nice gains keeping the same number of ALUs and memory bandwidth...
 

jpiniero

Lifer
Oct 1, 2010
15,115
5,667
136
question, does Kaveri use similar memory bandwidth compression methods to Maxwell? because it could bring nice gains keeping the same number of ALUs and memory bandwidth...

I think AMD only introduced the memory compression with Tonga.
 

el etro

Golden Member
Jul 21, 2013
1,581
14
81
TSMC's 16nm FinFET process claims a power reduction of above 100% over the current 28nm process that Kaveri and probably Carrizo are based on. So on this process you will probably see an APU with this kind of power.
 

el etro

Golden Member
Jul 21, 2013
1,581
14
81
question, does Kaveri use similar memory bandwidth compression methods to Maxwell? because it could bring nice gains keeping the same number of ALUs and memory bandwidth...

Carrizo will surely use it; the Carrizo GPU will be Tonga-based.
 

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
I think AMD only introduced the memory compression with Tonga.

No, they had it before, but this is a new, better version. It's good but not as good as Maxwell's.

R9-285 - 176 GB/sec.
980 - 224 GB/sec



27% more bandwidth, 47% more fill. I think Nvidia's in-game advantage is greater, given that the 980 is ~75% faster in games (1080p or 1440p).
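For reference, the two bandwidth figures and the 27% gap can be reproduced from bus width and memory data rate. This is only a sketch: the function name is made up, and the 256-bit bus widths and 5.5 / 7 Gbps per-pin rates are the ones quoted from the AnandTech reviews further down this thread.

```python
# Peak memory bandwidth from bus width and effective per-pin data rate.
# ASSUMPTION: 256-bit buses at 5.5 and 7 Gbps per pin, taken from the
# review figures quoted later in this thread.

def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """GB/s = (bus width in bits / 8) * per-pin data rate in Gbps."""
    return bus_width_bits / 8 * data_rate_gbps


r9_285 = peak_bandwidth_gbs(256, 5.5)
gtx_980 = peak_bandwidth_gbs(256, 7.0)
advantage_pct = (gtx_980 / r9_285 - 1) * 100

print(f"R9 285:  {r9_285:.0f} GB/s")
print(f"GTX 980: {gtx_980:.0f} GB/s")
print(f"980 bandwidth advantage: {advantage_pct:.0f}%")
```

These are peak numbers only; they say nothing about how well either architecture uses the bandwidth, which is the actual argument here.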

http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_980/26.html

Toms, take it or leave it.

http://www.tomshardware.com/reviews/graphics-performance-myths-debunked,3739-4.html

Essentially bandwidth utilization; the blue line is when the 750 Ti is normalized to the 650 Ti's performance.



Maxwell is roughly twice as efficient as Kepler in terms of bandwidth.


I realize that was a little off topic, but it shows what is possible and is something AMD needs to jump on if they want to improve their APUs. Better bandwidth utilization is possible, but work needs to be done. With Maxwell-like memory efficiency they could probably feed twice as many shaders (768). 1024 would probably require a bigger bus or HBM.

Another way to improve performance is to allow the IGP to access cache like Intel does.
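To put numbers on that guess, here is a toy model. It assumes (and this is a big simplification, not a measured result) that the number of shaders you can keep fed scales linearly with effective bandwidth, i.e. raw bandwidth times an efficiency factor. Both helper names are hypothetical.

```python
# Toy model: shaders that can be kept fed scale linearly with effective
# bandwidth (raw bandwidth * efficiency gain). This linearity is an
# ASSUMPTION for illustration, not a benchmark result.

def feedable_shaders(baseline_shaders: int,
                     efficiency_gain: float,
                     bandwidth_gain: float = 1.0) -> int:
    """Shader count that gets the same effective bandwidth per shader."""
    return round(baseline_shaders * efficiency_gain * bandwidth_gain)


def required_bandwidth_gain(baseline_shaders: int,
                            target_shaders: int,
                            efficiency_gain: float) -> float:
    """Raw bandwidth multiplier still needed after the efficiency gain."""
    return target_shaders / (baseline_shaders * efficiency_gain)


# 384 shaders today, 2x efficiency as estimated for Maxwell vs. Kepler
print(feedable_shaders(384, 2.0))
# Extra raw bandwidth needed to reach 1024 shaders on top of that
print(f"{required_bandwidth_gain(384, 1024, 2.0):.2f}x")
```

Under this model, 2x efficiency covers 768 shaders, and 1024 still needs about a third more raw bandwidth, hence the bigger bus or HBM.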

http://www.notebookcheck.net/Performance-and-Scaling-Overview-of-Intel-HD-Graphics-4000.82847.0.html

HD 4000



Would be much more noticeable at AMD's levels of performance.
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
Isn't it the case that 4 Jaguar/Puma cores will fit in a Steamroller module? Seems like a good trade-off when you have much faster speeds in modules.

The Jaguar cores in Kabini are listed as 3.1mm2, and AMD is quoting that four of these cores will fit into a single Steamroller module. Unfortunately the dimensions of a Steamroller module are not known - a 32nm SOI Bulldozer module clocked in at 30.9 mm2 for example, but no equivalent number is available for 28nm Steamroller. However some quick math shows four Jaguar cores populates 12.4 mm2. This leaves the rest of the core for the L2 cache, IGP and a large amount of IO.
I guess the uncore will also fit into the module's footprint.
http://www.anandtech.com/show/8067/amd-am1-kabini-part-2-athlon-53505150-and-sempron-38502650-tested
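The quick math in that quote, as a sketch. The constant names are made up; the areas are the ones from the quote. Note the only module figure available is the 32nm Bulldozer one, so the ratio mixes process nodes and is just a ballpark.

```python
# Die-area arithmetic from the quoted AnandTech text.
# CAVEAT: the 30.9 mm^2 figure is a 32nm SOI Bulldozer module; the quote says
# no 28nm Steamroller number is available, so this ratio is only a ballpark.

JAGUAR_CORE_MM2 = 3.1              # one 28nm Jaguar core, per the quote
BULLDOZER_MODULE_32NM_MM2 = 30.9   # one 32nm Bulldozer module, per the quote

four_cores_mm2 = 4 * JAGUAR_CORE_MM2
share_of_module = four_cores_mm2 / BULLDOZER_MODULE_32NM_MM2

print(f"Four Jaguar cores: {four_cores_mm2:.1f} mm^2")
print(f"Share of a 32nm Bulldozer module: {share_of_module:.0%}")
```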
 

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
r9-285 256-bit 5.5GHz [ http://www.anandtech.com/show/8460/amd-radeon-r9-285-review ]
gtx 980 256-bit 7GHz [ http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review ]

how did you get to the conclusion that the compression tech is better on maxwell?

I explained the better colour fill above. Tom's shows 2x bandwidth efficiency, which tracks: the 750M DDR3 is noticeably BW limited, yet with the 850M DDR3 Nvidia managed to squeeze out +70% on a small BW increase (900 MHz DDR3 to 1000 MHz DDR3). The 980 is also 75% more powerful while only boasting 27% more bandwidth.

Also, core vs. core comparisons are nice, but it's really core + cache, especially as cache designs and amounts can change between designs: 4 Jaguar cores with 2 MB L2 or 2 SR modules with 4 MB L2.

 

NostaSeronx

Diamond Member
Sep 18, 2011
3,705
1,231
136
2016 APU Excavator+ - 1024 GCNX cores
2016 APU Cheetah - 1024 GCNX cores

Excavator+ = 14-nm successor to Excavator(28-nm)
Cheetah = 14-nm successor to Puma(28-nm)/Puma+(20-nm)

Zen and K12 have not been finished; anything else is the usual bs from Sunnyvale.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
2016 APU Excavator+ - 1024 GCNX cores
2016 APU Cheetah - 1024 GCNX cores

Excavator+ = 14-nm successor to Excavator(28-nm)
Cheetah = 14-nm successor to Puma(28-nm)/Puma+(20-nm)

Zen and K12 have not been finished; anything else is the usual bs from Sunnyvale.

Look at the irony of it. You are saying that AMD is lying, as if you always speak the truth. :biggrin:

AMD has yet to reveal their 2016 APU roadmap. Right now we only know their 2015 APUs: Carrizo on 28nm and Nolan (x86-64) / Amur (ARMv8) on 20nm.

On topic, I expect AMD's 2016 FinFET APUs to sport a 1024-shader GCN 2.0 GPU with HBM.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,705
1,231
136
Look at the irony of it. You are saying that AMD is lying, as if you always speak the truth.
I tell what is accurate at the given time. If you don't like it, how about you go searching through PDFs, profiles, commentary, analyst interviews.





AMD has not once released a new platform architecture on time. This has been going on for, um, let's see... nine years.

Any information from Sunnyvale is bad information unless they have given up. When they give up they give their work to the Mile High Design Center or the Boston Design Center.

---
We at least know for a fact Basilisk is Excavator.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
I tell what is accurate at the given time. If you don't like it, how about you go searching through PDFs, profiles, commentary, analyst interviews.

AMD has not once released a new platform architecture on time. This has been going on for, um, let's see... nine years.

Any information from Sunnyvale is bad information unless they have given up. When they give up they give their work to the Mile High Design Center or the Boston Design Center.

We at least know for a fact Basilisk is Excavator.

AMD is paying for its poor execution. But the point is AMD has a chance to rebuild their company with their 2016 CPU architectures. As for AMD's product roadmap, I am not going to enter into an argument with you. You know nothing and just talk rubbish. When AMD discloses their 2016 APU roadmap to the public, that's when it matters. Not some speculation of yours. :whiste:
 

III-V

Senior member
Oct 12, 2014
678
1
41
No, they had it before, but this is a new, better version. It's good but not as good as Maxwell's.

R9-285 - 176 GB/sec.
980 - 224 GB/sec
Enigmoid, Maxwell has twice the number of ROPs. It's not totally surprising that it has higher pixel throughput.
 