Kitguru : Nvidia to release three GeForce GTX 800 graphics cards this October


blackened23

Diamond Member
Jul 26, 2011
8,548
2
0
Sure thing, but RS was talking about the gains from GK110 to GM210, not GM204 over GK110/GK104. I don't believe it is possible to make GM210 on 28nm and get the 90-100% gain RS was discussing, due to die size limitations.

Obviously, the GK104 to GM204 transition doesn't have that problem at 28nm, and Nvidia can increase the die size by ~150mm^2 to achieve 90-100% over GK104.

Oh. If I'm understanding right, yeah, I'd agree that GM204 will not in any way be 90-100% faster than GK110. I never thought GM204 would be that big of an increase, although such a gain is plausible for GM200 on a 20nm node. For the GM204 I'd say 15-30%, but I really don't know. But I do feel it will be an appreciable difference.

I've seen speculation running the gamut, from 15% faster than the Ti for $500 to 30% faster for $600. Who knows. Now if it's 30-35% faster I think the price would be $550-600. I'm also curious whether NV will introduce new software features with 2nd-gen Maxwell; with Kepler they introduced a few new things software-wise, including adaptive vsync. Or maybe they'll just go for performance. Shrug.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
www.facebook.com
But if NV themselves claims 2x the performance/watt for 1st generation (28nm Maxwell 750Ti)

The GTX 750 Ti is indeed 2x perf/watt over the GTX 650 Ti like Nvidia says, but it's not an apples-to-apples comparison because the GTX 650 Ti is not a fully functioning die (it's GK106 with 1 SMX and a memory controller disabled). The most valid comparison is GM107 against GK107. In this regard, the perf/watt improvement is about 70-75%.

For example, if GM200 is a 250W TDP card like the 780 Ti is, then why shouldn't it be 90-100% faster? Simply looking at NV's claimed performance/watt increase, per NV a 100W Maxwell ~ 200W Kepler in performance. We see the 750 Ti delivering 90-100% more performance than a Kepler card with similar power usage. If anything, 16/20nm GM200 would have > 2x the performance/watt over 28nm Kepler.

The rest of what you are saying makes sense. If GM200 is engineered to run at 250 watts like the 780 TI, then it should have the same perf/watt improvement that GM107 has over GK107, right??? I agree in theory.

The big unknown is die size and transistor count though. What resources can be squeezed into the GM200 die given the 28nm density constraint? GM107 is 75% faster than GK107, but it took a 25% larger die, 44% more transistors, and 15% more transistors per mm^2 to realize this performance gain. On the other hand, for all intents and purposes, GK110 is as big as it can get on 28nm. Maybe Nvidia can upsize GM200 to 575mm^2 and increase GM200's transistor density 15% over GK110 (like GM107 over GK107), but then we're looking at 8.5 billion transistors, an increase of only 20% and well short of GM107's 44% increase in transistor count over its predecessor.

It's a conundrum. Maxwell is ripe with potential but there is hardly any die space left to work with. Then again, it might not be much of an issue. We all know GM107's headroom easily exceeds its artificial TDP limit, and as I told you in our PMs I can see GM107 getting rebadged with 7GHz VRAM and noticeably higher clock speeds to bring up low-end performance on the 800 series cards, so it might not actually take that many more transistors over GK110 to get a 70% performance increase....
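
The die-budget arithmetic above can be checked with a quick sketch. The GM107 figures (1.87 billion transistors, 148mm^2) and GK110's 551mm^2 die size are quoted in the KitGuru excerpt later in this thread. The GK107 figures (1.3 billion transistors, 118mm^2) and GK110's 7.08 billion transistor count are commonly published numbers assumed here, not taken from the thread, and the 575mm^2 GM200 is purely the hypothetical from the post.

```python
# Rough die-budget check. Entries marked "assumed" do not come from this thread.
GK107 = {"transistors_b": 1.30, "die_mm2": 118.0}  # assumed published figures
GM107 = {"transistors_b": 1.87, "die_mm2": 148.0}  # from the KitGuru excerpt
GK110 = {"transistors_b": 7.08, "die_mm2": 551.0}  # transistor count assumed


def density(chip):
    """Billions of transistors per mm^2."""
    return chip["transistors_b"] / chip["die_mm2"]


def pct_gain(new, old):
    """Percentage increase of new over old."""
    return (new / old - 1.0) * 100.0


die_gain = pct_gain(GM107["die_mm2"], GK107["die_mm2"])
transistor_gain = pct_gain(GM107["transistors_b"], GK107["transistors_b"])
density_gain = pct_gain(density(GM107), density(GK107))

# Hypothetical 28nm GM200: 575mm^2 at 15% higher density than GK110.
gm200_transistors_b = density(GK110) * 1.15 * 575.0
gm200_gain = pct_gain(gm200_transistors_b, GK110["transistors_b"])

print(f"GM107 vs GK107: die +{die_gain:.0f}%, "
      f"transistors +{transistor_gain:.0f}%, density +{density_gain:.0f}%")
print(f"575mm^2 GM200: {gm200_transistors_b:.1f}B transistors, "
      f"+{gm200_gain:.0f}% over GK110")
```

Under those assumptions the numbers line up with the post: roughly +25% die, +44% transistors and +15% density for GM107, and about 8.5 billion transistors (+20%) for a 575mm^2 GM200.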
 

FatherMurphy

Senior member
Mar 27, 2014
229
18
81
I'm also curious if NV will introduce new features with Maxwell 2nd gen, with Kepler they introduced a few new things software wise including adaptive vsync among other things. Or if they'll just go for performance. Shrug.

Yeah, I'm hoping for the same. Perhaps the "1st Gen" and "2nd Gen" Maxwell distinction exists for a purpose, namely, to emphasize the feature differences. Or, then again, the "1st Gen" and "2nd Gen" distinction might exist simply for marketing purposes, to explain/cover up/mask the long gap between the rollout of GM107 and GM204.

We know GM204 won't have HBM, right? Maybe DirectX 11.2 support (lame). Better 4K support? New SLI (something similar to the XDMA that AMD is having success with)? A return of good, voltage-unlocked overclocking (seems unlikely)?

Like you said... shrug!
 

Mand

Senior member
Jan 13, 2014
664
0
0
So how plausible is this theory?

X99 and Haswell-E are just around the corner, aiming for mid September. People are going to be upgrading to it, people are going to be building entire new rigs around it. They're going to want graphics cards. Doesn't it make sense for Nvidia to time its cards around their release, to snap up the high-end market? Without a new Nvidia card, the people building X99s are going to go with the usual mix of Nvidia and AMD. But if there was a clear winner in a new card, that would dramatically shift for that initial buy-in phase of X99.

It's not as if Nvidia hasn't known when X99 was launching. Is it plausible that they're timing their launch accordingly, rather than just a calculation of when they would need to compete with an AMD launch?
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
Oh. If I'm understanding right, yeah, I'd agree that GM204 will not in any way be 90-100% faster than GK110. I never thought GM204 would be that big of an increase, although such a gain is plausible for GM200 on a 20nm node. For the GM204 I'd say 15-30%, but I really don't know. But I do feel it will be an appreciable difference.

The comparison was GM204 90% faster than GK104, which would make it approximately 34% faster than GK110. In turn GM200 would also be 90% faster than GK110. I don't think anyone expects a 28nm GM204 to beat GK110 by 90-100%.
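
As a quick sanity check on those two figures, here is a minimal sketch that converts a speedup over GK104 into a speedup over GK110. The GK110-over-GK104 ratio is not stated anywhere in the thread; it is derived here from the 90% and 34% numbers above, and the function name is just for illustration.

```python
def relative_to(speedup_over_baseline, reference_over_baseline):
    """Speedup over a reference chip, given both speedups over a common baseline."""
    return speedup_over_baseline / reference_over_baseline


gm204_over_gk104 = 1.90  # "90% faster than GK104" (from the post)
gm204_over_gk110 = 1.34  # "approximately 34% faster than GK110" (from the post)

# Implied GK110 lead over GK104: derived, not quoted from the thread.
gk110_over_gk104 = relative_to(gm204_over_gk104, gm204_over_gk110)
print(f"Implied GK110 vs GK104: {gk110_over_gk104:.2f}x")

# The same conversion for the more conservative 70-75% gain argued earlier.
for gain in (1.70, 1.75):
    vs_gk110 = relative_to(gain, gk110_over_gk104)
    print(f"{gain:.2f}x over GK104 -> {vs_gk110:.2f}x over GK110")
```

So the two numbers imply GK110 is about 1.42x GK104, and a 70-75% gain over GK104 would put GM204 roughly 20-23% ahead of GK110 on the same basis.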

The GTX 750 Ti is indeed 2x perf/watt over the GTX 650 Ti like Nvidia says, but it's not an apples-to-apples comparison because the GTX 650 Ti is not a fully functioning die (it's GK106 with 1 SMX and a memory controller disabled). The most valid comparison is GM107 against GK107. In this regard, the perf/watt improvement is about 70-75%.

OK, but we are comparing performance/watt, not performance per transistor or performance per functioning unit. Even if the GTX 650 Ti were a 33% or even 50% cut-down version of some mythical chip, we are strictly looking at performance for 60W parts, which for Kepler corresponds to W CUDA cores, X ROPs, Y TMUs, and Z memory bandwidth. If GK107 were fully enabled, it would use > 60W, which doesn't help us compare apples-to-apples.

We may not even need to know the specs of GM204. If it has a 195W TDP, it should be 90-100% faster than GK104, since the Maxwell architecture within a 195W power envelope is 2x faster than Kepler is in 195W. So why would we cut that efficiency gain down from 90% to 70-75%?

To get around your dilemma for GM200 vs. GK110 and how to solve the die size and transistor density issue, I think NV will launch GM200 on 16/20nm not on 28nm. That's why they are launching GTX880 now and then another flagship in say 12-15 months, more or less repeating the 680->780Ti move and allowing 20nm yields to mature and wafers to fall in price. Then once GM200 launches on 20nm, GM204 will be refreshed as GTX970 on that node and launched at say $349-399.
 

FatherMurphy

Senior member
Mar 27, 2014
229
18
81
The comparison was GM204 90% faster than GK104, which would make it approximately 34% faster than GK110. In turn GM200 would also be 90% faster than GK110.

Right. As discussed above, there are serious die size impediments to implementing such gains in any actual 28nm GM200. Agreed?
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
www.facebook.com
OK, but we are comparing performance/watt, not performance per transistor or performance per functioning unit. Even if the GTX 650 Ti were a 33% or even 50% cut-down version of some mythical chip, we are strictly looking at performance for 60W parts, which for Kepler corresponds to W CUDA cores, X ROPs, Y TMUs, and Z memory bandwidth. If GK107 were fully enabled, it would use > 60W, which doesn't help us compare apples-to-apples.

It is my opinion that the best way to make comparisons of this type is with fully functioning chips that are direct successors or predecessors of each other. According to TPU, the power usage of the GK107-based GTX 650 is almost identical to that of the GTX 750 Ti. Both parts are fully functioning dies. GM107 is 70-75% more efficient than the chip it replaced, GK107, so it stands to reason IMO that GM204 should be 70-75% more efficient than GK104. But if you think I'm being conservative or making the wrong comparisons, and it turns out you're right, then hooray for us all!
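
Both sides of this disagreement rest on the same relation: performance ratio = perf/watt ratio x power ratio. A minimal sketch, assuming the perf/watt gain measured on small chips carries over unchanged to a 195W part (which nobody in the thread can confirm) and treating a 195W GM204 as the thread's hypothetical:

```python
def speedup(perf_per_watt_gain, new_watts, old_watts):
    """Performance ratio = perf/watt ratio x power ratio."""
    return perf_per_watt_gain * (new_watts / old_watts)


TDP_W = 195.0  # GK104 TDP as used in the thread; GM204 at 195W is hypothetical

scenarios = {
    "GM107 vs GK107, low estimate": 1.70,
    "GM107 vs GK107, high estimate": 1.75,
    "Nvidia's claim (750 Ti vs 650 Ti)": 2.00,
}
for label, gain in scenarios.items():
    print(f"{label}: {speedup(gain, TDP_W, TDP_W):.2f}x GK104 at equal TDP")
```

At equal TDP the speedup is simply the efficiency gain, so the whole argument comes down to which efficiency figure is the right one. If the new card drew less power than its predecessor, the speedup would shrink proportionally.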

To get around your dilemma for GM200 vs. GK110 and how to solve the die size and transistor density issue, I think NV will launch GM200 on 16/20nm not on 28nm. That's why they are launching GTX880 now and then another flagship in say 12-15 months, more or less repeating the 680->780Ti move and allowing 20nm yields to mature and wafers to fall in price. Then once GM200 launches on 20nm, GM204 will be refreshed as GTX970 on that node and launched at say $349-399.

Nvidia may view Knights Landing as a significant threat and may not want to wait until finfets are ready for volume production to replace GK110. In this regard, it makes sense to get a successor out before Knights Landing and steal some would-be potential customers from Intel.

But if you are right, and GM200 isn't coming on 28nm, then I do believe GM204 will be more than 15-20% faster than GTX 780 TI.
 

Mand

Senior member
Jan 13, 2014
664
0
0
We may not even need to know the specs of GM204. If it has a 195W TDP, it should be 90-100% faster than GK104, since the Maxwell architecture within a 195W power envelope is 2x faster than Kepler is in 195W. So why would we cut that efficiency gain down from 90% to 70-75%?


The question to be raised by this is whether the efficiency scaling is maintained with increasing die size. I don't know enough to know whether it would or not, but Nvidia's stated goals for Maxwell are efficiency and scaling. They've demonstrated one already, and I don't think we have a rational justification for believing they'll fail at the second.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
OK I will do the Nostradamus once again, GM204 may have the following specs at 28nm and ~430mm^2 die size.

5.5B to 6B transistors

4x GPC
5x SMM per GPC
128 Stream Processors per SMM (2560 Total)
8x Tex Units per SMM (160 total)

256bit memory
4x 72bit memory channels
4x 8 ROPs (32 Total)
4x 1024KB L2 Cache (4MB Total L2 Cache)

Unless they have decoupled ROPs from the Memory Channels and have 6x ROPs with 3MB L2 cache. But I find it unlikely for GM204.

Edit 2: 4MB L2
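
For anyone who wants to play with the guess, the totals follow directly from the per-unit counts. A small sketch, with the class and field names made up for illustration and every number taken from the speculative spec list above (none of it is a confirmed GM204 spec):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuessedLayout:
    """Per-unit counts from the speculative spec list; not confirmed hardware."""
    gpcs: int
    smm_per_gpc: int
    sp_per_smm: int
    tex_per_smm: int
    rop_partitions: int
    rops_per_partition: int
    l2_kb_per_partition: int

    @property
    def smms(self) -> int:
        return self.gpcs * self.smm_per_gpc

    @property
    def stream_processors(self) -> int:
        return self.smms * self.sp_per_smm

    @property
    def texture_units(self) -> int:
        return self.smms * self.tex_per_smm

    @property
    def rops(self) -> int:
        return self.rop_partitions * self.rops_per_partition

    @property
    def l2_mb(self) -> float:
        return self.rop_partitions * self.l2_kb_per_partition / 1024


guess = GuessedLayout(gpcs=4, smm_per_gpc=5, sp_per_smm=128, tex_per_smm=8,
                      rop_partitions=4, rops_per_partition=8,
                      l2_kb_per_partition=1024)
print(f"{guess.smms} SMMs, {guess.stream_processors} SPs, "
      f"{guess.texture_units} TUs, {guess.rops} ROPs, {guess.l2_mb:.0f}MB L2")
```

Changing `smm_per_gpc` or `l2_kb_per_partition` shows how quickly the totals move, which is handy when comparing this guess against other rumored configurations.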
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
www.facebook.com
The comparison was GM204 90% faster than GK104, which would make it approximately 34% faster than GK110. In turn GM200 would also be 90% faster than GK110. I don't think anyone expects a 28nm GM204 to beat GK110 by 90-100%.



OK, but we are comparing performance/watt, not performance per transistor or performance per functioning unit. Even if the GTX 650 Ti were a 33% or even 50% cut-down version of some mythical chip, we are strictly looking at performance for 60W parts, which for Kepler corresponds to W CUDA cores, X ROPs, Y TMUs, and Z memory bandwidth. If GK107 were fully enabled, it would use > 60W, which doesn't help us compare apples-to-apples.

We may not even need to know the specs of GM204. If it has a 195W TDP, it should be 90-100% faster than GK104, since the Maxwell architecture within a 195W power envelope is 2x faster than Kepler is in 195W. So why would we cut that efficiency gain down from 90% to 70-75%?

To get around your dilemma for GM200 vs. GK110 and how to solve the die size and transistor density issue, I think NV will launch GM200 on 16/20nm not on 28nm. That's why they are launching GTX880 now and then another flagship in say 12-15 months, more or less repeating the 680->780Ti move and allowing 20nm yields to mature and wafers to fall in price. Then once GM200 launches on 20nm, GM204 will be refreshed as GTX970 on that node and launched at say $349-399.

Looking back over some of the various reviews (I tend to go to techpowerup because the graphs are very convenient) it seems like TPU's review showed GM107 in a lesser light than places like anandtech. So I may very well be giving worst case scenario predictions.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
www.facebook.com
OK I will do the Nostradamus once again, GM204 may have the following specs at 28nm and ~430mm^2 die size.

5.5B to 6B transistors

4x GPC
5x SMM per GPC
128 Stream Processors per SMM (2560 Total)
8x Tex Units per SMM (160 total)

256bit memory
4x 72bit memory channels
4x 8 ROPs (32 Total)
4x 512KB L2 Cache (2MB Total L2 Cache)

Unless they have decoupled ROPs from the Memory Channels and have 6x ROPs with 3MB L2 cache. But i find it unlikely for GM204.

Your specs are as good a guess as it can get, except it'll be 4 x 64 bit memory channels.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
www.facebook.com
The question to be raised by this is whether the efficiency scaling is maintained with increasing die size. I don't know enough to know whether it would or not, but Nvidia's stated goals for Maxwell are efficiency and scaling. They've demonstrated one already, and I don't think we have a rational justification for believing they'll fail at the second.

Agreed. Kepler scaled just fine in efficiency as the cores got bigger. GK110 was as efficient as GK104, which was as efficient as GK107.
 

Keysplayr

Elite Member
Jan 16, 2003
21,209
50
91
OK I will do the Nostradamus once again, GM204 may have the following specs at 28nm and ~430mm^2 die size.

5.5B to 6B transistors

4x GPC
5x SMM per GPC
128 Stream Processors per SMM (2560 Total)
8x Tex Units per SMM (160 total)

256bit memory
4x 72bit memory channels
4x 8 ROPs (32 Total)
4x 1024kb L2 Cache (4MB Total L2 Cache)

Unless they have decoupled ROPs from the Memory Channels and have 6x ROPs with 3MB L2 cache. But i find it unlikely for GM204.

Edit 2: 4MB L2

Why would you think GM204 has ECC capability?
 

Keysplayr

Elite Member
Jan 16, 2003
21,209
50
91
Professional GPUs will, they use the same dies for Quadro cards.

GK104 has ECC in Quadro cards.

Good point, but for this argument's sake, best to say 4x 64bit registers, as the gaming version of this GPU, which is what we are talking about, will only ever utilize that much. A 288-bit bus just doesn't sound right. hehe.
The rest of your specs I believe could be spot on though. I thought 2560 CUDA cores sounded about right.
 

Fastx

Senior member
Dec 18, 2008
780
0
0
Fwiw


Nvidia GeForce GTX 800: what to expect from its specs?
8-8-2014

Facts we know for (almost) sure

While there are not a lot of reliable facts from sources close to Nvidia available at the moment, here is what we have found so far about the GM204 and what either is trustworthy or, at least, looks trustworthy:

·Nvidia GM204 is based on the code-named Maxwell architecture;
·Nvidia GM204 is made using 28nm process technology (given that multiple sources said that the GM204 is a 28nm GPU, it is most probably a fact);
·Nvidia GM204 has die size of around 300mm² (we’ve seen the chip, we’ve discussed that performance GPUs from Nvidia feature about 300mm² die size);
·Nvidia GM204 most likely has a 256-bit memory bus (we’ve seen a GM204-based card, it features 16 GDDR5 memory chips (with 16-bit or 32-bit interfaces), hence, given the size of the chip, 256-bit memory bus is a more likely option);

While we have no idea about exact specifications of the GM204, we do know specifications of the GM107 thanks to the official launch of the GeForce GTX 750-series GPUs earlier this year and hence we know something about the Maxwell architecture in general.

Maxwell architecture

Nvidia’s graphics processors consist of several key building blocks: main scheduler/dispatch processor (which Nvidia calls GigaThread engine), GPCs [graphics processing clusters], SMMs [streaming multiprocessor module], ROPs [raster operating units], cache, memory controllers.

[Image: a block diagram of a Maxwell SMM]

According to Nvidia, thanks to better programmability of each block within an SMM and higher utilization rate of each stream processor, the company increased the peak performance per stream processor by 35% when compared to the SPs in its chips built with the previous generation Kepler architecture. Therefore a Maxwell-based GPU with equal (or more or less equal) amount of stream processors with a Kepler-based GPU will perform around 35 per cent better.

Nvidia’s GM107 graphics processing unit features 640 SPs, 40 TUs, 16 ROPs, 2MB cache and a 128-bit memory controller. The chip contains 1.87 billion transistors and has a 148mm² die size.

What Nvidia needs from GM204?

Since the GM204 is a successor to the GK104 graphics processor (GeForce GTX 670, 680, 760 and 770) made using the same 28nm process technology, Nvidia cannot really expect it to outperform its predecessor by two times or something like that. Realistically, Nvidia needs to tangibly outperform the best GK104 by around 30 to 40 per cent. It is impossible that a 300mm² Maxwell chip could beat the GK110 GPU with 2880 stream processors (which die size is 551mm²), hence, it is not a target that the GM204 should achieve.

Possible GM204 configurations

Each architecture can be scaled to offer better performance or lower power consumption. Let’s try to assume how Nvidia could scale the GM107’s architecture to offer better performance with the GM204 while maintaining around 300mm² die size.

The easiest way to double performance of the GM107 is to double the amount of virtually everything inside: GPCs, ROPs, cache, memory controllers. Such a chip would have 1280 stream processors, 80 texture units, 32 ROPs and a 256-bit memory bus. The GPU would never outperform the GK104 (1536 SPs, 128 TUs, 32 ROPs, etc.) significantly and in many cases will be behind the older solution.

Therefore, it is unlikely that Nvidia will take this route. Moreover, such a chip would be significantly smaller than 300mm² since not all elements of the chip have to be doubled. What Nvidia did with GK110 compared to the GK104 architecturally (at least when it comes to organization of execution units) was the increase of the amount of SMs [streaming multiprocessor] per GPC from two to three. The same approach could be used for the GM204 too. Nvidia could expand GPC to six SMMs and then double the amount of GPCs (and ROPs, cache, memory controllers) per chip. Such a GPU would feature 1536 SPs, 96 TUs, 32 ROPs as well as a 256-bit memory bus and could offer up to 35 per cent higher performance compared to the GK104 at the same clock-rate. The die size of such chip would probably be around 300mm², but since we do not know anything about exact sizes of the GM107’s elements, we cannot be 100 per cent sure.

Theoretically, Nvidia could put six blocks into each SMM (thus increasing the amount of SPs per SMM to 192), then double the amount of GPCs (along with ROPs, cache, memory controllers) per GPU. If Nvidia manages to do this, then the final chip would feature 1920 SPs, 120 TUs, 32 ROPs and a 256-bit memory controller. Would that all fit into a 300mm² die area? Possibly. However, since this fundamentally changes the architecture of the SMM, this could also affect efficiency of Maxwell architecture in general. Therefore, the expansion of GPC to seven SMMs (and doubling the amount of GPCs, etc.) sounds more realistic (the GPU would have 1792 SPs, 112 TUs, 32 ROPs, 256-bit memory bus).

Final words

While chip designers can do many unexpected things and sometimes even wonders, they cannot overrule the laws of physics. Engineers will not be able to squeeze 3200 stream processors into the GM204 that is made using 28nm process technology and has die size of around 300mm². Therefore, the most logical configurations of the fully-fledged GM204 (the GeForce GTX 880) are as follows:
· 1536 SPs, 96 TUs, 32 ROPs, 256-bit memory bus
· 1792 SPs, 112 TUs, 32 ROPs, 256-bit memory bus
· 1920 SPs, 120 TUs, 32 ROPs, 256-bit memory bus
http://www.kitguru.net/components/g...hat-to-expect-from-its-performance-and-specs/
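
The article's candidate configurations can be reproduced from GM107's building blocks. A rough sketch, assuming an SMM is four 32-SP blocks (4 x 32 = 128 SPs) and that texture units scale with blocks at GM107's ratio of 8 per SMM; the GPC and SMM counts per configuration are read off the article's descriptions, and the helper name is made up for illustration:

```python
SP_PER_BLOCK = 32  # assumed: a Maxwell SMM is four 32-SP blocks (128 SPs)
TU_PER_BLOCK = 2   # assumed: 8 texture units per SMM spread over 4 blocks


def config(gpcs, smm_per_gpc, blocks_per_smm=4):
    """Return (stream processors, texture units) for a scaled-up layout."""
    blocks = gpcs * smm_per_gpc * blocks_per_smm
    return blocks * SP_PER_BLOCK, blocks * TU_PER_BLOCK


candidates = {
    "GM107 (1 GPC x 5 SMM)": config(1, 5),
    "doubled GM107 (2 GPC x 5 SMM)": config(2, 5),
    "2 GPC x 6 SMM": config(2, 6),
    "2 GPC x 7 SMM": config(2, 7),
    "2 GPC x 5 SMM, 6 blocks per SMM": config(2, 5, blocks_per_smm=6),
}
for name, (sps, tus) in candidates.items():
    print(f"{name}: {sps} SPs, {tus} TUs")
```

Every option in the article's list keeps 32 ROPs and a 256-bit bus, so only the shader and texture counts vary between candidates.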
 

Pinstripe

Member
Jun 17, 2014
197
12
81
I concur. As a GK104 successor built on 28nm that is supposed to be affordable, don't expect a GTX 780 Ti slayer. That's just unrealistic.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
Kit Guru's entire analysis is based on the assumption that GM204 is only a 300mm2 28nm die. Also, they state that GM204 only needs to be 30-40% faster than 680 but what is that based on? They themselves state that GM204 is a successor to GK104 but GK104 increased GF104's performance by 90%, not 30-40% (560Ti--> 680). NV already said they are achieving 2x the perf/watt increase so why would a 680 successor be only 30-40% faster?

They also state that no way will a GM204 outperform a 780Ti but if so the 880 name would be one of the most confusing names in NV's history. That would be akin to 280 being slower than 8800GTX or 480 slower than 285? Very confusing indeed.

Another point: if, per KitGuru, a GM204-based 880 won't even beat the 780 Ti, and sooner or later some 500-600mm2 chip (aka GM200) comes out, surely GM200 would be 50-60% faster than the 780 Ti at minimum? If so, is NV going to fill all the gaps in performance between GM204 and GM200 with just lower-clocked or cut-down GM200 parts? That would result in very expensive SKUs from the 780 Ti level (100%) to GM200 (160%), assuming GM204 is at least $400-$450. Unless, when NV releases a cut-down GM200 (130% SKU) and the 150-160% SKU, they drop the 880 by another $100. Or are we going to see an increase in prices yet again?

Finally, if the 880 is slower than the 780 Ti, then all of a sudden it won't look that hot against a 1-year-old $360-370 after-market 290 or 1.5-year-old $450 780 GHz Edition cards. Sure, it will have lower power consumption and probably decent overclocking room, but since the 780 Ti can often be found for $600 by now, surely we should have a next-gen product for $450 that is at least as fast as, if not faster than, the 780 Ti. From a purely technological point of view, sure, a 300mm2 Maxwell beating the 680 by 30-40% is a good accomplishment, but given the time that has passed since the 680 launched (2.5 years), it would be underwhelming.

I sure hope KitGuru's prediction is wrong and that the 880 is a 400-430mm2 chip that's at least 15-20% faster than the 780 Ti. Otherwise a card that's only as fast as a year-old 780 GHz Edition for $400 is hardly exciting for a next-gen product. However, since KitGuru claims they have seen the actual GM204 880 chip and it's only a 300mm2 die, this is not looking like the exciting launch many of us expected.

EDIT: KitGuru's 300mm2 die size is based on this card, the very card whose die size was estimated at around 430mm2. I don't believe their projections.
 

toyota

Lifer
Apr 15, 2001
12,957
1
0
Please explain the 72-bit memory controller. I have never heard of even a professional card having a 288-bit or 432-bit memory bus.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Only 64 of the 72 bits are ever counted; the remaining 8 bits are only for error checking. Memory bandwidth is the same with or without ECC, but the controller is 72 bits wide for ECC support.
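
Taking that description at face value (the thread does not confirm how the hardware is actually built), the 288-bit and 432-bit figures fall out of the channel count. A small sketch with illustrative function names, using a 7 Gbps per-pin data rate purely as a hypothetical input:

```python
def bus_widths(channels, data_bits=64, ecc_bits=8):
    """Return (physical width, advertised data width) in bits."""
    physical = channels * (data_bits + ecc_bits)
    data = channels * data_bits
    return physical, data


def bandwidth_gb_s(data_bus_bits, gbps_per_pin):
    """Peak bandwidth in GB/s; only the data bits carry payload."""
    return data_bus_bits / 8 * gbps_per_pin


for channels in (4, 6):
    physical, data = bus_widths(channels)
    print(f"{channels} channels: {physical}-bit physical, {data}-bit advertised")

# Hypothetical 7 Gbps memory on the 256-bit data bus.
print(f"{bandwidth_gb_s(256, 7.0):.0f} GB/s")
```

On this reading a "288-bit" part is the usual 256-bit bus and a "432-bit" part is the usual 384-bit bus, which is why those widths never show up on spec sheets.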
 