Kabini Rumors


mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
Automated processes enable the scope and the schedule to do things that would not otherwise be feasible within the given cost envelope.

Isn't this like the subpar material you use in an engineering project, but one that allows *much* lower costs or time to build? I can see where this is going, and I agree with you that synthesis will play a much bigger role in the lagging edge of the market.

But what about the bleeding edge? When you have a Qualcomm, Apple or Intel R&D budget, why wouldn't you go to a mixed mode, meaning crafting a few critical parts by hand and leaving simpler things (cache, crossbars) to synthesis tools?

Or did I get you wrong, and you think that somehow even bleeding edge players will rely on synthesis tools for their designs?
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
Let's be realistic here. Jaguar is a nice improvement over Bobcat, but it is just that: an improved Bobcat. That is, an affordable low-power x86 CPU with adequate performance. It will be vastly better for FP and multithreaded workloads, but for single-threaded integer it will be a 35-40% improvement. It certainly is no Wunderwaffe which will magically make AMD competitive in the enthusiast market again. Its competition is Silvermont Atom and ARM, not Ivy Bridge or Haswell.

I'd argue it will certainly go toe to toe with the lower end IBs and Haswells: the Celerons and Pentiums of the world. (Probably not the i3s, and definitely not the i5s and i7s.) But generally speaking, yeah, it's a "good enough" cheap chip aimed at the budget market.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Isn't this like the subpar material you use in an engineering project, but one that allows *much* lower costs or time to build? I can see where this is going, and I agree with you that synthesis will play a much bigger role in the lagging edge of the market.

But what about the bleeding edge? When you have a Qualcomm, Apple or Intel R&D budget, why wouldn't you go to a mixed mode, meaning crafting a few critical parts by hand and leaving simpler things (cache, crossbars) to synthesis tools?

Or did I get you wrong, and you think that somehow even bleeding edge players will rely on synthesis tools for their designs?

I think in time, as the tools and the computers running the tools get faster and better, you will see even the bleeding edge migrate to fully-synthesized designs. It is just a matter of time IMO.

In the interim, of course, hand-design is going to beat the machines (until the machines get better), and provided you have the cash to fund hand-design teams and timelines, it will continue to be in your best interest to do so.

If you step back and think about it, it really is no different than the fundamentals that go into the decision to remain an IDM for yet-another-node-cycle versus throwing in the towel and becoming fabless (or starting out that way).

Eventually the complexity of the chip, and the margins to come from producing it, drive a cost structure that pretty much requires you to rely heavily on synthesis (or foundries in my analogy) in order to remain cost-competitive as well as performance-competitive.

Foundries exist because the barrier-to-entry for the development and production costs of new nodes rises precipitously with every node.

The same is true of designing ICs for those new nodes, and synthesis lowers that barrier-to-entry the same way being fabless and letting TSMC do all the heavy lifting for you does.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Exophase, the Atom chip shares some of the Integer SSE operations with the FP ports.

So do a lot of CPUs. There's a good reason for this: very little software uses both integer and floating point SIMD simultaneously. On a dual-issue core the bottlenecks would usually be elsewhere even if it did.

Also, the Linpack results show 0.4GFlops for the single core 1.6GHz Atom while a dual core E-350(1.6GHz) gets 2.4GFlops. Doubling cores on Atom should get 0.8GFlops, which is still only 1/3rd the result. Linpack is a good benchmark for isolating pure FPU power.

Atom has two major FP problems that reek of bugs, and Linpack is exposing at least one and possibly both of them:

1) x87 performance sucks; specifically, there's an inexplicable extra stall when you issue x87 instructions back to back
2) double precision SSE performance sucks; it works as if it's unpipelined

This was with old Atoms; it's possible Saltwell fixed either or both of these things.

In the real world almost no one will be using double precision on these CPUs and no one should be using x87. Suffice it to say that Linpack is a poor proxy for real world performance in this scenario.

70% is again for best case scenarios.

Not at all, you can find real world tests that show > 100% better IPC for Bobcat. I listed a few in the post I made earlier. There were also cases where it was < 50%.

The average gain of the E-350 versus a similarly clocked Atom is only 10-20%. Hyperthreading is, best case, 35-40% faster (coincidentally in the application where the gap to Bobcat is greatest), while we see places where there's almost no gain at all (http://www.tomshardware.com/reviews/Intel-Atom-Efficient,1981-13.html).

Ironically, you yourself posted a test where it gained over 50%. http://forums.anandtech.com/showthread.php?t=163947

And this is a real world test, nothing synthetic or especially bizarre. You've really got to be careful when talking about best case scenarios; it only takes one instance to disprove the claim.

It would be great if someone did an exhaustive set of comparisons for Atom like the one done here: http://ixbtlabs.com/articles3/cpu/archspeed-2009-4-p1.html All I can really say is that it's a good rule of thumb that where HT helps on a Nehalem it'll help a lot more on Atom.

Here's one more figure, Geekbench on Medfield:

http://browser.primatelabs.com/geekbench2/1060511

This is useful because it's 1C/2T, and because most of the tests come in both single-threaded and multi-threaded versions. So it's pretty close to a measurement of HT vs no-HT. This is what it shows (multi-threaded gain over single-threaded):

Integer tests:
Blowfish: 64.3%
Text compress: 27.5%
Text decompress: 36.1%
Image compress: 43.4%
Image decompress: 48.5%
Lua: 38.2%

FP tests:
Mandelbrot: 88.9%
Dot product: 83.2%
LU decomposition: 1.3%
Primality test: 19.1%
Sharpen image: 68.7%
Blur image: 65.8%

Much bigger win for FP than integer, probably because FP operations have higher latencies that are difficult to schedule around in Atom code, especially on 32-bit Atom with only 8 xmm registers. But even for integer, calling out 40% as the best case is low-balling it.
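
To put rough numbers on the list above, here is a minimal Python sketch that summarizes the quoted Geekbench gains. The percentages are copied from this post; the dictionary and function names are made up for illustration, and treating each figure as an HT gain relies on the post's own assumption that multi-threaded vs single-threaded on a 1C/2T part approximates HT vs no-HT.

```python
from statistics import mean, median

# Multi-threaded gain over single-threaded (percent), as listed in the post.
INTEGER_GAINS = {
    "Blowfish": 64.3,
    "Text compress": 27.5,
    "Text decompress": 36.1,
    "Image compress": 43.4,
    "Image decompress": 48.5,
    "Lua": 38.2,
}

FP_GAINS = {
    "Mandelbrot": 88.9,
    "Dot product": 83.2,
    "LU decomposition": 1.3,
    "Primality test": 19.1,
    "Sharpen image": 68.7,
    "Blur image": 65.8,
}


def summarize(gains):
    """Return mean, median, min and max for a {test name: gain percent} mapping."""
    values = sorted(gains.values())
    return {
        "mean": mean(values),
        "median": median(values),
        "min": values[0],
        "max": values[-1],
    }


if __name__ == "__main__":
    for label, gains in (("integer", INTEGER_GAINS), ("FP", FP_GAINS)):
        s = summarize(gains)
        print(f"{label}: mean {s['mean']:.1f}%, median {s['median']:.1f}%, "
              f"range {s['min']}% to {s['max']}%")
```

With these figures the integer mean works out to about 43% and the FP mean to about 54.5%, with the FP set pulled down sharply by LU decomposition at 1.3%.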
 

SocketF

Senior member
Jun 2, 2006
236
0
71
Nope, that is for Excavator. It's just proof of concept, EX will have different FP coprocessor (more efficient and not just thanks to smaller die area dedicated to it).

Given AMD's recent marketing re-labelling activities (Trinity 2.0 is called Richland 1.0), and considering that Excavator was added to the BD roadmap not that long ago, I assume that EX is nothing other than SR 2.0.

However, Kaveri is half a year late, i.e. they had to make another revision, so I would say the chances are good that the "new" Kaveris sold later this year will already have this high-density FPU.

According to an old roadmap, Excavator was planned for 2014. That would fit, too; AMD is certainly trying to rush things now because of Kaveri's initial delay.

But let's wait and see, I'm just speculating.
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
Given AMD's recent marketing re-labelling activities (Trinity 2.0 is called Richland 1.0), and considering that Excavator was added to the BD roadmap not that long ago, I assume that EX is nothing other than SR 2.0.

Excavator was always in the Bulldozer roadmap; it was supposed to be the fourth generation of the modular design. In fact, it was supposed to be the first comprehensive redesign of the architecture.
 

inf64

Diamond Member
Mar 11, 2011
3,763
4,221
136
Excavator was always in the Bulldozer roadmap; it was supposed to be the fourth generation of the modular design. In fact, it was supposed to be the first comprehensive redesign of the architecture.

Yep, EX has been on the roadmap since 2010 or 2011 (I don't recall exactly which of the two). AMD presented a multi-year BD-derived roadmap, and the EX core is a BD derivative (although radically improved).
 

inf64

Diamond Member
Mar 11, 2011
3,763
4,221
136
AMD will support ASF, but the question is when. Also, it's very likely we will see AVX2 support with the EX core, and a proper one (read: 2 full 256-bit capable pipelines per FlexFP unit).
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136





Is it just me, or is the graphics performance too low? If they need a 25W TDP at 28nm to get 2x the graphics performance of the 18W TDP Brazos 2.0, then they blew it big time.
 

SocketF

Senior member
Jun 2, 2006
236
0
71
Excavator was always in the Bulldozer roadmap; it was supposed to be the fourth generation of the modular design. In fact, it was supposed to be the first comprehensive redesign of the architecture.

Yep, EX has been on the roadmap since 2010 or 2011 (I don't recall exactly which of the two). AMD presented a multi-year BD-derived roadmap, and the EX core is a BD derivative (although radically improved).

No, you forgot the very first BD-roadmap:


Steamroller is the "next generation"; I don't expect a major redesign for Excavator, for the reasons given above.

Excavator is just labeled with "greater performance", which sounds even blurrier than Piledriver's description ("Improved IPC and frequency").
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
That GX415 is a 15W part.

Yes, I was talking about the GX420, which is 25W. In order to have 2x the iGPU performance of the 40nm Brazos at 18W TDP, they need a 28nm SoC with a 25W TDP. WTF?

I may have an explanation, but I'm waiting for more information. I'll give a clue... 85C temps.
 

inf64

Diamond Member
Mar 11, 2011
3,763
4,221
136
G-T56N: 2x 1.6GHz Bobcat cores, 500MHz GPU base freq., 80 VLIW5 SP units
GX-415GA: 4x 1.5GHz Jaguar cores, 500MHz GPU freq., 128 GCN SP units

So the GPU clock stays the same and the unit count goes up by 60%. GCN brings around a 24% efficiency speedup versus VLIW5 according to techpowerup's tests. All summed up: 1.6x1.24=1.98, or practically 2x better GPU performance when the workload is GPU bound, which is the ideal case.
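
The back-of-the-envelope estimate above can be written out as a small Python sketch. The function name and parameters are hypothetical; the model simply multiplies the shader-unit ratio, the clock ratio and a per-unit efficiency factor, and assumes a fully GPU-bound workload with perfect scaling, which is the ideal case the post describes rather than a measured result.

```python
def estimated_gpu_speedup(old_units, new_units, old_clock_mhz, new_clock_mhz,
                          per_unit_gain):
    """Ideal-case GPU speedup: unit ratio x clock ratio x per-unit efficiency.

    per_unit_gain is the architectural efficiency factor (1.24 for the ~24%
    GCN-over-VLIW5 figure quoted in the post).
    """
    unit_ratio = new_units / old_units
    clock_ratio = new_clock_mhz / old_clock_mhz
    return unit_ratio * clock_ratio * per_unit_gain


if __name__ == "__main__":
    # G-T56N (80 VLIW5 SPs @ 500MHz) -> GX-415GA (128 GCN SPs @ 500MHz)
    speedup = estimated_gpu_speedup(80, 128, 500, 500, per_unit_gain=1.24)
    print(f"estimated ideal-case speedup: {speedup:.2f}x")
```

Any CPU, memory bandwidth or TDP limit would pull a real result below this figure.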

edit:
@ SocketF

If it wasn't on the official roadmap, that doesn't mean it wasn't being worked on. AMD only gave a 3-year roadmap at that time. One year later they added the EX core to the roadmap as the SR successor. As for your point that PD is labeled "IPC+freq." while EX isn't, look at the SR core note: it only states "greater parallelism", but this core will be much better than BD or PD are today. We will most definitely have IPC improvements on top of the improvements AMD cites in workloads that stress shared HW. They explicitly stated single core execution improvements in the SR core presentation. So the lack of a mention on the roadmap does not mean it's not there. The same goes for EX, which is drawn as the biggest gen-to-gen improvement if you look at the chart.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
G-T56N: 2x 1.6GHz Bobcat cores, 500MHz GPU base freq., 80 VLIW5 SP units
GX-415GA: 4x 1.5GHz Jaguar cores, 500MHz GPU freq., 128 GCN SP units

So the GPU clock stays the same and the unit count goes up by 60%. GCN brings around a 24% efficiency speedup versus VLIW5 according to techpowerup's tests. All summed up: 1.6x1.24=1.98, or practically 2x better GPU performance when the workload is GPU bound, which is the ideal case.

I see only 19.33% higher iGPU performance in 3DMark 2006 from the above slide. I was expecting more from 128 GCN cores, weren't you?

 

inf64

Diamond Member
Mar 11, 2011
3,763
4,221
136
I see only 19.33% higher iGPU performance in 3DMark 2006 from the above slide. I was expecting more from 128 GCN cores, weren't you?

Read the slide more carefully: it's the average of two tests, 3DMark06 AND the Passmark 2D performance test. I suspect the performance difference between the two chips is a lot smaller in 2D workloads than in actual 3D rendering like 3DMark06. And even in that one the CPU plays a big role; the benchmark is notorious for this. Since the GPU subtests favor IPC and clock on the CPU side, and Jaguar has higher (integer) IPC but 0.93x the clock, the total effect is that the JG core has a slight advantage (1.15x0.93=1.07).
It would help a lot if we knew the exact 3DMark06 score for the Jaguar-based SoC, and not just the average of 3DMark and the useless 2D benchmark that got jammed in there. I expect the difference in 3DMark alone would be >80%.
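
As a rough sanity check on the averaging argument, here is a short Python sketch. It assumes the slide's 19.33% is a plain arithmetic mean of two percentage gains, which the thread does not confirm; the function names are made up for illustration.

```python
def implied_other_gain(average_gain_pct, known_gain_pct):
    """Given the mean of two percentage gains and one of them, return the other.

    Assumes a simple arithmetic mean: average = (a + b) / 2.
    """
    return 2 * average_gain_pct - known_gain_pct


def cpu_side_factor(ipc_ratio, clock_ratio):
    """Combined CPU effect as the product of IPC ratio and clock ratio."""
    return ipc_ratio * clock_ratio


if __name__ == "__main__":
    slide_average = 19.33  # percent, from the slide discussed in the thread

    # If the 2D test showed no change, 3DMark06 alone would be about +38.7%.
    print(f"{implied_other_gain(slide_average, 0.0):.2f}")

    # For 3DMark06 alone to be +80%, the 2D result would have to be about -41.3%.
    print(f"{implied_other_gain(slide_average, 80.0):.2f}")

    # CPU-side estimate from the post: 1.15x IPC at 0.93x clock.
    print(f"{cpu_side_factor(1.15, 0.93):.2f}")
```

Under that assumption, a 3DMark06 gain above 80% would require the 2D score to have regressed by roughly 41%, so either the slide averages differently (for example, a mean of normalized scores) or the 2D result really was much worse on the newer part.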
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
Yes, I was talking about the GX420, which is 25W. In order to have 2x the iGPU performance of the 40nm Brazos at 18W TDP, they need a 28nm SoC with a 25W TDP. WTF?

I may have an explanation, but I'm waiting for more information. I'll give a clue... 85C temps.

Don't forget there's a ~4W FCH with Brazos as well, so that 18W is more like 22W.
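
To see how much the FCH changes the picture, here is a hedged Python sketch comparing performance per watt of TDP with and without the chipset counted. It uses the 2x graphics figure and the TDP numbers quoted in this thread; TDP is only a stand-in for actual power draw, and the function names are hypothetical.

```python
def platform_tdp(apu_tdp_w, chipset_tdp_w=0.0):
    """Total platform TDP in watts: APU plus any separate chipset (FCH)."""
    return apu_tdp_w + chipset_tdp_w


def perf_per_watt_ratio(speedup, new_tdp_w, old_tdp_w):
    """Relative perf/W of the new part, using TDP as a proxy for power."""
    return speedup / (new_tdp_w / old_tdp_w)


if __name__ == "__main__":
    claimed_speedup = 2.0                   # 2x graphics claim from the thread
    kabini = platform_tdp(25.0)             # FCH is integrated in the SoC
    brazos_apu_only = platform_tdp(18.0)
    brazos_with_fch = platform_tdp(18.0, 4.0)

    print(f"{perf_per_watt_ratio(claimed_speedup, kabini, brazos_apu_only):.2f}")
    print(f"{perf_per_watt_ratio(claimed_speedup, kabini, brazos_with_fch):.2f}")
```

Counting the FCH moves the perf/W ratio from about 1.44x to about 1.76x under these assumptions.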
 

inf64

Diamond Member
Mar 11, 2011
3,763
4,221
136
I found this older post on the SA board. It's about AMD's numbers from the footnotes they gave when they paper launched Kabini and Temash. The tablet parts have comparable CPU clocks (JG vs BC), while the GPUs have similar ratios when it comes to clock and SP count. In 3DMark Vantage, which is a better GPU test than 3DMark06, Temash (A6-1450, QC @ 1GHz, GPU @ 300MHz base and 128 SP) is more than 2x faster than the comparable Bobcat (Z-60, DC @ 1GHz, GPU @ 275MHz and 80 SP).
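
For the tablet parts, a quick Python sketch shows how much of the "more than 2x" could come from raw shader count and clock alone. The specs are the ones quoted in this post; the class and function names are illustrative, and "raw throughput" here is simply SP count times base clock, which ignores turbo, memory bandwidth and CPU effects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    shader_units: int
    base_clock_mhz: int

    def raw_throughput(self):
        """Crude throughput proxy: shader units x base clock."""
        return self.shader_units * self.base_clock_mhz


def raw_ratio(new, old):
    """Ratio of raw throughput between two GPUs."""
    return new.raw_throughput() / old.raw_throughput()


def required_per_unit_gain(target_speedup, new, old):
    """Per-unit efficiency gain needed on top of the raw ratio to hit a target."""
    return target_speedup / raw_ratio(new, old)


TEMASH = Gpu("A6-1450", shader_units=128, base_clock_mhz=300)
Z60 = Gpu("Z-60", shader_units=80, base_clock_mhz=275)

if __name__ == "__main__":
    print(f"raw ratio: {raw_ratio(TEMASH, Z60):.2f}x")
    print(f"per-unit gain needed for 2x: "
          f"{required_per_unit_gain(2.0, TEMASH, Z60):.2f}x")
```

The raw ratio is about 1.75x, so a per-unit efficiency gain of roughly 1.15x would be enough to reach 2x, which is below the ~24% GCN-over-VLIW5 figure cited earlier in the thread.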
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Don't forget there's a ~4W FCH with Brazos as well, so that 18W is more like 22W.

If you consider the power savings due to integration (no more high speed off chip interconnect connecting the two) and what you get from the shrink, the contribution of the south bridge sort of stuff should be a far smaller part of the 25W TDP.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
I listed a few in the post I made earlier.

You mean the BOINC one? You said it yourself that the average gain was nowhere near that benchmark.

I remember what I said, and you have your point there. But the original discussion started because we wanted to see how much Silvermont needs to equal Bobcat right?

The thing is, a 35-40% average gain would do it (we wouldn't need 60-70%), because while there are indeed scenarios where Bobcat is a lot faster than that compared to Atom, those are exactly the scenarios where Atom, as designed back in 2008, was a lot weaker relative to other CPUs, and that is real low-hanging fruit (provided they execute on Silvermont). I admit I can't find conclusive evidence of Bobcat being superior to Atom on FP (although differences like Bobcat not having a shared FP might favor it in a few low-level benchmarks). That was probably an error on my part.

You can find this pattern in many places. Take Intel graphics: on average they may not have been too far off, but there were cases well below average, like earlier Gen units having abysmal geometry performance. Subsequent generations fixed that (and that's what really throws people off, rather than claims of average gains). Or take AVX on Sandy Bridge vs Nehalem: in the specific applications that take advantage of AVX, Sandy Bridge shows a greater than average gain (or conversely, Nehalem is a lot weaker than average). Even smaller advances like Northwood focused on areas where it was especially weak against the AMD competition.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
You mean the BOINC one? You said it yourself that the average gain was nowhere near that benchmark.

No, I actually meant this one..

http://forums.anandtech.com/showpost.php?p=34920323&postcount=160

Sorry, it wasn't even in this thread. Starting to lose track big time

The claim (that Bobcat had something like 85% higher IPC than Atom) surprised me too, so I wanted to go find some more data on it. I was surprised to see that in true single-threaded tests Bobcat really did often have a huge IPC lead that was on average in the 70s, but varied tremendously. That lead also tended to evaporate in well-threaded tests due to HT.

What's really throwing me off is that I generally see Cortex-A15 bench close to Bobcat at the same clock (though I could use more data), but little I've seen shows A15 with a 70+% perf/MHz lead over Atom.

The thing is, a 35-40% average gain would do it (we wouldn't need 60-70%), because while there are indeed scenarios where Bobcat is a lot faster than that compared to Atom, those are exactly the scenarios where Atom, as designed back in 2008, was a lot weaker relative to other CPUs, and that is real low-hanging fruit (provided they execute on Silvermont). I admit I can't find conclusive evidence of Bobcat being superior to Atom on FP (although differences like Bobcat not having a shared FP might favor it in a few low-level benchmarks). That was probably an error on my part.

So I guess what you're saying is that you expect Silvermont to be all around more like Jaguar. And in the situations where it's weakest (and when HT boosts it the most) it'll gain the most. There could be some merit to that. That 20-30% estimation, if true, could have a huge variability of its own.

But until Intel announces more I don't really want to jump to conclusions.

What seems really pertinent to me is how close Jaguar can turbo a single core up to 2GHz in much lower TDP scenarios. Because even the 2.7GHz Silvermont isn't going to be in the power range the 25W Jaguar is, I don't think. And it's not reassuring that AMD hasn't been marketing turbo speeds, and has talked about Temash being 1GHz in tablet mode and 1.4GHz docked. It should definitely be up to 1.4GHz in either. If it's 1GHz max when undocked then it's in serious trouble, that'll barely compete with Clover Trail in CPU perf, never mind Silvermont.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
I found this older post on the SA board. It's about AMD's numbers from the footnotes they gave when they paper launched Kabini and Temash. The tablet parts have comparable CPU clocks (JG vs BC), while the GPUs have similar ratios when it comes to clock and SP count. In 3DMark Vantage, which is a better GPU test than 3DMark06, Temash (A6-1450, QC @ 1GHz, GPU @ 300MHz base and 128 SP) is more than 2x faster than the comparable Bobcat (Z-60, DC @ 1GHz, GPU @ 275MHz and 80 SP).

Ahh yes, forgot about that thanks.
 