Kabini Rumors


inf64

Diamond Member
Mar 11, 2011
3,763
4,221
136
L3 will definitely be there in server parts. For APUs, L3 makes zero sense, especially if they are really going for 6T parts. A 6T/512SP SR-based APU will be the sweet spot on the mainstream/performance desktop; L3 cache won't make any difference there (and it would blow up the die size considerably, which is of course bad).
 

Abwx

Lifer
Apr 2, 2011
11,166
3,862
136
For APUs, L3 makes zero sense, especially if they are really going for 6T parts. A 6T/512SP SR-based APU will be the sweet spot on the mainstream/performance desktop; L3 cache won't make any difference there (and it would blow up the die size considerably, which is of course bad).

The shrink to 28nm just allows adding ONE more module and 128 SPs in the same die area as Trinity, so they will surely go 3M/6T; hence the not fully functional parts will be rebadged as 4C, and there likely won't be 2C offerings anymore.
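
A rough sanity check of that area budget (just a sketch: it assumes ideal geometric scaling from 32nm to 28nm and the commonly cited ~246 mm² Trinity die size; real shrinks recover less than the ideal, so treat the freed area as an upper bound):

```python
# Rough check of "one extra module + 128 SPs in the same die area as Trinity".
# Assumes ideal geometric scaling from 32nm to 28nm and the commonly cited
# ~246 mm^2 Trinity die size; real layouts shrink less than ideally.
trinity_area_mm2 = 246.0          # Trinity (32nm) die size, public estimate
scale = (28.0 / 32.0) ** 2        # ideal area scaling factor, ~0.77

shrunk = trinity_area_mm2 * scale
freed = trinity_area_mm2 - shrunk
print(f"Ideal 28nm footprint of Trinity: {shrunk:.0f} mm^2")
print(f"Freed at the same die budget: {freed:.0f} mm^2 ({freed / trinity_area_mm2:.0%})")
# ~58 mm^2 (~23%) freed in the ideal case - the ballpark being budgeted for
# one more module plus another GPU SIMD block.
```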
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
I currently believe that the great bottleneck is the narrow 128bit connection of each module to the XBAR.

The reason for this is the performance benefit from overclocking the NB clock. Performance in some single-threaded tasks or games is very nice.

I also checked the decode rate, and as soon as you go out of the L2 it is really bad... less than 1 op per clock on average.

I'm skeptical. The L1 (64KB) + L2 (2MB) icache miss rate has got to be real close to zero for most applications. If your real hot code exceeds 2MB (or even 1MB with two uncorrelated programs) then you're probably running something very weird. Furthermore, they'd have to then fit in L3.

I looked around for NB overclocking tests, found this:

http://www.rage3d.com/reviews/cpu/amd_fx_8150/index.php?p=8

Maybe you have others? The big problem with this comparison is that the OC2 tests don't just involve a 9% northbridge overclock but a 50% increase in DRAM clock speed. Which is reflected directly in Sandra's synthetic memory bandwidth test.

Throughout the rest of the review you see a similar pattern to what you tend to get with a big shift in DRAM speeds - very little increase in most programs and a select small class that sees a big boost, ones that probably work very linearly through big datasets, like decryption. Other than that, a few around 5% but mostly under 2%.

The thing is, if you're bandwidth bound to main memory you're not going to see an improvement just by increasing core to L3 bandwidth.
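
A back-of-the-envelope way to see why: only the fraction of runtime actually limited by the link you speed up can benefit. The fractions and ratios below are made-up illustrative numbers, not measurements from that review:

```python
# Amdahl-style bound on the speedup from widening one link: only the fraction
# of runtime limited by that link can improve. All numbers here are
# illustrative assumptions, not measurements.
def speedup(bound_fraction, bw_ratio):
    """bound_fraction: share of runtime limited by the link being sped up.
    bw_ratio: new/old bandwidth of that link."""
    return 1.0 / ((1.0 - bound_fraction) + bound_fraction / bw_ratio)

# Typical program: say 10% of its time limited by the NB/L3 link, with a 9%
# NB overclock -> well under 1% overall.
print(f"{speedup(0.10, 1.09):.3f}x")   # ~1.008x

# Streaming workload (e.g. decryption marching linearly through a big
# dataset), mostly DRAM-bound, where the OC2 config also raised DRAM clocks
# by ~50% -> a visible double-digit gain.
print(f"{speedup(0.80, 1.50):.3f}x")   # ~1.364x
```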

AFAIK Onion is dead since Trinity (only used in Llano), but who cares. As long as that 256bit info is correct, I am happy ^^

Did you happen to see the leaked PS4 diagrams that reference Onion and Garlic (and "Onion+") buses? Could be that an evolution of the design made it into Kabini.. or could have been specific for Sony's needs.
 

Piroko

Senior member
Jan 10, 2013
905
79
91
L3 will definitely be there in server parts. For APUs, L3 makes zero sense, especially if they are really going for 6T parts. A 6T/512SP SR-based APU will be the sweet spot on the mainstream/performance desktop; L3 cache won't make any difference there (and it would blow up the die size considerably, which is of course bad).
Is that 6T / 3-module info official or just a guess? I would think that with a 20% uplift in CPU performance (through IPC and/or clocks) and a sizeable uplift in GPU performance, that chip would do fine without a third module.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Is that 6T / 3-module info official or just a guess? I would think that with a 20% uplift in CPU performance (through IPC and/or clocks) and a sizeable uplift in GPU performance, that chip would do fine without a third module.

Well, they could use the marketing point of having a 3-module APU, especially if it's just another $20-30 over the top 2-module part.
 

Abwx

Lifer
Apr 2, 2011
11,166
3,862
136
Actually it has less to do with marketing a 3M/6C part than with ensuring that 28nm's inevitably low yields at the start will still allow at least a 2M/4C configuration from the worst dies, and thus higher ASPs.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
They first mentioned 28SHP at their GTC in 2011:

http://www.brightsideofnews.com/news/2011/9/11/rumors-14nm-node-and-450mm-wafers-by-2015.aspx

However, since then it has more or less vanished; on GF's 28nm website:
http://www.globalfoundries.com/technology/28nm.aspx

...it wasn't mentioned at all. So people thought it was gone, but then it suddenly reappeared in February.

There is also the question of whether SHP == PD-SOI, but I don't see what else it could be. FD-SOI is not a candidate due to its LP base.

SHP doesn't necessarily mean PD-SOI, TSMC's doesn't. SHP is whatever process GF wants to affix that label to.

To my knowledge, though, IBM had zero interest in developing an SOI-based half-node for the fab club. It may surface again at 20nm, but I'm 99% confident it isn't in the mix for 28nm.

Thanks, that sounds reasonable; however, I am missing the potential for higher ASPs, since the SOI products will clock higher.

At least if you have high-priced products. Not sure what TI was fabbing back then, but I guess it was rather cheap stuff? The only thing I remember is Sun, they fabbed their chips at TI, didn't they?

We made all kinds of high-clock high-power ICs. In fact we made three versions of every node, pretty similar to how any foundry operates.

SOI doesn't mean "high clocks that are otherwise unattainable without the use of SOI", it means "high clocks with less R&D effort to get those high clocks versus the R&D involved in getting the same clocks with bulk-Si".

It's the same story as with leakage. SOI makes the process node development easier because it cuts out a bunch of engineering legwork. But it transfers the work over to the accounting dept as well as the design team. Their jobs become all the more difficult, and it only pays off if your wafer volumes fall below a certain threshold.

Provided you are a small enough player then using SOI as a crutch for getting your node developed will actually pay off in the end.

But if you are a high volume player (and what foundry doesn't intend to be?) then the savings in R&D turn into unacceptable production costs in the fab.

Haven't you ever noticed the lack of interest in SOI by all the large volume fab owners? It was only ever seriously considered by the small-volume guys (AMD included). That was for economic, not technical performance, reasons.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Actually it has less to do with marketing a 3M/6C part than with ensuring that 28nm's inevitably low yields at the start will still allow at least a 2M/4C configuration from the worst dies, and thus higher ASPs.

Certainly would fit with their WSA with GF. Churn out lots of 3-module APU wafers and cut down as necessary. They are pretty much required to have a certain dollar amount of wafers running, so they might as well make sure those are some of the best chips they can get.
 

SocketF

Senior member
Jun 2, 2006
236
0
71
I'm skeptical. The L1 (64KB) + L2 (2MB) icache miss rate has got to be real close to zero for most applications. If your real hot code exceeds 2MB (or even 1MB with two uncorrelated programs) then you're probably running something very weird. Furthermore, they'd have to then fit in L3.
I looked around for NB overclocking tests, found this:

http://www.rage3d.com/reviews/cpu/amd_fx_8150/index.php?p=8

Maybe you have others? The big problem with this comparison is that the OC2 tests don't just involve a 9% northbridge overclock but a 50% increase in DRAM clock speed. Which is reflected directly in Sandra's synthetic memory bandwidth test.
I checked my sources again; most of them were unfortunately Phenom X6s. But a print magazine tested NB OC with a Vishera under StarCraft 2; they measured +7% with a 20% OC from 2.0 to 2.4 GHz. Not as great as I thought, but still not bad.

The thing is, if you're bandwidth bound to main memory you're not going to see an improvement just by increasing core to L3 bandwidth.
Depends on how many threads/cores you have. Check the Sandra scores of the FM2 CPUs with 2 modules; they are far below the maximum DDR3 bandwidth for dual-channel configurations. Hence there is another limitation, which has to be the NB; there is nothing else between the memory controller and the modules.
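
For reference, the dual-channel DDR3 ceiling being compared against works out as below (a quick sketch; the speed grades are just common examples, not necessarily what the reviews used):

```python
# Theoretical peak of a dual-channel DDR3 setup:
# channels * 8 bytes per transfer * transfer rate (MT/s).
# Speed grades below are common examples, not tied to a specific review.
def ddr3_peak_gb_s(mt_per_s, channels=2, bytes_per_transfer=8):
    return channels * bytes_per_transfer * mt_per_s / 1000.0

for mt in (1333, 1600, 1866):
    print(f"DDR3-{mt} dual channel: {ddr3_peak_gb_s(mt):.1f} GB/s")
# 21.3 / 25.6 / 29.9 GB/s. A Sandra result well below these figures on a
# 2-module FM2 chip is the gap being attributed to the NB link rather than
# to the DIMMs.
```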

Did you happen to see the leaked PS4 diagrams that reference Onion and Garlic (and "Onion+") buses? Could be that an evolution of the design made it into Kabini.. or could have been specific for Sony's needs.
Not sure, but the stuff I saw wasn't leaks, just some self-made diagrams, if I remember correctly.

Is that 6T / 3-module info official or just a guess?
It is semi-official. BSN leaked some AMD PDFs about Kaveri, and there it says "2 to 3 modules". There is a low chance that it is fake, but it seems legit, thus semi-official ;-)

SHP doesn't necessarily mean PD-SOI,
TSMC's doesn't. SHP is whatever process GF wants to affix that label to.
Yes, but GF already has lots of 28nm processes; they even changed the high-performance process from HP to HPP (high performance plus). It seems the parameters of TSMC's version were better, so they adjusted a few of them.
I cannot think of anything other than SOI for an "SHP" version, and traditionally SHP has always been PD-SOI for as long as they have used the label.

To my knowledge, though, IBM had zero interest in developing an SOI-based half-node for the fab club. It may surface again at 20nm, but I'm 99% confident it isn't in the mix for 28nm.
That, however, is a good argument... I couldn't find anything else about 28nm SOI either, but I found a 22nm SOI process from IBM:

http://www.chipworks.com/blog/technologyblog/2012/12/19/ibm-surprises-with-22nm-details-at-iedm/

The strange thing is: it is still gate-first.
How big are the chances that GF will call that thing 28SHP? If they are zero, how big are the chances that GF did the job alone? Impossible?
Well, then I also have no clue, and we would be back to the initial question about the meaning of SHP.


We made all kinds of high-clock high-power ICs. In fact we made three versions of every node, pretty similar to how any foundry operates.


SOI doesn't mean "high clocks that are otherwise unattainable without the use of SOI", it means "high clocks with less R&D effort to get those high clocks versus the R&D involved in getting the same clocks with bulk-Si".
Oh, I didn't know either of those, thanks.
I really thought SOI could achieve more performance; I think I read some time ago that SOI is 20% better (in some parameter, drive strength or whatever it was, no clue now) and thus delivers more clock headroom.
It's the same story as with leakage. SOI makes the process node development easier because it cuts out a bunch of engineering legwork. But it transfers the work over to the accounting dept as well as the design team. Their jobs become all the more difficult, and it only pays off if your wafer volumes fall below a certain threshold.
Yes, totally clear then; if there is no performance advantage, the calculation is rather easy.

Provided you are a small enough player then using SOI as a crutch for getting your node developed will actually pay off in the end.

But if you are a high volume player (and what foundry doesn't intend to be?) then the savings in R&D turn into unacceptable production costs in the fab.
Yes... then the only reason would be financial. SOI is like paying less up front (like taking a loan), but then you have to pay interest to SOITEC over the years, indefinitely.

Haven't you ever noticed the lack of interest in SOI by all the large volume fab owners? It was only ever seriously considered by the small-volume guys (AMD included). That was for economic, not technical performance, reasons.
Sure I did, but I thought it was just a Pepsi vs. Coke thing; furthermore, I thought the engineers at IBM are great people, so they wouldn't choose something bad.
But now it makes more sense with your example... IBM's server chips are low-volume, so it is very likely better for them. Even if they were above the threshold, they could just price the CPUs higher (but I think POWER chips are already expensive enough :biggrin: ). AMD, however, is in a price fight with Intel and has to sell a big and probably expensive 315mm² die at low ASPs... a bad situation, no, a really bad situation...

No wonder they stopped payments for the SHP20 node... it doesn't pay off. From this point of view the switch to bulk is/was natural, since GF became independent and more customers are bringing higher volume into the fabs. I just hope there will still be other customers at 28nm besides the Chinese and AMD *G*

Thanks for all the information!
 
Last edited:

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
Comparison of low cost and low power AMD/Intel/VIA CPU microarchitectures
Oh... cool, nice chart!

VIA looks very good.
At the same time, Silvermont with only 64-bit floating point looks like a huge bottleneck... like Bobcat's.
 

Gideon

Golden Member
Nov 27, 2007
1,709
3,927
136
Oh... cool, nice chart!

VIA looks very good.
At the same time, Silvermont with only 64-bit floating point looks like a huge bottleneck... like Bobcat's.

Lol @ "floing" point.
What's the source for that information? I seriously doubt VIA will support every ISA extension on the planet, including AVX2. The fact that Silvermont has a 64-bit-wide FP module is fishy as well. Besides, if Isaiah 2 is really a 2 GHz, 4-issue-wide core, its power consumption should be in the same league as Sandy Bridge's, making this comparison to low-power cores ridiculous.

Personally, I think this info is beyond über BS...
 
Mar 10, 2006
11,715
2,012
126
Comparison of low cost and low power AMD/Intel/VIA CPU microarchitectures

Tralalak,

Please cite your source on "Silvermont", as there have been no public details shared about this core, nor have there been any leaks beyond the L2$ sizes + OoO + clock frequency. Thanks...
 

Gideon

Golden Member
Nov 27, 2007
1,709
3,927
136
Tralalak,

Please cite your source on "Silvermont", as there have been no public details shared about this core, nor have there been any leaks beyond the L2$ sizes + OoO + clock frequency. Thanks...

Actually, I think I found the source:

 

inf64

Diamond Member
Mar 11, 2011
3,763
4,221
136
Intel states a 50-60% improvement with Silvermont (OoO, 4C/4T, 2.1 GHz listed in the slides) vs. previous-gen Atoms (in-order, 2C/4T, 1.5 GHz listed in the slides). So that gives it about a 7-15% per-thread improvement once you normalize for clock speed.
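
Spelling the normalization out (just the arithmetic; it assumes the quoted gains scale linearly with clock and ignores any Hyper-Threading contribution):

```python
# Normalize Intel's claimed Silvermont gain by the clock difference shown on
# the same slides. Assumes the quoted speedup scales linearly with clock.
old_clock, new_clock = 1.5, 2.1          # GHz, from the slides
clock_ratio = new_clock / old_clock      # 1.4x

for total_gain in (1.50, 1.60):          # Intel's 50-60% claim
    per_clock = total_gain / clock_ratio
    print(f"{total_gain:.2f}x total -> {per_clock:.2f}x at equal clocks "
          f"(~{(per_clock - 1) * 100:.0f}%)")
# 1.50x -> ~1.07x (~7%), 1.60x -> ~1.14x (~14%): the 7-15% per-thread range.
```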
 
Last edited:

Gideon

Golden Member
Nov 27, 2007
1,709
3,927
136

Thank you for the sources; the last article, about VIA, seems to be the most interesting. It seems that I was too sarcastic about it. Anyway, I still don't see a 4-issue-wide core being mentioned anywhere (though it'll probably need it for AVX2). However, if it is indeed that wide, I'm willing to bet that it should be compared to Haswell rather than to these 2-wide small cores.

If they manage to keep the whole platform power consumption within the same range as Kabini, I'd be extremely impressed. Looking at this article, that seems way too far-fetched IMO:

http://www.anandtech.com/show/4332/vias-quadcore-nano-gets-bigger

At 1.2GHz, VIA's QuadCore still carries a 27W TDP. Add another 5W for the integrated graphics chipset and you're talking about 32W, nearly double of AMD's dual-core E-350 Brazos platform

So, the vanilla Isaiah had a 27.5W TDP @ 1.2 GHz, not including a GPU, northbridge, and southbridge. It also has an FSB and no on-die memory controller. While Isaiah 2 isn't a 2-die system, it will nevertheless keep the ancient platform with an FSB, no integrated GPU, and a separate memory controller.

Don't get me wrong, the core seems very impressive, especially the fact that it supports AVX2. But in the end, if you take the whole-system power consumption into account, it's much more likely to end up competing with 17-35W Trinity and Haswell rather than with these low-power cores.

It's actually quite sad that VIA's platform is so ancient. A decent APU or SoC version of this chip would probably have much better IPC and single-threaded performance than Richland.
 
Last edited:

mikk

Diamond Member
May 15, 2012
4,172
2,210
136
Intel states a 50-60% improvement with Silvermont (OoO, 4C/4T, 2.1 GHz listed in the slides) vs. previous-gen Atoms (in-order, 2C/4T, 1.5 GHz listed in the slides). So that gives it about a 7-15% per-thread improvement once you normalize for clock speed.


Other slides say up to a 2x improvement in productivity workloads: https://www.youtube.com/watch?v=KBtE6E-c730

Furthermore, the old Atom supports Hyper-Threading, which boosted multithreaded applications by up to 50%. You can't normalize anything if you don't know the tested application and the Hyper-Threading speedup.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Intel states a 50-60% improvement with Silvermont (OoO, 4C/4T, 2.1 GHz listed in the slides) vs. previous-gen Atoms (in-order, 2C/4T, 1.5 GHz listed in the slides). So that gives it about a 7-15% per-thread improvement once you normalize for clock speed.

So that means Kabini will have a 100+% performance improvement over Brazos 2.0 (double the cores/threads plus 15% higher IPC; rough arithmetic sketched below). And that will rise even further if you raise the frequency.

Since Brazos 2.0 is faster than the 32nm Atom, I don't see Silvermont faring well against Kabini, except in very low-power, sub-3W SoCs.
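
The rough arithmetic behind that estimate (a sketch only; it assumes perfect scaling across the doubled core count, which fully threaded code rarely achieves):

```python
# Rough upper bound for Kabini (4 Jaguar cores) vs Brazos 2.0 (2 Bobcat
# cores) at the same clock: double the cores plus the assumed ~15% per-core
# uplift, with perfect multithreaded scaling (real scaling will be lower).
core_ratio = 4 / 2          # double the cores/threads
ipc_gain = 1.15             # assumed Jaguar-over-Bobcat per-core uplift

total = core_ratio * ipc_gain
print(f"{total:.2f}x, i.e. +{(total - 1) * 100:.0f}% in fully threaded workloads")
# 2.30x (+130%) before any clock increase, hence the "100+%" figure.
```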
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,688
1,222
136
Jaguar 2.0 is expected to launch 9 months after Jaguar. Then 5-6 months after that we will see Leopard.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Since Brazos 2.0 is faster than the 32nm Atom, I don't see Silvermont faring well against Kabini, except in very low-power, sub-3W SoCs.

And that's the environment for Atom. The 3.4W Temash is a dual-core 1 GHz SoC with a 75 GFLOPS GPU. Intel will easily beat it with Silvermont.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
And that's the environment for Atom. The 3.4W Temash is a dual-core 1 GHz SoC with a 75 GFLOPS GPU. Intel will easily beat it with Silvermont.

That's not the only environment where they will use Atom; Intel plans to use it in low-end laptops and desktops as well.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
That's not the only environment where they will use Atom; Intel plans to use it in low-end laptops and desktops as well.

Sure, and you will see ARM SoCs in the same products.
The problem for AMD is: Kabini needs too much power to be faster, and it's too slow when using the same power.
 