Kabini Rumors

inf64 · Apr 21, 2013

L3 will definitely be there in server parts. For APUs L3 makes zero sense, especially if they are really going for 6T parts. A 6T/512SP SR-based APU will be the sweet spot on mainstream/performance desktop, and L3 cache won't make any difference there (except that it would blow up the die size considerably, which is of course bad).

Abwx · Apr 21, 2013

inf64 said:
For APUs L3 makes zero sense, especially if they are really going for 6T parts. A 6T/512SP SR-based APU will be the sweet spot on mainstream/performance desktop, and L3 cache won't make any difference there (except that it would blow up the die size considerably, which is of course bad).

The shrink to 28nm just allows adding ONE module and 128 SPs
in the same die area as Trinity, so they will surely go 3M/6T.
Hence the not fully functional parts will be rebadged as 4C,
and there likely won't be 2C offerings anymore.

SocketF · Apr 21, 2013

inf64 said:
L3 will definitely be there in server parts.

Yes but the rumors are "no more (high-end) server parts"
"The future is Seamicro thin light and dense".

Abwx · Apr 21, 2013

SocketF said:
Yes but the rumors are "no more (high-end) server parts"
"The future is Seamicro thin light and dense".

Which rumours?

They pointed out recently that they will use future BD iterations in servers.

http://www.engadget.com/2013/04/01/amd-roadmap-shows-steamroller-based-opterons-on-track-for-2013/

Exophase · Apr 21, 2013

SocketF said:
I currently believe that the great bottleneck is the narrow 128bit connection of each module to the XBAR.

The reason for this is the performance benefits by overclocking the NB-clock. Performance in some single thread tasks or games are veery nice.

I also checked the decode rate and as soon as you go out of the L2 it is really baad ... less than 1 op per clock on average.

I'm skeptical. The L1 (64KB) + L2 (2MB) icache miss rate has got to be real close to zero for most applications. If your real hot code exceeds 2MB (or even 1MB with two uncorrelated programs) then you're probably running something very weird. Furthermore, they'd have to then fit in L3.

I looked around for NB overclocking tests, found this:

http://www.rage3d.com/reviews/cpu/amd_fx_8150/index.php?p=8

Maybe you have others? The big problem with this comparison is that the OC2 tests don't just involve a 9% northbridge overclock but a 50% increase in DRAM clock speed. Which is reflected directly in Sandra's synthetic memory bandwidth test.

Throughout the rest of the review you see the pattern you tend to get with a big shift in DRAM speeds: very little increase on most programs and a small, select class that sees a big boost, probably ones that work very linearly through big datasets, like decryption. Other than that, a few gain around 5%, but most gain under 2%.

The thing is, if you're bandwidth bound to main memory you're not going to see an improvement just by increasing core to L3 bandwidth.

SocketF said:
AFAIK Onion is dead since Trinity (only used in Llano), but who cares. As long as that 256bit info is correct, I am happy ^^

Did you happen to see the leaked PS4 diagrams that reference Onion and Garlic (and "Onion+") buses? Could be that an evolution of the design made it into Kabini.. or could have been specific for Sony's needs.

Piroko · Apr 21, 2013

inf64 said:
L3 will definitely be there in server parts. For APUs L3 makes zero sense, especially if they are really going for 6T parts. A 6T/512SP SR-based APU will be the sweet spot on mainstream/performance desktop, and L3 cache won't make any difference there (except that it would blow up the die size considerably, which is of course bad).

Is that 6T / 3 module info official or just a guess? I would think that with a 20% uplift in CPU performance (through IPC and/or clocks) and a sizeable uplift in GPU performance, that chip would do fine without a third module.

Vesku · Apr 21, 2013

Piroko said:
Is that 6T / 3 module info official or just a guess? I would think that with a 20% uplift in CPU performance (through IPC and/or clocks) and a sizeable uplift in GPU performance, that chip would do fine without a third module.

Well they could use the marketing points of having a 3 module APU, especially if it's just another $20-30 over the top 2 module.

Abwx · Apr 21, 2013

Actually it has less to do with marketing a 3M/6C than with ensuring
that 28nm's inevitably low yield at the start will still allow
at least 2M/4C from the worst dies, and thus higher ASPs.

Idontcare · Apr 21, 2013

SocketF said:
The first mentioned 28SHP at their GTC in 2011:

http://www.brightsideofnews.com/news/2011/9/11/rumors-14nm-node-and-450mm-wafers-by-2015.aspx

However, since then it more or less vanished, on GF's 28nm website:
http://www.globalfoundries.com/technology/28nm.aspx

.. it wasnt mentioned at all. So people thought it is gone, but then it re-appeared suddenly in February.

There is also the question if SHP == PD-SOI, but I don't see what else it could be. FD-SOI is no candidate, due to its LP-base.

SHP doesn't necessarily mean PD-SOI, TSMC's doesn't. SHP is whatever process GF wants to affix that label to.

To my knowledge, though, IBM had zero interest in developing an SOI-based half-node for the fab club. It may surface again at 20nm, but I'm 99% confident it isn't in the mix for 28nm.

SocketF said:
Thanks, sounds reasonable, however, I am missing the potential of higher ASPs, because the SOI products will clock higher.

At least if you have high-priced products. Not sure what TI was fabbing back then, but I guess it was rather cheap stuff? The only thing I remember was SUN, they fabbed their chip @TI, didnt they?

We made all kinds of high-clock high-power ICs. In fact we made three versions of every node, pretty similar to how any foundry operates.

SOI doesn't mean "high clocks that are otherwise unattainable without the use of SOI", it means "high clocks with less R&D effort to get those high clocks versus the R&D involved in getting the same clocks with bulk-Si".

It's the same story as with leakage. SOI makes the process node development easier because it cuts out a bunch of engineering legwork. But it transfers the work over to the accounting dept as well as the design team. Their jobs become all the more difficult, and it only pays off if your wafer volumes fall below a certain threshold.

Provided you are a small enough player then using SOI as a crutch for getting your node developed will actually pay off in the end.

But if you are a high volume player (and what foundry doesn't intend to be?) then the savings in R&D turn into unacceptable production costs in the fab.

Haven't you ever noticed the lack of interest in SOI by all the large volume fab owners? It was only ever seriously considered by the small-volume guys (AMD included). That was for economic, not technical performance, reasons.
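The volume threshold described above can be sketched as a simple break-even calculation. This is only an illustration under loud assumptions: it treats the SOI benefit as a one-time R&D saving and the SOI cost as a fixed premium per wafer, and the dollar figures are hypothetical placeholders, not numbers from this thread or from any foundry.

```python
def break_even_wafers(rnd_savings, wafer_premium):
    """Wafer count at which the per-wafer SOI premium has used up the R&D saving."""
    if wafer_premium <= 0:
        raise ValueError("wafer_premium must be positive")
    return rnd_savings / wafer_premium


def net_benefit(wafers, rnd_savings, wafer_premium):
    """Positive: SOI came out cheaper overall. Negative: bulk would have been cheaper."""
    return rnd_savings - wafers * wafer_premium


# Hypothetical inputs, chosen only to make the arithmetic visible.
RND_SAVINGS = 100e6    # assumed one-time R&D saving, in dollars
WAFER_PREMIUM = 500.0  # assumed extra substrate cost per wafer, in dollars

threshold = break_even_wafers(RND_SAVINGS, WAFER_PREMIUM)
print(f"break-even volume: {threshold:,.0f} wafers")

for volume in (50_000, 400_000):
    result = net_benefit(volume, RND_SAVINGS, WAFER_PREMIUM)
    print(f"{volume:>7,} wafers -> net {result / 1e6:+.0f} M$")
```

Under this model a low-volume player stays on the positive side of the threshold, while a high-volume foundry crosses it and keeps paying the premium on every further wafer.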

Vesku · Apr 21, 2013

Abwx said:
Actually it has less to do with marketing a 3M/6C than with ensuring
that 28nm's inevitably low yield at the start will still allow
at least 2M/4C from the worst dies, and thus higher ASPs.

Certainly would fit with their WSA with GF. Churn out lots of 3 module APU wafers and cut down as necessary. They are pretty much required to have a certain $ amount of wafers running, so they might as well make sure those are some of the best chips you can get.

SocketF · Apr 21, 2013

Exophase said:
I'm skeptical. The L1 (64KB) + L2 (2MB) icache miss rate has got to be real close to zero for most applications. If your real hot code exceeds 2MB (or even 1MB with two uncorrelated programs) then you're probably running something very weird. Furthermore, they'd have to then fit in L3.
I looked around for NB overclocking tests, found this:

http://www.rage3d.com/reviews/cpu/amd_fx_8150/index.php?p=8

Maybe you have others? The big problem with this comparison is that the OC2 tests don't just involve a 9% northbridge overclock but a 50% increase in DRAM clock speed. Which is reflected directly in Sandra's synthetic memory bandwidth test.

Checked my sources again; most of them were unfortunately Phenom X6s. But a print magazine tested NB-OC with a Vishera under StarCraft 2 and measured +7% with a 20% OC from 2.0 -> 2.4 GHz. Not as great as I thought, but still not bad.

The thing is, if you're bandwidth bound to main memory you're not going to see an improvement just by increasing core to L3 bandwidth.

Depends how many threads/cores you have. Check the Sandra scores of the FM2 CPUs with 2 modules. They are far below the maximum DDR3 bandwidth for dual-channel configurations. Hence there is another limitation, which has to be the NB, since there is nothing else between memory controller and module.
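For reference, the theoretical dual-channel DDR3 ceiling that those Sandra scores are being compared against can be computed as below. This is a rough sketch: the 64-bit (8-byte) channel width is standard for DDR3, but the three memory speeds are example values picked for illustration, since the post does not say which DIMMs were used.

```python
def ddr3_peak_gbs(transfers_mts, channels=2, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s: transfers/s * bytes per transfer * channels."""
    return transfers_mts * 1e6 * bus_bytes * channels / 1e9


def efficiency(measured_gbs, peak_gbs):
    """Fraction of the theoretical peak that a benchmark actually reached."""
    return measured_gbs / peak_gbs


# Example speed grades (assumed, not taken from the thread).
for speed in (1333, 1600, 1866):
    print(f"DDR3-{speed} dual channel: {ddr3_peak_gbs(speed):.1f} GB/s peak")
```

Dividing a measured Sandra result by the matching peak with `efficiency` gives the shortfall being discussed. It does not by itself show where the bottleneck sits.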

Did you happen to see the leaked PS4 diagrams that reference Onion and Garlic (and "Onion+") buses? Could be that an evolution of the design made it into Kabini.. or could have been specific for Sony's needs.

Not sure, but what I saw were not leaks, just some self-made diagrams, if I remember correctly.

Piroko said:
Is that 6T / 3 module info official or just a guess?

It is semi-official. BSN leaked some AMD PDFs about Kaveri, and there it says "2 to 3 modules". There is a small chance that it is fake, but it seems legit, thus semi-official ;-)

Idontcare said:
SHP doesn't necessarily mean PD-SOI,
TSMC's doesn't. SHP is whatever process GF wants to affix that label to.

Yes, but GF already has lots of 28nm processes. They even changed the high-performance process from HP to HPP (high performance plus); it seems the parameters of TSMC's version were better, so they adjusted a few of them.
I cannot think of anything other than SOI for an "SHP" version, and traditionally SHP has always been PD-SOI since they have been using it.

To my knowledge, though, IBM had zero interest in developing an SOI-based half-node for the fab club. It may surface again at 20nm, but I'm 99% confident it isn't in the mix for 28nm.

That, however, is a good argument ... I couldn't find anything else about 28nm SOI either, but I found a 22nm SOI process from IBM:

http://www.chipworks.com/blog/technologyblog/2012/12/19/ibm-surprises-with-22nm-details-at-iedm/

The strange thing is: it is still gate-first.
How big are the chances that GF will call that thing 28SHP? If they are zero, how big are the chances that GF did the job alone? Impossible?
Well, then I have no clue either, and we would be back to the initial question about the meaning of SHP.

We made all kinds of high-clock high-power ICs. In fact we made three versions of every node, pretty similar to how any foundry operates.

SOI doesn't mean "high clocks that are otherwise unattainable without the use of SOI", it means "high clocks with less R&D effort to get those high clocks versus the R&D involved in getting the same clocks with bulk-Si".

Oh, I didn't know either of those things, thanks.
I really thought SOI could achieve more performance. I think I read some time ago that SOI is 20% better (in some parameter, drive strength or whatever it was, no clue now), and thus delivers better clock headroom.

It's the same story as with leakage. SOI makes the process node development easier because it cuts out a bunch of engineering legwork. But it transfers the work over to the accounting dept as well as the design team. Their jobs become all the more difficult, and it only pays off if your wafer volumes fall below a certain threshold.

Yes totally clear then, if there is no performance advantage the calculation is rather easy.

Provided you are a small enough player then using SOI as a crutch for getting your node developed will actually pay off in the end.

But if you are a high volume player (and what foundry doesn't intend to be?) then the savings in R&D turn into unacceptable production costs in the fab.

Yes .. then the only reason would be financial. SOI is like paying less in the beginning (e.g. getting a loan), but then you have to pay off interest to SOITEC over the years = indefinitely.

Haven't you ever noticed the lack of interest in SOI by all the large volume fab owners? It was only ever seriously considered by the small-volume guys (AMD included). That was for economic, not technical performance, reasons.

Sure I did, but I thought it was just like a Pepsi <> Coke thing; furthermore, I thought the engineers at IBM are great people, so they wouldn't choose something bad.
But now it makes more sense with your example ... IBM's server chips are low-volume, so it is very likely better for them. Even if they were above the threshold, they could just price the CPUs higher (but I think PPCs are already expensive enough :biggrin: ). AMD, however, is in a price fight with Intel and has to sell a big and probably expensive 315mm² die with low ASPs ... a bad situation, no, a really bad situation ...

No wonder that they stopped payments for the SHP20 node ... it doesn't pay off. From this point of view the switch to bulk is/was natural, since GF became independent and more customers are bringing higher volume into the fab. I just hope there will still be other customers at 28nm besides the Chinese and AMD *G*

Thanks for all the information!

dbcoopernz · Apr 21, 2013

AMD Jaguar (16h) Software Optimization Guide is now publicly available:

http://support.amd.com/us/Processor_TechDocs/52128_16h_Software_Opt_Guide.zip

Olikan · Apr 21, 2013

Tralalak said:
Comparison of low cost and low power AMD/Intel/VIA CPU microarchitectures

oh... cool, nice chart!

VIA looks very good
at the same time, silvermont with only 64-bit floating looks like a huge bottleneck... like bobcat's

Gideon · Apr 22, 2013

Olikan said:
oh... cool, nice chart!

VIA looks very good
at the same time, silvermont with only 64-bit floating looks like a huge bottleneck... like bobcat's

Lol @ "floing" point.
What's the source for that information? I seriously doubt VIA will support every ISA extension on the planet, including AVX2. The fact that Silvermont has a 64-bit wide FP module is fishy as well. Besides, if Isaiah 2 is really a 2 GHz 4-issue wide core, its power consumption should be in the same league as Sandy Bridge, making this comparison to low-power cores ridiculous.

Personally I think this info is beyond über BS ...

Arachnotronic · Apr 22, 2013

Tralalak said:
Comparison of low cost and low power AMD/Intel/VIA CPU microarchitectures

Tralalak,

Please cite your source on "Silvermont", as there have been no public details shared about this core, nor have there been any leaks beyond the L2$ sizes + OoO + clock frequency. Thanks...

Gideon · Apr 22, 2013

Intel17 said:
Tralalak,

Please cite your source on "Silvermont", as there have been no public details shared about this core, nor have there been any leaks beyond the L2$ sizes + OoO + clock frequency. Thanks...

Actually, i think i found the source:

inf64 · Apr 22, 2013

Intel states a 50-60% improvement with Silvermont (OoO, 4C/4T, 2.1GHz listed in slides) vs previous-gen Atoms (in-order, 2C/4T, 1.5GHz listed in slides). So that gives it about 7-15% per-thread improvement once you normalize the clock speed.
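The normalization above can be reproduced with a few lines. This sketch assumes the 50-60% figure is a multithreaded result with four threads on both chips, so the only thing divided out is the 2.1 / 1.5 clock ratio; the function name is made up for illustration.

```python
def per_clock_gain(total_speedup, new_clock_ghz, old_clock_ghz):
    """Speedup that remains after dividing out the clock-frequency ratio."""
    clock_ratio = new_clock_ghz / old_clock_ghz
    return total_speedup / clock_ratio


# 1.50x and 1.60x are the claimed 50% and 60% overall improvements.
for claimed in (1.50, 1.60):
    gain = per_clock_gain(claimed, new_clock_ghz=2.1, old_clock_ghz=1.5)
    print(f"{claimed:.2f}x overall -> {(gain - 1) * 100:.1f}% per thread per clock")
```

With these inputs the result lands at roughly 7% to 14%, close to the range quoted in the post. The calculation treats a hyperthreaded thread and a full core as equivalent, which is a strong simplification.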

NostaSeronx · Apr 22, 2013

http://www.globalfoundries.com/technology/advanced_tech.aspx

Gideon · Apr 22, 2013

Tralalak said:
Comparison of low cost and low power AMD/Intel/VIA CPU microarchitectures

source: http://semiaccurate.com/2012/08/28/amd-let-the-new-cat-out-of-the-bag-with-the-jaguar-core/
source: http://www.planet3dnow.de/cgi-bin/newspub/viewnews.cgi?id=1346188254
source: http://www.planet3dnow.de/cgi-bin/newspub/viewnews.cgi?id=1361284728
source: http://www.techpowerup.com/180394/AMD-quot-Jaguar-quot-Micro-architecture-Takes-the-Fight-to-Atom-with-AVX-SSE4-Quad-Core.html
source: http://www.forum-3dcenter.org/vbulletin/showthread.php?p=9602732#post9602732
source: http://www.techpowerup.com/178189/Intel-quot-Bay-Trail-quot-Platform-and-quot-Valleyview-quot-Atom-SoC-Detailed.html
source: http://www.expreview.com/20979.html
source: http://www.h-online.com/newsticker/news/item/Processor-Whispers-About-Austin-powers-and-patents-1742927.html

Thank you for the sources, the last article about VIA seems to be the most interesting. It seems that I was too sarcastic about it. Anyway I still don't see 4-issue wide core being mentioned anywhere (though it'll probably need it for AVX2). However if it is indeed as wide, I'm willing to bet, that it should be compared to the Haswell rather than these 2-wide small cores.

If they manage to keep the whole platform power consumption within the same range as Kabini I'd be extremely impressed. Looking at this article, that seems way too far-fetched IMO:

http://www.anandtech.com/show/4332/vias-quadcore-nano-gets-bigger

At 1.2GHz, VIA's QuadCore still carries a 27W TDP. Add another 5W for the integrated graphics chipset and you're talking about 32W, nearly double of AMD's dual-core E-350 Brazos platform

So, the vanilla Isaiah had a 27.5W TDP @ 1.2 GHz, not including a GPU, northbridge and southbridge. It also has an FSB and no on-die memory controller. While Isaiah 2 isn't a 2-die system, it will nevertheless keep the ancient platform with FSB, no integrated GPU and a separate memory controller.

Don't get me wrong, the core seems very impressive, especially the fact that it supports AVX2. But in the end, if you take the whole system power consumption into account, it's much more likely to end up competing with 17-35W Trinity and Haswell rather than these low-power cores.

It's actually quite sad that VIA's platform is so ancient. A decent APU or SoC version of this chip would probably have a lot better IPC and single-threaded performance than Richland.

mikk · Apr 22, 2013

inf64 said:
Intel states a 50-60% improvement with Silvermont (OoO, 4C/4T, 2.1GHz listed in slides) vs previous-gen Atoms (in-order, 2C/4T, 1.5GHz listed in slides). So that gives it about 7-15% per-thread improvement once you normalize the clock speed.

Other slides say up to 2x improvement in productive workload: https://www.youtube.com/watch?v=KBtE6E-c730

Furthermore, the old Atom supports hyperthreading, which boosted multithreaded applications by up to 50%. You can't normalize anything if you don't know the tested application and its hyperthreading speedup.

AtenRa · Apr 22, 2013

inf64 said:
Intel states a 50-60% improvement with Silvermont (OoO, 4C/4T, 2.1GHz listed in slides) vs previous-gen Atoms (in-order, 2C/4T, 1.5GHz listed in slides). So that gives it about 7-15% per-thread improvement once you normalize the clock speed.

So that means that Kabini will have a 100+% performance improvement over Brazos 2.0 (double the cores/threads + 15% IPC). And that will rise even further if you raise the frequency.

Since Brazos 2.0 is faster than the 32nm Atom, I don't see Silvermont faring well against Kabini, except in very low sub-3W SoCs.
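The "double the cores + 15% IPC" estimate works out as below. This is a back-of-the-envelope sketch: the `scaling` parameter (how much of the extra cores turns into real throughput) and the 0.8 example value are assumptions added for illustration, not figures from AMD or from this thread.

```python
def throughput_ratio(core_ratio, ipc_ratio, clock_ratio=1.0, scaling=1.0):
    """Estimated multithreaded throughput relative to the older chip.

    scaling is the fraction of the added cores that shows up as real
    throughput (1.0 means perfect scaling across cores).
    """
    effective_cores = 1.0 + (core_ratio - 1.0) * scaling
    return effective_cores * ipc_ratio * clock_ratio


ideal = throughput_ratio(core_ratio=2.0, ipc_ratio=1.15)
print(f"perfect scaling: {ideal:.2f}x ({(ideal - 1) * 100:.0f}% faster)")

# Assumed, less optimistic case: only 80% of the second pair of cores is usable.
partial = throughput_ratio(core_ratio=2.0, ipc_ratio=1.15, scaling=0.8)
print(f"80% scaling:     {partial:.2f}x ({(partial - 1) * 100:.0f}% faster)")
```

Both cases stay above the 100% mark at equal clocks, and only for workloads that can actually use all four cores.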

NostaSeronx · Apr 22, 2013

Jaguar 2.0 is expected to launch 9 months after Jaguar. Then 5-6 months after that we will see Leopard.

sontin · Apr 22, 2013

AtenRa said:
Since Brazos 2.0 is faster than the 32nm Atom, I don't see Silvermont faring well against Kabini, except in very low sub-3W SoCs.

And that's the environment for Atom. The 3.4W Temash is a dual-core 1GHz SoC with a 75 GFLOPS GPU. Intel will easily beat it with Silvermont.

AtenRa · Apr 22, 2013

sontin said:
And that's the environment for Atom. The 3.4W Temash is a dual-core 1GHz SoC with a 75 GFLOPS GPU. Intel will easily beat it with Silvermont.

That's not the only environment where they will use Atom; Intel plans to use it in low-end laptops and desktops as well.

sontin · Apr 22, 2013

AtenRa said:
That's not the only environment where they will use Atom; Intel plans to use it in low-end laptops and desktops as well.

Sure, you will see ARM SoCs in the same products.
The problem for AMD is: Kabini needs too much power to be faster, and it's too slow while using the same power.
