First Steamroller processor core exposure


sushiwarrior

Senior member
Mar 17, 2010
738
0
71
Jaguar IPC is higher than Piledriver's at low frequencies of 1.5-2GHz. Also, Kabini doesn't suffer from CMT: a quad-core 1.5GHz Kabini A4-5000 is faster than the quad-core 1.6GHz Piledriver A8-4550M.


But, I don’t expect a 2GHz Kabini to win over a 2.5GHz (3.5GHz turbo) Piledriver A10-5750M

I see it losing in several benchmarks, so I would not agree that it is definitively faster than piledriver. Jaguar wins on cinebench and x264, loses on 3dmark and superpi and wprime. So no, not necessarily always faster...
 

Hitman928

Diamond Member
Apr 15, 2012
5,581
8,753
136
I see it losing in several benchmarks, so I would not agree that it is definitively faster than piledriver. Jaguar wins on cinebench and x264, loses on 3dmark and superpi and wprime. So no, not necessarily always faster...

You have to be aware of what is being tested and how on each.

1) Cinebench - Jaguar wins against quad core Richland
2) x264 - Jaguar loses to a quad core Richland (the one in the test is dual core) but Richland should have an instruction advantage here
3) Super-pi - Jaguar ties, but Richland has a turbo advantage, so clock for clock Jaguar wins
4) wprime - Jaguar wins at 32m but loses at 1024m
5) Truecrypt - AES is a win for Jaguar but a loss for the other two, not sure what instructions they utilize.
6) 3dmark06 cpu - Jaguar beats quad Richland

See http://www.notebookcheck.net/AMD-A-Series-A8-4555M-Notebook-Processor.81873.0.html for quad Richland. Ignore the x264 result though, something is weird there as it gets half the score of the dual core A6-4455m for some reason.
 

Abwx

Lifer
Apr 2, 2011
11,163
3,859
136
This is very interesting. As shown in the PS4 thread*, 4 Jaguar cores have about the same performance as dual Sandy Bridge cores with hyperthreading (aka an i3). And 8 Jaguar cores have about the same performance as quad Sandy Bridge cores with hyperthreading (aka an i7).

Now, if Steamroller is twice as powerful as Jaguar, this means that 4 Steamroller cores will have about the same performance as quad Sandy Bridge cores with hyperthreading (aka an i7), or about the same performance as octo Piledriver cores (aka an FX-8). :awe:

This would explain the new server roadmaps.



The 8C Piledriver Opteron 3300s are replaced by 4C Steamroller Berlin, and the Warsaw Piledriver is coming only in 12C/16C configurations, with the old configurations dropped.

I think it is safe to speculate that the Warsaw chips will eventually be replaced by a future 6C/8C Berlin APU/CPU, or whatever it ends up being named.

This doubled performance of the new Steamroller would also explain why the desktop roadmaps for 2013 only consider 2C/4C Kaveri, with the early rumoured 6C being dropped. The 4C Kaveri would be competing with a Sandy Bridge i7 (in CPU of course; Kaveri graphics will be much more advanced).

*
http://forums.anandtech.com/showpost.php?p=35021461&postcount=1120
http://forums.anandtech.com/showpost.php?p=35028201&postcount=1193

They say that Berlin will have "almost" 8x the Gflops/Watt throughput of a 16-core 3.2 GHz Opteron, which has a theoretical peak of 204.8 Gflops at 140W TDP = 1.46 Gflops/Watt. Hence they are targeting about 1.46 x 8 = 11.68 Gflops/Watt, that is 1168 Gflops for a 100W Kaveri.

Since they said "almost" 8x, let's round it to 1100 Gflops. Using 512 SPs at 1 GHz for the GPU they can reach 1000 Gflops; the rest is to be provided by the FPU of the CPU, something that will be between 50 and 100 Gflops.
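
As a rough check of the arithmetic above, here is a minimal Python sketch. It assumes 4 flops per cycle per core for the Opteron (inferred from 204.8 / (16 x 3.2), not a figure AMD gave) and 2 flops per cycle per SP for the GPU; all function and variable names are made up for illustration. Without rounding to 1.46 first, the 100 W target comes out near 1170 rather than 1168 Gflops.

```python
def peak_gflops(units, ghz, flops_per_cycle):
    """Theoretical peak: units x clock (GHz) x flops per unit per cycle."""
    return units * ghz * flops_per_cycle

# Opteron baseline. ASSUMPTION: 4 flops/cycle/core, inferred from the
# 204.8 Gflops figure quoted in the post.
opteron_gflops = peak_gflops(16, 3.2, 4)
opteron_eff = opteron_gflops / 140          # Gflops per watt at 140 W TDP

# "Almost 8x" the efficiency, applied to a 100 W part.
kaveri_target = opteron_eff * 8 * 100       # unrounded target in Gflops

# GPU share. ASSUMPTION: 512 SPs at 1 GHz, 2 flops/cycle/SP (one FMA).
gpu_gflops = peak_gflops(512, 1.0, 2)

# CPU share if the target is rounded down to 1100 Gflops as in the post.
cpu_share = 1100 - gpu_gflops

print(f"Opteron: {opteron_gflops:.1f} Gflops, {opteron_eff:.2f} Gflops/W")
print(f"100 W target (unrounded): {kaveri_target:.0f} Gflops")
print(f"GPU: {gpu_gflops:.0f} Gflops, CPU share: {cpu_share:.0f} Gflops")
```

With the target rounded down to 1100, 512 SPs at 1 GHz leave roughly 76 Gflops for the CPU, which sits inside the 50 to 100 range guessed above.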
 

Sazuzaki

Senior member
Jul 11, 2013
313
0
0
I got high hopes for the multi threaded performance, as for single thread, who cares.. AMD is pushing the use of multi core and I think they are starting to jumpstart that engine..
 

relztes

Junior Member
Apr 19, 2009
8
0
0
They say that Berlin will have "almost" 8x the Gflops/Watt throughput of a 16-core 3.2 GHz Opteron, which has a theoretical peak of 204.8 Gflops at 140W TDP = 1.46 Gflops/Watt. Hence they are targeting about 1.46 x 8 = 11.68 Gflops/Watt, that is 1168 Gflops for a 100W Kaveri.

Since they said "almost" 8x, let's round it to 1100 Gflops. Using 512 SPs at 1 GHz for the GPU they can reach 1000 Gflops; the rest is to be provided by the FPU of the CPU, something that will be between 50 and 100 Gflops.

I think AMD already mentioned that 35 W Berlin would have about 700 Gflops total. That is the basis of their calculation, not the 100 W part. The Opteron has a base frequency of 2.8 GHz. FMA counts as 2 flops, so the calculation is:
16 cores * 2.8 GHz * 8 flops/core/Hz / 140 W = 2.56 Gflops/Watt
Berlin: 20 Gflops/Watt * 35 Watts = 700 Gflops

If you assume 60-100 Gflops for the CPU, then 600-640 Gflops / 512 SP / 2 = 585-625 MHz GPU clock.

Although actually, 1000-1100 Gflops for the 100 W part sounds about right I think.
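
The same numbers can be checked in a few lines of Python. This is only a sketch of the arithmetic in this post: the 700 Gflops at 35 W figure is the one recalled above, and helper names such as `gpu_clock_mhz` are hypothetical.

```python
# Opteron baseline as computed in the post: 16 cores, 2.8 GHz base clock,
# 8 flops per core per cycle (FMA counted as 2 flops), 140 W TDP.
opteron_eff = 16 * 2.8 * 8 / 140        # about 2.56 Gflops/W

# Berlin figure as recalled in the post: about 700 Gflops at 35 W.
berlin_eff = 700 / 35                   # 20 Gflops/W
ratio = berlin_eff / opteron_eff        # how close this is to "8x"

def gpu_clock_mhz(total_gflops, cpu_gflops, sps=512, flops_per_sp=2):
    """GPU clock in MHz needed to supply what the CPU does not.

    ASSUMPTION: 512 SPs and 2 flops per SP per cycle, as in the post.
    """
    gpu_gflops = total_gflops - cpu_gflops
    return gpu_gflops / (sps * flops_per_sp) * 1000

low = gpu_clock_mhz(700, 100)           # CPU contributes 100 Gflops
high = gpu_clock_mhz(700, 60)           # CPU contributes 60 Gflops

print(f"Opteron: {opteron_eff:.2f} Gflops/W, Berlin ratio: {ratio:.2f}x")
print(f"GPU clock range: {low:.0f}-{high:.0f} MHz")
```

The ratio of about 7.8 matches the "almost" 8x wording, and the clock range reproduces the 585-625 MHz estimate.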
 

SPBHM

Diamond Member
Sep 12, 2012
5,057
410
126
I see it losing in several benchmarks, so I would not agree that it is definitively faster than piledriver. Jaguar wins on cinebench and x264, loses on 3dmark and superpi and wprime. So no, not necessarily always faster...

yes...
Jaguar 2 cores vs 1 module is faster for most MT software at the same clock, but it doesn't matter because Jaguar can't work at the same clock...
it's like.. Pentium 4 vs Pentium 3 I guess.

for 35W and higher I think PD is simply better.
 

wlee15

Senior member
Jan 7, 2009
313
31
91
yes...
Jaguar 2 cores vs 1 module is faster for most MT software at the same clock, but it doesn't matter because Jaguar can't work at the same clock...
it's like.. Pentium 4 vs Pentium 3 I guess.

for 35W and higher I think PD is simply better.

Well, we don't really know how far Jaguar can actually clock, since Kabini currently only goes up to 2GHz at 25W TDP.
 
Aug 11, 2008
10,451
642
126
If it is as fast as Piledriver and can be clocked to 4.0GHz, why are they selling an expensive, power-hungry chip when they could sell a Jaguar model and make a much bigger profit?
 

SPBHM

Diamond Member
Sep 12, 2012
5,057
410
126
Well, we don't really know how far Jaguar can actually clock, since Kabini currently only goes up to 2GHz at 25W TDP.

I think there is a good reason, they can't go higher (not much anyway), probably anything over 2GHz-25W is going to be bad perf per watt, and perhaps 2GHz is not far from the overall limit...
Bobcat was released at 1.6GHz and I think it never went past 1.65 (but I've seen overclocking to as high as 2.1)

if Jaguar was better than Piledriver for 35W and higher, I would expect AMD to be using it for that
 

galego

Golden Member
Apr 10, 2013
1,091
0
0
That is strange if you think about it.
8-core consoles. AMD will get what they wanted and needed: multicore utilization in games, forced on game devs by next-gen console design. Going back to 2/4-core CPUs now makes no sense, that is, if games will truly be multithreaded.

But I can see how they may be frustrated or even disappointed by games optimizations (even GE titles!), and don't want to deal with those problems again.

I also find that part difficult to swallow, going back to a 4 core setup after the consoles would appear to make a strong case for 8.

In #255 I mentioned the tomshardware rumour that each Steamroller module could execute four threads simultaneously (2M/8T) and how this could fit with consoles being 8C/8T. But I cannot find any verification of this rumour.

In any case, regardless of the number of threads, look at how AMD is replacing 8C Piledriver with 4C Steamroller. Do you really believe that AMD will be releasing an 8C Steamroller for gaming? That looks to me like a disproportionate amount of performance.
 

galego

Golden Member
Apr 10, 2013
1,091
0
0
They say that Berlin will have "almost" 8x the Gflops/Watt throughput of a 16-core 3.2 GHz Opteron, which has a theoretical peak of 204.8 Gflops at 140W TDP = 1.46 Gflops/Watt. Hence they are targeting about 1.46 x 8 = 11.68 Gflops/Watt, that is 1168 Gflops for a 100W Kaveri.

Since they said "almost" 8x, let's round it to 1100 Gflops. Using 512 SPs at 1 GHz for the GPU they can reach 1000 Gflops; the rest is to be provided by the FPU of the CPU, something that will be between 50 and 100 Gflops.

AMD said that kaveri 4C will be 1.05 TFLOP. My estimation is

CPU: 4C * 8 * 4.1GHz = 131.2 GFLOP
GPU: 512 * 2 * 0.9GHz = 921.6 GFLOP

APU: 1053 GFLOP
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,688
1,222
136
http://phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9MTI1MTM5fENoaWxkSUQ9LTF8VHlwZT0z&t=1

February 2, 2012:
Testing performed by AMD Performance Labs. Calculated compute performance or Theoretical Maximum GFLOPS score for 2013 Kaveri (4C, 8CU) 100w APU, use standard formula of (CPU Cores x freq x 8 FLOPS) + (GPU Cores x freq x 2 FLOPS). The calculated GFLOPS for the 2013 Kaveri (4C, 8CU) 100w APU was 1050. GFLOPs scores for 2011 A-Series “Llano” was 580 and the 2013 A-Series “Trinity” was 819. Scores rounded to the nearest whole number.
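
The footnote's formula is easy to evaluate directly. The sketch below plugs in the clock guesses posted earlier in the thread (4.1 GHz CPU, 0.9 GHz GPU) and assumes that 8 CUs means 512 GPU cores at 64 per CU. None of those inputs come from the AMD footnote, and the function name is hypothetical.

```python
def apu_peak_gflops(cpu_cores, cpu_ghz, gpu_cores, gpu_ghz):
    """Footnote formula:
    (CPU cores x freq x 8 FLOPS) + (GPU cores x freq x 2 FLOPS)."""
    cpu_part = cpu_cores * cpu_ghz * 8
    gpu_part = gpu_cores * gpu_ghz * 2
    return cpu_part + gpu_part

# ASSUMPTIONS: clocks are guesses from earlier in the thread, not AMD data;
# 8 CUs x 64 cores per CU = 512 GPU cores.
GPU_CORES = 8 * 64
kaveri = apu_peak_gflops(cpu_cores=4, cpu_ghz=4.1,
                         gpu_cores=GPU_CORES, gpu_ghz=0.9)

print(f"Kaveri 4C/8CU estimate: {kaveri:.1f} Gflops")
```

With those guessed clocks the estimate comes out close to the 1050 Gflops quoted in the footnote, though other CPU and GPU clock combinations would fit just as well.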
 

galego

Golden Member
Apr 10, 2013
1,091
0
0
But I started my post with "AMD said that kaveri 4C will be 1.05 TFLOP." My estimation was about what part of those 1050 correspond to CPU and what part to GPU.

I still don't understand what LegSWAT meant...
 

Ventanni

Golden Member
Jul 25, 2011
1,432
142
106
I'd imagine if you scale Jaguar cores up to higher clock speeds, it'd perform a lot like a Phenom II would. If I'm not mistaken, the original Bobcat cores were based a lot off a stripped down K10 design with "90% the IPC of a K8" as the target goal. Jaguar nicely improves on that, but now that I think about it more (because what I'm saying probably contradicts what I've posted in the past), I don't think it'd make it a more competitive chip [at the high end] than AMD's Bulldozer line.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,860
3,407
136
I'd imagine if you scale Jaguar cores up to higher clock speeds, it'd perform a lot like a Phenom II would. If I'm not mistaken, the original Bobcat cores were based a lot off a stripped down K10 design with "90% the IPC of a K8" as the target goal. Jaguar nicely improves on that, but now that I think about it more (because what I'm saying probably contradicts what I've posted in the past), I don't think it'd make it a more competitive chip [at the high end] than AMD's Bulldozer line.

you are mistaken, bobcat is a complete new design, nothing in it shares anything with K8. jaguar continues the deviation by implementing cache hierarchies that AMD have never implemented before.

K8 was used as a benchmark of performance and im guessing that's because it too only had 64bit ALU's so it was most comparable.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
you are mistaken, bobcat is a complete new design, nothing in it shares anything with K8. jaguar continues the deviation by implementing cache hierarchies that AMD have never implemented before.

K8 was used as a benchmark of performance and im guessing that's because it too only had 64bit ALU's so it was most comparable.

I know folks say it was a complete new design but just how much of a "complete new design" is truly "new"?

Would it make sense, both engineering-wise as well as project management-wise, to build a bobcat core from the ground up by paying zero heed to the decades of lessons learned and accessible know-how that had gone into every preceding microarchitecture?

I find it hard to believe, but agree it is not impossible, that bobcat was created in the vacuum of taking liberties with existing circuit blocks and experience.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
I know folks say it was a complete new design but just how much of a "complete new design" is truly "new"?

Would it make sense, both engineering-wise as well as project management-wise, to build a bobcat core from the ground up by paying zero heed to the decades of lessons learned and accessible know-how that had gone into every preceding microarchitecture?

I find it hard to believe, but agree it is not impossible, that bobcat was created in the vacuum of taking liberties with existing circuit blocks and experience.

In practice, you leverage things that make sense to leverage, and redo things that make sense to redo (of course you also run into some not-invented-here or want-to-redo-but-don't-have-time). Sometimes you can't even tell from an external presentation which components get leveraged and which are from-scratch just because of the (relatively low) level of detail that gets disclosed. Redoing everything when you already have reasonable starting points is silly.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,860
3,407
136
I know folks say it was a complete new design but just how much of a "complete new design" is truly "new"?

Would it make sense, both engineering-wise as well as project management-wise, to build a bobcat core from the ground up by paying zero heed to the decades of lessons learned and accessible know-how that had gone into every preceding microarchitecture?

I find it hard to believe, but agree it is not impossible, that bobcat was created in the vacuum of taking liberties with existing circuit blocks and experience.

i guess it's going to depend on what level you want to look at.

but when you consider that:
ALUs are completely different
has way more L/S capabilities/flexibility
cache operates completely differently
FPU config is again different and very low latency
fetch and decode are different
also using automated/synthesizable layout/macros/etc

So at a functional level, each "block" of the CPU is different. Sure, structures might very well be reused, but for anyone other than the people actually building the chip that's rather a moot point, because the CPU functionally operates nothing like K8.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,688
1,222
136
Bobcat's ALUs and vALUs are from 00h/K8. The cache structure, front-end structure, back-end structure is mostly taken from Bulldozer.
Jaguar's ALUs and vALUs are from 10h/K10. The cache structure, front-end structure, back-end structure is mostly taken from Steamroller.

Certain aspects of Bobcat/Jaguar make them superior to 00h/10h/11h/12h/15h. Generally, they are better with CISC and the latency involved with CISC.
 