I'm comparing the two between each other, not towards intel.
With irrelevant bentmarks, nice job, very exceptional, much pride.
---
Again, this forum provides a perfect example of very limited education in the actual real-world capabilities of architectures. Cinebench doesn't agree with you, SO YOU'RE WRONG! Anand's half-decade-old bentmarks don't agree with your numbers, SO YOU'RE WRONG! If Anandtech doesn't prove my point, I'll go to another review website that uses the same half-decade-old bentmarks to prove you wrong.
NOT!
Even relevant benchmarks are nowhere near actual enterprise or consumer optimization. So benchmarks only give an estimated range for a product's performance, never the actual performance. If you have results from a bentmark, that means the ESTIMATION of performance is wrong.
---
Orochi is 2x faster than Thuban and 1.1x faster than Gulftown. Enough said. That is exactly what the hardware performance counters return, not what some fictional workload in a benchmark says.
---
Heck, the bentmarks even affect Intel's APUs: Intel's Haswell is two times faster than Sandy Bridge and Ivy Bridge, while most bentmarks show it having only a marginal increase over them.
Is this poor judgement of the consumers? nope.
Is this poor judgement of the app devs? nope.
Is this poor judgement of the reviewer? yes.
---
Consumers/Enterprise don't buy new CPUs/GPUs for the same applications; they buy new CPUs/GPUs for better applications. If reviewers continue with the methodology of testing new CPUs on the same old application versions, then reviewers are simply alienating the community and the industry.
Users of Pirate Islands GPUs are not going to get that GPU family for pixel shaders. They are going to get that series of GPUs for compute shaders. New CPUs/GPUs mean that the old gets deprecated and the new gets faster.
---
Back to the topic of CMT: CMT is built for everything, not just servers, and that goes especially for Bulldozer/Piledriver/Steamroller/Excavator.
Dual-core CMT has two advantages over dual-core CMP:
Both cores get access to double the resources. That means if a single core in a dual-core CMT solution is active, it gets to hog those resources. If both cores are active, the doubled resources keep each core at the same performance as a core in a dual-core CMP processor.
So comparing dual-core CMT with single-core-active CMT is, well, iffy at best, since the single active core has access to more resources than each core gets when both are active. So all the benchmarks that look at four modules with the second core disabled, and get slightly higher scores, are just showing that the CMT model is working as planned.
That first advantage applies to single-core workloads; the second advantage comes with dual-core workloads. I'm not well educated on the second one, but it has to do with TLP (thread-level parallelism).
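To put the first advantage in numbers, here is a toy sketch of the sharing model described above. It is not a measurement of any real CPU: the function names and the idea of a single `base` amount of shareable resources per core are my own assumptions for illustration.

```python
# Toy model of the claim in this post, NOT data from real hardware.
# Assumption: a CMP core owns `base` units of shareable resources
# (fetch/decode/L2), and a dual-core CMT module pools 2 * base of them.

def cmp_share(base: float, active_cores: int) -> float:
    """Resources each active CMP core sees: always its own fixed slice."""
    if active_cores < 1:
        raise ValueError("need at least one active core")
    return base


def cmt_share(base: float, active_cores: int, cores_per_module: int = 2) -> float:
    """Resources each active CMT core sees: the doubled pool split among active cores."""
    if not 1 <= active_cores <= cores_per_module:
        raise ValueError("active_cores out of range for this module")
    pool = base * cores_per_module
    return pool / active_cores


if __name__ == "__main__":
    for active in (1, 2):
        print(active, "active:", "CMP", cmp_share(1.0, active), "CMT", cmt_share(1.0, active))
```

With one core active the CMT core sees the whole doubled pool; with both active each one falls back to the same share a CMP core would have, which is the point about the disabled-second-core benchmarks.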
CMT is not an alternative to SMT; it is an alternative to CMP.
SMT allows for two threads to use different resources per clock. (Thread A: ALU0/ALU1/FPU0/AGU1 ; Thread B: ALU2/ALU3/FPU1/FPU2/AGU0)
CMT allows for two threads to use the same resources per clock, as each thread's resources are duplicated. (Thread A: EX0/AGLU0/EX1/AGLU1 ; Thread B: EX0/AGLU0/EX1/AGLU1)
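The difference between the two per-clock examples can be sketched with plain sets. This is only an illustration of the distinction I'm drawing: the unit names come from the examples above, while `private_units` and `conflicts` are hypothetical helpers, not anything from a real tool.

```python
# Sketch of the SMT vs CMT per-clock examples above (illustrative only).

# SMT: one shared pool of units; in a given clock the two threads
# must land on different units from that pool.
smt_thread_a = {"ALU0", "ALU1", "FPU0", "AGU1"}
smt_thread_b = {"ALU2", "ALU3", "FPU1", "FPU2", "AGU0"}


def private_units(core_id: int, unit_names: list[str]) -> set[tuple[int, str]]:
    """CMT: each core gets its own physical copy of every unit, tagged by core."""
    return {(core_id, name) for name in unit_names}


cmt_unit_names = ["EX0", "AGLU0", "EX1", "AGLU1"]
cmt_thread_a = private_units(0, cmt_unit_names)
cmt_thread_b = private_units(1, cmt_unit_names)


def conflicts(a: set, b: set) -> set:
    """Physical units that both threads would need in the same clock."""
    return a & b


if __name__ == "__main__":
    print("SMT conflicts:", conflicts(smt_thread_a, smt_thread_b))
    print("CMT conflicts:", conflicts(cmt_thread_a, cmt_thread_b))
```

Neither case has a physical conflict, but for different reasons: SMT avoids it by splitting one pool between the threads, CMT avoids it by giving each thread a full duplicate with the same unit names.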
Intel's SMT core has the FPU included in the core, while AMD's CMT core does not include the FPU in the cores. The FPU in AMD's dual-core CMT processor is separate and uses the SMT model. (Thread A: P0/P2 ; Thread B: P1/P3)
For example if Intel decides to use CMT it would be like; (Thread A: ALU0/ALU1/AGU0/AGU1/FPU0/FPU1/MISC0 ; Thread B: ALU0/ALU1/AGU0/AGU1/FPU0/FPU1/MISC0)
The Intel implementation will have the Front-End and the L2 cache shared and doubled in size. So, if the SMT/CMP version was 2-way decode then the CMT version will be 4-way decode. If the SMT/CMP fetch was 16B then the CMT fetch will be 32B. If the SMT/CMP L2 cache is 256KB then the CMT L2 cache is 512KB.
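The doubling is just arithmetic, but as a quick sketch (the `FrontEnd` class and `to_cmt` helper are hypothetical names of mine, and the figures are only the example numbers from the paragraph above, not real Intel specs):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrontEnd:
    decode_width: int  # instructions decoded per clock
    fetch_bytes: int   # bytes fetched per clock
    l2_kb: int         # L2 cache size in KB


def to_cmt(single: FrontEnd, cores: int = 2) -> FrontEnd:
    """Shared CMT front end sized as `cores` times the single-core one (assumed model)."""
    return FrontEnd(
        decode_width=single.decode_width * cores,
        fetch_bytes=single.fetch_bytes * cores,
        l2_kb=single.l2_kb * cores,
    )


# Example figures from the post: 2-way decode, 16B fetch, 256KB L2.
smt_core = FrontEnd(decode_width=2, fetch_bytes=16, l2_kb=256)
cmt_module = to_cmt(smt_core)

if __name__ == "__main__":
    print(cmt_module)
```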
If Intel saw that utilization was not as high as expected, then they could include SMT as well. (Thread A: ALU0/AGU1/FPU0/MISC0 ; Thread B: ALU1/AGU0/FPU2 ; Thread C: ALU1/AGU0/AGU1/MISC0 ; Thread D: ALU0/FPU0/FPU1)