With irrelevant benchmarks: nice job, very exceptional, much pride.
---
Again this forum provides a perfect example of very limited education in the actual real-world capabilities of these architectures. Cinebench doesn't agree with you, SO YOU'RE WRONG! Anand's half-decade-old benchmarks don't agree with your numbers, SO YOU'RE WRONG! And if AnandTech doesn't prove my point, I'll just go to another review website that uses the same half-decade-old benchmarks to prove you wrong. NOT!
Even with relevant benchmarks, the results are nowhere near actual enterprise or consumer optimization. Benchmarks only show an estimation range for a product; they never give actual performance. If you have results from a benchmark, that means the ESTIMATION of performance is wrong.
---
Orochi is 2x faster than Thuban and 1.1x faster than Gulftown. Enough said: that is exactly what the hardware performance counters report, not what some fictional workload in a benchmark says.
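(For anyone who wants to see what "the counters report" actually means: below is a minimal sketch, assuming a Linux box, that reads retired instructions and core cycles straight from the hardware counters via the perf_event_open syscall and prints IPC. The busy loop is just a stand-in workload.)

```c
/* Minimal sketch (Linux-only): count retired instructions and core
 * cycles around a workload and compute IPC from the raw counters. */
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <stdint.h>

static int open_counter(uint64_t config) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;         /* start stopped; enabled via ioctl below */
    attr.exclude_kernel = 1;
    /* pid = 0 (this process), cpu = -1 (any), no group, no flags */
    return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void) {
    int insns  = open_counter(PERF_COUNT_HW_INSTRUCTIONS);
    int cycles = open_counter(PERF_COUNT_HW_CPU_CYCLES);
    if (insns < 0 || cycles < 0) { perror("perf_event_open"); return 1; }

    ioctl(insns,  PERF_EVENT_IOC_ENABLE, 0);
    ioctl(cycles, PERF_EVENT_IOC_ENABLE, 0);

    volatile uint64_t sum = 0;                  /* stand-in workload */
    for (uint64_t i = 0; i < 100000000ULL; i++) sum += i;

    ioctl(insns,  PERF_EVENT_IOC_DISABLE, 0);
    ioctl(cycles, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t n_insns = 0, n_cycles = 0;
    read(insns,  &n_insns,  sizeof(n_insns));
    read(cycles, &n_cycles, sizeof(n_cycles));
    printf("IPC = %.2f (%llu insns / %llu cycles)\n",
           (double)n_insns / (double)n_cycles,
           (unsigned long long)n_insns, (unsigned long long)n_cycles);
    return 0;
}
```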
---
Heck, the benchmarks even affect Intel's APUs: Intel's Haswell is two times faster than Sandy Bridge and Ivy Bridge, while most benchmarks show only a marginal increase over them.
Is this poor judgement by the consumers? Nope.
Is this poor judgement by the app devs? Nope.
Is this poor judgement by the reviewers? Yes.
I'm confused as to what you are even getting at. AT's server tests did not show 2x gains. Are you looking at purely theoretical numbers? Because that only matters if you can get the performance out of the system. Cell was very impressive on paper but real world, not so much. You may think all benchmarks are just 'estimates' but many of them simulate typical workloads. In fact, enterprise benchmarks are designed to be as 'real world' as possible. Your comment was in reference to what I believed to be consumer products and consumers aren't going to just jump ship on their software when they upgrade.
HW is nowhere close to 2x the speed of Ivy on average.
The scientific method is very results-driven.
Well, what do you consider relevant? I already said that I am NOT looking at benchmarks where BD will be unduly optimized for, including AVX, FMA, etc., as these are a measure of instruction-set support rather than architectural prowess. The fact also remains that a lot of companies are still using a lot of really old software.
I invite you to provide some proof to your claims.
Consumers and enterprises don't buy new CPUs/GPUs for the same applications; they buy new CPUs/GPUs for better applications. If reviewers continue with a new-CPU, same-old-application-version methodology, then reviewers are simply alienating the community and the industry.
Users of Pirate Islands GPUs are not going to get that GPU family for pixel shaders; they are going to get that series of GPUs for compute shaders. New CPUs/GPUs mean that the old gets deprecated and the new gets faster.
Oh, lots of people buy new equipment for the same programs (or newer versions of them). They buy a server to run visualization or rendering on. Performance in 3ds Max 2012 is very indicative of performance in 3ds Max 2014; much more so than performance in something unrelated.
Back to the topic of CMT: CMT is built for everything, not just servers, especially in Bulldozer/Piledriver/Steamroller/Excavator.
Dual-core CMT has two advantages over dual-core CMP:
First, both cores get access to double the resources. That means if only a single core in a dual-core CMT module is active, it gets to hog those doubled resources. If both cores are active, the doubled resources keep each core at roughly the performance of a dual-core CMP processor.
So comparing dual-core CMT against CMT with only one core per module active is iffy at best, since a lone core has access to more resources than each core does when both are loaded. All those benchmarks observing four modules with the second core of each disabled, where scores come out slightly higher, are just showing that the CMT model is working as planned.
I'm not sure I am understanding you correctly (probably not). On BD/PD and Kaveri, one core does not get access to all the resources.
One thread has access to only ONE integer unit (2 ALUs); never does one thread become 4-wide. Look at IDC's CB benchmarks: loading cores first improves performance, but an additional 80% is achieved with the second thread. Note that AMD's implementations share the front end (on BD the decode was 4-wide whether one core or two cores were used; this has since been fixed, and subsequent designs use two 4-wide decoders, so a single thread never gets 8-wide decoding). This sharing of the front end is why single-thread performance is sometimes higher with only one thread loaded, not the execution resources.
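One way to see the module-sharing effect on your own box: pin two busy threads either to the two cores of one module or to cores in different modules, and time both placements. Here is a rough sketch, assuming Linux/glibc pthreads; the CPU numbering (0,1 = module 0; 2,3 = module 1) is an assumption about the topology, so check lstopo or /proc/cpuinfo on the actual machine first:

```c
/* Sketch: pin two worker threads to chosen logical CPUs so you can
 * compare same-module vs separate-module scaling on a CMT chip. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdint.h>

static void *worker(void *arg) {
    int cpu = (int)(intptr_t)arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);                       /* pin this thread to one CPU */
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    volatile uint64_t sum = 0;                /* stand-in integer workload */
    for (uint64_t i = 0; i < 500000000ULL; i++) sum += i;
    return NULL;
}

int main(void) {
    /* Same module: {0, 1}. Separate modules: {0, 2} (assumed numbering). */
    int cpus[2] = { 0, 2 };
    pthread_t t[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, (void *)(intptr_t)cpus[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    puts("done; time this under both placements to see the scaling gap");
    return 0;
}
```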
The second advantage shows up with dual-core workloads, whereas the first applies to single-core workloads. I'm not well educated on the second advantage, but it has to do with TLP.
CMT is not an alternative to SMT; it is an alternative to CMP.
SMT allows for two threads to use different resources per clock. (Thread A: ALU0/ALU1/FPU0/AGU1 ; Thread B: ALU2/ALU3/FPU1/FPU2/AGU0)
CMT allows two threads to use the same-named resources per clock, because each thread's resources are duplicated. (Thread A: EX0/AGLU0/EX1/AGLU1 ; Thread B: EX0/AGLU0/EX1/AGLU1)
Intel's SMT core has the FPU included in the core, while AMD's CMT cores do not include the FPU. The FPU in AMD's dual-core CMT module is separate and uses the SMT model. (Thread A: P0/P2 ; Thread B: P1/P3)
For example, if Intel decided to use CMT, it would look like this: (Thread A: ALU0/ALU1/AGU0/AGU1/FPU0/FPU1/MISC0 ; Thread B: ALU0/ALU1/AGU0/AGU1/FPU0/FPU1/MISC0)
The Intel implementation would have the front end and the L2 cache shared and doubled in size. So if the SMT/CMP version had 2-wide decode, the CMT version would have 4-wide decode; if the SMT/CMP fetch was 16B, the CMT fetch would be 32B; and if the SMT/CMP L2 cache was 256KB, the CMT L2 cache would be 512KB.
If Intel then saw that utilization was not as high as expected, they could layer SMT on top. (Thread A: ALU0/AGU1/FPU0/MISC0 ; Thread B: ALU1/AGU0/FPU2 ; Thread C: ALU1/AGU0/AGU1/MISC0 ; Thread D: ALU0/FPU0/FPU1)
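To make the sharing trade-off concrete, here is a toy per-cycle dispatch model. It is purely illustrative: the port counts and per-thread demand figures are made up, and no real scheduler works this simply. It compares a shared SMT port pool against CMT's private per-core ports when one thread is bursty and the other is light:

```c
/* Toy dispatch model (illustrative only, not any real core):
 * SMT = two threads share one pool of 4 ports per cycle;
 * CMT = each thread owns a private pool of 2 ports per cycle. */
#include <stdio.h>

#define CYCLES 1000

/* Each thread tries to issue `demand` ops per cycle. */
static int run_smt(int demand_a, int demand_b, int shared_ports) {
    int retired = 0;
    for (int c = 0; c < CYCLES; c++) {
        int ports = shared_ports;
        int a = demand_a < ports ? demand_a : ports;   /* thread A first */
        ports -= a;
        int b = demand_b < ports ? demand_b : ports;   /* B gets leftovers */
        retired += a + b;
    }
    return retired;
}

static int run_cmt(int demand_a, int demand_b, int ports_per_core) {
    int retired = 0;
    for (int c = 0; c < CYCLES; c++) {
        int a = demand_a < ports_per_core ? demand_a : ports_per_core;
        int b = demand_b < ports_per_core ? demand_b : ports_per_core;
        retired += a + b;                  /* each core has private ports */
    }
    return retired;
}

int main(void) {
    /* Bursty thread A (3 ops/cycle) next to light thread B (1 op/cycle). */
    printf("SMT, 4 shared ports: %d ops\n", run_smt(3, 1, 4));
    printf("CMT, 2 ports/core:   %d ops\n", run_cmt(3, 1, 2));
    return 0;
}
```

With these made-up numbers the shared pool wins on asymmetric demand (thread A can borrow the ports B isn't using), while CMT's fixed partition caps A at its own core's width; which effect dominates in practice is exactly the implementation question.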
This is really going to depend on the implementation.
From the same link you quoted, there is a server price table below the CPU prices.
The Opteron 6376 server costs $4225.
The Xeon E5-2630 server costs $5008.
They are running different setups. Find where they tried to normalize the configurations to get the power numbers; those are not even heavily loaded numbers.
The fact remains that the 6200 and 6300 series have not been successful in servers at all, despite their lower price (and selling more units for less money, while good for the consumer, is not good for AMD).