Reason for Conroe's Performance Boost?

seferio

Mar 11, 2006
Out of curiosity, has anyone thought about why Conroe is THAT much faster than Dothan/Yonah and, similarly, Rev. E of the A64?

My thought is that the 4MB L2, even though it is shared between both cores, lets single-threaded applications (e.g. games) take advantage of the whole large cache. Looking at how cache size improves performance for, say, a Venice vs. a San Diego (and accounting for diminishing returns), I would think going from 2MB to 4MB would account for at least 5-10% of the performance difference between the A64 and Conroe.

I'm not sure how macro-op fusion would really help, but I guess in the end it lets programs make use of otherwise idle execution units, boosting overall performance. I'd say it could account for anywhere from 3-5% of the performance boost.
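A toy Python sketch of what macro-op fusion buys: it counts decoded ops for a short loop body, fusing a compare/test that is immediately followed by a conditional jump into a single op. The fusible set `{"cmp", "test"}`, the helper names and the loop body are illustrative assumptions; real hardware places further restrictions on which pairs fuse and how often.

```python
# Toy model of macro-op fusion (hypothetical, simplified): a cmp/test followed
# immediately by a conditional jump occupies one decode/issue slot instead of two.
FUSIBLE_FIRST = {"cmp", "test"}  # assumed set of instructions that can start a fused pair


def is_cond_jump(mnemonic: str) -> bool:
    # Conditional jumps start with "j" but are not the unconditional "jmp".
    return mnemonic.startswith("j") and mnemonic != "jmp"


def count_ops(program: list[str], fuse: bool) -> int:
    """Count decoded ops for a list of mnemonics, optionally fusing cmp/test + jcc."""
    ops = 0
    i = 0
    while i < len(program):
        if (fuse and i + 1 < len(program)
                and program[i] in FUSIBLE_FIRST
                and is_cond_jump(program[i + 1])):
            i += 2  # the pair is treated as a single op
        else:
            i += 1
        ops += 1
    return ops


# A hypothetical loop body: six instructions ending in a cmp + jne pair.
loop_body = ["mov", "add", "inc", "mov", "cmp", "jne"]
print(count_ops(loop_body, fuse=False))  # 6
print(count_ops(loop_body, fuse=True))   # 5
```

Fewer ops for the same work means a fixed decode and issue width covers more of the program per clock, which is the mechanism behind the "idle execution units" guess in the post.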

Lastly, the boost in SSE throughput will probably be pretty huge in applications that use it. I bet if Anand turned off SSE support in games we could see some really interesting numbers (e.g. compare Quake 4 on an A64 with SSE off vs. Conroe with SSE off).

This is just my take on where Conroe's performance advantage actually comes from. The upshot is that AMD can gain some ground back with a 4MB L3 and a shared cache (which I doubt will happen in the K8 generation, since the additional control logic would be too difficult to add). Once/if AMD fixes DDR2 on AM2 it should be even closer, but I definitely think Intel will hold the lead until AMD comes out with a true next-generation CPU: shared cache, larger cache (at least 2MB per core), branch prediction improved to Intel's level, and better SSE/SSE2/SSE3 performance.
 

dguy6789

Diamond Member
Dec 9, 2002
Cache size going from a Venice to San Diego does absolutely NOTHING for performance. Conroe has a massive performance boost for many reasons. They are far too numerous for me to remember and mention in a single post. However, when it all gets down to it, Conroe is just a more efficient and superior architecture when compared to what AMD has. How could AMD fix DDR2 in AM2 if it is not broken in the first place? AMD's branch prediction is already superior to Intel's by quite a bit. SSE2/3 performance will not be improved at all unless AMD makes a cpu similar to Intel's Netburst architecture, and that is simply not going to happen.

It seems that Anandtech has not done an in-depth article on the Conroe architecture. I suggest waiting until they do one, or doing a Google search to get the exact reasons why Conroe is superior to the Hammer architecture.
 
seferio

Mar 11, 2006
To me Conroe is merely a tweaked Pentium M core. The gains must be incremental over a Pentium M baseline at least, and the Pentium M is already almost comparable to the A64 clock for clock. Shared cache, dual core, one more execution unit, macro-op fusion, and single-cycle SSE seem to be the main improvements over Yonah/Dothan.

In any case, I've never read anything saying the A64's branch prediction unit is better than Intel's. If anything, isn't it supposed to be smaller and more efficient, but in the end Intel's wins? I mean, Intel's BP in the P4 has got to be damn good to make up for Netburst's crappiness.
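Since neither vendor's predictor is spelled out in this thread, here is a neutral reference point: a textbook 2-bit saturating-counter predictor in Python. It is not AMD's or Intel's actual design; the starting state and the loop pattern are assumptions chosen for illustration.

```python
def simulate_2bit(outcomes: list[bool]) -> float:
    """Prediction accuracy of a single 2-bit saturating counter.

    States 0-1 predict not-taken, states 2-3 predict taken.
    """
    state = 2  # assumed start: weakly taken
    correct = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            correct += 1
        # Move toward "strongly taken" on taken branches, "strongly not-taken" otherwise.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct / len(outcomes)


# Hypothetical loop branch: taken 9 times, then falls through; the loop runs 100 times.
pattern = ([True] * 9 + [False]) * 100
print(simulate_2bit(pattern))  # 0.9
```

Even on a perfectly regular loop this simple scheme mispredicts every loop exit. More elaborate predictors that track branch history can learn such patterns, and on a deep pipeline every avoided mispredict saves many cycles.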
 

dguy6789

Diamond Member
Dec 9, 2002
Read the following articles. I find that anyone who has read all of them in full has a very deep understanding of the Hammer architecture.

http://www.anandtech.com/cpuchipsets/showdoc.html?i=1815

http://www.anandtech.com/cpuchipsets/showdoc.html?i=1816

http://www.anandtech.com/it/showdoc.html?i=1817

http://www.anandtech.com/cpuchipsets/showdoc.html?i=1818

http://www.anandtech.com/showdoc.aspx?i=1884

If you read all of those articles, you will understand in depth how the Athlon 64 and Opteron function. So many people do not appreciate what it takes to make an Athlon 64 run as well as it does. Reading those will give you a great understanding. I recommend searching the Anandtech archives and looking for an article on the Netburst architecture.
 

IntelUser2000

Elite Member
Oct 14, 2003
Originally posted by: seferio
To me Conroe is merely a tweaked Pentium M core. The gains must be incremental over a Pentium M baseline at least, and the Pentium M is already almost comparable to the A64 clock for clock. Shared cache, dual core, one more execution unit, macro-op fusion, and single-cycle SSE seem to be the main improvements over Yonah/Dothan.

In any case, I've never read anything saying the A64's branch prediction unit is better than Intel's. If anything, isn't it supposed to be smaller and more efficient, but in the end Intel's wins? I mean, Intel's BP in the P4 has got to be damn good to make up for Netburst's crappiness.

Oh no, thinking Conroe is a tweaked Pentium M core is such a mistake. It's a world of difference. Conroe improves VASTLY over Core Duo, which itself was an improvement over Pentium M.

Yonah vs. Dothan:
Micro-op fusion for SSE instructions of all types (SSE/SSE2/SSE3)
SSE instructions are now handled by all three decoders
SSE3 instruction set
Faster execution of some SSE2 instructions as well as integer divide (FP performance improvement)
Enhanced data prefetch

Read about Conroe vs. Yonah and P4 here:

http://www.realworldtech.com/page.cfm?ArticleID=RWT030906143144&p=1

You can sort of say this will be Intel's K7 (back when K7s kicked ass over P6).







 

IntelUser2000

Elite Member
Oct 14, 2003
Man, again, Conroe is a much bigger improvement over Yonah than Banias ever was over P6.

All Banias did over P6 was:
Add micro-op fusion
Use the P4 bus
Use a bigger cache
Add a dedicated stack manager
Improve branch prediction
Add a few more pipeline stages

Conroe fixes ALL the major architectural disadvantages P6 architectures had over K7, in addition to further enhancing it!!!

-Improvements to OOO
-More Execution units, which are individually more powerful than Banias/Dothan/Yonah
-More powerful SSE
-Faster FSB
 

dguy6789

Diamond Member
Dec 9, 2002
Originally posted by: MBrown
I thought conroe didnt have netburst.

It doesn't. I was simply stating that it is better to understand as many architectures as possible. Even if one does not like something, it is still in one's best interest to understand it.
 

IntelUser2000

Elite Member
Oct 14, 2003
The truth is, there is NO single exact reason for Conroe's performance. It's everything: branch prediction, execution units, cache structure, memory subsystem, front end and back end are all improved.

That's a magnitude better/more exciting than the last 5 years from Intel, when the ONLY sources of performance increases were four things: cache, FSB, clock speed, and optimization (like SSE).
 

MDme

Senior member
Aug 27, 2004
I agree it's a combination of all of the features. The wider execution core (4-wide), a better branch predictor (gained from the P4/Netburst experience), macro-op and micro-op fusion (which, combined with the wider execution core, make it even more potent), more cache, shared cache, a better FSB, and better execution units will certainly make it truly better than the A64.

Remember that Core Duo is already roughly as good as the A64 clock for clock, slightly better in some cases and slightly worse in others.

Anyway, Intel realized that to beat a great architecture like the A64 they needed to try everything including the kitchen sink, and so they did. They improved everything, hoping it would take them over the top. It's a slam dunk!

The ballgame now moves to AMD... they'd better have something ready soon or they will get pummeled next round. I hope AMD doesn't lose the reputation they worked so hard to gain.

If AMD did nothing while waiting for Intel to catch up, then they deserve to get, and will get, their A$$ whooped in 2006-2007... maybe longer.

If AMD had something in the works (maybe K8L or K10) and manages to put it out early enough (1H 2007), then it will be good for them... and deservedly so (if it is better than what Intel has on the table).

Now it's time to put off upgrade plans until 1H 2007...
 

xtknight

Elite Member
Oct 15, 2004
More instructions per clock: 3 (Pentium M) vs. 4 (Conroe). Also:
  1. Intel Wide Dynamic Execution. Intel widened the execution pipeline so that now 4 instructions can be processed within a single clock cycle. As a result, fewer cycles are needed to process the same number of instructions, and hence less energy is consumed for the same amount of work. They have also enhanced the ability to combine instructions. Now they are talking not only about micro-fusion, where simpler instructions are combined into a single one, but also about the newly introduced macro-fusion, where instructions such as "compare" and "jump" can be combined into a single instruction.
  2. Intel Advanced Digital Media Boost. This innovation deals with the SSE family of instructions. Now the entire family of SSE instructions will be executed in a single cycle.
  3. Intel Advanced Smart Cache (L2 cache). They introduced a shared cache structure and incorporated advanced caching algorithms. Now data will be shared more efficiently between the processors. Even if one CPU is idle at a given moment, the other CPU will be able to take advantage of the entire cache. In other words, there will be no partitioning of the cache space.
  4. Intel Smart Memory Access. Intel introduced improved prefetch algorithms that deliver additional flexibility for ordering loads and stores. These algorithms significantly improve energy efficiency and performance.
  5. Intel Intelligent Power Capability. Here they mean advanced power gating, i.e. shutting down the parts of the chip that are not needed for instruction execution.
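A back-of-the-envelope sketch of points 1 and 2, assuming an ideal core with no dependencies, cache misses or mispredicts, so these are upper bounds rather than predictions. The hypothetical helpers below compute best-case cycles for 3-wide vs. 4-wide issue, and the issue slots used by 128-bit SSE ops when each is split into two 64-bit halves versus handled in one pass; treating the older cores as splitting each op is an assumption of this model.

```python
import math


def ideal_cycles(n_ops: int, width: int) -> int:
    """Best-case cycles to push n_ops through a core that handles `width` ops per clock."""
    return math.ceil(n_ops / width)


def sse_issue_slots(n_128bit_ops: int, single_pass: bool) -> int:
    """Issue slots consumed by 128-bit SSE ops.

    Assumption: a core without full-width SSE units splits each 128-bit op into
    two 64-bit halves; a single-pass core handles each op in one slot.
    """
    return n_128bit_ops * (1 if single_pass else 2)


n = 1200  # hypothetical instruction count
print(ideal_cycles(n, 3))                       # 400
print(ideal_cycles(n, 4))                       # 300
print(sse_issue_slots(100, single_pass=False))  # 200
print(sse_issue_slots(100, single_pass=True))   # 100
```

Real code rarely sustains full width, so the 25% cut from 3-wide to 4-wide and the halving of SSE slots are ceilings; the achieved gain depends on how much parallelism the program exposes.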
 

Fox5

Diamond Member
Jan 31, 2005
Originally posted by: seferio
Out of curiosity, has anyone thought about why Conroe is THAT much faster than Dothan/Yonah and, similarly, Rev. E of the A64?

My thought is that the 4MB L2, even though it is shared between both cores, lets single-threaded applications (e.g. games) take advantage of the whole large cache. Looking at how cache size improves performance for, say, a Venice vs. a San Diego (and accounting for diminishing returns), I would think going from 2MB to 4MB would account for at least 5-10% of the performance difference between the A64 and Conroe.

I'm not sure how macro-op fusion would really help, but I guess in the end it lets programs make use of otherwise idle execution units, boosting overall performance. I'd say it could account for anywhere from 3-5% of the performance boost.

Lastly, the boost in SSE throughput will probably be pretty huge in applications that use it. I bet if Anand turned off SSE support in games we could see some really interesting numbers (e.g. compare Quake 4 on an A64 with SSE off vs. Conroe with SSE off).

This is just my take on where Conroe's performance advantage actually comes from. The upshot is that AMD can gain some ground back with a 4MB L3 and a shared cache (which I doubt will happen in the K8 generation, since the additional control logic would be too difficult to add). Once/if AMD fixes DDR2 on AM2 it should be even closer, but I definitely think Intel will hold the lead until AMD comes out with a true next-generation CPU: shared cache, larger cache (at least 2MB per core), branch prediction improved to Intel's level, and better SSE/SSE2/SSE3 performance.

1. In the past, SSE hasn't really provided large performance boosts.
2. Performance doesn't increase linearly with cache size increases all the time. Sometimes, doubling cache can result in a big performance increase. It's very similar to memory, adding more memory does little for performance...until you run into a situation where you haven't had enough.
3. I'd imagine Conroe beats the K8 because it at least matches the execution core, has more and faster cache, and better branch prediction, so the only thing it lacks is the integrated memory controller.
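Point 2 can be illustrated with a small LRU cache simulation in Python. It is a fully associative toy with a hypothetical working set of 3000 cache lines looped over 10 times; real L2s are set-associative and prefetch, so real curves are smoother, but the cliff is the same idea.

```python
from collections import OrderedDict


def lru_hit_rate(trace: list[int], capacity: int) -> float:
    """Simulate a fully associative LRU cache of `capacity` lines; return the hit rate."""
    cache: OrderedDict[int, bool] = OrderedDict()
    hits = 0
    for line in trace:
        if line in cache:
            hits += 1
            cache.move_to_end(line)  # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(trace)


# Hypothetical workload: sweep a 3000-line working set 10 times.
working_set = 3000
trace = list(range(working_set)) * 10

for capacity in (1000, 2000, 4000, 8000):
    print(capacity, round(lru_hit_rate(trace, capacity), 2))
# 1000 0.0
# 2000 0.0
# 4000 0.9
# 8000 0.9
```

Doubling from 2000 to 4000 lines takes the hit rate from 0% to 90% because the working set suddenly fits, while doubling again to 8000 adds nothing.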

Essentially, a dated AMD design against an extremely modern Intel design, sort of like the Athlon against the P3. Besides the IMC, what major changes has the Athlon core seen since the first 500MHz core was released? (OK, it has seen plenty of major changes since then... OK, since the T-Bird then.)

Conroe is an Athlon-like Intel core design with pretty much everything that every other Intel processor has brought to the table and that was superior to AMD's.

BTW, Netburst didn't suck (until heat dissipation became a problem). I'd say it was easily the superior architecture to the Athlon, and if heat dissipation weren't a problem we would definitely be seeing a follow-up to it instead of Conroe.

Anyhow, it certainly seems to show that AMD should have pushed up development of the K9/K10 rather than pushed it back. Looks like there was substantial room for improvement after all.
Now then, if AMD is getting whooped by Intel... what happens to IBM? Well, not IBM specifically, I'm sure their Power5 line is fine, but the Power4-derived G5 was barely hanging in there; it looks like it's done for now.
 

dmens

Platinum Member
Mar 18, 2005
Excellent article. I'd like to add that besides the new architectural features, a portion of Merom's performance can be attributed to its much larger buffers in all parts of the chip. A bigger RS/ROB (reservation station/reorder buffer) goes a long way toward preventing pipeline stalls.
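A rough Python model of why a bigger ROB helps: while a load waits on memory, the core can keep dispatching independent ops only until the reorder buffer fills, and then it stalls. The 200-cycle miss latency is an assumption, and 40 vs. 96 entries are the ROB sizes commonly quoted for P6-derived cores and for Core, used here only as illustration; real behavior also depends on the RS, the load/store buffers and actual dependencies.

```python
import math


def overlap_during_miss(miss_latency: int, rob_entries: int, width: int) -> tuple[int, int]:
    """Return (ops overlapped with one memory miss, cycles left fully stalled).

    Simplified model: the missing load holds one ROB entry at the head; the core
    dispatches up to `width` independent ops per cycle until the ROB is full,
    then waits for the miss to return.
    """
    window = rob_entries - 1                 # entries left after the stalled load
    ops = min(window, width * miss_latency)  # limited by ROB space or dispatch rate
    busy_cycles = math.ceil(ops / width)     # cycles spent dispatching those ops
    return ops, miss_latency - busy_cycles


print(overlap_during_miss(200, 40, 3))  # (39, 187)
print(overlap_during_miss(200, 96, 4))  # (95, 176)
```

Under these assumptions the larger window hides more than twice as much independent work behind each miss, which is the kind of stall reduction the post describes.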
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Originally posted by: Fox5
Essentially, a dated AMD design against an extremely modern Intel design, sort of like the Athlon against the P3.

P4?? That was a new design compared to the K7 and P6, yet it didn't perform better initially, so your argument doesn't always apply.
 

Fox5

Diamond Member
Jan 31, 2005
Originally posted by: IntelUser2000
Essentially, a dated AMD design against an extremely modern Intel design, sort of like the Athlon against the P3.

P4?? That was a new design compared to the K7 and P6, yet it didn't perform better initially, so your argument doesn't always apply.

1. Willamette was rushed; the P4 wasn't ready for prime time yet. AMD had Opterons out well before they offered competitive performance, but wouldn't replace the Athlon XP until they did.
2. The P4 and Athlon could practically be considered on the same tech level, with the P4 a step ahead. (AMD did add some major updates to the Athlon design going from Slot A to Socket A.)
3. Part of the reason Northwood eventually sidelined the Athlon XP was that development on the Athlon XP stopped in favor of the Athlon 64, which got delayed. Even so, the Athlon XP was pushed to its limit. Had AMD kept improving it, it likely wouldn't have reached clocks any higher than the current Athlon 64s and would have lost to Prescott anyway, despite being a design pushed to its absolute limits, while Prescott hit a barrier not directly related to its design. (A water-cooled Prescott versus a water-cooled Athlon XP would have shown an even greater gap.)
 

ZippZ

Member
Jul 24, 2000
I'd say it's the SSE and cache. Look at all the benchmarks that were run; they would all benefit from SSE. Also look at the cache: a single-threaded program would be able to use the whole 4MB.

Now let's see that chip run a multi-threaded server or database workload; I bet the scores would drop compared to a K8. Still an impressive chip, but nothing that will put AMD out of business. AMD will lose the crown this year, but will probably get it back next year, and so forth.
 

dexvx

Diamond Member
Feb 2, 2000
Originally posted by: XBoxLPU
Originally posted by: dexvx
All the reasons are here:

http://www.realworldtech.com/includes/t...m?ArticleID=RWT030906143144&mode=print

Core Architecture means Merom aka Conroe. It's far more than a tweaked Pentium-M.
AT is slipping on articles.....

How is it slipping?

The target audience of AT is generally the end-user/entry level enthusiast. That article I linked is for CSE/EE majors/dedicated hobbyists. Very few people can thoroughly comprehend that article on this forum.
 

Fox5

Diamond Member
Jan 31, 2005
Originally posted by: ZippZ
I'd say it's the SSE and cache. Look at all the benchmarks that were run; they would all benefit from SSE. Also look at the cache: a single-threaded program would be able to use the whole 4MB.

Now let's see that chip run a multi-threaded server or database workload; I bet the scores would drop compared to a K8. Still an impressive chip, but nothing that will put AMD out of business. AMD will lose the crown this year, but will probably get it back next year, and so forth.

Eh, most games didn't get much of a boost from SSE. The Quake and Doom series were about the only ones that ever used it to good effect.
 

XBoxLPU

Diamond Member
Aug 21, 2001
Originally posted by: dexvx
Originally posted by: XBoxLPU
Originally posted by: dexvx
All the reasons are here:

http://www.realworldtech.com/includes/t...m?ArticleID=RWT030906143144&mode=print

Core Architecture means Merom aka Conroe. It's far more than a tweaked Pentium-M.
AT is slipping on articles.....

How is it slipping?

The target audience of AT is generally the end-user/entry level enthusiast. That article I linked is for CSE/EE majors/dedicated hobbyists. Very few people can thoroughly comprehend that article on this forum.

Guess you didn't read the article on the K8 architecture.
 

Zebo

Elite Member
Jul 29, 2001
Reason for Conroe's Performance Boost?

They had a 3.6GHz Conroe in there instead of a 2.6?

Personally, I don't fully trust Intel's "test", because reviewers weren't even allowed to look inside the box, and other sites have shown the A64 scores to be off by ABOUT 10%... not to mention Intel's lousy track record for being straight up. But I predict Conroe will still be about 10% faster clock for clock than an A64 across a broad set of benchmarks. That shouldn't be too surprising, since Yonah was equal clock for clock and Conroe is Yonah with lots of performance-enhancing extras added, as noted.
 