AMD Barcelona Thoughts/Questions

Kuzi

Senior member
Sep 16, 2007
572
0
0
Hey guys, this is my first post on the forums here, but I've been an avid AnandTech reader (great site) for a long time, around 9 years or so.

Since there are a bunch of knowledgeable people here on the forums, I thought I'd ask a few questions that keep running through my head about the new Quad-Core processors from AMD.

From the previews I've seen so far on the net, it seems that the L3 cache (CPU @2GHz) adds about 21ns of additional latency to the caching system, and this L3 latency gets lower as the CPU speed goes up (I think 19ns @2.5GHz). I'm sure AMD's engineers know that this extra latency is offset by the benefit of the shared 2MB L3 cache, otherwise they wouldn't have included an L3 cache at all. Now to my question:

1) Suppose AMD had doubled the L2 cache for each core, so each core gets 1MB L2 instead of the current 512KB, and dropped the L3 entirely. That removes the extra layer of latency mentioned above, but also leaves no shared cache (the CPU would stay around the same size overall). Do you think Barcelona would be slower or faster?
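
One rough way to frame question 1 is average memory access time. The sketch below is only an illustration: the 21ns and 19ns L3 figures are the ones from the previews mentioned above, while every hit rate, the L2 latency, and the RAM latency are made-up placeholders, not measurements of Barcelona. The helper names `cycles` and `amat_ns` are hypothetical.

```python
def cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to core clock cycles."""
    return latency_ns * clock_ghz


def amat_ns(levels, memory_ns):
    """Average access time for requests that have already missed L1.

    levels: list of (hit_rate, latency_ns), ordered from L2 outward.
    Each level's latency is paid by every access that reaches it.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for hit_rate, latency_ns in levels:
        total += reach * latency_ns
        reach *= 1.0 - hit_rate
    return total + reach * memory_ns


# L3 latency from the previews, expressed in core cycles.
print(cycles(21, 2.0), "cycles at 2.0GHz")
print(cycles(19, 2.5), "cycles at 2.5GHz")

# ASSUMED placeholder numbers, chosen only to exercise the formula.
L2_NS, RAM_NS = 6.0, 70.0
with_l3 = amat_ns([(0.90, L2_NS), (0.50, 21.0)], RAM_NS)  # 512KB L2 + shared L3
bigger_l2 = amat_ns([(0.93, L2_NS)], RAM_NS)              # 1MB L2, no L3
print(f"512KB L2 + L3: {with_l3:.1f} ns, 1MB L2 only: {bigger_l2:.1f} ns")
```

Which layout comes out ahead in this toy model depends entirely on the assumed hit rates, so it cannot settle the question on its own. It also ignores the data-sharing benefit of a shared L3.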

On some benchmarks on the net, I've seen Intel Core2 processors with 8MB L2 cache perform about 10-20% faster than the same CPU but with only 4MB L2 cache, and that is clock-for-clock. I know these are rare cases, and these are probably synthetic benchmarks; most other software/games usually get about a 1-5% boost from larger caches. Still, it is a difference, and it can give Intel CPUs a big edge over Barcelona, not because the AMD quad-core is architecturally deficient, but because these programs like more cache, simple as that. Now to my second question:

2) AMD's 45nm CPU, called Shanghai, is supposed to be released Q4 next year (I don't believe they can release it in 2008 at all, but we can always hope). It is supposed to have 6MB L3 cache but the same 512KB L2 cache per core as Barcelona has now. My thought here is this: L2 cache runs faster than L3 cache, so why wouldn't AMD double the L2 cache size and make the L3 cache 4MB, also doubling it? That means instead of Shanghai having 2MB L2/6MB L3, it would have 4MB L2/4MB L3, and the effective CPU size should stay about the same. I'm not an expert on this, but I would assume that the second configuration, 4MB/4MB, would allow the processor to perform faster. Any thoughts on this, guys?
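
The arithmetic behind the two layouts in question 2, as a quick sketch. It assumes four cores and that a MB of L2 costs about the same die area as a MB of L3, which is the post's own simplification and may not hold on real silicon. `total_cache_mb` is a hypothetical helper.

```python
CORES = 4  # quad-core, as with Barcelona


def total_cache_mb(l2_per_core_kb, l3_mb, cores=CORES):
    """Total L2 + L3 capacity in MB for a given per-core L2 and shared L3."""
    return cores * l2_per_core_kb / 1024 + l3_mb


rumored = total_cache_mb(512, 6)    # 4 x 512KB L2 + 6MB L3
proposed = total_cache_mb(1024, 4)  # 4 x 1MB L2 + 4MB L3
print(rumored, proposed)
```

Both layouts add up to the same total capacity, so the question is really about how that capacity is split between the faster private level and the slower shared one.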

About the Barcelona memory controller: from what I understand, on normal motherboards with a single power plane it runs 400MHz slower than the CPU clock speed, and on motherboards that support split power planes it runs a bit faster, at 200MHz below CPU speed. This small boost to the MC does seem to give a small boost to performance. Which brings me to my final question:

3) Why doesn't the MC in Barcelona run at full processor speed instead of 200-400MHz slower? Is this to lower the CPU power usage, or maybe a stability issue? It might have been mentioned somewhere and I have missed it. I'm asking this because for servers this might be fine, but for the desktop it seems AMD will need every increase in performance it can get, especially now that Intel is so close to releasing their Penryn CPUs, and Phenom will have a hard time competing against them.
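
To put numbers on the memory controller clocks described above, here is a small sketch. The fixed 400MHz and 200MHz offsets are the post's understanding of how Barcelona behaves, not a confirmed AMD specification, and `mc_clock_mhz` is a hypothetical helper.

```python
def mc_clock_mhz(cpu_mhz, split_power_planes):
    """Memory controller clock using the offsets described in the post."""
    offset = 200 if split_power_planes else 400
    return cpu_mhz - offset


for cpu_mhz in (2000, 2500, 3000):
    single = mc_clock_mhz(cpu_mhz, split_power_planes=False)
    split = mc_clock_mhz(cpu_mhz, split_power_planes=True)
    print(f"{cpu_mhz}MHz CPU: MC {single}MHz ({single / cpu_mhz:.0%}) "
          f"or {split}MHz ({split / cpu_mhz:.0%})")
```

If the offsets really are fixed, the relative gap between the MC and the core shrinks as the CPU clock goes up.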

*I would like to give my thoughts about K10 from all the info I've seen on the net so far. Like many of you guys here, I care about how this processor will perform on the desktop. As it seems now, for the majority of apps, AMD Phenom CPUs will be about 5-15% slower clock-for-clock than the new Intel Penryns that will be released. And yes, that is taking into account Phenom running with 1066MHz DDR2 memory and maybe a faster running memory controller.
*The frequency scaling and multithreading scaling on Barcelona/Phenom is a bit more efficient than on Intel's CPUs, so the higher the frequency of Phenom, the smaller the advantage Penryn will have. And the more threads a program uses, the smaller the Penryn advantage gets. So next year, with faster Phenom speeds (hopefully 3GHz in Q1) and more multithreaded software/games, Phenom will get really close to Penryn performance but still probably be a bit slower clock-for-clock.
*Architecturally I do believe Barcelona is more advanced than C2D and even Penryn, but the problem I see with it now is small cache sizes and low clock speeds. Imagine if Barcelona was on a 45nm process right now, running at 3.2+GHz with twice the amount of L2/L3 cache; I'm sure it would easily match or beat Penryn. Intel's greater resources and manufacturing strength are giving them the advantage in this case, really.
*As I understand it, K10 can only perform 3 instructions per cycle, same as the old K8 (please correct me if I'm wrong), and C2D can do 4-5 instructions in optimal cases. That seems to me one of the weakest points of the K10 architecture. If AMD can release the 45nm Shanghai next year with high clock speeds (3GHz+) and larger caches, and make one change to the K10 architecture, namely allowing it to run at least 4 instructions per clock cycle (not sure how hard that is to do), I really believe Shanghai would perform better than any CPU Intel will have on the market at the time, even ones running at higher clock speeds, for example a 3.8GHz Penryn vs a 3.0GHz Shanghai. Yep, it could be like the P4 vs Athlon 64 days.

Btw, the 4-instruction K10 Shanghai is just something I added, didn't read anything about it, I don't know if it's possible at all for AMD to do that on K10 anyways.

Sorry for the long post, all comments, corrections, and answers are welcome, please share what you think, thanks.

Kuzi,
 

nonameo

Diamond Member
Mar 13, 2006
5,902
2
76
A. I don't think we'll see a 4-issue Shanghai. I can't tell you why (as I don't know), but I think they would have done that by now if they were going to do it.

B. If you look at benchmarks from other Athlon 64, X2, Sempron and Opteron processors, extra L2 cache brings little added performance in MOST cases (not all). I imagine that AMD added the L3 more as a marketing ploy than for real performance increases (as it's probably not that difficult or expensive to add more L3, relatively speaking). I don't think increasing the L3 will help Shanghai that much either. It's another numbers contest.

 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Rather interesting post. I can't add much to what nonameo said, except that until K10 is shown on a proper chipset and M/B at higher clocks, it's really not fair to say it's slower or faster than Penryn or Merom. It's just not fair to call it right now. I agree with your numbers, but after much thought, one must still wait and see. Just to appease the AMD folks around the web.

The 2 cores we're all interested in are coming out around the same time, so the shootout is going to be interesting and should end all the hype.

Problem is guys like myself and others are going to start hyping Nehalem, no matter how Penryn and Phenom turn out. Then of course AMD folks will probably start hyping Bulldozer. But at least the native thing will be put to rest, along with the on-die memory controller.

I must say this, though: I doubt K10 will ever live up to the hype.
 

bfdd

Lifer
Feb 3, 2007
13,312
1
0
Why are you comparing Shanghai, which is the step AFTER Phenom, to Penryn, which comes out BEFORE Phenom? Shouldn't you be comparing Shanghai to Nehalem?
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
I don't know what's wrong with me, Nemesis, but I find myself agreeing with you yet again.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
Originally posted by: bfdd
Why are you comparing Shanghai, which is the step AFTER Phenom, to Penryn, which comes out BEFORE Phenom? Shouldn't you be comparing Shanghai to Nehalem?
AMD had better have an ace up their sleeve with Bulldozer. Sorry, I had to say it since Bulldozer was feeling left out with all the Shanghai/Penryn/Nehalem references...

 

Lord Banshee

Golden Member
Sep 8, 2004
1,495
0
0
I'll "try" and answer (1)

1) (A) If the new AMD core got rid of the L3 cache and implemented a larger L2 cache, I think it would lose efficiency. The L3 cache is shared between cores, so if Core 2 needs data that Core 4 already asked for and it is still in the L3 cache, Core 2 does not have to go to RAM, only to the L3 cache, which is far faster than RAM. The same applies if Core 4 changes data and it is still stored in the L3 cache when Core 2 needs it: the newest version of the data is still in the L3 cache, so you get a nice speed increase over going to RAM.

(B) I believe the Core2Duo's L2 cache is shared between both cores, which gives it all the features of the AMD L3 cache, but faster because it is an L2 cache. Maybe AMD chose to make the L3 the shared cache because (1) they can take a slight hit in cache latency thanks to their fast IMC, and (2) a shared L3 may lose less performance as it shrinks than Intel's approach, where the shared cache is the L2, a much-needed cache level, and I can see that the more cores use it, the bigger it needs to be. If that is the case for AMD, it is good for them, as AMD has never been as good as Intel at making SRAM cheaply, so if they can get away with less cache for slim to no performance hit, that works in their favor.

Last semester I had a Computer Architecture class where we built a working MIPS32 CPU and had to run simulations, much like what Intel and AMD do, to find the best cache. You would be amazed how much difference little things such as size and the way the cache stores data make to latency. I am 100% sure AMD has done months of simulations and testing of different cache sizes to find the best balance of performance and profit.
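
For a feel of what that kind of cache simulation looks like, here is a minimal sketch of a set-associative cache model with LRU replacement. It is a classroom-style toy, not how AMD or Intel actually simulate: the sizes, the 64-byte line, and the two-address trace are all assumptions picked to make the effect visible, and the `Cache` class and `hit_rate` helper are hypothetical names.

```python
from collections import OrderedDict


class Cache:
    """Set-associative cache with LRU replacement (tags only, no data)."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = True
        return False


def hit_rate(cache, trace):
    """Run every address in the trace through the cache, return hits/total."""
    for address in trace:
        cache.access(address)
    return cache.hits / (cache.hits + cache.misses)


# Two addresses exactly 4KB apart, touched alternately 100 times each.
trace = [0, 4096] * 100

direct_4k = hit_rate(Cache(4096, 64, ways=1), trace)   # same set: thrashes
two_way_4k = hit_rate(Cache(4096, 64, ways=2), trace)  # both lines fit in one set
direct_8k = hit_rate(Cache(8192, 64, ways=1), trace)   # lines land in different sets
print(direct_4k, two_way_4k, direct_8k)
```

With this trace the 4KB direct-mapped cache never hits, while either doubling the size or going 2-way at the same size gets all but the first two accesses to hit. That is the "size and the way it stores data" effect in miniature.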
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0



1. It would run pretty close to the same...what you have to understand is that you're comparing apples to oranges here. For Intel, the large cache helps to compensate for the much higher memory latency inherent in a FSB design...For AMD, the on-die memory controller is already very low latency, so a larger cache has very little effect in L2.
A good example of this is when you compare the X2 4600 to the 4800, or the 4200 to the 4400...the X2 4200 and 4600 have 512k L2, the 4400 and 4800 have double that.
Sharky Extreme Review

The shared L3 however is a different question...I am less conversant with this, but my understanding is that it greatly increases performance when using multiple chips or cHT connections (I believe that it has something to do with both the efficiencies of the MOESI protocol and the higher bandwidth of HT 3.0).


2. As to Shanghai, keep in mind that AMD demoed their first 45nm test chip just 3 months after Intel did (both were SRAM)...and they have been co-developing this process with IBM at East Fishkill.

3. I don't know...it's a good question.

As to Phenom vs Penryn,
I think Nemesis is spot-on when he said "until K10 is shown on a proper chipset and M/B at higher clocks, it's really not fair to say it's slower or faster than Penryn or Merom. It's just not fair to call it right now"
We don't really even know how shipping Barcelonas will bench (remember that the Barcys that were reviewed have turned out to not be production silicon and are 5%+ slower than shipping Barcys).


As to creating a 4-issue Shanghai, there certainly are those rumours, but even C2D hardly EVER uses it...in fact you'd be hard pressed to hit 3 issues per clock due to the way the OS works.

At the end of the day, the L2 on Barcelona has very little to do with its performance, however it makes a very big difference for C2D and Penryn...


 

zsdersw

Lifer
Oct 29, 2003
10,505
2
0
Originally posted by: Viditor
however it makes a very big difference for C2D and Penryn...

That's either misleading or lacking in specificity. C2D's performance across the board, regardless of application, does not hinge upon the size of its L2 cache. Most applications benefit very little, if at all, from the 4MB L2 of most of C2D versus the 2MB of the lower-end products, and even the 1MB of the E21xx processors.

I'd agree that it does matter a bit more than in AMD's architectures, so your statement could be fixed by changing the wording a bit:

"... however it makes a relatively significant difference for C2D and Penryn."

 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: zsdersw
Originally posted by: Viditor
however it makes a very big difference for C2D and Penryn...

That's either misleading or lacking in specificity. C2D's performance across the board, regardless of application, does not hinge upon the size of its L2 cache. Most applications benefit very little, if at all, from the 4MB L2 of most of C2D versus the 2MB of the lower-end products, and even the 1MB of the E21xx processors.

I'd agree that it does matter a bit more than in AMD's architectures, so your statement could be fixed by changing the wording a bit:

"... however it makes a relatively significant difference for C2D and Penryn."

You should read the post before replying...

From the OP:
I've seen Intel Core2 processors with 8MB L2 cache perform about 10-20% faster than the same CPU but with only 4MB L2 cache, and that is clock-for-clock
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Originally posted by: nonameo
B. If you look at benchmarks from other Athlon 64, X2, Sempron and Opteron processors, Extra L2 cache brings little added performance in MOST cases(not all). I imagine that AMD added the L3 as more of a marketing ploy rather than for real performance increases(as it's probably not that difficult or expensive to add more L3, relatively speaking) . I don't think increasing the L3 will help shanghai that much either. It's another numbers contest.

I believe the L2/L3 cache in K10 has a bigger effect on performance than most people assume, so increasing their sizes would give a good boost for many applications. I'd say at least 5% on average, which is still nice.

Originally posted by: Nemesis 1
Problem is guys like myself and others are going to start hyping Nehalem, no matter how Penryn and Phenom turn out. Then of course AMD folks will probably start hyping Bulldozer. But at least the native thing will be put to rest, along with the on-die memory controller.

Bulldozer is far off now. I didn't mention it here because first we have to see how fast AMD can ramp up K10 speeds and how fast they can release 45nm CPUs. We'll have to wait and see.

Originally posted by: bfdd
Why are you comparing Shanghai, which is the step AFTER Phenom, to Penryn, which comes out BEFORE Phenom? Shouldn't you be comparing Shanghai to Nehalem?

First of all, Penryn and Phenom should both be released in the Nov-Dec time frame. I wouldn't call that before and after; a 1 or 2 month difference is around the same time.

I didn't mean my post to be about the race between Intel and AMD, I really wanted to talk about the K10 architecture, but of course had to compare it with what Intel has now just for the sake of having something to compare with. If Intel is going to release Nehalem around the same time as Shanghai, who knows, but my guess is that AMD will have a hard time competing.

Originally posted by: Viditor
A good example of this is when you compare the X2 4600 to the 4800, or the 4200 to the 4400...the X2 4200 and 4600 have 512k L2, the 4400 and 4800 have double that.

The L2 in Athlon 64 and X2 is slow (crap) compared to what K10 and C2D have now. K8 got almost no boost from doubling the L2 cache size, and the same can be said for higher memory clocks, say going from 667MHz DDR2 to 800MHz DDR2. But I'd say K10 is different: it will benefit more from faster memory clocks and larger caches than K8 did.

2. As to Shanghai, keep in mind that AMD demoed their first 45nm test chip just 3 months after Intel did (both were SRAM)...and they have been co-developing this process with IBM at East Fishkill.

Still doesn't mean anything really. Remember Intel has been using 65nm for over two years now, AMD only 10 months or so. Intel has something like a 12-18 month advantage in the process race.

As to creating a 4-issue Shanghai, there certainly are those rumours, but even C2D hardly EVER uses it...in fact you'd be hard pressed to hit 3 issues per clock due to the way the OS works.

I did not know that, how about when using a different OS like Linux?


Thanks for the input everyone.
 

jones377

Senior member
May 2, 2004
451
47
91
Originally posted by: Kuzi

As to creating a 4-issue Shanghai, there certainly are those rumours, but even C2D hardly EVER uses it...in fact you'd be hard pressed to hit 3 issues per clock due to the way the OS works.

I did not know that, how about when using a different OS like Linux?


Thanks for the input everyone.

It has nothing to do with the OS...

 

zsdersw

Lifer
Oct 29, 2003
10,505
2
0
Originally posted by: Viditor
You should read the post before replying...

.. and you shouldn't take a quote out of context by omitting important parts of it, such as what's highlighted below:

On some benchmarks on the net, I've seen Intel Core2 processors with 8MB L2 cache perform about 10-20% faster than the same CPU but with only 4MB L2 cache, and that is clock-for-clock. I know these are rare cases, and these are probably synthetic benchmarks, most other software/games usually get about 1-5% boost from larger caches

Since it's not an across-the-board significant improvement from the cache, how can you truly say that the size of the L2 cache makes a "very big difference" for C2D and Penryn?

I agree that some apps can use all the cache you can throw at them and show increases in performance, but does that really have to do with the CPU and its architecture or does it have more to do with the nature of the application itself? The latter makes more sense, particularly in light of the fact that a relative few applications benefit significantly from a larger L2.
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: Kuzi
Originally posted by: nonameo
Originally posted by: Viditor
A good example of this is when you compare the X2 4600 to the 4800, or the 4200 to the 4400...the X2 4200 and 4600 have 512k L2, the 4400 and 4800 have double that.

The L2 in Athlon 64 and X2 is slow (crap) compared to what K10 and C2D have now. K8 almost didn't get any boost when doubling the L2 cache size, the same can be said when using higher clocked memory speeds say 667MHz DDR2 to 800MHz DDR2. But I'd say K10 is different, it will benefit more from faster memory clocks and larger caches than K8 did.

In what way do you think K10 is different with respect to L2 cache?

2. As to Shanghai, keep in mind that AMD demoed their first 45nm test chip just 3 months after Intel did (both were SRAM)...and they have been co-developing this process with IBM at East Fishkill.

Still doesn't mean anything really. Remember Intel have been using 65nm for over two years now, AMD only 10 months or so. Intel has like 12-18 months advantage in process race.

I guess I just don't see it as a race...remember that Intel had hit a thermal ceiling when they first shipped 65nm. In other words, they absolutely had to ship it no matter WHAT the yields were because they couldn't get the chips to crank any faster at 90nm.
AMD still had miles of headroom left on 90nm, so they could certainly afford to wait until yields were at their highest...
While there is certainly a savings in going from 90nm to 65nm as far as wafer area is concerned, there's ALSO a great deal of savings in waiting because it allows you to continue using existing equipment instead of making the massive capital expenditure before you have to. (i.e. the more use you get out of your equipment, the cheaper each chip is to make)

As to creating a 4-issue Shanghai, there certainly are those rumours, but even C2D hardly EVER uses it...in fact you'd be hard pressed to hit 3 issues per clock due to the way the OS works.

I did not know that, how about when using a different OS like Linux?

That's a very difficult question because there are so many other variables (different distros for example). However, I have never heard of Linux hitting 4 issues per clock either, except in some very rare (almost purpose-built) apps...that said, I'm not incredibly familiar with Linux, so maybe someone else can answer this?
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: jones377
Originally posted by: Kuzi

As to creating a 4-issue Shanghai, there certainly are those rumours, but even C2D hardly EVER uses it...in fact you'd be hard pressed to hit 3 issues per clock due to the way the OS works.

I did not know that, how about when using a different OS like Linux?


Thanks for the input everyone.

It has nothing to do with the OS...

But I take it you don't disagree with the point itself...could you please expound on the reasons?
The only notes I have other than that is a quote:
"The "4-issue" of C2D is different from the "3 complex decodes" of K8.
The former reads 20 bytes per cycle and generate up to 4 micro ops.
The latter fetches 32 bytes per cycle and decodes up to 3 x86 instructions.
In sheer number, K8's 3-way x86 decoding is even better than C2D's 4-way micro op generation, but in average they should perform about the same"
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: zsdersw
Originally posted by: Viditor
You should read the post before replying...

.. and you shouldn't take a quote out of context by omitting important parts of it, such as what's highlighted below:

On some benchmarks on the net, I've seen Intel Core2 processors with 8MB L2 cache perform about 10-20% faster than the same CPU but with only 4MB L2 cache, and that is clock-for-clock. I know these are rare cases, and these are probably synthetic benchmarks, most other software/games usually get about 1-5% boost from larger caches

Since it's not an across-the-board significant improvement from the cache, how can you truly say that the size of the L2 cache makes a "very big difference" for C2D and Penryn?

I agree that some apps can use all the cache you can throw at them and show increases in performance, but does that really have to do with the CPU and its architecture or does it have more to do with the nature of the application itself? The latter makes more sense, particularly in light of the fact that a relative few applications benefit significantly from a larger L2.

Sorry, but are you disagreeing that the most important function of the large caches is to make up for the much higher memory latency??
 

zsdersw

Lifer
Oct 29, 2003
10,505
2
0
No, I'm disagreeing with your assertion that C2D/Penryn performance is affected in a very big way by the size of the L2 cache.
 

Jeff007245

Member
Aug 31, 2007
125
1
81
Originally posted by: zsdersw
No, I'm disagreeing with your assertion that C2D/Penryn performance is affected in a very big way by the size of the L2 cache.

If performance is not affected by the L2 cache, then why does Intel keep increasing cache size? And by a lot, mind you... The performance of the E2xxx series does not compare to that of the E6xxx series, especially the 1MB L2 cache versions...

It's already been said many times by others, articles, and reviewers, to the point that it's been abused, that Intel's Core 2 CPUs need the big cache because they are hindered by the FSB. The reason AMD CPUs don't need that big of a cache is their built-in memory controller and HyperTransport.

Edit: The performance of the E2xxx series does not compare to that of the E6xxx series, especially the 1MB L2 cache versions...

E4xxx(Allendale) series I meant.
 

zsdersw

Lifer
Oct 29, 2003
10,505
2
0
Originally posted by: Jeff007245
If performance is not affected by the L2 cache, then why does Intel keep increasing cache size? And by alot, mind you... The performance of the E2xxx series does not compare to that of the E6xxx series, especially the 1MB L2 cache versions...

It's already been said many times by others, articles, and reviewers to the point that its been abused, that Intel's Core 2 Cpu's need the big cache because of the fact that it is hindered by the FSB. Reason AMD Cpu's don't need that big of a cache is because of its built in memory controller and HyperTransport.

Then perhaps you can explain to me why most apps don't demonstrate a significant improvement in performance between 2MB and 4MB.. or 8MB of L2 cache.

And while you're at it.. explain why FSB1333 doesn't add much to the performance picture versus FSB1066, and why the difference between DDR2-667 and DDR2-800 or higher is pretty small. Seems to me C2D isn't particularly dependent on the speed of either the FSB or RAM.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
First thing I would like to say is this: my hat is off to the AT mods. You have made this forum civil and really fun to post at. Great work, guys.

I see a little talk here on the 3-issue vs 4-issue core thing again.

One must keep in mind that while Intel was working on the Merom core, future cores were also being worked on. Even though 4-issue seldom gets hit, Intel created it with the future in mind. So when we do see Nehalem with its completely new type of HT (Hyper-Threading), I see 4-issue being hit a lot.
So you could say Intel added the 4-issue core to perfect the logic on the chip for upcoming releases. Does this make sense or not?

bryanW1995, that's 2 times now, fella. You best watch yourself or men in white suits will be coming for ya. lol
 

jones377

Senior member
May 2, 2004
451
47
91
Originally posted by: Viditor
Originally posted by: jones377
Originally posted by: Kuzi

As to creating a 4-issue Shanghai, there certainly are those rumours, but even C2D hardly EVER uses it...in fact you'd be hard pressed to hit 3 issues per clock due to the way the OS works.

I did not know that, how about when using a different OS like Linux?


Thanks for the input everyone.

It has nothing to do with the OS...

But I take it you don't disagree with the point itself...could you please expound on the reasons?
The only notes I have other than that is a quote:
"The "4-issue" of C2D is different from the "3 complex decodes" of K8.
The former reads 20 bytes per cycle and generate up to 4 micro ops.
The latter fetches 32 bytes per cycle and decodes up to 3 x86 instructions.
In sheer number, K8's 3-way x86 decoding is even better than C2D's 4-way micro op generation, but in average they should perform about the same"

It has more to do with the code itself than the decoders, although they do play a part. The IPC on most integer code is only around 1.0 on average. But even on such code, having a 4th decoder can improve performance a bit, by a few percent.
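
A toy model of that point, as a sketch only: if the number of instructions ready to go varies from cycle to cycle, average IPC can sit near 1.0 while a 4th slot still helps on the occasional burst. The distribution below is invented for illustration and does not come from any measurement of K8, K10 or C2D; `average_ipc` is a hypothetical helper, and the model ignores instructions carrying over between cycles.

```python
# ASSUMED distribution: probability that k instructions are ready to
# decode/issue in a given cycle. Invented for illustration only.
READY_PER_CYCLE = {0: 0.45, 1: 0.30, 2: 0.12, 3: 0.08, 4: 0.05}


def average_ipc(width, ready=READY_PER_CYCLE):
    """Expected instructions per cycle when at most `width` can go per cycle."""
    return sum(p * min(k, width) for k, p in ready.items())


ipc3 = average_ipc(3)
ipc4 = average_ipc(4)
print(f"3-wide: {ipc3:.2f} IPC, 4-wide: {ipc4:.2f} IPC, "
      f"gain: {ipc4 / ipc3 - 1:.1%}")
```

With these made-up numbers the 4-wide case gains only a few percent, because the extra slot is used only in the rare cycles where four instructions are ready.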
 

Jeff007245

Member
Aug 31, 2007
125
1
81
Originally posted by: zsdersw
Originally posted by: Jeff007245
If performance is not affected by the L2 cache, then why does Intel keep increasing cache size? And by alot, mind you... The performance of the E2xxx series does not compare to that of the E6xxx series, especially the 1MB L2 cache versions...

It's already been said many times by others, articles, and reviewers to the point that its been abused, that Intel's Core 2 Cpu's need the big cache because of the fact that it is hindered by the FSB. Reason AMD Cpu's don't need that big of a cache is because of its built in memory controller and HyperTransport.

Then perhaps you can explain to me why most apps don't demonstrate a significant improvement in performance between 2MB and 4MB.. or 8MB of L2 cache.

And while you're at it.. explain why FSB1333 doesn't add much to the performance picture versus FSB1066, and why the difference between DDR2-667 and DDR2-800 or higher is pretty small. Seems to me C2D isn't particularly dependent on the speed of either the FSB or RAM.

Well, perhaps we could kill 2 birds with one stone by you showing me where a Celeron (Conroe core) 512KB CPU or Core 2 E4xxx (Allendale) 2MB CPU is equal in performance to one with 4MB or 8MB of cache? The second you do that for me, I'll do the same and show you some benchmarks where cache size does matter, especially for Intel Core 2s. Or were you just assuming that the smaller-cache CPUs are equal in performance clock-for-clock to the larger-cache E6xxx CPUs?
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
I don't know the answer to your question. But let us suppose that it is related to a future core, like I said about the 4-issue thing.

For instance, Nehalem will be a native 4-core part, some with an on-die memory controller, some without. A larger shared cache would be helpful on a 4-core part without an on-die controller, so Intel would need that larger cache. No reason why Intel shouldn't try to perfect it.

I read somewhere that Nehalem will also have a shared L1 cache. When I find it I will give a link, but that info should show up this week anyway. Another place where large caches would come in handy is code morphing, which would be very helpful with a core that would use a compiler such as the Elbrus compiler.

As I said already, if Intel gives us good info on Nehalem this week, we shall be able to see a much clearer picture of what Intel is doing and the direction they're heading.
 

zsdersw

Lifer
Oct 29, 2003
10,505
2
0
Originally posted by: Jeff007245
Well perhaps we could kill 2 birds with one stone by you showing me where you see a Celeron(Conroe Core) 1mb CPU or Core 2 E4xxx(Allendale) 2mb CPU is equal in performance with a one with 4mb cache or 8mb cache? The second you could do that for me, i'll do the same and show you some benchmarks where cache size does matter especially for Intel Core 2's. Or were u just assuming that the smaller cache CPU's are = in performance clock for clock with the lager cache E6xxx CPU's?

http://www.bit-tech.net/hardwa...ore_2_duo_processors/5

Most of the time, there's little benefit from the additional 2MB of L2.
 

Jeff007245

Member
Aug 31, 2007
125
1
81
Originally posted by: zsdersw
Originally posted by: Jeff007245
Well perhaps we could kill 2 birds with one stone by you showing me where you see a Celeron(Conroe Core) 1mb CPU or Core 2 E4xxx(Allendale) 2mb CPU is equal in performance with a one with 4mb cache or 8mb cache? The second you could do that for me, i'll do the same and show you some benchmarks where cache size does matter especially for Intel Core 2's. Or were u just assuming that the smaller cache CPU's are = in performance clock for clock with the lager cache E6xxx CPU's?

http://www.bit-tech.net/hardwa...ore_2_duo_processors/5

Most of the time, there's little benefit from the additional 2MB of L2.

Most of the time?

15 out of 18 tests in that link you posted showed an increase in performance with the bigger-cache CPU. The only time it didn't increase was in those compression tests, but MOST of the time it did. In the gaming benchmarks cache does make a difference, but at higher resolutions like the ones shown in that test (1600x1200) there isn't much of an increase because the GPU is doing almost all of the work at those resolutions. Keep in mind that at lower resolutions, 1024x768 and below, the CPU has a lot to do with the performance; hence the bigger change when moving from a smaller cache to a larger cache.
 