The CPU memory bandwidth problem

Scientist113

Junior Member
Jan 21, 2013
19
0
0
The memory bandwidth has only tripled on Intel's CPUs since 2004, a space of 9 years. The Pentium 4 HT Extreme supported 8.512 GB/s of memory bandwidth, while the latest Intel Core i7 Extreme Edition only supports 25.6 GB/s:

http://ark.intel.com/products/71096...ition-8M-Cache-up-to-3_90-GHz?wapkw=i7-3940xm
http://ark.intel.com/products/27492...066-MHz-FSB#infosectiongraphicsspecifications
http://en.wikipedia.org/wiki/Front-side_bus
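
For anyone wanting to check where those two figures come from, here is a rough sketch of the usual peak-bandwidth arithmetic (transfers per second x bus width x channels). The function name is made up for illustration. The inputs are assumptions: a 64-bit front-side bus at 1064 MT/s (4 x 266 MHz, which is the value that reproduces 8.512 GB/s) for the Pentium 4, and dual-channel DDR3-1600 for the Core i7, which is what 25.6 GB/s works out to.

```python
def peak_bandwidth_gbs(mega_transfers_per_sec, bus_width_bytes=8, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return mega_transfers_per_sec * 1e6 * bus_width_bytes * channels / 1e9

# Assumed: one 64-bit front-side bus, quad-pumped 266 MHz clock (1064 MT/s)
p4_fsb = peak_bandwidth_gbs(1064, bus_width_bytes=8, channels=1)

# Assumed: two 64-bit channels of DDR3-1600 (1600 MT/s each)
i7_ddr3 = peak_bandwidth_gbs(1600, bus_width_bytes=8, channels=2)

print(f"Pentium 4 EE: {p4_fsb:.3f} GB/s")
print(f"Core i7 EE:   {i7_ddr3:.1f} GB/s")
print(f"Ratio:        {i7_ddr3 / p4_fsb:.2f}x")
```

These are theoretical peaks only; they say nothing about how much of that bandwidth a given program actually uses.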

This is a massive problem! The CPU is fundamentally limited in performance by its memory bandwidth. Thus the new Intel Core i7 Extreme Edition is only 3 times faster than the Pentium 4 HT Extreme Edition, despite coming out 9 years later.

How to fix it - the onboard GPU. By giving the CPU access to the GPU's resources, it can easily have access to 100+ GB/s of memory bandwidth. A brilliant move.
 

Bill Brasky

Diamond Member
May 18, 2006
4,324
1
0
I'll have what he's having.

I think these are my favorite from the bench... It's interesting and handy that Anand has a P4 on the bench site. It's kind of fun to be able to see how far Intel's come since then. And yes, I realize these heavily threaded benchmarks don't really reflect the IPC difference of Ivy vs Netburst.

 

FalseChristian

Diamond Member
Jan 7, 2002
3,322
0
71
I think you'll notice that having 1 or 2 high-end GPUs such as the GTX 670 or the Radeon HD 7970 makes the biggest difference in games since these GPUs have well over 100GB/s of VRAM bandwidth. 20-30GB/s of bandwidth is more than enough for modern day CPUs. The overclock you get outta yer CPU is way more important. My i5 2500K at 5GHz would still be way faster than a stock i5 2500K at 3.3GHz with 100GB/s of bandwidth.
 
Feb 25, 2011
16,908
1,553
126
Yes, some things are memory limited. If you're going to drop a grand on a CPU, get one that supports quad channel memory.

But not all things are memory limited, and the CPUs are oodles more awesomer than 3x.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
IMO the reason ram bandwidth has not scaled as aggressively as you might like is because it simply hasn't needed to.

http://www.anandtech.com/show/6389/gskill-tridentx-review-2x4gb-at-ddr32666-c111313-165v

The performance difference between DDR3-1333 and DDR3-2666 is on the order of 5% in real-world apps, and only around 20% in synthetic apps.

When that is the case, what business case is to be made for justifying the cost-adder of developing even faster DDR, even higher bandwidth (more channels) or exotic mixed-use models involving GPUs and GDDR?

Today's CPUs don't take advantage of a near doubling in bandwidth, and that is bandwidth that is commercially available.

I suspect Intel and AMD would just as soon develop processors that will take advantage of the cheaply available bandwidth headroom before setting themselves on a bandwidth path that elevates cost (and bandwidth) but brings no more performance improvement than that obtainable with commercially available DDR3-2xxx DIMMs.
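
One way to read the 5% and 20% figures above is through an Amdahl's-law style model: if doubling the bandwidth only buys that much, how much of the run time could have been bandwidth-bound in the first place? This is a simplified sketch, not something from the linked review. It assumes the bandwidth-bound part of the run time scales perfectly with bandwidth and the rest doesn't change at all, and the function name is invented for the example.

```python
def bandwidth_bound_fraction(speedup, bandwidth_ratio):
    """Solve speedup = 1 / ((1 - f) + f / bandwidth_ratio) for f.

    f is the fraction of the original run time assumed to be limited
    purely by memory bandwidth (an idealised Amdahl's-law model).
    """
    return (1 - 1 / speedup) / (1 - 1 / bandwidth_ratio)

# DDR3-1333 -> DDR3-2666 is a 2x bandwidth increase
for label, gain in [("real-world", 1.05), ("synthetic", 1.20)]:
    f = bandwidth_bound_fraction(gain, bandwidth_ratio=2.0)
    print(f"{label}: about {f:.0%} of run time is bandwidth-bound")
```

Under those assumptions only around a tenth of the run time in the real-world apps is waiting on bandwidth, and about a third in the synthetic ones, which fits the point that more bandwidth alone doesn't buy much.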
 

Scientist113

Junior Member
Jan 21, 2013
19
0
0
I would postulate that the Pentium 4 660 would give 1/4 the performance of the Intel i7 3770K. But it in fact gives 1/10th the performance. Why? The i7 3770K has 8 MB of L3 cache and the Pentium 4 660 doesn't. Thus the L3 cache makes a difference. If not for that, it would be 1/4th. I should restate it: the processor is limited by the memory/L3 cache bandwidth. Nevertheless, the L3 cache is only growing so fast.

However you don't need benchmarks to see that the Core i7 3770K can only transfer 4 times more data from its memory banks than the Pentium 4 can. Without its L3 cache, it would be just as slow.

So the memory bandwidth problem can be temporarily addressed with large L3 caches. But that's not a permanent solution, and it makes the CPU far more expensive.
 

Scientist113

Junior Member
Jan 21, 2013
19
0
0
On a different note, these processors are not advancing as fast as I thought they were. It's a completely linear increase: every year, performance increases by the same amount X. Since 2004 it has increased by approximately the strength of one Pentium 4 HT every year. Thus, 9 years later, it is about 9 times stronger.

Thus, since the PlayStation 4 is seeing a 12-15 times increase in CPU processing power over the PlayStation 3, that's quite significant. Significantly above average.
 

Scientist113

Junior Member
Jan 21, 2013
19
0
0
At this rate, in 100 years, that Intel chip will be 100 times more powerful than the Pentium 4 HT. This seems a little slow.
 

Turbonium

Platinum Member
Mar 15, 2003
2,146
82
91
I'll have what he's having.

I think these are my favorite from the bench... It's interesting and handy that Anand has a P4 on the bench site. It's kind of fun to be able to see how far Intel's come since then. And yes, I realize these heavily threaded benchmarks don't really reflect the IPC difference of Ivy vs Netburst.

I had no idea Anandtech.com had this benching feature. Awesome.
 

Turbonium

Platinum Member
Mar 15, 2003
2,146
82
91
At this rate, in 100 years, that Intel chip will be 100 times more powerful than the Pentium 4 HT. This seems a little slow.

Assuming this statement is accurate (which I'm in no way saying it is), how can you complain?

I once saw a post online where someone was saying that it's basically a miracle that any of this technology stuff works in the first place. And I tend to agree.

Just think of what a CPU actually is and how it works. The fact that it even manages to do what it does in the first place is amazing. And you're complaining that it's not progressing fast enough? The whole Moore's Law thing is yet another marvel that has just spoiled you, lol.

Quit complaining and get some perspective.
 

Vectronic

Senior member
Jan 9, 2013
489
0
0
At this rate, in 100 years, that Intel chip will be 100 times more powerful than the Pentium 4 HT. This seems a little slow.

*only* 100 times faster, but probably 1/50th the size... with 1000 cores per CPU... but in 100 years, silicon transistors will be like vacuum tubes "remember those old 7nm chips? they had such a great EMF field to them"
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
On a different note, these processors are not advancing as fast as I thought they were. It's a completely linear increase: every year, performance increases by the same amount X. Since 2004 it has increased by approximately the strength of one Pentium 4 HT every year. Thus, 9 years later, it is about 9 times stronger.

Thus, since the PlayStation 4 is seeing a 12-15 times increase in CPU processing power over the PlayStation 3, that's quite significant. Significantly above average.

At this rate, in 100 years, that Intel chip will be 100 times more powerful than the Pentium 4 HT. This seems a little slow.

It depends on the application of course, but in doing some benching recently with my FX-8350 with a real-world application I have been using since 1998, I found that the performance has actually doubled every two years for the past 14 years (in that app).

From 1998 to 2012 we'd expect performance to now be 128x what it was in 1998, which the 4GHz FX-8350 basically provides.

This requires the core counts we are seeing; it doesn't come from single-threaded IPC improvements. If those 8 cores weren't there then performance for my app would be ~1/6 what you see in the graph.
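
To put numbers on the two growth models being argued about in this thread, here is a small sketch. The function names are made up for the example. The "linear" model is the one proposed earlier in the thread (one Pentium 4 HT's worth of performance added per year) and the "doubling" model is the doubling every two years described above.

```python
def doubling_growth(years, doubling_period_years=2.0):
    """Performance multiple after `years` if it doubles every period."""
    return 2 ** (years / doubling_period_years)

def linear_growth(years, step_per_year=1.0):
    """Performance multiple if a fixed amount is added each year
    (the linear model from earlier in the thread: 9x after 9 years)."""
    return years * step_per_year

print(doubling_growth(2012 - 1998))          # 128.0
print(linear_growth(9), linear_growth(100))  # 9.0 100.0
print(doubling_growth(100))                  # 2**50, about 1.1e15
```

The 128x figure is just 2 to the 7th power: seven doublings in the 14 years from 1998 to 2012.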
 

Scientist113

Junior Member
Jan 21, 2013
19
0
0
Yes, but you're not exactly taking the best CPU from every 2 years and then comparing it to the previous one; most of those are centered after 2005. Only the Pentium 2 is from 1998. But yes, it has doubled that fast for that app.
 

Haserath

Senior member
Sep 12, 2010
793
1
81
The biggest problem is latency instead of bandwidth.

Processors today are smarter at reducing latency of instructions getting to the pipeline to reduce bubbles.

The P4 wasn't very smart. Hence the reason there was such a jump from P4 to Core 2. They used practically the same RAM, but Core 2 could double performance (and even real bandwidth in applications).

There's only so much they can do with the kind of technology we have today. They'll most likely switch to something much faster down the road, so 100 years later silicon transistors will be relics.
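
A rough way to see why latency rather than peak bandwidth can be the limit is Little's law: the bandwidth one core can actually pull is bounded by how many cache-line requests it can keep in flight divided by how long each one takes. The sketch below is illustrative only. The function name, the 10 outstanding misses and the 80 ns / 40 ns latencies are all hypothetical values picked for the example, not measurements of any of the chips discussed here.

```python
def latency_bound_bandwidth_gbs(outstanding_misses, latency_ns, line_bytes=64):
    """Little's law: bytes in flight divided by latency.

    Bytes per nanosecond is numerically the same as GB/s.
    """
    return outstanding_misses * line_bytes / latency_ns

# Hypothetical core: 10 cache-line misses in flight, 64-byte lines
print(latency_bound_bandwidth_gbs(10, 80))  # 8.0 GB/s at 80 ns
print(latency_bound_bandwidth_gbs(10, 40))  # 16.0 GB/s at 40 ns
```

With the same RAM, halving the effective latency (or keeping more requests in flight) doubles what a core can actually use, which is in the spirit of the P4 vs Core 2 point above.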
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
I would postulate that the Pentium 4 660 would give 1/4 the performance of the Intel i7 3770K. But it in fact gives 1/10th the performance. Why? The i7 3770K has 8 MB of L3 cache and the Pentium 4 660 doesn't. Thus the L3 cache makes a difference. If not for that, it would be 1/4th. I should restate it: the processor is limited by the memory/L3 cache bandwidth. Nevertheless, the L3 cache is only growing so fast.

And why exactly would you expect that Pentium 4 to give 1/4th the performance of an i7-3770K? And in what exactly? You must think that the only factors in CPU performance are clock speed, core count, cache, and memory bandwidth. In reality there's a whole host of microarchitectural features that separate P4 from Ivy Bridge, and just by knowing some of those I would put the perf/MHz far higher for Ivy Bridge. I guarantee you that it would still be much faster than the P4 with a tiny amount of L3 cache.

Most programs hit diminishing returns at not just the bandwidth levels we're seeing but also L3 cache levels. Compare performance between a Celeron and an i7 of the same generation, with the same number of cores and threads enabled and locked at the same clock speed. This will give you a good idea of the impact the much larger L3 cache makes.

Thus, since the PlayStation 4 is seeing a 12-15 times increase in CPU processing power over the PlayStation 3, that's quite significant. Significantly above average.

I don't even know where you read something like that... If the PS4 is coming with 8 Jaguar cores at 1.6GHz like rumors say, then it's not going to have 12-15x the performance of Cell in anything that reasonably fits the hardware.
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
Just as an exercise, let's say we have a single-core, 4GHz CPU, and it has 20GB/s available to it. In the absence of prefetching, in order for that core to saturate 20GB/s, it would need to transfer 5B on average over the DRAM memory bus every cycle. In reality, application instruction mixes are more like 40% loads or stores, not 100%, so that reduces our bandwidth needed to only 2 bytes per cycle on average. Next consider that caches are often 90%+ effective at servicing memory operations (sometimes more than 99% effective), and we suddenly need to transfer only 1 byte every 5 cycles on average (or about 0.8GB/s).
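
The exercise above can be worked through in a few lines. This is only a sketch that takes the post's inputs at face value (20GB/s, 4GHz, 40% loads or stores, 90% cache hit rate) and multiplies them out. The function name is invented for the example, and the model ignores prefetching and write-back traffic.

```python
def dram_demand(clock_hz, peak_bytes_per_cycle, mem_op_fraction, cache_hit_rate):
    """Return (bytes per cycle, GB/s) that actually has to come from DRAM."""
    bytes_per_cycle = peak_bytes_per_cycle * mem_op_fraction * (1 - cache_hit_rate)
    return bytes_per_cycle, bytes_per_cycle * clock_hz / 1e9

clock_hz = 4e9
peak = 20e9 / clock_hz          # bytes per cycle needed to saturate 20 GB/s
bpc, gbs = dram_demand(clock_hz, peak, mem_op_fraction=0.40, cache_hit_rate=0.90)

print(f"saturating the bus: {peak:.1f} B/cycle")
print(f"40% memory ops:     {peak * 0.40:.1f} B/cycle")
print(f"after 90% hit rate: {bpc:.2f} B/cycle, about {gbs:.1f} GB/s")
```

With a 99% hit rate the same function gives roughly a tenth of that, so even several cores running this kind of mix stay well under 20GB/s.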

Even if you have 4 cores with 8 threads, you still aren't likely to actually saturate your 20GB/s. Aggressive prefetching can consume more of this spare bandwidth, but that doesn't happen in products available to consumers, so you don't have to worry about that either.
 

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
The rate of progress has quite clearly slowed down. In the Pentium 2 -> Pentium 3 -> Pentium 4 days we saw new instructions, IPC and clock speed improvements with each chip and the new process that came with it. It wasn't uncommon to see a doubling in transistors result in a doubling of performance in everything.

In the 6 years from the first Pentium 2 they went from 300 MHz on 0.35 micrometer all the way to the Northwood Pentium 4's 3.4 GHz on 130 nm. This wasn't a simple doubling of performance; it was a doubling every 18 months, along with the transistor counts.

6 years ago was the mid Core 2 era, with the Q6600 coming out that year. Anandtech bench has that CPU so we can see how it compares: http://www.anandtech.com/bench/Product/551?vs=53. In summary, the modern CPU is about 2-3x faster.

But really it's the period since the i7 920 that is interesting: http://www.anandtech.com/bench/Product/551?vs=47. That was late 2008, available beginning of 2009, some 4 years ago. In that period we have gained 25%-80% performance. Core 2 wasn't really when the performance stopped climbing; despite the reduction in clock speed that came with Core and Core 2, it was Nehalem that seems to have been the last point where we saw a decent gain. Memory bandwidth and latency are clearly an issue, but so are many, many other things. We were very fortunate back in the P2-P3-P4 days and the preceding decades when progress was so fast, but there is no doubt it's slowed dramatically.
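
To compare the two eras on the same scale, the gains quoted above can be turned into an implied doubling time. This is a back-of-the-envelope sketch that assumes smooth exponential growth over each period; the function name is made up. The first call uses clock speed alone (300 MHz to 3.4 GHz in 6 years), the others use the 25%-80% performance gain over the 4 years since the i7 920.

```python
import math

def doubling_time_years(total_gain, years):
    """Years per doubling, assuming steady exponential growth."""
    return years * math.log(2) / math.log(total_gain)

# Clock speed alone, first Pentium 2 to Northwood Pentium 4
print(f"{doubling_time_years(3.4e9 / 300e6, 6):.1f} years per doubling")

# Performance since the i7 920: low and high end of the quoted range
print(f"{doubling_time_years(1.25, 4):.1f} years per doubling")
print(f"{doubling_time_years(1.80, 4):.1f} years per doubling")
```

On those figures the clock alone was doubling in under two years back then, while the recent range works out to somewhere between roughly 5 and 12 years per doubling.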
 

NTMBK

Lifer
Nov 14, 2011
10,322
5,352
136
The rate of progress has quite clearly slowed down. In the Pentium 2 -> Pentium 3 -> Pentium 4 days we saw new instructions, IPC and clock speed improvements with each chip and the new process that came with it. It wasn't uncommon to see a doubling in transistors result in a doubling of performance in everything.

In the 6 years from the first Pentium 2 they went from 300 MHz on 0.35 micrometer all the way to the Northwood Pentium 4's 3.4 GHz on 130 nm. This wasn't a simple doubling of performance; it was a doubling every 18 months, along with the transistor counts.

6 years ago was the mid Core 2 era, with the Q6600 coming out that year. Anandtech bench has that CPU so we can see how it compares: http://www.anandtech.com/bench/Product/551?vs=53. In summary, the modern CPU is about 2-3x faster.

But really it's the period since the i7 920 that is interesting: http://www.anandtech.com/bench/Product/551?vs=47. That was late 2008, available beginning of 2009, some 4 years ago. In that period we have gained 25%-80% performance. Core 2 wasn't really when the performance stopped climbing; despite the reduction in clock speed that came with Core and Core 2, it was Nehalem that seems to have been the last point where we saw a decent gain. Memory bandwidth and latency are clearly an issue, but so are many, many other things. We were very fortunate back in the P2-P3-P4 days and the preceding decades when progress was so fast, but there is no doubt it's slowed dramatically.

Don't look at raw performance numbers; look at performance per watt. We went from ~30W on the Pentium II to ~90W on that Northwood P4 (and the later P4s pushed 130W). The i7 920 was also a 130W chip. But since then Intel has been heavily reining in their power usage, to the point where the 3770K is down to only 77W. And don't forget that that includes (relatively) high performance integrated graphics, which in the old days would have been counted in the chipset's TDP, not the processor's.
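
As a rough illustration of the performance-per-watt point, the sketch below combines the TDP figures above (130W for the i7 920, 77W for the 3770K) with the 25%-80% performance gain from the quoted post. It assumes that gain applies to the 3770K and that TDP is a fair stand-in for actual power draw, neither of which is guaranteed. The function name is made up.

```python
def perf_per_watt_gain(perf_gain, old_tdp_w, new_tdp_w):
    """Ratio of new perf/W to old perf/W, using TDP as a proxy for power."""
    return perf_gain * old_tdp_w / new_tdp_w

# Assumed: 25%-80% more performance going from a 130W chip to a 77W chip
for gain in (1.25, 1.80):
    ratio = perf_per_watt_gain(gain, old_tdp_w=130, new_tdp_w=77)
    print(f"{gain:.2f}x performance -> {ratio:.2f}x performance per watt")
```

On those assumptions the improvement in performance per watt over the same 4 years is roughly 2x to 3x, noticeably better than the raw performance numbers suggest.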
 

exar333

Diamond Member
Feb 7, 2004
8,518
8
91
IMO the reason ram bandwidth has not scaled as aggressively as you might like is because it simply hasn't needed to.

http://www.anandtech.com/show/6389/gskill-tridentx-review-2x4gb-at-ddr32666-c111313-165v

The performance difference between DDR3-1333 and DDR3-2666 is on the order of 5% in real-world apps, and only around 20% in synthetic apps.

When that is the case, what business case is to be made for justifying the cost-adder of developing even faster DDR, even higher bandwidth (more channels) or exotic mixed-use models involving GPUs and GDDR?

Today's CPUs don't take advantage of a near doubling in bandwidth, and that is bandwidth that is commercially available.

I suspect Intel and AMD would just as soon develop processors that will take advantage of the cheaply available bandwidth headroom before setting themselves on a bandwidth path that elevates cost (and bandwidth) but brings no more performance improvement than that obtainable with commercially available DDR3-2xxx DIMMs.

This.

Most tasks are not memory-bandwidth limited at all, and performance scales little with bandwidth, if at all. There was hardly any performance difference on my old X58 rig going from triple to dual channel, and I am currently running quad-channel RAM at under 1600MHz on my X79. It really only matters for iGPUs and some very specific encoding applications.
 