SemiAccurate has a pretty in-depth article about Haswell's graphics:
http://semiaccurate.com/2012/04/02/haswells-gpu-prowess-is-due-to-crystalwell/
"We are told the GT3 variants of Haswell will have 64MB of on-package memory connected through an ultra-wide bus."
Will that really be enough? Discrete GFX cards have like 1-3 GB of RAM!
Seems like the worst-case scenario will be really bad with only 64 MB. You'll sometimes have to swap out the contents of that 64 MB memory and repopulate it from normal system RAM, i.e. you fall back to the same memory bandwidth as today with Sandy/Ivy Bridge? So the worst case won't be any better than with those CPUs?
So you might get very high performance for a while when gaming, and then from time to time the performance drops radically when the 64 MB cache has to be swapped out? So the rendering will stutter?
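Just to put a number on that worst case, here is a rough sketch of how long a full refill of the 64 MB would take, assuming a dual-channel DDR3-1600 system at ~25.6 GB/s peak (my assumption, not a figure from the article):

# Rough estimate: time to refill the whole 64 MB from system RAM (assumed numbers).
cache_bytes = 64 * 1024**2      # 64 MB on-package memory
dram_bw = 25.6e9                # assumed peak system-RAM bandwidth, bytes/s
refill_ms = cache_bytes / dram_bw * 1e3
print(f"full refill: ~{refill_ms:.1f} ms")   # ~2.6 ms, vs. a 16.7 ms frame at 60 fps

In practice the cache would presumably be refilled incrementally rather than all at once, so whether that actually shows up as stutter is an open question.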
"In the end, the massive bandwidth, coupled with the 5x increase in shader performance, will mean Haswell is a real graphics monster."
Can that really be true? 5x performance increase compared to Ivy Bridge HD4000 IGP?
The VR-Zone article estimates a 2-3x increase instead (see: http://vr-zone.com/articles/mystery...up-the-graphics-ante-further-again/15272.html).
Are either realistic? :hmm:
"We are told the GT3 variants of Haswell will have 64MB of on-package memory connected through an ultra-wide bus."
Will that really be enough? Discrete GFX cards have like 1-3 GB of RAM!
Exactly: that's RAM, not cache. Cache is a lot faster.
If it's used like the cache on the Xbox 360 (20MB), it could really help.
The Xbox 360 eDRAM is 10MB, no? Have to remember that it was originally built on 90nm, and at that node it was a pretty fair-sized die.
Regardless, the smaller size just results in it being unable to store all game textures... but textures aren't the only source of bandwidth consumption. Unfortunately it's annoying to find current figures for the various sources of bandwidth consumption; texturing used to account for around 75%. Even at high resolutions, Z, color, and render target buffers should fit within a 64MB eDRAM. If those buffers still account for a fair amount of bandwidth, then removing them and having only textures in main memory could result in a marked difference.
So, if 10MB is good for ~640P, what resolution will 64MB be good for? 1080P? I am guessing it is not a linear scale?
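A rough back-of-envelope on how buffer size scales with resolution, using an assumed 32-bit color plus 32-bit depth format with no MSAA (actual games vary):

# Color + Z buffer footprint vs. resolution (assumed 4-byte color + 4-byte depth, no MSAA).
for name, (w, h) in {"720p": (1280, 720), "1080p": (1920, 1080), "2560x1600": (2560, 1600)}.items():
    size_mb = w * h * (4 + 4) / 1024**2
    print(f"{name}: {size_mb:.1f} MB")   # ~7 MB, ~16 MB, ~31 MB

So the footprint scales linearly with pixel count, and a plain color+Z setup would fit in 64MB well past 1080p; MSAA and extra render targets multiply these numbers.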
"But GPUs have L1 and L2 caches now too..."
It's about the workloads. The typical CPU is interested in low-latency accesses to a small subset of memory, and is thus well served by a good cache hierarchy. The SNB cache system has a total hit rate well in excess of 95%, which means you get some 20 times more realized bandwidth than what your memory provides.
The typical GPU workload consists of rapidly streaming through large data sets. This is essentially uncacheable, as accessing an item of memory makes it the least likely one to be accessed again in the near future. So what you want is just raw bandwidth.
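To make that "20 times" figure concrete, here is a small sketch of the arithmetic; the 95% hit rate is from the post above, while the DRAM bandwidth is just an assumed example:

# Realized bandwidth amplification from a cache hit rate (illustrative numbers only).
hit_rate = 0.95          # total hit rate of the cache hierarchy, as claimed above
dram_bw_gbs = 21.0       # assumed dual-channel DDR3-1333 peak, GB/s
# Only misses go to DRAM, so the core-side request stream can be ~1/(1 - hit_rate)
# larger than what DRAM itself delivers (ignoring latency and write traffic).
amplification = 1.0 / (1.0 - hit_rate)
print(f"~{amplification:.0f}x, i.e. ~{dram_bw_gbs * amplification:.0f} GB/s realized")   # 20x, ~420 GB/s

A streaming workload with a near-zero hit rate gets no such amplification, which is the point being made about GPU workloads.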
I was under the impression that the caches on GPUs today were more for HPC tasks than for graphics tasks. I could be wrong, however.
"Can that really be true? 5x performance increase compared to Ivy Bridge HD4000 IGP? The VR-Zone article estimates a 2-3x increase instead."
errr... in some older leak, SemiAccurate said it was 5x Sandy Bridge, which fits the VR-Zone performance numbers very well.
How much would something like this cost? 64MB on-package memory, L4 cache... I'm afraid to ask.
It wouldn't cost too much if they don't use traditional SRAM technology. Remember that POWER7, on 45nm, already gets 32MB of eDRAM on die.
"Regardless, the smaller size just results in it being unable to store all game textures... but textures aren't the only source of bandwidth consumption."
The XB360 eDRAM is never used to store any textures. It is strictly a render target. (The ROPs are actually on the daughter die, and they are tightly integrated into the memory pool.)
"Unfortunately it's annoying to find current figures for the various sources of bandwidth consumption; texturing used to account for around 75%."
Deferred rendering and deferred texturing have switched this around: they make the texture lookups from the big pools relatively rare, but massively increase the writing to (and reading from) the render targets. Texturing (from the big pool) would then typically be less than 30% of the total bandwidth, and most of the "texture lookups" would actually happen from the render targets of the previous phases.
"Even at high resolutions, Z, color, and render target buffers should fit within a 64MB eDRAM. So, if 10MB is good for ~640P, what resolution will 64MB be good for? 1080P?"
<snip>
This is only for traditional rendering schemes. Deferred rendering techniques write a lot more than just a few color values into the render targets. Think more like a few stages writing 16 bytes per stage (per pixel, per frame). 64MB would probably hurt BF3 at 1080p.
This is all somewhat related, in that one of the reasons for going with deferred rendering is that it gains relatively a lot from bigger caches on the GPU.
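A quick sketch of why those per-stage writes blow past 64MB at 1080p; the stage count here is my own illustrative assumption, not a BF3 figure:

# Deferred-rendering render-target footprint at 1080p (illustrative stage count).
width, height = 1920, 1080
stages = 3                            # assumed G-buffer stages at 16 bytes per pixel each
bytes_per_pixel = stages * 16 + 4     # plus a 32-bit depth buffer
size_mb = width * height * bytes_per_pixel / 1024**2
print(f"~{size_mb:.0f} MB")           # ~103 MB, well past 64 MB, so some targets spill to main RAM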
Thanks for the very informative post :thumbsup: