Pentium III L2 cache better than TBird one ??

gustavo · Oct 16, 2001

Please correct me where I am wrong:

The L2 Tbird cache has 256 Kbytes organized in 16 entries of 64 bytes each, so you have 256 lines and then for each Megabyte of RAM you have 4 actual entries!!

The L2 Pentium III cache has 256 Kbytes organized in 8 entries of 32 bytes each, so you have 1024 lines and then for each Megabyte of RAM you have 8 actual entries !!

Then the hit/miss ratio of the Pentium III would be better !!

Where did I go wrong ??

CTho9305 · Oct 16, 2001

did you consider latencies? bigger sets take longer (generally) to search, or require more hardware. also, the tbird cache is exclusive between the L1 and L2, so there is actually 384k of L1+L2 on a tbird, while there is only 256k effective L1+L2 cache on a p3.

burntkooshie's article might go into this - even if it doesn't, you should still read it

Sohcan · Oct 16, 2001

First of all, the TBird has 4096 cache lines, vs. the P3's 8192 lines. (256KB / 64 B/line = 4K lines)
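
As a rough check on those line counts, here is a minimal Python sketch. It assumes the "16 entries" and "8 entries" in the first post refer to 16-way and 8-way associativity; `cache_geometry` is just a name made up for illustration.

```python
def cache_geometry(size_bytes, line_bytes, ways):
    """Return (total_lines, sets) for a set-associative cache."""
    total_lines = size_bytes // line_bytes
    sets = total_lines // ways
    return total_lines, sets

# Assumed: 256 KB, 64-byte lines, 16-way (T-bird L2)
tbird_lines, tbird_sets = cache_geometry(256 * 1024, 64, 16)
# Assumed: 256 KB, 32-byte lines, 8-way (P3 L2)
p3_lines, p3_sets = cache_geometry(256 * 1024, 32, 8)

print(tbird_lines, tbird_sets)  # 4096 256
print(p3_lines, p3_sets)        # 8192 1024
```

Under that assumption, the 256 and 1024 figures in the original question come out as the number of sets, not the number of lines.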

Secondly, you're over-simplifying things. Hit-rate is the result of many different factors:

- Mapping type: Direct-mapped has the lowest hit-rate, but has the fastest access time (each address is mapped to exactly one location, so you know exactly where to look for it). Fully-associative has the highest hit-rate, but has the slowest access time...each address can be anywhere, so each line has to have a comparator to compare the tags, as well as a much larger mux...these factors increase access time. Set-associative is a combination of the two methods...the higher "n" is in an n-way set-associative cache, the more the cache behaves like fully-associative. An address gets mapped to one specific set, and can be placed in any of the n lines (ways) within that set. So an 8-way set-associative cache has to compare the tags of 8 lines on each lookup.

- Cache size: Kind of self-explanatory. The more lines you can store in the cache, the higher the hit rate due to exploitation of temporal locality.

- Line size (larger is better): The larger the line size is, the more that spatial locality is exploited, since you're likely to reference another word in the same line recently accessed. The downside is that larger line sizes tend to require more bandwidth.

- Replacement policy & algorithms: they have an effect on the temporal locality of the lines in the cache.

- Exclusive vs. inclusive cache: inclusive repeats the address space of the L1 in the L2, so an exclusive cache has a larger effective L2. The downside of an exclusive cache is that it requires more bandwidth, since cache evictions will be moving data between the L1 and L2 based on demand (even though the Athlon "appears" to have 384K of cache, the first 128K has lower latency). On the other hand, an exclusive cache with the P3 wouldn't have much effect, because of the L1's small size compared to the L2 (32K vs. 256K).

...and more factors that I can't remember off-hand.....
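
To make the mapping-type point concrete, here is a toy sketch of a set-associative cache that tracks tags only. It is not a model of either chip: the class name, the LRU replacement policy, and the tiny sizes are all assumptions picked for illustration.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Toy n-way set-associative cache with LRU replacement (tags only)."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set: keys are tags, order tracks recency.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Look up an address; return True on a hit, False on a miss."""
        line_number = address // self.line_bytes
        index = line_number % self.num_sets   # which set to search
        tag = line_number // self.num_sets    # identifies the line within the set
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)            # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)         # evict the least recently used tag
        lines[tag] = None
        self.misses += 1
        return False

# Same capacity (two 64-byte lines), different associativity.
direct_mapped = SetAssociativeCache(128, 64, ways=1)
two_way = SetAssociativeCache(128, 64, ways=2)
for addr in [0, 128, 0, 128]:
    direct_mapped.access(addr)
    two_way.access(addr)

print(direct_mapped.hits, direct_mapped.misses)  # 0 4
print(two_way.hits, two_way.misses)              # 2 2
```

Addresses 0 and 128 land in the same set of the direct-mapped cache and keep evicting each other, while the 2-way cache of the same size can hold both.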

Anyway, Paul DeMone reports that the Athlon has an effective L2 hit-rate of 99%, compared to the P3's 98.9%.
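
Those two hit-rates look nearly identical, but it can help to compare the miss rates instead. A minimal sketch, using only the two figures quoted above (the function name is made up for illustration):

```python
def relative_miss_reduction(hit_rate_a, hit_rate_b):
    """Fraction by which cache A's miss rate is lower than cache B's."""
    miss_a = 1.0 - hit_rate_a
    miss_b = 1.0 - hit_rate_b
    return (miss_b - miss_a) / miss_b

# Quoted figures: Athlon 99%, P3 98.9%
reduction = relative_miss_reduction(0.99, 0.989)
print(f"{reduction:.1%}")  # roughly 9% fewer misses
```

So a 0.1 point gap in hit-rate corresponds to roughly 9% fewer misses, under the assumption that both figures were measured the same way.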

gustavo · Oct 16, 2001

So, since I see you know a lot and have a good background on the subject:
which cache architecture do you think is faster (better)? I know that,
after Einstein, everything is relative.

Of course if you can justify your opinions I think it'll contribute most to
the forum.

Thanks, Gustavo.

BurntKooshie · Oct 16, 2001

Which type of cache is "better" depends on the access patterns of the program. This is true not only because of exclusive vs. inclusive, but also because of the L2 latencies, hit-rates, and the comparative sizes of the primary caches. The Pentium III has drastically lower latencies, and twice the bandwidth at the same clock speeds, so it all really depends on the application, should everything else be the same. If you're comparing a P3 with the T-bird's cache design against the P3's current cache design, the way Intel's doing it makes sense, because going exclusive would be relatively stupid, as Sohcan said. The only benefit of going exclusive is a higher on-chip hit-rate, and for the P3 to do so would not be of much use.
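
A quick back-of-the-envelope sketch of why exclusivity matters more for the Athlon, using the L1 and L2 sizes quoted earlier in the thread (128K + 256K for the Athlon, 32K + 256K for the P3). It assumes the simplest model, where an inclusive L2 fully duplicates the L1; the function names are made up for illustration.

```python
def effective_capacity_kb(l1_kb, l2_kb, exclusive):
    """Unique on-chip data, assuming an inclusive L2 fully duplicates the L1."""
    return l1_kb + l2_kb if exclusive else l2_kb

def gain_from_exclusive(l1_kb, l2_kb):
    """Relative capacity gained by going from inclusive to exclusive."""
    exclusive = effective_capacity_kb(l1_kb, l2_kb, True)
    inclusive = effective_capacity_kb(l1_kb, l2_kb, False)
    return exclusive / inclusive - 1.0

print(gain_from_exclusive(128, 256))  # 0.5   (Athlon: 384K vs. 256K)
print(gain_from_exclusive(32, 256))   # 0.125 (P3: 288K vs. 256K)
```

Under that model, going exclusive buys the Athlon 50% more effective capacity but would buy the P3 only 12.5%.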

Conversely, it wouldn't make much sense for the Athlon to have the same L2 caching scheme as the P3 either. If this were the case, the Athlon would have the same amount of effective cache as the P3, but a great deal more of the cache (and therefore, die space) would be wasted. The Athlon would have lower average latencies because of the larger L1 cache. It would also be somewhat wasteful to have increased the bandwidth so dramatically to the L2 cache considering the added hit-rate would be relatively small as compared to the L1 cache.

Both of the design decisions make sense for their respective chips, because there are other features that have to come into play (the other layers of the memory hierarchy, and things like die-size, etc). I would not want to see either chip with the other's L2 cache implementation, as it would only degrade performance for both chips in the majority of cases.

As Sohcan said, going exclusive requires more bandwidth, so it would be nice for the T-bird (and now Palomino) to have a wider L2 cache interface, but that requires more engineering effort, and considering the already massive hit-rate of the L1 cache, there would be relatively little benefit as compared to the time required to dig around in the core again (though perhaps they could have between the T-bird and Palomino....).
