There's no doubt that memory is much faster now. But let's look at a few things:
1) The original IBM PC ran at 4.77 MHz. A Core i5-3570K runs at 3.4 GHz. That's roughly a 713x increase in clock speed alone.
2) On the 8088, the fastest instruction was mov reg,reg at 2 cycles. Since many instructions (especially integer instructions) effectively execute in 1 cycle on Ivy Bridge, we're already at a ~1425x increase, and probably more.
3) Ivy Bridge is also superscalar and has speculative execution.
These three factors can absolutely give a 3000x increase in performance. Try disabling the L1, L2, and L3 caches and running again; the result should be interesting.
Don't be misled by the increase in bandwidth. The primary bottleneck today is latency (access time), and access time (measured in nanoseconds) has improved only 4-5 fold since the IBM PC days. That's the primary reason we got one level of cache, then a second, and now a third.
These days, a single main-memory access can stall the CPU for 200-400 cycles. In the IBM PC days, only poorly designed systems had wait states for main-memory access.
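Latency is easy to see with a pointer chase, where each load depends on the previous one, so the CPU can't overlap the misses. Here's a minimal sketch of the idea (my own illustration, not a rigorous benchmark; buffer size and hop count are arbitrary, and it assumes POSIX clock_gettime and a large RAND_MAX, as with glibc):

```c
/* chase.c - pointer-chasing latency sketch (illustrative, untuned).
 * Each load depends on the previous one, so out-of-order hardware
 * cannot overlap the misses: you measure raw access latency. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N    (1UL << 25)      /* 32M entries * 8 bytes = 256 MB, far past L3 */
#define HOPS 10000000UL       /* number of dependent loads to time */

int main(void)
{
    size_t *next = malloc(N * sizeof *next);
    if (!next) return 1;

    /* Build a random single cycle (Sattolo's algorithm) so the hardware
     * prefetcher cannot guess the next address. Assumes RAND_MAX >= N. */
    for (size_t i = 0; i < N; i++) next[i] = i;
    srand(42);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;          /* j < i keeps it one cycle */
        size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (unsigned long h = 0; h < HOPS; h++)
        p = next[p];                            /* serialized chain of loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per load (sink=%zu)\n", ns / HOPS, p);  /* print p so the loop isn't optimized away */
    return 0;
}
```

Shrink N until the buffer fits in L1 and the nanoseconds-per-load figure should drop from the 200-400 cycle region down to a few cycles.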
All the huge performance increases listed above share one important requirement: the data must be in the cache. Most benchmarks are so small that they fit into the cache hierarchy; if your application does as well, that's fine. But if your working set is bigger than the caches, computational performance drops up to 10-fold (and sometimes even more) due to memory stalls.
Simple example:
Try adding the elements of two 1 GB arrays into a third array.
With 2 x 1600 MHz memory channels (listed by Intel at 25.6 GB/s), the best real-world throughput is more in the 12-16 GB/s range. If you don't know how to tweak your access pattern, it is more like 8 GB/s.
To add two double-precision numbers, three 64-bit memory transfers are necessary (2 reads, 1 write). Dividing 8 GB/s by 24 bytes per addition, the maximum flop/s your highly optimized LGA-1155 CPU can achieve with this simple program will not exceed 333 Mflop/s.
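For concreteness, here's a sketch of that measurement (illustrative only: it assumes a 64-bit build with about 3 GB of free RAM, and uses a plain loop, so expect the untweaked ~8 GB/s figure):

```c
/* add.c - bandwidth-bound array add (illustrative; assumes a 64-bit
 * build and ~3 GB of free RAM). One FP add per 24 bytes of nominal
 * traffic: two 8-byte reads plus one 8-byte write. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1UL << 27)   /* 2^27 doubles * 8 bytes = 1 GB per array */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    /* Touch every page up front so page faults don't land in the timed loop. */
    for (size_t i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < N; i++)
        c[i] = a[i] + b[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double gbs = 3.0 * N * sizeof(double) / sec / 1e9;   /* 2 reads + 1 write */
    printf("%.2f GB/s, %.0f Mflop/s (sink=%f)\n", gbs, N / sec / 1e6, c[0]);
    return 0;
}
```

One reason the naive loop undershoots even the 24-byte accounting: without non-temporal (streaming) stores, writing c[i] usually also costs a read-for-ownership of that cache line, so the real traffic is 32 bytes per element. Avoiding that is part of the access-pattern tweaking mentioned above.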
Andy