CPU Pipelining

tedthebear

Senior member
Jul 5, 2001
236
0
0
I think I'm starting to get an idea of what's happening inside my processor!
Here goes: The instructions travel through the processor's pipeline in five stages: 1) prefetch, 2) instruction decode, 3) address generate, 4) execute, 5) write back.
It takes five clock cycles for one instruction to be completed. But since it's all being "pipelined" so that there aren't any empty stages, you can complete one instruction every cycle. Right?
The Pentium Pro is super pipelined. Its instructions travel through the pipeline in 14 stages. This means it has 14 instructions all going on at once, right? Is this faster than having just 5 instructions going on at once?
Plus I understand that the later Pentiums are superscalar. Does this mean they have two parallel pipelines churning out instructions at the same time?
Let me know where I'm mixed up.
Thanks.

 

AndyHui

Administrator Emeritus | Elite Member | AT FAQ M
Oct 9, 1999
13,141
16
81
Have a look at this thread. It should help to answer your questions.

All Pentiums are "superscalar". This simply means that there is more than one pipeline. The pipelines are not necessarily all in use at the same time.
 

BurntKooshie

Diamond Member
Oct 9, 1999
4,204
0
0
You're pretty close. Just because a processor is "pipelined" doesn't mean every single one has to be 5 stages. Plus, prefetch is when you grab instructions before you actually need them (and you may never need them) -- so the first general step is just "fetch".

And you're right on about it taking 5 cycles (if it has a pipeline length of 5 stages). And the idea is exactly as you said -- to be able to issue another instruction every cycle even though it takes 5 cycles to get each one all done. Just like an assembly line. (Hennessy and Patterson like the "washing machine" analogy.)
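To make the assembly-line picture concrete, here is a minimal Python sketch of an ideal 5-stage pipeline. It is only an illustration under stated assumptions: no stalls, no branches, and one stage per cycle. The function names and the stage labels (with "fetch" as the first step) are made up for this example.

```python
# Stage names follow the five stages listed earlier in the thread, with
# "fetch" as the first step. These labels are illustrative, not official.
STAGES = ["fetch", "decode", "address", "execute", "writeback"]

def pipeline_schedule(num_instructions, stages=STAGES):
    """Return {cycle: {stage: instruction index}} for an ideal pipeline.

    Ideal means no stalls and no branches: instruction i enters the first
    stage on cycle i and advances one stage per cycle (cycles count from 0).
    """
    schedule = {}
    for instr in range(num_instructions):
        for depth, stage in enumerate(stages):
            cycle = instr + depth
            schedule.setdefault(cycle, {})[stage] = instr
    return schedule

def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles to finish everything: fill the pipe once, then one per cycle."""
    return num_stages + num_instructions - 1

if __name__ == "__main__":
    sched = pipeline_schedule(4)
    for cycle in sorted(sched):
        row = ", ".join(f"{stage}=i{instr}" for stage, instr in sched[cycle].items())
        print(f"cycle {cycle + 1}: {row}")
    print("pipelined:", total_cycles(4), "cycles; unpipelined:", 4 * len(STAGES))
```

With 4 instructions the pipelined version finishes in 8 cycles instead of 20. Each instruction still takes 5 cycles from start to finish; the gain comes from overlapping them.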

The Pentium Pro's pipeline is something like 14 stages (I think that's right -- I don't have the number memorized). It can actually be slower to have more instructions in flight at once, due to something called a pipeline flush. When there's a branch in the code (like an "if this, do this; else, do that" section in a program), the processor will guess which way it goes. Why wait 14 cycles when you can guess? That's what all modern processors do, most with a ~90%+ correct "guessing" rate. This is called "branch prediction". Well, I'd explain more, but you're getting the ideas correct from aceshardware, and they explain that later on. Just keep reading.

So it's technically slower to have more instructions in the 'pipeline' at once, because when it guesses wrong, there's more work that has to be thrown away and restarted. But the more stages (all else equal), the higher the chip will be able to clock on the same manufacturing process.
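A rough way to put numbers on that trade-off is a back-of-the-envelope model. The sketch below assumes each wrong guess wastes about one full pipeline's worth of cycles, and it uses a made-up figure of 20% of instructions being branches; neither number comes from a real chip, and the function name is hypothetical.

```python
def effective_cpi(depth, branch_fraction, prediction_accuracy, base_cpi=1.0):
    """Average cycles per instruction with a simple flush-penalty model.

    Assumption: each mispredicted branch wastes roughly `depth` cycles,
    i.e. the whole pipeline is thrown away and refilled.
    """
    mispredict_rate = branch_fraction * (1.0 - prediction_accuracy)
    return base_cpi + mispredict_rate * depth

if __name__ == "__main__":
    # 20% branches is an illustrative assumption; 90% accuracy is the
    # ballpark figure mentioned in the post above.
    for depth in (5, 14):
        cpi = effective_cpi(depth, branch_fraction=0.2, prediction_accuracy=0.9)
        print(f"{depth}-stage pipeline: about {cpi:.2f} cycles per instruction")
```

Under these assumptions the 5-stage design averages about 1.10 cycles per instruction and the 14-stage design about 1.28, so the deeper pipeline only comes out ahead if it clocks more than roughly 16% higher.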

Yep, superscalar means that it can issue more than one instruction at a time, so it can have, say, 2, 3, 4, etc. instructions running through the processor at once.

You're getting there....

Welcome to computer architecture.
 

tedthebear

Senior member
Jul 5, 2001
236
0
0
So! Superscalar means a processor has more than one pipeline going at one time. Even more than two pipelines, heh?
How many pipelines does a current processor like the Celeron usually have?
Thanks so much for the help here.
BTW, I've been reading the processor info in "The PC Technology Guide" to get this far. AcesHardware was still a little too advanced for me to understand. The interactive Intel Education Series helped me a lot too. I think it's geared to the junior high school level, and that's probably where my IQ is, I hate to admit!
 

AndyHui

Administrator Emeritus | Elite Member | AT FAQ M
Oct 9, 1999
13,141
16
81
IIRC (it's 4:30 am here, and BurntKooshie is probably going to tell me that I am wrong)...

The P6 family (Pentium Pro, Pentium II, Pentium III, Celeron) has 2 integer pipelines, plus 1 full and 1 partial pipeline for the FPU.

In any case, both the Athlon and the P6 family can decode up to 3 instructions at once (although there are limits as to what 3 instructions the P6 can decode at once; the Athlon is not subject to this limitation).

The Pentium 4 is supposed to be able to decode 4 instructions simultaneously.
 

BurntKooshie

Diamond Member
Oct 9, 1999
4,204
0
0
Don't worry, FAQ Man - You're right

As for superscalar:

The way people discuss how "superscalar" an architecture is, they say it's an "X-issue superscalar design." This means that if it can issue up to 4 instructions per cycle to its functional units (a functional unit being either one that does integer operations or one that does floating point operations), it would be called a "4-issue superscalar."

Note that not all the pipelines have to be in use at once -- in fact, it is very rare, especially for x86 processors, to have anywhere near the peak number of instructions "in-flight" at once.
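One reason the peak is rarely reached is that instructions often depend on each other's results. The toy model below is a sketch of that effect only. It assumes strictly in-order issue and results that are ready one cycle after issue, which is a simplification and not how the P6 or Athlon actually schedule; `issue_cycles` and the dependency lists are hypothetical.

```python
def issue_cycles(deps, width):
    """Count cycles to issue every instruction, in order, `width` per cycle.

    deps[i] lists the earlier instructions whose results instruction i needs.
    Assumptions: every result is ready one cycle after issue, and issue stops
    for the cycle at the first instruction that is not ready (in-order).
    """
    issued_at = {}   # instruction index -> cycle it issued in
    next_instr = 0
    cycle = 0
    while next_instr < len(deps):
        cycle += 1
        slots = width
        while slots > 0 and next_instr < len(deps):
            ready = all(issued_at[d] < cycle for d in deps[next_instr])
            if not ready:
                break
            issued_at[next_instr] = cycle
            next_instr += 1
            slots -= 1
    return cycle

if __name__ == "__main__":
    independent = [[], [], [], []]   # four unrelated instructions
    chain = [[], [0], [1], [2]]      # each one needs the previous result
    print("independent, 2-issue:", issue_cycles(independent, 2), "cycles")
    print("chain, 2-issue:", issue_cycles(chain, 2), "cycles")
```

In this model the four independent instructions issue in 2 cycles on a 2-issue design, while the dependent chain takes 4 cycles no matter how wide the machine is.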

As for the Pentium 4, sorry, I'll have to make a correction here, Andy.

The Pentium 4 can only decode one x86 instruction at a time! If it's a "simple" (regular) x86 instruction, it gets decoded into however many "micro-ops" it corresponds to.

However, the Trace Cache is able to issue up to three uops (micro-ops) per cycle. (Note that this is fewer than the number of instructions its double-pumped ALUs can execute, but that is unimportant: it's so rare to have that many instructions ready to issue at once that it doesn't matter.) The more complex x86 instructions skip the Trace Cache, because they would need to be turned into so many "uops" that they would pollute the Trace Cache.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
To explain the washing machine analogy referenced by BurntKooshie...
If you have a washer and a dryer, and 2 loads of clothes, the simplest way would be to wash load 1, dry it, wash load 2, dry it. That takes 4 time units.

To pipeline it, you could wash load 1, then as you dry it, wash load 2, then dry load 2. Now you're down to 3 time units total.

If you broke the washer into soak, soap, agitate, rinse, and spin-dry, each step could take less time, so a time unit becomes smaller. While an instruction (load of clothes) now takes 6 units (incl. drying), each time unit can be shorter, improving throughput.
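The laundry numbers can be checked with a few lines of Python. This is a sketch with invented timings: 30 minutes each for the washer and dryer, and six equal 10-minute steps for the split version (which generously assumes drying also fits in the shorter step). The function name is hypothetical.

```python
def laundry_time(loads, stage_minutes, pipelined=True):
    """Total minutes to finish `loads` loads through the given stages.

    Assumption: when pipelined, every stage advances in lockstep, so one
    time unit is as long as the slowest stage.
    """
    if not pipelined:
        return loads * sum(stage_minutes)
    time_unit = max(stage_minutes)
    return (len(stage_minutes) + loads - 1) * time_unit

if __name__ == "__main__":
    two_stage = [30, 30]    # washer, dryer (made-up minutes)
    six_stage = [10] * 6    # soak, soap, agitate, rinse, spin-dry, dry
    print("2 loads, one at a time:", laundry_time(2, two_stage, pipelined=False))
    print("2 loads, pipelined:", laundry_time(2, two_stage))
    print("10 loads, 2 stages:", laundry_time(10, two_stage))
    print("10 loads, 6 stages:", laundry_time(10, six_stage))
```

Two loads take 120 minutes one at a time and 90 minutes pipelined, matching the 4 versus 3 time units above. With ten loads, the six-step version finishes in 150 minutes against 330 for the two-stage version, even though a single load still takes 60 minutes either way.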
 