How big is the processor's execution window for code optimization?

chrstrbrts

Senior member
Aug 12, 2014
522
3
81
Hello,

OK, here's what I mean.

Modern processors have a multitude of mechanisms for speeding up execution.

Some examples are branch prediction, out-of-order execution, and instruction prefetching.

But how broad is the processor's scope when it performs these optimizations?

Let's say a big process is loaded into memory and an entire 2 MB page is dedicated to its code.

Does the processor look at the entire page and optimize the entire page before beginning execution?

Or does it just start at the code entry point and only look at the next 10 or 15 instructions and optimize within that small window?
 

Cogman

Lifer
Sep 19, 2000
10,278
126
106
https://www.microway.com/knowledge-...intel-xeon-e5-2600v4-broadwell-ep-processors/

So no, it isn't doing the whole program.

How much is done depends entirely on both the code being executed and the processor it is executing on.

Intel CPUs have a scheduler of micro-ops. They build out the dependency graph and then execute operations as whatever is blocking them (such as waiting on data to arrive) is cleared. According to the above article, the scheduler in their modern CPUs can hold 64 entries.
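
To make the "window" idea concrete, here is a toy sketch of a scheduler with a fixed number of entries. It is not a model of any real Intel design. The assumptions are mine: unlimited execution units, unlimited dispatch per cycle, an entry is freed as soon as its instruction issues, and `PROGRAM` and `simulate` are made-up names for illustration.

```python
from collections import deque

# Each instruction is (latency_in_cycles, [indices of earlier instructions it depends on]).
# Hypothetical example: two slow loads, each followed by dependent work.
PROGRAM = [
    (10, []),   # 0: load A (slow)
    (1, [0]),   # 1: uses A
    (1, [0]),   # 2: uses A
    (1, [0]),   # 3: uses A
    (10, []),   # 4: load B (slow, independent of A)
    (1, [4]),   # 5: uses B
]

def simulate(program, window_size):
    """Return the cycle at which the last result is ready.

    Instructions enter the window in program order, but any instruction
    in the window whose inputs are ready may issue, regardless of order.
    """
    pending = deque(range(len(program)))  # not yet in the window
    window = []                           # waiting to issue
    done_at = {}                          # index -> cycle its result is ready
    cycle = 0
    while pending or window:
        # Fill free window entries in program order.
        while pending and len(window) < window_size:
            window.append(pending.popleft())
        # An instruction is ready when all of its inputs have been produced.
        ready = [i for i in window
                 if all(done_at.get(d, float("inf")) <= cycle
                        for d in program[i][1])]
        for i in ready:
            done_at[i] = cycle + program[i][0]
        window = [i for i in window if i not in ready]
        cycle += 1
    return max(done_at.values(), default=0)

if __name__ == "__main__":
    for size in (1, 2, 4, 6):
        print(size, simulate(PROGRAM, size))
```

With a window of one entry, the second load cannot even be seen until everything ahead of it has issued. With a larger window the scheduler can reach past the stalled instructions and start the second load early, which is the whole benefit of looking ahead a few dozen entries rather than at the entire program.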

I imagine that, now that we have stopped getting free performance (primarily in the form of larger caches) and power reductions from node shrinks, what the CPU optimizes on its own will become ever more impressive.
 

chrstrbrts

Cogman said: "So no, it isn't doing the whole program. [...] According to the above article, their modern CPUs can handle 64 entries."

OK, that makes sense.

Trying to optimize possibly millions of instructions in one shot seemed woefully impractical to me; I suspected that the scope was much smaller.

Thank you, sir.
 

Greyguy1948

Member
Nov 29, 2008
156
16
91
Interesting subject!
I can see that in SPEC the step from base to peak is very small for the Intel compiler compared to other compilers. Is this a sign of better prediction?
 

Cogman

Hard to say. Unfortunately (fortunately?) as more and more optimizations enter both the hardware and software level, it becomes harder and harder to tell which optimizations are due to better compilers, which are due to better processors, and what exactly is triggering the fast path.

Intel certainly hasn't been shy about using their compiler to aggressively optimize for their CPUs. Their compilers have also traditionally done a better job of compiling code than pretty much everyone else out there. In particular, they have led the pack when it comes to things like autovectorization (the automatic use of instructions like SSE and AVX).
 

exdeath

Lifer
Jan 29, 2004
13,679
10
81
The scope only extends as far as the available execution units, rename registers, reservation stations, and pipeline stages allow. The entire point is simply to keep all resources doing something every clock.

It's not about optimizing code; it's about the decode and schedule unit avoiding stalls and keeping all resources busy by keeping all pipeline stages filled and moving.

See Tomasulo’s algorithm, register renaming, and reservation stations; these are at the heart of every modern out-of-order microprocessor.
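
For anyone who hasn't seen register renaming before, here is a minimal sketch of the idea. It is only an illustration, not how any particular CPU implements it: the `rename` function, the `p0`, `p1`, ... naming, and the unlimited supply of physical registers are all assumptions of mine (real hardware has a finite pool and has to free registers again).

```python
from itertools import count

def rename(instructions):
    """Map architectural registers to fresh physical registers.

    instructions: list of (dest, [sources]) using architectural names.
    Every write gets a brand-new physical register, so two writes to the
    same architectural register no longer conflict with each other.
    """
    fresh = count()   # assumed unbounded; hardware uses a finite free list
    mapping = {}      # architectural name -> current physical register
    renamed = []
    for dest, sources in instructions:
        phys_sources = []
        for s in sources:
            if s not in mapping:               # first use of an input value
                mapping[s] = f"p{next(fresh)}"
            phys_sources.append(mapping[s])
        mapping[dest] = f"p{next(fresh)}"      # new name for every write
        renamed.append((mapping[dest], phys_sources))
    return renamed

CODE = [
    ("r1", ["r2", "r3"]),  # r1 = r2 + r3
    ("r4", ["r1"]),        # r4 = f(r1)   true dependency on the line above
    ("r1", ["r5"]),        # r1 = g(r5)   reuses r1, but needs nothing from above
    ("r6", ["r1"]),        # r6 = h(r1)   depends on the new r1 only
]

if __name__ == "__main__":
    for before, after in zip(CODE, rename(CODE)):
        print(before, "->", after)
```

After renaming, the third instruction writes a different physical register than the first, so it can execute as soon as `r5` is available instead of waiting for the earlier uses of `r1` to finish. Only the true data dependencies remain, and those are what the reservation stations track.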

Branch prediction is semi-unrelated; it's more a front-end (instruction fetch) task, made necessary by deep pipelines and the huge gap between DRAM and CPU speed. Obviously there is communication and cooperation with the control logic as conditional branches hit the execution stage, but branch hits and misses can be thought of as being forwarded to the fetch unit / prefetcher more than anything.
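
As a rough illustration of what a branch predictor does, here is a sketch of one classic textbook scheme, a table of 2-bit saturating counters indexed by the low bits of the branch address. Modern predictors are far more elaborate; the class name, table size, and starting counter value below are my own choices for the example.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, table_bits=4):
        self.mask = (1 << table_bits) - 1
        self.counters = [1] * (1 << table_bits)  # start at "weakly not taken"

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

def accuracy(predictor, pc, outcomes):
    """Fraction of outcomes predicted correctly for one branch at address pc."""
    hits = 0
    for taken in outcomes:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)   # learn from the real outcome
    return hits / len(outcomes)

# A loop branch: taken 9 times, then falls through once, repeated 10 times.
LOOP_BRANCH = ([True] * 9 + [False]) * 10

if __name__ == "__main__":
    print(accuracy(TwoBitPredictor(), 0x40, LOOP_BRANCH))
```

The point of the two bits is that a single loop exit does not flip the prediction, so the predictor is only wrong once per trip through the loop after it has warmed up.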
 