Are multi-threaded cores the most logical design?

coffeemonster

Senior member
Apr 18, 2015
241
86
101
My understanding of CPU micro-architecture is admittedly limited, but this thought occurred to me recently.

Is simultaneous multi-threading a more efficient approach to maximizing thread count on a transistor budget and keeping IPC about the same?

Consider the 8-thread Ryzen 1400 (quad-core with SMT). What if they were able to design 6-8 individual single-threaded cores that were slightly smaller and narrower, with similar IPC, instead of 4 larger, wider SMT cores on the same transistor budget?

Would single-thread IPC be directly sacrificed as a result of making the cores smaller and narrower?

I'm not sure if I'm explaining my line of thought well enough, or if my lack of understanding is blatantly obvious.

In the most basic sense I ask: why design a core so full of resources that it is most efficient when processing 2 threads at a time (1 much weaker than the other), rather than a core designed to maximize 1 thread at a time but smaller, leaner, and able to fit more of these on a similar die?
 

whm1974

Diamond Member
Jul 24, 2016
9,460
1,570
96
From my understanding SMT takes up very little die space and doesn't reduce IPC at all.
 

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
SMT only shows benefits in certain scenarios. Quoting Agner's blog:

http://www.agner.org/optimize/blog/read.php?i=6

I have made some tests of hyperthreading to see how fast each of the two threads is running. The following resources are shared between two threads running in the same core:
  • Cache
  • Branch prediction resources
  • Instruction fetch and decoding
  • Execution units
Hyperthreading is no advantage if any of these resources is a limiting factor for the speed. But hyperthreading can be an advantage if the speed is limited by something else. To be more specific, each of the two threads will run at more than half speed in the following cases:
  • If memory data are so scattered that there will be many cache misses regardless of whether each thread can use the full cache or only half of it. Then one thread can use all the execution resources while the other thread is waiting for a memory operand that was not in the cache.

  • If there are many branch mispredictions and the number of branch mispredictions is not increased much by sharing the branch target buffer and branch history table between two threads. Then one thread can use all the execution resources while the other thread is waiting for the misprediction to be resolved.

  • If the code has many long dependency chains that prevent efficient use of the execution units.

Edit: Hyperthreading can also decrease performance. From the above source:

In these cases, each of the two threads will run at more than half speed, but less than full speed. The total performance is never doubled by hyperthreading, but it may be increased by e.g. 25%.

On the other hand, if the performance is limited by any of the shared resources, for example the instruction fetcher, the memory read port, or the multiply unit, then the total performance is not increased by hyperthreading.

Actually, in the worst cases the total performance is decreased by hyperthreading because some resources are wasted when the two threads compete for the same resources. A quick google search reveals several examples of applications that run slower with hyperthreading than when hyperthreading is disabled.
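
To put rough numbers on the three cases in that quote, here is a minimal sketch. The function name and the per-thread speeds are made up for illustration and are not measurements; only the "e.g. 25%" figure comes from the quoted text.

```python
def smt_speedup(per_thread_speed: float, threads: int = 2) -> float:
    """Total throughput of one SMT core relative to a single thread running alone.

    per_thread_speed is each thread's speed as a fraction of its solo speed
    (a hypothetical input, not a measured value).
    """
    if not 0.0 <= per_thread_speed <= 1.0:
        raise ValueError("per_thread_speed must be between 0 and 1")
    return per_thread_speed * threads


# Each thread runs at 62.5% of solo speed: total is 1.25x (the "e.g. 25%" case).
print(smt_speedup(0.625))
# Each thread runs at exactly half speed: no gain, a shared resource is the limit.
print(smt_speedup(0.5))
# Each thread runs below half speed: the threads get in each other's way, net loss.
print(smt_speedup(0.45))
```

So "more than half speed each" is the break-even condition: anything above 0.5 per thread is a net win for two threads, anything below is a net loss.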

Anecdotally, there was one user over on the Tale Of Two Wastelands forum who was getting some hardcore microstutter in New Vegas with his i7. I told him to try disabling hyperthreading, et voilà.
 
Reactions: Dresdenboy

DrMrLordX

Lifer
Apr 27, 2000
21,813
11,167
136
You should examine the POWER8 and POWER9 architectures from IBM. They basically have what you propose: a huge, ultra-wide monstrous core with the ability to handle multiple threads via SMT:

https://en.wikipedia.org/wiki/POWER9

Though in the case of POWER9, the SMT implementation is . . . distinctively different than Intel's.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
More cores is usually better. Whether you get extra benefits in SMT and HT is a bit murky, though a lot has happened since the days of Pentium 4 HT.
Here for example CSGO shows improvement with core count but loses performance when you have HT enabled:
 

imported_jjj

Senior member
Feb 14, 2009
660
430
136
The question should be about power, not area, as it's far more important nowadays.
And I don't have an answer; I never looked at perf gains vs power for SMT, but chances are that it's pretty efficient.
Computing in general has to move towards parallelism and accelerators as otherwise there is no realistic path forward. The era of the CPU is over.
 

sm625

Diamond Member
May 6, 2011
8,172
137
106
Considering how much time the pipeline spends stalled out, it only makes sense to have SMT. I expected SMT to have a larger penalty on the Intel chips with the huge L4 cache, but it seems very little testing has been done on the 5775C with HT disabled.
 

TheELF

Diamond Member
Dec 22, 2012
3,993
744
126
Edit: Also, Hyperthreading can also decrease performance. From the above source:
In these cases, each of the two threads will run at more than half speed, but less than full speed. The total performance is never doubled by hyperthreading, but it may be increased by e.g. 25%.

Actually, in the worst cases the total performance is decreased by hyperthreading because some resources are wasted when the two threads compete for the same resources. A quick google search reveals several examples of applications that run slower with hyperthreading than when hyperthreading is disabled.
More cores is usually better. Whether you get extra benefits in SMT and HT is a bit murky, though a lot has happened since the days of Pentium 4 HT.
Here for example CSGO shows improvement with core count but loses performance when you have HT enabled:

Hyperthreading will do whatever you tell it to.
CSGO uses one thread for its main game logic and fills up the rest of the cores with worker threads that prepare graphics to give you better FPS.
If you run the game, all these threads will have the same priority, causing the things Agner talks about: task manager takes away time from the main thread, but this is due to bad "coding".
Microsuks on Scheduling Priorities:
"The system treats all threads with the same priority as equal. The system assigns time slices in a round-robin fashion to all threads with the highest priority. If none of these threads are ready to run, the system assigns time slices in a round-robin fashion to all threads with the next highest priority. If a higher-priority thread becomes available to run, the system ceases to execute the lower-priority thread (without allowing it to finish using its time slice), and assigns a full time slice to the higher-priority thread."
Screw equal, that's not what we want. We want the important threads to run as if they were on a real core all on their own.
"Use HIGH_PRIORITY_CLASS with care. If a thread runs at the highest priority level for extended periods, other threads in the system will not get processor time."
That's what we want: the main thread getting all the processor power it can use, and the rest, but only the rest, going towards rendering.
"You should almost never use REALTIME_PRIORITY_CLASS, because this interrupts system threads that manage mouse input, keyboard input, and background disk flushing."
Er, well, shut up, we have enough cores.
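
A toy model of the quoted round-robin rule, just to show why equal priorities hurt the main thread. This is a sketch of the rule as quoted, for a single logical core, and not the real Windows scheduler; the thread names and priority numbers are made up.

```python
from collections import deque


def schedule(threads, slices):
    """Return which thread gets each time slice on one logical core.

    threads maps name -> (priority, ready); a higher number means higher
    priority. All ready threads at the top priority level share the core
    round-robin; lower levels only run if nothing higher is ready.
    Toy model only: no mid-slice preemption, no multiple cores.
    """
    ready = {name: prio for name, (prio, is_ready) in threads.items() if is_ready}
    if not ready:
        return []
    top = max(ready.values())
    queue = deque(name for name, prio in ready.items() if prio == top)
    order = []
    for _ in range(slices):
        order.append(queue[0])
        queue.rotate(-1)  # move the thread that just ran to the back
    return order


# Hypothetical case 1: main thread and workers all at the same priority.
equal = {"main": (8, True), "worker1": (8, True), "worker2": (8, True)}
print(schedule(equal, 6))

# Hypothetical case 2: main thread raised above the workers.
raised = {"main": (13, True), "worker1": (8, True), "worker2": (8, True)}
print(schedule(raised, 6))
```

With equal priorities the main thread only gets every third slice in this model; once it is raised, it gets all of them for as long as it stays ready to run.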

 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Hyperthreading will do whatever you tell it to.
Csgo uses one thread for it's main game logic and fills up the rest of the cores with worker threads that prepare graphics to give you better FPS.
If you run the game, all these threads will have the same priority causing the things agner talks about,task manager takes away time from the main thread,but this is due to bad "coding" .
Microsuks on Scheduling Priorities

Screw equal,that's not what we want we want the important threads to run as if they where on a real core all on their own.

That's what we want,the main thread getting all the processor power it can use and the rest but only the rest going towards rendering.
From my own tests I saw no benefit from thread and process priorities regarding HT. Hyperthreading is a "humanistic" variant of SMT, where each thread is seen as being created equal. So in the CSGO case, even the best coding skills won't help if there is still an OS in the background that sees a free logical core, namely the second logical core on the physical core that already runs the CSGO main thread. Thus a simple BG task could slow down the CSGO thread by 20% or more.

Idea: If software wants to avoid this, it could create a thread that does nothing (but blocks disturbing threads) and place it, via an affinity setting, next to the important thread on the same physical core. Has that been tried already?
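
For what it's worth, the affinity masks for that idea could be worked out roughly like this. It is only a sketch: it ASSUMES the two logical cores of a physical core are numbered 2k and 2k+1, which is not guaranteed on every system, and all function names here are made up. It only computes masks and does not apply them; on Windows that would be a call such as SetThreadAffinityMask. Whether a placeholder thread really keeps other threads away is exactly the open question.

```python
def smt_sibling(logical_cpu: int) -> int:
    """Logical CPU sharing the same physical core.

    ASSUMPTION: siblings are numbered 2k and 2k+1. Real systems may
    enumerate differently, so check the actual topology first.
    """
    return logical_cpu ^ 1


def affinity_mask(cpus) -> int:
    """Bit mask with one bit set per logical CPU index."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def plan(main_cpu: int, logical_cpus: int):
    """Masks for the main thread, the placeholder thread, and all other threads."""
    sibling = smt_sibling(main_cpu)
    others = [c for c in range(logical_cpus) if c not in (main_cpu, sibling)]
    return affinity_mask([main_cpu]), affinity_mask([sibling]), affinity_mask(others)


# Hypothetical 4-core / 8-thread CPU, main thread on logical CPU 0.
main_mask, placeholder_mask, rest_mask = plan(0, 8)
print(bin(main_mask), bin(placeholder_mask), bin(rest_mask))
```

The same masks would also work without any placeholder thread, simply by keeping the game's own worker threads off the sibling via the third mask. That does nothing about threads from other processes, though.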
 

TheELF

Diamond Member
Dec 22, 2012
3,993
744
126
Idea: If a software wants to avoid it, it could actually create a thread, which does nothing (but blocks disturbing threads), and puts it via affinity setting next to an important thread on a physical core. Has that been tried already?
Real-time priority (on the thread itself) interrupts even system threads!!!
It stops any other thread from running on the same logical core (if the thread can use 100% of this core), so there is exactly 0% chance of "a simple BG task could slow down the CSGO thread by 20% or more".
Read the MS link I posted:
"You should almost never use REALTIME_PRIORITY_CLASS, because this interrupts system threads that manage mouse input, keyboard input, and background disk flushing."
Look at the video, there is a more than easily visible benefit from changing priorities (OF THREADS, NOT TASKS; tasks also, but that's another discussion).
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Real-time priority (on the thread itself) interrupts even system threads!!!
It stops any other thread from running on the same logical core(if the thread can use 100% of this core) there is exactly 0% chance of "a simple BG task could slow down the CSGO thread by 20% or more"
Read the MS link I posted:

Look at the video, there is more than an easily visible benefit from changing priorities (OF THREADS NOT TASKS(tasks also but that's an other discussion) )
I stand corrected. So this is a non-issue. Now about the software itself: Does it prevent other threads from disturbing one important main thread (the critical path)?

I'm still at work with the video blocked. Will watch it later.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
Csgo uses one thread for it's main game logic and fills up the rest of the cores with worker threads that prepare graphics to give you better FPS.
I'm not so sure, as someone mentioned in another thread that running more than 8 bots in a custom match tanked fps on his 4770k.

Besides, engine limitations can show up even more prominently with HT enabled, and I have a feeling that an older version of Source would be appropriate to test in this regard.
 

TheELF

Diamond Member
Dec 22, 2012
3,993
744
126
I stand corrected. So this is a non-issue. Now about the software itself: Does it prevent other threads from disturbing one important main thread (the critical path)?

I'm still at work with the video blocked. Will watch it later.
If you mean Process Hacker, it just changes the priority value, something you could do through Windows. It's the Windows task manager that will not interfere with a thread put into real-time unless there are other threads that are equally high (lol, in priority that is).
 

TheELF

Diamond Member
Dec 22, 2012
3,993
744
126
I'm not so sure as someone mentioned in another thread how running more than 8 bots in a custom match tanked fps on his 4770k.
This isn't about fps tanking; that can happen for any number of reasons. This is about threads being able to run as fast as on "normal" cores if you (I wish it were the devs) tell them to do so.
 

ashetos

Senior member
Jul 23, 2013
254
14
76
There are some applications, such as games, that are latency sensitive. If a game effectively uses more threads than the cores available then the OS needs to schedule how the threads are run, and assign time slices.

The scheduling can be very bad, up to 10 milliseconds of extra latency. There is also context switch overhead with worker threads, which can be almost zero if all threads are actively running and queues are fed.
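
To give a feel for what 10 milliseconds means in a game, here is a quick back-of-the-envelope sketch. The frame rates are arbitrary examples and the function names are made up; only the 10 ms figure is from the post above.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def share_of_frame(delay_ms: float, fps: float) -> float:
    """Scheduling delay expressed as a fraction of one frame's time budget."""
    return delay_ms / frame_budget_ms(fps)


# Hypothetical frame rates; 10 ms is the worst-case delay mentioned above.
for fps in (60, 144, 300):
    budget = frame_budget_ms(fps)
    share = share_of_frame(10.0, fps)
    print(f"{fps} fps: {budget:.2f} ms per frame, 10 ms delay = {share:.0%} of a frame")
```

At 60 fps a 10 ms stall already eats more than half of the frame; at higher frame rates it is longer than a whole frame.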
 

TheELF

Diamond Member
Dec 22, 2012
3,993
744
126
There is also context switch overhead with worker threads, which can be almost zero if all threads are actively running and queues are fed.
Look at the CS:GO video: Process Hacker in the background displays context switches, and they drop to 1/3 when the main thread runs at real-time, because the system knows that it should not "switch away" from running the main thread.
It has nothing, or not much, to do with available cores.
 
Reactions: Dresdenboy

ashetos

Senior member
Jul 23, 2013
254
14
76
Look at the CS:GO video,process hacker in the background displays context switches,they drop to 1/3 when the main thread runs at real-time because the system knows that it should not "switch away" from running the main thread.
It has nothing, or not much, to do with available cores.
I'm talking about active threads that are greater than the available number of cores though.
 

TheELF

Diamond Member
Dec 22, 2012
3,993
744
126
CS:GO runs 65 threads, 3 of them to a high degree; the CPU only has 2 (real) cores.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
If you mean process hacker it just changes the priority value,something you could do through windows,it's windows task manager that will not interfere with a thread put into real-time unless there are other threads that are equally high (lol in priority that is)
As it seems, it does that thread-wise. That's the interesting thing.
 

HexiumVII

Senior member
Dec 11, 2005
661
7
81
Makes me think that if I get a Ryzen 8-core, I can disable HT and get max performance. I mean, I get by fine with a 6700k; 8 true cores would be awesome.
 

Dygaza

Member
Oct 16, 2015
176
34
101
Worth mentioning is that you can also set affinities for different threads with Process Hacker. So if you run 1 or 2 cores at higher clocks than the rest, you can assign your heaviest threads to those cores.
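
The pairing itself is simple enough to sketch. The thread names, load figures, and clock speeds below are invented for illustration, and the function only decides which thread goes where; actually setting the affinity is still done in Process Hacker or through the OS.

```python
def assign_heaviest_to_fastest(thread_load, core_clock_mhz):
    """Pair the heaviest threads with the highest-clocked cores.

    thread_load: name -> load (any consistent unit, e.g. % of a core).
    core_clock_mhz: core index -> clock speed in MHz.
    Threads beyond the number of cores are left unassigned, i.e. left
    to the OS scheduler.
    """
    threads = sorted(thread_load, key=thread_load.get, reverse=True)
    cores = sorted(core_clock_mhz, key=core_clock_mhz.get, reverse=True)
    return dict(zip(threads, cores))


# Hypothetical numbers: cores 0 and 1 clocked higher than cores 2 and 3.
loads = {"audio": 5, "main": 90, "render": 60}
clocks = {0: 4200, 1: 4200, 2: 3800, 3: 3800}
print(assign_heaviest_to_fastest(loads, clocks))
```

In this example the main and render threads land on the two faster cores and the light audio thread takes one of the slower ones.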
 

dogen1

Senior member
Oct 14, 2014
739
40
91
More cores is usually better. Whether you get extra benefits in SMT and HT is a bit murky, though a lot has happened since the days of Pentium 4 HT.
Here for example CSGO shows improvement with core count but loses performance when you have HT enabled:

1-4 cores were faster with HT enabled...
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
162
106
As it seems, it does that thread-wise. That's the interesting thing.
It can also alter the thread I/O & page priority, as well as that of the main program. I use it all the time to cheat in benchmarks.

I'm surprised many here don't know what Process Hacker is; it isn't a task manager replacement, not even close.
 
Reactions: Dresdenboy