Hulk
Diamond Member
- Oct 9, 1999
Let's review. I don't think there is much debate that the Big/Little concept is useful for mobile, where efficiency is of the utmost importance. We don't have to look further than phones: since that design is the norm these days, it must be advantageous.
I think most of us would agree that Intel is going with a Big/Little design for Alder Lake because mobile is their first priority. The question to be asked is as follows:
Is the hybrid design a detriment to desktop performance, with Intel just doing the best they can with it?
Or have they found a way that for a given transistor budget they can achieve higher performance with Big/Little?
While there is little doubt Intel will stay in the optimum efficiency range of the shmoo plot for mobile, they have demonstrated that if enormous power is required for desktop to be competitive, they will use it. So I don't put much stock in the theory that for the desktop they will limit Big or Little core speed for power reasons, especially at the top of the stack where competition with AMD is paramount.
It seems to me that the elephant in the room here is: for a given application, do all threads need the same amount of compute? I don't know. The easiest way to design is to assume they do and throw as many powerful cores at the application as you can, i.e. Threadripper.
A more elegant solution might be to have two "buckets" of cores, Big and Little, and assign them based on the compute needs of the various threads of the running process. Of course this can be tailored to a specific application, but can it work across the board? As I wrote above, I found a neat little app called "ProcessThreadsView" that lets you see the threads created by a specific application, the rate of context switching, and the user/kernel time for each thread.
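For anyone who wants to poke at the same per-thread numbers without that tool: on Linux the kernel exposes per-thread context-switch counters under /proc. A minimal sketch (assumes Linux with /proc mounted; it just returns an empty list on other platforms):

```python
# Rough Linux analogue of the per-thread columns ProcessThreadsView
# shows: context-switch counts read from /proc/<pid>/task/<tid>/status.
# Assumption: Linux with /proc mounted; returns [] elsewhere.
import os

def ctx_switches(pid):
    """Return [(tid, voluntary, involuntary), ...] for each thread of pid."""
    task_dir = f"/proc/{pid}/task"
    if not os.path.isdir(task_dir):
        return []  # not Linux, or the process is gone
    rows = []
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/status") as f:
                fields = dict(
                    line.split(":\t", 1) for line in f if ":\t" in line
                )
            rows.append((
                int(tid),
                int(fields["voluntary_ctxt_switches"]),
                int(fields["nonvoluntary_ctxt_switches"]),
            ))
        except (OSError, KeyError, ValueError):
            continue  # thread exited mid-scan; skip it
    return rows

if __name__ == "__main__":
    for tid, vol, invol in ctx_switches(os.getpid()):
        print(f"thread {tid}: voluntary={vol} involuntary={invol}")
```

Voluntary switches (the thread blocked and yielded) and involuntary ones (the scheduler preempted it) are reported separately, which ProcessThreadsView lumps together.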
I have only started to check it out, but it seems the rate of context switching could be meaningful. As we know, every time a thread's context is switched there is inefficiency, as one thread is flushed from the CPU and another one is started. The less context switching, the better.
Here's a screenshot of Handbrake running the benchmark from this forum. Lots of threads are spawned, and the first 12 are doing quite a bit of context switching. This is from my 4770K. It would be interesting to see what this looks like on 8/16, 12/24, and 16/32 core machines. It seems like 7 or 8 threads are getting hammered pretty hard by the context switching rate, which I believe is in switches/second. So those "main" threads are frequently being cut away from to service other threads.
Here is a screenshot of Presonus Studio One playing back a relatively lightly loaded multitrack song. Here we have 7 threads with high context switching and 8 or so other threads that seem to be lightly loaded, being serviced during the context switches from the main threads.
Honestly, I don't know what the heck this all really means, but it could be that these applications would do well with 8 dedicated Big cores and a bunch of Little cores to service the less heavily loaded threads. Or it could mean nothing, because the context switching performance hit is negligible.
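The "two buckets" idea above can be sketched as a toy assignment: take per-thread CPU times like the user/kernel columns in ProcessThreadsView, pin the hottest threads to Big cores, and leave the rest for Little cores. The thread names and times below are made up for illustration:

```python
# Toy sketch of the Big/Little "two buckets" idea: rank threads by
# observed CPU time and split at the number of Big cores available.
# Thread names and times are hypothetical, loosely modeled on the
# Handbrake observation above (~8 hot workers, many light helpers).

def assign_buckets(thread_times, n_big):
    """thread_times: {tid: cpu_seconds}. Returns (big_set, little_set)."""
    ranked = sorted(thread_times, key=thread_times.get, reverse=True)
    return set(ranked[:n_big]), set(ranked[n_big:])

# Hypothetical profile: 8 heavily loaded workers, 10 light helpers.
times = {f"worker-{i}": 40.0 for i in range(8)}
times.update({f"helper-{i}": 0.5 for i in range(10)})

big, little = assign_buckets(times, n_big=8)
print("Big cores get:", sorted(big))
print("Little cores get:", sorted(little))
```

A real scheduler would have to do this dynamically and handle threads whose load changes over time, but the split itself is just this kind of rank-and-cut.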
One more interesting observation: the user time for Studio One is focused on 8 threads. For Handbrake it's 7 threads, plus one less loaded one. Is this because my 4770K has 8 logical processors, or because these apps were coded with 4-core/8-thread processors in mind? If someone ran the Handbrake test on an 8-core rig, we'd at least know that, right?
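One half of that question is easy to check on any machine: what logical-processor count the OS reports, which is what a cpu-count-sized thread pool would use (this doesn't prove how Handbrake or Studio One actually size their pools, just what they'd see):

```python
# Quick check: how many logical processors does the OS report? On a
# 4770K this prints 8 (4 cores with Hyper-Threading). If the number of
# heavily loaded threads tracks this value on bigger machines (16 on an
# 8c/16t rig, etc.), the apps are likely sizing their pools to the CPU
# count rather than being hardcoded for 4/8.
import os

print(os.cpu_count(), "logical processors")
```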
But I thought it was interesting enough to post.