I don't have the chops to discuss the pitfalls of early "big iron" systems and their problems, but I would think that most of those old systems - at least in MP configurations - had uniform processor distribution. That is, every CPU was basically the same as every other CPU. Or were there exceptions?
Yeah, they were completely symmetrical.
big.LITTLE chips seem to make things even more complicated, because you have your "fast" cores and your "slow" cores, and keeping everything running smoothly while dynamically switching execution between the core groups depending on TDP limits and battery level and other nonsense has got to be a headache.
Yeah, unfortunately I don't know how the Apple dispatcher works.
My initial thought was that when a process came up in the ready queue, they'd give it a lightweight engine, step back and monitor utilization, and add resources should that prove insufficient.
But really, dispatching is probably done on an entire system basis, with the processor controller lighting up engines whenever something comes up in the ready queue, starting with the high efficiency engines, and proceeding upward through the big boys if needed to satisfy the processing load. Once an engine completes its task, if it remains idle long enough the dispatcher powers it down to save power (starting with the high power engines).
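To make that guess concrete, here is a toy sketch of the policy in Python. It is only my speculation turned into code, not how Apple's (or anyone's) dispatcher actually works. The names `Core`, `Dispatcher`, and `idle_limit` are all made up for illustration, and "time" is just a counter that advances when `tick()` is called.

```python
from dataclasses import dataclass

@dataclass
class Core:
    name: str
    kind: str            # "E" = high-efficiency engine, "P" = high-power engine
    powered: bool = False
    busy: bool = False
    idle_ticks: int = 0

class Dispatcher:
    """Hypothetical whole-system dispatcher: wake E cores first, sleep P cores first."""

    def __init__(self, cores, idle_limit=3):
        # Stable sort puts every E core ahead of every P core.
        self.cores = sorted(cores, key=lambda c: c.kind != "E")
        self.idle_limit = idle_limit  # assumed idle threshold, in ticks

    def dispatch(self):
        """Place one ready task on the first free engine; None if all are busy."""
        for core in self.cores:
            if not core.busy:
                core.powered = True
                core.busy = True
                core.idle_ticks = 0
                return core
        return None

    def complete(self, core):
        """The task on this engine finished; the engine goes idle but stays lit."""
        core.busy = False
        core.idle_ticks = 0

    def tick(self):
        """Advance one time step and power down at most one long-idle engine,
        checking the high-power engines first."""
        idle = [c for c in reversed(self.cores) if c.powered and not c.busy]
        for core in idle:
            core.idle_ticks += 1
        for core in idle:
            if core.idle_ticks >= self.idle_limit:
                core.powered = False
                return core
        return None
```

With two E cores and two P cores, the first two tasks land on the E cores and only the third lights up a P core. When engines go idle, the P core is the first to be shut off. A real dispatcher would also have to weigh thread priority, cache affinity, and thermal limits, none of which is modeled here.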
Really, the objective would be to minimize power use while satisfying the workload. This sounds crazy complicated, but in a real-time world it may be self-regulating, since large heavy threads may spend more time on their engines and ephemeral tasks would flit in and out as they were completed. This raises all kinds of questions: what are the internal channel paths like? Can I/O be initiated from any engine? Can I/O completion be seen by all engines?
Really, I'm not sure how critical it is to keep these processors "hot" every microsecond since we're no longer talking about a multi-million dollar investment; with faster NVMe storage, you might even want to loop on I/O completion to avoid adding complexity to the OS dispatcher and speed up the I/O without all the attendant problems of processing an asynchronous I/O completion.
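The difference between looping on completion and taking an asynchronous completion can be shown with a toy model. This is a sketch under heavy assumptions: `FakeDevice` is a made-up stand-in for storage that posts completion after a fixed latency, and the two `read_*` helpers are hypothetical names, not any real driver interface.

```python
import threading
import time

class FakeDevice:
    """Hypothetical device: completion is posted after a fixed latency."""

    def __init__(self, latency_s):
        self.latency_s = latency_s
        self.done = threading.Event()

    def submit(self):
        # Start the "I/O"; a timer thread posts completion later.
        self.done.clear()
        threading.Timer(self.latency_s, self.done.set).start()

def read_polled(dev, timeout_s=1.0):
    """Spin on the completion flag; the caller keeps the engine the whole time."""
    dev.submit()
    deadline = time.monotonic() + timeout_s
    spins = 0
    while not dev.done.is_set():
        spins += 1
        if time.monotonic() > deadline:
            raise TimeoutError("device never posted completion")
    return spins

def read_blocking(dev, timeout_s=1.0):
    """Give up the engine and let the completion wake us, as an interrupt would."""
    dev.submit()
    if not dev.done.wait(timeout_s):
        raise TimeoutError("device never posted completion")
```

The polled version burns cycles but needs no wakeup path at all. The blocking version frees the engine, at the cost of the dispatcher having to suspend and later resume the waiter. Which one wins depends on how device latency compares with the cost of that round trip.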
Stuff like this is the "secret sauce" of the OS - in the low-level guts of MVS's IOS, I can recall a retry counter of 10 being used before reporting an I/O error.
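The general shape of that retry idea, as a minimal sketch: this is not the IOS code, just a bounded retry loop in Python. The helper name `run_with_retries` is made up, and treating the count of 10 as the total number of attempts (rather than 10 retries after a first failure) is my assumption.

```python
def run_with_retries(operation, max_attempts=10):
    """Call operation() until it succeeds or max_attempts is used up.

    Returns (result, attempts_used). Only after the last attempt fails is
    the error reported to the caller, chained to the final underlying failure.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(), attempt
        except OSError as err:
            last_error = err  # remember the failure and try again
    raise OSError(f"I/O failed after {max_attempts} attempts") from last_error
```

A transient fault that clears on the fourth try never reaches the caller; only a fault that survives every attempt is surfaced as an I/O error.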