Clever OS Scheduling Partly Explains Apple M1’s Responsiveness
When Apple launched the M1, one of the persistent observations from end-users was how responsive the system felt, even during ordinary desktop usage. Now, a macOS developer has found clues to how Apple pulled off the improvement. It isn’t a matter of boosting CPU performance, at least not exactly. What Apple has done is change how macOS responds to quality-of-service (QoS) settings and how workloads are scheduled on the chip.
Dr. Howard Oakley is an author, Mac developer, and former Royal Navy Surgeon. He’s recently written about his comparisons between an M1-powered Mac and his Xeon-based Mac Pro, and how differently macOS behaves on the two machines.
Before we dive into his findings, I’d like to toss in a bit of historical context. One of the challenges of testing Hyper-Threading, back when it debuted on the Pentium 4, was the difficulty of measuring exactly how it impacted system behavior. Scott Wasson, then of the Tech Report, coined the term “creamy smoothness” to describe how the P4 behaved under load compared to an HT-less CPU. Even though the AMD Athlon XPs of the day might be faster in a single-threaded workload, Hyper-Threading kept the system responsive.
Fast forward to 2008-2009, and the launch and popularity of Intel’s first-generation Atom. While no Bonnell-powered Atom system had much of a CPU to speak of, netbooks based on Nvidia’s Ion chipset felt like they were in an entirely different class of device. Even though the integrated Nvidia GPU only offloaded the Windows 7 UI, it made Ion feel distinctly up-market compared with the Intel 945 chipset.
We have, therefore, historical precedent from Windows showing how much proper task offloading can improve responsiveness. In the modern era, macOS allows developers to define different QoS levels. On an x86 CPU, Dr. Oakley’s testing shows that threads execute as quickly as possible at any QoS setting, so long as an application with a higher QoS doesn’t preempt them. In his testing, this worked out to a consistent 5.6 to 6.6 second compression time for a 10GB file. Testing multiple instances of the application simultaneously showed that the instance with a higher QoS finished in the same 5.6 to 6.6 second window, while the instance with a lower QoS took as long as 24 seconds. All of this is more-or-less equivalent to what we’d expect from Windows.
The M1, however, does not behave this way. Here’s Dr. Oakley:
All operations with a QoS of 9 (background) were run exclusively on the four Efficiency (Icestorm) cores, even when that resulted in their being fully loaded and the Performance cores remaining idle. Operations with any higher QoS, from 17 to 33, were run on all eight cores.
Apple, in other words, has changed the way macOS treats the M1 to prioritize responsiveness. Instead of being used to execute background tasks or OS updates, the Firestorm cores are reserved for higher-priority work. If an application demands maximum performance, it can still run across all eight cores, even though this is probably more likely to cause some degree of desktop lag. The system will preferentially run background OS tasks on the efficiency cores, even when this makes them execute much more slowly, in the name of keeping power consumption low.
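To make the reported behavior concrete, here is a minimal Python sketch of the core-eligibility rule described in Dr. Oakley’s quote. It is a toy model of the observed policy, not Apple’s scheduler: the core labels and the function name `eligible_cores` are invented for illustration, and only the QoS values 9 and 17 through 33 come from the quoted testing.

```python
# Toy model of the reported M1 behavior; not Apple's actual scheduler.
# QoS values are the ones quoted in the testing: 9 is background,
# and 17 through 33 are the higher levels.
QOS_BACKGROUND = 9
QOS_HIGHER_MIN = 17
QOS_HIGHER_MAX = 33

# Hypothetical labels for the M1's four efficiency and four performance cores.
EFFICIENCY_CORES = ("E0", "E1", "E2", "E3")
PERFORMANCE_CORES = ("P0", "P1", "P2", "P3")


def eligible_cores(qos):
    """Return the cores a thread may run on under the observed policy."""
    if qos == QOS_BACKGROUND:
        # Background work stays on the efficiency cores, even when they are
        # fully loaded and the performance cores sit idle.
        return EFFICIENCY_CORES
    if QOS_HIGHER_MIN <= qos <= QOS_HIGHER_MAX:
        # Any higher QoS may use all eight cores.
        return EFFICIENCY_CORES + PERFORMANCE_CORES
    # The quoted testing does not cover other values.
    raise ValueError(f"QoS {qos} is outside the range described in the testing")
```

In this model a background job never competes with foreground work for a performance core, which is the property the article credits for the M1’s responsiveness.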
There’s no specific reason why an x86 CPU couldn’t be run in this fashion. While x86 CPUs are still almost entirely homogeneous, the OS could hypothetically dedicate a specific set of cores to processing background tasks, while reserving the rest for peak performance.
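As a rough illustration of that hypothetical, the sketch below splits a homogeneous CPU’s cores into a background group and a foreground group, then pins the current process to one group. It assumes Linux, where Python exposes `os.sched_setaffinity`; the helper names `partition_cores` and `pin_current_process` are made up for this example, and process-level affinity is only a stand-in for what an OS scheduler would do internally.

```python
import os


def partition_cores(cores, n_background):
    """Split core IDs into (background, foreground) sets."""
    ordered = sorted(cores)
    if not 0 < n_background < len(ordered):
        raise ValueError("need at least one core in each group")
    return set(ordered[:n_background]), set(ordered[n_background:])


def pin_current_process(cores):
    """Restrict this process to the given cores where the OS supports it."""
    # os.sched_setaffinity is available on Linux but not on macOS or Windows.
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cores)  # pid 0 means the calling process
        return True
    return False
```

A background worker on an eight-core machine might call `partition_cores(range(8), 2)` and pass the first set to `pin_current_process`, leaving the remaining six cores free for interactive work.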
This is, at minimum, a clever way for Apple to improve the end-user experience. Intel is moving to its own hybrid architecture with Alder Lake later this year, and we may see Windows 10 + hybrid x86 CPUs deploy a similar system for minimizing power consumption while maximizing performance.