Set thread QoS to USER_INITIATED on Apple Silicon #3278
ssp3nc3r wants to merge 1 commit into stan-dev:develop
On Apple Silicon Macs, TBB worker threads are created with the default QoS class, which macOS may schedule to efficiency cores even when performance cores are available. This significantly degrades parallel performance. This adds a pthread_set_qos_class_self_np() call in on_scheduler_entry() to set USER_INITIATED QoS, signaling to macOS that these are compute threads the user is waiting for. This causes macOS to prefer performance cores when available. Fixes stan-dev#3277
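As a rough illustration of the approach described above, here is a minimal sketch that is not taken from the PR diff. It uses plain `std::thread` workers in place of TBB so that it stays self-contained; in the PR the call sits in `on_scheduler_entry()`. The names `request_user_initiated_qos` and `run_workers` are hypothetical, and the extra `__APPLE__` check is an assumption added so the snippet also compiles on other platforms, where the helper is a no-op.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

#if defined(__APPLE__)
#include <pthread.h>
#include <sys/qos.h>
#endif

// Hypothetical helper (not part of Stan Math or TBB): asks macOS to treat the
// calling thread as user-initiated work. Returns true only if the call was
// compiled in and succeeded; returns false on failure or on other platforms.
inline bool request_user_initiated_qos() {
#if defined(__APPLE__) && (defined(__arm64__) || defined(__aarch64__))
  return pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0) == 0;
#else
  return false;  // no-op where the guard excludes the call
#endif
}

// Hypothetical stand-in for a worker pool: every worker requests the QoS
// class as its first action, similar in spirit to a scheduler-entry hook.
// Returns the number of workers that ran.
inline int run_workers(int n_threads) {
  assert(n_threads > 0);
  std::atomic<int> entered{0};
  std::vector<std::thread> workers;
  workers.reserve(static_cast<std::size_t>(n_threads));
  for (int i = 0; i < n_threads; ++i) {
    workers.emplace_back([&entered] {
      // The result is ignored on purpose: QoS is a scheduling hint, and the
      // work should proceed even if the request is refused.
      request_user_initiated_qos();
      entered.fetch_add(1);
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return entered.load();
}
```

Calling `run_workers(8)` starts eight threads that each request the QoS class before doing anything else. The call has to run on the worker thread itself, since `pthread_set_qos_class_self_np()` only affects the calling thread.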
|
Changes look good and relatively self-contained to me, but I don't have a machine to test on. @bob-carpenter is our resident Mac Silicon fan and has used some of this functionality recently, so maybe he can take a peek before we merge.
#if defined(__arm64__) || defined(__aarch64__)
// Set thread QoS to USER_INITIATED so macOS prefers scheduling
// TBB worker threads on performance cores rather than efficiency cores.
pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0);
#endif
Leaving this as is should be fine, but pthread_set_qos_class_self_np should work for x86 as well.
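For reference, a sketch of what the broader guard could look like if the call were enabled on all Macs and not only ARM64 ones. This is an assumption for illustration, not what the PR does. The helper names are hypothetical. `qos_class_self()` from `<sys/qos.h>` is used to read the class back so the effect can be inspected.

```cpp
#include <string>

#if defined(__APPLE__)
#include <pthread.h>
#include <sys/qos.h>
#endif

// Hypothetical variant of the guard: keyed on __APPLE__ alone, so the call
// is compiled in for both Intel and Apple Silicon Macs.
inline bool request_user_initiated_qos_any_mac() {
#if defined(__APPLE__)
  return pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0) == 0;
#else
  return false;  // not macOS: nothing to do
#endif
}

// Hypothetical helper: human-readable name of the calling thread's QoS
// class, or "unsupported" when not built for macOS.
inline std::string current_qos_name() {
#if defined(__APPLE__)
  switch (qos_class_self()) {
    case QOS_CLASS_USER_INTERACTIVE:
      return "user-interactive";
    case QOS_CLASS_USER_INITIATED:
      return "user-initiated";
    case QOS_CLASS_DEFAULT:
      return "default";
    case QOS_CLASS_UTILITY:
      return "utility";
    case QOS_CLASS_BACKGROUND:
      return "background";
    default:
      return "unspecified";
  }
#else
  return "unsupported";
#endif
}
```

Logging `current_qos_name()` from inside a worker before and after the request is one way to check that the class actually changed. Whether the change helps on x86 Macs, which have no efficiency cores, would still need profiling.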
|
I only added those statements after @WardBrian or @SteveBronder told me about them. I haven't profiled without, which I should probably do. We |
|
@bob-carpenter you might have gotten lost in the notification soup; this is about adding settings to the Math library similar to the ones you previously set in WALNUTS |
|
I may have wrongly assumed people reading this knew I was working on WALNUTS, but I understood the context here. @WardBrian pinged me and I was just responding that I haven't even verified that the commands do anything useful in the place where I am using them. You might want to ask @SteveBronder, who I believe is the one who recommended these commands for Apple Silicon, where there is a combination of "performance" cores and "efficiency" cores.
|
This looks good to me, but I'd like to run it on my Mac first to make sure all is good. Once it works locally for me I'll approve!
Summary
On Apple Silicon Macs, TBB worker threads are created with the default QoS class, which macOS may schedule to efficiency cores even when performance cores are available. This significantly degrades parallel performance.
This adds a pthread_set_qos_class_self_np() call in on_scheduler_entry() to set USER_INITIATED QoS, signaling to macOS that these are compute threads the user is waiting for. This causes macOS to prefer performance cores when available.
Details
- Adds #include <pthread.h> and #include <sys/qos.h> on Apple
- Calls pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0) when TBB worker threads enter the scheduler
- Guarded to ARM64 builds (__arm64__ or __aarch64__)
Testing
Tested on macOS 26.2 (Tahoe) with Apple M3 Ultra (24 P-cores, 8 E-cores).
Before: CPU usage drops from ~800% to ~100-300% per chain after ~4 minutes (threads demoted to E-cores)
After: CPU usage remains stable on P-cores
Fixes #3277