729b17bc25
428: executor: Use critical sections instead of atomic CAS loops r=lulf a=Dirbaio Optimize executor wakes. CAS loops (either `fetch_update`, or manual `load + compare_exchange_weak`) generate surprisingly horrible code: https://godbolt.org/z/zhscnM1cb This switches to using critical sections, which makes it faster. On thumbv6 (Cortex-M0) it should make it even faster, as it is currently using `atomic-polyfill`, which will make many critical sections for each `compare_exchange_weak` anyway. ``` opt-level=3 opt-level=s atmics: 105 cycles 101 cycles CS: 76 cycles 72 cycles CS+inline: 72 cycles 64 cycles ``` Measured in nrf52 with icache disabled, with this code: ```rust poll_fn(|cx| { let task = unsafe { task_from_waker(cx.waker()) }; compiler_fence(Ordering::SeqCst); let a = cortex_m::peripheral::DWT::get_cycle_count(); compiler_fence(Ordering::SeqCst); unsafe { wake_task(task) } compiler_fence(Ordering::SeqCst); let b = cortex_m::peripheral::DWT::get_cycle_count(); compiler_fence(Ordering::SeqCst); defmt::info!("cycles: {=u32}", b.wrapping_sub(a)); Poll::Ready(()) }) .await; ```` Co-authored-by: Dario Nieuwenhuis <dirbaio@dirbaio.net>