bors[bot] 729b17bc25
Merge #428
428: executor: Use critical sections instead of atomic CAS loops r=lulf a=Dirbaio

Optimize executor wakes.

CAS loops (either `fetch_update`, or manual `load + compare_exchange_weak`) generate surprisingly horrible code: https://godbolt.org/z/zhscnM1cb

This switches the wake path to critical sections, which is measurably faster (cycle counts below). On thumbv6 (Cortex-M0) the gain should be even larger, because that target currently uses `atomic-polyfill`, which already enters many critical sections for each `compare_exchange_weak`.
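As a rough illustration of the two approaches, here is a minimal host-side sketch. The state bits and the function names `mark_queued_cas` and `mark_queued_cs` are hypothetical, chosen for this example rather than taken from the executor, and a `std::sync::Mutex` stands in for the interrupt-disabling critical section that would be used on the target.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Mutex;

// Hypothetical state bits, for illustration only.
const STATE_SPAWNED: u32 = 1 << 0;
const STATE_RUN_QUEUED: u32 = 1 << 1;

/// CAS-loop variant: `fetch_update` retries until the update lands atomically.
/// Returns true if the caller should enqueue the task.
fn mark_queued_cas(state: &AtomicU32) -> bool {
    state
        .fetch_update(Ordering::AcqRel, Ordering::Acquire, |s| {
            // Bail out if the task is not spawned or is already queued.
            if s & STATE_SPAWNED == 0 || s & STATE_RUN_QUEUED != 0 {
                None
            } else {
                Some(s | STATE_RUN_QUEUED)
            }
        })
        .is_ok()
}

// Host-side stand-in for a critical section (assumption: on the target this
// would disable interrupts instead of taking a lock).
static CS: Mutex<()> = Mutex::new(());

/// Critical-section variant: a plain load, check, and store with no retry loop.
/// Returns true if the caller should enqueue the task.
fn mark_queued_cs(state: &AtomicU32) -> bool {
    let _guard = CS.lock().unwrap();
    let s = state.load(Ordering::Relaxed);
    if s & STATE_SPAWNED == 0 || s & STATE_RUN_QUEUED != 0 {
        return false;
    }
    state.store(s | STATE_RUN_QUEUED, Ordering::Relaxed);
    true
}
```

Both functions make the same decision; the second replaces the retry loop with straight-line code inside the protected region, which is the kind of code the compiler tends to handle better.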

```
            opt-level=3   opt-level=s
  atomics:  105 cycles    101 cycles
       CS:   76 cycles     72 cycles
CS+inline:   72 cycles     64 cycles
```

Measured on an nRF52 with the icache disabled, using this code:

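```rust
    // Fragment from inside an async fn.
    use core::sync::atomic::{compiler_fence, Ordering};
    use core::task::Poll;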
    poll_fn(|cx| {
        let task = unsafe { task_from_waker(cx.waker()) };

        compiler_fence(Ordering::SeqCst);
        let a = cortex_m::peripheral::DWT::get_cycle_count();
        compiler_fence(Ordering::SeqCst);

        unsafe { wake_task(task) }

        compiler_fence(Ordering::SeqCst);
        let b = cortex_m::peripheral::DWT::get_cycle_count();
        compiler_fence(Ordering::SeqCst);

        defmt::info!("cycles: {=u32}", b.wrapping_sub(a));

        Poll::Ready(())
    })
    .await;
```
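The measurement takes the difference with `wrapping_sub`, so it stays correct even if the 32-bit cycle counter overflows between the two reads. A minimal sketch of that arithmetic, using a hypothetical helper `elapsed_cycles` that is not part of the measured code:

```rust
/// Elapsed cycles between two readings of a free-running 32-bit counter.
/// `wrapping_sub` computes the difference modulo 2^32, so one overflow of the
/// counter between `start` and `end` does not corrupt the result.
fn elapsed_cycles(start: u32, end: u32) -> u32 {
    end.wrapping_sub(start)
}
```

For example, `elapsed_cycles(u32::MAX - 9, 62)` gives 72, the same as `elapsed_cycles(100, 172)`.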

Co-authored-by: Dario Nieuwenhuis <dirbaio@dirbaio.net>
2021-10-18 12:05:43 +00:00