dma does this too, also with 12 bits to check. this decreases code size
significantly (increasing speed when the cache is cold), frees up an
interrupt handler, and avoids read-modify-write cycles (which makes each
processed flag cheaper). due to more iterations per handler invocation
the actual runtime of the handler body remains roughly the
same (slightly faster at O2, slightly slower at Oz).
notably wakers are now kept in one large array indexed by the irq
register bit number instead of three different arrays, this allows for
machine code-level optimizations of waker lookups.