1425: rp pio, round 2 r=Dirbaio a=pennae
another round of bugfixes for pio, and some refactoring. in the end we'd like to make pio look like all the other modules and not expose traits that provide all the methods of a type, but put them onto the type itself. traits only make much sense, even if we added an AnyPio and merged the types for the member state machines (at the cost of at least a u8 per member of Pio).
Co-authored-by: pennae <github@quasiparticle.net>
not requiring a PioInstance for splitting lets us split from a
PeripheralRef or borrowed PIO as well, mirroring every other peripheral
in embassy_rp. pio pins still have to be constructed from owned pin
instances for now.
merge into PioInstance instead. PioPeripheral was mostly a wrapper
around PioInstance anyway, and the way the wrapping was done required
PioInstanceBase<N> types where PIO{N} could've been used instead.
1423: rp: fix gpio InputFuture and inefficiencies r=pennae a=pennae
InputFuture could not wait for edges without breaking due to a broken From impl, but even if the impl had been correct it would not have worked correctly because raw edge interrupts are sticky and must be cleared from software. also replace critical sections with atomic accesses, and do nvic setup only once.
Co-authored-by: pennae <github@quasiparticle.net>
doing this setup work repeatedly, on every wait, is unnecessary. with
nothing ever disabling the interrupt it is sufficient to enable it once
during device init and never touch it again.
pio control registers are notionally shared between state machines as
well. state machine operations that change these registers must use
atomic accesses (or critical sections, which would be overkill).
notably PioPin::set_input_sync_bypass was even wrong, enabling the
bypass on a pin requires the corresponding bit to be set (not cleared).
the PioCommon function got it right.
fixing the dma word size to 32 makes it impossible to implement any
peripheral that takes its data in smaller chunks, eg uart, spi, i2c,
ws2812, the list goes on.
compiler barriers were also not set correctly; we need a SeqCst barrier
before starting a transfer as well to avoid reordering of accesses into
a buffer after dma has started.
InputFuture did not use and check edge interrupts correctly.
InterruptTrigger should've checked for not 1,2,3,4 but 1,2,4,8 since the
inte fields are bitmasks, and not clearing INTR would have repeatedly
triggered edge interrupts early.
1414: rp: report errors from buffered and dma uart receives r=Dirbaio a=pennae
neither of these reported errors so far, which is not ideal. add error reporting to both of them that matches the blocking error reporting as closely as is feasible, even allowing partial receives from buffered uarts before errors are reported where they would have been by the blocking code. dma transfers don't do this, if an errors applies to any byte in a transfer the entire transfer is nuked (though we probably could report how many bytes have been transferred).
Co-authored-by: pennae <github@quasiparticle.net>
this reports errors at the same location the blocking uart would, which
works out to being mostly exact (except in the case of overruns, where
one extra character is dropped). this is actually easier than going
nuclear in the case of errors and nuking both the buffer contents and
the rx fifo, both of which are things we'd have to do in addition to
what's added here, and neither are needed for correctness.
sending break conditions is necessary to implement some protocols, and
the hardware supports this natively. we do have to make sure that we
don't assert a break condition while the uart is busy though, otherwise
the break may be inserted before the last character in the tx fifo.
instruction memory is a shared resource. writing it only from PioCommon
clarifies this, and perhaps makes it more obvious that multiple state
machines can share the same instructions.
this also allows *freeing* of instruction memory to reprogram the
system, although this interface is not entirely safe yet. it's safe in
the sense rusts understands things, but state machines may misbehave if
their instruction memory is freed and rewritten while they are running.
fixing this is out of scope for now since it requires some larger
changes to how state machines are handled. the interface provided
currently is already unsafe in that it lets people execute instruction
memory that has never been written, so this isn't much of a drawback for now.
pin and irq operations affect the entire pio block. with pins this is
not very problematic since pins themselves are resources, but irqs are
not treated like that and can thus interfere across state machines. the
ability to wait for an irq on a state machine is kept to make
synchronization with user code easier, and since we can't inspect loaded
programs at build time we wouldn't gain much from disallowing waits from
state machines anyway.
this mainly removes the need for explicit indexing to get the pac
object. runtime effect is zero, but arguably things are a bit easier to
read with less indexing.
this is already done during platform init. it wasn't even sound in the
original implementation because futures would meddle with the nvic in
critical sections, while another (interrupt) executor could meddle with
the nvic without critical sections here. it is only accidentally sound
now and only if irq1 of both pios isn't used by user code. luckily the
worst we can expect to happen is interrupt priorities being set wrong,
but wrong is wrong is wrong.
since we never actually *disable* these interrupts for any length of
time we can simply enable them globally. we also initialize all pio
interrupt flags to not cause system interrupts since state machine
irqa are not necessarily meant to cause a system interrupt when set. the
fifo interrupts are sticky and can likewise only be cleared inside the
handler by disabling them.
dma does this too, also with 12 bits to check. this decreases code size
significantly (increasing speed when the cache is cold), frees up an
interrupt handler, and avoids read-modify-write cycles (which makes each
processed flag cheaper). due to more iterations per handler invocation
the actual runtime of the handler body remains roughly the
same (slightly faster at O2, slightly slower at Oz).
notably wakers are now kept in one large array indexed by the irq
register bit number instead of three different arrays, this allows for
machine code-level optimizations of waker lookups.
1406: rp: DMA behaviour during flash operations r=Dirbaio a=kalkyl
This PR changes the old behaviour during flash operations where all DMA transfers were paused during the flash operation.
The new approach is to wait for any DMA operating in flash region to finish and let RAM transfers continue.
Co-authored-by: kalkyl <henrik.alser@me.com>
storing a full function pointer initialized to a resolver trampoline
lets us avoid the runtime cost of checking whether we need to do the
initialization.
rp-hal has done this very well already, so we'll just copy their entire
impl again. only div.rs needed some massaging because our sio access
works a little differently, everything else worked as is.
1378: Add ability to invert UART pins, take 2 r=Dirbaio a=jakewins
Same PR as before, except this now works :)
There was a minor hiccup in the UartRx code where the rx pin got passed as the tx argument, so the invert settings didn't get applied. With this fix, my local setup at least is happily reading inverted uart data.
Co-authored-by: Jacob Davis-Hansson <jake@davis-hansson.com>
1372: rp: add division intrinsics r=Dirbaio a=pennae
rp2040-hal adds division intrinsics using the hardware divider unit in the SIO, as does the pico-sdk itself. using the hardware is faster than the compiler_rt implementations, and more compact too.
since embassy does not expose the hardware divider in any way (yet?) we could go even further an remove the state-saving code rp2040-hal needs, but that doesn't seem to be worth it.
Co-authored-by: pennae <github@quasiparticle.net>