1465: rp: continue clock rework r=Dirbaio a=pennae
vastly reduce the code size of initial clock config (over 700 bytes saved!), at the cost of about 48 bytes of ram used to store the frequencies of all clocks in the system. also stop exporting unstable pac items for clock config, fix a few settings that were out of spec, and add missing features (most notably gpin source information).
Co-authored-by: pennae <github@quasiparticle.net>
gpin clock sources aren't going to be very useful during cold boot and
thus require runtime clock reconfig. once we get there we can use this
for reference. or maybe we can't, only time will tell.
we'll take static ownership of an entire pin (not just a limited
reference), otherwise we cannot at all guarantee that the pin will not
be reused for something else while still in use. in theory we could
limit the liftime of this use, but that would require attaching
lifetimes to ClockConfig (and subsequently the main config), passing
those through init(), and returning an object that undoes the gpin
configuration on drop. that's a lot unnecessary support code while we
don't have runtime clock reconfig.
don't recalculate clock frequencies every time they are asked for. while
this is not very often in practice it does consume a bunch of flash
space that cannot be optimized away, and was pulled in unconditionally
previously. while we technically only need the configured rosc, xosc and
gpin frequencies it is easier to store all frequencies (and much cheaper
at runtime too).
if rosc really does run at 140MHz in high at div=1 then these values
were not correct and would've exceeded the chip spec. the HIL test
device seems to run fast (150MHz) so they're still not quite correct,
but rosc has high variance anyway so it's probably fine.
exposing pac items kind of undermines the unstable-pac feature. directly
exposing register structure is also pretty inconvenient since the clock
switching code takes care of the src/aux difference in behavior, so a
user needn't really be forced to write down decomposed register values.
the datasheet says that the xosc may be run by feeding a square wave
into the XIN pin of the chip, but requires that the oscillator be set to
pass through XIN in that case. it does not mention how, the xosc
peripheral does not seem to have any config bits that could be set to
this effect, and pico-sdk seems to have no (or at least no special)
handling for this configuration at all. it can thus be assumed to either
be not supported even by the reference sdk or to not need different
handling.
solvers usually output fbdiv directly, using vco_freq to get back to
fbdiv is not all that necessary or useful. both vco_freq and fbdiv have
hidden constraints, but vco_freq is a lot less accurate because the
fbdiv value resulting from the division may be off by almost a full
ref_freq's worth of frequency.
also fixes the usb pll config, which ran the pll vco way out of (below)
spec.
we might not configure both, so we should put the others into reset
state. leaving them fully as is might leave them running, which might
not be the goal for runtime reconfig (when it comes around). this now
mirrors how we reset all clock-using peripherals and only unreset those
that are properly clocked.
this is only really useful for runtime *re*configuration, which we don't
currently support. even runtime reconfig probably won't need it, unless
we keep taking the sledgehammer approach of reconfiguring everything all
the time.
It is UB to pass `entry` to core1 as `&mut`, because core0 keeps
an aliasing pointer to that memory region, and actually writes to
it (when `spawn_core1` returns, the stack frame gets deallocated and the memory
gets reused). This violates noalias requirements.
Added the fence just in case, een though it works without.
It was intended to allow changing baudrate on shared spi/i2c. There's no
advantage in using it for PWM or PIO, and makes it less usable because you have to
have `embassy-embedded-hal` as a dep to use it.
1439: rp: use rp2040-boot2 to provide the boot2 blob r=Dirbaio a=pennae
we're currently shipping an old boot2 that runs the flash at half speed. use the more recent version instead, and allow user to choose between the different supported boot2 versions for different flash chips if they need that.
Co-authored-by: pennae <github@quasiparticle.net>
we're currently shipping an old boot2 that runs the flash at half speed.
use the more recent version instead, and allow user to choose between
the different supported boot2 versions for different flash chips if they
need that.
execution wraps around after the end of instruction memory and wrapping
works with this, so we may as well allow program loading across this
boundary. could be useful for reusing chunks of instruction memory.
sometimes state machines need to be started, restarted, or synchronized
at exactly the same time. the current interface does not allow this but
the hardware does, so let's expose that.
the many individual sets aren't very efficient, and almost no checks
were done to ensure that the configuration written to the hardware was
actually valid. this adresses both of these.
none of these are safe. the x/y functions mangle the fifos, the set
functions require the state machine to be stopped to be in any way safe,
the out functions do both of those things at once. only the jump
instruction is marginally safe, but running this on an active program is
bound to cause problems.
programs contain information we could pull from them directly and use to
validate other configuration of the state machine instead of asking the
user to pull them out and hand them to us bit by bit. unfortunately
programs do not specify how many in or out bits they use, so we can only
handle side-set and wrapping jumps like this. it's still something though.
there's nothing this critical section protects against. both read and
write-to-clear are atomic and don't interfere with other irq futures,
only potentially with setting/clearing an irq flag from an arm core.
neither have ever been synchronized, and both have the same observable
effects under atomic writes and critical sections. (for both setting and
clearing an irq flag observable differences could only happen if the
set/clear happened after the poll read, but before the write. if it's a
clear we observe the same effects as sequencing the clear entirely after
the poll, and if it's a set we observe the same effects as sequencing
the set entirely before the poll)
it's only any good for PioPin because there it follows a pattern of gpio
pin alternate functions being named like that, everything else can just
as well be referred to as `pio::Thing`
this *finally* allows sound implementions of bidirectional transfers
without blocking. the futures previously allowed only a single direction
to be active at any given time, and the dma transfers didn't take a
mutable reference and were thus unsound.
this way we can share irq handling between state machines and common
without having to duplicate the methods. it also lets us give irq flag
access to places without having to dedicate a state machine or the
common instance to those places, which can be very useful to eg trigger
an event and wait for a confirmation using an irq wait object.
we can only have one active waiter for any given irq at any given time.
allowing waits for irqs on state machines bypasses this limitation and
causes lost events for all but the latest waiter for a given irq.
splitting this out also allows us to signal from state machines to other
parts of the application without monopolizing state machine access for
the irq wait, as would be necessary to make irq waiting sound.
move all methods into PioStateMachine instead. the huge trait wasn't
object-safe and thus didn't have any benefits whatsoever except for
making it *slightly* easier to write bounds for passing around state
machines. that would be much better solved with generics-less instances.
once all sharing owners of pio pins have been dropped we should reset
the pin for use by other hal objects. unfortunately this needs an atomic
state per pio block because PioCommon and all of the state machines
really do share ownership of any wrapped pins. only PioCommon can create
them, but all state machines can keep them alive. since state machines
can be moved to core1 we can't do reference counting in relaxed mode,
but we *can* do relaxed pin accounting (since only common and the final
drop can modify this).
we can't prove that some instruction memory is not used as long as state
machines are alive, and we can pass instance memory handles between
instances as well. mark free_instr unsafe, with documentation for this caveat.
1425: rp pio, round 2 r=Dirbaio a=pennae
another round of bugfixes for pio, and some refactoring. in the end we'd like to make pio look like all the other modules and not expose traits that provide all the methods of a type, but put them onto the type itself. traits only make much sense, even if we added an AnyPio and merged the types for the member state machines (at the cost of at least a u8 per member of Pio).
Co-authored-by: pennae <github@quasiparticle.net>
not requiring a PioInstance for splitting lets us split from a
PeripheralRef or borrowed PIO as well, mirroring every other peripheral
in embassy_rp. pio pins still have to be constructed from owned pin
instances for now.
merge into PioInstance instead. PioPeripheral was mostly a wrapper
around PioInstance anyway, and the way the wrapping was done required
PioInstanceBase<N> types where PIO{N} could've been used instead.
1423: rp: fix gpio InputFuture and inefficiencies r=pennae a=pennae
InputFuture could not wait for edges without breaking due to a broken From impl, but even if the impl had been correct it would not have worked correctly because raw edge interrupts are sticky and must be cleared from software. also replace critical sections with atomic accesses, and do nvic setup only once.
Co-authored-by: pennae <github@quasiparticle.net>
doing this setup work repeatedly, on every wait, is unnecessary. with
nothing ever disabling the interrupt it is sufficient to enable it once
during device init and never touch it again.
pio control registers are notionally shared between state machines as
well. state machine operations that change these registers must use
atomic accesses (or critical sections, which would be overkill).
notably PioPin::set_input_sync_bypass was even wrong, enabling the
bypass on a pin requires the corresponding bit to be set (not cleared).
the PioCommon function got it right.
fixing the dma word size to 32 makes it impossible to implement any
peripheral that takes its data in smaller chunks, eg uart, spi, i2c,
ws2812, the list goes on.
compiler barriers were also not set correctly; we need a SeqCst barrier
before starting a transfer as well to avoid reordering of accesses into
a buffer after dma has started.
InputFuture did not use and check edge interrupts correctly.
InterruptTrigger should've checked for not 1,2,3,4 but 1,2,4,8 since the
inte fields are bitmasks, and not clearing INTR would have repeatedly
triggered edge interrupts early.
1414: rp: report errors from buffered and dma uart receives r=Dirbaio a=pennae
neither of these reported errors so far, which is not ideal. add error reporting to both of them that matches the blocking error reporting as closely as is feasible, even allowing partial receives from buffered uarts before errors are reported where they would have been by the blocking code. dma transfers don't do this, if an errors applies to any byte in a transfer the entire transfer is nuked (though we probably could report how many bytes have been transferred).
Co-authored-by: pennae <github@quasiparticle.net>
this reports errors at the same location the blocking uart would, which
works out to being mostly exact (except in the case of overruns, where
one extra character is dropped). this is actually easier than going
nuclear in the case of errors and nuking both the buffer contents and
the rx fifo, both of which are things we'd have to do in addition to
what's added here, and neither are needed for correctness.
sending break conditions is necessary to implement some protocols, and
the hardware supports this natively. we do have to make sure that we
don't assert a break condition while the uart is busy though, otherwise
the break may be inserted before the last character in the tx fifo.
instruction memory is a shared resource. writing it only from PioCommon
clarifies this, and perhaps makes it more obvious that multiple state
machines can share the same instructions.
this also allows *freeing* of instruction memory to reprogram the
system, although this interface is not entirely safe yet. it's safe in
the sense rusts understands things, but state machines may misbehave if
their instruction memory is freed and rewritten while they are running.
fixing this is out of scope for now since it requires some larger
changes to how state machines are handled. the interface provided
currently is already unsafe in that it lets people execute instruction
memory that has never been written, so this isn't much of a drawback for now.
pin and irq operations affect the entire pio block. with pins this is
not very problematic since pins themselves are resources, but irqs are
not treated like that and can thus interfere across state machines. the
ability to wait for an irq on a state machine is kept to make
synchronization with user code easier, and since we can't inspect loaded
programs at build time we wouldn't gain much from disallowing waits from
state machines anyway.
this mainly removes the need for explicit indexing to get the pac
object. runtime effect is zero, but arguably things are a bit easier to
read with less indexing.
this is already done during platform init. it wasn't even sound in the
original implementation because futures would meddle with the nvic in
critical sections, while another (interrupt) executor could meddle with
the nvic without critical sections here. it is only accidentally sound
now and only if irq1 of both pios isn't used by user code. luckily the
worst we can expect to happen is interrupt priorities being set wrong,
but wrong is wrong is wrong.
since we never actually *disable* these interrupts for any length of
time we can simply enable them globally. we also initialize all pio
interrupt flags to not cause system interrupts since state machine
irqa are not necessarily meant to cause a system interrupt when set. the
fifo interrupts are sticky and can likewise only be cleared inside the
handler by disabling them.
dma does this too, also with 12 bits to check. this decreases code size
significantly (increasing speed when the cache is cold), frees up an
interrupt handler, and avoids read-modify-write cycles (which makes each
processed flag cheaper). due to more iterations per handler invocation
the actual runtime of the handler body remains roughly the
same (slightly faster at O2, slightly slower at Oz).
notably wakers are now kept in one large array indexed by the irq
register bit number instead of three different arrays, this allows for
machine code-level optimizations of waker lookups.
1403: Bump versions preparing for -macros and -executor release r=lulf a=lulf
I'd like to propose a new release of embassy-macros and embassy-executor, as there is a challenge with some of the features changing since 0.1.1 when using libraries that depend on 0.1.1 with applications that patch to use git versions.
Co-authored-by: Ulf Lilleengen <lulf@redhat.com>