CHERIoT Programmers' Guide

David Chisnall

Table of contents

12. Porting from FreeRTOS

FreeRTOS is an established real-time operating system with a large deployed base. It runs on tiny microcontrollers up to large systems with MMU-based isolation. The CHERIoT platform aims to provide, on small microcontrollers, stronger security guarantees than FreeRTOS is able to provide on large systems.

This chapter describes how several concepts in FreeRTOS map to equivalents in CHERIoT RTOS.

The FreeRTOS-Compat directory in include contains a set of headers (including FreeRTOS.h) that expose FreeRTOS-compatible wrappers around various CHERIoT RTOS services. These allow you to port existing FreeRTOS code to CHERIoT RTOS with minimal changes. These are not complete, but are expected to evolve over time.

12.1. Contrasting design philosophies

FreeRTOS is primarily designed around a model with a single trust domain. The initial targets did not provide any memory protection. You, the author of an embedded system, were assumed to have control over all components that you're integrating. Later, MPU support was added, building on top of the task model. When using an MPU, some tasks can be marked as unprivileged. These have access to their own stack and up to three memory regions, which must be configured explicitly.

Even when an MPU exists, the trust model is limited to hierarchical trust. The system integrator may mark certain tasks as unprivileged, but individual tasks cannot define more complex trust relationships. Memory safety is limited to the granularity of an MPU region. For example, the scheduler can expose message queues as privileged functions, which protects the queue's internal state from being tampered with by untrusted tasks, but may still overwrite the bounds of an object in an untrusted tasks if passed a pointer to an object that is not large enough to store a complete message.

As a fundamental design principle, FreeRTOS aims to run on many different platforms and provide portable abstractions. This limits the security abstractions that are possible to implement.

In contrast, the CHERIoT platform was created as a whole-system hardware-software co-design project. The hardware is required to provide properties that the software stack can use to build security policies. The core design of CHERIoT is motivated by a world in which a developer of an embedded system may not have full control over components provided by third parties, yet must integrate them. It is intended to provide auditing support that allows the integrator to make security claims even when integrating binary-only components.

This difference manifests most obviously in the fact that FreeRTOS provides imperative APIs for a number of things that CHERIoT RTOS prefers to create via declarative descriptions. Auditing a declarative description is easier than auditing arbitrary Turing-complete imperative code calling privileged APIs.

FreeRTOS starts from a position of sharing by default and has added MPU support to provide isolation. CHERIoT RTOS starts from a default position of isolation and provides object-granularity sharing.

The design of FreeRTOS was designed to support adding features to systems that did not originally use any kind of OS. This is apparent, for example, in how the programmer interacts with the scheduler. The scheduler is just another service that the system integrator may choose to use. User code chooses when the scheduler starts and may choose to stop it for arbitrary periods.

In contrast, CHERIoT RTOS provides a model mode familiar to users of desktop or server systems. The core parts of the RTOS are always available and provide strong isolation guarantees.

12.2. Replacing tasks with threads and compartments

The FreeRTOS task abstraction is similar to the traditional UNIX process abstraction. A task owns a thread and is independently scheduled. It is intended to be isolated from the rest of the system, though on systems without memory protection it has access to everything in the address space.

A task in FreeRTOS is roughly the equivalent of a combination of a thread and a compartment in CHERIoT RTOS. The compartment defines the code and global data associated with the task. The thread provides the stack and allows the task to be created.

CHERIoT RTOS threads have one key limitation in comparison to FreeRTOS tasks: They cannot be dynamically created. The security model requires a static guarantee that no memory moves between being stack memory (which is permitted to hold non-global capabilities) and non-stack (global or heap) memory. The trusted stack memory and save area memory should never be visible outside of the switcher. Without these static properties, the allocator would be in the TCB for thread and compartment isolation.

As such, there is no equivalent of the FreeRTOS xTaskCreate function. Threads (and their associated stacks and trusted stacks) must be described up front in the build system (see Section 5.1. Defining threads). In some cases, dynamically created threads can be replaced with thread pools, in the same way that coroutines can.

The compatibility later exposes xTaskCreate and xTaskCreateStatic as macros that generate a warning and evaluate to an invalid thread handle. This is intended to ease porting of code that conditionally uses these APIs.

The best way to replace dynamic thread creation is usually to create the threads declaratively in the build system. If they need to be started only after a certain event, then you can wait on a futex (see Section 5.5. Building locks with futexes) and notify that futex at the point where the original code called `xTaskCreate`.

12.3. Using thread pools to replace coroutines

The CHERIoT RTOS thread pool (see lib/thread_pool) allows a small number of threads to be reused. This provides a compartment that has two entry points. One is a thread entry point that sits and waits for messages from other threads, the other is exposed for calls by other compartments and sends a message to one of the threads in the pool.

This is most commonly used with C++ lambdas via the async wrapper in thread_pool.h:

async([]() {
	// This runs in the caller's compartment but in another thread.
})

This can be used for cooperatively-scheduled work in a similar manner to stackless coroutines. Each task dispatched to a thread pool will run until completion on one of the threads allocated to the thread pool. When it returns, the thread-pool thread will block until another task is available in the queue.

Some of the use cases for dynamic FreeRTOS task creation can be implemented the same way. On memory-constrained systems, dynamic thread creation can easily exhaust memory for stacks and so most systems that depend on dynamic thread creation do so at different phases of computation to allow the stack space to be reused. Pushing these as thread-pool tasks provides similar behaviour, with each task taking ownership of the (safely zeroed) stack after the previous one has finished.

The RTOS-provided thread pool is very simple. You may wish to implement something similar using it as an example, rather than using it as an off-the-shelf component.

12.4. Porting code that uses message buffers

The CHERIoT RTOS message queue APIs (see Section 5.9. Sending messages) are modelled after the FreeRTOS message queue. In most cases, there is a direct mapping between the FreeRTOS APIs and the CHERIoT RTOS ones, as shown in Table 4. CHERIoT equivalents of FreeRTOS queue operations

FreeRTOS API CHERIoT RTOS API
xQueueCreate queue_create
vQueueDelete free
xQueueReceive queue_receive
xQueueSendToBack queue_send
uxQueueMessagesWaitingqueue_items_remaining
Table 4. CHERIoT equivalents of FreeRTOS queue operations

The FreeRTOS-Compat/queue.h header provides wrappers that respect this mapping. The CHERIoT RTOS APIs provide some additional functionality that is not present in FreeRTOS and so code that does not need to be maintained working in both environments may benefit from being moved to the native APIs.

This mapping uses the queue library, which is intended for communication between threads in the same compartment. FreeRTOS code typically assumes a single trust domain and so this is usually what you want when porting. In some cases, you will split multiple FreeRTOS components into separate compartments. In this case, you will most likely want to use the queue compartment (see Section 5.9. Sending messages), which isolates the queue state from callers.

For C++ code, the ring buffer in ring_buffer.hh may be more interesting. This provides a generic ring buffer that can be specialised with different locks on the producer and consumer end.

12.5. Porting code that uses event groups

As with message queues, the CHERIoT RTOS event queue API was modelled on that of FreeRTOS. As such, there is direct correspondence between the FreeRTOS APIs and the equivalent CHERIoT RTOS versions, shown in Table 5. CHERIoT equivalents of FreeRTOS event group operations.

FreeRTOS API CHERIoT RTOS API
xEventGroupCreate eventgroup_create
vEventGroupDelete eventgroup_destroy
xEventGroupWaitBits eventgroup_wait
xEventGroupClearBits eventgroup_clear
xEventGroupSetBits eventgroup_set
Table 5. CHERIoT equivalents of FreeRTOS event group operations

The FreeRTOS-Compat/event_groups.h header performs this translation.

The FreeRTOS event queue structure provides a rich set of operations. In contrast, CHERIoT RTOS aims to provide a small set of core abstractions that can be assembled into complex systems. A lot of users of the event groups API could use simpler wrappers around a futex, rather than an event group.

12.6. Adopting CHERIoT RTOS locks

CHERIoT RTOS provides futexes as the building block for most locks. This can be used to build counting semaphores, ticket locks, mutexes, priority-inheriting mutexes, and so on. Several of these are implemented in the locks library and exposed via locks.h (and locks.hh for C++ wrappers).

The FreeRTOS-Compat/semphr.h exposes FreeRTOS-compatible wrappers for counting semaphores. In FreeRTOS, these are implemented as message queues with zero-sized messages. In CHERIoT RTOS, they are simply futexes that store a count. This means semaphore get and put operations are usually simple atomic operations. The scheduler is not involved unless a thread needs to block (the semaphore count is zero and a thread tries to do a semaphore-get operation) or needs to wake waiters (the semaphore value is increased from zero and there were waiting threads).

Unlike FreeRTOS, CHERIoT RTOS exposes different types for different locking primitives if they are incompatible. This catches some API misuse errors at compile time. For example, FreeRTOS uses SemaphoreHandle_t to represent semaphores and recursive mutexes. These must be created with different functions and then locked and unlocked with different functions, but creating something as a semaphore and then trying to lock it as a recursive mutex will compile. In contrast, CHERIoT RTOS exposes these as distinct types and will fail to compile if you try to pass a semaphore to, for example, recursivemutex_trylock.

The FreeRTOS-Compat/semphr.h header provides wrappers that for the various types. These expose the FreeRTOS APIs and wrap all of the relevant CHERIoT RTOS types in a union with a discriminator. This adds a small amount of overhead for dynamic dispatch and so code that uses only one type of semaphore can avoid this. Each of the underlying types can be exposed by defining one of the following macros before including FreeRTOS-Compat/semphr.h (directly, or indirectly via FreeRTOS.h):

CHERIOT_FREERTOS_SEMAPHORE
Expose counting and binary semaphores.
CHERIOT_FREERTOS_MUTEX
Expose non-recursive (priority-inheriting) mutexes.
CHERIOT_FREERTOS_RECURSIVE_MUTEX
Expose recursive mutexes.

Enabling only the subset that you use (which can be done on a per-file basis) will reduce code size and improve performance.

12.7. Building software timers

FreeRTOS provides a timer callback API. This is implemented on top of existing functionality in the FreeRTOS kernel. CHERIoT RTOS does not yet provide such an API, but building one is fairly simple.

The structure of such a service is similar to that of the thread pool in lib/thread_pool, except that each callback has an associated timer. These should be added to a data structure that keeps them sorted. The thread that runs the callbacks should wait on a message queue, with the timeout set to the shortest time timer. If this wakes with timeout, it should invoke the first (__cheri_callback, see Section 3.3. Passing callbacks to other compartments.) callback function in its queue. If it wakes receiving a message, it should add the new callback into the set that it has ready.

There is no generic version of this in CHERIoT RTOS because it is impossible to implement securely in the general case for a system with mutual distrust. Callbacks may run for an unbounded amount of time (preventing others from firing) or untrusted code may allocate unbounded numbers of timers and exhaust memory. As such, it is generally better to build a bespoke mechanism for the specific requirements of a given workload.

12.8. Timing out blocking operations

FreeRTOS uses the combination of vTaskSetTimeOutState and xTaskCheckForTimeOut to implement timeouts. These are implemented in the FreeRTOS compatibility layer. In CHERIoT RTOS, these are subsumed in the Timeout structure, which contains both the elapsed and remaining number of ticks for a timeout.

The CHERIoT RTOS design is intended to be trivially composed. Most operations simply forward the timeout structure to a blocking operation in the scheduler (a sleep of a futex wait). They can query whether the timeout has expired without needing to query the scheduler, simply by checking whether the remaining field of the structure is zero.

12.9. Dynamically allocating memory

FreeRTOS provides a number of different heap implementations, not all of which are thread safe. In contrast, CHERIoT RTOS design assumes a safe, secure, shared heap. Various uses of statically pre-allocated memory in a FreeRTOS system can move to using the heap allocation mechanisms in CHERIoT RTOS, reducing total memory consumption.

FreeRTOS prior to 9.0 allocated kernel objects from a private heap. Later versions allow the user to provide memory. The latter approach has the benefit of accounting these objects to the caller, but the disadvantage of breaking encapsulation.

CHERIoT RTOS has an approach (described in Chapter 6. Memory management in CHERIoT RTOS) that combines the advantages of both. Rather than providing memory for creating objects such as message queues, multiwaiters, semaphores, and so on, the caller provides an _allocation capability_. This is a token that permits the callee to allocate memory on behalf of the callee. The scheduler is not able to allocate memory on its own behalf, it can allocate memory only when explicitly passed an allocation capability. It then uses the sealing mechanism to ensure that the caller cannot break encapsulation for scheduler-owned objects.

12.10. Disabling interrupts

FreeRTOS code often uses critical sections to disable interrupts. This may require some source-code modifications. Critical sections in FreeRTOS are used for two things:

Disabling interrupts is the simplest way of guaranteeing both on a single-core system. FreeRTOS provides two APIs for critical sections: taskENTER_CRITICAL and taskEXIT_CRITICAL, which disable interrupts, and vTaskSuspendAll and xTaskResumeAll, which disable the scheduler. CHERIoT RTOS is designed to provide availability guarantees across mutually distrusting components and so does not permit either unbounded disabling of interrupts or turning the scheduler off. If mutual exclusion is the only requirement then you can implement these function as acquiring and releasing a lock that is private to your component. This is how that are implemented in the compatibility layer. They use distinct locks and these must be defined in your compartment, as shown below:

struct RecursiveMutexState __CriticalSectionFlagLock;
struct RecursiveMutexState __SuspendFlagLock;

A futex-based lock is very cheap to acquire in the uncontended case, it requires a single atomic compare-and-swap instruction (this may be a function call to a library routine that runs with interrupts disabled if the hardware does not support atomics). If possible, this approach is preferred for two reasons. First, it ensures that your component's critical sections do not impede progress of higher-priority threads. Second, it removes a burden on auditing.

The second use case, atomicity with respect to the rest of the system, requires disabling interrupts. The CHERIoT platform requires a structured-programming model for disabling interrupts. Interrupt control can be done only at a function granularity. Hopefully, the code that runs with interrupts disabled is already a lexically scoped block. In C++, you can simply wrap this in a lambda and pass it to CHERI::with_interrupts_disabled. In C, you will need to factor it into a separate function.

For auditing, you may prefer to move the code that runs with interrupts disabled into a separate library. This lets you separately audit the precise code that is allowed to run with interrupts disabled, but modify the rest of your component without constraints.

12.11. Strengthening compartment boundaries for FreeRTOS components

Microsoft did an internal port of the FreeRTOS network stack and MQTT library. This was not part of the open-source release, but involved very little code change. Most of the porting effort was done via a FreeRTOS compatibility header, which provided wrappers around the CHERIoT RTOS inter-thread communication APIs to make them look like the FreeRTOS equivalents.

FreeRTOS assumes, by default, that all code and globals are shared unless explicitly protected by an MPU region. When porting FreeRTOS components, this assumption is broken unless they are in the same compartment. This is not normally a problem for an initial port, because components are cleanly encapsulated and do not directly modify the state of other components.

This property does not hold on all RTOS implementations. For example, several ThreadX components directly manipulate the internal state of the scheduler, rather than acting via well-defined APIs.

Using compartments give some defence in depth against accidental errors, but may not provide strong security guarantees. For example, the FreeRTOS TCP/IP stack provides a FreeRTOS_socket call that returns a pointer to a heap-allocated socket structure that encapsulates connection state. Simply compiling this in a CHERIoT compartment has a few limitations.

First, the structure is allocated out of the network stack's quota. This means that a caller can perform a denial of service by opening a load of connections. Fixing this requires an API change to pass an allocation capability (and possibly a timeout) into the network-stack compartment so that it can allocate this space on behalf of the caller.

Second, the structure is unprotected. The caller can load and store via the returned pointer and so can corrupt connection state. This may allow it to leak state of connections owned by other components or cause arbitrary failures.

Finally, there is no notion of access control. That might be fine: if you're allowing only one compartment to talk to the network stack then you don't need any kind of authorisation. For more complex uses, you may want to allow one component to talk to a command-and-control server and another component to talk to an update server. Neither of these components should be able to connect anywhere else and so you probably want to use the software capability model to define a static authorisation to make DNS lookups of a specific domain and then have that return a dynamic authorisation that allows connection to that host (or place both the lookup and connection behind a single interface).