CHERIoT Platform

Why did you write a new RTOS for CHERIoT?

2024-10-24T00:00:00+00:00

I’m often asked why we decided to write a new RTOS for CHERIoT instead of using something that already existed, such as ThreadX, FreeRTOS, or Zephyr. The short answer is that CHERIoT is a hardware-software co-design project and retrofitting ground-up co-design is hard. This post is for people who want the long answer.

Microsoft bought ThreadX (and renamed their RTOS to Azure RTOS) shortly before we were starting the CHERIoT project and so it was natural that, as a Microsoft Research project, we’d aim to use it as a base. Unfortunately, as we looked at ThreadX we found that there was a lot of tight coupling that made compartmentalisation difficult. If we did it, the result would look so unlike ThreadX that we wouldn’t get much benefit. It would not be compatible with existing ThreadX uses.

We also considered some other potential options but found similar issues.

Security is not an add-on

We believe that building a secure system means enabling and respecting two key principles:

The principle of least privilege means that no component should run with privileges that it doesn’t need.
The principle of intentional use means that no component should accidentally exercise privileges that it needs for some purpose other than its current operation.

CHERI is designed around these principles and CHERIoT inherits them. The principle of least privilege is embodied in CHERI by allowing you to hand a pointer that grants access to a single byte of memory, perhaps with only read permission, if that’s all that a function requires. The principle of intentional use is embodied by CHERI because you must use the correct capability as the operand to load or store instructions. Just because you hold pointers to two objects doesn’t mean that an out-of-bounds access to one will let you access the other. In contrast, memory management unit (MMU) or memory protection unit (MPU) designs may enable least privilege (though not in a way that’s convenient for programmers: more on this later) but they can’t enforce the principle of intentional use. If I have access to two memory regions protected by an MPU and I do some invalid pointer arithmetic from one, I can access the other.

CHERIoT extends these principles into the software space. I’ll talk later about how we build a software capability model to authorise operations at higher levels of abstraction than ‘can I read or write this object’ but the principle of least privilege permeates the design.

As with microkernel designs, we do not have a single large privileged entity. Unlike most microkernel designs, we don’t even have a single small entity in the system that holds the rights do absolutely anything (at least, not after the bootloader has run, and the bootloader is deterministic and does not process any data except that in the firmware image).

The most privileged entity in the system is the switcher. This is a few hundred instructions and we’re starting to work with some folks interested in formally verifying it (let us know if you want to join in this activity!). The main privilege that this holds is the ability to access the register that contains a capability to the register-save area for context switching threads and the trusted stack used to enforce call-return discipline on cross-compartment calls. This is quite a rich set of permissions because it would allow the switcher to violate the compartment- and thread-isolation guarantees that the rest of the system depends on. It isn’t omnipotent though. It can’t access all memory indiscriminately. It can’t modify its own code. It can’t read or write anything that the caller or callee couldn’t access in a cross-compartment call.

The core of the RTOS also contains a scheduler and an allocator. The scheduler has direct access to devices like the timer and interrupt controller. It chooses the next thread to run but it cannot see the state of any interrupted thread. When you call into the scheduler for a blocking operation, it’s simply another compartment and it has access only to the things that you pass it.

The allocator is somewhat more trusted. It holds a capability to the entire heap (though not to any other compartment’s memory). It subdivides this and manages the hardware features required to prevent use-after-free. As such, it is in the TCB for heap memory safety (both spatial and temporal). If; however, you are using only stack and global objects, the allocator is entirely outside of your TCB.

Other parts of the RTOS are optional libraries or compartments using the same model. This is hard to retrofit to an existing design. Building in least-privilege principles is easy if you do it from the start (and have the underlying hardware mechanisms to cleanly express it) but it’s very difficult to disentangle later.

Building the compartment model

CHERIoT is designed with a compartmentalisation model in mind. The hardware and software work together to enforce this model. The model is intended to be easy to use from programming languages.

In a traditional operating system, RTOS or otherwise, if you want to communicate between trust domains you typically use something that looks like inter-process communication (IPC). There’s a kernel that mediates communication and lets you do things like sending messages or streams of data. In the CHERIoT compartmentalisation model, you simply call functions.

If you want to share data between two security contexts then, in a traditional operating system, you create a shared memory region and configure the MMU or MPU to expose it to both security contexts. In CHERIoT’s compartmentalisation model, you pass a pointer as a function argument to a cross-compartment call (you can now also statically share objects).

Trying to introduce these concepts to an existing RTOS involves significant redesign.

For example, FreeRTOS implements message queues in the kernel (as do most operating systems for larger systems). In CHERIoT RTOS, we provide a similar abstraction but in several layers. The CHERIoT RTOS scheduler is in the TCB for availability (by necessity: a scheduler is the thing that chooses the next thread to run and can violate availability by not running a thread) but not for confidentiality or integrity. It provides a single synchronisation primitive: a futex. A futex is a conceptually simple mechanism that allows a thread to block if a memory location contains an expected value and another thread to wake one or more waiters. You can use a futex to build a variety of lock types and you can also use it for producer and consumer counters in a ring buffer so that producers can sleep if the queue is full and consumers can sleep if it is empty.

CHERIoT RTOS implements a message queue in a shared library. A CHERIoT RTOS shared library has code and read-only globals but no read-write globals and does not require crossing a security boundary to invoke. Calls to shared libraries are about as fast as calls to functions in the current compartment but the same code can be shared by many compartments. The message-queue library avoids any cross-compartment calls on the fast path and calls the scheduler only when a producer or consumer needs to sleep (when the queue is full or empty, respectively).

Often, CHERIoT RTOS systems will use message queues for communicating between threads in the same compartment and so this is sufficient. Alternatively, you may want to use them in a way that’s closer to traditional inter-process communication, where the sender and receiver are in different compartments and a shared library with both ends having complete access to the queue would be a problem. For this use case, we provide a compartment that owns the message queue and provides sealed capabilities as opaque handles for the endpoints.

CHERI doesn’t look like an MPU

A lot of embedded operating systems were not designed with any kind of protection in mind and had some MPU support retrofitted. Others were designed to take advantage of an MPU. Neither ends up with abstractions that map well to CHERI.

For example, FreeRTOS has the notion of a restricted task, which is similar to a process in a conventional operating system. A restricted task has access to a limited set of MPU regions, rather than the entire address space. This model lets you run some code with reduced privileges, but does not help with the principle of intentional use and so does not align well with a CHERI model.

In an MPU-based approach, you may wish a component to have access to (the memory ranges containing) objects A and B, so you would configure MPU regions that grant access to A and B. If there is a bounds error in addressing into A then the component may accidentally overwrite B. In contrast, in a CHERI system, you would pass bounded pointers (CHERI capabilities) to A and B. Any operation that attempts to access A will never accidentally access B. This intentionality (you can access an object only when you intend to access that object) is core to CHERI.

The CHERIoT security model was designed to support language-friendly abstractions. You don’t share objects in CHERIoT RTOS by creating shared regions of memory and populating them, you share objects by passing pointers as arguments to functions. This comes with a rich hardware-enforced permission model that lets you express properties such as shallow or deep immutability or no-capture guarantees on pointers.

For example, the TLS compartment in the network stack maintains a per-connection ring buffer that it uses for incoming and outgoing messages. When it wishes to receive some data, it creates a bounded pointer to a region in that buffer that has only store permission. The TLS compartment then passes this to the TCP/IP compartment’s receive function. When this call returns, the TLS compartment knows that this buffer is safe from time-of-check-to-time-of-use problems because the TCP/IP compartment cannot have captured it because the pointer does not have global permission. It knows that no data outside of that buffer can have been written, because the pointer is bounded. It knows that data left over in that buffer cannot have been read (because the pointer does not have read permission) and so it is safe from information disclosure. All of this is visible in the source code and is enforced with CHERI.

It might be possible to get the same guarantees from an MPU by invoking a kernel to set up an MPU region with store-only permission covering an object owned by the TLS stack, but this would look very different. Trying to provide the same programmer model for both CHERI and MPU-based systems (or even those without an MPU) would require serious compromises.

MMUs and MPUs were not designed to provide abstractions for normal programmers. MMUs were created to provide operating-system abstractions such as swapping and the ability to isolate processes or virtual machines from each other. MPUs were designed to allow an RTOS kernel to protect itself from untrusted tasks. They were both designed as tools for use by operating systems.

CHERI, in contrast, was designed to provide a protection model that could be directly exposed into C and higher-level programming languages. It is designed to protect things that are visible in a language abstract machine as objects, not memory ranges. It protects these objects by putting the permissions (and bounds and integrity guarantees) in things that are exposed to programming languages as pointers, not as entries in an indirection table.

CHERI enables new abstractions

There are a lot of abstractions in CHERIoT RTOS that are possible only because of CHERI. Most notably, we make a lot of use of the sealing mechanism from CHERI. Sealing is an operation that takes a CHERI capability representing something that a programmer would think of as a pointer and another that conveys the authority to seal an object as a specific type and produces a sealed capability. Once you have created a sealed capability, the only operation that you can do with it is unseal it with a permit-unseal capability that matches the type used for the sealing operation.

If you didn’t understand that, don’t worry, the guarantees are more important than the mechanism. From a C programmer’s perspective, this means that you can create opaque pointers that are tamper-proof and type safe. Your compartment can expose an API that returns a sealed pointer for use as a handle. A caller can store this just as they would any other opaque pointer but cannot dereference it. When they pass it back (possibly indirectly, via other compartments), your compartment can unseal it and be certain that it points to a value of the correct type.

These objects can be created statically or dynamically and let you build a rich capability system at the software level. The first place that you’ll encounter this in CHERIoT RTOS is in the heap allocator. Our malloc function is a compatibility wrapper around heap_allocate, which requires an allocator capability to authorise allocation. Each allocator capability has an associated quota, so you can restrict the amount of memory that a compartment may allocate. These quotas show up in the auditing report when you link a firmware image and so you can statically see how much memory each compartment may allocate.

Some compartments allocate memory but have no quota. This is possible because delegation is intrinsic to capability systems. For example, when you want to create a TLS connection, you have to pass two capabilities as arguments to the function that the TLS compartment exposes for this purpose. One authorises the TLS compartment to allocate memory on the caller’s behalf for all of the connection state. The other, a connection capability, authorises TLS compartment to create a network connection to the specific host on behalf of the caller.

This kind of model is very hard to build without CHERI. POSIX operating systems use file descriptors for handles, where file descriptors are indexes into a table maintained by the kernel. This model works well when you have a single monolithic kernel that can hold the file descriptors, it works far less well if you want to provide handles to different objects provided by mutually distrusting components that can be passed (for permanent or temporary delegation) between components.

Ubiquitous shared heap

CHERIoT was designed to provide both spatial and temporal safety, both enforced efficiently in the hardware. As such, we can rely on a shared heap, even in situations where you need to provide mutual distrust.

Building an operating system assuming that you can safely share heap allocations between mutually distrusting components leads to some very different design choices. There’s often a lot of API complexity from having to first query how much space something needs, then allocate the space, and then pass it and the length back to another API. This is particularly hard when the size of the available data can change between the calls. In a CHERIoT system, this kind of API is unnecessary. The caller provides a capability that allows allocation and then the callee returns a pointer (owned by the caller) that is the correct size.

There are a lot of places where usability and efficiency are improved by being able to rely on a (spatially and temporally safe) shared heap but this is not something that can be provided without CHERIoT (or similar CHERI extensions).

Auditing the compartment graph

The CHERIoT ABI was designed to enable link-time auditing. A compartment must be explicitly provided with capabilities to do anything outside of its own code and global space. The structures for creating these capabilities are present in the firmware image and are visible to the linker, which generates a report containing:

The hashes of the object code that went into the compartment (for integration with SBOMs and a secure compilation flow).
The functions exported from each compartment and library, and whether they run with interrupts enabled or disabled.
The functions in other compartments or libraries that a compartment or library calls.
The MMIO devices that a compartment can access.
The software-defined capabilities that a compartment holds, their type, and their contents.
The stack and trusted stack sizes and entry points for all threads.

These reports are intended to be consumed by the cheriot-audit tool, which can drive CI checks, code-signing decisions, and so on.

Supporting this required carefully designing the ABI along with the compartment model. It required building support into the linker for report generation and having a build process that created the right-shaped inputs. We would have lost these properties if we’d tried to fit into an existing linkage model.

An RTOS is small

Finally, being small is one of the key selling points for an RTOS. ThreadX advertises a code size of 2-15 KiB for most deployments.

If we had decided to rewrite something like Linux or Windows and aim for feature parity, that would have been a multi-billion-dollar effort and would have required a lot of justification of the potential benefits. Rewriting an RTOS, on the other hand, is far more feasible. The core parts of CHERIoT RTOS add up to around 7 KLoC. Including headers and libraries, it’s only around 20 KLoC. This made the cost-reward calculation very different: the cost was small and the rewards of a co-designed hardware-software stack were large.

It is far more important that we can easily reuse existing code that was written either for bare metal or other embedded operating systems. For example, our FreeRTOS compatibility layer makes it possible to run the FreeRTOS TCP/IP stack, MQTT library, SNTP library, and so on. Out prototype network stack is around 5 KLoC for the new code, but it includes well over a hundred thousand lines of off-the-shelf third-party code.

By providing a core RTOS that has a rich compartmentalisation model, we make it easy to take existing components and wrap them in mutually-distrusting least-privilege compartments.

So, Why a New RTOS?

CHERIoT RTOS is co-designed with its underlying architecture and its C/C++ toolchain to efficiently provide programmers with affordances that are difficult, expensive, or even impossible to achieve in embedded computing platforms that run software stacks that had to work around the limitations of existing hardware. Try it yourself by running CHERIoT-RTOS in a GitHub Codespace!

Improved error handling in CHERIoT RTOS

2024-09-20T00:00:00+00:00

CHERI platforms in general, and CHERIoT in particular, can turn a lot of bugs that would be silent data corruption into recoverable errors. The ‘recoverable’ part comes from the fact that any error is caught before an invalid operation succeeds.

In CheriBSD and the proposed CHERI extensions to POSIX, CHERI faults are delivered as signals. In CHERIoT RTOS, we have a similar mechanism. Each compartment can define an error handler and, when a CHERI fault (or other recoverable fault such as an illegal instruction or alignment trap) occurs, this is invoked with a copy of the register file at the point that the trap occurred. This can be used to perform low-level error recovery operations, such as skipping the faulting instruction or unwinding to the previous compartment invocation.

Custom error handlers are quite difficult to write. In addition, the design made it impossible to run them when the fault was caused by stack exhaustion. Although we have tools to help avoid stack overflows, these still happen (particularly when incorporating third-party code) and it’s nice to have a general mechanism for handling them.

With a few recent PRs, we’ve built a much more developer-friendly mechanism for handling errors.

Unwinding the stack

Exceptions are a common mechanism for reporting errors. Exceptions are a form of non-local return: they transfer control flow to a function higher up the stack to report the error.

On most modern platforms, exceptions are implemented using a table-driven unwinder. The exact details differ between 64-bit Windows and *NIX systems, but the underlying mechanisms are very similar. For each function, the compiler emits a table that describes how to unwind the stack. A generic unwinder library can read this information and pop each frame off the call stack, one at a time. Language-specific exceptions are built on top of this. In the unwind data, each function can define a personality function that uses some language-specific data to determine whether the current unwind should run cleanups or should stop for a catch.

All of this adds up to a lot of code. The unwind library and C++ exception support are typically on the order of 2-300 KiB of code. On top of this, the unwind metadata can add 10-20% to the size of a program. This means that, for a lot of embedded systems, the generic unwinder plus the unwind metadata would consume all of memory, leaving no space for anything else.

For CHERIoT, we went back to older mechanisms. Two systems implemented exceptions in a similar way: 16-bit (and 32-bit) Windows, and OpenStep. These both used a model built on setjmp and longjmp.

These functions are defined in the C standard and provide a very low-level exception-like model. When you first call setjmp, it stores some current register state in the jmp_buf that you pass as an argument and return zero. If you then pass the same jmp_buf to longjmp, it will jump back to the place where setjmp returned.

Windows’ Structured Exception Handling (SEH) and OpenStep exceptions built a linked list of exception structures that contained a jmp_buf. You could jump to the top exception handler simply by calling a function that popped the top entry from this stack and called longjmp on it.

On Windows, this was supported by __try and __except keywords. OpenStep implemented the equivalent functionality entirely with macros: NS_DURING, NS_HANDLER, and NS_ENDHANDLER.

These went out of favour on larger systems for two reasons.

First, they were not ‘zero cost’. Typically, exceptions happen in exceptional circumstances. The setjmp-based model meant that there was overhead entering a try block, but much less overhead when throwing an exception. Table-based models make entering a try block (almost) free, but make throwing an exception more expensive.

Second, as register files grew, the amount of state that needed to be stored with setjmp increased, so the non-zero cost became an increasingly large cost.

CHERIoT is based on RV32E, which defines only 15 registers. Of these, most are temporary and so our jmp_buf needs to store only four: The two callee-save registers, the stack pointer, and the return address. This means that it takes only 32 bytes of space and requires four instructions to write to. Our setjmp implementation is only six instructions as a result (as is longjmp).

Providing a thread-local unwind handler

The unwinding mechanism described above depends on two things: setjmp and somewhere to stash a linked list of jmp_bufs. The first bit is easy, but the second bit is more complex.

CHERIoT uses the local / global mechanism from CHERI to enforce strong thread isolation. Pointers derived from the stack pointer and return addresses may be stored only on the stack. This means that at least two of the four registers that must be stored in a jmp_buf by setjmp can be stored only on the stack.

The traditional approach of keeping the head of the linked list in a global is therefore impossible. The slightly more modern approach of using thread-local storage (TLS) would also not work because we do not have thread-local storage.

The obvious solution is therefore to add TLS, but what would that mean? Consider a single thread that starts in compartment A, then calls into compartment B, which calls back into A. If we have conventional TLS, A could stash a pointer into the part of the stack that it passes to B in TLS and then in the second invocation it would be able to violate compartment isolation (reading and writing all of the on-stack state for B). If we built a linked-list of jmp_bufs there, then it would be possible for the second invocation of A to jump back to the first, bypassing B and violating the trusted-stack guarantees. This would be bad.

What we want is not thread-local storage but compartment-invocation-local storage. Each time a thread enters a new compartment, you should get some storage that is not local to the current function but can be accessed from any nested call.

To implement this, we looked at the earliest way that operating systems have implemented thread-local storage: reserve some space at the top of the stack.

When the switcher transitions between compartments, it truncates the stack so that the callee doesn’t have access to the caller’s stack. Now, on entry into a compartment, the switcher will move the stack pointer 16 bytes down before transferring control into the callee. Similarly, the loader will reserve 16 bytes at the top of the stack before starting a thread.

This means that you have two pointers worth of space that are easy to find (set the address of the stack pointer to its top, then set the address to eight or 16 bytes below that). CHERIoT has a convenient cgettop instruction and so this sequence is very short. First, cgettop gives the top address, then csetaddress gives a new capability derived from the stack pointer that points to the address. After that, a -8 immediate offset to a load or store capability instruction can access the space, so we need only three instructions to load the head of the list.

With this, we can store the head of the linked list of error handlers at the top of the stack for the current compartment invocation. When you want to jump to the nearest error handler, find the head of the list relative to the stack pointer, pop it, and pass it to longjmp. The cleanup_unwind function does all of this for you, so typically you won’t need to ever see that this is how it’s implemented.

Handling errors even in the presence of stack overflow

Everything so far is enough to build a set of nested error handlers, but what happens if the error is caused by running out of stack space? PR 301 adds the last bit of this: support for a new kind of error handler that doesn’t use a stack.

This is in addition to the existing error handlers. The stackless error handler (if it exists) will run instead, in any of the following conditions:

A compartment doesn’t provide the normal error handler.
There isn’t enough stack space available for the context.
The stack isn’t valid at all (the stack pointer can become untagged if a function prologue subtracts from it and moves it out of bounds).

This means that you can now run an error handler even if you overflow the stack. But what happens if you want to jump to the nearest error handler registered as described earlier and the stack pointer isn’t valid?

First, the CPU will trap because a csp-relative load fails because csp (the capability stack pointer register) is untagged. This transitions to the switcher. The switcher will then find that you need to run the stackless error handler (because csp is untagged). The switcher will then look at the trusted stack and rederive the csp value that your compartment had on entry.

The stackless error handler does have a stack, but it doesn’t have a stack frame. When it’s invoked, it can guarantee that csp is a capability that authorises access to the stack, but it can’t guarantee anything about the address of that capability (other than that it will be in bounds).

That’s absolutely fine for popping the top error handler and jumping to it though.

Putting it all together

Most users will never need to know anything about the above. They can simply use the macros or wrapper functions, and the error handler, that we provide. The error handler is added to a compartment by adding the following line under the compartment declaration in xmake.lua:

    add_deps("unwind_error_handler")

C programmers have to use the macro-based version, which looks like this:

CHERIOT_DURING
{
    // Do some things that may cause a crash
}
CHERIOT_HANDLER
{
    // Handle the failure
}
CHERIOT_END_HANDLER

Note that, because these are implemented with setjmp, the usual rules about setjmp apply. Specifically, anything that’s used in both blocks must be declared volatile.

In C++, you can use the on_error function, which takes two lambdas (or other callable objects), representing the try and error paths. If you’re using RAII for cleanup, you can often omit the second. For example:

    LockGuard g(flagLock);
    // No handler.  g's destructor runs after on_error returns.
    on_error([&]() { /* code that might fail here */ });

This will run the code in the lambda and then release the lock, even if you take a CHERI trap.

Hopefully this is much easier than writing a custom error handler that detected held locks and released them.

This isn’t a full exception model. In particular, there’s currently no way in the handler of seeing the cause of the fault. In a future iteration, we’ll add something like ‘Herbceptions’: Herb Sutter’s proposal for C++ where exceptions must be preallocated objects or primitive values.

This mechanism is designed to be simple and lightweight, not to be as generic as something that you’d build on a complex system. You can protect something against faults with a single C++ wrapper function and 40 bytes of stack space. On a system where stacks are often under 1 KiB and where total memory is usually much less than 1 MiB, that’s feasible overhead.

CHERIoT at the Digital Security by Design All Hands meeting

2024-09-19T00:00:00+00:00

Several companies presented CHERIoT-related things at the Digital Security by Design all-hands meeting yesterday!

lowRISC, whose Sonata board was used by all of the demos, presented a demonstration of an automotive system where a bug in the volume control would overwrite the speed controller value (on a non-CHERI system). The source for this is in the Sonata software repo, as is the snake example that lowRISC also showed.

ConfiguredThings presented an extended version of the configuration management demonstration that they’ve previously contributed to the project. The updated version integrated the CHERIoT network stack to talk to their back-end secure configuration management system. The code for the original version of their demo is open and the network-connected version should appear in the same place soon.

This showed how a CHERIoT system can provide additional defence in depth. Each configuration block from the server was parsed in a separate compartment, so bugs in the JSON parsing are not exploitable. The worst that can happen is that an invalid configuration update is ignored. CrowdStrike provided a good demonstration of how bad this can be without CHERI.

Finally, we at SCI Semiconductor presented a demonstration of the network-stack restart work that we released over the summer. This ran on Sonata, but (as with the other demonstrators) will be trivial to port to our ICENI CHERIoT chips, which are expected early next year. This showed a simple multi-colour light that was connected to the Internet via MQTT. The CHERIoT network stack runs the FreeRTOS TCP/IP stack (‘FreeRTOS+TCP’) in a compartment. We introduced a memory-safety bug into this code, which forms a key part of the attack surface (it’s the thing that has to process packets that come from the Internet, where all of the bad people live). When this is triggered, we see a CHERI exception on Sonata’s CHERI fault LEDs and the network connection is dropped. The TCP/IP compartment is then restarted automatically and the application code resumes:

Video showing Hugh the Lightbulb, an Internet-connected multicolour light. The video shows an Android app controlling the CHERIoT code and demonstrates that a memory-safety bug in the TCP/IP stack does not crash the system, but is caught and the TCP/IP stack gracefully recovers.

The code for this demo is available. Note that there’s nothing in the application-specific part of the code related to the TCP/IP stack crashing. From the perspective of a consumer of the TCP/IP APIs, sockets just return a disconnection error. The normal reconnection paths then succeed once the TCP/IP stack has been restarted.

CHERI Myths: I don’t need CHERI if I have safe languages

2024-08-28T00:00:00+00:00

There’s a recurring myth that CHERI and safe languages are solving the same problems and that, if you have one, you don’t need the other. When I joined the CHERI project in 2012, my primary motivation was producing hardware that enabled safe interoperability between languages, so I never felt that safe languages and CHERI were in tension. Quite the reverse: My work on CHERI was motivated by a desire to enable safe language adoption and the combination of CHERI and safe languages is far more powerful than either in isolation.

CHERI and safe languages provide very different benefits. Each have their strengths and weaknesses and, in this post, I’ll try to explain how their strengths combine.

A single memory-safety bug can ruin your entire day

The WannaCry ransomware attack, which cost billions of dollars, was enabled by a single use-after-free bug. The CrowdStrike incident, which also cost billions of dollars, was the result of an uninitialised-use bug (compounded by a lot of operational failures).

Memory-safety bugs are particularly bad because they step outside of the language’s abstract machine. After a program passes a memory-safety bug, it may do anything. It may access objects that should not be reachable, modify immutable objects, and even execute instructions out of sequence to do things that are not part of the source-code description of what the program should do at all.

A single memory-safety bug can be all that an attacker needs to get an arbitrary-code execution exploit. Rewriting 99% of a program in a safe language but leaving just one memory-safety bug does not prevent at-scale deployment of malware.

This was one of the problems that CHERI aimed to solve, in two ways. At the fine granularity, CHERI offers language-agnostic memory safety primitives. Compilers can generate, or humans can write, assembler making use of these primitives to capture and then enforce memory safety of source-level objects. The switcher is the most privileged part of a CHERIoT system and is written in assembly (it has to do things like save all registers, which are not possible in any high-level language). In spite of its privilege, it remains bound by the memory-safety rules. It cannot access any compartments other than the caller and callee. It cannot access any memory other than that reachable from the registers when it is invoked and its internal state, unlike a traditional OS kernel that can access anything.

Enforcing memory-safety guarantees in the hardware makes it possible for safe-language code to call unsafe or, most excitingly, differently safe code and yet still enforce source-level properties. Consider a single program composed of code written in all of Rust, Java, and Haskell. Each is in the ‘safe languages’ category, but provides different guarantees. Rust code passing a borrowed reference to Java may wish to ensure that it isn’t captured, a property that CHERIoT can enforce directly (or CHERI can enforce with indirection). Haskell passing an object to Java or Rust may wish to guarantee deep immutability, a property that CHERIoT and all Morello or later CHERI platforms can enforce.

When a CHERI CPU raises a trap, it happens before any data has been leaked or corrupted as a result of the memory-safety violation. This means that it’s at a place where you can recover. For example, a JVM attempting to write to an immutable object borrowed from Haskell can raise an exception in Java. The Haskell component doesn’t need to know how Java handles dynamic failure, it just knows that nothing can modify its immutable objects.

At a coarser granularity, CHERI makes it easy to sandbox entire unsafe components. You can take a C library and put it in an isolated compartment. A lot of attempts have been made to build sandboxing technologies that are independent of memory safety but they have not been widely adopted because they don’t lend themselves to a simple programmer model. Programmers think about memory in terms of objects with pointers between them, not in terms of pages mapped in a linear address space. CHERI lets you share objects between compartments, without having to think about anything lower level than the language abstract machine. You can use CHERI to share a read-only view of an object graph, a write-only view of a buffer, and so on.

In the CHERIoT network stack, we take advantage of this between the TLS and TCP/IP compartments. The TLS stack maintains its own ring buffers for messages. When it calls into the TCP/IP stack to receive some data, it passes a bounded capability to a subset of the buffer, with write-only, no-capture permissions. By the end of the call, the TLS stack knows that the TCP/IP stack doesn’t have access to the buffer and can’t have leaked any stale data in there. The TLS compartment contains BearSSL, which is mostly written in a domain-specific language that compiles down to C for side-channel-resistant crypto routines, but the same guarantees are available to any language that can target the platform.

Rewriting large codebases is rarely the correct answer

There’s a pervasive narrative in some camps that rewriting everything in a safe language is the path to fixing security.

Rewriting comes with a large opportunity cost

Just looking at open source code, there are around ten billion lines of C/C++ code. It’s not clear how much more exists in proprietary projects. Optimistically, the cost of rewriting all of this is hundreds of billions to trillions of dollars.

That’s a lot, but most importantly, it’s investment that takes away from writing new code. If one team decides to spend a year rewriting a new project and another spends the year writing new code that adds new features, which do you think will be more successful in the market? Rewriting code with no user-visible benefits is not likely to drive sales.

Writing new code that solves new problems, of course, doesn’t come with opportunity cost concerns. If you can write new code in a safe language, especially a safe language that improves developer productivity, then that’s an obvious win.

Rewriting can introduce new bugs

Will a team that rewrites a legacy codebase or one that incrementally improves one have fewer bugs in their codebase? If you rewrite existing code in a safe language then you can guarantee, by construction, that the code does not have bugs in the category that the language enforces. The value of this can’t be understated. Knowing that code that compiles is free from a particular class of error has a huge value.

In spite of this improvement, there remains a big difference between ‘memory safe’ and ‘correct’. Even formal verification doesn’t guarantee correctness, only that the properties that are checked hold as long as the axioms hold. In general, new code is more likely to contain bugs than old code (which people have tested and fixed bugs in, in production, often over many years). We saw this with Microsoft’s confusingly named sudo for Windows: It was written in Rust, but the lack of memory safety bugs did not prevent it from being a poor design for security.

Again, writing new code in a safe language is likely to reduce the number of new bugs that you introduce relative to something like C and so is often a better choice.

Rewriting may not be possible

All of this is assuming that rewriting is possible. A lot of TCB code is intrinsically unsafe. A memory allocator, for example, defines the notion of a heap object and so must sit below the abstraction of any notion of memory safety. The same is true of any part of an OS kernel that has views of userspace memory. These can definitely be made safer, but not safe.

Conversely, some code is intrinsically safe. Some C codebases have never had a memory-safety issue because they simply don’t do any pointer arithmetic or dynamic allocation: they work with pre-allocated fixed-sized structures. Rewriting these in a memory-safe language is often trivial but yields few benefits.

Rewriting requires specialised skills

Consider something like the FFMPEG library’s libavcodec. This has had a lot of optimisations implemented over the years, which rely on intimate knowledge of the problem domain. There are almost certainly people who could rewrite it in Rust and get at least equivalent performance. Hiring an experienced Rust programmer is hard (though getting easier), hiring an expert on video and audio CODECs is very hard (and not getting easier), hiring someone with skills in both is incredibly hard.

The same applies to things like crypto libraries, where it’s very easy for an experienced programmer who is not a domain expert to introduce vulnerabilities. This may be in the form of timing side channels, or more subtle things such as choosing keys that are vulnerable to particular known attacks.

This doesn’t always apply. For more common tasks, the metaprogramming facilities in a higher-level language may make the rewrite significantly simpler to write and maintain than a C original.

CHERI does not fix bugs for you

CHERI doesn’t guarantee that your code is free from memory-safety errors, it guarantees that any memory-safety bugs will trap and not affect confidentiality or integrity of your program. With compartmentalisation, components can be isolated into separate failure domains, which improves reliability a lot. For example, a memory-safety bug in our TCP/IP compartment will trigger a reset of that compartment.

This is a much weaker guarantee than most safe languages provide. A lot of things that, in C/C++/assembly code will trap on CHERI, will cause compile failures with Rust or Ada. If they cause a compile failure, then you have to fix them before you ship your code and that guarantees that they don’t exist in production at all. This is obviously a benefit.

This benefit isn’t limited to memory safety. The more properties you can check at compile time, the lower your maintenance costs are likely to be. At some point, the cost of all of those checks becomes too large (in terms of writing them, modifying them as the codebase evolves, and compile time) but there are often a lot of simple properties that can be easily checked. Rust or modern C++ make it easy to check a lot of things at compile time and enforce properties of types that go well beyond memory safety.

CHERI protects you against bugs in your code doing damage, combining that with eliminating other categories of bugs is a win.

Your supply chain may not be memory safe

One of the big arguments in favour of Rust (and one that I have personally used) is that an unsafe keyword is a big flag in code review. This is a much better place to be for your own code than C, where anything that involves a pointer may be a memory safety bug unless you read all code that touches the pointer or other aliases of it. When you’re writing code with a set of non-malicious collaborators, this is a big benefit.

But what happens with the rest of the supply chain? The Rudra paper used some static analysis to find 264 memory-safety bugs in cargo crates, and issued 76 CVEs and 112 RustSec advisories as a result. These were difficult to spot unless you knew the exact incorrect idiom to look for, so a cursory glance at unsafe code in your dependencies would not have helped.

This is the best case. The bugs that Rudra found were accidental. The cve-rs project shows how to exploit a soundness issue in the Rust borrow checker to implement any kind of memory-safety bug entirely without unsafe. Even a type-safe language may have implementation bugs that make it possible to sneak in malicious code.

This is not just a problem for Rust. Go is a garbage-collected type-safe language, but it exposes a slice type that contains a base, length, and capacity. This is not atomically updated and so you can have two goroutines race storing to a shared variable that is a slice type and a third will eventually see the base of one and the length of the other, allowing out-of-bounds accesses, which can then be used to escape type safety. The Java VM is infamous for security vulnerabilities that allow programs to break type safety in various ways.

The more complex a type system is, whether it’s enforced statically or dynamically, the more likely that there will be a bug that allows an attacker to exploit it. When you’re pulling in a million-line dependency, auditing it to make sure that it doesn’t trigger such a vulnerability is impossible.

Java shipped with a lot of complex infrastructure for sandboxing components. The SecurityManager let you restrict the privileges of running code and elevate privileges later. This depended on the correctness of most of the rest of the JVM. In Android, Google removed it (technically it’s still there, you just got an exception if you installed a non-default policy) and made the process boundary the only security boundary. They did this because anything that depends on the correctness of the entire JVM for security is likely to be broken. The same rationale should apply to compartmentalisation models that rely on other languages’ type-safety guarantees.

CHERI does not have this complexity problem. The security properties of a CHERI ISA are simple and local. They have been formally verified at the ISA level on two CHERI variants and the CHERIoT Ibex core has had its implementation formally verified. This is possible because (slightly oversimplifying) the properties are easy to understand:

All memory accesses require a capability with the right permissions.
Capability permissions may not increase.

Beyond that, the TCB code that enforces higher-level properties such as compartment isolation, thread isolation, and no-capture guarantees (in combination with hardware) is only around 300 instructions on CHERIoT and so should also be amenable to formal verification.

This means that you can have strong run-time guarantees even when using malicious third-party code that exploits bugs in a type checker for a safe language. These guarantees also apply to binary-only components, including ones written in unsafe languages.

CHERI compartmentalisation does not just restrict the damage from memory-safety issues. The cheriot-audit tool lets you audit exactly which things a compartment can do outside of itself in a CHERIoT firmware image (which functions it can call in other compartments, which pre-shared objects and memory-mapped I/O regions it can access and with which permissions). You can reason about the damage from a compromise even if an attacker can gain arbitrary-code execution in a compartment. For supply-chain security, you should assume that a third-party component is compromised and includes malicious code. CHERIoT lets you reason about what it can do in these cases. In contrast, if a Java or Rust component is malicious and uses (intentional or otherwise) unsafe language features, it can do anything that the program can do.

The future should be safe languages on CHERI

CHERI and safe language give different sets of guarantees, but those guarantees compose to be stronger than either in isolation. CHERI makes it easy to safely interoperate with untrusted code. Safe languages make it easier for your trusted code to be trustworthy. Safe languages with CHERI make it easy to interoperate between safe and unsafe languages, without compromising the guarantees of any language.

Folks at Kent have ported Rust to CHERI targets and others at AdaCore have ported Ada. We hope to have both of these on CHERIoT over the next year. Being able to generate code that runs in a CHERIoT compartment should be fairly simple but there’s additional work to make it easy to define and use compartment entry points and so on.

CHERIoT even comes with a JavaScript interpreter that preserves JavaScript type-safety properties even when C code takes pointers to JavaScript objects. Using CHERI doesn’t mean you have to stick with unsafe languages, it means that your transition to safe languages can be motivated by the benefits of those languages, not by the shortcomings of C.

CHERI Myths: Writing C/C++ for CHERI is hard

2024-08-22T00:00:00+00:00

I’ve had several conversations over the past six months where people who have never written C/C++ code on CHERI have told me that they expect it to be harder than on non-CHERI systems. I struggle a bit to understand this. If it were true, using tools like valgrind and Address Sanitier would make development harder, which makes you wonder why these tools exist.

I recently wrote some C and C++ for a non-CHERI target and, honestly, I can’t believe I used to do that regularly given how much harder it is. Even in environments with a fully working interactive debugger, writing working C/C++ is more effort than on CHERIoT where we don’t (yet) have debugger support.

Imagine you have an off-by-one error that overflows a buffer. On non-CHERI systems, it’s hard to track down. On the stack, it may be in padding and have no effect. It may have no effect in debug builds, but cause corruption in release builds where the stack layout is different. If it’s in the heap, it may corrupt some unrelated object and the symptoms show up much later. I run in valgrind or address sanitiser and hopefully get a useful result.

On any CHERI target, I get a deterministic fault. Every time I read or write that one-byte-out-of-bounds value, I get the same fault. On CheriBSD, I’d attach the debugger and see where it happened. On CHERIoT, until we get a working debugger, I’d include the (somewhat poorly named) fail-simulator-on-error.h header, which installs a default error handler. When the error is triggered, this prints the exact instruction that tried to read or write out of bounds. I’d then look in the dump file, which would tell me the line number, and fix it. This typically takes me a minute or two, if that.

Similarly, if I have a use-after-free error, there’s some probability that address sanitiser will find it. Valgrind is a bit better, but is very slow. On CHERIoT, I get a trap as soon as I try to use the dangling pointer and I fix it in the same way as a spatial error.

Importantly, the CHERI exception happens before any data corruption. I’m not trying to work backwards from a point where my heap or stack is corrupted to try to find the place where the corruption occurred, I’m told exactly where the bug is. The first use of a dangling pointer or the first out-of-bounds access to an object will trigger a CHERI exception and point to precisely the instruction that is doing the wrong thing.

Note that all of this is about incorrect code. CHERI C and C++ try very hard to give you a standards-compliant (and de-facto standards-compliant, allowing things that the standard leaves open to implementations but everyone assumes are fine) implementation. Almost all of the C and C++ code that we’ve tried to run on CHERIoT has worked with no source-code modifications. Most of these are well-tested codebases, sometimes MISRA C with loads of static analyses run, which probably don’t have any memory-safety bugs.

The things that cause CHERI traps are undefined behaviour in C/C++. When your program does something that is undefined behaviour, the space of possible behaviours is unbounded. You may get a segmentation fault. You may get arbitrary data corruption. You may get a totally unexpected sequence of instructions executed. Bugs that introduce undefined behaviour are the hardest to debug, because they mean that later code (or, in some exciting examples, earlier code) is all depending on properties that are not true and so can do absolutely anything. Trapping on these things, rather than corrupting state, is a huge improvement to the debugging experience.

If you’re writing correct code, you probably won’t notice the difference between CHERI and non-CHERI systems. If you’re writing buggy code (which, let’s face it, we all do, at least some of the time), CHERI lets you catch errors sooner.

We’ve heard from several of the companies that prototyped on Morello that they want to keep their Morello systems for CI for precisely this reason: testing in Morello finds bugs earlier.

The ‘shift-left’ idea comes from the fact that bugs cost more the later they’re found. If you can avoid bugs at the design time, that’s perfect. If you can avoid them before you ship a product, that’s good. If you can detect them in production and recover, that’s okay. If you don’t detect them and they impact customers, that’s the worst (just ask CrowdStrike). Developing for a CHERI target makes it easy to find bugs before you ship them. It typically costs at least one order of magnitude less to fix them at this point than after deployment.

The ‘shift-left’ benefits for CHERIoT don’t end at catching bugs early. If you compartmentalise your software, failures in production can become recoverable failures in production. For example, the CHERIoT network stack now restarts the compartment that contains the FreeRTOS TCP/IP stack if it crashes. From the perspective of the rest of the system, all connections drop (something that you have to handle anyway because networks are unreliable) and need to be reconnected.

All of this makes developing and shipping products cheaper on CHERI systems than on conventional hardware.

£15,000 grants to prototype on CHERIoT

2024-08-21T00:00:00+00:00

Digital Catapult today announced a new Technology Access Programme (TAP) that covers CHERIoT. The Digital Security by Design (DSbD) TAPs are intended to help companies prototype on CHERI systems, to build the CHERI ecosystem. Prior TAPs have been restricted to Arm’s Morello prototype system. This is the first that allows participants to build on CHERIoT.

The programme will provide lowRISC’s excellent Sonata board to participants (these are also now available to buy). This board makes it incredibly easy to get started with CHERIoT. We’ve previously shown that you can go from a standing start to running CHERIoT code in two minutes with Sonata:

Video showing how to start working in CHERIoT RTOS with Sonata. First clone the CHERIoT-RTOS repository from GitHub and open it in the dev container when prompted. Next, open a source file and observe that things like cross-references and inline API documentation work out of the box. Then run `xmake config --sdk=/cheriot-tools --board=sonata` in one of the projects to configure it. Finally, run `xmake` and `xmake run` to build and run.

The basic environment gives you spatial and temporal memory safety out of the box, a privilege-separated RTOS, and a very easy mechanism for splitting your code into isolated compartments with fine-grained sharing. You can try the compartmentalisation exercise to see how easy it is to define compartment boundaries for fault isolation, protecting secrets, or mitigating compromises. This exercise works in the simulator (you can even run it in a GitHub Code Space if you deploy one from here) and on Sonata.

The CHERIoT prototype compartmentalised network stack runs on Sonata. Between the compartmentalisation strategy employed and the foundational properties of the CHERIoT ISA, this provides a system where most bugs in the TCP/IP stack have little or no security impact.

Combined with Sonata’s range of I/O facilities, this gives an excellent prototyping platform for secure IoT systems. Anything that runs on Sonata should then be easy to port to SCI Semiconductor’s ICENI devices next year for commercial deployment at scale.

If you have a commercial IoT product that you want to be able to easily support in production for 10+ years, this TAP is a great way for you to explore how CHERIoT can help.

If you’re considering participating in this TAP, and have any questions about the CHERIoT Platform, please don’t hesitate to ask them in GitHub Discussions or our public Signal chat.

Sharing objects between compartments

2024-08-15T00:00:00+00:00

The CHERIoT compartment model is similar to an object-oriented model, where each compartment exposes a set of entry points (analogous to methods) that can be called by other compartments. This works well for compartmentalising a lot of libraries: just expose their public API as compartment entry points.

One of the common questions from people starting to put some existing code in a compartment is: How do I export a global from this library? To which the obvious answer is: what does that even mean?

When you expose a function from a compartment, the security properties are well defined. Control flow will transition from callers to that entry point. The switcher will ensure that only things passed as arguments are visible in the callee. On return, the switcher will ensure nothing except the return value (and things reachable from it) are exposed to the caller.

But what are the security properties when you share a global? Should every compartment that can access it be able to write to it? This may be what you want (assuming a small number of compartments can access it). For example, if you have some performance-monitoring counter where the primary requirement is to minimise the probe effect. In this case having a compartment write an invalid value is less of a problem than the performance overhead of a cross-compartment call for each update.

In other (more common) cases, you may want to expose an object that one compartment can write to but many can read. We had one example of this in the core of the RTOS already. The allocator exposes an epoch counter that it increments when it starts and finishes inspecting a list of hazard pointers (so odd numbers indicate that it’s in the middle of a read). We were (ab)using the mechanism that we have for importing capabilities for memory-mapped I/O regions for this, but it was not a generic mechanism.

Most examples with similar requirements defined a global in one compartment and then exposed an entry point that returned a pointer to it. For example, the SNTP compartment in the network stack provides a shared library for getting the current UNIX timestamp using the CPU counter and the last value from NTP. A read-only pointer to the value from NTP is fetched by calling a function exported from the SNTP compartment. This code would be simpler if it were possible to simply import the cached time as a pre-shared object.

This week, we’ve added a fully supported abstraction for these use cases. The first part introduced the support in the RTOS This introduces macros for importing a pre-shared object with all permissions or with a subset of permissions. It also extends the build system to allow compartments to define pre-shared objects that they need.

Note that the last bit is not the same as defining pre-shared objects that they export. There is no notion of a compartment exporting globals. Instead, there are pre-shared objects that are imported by one or more compartments. This distinction is important because there may not be a canonical owner for a global.

When you define a shared object, you specify its name and size. If two compartments define an object of the same name and different sizes, the build will fail.

With the RTOS bits done, the next part was making pre-shared objects show up in the linker reports. Now, when you define an object, you’ll see something like this in the SharedObjects section of the linker report:

{ 
  "end": 2147605688,
  "name": "exampleK",
  "start": 2147604664
}

This describes the start and end address of the object and its name. In this case, it’s a 1 KiB object called exampleK. You’ll also see a corresponding entry in the imports section for anything that imports this object, for example:

{
  "kind": "SharedObject",
  "length": 1024,
  "permits_load": true,
  "permits_load_mutable": true,
  "permits_load_store_capabilities": true,
  "permits_store": true,
  "shared_object": "exampleK",
  "start": 2147604664
}

This shows the object name, its address and length (which may be smaller than the global in the future, though always match it for now). It also defines the set of permissions that this has.

As with the rest of the linker report, we don’t expect normal humans to ever read this directly. This brings me to the last part, the cheriot-audit integration.

This adds some helper functions for inspecting shared objects. For example, we have two pre-shared objects associated with the allocator. The hazard-pointer list is accessible only by the allocator (a capability to a subset of it for the current thread can be obtained via a call to the switcher). The epoch counter can be read by anything but must be written only by the allocator. We have added this to the RTOS policy like this:

data.compartment.shared_object_allow_list("allocator_hazard_pointers", {"allocator"})
data.compartment.shared_object_writeable_allow_list("allocator_epoch", {"allocator"})

If the allocator_hazard_pointers object is accessible by any other compartment or if allocator_epoch is writeable by anything except the allocator, this will fail. For some defence in depth, we also restrict the permissions with which the allocator imports the hazard pointer array and so we can also check that we got that right in the auditing policy:

some hazardListImport
hazardListImport = [ i | i = input.compartments.allocator.imports[_] ; i.shared_object == "allocator_hazard_pointers"]
every i in hazardListImport {
    i.permits_load == true
    i.permits_load_store_capabilities == true
    i.permits_load_mutable == false
    i.permits_store == false
}

The first two lines use a Rego comprehension to collect every import from the allocator compartment that refers to the hazard pointers object. We then assert that, for every one of those imports, the permissions are the same and permit loading capabilities, but not storing them or storing through any loaded capabilities.

This kind of policy is easy to write and flexible. As with other CHERIoT policies, it’s up to you how you use them. You can use this to drive the code-signing choices for built firmware, make your build fail entirely if they fail, or just use them for introspection.

How to talk to your parents about hardware memory safety

2024-08-06T00:00:00+00:00

Some conversations are difficult to have with members of older generations who grew up with different social norms. In particular, when you’re talking to people who grew up with PDP-11s with their completely flat memory, or Lisp machines or Burroughs Large Systems with their deeply opinionated and language-integrated hardware memory safety, you may find it hard to talk about CHERI. This guide aims to help you have those conversations with the minimum of stress on both sides.

You don’t understand me or my object model!

When we talk informally about CHERI, it’s tempting to say things like ‘CHERI provides memory safety’ or ‘CHERI gives you control-flow integrity’. The CHERI project started 14 years ago and people who have been working on it for a decade or so know that this is a shorthand but when you’re engaging with people who haven’t yet accepted the truth of capability systems into their life, it’s important to be precise.

A CPU instruction set architecture (ISA), or an ISA extension, for general-purpose computing, cannot provide memory safety. A definition of memory safety starts from a definition of an object model. C has an object model, as has Rust, Java, and so on. Each object model defines the bounds of objects and their lifetime. If you access an object after the end of its lifetime, or outside of its bounds, you have violated memory safety. A CPU can’t provide memory safety because it doesn’t know what the object model for the running program is.

The same is true for control-flow integrity (CFI). CFI defines a program as a directed graph of blocks of instructions with arcs in between them representing things like function calls, returns, and so on. Again, the existence of this graph is a property of the language. For example, in C++ if you call a virtual function then the object on which it’s called must be a subclass of the class on which the function is defined. This set of properties can be quite rich and so most CFI schemes focus on preventing particularly dangerous invalid transitions, rather than preventing all invalid transitions. For example, a C CFI scheme may allow you to call system when you meant puts, because they take the same argument type, but prevent you from calling fputs. This is a dangerous conflation (it may allow an attacker to run a program rather than printing a string) but it doesn’t corrupt the state of the running program.

You’re in my language and you’ll follow my language’s abstract machine’s rules!

So what’s the point of CHERI if an ISA can’t give you these properties? CHERI is not designed to give you an object model that you must conform to, it’s designed to give language implementers tools to enforce these properties. This is very different from a lot of earlier memory-safe system. Intel’s iAPχ 432 was designed around the Algol model (though not very well), as was the B5500 (somewhat better). Various Lisp machines implemented the Lisp memory model. Such an approach is not feasible today, in a world where most programs use components written in a variety of languages.

That’s a very important distinction and it becomes even more important when you ask the follow-on question: Against whom, exactly, are these properties being enforced?

We assume that a compiler has some notion of an object model. It may be able to enforce that memory model entirely because the source language has strong typing guarantees. For example, we have a JavaScript VM in the CHERIoT RTOS repository that builds its own garbage-collected type-safe object model using 16-bit integers to represent object pointers (more on that later). That compiler needs to generate binaries that interoperate with other code. The other code may be third-party code compiled with the same compiler but rely on exploiting compiler bugs to bypass certain guarantees. It may be written in a different language with different guarantees, or it may be assembly code with no enforced concept of types or an object model. Our goal is to allow the compiler for one language to enforce these properties against all of that other code, irrespective of the source language or compiler (if any) used to produce the other code.

Note that this is not just about providing C memory safety. A lot of the work so far is on C and C++, but the same problems occur in safer languages such as Java or Rust. For example, here’s a snippet from the Rust FFI manual that calls a C function from the snappy library:

use libc::size_t;

#[link(name = "snappy")]
extern {
    fn snappy_max_compressed_length(source_length: size_t) -> size_t;
}

fn main() {
    let x = unsafe { snappy_max_compressed_length(100) };
    println!("max compressed length of a 100 byte buffer: {}", x);
}

Note that the call from Rust to C requires the unsafe keyword. This recognises the fact that Rust enforces a lot of useful properties, such as lifetime safety, that C does not. As soon as you transition into C code, these properties are no longer automatically enforced. As a Rust programmer, you are expected to enforce these properties at the API boundary, but in the general case that is impossible on most hardware. For example, there’s no Rust wrapper that you can write that prevents a C library (on non-CHERI hardware) from mutating an immutable object, or from capturing a borrowed reference.

But what if the hardware could help? What if the Rust compiler could tell some combination of the hardware and the language-agnostic run-time system that a pointer must not be captured, or may be used with load instructions but not stores?

So what is this chariot thing you kids are talking about?

CHERIoT is a hardware-software platform built on a variant of CHERI that is optimised for small embedded and IoT applications. It builds on all of the prior CHERI research and makes a set of design choices optimised for tiny low-power devices.

CHERIoT allows a compiler to enforce a fairly rich set of properties against all of the other parts of a program. It can guarantee that, when invoking untrusted code, the untrusted code cannot:

Access objects unless passed pointers to them.
Access the memory that was formerly used for an object after that object has been freed.
Hold a pointer to an object with automatic storage duration (an ‘on-stack’ object) after the end of the call in which it was created.
Hold a temporarily delegated pointer beyond a single call.
Modify an object passed via immutable reference.
Modify any object reachable from an object that is passed as a deeply immutable reference.
Tamper with (or access) an object passed via opaque reference.

Some of these are baseline guarantees from CHERI, some are built atop those with additional hardware extensions, and others are built atop them with some language-agnostic software. Let’s go through them one at a time.

It’s rude to point!

Running code cannot access objects unless passed pointers to them. This is a basic property of CHERI but also the absolute minimum foundation for any form of memory safety. If malicious or buggy code can materialise (or forge) arbitrary pointers out of thin air and use them, it can bypass any of the other rules. In any CHERI system, any instruction that accesses memory (loads, stores, jumps, and so on) must take a capability as one of its operands. A capability (in general) is an unforgeable token of authority that allows you to do some action. A capability system is one in which all operations require an authorising capability to be presented. A CHERI capability is a machine word that is interpreted, and protected, by the CPU and which can be used as a hardware type that implements the language notion of a pointer. CHERI capabilities were designed to allow everything that correct C does with pointers, which is a superset of what most safer languages permit.

That capability is protected both in registers and memory by a tag bit. The tag bit is an attestation from the hardware that there is a valid chain of provenance from the root capabilities that authorise everything down via a sequence of subsetting operations that end up with the capability that you hold. On CHERIoT, for example, if you have a capability to a heap object, this is telling you that the allocator gave out a capability to a subset of the heap (which itself was given a subset of the address space by the loader). The CPU doesn’t know any of the bits about the software model, of course, it just ensures that you didn’t create a fake valid pointer. The software model defines what valid paths exist between the initial boot state and normal execution with you holding a pointer to a heap object, the hardware guarantees that some such path must have existed for you to hold that pointer.

I said you could look, not touch!

The two guarantees about immutability (code may not modify an object passed via immutable reference or any object reachable from a deeply immutable reference) are simple properties once you have unforgeable CHERI capabilities. These are both enforced via permissions. Each capability carries a set of permissions that defines the set of things that it can be used for. Removing store permission means that it can’t be used with store instructions (and so gives a read-only pointer).

Deep immutability isn’t just about controlling to top level pointer, you also need to make sure no pointer loaded by following pointers from the original object is ever used to store. Morello and CHERIoT also define a load-mutable permission. This permission allows you to transitively load capabilities that have store permission. When you load a capability, its store and load-mutable permissions are anded with load-mutable permission of the capability that authorised the load. This gives a very simple way of enforcing deep immutability. Removing a permission (and so constructing a deeply or shallowly immutable reference) is a simple register-register operation, less expensive in hardware than addition.

Permissions and bounds are monotonic. You can remove permissions from a capability but not add them. You can shrink the bounds, but not increase them. This means, for example, that one function can be given a pointer to a structure and can then create a read-only pointer to a single field of that structure to pass to another function.

You don’t necessarily need your source language to map this into the type system. A language that has a notion of read-only views of objects could use it automatically but in C/C++ we expose the operations to remove (and check) permissions as built-in functions, so you can use them for building your own security policies.

Why is there a seal living in my computer?

The ability to hand out tamper-proof opaque references is implemented with CHERI’s sealing mechanism. CHERI sealing associates an object type (a numerical value) with a pointer and, at the same time, makes the pointer unusable. Sealing a capability requires a second capability that authorises sealing with the specific type. Unsealing similarly requires a capability that authorises unsealing with a specific type. You can use this to enforce type safety for opaque types in the presence of unsafe code.

I used this extensively in the CHERI-JNI work back in 2017. When a Java program passed an object pointer to C, it was passed as a sealed capability. C code could do nothing with this other than pass it back to the Java VM via Java Native Interface (JNI) functions. The JNI exposes functions for getting or setting fields and calling methods. Each of these could unseal the object and know that it was a valid Java object. From there, there was no need for CHERI to be involved with type safety because every Java object carries a pointer to its class and so type checks were possible to implement purely in software. This highlights one of the core goals of CHERI: It enables languages to be safe, it doesn’t mandate how they enforce that safety. The Java VM can efficiently enforce type safety (and therefore memory safety) within its own world without CHERI (though CHERI can improve performance in a few places), but CHERI enables it to retain these properties even when calling C.

As with permissions, you don’t need these to be used directly in the language. In CHERIoT, we use sealed objects for almost anything where one compartment wants to provide a handle that lets other compartments ask it to do something. This includes allocating memory, reading or writing message queues, connecting to network servers, and so on. Cross-compartment type safety is useful even when you don’t have a type-safe language.

It’s a free memory!

The remaining three properties are somewhat more specific to CHERIoT. Preventing an object from being accessed after it has been freed is possible on other CHERI systems but is implemented in different ways. Cornucopia Reloaded explains an efficient way of implementing it on very large CHERI systems. CHERIoT takes an approach tailored for tiny embedded systems with no MMU. Each granule (8 bytes, by default) of memory that can be used for heap (which may not be all memory) has an associated bit in a bitmap. When an object is freed, the allocator sets all of the associated bits, marking the memory as freed. From this point on, you can never load a pointer to that object. Such pointers may continue to exist in memory but the CPU will clear their tag bit when you attempt to load them. This catches use-after-free errors immediately but is not sufficient for the program to be able to safely reuse the memory.

Eventually, the revoker (part of the hardware on CHERIoT) deletes pointers to freed objects from memory. After that has happened, the allocator can clear the bits in the ‘freed’ bitmap again and allow the memory to be reused. The monotonicity properties of CHERI mean that a pointer that points to an object can be used only to derive objects with equal or smaller bounds.

In CHERIoT, the memory allocator itself is just a special case of a language runtime. It hands out pointers to objects, marks them as free, and periodically triggers revocation, to ensure that no other component (irrespective of the language it’s written in) can access them after they’ve been freed.

This can be used as a foundation for other allocators. The Microvium JavaScript VM runs on CHERIoT and provides a garbage-collected heap for JavaScript. The Microvium heap is a modified semi-space compacting collector. Each object pointer in JavaScript is a 16-bit integer giving an offset into a logical heap space, which is actually constructed from a set of chunks. When the garbage collector is run, it will scan the chunks to find live objects, then allocate space to copy them into, and free the old chunks.

This means that every time C code captures a pointer to something on the JavaScript heap (for example, a string returned from mvm_toStringUtf8), that pointer will always point to the same object until the GC runs. After the GC runs, the pointer will become invalid. C code can use the mvm_Handle abstraction to keep objects live across collections, but (on CHERIoT platforms) if it doesn’t then it will have an invalid (unusable) pointer, not a pointer to a different object in the JavaScript heap.

Microvium also marks pointers to strings that it hands to C as read-only (no store permission), so JavaScript can do zero-copy sharing with C, but C code cannot violate the type-safety properties of JavaScript.

The final two properties are both variations of the same thing. It’s possible to pass a pointer to a function and ensure that the function does not capture that pointer. This is implemented with two properties on CHERIoT. The first is the two-bit information-flow-control model that CHERI has had since the very early days. This defines some capabilities as ‘global’ (and, conversely, the rest as local) and adds a store-local permission. You may store local capabilities only via a capability that has store-local permission. Global is not a permission (it doesn’t authorise you to do anything) but behaves like one: it can be cleared, but not set.

On a CHERIoT platform, the only memory that you will ever see with the store-local permission is the stack. This means that anything that you remove the global bit from can be held in registers or stored on the stack, but not stored on the heap or in globals. CHERIoT also adds a permit-load-global permission, so you can make this a deep property: no pointer loaded (at any depth of indirection) from this pointer may ever be stored on the globals or heap. This is combined with stack clearing in both directions of cross-compartment calls (with a bit of hardware assistance) to implement shallow or deep no-capture guarantees. A Rust compiler, for example, could use this to pass borrowed references to a C function and have a strong guarantee that the C code could not access the object beyond the function’s return.

So you mean compilers and ISAs can be friends?

None of these language-level properties come exclusively from the hardware. The hardware provides tools that the compiler can choose to use. The compiler can also choose not to use them when it doesn’t need to. For safe languages, a lot of properties are guaranteed by construction and so don’t need enforcing within that language’s code (or within the safe subset, for a language like Rust), though a compiler for safety-critical systems may choose to use them for defence in depth against compiler bugs. In C, for example, we don’t enforce stack temporal safety within a compartment because it’s easy for static analysers to track this kind of bug when they can see all of the code and it’s a better security-performance tradeoff to recommend that people aim the gun slightly away from their foot. We do enforce it at compartment boundaries, because we’re crossing a security boundary. If you choose not to run a static analyser that checks for stack temporal safety issues, you can introduce bugs into your own code, but not other compartments.

The same applies for CFI properties. CHERIoT provides a trusted stack for cross-compartment calls and a switcher that enforces a lot of properties on both the call and return path. Within a compartment, we provide forward and backwards return sentries for coarse-grained CFI (you can’t confuse function pointers and return addresses). This provides a few guarantees at library boundaries. You cannot jump into the middle of a library function (easily, you may be able to if it spills its program counter to the stack and doesn’t zero it on return). You cannot call a library function that disables interrupts without using the link register to return. Compilers can build richer abstractions for CFI, or they can accept that memory safety plus some coarse CFI still makes life very hard for attackers. This choice is not made for them and, if attackers find clever techniques for starting code-reuse attacks in the future, compilers can add additional defences within a compartment.

All of this is why it’s important not to talk about properties that CHERI systems enforce in terms of language-level properties. CHERI provides tools that allow implementations to enforce language-level properties but it does not define any of these in terms of language-level constructs. This is particularly apparent when, for example, you create a deeply immutable pointer in C code: something that you cannot enforce with language semantics (C lets you cast away const) but which CHERI can enforce even on assembly code that handles that pointer. CHERI doesn’t give C memory safety, CHERI gives you a set of tools that allow C, C++, Rust, Java, and so on to all share objects without letting any of them violate the safety properties of the others.

Formally verifying security properties of CHERI processors

2024-08-02T13:00:00+00:00

In 2018, the two infamous attacks Spectre and Meltdown raised awareness of timing side channels in the microarchitecture of modern processors. In the following years, ensuing research efforts have been made by academia and industry to investigate and mitigate these findings. So far, the focus has been on larger cores with powerful performance optimizations such as out-of-order execution and speculation. But even small processors optimized for low-power environments and featuring in-order pipelines can be vulnerable to these attacks. We have used our new formal verification framework VeriCHERI to detect a vulnerability to a Meltdown-style attack in CHERIoT Ibex. This was reported and fixed in May.

What makes a timing attack Meltdown-style?

The classical Meltdown attack exploits the out-of-order execution of processors to read from protected memory locations. In a Meltdown attack, the attacker can run code on the processor but has no access to certain memory locations, like the kernel address space. The key aspect is that the out-of-order execution creates a transient time window that allows an early execution of an (illegal) load access. This creates a race condition between the illegal access and the access permission check, which raises an exception and triggers corrective measures. In a functionally correct implementation of the processor, the illegally accessed data does not affect the architectural state of the processor, like the register file or the PC. However, it may leave a footprint in the microarchitecture, for example in the cache, to be extracted by the attacker. Race conditions between the effect of an illegal memory access and the access control checks preventing the access do not require features like out-of-order execution. Meltdown-style attacks can already be created by simple in-order load instructions. Out-of-order execution merely amplifies the effect by increasing the size of the transient time window.

The vulnerability we detected in CHERIoT Ibex fits into this category. Consider a situation where the PCC restricts fetching instructions from outside the current compartment. Executing a jump instruction, i.e., fetching an instruction from an address outside of the PCC bounds, results in an exception and a flushed pipeline. However, if the 32 bits contain unaligned instructions, e.g., due to compressed instructions, and the upper halfword contains the lower 16 bits of an uncompressed instruction, the processor uses an additional cycle to fetch the rest of the instruction, before the access bounds are evaluated and an exception is triggered. Whether or not the upper halfword contains an unaligned and uncompressed instruction is determined by two specific bits of the instruction (bits [17:16]). VeriCHERI, our new verification method, discovered that a potential attack exploiting this vulnerability can probe any word in the memory for these two bits. The probing result is then contained in the overall execution time and in the value of a performance counter.

The CHERIoT development team fixed this vulnerability by adjusting the fetch FIFO to always behave as if there was an unaligned compressed instruction (rdata[17:16] != 2'b11) when there is an address bound violation and the current fetch PC is unaligned.

How did we detect this vulnerability?

VeriCHERI is a new formal verification framework targeting security vulnerabilities in CHERI-enhanced processors. The key idea is that we start from abstract security requirements targeting confidentiality and integrity. Based on these general notions of security, we formulate security properties for the microarchitectural implementation. This is a significantly different approach compared to previous verification methods, which focus on verifying that the design conforms to a specification. VeriCHERI allows us to target not only security violations due to functional bugs, but also Meltdown-style timing side channels such as the one described above. At its core, VeriCHERI consits of only 4 security properties; these can be checked using the power of commercial property checking tools. Verification times for CHERIoT Ibex range from a few seconds to 31 minutes for detecting vulnerabilities in the original versions or to prove that the fixed design is secure. We refer interested readers to our paper about VeriCHERI [1].

Full citation

Anna Lena Duque Antón, Johannes Müller, Philipp Schmitz, Tobias Jauch, Alex Wezel, Lucas Deutschmann, Mohammad Rahmani Fadiheh, Dominik Stoffel and Wolfgang Kunz. VeriCHERI: Exhaustive Formal Security Verification of CHERI at the RTL. Accepted for publication at the 43rd International Conference on Computer-Aided Design (ICCAD ‘24), Association for Computing Machinery (2024). [doi]

BibTeX

CHERIoT projects moving into the CHERIoT Platform org

2024-07-31T00:00:00+00:00

I am pleased to announce that, today, Microsoft has transferred the CHERIoT Sail and CHERIoT RTOS repositories into the CHERIoT-Platform GitHub organisation. GitHub should redirect everything from the old locations, but it’s probably a good idea to update URLs in bookmarks, git clones, and so on.

CHERIoT began as a research project in Microsoft Research Cambridge, as part of Microsoft’s work on the Digital Security by Design Programme. The project’s goal was to explore several aspects of CHERI system design, specifically:

What can you do with software abstractions if you can assume CHERI?
How can you scale down the work on temporal safety to tiny embedded devices?
How can you remove the need for bolted-on security extensions by creating a holistic hardware-software security model around an assumption of memory safety?

As a research project, it was a resounding success. We demonstrated that a foundation of CHERI let you build tiny cores that provided a step change in the baseline security guarantees, along with a simple programmer model, in exchange for a tiny power and area overhead.

Microsoft; however, is not a microcontroller vendor and, for CHERIoT to be useful, it needs broad ecosystem support. This ecosystem is forming, with several companies making significant early contributions. Google has contributed, and we have merged, improvements to the ISA specification and various parts of the software stack. SCI Semiconductor is working to ship commercial CHERIoT SoCs to customers next year. lowRISC has built the Sonata FPGA platform for prototyping CHERIoT devices. Folks at Configured Things have written a fantastic demo application (see yesterday’s blog for more information).

At this point, CHERIoT is no longer a research project, it is an open source foundation supported by multiple vendors and these repositories’ moves reflect this fact. The CHERIoT-Platform organisation is now a centralised landing pad for anything CHERIoT related.

The two repositories that have moved today are the ISA specification and the RTOS. These are core parts of the platform.

The CHERIoT Sail repository contains the ISA specification, including both the formal model and the prose descriptions. This executable formal model can be used to prove properties of the ISA, verify that implementations conform to the specification, verify properties of software running on CHERIoT cores, and also build our golden model simulator. This repository contains everything that you need to be able to build a CHERIoT core and validate that it really implements the ISA. I believe that formal verification is a key part of any secure system. Formal specification is the foundation on which formal verification is built and I’m excited by the results we’ve seen so far from groups building on this.

The CHERIoT RTOS repository contains the core parts of the software stack. CHERIoT is unusual in being a complete hardware-software stack, where the hardware, programmer model, and software were all designed together. The CHERIoT RTOS is the embodiment of the compartmentalisation model that the CHERIoT ISA was designed to support (and, conversely, the CHERIoT ISA was designed to run the CHERIoT RTOS). Although you can run other operating systems on CHERIoT, you will get the most benefit from using the software stack that was designed around the guarantees that the hardware provides.

Readers who have been following the project for a while may notice one omission. Microsoft is not transferring the CHERIoT Ibex core. This remains an open-source core that is rapidly approaching commercial quality and is the first core that can be used to build a CHERIoT Platform. We want to make it clear that it is not the CHERIoT core, merely the first commercial-quality CHERIoT core.

When we originally prototyped CHERIoT, we built two hardware implementations. In addition to the Ibex, which aimed at production use, we also prototyped the CHERIoT ISA on the CHERI Flute processor. Ibex was optimised more for power and area, Flute more for performance. Ibex was a 3-stage pipeline with a 33-bit memory bus (requiring two cycles to load a capability). Flute was a 5-stage pipeline with a 64-bit memory bus (loading capabilities in a single cycle).

We did this to ensure that the CHERIoT ISA was not over-fitted for one particular microarchitecture. We expect it to scale from (tiny!) things the size of Ibex up to quite large microcontrollers.

The Ibex, which vital to the ecosystem in demonstrating the viability of CHERIoT, was never intended to be the only CHERIoT core. We welcome additional implementations (open and proprietary) and have no wish to bless any particular implementation and exclude others.

All of that said, the community engagement with CHERIoT Ibex has been amazing. It has been the focus of several formal verification efforts and I am very happy to see contributors from across industry and academia working to improve it. We will continue to include a cycle-accurate simulator of the Ibex in the CHERIoT Dev Container and will add any other implementations that are available under licenses that permit redistribution.

The CHERIoT Platform organisation currently has two maintainers: myself and Yucong Tao (Microsoft), and may add more in the future. Burdening a young project with too much bureaucracy early on is a sure way to kill it and so the administration is currently very light. This forms a nucleus around which we can form a CHERIoT Foundation if such a thing is desirable in the future.

As part of the repository transfer, I have signed a legal agreement with Microsoft that the components in these repositories will remain open source and under permissive licenses. This agreement will also bind future maintainers of the CHERIoT Platform. We are committed to providing an open ecosystem for CHERIoT devices. Device vendors are legally permitted to take as much or as little of this as they wish, and modify it as much as they wish. We encourage anyone with downstream changes that might benefit others to consider contributing them upstream.

Anyone wanting long-term commercial support for the CHERIoT software stack should contact SCI Semiconductor.