Clang 17 now in dev container, and other toolchain news

Thanks to a collaboration with the upstream CHERI toolchain, the CHERIoT toolchain has now been rebased onto Clang 17 from Clang 13, bringing us two years closer to upstream Clang. Major thanks to Alex Richardson (Google) and Sam Leffler (Google) for their work on this effort!

This update substantially improves support for C++20 features and adds preliminary support for some C++23 and C23 features. It also includes many improvements to the core of the RISC-V code generator, notably benefiting code size: the firmware image for the CHERIoT RTOS test suite is 2.7% smaller when built with Clang 17 than with Clang 13. Other highlights include compile-time improvements and too many under-the-hood fixes to enumerate. You can find detailed release notes for all Clang and LLVM releases on the LLVM.org releases page.

While we generally attempt to maintain compatibility between CHERIoT RTOS and the older toolchain, we recommend pulling the latest devcontainer (or otherwise updating your toolchain) to ensure the best experience.

Other Toolchain Improvements

Since landing the Clang 17 rebase, we’ve been busy bringing bugfixes and enhancements to the CHERIoT toolchain, including:

Language & Usability Improvements

Implemented a new Clang diagnostic to warn on compartment exports that return void, or where the return value is unused. This is important in practice, because cross-compartment calls can fail in the compartment switcher, and void returns make this failure undetectable. These warnings are disabled by default until CHERIoT RTOS has been updated for them, and are controlled by the -Wcheri-compartment-return-void compiler flag. Thanks to Robert Dazi for this one!
Allowed cheri_libcall-annotated functions to decay into unannotated function pointers. This is useful for passing the address of a cheri_libcall function as a callback within a compartment.
Improved linker error reporting if you accidentally omit the compartment export annotation on a declaration. lld will now look for matching unexported functions and provide a suggested fix.
Eliminated the need to repeat the minimum stack size in both the annotation and in the stack check, improving the ergonomics significantly. Below is an example of using the StackUsageCheck template in CHERIoT RTOS to verify stack usage, demonstrating the older style that repeats the size, and the new style that does not. The related STACK_CHECK(expected) macro in CHERIoT RTOS has been updated to use the new style, and the expected parameter will be removed in the future.
```
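  // Older style: the size appears in both the annotation and the check.
  __cheriot_minimum_stack(0x200)
  int old_style() {
      StackUsageCheck<StackCheckMode::Asserting, 0x200, __PRETTY_FUNCTION__> stackCheck;
      // ...
      return 0;
  }

  // New style: the check picks up the size from the annotation.
  __cheriot_minimum_stack(0x200)
  int new_style() {
      StackUsageCheck<StackCheckMode::Asserting, __cheriot_minimum_stack__, __PRETTY_FUNCTION__> stackCheck;
      // ...
      return 0;
  }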
```
Added support for “temporal” capability valid bit checking, using a new builtin __builtin_cheri_tag_get_temporal(void*). This is needed when reading the valid bit in situations where validity can change within the current function, such as around a deallocation or pinning. In all other circumstances, the existing non-temporal version should be preferred for better optimization. For example, a double-checked pattern when pinning with heap_claim_fast reads the valid bit both before and after the claim; if the compiler combined the two reads into one, the resulting code would be incorrect.
```
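void func(Timeout *t, int *ptr) {
    // First read: skip the claim if ptr is already invalid.
    if (__builtin_cheri_tag_get_temporal(ptr)) {
        int claim = heap_claim_fast(t, ptr, nullptr);
        // Second read: validity may have changed since the first check,
        // so the valid bit is read again instead of being reused.
        if (claim == 0 && __builtin_cheri_tag_get_temporal(ptr)) {
            *ptr = 1234;
            // ...
        }
    }
}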
```
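To illustrate why the new -Wcheri-compartment-return-void diagnostic matters, here is a minimal host-side sketch. The COMPARTMENT_EXPORT macro and the logger functions are hypothetical stand-ins invented for this example, not CHERIoT RTOS APIs; the macro expands to nothing so that the snippet builds with any C++ compiler, whereas real code would use the actual compartment export annotation.

```cpp
// Hypothetical stand-in for the real compartment export annotation.
// It expands to nothing so this sketch compiles on a host toolchain.
#define COMPARTMENT_EXPORT(name)

static int lastLogged = 0;

// The style the new warning targets: with a void return, the caller
// has no way to learn that the cross-compartment call failed.
COMPARTMENT_EXPORT("logger") void log_value_unchecked(int value)
{
    lastLogged = value;
}

// The preferred style: return a status so that failures are observable.
COMPARTMENT_EXPORT("logger") int log_value(int value)
{
    if (value < 0)
    {
        return -1; // illustrative error code, chosen for this sketch
    }
    lastLogged = value;
    return 0;
}

// A caller that actually inspects the result, which is the usage the
// "return value is unused" half of the warning encourages.
bool log_or_report(int value)
{
    return log_value(value) == 0;
}
```

In this sketch only log_value gives the caller anything to check; log_value_unchecked is the shape of export the diagnostic is meant to flag.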

Bugfixes

Fixed a recurring issue where the compiler would generate improperly mangled calls to memcpy, memmove and/or memcmp in specific circumstances, resulting in linker errors. This has now been fixed at the source.
Fixed linker errors that arise when taking the address of non-exported, non-libcall functions with non-default interrupt state annotations.

Optimizations

Taught the compiler to better optimize CAndPerm instructions, including constant folding and idempotence. This tends to benefit places where redundant CAndPerm instructions were generated by macros or C++ templates.
Freed up the TP/X4 register for the compiler’s use in code generation. This register is normally reserved as a “thread pointer” in RISC-V ABIs, but is not used for that purpose on CHERIoT. We haven’t observed this making a significant performance or size difference, but some compute-intensive code may benefit.
Re-enabled the MachineOutliner size optimization. This reduced the firmware size of the CHERIoT RTOS test suite by 4.4%, and will likely benefit other code bases similarly. However, this optimization uncovered an issue in the CHERIoT ISA related to return sentinels that has since been fixed in the specification. If your development board does not contain the fix, you will need to pass -enable-machine-outliner=never to the compiler. Before enabling the optimization by default, we updated the CHERIoT RTOS build system to pass this flag automatically when required.
Improved code quality for unaligned memory accesses. We’ve seen this particularly benefit some cryptographic code. Note that you need an up-to-date SAIL simulator, which supports misaligned memory accesses by default; if you are using an older simulator, you will need to pass -m to the simulator explicitly to enable them.
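
The CAndPerm item above can be pictured with a small host-side model. This is only a sketch under simplifying assumptions: a capability is reduced to a plain permission bitmask, the ToyCap type, the and_perm helper, and the permission bit values are all hypothetical, and details of the real permission encoding are ignored. It shows the two algebraic facts the optimization relies on: applying the same mask twice changes nothing, and two consecutive masks can be folded into one.

```cpp
#include <cstdint>

// Toy model: a capability reduced to its permission bits only.
struct ToyCap
{
    std::uint32_t perms;
};

// Hypothetical bit assignments, chosen only for this example.
constexpr std::uint32_t PermLoad   = 1u << 0;
constexpr std::uint32_t PermStore  = 1u << 1;
constexpr std::uint32_t PermGlobal = 1u << 2;

// Models only the permission-masking effect of CAndPerm.
constexpr ToyCap and_perm(ToyCap cap, std::uint32_t mask)
{
    return ToyCap{cap.perms & mask};
}

// What layered macros or templates might emit: two masks in a row.
constexpr ToyCap restrict_twice(ToyCap cap, std::uint32_t a, std::uint32_t b)
{
    return and_perm(and_perm(cap, a), b);
}

// What an optimizer can reduce that to: one mask with a folded constant.
constexpr ToyCap restrict_folded(ToyCap cap, std::uint32_t a, std::uint32_t b)
{
    return and_perm(cap, a & b);
}

constexpr ToyCap AllPerms{PermLoad | PermStore | PermGlobal};

// Idempotence: repeating the same mask has no further effect.
static_assert(and_perm(and_perm(AllPerms, PermLoad), PermLoad).perms ==
                and_perm(AllPerms, PermLoad).perms,
              "masking twice with the same mask equals masking once");

// Constant folding: two masks equal one combined mask.
static_assert(restrict_twice(AllPerms, PermLoad | PermStore, PermLoad).perms ==
                restrict_folded(AllPerms, PermLoad | PermStore, PermLoad).perms,
              "two masks fold into a single mask");
```

Because the checks are static_asserts, the model is verified at compile time; the same identities are what let redundant mask operations be removed from generated code.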

Looking Forward

We have one major improvement in the works that we hope to make available to CHERIoT toolchain users soon: sealed capability annotations.

This change adds a new pointer attribute __sealed_capability, which disallows any operations that would cause the pointer to be dereferenced, or to lose its __sealed_capability annotation. Once integrated with CHERIoT RTOS, we will be able to represent sealing, unsealing, and the propagation of sealed capabilities in a type-safe manner.

// CHERI sealing and unsealing operations now have signatures that are type-safe with respect to sealing.
void * __sealed_capability cheri_seal(void *cap, const void *type);
void * cheri_unseal(void * __sealed_capability cap, void *type);

int func(int * __sealed_capability ptr) {
    // This causes a compiler error!
    return *ptr;
}

We expect that integrating sealed capabilities into the type system will result in more ergonomic and less error-prone programming when dealing with sealing and unsealing operations, as well as detecting most incorrect dereferences of sealed capabilities at compile time.
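
For readers without the new toolchain, the idea can be approximated in standard C++ with a wrapper type. The sketch below is an analogy only, not how the compiler attribute is implemented: SealedPtr, toy_seal, toy_unseal, and is_dereferenceable are hypothetical names, the "sealing type" is modelled as an address used as a key, and a failed unseal returns nullptr purely for illustration. It assumes a C++17 compiler.

```cpp
#include <type_traits>
#include <utility>

template <typename T> class SealedPtr;
template <typename T> SealedPtr<T> toy_seal(T *ptr, const void *type);
template <typename T> T *toy_unseal(SealedPtr<T> sealed, const void *type);

// Hypothetical wrapper: holds a pointer but exposes no operator* or
// operator->, so any attempt to dereference it fails to compile.
template <typename T>
class SealedPtr
{
    T          *ptr;
    const void *type;
    SealedPtr(T *p, const void *t) : ptr(p), type(t) {}
    friend SealedPtr<T> toy_seal<T>(T *, const void *);
    friend T *toy_unseal<T>(SealedPtr<T>, const void *);
};

// Sealing changes the static type, so the sealed state travels with it.
template <typename T>
SealedPtr<T> toy_seal(T *ptr, const void *type)
{
    return SealedPtr<T>(ptr, type);
}

// Unsealing is the only way back to a usable pointer. In this toy model
// a mismatched key yields nullptr.
template <typename T>
T *toy_unseal(SealedPtr<T> sealed, const void *type)
{
    return sealed.type == type ? sealed.ptr : nullptr;
}

// Compile-time probe: does the expression *x compile for a value of type X?
template <typename X, typename = void>
struct is_dereferenceable : std::false_type {};
template <typename X>
struct is_dereferenceable<X, std::void_t<decltype(*std::declval<X>())>>
    : std::true_type {};

static_assert(is_dereferenceable<int *>::value,
              "ordinary pointers can be dereferenced");
static_assert(!is_dereferenceable<SealedPtr<int>>::value,
              "sealed pointers cannot be dereferenced");
```

A caller would seal with toy_seal(&value, &key) and recover the pointer with toy_unseal(sealed, &key); writing *sealed is rejected by the compiler, which is the same class of mistake the __sealed_capability annotation is intended to catch.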