<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://cheriot.org/feed.xml" rel="self" type="application/atom+xml" /><link href="https://cheriot.org/" rel="alternate" type="text/html" /><updated>2026-05-22T15:44:33+00:00</updated><id>https://cheriot.org/feed.xml</id><title type="html">CHERIoT Platform</title><subtitle>Welcome to the CHERIoT Platform, a hardware-software co-design project that provides game-changing security for embedded devices.</subtitle><entry><title type="html">Introducing Bitpacks</title><link href="https://cheriot.org/rtos/programming/2026/05/22/bitpacks.html" rel="alternate" type="text/html" title="Introducing Bitpacks" /><published>2026-05-22T00:00:00+00:00</published><updated>2026-05-22T00:00:00+00:00</updated><id>https://cheriot.org/rtos/programming/2026/05/22/bitpacks</id><content type="html" xml:base="https://cheriot.org/rtos/programming/2026/05/22/bitpacks.html"><![CDATA[<script src="https://cdn.jsdelivr.net/npm/wavedrom@3.6.1/wavedrom.unpkg.min.js"></script>

<script>window.onload = function() { wavedrom.processAll(); };</script>

<h1 id="motivation">Motivation</h1>

<p>C and C++ are low-level languages,
nominally especially well suited for working with hardware and
in performance-sensitive data processing applications,
and yet they lack a convenient, portable mechanism for working with bit-packed structures,
as often found in hardware memory-mapped I/O (MMIO) registers, network packets, and data serialization formats.
By “bit-packed”, we mean that the logical data elements are not necessarily aligned to “natural” boundaries,
as defined by the machine processing those data (for example, byte / octet or machine word boundaries),
and/or may not fully span between two such “natural” boundaries
(for example, a 5-bit field on an 8-bit byte machine).
The primary, syntactic mechanism that these languages do offer for sub-word field management is called “bit-fields”,
a provision to specify, in C++ jargon, the bit width of a data member with integral or enumeration type.
Concretely, one writes something like</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">MyFields</span>
<span class="p">{</span>
	<span class="kt">bool</span> <span class="n">enable</span> <span class="o">:</span> <span class="mi">1</span><span class="p">;</span>
	<span class="k">enum</span> <span class="n">MyEnum</span> <span class="n">reference</span> <span class="o">:</span> <span class="mi">2</span><span class="p">;</span>
	<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">bias</span> <span class="o">:</span> <span class="mi">8</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>
<p>to specify a structure holding
a 1-bit, <code class="language-plaintext highlighter-rouge">bool</code>-typed field named <code class="language-plaintext highlighter-rouge">enable</code>,
a 2-bit <code class="language-plaintext highlighter-rouge">enum MyEnum</code>-typed <code class="language-plaintext highlighter-rouge">reference</code>, and
an 8-bit, <code class="language-plaintext highlighter-rouge">unsigned int</code>-typed <code class="language-plaintext highlighter-rouge">bias</code>.</p>

<p>Unfortunately, the layout (that is, the in-machine-memory representation) of such definitions
is almost completely unspecified by the language standards.
C++23 (well, <a href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/n4928.pdf">N4928</a>) says:</p>

<blockquote>
  <p>Allocation of bit-fields within a class object is implementation defined.
Alignment of bit-fields is implementation defined.
Bit-fields are packed into some addressable allocation unit.</p>

  <p>[Note 1: Bit-fields straddle allocation units on some machines and not on others.
Bit-fields are assigned right-to-left on some machines, left-to-right on others. – end note]</p>
</blockquote>

<p>While there is some provision for forcing alignments of, and controlling padding around, bit-fields,
the implementation-defined nature makes even this challenging to actually understand and use.</p>

<p>Moreover, bit-fields are very much second-class citizens:</p>

<ul>
  <li>
    <p>they may not have their address taken
(so there are no pointers to bit-fields; what would be the type of such a thing?);</p>
  </li>
  <li>
    <p>references to bit-fields must be <code class="language-plaintext highlighter-rouge">const</code>
(allowing implementations to quietly bind the reference to a temporary of the bit-field’s underlying type);</p>
  </li>
  <li>
    <p>because bit-fields exist within aggregate structures that are not themselves numeric types,
converting between a bit-field aggregate and a numeric representation can be particularly troublesome,
especially in <code class="language-plaintext highlighter-rouge">constexpr</code> contexts where <code class="language-plaintext highlighter-rouge">reinterpret_cast</code>-s are forbidden.</p>
  </li>
</ul>

<p>As a result, implementations of drivers and codecs and such in practice tend to eschew language bit-fields
in favor of explicit bit-wise operations on numerically-typed words.
They may even go so far as to eschew C/C++ aggregates (<code class="language-plaintext highlighter-rouge">struct</code>-s and such) completely
and view entire MMIO regions as simply arrays of <code class="language-plaintext highlighter-rouge">uint32_t</code>-s,
with information about the word and sub-word fields’ layout
conveyed “out of band” via, for example, preprocessor <code class="language-plaintext highlighter-rouge">#define</code>-s.</p>

<p>Given the near ubiquity of dealing with bit-packed structures in embedded contexts, we wanted something better.
Thus, on an experimental basis, we have introduced
<a href="https://github.com/CHERIoT-Platform/cheriot-rtos/pull/558">“bitpacks”</a>
to the CHERIoT-RTOS programming framework.
Bitpacks are, at their core, simply machinery and syntax wrapping bitwise operators (shift, complement, AND, OR)
on an underlying value of integral type (say, <code class="language-plaintext highlighter-rouge">uint32_t</code>),
making it much more pleasant to view (and write code that views) that underlying value as containing sub-fields.</p>

<p>The remainder of this post gives examples of that machinery
(and of different ways of thinking about, naming, and typing sub-fields)
built around a simple, fictitious, but representative, MMIO device.</p>

<h1 id="a-simple-fictitious-mmio-device">A Simple, Fictitious MMIO Device</h1>

<p>Imagine a simple peripheral that generates data and has some configuration knobs about how it does so,
something like a very rough sketch of an analog-to-digital converter.
Let’s say that our device has</p>

<ul>
  <li>
    <p>a 32-bit control register with…</p>

    <ul>
      <li>
        <p>a control bit that enables or disables the device’s operation,
which must be disabled to configure the peripheral.
The mnemonics for this field’s values are <code class="language-plaintext highlighter-rouge">NoGo</code> and <code class="language-plaintext highlighter-rouge">Go</code>.</p>
      </li>
      <li>
        <p>a status bit that indicates whether the device is enabled and active.
The mnemonics for this field’s values are <code class="language-plaintext highlighter-rouge">Off</code> and <code class="language-plaintext highlighter-rouge">Running</code>.</p>
      </li>
      <li>
        <p>a choice of “reference”, with one internal and two external sources, creatively named “A” and “B”,
giving us an enumeration with mnemonics <code class="language-plaintext highlighter-rouge">Internal</code>, <code class="language-plaintext highlighter-rouge">ExternalA</code>, and <code class="language-plaintext highlighter-rouge">ExternalB</code>.</p>
      </li>
      <li>
        <p>an 8-bit “bias” value.
This field is more numeric than an enumeration, so has no associated mnemonics.</p>
      </li>
    </ul>
  </li>
  <li>
    <p>a 32-bit data register that can be read once the device is active.</p>
  </li>
</ul>

<p>Graphically, we might render that MMIO layout something like this:
<script type="WaveDrom">
{reg: [
  { name: 'Enable',    bits: 1 },
  { name: 'Active',    bits: 1 },
  {                    bits: 6 },
  { name: 'Reference', bits: 2 },
  {                    bits: 6 },
  { name: 'Bias',      bits: 8 },
  {                    bits: 8 },
  { name: 'Data',      bits: 32}
], config: {hspace: 900, lanes: 4, hflip: true}}
</script></p>

<h1 id="code-without-bitpacks">Code Without Bitpacks</h1>

<h2 id="an-array-of-uint32_t-is-all-you-need">An Array of <code class="language-plaintext highlighter-rouge">uint32_t</code> Is All You Need?</h2>

<p>A minimal, easily generated software description of such an MMIO device might look like this:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define MYDEVICE_OFFSET32_CONTROL 0
</span>
<span class="cp">#define MYDEVICE_CONTROL_ENABLE_SHIFT 0
#define MYDEVICE_CONTROL_ENABLE_MASK 0x1
#define MYDEVICE_CONTROL_ENABLE_V_NOGO 0x0
#define MYDEVICE_CONTROL_ENABLE_V_GO 0x1
</span>
<span class="cp">#define MYDEVICE_CONTROL_ACTIVE_SHIFT 1
#define MYDEVICE_CONTROL_ACTIVE_MASK 0x2
#define MYDEVICE_CONTROL_ACTIVE_V_OFF 0x0
#define MYDEVICE_CONTROL_ACTIVE_V_RUNNING 0x1
</span>
<span class="cp">#define MYDEVICE_CONTROL_REFERENCE_SHIFT 8
#define MYDEVICE_CONTROL_REFERENCE_MASK 0x300
#define MYDEVICE_CONTROL_REFERENCE_V_INTERNAL 0x0
#define MYDEVICE_CONTROL_REFERENCE_V_EXTERNAL_A 0x1
#define MYDEVICE_CONTROL_REFERENCE_V_EXTERNAL_B 0x3
</span>
<span class="cp">#define MYDEVICE_CONTROL_BIAS_SHIFT 16
#define MYDEVICE_CONTROL_BIAS_MASK 0xFF0000
</span>
<span class="cp">#define MYDEVICE_OFFSET32_DATA 1
</span></code></pre></div></div>

<p>(For a real-world example,
the <a href="https://opentitan.org/book/util/reggen/index.html">OpenTitan Register Generator tool</a>
produces C headers that look much like this.)</p>

<p>These <code class="language-plaintext highlighter-rouge">#define</code>-s give</p>

<ul>
  <li>
    <p>each register’s offset, in terms of 32-bit words,</p>
  </li>
  <li>
    <p>the shift to reach the least significant bit of each field,</p>
  </li>
  <li>
    <p>the (shifted, in-situ) mask of each field, and</p>
  </li>
  <li>
    <p>an enumeration of values for enumeration-typed fields.</p>
  </li>
</ul>
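
<p>The shift-and-mask convention that these <code class="language-plaintext highlighter-rouge">#define</code>-s encode can be captured in two small helpers.
This is only a sketch to make the convention concrete:
<code class="language-plaintext highlighter-rouge">field_get</code> and <code class="language-plaintext highlighter-rouge">field_set</code> are hypothetical names, not part of any generated header,
and the numeric arguments simply mirror the <code class="language-plaintext highlighter-rouge">MYDEVICE_CONTROL_REFERENCE_</code> values above.</p>

```cpp
#include <cstdint>

// Hypothetical helpers illustrating the convention; not from any real header.
// `mask` is the in-situ (already shifted) mask of the field and
// `shift` is the position of the field's least significant bit.
constexpr uint32_t field_get(uint32_t word, uint32_t mask, unsigned shift)
{
	// Isolate the field's bits, then move them down to bit 0.
	return (word & mask) >> shift;
}

constexpr uint32_t
field_set(uint32_t word, uint32_t mask, unsigned shift, uint32_t value)
{
	// Clear the field's bits, then OR in the new value.  The trailing
	// `& mask` truncates values that are too wide for the field.
	return (word & ~mask) | ((value << shift) & mask);
}

// Reference field: shift 8, mask 0x300.
static_assert(field_get(0x00000300u, 0x300u, 8) == 0x3u, "extract");
static_assert(field_set(0xFFFFFFFFu, 0x300u, 8, 0x1u) == 0xFFFFFDFFu,
              "update leaves other bits alone");
```

<p>Every hand-written expression in the functions that follow is an inlined instance of one of these two patterns.</p>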

<p>An example function that uses these <code class="language-plaintext highlighter-rouge">#define</code>-s to configure the device,
wait for it to become active,
and then return the first word of data could look like this,
taking a pointer to the base of the device’s MMIO region:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="nf">test</span><span class="p">(</span><span class="k">volatile</span> <span class="kt">uint32_t</span> <span class="o">*</span><span class="n">device</span><span class="p">)</span>
<span class="p">{</span>
	<span class="c1">// set device.control's enable field to NOGO</span>
	<span class="n">device</span><span class="p">[</span><span class="n">MYDEVICE_OFFSET32_CONTROL</span><span class="p">]</span> <span class="o">=</span>
	  <span class="p">(</span><span class="n">device</span><span class="p">[</span><span class="n">MYDEVICE_OFFSET32_CONTROL</span><span class="p">]</span> <span class="o">&amp;</span> <span class="o">~</span><span class="n">MYDEVICE_CONTROL_ENABLE_MASK</span><span class="p">)</span> <span class="o">|</span>
	  <span class="p">(</span><span class="n">MYDEVICE_CONTROL_ENABLE_V_NOGO</span> <span class="o">&lt;&lt;</span> <span class="n">MYDEVICE_CONTROL_ENABLE_SHIFT</span><span class="p">);</span>

	<span class="cm">/*
	 * Set device.control's reference field to EXTERNAL_A value and set bias to
	 * 0x7f at the same time.  Leave the enable field unchanged.
	 */</span>
	<span class="n">device</span><span class="p">[</span><span class="n">MYDEVICE_OFFSET32_CONTROL</span><span class="p">]</span> <span class="o">=</span>
	  <span class="p">(</span><span class="n">device</span><span class="p">[</span><span class="n">MYDEVICE_OFFSET32_CONTROL</span><span class="p">]</span> <span class="o">&amp;</span>
	   <span class="o">~</span><span class="p">(</span><span class="n">MYDEVICE_CONTROL_REFERENCE_MASK</span> <span class="o">|</span> <span class="n">MYDEVICE_CONTROL_BIAS_MASK</span><span class="p">))</span> <span class="o">|</span>
	  <span class="p">(</span><span class="n">MYDEVICE_CONTROL_REFERENCE_V_EXTERNAL_A</span>
	   <span class="o">&lt;&lt;</span> <span class="n">MYDEVICE_CONTROL_REFERENCE_SHIFT</span><span class="p">)</span> <span class="o">|</span>
	  <span class="p">(</span><span class="mh">0x7f</span> <span class="o">&lt;&lt;</span> <span class="n">MYDEVICE_CONTROL_BIAS_SHIFT</span><span class="p">);</span>

	<span class="c1">// set device.control's enable field to GO</span>
	<span class="n">device</span><span class="p">[</span><span class="n">MYDEVICE_OFFSET32_CONTROL</span><span class="p">]</span> <span class="o">=</span>
	  <span class="p">(</span><span class="n">device</span><span class="p">[</span><span class="n">MYDEVICE_OFFSET32_CONTROL</span><span class="p">]</span> <span class="o">&amp;</span> <span class="o">~</span><span class="n">MYDEVICE_CONTROL_ENABLE_MASK</span><span class="p">)</span> <span class="o">|</span>
	  <span class="p">(</span><span class="n">MYDEVICE_CONTROL_ENABLE_V_GO</span> <span class="o">&lt;&lt;</span> <span class="n">MYDEVICE_CONTROL_ENABLE_SHIFT</span><span class="p">);</span>

	<span class="c1">// wait for device.control's active field to be RUNNING</span>
	<span class="k">while</span> <span class="p">((</span><span class="n">device</span><span class="p">[</span><span class="n">MYDEVICE_OFFSET32_CONTROL</span><span class="p">]</span> <span class="o">&amp;</span> <span class="n">MYDEVICE_CONTROL_ACTIVE_MASK</span><span class="p">)</span> <span class="o">!=</span>
	       <span class="p">(</span><span class="n">MYDEVICE_CONTROL_ACTIVE_V_RUNNING</span> <span class="o">&lt;&lt;</span> <span class="n">MYDEVICE_CONTROL_ACTIVE_SHIFT</span><span class="p">))</span>
	<span class="p">{</span>
		<span class="n">yield</span><span class="p">();</span>
	<span class="p">}</span>

	<span class="k">return</span> <span class="n">device</span><span class="p">[</span><span class="n">MYDEVICE_OFFSET32_DATA</span><span class="p">];</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="how-about-a-struct-of-uint32_t-s">How About a <code class="language-plaintext highlighter-rouge">struct</code> of <code class="language-plaintext highlighter-rouge">uint32_t</code>-s?</h2>

<p>We can use C++ aggregates to at least describe the “outer” layer of the MMIO structure:
the layout and naming of the registers (but not their fields).
Keeping the per-field parts of the above definition,
but replacing the register layout information with slightly more idiomatic C++
makes our example look something like this:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">RawDevice</span>
<span class="p">{</span>
	<span class="kt">uint32_t</span> <span class="n">control</span><span class="p">;</span>

	<span class="kt">uint32_t</span> <span class="n">data</span><span class="p">;</span>
<span class="p">};</span>

<span class="c1">// MYDEVICE_CONTROL_... as above</span>

<span class="kt">uint32_t</span> <span class="nf">test</span><span class="p">(</span><span class="k">volatile</span> <span class="n">RawDevice</span> <span class="o">&amp;</span><span class="n">device</span><span class="p">)</span>
<span class="p">{</span>
	<span class="c1">// set device.control's enable field to NOGO</span>
	<span class="n">device</span><span class="p">.</span><span class="n">control</span> <span class="o">=</span>
	  <span class="p">(</span><span class="n">device</span><span class="p">.</span><span class="n">control</span> <span class="o">&amp;</span> <span class="o">~</span><span class="n">MYDEVICE_CONTROL_ENABLE_MASK</span><span class="p">)</span> <span class="o">|</span>
	  <span class="p">(</span><span class="n">MYDEVICE_CONTROL_ENABLE_V_NOGO</span> <span class="o">&lt;&lt;</span> <span class="n">MYDEVICE_CONTROL_ENABLE_SHIFT</span><span class="p">);</span>

	<span class="cm">/*
	 * Set device.control's reference field to EXTERNAL_A value and set bias to
	 * 0x7f at the same time.  Leave the enable field unchanged.
	 */</span>
	<span class="n">device</span><span class="p">.</span><span class="n">control</span> <span class="o">=</span> <span class="p">(</span><span class="n">device</span><span class="p">.</span><span class="n">control</span> <span class="o">&amp;</span> <span class="o">~</span><span class="p">(</span><span class="n">MYDEVICE_CONTROL_REFERENCE_MASK</span> <span class="o">|</span>
	                                     <span class="n">MYDEVICE_CONTROL_BIAS_MASK</span><span class="p">))</span> <span class="o">|</span>
	                 <span class="p">(</span><span class="n">MYDEVICE_CONTROL_REFERENCE_V_EXTERNAL_A</span>
	                  <span class="o">&lt;&lt;</span> <span class="n">MYDEVICE_CONTROL_REFERENCE_SHIFT</span><span class="p">)</span> <span class="o">|</span>
	                 <span class="p">(</span><span class="mh">0x7f</span> <span class="o">&lt;&lt;</span> <span class="n">MYDEVICE_CONTROL_BIAS_SHIFT</span><span class="p">);</span>

	<span class="c1">// set device.control's enable field to GO</span>
	<span class="n">device</span><span class="p">.</span><span class="n">control</span> <span class="o">=</span>
	  <span class="p">(</span><span class="n">device</span><span class="p">.</span><span class="n">control</span> <span class="o">&amp;</span> <span class="o">~</span><span class="n">MYDEVICE_CONTROL_ENABLE_MASK</span><span class="p">)</span> <span class="o">|</span>
	  <span class="p">(</span><span class="n">MYDEVICE_CONTROL_ENABLE_V_GO</span> <span class="o">&lt;&lt;</span> <span class="n">MYDEVICE_CONTROL_ENABLE_SHIFT</span><span class="p">);</span>

	<span class="c1">// wait for device.control's active field to be RUNNING</span>
	<span class="k">while</span> <span class="p">((</span><span class="n">device</span><span class="p">.</span><span class="n">control</span> <span class="o">&amp;</span> <span class="n">MYDEVICE_CONTROL_ACTIVE_MASK</span><span class="p">)</span> <span class="o">!=</span>
	       <span class="p">(</span><span class="n">MYDEVICE_CONTROL_ACTIVE_V_RUNNING</span> <span class="o">&lt;&lt;</span> <span class="n">MYDEVICE_CONTROL_ACTIVE_SHIFT</span><span class="p">))</span>
	<span class="p">{</span>
		<span class="n">yield</span><span class="p">();</span>
	<span class="p">}</span>

	<span class="k">return</span> <span class="n">device</span><span class="p">.</span><span class="n">data</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This compiles to the same code as the <code class="language-plaintext highlighter-rouge">test(volatile uint32_t *)</code> implementation above.</p>

<p>(<a href="https://github.com/ARM-software/CMSIS_6/">ARM’s CMSIS</a> contains many examples
of using <code class="language-plaintext highlighter-rouge">struct</code>-s for the layout of register words and <code class="language-plaintext highlighter-rouge">#define</code>-s for fields within registers.)</p>

<h2 id="c-bitfields">C++ Bitfields?</h2>

<p>We can try to write our example with C++ bitfields, despite their implementation-defined nature
(perhaps we believe that there is and will only ever be exactly one C++ ABI for our machine).
That would make the MMIO region definition look something like this:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">BFDevice</span>
<span class="p">{</span>
	<span class="k">struct</span> <span class="nc">Control</span>
	<span class="p">{</span>
		<span class="kt">bool</span> <span class="n">enabled</span> <span class="o">:</span> <span class="mi">1</span><span class="p">;</span>

		<span class="kt">bool</span> <span class="n">active</span> <span class="o">:</span> <span class="mi">1</span><span class="p">;</span>

		<span class="kt">uint32_t</span> <span class="o">:</span> <span class="mi">6</span><span class="p">;</span> <span class="c1">// padding</span>

		<span class="kt">uint8_t</span> <span class="n">reference</span> <span class="o">:</span> <span class="mi">2</span><span class="p">;</span>

		<span class="kt">uint32_t</span> <span class="o">:</span> <span class="mi">6</span><span class="p">;</span> <span class="c1">// padding</span>

		<span class="kt">uint32_t</span> <span class="n">bias</span> <span class="o">:</span> <span class="mi">8</span><span class="p">;</span>

		<span class="kt">uint32_t</span> <span class="o">:</span> <span class="mi">8</span><span class="p">;</span> <span class="c1">// padding</span>
	<span class="p">}</span> <span class="n">control</span><span class="p">;</span>

	<span class="kt">uint32_t</span> <span class="n">data</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">static_assert</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="n">BFDevice</span><span class="o">::</span><span class="n">Control</span><span class="p">)</span> <span class="o">==</span> <span class="mi">4</span><span class="p">);</span>
</code></pre></div></div>

<p>And in use, we’d run into the fact that C++ does not implicitly define copy operators
suitable for use with <code class="language-plaintext highlighter-rouge">volatile</code> bitfield aggregates,
so we’d resort to doing more writes than the above examples, with</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="nf">test</span><span class="p">(</span><span class="k">volatile</span> <span class="n">BFDevice</span> <span class="o">&amp;</span><span class="n">device</span><span class="p">)</span>
<span class="p">{</span>
	<span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">.</span><span class="n">enabled</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>

	<span class="c1">// challenging to make these into one MMIO write, like we could above</span>
	<span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">.</span><span class="n">reference</span> <span class="o">=</span> <span class="n">MYDEVICE_CONTROL_REFERENCE_V_EXTERNAL_A</span><span class="p">;</span>
	<span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">.</span><span class="n">bias</span>      <span class="o">=</span> <span class="mh">0x7f</span><span class="p">;</span>

	<span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">.</span><span class="n">enabled</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>

	<span class="k">while</span> <span class="p">(</span><span class="o">!</span><span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">.</span><span class="n">active</span><span class="p">)</span>
	<span class="p">{</span>
		<span class="n">yield</span><span class="p">();</span>
	<span class="p">}</span>

	<span class="k">return</span> <span class="n">device</span><span class="p">.</span><span class="n">data</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>All the challenges aside, that sure looks like nicer code!
So, how do bitpacks help?</p>

<h1 id="introducing-bitpacks">Introducing Bitpacks</h1>

<p>Bitpacks offer a way to define, view, and manipulate typed fields within a numeric word.
Internally, fields are simply contiguous spans of bits within that word,
and operations are phrased in terms of bitwise operations (shift, complement, AND, OR).
Fields are accessed through “proxies”,
each of which can be thought of as a reference to a field within a particular word.</p>

<p>One defines a bitpack by defining a <code class="language-plaintext highlighter-rouge">class</code> (or <code class="language-plaintext highlighter-rouge">struct</code>) that inherits
from the <code class="language-plaintext highlighter-rouge">Bitpack&lt;typename Storage&gt;</code> class template.
The <code class="language-plaintext highlighter-rouge">Storage</code> template parameter defines the size (and type) of the underlying numeric word.</p>

<p>Defining a field – specifying its position within the word – amounts to filling out a <code class="language-plaintext highlighter-rouge">FieldInfo</code> structure:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">FieldInfo</span>
<span class="p">{</span>
	<span class="c1">/// Minimum 0-indexed bit position occupied by this field</span>
	<span class="kt">size_t</span> <span class="n">minIndex</span><span class="p">;</span>

	<span class="c1">/// Maximum 0-indexed bit position occupied by this field</span>
	<span class="kt">size_t</span> <span class="n">maxIndex</span><span class="p">;</span>

	<span class="c1">/// Should proxies of this field not provide affordances for mutation?</span>
	<span class="kt">bool</span> <span class="n">isConst</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>A <code class="language-plaintext highlighter-rouge">FieldInfo {3, 7}</code>, for example, specifies a field spanning bits 3 through 7 of a word.
From this, bitpacks can, internally, compute things like the <code class="language-plaintext highlighter-rouge">_SHIFT</code> and <code class="language-plaintext highlighter-rouge">_MASK</code> values used above.
We will almost never encounter <code class="language-plaintext highlighter-rouge">FieldInfo</code>-s by name, but we will specify initializers for them when defining fields.</p>
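
<p>To make that computation concrete, here is a minimal sketch of deriving a shift and mask from a <code class="language-plaintext highlighter-rouge">FieldInfo</code>.
It assumes a 32-bit word and uses a local copy of the structure so that it stands alone;
the helper names (<code class="language-plaintext highlighter-rouge">field_shift</code>, <code class="language-plaintext highlighter-rouge">field_width</code>, <code class="language-plaintext highlighter-rouge">field_mask</code>) are hypothetical,
and the actual bitpack internals may well be organized differently.</p>

```cpp
#include <cstddef>
#include <cstdint>

// Local copy of the FieldInfo shape shown above, so the sketch stands alone.
struct FieldInfo
{
	std::size_t minIndex;
	std::size_t maxIndex;
	bool        isConst = false;
};

// Hypothetical helper names; not the real bitpack internals.
constexpr std::size_t field_shift(FieldInfo f)
{
	return f.minIndex;
}

constexpr std::size_t field_width(FieldInfo f)
{
	return f.maxIndex - f.minIndex + 1;
}

constexpr uint32_t field_mask(FieldInfo f)
{
	// Build `width` low one-bits in 64-bit arithmetic, so that a field
	// covering the whole 32-bit word never shifts by the word size,
	// then move them up into place.
	return static_cast<uint32_t>(((uint64_t{1} << field_width(f)) - 1)
	                             << field_shift(f));
}

static_assert(field_shift(FieldInfo{3, 7}) == 3, "bits 3..7 start at 3");
static_assert(field_mask(FieldInfo{3, 7}) == 0xF8, "five bits from bit 3");
// The reference and bias fields from the #define-s earlier in this post.
static_assert(field_mask(FieldInfo{8, 9}) == 0x300, "reference mask");
static_assert(field_mask(FieldInfo{16, 23}) == 0xFF0000, "bias mask");
```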

<p>The simplest way to build a proxy of a field, so that we can access it,
is to define a method within our bitpack that invokes
the <code class="language-plaintext highlighter-rouge">protected</code> <code class="language-plaintext highlighter-rouge">Bitpack::member&lt;FieldType, FieldInfo&gt;</code> method with appropriate arguments.
This is called a “named member” and the C++ syntax involved
is encapsulated within the <code class="language-plaintext highlighter-rouge">BITPACK_MEMBER_ADD</code> macro,
to which we now direct our attention.</p>

<h2 id="named-members">Named Members</h2>

<p>We can define a bitpack for our device’s control register,
and specify a named member for each field within, thus:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">NuDevice</span>
<span class="p">{</span>
	<span class="k">struct</span> <span class="nc">Control</span> <span class="o">:</span> <span class="n">Bitpack</span><span class="o">&lt;</span><span class="kt">uint32_t</span><span class="o">&gt;</span>
	<span class="p">{</span>
		<span class="n">BITPACK_USUAL_PREFIX</span><span class="p">;</span>

		<span class="n">BITPACK_MEMBER_ADD</span><span class="p">(</span><span class="n">enable</span><span class="p">,</span> <span class="kt">bool</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span> <span class="c1">// NoGo is false / Go is true</span>

		<span class="n">BITPACK_MEMBER_ADD</span><span class="p">(</span><span class="n">active</span><span class="p">,</span> <span class="kt">bool</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="nb">true</span><span class="p">);</span> <span class="c1">// Off / Running</span>

		<span class="n">BITPACK_MEMBER_ADD</span><span class="p">(</span><span class="n">reference</span><span class="p">,</span>
		                   <span class="kt">uint8_t</span><span class="p">,</span>
		                   <span class="mi">8</span><span class="p">,</span>
		                   <span class="mi">9</span><span class="p">);</span> <span class="c1">// values from MYDEVICE_CONTROL_REFERENCE_V_...</span>

		<span class="n">BITPACK_MEMBER_ADD</span><span class="p">(</span><span class="n">bias</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">23</span><span class="p">);</span>
	<span class="p">}</span> <span class="n">control</span><span class="p">;</span>

	<span class="kt">uint32_t</span> <span class="n">data</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>As discussed, <code class="language-plaintext highlighter-rouge">struct Control</code> now inherits from <code class="language-plaintext highlighter-rouge">Bitpack&lt;uint32_t&gt;</code>.
The <code class="language-plaintext highlighter-rouge">BITPACK_USUAL_PREFIX</code> macro hides some syntactic chatter that’s often useful
(including, notably, constructors and assignment operators).</p>

<p>Our fields are now specified by invocations of the <code class="language-plaintext highlighter-rouge">BITPACK_MEMBER_ADD(n, T, ...)</code> macro.
These invocations define methods named by <code class="language-plaintext highlighter-rouge">n</code>,
each returning a <code class="language-plaintext highlighter-rouge">T</code>-based field proxy for the field whose <code class="language-plaintext highlighter-rouge">FieldInfo</code> follows.</p>

<p>Field proxies offer a number of methods on the field they reference:</p>

<ul>
  <li>
    <p>Getting the current value of the field.
Proxies implicitly convert to their field’s type, so this is often syntactically “free”,
but it is also explicitly available as the proxy’s <code class="language-plaintext highlighter-rouge">.raw()</code> method.
In implementation, this is the expected mask and shift that extract the field’s value.</p>
  </li>
  <li>
    <p>Setting the field value.
Proxies overload the assignment operator (<code class="language-plaintext highlighter-rouge">=</code>) to update the field’s bits appropriately.
All other bits in the underlying word are not altered.
In implementation, this is the expected sequence of shift, complement, AND, and OR operations (as we’ve seen above).
This is not provided by proxies of fields whose <code class="language-plaintext highlighter-rouge">FieldInfo::isConst</code> flag is <code class="language-plaintext highlighter-rouge">true</code>.</p>
  </li>
  <li>
    <p>Read-modify-write functionality.
Proxies provide a <code class="language-plaintext highlighter-rouge">.alter(f)</code> method which invokes the callback <code class="language-plaintext highlighter-rouge">f</code> with the field’s current value
and updates the field’s value with the callback’s return value.</p>
  </li>
  <li>
    <p>Cloning with override.
Proxies provide a <code class="language-plaintext highlighter-rouge">.with()</code> method that behaves much like <code class="language-plaintext highlighter-rouge">.alter()</code>
except that it returns a modified <em>copy</em> of the word.
<code class="language-plaintext highlighter-rouge">.with()</code> can also be given a new value directly, rather than a callback.</p>
  </li>
  <li>
    <p>Comparison.
Proxies overload the <code class="language-plaintext highlighter-rouge">&lt;=&gt;</code> operator,
and can be compared to other proxies (of compatible types) or values of the field’s type.</p>
  </li>
</ul>
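
<p>For intuition only, the first three of those operations can be modeled with a toy proxy over a plain <code class="language-plaintext highlighter-rouge">uint32_t</code>.
This is emphatically not the CHERIoT-RTOS implementation:
<code class="language-plaintext highlighter-rouge">ToyProxy</code> and <code class="language-plaintext highlighter-rouge">toy_demo</code> are made-up names,
the field bounds are passed as plain template arguments rather than a <code class="language-plaintext highlighter-rouge">FieldInfo</code>,
and <code class="language-plaintext highlighter-rouge">.with()</code>, comparison, and <code class="language-plaintext highlighter-rouge">volatile</code> handling are all omitted.</p>

```cpp
#include <cstddef>
#include <cstdint>

// A toy field proxy over a plain uint32_t; NOT the CHERIoT-RTOS implementation.
template<typename T, std::size_t Min, std::size_t Max>
class ToyProxy
{
	static_assert(Min <= Max && Max < 32, "field must fit in the word");

	// In-situ mask covering bits Min..Max.
	static constexpr uint32_t Mask =
	  static_cast<uint32_t>(((uint64_t{1} << (Max - Min + 1)) - 1) << Min);

	uint32_t &word;

	public:
	constexpr explicit ToyProxy(uint32_t &w) : word(w) {}

	// Get: mask, then shift down.
	constexpr T raw() const
	{
		return static_cast<T>((word & Mask) >> Min);
	}

	// Implicit conversion makes reads syntactically "free".
	constexpr operator T() const
	{
		return raw();
	}

	// Set: clear the field's bits, then OR in the new value.
	constexpr ToyProxy &operator=(T value)
	{
		word = (word & ~Mask) | ((static_cast<uint32_t>(value) << Min) & Mask);
		return *this;
	}

	// Read-modify-write: pass the current value to f, store its result.
	template<typename F>
	constexpr ToyProxy &alter(F &&f)
	{
		return *this = f(raw());
	}
};

// Exercise the toy proxy on an 8-bit field at bits 16..23.
inline uint32_t toy_demo()
{
	uint32_t                  word = 0xFFFFFFFF;
	ToyProxy<uint8_t, 16, 23> bias{word};

	bias = 0x7f; // only bits 16..23 change
	bias.alter([](uint8_t b) { return static_cast<uint8_t>(b + 1); });

	return word;
}
```

<p>Starting from an all-ones word, the assignment and the increment touch only bits 16 through 23, which is the “all other bits are not altered” property described in the list above.</p>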

<p>Continuing our example, the code that uses this bitpack-flavored definition looks like…</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="nf">test</span><span class="p">(</span><span class="k">volatile</span> <span class="n">NuDevice</span> <span class="o">&amp;</span><span class="n">device</span><span class="p">)</span>
<span class="p">{</span>
	<span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">.</span><span class="n">enable</span><span class="p">()</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>

	<span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">.</span><span class="n">alter</span><span class="p">([](</span><span class="k">auto</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">c</span><span class="p">.</span><span class="n">reference</span><span class="p">()</span> <span class="o">=</span> <span class="n">MYDEVICE_CONTROL_REFERENCE_V_EXTERNAL_A</span><span class="p">;</span>
		<span class="n">c</span><span class="p">.</span><span class="n">bias</span><span class="p">()</span>      <span class="o">=</span> <span class="mh">0x7f</span><span class="p">;</span>
		<span class="k">return</span> <span class="n">c</span><span class="p">;</span>
	<span class="p">});</span>

	<span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">.</span><span class="n">enable</span><span class="p">()</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>

	<span class="k">while</span> <span class="p">(</span><span class="o">!</span><span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">.</span><span class="n">read</span><span class="p">().</span><span class="n">active</span><span class="p">())</span>
	<span class="p">{</span>
		<span class="n">yield</span><span class="p">();</span>
	<span class="p">}</span>

	<span class="k">return</span> <span class="n">device</span><span class="p">.</span><span class="n">data</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This compiles to the same code as the <code class="language-plaintext highlighter-rouge">test(volatile RawDevice &amp;)</code> implementation above
(and avoids the extra write that we had to introduce for <code class="language-plaintext highlighter-rouge">test(volatile BFDevice &amp;)</code>).</p>
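<p>For readers who want to see the shift-and-mask arithmetic behind the proxy operations listed earlier,
here is a minimal, self-contained sketch.
It is not the <code class="language-plaintext highlighter-rouge">bitpack.hh</code> implementation:
<code class="language-plaintext highlighter-rouge">FieldProxy</code>, its members, and <code class="language-plaintext highlighter-rouge">proxy_demo</code> are hypothetical names,
the bit positions are runtime values rather than template parameters,
and the word is assumed to be a plain (non-<code class="language-plaintext highlighter-rouge">volatile</code>) <code class="language-plaintext highlighter-rouge">uint32_t</code>.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a field proxy over a plain (non-volatile) word.
struct FieldProxy
{
	std::uint32_t &word;  // underlying word
	unsigned       lo;    // lowest bit of the field
	unsigned       width; // field width in bits (assumed to be less than 32)

	std::uint32_t mask() const
	{
		return ((std::uint32_t{1} << width) - 1) << lo;
	}

	// Get: mask, then shift down.
	std::uint32_t raw() const
	{
		return (word & mask()) >> lo;
	}

	// Set: complement + AND clears the field, OR installs the new value.
	FieldProxy &operator=(std::uint32_t value)
	{
		word = (word & ~mask()) | ((value << lo) & mask());
		return *this;
	}

	// Read-modify-write via callback.
	template<typename F>
	void alter(F f)
	{
		*this = f(raw());
	}

	// Clone with override: the original word is left untouched.
	std::uint32_t with(std::uint32_t value) const
	{
		std::uint32_t copy = word;
		FieldProxy{copy, lo, width} = value;
		return copy;
	}
};

inline void proxy_demo()
{
	std::uint32_t word = 0xFFFF00FFu;
	FieldProxy    bias{word, 16, 8};
	bias = 0x7f;
	assert(word == 0xFF7F00FFu);
	assert(bias.with(0) == 0xFF0000FFu && word == 0xFF7F00FFu);
}
```

<p>The point of the sketch is only that every operation touches the field's own bits and nothing else;
the real proxies additionally carry the field's type and constness.</p>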

<p>Proxy assignments and <code class="language-plaintext highlighter-rouge">.alter()</code> methods are defined for <code class="language-plaintext highlighter-rouge">volatile</code> bitpack references
(such as <code class="language-plaintext highlighter-rouge">device.control</code> in this example, because the containing <code class="language-plaintext highlighter-rouge">NuDevice</code> is <code class="language-plaintext highlighter-rouge">volatile</code>),
and each such operation performs a read-modify-write on the underlying word.
Proxies are pickier about field <em>getter</em> operations,
and will refuse to operate on <code class="language-plaintext highlighter-rouge">volatile</code> bitpack references;
the bitpack itself offers a <code class="language-plaintext highlighter-rouge">.read()</code> method
that returns a non-<code class="language-plaintext highlighter-rouge">volatile</code> <em>snapshot</em> of the (<code class="language-plaintext highlighter-rouge">volatile</code>) bitpack.
This serves to keep accesses explicit in the source
and allows for clearly differentiating multiple tests of a single read snapshot
from many tests each performing its own read of field value(s).</p>
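<p>The snapshot discipline can be imitated in a few lines of ordinary C++.
The sketch below is an assumption-laden illustration, not the bitpack mechanism:
it relies on plain cv-qualification of member functions instead of constrained templates,
and <code class="language-plaintext highlighter-rouge">Control</code>, <code class="language-plaintext highlighter-rouge">CanGetActive</code>, and <code class="language-plaintext highlighter-rouge">enabled_and_active</code> are hypothetical names.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical register type; not the bitpack API.
struct Control
{
	std::uint32_t word;

	// The getter is const- but not volatile-qualified,
	// so it cannot be called on a volatile Control.
	bool active() const
	{
		return ((word >> 1) & 1u) != 0;
	}

	// Snapshot: one read of the volatile word, returned as a plain copy.
	Control read() const volatile
	{
		return Control{word};
	}
};

// C++17 detection idiom: does `t.active()` compile for a given T?
template<typename T, typename = void>
struct CanGetActive : std::false_type
{
};
template<typename T>
struct CanGetActive<T, std::void_t<decltype(std::declval<T &>().active())>>
  : std::true_type
{
};

static_assert(CanGetActive<Control>::value, "plain values expose the getter");
static_assert(!CanGetActive<volatile Control>::value,
              "volatile values must be snapshotted first");

// Two tests against one snapshot: a single read of the volatile word.
inline bool enabled_and_active(const volatile Control &c)
{
	Control snapshot = c.read();
	return (snapshot.word & 1u) != 0 && snapshot.active();
}

inline void snapshot_demo()
{
	volatile Control reg{0x3};
	assert(enabled_and_active(reg));
}
```

<p>Because the read happens in exactly one named place, a reviewer can count device accesses by counting calls to <code class="language-plaintext highlighter-rouge">read()</code>.</p>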

<h3 id="compiler-error-messages">Compiler Error Messages</h3>

<p>The compiler error messages for attempts to get a value from a <code class="language-plaintext highlighter-rouge">volatile</code> bitpack are,
admittedly, nothing short of unfortunate.
The best (well, most informative) case happens if we replace
<code class="language-plaintext highlighter-rouge">while (!device.control.read().active())</code> with <code class="language-plaintext highlighter-rouge">while(!device.control.active().raw())</code>
and looks something like this:</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">sdk/include/bitpack.hh:657:12: error: no matching conversion for static_cast from 'const Bitpack&lt;...&gt;</span>::Field&lt;bool, FieldInfo<span class="o">{</span>...<span class="o">}&gt;</span>::Proxy&lt;...&gt;<span class="s1">' to '</span>Field::Type<span class="s1">' (aka '</span>bool<span class="s1">')
</span><span class="c">...
</span><span class="gp">./tmp/bitpack-blog.cc:172:34: note: in instantiation of member function 'Bitpack&lt;...&gt;</span>::Field&lt;bool, FieldInfo<span class="o">{</span>...<span class="o">}&gt;</span>::Proxy&lt;...&gt;::raw<span class="s1">' requested here
</span><span class="go">  172 |         while (!device.control.active().raw())
      |                                         ^
./sdk/include/bitpack.hh:649:14: note: candidate template ignored: constraints not satisfied [with Self = ...]
</span><span class="c">...
</span><span class="gp">./sdk/include/bitpack.hh:648:10: note: because '!std::is_volatile_v&lt;...&gt;</span><span class="s1">' evaluated to false
</span><span class="c">...
</span></code></pre></div></div>
<p>only with a lot more C++ template and Bitpack internal goo in place of those ellipses.
At least it points at <code class="language-plaintext highlighter-rouge">is_volatile</code> as the problem!</p>

<p>Sometimes the failure occurs while C++ is attempting further automagic (say, searching for implicit conversions).
If we instead wrote <code class="language-plaintext highlighter-rouge">while(!device.control.active())</code> (note the missing <code class="language-plaintext highlighter-rouge">.raw()</code>),
the compiler does not report the root cause at all and instead
offers more mysterious diagnostics like</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">./tmp/bitpack-blog.cc:172:9: error: invalid argument type 'typename Field&lt;bool, FieldInfo{...}&gt;</span>::Proxy&lt;...&gt;<span class="s1">'
</span><span class="gp">      (aka 'Proxy&lt;...&gt;</span><span class="s1">'</span><span class="o">)</span> to unary expression
<span class="go">  172 |         while (!device.control.active())
      |                ^~~~~~~~~~~~~~~~~~~~~~~~
</span></code></pre></div></div>
<p>This latter report is particularly confusing because it fails to point at <code class="language-plaintext highlighter-rouge">volatile</code> as a possible cause.
Ideas for improving error messages are <em>eagerly</em> accepted.</p>

<h2 id="named-members-with-custom-global-types">Named Members with Custom Global Types</h2>

<p>Much of the bitpack machinery is an attempt to encourage the use of fields with <em>custom types</em>.
For example, we might give a name to the “reference” enumeration for our device, thus:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">enum</span> <span class="k">class</span> <span class="nc">MyDeviceControlReference</span> <span class="o">:</span> <span class="kt">uint8_t</span>
<span class="p">{</span>
	<span class="n">Internal</span>  <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
	<span class="n">ExternalA</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span>
	<span class="n">ExternalB</span> <span class="o">=</span> <span class="mi">3</span><span class="p">,</span>
<span class="p">};</span>
</code></pre></div></div>

<p>We can take advantage of this type by changing the <code class="language-plaintext highlighter-rouge">reference</code> member in our bitpack:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">NeDevice</span>
<span class="p">{</span>
	<span class="k">struct</span> <span class="nc">Control</span> <span class="o">:</span> <span class="n">Bitpack</span><span class="o">&lt;</span><span class="kt">uint32_t</span><span class="o">&gt;</span>
	<span class="p">{</span>
		<span class="n">BITPACK_USUAL_PREFIX</span><span class="p">;</span>

		<span class="n">BITPACK_MEMBER_ADD</span><span class="p">(</span><span class="n">enable</span><span class="p">,</span> <span class="kt">bool</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span> <span class="c1">// NoGo is false / Go is true</span>

		<span class="n">BITPACK_MEMBER_ADD</span><span class="p">(</span><span class="n">active</span><span class="p">,</span> <span class="kt">bool</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="nb">true</span><span class="p">);</span> <span class="c1">// Off / Running</span>

		<span class="n">BITPACK_MEMBER_ADD</span><span class="p">(</span><span class="n">reference</span><span class="p">,</span> <span class="n">MyDeviceControlReference</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">);</span>

		<span class="n">BITPACK_MEMBER_ADD</span><span class="p">(</span><span class="n">bias</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">23</span><span class="p">);</span>
	<span class="p">}</span> <span class="n">control</span><span class="p">;</span>

	<span class="kt">uint32_t</span> <span class="n">data</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The corresponding use code changes very little,
but introduces our first <em>use-side</em> bitpack macro, <code class="language-plaintext highlighter-rouge">BITPACK_OPERATE_QUALIFY</code>:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="nf">test</span><span class="p">(</span><span class="k">volatile</span> <span class="n">NeDevice</span> <span class="o">&amp;</span><span class="n">device</span><span class="p">)</span>
<span class="p">{</span>
	<span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">.</span><span class="n">enable</span><span class="p">()</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>

	<span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">.</span><span class="n">alter</span><span class="p">([](</span><span class="k">auto</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">BITPACK_OPERATE_QUALIFY</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">reference</span><span class="p">(),</span> <span class="o">=</span><span class="p">,</span> <span class="n">ExternalA</span><span class="p">);</span>
		<span class="n">c</span><span class="p">.</span><span class="n">bias</span><span class="p">()</span> <span class="o">=</span> <span class="mh">0x7f</span><span class="p">;</span>
		<span class="k">return</span> <span class="n">c</span><span class="p">;</span>
	<span class="p">});</span>

	<span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">.</span><span class="n">enable</span><span class="p">()</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>

	<span class="c1">// .read() makes read of volatile word clear in source</span>
	<span class="k">while</span> <span class="p">(</span><span class="o">!</span><span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">.</span><span class="n">read</span><span class="p">().</span><span class="n">active</span><span class="p">())</span>
	<span class="p">{</span>
		<span class="n">yield</span><span class="p">();</span>
	<span class="p">}</span>

	<span class="k">return</span> <span class="n">device</span><span class="p">.</span><span class="n">data</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Again, this compiles to the same code as the <code class="language-plaintext highlighter-rouge">test(volatile NuDevice &amp;)</code> implementation above.</p>

<p>That <code class="language-plaintext highlighter-rouge">BITPACK_OPERATE_QUALIFY</code> macro invocation lets us avoid the need to manually qualify the <code class="language-plaintext highlighter-rouge">ExternalA</code> name;
C++ name resolution rules would require that we write</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">c</span><span class="p">.</span><span class="n">reference</span><span class="p">()</span> <span class="o">=</span> <span class="n">MyDeviceControlReference</span><span class="o">::</span><span class="n">ExternalA</span><span class="p">;</span>
</code></pre></div></div>
<p>The macro invocation takes advantage of the ability to look at the field’s type at compile time.
It expands, essentially, to</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">c</span><span class="p">.</span><span class="n">reference</span><span class="p">()</span> <span class="o">=</span> <span class="p">(</span><span class="k">decltype</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">reference</span><span class="p">())</span><span class="o">::</span><span class="n">Field</span><span class="o">::</span><span class="n">Type</span><span class="p">)</span><span class="o">::</span><span class="n">ExternalA</span><span class="p">;</span>
</code></pre></div></div>
<p>which is to say</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">c</span><span class="p">.</span><span class="n">reference</span><span class="p">()</span> <span class="o">=</span> <span class="n">MyDeviceControlReference</span><span class="o">::</span><span class="n">ExternalA</span><span class="p">;</span>
</code></pre></div></div>
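<p>The qualification trick can be tried in isolation.
The following is a sketch under stated assumptions rather than the real macro:
<code class="language-plaintext highlighter-rouge">OPERATE_QUALIFY</code>, <code class="language-plaintext highlighter-rouge">Proxy</code>, and <code class="language-plaintext highlighter-rouge">qualify_demo</code> are hypothetical,
and the proxy exposes its field type directly as <code class="language-plaintext highlighter-rouge">Type</code> instead of via <code class="language-plaintext highlighter-rouge">Field::Type</code>.</p>

```cpp
#include <cassert>
#include <cstdint>

enum class MyDeviceControlReference : std::uint8_t
{
	Internal  = 0,
	ExternalA = 1,
	ExternalB = 3,
};

// Hypothetical, much-simplified proxy: it records the field's type
// and refers to storage of that type.
template<typename T>
struct Proxy
{
	using Type = T;
	T &slot;

	Proxy &operator=(T v)
	{
		slot = v;
		return *this;
	}
	bool operator==(T v) const
	{
		return slot == v;
	}
};

struct Control
{
	MyDeviceControlReference stored = MyDeviceControlReference::Internal;

	Proxy<MyDeviceControlReference> reference()
	{
		return {stored};
	}
};

// Hypothetical macro: qualify a bare enumerator name with the
// proxied field's type, discovered via decltype.
#define OPERATE_QUALIFY(proxy, op, name) \
	((proxy) op decltype(proxy)::Type::name)

inline bool qualify_demo()
{
	Control c;
	OPERATE_QUALIFY(c.reference(), =, ExternalA);
	assert(c.stored == MyDeviceControlReference::ExternalA);
	return OPERATE_QUALIFY(c.reference(), ==, ExternalA);
}
```

<p>Note that the enumeration's name never appears at the use site, so renaming or relocating the type does not disturb the code that uses it.</p>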

<h2 id="named-members-with-custom-local-types">Named Members with Custom Local Types</h2>

<p>The same <code class="language-plaintext highlighter-rouge">test</code> code (other than the type of the parameter) also works
(and compiles to the same code)
when we move the reference field’s enumeration type into the <code class="language-plaintext highlighter-rouge">Control</code> structure itself:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">NaDevice</span>
<span class="p">{</span>
	<span class="k">struct</span> <span class="nc">Control</span> <span class="o">:</span> <span class="n">Bitpack</span><span class="o">&lt;</span><span class="kt">uint32_t</span><span class="o">&gt;</span>
	<span class="p">{</span>
		<span class="n">BITPACK_USUAL_PREFIX</span><span class="p">;</span>

		<span class="n">BITPACK_MEMBER_ADD</span><span class="p">(</span><span class="n">enable</span><span class="p">,</span> <span class="kt">bool</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span> <span class="c1">// NoGo is false / Go is true</span>

		<span class="n">BITPACK_MEMBER_ADD</span><span class="p">(</span><span class="n">active</span><span class="p">,</span> <span class="kt">bool</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="nb">true</span><span class="p">);</span> <span class="c1">// Off / Running</span>

		<span class="k">enum</span> <span class="k">class</span> <span class="nc">Reference</span> <span class="o">:</span> <span class="kt">uint8_t</span>
		<span class="p">{</span>
			<span class="n">Internal</span>  <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
			<span class="n">ExternalA</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span>
			<span class="n">ExternalB</span> <span class="o">=</span> <span class="mi">3</span><span class="p">,</span>
		<span class="p">};</span>
		<span class="n">BITPACK_MEMBER_ADD</span><span class="p">(</span><span class="n">reference</span><span class="p">,</span> <span class="n">Reference</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">);</span>

		<span class="n">BITPACK_MEMBER_ADD</span><span class="p">(</span><span class="n">bias</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">23</span><span class="p">);</span>
	<span class="p">}</span> <span class="n">control</span><span class="p">;</span>

	<span class="kt">uint32_t</span> <span class="n">data</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>With that definition, the <code class="language-plaintext highlighter-rouge">BITPACK_OPERATE_QUALIFY(c.reference(), =, ExternalA)</code> macro invocation
in our <code class="language-plaintext highlighter-rouge">test</code> function still expands to</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">c</span><span class="p">.</span><span class="n">reference</span><span class="p">()</span> <span class="o">=</span> <span class="p">(</span><span class="k">decltype</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">reference</span><span class="p">())</span><span class="o">::</span><span class="n">Field</span><span class="o">::</span><span class="n">Type</span><span class="p">)</span><span class="o">::</span><span class="n">ExternalA</span><span class="p">;</span>
</code></pre></div></div>
<p>but that now means</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">c</span><span class="p">.</span><span class="n">reference</span><span class="p">()</span> <span class="o">=</span> <span class="n">NaDevice</span><span class="o">::</span><span class="n">Control</span><span class="o">::</span><span class="n">Reference</span><span class="o">::</span><span class="n">ExternalA</span><span class="p">;</span>
</code></pre></div></div>

<p>We strongly encourage the use of such local types,
as doing so uses the device and register structures as namespaces,
minimizing the amount of stuff at global scope and making the purpose of a type clearer.</p>

<h2 id="local-types-as-member-names">Local Types as Member Names</h2>

<p>In fact, fields can often be thought of as being <em>named by their type</em>:
<em>this device’s</em> enable flag is of a different type from <em>that device’s</em>,
even though they’re both active-high single-bit fields,
and similarly for the other fields.
It would be strange, after all, to read configuration values from one device and write them directly to another;
we can ask the C++ type system to help ensure that we don’t accidentally do so.</p>

<p>Towards this end, bitpacks include a fair bit of machinery for introducing a custom type for a particular field
and then using that type <em>as the name</em> of that field.
To make use of this machinery, we first replace (or augment) the <code class="language-plaintext highlighter-rouge">BITPACK_MEMBER_ADD</code> macros above
with <code class="language-plaintext highlighter-rouge">BITPACK_MEMBER_ADD_*</code> macros,
which simultaneously introduce a new type and associate field information with it.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">TyDevice</span>
<span class="p">{</span>
	<span class="k">struct</span> <span class="nc">Control</span> <span class="o">:</span> <span class="n">Bitpack</span><span class="o">&lt;</span><span class="kt">uint32_t</span><span class="o">&gt;</span>
	<span class="p">{</span>
		<span class="n">BITPACK_USUAL_PREFIX</span><span class="p">;</span>

		<span class="n">BITPACK_MEMBER_ADD_ENUM_BOOL</span><span class="p">(</span><span class="n">Enable</span><span class="p">,</span> <span class="n">NoGo</span><span class="p">,</span> <span class="n">Go</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>

		<span class="n">BITPACK_MEMBER_ADD_ENUM_BOOL</span><span class="p">(</span><span class="n">Active</span><span class="p">,</span> <span class="n">Off</span><span class="p">,</span> <span class="n">Running</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="nb">true</span><span class="p">);</span>

		<span class="n">BITPACK_MEMBER_ADD_ENUM</span><span class="p">(</span><span class="n">Reference</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">)</span> <span class="p">{</span>
			<span class="n">Internal</span>  <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
			<span class="n">ExternalA</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span>
			<span class="n">ExternalB</span> <span class="o">=</span> <span class="mi">3</span><span class="p">,</span>
		<span class="p">};</span>

		<span class="n">BITPACK_MEMBER_ADD_NUMERIC</span><span class="p">(</span><span class="n">Bias</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">23</span><span class="p">);</span>
	<span class="p">}</span> <span class="n">control</span><span class="p">;</span>

	<span class="kt">uint32_t</span> <span class="n">data</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">BITPACK_MEMBER_ADD_ENUM(T, B, ...)</code> defines an <code class="language-plaintext highlighter-rouge">enum class T</code> with base type <code class="language-plaintext highlighter-rouge">B</code>;
the definition of the enumeration follows the macro.
<code class="language-plaintext highlighter-rouge">BITPACK_MEMBER_ADD_ENUM_BOOL(T, VF, VT, bit, ...)</code> is a wrapper thereof,
defining an <code class="language-plaintext highlighter-rouge">enum class T : bool</code> with the enumerator <code class="language-plaintext highlighter-rouge">VF</code> having value <code class="language-plaintext highlighter-rouge">false</code> and <code class="language-plaintext highlighter-rouge">VT</code> <code class="language-plaintext highlighter-rouge">true</code>;
because booleans occupy exactly one bit,
it is not necessary to specify both the minimum and maximum bit positions.
<code class="language-plaintext highlighter-rouge">BITPACK_MEMBER_ADD_NUMERIC(T, B, ...)</code> defines a class <code class="language-plaintext highlighter-rouge">T</code> which inherits from a <code class="language-plaintext highlighter-rouge">Numeric&lt;B&gt;</code> base,
an internal helper class that provides, for example, implicit conversions to and from <code class="language-plaintext highlighter-rouge">B</code>.
(Not all operations on <code class="language-plaintext highlighter-rouge">B</code> are supported by <code class="language-plaintext highlighter-rouge">Numeric&lt;B&gt;</code> and derived classes;
if this proves problematic, do let us know and we’ll see what can be done!)</p>
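<p>To make the value of per-field types concrete, here is a rough sketch of what a <code class="language-plaintext highlighter-rouge">Numeric&lt;B&gt;</code>-style wrapper could look like.
Everything in it is an assumption made for illustration:
the real helper in <code class="language-plaintext highlighter-rouge">bitpack.hh</code> may be shaped quite differently,
and <code class="language-plaintext highlighter-rouge">BiasA</code>, <code class="language-plaintext highlighter-rouge">BiasB</code>, <code class="language-plaintext highlighter-rouge">half</code>, and <code class="language-plaintext highlighter-rouge">numeric_demo</code> are hypothetical names.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the Numeric<B> helper described above.
template<typename B>
struct Numeric
{
	B value{};

	constexpr Numeric() = default;
	constexpr Numeric(B v) : value(v) {} // implicit conversion from B
	constexpr operator B() const         // implicit conversion to B
	{
		return value;
	}
};

// Two distinct field types with the same representation,
// as two different devices might each define.
struct BiasA : Numeric<std::uint8_t>
{
	constexpr BiasA(std::uint8_t v) : Numeric(v) {}
};
struct BiasB : Numeric<std::uint8_t>
{
	constexpr BiasB(std::uint8_t v) : Numeric(v) {}
};

static_assert(std::is_convertible<std::uint8_t, BiasA>::value, "from B");
static_assert(std::is_convertible<BiasA, std::uint8_t>::value, "to B");
// One device's bias does not silently become another's.
static_assert(!std::is_convertible<BiasA, BiasB>::value, "no cross-talk");

constexpr std::uint8_t half(BiasA b)
{
	return static_cast<std::uint8_t>(b / 2);
}
static_assert(half(0x7f) == 0x3f, "arithmetic goes through B");

inline void numeric_demo()
{
	BiasA        a   = 0x7f;
	std::uint8_t raw = a;
	assert(raw == 0x7f);
}
```

<p>The last <code class="language-plaintext highlighter-rouge">static_assert</code> is the payoff:
implicit conversion between the two bias types would need two user-defined conversions in a row, which C++ does not perform.</p>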

<p>Rather than having named methods for each field of our bitpack,
we now make use of a singular method template, <code class="language-plaintext highlighter-rouge">.member&lt;T&gt;()</code>,
which returns a proxy of <em>the</em> field whose type is <code class="language-plaintext highlighter-rouge">T</code>.
Because the types are defined within a bitpack type itself,
they often need qualification:</p>

<ul>
  <li>
    <p>The <code class="language-plaintext highlighter-rouge">BITPACK_MEMBER_DECLTYPE(b, T)</code> macro gets the proxy of the field whose type is
<code class="language-plaintext highlighter-rouge">T</code> qualified by the type of the bitpack value <code class="language-plaintext highlighter-rouge">b</code>.
(That is, it expands to <code class="language-plaintext highlighter-rouge">(b).template member&lt;decltype(b)::T&gt;()</code>.)</p>
  </li>
  <li>
    <p>If the type of the bitpack value <code class="language-plaintext highlighter-rouge">b</code> is “dependent” (in the C++ sense),
use the <code class="language-plaintext highlighter-rouge">BITPACK_MEMBER_DEPENDENT(b, T)</code> macro,
which differs only in that it calls for dependent resolution of the type named <code class="language-plaintext highlighter-rouge">T</code>.
(That is, it expands to <code class="language-plaintext highlighter-rouge">(b).template member&lt;typename decltype(b)::T&gt;()</code>.)</p>
  </li>
</ul>

<p>We can make use of macros that combine these and the <code class="language-plaintext highlighter-rouge">OPERATE_QUALIFY</code> form we’ve seen above
when writing code that uses these type-centric bitpack definitions:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="nf">test</span><span class="p">(</span><span class="k">volatile</span> <span class="n">TyDevice</span> <span class="o">&amp;</span><span class="n">device</span><span class="p">)</span>
<span class="p">{</span>
	<span class="n">BITPACK_OPERATE_QUALIFY_DECLTYPE</span><span class="p">(</span><span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">,</span> <span class="n">Enable</span><span class="p">,</span> <span class="o">=</span><span class="p">,</span> <span class="n">NoGo</span><span class="p">);</span>

	<span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">.</span><span class="n">alter</span><span class="p">([](</span><span class="k">auto</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">BITPACK_OPERATE_QUALIFY_DEPENDENT</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">Reference</span><span class="p">,</span> <span class="o">=</span><span class="p">,</span> <span class="n">ExternalA</span><span class="p">);</span>
		<span class="n">BITPACK_MEMBER_DEPENDENT</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">Bias</span><span class="p">)</span> <span class="o">=</span> <span class="mh">0x7f</span><span class="p">;</span>
		<span class="k">return</span> <span class="n">c</span><span class="p">;</span>
	<span class="p">});</span>

	<span class="n">BITPACK_OPERATE_QUALIFY_DECLTYPE</span><span class="p">(</span><span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">,</span> <span class="n">Enable</span><span class="p">,</span> <span class="o">=</span><span class="p">,</span> <span class="n">Go</span><span class="p">);</span>

	<span class="k">while</span> <span class="p">(</span><span class="n">BITPACK_OPERATE_QUALIFY_DECLTYPE</span><span class="p">(</span>
	  <span class="n">device</span><span class="p">.</span><span class="n">control</span><span class="p">.</span><span class="n">read</span><span class="p">(),</span> <span class="n">Active</span><span class="p">,</span> <span class="o">!=</span><span class="p">,</span> <span class="n">Running</span><span class="p">))</span>
	<span class="p">{</span>
		<span class="n">yield</span><span class="p">();</span>
	<span class="p">}</span>

	<span class="k">return</span> <span class="n">device</span><span class="p">.</span><span class="n">data</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This too compiles to the same code as the <code class="language-plaintext highlighter-rouge">test(volatile NuDevice &amp;)</code> implementation above.</p>

<p><code class="language-plaintext highlighter-rouge">BITPACK_OPERATE_QUALIFY_DECLTYPE(b, T, op, v)</code> is just
<code class="language-plaintext highlighter-rouge">BITPACK_OPERATE_QUALIFY(BITPACK_MEMBER_DECLTYPE(b, T), op, v)</code>.
Here, rather than using a method like <code class="language-plaintext highlighter-rouge">.reference()</code> to obtain a field proxy by name,
we use <code class="language-plaintext highlighter-rouge">BITPACK_MEMBER_DECLTYPE</code> to use the field’s <em>type</em> as its name and construct the appropriate proxy.
<code class="language-plaintext highlighter-rouge">BITPACK_OPERATE_QUALIFY_DEPENDENT</code>, analogously, uses <code class="language-plaintext highlighter-rouge">BITPACK_MEMBER_DEPENDENT</code> internally.
In much more detail, <code class="language-plaintext highlighter-rouge">BITPACK_OPERATE_QUALIFY_DECLTYPE(c, Reference, =, ExternalA)</code>
can be thought of as expanding to</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">BITPACK_OPERATE_QUALIFY</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="k">template</span> <span class="n">member</span><span class="o">&lt;</span><span class="n">TyDevice</span><span class="o">::</span><span class="n">Control</span><span class="o">::</span><span class="n">Reference</span><span class="p">&gt;(),</span> <span class="o">=</span><span class="p">,</span> <span class="n">ExternalA</span><span class="p">)</span>
</code></pre></div></div>
<p>which then expands to</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">c</span><span class="p">.</span><span class="k">template</span> <span class="n">member</span><span class="o">&lt;</span><span class="n">TyDevice</span><span class="o">::</span><span class="n">Control</span><span class="o">::</span><span class="n">Reference</span><span class="p">&gt;()</span> <span class="o">=</span>
  <span class="k">decltype</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="k">template</span> <span class="n">member</span><span class="o">&lt;</span><span class="n">TyDevice</span><span class="o">::</span><span class="n">Control</span><span class="o">::</span><span class="n">Reference</span><span class="p">&gt;())</span><span class="o">::</span><span class="n">Field</span><span class="o">::</span><span class="n">Type</span><span class="o">::</span><span class="n">ExternalA</span>
</code></pre></div></div>
<p>which is to say</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">c</span><span class="p">.</span><span class="k">template</span> <span class="n">member</span><span class="o">&lt;</span><span class="n">TyDevice</span><span class="o">::</span><span class="n">Control</span><span class="o">::</span><span class="n">Reference</span><span class="p">&gt;()</span> <span class="o">=</span> <span class="n">TyDevice</span><span class="o">::</span><span class="n">Control</span><span class="o">::</span><span class="n">Reference</span><span class="o">::</span><span class="n">ExternalA</span>
</code></pre></div></div>
<p>(Internally, the difference between explicitly named members and type-named members
amounts to the former invoking <code class="language-plaintext highlighter-rouge">.member&lt;FieldType, FieldInfo&gt;()</code> while the latter invokes <code class="language-plaintext highlighter-rouge">.member&lt;FieldType&gt;()</code>.
This second form uses some type-level computation machinery
buried inside the <code class="language-plaintext highlighter-rouge">BITPACK_USUAL_PREFIX</code> and <code class="language-plaintext highlighter-rouge">BITPACK_MEMBER_ADD_*</code> macros
to map from a <code class="language-plaintext highlighter-rouge">FieldType</code> to its <code class="language-plaintext highlighter-rouge">FieldInfo</code>.)</p>

<p>We could also have written, instead of <code class="language-plaintext highlighter-rouge">BITPACK_MEMBER_DEPENDENT(c, Bias) = 0x7f</code>,
<code class="language-plaintext highlighter-rouge">BITPACK_OPERATE_WRAP_DEPENDENT(c, Bias, =, 0x7f)</code>.
The former makes use of <code class="language-plaintext highlighter-rouge">Numeric&lt;T&gt;</code>’s implicit conversion from <code class="language-plaintext highlighter-rouge">T</code>, while the latter is more general.
The <code class="language-plaintext highlighter-rouge">BITPACK_OPERATE_WRAP_...</code> macros are similar to the <code class="language-plaintext highlighter-rouge">BITPACK_OPERATE_QUALIFY_...</code> macros,
except that they expand to invoke the indicated type’s constructor, rather than simply qualifying the value.
That is, <code class="language-plaintext highlighter-rouge">BITPACK_OPERATE_WRAP(proxy, op, value)</code> expands to <code class="language-plaintext highlighter-rouge">(proxy) op (F{value})</code>,
invoking the <code class="language-plaintext highlighter-rouge">F</code> constructor,
where <code class="language-plaintext highlighter-rouge">F</code> is the type of the field being proxied by <code class="language-plaintext highlighter-rouge">proxy</code> (specifically, <code class="language-plaintext highlighter-rouge">decltype(proxy)::Field::Type</code>).</p>

<h1 id="looking-to-the-future">Looking to the Future</h1>

<p>Bitpacks are a public experiment and, as such, a request for comments.
We are especially open to feedback about use cases that remain un-ergonomic or otherwise challenging.</p>

<p>At present, most of our use of bitpacks has been with manually written bitpack definitions.
In the interest of reducing the number of “sources of truth” that must be kept synchronized,
we have been pondering tools that could <em>generate</em>
bitpack definitions from other register description languages
(especially https://opentitan.org/book/util/reggen/index.html),
or assertions of equivalence between a manually-written bitpack definition and one in another language.
Should there be external interest, we would be keen to collaborate on this front.</p>

<p>Readers might rightly object that bitpacks, and their professed <em>typed</em> perspective of sub-word fields,
lean rather heavily on C preprocessor macros
(which act on the level of source code tokens, well below any notion of types).
We hope that C++’s reflection machinery (to first appear in C++26)
should allow many of these macros to go away once it’s a bit more complete.</p>

<h1 id="conclusion">Conclusion</h1>

<p>This has been a whirlwind tour of the “bitpack” machinery available in the CHERIoT-RTOS programming environment.
Bitpacks are meant to increase legibility, portability, and type-safety of low-level code that manipulates sub-fields of numeric words.
We hope that you find they have achieved their purpose!</p>]]></content><author><name>Nathaniel Wesley Filardo</name></author><category term="rtos" /><category term="programming" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Removing the AUICGP instruction</title><link href="https://cheriot.org/isa/toolchain/2026/03/23/removing-auicgp.html" rel="alternate" type="text/html" title="Removing the AUICGP instruction" /><published>2026-03-23T00:00:00+00:00</published><updated>2026-03-23T00:00:00+00:00</updated><id>https://cheriot.org/isa/toolchain/2026/03/23/removing-auicgp</id><content type="html" xml:base="https://cheriot.org/isa/toolchain/2026/03/23/removing-auicgp.html"><![CDATA[<blockquote>
  <p>Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away</p>

  <p>– Antoine de Saint-Exupéry</p>
</blockquote>

<p>In the last week or so, we’ve started landing patches in our compiler to remove the <code class="language-plaintext highlighter-rouge">auicgp</code> instruction.
This is something that we’d hoped to do for a while because we want to remove the instruction entirely in CHERIoT v2 (to be based on the upcoming RV32YE standard).
This post will explain why we had it, why we wanted to get rid of it, and how we’ve removed the need for it.</p>

<h2 id="what-is-auicgp">What is auicgp?</h2>

<p>In CHERIoT, every global is accessed relative to either the program counter (PC), for code or immutable globals, or the global pointer (GP), for read-write globals.
On RISC-V, PC-relative addressing is done via the <code class="language-plaintext highlighter-rouge">auipc</code> instruction, which takes a 20-bit immediate, left shifts it by 12, and adds it to the PC.
This is used paired with a load, store, or add (if you just want to materialise the address as a pointer) to fill in the remaining bits.</p>
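
<p>The split between the two immediates can be illustrated with a small Python model. This is a sketch rather than anything taken from the toolchain: the function names and the example addresses are made up for illustration, and it ignores 32-bit wrap-around. Because the 12-bit immediate of the second instruction is sign-extended, the upper part has to be rounded rather than truncated.</p>

```python
def split_hi_lo(offset):
    """Split a displacement into an upper part and a signed 12-bit lower part."""
    # The consuming instruction sign-extends its 12-bit immediate, so round
    # the upper part to the nearest multiple of 4 KiB instead of truncating.
    hi = (offset + 0x800) // 0x1000
    lo = offset - hi * 0x1000
    return hi, lo


def auipc_then_add(pc, hi, lo):
    """Model auipc (pc + hi * 4096) followed by an instruction that adds lo."""
    return pc + hi * 0x1000 + lo


pc = 0x2000_0040        # hypothetical address of the auipc instruction
target = 0x2000_1FFC    # hypothetical address of the global
hi, lo = split_hi_lo(target - pc)
print(hi, lo, hex(auipc_then_add(pc, hi, lo)))
```

<p>Here the displacement of 0x1fbc splits into an upper part of 2 and a lower part of -68, and adding both back to the PC recovers the target.</p>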

<p>For symmetry with base RISC-V (more detail on this later), we introduced a similar instruction that added a shifted immediate to the global pointer: add upper immediate to (capability) global pointer (<code class="language-plaintext highlighter-rouge">auicgp</code>).
Originally, because we’re based on RV32E, we used one of the bits in the destination-register field to differentiate this from the <code class="language-plaintext highlighter-rouge">auipcc</code> instruction, so it fitted into some spare encoding space.</p>

<p>In the initial public release, we changed this because RISC-V promises never to use that bit of encoding space for standard extensions and we hoped to standardise it.
Since then, the <code class="language-plaintext highlighter-rouge">auicgp</code> instruction has been the least popular bit of the CHERIoT ISA, for a variety of reasons.</p>

<p>First, it’s an enormous instruction.
With a register target and a 20-bit immediate, it uses a full major opcode.
That’s 1/128 of the total encoding space.
In practice, it’s worse than that: the RISC-V encoding requires the two lowest bits of every 32-bit instruction to be 11 (01, 10, and 00 are reserved for 16-bit encodings), so it’s actually 1/32 of the space available for 32-bit instructions.
That’s a lot to pay, though it might be worth it if the instruction were used a lot.</p>
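
<p>The arithmetic behind those two fractions is short enough to check directly. The following Python sketch only counts bit patterns; the constant names are chosen for illustration, and it ignores the further patterns that RISC-V reserves for encodings longer than 32 bits.</p>

```python
from fractions import Fraction

WORD_BITS = 32   # width of a standard (uncompressed) RISC-V instruction
RD_BITS = 5      # destination-register field
IMM_BITS = 20    # upper-immediate field
OPCODE_BITS = WORD_BITS - RD_BITS - IMM_BITS   # what remains for the major opcode

# One major opcode consumes every combination of the rd and immediate fields.
encodings_per_opcode = 2 ** (RD_BITS + IMM_BITS)

share_of_all_words = Fraction(encodings_per_opcode, 2 ** WORD_BITS)

# Only words whose two lowest bits are 11 are 32-bit instructions, which
# leaves 2**30 of the 2**32 possible words.
share_of_32bit_space = Fraction(encodings_per_opcode, 2 ** (WORD_BITS - 2))

print(OPCODE_BITS, share_of_all_words, share_of_32bit_space)
```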

<p>Unfortunately, <code class="language-plaintext highlighter-rouge">auicgp</code> is very rarely used.
Our ABI points the global pointer to the <em>middle</em> of the globals region, which means that we can use the entire 4 KiB range of the 12-bit immediate in loads, stores, and adds.
On RISC-V, the normal model is for the compiler to emit a long instruction sequence for operations like global accesses and then the linker to <em>relax</em> this by deleting instructions.
We followed this pattern for <code class="language-plaintext highlighter-rouge">auicgp</code>, with the linker deleting it if the immediate is 0.
As a result, almost all of the <code class="language-plaintext highlighter-rouge">auicgp</code> instructions were deleted.
Our original linker version didn’t implement support for relaxations (upstream LLD didn’t support them for RISC-V when we started working on CHERIoT!) and so we weren’t initially sure how feasible deleting them would be, but we’ve had linker relaxations working for a few years now and they work very well.</p>
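
<p>To see why pointing the global pointer at the middle of the region matters, the following Python sketch counts how many word-aligned globals fall outside the signed 12-bit displacement for two placements of the global pointer. The base address, the helper names, and the 4 KiB region size are assumptions made for the example, not values from the CHERIoT linker.</p>

```python
IMM12 = range(-2048, 2048)   # signed 12-bit displacement of loads, stores, and adds


def needs_upper_immediate(global_address, gp):
    """Return True when a global cannot be reached from gp in one instruction."""
    return (global_address - gp) not in IMM12


def count_unreachable(globals_base, globals_size, gp, step=4):
    """Count word-aligned addresses in the region that need a second instruction."""
    addresses = range(globals_base, globals_base + globals_size, step)
    return sum(needs_upper_immediate(address, gp) for address in addresses)


base = 0x2000_0000   # hypothetical start of a compartment's globals
size = 4096          # a full 4 KiB of globals

at_start = count_unreachable(base, size, gp=base)
in_middle = count_unreachable(base, size, gp=base + size // 2)
print(at_start, in_middle)
```

<p>With the global pointer at the start of the region, half of the 1024 words are out of reach; with it in the middle, none are, so every upper-immediate instruction for this region could be relaxed away.</p>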

<p>The RTOS test suite contains a few <code class="language-plaintext highlighter-rouge">auicgp</code>s, but only because we’ve intentionally written tests to make sure that large globals work.
In most firmware images, there are no instances of the instruction that survive relaxation.
So we’re using 1/32 of the encoding space for 32-bit instructions for an instruction that, to a first approximation, never gets used.</p>

<p>If that isn’t bad enough, it’s also annoying for simple pipelines to implement.
RISC-V is designed so that the source and destination registers are always in the same place in instructions.
Once you move to long pipelines, have register rename, or extensions with multiple register files (such as floating-point or vector extensions) then this is of little or no benefit.
But CHERIoT targets the kind of microcontroller design that RISC-V overfitted for.
Simple in-order pipelines find it useful to do register fetch early.
The <code class="language-plaintext highlighter-rouge">auicgp</code> instruction is the only one that has fetches from the general-purpose register file but does not have the registers in the correct location.
This adds extra multiplexing that, on a short in-order pipeline such as CHERIoT Ibex, impacts the critical-path length.</p>

<p>So, unfortunately, there is no situation in which this instruction is a good idea, except that it made it easy to create dense code via largely orthogonal code paths.
To understand why we incorporated it, you need to understand how RISC-V normally generates pointers to globals.</p>

<h2 id="why-did-we-have-auicgp">Why did we have auicgp?</h2>

<p>Let’s start with the simple case in RISC-V, of a static binary (how you’d normally compile embedded firmware) accessing a global.
A simple sequence loading an <code class="language-plaintext highlighter-rouge">int</code> from a global called <code class="language-plaintext highlighter-rouge">x</code> would look like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>	lui     a0, %hi(x)
	lw      a0, %lo(x)(a0)
</code></pre></div></div>

<p>This has two relocations.
The load upper immediate instruction will materialise the top 20 bits of the address via the <code class="language-plaintext highlighter-rouge">hi</code> relocation, then the load will materialise the low 12 bits in its immediate from the <code class="language-plaintext highlighter-rouge">lo</code> relocation.
This is very efficient, but doesn’t work on a CHERI system because you can’t just make up an integer address and use it for loads and stores.</p>

<p>That’s not a big problem, because it’s the same issue that position-independent code needs to handle, and RISC-V does this with the same number of instructions.
When you’re compiling position-independent code, RISC-V will instead generate the following sequence:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.Lpcrel_hi0:
	auipc   a0, %pcrel_hi(x)
	lw      a0, %pcrel_lo(.Lpcrel_hi0)(a0)
</code></pre></div></div>

<p>Again, this materialises the address in two instructions, including folding half of the address-materialisation into the load.
The difference is that the high bits are not created as an absolute address, they are created by adding a value to the current PC value.
The <code class="language-plaintext highlighter-rouge">%pcrel_lo</code> relocation looks a bit odd because it refers to the address of the <code class="language-plaintext highlighter-rouge">auipc</code> instruction, <em>not</em> the address of <code class="language-plaintext highlighter-rouge">x</code>.
This is because of limitations in ELF relocations.
The linker needs to know both the address of the <code class="language-plaintext highlighter-rouge">auipc</code> instruction (which doesn’t <em>have to be</em> next to the <code class="language-plaintext highlighter-rouge">lw</code>, and might not be if the <code class="language-plaintext highlighter-rouge">lw</code> is in a loop) and of the target, but ELF relocations can’t express both of these.
Fortunately, the linker can work around this by looking up the relocations that apply to the address of the <code class="language-plaintext highlighter-rouge">auipc</code> instruction to find the target.</p>

<p>On CHERIoT, accessing read-only globals works in <em>exactly</em> the same way.
They are stored within the bounds of the program counter for the current compartment (or library) and so we can compute a displacement and load.
Read-write globals are instead stored within a region reachable via the global pointer.
These capabilities don’t overlap, and they have distinct permissions.</p>

<p>This is where we encounter the first problem.
Most of the time, the compiler knows whether a global is read-only or not and so knows whether it needs to emit a sequence relative to the program counter or the global pointer.
This is not universally true.
In particular, C inherited a notion of ‘common linkage’ from Fortran and this makes it possible for different compilation units to <em>disagree</em> on whether something is read-only and leaves it for the linker to figure it out.</p>

<p>For read-only globals, the sequence looks like this (note: the <code class="language-plaintext highlighter-rouge">ct.</code> prefix is the official vendor prefix for CHERIoT instructions):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.LBB0_1:
	ct.auipcc    a0, %cheriot_compartment_hi(x)
	ct.clw       a0, %cheriot_compartment_lo_i(.LBB0_1)(a0)
</code></pre></div></div>

<p>But for read-write globals, it looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.LBB0_1:
	ct.auicgp   a0, %cheriot_compartment_hi(x)
	ct.clw      a0, %cheriot_compartment_lo_i(.LBB0_1)(a0)
</code></pre></div></div>

<p>This hopefully makes it clear how the <code class="language-plaintext highlighter-rouge">auicgp</code> instruction is useful.
These two sequences look identical and so the linker can rewrite one to the other by simply rewriting the opcode of the first instruction.
If the compiler tries to do PC-relative addressing for a read-write global, the linker can turn the <code class="language-plaintext highlighter-rouge">auipcc</code> into an <code class="language-plaintext highlighter-rouge">auicgp</code>, or vice versa.</p>
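
<p>The opcode rewrite can be modelled as replacing the low seven bits of a U-type instruction word while leaving the destination register and immediate alone. In the Python sketch below, the helper names are invented, 0x17 is the base RISC-V <code class="language-plaintext highlighter-rouge">auipc</code> opcode used as a stand-in for <code class="language-plaintext highlighter-rouge">auipcc</code>, and the <code class="language-plaintext highlighter-rouge">auicgp</code> opcode value is a placeholder that has not been checked against the CHERIoT ISA.</p>

```python
OPCODE_BITS = 7
OPCODE_SPAN = 2 ** OPCODE_BITS

AUIPC_OPCODE = 0x17    # major opcode of base RISC-V auipc
AUICGP_OPCODE = 0x7B   # placeholder for illustration, not checked against the ISA


def encode_u_type(opcode, rd, imm20):
    """Pack a U-type word: imm[31:12], rd[11:7], opcode[6:0]."""
    return imm20 * 2 ** 12 + rd * OPCODE_SPAN + opcode


def swap_opcode(word, new_opcode):
    """Replace the low seven bits and leave rd and the immediate untouched."""
    upper_fields, _old_opcode = divmod(word, OPCODE_SPAN)
    return upper_fields * OPCODE_SPAN + new_opcode


# rd=10 is a0; the immediate is an arbitrary example value.
pc_relative = encode_u_type(AUIPC_OPCODE, rd=10, imm20=0x12345)
gp_relative = swap_opcode(pc_relative, AUICGP_OPCODE)
print(hex(pc_relative), hex(gp_relative))
```

<p>Because the relocation on the following instruction refers to the label rather than to the opcode, nothing else in the sequence has to change in this model.</p>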

<p>For read-write globals, the linker can also do <em>relaxation</em>.
In the (common, almost universal) case that the address of <code class="language-plaintext highlighter-rouge">x</code> is within the displacement from the global pointer of a single instruction, the second sequence becomes this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>	ct.clw	a0, {displacement of x from the global pointer}(gp)
</code></pre></div></div>

<p>Note that this is now <em>fewer</em> instructions than the baseline RISC-V case.
We hit this in a lot of cases, because a lot of global accesses are simple reads and writes of scalar values.
That’s not universally true.
If the address of the global escapes the analyses that the compiler uses to prove that a dereference is in-bounds (which are currently fairly naïve, but will improve over time), then we must materialise a full capability.</p>

<p>This is why it’s important to design a CHERI ABI along with a threat model.
Our threat model requires that any compartment be able to enforce a list of memory-safety properties <em>against untrusted code</em>, but treats things like memory safety for stack or global variables <em>within</em> a compartment as defence-in-depth properties.
Code within a compartment is allowed to access all globals for that compartment and so the threat model says it’s fine to make the compiler responsible for enforcing those bounds.
But when a pointer to a global is passed to another compartment (possibly indirectly via another compilation unit, maybe in a different language, in the same compartment) then the compiler taking the address of that global <em>must</em> be able to apply bounds and permissions.</p>

<p>If we are taking the address of <code class="language-plaintext highlighter-rouge">x</code>, the compiler will generate a sequence like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.LBB0_1:
	ct.auicgp       a0, %cheriot_compartment_hi(x)
	ct.cincoffset   a0, a0, %cheriot_compartment_lo_i(.LBB0_1)
	ct.csetbounds   a0, a0, %cheriot_compartment_size(x)
</code></pre></div></div>

<p>Note that now the second part of the address isn’t folded into the load instruction.
This doesn’t hurt code size as much as you might think, because now we’re likely to do offset-zero loads and stores on the result and RISC-V lets us express these as 16-bit instructions.</p>

<p>The new instruction is applying the bounds.
As before, the linker will <em>normally</em> relax away the first instruction, so we end up with this sequence for taking the address of a global:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>	ct.cincoffset   a0, gp, {offset of x}
	ct.csetbounds   a0, a0, {size of x}
</code></pre></div></div>

<p>Again, this is the same number of instructions as baseline RISC-V, but now with bounds applied.
For comparison, here is the RISC-V version:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.Lpcrel_hi0:
	auipc   a0, %pcrel_hi(x)
	addi    a0, a0, %pcrel_lo(.Lpcrel_hi0)
</code></pre></div></div>

<p>There is one annoying case though: what happens if the bounds are too large for an immediate?
Here, we need an extra instruction (and register) to hold the immediate.
This means that our worst case (large bounds, address taken or not provably in bounds) is five instructions (if it takes two instructions to materialise the immediate for the bounds), much worse than our best case of one.</p>
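
<p>The instruction counts for taking the address of a global can be tabulated with a small Python model. This is a sketch under stated assumptions: the function name is invented, and the bounds-setting immediate is assumed to cover sizes from 0 to 4095 bytes, which the text does not state. It covers only the address-taking sequences; the one-instruction best case is the direct load or store described earlier.</p>

```python
SETBOUNDS_IMM = range(0, 4096)   # assumed immediate range of the bounds-setting instruction


def address_of_instruction_count(near_gp, size):
    """Count instructions needed to take the address of a global with bounds."""
    count = 0 if near_gp else 1   # auicgp survives only for distant globals
    count += 1                    # cincoffset adds the low part of the offset
    if size not in SETBOUNDS_IMM:
        # lui is always needed; addi only when the low 12 bits are non-zero
        count += 1 if size % 4096 == 0 else 2
    count += 1                    # csetbounds
    return count


for near_gp, size in [(True, 4), (True, 4112), (False, 8192), (False, 4112)]:
    print(near_gp, size, address_of_instruction_count(near_gp, size))
```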

<p>The approach taken on the ABI for big CHERI is to use an indirection layer, still referred to as a global offset table (GOT), even though it doesn’t contain offsets.
This mirrors the baseline RISC-V GOT model, where the same sequence would be:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.Lpcrel_hi0:
        auipc   a0, %got_pcrel_hi(x)
        lw      a0, %pcrel_lo(.Lpcrel_hi0)(a0)
</code></pre></div></div>

<p>At two instructions, this looks cheap, but note that the second is a load, which is loading the address from the GOT.
This means that, although it’s only eight bytes of instructions, it’s <em>also</em> one pointer’s worth of data (eight more bytes for CHERIoT, sixteen for big CHERI).
The second instruction is accessing memory (even on CHERIoT systems with local SRAM, memory-access instructions are typically slower than ALU instructions).</p>

<p>This is worse than our common case, but is often better than our worst case.</p>

<h2 id="the-new-model">The new model</h2>

<p>In the new approach, any global that isn’t reachable via a short displacement from the global pointer (i.e. things that would need <code class="language-plaintext highlighter-rouge">auicgp</code>) is turned into a GOT-relative access.</p>

<p>This also serves as a building block for other optimisations.
The three-instruction sequence is smaller than the GOT-relative access <em>if a global is accessed only once</em>, but is larger if the same global has its address taken three or more times.
The linker has full visibility into the number of accesses to a global and so can make this decision.</p>
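
<p>The break-even point can be worked out with a few lines of Python. This sketch assumes that every instruction uses a 32-bit encoding and that a GOT entry is one 8-byte CHERIoT capability; the function names are invented for the example and do not correspond to anything in the linker.</p>

```python
INSTRUCTION_BYTES = 4   # assume no compressed encodings apply
GOT_ENTRY_BYTES = 8     # one CHERIoT capability


def direct_bytes(uses):
    """Three-instruction sequence repeated at every use."""
    return uses * 3 * INSTRUCTION_BYTES


def got_bytes(uses):
    """Two-instruction GOT load at every use, plus one shared GOT entry."""
    return uses * 2 * INSTRUCTION_BYTES + GOT_ENTRY_BYTES


def bytes_saved_by_got(uses):
    """Positive when the GOT form is smaller, negative when it is larger."""
    return direct_bytes(uses) - got_bytes(uses)


for uses in (1, 2, 3, 4):
    print(uses, direct_bytes(uses), got_bytes(uses), bytes_saved_by_got(uses))
```

<p>Under these assumptions a single use favours the three-instruction sequence by four bytes, two uses are a tie, and each use beyond that saves a further four bytes with the GOT.</p>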

<p>Similarly, globals that are too big to have their bounds applied in a single instruction will be able to fall back to the GOT mechanism.
These are rare but they do occur for things like frame buffers.
These are close to break even for a single use.
For example, imagine if <code class="language-plaintext highlighter-rouge">x</code> is an array of 1028 <code class="language-plaintext highlighter-rouge">int</code>s.
Taking the address is currently this (long!) sequence:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>	lui             a0, 1
	addi            a0, a0, 16
.LBB0_1:
	ct.auicgp       a1, %cheriot_compartment_hi(x)
	ct.cincoffset   a1, a1, %cheriot_compartment_lo_i(.LBB0_1)
	ct.csetbounds   a0, a1, a0
</code></pre></div></div>

<p>Assuming that the <code class="language-plaintext highlighter-rouge">auicgp</code> is relaxed away, this becomes:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>	lui             a0, 1
	addi            a0, a0, 16
	ct.cincoffset   a1, gp, {offset of x}
	ct.csetbounds   a0, a1, a0
</code></pre></div></div>

<p>This is four instructions, but both the <code class="language-plaintext highlighter-rouge">lui</code> and <code class="language-plaintext highlighter-rouge">addi</code> will, in this case, be the 16-bit variants, so this is 12 bytes: smaller than the GOT-relative sequence plus the GOT entry.
This sequence also requires an additional register, and removing that requirement can improve code generation in the rest of the function.</p>
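
<p>The size comparison for this example can be written out as a short Python calculation. It is a sketch that takes the encodings described above as given (16-bit <code class="language-plaintext highlighter-rouge">lui</code> and <code class="language-plaintext highlighter-rouge">addi</code>, 32-bit for everything else) and assumes an 8-byte GOT entry; the variable names are illustrative.</p>

```python
COMPRESSED = 2   # bytes in a 16-bit encoding
FULL = 4         # bytes in a 32-bit encoding

array_bytes = 1028 * 4   # size of the array of ints
# The low part is below 2048 here, so a plain split matches lui 1 / addi 16.
upper, lower = divmod(array_bytes, 4096)

relaxed_sequence = {
    "lui": COMPRESSED,
    "addi": COMPRESSED,
    "ct.cincoffset": FULL,
    "ct.csetbounds": FULL,
}
inline_total = sum(relaxed_sequence.values())

got_total = 2 * FULL + 8   # two-instruction GOT load plus an 8-byte GOT entry

print(array_bytes, upper, lower, inline_total, got_total)
```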

<p>To support this, we’ve introduced a new <code class="language-plaintext highlighter-rouge">ct.auipcc.data</code> pseudo-instruction that emits a padding nop, which is normally relaxed away but provides space for the linker to emit the longest sequence.
The sequence the compiler will emit looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.LBB4_1:
	ct.auipcc.data  s0, %cheriot_compartment_data_hi(x)
	ct.cincoffset   s0, s0, %cheriot_compartment_lo_i(.LBB4_1)
	ct.csetbounds   s0, s0, %cheriot_compartment_size(x)
	ct.cincoffset   a0, s0, a0
</code></pre></div></div>

<p>The linker can then relax this to any of the sequences described above, almost always to something much shorter.</p>

<p>The extra padding is needed only in one specific case, but emitting it in all cases that may use the global pointer simplifies the linker logic.
If the compiler statically knows that the access is in bounds, it will emit the short two-instruction sequence of <code class="language-plaintext highlighter-rouge">auipcc</code> followed by a load or store.
This is two 32-bit instructions (8 bytes total).
If the global is large, this must be transformed into a GOT load followed by a load or store, which is three instructions, two of which might be 16-bit, but in the worst case all three will need the full range of the 32-bit variants.
The linker always removes the nop but may use the space that it reserved for the longer sequence, in this one corner case.</p>

<h2 id="a-final-note">A final note</h2>

<p>CHERIoT benefits enormously from having co-designed the ISA, ABI, and programmer model.
The <a href="https://github.com/CHERIoT-Platform/cheriot-audit"><code class="language-plaintext highlighter-rouge">cheriot-audit</code> tool</a> makes it easy to reason about the security of a firmware image.
This tool is possible because we defined our ABI so that everything that a compartment accesses outside of its code and globals region is explicit (in the programmer model).</p>

<p>Several of these optimisations are possible only because we took a holistic view to system design.
If, for example, we allowed globals from other compartments to be transparently accessed then this would have required a GOT approach from the start and would also have made reasoning about communication between compartments much harder.
Instead, we have an explicit notion of a pre-shared object, which must come via a compartment’s import table and so is exposed to auditing.
By making the import explicit, we can also go beyond core C features and make the <em>permissions</em> explicit, so we can see by reading the source code, and confirm with <code class="language-plaintext highlighter-rouge">cheriot-audit</code>, exactly what permissions a compartment has on a shared object.</p>

<p>Decades of prior work have shown us that security policies divorced from the source code are hard to reason about and often get out of sync with the implementation.
When they get out of sync, people notice when components have too few permissions (they stop working) but not when they have too many (until an attacker notices).</p>

<p>For CHERIoT, we had an explicit design goal that you should be able to reason about the security of your code by reading the source and do coarser-grained auditing of <em>other people’s code</em> with additional tooling.
Whether a function is in your compartment or elsewhere is explicit in the source.
Whether a global is uniquely owned by you or a pre-shared object is explicit in the source.
We co-designed the ABI with the programmer model to enable this, and it lets us emit very short instruction sequences in the common case, while falling back to larger ones to handle corner cases.</p>]]></content><author><name>David Chisnall</name></author><category term="isa" /><category term="toolchain" /><summary type="html"><![CDATA[Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away – Antoine de Saint-Exupéry]]></summary></entry><entry><title type="html">First CHERIoT Silicon!</title><link href="https://cheriot.org/silicon/2026/03/04/cheriot-first-silicon.html" rel="alternate" type="text/html" title="First CHERIoT Silicon!" /><published>2026-03-04T00:00:00+00:00</published><updated>2026-03-04T00:00:00+00:00</updated><id>https://cheriot.org/silicon/2026/03/04/cheriot-first-silicon</id><content type="html" xml:base="https://cheriot.org/silicon/2026/03/04/cheriot-first-silicon.html"><![CDATA[<p><img alt="ICENI on a development board" width="80%" style="margin-left:auto;margin-right:auto;display:block" src="/images/2026-03-04 iceni.png" /></p>

<p>Most CHERIoT work to date has been done on software or FPGA simulations.
We have several such implementations: The executable model built from our <a href="https://github.com/CHERIoT-Platform/cheriot-sail">formal ISA specification</a>, the <a href="https://mpact.googlesource.com/mpact-cheriot/">MPact simulator from Google</a>, <a href="https://github.com/microsoft/cheriot-safe">Microsoft’s CHERIoT SAFE FPGA target for the Arty A7</a>, and of course lowRISC’s beautiful <a href="https://www.mouser.co.uk/new/newae-technology/newae-sonata-one-dev-board">Sonata FPGA board, which is designed to simulate CHERIoT systems</a>.
These were always intended to be development and prototyping systems, so I’m delighted to announce that SCI Semiconductor has the first silicon CHERIoT implementation.</p>

<p>[ Conflict disclaimer: I am a co-founder of SCI Semiconductor. ]</p>

<p>The dev board pictured above contains one of the first batch of ICENI chips to come back from the fab.
This is a complete CHERIoT system, with all of the core CHERI properties (spatial memory safety, no pointer injection, and so on) along with all of the CHERIoT extensions that provide deterministic use-after-free protection, auditable control over interrupt state, and everything that we need for an aggressively compartmentalised RTOS.</p>

<p>This chip uses the CHERIoT Ibex core, running at up to 250 MHz, and includes a few features that accelerate temporal safety, improve interrupt determinism, and so on.
These build on top of all of the benefits of any CHERIoT implementation: deterministic mitigation of memory safety bugs from simple buffer overflows up to use-after-free, fine-grained compartmentalisation, and a programming model co-designed with both the ISA and the software stack to provide a tiny TCB.
Anything that works on CHERIoT SAFE or Sonata should be very easy to port to ICENI for production use.
Anything that runs on the software simulators should just work.</p>

<p>We’ll be showing the chips at <a href="https://www.embedded-world.de/en">Embedded World (Stand 4A - 131)</a> next week and at <a href="https://cheri-alliance.org/events/cheri-blossoms-conference-2026/">CHERI Blossoms</a> a couple of weeks later.
From tomorrow, one will also be on display in the CHERI 15th anniversary exhibit in the Cambridge Computer Laboratory.</p>

<p>Aside: The <a href="https://en.wikipedia.org/wiki/Iceni">Iceni tribe</a> were one of the pre-Roman tribes in Britain and are famous for their chariots (though more due to <a href="https://en.wikipedia.org/wiki/Boadicea_and_Her_Daughters">this statue</a> than historical fact).
I am only partially to blame for the bad puns in the naming.</p>]]></content><author><name>David Chisnall</name></author><category term="silicon" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">CHERIoT vs the top 25 CWEs</title><link href="https://cheriot.org/cwes/2026/02/04/cheriot-top-25-cwes.html" rel="alternate" type="text/html" title="CHERIoT vs the top 25 CWEs" /><published>2026-02-04T00:00:00+00:00</published><updated>2026-02-04T00:00:00+00:00</updated><id>https://cheriot.org/cwes/2026/02/04/cheriot-top-25-cwes</id><content type="html" xml:base="https://cheriot.org/cwes/2026/02/04/cheriot-top-25-cwes.html"><![CDATA[<p>Each year, MITRE publishes a list of the top 25 most dangerous software weaknesses.
The <a href="https://cwe.mitre.org/top25/archive/2025/2025_cwe_top25.html">2025 list</a> is interesting reading.
Let’s see how CHERIoT fares against them.</p>

<p>The top three (CWEs <a href="https://cwe.mitre.org/data/definitions/79.html">79</a>, <a href="https://cwe.mitre.org/data/definitions/89.html">89</a>, and <a href="https://cwe.mitre.org/data/definitions/352.html">352</a>) are not typically applicable on embedded platforms.
Two are cross-site issues that apply to web applications; the third is SQL injection.</p>

<p>At position 4, we have <a href="https://cwe.mitre.org/data/definitions/862.html">CWE-862</a>, missing authorisation.
This is not something that’s automatically mitigated by CHERIoT, but the design of CHERIoT RTOS and the programming model that we expose makes it easy to write code that avoids this kind of issue.
The CHERIoT pattern for any operation that you do on behalf of another compartment is to require an authorising capability.
For example, if you allocate memory, you must present an allocation capability that encapsulates your right to allocate memory (and your quota).
If you want to create a socket, you must present a capability that authorises you to bind to a specific port (for server sockets) or connect to a specific remote host.
The same applies for dynamically created things, such as sockets themselves, message-queue endpoints, and so on.
If you forget to authorise something, it will not have the capability to perform the action and the action will fail.
This is a general property of capability systems and not something specific to CHERIoT.</p>
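
<p>The shape of this pattern can be shown with a toy Python model. It is only an analogy: the class and function names are hypothetical and are not the CHERIoT RTOS API, and a Python object is forgeable in a way that a real capability is not. The point is that the service decides purely on the token that the caller presents, never on who the caller is.</p>

```python
from dataclasses import dataclass


class Unauthorised(Exception):
    """Raised when a request is not backed by a suitable capability."""


@dataclass
class AllocationCapability:
    """Hypothetical token carrying the right to allocate and a remaining quota."""
    remaining: int


def allocate(capability, size):
    """Allocate size bytes, charging them to the presented capability."""
    # The service never consults ambient identity: no capability, no memory.
    if not isinstance(capability, AllocationCapability):
        raise Unauthorised("no allocation capability presented")
    if size not in range(1, capability.remaining + 1):
        raise Unauthorised("request exceeds the quota held by this capability")
    capability.remaining -= size
    return bytearray(size)


quota = AllocationCapability(remaining=64)
buffer = allocate(quota, 48)
print(len(buffer), quota.remaining)
```

<p>A caller that forgets to pass the capability, or that has exhausted its quota, gets an error instead of memory, which is the failure mode described above.</p>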

<p>Position 5 is <a href="https://cwe.mitre.org/data/definitions/787.html">CWE-787</a>, out-of-bounds write, also known as a buffer overflow.
This one is deterministically mitigated by any CHERI platform.</p>

<p>Technically, CHERIoT is not vulnerable to the path-traversal bugs in position 6 (<a href="https://cwe.mitre.org/data/definitions/22.html">CWE-22</a>), but only because we don’t yet ship a filesystem.
But, again, this kind of issue is a solved problem in capability systems.
<a href="https://www.cl.cam.ac.uk/research/security/capsicum/freebsd.html">Capsicum</a>, for example, eliminates this kind of vulnerability and I’d expect our filesystem APIs to follow a similar shape.
There’s no excuse for writing APIs that are vulnerable to path traversal in the 2020s.</p>

<p>The next two are good old-fashioned memory-safety vulnerabilities.
Position 7 is use after free (<a href="https://cwe.mitre.org/data/definitions/416.html">CWE-416</a>), and 8 is out-of-bounds read (<a href="https://cwe.mitre.org/data/definitions/125.html">CWE-125</a>).
The latter is mitigated by any CHERI platform.
The former is usually made unexploitable by CHERI systems, and is deterministically mitigated by CHERIoT.</p>

<p>The next two are unlikely to apply to embedded platforms.
<a href="https://cwe.mitre.org/data/definitions/78.html">CWE-78</a> at position 9 is largely to do with failing to validate dynamically created command lines that you pass to a shell.
Then <a href="https://cwe.mitre.org/data/definitions/94.html">CWE-94</a> (Improper Control of Generation of Code) at position 10 is typically introduced with scripting languages producing output that can be influenced by an attacker and is then executed by another interpreter, a rare situation on embedded devices.</p>

<p>Position 11 (<a href="https://cwe.mitre.org/data/definitions/120.html">CWE-120</a>) is a ‘Classic Buffer Overflow’, i.e. something that CHERI deterministically mitigates.
Not to be confused with the buffer overflow we had at position 5.</p>

<p>The 12th entry is another that’s rare on embedded devices.
<a href="https://cwe.mitre.org/data/definitions/434.html">CWE-434</a> relates to unrestricted uploads of dangerous file types, something that matters a lot to web apps and far less to other classes of program.</p>

<p>Next, position 13, is a null-pointer dereference where a valid pointer was expected (<a href="https://cwe.mitre.org/data/definitions/476.html">CWE-476</a>).
CHERI guarantees that this will trap (even if an attacker can provide arbitrary offsets to the null pointer) and CHERIoT makes this a recoverable error either via our <a href="https://cheriot.org/rtos/errors/2024/09/20/error-handling.html">scoped error handlers</a> or by simply unwinding the compartment to the caller.</p>

<p>Buffer overflows seem to be popular and position 14 is the third instance of this kind to make the list, this time on the stack (<a href="https://cwe.mitre.org/data/definitions/121.html">CWE-121</a>).
Again, this will deterministically trap on any CHERI platform.</p>

<p>The next entry is more interesting.
Unsafe deserialisation of untrusted data (<a href="https://cwe.mitre.org/data/definitions/502.html">CWE-502</a>) is something a lot of people get wrong.
<a href="https://cheriot.org/security/philosophy/2024/07/30/configuration-management.html">Phil Day wrote about how to do this safely a couple of years ago</a>.
Lightweight compartmentalisation makes it easy to limit the scope of damage that this kind of bug can do, to almost nothing.</p>

<p>Did I mention that buffer overflows are a recurring theme on this list?
Position 16 (<a href="https://cwe.mitre.org/data/definitions/122.html">CWE-122</a>) is yet another buffer overflow, this time on the heap.
One more that any CHERI platform deterministically mitigates.</p>

<p>Positions 19–21 all relate to incorrect access control at the application layer and, sadly, are not mitigated by CHERI.
Position 24 is similar.</p>

<p>In between these, we have another web app problem (<a href="https://cwe.mitre.org/data/definitions/918.html">CWE-918</a>, server-side request forgery) and another command injection (<a href="https://cwe.mitre.org/data/definitions/77.html">CWE-77</a>).
These are unlikely to be present on embedded devices.</p>

<p>Finally, at position 25, we have a fairly broad category of availability issues that arise from not constraining resource allocation (<a href="https://cwe.mitre.org/data/definitions/770.html">CWE-770</a>).
These are normally mitigated by the software capability layer on CHERIoT.
For example, a compartment can’t allocate memory unless it has a capability that authorises it to do so.
That capability encapsulates a quota and so provides a limit to the total amount of allocation.
Other resources that can be dynamically allocated are normally managed in the same way.</p>
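
<p>As a rough illustration of the quota idea, here is a minimal Python model; it is not CHERIoT RTOS code.
The <code class="language-plaintext highlighter-rouge">AllocationCapability</code> and <code class="language-plaintext highlighter-rouge">QuotaExceeded</code> names are hypothetical, invented for this sketch, and the real allocator API differs.</p>

```python
class QuotaExceeded(Exception):
    """Raised when an allocation would exceed the capability's quota."""


class AllocationCapability:
    """Hypothetical model of a token that authorises heap allocation.

    Holding the object is what grants the right to allocate; the quota
    it encapsulates bounds the total outstanding allocation.
    """

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self.used_bytes = 0

    def allocate(self, size):
        # Refuse anything that would take the holder over its quota.
        if self.used_bytes + size > self.quota_bytes:
            raise QuotaExceeded(f"{size} bytes requested, "
                                f"{self.quota_bytes - self.used_bytes} left")
        self.used_bytes += size
        return bytearray(size)

    def free(self, buffer):
        # Returning memory gives the quota back to the holder.
        self.used_bytes -= len(buffer)


if __name__ == "__main__":
    cap = AllocationCapability(quota_bytes=1024)
    first = cap.allocate(768)
    try:
        cap.allocate(512)  # 768 + 512 exceeds the 1024-byte quota
    except QuotaExceeded as error:
        print("denied:", error)
    cap.free(first)
    print("after free:", len(cap.allocate(512)), "bytes allocated")
```

<p>Code that holds no such object has nothing to call, which is the sense in which a missing capability makes unbounded allocation impossible rather than merely discouraged.</p>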

<p>So, what’s the final score card?</p>

<p>Not applicable in embedded contexts: 1, 2, 3, 9, 10, 12, 22, and 23.</p>

<p>Deterministically mitigated with just a recompile: 5, 7, 8, 11, 14, and 16.</p>

<p>Mitigated by CHERIoT design patterns and software model: 4, 6, 13, 15, and 25.</p>

<p>That still leaves six that we don’t mitigate (17, 18, 19, 20, 21, and 24), but now hopefully the cognitive load is much lower from not having to think about the eleven that we do prevent and you can avoid some of these as well!</p>]]></content><author><name>David Chisnall</name></author><category term="cwes" /><summary type="html"><![CDATA[Each year, MITRE publishes a list of the top 25 most dangerous software weaknesses. The 2025 list is interesting reading. Let’s see how CHERIoT fares against them.]]></summary></entry><entry><title type="html">Post-Quantum Cryptography on CHERIoT</title><link href="https://cheriot.org/pqc/2025/12/12/pqc-on-cheriot.html" rel="alternate" type="text/html" title="Post-Quantum Cryptography on CHERIoT" /><published>2025-12-12T00:00:00+00:00</published><updated>2025-12-12T00:00:00+00:00</updated><id>https://cheriot.org/pqc/2025/12/12/pqc-on-cheriot</id><content type="html" xml:base="https://cheriot.org/pqc/2025/12/12/pqc-on-cheriot.html"><![CDATA[<p>When you tell everyone you’re building a secure platform, the first thing that they ask about is encryption.
And, in 2025, the hot topic in encryption is algorithms that are safe from hypothetical quantum computers that, unlike real ones, can factorise numbers bigger than 31.
These algorithms are referred to as post-quantum cryptography (PQC).
Since NIST standardised a few such algorithms, there’s been a lot more interest in seeing them in production, so I spent some time getting the implementations from the Linux Foundation’s PQ Code Package to run on CHERIoT.
A lot of companies are building hardware to accelerate these operations, so it seemed useful to have a performance baseline on the CHERIoT Ibex, as well as something that can be used in future CHERIoT-based products.</p>

<h2 id="what-are-ml-kem-and-ml-dsa-for">What are ML-KEM and ML-DSA for?</h2>

<p>I am not a mathematician and so I’m not going to try to explain how these algorithms work, but I am going to explain what they’re <em>for</em>.</p>

<p>Module-Lattice-Based Key-Encapsulation Mechanism (ML-KEM) is, as the name suggests, an algorithm for key encapsulation.
One side holds a public key and uses it (plus some entropy source) to generate a secret in both plain and encapsulated forms.
The encapsulated secret can be sent to a remote party who holds the corresponding private key.
The receiver can then recover the unencrypted version of the secret (and detect tampering).
Now, both parties have the same secret and can use it with some key-derivation function to produce something like an AES key for future communication.</p>

<p>Note that this is somewhat more restrictive than traditional key-exchange protocols.
You don’t get to exchange an arbitrary value: the generation step is part of encapsulation.
This also means that it’s a fixed size, defined by the algorithm, which is why you typically feed it into a key-derivation function rather than using it directly.</p>
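
<p>The key-derivation step can be sketched with HKDF (RFC 5869) built from Python’s standard library.
This is one possible choice of key-derivation function, not necessarily the one any particular protocol uses.
The 32-byte placeholder secret and the <code class="language-plaintext highlighter-rouge">info</code> labels below are made up for illustration; a real secret would come from ML-KEM encapsulation or decapsulation.</p>

```python
import hashlib
import hmac


def hkdf_sha256(shared_secret, info, length, salt=b""):
    """Derive `length` bytes from a shared secret using HKDF (RFC 5869)."""
    digest_size = hashlib.sha256().digest_size
    if length > 255 * digest_size:
        raise ValueError("requested output is too long for HKDF")
    # Extract: concentrate the secret into a fixed-size pseudorandom key.
    if not salt:
        salt = bytes(digest_size)
    prk = hmac.new(salt, shared_secret, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until there is enough output.
    output = b""
    block = b""
    counter = 1
    while length > len(output):
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        output += block
        counter += 1
    return output[:length]


if __name__ == "__main__":
    # Placeholder standing in for the fixed-size secret that both sides hold.
    shared_secret = bytes(range(32))
    # Different labels give independent keys for each direction.
    client_key = hkdf_sha256(shared_secret, b"example client-to-server", 32)
    server_key = hkdf_sha256(shared_secret, b"example server-to-client", 32)
    print(client_key.hex())
    print(server_key.hex())
```

<p>Both parties run the same derivation on the same secret, so they arrive at the same keys without ever sending them.</p>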

<p>Module-Lattice Digital Signature Algorithm (ML-DSA) has a similarly informative name.
It is intended for providing and validating digital signatures.
It takes a private key, an arbitrary-sized document and context, and produces a signature.
A holder of the associated public key can then validate that the document matches the version signed with the private key and context.</p>

<p>These are both quite low-level building blocks for higher-level protocols.
For example, TLS can use ML-KEM for key exchange and ML-DSA for certificate validation, but also incorporates traditional algorithms in case the PQC algorithms have unexpected weaknesses against classical computers.</p>

<h2 id="initial-porting">Initial porting</h2>

<p>As is usually the case for CHERIoT, porting the C implementations of ML-KEM and ML-DSA required no code changes.
I worked with upstream to slightly simplify the platform-integration layer, so we just provide a single header describing the port.
For example, the <a href="https://github.com/CHERIoT-Platform/cheriot-pqc/blob/main/include/mldsa_native_config.h">port header for ML-DSA</a> configures the build to produce ML-DSA-44 support, defines custom functions for zeroing memory and getting entropy, and adds the <code class="language-plaintext highlighter-rouge">__cheriot_libcall</code> attribute to all of the exported APIs (so we can build them as shared libraries, rather than embedded in a single compartment).
The <a href="https://github.com/CHERIoT-Platform/cheriot-pqc/blob/main/include/mlkem_native_config.h">file for ML-KEM</a> is almost identical.</p>

<p>With these defined, it is possible to build both libraries as CHERIoT shared libraries.
This motivated a bit of cleanup.
We have a device interface for entropy sources, but it wasn’t implemented on the Sail model (which doesn’t have an entropy source).
The interface has a way of exposing the fact that entropy is insecure, so that wasn’t a problem; it just needed doing.
I refactored all of the insecure entropy-source drivers to use a common base.
Most encryption algorithms want an API that fills a buffer with entropy.
It’s nice if these don’t all need to touch the driver directly, so I created a compartment that provides this API and exposes it.
Now, both libraries are simply consumers of this API.
This also makes it easier to add stateful whitening for entropy drivers for hardware entropy sources that don’t do the whitening in hardware.</p>
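
<p>The shape of that buffer-filling API can be modelled in a few lines of Python.
Everything here is hypothetical: the class, the function name, and the generator are invented for the sketch and do not mirror the real driver interface.</p>

```python
class InsecureEntropySource:
    """Hypothetical stand-in for a driver on a platform with no real entropy."""

    is_secure = False  # exposed so that callers can refuse to make real keys

    def __init__(self, seed=1):
        self.state = seed

    def next_word(self):
        # A linear congruential generator: predictable, for simulation only.
        self.state = (self.state * 1103515245 + 12345) % 2 ** 32
        return self.state


def fill_with_entropy(source, buffer):
    """Fill `buffer` from a word-at-a-time source; report whether it is secure."""
    for offset in range(0, len(buffer), 4):
        word = source.next_word().to_bytes(4, "little")
        remaining = min(4, len(buffer) - offset)
        buffer[offset:offset + remaining] = word[:remaining]
    return source.is_secure


if __name__ == "__main__":
    seed_material = bytearray(10)
    secure = fill_with_entropy(InsecureEntropySource(seed=7), seed_material)
    print(seed_material.hex(), "secure" if secure else "INSECURE")
```

<p>Consumers see only the buffer-filling function, so whitening could later be added behind it without changing any of them.</p>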

<p>Most CHERIoT stacks are on the order of 1-2 KiB.
The PQC algorithms use much more space.
More, in fact, than we permitted.</p>

<p>The previous limitation was based on the precision of bounds rounding.
A CHERI capability compresses the bounds representation by taking advantage of the fact that, for a pointer to an allocation, there is a lot of redundancy between the address of the pointer, the address of the end of the allocation (the top), and the address of the start of the allocation (the base).
The distances from the address to the base and to the top are stored as floating-point values with a shared exponent.
In practical terms, this means that the larger an allocation is, the more strongly aligned its start and end addresses must be.
The same restrictions apply for any capability that grants access to less than an entire object.</p>
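
<p>A simplified model makes the relationship between size and alignment concrete.
It assumes a 9-bit mantissa and ignores every other detail of the real encoding, so treat the numbers as illustrative rather than as the exact hardware rules.</p>

```python
MANTISSA_BITS = 9  # assumption for this sketch, not taken from the text above


def required_alignment(length, mantissa_bits=MANTISSA_BITS):
    """Smallest power-of-two alignment at which `length` is representable."""
    max_mantissa = 2 ** mantissa_bits - 1
    exponent = 0
    # Grow the shared exponent until the length, counted in units of
    # 2**exponent bytes and rounded up, fits in the mantissa.
    while -(-length // 2 ** exponent) > max_mantissa:
        exponent += 1
    return 2 ** exponent


def padded_length(length, mantissa_bits=MANTISSA_BITS):
    """Round `length` up to a multiple of its required alignment."""
    alignment = required_alignment(length, mantissa_bits)
    return -(-length // alignment) * alignment


if __name__ == "__main__":
    for size in (256, 2048, 18464, 60544):
        print(f"{size} bytes: align to {required_alignment(size)}, "
              f"pad to {padded_length(size)}")
```

<p>In this model a small object can start at any byte, while a stack of tens of kilobytes needs its base and top aligned far more coarsely than the 16 bytes that the ABI asks for.</p>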

<p>When you call a function in another compartment, the switcher will truncate the stack capability so that the callee sees only the bit of the stack that you weren’t using.
The top and base of the stack must be 16-byte aligned (as an ABI requirement), but a very large stack may have hardware requirements for greater alignment and so may require a gap between the bottom of the caller’s stack and the top of the callee’s.</p>

<p>Fortunately, we’d added an instruction precisely for this kind of use case: <code class="language-plaintext highlighter-rouge">CSetBoundsRoundDown</code>.
This takes a capability and a length and truncates it to <em>at most</em> that length.
It was a fairly small tweak to the switcher to make it do this, and a much larger amount of time with SMT solvers to convince ourselves that this was a safe thing to do.</p>
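
<p>The ‘at most that length’ behaviour can be modelled as a search for the largest representable length at a fixed base.
This is a sketch of the idea only, under an assumed 9-bit mantissa and an assumed maximum exponent; it is not the instruction’s specification.</p>

```python
def round_down_length(base, requested, mantissa_bits=9, max_exponent=24):
    """Largest representable length at `base` that is at most `requested`."""
    max_mantissa = 2 ** mantissa_bits - 1
    best = 0
    for exponent in range(max_exponent + 1):
        unit = 2 ** exponent
        if base % unit != 0:
            break  # the base is not aligned strongly enough for this exponent
        # Whole units that fit, capped by what the mantissa can encode.
        units = min(requested // unit, max_mantissa)
        best = max(best, units * unit)
    return best


if __name__ == "__main__":
    for base in (0x1000, 0x1010):
        granted = round_down_length(base, 60544)
        print(f"base {base:#x}: asked for 60544 bytes, granted {granted}")
```

<p>Rounding down never grants more than was asked for, which is what makes it safe to apply to whatever stack the caller has left; a weakly aligned base simply yields a smaller region.</p>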

<p>This also showed up a bug in our linker’s handling of the <code class="language-plaintext highlighter-rouge">CAPALIGN</code> directive, which rounds a section’s base and size up to the required alignment to be representable.
This was not working for sections that followed an explicit alignment directive.
Our stacks must be both at least 16-byte aligned <em>and</em> representable as capabilities.
This is now fixed.</p>

<p>So now we support stacks up to almost 64 KiB, a limitation imposed by the current loader metadata format rather than anything intrinsic to how the system operates after booting.
We could easily increase this limit but 64 KiB ought to be enough for anyone.</p>

<h2 id="performance-on-cheriot-ibex">Performance on CHERIoT Ibex</h2>

<p>The repository contains <a href="https://github.com/CHERIoT-Platform/cheriot-pqc/tree/main/examples/01.benchmark">a simple benchmark example</a> that tries each of the operations and reports both the cycle time and stack usage.
The output on the CHERIoT Ibex verilator simulation is:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PQC benchmark: Starting: stack used: 224 bytes, cycles elapsed: 41
PQC benchmark: Generated ML-KEM key pair: stack used: 14304 bytes, cycles elapsed: 5143987
PQC benchmark: Encrypted secret pair with ML-KEM: stack used: 17440 bytes, cycles elapsed: 1773235
PQC benchmark: Decrypted secret pair with ML-KEM: stack used: 18464 bytes, cycles elapsed: 2176226
PQC benchmark: Compared results successfully for ML-KEM: stack used: 224 bytes, cycles elapsed: 414
PQC benchmark: Generated ML-DSA key pair: stack used: 46912 bytes, cycles elapsed: 3622132
PQC benchmark: Signed message with ML-DSA: stack used: 60544 bytes, cycles elapsed: 5391177
PQC benchmark: Verified message signature with ML-DSA: stack used: 44672 bytes, cycles elapsed: 3674071
PQC benchmark: Correctly failed to verify message signature with ML-DSA after tampering: stack used: 44672 bytes, cycles elapsed: 3673706
</code></pre></div></div>

<p>The ML-KEM encrypt (generate shared secret and encrypted version) and decrypt (recover shared secret from encrypted version) each use around 18 KiB of stack and run in around two million cycles.
CHERIoT Ibex should scale up to 200-300 MHz (though may be clocked lower for power reasons in some deployments), but even at 100 MHz that’s 50 encryption or decryption operations per second.
Remember that this is an operation that typically happens when you establish a connection, then you use a symmetric cipher such as AES with the exchanged key.</p>

<p>The ML-DSA operations are slower and use a <em>lot</em> more stack space (almost 60 KiB for signing!).
But, even there, the performance is reasonable: under 4 M cycles for key generation and verification, and around 5.4 M for signing.
This means that you can do at least 25 signature-verification operations per second at 100 MHz.</p>
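
<p>The arithmetic behind these throughput figures is easy to reproduce from the benchmark output.
The cycle counts below are copied from the log above; the 100 MHz clock is the conservative figure used in the text, and the operation labels are my own.</p>

```python
CLOCK_HZ = 100_000_000  # the 100 MHz figure used in the text

# Cycle counts copied from the benchmark output.
CYCLES = {
    "ML-KEM key generation": 5_143_987,
    "ML-KEM encapsulate": 1_773_235,
    "ML-KEM decapsulate": 2_176_226,
    "ML-DSA key generation": 3_622_132,
    "ML-DSA sign": 5_391_177,
    "ML-DSA verify": 3_674_071,
}


def milliseconds(cycles, clock_hz=CLOCK_HZ):
    """Wall-clock time for one operation."""
    return 1000 * cycles / clock_hz


def operations_per_second(cycles, clock_hz=CLOCK_HZ):
    """Whole operations that complete in one second."""
    return clock_hz // cycles


if __name__ == "__main__":
    for name, cycles in CYCLES.items():
        print(f"{name}: {milliseconds(cycles):.1f} ms, "
              f"{operations_per_second(cycles)} per second")
```

<p>At 100 MHz this puts ML-DSA verification at roughly 37 ms and ML-KEM encapsulation at roughly 18 ms per operation.</p>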

<p>Even using ML-KEM for key exchange and ML-DSA for certificate validation in a TLS flow is unlikely to add more than a few tens of milliseconds to the handshake time, which is perfectly acceptable for the common use case for embedded devices.</p>

<p>In terms of code size, both are small.
The ML-KEM implementation is around 12 KiB, the ML-DSA implementation 18 KiB.
These both include a SHA3 (FIPS 202) implementation, so there’s scope for code-size reduction on systems that need both, but 30 KiB of code isn’t too bad.</p>

<h2 id="future-plans">Future plans</h2>

<p>The stack usage is very high.
Upstream has some plans to allow pluggable allocators, which will allow us to move a lot of this to the heap.
This is precisely the kind of use case that CHERIoT’s memory-safe heap is great for: something needs 60 KiB of RAM for 4,000,000 cycles, but then doesn’t need that RAM again for a long time.
That memory can then be used for something else, even in a mutually distrusting compartment.</p>

<p>Currently, the library builds are very thin wrappers around the upstream projects.
This is great as a building block, but we should make more use of CHERIoT features in the longer term.</p>

<p>Both ML-KEM and ML-DSA depend on SHA3 (FIPS 202).
Ideally, we’d factor that out as some common code, rather than carrying a copy in each library.
Similarly, the libraries provide an option to plug in your own SHA3 implementation.
This is likely to be a common hardware operation even for chips that don’t have full PQC implementations, so we should expose this option in the build system.</p>

<h2 id="is-it-secure">Is it secure?</h2>

<p>Security always depends on the threat model.</p>

<p>For signature validation, you don’t have any secret data, just a public key, a document, and a signature.
The only concerns are whether there are weaknesses in the algorithm, or bugs, that would allow an attacker to substitute a different document for the same signature.
CHERIoT prevents memory-safety bugs, so this is concerned solely with logic errors.
The code upstream is checked against a set of test vectors that aim to trigger corner cases in the logic of the underlying implementation, so hopefully is secure in this way.</p>

<p>For signing or key exchange, you need to worry about the key leaking.
On a CHERI system, it’s unlikely to leak explicitly, but may leak via side channels.
The <a href="https://github.com/pq-code-package/mldsa-native?tab=readme-ov-file#security">security section of the upstream projects</a> discusses a number of techniques that they use to mitigate this kind of attack.</p>

<p>That’s typically sufficient.
It’s been recommended practice for embedded devices to have per-device secrets for a long time.
This means that leaking a key from one device doesn’t compromise the device class, only that specific device.</p>

<p>For some very high-assurance use cases, that secret may matter and need to be robust against an adversary with physical access to the device.
Hardware encryption engines typically care about confidentiality breaches via power side channels and integrity breaches via glitch injection.
Power side channels are difficult to mitigate in software: the power requirements of multiplying two numbers together may depend on the number of carry bits set, for example.
They’re much easier to mitigate in hardware, by simply doing the same calculation twice in parallel, once with the original inputs and once with the inputs permuted to have the opposite power characteristics.</p>

<p>Glitch injection takes the chip out of its specified power or frequency (or thermal) envelope and attempts to introduce bit flips, which can corrupt state in ways that tamper with signing or leak a key.
These are also effectively impossible to mitigate in software because the software that’s attempting the mitigation is vulnerable to the same glitches.
There are some compiler techniques that can make these harder, but they come with a high performance cost.</p>

<p>If power analysis and glitch injection are part of your threat model, the software implementations are not sufficient.
In this case you may also need to worry about someone removing the top of the chip and using a scanning electron microscope to read bits from non-volatile memory.
This used to require tens of thousands of dollars but is now much cheaper.
Devices that need to worry about this often have tiny explosive charges in the package to destroy the chip in cases of tampering.
If that’s your threat model, hardware PQC implementations may not be sufficient, at least alone.</p>

<p>But if you care about attackers on the network being unable to compromise the security of the class of devices, even if they have a magical and imaginary quantum computer, then these should be sufficient.</p>]]></content><author><name>David Chisnall</name></author><category term="pqc" /><summary type="html"><![CDATA[When you tell everyone you’re building a secure platform, the first thing that they ask about is encryption. And, in 2025, the hot topic in encryption is algorithms that are safe from hypothetical quantum computers that, unlike real ones, can factorise numbers bigger than 31. These algorithms are referred to as post-quantum cryptography (PQC). Since NIST standardised a few such algorithms, there’s been a lot more interest in seeing them in production, so I spent some time getting the implementations from the Linux Foundation’s PQ Code Package to run on CHERIoT. A lot of companies are building hardware to accelerate these operations, so it seemed useful to have a performance baseline on the CHERIoT Ibex, as well as something that can be used in future CHERIoT-based products.]]></summary></entry><entry><title type="html">Your RTOS is upside down</title><link href="https://cheriot.org/rtos/philosophy/2025/11/26/your-os-is-upside-down.html" rel="alternate" type="text/html" title="Your RTOS is upside down" /><published>2025-11-26T00:00:00+00:00</published><updated>2025-11-26T00:00:00+00:00</updated><id>https://cheriot.org/rtos/philosophy/2025/11/26/your-os-is-upside-down</id><content type="html" xml:base="https://cheriot.org/rtos/philosophy/2025/11/26/your-os-is-upside-down.html"><![CDATA[<p>In the last month or so (partly as a result of going to SOSP), I’ve seen a lot of architecture diagrams for operating systems and one thing has struck me about all of them: they put the device drivers in the wrong place.
Here, for example, is TockOS:</p>

<p><img src="/images/tock-architecture.png" alt="TockOS architecture diagram, showing the bottom layer containing all of the device drivers" /></p>

<p>TockOS is implemented in Rust and has safety as a priority.
I generally regard it as the gold standard for an RTOS that is forced to operate under the constraints of working with existing hardware.
But note where the drivers are: In the kernel, right at the bottom.</p>

<p>Here’s a similar diagram from the Zephyr RTOS:</p>

<p><img src="/images/zephyr-architecture.png" alt="Zephyr architecture diagram, showing the bottom layers containing all of the device drivers" /></p>

<p>Here, there is some separation between drivers and the core of the kernel, but the default configuration runs with no privilege separation.
Indeed, the <a href="https://docs.zephyrproject.org/latest/security/security-overview.html">Zephyr Security Overview</a> says:</p>

<blockquote>
  <p>The security architecture is based on a monolithic design where the Zephyr kernel and all applications are compiled into a single static binary.
System calls are implemented as function calls without requiring context switches.</p>
</blockquote>

<p>In fact, the only exception I’ve seen to this recently is LionsOS, a multiserver system built on top of the formally verified seL4 microkernel:</p>

<p><img src="/images/lionsos-architecture.svg" alt="LionsOS architecture diagram, showing the device drivers in userspace processes" /></p>

<p>This runs all of the device drivers in unprivileged contexts.
Unfortunately, seL4 assumes an MMU and so is not feasible for small embedded devices (seL4 uses more memory to hold page tables than a lot of CHERIoT firmware images use in total).</p>

<h2 id="drivers-are-attack-surface">Drivers are attack surface</h2>

<p>Device drivers, at least for I/O devices, are code that interacts with the outside world.
An attacker trying to compromise a device has a much easier job if they don’t need to take the chip apart.
The easiest attacks to mount are ones that work from across the network.
The next easiest are ones that drive local I/O, which may be reachable remotely via other paths.</p>

<p>Drivers for I/O devices, by definition, make decisions based on the values that they read from the device.
These operations include mapping error codes to some type-safe enumeration (which can fail if the hardware is buggy), or using the values to index into other structures.
These are hard to get right, even in a memory-safe language, because they often sit below the language’s abstraction layer.
Microsoft estimates that <a href="https://learn.microsoft.com/en-us/troubleshoot/windows-client/performance/stop-code-error-troubleshooting#what-causes-stop-errors">70% of Windows crashes are caused by bugs in device drivers</a>.</p>

<p>Any bug in a driver for an I/O device is a useful building block for an attacker.
This problem is made <em>much</em> worse if the device driver is in a privileged component.
For example, at the end of last month there were <a href="https://app.opencve.io/cve/CVE-2025-10456">three</a> <a href="https://app.opencve.io/cve/CVE-2025-10458">Bluetooth</a> <a href="https://app.opencve.io/cve/CVE-2025-7403">CVEs</a> in Zephyr that could all lead to compromise and, if the Bluetooth stack is not privilege separated, to arbitrary code execution by an attacker who gets within a few meters of the device (or compromises another Bluetooth-enabled device nearby).</p>

<h2 id="device-drivers-do-abstraction-and-multiplexing">Device drivers do abstraction and multiplexing</h2>

<p>This design results from conflating the two functions of a device driver.</p>

<p>A device driver has to provide an abstraction over a particular device.
Sometimes this happens in multiple layers.
For example, an Ethernet device may have an abstraction for sending and receiving Ethernet frames, but this is then the foundation for a further abstraction layer for sending IP packets, which is then used to expose TCP streams and UDP datagrams.</p>

<p>A device driver often <em>also</em> has to provide some secure multiplexing.
For example, two mutually distrusting components may be allowed to create sockets for different TCP connections that flow over the same Ethernet device.
Or they may be allowed to talk to two different USB bus endpoints via the same USB controller.</p>

<p>The first of these requirements is a <em>software engineering</em> problem.
The second is primarily a <em>security</em> problem, but typically the multiplexed abstractions need to be device independent and so it’s <em>also</em> a software-engineering problem.</p>

<p>In embedded development, there’s often a distinction between a hardware-abstraction layer (HAL) and a driver, with the former providing only the abstraction and the latter also providing multiplexing.
This is a useful distinction because, in a lot of cases, embedded systems have a <em>single consumer</em> for a device.
For example, you may have multiple SPI or I<sup>2</sup>C interfaces on a device, but each one is used for a single purpose.
It is convenient to be able to write software to talk to a SPI device without having to know exactly <em>which</em> SPI controller this chip uses, but you don’t need to handle safely sharing those pins with other components.</p>

<h2 id="cheriot-rtos-distrusts-drivers">CHERIoT RTOS distrusts drivers</h2>

<p>In CHERIoT RTOS, the core platform provides device abstractions that meet the earlier definition of a HAL: they provide abstractions over classes of device, but do not attempt to provide security.
The RTOS also provides a trivial way of auditing which compartments can access which devices, so that you can ensure that devices are not accessible to compartments that are not trusted to interface with them.</p>

<p>The platform’s device code runs within whatever compartment you instantiate it in.
It has no elevated privileges <em>except</em> the MMIO region(s) that you explicitly pass it for talking to a particular device.</p>

<p>This makes it easy to support both bespoke and reusable security models.
If you need to share a device between two compartments with some secure multiplexing based on a custom policy, you can do that by instantiating the driver in a compartment and exposing APIs to the two others.</p>
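
<p>The split between abstraction and multiplexing can be sketched as two separate objects.
This is a Python model with hypothetical names, not CHERIoT RTOS code, and the caller identity is a plain string here, whereas a real system would need something unforgeable.</p>

```python
class LoopbackHal:
    """Hypothetical HAL: abstracts one device, with no access control."""

    def __init__(self):
        self.sent = []

    def send(self, payload):
        self.sent.append(bytes(payload))


class MultiplexingDriver:
    """Owns the HAL and applies a custom sharing policy for its callers."""

    def __init__(self, hal, budgets):
        self.hal = hal
        self.budgets = dict(budgets)  # bytes each caller may still send

    def send(self, caller, payload):
        remaining = self.budgets.get(caller, 0)
        if len(payload) > remaining:
            return False  # policy: unknown or over-budget callers are refused
        self.budgets[caller] = remaining - len(payload)
        # Tag each frame so traffic from different callers stays separable.
        self.hal.send(caller.encode() + b":" + payload)
        return True


if __name__ == "__main__":
    hal = LoopbackHal()
    driver = MultiplexingDriver(hal, {"sensor": 8, "logger": 64})
    print(driver.send("sensor", b"12345"))   # within budget
    print(driver.send("sensor", b"12345"))   # would exceed the budget
    print(driver.send("intruder", b"x"))     # no budget at all
    print(hal.sent)
```

<p>Only the multiplexer holds the HAL object, which mirrors instantiating the device code in one compartment and exposing a narrower API to the others.</p>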

<p>Sometimes, the desired abstractions are reusable.
For example, the CHERIoT network stack is assembled out of the following compartments:</p>

<pre class="mermaid">
graph TD
  Network
  subgraph Firewall["On-device firewall"]
    DeviceDriver["Device Driver"]
  end
  TCPIP["TCP/IP"]:::ThirdParty
  User["User Code "]
  NetAPI["Network API"]
  DNS["DNS Resolver"]
  SNTP:::ThirdParty
  TLS:::ThirdParty
  MQTT:::ThirdParty
  DeviceDriver &lt;-- "Network traffic" --&gt; Network
  TCPIP &lt;-- "Send and receive Ethernet frames" --&gt; Firewall
  DNS &lt;-- "Send and receive Ethernet frames" --&gt; Firewall
  NetAPI -- "Perform DNS lookups" --&gt; DNS
  NetAPI -- "Add and remove rules" --&gt; Firewall
  TLS -- "Request network connections" --&gt; NetAPI
  TLS -- "Send and receive" --&gt; TCPIP
  NetAPI -- "Create connections and perform DNS requests" --&gt; TCPIP
  MQTT -- "Create TLS connections and exchange data" --&gt; TLS
  User -- "Create connections to MQTT server and publish / subscribe" --&gt; MQTT
  MQTT -- "Callbacks for acknowledgements and subscription notifications" --&gt; User
  SNTP -- "Create UDP socket, authorise endpoints" --&gt; NetAPI
  SNTP -- "Send and receive SNTP (UDP) packets" --&gt; TCPIP
  TLS -- "Request wall-clock time for certificate checks" --&gt; SNTP
  style User fill: #5b5
  classDef ThirdParty fill: #e44
</pre>

<p>Note that the driver for the Ethernet device is instantiated in the firewall compartment.
What happens if an attacker gets arbitrary-code execution here?
They could mount a denial-of-service attack (refuse to forward Ethernet frames in or out).
They could tamper with Ethernet frames.</p>

<p>This sounds bad but the rest of a network stack already has to assume that things like this can happen.
Packets coming over the network are intrinsically untrusted.
The TCP/IP stack has to assume that they may be malicious.
It isn’t always good at this.
The FreeRTOS TCP/IP stack that we use has had 15 CVEs disclosed since it was released, but our compartmentalisation strategy mitigates all of them.
By placing the parts of the system that are exposed to an attacker in the <em>least</em>, not most, trusted places, we make it easy to build secure systems.</p>

<script src="https://cdn.jsdelivr.net/npm/mermaid@10.9.1/dist/mermaid.min.js"></script>]]></content><author><name>David Chisnall</name></author><category term="rtos" /><category term="philosophy" /><summary type="html"><![CDATA[In the last month or so (partly as a result of going to SOSP), I’ve seen a lot of architecture diagrams for operating systems and one thing has struck me about all of them: they put the device drivers in the wrong place. Here, for example, is TockOS:]]></summary></entry><entry><title type="html">Rust coming to CHERIoT!</title><link href="https://cheriot.org/rtos/publication/2025/11/21/rust-coming-to-cheriot.html" rel="alternate" type="text/html" title="Rust coming to CHERIoT!" /><published>2025-11-21T00:00:00+00:00</published><updated>2025-11-21T00:00:00+00:00</updated><id>https://cheriot.org/rtos/publication/2025/11/21/rust-coming-to-cheriot</id><content type="html" xml:base="https://cheriot.org/rtos/publication/2025/11/21/rust-coming-to-cheriot.html"><![CDATA[<p>The recent <a href="https://www.ukri.org/news/21-million-backing-for-technology-to-stop-cyber-attackers/">UKRI press release</a> announcing £21M for CHERI projects includes two CHERIoT-focused activities.
The tools programme, in particular, has funded SCI Semiconductor to bring Rust support for CHERIoT to production quality.
This is being done in close collaboration with the folks from the University of Kent, who previously implemented Rust support for Arm’s Morello (CHERI) platform.</p>

<p>The funding was awarded back in September, but the embargo was lifted last week and so we can talk about it publicly.</p>

<h1 id="rust-and-cheriot-have-complementary-benefits">Rust and CHERIoT have complementary benefits</h1>

<p>I’ve <a href="/cheri/myths/2024/08/28/cheri-myths-safe-languages.html">written previously about why safe languages and CHERI are complementary</a>.
Superficially, both Rust and CHERI provide similar benefits in terms of memory safety.
That similarity goes away when you look at the details.</p>

<p>Rust provides a very rich set of type-system guarantees.
If you are writing software, you can use Rust’s type system to enforce a range of properties that go beyond simple memory safety.
The key part here is ‘if you are writing software’.</p>

<p>The industry learned some important lessons from Java and JavaScript attempts at language-level sandboxing: it’s <em>very</em> hard to write a compiler that assumes that the programmer is an adversary.
Any soundness issue in the underlying type system or any bug in the compiler can be a security vulnerability if you assume that the person writing the software is malicious and actively trying to break the guarantees that the language aims to enforce.</p>

<p>The Rust compiler is currently tracking <a href="https://github.com/rust-lang/rust/issues?q=is%3Aissue%20state%3Aopen%20label%3AI-unsound">107 bugs marked as soundness issues</a>.
A typical Rust programmer is unlikely to encounter these.
Encountering these bugs typically requires poking at corner cases of the language that you’re unlikely to hit by accident.
In contrast, a malicious programmer wanting to insert a supply-chain vulnerability into something that you consume has a rich set of tools.</p>

<p>The CHERIoT compartmentalisation was designed with this kind of adversary in mind.
It assumes that you may be incorporating arbitrarily buggy or malicious code into a device’s firmware and need to be able to protect against components that are compromised.
The checks that CHERIoT does at run time are less rich than those that Rust enforces at compile time, but are not bypassable, even with <code class="language-plaintext highlighter-rouge">unsafe</code> code or inline assembly.</p>

<p>This means that you can use Rust’s type system to ensure that <em>your</em> code has strong confidentiality, integrity, <em>and availability</em> guarantees, while simultaneously using CHERIoT to ensure that supply-chain code cannot violate these guarantees (at least with respect to confidentiality and integrity, though availability is a bit harder).
Rust’s guarantees make achieving high levels of confidence in your code much easier than if you used C/C++.
Rust is one of the few languages that delivers this kind of guarantee and is also able to run on small embedded devices (such as CHERIoT implementations), making it a very exciting choice for future CHERIoT development.</p>

<h1 id="cheri-and-rust-give-you-the-benefits-of-rust-faster">CHERI and Rust give you the benefits of Rust faster</h1>

<p>Rust gives rich guarantees to Rust code.
Most software is not written in a single language, and especially not in a relatively young language.
What if you have some legacy C/C++ component?
You can use it from Rust, but in most systems this means bugs in the legacy code can completely undermine the security guarantees of the Rust code.
A memory-safety bug in C code can corrupt <em>any</em> Rust object in the same process.</p>

<p>On a CHERI system, you get different levels of guarantee depending on how you mix languages.
Calling C code from Rust within the same compartment requires the C code to follow the CHERI rules.
If Rust code passes a pointer to C, the C code may tamper with objects reachable from there, but can’t arbitrarily affect the system unless it has a much narrower set of bugs.
For example, a use of uninitialised stack memory may let the C code access objects left on the stack.</p>

<p>If you put the C code in another compartment, these guarantees become a lot stronger.
C code can tamper with objects passed as arguments (or objects reachable from them) but has no access to the Rust compartment’s stack or globals.
This makes it <em>much</em> easier to reason about the impact of bugs in C code when adopting Rust.</p>

<p>You can take this even further and restrict the permissions on pointers passed from Rust to C.
Do you want C code to be able to read a Rust object or object graph?
Or perhaps modify an object but not capture a pointer to it (similar to borrow semantics)?
The hardware can enforce these properties.</p>

<p>This lets you get all of the great benefits of Rust from the <em>very first function that you write in Rust</em>, rather than having to wait until you’ve rewritten everything in Rust.</p>

<h1 id="what-are-we-trying-to-do">What are we trying to do?</h1>

<p>Initially, we aim to make Rust work as a source language for targeting CHERIoT.
CHERIoT is an embedded platform, so the no_std + alloc mode for Rust (you can dynamically allocate memory, but can’t use most of the standard library) makes sense as a target.
This will make it easy to port existing embedded Rust code to CHERIoT and to write new Rust components.
As the project runs, there will be a lot of quality-of-implementation work to ensure that it’s in a state to upstream and, until then, in a state where we can support it.</p>

<p>The next step in the project is to make sure that Rust is a first-class citizen of the CHERIoT platform.
We have a set of C/C++ language extensions to provide rich compartmentalisation features.
Rust will need equivalents of these.
A direct port of the features is quite easy and works as a minimum viable product, but we’d also like to make sure that these work as <em>idiomatic</em> Rust: you shouldn’t have to write C-like Rust to use CHERIoT features.</p>

<p>Finally, there are several things in the Rust type system that can be dynamically enforced in CHERIoT.
In current Rust, every call to non-Rust code must be <code class="language-plaintext highlighter-rouge">unsafe</code>.
I hope that we’ll be able to relax this requirement and have the compiler enforce Rust properties by removing permissions from pointers before calling C.</p>

<p>There are a lot of subtleties here.
The authors of Tock <a href="https://tockos.org/assets/papers/2025-sosp-tock-decade.pdf">discovered that exposing Rust to untrusted code has some issues</a> and provided solutions that worked in their specific context.
I’m optimistic that the richer substrate of the CHERIoT ISA gives the compiler more tools to provide generic solutions to these issues.</p>

<p>The funded project is specifically scoped to CHERIoT, but we’re building on work from Morello and hope to make it easy to support other CHERI platforms.</p>

<h1 id="rust-enables-verification">Rust enables verification</h1>

<p>The <a href="https://github.com/verus-lang/verus">Verus</a> project builds language-integrated formal verification tools in Rust.
This is particularly interesting for core parts of CHERIoT RTOS.
The lowest level parts of an operating system are tricky for safer systems languages because the things that they do are <em>intrinsically</em> unsafe.
They must be able to use escape hatches that opt out of parts of the safety guarantees of a language like Rust <em>because they are the things that implement those guarantees</em>.</p>

<p>You can think of rich type systems as <em>off-the-shelf</em> verification tools (they define some generic properties and prove them for every program, and reject those for which they can’t prove them), whereas the lowest-level parts of systems code need <em>bespoke</em> verification to prove one-off sets of properties.
There was <a href="https://mars-research.github.io/doc/2025-sosp-atmo.pdf">some great work at SOSP this year showing how Verus can prove key aspects of a kernel</a> and I’m excited to see how far we can get with this in CHERIoT.</p>

<h1 id="where-will-this-project-live">Where will this project live?</h1>

<p>The <a href="https://rust.cheriot.org">CHERIoT Rust project has its own web site</a> which is where we’ll post more public information.
The <a href="https://github.com/CHERIoT-Platform/cheri-rust">compiler repository</a> is public, but is not yet ready for general use.
We’ll post calls for testing when it reaches an early preview state.
As with any other aspect of CHERIoT, contributions are welcome.
Please join <a href="https://signal.group/#CjQKIElxAs3t3MUEMOEmQEuMHRK4rErUk2xVeFzjAjFXAShzEhCK9qQwEMFKGLGZnCjrQ7zm">our public Signal chat</a> if you want to discuss the project or help out!</p>]]></content><author><name>David Chisnall</name></author><category term="rtos" /><category term="publication" /><summary type="html"><![CDATA[The recent UKRI press release announcing £21M for CHERI projects includes two CHERIoT-focused activities. The tools programme, in particular, has funded SCI Semiconductor to bring Rust support for CHERIoT to production quality. This is being done in close collaboration with the folks from the University of Kent, who previously implemented Rust support for Arm’s Morello (CHERI) platform.]]></summary></entry><entry><title type="html">CHERI or CHERIoT?</title><link href="https://cheriot.org/cheri/philosophy/isa/2025/11/19/cheri-or-cheriot.html" rel="alternate" type="text/html" title="CHERI or CHERIoT?" /><published>2025-11-19T00:00:00+00:00</published><updated>2025-11-19T00:00:00+00:00</updated><id>https://cheriot.org/cheri/philosophy/isa/2025/11/19/cheri-or-cheriot</id><content type="html" xml:base="https://cheriot.org/cheri/philosophy/isa/2025/11/19/cheri-or-cheriot.html"><![CDATA[<p>Recently, a few people have asked me ‘should we do CHERI or CHERIoT?’
I hadn’t previously written an answer to this because the question doesn’t make sense: it’s like asking ‘should we do MMUs or x86 page tables?’
Since it’s been asked several times, I think it’s worth taking some time to explain <em>why</em> the question doesn’t make sense.</p>

<h2 id="cheri-is-an-abstract-architecture">CHERI is an abstract architecture</h2>

<p>CHERI is a conceptual extension to conventional computing architectures that defines a <em>capability model</em> for accessing memory within an address space.
There are a <em>lot</em> of concrete instantiations of this conceptual model.
The <a href="https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201406-isca2014-cheri.pdf">original CHERI research was done on a 64-bit MIPS variant.</a>
<a href="https://www.arm.com/architecture/cpu/morello">Arm Morello extended ARMv8 with CHERI extensions</a>.
The University of Cambridge <a href="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-987.pdf">CHERI ISA v9 Technical Report</a> contains a RISC-V CHERI adaptation and even a sketch of CHERI for x86.
And there is an ongoing effort to standardise <a href="https://github.com/riscv/riscv-cheri">RISC-V CHERI base architectures</a>.</p>

<p>These all provide the same core set of features.
Most software isn’t written for a single target instruction-set architecture (ISA); it’s written assuming a set of language-level abstractions that can be represented in a particular target.
Software written for CHERI in C (or any language that is higher-level than C) doesn’t care about the bit patterns of capabilities, how bounds or permissions are encoded, and so on.
It cares only that you have some capability registers, that you can store capabilities in memory, and that the C language notion of a pointer can be represented by a CHERI capability for the target ISA.</p>

<p>I like to use memory-management units (MMUs) as an analogy for CHERI.
MIPS and SPARC had software-managed translation-lookaside buffers (TLBs).
x86 and Arm had architectural page tables that hardware walked to fill the TLB automatically on miss.
PowerPC had an MMU design that, in the words of Linus Torvalds, ‘can be used to frighten small children’.
All of these implemented the same set of abstractions in terms of processes, shared memory, copy-on-write regions, and so on.
All of these are MMU designs, just as there are many possible CHERI implementations.</p>

<p>There are lots of things like this in ISAs.
Arm’s Neon, PowerPC’s AltiVec, and x86 SSE are all SIMD extensions, which provide similar programming models, but they all have quite different low-level details.
It’s quite possible to write code that targets all of them.</p>

<h2 id="cheriot-is-a-concrete-platform">CHERIoT is a concrete platform</h2>

<p>CHERIoT is a complete ISA specification, co-designed with a software stack.
In the same way that x86 page tables or MIPS’ software-managed TLB are both concrete instances of the abstract idea of a memory-management unit, CHERIoT is a concrete instantiation of CHERI.</p>

<p>This means that there is no ‘CHERI or CHERIoT’ choice.
The real choices are between CHERI or not CHERI, or between CHERIoT and some other concrete instantiation of CHERI.
Having worked on CHERI since 2012, I am obviously biased on the first of these: given a choice between CHERI and not-CHERI, you should definitely pick CHERI!
The second choice (CHERIoT or some other CHERI variant) is more nuanced, as I’ll discuss more in the rest of this post.
There are good reasons for CHERIoT but there are also domains where it is not the best choice.</p>

<h2 id="cheriot-is-a-rich-cheri-platform">CHERIoT is a <em>rich</em> CHERI platform</h2>

<p>A minimal CHERI specification provides pointer integrity, bounds enforcement, and some permissions.
Pointer integrity means (among other things) that you can precisely differentiate between pointers and other kinds of non-pointer data in memory and so can build temporal safety on top.
Most of the CHERI temporal safety work has provided users with protection against use-after-reallocation issues.
A pointer in a CHERI system may point to deallocated memory, but it will not point to <em>reallocated</em> memory.
Something in the system will ensure that deallocated pointers are gone before the memory allocator reuses memory.
This means that any use-after-free bug will either point to the old object, or will trap.
It will never point to a new object and cause accidental aliasing.</p>

<p>CHERIoT builds on top of the core CHERI ideas with a couple of hardware features that guarantee deterministic trapping for use-after-free bugs.
CHERIoT also provides a <a href="/rtos/sealing/2025/11/06/sealing.html">rich sealing design that works with our software model</a>.
It extends the sentry mechanism in CHERI systems <a href="/isa/ibex/2024/06/26/sentries-cfi.html">to provide stronger control-flow integrity properties</a> than most other CHERI platforms, and also uses them to enforce interrupt control.</p>

<p>The entire system is designed to allow <a href="/rtos/firmware/auditing/2024/03/01/cheriot-audit.html">fine-grained auditing of compartment rights</a>, which lets us respect the principle of least privilege by having compartments as small as a single function <em>and statically reasoning about what they can do in the presence of a compromise</em>.
This set of building blocks lets us do things like <a href="/rtos/networking/auditing/2024/03/08/cheriot-network-stack.html">privilege separate the network stack and make callers respect the principle of intentional use, again with auditable guarantees that let you reason about which compartments can talk to which remote servers</a>.</p>

<p>This adds up to an easy-to-use programmer model and richer security guarantees than, we believe, any other system (CHERI or otherwise).</p>

<h2 id="cheriot-is-designed-for-resource-constrained-systems">CHERIoT is designed for resource-constrained systems</h2>

<p>Some of the strengths of CHERIoT come from the fact that we were willing to tailor the system to small devices.
Prior CHERI systems rely on the MMU to support concurrent revocation as their mechanism for temporal memory safety.
CHERIoT does not include or need an MMU: it uses an alternative approach for temporal safety, inspired by our previous CHERI+MTE work.</p>

<p>You don’t want an MMU on an embedded device.
An MMU is typically larger than an entire microcontroller core!
Page tables are large data structures and they enforce a minimum granule size for memory protection and accounting.
A minimal process isolated with a typical MMU needs one page for the stack, one page of code, and one for read-write globals.
On top of that, it needs a complete page table, which is at least two pages.
With 4 KiB pages, this means you need a minimum of 20 KiB for an isolated component, of which 8 KiB is purely book-keeping overhead.
On a CHERIoT system, a lot of our compartments are smaller than the page tables that would be necessary to isolate them.
In addition, MMUs bring nondeterminism, which is not ideal in any system with even fairly soft realtime requirements.</p>
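
<p>The arithmetic behind those figures can be sketched in a few lines of C.
This is only an illustration: the 4 KiB page size is an assumption (it is the value that makes the 20 KiB and 8 KiB figures work out), and the names are hypothetical.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

/* Assumption: 4 KiB pages, the common small-page size. */
enum { PageSizeKiB = 4 };

/* Pages that hold the component itself: stack, code, read-write globals. */
enum { PayloadPages = 3 };

/* Pages for a minimal page table ('at least two' in the text above). */
enum { PageTablePages = 2 };

/* Total footprint of one MMU-isolated component, in KiB. */
static inline uint32_t mmu_minimum_footprint_kib(void)
{
    return (PayloadPages + PageTablePages) * PageSizeKiB;
}

/* Portion of that footprint that is pure book-keeping, in KiB. */
static inline uint32_t mmu_bookkeeping_kib(void)
{
    return PageTablePages * PageSizeKiB;
}
</code></pre></div></div>

<p>Under these assumptions, 40% of the minimum footprint is page-table overhead before the component has stored a single byte of its own.</p>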

<p>The temporal memory system in CHERIoT is designed to scale across the range of microcontroller cores, including low-core-count multicore systems.
We check whether capabilities are valid when you load them.
Most pointers are used more than once, so this is more efficient than checking on every use (as an MTE system must do), but does not scale as well to large multicore systems or complex memory hierarchies.
You would not want to build a temporal-safety system like ours for a large server system, for example.</p>

<p>CHERIoT also extends CHERI’s sealed function pointer (“sentry”) mechanism for interrupt control.
This has some enormous benefits for the programmer model.
It is quite easy to implement on microcontroller-class systems, even dual-issue designs such as CHERIoT-Kudu.
It would be very hard to scale to big out-of-order cores.</p>

<p>And that’s all fine, for the same reason it’s fine to run different operating systems on different scales of core: different abstractions and different optimisations make sense at different scales.
By the time you’ve built a system where you start to run into scalability challenges with CHERIoT, you have enough area that an MMU doesn’t add much overhead and enough RAM that you would benefit from running an OS designed for larger computers.</p>

<h2 id="what-about-standard-risc-v-cheri">What about standard RISC-V CHERI?</h2>

<p>The RISC-V standardisation process has an unenviable task of trying to create a standard base for RISC-V CHERI systems that will work for 2-stage microcontrollers, 25-stage superscalar out-of-order server chips, GPUs, AI accelerators, SmartNICs, and so on.
This is likely to end up being a (hopefully small) family of bases and a set of extensions.</p>

<p>If all goes well, CHERIoT v2 will be one of those bases plus some extensions.
RVB26 will be a RISC-V profile assembled from those bases and a set of extensions necessary to run a CHERI Linux or CheriBSD.</p>

<p>As with the rest of RISC-V, any useful chip implements a base architecture and a set of extensions.
Profiles exist to corral a set of extensions that software can depend on in environments where binary compatibility is important, such as those running off-the-shelf operating systems and programs.</p>

<h2 id="so-when-should-i-use-cheriot-vs-some-other-cheri">So when should I use CHERIoT vs some other CHERI?</h2>

<p>If you are looking for a microcontroller-class system, CHERIoT is probably the right answer.
It is co-designed with a compartmentalisation model and a rich set of software abstractions (implemented in CHERIoT RTOS), and provides temporal safety as a baseline feature.
It is a mature and stable target, supported by multiple organisations.</p>

<p>If you are looking for an application core, CHERIoT is not for you.
There are a few cases where people have moved to using Linux on an Arm A-profile system and could run the same workload on a cheaper CHERIoT system, but don’t expect to be able to run arbitrary Linux workloads on a CHERIoT system (ever).
The upcoming CHERI RISC-V profile is a far better choice for these use cases.
Capabilities Limited and lowRISC are implementing this in the open-source CVA6 core, so there will soon be a path to a useful open-source CHERI core in this part of the design space.
Codasip’s X730 core is also available for commercial licenses and is targeting this spec (and pre-standard versions of it until it is ratified).
Hopefully there will be more implementations available in coming years.</p>

<p>If you are looking for an ISA as a base for an accelerator or coprocessor, CHERIoT <em>may</em> be the right choice, depending on the rest of the system.
Depending on your requirements, you may want to do something completely custom.
As long as it provides the same underlying security guarantees, it can still be CHERI!</p>]]></content><author><name>David Chisnall</name></author><category term="cheri" /><category term="philosophy" /><category term="isa" /><summary type="html"><![CDATA[Recently, a few people have asked me ‘should we do CHERI or CHERIoT?’ I hadn’t previously written an answer to this because the question doesn’t make sense: it’s like asking ‘should we do MMUs or x86 page tables?’ Since it’s been asked several times, I think it’s worth taking some time to explain why the question doesn’t make sense.]]></summary></entry><entry><title type="html">How CHERIoT uses Sealing</title><link href="https://cheriot.org/rtos/sealing/2025/11/06/sealing.html" rel="alternate" type="text/html" title="How CHERIoT uses Sealing" /><published>2025-11-06T00:00:00+00:00</published><updated>2025-11-06T00:00:00+00:00</updated><id>https://cheriot.org/rtos/sealing/2025/11/06/sealing</id><content type="html" xml:base="https://cheriot.org/rtos/sealing/2025/11/06/sealing.html"><![CDATA[<p>Sealing is one of the oldest parts of CHERI and one of the most powerful.
When I joined the project in 2012 it was integral to the early prototype call-gate mechanism.
You can find this version in <a href="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-850.pdf">our 2014 tech report</a>.
It included <code class="language-plaintext highlighter-rouge">CSealCode</code> and <code class="language-plaintext highlighter-rouge">CSealData</code> instructions that assembled a pair of capabilities that could be used with the <code class="language-plaintext highlighter-rouge">CCall</code> instruction to perform a cross-compartment call.
By <a href="https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201505-oakland2015-cheri-compartmentalization.pdf">our IEEE Security and Privacy 2015 paper</a>, this had been replaced with the modern sealing mechanism that we use today.</p>

<h1 id="a-quick-intro-to-sealing">A quick intro to sealing</h1>

<p>CHERI capabilities are most commonly used as pointers (they can also be used for very coarse-grained sandboxing, with a model similar to WebAssembly).
Each capability is an address plus some metadata, protected by <em>guarded manipulation</em>.
The CPU enforces properties on how capabilities can be manipulated (for example, their bounds can be shrunk, but not expanded).
Importantly, it also checks the metadata before the capability can be used for any operation.
For example, if you use a capability as the base for a load instruction then the core will check that the entire width of the load is within the bounds and that the capability has load permission.</p>

<p>The metadata for a CHERI capability has an <em>object type</em> (otype) field.
If you’re just using capabilities to represent pointers in unmodified C/C++ code, then you will only ever see capabilities with zero in their otype field.
This represents an unsealed capability.</p>

<p>The <code class="language-plaintext highlighter-rouge">CSeal</code> instruction (which has different spellings in some CHERI variants) combines two capabilities.
One is a ‘normal’ pointer-like capability.
The other, the <em>sealing key</em>, has the permit-seal permission and its address (sometimes referred to as ‘value’ or ‘cursor’) does not represent a memory location, but instead represents a value in the space of types.
The result of sealing is a copy of the pointer-like input with its otype field set to the address of the sealing key, making it a <em>sealed</em> capability.</p>

<p>You can pass this around just like any other pointer, but you can’t use or modify it.
If you try to use it as the base for loads or stores, it will trap.
If you try to modify it (the address or the metadata), you will get an untagged (invalid) capability.</p>

<p>If you use that capability as one operand to <code class="language-plaintext highlighter-rouge">CUnseal</code> and provide an <em>unsealing key</em> (a capability with permit-unseal permission) that has the address from the sealed capability’s otype, you get back the original pointer-like capability.
This lets you pass untrusted code a pointer to something that they can pass back but can’t tamper with.</p>
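
<p>The rules above can be captured in a toy software model.
This is a sketch for building intuition, not a description of any real capability encoding: <code class="language-plaintext highlighter-rouge">ModelCap</code> and the <code class="language-plaintext highlighter-rouge">model_*</code> functions are hypothetical names, bounds and most permissions are omitted, and the choice to return an untagged value on a failed seal or unseal (rather than trap) is an assumption of the model.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;

/* Toy model of a capability; not a real hardware layout. */
struct ModelCap
{
    uint32_t address;      /* The address (or 'cursor'). */
    uint32_t otype;        /* Zero means unsealed. */
    bool     tag;          /* False means invalid (untagged). */
    bool     permitSeal;   /* May be used as a sealing key. */
    bool     permitUnseal; /* May be used as an unsealing key. */
};

/* Model of CSeal: copy `cap` with its otype set to the key's address. */
static struct ModelCap model_seal(struct ModelCap cap, struct ModelCap key)
{
    if (!(cap.tag &amp;&amp; key.tag &amp;&amp; key.permitSeal) || cap.otype != 0 ||
        key.address == 0)
    {
        cap.tag = false;
        return cap;
    }
    cap.otype = key.address;
    return cap;
}

/* Model of CUnseal: succeeds only if the key's address matches the otype. */
static struct ModelCap model_unseal(struct ModelCap sealed, struct ModelCap key)
{
    if (!(sealed.tag &amp;&amp; key.tag &amp;&amp; key.permitUnseal) || sealed.otype == 0 ||
        sealed.otype != key.address)
    {
        sealed.tag = false;
        return sealed;
    }
    sealed.otype = 0;
    return sealed;
}

/* A load through a capability requires it to be valid and unsealed. */
static bool model_can_load(struct ModelCap cap)
{
    return cap.tag &amp;&amp; cap.otype == 0;
}

/* Modifying a sealed capability yields an untagged (invalid) result. */
static struct ModelCap model_set_address(struct ModelCap cap, uint32_t address)
{
    if (cap.otype != 0)
    {
        cap.tag = false;
    }
    cap.address = address;
    return cap;
}
</code></pre></div></div>

<p>In this model, a pointer sealed with a key whose address is 5 cannot be loaded through or moved, and only a key with permit-unseal and address 5 recovers the original.</p>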

<h1 id="sealing-gives-type-safety">Sealing gives type safety</h1>

<p>The object type in a CHERI capability doesn’t necessarily have to have a 1:1 mapping to a language-level type.
It’s quite common for a set of types to have some kind of internal type discriminator.
For example, in our <a href="https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201704-asplos-cherijni.pdf">ASPLOS 2017 CHERI JNI paper</a>, we used three otypes for everything passed from the JVM to native code.
One each for field-ID and method-ID structures, and one for all Java objects.
Each Java object starts with a pointer to its class, so we didn’t need one otype for each Java class type, just one for all Java objects.
The virtualised sealing mechanism in CHERIoT uses this same approach to multiplex a huge number of possible object types onto two hardware otypes (one for statically allocated objects and one for dynamic, so the memory allocator is not in the TCB for statically allocated sealed objects).
More on this later.</p>

<p>If you couple a sealing key type with a language-level type (or family of self-disambiguating types), then you can build type safety that works in the presence of an attacker.
It’s common in C/C++ to expose an <em>opaque type</em> at API boundaries.
This is usually implemented as a pointer to a forward-declared structure type.
C and C++ will not let you dereference this pointer unless you cast it to another type.
CHERI sealing lets you enforce the no-dereference rule, even if malicious code does cast it to some other type.
This means that you can hand sealed capabilities to other compartments and they must treat them as opaque types.
When you get them back, you have a lightweight check that they really are the type that you expect.</p>
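
<p>For readers less familiar with the idiom, this is roughly what an opaque type looks like in plain C.
The <code class="language-plaintext highlighter-rouge">Counter</code> API is a made-up example, and the public and private halves are shown in one listing only for brevity; they would normally live in a header and a source file respectively.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdlib.h&gt;

/* Public API: callers see only a forward declaration. */
struct Counter;
struct Counter *counter_create(void);
int             counter_increment(struct Counter *counter);
void            counter_destroy(struct Counter *counter);

/* Private implementation: the layout is known only here. */
struct Counter
{
    int value;
};

/* Allocate a zero-initialised counter; returns NULL on failure. */
struct Counter *counter_create(void)
{
    return calloc(1, sizeof(struct Counter));
}

/* Increment the counter and return its new value. */
int counter_increment(struct Counter *counter)
{
    return ++counter-&gt;value;
}

/* Release a counter returned by counter_create. */
void counter_destroy(struct Counter *counter)
{
    free(counter);
}
</code></pre></div></div>

<p>Nothing here stops a caller from casting the pointer to <code class="language-plaintext highlighter-rouge">int *</code> and writing to it; the language rule is only a convention.
Returning a sealed capability instead of a plain pointer is what turns that convention into something the hardware enforces.</p>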

<p>After the 2015 paper, we separated the permit-seal and permit-unseal permissions.
In most CHERIoT use cases, the entity that can seal and unseal pointers with a particular otype is the same.
This isn’t universal.
C++ provided one of our original use cases for separating them.
If you seal C++ vtables with a well-known otype, and make the permit-unseal capability for it available anywhere, then you can have a C++ ABI where only the loader can forge vtables.
This makes code reuse attacks harder.</p>

<p>There are other situations where an object type is used for <em>integrity</em> but not <em>confidentiality</em>.
It is an attestation that some software trusted to seal with a particular otype has done so, even if anyone is then able to unseal the result.
The sentry mechanism (discussed later) is a variant of this idea.</p>

<h1 id="virtualised-sealing-shares-hardware-object-types">Virtualised sealing shares hardware object types</h1>

<p>Morello had a 15-bit object-type field, which is sufficient for a lot of things.
When we scaled CHERI down to 32-bit systems for CHERIoT, we ended up with only three bits of space.
Of these, the zero value means not-sealed, so there were only seven bit patterns available.
Even simple embedded software typically has more than seven types.
CHERIoT systems usually have more than seven compartments, so seven types isn’t even enough for one type per compartment (and some compartments wish to offer more than one sealed type).
Because they are (mostly) used for different purposes, we differentiated executable and data capabilities, so 3 bits actually gives 14 sealing types, seven for data and seven for sentries (see later).</p>

<p>The <em>virtualised sealing</em> mechanism was designed to work around this shortage of sealing types.
We reserve two of the object types for objects that are instances of a structure with the following layout:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">SealedObject</span>
<span class="p">{</span>
    <span class="c1">/// The sealing type for this object.</span>
    <span class="kt">uint32_t</span> <span class="n">type</span><span class="p">;</span>
    <span class="c1">/// Padding for alignment</span>
    <span class="kt">uint32_t</span> <span class="n">padding</span><span class="p">;</span>
    <span class="c1">/// The real data for this.</span>
    <span class="kt">char</span> <span class="n">data</span><span class="p">[];</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The first 32-bit word is the type, followed by 32 bits of padding, and then the real object.</p>

<p>Our virtualised mechanism takes advantage of the fact that we can represent a lot more values in the address field (32 bits) of a capability than in the otype field (three bits).
Any address in a permit-seal or permit-unseal capability that cannot fit into the otype field is available to authorise sealing or unsealing with the virtualised mechanism.
This gives us almost four billion types that we can represent with the virtualised mechanism, at the expense of being able to seal only complete objects and of each object having a single type.
This is fine for most use cases, though language VMs on bigger systems would benefit from at least a few hardware otypes.
We also reserve two data otypes for future use, so have some flexibility going forward.</p>

<p>The CHERIoT <code class="language-plaintext highlighter-rouge">token_unseal</code> function unseals the object using the hardware sealing key and then compares the virtual otype in the header to the type of the permit-unseal capability passed as the key.
If they match, it returns an unsealed capability to the object without the header, so the caller doesn’t ever have access to the header.</p>
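
<p>The comparison step can be sketched as follows.
This is a simplified model, not the real <code class="language-plaintext highlighter-rouge">token_unseal</code> signature or implementation: <code class="language-plaintext highlighter-rouge">model_token_unseal</code> is a hypothetical name, the hardware unsealing step is assumed to have already happened, a plain integer stands in for the address of the permit-unseal capability, and the model takes a pointer to the header for simplicity.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Header layout from the text above. */
struct SealedObject
{
    /* The sealing type for this object. */
    uint32_t type;
    /* Padding for alignment. */
    uint32_t padding;
    /* The real data for this. */
    char data[];
};

/*
 * Model of the check made after the hardware unseal.  `keyAddress` stands
 * in for the address of the caller's permit-unseal capability.  Returns a
 * pointer to the payload (never the header) on a match, NULL otherwise.
 */
static void *model_token_unseal(uint32_t keyAddress, struct SealedObject *object)
{
    if (object == NULL || object-&gt;type != keyAddress)
    {
        return NULL;
    }
    return object-&gt;data;
}
</code></pre></div></div>

<p>The important property is that the caller only ever receives a pointer to <code class="language-plaintext highlighter-rouge">data</code>, so it cannot rewrite the <code class="language-plaintext highlighter-rouge">type</code> field to impersonate a different sealed type.</p>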

<p>Sealed pointers using the software sealing mechanism always point to the end of the header.
The header must be strongly aligned, which means that the low three bits are unused.
We use these to implement three <em>software permissions</em>.
As with the CHERI permissions, these can be cleared but not set.</p>
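
<p>The bit manipulation involved is simple enough to sketch.
This models only the arithmetic on the address value; how CHERIoT RTOS exposes the operation is not shown here, the function names are hypothetical, and treating a set bit as ‘permission held’ is an assumption.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

/* The low three address bits carry the software permissions. */
enum { SoftwarePermissionMask = 0x7 };

/* Read the three software permission bits from an address. */
static uint32_t model_permissions(uint32_t address)
{
    return address &amp; SoftwarePermissionMask;
}

/*
 * Keep only the permissions that are also set in `keep`.  Because this is
 * a bitwise AND, it can clear permission bits but can never set them.
 */
static uint32_t model_permissions_and(uint32_t address, uint32_t keep)
{
    return address &amp; (keep | ~(uint32_t)SoftwarePermissionMask);
}

/* Recover the aligned address of the end of the header. */
static uint32_t model_strip_permissions(uint32_t address)
{
    return address &amp; ~(uint32_t)SoftwarePermissionMask;
}
</code></pre></div></div>

<p>Once a permission has been cleared, applying a wider mask afterwards does not bring it back, which mirrors the monotonic behaviour of the hardware permissions.</p>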

<h1 id="type-safety-gives-easy-to-use-handles">Type safety gives easy-to-use handles</h1>

<p>A lot of CHERIoT compartments use type-safe pointers as handles.
For example, if you open a socket, you get back a sealed capability to that socket’s state.
The same thing happens for message queues, TLS sessions, and so on.</p>

<p>In a conventional monolithic kernel, these would all be handles or file descriptors looked up in some table in the kernel.
In a microkernel, they’d either be indexed per caller, or managed via some handle-manager service.
In CHERIoT, these are simply opaque pointers.
If these are allocated from the heap, they also benefit from the temporal safety mechanisms: freeing the object that a handle refers to invalidates all handles, without requiring any further synchronisation.</p>

<p>Exactly the same abstractions that you use for data hiding in good API design work between trust domains, with a little bit of hardware help.
This is great for programmers because you can easily <em>retrofit</em> a security boundary.
If you have built an API around exposing opaque types, turning it into a robust security boundary simply means doing a few checks on the public APIs.</p>

<h1 id="type-safety-gives-static-software-defined-capabilities">Type safety gives static software-defined capabilities</h1>

<p>The same sealing mechanism that provides type safety for dynamic allocations also works for static objects.
You can create a CHERIoT compartment that has access to a sealed object where another compartment owns the sealing key.
The contents of these show up in the audit log.</p>

<p>We use these throughout the system to implement <em>software</em> capabilities.
CHERI (hardware) capabilities are unforgeable (delegable) tokens of authority to perform some <em>architectural</em> action; they authorise operations carried out on your behalf by the hardware.
CHERIoT software capabilities are unforgeable (delegable) tokens of authority to perform some <em>software</em> action; they authorise operations carried out on your behalf by some other compartment.</p>

<p>The first use of these most programmers will make is to allocate memory.
If you call <code class="language-plaintext highlighter-rouge">malloc</code>, this is a thin compatibility wrapper around <code class="language-plaintext highlighter-rouge">heap_allocate</code>, which requires an allocator (software) capability as an argument.
The allocator capability is a sealed object that contains a quota.
It authorises you to allocate memory until your quota is exhausted.
It also authorises you to free objects that were allocated from your quota and reclaim the quota.
Similarly, when you create a connected TCP socket, you pass a capability that authorises you to connect to a specific host and port.</p>
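
<p>The quota accounting for an allocator capability can be modelled like this.
It is a sketch of the idea only: the structure and function names are hypothetical, the real object is sealed so that only the allocator can see inside it, and a real allocator must also track which objects were charged to which quota rather than trusting the caller as this model does.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;

/* Stand-in for the contents of a sealed allocator capability. */
struct ModelAllocatorCapability
{
    /* Bytes that may still be allocated against this capability. */
    size_t quota;
};

/*
 * Charge `size` bytes to the quota.  Fails, leaving the quota unchanged,
 * if the remaining quota is too small.
 */
static bool model_quota_charge(struct ModelAllocatorCapability *capability,
                               size_t                           size)
{
    if (size &gt; capability-&gt;quota)
    {
        return false;
    }
    capability-&gt;quota -= size;
    return true;
}

/* Return `size` bytes to the quota when a charged object is freed. */
static void model_quota_refund(struct ModelAllocatorCapability *capability,
                               size_t                           size)
{
    capability-&gt;quota += size;
}
</code></pre></div></div>

<p>Because the quota lives inside the capability rather than in per-caller state, whoever holds the capability holds the right to allocate, and delegating it is just passing a pointer.</p>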

<p>These static capabilities can be inspected with <a href="/rtos/firmware/auditing/2024/03/01/cheriot-audit.html">CHERIoT Audit</a> so you can write policies that say things like ‘these four compartments, between them, can’t allocate more than 16 KiB of RAM’ or ‘this compartment may connect to my cloud back end, but nowhere else’.</p>

<h1 id="sealed-handles-give-trivial-flow-isolation">Sealed handles give trivial flow isolation</h1>

<p>Sealed capabilities are passed out and back as opaque values and can be unsealed without any global state beyond a constant (un)sealing key.
A lot of compartments that use them have no global mutable state.
If you pass a sealed capability to a TLS session into the TLS compartment, that compartment can unseal it and then see the state of your TLS session.</p>

<p>The thread operating on <em>your</em> TLS session, while within the TLS compartment, does not gain access to capabilities to other TLS sessions.
If an attacker gains arbitrary code execution in the TLS compartment while operating on one TLS flow, that doesn’t give them the ability to attack another one.</p>
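<p>A minimal sketch of this pattern in standard C++ follows. It only imitates sealing with access control in the type system (real sealing is enforced by the hardware), and every name in it (<code class="language-plaintext highlighter-rouge">SealedHandle</code>, <code class="language-plaintext highlighter-rouge">SealingKey</code>, <code class="language-plaintext highlighter-rouge">session_compartment</code>) is hypothetical. The point it illustrates is that the compartment's only global is a constant key, so each call can reach just the session named by the handle it was given.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Model of a sealed capability: holders may copy it and pass it around, but
// the payload pointer is released only to the matching key.
class SealedHandle
{
    std::uint32_t otype;
    void         *payload;
    SealedHandle(std::uint32_t t, void *p) : otype(t), payload(p) {}
    friend class SealingKey;
};

class SealingKey
{
    std::uint32_t otype;

  public:
    explicit constexpr SealingKey(std::uint32_t t) : otype(t) {}
    SealedHandle seal(void *p) const { return SealedHandle(otype, p); }
    // Returns nullptr if the handle was sealed with a different key.
    void *unseal(const SealedHandle &h) const
    {
        return h.otype == otype ? h.payload : nullptr;
    }
};

struct Session
{
    std::string peer;
    int         bytesSent = 0;
};

namespace session_compartment
{
    // The only compartment-global state is this constant key.
    constexpr SealingKey Key{7};

    SealedHandle open(Session &s) { return Key.seal(&s); }

    // Returns the running byte count, or -1 if the handle is not ours.
    int send(const SealedHandle &h, int bytes)
    {
        auto *s = static_cast<Session *>(Key.unseal(h));
        if (s == nullptr)
        {
            return -1;
        }
        s->bytesSent += bytes;
        return s->bytesSent;
    }
} // namespace session_compartment

// Usage sketch: work on flow A has no path to flow B's state.
void two_flows_example()
{
    Session      a{"a.example"};
    Session      b{"b.example"};
    SealedHandle ha = session_compartment::open(a);
    SealedHandle hb = session_compartment::open(b);
    session_compartment::send(ha, 10);
    assert(a.bytesSent == 10 && b.bytesSent == 0);
    session_compartment::send(hb, 3);
    assert(b.bytesSent == 3);
}
```

<p>In this model the caller owns the <code class="language-plaintext highlighter-rouge">Session</code> storage, which is a simplification; the property being shown is only that a call holding one handle cannot name any other session.</p>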

<p>The same applies to much simpler compartments.
The message-queue compartment, which provides secure message queues and streams between compartments, has the same property.
The only state that it operates over is reached via the sealed queue pointer or one of the arguments.
If you dynamically compromise this compartment, you can tamper with a queue that you hold an endpoint handle for, but not any other queue.</p>

<h1 id="sentries-build-on-sealing">Sentries build on sealing</h1>

<p>CHERI also has a notion of a <em>sealed entry</em> (sentry) capability.
These are sealed capabilities with a special otype that allows them to be used as jump targets.
When you jump to a sentry, it is implicitly unsealed and installed as the program counter.
PC-relative loads then let the jumped-to code retrieve capabilities inaccessible to the code that held the sentry.</p>

<p>In CHERIoT RTOS, all library functions, including the switcher (which handles cross-compartment calls), are provided as sentries.</p>

<p>Morello had several variants of sentries, including some designed for descriptors where the sentry was actually a pointer to a pair of a code and data capability.
The jump would load the data capability and branch to the code capability.
This underlying mechanism is very flexible and there is a lot more research to be done on how it can be used in the future on big systems.</p>

<h1 id="cheriot-provides-rich-sentries">CHERIoT provides rich sentries</h1>

<p>CHERIoT uses sentries to control interrupts.
Rather than the usual paradigm of software explicitly managing the interrupt enable status bit, we have sentry variants that explicitly enable or disable interrupts when you jump to them.
This encourages a more structured style of code for interrupt management.
Moreover, the interrupt status of a function is visible in the audit report, so it is possible to see which compartments are able to call which interrupt-disabling functions.
This is one of the ways in which CHERIoT is very much designed for embedded systems.
Implementing this feature is moderately easy on in-order pipelines, but would be much harder on out-of-order machines.
This is also one of the strengths of RISC-V: we can add extensions that are desirable in part of the hardware design space, but not everywhere.</p>

<p>CHERIoT also differentiates forward-edge and backwards-edge sentries.
Function pointers and return addresses use different otypes.
This means that you cannot trivially replace a return address on the stack with a function pointer for control-flow hijacking.
This kind of attack is hard on a CHERI system anyway, but this provides some defence in depth.
Credit to folks at the Microsoft Security Response Center for recommending this and to Murali Vijayaraghavan at Google for coming up with a way of adding it without invasive changes to the ISA.</p>

<p>When you jump to a forward-edge sentry with a jump-and-link instruction, the link register is a return sentry that captures the previous interrupt state (enabled or disabled).
This means that, at least for leaf functions, we can enforce structured programming for control over interrupt state.</p>
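<p>The interaction between sentry kinds and interrupt state can be modelled as a small state machine. This is a sketch under stated assumptions, not the ISA's definition: the enumerator names are invented, the CPU is reduced to a single interrupt-enable flag, and the rule that a return sentry is rejected as a call target is a simplification of this model.</p>

```cpp
#include <cassert>

// Hypothetical sentry kinds for this model.
enum class SentryKind
{
    ForwardInherit, // call target, leaves the interrupt state alone
    ForwardEnable,  // call target, enables interrupts on entry
    ForwardDisable, // call target, disables interrupts on entry
    ReturnEnable,   // return address, restores "interrupts enabled"
    ReturnDisable,  // return address, restores "interrupts disabled"
};

struct Cpu
{
    bool interruptsEnabled = true;
};

// Model of jump-and-link through a forward-edge sentry. On success, `link`
// receives a return sentry that captures the caller's interrupt state.
// Returning false models a trap.
bool jump_and_link(Cpu &cpu, SentryKind target, SentryKind &link)
{
    const bool previous = cpu.interruptsEnabled;
    switch (target)
    {
        case SentryKind::ForwardInherit:
            break;
        case SentryKind::ForwardEnable:
            cpu.interruptsEnabled = true;
            break;
        case SentryKind::ForwardDisable:
            cpu.interruptsEnabled = false;
            break;
        default:
            return false; // model assumption: not a valid call target
    }
    link = previous ? SentryKind::ReturnEnable : SentryKind::ReturnDisable;
    return true;
}

// Model of a return. Only backwards-edge sentries are accepted, so a
// function pointer cannot stand in for a return address.
bool jump_return(Cpu &cpu, SentryKind link)
{
    switch (link)
    {
        case SentryKind::ReturnEnable:
            cpu.interruptsEnabled = true;
            return true;
        case SentryKind::ReturnDisable:
            cpu.interruptsEnabled = false;
            return true;
        default:
            return false;
    }
}

// Usage sketch: a leaf function called with interrupts disabled.
void call_leaf_with_interrupts_disabled(Cpu &cpu)
{
    SentryKind link{};
    bool       ok = jump_and_link(cpu, SentryKind::ForwardDisable, link);
    assert(ok && !cpu.interruptsEnabled); // the leaf body would run here
    ok = jump_return(cpu, link);
    assert(ok);
    (void)ok;
}
```

<p>Whatever the interrupt state was before the call, it is the same after the return, because the only way back is through the return sentry that recorded it.</p>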

<p>The forward-edge sentries are an <em>attestation</em> from the RTOS to the running software that this is a valid function pointer.
The backwards-edge sentries are an <em>attestation</em> from the hardware that this is the result of executing a jump-and-link instruction.</p>

<h1 id="cheriot-uses-sealing-inside-the-rtos">CHERIoT uses sealing inside the RTOS</h1>

<p>In addition to the otypes exposed directly to programmers, the RTOS reserves two for internal use.
When you do a cross-compartment call, the compiler will insert a call to the switcher (via the switcher sentry).
The function pointer for such a call is a sealed <em>data</em> capability to the target compartment’s export table.
The base of this will point to the program counter and global pointer register values for the target.
The address will point to the metadata describing this entry point.
The switcher unseals this and so knows that this really is a cross-compartment entry point provided by the loader.</p>

<p>When the switcher takes an interrupt, it will spill the register file and pass the scheduler a sealed capability to the register-save area.
The scheduler then returns a sealed capability of the same type and the switcher restores the register state from there.
This ensures that the scheduler never sees the state of interrupted threads: it has only opaque tokens allowing it to choose the next thread to run.</p>
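<p>The division of labour between switcher and scheduler might be sketched as below. This is an illustrative model only: <code class="language-plaintext highlighter-rouge">ThreadToken</code>, <code class="language-plaintext highlighter-rouge">Scheduler</code>, and <code class="language-plaintext highlighter-rouge">Switcher</code> are hypothetical classes, the opacity comes from C++ access control rather than hardware sealing, and round-robin is an arbitrary policy chosen to keep the example short.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

struct RegisterFile
{
    std::uint32_t regs[15];
};

// Opaque token handed to the scheduler. Its contents are private, so only
// the switcher (a friend) can reach the saved registers.
class ThreadToken
{
    RegisterFile *saved = nullptr;
    friend class Switcher;
};

// The scheduler stores, copies, and orders tokens but cannot look inside.
class Scheduler
{
    std::deque<ThreadToken> ready;

  public:
    void make_ready(ThreadToken t) { ready.push_back(t); }

    // Round-robin: queue the interrupted thread, run the oldest ready one.
    ThreadToken on_interrupt(ThreadToken interrupted)
    {
        ready.push_back(interrupted);
        ThreadToken next = ready.front();
        ready.pop_front();
        return next;
    }
};

class Switcher
{
  public:
    static ThreadToken seal(RegisterFile &r)
    {
        ThreadToken t;
        t.saved = &r;
        return t;
    }

    // Model of interrupt entry: the register spill itself is omitted. The
    // scheduler picks a token and the switcher resolves it back to the
    // register file that should be restored.
    static RegisterFile &take_interrupt(Scheduler &s, RegisterFile &current)
    {
        ThreadToken next = s.on_interrupt(seal(current));
        assert(next.saved != nullptr);
        return *next.saved;
    }
};
```

<p>Any attempt by <code class="language-plaintext highlighter-rouge">Scheduler</code> to read <code class="language-plaintext highlighter-rouge">saved</code> fails to compile in this model; on the real system the equivalent access fails because the capability is sealed.</p>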

<p>Hopefully this short guided tour has given you some idea of both how powerful a mechanism sealing is, and how pervasive it is in the CHERIoT platform.</p>]]></content><author><name>David Chisnall</name></author><category term="rtos" /><category term="sealing" /><summary type="html"><![CDATA[Sealing is one of the oldest parts of CHERI and one of the most powerful. When I joined the project in 2012 it was integral to the early prototype call-gate mechanism. You can find this version in our 2014 tech report. It included CSealCode and CSealData instructions that assembled a pair of capabilities that could be used with the CCall instruction to perform a cross-compartment call. By our IEEE Security and Privacy 2015 paper, this had been replaced with the modern sealing mechanism that we use today.]]></summary></entry><entry><title type="html">CHERIoT 1.0 Released!</title><link href="https://cheriot.org/sail/specification/release/2025/11/03/cheriot-1.0.html" rel="alternate" type="text/html" title="CHERIoT 1.0 Released!" /><published>2025-11-03T00:00:00+00:00</published><updated>2025-11-03T00:00:00+00:00</updated><id>https://cheriot.org/sail/specification/release/2025/11/03/cheriot-1.0</id><content type="html" xml:base="https://cheriot.org/sail/specification/release/2025/11/03/cheriot-1.0.html"><![CDATA[<p>Today, we <a href="https://github.com/CHERIoT-Platform/cheriot-sail/releases/tag/v1.0">released the 1.0 version of the CHERIoT specification</a>!
For those reading about CHERIoT for the first time, it is a hardware-software co-design project that aims to produce secure microcontroller-class systems for connected devices.
We start with a foundational guarantee of memory safety (the hardware will trap on buffer overflows or use-after-free errors, even in assembly code) and build rich (and usable) compartmentalisation abstractions on top.</p>

<p>This specification defines the ISA, the CHERIoT language extensions, compilation model, relocations, and so on.
The last change that we made to the ISA was in December 2024, so we are confident that this is a stable release that we can support in hardware for a long time.
This specification was implemented by the <a href="https://github.com/microsoft/cheriot-ibex/releases/tag/cheriot_ibex_v1.0">1.0 release of CHERIoT Ibex</a> and by <a href="https://github.com/microsoft/cheriot-kudu">CHERIoT Kudu</a> (which has not yet had an official release).
These two implementations demonstrate that the ISA scales from three-stage single-issue pipelines to six-stage dual-issue pipelines, roughly the same range of microarchitectures supported by Arm’s M profile.</p>

<p>We at SCI have the first of our ICENI chips, which use the CHERIoT Ibex core, on the way back from the fab now and will be scaling up to mass production in the new year.
I am not allowed to speak for other folks building CHERIoT silicon, but I expect 2026 to be an exciting year for the CHERIoT project!</p>

<p>This is a release that, both through the open-source CHERIoT Platform project and through partner companies that aim to ship CHERIoT products, we will be supporting for years to come.</p>

<h1 id="alignment-with-the-risc-v-y-base">Alignment with the RISC-V Y base</h1>

<p>RISC-V International is currently standardising a CHERI base for RISC-V (tentatively named Y, alongside the existing I and E bases).
We aim for this to be a common subset across all CHERI implementations, whether they are microcontrollers, application cores, or domain-specific accelerators.
The exact composition of this base is still under discussion, but we aim for CHERIoT 2.0 to be source-compatible with (and with equivalent functionality to) CHERIoT 1.0, but built atop this common base.</p>

<p>On the way to 2.0, we have been separating out the parts that are in the common core from those that are CHERIoT-specific.
We also have the <code class="language-plaintext highlighter-rouge">ct.</code> prefix reserved for the CHERIoT vendor extensions in RISC-V and so will move any instructions that are not direct equivalents of standardised RISC-V instructions into that namespace.
For compatibility, our toolchains will support both names, but we expect to transition to the RISC-V official names as the RISC-V standardisation process makes progress.</p>

<p>Expect to see some 1.x releases that make changes in things like assembler mnemonics (but not the underlying ISA).</p>

<h1 id="future-plans-beyond-20">Future plans beyond 2.0</h1>

<p>No ISA specification is ever complete.
Our 1.0 release is a solid foundation, not an endpoint.</p>

<p>We already have plans to integrate with the <a href="https://docs.riscv.org/reference/isa/unpriv/zfinx.html">Zfinx and Zdinx extensions</a>, which allow 32-bit and 64-bit floating point values to be held in integer registers on RISC-V.
On CHERIoT, a 64-bit floating-point value could live in a single capability register (as, indeed, it does in the CHERIoT soft-float ABI).
This avoids the longer compartment-switch and context-switch times that a larger register file would bring (which is why we used RV32E, not RV32I, as our base), while still allowing hardware floating-point acceleration.</p>

<p>We have two candidates for alternative bounds encodings, which each have different microarchitectural tradeoffs.
They are 100% backwards compatible with the software and so we aim to add both to a future version of the specification.
We have designed our ABI carefully to avoid leaking capability encoding details into the software model, which means that we can support implementers picking any of the three encodings, on the products where each makes sense, along with any vendor-specific variants that they may wish to support.
Supporting different encodings does not require changes even to core RTOS code and the two proposed modifications do not require toolchain changes either, though vendor extensions may.</p>

<p>We’ve been exploring for a while the idea of having two integer subregisters for each capability.
This would give us 30 integer registers or 15 capability registers, which would slightly reduce stack spills.
This involves some complexity in the microarchitecture and in the compiler, so it isn’t an automatic win and there’s a lot more work to decide whether it’s desirable.</p>
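<p>For readers wondering where those two numbers come from, the arithmetic can be written out as compile-time constants. The constant names are made up for this sketch; the inputs are the 16 register names of RV32E, the hardwired-zero <code class="language-plaintext highlighter-rouge">x0</code>, and the two subregisters per capability proposed above.</p>

```cpp
// RV32E provides 16 register names, x0 to x15.
constexpr int Rv32eRegisterNames = 16;
// x0 always reads as zero, so it cannot hold a value.
constexpr int HardwiredZeroRegisters = 1;
// Registers that can actually hold a capability.
constexpr int CapabilityRegisters =
  Rv32eRegisterNames - HardwiredZeroRegisters;
// The idea under exploration: two integer subregisters per capability.
constexpr int IntegerSubregistersPerCapability = 2;
constexpr int IntegerRegisters =
  CapabilityRegisters * IntegerSubregistersPerCapability;

static_assert(CapabilityRegisters == 15, "15 capability registers");
static_assert(IntegerRegisters == 30, "30 integer registers");
```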

<p>The current drafts for the RISC-V official specification provide a thread ID register.
We’ve been considering how we could replace our use of the global-pointer register with this, to make it a compartment identifier.
The specification also reserves loads and stores with the zero base register (which will always trap on a CHERI system) and so we could use these for GP-relative loads and stores.</p>

<p>Similarly, the current draft reserves space for a branch-if-tag-[not-]set instruction, which would be quite commonly used in our software model.
The RISC-V code-size extensions have <code class="language-plaintext highlighter-rouge">push</code> and <code class="language-plaintext highlighter-rouge">pop</code> / <code class="language-plaintext highlighter-rouge">popret</code> instructions, which are likely to want some adaptation to be most useful with CHERIoT.</p>

<p>And there are also a lot of interesting ideas in research.
For example, <a href="https://arxiv.org/abs/2504.14654">BLACKOUT</a> provided a way of building a data-centric constant-time programming model on top of CHERI.
There’s a bit of work to see how to integrate that with CHERIoT but being able to have a good programmer model for constant-time programming would be worth it for a later release.</p>

<p>There are a lot of possible future directions, and we expect to be adding to the spec for a while.
Having a solid core means that we can do this without breaking backwards compatibility.
And, as with everything in CHERIoT today, future additions to the ISA will always be motivated by software requirements and careful consideration of the programmer model and the microarchitecture.</p>

<h1 id="sorry-for-the-wait">Sorry for the wait!</h1>

<p>This is a slightly embarrassing post to write, because I thought we’d done a 1.0 release several months ago: the ISA has been sufficiently stable for a 1.0 stamp for a while.
Unfortunately, when we initially thought about it, there was a bug in the Sail to LaTeX converter and so we postponed the release until we could generate a nice specification PDF.
That bug was fixed quite quickly, but by then I’d forgotten that it had been the release blocker and thought we’d done this release back in the Spring.</p>

<p>Approaching a year without any ISA changes makes me even more confident than I was back in the Spring that this is a release that we can support for the next decade.</p>]]></content><author><name>David Chisnall</name></author><category term="sail" /><category term="specification" /><category term="release" /><summary type="html"><![CDATA[Today, we released the 1.0 version of the CHERIoT specification! For those reading about CHERIoT for the first time, it is a hardware-software co-design project that aims to produce secure microcontroller-class systems for connected devices. We start with a foundational guarantee of memory safety (the hardware will trap on buffer overflows or use after free errors, even in assembly code) and build rich (and usable) compartmentalisation abstractions on top.]]></summary></entry></feed>