CHERIoT Programmers’ Guide published!

The cover of the CHERIoT Programmers' Guide, featuring many cats in boxes as a zoomed-in image from the top of a chip, with the tag line 'safe and secure compartmentalisation'

As of this week, the first edition of the CHERIoT Programmers’ Guide is available for purchase! The eBook editions are available now; the print edition is with the publisher and will appear on that page in the next few days.

A lot of people have provided feedback on the current edition. The four technical reviewers, Phil Day, Richard Edgar, Adam Finney, and Hugo McNally, all did detailed reads. Amanda Robinson, our copyeditor, did a great job of cleaning up my writing. I’d like to thank Michael W. Lucas for recommending Amanda and the draft2digital flow for publishing. And, of course, I wouldn’t have been able to focus on writing it without the generous grant from UKRI Discribe Hub+, funded through the Economic and Social Research Council [ES/V003666/1].

I started writing the CHERIoT book near the end of 2023, shortly after leaving Microsoft. The initial commit, on November 6th, contained the rough skeleton of the book: a little over 21,000 words. Prior to that, we had several documents, but they were not written as a cohesive set of materials; each was stand-alone documentation for a single component.

Last July, Discribe Hub put out a funding call for beginners’ guides to developing on CHERI. This seemed like a great fit for the CHERIoT book. CHERIoT’s secondary mission has always been to showcase the abstractions that you can build if you are able to assume CHERI from the ground up, making it an excellent platform to learn about CHERI.

At this point, the draft was a bit over 25,000 words. Most importantly, it lacked any example code. Our grant request proposed finishing the text, adding examples, and having the book reviewed by technical experts and professionally copyedited.

This was funded in November. At which point, I got Knuth’d¹.

I wrote my first book in LaTeX. It produced beautiful PDFs, but generating good HTML (or XHTML for ePub) from it was hard. For my second book, I wrote semantic markup that happened to be valid LaTeX, so it could be included as LaTeX files but also parsed with another tool that would generate HTML.

For the CHERIoT book, I’d hoped to use AsciiDoctor and AsciiDoxy, which let me embed documentation snippets parsed from Doxygen markup. Unfortunately, this didn’t work well. AsciiDoc favoured presentation markup and treated a lot of ordinary character sequences as markup (I had to remember to type {cpp} instead of C++ everywhere, because ++ was markup). Favouring presentation markup is a problem because if it’s easier for me to mark something up as fixed-width than it is for me to mark it as C code or Rego code, I will. This loses information that is useful if you want to improve the typesetting later.

AsciiDoxy crashed on newer versions of the RTOS headers and I couldn’t figure out why. For previous books, I’d used libclang to extract annotated regions from examples and make sure that code listings actually worked, but AsciiDoctor’s plugin model was poorly documented and I couldn’t work out how to do this (and, especially, how to build a text tree that used the same formatting that inline code markup used).

So I wrote a new tool, named igk (I got Knuth’d), which parses a slightly cleaned-up TeX-style markup (largely copied from SILE) and is structured as a compiler. It runs a set of passes (written in Lua) that transform the tree and then generates some output. For the CHERIoT book, this is configured to produce four outputs:

HTML for online viewing.
PDFs for online reading.
PDFs (which, for example, replace hyperlinks with footnotes that contain the target URL) for the printers.
ePub (which passes the official ePub3 validation tool) for eBook readers.

The PDFs are generated by emitting XML that SILE then typesets. SILE is a modern reimplementation of the TeX algorithms in Lua.
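The pass structure described above can be illustrated with a minimal sketch. This is not igk's code (its passes are written in Lua and its data model is not described here): the `Node` class, the `links_to_footnotes` pass, the `run_passes` driver, and the example URL are all hypothetical names invented for illustration. It shows how one shared tree might yield different outputs depending on which passes run, using the print-only rewrite of hyperlinks into footnotes as the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a hypothetical document tree (not igk's real data model)."""
    kind: str                      # e.g. "doc", "p", "link", "footnote", "text"
    text: str = ""                 # payload for "text" and "footnote" nodes
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def links_to_footnotes(node):
    """Print-only pass: keep each link's text and add a footnote with its URL."""
    out = []
    for child in node.children:
        child = links_to_footnotes(child)
        if child.kind == "link":
            # Splice the link's visible content in place of the link itself,
            # then follow it with a footnote carrying the target URL.
            out.extend(child.children)
            out.append(Node("footnote", text=child.attrs["href"]))
        else:
            out.append(child)
    return Node(node.kind, node.text, dict(node.attrs), out)


def run_passes(tree, passes):
    """Apply each pass in order; every pass maps a tree to a new tree."""
    for p in passes:
        tree = p(tree)
    return tree


def render_text(node):
    """Toy back end: flatten the tree to plain text."""
    if node.kind == "text":
        return node.text
    if node.kind == "footnote":
        return f" [footnote: {node.text}]"
    return "".join(render_text(c) for c in node.children)


doc = Node("doc", children=[
    Node("p", children=[
        Node("text", "Typeset with "),
        # Placeholder URL, assumed for the example only.
        Node("link", attrs={"href": "https://example.org/sile"},
             children=[Node("text", "SILE")]),
        Node("text", "."),
    ]),
])

online = run_passes(doc, [])                     # links stay as links
printed = run_passes(doc, [links_to_footnotes])  # links become footnotes
print(render_text(printed))
```

Because each pass returns a new tree rather than mutating its input, the same parsed document can be fed through a different pass list for each output format.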

This slightly convoluted flow means that we can typeset C/C++ sources that include CHERIoT language extensions. They are parsed with the version of libclang built from the CHERIoT LLVM tree. The same libclang parses doc comments from headers so that the book can include the canonical documentation rather than reproducing it, where this makes sense.

All of the examples in the book now live in the book-examples repository. The text included in the book is extracted from these files, so we can make sure that they actually build and work.
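As a rough illustration of extracting listings from buildable files, the sketch below pulls a named region out of a source file. It makes a simplifying assumption: the real flow parses sources with libclang, whereas this sketch uses plain marker comments matched by a regular expression. The `// example: NAME begin` and `// example: NAME end` marker syntax and the `extract_regions` function are hypothetical, invented for this example.

```python
import re

# Hypothetical marker syntax, chosen for this sketch only:
#   // example: NAME begin
#   // example: NAME end
MARKER = re.compile(r"^\s*//\s*example:\s*(\w+)\s+(begin|end)\s*$")


def extract_regions(source):
    """Return {region name: text between its begin and end markers}."""
    regions = {}
    open_names = []                # regions currently being collected
    for line in source.splitlines():
        m = MARKER.match(line)
        if m:
            name, action = m.groups()
            if action == "begin":
                regions[name] = []
                open_names.append(name)
            else:
                open_names.remove(name)  # raises ValueError on a stray end
            continue               # marker lines never appear in a listing
        for name in open_names:
            regions[name].append(line)
    if open_names:
        raise ValueError(f"unterminated regions: {open_names}")
    return {name: "\n".join(lines) for name, lines in regions.items()}


SOURCE = """\
#include <stdio.h>

int main(void)
{
    // example: greeting begin
    puts("Hello from a compartment");
    // example: greeting end
    return 0;
}
"""

listing = extract_regions(SOURCE)["greeting"]
print(listing)
```

The file that contains the markers is still a complete program, so it can be compiled and run in a test while only the marked region is shown in the text.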

This makes it easy not just to release the book, but to maintain it as a living document. The current length is a bit over 62,000 words, not counting code listings or text from header documentation. The first edition is a reviewed and copyedited snapshot, but we will keep evolving the book towards a second and future editions. At any point, the latest drafts will remain available for free, in their current home.

¹ Donald Knuth famously got sidetracked from writing The Art of Computer Programming and implemented TeX, a complete typesetting program. I didn’t get quite as badly sidetracked.