Simplifying a key-value service using CHERIoT
Recently I ported a key-value store service to CHERIoT-RTOS. It highlighted a bunch of simplifications and performance improvements that CHERIoT and CHERIoT-RTOS provide.
The key-value service has a simple API (addOrUpdate(key, value), read(key) and erase(key)) and must serve multiple user applications while keeping the data stored by each application confidential from the others. In conventional systems (both server-class and embedded), one would design such a service and the user applications roughly as follows:
- To ensure that the user applications do not read each other’s data, the key-value store should be in some abstraction of a “process”, with each user application in its own process. Each process in a conventional system will typically have code, globals, a stack and a thread associated with it.
- There will be a message queue through which the user processes interact with the key-value store process. The message queue can live in the same process as the key-value store. User applications should not be able to access the messages in the queue; the only operation they are allowed is inserting a new message. The queue should also order messages by the priority of the sending user processes.
- Each user application should be associated with a unique, unforgeable userKey that must be supplied with each request to authenticate it. The easiest way to do this in a conventional system would be asymmetric encryption: the user application's process ID, together with a request count, is encrypted using the key-value store service's public key, and the service later decrypts it with its private key to recover the process ID and the count. The service also has to track the count for each user application to check that it matches. This adds encryption/decryption overhead to every request and, less importantly, the overhead of maintaining per-application count state. An alternative design would be to distribute a unique userKey to each user application at the beginning, and this userKey must be large so that another user application cannot forge it. Distributing such a key to each user application while ensuring that no other user application can access it is non-trivial. One requires either a system-level service to send the key from the key-value store service to each user application confidentially (this is easy if the key-value store service is a privileged service with access to all of system memory, but that is not desirable), or a system-level service to distribute keys to multiple user applications. Note that even in embedded systems, where all user applications are compiled into a single firmware image, one cannot assign these keys to the applications when the firmware is built, because the keys could then be read from the firmware binary.
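To make the costs of the conventional design concrete, here is a minimal C++ sketch of the service side: a priority-ordered queue of serialized request messages, drained one at a time by the service's own thread, plus the per-application count state needed to authenticate requests. It is only a model under stated assumptions: Request, RequestQueue, CountChecker and the std::string key/value types are hypothetical, and the asymmetric decryption step is omitted entirely (the sketch assumes processId and count have already been recovered from the userKey).

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wire format: every call must be serialized into a message.
enum class Op : uint8_t { AddOrUpdate, Read, Erase };

struct Request
{
    int         priority;  // copied from the sending process's priority
    uint32_t    processId; // assumed already recovered by decrypting the userKey
    uint32_t    count;     // per-process request counter, also from the userKey
    Op          op;
    std::string key;
    std::string value;     // only used by AddOrUpdate
};

// Makes std::priority_queue yield the highest-priority request first.
struct ByPriority
{
    bool operator()(const Request &a, const Request &b) const
    {
        return a.priority < b.priority;
    }
};

class RequestQueue
{
    std::priority_queue<Request, std::vector<Request>, ByPriority> requests;
    std::mutex              lock; // protects concurrent enqueues and dequeues
    std::condition_variable nonEmpty;

  public:
    // Called by user processes; the only operation they are allowed.
    void enqueue(Request r)
    {
        assert((r.op == Op::AddOrUpdate || r.value.empty()) &&
               "only AddOrUpdate carries a value");
        {
            std::lock_guard<std::mutex> guard(lock);
            requests.push(std::move(r));
        }
        nonEmpty.notify_one();
    }

    // Called in a loop by the service's dedicated thread; blocks until a
    // request is available.
    Request dequeue()
    {
        std::unique_lock<std::mutex> guard(lock);
        nonEmpty.wait(guard, [this] { return !requests.empty(); });
        Request r = requests.top();
        requests.pop();
        return r;
    }
};

// Per-process state the service must keep just to authenticate requests.
class CountChecker
{
    std::map<uint32_t, uint32_t> expected; // next valid count per process

  public:
    // Accepts a request only if it carries the next count for its process,
    // so a replayed (previously seen) request is rejected.
    bool accept(const Request &r)
    {
        uint32_t &next = expected[r.processId]; // value-initialized to 0
        if (r.count != next)
        {
            return false;
        }
        ++next;
        return true;
    }
};
```

Even in this stripped-down form, the service needs its own thread looping on dequeue(), every call has to be copied into a Request message, and the queue has to replicate the priorities the scheduler already knows about.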
CHERIoT's separation of compartments from threads, together with its sealed keys, satisfies all of the above requirements elegantly. We can create a key-value store compartment exporting addOrUpdate(userKey, key, value), read(userKey, key) and erase(userKey, key), where userKey is simply a sealed object unique to each compartment. This way, there is no need to encrypt or decrypt a userKey on every API call; instead, unforgeability is guaranteed by CHERIoT's sealing mechanism. Each userKey must also be unique to its user application, which can be guaranteed by giving each user application a different underlying value. This can be achieved with either static sealing or dynamic sealing. I used static sealing, taking an existing test as a template (service code, service header and user application code). One can also use dynamic sealing as shown in this example, which requires an additional function userKey_t initialize() exported by the key-value store service. Each user application must call it before using the key-value store's API, and the service returns a distinct userKey from each initialize call.
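The sealed-userKey design can be modelled in portable C++. In this sketch, UserKey plays the role of a sealed object: callers can hold, copy and pass it, but cannot construct one or read the owner ID inside it; only KeyValueService can look inside, much as only the key-value store compartment can unseal a userKey. This is an illustration, not CHERIoT-RTOS code: C++ access control is enforced only at compile time and can be bypassed with casts, whereas CHERIoT enforces sealing in hardware. The class names and std::string types are assumptions, and initialize() models the dynamic-sealing variant described above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

class KeyValueService;

// Stand-in for a sealed userKey (hypothetical model). Callers cannot create
// one or read `owner`; only the friend KeyValueService ("the unsealer") can.
class UserKey
{
    friend class KeyValueService;
    uint32_t owner;
    explicit UserKey(uint32_t ownerId) : owner(ownerId) {}
};

class KeyValueService
{
    uint32_t nextOwner = 0;
    // Entries are namespaced by owner, so application A's key "x" never
    // collides with application B's key "x".
    std::map<std::pair<uint32_t, std::string>, std::string> entries;

  public:
    // Models dynamic sealing: each call returns a key with a distinct
    // underlying value, which is what makes userKeys unique per caller.
    UserKey initialize()
    {
        assert(nextOwner != UINT32_MAX && "owner IDs exhausted");
        return UserKey(nextOwner++);
    }

    // Returns true if a new entry was created, false if one was updated.
    bool addOrUpdate(const UserKey &user, const std::string &key,
                     const std::string &value)
    {
        return entries.insert_or_assign(std::make_pair(user.owner, key), value)
            .second;
    }

    // Only the caller's own namespace is searched.
    std::optional<std::string> read(const UserKey &user,
                                    const std::string &key) const
    {
        auto it = entries.find(std::make_pair(user.owner, key));
        if (it == entries.end())
        {
            return std::nullopt;
        }
        return it->second;
    }

    // Returns true if an entry owned by this caller was removed.
    bool erase(const UserKey &user, const std::string &key)
    {
        return entries.erase(std::make_pair(user.owner, key)) == 1;
    }
};
```

Because every entry is keyed on the pair (owner, key), two applications can use the same key name without seeing each other's values, and authenticating a call costs only a lookup of the owner inside the key rather than any cryptography.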
The performance and simplification benefits don't stop at avoiding encryption and decryption. As mentioned earlier, a conventional system requires creating a process for the key-value store service, which involves creating a stack and a thread. In CHERIoT, the key-value store is just a threadless compartment: the thread running a given user application also runs the key-value store service. The user application simply calls one of the exported functions, and the CHERIoT-RTOS switcher transitions into the compartment containing the service on the same thread. This avoids dedicating a stack and a thread to the key-value store service. Remember that the unique, unforgeable sealed userKey ensures that a user application can access only its own keys and values.
But how do we ensure that two user applications do not execute the functions exported by the key-value store service concurrently? The conventional design above uses a message queue into which the user applications enqueue request messages (with concurrent enqueues protected by a lock) and from which the service dequeues them one by one. This requires serializing the requests instead of passing them as arguments to function calls. Moreover, the priorities of the user applications have to be maintained in the queue so that requests are dequeued in the right order. In CHERIoT-RTOS, one can instead use a single priority-inheriting lock that every function exported by the key-value store service acquires. Requests are still processed one by one, just as in the conventional design, so performance is not affected. Moreover, this piggybacks on the CHERIoT-RTOS scheduler's existing thread queues instead of creating another queue and duplicating the scheduler's priorities for the user applications in it. Finally, there is no need to serialize requests because there is no request queue; the key-value store API is just a call to the appropriate function. All of this leads to better performance than the conventional design described earlier.
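The lock-based design can be sketched on a hosted POSIX system. Each exported function acquires the same priority-inheriting mutex, so concurrent calls are serialized, yet each call runs on the calling thread, with no service thread, request queue or message serialization. This is a model, not CHERIoT-RTOS code: POSIX PTHREAD_PRIO_INHERIT stands in for the RTOS's own priority-inheriting lock, a plain integer owner ID stands in for the value the compartment would obtain by unsealing the userKey, and the class names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <optional>
#include <pthread.h>
#include <string>
#include <utility>

// Priority-inheriting mutex on POSIX: while a lower-priority thread holds
// it, a higher-priority waiter lends that holder its priority. It provides
// lock()/unlock(), so it works with std::lock_guard.
class PriorityInheritingMutex
{
    pthread_mutex_t mutex;

  public:
    PriorityInheritingMutex()
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        int err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        assert(err == 0 && "priority inheritance not supported");
        (void)err; // unused when NDEBUG disables assert
        pthread_mutex_init(&mutex, &attr);
        pthread_mutexattr_destroy(&attr);
    }
    ~PriorityInheritingMutex() { pthread_mutex_destroy(&mutex); }
    PriorityInheritingMutex(const PriorityInheritingMutex &) = delete;
    PriorityInheritingMutex &operator=(const PriorityInheritingMutex &) = delete;

    void lock() { pthread_mutex_lock(&mutex); }
    void unlock() { pthread_mutex_unlock(&mutex); }
};

// Every exported function takes the same lock, so calls are processed one
// at a time, but each runs on the caller's own thread and stack.
class LockedKeyValueService
{
    PriorityInheritingMutex lock;
    // owner: hypothetical stand-in for the unsealed userKey contents.
    std::map<std::pair<unsigned, std::string>, std::string> entries;

  public:
    bool addOrUpdate(unsigned owner, const std::string &key,
                     const std::string &value)
    {
        std::lock_guard<PriorityInheritingMutex> guard(lock);
        return entries.insert_or_assign(std::make_pair(owner, key), value)
            .second;
    }

    std::optional<std::string> read(unsigned owner, const std::string &key)
    {
        std::lock_guard<PriorityInheritingMutex> guard(lock);
        auto it = entries.find(std::make_pair(owner, key));
        if (it == entries.end())
        {
            return std::nullopt;
        }
        return it->second;
    }

    bool erase(unsigned owner, const std::string &key)
    {
        std::lock_guard<PriorityInheritingMutex> guard(lock);
        return entries.erase(std::make_pair(owner, key)) == 1;
    }
};
```

Compared with a queue-based design, a waiting request here is simply a thread blocked on the mutex, so request ordering is left to the lock and the scheduler rather than to a hand-maintained priority queue, and the arguments never need to be packed into a message.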