
CHERI-RISC-V VP++: A Virtual Prototyping Platform Enabling Fine-Grained Memory Protection

Author: Andreas Hinterdorfer, BSc
Submission: Institute for Complex Systems
Thesis Supervisor: Univ.-Prof. Dr. Daniel Große
Assistant Thesis Supervisor: DI Manfred Schlägl
September 2025

Master’s Thesis

to confer the academic degree of Diplom-Ingenieur

in the Master’s Program Elektronik und Informationstechnik

                      JOHANNES KEPLER
                      UNIVERSITY LINZ
                      Altenberger Straße 69
                      4040 Linz, Austria
                      jku.at

Abstract

Memory corruption bugs are among the oldest and most persistent problems in computer security. Although modern type-safe programming languages attempt to address this problem, billions of lines of existing C and C++ code in performance-critical applications will not be replaced in the foreseeable future. This has led to the development of Capability Hardware Enhanced RISC Instructions (CHERI), a research project that extends conventional Instruction Set Architectures (ISAs), compilers and Operating Systems (OSs) with new architectural features for fine-grained memory protection. CHERI has been realized in several hardware prototypes, and a QEMU-based emulator with CHERI support is already available. However, no CHERI-enabled Virtual Prototype (VP) has been published to date. The contribution of this thesis is the design, implementation, verification and evaluation of CHERI-RISC-V VP++, a CHERI-enabled VP based on the open-source RISC-V VP++ project. This VP is capable of running unmodified CHERI-enabled software, including the capability-enabled FreeBSD OS, called CheriBSD. The implementation was verified using random testing with the TestRIG framework and demonstrated through bare-metal CHERI-enabled programs, before successfully booting CheriBSD. The resulting CHERI-RISC-V VP++ is publicly available and provides a cycle-approximate and deterministic platform for system-level evaluation of CHERI, offering a valuable tool for early design space exploration and advancing CHERI research.

Kurzfassung

Memory errors are among the oldest and most persistent problems in computer security. Although modern type-safe programming languages attempt to solve this problem, billions of lines of existing C and C++ code in performance-critical applications will not be replaced by such languages in the foreseeable future. This led to the development of Capability Hardware Enhanced RISC Instructions (CHERI), a research project that extends conventional Instruction Set Architectures (ISAs), compilers and operating systems with architectural features for fine-grained memory protection. CHERI has since been realized in several hardware prototypes, and a QEMU-based emulator with CHERI support is already available. However, no CHERI-enabled Virtual Prototype (VP) has been published so far. The contribution of this thesis is the development, implementation, verification and evaluation of CHERI-RISC-V VP++, a CHERI-enabled VP based on the open-source project RISC-V VP++. This new VP is able to run unmodified CHERI-enabled software and even to boot the CHERI-enabled, FreeBSD-based operating system CheriBSD. The implementation was verified with randomly generated tests using the TestRIG framework and demonstrated both with bare-metal examples and by successfully booting CheriBSD. The resulting, freely available CHERI-RISC-V VP++ provides a deterministic and cycle-approximate platform for evaluating CHERI at the system level and is thus a valuable tool for early design space exploration and further CHERI research.

Contents

Abstract

Kurzfassung

1 Introduction

2 Preliminaries
  2.1 The RISC-V Architecture
    2.1.1 RISC-V Base ISA and Extensions
    2.1.2 RISC-V Custom Extensions
    2.1.3 Virtual Memory Management
  2.2 Virtual Prototypes
  2.3 RISC-V VP++
  2.4 CHERI
    2.4.1 What is CHERI?
    2.4.2 CHERI Design Goals
    2.4.3 Principles of CHERI
    2.4.4 The CHERI Standard
    2.4.5 Architectural Capabilities
    2.4.6 Capability Compression in CHERI
    2.4.7 Representing Capabilities
    2.4.8 Capability Constants
    2.4.9 The Existing CHERI Ecosystem

3 Extending RISC-V VP++ with CHERI
  3.1 Overview
  3.2 Implementing Capabilities
  3.3 Extending the Instruction Set Simulator
    3.3.1 General Purpose Registers
    3.3.2 Special Capability Registers
    3.3.3 Instruction Decoding
    3.3.4 Instruction Execution
    3.3.5 Interrupt and Trap Handling
    3.3.6 Implications on ISS Performance Optimizations
  3.4 Adapting Memory Layout and Interfaces
    3.4.1 Tagged Memory
    3.4.2 Memory Interface
    3.4.3 TLM Bus
    3.4.4 DMI
    3.4.5 Virtual Memory and Page Tables
  3.5 Conclusion

4 Verification using TestRIG
  4.1 Advantages of TestRIG
  4.2 Disadvantages of the TestRIG
  4.3 The RVFI-DII Format
  4.4 Adding RVFI-DII support to CHERI-RISC-V VP++
  4.5 Bugs identified using TestRIG
    4.5.1 Memory Alignment Checks
    4.5.2 Integer Values in Capability Registers
    4.5.3 Compressed Capability Instructions
  4.6 Coverage of the TestRIG
    4.6.1 Common/CHERI
    4.6.2 RV64/CHERI64
    4.6.3 Platform
    4.6.4 Common
    4.6.5 RV64
    4.6.6 Summary
  4.7 Problems not found with the TestRIG
    4.7.1 Trap Handling
    4.7.2 Missing Instructions
    4.7.3 Non-deterministic Behavior
    4.7.4 Virtual Addressing
  4.8 Conclusion

5 Running Bare-Metal Software on CHERI-RISC-V VP++
  5.1 CHERI-enabled RISC-V Assembly Programs
    5.1.1 Initial Assembly Implementation
    5.1.2 Modifying Assembly for CHERI
    5.1.3 Initializing the Stack Pointer
    5.1.4 The Global Capability Table
  5.2 CHERI-enabled C Programs
    5.2.1 Prerequisites
    5.2.2 Evaluating Boundary Protection
    5.2.3 Evaluating Function Pointers
  5.3 Conclusion

6 Bringing a Full-Scale Operating System to CHERI-RISC-V VP++
  6.1 Requirements and Setup
  6.2 Running CheriBSD
    6.2.1 Basic System Interaction
    6.2.2 Inspecting Running Processes
    6.2.3 Networking
  6.3 Testing CHERI Protection in CheriBSD
    6.3.1 Evaluating Boundary Protection
    6.3.2 Evaluating Function Pointers
  6.4 Conclusion

7 Conclusion and Future Work
  7.1 Conclusion
  7.2 Future Work
    7.2.1 RISC-V Specification for CHERI Extensions
    7.2.2 RV32 Support
    7.2.3 Performance Improvements
    7.2.4 Extending other PTE Formats
    7.2.5 GDB
    7.2.6 Verification
    7.2.7 Challenging the CHERI Model

Bibliography

Appendix A TestRIG Code Coverage

List of Figures

Figure 2.1 Architecture of RISC-V VP++
Figure 2.2 RISC-V Instruction Formats
Figure 2.3 CHERI Capability Encoding
Figure 2.4 CHERI-256 Capability Format
Figure 2.5 TestRIG Architecture
Figure 3.1 Architecture of CHERI-RISC-V VP++
Figure 3.2 Format of xtval for Capability Exceptions
Figure 3.3 Sv39 PTE Format with CHERI Extension
Figure 5.1 Overview of Data Flow in the ISS of CHERI-RISC-V VP++

List of Tables

Table 2.1 RISC-V Page Table Entry (PTE) fields: bits 0–7
Table 2.2 Decoding of Base and Top Address
Table 2.3 In-Register Representation of Capability Constants
Table 2.4 In-Memory Representation of Capability Constants
Table 3.1 CHERI Special Capability Registers
Table 3.2 Encoding of some Memory-Load Instructions with Explicit Address Type
Table 3.3 CSR Access Whitelist in CHERI
Table 3.4 CHERI Exception Codes
Table 3.5 Capability Store Permissions in PTE
Table 3.6 Capability Load Permissions in PTE
Table A.1 Code Coverage of the CHERI-RISC-V VP++ running 2 150 000 test cases generated by TestRIG

List of Acronyms

ABI        Application Binary Interface
CHERI      Capability Hardware Enhanced RISC Instructions
CID        Compartment Identifier
CLINT      Core Local Interruptor
CPU        Central Processing Unit
CSR        Control and Status Register
CTSRD      Clean Slate Trustworthy Secure Research and Development
DBBCache   Dynamic Base Block Cache
DDC        Default Data Capability
DII        Direct Instruction Injection
DMem IF    Data Memory Interface
DMI        Direct Memory Interface
DuT        Device under Test
ELF        Executable and Linkable Format
EPSRC      Engineering and Physical Sciences Research Council
GCC        GNU Compiler Collection
GCT        Global Capability Table
GDB        GNU Debugger
GOT        Global Offset Table
HDL        Hardware Description Language
HTML       Hypertext Markup Language
ICMP       Internet Control Message Protocol
IMem IF    Instruction Memory Interface
ISA        Instruction Set Architecture
ISS        Instruction Set Simulator
LSCache    Load Store Cache
MMU        Memory Management Unit
MSB        Most Significant Bit
OS         Operating System
PC         Program Counter
PCC        Program Counter Capability
PCI        Peripheral Component Interconnect
PLIC       Platform Level Interrupt Controller
PTE        Page Table Entry
RIG        Random Instruction Generation
RTC        Real-Time Clock
RTL        Register Transfer Level
RVC        RISC-V Compressed Instructions
RVFI       RISC-V Formal Interface
SATP       Supervisor Address Translation and Protection
SBI        Supervisor Binary Interface
SCR        Special Capability Register
SoC        System on Chip
TCB        Trusted Computing Base
TLB        Translation Lookaside Buffer
TLM        Transaction Level Modeling
UART       Universal Asynchronous Receiver/Transmitter
VEngine    Verification Engine
VP         Virtual Prototype

Chapter 1

Introduction

While memory corruption bugs are one of the oldest problems in computer security, they are still a major issue today. According to the MITRE ranking [1], memory corruption bugs were among the top three most dangerous software vulnerabilities in 2024. A prominent concrete example is the Chromium project, which reports that around 70% of its security bugs are memory safety problems [2]. Although modern type-safe programming languages attempt to address this problem, it is unrealistic to expect that the billions of lines of existing C and C++ code for performance-critical applications, which rely on low-level features, will ever be fully replaced [3]. This situation has led to a costly arms race. Developers of C and C++ Operating System (OS) kernels, language runtimes, web browsers, and server components constantly release patches, while attackers quickly find new vulnerabilities. The result is an unwinnable "patch and pray" cycle [4].

With Capability Hardware Enhanced RISC Instructions (CHERI), a project started in 2010 by the University of Cambridge, an alternative strategy to mitigate memory corruption bugs was developed. CHERI extends the conventional Instruction Set Architecture (ISA), compiler and OS with new architectural features to enable fine-grained, capability-based memory protection and highly scalable software compartmentalization [5]. Watson et al. [6] argue that the use of modified hardware enables memory-safe variants of the C and C++ programming languages themselves. To achieve fine-grained memory protection, CHERI introduces architectural capabilities, which extend the conventional pointer model of C and C++ with additional metadata, such as bounds and permissions. These capabilities serve as unforgeable tokens that grant their holder the ability to perform certain actions. An additional out-of-band Tag bit, which can be read and cleared by software but cannot be set, records the authenticity of each capability.
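The capability concept described above can be illustrated with a small model. The following is a minimal sketch under simplifying assumptions and not the CHERI encoding or the implementation of this thesis: the struct layout, the permission bit positions and the function names (`set_bounds`, `and_perms`, `clear_tag`, `may_load`) are hypothetical, and the choice to clear the tag on an invalid bounds request is one possible behavior picked for illustration. The exact instruction semantics are defined by the CHERI specification.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, illustrative capability model (NOT the real CHERI encoding).
struct Capability {
    std::uint64_t base;     // lowest accessible address
    std::uint64_t top;      // one past the highest accessible address
    std::uint64_t address;  // current pointer value
    std::uint32_t perms;    // permission bit mask
    bool tag;               // validity tag, stored out-of-band in real hardware
};

// Hypothetical permission bit positions, chosen only for this sketch.
constexpr std::uint32_t PERM_LOAD = 1u << 0;
constexpr std::uint32_t PERM_STORE = 1u << 1;

// Derive a capability with narrower bounds. Bounds can only shrink:
// a request that leaves the parent's bounds yields an untagged result.
constexpr Capability set_bounds(Capability parent, std::uint64_t new_base,
                                std::uint64_t length) {
    Capability child = parent;
    child.base = new_base;
    child.top = new_base + length;
    child.address = new_base;
    const bool within = new_base >= parent.base && child.top >= new_base &&
                        child.top <= parent.top;
    child.tag = parent.tag && within;
    return child;
}

// Restrict permissions; the mask can only clear bits, never add them.
constexpr Capability and_perms(Capability cap, std::uint32_t mask) {
    cap.perms &= mask;
    return cap;
}

// Software may clear the tag; no function in this model sets it again.
constexpr Capability clear_tag(Capability cap) {
    cap.tag = false;
    return cap;
}

// Check performed before loading `size` bytes at cap.address + offset.
constexpr bool may_load(const Capability& cap, std::uint64_t offset,
                        std::uint64_t size) {
    const std::uint64_t addr = cap.address + offset;
    return cap.tag && (cap.perms & PERM_LOAD) != 0 && addr >= cap.base &&
           addr + size <= cap.top;
}
```

In this model, a 16-byte buffer capability derived from a larger root capability permits loads inside the buffer, while an access one byte past the end, an access through a capability without the load permission, or an access through an untagged capability is rejected.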
In the following years, the CHERI project was extended to a full ecosystem of hardware, software and tools, including an adapted LLVM compiler and a CHERI-enabled FreeBSD OS called CheriBSD. In October 2024, the CHERI Alliance [7] was founded with the goal of driving a global industrial ecosystem around CHERI. Since then, the alliance has released several publicly available projects, including a Linux kernel with CHERI support. The CHERI Alliance also works on the ratification of the RISC-V specification for the CHERI extension [8], which is intended to become a standard for the RISC-V architecture.

With QEMU-CHERI [9], a generic and open-source machine and userspace emulator with CHERI support is already available. While this emulator is a great tool for developing and testing CHERI-enabled software, it does not provide sufficient accuracy to evaluate the underlying hardware architecture. At the other end of the spectrum, Register Transfer Level (RTL) simulation offers cycle-accurate hardware modeling, but comes at the cost of very low simulation speed, making system-level experimentation impractical. This thesis aims to bridge the gap between these two approaches by extending an existing RISC-V Virtual Prototype (VP) with CHERI support. A VP is a high-level executable software model of a hardware platform capable of running unmodified target software, thereby enabling early software development and system-level evaluation. By operating at a higher level of abstraction than RTL, VPs achieve much faster simulation speeds, while still exposing the architectural details necessary for analyzing the impact of CHERI on the hardware.

For this thesis, RISC-V VP++ [10], an open-source project developed and maintained by the Institute for Complex Systems, Johannes Kepler University Linz, is used as a basis. RISC-V VP++ is a combination of several projects into one powerful tool for early design space exploration, evaluation, verification, and validation of RISC-V based systems at the system level. The use of C++ and the standardized SystemC [11, 12] and Transaction Level Modeling (TLM) [13] libraries allows the abstraction of communication details, enabling fast and extensible models. The CHERI-enabled version, developed in the context of this thesis, is called CHERI-RISC-V VP++ and is made publicly available as open-source software on GitHub¹. Moreover, a paper on CHERI-RISC-V VP++ has been accepted at the Asia and South Pacific Design Automation Conference (ASP-DAC) 2026.

In the upcoming chapters, this thesis presents the design, implementation, verification and evaluation of CHERI-RISC-V VP++.

Chapter 2 provides the necessary background on the RISC-V architecture and VPs. It also introduces the key concepts of CHERI, which are essential to understand the design decisions and implementation details presented in the later chapters. The chapter concludes with an overview of existing projects in the CHERI ecosystem, showcasing the current state of CHERI support in hardware and software and highlighting the projects relevant to this thesis.

Chapter 3 focuses on extending the existing RISC-V VP++ architecture with CHERI support from a structural perspective. It explains how the Instruction Set Simulator (ISS) and the memory model of RISC-V VP++ were adapted to support CHERI capabilities, including the introduction of capability registers, tagged memory, capability-aware instruction decoding and execution, and exception handling.

Chapter 4 addresses verification of the CHERI extension using random testing with the TestRIG framework. The advantages of random testing are illustrated by showcasing exemplary bugs that were found and fixed using TestRIG. However, the chapter also highlights the limitations of random testing by inspecting code coverage and showing bugs that TestRIG was not able to find.

Building on the verified implementation, Chapter 5 explores CHERI-enabled software running bare-metal on CHERI-RISC-V VP++. These examples demonstrate the basic principles of CHERI in practice and highlight its implications on the architecture from a control-flow oriented perspective. The chapter begins with a simple assembly program and explains the adaptations necessary to make it CHERI-aware. Then, the complexity of the programs is gradually increased to show low-level programming details introduced by CHERI. This is followed by details on specific requirements when developing and compiling CHERI-enabled C applications, including the adaptation of conventional techniques, like peripheral access through memory-mapped I/O, to ensure compatibility with CHERI's capability-based memory protection. Finally, this chapter explores the protection mechanisms of CHERI based on small examples, demonstrating how CHERI can prevent common memory corruption bugs, and confirming the correct implementation of CHERI-RISC-V VP++.

With a verified CHERI-enabled VP and an understanding of bare-metal programs, Chapter 6 demonstrates the potential of CHERI-RISC-V VP++ by booting CheriBSD, a capability-enabled UNIX-like OS, on the extended VP. This chapter highlights the functionality of CheriBSD on the VP and revisits examples from Chapter 5 to showcase CHERI's memory protection mechanisms in a full OS context.

Finally, Chapter 7 summarizes the results of this thesis and proposes an outline for future work.

¹ https://github.com/ics-jku/cheri-riscv-vp-plusplus

Chapter 2

Preliminaries

Before the actual work of this thesis can be presented, several foundational concepts and technologies need to be understood. The following sections briefly explain the RISC-V architecture and the general concepts of a VP. This leads to the introduction of the RISC-V VP++ project, the starting point of this thesis. The core of this chapter presents the CHERI project and explains its design goals and fundamental principles. With a basic understanding of CHERI, the remainder of this chapter details the aspects of the CHERI standard relevant to this work. Finally, the chapter ends with an overview of the existing CHERI ecosystem, focusing on the tools and software employed in this work.

2.1 The RISC-V Architecture

RISC-V [14, 15] is an open standard ISA that was developed at the University of California, Berkeley, and is now maintained by the RISC-V Foundation [16]. Since its initial specification in 2011 [17], RISC-V's emphasis on simplicity and scalability has led to a broad ecosystem of tools, hardware designs, and software support, driven by a large community of researchers and industry partners. The motivation behind RISC-V is to provide a royalty-free, open standard alternative to proprietary ISAs, with a focus on learning from past designs and avoiding mistakes made with existing ISAs. At its core, an ISA defines how software interacts with a processor, specifying the set of instructions, their encoding, and how they interact with the system's memory and registers.

2.1.1 RISC-V Base ISA and Extensions

RISC-V is designed to be modular, including a minimal base ISA and a set of standardized, optional extensions, making it suitable for a wide range of applications. The first volume of the RISC-V specification [14] defines the base ISA and the first set of standard extensions, including the Mul/Div Extension (M), the Atomic Extension (A), Floating Point Extensions (F, D, Q), and the Compressed Extension (C). The second volume [15] of the specification defines the RISC-V Privileged Architecture, which describes the interaction between the hardware and the OS, including memory management, interrupts, and trap handling.

2.1.2 RISC-V Custom Extensions

Additionally, the modularity of RISC-V allows for the creation of custom extensions, which can be tailored for specific applications. To support the development of custom extensions, the standard guarantees that portions of the encoding space will never be used by standard extensions. The terms greenfield extension and brownfield extension are used to differentiate between two types of extensions. A greenfield extension begins populating a new instruction encoding space and hence can only cause conflicts at the prefix level. A brownfield extension, on the other hand, fits around existing encodings in an already populated encoding space. A brownfield extension is necessarily tied to a particular greenfield extension, and multiple brownfield extensions may share the same greenfield extension.
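As an illustration of the reserved encoding space, the sketch below checks whether a 32-bit instruction word uses one of the major opcodes that the opcode map of the RISC-V unprivileged specification labels custom-0 to custom-3 (custom-2 and custom-3 are additionally marked for possible RV128 use there). The function name `is_custom_opcode` is hypothetical, and the sketch makes no statement about which encodings any particular extension actually occupies.

```cpp
#include <cassert>
#include <cstdint>

// The major opcode occupies bits 6-0 of a 32-bit instruction word.
constexpr std::uint32_t OPCODE_MASK = 0x7F;

// Major opcodes labeled "custom" in the RISC-V base opcode map.
constexpr std::uint32_t CUSTOM_0 = 0x0B;  // 0001011
constexpr std::uint32_t CUSTOM_1 = 0x2B;  // 0101011
constexpr std::uint32_t CUSTOM_2 = 0x5B;  // 1011011 (custom-2/rv128)
constexpr std::uint32_t CUSTOM_3 = 0x7B;  // 1111011 (custom-3/rv128)

// Returns true if the instruction word lies in the custom opcode space.
constexpr bool is_custom_opcode(std::uint32_t instr) {
    const std::uint32_t opcode = instr & OPCODE_MASK;
    return opcode == CUSTOM_0 || opcode == CUSTOM_1 ||
           opcode == CUSTOM_2 || opcode == CUSTOM_3;
}
```

A standard instruction such as `addi x0, x0, 0` (0x00000013) is rejected by this check, whereas any word whose lowest seven bits equal one of the four values above is accepted.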

2.1.3 Virtual Memory Management

RISC-V supports virtual memory through a flexible and extensible page-based memory management system. The virtual memory system translates virtual addresses used by software into physical addresses used by hardware, allowing processes to operate in isolated address spaces and supporting features such as paging, memory protection and efficient context switching. The translation of virtual addresses is done by a multi-level page table structure, where the number of levels depends on the configured address width. RISC-V defines three standard page table formats: Sv32, Sv39 and Sv48, which support 32-bit, 39-bit and 48-bit virtual addresses, respectively. Each level of the page table is indexed using a portion of the virtual address, and the final Page Table Entry (PTE) provides the mapping to the physical address along with various access and permission bits. Regardless of the virtual address width, the PTE format always contains 8 flags in the lowest bits, which are used to control access and permissions. These bits are listed and explained in Table 2.1.

Whenever a virtual address is accessed, the Memory Management Unit (MMU) performs a page table walk to translate the virtual address to a physical address. This process always starts at the root page table, whose physical address is stored in the Supervisor Address Translation and Protection (SATP) register. The walk proceeds level by level until a valid leaf PTE with the required permissions is found. If no valid PTE is found or the permissions are insufficient, a page fault is raised. To reduce the performance impact of page table walks, RISC-V implementations typically include a Translation Lookaside Buffer (TLB), which caches recently used translations. When a virtual address is accessed, the MMU first checks the TLB for a matching entry. If a match is found, the physical address is retrieved directly from the TLB; otherwise, a page table walk is initiated.

Table 2.1: RISC-V Page Table Entry (PTE) fields: bits 0–7

Bit  Name          Description
0    V (Valid)     Indicates whether the PTE is valid. Must be set for the entry to be considered during address translation.
1    R (Read)      Grants read access to the page. If clear, read accesses cause a page fault.
2    W (Write)     Grants write access to the page. Must be 0 if R is 0.
3    X (Execute)   Grants execute permission to the page. If clear, instruction fetches cause a page fault.
4    U (User)      If set, User mode code may access the page. Otherwise, access is restricted to Supervisor or Machine mode.
5    G (Global)    Marks a global mapping that exists in all address spaces.
6    A (Accessed)  Set by hardware when the page is accessed (read, write, or execute). Software must clear this bit to track page usage.
7    D (Dirty)     Set by hardware on write. If clear and a write occurs, a page fault is triggered.
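The page table walk and the flag bits of Table 2.1 can be sketched for the Sv39 format as follows. This is a simplified illustration and not the MMU of RISC-V VP++: it assumes the Sv39 layout of the RISC-V privileged specification (three 9-bit VPN fields above a 12-bit page offset, 8-byte PTEs with the physical page number starting at bit 10), models physical memory as a hypothetical `PhysMem` map, handles loads only, and omits the U, A and D bit handling, the superpage alignment check and the TLB. All names are chosen for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// PTE flag bits 0-3 from Table 2.1 (the other flags are not used here).
constexpr std::uint64_t PTE_V = 1ULL << 0;
constexpr std::uint64_t PTE_R = 1ULL << 1;
constexpr std::uint64_t PTE_W = 1ULL << 2;
constexpr std::uint64_t PTE_X = 1ULL << 3;

constexpr unsigned PAGE_SHIFT = 12;  // 4 KiB pages
constexpr unsigned VPN_BITS = 9;     // 512 entries per table
constexpr int LEVELS = 3;            // Sv39 uses three levels
constexpr std::uint64_t PTE_SIZE = 8;
constexpr std::uint64_t PPN_MASK = (1ULL << 44) - 1;

// Extract the page table index for the given level from a virtual address.
constexpr std::uint64_t vpn(std::uint64_t va, unsigned level) {
    return (va >> (PAGE_SHIFT + VPN_BITS * level)) & 0x1FF;
}

// Toy physical memory: physical address -> 64-bit word.
using PhysMem = std::unordered_map<std::uint64_t, std::uint64_t>;

// Translate a load address. Returns false where hardware would raise a
// page fault; on success the physical address is written to `pa`.
bool translate_load(const PhysMem& mem, std::uint64_t root_ppn,
                    std::uint64_t va, std::uint64_t& pa) {
    std::uint64_t table = root_ppn << PAGE_SHIFT;
    for (int level = LEVELS - 1; level >= 0; --level) {
        const unsigned lvl = static_cast<unsigned>(level);
        const auto entry = mem.find(table + vpn(va, lvl) * PTE_SIZE);
        if (entry == mem.end()) return false;  // unmapped toy memory
        const std::uint64_t pte = entry->second;
        const bool valid = (pte & PTE_V) != 0;
        const bool reserved = (pte & PTE_W) != 0 && (pte & PTE_R) == 0;
        if (!valid || reserved) return false;
        const std::uint64_t ppn = (pte >> 10) & PPN_MASK;
        if ((pte & (PTE_R | PTE_X)) != 0) {            // leaf PTE
            if ((pte & PTE_R) == 0) return false;      // no read permission
            // Lower address bits pass through unchanged (page offset, plus
            // the lower VPN fields when the leaf is found above level 0).
            const std::uint64_t offset_mask =
                (1ULL << (PAGE_SHIFT + VPN_BITS * lvl)) - 1;
            pa = ((ppn << PAGE_SHIFT) & ~offset_mask) | (va & offset_mask);
            return true;
        }
        table = ppn << PAGE_SHIFT;  // pointer to the next-level table
    }
    return false;  // no leaf PTE found
}
```

With a root table at physical page 0x80000 and one entry per level, the virtual address 0x40201234 splits into the indices 1, 1 and 1 plus the page offset 0x234, so a leaf PTE pointing to physical page 0x12345 yields the physical address 0x12345234.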

2.2 Virtual Prototypes

A Virtual Prototype (VP) is an executable model of a hardware platform capable of running unmodified production software, which allows for early software development and design space exploration before physical hardware exists. VPs make use of higher abstraction levels to achieve simulation speeds that are orders of magnitude faster than RTL simulation [18]. Using high-level languages, such as C++, and domain-specific standardized libraries like SystemC and TLM to abstract communication details makes VPs easier to understand and maintain than the actual hardware design, thereby accelerating the development process.

Nowadays, VPs are predominantly written in SystemC TLM [13]. SystemC [11, 12] is a C++-based modeling framework and simulation library that provides an event-driven simulation kernel and common building blocks to support the development of hardware simulations. SystemC enables designers to describe hardware components and their interactions at various abstraction levels, ranging from cycle-accurate RTL simulations, as known from other domain-specific Hardware Description Languages (HDLs), to higher-level functional models. For the development of VPs, the more interesting abstraction level is the TLM extension [19], which abstracts communication between modules into high-level transactions instead of low-level signal toggling. TLM significantly accelerates simulation compared to RTL by reducing the amount of detail that must be processed per clock cycle, which makes SystemC TLM especially interesting for the development of VPs. As an example, instead of modeling each signal on a bus individually, a TLM bus model only describes the transactions performed on the bus.

TLM is used to model on-chip bus systems based on the memory-mapped I/O paradigm, which is common in modern Systems on Chip (SoCs). Individual SystemC modules are connected via one or more TLM sockets, allowing the modules to read or write data through these sockets. A communication in TLM is called a transaction, and the data transferred in a transaction is referred to as the payload. A payload always contains an address, a command, the data length and a pointer to the actual data, but can be extended with additional information.
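The transaction-based, memory-mapped communication style described above can be sketched without the SystemC library. The following plain C++ stand-in is only an illustration: `Payload`, `Target`, `Memory`, `Bus` and the helper functions are hypothetical names for this sketch and are not the SystemC TLM API, where the corresponding payload class is `tlm::tlm_generic_payload`. The sketch also ignores timing, sockets and the simulation kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum class Command { Read, Write };

// Simplified payload: command, address, data pointer and data length.
struct Payload {
    Command command;
    std::uint64_t address;
    unsigned char* data;
    std::size_t length;
    bool ok;  // response status filled in by the target
};

// Anything that can receive a transaction.
struct Target {
    virtual ~Target() = default;
    virtual void transport(Payload& trans) = 0;
};

// A memory module that serves reads and writes at target-local addresses.
class Memory : public Target {
public:
    explicit Memory(std::size_t size) : bytes_(size, 0) {}
    void transport(Payload& trans) override {
        if (trans.address > bytes_.size() ||
            trans.length > bytes_.size() - trans.address) {
            trans.ok = false;  // access outside the memory
            return;
        }
        unsigned char* cell = bytes_.data() + trans.address;
        if (trans.command == Command::Write) {
            std::memcpy(cell, trans.data, trans.length);
        } else {
            std::memcpy(trans.data, cell, trans.length);
        }
        trans.ok = true;
    }
private:
    std::vector<unsigned char> bytes_;
};

// A bus that routes each transaction by address (memory-mapped I/O).
class Bus : public Target {
public:
    void map(std::uint64_t start, std::uint64_t size, Target* target) {
        ports_.push_back({start, size, target});
    }
    void transport(Payload& trans) override {
        for (const Port& port : ports_) {
            if (trans.address >= port.start &&
                trans.address - port.start + trans.length <= port.size) {
                Payload local = trans;
                local.address -= port.start;  // global -> target-local
                port.target->transport(local);
                trans.ok = local.ok;
                return;
            }
        }
        trans.ok = false;  // no module mapped at this address
    }
private:
    struct Port { std::uint64_t start; std::uint64_t size; Target* target; };
    std::vector<Port> ports_;
};

// Convenience helpers that build a payload for a 32-bit access.
bool bus_write32(Target& bus, std::uint64_t address, std::uint32_t value) {
    Payload trans{Command::Write, address,
                  reinterpret_cast<unsigned char*>(&value), sizeof(value), false};
    bus.transport(trans);
    return trans.ok;
}

bool bus_read32(Target& bus, std::uint64_t address, std::uint32_t& value) {
    Payload trans{Command::Read, address,
                  reinterpret_cast<unsigned char*>(&value), sizeof(value), false};
    bus.transport(trans);
    return trans.ok;
}
```

Mapping a `Memory` at a base address on the `Bus` and calling the helpers shows the key property of this abstraction: the initiator only describes what is transferred, while the individual bus signals are never modeled.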

2.3 RISC-V VP++

The starting point of this thesis is the RISC-V VP++ project, which was introduced in 2018 [20] and later improved and extended in subsequent works [21, 22, 23]. Eventually, the project was released as a new open-source project called RISC-V VP++ [10]. RISC-V VP++ is implemented in SystemC TLM and realizes multiple RISC-V based platforms, all sharing a common infrastructure, and includes software and sample applications. Among these platforms is one based on the SiFive FU540 SoC [24], which is capable of booting a Linux OS. Figure 2.1 shows the core components of the RISC-V VP++ architecture. The figure is based on the latest architecture diagram of the original VP published by Schlägl and Große [23].

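[Figure: block diagram. The ISS (RV64 core) contains the Program Counter, the General Purpose Registers, the Control & Status Registers, the Decode/Interpret/Execute block, the DBBCache and LSCache, the IMem IF and DMem IF, and the MMU with its TLB. The ISS reaches the memory either via DMI access or via TLM transactions over the TLM 2.0 bus. The bus uses a memory map to connect the memory, the CLINT (timer/software interrupts), the PLIC (external interrupts) and the peripherals (UART, framebuffer, mass storage, mouse, keyboard, network, ...).]

               Figure 2.1: Architecture of RISC-V VP++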

The most important component of the VP is the ISS, as it models the Central Processing Unit (CPU) of the platform. The VP implements both the RV32 and RV64 ISAs in two separate ISS implementations, with only RV64 shown in the figure. The ISS supports the RISC-V privilege levels M-, S- and U-mode and the instruction set extensions IMAFDC, as well as the vector extension V. It is possible to create multi-core systems by combining multiple instances of the ISS. Inside the ISS, an instruction is first loaded from the instruction memory at the address the Program Counter (PC) points to, and then decoded. Regular RISC-V instructions are always 32-bit wide words. The ISS uses a switch-case structure to decode the instruction, where the first switch is done based on the opcode field of the instruction. The opcode is a 7-bit field that identifies the instruction category and is located at bits 6–0 of the instruction word. Based on the opcode, the second switch is done on the funct3 field (bits 14–12) and, if necessary, on the funct7 field (bits 31–25) to further distinguish between instructions in the same category. Figure 2.2 shows the RISC-V instruction formats, which illustrate the position of the opcode, funct3, funct7 and other fields depending on the instruction format.

          31         25 24  20 19  15 14   12 11         7 6      0
R-type    funct7        rs2    rs1    funct3  rd           opcode
I-type    imm[11:0]            rs1    funct3  rd           opcode
S-type    imm[11:5]     rs2    rs1    funct3  imm[4:0]     opcode
B-type    imm[12,10:5]  rs2    rs1    funct3  imm[4:1,11]  opcode
U-type    imm[31:12]                          rd           opcode
J-type    imm[20,10:1,11,19:12]               rd           opcode

                             Figure 2.2: RISC-V Instruction Formats
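
The nested switch-case decoding described above can be sketched as follows. This is a minimal illustration that handles only four instructions; the OpId enumerators and the function names are hypothetical and do not reproduce the opId table of the VP.

```cpp
#include <cstdint>

// Field extraction following the bit positions given in the text.
constexpr std::uint32_t opcode(std::uint32_t instr) { return instr & 0x7F; }          // bits 6-0
constexpr std::uint32_t funct3(std::uint32_t instr) { return (instr >> 12) & 0x7; }   // bits 14-12
constexpr std::uint32_t funct7(std::uint32_t instr) { return (instr >> 25) & 0x7F; }  // bits 31-25

// Hypothetical operation identifiers; a real table is much larger.
enum class OpId { Undefined, Add, Sub, And, Addi };

OpId decode(std::uint32_t instr) {
    switch (opcode(instr)) {
        case 0x33:  // OP: register-register arithmetic (R-type)
            switch (funct3(instr)) {
                case 0x0:
                    switch (funct7(instr)) {
                        case 0x00: return OpId::Add;
                        case 0x20: return OpId::Sub;
                        default: return OpId::Undefined;
                    }
                case 0x7:
                    return funct7(instr) == 0x00 ? OpId::And : OpId::Undefined;
                default:
                    return OpId::Undefined;
            }
        case 0x13:  // OP-IMM: register-immediate arithmetic (I-type)
            return funct3(instr) == 0x0 ? OpId::Addi : OpId::Undefined;
        default:
            return OpId::Undefined;
    }
}
```

For example, the word 0x002081B3 (add x3, x1, x2) has opcode 0x33, funct3 0 and funct7 0 and is therefore mapped to OpId::Add.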

The VP also supports RISC-V Compressed Instructions (RVC), which provide 16-bit encodings for a subset of frequently used RISC-V instructions and are designed to improve code density without sacrificing performance or compatibility. Compressed instructions reduce the size of common instructions, such as loads, stores and arithmetic operations, by encoding them in a more compact format. Compressed instructions are identified by the two lowest bits of the instruction word, which are less than 0b11 for compressed instructions and 0b11 for regular instructions. If the decoder of the VP detects a compressed instruction, it first decodes the compressed instruction in a similar but separate switch-case structure. Once the instruction is identified, it is expanded into a regular 32-bit instruction internally. In the ISS, the decoder always returns a unique opId from the so-called opId-Table (regardless of whether the instruction is compressed), which corresponds to the decoded instruction. If the decoded value does not match any of the defined instructions, or the instruction is not supported by the currently enabled ISA, a value for unknown or unsupported instructions is returned instead. The opId that is returned by the instruction decoder is then used in the ISS’s execution block to determine which instruction implementation to call. The actual execution block is implemented inside the ISS’s core function exec_steps. This function again implements a large switch-case block, where instruction execution is determined based on the opId returned by the instruction decoder. In the execution block of the ISS, General Purpose Registers as well as Control and Status Registers (CSRs) and memory contents are manipulated according to the given instruction. The VP also includes an MMU that supports the RISC-V virtual memory system and implements the RISC-V Sv32, Sv39 and Sv48 page-based virtual memory schemes.
The ISS does not interact directly with memory, but accesses it via dedicated interfaces (Data Memory Interface (DMem IF) and Instruction Memory Interface (IMem IF)) and furthermore uses the MMU to translate virtual to physical addresses. The boxes Load Store Cache (LSCache) and Dynamic Basic Block Cache (DBBCache), which are located between the instruction execution block and the memory interfaces, were introduced by Schlägl and Große [23] and are used to optimize the performance of the ISS. It is important to understand that these two blocks do not model actual hardware caches, but are optimization techniques that speed up execution and reduce memory accesses during VP simulation. Interrupt processing is done by the Core Local Interruptor (CLINT) and Platform Level Interrupt Controller (PLIC) components, shown in the bottom left of the image. The CLINT generates local interrupt sources such as software and timer interrupts and provides a corresponding compare register for each core to trigger timed interrupts. Additionally, the CLINT enables one core to trigger a software interrupt on another core by writing to a specific register. The PLIC acts as a multiplexer that processes external interrupt signals from peripheral devices. Interrupt prioritization is done based on core-wise PLIC configuration registers. On the right side of Figure 2.1 the VP’s memory is shown. Each memory is realized as a TLM module that allocates memory on the host system, and multiple such components can be combined in one platform of the VP, which is indicated by the stacked symbol in the figure. It is possible to initialize the memory content at the top level of the VP. This allows software images in the form of Executable and Linkable Format (ELF) files to be loaded into the program memory, where they can then be executed by the VP. At the center of the image is the TLM bus, which connects all components of the VP and allows them to communicate with each other. To accelerate memory accesses, it is possible to skip the TLM transaction and instead directly access the memory via the Direct Memory Interface (DMI). The last components shown in Figure 2.1 are the peripherals, which are again symbolized as a stacked component to indicate that multiple peripheral components can be connected. Peripherals are TLM modules implemented as memory-mapped I/O accessed via the TLM bus. The RISC-V VP++ project includes a set of standard peripherals, ranging from simple timers and sensors to more complex components like Universal Asynchronous Receiver/Transmitter (UART) implementations and Ethernet.
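
The compressed-instruction check described earlier in this section reduces to a test of the two lowest bits of the instruction word. A minimal sketch, with hypothetical function names, could look like this:

```cpp
#include <cstdint>

// An instruction is compressed when its two lowest bits are not 0b11.
constexpr bool is_compressed(std::uint32_t instr_word) {
    return (instr_word & 0x3) != 0x3;
}

// Number of bytes by which the PC advances after a sequential instruction.
constexpr unsigned instruction_length(std::uint32_t instr_word) {
    return is_compressed(instr_word) ? 2u : 4u;
}
```

A decoder would call is_compressed first and only then choose between the 16-bit and the 32-bit switch-case structure.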

2.4 CHERI

2.4.1 What is CHERI?

The Capability Hardware Enhanced RISC Instructions (CHERI) project was initiated in 2010 by the University of Cambridge and is a part of the CRASH-worthy Trustworthy Systems Research and Development (CTSRD) (pronounced “custard”) project [25]. CHERI is a hardware/software/semantics co-design project that extends conventional hardware ISAs with new architectural features to enable fine-grained memory protection and highly scalable software compartmentalization [26]. The newly introduced memory protection mechanisms make it possible to use traditionally memory-unsafe languages like C and C++ in a more secure way, offering effective and efficient defenses against widely exploited vulnerabilities. Additionally, scalable compartmentalization techniques allow OSs and applications to be broken down into smaller, isolated components, reducing the impact of security flaws in ways that conventional architectures cannot achieve. CHERI strengthens software robustness by constraining escalation paths that could otherwise allow low-level bugs to be exploited into more severe security vulnerabilities. Examples of such paths are code injection via buffer overflows, control-flow corruption and other memory-based attacks.

CHERI provides improved memory protection by introducing an architectural feature called a capability. A capability is an extension of a conventional C or C++ pointer with additional metadata that describes the bounds and permissions of the memory region that the pointer can access. These capabilities serve as unforgeable tokens of authority, granting the holder the ability to perform certain actions on a given memory region. Details on capabilities follow in Section 2.4.5, and more on their implementation in the VP is found in Section 3.2. A fundamental security guarantee of CHERI is that capabilities are unforgeable, which ensures pointer provenance. This means that capabilities can only be constructed by deriving from existing capabilities, and can never be created from scratch. CHERI’s strict application of the monotonicity concept implies that capabilities can only become more restrictive than their parent, but never more permissive.
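
Provenance and monotonicity can be illustrated with a small software model. The sketch below uses a hypothetical, uncompressed Capability struct and a hypothetical derive function; it is not the capability representation of the VP, and invalid requests simply yield an untagged result here, whereas a real instruction may raise an exception instead.

```cpp
#include <cstdint>

// Hypothetical, uncompressed capability model for illustration only.
struct Capability {
    bool tag;             // true if the capability is valid
    std::uint64_t base;   // lowest accessible address
    std::uint64_t top;    // first address past the accessible region
    std::uint32_t perms;  // permission bit mask
};

// Derive a child capability from a parent. The result is only valid if the
// requested bounds lie within the parent's bounds. Permissions are
// intersected, so the child never gains a permission the parent lacks.
Capability derive(const Capability& parent, std::uint64_t base,
                  std::uint64_t top, std::uint32_t perms) {
    const bool bounds_ok = base >= parent.base && top <= parent.top && base <= top;
    return Capability{parent.tag && bounds_ok, base, top, parent.perms & perms};
}
```

Because every capability is produced by derive from an existing one, a chain of derivations can only shrink bounds and permissions.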

2.4.2 CHERI Design Goals

The design goal of CHERI is to improve the security of modern C-language Trusted Computing Bases (TCBs) through processor support for fine-grained memory protection and scalable software compartmentalization [6]. CHERI aims to provide compiler-driven memory protection by protecting programmer-described data structures and references, unlike the coarse page-based protection constraints that an MMU-based system would provide. The fine-grained memory protection is achieved by the extension of pointers in languages such as C and C++ with capabilities, which constrain memory access ranges and the allowed operations on them. This instrumentation is handled by the compiler, so applications typically require no or only minimal source code modifications. Additionally, CHERI aims to protect the integrity, provenance and monotonicity of those pointers to prevent unauthorized manipulation that would otherwise lead to privilege escalation.

This fine-grained protection also allows for compartmentalization of software within application instances, which means that different parts of the software can be isolated into components to mitigate the impact of a security breach. This is done by applying known principles of security like encapsulation, type safety, abstraction and the principle of least privilege. While these are not new principles, CHERI allows for a more efficient implementation to reach a sweet spot of simultaneously having low overhead, wide applicability and easy migration [27]. As CHERI allows virtualization to be decoupled from separation, scalability problems imposed by MMUs based on TLBs are avoided. In contrast to traditional MMU-based systems, where each protection domain is associated with a separate page table and TLB entries, CHERI uses capabilities to enforce isolation, allowing many domains to coexist without expensive TLB management.

In addition to enforcing compartmentalization through capabilities, CHERI introduces a Compartment Identifier (CID) to support isolation of software components at the microarchitectural level. While capabilities define clear security boundaries between components, the CID allows the hardware (e.g., in the branch predictor) to recognize these boundaries as well. By tagging internal state with a CID, the processor can avoid sharing sensitive prediction data across compartments, thereby reducing the risk of side-channel attacks [28]. Since the definitions of instructions related to the CID are not yet finalized, they are not relevant for this thesis and are therefore not discussed in detail. For more information on the CID and side-channel attacks we refer to Appendix C.15 of [6] and [28].

Another very important aspect of the CHERI design is a viable transition path from current software and hardware design towards this new approach. A viable transition path means that CHERI hardware should be able to run current software without the need for significant modification. The goal is to retain, as far as possible, existing designs from hardware through operating systems up to software compilation models, and only to extend the existing architectures. CHERI is able to blend architectural capabilities with existing conventional MMU-based architectures and with conventional C/C++ software stacks. This is achieved by the CHERI-aware compiler, which is able to generate code that makes use of the new architectural features without requiring a complete rewrite of the existing software sources. The hybrid approach allows non-CHERI and CHERI-aware instructions to run on the same hardware, allowing for incremental development and usage of CHERI within existing ecosystems.

2.4.3 Principles of CHERI

The CHERI design is based on two principles [29]. The first is the principle of least privilege [30], which states that a program should operate with only the permissions necessary to perform its intended function. This principle is realized through architectural privileges such as memory bounds restrictions and fine-grained permission controls. For example, CHERI extends pointers with capabilities that restrict access to the specific object the pointer references, in contrast to C programs on conventional hardware, where pointers may access the entire process address space, constrained only by coarse page-based permissions enforced by the MMU. The principle of least privilege has a long history in academic research, which has examined both the expression of reduced privilege and mechanisms for selecting appropriate privileges [6, 31]. A simple example of the principle outside the CHERI context is found in Unix file permissions: a script that processes system logs should not run with root privileges, but instead under a user account that only has read access to the /var/log directory. This ensures that the script can perform its intended task without being able to modify or access unrelated parts of the system, thereby limiting potential damage in case of bugs or compromise.


The principle of intentional use forms the second core principle of CHERI. According to this principle, a program performing an action must be explicit rather than implicit about the rights it selects to authorize this action. The principle of intentional use avoids an issue classically known as the confused deputy problem [32], which refers to a scenario where a program unintentionally exercises a privilege that it holds legitimately, but does so on behalf of another program that lacks this privilege. A typical example of the confused deputy problem is a program that is allowed to access sensitive system data but is invoked by a less privileged user. If the program does not verify the user’s access rights, it may grant access to information that should remain restricted. An example of the principle of intentional use outside the CHERI context is the const qualifier in C or C++. By declaring a pointer as a pointer to const data, the programmer explicitly restricts it to read-only access. This makes the programmer’s intent clear and prevents accidental modification of the pointed-to data, thereby enforcing the principle of intentional use.
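
The const example can be made concrete with a short function. The function name sum is chosen for illustration only.

```cpp
#include <cstddef>

// The pointer-to-const parameter states the intent explicitly: this function
// only reads the buffer. An accidental write such as values[i] = 0 would be
// rejected at compile time.
long sum(const int* values, std::size_t count) {
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

Unlike a CHERI permission, const is only checked by the compiler and can be cast away, so it expresses intent but does not enforce it at run time.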

2.4.4 The CHERI Standard

The concept of CHERI does not rely on a specific architecture and does not imply any particular implementation details. While the first implementation done by the authors of [33] was based on the MIPS architecture, the project has been extended to RISC-V, and in the latest version the primary reference platform was shifted towards CHERI-RISC-V. Additionally, CHERI is implemented in Arm’s prototype Morello architecture [34], and a sketched variant of CHERI for x86-64 was presented in [35]. For the specification of CHERI, a new organization called the CHERI Alliance was founded in 2024, which aims to drive a global industrial ecosystem around CHERI. This alliance is currently working on the ratification of a specification for the RISC-V CHERI Extension. However, at the time of writing, this specification is still under development and not all projects that are provided by the CTSRD project are fully ported to this new specification. As the CHERI Alliance has members from industry as well as academia, the standardization process is difficult and not all features that are already proposed by the CTSRD project will be implemented in the first version of the RISC-V CHERI Extension. The latest CHERI specification proposed by the CTSRD project reflects a more academic direction, incorporating several experimental features that are not yet ready for production use. To maximize this work’s contribution to the future development of CHERI, the decision was made to implement not only the bare minimum required by the RISC-V CHERI Extension, but also the most recent features proposed by CTSRD. This decision is further motivated by the lack of a publicly available test suite for the RISC-V CHERI Extension specification. Accordingly, the content presented in this document is based on the latest version of the CHERI ISA (Version 9) [6] as defined by CTSRD. This version is further referred to as CHERI ISAv9. While the focus is on the RISC-V variant, the fundamental concepts are consistent with the MIPS-based version.

2.4.5 Architectural Capabilities

As already explained in Section 2.4.1, capabilities are the core feature of CHERI and form the basis for its security guarantees. Therefore, this section provides a detailed overview of the main features of capabilities. The goal is to support strong protection of C and C++-language pointers by using capabilities as unforgeable and delegatable tokens of authority. Capabilities extend integer virtual addresses with metadata that limits how they are manipulated and used to protect their integrity. Like pointers, capabilities are stored in two architectural forms: in integer registers, and in memory. The size of a capability (CLEN) is always twice the architecture’s natural address size (XLEN). This leads to a capability size of 64 bits on RV32 and 128 bits on RV64 architectures, respectively. Not included in CLEN is the additional Tag bit, required for each capability, which is a core component of the CHERI architecture. The 1-bit Tag is used to indicate whether a capability is valid or not. The Tag is atomically bound to the capability but not actually visible via byte-wise load and store operations, which means memory must be modified to implement this “out-of-band” Tag for each CLEN bits of memory. The Tag is maintained by legal capability operations but cleared by all other operations on that memory (e.g., a regular RISC-V store operation). CHERI’s protection features do not only apply to the pointers themselves but can also apply to the pointee data or referenced sections of code. While the exact in-memory representation of capabilities is architecture-specific, the required metadata is the same across all implementations. As capabilities form the core of CHERI and the main work of this thesis was to implement CHERI in a RISC-V architecture, all major features are explained in the following. Whenever actual architectural layouts are mentioned, they are specific to the RISC-V implementation. Figure 2.3 shows the layout of an encoded capability as it is implemented in the current state of the VP.

 63          48 47          30 29 28   27    26   25       14 13        0
|  permissions |  object type |  R   |  F  | I_E |     B     |     T     |
|                           address (64 bits)                            |

                         Figure 2.3: CHERI Capability Encoding

Tag Bit

The premise of CHERI is that each location that can hold a capability (register, capability-aligned memory word) has a 1-bit Tag associated with it. This bit atomically tracks the validity of its corresponding capability, which requires a specially modified memory, further referred to as tagged memory. A tagged memory architecture must ensure that the unaddressable Tag is atomically bound to each memory segment that can hold a capability. This means the memory is divided into segments of size CLEN, which is the size of a capability. A Tag is associated with each CLEN-sized segment of memory; the granularity therefore depends on the specific ISA (either 64 bits for RV32 or 128 bits for RV64). These memory segments are then protected by the Tag, which must be set to 0 whenever a non-capability operation is performed on the segment. Stores of non-capability types, e.g., single bytes, count as such a non-capability operation and must therefore clear the Tag. It is also allowed to have untagged memory in the same system. In this case, the tags of capability values are discarded when they are stored in untagged memory, and loaded capabilities will have their Tag bit cleared.

Because non-capability-aware operations always invalidate the Tag, in-memory pointer corruption attacks are caught on the next attempt to dereference the pointer. The Tag also controls which operations can be performed with the capability, leading to precise exceptions if a valid Tag is required for an operation. It is important to understand that the fields of a capability can always be accessed regardless of its Tag, which means the value of the capability can be modified, but non-capability-aware operations will clear the capability’s Tag. Similarly, all addressable portions of a capability (not the Tag) can be read from memory via ordinary load operations. In other words, an untagged capability value is simply data that can be treated in any arbitrary way. Only operations that dereference or use a capability require a valid Tag and therefore a valid capability. Dereferencing in this context means using the capability for loading and storing data or other capabilities, or for instruction fetches. A valid Tag is also required for jumps or domain transitions and for the sealing and unsealing of other capabilities, which is explained in the next paragraph.
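
The behavior of tagged memory can be sketched in a few lines of C++. The class TaggedMemory and its methods are hypothetical and assume RV64 (one Tag per 16 bytes); they are meant to illustrate the rules above, not to reproduce the memory model of the VP.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical tagged memory for RV64: one out-of-band Tag per 16-byte
// (CLEN = 128 bit) segment.
class TaggedMemory {
public:
    static constexpr std::size_t kCapBytes = 16;

    explicit TaggedMemory(std::size_t segments)
        : bytes_(segments * kCapBytes, 0), tags_(segments, false) {}

    // Capability store: writes all 16 bytes and records the Tag.
    // The address is assumed to be capability-aligned and in range.
    void store_cap(std::size_t addr, const std::uint8_t (&cap)[kCapBytes], bool tag) {
        std::memcpy(&bytes_[addr], cap, kCapBytes);
        tags_[addr / kCapBytes] = tag;
    }

    // Ordinary data store: modifies one byte and clears the segment's Tag.
    void store_byte(std::size_t addr, std::uint8_t value) {
        bytes_[addr] = value;
        tags_[addr / kCapBytes] = false;
    }

    // Ordinary data load: the Tag is never visible here.
    std::uint8_t load_byte(std::size_t addr) const { return bytes_[addr]; }

    // Tag lookup as performed by a capability load.
    bool load_tag(std::size_t addr) const { return tags_[addr / kCapBytes]; }

private:
    std::vector<std::uint8_t> bytes_;
    std::vector<bool> tags_;
};
```

After a single store_byte into a segment, the bytes of the former capability are still readable, but load_tag reports false, so the value can no longer be dereferenced as a capability.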

Object Type and Sealing

A capability’s object type is used for marking capabilities as sealed or unsealed. A sealed capability cannot be dereferenced and is immutable, which means any modification of its fields clears the Tag. The immutability allows sealed capabilities to be used as unforgeable tokens of authority for higher-level software. Sealed pairs of capabilities share a common object type and are designed to support the linking of a pair of code and data capabilities to be used together during domain transition. The jump-like instruction cinvoke unseals the two sealed capabilities and transfers control flow to the code pointed to by the code capability if the object types of the sealed pair match. In this way, controlled privilege escalation can be implemented. While a single bit would be enough to mark sealing, the use of the object type allows capabilities to be indelibly and indivisibly linked. Sealed entry (sentry) capabilities seal a single code capability and describe a function entry point. They are used to establish control-flow integrity. A program may jump to a sentry to begin executing from there. The jalr instruction is modified to automatically unseal a sentry target and install it in the PC. Jump-and-link instructions also seal the return address, which serves as the return point for the callee but cannot be used to authorize memory loads or stores.
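
The matching rule for sealed pairs can be sketched as follows. The struct SealableCap, the constant kUnsealed and the function may_invoke are hypothetical; the sketch treats the maximum object type value 0x3FFFF as "unsealed", ignores reserved object types such as sentries, and omits all permission checks.

```cpp
#include <cstdint>

// Object type value that marks a capability as unsealed in this sketch.
constexpr std::uint32_t kUnsealed = 0x3FFFF;

// Hypothetical model containing only the fields needed for the sealing check.
struct SealableCap {
    bool tag;
    std::uint32_t otype;
    bool is_sealed() const { return otype != kUnsealed; }
};

// Simplified cinvoke-style check: both capabilities must be valid, sealed
// and share the same object type.
bool may_invoke(const SealableCap& code, const SealableCap& data) {
    return code.tag && data.tag &&
           code.is_sealed() && data.is_sealed() &&
           code.otype == data.otype;
}
```

Using a full object type instead of a single "sealed" bit is what allows the comparison code.otype == data.otype to tie a specific code capability to a specific data capability.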

Permissions

Permissions add the ability to restrict how a capability may be used. Each bit of the permission field represents a specific permission. The upper bits of the permission field are reserved as user-defined permissions and are not further specified in the standard. For example, they can be used by the OS or application programs to define their own functionality. The other bits are referred to as architectural permissions. They restrict the operations that can be performed with a capability. The following list gives a brief overview of the architectural permissions defined in CHERI ISAv9:

permit_set_CID: Required for setting the CID
access_system_regs: Required for an instruction to access CSRs
permit_unseal: Required for unsealing capabilities
permit_cinvoke: Required for capability invocation
permit_seal: Required for sealing capabilities
permit_store_local_cap: Required for storing local capabilities
permit_store_cap: Required for storing capabilities
permit_load_cap: Required for loading capabilities
permit_store: Required for storing data
permit_load: Required for loading data
permit_execute: Required for executing instructions (checked when fetching instructions)
global: Indicates if the capability is global or local (depending on this bit, the local permissions from above are required)
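
Since each permission is a single bit, permission checks and monotonic restriction are simple mask operations. In the following sketch the bit positions are an assumption: the permissions listed above are taken to occupy bits 11 down to 0 in the order given. The enumerator and function names are hypothetical.

```cpp
#include <cstdint>

// Assumed bit positions: the permissions listed above, from bit 11 down to bit 0.
enum Permission : std::uint32_t {
    kGlobal = 1u << 0,
    kPermitExecute = 1u << 1,
    kPermitLoad = 1u << 2,
    kPermitStore = 1u << 3,
    kPermitLoadCap = 1u << 4,
    kPermitStoreCap = 1u << 5,
    kPermitStoreLocalCap = 1u << 6,
    kPermitSeal = 1u << 7,
    kPermitCInvoke = 1u << 8,
    kPermitUnseal = 1u << 9,
    kAccessSystemRegs = 1u << 10,
    kPermitSetCid = 1u << 11
};

// True if every permission in `required` is present in `perms`.
constexpr bool has_permissions(std::uint32_t perms, std::uint32_t required) {
    return (perms & required) == required;
}

// Monotonic restriction: permissions can be removed but never added.
constexpr std::uint32_t restrict_permissions(std::uint32_t perms, std::uint32_t mask) {
    return perms & mask;
}
```

Because restrict_permissions only uses a bitwise AND, no sequence of calls can re-enable a permission that was cleared earlier.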

Reserved

The two bits labeled as R in Figure 2.3 are reserved for future use.

Flags

The general specification of CHERI defines flag fields, which are included in the capability encoding. Flags can be manipulated freely and, unlike permissions, do not determine privilege. This means that their state is orthogonal to capability monotonicity. They are intended to affect the semantics of an access and do not impose access control. For RISC-V, only a single bit is used as a flag, which is labeled as F in Figure 2.3. This flag controls the opcode interpretation on instruction fetch by switching between capability mode and default decoding, and is further referenced as flag_cap_mode.


Bounds

The fields for base, top and the internal exponent flag (B, T and I_E in Figure 2.3) are used to encode the bounds of a capability. The bounds represent the lower and upper memory address that the capability is authorized to access. As an example, a capability pointing to a C-style array of size N with 1-byte elements would have a base address pointing to the first element of the array and its top address pointing to the element after the array’s last element, i.e., the address of the first byte after the array. This results in the capability’s length being N bytes, which is the difference between the top and base address. It is possible for a capability’s address to temporarily move out of bounds; only the attempt to dereference an out-of-bounds capability will cause an exception. Buffer overflows on global variables, heap and stack, as well as out-of-bounds execution exploits, are therefore prevented. The complexity comes with the representation of these bounds, as the top and base address would each require the same number of bits as the actual address (XLEN). Since additional bits are required for the other fields (object type, permissions, flags), capabilities constructed this way would require four times the architecture’s address size. But as Figure 2.3 shows, only 26 bits are used for B and T, plus one additional bit I_E marking which exponent is used. This size reduction is achieved by using a compressed encoding scheme, which will be explained in more depth in the next section.
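
The array example can be expressed as a bounds check on uncompressed bounds. The struct Bounds and the function in_bounds are hypothetical, and the sketch assumes that the top address fits into 64 bits.

```cpp
#include <cstdint>

// Hypothetical uncompressed bounds. `top` is the first address past the
// region and is assumed to fit into 64 bits for this sketch.
struct Bounds {
    std::uint64_t base;
    std::uint64_t top;
    constexpr std::uint64_t length() const { return top - base; }
};

// An access of `size` bytes at `addr` is allowed only if it lies entirely
// within [base, top). Written so that addr + size cannot overflow.
constexpr bool in_bounds(const Bounds& b, std::uint64_t addr, std::uint64_t size) {
    return addr >= b.base && addr <= b.top && size <= b.top - addr;
}
```

For an 8-byte array at address 0x1000 the bounds are base = 0x1000 and top = 0x1008. A 1-byte access at 0x1007 passes the check, while the same access at 0x1008 fails, which is exactly the classic off-by-one overflow.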

2.4.6 Capability Compression in CHERI

CHERI uses a compression algorithm called CHERI Concentrate Compression, introduced by Woodruff et al. [36], to represent bounds in capabilities. This algorithm is a so-called fat-pointer compression scheme that improves encoding efficiency, solves important pipeline problems and eases semantic restrictions of compressed encoding. Figure 2.4 shows the CHERI-256 format that was used in the first iterations of the CHERI architecture. This format does not compress the bounds of the capability and therefore requires four times the size of the integer address pointer. Whilst this is not efficient, it serves as a starting point and clearly visualizes what CHERI needs to represent in a capability. One might also notice that this format does not store base and top, but base and length instead. This is functionally equivalent, as long as the entire address space is reachable by 64-bit addresses.

 63                              0
|   permissions, otype, flags    |
|   length                       |   256-bit
|   base                         |
|   address                      |

    Figure 2.4: CHERI-256 Capability Format

The CHERI Concentrate Compression algorithm stores the values B and T and an additional flag I_E (called the internal exponent flag) in the capability. B and T no longer represent the actual base and top address. Instead, the actual base and top values are calculated by performing a decompression using the following scheme: First, the exponent E is determined based on the internal exponent flag I_E. If I_E is zero, the exponent E is 0; otherwise the exponent is extracted from the lowest bits of B and T. The stored T field is two bits narrower than the B field, because its two missing bits can be reconstructed from the value of B, depending on a carry-out bit and the exponent. Finally, the base and top address are calculated by inserting B and T into the address, as visualized in Table 2.2. The calculation is done by setting the lowest E bits to zero, inserting T or B in the middle and applying a correction term to the value of a_top. This encoding procedure implies restrictions on the representable address space for top and base addresses. However, the limits of this algorithm are not the focus of this thesis. More research on this topic can be found in [6] and [36].
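
The first decompression steps (determining the exponent and reconstructing the two missing bits of the top field) can be sketched as follows. The sketch assumes a 12-bit stored top field, a 14-bit stored base field and a 6-bit exponent whose upper and lower halves are taken from the lowest three bits of the top and base field, respectively. The exponent split, the implied length bit and all names are assumptions that follow the CHERI Concentrate scheme as described in [36], not code from the VP.

```cpp
#include <cstdint>

// Result of expanding the stored bounds fields (hypothetical names).
struct ExpandedBounds {
    unsigned exponent;  // E
    std::uint32_t B;    // 14-bit base field
    std::uint32_t T;    // 14-bit top field with reconstructed upper bits
};

// ie: internal exponent flag, stored_t: 12-bit T field, stored_b: 14-bit B field.
ExpandedBounds expand_bounds(bool ie, std::uint32_t stored_t, std::uint32_t stored_b) {
    unsigned exponent = 0;
    std::uint32_t b = stored_b & 0x3FFF;
    std::uint32_t t_low = stored_t & 0xFFF;
    std::uint32_t length_msb = 0;

    if (ie) {
        // Assumed split: upper exponent bits in T[2:0], lower ones in B[2:0].
        exponent = ((t_low & 0x7) << 3) | (b & 0x7);
        b &= ~0x7u;      // the exponent bits are not part of the bounds
        t_low &= ~0x7u;
        length_msb = 1;  // assumed: the length MSB is implied with an internal exponent
    }

    // Carry-out of the low 12 bits when the top field wrapped below the base field.
    const std::uint32_t carry = (t_low < (b & 0xFFF)) ? 1u : 0u;
    const std::uint32_t t_high = ((b >> 12) + carry + length_msb) & 0x3;
    return ExpandedBounds{exponent, b, (t_high << 12) | t_low};
}
```

With the internal exponent flag set and the stored fields T = 6 and B = 4, the sketch yields E = 52, B = 0 and T = 4096.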


                            Table 2.2: Decoding of Base and Top Address

address, a =    a_top = a[63 : E + 14]    a_mid = a[E + 13 : E]    a_low = a[E − 1 : 0]
top, t =        a_top + c_t               T[13 : 0]                0
base, b =       a_top + c_b               B[13 : 0]                0
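
The composition shown in Table 2.2 can be sketched in C++ as follows. The way the correction terms are derived (a comparison of the top three bits of the fields against a reference value one below the top three bits of the base field) is an assumption based on the CHERI Concentrate scheme in [36]. The sketch takes the expanded 14-bit fields as input, omits the final wrap-around adjustment of the most significant bit of top, uses hypothetical names, and relies on the unsigned __int128 extension of GCC and Clang to hold the 65-bit top value.

```cpp
#include <cstdint>

__extension__ typedef unsigned __int128 u128;  // GCC/Clang extension for the 65-bit top

struct DecodedBounds {
    std::uint64_t base;
    u128 top;
};

// Region correction (-1, 0 or +1) of a field relative to the address,
// based on a 3-bit comparison against the reference value r3 (assumed scheme).
static int correction(std::uint32_t field3, std::uint32_t addr3, std::uint32_t r3) {
    const int field_hi = field3 < r3 ? 1 : 0;
    const int addr_hi = addr3 < r3 ? 1 : 0;
    return field_hi - addr_hi;
}

// a: 64-bit address, B and T: expanded 14-bit fields, E: exponent (at most 52).
DecodedBounds decode_bounds(std::uint64_t a, std::uint32_t B, std::uint32_t T, unsigned E) {
    const std::uint32_t a3 =
        static_cast<std::uint32_t>((static_cast<u128>(a) >> (E + 11)) & 0x7);
    const std::uint32_t b3 = (B >> 11) & 0x7;
    const std::uint32_t t3 = (T >> 11) & 0x7;
    const std::uint32_t r3 = (b3 - 1) & 0x7;

    const u128 a_top = static_cast<u128>(a) >> (E + 14);
    const u128 mask65 = (static_cast<u128>(1) << 65) - 1;

    // {a_top + c, field, E zero bits}, truncated to 65 bits.
    const u128 base = (((a_top + correction(b3, a3, r3)) << (E + 14)) |
                       (static_cast<u128>(B & 0x3FFF) << E)) & mask65;
    const u128 top = (((a_top + correction(t3, a3, r3)) << (E + 14)) |
                      (static_cast<u128>(T & 0x3FFF) << E)) & mask65;
    return DecodedBounds{static_cast<std::uint64_t>(base), top};
}
```

For E = 52, B = 0, T = 4096 and address 0, the sketch returns base 0 and top 2^64, i.e., bounds that cover the whole 64-bit address space.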

2.4.7 Representing Capabilities

While the CHERI standard defines the contents of a capability, it does not define the exact representation of capabilities in memory or registers. This allows for freedom in the implementation of capabilities, enabling different designs to optimize for specific use cases.

The formal CHERI RISC-V Sail model (see Section 2.4.9) makes use of this freedom by defining two different representations of capabilities. This separation allows for space-efficient storage of capabilities in memory, while still providing a convenient representation that allows for fast manipulation of capabilities in registers.

The first format is the encoded capability, which is exactly CLEN = 2 · XLEN bits in size. The encoded capability does not include the Tag bit, as it is out-of-band. This representation follows all the rules and implications that are specified by the generic CHERI definition and everything explained previously in Section 2.4.5. It matches the visualization of a capability shown in Figure 2.3. As the encoded capability format is used to store capabilities in memory, it is further referred to as the in-memory representation.

The second format is used for capabilities stored in registers and is further referred to as the in-register representation. This representation makes use of a partially decompressed encoding format, which also includes the Tag. This format is not only used to store capabilities in registers, but also for all capability manipulation instructions. The partially decompressed encoding format allows easier access to specific fields of a capability and also accelerates the validation and manipulation of a capability’s bounds. Details on how the in-register representation is implemented in the VP are explained later in Section 3.2.

It is important to understand that the in-register representation is not visible to software; it is only used internally by the VP to represent capabilities in registers and during capability manipulation. This distinction is noteworthy, as some CHERI instructions, such as cgethigh, expect the implementation to return the in-memory representation of a capability, although the instruction operates on registers. This can be seen in the distinction between capToBits and capToMemBits in the CHERI RISC-V Sail model.

2.4.8 Capability Constants

Two special cases of capabilities are defined, serving both illustrative and functional purposes. Beyond their role in conceptual explanation, they are implemented as constants within the VP and utilized during capability resets.

Null Capability

The Null Capability is the capability representation of a NULL pointer. The Null Capability is defined to have the Tag set to zero, its base set to zero, and the top set to the maximum addressable memory. Effectively, NULL is the integer value zero stored as a non-capability value in a capability register. While it is not semantically meaningful to talk about the length of the Null Capability, as it does not reference a memory region, the definition to set it to MAXINT (2^64 − 1) is important for the compression of capabilities. The permissions, the capability mode flag and the reserved bits are set to zero. The object type field is set to its maximum value (0x3FFFF), indicating an unsealed capability. To summarize this, the fields of the Null Capability are shown


             Table 2.3: In-Register Representation of Capability Constants

          Tag  Permissions  Object Type   R  F  I_E  E   B  T     Address
Null      0    0            0x3FFFF (-1)  0  0  1    52  0  4096  0
Infinite  1    0xFFFF       0x3FFFF (-1)  0  0  1    52  0  4096  0

             Table 2.4: In-Memory Representation of Capability Constants

          Permissions  Object Type  R  F  I_E  B  T  Address
Null      0            0            0  0  0    0  0  0
Infinite  0xFFFF       0            0  0  0    0  0  0

in Table 2.3. The table shows the in-register representation, which uses the partially decompressed encoding format mentioned previously.

To ensure that a memory that is reset to zero contains only Null Capabilities, the in-memory representation of the Null Capability must be the all-zero value. To achieve this, each capability is XOR-ed with the Null Capability before it is stored in memory. Conversely, every capability that is loaded from memory is XOR-ed with the Null Capability again. The Null Capability thus acts as a mask which ensures that a memory reset to zero contains only Null Capabilities. To visualize this, the in-memory representation of the Null Capability is shown in Table 2.4. This table uses the fully compressed encoding format, which is shown in Figure 2.3.
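The masking step can be illustrated with a minimal sketch. The type `CapBits`, the functions `toMemory` and `fromMemory`, and the concrete value of `kNullMask` are assumptions made for this example and are not taken from the VP; the sketch further assumes that the permissions occupy the upper 16 bits of the metadata word.

```cpp
#include <cstdint>

// Raw 128-bit capability image without the Tag, split into two 64-bit halves.
struct CapBits {
    uint64_t hi;  // metadata: permissions, object type, bounds
    uint64_t lo;  // address
};

// ASSUMPTION: illustrative encoding of the Null Capability's metadata.
// A real implementation derives this mask from its own encoded Null Capability.
constexpr CapBits kNullMask{0x00001FFFFC018004ULL, 0x0ULL};

// Storing applies the XOR mask, so the Null Capability becomes all zeros in memory.
constexpr CapBits toMemory(CapBits raw) {
    return {raw.hi ^ kNullMask.hi, raw.lo ^ kNullMask.lo};
}

// Loading applies the same mask again, because XOR is its own inverse.
constexpr CapBits fromMemory(CapBits mem) {
    return toMemory(mem);
}
```

Under these assumptions, a capability that differs from the Null Capability only in its permission bits is stored with just those permission bits set, and a zeroed memory location loads back as the Null Capability.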

Infinite Capability

The Infinite Capability, also called Default Capability, can be seen as the opposite of the Null Capability. As its name implies, the Infinite Capability is a capability that can access the entire address space and has all permissions set. It is used as the default value for some special capabilities, such as the Program Counter Capability (PCC), which must be able to access the entire address space at the beginning of a program. The Infinite Capability is defined to have the Tag set to one, its bounds set to cover the entire address space, all of its permissions set to one and the object type set to its maximum value (0x3FFFF), indicating an unsealed capability. The flag for the capability mode and the reserved bits remain zero, as does the address. Because it is XOR-ed with the Null Capability, the in-memory representation of the Infinite Capability results in a value of 0xffff0000000000000000000000000000. The in-register representation of the Infinite Capability is shown in Table 2.3 and its in-memory representation in Table 2.4.

2.4.9 The Existing CHERI Ecosystem

The CTSRD project has published multiple open-source projects that are related to the CHERI architecture. As the entire GitHub page of CTSRD-CHERI1 hosts 314 repositories at the time of writing, this section gives a brief overview of those that are relevant for this thesis.

CHERI RISC-V Sail model

The CHERI RISC-V Sail model [37] is a formal model of the CHERI extension for the RISC-V architecture written in the Sail language [38]. Sail is a language for defining the ISA semantics of processors in a formally precise and executable way, introduced by Gray et al. [39]. Sail is a powerful tool, as it allows for formal verification of the model but can also be used to generate a simulator that allows execution of the model. Its syntax is designed to be readable by engineers familiar with existing vendor documentation. The CHERI RISC-V Sail model is an extension of the RISC-V Sail model [40].

For this thesis, the Sail model was used in two ways. First, it was used as a reference whenever the specification was unclear or lacked details. Second, the Sail model was used during the verification process of the finished implementation. As an executable C emulator can be generated from the Sail model, instructions can be executed and compared to the results of the VP implementation.

1 https://github.com/orgs/CTSRD-CHERI/repositories

LLVM-project

The CHERI LLVM compiler [41] implements the CHERI C and C++ variants of these programming languages. They add support for the protection of language-level resources such as stack and heap allocations and also of sub-language structures such as dynamic program linkage and global variable access [34]. This compiler is used for all CHERI-related software development done in this thesis.

CheriBSD

CheriBSD [42, 43] is a UNIX-like OS based on FreeBSD, which has been extended to support CHERI. CheriBSD enables kernel and user-level memory safety and scalable software compartmentalization by leveraging CHERI’s memory protection capabilities to isolate components [34]. CheriBSD ensures enhanced security measures, dynamic adjustments and reduced vulnerabilities within a compartmentalized environment [44].

QEMU-CHERI

QEMU-CHERI [9] is a fork of the QEMU project that adds support for the CHERI architecture. QEMU [45] is an open-source machine emulator and virtualizer that allows users to run software for one architecture on a different one. QEMU supports full system emulation, which emulates a complete hardware platform including CPU, memory and peripheral devices. QEMU is widely used for testing and debugging of embedded systems and OSs, especially when direct access to target hardware is limited or unavailable. In contrast to SystemC-based VPs, which can provide cycle-approximate simulation of hardware, QEMU focuses on functional emulation optimized for speed. As a result, a QEMU-based model offers only limited insight into timing behavior or detailed hardware interactions. Another drawback of QEMU is the lack of a standardized modeling framework, such as SystemC, which makes the integration of external components or models more cumbersome.

Cheribuild

Cheribuild [46] is a build system implemented as a Python script that automates all the steps required to build various CHERI-related projects and is therefore the starting point for the setup of most other projects. While using this tool is not strictly required, cheribuild provides a repeatable way to set up the rather complex toolchain required to build for CHERI systems and is used to set up all CHERI-related projects used in this thesis. As an example, the CHERI-aware LLVM compiler and the CHERI-aware QEMU fork are built using cheribuild.


CapableVMs

CapableVMs [47] is an Engineering and Physical Sciences Research Council (EPSRC)-funded [48] research project investigating how programming language virtual machines (VMs) can make use of hardware capabilities, such as those found in CHERI. This project provides a repository with CHERI examples that explore the fundamental operations but also interesting corner cases and simple demonstrative applications of CHERI. These examples are used in this thesis as a starting point for CHERI exploration in Section 5.2.

TestRIG

TestRIG [49] is a test framework for RISC-V processors that relies on Random Instruction Generation (RIG). It generates random instruction sequences, runs them on multiple implementations and compares the results. In combination with the Sail model, this is used to verify the correctness of the implementation done in this thesis.

This tool utilizes the standardized formal model of the RISC-V architecture in the Sail language, which gives a human-readable specification. In an ideal world, any RISC-V implementation could formally prove its equivalence to the Sail model. However, proof tools are not yet sufficiently automated to do this task. For this reason, researchers at the University of Cambridge developed TestRIG to check equivalence between Sail and other models. TestRIG performs equivalence checking by generating random instruction sequences, executing them on both implementations and comparing the results. While this approach does not prove equivalence, it can at least demonstrate divergence and is usable in all stages of development. The RISC-V Formal Interface (RVFI) standard [5] is used to observe the change in state after each instruction. During normal program execution, the CPU would fetch an instruction from program memory at an address determined by the PC. This is not feasible here, because the goal is not to test a completed, fabricated chip, but to compare executable formal models with software ISA simulators and simulated hardware executions. Therefore, TestRIG uses Direct Instruction Injection (DII), which ensures that the next executed instruction is always determined by the test harness, regardless of the CPU’s PC or the memory state. The developers of TestRIG claim that this approach is easier to use than unit tests and that test coverage is higher, as random testing replaces developer effort to explore possibilities [50].

[Figure: the Verification Engine (VEngine), which generates instructions and consumes execution traces, is connected via sockets carrying DII and RVFI traffic to RISC-V Implementation A and RISC-V Implementation B]

                             Figure 2.5: TestRIG Architecture

Figure 2.5 is an illustrative example of the TestRIG architecture, showing the interaction between its components. The Verification Engine (VEngine) communicates with multiple (in this use case two) implementations via sockets. The usage of networking sockets allows the VEngine to communicate with the implementations in a platform-independent way. For example, the QuickCheckVEngine is written in Haskell, while the reference implementation is written in Sail (either interpreted by OCaml or compiled into C), and the CHERI-RISC-V VP++ is written in C++. The instructions generated by the VEngine are sent via DII to the implementations, which execute the given instructions and return the results in the form of RVFI traces. These traces are then compared by the VEngine to check for equivalence. If the traces diverge, the VEngine reports the divergence and the test case that caused it.

The QuickCheckVEngine, used as the VEngine for all tests in this thesis, is a basic QuickCheck-based random instruction generator for RISC-V written in Haskell. The VEngine has its instruction set split up into multiple modules, divided by RISC-V extension. By passing the architecture string to the VEngine, this split ensures that the VEngine only generates instructions that are supported by the implementations. Additionally, regex patterns can be used to include or exclude different groups of tests. In total, 44 different groups of tests are defined in TestRIG. Some of these groups require the Device under Test (DuT) to support certain RISC-V extensions, while others are independent of any extension and only test the base RISC-V architecture. Seven of these 44 groups depend only on the CHERI extension, while 10 others require the CHERI extension and additional other extensions, like the atomic extension or the compressed extension.
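The equivalence-checking loop described above can be sketched in a few lines. This is a strongly simplified illustration and not the TestRIG API: `Instruction`, `TraceEntry`, `Implementation` and `findDivergence` are hypothetical names, the trace entry contains only two fields instead of a full RVFI record, and function calls stand in for the socket communication.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>

// Hypothetical stand-ins for a DII packet and a (much reduced) RVFI trace entry.
struct Instruction { uint32_t bits; };
struct TraceEntry {
    uint64_t rd_value;
    bool trapped;
    bool operator==(const TraceEntry& o) const {
        return rd_value == o.rd_value && trapped == o.trapped;
    }
};

// An implementation executes exactly the injected instruction and reports the result.
using Implementation = std::function<TraceEntry(const Instruction&)>;

// Injects `count` random instructions into both implementations and returns the
// index of the first instruction whose traces differ, or -1 if none diverge.
long findDivergence(const Implementation& a, const Implementation& b,
                    std::size_t count, uint32_t seed) {
    std::mt19937 rng(seed);
    for (std::size_t i = 0; i < count; ++i) {
        Instruction insn{static_cast<uint32_t>(rng())};
        if (!(a(insn) == b(insn))) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

As in the description above, a result of -1 only means that no divergence was observed for the generated sequence; it is not a proof of equivalence.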

Chapter 3

Extending RISC-V VP++ with CHERI

With the fundamental concepts of CHERI explained, the focus of the thesis shifts to the extension of an existing RISC-V architecture with CHERI support. As mentioned in the introduction, the goal is to use a VP as a platform to allow for detailed and cycle-approximate evaluation of CHERI and to enable insights into the CHERI architecture that are not possible with existing CHERI simulators. For this purpose, the RISC-V VP++ project, introduced in Section 2.3, serves as the starting point. The following sections describe the modifications required to transform the RISC-V VP++ into the CHERI-enabled variant, further referred to as CHERI-RISC-V VP++. These adaptations are explained primarily from a structural and architectural perspective, highlighting the changes to the components of the VP and their interactions. This approach emphasizes the implications CHERI has on the modified hardware and the effort required for these adaptations, to give an idea of the required additional resources. A more control-flow-oriented perspective, illustrating how CHERI enforces software protection within hardware, is deferred to later chapters.

Although RISC-V VP++ supports both the RV32 and RV64 ISA, this work focuses on the RV64 variant only. While this significantly reduces the implementation effort, the design of the VP and all modifications allows for easy extension to the RV32 ISA in the future.

3.1 Overview

Figure 3.1 shows an overview of the architecture of CHERI-RISC-V VP++. The image is based on the architecture diagram of the original RISC-V VP++, shown in Figure 2.1. The color coding in this version of the image highlights the changes required to support the CHERI extension, with parts that are entirely new in green. The components marked in orange were already present in the original VP but required some modifications to support CHERI. Blue components are unchanged from the original VP. As this image indicates, modifying an existing architecture to support CHERI affects multiple components throughout the architecture. However, two major areas of change can be identified. The biggest changes are to the Instruction Set Simulator (ISS), which is expected, as this is the part of the architecture where instructions are decoded and executed. The second category of changes relates to the memory layout, which involves changes in the memory module, as well as the memory interface, but also the TLM Bus, which must provide transactions supporting the transport of capabilities.

3.2 Implementing Capabilities

As explained in Section 2.4, capabilities are the core component of the CHERI architecture, and therefore the starting point of the CHERI implementation. In Figure 3.1 the extension of general purpose registers and CSRs to capabilities is shown by the addition of the green boxes above them. As briefly mentioned in Section 2.4.7, the actual representation of capabilities in an architecture is not clearly defined. To allow for a comparable implementation, the VP makes use of the same representation as the Sail model and therefore two different capability structures are defined.

[Figure: block diagram of the VP. The ISS (RV64 Core) contains the Program Counter, the General Purpose Registers and the Control & Status Registers, each extended with Capabilities, as well as the SCRs, the Capability Opcodes, Decode/Interpret/Execute, Instruction Execution, LSCache, DBBCache, DMem IF, IMem IF, Tagged IF, DMI Access, MMU and TLB. The TLM 2.0 Bus with TLM Transactions and Memory Map connects the ISS to the Tagged Memory with its Tags, the CLINT (Timer/SW Interrupts), the PLIC (Ext. Interrupts) and the Peripherals (UART, Mass Storage, Framebuffer, Keyboard, Mouse, Network, ...). Legend: Unmodified, Modified, New]

                             Figure 3.1: Architecture of CHERI-RISC-V VP++

```cpp
// The field widths (cEncCapPermsWidth, cCapOTypeWidth, ...) are constants defined elsewhere in the VP.
struct EncCapability {
    uint32_t perms      : cEncCapPermsWidth;
    uint32_t otype      : cCapOTypeWidth;
    uint8_t  reserved   : cCapReservedWidth;
    bool     flags      : cCapFlagsWidth;
    bool     internal_E : 1;
    uint16_t T          : cCapMantissaWidth - 2;
    uint16_t B          : cCapMantissaWidth;
    int64_t  address    : cCapAddrWidth;
};
```

               Listing 3.1: Encoded capability structure in the CHERI-RISC-V VP++

The first is the fully encoded format, which does not include the Tag. This format is used for the representation of capabilities in memory and in transactions on the TLM bus. Listing 3.1 shows the memory layout of the encoded capability. This structure matches the visualization of a capability shown in Figure 2.3, except for the omitted Tag.

The second structure is the partially decompressed format, which includes the Tag. This format is used for all capability manipulations, and the registers in the VP store objects of this type. In addition, this structure provides various methods to extract and manipulate fields of the capability. Using this partially decoded bounds representation improves the performance of the VP’s implementation, as bounds checks can be performed more efficiently, and the compression/decompression algorithm does not need to be executed when capabilities are transferred between registers. Because the whole structure definition including the access methods is quite lengthy, only its data members are listed in Listing 3.2, to allow for a comparison with the encoded structure type.

```cpp
struct Capability {
    uint8_t  uperms                 : cCapUPermsWidth;
    bool     permit_set_CID         : 1;
    bool     access_system_regs     : 1;
    bool     permit_unseal          : 1;
    bool     permit_cinvoke         : 1;
    bool     permit_seal            : 1;
    bool     permit_store_local_cap : 1;
    bool     permit_store_cap       : 1;
    bool     permit_load_cap        : 1;
    bool     permit_store           : 1;
    bool     permit_load            : 1;
    bool     permit_execute         : 1;
    bool     global                 : 1;
    uint8_t  reserved               : cCapReservedWidth;
    bool     flag_cap_mode          : 1;
    bool     internal_E             : 1;
    uint8_t  E                      : cCapEWidth;
    uint16_t B                      : cCapMantissaWidth;
    uint16_t T                      : cCapMantissaWidth;
    uint32_t otype                  : cCapOTypeWidth;
    int64_t  address;
    bool     tag                    : 1;
};
```

  Listing 3.2: (Partially) decompressed capability structure in the CHERI-RISC-V VP++

Two major aspects can be observed in this structure. First, the permissions are stored as a bit field with individual single-bit elements, which allows accessing individual permissions by name. However, this does not increase the size, as the total size of the permissions is still cEncCapPermsWidth = 16 bits. Second, the bounds are stored differently. Both the base and the top are now cCapMantissaWidth = 14 bits in size. Furthermore, an additional field E of size cCapEWidth = 6 bits is added. This field stores the exponent of the capability separately, so the exponent is no longer contained inside the base and top values. This format represents a partially decompressed variant. To understand its origin, the compression algorithm for CHERI’s bounds, as explained in Section 2.4.6, must be examined. With this in mind, one can understand what happens in the conversion from the encoded capability to the partially decompressed variant. This decompression performs the first steps of the bounds decoding:

- E is extracted from B and T, depending on I_E
- T is extended to cCapMantissaWidth bits
- The three lowest bits of B and T are zeroed if I_E is set
- The two highest bits of T are calculated based on B and a carry-out bit

While this representation may not be practical for hardware implementations, it does not alter the semantics within the ISS, and the VP remains functionally equivalent to real hardware. Nevertheless, employing such a format in hardware could be a viable design choice if performance is prioritized over representation compactness, which is a classic size-speed trade-off.
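The four steps listed above can be sketched as follows. The names `EncodedBounds`, `DecodedBounds` and `decodeBounds` are hypothetical and do not reflect the VP’s code. The extra length bit `lenMsb` that is added when I_E is set is not mentioned in the list; it is an assumption based on the CHERI Concentrate scheme and should be checked against the Sail model.

```cpp
#include <cstdint>

struct EncodedBounds { bool internal_E; uint16_t T; uint16_t B; };  // T: 12 bits, B: 14 bits
struct DecodedBounds { uint8_t E; uint16_t B; uint16_t T; };        // B and T: 14 bits each

DecodedBounds decodeBounds(EncodedBounds enc) {
    uint16_t t = enc.T & 0x0FFF;
    uint16_t b = enc.B & 0x3FFF;
    uint8_t e = 0;
    unsigned lenMsb = 0;  // ASSUMPTION: implied length bit, set only with an internal exponent
    if (enc.internal_E) {
        // Step 1: the exponent is taken from the low three bits of T (upper half) and B (lower half).
        e = static_cast<uint8_t>(((t & 0x7) << 3) | (b & 0x7));
        // Step 3: the bits that carried the exponent are zeroed.
        t = static_cast<uint16_t>(t & 0x0FF8);
        b = static_cast<uint16_t>(b & 0x3FF8);
        lenMsb = 1;
    }
    // Step 4: the two top bits of T follow from B and a carry-out of the lower 12 bits.
    const unsigned carry = (t < (b & 0x0FFF)) ? 1u : 0u;
    const unsigned top2 = ((b >> 12) + lenMsb + carry) & 0x3;
    // Step 2: T is extended to the full 14-bit mantissa width.
    return {e, b, static_cast<uint16_t>((top2 << 12) | t)};
}
```

With the assumed input I_E = 1, T = 0b110 and B = 0b100, the sketch yields E = 52, B = 0 and T = 4096, which agrees with the in-register constants in Table 2.3.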

3.3 Extending the Instruction Set Simulator

The Instruction Set Simulator (ISS), located at the top of Figure 3.1, is a key component of the VP, and is responsible for decoding and executing instructions. The ISS also holds all the registers of the CPU, including general purpose registers, special registers, like the PC and status registers, but also floating point and vector registers. Additionally, it features interfaces to the memory controller and for system calls. It is also the ISS that is responsible for interrupt and trap handling. Due to its central role in instruction execution and system integration, the ISS often contains the most complexity of an architecture. Therefore, it is unsurprising that the majority of changes needed to support CHERI primarily impact the ISS. In Figure 3.1 the most important components of the ISS are shown. The following sections will explain the changes that are done to the ISS to support CHERI.


3.3.1 General Purpose Registers

The general purpose registers of the VP are held in a single structure called RegFile. This structure contains a simple array with 32 entries, each holding a 64-bit unsigned integer, serving as the general-purpose registers. Additionally, this structure provides a few methods for accessing and manipulating those registers. CHERI proposes two variants of adding capabilities to those registers. The first variant is to retain the existing integer registers as-is and add a separate capability register file to hold capabilities. The other approach is to create a ”merged” register file that extends all integer registers to the size of a capability. While the first approach is used in CHERI-MIPS, the ”merged” register file is the recommended variant for CHERI-RISC-V.

Therefore, the array in the RegFile is extended to hold capabilities instead of integers. The extension is done similarly to extending 32-bit registers to 64-bit, which means that it is still possible to perform operations on the smaller value, despite the actual register size being larger. The extension implies another constraint: if a register is used to store an integer, the remaining, untouched bits of the register must be zeroed. For the extension of 32-bit to 64-bit this is done implicitly by zero or sign extension. For CHERI, it must be ensured that this zeroing includes the Tag. In software this is solved by defining the capability object in such a way that it implicitly zeroes the remaining bits when an integer value is assigned to it. The implicit zeroing allows the rest of the code in RegFile to remain conceptually the same as before. Accessing the registers also remains unchanged, and existing instructions that do not act on capabilities still work as before. In short, this means that with CHERI every register is capable of holding a capability. Whether it is interpreted as such depends on the instruction that is executed.
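The implicit zeroing on integer assignment can be sketched as follows. The types `Cap` and `RegFileSketch` are hypothetical, reduced stand-ins and not the VP’s actual Capability and RegFile classes; in this sketch, "zeroing" is modeled as resetting the metadata to the defaults of the Null Capability.

```cpp
#include <array>
#include <cstdint>

// Hypothetical, reduced capability type; the real structure has more fields.
struct Cap {
    uint16_t perms = 0;
    uint32_t otype = 0x3FFFF;  // unsealed
    uint64_t address = 0;
    bool tag = false;

    // Assigning an integer resets all metadata to the Null Capability defaults,
    // which also clears the Tag, and then stores the value in the address field.
    Cap& operator=(uint64_t value) {
        *this = Cap{};
        address = value;
        return *this;
    }
};

struct RegFileSketch {
    std::array<Cap, 32> regs{};

    // Integer view: legacy instructions keep using these accessors unchanged.
    uint64_t read(unsigned idx) const { return regs[idx].address; }
    void write(unsigned idx, uint64_t value) {
        if (idx != 0) regs[idx] = value;  // x0 stays hard-wired to zero
    }
    // Capability view: used only by capability-aware instructions.
    void writeCap(unsigned idx, const Cap& c) {
        if (idx != 0) regs[idx] = c;
    }
};
```

A register that previously held a tagged capability therefore loses its Tag and permissions as soon as a legacy instruction writes an integer result to it.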

3.3.2 Special Capability Registers

In RISC-V, Control and Status Registers (CSRs) are special purpose registers used to control and monitor the operation of the processor. They provide features for managing execution context and enforcing system policies in both user and supervisor modes. CHERI extends those existing CSRs to capabilities, naming them Special Capability Registers (SCRs). Additionally, some new registers are introduced. To distinguish between a default CSR and its capability variant, the SCRs are also renamed. To give an example, the existing CSR uepc (User Exception Program Counter) becomes uepcc (User Exception Program Counter Capability). In the VP, the capability is implemented as a wrapper around the existing CSR. The wrapper ensures that instructions can access CSRs as before, but also allows the new capability-aware instructions to manipulate the corresponding special capability register. Access to those registers might be controlled individually by the CHERI-specific permission permit_access_system_registers, but also by other control sources like the current exception handling state.

It is important to note that, in contrast to general purpose registers, the special registers are not zeroed when they are accessed by an instruction that does not act on capabilities. This design decision ensures intentional use of CHERI-aware manipulation of these registers. This choice allows for fewer modifications in the existing architecture, and is also mentioned by Watson et al. [6] in Section 3.4.6. Additionally, unlike the general purpose registers, the SCRs are not necessarily reset to the Null Capability. Instead, some of them default to the Infinite Capability on reset. Not all CSRs are extended to capabilities, as some of them are not relevant for CHERI or do not require a capability variant. To summarize the modified CSRs, Table 3.1 lists all SCRs and indicates if they are based on an existing CSR or are a new register introduced by CHERI.
The Modes field indicates the privilege modes that are allowed to access the register, where U, S and M stand for User, Supervisor and Machine mode respectively. Access indicates additional restrictions on accessing the register, where RO means read-only, ASR refers to the access system registers permission and - means no restrictions. The Reset field indicates whether the register is initialized to the Infinite Capability (∞) or the Null Capability (∅) on reset. The last column (Extends) indicates the CSR that is extended by the SCR. If the SCR is a new register, this field contains -.

 Table 3.1: CHERI Special Capability Registers

Register Name                               Modes    Access  Reset  Extends
Program counter capability (PCC)            U, S, M  RO      ∞      PC
Default data capability (DDC)               U, S, M  -       ∞      -
User trap code capability (UTCC)            U, S, M  ASR     ∞      utvec
User trap data capability (UTDC)            U, S, M  ASR     ∅      -
User scratch capability (UScratchC)         U, S, M  ASR     ∅      uscratch
User exception PC capability (UEPCC)        U, S, M  ASR     ∞      uepc
Supervisor trap code capability (STCC)      S, M     ASR     ∞      stvec
Supervisor trap data capability (STDC)      S, M     ASR     ∅      -
Supervisor scratch capability (SScratchC)   S, M     ASR     ∅      sscratch
Supervisor exception PC capability (SEPCC)  S, M     ASR     ∞      sepc
Machine trap code capability (MTCC)         M        ASR     ∞      mtvec
Machine trap data capability (MTDC)         M        ASR     ∅      -
Machine scratch capability (MScratchC)      M        ASR     ∅      mscratch
Machine exception PC capability (MEPCC)     M        ASR     ∞      mepc

In addition, CHERI adds three CSRs (uccsr, sccsr, mccsr) that do not require capabilities and are implemented as integer registers only. These are used for compartmentalization purposes and are not further relevant for this thesis. The following sections give a few more insights on the SCRs that are introduced by CHERI.

Program Counter Capability

The Program Counter Capability (PCC) is an extension of the existing PC register to a full capability. Note that in Figure 3.1, PCC is shown separately from the other Special Capability Registers (SCRs). This visualization reflects the implementation in the VP, where PCC is kept separate from the other CSRs as well, because it is used so frequently. Functionally this does not make a difference. This capability allows for various validity, permission and bounds checks on instruction fetches. The actual PC is still a 64-bit integer, which is held in the address field of the PCC. The capability again acts as a wrapper around the PC. Because an integer manipulation of the PC does not affect the other fields of its capability, the existing PC manipulations during the default operation cycle in the ISS do not require changes. The reset value of PCC is the Infinite Capability, as no program would ever be able to fetch its first instruction if PCC was the Null Capability.
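A fetch check against PCC could look like the following minimal sketch. The structure `Pcc`, the function `checkFetch`, the order of the checks and the use of fully decoded 64-bit bounds are assumptions made for this example; the VP works on the partially decompressed representation, and a top covering the whole address space would need 65 bits.

```cpp
#include <cstdint>

// Hypothetical, reduced view of PCC with already decoded bounds [base, top).
struct Pcc {
    bool tag;
    bool permit_execute;
    uint64_t base;
    uint64_t top;      // exclusive; simplified to 64 bits in this sketch
    uint64_t address;  // the integer PC
};

enum class FetchResult { Ok, TagViolation, PermitExecuteViolation, BoundsViolation };

// Checks whether an instruction of `insn_bytes` bytes may be fetched at the current PC.
FetchResult checkFetch(const Pcc& pcc, unsigned insn_bytes) {
    if (!pcc.tag) return FetchResult::TagViolation;
    if (!pcc.permit_execute) return FetchResult::PermitExecuteViolation;
    if (pcc.address < pcc.base || pcc.address > pcc.top ||
        pcc.top - pcc.address < insn_bytes) {
        return FetchResult::BoundsViolation;
    }
    return FetchResult::Ok;
}
```

Advancing the PC only changes `address` (for example `pcc.address += 4`), so the Tag, the permissions and the bounds stay untouched and are simply checked again on the next fetch.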

Default Data Capability

The Default Data Capability (DDC) is a new register introduced by CHERI. By providing both an offset and bounds for non-capability-aware loads and stores, the DDC allows the capability mechanism to constrain legacy instructions. This plays a crucial role in hybrid mode, where CHERI-aware and legacy code can run side by side. Upon reset, the DDC defaults to the Infinite Capability. Applications or the OS are responsible for reducing its permissions as needed.
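How the DDC can constrain a legacy load is sketched below. The names `Ddc` and `legacyLoadAddress` are hypothetical, the bounds are assumed to be fully decoded, and the sketch assumes that the integer address of the legacy instruction is added to the address of the DDC; the exact rules should be taken from the CHERI specification.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical, reduced view of DDC with already decoded bounds [base, top).
struct Ddc {
    bool tag;
    bool permit_load;
    uint64_t base;
    uint64_t top;      // exclusive; simplified to 64 bits in this sketch
    uint64_t address;  // acts as offset for legacy accesses
};

// Returns the effective address of a legacy load of `size` bytes,
// or throws if the access is not authorized by the DDC.
uint64_t legacyLoadAddress(const Ddc& ddc, uint64_t int_addr, unsigned size) {
    if (!ddc.tag) throw std::runtime_error("DDC tag violation");
    if (!ddc.permit_load) throw std::runtime_error("DDC permit-load violation");
    const uint64_t ea = ddc.address + int_addr;  // DDC provides the offset
    if (ea < ddc.base || ea > ddc.top || ddc.top - ea < size) {
        throw std::runtime_error("DDC bounds violation");
    }
    return ea;
}
```

With the Infinite Capability in the DDC, every legacy access passes these checks, which corresponds to the reset behavior described above; restricting the DDC narrows the memory that legacy code can reach.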


Trap Data Capabilities

CHERI-RISC-V also introduces additional trap data capabilities for each privilege level called utdc, stdc and mtdc. Those are planned to hold additional trap data that might be required for trap handling with the CHERI extension in the future. They are intended to work in addition to the existing trap vector registers, which are also extended to capabilities. However, they are not actually used in the latest version.

3.3.3 Instruction Decoding

As the CHERI extension adds new instructions to the RISC-V ISA, the ISS must be extended to support these new instructions. A good starting point for the extension is the instruction decoder, which is responsible for determining the type of instruction that is currently being executed. In total, 99 instructions are added to the instruction decoder, not including the compressed instructions. The instructions can be divided into the following categories:

- 10 Capability Inspection Instructions: Used to access various fields of a capability, like the bounds, permissions or the object type.
- 17 Capability Modification Instructions: Used to modify fields of a capability, like the bounds or permissions. The object type can be changed by sealing or unsealing instructions.
- 3 Pointer Arithmetic Instructions: Used to create a capability from a pointer or vice versa.
- 2 Pointer Comparison Instructions: Used to compare capabilities, returning an integer value as result.
- 3 Control-Flow Instructions: Jump-like capability instructions that manipulate the PC.
- 1 Special Capability Register Access Instruction: Used to access SCRs.
- 2 Fast Register-Clearing Instructions: Used to clear multiple registers at once.
- 2 Adjusting to Compressed Capability Precision Instructions: Used to change the bounds of a capability to the next representable value based on the compression format.
- 2 Tagged-Memory Access Instructions: Used to load or clear tags of memory locations.
- 26 Memory Loads with Explicit Address Type Instructions: Used to load data from memory into registers.
- 20 Memory Stores with Explicit Address Type Instructions: Used to store data from registers into memory.
- 2 Memory-Access Instructions via Capability with Offset Instructions: Used to load or store a capability from or to memory.
- 7 Atomic Memory-Access Instructions via Capability with Offset Instructions: Added to support atomic memory operations with fine-grained capability bounds.
- 2 Deprecated Instructions: Removed from the standard but still used by TestRIG and therefore implemented in the VP.



Except for the last three categories, all instructions use the opcode 0x5B. All register-register operations use the RISC-V R-type or I-type encoding format, which are shown in Figure 2.2. The instructions are further distinguished by the funct3 field (bits 14–12) and, secondarily, by the funct7 field (bits 31–25). For instructions that do not require all three register fields (rs1, rs2, rd), the unused fields are used for additional instruction decoding. A more detailed overview of the instructions can be found in Appendix B of [6].

As an example, the Memory-Load instructions require only one source register and a destination register. Therefore, all Memory-Load instructions share the same opcode, funct3 and funct7 field, as the excerpt of those instructions in Table 3.2 illustrates. The value of the rs2 field then further distinguishes the byte size and type (signed, unsigned) of the load operation.

This format allows for an easy extension of the VP’s instruction decoder. As explained in Section 2.3, a switch-case structure is used to decode the instructions, which is extended with the new CHERI instructions. In addition, the opId-Table, containing the values returned by the instruction decoder, is extended with the new CHERI instructions.

Table 3.2: Encoding of some Memory-Load Instructions with Explicit Address Type

funct7  rs2    rs1  funct3                    rd  opcode  Instruction
 0x7d   0x00   rs1   0x00                     rd   0x5b      LB.DDC
 0x7d   0x01   rs1   0x00                     rd   0x5b      LH.DDC
 0x7d   0x08   cs1   0x00                     rd   0x5b      LB.CAP
 0x7d   0x09   cs1   0x00                     rd   0x5b      LH.CAP
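
The decoding scheme behind Table 3.2 can be sketched as follows. This is a minimal illustration and not the VP's actual decoder: the names LoadOp, bits, decode_explicit_load and encode_r are hypothetical, and only the four encodings listed in Table 3.2 are handled.

```cpp
#include <cstdint>

// Hypothetical operation identifiers; the VP's real opId table differs.
enum class LoadOp { LB_DDC, LH_DDC, LB_CAP, LH_CAP, Unknown };

// Extract the bit field instr[hi:lo] (only used for fields narrower than 32 bits).
constexpr uint32_t bits(uint32_t instr, unsigned hi, unsigned lo) {
    return (instr >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

// Decode the loads from Table 3.2: opcode, funct3 and funct7 are shared,
// so the rs2 field alone selects the concrete operation.
constexpr LoadOp decode_explicit_load(uint32_t instr) {
    if (bits(instr, 6, 0) != 0x5b) return LoadOp::Unknown;    // opcode
    if (bits(instr, 14, 12) != 0x0) return LoadOp::Unknown;   // funct3
    if (bits(instr, 31, 25) != 0x7d) return LoadOp::Unknown;  // funct7
    switch (bits(instr, 24, 20)) {                            // rs2 field
        case 0x00: return LoadOp::LB_DDC;
        case 0x01: return LoadOp::LH_DDC;
        case 0x08: return LoadOp::LB_CAP;
        case 0x09: return LoadOp::LH_CAP;
        default:   return LoadOp::Unknown;
    }
}

// Assemble an R-type instruction word from its fields (helper for experiments).
constexpr uint32_t encode_r(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                            uint32_t funct3, uint32_t rd, uint32_t opcode) {
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode;
}
```

Because the rs2 field is not needed as a register operand here, it acts as a sub-opcode, which keeps the extension of an existing switch-case decoder local to one additional case.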

Compressed Instructions

As compressed instructions are supported by the VP, the instruction decoder must also be extended to support the new CHERI instructions in compressed format. Decoding compressed instructions requires additional effort because CHERI repurposes some existing opcodes. CHERI follows the same pattern that RV64C and RV128C use, by repurposing compressed floating-point loads and stores to act on capabilities instead. This ambiguity requires the instruction decoder to be aware of the current instruction mode, which is determined by the flag_cap_mode field of PCC. If this flag is set, the instruction decoder interprets the compressed instruction as an operation on capabilities. Otherwise, it is interpreted as a regular compressed instruction and the corresponding floating-point instruction is returned. In total, 16 compressed instructions depend on the current instruction mode for RV64C. As an example, one of the mode-dependent instructions is the compressed double-precision store instruction c.fsd fs2, offset(rs1), which becomes the instruction c.csc cs2, offset(cs1) in capability mode. c.csc is the compressed capability store instruction, which stores the capability in register cs2 to memory at address cs1.address + offset.
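
The mode-dependent decoding of the c.fsd/c.csc pair can be sketched as follows. This is only an illustration under stated assumptions: the enum COp and the function decode_compressed_store are hypothetical names, the cap_mode parameter stands in for the flag_cap_mode field of PCC, and all other compressed instructions are lumped into a single placeholder value.

```cpp
#include <cstdint>

// Hypothetical result type for this sketch.
enum class COp { C_FSD, C_CSC, Other };

// In the standard RVC encoding, c.fsd lies in quadrant 0 (bits 1:0 == 0b00)
// with funct3 (bits 15:13) == 0b101. In capability mode, the same bit
// pattern is interpreted as the capability store c.csc.
inline COp decode_compressed_store(uint16_t instr, bool cap_mode) {
    const unsigned quadrant = instr & 0x3u;
    const unsigned funct3 = (instr >> 13) & 0x7u;
    if (quadrant == 0x0 && funct3 == 0x5) {
        return cap_mode ? COp::C_CSC : COp::C_FSD;
    }
    return COp::Other;
}
```

The bit pattern itself carries no information about the mode, so the decoder needs the mode as an explicit input in addition to the instruction word.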

3.3.4 Instruction Execution

As already mentioned in Section 2.3, after the instruction is decoded, the execution is handled in the exec_steps function of the ISS. This section describes the changes made to the instruction execution block to support CHERI and is split into two parts. The first part describes the implementation of new CHERI instructions, while the second part explains the changes made to existing instructions to support CHERI.


CHERI Instructions

Instructions that are added by CHERI are implemented by adding new cases to the existing switch-case structure in the exec_steps function. Each instruction's implementation is based on its specification in [6]. In addition, the Sail model acts as a reference for the implementation whenever CHERI ISAv9 is ambiguous or incomplete. While simpler instructions are straightforward to translate from Sail code to C++, for more complex instructions it makes sense to write the implementation from scratch, as the use of C++-specific features allows for more performant and more readable code. Accordingly, the first iterations of the implementation followed the Sail model very closely, whereas the final implementation is more optimized.

Changes to Existing Instructions

Besides the new instructions, CHERI also affects the behavior of existing instructions. This starts at the instruction fetch stage, where the PCC is used to authorize the fetch operation. To allow fine-grained CHERI checks, the fetching process is split into two memory loads of two bytes each. Before the lower bytes of the instruction are loaded, the PCC is validated. The following checks are performed:

• Tag of PCC must be valid (1)
• PCC must be unsealed (type = -1)
• PCC must permit execution
• The base address of PCC must be properly aligned
• The fetch address must be within the bounds of PCC

If these checks pass, the instruction is fetched from the memory address. If the loaded instruction is a compressed instruction, this is the end of the fetching process. Otherwise, the boundary check is applied a second time before the next two bytes are fetched from memory at address pc + 2. This time only the boundary check is performed, as the remaining checks were already done for the first fetch and their result cannot change between the two fetches. Finally, the instruction is formed by concatenating the two fetched 16-bit words, resulting in one 32-bit instruction word.

The instruction fetch is not the only part affected by CHERI. The behavior of some existing instructions changes depending on the current instruction mode, specified by the flag_cap_mode of PCC. All load and store instructions become mode dependent. If the instruction is executed in capability mode, the source register acts as the authorizing capability. In integer pointer mode, the DDC is used instead. The actual behavior of load and store instructions is explained in Section 3.4.2, which covers the memory interface in more detail. The auipc (Add Upper Immediate to PC) instruction is also affected: in capability mode, it creates a capability instead of a regular integer value.
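
The two-step instruction fetch described above can be sketched as follows. This is a simplified illustration, not the VP's code: the types Pcc and FetchFault and the helper functions are hypothetical, memory is modeled as a plain byte vector, and the alignment rule (2-byte alignment of the PCC base, because compressed instructions are supported) is an assumption of this sketch.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Simplified stand-in for PCC; the VP's Capability type is more detailed.
struct Pcc {
    bool tag;
    bool sealed;
    bool permit_execute;
    uint64_t base;
    uint64_t top;  // exclusive upper bound
};

// Hypothetical exception type used to signal a failed fetch check.
struct FetchFault : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Bounds check for one 2-byte parcel starting at addr.
inline void check_bounds(const Pcc& pcc, uint64_t addr) {
    if (addr < pcc.base || addr + 2 > pcc.top) throw FetchFault("length violation");
}

// Little-endian 16-bit load from the byte vector.
inline uint16_t load_half(const std::vector<uint8_t>& mem, uint64_t addr) {
    return static_cast<uint16_t>(mem.at(addr) | (mem.at(addr + 1) << 8));
}

// Fetch one instruction at pc: all checks before the first parcel,
// only the bounds check before the second parcel.
inline uint32_t fetch(const std::vector<uint8_t>& mem, const Pcc& pcc, uint64_t pc) {
    if (!pcc.tag) throw FetchFault("tag violation");
    if (pcc.sealed) throw FetchFault("seal violation");
    if (!pcc.permit_execute) throw FetchFault("permit-execute violation");
    if (pcc.base % 2 != 0) throw FetchFault("misaligned PCC base");
    check_bounds(pcc, pc);
    const uint32_t low = load_half(mem, pc);
    if ((low & 0x3u) != 0x3u) return low;  // compressed: one parcel is enough
    check_bounds(pcc, pc + 2);             // second parcel: bounds only
    const uint32_t high = load_half(mem, pc + 2);
    return (high << 16) | low;
}
```

Splitting the fetch this way means that a 32-bit instruction straddling the upper bound of PCC is rejected, while a compressed instruction occupying the last two bytes within the bounds is still accepted.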
Jump instructions are also affected, as their target address must be validated against PCC before the jump is performed. Additionally, in capability mode the source register is interpreted as the authorizing capability, which is validated before the jump. The authorizing capability must have a valid Tag, must either be unsealed or a sentry capability, must permit execution, and the desired new PC address must be within its bounds. In addition, all CSR access instructions perform an additional check on the access_system_registers flag of the PCC. Checking this flag ensures that low-privileged code cannot interfere with key system management functionality. However, not all registers require this permission. Instead, a whitelist approach is adopted: reading or writing any CSR requires the access_system_registers permission, except for the CSRs on the whitelist. Currently, the whitelist contains the registers shown in Table 3.3, which can be accessed without the permission.


     Table 3.3: CSR Access Whitelist in CHERI

 CSR            Access      Description
 cycle(h)       Read-Only   Counts the number of clock cycles since the core was started.
                            The high-word version (cycleh) provides the upper 32 bits on
                            RV32.
 time(h)        Read-Only   Timer value from a platform-specific real-time clock. timeh is
                            the high 32 bits on RV32.
 instret(h)     Read-Only   Counts the number of retired instructions. instreth is the
                            upper half for RV32.
 hpmcounter(h)  Read-Only   Hardware performance counters. The number of these counters is
                            platform specific.
 fflags         Read/Write  Floating-point exception flags indicating invalid operations,
                            overflows, etc.
 frm            Read/Write  Floating-point rounding mode (e.g., round to nearest, toward
                            zero, etc.).
 fcsr           Read/Write  Combined floating-point control/status register; includes
                            fflags and frm.
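
The whitelist check can be sketched as follows. The function names csr_is_whitelisted and csr_access_allowed are hypothetical and not taken from the VP. The CSR addresses follow the standard RISC-V CSR address map, and the parameter pcc_has_asr stands in for the access_system_registers permission of the PCC.

```cpp
#include <cstdint>

// True if the CSR at addr may be accessed without the
// access_system_registers permission (see Table 3.3).
constexpr bool csr_is_whitelisted(uint16_t addr) {
    // fflags (0x001), frm (0x002), fcsr (0x003)
    if (addr >= 0x001 && addr <= 0x003) return true;
    // cycle (0xC00), time (0xC01), instret (0xC02), hpmcounter3..31 (0xC03..0xC1F)
    if (addr >= 0xC00 && addr <= 0xC1F) return true;
    // cycleh, timeh, instreth, hpmcounter3h..31h (0xC80..0xC9F), used on RV32
    if (addr >= 0xC80 && addr <= 0xC9F) return true;
    return false;
}

// Hypothetical permission check performed before any CSR access.
constexpr bool csr_access_allowed(uint16_t addr, bool pcc_has_asr) {
    return pcc_has_asr || csr_is_whitelisted(addr);
}
```

For example, reading cycle (0xC00) is allowed without the permission, whereas accessing satp (0x180) is only allowed if the PCC carries it.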

3.3.5 Interrupt and Trap Handling

The trap handling in the ISS is implemented using C++ exceptions. A try-catch block is used to catch exceptions that are thrown during the execution of instructions. The usage of exceptions allows for a clean separation of the normal execution flow and the trap handling code. If an instruction causes a trap, it raises a SimulationTrap exception, which contains an exception code and the current value of mtval. If an exception occurs, the ISS switches to trap handling mode. First, the method prepare_trap is called, which configures CSRs depending on the current privilege level. The preparation phase is unmodified and does not require any changes to support CHERI.

The next step is called switch_to_trap_handler, where some modifications are required. In this step the CSR Machine Exception Program Counter (mepc) is set to the current PC. mepc is used to store the PC before the trap to allow returning (e.g., using mret) later. With CHERI, mepc is extended to the Machine Exception Program Counter Capability (mepcc) and is now a capability, which means that the entire PCC must be written to the register, not only the integer PC. The PC is then set to the trap vector base address register mtvec/stvec/utvec, depending on the privilege level. Because the trap vectors are also extended to capabilities, instead of only setting the integer PC, the entire PCC must be set to mtcc/stcc/utcc. The lowest two bits of the trap vector are used to determine the trap mode. Therefore, the actual address of the trap vector is calculated by shifting the trap vector base address by two bits. This still applies to the capability-extended variants of the trap vector registers. The trap is then handled by executing the instruction currently pointed to by the PCC.

When trap handling is done, the instruction mret/sret/uret is executed. These instructions call the method return_from_trap_handler in the ISS, which must again be modified to set the PCC instead of the integer PC.
However, this requires a separate preparation step for the return target that is not required in the RISC-V base ISA. In this step the Exception Program Counter Capability (epcc) is legalized before it is written to the PCC. If the otype of the legalized epcc is sentry, the PCC must be unsealed.

As previously mentioned and further explained in Section 3.4.2, CHERI applies additional checks on various operations, like memory accesses or capability manipulations. These checks must all result in a trap if they fail. CHERI adds three new exception codes that are reported in the RISC-V xcause register (e.g., mcause), while xtval (e.g., mtval) is used to report additional information about the exception. The three new exception codes are:

• CheriLoadFault (0x1A): Is reported when a load attempts to fetch a capability through a valid page table entry that forbids loading capabilities. This fault otherwise behaves like a RISC-V load page fault.
• CheriStoreFault (0x1B): Is reported when a store attempts to write a capability through a valid page table entry that forbids storing capabilities. This fault otherwise behaves like a RISC-V store/AMO page fault.
• CheriFault (0x1C): Is reported when other capability-related exceptions (e.g., a tag violation) occur. Additional CHERI-specific information is reported in the xtval CSR.

In the actual implementation, a function named handle_cheri_exception is called if any CHERI-related check fails. This function optionally prints a detailed error message and calls the already existing raise_trap function, passing the corresponding exception code and xtval. The format of xtval for capability exceptions is defined in the specification as shown in Figure 3.2 and is calculated accordingly before calling raise_trap. For the cause field in xtval, CHERI defines detailed exception codes, which are used to distinguish between different CHERI exceptions. The new exception codes are listed and briefly explained in Table 3.4.

The usage of this additional CHERI exception handling function allows not only for detailed error logging, but also for disabling exceptions caused by CHERI with a single macro at compile time. The macro can be utilized for testing purposes, but also when the VP is used in a non-CHERI-aware mode, to make sure that no CHERI-related exceptions are raised in case the actual checks are not disabled due to some mistake. The rest of the interrupt handling does not require any additional modifications to support CHERI.

    Bits     31–11    10–5       4–0
    Field    WPRI     cap idx    cause

    Figure 3.2: Format of xtval for Capability Exceptions
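
The xtval layout of Figure 3.2 and the exception-based trap mechanism can be sketched together as follows. This is a hedged illustration: the SimulationTrap struct below is only a stand-in for the VP's type of the same name, the signature of handle_cheri_exception and the macro name CHERI_EXCEPTIONS_DISABLED are assumptions of this sketch, and the function throws directly instead of calling the VP's raise_trap.

```cpp
#include <cstdint>
#include <stdexcept>

// Subset of the CHERI cause codes from Table 3.4.
enum class CheriCause : uint8_t {
    LengthViolation = 0x01,
    TagViolation = 0x02,
    SealViolation = 0x03,
};

constexpr uint64_t EXC_CHERI_FAULT = 0x1C;  // CheriFault exception code

// Stand-in for the VP's SimulationTrap: exception code plus xtval.
struct SimulationTrap : std::runtime_error {
    uint64_t code;
    uint64_t xtval;
    SimulationTrap(uint64_t c, uint64_t t)
        : std::runtime_error("simulation trap"), code(c), xtval(t) {}
};

// Pack xtval as in Figure 3.2: cause in bits 4..0, cap idx in bits 10..5,
// remaining upper bits (WPRI) left at zero.
constexpr uint64_t make_cheri_xtval(CheriCause cause, unsigned cap_idx) {
    return (static_cast<uint64_t>(cap_idx & 0x3Fu) << 5) |
           (static_cast<uint64_t>(cause) & 0x1Fu);
}

// Single entry point for failed CHERI checks. Defining the (hypothetical)
// macro CHERI_EXCEPTIONS_DISABLED turns it into a no-op at compile time.
#ifndef CHERI_EXCEPTIONS_DISABLED
inline void handle_cheri_exception(CheriCause cause, unsigned cap_idx) {
    throw SimulationTrap(EXC_CHERI_FAULT, make_cheri_xtval(cause, cap_idx));
}
#else
inline void handle_cheri_exception(CheriCause, unsigned) {}
#endif
```

Routing every failed check through one function keeps the packing of xtval in a single place and makes the compile-time switch possible without touching the individual instruction implementations.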

3.3.6 Implications on ISS Performance Optimizations

As explained in Section 2.3, the VP implements DBBCache and LSCache to speed up the execution of the ISS. As performance optimization is not within the scope of this thesis, these components of the VP are not yet modified to support CHERI. Instead, the dummy implementations of DBBCache and LSCache are used, which do not cache any data but still provide the identical interface. The implementation of a CHERI-aware DBBCache and LSCache is left for future work.

3.4 Adapting Memory Layout and Interfaces

3.4.1 Tagged Memory

CHERI-RISC-V allows mixing of data and capabilities, which means that registers, but also memory must be able to hold tagged capabilities as well as general integer data. This enables embedding of capabilities within in-memory data structures and ensures compatibility of capabilities with C-language pointers. This requires major changes in the memory implementation, as storing capabilities includes storing the Tag, which must be done in a way that does not allow unintended manipulation of the Tag.

        Table 3.4: CHERI Exception Codes

Value      Exception                              Description
0x00       None                                   Should never be raised
0x01       LengthViolation                        Raised, if bounds check fails
0x02       TagViolation                           Raised, if Tag of a capability is 0, although required
0x03       SealViolation                          Raised, if a sealed capability is expected to be unsealed
                                                  for an operation (e.g., cinvoke)
0x04       TypeViolation                          Raised, if a capability's otype does not match
                                                  expectations (e.g., cinvoke)
0x05-0x07  reserved
0x08       Software-defined Permission Violation  Raised, if software permissions do not match expected
                                                  value (never used)
0x09-0x0f  reserved
0x10       GlobalViolation                        Never used
0x11       PermitExecuteViolation                 Raised, if permit_execute field of a capability is 0,
                                                  although required (e.g., cinvoke)
0x12       PermitLoadViolation                    Raised, if permit_load field of a capability is 0,
                                                  although required (e.g., lb)
0x13       PermitStoreViolation                   Raised, if permit_store field of a capability is 0,
                                                  although required (e.g., sb)
0x14       PermitLoadCapabilityViolation          Raised, if permit_load_cap field of a capability is 0,
                                                  although required (e.g., lc)
0x15       PermitStoreCapabilityViolation         Raised, if permit_store_cap field of a capability is 0,
                                                  although required (e.g., sc)
0x16       PermitStoreLocalCapabilityViolation    Raised, if permit_store_local_cap field of a capability
                                                  is 0 and global field is 0, although required (e.g., sb)
0x17       reserved
0x18       PermitAccessSystemRegistersViolation   Raised, if access_system_regs field of a capability is 0,
                                                  although required (e.g., cspecialrw)
0x19       PermitInvokeViolation                  Raised, if permit_invoke field of a capability is 0,
                                                  although required (e.g., cinvoke)
0x1a-0x1b  reserved
0x1c       PermitSetCIDViolation                  Raised, if permit_set_CID field of a capability is 0,
                                                  although required (never used)
0x1d-0x1f  reserved

As already explained in Section 2.4.5, a tagged memory must hold an unaddressable out-of-band Tag for each capability-sized memory word. This Tag must be atomically bound to the corresponding memory word. This atomicity is what ensures protection of the capabilities, as it must not be possible to partially overwrite a capability in memory while the Tag remains valid.

In the VP this is solved by implementing a new SystemC/TLM memory module called TaggedMemory. It is essentially a copy of the existing SimpleMemory, with a few extensions that are explained in the following. This modification can be seen when comparing Figure 2.1 and Figure 3.1, where the memory module is now called TaggedMemory and the addition of the Tags is visualized by the small boxes at the side of the memory.

On initialization, the TaggedMemory not only allocates the memory for a given size, but also initializes a vector of boolean values called tag_bits. The length of this vector is fixed and set to the size of the memory divided by the size of a capability (CLEN). This means that each Tag in this vector corresponds to a CLEN-sized memory section. By default, all Tag bits are set to false.

In addition to the existing write_data method, which takes an address, a reference to the source data and the number of bytes, a new overloaded method is provided. This new method takes an additional boolean value as parameter, which is interpreted as the Tag for the data that should be written. If this Tag is set to true, the method checks that the written data is capability aligned and CLEN bytes in size. If this is the case, the method copies the data to the given address and sets the Tag. Initially this was implemented for a generic data size and could theoretically write multiple capabilities at once, by passing their Tags as a vector. However, this is not used by the current instructions in CHERI ISAv9, which all act on single capabilities, and was therefore removed to reduce complexity.
The read_data method is extended to return a boolean value that represents the Tag of the loaded data. The actual data read is copied to a given destination, the same way as it is done in SimpleMemory. This method cannot assert on alignment and size of the loaded data, as it must be possible to read untagged data from memory. However, it still checks whether the read data could be a capability by checking if the address and size are capability aligned. Only when the read value might be interpreted as a capability (based on size and alignment) is the actual Tag value at this address returned. If it is guaranteed that the read value is not a capability, the method always returns false.

Neither write_data nor read_data operates on objects of type Capability. Instead, they utilize a simple boolean value to represent the Tag associated with each memory word of size CLEN. Using this representation is a design decision that was made to keep the memory implementation as abstract as possible. It is also more efficient to simply pass integer pointers instead of the significantly more complex Capability structure. This decision is reasonable from multiple perspectives. First, it highlights that such a tagged memory structure is conceptually independent of CHERI and could be used for other concepts that might want to mark words in memory. Second, it separates the code cleanly, as the TaggedMemory is implemented in the platform namespace of the VP, whereas the Capability struct is defined in core.
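
The behavior of the tagged write_data and read_data methods can be sketched as follows. This is a minimal stand-alone model, not the VP's SystemC/TLM module: the class name TaggedMemorySketch, the constant CLEN_BYTES (assuming 16-byte capabilities) and the error handling via exceptions are assumptions of this sketch. That an untagged write clears the Tags of the touched words is inferred from the description of the DMI in Section 3.4.4.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

constexpr std::size_t CLEN_BYTES = 16;  // assumption: 128-bit capabilities

struct TaggedMemorySketch {
    std::vector<uint8_t> data;
    std::vector<bool> tag_bits;  // one Tag per CLEN-sized memory word

    explicit TaggedMemorySketch(std::size_t size)
        : data(size, 0), tag_bits(size / CLEN_BYTES, false) {
        if (size % CLEN_BYTES != 0) throw std::invalid_argument("size must be a multiple of CLEN");
    }

    // True if the access has the shape of exactly one capability.
    static bool cap_shaped(std::size_t addr, std::size_t n) {
        return addr % CLEN_BYTES == 0 && n == CLEN_BYTES;
    }

    void check_range(std::size_t addr, std::size_t n) const {
        if (addr > data.size() || n > data.size() - addr) throw std::out_of_range("access out of range");
    }

    // Untagged write: clears the Tag of every CLEN-sized word it touches.
    void write_data(std::size_t addr, const uint8_t* src, std::size_t n) {
        check_range(addr, n);
        if (n == 0) return;
        std::memcpy(data.data() + addr, src, n);
        for (std::size_t w = addr / CLEN_BYTES; w <= (addr + n - 1) / CLEN_BYTES; ++w) {
            tag_bits[w] = false;
        }
    }

    // Tagged write: a set Tag is only accepted for capability-shaped data.
    void write_data(std::size_t addr, const uint8_t* src, std::size_t n, bool tag) {
        if (tag && !cap_shaped(addr, n)) throw std::invalid_argument("tagged write must be one aligned capability");
        write_data(addr, src, n);
        if (tag) tag_bits[addr / CLEN_BYTES] = true;
    }

    // Read: returns the stored Tag only for capability-shaped accesses.
    bool read_data(std::size_t addr, uint8_t* dst, std::size_t n) const {
        check_range(addr, n);
        std::memcpy(dst, data.data() + addr, n);
        return cap_shaped(addr, n) ? tag_bits[addr / CLEN_BYTES] : false;
    }
};
```

In this model, a single byte store into a word that previously held a capability leaves the data readable but invalidates the Tag, which is the property that prevents partially overwritten capabilities from remaining valid.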

3.4.2 Memory Interface

As shown in Figure 3.1, the ISS does not access memory directly. Instead, it communicates via a memory interface over the TLM bus. Since the memory now supports tagged data, this interface must also be extended to handle capabilities. To that end, a new interface called CombinedTaggedMemoryInterface is implemented, building on the existing CombinedMemoryInterface. The new interface introduces methods specifically for loading and storing tagged data. Compared to implementing the TaggedMemory class, adapting this interface requires more effort, as it already provides separate methods for various data types and supports atomic load/store operations. The interface is extended with dedicated methods for loading and storing capabilities. Unlike the TaggedMemory, which uses a boolean Tag value, the new methods of CombinedTaggedMemoryInterface operate on the actual Capability type. This is appropriate since the interface is tightly coupled to the ISS and all other methods of this interface also apply strict types.

Additionally, because CHERI specifies how capabilities must interact with atomic memory operations, and the VP implements the atomic extension, atomic variants of the capability access methods are provided alongside the standard load_cap and store_cap methods. Furthermore, CHERI also affects the existing load/store methods. For every operation, the register serving as the address in the standard RISC-V ISA must now be treated as a capability and acts as the authorizing capability. The authorizing capability is used to verify the operation, which means that several checks are performed before each load/store operation. These checks always include the following:

• Tag of the authorizing capability must be true
• Authorizing capability must not be sealed
• Authorizing capability must have permission to perform the given operation (load/store)
• Destination address in memory must be within the bounds of the authorizing capability
• Destination address must be properly aligned
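
The checks listed above can be sketched as a single validation function. This is an illustration only: the types CapSketch, Access and CheckResult as well as the function check_access are hypothetical, the capability is reduced to the fields needed here, natural alignment (address divisible by the access size) is assumed, and address overflow is not handled.

```cpp
#include <cstdint>

// Reduced stand-in for the VP's Capability type.
struct CapSketch {
    bool tag;
    bool sealed;
    bool permit_load;
    bool permit_store;
    uint64_t base;
    uint64_t top;  // exclusive upper bound
};

enum class Access { Load, Store };

enum class CheckResult {
    Ok,
    TagViolation,
    SealViolation,
    PermitLoadViolation,
    PermitStoreViolation,
    LengthViolation,
    Misaligned,
};

// Validate a load/store of `size` bytes (size > 0) at `addr` against the
// authorizing capability, in the order of the list above.
inline CheckResult check_access(const CapSketch& auth, Access kind, uint64_t addr, uint64_t size) {
    if (!auth.tag) return CheckResult::TagViolation;
    if (auth.sealed) return CheckResult::SealViolation;
    if (kind == Access::Load && !auth.permit_load) return CheckResult::PermitLoadViolation;
    if (kind == Access::Store && !auth.permit_store) return CheckResult::PermitStoreViolation;
    if (addr < auth.base || addr + size > auth.top) return CheckResult::LengthViolation;
    if (addr % size != 0) return CheckResult::Misaligned;
    return CheckResult::Ok;
}
```

Returning a result code instead of trapping directly keeps the sketch free of side effects. In a simulator, a non-Ok result would be translated into the corresponding trap.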

3.4.3 TLM Bus

The TLM bus is the communication interface between the ISS and all other components, like the memory or peripherals. It is visualized in the center of Figure 3.1 to highlight that it is a central component of the VP. As explained in Section 2.3, the usage of SystemC/TLM allows for abstraction of communication details and enables the VP to focus on the actual instruction execution and memory access.

The access to the TLM bus is implemented in the CombinedMemoryInterface, which defines a method _do_transaction that is invoked by all load/store methods. This method prepares and initiates the actual TLM transaction and must be adapted accordingly for the tagged variant. It begins by creating a tlm_generic_payload object that contains all the necessary information for the transaction. The payload is then passed to the TLM bus via the b_transport method of the TLM socket. Both tlm_generic_payload and b_transport are defined by the SystemC/TLM standard and are not specific to the VP.

The problem is that the generic payload type does not support tagged data, as it can only store a pointer to the data and its size. However, the SystemC/TLM standard allows users to define custom extensions that can be appended to the generic payload. Therefore, a new TagExtension is implemented, which inherits from tlm_extension and simply holds a boolean Tag value. The full code of the TagExtension is shown in Listing 3.3 to highlight the simplicity of the implementation. Using the TagExtension allows CHERI-aware modules to exchange Tags alongside the actual data over the unmodified TLM bus. Modules that are not aware of the TagExtension simply ignore it and operate as usual, ensuring backward compatibility with existing TLM components.

If the current transaction is a store operation, the Tag is set accordingly before the extension is appended to the payload. The TLM transaction is then performed as usual. Once it is completed and if it was a load operation, the Tag value is read from the TagExtension and used as the Tag of the loaded value.

3.4.4 DMI

In addition to the usual TLM transport interface, the VP also supports transactions via the Direct Memory Interface (DMI). The DMI allows for faster access to memory by bypassing the normal TLM transport layer and providing the initiator with a pointer to the actual memory array instead [51]. This interface must also be extended to support tagged data. For this, the existing DMI implementation in the class MemoryDMI is extended with a few new methods: one for loading a tagged data object, one for storing a tagged data object, and one that allows reading the Tag that protects a given address. Additionally, an overloaded initialization method is added, which allows setting the reference to the actual Tag vector in the TaggedMemory.

An important detail that was initially overlooked during the implementation is that the DMI must ensure that the corresponding Tag is cleared if a non-tagged data element is written via the already existing methods (e.g., a single byte store). Since the DMI skips the TLM transaction and also the memory's write_data implementation, the clearing of the Tag must be ensured by the DMI itself.

    #include <tlm>

    // TLM payload extension that carries the Tag of a capability-sized word.
    struct TagExtension : tlm::tlm_extension<TagExtension> {
        bool tag;

        TagExtension(bool t) {
            tag = t;
        }

        // Required by tlm_extension_base: create a copy of this extension.
        tlm::tlm_extension_base* clone() const override {
            return new TagExtension(*this);
        }

        // Required by tlm_extension_base: take over the state of another extension.
        void copy_from(tlm::tlm_extension_base const &ext) override {
            tag = static_cast<TagExtension const &>(ext).tag;
        }
    };

        Listing 3.3: TLM Tag Extension in CHERI-RISC-V VP++
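
The Tag-clearing requirement for untagged DMI stores can be sketched as follows. The class MemoryDmiSketch and its methods are hypothetical and only model the idea: the DMI holds a raw pointer to the memory array plus a pointer to the Tag vector, and every untagged store clears the Tags of the capability-sized words it touches. A capability size of 16 bytes is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t DMI_CLEN_BYTES = 16;  // assumption: 128-bit capabilities

class MemoryDmiSketch {
    uint8_t* mem_;             // direct pointer into the memory array
    std::vector<bool>* tags_;  // Tag vector owned by the tagged memory

   public:
    MemoryDmiSketch(uint8_t* mem, std::vector<bool>* tags) : mem_(mem), tags_(tags) {}

    // Untagged store of a plain value: bypasses the memory module, so the
    // Tags of all touched CLEN-sized words must be cleared here.
    template <typename T>
    void store(std::size_t offset, T value) {
        std::memcpy(mem_ + offset, &value, sizeof(T));
        const std::size_t first = offset / DMI_CLEN_BYTES;
        const std::size_t last = (offset + sizeof(T) - 1) / DMI_CLEN_BYTES;
        for (std::size_t w = first; w <= last; ++w) {
            (*tags_)[w] = false;
        }
    }

    // Read the Tag that protects the word containing `offset`.
    bool load_tag(std::size_t offset) const { return (*tags_)[offset / DMI_CLEN_BYTES]; }
};
```

Without the loop that clears the Tags, a byte store through the DMI would modify a stored capability while leaving its Tag valid, which is exactly the situation the tagged memory is meant to prevent.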

3.4.5 Virtual Memory and Page Tables

The MMU (located on the right side of the ISS in Figure 3.1) is responsible for translating virtual addresses to physical addresses and managing the page table walks. The MMU is implemented in the struct MMU, which is part of the VP and is used by the ISS to access memory. Its implementation follows the rules of virtual addressing as explained in Section 2.1.3. For this work, the MMU is extended to support CHERI and its implications on the virtual memory system.

In general, the translation of virtual addresses to physical addresses is not changed when CHERI is enabled. The MMU still uses the SATP register to determine the current page table root and translates addresses based on the current privilege level. CHERI protects the SATP register and requires the permit_access_system_registers permission to change the page table root and other virtual-memory parameters. Additionally, the MMU must constrain loading and storing of valid capabilities via specific page mappings. The constraint is achieved by adding new permission bits to the Page Table Entry (PTE) format. As the RISC-V Sv32 PTE format does not have enough reserved bits, the standard only defines this for the Sv39 and Sv48 formats. In Figure 3.3 the PTE format for Sv39 is shown, with the bits added by CHERI highlighted in bold. In the current state of the VP, only this format is extended with CHERI support. This means that only the Sv39 PTE format can be used when CHERI is enabled.

Capability Stores

The two bits called CW and CD are added to the PTE format. They are used to control the storing of capabilities and are parallel to the existing W and D bits, which are explained in Table 2.1. The functionality of the new CW and CD bits is described in Table 3.5. In CHERI ISAv9, these behaviors must be applied to all instructions that act on a valid capability, which means that its Tag is set. As a result, the behavior of these instructions depends on the Tag, but this might change in future versions, as noted in CHERI ISAv9.

    Bits     Field
    63       CW
    62       CR
    61       CD
    60       CRM
    59       CRG
    58–54    Reserved
    53–28    PPN[2]
    27–19    PPN[1]
    18–10    PPN[0]
    9–8      RSW
    7        D
    6        A
    5        G
    4        U
    3        X
    2        W
    1        R
    0        V

    Bits 63–59 (CW, CR, CD, CRM, CRG) are the fields added by CHERI.

                 Figure 3.3: Sv39 PTE Format with CHERI Extension

                         Table 3.5: Capability Store Permissions in PTE

 CW  CD  Behavior
 0   X   Trap on capability stores (exception code 0x1B)
 1   0   Capability stores atomically raise CD or fault (exception code 0x1B)
 1   1   Capability stores permitted

When storing a capability through a PTE that has the CD bit cleared, hardware may either raise a capability-store page fault (exception code 0x1B) or atomically update the PTE to set the CD bit, the same way as is done with the existing D bit. The PTE update must follow the existing atomicity rules. Therefore, the page table walker must atomically verify that the PTE is valid and that both the W and CW bits are set, and the update to the PTE must become visible no later than the store itself. Since capability stores are still regular stores, they are required to check both W and CW permissions and set the D and CD bits. If either D or CD is clear when it should be set, the hardware may raise a page fault instead of updating them atomically. This alternative approach allows the OS or hypervisor to explicitly manage and track the use of capability stores. In the VP, the atomic-update approach is implemented, which means that CD is set atomically when a capability store is performed through a PTE that has the CW bit set. The ordering of PTE checks and updates follows the RISC-V conventions extended for CHERI: the V, U, and W bits are checked first, followed by CW, before D, CD and optionally A are updated. If a fault condition arises, the D and A bits take precedence over CD.
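
The store-side behavior of Table 3.5 with the atomic-update choice can be sketched as follows. This is a simplified illustration and not the VP's page table walker: the function check_store and the enum StoreResult are hypothetical, the positions of CW (bit 63) and CD (bit 61) are assumptions based on the layout in Figure 3.3, the U bit and privilege levels are ignored, and atomicity is not modeled.

```cpp
#include <cstdint>

// Standard Sv39 PTE bits.
constexpr uint64_t PTE_V = 1ull << 0;
constexpr uint64_t PTE_W = 1ull << 2;
constexpr uint64_t PTE_A = 1ull << 6;
constexpr uint64_t PTE_D = 1ull << 7;
// CHERI bits; positions assumed from Figure 3.3.
constexpr uint64_t PTE_CD = 1ull << 61;
constexpr uint64_t PTE_CW = 1ull << 63;

enum class StoreResult { Ok, StorePageFault, CheriStoreFault };

// Check a store through `pte`. `tagged` tells whether the stored value is a
// valid capability, since the CW/CD rules only apply to tagged stores.
inline StoreResult check_store(uint64_t& pte, bool tagged) {
    // Regular store permissions are checked first.
    if (!(pte & PTE_V) || !(pte & PTE_W)) return StoreResult::StorePageFault;
    // CW = 0: capability stores trap with exception code 0x1B.
    if (tagged && !(pte & PTE_CW)) return StoreResult::CheriStoreFault;
    // Update the accessed/dirty bits as for any store.
    pte |= PTE_A | PTE_D;
    // CW = 1: raise CD instead of faulting.
    if (tagged) pte |= PTE_CD;
    return StoreResult::Ok;
}
```

An untagged store through a page without CW succeeds and leaves CD untouched, which reflects that the rules depend on the Tag of the stored data.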

Capability Loads

For capability loads, CHERI ISAv9 specifies three behaviors. However, multiple configurations are reserved for experimental features. This leads to the definition of the three bits CR, CRM and CRG for the CHERI extension, while most of their possible configurations are not used and cause a page fault (due to invalid settings in the PTE). Table 3.6 shows all possible configurations and their intended behavior. As with the CW/CD behavior, implementations must respond based on the data when a PTE is configured to fault on a capability load, which means that faults must only be raised if the loaded value has its Tag set. Again, CHERI ISAv9 notes that this restriction might be relaxed in future versions.

         Table 3.6: Capability Load Permissions in PTE

 CR  CRM      CRG  Behavior
 0   0        0    Capability loads strip Tags on loaded result
 0   1        0    Capability loads fault (exception code 0x1A)
 1   0        0    Capability loads are unaltered
 0   X        1    Reserved
 1   0        1    Reserved
 1   1        X    Reserved for generational load barriers

The VP handles the translation of virtual to physical addresses in the translate_virtual_to_physical_addr method of the MMU. This method is called by the ISS via the memory interface whenever it needs to access memory, either for loading or storing data. It performs a page table walk to find the corresponding PTE for the given virtual address and finally returns the physical address. This method is extended with additional checks for the new bits added by the CHERI extension. However, the actual handling of the error cases is not trivial, because the Tag bit of the loaded or stored capability must be known, as faults are only triggered if the Tag is set. For stores, the Tag bit of the capability being written is available and can easily be passed to the MMU. For loads, however, the Tag bit is not known at the time of address translation, which makes the handling more complex. In the VP this problem is solved by passing two additional pointers to the translation method, each referencing a boolean flag. One flag indicates that the Tag of the loaded value should be cleared, the other signals that a fault must be raised if the Tag is set. Using pointers allows the MMU to update these flags based on the PTE configuration. It is the caller's responsibility to inspect these flags and respond appropriately after the load is complete and the Tag is available.
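
The two-flag mechanism for capability loads can be sketched as follows. The function names eval_cap_load_bits and finish_cap_load and the enum PteLoadStatus are hypothetical. The sketch only models the decision logic of Table 3.6 and the deferred handling by the caller, not the page table walk itself.

```cpp
#include <cstdint>

enum class PteLoadStatus { Ok, InvalidPte };

// MMU side: evaluate CR/CRM/CRG according to Table 3.6 and report the
// required post-load action through the two flag pointers.
inline PteLoadStatus eval_cap_load_bits(bool cr, bool crm, bool crg,
                                        bool* strip_tag, bool* fault_if_tagged) {
    *strip_tag = false;
    *fault_if_tagged = false;
    if (crg) return PteLoadStatus::InvalidPte;        // reserved configurations
    if (cr && crm) return PteLoadStatus::InvalidPte;  // reserved (generational load barriers)
    if (!cr && !crm) {
        *strip_tag = true;        // loads strip the Tag of the result
    } else if (!cr && crm) {
        *fault_if_tagged = true;  // loads fault, but only for tagged data
    }
    // cr && !crm: capability loads are unaltered
    return PteLoadStatus::Ok;
}

// Caller side, once the load has completed and the Tag is known: returns the
// Tag visible to the ISS and reports whether a CheriLoadFault (0x1A) is due.
inline bool finish_cap_load(bool loaded_tag, bool strip_tag, bool fault_if_tagged,
                            bool* raise_fault) {
    *raise_fault = fault_if_tagged && loaded_tag;
    return strip_tag ? false : loaded_tag;
}
```

Deferring the decision keeps the translation independent of the loaded data: the MMU only records what has to happen, and the caller applies it once the Tag is available.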

3.5 Conclusion

This chapter covered the most important steps required to implement CHERI support in RISC-V VP++. First, two structures for architectural capabilities were defined, which are used to represent capabilities in registers and memory. Using these structures, the ISS was extended to hold capabilities in registers and to perform capability-aware instruction fetches. The instruction decoding was modified to handle the instructions added by CHERI, which follow the same decoding scheme as the existing RISC-V instructions. Afterwards, the behavior of each instruction was implemented in the execution block of the ISS based on CHERI ISAv9. The last modification made to the ISS concerned the trap handling, which was extended to support CHERI exceptions and traps, while also allowing the ISS to store capabilities in the trap-related CSRs. Then the memory interface was extended to support tagged memory, which is required to store capabilities in memory. This included the implementation of a new TLM memory module and adapting the TLM bus to allow the transport of tagged data. This was done using the concept of TLM extensions, which made it possible to append a boolean Tag value to the generic payload. The last section of this chapter covered the implications of CHERI on the virtual memory system and how they were implemented in the MMU of the VP. This concludes the implementation of CHERI in RISC-V VP++. The next chapter presents the testing strategy used to verify correctness and to ensure that all features required for CHERI ISAv9 compliance are properly implemented.

Chapter 4

Verification using TestRIG

With the results from Chapter 3, the implementation of the CHERI-RISC-V VP++ is in a state where it is capable of executing binary (ELF) files built for the CHERI-RISC-V architecture. To verify the correctness of the implementation, all new and modified instructions require testing. These tests should cover not only basic functionality, but also edge cases and error handling. While this is already a non-trivial task for regular RISC-V architectures, it becomes considerably more complex with CHERI, since each instruction must account for numerous additional edge cases. This stems from the fact that most CHERI instructions validate multiple fields of the involved capabilities, which can trigger traps in a wide range of scenarios. Therefore, a comprehensive testing strategy is required to ensure the correctness of the implementation.

From these considerations, it should be clear that manually writing, building and executing test cases for all instructions and edge cases is not a feasible approach. To address this challenge, the CTSRD project developed the TestRIG, a testing framework specifically designed for RISC-V implementations. As already explained in Section 2.4.9, the TestRIG provides a way to automatically generate randomized test cases, which can be executed on different RISC-V implementations. It is important to understand that the TestRIG does not generate the expected output of the test cases, but instead compares the output of two different implementations, and thus requires a reference implementation. Because the CHERI ISAv9 is defined in terms of the formal Sail model, and Sail allows an executable simulator to be built from the model, this simulator can be used as the reference implementation. This leads to the test setup shown in Figure 2.5 (see Section 2.4.9), with the Sail model as implementation A and the CHERI-RISC-V VP++ as implementation B.
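The comparison principle can be illustrated with a small sketch. It is not TestRIG's implementation; it assumes that each implementation reports one dictionary of trace fields per executed instruction and that only fields reported by both sides are compared. The function name `first_mismatch` is hypothetical.

```python
def first_mismatch(trace_a, trace_b):
    """Return (index, field, value_a, value_b) for the first differing field, or None.

    trace_a and trace_b are lists with one dict of reported fields per instruction.
    Only fields present on both sides are compared (an assumption of this sketch).
    """
    for index, (a, b) in enumerate(zip(trace_a, trace_b)):
        for field in sorted(a.keys() & b.keys()):
            if a[field] != b[field]:
                return index, field, a[field], b[field]
    return None
```

No expected values appear anywhere in this scheme: a test case fails only because the two implementations disagree, which is why the quality of the reference implementation matters.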

4.1 Advantages of TestRIG

Besides the obvious advantage of automatic test case generation, the TestRIG provides several other advantages that make it a suitable choice for testing the CHERI-RISC-V VP++.

First, the major reason for using TestRIG is that it does not require the generated code to be compiled for the target architecture. Instead, the test case generator already provides the machine code for each instruction, which is directly injected into the implementations. This is especially important because building and running C programs requires some special considerations, and many instructions must already work flawlessly to set up the capability relocations, which are explained later in Section 5.2. Without TestRIG, all test cases would therefore have to be written in assembly, which would be quite tedious for all instructions and edge cases. With TestRIG, the already implemented instructions could be validated throughout the implementation process.

Second, the DII, which was explained in Section 2.4.9, provides a platform-independent way to inject instructions, which allows running identical test cases on any RISC-V implementation that supports the RVFI-DII standard.

Third, since the TestRIG does not generate entire programs but only sequences of instructions, the sequence generator does not need to ensure completeness of the program. This eliminates the need for valid control flow (e.g., jump instructions), memory management setup, or any bootloader or startup code. This simplification not only makes automatic test case generation easier, but also greatly facilitates manual testing. The QuickCheckVEngine used here, which was introduced in Section 2.4.9, supports storing and loading instruction sequences, which can also be written by hand. These sequences can be as short as a single instruction, which is particularly useful for early testing of newly implemented instructions.

Fourth, the reduction to tiny test cases allowed for a kind of counterexample-driven development, an advancement over test-driven development [50]. Whenever new instructions were added to the VP, the TestRIG was used to verify the implementation against the Sail model. Once the VP met most of the obvious requirements of CHERI ISAv9, the TestRIG provided a guideline on what was still missing. In this way it was possible to implement the CHERI extension incrementally, while always having a way to verify the implementation.

4.2 Disadvantages of the TestRIG

While the TestRIG provides many advantages, it also has some disadvantages that must be considered. First of all, the TestRIG requires the DuT to support the RVFI-DII interface. The integration of this interface adds significant complexity to the VP implementation. Not only does it require substantial initial effort to implement, but it also demands ongoing maintenance and careful consideration when introducing new features. The second problem is that the RVFI implementation requires the VP to behave differently depending on whether RVFI tracing is enabled or not. The cause of this mode dependency will be explained in more detail in Section 4.4. These different execution paths can lead to different behavior during tests and production, invalidating the test results. Modifying the implementation under test to support the test framework is always bad practice, and marks a significant downside of using the TestRIG and RVFI-DII.

4.3 The RVFI-DII Format

To understand how TestRIG interacts with the DuT, it is important to first look at the RVFI-DII format in more detail, since this interface defines how instructions and their results are communicated and checked. With this background, the next sections explain how the RVFI-DII interface was integrated into the CHERI-RISC-V VP++. RVFI implements two versions of its trace format. While version 1 is documented in [52], version 2 is not yet standardized, but is used by the Sail model. The VP follows the version 1 standard, which is not a problem for the TestRIG, as it is able to identify which version is used by each implementation and handles the comparison of the traces accordingly. The format defines two different packets, the instruction packet and the execution packet, which are used to communicate between the VEngine and the DuT.

RVFI-DII Instruction Packet

The instruction packet is 8 bytes in size and sent from the VEngine to the DuT. It contains the following fields:

• 4 bytes: Instruction word: Instruction that should be executed in binary encoding, only two bytes are used for compressed instructions.
• 2 bytes: Time: Defines a delay before the instruction should be injected. This can be ignored according to specification, and is not used by any implementation.
• 1 byte: Trace command: Control signal defining the action that should be performed by the implementation.
    - 0: EndOfTrace: Resets the implementation, including registers, memory and PCC. Registers are set to the Default-Capability, which differs from normal execution, where general purpose registers are initialized as Null-Capability (see Section 2.4.8 for their definitions)
    - 1: Instruction: Injects the given instruction into the implementation
    - ’B’: Blink: Used for debugging, causes the VP to signal that it received the command by printing BLINK to the console
• 1 byte: Padding, always zero
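The field layout listed above can be expressed with Python's `struct` module. This is a sketch under assumptions: the fields are taken to be packed in the listed order without implicit padding, and little-endian byte order is assumed, since the description above does not state it. The constant and function names are hypothetical.

```python
import struct

# Assumed wire layout: little-endian, fields in the order listed above.
#   I = 4-byte instruction word, H = 2-byte time, B = 1-byte command, B = 1-byte padding
INSTRUCTION_PACKET = struct.Struct("<IHBB")

CMD_END_OF_TRACE = 0
CMD_INSTRUCTION = 1
CMD_BLINK = ord("B")


def pack_instruction(insn, cmd=CMD_INSTRUCTION, time=0):
    """Build the 8-byte instruction packet; the padding byte is always zero."""
    return INSTRUCTION_PACKET.pack(insn, time, cmd, 0)


def unpack_instruction(data):
    """Split an 8-byte instruction packet into its fields."""
    insn, time, cmd, _padding = INSTRUCTION_PACKET.unpack(data)
    return {"insn": insn, "time": time, "cmd": cmd}
```

For example, `pack_instruction(0x00008093)` yields the eight bytes that inject `addi x1, x1, 0`, and `pack_instruction(0, cmd=CMD_END_OF_TRACE)` requests a reset.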

RVFI-DII Execution Packet

The execution packet is sent from the DuT to the VEngine after each executed instruction. It contains the following fields:

• 1 byte: Trap handler: Indicates the first instruction executed in the trap handler, 0 otherwise (not used by any implementation)
• 1 byte: Halt indicator: Indicates whether the instruction caused a halt, 0 if no halt was caused
• 1 byte: Trap indicator: Indicates whether the instruction caused a trap, 0 if no trap was caused
• 1 byte: Write register address: Address of the rd register, or zero if not used
• 1 byte: rs1 register address: Address of the rs1 register, or zero if not used
• 1 byte: rs2 register address: Address of the rs2 register, or zero if not used
• 8 bytes: Memory write mask: Indicates valid bits written, 0 if no bits were written to memory
• 8 bytes: Memory read mask: Indicates valid bits read, 0 if no bits were read from memory
• 8 bytes: Memory address: Address of the memory access, if the instruction accesses memory, otherwise zero
• 8 bytes: Write register value: Content of the rd register after instruction execution
• 8 bytes: rs1 register value: Content of the rs1 register after instruction execution
• 8 bytes: rs2 register value: Content of the rs2 register after instruction execution
• 8 bytes: Instruction word: Instruction that was executed in binary encoding, always the decompressed value, even if the instruction was compressed
• 8 bytes: PC after instruction: PC after the instruction was executed
• 8 bytes: PC before instruction: PC before the instruction was executed
• 8 bytes: Instruction number: Content of the instret register after instruction execution

4.4 Adding RVFI-DII support to CHERI-RISC-V VP++

The implementation of RVFI-DII in the VP can be divided into three parts.

First, the VP must be able to communicate with the VEngine via a web socket. This socket implementation must be provided by the VP and is used to send and receive messages from the VEngine. A C++ implementation of RVFI-DII is available from the Spike RISC-V ISA simulator by CTSRD [53]. While this project is no longer actively maintained and not up-to-date with the latest ISA revision, parts of its RVFI-DII implementation are still usable. In particular, the web-socket implementation is still functional and is therefore reused in the CHERI-RISC-V VP++.

Second, the VP must be able to receive instructions from the VEngine via DII instead of fetching them from memory. This requires a change to the fetching process, which is made by adding a condition to the VP’s slow path. If RVFI is enabled, the VP does not fetch the next instruction from memory, but instead uses the latest received instruction stored in the rvfi_dii_input object. The modified fetching process is the first cause of the RVFI mode dependency, as the fetching process is significantly different when RVFI is enabled. To ensure that the VP waits for the next instruction to be received, instead of performing step after step as soon as one instruction is executed, a new CoreRunner is implemented, which waits to receive an instruction via DII before calling the run_step method of the ISS.

The third part is the reporting of the executed instructions via RVFI traces, which is the most complex part of the addition. As the RVFI tracing format shown in Section 4.3 contains a lot of information, the VP must be modified in various places throughout the implementation to ensure correct reporting. Because memory access is handled quite differently in the VP compared to the Sail model, it is not always trivial to report identical results. To ensure identical reporting, the VP must check in various places whether RVFI tracing is enabled. This causes modified behavior in the VP depending on whether RVFI tracing is enabled or not, leading to the already criticized mode dependency of the VP.
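The control flow of such a runner can be sketched as follows. This is a simplified illustration and not the VP's CoreRunner: the `iss` object, its `reset` method, and the `receive` and `send` callbacks are hypothetical stand-ins; only `run_step` and `rvfi_dii_input` are names taken from the text. Any acknowledgement that the protocol may require after a reset is omitted.

```python
CMD_END_OF_TRACE = 0
CMD_INSTRUCTION = 1


def run_dii_loop(iss, receive, send):
    """Execute exactly one step per injected instruction.

    receive() is assumed to block until the VEngine delivers a (cmd, insn)
    pair and to return None once the connection is closed.
    send(trace) reports the execution trace of the step back to the VEngine.
    """
    while True:
        packet = receive()
        if packet is None:
            return
        cmd, insn = packet
        if cmd == CMD_END_OF_TRACE:
            iss.reset()                # registers, memory and PCC start over
        elif cmd == CMD_INSTRUCTION:
            iss.rvfi_dii_input = insn  # used by the fetch path instead of memory
            send(iss.run_step())
```

Blocking in `receive` is what keeps the ISS from running ahead: no step is taken unless an instruction has been injected.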

4.5 Bugs identi昀椀ed using TestRIG

In this section, some of the bugs that were found using the TestRIG are described. This is only a small selection of the bugs that were found during development. The described scenarios should give an impression of the kind of bugs that can be found using the TestRIG and of how TestRIG can be utilized to verify the implementation. Describing all bugs that were found would exceed the scope of this thesis, as many bugs were found due to the test-driven development approach. The range of bugs found by the TestRIG is quite large: from simple bugs caused by missing instructions, through edge cases that were missed during the implementation of the more complex instructions, to more complex bugs caused by missed specification details, for example in the trap handling. It is also worth noting that many of the bugs were caused only by the reporting via the RVFI, and not by the actual implementation of the instructions. Such differences are a result of different design decisions made by the Sail model and the CHERI-RISC-V VP++, which lead to difficulties in reporting identical results.

4.5.1 Memory Alignment Checks

When loading or storing data in memory, alignment checks are performed to ensure that the data is correctly aligned for the data type being accessed. The alignment is not a requirement of CHERI, but of the RISC-V architecture in general. However, during the extension of the VP to support CHERI, the alignment checks were missed in some new execution paths. This was discovered by the TestRIG in a test case that tried to load a 64-bit value from an unaligned address, which led to a trap in the Sail implementation, but not on the VP.
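The check that was missing on the new execution paths amounts to a single comparison. The following sketch assumes natural alignment, i.e. the address must be a multiple of the access size; the names `MisalignedAccess` and `check_alignment` are hypothetical.

```python
class MisalignedAccess(Exception):
    """Stands in for the trap raised on a misaligned load or store."""


def check_alignment(addr, size):
    """Raise MisalignedAccess unless addr is a multiple of the access size in bytes."""
    if size <= 0 or size & (size - 1):
        raise ValueError("access size must be a power of two")
    if addr % size != 0:
        raise MisalignedAccess(f"{size}-byte access at {addr:#x}")
```

A 64-bit load corresponds to `size=8`, so an address such as `0x80000004` is rejected while `0x80000008` passes.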

4.5.2 Integer Values in Capability Registers

CHERI specifies that whenever an integer value is written to a capability register, the register’s address field must be set to the given value, while its other fields (including the Tag) must be set to zero. In a first implementation this was not ensured in all cases. The TestRIG could reduce the problem to the test case shown in Listing 4.1, which also shows the reported output of implementation A (Sail) compared to the output of implementation B (CHERI-RISC-V VP++). The output format of the TestRIG is designed in a way that allows re-execution of the test case, but also identification of the differences between the two implementations. Lines 1 and 5 show the two instructions that are executed by the test case. When the test is run again, the TestRIG sends the two instructions that follow the .4byte keyword to each implementation; all other lines are ignored, as they are comments. The # symbol marks the start of a comment. The comments are used to log the reported output of the VP and the Sail model.

     1  .4byte 0x00008093 # addi x1, x1, 0
     2  # Trap: False, PCWD: 0x0000000080000004, RD: 01, RWD: 0x0000000000000000, I: 0x0000000000008093
     3  # Trap: False, PCWD: 0x0000000080000004, RD: 01, RWD: 0x0000000000000000, I: 0x0000000000008093
     4
     5  .4byte 0xff7080db # cgethigh x1, x1
     6  # Trap: False, PCWD: 0x0000000080000008, RD: 01, RWD: 0x0000000000000000, I: 0x00000000ff7080db
     7
     8  # ^ A, B v: mismatch in field rd_wdata: 0x0 != 0xffff000000000000
     9
    10  # Trap: False, PCWD: 0x0000000080000008, RD: 01, RWD: 0xffff000000000000, I: 0x00000000ff7080db

Listing 4.1: Integer write to capability register

For the version 1 trace format, which the VP uses, each output trace contains all fields explained in Section 4.3. As not all fields are relevant for every instruction, version 2 of the RVFI-DII trace format only reports those that are meaningful for the executed instruction. As an example, the addi instruction does not manipulate the memory, so the fields related to memory access are not reported. Because the Sail model uses version 2, the output of the Sail model only contains a subset of the fields. For better readability, the comments shown in Listing 4.1 are shortened to show only the fields required to understand the differences between the two implementations. Trap indicates whether the instruction caused a trap or not. PCWD is the value written to the PC after the instruction was executed. RD is the destination register written by the instruction. RWD is the value written to the destination register. I is the instruction that was executed; for non-compressed instructions this always matches the instruction sent by the TestRIG.

If the reported outputs of implementation A (Sail) and B (VP) are identical, both of them are logged as comments. In case of a mismatch, the TestRIG logs which fields differ between the two implementations by inserting additional comments between the two output traces, as can be seen in Line 8. As the RVFI is not designed to support CHERI, the implementations only report the integer value of the destination registers, which is why the report after the addi instruction is identical in both cases. It is the second instruction, cgethigh, that shows the difference in behavior. cgethigh is a CHERI instruction that reads the high part of a capability register and writes it to the address field of the destination register. As shown in Figure 2.3, it is the high part of a capability register that contains the capability fields, while the low part is used for the address field. Therefore, the expected result of the cgethigh instruction is zero, as the previous addi instruction should clear the capability in the x1 register. In the incorrect implementation, the upper part of the capability register is not cleared, resulting in the returned value of 0xffff000000000000, since the x1 register is initialized to the Infinite Capability. This leads to the report of the TestRIG in Line 8, which shows the difference in the RWD fields.
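The effect can be reproduced with a toy register model. This is a sketch, not the VP's register implementation: a capability register is reduced to a high half, a low half and a Tag, and the initial high half is simply taken from the value reported in Listing 4.1. The names `CapReg`, `write_int`, `write_int_buggy` and the Python function `cgethigh` are hypothetical.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1
INFINITE_HIGH = 0xffff000000000000  # high half as reported in Listing 4.1


@dataclass
class CapReg:
    high: int = 0      # capability fields (upper 64 bits)
    low: int = 0       # address field (lower 64 bits)
    tag: bool = False


def write_int(reg, value):
    """Correct behavior: an integer write clears every field except the address."""
    reg.low = value & MASK64
    reg.high = 0
    reg.tag = False


def write_int_buggy(reg, value):
    """Behavior of the faulty path: only the address field is updated."""
    reg.low = value & MASK64


def cgethigh(reg):
    """Return the high half, which exposes metadata that was not cleared."""
    return reg.high
```

Starting from `CapReg(high=INFINITE_HIGH, low=0, tag=True)`, both write functions leave the same integer value in the register, which is why the mismatch only becomes visible once `cgethigh` reads the upper half.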

4.5.3 Compressed Capability Instructions

As explained in Section 3.3.3, CHERI defines compressed instructions that depend on the encoding mode, marked by flag_cap_mode in the PCC. In a first attempt this was implemented incorrectly by the CHERI-RISC-V VP++, which did not check flag_cap_mode when decoding the instruction. Instead, the decoding depended only on whether the CHERI extension was enabled. The TestRIG is able to detect this issue if the compressed instructions are enabled in the TestRIG by adding c to the architecture string (e.g., rv64icxcheri). Listing 4.2 shows the failed test case, which tries to execute the compressed instruction c.addi16sp.

     1  .4byte 0x00007179 # Unknown instruction
     2  # Trap: False, PCWD: 0x0000000080000002, RD: 02, RWD: 0xffffffffffffffd0, I: 0x0000000000007179
     3  # Trap: False, PCWD: 0x0000000080000002, RD: 02, RWD: 0xffffffffffffffd0, I: 0x0000000000007179
     4
     5  .4byte 0xff71025b # cgethigh x4, x2
     6  # Trap: False, PCWD: 0x0000000080000006, RD: 04, RWD: 0x0000000000000000, I: 0x00000000ff71025b
     7
     8  # ^ A, B v: mismatch in field rd_wdata: 0x0 != 0xffff000000000000
     9
    10  # Trap: False, PCWD: 0x0000000080000006, RD: 04, RWD: 0xffff000000000000, I: 0x00000000ff71025b

Listing 4.2: Compressed capability instruction

The instruction is expected to be interpreted as a regular RISC-V instruction, as flag_cap_mode is not set in the Infinite Capability, and therefore not set in the initial PCC. On the VP, the instruction is interpreted as the capability instruction c.cincoffsetimm16csp, which performs a similar operation, but has one significant difference. The cincoffset instruction sets the destination register cd to cs1, with its address replaced by cs1.address + imm. cincoffset thus leads to a valid capability in the destination register, while the non-capability instruction c.addi16sp would clear the capability in the destination register. Therefore, the bug is not visible in the result of the first instruction, as the integer value in the destination register is identical in both cases. However, the second instruction, cgethigh, shows the difference in behavior. The expected result is zero, as the previous instruction should clear the capability in x2 (csp). But the VP implementation returns the value 0xffff000000000000, which is the upper part of the Infinite Capability.

Another observation in Listing 4.2 is that the instruction reported by the TestRIG in Line 1 is labeled as an unknown instruction. This is interesting, because the TestRIG is apparently capable of generating compressed instructions, as this test case was generated by the TestRIG. However, the interpretation of the instructions for the comments seems to be done in another part of the TestRIG, which does not support compressed instructions. This behavior is not only observed for this particular instruction, but for all compressed instructions. The issue is probably caused by the TestRIG not being able to resolve the correct instruction, as the correct encoding mode is not known to the TestRIG.
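The encoding 0x7179 from Listing 4.2 can be checked by hand with a short sketch. The immediate layout follows the standard C.ADDI16SP encoding of the RISC-V C extension; the selection between the two mnemonics by a boolean `cap_mode` is a simplification of the flag_cap_mode check described above, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def is_c_addi16sp_encoding(insn):
    """Quadrant 01, funct3 011, rd = x2 (the encoding both mnemonics share)."""
    return (insn & 0b11) == 0b01 and ((insn >> 13) & 0b111) == 0b011 \
        and ((insn >> 7) & 0x1f) == 2


def c_addi16sp_imm(insn):
    """Reassemble the signed immediate nzimm[9|4|6|8:7|5] of C.ADDI16SP."""
    imm = (((insn >> 12) & 1) << 9) \
        | (((insn >> 3) & 0b11) << 7) \
        | (((insn >> 5) & 1) << 6) \
        | (((insn >> 2) & 1) << 5) \
        | (((insn >> 6) & 1) << 4)
    return imm - 0x400 if imm & 0x200 else imm


def mnemonic(insn, cap_mode):
    """Pick the interpretation based on the encoding mode, as the fix requires."""
    if not is_c_addi16sp_encoding(insn):
        return "other"
    return "c.cincoffsetimm16csp" if cap_mode else "c.addi16sp"
```

For 0x7179 the immediate evaluates to -48, and adding it to a stack pointer of zero gives 0xffffffffffffffd0 in 64-bit arithmetic, which matches the RWD value both implementations report for the first instruction.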

4.6 Coverage of the TestRIG

The final version of CHERI-RISC-V VP++ developed in this thesis passes all tests when running the TestRIG with 250 000 test cases in each of the three random test categories. The tests were run with the architecture string rv64icxcheri and the option test-include-regex set to rand, resulting in the generation of random test cases in the categories caprandom - Xcheri Extension Random Template, caprvcrandom - Xcheri RVC Extension Random Template and random - Random Template. To achieve better coverage, additional tests were run with the architecture string rv64imacxcheri, which adds support for atomic instructions and the multiplication extension. For the second run, the option test-include-regex was not specified, resulting in the generation of tests in all categories supported by the architecture. With 50 000 test cases per category, this results in 28 ⋅ 50 000 = 1 400 000 additional test cases. By default, the QuickCheckVEngine generates test sequences with a maximum length of 2048 instructions, although the average length of a generated test case is significantly shorter. According to the coverage file, a total of 1 849 021 852 instructions were executed over the 2 150 000 test cases, resulting in an average of 860 instructions per test case.

Code coverage is measured using gcov, a tool that is part of the GNU Compiler Collection (GCC) [54]. Measuring the code coverage gives insight into how meaningful the test cases generated by the TestRIG are. To add coverage measurement to the VP, it is rebuilt with the -fprofile-arcs -ftest-coverage flags, which instrument the code for coverage measurement. The coverage is then measured by running the VP with the generated test cases, which produces a coverage file that records how many times each line of code was executed. The coverage file is then processed by gcov to generate a Hypertext Markup Language (HTML) report that shows the coverage of each file line by line. The full coverage analysis is too large to be included in this thesis, but the top-level results are shown in Appendix A.

These results show that the overall coverage is rather low, with only 22.3% of lines and 29.7% of all functions executed. However, these results are misleading, as they include several files that are not relevant for the analysis. First, the vendor/* directory is excluded, as its files do not need to be tested. The next step is to exclude the files that are not relevant for the platform under test, as the VP implements multiple platforms and defines peripherals and components that are not used by all platforms. This includes files that are replaced by the CHERI implementation, such as the mem.h file, which is replaced by the cheri_mem.h file (which implements tagged memory) for all platforms that support CHERI. Additionally, files that are entirely related to extensions not enabled during the tests, such as the floating-point and vector extensions, are excluded. Other files that are not expected to be used, such as the syscall interface, the debug functions and the GDB server implementation, are also excluded. The implementation of the RVFI-DII is excluded as well, as it is only a helper and not the unit under test. Its coverage was generally high; only the error paths are never reached, as the RVFI communication worked as expected.

After everything not relevant for the analysis is excluded, the coverage of the remaining files is 34.6% of lines and 70.1% of functions executed. As these numbers still appear rather low, some more details must be given. As a detailed investigation of each file would exceed the scope of this thesis, the focus is on the files that were modified or added during the implementation of the CHERI extension.
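The test case totals quoted in this section can be rechecked with a few lines of arithmetic. The sketch assumes that the 250 000 test cases of the first run apply to each of the three random categories, which is what the stated total of 2 150 000 implies; the variable names are chosen for illustration only.

```python
random_run = 3 * 250_000     # caprandom, caprvcrandom and random
full_run = 28 * 50_000       # all categories of the second run
total_tests = random_run + full_run

executed_instructions = 1_849_021_852
average_length = executed_instructions / total_tests

print(total_tests, round(average_length))
```

The average lies well below the default maximum sequence length of 2048 instructions.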

4.6.1 Common/CHERI

The files in core/common/cheri are all added by the CHERI extension. The coverage of these files is 91.3% of lines, which is already quite high. But a detailed analysis gives some interesting insights. The capability implementation in cheri_capability.h has only one function not covered, which is getTop(). This function is not used by the implementation internally, but only for the cgettop instruction. Missing instructions are analyzed together with the instruction decoding in Section 4.6.4. Apart from this missing instruction, the coverage of the cheri_capability.h file is 100%.

Some more interesting observations can be made in the cheri_exceptions.h file, which implements the CHERI-specific exceptions and their handling. Based on the exception-to-string conversion, it can be seen that not all exceptions are raised during the tests. First, CapEx_None and the default handling are expected to be never executed. CapEx_UserDefViolation and CapEx_UnalignedBase are also expected to be never executed, as the specification marks them as reserved. More interesting are the CapEx_GlobalViolation and CapEx_PermitSetCIDViolation exceptions, which are defined in the specification, but are never raised in the implementation of the VP. Inspection of the Sail model shows that these exceptions are never raised in the Sail model either, which raises the question of why they are defined in CHERI ISAv9 at all. The last exception that is never raised during the tests is CapEx_PermitStoreLocalCapViolation. This exception is only raised if a capability’s global field is not set when it is used as authorizing capability for a store operation, which seems to be never the case in any of the generated test cases. It is up to future research to analyze whether the TestRIG is unable to clear the global field with its implemented instructions, or whether this has simply not happened due to the random nature of the test case generation.


The capability register implementation in cheri_regfile.h lacks coverage because its write method is never called. This is because the ISS accesses the registers directly. The write method is only used for external access, as the syscall interface would do. As syscalls are not tested using the TestRIG, the write method is never called. The other method that is never accessed is the show method, which is only used for printing the contents of all registers at the end of execution in trace mode. As trace mode is not enabled during the TestRIG tests, this method is not expected to be reached. With this in mind, the coverage of cheri_regfile.h is 100%.

The last file in this folder that does not have 100% coverage is cheri_sys_mem.h, which never reaches one line inside legalizeTcc, because PCC relocation is not enabled inside the VP. PCC relocation is a planned feature in CHERI ISAv9 that is not used in the current state of CHERI-RISC-V. If PCC relocation was enabled, integer pointers would be treated as offsets relative to the address of PCC. The same feature exists with DDC as base for pointers, but this feature is also not used in CHERI ISAv9.

Based on the detailed analysis of the files in core/common/cheri, it can be concluded that the TestRIG covers everything implemented there, except for the missing cgettop instruction and CapEx_PermitStoreLocalCapViolation, which is never raised.

4.6.2 RV64/CHERI64

The folder core/rv64/cheri64 contains all files added by the CHERI extension that are specific to the 64-bit RISC-V architecture. Total coverage of these files is 68.1% of lines and 65.3% of functions executed. In the file cheri_addr_checks.h, coverage is missing for two lines, which cannot be reached because PCC relocation is not enabled. The file cheri_prelude.cpp, which implements the architecture-specific EncCapability type and the conversion from and to the Capability type, has a coverage of 100%.

The last file in this folder is the cheri_mem.h file, which implements the CHERI variant of the memory interface, called CombinedTaggedMemoryInterface. Several lines in this file are not expected to be reached. This includes lines related to error handling (e.g., null pointer checks), tracing and debug output, and memory access via the DMI, which is not enabled. Besides these expected misses, the coverage reveals some more interesting insights. First, as already observed in Section 4.6.1, CapEx_PermitStoreLocalCapViolation is never raised. Second, the functions related to virtual memory management, like loading and storing PTEs and accessing the TLB, are never called. This is because the TestRIG only runs the pte test pattern if the S extension (Supervisor mode) is enabled. These tests were intentionally not enabled because of differences in the implementation of the TLB in the Sail model and the VP. Details on this difference are explained in Section 4.7.4. Third, the methods for loading instructions are never called, as instructions are not loaded from memory, but injected directly via RVFI-DII. Fourth, the strictly typed methods for loading and storing data (e.g., load_word) are never called. This is because with the CHERI implementation, the load and store instructions were changed to use the more generic handle_store_data_via_cap and handle_load_data_via_cap methods, which receive the size of the data to be loaded or stored as a parameter. The old typed methods are still implemented, but exist only as an artifact of the original CombinedMemoryInterface. They might be required once loading and storing is handled via the LSCache again, and are therefore not removed.

Another interesting fact is that the methods for atomic loading and storing of tagged data are never called, although the atomic and CHERI extensions are enabled in the architecture. Analysis of the executed instructions in iss_ctemplate.cpp shows that their related instructions are never executed. This is also the case for the sub-word atomic instructions newly added by the CHERI extension. The explanation for this is that these instructions are not generated by the TestRIG, which is further analyzed in Section 4.6.4.

Overall, the coverage shows that the implementation is tested quite well. The only obvious missed tests are related to the interaction of the atomic extension with the CHERI extension. As the methods related to the atomic extension are the more complex methods of the CombinedTaggedMemoryInterface, this marks a big gap in the coverage of the TestRIG.

4.6.3 Platform

The file platform/cheri/cheri_main.cpp is the main file of the CHERI platform implementation. While it reaches a coverage of only 83.3% of lines, all lines that are not reached are caused by the configured options the platform is started with. Lines not reached are related to the disabled debug_bus, instr_dmi, data_dmi, debug_runner and intercept_syscalls options. Also, the DirectCoreRunner is not used, as the RVFI-DII is implemented in the alternative RvfiDiiCoreRunner. Therefore, coverage of this file can be considered as 100% of relevant lines.
The file platform/common/cheri_memory.h implements the TaggedMemory architecture. The coverage of this file is 73.8% of all lines. However, none of the missing lines are relevant, leading to an effective 100% coverage of relevant lines. Some lines belong to the destructor of the TaggedMemory, which is never called, as the memory is never freed. Other lines are static assertions that ensure the write_data method is correctly called and the TagExtension of the TLM transaction is not a null pointer. Lastly, the get_direct_mem_ptr method is never called, as DMI is not enabled during the tests.
The other files in the platform folder are not directly related to the CHERI extension, and were not modified during the implementation. Nevertheless, they are included in the analysis, as they are a mandatory part of the CHERI-RISC-V VP++. The lines of code that were not covered by the TestRIG are related to async handling, debug features of the TLM handling, unused options and handling of incorrect options.

4.6.4 Common

Files in the core/common folder originate from the unmodified RISC-V VP++, but some of them were changed to support the CHERI extension. For this section, only the files that were modified are analyzed, although all of them are included in the coverage analysis.
The DBBCache-Dummy implemented in dbbcache.h was adapted for CHERI, as fetching and jumps are handled in this file. The uncovered lines are related to the fetching of instructions from memory, as this is not done when injecting instructions via RVFI-DII. Besides these functions, the dbbcache.h file has a coverage of 100% of relevant lines.
The file instr.cpp implements the decompression and decoding of fetched instructions. The overall coverage of this file is rather low, which is to be expected, as the vector and floating-point extensions are not enabled during the tests. This file provides detailed insight into the instructions that are not covered by the TestRIG. The coverage of the decompression method shows that all compressed instructions related to RV64 are used at least once. The analysis of the decoding gives the following list, which shows all instructions that are never decoded, although they should be supported by the architecture:
- cgettop: This instruction is not defined in the QuickCheckVEngine, although it is specified in CHERI ISAv9 that was released in September 2023. As the last version of the QuickCheckVEngine was released in April 2024, this is a bug in the QuickCheckVEngine, which should support the cgettop instruction.
- csetequalexact: This instruction was added in version 8.0 of the CHERI specification [6], but neither the TestRIG nor the Sail model have definitions of this instruction.
- jalr,pcc: This instruction is not in the TestRIG's instruction set, but is defined in the Sail model and the CHERI ISAv9.
- ccleartags: This instruction is marked for future use in CHERI ISAv9, and does not have an implementation in the Sail model. It is not included in the TestRIG's instruction set.
- fpclear: Only relevant if floating-point support is enabled, which is not the case.
- amoswap.c: This instruction repurposes the amoswap.q instruction. The same is done for the reserved load lr.q and store conditional sc.q instructions. However, lr.q and sc.q are defined in the TestRIG and get executed during the tests, while amoswap.q is missing for unknown reasons.

CHERI defines sub-word atomic instructions, required due to the precise bounds of CHERI. However, none of the four defined instructions (sc.b, sc.h, lr.b and lr.h) are defined in the TestRIG. This appears to be another issue of the QuickCheckVEngine, as the Sail model does implement these instructions. It is planned to report this issue to the developer of the QuickCheckVEngine for further investigation.

4.6.5 RV64

The folder core/rv64 contains files that are specific to the 64-bit RISC-V architecture. While all of these files originate from the unmodified RISC-V VP++, some of them were touched for the CHERI implementation. Those are analyzed in this section, while the others are ignored to avoid cluttering this section.

The file csr.h, which implements the CSRs including those added by CHERI, shows an interesting result. The method default_write64 is never invoked. This method would normally handle writes to all CSRs that do not require special constraints. In the used test pattern, however, CSRs are not tested at all, since the architecture string omits the Zicsr extension. This extension was disabled intentionally, because comparing the values of certain CSRs is meaningless due to their non-deterministic nature (see Section 4.7.3).

The file core/rv64/iss_ctemplate.cpp, which implements the ISS for the 64-bit architecture, is of particular interest, as it contains the implementations of all instructions, including the new CHERI related instructions. Since the coverage of this file only reached 25.5% of its lines, it must be analyzed in more detail. First, many lines in the ISS are only executed in trace mode, which was not enabled during the TestRIG tests. Second, some lines are static assertions that ensure the correct setup of the opcode switch table, as it is implemented using macros. Because the macros rely on the correct definitions of the opcode mapping table, some sanity checks are performed.

As explaining every line not reached would exceed the limits of this thesis, the following list names the instructions that were never executed during the tests and tries to clarify whether this is an issue. All instructions that are not listed were executed at least once during the tests, and all of their execution paths were fully covered.

- ecall is never called in Supervisor mode, only in Machine mode and User mode. This is because the S and N extensions are not enabled in the architecture string.
- Instructions related to the M extension are never executed, as the architecture string specified when running the tests does not include the M extension.
- Instructions related to the F and D extensions are never executed, as the architecture string specified when running the tests does not include the F or D extension.
- Instructions related to the V extension are never executed, as the architecture string specified when running the tests does not include the V extension.
- The sign extension path of the cgettype and ccopytype instructions for positive otype values is never reached. This is reasonable, as all specified otype values are negative values.
- Instructions that are never decoded, as explained in Section 4.6.4, are never executed either.

Besides the instruction implementations, iss_ctemplate.cpp handles traps, reading of CSRs and other functionality in separate methods. Based on their coverage, a few more gaps in the tests can be identified:

- Not all CSRs are read during the tests. While some of them are related to certain extensions, others should be readable in any implementation and should be tested.
- The same applies to writing CSRs. However, the problem is even more prominent here, as the default case, which writes all CSRs that do not require special handling, is never called.
- Trap handling is never performed in User mode, which is expected, because this feature was removed from the RISC-V specification, but is still implemented in the RISC-V VP++ as it does not break RISC-V compliance.
- Interrupt handling is not tested at all. However, this is expected, as the design of the TestRIG does not allow interrupts to be tested.
- Trap preparation (prepare_trap) is only handled in Machine mode.

Apart from the gaps explained above, the remainder of the ISS is fully tested. It can be concluded that although the overall coverage of this file is rather low, the lines of code that are actually relevant to the CHERI extension are covered quite well, with a few issues that need further investigation.

4.6.6 Summary

Based on the in-depth analysis of the coverage achieved by running the TestRIG with 2 150 000 test cases, it can be concluded that the TestRIG is able to cover most of the relevant and testable code of the CHERI-RISC-V VP++. One has to keep in mind that several functionalities of the VP cannot be tested using the TestRIG, by design. This includes actual fetching of instructions from memory, interrupt handling, the syscall interface and interaction with peripherals. Additionally, the results revealed several shortcomings of the TestRIG, like the missing instructions, which should be addressed in future work.

4.7 Problems not found with the TestRIG

Although the coverage analysis with the TestRIG shows promising results, certain flaws remained undetected during that phase and only surfaced later in the development process. The following sections present scenarios that escaped detection by the TestRIG but became apparent when attempting to boot CheriBSD in Chapter 6. These sections include an analysis of why the TestRIG failed to capture these bugs.

4.7.1 Trap Handling

When a trap occurs, RISC-V implementations are required to set the xtval register (mtval, stval, utval, depending on privilege), which contains additional information about the cause of the trap. This behavior remains unchanged by the CHERI extension. However, if the CHERI extension causes a trap, the xtval register must hold details about the capability exception, which include the index of the capability register that caused the exception as well as the cause. Although this was implemented incorrectly at first, the TestRIG did not find this bug. On further investigation, the reason is quite clear. While the TestRIG does generate instruction sequences that cause exceptions, it only compares the result of the instruction to the expected result. Therefore, the TestRIG checks for missing traps and incorrect destination register values in trap scenarios, yet it does not verify any side effects of the instruction. The issue is not only true for trap handling, but for all other instructions that cause side effects, such as the csrrw instruction, which manipulates CSRs. In terms of a trap, this results in the TestRIG not validating the contents of CSRs that are related to trap handling after a trap.

To further confirm this issue, the trap handling behavior of the VP was intentionally broken by always setting the xtval register to 0 on every trap. Then the TestRIG was run for 100 000 test cases, none of which found this bug. It is worth noting that this scenario could theoretically be detected by the TestRIG. To demonstrate this, a custom test case was created that first triggers a trap and then reads the CSRs related to trap handling. The test case, shown in Listing 4.3, successfully identifies the issue. The test case indicates that, given a sufficient number of test cases, the TestRIG is capable of finding this bug. It also implies that using a more advanced test case generation algorithm, which takes into account the side effects an instruction may cause, could improve the detection of similar issues.

    # .
    #>QCVENGINE_TEST_V2.0
    .4byte 0xfc4080db # cinvoke x1, x4 # -> CHERI Exception: Seal Violation
    .4byte 0x300210f3 # CSRRW ra (x1), tp (x4), 0x300 (MSTATUS_ADDR)
    .4byte 0x305210f3 # CSRRW ra (x1), tp (x4), 0x305 (MTVEC_ADDR)
    .4byte 0x340210f3 # CSRRW ra (x1), tp (x4), 0x340 (MSCRATCH_ADDR)
    .4byte 0x341210f3 # CSRRW ra (x1), tp (x4), 0x341 (MEPC_ADDR)
    .4byte 0x342210f3 # CSRRW ra (x1), tp (x4), 0x342 (MCAUSE_ADDR)
    .4byte 0x343210f3 # CSRRW ra (x1), tp (x4), 0x343 (MTVAL_ADDR)

    Listing 4.3: Test case to check trap handling
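The instruction words in Listing 4.3 follow directly from the standard I-type encoding of CSRRW, so a trap-checking sequence of this kind can also be generated by a script. The following is a minimal sketch, not part of the TestRIG or the QuickCheckVEngine. The helper names encode_csrrw and trap_check_lines are hypothetical, and only the instruction lines are produced, not the header of the test file.

```python
def encode_csrrw(rd, rs1, csr):
    """Encode CSRRW rd, csr, rs1 (I-type, opcode SYSTEM = 0x73, funct3 = 0b001)."""
    if not (0 <= rd < 32 and 0 <= rs1 < 32 and 0 <= csr < 4096):
        raise ValueError("operand out of range")
    return (csr << 20) | (rs1 << 15) | (0b001 << 12) | (rd << 7) | 0x73


# Machine-mode CSRs touched by trap handling (addresses as in Listing 4.3).
TRAP_CSRS = {
    "mstatus": 0x300,
    "mtvec": 0x305,
    "mscratch": 0x340,
    "mepc": 0x341,
    "mcause": 0x342,
    "mtval": 0x343,
}


def trap_check_lines(trapping_word, rd=1, rs1=4):
    """Return .4byte lines: one trapping instruction, then one CSRRW per trap CSR."""
    lines = [f".4byte 0x{trapping_word:08x}"]
    for name, addr in TRAP_CSRS.items():
        word = encode_csrrw(rd, rs1, addr)
        lines.append(f".4byte 0x{word:08x} # csrrw x{rd}, {name}, x{rs1}")
    return lines


for line in trap_check_lines(0xFC4080DB):  # cinvoke x1, x4 from Listing 4.3
    print(line)
```

Appending such a read-back sequence after every generated trapping instruction would be one way to make side effects visible in the compared destination register values, at the cost of longer test cases.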

4.7.2 Missing Instructions

As the code coverage analysis revealed in Section 4.6.4, the TestRIG is missing some instructions that are part of CHERI ISAv9. In Appendix B.2.2 of [6], atomic memory access instructions for sub-word operations are defined, which are required due to the precise capability bounds. These four instructions (lr.b, sc.b, lr.h, sc.h) seem to be missing in the TestRIG instruction set. However, these instructions are used by CheriBSD and are therefore mandatory for a complete implementation.

4.7.3 Non-deterministic Behavior

The TestRIG also struggles with naturally differing behavior of implementations. An example of this would be the register mtime, which can be read using CSR access instructions. According to the RISC-V specification [15], this register is a free-running timer that counts the number of cycles since the last reset, counted based on a fixed frequency. However, the specification does not define the frequency of this timer, leading to different implementations having different frequencies and thus different values in the mtime register. So different implementations might return different values for the mtime register without indicating a bug in either implementation. The same applies to other CSRs like mvendorid or marchid, and some interrupt related behavior can also differ between implementations. The misa register, which holds the supported ISA extensions, can differ between implementations as well; it was the first discrepancy that revealed this problem during these tests. However, this issue is known to the developers of the TestRIG and currently labeled as enhancement request #23 in the TestRIG repository (https://github.com/CTSRD-CHERI/TestRIG/issues/23). For the moment, test cases that fail due to this must be ignored manually.
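One conceivable way to automate the manual step is to exclude reads of implementation-defined CSRs from the comparison. The following is a hypothetical sketch and not a feature of the TestRIG: the function names and the choice of CSRs are assumptions, and only the CSR addresses and the Zicsr encoding are taken from the RISC-V specification.

```python
# CSR addresses as defined by the RISC-V specification.
NONDETERMINISTIC_CSRS = {
    0x301,  # misa
    0xC01,  # time (read-only shadow of mtime)
    0xF11,  # mvendorid
    0xF12,  # marchid
}


def csr_address(instr):
    """Return the CSR address if instr is a Zicsr instruction, else None."""
    opcode = instr & 0x7F
    funct3 = (instr >> 12) & 0x7
    # SYSTEM opcode; funct3 0b000 (ecall, ebreak, ...) and 0b100 are not Zicsr encodings.
    if opcode == 0x73 and funct3 not in (0b000, 0b100):
        return (instr >> 20) & 0xFFF
    return None


def results_match(instr, rd_value_a, rd_value_b):
    """Compare two destination register values, skipping implementation-defined CSRs."""
    if csr_address(instr) in NONDETERMINISTIC_CSRS:
        return True
    return rd_value_a == rd_value_b


csrrs_misa = 0x301020F3  # csrrs x1, misa, x0
print(results_match(csrrs_misa, 0x1234, 0x5678))  # differing values are tolerated
```

A filter like this hides real bugs in the excluded CSRs as well, so the set should be kept as small as possible.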

4.7.4 Virtual Addressing

While the TestRIG has a test pattern for virtual memory management called pte, it could not be used for the comparison of CHERI-RISC-V VP++ and the Sail model. This is caused by very different design decisions in the implementation of the TLB and the page table walks: in the VP, the TLB is split for each type of access (load, store, fetch), while the Sail model uses a single TLB for all accesses. While both implementations have their advantages and both are valid according to the RISC-V specification, this leads to different behavior in the two implementations. Thus, the TestRIG is not able to compare the two implementations, as they report different memory accesses when performing a page table walk.
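Why a split and a unified TLB produce different memory access traces can be illustrated with a toy model. The sketch below is an assumption-laden simplification: the TLBs are unbounded, permissions and invalidation are ignored, and it mirrors neither the VP nor the Sail model internally. The function name page_table_walks and the page number are hypothetical.

```python
def page_table_walks(accesses, split_tlb):
    """Return the accesses that miss the TLB and therefore trigger a page table walk.

    accesses: sequence of (kind, virtual_page), kind in {"load", "store", "fetch"}.
    split_tlb: True gives every access kind its own TLB, False shares one TLB.
    """
    tlbs = {}
    walks = []
    for kind, vpage in accesses:
        tlb = tlbs.setdefault(kind if split_tlb else "shared", set())
        if vpage not in tlb:
            walks.append((kind, vpage))  # miss: page table entries are read from memory
            tlb.add(vpage)
    return walks


trace = [("load", 0x80000), ("store", 0x80000), ("load", 0x80000)]
print(len(page_table_walks(trace, split_tlb=True)))   # 2
print(len(page_table_walks(trace, split_tlb=False)))  # 1
```

In this model the store to an already loaded page walks the page table again only with the split TLB, which is exactly the kind of extra memory access that makes the two traces diverge.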

4.8 Conclusion

Overall, the TestRIG provides a good starting point for testing a new RISC-V implementation and the CHERI extension. It is able to cover most of the relevant code and can be used to find bugs in the implementation. The automatic test case generation and test case reduction allow for quick and efficient testing of the implementation. However, as the results show, the TestRIG is not able to find all bugs in the implementation. While some bugs are missed due to the nature of random test case generation, others can never be found by the TestRIG due to design constraints. Because using the RVFI-DII interface requires modifications in the control flow of the tested implementation, the developer must take special care to ensure that the behavior of the implementation under test is identical when controlled by the TestRIG and when running in its regular operation mode.

While passing thousands of test cases generated by the TestRIG is a good indicator of a correct implementation, it is by no means a guarantee that the tested implementation is functionally equivalent to the reference implementation. Additionally, even if the TestRIG could prove functional equivalence, this still does not guarantee a correct implementation, as the reference implementation could also be incorrect. This issue can be seen when running the TestRIG to compare the working CHERI-QEMU model to the Sail model, which are both maintained by the CTSRD team, but do not pass all tests generated by the TestRIG. Even with all limitations of the TestRIG in mind, it can be a very useful tool for early verification of a new RISC-V implementation, and it definitely helped to create a solid foundation for the CHERI-RISC-V VP++ that can now be used for further evaluation of CHERI.

Chapter 5

Running Bare-Metal Software on
CHERI-RISC-V VP++

With verification of the CHERI-RISC-V VP++ implementation completed, the next step is to explore its capabilities by running actual CHERI-enabled software. To fully understand the implications of CHERI on software, a bottom-up approach was chosen for this thesis. Consequently, the following sections begin with minimal assembly programs, gradually progressing to more sophisticated C code examples. This approach allows a clear view of low-level mechanisms that are hidden by higher-level languages and abstractions. All examples shown in this chapter can be built for bare-metal CHERI systems and executed on the CHERI-RISC-V VP++. The examples are chosen to not only demonstrate the functionality of the CHERI implementation, but also to explain the basic principles of CHERI. All programs listed in the following sections are compiled using the capability-enabled LLVM toolchain introduced in Section 2.4.9. The programs are built using clang, with the following arguments:

clang --target=riscv64-unknown-elf -march=rv64imafdxcheri -mabi=l64pc128d -nostdlib

This command sets the target triple to the RISC-V architecture of an unknown vendor running on bare-metal. The march argument defines the supported instruction set, which includes the CHERI extension (xcheri). The mabi argument sets the Application Binary Interface (ABI) to the 64-bit ABI with 64-bit stack alignment and 128-bit capabilities in pure capability mode, and enables double-precision floating point. The nostdlib argument prevents the inclusion of the standard library, which is not yet fully available for bare-metal CHERI systems. For linking, a custom linker script is used, which defines the sections .text, .data, and .bss, as well as the entry point of the program. The generated output is an ELF file, which can be executed by the CHERI-RISC-V VP++. This is done by running the cheri-vp platform using the following command:

cheri-vp --intercept-syscalls --cheri-purecap

The argument --intercept-syscalls enables the interception of system calls, which are then handled directly in the ISS. The argument --cheri-purecap enables the pure capability mode by default, which is required when running programs built in pure capability mode on bare-metal.

5.1 CHERI-enabled RISC-V Assembly Programs

Although CHERI is mainly designed to allow easy retrofitting of existing codebases, its design goal is primarily focused on higher-level languages, like C and C++. Because CHERI introduces new instructions to the ISA, it is obvious that assembly programs need modifications to make use of this new functionality. The effects and changes of CHERI on assembly programs are best understood by comparison to a non-CHERI program. Therefore, the next section will first present a simple non-CHERI assembly program, which is used as the baseline for upcoming explanations. Based on this simple example, the following sections will describe the changes required to make the program executable on a CHERI-enabled system in pure capability mode, in this case the CHERI-RISC-V VP++.

5.1.1 Initial Assembly Implementation

The chosen example is a RISC-V assembly program that does not use any CHERI features, as shown in Listing 5.1. The program simply adds two numbers and returns the result as the exit code of the program. The program execution is started in Line 9 at the label _start, which is defined as the entry point of the program in the linker script. The first instruction lla ra, main loads the address of the label main into the register ra, which is then used to jump to the function by executing jalr ra. In the main function, two numbers are loaded into the temporary registers t0 and t1 using li instructions. These two numbers are then added together using the add instruction, which stores the result in the register a0, which holds the return value of a function according to the RISC-V calling convention. Finally, the program jumps to the exit label (Line 15). There the value 93 is loaded into the register a7, which is the system call number for exiting a program. The last instruction ecall is used to trigger the system call, which terminates the program and returns the value in a0 to the OS. When compiled using the standard riscv-gnu toolchain, this code can be executed on the RISC-V VP++ and the return value is the expected sum of 42 + 84 = 126. The return value can be observed by inspecting the return value of the VP after execution. By using a debugger, or running the VP in trace mode, which prints executed instructions, it becomes evident that all instructions execute in the expected order. This simple example forms the foundation for understanding how CHERI changes the execution model. In the following, the same program is revisited in the context of pure-capability mode, where some instructions must be replaced by their capability-aware counterparts.

5.1.2 Modifying Assembly for CHERI

Trying to build the code from Listing 5.1 with a CHERI-aware toolchain in pure capability mode fails. The reported errors are due to the instructions ret, lla and jalr, which are not available in capability mode. Instead, their capability-aware counterparts must be used, leading to the modified code shown in Listing 5.2.

The instruction lla is a regular RISC-V pseudo instruction that gets translated to an auipc instruction, loading the upper bits of the label's address into the register ra, followed by addi, which adds the lower bits of the address to the register ra. However, with CHERI in pure capability mode, the use of auipc is not allowed. Instead, the instruction auipcc must be used, which writes PCC into the destination capability register and modifies its address in the same way as auipc would do. While addi could be used in capability mode, it would then invalidate the capability in cra, as it is a non-capability-aware instruction. Instead, cincoffset is used, which increments the address without invalidating the capability. The resulting instructions are combined in the new pseudo instruction cllc, which expands to the explained CHERI instructions, but is functionally equivalent to lla in a non-CHERI context. The replacement can be seen in Lines 10 and 12 of Listing 5.2. At this point it is important to understand that while the label main is still an integer address value, the loaded value in cra is now a capability, derived from PCC.

Also, the instruction jalr is not available in capability mode; instead cjalr must be used, which is a capability-aware version of the jump and link register instruction. This replacement can be seen in Lines 11 and 13. The instruction cjalr cs1 is a short form pseudo notation that is expanded to cjalr cra, cs1, 0, similar to how jalr rs1 is a short form of jalr ra, rs1, 0. This short form notation is commonly used in RISC-V assembly code when jumping to a function without an immediate offset. In Lines 11 and 13, the register cra, which is the capability extended version of ra, is used as the register cs1, which holds the target address of the jump. This leads to the full expanded form cjalr cra, cra, 0, which means that the program jumps to the address stored in cra and stores the return address in cra as well. The CHERI-enabled cjalr instruction first checks if cra authorizes the instruction and then sets cra to the next instruction's PCC, replaces PCC with cra and increments the address of PCC by the immediate value.

As ret is a pseudo instruction that translates to a jalr instruction, it must be replaced by cjalr, or by the new pseudo instruction cret, which is used in Line 6. The replacements lead to the assembly code shown in Listing 5.2. While this code now compiles, runs on the CHERI-RISC-V VP++ and shows the same results, additional considerations are required for more sophisticated programs.

1  .section .text
2  main:
3   li           t0, 42
4   li           t1, 84
5   add          a0, t0, t1
6   ret
7
8  .globl _start
9  _start:
10  lla          ra, main
11  jalr         ra
12  lla          ra, exit
13  jalr         ra
14
15 exit:
16  li           a7, 93
17  ecall

        Listing 5.1: Simple assembly program without CHERI support

1  .section .text
2  main:
3   li           t0, 42
4   li           t1, 84
5   add          a0, t0, t1
6   cret
7
8  .globl _start
9  _start:
10  cllc         cra, main
11  cjalr        cra
12  cllc         cra, exit
13  cjalr        cra
14
15 exit:
16  li           a7, 93
17  ecall

        Listing 5.2: Simple assembly program with CHERI support
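The split of a PC-relative address into an upper part (auipc or auipcc) and a lower part (addi or cincoffset) follows the usual RISC-V convention for 20-bit and signed 12-bit immediates. The following is a minimal sketch of that arithmetic. The function names and the example addresses are hypothetical, and the capability aspects (tag, bounds, permissions inherited from PCC) are deliberately left out.

```python
def split_pcrel(offset):
    """Split a PC-relative offset into (hi20, lo12).

    hi20 is the upper immediate of auipc/auipcc, lo12 the signed 12-bit
    immediate of the following addi/cincoffset. Adding 0x800 before the
    shift compensates for lo12 being sign-extended.
    """
    hi20 = (offset + 0x800) >> 12
    lo12 = offset - (hi20 << 12)
    if not -(1 << 19) <= hi20 < (1 << 19):
        raise ValueError("offset is out of range for an auipc-based sequence")
    return hi20, lo12


def resolve(pc, hi20, lo12):
    """Address produced by the two-instruction sequence executed at pc."""
    return pc + (hi20 << 12) + lo12


pc, target = 0x100C0, 0x100B0  # hypothetical addresses of _start and main
hi20, lo12 = split_pcrel(target - pc)
print(hi20, lo12, hex(resolve(pc, hi20, lo12)))
```

With cllc, the same address arithmetic is applied to a copy of PCC, which is why the result is a capability carrying the bounds and permissions of PCC rather than a plain integer.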

5.1.3 Initializing the Stack Pointer

In the RISC-V VP++ the stack pointer is initialized during the init procedure of the ISS. At this point the stack pointer is set to the end of the memory region that is passed to the ISS. Therefore, no additional stack pointer initialization is required when running non-CHERI code on the RISC-V VP++. However, in CHERI mode the stack pointer is extended to a capability (csp), like all other registers in the RegFile. To allow loading and storing data on the stack, csp must be a valid capability with the appropriate permissions. A valid csp could be achieved by setting csp to the Infinite Capability during the initialization of the VP, but this would differ from the specified behavior of the CHERI architecture, which requires all registers to be initialized with the Null Capability. Instead, programs must initialize the stack pointer capability themselves. Programs can only derive capabilities from existing ones, and never generate them from scratch. This is the application of CHERI's principles of least privilege and intentional use. Therefore, the stack pointer capability must be derived from the DDC in the bootstrap code. As the stack pointer already holds the end of the memory region, sp is first saved to a temporary register t0, and the address of csp is set to t0 after csp is set to DDC. The resulting assembly code is shown together with the changes from the next section in Listing 5.5, where Lines 11 to 14 perform the described stack pointer initialization.
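The rule that capabilities can only be derived from existing ones can be illustrated with a toy model. The sketch below is a strong simplification under stated assumptions: it has no compressed bounds, sealing or permission bits, the class and method names are hypothetical, the memory end address is made up, and the DDC is assumed to be a root capability covering the whole address space.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Capability:
    """Toy capability with a tag, bounds [base, top) and an address."""
    tag: bool
    base: int
    top: int
    address: int

    def with_address(self, address):
        # Changing the address never creates authority: tag and bounds are kept.
        return replace(self, address=address)

    def with_bounds(self, length):
        # New bounds must lie within the old ones (monotonicity).
        new_base, new_top = self.address, self.address + length
        if not self.tag or new_base < self.base or new_top > self.top:
            raise PermissionError("bounds may only shrink on a tagged capability")
        return replace(self, base=new_base, top=new_top)

    def check_access(self, size):
        return self.tag and self.base <= self.address and self.address + size <= self.top


NULL_CAP = Capability(tag=False, base=0, top=0, address=0)

MEM_END = 0x8400_0000  # hypothetical end of memory, as found in sp at startup
ddc = Capability(tag=True, base=0, top=1 << 64, address=0)  # assumed root DDC

csp_at_reset = NULL_CAP.with_address(MEM_END)  # only an address, no authority
csp = ddc.with_address(MEM_END)                # derived from DDC as in the bootstrap code

print(csp_at_reset.check_access(8))  # False
print(csp.check_access(8))           # True
```

In a real program it would be natural to narrow csp further to the stack region with a bounds-setting instruction, which corresponds to with_bounds in this model.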


label:
    auipcc  ca0, %captab_pcrel_hi(symbol)  # R_RISCV_CHERI_CAPTAB_PCREL_HI20(symbol)
    clc     ca0, %pcrel_lo(label)(ca0)     # R_RISCV_PCREL_LO12_I(label)

            Listing 5.3: Expansion of the clgc pseudo instruction

5.1.4 The Global Capability Table

While the code shown in Listing 5.2 does execute correctly, it still has a critical flaw. According to [55], the usage of cllc is heavily discouraged, as the resulting capability's bounds and permissions are derived from PCC, which cannot be guaranteed to be safe. This is because cllc does not recompute bounds or restrict permissions based on the loaded value, but simply preserves the bounds and permissions of PCC, which may be either too permissive or too restrictive depending on the execution context. As such, this can undermine CHERI's security guarantees by circumventing the principles of least privilege and intentional use. Therefore, it is recommended to use cllc only when absolutely necessary, such as in low-level startup code.

Instead, it is recommended to use clgc ca0, symbol, which is a capability-aware pseudo instruction that expands to the assembly instructions and relocations shown in Listing 5.3. This clgc instruction is the functional equivalent of the la pseudo instruction, which loads the address of a symbol from the Global Offset Table (GOT) when assembling position-independent code for the base RISC-V architecture. The GOT is a structure used in conventional compilation models to hold absolute addresses of global symbols (such as variables and functions) for use in position-independent code, allowing the linker to resolve references at runtime. In CHERI, the Global Capability Table (GCT) serves a similar purpose, but holds capabilities instead of plain pointers. The GCT is represented by the .captable section in the ELF file. The .captable is created by the compiler and completely replaces the traditional GOT. The clgc instruction loads a capability from the GCT into the specified capability register.

To create a valid capability and follow the principle of least privilege, the compiler must understand the context in which the capability is used. In the example from above, this requires specifying the sizes of main and exit and declaring them as functions. These changes can be seen in Lines 2 and 8 of Listing 5.5.

It is not possible for the compiler to store the required capabilities in the .captable section, as traditional memory (and therefore ELF binary files) cannot hold capabilities, due to the out-of-band Tag bit. Instead, the compiler emits capability relocations (__cap_relocs) to describe how the capability table should be initialized. These relocations contain all information required to initialize the .captable section located in the tagged memory of the CHERI system. The format of the relocations is specified in [55]. A single relocation is 40 bytes in size and contains the following 5 fields of 8 bytes each:

- Location: The virtual address where the capability is to be stored.
- Base: The virtual address of the symbol being pointed to. This is the base of the derived capability.
- Offset: The offset to add to the base address to get the final address of the capability.
- Length: The length of the capability, which is used to set the bounds of the capability.
- Flags: Flags that specify the permissions of the capability. Only two bits are defined: The Most Significant Bit (MSB) indicates an executable capability. If it is not set, the capability is considered a data (non-executable) capability. The second MSB marks the capability as read-only, permitting only loads. If it is not set, the capability permits both loads and stores.
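The relocation layout described above maps directly onto a fixed-size record of five 64-bit fields. The following is a minimal sketch of a parser for the raw contents of a __cap_relocs section. It assumes little-endian field encoding, as is usual for RISC-V; the function and constant names are hypothetical, and the sample values are made up for illustration.

```python
import struct

# location, base, offset, length, flags: five unsigned 64-bit fields.
CAP_RELOC = struct.Struct("<5Q")  # little-endian is an assumption
FLAG_EXECUTABLE = 1 << 63         # most significant bit
FLAG_READ_ONLY = 1 << 62          # second most significant bit


def parse_cap_relocs(section):
    """Decode the raw bytes of a __cap_relocs section into a list of dicts."""
    if len(section) % CAP_RELOC.size:
        raise ValueError("section size is not a multiple of 40 bytes")
    relocs = []
    for location, base, offset, length, flags in CAP_RELOC.iter_unpack(section):
        relocs.append({
            "location": location,
            "base": base,
            "offset": offset,
            "length": length,
            "executable": bool(flags & FLAG_EXECUTABLE),
            "read_only": bool(flags & FLAG_READ_ONLY),
        })
    return relocs


# Made-up example entry: an executable capability for a 32-byte function.
sample = CAP_RELOC.pack(0x2000, 0x1000, 0, 0x20, FLAG_EXECUTABLE)
for reloc in parse_cap_relocs(sample):
    print(reloc)
```

The section bytes themselves could be obtained with any ELF reader; that step is omitted here to keep the sketch self-contained.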


#include <cheri_init_globals.h>
#include "init_globals.h"

void init_globals() {
    cheri_init_globals_3(__builtin_cheri_global_data_get(),
                         __builtin_cheri_program_counter_get(),
                         __builtin_cheri_global_data_get());
}

                    Listing 5.4: Initialization of the GCT


1  .section .text
2  .type main, @function
3  main:
4   li           t0, 42
5   li           t1, 84
6   add          a0, t0, t1
7   cret         # Pseudo for cjalr cnull, cra
8  .size main, . - main
9  .globl _start

10 _start:
11  # Stack pointer initialization
12  addi         t0, sp, 0        # Store current sp
13  cspecialr    csp, ddc         # Set csp to DDC
14  cincoffset   csp, csp, t0     # Set addr of csp to previous sp
15  cllc         ct0, init_globals
16  cjalr        ct0
17  clgc         cra, main
18  cjalr        cra
19  clgc         cra, exit
20  cjalr        cra
21 .type exit, @function
22 exit:
23  li           a7, 93
24  ecall
25 .size exit, . - exit

        Listing 5.5: Simple Assembly Program with CHERI Support and GCT

In a dynamically linked object, the dynamic linker fills the .captable section during loading. In a bare-metal context, however, where executables are statically linked, it is up to the startup code to fill the .captable section with the correct capabilities. Fortunately, the LLVM toolchain provides a function that can be used for this initialization. This function is called cheri_init_globals_3 and can be found in the file cheri_init_globals.h of the CHERI LLVM project. The simplest solution for calling this function is to create a minimal C program, as shown in Listing 5.4, and call the init_globals function at the beginning of the _start function. The C program is then compiled using clang, with exactly the same arguments as when building the assembly code. The resulting object file is then linked with the object file of the compiled assembly code in the linking step. The clang linker allows specifying multiple object files, which are then linked together to create the final executable. With all the changes applied, the resulting assembly code is given in Listing 5.5. Notice that the call of init_globals is still done using cllc, as at this point the GCT is not yet initialized. But as soon as this is done, all following calls to functions make use of the GCT. Although the usage of the GCT may appear unnecessary in this simple example, it serves as a fundamental building block for all subsequent examples in this thesis. This setup is essential not only because the C compiler automatically emits the required relocations for function calls and relies on the clgc instruction by default, but also due to CHERI's strict monotonicity principles.


    [N] Name    Type     Addr  Size
    [0]         NULL     00000 0000
    [1] .text   PROGBITS 100b0 0034

    Listing 5.6: Default Section Headers

    [N] Name         Type     Addr  Size
    [0]              NULL     00000 0000
    [1] __cap_relocs PROGBITS 10000 0050
    [2] .text        PROGBITS 10050 02c4
    [3] .captable    PROGBITS 10320 0020

    Listing 5.7: Section Headers with Capability Relocation enabled

These monotonicity guarantees enforce that capabilities can only be derived from existing ones and cannot be created arbitrarily at runtime. As a result, all required static capabilities must be defined ahead of time during the build process. Therefore, even in very small scenarios, setting up the GCT is a necessary step to ensure correctness and compatibility with the CHERI architecture. To understand the GCT in more detail, the compiled and linked output of the code can be inspected using the readelf tool, which is part of the CHERI LLVM toolchain. The readelf -S outputs of the binaries from Listings 5.1 and 5.5 are compared in Listings 5.6 and 5.7. This comparison makes immediately clear that the sections .captable and __cap_relocs are present in the CHERI-aware ELF binary, but not in the non-CHERI ELF binary. The corresponding output of readelf -S for the CHERI-aware binary that only uses cllc (Listing 5.2) is identical to the output for the ELF binary built from Listing 5.1, which shows that the sections are only created if a global capability is required. As mentioned before, a single relocation is 40 bytes in size. As the output shows, the size of the __cap_relocs section is 0x50 = 80 bytes, which means that two relocations are present in the binary. Based on the code this is expected, as pointers (capabilities) for both the main and the exit function are required. The relocations can then be further analyzed using readelf -x. The output of this is given in Listing 5.8. Be aware that the values are stored in little-endian byte order, so the bytes of each 64-bit field appear reversed in the dump.
Interpreting this output according to the format specified in [55] leads to the following values:

Capability 1 (main):
- Location = 0x00000000 00010320
- Base = 0x00000000 00010050
- Offset = 0x00000000 00000000
- Length = 0x00000000 00000010
- Permissions = 0x80000000 00000000

Capability 2 (exit):
- Location = 0x00000000 00010330
- Base = 0x00000000 00010088
- Offset = 0x00000000 00000000
- Length = 0x00000000 00000008
- Permissions = 0x80000000 00000000

These contents do match with the expectations when looking at the code in Listing 5.5. The addresses of the capabilities are within the .captable section, which starts at 0x10320. The bases point to the addresses of the two functions, which do match with the disassembly of the built binary. The offsets are 0, as the entry point of each function is at its beginning. The lengths are 16 bytes for main and 8 bytes for exit, because the main function consists of 4 instructions, each 4 bytes in size, and the exit function consists of 2 instructions. The permissions MSB is set for both capabilities, which means that they are executable, as expected for capabilities pointing to functions.



    0x00010000 20030100 00000000 50000100 00000000
    0x00010010 00000000 00000000 10000000 00000000
    0x00010020 00000000 00000080 30030100 00000000
    0x00010030 88000100 00000000 00000000 00000000
    0x00010040 08000000 00000000 00000000 00000080

    Listing 5.8: Content of the __cap_relocs section
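
To make the byte order of Listing 5.8 easier to follow, the following minimal sketch decodes the dump on a host machine. It assumes that each 40-byte entry consists of five little-endian 64-bit fields in the order Location, Base, Offset, Length, Permissions, as used in the interpretation of the listing. The type and function names (cap_reloc_t, read_le64, decode_cap_reloc, cap_reloc_is_function) are illustrative and are not taken from the CHERI LLVM sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of one __cap_relocs entry in bytes (5 fields x 8 bytes). */
enum { CAP_RELOC_SIZE = 40 };

/* Illustrative view of one entry; field names follow the text, not the
 * toolchain headers (assumption). */
typedef struct {
    uint64_t location;    /* address of the slot inside .captable */
    uint64_t base;        /* address of the referenced symbol */
    uint64_t offset;      /* offset of the capability into the symbol */
    uint64_t length;      /* size of the symbol in bytes */
    uint64_t permissions; /* MSB set: capability points to a function */
} cap_reloc_t;

/* Raw bytes of Listing 5.8 in the order printed by readelf -x. */
static const uint8_t cap_relocs_dump[2 * CAP_RELOC_SIZE] = {
    0x20, 0x03, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x50, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80,
    0x30, 0x03, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x88, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80
};

/* Assemble a 64-bit value from 8 bytes stored least significant first. */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t value = 0;
    for (int i = 7; i >= 0; i--) {
        value = (value << 8) | p[i];
    }
    return value;
}

/* Decode the entry with the given index from a raw section dump. */
static cap_reloc_t decode_cap_reloc(const uint8_t *section, size_t index)
{
    const uint8_t *entry = section + index * CAP_RELOC_SIZE;
    cap_reloc_t reloc;
    reloc.location    = read_le64(entry + 0);
    reloc.base        = read_le64(entry + 8);
    reloc.offset      = read_le64(entry + 16);
    reloc.length      = read_le64(entry + 24);
    reloc.permissions = read_le64(entry + 32);
    return reloc;
}

/* The most significant permission bit marks a function capability. */
static int cap_reloc_is_function(const cap_reloc_t *reloc)
{
    return (reloc->permissions >> 63) != 0;
}
```

Decoding entry 0 of cap_relocs_dump yields the values listed for main, and entry 1 yields those listed for exit.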

The .captable section is only 32 bytes in size, which is exactly the size of two capabilities in RV64, as each capability is 16 bytes in size. The size of the .captable showcases the implications of the out-of-band Tag bit, which is not visible in the size of the capability table section. Instead, the compiler assumes that in a CHERI-enabled architecture, the memory is capable of holding capabilities. Further inspection of .captable using readelf -x shows that it is actually empty after linking. The results of these experiments lead to the following conclusion: The CHERI-aware toolchain does not store capabilities in the binary, but instead uses relocations to describe how the GCT should be initialized. The startup code is responsible for filling the .captable section with valid capabilities, which includes setting the out-of-band Tag bit to 1. The relocation mechanism allows binaries built for CHERI-aware systems to be stored in a traditional ELF format on unmodified file systems.

5.2 CHERI-enabled C Programs

Building on the understanding of simple assembly programs on bare-metal CHERI systems, the next step is to explore how C programs can be executed on the same setup. As discussed in Section 2.4.2, a central goal of CHERI is to provide a transition path for existing C and C++ codebases, enabling them to benefit from CHERI’s memory-safety features with minimal source code changes. While the previous sections showed that assembly code requires considerable modification to enable CHERI, the following sections show that it is mostly possible to run C code unmodified. However, certain caveats arise when targeting bare-metal systems that must be taken into account. Before diving into specific examples, the next section takes a closer look at the requirements for running C programs on bare-metal CHERI systems.

5.2.1 Prerequisites

As is the case for all bare-metal programs (regardless of CHERI), bootstrap code is required that sets up the environment and calls the actual main function. With CHERI enabled, the bootstrap code must also initialize the .captable as explained in Section 5.1.4, as the C compiler makes use of capability relocations for global pointers. Some additional modifications are required when standard C library functions are used, as the standard C library is typically designed to work in an environment with an OS providing essential system calls, like file I/O and memory allocation. Since there is no OS on bare-metal, functions like printf or malloc are missing an underlying implementation. To run on bare-metal, a minimal libc implementation is required that provides the necessary functions. While there is a port of the newlib C library for CHERI-RISC-V, it still requires manual implementation of many low-level system functions, like _write, _read or _sbrk. As not many standard C library functions are used in the examples, newlib is not used and custom variants of the required functions are implemented instead. These directly access the required peripherals in the VP using the memory mapped I/O interface. As an example, printf (and the underlying putc) is implemented to directly interface with an abstract terminal peripheral located at address 0x20000000. This peripheral simply prints every received character to the console.


     1  // Address of memory mapped terminal peripheral
     2  #define TERMINAL_BASE_ADDR 0x20000000
     3
     4  // Non-CHERI variant
     5  inline void putc(char c){
     6      // Will cause a TagViolation if CHERI is enabled
     7      *(char *)TERMINAL_BASE_ADDR = c;
     8  }
     9
    10  // CHERI compliant variant
    11  inline void putc(char c) {
    12      char *TERMINAL_ADDR = (char *)__builtin_cheri_offset_increment(cheri_ddc_get(), TERMINAL_BASE_ADDR);
    13      // Will write the character c to the terminal peripheral located at 0x20000000
    14      *TERMINAL_ADDR = c;
    15  }

       Listing 5.9: Peripheral Access in Bare-Metal CHERI Software

It is important to note that accessing a peripheral directly by dereferencing a pointer with the address of the peripheral does not work with CHERI. As the pointer is not a valid capability, the dereferencing causes a CHERI TagViolation exception. Instead, a valid capability to the peripheral must be created. This is done by creating a capability based on DDC, and then setting its address to the peripheral's address. Listing 5.9 shows an example of such a peripheral access. The first attempt (Line 7) shows how the peripheral access is done in a non-CHERI compliant way that causes a TagViolation. The second approach creates a valid capability based on DDC (Line 12). This is done by using the built-in function __builtin_cheri_offset_increment, which creates a new capability by incrementing the offset of an existing capability by a given integer value. In this case, DDC is used as the base capability, and the peripheral's address is used as the offset. This results in a valid capability that has the same permissions as DDC, but points to the peripheral's address, and thus can be used to access the peripheral. The created capability is then used for the peripheral access in Line 14, which does not cause an exception and successfully writes the character to the terminal peripheral. If this capability is required more than once, it can also be stored in a global variable and initialized only once. A valid capability pointing to the peripheral's address could also be obtained by using the GCT explained previously in Section 5.1.4. With working bootstrap code and implementations of the required standard C library functions in place, it is now possible to run C programs on the bare-metal CHERI-RISC-V VP++.

5.2.2 Evaluating Boundary Protection

The first example demonstrates how CHERI can prevent a common memory corruption bug, namely reading beyond the bounds of an array. The code shown in Listing 5.10 creates an array of 5 elements and initializes them to 0, then obtains a pointer to the array and calculates the length of the array. The example then loops over the length of the array and tries to read each element. However, the loop is deliberately configured to exceed the bounds of the array (i <= length + 5 in Line 14 of Listing 5.10). This illustrates a typical programming error in C, a buffer over-read, which can lead to critical security vulnerabilities. Such issues are widely exploited in real-world attacks, notably buffer overflows that permit remote code execution [56]. This test is constructed in such a way that it can be built with a standard C compiler and executed on a normal RISC-V CPU. The results are predictable: the program reads beyond the bounds of the array, accessing memory that contains values that are not part of the array. Instead, numbers that might appear random, but are actually values stored in adjacent memory locations, are printed to the console, as shown in Listing 5.11. Although this may not seem problematic in this simple example, it reflects a significant real-world threat and is reminiscent of the notorious Heartbleed bug, where attackers requested a length greater than the actual payload size [57].

     1  int main()
     2  {
     3      int32_t array[5];
     4      for (int i = 0; i < 5; i++)
     5      {
     6          // Manually set each element to 0
     7          // to avoid use of memset
     8          array[i] = 0;
     9      }
    10      int32_t *p_array = array;
    11
    12      uint64_t length = sizeof(array) / sizeof(array[0]);
    13      // Intended read over the bounds
    14      for (uint32_t i = 0; i <= length + 5; i++)
    15      {
    16          // pp_cap(p_array + i); // Only relevant for CHERI, prints capability
    17          printf("Count: %d, Value: %d\n", i, *(p_array + i));
    18      }
    19
    20      return 0;
    21  }

              Listing 5.10: Source code of the read beyond bounds example

When the same source code (ignoring the different bootstrap process for CHERI) is compiled using the CHERI-RISC-V toolchain and executed on the CHERI-RISC-V VP++, the results differ from those of a standard build running on the unmodified VP. The program output in Listing 5.12 shows that on the CHERI-RISC-V VP++, the program does not terminate successfully and raises a CHERI LengthViolation instead. This exception occurs when the program attempts to dereference a pointer whose address lies outside the bounds of the array in Line 17 of Listing 5.10. On a high level this can be explained as follows: When a pointer to an object is created, a CHERI-aware compiler initializes a capability for this pointer. When data is loaded, the address of the pointer is first checked against the bounds of the capability authorizing the transaction. If the address is outside these bounds, a CHERI LengthViolation is raised, preventing the access. For the given example in Listing 5.10, the capability is already created during the initialization of the array in Line 3. In Line 10, the created pointer is then assigned to the capability corresponding to the array.

    Count: 0, Value: 0
    Count: 1, Value: 0
    Count: 2, Value: 0
    Count: 3, Value: 0
    Count: 4, Value: 0
    Count: 5, Value: 5
    Count: 6, Value: 0
    Count: 7, Value: 0
    Count: 8, Value: 33554364
    Count: 9, Value: 9
    Count: 10, Value: 5

    Listing 5.11: Output of RISC-V VP++ running the example

    Count: 0, Value: 0
    Count: 1, Value: 0
    Count: 2, Value: 0
    Count: 3, Value: 0
    Count: 4, Value: 0
    CHERI Exception: LengthViolation
    [ISS] Warn: Taking trap handler in machine mode to 0x0, this is probably an error.

    Listing 5.12: Output of CHERI-RISC-V VP++ running the example


     1  Capability {
     2      Tag: 1, Perms: 78fff, Type: ffffffffffffffff, Flags: 0, Address: 9fffffb8,
     3      Base: 9fffffb8, Top: 9fffffcc, Length: 0014, Offset: 0000
     4  }
     5  Count: 0, Value: 0
     6  Capability {
     7      Tag: 1, Perms: 78fff, Type: ffffffffffffffff, Flags: 0, Address: 9fffffbc,
     8      Base: 9fffffb8, Top: 9fffffcc, Length: 0014, Offset: 0004
     9  }
    10  Count: 1, Value: 0
    11  Capability {
    12      Tag: 1, Perms: 78fff, Type: ffffffffffffffff, Flags: 0, Address: 9fffffc0,
    13      Base: 9fffffb8, Top: 9fffffcc, Length: 0014, Offset: 0008
    14  }
    15  Count: 2, Value: 0
    16  Capability {
    17      Tag: 1, Perms: 78fff, Type: ffffffffffffffff, Flags: 0, Address: 9fffffc4,
    18      Base: 9fffffb8, Top: 9fffffcc, Length: 0014, Offset: 000c
    19  }
    20  Count: 3, Value: 0
    21  Capability {
    22      Tag: 1, Perms: 78fff, Type: ffffffffffffffff, Flags: 0, Address: 9fffffc8,
    23      Base: 9fffffb8, Top: 9fffffcc, Length: 0014, Offset: 0010
    24  }
    25  Count: 4, Value: 0
    26  Capability {
    27      Tag: 1, Perms: 78fff, Type: ffffffffffffffff, Flags: 0, Address: 9fffffcc,
    28      Base: 9fffffb8, Top: 9fffffcc, Length: 0014, Offset: 0014
    29  }
    30  CHERI Exception: LengthViolation
    31  [ISS] Warn: Taking trap handler in machine mode to 0x0, this is probably an error.

   Listing 5.13: Output of CHERI-RISC-V VP++ running the example with pp_cap

The VP then enters the trap handler, which is not defined for this bare-metal case. Therefore, the output shows a warning that the trap handler is undefined: the PC is set to 0x0, which leads to an infinite loop, and the program never terminates. Based on this simple example, many of the CHERI principles and the control flow in the VP can be explained. First, the code is expanded with additional print statements that show the contents of the capability. The capability is printed by calling the pp_cap function in every iteration of the loop before the pointer is dereferenced. This function is based on the CHERI examples by the CapableVMs research project [47]. The pp_cap function takes a capability as argument and prints all its properties to the console. The addition of this print results in the output shown in Listing 5.13. This output shows that the pointer to the array is indeed a capability, and various observations can be made: The Tag bit is always set to 1, indicating that the capability is valid throughout the entire execution. The permissions are always 0x78fff, which indicates that the capability has all permissions set. The 12 architectural permissions (see capability encoding in Section 2.4.5) are indicated by the value 0xfff. The 4 software permission bits are shifted by 15, resulting in the value 0x78000. The permission value results from the way the capability is created. Unlike the function pointer capabilities explained in the bare-metal requirements section, this capability is created during runtime and is not stored in the relocation table. Instead, the capability’s permissions are derived from another capability. As the array is stored on the stack, the capability pointing to the array is created based on csp. In turn, csp is derived from DDC in the bootstrap code, which is the Infinite Capability by default, thus granting all permissions. If the same program is built for CheriBSD, permissions are more restricted, but more on that in Chapter 6.
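
The decomposition of the printed Perms value can be reproduced with a few lines of C. The following is a minimal sketch that assumes the layout described above, that is, 12 architectural permission bits in the low bits and 4 software permission bits starting at bit 15. The constant and function names are hypothetical and only serve this illustration.

```c
#include <stdint.h>

/* Layout assumed from the text: 12 architectural permission bits at
 * bits 0..11 and 4 software permission bits at bits 15..18. */
#define ARCH_PERMS_MASK UINT32_C(0xfff)
#define SW_PERMS_SHIFT  15
#define SW_PERMS_MASK   UINT32_C(0xf)

/* Extract the architectural permissions from a printed Perms value. */
static uint32_t arch_perms(uint32_t perms)
{
    return perms & ARCH_PERMS_MASK;
}

/* Extract the software permissions from a printed Perms value. */
static uint32_t sw_perms(uint32_t perms)
{
    return (perms >> SW_PERMS_SHIFT) & SW_PERMS_MASK;
}

/* Recombine both parts into the value printed by pp_cap. */
static uint32_t combine_perms(uint32_t arch, uint32_t sw)
{
    return (arch & ARCH_PERMS_MASK) | ((sw & SW_PERMS_MASK) << SW_PERMS_SHIFT);
}
```

For the value 0x78fff from Listing 5.13, arch_perms returns 0xfff and sw_perms returns 0xf, so both groups are fully set.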


[Figure: block diagram of the ISS data flow with the stages Instruction Fetching (with PCC Checks), Instruction Decoding, Operation Selection and Instruction Execution (with CHERI Checks), together with the Program Counter, Instruction Encoding Table, OpIds, Registers, CSRs, SCRs and Capabilities. Legend: Unmodified, Modified, New.]

  Figure 5.1: Overview of Data Flow in the ISS of CHERI-RISC-V VP++

The type set to -1 (0xffffffffffffffff) indicates that the capability is unsealed. The flag is always 0, as it is only relevant for PCC. The address is incremented by 4 in each iteration, as the counter value is added to the pointer in Line 16. As the pointer points to a 32-bit integer array, this results in an increment by 4. The base and top address of the capability remain unchanged throughout the iterations, as the pointer always refers to the same array. The length is 20 bytes, which is the difference between the top and base values, and also corresponds to the number of elements in the array when divided by the size of a single element. In this example, an element has 4 bytes (int32_t); with an array of 5 elements, this results in a total length of 20 bytes. The offset is the distance between the base and the actual address and therefore increments by 4 in each iteration. In conclusion, as long as the address is within the bounds of the capability, the dereferencing works as expected. But as soon as the address reaches the end of the capability's bounds, the LengthViolation occurs and the value at array[5] is never printed.

To learn even more from this example, the assembly code of the example can be inspected. Stepping through the disassembly shows that it is the instruction lw that actually causes the CHERI LengthViolation, which shows that CHERI does not only implement new instructions, but also modifies the behavior of existing instructions. To understand the changes of CHERI in more detail, the execution of the instruction lw in the VP is analyzed in the next paragraphs. In Figure 5.1, which is extended from Figure 2 in [22], the execution flow is visualized. Again, the color coding was changed to highlight the modifications made by CHERI, with new elements in green and modified elements in orange.

1. The next instruction is fetched from the current PC address (0x80000344). Before the instruction is fetched, checks on PCC are done, as explained in Section 3.3.4.

2. The instruction is decoded by first evaluating the opcode and then the funct3 and funct7 fields, depending on the instruction type. In this case, the opcode of 0x03 identifies the operation as a load instruction. The funct3 field of 0x02 further specifies the instruction as load word (LW). It is important to note that the instruction decoding is also dependent on the current mode indicated by the flag_cap_mode in PCC. But as described in Section 3.3.3, this is only relevant for compressed instructions, which are not used in this example.

3. Based on the decoded instruction, the corresponding execution operation is performed. This case distinction is extended by the new CHERI operations.

4. Execution of the determined instruction begins. CHERI does not only extend the execution block with new instructions, but also alters the behavior of some existing instructions, especially those accessing memory, like lw. In regular RISC-V, the memory address is calculated by adding the immediate value to the value of the source register rs1. For CHERI, an authorization capability is required to access memory. If the flag in PCC marks the current mode as capability mode, the register cs1 (capability of rs1) is used for authorization. If the flag is not set, DDC is used for authorization, and the final address is calculated as the sum of the address of cs1 and the immediate value, plus the address of DDC (if DDC relocation is enabled). The given example is executed in pure capability mode, therefore the flag in PCC is always set and the cs1 register is used for authorization.

5. Once address and authorization capability are determined, the CHERI checks are performed. These again ensure that the capability allows loading from the given address. The following checks are applied:
   - Tag of the authorization capability must be valid (1)
   - Authorization capability must be unsealed (type = -1)
   - Authorization capability must permit the current operation, in this case loading (permit_load = 1)
   - The address must be within the bounds of the authorization capability, which translates to the following two conditions:
     a) addr >= base: The address must be greater than or equal to the base address of the authorizing capability
     b) addr + size <= top: The address plus the size of the loaded data type must be less than or equal to the top address of the authorizing capability
     This is the check failing in Line 17 of Listing 5.10, because as the printed capability in Line 26 of Listing 5.13 shows, the base of cs1 is 0x9fffffb8, the top is 0x9fffffcc and the address is 0x9fffffcc. With a size of 4 bytes, the second condition is not met, leading to a CHERI LengthViolation. The violation causes a trap in the VP, which is then handled by the trap handler. Refer to Section 3.3.5 for more details on the trap handling. For the sake of completeness, the remaining checks and execution steps within the VP are listed here, although they are not reached for the given example.
   - The address must be aligned to the size of the loaded data type, in this case 4 bytes

6. When all checks pass, the address is translated to a physical address, which is done by the MMU in the VP. For the given example, virtual memory is not enabled, so the address is directly used as the physical address.

7. Memory is accessed at the calculated address. Depending on the runtime configuration of the VP, memory access is either handled via a TLM transaction, or via the DMI, which provides faster access. As a general 4-byte word is loaded, the DMI or TLM transaction is performed as in the standard RISC-V VP and does not require modification for CHERI.

8. The loaded value is then stored in the destination register rd, which is determined by the instruction encoding. Writing to the rd register automatically clears the corresponding capability cd, as the value is not a capability. Clearing means that the capability is set to a Null-Capability, which zeroes the Tag and all fields, except the address.
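
The decoding in step 2 and the checks in step 5 can be summarized in a small host-side model. The following sketch is a simplification and not the implementation used in the VP: the capability is reduced to the fields needed here, the top is stored in 64 bits, and the alignment check is omitted. All type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Possible outcomes of the modeled checks, in the order they are applied. */
typedef enum {
    CHECK_OK,
    TAG_VIOLATION,
    SEAL_VIOLATION,
    PERMIT_LOAD_VIOLATION,
    LENGTH_VIOLATION
} check_result_t;

/* Reduced capability model (assumption: only the fields used below). */
typedef struct {
    bool tag;          /* out-of-band validity tag */
    bool sealed;       /* false corresponds to type = -1 */
    bool permit_load;  /* load permission */
    uint64_t base;     /* lower bound, inclusive */
    uint64_t top;      /* upper bound, exclusive */
    uint64_t address;  /* current pointer value */
} cap_model_t;

/* Step 2: opcode 0x03 selects loads, funct3 0x02 selects load word. */
static bool is_load_word(uint32_t insn)
{
    uint32_t opcode = insn & 0x7f;
    uint32_t funct3 = (insn >> 12) & 0x7;
    return opcode == 0x03 && funct3 == 0x02;
}

/* Length and offset as printed by pp_cap. */
static uint64_t cap_length(const cap_model_t *cap)
{
    return cap->top - cap->base;
}

static uint64_t cap_offset(const cap_model_t *cap)
{
    return cap->address - cap->base;
}

/* Step 5: tag, seal, permission and bounds checks for a load. */
static check_result_t check_load(const cap_model_t *auth, uint64_t addr,
                                 uint64_t size)
{
    if (!auth->tag) {
        return TAG_VIOLATION;
    }
    if (auth->sealed) {
        return SEAL_VIOLATION;
    }
    if (!auth->permit_load) {
        return PERMIT_LOAD_VIOLATION;
    }
    if (addr < auth->base || addr + size > auth->top) {
        return LENGTH_VIOLATION;
    }
    return CHECK_OK;
}

/* Capability of p_array + index, using the values of Listing 5.13. */
static cap_model_t array_cap_at(uint64_t index)
{
    cap_model_t cap;
    cap.tag = true;
    cap.sealed = false;
    cap.permit_load = true;
    cap.base = UINT64_C(0x9fffffb8);
    cap.top = UINT64_C(0x9fffffcc);
    cap.address = cap.base + 4 * index;
    return cap;
}
```

With these values, a 4-byte load passes for the indices 0 to 4, while index 5 gives the address 0x9fffffcc, for which addr + size exceeds top and the model reports LENGTH_VIOLATION.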

Learnings

From this example, multiple things can be learned. First of all, the CHERI-aware compiler generates capabilities for pointers to objects, without the need for explicit capability instructions in the source code. Minimal changes to existing source code are a key feature of CHERI, which aims to preserve compatibility with legacy software. It is the CHERI-compatible hardware that enforces the checks on the capabilities, which is done by modifying the behavior of existing instructions. A pointer is allowed to go beyond the bounds of its corresponding capability, as long as the address is not dereferenced. As soon as the address is dereferenced, the CHERI checks are performed and a LengthViolation occurs. This violation leads to a trap, and the CPU enters the configured trap handler.


    int add(int a, int b)
    {
        return a + b;
    }

    int main()
    {
        int a = 5;
        int b = 10;
        int (*function_pointer)(int, int) = add;
        pp_cap(function_pointer);
        int result = function_pointer(a, b);
        printf("Result of function pointer call: %d\n", result);
        return 0;
    }

                   Listing 5.14: Source code of the function pointer example


    Capability {
        Tag: 1, Perms: 78f57, Type: fffffffffffffffe, Flags: 1, Address: 800002c0,
        Base: 0000, Top: ffffffffffffffff, Length: ffffffffffffffff, Offset: 800002c0
    }
    Result of function pointer call: 15

                   Listing 5.15: Output of the function pointer example

5.2.3 Evaluating Function Pointers

The next example is designed to show how CHERI handles function pointers. The program shown in Listing 5.14 is rather simple. Besides the main function, it defines one additional function add that takes two integers as arguments and returns their sum. In the main function, a function pointer is created pointing to the add function. The function pointer is then used to call the add function to show that the call via the function pointer works as expected. As the function pointer is a capability if CHERI is enabled, it is printed to the console using the pp_cap function for further analysis. The program output in Listing 5.15 reveals three noteworthy properties of this capability. First, its bounds span from 0 to 0xffffffffffffffff, effectively granting access to the entire address space. This is a result of the current version of the compiler, which applies exact bounds only to data symbols. For function symbols, the compiler sets the bounds to the code segment of the shared object that contains the function [58]. In this case, the code segment is the .text section, which spans the entire address space in the bare-metal setup. There appears to be an experimental version of the compiler that bounds code pointers to only the current function, but it is not used by default. This could be an interesting topic for future research. Second, the type of the capability is set to 0xfffffffffffffffe (-2). This value indicates that the capability is a sealed entry (sentry), marking it as a jump target and ensuring that it is not modified. The concept of sealing is explained in Section 2.4.5. Third, the permission field in this example has a value of 0x78f57, which is more restricted than in the previous example. This value indicates that the capability has all permissions set, except for the following three:

- permit_store
- permit_store_cap
- permit_seal


This means that the capability can be used to call the function, and also to read data from the location pointed to by the function pointer, but it cannot be used to modify the memory contents at that location, nor can it be used to store a capability to that location. The capability does not permit sealing, because sealing is a privileged operation that is not required for function invocation, and granting it would violate CHERI’s principle of least privilege and increase the risk of misuse. Because the capability is a function pointer, it gets marked as such in the __cap_relocs section by the compiler, by setting the MSB of the cr_flags field to 1. As explained in Section 5.1.4, the cheri_init_globals_3 function is used to initialize the GCT during the bootstrap process. In this function, a function pointer permission mask is defined, which clears the three permissions mentioned above, thus leading to the permissions seen in Listing 5.15.
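
The effect of such a mask on the permission value can be checked with a short calculation. The following sketch assumes the architectural permission bit positions of CHERI ISAv9 (Permit_Store at bit 3, Permit_Store_Capability at bit 5, Permit_Seal at bit 7). The macro and function names are hypothetical and do not reproduce the definitions in cheri_init_globals.h.

```c
#include <stdint.h>

/* Bit positions assumed from the CHERI ISAv9 permission encoding. */
#define PERM_STORE     (UINT32_C(1) << 3)
#define PERM_STORE_CAP (UINT32_C(1) << 5)
#define PERM_SEAL      (UINT32_C(1) << 7)

/* Permissions that are removed from function pointer capabilities. */
#define FUNCTION_PERMS_CLEARED (PERM_STORE | PERM_STORE_CAP | PERM_SEAL)

/* Apply the function pointer mask to the permissions of a source
 * capability, for example one holding all permissions (0x78fff). */
static uint32_t function_pointer_perms(uint32_t source_perms)
{
    return source_perms & ~FUNCTION_PERMS_CLEARED;
}
```

Starting from the full permission value 0x78fff, clearing these three bits (0x0a8) yields 0x78f57, which agrees with the Perms value printed in Listing 5.15.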

Learnings

This example shows that CHERI creates function pointers as capabilities, which are defined in the capability relocation table. Using capability relocations enables the compiler to restrict the permissions of a function pointer capability to those required for function invocation, thus preventing misuse. Marking the capability as executable in the relocation table causes the object type of the capability to be set to sentry, which prevents the capability from being modified. The fact that the function pointer is able to access the entire address space and not only the code segment of the function is a limitation of the current compiler version and could be a topic for future research.

5.3 Conclusion

This chapter first explained the requirements for running CHERI programs on bare-metal systems. It covered the necessary modifications to adapt an existing simple assembly program to run on CHERI-enabled systems in pure capability mode. Then it explained how the CHERI compiler and linker create static capability constants, which are required as soon as programs become a little more sophisticated. This required an in-depth look at the generated ELF files and how relocations are used to define these capability constants. After the basics were established, the chapter provided an example C program that demonstrated the use of capabilities to ensure memory safety. This showed that, when using a CHERI compiler, no modification to the source code is required to run the program in pure capability mode and enable capability-based memory protection. The example was followed by a detailed analysis of the instruction execution flow in the VP, which highlighted the additional mechanisms added by CHERI. Finally, a second example was provided, which demonstrated the effects of CHERI on function pointers and how capabilities ensure the integrity of function calls. This example also revealed a topic for future research, as the current version of the compiler is not able to identify constrained bounds for function pointers. Overall, this chapter showed that the VP is indeed capable of running CHERI programs in pure capability mode and that the CHERI protection mechanisms work as expected. This opens up the possibility to run more complex scenarios on the VP and to further explore the CHERI capabilities and their potential for memory safety and security in embedded systems.

Chapter 6

Bringing a Full-Scale Operating System to
CHERI-RISC-V VP++

After verifying the CHERI-RISC-V VP++ implementation using the TestRIG framework (see Chapter 4) and demonstrating its correct behavior through several bare-metal CHERI-enabled programs (Section 5.2), the natural next step is to evaluate its functionality in a more realistic, full-system environment. Running a general purpose OS provides an opportunity to evaluate the CHERI extensions in scenarios using virtual memory management, user and kernel privilege separation, dynamic linking, and software compartmentalization. These features cannot be fully exercised in bare-metal contexts. However, this chapter does not aim to provide a comprehensive evaluation of all these aspects, but rather demonstrates that the CHERI-RISC-V VP++ is capable of booting and running a full OS, thereby showcasing its practical applicability. The analysis of complex scenarios involving CHERI’s memory protection mechanisms in a full OS context is deferred to future work. Instead, this chapter revisits examples already shown in Chapter 5 and focuses on the differences that arise when running them in a full OS environment. At the time of writing, three options for CHERI-enabled OSs are available. The CHERI-Alliance has published variants of Linux and Zephyr that support CHERI. However, these follow the new and still changing RISC-V specification for CHERI extensions [8], while the CHERI-RISC-V VP++ is based on the specification from CTSRD (CHERI ISAv9) [6], as discussed in Section 2.4.4. The third option is CheriBSD, which was introduced in Section 2.4.9. CheriBSD is a CHERI-enabled variant of FreeBSD, which was adapted by the CTSRD team and is based on the same CHERI specification as the VP. Therefore, CheriBSD is the only available CHERI-enabled OS that is suitable for running on the CHERI-RISC-V VP++ at the moment. Additionally, CHERI-QEMU, which was introduced in Section 2.4.9, is capable of running the same CheriBSD kernel, which allows for a direct comparison between the VP and the CHERI-QEMU environment.

6.1 Requirements and Setup

The base RISC-V VP++ already supports booting the Linux kernel on a SiFive FU540-like platform [21]. With further adaptations, it is also capable of booting the FreeBSD kernel. To align more closely with the CHERI-enabled QEMU environment, the CHERI-RISC-V VP++ was extended to support QEMU’s virt platform, which is the standard configuration used for running CheriBSD in QEMU. By reusing the same platform definition as CHERI-QEMU, discrepancies in hardware configuration or platform-specific behaviors can be avoided, enabling direct comparisons and simplifying debugging, while increasing confidence in the implementation’s correctness. The QEMU virt platform emulates a generic RISC-V virtual board, which includes a set of predefined memory-mapped devices. These include a PLIC, a CLINT and a SiFive Test device, which already existed in the RISC-V VP++, as well as an NS16550A-UART, which was implemented and added to the VP for this work. Additionally, the platform includes multiple components that are not mandatory for booting CheriBSD and are therefore only implemented as dummy devices in the VP. Such devices are the Google Goldfish Real-Time Clock (RTC), a flash controller, a Peripheral Component Interconnect (PCI) controller and a virtio-mmio transport device.



   ./vp/build/bin/cheri_qemu_virt64-sc-vp \
     --use-data-dmi \
     --kernel-file ./kernel_files/CheriBSD_kernel_rootfs.elf \
     --dtb-file=./dtb_files/qemu_cheri_virt_rv64_2GiB_VP.dtb \
     ./opensbi-ctsrd/opensbi/build/platform/generic/firmware/fw_jump.elf

   Listing 6.1: Command-line arguments to boot CheriBSD on CHERI-RISC-V VP++

As CHERI-RISC-V VP++ currently does not emulate block devices, it is not possible to mount a traditional disk-based file system in CheriBSD. To overcome this limitation, the kernel is configured to include an embedded root file system image using the md (memory disk) facility. At boot, this image is loaded into memory and mounted as a ram-disk, serving as the root file system. This approach provides a minimal, read-only userland environment directly from memory, allowing userspace CHERI programs to be executed and tested without the need for additional storage emulation.

In addition to the CheriBSD kernel, a first-stage bootloader is required to prepare the system for kernel execution. This role is fulfilled by OpenSBI [59], which initializes the RISC-V platform in Machine mode (M-mode) and provides the standard Supervisor Binary Interface (SBI) services that OSs rely on. OpenSBI is responsible for setting up the necessary environment to transition into Supervisor mode (S-mode), where CheriBSD operates. Furthermore, OpenSBI handles low-level functionality such as console I/O, timer management, and trap delegation, which are essential for early kernel bootstrapping and runtime services. Without OpenSBI, CheriBSD would not be able to boot, as it depends on these SBI calls for basic system functionality on RISC-V platforms. The CTSRD project provides a CHERI-aware version of OpenSBI [60], which is used for this work.

6.2 Running CheriBSD

With the setup described above, the CHERI-RISC-V VP++ is able to boot CheriBSD. Performance-wise, CHERI-RISC-V VP++ is not directly comparable with QEMU: the VP takes 200 seconds on the debug build and 24 seconds on the release build to boot CheriBSD, while QEMU requires roughly 8 seconds on the same hardware (a slowdown of roughly 25x and 3x, respectively). It is important to understand that the VP and QEMU are fundamentally distinct tools with different design goals. As explained in Section 2.4.9, QEMU is an emulator that focuses on performance, while the VP provides a deterministic, cycle-approximate simulation of the hardware platform. Moreover, the CHERI extension was implemented with a focus on correctness and readability, which leaves room for performance improvements in the future. An improvement is expected when the caching mechanism added by Schlägl and Große [61] is re-enabled for the CHERI extension. In addition, most algorithmic operations performed on capabilities are not optimized yet.

Listing 6.1 shows the command-line arguments used to boot CheriBSD on the VP. The VP binary is called cheri_qemu_virt64-sc-vp, indicating that it is a CHERI-enabled version of the VP that resembles QEMU’s virt platform for 64-bit. The -sc part of the name indicates that this is the single-core platform. The VP supports multi-core systems; however, to keep things simple, all experiments shown in this chapter are performed on the single-core version. The arguments in Listing 6.1 show that the DMI is enabled for data access, which increases performance significantly, as explained in Section 3.4.4. The VP requires the path to the CheriBSD kernel file, which is in ELF format. Additionally, the path to the device tree blob file is provided, which describes the hardware configuration of the emulated platform. As the name implies, this file is copied from QEMU and slightly adapted to work with the VP. Finally, the path to the OpenSBI ELF file, which the VP loads at startup, is provided as a positional argument. As explained in Section 6.1, the CHERI-enabled version of OpenSBI provided by CTSRD is used here.


   OpenSBI v0.9
   [OpenSBI ASCII-art banner]
   Platform Name             : riscv-virtio,qemu
   Platform Features         : mfdeleg
   Platform HART Count       : 1
   Platform IPI Device       : aclint-mswi
   Platform Timer Device     : aclint-mtimer
   Platform Console Device   : uart8250
   Platform HSM Device       : ---
   Platform SysReset Device  : sifive_test
   Firmware Base             : 0x80000000
   Firmware Size             : 248 KB
   Runtime SBI Version       : 0.3
   Domain0 Name              : root
   Domain0 Boot HART         : 0
   Domain0 HARTs             : 0*
   Domain0 Region00          : 0x0000000002000000-0x000000000200ffff (I)
   Domain0 Region01          : 0x0000000080000000-0x000000008003ffff ()
   Domain0 Region02          : 0x0000000000000000-0xffffffffffffffff (R,W,X)
   Domain0 Next Address      : 0x0000000080200000
   Domain0 Next Arg1         : 0x0000000080100000
   Domain0 Next Mode         : S-mode
   Domain0 SysReset          : yes
   FDT ADDRESS: 82200000
   Boot HART ID              : 0
   Boot HART Domain          : root
   Boot HART ISA             : rv64imafdcsux
   Boot HART Features        : scounteren,mcounteren,mcountinhibit,time
   Boot HART PMP Count       : 16
   Boot HART PMP Granularity : 4
   Boot HART PMP Address Bits: 54
   Boot HART MHPM Count      : 0
   Boot HART MIDELEG         : 0x0000000000000222
   Boot HART MEDELEG         : 0x000000001c00b109

  Listing 6.2: Output of OpenSBI when booting CheriBSD on CHERI-RISC-V VP++


   ---<<BOOT>>---
   Physical memory chunk(s):
     0x80000000 - 0xffffffff,  2048 MB ( 524288 pages)
   Excluded memory regions:
     0x80000000 - 0x801fffff,     2 MB (    512 pages) NoAlloc NoDump
     0x80200000 - 0xa141bfff,   530 MB ( 135708 pages) NoAlloc
   Avail lists:
     phys_avail[0] 0xa141c000
     phys_avail[1] 0x100000000
     dump_avail[0] 0x80200000
     dump_avail[1] 0x100000000
   Found 1 CPUs in the device tree
   Copyright 2011-2025 University of Cambridge.
   Copyright 2012-2025 SRI International.
   Copyright (c) 1992-2024 The FreeBSD Project.
   Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
   The Regents of the University of California. All rights reserved.
   FreeBSD is a registered trademark of The FreeBSD Foundation.

  Listing 6.3: Beginning of CheriBSD’s output when booting on CHERI-RISC-V VP++


   Trying to mount root from ufs:/dev/md0 []...
   WARNING: WITNESS option enabled, expect reduced performance.
   regulator: shutting down unused regulators
   Warning: no time-of-day clock registered, system time will not be set accurately
   start_init: trying /sbin/init
   2025-06-24T12:55:01.709693+00:00 - init 17 - - login_getclass: unknown class 'daemon'
   2025-06-24T12:55:01.964631+00:00 - init 17 - - can't access /etc/rc: No such file or directory
   /bin/sh: cannot open /etc/rc: No such file or directory
   Enter full pathname of shell or RETURN for /bin/sh:

   Listing 6.4: End of CheriBSD’s output when booting on CHERI-RISC-V VP++

Running the command shown in Listing 6.1 starts the VP, which then loads the provided OpenSBI firmware ELF binary. Afterwards, the VP reads the device tree blob file and finally loads the CheriBSD kernel file into memory. Once everything is loaded, the VP starts execution at the entry point of the OpenSBI firmware, which is defined in the ELF file. After OpenSBI has initialized the platform, it prints information about the platform and the OpenSBI version to the console. It also shows the next address, which OpenSBI will jump to once it has finished its initialization. This address corresponds to the entry point of the CheriBSD kernel and must match the address defined in the kernel’s ELF file. The full output of OpenSBI is shown in Listing 6.2.

When OpenSBI has finished its initialization, it jumps to the CheriBSD kernel’s entry point, which starts the kernel boot process. Because the entire verbose boot output of CheriBSD is quite lengthy, Listing 6.3 only shows the beginning of the boot process, and Listing 6.4 contains its end. As the last line in Listing 6.4 shows, the system has successfully booted to a state where it is ready to accept user input. The user can now either press RETURN to start a shell, or enter the path to another program to execute it instead.

At this point, the VP has executed approximately 324 million instructions and performed around 21 million data loads and 63 million data stores, according to the detailed runtime statistics provided by the VP. Additionally, 655 thousand valid capabilities can be found in the Tagged Memory Module, indicated by the tags set to 1 in the corresponding memory array. By CHERI’s design, all of these capabilities are derived from existing ones, reflecting CHERI’s guarantees of non-forgeability and monotonicity. Each capability thus encodes restricted bounds and permissions inherited from its parent, ensuring that memory accesses and authority delegation remain confined within the limits imposed by hardware.
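The memory map printed by the kernel in Listing 6.3 can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not code from the VP or from CheriBSD: the helper names and PAGE_BYTES are hypothetical, and a 4 KiB page size is assumed because it is the value that reproduces the page counts in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages (this reproduces the counts in Listing 6.3). */
#define PAGE_BYTES 4096u

/* Number of pages covered by the inclusive address range [first, last]. */
uint64_t pages_in_range(uint64_t first, uint64_t last)
{
    assert(last >= first);
    uint64_t bytes = last - first + 1;
    assert(bytes % PAGE_BYTES == 0);   /* the listed ranges are page-aligned */
    return bytes / PAGE_BYTES;
}

/* Size of the inclusive range [first, last] in whole MiB, rounded down. */
uint64_t mib_in_range(uint64_t first, uint64_t last)
{
    assert(last >= first);
    return (last - first + 1) / (1024u * 1024u);
}
```

Under these assumptions, pages_in_range(0x80200000, 0xa141bfff) yields the 135708 pages reported for the second excluded region, which starts at the same address (0x80200000) that OpenSBI prints as its next address in Listing 6.2.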



1  Enter full pathname of shell or RETURN for /bin/sh:
2  # pwd
3  /
4  # ls
5  METALOG  boot  etc   lib64c   media  net   rescue   sbin  usr
6  bin      dev   lib   libexec  mnt    proc  root     tmp   var
7  # cd bin
8  # pwd
9  /bin

    Listing 6.5: Basic system navigation in CheriBSD on CHERI-RISC-V VP++


1  # cd /tmp
2  # pwd
3  /tmp
4  # ls
5  # mkdir test
6  mkdir: test: Read-only file system
7  # ls
8  #

    Listing 6.6: Writing to the file system in CheriBSD on CHERI-RISC-V VP++

The next sections show some simple interactions with the OS to demonstrate that it is fully functional within the restrictions imposed by the limited root file system.

6.2.1 Basic System Interaction

First, the user can check the current working directory by executing the pwd command, which shows that the current directory is the root directory /, as can be seen in Line 3 of Listing 6.5. Printing the content of the root directory using the ls command shows that the root file system contains a minimal set of directories. These directories match the expectations, as this is the same structure that was configured when creating the root file system image. Line 7 navigates to the /bin directory, which contains the user programs. Using pwd again shows that the current working directory has indeed changed to /bin, which confirms that basic file system navigation is functional.

Trying to write to the file system, as shown in Listing 6.6, reveals a major limitation of the current setup. Because the root file system is mounted as a read-only ram-disk, any attempt to create or modify files fails with a Read-only file system error, as can be seen in Line 6. It is possible to mount the ram-disk in read-write mode; however, changes to the file system would not be persistent, as the ram-disk is lost when the system is powered off or rebooted. As the output of the subsequent ls command in Line 7 shows, the directory remains empty, as no folder could be created. The same is true for any other write operation, such as creating, moving, or deleting files.

6.2.2 Inspecting Running Processes

To further demonstrate that the OS is fully functional, the standard system monitoring tools ps and top are executed. These utilities provide insights into the currently running processes and system resource usage. The command ps aux lists all active processes, along with details such as process IDs, CPU and memory usage and the command that launched each process. The output in Listing 6.7 shows several important aspects. First, it confirms that the system is running multiple kernel and system threads (such as [kernel], [pagedaemon], [bufdaemon], ...) in the background.


   # ps aux
   USER PID %CPU %MEM   VSZ  RSS TT  STAT STARTED     TIME COMMAND
   0     11 96.4  0.0     0   20  -  RNL  12:55   32:26.90 [idle]
   0     18  0.2  0.2 18968 3304 u0  Ss   12:55    0:02.39 -sh (sh)
   0      0  0.0  0.0     0  240  -  DLs  12:55    0:04.11 [kernel]
   0      1  0.0  0.1 16700 1296  -  ILs  12:55    0:00.69 /sbin/init
   0      2  0.0  0.0     0   20  -  WL   12:55    0:06.55 [clock]
   0      3  0.0  0.0     0   40  -  DL   12:55    0:00.00 [crypto]
   0      4  0.0  0.0     0   60  -  DL   12:55    0:00.00 [cam]
   0      5  0.0  0.0     0   20  -  DL   12:55    0:00.36 [md0]
   0      6  0.0  0.0     0   20  -  RL   12:55    0:06.73 [rand_harvestq]
   0      7  0.0  0.0     0   20  -  DL   12:55    0:00.00 [cheri_revoke]
   0      8  0.0  0.0     0   60  -  RL   12:55    0:08.58 [pagedaemon]
   0      9  0.0  0.0     0   20  -  DL   12:55    0:00.00 [vmdaemon]
   0     10  0.0  0.0     0   20  -  DL   12:55    0:00.00 [audit]
   0     12  0.0  0.0     0  100  -  WL   12:55    0:00.29 [intr]
   0     13  0.0  0.0     0   60  -  DL   12:55    0:00.18 [geom]
   0     14  0.0  0.0     0   40  -  DL   12:55    0:01.51 [bufdaemon]
   0     15  0.0  0.0     0   20  -  DL   12:55    0:00.65 [vnlru]
   0     16  0.0  0.0     0   20  -  DL   12:55    0:00.71 [syncer]
   0     20  0.0  0.2 18980 3116 u0  R+   13:28    0:00.72 ps aux

    Listing 6.7: Output of the ps aux command in CheriBSD on CHERI-RISC-V VP++

   # top
   last pid: 21; load averages: 0.18, 0.31, 0.28 up 0+00:35:04 13:30:01
   2 processes: 1 running, 1 sleeping
   CPU: 0.2% user, 0.0% nice, 2.2% system, 0.0% interrupt, 97.6% idle
   Mem: 1968K Active, 232K Inact, 40M Wired, 5468K Buf, 1407M Free
   # top -S
   last pid: 22; load averages: 0.31, 0.30, 0.28 up 0+00:59:02 13:53:59
   19 processes: 2 running, 15 sleeping, 2 waiting
   CPU: 0.2% user, 0.0% nice, 2.1% system, 0.0% interrupt, 97.7% idle
   Mem: 1964K Active, 244K Inact, 40M Wired, 5468K Buf, 1407M Free

   Listing 6.8: Output of the top command in CheriBSD on CHERI-RISC-V VP++

These threads show different states, such as RL (running, locked) and Ss (sleeping, session leader), which demonstrates that the kernel’s process management is functional. This confirms that the system is capable of multitasking and managing various system activities concurrently. Additionally, the presence of the shell process -sh indicates that the userland environment is operational (which was already shown in the previous section). Finally, the idle process [idle] shows that the system is correctly managing idle time, which is important for power efficiency and system responsiveness.

While top is typically used to provide a dynamic, real-time view of system processes, it can also be run in a non-interactive mode to capture a snapshot. Because the current setup only provides a so-called dumb terminal, which does not support interactive features such as cursor movement or screen clearing, top is executed in non-interactive mode here. Listing 6.8 shows the output of a single execution of the top command. It confirms the presence of two userland processes, one running (the top command itself) and one sleeping (the shell process). When top is run with the -S flag, it includes all system processes, which increases the total number of processes to 19, matching the number of processes shown by ps aux. The output of top also provides insights into the CPU and memory usage of the system. Both are low, as expected, because the system is mostly idle, with only a few background processes running.



   # ifconfig
   lo0: flags=8008<LOOPBACK,MULTICAST> metric 0 mtu 16384
           options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
           groups: lo
           nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

Listing 6.9: Output of ifconfig in CheriBSD on CHERI-RISC-V VP++ without IPv4 address


   # ping -c 4 127.0.0.1
   PING 127.0.0.1 (127.0.0.1): 56 data bytes
   64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=3.970 ms
   64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=4.439 ms
   64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=3.051 ms
   64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=3.334 ms

   --- 127.0.0.1 ping statistics ---
   4 packets transmitted, 4 packets received, 0.0% packet loss
   round-trip min/avg/max/stddev = 3.051/3.699/4.439/0.542 ms

   Listing 6.10: Output of the ping command in CheriBSD on CHERI-RISC-V VP++

6.2.3 Networking

To verify that the networking subsystem of CheriBSD is functional on the VP, the ping command is used to send Internet Control Message Protocol (ICMP) packets to the loopback interface. After boot, the internal loopback interface is available, but has no IPv4 address assigned by default, as the output in Listing 6.9 shows. As a result, the ping command returns the message ping: sendto: No route to host, indicating that the system cannot route packets to the specified address. After manually assigning the IPv4 address using ifconfig lo0 inet 127.0.0.1 netmask 255.0.0.0 up, the ping command successfully receives responses from the loopback address, confirming that CheriBSD’s networking stack is operational, even in the limited VP environment without external networking. Listing 6.10 shows the output of the successful ping command.

6.3 Testing CHERI Protection in CheriBSD

Now that the CHERI-RISC-V VP++ is able to boot CheriBSD, it can be used to explore the more advanced features of CHERI in a realistic OS environment. The examples presented in Chapter 5 are revisited here to observe their behavior within the CheriBSD environment, which allows for a direct comparison between the bare-metal and the CheriBSD environment. While these examples do not cover all aspects of CHERI in a full OS environment, they nevertheless demonstrate that the CHERI extension in the CHERI-RISC-V VP++ is functionally correct. More advanced examples that make use of all features proposed by CHERI are left for future work.

6.3.1 Evaluating Boundary Protection

To verify that the CHERI extension in the VP is correctly implemented and works in the CheriBSD environment, the example introduced in Section 5.2.2 is reused. This allows for a direct comparison between the bare-metal and the CheriBSD environment. For this experiment, the same C code as in Listing 5.10 is compiled using the toolchain for CheriBSD, generated using cheribuild. The resulting binary is then copied to the root file system image, before rebuilding the kernel to include the updated image. After booting CheriBSD on the VP, the test program is executed from the shell.


   Capability: 0x3fffdfff58 [rwRW,0x3fffdfff58-0x3fffdfff6c]
   Tag: 1, Perms: 6817d, Type: ffffffffffffffff, Address: 3fffdfff58, Base: 3fffdfff58, End: 3fffdfff6c, Flags: 0, Length: 0014, Offset: 0000
   Count: 0, Value: 0
   Capability: 0x3fffdfff5c [rwRW,0x3fffdfff58-0x3fffdfff6c]
   Tag: 1, Perms: 6817d, Type: ffffffffffffffff, Address: 3fffdfff5c, Base: 3fffdfff58, End: 3fffdfff6c, Flags: 0, Length: 0014, Offset: 0004
   Count: 1, Value: 0
   Capability: 0x3fffdfff60 [rwRW,0x3fffdfff58-0x3fffdfff6c]
   Tag: 1, Perms: 6817d, Type: ffffffffffffffff, Address: 3fffdfff60, Base: 3fffdfff58, End: 3fffdfff6c, Flags: 0, Length: 0014, Offset: 0008
   Count: 2, Value: 0
   Capability: 0x3fffdfff64 [rwRW,0x3fffdfff58-0x3fffdfff6c]
   Tag: 1, Perms: 6817d, Type: ffffffffffffffff, Address: 3fffdfff64, Base: 3fffdfff58, End: 3fffdfff6c, Flags: 0, Length: 0014, Offset: 000c
   Count: 3, Value: 0
   Capability: 0x3fffdfff68 [rwRW,0x3fffdfff58-0x3fffdfff6c]
   Tag: 1, Perms: 6817d, Type: ffffffffffffffff, Address: 3fffdfff68, Base: 3fffdfff58, End: 3fffdfff6c, Flags: 0, Length: 0014, Offset: 0010
   Count: 4, Value: 0
   Capability: 0x3fffdfff6c [rwRW,0x3fffdfff58-0x3fffdfff6c]
   Tag: 1, Perms: 6817d, Type: ffffffffffffffff, Address: 3fffdfff6c, Base: 3fffdfff58, End: 3fffdfff6c, Flags: 0, Length: 0014, Offset: 0014
   In-address space security exception (core dumped)

Listing 6.11: Output of the read beyond bounds example in CheriBSD on CHERI-RISC-V VP++
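The point at which Listing 6.11 ends in a trap follows from a simple bounds rule: an access is allowed only if it lies entirely within the half-open range from Base to End. The sketch below illustrates this rule under stated assumptions. The struct and function names are hypothetical and do not reflect the VP's internal data structures, and a 4-byte element size is assumed because the addresses in the listing advance in steps of 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a capability's bounds. */
struct cap_bounds {
    uint64_t base;  /* first valid address ("Base" in Listing 6.11) */
    uint64_t end;   /* one past the last valid address ("End") */
};

/* An access of `size` bytes at `addr` is in bounds iff
 * [addr, addr + size) lies within [base, end). */
bool access_in_bounds(struct cap_bounds c, uint64_t addr, uint64_t size)
{
    assert(c.end >= c.base);
    return addr >= c.base && addr <= c.end && size <= c.end - addr;
}

/* Index of the first element whose access would leave the bounds. */
uint64_t first_faulting_index(struct cap_bounds c, uint64_t elem_size)
{
    assert(elem_size > 0);
    uint64_t i = 0;
    while (access_in_bounds(c, c.base + i * elem_size, elem_size))
        i++;
    return i;
}
```

With the bounds from the listing (0x3fffdfff58 to 0x3fffdfff6c, a length of 0x14 bytes), indices 0 to 4 are in bounds and index 5 at address 0x3fffdfff6c is the first access that fails the check, which matches the position of the exception in the output.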

Listing 6.11 shows that the result is very similar to the bare-metal output in Listing 5.13: reading beyond the bounds of the array is not possible when the code is built and run on CheriBSD. However, the output of the pp_cap function differs, as does the behavior after the exception. The CHERI LengthViolation trap is handled by the CheriBSD kernel, which terminates the program and produces the (core dumped) message in the output. This behavior is similar to how memory access violations are handled in non-CHERI systems with an MMU. After program termination, CheriBSD remains in a stable state, allowing the user to continue interacting with the system.

Direct comparison with the output from the bare-metal example in Listing 5.13 shows two differences, which are analyzed in more detail in the following. First, the address of the capability has changed, which is expected, as the address space layout of CheriBSD differs from that of the bare-metal example. CheriBSD uses virtual memory to separate the address spaces of different processes, while the bare-metal example runs in a flat memory model without virtual memory. The second, more interesting difference is the change in the permission field of the capability. While the capability has full permissions (0x78fff) in the bare-metal example, the capability in CheriBSD (0x6817d) only provides the following permissions: global, permit_load, permit_store, permit_load_cap, permit_store_cap, permit_store_local_cap and permit_c_invoke. These permissions, which are explained in Section 2.4.5, indicate that the capability cannot be used for execution, sealing or unsealing, nor can it be used to access the system registers. As the experiments in Section 5.2.2 showed, the capability of the array is derived from the DDC in the bare-metal



   void *__capability ddc = cheri_ddc_get();
   printf("DDC: ");
   pp_cap(ddc);
   // Prints:
   // DDC: Capability: 0x0
   // Tag: 0, Perms: 0000, Type: ffffffffffffffff, Address: 0000, Base: 0000,
   // End: ffffffffffffffff, Flags: 0, Length: ffffffffffffffff, Offset: 0000

    Listing 6.12: Reading the Default Data Capability


   void *stack_pointer = __builtin_frame_address(0);
   printf("Stack Pointer: ");
   pp_cap(stack_pointer);
   // Prints:
   // Stack Pointer: Capability: 0x3fffdfff90 [rwRW,0x3fbfe00000-0x3fffe00000]
   // Tag: 1, Perms: 6817d, Type: ffffffffffffffff, Address: 3fffdfff90, Base: 3fbfe00000,
   // End: 3fffe00000, Flags: 0, Length: 40000000, Offset: 3fffff90

    Listing 6.13: Reading the Stack Pointer Capability

scenario. To see whether this is also the case in the CheriBSD scenario, and to learn why the permissions are reduced, the DDC is examined next. This is done by adding the lines shown in Listing 6.12 to the main function of the program. The resulting output shows that the DDC has its bounds set to the entire address space, but its tag is cleared and its permission field is set to 0, indicating no access rights at all. Therefore, the capability of the array must be derived from another capability, not from the DDC. Because the capability of the array points to the stack, it appears likely that it is derived from the stack pointer capability. As a next step, the stack pointer capability is therefore examined by adding the lines shown in Listing 6.13 to the main function of the program. The output shows that the stack pointer capability has its permissions set to 0x6817d, which is the same value as for the capability of the array in Listing 6.11. This result confirms the hypothesis that the capability of the array is indeed derived from the stack pointer capability, in the same way as was already shown in Section 5.2.2. However, unlike in the bare-metal scenario, the stack pointer capability is apparently not derived from the DDC. Instead, it is set up by the OS (CheriBSD) before the program is started, with a more restrictive permission set that is sufficient for the program to run, but does not allow it to execute code or access system registers. This shows how the OS applies the principles of least privilege and intentional use.
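The permission value 0x6817d can be decoded into the permission names listed above with a small helper. This is a minimal sketch, not the pp_cap implementation: the function name is hypothetical, and the bit positions of the twelve hardware permissions are an assumption based on the CHERI ISAv9 layout for CHERI-RISC-V that should be checked against the specification [6]. The higher bits that are set in 0x6817d are assumed to be software-defined permissions and are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed hardware permission bits, index = bit position (0 to 11). */
static const char *const PERM_NAMES[] = {
    "global",
    "permit_execute",
    "permit_load",
    "permit_store",
    "permit_load_cap",
    "permit_store_cap",
    "permit_store_local_cap",
    "permit_seal",
    "permit_c_invoke",
    "permit_unseal",
    "access_system_registers",
    "permit_set_cid",
};
#define NUM_PERMS (sizeof PERM_NAMES / sizeof PERM_NAMES[0])

/* Writes the space-separated names of all hardware permissions set in
 * `perms` into `out` (truncating if `out` is too small) and returns `out`. */
char *decode_perms(uint32_t perms, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t bit = 0; bit < NUM_PERMS; bit++) {
        if (perms & (UINT32_C(1) << bit)) {
            size_t used = strlen(out);
            snprintf(out + used, out_size - used, "%s%s",
                     used ? " " : "", PERM_NAMES[bit]);
        }
    }
    return out;
}
```

Under the assumed layout, decoding 0x6817d produces exactly the seven permissions named in the text, with permit_execute, permit_seal, permit_unseal and access_system_registers absent.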

6.3.2 Evaluating Function Pointers

The next example revisits the function pointer example from Section 5.2.3 and is based on the same C code, which is shown in Listing 5.14. The only difference is that the code is compiled using the CheriBSD toolchain. The resulting binary is then copied to the root file system image, before rebuilding the kernel to include the updated image. After booting CheriBSD on the VP, the test program is executed from the shell.

Comparing the output of the program on CheriBSD in Listing 6.14 with the output from the bare-metal example in Listing 5.15 shows that function pointers work in both environments. However, some differences can be observed in the value of the capability printed by the pp_cap function. As already seen in the previous section, the address of the capability has changed, again due to the different address space layout of CheriBSD. The permission field of the capability is also different. While function pointers in the bare-metal example had almost full permissions (0x78f57),



   # /usr/bin/function-pointer-example
   Capability {
    Tag: 1, Perms: 68117, Type: fffffffffffffffe, Address: 101b7e,
    Base: 100000, End: 103ee0, Flags: 1, Length: 3ee0, Offset: 1b7e
   }
   Result of function pointer call: 15

   Listing 6.14: Output of the function pointer example in CheriBSD on CHERI-RISC-V VP++

the function pointer capability in CheriBSD (0x68117) is much more restrictive, only providing the following permissions: global, permit_execute, permit_load, permit_load_cap and permit_c_invoke. This difference can again be explained by the way capabilities are derived for a function pointer. As explained in Section 5.2.3, the function pointer is created by the compiler as a relocation entry in the __cap_relocs section of the ELF file, where it is marked as an executable code pointer. The actual capability in the GCT, which is located in the tagged memory, can only be created at runtime. While the GCT entry for the pointer is created in the bootstrap process in the bare-metal example, it is created by the dynamic linker of CheriBSD, which is part of the OS, in the CheriBSD example. This leads to a more restrictive permission set, as the dynamic linker applies more fine-grained control over the permissions of capabilities, following the principles of least privilege and intentional use.

The other very important difference is that the capability in the CheriBSD example has limited bounds, while the capability in the bare-metal example had its bounds set to the entire address space. This time, the bounds are set to the range from 0x100000 to 0x103ee0. Using objdump and readelf, it can be confirmed that this range corresponds to the size of the .text section of the ELF binary. So while the bounds are more limited than in the bare-metal example, they still allow the function pointer to access the entire code section of the currently executed program, which is the expected behavior, as explained in Section 5.2.3.
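The reduction from the bare-metal permission value (0x78f57) to the CheriBSD value (0x68117) can be made explicit with two bit-mask operations. The following is a small sketch with hypothetical helper names. It only compares the raw permission masks reported by pp_cap and makes no claim about how the dynamic linker computes them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True iff every permission bit set in `a` is also set in `b`. */
bool perms_subset(uint32_t a, uint32_t b)
{
    return (a & ~b) == 0;
}

/* Permission bits that are present in `wide` but absent from `narrow`.
 * Precondition: `narrow` must not contain bits that `wide` lacks. */
uint32_t perms_dropped(uint32_t wide, uint32_t narrow)
{
    assert(perms_subset(narrow, wide));
    return wide & ~narrow;
}
```

For the two values above, perms_subset(0x68117, 0x78f57) holds, so the CheriBSD function pointer carries no permission that the bare-metal one lacked, and perms_dropped(0x78f57, 0x68117) evaluates to the mask 0x10e40 of removed bits.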

6.4 Conclusion

This chapter demonstrated that the CHERI-RISC-V VP++ is capable of booting and running CheriBSD, a CHERI-enabled Unix-like full-scale OS. The first section explained the necessary requirements and setup to achieve this goal, including the implementation of missing devices and the use of a CHERI-aware OpenSBI firmware.

With the OS running, the chapter proceeded to showcase basic system interactions, such as navigating the file system and inspecting running processes. The fact that multiple threads were running concurrently, and that the system remained stable after terminating a program, demonstrated the robustness of the VP implementation. The networking capabilities were also verified by successfully pinging the loopback interface, confirming that the networking stack of CheriBSD is functional within the VP environment.

As a final step, the chapter explored basic CHERI functionality within the CheriBSD environment. By revisiting the simple example from Section 5.2.2, it was shown that the CHERI-RISC-V VP++ correctly enforces memory safety by causing a CHERI LengthViolation when attempting to read


beyond the bounds of an array. This test also confirmed that CheriBSD remains stable after handling an exception, as the shell remains operational after the in-address space security exception caused the core dump of the program. The function pointer example was also successfully executed within CheriBSD, demonstrating that function pointers work as expected in the OS environment. This example also highlighted the differences in capability permissions and bounds compared to the bare-metal scenario, emphasizing the role of the OS in setting up capabilities with appropriate restrictions.

While these small experiments do not fully explore the capabilities of CHERI within an OS context, they serve as a proof of concept that the CHERI-RISC-V VP++ can effectively run a CHERI-enabled OS and enforce CHERI protections. The chapter concludes that the CHERI-RISC-V VP++ is a viable platform for further exploration and development of CHERI-based systems, providing a foundation for future research and experimentation in this area.

Chapter 7

Conclusion and Future Work

The final chapter of this thesis summarizes the results achieved through the development of CHERI-RISC-V VP++ in Section 7.1 and outlines potential directions for future work in Section 7.2.

7.1 Conclusion

In this thesis, CHERI-RISC-V VP++, a capability-aware VP, was developed to support fine-grained memory protection based on the CHERI extension for the RISC-V architecture. This work was motivated by the growing significance of the CHERI project, which has gained increasing attention in recent years as a promising approach to mitigating memory-related security vulnerabilities. By combining hardware-enforced capabilities with software-level support, CHERI addresses one of the most critical and persistent issues in systems programming: memory corruption.

CHERI-RISC-V VP++ extends the existing RISC-V VP++ by integrating CHERI-specific features, including capability registers, tagged memory, capability-aware instruction decoding and execution, and exception handling. The resulting CHERI-RISC-V VP++ was verified using random testing with the TestRIG framework, and its functionality was demonstrated through the execution of CHERI-enabled programs on bare-metal as well as by successfully booting the capability-enabled full-scale OS CheriBSD.

Chapter 2 provided the background knowledge required to understand the design and implementation of the CHERI-RISC-V VP++. First, an overview of RISC-V and the concept of VPs was presented, followed by a detailed description of the existing RISC-V VP++, which served as the basis for this work. The chapter also introduced the CHERI architecture and its implications for RISC-V, as well as all the necessary toolchains and software components used throughout this thesis. This established a common understanding of the key concepts necessary to follow the subsequent chapters.

On this basis, Chapter 3 described the actual implementation of the CHERI-RISC-V VP++ by outlining the changes made to the existing VP from an architectural perspective. The representation of capabilities in registers and memory was explained first, followed by the changes made to the ISS to support decoding and executing CHERI-specific instructions. Finally, the modifications to all memory-related components and interfaces were presented, which support tagged memory and capability-aware memory accesses.

Chapter 4 then focused on the verification of the CHERI-RISC-V VP++ by explaining how the TestRIG framework was used for random testing. The chapter discussed advantages of this approach and showcased bugs that were found during the verification process, but also pointed out the limits of random testing and problems caused by the RVFI-DII interface. Finally, the chapter concluded with a detailed code coverage analysis of all files modified during the implementation of the CHERI-RISC-V VP++, in order to evaluate the coverage achieved by the applied tests.

7 Conclusion and Future Work    73

The verified implementation made it possible to run bare-metal CHERI-aware programs, which were shown in Chapter 5. This required an understanding of the compilation process of CHERI programs and of how capabilities are created in the first place. Based on simple examples and an in-depth analysis of the executed CHERI instructions, the CHERI protection scheme and all the additional checks performed by the hardware were explained.

Finally, Chapter 6 demonstrated the full functionality of the CHERI-RISC-V VP++ by booting the capability-enabled OS CheriBSD. The chapter explained the steps necessary to build and run CheriBSD on the VP and showed the successful boot process. The running system was first verified by inspecting the results of common Unix commands. With basic functionality assured, two simple CHERI-specific examples from previous chapters were rerun to confirm that the CHERI protection mechanisms work as expected within CheriBSD.

The final version of the CHERI-RISC-V VP++ developed in this thesis will be released as open source on GitHub¹. Having a fully functional CHERI-enabled RISC-V VP opens up new possibilities for future work, including challenging the CHERI model and its claimed security guarantees, as outlined in the final section of this thesis.

7.2 Future Work

While the results presented in Chapter 6 demonstrate that the CHERI-RISC-V VP++ is fully functional, several areas remain for potential improvement and further exploration. The following sections outline targets for future work identified in the course of this thesis.

7.2.1 RISC-V Specification for CHERI Extensions

As explained in Section 2.4.4, the CHERI-RISC-V VP++ developed in this thesis is based on CHERI ISAv9 [6]. However, it currently appears that the RISC-V specification for CHERI extensions [8] will become the standard for CHERI on RISC-V. At the time of writing, this specification is still evolving and has not yet reached full stability, which is why this work is based on the stable CHERI ISAv9. As more and more projects based on this upcoming standard are released, it might be beneficial to adapt the CHERI-RISC-V VP++ to the latest specification.

The majority of adaptations should be straightforward, as CHERI ISAv9 is the more feature-rich specification, and the RISC-V specification only implements the core features of CHERI. Therefore, most of the work lies in adapting the instruction encodings, as the upcoming standard has moved the encoding to the default RISC-V instruction space, while CHERI ISAv9 uses a custom encoding space for CHERI instructions.

However, special attention is required for handling the hybrid mode, as its implementation differs between the two specifications. Additional unforeseen issues may arise, since the upcoming specification was not considered during the development of the CHERI-RISC-V VP++.

7.2.2 RV32 Support

As explained at the beginning of Chapter 3, the CHERI-RISC-V VP++ currently supports only the RV64 architecture. This choice was made to reduce the implementation effort, as supporting only one architecture simplifies implementation and testing while still providing a fully functional platform capable of demonstrating all CHERI features. However, to provide a more complete VP, support for the RV32 architecture could be added as well. As with the adaptation to the new RISC-V specification for CHERI extensions, this should be mostly straightforward, as the implementation can be based on the existing RV64 implementation.

¹ https://github.com/ics-jku/cheri-riscv-vp-plusplus

7.2.3 Performance Improvements

As mentioned in Section 3.3.6, the caching mechanism added by Schlägl and Große [61] is disabled for the CHERI extension. As that paper reports a large performance increase for the RISC-V VP++, it is expected that the performance of the CHERI-enabled VP could also be improved significantly. However, enabling this caching mechanism requires some special considerations, especially for the caching of the PCC.

Another potential performance improvement could be achieved by optimizing the conversion from the encoded capability format to the partially decompressed format, as this function is invoked repeatedly, particularly throughout the CheriBSD boot process. Currently, this conversion is implemented based on the Sail model and is designed for functionality and easy traceability rather than performance.

One could also consider implementing a fully decompressed capability format, which would remove the effort of calculating the bounds of a capability on each bounds check. Instead, the bounds would be updated whenever the capability is modified. Because bounds checks are performed more often than a capability's bounds are modified, this could result in significant performance gains. While this approach might not be feasible for actual hardware, it could be a good fit for the VP, as the semantics of the ISS do not change.
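The idea of a fully decompressed format can be illustrated with a minimal sketch. It is not taken from the VP's code base: `EncodedCap`, `decode_bounds`, `CachedCap`, and the toy encoding (base and length stored directly) are hypothetical stand-ins for the Sail-derived conversion. The sketch only shows the intended trade-off, namely that bounds are decoded when a capability is modified, while each access check reads the cached values.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for the compressed capability encoding (assumption).
// Real CHERI Concentrate bounds decoding is far more involved; storing
// base and length directly keeps this sketch small and runnable.
struct EncodedCap {
    std::uint64_t base;
    std::uint64_t length;
};

struct Bounds {
    std::uint64_t base;
    std::uint64_t top;  // exclusive upper bound
};

// Counts decode operations so that the saving can be observed.
static unsigned decode_calls = 0;

// Hypothetical decode step, standing in for the Sail-derived conversion.
Bounds decode_bounds(const EncodedCap& enc) {
    // The toy encoding assumes that base + length fits into 64 bits.
    assert(enc.length <= UINT64_MAX - enc.base);
    ++decode_calls;
    return Bounds{enc.base, enc.base + enc.length};
}

class CachedCap {
public:
    explicit CachedCap(const EncodedCap& enc)
        : enc_(enc), bounds_(decode_bounds(enc)) {}

    // Bounds change only here, so this is the only place that re-decodes.
    // The monotonicity checks of a real bounds-setting instruction are omitted.
    void set_bounds(std::uint64_t base, std::uint64_t length) {
        enc_ = EncodedCap{base, length};
        bounds_ = decode_bounds(enc_);
    }

    // Hot path: every memory access is checked against the cached bounds.
    bool in_bounds(std::uint64_t addr, std::uint64_t size) const {
        return addr >= bounds_.base && addr <= bounds_.top &&
               size <= bounds_.top - addr;
    }

private:
    EncodedCap enc_;
    Bounds bounds_;
};
```

In this sketch, any number of `in_bounds` calls leaves `decode_calls` unchanged, and only `set_bounds` increments it. Whether the same saving carries over to the VP depends on how often capabilities are modified relative to how often they are checked, which would have to be measured.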

7.2.4 Extending other PTE Formats

As explained in Section 3.4.5, only the Sv39 page table format is currently supported by the CHERI-extended RISC-V VP++. However, the base RISC-V VP++ supports all three RISC-V page table formats, namely Sv32, Sv39, and Sv48. Therefore, it would be beneficial to extend the CHERI-RISC-V VP++ to support the other two formats as well. While the implementation of the Sv48 format should be straightforward, Sv32 might require some additional considerations, as the lack of spare bits in this format makes the extension difficult. The standard does provide some suggestions on how to handle this, mainly by treating the fields that cannot be represented as if they were always set.
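The suggested Sv32 handling can be sketched as follows. This is an illustration under stated assumptions, not the VP's MMU code: `PteFormat`, `CapPtePerms`, and `cap_perms` are hypothetical names, only two capability permission fields are modeled, and the bit positions are placeholders chosen for the example rather than values taken from the specification.

```cpp
#include <cassert>
#include <cstdint>

enum class PteFormat { Sv32, Sv39, Sv48 };

// Placeholder bit positions, chosen only for illustration (assumption).
// The actual positions are defined by the CHERI specification.
constexpr unsigned kCapReadBit = 62;
constexpr unsigned kCapWriteBit = 63;

// Two capability-related PTE permissions, as a simplified model.
struct CapPtePerms {
    bool cap_read;
    bool cap_write;
};

// Sv32 PTEs have no spare bits for the CHERI fields.
constexpr bool has_spare_bits(PteFormat fmt) {
    return fmt != PteFormat::Sv32;
}

CapPtePerms cap_perms(PteFormat fmt, std::uint64_t pte) {
    if (!has_spare_bits(fmt)) {
        // An Sv32 PTE is 32 bits wide, so the upper half must be empty.
        assert((pte >> 32) == 0);
        // Fields that cannot be represented behave as if always set.
        return CapPtePerms{true, true};
    }
    return CapPtePerms{((pte >> kCapReadBit) & 1u) != 0,
                       ((pte >> kCapWriteBit) & 1u) != 0};
}
```

With this scheme, an Sv32 mapping never restricts capability loads or stores through the page table, so any finer-grained control would have to come from elsewhere.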

7.2.5 GDB

The CTSRD project has extended the GNU Debugger (GDB) to support CHERI [62], enabling inspection of capabilities in registers and memory, as well as disassembly of CHERI-specific instructions. Since the RISC-V VP++ includes a GDB server interface, it is already possible to connect a GDB client and debug the programs running on the VP. This also works when running CHERI-enabled programs on the CHERI-RISC-V VP++, allowing users to set breakpoints, step through code, and inspect memory and integer registers.

However, to facilitate more effective analysis of CHERI programs, it would be advantageous to extend the GDB server implementation to be compatible with the CHERI-aware GDB. This would allow users to directly observe how capabilities are represented in registers and manipulated at runtime. While such information can be obtained by debugging the VP itself and inspecting its registers, support for the CHERI GDB extension would significantly streamline the debugging process, especially when working with larger programs or complex scenarios such as booting CheriBSD.

7.2.6 Verification

While the random testing approach using TestRIG proved to be an effective method for verifying the CHERI-RISC-V VP++, it became evident that this method has its flaws and limitations. First, TestRIG does not cover all possible instructions and cannot handle side effects of instructions. Second, the RVFI-DII approach requires significant changes to the VP, which leads to modified behavior under test and requires additional maintenance effort that might not be justified in the long run. Therefore, it would be beneficial to explore alternative verification methods that can complement or even replace the current approach. Ideally, such a method would be less intrusive, allowing for black-box testing of the DuT and thus avoiding the need for modifications to the VP.

One potential method is the approach used by Schlägl and Große in [63], which provides a framework for positive and negative testing of the RISC-V vector extension. This framework was already used to verify the implementation of the vector extension in the RISC-V VP++ [22] and could be adapted to support CHERI-specific instructions and scenarios. The advantage of this black-box approach is that it requires only minimal changes to the DuT and natively supports the RISC-V VP++.

7.2.7 Challenging the CHERI Model

With a fully functional VP of a CHERI-enabled RISC-V system, it is now possible to challenge the CHERI model itself. As described in the introduction, CHERI claims to provide a more secure computing environment by enabling fine-grained memory protection and highly scalable software compartmentalization.

The ability to execute CHERI programs on the VP and to observe their behavior in every detail allows exploring aspects of the model that go beyond what functional emulators such as QEMU provide. While QEMU enables fast execution and practical software experimentation, the CHERI-RISC-V VP++ extends these possibilities with precise insight into timing behavior, hardware interactions and the microarchitectural consequences of capability-based execution. With the working CheriBSD system showcased in Chapter 6, researchers can now experiment with the full feature set of CHERI, while at the same time validating and refining the underlying hardware model. In this way, the CHERI-RISC-V VP++ not only extends the available toolchain for CHERI research, but also opens up new opportunities to test the limits of the model and its security guarantees.

A particularly promising research direction is the systematic analysis of the protection achieved in practice, especially when the CHERI compiler inserts additional CHERI instructions. As demonstrated in Section 5.2.3, the current version of the CHERI compiler does not always generate the most restrictive capabilities possible, raising the question of how this limitation influences the overall security guarantees of the system. Applying techniques such as data-flow analysis [64] or information-flow tracking [65] could yield detailed insights into which parts of a program are effectively protected by CHERI and how this protection propagates through execution. Such methods also enable the visualization of protection flows, facilitating the identification of potential weaknesses or opportunities for refinement. Moreover, applying and further developing novel coverage metrics, such as those developed by Hazott and Große [66], may help quantify the effectiveness of CHERI protection in real-world applications. These metrics could also support comparisons between CHERI’s fine-grained memory protection and type-based safety models, such as those employed by Rust.

Bibliography

[1] MITRE. 2024. CWE/SANS Top 25 Most Dangerous Software Errors. (2024). https://cwe.mitre.org/top25/.
[2] 2025. The Chromium Projects - Memory Safety. https://www.chromium.org/Home/chromium-security/memory-safety/.
[3] László Szekeres, Mathias Payer, Tao Wei, and Dawn Song. 2013. SoK: Eternal War in Memory. In 2013 IEEE Symposium on Security and Privacy, pp. 48–62. doi: 10.1109/SP.2013.13.
[4] Robert N. M. Watson, David Chisnall, Jessica Clarke, Brooks Davis, Nathaniel Wesley Filardo, Ben Laurie, Simon W. Moore, Peter G. Neumann, Alexander Richardson, Peter Sewell, Konrad Witaszczyk, and Jonathan Woodruff. 2024. CHERI: Hardware-Enabled C/C++ Memory Protection at Scale. IEEE Security & Privacy, 22, 4, 50–61. doi: 10.1109/MSEC.2024.3396701.
[5] Robert N.M. Watson, Jonathan Woodruff, Peter G. Neumann, Simon W. Moore, Jonathan Anderson, David Chisnall, Nirav Dave, Brooks Davis, Khilan Gudka, Ben Laurie, Steven J. Murdoch, Robert Norton, Michael Roe, Stacey Son, and Munraj Vadera. 2015. CHERI: A Hybrid Capability-System Architecture for Scalable Software Compartmentalization. In 2015 IEEE Symposium on Security and Privacy, pp. 20–37. doi: 10.1109/SP.2015.9.
[6] Robert N. M. Watson, Peter G. Neumann, Jonathan Woodruff, Michael Roe, Hesham Almatary, Jonathan Anderson, John Baldwin, Graeme Barnes, David Chisnall, Jessica Clarke, Brooks Davis, Lee Eisen, Nathaniel Wesley Filardo, Franz A. Fuchs, Richard Grisenthwaite, Alexandre Joannou, Ben Laurie, A. Theodore Markettos, Simon W. Moore, Steven J. Murdoch, Kyndylan Nienhuis, Robert Norton, Alexander Richardson, Peter Rugg, Peter Sewell, Stacey Son, and Hongyan Xia. 2023. Capability Hardware Enhanced RISC Instructions: CHERI Instruction-Set Architecture (Version 9). Technical report UCAM-CL-TR-987. University of Cambridge, Computer Laboratory, (September 2023). doi: 10.48456/tr-987. https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-987.pdf.
[7] 2025. CHERI Alliance. https://cheri-alliance.org/.
[8] Thomas Aird, Hesham Almatary, et al. 2025. RISC-V Specification for CHERI Extensions. Technical report. Version v0.9.5, Stable, 2025-06-18. RISC-V Foundation.
[9] 2025. QEMU-CHERI. https://github.com/CTSRD-CHERI/qemu.
[10] Manfred Schlägl, Christoph Hazott, and Daniel Große. 2024. RISC-V VP++: Next Generation Open-Source Virtual Prototype. In Workshop on Open-Source Design Automation.
[11] 2023. IEEE Standard for Standard SystemC Language Reference Manual. IEEE Std 1666-2023 (Revision of IEEE Std 1666-2011), 1–618. doi: 10.1109/IEEESTD.2023.10246125.
[12] Daniel Große and Rolf Drechsler. 2010. Quality-Driven SystemC Design. Springer.
[13] Vladimir Herdt, Daniel Große, and Rolf Drechsler. 2020. Enhanced Virtual Prototyping: Featuring RISC-V Case Studies. Springer.
[14] Andrew Waterman, Yunsup Lee, David A Patterson, and Krste Asanovic. 2011. The risc-v instruction set manual, volume i: Base user-level isa. EECS Department, UC Berkeley, Tech. Rep. UCB/EECS-2011-62, 116, 1–32.
[15] Andrew Waterman, Yunsup Lee, Rimas Avizienis, David A Patterson, and Krste Asanovic. 2016. The RISC-V instruction set manual volume II: Privileged architecture version 1.7. EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2016-129.
[16] 2025. RISC-V Foundation. https://riscv.org/.

[17] Andrew Waterman, Yunsup Lee, David A. Patterson, and Krste Asanović. 2011. The RISC-V Instruction Set Manual, Volume I: Base User-Level ISA. Technical report UCB/EECS-2011-62. (May 2011). http://www2.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-62.html.
[18] T. De Schutter. 2014. Better Software. Faster!: Best Practices in Virtual Prototyping. Synopsys Press, (March 2014).
[19] Open SystemC Initiative (OSCI). 2009. OSCI TLM-2.0 Language Reference Manual. [Online]. Available: https://www.accellera.org/images/downloads/standards/systemc/TLM_2_0_LRM.pdf.
[20] Vladimir Herdt, Daniel Große, Hoang M. Le, and Rolf Drechsler. 2018. Extensible and Configurable RISC-V based Virtual Prototype. In Forum on Specification and Design Languages, pp. 5–16.
[21] Manfred Schlägl and Daniel Große. 2023. GUI-VP Kit: A RISC-V VP Meets Linux Graphics - Enabling Interactive Graphical Application Development. In ACM Great Lakes Symposium on VLSI, pp. 599–605.
[22] Manfred Schlägl, Moritz Stockinger, and Daniel Große. 2024. A RISC-V “V” VP: Unlocking Vector Processing for Evaluation at the System Level. In Design, Automation and Test in Europe Conference, pp. 1–6.
[23] Manfred Schlägl and Daniel Große. 2025. Fast Interpreter-Based Instruction Set Simulation for Virtual Prototypes. In Design, Automation and Test in Europe Conference, pp. 1–7.
[24] 2025. SiFive. https://www.sifive.com/boards/hifive-unleashed.
[25] 2025. CTSRD - Rethinking the hardware-software interface for security. https://www.cl.cam.ac.uk/research/security/ctsrd/.
[26] Robert N. M. Watson, Simon W. Moore, Peter Sewell, and Peter G. Neumann. 2019. An Introduction to CHERI. Technical report UCAM-CL-TR-941. University of Cambridge, Computer Laboratory, (September 2019). doi: 10.48456/tr-941. https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-941.pdf.
[27] Dapeng Gao and Robert NM Watson. 2023. Library-based compartmentalisation on CHERI. In Programming Languages for Architecture 2023.
[28] Robert NM Watson, Jonathan Woodruff, Michael Roe, Simon W Moore, and Peter G Neumann. 2018. Capability hardware enhanced RISC instructions (CHERI): Notes on the Meltdown and Spectre attacks. Technical report. University of Cambridge, Computer Laboratory.
[29] Kyndylan Nienhuis, Alexandre Joannou, Thomas Bauereiss, Anthony Fox, Michael Roe, Brian Campbell, Matthew Naylor, Robert M. Norton, Simon W. Moore, Peter G. Neumann, Ian Stark, Robert N. M. Watson, and Peter Sewell. 2020. Rigorous engineering for hardware security: Formal modelling and proof in the CHERI design and implementation process. In 2020 IEEE Symposium on Security and Privacy (SP), pp. 1003–1020. doi: 10.1109/SP40000.2020.00055.
[30] Jerome H. Saltzer. 1974. Protection and the control of information sharing in multics. Commun. ACM, 17, 7, (July 1974), 388–402. issn: 0001-0782. doi: 10.1145/361011.361067. https://doi.org/10.1145/361011.361067.
[31] Andrew Herbert and Karen Spärck Jones, (Eds.) 2004. Least Privilege and More. Computer Systems: Theory, Technology, and Applications. Springer New York, New York, NY, pp. 253–258. isbn: 978-0-387-21821-2. doi: 10.1007/0-387-21821-1_38. https://doi.org/10.1007/0-387-21821-1_38.
[32] Norm Hardy. 1988. The Confused Deputy: (or why capabilities might have been invented). ACM SIGOPS Operating Systems Review, 22, 4, 36–38.
[33] Robert NM Watson, Peter G Neumann, Jonathan Woodruff, Jonathan Anderson, Ross Anderson, Nirav Dave, Ben Laurie, Simon W Moore, Steven J Murdoch, Philip Paeps, et al. 2012. Cheri: a research platform deconflating hardware virtualisation and protection. In Runtime Environments, Systems, Layering and Virtualized Environments (RESoLVE).

[34] Richard Grisenthwaite, Graeme Barnes, Robert N. M. Watson, Simon W. Moore, Peter Sewell, and Jonathan Woodruff. 2023. The Arm Morello Evaluation Platform - Validating CHERI-Based Security in a High-Performance System. IEEE Micro, 43, 3, 50–57. doi: 10.1109/MM.2023.3264676.
[35] 2025. CHERI x86-64 Sail model. https://github.com/CTSRD-CHERI/sail-cheri-x86.
[36] Jonathan Woodruff, Alexandre Joannou, Hongyan Xia, Anthony Fox, Robert M Norton, David Chisnall, Brooks Davis, Khilan Gudka, Nathaniel W Filardo, A Theodore Markettos, et al. 2019. Cheri concentrate: Practical compressed capabilities. IEEE Transactions on Computers, 68, 10, 1455–1469.
[37] 2025. Sail CHERI RISC-V. https://github.com/CTSRD-CHERI/sail-cheri-riscv.
[38] 2025. The Sail ISA specification language. https://github.com/rems-project/sail.
[39] Kathryn E. Gray, Gabriel Kerneis, Dominic Mulligan, Christopher Pulte, Susmit Sarkar, and Peter Sewell. 2015. An integrated concurrency and core-ISA architectural envelope definition, and test oracle, for IBM POWER multiprocessors. In Proceedings of the 48th International Symposium on Microarchitecture (MICRO-48). Association for Computing Machinery, Waikiki, Hawaii, pp. 635–646. isbn: 9781450340342. doi: 10.1145/2830772.2830775. https://doi.org/10.1145/2830772.2830775.
[40] 2025. Sail RISC-V. https://github.com/riscv/sail-riscv.
[41] 2025. CHERI LLVM. https://github.com/CTSRD-CHERI/llvm.
[42] 2025. CheriBSD - Github. https://github.com/CTSRD-CHERI/cheribsd.
[43] 2025. CheriBSD. https://www.cheribsd.org/.
[44] Rabia Khan, Kinan Ghanem, and Federico Coffele. 2023. Digital Security by Design: A Review of Combined Hardware-Software-Based CyberSecurity with Compartmentalization. In 2023 IEEE International Workshop on Technologies for Defense and Security (TechDefense), pp. 181–186. doi: 10.1109/TechDefense59795.2023.10380808.
[45] 2025. QEMU. https://www.qemu.org/.
[46] 2025. Cheribuild. https://github.com/CTSRD-CHERI/cheribuild.
[47] 2025. CapableVMs. https://capablevms.github.io/.
[48] 2025. EPSRC. https://www.ukri.org/councils/epsrc/.
[49] 2025. TestRIG. https://github.com/CTSRD-CHERI/TestRIG.
[50] A Joannou, P Rugg, J Woodruff, FA Fuchs, M Van Der Maas, M Naylor, M Roe, RNM Watson, PG Neumann, and SW Moore. 2024. Randomized Testing of RISC-V CPUs Using Direct Instruction Injection. doi: 10.17863/CAM.95122. https://www.repository.cam.ac.uk/handle/1810/347706.
[51] Claude Helmstetter, Tayeb Bouhadiba, Matthieu Moy, and Florence Maraninchi. 2006. IEEE Transactions on VLSI Systems, 14, 501–513.
[52] 2025. RVFI-DII. https://github.com/CTSRD-CHERI/TestRIG/blob/master/RVFI-DII.md.
[53] 2025. Spike RISC-V ISA Simulator. https://github.com/CTSRD-CHERI/riscv-isa-sim/.
[54] 2025. GCOV - A Test Coverage Program. https://gcc.gnu.org/onlinedocs/gcc/Gcov.html.
[55] 2025. cheri-elf-psabi. https://github.com/CTSRD-CHERI/cheri-elf-psabi/blob/master/riscv.md.
[56] Crispin Cowan, F Wagle, Calton Pu, Steve Beattie, and Jonathan Walpole. 2000. Buffer overflows: Attacks and defenses for the vulnerability of the decade. In Proceedings DARPA Information Survivability Conference and Exposition. DISCEX’00. Volume 2. IEEE, pp. 119–129.
[57] Marco Carvalho, Jared DeMott, Richard Ford, and David A. Wheeler. 2014. Heartbleed 101. IEEE Security & Privacy, 12, 4, 63–67. doi: 10.1109/MSP.2014.66.

[58] Brooks Davis, Robert NM Watson, Alexander Richardson, Peter G Neumann, Simon W Moore, John Baldwin, David Chisnall, Jessica Clarke, Nathaniel Wesley Filardo, Khilan Gudka, et al. 2019. CheriABI: Enforcing valid pointer provenance and minimizing pointer privilege in the POSIX C run-time environment. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 379–393.
[59] 2025. OpenSBI. https://github.com/riscv-software-src/opensbi.
[60] 2025. OpenSBI-CHERI. https://github.com/CTSRD-CHERI/opensbi.
[61] Manfred Schlägl and Daniel Große. 2025. FastISS RISC-V VP++: A Simulation Performance Evaluation of RVV Workloads. In RISC-V Summit Europe.
[62] 2025. GDB CHERI. https://github.com/CTSRD-CHERI/gdb.
[63] Manfred Schlägl and Daniel Große. 2024. Single Instruction Isolation for RISC-V Vector Test Failures. In IEEE/ACM International Conference on Computer-Aided Design, 156:1–156:9.
[64] Muhammad Hassan, Vladimir Herdt, Hoang M. Le, Mingsong Chen, Daniel Große, and Rolf Drechsler. 2017. Data Flow Testing for Virtual Prototypes. In Design, Automation and Test in Europe Conference, pp. 380–385.
[65] Pascal Pieper, Vladimir Herdt, Daniel Große, and Rolf Drechsler. 2020. Dynamic Information Flow Tracking for Embedded Binaries using SystemC-based Virtual Prototypes. In Design Automation Conference, pp. 1–6.
[66] Christoph Hazott and Daniel Große. 2024. Relation Coverage: A new Paradigm for Hardware/Software Testing. In IEEE European Test Symposium, pp. 1–4.

Appendix A

TestRIG Code Coverage

Table A.1: Code Coverage of the CHERI-RISC-V VP++ running 2 150 000 test cases generated by TestRIG

Directory / File         Line Coverage           Function Coverage
core/common              32.6 %  1391 / 4265     71.3 %  144 / 202
  bus_lock_if.h          60.0 %     3 / 5        33.3 %    1 / 3
  bus_lock_if.h          59.1 %    13 / 22       66.7 %    4 / 6
  dbbcache.h             79.7 %   102 / 128      80.0 %    4 / 5
  elf_loader.h           85.9 %    61 / 71      100.0 %    8 / 8
  helper_functions.h    100.0 %     3 / 3       100.0 %    1 / 1
  instr.cpp              29.6 %  1047 / 3538     94.5 %   52 / 55
  instr.h                87.0 %    80 / 92       87.0 %   40 / 46
  iss_stats.h            95.2 %    20 / 21       95.2 %   20 / 21
  lscache.h              23.8 %    19 / 80       71.4 %    5 / 7
  lscache_stats.h        11.1 %     1 / 9        11.1 %    1 / 9
  mmu.h                   5.9 %     9 / 153      16.7 %    3 / 18
  real_clint.cpp         29.9 %    23 / 77        7.7 %    1 / 13
  timer.cpp               5.2 %     3 / 58       14.3 %    1 / 7
  timer.h               100.0 %     1 / 1       100.0 %    1 / 1
  trap.h                 85.7 %     6 / 7       100.0 %    2 / 2
core/common/cheri        91.3 %   377 / 413      94.1 %   64 / 68
  cheri_cap_common.h    100.0 %    20 / 20      100.0 %    7 / 7
  cheri_capability.h     98.9 %   269 / 272      97.6 %   40 / 41
  cheri_exceptions.h     60.0 %    33 / 55       83.3 %    5 / 6
  cheri_regfile.h        71.4 %    25 / 35       77.8 %    7 / 9
  cheri_sys_regs.h       96.8 %    30 / 31      100.0 %    5 / 5
core/rv64                28.3 %  1950 / 6899     61.8 %   97 / 157
  csr.h                  81.2 %   177 / 218      86.7 %   26 / 30
  iss.h                   0.0 %     0 / 11        0.0 %    0 / 2
  iss_ctemplate.cpp      25.5 %  1661 / 6525     50.0 %   44 / 88
  iss_ctemplate.h        91.6 %    98 / 107      80.0 %   24 / 30
  syscall.h              36.8 %    14 / 38       42.9 %    3 / 7
core/rv64/cheri64        68.1 %   383 / 562      65.3 %   66 / 101
  cheri_addr_checks.h    90.9 %    20 / 22      100.0 %    2 / 2
  cheri_mem.h            64.5 %   321 / 498      63.5 %   61 / 96
  cheri_prelude.cpp     100.0 %    42 / 42      100.0 %    3 / 3
platform/cheri           83.3 %    75 / 90      100.0 %    3 / 3
  cheri_main.cpp         83.3 %    75 / 90      100.0 %    3 / 3
platform/common          60.1 %   131 / 218      65.7 %   23 / 35
  async_event.h          46.2 %     6 / 13       50.0 %    2 / 4
  bus.h                  58.0 %    40 / 69       76.9 %   10 / 13
  cheri_memory.h         73.8 %    48 / 65       72.7 %    8 / 11
  options.cpp            54.8 %    34 / 62       40.0 %    2 / 5
  terminal.h             33.3 %     3 / 9        50.0 %    1 / 2
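The directory rows in Table A.1 are consistent with being the ratio of summed counts over all files in the directory, rather than the mean of the per-file percentages. The following minimal sketch reproduces the line coverage of core/common/cheri from its five file rows. The helper names (`Counts`, `sum_counts`, `percent`, `round1`) are chosen for this illustration and are not part of gcov or of the VP.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Covered and total line counts of one file or directory.
struct Counts {
    unsigned covered;
    unsigned total;
};

// A directory row is the element-wise sum of its file rows.
Counts sum_counts(const std::vector<Counts>& files) {
    Counts sum{0, 0};
    for (const Counts& f : files) {
        sum.covered += f.covered;
        sum.total += f.total;
    }
    return sum;
}

// Coverage in percent; the total must be non-zero.
double percent(Counts c) {
    assert(c.total > 0);
    return 100.0 * c.covered / c.total;
}

// Rounds to one decimal place, matching the precision used in Table A.1.
double round1(double x) {
    return std::round(x * 10.0) / 10.0;
}

// Line counts of the five files under core/common/cheri, from Table A.1.
std::vector<Counts> cheri_common_lines() {
    return {{20, 20}, {269, 272}, {33, 55}, {25, 35}, {30, 31}};
}
```

Summing these rows gives 377 of 413 lines, which rounds to the 91.3 % reported for the directory. Because the ratio is weighted by file size, large files such as instr.cpp and iss_ctemplate.cpp dominate the totals of their directories.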