Source 3692a174... — STIMSMITH

SOURCE ARCHIVE

SHA256: 3692a174862e37ef441ccdf5031eb8f140bc406e824647f08864225006c6944b

URL: https://www.usenix.org/system/files/sec23fall-prepub-7-xu-jinyan.pdf

TYPE: application/pdf

SIZE: 844.9 KB

FETCHED: 6/6/2026, 10:07:44 PM

EXTRACTOR: liteparse

CHARS: 111,175

EXTRACTED CONTENT


MorFuzz: Fuzzing Processor via Runtime Instruction Morphing enhanced Synchronizable Co-simulation

Jinyan Xu (Zhejiang University, phantom@zju.edu.cn)
Yiyuan Liu (Zhejiang University, yiyuanliu@zju.edu.cn)
Sirui He (City University of Hong Kong, sol.he@my.cityu.edu.hk)
Haoran Lin (Zhejiang University, haoran_lin@zju.edu.cn)
Yajin Zhou* (Zhejiang University, yajin_zhou@zju.edu.cn)
Cong Wang (City University of Hong Kong, congwang@cityu.edu.hk)

*The corresponding author.

Abstract

Modern processors are too complex to be bug free. Recently, a few hardware fuzzing techniques have shown promising results in verifying processor designs. However, due to the complexity of processors, they suffer from complex input grammar, deceptive mutation guidance, and model implementation differences. Therefore, how to verify processors effectively and efficiently is still an open problem. This paper proposes MorFuzz, a novel processor fuzzer that can efficiently discover software triggerable hardware bugs. The core idea behind MorFuzz is to use runtime information to generate instruction streams with valid formats and meaningful semantics. MorFuzz designs a new input structure to provide multi-level runtime mutation primitives and proposes the instruction morphing technique to mutate instructions dynamically. Besides, we extend the co-simulation framework to various microarchitectures and develop the state synchronization technique to eliminate implementation differences. We evaluate MorFuzz on three popular open-source RISC-V processors, CVA6, Rocket, and BOOM, and discover 17 new bugs (with 13 CVEs assigned). Our evaluation shows that MorFuzz achieves 4.4× and 1.6× more state coverage than the state-of-the-art fuzzer, DifuzzRTL, and the famous constrained instruction generator, riscv-dv, respectively.

1 Introduction

With extensions to improve performance and extend functionality, processor designs are becoming more and more sophisticated. Modern processors are extremely large and complex, typically with billions of transistors and multiple cores. Meanwhile, processors are also becoming increasingly error-prone, and even the latest commodity processors suffer from hardware bugs. For example, Intel discovered 42 errata in their 12th-gen CPUs [13]. Bugs in the processor not only produce incorrect computations (e.g., the infamous Pentium FDIV bug [11] returns inaccurate floating-point division results) but also cause devastating errors, such as unpredictable system behavior, locking up the machine, and even software security corruptions. The hyper-threading bug [12] can cause data corruption or loss in general-purpose registers, the Barcelona TLB bug [10] and the Pentium F00F bug [11] freeze up the processor, and vulnerabilities like the SYSRET bug [15] and the memory sinkhole [17] allow unprivileged code to escalate to a higher privilege level.

Since hardware bugs are difficult to patch after the chip is manufactured, it is vital to discover bugs in the pre-silicon phase. Two main methods have been proposed to discover hardware bugs automatically: static formal verification [21, 27, 40, 46, 69, 71] and dynamic simulation-based verification. While formal verification methods can thoroughly verify small designs, they are limited by the state explosion problem and fail to scale to large, complex designs such as processors. To automatically maximize the exploration of the state space of the processor under test, researchers proposed two mechanisms, constrained random verification [28, 35, 41, 66] and coverage guided test generation [22, 56, 59, 60], to direct the simulation-based methods to generate better test cases. However, both of these dynamic methods require design-specific knowledge to define the generation strategies, which demands heavy manual effort.

Recently, fuzzing has become the most popular and effective testing method for software systems due to its ability to discover unknown vulnerabilities with minimal knowledge [4, 25, 29, 47, 51, 55]. Inspired by the effectiveness of fuzzing, researchers started to apply software fuzzing to processors [6, 7, 30, 32, 33, 37, 39]. Unfortunately, according to our evaluation (§5.3), existing fuzzers are still far from being adopted in practice. Previous efforts fail to fuzz processors effectively and efficiently because of the following three challenges.

First, the input grammar of the processor is complex. Processors usually support many different instructions, each of which has its own unique format. Moreover, these instructions require different types of operands (e.g., integers, floating-point numbers, addresses) to perform meaningful operations, further complicating the input grammar. Existing fuzzers [30, 32, 33] statically generate and mutate instructions, resulting in limited mutation primitives and missing effective semantics. The second challenge is that control transfer instructions (such as jump and branch instructions) impair the effectiveness of mutations. Existing fuzzers ignore the interference of the input's control flow on the coverage. As a result, valuable mutations may be skipped because of control transfer instructions and thus be incorrectly discarded. The third issue with existing fuzzers is the implementation differences between models. Almost all previous fuzzers [6, 30, 32, 33] introduce a reference model to check the correctness of the processor. By comparing the state of the processor with that of the reference model, they treat mismatched states as bugs. However, software reference models are inherently different from hardware, and not all differences are bugs. These false positives caused by implementation differences misguide the fuzzers and prevent them from covering the deep states of the processor.

We address the aforementioned challenges with MorFuzz, a novel processor fuzzer that can detect software triggerable hardware bugs efficiently. MorFuzz addresses the first two challenges by dynamically generating diverse and meaningful instruction streams based on runtime information. First, MorFuzz introduces a new input structure, the stimulus template, to explore the processor's input space from multiple dimensions. The stimulus template provides primitives to mutate inputs at the processor state, instruction field, and program semantic levels. Second, MorFuzz uses runtime information to morph instructions dynamically. We propose the instruction morphing technique, which collects contextual information from the processor at runtime to mutate instructions with valid formats and meaningful semantics. In addition, since all mutations are executed, the coverage correctly reflects the effect of the mutations, achieving efficient mutation guidance. Finally, MorFuzz eliminates implementation differences through state synchronization. We extend the co-simulation framework to various microarchitectures and add state synchronization support. This allows MorFuzz to identify the source of the differences and synchronize the hardware state to the reference model to eliminate legal differences.

We have implemented a prototype of MorFuzz on the RISC-V architecture and evaluated it on three real-world open-source processors: CVA6 [68], Rocket [1], and BOOM [70]. These processors cover various microarchitectures, from simple in-order cores to complex out-of-order superscalar cores. Our evaluation shows that MorFuzz achieves up to 4.4× and 1.6× more state coverage than the state-of-the-art processor fuzzer, DifuzzRTL, and the famous constrained instruction generator, riscv-dv, respectively. In terms of performance, MorFuzz reaches in about 30 minutes the coverage that DifuzzRTL reaches in 24 hours, and in about 2.4 hours the coverage that riscv-dv reaches in 24 hours. MorFuzz identified 17 new bugs in total, 13 of which have been assigned CVE numbers, and all of these bugs have been confirmed by the respective communities.

In summary, this paper makes the following contributions:

• We propose a novel processor fuzzing approach that uses runtime information to dynamically generate meaningful input and efficiently guide mutation.
• We present the design and implementation of MorFuzz, a processor fuzzing framework that can efficiently detect software triggerable hardware bugs. MorFuzz achieves 4.4× and 1.6× higher coverage than the state-of-the-art processor fuzzer, DifuzzRTL, and the famous constrained instruction generator, riscv-dv, respectively.
• MorFuzz is a generic RISC-V processor fuzzer that is compatible with various microarchitectures. We evaluate MorFuzz on three popular real-world RISC-V processors (CVA6, Rocket, BOOM) and discover 17 new bugs in total (with 13 CVEs assigned).
• To facilitate the community and future research, we release the source code of MorFuzz at https://github.com/sycuricon/MorFuzz.

2 Background

2.1 RISC-V Instruction Set Architecture

The RISC-V instruction set architecture (ISA) is an open-source reduced instruction set architecture that has gradually become popular in industry and academia. It is composed of a base integer instruction set and a set of optional instruction-set extensions. The standard extensions contain integer multiplication and division, atomic memory operations, single/double-precision floating-point, and compressed instructions. In addition, the control and status register (CSR) instruction extension provides control over the privileged architecture, and the instruction-fetch fence extension is designed to synchronize the instruction memory.

Figure 1: RISC-V instruction formats.

  32-bit formats (R-type field positions: funct7 31:25, rs2 24:20, rs1 19:15, funct3 14:12, rd 11:7, opcode 6:0)
    R     funct7 | rs2 | rs1 | funct3 | rd  | opcode
    I     imm          | rs1 | funct3 | rd  | opcode
    S     imm    | rs2 | rs1 | funct3 | imm | opcode
    B     imm    | rs2 | rs1 | funct3 | imm | opcode
    U     imm                         | rd  | opcode
    J     imm                         | rd  | opcode

  16-bit compressed formats (op in bits 1:0)
    CR    funct4 | rd/rs1 | rs2 | op
    CI    funct3 | imm | rd/rs1 | imm | op
    CSS   funct3 | imm | rs2 | op
    CIW   funct3 | imm | rd' | op
    CL    funct3 | imm | rs1' | imm | rd' | op
    CS    funct3 | imm | rs1' | imm | rs2' | op
    CA    funct6 | rd'/rs1' | funct2 | rs2' | op
    CB    funct3 | imm | rs1' | imm | op
    CJ    funct3 | imm | op
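To make the field layout of Figure 1 concrete, the following Python sketch splits a raw instruction word into its fields. It relies on the standard RISC-V rule that a word whose two lowest bits are 11 is a 32-bit instruction and anything else is a 16-bit compressed one (longer, reserved encodings are ignored). The helper names (insn_length, bits, decode_fields, i_type_imm) are illustrative and not part of MorFuzz; the example word 0xC5920113 encodes addi x2, x4, -935, the template instruction that reappears in Figure 5.

```python
def insn_length(word: int) -> int:
    """Instruction length in bytes, decided by the two lowest bits.
    Only the 16- and 32-bit encodings are considered in this sketch."""
    return 4 if (word & 0b11) == 0b11 else 2


def bits(word: int, hi: int, lo: int) -> int:
    """Extract word[hi:lo] (inclusive), as the fields are drawn in Figure 1."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_fields(word: int) -> dict:
    """Slice a 32-bit instruction at the R-type field positions.
    Which slices are meaningful depends on the format chosen by the opcode:
    an I-type instruction, for example, reuses the rs2/funct7 bits as its immediate."""
    if insn_length(word) != 4:
        raise ValueError("compressed instruction: use the 16-bit formats")
    return {"opcode": bits(word, 6, 0), "rd": bits(word, 11, 7),
            "funct3": bits(word, 14, 12), "rs1": bits(word, 19, 15),
            "rs2": bits(word, 24, 20), "funct7": bits(word, 31, 25)}


def i_type_imm(word: int) -> int:
    """Sign-extended 12-bit immediate of an I-type instruction (bits 31:20)."""
    imm = bits(word, 31, 20)
    return imm - (1 << 12) if imm & 0x800 else imm


if __name__ == "__main__":
    word = 0xC5920113  # addi x2, x4, -935
    f = decode_fields(word)
    print(insn_length(word), f["opcode"], f["rd"], f["funct3"], f["rs1"], i_type_imm(word))
    # prints: 4 19 2 0 4 -935
    print(insn_length(0x0001))  # c.nop has low bits 01, so this prints 2
```

Slicing every word at fixed R-type positions keeps the decoder uniform; deciding which slices are opcode related and which are operand related is left to the format implied by the opcode.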

RISC-V instructions currently have two valid lengths. Except for the 16-bit compressed instructions, all other instructions are 32 bits wide. Figure 1 shows all 15 instruction formats, each consisting of multiple fields. The format of a 32-bit instruction is determined by the opcode field, while the op and funct fields determine the format of a 16-bit instruction. There are two categories of instruction fields. The first category is the opcode related fields. The funct and opcode fields determine the instruction's operation, also known as the opcode. The lower two bits of the opcode field and the op field determine the length of the instruction, while the funct fields and the other bits of the opcode field determine the opcode type. Typically, instructions with similar functions share the same opcode field and are distinguished by the funct fields. The second category is the operand related fields. The imm and rs fields provide the operands: the rs fields select the source registers, and the imm fields supply the immediate number. The rd fields select the destination register, to which the result of the instruction is written back.

2.2 Processor Verification

Unlike software, hardware cannot be easily patched once manufactured. To prevent pre-silicon bugs from escaping to post-silicon, verification is performed throughout the development process. Statistically, about 56% of the project time is spent on verification [23].

2.2.1 Typical Processor Verification

The processor is a finite state machine, and its state includes the microarchitectural state and the architectural state. The microarchitectural state represents the implementation-related internal state that is invisible from outside the processor. In contrast, the architectural state holds the state of a program (e.g., the memory and the general-purpose registers) and is consistent across implementations of the same ISA. We denote the implementation of the processor design under test (DUT) as a function $f_{DUT}$, and $S_{DUT}$ denotes the state of the DUT in the current cycle. At each cycle, the processor generates the next state $S'_{DUT}$ based on the current state $S_{DUT}$ and the external input $I$ (i.e., instructions):

$$f_{DUT}(I, S_{DUT}) \rightarrow S'_{DUT}$$

The task of processor verification is to check whether the implementation function $f_{DUT}$ is a valid subset of the function $f_{spec}$ defined in the specification.

Researchers deploy two main methods to verify hardware designs: static formal verification [21, 27, 40, 46, 69, 71] and dynamic simulation-based verification. Because formal verification struggles to scale to complex designs [16], simulation-based verification is more prevalent in practice. Simulation-based verification uses tailored input to simulate the DUT and checks whether the output of the DUT meets expectations.

Figure 2: Hardware fuzzer workflow. [Input generation: seed corpus, mutator, input. Hardware simulation: processor RTL code with coverage instrumentation, RTL simulator, host executable binary running the DUT. State verification: a comparator checks the DUT state S against the ISA simulator state S_ref and reports bugs; coverage is fed back to the mutator.]

The typical simulation-based verification method involves the following three phases. First, the test case generator randomly generates instruction streams based on constraints [28, 35, 41, 66] or coverage [22, 56, 59, 60]. Next, the RTL simulator [14, 52, 65] translates the RTL code of the processor under test into a software model. The simulator then compiles the software model with its test harness (containing the input interpreter) into a host executable binary file. The simulation is performed by executing the binary file, and during the simulation, the input is translated into bus transactions that are recognized by the DUT. Finally, because checking the abstract implementation directly is difficult, the correctness of the DUT's behavior during the simulation is checked by verifying the externally visible architectural state of the processor. A golden reference model executes the same input, and the correctness of the DUT is determined by comparing the architectural state of the DUT, $S^{arc}_{DUT}$, with that of the reference model, $S^{arc}_{REF}$.

2.2.2 Processor Fuzzing

The typical processor verification method described above is limited by the quality of the generated test cases and struggles to cover corner cases. Fuzzing has recently become a popular testing technique for automatically detecting software security vulnerabilities. Driven by the success of fuzzing, researchers have recently proposed applying it to processor verification [6, 7, 30, 32, 33, 37, 39]. Figure 2 illustrates the general workflow of existing processor fuzzing frameworks, which also consists of three phases. In the input generation phase, the fuzzer generates instruction streams from the seeds and mutates them based on the coverage of the previous round. DifuzzRTL [30] uses static analysis to generate instructions with the required operands, and TheHuzz [33] optimizes mutations according to its optimal weights. In the hardware simulation phase, the RTL code of the DUT is also translated into a host executable binary file. During the simulation, the fuzzer uses instrumentation in the hardware to collect the coverage of the current input. Existing fuzzers have designed various coverage metrics, such as mux coverage [37], control register coverage [30], and hardware behavior coverage [33]. In the state verification phase, the fuzzer extracts the DUT's architectural state, compares it with a reference model (e.g., an ISA simulator), and reports the mismatches as bugs. However, previous fuzzers simply port software fuzzing to the traditional verification flow while ignoring the challenges specific to processor fuzzing. According to our statistics in Figure 8, the state-of-the-art processor fuzzer, DifuzzRTL [30], performs even worse than randomly generated test cases.

2.3 Challenges of Processor Fuzzing

We use the test case in Figure 3, generated by the state-of-the-art processor fuzzer DifuzzRTL [30], as an example to articulate the challenges of processor fuzzing and to analyze why previous fuzzers fail to fuzz processors effectively and efficiently. In the first three lines, the test case initializes the execution environment. Lines 4 to 14 are instructions used to fuzz the functionality of the processor. Each label is a test point, and DifuzzRTL generates about 180 test points on average in one test case. At the end, the test case dumps the architectural state of the processor to memory as the signature and exits the simulation (line 16).

    1   start:
    2
    3       call init_regs
    4   l1: call init_page_table
    5       addi x2, x4, -935
    6   l2:
    7       la x2, l86
    8       jalr x20, 0(x2)
    9       # ...
    10  l86:
    11      csrrw x6, satp, x5
    12  l87:
    13      blt x25, x6, exit
    14      # ...
    15  exit:
    16      call signature

Figure 3: Example test case generated by DifuzzRTL.

Complex Input Grammar. The processor's behavior is determined not only by the external input instructions $I$ but also by its current state $S_{DUT}$. Since the state of the processor accumulates from previous instructions, instruction sequences also affect the state of the processor. Moreover, each instruction itself contains two variables, the opcode and the operands, both of which may take legal or illegal values. Because of these multi-dimensional parameters, processor inputs also have complex semantics, and an instruction only performs meaningful operations with valid operands in a particular execution environment. Limited by static generation and unidimensional mutation, existing fuzzers fail to generate diverse and meaningful instruction streams. For example, DifuzzRTL uses approximate static analysis to select operands, while TheHuzz randomly mutates fields and ignores semantics.

Deceptive Mutation Guidance. Unlike in software fuzzing, the input of processor fuzzing contains control transfer instructions and exceptions. For this reason, the generated instructions are not guaranteed to be executed, so the coverage actually reflects the effect of the executed instructions rather than the effect of the generated instructions. Unfortunately, existing fuzzers all attribute the coverage to the generated instructions, which makes the coverage misleading for mutation. For example, the jalr instruction on line 8 jumps from l2 to l86, causing all instructions from l3 to l85 to be skipped. Suppose the skipped instructions contain some valuable mutations, and the executed instructions do not contribute to the coverage. The fuzzer will consider all these mutations unhelpful and will eventually discard them. As a result, the coverage incorrectly guides the fuzzing in an ineffective direction.

Model Implementation Differences. Existing fuzzers use an ISA simulator as the reference model to detect hardware bugs. However, the ISA simulator is only a functional model of the processor and has some inherent differences from the actual hardware. For example, the ISA simulator is not cycle accurate and lacks peripheral simulation, so the two models get mismatched values when accessing registers that depend on cycle timing or peripherals. Another source of differences is the indeterminacy of the specification. The RISC-V specification does not fully constrain the implementation, so multiple behaviors may be allowed. For instance, CSRs (e.g., satp) are usually "Write Any Read Legal" (WARL), which means that even if the same value is written to the CSR, the value read back may differ depending on the implementation (line 11). These differences are legal according to the specification. Even worse, since the state verification phase is offline, these differences can cause the control flows of the two models to diverge. For instance, the DUT uses the mismatched value to execute the branch instruction at line 13. If the DUT and the simulator do not take the same branch direction, the subsequent traces will be completely mismatched, making the rest of the execution meaningless.

Inefficient Execution. Duplicated instruction streams contribute nothing to the coverage. For example, before the fuzzing payload in the test case is executed, the fuzzer spends considerable time loading the test case into the DUT's memory and waiting for the DUT to execute several initialization functions (e.g., init_regs, init_page_table) to set up the environment. However, DifuzzRTL can only access the DUT through the limited ports provided by the test harness, and once the simulation starts, the fuzzer has no control over the control flow of the test case. Due to this poor controllability of the DUT, the time-consuming initialization process is repeatedly executed without any improvement in coverage, which makes fuzzing ineffective.

3 Design

MorFuzz is a novel coverage-guided processor fuzzer that can efficiently detect software triggerable hardware bugs. In
this section, we first give an informal verification scope of MorFuzz and then elaborate on the design details.

3.1 Verification Scope

Unlike previous fuzzers [7, 32, 37, 58] that focus on bugs triggered by specific hardware signals, MorFuzz is designed to detect architectural functional bugs triggered by software. Specifically, MorFuzz focuses on bugs triggered by specific combinations of instructions that cause the processor's behavior to deviate from the ISA specification. Therefore, the processor's behaviors under every privilege level need to be verified, while behaviors that are undefined or unconstrained in the specification are out of our verification scope. MorFuzz trusts the higher privilege levels it relies on and uses simplified firmware to provide the required functionality when testing the processor's behavior at lower privilege levels. In addition, transient execution bugs caused by microarchitectural mistakes are out of our scope [42, 64].

3.2 Architecture Overview

The core idea behind MorFuzz is to dynamically mutate instructions based on runtime feedback. In summary, MorFuzz leverages the following techniques to resolve the aforementioned challenges (§2.3): the stimulus template, instruction morphing, and synchronizable co-simulation.

Stimulus Template (§3.3). Unlike existing fuzzers that directly generate instruction streams as the stimulus, MorFuzz uses a stimulus template to generate diverse and meaningful instruction streams. The stimulus template provides multi-level runtime mutation primitives at the processor state level, the instruction field level, and the program semantic level, thereby comprehensively exploring the input space of the processor. In addition, the stimulus template enables the fuzzer to communicate with the DUT and manage the control flow of test cases. Therefore, the fuzzer can precisely control the DUT to skip duplicate instructions and focus on the instruction sequences it is interested in.

Instruction Morphing (§3.4). Instruction morphing only mutates the instructions that are about to be executed, instead of mutating instructions indiscriminately like existing fuzzers. As a dynamic instruction mutation technique, instruction morphing accurately reflects the effect of the mutations, so the coverage can effectively guide the fuzzer. Instruction morphing also uses runtime information to mutate opcodes and operands, ensuring that the morphed instructions maintain valid field formats and meaningful semantics. Besides, instruction morphing operates on binary instructions, which makes it easier for MorFuzz to generate corner cases that assembly language cannot represent, thus greatly increasing the efficiency of exploring the input space.

Synchronizable Co-simulation (§3.5). MorFuzz synchronizes legal differences to address the implementation differences between models. During the simulation, MorFuzz co-simulates a simulator with the DUT and compares the architectural states of the DUT and the simulator after each instruction is executed to check correctness. Co-simulation also allows MorFuzz to locate precisely which instruction caused a mismatched state. Based on this, MorFuzz can further analyze whether the difference is legal and synchronize the correct state from the DUT to the simulator, thus eliminating the mismatch. Benefiting from the synchronizable co-simulation framework, MorFuzz automatically mitigates implementation differences and keeps the simulator co-simulating synchronously with the DUT, thus directing the fuzzer to cover deeper states.

Overview. The overall workflow of MorFuzz is depicted in Figure 4. First, MorFuzz uses seeds to generate stimulus templates. Then, MorFuzz dynamically morphs the template based on runtime information and executes the morphed instruction streams simultaneously on the DUT and the simulator. Finally, after each instruction is executed, MorFuzz compares the architectural states of the two models. After MorFuzz analyzes the mismatches, legal differences are synchronized to the simulator, and the others are reported as potential bugs.

Figure 4: Overview of MorFuzz. [Input generation: seed corpus, stimulus template generation (template instruction generation, magic instruction generation, execution environment packaging). Runtime instruction morphing: field level and semantic level mutation produce morphed instruction streams. Synchronizable co-simulation: the RTL DUT (processor under test) runs against the golden ISA simulator model with state compatible co-simulation and state synchronization, producing coverage feedback and bug reports.]

Figure 5: Example of instruction morphing. [A morpher between the fetch logic and the decode logic of the RTL DUT rewrites the fetched template instruction addi x2, x4, -935 (imm 110001011001, rs1 00100, funct3 000, rd 00010, opcode 0010011; morphable fields: imm, rs1, funct3, rd). Field level mutation yields 000000001111 00100 100 00010 0010011; semantic level mutation then yields 000000001111 01001 100 00101 0010011, i.e., xori x5, x9, 15.]

3.3 Stimulus Template Generation

We design a new structure for the test case, the stimulus template, to provide runtime mutation primitives for processor
state and instructions. The stimulus template consists of two powerful fuzzing execution environment into the stimulus parts: the runtime morphable fuzzing payload and the read- template. First, the fuzzing execution environment is respon- only fuzzing execution environment. The fuzzing payload sible for setting up the execution environment, such as ini- contains the runtime mutation primitives, and the fuzzing ex- tializing general-purpose registers and memory, configuring ecution environment is the system firmware responsible for address translation mode, and switching to the target privi- providing a software execution environment that allows the lege level. Second, morphed instructions inevitably trigger DUT to execute the fuzzing payload continuously. exceptions, so the fuzzing execution environment is required Template Instruction Generation. Template instructions are to be able to handle the exceptions to avoid crashing the exe- blank payload instructions for instruction morphing, which cution. And third, the fuzzer manages the control flow of the provide mutation primitives for the instruction field and the stimulus template through the fuzzing execution environment. program semantic at runtime. During the generation, the tem- After executing the scheduled fuzzing payload, the fuzzing plate instruction acts like a placeholder, only containing the execution environment communicates with the fuzzer, and the fields that determine the length of the instruction to calculate fuzzer decides whether to continue the simulation based on the memory layout of the stimulus template. The other fields the reported coverage. are temporarily filled with dummy values, which MorFuzz will replace with meaningful values based on the contextual 3.4 Runtime Instruction Morphing information later during the simulation. 
MorFuzz generates template instructions at block granularity and designs different testing blocks to cover the various hardware functional modules of the processor. A set of sequence patterns is manually constructed in each testing block to constrain the instruction type of each template instruction in the block so that the desired test points are reached. Under the constraints of the sequence pattern, each testing block is randomly filled with template instruction sequences that carry specific semantics. In addition, the sequence patterns expose the DUT's internal state by inserting watchpoint instructions at specific locations to enhance observability. For example, MorFuzz inserts instructions that read the floating-point exception flag CSR after a floating-point instruction sequence to check whether the exception flags are set correctly.

Magic Instruction Generation. MorFuzz instruments magic instructions in the prologue of each testing block as runtime mutation primitives for the processor state. The magic instructions are load instructions that access a random number generator mounted in the test harness. During the simulation, the DUT can atomically randomize its general-purpose registers by accessing the generator. The DUT specifies the type of the generated data by accessing different address offsets of the generator, including integers, floating-point numbers, addresses, page table entries, etc. The generator produces not only random numbers but also particular corner values (e.g., illegal addresses, the maximum and minimum integers, and INF and NaN for floating-point numbers). This significantly increases the chance of covering corner cases and increases fuzzing stress.

Instruction Shuffle. To further increase randomness at the sequence level, at the end of generation we also randomly perturb the order of all instructions in the fuzzing payload; we call this instruction shuffle. Although some watchpoints are sacrificed, shuffling mixes up adjacent testing blocks, which increases the diversity of instruction sequences and further produces more processor states.

Execution Environment Packaging. MorFuzz integrates a

When the hardware simulation begins, MorFuzz uses instruction morphing to morph the template instructions into diverse and meaningful instruction streams. To mutate the instruction being executed, MorFuzz inserts a morpher into the DUT. The morpher is a logic block inserted in the circuit that does not affect the processor's functionality; typically, it is placed on the wire that connects the processor's fetch unit and decode unit. Figure 5 illustrates the workflow of the morpher. First, the morpher hijacks the instruction fetched from memory. Next, it decodes the instruction and performs field-level mutation on the morphable opcode-related fields. Then, it uses contextual information to generate semantically meaningful operand-related fields. Lastly, the morphed instruction is sent back to the decode unit. From the view of the processor, the instruction fetched from memory magically turns into a different instruction.

Field Level Mutation. Field level mutation is a structured binary mutation approach that ensures the mutated instructions retain a valid format. To prevent subsequent mutations from destroying the structure of the instruction, the morpher generates an instruction similar to the hijacked template instruction instead of producing a completely different one. Therefore, the morpher does not mutate the fields that determine the instruction format and length, e.g., the opcode field. When a new template instruction arrives, the morpher first decodes it and determines which fields are morphable. Next, the morpher randomly selects valid opcodes defined in the specification to replace the opcode-related fields among the morphable fields. For instance, in Figure 5, by morphing the funct3 field, the morpher mutates an addi instruction into an xori instruction. Finally, the morpher passes the half-finished instruction and the list of morphable fields to the subsequent mutation process.

Semantic Level Mutation. To bring the morphed instructions close to real-world usage scenarios, the morpher not only generates the operand-related fields randomly but also uses contextual information to mutate their semantics.
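The field-level step described above can be illustrated with a small software model. This is a minimal sketch, not MorFuzz's C++/DPI morpher. It assumes a 32-bit RISC-V OP-IMM (I-type) instruction and keeps the format-defining opcode plus the rd, rs1, and imm fields fixed. Only funct3 is swapped, and only among a hypothetical whitelist of I-type arithmetic/logic operations. The helper names (`encode_i_type`, `with_funct3`, `morph_funct3`) are illustrative.

```python
import random

OPCODE_OP_IMM = 0b0010011  # major opcode shared by addi/xori/ori/andi/...

# Hypothetical whitelist: funct3 values whose I-type immediate is a plain 12-bit
# constant. Shifts (funct3 001/101) are excluded in this sketch because on RV64
# they reuse imm[11:6] as extra opcode bits, so morphing into them could yield a
# reserved encoding.
OP_IMM_FUNCT3 = {
    0b000: "addi", 0b010: "slti", 0b011: "sltiu",
    0b100: "xori", 0b110: "ori", 0b111: "andi",
}


def encode_i_type(imm: int, rs1: int, funct3: int, rd: int,
                  opcode: int = OPCODE_OP_IMM) -> int:
    """Pack an I-type instruction: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def field(insn: int, lo: int, width: int) -> int:
    """Extract `width` bits of `insn` starting at bit `lo`."""
    return (insn >> lo) & ((1 << width) - 1)


def with_funct3(insn: int, funct3: int) -> int:
    """Overwrite bits [14:12] (funct3) and leave every other bit untouched."""
    return (insn & ~(0b111 << 12)) | (funct3 << 12)


def morph_funct3(insn: int, rng: random.Random) -> int:
    """Field-level mutation: only the opcode-related funct3 field is morphable.

    Instructions that are not whitelisted OP-IMM instructions are returned unchanged.
    """
    if field(insn, 0, 7) != OPCODE_OP_IMM or field(insn, 12, 3) not in OP_IMM_FUNCT3:
        return insn
    return with_funct3(insn, rng.choice(sorted(OP_IMM_FUNCT3)))
```

For example, `with_funct3(encode_i_type(-935, 4, 0b000, 2), 0b100)` turns addi x2, x4, -935 into xori x2, x4, -935. Because the opcode, rd, rs1, and imm bits are never touched, the result keeps a valid format and length. Choosing meaningful operands is left to the semantic-level step.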

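The harness random number generator read by the magic instructions, as described above, can be modeled as a device whose read offset selects the data type. This is a minimal sketch under stated assumptions. The offsets (`OFF_INT`, `OFF_FP64`, `OFF_ADDR`), the data region, the 25% corner-value probability, and the specific corner values are all hypothetical choices, not MorFuzz's actual harness layout. Page-table-entry generation is omitted.

```python
import random
import struct

# Hypothetical MMIO offsets; the real generator layout is not given in the text.
OFF_INT, OFF_FP64, OFF_ADDR = 0x00, 0x08, 0x10
MASK64 = (1 << 64) - 1

INT_CORNERS = [0, 1, MASK64, 1 << 63, (1 << 63) - 1]  # 0, 1, -1, INT64_MIN, INT64_MAX
FP_CORNERS = [
    0x0000000000000000,  # +0.0
    0x8000000000000000,  # -0.0
    0x7FF0000000000000,  # +inf
    0xFFF0000000000000,  # -inf
    0x7FF8000000000000,  # canonical quiet NaN
]


class MagicRng:
    """Typed random data source; every read returns a raw 64-bit register value."""

    def __init__(self, seed, data_base=0x8010_0000, data_size=0x1000, corner_prob=0.25):
        self.rng = random.Random(seed)
        self.data_base, self.data_size = data_base, data_size
        self.corner_prob = corner_prob

    def read(self, offset):
        corner = self.rng.random() < self.corner_prob
        if offset == OFF_INT:
            return self.rng.choice(INT_CORNERS) if corner else self.rng.getrandbits(64)
        if offset == OFF_FP64:
            if corner:
                return self.rng.choice(FP_CORNERS)
            # Bit pattern of an ordinary double-precision value.
            return struct.unpack("<Q", struct.pack("<d", self.rng.uniform(-1e6, 1e6)))[0]
        if offset == OFF_ADDR:
            if corner:
                # Last byte of the region (misaligned for wider accesses) or an
                # address that is non-canonical under Sv39 (bit 38 set, bits 63..39 clear).
                return self.rng.choice([self.data_base + self.data_size - 1, 1 << 38])
            return self.data_base + 8 * self.rng.randrange(self.data_size // 8)
        raise ValueError(f"unmapped offset {offset:#x}")
```

In this model, a magic load such as `ld x5, OFF_ADDR(base)` corresponds to `dev.read(OFF_ADDR)`. The register type pool can then mark x5 as holding an address. Raising `corner_prob` trades ordinary coverage for more corner-case stress.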
First, the morpher generates valid address offsets based on the current program counter and address space. During the simulation, the morpher senses the DUT's address translation mode and maps the memory layout in the stimulus template into the current address space. When generating an address-related immediate (e.g., the imm field of branch instructions), the morpher calculates the address offset from the current program counter and the mapped target address, ensuring that the morphed immediate is meaningful.

Second, the morpher maintains a type pool of general-purpose registers to provide operands of the desired type. Memory load and store instructions require the base address register field rs1 to point to a register that holds an address. To generate meaningful rs fields, the morpher tracks the data types (address or general data) held in the general-purpose registers so that it can supply operands of the correct type. To simplify type tracking, MorFuzz does not trace data flow; it only marks the destination register of a magic instruction with the type of the data it obtained. If another, normal instruction later writes that register, the register loses its type.

Third, the morpher uses a sliding window to record the destination register fields (rd) of instructions still being executed in the pipeline. The morpher can use the registers in the sliding window as the rs and rd fields of subsequent template instructions to generate instructions that contain pipeline hazards, such as read-after-write and write-after-write. Therefore, MorFuzz can also spontaneously generate inputs that match the microarchitectural details of the DUT.

Notice that the morpher still tries illegal cases with a small probability, because the input space outside the specification is also a significant source of bugs, such as the illegal-opcode bug B1 and the illegal-operand bug B4.

Diverse and Meaningful Instruction Streams. Finally, with the help of the stimulus template, MorFuzz morphs template instructions to produce diverse and meaningful instruction streams on the fly. As Figure 6 shows, the DUT starts execution from the initialization function in the fuzzing execution environment and jumps to the fuzzing payload. While executing the morphed instructions, if the DUT triggers an exception, the exception handler in the fuzzing execution environment tries to handle it. Whether or not the handler succeeds, it redirects the DUT back to the fuzzing payload. A unique system call is triggered when the DUT reaches the boundary of the fuzzing payload, notifying the fuzzer to collect the current coverage and fix the program counter. If, based on the coverage, the fuzzer is interested in the input, it directs the DUT to return to the fuzzing payload again; otherwise, the fuzzer terminates the simulation and generates a new stimulus template. In addition, to prevent the DUT from falling into infinite loops, MorFuzz also monitors the coverage: if the coverage does not increase for a period of time, the fuzzer raises an interrupt to stop the simulation. In summary, the fuzzer can direct the DUT to continuously execute diverse and meaningful instruction streams in a loop without additional initialization, which significantly improves fuzzing performance.

Figure 6: Stimulus template runtime workflow.

Figure 7: MorFuzz state verification flow.

3.5 Synchronizable Co-simulation

MorFuzz applies an online co-simulation approach for state verification, using an ISA simulator running in parallel with the DUT as the reference model. The ISA simulator and the DUT execute the same inputs, so the correctness of the DUT's state can be checked by comparing their states after each instruction is executed.

Compatible Co-simulation. Existing work has assumed that the write-back data is always ready when the DUT commits instructions. However, this assumption does not always hold because of microarchitectural differences between processors. For example, Rocket [1] supports delayed write-back, which means the write-back data of long-latency instructions (e.g.,

multiply and divide, floating-point instructions) may not be ready at the commit stage.

To accommodate different microarchitectures, MorFuzz abstracts the state comparison process into two stages: the commitment stage and the judgment stage. We use instructions a-d in Figure 7 as an example. Assume that the write-back data is not ready when the DUT commits instruction a. In the commitment stage, the DUT first commits its program counter and the executed instruction (①). Once the simulator receives the commit request, it executes the next instruction and checks whether the executed instruction is consistent with the one committed by the DUT (②). If the check passes, the simulator records its reference write-back data in the scoreboard (③). The judgment stage starts once the write-back data of instruction a is ready: MorFuzz compares the write-back value with the reference value in the scoreboard to determine whether the instruction was executed correctly (④). Instruction b shows the case where the commitment stage and the judgment stage fire simultaneously; MorFuzz handles this case as well. When mismatched behavior is detected, MorFuzz reports the potential bug and exits the simulation (e.g., instruction d, and instruction c at ⑤).

State Synchronization. As discussed in §2.3, not all mismatches are bugs. According to our statistics in Figure 10, these implementation differences are triggered with high probability. Although MorFuzz can stop the simulation in time at the mismatched instruction through online state verification, exiting the simulation means that the DUT loses its accumulated state, making it difficult for the fuzzer to reach deep processor states. Therefore, MorFuzz proposes a synchronizable co-simulation approach that automatically eliminates implementation differences by allowing the DUT to synchronize its state to the reference model and sustain the simulation. For instance, suppose instruction c accesses a peripheral register and the judgment stage fails because the simulator lacks a corresponding peripheral model. MorFuzz can determine whether the mismatch is legal by analyzing the physical address accessed on the simulator. If it is legal, MorFuzz synchronizes the hardware state to the simulator (⑥); otherwise, it reports a potential bug (⑤). MorFuzz can also synchronize external events, such as interrupts, to the simulator. By automatically synchronizing mismatched states, MorFuzz lets the simulation run deeper instead of stopping prematurely due to false positives.

4 Implementation

In this section, we discuss several relevant implementation details of MorFuzz. We first describe the stimulus template generator, followed by the fuzzing framework for processor simulation and verification. Our prototype is based on the 64-bit RISC-V architecture.

4.1 Stimulus Template Generation

The stimulus template generator consists of 2.6K lines of Python code and 1.3K lines of assembly and C code. We define a 128-bit seed to generate the stimulus template. The seed determines the fuzzing execution environment and the instruction extensions to be tested, controls the weights of different testing blocks, seeds the random number generator, and sets the intensity of the instruction shuffle.

Testing Block. MorFuzz generates different types of testing blocks based on the weights in the seed; the higher the weight of a testing block type in the seed, the more likely it is to be generated. We designed seven types of testing blocks to cover the various hardware functional modules of the DUT: integer arithmetic, floating-point arithmetic, CSR, memory operation, atomic memory operation, system operation, and custom extension tests. A control transfer instruction is placed at the end of each testing block to chain the blocks together.

Fuzzing Execution Environment. We extended the testing environment provided by the official RISC-V testing repository [54] into the fuzzing execution environment. The fuzzing execution environment initializes the processor and configures the environment, such as the available instruction extensions, the address translation mode and page table, and the runtime privilege level. During the simulation, the fuzzing execution environment resides in a non-morphable physical region and handles exceptions and interrupts at the highest privilege level. In addition to managing the execution environment, it also provides interfaces to fuzz the system environment; e.g., we provide a series of page table randomization functions that mutate page table entries and evict mapped pages.

4.2 Processor Fuzzing

We use the starship SoC generator [57] to generate the test harness for the processor under test, including the on-chip interconnect and the memory model that holds the compiled stimulus template. The hardware test harness of the DUT is implemented in about 2K lines of Chisel and 500 lines of Verilog. We extract the core logic of the official RISC-V ISA simulator, Spike [53], as the reference model to check the correctness of the DUT's behavior. We use about 2.5K lines of C++ code to implement the morpher and the co-simulation framework.

Instruction Morphing. The morpher is implemented as software logic embedded in hardware. It uses the SystemVerilog DPI interface to interact with the hardware, i.e., to monitor the processor's internal state, hijack fetched instructions, and return morphed instructions. The morpher performs field-aware mutation on fetched instructions and only replaces the wires between the fetch unit and the decode unit, which ensures
that the morphed instructions keep instruction fetch offsets consistent with the pipeline front-end, and requires no modification of the pipeline back-end. Therefore, the morpher does not introduce unwanted side effects. In addition, to ensure that the reference model performs the same morphing as the DUT, the morpher maintains a morphing map that uses the pre-morphing instruction and its address as the key and the morphed instruction as the value. Thus, instruction morphing does not introduce false positives, and both models always execute deterministic and identical morphed instructions.

Synchronization Prerequisite. We have strictly defined the rules for approving state synchronization. A difference must meet the following three prerequisites to be considered legal. First, only instructions involving operations beyond the verification scope may proceed to the subsequent checks, which limits the instructions allowed to trigger state synchronization to CSR instructions and memory operation instructions. Second, the control flow information of the DUT must pass the commitment stage check. If the DUT incorrectly grants access to privileged registers or a reserved address space, the simulator throws an exception due to insufficient permissions, and MorFuzz refuses synchronization after observing the resulting program counter violation during the commitment stage. Third, mismatched write-back values are limited to the CSR WARL fields defined in the specification or to data read from peripheral addresses outside the specification. With these fine-grained checks, MorFuzz ensures that all synchronized differences are outside our verification scope.

Hardware Simulation. We use an industry-standard commercial tool, Synopsys VCS [14], to simulate the hardware RTL designs, but MorFuzz does not rely on features exclusive to commercial tools. All hardware modules are translated to Verilog and then compiled into a host executable binary by the Synopsys VCS RTL simulator.

Hardware Coverage Metric. MorFuzz is compatible with the coverage metrics proposed by existing designs, and we use the same control register coverage to facilitate comparison with DifuzzRTL [30]. A control register is a register whose value is used as the select signal of a multiplexer. We implemented the same FIRRTL [31] pass to instrument all the control registers. The instrumented circuits count the distinct states triggered in each module and sum these counts as the final coverage. Control register coverage is clock-sensitive and reflects the hardware state better than other coverage metrics. Note that the coverage is only used to evaluate the effect of inputs and mutations; high coverage of the DUT does not mean that the design is bug-free.

5 Evaluation

In this section, we evaluate the effectiveness of MorFuzz in various aspects. In summary, we aim to answer the following four questions:

• RQ 1. How effective is MorFuzz in discovering previously unknown bugs in real-world processors? (§5.2)
• RQ 2. How does MorFuzz perform compared with previous methods in exploring the states of processors? (§5.3)
• RQ 3. Are the instructions generated by instruction morphing valid and diverse? (§5.4)
• RQ 4. How do instruction morphing and state synchronization contribute to the effectiveness of our fuzzer? (§5.5)

5.1 Experimental Setup

We conducted the experiments on a 48-core Intel Xeon Silver 4214 processor with 256GB RAM. We ran each experiment for 24 hours and repeated it five times. We fuzzed three popular processors in the RISC-V community to demonstrate that MorFuzz is compatible with different RISC-V microarchitectures. All processors can boot and run Linux, and the configuration of each processor is summarized in Table 1.

CVA6 [68] is an open-source 64-bit in-order RISC-V processor core written in SystemVerilog. Although its six-stage pipeline is single-issue, it has independent internal execution functional units, so it can commit multiple instructions simultaneously. It has also been taped out in 22nm technology and runs at up to 1.7GHz.

Rocket [1] is a five-stage, single-issue, in-order scalar processor written in Chisel [2]. Rocket's pipeline is ingeniously designed to support delayed write-back, allowing the processor to commit long-latency instructions without carrying write-back data. Rocket is the world's first RISC-V processor open-sourced by UC Berkeley and still actively supports new extensions (e.g., hypervisor [50] and cryptography [49]). Moreover, it has been taped out dozens of times and extensively verified by academic and industrial groups.

BOOM [70] is the third generation of the Berkeley Out-of-Order Machine (BOOM). It is an out-of-order superscalar processor also written in Chisel. Unlike the in-order cores above, BOOM has a more sophisticated microarchitecture; we used the triple-issue LargeBoom configuration for the experiment. The latest BOOM has been verified on FPGA and achieves better performance than its predecessor.

Table 1: Summary of the cores used for evaluation.

Feature          CVA6      Rocket     BOOM
ISA              RV64GC    RV64GCHX   RV64GCX
Pipeline Stage   6         5          10
Issue Order      In-order  In-order   Out-of-order
Lines of code    24K       99K        339K
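The two-stage comparison from §3.5 can be sketched as a small checker. It covers the commitment stage, the scoreboard, the judgment stage, and the synchronization escape hatch. This is a hypothetical Python model, not MorFuzz's C++ co-simulation framework. The reference model is replaced by a precomputed trace standing in for Spike, and write-backs are matched to commits by an assumed commit tag. `may_sync` is a placeholder for the synchronization prerequisites listed in §4.2: a CSR or memory instruction, a passed commitment check, and a difference confined to a WARL field or peripheral data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional, Tuple


@dataclass
class RefStep:
    pc: int
    insn: int
    wdata: Optional[int]  # None when the instruction writes no register


class CoSim:
    """Two-stage co-simulation checker (simplified sketch).

    The reference model is a precomputed trace, a hypothetical stand-in for an
    ISA simulator that is stepped by one instruction per DUT commit.
    """

    def __init__(self, ref_trace: List[RefStep],
                 may_sync: Callable[[RefStep, int], bool] = lambda step, wdata: False):
        self.ref: Iterator[RefStep] = iter(ref_trace)
        self.scoreboard: Dict[int, RefStep] = {}  # commit tag -> pending reference step
        self.may_sync = may_sync
        self.synced: List[Tuple[int, int]] = []   # (pc, DUT value adopted by the reference)

    def commit(self, tag: int, pc: int, insn: int) -> str:
        """Commitment stage: step the reference and compare pc and instruction."""
        step = next(self.ref)
        if (step.pc, step.insn) != (pc, insn):
            return "commit-fail"
        if step.wdata is not None:
            self.scoreboard[tag] = step  # wait for the DUT's (possibly delayed) write-back
        return "commit-pass"

    def judge(self, tag: int, dut_wdata: int) -> str:
        """Judgment stage: compare the DUT write-back with the scoreboard entry."""
        step = self.scoreboard.pop(tag)
        if step.wdata == dut_wdata:
            return "judge-pass"
        if self.may_sync(step, dut_wdata):
            # A real framework would overwrite the reference register state here.
            self.synced.append((step.pc, dut_wdata))
            return "synced"
        return "judge-fail"
```

A Rocket-style delayed write-back corresponds to calling `commit` for a later instruction before `judge` for an earlier one. The CVA6-style case, where both stages fire together (instruction b in Figure 7), is simply `commit` immediately followed by `judge`.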

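The control register coverage metric from §4.2 can be approximated in software. This sketch assumes the per-cycle values of each module's control registers are already available; in MorFuzz they come from a FIRRTL instrumentation pass in hardware. The model counts distinct per-module states and sums them. It keeps exact sets of states for clarity, whereas a hardware implementation would typically compress the values into a bounded map, so absolute numbers would differ. The class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple


class ControlRegisterCoverage:
    """Distinct control-register states per module, summed over all modules."""

    def __init__(self) -> None:
        self.seen: Dict[str, Set[Tuple[int, ...]]] = defaultdict(set)

    def sample(self, cycle_values: Mapping[str, Iterable[int]]) -> None:
        """Record one clock cycle: module name -> values of its control registers."""
        for module, values in cycle_values.items():
            self.seen[module].add(tuple(values))

    def total(self) -> int:
        """Final coverage: the sum of distinct states observed in each module."""
        return sum(len(states) for states in self.seen.values())
```

Because `sample` is called once per cycle, the metric is clock-sensitive: a transient multiplexer-select combination that lasts a single cycle still counts as a new state.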
5.2 Bugs Found in Real-World Processors

During the evaluation, MorFuzz found 17 new bugs and two already known bugs in total. Our results demonstrate that MorFuzz is capable of finding unknown bugs missed by previous extensive verification conducted by both academic and industrial groups. Moreover, we follow responsible disclosure. We reported all bugs found to the community through the suggested channels and assisted the developers in fixing 9 of them. We also applied for CVE identifiers for all newly discovered bugs, and 13 bugs were assigned CVE numbers. Since MorFuzz does not explicitly target security property violations, the direct exploitation of most discovered bugs is limited to denial-of-service attacks. For example, bug B10 prevents the processor from executing crafted instructions correctly, the wrong exception type generated by bug B13 makes the kernel fail to handle exceptions properly, and triggering bug B18 shuts down the system. In general, it is difficult to evaluate the exact security impact of functional bugs without real-world exploitation scenarios. Recent attacks have shown that even faulty computation results can compromise security isolation [5, 34, 43–45]. We list all the bugs found by MorFuzz in Table 2 together with their corresponding common weakness enumerations (CWEs) to show their potential security implications.

We also compare the average bug reproduction time in Table 3. We select similar bugs reported by previous work to highlight the efficiency improvement over the previous processor fuzzer and instruction generator. In the case of bug B7, MorFuzz triggers the problem significantly faster than riscv-torture and DifuzzRTL. Additionally, since we include binary-level mutations, we can effectively trigger bugs that previous methods failed to cover, such as B8.

Next, we describe the bugs found by MorFuzz in detail. Depending on their complexity, these bugs can be classified into three categories: instruction decoder related, CSR state related, and complex logic bugs. We identify these bugs against the latest RISC-V ISA specification [62, 63].

5.2.1 Instruction Decoder Related Bugs

The first category is decoder bugs caused by rare corner-case formats of a single instruction. Previous fuzzers mutate inputs at the instruction level and only generate assembly instructions with valid formats. MorFuzz performs binary-level field-aware mutation, enabling more efficient exploration of unexpected instruction formats.

Bug B1. According to the specification, the rcon field of aes64ks1i must not be greater than 0xA. When executing an aes64ks1i with an rcon field greater than 0xA, Rocket does not throw an illegal instruction exception.

Bug B6. By setting the rm field in a floating-point instruction, programmers can specify the rounding mode. BOOM executes floating-point instructions with illegal rm fields (such as 5 or 6) without raising exceptions.

Bug B8. In the specification, sfence.vma has a zero rd field. CVA6 treats an illegal sfence.vma as valid when its rd field is mutated to a non-zero value.

Bug B9. The CVA6 decoder behaves incorrectly when executing dret with a non-zero rd field, which should be zero according to the specification. CVA6 handles this invalid dret as if it were a legal dret.

Bug B10. For forward compatibility, implementations must ignore the rd fields in fence.i/fence, and standard software must clear them. When executing a non-standard fence.i/fence with a non-zero rd field, CVA6 throws an exception.

5.2.2 CSR State Related Bugs

CSR state bugs require first setting the CSR to a specific state and then inducing the buggy behavior through instruction sequences. Guided by the sequence patterns, MorFuzz is able to generate instruction sequences that meet these requirements.

Bug B2. In Rocket, the custom extension illegal signal incorrectly uses the vector extension status. Due to this bug, valid custom extension instructions may fail to execute.

Bug B3. The vsstatus.xs field is writable in Rocket. The xs field summarizes the extension context status and, according to the specification, is read-only.

Bug B7. If frm is set to DYN (or an invalid value), any floating-point instruction whose rm field is set to DYN should raise an illegal instruction exception. Nonetheless, BOOM executes these instructions without raising an exception.

Bug B11. When the mstatus.fs field is set to dirty, the mstatus.sd field in CVA6 does not update immediately. This bug may cause the contents of the floating-point registers to be lost during a context switch.

Bug B12. CVA6 writes the binary encoding of the ebreak instruction to the mtval/stval register when it executes an ebreak. According to the specification, if mtval/stval is written with a non-zero value when a breakpoint exception occurs, it should contain the faulting virtual address. The same problem also affects ecall.

Bug B18. Spike's mcontrol.action component uses an incorrect mask, 0x3f, although this field is only 4 bits wide. If users attempt to set the adjacent sizelo field, an illegal action value is saved, causing the program simulation to crash abruptly.

5.2.3 Complex Logic Bugs

The remaining bugs are not concentrated in specific hardware functional modules and require numerous instructions with specific semantics to prepare a buggy environment; we collectively call them logic bugs. MorFuzz monitors the internal runtime states of the DUT to dynamically morph instructions and randomize operands, greatly enhancing the semantics of

Table 2: A list of bugs discovered by MorFuzz.

Processor  Bug Description                                                                CVE/Issue ID    CWE       New Bug  Confirmed  Fixed
Rocket     B1: Treat aes64ks1i with rcon greater than 0xA as valid                        CVE-2022-34632  CWE-327   ✓        ✓          ✓
Rocket     B2: Error in condition of the rocc_illegal signal                              Issue #2980     CWE-1281  ✓        ✓          ✓
Rocket     B3: The vsstatus.xs is writable                                                CVE-2022-34627  CWE-732   ✓        ✓          ✓
BOOM       B4: Incorrect exception type when a PMA violation                              CVE-2022-34636  CWE-1202  ✓        ✓
BOOM       B5: Incorrect exception type when a PMP violation                              CVE-2022-34641  CWE-1198  ✓        ✓
BOOM       B6: Floating-point instruction with invalid rm field does not raise exception  Issue #458      CWE-391            ✓
BOOM       B7: Floating-point instruction with invalid frm does not raise exception       Issue #492      CWE-391            ✓
CVA6       B8: Crafted or incorrectly formatted sfence.vma instructions are executed      CVE-2022-34633  CWE-1242  ✓        ✓          ✓
CVA6       B9: Crafted or incorrectly formatted dret instructions are executed            CVE-2022-34634  CWE-1242  ✓        ✓          ✓
CVA6       B10: Non-standard fence instructions are treated as illegal                    CVE-2022-34639  CWE-1209  ✓        ✓          ✓
CVA6       B11: The mstatus.sd field does not update immediately                          CVE-2022-34635  CWE-1199  ✓        ✓
CVA6       B12: The value of mtval/stval after ecall/ebreak is incorrect                  CVE-2022-34640  CWE-755   ✓        ✓
CVA6       B13: Incorrect exception type when a PMA violation                             CVE-2022-34636  CWE-1202  ✓        ✓
CVA6       B14: Incorrect exception type when a PMP violation                             CVE-2022-34641  CWE-1198  ✓        ✓          ✓
CVA6       B15: Incorrect exception type when accessing an illegal virtual address        CVE-2022-34637  CWE-754   ✓        ✓
CVA6       B16: Improper physical PC truncation                                           Issue #901      CWE-222   ✓        ✓
CVA6       B17: Incorrect lr exception type                                               CVE-2022-37182  CWE-754   ✓        ✓
Spike      B18: The component mcontrol.action contains the incorrect mask                 CVE-2022-34642  CWE-787   ✓        ✓          ✓
Spike      B19: Incorrect exception priority when accessing memory                        CVE-2022-34643  CWE-754   ✓        ✓          ✓


Table 3: Comparison of the average time to reproduce bugs.

         Elapsed Time
Bug ID   riscv-torture   DifuzzRTL   MorFuzz
B7       118h            20.3h       10.4m
B8       ✗               ✗           6.5s

✗ means the tool failed to reproduce the bug.

the input. For complex operations that are difficult to generate randomly, such as modifying the page table, MorFuzz can use ecall with specific parameters to invoke the page table randomization functions in the fuzzing execution environment.

Bugs B4 and B13. MorFuzz found that the exception type for a physical memory attribute (PMA) violation during address translation is incorrect. If accessing a PTE violates a PMA check, the processor must raise an access-fault exception corresponding to the original access type. The exception types raised by both BOOM and CVA6 are incorrect.

Bugs B5 and B14. We perform a store operation to a special virtual address whose non-leaf PTE lies outside the physical memory protection (PMP) range. BOOM and CVA6 raise the incorrect exception type when this PMP violation occurs.

Bug B15. In Sv39, bits 63 to 39 of a 64-bit virtual address must all equal bit 38. When accessing an address that does not satisfy this requirement, CVA6 throws an access fault, whereas according to the specification it should be a page fault.

Bug B16. In CVA6, an implicit address truncation is applied to every physical address access. Specifically, the highest 8 bits of instruction addresses and the highest 32 bits of data addresses are ignored.

Bug B17. The exception type of a failed lr instruction is incorrect in CVA6. When we use lr to access a page that has not yet been mapped, CVA6 throws a store page fault.

Figure 8: State coverage of MorFuzz and prior works on Rocket over the 24-hour fuzzing time. The shaded area represents the 95% confidence interval. (Legend: MorFuzz, DifuzzRTL, riscv-dv, riscv-torture; x-axis: time, 0h to 24h; y-axis: coverage, 0 to 2.25M.)

Bug B19. Spike implements an incorrect exception priority for memory accesses. In the specification, the breakpoint exception has a higher priority than the address-misaligned and access-fault exceptions, which is the opposite of Spike's implementation.

5.3 Exploring the State of Processors

We first demonstrate that MorFuzz can achieve higher state coverage than the state-of-the-art processor fuzzer. We choose the currently available fuzzer, DifuzzRTL [30], for evaluation. We also compare MorFuzz with traditional simulation-based dynamic verification methods. We select riscv-torture [48], a

Since the open-source implementation of DifuzzRTL uses a different RTL simulator, we replay all test cases generated by DifuzzRTL in our environment. In this way, both fuzzers share the same evaluation environment, and the results are comparable.

Figure 9 panels: (a) MorFuzz, (b) DifuzzRTL, (c) riscv-dv; x-axis: # opcodes (0 to 199); y-axis: logarithm of wdata (0 to 64).

Figure 9: Instruction diversity of MorFuzz, DifuzzRTL and riscv-dv in one round of 24-hour fuzzing. The highlighted area
indicates the number of committed instructions with the corresponding write-back data.

random instruction generator widely used in the community, and use the UVM-based riscv-dv [26] to represent industrial solutions. These instruction generators use hand-written constraints rather than coverage to guide random instruction generation. riscv-torture uses simple register analysis to generate instructions, and riscv-dv uses manually crafted test templates to generate high-quality test cases. We evaluate the above verification approaches with default configurations on the BaseConfig Rocket because the other processors suffer from pending bugs.

Figure 8 presents the evaluation results, which indicate that MorFuzz achieves higher coverage and better efficiency than the other methods. Since riscv-torture and riscv-dv use fixed constraints and do not generate random control transfer instructions, they generate deterministic inputs for a specific input space; therefore, their state coverage curves have tighter confidence intervals. MorFuzz and DifuzzRTL fluctuate more because of the randomly generated seeds and control transfer instructions. MorFuzz eventually explores 4.4× more coverage than DifuzzRTL, 3.1× more than riscv-torture, and 1.6× more than riscv-dv. Moreover, MorFuzz is far more efficient than the state-of-the-art processor fuzzer DifuzzRTL: DifuzzRTL takes 24 hours to reach 480K coverage, while MorFuzz obtains the same coverage in only about 30 minutes. MorFuzz also needs only about 2.4 hours to reach the coverage that riscv-dv achieves in 24 hours. These results indicate that MorFuzz can explore processor states effectively and efficiently.

One interesting aspect is that the ten-year-old riscv-torture outperforms DifuzzRTL. To further highlight the impact of input control flow on mutation effectiveness (§2.3), we measured the test point execution rate of DifuzzRTL. To our surprise, the rate is only about 4%. Since DifuzzRTL blindly inserts

insufficient to reflect the mutation quality. This may also explain why DifuzzRTL performs better than riscv-torture in the first few hours but is gradually misguided as time goes on. In contrast, MorFuzz achieves an 86% testing block execution rate with the help of instruction morphing and state synchronization. As a result, MorFuzz is able to execute inputs more thoroughly and mutate them more effectively, enabling efficient exploration of the processor's state.

5.4 Instruction Diversity

To illustrate that instruction morphing generates valid and diverse instruction streams, we visualize each committed valid instruction and its write-back data during fuzzing. We assess the diversity of instructions in two dimensions: the opcode (i.e., the function of the instruction) and the wdata (i.e., the result written back). The diversity of opcodes indicates the number of data paths a fuzzer can test, while the diversity of wdata suggests the test completeness for a specific data path. We evaluate the instruction diversity of MorFuzz, DifuzzRTL, and riscv-dv. We plot the results as heat maps (Figure 9), using the opcode as the x-axis, the logarithm of the wdata as the y-axis, and the brightness as the number of committed instructions for each opcode and wdata pair.

Since not all instructions generate the full range of 64-bit write-back data, some areas in the heat map are always dark. For example, branch and store instructions do not have destination registers, and word operations never generate data larger than 32 bits. Comparing these figures, we find that riscv-dv has the most bright areas (Figure 9c). Under manually crafted constraints, riscv-dv can generate valid instructions with uniformly distributed operands, representing the upper
control flow instructions and lacks exception handlers, most          bound on the quality of randomly generated instructions. On
inputs are not completely executed. Such a low execution              the contrary, DifuzzRTL has the least highlighted regions,
rate means that DifuzzRTL spends most of its time executing           indicating its limited input diversity (Figure 9b). Figure 9a
meaningless initialization functions and makes the coverage           suggests that MorFuzz is capable of generating more valid

log 2 (wdata+1)

    2.25M                                                          the synchronizable co-simulation framework to automatically
    2.00M                                                          eliminate the implementation differences between the DUT
    1.75M                                                          and the simulator.
    1.50M                                                          Restriction on Functional Bug. Currently, MorFuzz can
    1.25M                                                          only use the functional model provided by the simulator for
    1.00M                                                          state verification, such as memory and registers. For details
    0.75M                                                          not available in the simulator, such as caches and branch
    0.50M                                                          predictors, MorFuzz cannot directly verify the correctness of
    0.25M                        MorFuzz      MorFuzz*             their behavior and can only detect bugs when they propagate
                                 MorFuzz−
    0h     4h 8h 12h                   16h    20h     24h          into the architectural registers.
                 Time                                              State Sync vs. False Positive. State synchronization would
  Figure 10: State coverage of MorFuzz and its variants.           only eliminate legal differences in the architectural state with-
                                                                   out modifying implementation details. From the simulator’s
                                                                   view, it merely replaces the operands in the general-purpose
and diverse fuzzing inputs than DifuzzRTL, and the quality         registers with those of the DUT. If the DUT executes the syn-
of the generated instructions is comparable to that of riscv-dv.   chronized instruction again, it will still get a mismatch and
                                                                   then trigger state synchronization to eliminate it. Therefore,
                                                                   state synchronization itself does not introduce false positives.
5.5 Component Analysis                                             On the contrary, it eliminates false positives caused by im-
To measure the effect of each component of MorFuzz, we first       plementation differences at the architectural level in time,
create two variants of MorFuzz that disables part of its com-      allowing the fuzzer to explore more deep states.
ponents: (i) MorFuzz− disables instruction morphing, which         Complex Bug Pinpoint. First, MorFuzz still requires the user
means the DUT directly executes statically generated random        to dive deep into the specification and circuit to analyze the
instructions without morphing. (ii) MorFuzz∗ disables both         root cause to determine if the mismatch is a false positive
instruction morphing and state synchronization. This variant       caused by implementation differences or an actual bug. Sec-
means the fuzzer not only executes the static instructions but     ond, diverse instruction streams make MorFuzz more efficient
also terminates the simulation immediately when it detects         in terms of coverage while also making it difficult for users
a mismatched state. Similarly, we evaluate MorFuzz and its         to pinpoint the bug. Because the morphed instructions and
variants on the BaseConfig Rocket for 24 hours.                    control flow information generated by the morpher do not
  Figure 10 shows the results of the experiment. The gap           exist in the stimulus template, users need to save additional
in coverage between the two variants shows that false pos-         runtime information to assist in the analysis.
itives caused by the model implementation differences can          FPGA Emulations. RFUZZ [37] and DifuzzRTL [30] can
cause the DUT to terminate the simulation prematurely and          use FPGA to accelerate the simulation process by sacrificing
fail to execute the input comprehensively. MorFuzz− effec-         verification accuracy. MorFuzz uses the ISA simulator to co-
tively eliminates implementation differences through state         simulate with the DUT to provide more accurate verification.
synchronization, allowing the fuzzer to touch more deep states.    However, the ISA simulator is a software model that cannot be
And by comparing MorFuzz and MorFuzz−, MorFuzz signifi-            mapped directly onto FPGA, so MorFuzz can only simulate
cantly increases the speed of coverage growth with instruction     them via the RTL simulator.
morphing. The instruction morphing technique uses run-time         7 Related Work
contextual information to generate more diverse and mean-
ingful instruction streams, thus dramatically increasing the
effectiveness of the fuzzing.                                      In this section, we describe the existing hardware fuzzing
                                                                   works and introduce how MorFuzz differs from them, as sum-
                                                                   marized in Table 4.
6               Discussion and Limitations                           RFUZZ [37] introduced the concept of hardware fuzzing
                                                                   and first proposed a coverage-guided hardware fuzzing frame-
Requirement of ISA Simulator. MorFuzz and other proces-            work for general RTL designs. To match the various interfaces
sor fuzzing works [6, 30, 32, 33] use the ISA simulator as the     of the targets, RFUZZ generates input directly for the hard-
golden reference model to verify the behavior of processors.       ware ports at cycle granularity. Unlike RFUZZ, MorFuzz uses
Usually, there are many available simulators for different in-     a test harness to convert the compiled assembly programs into
struction set architectures, e.g., Bochs [38], QEMU [3], and       bus transactions, ensuring that the input’s hardware semantics
Dromajo [8]. One possible problem is that the implemen-            are legal, thereby more efficiently fuzzing processor designs.
tations between different simulators are variable and may          And several succeeding works [7, 39] extended RFUZZ to
lead to false positives. To mitigate this, MorFuzz proposes        improve performance by analyzing circuit information (e.g.,

Coverage

Table 4: Comparison with the prior hardware fuzzers.

Fuzzer | Fuzzing Target | Coverage Matrix | Mutation Dimension | Verification Preknowledge | Coverage Comparison | Performance Comparison | New Bugs
RFUZZ [37] | RTL designs | Multiplexer | Binary | N/A | Baseline | Baseline | 0
DirectFuzz [7] | RTL designs | Multiplexer | Instance distance | N/A | Same as RFUZZ | 2.23× faster than RFUZZ | 0
Trippel et al. [58] | RTL designs | Software | Custom grammar | SVA | 26.70% more than RFUZZ | N/A | 0
DifuzzRTL [30] | Processor | Control register | Instruction | Not required | N/A | 40× faster than RFUZZ | 16
Kabylkas et al. [32] | Processor | N/A | N/A | Not required | N/A | N/A | 13
TheHuzz [33] | Processor | Hardware behavior | Instruction field | Not required | 2.86% more than DifuzzRTL | 3.33× faster than DifuzzRTL | 8
MorFuzz | Processor | Control register | Processor state, instruction field, program semantics | Not required | 4.4× more than DifuzzRTL | 48× faster than DifuzzRTL | 17

module distance, symbolic execution) to optimize input. However, these efforts only focus on maximizing hardware coverage and offer no solution for verifying hardware behavior, so they are ineffective at finding bugs.

Trippel et al. [58] use the famous software fuzzer AFL [67] to fuzz the host-executable binary generated by the RTL simulator. Moreover, the authors use SystemVerilog Assertions (SVA) to check design violations. Unfortunately, SVA has the following two drawbacks [24]. First, SVA requires prior manual instrumentation by the developer, so it asserts known bugs rather than exploring unknown ones. Second, SVA cannot constrain the buggy behavior of complex processors well, because bugs usually result from multi-cycle actions and SVA has difficulty constraining the behavior of multiple modules over multiple cycles. MorFuzz instead uses a co-simulation based differential testing approach. By comparing the state differences between the DUT and the reference model after each instruction is executed, MorFuzz can accurately and automatically identify potential bugs without any predefined assertions.

DifuzzRTL [30] and TheHuzz [33] are hardware fuzzing frameworks designed specifically for processors and are the works most relevant to MorFuzz. DifuzzRTL proposes a cycle-sensitive control register coverage matrix, and TheHuzz uses features provided by commercial EDA tools to capture more intrinsic hardware behaviors. As opposed to previous efforts that focus on designing a fine-grained coverage matrix, MorFuzz aims to verify processors more effectively and efficiently. First, MorFuzz designs the stimulus template to efficiently explore the input space at the processor-state, instruction-field, and program-semantics levels. Second, MorFuzz uses instruction morphing to dynamically mutate the template instructions. By collecting runtime information to generate meaningful instruction streams, MorFuzz significantly improves the effectiveness of fuzzing. Third, MorFuzz uses state synchronization to eliminate the implementation differences between the DUT and the reference model so that the simulation can continue to execute. Thus MorFuzz can penetrate deeper states of the processor. Additionally, MorFuzz does not rely on commercial tools and is compatible with the traditional verification processes of both the semiconductor industry and the open-source community.

Kabylkas et al. [32] introduced the Logic Fuzzer, a small piece of logic injected into the circuit to trigger atypical scenarios. However, we consider the effect of the Logic Fuzzer to be limited. First, the bugs triggered by the Logic Fuzzer cannot be reproduced by software. Therefore, the Logic Fuzzer may violate the designer's intent, e.g., a properly working Branch Target Buffer will never generate invalid branch addresses. Second, the Logic Fuzzer only works in specific hardware modules (e.g., FIFO, memory) that do not affect the processor's functionality. MorFuzz increases the testing pressure on the processor by injecting the morpher into the decoder, which is a general design, and all reported bugs are software-triggerable.

In addition to the above fuzzers for processor RTL code at the pre-silicon stage, there are also fuzzers designed to detect undocumented instructions [18, 61] and hidden model-specific registers [20, 36] in manufactured processors to disclose hardware backdoors [19]. Since the custom extensions in different models are not identical, the differential-testing-based MorFuzz can also detect these undocumented hidden features. During our evaluation, we did find some custom instructions [9] and a related bug (Bug B2).

8 Conclusion

This paper proposed MorFuzz, a coverage-guided processor fuzzer that can efficiently detect software-triggerable hardware bugs. As opposed to prior fuzzers, MorFuzz uses instruction morphing to dynamically mutate instructions at runtime, generating diverse and meaningful inputs and efficiently guiding mutations. In addition, MorFuzz designs stimulus templates to provide multi-level runtime mutation primitives and develops the synchronizable co-simulation framework to eliminate implementation differences. We evaluate MorFuzz on three popular open-source RISC-V processors, where it achieves up to 4.4× and 1.6× more state coverage than the state-of-the-art fuzzer, DifuzzRTL, and the famous constrained instruction generator, riscv-dv, respectively. Moreover, MorFuzz discovered a total of 17 new bugs (with 13 CVEs assigned), demonstrating its effectiveness in detecting unknown bugs in real-world processors.

Acknowledgments

We thank all anonymous reviewers and our shepherd for their valuable comments and suggestions, which significantly improved this paper. We also thank the developers in the open-source RISC-V processor communities for their helpful responses to our bug reports: Andrew Waterman, Jerry Zhao, and Jiuyang Liu from Rocket/BOOM; Florian Zaruba and Guillaume Chauvon from CVA6; and Scott Johnson and Tim Newsome from Spike. Additionally, we acknowledge Zhiheng He and Zhenxia Mo from Pengcheng Laboratory and Kun Yang from Zhejiang University for sharing their insights on the state of the industry. The authors are partially supported by the National Key R&D Program of China (No. 2022YFE0113200), the National Natural Science Foundation of China (NSFC) under Grant U21A20464, as well as the Research Grants Council (Hong Kong) under Grants RFS2122-1S04, C2004-21GF, R1012-21, and R6021-20F. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of funding agencies.

References

[1] Krste Asanović, Rimas Avizienis, Jonathan Bachrach, Scott Beamer, David Biancolin, Christopher Celio, Henry Cook, Daniel Dabbelt, John Hauser, Adam Izraelevitz, Sagar Karandikar, Ben Keller, Donggyu Kim, John Koenig, Yunsup Lee, Eric Love, Martin Maas, Albert Magyar, Howard Mao, Miquel Moreto, Albert Ou, David A. Patterson, Brian Richards, Colin Schmidt, Stephen Twigg, Huy Vo, and Andrew Waterman. The rocket chip generator. Technical Report UCB/EECS-2016-17, EECS Department, University of California, Berkeley, Apr 2016.
[2] Jonathan Bachrach, Huy Vo, Brian Richards, Yunsup Lee, Andrew Waterman, Rimas Avižienis, John Wawrzynek, and Krste Asanović. Chisel: constructing hardware in a scala embedded language. In DAC Design automation conference 2012, pages 1212–1221. IEEE, 2012.
[3] Fabrice Bellard. Qemu, a fast and portable dynamic translator. In USENIX annual technical conference, FREENIX Track, volume 41, pages 10–5555. California, USA, 2005.
[4] Marcel Böhme, Van-Thuan Pham, and Abhik Roychoudhury. Coverage-based greybox fuzzing as markov chain. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 1032–1043, 2016.
[5] Pietro Borrello, Andreas Kogler, Martin Schwarzl, Moritz Lipp, Daniel Gruss, and Michael Schwarz. ÆPIC leak: Architecturally leaking uninitialized data from the microarchitecture. In 31st USENIX Security Symposium (USENIX Security 22), pages 3917–3934, 2022.
[6] Niklas Bruns, Vladimir Herdt, Daniel Große, and Rolf Drechsler. Efficient cross-level processor verification using coverage-guided fuzzing. In Proceedings of the Great Lakes Symposium on VLSI 2022, pages 97–103, 2022.
[7] Sadullah Canakci, Leila Delshadtehrani, Furkan Eris, Michael Bedford Taylor, Manuel Egele, and Ajay Joshi. Directfuzz: Automated test generation for rtl designs using directed graybox fuzzing. In 2021 58th ACM/IEEE Design Automation Conference (DAC), pages 529–534. IEEE, 2021.
[8] Chipsalliance. dromajo. https://github.com/chipsalliance/dromajo.
[9] Chipsalliance. Non-standard opcode 0x30500073 in rocket. https://github.com/chipsalliance/rocket-chip/issues/1868.
[10] AMD Corporation. Revision guide for amd family 10h processors revision 3.92. 2012.
[11] Intel Corporation. Pentium processor specification update. 1999.
[12] Intel Corporation. 7th and 8th generation intel core processor family specification update revision 011. 2019.
[13] Intel Corporation. 12th generation intel core processor specification update revision 008. 2022.
[14] Synopsys Corporation. Chronologic VCS Simulator. https://www.synopsys.com/verification/simulation/vcs.html.
[15] National Vulnerability Database. CVE-2012-0217. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2012-0217.
[16] Ghada Dessouky, David Gens, Patrick Haney, Garrett Persyn, Arun Kanuparthi, Hareesh Khattri, Jason M Fung, Ahmad-Reza Sadeghi, and Jeyavijayan Rajendran. HardFails: Insights into Software-Exploitable hardware bugs. In 28th USENIX Security Symposium (USENIX Security 19), pages 213–230, 2019.

[17] C Domas. The memory sinkhole-unleashing an x86 design flaw allowing universal privilege escalation. Black Hat, Las Vegas, USA, 2015.
[18] Christopher Domas. Breaking the x86 isa. Black Hat, 2017.
[19] Christopher Domas. Hardware backdoors in x86 cpus. Black Hat, pages 1–14, 2018.
[20] Christopher Domas. The ring 0 façade: Awakening the processors inner demons. DEF CON, 2018.
[21] Stephanie Drzevitzky. Proof-carrying hardware: Runtime formal verification for secure dynamic reconfiguration. In 2010 International Conference on Field Programmable Logic and Applications, pages 255–258. IEEE, 2010.
[22] Shai Fine and Avi Ziv. Coverage directed test generation for functional verification using bayesian networks. In Proceedings of the 40th annual Design Automation Conference, pages 286–291, 2003.
[23] Harry Foster. 2020 wilson research group functional verification study: Ic/asic functional verification trend report. Wilson Research Group and Mentor, A Siemens Business, White Paper, 2020.
[24] Weimin Fu, Orlando Arias, Yier Jin, and Xiaolong Guo. Fuzzing hardware: Faith or reality? In 2021 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH), pages 1–6. IEEE, 2021.
[25] Shuitao Gan, Chao Zhang, Xiaojun Qin, Xuwen Tu, Kang Li, Zhongyu Pei, and Zuoning Chen. Collafl: Path sensitive fuzzing. In 2018 IEEE Symposium on Security and Privacy (SP), pages 679–696. IEEE, 2018.
[26] Google. riscv-dv. https://github.com/google/riscv-dv.
[27] Xiaolong Guo, Raj Gautam Dutta, Yier Jin, Farimah Farahmandi, and Prabhat Mishra. Pre-silicon security verification and validation: A formal perspective. In Proceedings of the 52nd annual design automation conference, pages 1–6, 2015.
[28] Finn Haedicke, Hoang M Le, Daniel Große, and Rolf Drechsler. Crave: An advanced constrained random verification environment for systemc. In 2012 International Symposium on System on Chip (SoC), pages 1–7. IEEE, 2012.
[29] Wookhyun Han, Byunggill Joe, Byoungyoung Lee, Chengyu Song, and Insik Shin. Enhancing memory error detection for large-scale applications and fuzz testing. In Network and Distributed Systems Security (NDSS) Symposium 2018, 2018.
[30] Jaewon Hur, Suhwan Song, Dongup Kwon, Eunjin Baek, Jangwoo Kim, and Byoungyoung Lee. Difuzzrtl: Differential fuzz testing to find cpu bugs. In 2021 IEEE Symposium on Security and Privacy (SP), pages 1286–1303. IEEE, 2021.
[31] A. Izraelevitz, J. Koenig, P. Li, R. Lin, A. Wang, A. Magyar, D. Kim, C. Schmidt, C. Markley, J. Lawson, and J. Bachrach. Reusability is firrtl ground: Hardware construction languages, compiler frameworks, and transformations. In 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 209–216, Nov 2017.
[32] Nursultan Kabylkas, Tommy Thorn, Shreesha Srinath, Polychronis Xekalakis, and Jose Renau. Effective processor verification with logic fuzzer enhanced co-simulation. In MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pages 667–678, 2021.
[33] Rahul Kande, Addison Crump, Garrett Persyn, Patrick Jauernig, Ahmad-Reza Sadeghi, Aakash Tyagi, and Jeyavijayan Rajendran. Thehuzz: Instruction fuzzing of processors using golden-reference models for finding software-exploitable vulnerabilities.
[34] Zijo Kenjar, Tommaso Frassetto, David Gens, Michael Franz, and Ahmad-Reza Sadeghi. V0ltpwn: Attacking x86 processor integrity from software. In Proceedings of the 29th USENIX Conference on Security Symposium, pages 1445–1461, 2020.
[35] Nathan Kitchen and Andreas Kuehlmann. Stimulus generation for constrained random simulation. In 2007 IEEE/ACM International Conference on Computer-Aided Design, pages 258–265. IEEE, 2007.
[36] Andreas Kogler, Daniel Weber, Martin Haubenwallner, Moritz Lipp, Daniel Gruss, and Michael Schwarz. Finding and exploiting cpu features using msr templating. In 2022 IEEE Symposium on Security and Privacy (SP), pages 1474–1490. IEEE, 2022.
[37] Kevin Laeufer, Jack Koenig, Donggyu Kim, Jonathan Bachrach, and Koushik Sen. Rfuzz: Coverage-directed fuzz testing of rtl on fpgas. In 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 1–8. IEEE, 2018.
[38] Kevin P Lawton. Bochs: A portable pc emulator for unix/x. Linux Journal, 1996(29es):7–es, 1996.
[39] Tun Li, Hongji Zou, Dan Luo, and Wanxia Qu. Symbolic simulation enhanced coverage-directed fuzz testing of rtl design. In 2021 IEEE International Symposium on Circuits and Systems (ISCAS), pages 1–5. IEEE, 2021.

[40] Lingyi Liu and Shabha Vasudevan. Star: Generating input vectors for design validation by static analysis of rtl. In 2009 IEEE International High Level Design Validation and Test Workshop, pages 32–37. IEEE, 2009.
[41] Ashok B Mehta. Constrained random verification (crv). In ASIC/SoC Functional Design Verification, pages 65–74. Springer, 2018.
[42] Daniel Moghimi, Moritz Lipp, Berk Sunar, and Michael Schwarz. Medusa: Microarchitectural data leakage via automated attack synthesis. In Proceeding of the 29th USENIX Security Symposium, 2020.
[43] Kit Murdock, David Oswald, Flavio D Garcia, Jo Van Bulck, Daniel Gruss, and Frank Piessens. Plundervolt: Software-based fault injection attacks against intel sgx. In 2020 IEEE Symposium on Security and Privacy (SP), pages 1466–1482. IEEE, 2020.
[44] Shisong Qin, Chao Zhang, Kaixiang Chen, and Zheming Li. idev: exploring and exploiting semantic deviations in arm instruction processing. In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 580–592, 2021.
[45] Pengfei Qiu, Dongsheng Wang, Yongqiang Lyu, and Gang Qu. Voltjockey: Breaking sgx by software-controlled voltage-induced hardware faults. In 2019 Asian Hardware Oriented Security and Trust Symposium (AsianHOST), pages 1–6. IEEE, 2019.
[46] Jeyavijayan Rajendran, Vivekananda Vedula, and Ramesh Karri. Detecting malicious modifications of data in third-party intellectual property cores. In 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), pages 1–6, 2015.
[47] Sanjay Rawat, Vivek Jain, Ashish Kumar, Lucian Cojocar, Cristiano Giuffrida, and Herbert Bos. Vuzzer: Application-aware evolutionary fuzzing. In NDSS, volume 17, pages 1–14, 2017.
[48] UC Berkeley Architecture Research. riscv-torture. https://github.com/ucb-bar/riscv-torture.
[49] RISC-V. riscv-crypto. https://github.com/riscv/riscv-crypto.
[50] Bruno Sá, José Martins, and Sandro Pinto. A first look at risc-v virtualization from an embedded systems perspective. IEEE Transactions on Computers, 71(9):2177–2190, 2021.
[51] Sergej Schumilo, Cornelius Aschermann, Robert Gawlik, Sebastian Schinzel, and Thorsten Holz. kAFL: Hardware-Assisted feedback fuzzing for OS kernels. In 26th USENIX Security Symposium (USENIX Security 17), pages 167–182, 2017.
[52] Wilson Snyder. Verilator. https://github.com/verilator/verilator.
[53] RISC-V Software. riscv-isa-sim. https://github.com/riscv-software-src/riscv-isa-sim.
[54] RISC-V Software. riscv-tests. https://github.com/riscv-software-src/riscv-tests.
[55] Dokyung Song, Felicitas Hetzelt, Dipanjan Das, Chad Spensky, Yeoul Na, Stijn Volckaert, Giovanni Vigna, Christopher Kruegel, Jean-Pierre Seifert, and Michael Franz. Periscope: An effective probing and fuzzing framework for the hardware-os boundary. In 2019 Network and Distributed Systems Security Symposium (NDSS), pages 1–15. Internet Society, 2019.
[56] Giovanni Squillero. Microgp—an evolutionary assembly program generator. Genetic programming and evolvable machines, 6(3):247–263, 2005.
[57] Sycuricon. Starship soc generator. https://github.com/sycuricon/starship.
[58] Timothy Trippel, Kang G. Shin, Alex Chernyakhovsky, Garret Kelly, Dominic Rizzo, and Matthew Hicks. Fuzzing hardware like software. In 31st USENIX Security Symposium (USENIX Security 22), pages 3237–3254. USENIX Association, 2022.
[59] Ilya Wagner, Valeria Bertacco, and Todd Austin. Stresstest: an automatic approach to test generation via activity monitors. In Proceedings of the 42nd annual Design Automation Conference, pages 783–788, 2005.
[60] Fanchao Wang, Hanbin Zhu, Pranjay Popli, Yao Xiao, Paul Bodgan, and Shahin Nazarian. Accelerating coverage directed test generation for functional verification: A neural network-based framework. In Proceedings of the 2018 on Great Lakes Symposium on VLSI, pages 207–212, 2018.
[61] Guang Wang, Ziyuan Zhu, Shuan Li, Xu Cheng, and Dan Meng. Differential testing of x86 instruction decoders with instruction operand inferring algorithm. In 2021 IEEE 39th International Conference on Computer Design (ICCD), pages 196–203. IEEE, 2021.
[62] Andrew Waterman, Krste Asanovic, and CS Division. The RISC-V instruction set manual volume I: Unprivileged isa.
[63] Andrew Waterman, Krste Asanovic, John Hauser, and CS Division. The RISC-V instruction set manual volume II: Privileged architecture.
[64] Daniel Weber, Ahmad Ibrahim, Hamed Nemati, Michael Schwarz, and Christian Rossow. Osiris: Automated discovery of microarchitectural side channels. In 30th USENIX Security Symposium (USENIX Security'21), pages 1–18, 2021.
[65] Stephen Williams. Icarus verilog. https://github.com/steveicarus/iverilog.
[66] Jun Yuan, Carl Pixley, Adnan Aziz, and Ken Albin. A framework for constrained functional verification. In ICCAD-2003. International Conference on Computer Aided Design (IEEE Cat. No. 03CH37486), pages 142–145. IEEE, 2003.
[67] Michal Zalewski. American fuzzy lop. https://lcamtuf.coredump.cx/afl.
[68] F. Zaruba and L. Benini. The cost of application-class processing: Energy and performance analysis of a linux-ready 1.7-ghz 64-bit risc-v core in 22-nm fdsoi technology. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 27(11):2629–2640, Nov 2019.
[69] Rui Zhang, Calvin Deutschbein, Peng Huang, and Cynthia Sturton. End-to-end automated exploit generation for validating the security of processor designs. In 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 815–827. IEEE, 2018.
[70] Jerry Zhao, Ben Korpan, Abraham Gonzalez, and Krste Asanovic. Sonicboom: The 3rd generation berkeley out-of-order machine. May 2020.
[71] Yanhong Zhou, Tiancheng Wang, Huawei Li, Tao Lv, and Xiaowei Li. Functional test generation for hard-to-reach states using path constraint solving. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 35(6):999–1011, 2015.