
TheHuzz: Instruction Fuzzing of Processors Using Golden-Reference Models for Finding Software-Exploitable Vulnerabilities

Rahul Kande†, Addison Crump†, Garrett Persyn†, Patrick Jauernig∗, Ahmad-Reza Sadeghi∗, Aakash Tyagi†, and Jeyavijayan Rajendran†
†Texas A&M University, USA, ∗Technische Universität Darmstadt, Germany
†{rahulkande, addisoncrump, gpersyn, tyagi, jv.rajendran}@tamu.edu, ∗{patrick.jauernig, ahmad.sadeghi}@trust.tu-darmstadt.de

Abstract

The increasing complexity of modern processors poses many challenges to existing hardware verification tools and methodologies for detecting security-critical bugs. Recent attacks on processors have shown the fatal consequences of uncovering and exploiting hardware vulnerabilities.

Fuzzing has emerged as a promising technique for detecting software vulnerabilities. Recently, a few hardware fuzzing techniques have been proposed. However, they suffer from several limitations, including non-applicability to commonly-used hardware description languages (HDLs) like Verilog and VHDL, the need for significant human intervention, and inability to capture many intrinsic hardware behaviors, such as signal transitions and floating wires.

In this paper, we present the design and implementation of a novel hardware fuzzer, TheHuzz, that overcomes the aforementioned limitations and significantly improves the state of the art. We analyze the intrinsic behaviors of hardware designs in HDLs and then measure the coverage metrics that model such behaviors. TheHuzz generates assembly-level instructions to increase the desired coverage values, thereby finding many hardware bugs exploitable from software. We evaluate TheHuzz on four popular open-source processors and achieve 1.98× and 3.33× the speed compared to the industry-standard random regression approach and the state-of-the-art hardware fuzzer, DifuzzRTL, respectively. Using TheHuzz, we detected 11 bugs in these processors, including 8 new bugs, and we demonstrate exploits using the detected bugs. We also show that TheHuzz overcomes the limitations of formal verification tools from the semiconductor industry by comparing its findings to those discovered by the Cadence JasperGold tool.

1 Introduction

Modern processors are becoming increasingly complex with sophisticated functional and security mechanisms and extensions. This development, however, increases the chance of introducing vulnerabilities into the hardware design and implementation which can lead to errors and exploitation attacks with fatal consequences. Hardware vulnerabilities range from functional bugs (e.g., [37]) to emerging security-critical vulnerabilities that have been uncovered and exploited (e.g., [36], [45]), and both affect commodity processors and their dedicated security extensions (e.g., [11], [81]). The hardware common weakness enumeration (CWE) lists numerous hardware vulnerabilities whose impact spans not only the hardware but also software [48]. It is crucial to discover hardware vulnerabilities in the early stages of the design cycle.

Various hardware vulnerability detection techniques and tools have been proposed or developed by both academia and industry, such as formal verification [10, 74, 68, 6, 55, 85, 14, 13, 61], run-time detection [27, 64, 84], information flow tracking [78, 2, 43, 42, 92], and the recent efforts towards fuzzing hardware [51, 79, 39, 30], which is our focus.

While formal verification tools can efficiently find bugs in smaller designs, they are unable to cope with the increasing complexity of modern, large designs and are becoming less efficient in detecting bugs, especially security vulnerabilities [14, 47, 89, 12, 17]. One particular reason is that these tools rely heavily on human expertise to engineer or specify “attack scenarios” for verification. For instance, the popular industrial formal verification tool, Cadence’s JasperGold [10], has been evaluated against a crowd-sourced vulnerability detection effort from 54 competing teams participating in a hardware capture-the-flag competition [14]. The results were based on security bugs mimicking real-world common vulnerabilities and exposures (CVEs) [49]. While JasperGold detected 48% of the bugs, manual inspection with simulation detected 61% of the bugs, highlighting issues like state explosion and scalability of the existing techniques, amongst others.

Another approach to find hardware security bugs is run-time detection techniques, which hardcode assertions in hardware to check security violations at runtime [27, 64, 84]. However, these techniques detect bugs only post-fabrication and unlike software, hardware is not easily patchable.

Information-flow tracking (IFT) techniques analyze the hardware to detect security vulnerabilities by labeling all the

input signals and propagating this label throughout the design to identify information leakage or tampering of critical data [78, 2, 43, 42, 92]. Although IFT can analyze designs with several thousand lines of code, the labels often get polluted with unwanted signals, resulting in many false positives. The initial labels have to be assigned manually, which can be error-prone and requires expert knowledge of the design.

Hence, there is an increasing need for methodologies and tools to detect hardware vulnerabilities that are scalable to large and complex designs, highly automatic, effective and efficient in detecting security-critical vulnerabilities that are exploitable (and not only functional bugs), compatible with existing chip design and verification flows, applicable to different hardware models (register-transfer level, gate-level, transistor-level, taped-out chip), and account for different hardware behaviors (signal transitions, finite-state machines (FSMs), and floating wires).

A promising technique extensively used for software vulnerability detection is fuzzing. Fuzzing uses random generation of test cases to detect invalid states in the target [46]. While it seems natural to apply or extend a software fuzzer to detect security bugs in hardware [79, 51], such approaches do not capture hardware-intrinsic behaviors, for instance, signal transitions of wires, FSMs, and floating wires, defined in hardware description languages (HDLs) like Verilog and VHDL. We will discuss these challenges in Section 3.

So far, there have been a few proposals towards fuzzing hardware [39, 51, 79, 30]. However, as we elaborate in Section 7, they suffer from various limitations: lack of support for commonly-used HDLs such as VHDL and Verilog [39] or only partial support for their constructs [79], strong reliance on human intervention [51], and the inherent inability to capture many hardware behaviors, including transitioning of logical values in wires and of floating wires [30].

Our goals and contributions. We present the design and implementation of a novel hardware fuzzer, TheHuzz. It tackles the challenges of building a hardware fuzzer (cf. Section 3) and addresses the aforementioned shortcomings of the current hardware fuzzing proposals (cf. Section 4). We analyze the intrinsic behaviors of hardware designs and describe appropriate coverage metrics of the HDL to capture such behaviors. Given the importance of software-exploitable hardware vulnerabilities [14, 47, 89, 12, 17], TheHuzz fuzzes the target hardware by testing instruction sequences, thereby discovering security bugs that are exploitable by the software code which executes such instruction sequences. Through a built-in optimizer, TheHuzz can select the best instructions and mutation techniques to achieve the best coverage.

TheHuzz (i) supports commonly-used HDLs like Verilog and VHDL, (ii) is compatible with conventional industry-standard IC design and verification flow, (iii) detects software-exploitable hardware vulnerabilities, (iv) accounts for different hardware behaviors, (v) does not require knowledge of the design, (vi) is scalable to large-scale designs, and (vii) does not need human intervention. In summary, our main contributions are:

• We present a novel hardware fuzzer, TheHuzz (Section 4), which uses coverage metrics that capture a wide variety of hardware behaviors, including signal transitions, floating wires, multiplexers, along with combinational and sequential logic. TheHuzz optimizes the selection of the best instructions and mutation techniques and can achieve high coverage rates (cf. Section 4.4). Our fuzzer achieves 1.98× and 3.33× the speed compared to the industry-standard random regression approach and the state-of-the-art hardware fuzzer, DifuzzRTL, respectively (cf. Section 6.4).

• We extensively evaluate our fuzzer, TheHuzz, on four well-known and complex real-world open-source processor designs from two different open-source instruction set architectures (ISAs): (i) or1200 processor (OpenRISC ISA), (ii) mor1kx processor (OpenRISC ISA), (iii) Ariane processor (RISC-V ISA), and (iv) Rocket Core (RISC-V ISA). All these processors can run Linux-based operating systems and are used in multiple hardware verification research studies [14, 27, 94, 93].

• TheHuzz found 11 bugs that are software exploitable in four different processors; eight of them are new bugs. We also showcase two attacks from unprivileged software exploiting vulnerabilities found by TheHuzz (cf. Section 6.2).

• We perform an investigation of the bugs detected by TheHuzz using a leading formal verification tool, Cadence’s JasperGold [10] (cf. Section 6.5). TheHuzz overcomes the limitations of JasperGold: state explosion, intensive resource consumption, reliance upon error-prone human expertise, and a requirement of prior knowledge of hardware vulnerabilities or security properties.

• To foster research in the area of hardware fuzzing, we plan to open-source the code of TheHuzz to provide the community a framework to build upon.

2 Background

The growing number of attacks that exploit hardware vulnerabilities from software [37, 36, 45, 59, 52, 82, 76, 60, 34, 11, 81] calls for new and effective hardware vulnerability detection techniques that address the limitations of existing methods and tools, such as state-space explosion, modeling hardware-software interactions, and the need for manual analysis.

2.1 Fuzzing

Fuzzing techniques have been shown to be highly effective in detecting software vulnerabilities [75, 40, 46, 67, 23, 16, 22, 87]. Fuzzing generates test inputs and simulates the target design to detect vulnerabilities in it. The inputs are generated by mutating the previous inputs, which are generated from seeds. Mutation techniques modify the input by performing pre-defined operations, including bit-flip, clone, and swap. The

mutation process also generates invalid inputs, testing the design outside the specification. In the past, fuzzers were created specifically to target different kinds of software: binary targets [40], JIT compilers [23], web applications [16], and operating systems [22]. Thus, specialized fuzzers conform to the needs of each target type. Fuzzers have seen use from both independent researchers and organizations as an additional verification step, most notably that of Google’s OSS Fuzz [67], which actively fuzzes a plethora of software on their ClusterFuzz platform [20]. Fuzzers are highly successful in detecting software vulnerabilities as they are automated, are scalable to large codebases, do not require the knowledge of the underlying system, and are highly efficient in detecting many security vulnerabilities.

Unfortunately, comparable approaches for hardware fuzzing are still in their infancy. Hardware-specific behaviors pose several challenges to the design of hardware fuzzers, which we present in this section. However, before we consider the natural question of why one cannot trivially adopt the advances of software fuzzers for hardware, we briefly explain the typical hardware (security) development life cycle.

2.2 Hardware Development Lifecycle

The hardware development lifecycle [26, 7, 83, 50] typically begins with a design exploration driven by the market segment served by the product. Architects then engineer the optimal architecture while trading off among performance, area, and power, and the associated microarchitectural features. Designers implement all the microarchitectural modules using hardware description languages (HDLs), which are usually written at the register-transfer level (RTL). To this end, popular HDLs like Verilog and VHDL are used to describe complex hardware structures such as buses, controllers as finite state machines (FSMs), queues, and datapath units like adders, multipliers, etc. Electronic design automation (EDA) tools synthesize the RTL models into gate-level designs, which realize the hardware using Boolean gates, multiplexers, wires, and state elements like flip-flops. EDA tools then synthesize the gate-level design into transistor-level and eventually to layout, which is then sent to the foundry for manufacturing.

Most of the design effort and time spent by designers goes into manually writing HDLs at the RTL as the rest of the steps are highly automated. Unfortunately, writing HDL at the RTL is error-prone [7, 83, 50]. Thus, the verification team checks if the design at its various stages meets the required specification or not using functional, formal, and simulation-based tools; if the design does not meet the specification, the designers patch the bugs, and the process is repeated until the design passes the verification tests. To this end, companies typically develop a golden reference model (GRM)* for industry designs to be used with the conventional verification flows. GRMs are often written at a higher abstraction level (e.g., for RTL, the GRM is a software model of the hardware). Verification techniques usually compare the outputs of RTL and the GRM to find any mismatches, which will reveal the bugs. The accuracy of these techniques is further increased by comparing not only the final outputs but also the values of intermediate registers and by performing comparisons after every clock cycle. They perform similar tests on the gate-level design and the fabricated chip; for these models, the adjacent abstraction level acts as the GRM. Similarly, post-manufacturing, testing of the fabricated chips is performed to weed out the faulty chips.

*The GRM for hardware is similar to a test oracle in software which helps verify the result of a program’s execution [28].

When the architecture of the chip is designed, the security team concurrently identifies the threat model, security features, and assets. During the design phase, the security team performs security testing, starting with the RTL model via simulation and emulation, formal verification, and manual review of RTL code. Post-deployment, the security engineers provide support and patch any bugs, if possible.

3 Challenges of Hardware Fuzzing

In this section, we outline the challenges that arise when analyzing hardware using fuzzing. We first elaborate on the problems that one encounters when deploying existing software fuzzers to analyze hardware. Then we discuss challenges that need to be tackled when designing and implementing a dedicated hardware fuzzer.

3.1 Fuzzing Hardware with Software Fuzzers

There are two ways to fuzz hardware with software fuzzers: (i) using software fuzzers directly on the hardware, and (ii) fuzzing hardware as software. However, both approaches face several limitations.

Problems with using software fuzzers directly on hardware. First, software fuzzers rely on a different behavior in vulnerability detection. They rely on software abstractions to find a bug by using the operating system or instrumenting software to monitor failure detection [54, 66]. Most software fuzzers use crashes to detect bugs, but hardware does not crash [79]. Thus, hardware fuzzers need to find their equivalent of crashes and memory leaks. Second, hardware simulations are slow. Typically, given a function, executing its software equivalent is faster than simulating its hardware model. Parallelization of hardware simulation is difficult due to the complex interdependencies in the hardware design [12]. Third, many software fuzzers rely on instrumentation of the software program to obtain feedback (e.g., AFL [40]) and use custom compilers (e.g., afl-gcc) to instrument the code [29, 19, 44, 40], but these compilers will not be able to instrument the hardware designs since they do not support HDLs such as Verilog and VHDL. One of the prior works,

RFUZZ [39] made the first attempt towards solving this challenge: it uses hardware simulators to compile the hardware and applies a modified version of software fuzzer, AFL [40] to fuzz the hardware. However, this fuzzer is limited in terms of the scalability [30] and coverage (cf. Appendix B).

Problems with fuzzing hardware as software. Another strategy of fuzzing hardware using software fuzzers is to convert a HDL model into an equivalent software model using tools like Verilator [70], and then apply software fuzzers to the resultant software code [79]. Unfortunately, converting hardware into software models poses its own set of challenges.

First, applying existing software fuzzers on software models of hardware designs is, in general, inefficient. The software models of hardware designs need to account for properties unique to the working of the hardware, like computing all the register values for every clock cycle and bit manipulation operations, and components such as controllers, system bus, and queues, which makes the model computationally expensive. Moreover, software fuzzers use program crashes and instrumented memory safety checks to detect bugs in an application; these concepts cannot be trivially applied to hardware [79]. Instead, a well-defined specification to compare against is needed to detect incorrect logic implementations, timing violations, and unintended data flow or control flow.

Second, inferring actual hardware coverage from the generated software model is difficult. While software and hardware line and edge/block coverage are comparable in some instances [79], other forms of coverage may not be. A relatively simple operation in a HDL, like bit manipulation, may be significantly more complex in software. Conversely, a more complex component in HDL, such as a multiplexer, could be represented by a simple switch statement in software. Thus, one has to account for the effects of conversion.

Third, the hardware community has developed its own standards, processes, and flows for using verification methodologies and tools over several decades of research [50, 7, 83]. Any new approach has to be compatible with the hardware verification flow, as these methodologies have specialized data structures and algorithms geared towards hardware models and behaviors.

An open-source approach to solve the many challenges of fuzzing the software model of hardware is performed in [79]. This technique derives equivalences between the coverage metrics (e.g., line and FSM) used in hardware to that of software (e.g., line and edge). While this approach is promising, it does not scale to complex designs such as processors, which is the focus of this work (cf. Section 7).

3.2 Creating a Hardware Fuzzer

A hardware fuzzer needs to take into account the nature and requirements of hardware to improve efficiency. For example, Syzkaller [22], which specializes in kernel-fuzzing, incorporates system call signatures to generate better test cases. A hardware design fundamentally differs from any software program in terms of inputs, language used, feedback information available, and design complexity. Also, designing a hardware fuzzer has its own set of unique challenges, which are presented below. Multiple attempts have been made in the recent past towards building hardware fuzzers [39, 30, 51, 79] where each of these challenges are approached differently.

Input generation. For a hardware fuzzer to be efficient and effective, it should generate inputs in the format expected by the target processor. Directly applying the input-generation techniques used in software fuzzing is impossible as the input formats differ: while many software fuzzers take input files or a set of values assigned to a variable, the input to hardware is mostly continuous without a defined length [79]. Further, inputs to hardware can be generated at various hardware abstraction levels: architecture level, register-transfer level (RTL), gate level, and transistor level. Each level also has its own input representation, ranging from transaction packets, over continuous-time digital signals, to continuous-time analog signals. Hence, the major challenges in input generation are to determine the suitable abstraction level to fuzz and the input representation that maximizes the efficiency in finding vulnerabilities [12, 50, 51, 39, 30, 79].

Another important aspect is the continuous nature of the hardware since it changes its state with every input (and/or time). Also, multiple FSMs can run in parallel, and one or more of them could enter in deadlock states, preventing the hardware from receiving inputs from the fuzzer [12]. For instance, a password checking module could be designed to lock itself forever after one incorrect password entry unless the system is reset. Hence, another crucial challenge is to identify situations where the hardware simulation should be stopped or reset before applying new inputs.

Finally, similar to how software fuzzers like syzkaller [22] encode functional dependencies (e.g., of system calls), hardware modules often need to be initialized to enable the fuzzer to test further functionality, e.g., an AES encryption module needs to be initialized with the key size and encryption mode before testing the actual encryption with plaintext and key. Inferring these functional dependencies is highly challenging, as such information is usually only available with a well-defined formal specification [51, 79].

Feedback mechanism. Exploring complex targets, especially hardware, often forces fuzzers to generate tremendous amounts of inputs, while making decisions like which mutation technique to use, when to stop mutating an input, and how to generate the seed inputs repeatedly. Rather than relying on randomly-generated inputs alone, a more efficient way is to analyze the impact of these parameters on the target processor and adapt input generation accordingly as done in feedback-guided fuzzing [65, 41]. Prior works [39, 30] addressed this challenge using hardware-friendly coverage metrics but fail to capture many hardware behaviors (cf. Appendix B).

Adapting software feedback mechanisms to hardware is

difficult due to the differences in execution/simulation for software and hardware [39, 30, 51, 79]. Instrumentation needs to be added to the hardware design such that the activities of different combinational and sequential structures, which are critical to the functionality of the hardware, can be traced. Although feedback-guided fuzzers have more potential to explore complex targets, capturing, analyzing, and processing the feedback data is challenging [65, 41]. This issue will become more profound in hardware since hardware designs are slower to simulate. One way to speedup hardware fuzzing is to use FPGA emulation, but instrumenting a design on an FPGA is challenging [39, 30]. Hence, the feedback mechanism needs to capture the complex characteristics of hardware.

Lastly, the performance of a fuzzer needs to be evaluated on hardware designs comparable to what is used in practice. However, unlike with software, commercial hardware designs like Intel’s x86 processors do not have their source code available. Hence, a key challenge is to find openly-available designs that are reasonably modern and complex.

4 Design of Our Fuzzer, TheHuzz

TheHuzz is a novel hardware fuzzer that overcomes the challenges identified in Section 3.2. We directly fuzz the hardware design instead of the software model, thereby eliminating the need for hardware-to-software conversions and the associated equivalency checks. To overcome the slowness of hardware simulation, TheHuzz selects the optimal instructions and mutation techniques to use. TheHuzz is easily integratable with existing hardware design and verification methodologies (thereby, easily adaptable by companies) as our approach does not require any modification to the target processor and utilizes existing hardware simulation tools and techniques. We refer to the target processor as the design under test (DUT). Our fuzzer generates instructions as inputs to the DUT since we focus on software-exploitable processor vulnerabilities.

TheHuzz comprises three modules, as shown in Figure 1. First, the seed generator starts the fuzzing process by generating an initial sequence of instructions (seeds or seed inputs). Then, the stimulus generator generates new instruction sequences by mutating them, beginning with the seeds. These inputs are passed to the simulated RTL design of the DUT, which returns coverage feedback to the stimulus generator and trace information for bug detection. Finally, the bug detection mechanism compares the RTL simulation trace with that of a golden reference model (GRM) to find differences in execution, and hence, find bugs.

[Figure 1: Framework of TheHuzz. Labeled stages: seed generator (instruction generator), stimulus generator (mutation engine, feedback engine), and bug detection (golden reference model, comparator).]

In the following, before we explain the modules of TheHuzz, we first analyze the intrinsic behaviors of designs at the RTL, as TheHuzz targets such behaviors, and describe the coverage metrics that capture those behaviors. Then, we describe the seed generator and stimulus generator of our fuzzer in detail and how they interact. Finally, we detail how we optimize the mutation engine and how the bugs are detected.

4.1 Hardware Design and Coverage Metrics

Hardware designs at RTL consist of combinational and sequential logic. Combinational logic is a time-independent circuit with boolean logic gates (e.g., AND, OR, XOR) and wires connecting them. Apart from building datapath units like adders and multipliers, these logic gates are used to build basic combinational structures like multiplexers (MUXes), demultiplexers, encoders, and decoders, which are in turn used in building complex blocks. Apart from combinational gates, sequential logic also uses registers, which are usually implemented using D flip-flops (DFFs). In the following, we explain the effectiveness of our fuzzer in capturing hardware behaviors over existing hardware fuzzers using a case study.

Case study. We now present a case study using a design with two bugs inspired by CVEs. First, we explain the intended behavior and then the bugs. Then, we detail TheHuzz’s coverage metrics and describe how they detect these bugs.

Listing 1: Chisel code of the case study.

    36  // combinational logic for vld register
    37  vld := debug_en | (flush | en) // Bug b2
    38
    39  // select signal for mux
    40  val sel1 = Wire(Bool())
    41  sel1 := ((pass === ipass) | debug_en) // Bug b1
    42
    43  // flush logic
    44  val state_f = Wire(UInt(3.W))
    45  when (flush & en) {
    46    state_f := FLUSH
    47  } otherwise {
    48    state_f := state
    49  }
    50
    51  state := (!sel1 & state_f) | (sel1 & D_READ)

†For succinctness, we ignore the other states of the cache controller.

Consider a cache controller module, similar to the instruction cache controller of the Ariane processor [91], shown in Listing 1. As shown in Figure 3, the D_READ and the FLUSH states determine the read operation during the debug mode and the flush operation during the normal mode, respectively, as listed in Lines 39–51†. The controller enters the FLUSH state when there is a flush command and if the cache is enabled. The intended behavior of the FSM is that the read operations in the debug mode are permitted only if the user

has inputted the correct password (Line 41). This protection mechanism allows only authorized users to read the cache in the debug mode. The cache controller sets the valid signal (vld) based on the flush and debug requests issued to the controller (Line 37 of Listing 1).

[Figure 2: Hardware design for Listing 1.]

[Figure 3: Finite state machine (FSM) of the design in Figure 2, showing the FLUSH and D_READ states; here, sel1 = (pass == ipass) | debug_en.]

The electronic design automation (EDA) tools synthesize this RTL code into an equivalent gate-level design shown in Figure 2. The MUXes ➀ and ➂ select the next state. The combinational logic ➁ and ➃ controls the state transitions. The DFFs in ➄ hold the current state. The EDA tools implement Line 37 as combinational logic ➅. The DFFs in ➆ register the inputs and outputs.

This design has two bugs: b1 and b2. Bug b1 (Line 41 in Listing 1) is from HardFails [14], which has been used for the Hack@DAC competitions, and is similar to CVE-2017-18293. This bug is in the combinational logic ➃, where the debug read operation is access-protected but the bug allows one to perform the debug read operation illegally. This compromises the security of the read operations as it allows users without the correct password to read the cache. Bug b2 is similar to CVE-2019-19602 and is in the combinational logic ➅ that drives the vld register (Line 37), allowing one to flush the cache even when it is not enabled.

In ➀, all the inputs of the MUX and their corresponding values on the select lines must be tested for correctness. For this purpose, we use branch coverage, which tests each branching construct (the when block of Line 45) for both “when” and “otherwise” conditions.

In ➁, one should check that every input combination produces the correct output value. To this end, we use condition coverage, which requires the condition block (i.e., the condition (flush & en) in the when block of Line 45) to be tested for all possible input values and not only a subset of values.

In ➄, the value of the 3-bit register can be one of the eight possible values. We use FSM coverage of the state register to check for all the eight values. This coverage captures the different FSM states and also their transitions.

In ➅, all the input signals generating vld should be tested for all possible values, similar to ➁. We use expression coverage for this purpose which requires the combinational block (i.e., the expression debug_en|(flush|en) in Line 37) to be tested for all possible input values. Furthermore, expression coverage covers the select line of MUX ➂ and the combinational logic ➃ that drives it as they are defined using an expression in Line 51, unlike MUX ➀ which is defined as branch in Lines 45–49.

In ➄ and ➆, the value of each DFF can be 1, 0, or floating‡. We use toggle coverage of these DFFs to check for toggling of their values among these three possibilities. Unlike FSM coverage, toggle coverage covers all the DFFs in the design. In addition, we also use statement coverage to ensure every line of the RTL code is executed during simulation.

‡Referred to as a high-impedance state or tristate and denoted as z. Such floating wire-related bugs (CWE-1189) have compromised systems [14].

TheHuzz uses commercial industrial-standard tools (Synopsys [74], ModelSim [68], Cadence [10]) to compile the hardware and extract these coverage values. The semiconductor industry has been using these tools for the last few decades, and its verification flow is built on these tools, thus providing a promising way to obtain coverage [50].

TheHuzz detects both b1 and b2 using the expression coverage of ➃ and ➅, respectively. The expression coverage verifies that all the signals involved in the combinational logics ➃ and ➅ cover all possible values. One such combination will trigger the bugs b1 and b2, resulting in an incorrect output, which will be flagged as a mismatch. Thus, TheHuzz’s coverage metrics aid in detecting bugs b1 and b2.

In contrast to TheHuzz, existing hardware fuzzers lose hardware intrinsic behaviors (e.g., floating wires, signal transitions) while converting the target hardware into a software model [79], operate only on the select signals of the MUXes [39], operate only on the DFFs that determine the

select signals of the MUXes [30], or operate at the protocol level [51]. Hence, the coverage used by existing fuzzers will not be able to cover the bugs in ➃, ➅, and some DFFs in ➆, including the bugs we inserted, b1 and b2 (cf. Appendix B).

4.2 Seed Generator

Having discussed the coverage metrics that capture hardware behaviors, we now describe the seed generation in more detail. The seed generator generates seed inputs that run on the DUT and are used to generate further inputs through mutation.

Seed inputs. TheHuzz's goal is to detect software-exploitable vulnerabilities in the RTL model of the processors. Processors execute instructions using the data from the instruction memory. Hence, our fuzzer provides inputs at the instruction set architecture (ISA) abstraction level by generating processor instructions. The seed inputs are data files containing a sequence of instructions, which are loaded onto the memory.

Instruction generator generates the instructions for the seed inputs from a set of valid instructions of the processor.

Input format. Each input consists of two types of instructions: configuration instructions (CIs) and test instructions (TIs). The CIs are needed to set up the baremetal environment, e.g., setting up the stack and the exception handler table, and clearing the general-purpose registers. This baremetal environment allows TheHuzz to run instructions directly on the processor without the need for an operating system. The TIs, which are generated by the instruction generator, are the actual instructions used to fuzz the processor.

4.3 Stimulus Generator

The stimulus generator is responsible for mutating the current inputs, generating new inputs, and discarding the underperforming inputs. Seed inputs are used to generate the first set of new inputs. We mutate the instructions directly as binary data instead of at a higher abstraction level such as assembly. This allows us to mutate all the bits of the instruction based on the mutation technique used. Thereby, we can test the processor with out-of-spec inputs like illegal instructions (i.e., instructions not specified in the ISA) generated through mutation of the opcode bits of the instruction. This allows us to detect issues that other verification techniques may not have detected, like the bug B3 in the Ariane and B8 in the or1200 processors, which cannot be detected with legal instructions.

Mutation engine performs the mutation operations on the instructions. We mutate only the TIs since these are the instructions used to fuzz the processor. The CIs are not mutated, to ensure the correct initialization of the processor for fuzzing.

The mutation techniques used by our fuzzer can be classified into two types. The first type mutates only the data bits, keeping the opcode unchanged. These mutations increase the coverage on different data paths that are close to each other. To generate bug-triggering out-of-spec inputs, the second type of mutation techniques mutates both the data and the opcode bits. Mutating the opcode bits will create inputs with new instruction sequences and help uncover different control paths in the DUT. This will help generate illegal instructions to test the processor with out-of-spec inputs. We employ AFL-style mutation techniques as listed in Appendix A.

Every time new inputs are generated by the stimulus generator, the code coverage data of these inputs is used to discard the underperforming inputs, thereby retaining only the inputs that trigger new coverage points. This helps steer the fuzzer towards discovering new coverage points quickly.

Figure 4: Optimization process used for TheHuzz. [Block diagram: the profiler (opcode list, mutation list, mutation engine, processor, coverage logs) and the optimizer (optimization tool, weights).]

4.4 Optimization

We now propose an optimization for improving the efficiency of a processor fuzzer, as shown in Figure 4. Instead of using all the instructions and mutations, we optimally select the ones that achieve the best coverage. To this end, we first profile the individual instructions and mutations and formulate an optimization problem, which returns the optimal weights for each instruction-mutation (IM) pair.

Profiler characterizes the control and data flow paths explored by each IM pair. TheHuzz generates the coverage values specific to each IM pair via hardware simulation.

Optimizer aims to minimize the number of IM pairs while achieving the same amount of coverage as using all the IM pairs. Let $I$ and $M$ be the sets of instructions and mutations, respectively. Let $P = I \times M$. $C$ denotes the union of coverage metrics such as statement, branch, expression, toggle, FSM, and condition. The coverage from the profiling phase for each IM pair is denoted by the indicator function $D : P \times C \to \{0, 1\}$. $C_d \subseteq C$ denotes the coverage points hit by an IM pair during the profiling phase. The optimization problem is to find the smallest subset of $P$, denoted as $Q$, that covers all the coverage points identified during the profiling stage, $C_d$. The optimizer returns the set $Q$ that contains the optimal IM pairs. TheHuzz uses this information to generate the weights for each instruction-mutation pair, $w_{(I,M)}(i, m) = \mathbb{1}_{\{i,m\} \in Q},\ \forall (i, m) \in P$, where $\mathbb{1}$ is an indicator function. The seed generator uses the weights $w_I$ to select instructions, and the stimulus generator uses the weights $w_M$ to select the mutation techniques for each instruction, thereby eliminating underperforming instructions and mutations.
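The optimization in Section 4.4 is a minimum set cover over the profiled coverage points. The sketch below is a minimal illustration of that idea, not TheHuzz's implementation: the paper solves the problem with CPLEX, whereas this example swaps in a greedy approximation, which may return more pairs than the true minimum. The function names, the dictionary layout, and the toy coverage labels are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set, Tuple

# (instruction, mutation) pair; the string names used here are illustrative.
IMPair = Tuple[str, str]


def greedy_im_cover(profile: Dict[IMPair, Set[Hashable]]) -> List[IMPair]:
    """Select IM pairs until every profiled coverage point (C_d) is covered.

    Greedy set cover: an approximation, not the exact minimum that an
    integer-programming solver would return.
    """
    uncovered: Set[Hashable] = set().union(*profile.values()) if profile else set()
    chosen: List[IMPair] = []
    while uncovered:
        # Pick the pair that hits the most still-uncovered points.
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(profile), key=lambda p: len(profile[p] & uncovered))
        chosen.append(best)
        uncovered -= profile[best]
    return chosen


def im_weights(profile: Dict[IMPair, Set[Hashable]],
               chosen: List[IMPair]) -> Dict[IMPair, int]:
    """Indicator weights: 1 if the pair is in Q, otherwise 0."""
    selected = set(chosen)
    return {pair: int(pair in selected) for pair in profile}


# Toy profiling result: which coverage points each IM pair reached.
profile = {
    ("add", "bitflip"): {"c1", "c2"},
    ("add", "swap"): {"c2"},
    ("mul", "bitflip"): {"c3"},
    ("beq", "clone"): {"c1", "c2", "c3", "c4"},
}
Q = greedy_im_cover(profile)
weights = im_weights(profile, Q)
print(Q, weights)
```

In this toy profile a single pair already reaches every coverage point, so all other pairs receive weight 0 and would no longer be selected.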

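The two mutation classes from Section 4.3 (data bits only, versus data and opcode bits) can be illustrated with a single bit-flip operator. This is a sketch under stated assumptions rather than the paper's mutation engine: it assumes a 32-bit RISC-V instruction word whose major opcode occupies bits [6:0], and it treats every bit above the opcode as "data", which is a simplification because fields such as funct3 and funct7 also select the operation. The helper names are hypothetical.

```python
import random

OPCODE_BITS = 7          # RISC-V base encoding: major opcode in bits [6:0]
WIDTH = 32               # assumed instruction width
OPCODE_MASK = (1 << OPCODE_BITS) - 1


def flip_bit(instr: int, pos: int) -> int:
    """Flip one bit of a 32-bit instruction word."""
    return (instr ^ (1 << pos)) & 0xFFFFFFFF


def mutate_data_bits(instr: int, rng: random.Random) -> int:
    """Type 1: flip a bit above the opcode field, so the opcode is unchanged."""
    return flip_bit(instr, rng.randrange(OPCODE_BITS, WIDTH))


def mutate_any_bit(instr: int, rng: random.Random) -> int:
    """Type 2: flip any bit, possibly producing a different or illegal opcode."""
    return flip_bit(instr, rng.randrange(0, WIDTH))


rng = random.Random(0)
seed = 0x002081B3        # encoding of `add x3, x1, x2`
data_mutant = mutate_data_bits(seed, rng)
any_mutant = mutate_any_bit(seed, rng)
print(f"{seed:08x} -> {data_mutant:08x} (data only), {any_mutant:08x} (any bit)")
```

The first operator keeps the instruction in the same opcode class and mainly exercises nearby data paths; the second can leave the set of valid encodings, which is how illegal instructions arise.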
4.5 Bug Detection

Software programs indicate bug triggers through crashes, memory leaks, and exit status codes. However, hardware intrinsically cannot provide such feedback because it does not crash or have memory leaks. Thus, as performed in traditional hardware verification, we compare the outputs of GRMs and the DUT for the inputs generated by the fuzzer. Any mismatch event indicates the presence of a bug, which is then manually analyzed to identify its cause.

5 Implementation

We implemented TheHuzz such that it is compatible with the traditional IC design and verification flow, while effectively detecting security vulnerabilities. All the components are implemented in Python unless specified otherwise. We used CPLEX [31] for optimization.

Register-Transfer-Level simulation. We simulate the target hardware using a leading industry tool, Synopsys VCS [74]. This tool supports a wide variety of hardware description languages (HDLs) and different hardware models: RTL, gate level, and transistor level. We wrote custom Python scripts to process the logs of VCS to extract the coverage metrics: statement, branch, toggle, expression, and condition. VCS also generates instruction traces, which contain the sequence of instructions executed along with the register or memory locations modified by each instruction and their updated values. Thus, TheHuzz leverages existing hardware simulation tools to instrument the HDLs.

Seed generator generates C programs that consist of configuration instructions (CIs) and test instructions (TIs). The CIs configure a baremetal C environment on the processors; we extract these CIs from the baremetal libraries of the corresponding ISAs, e.g., the RISC-V tests repository [62]. The TIs are the actual instructions used to fuzz the processor from the initial state. Each seed input has 20 TIs; this number is based on empirical observations of how many TIs run before a random TI leads to a deadlock. Events like exceptions, or instructions like branch, jump, system calls, and atomic instructions, can cause the control flow of the processor to jump to a different location or even freeze for a large number of clock cycles, waiting for resources (in the case of atomic instructions). The first half of the TIs are generated uniformly from the instructions that are less likely to trigger such events (e.g., arithmetic and logical instructions). This maximizes the number of TIs executed by TheHuzz in each simulation. The other half of the TIs are generated uniformly from all the instructions returned by the optimizer. Thus, the processor is reset after the execution of every 20 instructions and is simulated with a new input. This periodically re-initializes the processor control flow back to the location of the TIs. The GCC toolchain compiles these C programs to generate the executable files, which are loaded onto the processor RAM and used as seeds.

Stimulus generator consists of the mutation and the feedback engines. The mutation engine mutates the TIs using the AFL-like mutations listed in Appendix A. The feedback engine uses coverage logs for each mutated TI from the RTL simulation. It retains the best performing instruction-mutation pairs and discards the ones that do not improve the coverage.

Golden Reference Models (GRMs). We used the spike ISA emulator [62] as the GRM for Ariane and Rocket Core, and or1ksim [57] as the GRM for the mor1kx and or1200 processors.

6 Evaluation

We now describe the four open-source processors (Ariane, mor1kx, or1200, and Rocket Core) used to evaluate our fuzzer TheHuzz and present the evaluation results, along with the bugs detected (cf. Table 1) and the coverage. We compare TheHuzz with another fuzzer, DifuzzRTL [30], and two traditional hardware verification techniques: random regression testing and formal verification. The experiments are conducted on a 32-core Intel Xeon processor running at 2.6 GHz with 512 GB of RAM and CentOS Linux release 7.3.1611.

6.1 Evaluation Setup

With rich hardware-software interactions and complex hardware components, processor designs provide a challenging target for evaluating the potential of hardware fuzzers. While testing commercial processors is appealing, their closed-source nature makes register-transfer level (RTL) analysis impossible. This is a challenge hardware researchers face, and hence, most papers which evaluate their tool's effectiveness on processors use open-source designs. We have selected four processors from two widely used open-source ISAs, OpenRISC [57] and RISC-V [63]. All these processors can run a modern Linux-based operating system.

Ariane (a.k.a. cva6 core) is a RISC-V based, 64-bit, 6-stage, in-order processor, and supports a Unix-like operating system [91]. mor1kx is a 32-bit OpenRISC based processor. From the three possible configurations, we selected the 6-stage Cappuccino configuration, as it is the most complex design. Developers and the open-source community have evaluated this design for more than seven years. or1200 is a 32-bit OpenRISC based processor. It is one of the first open-source processors and has been used for more than two decades [57]. Rocket Core is a RISC-V based, 64-bit, 5-stage, in-order scalar processor, and supports a Unix-like operating system [5]. RISC-V open-source processors are widely used in prior work in hardware verification and security, as shown in Table 1, and have proven to be effective replacements for commercial designs.

6.2 Bugs Detected

We now detail the vulnerabilities detected by TheHuzz. We found eight new bugs. We map each bug to the relevant hardware common weakness enumerations (CWEs), as listed in

                                                                   Table 1: Bugs detected by TheHuzz.

Processor | ISA | Design year | Pipeline | Prior work using the design | Design size (LOC) | Design size (coverage points)
Ariane [91] | RISC-V [63] | 2018 | 64-bit, 6-stage | [86, 69], [18, 14], [58] | 2.07 × 10^4 | 3.42 × 10^5
mor1kx [57] | OpenRISC [57] | 2013 | 32-bit, 6-stage | [15, 93], [38, 30] | 2.21 × 10^4 | 4 × 10^4
or1200 [57] | OpenRISC [57] | 2000 | 32-bit, 5-stage | [94, 24], [27, 25], [35, 88, 8] | 3.16 × 10^4 | 3.90 × 10^4
Rocket Core [5] | RISC-V [63] | 2016 | 32-bit, 5-stage | [30] | 1.06 × 10^4 | 6.65 × 10^5

Bug | Processor | Bug description | Location | Coverage types | CWE | New bug? | # instructions to detect the bug
B1 | Ariane | Incorrect implementation of logic to detect the FENCE.I instruction | Decoder | Branch | CWE-440 | ✓ | 1.36 × 10^4
B2 | Ariane | Incorrect propagation of exception type in instruction queue | Frontend | Toggle | CWE-1202 | ✗ | 4.02 × 10^4
B3 | Ariane | Some illegal instructions can be executed | Decode | Condition | CWE-1242 | ✓ | 1.81 × 10^6
B4 | Ariane | Failure to detect cache coherency violation | Cache controller | FSM | CWE-1202 | ✓ | 1.72 × 10^5
B5 | mor1kx | Incorrect implementation of the logic to generate the carry flag | ALU | Expression | CWE-1201 | ✓ | 20
B6 | mor1kx | Read/write access checking not implemented for privileged register | Register file | Condition | CWE-1262 | ✓ | 4.46 × 10^5
B7 | mor1kx | Incomplete implementation of EEAR register write logic | Register file | Condition | CWE-1199 | ✓ | 1.12 × 10^5
B8 | or1200 | Incorrect forwarding logic for the GPR0 | Register forwarding | Condition and expression | CWE-1281 | ✗ | 174
B9 | or1200 | Incomplete update logic of overflow bit for MSB & MAC instructions | ALU | Toggle | CWE-1201 | ✓ | 3.35 × 10^3
B10 | or1200 | Incorrect implementation of the logic to generate the overflow flag | ALU | Expression | CWE-1201 | ✓ | 2.21 × 10^4
B11 | Rocket Core | Instruction retired count not increased when EBREAK | Register file | Condition | CWE-1201 | ✗ | 776

Table 1. We present bugs B1, B4, and B6 in detail as we exploit them in Section 6.3 and briefly describe the other bugs; the arXiv version [80] details the other bugs.

6.2.1 Bugs in Ariane Processor

Bug B1 is located in the decode stage of Ariane. According to the RISC-V specification [63], the decoder should ignore certain fields in a FENCE.I instruction, which enforces cache coherence in the processor (e.g., by flushing the instruction cache and instruction pipeline). It also ensures that the correct instruction memory is used for execution when performing memory-sensitive operations (e.g., updating the instruction memory). The bug is that the decoder does not ignore the imm and rs1 fields and expects a value of 0 in these fields, as seen in Lines 12 and 18 of Listing 2. This Ariane implementation declares valid instructions as illegal (Lines 13 and 19) due to this additional constraint on the imm and rs1 fields, thus violating the specification. We detected this bug when the fuzzer generated a FENCE.I instruction with a non-zero value in the imm field. Ariane raised an exception saying that the instruction is illegal, whereas spike successfully executed the instruction, resulting in a mismatch§. Due to this bug, a failing-FENCE.I will not be executed, resulting in a potential violation of cache coherence. This bug is similar to the expected behavior violation vulnerability, CWE-440 [48].

§We refer to these FENCE.I instructions that Ariane fails to detect as failing-FENCE.I and the rest as the working-FENCE.I.

Listing 2: Verilog code snippet for B1 in Ariane.

     1  // Memory ordering instructions
     2  riscv::OpcodeMiscMem: begin
     3    instruction_o.fu = CSR;
     4    instruction_o.rs1 = '0;
     5    instruction_o.rs2 = '0;
     6    instruction_o.rd = '0;
     7    case (instr.stype.funct3)
     8      // FENCE: Currently implemented as a whole DCache flush boldly ignoring other things
     9      3'b000: instruction_o.op = ariane_pkg::FENCE;
    10      // FENCE.I
    11      3'b001: begin
    12        if (instr.instr[31:20] != '0)
    13          illegal_instr = 1'b1;
    14        instruction_o.op = ariane_pkg::FENCE_I;
    15      end
    16      default: illegal_instr = 1'b1;
    17    endcase
    18    if (instr.stype.rs1 != '0 || instr.stype.imm0 != '0 || instr.instr[31:28] != '0)
    19      illegal_instr = 1'b1;
    20  end

Bug B2 is in the instruction queue of the frontend stage of Ariane. The bug is that a fixed exception is forwarded instead of the actual exception. We detected this bug as a mismatch in the value of a register that loads the exception type when an exception occurs. Operating systems that assume that instruction access-faults are raised correctly will not behave as expected, and triggering this bug may lead to undefined (and possibly exploitable) behavior. Also, an incorrect exception handler might be executed, resulting in a memory and storage vulnerability, CWE-1202 [48].

Bug B3 is that the decode stage does not correctly check for certain illegal instructions. It was detected as a mismatch when the fuzzer generated one such illegal instruction. Due to this, any undocumented instruction of a certain value can be executed on Ariane, resulting in an undocumented feature vulnerability, CWE-1242 [48].

Bug B4. As per the RISC-V specification [63], when the instruction memory is modified, the software should handle cache coherency using the FENCE.I instruction. Failure to handle cache coherency results in undefined behavior, wherein processors may use stale data and execute instructions incorrectly [71]. When the fuzzer generated an input program that modified the instruction memory but did not use a FENCE.I instruction, TheHuzz detected a mismatch in the trace logs of Ariane and spike. This mismatch could have been avoided

if the RISC-V specification or the Ariane processor detected violations of cache coherency in hardware. Due to this bug, software running on Ariane could run into cache coherency issues that remain undetected if the FENCE.I instruction is used incorrectly, resulting in a memory and storage vulnerability, CWE-1202 [48]. In Section 6.3.1 we use this bug and bug B1 to successfully exploit a theoretically safe program.

6.2.2 Bugs in mor1kx Processor

Bug B5 is the inaccurate implementation of the carry flag logic for subtract operations. The fuzzer generated inputs that triggered this bug by mutating the data bits of subtract instructions. This caused a mismatch in the value of the carry flag between the RTL and the golden reference model (GRM). This bug can cause incorrect computations, including those used in cryptographic functions, resulting in corruption and compromise of the processor security (CWE-1201 [48]).

Bug B6. The register file stores, updates, and shares the value of all the architectural registers. These registers include the general- and special-purpose registers (GPRs and SPRs, respectively). Read and write operations to the SPRs are restricted based on the privilege mode of the processor, as per the OpenRISC specification [57]. The Exception Program Counter Register (EPCR) is an SPR that stores the address to which the processor should return after handling an exception. A user-level program should not be able to access this register. The bug in mor1kx is that the register file does not check for privilege mode access permissions when performing read and write operations on EPCR. This bug was detected when our fuzzer generated an instruction that tried to write into EPCR from user privilege mode. Due to this bug, an attacker can write into EPCR from user privilege mode and control the return address of the processor after handling an exception (CWE-1262 [48]). This bug can have severe security consequences like privilege escalation, as demonstrated in our mor1kx exploit in Section 6.3.2.

Bug B7. The register file in mor1kx does not allow one to write into the Exception Effective Address Register (EEAR), even in supervisor privilege mode. This bug was detected when our fuzzer generated an instruction that tried to write into EEAR from the supervisor privilege mode. This bug prevents programs from updating EEAR, resulting in incorrect executions. Thus, it prevents software from correctly performing exception handling. This bug is similar to CWE-1199 [48].

6.2.3 Bugs in or1200 Processor

Bug B8 is that the register forwarding logic forwards a non-zero value for GPR0 if a previous instruction in the pipeline writes to GPR0. We found this bug as a mismatch when the fuzzer applied an ADD instruction to create a data hazard for GPR0. This bug can result in incorrect computations since GPR0 is frequently used by software to check for conditions. An attacker can cause data hazards to obfuscate the behavior of malware, e.g., by jumping to an offset computed by an instruction that uses GPR0. This bug is similar to CWE-1281 [48], where a sequence of processor instructions results in unexpected behavior.

Bug B9 is that the overflow flag is not correctly calculated for the multiply and subtract (MSB) and the multiply and accumulate (MAC) instructions. This bug causes software programs to fail to detect overflow events. Thus, this bug is a core and compute issue vulnerability, CWE-1201 [48], resulting in more software vulnerabilities.

Bug B10 is the incorrect overflow logic for the subtract instruction. The bug was detected when the fuzzer was mutating the data bits of the subtract instruction. This bug also compromises the security mechanisms relying on the overflow flag and is a core and compute issue vulnerability, CWE-1201 [48].

6.2.4 Bugs in Rocket Core Processor

Bug B11 is that the instruction retired count does not increase on an EBREAK instruction. It was detected when the fuzzer executed the EBREAK instruction. TheHuzz detected B11, the only bug reported by DifuzzRTL, using only 776 instructions, which is 6.7× faster than DifuzzRTL.

All the bugs except for B2, B8, and B11 are new bugs detected by TheHuzz. B2 is fixed in the latest version of Ariane. B8 was first reported in [93].

6.3 Case Study: Exploitability

We now present the two exploits we crafted to demonstrate the security implications of the bugs found by TheHuzz. Both attacks can be mounted from unprivileged software.

6.3.1 Ariane FENCE.I Exploit

The Ariane exploit leverages B1 and B4 to cause incoherence in the instruction cache. As a result, in the contrived "safe" just-in-time (JIT) compiler we developed to demonstrate this bug, an attacker can generate inputs that selectively invalidate cache lines containing old instructions. This program uses an extension of the FENCE.I instruction (from the failing-FENCE.I instructions), which should fall back to standard fence behavior and flush the entire instruction cache, as the extension is not understood by spike or Ariane. For our threat model, we assume that the attacker is aware of the use of an extended FENCE.I instruction present in a target and is capable of loading and executing "safe" programs in the target's JIT compiler. An attacker first loads a region of executable code (which does not contain a vulnerability) into the cache by executing it. The attacker then overwrites the same region of executable code with new instructions (which also do not contain a vulnerability), then executes separate code which jumps to instructions which align to cache lines the attacker

wishes to invalidate. Afterwards, they execute the original region of executable code, at which point the behavior of spike and Ariane diverges. In spike, the new instructions will be present and will execute as expected with no vulnerabilities present. This is because spike successfully identified the FENCE.I instruction, but did not recognise its extension, and fell back to flushing the entire cache. In Ariane, the old instructions will be present; Ariane fails to recognise the FENCE.I instruction as it instead marks it as an illegal instruction, an implementation which is non-compliant with the RISC-V ISA. Because the cache lines were only invalidated in regions selected by the attacker, the attacker is able to successfully replace bounds checks in the original program with what are effectively nops, leading to a vulnerability which was present in neither the old nor the new JIT code. As a result, the attacker is able to inject a stack overflow vulnerability and gain arbitrary code execution. A more detailed description of the vulnerability, exploit, ramifications, and threat model is presented in the arXiv version [80].

6.3.2 mor1kx EPCR Register Exploit

The mor1kx exploit leverages B6 to set the EPCR to point to an attacker-controlled exploit function. An exception return instruction is executed to mimic the return from an exception event, causing the processor to update the program counter (PC) and status register (SR) values with the EPCR and exception status register (ESR) values, respectively. The SR stores the privilege level. By performing the exploit when the ESR stores a higher privilege level, execution jumps to the exploit function while overwriting the privilege level stored in SR. For our threat model, we assume that the attacker already has "foothold" access to a target machine and has the ability to execute arbitrary instructions as a low-privilege user. In this scenario, an attacker can perform privilege escalation in the mor1kx processor. The arXiv version [80] explains this exploit in detail.

6.4 Coverage Analysis

Figure 5 shows the coverage achieved by random regression testing, DifuzzRTL, and TheHuzz for the Rocket Core processor. Each experiment is repeated 10 times. Even after 1M instructions, both random regression testing and DifuzzRTL improved their coverage by no more than 2.5% over what they had collected after 300K instructions; TheHuzz's coverage, on the other hand, kept increasing. TheHuzz is slower in the beginning than random regression testing as the fuzzer uses a set of instructions until it cannot reach new coverage points; in that case, it discards them and selects a new set of instructions. TheHuzz reached the 404.1K coverage points achieved by DifuzzRTL 3.33× faster than DifuzzRTL. TheHuzz and random regression testing outperformed DifuzzRTL because DifuzzRTL is guided by the control-register coverage, which does not capture many hardware behaviors (cf. Appendix B). The Mann-Whitney U test [53] shows that the result is statistically significant (p < 0.05), with a p-value of 1.4e-4 for both random regression testing and DifuzzRTL. The Vargha-Delaney A12 measure returned TheHuzz as the best performing technique when compared with random regression testing and DifuzzRTL.

Figure 5: Coverage analysis of random regression testing, DifuzzRTL [30], and TheHuzz for the Rocket Core processor. [Plot: coverage points versus # instructions (×K); curves for random regression, DifuzzRTL, and TheHuzz.]

The instrumentation overhead of DifuzzRTL is 18% in terms of lines of Verilog code. TheHuzz does not instrument Verilog code explicitly and instead relies on the commercial tools, which do not produce the overhead information. Hence, the instrumentation overheads of these two fuzzers are not comparable. The runtime overhead for TheHuzz (71%) is greater than that of DifuzzRTL (6.9%) since TheHuzz requires accessing multiple files to collect all the coverage, whereas DifuzzRTL only needs to collect control-register coverage.

6.5 Comparison with Formal Verification

We also compared our fuzzer with another standard approach used by the semiconductor industry: formal verification. For this purpose, we used the industry-leading formal verification tool, Cadence JasperGold [10]. However, there are two challenges in performing this comparison. First, there is no industry-standard formal tool that can produce a set of instructions that can trigger a hardware bug in RTL, even if the bug is known a priori. Second, these industry tools require one to write assertions targeting each vulnerability manually. Thus, the usage of formal tools in this scenario requires one to know of these vulnerabilities a priori, unlike TheHuzz, which does not make any such assumptions.

To manually write these assertions, one has to know the entire design and identify the signals and specific conditions that trigger the security vulnerability. This step is highly cumbersome given the vast number of modules, signals, and states in processors, as shown in Table 2. Many bugs are cross-modular, and hence, they require one to load multiple modules, which makes writing assertions even more difficult as one now needs to consider signals across modules and their interactions. These tools only produce Boolean assignments to the inputs of these modules and not a set of instructions that violate these assertions. As shown in Table 2, the number of inputs ranges in the few

 Table 2: Hardware complexity encountered while using industry-standard JasperGold [10] to detect the bugs.

Processor | Ariane | Ariane | Ariane | Ariane | mor1kx | mor1kx | mor1kx | or1200 | or1200 | or1200 | Rocket Core
Bug | B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | B9 | B10 | B11
No. of modules | 1 | 9 | 1 | 655 | 1 | 4 | 4 | 6 | 1 | 1 | 1
No. of inputs | 518 | 627 | 518 | 3 | 298 | 752 | 752 | 703 | 123 | 123 | 284
No. of states | 2.51e+58 | 2.16e+68 | 2.16e+68 | 2.01e+59 | 4.72e+10 | 1.55e+11 | 1.55e+11 | 3.83e+11 | 1.29e+10 | 1.29e+10 | 2.23e+20

hundreds, thereby increasing the number of states that need to be checked, leading to state-space explosion. Some bugs like B4 require one to load the entire system-on-chip into the formal tool, which is not always feasible due to state-space explosion. Thus, in contrast to TheHuzz, existing formal tools are resource-intensive, error-prone, and not scalable to complex bugs and larger designs, apart from relying heavily on human expertise and prior knowledge of hardware vulnerabilities.

7 Related Work

We now describe the limitations of the existing attempts to fuzz hardware and how TheHuzz is different from them, as summarized in Table 3.

RFUZZ is a mux-coverage-guided fuzzer for hardware designs [39]. Although this technique can fuzz designs on FPGAs, it is computationally intensive and does not scale to large designs [30]. Additionally, its coverage metric does not capture many hardware behaviors (cf. Appendix B). It is also ineffective in finding any bugs.

HyperFuzzing proposes a new grammar to represent the security specification rules for hardware, converts the hardware design into equivalent software models, and fuzzes them using the AFL fuzzer [51]. It is inapplicable to general hardware designs like finite state machines (FSMs) or combinational logic and requires a lot of human intervention, including writing security specifications manually. It did not report any bugs.

Fuzzing hardware like software translates the hardware design to software models and fuzzes them using a software fuzzer [79]. While this is a promising approach, it is limited by the strength of existing open-source tools (i.e., Verilator [70]): they currently do not support many constructs of HDLs such as latches, floating wires, etc. It did not report any bugs. The largest benchmark used by this technique has 4,585 lines of code (LOC). It also does not scale to real-world designs like processors. For instance, while fuzzing Google's OpenTitan SoC [21], this work could only fuzz the peripheral modules but not the iBex processor in it.

DifuzzRTL, a recent work, uses a custom-developed control-register coverage as feedback for the fuzzer by instrumenting the HDL [30]. The technique only focuses on the coverage of registers generating the select signals of MUXes and does not check for toggle, expression, and FSM coverage points, thereby missing the bugs in ➂, ➃, and floating wires in ➄ in Figure 2 (cf. Appendix B for more details). None of the bugs found by this fuzzer are shown to be exploitable, as most bugs are triggered by physically controlling the interrupt signals with precise timing; such interrupt signals are not usually exposed to unprivileged software [56]. The fuzzer is also slower in detecting the bugs as it compares the processor state after the entire program is executed, while our fuzzer performs the comparison after each instruction is executed.

In contrast, TheHuzz: (i) is compatible with the traditional IC design verification flow, allowing for seamless integration by using coverage metrics already widely used in the semiconductor industry; (ii) is scalable to large, complicated, industrial designs with several tens of thousands of lines of code, and not just small FSM designs; (iii) captures many intrinsic hardware behaviors, such as signal transitions and floating wires, using multiple coverage metrics: statement, toggle, branch, expression, condition, and FSM; (iv) does not require the designer to specify security rules; and (v) detects several bugs that lead to severe security exploits. Rather than relying on designer-specified rules, we compare how the software views the hardware (i.e., ISA emulator) and how the hardware actually behaves (i.e., Verilog), leading to an effective hardware fuzzer.

8 Discussion and Limitations

Requirement of Golden Reference Models (GRMs). TheHuzz and other hardware fuzzers [30, 51] depend on GRMs to find vulnerabilities. Such GRMs are widely available in the semiconductor industry. Verification of many commercial (proprietary and open-source) CPUs critically depends on the availability of GRMs, including many industrial, large-scale designs, e.g., Intel x86 Archsim [33], AMD x86 Simnow [1], ARM Cortex Neoverse [3], and ARM Fast Models [4]. Thus, the reliance on a GRM is not a limiting factor for TheHuzz. Sometimes, the GRM itself can be buggy, thereby causing false positives. This situation is highly unlikely because GRMs are carefully curated and versioned with legacy code, and rigorously tested. Verifying a GRM is easier as it is written at a higher abstraction level and is thus less complex than an RTL model.

Requirement of Register-Transfer Level (RTL) source code. TheHuzz depends on RTL access, similar to previous works such as DifuzzRTL [30], RFUZZ [39], and HyperFuzzing [51]. As mentioned in Section 2.2, verification teams already have access to RTL. An attacker can also buy RTL models of the target design, as many companies like Imagi-

                                                                                                                     Table 3: Comparison with the prior work on hardware fuzzers.

                                                           Target       Design        Largest                                              Comparison                      Exploitable    Exploits
Methodology      Fuzzer used  HDL      Simulator           design     knowledge       design                 Metrics used              against random                Bugs      from
                                                                                     (Lines of code)                                   regression testing         reported software       presented
RFUZZ            H/W fuzzer   FIRRTL   Any                 RTL                       5-stage Sodor                                    ∼5% increase in
[39]                                                       designs   Not required    core (4,088)        mux-coverage                 mux coverage                0        N/A            0
Hyperfuzzing     S/W AFL      Any      Verilator           SoC     Need              SHA crypto          None                         N/A                         0        N/A            0
[51]             fuzzer                                    designs  security rules   engine (1,196)
Trippel et al.   S/W AFL                                   RTL                                           FSM , line, edge, toggle,    Two orders magnitude
[79]             fuzzer       Any      Verilator           designs   Not required    KMAC (4,585)        and functional coverage      faster for datapath FSMs    0        N/A            0
DifuzzRTL        H/W fuzzer   Any      Any                 CPU                       Boom                Control-register             ∼10% increase in
[30]                                                       designs   Not required    (12,956 in Scala)   coverage                     control-register coverage   16       Not reported   0
                                       Commercial,         CPU       Not required    Ariane (20,698)     statement, toggle, branch,   2.86% increase                       Yes            2*
TheHuzz          H/W fuzzer   Any      industry-standard   designs                                       expression, condition,       in coverage metrics         10
                                       HDL simulator                                                     and FSM coverage
                                                                                      *In theory, the bugs discovered can be used to build more than two exploits, but we show only two due to page limitations.

nation Tech. Limited [32], Cadence [9], and Synopsys [73] We presented an instruction fuzzer, TheHuzz, for processor- sell proprietary hardware designs, and run TheHuzz on them based hardware designs. The effectiveness of TheHuzz is as these designs are compatible with industry-standard tools. shown by fuzzing three popular open-sourced processor de- While companies like Intel and ARM do not reveal the RTL signs. TheHuzz has detected eight new bugs in the three de- model of their processors, attackers can use reverse engineer- signs tested and three previously detected bugs. These bugs, ing services from companies like TechInsights [77] on the when used individually or in tandem, resulted in ROP and target chip and use gate-level to RTL reverse engineering privilege escalation exploits that could compromise both hard- techniques [72] to obtain the RTL model. ware and software, as shown in the two exploits we presented. FPGA emulations. DifuzzRTL and RFUZZ can fuzz pro- Our fuzzer achieved 1.98× and 3.33× the speed compared cessors faster through FPGA emulation than RTL simula- to the industry-standard random regression approach and the tions [30, 39]. TheHuzz uses the coverage metrics imple- state-of-the-art hardware fuzzer, DifuzzRTL, respectively. Fi- mented by EDA simulation tools like Modelsim [68] and nally, compared to the industry-standard formal verification Synopsys VCS [74]. These coverage metrics are not readily tool, JasperGold, TheHuzz does not need human intervention available for FPGA emulations, thereby limiting TheHuzz’s and overcomes its other limitations. applicability to fuzz FPGA-emulated designs. Responsible disclosure. The bugs have been responsibly Fuzzing non-processor designs. Currently, TheHuzz, similar disclosed through the legal department of our institution(s). to DifuzzRTL [30], is limited to fuzzing processor designs since it generates processor specific inputs. 
These fuzzers cannot fuzz standalone hardware components like SoC pe- Acknowledgement ripherals, memory modules, and other hardware accelerators, which are targeted by RFUZZ and Tripple et al. [79]. The- Huzz could be extended to fuzz non-processor designs by Our research work was partially funded by the US Office of fuzzing the individual input signals of the design. The seeds Naval Research (ONR Award #N00014-18-1-2058), by Intel’s would be assignments to individual input signal values rather Scalable Assurance Program, by the Deutsche Forschungs- than instructions. The coverage metrics and the bug detection gemeinschaft (DFG, German Research Foundation)—SFB mechanism used by TheHuzz will still be applicable. 1119—236615297 within project S2, and by the German Fed- Fuzzing parametric properties of hardware. TheHuzz cur- eral Ministry of Education and Research and the Hessian rently fuzzes only processors for functional behavior but not State Ministry for Higher Education, Research and the Arts for parametric behavior (e.g., cache timing behavior) and within ATHENE. We thank Kevin Laeufer (UC Berkeley), thereby cannot detect side-channel vulnerabilities. One can Jaewon Hur (Seoul National University), and TAMU HRPC extend TheHuzz to cover such vulnerabilities by developing for their support. And, we thank anonymous reviewers for timing-related coverage properties and targeting them. their comments. Any opinions, findings, conclusions, or rec- ommendations expressed herein are those of the authors, and do not necessarily reflect those of the US Government. 9 Conclusion Bugs in hardware are increasingly exposed and exploited. References Current techniques fall short of detecting bugs, as our results demonstrated by finding bugs in a 20-year old processor and [1] AMD. AMD SimNow. https://developer.amd. others. This calls for a revamp of security evaluation method- com/simnow-simulator, 2021. Last accessed on ologies for hardware designs. 10/09/2021.

[2] A. Ardeshiricham, W. Hu, et al. Clepsydra: Modeling Timing Flows in Hardware Designs. IEEE/ACM ICCAD, pages 147–154, 2017.
[3] ARM. ARM Cortex Neoverse. https://www.arm.com/products/silicon-ip-cpu/neoverse/neoverse-n1, 2021. Last accessed on 10/09/2021.
[4] ARM. ARM Fast Models. https://developer.arm.com/tools-and-software/simulation-models/fast-models, 2021. Last accessed on 10/09/2021.
[5] K. Asanovic, R. Avizienis, et al. The Rocket Chip Generator. EECS Department, UCB, Tech. Rep., 2016.
[6] Averant. Averant Solidify. http://www.averant.com/storage/documents/Solidify.pdf, 2015. Last accessed on 04/08/2021.
[7] W. Badawy and G. A. Julien. System-on-chip for Real-time Applications, volume 711. Springer Science & Business Media, 2012.
[8] J. Bai, L. Wu, et al. A 10Gbps In-line Network Security Processor With a 32-bit Embedded CPU. IEEE WOCC, pages 616–619, 2013.
[9] Cadence. Cadence Design IP Portfolio. https://ip.cadence.com/ipportfolio/ip-portfolio-overview, 2021. Last accessed on 10/09/2021.
[10] Cadence. Cadence Webpage. https://www.cadence.com/en_US/home.html, 2021. Last accessed on 04/08/2021.
[11] G. Chen, S. Chen, et al. SgxPectre: Stealing Intel Secrets from SGX Enclaves via Speculative Execution. IEEE S&P, pages 142–157, 2019.
[12] W. Chen, S. Ray, et al. Challenges and Trends in Modern SoC Design Verification. IEEE D&T, 34(5):7–22, 2017.
[13] E. M. Clarke, W. Klieber, et al. Model Checking and the State Explosion Problem. LASER Summer School on Software Engineering, pages 1–30, 2011.
[14] G. Dessouky, D. Gens, et al. Hardfails: Insights into Software-Exploitable Hardware Bugs. USENIX Security Symposium, pages 213–230, 2019.
[15] C. Deutschbein and C. Sturton. Mining Security Critical Linear Temporal Logic Specifications for Processors. IEEE MTV, pages 18–23, 2018.
[16] L. Dukes, X. Yuan, et al. A Case Study on Web Application Security Testing with Tools and Manual Testing. IEEE Southeastcon, pages 1–6, 2013.
[17] F. Farahmandi, Y. Huang, et al. System-on-Chip Security: Validation and Verification. Springer Nature, 2019.
[18] M. Fischer, F. Langer, et al. Hardware Penetration Testing Knocks Your SoCs Off. IEEE D&T, 2020.
[19] S. Gan, C. Zhang, et al. CollAFL: Path Sensitive Fuzzing. IEEE S&P, pages 679–696, 2018.
[20] Google. ClusterFuzz. https://google.github.io/clusterfuzz/, 2021. Last accessed on 04/08/2021.
[21] Google. Opentitan SoC. https://opentitan.org/, 2021. Last accessed on 04/08/2021.
[22] Google. Syzkaller. https://github.com/google/syzkaller, 2021. Last accessed on 04/08/2021.
[23] S. Groß. FuzzIL: Coverage Guided Fuzzing for JavaScript Engines. https://saelo.github.io/papers/thesis.pdf. Last accessed on 04/08/2021.
[24] S. Gurumurthy, S. Vasudevan, et al. Automatic Generation of Instruction Sequences Targeting Hard-to-detect Structural Faults in a Processor. IEEE ITC, pages 1–9, 2006.
[25] S. Gurumurthy, R. Vemu, et al. Automatic Generation of Instructions to Robustly Test Delay Defects in Processors. IEEE ETS, pages 173–178, 2007.
[26] S. L. He, N. H. Roe, et al. Model of the Product Development Lifecycle. Sandia Report (2015), pages 1–49, 2015.
[27] M. Hicks, C. Sturton, et al. Specs: A Lightweight Runtime Mechanism for Protecting Software From Security-critical Processor Bugs. ACM ASPLOS, pages 517–529, 2015.
[28] W. E. Howden. Theoretical and Empirical Studies of Program Testing. IEEE TSE, SE-4(4):293–298, 1978.
[29] C.-C. Hsu, C.-Y. Wu, et al. Instrim: Lightweight Instrumentation for Coverage-guided Fuzzing. NDSS, Workshop on Binary Analysis Research, 2018.
[30] J. Hur, S. Song, et al. DifuzzRTL: Differential Fuzz Testing to Find CPU Bugs. IEEE S&P, pages 1286–1303, 2021.
[31] IBM. CPLEX. https://pypi.org/project/cplex/, 2021. Last accessed on 04/08/2021.
[32] Imagination. Imagination Technologies. https://www.imaginationtech.com/products, 2021. Last accessed on 10/08/2021.
[33] Intel. Intel Archsim. https://course.ece.cmu.edu/ece742/2011spring/lib/exe/fetch.php?media=marr_hyperthread02.pdf, 2021. Last accessed on 10/09/2021.
[34] Z. Kenjar, T. Frassetto, et al. V0LTpwn: Attacking x86 Processor Integrity from Software. USENIX Security Symposium, pages 1445–1461, 2020.
[35] A. R. Khatri. Implementation, Verification and Validation of an OpenRISC-1200 Soft-core Processor on FPGA. IJACSA, 10(1):480–487, 2019.
[36] P. Kocher, J. Horn, et al. Spectre Attacks: Exploiting Speculative Execution. IEEE S&P, pages 1–19, 2019.
[37] D. Koncaliev. Pentium FDIV bug. https://www.cs.earlham.edu/dusko/cs63/fdiv.html, 2001. Last accessed on 04/08/2021.
[38] G. Krishnakumar and C. Rebeiro. MSMPX: Microarchitectural Extensions for Meltdown Safe Memory Protection. IEEE SOCC, pages 432–437, 2019.
[39] K. Laeufer, J. Koenig, et al. RFUZZ: Coverage-directed Fuzz Testing of RTL on FPGAs. IEEE/ACM ICCAD, pages 1–8, 2018.
[40] lcamtuf. American Fuzzy Lop (AFL) Fuzzer. http://lcamtuf.coredump.cx/afl/technical_details.txt. Last accessed on 04/08/2021.
[41] J. Li, B. Zhao, et al. Fuzzing: a survey. Cybersecurity, 1(1):1–13, 2018.
[42] X. Li, V. Kashyap, et al. Sapper: A Language for Hardware-level Security Policy Enforcement. ACM ASPLOS, pages 97–112, 2014.
[43] X. Li, M. Tiwari, et al. Caisson: A Hardware Description Language for Secure Information Flow. ACM PLDI, 46(6):109–120, 2011.
[44] Y. Li, B. Chen, et al. Steelix: Program-State Based Binary Fuzzing. ESEC/FSE, pages 627–637, 2017.
[45] M. Lipp, M. Schwarz, et al. Meltdown: Reading Kernel Memory from User Space. USENIX Security Symposium, pages 973–990, 2018.
[46] V. J. M. Manès, H. Han, et al. The Art, Science, and Engineering of Fuzzing: A Survey. IEEE TSE, pages 1–1, 2019.
[47] B. Marshall. Hardware Verification in an Open Source Context. ODSA, 2019.
[48] MITRE. Hardware Design CWEs. https://cwe.mitre.org/data/definitions/1194.html, 2019. Last accessed on 04/08/2021.
[49] MITRE. CVE Database. https://cveform.mitre.org/, 2021. Last accessed on 04/08/2021.
[50] A. Molina and O. Cadenas. Functional Verification: Approaches and Challenges. Latin American applied research, 37(1):65–69, 2007.
[51] S. K. Muduli, G. Takhar, et al. Hyperfuzzing for SoC Security Validation. IEEE/ACM ICCAD, pages 1–9, 2020.
[52] O. Mutlu. The RowHammer Problem and Other Issues We May Face as Memory Becomes Denser. IEEE DATE, pages 1116–1121, 2017.
[53] W. C. Navidi. Statistics for engineers and scientists. McGraw-Hill Higher Education New York, NY, USA, 2008. Last accessed on 04/08/2021.
[54] N. Nethercote and J. Seward. Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation. ACM SIGPLAN Notices, 42(6):89–100, June 2007.
[55] Onespin. Onespin Website. https://www.onespin.com/, 2021. Last accessed on 04/08/2021.
[56] OpenHW Group. Ariane Source Code. https://github.com/lowRISC/ariane, 2020. Last accessed on 04/08/2021.
[57] OpenRISC. OpenRISC Homepage. https://openrisc.io/, 2020. Last accessed on 04/08/2021.
[58] Princeton. OpenPiton. https://parallel.princeton.edu/openpiton/index.html, 2018. Last accessed on 04/08/2021.
[59] R. Qiao and M. Seaborn. A New Approach for Rowhammer Attacks. IEEE HOST, pages 161–166, 2016.
[60] P. Qiu, D. Wang, et al. VoltJockey: Breaching TrustZone by Software-Controlled Voltage Manipulation over Multi-Core Frequencies. ACM CCS, pages 195–209, 2019.
[61] J. Rajendran, V. Vedula, et al. Detecting Malicious Modifications of Data in Third-Party Intellectual Property cores. IEEE/ACM DAC, pages 1–6, 2015.
[62] RISC-V. RISC-V Github Repositories. https://github.com/riscv, 2021. Last accessed on 04/08/2021.
[63] RISC-V. RISC-V Webpage. https://riscv.org/, 2021. Last accessed on 04/08/2021.
[64] S. R. Sarangi, A. Tiwari, et al. Phoenix: Detecting and Recovering from Permanent Processor Design Bugs with Programmable Hardware. IEEE/ACM MICRO, pages 26–37, 2006.
[65] S. Schumilo, C. Aschermann, et al. kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels. USENIX Security Symposium, pages 167–182, August 2017.
[66] K. Serebryany, D. Bruening, et al. AddressSanitizer: A Fast Address Sanity Checker. USENIX ATC, page 28, 2012.
[67] K. Serebryany. OSS-Fuzz - Google's Continuous Fuzzing Service for Open Source Software. USENIX Association, August 2017.
[68] Siemens. Modelsim. https://eda.sw.siemens.com/en-US/ic/modelsim/, 2021. Last accessed on 04/08/2021.
[69] D. Šišejković, F. Merchant, et al. A Secure Hardware-software Solution Based on RISC-V, Logic Locking and Microkernel. SCOPES, pages 62–65, 2020.
[70] W. Snyder. Verilator. https://www.veripool.org/wiki/verilator, 2021. Last accessed on 04/08/2021.
[71] D. J. Sorin, M. D. Hill, et al. A Primer on Memory Consistency and Cache Coherence. Synthesis lectures on computer architecture, 6(3):1–212, 2011.
[72] P. Subramanyan, N. Tsiskaridze, et al. Reverse Engineering Digital Circuits Using Structural and Functional Analyses. IEEE Transactions on Emerging Topics in Computing, 2(1):63–80, 2013.
[73] Synopsys. Synopsys DesignWare IP. https://www.synopsys.com/designware-ip.html, 2021. Last accessed on 10/08/2021.
[74] Synopsys. Synopsys Webpage. https://www.synopsys.com/, 2021. Last accessed on 04/08/2021.
[75] A. Takanen, J. D. Demott, et al. Fuzzing for Software Security Testing and Quality Assurance. Artech House, 2018.
[76] A. Tang, S. Sethumadhavan, et al. CLKSCREW: Exposing the Perils of Security-Oblivious Energy Management. USENIX Security Symposium, pages 1057–1074, 2017.
[77] Techinsights. TechInsights. https://www.techinsights.com/, 2021. Last accessed on 10/08/2021.
[78] M. Tiwari, J. K. Oberg, et al. Crafting a Usable Microkernel, Processor, and I/O System with Strict and Provable Information Flow Security. ACM/IEEE ISCA, 39(3):189–200, 2011.
[79] T. Trippel, K. G. Shin, et al. Fuzzing Hardware Like Software. arXiv preprint arXiv:2102.02308, 2021.
[80] A. Tyagi, A. Crump, et al. TheHuzz: Instruction Fuzzing of Processors Using Golden-Reference Models for Finding Software-Exploitable Vulnerabilities. arXiv preprint arXiv:2201.09941, 2022.
[81] J. Van Bulck, F. Piessens, et al. Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution. USENIX Security Symposium, pages 991–1008, 2018.
[82] V. Van Der Veen, Y. Fratantonio, et al. Drammer: Deterministic Rowhammer Attacks on Mobile Platforms. ACM CCS, pages 1675–1689, 2016.
[83] S. Vasudevan. An Introduction to IC Verification. Effective Functional Verification: Principles and Processes, pages 3–12, 2006.
[84] I. Wagner and V. Bertacco. Engineering Trust with Semantic Guardians. IEEE DATE, pages 1–6, 2007.
[85] B. Wile, J. Goss, et al. Comprehensive Functional Verification: The Complete Industry Cycle. Morgan Kaufmann, 2005.
[86] N. Wistoff, M. Schneider, et al. Prevention of Microarchitectural Covert Channels on an Open-source 64-bit RISC-V Core. arXiv preprint arXiv:2005.02193, 2020.
[87] W. Xu, S. Park, et al. FREEDOM: Engineering a State-of-the-Art DOM Fuzzer. ACM CCS, pages 971–986, 2020.
[88] S. Xuan, J. Han, et al. A Configurable SoC Design for Information Security. IEEE ASICON, pages 1–4, 2015.
[89] W. Yang, M.-K. Chung, et al. Current Status and Challenges of SoC Verification for Embedded Systems Market. IEEE SOCC, pages 213–216, 2003.
[90] M. Zalewski. Technical Whitepaper for AFL Fuzzer. http://lcamtuf.coredump.cx/afl/technical_details.txt, 2015. Last accessed on 04/08/2021.
[91] F. Zaruba and L. Benini. The Cost of Application-Class Processing: Energy and Performance Analysis of a Linux-Ready 1.7-GHz 64-Bit RISC-V Core in 22-nm FDSOI Technology. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 27(11):2629–2640, Nov 2019.
[92] D. Zhang, Y. Wang, et al. A Hardware Design Language for Timing-Sensitive Information-Flow Security. ACM ASPLOS, pages 503–516, 2015.
[93] R. Zhang, C. Deutschbein, et al. End-to-End Automated Exploit Generation for Validating the Security of Processor Designs. IEEE/ACM MICRO, pages 815–827, 2018.

[94] R. Zhang, N. Stanley, et al. Identifying Security Critical Properties for the Dynamic Verification of a Processor. ACM SIGARCH Computer Architecture News, 45(1):541–554, 2017.

Appendix

A Mutation Techniques

We use 12 distinct mutation techniques inspired by the popular binary manipulation fuzzer, AFL [90], as indicated in Table 4 and also detailed in the arXiv version [80].

Table 4: Mutation techniques used by TheHuzz.

Name | Description
M0 Bitflip 1/1 | Flip single bit
M1 Bitflip 2/1 | Flip two adjacent bits
M2 Bitflip 4/1 | Flip four adjacent bits
M3 Bitflip 8/8 | Flip single byte
M4 Bitflip 16/8 | Flip two adjacent bytes
M5 Arith 8/8 | Treat single byte as 8-bit integer, +/- value from 0 to 35
M6 Arith 16/8 | Treat 2 adjacent bytes as 16-bit integer, +/- value from 0 to 35
M7 Arith 32/8 | Treat 4 adjacent bytes as 32-bit integer, +/- value from 0 to 35
M8 Random 8 | Overwrite random byte with random value
M9 Delete | Delete an instruction
M10 Clone | Clone an instruction
M11 Opcode | Overwrite opcode bits

B Coverage Metrics Of Prior Work

We now demonstrate why the coverage metrics of DifuzzRTL and RFUZZ cannot cover the bugs in Figure 2.

B.1 DifuzzRTL's coverage metric: control-register coverage

The control-register coverage metric of DifuzzRTL defines all the registers that drive the select signals of the MUXes as control registers. These registers in each module are concatenated into a single module_state register; all possible values of these module_state registers are defined as coverage points.

When applied to the example in Figure 2, DifuzzRTL should concatenate all the registers that drive the select signals of the two MUXes ① and ③: flush, en, pass, ipass, and debug_en. Since there are five 1-bit registers, there are 2^5 = 32 possible values; DifuzzRTL considers each of them as coverage points, resulting in 32 coverage points. We now discuss in detail why the control-register coverage metric does not cover the two bugs in Figure 2.

Limitation 1. DifuzzRTL detects only certain implementations of MUXes in the RTL code. When a MUX is implemented differently (e.g., as a combination of NOT, AND, or OR gates), DifuzzRTL fails to detect the MUX and ignores the corresponding control registers. Therefore, it fails to account for certain control registers driving the select signals of such MUX implementations. Consequently, it does not produce coverage points for these control registers.

In the controller example in Figure 2, the combinational logic ④ generates the select signal sel1 of MUX ③. DifuzzRTL cannot detect this MUX because its RTL code is described using combinational logic (Line 51 of Listing 1: state := ((!sel1 & state_f) | (sel1 & D_READ))) instead of control flow constructs (like when block at Lines 45–49 of Listing 1), thereby failing to detect the bug b1 in ④.

To demonstrate this limitation, we compiled the Chisel code (Listing 1) of the controller, generated the corresponding FIRRTL code, and ran DifuzzRTL on it. The instrumented Verilog code and output of DifuzzRTL instrumentation are shown in Listing 3 and Listing 4, respectively. It can be seen from the Lines 28 and 32 of DifuzzRTL's report (Listing 4) that DifuzzRTL detected only one MUX and two control registers; Lines 78–82 of the instrumented Verilog code (Listing 3) show that these control registers are flush and en. The control registers (pass, ipass, debug_en) generating the signal sel1 of MUX ③ are not included. Consequently, DifuzzRTL does not have any coverage point in ④, thereby failing to detect b1.

Listing 3: Verilog code of the hardware design in Figure 2 instrumented by DifuzzRTL.

     61 assign _T = flush | en; // @[cmd3.sc 37:30]
     62 assign _T_2 = pass == ipass; // @[cmd3.sc 41:20]
     63 assign sel1 = _T_2 | debug_en; // @[cmd3.sc 41:31]
     64 assign _T_4 = flush & en; // @[cmd3.sc 46:17]
     65 assign state_f = _T_4 ? FLUSH : state; // @[cmd3.sc 46:22]
     66 assign _GEN_1 = {{2'd0}, ~sel1}; // @[cmd3.sc 52:21]
     67 assign _T_6 = _GEN_1 & state_f; // @[cmd3.sc 52:21]
     68 assign _GEN_2 = {{2'd0}, sel1}; // @[cmd3.sc 52:40]
     69 assign _T_7 = _GEN_2 & D_READ; // @[cmd3.sc 52:40]
    ... ...
     78 assign en_shl = en;
     79 assign en_pad = {1'h0, en_shl};
     80 assign flush_shl = {flush, 1'h0};
     81 assign flush_pad = flush_shl;
     82 assign cache_controller_xor0 = en_pad ^ flush_pad;
    ... ...
    168 always @(posedge clock) begin
    ... ...
    207   state <= _T_6 | _T_7;
    ... ...
    212   vld <= debug_en | _T;
    213 end
    214 cache_controller_state <= cache_controller_xor0;
    ... ...
    218 end

Listing 4: DifuzzRTL's output of the hardware design in Figure 2. MUX2 is undetected.

     1 ============ Finding Control Registers =========
     5 numRegs: 9, numCtrlRegs: 2, numMuxes: 1
     8 ============ Instrumenting Coverage ============
    12 regStateSize: 2, totBitWidth: 2, numRegs: 2
    13 numOptRegs: 2
    25 ============ Instrumentation Summary ===========
    26 Total number of registers: 9
    27 Total number of control registers: 2
    28 Total number of muxes: 1
    29 Total number of optimized registers: 2
    30 Total bit width of registers: 15
    31 Total bit width of control registers: 2
    32 Optimized total bit width of control registers: 2
    33 Total bit width of muxes: 1

Limitation 2. DifuzzRTL focuses only on the control-registers that drive the select signals of MUXes. Thus, DifuzzRTL will not cover any combinational logic that does not drive the select signals of the MUXes. In the controller example in Figure 2, the bug b2 is in the combinational logic ⑥. DifuzzRTL cannot detect this bug since it does not cover the registers, flush and en, generating vld in ⑥ as these registers are not generating the select signals of any MUX.

We demonstrate this limitation of DifuzzRTL using the same instrumented Verilog code (Listing 3) and the output of DifuzzRTL instrumentation (Listing 4). DifuzzRTL only reports the two control registers: flush and en generating the select signal sel2 of MUX ① (Lines 78–82 of the instrumented Verilog code in Listing 3). However, DifuzzRTL does not have any coverage points for the signals in the combinational logic ⑥, where the bug resides. Combinational logic constitutes a significant portion of the hardware design, and thus these bugs cannot be overlooked as rare corner cases.

B.2 RFUZZ's coverage metric: Mux-coverage

RFUZZ uses a coverage metric called mux-coverage. It treats the select signal of each 2:1 MUX as a coverage point. When applied to the controller design in Figure 2, sel1 and sel2 signals are selected as the mux-coverage points. Since both are 1-bit wide, the total number of mux-coverage points is 2^1 + 2^1 = 4 coverage points. We now discuss in detail why the mux-coverage metric does not cover the two bugs in Figure 2.

Limitation 1. RFUZZ detects only certain implementations of MUXes in the RTL code. When a MUX is implemented differently (e.g., as a combination of NOT, AND, or OR gates), RFUZZ fails to detect the MUX and ignores the corresponding select signals. Therefore, it fails to account for select signals of such MUX implementations. Consequently, it does not produce coverage points for these MUXes.

In the controller example in Figure 2, the combinational logic ④ generates the select signal sel1 of MUX ③. RFUZZ cannot detect this MUX because its RTL code is described using combinational logic (Line 51 of Listing 1: state := ((!sel1 & state_f) | (sel1 & D_READ))) instead of control flow constructs (like when block at Lines 45–49 of Listing 1), thereby failing to detect the bug b1 in ④.

To demonstrate this limitation, we compiled the Chisel code (Listing 1) of the controller, generated the corresponding FIRRTL code, and ran RFUZZ on it. The instrumented Verilog code and output of RFUZZ instrumentation are shown in Listing 5 and Listing 6, respectively. It can be seen from the Lines 49 and 58 of RFUZZ's report (Listing 6) that RFUZZ detected only one select signal of the MUX ①; Line 48 of the instrumented Verilog code (Listing 5) shows the same. The select signal sel1 of MUX ③ is not included. Consequently, RFUZZ does not have any coverage point in ④, thereby failing to detect b1.

Listing 5: Verilog code of the hardware design in Figure 2 instrumented by RFUZZ.

     37 wire _T = flush | en; // @[cmd3.sc 37:30]
     38 wire _T_2 = pass == ipass; // @[cmd3.sc 41:20]
     39 wire sel1 = _T_2 | debug_en; // @[cmd3.sc 41:31]
     40 wire [2:0] state_f = profilePin ? FLUSH : state; // @[cmd3.sc 46:22]
     41 wire _T_5 = ~sel1; // @[cmd3.sc 52:15]
     42 wire [2:0] _GEN_1 = {{2'd0}, _T_5}; // @[cmd3.sc 52:21]
     43 wire [2:0] _T_6 = _GEN_1 & state_f; // @[cmd3.sc 52:21]
     44 wire [2:0] _GEN_2 = {{2'd0}, sel1}; // @[cmd3.sc 52:40]
     45 wire [2:0] _T_7 = _GEN_2 & D_READ; // @[cmd3.sc 52:40]
    ... ...
     48 assign auto_cover_out = flush & en;
    ... ...
    109 always @(posedge clock) begin
    ... ...
    148   state <= _T_6 | _T_7;
    ... ...
    153   vld <= debug_en | _T;
    154 end
    155 end

Listing 6: RFUZZ's output for the hardware design in Figure 2. MUX2 is undetected.

    51 [[coverage]]
    52 port = "auto_cover_out"
    ... ...
    58 human = "(flush and en)"

Limitation 2. RFUZZ focuses only on the select signals of the MUXes. Thus, RFUZZ will not cover any combinational logic that does not drive the select signals of the MUXes. In the controller example in Figure 2, the second bug b2 is in the combinational logic ⑥. RFUZZ cannot detect this bug since it does not cover the registers, flush and en, generating vld in ⑥ as these registers are not the select signals of any MUX.

We demonstrate this limitation of RFUZZ using the same instrumented Verilog code (Listing 5) and the output of RFUZZ instrumentation (Listing 6). RFUZZ only reports the one signal: the select signal sel2 of the MUX ① (Line 58 of the RFUZZ's output). However, RFUZZ does not have any coverage points for the signals in the combinational logic ⑥, where the bug resides. Combinational logic constitutes a significant portion of the hardware design, and thus these bugs cannot be overlooked as rare corner cases.
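
The counting arguments in B.1 and B.2, and the observation behind Limitation 1, can be illustrated with a small sketch. The Python below is not taken from DifuzzRTL or RFUZZ; every function name in it is hypothetical, and it models 1-bit data only for simplicity. It reproduces the coverage-point counts stated for the Figure 2 example and checks exhaustively that a MUX written with NOT, AND, and OR gates behaves the same as one written as an explicit selection, which suggests why a metric keyed on the syntactic form of a MUX can overlook it.

```python
from itertools import product


def control_register_points(register_widths):
    """Control-register coverage as described in B.1: the registers are
    concatenated, and every value of the result is one coverage point."""
    return 2 ** sum(register_widths)


def mux_coverage_points(select_widths):
    """Mux-coverage as described in B.2: each select signal contributes
    2**width coverage points."""
    return sum(2 ** width for width in select_widths)


def gate_level_mux(sel, a, b):
    """1-bit MUX written with NOT/AND/OR only: (!sel & a) | (sel & b)."""
    return ((1 - sel) & a) | (sel & b)


def conditional_mux(sel, a, b):
    """The same 1-bit MUX written as an explicit selection."""
    return b if sel else a


def gate_form_matches_conditional():
    """Exhaustively compare both forms over all 1-bit inputs."""
    return all(
        gate_level_mux(sel, a, b) == conditional_mux(sel, a, b)
        for sel, a, b in product((0, 1), repeat=3)
    )


if __name__ == "__main__":
    # Five 1-bit registers: flush, en, pass, ipass, debug_en.
    print(control_register_points([1, 1, 1, 1, 1]))  # 32
    # Two 1-bit select signals: sel1 and sel2.
    print(mux_coverage_points([1, 1]))  # 4
    print(gate_form_matches_conditional())  # True
```

Under the same definition, the two 1-bit control registers reported in Listing 4 would give only 2^2 = 4 of the 32 intended points; this figure is derived here from the metric's description and is not a number the tool reports.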