ProcessorFuzz: Processor Fuzzing with Control and Status Registers Guidance

Sadullah Canakci¹, Chathura Rajapaksha¹, Leila Delshadtehrani¹, Anoop Nataraja², Michael Bedford Taylor², Manuel Egele¹, and Ajay Joshi¹
¹ Department of ECE, Boston University
² Department of ECE, University of Washington
{scanakci, chath, delshad, megele, joshi}@bu.edu, mysanoop@uw.edu, prof.taylor@gmail.com

Abstract: As the complexity of modern processors has increased over the years, developing effective verification strategies to identify bugs prior to manufacturing has become critical. Inspired by software fuzzing, a technique commonly used for software testing, multiple recent works use hardware fuzzing for the verification of Register-Transfer Level (RTL) designs. However, these works suffer from several limitations such as lack of support for widely-used Hardware Description Languages (HDLs) and misleading coverage signals that misidentify “interesting” inputs. Towards overcoming these shortcomings, we present ProcessorFuzz, a processor fuzzer that guides the fuzzer with a novel CSR-transition coverage metric. ProcessorFuzz monitors the transitions in Control and Status Registers (CSRs) as CSRs are in charge of controlling and holding the state of the processor. Therefore, transitions in CSRs indicate a new processor state, and guiding the fuzzer based on this feedback enables ProcessorFuzz to explore new processor states. We evaluated ProcessorFuzz with three real-world open-source processors: Rocket, BOOM, and BlackParrot. ProcessorFuzz triggered a set of ground-truth bugs 1.23× faster (on average) than DIFUZZRTL. Moreover, our experiments exposed 8 new bugs across the three RISC-V cores and one new bug in a reference model. All nine bugs were confirmed by the developers of the corresponding projects.

Index Terms: greybox fuzzing, processor, coverage, RTL verification, RISC-V

I. INTRODUCTION

As the complexity of processor designs has continuously grown over the years, verification has become one of the most challenging tasks in processor manufacturing. The state space of a complex processor is extremely large, while processor vendors have limited time and resources for verification. Exhaustive verification (i.e., testing each and every scenario) is an unrealistic goal, and therefore a high-quality verification methodology is essential to discover bugs before fabrication. Timely, pre-silicon bug discovery can avert losses of potentially millions of dollars [25]. Otherwise, undiscovered bugs can manifest as severe functional and security holes in both proprietary and open-source processors [11], [13], [26], [28], [32], [58], [61].

Dynamic verification techniques [4], [14], [16], [47], [55], [59] are commonly used as part of the processor verification process. Dynamic verification involves simulating a Design Under Test (DUT) with a test input and analyzing the behavior of the DUT during or after simulation to identify bugs. Recent works [24], [33], [57] demonstrate that Coverage-based Greybox Fuzzing (CGF), a widely-used software testing technique, can be adapted as a dynamic verification technique to identify bugs in a processor design if certain differences between hardware and software are addressed.

Prior works on processor fuzzing mainly focus on addressing two major challenges. First, code coverage metrics used for fuzzing software programs (basic block, branch coverage, etc.) are not well-suited for fuzzing hardware [24], [55]. Second, a bug in a processor design does not result in an observable anomaly (i.e., crash) during testing, as opposed to many software programs which can indicate the presence of bugs by throwing memory violation errors or raising exceptions.

To address the first challenge, researchers have introduced a variety of coverage metrics [24], [33], [35], [45], such as multiplexer toggle coverage and register coverage, that are tailored for hardware. A processor is effectively a complex Finite State Machine (FSM) that consists of a large number of states. Exploring different states in the ‘processor FSM’ is the key to identifying bugs in the processor. Therefore, hardware-specific coverage metrics mainly aim to guide the fuzzer towards different uncovered ‘processor FSM’ states. These metrics take the intrinsic properties of the hardware (e.g., wire connections) into account rather than merely the code structure of the hardware. For instance, DIFUZZRTL [24], a state-of-the-art processor fuzzer, introduces a register coverage metric where the goal is to monitor value changes in registers that control multiplexer selection signals. The intuition is that a particular value in these registers represents a unique state in the ‘processor FSM’, and guiding the fuzzer based on this feedback explores additional FSM states.

DIFUZZRTL’s register coverage metric improves on prior works [1], [33], [44] in terms of scalability, efficiency, and precision. However, we make a key observation that register coverage can be a highly misleading metric for a processor fuzzer. Specifically, we find that DIFUZZRTL monitors many datapath registers which have minimal control over the current FSM state of the processor. The coverage increase resulting from the datapath registers does not provide meaningful information related to the current FSM state of the processor. This results in a scenario where inputs that affect datapath register coverage are incorrectly classified as ‘interesting’ inputs, which in turn leads to wasted fuzzing time.

To address the second challenge, existing processor fuzzers [20], [24], [29], [34] adapt differential testing from the software
domain to the hardware domain. Differential testing in software compares outputs of multiple programs that have the same functional behavior and checks for inconsistencies. In the hardware domain, the results of a Register-Transfer Level (RTL) simulator are compared with those of an Instruction Set Architecture (ISA) simulator. An RTL simulator is used to simulate the execution of an instruction stream on the detailed microarchitecture implementation of the processor. The ISA simulator is used to simulate the functional behavior of the processor design and serves as a reference model. A difference in the execution output of RTL simulation and ISA simulation indicates a potential bug in the processor.

In this work, we present ProcessorFuzz, a processor fuzzer that implements two novel features. First, ProcessorFuzz uses a new coverage metric called CSR-transition coverage to effectively guide processor fuzzing towards exploring unique processor states. Specifically, it monitors transitions in Control and Status Registers (CSRs) that form the core of the architecture specifications. Our intuition is that certain CSRs dictated by the ISA readily expose the current ‘processor FSM’ state (e.g., current privilege mode, the event that caused a floating-point exception), and thus the transitions in these CSRs signify a new ‘processor FSM’ state.

ProcessorFuzz’s second feature is that it uses ISA simulation to rapidly determine if a test input is interesting. Prior works rely on RTL simulation for the same goal, which is time-consuming. In fact, this problem gets compounded if the coverage guidance is misleading and results in the execution of repetitive test inputs. ISA simulation is significantly faster than RTL simulation¹. Hence, ProcessorFuzz can efficiently eliminate repetitive test inputs and focus on as many qualitatively distinct test input patterns as possible to expose bugs faster.

¹ As a reference point, ISA simulation is 79× faster than RTL simulation for the open-source RISC-V based BOOM [9] processor.

We evaluate ProcessorFuzz using a variety of widely-used open-source RISC-V based processors [2], [9], [48] designed in different HDLs (i.e., Chisel and SystemVerilog). These processors vary in microarchitectural implementations such as their pipeline depths, execution type (i.e., in-order and out-of-order execution), etc. We compare the bug-finding effectiveness of ProcessorFuzz against the state-of-the-art register coverage guided DIFUZZRTL. On average, for the bugs found by DIFUZZRTL, ProcessorFuzz triggers bugs 1.23× faster than DIFUZZRTL. In addition, ProcessorFuzz revealed 8 new bugs in widely-used open-source processors and one new bug in a reference model.

In summary, we make the following contributions:
• We propose ProcessorFuzz, a new processor fuzzing mechanism. ProcessorFuzz uses a novel CSR-transition coverage (CTC) metric to effectively guide processor fuzzing towards interesting processor states.
• We propose to use the ISA simulator as part of a coverage feedback mechanism to rapidly identify interesting test inputs, thereby accelerating the bug-finding process.
• We demonstrate the practicality of ProcessorFuzz using 3 different open-source RISC-V processors and present eight new bugs identified in those three different processor designs and one new bug in a reference model.

II. BACKGROUND AND MOTIVATION

In this section, we first briefly explain coverage-based greybox fuzzing (CGF) for software. Next, we provide a brief background of how CGF is adapted as a hardware fuzzing method (specifically for processor fuzzing) and present the motivation of our work.

A. Coverage-based Greybox Fuzzing

Fuzzing has gained broad adoption in the software community due to its effectiveness in bug discovery, scalability, and practicality [18], [19], [41]. Fuzzing is the process of repeatedly running a Program Under Test (PUT) with a large number of random inputs to discover bugs in software. One of the widely-used fuzzing variants is CGF, which utilizes the coverage feedback collected from the PUT at runtime. In each run of the PUT, CGF records coverage (e.g., basic block coverage, edge coverage, etc.) to determine if the input is ‘interesting’, i.e., whether it leads to increased coverage. If so, CGF applies a set of mutations to the ‘interesting’ input to generate new inputs which are then fed to the PUT in the next fuzzing rounds. Here, the intuition is that generating new inputs from coverage-increasing ones would cover even more unexplored code. CGF instruments the code of the program (either statically or dynamically) with the necessary bookkeeping logic to record coverage during the program execution.

B. Adapting CGF for Processor Fuzzing

Recent works [24], [33], [57] show that CGF can be adapted as a dynamic verification method for hardware including processors. In this section, we briefly explain two important aspects when adapting CGF to processor fuzzing.

Hardware Execution. In the case of CGF for software, the fuzzing target is a software program that can be directly executed on a host machine with a test input after compilation. However, hardware (e.g., a processor) is not directly executable on the host machine. A hardware design is implemented with an RTL abstraction and simulated with an RTL simulator to evaluate a test input. The RTL design is usually expressed with an HDL (e.g., Verilog, VHDL).

Bug Detection. Most software fuzzers focus on bugs that manifest as memory safety violations such as segmentation faults. These types of bugs are relatively easy to detect because they cause an observable anomaly (i.e., crash) in program behavior. However, fuzzing to find semantic bugs (e.g., logic errors) is harder than discovering memory violations because defining semantic violations is a highly domain-specific task. For these types of bugs, researchers proposed differential testing [5], [40], [42], [50] that compares the output of multiple programs with the same functionality and checks for inconsistent behaviors. This approach is used by processor fuzzers [20], [24], [34] where the processor fuzzer provides the same input to both the RTL simulator and the reference model. Here, the reference model is an ISA simulator that mimics the behavior of all the ISA-level operations. The hardware fuzzer extracts the final

[Fig. 1 diagram: in stage 1 (register identification), registers of the Rocket Core RTL that drive combinational logic are selected, e.g., cfiType0 to cfiType27 in the BTB and remainder in MulDiv; in stage 2 (coverage computation), the bit values of these registers at each clock cycle (clk0, clk1, clk2) update the coverage map.]
Fig. 1: Overview of DIFUZZRTL’s coverage feedback strategy.

[Fig. 2 plots: register coverage versus time (h) over 24 hours. Panel (a) Rocket core, broken down into Other, DCache, BTB, Rocket, and MulDiv. Panel (b) MulDiv module, broken down into Other and the remainder register.]
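The stage-2 coverage computation that Fig. 1 depicts can be modeled in a few lines of Python. This is an illustrative sketch only, not DIFUZZRTL's implementation: the function `count_new_states`, the register name `cfi_type`, and all register values are hypothetical, and the sketch stores exact value tuples where the tool is described as hashing the values into a coverage map. It shows how a datapath register that changes on every cycle inflates coverage even though the control-like register never changes.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

State = Tuple[Tuple[str, int], ...]


def count_new_states(cycles: Iterable[Dict[str, int]],
                     coverage_map: Optional[Set[State]] = None) -> int:
    """Count clock cycles whose monitored register values were not seen before.

    Assumption: the exact value tuple is used as the map key so that the
    example has no hash collisions; a hashed, fixed-size map would behave
    similarly up to collisions.
    """
    if coverage_map is None:
        coverage_map = set()
    new_states = 0
    for registers in cycles:
        state = tuple(sorted(registers.items()))
        if state not in coverage_map:
            coverage_map.add(state)
            new_states += 1
    return new_states


# Hypothetical per-cycle values: a control-like register stays constant while
# a datapath register (remainder) changes on every cycle.
division_cycles = [
    {"cfi_type": 1, "remainder": 40},
    {"cfi_type": 1, "remainder": 20},
    {"cfi_type": 1, "remainder": 10},
]
control_only = [{"cfi_type": c["cfi_type"]} for c in division_cycles]

print(count_new_states(division_cycles))  # 3
print(count_new_states(control_only))     # 1
```

With the datapath register included, all three cycles count as new states; with only the control-like register monitored, the same cycles count as a single state.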
Fig. 2: DIFUZZRTL’s register coverage breakup for Rocket.

memory states and architectural register values from both the ISA simulator and the RTL simulator for the same input and cross-checks the traces. Any mismatch is considered a potential bug in the processor and is marked for further investigation by the verification engineer.

C. DIFUZZRTL’s Register Coverage

DIFUZZRTL [24] adapts CGF to capture FSM state transitions during RTL simulation. The strategy follows a two-stage approach as depicted in Figure 1. In stage (1), it performs static analysis to identify a small set of registers in each RTL module and instruments the RTL with the necessary hardware logic to record register coverage at simulation time. At a high level, DIFUZZRTL monitors a register if its value is directly or indirectly used to control a multiplexer selection signal. DIFUZZRTL creates a circuit graph of the RTL design where nodes and edges of this graph represent circuit elements (e.g., multiplexers, wires, ports, registers) and connections, respectively. Then, it recursively performs a backward data-flow analysis for each multiplexer’s selection signal and identifies any register in the traversed path. In stage (2), DIFUZZRTL monitors value changes in the identified registers during the RTL simulation. For each clock cycle, DIFUZZRTL hashes all the values in the identified registers into a coverage map to represent the current FSM state. If a new hash value is observed, DIFUZZRTL increases register coverage to signify that the current test is interesting for further mutations.

DIFUZZRTL’s register coverage improves on prior work [33], [35] in terms of scalability, efficiency, and precision. However, using the register coverage metric for hardware fuzzing can be highly misleading. At a high level, we observe that a subset of registers leads to a misleading coverage increase and therefore misguides the hardware fuzzer. We provide more details using an example (illustrated in Figure 1) from the open-source RISC-V-based Rocket Core [2]. In the multiplication unit of Rocket Core, there is a 130-bit remainder register in the MulDiv module that indirectly controls 98 mux selection signals. Therefore, DIFUZZRTL identifies this register to monitor during fuzzing². A change in the value of remainder results in an increase in coverage. In Figure 2, we demonstrate the coverage increase resulting from the remainder register during a 24-hour fuzzing session. First, in Figure 2a, we depict the coverage progress of different modules in the Rocket core. Clearly, the MulDiv module (the multiplication unit of the Rocket core) dominates the module-wise register coverage: 62% of the overall register coverage results from the MulDiv module at the end of 24 hours. Figure 2b further shows the contribution of the remainder register to the coverage increase in the MulDiv module. Compared to all other registers in the MulDiv module, the remainder register is clearly the major factor behind the increase in register coverage.

² DIFUZZRTL applies some optimizations to reduce the search space. As one of these optimizations, it can track only a subset of the bits of a register and therefore ultimately tracks 98 bits of the remainder register.

Broadly, as the above example shows, DIFUZZRTL monitors and uses coverage information from registers even if they are mostly involved in datapath-related operations and have minimal control over the current FSM state of the hardware. Unfortunately, datapath registers (e.g., remainder) increase the search space significantly, yet the coverage increase resulting from datapath registers does not provide meaningful information to the fuzzer about the current hardware state. Therefore, it is not worthwhile to keep an input for further mutations if it increases coverage based only on datapath registers. In our work, we present a new coverage metric that aims to tackle this problem.

III. PROCESSORFUZZ

A. Design Overview

We illustrate the design overview of ProcessorFuzz in Figure 3. In stage (1), ProcessorFuzz is provided with an empty seed corpus. It populates the seed corpus by generating a set of random test inputs in the form of assembly programs that conform to the target ISA. Next, ProcessorFuzz chooses a test input from the seed corpus in stage (2) and subsequently applies a set of mutations (such as removing instructions, appending instructions, or replacing instructions) on the chosen input in stage (3). For these three stages, ProcessorFuzz uses the same methods applied by a prior work [24]. In stage (4), ProcessorFuzz runs an ISA simulator with one of the mutated inputs and generates an extended ISA trace log that includes the value of CSRs for each executed instruction. The Transition Unit (TU) receives the ISA trace log, extracts the transitions that occur in the CSRs, and cross-checks each transition against the Transition Map (TM) in stage (5). The TM is initially empty and is populated with unique CSR transitions during the fuzzing session. If the observed transition is not present in the TM, it is classified as a unique transition and added to the TM. If the current test input triggers at least one new transition, the input is deemed interesting and added to the seed corpus for further mutations. Otherwise, the input is discarded. In stage (6), ProcessorFuzz runs the RTL simulation of the target processor with only the interesting mutated input. The RTL simulation also generates an extended RTL trace log similar to

[Fig. 3 diagram: (1) seed corpus, (2) seed scheduling, (3) mutation engine, (4) ISA simulation producing the extended ISA trace log, (5) Transition Unit with the Transition Map ("New transition?"), (6) RTL simulation producing the extended RTL trace log, (7) trace compare ("Mismatch?" leads to a potential bug).]
Fig. 3: ProcessorFuzz Design.

[Fig. 5 diagram: states N0 (initial state), N1, and N2, connected by transitions P0, P1, and P2, with the annotations "Operation Invalid fdiv" and "Register Read fflags".]
Fig. 5: Abstract state diagram for triggering Bug 2 in Table IV.
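A minimal sketch of stages (4) and (5) of Fig. 3, applied to the abstract states of Fig. 5, is given here in Python. It is an illustration under stated assumptions, not the authors' code: the names `extract_transitions`, `TransitionMap`, and `covers_new_state` are hypothetical, the mnemonics `op_a`, `op_b`, and `nop` are made up, and each trace entry is assumed to hold the CSR value observed before the instruction executes. The sketch contrasts transition-based feedback with a value-only baseline.

```python
from typing import Iterable, List, Set, Tuple

Transition = Tuple[str, str, str]  # (mnemonic, value before, value after)


def extract_transitions(trace: Iterable[Tuple[str, str]]) -> List[Transition]:
    """Return CSR transitions from (mnemonic, csr_value_before_execution) entries.

    A change between two consecutive entries is attributed to the earlier
    instruction, whose execution produced the new value.
    """
    entries = list(trace)
    transitions = []
    for (mnemonic, before), (_, after) in zip(entries, entries[1:]):
        if before != after:
            transitions.append((mnemonic, before, after))
    return transitions


class TransitionMap:
    """Set of unique CSR transitions seen so far (starts empty)."""

    def __init__(self) -> None:
        self.seen: Set[Transition] = set()

    def is_interesting(self, trace: Iterable[Tuple[str, str]]) -> bool:
        """Record the trace's transitions; True if at least one was new."""
        new = set(extract_transitions(trace)) - self.seen
        self.seen |= new
        return bool(new)


def covers_new_state(trace, seen_states: Set[str]) -> bool:
    """Value-only baseline: True if the trace reaches an unseen CSR value."""
    new = {value for _, value in trace} - seen_states
    seen_states |= new
    return bool(new)


# Hypothetical traces over the abstract states of Fig. 5.
run_p0 = [("op_a", "N0"), ("nop", "N1")]                  # N0 -> N1
run_p2 = [("op_b", "N0"), ("nop", "N2")]                  # N0 -> N2
run_p1 = [("op_a", "N0"), ("op_b", "N1"), ("nop", "N2")]  # N0 -> N1 -> N2

tm, states = TransitionMap(), set()
transition_feedback = [tm.is_interesting(t) for t in (run_p0, run_p2, run_p1)]
state_feedback = [covers_new_state(t, states) for t in (run_p0, run_p2, run_p1)]
print(transition_feedback)  # [True, True, True]
print(state_feedback)       # [True, True, False]
```

The third run contains the only N1 to N2 step. Transition-based feedback keeps it, while the value-only baseline discards it because N1 and N2 were each reached earlier.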
  #   PC      Instruction            [ Privileged                     Unprivileged ]
  1   0x045c  sret                   [ 8000000a00006000,00,0f,b100,   0,00 ]
  2   0x283c  sraiw s5, s0, 6        [ 8000000a00006020,00,0f,b100,   0,00 ]
  3   0x2840  fdiv.s fs11, ft0, fa7  [ 8000000a00006020,00,0f,b100,   0,00 ]
  4   0x2844  fence iorw,iorw        [ 8000000a00006020,00,0f,b100,   0,03 ]
  5   0x2848  fsqrt.s ft0, ft5       [ 8000000a00006020,00,0f,b100,   0,03 ]

Fig. 4: Extended trace log generated by the ISA simulator. The values (in hexadecimal) of a subset of CSRs in Table I are included within the square brackets in the given order: mstatus, mcause, scause, medeleg, frm, and fflags. Transitions are color coded: red and blue for mstatus and fflags CSR transitions, respectively.

the extended ISA trace log. The ISA trace log and the RTL trace log are compared in stage (7). Any mismatch between the logs signifies a potential bug that needs to be confirmed by a verification engineer, usually by manual inspection.

B. Feedback from the ISA Simulation

One design feature of ProcessorFuzz is that it relies on the ISA simulation to determine if a test input is interesting, as opposed to prior works that rely on the RTL simulation. We use the ISA simulator to capture the CSR transitions for two main reasons. First, ISA simulators are generally much faster in executing a given program than the RTL simulation of a processor executing that program. For instance, we observed that the RISC-V Spike ISA simulator [53] is, on average, 79× faster than the RTL simulation of the RISC-V BOOM processor. This speedup provides a considerable advantage as ProcessorFuzz can then quickly identify if a test input is interesting without performing the slow RTL simulation. Eliminating inputs with similar characteristics helps ProcessorFuzz achieve faster bug discovery times as shown in Section IV. Indeed, ProcessorFuzz discovered all the bugs found by the existing processor fuzzer (i.e., DIFUZZRTL).

Second is the reduced effort needed to instrument the simulator. A simulator needs to be instrumented to generate an extended trace log with the selected CSRs. An ISA simulator can be easily instrumented by extending the already available trace logic with the selected CSRs. The same instrumented ISA simulator can be used to fuzz any processor design as long as it has been designed for the same ISA target. In contrast, instrumenting RTL designs for tracking the coverage metrics requires extensive effort. Moreover, instrumentation in one HDL does not readily translate to other HDLs. Additionally, as shown in Section IV, ProcessorFuzz incurs limited instrumentation overhead during fuzzing (only 1% in the ISA simulator) as opposed to prior works [30] that instrument processor RTL and result in higher runtime overheads (e.g., 71% overhead in TheHuzz [30] and 97% overhead in RFUZZ [33]).

C. CSR-transition Coverage

1) Description of the Metric: As described in Section II-C, DIFUZZRTL’s register coverage technique monitors many datapath registers (e.g., the remainder register) to determine the current FSM state, which leads to a large state space. To test the processor with as many qualitatively distinct input patterns as possible, we propose a novel CSR-transition coverage metric. CSRs are system registers in an ISA specification. These registers are used to control (e.g., delegated exceptions) or hold information (e.g., state of the floating-point unit) about the current architectural state of the processor. Our intuition for using CSRs is as follows. A processor is a complex FSM where CSRs have direct control over the current processor state. The architectural state of the processor (held in the register file and status registers) represents the state of a program running in the processor. A value change in a CSR often signifies an architectural state change, such as a value change in a CSR that stores the exception code or privilege level. Therefore, ProcessorFuzz aims to capture the current state of the processor by monitoring transitions in CSRs to guide the fuzzer towards interesting processor states.

CSR transitions can be extracted from either the ISA simulator or the RTL simulation of the processor design. ProcessorFuzz uses the ISA simulator to capture CSR transitions. Specifically, ProcessorFuzz monitors the CSR values resulting from the execution of the previous and current instructions and checks if they differ. If so, ProcessorFuzz uses the transition to determine if the input is interesting as detailed in the following subsections. We provide a concrete example using the extended ISA trace log shown in Figure 4 to illustrate how ProcessorFuzz identifies a CSR transition in the ISA trace log. After execution of the sret, the CSR value changes, which can be seen by comparing the entries in lines 1 and 2 of the ‘Privileged’ column. Specifically, we observe a CSR transition in the mstatus CSR as highlighted in red in Figure 4.

2) Why Transitions Instead of Values?: DIFUZZRTL determines the current processor state based on the register coverage as detailed in Section II-C. For each newly covered FSM state, DIFUZZRTL’s register coverage only stores the current state of the processor and does not consider the previous state. Unfortunately, this design choice can lead to important test inputs being discarded by the fuzzer, and the fuzzer can potentially miss out on the discovery of a bug. We illustrate this in Figure 5. The figure represents a subset of the abstract states associated with a real-world bug (Bug 2 in Table IV) that we
identified in a RISC-V processor. The processor starts out in the N0 state. The bug triggers in the N2 state only if the previous state is N1. During a coverage-guided fuzzing session, if both N1 (through the P0 transition) and N2 (through the P2 transition) are covered individually, there will not be a coverage increase for the denoted P1 state transition. Hence, the fuzzer is not specifically driven towards the unique P1 transition, and the fuzzing session fails to trigger the bug. In contrast, by monitoring transitions, we can detect P1 as a new transition even though the N1 and N2 states are already covered. Overall, we monitor new transitions in CSRs rather than just identifying unique CSR values to improve the sensitivity of the feedback metric. Indeed, our rationale is similar to that of widely-used software fuzzers [19], [38] that monitor edges in a program instead of basic blocks.

3) CSR Selection Criteria: An ISA specification usually specifies a large number of CSRs³. Monitoring all available CSRs for transitions can mislead the fuzzer (as we show in Section IV) because not all CSRs provide distinctive information regarding the current processor state. As an example, consider the instret CSR that holds the total number of retired instructions. Monitoring instret results in a scenario where each instruction committed by the processor results in a CSR transition. Effectively, ProcessorFuzz would identify any test input as interesting since instret causes a transition after each committed instruction. However, a test would rarely result in a bug due to a change in the committed instruction count.

³ As a reference point, RISC-V ISA defines up to 4096 CSRs.

To aid ProcessorFuzz in determining qualitatively different inputs, we introduce the following two criteria when selecting the CSRs that ProcessorFuzz monitors for transitions. First, we select CSRs that contain status information about the processor (criterion C1). These CSRs are important because they directly reveal the current status of the processor. As an example, we select a CSR that stores the cause for an exception taken by the processor (e.g., mcause). If a test case results in an exception, ProcessorFuzz analyzes the cause and differentiates it from another test case that has a different exception reason (e.g., misaligned load/store attempt or access faults due to unauthorized privilege mode). Second, we select any CSR that is used to set a certain configuration in the processor (criterion C2). Here, we aim to determine if the processor behaves as expected under different configurations. For instance, the value of medeleg can be changed to determine which traps can be delegated to lower privilege levels (e.g., the load access fault handled in supervisor mode instead of machine mode). This way, ProcessorFuzz aims to determine if processor designs can perform correctly under different configurations (e.g., different exception delegations) for a particular processor status (e.g., an exception). Table I lists all the CSRs in the RISC-V ISA that we used for identifying transitions in the current implementation of ProcessorFuzz based on the aforementioned two criteria, i.e., C1 and C2. We also provide all the CSRs that we excluded (e.g., instret), along with details on why they are not considered as part of ProcessorFuzz’s current design, in Table II.

Apart from these two criteria, CSR selection can be further limited depending on the desired scope of verification. For example, if we only want to verify the functionality of the floating-point unit in the processor, only floating-point CSRs can be monitored to identify transitions. We quantitatively demonstrate this capability of ProcessorFuzz in Section IV-B.

D. Transition Unit

As shown in Figure 3, the TU takes an extended ISA trace log as input and communicates with the TM to output whether the trace log contains any new transitions. We describe the complete workflow of the TU in this section.

Filtering Transitions. As a first step, the TU extracts all CSR transitions in the trace log based on the description in Section III-C. Then, ProcessorFuzz applies a filter to remove unnecessary transitions. We observe that not all CSR transitions represent interesting architectural state changes that are relevant for testing processors. For instance, a test program running on the target processor can write to a CSR that contains processor status, e.g., the mstatus CSR in the RISC-V ISA. This could get identified as a new CSR transition. If the write operation is legal, the processor continues the execution of the program and eventually overwrites the CSR with the updated status. Overall, the type of transitions that occur from writes to status CSRs do not affect the architectural state of the processor. Thus, ProcessorFuzz filters out transitions that occur from explicit writes to status CSRs.

Grouping Transitions. Next, the TU groups the transitions to reduce the state space. ProcessorFuzz provides the flexibility to customize the CSR-transition coverage metric to be suitable for verifying different Architectural Units (AUs) individually. Specifically, ProcessorFuzz allows a designer to group CSR transitions of AUs, thereby considering them as independent events. Grouping transitions improves the exploration of CSR transitions within each group. As a result, the fuzzer is able to generate tests targeted towards individual AUs and verify them thoroughly. This is a useful feature for a verification engineer as AUs in a processor can be individually verified as an initial step of verification. For example, privileged and unprivileged architectures in a RISC-V processor can be verified individually by grouping transitions as shown in Figure 4. Identifying and fixing the bugs in each AU before fuzzing the processor as a whole can reduce the overall verification effort.

Transition Map. ProcessorFuzz maintains a transition map to store CSR transitions. Each transition is stored in the map as a tuple: (Im, S0, S1), where Im is the mnemonic of the instruction whose execution resulted in the CSR transition, and S0 and S1 are the CSR values before and after the transition as defined in subsection III-C. Revisiting the same example given in subsection III-C, the unprivileged CSR transition in lines 3 and 4 in Figure 4 can be represented as (fdiv.s, 0000, 0003). We include the instruction mnemonic because the same transition can be triggered by different instructions. For example, both floating-point division and floating-point square-root instructions can trigger the same transition in the fflags CSR in the RISC-V ISA due to invalid operations. Nevertheless, only the invalid operation of the floating-point division instruction might contain a bug. Only the mnemonic of the instructions is included to ignore repetitive transitions that get triggered by different operands of the same


TABLE I: CSR selection for RISC-V ISA implementation of ProcessorFuzz along with the criteria that were used to select them. Here, C1 and C2 correspond to the two criteria that we describe in Section III-C3.

        CSR Group              CSR                                                           Description                                        Criteria
                           mstatus.xIE      Controls the global interrupt enable bit for privilege x, x = {M, S, U}                                C2
                           mstatus.xPIE     Holds the value of interrupt-enable bit active prior to the trap for privilege mode x                  C1
                           mstatus.xPP      Holds the previous privilege mode active prior to a trap taken to privilege mode x                     C1
                           mstatus.XS       Contains the state of any additional user-mode extensions                                              C1
                           mstatus.FS       Contains the state of the floating-point unit                                                          C1
                           mstatus.MPRV     Controls the privilege mode in which the memory operations are performed                               C2
                           mstatus.SUM      Controls the permission for accessing user memory from supervisor mode                                 C2
         Privileged        mstatus.MXR      Controls the privilege with which loads access virtual memory                                          C2
                           mstatus.TVM      Controls the ability to edit virtual-memory configuration from supervisor mode                         C2
                           mstatus.TW       Controls the privilege modes in which wait for interrupt (WFI) is allowed to execute                   C2
                           mstatus.TSR      Provides the ability to trigger a trap when SRET instruction is executed in supervisor mode            C2
                           mstatus.xXL      Controls the width of an integer register for privilege mode x, x = {S, U}                             C2
                           mstatus.SD       Indicates the combined state of mstatus.FS and mstatus.XS for context switches                         C1
                           {m,s}cause       Contains the trap cause when a trap is taken into machine or supervisor mode                           C1
                           medeleg          Decides what type of exceptions are delegated to supervisor mode from machine mode                     C2
                           {m,s}counteren   Controls the availability of the hardware performance-monitoring counters for supervisor or user mode  C2
        Unprivileged       frm              Controls the dynamic rounding mode for floating-point operations                                       C2
                           fflags           Holds the accrued exceptions from the floating-point operations                                        C1
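
The CSR groups of Table I and the (Im, S0, S1) tuple kept in the transition map can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the names GROUPS, encode, extract_transitions and TransitionMap are hypothetical, whole registers are tracked instead of the individual mstatus fields listed in Table I, and a group state is encoded as concatenated two-digit hex values so that the example tuple from the text, (fdiv.s, 0000, 0003), is reproduced.

```python
from typing import Dict, List, Set, Tuple

Transition = Tuple[str, str, str]        # (Im, S0, S1)
TraceEntry = Tuple[str, Dict[str, int]]  # (mnemonic, CSR values after the instruction)

# Hypothetical grouping that follows the two CSR groups of Table I.
GROUPS: Dict[str, List[str]] = {
    "privileged": ["mstatus", "mcause", "scause", "medeleg", "mcounteren", "scounteren"],
    "unprivileged": ["frm", "fflags"],
}


def encode(csrs: Dict[str, int], names: List[str]) -> str:
    """Assumed state encoding: concatenated hex values of the CSRs in one group."""
    return "".join(f"{csrs.get(name, 0):02x}" for name in names)


def extract_transitions(trace: List[TraceEntry]) -> Dict[str, List[Transition]]:
    """Emit one (mnemonic, S0, S1) tuple per group whenever that group's state changes."""
    found: Dict[str, List[Transition]] = {group: [] for group in GROUPS}
    prev: Dict[str, int] = {}  # assume every monitored CSR starts at zero
    for mnemonic, csrs in trace:
        for group, names in GROUPS.items():
            s0, s1 = encode(prev, names), encode(csrs, names)
            if s0 != s1:
                found[group].append((mnemonic, s0, s1))
        prev = csrs
    return found


class TransitionMap:
    """Remembers every transition seen so far, separately for each group."""

    def __init__(self) -> None:
        self.seen: Dict[str, Set[Transition]] = {group: set() for group in GROUPS}

    def update(self, found: Dict[str, List[Transition]]) -> bool:
        """Add unseen transitions; return True if the input is 'interesting'."""
        interesting = False
        for group, transitions in found.items():
            for transition in transitions:
                if transition not in self.seen[group]:
                    self.seen[group].add(transition)
                    interesting = True
        return interesting


# Two-instruction trace in which fdiv.s changes fflags from 0x00 to 0x03.
trace = [("addi", {"frm": 0, "fflags": 0}), ("fdiv.s", {"frm": 0, "fflags": 3})]
first = extract_transitions(trace)
tm = TransitionMap()
is_new = tm.update(first)     # first occurrence of the transition
is_repeat = tm.update(first)  # same transition again
```

In this sketch the first call to update marks the input as interesting and the second does not, which mirrors how a test that only repeats known transitions would be discarded before RTL simulation.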

                      TABLE II: CSRs not monitored by ProcessorFuzz along with the reason for exclusion.

Category          CSR             Description

Reason for exclusion: Holds a constant value during testing
Privileged        misa            Reports the CPU capabilities of a hart
                  mhartid         Contains the integer ID of the hardware thread running the code
                  {m,s}tvec       Contains the trap handler base address and vector configuration for machine or supervisor mode
                  satp            Controls supervisor-mode address translation and protection
PMP               pmpcfg          Contains the physical memory protection configuration
                  pmpaddr         Contains the physical memory protection addresses

Reason for exclusion: Not supported by the testing infrastructure (e.g., RISC-V ISA vector extension is not supported and relevant CSRs are excluded.)
Interrupt         {m,s}ip         Reports pending interrupts in machine or supervisor mode
                  {m,s}ie         Control what interrupts are enabled in machine or supervisor mode
                  mideleg         Decides what type of interrupts are delegated from machine mode to supervisor mode
Debug Extension   dcsr            Contains the configuration and status of debug extension
                  dpc             Holds the program counter of the next instruction to be executed before entering debug mode
                  dscratch        Optional scratch register that holds temporary values
                  tselect         Control which trigger is accessible through the other trigger registers
                  tdata1-3        Holds trigger-specific data
Vector Extension  vstart          Holds the index of the first element to be executed by a vector instruction
                  vxsat           Holds the saturation flag for fixed-point operations
                  vxrm            Controls the rounding mode used in the vector extension

Reason for exclusion: Contains information to assist designers during analysis of a hardware bug rather than revealing the fundamental issue
HPC               mcountinhibit   Controls which hardware performance-monitoring counters are allowed to increment
                  cycle           Holds the elapsed cycle count of the CPU
                  instret         Holds the number of retired instruction count
                  hpmevent        Hardware performance-monitoring event selector
                  hpmcounter      Performance-monitoring counter of the event selected by hpmevent
Privileged        {m,s}tval       Hold the exception-specific information when a trap is taken to machine or supervisor mode
                  mscratch        Holds a pointer to the machine mode context space while the hart executes in lower privilege
                  {m,s}epc        Contains the PC of an instruction that caused an exception for machine or supervisor mode
                  sscratch        Holds a pointer to the supervisor mode context space while the hart executes in user mode

instruction. Once tuples are created, the map is queried to check whether the detected transition is new or a duplicate. Tuples that are identified to contain new transitions are added to the map while marking the current test input as interesting. The transition map is empty at the beginning of a fuzzing session and maintained throughout the session.

E. RTL Simulation and Trace Comparison

If the TU determines that the current input results in a unique CSR transition, ProcessorFuzz launches the RTL simulation and generates the extended RTL trace log. ProcessorFuzz then compares the extended RTL trace log with the extended ISA trace log. Any difference between these logs signifies a potential bug in the processor design and needs to be investigated further by a verification engineer. In case the input does not result in a unique transition, ProcessorFuzz discards the input and proceeds to the next fuzzing iteration.

IV. EVALUATION

In this section, we evaluate the effectiveness of ProcessorFuzz using real-world processor designs.

A. Evaluation Setup

1) Implementation Details: ProcessorFuzz has two main implementation steps; generation of an extended trace log using the ISA simulator and building the TU (see Figure 3). For the former, we extended Spike [53] open-source ISA simulator to store the values of monitored CSRs (see Table I). The instrumentation overhead of Spike is 0.4% in terms of lines of C++ code, while the runtime overhead is 0.15%. For the RTL simulation of all processor designs, we used Verilator [52], an open-source RTL simulator. We used the same mutation engine (see Figure 3) as provided by DIFUZZRTL’s open-source repository. Using the same engine is important since our goal is to compare two coverage feedback mechanisms (i.e., register coverage and CSR-transition coverage) rather than input generation mechanisms. We separated transitions belonging to frm and fflags to separate floating-point operations from the rest of the CSRs.

2) Processor Designs: We use three real-world open-source RISC-V processors. Rocket Core is a Chisel [3] HDL-based open-source, general-purpose, in-order RISC-V processor core that can be generated using the Rocket Chip SoC Generator framework [2]. We used Spike [53] as a reference model to verify the correctness during fuzzing. The commit version of the Rocket core that we used is 148d5d2. BOOM Core [9] is an out-of-order, superscalar RISC-V processor core. It can also be generated from the same Rocket Chip SoC Generator framework [2] and is also designed in Chisel HDL. We used Spike ISA simulator to verify the correctness during fuzzing. The commit version of the BOOM core that we used is 148d5d2. BlackParrot Core [48] is an open-source 64-bit RISC-V core, designed in the industry-standard SystemVerilog HDL. BlackParrot is silicon-validated and is in active development. We used Dromajo [56] as a reference model to expose the bugs in BlackParrot (commit bc3b48b).

3) Settings: We compared ProcessorFuzz with two different settings of DIFUZZRTL. The first setting is no-cov-difuzzrtl where DIFUZZRTL fuzzing framework is used without any coverage guidance (i.e., as a blackbox fuzzer). For all the cores that we evaluated, we successfully used this setting as a comparison point. The second setting is reg-cov-difuzzrtl where DIFUZZRTL fuzzing framework relies on register coverage as a guidance mechanism. While this setting is applicable to Rocket and BOOM Cores, it is not the case for BlackParrot Core. This is because DIFUZZRTL’s register coverage passes do not support SystemVerilog. They are tailored for FIRRTL [27], an intermediate representation (IR) used by Chisel HDL, which is used to design Rocket and BOOM cores. We tried to convert SystemVerilog to FIRRTL using an open-source tool (i.e., Yosys [62]), and apply DIFUZZRTL’s register coverage passes. However, we observed several issues during this conversion due to the limited support for SystemVerilog to FIRRTL conversion and thus failed to instrument BlackParrot. In our experiments, we used DIFUZZRTL as the sole comparison point since it shows clear benefits over previous processor fuzzing frameworks as well as its open-source nature. Also, for each setting, we reported Time-to-Exposures (TTE) which is defined as the total elapsed time from the starting of the fuzzing session until the bug is exposed.

4) Infrastructure: All the experiments based on ISA and the RTL simulations were conducted on server nodes with Intel® Xeon® E5-2670 CPUs and CentOS Linux 7 as the operating system. We fuzzed each processor design 10 times for each setting and allocated 48 hours (2 days) of time limit for each fuzzing instance. For each fuzzing instance, we dedicated two cores and 8GB of memory. In total, it took 4320 CPU hours to conduct all the experiments.

B. Ground-truth Bugs

As discussed by prior works [31], [39], the bug-finding capability of a fuzzer is the ultimate litmus test for a fuzzer. While there exist several fuzzing benchmarks for software programs [22], [36], this is not the case for processors. Therefore, we relied on a set of bugs (in total six bugs) previously reported by DIFUZZRTL for BOOM processor to evaluate the bug-finding capability of ProcessorFuzz and perform a head-to-head comparison with DIFUZZRTL. Overall, our evaluation aims to demonstrate that ProcessorFuzz can guide the fuzzer efficiently to discover ground-truth bugs thanks to the CSR-transition feedback obtained using the ISA simulation.

In Table III, we report the TTE of bugs in seconds for three different settings in the 2nd-4th columns; no-cov-difuzzrtl, reg-cov-difuzzrtl, and ProcessorFuzz selected configuration. The selected configuration of ProcessorFuzz uses the CSRs in Table I for transition extraction based on the criteria that we detailed in Section III-C3. We also provide the achieved speedups by ProcessorFuzz over no-cov-difuzzrtl, and reg-cov-difuzzrtl. Besides selected, we provide results for two more different configurations of ProcessorFuzz in the 5th-6th columns; fp-csr, and all-csr. These configurations differ in the CSRs that ProcessorFuzz monitors during fuzzing. Specifically, all-csr configuration monitors all implemented CSRs in the BOOM core. Here, by using all-csr configuration, we aim to present that ProcessorFuzz can be effectively guided towards bugs by eliminating certain CSRs that do not assist fuzzing towards exploring bugs (e.g., instret that repeatedly changes after an instruction retires). Finally, fp-csr configuration uses only the floating-point CSRs (unprivileged CSRs in Table I). The aim of this experiment is to show that ProcessorFuzz can focus on certain parts of processors by selecting a subset of CSRs (e.g., floating point unit).

Overall, ProcessorFuzz selected configuration and DIFUZZRTL discovered five out of six bugs reported in the DIFUZZRTL within the fuzzing time limit in our experiments. Unfortunately, we could not detect #504 with any of the settings. In summary, ProcessorFuzz (selected) achieved, on average, 1.21× (up to 2.1×) and 1.23× (up to 2.32×) speedups over no-cov-difuzzrtl and reg-cov-difuzzrtl, respectively. no-cov-difuzzrtl performed slightly better than reg-cov-difuzzrtl.

We included fp-csr configuration to demonstrate the ProcessorFuzz’s ability to change the scope of verification by changing the CSR selection. fp-csr detected the bugs in the floating-point unit (issues #492, #493 and #503) 2.08× faster compared to the selected configuration while showing a slowdown in detecting other bugs.

We also show the effect of CSR selection on TTE of the bugs through all-csr configuration. all-csr configuration failed to detect two of the bugs within the allocated fuzzing time. Moreover, selected is significantly faster (i.e., 16.49× on average) than all-csr in detecting bugs.

To understand the performance of ProcessorFuzz and DIFUZZRTL for different bugs, we further study the relationship among register coverage, CSR-transition coverage, and bug-finding times. Specifically, in Figure 6a, we show the measured register coverage progress for different settings of DIFUZZRTL and ProcessorFuzz. Although ProcessorFuzz covers



TABLE III: The speedup achieved by selected ProcessorFuzz configuration over no-cov-difuzzrtl, and reg-cov-difuzzrtl for
the ground-truth bugs in the BOOM processor. We also report speedup of fp-csr and all-csr ProcessorFuzz configurations
over selected ProcessorFuzz configuration. The runtime is set as 48 hours (172800 seconds) for bugs that could not be found.

        no-cov-difuzzrtl  reg-cov-difuzzrtl         ProcessorFuzz (selected)                  ProcessorFuzz (fp-csr)      ProcessorFuzz (all-csr)
Issue   Time (s)          Time (s)   Speedup        Time (s)   Speedup        Speedup         Time (s)   Speedup          Time (s)   Speedup
No                                   (over no-cov)             (over no-cov)  (over reg-cov)             (over selected)             (over selected)
#458    104.3             70.3       1.48           54         1.93           1.3             151324.8   0.0              172800     NA
#454    32883.3           45322      0.73           25020      1.31           1.81            119886.2   0.2              39523.3    0.63
#492    2047.2            4238.9     0.48           1821.2     1.12           2.32            1221.8     1.49             172800     NA
#493    585.4             494.9      1.18           278.7      2.1            1.77            170.1      1.63             526.6      0.52
#503    1463.7            1011.1     1.44           2795.9     0.52           0.36            757.6      3.69             62246.8    0.04
#504    172800            172800     NA             172800     NA             NA              172800     NA               172800     NA
Geo.    3182.9            3245.1     0.98           2630.7     1.21           1.23            8890.2     0.29             43402.2    0.06
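
The Geo. row and the average speedups in Table III can be checked with a short calculation. The sketch below assumes, as a reading of the table rather than something the text states, that Geo. is the geometric mean of the six per-bug times (with the 172800 s limit substituted for bugs that were not found) and that the reported average speedup is the ratio of two such means. The names CAP, tte and geomean are introduced here for illustration only.

```python
import math

CAP = 172800  # 48-hour limit in seconds, used in Table III when a bug was not found

# Times-to-exposure in seconds, copied from Table III
# (rows #458, #454, #492, #493, #503, #504).
tte = {
    "no-cov-difuzzrtl": [104.3, 32883.3, 2047.2, 585.4, 1463.7, CAP],
    "reg-cov-difuzzrtl": [70.3, 45322, 4238.9, 494.9, 1011.1, CAP],
    "ProcessorFuzz (selected)": [54, 25020, 1821.2, 278.7, 2795.9, CAP],
}


def geomean(values):
    """Geometric mean, computed through logarithms to avoid a huge product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


geo = {name: geomean(times) for name, times in tte.items()}
speedup_over_no_cov = geo["no-cov-difuzzrtl"] / geo["ProcessorFuzz (selected)"]
speedup_over_reg_cov = geo["reg-cov-difuzzrtl"] / geo["ProcessorFuzz (selected)"]

for name, value in geo.items():
    print(f"{name}: {value:.1f} s")
print(f"speedup over no-cov:  {speedup_over_no_cov:.2f}")
print(f"speedup over reg-cov: {speedup_over_reg_cov:.2f}")
```

Under these assumptions the computed means land close to the Geo. row and the two ratios round to the 1.21× and 1.23× quoted in the text. Because the cap enters the mean for #504 in every setting, that bug contributes the same factor to each column and does not change the ratios.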

[Figure 6, plots omitted. (a) Register coverage progress during fuzzing: register coverage (0 to 600K) versus time (0 to 48 h) for no-cov-difuzzrtl, reg-cov-difuzzrtl, and ProcessorFuzz. (b) Coverage increasing and total test input counts during fuzzing: test count (0 to 30K) versus time (0 to 48 h), showing total generated and interesting inputs for no-cov-difuzzrtl, reg-cov-difuzzrtl, and ProcessorFuzz.]

Fig. 6: Coverage details for different settings.

less number of states (i.e., achieves lower register coverage) during fuzzing, it was still able to discover bugs faster. For instance, ProcessorFuzz triggered the most challenging bug based on the TTE (i.e., #454) after exploring 303K states while no-cov-difuzzrtl and reg-cov-difuzzrtl triggered that bug after exploring 364K and 354K states, respectively. This particular bug shows that higher register state coverage does not necessarily translate to a faster bug discovery. Indeed, an increase in coverage due to value changes in datapath registers can mislead the fuzzer since inputs with similar characteristics (the multiplication example in Section II-C) are repeatedly used by the fuzzer to generate a new set of inputs.

In Figure 6b, we also show the total number of test inputs that lead to a coverage increase, i.e. ‘interesting test inputs’, and the total number of inputs generated by the mutation engine for the two settings of DIFUZZRTL and ProcessorFuzz. For no-cov-difuzzrtl and reg-cov-difuzzrtl, we use the register coverage metric, the same metric used in DIFUZZRTL, to realize if a test input increases coverage. For ProcessorFuzz, we use the CSR-transition coverage metric to detect inputs that resulted in a coverage increase. The results provide an important takeaway. Although ProcessorFuzz generates significantly more inputs than other approaches, it is very selective when categorizing a test input as ‘interesting’. Consequently, ProcessorFuzz identified only 33% of the generated test inputs as interesting. Moreover, ProcessorFuzz could expose the bugs faster although it used the least number of test inputs for the RTL simulation. Note that ProcessorFuzz launched the RTL simulation only with interesting inputs (i.e., curved dotted red line) and discarded any other generated input. Using the fast ISA simulation enabled ProcessorFuzz to quickly eliminate inputs that do not result in a new FSM state and spend more time on inputs that explore new FSM states.

C. Newly Discovered Bugs

In Table IV, we document the various new bugs discovered by ProcessorFuzz in the selected processors mentioned earlier and in the ISA simulator used as a reference model. Here, we provide detailed descriptions of three bugs chosen from different processors and reference model. The details of the remaining bugs can be found in respective processor repositories.

1) Bug Descriptions:

Bug 6. Any write attempt to the zero register (i.e., x0) must be ignored according to the RISC-V ISA. However, in BlackParrot, we detected that the x0 register is read as a non-zero value if one of the preceding division instructions that writes to x0 is still in the pipeline. Further analysis revealed that this discrepancy is due to bypassing the result of division operation to the following instruction even when the destination register of a division operation is x0. ProcessorFuzz was able to identify this bug because a test input that has this scenario caused a CSR transition in fflags due to division by zero. An attacker can use this bug to obfuscate the behavior of malware. Specifically, malware can jump to an address computed by an instruction that uses x0.

Bug 7. According to RISC-V privileged specification, the effective privilege mode for implicit page table accesses should be supervisor mode. However, we observed that Dromajo accesses page tables in user mode privilege level when executing user-mode programs. Further analysis revealed that Dromajo also carries out Physical Memory Protection (PMP) checks in user mode when no PMP entries are set, violating the RISC-V ISA privileged specification in two counts.

Bug 8. In a multi-level page table implementation, the accessed (A), dirty (D), and user-mode (U) bits of a non-leaf page table entry (PTE) are reserved for future use and should be cleared. If these bits are set in a non-leaf PTE, the processor must raise an instruction page fault when accessing the PTE according to

TABLE IV: Brief description of bugs discovered by ProcessorFuzz, and their current status, in various processor cores.
 Bug  Core /                                    Brief Description of the Bug                                 Status (Issue No)
     Simulator
 1   BlackParrot   Non-boxed single-precision floating point values are not interpreted as NaNs              Confirmed (#971)
 2   BlackParrot   Read-after-Write dependencies on fcsr.fflags are not satisfied.                           Fixed (#994)
 3   BlackParrot   When mstatus.FS is not set and the fcsr is written, FS is unexpectedly updated.           Fixed (#969)
 4   BlackParrot   The 2 low-bits of sepc CSR are not write-insensitive.                                     Fixed (#970)
 5   BlackParrot   No exception raised when writing certain read-only CSRs.                                  Fixed (#967)
 6   BlackParrot   Reading zero register, following specific instruction sequences, return unexpected non-   Fixed (#832)
                   zero values
 7   Dromajo       PMP checks are performed, and raise exceptions upon encountering violations, even         Confirmed (#46)
                   with no PMP entries set.
 8   Rocket &      Instruction page fault not raised when accessing non-leaf PTEs with certain unspecified   Fixed (#2905, #570)
        BOOM       page attributes.
 9   BOOM          mstatus.FS is gratuitously set to dirty.                                                  Confirmed (#969)
the RISC-V ISA. We discovered that Rocket and BOOM cores do not raise instruction page fault when software attempts to access a PTE with any of A, D, or U bits set. This bug is similar to CWE-1209 [43] where failure to disable reserved bits allows attackers to compromise the hardware state.

2) Timing Results: Table V provides the TTEs for six newly identified bugs (Bug 1-6) in BlackParrot. We did not include Bug 7-9 since they were easily detected in all the settings. We were only able to compare ProcessorFuzz with no-cov-difuzzrtl. As detailed in Section IV-A3, we could not instrument BlackParrot with register coverage since DIFUZZRTL lacks support for SystemVerilog. ProcessorFuzz does not require any instrumentation on the RTL design, therefore, could successfully guide the fuzzer with CSR-transition coverage to expose bugs. Overall, ProcessorFuzz achieved 1.57× speedup, on average, over no-cov-difuzzrtl. Note that only ProcessorFuzz was able to detect Bug 6 from Table IV. Similar to the experiment that we conducted in the BOOM processor using the ground-truth bugs, selected configuration of ProcessorFuzz performed significantly better compared to all-csr configuration (i.e., 15.61× faster). Moreover, fp-csr configuration identified floating-point related bugs fairly faster (e.g., Bug 3) compared to other types of bugs (e.g., Bug 4 that focuses on sepc CSR).

V. RELATED WORK

We first present traditional methods in hardware verification. Then, we explain fuzzing-based hardware verification approaches and how ProcessorFuzz differs.

A. Traditional Hardware Verification

Random instruction generators [15], [20], [21], [23], [34] have been commonly used in processor verification since they require limited human expertise and are scalable to large RTL designs. The lack of coverage guidance in these tools leads to the generation of the repetitive inputs that test the same processor functionalities, thereby decreasing the chances of finding bugs [24], [33]. A verification engineer can target the uncovered RTL regions by adjusting the constraints that control the random test generator. However, this method significantly increases engineering effort, and therefore, slows down the verification process. To overcome this problem, researchers proposed several coverage-directed test generation mechanisms [4], [14], [16], [47], [54], [55], [59] that automatically direct the next round of test generation to target the uncovered parts of RTL. Unfortunately, these works are generally DUT-specific which hinders their general applicability.

Formal verification methods (e.g., symbolic execution and model checking) are also widely used in hardware verification [6], [10], [46]. These methods use mathematical reasoning to prove that a hardware design conforms to its specification. Unfortunately, formal verification methods have a well-known state explosion problem, and therefore, do not scale well for complex RTL designs such as a processor [12].

B. Hardware Fuzzing

In Table VI, we provide a high-level overview of all fuzzing-based RTL verification approaches. For each approach, we include the input format, the coverage metric used to guide the fuzzer, and the method to identify bugs.

RFUZZ [33] proposes a new metric, the multiplexer toggle coverage. RFUZZ monitors all the multiplexers in the RTL design. It retains an input for further mutations if the input toggles a previously uncovered multiplexer selection signal. A follow-up work by Li et al. [35] enhances RFUZZ with symbolic simulation. Both RFUZZ and Li et al. are highly coupled to Chisel HDL which limits the applicability of the approach [49]. Additionally, monitoring multiplexers in complex designs introduces excessive performance overhead [24]. ProcessorFuzz is agnostic to HDL, which makes it both practical and efficient.

Trippel et al. [57] translate hardware designs to software models and fuzz those models. This way, available coverage metrics used by software fuzzers (e.g., basic block and edge) can be used for fuzzing hardware as well. However, this method of converting hardware designs to software models introduces additional challenges such as proving the equivalency between hardware design and software model [49].

TheHuzz [30] relies on a variety of coverage metrics extracted using industrial-standard tools such as Cadence [7] and ModelSim [51]. TheHuzz profiles individual instructions to associate with relevant mutation strategies while generating a new set of inputs. Unlike DIFUZZRTL or ProcessorFuzz, TheHuzz does not propose a new coverage metric. TheHuzz relies on several coverage metrics used in software testing (i.e., statement, branch, line, and expression). As discussed by prior works [24], [55], these metrics are not sufficient metrics to verify a processor. Moreover, it is not clear how registers


TABLE V: The speedup of ProcessorFuzz over no-cov-difuzzrtl, and reg-cov-difuzzrtl for the discovered bugs in the BlackParrot processor. We also report speedup of fp-csr and all-csr ProcessorFuzz configurations over selected ProcessorFuzz configuration. We state the maximum allowed runtime of 48 hours (172800 seconds) for bugs that could not be found.

Bug   no-cov-difuzzrtl  ProcessorFuzz (selected)  ProcessorFuzz (fp-csr)                     ProcessorFuzz (all-csr)
      Time (s)          Time (s)   Speedup        Time (s)   Speedup        Speedup          Time (s)   Speedup        Speedup
                                   (over no-cov)             (over no-cov)  (over selected)             (over no-cov)  (over selected)
1     464.9             230.2      2.02           430.2      1.08           0.54             1608.7     0.29           0.14
2     95695             57441.3    1.67           100804.9   0.95           0.57             122076     0.78           0.47
3     1520.1            1474.5     1.03           921.8      1.65           1.60             172800     NA             NA
4     585.3             308        1.90           558.8      1.05           0.55             13560.4    0.04           0.02
5     476.1             242.1      1.97           239.7      1.99           1.01             39150.9    0.01           0.01
6     172800            147942.3   1.17           148655     1.16           1.00             172800     NA             NA
Geo.  3849.9            2447.7     1.57           3044.4     1.26           0.8              38212.2    0.10           0.06
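
The Geo. row and the speedup columns of Table V can be recomputed from the per-bug times. The following is a minimal sketch, assuming that the Geo. row is the geometric mean of each Time (s) column (with the 48-hour cap counted for a bug that was not found) and that each speedup is the ratio of the baseline time to the compared time. The function and variable names are illustrative and are not taken from the ProcessorFuzz artifacts.

```python
import math

# Per-bug detection times in seconds, copied from Table V.
# 172800 s is the 48-hour cap reported for a bug that was not found.
NO_COV = [464.9, 95695, 1520.1, 585.3, 476.1, 172800]
SELECTED = [230.2, 57441.3, 1474.5, 308, 242.1, 147942.3]


def geometric_mean(values):
    """Geometric mean computed as exp of the mean of logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def speedup(baseline_time, new_time):
    """Ratio of baseline time to new time; values above 1 mean the new run is faster."""
    return baseline_time / new_time


geo_no_cov = geometric_mean(NO_COV)
geo_selected = geometric_mean(SELECTED)

print(f"geo. mean no-cov:    {geo_no_cov:.1f} s")
print(f"geo. mean selected:  {geo_selected:.1f} s")
print(f"speedup over no-cov: {speedup(geo_no_cov, geo_selected):.2f}")
print(f"bug 1 speedup:       {speedup(NO_COV[0], SELECTED[0]):.2f}")
```

Under these assumptions, the computed geometric means should land within a fraction of a second of the Geo. row. A geometric mean is a natural choice here because the times span several orders of magnitude, so an arithmetic mean would be dominated by the two slowest bugs.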

TABLE VI: Existing RTL Fuzzers.

Fuzzer                     Input Format                   Coverage Metric                                       Evaluated RTL Designs                                         Bug Discovery Method
RFUZZ [33]                 A Series of Bits               Mux Toggle                                            Peripherals, RISC-V Processors (Sodor 1-3-5)                  Assertion
Li et al. [35]             A Series of Bits               Full Mux Toggle                                       Custom RISC-V Processor, OpenCore 1200                        Assertion
DIFUZZRTL [24]             Assembly                       Register Coverage                                     RISC-V Processors (BOOM, mor1kx, Rocket Chip)                 Golden Model
DirectFuzz [8]             A Series of Bits               Mux Toggle                                            Same as RFUZZ                                                 Assertion
Trippel et al. [57]        Byte Sequence                  Edge Coverage                                         RISC-V IP Cores                                               Golden Model, Assertion
TheHuzz [30]               Assembly                       Branch, Line, Statement, Expression, DFF Toggle, FSM  RISC-V Processors (Rocket Chip, CVA6), mor1kx, OpenCore 1200  Golden Model
HYPERFUZZER [45]           A Series of Bits               High-Level                                            Custom SoC                                                    Property Check
Logic Fuzzer [29]          A Series of Bits, Random Data  N/A                                                   RISC-V Processors (BlackParrot, BOOM, CVA6)                   Golden Model
ProcessorFuzz (this work)  Assembly                       Control Path Register, ISA-Sim Transition             RISC-V processors (BOOM, BlackParrot, Rocket Chip)            Golden Model

that control FSM coverage are identified, since the industrial tools are not open-sourced. We could not quantitatively compare ProcessorFuzz with TheHuzz as TheHuzz is not open-sourced.

The common goal of the aforementioned fuzzing works is to maximize coverage of an RTL design, thereby discovering bugs across the entire RTL design. Researchers have also proposed fuzzing frameworks for achieving alternate verification goals. For instance, DirectFuzz [8] adapts the notion of directed greybox fuzzing and applies it to RTL verification. The goal of DirectFuzz is to cover certain specific RTL regions with a targeted fuzzing approach. Here, the motivation is to dedicate more fuzzing time to the RTL components that need to undergo thorough testing. HYPERFUZZER [45] introduces a new grammar that represents hardware security properties. During fuzzing, HYPERFUZZER checks if any of the fuzzer-generated inputs violates a security property. Defining the security properties manually can be the most accurate approach if their correctness is verified. However, defining properties requires human expertise, which is error-prone. Logic Fuzzer [29] randomizes control signals and states of a DUT without compromising the functional correctness of the DUT. Logic Fuzzer needs to be provided with fuzzing targets (e.g., congestible points in an RTL design), and therefore requires domain expertise. INTROSPECTRE [17] and Osiris [60] use a blackbox fuzzing approach to discover microarchitectural side channels (e.g., Meltdown [37] and Spectre [32]) in processors.

VI. DISCUSSION AND LIMITATIONS

Other ISAs. In this work, we demonstrated the capability of ProcessorFuzz using the RISC-V ISA. However, CSRs are not specific to the RISC-V architecture; they are defined as part of many other ISAs, including x86. Therefore, ProcessorFuzz is not limited to RISC-V-based processors and can be used in processors based on other ISAs.

Unintended RTL Transitions. ProcessorFuzz uses ISA simulation as part of a feedback mechanism since it is faster and agnostic to the HDL. ProcessorFuzz does not use an input for RTL simulation if the input lacks a unique transition in its ISA simulation trace. One limitation of this design choice is that ProcessorFuzz can potentially miss bugs that follow this scenario: if a test input would result in an unintended transition in RTL simulation but the same test input does not cause any unique transition in ISA simulation, the test input will be discarded. Hence, the bug will not be identified.

VII. CONCLUSION

This work presents ProcessorFuzz, a processor fuzzer guided by a novel CSR-transition coverage feedback obtained from ISA simulation. ProcessorFuzz demonstrates that monitoring CSR transitions can effectively guide fuzzing towards buggy processor states. Moreover, using ISA simulation instead of RTL simulation can quickly eliminate inputs that result in the same coverage, thereby helping the fuzzer to test as many qualitatively different inputs as possible. Our experiments discovered eight new bugs in established, real-world RISC-V processors, and one new bug in a reference model.

ACKNOWLEDGMENT

Parts of this work are funded by the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under agreement number FA8650-18-2-7856, and by NSF Awards 2118628 and 1801052. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.


REFERENCES

[1] V. V. Acharya, S. Bagri, and M. S. Hsiao, “Branch guided functional test generation at the rtl,” in IEEE European Test Symposium, 2015, pp. 1–6.
[2] K. Asanović et al., “The rocket chip generator,” EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2016-17, 2016.
[3] J. Bachrach et al., “Chisel: Constructing hardware in a scala embedded language,” in Design Automation Conference, 2012, pp. 1212–1221.
[4] M. Bose, J. Shin, E. M. Rudnick, T. Dukes, and M. Abadir, “A genetic approach to automatic bias generation for biased random instruction generation,” in Proceedings of the Congress on Evolutionary Computation, vol. 1, 2001, pp. 442–448.
[5] C. Brubaker, S. Jana, B. Ray, S. Khurshid, and V. Shmatikov, “Using frankencerts for automated adversarial testing of certificate validation in ssl/tls implementations,” in IEEE Symposium on Security and Privacy, 2014, pp. 114–129.
[6] Cadence, “JasperGold Formal Verification Platform,” 2019.
[7] Cadence, “Circuit Simulation,” https://www.cadence.com/en_US/home/tools/custom-ic-analog-rf-design/circuit-simulation.html, 2022.
[8] S. Canakci, L. Delshadtehrani, F. Eris, M. B. Taylor, M. Egele, and A. Joshi, “Directfuzz: Automated test generation for rtl designs using directed graybox fuzzing,” in Design Automation Conference, 2021.
[9] C. Celio, D. A. Patterson, and K. Asanović, “The berkeley out-of-order machine (boom): An industry-competitive, synthesizable, parameterized risc-v processor,” EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2015-167, Jun 2015.
[10] M. Chen and P. Mishra, “Property learning techniques for efficient generation of directed tests,” IEEE Transactions on Computers, vol. 60, no. 6, pp. 852–864, 2011.
[11] R. R. Collins, “The Pentium F00F bug. Dr. Dobb’s Journal,” https://drdobbs.com/embedded-systems/the-pentium-f00f-bug/184410555, 1998.
[12] G. Dessouky, D. Gens, P. Haney, G. Persyn, A. Kanuparthi, H. Khattri, J. M. Fung, A.-R. Sadeghi, and J. Rajendran, “Hardfails: Insights into software-exploitable hardware bugs,” in USENIX Security Symposium, 2019, pp. 213–230.
[13] A. Edelman, “The mathematics of the pentium division bug,” SIAM Review, vol. 39, no. 1, pp. 54–67, 1997.
[14] S. Fine and A. Ziv, “Coverage directed test generation for functional verification using bayesian networks,” in Design Automation Conference, 2003, pp. 286–291.
[15] I. Futurewei Technologies, “force-riscv,” https://github.com/openhwgroup/force-riscv, 2020.
[16] R. Gal, E. Haber, W. Ibraheem, B. Irwin, Z. Nevo, and A. Ziv, “Automatic scalable system for the coverage-directed generation (cdg) problem,” in Design, Automation & Test in Europe Conference & Exhibition, 2021, pp. 206–211.
[17] M. Ghaniyoun, K. Barber, Y. Zhang, and R. Teodorescu, “Introspectre: A pre-silicon framework for discovery and analysis of transient execution vulnerabilities,” in International Symposium on Computer Architecture, 2021, pp. 874–887.
[18] Google, “Oss-fuzz: Continuous fuzzing for open source software,” https://github.com/google/oss-fuzz, 2016.
[19] Google, “American fuzzy lop,” https://github.com/google/AFL, 2017.
[20] Google, “Riscv-dv,” https://github.com/google/riscv-dv, 2021.
[21] S. Group, “Shakti aapg,” https://gitlab.com/shaktiproject/tools/aapg/-/wikis/Wiki, 2018.
[22] A. Hazimeh, A. Herrera, and M. Payer, “Magma: A ground-truth fuzzing benchmark,” ACM POMACS, vol. 4, no. 3, pp. 1–29, 2020.
[23] V. Herdt, D. Große, E. Jentzsch, and R. Drechsler, “Efficient cross-level testing for processor verification: A risc-v case-study,” in Forum for Specification and Design Languages (FDL), 2020, pp. 1–7.
[24] J. Hur, S. Song, D. Kwon, E. Baek, J. Kim, and B. Lee, “Difuzzrtl: Differential fuzz testing to find cpu bugs,” in Security and Privacy, 2021, pp. 1286–1303.
[25] Intel, “Intel Annual Report,” https://www.intel.com/content/www/us/en/history/history-1994-annual-report.html, 1994.
[26] Intel, “Machine check error avoidance on page size change,” https://www.intel.com/content/www/us/en/developer/articles/troubleshooting/software-security-guidance/technical-documentation/machine-check-error-avoidance-page-size-change.html, 2022.
[27] A. Izraelevitz et al., “Reusability is firrtl ground: Hardware construction languages, compiler frameworks, and transformations,” in International Conference on Computer-Aided Design, 2017, pp. 209–216.
[28] Y. Jang, S. Lee, and T. Kim, “Breaking kernel address space layout randomization with intel tsx,” in ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 380–392.
[29] N. Kabylkas, T. Thorn, S. Srinath, P. Xekalakis, and J. Renau, “Effective processor verification with logic fuzzer enhanced co-simulation,” in International Symposium on Microarchitecture, 2021, pp. 667–678.
[30] R. Kande, A. Crump, G. Persyn, P. Jauernig, A.-R. Sadeghi, A. Tyagi, and J. Rajendran, “TheHuzz: Instruction fuzzing of processors using Golden-Reference models for finding Software-Exploitable vulnerabilities,” in USENIX Security Symposium, 2022, pp. 3219–3236.
[31] G. Klees, A. Ruef, B. Cooper, S. Wei, and M. Hicks, “Evaluating fuzz testing,” in ACM SIGSAC CCS, 2018, pp. 2123–2138.
[32] P. Kocher et al., “Spectre attacks: Exploiting speculative execution,” in Symposium on Security and Privacy, 2019, pp. 1–19.
[33] K. Laeufer, J. Koenig, D. Kim, J. Bachrach, and K. Sen, “Rfuzz: Coverage-directed fuzz testing of rtl on fpgas,” in International Conference on Computer-Aided Design, 2018, pp. 1–8.
[34] Y. Lee and H. Cook, “riscv-torture,” https://github.com/ucb-bar/riscv-torture, 2015.
[35] T. Li, H. Zou, D. Luo, and W. Qu, “Symbolic simulation enhanced coverage-directed fuzz testing of rtl design,” in International Symposium on Circuits and Systems, 2021, pp. 1–5.
[36] Y. Li, S. Ji, Y. Chen, S. Liang, W.-H. Lee, Y. Chen, C. Lyu, C. Wu, R. Beyah, and P. Cheng, “Unifuzz: A holistic and pragmatic metrics-driven platform for evaluating fuzzers,” in USENIX Security, 2021.
[37] M. Lipp et al., “Meltdown: Reading kernel memory from user space,” in USENIX Security Symposium, 2018, pp. 973–990.
[38] LLVM, “libfuzzer,” https://llvm.org/docs/LibFuzzer.html#corpus, 2021.
[39] V. J. M. Manès, H. Han, C. Han, S. K. Cha, M. Egele, E. J. Schwartz, and M. Woo, “The art, science, and engineering of fuzzing: A survey,” IEEE Transactions on Software Engineering, 2019.
[40] L. Martignoni, R. Paleari, G. F. Roglia, and D. Bruschi, “Testing cpu emulators,” in Proceedings of the International Symposium on Software Testing and Analysis, 2009, pp. 261–272.
[41] Microsoft, “onefuzz,” https://github.com/microsoft/onefuzz, 2020.
[42] C. Min, S. Kashyap, B. Lee, C. Song, and T. Kim, “Cross-checking semantic correctness: The case of finding file system bugs,” in Proceedings of the Symposium on Operating Systems Principles, 2015, pp. 361–377.
[43] MITRE, “Hardware design CWEs,” https://cwe.mitre.org/data/definitions/1194.html, 2019.
[44] D. Moundanos, J. A. Abraham, and Y. V. Hoskote, “Abstraction techniques for validation coverage analysis and test generation,” IEEE Transactions on Computers, vol. 47, no. 1, pp. 2–14, 1998.
[45] S. K. Muduli, G. Takhar, and P. Subramanyan, “Hyperfuzzing for soc security validation,” in Proceedings of the International Conference on Computer-Aided Design, 2020, pp. 1–9.
[46] R. Mukherjee, D. Kroening, and T. Melham, “Hardware verification using software analyzers,” in Computer Society Annual Symposium on VLSI, 2015, pp. 7–12.
[47] G. Nativ, S. Mittennaier, S. Ur, and A. Ziv, “Cost evaluation of coverage directed test generation for the ibm mainframe,” in Proceedings International Test Conference, 2001, pp. 793–802.
[48] D. Petrisko, F. Gilani, M. Wyse, D. C. Jung, S. Davidson, P. Gao, C. Zhao, Z. Azad, S. Canakci, B. Veluri, T. Guarino, A. Joshi, M. Oskin, and M. B. Taylor, “Blackparrot: An agile open-source risc-v multicore for accelerator socs,” IEEE Micro, vol. 40, no. 4, pp. 93–102, 2020.
[49] A.-R. Sadeghi, J. Rajendran, and R. Kande, “Organizing the world’s largest hardware security competition: Challenges, opportunities, and lessons learned,” in Proceedings of the Symposium on VLSI, 2021, pp. 95–100.
[50] O. Sahin, A. K. Coskun, and M. Egele, “Proteus: Detecting android emulators from instruction-level profiles,” in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2018, pp. 3–24.
[51] SIEMENS, “Modelsim,” https://eda.sw.siemens.com/en-US/ic/modelsim/, 2022.
[52] W. Snyder, “Verilator, a Verilog/Systemverilog simulator and compiler,” https://www.veripool.org/verilator/, 2018.
[53] R.-V. Software, “Spike RISC-V ISA Simulator,” https://github.com/riscv-software-src/riscv-isa-sim, 2019.
[54] G. Squillero, “Microgp—an evolutionary assembly program generator,” Genetic Programming and Evolvable Machines, vol. 6, no. 3, pp. 247–263, 2005.
[55] S. Tasiran, F. Fallah, D. G. Chinnery, S. J. Weber, and K. Keutzer, “A functional validation technique: biased-random simulation guided by observability-based coverage,” in Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors, 2001, pp. 82–88.
[56] E. Technologies, “Dromajo - Esperanto Technology’s RISC-V Reference Model,” https://github.com/chipsalliance/dromajo, 2019.
[57] T. Trippel, K. G. Shin, A. Chernyakhovsky, G. Kelly, D. Rizzo, and M. Hicks, “Fuzzing hardware like software,” arXiv preprint arXiv:2102.02308, 2021.
[58] J. Van Bulck et al., “Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution,” in Usenix Security Symposium, 2018.
[59] I. Wagner, V. Bertacco, and T. Austin, “Stresstest: an automatic approach to test generation via activity monitors,” in Design Automation Conference, 2005, pp. 783–788.
[60] D. Weber, A. Ibrahim, H. Nemati, M. Schwarz, and C. Rossow, “Osiris: Automated discovery of microarchitectural side channels,” in USENIX Security Symposium, 2021, pp. 1415–1432.
[61] R. Wojtczuk, “PV Privilege Escalation,” https://lists.xen.org/archives/html/xen-announce/2012-06/msg00001.html, 2012.
[62] C. Wolf, “Yosys open synthesis suite,” https://yosyshq.net/yosys/, 2014.
