Source 274654fd... — STIMSMITH

SOURCE ARCHIVE

SHA256: 274654fdf2f64c192ef588a57eccd504b4b7725d3f447885266bfc5510535ed7

URL: https://masc.soe.ucsc.edu/docs/micro21.pdf

TYPE: application/pdf

SIZE: 790.3 KB

FETCHED: 5/28/2026, 10:27:46 PM

EXTRACTOR: liteparse

CHARS: 87,335

EXTRACTED CONTENT

87,335 chars

 Effective Processor Verification with Logic Fuzzer Enhanced
                        Co-simulation

Nursultan Kabylkas Tommy Thorn Shreesha Srinath∗ nkabylka@ucsc.edu tommy.thorn@esperantotech.com shreesha.srinath@intel.com UC Santa Cruz Esperanto Technologies Intel Santa Cruz, CA, USA Mountain View, CA, USA Portland, OR, USA Polychronis Xekalakis∗ Jose Renau pxekalakis@nvidia.com renau@ucsc.edu Nvidia UC Santa Cruz Portland, OR, USA Santa Cruz, CA, USA ABSTRACT CCS CONCEPTS The study on verification trends in the semiconductor industry • Computing methodologies → Model verification and vali- shows that the design complexity is increasing, fewer companies dation; Simulation evaluation; Simulation tools; • Computer achieve first silicon success and need more spins before production, systems organization → Architectures. companies hire more verification engineers, and 53% of the whole hardware-design-cycle is spent on the design verification [18]. The KEYWORDS cost of the respin is high, and more than 40% of the cases that con- microprocessor verification, co-simulation, enhanced simulation, tribute to it are post-fabrication functional bug exposures [16]. The RISC-V study also shows that 65% of verification engineers’ time is spent on debug, test creation, and simulation [17]. This paper presents a ACM Reference Format: set of tools for RISC-V processor verification engineers that help to Nursultan Kabylkas, Tommy Thorn, Shreesha Srinath, Polychronis Xekalakis, expose more bugs before production and increase the productiv- and Jose Renau. 2021. Effective Processor Verification with Logic Fuzzer ity of time spent on debugging, test creation and simulation. We Enhanced Co-simulation. In The 54th Annual IEEE/ACM International Sym- present Logic Fuzzer (LF), a novel tool that expands the verification posium on Microarchitecture (MICRO-54), October 18–22, 2021, Athens, Greece. space exploration without the creation of additional verification ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/1122445.1122456 tests. The LF randomizes the states or control signals of the design- 1 under-test at the places that do not affect functionality. It brings the INTRODUCTION processor execution outside its normal flow to increase the number Modern microprocessors are complex systems. The study on ver- of microarchitectural states exercised by the tests. We also present ification trends in the semiconductor industry shows that the de- Dromajo, the state of the art processor verification framework for sign complexity is increasing, fewer companies achieve first silicon RISC-V cores. Dromajo is an RV64GC emulator that was designed success, companies hire more verification engineers, and 53% of specifically for co-simulation purposes. It can boot Linux, handle the whole hardware-design-cycle is spent on the design verifica- external stimuli, such as interrupts and debug requests on the fly, tion [18]. and can be integrated into existing testbench infrastructure with When verifying microprocessors, the common practice is to minimal effort. We evaluate the effectiveness of the tools on three build a co-simulation infrastructure [27, 28]. The co-simulation RISC-V cores: CVA6, BlackParrot, and BOOM. Dromajo by itself compares the design-under-test (DUT) execution against the high- found a total of nine bugs. The enhancement of Dromajo with the level software model of the design, also known as the “golden” Logic Fuzzer increases the exposed bug count to thirteen without model. The underlying idea is simple: when we run the code on creating additional verification tests. the DUT and the model, the architectural state of both must be the same at any given moment. In the case of a mismatch, reasons are investigated, and bugs are uncovered. ∗The contribution was made while at Esperanto Technologies To get confidence about the DUT correctness, engineers use various proxy metrics, such as automatic code-based and circuit- structure-based coverage, as well as manual implementation spe- Permission to make digital or hard copies of all or part of this work for personal or cific functional coverage [38]. Another typical metric is to track classroom use is granted without fee provided that copies are not made or distributed the bugs found per week. The verification team thoroughly studies for profit or commercial advantage and that copies bear this notice and the full citation the architecture specification and carefully designs the functional on the first page. Copyrights for components of this work owned by others than ACM coverage models. The plan must cover ample verification space and, must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a most importantly, consider the corner cases [20]. The team then fee. Request permissions from permissions@acm.org. sets the goals for the above-mentioned metrics, and the DUT is in- MICRO-54, October 18–22, 2021, Athens, Greece tensely stressed by adding numerous tests to regression, gradually © 2021 Association for Computing Machinery. uncovering bugs and increasing the coverage until defined goals ACM ISBN 978-1-4503-XXXX-X/18/06. . . $15.00 https://doi.org/10.1145/1122445.1122456 are achieved.

MICRO-54, October 18–22, 2021, Athens, Greece Kabylkas et al.

    In the context of the described state-of-the art verification prac-            the DUT. Our results show that the mere integration of Dromajo
     tices, we want to bring up two specific challenges. First, the de-         into the testbenches of existing open-source RISC-V cores exposed
        scribed infrastructure does not guarantee that the processor is   bugs and proved its effectiveness.
         bug-free when sending the design for fabrication. The study on         To the best of our knowledge, Dromajo is the only available RISC-
     verification trends shows that fewer companies achieve first sili-        V verification framework that supports checkpoints. The checkpoint
       con success due to increased complexity and need costly re-spins  is a snapshot of the processor’s architectural state that is taken after
         before production. Despite the immense amount of co-simulation          executing a certain amount of instructions. Dromajo can generate
        and achieved high coverage, 40% of the reasons that cause a re-            such checkpoints with arbitrary RISC-V applications and resume
   spin are functional bugs that escape to silicon [18]. These bugs get        them with the co-simulation enabled. It addresses the productivity
   exposed only during the silicon validation or even worse at the end-         of time spent on simulation. For example, the checkpointing capa-
   customer. The verification engineers call these bugs outlier bugs or        bilities of Dromajo can allow testing of lengthy running programs,
      simulation resistant super bugs. The outlier bugs exposed neither             such as SPEC benchmarks running on Linux in parallel. The way
        by random instruction streams nor by directed tests because the        to achieve this is to run the program on Dromajo standalone (fast)
       sequence of events for the bug to occur is too complicated. They               and dump N checkpoints along the run. These checkpoints are
       can only “be exposed by exercising the design outside its normal           then spawned across N different simulations and co-simulated in

flow or operating parameters" [5]. The second challenge is that verification requires large amounts of human effort and resources. It is reported that 53% of the whole hardware-design-cycle is spent on design verification. Out of that time, to reach the defined coverage goals, verification engineers spend 21% of their time generating tests and running them on the simulator. When bugs are uncovered, they spend 44% of their time on debugging [18]. This paper’s contribution is the novel Logic Fuzzer, a technique that brings the processor’s execution outside of its normal flow and increases the chances of finding outlier bugs in the simulation phase. The key in simulation-based verification is to develop the code sequence that will bring the processor into a buggy microar- chitectural state that results in an inconsistent architectural state with the golden model. The Logic Fuzzer increases the microarchi- tectural states reached without the engineering effort of developing new code sequences, hence increasing the productivity of time spent on the test creation. The overall idea is to randomize states or control signals in the DUT that does not affect the correctness. For example, the re-order buffer (ROB) may assert a full or stall signal even when it is not full. It is also possible to change the branch pre- dictor tables at any given moment or even insert the instructions in the mispredicted path. The Logic Fuzzer can change the number of cycles but does not corrupt the functionality or the program order. Our results show that atypical microarchitectural states created by Logic Fuzzer expose more bugs during the simulation phase. Logic Fuzzer is not to be confused with the fuzzing of input stim- uli [26, 30, 36]. It is the fuzzing of the actual logic. The inserted logic stirs up the execution paths while running the code and brings the processor outside its normal flow. It does not require a specialized type of code and operates independently of existing verification infrastructure. In addition to the previous contribution, we built the state of the art co-simulation framework for RISC-V cores. We called it Dromajo [39], and it is open-source released. Dromajo is an RV64GC emulator that was designed specifically for co-simulation purposes. It can boot Linux and handle external stimuli, such as interrupts and debug requests on the fly, and be integrated into existing testbench infrastructure with minimal effort. Dromajo addresses the need for productivity and shortens the debug-verify cycle. The co-simulation framework simplifies debugging because an engineer starts the investigation at the point closest to the divergence of the model and parallel. To sum up, this paper has the following contributions: • Propose and apply a novel Logic Fuzzer to find more bugs during the simulation phase. • Contribute Dromajo, the state of the art co-simulation frame- work to RISC-V community. It is the artifact of the project that we used to evaluate the effectiveness of Logic Fuzzer. • Exposure of thirteen bugs in three RISC-V cores: CVA6 [46], BlackParrot [3], and BOOM [6, 48]. Dromajo by itself found a total of nine bugs. The enhancement of Dromajo with the Logic Fuzzer increases the exposed bug count to thirteen without creating additional verification tests. We demonstrate that the presented tools are capable of exposing hardware malfunctions and inconsistencies with ISA that could prevent any complex software from running correctly. The paper also provides interesting observations related to Operating Systems. Three RISC-V cores that we used for evaluation have gone through several tapeouts and claim to boot and run Linux. More than half of the bugs found were OS related. Interestingly, a “well behaved" Linux will not have excercised most of the bugs. Our results show that being able to boot and run Linux is far from saying that the core is verified. The rest of the paper is organized as follows. Section 2 provides background concepts. Section 3 describes different types of Logic Fuzzer and implementation details. Section 4 goes over Dromajo and its main contributions. Section 5 briefly describes the evaluation methodology. We discuss the results in Section 6. Finally, we finish with the related works in Section 7 and conclude in Section 8. 2 BACKGROUND 2.1 Typical Verification Setup In simulation-based verification, the RTL model of the microproces- sor is translated into a high-level object-oriented software class [37]. The translated high-level model is then instantiated in the testbench along with the memory model. The memory gets prepopulated with the verification tests. After the simulation infrastructure is set up, we must come up with criteria to determine whether the tests that we are simulating pass or fail. The trivial option is the correctness checking by running directed tests. When a test completes execu- tion, the final result is checked against a pre-calculated answer. The test’s success or failure is determined based on the comparison of

Effective Processor Verification with Logic Fuzzer Enhanced Co-simulation MICRO-54, October 18–22, 2021, Athens, Greece

the test output and the pre-calculated answer. The directed tests 2.3.2 Trace comparison. Another way to implement a reference are heavily used in industry, but often the purpose of these tests is model checking setup is through trace comparison. This method to verify the basic functionality of the design or, on the contrary, to needs both of the models to be able to dump the execution logs. reach a well-thought-out corner case. For more than three decades, Typically, these logs contain information about program counter to test the design at a more rigorous level, designers both in in- flow and every single register/memory writeback. The traces are dustry and academia relied on the verification binaries that are then compared, and mismatches are flagged. This approach solves randomly generated. both of the issues mentioned above. 2.2 Random Instruction Generators Nevertheless, this technique fails to work when we test an ex- ternal stimulus, such as interrupts and debug requests. Due to Random Instruction Generator, also known as Test Program Gen- their asynchronous nature, an external stimulus can fire completely erator or Instruction Stream Generator, is a utility software that randomly during the execution. Because in the described infrastruc- generates randomized assembly instruction streams given the set ture, both models are running standalone and comparisons happen of configurations. The tests generated by the RIG sweeps a broad post factum, the single interrupt will cause execution logs to be range of implemented functionality. It can create complex test cases different [27]. that are hard to come up with for an engineer [2, 44]. 2.3.3 Co-simulation. To tackle the issue described in Section 2.3.2, Stressing the design-under-test with complete random instruc- an infrastructure that runs both models in parallel and supports tion streams can be thought of as testing the system in “breadth.” communication between the models must be built. The cores simul- Complete randomness does not provide control over the generated taneously start executing the same code and pass messages to one tests. The probability of hitting a bug that is hidden “deep down” another. In general, we must support two types of messages. under the complex interactions among different units is very low. First, at the specified event, for example, when an instruction is To close this gap, some RIGs give the ability to have control over committed, the RTL model will signal the reference to commit an in- the generated tests through test program templates. The template is struction and compare the states of interest. The failed comparison an abstract description of the test and describes a set of constraints immediately halts the execution, and the stimulus that caused it is that the generator should satisfy. Hence, it is giving the ability to reported. This approach simplifies debugging because an engineer manage the direction and the “depth” of the generated tests [10, 41]. starts the investigation at the point closest to the divergence. The next question is: how do we determine if the verification Second, to support asynchronous interrupts, the setup must code failed or passed? Self-checking techniques are not applicable support the messaging that overwrites the emulator’s execution due to the random nature of the generated tests. We can solve this path. When the RTL flags an interrupt, it must be able to inform issue by building an infrastructure that compares the execution the emulator to follow its execution path [1]. with the reference model. 2.3 Reference Model Comparison 2.4 Formal Verification The reference, or the golden model, is the high-level software model Formal verification techniques have shown promising results [24] of a processor. The characteristics of such a model are that it is and are getting more popular [9, 43]. This type of verification ex- fast and uncomplicated, it does not reflect any details of the im- haustively examines all possible execution paths. It rigorously tries plementation and the changes to the architectural state happen in to prove or disprove the correctness of a formal model of a de- instruction-level granularity. sign. Nevertheless, due to scalability issues, formal methods are The reference model comparison is the verification technique applicable only on a modular level or on the designs of a moderate that, as the name suggests, compares the execution paths of the size and complexity [21]. To gain confidence in the system’s cor- implementation and the model. The underlying idea is simple: when rectness with high complexity, such as a modern microprocessor, we run the code on the DUT and the model, the architectural state industry still heavily relies on the dynamic verification techniques of both must be the same at any given moment. We can implement or simulation-based verification. the comparison with the reference model in different ways with The simulation-based approach, on the other hand, is scalable different levels of complexity. but deals only with the finite set of execution paths. It can verify the system’s correctness on the finite number of states based on 2.3.1 End-of-simulation comparison. The end-of-simulation com- the stimuli that get fed into the system. For the microprocessors, parison is a cheap and simplistic setup. This method compares only this stimuli is a verification code-base that usually consists of the the architectural state once the test completes. In other words, the following three: (1) simple directed unit-tests, (2) real applications, same code is run both on the reference model and the RTL imple- and (3) randomly generated instructions. mentation. At the end of the simulation, we dump the register file 3 LOGIC FUZZER states and the memory of both and compare against each other. If any of the values do not match, the reasons are investigated. This Logic Fuzzer (LF) inserts small pieces of logic into the RTL imple- approach’s drawback is that a buggy behavior that got reflected mentation that does not break the microprocessor’s functionality. in the architectural state can be overwritten and hidden by later We can fuzz any logic that does not affect the functionality to reach correct execution. Besides, even when the mismatch is detected, a broader range of processor microarchitectural states. Next, we another problem is that it is challenging to debug as we might be discuss fuzzable logic that is typical to the modern microprocessors, very far from the point of divergence [25]. but the concept can be generalized to an arbitrary digital design.

MICRO-54, October 18–22, 2021, Athens, Greece Kabylkas et al.

3.1 Congestors: A Case for Fuzzing 3.2 Table Mutatators The simplest type of LF is a congestor. As shown in Figure 1, a Table mutators allow RTL memories to be mutated. For example, simple example of a congestor is an or-gate inserted at the full signal the tables of branch predictors can be freely fuzzed at any given of a FIFO module. The full signal gets activated even though the moment as they must not affect the correctness of the running condition for it to become full has not been satisfied. The congestor program. Other examples are the random invalidation of the cache is then randomly activated, which results in artificial backpressure. or TLB entries or the value fuzzing of already invalid entries. We could also locate the congestor at busy signals and ready-valid handshake signals. (a)

                                                                            (b)
                                                            FIFO
                KJ   QQ   Data                                              (c)
                                                full                              W0 W1 W2     W3 W4 W5   W6     W7

                                                                               1000      2000      3000
                          (a) Original Design
                                                                       Figure 2: CVA6’s L1 cache way/bank utilization without (a)
                                                            FIFO       tag array mutation and (b)(c) with tag array mutation
                J    Q    Data
                K    Q                                                      Taking an example from the application to CVA6, Figure 2 illus-
                                      Congest   full                   trates L1 cache way/bank utilization (stores only) when running
                                                                       over 50 random tests that were generated with Google’s riscv-dv
                                                                       tool[15]. We run this set of tests three times. The first run, row (a),
                    (b) Design After Congestor Insertion               shows the regular run with the table mutation off. As can be seen,
                                                                       given the memory locations that program requests, CVA6’s way

Figure 1: Congestor Logic Fuzzer placed at the FIFO’s full selection logic gives preference to way 0. In the verification phase, signal an engineer may notice this fact and might decide to stress under- utilized ways. Traditionally, the engineer would have to regenerate the binaries by configuring the random instruction generator to We claim that the inserted logic stirs up the execution while provide the memory requests in a specific manner. The problem running the code. We can prove this by showing that after inserting with this approach is that this may be a time-consuming process the congestor logic and running the same tests, the design is expe- and will require delving into the cache replacement policies’ details. riencing different behavior, and we can observe the new activity. Besides, the tool may not even support constraining the address We can observe this by measuring the Toggle Coverage. The signal generation. is said to be toggled if its value switched 0 → 1 and 1 → 0 at least The second and the third run, rows (b) and (c), illustrate that we once while executing the test. Toggle coverage is one of the proxy can mutate the tag arrays and the valid bits to stir all the cache metrics that is used both in industry [38] and academia [26] to gain accesses to the bank of interest with the minimum amount of effort. confidence about the correctness of the design-under-test. To be specific, we edited five lines of code on the RTL side to replace For example, in the case of BOOM, we inserted a congestor at tag arrays with the wrappers that access Table Mutators through the ready signal of the Reorder Buffer. We then randomly pulled the DPI and the simplistic twelve-line method implementation in Table ready signal low at the moments when the ROB was, in fact, ready. Mutator class that mutates the entries to stress the cache bank of As a result, 12 additional signals toggled in the frontend module, 40 interest. signals toggled in the core module, and 32 signals toggled in the load-store-unit. 3.3 Stressing mispredicted path For instance, according to the comments in the RTL, the signal The branch predictor is a significant part of the design that has a that got activated in the load-store-unit (execute_ignore) “ignores significant effect on modern processors’ performance. The advances the next response that comes from memory and replays it.” An- in the branch prediction research have reached levels of accuracy other example is edge_inst signal in the fetchcontroller, which gets exceeding 95%. However, from the verification standpoint, this activated when “first instruction in the bundle is PC-2.” means that the mispredicted path may be overlooked. We demonstrated that single congestor can activate logic pieces It is important to test all of the instructions in the mispredicted that had not been touched when running over 200 tests. Note that path as some may have side effects. Figure 3 shows the instruction for this simplistic example, the usage of toggle coverage was suf- coverage in the mispredicted path. The y-axis of the plot represents ficient to capture and prove that Logic Fuzzer creates additional the number of unique RISC-V instructions that were speculatively activity. We discuss some drawbacks of the coverage metrics in 6.5. allowed into the pipeline and eventually flushed due to the correct

Effective Processor Verification with Logic Fuzzer Enhanced Co-simulation                                         MICRO-54, October 18–22, 2021, Athens, Greece

branch resolution. After running over 200 tests on CVA6, we see                  0x80000000
that the coverage does not reach even a 60% level. The reason is
that the instruction sequence in the program exhibit nonrandom
behavior and the same instructions get into the mispredicted path                0x60000000
repeatedly. With the fuzzing, we can insert any instruction into the                                                                         Fuzzing
mispredicted path regardless of the binary. Not only can we test                 0x40000000                                                  No fuzzing
100% of the instructions, but we also can reach that coverage level
earlier.
                                                                                 0x20000000
      100
                                                                                     0     2   4 6  8  10 12 14                         16 18
                                                                                                    Tests
       80
                                                                                Figure 4: Instruction address ranges retrieved from Branch
       60                                                                       Target Buffer while executing random instructions.

       40
                                                                                systematic, and configurable, we embedded the LF into the existing
       20                                                                       Dromajo infrastructure. We extended the base set of APIs to provide
                                       Fuzzing                                  access to the fuzzers from the RTL through the DPI calls. The fuzzers
        0                              No fuzzing                               are configured by Dromajo’s JSON configuration file.
          0      50  100      150      200
          Test #
                                                                                        RISC-V RTL                OpenCosim
Figure 3: The coverage of instructions that were in CVA6’s                                                        with Fuzzer
mispredicted path                                                                     Commit
                                                                                   instruction        step()
               Another interesting question to answer is: what will happen if     Access branch
speculative execution mechanisms will start generating instruction               predictor table    access_table()
addresses that are atypical. Figure 4 is a scatter plot where each                                                    Mutate tables
data point represents a Branch Target Buffer’s (BTB) prediction of                                  access_table()
the PC address in a given test. The red marks are the PC predictions
from when no fuzzing was enabled. We can see that the predicted                                       step()
addresses are within a narrow range. On the one hand, by design,
the BTB is supposed to provide predictions based on the history
of resolved branch target addresses. Hence, it is constrained to                 Figure 5: RTL-Fuzzer interaction flow
the address range that is encoded in the .text section of the elf
file. On the other hand, the processor must be robust enough to                             Figure 5 illustrates the interaction between the RTL and the
handle non-typical cases. The data points illustrated in blue circles           fuzzers. The diagram demonstrates how the RTL implementation is
are the BTB’s predictions when running the same tests with the                  accessing the table mutator of the processor’s branch predictor. The
fuzzing enabled. It is possible to fuzz BTB entries and provide falsely         fuzzer object in the Dromajo is configured to allocate the table that
predicted addresses to a broader range or even provide random                   has the same size as the branch predictor. On the implementation
addresses at runtime. These scenarios can potentially create an                 side, instead of accessing the RTL memory model, it accesses the
iTLB page faults on the mispredicted path. The same technique can               table from the fuzzer through the DPI. As the simulation is running,
be applied to Return Address Stack.                                             the tables are fuzzed randomly or with specific patterns.
3.4   Can the LF’s states ever happen in the real                                     For the congestors, the verification engineer first needs to iden-
      world?                                                                    tify all of the design’s congestible points. The identification of these
                                                                                points can be consulted with the designer. After we have the list
Logic Fuzzer could create a microarchitectural state that no program            of congestible signals, the fuzzer object is configured to create the
could ever reach. Nevertheless, the co-simulation failures exposed              same number of congestor objects. Each congestor’s period and
by fuzzing are potential bugs. They serve as a red flag for the                 random seeds are configured in the JSON file.
engineer and must be proved or disproved to be a bug. The bugs                             To address the productivity issue, we implemented a proof-of-
presented in this paper were all confirmed by the designers.                    concept automatic insertion of congestors into BOOM core. It al-
3.5   Logic Fuzzer Implementation                                               lowed us to insert congestors by merely annotating the signals in
                                                                                RTL (one line of code per congestor). We achieved this by using
             The Logic Fuzzer as a concept can be implemented directly in the   Chiffre by Eldridge et al [12] that automatically instruments hard-
RTL code. However, to make the Logic Fuzzer integration clean,                  ware systems written in Chisel [4]. Their work leverages FIRRTL

Instruction coverage (%)

      Address space

MICRO-54, October 18–22, 2021, Athens, Greece Kabylkas et al.

compiler that allows traversal and transformation over the inter- caches, TLBs, and other memory elements will start the execution mediate representation(IR) of the digital circuit with passes [22]. It from the reset state. It is problematic because we lose microarchi- automatically breaks the annotated signal and inserts the congestor tectural states from which the bug could potentially be manifested. in between. Because Chiffre can only work with hardware descrip- We believe that Logic Fuzzer’s Table Mutators can partially close tions written in Chisel, this experiment was limited to BOOM core. this gap as we can pre-populate or randomize all the tables. We are currently applying the described concept using tools capable of transforming IRs generated from Verilog [42]. 4.2 Verification Flow with Dromajo To insert random instructions into the mispredicted path, we Figure 6 illustrates one possible way to implement a RISC-V core make use of the table mutators. We replace the instruction cache verification flow using Dromajo in five steps. The whole flow can tag and data arrays with the table mutators. The tables are written be broken down into two main parts: checkpoint generation (Steps and read by the instruction cache logic through the DPI in the same 1, 2, 3) and co-simulation (Steps 4, 5). manner as shown in Figure 5. We then force the Branch History Table to provide a taken prediction, and we force the Branch Target Buffer to provide the address with a specific tag. The fuzzer tables are then programmed to provide a random instruction stream when RIG tests 2 it sees that specific tag. Linux 1 OpenCosim 3 Checkpoints Unit tests 4 DROMAJO 4 Dromajo is an emulator designed for co-simulation with RTL proces- sors that implement the RISC-V RV64GC instruction-set. Dromajo 5 Memory provides a simple set of APIs that allow for flexible integration for OpenCosim RISC-V core RTL a variety of RTL processor implementations. instance DPI calls implementation Dromajo enables executing applications, such as benchmarks running on Linux, under fast software simulation (17 MIPS). It can generate checkpoints after a given number of cycles, and resuming Testbench such checkpoints for HW/SW co-simulation. The results prove this to be a compelling way to capture bugs, especially in combination Figure 6: Verification flow with Dromajo with randomized tests.

4.1 Checkpoint Definition 4.2.1 Checkpoint Generation. First, Dromajo accepts an arbitrary Dromajo checkpoints include the processor architectural state (reg- RISC-V ELF binary (Step 1). The flow has been productive when isters, CSRs), the memory, but also reprogram interrupts (PLIC/CLINT), using randomly generated tests. We then run Dromajo stand-alone and performance counters such as the cycle and instructions exe- (Step 2) for a certain amount of time and dump the model’s whole cuted. This is achieved by creating a small valid bootrom. Check- architectural state to a checkpoint (Step 3). points in Dromajo consist of two parts: (a) memory image and (b) bootrom image. The created bootrom is a valid RISC-V program. It 4.2.2 Step and Compare. As shown in Figure 6, Dromajo gets in- leverages the RISC-V debug spec that allows changing many super- stantiated and encapsulated in the RTL as a submodule. When the visor registers. Since most RV64 CPUs support the RISC-V debug simulation starts, we load the checkpoint into both models’ main spec, the checkpoints created by Dromajo can be shared across dif- memories (Step 4). Dromajo instance is provided with the path to ferent CPUs without requiring changes in the initialization beyond the checkpoint location. At the same time, the RTL model has to loading a different memory and bootrom. populate the main memory and initialize the content through Ver- Checkpoint based co-simulation allows to create portable stimu- ilog function like readhex. Once the boot code completes running, lus and increases productivity without the need to recompile vari- both cores will have identical architectural states. ous benchmarks. A common use is to split the Linux boot sequence in several checkpoints to speed up verification. Other advantages 4.3 Dromajo Integration of using checkpoints include: (a) apply concepts of phase analysis Dromajo is compiled into a shared library. We then link this library and simulation points to capture important phases in a portable to a simulator and interact with Dromajo through DPI calls from format; and (b) allows a long-running program to be checkpointed the Verilog. The implementation of DPI functions is trivial. Mainly, and run in parallel which reduces the simulation costs. they serve the role of wrappers for Dromajo’s set of API functions. We could leverage the concept of Phase Analysis [35] and check- Below, we explain the only three functions that we need to call to point the benchmarks at the simulation points [34]. We then can integrate Dromajo into the verification flow. co-simulate, for example, SPEC benchmarks in a fast manner by Figure 7 depicts the interactions that happen between RTL and loading the simulation points that represent different phases of the Dromajo during the simulation. The DPI wrapper-function cosim_init() program. is invoked within an initial Verilog block. In turn, the cosim_init Despite the above-mentioned benefits, one disadvantage of co- invokes Dromajo’s initialization API function. It passes a path to simulation with checkpoints is that the branch predictor tables, the configuration file as an argument and initializes Dromajo. The

Effective Processor Verification with Logic Fuzzer Enhanced Co-simulation MICRO-54, October 18–22, 2021, Athens, Greece

RISC-V RTL        OpenCosim                                                 during the simulation. We instead prepopulate the memories before
                                                                            the simulation start.
                                                                                      By using Dromajo checkpointed infrastructure, we had a fully

Load deterministic architectural state in the evaluated cores. Independent checkpoint of this work, it may be interesting work to explore a DTM 2.0 where

                                                                            memory accessed to bring up binaries or model fake system calls.

Commit cosim_init() blocking CSR operations are implemented instead of non-blocking instruction step() step() 5 EVALUATION METHODOLOGY Asynch 5.1 RISC-V Cores interrupt raise_interrupt() The proposed verification tools, i.e., Dromajo co-simulation en- step() hanced with Logic Fuzzer, were evaluated on three open-source RISC-V processors. The details about each tested core are listed step() Mismatch below and summarized in the Table 1. halt 5.1.1 CVA6. Previously known as Ariane and developed in ETH Zurich. The development and maintenance has been transferred to OpenHW Group. It is written in SystemVerilog. CVA6 is a 6-stage, Figure 7: RTL-Dromajo interaction flow single issue, in-order core which implements the 64-bit RISC-V instruction set. It is capable of booting Linux and was taped out in 22nm technology [46]. function returns a pointer to the initialized Dromajo RISC-V refer- ence model. The configuration file contains the path to a checkpoint 5.1.2 BlackParrot. Joint work of the University of Washington and and the core-specific information, such as a memory map. Boston University. It is written in SystemVerilog. The BlackParrot The DPI wrapper-function step() communicates program counter, is a single issue, in-order core which implements the 64-bit RISC- instruction and store-data to Dromajo. This function should be V instruction set. BlackParrot was written in SystemVerilog. It is called whenever a valid instruction is committed (Step 5). For exam- capable of booting linux and 4-core configuration of BlackParrot ple, when integrating Dromajo into BOOM infrastructure, the DPI was taped out in 12nm technology [3]. can be called by implementing simple monitor logic in the Reorder 5.1.3 BOOM. Developed and maintained at UC Berkeley’s Berke- Buffer module. If the instruction at the head of the buffer is valid and ley Architecture Research group. It is written in Chisel hardware ready to be committed, we call the DPI. Upon invocation, Dromajo construction language. It is a generator that can be configured to commits one instruction on its side and conducts a comparison of generate verilog BOOM designs with various levels of complexity. communicated data. The function returns a non-zero code in case The generated cores implement the 64-bit RISC-V instruction set. of a mismatch, and we abort the execution. We used default MediumBoomConfig which is a 2-wide, out-of-order Co-simulation is a synchronous process, but interrupts are not. core [48]. One of the more complex configurations of the BOOM Hence, we need a way to log that a core took an interrupt and force was taped out in 28nm technology. the control-flow inside Dromajo to do the same. It allows us to co-simulate the interrupt trap handler routines. The DPI wrapper- function raise_interrupt() does that. It communicates the cause and Features CVA6 BlackParrot BOOM sets the trap vector in Dromajo. Execution in-order in-order out-of-order 4.4 Deterministic RISC-V co-simulation Issue width 1 1 2 (MedConfig) Extenstions RV64GC RV64G RV64GC One of the prerequisites of the co-simulation is a deterministic Priv. modes M, S, U M, S, U M, S, U simulation infrastructure. A common RISC-V verification infras- Virt. memory SV39 SV39 SV39 tructure tends to use Debug Transport Module (DTM) [7] to load Table 1: Summary of the cores used for evaluation the test binaries to RTL and generate artificial system calls. Interestingly, our experiments show that usage of DTM brings the core into a nondeterministic architectural state which led to false-positive co-simulation mismatches. The interaction with the host device through the memory-mapped DTM is sensitive to the characteristics and utilization of the machine running the simulator. 5.2 Evaluation Metrics As a result, the simulation is sometimes not deterministic. Since To evaluate the proposed tools and methodology we use a simple, DTM is so common, Dromajo also supports it. However, Dromajo yet powerful metric — a precise number of bugs found. We first allows creating memory and bootram checkpoints which makes run a set of binaries on the base setup, i.e. testbench infrastructure usage of DTM unnecessary. Also, avoidance of DTM usage speeds with only Dromajo enabled. We then run the same set of binaries up simulation as we no longer spend time uploading the binary with enabled Logic Fuzzers, which expose additional bugs.

MICRO-54, October 18–22, 2021, Athens, Greece Kabylkas et al.

5.3 Test Binaries which performs arbitration between icache and dcache requests. We used verification binaries from two available resources: RISC- The randomized backpressure stirs up the arbiter’s states and locks V ISA tests [14] and random instruction streams generated with the grant signal indefinitely at 0, not allowing any of the requests Google’s riscv-dv tool [15]. Table 2 summarizes the simulated test to go down. binaries that we run to get the presented results. 6.2.3 Bug ID B11. The Black-Parrot microarchitecture defines a FIFO queue between the frontend and the backend of the core. The Core No. of ISA tests No. of random tests purpose of this FIFO is to enqueue specific commands, such as PC CVA6 228 120 redirect and state reset, from the backend to the frontend. This BlackParrot 215 150 bug is exposed when we insert the congestor at the FIFO’s ready BOOM 228 120 signal. We create artificial backpressure by randomly pulling this Table 2: Summary of the simulated tests FIFO’s ready signal low. When we run the tests with the inserted congestor, Dromajo catches the mismatch as BlackParrot starts com- mitting instructions with the wrong PC. According to the designer, BlackParrot’s backend cannot handle the backpressure. Because 6 EVALUATION the microarchitecture has no stalling points past the decode stage, some backend commands will be lost if the queue is not ready. 6.1 Main Results 6.2.4 Bug ID B12. This bug is exposed when we generate irregular We evaluate the effectiveness of the tools on three RISC-V cores: addresses from BlackParrot’s Branch Target Buffer (BTB). The BTB CVA6, BlackParrot, and BOOM. We summarize our findings in Ta- fuzzing generated the address that maps to off-chip memory. This ble 3. Dromajo by itself found a total of nine bugs. The enhancement scenario resulted in the complete freeze of the system. According to of Dromajo with the Logic Fuzzer increases the exposed bug count the designer, addresses that are routed to a tile must be responded to thirteen. Note that we did not create additional tests when en- to. However, BlackParrot decoded the address and routed it to a abling Logic Fuzzer. It used the same set of tests listed in Table 2 to specific device on the tile, such as cfg or clint. In the case when no expose additional bugs. device matched, it hanged. We demonstrated that the presented tools are capable of exposing hardware malfunctions that could prevent any complex software 6.3 Bugs found by Dromajo from running correctly. The paper also provides interesting obser- 6.3.1 Bug ID B1. This bug is the result of incorrect update logic vations related to Operating Systems (OS). Three RISC-V cores that implementation of debug control status register and was exposed we used for evaluation claim to boot and run Linux. More than half by Dromajo. The execution divergence occurs right after dret which of the bugs found were OS related. Interestingly, a “well behaved" should jump to the PC indicated by dpc CSR and in the privileged Linux will not have excercised most of the bugs. Our results show mode that is indicated by prv bits in dcsr CSR. Dromajo starts exe- that being able to boot and run Linux is far from saying that the cution in the user-mode while CVA6 starts executes the following core is verified. instruction in machine-mode. According to the designer, the confu- To gain insights on how Dromajo and Fuzzer works, we next sion came from the fact that the core should update prv bits to the describe the bugs found. current running privileged level when entering debug mode. 6.2 Bugs found by Dromajo+Logic Fuzzer 6.3.2 Bug ID B2. This bug is in CVA6’s integer divide unit. The unit 6.2.1 Bug ID B5. This bug surfaces out when we mutate the ITLB fails to properly handle some corner case divide (div) and remainder entries. The random mutation made the instruction TLB entry valid (rem) instructions; Dromajo caught the mismatch when cores were but the corresponding translation was mutated to a non-existent executing division of -1/1. Dromajo committed a correct result by memory region. As a result, both Dromajo and CVA6 threw an assigning -1 to the destination register, while CVA6 committed 0. exception and steered the execution flow to the exception handler. 6.3.3 Bug ID B3. According to RISC-V ISA, stval CSR is written Dromajo halted the execution when reading an mcause register exception-specific information when the processor traps into a within the handler due to the mismatch. When trapping, Dromajo supervisor-mode. The ISA explicitly specifies when and which correctly sets the mcause value to 1 which indicates the exception information must be written to the register. Dromajo catches a cause Instruction Access Fault. CVA6, on the other hand, incorrectly mismatch inside the exception handler when reading the value of set the value to 12 which indicates Instruction Page Fault. Accord- stval due to incorrect setting. ing to the designer, this is because CVA6 implementation aliases the access and page faults in the instruction front-end and treats 6.3.4 Bug ID B4. This bug brings about similar implementation everything as instruction page faults. inconsistancy within the ISA specification that is described in Bug 6.2.2 Bug ID B6. This bug is exposed when we create artificial ID B3 (6.3.3). The difference here is that mtval control status register backpressure at the FIFO’s full signal in the cache subsystem and is written incorrect value. results in the complete hang of the system. The purpose of this 6.3.5 Bug ID B7. This bug is in the BlackParrot’s integer divide FIFO is to queue memory requests that are coming from the icache. unit. The co-simulation failed when the BlackParrot commited divw The full signal, in turn, is used to form a request logic for the arbiter, insruction, which is the 32-bit integer division. The bug manifests

Effective Processor Verification with Logic Fuzzer Enhanced Co-simulation        MICRO-54, October 18–22, 2021, Athens, Greece

 Bug ID   Core        Dr* Dr+LF**   Short description                                   Reported  Fixed
 B1       CVA6         ✓            incorrect update of prv bits in dcsr register          ✓        ✓
 B2       CVA6         ✓            incorrect integer division                             ✓
 B3       CVA6         ✓            stval CSR is written on ecall                          ✓
 B4       CVA6         ✓            mtval CSR is written on ecall                          ✓
 B5       CVA6           ✓          incorrect trap cause                                   ✓
 B6       CVA6           ✓          arbiter locks with gnt 0                               ✓
 B7       BlackParrot  ✓            integer divide, incorrect handling of sign-extension   ✓        ✓
 B8       BlackParrot  ✓            no exception handling on some illegal instructions     ✓        ✓
 B9       BlackParrot  ✓            least-significant-bit not cleared on jalr instruction  ✓        ✓
 B10      BlackParrot  ✓            speculative long latency instructions commit           ✓        ✓
 B11      BlackParrot    ✓          core hangs on access to irregular memory region        ✓        ✓
 B12      BlackParrot    ✓          backend backpressure breaks instruction ordering       ✓        ✓
 B13      BOOM         ✓            incorrect mtval CSR value on traps                     ✓        ✓
Table 3: Summary of the bugs exposed in three RISC-V cores. *Dr column refers to bugs found with Dromajo. **Dr+LF refers
Dromajo with Logic Fuzzer

on the remw instruction as well. The divw and remw are signed writes. Including the write to the CSR mepc, which gets set to 0x196. instruction. The proper implementation should divide the lower The last instruction of the setup code is mret, whose execution sup- 32-bits source registers by treating them as signed numbers, but posed to change the privileged mode and revert the program flow BlackParrot’s integer divide unit implementation was treating the to the instruction memory address pointed by mepc. Nevertheless, operands as if they were unsigned. mret throws an exception because of the instruction page fault. Ac- 6.3.6 Bug ID B8. This bug is in the BlackParrot’s instruction de- cording to ISA, this fault must set the mtval value to the address of coder. The decoder did not trap an invalid instruction, but passed the instruction that caused the exception. However, when we read it down the pipeline. The machine code that BlackParrot decided the value of this CSR in the exception handler, they are different. not to mark as an illegal was the the binary word similar to the en- The value that is set by BOOM is off by 2. According to the designer, coding of jalr instruction. The only difference was that sub-opcode the bug is due to the handling of compressed RISC-V instructions func3 was not equal to zero. The ISA defines jalr instruction with (RVC). Specifically, handling of exceptions on misaligned instruc- the subopcode of zero. The decoder had not perform any checks on tions appeared to be broken. The bug disappears in later commits func3 bits at the time of the bug detection. Although the mismatch of the BOOM. was detected for the case of jalr, in general, all instructions with invalid subopcodes should trigger an illegal instruction exception. 6.4 Bug Hunting with LF and False Positives 6.3.7 Bug ID B9. The bug is related to the calculation of the target The co-simulation mismatch is what tells if the bug was manifested address of the control flow instructions. The RISC-V ISA explicitly or not. We start debugging only if the activity created by LF propa- requires that the least significant bit of the calculated address by jalr gates to the architectural state and gets flagged by Dromajo. The instruction is cleared. The clearing of the bit was not implemented process of proving or disproving if the mismatch is an actual bug in the BlackParrot. Hence, Dromajo flagged the PC mismatch af- is no different from regular RTL debugging. We then discuss the ter the execution of jalr instructions. According to the designer, debugging results with the designer who confirms or rejects the the confusion came from the fact the jalr instruction is encoded validity of the finding. differently than jal and branch instructions. Logic Fuzzer had two false bugs (not presented in the paper) – 6.3.8 Bug ID B10. This bug is the result of an incorrect poison one in CVA6 and one in BOOM. Let us mention that traditional bit setting. Dromajo flagged a mismatch on the load instruction. verification practices have the same issue of false positives. The Dromajo trace analysis proved that there was only a single store- fact that we find bugs in an automatic way does not qualitatively instruction to writing to the memory address. However, the follow- affect this issue, only quantitatively. ing load-instruction brought a different value from what had been written. According to the designer, the values were overwritten by 6.5 Toggle Coverage long latency instructions that were marked for flushing. The bug The signal is said to be toggled if its value switched 0 → 1 and would manifest when the pipeline flushed on exceptions. Then at 1 → 0 at least once while executing the test. Toggle coverage is some point, the long latency instruction completed and allowed one of the proxy metrics that is used both in industry [23, 38] write-back due to the invalid poison bit. and academia [26] to gain confidence about the correctness of the 6.3.9 Bug ID B13. This bug gets exposed when running a random design-under-test. instruction stream and Dromajo flags a mismatch when reading Figure 8 illustrates how the toggle coverage increases as we run mtval CSR. The cores start running the binary in an M-privileged the verification binaries. Logic Fuzzer increased the toggle coverage mode to execute the setup instructions, such as a series of CSR on average by 1%.

MICRO-54, October 18–22, 2021, Athens, Greece                                                                                     Kabylkas et al.

        100                                                               the instruction-by-instruction comparison in the presence of an
                                                                          external stimulus.
         50      Fuzzing                                                            The tool proposed by Herdt et al. [19] have the co-simulation
                 No fuzzing                              Ariane           component in their flow. The novelty of their approach is that
        1000 0   40     80     120   160     200     240  280  320        the instruction stream generator and co-simulation come in one
                                                                          package. They present a testbench infrastructure where instructions
                                                                          are generated at run-time and co-simulated endlessly. Like Ibex
         50      Fuzzing                                                  Core Verification flow this flow does not support the handling of
                 No fuzzing                              Black Parrot     the asynchronous stimuli.
          0                                                                      Whisper [11] is the instruction set simulator developed by West-
        100 0    40     80     120  160  200     240 280  320 360         ern Digital. It can also be used in the co-simulation environment.
         50                                                               The difference with Open-Cosim is that it supports only RV32, it
                 Fuzzing                                                  doesn’t handle interrupts, and has no support for checkpoints.
          0      No fuzzing                               BOOM                There are several general resources available for verification of a
            0    40     80     120   160     200     240  280  320        RISC-V processors. The GitHub repository developed in UC Berke-
                                     Test #                               ley has the set of unit tests that sweeps through the base instructions
                                                                          defined in ISA [14]. The RISC-V International Association has also
Figure 8: Coverage increase when running verification bina-               established a Compliance Task Group [13]. Similar to [14], they
ries                                                                      only check for the basic functionality, but it is an attempt to formal-
                                                                          ize the compliance process. Currently only the RV32I ISA subset
                                                                          is completed. In addition, there are several open-source random
                                                                          instruction generators available [10, 15, 31, 33].
    Although increased coverage is a beneficial side-effect of fuzzing,        Finally, there is a set of works that use the term “co-simulation"
we want to emphasize that increasing coverage in and of itself            in the context of heterogeneous simulation frameworks [8, 45]. In
is NOT the purpose of the Logic Fuzzer. The purpose is to                 these settings, the software model is a part of a testbench and is
create an irregular execution flow, which, most of the time, is           used to drive stimuli to a design-under-test. These should not be
not captured by the coverage metrics. Toggle and similar types            confused with the co-simulation concept presented in this paper as
of code coverage are not full indicators that the system is verified.     we are describing it in the context of comparison with the golden
They are just proxy metrics.                                              model.
         For example, from the bugs found by the Logic Fuzzer, only one
(B12) is directly correlated to the toggle coverage. The remaining        7.2    Logic Fuzzer Related Work
three bugs do not correlate with toggle coverage. The bugs were
detected because of the randomized events created by LF. It was           7.2.1    Input-stimuli fuzzing. Inspired by software verification tech-
a combination of signals and states that even with a 100% toggle          niques, several works adapted the methods for hardware verifi-
coverage may not have been detected.                                      cation. RFUZZ transferred the concept of American Fuzzy Lop
                                                                          to hardware [26]. Trippel et al. [40], on the contrary, explore the
7       RELATED WORK                                                      methods to transfer the design to the software model and apply
7.1     Dromajo Related Work                                              well-established software fuzzing techniques in software domain.
                                                                          The technique in the PyMTL infrastructure adapted Hypothesis
Imperas Software Ltd. is a commercial company that develops vir-          Testing [36]. It is property-based testing that requires to construct
tual platforms supporting a range of ISAs, including RISC-V [29].         assertions that must always hold. The technique then tries to find
They claim to support step-and-compare simulation capability [28].        the minimal possible example that breaks the assertion. Similar
Imperas provide the RISC-V core models under the Apache 2.0. li-          method presented in [30]. These techniques should not be asso-
cense. However the model is attached to the simulator, which they         ciated with Logic Fuzzer as all of them stress the DUT externally
licensed under OVP Fixed Platform Kit. The difference of Dromajo          having an “outside-in” approach. The LF proposes an “inside-out”
with the Imperas’ solution is the ability to handle checkpoints. An-      approach, meaning the actual RTL logic is fuzzed wherever possible.
other difference is that the whole Dromajo is licensed under Apache
2.0.                                                                      7.2.2       Fault Injection. On a surface level a parallel can be drawn
       lowRISC created a verification flow for their 32-bit RISC-V core   between the concepts of Logic Fuzzer and Fault-Injections [12]. A
Ibex [27]. The infrastructure that they set up executes the binary of     fault is a physical defect or a flaw that may occur in a hardware
interest both on the RTL implementation and the respective golden         system. As the name suggests, Fault-Injection is a simulation-based
model. The models are completely decoupled and run the binary             procedure when the fault is injected into the system on purpose. The
independently. The correctness checking happens post-completion           simulation is then run with the fault and the behavior is observed.
by comparing the execution traces. In this setup, the golden model        The idea is to prevent the system from complete failure, even in
is completely oblivious to the activities happening on the RTL side.      the presence of faults. The overlapping concept with the LF is
An external stimulus, such as interrupts and debug requests, will         the purposeful injection of logic into the system that changes the
cause the traces to diverge. Therefore, this flow cannot support          behavior.

Toggle coverage (%)

Effective Processor Verification with Logic Fuzzer Enhanced Co-simulation MICRO-54, October 18–22, 2021, Athens, Greece

           Nevertheless, the fundamental difference is that LF makes sure   9 ACKNOWLEDGMENTS

that the inserted logic does not have any side effects on functionality. We like to thank the reviewers for their feedback on the paper. We On the one hand, system failures detected by fault-injection are would also like to express our deepest gratitude to the designers of analyzed. As a result, preventive or corrective actions are proposed. the processors-under-test for their readiness to address our ques- On the other hand, system failures detected by Logic Fuzzer are tions. Namely, we acknowledge Florian Zaruba from CVA6 team, flagged as potential functional bugs. Jerry Zhao and Abraham Gonzalez from BOOM team, Dan Petrisko and Farzam Gilani from Black Parrot team. This work was sup- ported in part by grants from Google. This material is based upon work supported by, or in part by, the Army Research Laboratory and 8 CONCLUSION AND FUTURE WORK the Army Research Office under contract/grant W911NF1910466. In this paper, we presented Logic Fuzzer, a methodology that brings the processor execution outside its normal flow by randomizing the microarchitectural state at the places that do not affect functionality. We presented several Logic Fuzzer variants and showed that it could uncover additional bugs in the simulation phase by creating atyp- ical scenarios without the generation of additional tests. Besides, we presented Dromajo, a state of the art verification framework for RISC-V cores which addresses simulation productivity issue through checkpointing. We illustrated the effectiveness of Dromajo by exposing nine bugs in CVA6, BlackParrot, and BOOM. The enhancement of Dro- majo with Logic Fuzzer exposes additional two bugs in CVA6 and two bugs in BlackParrot. LogicFuzzer was not able to find additional bugs in BOOM. However, we demonstrated the applicability and effortless integration of the tools into existing testbench environ- ments. Logic Fuzzer found difficult bugs. In three cases, the bugs were not directly correlated with toggle coverage, which means that the tests exercised the associated logic. The bugs were still there because a combination of events was needed to reach the bug condition. Logic Fuzzer exposes these bugs. We are currently investigating several ways of improving the effectiveness of Logic Fuzzer. The items that we are working on include the identification and testing of other fuzzable logics in the microprocessor, such as reordering of outstanding memory requests and randomization of fixed priority muxes and arbiters. We are also looking into training a machine learning model to orchestrate the inserted Logic Fuzzers in the system. We believe that Dromajo and Logic Fuzzer are great tools to help the RISC-V ecosystem have a more robust infrastructure. Besides the presented contributions, this work opens up new research oppor- tunities and could be extended further in new areas. To mention a few, (1) Logic Fuzzer can be integrated with tools like Coppelia [47]. Coppelia uses symbolic execution techniques to generate exploits that will bring the processor into a vulnerable state. Although this work is proposed in the context of hardware security, we believe it can be extended towards functional verification. We could po- tentially provide Coppelia with the failing case exposed by the Logic Fuzzer and let it try to generate the code. The successful generation of the exploit will prove the presence of the real bug. (2) Integrating Logic Fuzzer in the fully elastic systems [32] could expose even more bugs than in non-elastic designs. (3) The RISC-V checkpoints created by Dromajo could also be leveraged for works that do statistical sampling for performance and power. To conclude, we expect the developed infrastructure to enhance the quality of existing RISC-V cores, and Logic Fuzzer techniques to be applied beyond RISC-V.

MICRO-54, October 18–22, 2021, Athens, Greece Kabylkas et al.

REFERENCES Verification, Ahmed Bouajjani and Oded Maler (Eds.). Springer Berlin Heidelberg, [1] [n.d.]. [information is disclosed per double-blind peer review guidelines]. Berlin, Heidelberg, 414–429. [2] W. Anderson. 1992. Logical verification of the NVAX CPU chip design. In Proceed- [25] M. Kantrowitz and L. M. Noack. 1996. I’m done simulating; now what? Verifi- ings 1992 IEEE International Conference on Computer Design: VLSI in Computers cation coverage analysis and correctness checking of the DECchip 21164 Alpha Processors. 306–309. microprocessor. In 33rd Design Automation Conference Proceedings, 1996. 325–330. [3] Azad, Zahra; Delshadtehrani, Leila; Zhou, Boyou; Joshi, Ajay; Gilani, Farzam; [26] Kevin Laeufer, Jack Koenig, Donggyu Kim, Jonathan Bachrach, and Koushik Sen. Lim, Katie; Petrisko, Daniel; Jung, Tommy; Wyse, Mark; Guarino, Tavio; Veluri, 2018. RFUZZ: coverage-directed fuzz testing of RTL on FPGAs. In Proceedings of Bandhav; Wang, Yongqin; Oskin, Mark; Taylor, Michael. 2019. The BlackParrot the International Conference on Computer-Aided Design, ICCAD 2018, San Diego, Processor: An Open-Source Industrial-Strength RV64G Multicore Processor. (Mar CA, USA, November 05-08, 2018, Iris Bahar (Ed.). ACM, 28. https://doi.org/10. 2019). 1145/3240765.3240842 [4] J. Bachrach, H. Vo, B. Richards, Y. Lee, A. Waterman, R. Avižienis, J. Wawrzynek, [27] lowRISC. 2019. Ibex Core Verification. https://github.com/lowRISC/ibex/blob/ and K. Asanov. 2012. Chisel: Constructing hardware in a Scala embedded lan- master/doc/verification.rst guage. In DAC Design Automation Conference 2012. 1212–1221. [28] Imperas Software Ltd. [n.d.]. Imperas announce first reference [5] Brian Bailey, Harry Foster, Chirag Gandhi, Tom Anderson, Craig Shirley, Adam model with UVM encapsulation for RISC-V verification. https: Sherer, Rajesh Ramanujam, Roger Sabbaugh, Vigyan Singhal, and Kevin McDer- //www.imperas.com/articles/imperas-announce-first-reference-model- mott. 2018. When Bugs Escape). https://semiengineering.com/when-bugs-escape/ with-uvm-encapsulation-for-/risc-v-verification [6] Christopher Celio. 2018. A Highly Productive Implementation of an Out-of-Order [29] Imperas Software Ltd. [n.d.]. OVP RISC-V Solutions. https://www.ovpworld. Processor Generator. Ph.D. Dissertation. EECS Department, University of Califor- org/info_riscv nia, Berkeley. http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018- [30] M. Naylor and S. Moore. 2015. A generic synthesisable test bench. In 2015 151.html ACM/IEEE International Conference on Formal Methods and Models for Codesign [7] Chipyard. [n.d.]. 8.2. Communicating with the DUT. https://chipyard. (MEMOCODE). 128–137. readthedocs.io/en/latest/Advanced-Concepts/Chip-Communication.html [31] OpenHWGroup. 2020. https://github.com/openhwgroup/force-riscv [8] Shenghsun Cho, Mrunal Patel, Han Chen, Michael Ferdman, and Peter Milder. [32] R. T. Possignolo, E. Ebrahimi, H. Skinner, and J. Renau. 2016. Fluid Pipelines: 2018. A Full-System VM-HDL Co-Simulation Framework for Servers with PCIe- Elastic circuitry meets Out-of-Order execution. In 2016 IEEE 34th International Connected FPGAs (FPGA ’18). Association for Computing Machinery, New York, Conference on Computer Design (ICCD). 233–240. NY, USA, 87–96. https://doi.org/10.1145/3174243.3174269 [33] Anmol Sahoo. 2019. https://pypi.org/project/aapg/ [9] Joonwon Choi, Muralidaran Vijayaraghavan, Benjamin Sherman, Adam Chlipala, [34] Timothy Sherwood, Erez Perelman, Greg Hamerly, and Brad Calder. 2002. Auto- and Arvind. 2017. Kami: A Platform for High-Level Parametric Hardware Spec- matically Characterizing Large Scale Program Behavior. ACM SIGPLAN Notices ification and Its Modular Verification. Proc. ACM Program. Lang. 1, ICFP (Aug 37 (09 2002). https://doi.org/10.1145/605397.605403 2017), 30 pages. https://doi.org/10.1145/3110268 [35] Timothy Sherwood, Suleyman Sair, and Brad Calder. 2003. Phase tracking and [10] M. Chupilko, A. Kamkin, A. Kotsynyak, A. Protsenko, S. Smolov, and A. prediction. ACM Sigarch Computer Architecture News 31, 336– 347. https: Tatarnikov. 2018. Test Program Generator MicroTESK for RISC-V. In 2018 19th //doi.org/10.1145/871656.859657 International Workshop on Microprocessor and SOC Test and Verification (MTV). [36] Christopher Batten Shunning Jiang, Christopher Torng. 2018. An Open-Source 6–11. Python-Based Hardware Generation, Simulation, and Verification Framework. In [11] Western Digital. [n.d.]. Whisper Instruction Set Simulator. https://github.com/ Proceedings of the First Workshop on Open-Source EDA Technology (WOSET18). westerndigitalcorporation/swerv-ISS [37] Wilson Snyder. [n.d.]. Verilator. https://www.veripool.org/wiki/verilator [12] Schuyler Eldridge, Alper Buyuktosunoglu, and Pradip Bose. 2018. Chiffre: A [38] S. Tasiran and K. Keutzer. 2001. Coverage metrics for functional validation of Configurable Hardware Fault Injection Framework for RISC-V Systems. In 2nd hardware designs. IEEE Design Test of Computers 18, 4 (2001), 36–45. Workshop on Computer Architecture Research with RISC-V (CARRV ’18). [39] Esperanto Technologies. 2019. Dromajo - Esperanto Technology’s RISC-V Refer- [13] RISC-V Foundation. [n.d.]. https://github.com/riscv/riscv-compliance ence Model. https://github.com/chipsalliance/dromajo [14] RISC-V Foundation. 2013. https://github.com/riscv/riscv-tests [40] Timothy Trippel, Kang G. Shin, Alex Chernyakhovsky, Garret Kelly, Do- [15] Google. 2019. SV/UVM based instruction generator for RISC-V processor verifi- minic Rizzo, and Matthew Hicks. 2021. Fuzzing Hardware Like Software. cation. https://github.com/google/riscv-dv arXiv:2102.02308 [cs.AR] [16] Mentor Harry Foster. 2019. Part 12: The 2018 Wilson Research Group [41] I. Wagner, V. Bertacco, and T. Austin. 2005. StressTest: an automatic approach Functional Verification Study (IC/ASIC Verification Results Trends). to test generation via activity monitors. In Proceedings. 42nd Design Automation https://blogs.mentor.com/verificationhorizons/blog/2019/01/29/part-12- Conference, 2005. 783–788. the-2018-wilson-research-group-functional-verification-study/ [42] Sheng-Hong Wang, Rafael Trapani Possignolo, Qian Chen, Rohan Ganpati, and [17] Mentor Harry Foster. 2019. Part 7: The 2018 Wilson Research Group Func- Jose Renau. 2019. LGraph: A Unified Data Model and API for Productive Open- tional Verification Study (IC/ASIC Design Trends). https://blogs.mentor.com/ Source Hardware Design. In Open-Source EDA Technology, Proceedings of the verificationhorizons/blog/2019/01/29/part-7-the-2018-wilson-research-group- Second Workshop on (WOSET’19). functional-verification-study/ [43] Claire Wolf. 2017. SymbiYosys (sby) - Front-end for Yosys-based formal verifica- [18] Mentor Harry Foster. 2019. Part 8: The 2018 Wilson Research Group tion flows. https://github.com/YosysHQ/SymbiYosys Functional Verification Study (IC/ASIC Resource Trends). https: [44] D. A. Wood, G. A. Gibson, and R. H. Katz. 1990. Verifying a multiprocessor cache //blogs.mentor.com/verificationhorizons/blog/2019/01/29/part-8-the-2018- controller using random test generation. IEEE Design Test of Computers 7, 4 (1990), wilson-research-group-functional-verification-study/ 13–25. [19] Vladimir Herdt, Daniel Grosse, Eyck Jentzsch, and Rolf Drechsler. 2020. Efficient [45] Alberto Dassatti Xavier Ruppen, Roberto Rigamonti. 2018. Test Program Gen- Cross-Level Testing for Processor Verification: A RISC-V Case-Study. In Forum eration for Functional Verification of PowePC Processors in IBM. In Barcelona on Specification and Design Languages (FDL). RISC-V Workshop. [20] R. C. Ho, C. Han Yang, M. A. Horowitz, and D. L. Dill. 1995. Architecture validation [46] F. Zaruba and L. Benini. 2019. The Cost of Application-Class Processing: Energy for processors. In Proceedings 22nd Annual International Symposium on Computer and Performance Analysis of a Linux-Ready 1.7-GHz 64-Bit RISC-V Core in Architecture. 404–413. 22-nm FDSOI Technology. IEEE Transactions on Very Large Scale Integration [21] Bo-Yuan Huang, Hongce Zhang, Pramod Subramanyan, Yakir Vizel, Aarti Gupta, (VLSI) Systems 27, 11 (Nov 2019), 2629–2640. https://doi.org/10.1109/TVLSI.2019. and Sharad Malik. 2018. Instruction-Level Abstraction (ILA): A Uniform Specifi- 2926114 cation for System-on-Chip (SoC) Verification. ACM Trans. Des. Autom. Electron. [47] Rui Zhang, Calvin Deutschbein, Peng Huang, and Cynthia Sturton. 2018. End- Syst. 24, 1, Article 10 (dec 2018), 24 pages. https://doi.org/10.1145/3282444 to-End Automated Exploit Generation for Validating the Security of Processor [22] A. Izraelevitz, J. Koenig, P. Li, R. Lin, A. Wang, A. Magyar, D. Kim, C. Schmidt, Designs. In Proceedings of the International Symposium on Microarchitecture (MI- C. Markley, J. Lawson, and J. Bachrach. 2017. Reusability is FIRRTL ground: CRO). IEEE/ACM. Hardware construction languages, compiler frameworks, and transformations. [48] Jerry Zhao, Ben Korpan, Abraham Gonzalez, and Krste Asanovic. 2020. Sonic- In 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). BOOM: The 3rd Generation Berkeley Out-of-Order Machine. (May 2020). 209–216. https://doi.org/10.1109/ICCAD.2017.8203780 [23] Jing-Yang Jou and Chien-Nan Liu. 1999. Coverage Analysis Techniques for HDL Design Validation. IEEE Asia Pacific Conference on Chip Design Languages (01 1999). [24] Roope Kaivola, Rajnish Ghughal, Naren Narasimhan, Amber Telfer, Jesse Whitte- more, Sudhindra Pandav, Anna Slobodov, Christopher Taylor, Vladimir Frolov, Erik Reeber, and Armaghan Naik. 2009. Replacing Testing with Formal Verifica- tion in Intel CoreTM i7 Processor Execution Engine Validation. In Computer Aided