Source b6dc9142... — STIMSMITH

SOURCE ARCHIVE

SHA256: b6dc914228e5c065b849158f894744a2bcdd5e0b725b38a23e6171a902ca589d

URL: https://agra.informatik.uni-bremen.de/doc/konf/riscv-processor-verification-fdl-2020.pdf

TYPE: application/pdf

SIZE: 261.6 KB

FETCHED: 6/2/2026, 10:16:25 AM

EXTRACTOR: liteparse

CHARS: 49,881

EXTRACTED CONTENT

Efficient Cross-Level Testing for Processor Verification: A RISC-V Case-Study

Vladimir Herdt¹, Daniel Große¹,², Eyck Jentzsch³, Rolf Drechsler¹,⁴
¹ Cyber-Physical Systems, DFKI GmbH, Bremen, Germany
² Chair of Complex Systems, Johannes Kepler University Linz, Austria
³ MINRES® Technologies GmbH, Munich, Germany
⁴ Institute of Computer Science, University of Bremen, Bremen, Germany
Vladimir.Herdt@dfki.de, daniel.grosse@jku.at, eyck@minres.com, drechsle@informatik.uni-bremen.de

Abstract: Extensive processor verification at the Register-Transfer Level (RTL) is crucial to avoid bugs. Therefore, simulation-based approaches are prevalent but they require efficient test generation methods to achieve a thorough verification. In this paper we propose an efficient cross-level testing approach for processor verification targeting the RISC-V Instruction Set Architecture (ISA). We generate an endless instruction stream without restrictions on the generated instructions by evolving the instruction stream on-the-fly during simulation. An Instruction Set Simulator (ISS) is leveraged as reference model for the RTL core under test in a tightly coupled cross-level co-simulation setting. This enables a very efficient and comprehensive testing process. As a case-study we present results on the verification of the 32 bit pipelined RISC-V core of MINRES The Good Folk (TGF) Series. Our approach has been very effective in finding several serious bugs.

Index Terms: RISC-V, Cross-Level, Processor Verification, Instruction Stream, Co-Simulation

I. INTRODUCTION

RISC-V is an open and royalty-free Instruction Set Architecture (ISA) that gained enormous momentum in both academia and industry in recent years. The major goal of the RISC-V ISA is to provide a path to a new era of processor innovation via open standard collaboration. RISC-V features an extremely modular and extensible design that provides enormous flexibility in building application specific solutions that can leverage custom extensions and only include features that are really required. RISC-V became a game changer for embedded systems in several application areas including e.g. IoT and Edge devices. Thus, many emerging designs feature a RISC-V processor, which is at the heart of the design.

Extensive verification of the processor at the Register-Transfer Level (RTL) is crucial to avoid bugs, which could lead to longer design cycles and significant follow-up costs. Due to their ease of use and scalability, simulation-based methods are still prevalent in the verification domain and form the back-bone of the verification effort. However, they require efficient test generation methods to achieve a thorough verification.

Several approaches have been proposed for the purpose of instruction stream generation for processor verification. In particular model-based approaches, which separate the test generator from the architecture description, have a long history. Prominent examples using constraint solving techniques are [1], [2]. An optimized test generation framework has been presented in [3]. It propagates constraints among multiple instructions in an effective manner. The test program generator of [4] includes a coverage model that holds constraints describing execution paths of individual instructions. Alternative approaches integrate coverage-guided test generation based on Bayesian networks [5] and other machine learning techniques [6] as well as fuzzing [7]. However, these approaches are either not designed for RTL verification or impose restrictions on the generated instruction streams. In addition, they do not target the RISC-V ISA.

In this paper we propose an efficient cross-level testing approach for processor verification at RTL targeting the RISC-V ISA. Our approach generates an endless instruction stream without restrictions on the generated instructions by evolving the instruction stream on-the-fly during simulation. An Instruction Set Simulator (ISS) is leveraged as reference model for the RTL core under test in a tightly coupled cross-level co-simulation setting. This enables a very efficient and comprehensive testing process. Our solution provides a testbench that feeds the generated instruction stream to the ISS and RTL core and compares the results after each executed instruction in order to detect errors in the RTL core immediately when they occur. As a case-study we present results on the verification of the 32 bit pipelined RISC-V core of MINRES The Good Folk (TGF) Series. Our approach has been very effective in finding several serious bugs in the industrial core. Moreover, our approach is very efficient with more than 200 million processed instructions per hour on a standard laptop.¹

This work was supported in part by the German Federal Ministry of Education and Research (BMBF) within the project Scale4Edge under contract no. 16ME0127 and no. 16ME0135, and within the BMBF project VerSys under contract no. 01IW19001, and by the German Research Foundation (DFG) as part of the Collaborative Research Center (Sonderforschungsbereich) 1320 EASE – Everyday Activity Science and Engineering, University of Bremen (http://www.ease-crc.org/) in subproject P04.
¹ Visit http://www.systemc-verification.org/risc-v for our most recent RISC-V related approaches.
978-1-7281-8928-4/20/$31.00 ©2020 IEEE

II. RELATED WORK

We already mentioned related work on general methods to generate processor-level stimuli in the introduction. Here we focus on RISC-V specific solutions which have started to emerge recently.

First, the officially provided test-suites [8], [9] need to be mentioned. They are hand-written and aim to cover basic sanity checks and several corner-case scenarios with support for different RISC-V instruction set extensions. However, by being hand-written, their overall coverage is obviously very limited and they are not suitable for continuous testing.

A model-based test generation approach is pursued by RISC-V Torture Test [10]. It is a Scala-based framework that generates tests based on randomized instruction sequence templates and supports several RISC-V ISA extensions. [11] is another model-based approach that leverages a constraint-based specification for test generation. However, both approaches leverage pre-defined building blocks for instruction sequences, which limits their coverage, and they do not support illegal instructions or exceptions.

Another research direction considers coverage-guided fuzzing tailored for verification at the ISS level [12], [13]. They loosen some of the instruction stream generation restrictions but still have problems with branches and jumps to avoid non-terminating test-cases and problems with platform dependent CSR and memory access operations. In addition, the fuzzing result is a test-suite with a comparatively small number of test-cases (a few thousand).

RISC-V DV [14] by Google is another test generation approach that leverages SystemVerilog in combination with UVM (Universal Verification Methodology) to continuously generate RISC-V instruction streams based on constrained-random descriptions. Each instruction stream represents a test-case and RISC-V DV provides a high-level co-simulation interface to compare the results between different simulators via execution log files. RISC-V DV supports a large set of features including several RISC-V instruction set extensions and CSR testing capabilities. However, it has two major disadvantages: First, the generated instruction streams are restricted to avoid problems with infinite loops and platform dependent memory access operations. Second, RISC-V DV has a significant performance overhead because it is a generic framework that aims to support a large range of simulators (and prospectively RTL cores) and thus fully decouples the test generation and co-simulation process. This makes the verification process significantly less efficient, because each test-case needs to be compiled and then loaded and executed on the reference simulator and test simulator (and then execution logs are compared to decide a mismatch). In addition, it is much more difficult to pass execution feedback to the test generation engine.

In contrast, our approach generates one endless instruction stream without any of these restrictions on the generated instructions by evolving the instruction stream on-the-fly during simulation. In addition, the tight integration of instruction stream generation and co-simulation environment can make our approach much more efficient performance-wise. It processes more than 200 million instructions per hour on a standard laptop (which typically cannot be achieved with a generate, compile, execute and compare loop).

Beside test generation methods, there are also a few formal verification approaches for RISC-V. Notable approaches that leverage model checking are riscv-formal [15] and the OneSpin 360 DV RISC-V verification app [16]. However, riscv-formal has only limited support for the RISC-V privileged ISA and OneSpin 360 DV is only commercially available. Another direction is to formalize the RISC-V ISA semantics, which is pursued by e.g. [17], [18]. Based on these formalizations, theorem provers can be utilized to reason about the RISC-V ISA semantics and generate simulation backends. While formal methods can provide correctness guarantees, they are significantly more difficult to apply than simulation-based methods and, due to their complexity and potential scalability issues, should be complemented by simulation-based methods.

III. PRELIMINARIES

This section presents relevant background information on the RISC-V ISA as well as SystemC and TLM (Transaction Level Modeling) [19], [20]. Our co-simulation testbench is implemented in SystemC and uses TLM.

A. RISC-V

The RISC-V ISA consists of a mandatory base integer instruction set, denoted RV32I, RV64I or RV128I with corresponding register widths, and optional extensions denoted as single letters, e.g. M (integer multiplication and division), C (compressed instructions) etc. Thus, RV32I denotes a 32 bit core without any extensions. It has 32 general purpose registers x0 to x31 where x0 is hardwired to zero. Each register is 32 bit wide. Instructions are grouped into different classes (e.g. computational, load/store, branch/jump). They access registers (source: RS1 and RS2, destination: RD) and immediates to perform their operation. Immediates are available in different sizes and signed/unsigned interpretation. RV32I has five immediate types: I-, S-, B-, U- and J-type. For example, I-type is a signed 12 bit immediate, thus has a value range of [-2048...2047]. Format and semantics (for the base ISA and extensions) are defined in the unprivileged ISA specification [21].

In addition, the privileged (architecture) specification [22] covers further important functionality that is required for environment interaction, operating system execution and trap handling. It includes different execution modes (in particular the mandatory Machine mode) with corresponding Control and Status Register (CSR) descriptions. CSRs are registers serving a special purpose, that form the backbone of the privileged architecture description. Example CSRs are:
• MISA provides the supported instruction set.
• MTVEC stores the trap handler address and access configuration.
• MTVAL provides exception specific information in case of a trap.
• MEPC stores the return address from a trap for the MRET instruction.
• MINSTRET counts the number of retired instructions.
• MHARTID provides the read-only core id.
• MSTATUS is the main control and status register for the core.
CSRs can be read-only and consist of different fields. A field is part of the CSR (it has a start position and bitwidth) and has an access specification such as WARL (Write Any Read Legal). A WARL field can be written with any value but a read access will only return legal values. This allows SW to query the CSRs to obtain more information on the capabilities of the core. In contrast to the instruction set specifications, the CSR behavior is much less rigidly defined and often leaves many legal implementation choices, which makes the testing process more challenging.

B. SystemC and TLM

SystemC in combination with TLM is an industry-proven modeling standard for building designs at different levels of abstraction. SystemC is not a new language, rather a C++ class library which includes an event-driven simulation kernel [19]. The structure of a SystemC design is described with modules, whereas the behavior is modeled in processes which are triggered by events. The execution of a process is non-preemptive, i.e. the simulation kernel receives the control back if the process has finished its execution or actively suspends itself. Communication can be implemented via signals (commonly used for RTL models) or abstracted using TLM transactions (commonly used for high-level algorithmic models). A transaction object essentially consists of a command (e.g. read/write), the data (payload) to be transmitted and the address.

Fig. 1 shows an example memory interface based on TLM. The memory receives a transaction object called gp (tlm_generic_payload type). Based on the TLM command either a data read (Line 7-8) or write (Line 11-12) operation is executed. The address, access length and data pointer are obtained from the transaction object (Line 2-4). The read_byte and write_byte functions read and write a single byte from the memory, respectively.

     1  void Memory::operation(tlm_generic_payload &gp) {
     2    auto len = gp.get_data_length();
     3    auto addr = gp.get_address();
     4    uint8_t *ptr = reinterpret_cast<uint8_t*>(
                           gp.get_data_ptr());
     5
     6    if (gp.is_read()) { // read access
     7      for (auto i=0; i<len; ++i)
     8        *(ptr+i) = read_byte(addr+i);
     9    } else { // write access
    10      assert (gp.is_write());
    11      for (auto i=0; i<len; ++i)
    12        write_byte(addr+i, *(ptr+i));
    13    }
    14  }

Fig. 1. Example memory access operation using a TLM transaction

IV. CROSS-LEVEL TESTING FOR PROCESSOR VERIFICATION

In this section we present our proposed cross-level testing approach for processor verification via on-the-fly endless instruction stream generation. We start in Section IV-A with an overview on the co-simulation testbench design that feeds the instruction stream to the ISS and RTL core. Then, we discuss relevant implementation challenges (Section IV-B) and present our instruction stream generator in more detail (Section IV-C).

A. Co-Simulation Testbench Overview

Fig. 2 shows an overview of our co-simulation testbench design. It is implemented in SystemC and enables an efficient co-simulation between the RTL core (left side of Fig. 2) under test and the ISS (right side of Fig. 2) reference model.

[Fig. 2 diagram: SystemC-based simulation. Left: clock-driven RTL core with its instruction and data memory interfaces (RTL signal to TLM interface), the RTL data memory and the RTL core adapter (process RTL core signals; access RTL core data / notify instr. completed). Top center: instruction generator (generate next instr.) and instruction stream (next RTL instr. / next ISS instr.). Right: ISS with its instruction and data memory interfaces (ISS to TLM interface) and the ISS data memory. Bottom center: test controller (execute next instr. / access ISS data; provide result / metrics). RTL and ISS test memory behave in the same way.]

Fig. 2. Overview of our co-simulation testbench for processor verification

The co-simulation is orchestrated by the test controller (bottom center of Fig. 2). Essentially, it repeats the following steps: First, the test controller lets the RTL core execute one instruction. Then, it lets the ISS execute the same instruction. Finally, the RTL core and ISS execution states (the registers in particular) are compared. In case of a mismatch in the execution states, an error is reported. The mismatch has to be analyzed and fixed accordingly. Otherwise (no mismatch), the co-simulation continues until the testing time is exhausted. This basic approach presents considerable challenges that need to be solved, which we discuss in more detail in Section IV-B. In the following we present more details on the co-simulation testbench.

The RTL core is driven by a clock signal. It has two separate memory interfaces to access the instruction and data memory, respectively. The memory interfaces translate back and forth between RTL core signals and TLM transactions. We leverage TLM transactions to have a unified memory abstraction for the RTL core and the ISS based on a common standard (recall Section III-B). The data memory is implemented to work in a lazy fashion. Initially it is empty. On a write access data is stored in the data memory. On a read access either the existing data is returned or new random data is generated (if no access at this address happened before). To match the RTL core, the ISS is also using two separate memory interfaces. Please note, the ISS and RTL data memories both use the same random seed and thus behave exactly in the same way because RTL core and ISS perform the same data memory access sequences (i.e. in the same order). Finally, we provide a core adapter to simplify the access to the RTL core. We provide more details on the core adapter in Section IV-B.

Instruction fetching is handled by the instruction memory interface based on the Program Counter (PC). An instruction fetch of the RTL core results in the generation of a new instruction (always, even if this PC has been fetched already), i.e.
on-the-fly during the simulation. An instruction fetch of the ISS receives the corresponding fetched instruction of the RTL core. This matching is handled by the instruction stream (top center of Fig. 2) which is placed between the instruction generator and the respective memory interfaces. We provide more details on instruction matching in Section IV-B. Please note, our approach generates an endless instruction stream without restrictions on the generated instructions. Thus, all memory access instructions (because we wrap the complete address range of the data memory interface) and jump instructions (including self-loops due to our on-the-fly instruction generation) as well as special RISC-V CSR access instructions are supported. This enables a very comprehensive testing. Independent of the generated instructions, ISS and RTL core should behave completely identical on the observable architectural state (i.e. register updates).

B. Implementation Challenges

There are two main challenges that need to be solved in order to implement our proposed approach: 1) it can be difficult to detect when an instruction is completed in the RTL core, and 2) feeding the same instruction stream into the RTL core and ISS requires special attention. We discuss both points and our solutions in the following.

Detecting Completed Instructions: The (industrial pipelined) RTL core does not provide a single signal that can be queried to detect that an instruction has been completed. In particular, illegal instructions can bypass several stages of the pipeline (depending on where they are identified as illegal) and do not trigger any regular register write back notifications. Furthermore, it is not possible to directly consider an illegal instruction completed the moment it is detected in the pipeline, because there may still be legal instructions pending in the pipeline ahead which need to be completed first (to preserve the instruction order). In addition, the pipeline can get flushed (due to jumps and traps) as well as get stuck at different stages (some operations such as shifting can take multiple cycles) and thus cause delays and gaps, which need to be considered as well. Thus, a deep understanding of the pipeline is required to detect when an instruction has been completed.

Therefore, we provide a core adapter to hide the implementation details of the core and provide a clean testing interface. The core adapter observes the internal signal changes of the core (in particular the pipeline) and notifies the test controller each time the RTL core completed one instruction (and also preserves the correct order in case of illegal instructions). In addition, the core adapter provides access to the register values of the RTL core to compare them with the ISS.

Instruction Stream Matching: The primary goal of our testing approach is to generate an endless and unrestricted instruction stream. However, this makes it more difficult to feed the same instructions to the RTL core and ISS. The reason is that the RTL core pre-fetches several instructions due to the pipeline. However, those pre-fetched instructions may not be executed in case of a jump or trap. In this case, the ISS will fetch a different sequence of PCs than the RTL core. Furthermore, short jumps (which can also be caused by traps) can cause a new instruction fetch in the RTL core before the ISS had the opportunity to fetch the previous instruction (for the same PC). Thus, a direct matching based on the PC does not work. For example consider a one instruction backward jump J from address 8 to address 4. Thus, the RTL core executes J and starts pre-fetching from address 4, before J is fully completed. Therefore, a new instruction is fetched (and thus generated on-the-fly) for address 8 before the ISS would have had the opportunity to fetch and execute J.

     1  function InstrStream::next_RTL_instr(PC)
     2    // always generate a new instruction
     3    i ← InstrGenerator::next()
     4    pending_instrs_queue::push((PC, i))
     5    return i // return this new instruction

     6  function InstrStream::next_ISS_instr(PC, expected_instr)
     7    // search for a matching instruction
     8    while not pending_instrs_queue::empty() do
     9      (iPC, i) ← pending_instrs_queue::pop()
    10      if iPC = PC and i == expected_instr then
    11        return i // match found, return it
    12    report_mismatch() // no match, something wrong

Fig. 3. Instruction fetch matching between the RTL core and ISS

Fig. 3 shows our algorithm to solve the above instruction matching problem. In the instruction stream, we keep a queue of pending instructions (in fetch order) that have been fetched by the RTL core but not yet picked up by the ISS (Line 4). Please note, beside the generated instruction (Line 3) we also store the PC in the queue in Line 4. In addition, we leverage the core adapter to extract the last completed instruction from the RTL core (by carefully analyzing the pipeline signals). We pass this last completed RTL core instruction alongside the ISS PC to fetch the next ISS instruction (Line 6). Based on these arguments we perform a matching with the queue of pending instructions (Line 10). In case of a match the instruction is returned (Line 11). Otherwise, a mismatch is reported between RTL core and ISS (because the ISS tried fetching an instruction which was not delivered to the RTL core) in Line 12. Please note, we do not directly feed the completed instruction sequence from the core adapter to the ISS, because this would compromise the testing approach since we would then rely on the instruction propagation in the RTL core working correctly (and the RTL core is under test).

C. Instruction Stream Generator

Our carefully designed co-simulation setup enables endless generation of unrestricted instructions. Thus, the baseline generation algorithm simply fully randomizes the generated instructions. It forms the foundation of the testing process. In addition, we consider several modifications to guide the test generation towards interesting cases.

The first modification is to inject a random instruction opcode to create a valid instruction but keep the instruction fields randomized. This modification is very simple but at the same time very generic and effective. It is also extremely important to ensure that a large set of legal instructions is considered (because pure randomization tends to generate illegal instructions due to the significantly larger state space of illegal instructions). Fig. 4 shows an example. Starting with the fully randomized 32 bit instruction (A), the ADDI opcode is injected resulting in a randomized ADDI instruction (B), by operation (1).

    (A) Fully random 32 bit word (most likely illegal instruction)
          (1) inject random opcode (ADDI here)
    (B) ADDI: Regs[RD] = Regs[RS1] + I_imm
          31          20 19   15 14 12 11    7 6       0
          | xxxxxxxxxxxx | xxxxx | 000 | xxxxx | 0010011 |
            I_imm          RS1    opcode  RD     opcode
          (2) mutate RD to be equal to RS1
    (C) Special ADDI with RD = RS1 (but I_imm still randomized)
          31          20 19   15 14 12 11    7 6       0
          | xxxxxxxxxxxx | YYYYY | 000 | YYYYY | 0010011 |
            I_imm          RS1    opcode  RD     opcode

Fig. 4. Injection and mutation rule example for illustration

The second modification is to mutate the instruction fields based on a pre-defined rule. We provide a set of rules that reason about the structure and values of the instructions. The rules are derived based on the RISC-V instruction format. We provide rules to inject special values, such as {MIN, -1, 0, 1, MAX}, into the respective immediate field. Other rules reason about the register structure, e.g. mutate RD to zero (since the zero register is hardwired in RISC-V and thus a special case), mutate RD to be equal to RS1 and/or RS2, and mutate RS1 to match RS2. And we provide a rule to mutate the CSR selector field to a supported CSR. Fig. 4 again shows an example. Starting with the randomized ADDI instruction (B) the RD field is mutated to match the RS1 field (C), by mutation (2). Both register fields are still randomized but equal (denoted as Y in Fig. 4).

As a third modification, we consider generation of instruction sequences. A sequence consists of a fixed number of instructions that are designed to perform a specific task and can be randomized. For example, two RISC-V instructions that in combination can load a large immediate value into a RISC-V register (the immediate field of a single instruction is not large enough to load an arbitrary register value). The target register and load value are randomized. Another example is a compute chain, that feeds the result of one instruction into the source register of the next instruction but randomizes the operation (e.g. ADD, SUB, etc) and operand registers. One more useful sequence is a CSR access sequence. It performs a randomized CSR access and then writes the CSR value into a normal register (so it can be compared with the ISS register).

     1  function InstrGenerator::next()
     2    // sequence is an InstrGenerator class variable
     3    if sequence ≠ nil and sequence.has_next() then
     4      // continue with existing sequence
     5      return sequence.next() // return next instr.
     6    if Random::probability(1) then // enter with 1%
     7      // start a new sequence
     8      sequence ← choose_random_sequence_generator()
     9      return sequence.start() // return first instr.
    10    // generate a random 32 bit word (instruction)
    11    x ← Random::instruction()
    12    if Random::probability(98) then // enter with 98%
    13      // choose any opcode, keep fields random
    14      x ← inject_random_valid_opcode(x)
    15    // apply a mutation rule to the fields
    16    if Random::probability(20) then // enter with 20%
    17      x ← apply_random_field_mutation(x)
    18    return x // a single independent instruction

Fig. 5. Instruction generation algorithm

Fig. 5 shows the algorithm that we use for instruction generation. If a sequence is active and not yet completed (Line 3), then the next instruction in the sequence is returned (Line 5). With a 1% probability a new sequence is randomly selected and started (Line 6-9). Starting a sequence randomizes its instructions and returns the first instruction. Otherwise (no active sequence and no new sequence started), a single independent instruction is generated (Line 11-17) and returned (Line 18). We start with a fully randomized instruction (Line 11). With a high probability (98%) a random opcode is injected (Line 12-14). In addition, with a smaller probability (20%) a random field mutation is applied (Line 15-17).

V. EXPERIMENTAL EVALUATION

We have implemented our proposed cross-level testing approach and applied it for the verification of the pipelined 32 bit industrial RISC-V TGF series core. The core has been implemented in SpinalHDL. It is designed to be highly configurable on the microarchitectural level, such as choosing the shifter implementation and pipeline levels. For this evaluation we use the standard configuration that is available to customers. We obtained the Verilog RTL implementation from SpinalHDL (an option for this use-case is provided) and then applied the Verilator tool to obtain the C++ description of the core which we embedded into our SystemC-based co-simulation testbench. As ISS reference model, we use the 32 bit RISC-V ISS of the open source RISC-V VP [23], [24]. We have modified the ISS to exactly match the capabilities of the RTL core (i.e. the supported RISC-V instruction set and CSRs). The RTL core supports the RV32I ISA in combination with the machine mode CSRs.
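
The generation scheme of Section IV-C, which produces the stimuli for this evaluation, can be illustrated with a minimal C++ sketch. It covers only the single-instruction path of Fig. 5 (Line 10-18) and the two operations of Fig. 4; sequences are omitted. Apart from InstrGenerator::next, which is named in Fig. 5, all identifiers (inject_addi_opcode, mutate_rd_to_rs1, probability) and the use of std::mt19937 as random source are assumptions made for illustration and are not taken from the actual implementation. As a further simplification, the sketch always injects the ADDI opcode and always applies the RD = RS1 rule, whereas the described approach picks a random valid opcode and a random mutation rule.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Field layout of the I-type format shown in Fig. 4.
constexpr uint32_t OPCODE_MASK = 0x7Fu;       // bits 6..0
constexpr uint32_t FUNCT3_MASK = 0x7u << 12;  // bits 14..12
constexpr uint32_t REG_MASK = 0x1Fu;          // 5 bit register index
constexpr uint32_t RD_SHIFT = 7, RS1_SHIFT = 15;
constexpr uint32_t OPCODE_OP_IMM = 0x13u;     // 0010011, used by ADDI

// Operation (1) of Fig. 4: make the word decode as ADDI, keep all other fields.
uint32_t inject_addi_opcode(uint32_t x) {
    x &= ~(OPCODE_MASK | FUNCT3_MASK);  // ADDI has the 3 bit field set to 000
    return x | OPCODE_OP_IMM;
}

// Operation (2) of Fig. 4: copy the RS1 field into the RD field.
uint32_t mutate_rd_to_rs1(uint32_t x) {
    uint32_t rs1 = (x >> RS1_SHIFT) & REG_MASK;
    x &= ~(REG_MASK << RD_SHIFT);
    return x | (rs1 << RD_SHIFT);
}

// Hypothetical generator following Line 10-18 of Fig. 5 (no sequences).
class InstrGenerator {
public:
    explicit InstrGenerator(uint32_t seed) : rng_(seed) {}

    uint32_t next() {
        uint32_t x = static_cast<uint32_t>(rng_());      // fully random 32 bit word
        if (probability(98)) x = inject_addi_opcode(x);  // simplification: ADDI only
        if (probability(20)) x = mutate_rd_to_rs1(x);    // simplification: one rule only
        return x;
    }

private:
    // Returns true with the given probability in percent.
    bool probability(int percent) {
        assert(0 <= percent && percent <= 100);
        return std::uniform_int_distribution<int>(0, 99)(rng_) < percent;
    }

    std::mt19937 rng_;
};
```

For example, the word 0x000A8000 (RS1 = 21, all other fields zero) becomes 0x000A8013 after the opcode injection and 0x000A8A93 (ADDI x21, x21, 0) after the RD mutation. Two generators created with the same seed produce the same instruction stream, which keeps a failing run reproducible.
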
All experiments have been performed on a Linux system with an Intel Core i5-7200U processor. For the verification process, we iteratively switched between testing and bug fixing until no more bugs were found. In the following we first present and discuss the bugs that we have found and then present performance and execution metrics that we have obtained.

A. Found Bugs

Our testing process revealed that the RTL core already had a very mature implementation of the RISC-V unprivileged ISA.

Only very few special cases have triggered a mismatch with the ISS. Most bugs were related to the RISC-V privileged ISA, in particular the CSR handling. In total we found 10 bugs in the RTL core, which we discuss in the following:

1. Write access to a read-only CSR does not cause an illegal instruction trap. In addition, for specific CSRs and options, a legal write access to a (non read-only) CSR caused an exception.
2. MEPC is not updated correctly on the lower two bits. This allows SW to write an unaligned address into MEPC which can cause an unaligned jump.
3. MISA was not correctly initialized and could be updated by the SW to unsupported values.
4. MTVAL should be set to zero on an ECALL (instead it has been set to the ECALL instruction encoding, which is the default behavior for illegal instructions to help diagnose them).
5. SW can write a reserved value into the MODE field of MTVEC, which should not be allowed since MODE should only be able to hold supported values. This can cause a serious problem with forward compatibility of SW, because (due to the modular and extensible design of RISC-V) SW can query CSRs to obtain their capabilities (and would be misled in this case).
6. The EBREAK instruction sets MCAUSE to illegal instruction instead of breakpoint.
7. The FENCE and FENCE.I instructions cause an illegal instruction trap for specific options. The problem has been in the decoder implementation.
8. Writing to the MINSTRET and MCYCLE CSRs erroneously caused an illegal instruction trap (though, according to the specification, these special counter CSRs are allowed to be modified by SW).
9. MINSTRET (which counts the number of retired instructions) is not correctly updated on a write access. In this case it should avoid the increment for the instruction that performs the write access.
10. MRET continues at the wrong instruction for some special instruction sequences that involve multiple MRET and illegal instructions. MRET is a special RISC-V instruction to return from the trap handler. Thus, it is used in a very regular way by SW. In contrast, our approach allows comprehensive stress testing of the MRET instruction (and others) and hence is very effective in finding errors.

In contrast to the existing testing frameworks for RISC-V, which impose several restrictions on the generated instructions (and thus simply cannot generate specific instruction sequences), our approach avoids these restrictions by evolving the instruction stream on-the-fly during simulation. This is a very important advantage, because many corner-case bugs will only be revealed by very specific instruction sequences with highly irregular control flow, including tight loops and traps (as for example bug 10 demonstrates).

Besides the 10 bugs in the RTL core, our testing process also revealed 1 bug in the reference ISS, where MTVAL was set incorrectly. The bug is triggered by executing a compressed instruction (which is considered an illegal instruction because the C extension is deactivated) and then reading MTVAL. The reason is that the ISS still expanded the fetched compressed instruction (16 bit) into an uncompressed instruction (32 bit), even though the C extension was deactivated. Thus, instead of the original fetched instruction, the expanded instruction is erroneously written into MTVAL on the illegal instruction trap.

All of the described bugs have been found in less than 5 minutes each. Thus, our approach has been very effective in finding bugs. In the following we present more details on the performance characteristics and other execution metrics.

B. Performance and Execution Metrics

The lightweight test-generation process and tight co-simulation between the ISS and RTL core enable our approach to achieve a very high performance. In one hour it generated and co-simulated a total number of 226 million (M) instructions. These total instructions are separated into 12M illegal and 214M legal instructions. From the legal instructions 156M completed normally and 58M caused an exception (i.e. trap). For illustration, Fig. 6 shows how the legal instructions are distributed. It can be observed that they are mostly uniformly distributed, ranging from 6.0M for ADDI to 3.6M for MRET (please note, the y-axis scale starts at 3.0M). The differences in the distribution are due to the randomness of the generation process and the inclusion of special instruction sequences. For example, loading a RISC-V register with an arbitrary number requires two instructions, an ADDI and a LUI, which is also reflected in Fig. 6. On average 63K (K = thousand) instructions and 229K (RTL core) cycles are processed per second. This high performance enables a very efficient testing process.

Fig. 6. Distribution of the executed legal instructions for a 1 hour testing process. The X-axis shows the instructions and the Y-axis the count (M = Millions).

Looking more closely at the instructions, we observed between 11M to 22M accesses per register with an average of 12M. Due to the special semantics of the hardwired x0 register in RISC-V, we used generation rules that favor the x0 register (thus it is accessed more often compared to other registers). We observed between 1 (because register x0 is hardwired to zero) and 870K different values per register with an average of 747K. On the immediate fields we observed 5M to 51M accesses with an average of 22M. The number of observed values in the immediate fields varies widely from 32 to 1M due to the different value ranges of the immediates. In total we observed 99.6% of the possible immediate values (across all instructions in combination). Thus, besides the high performance, our approach also enables a broad coverage. In combination with support for unrestricted instruction sequences (to cover highly irregular control flows) our approach is very suitable for extensive stress testing.

Finally, please note that our on-the-fly instruction stream generation approach is very generic and thus not limited to a specific RISC-V ISA configuration. We expect that only minimal extensions are necessary to provide efficient support for additional RISC-V ISA extensions (covering the privileged as well as unprivileged ISA).

VI. CONCLUSION AND FUTURE WORK

We proposed an efficient cross-level testing approach for processor verification targeting the RISC-V ISA. It works by generating and feeding an endless instruction stream into the RTL core under test and a reference ISS in a tightly coupled co-simulation setting. The instruction stream evolves on-the-fly during simulation and thus avoids restrictions on the generated instructions. Our approach has been very effective in finding several serious bugs in the pipelined industrial RISC-V TGF series core and worked very efficiently with more than 200 million processed instructions per hour. For future work we plan to:

• Investigate parallelized test sessions (using different random seeds) and utilizing FPGAs to further boost the testing process.
• Consider testing the interrupt interface of the RTL core, which is quite challenging as it needs to be synchronized with the instruction stream co-simulation (to avoid spurious mismatches between ISS and RTL core).
• Extend and evaluate our approach on additional RISC-V ISA extensions. As already mentioned, we believe that our approach is very well prepared for this task due to the generic on-the-fly instruction stream generation.
• Investigate new coverage metrics that also consider RTL specific coverage and develop execution feedback mechanisms to further guide the test generation process.

REFERENCES

[1] A. Adir, E. Almog, L. Fournier, E. Marcus, M. Rimon, M. Vinov, and A. Ziv, “Genesys-pro: innovations in test program generation for functional processor verification,” D&T, pp. 84–93, 2004.
[2] B. Campbell and I. Stark, “Randomised testing of a microprocessor model using SMT-solver state generation,” in Formal Methods for Industrial Critical Systems, F. Lang and F. Flammini, Eds., 2014, pp. 185–199.
[3] Y. Katz, M. Rimon, and A. Ziv, “Generating instruction streams using abstract CSP,” in DATE, 2012, pp. 15–20.
[4] M. Chupilko, A. Kamkin, A. Kotsynyak, and A. Tatarnikov, “MicroTESK: specification-based tool for constructing test program generators,” in HVC, 2017.
[5] S. Fine and A. Ziv, “Coverage directed test generation for functional verification using bayesian networks,” in DAC, 2003, pp. 286–291.
[6] C. Ioannides, G. Barrett, and K. Eder, “Feedback-based coverage directed test generation: An industrial evaluation,” in Hardware and Software: Verification and Testing, S. Barner, I. Harris, D. Kroening, and O. Raz, Eds., 2011.
[7] L. Martignoni, R. Paleari, G. F. Roglia, and D. Bruschi, “Testing CPU emulators,” in ISSTA, 2009, pp. 261–272.
[8] “RISC-V ISA tests,” https://github.com/riscv/riscv-tests.
[9] “RISC-V compliance task group,” https://github.com/riscv/riscv-compliance.
[10] “RISC-V torture test generator,” https://github.com/ucb-bar/riscv-torture.
[11] V. Herdt, D. Große, and R. Drechsler, “Towards specification and testing of RISC-V ISA compliance,” in DATE, 2020.
[12] V. Herdt, D. Große, H. M. Le, and R. Drechsler, “Verifying instruction set simulators using coverage-guided fuzzing,” in DATE, 2019, pp. 360–365.
[13] V. Herdt, D. Große, and R. Drechsler, “Closing the RISC-V compliance gap: Looking from the negative testing side,” in DAC, 2020.
[14] “RISCV-DV,” https://github.com/google/riscv-dv.
[15] “RISC-V formal verification framework,” https://github.com/SymbioticEDA/riscv-formal.
[16] “OneSpin 360 DV RISC-V Verification App,” https://www.onespin.com/solutions/risc-v.
[17] “Formal specification of RISC-V ISA in kami,” https://github.com/sifive/RiscvSpecFormal.
[18] “Riscv sail model,” https://github.com/rems-project/sail-riscv.
[19] IEEE Standard SystemC Language Reference Manual, IEEE Std. 1666, 2011.
[20] D. Große and R. Drechsler, Quality-Driven SystemC Design. Springer, 2010.
[21] A. Waterman and K. Asanović, The RISC-V Instruction Set Manual; Volume I: Unprivileged ISA, SiFive Inc. and CS Division, EECS Department, University of California, Berkeley, 2019.
[22] A. Waterman and K. Asanović, The RISC-V Instruction Set Manual; Volume II: Privileged Architecture, SiFive Inc. and CS Division, EECS Department, University of California, Berkeley, 2019.
[23] V. Herdt, D. Große, H. M. Le, and R. Drechsler, “Extensible and configurable RISC-V based virtual prototype,” in FDL, 2018, pp. 5–16.
[24] V. Herdt, D. Große, P. Pieper, and R. Drechsler, “RISC-V based virtual prototype: An extensible and configurable platform for the system-level,” JSA, 2020.