
Cross-Level Processor Verification via Endless Randomized Instruction Stream Generation with Coverage-guided Aging

Niklas Bruns¹, Vladimir Herdt¹,², Eyck Jentzsch³, Rolf Drechsler¹,²
¹Institute of Computer Science, University of Bremen, 28359 Bremen, Germany
²Cyber-Physical Systems, DFKI GmbH, 28359 Bremen, Germany
³MINRES Technologies GmbH, 85579 Neubiberg, Germany
{nbruns,vherdt,drechsler}@uni-bremen.de, eyck@minres.com

Abstract: We propose a novel cross-level verification approach for processor verification at the Register-Transfer Level (RTL). The foundation is a randomized coverage-guided instruction stream generator that produces one endless and unrestricted instruction stream that evolves dynamically at runtime. We leverage an Instruction Set Simulator (ISS) as a reference model in a tight co-simulation setting. Coverage information is continuously updated based on the execution state of the ISS, and we employ Coverage-guided Aging to smooth out the coverage distribution of the randomized instruction stream over time. In combination, this enables a broad and deep coverage to find intricate corner-case bugs in the RTL processor. Our case study with an industrial pipelined 32-bit RISC-V processor demonstrates the effectiveness of our approach.

This work was supported in part by the German Federal Ministry of Education and Research (BMBF) within the project Scale4Edge under contract no. 16ME0127, and within the project VerSys under contract no. 01IW19001.

I. INTRODUCTION

Extensive processor verification at the Register-Transfer Level (RTL) is essential to detect intricate bugs, which could lead to enormous follow-up costs and additional design iterations. Simulation-based methods that rely on continuous processor-level stimuli generation are still prevalent and form the backbone of the verification effort due to their ease of use and scalability. In this paper we consider RISC-V [1], [2] as a representative Instruction Set Architecture (ISA) which serves as foundation for modern processor architectures, in particular in the embedded application domain. RISC-V is a free and open-source ISA that enables a royalty-free processor design and implementation. It is designed in a very modular way, with optional standard instruction set extensions around a mandatory base integer instruction set and the ability to integrate additional custom instruction sets to build highly application-specific processors. These properties made RISC-V very popular in industry and academia.

From the verification perspective, however, the extensive modularity adds additional complexity. Besides the modern features provided by RISC-V and any micro-architecture-specific optimizations of the processor, such as pipelining and branch prediction, the verification tools also need to be able to deal with the large configuration space offered by RISC-V. Promising approaches in this context leverage methods based on co-simulation that employ an Instruction Set Simulator (ISS) (i.e., an executable abstract model of the processor core, typically implemented in C++) as a functional reference model for the RTL processor under test. Such a method is used by Google's open-source RISC-V Design Verification (DV) framework. It applies constraint-based specification techniques in SystemVerilog to generate RISC-V assembly tests one after another. Different RISC-V instruction sets are supported by selecting and combining the respective constraint-based specifications. Execution results of the ISS and the RTL processor core are compared through execution log files.

While this feature set makes RISC-V DV very powerful in general, it also has some major weaknesses. In order to keep the framework generic, the generated tests use a restricted instruction set to avoid problems with infinite loops and platform-dependent memory access operations. Moreover, by generating tests one by one, only comparatively short instruction sequences are considered, and the state of the processor under test is regularly reset for each new test execution. Furthermore, the co-simulation has an inherent performance overhead due to the extensive filesystem communication, since each RISC-V assembly test needs to be compiled, loaded onto the respective simulator, and produce a log file for comparison. Finally, the test generator is not designed to be dynamically guided by coverage information obtained from the test execution progress.

Many of these issues have been addressed by a recent academic work [3]. It generates endless instruction streams and integrates the ISS with the RTL core in a very efficient co-simulation compiled into a single binary with in-memory communication. The setup allows generating instructions without any restrictions, i.e., arbitrary combinations of load/store and Control and Status Register (CSR)¹ instructions, as well as infinite loops, are supported, which enables a very comprehensive test approach.

¹ In the CSRs, the processor stores additional instruction results to enable sophisticated hardware/software interactions.

However, the approach is still limited as it does not collect or employ runtime coverage information to assess and guide the test generation process. Instead, the instruction stream generators are based on a simple randomized test strategy, which makes it very difficult to continuously achieve a broad and deep test

coverage in endless instruction streams.

In this paper, we propose a novel cross-level verification approach that conceptually builds upon the previous academic work [3] and addresses the aforementioned limitations. The foundation is a randomized coverage-guided instruction stream generator that produces an endless and unrestricted instruction stream that evolves dynamically at runtime based on observed coverage information. We also leverage an ISS as a reference model in a tight co-simulation setting. Coverage information is continuously updated based on the execution state of the ISS, and we employ the novel concept of Coverage-guided Aging to smooth out the coverage distribution of the randomized instruction stream over time. In combination, this enables a broad and deep coverage to find intricate corner-case bugs in the RTL core. Our experiments with the 32-bit pipelined RISC-V core of the MINRES The Good Core (TGC) series demonstrate the effectiveness of our approach. We achieve a much more regular coverage distribution of the randomized instruction stream via Coverage-guided Aging, and we found another intricate micro-architecture-related bug in the interplay between the already heavily tested industrial processor and the accompanying test bench infrastructure.

II. RELATED WORK

Several approaches have been proposed to generate tests for the purpose of processor verification. One prominent direction is to employ model-based test generators that leverage a constraint-based specification format to guide the test generation process [4], [5]. In this context, optimization techniques for constraint propagation [6], execution path coverage models [7], and mining techniques for processor manuals [8] have been considered. Alternative approaches integrate coverage-guided test generation based on Bayesian networks [9] and other machine learning techniques [10] as well as fuzzing [11] and symbolic execution [12]. However, these approaches are either not designed for RTL verification or impose restrictions on the generated instruction streams. In addition, they do not target the modern RISC-V ISA.

Recently, verification approaches tailored for RISC-V have emerged. In the introduction, we already covered the modern co-simulation based approaches that are tailored for RTL and are closest to our proposed approach. Other simulation-based approaches for RISC-V generate instruction sequences by combining pre-defined randomized patterns [13] and by utilizing constraint-based specifications [14] as well as coverage-guided fuzzing techniques [15]. However, they suffer from the same limitations as the traditional processor-level stimuli generation approaches in imposing restrictions or operating at a different abstraction level than RTL. Finally, a set of directed test suites that cover different RISC-V instruction sets [16]–[18] is available and forms a baseline for testing. Looking beyond simulation-based techniques, a few formal approaches that are based on model checking techniques [19], [20] have been proposed as well. Nevertheless, these formal techniques are possibly susceptible to scalability issues.

[Fig. 1. Overview on core verification. Blocks shown: Seed, InstrGen, Core-Adapter, RTL-Core, RTL-Memory; Instruction-Injector, Coverage-Observer, Comparator; Seed, InstrGen, ISS, ISS-Memory.]

III. BACKGROUND ON RISC-V

RISC-V is a free and open Instruction Set Architecture (ISA) that was developed at UC Berkeley and is available under an open-source license, the Creative Commons Attribution 4.0 International License [1], [2]. RISC-V provides three different integer base ISAs that differ primarily in the word width: RV32I is the 32-bit version of the architecture, RV64I is the 64-bit version, and RV128I is the 128-bit version. These base ISAs define integer calculations, program control, load and store operations, and debugging instructions. In addition to these base ISAs, many instruction set extensions are defined. The extensions in use are appended to the name of the integer base ISA to name the capabilities of a core implementation. A 32-bit RISC-V processor with a multiplication unit, CSR instructions, Fence, and support for compressed instructions is called RV32IMCZicsrZifencei.

IV. CROSS-LEVEL PROCESSOR VERIFICATION WITH COVERAGE-GUIDED AGING

In this section, we present our cross-level processor verification approach that is based on endless randomized instruction stream generation using Coverage-guided Aging. We start with an overview.

A. Overview

Fig. 1 shows the overview of our approach. It starts with initializing the random instruction generators (InstrGen). Each core has its own instruction generator, and the generators are initialized with the same cryptographic seed. As a consequence, the generators provide the same endless randomized instruction stream. At first, some instructions of the endless instruction stream are generated and executed by the ISS. After this, the RTL processor fetches its instruction stream. However, for the fetching of the RTL core, micro-architectural details such as pipelining, pre-fetching, and fetch-buffering have to be considered. For this purpose, a core adapter is used, which checks for addresses that were not fetched by the ISS, fills them with randomized values (not generated by InstrGen), and forwards them to the RTL-Core. After the execution of the instructions, the core and the ISS write the results to their separate memories. Next, the Coverage-Observer measures the functional coverage based on the ISS execution state, performs the coverage aging, and gives hints to the Instruction-Injector if functionality must be covered (again). In principle, the functional coverage can be specified arbitrarily complex and is used to guide the test generation over

time. We will present more details on the Coverage-Observer in Section IV-B. Next, the Instruction-Injector evaluates the hints and injects instructions to cover the requested functionality. The injector must consider that the cores have different fetch behaviors and execution timings that result in individual random instruction generator states. The functional principle of the Instruction-Injector is described in Section IV-C. The purpose of the Comparator is to find functional differences between the RTL-Core and the ISS. To achieve this, it compares the register values of the ISS and the RTL-Core. The matching is not straightforward because the cores do not have the same timing behavior. To solve this problem, the Comparator logs the value changes and constantly compares the two changes at the same position. If the Comparator finds any difference, it quits the simulation. In the following, we provide more details on the Coverage-Observer (Section IV-B) and the Instruction-Injector (Section IV-C), which are the two most important components to implement Coverage-guided Aging.

B. Coverage-Observer

The main functionality of the Coverage-Observer is to monitor the internal state of the ISS to measure the coverage. It samples the executed instructions and looks up the matching coverage points. In this work, we define the cross-product of instruction groups as coverage points. The instruction groups are defined by a verification engineer to focus the verification on the functionality to be tested. An instruction group covers a set of instructions, such as arithmetic or load/store instructions. Consequently, our approach guarantees that each functionality is verified in combination with every functionality. The Coverage-Observer watches the executed instructions at run-time and is the heart of our coverage aging extension. After an instruction sequence covers a coverage point, the Coverage-Observer sets the corresponding Coverage-guided Aging counter to a defined maximal value. Periodically, the Coverage-Observer decreases the Coverage-guided Aging counter until the minimum limit is reached. In this case, it gives a hint to the Instruction-Injector. This hint consists of a random instruction sequence that is needed to cover the coverage point. The instructions are selected randomly from the instructions that were sampled dynamically in this run. The Coverage-Observer resets the Coverage-guided Aging counter if the groups are covered again. Next, we describe the Instruction-Injector.

C. Instruction-Injector

The purpose of the Instruction-Injector is to inject instruction sequences into the random test generators in compliance with their internal state. If the instruction injection ignores the internal states, the generators provide differing instruction streams, which may lead to a false result of the Comparator. To achieve a legal injection, the Instruction-Injector measures how many instructions have been executed before the current state of the random generator was reached. Then, it schedules the injection to the same near-future instruction count for all instruction generators. This approach is valid because deterministic random sources that are initialized with the same cryptographic seed value provide the same random sequences. In this way, we have ensured that the behaviors of the random instruction generators are equal.

V. EVALUATION

In this section, we present our case study and discuss the evaluation results. The goal of our case study is to evaluate the applicability of Coverage-guided Aging for cross-level processor verification. We start with the test setup.

A. Test Setup

As Device Under Test (DUT), we used the 32-bit pipelined RISC-V core of the MINRES The Good Core (TGC) series, which has already been extensively verified using simulation-based approaches and formal techniques. As reference ISS, we used the ISS of the open-source SystemC-based RISC-V VP². To enable the co-simulation, we translated the industrial RTL core to C++ using the open-source tool Verilator³ and integrated it into a SystemC test bench along with the ISS. For our evaluation, we configured the core and the ISS to support the RISC-V subset RV32IMCZicsrZifencei (see Section III). All experiments were executed on an Ubuntu 20.04 LTS machine with an AMD Ryzen 7 PRO 4750U CPU with 4.1 GHz and 36 GB RAM, and with a SystemC simulation time limit of 1 second (≈ 20 million instructions).

² https://github.com/agra-uni-bremen/riscv-vp
³ https://www.veripool.org/verilator/

By analyzing the RISC-V specification, we identified the following six important instruction groups that act as base for the coverage points in this case study: Arithmetic, Control Flow, Memory, Special & System, Control & Status Register (CSR), and Other. The group Arithmetic contains all arithmetic instructions of the instruction subsets RV32I and RV32C and all instructions of RV32M. The group Control Flow contains the unconditional jump and the conditional branch instructions of RV32I and RV32C. The group Memory contains the load/store instructions of RV32I and RV32C and the memory ordering instructions of RV32I. The group Special & System contains the ECALL and EBREAK, the NOP, and the HINT instructions of RV32I, and the illegal, NOP, breakpoint, and HINT instructions of RV32C. Additionally, it contains the FENCE instruction of Zifencei. The group CSR is equivalent to Zicsr. The group Other contains all instructions of the undefined and unsupported subsets and the privileged architecture. Given the six instruction groups and the resulting 36 coverage points, we configured the Coverage-guided Aging counter with the value 100; the counter is decremented each time a new instruction is generated. With the value 100, there are enough random instructions, and at the same time, the coverage points are triggered frequently. In the following, we compare the results of a random test generator with and without our Coverage-guided Aging extension (Section V-B). Then we present a bug that we found during the development process (Section V-C).

B. Random vs. Coverage-guided Aging

Fig. 2 shows the result bar chart of our case study. The chart gives information about how often the coverage points (defined as cross product of the instruction groups) were executed by the

random test generator and the Coverage-guided Aging test generator. The random generator is a re-implementation of the test generator of [3] and has already proven its excellent bug-hunting capabilities. Unfortunately, it tends to favor specific test state spaces. It is based on a static randomized test strategy that does not change over time. However, such an adjustment is critical since we are looking at an endless instruction stream and not at individual cases where readjustment after each run is possible. As stated in the legend, the blue bars, which are always on the left side, represent the instructions generated by the random test generator, and the orange bars belong to the test generator that is enhanced with Coverage-guided Aging. The execution of the random test generator leads to substantial peaks in specific combinations of instruction groups, while other combinations were almost never executed. For example, the count of Special & System : Special & System is so low that it can hardly be seen, whereas the combination Other : Other was executed very often. Thus, clear gaps can be seen. In contrast, the Coverage-guided Aging generator has much weaker peaks on certain groups. In addition, every group is executed and always reaches a clearly visible execution count. Thus, the result of the random test generator seems to degenerate. In comparison, the Coverage-guided Aging test generator provides a more balanced result, and no gaps can be seen. Unfortunately, the results could not be presented for space reasons. Thus, we have shown that Coverage-guided Aging helps to close gaps and achieves more balanced verification results.

[Fig. 2. Cross Coverage Groups : Sum of all runs. Bar chart; x-axis: Cross Coverage; legend entry recoverable from the extraction: Random + Coverage Aging.]

C. Detected Pipeline Bug

During the development of the Coverage-guided Aging test generator, we discovered a micro-architecture-related bug in the accompanying test bench adapter of the already well-tested industrial RTL-Core. In certain test cases, there were no free entries in the execute FIFO of the pipeline, and thus the core did not receive any further instructions. This was triggered because the pipeline was only emptied by the test bench adapter when a valid instruction was executed. Therefore, a test case could trigger this error if the core ran too many invalid instructions (see Special & System : Special & System in Fig. 2) in succession.

D. Discussion and Future Work

Our case study shows that Coverage-guided Aging is an effective extension for cross-level processor verification. We have shown that Coverage-guided Aging helps to close gaps and achieves a much more regular coverage distribution. Furthermore, we found another intricate micro-architectural bug in the already heavily tested industrial processor. For future work, we plan to design advanced micro-architecture coverage metrics to measure specific feature testing like the hazard handling of pipelines. In addition, we plan to create a processor verification benchmark based on finely detailed coverage groups.

REFERENCES

[1] A. Waterman and K. Asanović, Eds., The RISC-V Instruction Set Manual; Volume I: Unprivileged ISA, 2019.
[2] A. Waterman and K. Asanović, Eds., The RISC-V Instruction Set Manual; Volume II: Privileged Architecture, 2019.
[3] V. Herdt, D. Große, E. Jentzsch, and R. Drechsler, “Efficient cross-level testing for processor verification: A RISC-V case-study,” in FDL, 2020, pp. 1–7.
[4] A. Adir, E. Almog, L. Fournier, E. Marcus, M. Rimon, M. Vinov, and A. Ziv, “Genesys-pro: innovations in test program generation for functional processor verification,” D&T, pp. 84–93, 2004.
[5] B. Campbell and I. Stark, “Randomised testing of a microprocessor model using SMT-solver state generation,” in Formal Methods for Industrial Critical Systems, F. Lang and F. Flammini, Eds., 2014, pp. 185–199.
[6] Y. Katz, M. Rimon, and A. Ziv, “Generating instruction streams using abstract CSP,” in DATE, 2012, pp. 15–20.
[7] M. Chupilko, A. Kamkin, A. Kotsynyak, and A. Tatarnikov, “MicroTESK: specification-based tool for constructing test program generators,” in HVC, 2017.
[8] W. Ma, A. Forin, and J. Liu, “Rapid prototyping and compact testing of CPU emulators,” in RSP, 2010, pp. 1–7.
[9] S. Fine and A. Ziv, “Coverage directed test generation for functional verification using bayesian networks,” in DAC, 2003, pp. 286–291.
[10] C. Ioannides, G. Barrett, and K. Eder, “Feedback-based coverage directed test generation: An industrial evaluation,” in Hardware and Software: Verification and Testing, S. Barner, I. Harris, D. Kroening, and O. Raz, Eds., 2011.
[11] L. Martignoni, R. Paleari, G. F. Roglia, and D. Bruschi, “Testing CPU emulators,” in ISSTA, 2009, pp. 261–272.
[12] H. Wagstaff, T. Spink, and B. Franke, “Automated ISA branch coverage analysis and test case generation for retargetable instruction set simulators,” in CASES, 2014, pp. 1–10.
[13] “RISC-V torture test generator,” https://github.com/ucb-bar/riscv-torture.
[14] V. Herdt, D. Große, and R. Drechsler, “Towards specification and testing of RISC-V ISA compliance,” in DATE, 2020, pp. 995–998.
[15] V. Herdt, D. Große, H. M. Le, and R. Drechsler, “Verifying instruction set simulators using coverage-guided fuzzing,” in DATE, 2019, pp. 360–365.
[16] “RISC-V ISA tests,” https://github.com/riscv/riscv-tests.
[17] “RISC-V compliance task group,” https://github.com/riscv/riscv-compliance.
[18] N. Bruns, V. Herdt, D. Große, and R. Drechsler, “Toward RISC-V CSR compliance testing,” IEEE ESL, vol. 13, no. 4, pp. 202–205, 2021.
[19] “RISC-V formal verification framework,” https://github.com/SymbioticEDA/riscv-formal, 2020.
[20] “OneSpin 360 DV RISC-V Verification App,” https://www.onespin.com/solutions/risc-v, 2020.
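
As an illustration of the Coverage-guided Aging counters of Section IV-B, the following Python sketch may be helpful. It is a minimal model under stated assumptions and not the implementation used in the case study. The class `CoverageObserver`, its method `observe`, and the constant `MIN_AGE` are hypothetical names. The sketch further assumes that a coverage point is covered by two consecutively executed instructions, that all counters start at the maximal value, and that aging happens once per observed instruction (the paper decrements the counter after each newly generated instruction). Only the six group names and the counter value 100 are taken from Section V-A.

```python
import random
from itertools import product

GROUPS = ("Arithmetic", "Control Flow", "Memory",
          "Special & System", "CSR", "Other")
MAX_AGE = 100  # counter value used in the case study
MIN_AGE = 0    # assumed; the text only speaks of a "minimum limit"


class CoverageObserver:
    """Hypothetical sketch of the aging counters, not the authors' code."""

    def __init__(self, seed=0):
        # One counter per ordered pair of groups: 6 x 6 = 36 coverage points.
        self.age = {point: MAX_AGE for point in product(GROUPS, repeat=2)}
        # Instructions sampled in this run, per group (source for hints).
        self.sampled = {group: [] for group in GROUPS}
        # Expired points for which a hint was already issued.
        self.pending = set()
        self.prev_group = None
        self.rng = random.Random(seed)

    def observe(self, instr, group):
        """Sample one executed instruction and return the hints it triggers."""
        self.sampled[group].append(instr)

        # Aging: every counter above the minimum loses one step.
        for point in self.age:
            if self.age[point] > MIN_AGE:
                self.age[point] -= 1

        # Covering a point resets its counter to the maximal value.
        if self.prev_group is not None:
            covered = (self.prev_group, group)
            self.age[covered] = MAX_AGE
            self.pending.discard(covered)
        self.prev_group = group

        # Each newly expired point yields one hint: a random sequence of
        # previously sampled instructions that would cover the point.
        hints = []
        for point, value in self.age.items():
            first, second = point
            if (value == MIN_AGE and point not in self.pending
                    and self.sampled[first] and self.sampled[second]):
                hints.append([self.rng.choice(self.sampled[first]),
                              self.rng.choice(self.sampled[second])])
                self.pending.add(point)
        return hints
```

For example, after observing one `lw` (group Memory) followed by 99 `add` instructions (group Arithmetic), the 100th call to `observe` returns the hints `[['add', 'lw'], ['lw', 'lw']]`, because the points Arithmetic : Memory and Memory : Memory have aged to the minimum and both groups have sampled instructions. In the approach described above, such hints would then be handed to the Instruction-Injector, which schedules them at the same instruction count for all generators.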