Source 44c1dbd8... — STIMSMITH

SOURCE ARCHIVE

SHA256: 44c1dbd8f2729b5f0bde4d9d870cbd2efce9ea1a8d7cbd18cc0bc71cbfd4d652

URL: https://agra.informatik.uni-bremen.de/doc/konf/DATE2022_crosslevelPV.pdf

TYPE: application/pdf

SIZE: 170.5 KB

FETCHED: 6/5/2026, 10:28:25 AM

EXTRACTOR: liteparse

CHARS: 27,372

EXTRACTED CONTENT

Cross-Level Processor Verification via
Endless Randomized Instruction Stream Generation
with Coverage-guided Aging

Niklas Bruns¹, Vladimir Herdt¹,², Eyck Jentzsch³, Rolf Drechsler¹,²
¹ Institute of Computer Science, University of Bremen, 28359 Bremen, Germany
² Cyber-Physical Systems, DFKI GmbH, 28359 Bremen, Germany
³ MINRES Technologies GmbH, 85579 Neubiberg, Germany
{nbruns,vherdt,drechsler}@uni-bremen.de, eyck@minres.com

Abstract—We propose a novel cross-level verification approach for processor verification at the Register-Transfer Level (RTL). The foundation is a randomized coverage-guided instruction stream generator that produces one endless and unrestricted instruction stream that evolves dynamically at runtime. We leverage an Instruction Set Simulator (ISS) as a reference model in a tight co-simulation setting. Coverage information is continuously updated based on the execution state of the ISS, and we employ Coverage-guided Aging to smooth out the coverage distribution of the randomized instruction stream over time. In combination, this enables a broad and deep coverage to find intricate corner-case bugs in the RTL processor. Our case study with an industrial pipelined 32-bit RISC-V processor demonstrates the effectiveness of our approach.

I. INTRODUCTION

Extensive processor verification at the Register-Transfer Level (RTL) is essential to detect intricate bugs, which could lead to enormous follow-up costs and additional design iterations. Simulation-based methods that rely on continuous processor-level stimuli generation are still prevalent and form the backbone of the verification effort due to their ease of use and scalability. In this paper we consider RISC-V [1], [2] as a representative Instruction Set Architecture (ISA) which serves as foundation for modern processor architectures, in particular in the embedded application domain. RISC-V is a free and open-source ISA that enables a royalty-free processor design and implementation. It is designed in a very modular way with optional standard instruction set extensions around a mandatory base integer instruction set and the ability to integrate additional custom instruction sets to build highly application-specific processors. These properties made RISC-V very popular in industry and academia.

From the verification perspective, however, the extensive modularity adds additional complexity. Besides the modern features provided by RISC-V and any micro-architecture specific optimizations of the processor, such as pipelining and branch prediction, the verification tools also need to be able to deal with the large configuration space offered by RISC-V. Promising approaches in this context leverage methods based on co-simulation that employ an Instruction Set Simulator (ISS) (i.e. an executable abstract model of the processor core, typically implemented in C++) as a functional reference model for the RTL processor under test. Such a method is used by Google’s open-source RISC-V Design Verification (DV) framework. It applies constraint-based specification techniques in SystemVerilog to generate RISC-V assembly tests one after another. Different RISC-V instruction sets are supported by selecting and combining the respective constraint-based specifications. Execution results between the ISS and RTL processor core are compared through execution log files.

While this feature set makes RISC-V DV very powerful in general, it also has some major weaknesses. In order to keep the framework generic, the generated tests use a restricted instruction set to avoid problems with infinite loops and platform-dependent memory access operations. Moreover, by generating tests one by one, only comparatively short instruction sequences are considered, and the state of the processor under test is regularly reset for each new test execution. Furthermore, the co-simulation has an inherent performance overhead due to the extensive filesystem communication, since each RISC-V assembly test needs to be compiled, loaded onto the respective simulator, and produce a log file for comparison. Finally, the test generator is not designed to be dynamically guided by coverage information obtained from the test execution progress.

Many of these issues have been addressed by a recent academic work [3]. It generates endless instruction streams and integrates the ISS with the RTL core in a very efficient co-simulation compiled into a single binary with in-memory communication. The setup allows generating instructions without any restrictions, i.e., arbitrary combinations of load/store and Control and Status Register (CSR)¹ instructions, as well as infinite loops, are supported, which enables a very comprehensive test approach. However, the approach is still limited as it does not collect or employ runtime coverage information to assess and guide the test generation process. Instead, the instruction stream generators are based on a simple randomized test strategy, which makes it very difficult to continuously achieve a broad and deep test coverage in endless instruction streams.

In this paper, we propose a novel cross-level verification approach that conceptually builds upon the previous academic work [3] and addresses the aforementioned limitations. The foundation is a randomized coverage-guided instruction stream generator that produces an endless and unrestricted instruction stream that evolves dynamically at runtime based on observed coverage information. We also leverage an ISS as a reference model in a tight co-simulation setting. Coverage information is continuously updated based on the execution state of the ISS, and we employ the novel concept of Coverage-guided Aging to smooth out the coverage distribution of the randomized instruction stream over time. In combination, this enables a broad and deep coverage to find intricate corner-case bugs in the RTL core. Our experiments with the 32-bit pipelined RISC-V core of the MINRES The Good Core (TGC) series demonstrate the effectiveness of our approach. We achieve a much more regular coverage distribution of the randomized instruction stream via Coverage-guided Aging, and we found another intricate micro-architecture related bug in the interplay between the already heavily tested industrial processor and the accompanying test bench infrastructure.

¹ In the CSRs, the processor stores additional instruction results to enable sophisticated hardware/software interactions.

This work was supported in part by the German Federal Ministry of Education and Research (BMBF) within the project Scale4Edge under contract no. 16ME0127, and within the project VerSys under contract no. 01IW19001.

II. RELATED WORK

Several approaches have been proposed to generate tests for the purpose of processor verification. One prominent direction is to employ model-based test generators that leverage a constraint-based specification format to guide the test generation process [4], [5]. In this context, optimization techniques for constraint propagation [6], execution path coverage models [7] and mining techniques for processor manuals [8] have been considered. Alternative approaches integrate coverage-guided test generation based on Bayesian networks [9] and other machine learning techniques [10] as well as fuzzing [11] and symbolic execution [12].
However, these approaches are either not designed for RTL verification or impose restrictions on the generated instruction streams. In addition, they do not target the modern RISC-V ISA.

Recently, verification approaches tailored for RISC-V have emerged. In the introduction, we already covered the modern co-simulation based approaches that are tailored for RTL and are closest to our proposed approach. Other simulation-based approaches for RISC-V generate instruction sequences by combining pre-defined randomized patterns [13] and by utilizing constraint-based specifications [14] as well as coverage-guided fuzzing techniques [15]. However, they suffer from the same limitations as the traditional processor-level stimuli generation approaches in imposing restrictions or operating at a different abstraction level than RTL. Finally, a set of directed test-suites that cover different RISC-V instruction sets [16]–[18] is available and forms a baseline for testing. Looking beyond simulation-based techniques, a few formal approaches that are based on model checking techniques [19], [20] have been proposed as well. Nevertheless, these formal techniques are possibly susceptible to scalability issues.

[Fig. 1. Overview on core verification. Blocks shown: Seed, InstrGen, Core-Adapter, RTL-Core, RTL-Memory; Seed, InstrGen, ISS, ISS-Memory; Instruction-Injector, Coverage-Observer, Comparator.]

III. BACKGROUND ON RISC-V

RISC-V is a free and open Instruction Set Architecture (ISA) that was developed at UC Berkeley and is available under an open-source license, the Creative Commons Attribution 4.0 International License [1], [2]. RISC-V provides three different integer base ISAs that differ primarily in the used word width: RV32I is the 32-bit version of the architecture, RV64I is the 64-bit version, and RV128I is the 128-bit version. These base ISAs define integer calculations, program control, load and store operations, and debugging instructions.
In addition to these base ISAs, many instruction set extensions are defined. The used extensions are appended to the name of the integer base ISA to name the capabilities of a core implementation. A 32-bit RISC-V processor with a multiplication unit, CSR instructions, the instruction-fetch fence, and support for compressed instructions is called RV32IMCZicsrZifencei.

IV. CROSS-LEVEL PROCESSOR VERIFICATION WITH COVERAGE-GUIDED AGING

In this section, we present our cross-level processor verification approach that is based on endless randomized instruction stream generation using Coverage-guided Aging. We start with an overview.

A. Overview

Fig. 1 shows the overview of our approach. It starts with initializing the random instruction generators (InstrGen). Each core has its own instruction generator, and all generators are initialized with the same cryptographic seed. As a consequence, the generators provide the same endless randomized instruction stream. At first, some instructions of the endless instruction stream are generated and executed by the ISS. After this, the RTL processor fetches its instruction stream. However, for the fetching of the RTL core, micro-architectural details such as pipelining, pre-fetching, and fetch-buffering have to be considered. For this purpose, a core adapter is used, which checks for addresses that were not fetched by the ISS, fills them with randomized values (not generated by InstrGen), and forwards them to the RTL-Core. After the execution of the instructions, the core and the ISS write the results to their separate memories.

Next, the Coverage-Observer measures the functional coverage based on the ISS execution state, performs the coverage aging, and gives hints to the Instruction-Injector if functionality must be covered (again). In principle, the functional coverage can be specified arbitrarily complex and is used to guide the test generation over time. We will present more details on the Coverage-Observer in Section IV-B. Next, the Instruction-Injector evaluates the hints and injects instructions to cover the requested functionality. The injector must consider that the cores have different fetch behaviors and execution timings that result in individual random instruction generator states. The functional principle of the Instruction-Injector is described in Section IV-C.

The purpose of the Comparator is to find functional differences between the RTL-Core and the ISS. To achieve this, it compares the register values of the ISS and the RTL-Core. The matching is not straightforward because the cores do not have the same timing behavior. To solve this problem, the Comparator logs the value changes and constantly compares the two changes at the same position. If the Comparator finds any difference, it quits the simulation. In the following, we provide more details on the Coverage-Observer (Section IV-B) and Instruction-Injector (Section IV-C), which are the two most important components to implement Coverage-guided Aging.

B. Coverage-Observer

The main functionality of the Coverage-Observer is to monitor the internal state of the ISS to measure the coverage. It samples the executed instructions and looks up the matching coverage points. In this work, we define the cross-product of instruction groups as coverage points. The instruction groups are defined by a verification engineer to lay the verification focus on the to-be-tested functionality. An instruction group covers a set of instructions like arithmetic or load/store instructions. Consequently, our approach guarantees to verify each functionality in combination with every other functionality. The Coverage-Observer watches the executed instructions at run-time and is the heart of our coverage aging extension.

After an instruction sequence covers a coverage point, the Coverage-Observer sets the corresponding Coverage-guided Aging counter to a defined maximal value. Periodically, the Coverage-Observer decreases the Coverage-guided Aging counter until the minimum limit is reached. In this case, it gives a hint to the Instruction-Injector. This hint consists of a random instruction sequence that is needed to cover the coverage point. The instructions are randomly selected from the instructions that were sampled dynamically in this run. The Coverage-Observer resets the Coverage-guided Aging counter when the groups are covered again. Next we describe the Instruction-Injector.

C. Instruction-Injector

The purpose of the Instruction-Injector is to inject instruction sequences into the random test generators in compliance with their internal state. If the instruction injection ignores the internal states, the generators provide differing instruction streams that may lead to a false result of the Comparator. To achieve a legal injection, the Instruction-Injector measures how many instructions have been executed before the current state of the random generator was reached. Then, it schedules the injection to the same near-future instruction count for all instruction generators. This approach is valid because deterministic random sources that are initialized with the same cryptographic seed value provide the same random sequences. In this way, we have ensured that the behaviors of the random instruction generators are equal.

V. EVALUATION

In this section, we present our case study and discuss the evaluation results. The goal of our case study is to evaluate the applicability of Coverage-guided Aging for cross-level processor verification. We start with the test setup.

A. Test Setup

As Device Under Test (DUT), we used the 32-bit pipelined RISC-V core of the MINRES The Good Core (TGC) series, which has already been extensively verified using simulation-based approaches and formal techniques. As reference ISS, we used the ISS of the open-source SystemC-based RISC-V VP². To enable the co-simulation, we translated the industrial RTL core to C++ using the open-source tool Verilator³ and integrated it into a SystemC test bench along with the ISS. For our evaluation, we configured the core and ISS to support the RISC-V subset RV32IMCZicsrZifencei (see Section III). All experiments were executed on an Ubuntu 20.04 LTS machine with an AMD Ryzen 7 PRO 4750U CPU with 4.1 GHz and 36 GB RAM and a SystemC simulation time limit of 1 second (≈ 20 million instructions).

By analyzing the RISC-V specification, we identified the following six important instruction groups that act as base for the coverage points in this case study: Arithmetic, Control Flow, Memory, Special & System, Control & Status Register (CSR), and Other. The group Arithmetic contains all arithmetic instructions of the instruction subsets RV32I and RV32C and all instructions of RV32M. The group Control Flow contains the unconditional jump and the conditional branch instructions of RV32I and RV32C. The group Memory contains the load/store instructions of RV32I and RV32C and the memory ordering instructions of RV32I. The group Special & System contains the ECALL, EBREAK, NOP, and HINT instructions of RV32I, and the illegal, NOP, breakpoint, and HINT instructions of RV32C. Additionally, it contains the FENCE.I instruction of Zifencei. The group CSR is equivalent to Zicsr. The group Other contains all instructions of the undefined and unsupported subsets and the privileged architecture.

The six instruction groups result in 36 coverage points. Based on this, we configured the Coverage-guided Aging counter to the value 100; it is decremented after each newly generated instruction. With the value 100, there are enough random instructions, and at the same time, the coverage points are triggered frequently. In the following, we compare the results of a random test generator with and without our Coverage-guided Aging extension (Section V-B). Then we present a bug that we found during the development process (Section V-C).

² https://github.com/agra-uni-bremen/riscv-vp
³ https://www.veripool.org/verilator/

B. Random vs. Coverage-guided Aging

Fig. 2 shows the result bar chart of our case study. The chart gives information about how often the coverage points (defined as cross product of the instruction groups) were executed by the

random test generator and the Coverage-guided Aging test generator. The random generator is a re-implementation of the test generator of [3] and has already proven its excellent bug-hunting capabilities. Unfortunately, it tends to favor specific test state spaces. It is based on a static randomized test strategy that does not change over time. However, such an adjustment is critical since we are looking at an endless instruction stream and not at individual cases where readjustment after each run is possible.

[Fig. 2. Cross Coverage Groups: Sum of all runs. Bar chart over the cross coverage points; legend: Random, Random + Coverage Aging.]

As stated in the legend, the blue bars, which are always on the left side, represent the instructions generated by the random test generator, and the orange bars belong to the test generator that is enhanced with Coverage-guided Aging. The execution of the random test generator leads to substantial peaks in specific combinations of instruction groups while other combinations were almost never executed. For example, the count of Special & System : Special & System is so low that it can hardly be seen, whereas the combination Other : Other was executed very often. Thus, clear gaps can be seen. In contrast, the Coverage-guided Aging generator has much weaker peaks on certain groups. In addition, every group is executed and always reaches a clearly visible execution count. Thus, the result of the random test generator seems to degenerate. In comparison, the Coverage-guided Aging test generator provides a more balanced result, and no gaps can be seen. Unfortunately, the results could not be presented for space reasons. Thus, we have shown that Coverage-guided Aging helps to close coverage gaps and achieves more balanced verification results.

C. Detected Pipeline Bug

During the development of the Coverage-guided Aging test generator, we discovered a micro-architecture related bug in the accompanying test bench adapter of the already well-tested industrial RTL-Core. In certain test cases, there were no free entries in the execute FIFO of the pipeline and thus the core did not receive any further instructions. This was triggered because the pipeline was only emptied by the test bench adapter when a valid instruction was executed. Therefore, a test case could trigger this error if the core ran too many invalid instructions (see Special & System : Special & System in Fig. 2) in succession.

D. Discussion and Future Work

Our case study shows that Coverage-guided Aging is an effective extension for cross-level processor verification. We have shown that Coverage-guided Aging helps to close coverage gaps and achieves a much more regular coverage distribution. Furthermore, we found another intricate micro-architectural bug in the already heavily tested industrial processor. For future work, we plan to design advanced micro-architecture coverage metrics to measure specific feature testing like the hazard handling of pipelines. In addition, we plan to create a processor verification benchmark based on finely detailed coverage groups.

REFERENCES

[1] A. Waterman and K. Asanović, Eds., The RISC-V Instruction Set Manual; Volume I: Unprivileged ISA, 2019.
[2] A. Waterman and K. Asanović, Eds., The RISC-V Instruction Set Manual; Volume II: Privileged Architecture, 2019.
[3] V. Herdt, D. Große, E. Jentzsch, and R. Drechsler, “Efficient cross-level testing for processor verification: A RISC-V case-study,” in FDL, 2020, pp. 1–7.
[4] A. Adir, E. Almog, L. Fournier, E. Marcus, M. Rimon, M. Vinov, and A. Ziv, “Genesys-pro: innovations in test program generation for functional processor verification,” D&T, pp. 84–93, 2004.
[5] B. Campbell and I. Stark, “Randomised testing of a microprocessor model using SMT-solver state generation,” in Formal Methods for Industrial Critical Systems, F. Lang and F. Flammini, Eds., 2014, pp. 185–199.
[6] Y. Katz, M. Rimon, and A. Ziv, “Generating instruction streams using abstract CSP,” in DATE, 2012, pp. 15–20.
[7] M. Chupilko, A. Kamkin, A. Kotsynyak, and A. Tatarnikov, “MicroTESK: specification-based tool for constructing test program generators,” in HVC, 2017.
[8] W. Ma, A. Forin, and J. Liu, “Rapid prototyping and compact testing of CPU emulators,” in RSP, 2010, pp. 1–7.
[9] S. Fine and A. Ziv, “Coverage directed test generation for functional verification using bayesian networks,” in DAC, 2003, pp. 286–291.
[10] C. Ioannides, G. Barrett, and K. Eder, “Feedback-based coverage directed test generation: An industrial evaluation,” in Hardware and Software: Verification and Testing, S. Barner, I. Harris, D. Kroening, and O. Raz, Eds., 2011.
[11] L. Martignoni, R. Paleari, G. F. Roglia, and D. Bruschi, “Testing CPU emulators,” in ISSTA, 2009, pp. 261–272.
[12] H. Wagstaff, T. Spink, and B. Franke, “Automated ISA branch coverage analysis and test case generation for retargetable instruction set simulators,” in CASES, 2014, pp. 1–10.
[13] “RISC-V torture test generator,” https://github.com/ucb-bar/riscv-torture.
[14] V. Herdt, D. Große, and R. Drechsler, “Towards specification and testing of RISC-V ISA compliance,” in DATE, 2020, pp. 995–998.
[15] V. Herdt, D. Große, H. M. Le, and R. Drechsler, “Verifying instruction set simulators using coverage-guided fuzzing,” in DATE, 2019, pp. 360–365.
[16] “RISC-V ISA tests,” https://github.com/riscv/riscv-tests.
[17] “RISC-V compliance task group,” https://github.com/riscv/riscv-compliance.
[18] N. Bruns, V. Herdt, D. Große, and R. Drechsler, “Toward RISC-V CSR compliance testing,” IEEE ESL, vol. 13, no. 4, pp. 202–205, 2021.
[19] “RISC-V formal verification framework,” https://github.com/SymbioticEDA/riscv-formal, 2020.
[20] “OneSpin 360 DV RISC-V Verification App,” https://www.onespin.com/solutions/risc-v, 2020.
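
The interplay of the Coverage-Observer (Section IV-B) and the Instruction-Injector (Section IV-C) can be illustrated with a small sketch. It is written in Python for brevity and is not the implementation used in the case study, which is built on C++/SystemC. Several points are assumptions made only for illustration: a coverage point is modeled as the ordered pair of instruction groups of two consecutive instructions, instructions are reduced to their group name, and the generator weights, the `lookahead` distance, and the `rtl_lag` are hypothetical values. The class names follow Fig. 1, while all method names are invented here.

```python
import random
from itertools import product

# Six instruction groups from the case study; 6 x 6 = 36 coverage points.
GROUPS = ["Arithmetic", "ControlFlow", "Memory", "SpecialSystem", "CSR", "Other"]
AGING_MAX = 100  # counter start value reported in the case study


class InstrGen:
    """Seeded generator of instruction groups with scheduled injections."""

    def __init__(self, seed, weights):
        self.rng = random.Random(seed)  # same seed -> same random sequence
        self.weights = weights
        self.count = 0                  # instructions generated so far
        self.scheduled = {}             # instruction index -> injected group

    def schedule(self, at, sequence):
        for offset, group in enumerate(sequence):
            self.scheduled[at + offset] = group

    def next(self):
        # Always draw, so the random state advances identically in every generator.
        drawn = self.rng.choices(GROUPS, weights=self.weights)[0]
        group = self.scheduled.pop(self.count, drawn)
        self.count += 1
        return group


class CoverageObserver:
    """One aging counter per coverage point (assumed: ordered pair of groups)."""

    def __init__(self):
        self.age = {point: AGING_MAX for point in product(GROUPS, repeat=2)}
        self.hits = dict.fromkeys(self.age, 0)
        self.prev = None

    def sample(self, group):
        """Account for one executed instruction; return points that just aged out."""
        if self.prev is not None:
            point = (self.prev, group)
            self.hits[point] += 1
            self.age[point] = AGING_MAX  # covered: reset to the maximum
        self.prev = group
        hints = []
        for point, value in self.age.items():
            if value > 0:
                self.age[point] = value - 1
                if value == 1:           # minimum reached: hint the injector
                    hints.append(point)
        return hints


def run(steps, seed=1, aging=True, lookahead=8, rtl_lag=4):
    weights = [1, 1, 1, 0.01, 1, 5]      # hypothetical skew of the plain generator
    iss_gen, rtl_gen = InstrGen(seed, weights), InstrGen(seed, weights)
    observer = CoverageObserver()
    iss_stream, rtl_stream = [], []
    next_free = 0                        # first index not yet reserved for injection
    for _ in range(steps):
        group = iss_gen.next()
        iss_stream.append(group)
        hints = observer.sample(group)
        if aging:
            for point in hints:
                # Same near-future instruction count for all generators.
                at = max(iss_gen.count + lookahead, next_free)
                for gen in (iss_gen, rtl_gen):
                    gen.schedule(at, point)
                next_free = at + len(point)
        if iss_gen.count > rtl_lag:      # the RTL side fetches later than the ISS
            rtl_stream.append(rtl_gen.next())
    while rtl_gen.count < iss_gen.count:  # let the lagging generator catch up
        rtl_stream.append(rtl_gen.next())
    return observer.hits, iss_stream, rtl_stream
```

Calling `run(5000)` and `run(5000, aging=False)` and comparing the two returned hit tables gives a rough impression of the smoothing effect discussed in Section V-B. Because injections are scheduled by instruction count rather than by wall-clock time, the lagging generator reproduces exactly the stream of the leading one, which is the property the Comparator relies on.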