Publication
"Randomized Testing of RISC-V CPUs Using Direct Instruction Injection" was authored by Alexandre Joannou, Peter Rugg, Jonathan Woodruff, Franz A. Fuchs, Marno van der Maas, Matthew Naylor, Michael Roe, Robert N. M. Watson, Peter G. Neumann, and Simon W. Moore, and published in IEEE Design & Test of Computers, volume 41, issue 1, pages 40–49, in February 2024. [IEEE Design & Test of Computers] [DOI 10.1109/MDAT.2023.3262741] [1]
Overview
The paper describes a randomized verification approach for RISC-V CPU implementations built around the TestRIG ecosystem. It positions TestRIG as a standardized environment in which verification engines, models, and implementations communicate through common interfaces and can be improved independently. [TestRIG ecosystem] [2]
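The comparison between a model and an implementation can be pictured as lock-step trace checking: both receive the same instruction sequence and their retired-instruction records are compared in order. The Python sketch below is illustrative only. `RetireRecord` and `first_divergence` are hypothetical names, and the fields are a simplified subset loosely modelled on RVFI trace information, not the actual RVFI-DII packet format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RetireRecord:
    """Hypothetical, simplified record of one retired instruction."""
    pc: int          # address the instruction retired at
    insn: int        # instruction encoding
    rd_addr: int     # destination register (0 if none)
    rd_wdata: int    # value written to rd
    trap: bool = False


def first_divergence(model: Sequence[RetireRecord],
                     impl: Sequence[RetireRecord]) -> Optional[int]:
    """Index of the first record on which the two traces disagree, or None."""
    for i, (expected, actual) in enumerate(zip(model, impl)):
        if expected != actual:
            return i
    if len(model) != len(impl):
        # One side retired fewer instructions: they diverge where the shorter ends.
        return min(len(model), len(impl))
    return None
```

A non-`None` result marks a counterexample, which is the starting point for the shrinking described later.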
The central mechanism is Direct Instruction Injection (DII): instead of relying only on fetched program binaries, the system injects instruction-level packets directly into implementations. The paper states that instruction injection makes it straightforward to shrink instruction sequences that contain branches, and that it was used to replace instruction-level unit tests for the CHERI extension. [Direct Instruction Injection] [2]
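One way to see why injection helps with branches: the next instruction comes from the injected stream rather than from memory at the current PC. In the Python sketch below, `run_injected` and `toy_execute` are hypothetical; the toy next-PC logic recognises only the encoding of `beq x0, x0, 16` (0x00000863) and treats everything else as a 4-byte non-branching instruction. The reset PC of 0x80000000 is an assumption chosen to match the memory base given in the requirements.

```python
from typing import Callable, List, Sequence

ADDI_X1_X0_1 = 0x00100093  # addi x1, x0, 1
BEQ_X0_X0_16 = 0x00000863  # beq x0, x0, 16 (always taken)

Execute = Callable[[int, int], int]  # (pc, insn) -> next pc


def toy_execute(pc: int, insn: int) -> int:
    """Toy next-PC logic: only the always-taken beq above is treated as a branch."""
    return pc + 16 if insn == BEQ_X0_X0_16 else pc + 4


def run_injected(insns: Sequence[int], execute: Execute,
                 reset_pc: int = 0x8000_0000) -> List[int]:
    """Retire `insns` in the order given and return the PC of each one.

    The injected stream, not memory at `pc`, decides what executes next, so a
    taken branch only changes the PC reported for the following instruction.
    """
    pcs: List[int] = []
    pc = reset_pc
    for insn in insns:
        pcs.append(pc)
        pc = execute(pc, insn)
    return pcs
```

Deleting the leading `addi` from the list still yields a valid test: the branch retires and redirects the PC, but no branch offset has to be recomputed, as it would be when shrinking a linked program binary.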
RVFI-DII interface and implementation requirements
To participate in the TestRIG ecosystem, implementations must be extended with RVFI-DII instrumentation. The paper states that supporting data structures and libraries are distributed in several languages to facilitate RVFI-DII connections over TCP ports. [RVFI-DII instrumentation] [2]
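The sketch below shows the kind of framing such a library handles: fixed-size instruction packets carried over a TCP byte stream. The 8-byte layout (little-endian 32-bit instruction, 16-bit time field, 8-bit command, 8-bit padding) and the command values are assumptions made for illustration; the authoritative packet definitions are those distributed with TestRIG. `encode_packet`, `recv_exact`, and `read_packets` are hypothetical helpers.

```python
import socket
import struct
from typing import Iterator, Tuple

# Assumed layout, for illustration only: insn (u32), time (u16), cmd (u8), pad (u8).
PACKET = struct.Struct("<IHBB")
CMD_END_OF_TRACE = 0   # assumed: end of trace / reset
CMD_INSTRUCTION = 1    # assumed: execute the carried instruction


def encode_packet(insn: int, cmd: int = CMD_INSTRUCTION, time: int = 0) -> bytes:
    return PACKET.pack(insn & 0xFFFF_FFFF, time & 0xFFFF, cmd & 0xFF, 0)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a single recv() may return only part of a packet."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-packet")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def read_packets(sock: socket.socket) -> Iterator[Tuple[int, int, int]]:
    """Yield (insn, time, cmd) tuples up to and including an end-of-trace packet."""
    while True:
        insn, time, cmd, _pad = PACKET.unpack(recv_exact(sock, PACKET.size))
        yield insn, time, cmd
        if cmd == CMD_END_OF_TRACE:
            return
```

Using fixed-size packets keeps the receiver simple: it never needs a length prefix, only a loop that waits for exactly one packet's worth of bytes.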
The paper also sets baseline expectations for TestRIG participants. Implementations are expected to behave identically in every architecturally visible way and to expose an RVFI-DII interface. They are expected to provide 8 MiB of memory at address 0x80000000 and to return access faults for all other addresses. After a reset DII packet, they are expected to return to a known state: zeroed registers, known default CSR values, and zeroed memory. [TestRIG requirements] [2]
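The memory-map and reset requirements can be sketched as a small architectural-state model. In this Python sketch, `ArchState`, `AccessFault`, and the word-sized accessors are hypothetical, alignment checks are omitted, and `DEFAULT_CSRS` is an empty placeholder because the actual default CSR values are not given here.

```python
from typing import Dict, List

MEM_BASE = 0x8000_0000
MEM_SIZE = 8 * 1024 * 1024          # 8 MiB
MEM_END = MEM_BASE + MEM_SIZE       # exclusive upper bound: 0x8080_0000

# Placeholder: the real default CSR values are not specified in this sketch.
DEFAULT_CSRS: Dict[int, int] = {}


class AccessFault(Exception):
    """Raised for any access outside the single 8 MiB memory region."""


class ArchState:
    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        """Known state after a reset packet: zeroed registers and memory."""
        self.x: List[int] = [0] * 32
        self.csrs: Dict[int, int] = dict(DEFAULT_CSRS)
        self.mem = bytearray(MEM_SIZE)

    def _offset(self, addr: int, size: int) -> int:
        # Every byte of the access must lie inside [MEM_BASE, MEM_END).
        if addr < MEM_BASE or addr + size > MEM_END:
            raise AccessFault(f"access fault at {addr:#x}")
        return addr - MEM_BASE

    def load_word(self, addr: int) -> int:
        off = self._offset(addr, 4)
        return int.from_bytes(self.mem[off:off + 4], "little")  # RISC-V is little-endian

    def store_word(self, addr: int, value: int) -> None:
        off = self._offset(addr, 4)
        self.mem[off:off + 4] = (value & 0xFFFF_FFFF).to_bytes(4, "little")
```

Because every participant must fault on the same addresses and reset to the same state, any difference in these behaviours shows up directly as a trace divergence.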
The paper discusses implementation choices for instruction injection. A design may remove the instruction cache entirely while preserving architecturally visible PC translation, or it may exercise the instruction cache and replace instruction bytes after fetch. For RISC-V compressed instructions, the paper notes a choice between substituting picked instructions before decode and injecting 16-bit instruction fragments to exercise instruction-picking logic. [Injection design choices] [2]
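For the compressed-instruction case, the sketch below shows what instruction-picking logic must do with a stream of 16-bit fragments, using the base RISC-V length rule: if the low two bits of a parcel are `11`, the instruction is 32 bits long; otherwise it is a 16-bit compressed instruction. Longer encodings are ignored, and `insn_length` and `pick_instructions` are hypothetical names.

```python
from typing import List, Sequence


def insn_length(parcel: int) -> int:
    """Instruction length in bytes from its first 16-bit parcel (16/32-bit only)."""
    return 4 if parcel & 0b11 == 0b11 else 2


def pick_instructions(parcels: Sequence[int]) -> List[int]:
    """Reassemble instructions from a stream of 16-bit fragments."""
    out: List[int] = []
    i = 0
    while i < len(parcels):
        lo = parcels[i]
        if insn_length(lo) == 2:
            out.append(lo)                    # compressed instruction
            i += 1
        else:
            if i + 1 >= len(parcels):
                raise ValueError("32-bit instruction truncated at end of stream")
            out.append(lo | (parcels[i + 1] << 16))  # low parcel comes first
            i += 2
    return out
```

Given the fragments `[0x0001, 0x0093, 0x0010, 0x0001]` (a `c.nop`, then `addi x1, x0, 1`, then another `c.nop`), the 32-bit `addi` starts at a 2-byte offset and straddles a 32-bit boundary. Injecting fragments exercises that path, whereas substituting already-picked instructions before decode bypasses it.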
Shrinking and counterexample reduction
The paper emphasizes shrinking as a key advantage of Direct Instruction Injection. Once QCVEngine, TestRIG's QuickCheck-based verification engine, finds a counterexample, QuickCheck's built-in list shrinking removes candidate instructions and reruns the test, discarding instructions that are irrelevant to the erroneous behavior. [Smart shrinking] [2]
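A minimal Python sketch of removal-based shrinking, assuming a `still_fails` predicate that reruns a candidate sequence and reports whether the divergence persists. It removes one element at a time until no single removal preserves the failure. QuickCheck's list shrinking also tries removing larger chunks, so this is a simplification rather than its exact algorithm, and `shrink_by_removal` is a hypothetical name.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def shrink_by_removal(seq: List[T], still_fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily drop elements while the shrunk sequence still reproduces the failure."""
    if not still_fails(seq):
        raise ValueError("shrinking needs a failing sequence to start from")
    changed = True
    while changed:                      # repeat until a full pass removes nothing
        changed = False
        i = 0
        while i < len(seq):
            candidate = seq[:i] + seq[i + 1:]
            if still_fails(candidate):  # element i is irrelevant to the failure
                seq = candidate
                changed = True
            else:
                i += 1
    return seq
```

Each call to `still_fails` is a full rerun against model and implementation, which is why cheap candidate generation and a fast test harness matter in practice.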
The authors augment generic list shrinking with smart transformations. One described transformation propagates an instruction's output register into later input operands, enabling additional list-shrinking passes to remove move-like instructions. The paper also describes a simplification library that replaces distracting or esoteric instructions with simpler equivalents when possible, so that the reduced trace more directly exposes the root cause of a failure. [Smart shrinking] [2]
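The register-propagation idea can be sketched as a generator of extra shrink candidates. The toy `Insn` record (which omits immediates) and `propagation_candidates` are hypothetical. Each candidate rewrites one later source operand to read an earlier instruction's destination register; as with plain removal, a candidate is kept only if the test still fails.

```python
from dataclasses import dataclass, replace
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class Insn:
    """Toy instruction without immediates; rs2 is None for one-source formats."""
    op: str
    rd: int
    rs1: int
    rs2: Optional[int] = None


def propagation_candidates(seq: List[Insn]) -> Iterator[List[Insn]]:
    """Yield copies of `seq` in which one later source operand reads an earlier rd."""
    for i, producer in enumerate(seq):
        if producer.rd == 0:
            continue  # writes to x0 are discarded, so there is nothing to propagate
        for j in range(i + 1, len(seq)):
            for field in ("rs1", "rs2"):
                src = getattr(seq[j], field)
                if src is None or src == producer.rd:
                    continue  # operand absent, or already reads the producer
                candidate = list(seq)
                candidate[j] = replace(seq[j], **{field: producer.rd})
                yield candidate
```

For the sequence `addi x1, x0, ...`, `addi x2, x1, 0` (a move), `add x3, x2, x0`, one candidate makes the `add` read `x1` instead of `x2`. If the failure persists, the move no longer feeds anything and a subsequent removal pass can delete it.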
Sequences can also be annotated as non-shrinkable. The paper gives the example of forcing initialization to avoid trivial counterexamples caused by uninitialized floating-point registers, which lets testing proceed to more interesting divergences in exception conditions and rounding modes. [Non-shrinkable sequences] [2]
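Extending removal-based shrinking, the sketch below marks groups of instructions as shrinkable or not. `Group`, `flatten`, `shrink_groups`, and the string-valued instructions are hypothetical. In the usage example the initialization group survives even though the toy failure predicate does not depend on it, which is exactly what a plain shrinker would not guarantee.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Group:
    """A run of instructions; non-shrinkable groups are never edited by the shrinker."""
    insns: Tuple[str, ...]
    shrinkable: bool = True


def flatten(groups: List[Group]) -> List[str]:
    return [insn for g in groups for insn in g.insns]


def shrink_groups(groups: List[Group],
                  still_fails: Callable[[List[str]], bool]) -> List[Group]:
    """Remove instructions one at a time, but only from shrinkable groups."""
    changed = True
    while changed:
        changed = False
        for gi in range(len(groups)):
            if not groups[gi].shrinkable:
                continue  # e.g. forced floating-point register initialization
            k = 0
            while k < len(groups[gi].insns):
                g = groups[gi]
                smaller = Group(g.insns[:k] + g.insns[k + 1:], g.shrinkable)
                candidate = groups[:gi] + [smaller] + groups[gi + 1:]
                if still_fails(flatten(candidate)):
                    groups = candidate
                    changed = True
                else:
                    k += 1
    return groups
```

With an initialization group `fmv.w.x f0, x0; fmv.w.x f1, x0` marked non-shrinkable and a body containing `fdiv.s f2, f0, f1`, the shrunk result keeps both `fmv.w.x` instructions ahead of the single remaining `fdiv.s`.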
Assertions
TestRIG sequences may include assertions, such as an assertion that the value written by the previous instruction is non-zero. According to the paper, assertions allow failures to be detected without tandem verification against a model, and the authors used them to probe the limits of implementation-defined behavior. [Assertions] [2]
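A possible shape for such a check, assuming the implementation reports the value each instruction wrote: the sequence interleaves instructions and assertion items, and checking needs only the implementation's own trace, with no model involved. `AssertPrevNonZero`, `Retired`, and `first_failed_assertion` are hypothetical names, and treating an assertion with no preceding instruction as a failure is an arbitrary choice of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass(frozen=True)
class AssertPrevNonZero:
    """Assertion item: the previous instruction must have written a non-zero value."""


@dataclass(frozen=True)
class Retired:
    rd_addr: int
    rd_wdata: int


Item = Union[str, AssertPrevNonZero]  # instructions are plain strings in this toy


def first_failed_assertion(items: Sequence[Item],
                           trace: Sequence[Retired]) -> Optional[int]:
    """Index in `items` of the first failing assertion, or None if all hold.

    `trace[k]` is the implementation's record for the k-th instruction item.
    """
    k = 0
    prev: Optional[Retired] = None
    for idx, item in enumerate(items):
        if isinstance(item, AssertPrevNonZero):
            if prev is None or prev.rd_wdata == 0:
                return idx
        else:
            prev = trace[k]
            k += 1
    return None
```

Because the check consults only one trace, it can express properties that a model comparison cannot, such as bounds on behavior that the specification leaves implementation-defined.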
Coverage evaluation
The paper evaluates architectural coverage using sailcov, which measures how many branches of the RISC-V Sail model are explored during a run. The coverage study compares TestRIG's QCVEngine against riscv-tests (the RISC-V test suite) and the RISCV-DV generator. [Coverage methodology] [3]
The study performs two runs of each framework (QCVEngine, riscv-tests, and RISCV-DV) for both RV32IMC and RV64IMAFDCZicsr. For RV32IMC, the paper measures Sail-model coverage of the I, M, and C extension instructions and of the general-purpose registers. [Coverage methodology] [3]
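sailcov derives its figures from branch information produced by the Sail model; the Python sketch below does not reproduce sailcov's inputs or output formats. It only illustrates the underlying ratio with hypothetical branch identifiers: the fraction of in-scope model branches (for example, those belonging to I, M, and C instructions) taken in one run or in the union of several runs. `branch_coverage` is a hypothetical name.

```python
from typing import Iterable, Set


def branch_coverage(in_scope: Set[str], runs: Iterable[Set[str]]) -> float:
    """Fraction of in-scope branches taken in at least one of the given runs."""
    if not in_scope:
        raise ValueError("no branches in scope")
    covered: Set[str] = set()
    for taken in runs:
        covered |= taken & in_scope  # ignore branches outside the chosen scope
    return len(covered) / len(in_scope)
```

Restricting the denominator to an explicit scope is what makes figures for different frameworks comparable: a generator that exercises unrelated parts of the model gains nothing for branches outside the chosen extensions.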
Related tools and context
The paper situates TestRIG relative to other verification approaches. It describes PyH2P as pointing in an encouraging direction but lacking community-standard interfaces proven across a range of implementations; TestRIG is presented as maturing that approach through standardized communication among verification engines, models, and implementations. [Related work] [2]
The paper also mentions IBM's Genesys-Pro, a template-based approach that intelligently solves for desired deep states, and Symbolic QED, which uses a formal model of the pipeline to generate minimal tests for verification, including post-silicon verification. [Related work] [2]
External reception
The paper's RVFI-DII approach is independently cited in follow-on work on large-scale RISC-V processor verification. That work reports that applying RVFI-DII to the Ibex core requires more than 450 lines of code, and it uses RVFI-DII as a baseline against which alternative LLM-assisted testbench generation methods are compared. [4] [5]
Significance
The paper's main contribution is an end-to-end randomized testing framework for RISC-V CPUs: standardized RVFI-DII integration, direct instruction injection, randomized generation through QCVEngine, architectural-coverage comparison against other test frameworks, and shrinking techniques that reduce failures to simpler counterexamples. [TestRIG ecosystem] [Smart shrinking] [Coverage methodology] [2] [3]