RISC-V Wiki — STIMSMITH

RISC-V

Concept WIKI v19 · 6/10/2026

RISC-V is a free and open instruction-set architecture (ISA) standard based on RISC design principles. The provided evidence positions RISC-V as a major target for several research areas: randomized and tandem CPU verification (TestRIG and Direct Instruction Injection via the RVFI-DII interface), push-button formal hardware verification (rtlv) of RISC-V SoCs such as PicoRV32 and MicroTitan, minimally invasive generation of instruction-set simulators from the LIBRISCV formal ISA model integrated with Spike and RISC-V VP, comprehensive floating-point (FP) verification of RISC-V simulators using FP-RVVTS, hardware-accelerated generative-model-based RISC-V verification (Lyra and LyraGen), broader RISC-V micro-architectural verification practices (SystemVerilog/UVM, formal, emulation, plus property-based testing of pipelined RISC-V CPUs with QuickCheck and BlueCheck), RISC-V hardware/architecture security research, and software-exploitation studies such as return-oriented programming on RISC-V.

Overview

RISC-V is described in the provided public sources as a free and open instruction-set architecture standard and as an open-source hardware ISA based on RISC design principles. The same sources frame RISC-V as important in embedded and IoT processors, security research, software-exploitation analysis, CPU verification, floating-point (FP) verification of simulators, hardware-accelerated generative-model-based verification, and property-based testing of pipelined RISC-V CPUs.

The current technical evidence positions RISC-V as a target for several distinct lines of research:

Randomized and tandem CPU verification with TestRIG and Direct Instruction Injection (DII) over the RVFI-DII interface, as documented in the IEEE Design & Test version of the DII paper.
Property-based testing of pipelined RISC-V CPUs (e.g., Bluespec's "Fife", a 6-stage pipelined core) using TestRIG together with Haskell QuickCheck and the official RISC-V Sail formal model, plus an FPGA-runnable QuickCheck implementation in Bluespec called BlueCheck.
Push-button hardware verification with rtlv on RISC-V SoCs such as PicoRV32 and MicroTitan (an OpenTitan derivative).
Minimally invasive generation of instruction-set simulators (ISS) for RISC-V from the LIBRISCV formal ISA model, integrated with Spike and RISC-V VP.
Comprehensive floating-point verification of RISC-V simulators (RISC-V VP++, Spike FF, QEMU) with FP-RVVTS, covering the F, D, and Zfh extensions.
Hardware-accelerated RISC-V verification with Lyra, a heterogeneous GPU-CPU–FPGA co-verification framework paired with LyraGen, a domain-specialized generative model for ISA-aware stimulus generation.
Broader RISC-V micro-architectural verification practice, combining SystemVerilog/UVM constrained-random generation, formal methods on sub-units, and hardware-assisted verification (emulation, FPGA prototyping), with attention to functional safety (ISO 26262) and microarchitectural security.
RISC-V hardware/architecture security research.
Software-exploitation studies such as return-oriented programming (ROP) on RISC-V.

Randomized and tandem verification (TestRIG and DII)

The paper Randomized Testing of RISC-V CPUs Using Direct Instruction Injection by Joannou, Rugg, Woodruff, Fuchs, van der Maas, Naylor, Roe, Watson, Neumann, and Moore was published in IEEE Design & Test of Computers, volume 41, issue 1, pages 40–49, in February 2024, and is the canonical reference for the DII approach. The paper is also cited in subsequent RISC-V verification work as a general ISA correctness technique [044727c3].

TestRIG framework and RVFI-DII

TestRIG (hosted at github.com/CTSRD-CHERI/TestRIG) is a framework for RISC-V processor verification using the RVFI-DII ("rividy") interface. It supports two types of components: vengines (verification engines) and implementations (including models, simulators, and SoCs). A vengine generates one or more DII streams of instruction traces (itraces) and consumes one or more RVFI streams of execution traces (etraces). An implementation consumes a DII instruction trace and produces an RVFI execution trace. A vengine can, for example, produce two equivalent DII streams for two implementations and check that the resulting RVFI streams are equivalent.

The RVFI-DII communication uses a single socket: the itrace is consumed and the etrace is delivered over the same socket. An implementation must support a mode in which instructions are consumed exclusively from the RVFI-DII instruction port, bypassing any actual instruction memory and ignoring the architectural program counter, and must then deliver a trace report at the end of execution detailing the implementation's behaviour in response to each instruction in the RVFI-DII format.

The TestRIG repository includes submodules for the QuickCheck Verification Engine, Spike (built without CHERI support by default, as a 32-bit version of Sail), RVBS, the Sail model, Ibex, and Toooba. Build dependencies include cabal and ghc (installed via GHCup; cabal 3.4 or later is required for the list-bin command) for the QuickCheck Verification Engine, device-tree-compiler for Spike, the Bluespec compiler bsc for RVBS, verilator (version newer than 2018-03-17) for Ibex and Toooba, and opam packages for the Sail model.

A default configuration can be verified with make followed by utils/scripts/runTestRIG.py. A typical Sail-vs-Toooba comparison on the 64-bit RISC-V ISA is launched with make vengines, make sail-rv64, make toooba-rv64, and utils/scripts/runTestRIG.py -a sail -b toooba -r rv64i. A CHERI-enabled 32-bit comparison between Sail and Ibex is launched with make vengines, make sail-rv32-cheri, make ibex-rv32ic-cheri, and utils/scripts/scripts/runTestRIG.py -a sail -b ibex -r rv32icxcheri.

Capabilities of TestRIG

The TestRIG documentation articulates three concrete advantages over hand-written test suites:

It eliminates the "test gap" between specification and implementation. If the specification is an executable model (such as Sail or L3), TestRIG allows automated verification of any specified property without passing through human interpretation and hand-writing tests. While new classes of ISA behaviour still require work to construct trace generators, new instructions in an existing class are automatically included in traces and run in many more variations than hand-written tests would allow.
It verifies the pipeline as well as specification compliance. Test suites exercise prescribed test results and cannot reasonably verify complex pipeline behaviour, whereas TestRIG can verify every register value read in the pipeline under random sequence generation.
It greatly increases debugging efficiency. In-memory test suites require significant boilerplate to construct a valid test state that cannot be automatically reduced without disturbing instruction layout. Because TestRIG bypasses fetch through the PC, a sequence of instructions can be shortened by simply eliminating instructions from the trace. Automatically reduced counterexamples are reported on the order of a handful of instructions.

The RISC-V Summit Europe evidence discusses TestRIG-related work for randomized tandem verification, Direct Instruction Injection (DII), Sail-model transformations, short generated tests, and lockup debugging in the Toooba processor. A framework applies transformations to versions of a Sail model, builds them, and runs a Verification Engine; transformations and run results are tracked in an SQLite database and can be displayed in HTML for visual inspection. The current scripts use text-based Python transformations, while the authors state that a future direction would be to hook into the Sail compiler and transform the syntax tree directly. The same evidence notes that the framework could be adapted to mutate the RTL of a particular implementation, making it possible to test for microarchitectural coverage rather than only model coverage.

Test-suite generation

The evidence describes a test-suite-generation side effect of combining model-coverage transformations with QuickCheckVEngine's shrinking mechanism. Because only a single difference is introduced to the model at a time, the resulting counterexample is targeted to that line of the model; after shrinking, it becomes a minimal test case relying on the minimal features needed to exercise that behaviour. Repeating the process can build a library of traces that test each line of the model. For the coverage examined so far, these generated tests are described as typically having a single-digit number of instructions, rather than the hundreds of instructions required by traditional ISA tests, making failures easier to diagnose. The authors conjecture that such tests could form the basis of a comprehensive architectural compatibility suite.

Lockup debugging and Toooba findings

The evidence reports that an unrelated TestRIG development was motivated by bugs in the Toooba processor that caused it to lock up and stop retiring instructions — described as one of the worst possible processor failure modes, likely requiring a hardware reset to recover. Toooba was found to lock up by mis-decoding some illegal instructions. To check that the relevant lockup cases had been caught, the authors added a TestRIG mode that allows a single implementation to run alone without comparison to a model. In this mode, checking that an RVFI report is received for every DII instruction within a timeout is sufficient to detect lockup conditions. Injecting uniform 32-bit instructions into the processor with DII reproduced the decode issues and confirmed that the problem was resolved after fixes to Toooba. The same approach found a subtle, rare branch-prediction issue in Toooba that could also cause a lockup: the fetch stage could get stuck in a loop by incorrectly predicting that the first instruction is a compressed jump, then redirecting without correctly retraining the branch predictor because of an associativity issue. The framework also identified a fatal assert in a version of the Sail model.

Property-based testing of pipelined RISC-V CPUs (Nikhil/Bluespec)

A talk at FPL Launchpad by Rishiyur S. Nikhil (CTO and co-founder of Bluespec, Inc.) describes applying property-based testing of software to hardware, specifically to verify "Fife", a 6-stage pipelined RISC-V CPU. The talk uses Haskell QuickCheck via TestRIG (from the University of Cambridge) for RISC-V test generation and shrinking, and uses the official RISC-V Formal Model written in the Sail ISA Description Language. It describes the necessary TestRIG and Sail configuration, the instrumentation required for Direct Instruction Injection and effect reporting in any implementation being tested (largely reusable across implementations), and a complication specific to pipelined CPUs: speculative instruction execution arising from branch and PC prediction, and traps. The talk proposes extensions to TestRIG to handle non-deterministic events (such as interrupts) and micro-architectural properties (not shared across implementations).

The talk also briefly introduces BlueCheck, an implementation of QuickCheck written in the hardware design language Bluespec BSV, so that BlueCheck can itself be run in hardware (such as an FPGA) along with the CPU design being tested, for much greater testing speed.

Nikhil's biography notes that he chaired the RISC-V Foundation's technical group that selected the RISC-V ISA formal specification in Sail.

Push-button hardware verification (rtlv)

The paper rtlv: push-button verification of software on hardware by Moroze, Athalye, Kaashoek, and Zeldovich (MIT CSAIL) describes an approach for formally verifying properties that involve software running on hardware for many cycles. The paper uses deterministic start — verifying that boot code resets a processor's microarchitectural state to known deterministic values — as a motivating example.

Two key ideas enable rtlv to handle many cycles of circuit execution. First, rtlv compiles circuits to Rosette, a solver-aided programming language embedded in Racket, using hybrid symbolic execution to maximize the amount of state that remains concrete and minimize the complexity of symbolic expressions. Second, rtlv enables reusable circuit-agnostic property checkers with a performance-hint interface: the verifier itself is trusted, and hints are untrusted. Supplying inadequate or incorrect hints can only harm performance or cause the tool to fail to prove the property — they cannot be misused to make the tool erroneously report "OK". This separation allowed the rtlv/shiva checker to be reused for multiple circuits.

Verification performance

Verifying deterministic start for the PicoRV32 (a small RISC-V CPU with about 1,300 flip-flops) requires modeling 104 cycles of boot code. With rtlv this verification takes 1.3 seconds; with SymbiYosys, a popular open-source hardware verification tool, the same task cannot finish within 12 hours. rtlv scales linearly in cycles, while SymbiYosys scales exponentially. rtlv further scales to a larger 4,300-flip-flop RISC-V SoC (MicroTitan, an OpenTitan derivative) where verifying the state-clearing property requires modeling over 20,000 cycles of software executing on hardware. The MicroTitan case study uses the Ibex CPU, 8KB of ROM, 8KB of RAM, and UART/SPI/USB peripherals; its boot code was modeled to clear microarchitectural state, and rtlv's hint-driven optimizations were necessary to control the symbolic term growth in the SPI state machine. The MicroTitan core clock domain contains the Ibex CPU, memories, UART, and slices of the SPI and USB peripherals, and was the focus of the case study because of output determinism, which allows microarchitectural state to depend on input data and therefore requires tracking a set of "allowed dependencies" through boot-code execution.

Findings

The authors report that formal verification with rtlv helped find and fix violations of the property in the baseline hardware, demonstrating that the tool is useful for finding bugs in practice.

Minimally invasive ISS generation from formal models

The paper Minimally Invasive Generation of RISC-V describes an approach for generating the instruction execution unit of a RISC-V ISS from a formal ISA model while leaving other ISS components (e.g., memory) as-is, easing integration with existing simulators. The authors focus on RISC-V because the architecture is highly modular and the ISA specification is constantly expanding with new extensions, so simulators benefit from formal models that can be regenerated.

ISA model (LIBRISCV)

The approach uses the LIBRISCV formal RISC-V ISA model — an embedded DSL in Haskell. Unlike Sail, LIBRISCV focuses exclusively on describing user-level instruction semantics in isolation, without formally describing other aspects (memory behaviour, microarchitectural details such as pipelining or timing) and without instruction decoding. The LIBRISCV EDSL consists of 26 primitives for formally describing instruction semantics and an expression language for operations on memory/register values. Because the original LIBRISCV by Tempel et al. separated instruction decoding from execution, the authors added new primitives (decodeRD, decodeRS1, decodeImmI) so that instruction semantics can be expressed over the instruction opcode rather than already-decoded fields, enabling direct code generation.

Interface model and code generation

A custom interface model provides a generic C/C++ API for common operations (read_register, write_register, load_byte, load_half, load_word, etc.). The code generation tool is simulator-agnostic; per-simulator implementations map the generic API to internal simulator interfaces (e.g., the ISS's register file and memory implementation).

Integration targets and results

The authors integrated the generated ISS into two existing RISC-V simulators:

Spike, a high-simulation-speed RISC-V simulator developed by UC Berkeley, with limited SystemC/peripheral support.
RISC-V VP, a SystemC-based virtual prototype providing full hardware platform models (e.g., SiFive HiFive1, SiFive HiFive Unleashed).

The integration took less than a day for a programmer with domain knowledge. The modified simulators pass the official RISC-V tests. Benchmarks using the Embench suite (19 applications, 25 executions each) show that the generated ISS has either slightly lower or the same execution time as the manually written baseline, indicating no loss in simulation performance.

Related formal ISA-modelling work

The paper situates its work against:

Sail, a DSL for ISA semantics used to model RISC-V, ARM-v8, and CHERI-MIPS, with code generation to C, OCaml, Coq, Isabelle, and HOL4. Sail aims for completeness (including address translation and instruction decoding) which makes it complex to integrate into existing RISC-V simulators; it instead generates a new standalone ISA simulator.
GRIFT, a Haskell model attempting to capture the ISA precisely in Haskell's type system.
Forvis, an executable formal model in a restricted Haskell subset.
A Coq model using the embedded Kami DSL for reasoning about correctness of microarchitecture implementations or of software running on the ISA.

Floating-point verification of RISC-V simulators (FP-RVVTS)

The paper Float Fight: Verifying Floating-Point Behaviour in RISC-V Simulators by Ruep, Schlägl, and Große (Johannes Kepler University Linz), presented as a DATE 2026 Late Breaking Result, introduces FP-RVVTS — an enhanced version of RVVTS, an open-source framework for positive and negative testing of RISC-V vector instructions — extended to enable comprehensive FP verification across various RISC-V simulators and FP libraries.

Motivation

FP computation is pervasive in modern systems, and incorrect FP behaviour in domains such as scientific simulation, signal processing, computer graphics, machine learning, and safety-critical applications can cause subtle numerical errors that are hard to detect. The IEEE 754 standard specifies FP semantics and has been revised and extended over time, which increases verification effort, particularly for open ISAs such as RISC-V where a diverse set of processors, simulators, and toolchains coexist and evolve in parallel.

FP-RVVTS approach

FP-RVVTS incorporates a context-free FP grammar to cover the RISC-V FP extensions and enriches it with dependency annotations to enable more effective test-case reductions. As in RVVTS, the generated tests are automatically executed on both a reference simulator and a Design Under Test (DUT), where differences in the resulting architectural states are detected and used as input for isolation and minimization of failing instructions.

The grammar covers the F, D, and Zfh extensions for both RV32 and RV64, configured according to the target architecture. The grammar enforces architectural support (unsupported conversions do not compile), configures the rounding-mode CSR field frm, and allows independent per-instruction rounding-mode suffixes (e.g., fsqrt.d f1, f2, rne). Both global and per-instruction rounding mechanisms are supported individually and in combination.

FP-RVVTS uses Spike (with SoftFloat) as the golden reference model — i.e., the official RISC-V reference simulator maintained by RISC-V International.

In case of a failure, FP-RVVTS improves test-case reduction by using dependency annotations in the grammar. While RVVTS can pinpoint a failure to a single instruction, FP-RVVTS also produces a minimal snapshot of the system state immediately before that instruction — containing only the relevant registers and configuration CSRs — thereby easing debugging significantly. The framework incorporates a strengthened single-instruction isolation technique that helps narrow down the cause of failures.

Experimental setup and FP libraries

The DUTs evaluated are:

RISC-V VP++, in variants using SoftFloat (SF) and FloppyFloat (FF) as the FP library.
Spike FF, a performance-tuned version of the official RISC-V golden reference simulator Spike from RISC-V International.
QEMU, which employs SoftFloat.

SF is Berkeley SoftFloat, a widely used, bit-accurate IEEE 754 reference library from UC Berkeley. FF is a hybrid library designed to offer SF-compatible semantics at higher performance. The paper states that FP-RVVTS and the DUTs were configured for RV64IFD_Zfh, providing the broadest FP coverage supported across all tested simulators.

Test results and bug findings

FP-RVVTS generates test sets achieving more than 95% functional coverage in both positive and negative testing modes. The two test sets produced in the paper's experiments are summarized as:

Positive test set: 95.44% functional coverage, 51.80% FP-instruction ratio (42,187 / 81,447), 5,353 tests.
Negative test set: 96.73% functional coverage, 53.48% FP-instruction ratio (43,530 / 81,394), 5,340 tests.

Failures reported by FP-RVVTS across the eight test setups (4 DUTs × 2 test sets) include:

RISC-V VP++ FF positive: 852 failures, isolated to fcvt.wu.d and fcvt.wu.s.
RISC-V VP++ FF negative: 1,858 failures, isolated to fld/w/h, fsd/w/h, fmv.x.h, fcvt.wu.d/s, fcvt.lu.d.
RISC-V VP++ SF positive: 373 failures, isolated to fmax.d/s and fmin.d/s.
RISC-V VP++ SF negative: 1,758 failures, isolated to fld/w/h, fsd/w/h, fmax.d/s/h, fmin.d/s, fmv.x.h.
Spike FF positive: 1 failure, isolated to fnmadd.d.
Spike FF negative: 1 failure, isolated to fdiv.s.
QEMU positive: 0 failures.
QEMU negative: 0 failures.

The paper highlights that, in most cases, the number of failures is higher in negative testing than in the corresponding positive testing, demonstrating the importance of negative testing.

Specific bugs identified by FP-RVVTS

fmax.d (RISC-V VP++ SF): NaN handling bug. Spike returns the non-NaN operand, whereas RISC-V VP++ SF returns NaN. Per the RISC-V specification, the semantics changed in v2.2: before v2.2 the result was NaN; since v2.2 it is the non-NaN operand. Thus, the bug stems from RISC-V VP++ implementing the pre-v2.2 behaviour.
fld (RISC-V VP++): FP-enable check bug. When the FP extension is disabled, Spike correctly raises an exception, but RISC-V VP++ executes the instruction because its load logic omits the required FP-extension-enabled check. This bug is independent of the underlying FP library (occurs with both SF and FF), pointing to a RISC-V VP++ bug rather than a library issue.
fcvt.wu.d (RISC-V VP++ FF): NaN-handling bug. RISC-V VP++ FF correctly computes the NaN result as 2^32 − 1 but then incorrectly zero-extends the 32-bit value to 64 bits instead of sign-extending it as required by the RISC-V specification.
fdiv.s (Spike FF): exception-flag generation bug. Spike sets the underflow exception flag as intended, whereas Spike FF does not, indicating a bug in Spike FF.

The paper concludes that FP-RVVTS systematically verified the FP backbones of modern RISC-V simulators and, in the "float fight" between their SF/FF libraries, exposed a diverse set of previously unknown bugs — from outdated NaN semantics and missing FP-enable checks to incorrect sign extension and exception-flag handling.

Related work cited for RISC-V verification

The FP-RVVTS paper situates itself against substantial prior work on general RISC-V instruction generation and processor verification, including:

Herdt, Große, and Drechsler: Towards specification and testing of RISC-V ISA compliance (DATE 2020) and Efficient cross-level testing for processor verification: A RISC-V case-study (FDL 2020).
Klemmer and Große: EPEX: processor verification by equivalent program execution (GLSVLSI 2021).
Bruns, Herdt, Jentzsch, and Drechsler: Cross-level processor verification via endless randomized instruction stream generation with coverage-guided aging (DATE 2022).
Schlägl and Große: Single instruction isolation for RISC-V vector test failures (ICCAD 2024) — the basis for RVVTS.
Schlägl, Hazott, and Große: RISC-V VP++: Next generation open-source virtual prototype (OSDA 2024).
Solt, Ceesay-Seitz, and Razavi: Cascade: CPU fuzzing via intricate program generation (USENIX Security 2024).
Xu, Liu, He, Lin, Zhou, and Wang: MorFuzz: Fuzzing processor via runtime instruction morphing enhanced synchronizable co-simulation (USENIX Security 2023).
Liu, Hu, Liu, Han, and Liu: A RISC-V test sequences generation method based on instruction generation constraints (Journal of Electronics & Information Technology, 2023).
Lu, Liu, Xia, and Liu: Comprehensive RISC-V floating-point verification: Efficient coverage models and constraint-based test generation (DATE 2025).
Google's RISCV-DV, an open-source constrained-random generator. RISCV-DV initializes FP registers only once at the beginning of a test with special values, which the FP-RVVTS paper identifies as a coverage limitation.
Imperas riscvOVPsim.
RISC-V Foundation, The RISC-V Instruction Set Manual, Volume I: User-Level ISA, document version 20251126 (November 2025).

The paper notes that, to the best of its knowledge, the only RISC-V-specific work that directly targets FP instruction verification prior to FP-RVVTS is the DATE 2025 paper by Lu et al., which proposes coverage and constraint-based test generation for RV32F on one concrete RTL processor but is not open-source, is limited to 32-bit single precision, and does not address off-the-shelf RISC-V simulators, alternative FP libraries, or techniques to improve failure analysis.

Hardware-accelerated generative-model-based RISC-V verification (Lyra and LyraGen)

The paper Lyra: A Hardware-Accelerated RISC-V Verification Framework with Generative Model-Based Processor Fuzzing (Huo, Gao, Liu, Wang, Bao, Gao, and Shi, arXiv 2512.13686v3) introduces Lyra, a heterogeneous RISC-V verification framework that pairs FPGA-accelerated test execution with an ISA-aware generative model. The authors note that verification can consume up to 70% of processor-development effort, and the emergence of open ISAs such as RISC-V has driven demand for more agile verification methods. The paper frames two challenges in conventional verification: low software-simulation throughput (typically tens of kHz) and the semantic blindness of purely random or bit-flip-based fuzzers, which struggle to produce the logically precise, semantically coherent instruction sequences needed to exercise corner cases.

Lyra architecture (GPU-CPU–FPGA co-verification)

Lyra is described as a heterogeneous verification framework in which test execution, differential checking, and coverage collection are offloaded to hardware. The DUT and the reference model (REF) are instantiated on the same FPGA SoC: the DUT runs on programmable logic, while the REF runs on hardened ARM processors. Dedicated hardware checkers perform runtime differential checking of execution results at the instruction level. Lyra incorporates the control register-based coverage instrumentation approach used in state-of-the-art fuzzing techniques (e.g., DifuzzRTL, Cascade) and implements it directly on FPGA so that the entire coverage-collection loop runs on the FPGA itself. The paper characterizes Lyra as the first heterogeneous GPU-CPU–FPGA co-verification framework reported in the literature it surveys.

LyraGen: domain-specialized generative model for RISC-V

To address the semantic blindness of software fuzzers, Lyra incorporates LyraGen, a domain-specialized generative model of 125 million parameters. LyraGen is designed to generate high-quality RISC-V instruction streams while inherently understanding RISC-V semantics.

Base model and training. The authors adopt OPT-125M as the base model and retrain it on a dataset of <instruction, coverage> pairs. Training takes 10 epochs on an NVIDIA GeForce RTX 4090 GPU, totaling 58 minutes.
Input modification. Instead of receiving token IDs as conventional LLMs do, the model directly accepts a 22-dimensional numerical coverage vector. The vector is processed through a linear layer rather than an embedding layer before being fed into the model's hidden layers. Because the input interface and tokenizer are substantially redesigned and the model no longer operates on natural language, the model is trained from scratch rather than fine-tuned.
RVTokenizer. To accommodate diverse instruction token formats, the authors design RVTokenizer with a vocabulary of 257 transformer tokens: one for each numerical value from 0 to 255, plus a padding symbol. Higher-bitwidth values (such as a 20-bit immediate) are represented as sequences of multiple tokens, avoiding vocabulary explosion while preserving strong representational capacity.

Instruction encoding and dataset construction

A non-compressed RISC-V instruction is 32 bits long and consists of multiple fields (opcode, source/destination registers, immediates) that vary by instruction format. The authors redesign the instruction representation by re-encoding each RISC-V instruction into a sequence of tokens. Each token is an integer with a maximum width of 8 bits (values up to 255). The opcode field appears in all instructions, while other fields appear only in specific instruction categories; when a field does not exist for a given instruction, its token is omitted. Because each instruction format uniquely determines its required token set, these omissions introduce no ambiguity and the encoding forms a prefix code. The tokenization table covers the R, I, I (shift), S, B, U, and J instruction types, with per-type token IDs and bit-widths for fields such as opcode, funct7, funct3, Rd, Rs1, Rs2, immediate low/middle/high segments, and shift amount.

To generate high-quality <instruction, coverage> pairs, the authors construct a heterogeneous fuzzing system consisting of a software fuzzer running on a CPU and a hardware testbed implemented on an FPGA. Following the Encore architecture, both the DUT and the software ISA emulator are instantiated on the same FPGA. The software fuzzer generates an instruction block, writes it into main memory, and the DUT and ISA emulator simultaneously fetch and execute it. An encoding module at the output of the hardware fuzzing pipeline translates each RISC-V instruction into the instruction-token representation. Register-level coverage is instrumented directly on the FPGA, extracted immediately after each instruction executes, and fed back to the fuzzer. The fuzzer mutates instructions to generate new candidates, and for each executed instruction, the tokenized encoding and corresponding coverage vector are stored as a pair in the dataset for model training. The authors describe this as a supervised learning approach that gives the model a comprehensive global view of coverage-space behaviour while avoiding the inefficiency and instability of reinforcement learning.

Inference-time instruction filter and address checker

Because the generative model is stochastic, the token sequences it produces do not always map to well-formed instructions. The authors design an instruction filter that verifies the syntactic and semantic validity of each reconstructed instruction during assembly. For invalid instructions, the filter first attempts repair by mapping the token sequence to the closest valid instruction based on binary encoding similarity; if repair is not feasible, the instruction is discarded. Because memory-access instructions can be generated with out-of-bounds or misaligned addresses that would prevent subsequent instructions from executing, the instruction filter includes an address-sanitization module that corrects faulty memory addresses. After correction, instructions are fed into the FPGA testbed, which performs self-checking between the DUT and reference model, and hardware-synthesizable coverage points are automatically instrumented.

Empirical results

The paper reports that Lyra achieves up to 1.27× higher coverage and accelerates end-to-end verification by 107× to 3343× compared to state-of-the-art software fuzzers, while consistently demonstrating lower convergence difficulty.

Related processor-fuzzing work cited by Lyra

The Lyra paper situates itself against prior work on adapting software fuzzing to processor verification, including:

Laeufer et al., RFUZZ: coverage-directed fuzz testing of RTL on FPGAs (2018).
Muduli et al., HyperFuzzing for SoC security validation (2020).
Canakci et al., ProcessorFuzz: processor fuzzing with control and status registers guidance (2023).
Kabylkas et al., Effective processor verification with logic fuzzer enhanced co-simulation (2021).
Kande et al., TheHuzz: instruction fuzzing of processors using Golden-Reference models for finding Software-Exploitable vulnerabilities (2022).
Solt et al., Cascade: CPU fuzzing via intricate program generation (USENIX Security 2024).
Hur et al., DifuzzRTL: differential fuzz testing to find CPU bugs (2021).

Broader RISC-V micro-architectural verification context

A Semiconductor Engineering industry article on RISC-V micro-architectural verification synthesizes current practice and challenges across the RISC-V ecosystem. The article emphasizes that RISC-V's flexibility and extensibility come with verification cost: every custom extension roughly doubles verification effort, and the specification often leaves micro-architectural details open, so every implementation choice in the pipeline, in branch prediction, or in the load/store unit can differ from core to core. Industry practitioners quoted in the article characterize processor verification as fundamentally harder than ASIC verification: while the AS in ASIC stands for "application-specific," processor verification is unbounded because every operation in the ISA must be verified to provide the specified behaviour in every eventuality.

Methodological mix: SystemVerilog/UVM, formal, and emulation

The article reports that the dominant ASIC-verification workhorses — SystemVerilog and UVM — are widely used in RISC-V projects but have limitations. UVM-based random instruction generation can be shaped toward particular areas of interest, but is described as inefficient for flushing out hard-to-reach corner cases, because coverage closure on a single instruction (e.g., 100% coverage on the ADD instruction) does not imply coverage of the cross-product of operands, hazards, and pipeline interactions. The article also reports that the diversity of RISC-V cores, extensions, and the relatively short time since the standard stabilized have produced few standards or open-source tools for processor verification. Several RISC-V DV (design-verification) environments are being built, but the article characterizes formal tools, performance-analysis tools, and other dedicated RISC-V infrastructure as still early.

The article advocates a mixed strategy in which sub-units (branch prediction, pipeline stages, caches, multipliers, load/store units, register files) are verified with formal methods using SystemVerilog assertions, and the integrated core is then verified with UVM testbenches and test software plus emulation. The motivation is that simulation-based verification is unlikely to be exhaustive for data-path complexity, and formal methods exercise every possible combination of inputs to break ISA-specified behaviour. Properties developed for sub-units can be reused at the integration level, and architectural verification assertions and covers can be reused in formal verification of the RTL implementation, which can catch architectural violations as functional bugs, safety issues, or security issues via the confidentiality-integrity-availability triad.

The article also reports that bringing up a complete software stack (e.g., booting Linux) is treated as a final verification step that surfaces classes of bugs not seen elsewhere, including asynchronous effects, timer interrupts, and timing-base mismatches between simulation and FPGA-based emulation. Because timer interrupts can desynchronize a DUT from a golden reference model, one reported practice is to insert timer interrupts every N retired instructions (e.g., 5,000) rather than at fixed clock-cycle intervals, ensuring that DUT and REF execute the same number of instructions.

Functional safety and security considerations

The article notes that when safety certification is required, ISO 26262-style well-defined fault injection and diagnostic-coverage analysis are needed, and the achievable rigor depends on the certification level. The article also highlights the security implications of RISC-V's open-source nature: transparency allows community-driven scrutiny, but it also means adversaries have access to the same information, which intensifies the need for robust security verification of the micro-architecture. The article further notes that as RISC-V implementations begin to adopt speculative and out-of-order execution, they will inherit the same class of microarchitectural security vulnerabilities (e.g., Spectre, Meltdown) that have affected established processor architectures.

Sub-unit verification with formal and shrink-based debugging

The article describes a property-based verification approach in which the behaviour of a processor sub-unit is captured as a set of properties plus a vocabulary of commands, and a generator creates sequences of these commands until it finds a sequence that breaks behaviour with respect to a golden reference model, then shrinks the sequence to a minimal failing test. The approach is reported as effective for sub-units (branch prediction, pipeline stages, caches) but is described as a complement, not a replacement, for the broader verification mix. Coverage is described as necessary but not sufficient: it gives confidence that a good part of the design has been stimulated and that certain classes of bugs have been found, but the article cautions that with increasing design complexity coverage alone will not guarantee that no problems remain.

Security research context

A public RISC-V security survey states that the RISC-V community is studying security solutions intended to support a root of trust and prevent sensitive information on RISC-V devices from being tampered with or leaked, and presents itself as covering representative RISC-V hardware and architecture security mechanisms. A separate public paper on return-oriented programming describes RISC-V as an open-source hardware ISA and reports that RISC-V ROP can perform Turing-complete computation and arbitrary function calls using gadgets found in a GNU libc version, and describes a compiler that converts programs into RISC-V ROP chains using techniques such as self-modifying ROP chains and algorithmic ROP chain generation.

Practical implications from the provided evidence

The supplied evidence supports treating RISC-V as a major target for randomized CPU verification, especially through TestRIG-style Direct Instruction Injection over the RVFI-DII interface, the single-socket itrace/etrace protocol that lets implementations bypass the program counter.
TestRIG is articulated as three concrete advantages over hand-written test suites: eliminating the test gap, verifying the pipeline (not just specification compliance), and producing handful-of-instructions counterexamples through shrinking.
TestRIG supports a vengine-vs-vengine or vengine-vs-implementation comparison, with build artefacts including Sail, Spike, Ibex, Toooba, and RVBS.
Property-based testing of pipelined RISC-V CPUs (e.g., Bluespec's "Fife") is feasible using TestRIG plus the official RISC-V Sail formal model; speculative execution, traps, interrupts, and micro-architectural properties require specific extensions, and BlueCheck is a Bluespec BSV implementation of QuickCheck that can be synthesized to hardware (FPGA) to run alongside the CPU design for higher testing speed.
Push-button formal verification of RISC-V SoC boot properties is feasible with rtlv in seconds where SymbiYosys times out, including for a 4,300-flip-flop OpenTitan derivative requiring 20,000+ modeled cycles.
Formal RISC-V ISA models (LIBRISCV) can drive ISS generation with minimal integration effort, matching the simulation performance of manually written ISSes and passing the official RISC-V test suite.
Sail-model transformation plus shrinking can generate very short, targeted architectural tests, with the authors reporting single-digit instruction counts for the coverage examined so far.
TestRIG's single-implementation mode can detect processor lockups by requiring an RVFI report for every injected DII instruction within a timeout.
The Toooba case studies show that the methodology can expose both illegal-instruction decode lockups and rare branch-prediction lockups.
FP-RVVTS achieves more than 95% functional coverage on RISC-V FP extensions (F, D, Zfh) using both positive and negative testing, and has exposed previously unknown bugs in RISC-V VP++ (outdated pre-v2.2 NaN semantics, missing FP-enable check, incorrect sign extension) and in Spike FF (exception-flag generation for fdiv.s).
QEMU passed both positive and negative FP-RVVTS tests in the reported experiments, while RISC-V VP++ variants produced large numbers of failures across the same tests.
Lyra demonstrates that pairing an FPGA-resident DUT and reference model with a coverage-feedback-driven generative model (LyraGen) can outperform software-only fuzzers by up to 1.27× in coverage and 107×–3343× in end-to-end verification speed, by combining hardware-level coverage instrumentation, ISA-aware stimulus generation, and post-generation instruction filtering (validity repair and address sanitization).
The RISC-V micro-architectural verification industry view frames the dominant practice as a mix of SystemVerilog/UVM constrained-random generation, formal verification on sub-units, and hardware-assisted verification (emulation, FPGA prototyping) for integration and software bring-up, with growing attention to functional safety (ISO 26262 fault injection) and security verification against microarchitectural attacks enabled by speculative and out-of-order execution.
The security-oriented public sources frame RISC-V as relevant both to hardware/architecture security mechanisms and to software exploitation studies such as return-oriented programming.

LINKED ENTITIES

1 links

TestRIG EVALUATES

CITATIONS

10 sources

10 citations

[1] TestRIG is a framework for RISC-V processor verification using the RVFI-DII interface, with two component kinds (vengines and implementations), single-socket itrace/etrace communication, and support for bypassing the architectural program counter. CTSRD-CHERI/TestRIG

[2] TestRIG supports Sail, Spike, Ibex, Toooba, and RVBS as buildable submodules, with a default verification flow using `make` and `utils/scripts/runTestRIG.py`, and example invocations such as `utils/scripts/runTestRIG.py -a sail -b toooba -r rv64i` and a CHERI-enabled `utils/scripts/runTestRIG.py -a sail -b ibex -r rv32icxcheri`. CTSRD-CHERI/TestRIG

[3] TestRIG's documented capabilities include eliminating the test gap between specification and implementation, verifying the pipeline as well as specification compliance, and producing automatically reduced counterexamples on the order of a handful of instructions via direct-injection-based shrinking. CTSRD-CHERI/TestRIG

[4] Bluespec CTO Rishiyur S. Nikhil gave a talk describing property-based testing of "Fife", a 6-stage pipelined RISC-V CPU, using TestRIG with Haskell QuickCheck and the official RISC-V Sail formal model, with proposed extensions for speculative execution, traps, interrupts, and micro-architectural properties, and introduced BlueCheck, a Bluespec BSV implementation of QuickCheck that can run in hardware. Talk: Property Based Testing to Verify Pipelined CPU Hardware

[5] The Lyra paper (arXiv 2512.13686v3) presents a heterogeneous RISC-V verification framework in which the DUT runs on FPGA programmable logic and the reference model on hardened ARM processors, with hardware-synthesizable coverage points and differential self-checking. Lyra: A Hardware-Accelerated RISC-V Verification Framework with Generative Model-Based Processor Fuzzing

[6] LyraGen is a 125M-parameter generative model built on OPT-125M, retrained on `<instruction, coverage>` pairs, taking a 22-dimensional coverage vector through a linear layer and using an RVTokenizer with a 257-token vocabulary, trained from scratch for 10 epochs on an RTX 4090 in 58 minutes. Lyra: A Hardware-Accelerated RISC-V Verification Framework with Generative Model-Based Processor Fuzzing

[7] Lyra re-encodes each RISC-V instruction into a prefix-code sequence of 8-bit tokens covering the R, I, I(Shift), S, B, U, and J instruction formats, with field-specific bit-widths (e.g., opcode 7, funct7 7, funct3 3, Rd/Rs1/Rs2 5) and omission of fields that do not apply to a given format. Lyra: A Hardware-Accelerated RISC-V Verification Framework with Generative Model-Based Processor Fuzzing

[8] The Lyra paper frames the two main challenges in conventional RISC-V verification as the tens-of-kHz throughput limit of software simulation and the semantic blindness of random or bit-flip-based fuzzers that cannot produce the logically precise instruction sequences needed to exercise corner cases. Lyra: A Hardware-Accelerated RISC-V Verification Framework with Generative Model-Based Processor Fuzzing

[9] Industry practitioners quoted in the Semiconductor Engineering article report that RISC-V micro-architectural verification typically mixes SystemVerilog/UVM constrained-random generation, formal methods on sub-units, and emulation/FPGA prototyping, with property-based shrink-based debugging effective on sub-units and timer interrupts inserted every ~5,000 retired instructions to keep a DUT in sync with a golden reference model. RISC-V Micro-Architectural Verification

[10] The Semiconductor Engineering article reports that RISC-V's open-source nature both enables community-driven security scrutiny and gives adversaries access to the same information, that ISO 26262-style fault injection and diagnostic coverage are needed for functional-safety certification, and that as RISC-V cores adopt speculative and out-of-order execution they will inherit the same class of microarchitectural vulnerabilities (e.g., Spectre, Meltdown) seen in established processor architectures. RISC-V Micro-Architectural Verification

VERSION HISTORY

v19 · 6/10/2026 · minimax/minimax-m3 (current)

v18 · 6/10/2026 · minimax/minimax-m3

v17 · 6/10/2026 · minimax/minimax-m3

v16 · 6/9/2026 · minimax/minimax-m3

v15 · 6/7/2026 · gpt-5.5

v14 · 6/6/2026 · minimax/minimax-m3

v13 · 6/3/2026 · minimax/minimax-m3

v12 · 6/2/2026 · gpt-5.5

v11 · 5/30/2026 · gpt-5.5

v10 · 5/29/2026 · gpt-5.5

v9 · 5/29/2026 · gpt-5.5

v8 · 5/28/2026 · gpt-5.5

v7 · 5/28/2026 · gpt-5.5

v6 · 5/28/2026 · gpt-5.5

v5 · 5/27/2026 · gpt-5.5

v4 · 5/27/2026 · gpt-5.5

v3 · 5/27/2026 · gpt-5.5

v2 · 5/27/2026 · gpt-5.5

v1 · 5/26/2026 · gpt-5.5

Compare with: