STIMSMITH

Random Instruction Generation

Concept WIKI v7 · 6/8/2026

Random instruction generation is a hardware verification approach in which instruction sequences or test programs are generated randomly (typically under scenario constraints) to stimulate processor designs and find functional bugs. It is widely used in RISC-V CPU verification because it requires limited human expertise and scales to large RTL designs, but it is known to generate repetitive inputs that test the same processor functionalities and produce long, convoluted counterexamples.

Random instruction generation is a hardware verification technique in which an instruction set generator (ISG) produces assembly programs whose instructions are chosen randomly, typically under scenario-defined constraints (e.g., instruction mix, frequencies, boot and memory-map parameters), in order to exercise a processor design under test (DUT) and uncover functional bugs.
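As an illustration of constrained random selection, the following minimal Python sketch draws instructions according to a weighted instruction mix. The weights, the opcode subset, the reserved base register `x31`, and the function names are all assumptions made for this example; they are not taken from riscv-dv, COREV-DV, or EAVS-DV.

```python
import random

# Hypothetical instruction-mix weights; a real ISG would read these
# from a scenario configuration rather than hard-coding them.
INSTRUCTION_MIX = {"add": 4, "sub": 2, "addi": 3, "lw": 1, "sw": 1}

# x0 is hard-wired to zero, so it is skipped as a destination.
REGISTERS = [f"x{i}" for i in range(1, 32)]


def random_instruction(rng: random.Random, data_base_reg: str = "x31") -> str:
    """Return one assembly line drawn according to INSTRUCTION_MIX."""
    opcodes = list(INSTRUCTION_MIX)
    weights = list(INSTRUCTION_MIX.values())
    op = rng.choices(opcodes, weights=weights, k=1)[0]
    # The base register is reserved so random writes cannot corrupt it.
    general = [r for r in REGISTERS if r != data_base_reg]
    rd, rs1, rs2 = (rng.choice(general) for _ in range(3))
    if op in ("add", "sub"):
        return f"{op} {rd}, {rs1}, {rs2}"
    if op == "addi":
        # addi takes a 12-bit signed immediate.
        return f"addi {rd}, {rs1}, {rng.randint(-2048, 2047)}"
    # Loads and stores use the reserved base register and a word-aligned offset.
    offset = 4 * rng.randint(0, 63)
    if op == "lw":
        return f"lw {rd}, {offset}({data_base_reg})"
    return f"sw {rs2}, {offset}({data_base_reg})"


def generate_program(length: int, seed: int) -> list[str]:
    """Generate a reproducible random instruction sequence."""
    rng = random.Random(seed)
    return [random_instruction(rng) for _ in range(length)]


if __name__ == "__main__":
    print("\n".join(generate_program(8, seed=1)))
```

Seeding the generator makes a failing test reproducible, and raising the `lw`/`sw` weights is the kind of instruction-mix adjustment a scenario constraint would express.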

Role in design verification

Random stimulus is attractive because, in theory, it can exercise all possible combinations given enough time. In practice, however, a purely random approach has difficulty exercising all combinations quickly enough on highly complex designs, so verification environments are often steered toward hard-to-hit combinations using constraints or additional guidance mechanisms. The lack of coverage guidance in random generators is also reported to lead to repetitive inputs that retest the same processor functionalities, decreasing the chances of finding bugs.

A cited machine-learning study further reports that augmenting constrained-random verification tools with supervised learning and reinforcement learning can produce better functional coverage and reach complex hard-to-hit states faster than a purely random or constrained-random baseline.

How random instruction generation works in a RISC-V flow

The cited RISC-V verification papers describe the same overall pipeline. An instruction generator produces an assembly file whose instructions are randomly selected according to the constraints of the targeted test scenario. A compiler toolchain then translates that assembly into a machine-language file, which is loaded into the design's memory and delivered to the core. The same test program is also executed on a reference model (a RISC-V instruction set simulator acting as a golden model), and the two execution traces are compared; any divergence between them signals a potential bug. In the RISC-V context, the golden model is often the Spike ISA simulator, which is officially released by RISC-V International.
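The trace-comparison step can be sketched as follows. The `TraceEntry` fields are a simplified, hypothetical record chosen for this example; real RVFI or Spike traces carry more fields and need parsing first.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Optional


@dataclass(frozen=True)
class TraceEntry:
    """Simplified per-instruction trace record (hypothetical field set)."""
    pc: int
    insn: int       # raw instruction word
    rd: int         # destination register index (0 if none)
    rd_value: int   # value written to rd


Divergence = tuple[int, Optional[TraceEntry], Optional[TraceEntry]]


def first_divergence(dut: list[TraceEntry],
                     ref: list[TraceEntry]) -> Optional[Divergence]:
    """Return (index, dut_entry, ref_entry) at the first mismatch, or None.

    A trace that ends early is also reported: the missing side is None.
    """
    for index, (d, r) in enumerate(zip_longest(dut, ref)):
        if d != r:
            return index, d, r
    return None


if __name__ == "__main__":
    ref = [TraceEntry(0x80000000, 0x00100093, 1, 1),
           TraceEntry(0x80000004, 0x00208133, 2, 1)]
    dut = [ref[0], TraceEntry(0x80000004, 0x00208133, 2, 7)]
    print(first_divergence(dut, ref))
```

Reporting only the first divergence is a deliberate choice in this sketch: once architectural state differs, later mismatches are usually consequences of the first one.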

The RISC-V Formal Interface (RVFI) is commonly used to capture the internal state and behavior of a processor during the execution of each instruction, enabling trace-based comparison and supporting formal verification of RISC-V processors.

Example tools

Several concrete random instruction generators are mentioned in the evidence:

  • Google riscv-dv: an open-source SystemVerilog/UVM random instruction generator for RISC-V processor verification. It is the base layer used by other tools.
  • COREV-DV: a library of extensions layered on top of Google riscv-dv, used within the OpenHW Group CORE-V verification project.
  • EAVS-DV: an enhancement of COREV-DV proposed in the cited ElectraIC Advanced Verification Suite (EAVS) work. Its key change is to parameterize all fixed address spaces that are hard-coded in COREV-DV, so that the environment can be adapted to any DUT and Spike configuration with different memory address limitations.
  • Force-riscv: an OpenHW Group ISG for the RISC-V ISA that supports all instructions of RV32GC.
  • Google's RISC-V Random Instruction Generator: cited by the arXiv machine-learning study as the random generator used in a hardware-verification example with the open-source RISC-V Ariane design.

TestRIG and verification engines

The TestRIG framework generalizes the same pattern. Its interactive Verification Engine (VEngine) stimulates RISC-V implementations over RVFI-DII sockets. An RVFI-DII-compatible RISC-V implementation can be reset, consume instruction sequences, and report execution traces through the interface. A VEngine may host an internal RISC-V model (similar to PyH2P) or drive two independent implementations and compare their RVFI traces, as in QCVEngine.

The TestRIG authors report that VEngine instruction sequences can be loaded from disk, generated randomly, or produced with interactive architecture-driven state-space exploration. The framework supports software RISC-V simulators such as Spike and QEMU, the Sail RISC-V model, and hardware implementations written in SystemVerilog or Bluespec, including RVBS, Ibex, Piccolo, Flute, and Toooba.
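The pattern of resetting two implementations, feeding both the same instruction sequence, and comparing what they report can be sketched in-process as below. This is only a toy stand-in: it does not implement the RVFI-DII socket protocol, and `Implementation`, `ToyModel`, `run_lockstep`, and the tuple-based instruction encoding are hypothetical names invented for this example.

```python
from typing import Optional, Protocol

Instruction = tuple[str, int, int, int]   # (op, rd, rs1, rs2_or_immediate)
Report = tuple[int, int]                  # (rd index, value held in rd afterwards)


class Implementation(Protocol):
    """Minimal interface: reset, consume one instruction, report the effect."""
    def reset(self) -> None: ...
    def execute(self, instruction: Instruction) -> Report: ...


class ToyModel:
    """Toy register machine standing in for a RISC-V implementation."""

    def __init__(self, buggy_sub: bool = False) -> None:
        self.buggy_sub = buggy_sub        # deliberately injected bug
        self.regs = [0] * 32

    def reset(self) -> None:
        self.regs = [0] * 32

    def execute(self, instruction: Instruction) -> Report:
        op, rd, rs1, operand = instruction
        a = self.regs[rs1]
        if op == "addi":
            result = a + operand
        elif op == "add":
            result = a + self.regs[operand]
        elif op == "sub":
            b = self.regs[operand]
            result = a + b if self.buggy_sub else a - b
        else:
            raise ValueError(f"unsupported op: {op}")
        if rd != 0:                       # x0 stays zero
            self.regs[rd] = result & 0xFFFFFFFF
        return rd, self.regs[rd]


def run_lockstep(impl_a: Implementation, impl_b: Implementation,
                 sequence: list[Instruction]) -> Optional[int]:
    """Return the index of the first instruction whose reports differ."""
    impl_a.reset()
    impl_b.reset()
    for index, instruction in enumerate(sequence):
        if impl_a.execute(instruction) != impl_b.execute(instruction):
            return index
    return None


if __name__ == "__main__":
    seq = [("addi", 1, 0, 5), ("addi", 2, 0, 3), ("sub", 3, 1, 2)]
    print(run_lockstep(ToyModel(), ToyModel(buggy_sub=True), seq))
```

Because instructions are handed to each implementation directly instead of being fetched from memory, the driver can change or drop any instruction in the sequence without rebuilding a program image.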

Practical drawbacks

The cited evidence identifies several known limitations of random instruction generation:

  • Repetitive inputs. Without coverage guidance, generators tend to produce inputs that exercise the same processor regions repeatedly.
  • Long, convoluted counterexamples. Automatically generated failing sequences can be long and hard to understand, whereas hand-written tests are typically shorter and more readable.
  • Branch target validity. The generator must ensure that useful instructions exist at the targets of randomly generated branches, otherwise the program may not be valid or may not exercise the intended behavior.
  • Engineering effort to target gaps. A verification engineer can manually adjust instruction-mix constraints (e.g., increasing the load/store ratio) when coverage is missing in a particular unit, but this significantly increases engineering effort and slows the verification process.
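One simple way to satisfy the branch-target constraint is to label every generated instruction and let branches jump only to labels that exist later in the same program. The sketch below assumes this forward-only policy, which also rules out infinite loops; it is an illustrative choice, not the strategy of any cited generator, and the function name and label scheme are hypothetical.

```python
import random


def generate_with_branches(length: int, seed: int,
                           branch_probability: float = 0.2) -> list[str]:
    """Generate labelled assembly in which every branch target is a real instruction."""
    rng = random.Random(seed)
    lines = []
    for index in range(length):
        # Emit a branch only if a later instruction exists to land on.
        if index < length - 1 and rng.random() < branch_probability:
            target = rng.randint(index + 1, length - 1)
            rs1, rs2 = rng.randint(0, 31), rng.randint(0, 31)
            body = f"beq x{rs1}, x{rs2}, L{target}"
        else:
            rd, rs1 = rng.randint(1, 31), rng.randint(0, 31)
            body = f"addi x{rd}, x{rs1}, {rng.randint(-2048, 2047)}"
        lines.append(f"L{index}: {body}")
    return lines


if __name__ == "__main__":
    print("\n".join(generate_with_branches(10, seed=3)))
```

For short programs every label stays well inside the reach of a conditional branch; a longer program would also need to check the branch offset range.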

To address debugging difficulty, the cited RISC-V work discusses automated reduction of failing randomly generated instruction sequences. PyH2P is reported to often reduce failing sequences to fewer than five instructions, with each instruction still meaningful for reproducing the error. It has three known shortcomings, however: it does not perform full trace comparison, it has difficulty shrinking through branches because it must keep a valid in-memory program, and it does not use community-standard interfaces.
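The idea of reducing a failing sequence can be illustrated with a greedy shrinker that removes one instruction at a time and keeps the removal whenever the failure still reproduces. This is a generic simplification in the spirit of delta debugging, not the algorithm used by PyH2P or TestRIG; `shrink` and `still_fails` are hypothetical names, and `still_fails` stands in for rerunning the test against the reference model.

```python
from typing import Callable


def shrink(sequence: list[str],
           still_fails: Callable[[list[str]], bool]) -> list[str]:
    """Greedily drop instructions while the failure keeps reproducing."""
    current = list(sequence)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                # The removed instruction was not needed; restart the scan.
                current = candidate
                changed = True
                break
    return current


if __name__ == "__main__":
    failing = ["addi x1, x0, 5", "add x4, x4, x4", "addi x2, x0, 3",
               "sub x5, x5, x5", "sub x3, x1, x2"]

    def reproduces(seq: list[str]) -> bool:
        # Toy oracle: the bug needs the first addi and the final sub.
        return "addi x1, x0, 5" in seq and "sub x3, x1, x2" in seq

    print(shrink(failing, reproduces))
```

The result is 1-minimal with respect to `still_fails`: removing any single remaining instruction makes the failure disappear.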

Coverage-directed generation as an extension

Coverage-directed test generation (CDG) mechanisms are described as an extension of random instruction generation: the constraints of a random test generator are automatically steered by coverage feedback so that the next round of test generation targets uncovered RTL regions. Cited CDG approaches include MicroGP (genetic programming with statement coverage as fitness), a Bayesian-network-based mechanism (Fine and Ziv), and a Markov-chain-based framework (Wagner et al.) whose weights are tuned from collected coverage. CDG approaches in general must balance domain knowledge required to set up the framework against its general applicability.
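The feedback loop can be sketched with a deliberately simple heuristic: after each round, multiply the weight of every opcode whose coverage bin is still unhit. This is not MicroGP, the Bayesian-network mechanism, or the Markov-chain framework; the opcode-to-bin mapping, the starting weights, and the boost factor are assumptions made for illustration, and `run_round` replaces real RTL simulation with a toy model in which generating an opcode hits its bin.

```python
import random

# Hypothetical mapping from opcode to the coverage bin it exercises.
OPCODE_BIN = {"add": "alu", "sub": "alu", "lw": "lsu",
              "sw": "lsu", "mul": "mul", "div": "div"}


def run_round(weights: dict[str, float], rng: random.Random,
              length: int) -> set[str]:
    """Generate one random test and return the bins it hit (toy simulation)."""
    ops = rng.choices(list(weights), weights=list(weights.values()), k=length)
    return {OPCODE_BIN[op] for op in ops}


def steer(weights: dict[str, float], covered: set[str],
          boost: float = 4.0) -> dict[str, float]:
    """Boost the weight of every opcode whose bin is still uncovered."""
    return {op: w * boost if OPCODE_BIN[op] not in covered else w
            for op, w in weights.items()}


def coverage_directed(rounds: int, seed: int,
                      length: int = 20) -> tuple[set[str], dict[str, float]]:
    """Alternate generation and steering until all bins are hit or rounds run out."""
    rng = random.Random(seed)
    # Skewed starting mix: multiply and divide are rarely generated.
    weights = {"add": 50.0, "sub": 50.0, "lw": 5.0,
               "sw": 5.0, "mul": 0.5, "div": 0.5}
    covered: set[str] = set()
    for _ in range(rounds):
        covered |= run_round(weights, rng, length)
        if covered == set(OPCODE_BIN.values()):
            break
        weights = steer(weights, covered)
    return covered, weights


if __name__ == "__main__":
    covered, final_weights = coverage_directed(rounds=50, seed=0)
    print(sorted(covered), final_weights)
```

Even this crude rule shows the trade-off noted above: the bin mapping encodes domain knowledge about the design, and the more of it the framework needs, the less general it becomes.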

Example: ElectraIC EAVS applied to the cv32e40p core

The cited EAVS study applied a wide range of randomly generated tests to the cv32e40p RISC-V core and ran the same tests on Spike. After converting the resulting logs to CSV and comparing them, the authors report that each random test produced by EAVS-DV successfully executed on the cv32e40p core.

The same study notes that Spike's log file shows memory contents only for load instructions, so EAVS compares both data and addresses for loads but only addresses for stores. The mode field in the RVFI Agent log gives the privilege level at which the program is running. It corresponds to the M column in RVFI and to the lines labelled 3 in Spike's log, 3 being the RISC-V encoding of machine mode.
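The load/store comparison rule can be written down as a small predicate. The `MemAccess` record and the function name are hypothetical and do not reflect the actual EAVS log or CSV format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemAccess:
    """One memory access extracted from a log (hypothetical record layout)."""
    kind: str              # "load" or "store"
    address: int
    data: Optional[int]    # None when the log does not expose the value


def accesses_match(dut: MemAccess, ref: MemAccess) -> bool:
    """Compare address and data for loads, but only the address for stores."""
    if dut.kind != ref.kind or dut.address != ref.address:
        return False
    if dut.kind == "load":
        return dut.data == ref.data
    # Stores: the reference log carries no memory contents to compare against.
    return True


if __name__ == "__main__":
    print(accesses_match(MemAccess("store", 0x1000, 5),
                         MemAccess("store", 0x1000, None)))
```

A consequence of this rule is that a wrong store value is not flagged at the store itself; it can only show up later, when a load reads the location back.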

The cited work also documents a set of known issues with the cv32e40p tracer that led the authors to disable it: srai is improperly decoded; compressed instructions are logged with the binary format of their uncompressed counterparts; lui and auipc append three zeros to the LSB of their immediate values, causing operand-comparison errors; and pseudo-instructions are decoded in their normal form rather than Spike's convention.

Connections

  • EAVS-DV is a tool that implements random instruction generation as part of the ElectraIC Advanced Verification Suite, layered on top of COREV-DV and Google riscv-dv.
  • Instruction Set Generator (ISG) is the underlying concept: a component that produces an assembly file whose instructions are randomly generated under scenario constraints. Random instruction generation is the specific mode of operation of an ISG.

CITATIONS

[1] Random instruction generators have been commonly used in processor verification since they require limited human expertise and scale to large RTL designs, and their lack of coverage guidance leads to repetitive inputs that retest the same functionalities. Source: Hardware Fuzzing / ProcessorFuzz thesis (Boston University).
[2] Random stimulus can in principle exercise all combinations given enough time, but a purely random approach has difficulty doing so quickly on highly complex designs, leading to constrained-random verification steered by human expertise. Source: Optimizing Design Verification using Machine Learning: Doing better than Random.
[3] Machine learning (supervised and reinforcement learning) can improve functional coverage and reach complex hard-to-hit states faster than random or constrained-random baselines, demonstrated on a Cache Controller and on the RISC-V Ariane design using Google's RISC-V Random Instruction Generator. Source: Optimizing Design Verification using Machine Learning: Doing better than Random.
[4] Random test generation in RISC-V produces an assembly file via an instruction generator, which is compiled into a machine-language file and loaded into the DUT's memory; the same program is run on a golden model (e.g., Spike) and the execution traces are compared. Source: Advanced Verification Suite for RISC-V Cores (RISC-V Summit Europe 2025).
[5] The instruction set generator (ISG) produces an assembly file containing the configured instructions for the targeted test scenarios, with instructions randomly generated in accordance with the scenario's constraints. Source: Advanced Verification Suite for RISC-V Cores (RISC-V Summit Europe 2025).
[6] Examples of random instruction generators for RISC-V include Force-riscv (OpenHW Group, supports all RV32GC instructions) and Google riscv-dv; EAVS-DV is an enhancement of COREV-DV that parameterizes the fixed address spaces present in COREV-DV. Source: Advanced Verification Suite for RISC-V Cores (RISC-V Summit Europe 2025).
[7] TestRIG's Verification Engine stimulates RISC-V implementations over RVFI-DII sockets, and its instruction sequences can be loaded from disk, generated randomly, or produced by interactive architecture-driven state-space exploration; it supports Spike, QEMU, the Sail model, and hardware cores such as RVBS, Ibex, Piccolo, Flute, and Toooba. Source: Randomized Testing of RISC-V CPUs using Direct ... (TestRIG).
[8] Drawbacks of random test generation include long and convoluted automatically generated counterexamples, and the need for the generator to ensure useful instructions exist at the targets of randomly generated branches; PyH2P is a tool that reduces failing randomly generated RISC-V sequences to fewer than five instructions, but does not perform full trace comparison, struggles to shrink through branches, and does not use community-standard interfaces. Source: Randomized Testing of RISC-V CPUs using Direct ... (TestRIG).
[9] Coverage-directed test generation (CDG) automatically steers the constraints of a random test generator using coverage feedback to target uncovered RTL regions; examples include MicroGP (genetic programming), a Bayesian-network mechanism (Fine and Ziv, 2003), and a Markov-chain framework (Wagner et al., 2005). Source: Hardware Fuzzing / ProcessorFuzz thesis (Boston University).
[10] EAVS applied a wide range of randomly generated tests to the cv32e40p core, ran the same tests on Spike, and reported that each random test from EAVS-DV successfully executed on the core; comparison relies on Spike's log showing memory contents only for loads, so EAVS compares data and addresses for loads but only addresses for stores. Source: Advanced Verification Suite for RISC-V Cores (RISC-V Summit Europe 2025).
[11] The cv32e40p tracer has known issues that led the authors to disable it: srai is improperly decoded, compressed instructions are logged in their uncompressed binary form, lui and auipc append three zeros to the LSB of their immediate values causing operand-comparison errors, and pseudo-instructions are decoded in normal form rather than Spike's convention. Source: Advanced Verification Suite for RISC-V Cores (RISC-V Summit Europe 2025).

VERSION HISTORY

v7 · 6/8/2026 · minimax/minimax-m3 (current)
v6 · 6/2/2026 · gpt-5.4
v5 · 5/29/2026 · gpt-5.5
v4 · 5/28/2026 · gpt-5.5
v3 · 5/27/2026 · gpt-5.5
v2 · 5/27/2026 · gpt-5.5
v1 · 5/25/2026 · gpt-5.5