Random Instruction Generation
Random instruction generation is a hardware verification technique in which an instruction set generator (ISG) produces assembly programs whose instructions are chosen randomly, typically under scenario-defined constraints (e.g., instruction mix, frequencies, boot and memory-map parameters), in order to exercise a processor design under test (DUT) and uncover functional bugs.
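As a minimal sketch of what a scenario-defined instruction mix can look like, the following Python fragment draws mnemonics according to per-class weights. The class names, weights, and the `pick_instructions` helper are illustrative assumptions and are not taken from any of the generators discussed in this article.

```python
import random

# Hypothetical instruction classes and relative frequencies (assumed for illustration).
INSTRUCTION_MIX = {"arith": 50, "load": 20, "store": 20, "branch": 10}

# Hypothetical mapping from class to a few RISC-V mnemonics.
MNEMONICS = {
    "arith": ["add", "sub", "xor", "and"],
    "load": ["lw", "lb"],
    "store": ["sw", "sb"],
    "branch": ["beq", "bne"],
}


def pick_instructions(count, mix, seed):
    """Draw `count` mnemonics: pick a class by weight, then a mnemonic uniformly."""
    rng = random.Random(seed)  # a fixed seed makes a failing test reproducible
    classes = list(mix)
    weights = [mix[name] for name in classes]
    chosen = rng.choices(classes, weights=weights, k=count)
    return [rng.choice(MNEMONICS[name]) for name in chosen]


program = pick_instructions(1000, INSTRUCTION_MIX, seed=1)
```

Seeding the generator explicitly is a common design choice in this kind of sketch, because a failing random test is only useful if it can be regenerated exactly.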
Role in design verification
Random stimulus is attractive because, in theory, it can exercise all possible combinations given enough time. In practice, however, a purely random approach has difficulty exercising all combinations quickly enough on highly complex designs, so verification environments are often steered toward hard-to-hit combinations using constraints or additional guidance mechanisms. The lack of coverage guidance in random generators is also reported to lead to repetitive inputs that retest the same processor functionalities, decreasing the chances of finding bugs.
A cited machine-learning study further reports that augmenting constrained-random verification tools with supervised learning and reinforcement learning can produce better functional coverage and reach complex hard-to-hit states faster than a purely random or constrained-random baseline.
How random instruction generation works in a RISC-V flow
The cited RISC-V verification papers describe the same overall pipeline. An instruction generator produces an assembly file containing instructions randomly selected according to the constraints of the targeted test scenario. A compiler toolchain then translates that assembly into a machine-language file, which is loaded into the design's memory and delivered to the core. In parallel, the same test program is executed on a reference model (a RISC-V instruction set simulator acting as a golden model), and the resulting execution traces are compared; any divergence between the two traces signals a potential bug. In the RISC-V context, the golden model is often the Spike ISA simulator, which is officially released by RISC-V International.
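The trace-comparison step can be sketched as follows. The `Retired` record and its fields are a simplified assumption about what a per-instruction trace entry might contain; real DUT and Spike logs carry more information and need format conversion first.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Optional, Sequence


@dataclass(frozen=True)
class Retired:
    """Hypothetical trace entry for one retired instruction."""
    pc: int        # program counter of the instruction
    insn: int      # raw instruction word
    rd: int        # destination register index
    rd_value: int  # value written to the destination register


def first_divergence(dut_trace: Sequence[Retired],
                     ref_trace: Sequence[Retired]) -> Optional[int]:
    """Return the index of the first mismatching entry, or None if the traces agree.

    A trace that ends early counts as a mismatch at the first missing entry.
    """
    for index, (dut, ref) in enumerate(zip_longest(dut_trace, ref_trace)):
        if dut != ref:
            return index
    return None
```

Reporting only the first divergence is a deliberate simplification: once the architectural state differs, later mismatches are usually consequences of the first one.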
The RISC-V Formal Interface (RVFI) is commonly used to capture the internal state and behavior of a processor during the execution of each instruction, enabling trace-based comparison and supporting formal verification of RISC-V processors.
Example tools
Several concrete random instruction generators are named in the cited sources:
- Google riscv-dv: a random instruction stream generator for RISC-V processor verification. It is the base layer on which COREV-DV, and through it EAVS-DV, is built.
- COREV-DV: a library of extensions layered on top of Google riscv-dv, used within the OpenHW Group CORE-V verification project.
- EAVS-DV: an enhancement of COREV-DV proposed in the cited ElectraIC Advanced Verification Suite (EAVS) work. Its key change is to parameterize all fixed address spaces that are hard-coded in COREV-DV, so that the environment can be adapted to any DUT and Spike configuration with different memory address limitations.
- Force-riscv: an OpenHW Group ISG for the RISC-V ISA that supports all instructions of RV32GC.
- Google's RISC-V Random Instruction Generator: cited by the arXiv machine-learning study as the random generator used in a hardware-verification example with the open-source RISC-V Ariane design.
TestRIG and verification engines
The TestRIG framework generalizes the same pattern. Its interactive Verification Engine (VEngine) stimulates RISC-V implementations over RVFI-DII sockets. An RVFI-DII-compatible RISC-V implementation can be reset, consume instruction sequences, and report execution traces through the interface. A VEngine may host an internal RISC-V model (similar to PyH2P) or drive two independent implementations and compare their RVFI traces, as in QCVEngine.
TestRIG reports that VEngine instruction sequences can be loaded from disk, generated randomly, or produced with interactive architecture-driven state-space exploration. The framework supports software RISC-V simulators such as Spike and QEMU, the Sail RISC-V model, and hardware implementations written in SystemVerilog or Bluespec, including RVBS, Ibex, Piccolo, Flute, and Toooba.
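The reset, inject, and compare loop of a verification engine that drives two implementations can be sketched in-process as follows. This is not TestRIG code and does not use RVFI-DII sockets or packets; `ToyModel`, its single `addi`-like operation, and the injected bug are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import random


class ToyModel:
    """Hypothetical implementation with four 32-bit registers and one operation."""

    def __init__(self, buggy=False):
        self.buggy = buggy
        self.reset()

    def reset(self):
        self.regs = [0, 0, 0, 0]

    def execute(self, insn):
        """Execute ("addi", rd, rs, imm) and return a trace record (rd, value)."""
        _, rd, rs, imm = insn
        value = (self.regs[rs] + imm) & 0xFFFFFFFF
        if self.buggy and imm < 0:
            value = self.regs[rs]  # injected bug: negative immediates are ignored
        self.regs[rd] = value
        return (rd, value)


def random_sequence(rng, length):
    """Generate `length` random toy instructions."""
    return [("addi", rng.randrange(4), rng.randrange(4), rng.randint(-8, 8))
            for _ in range(length)]


def run_lockstep(impl_a, impl_b, sequence):
    """Reset both implementations, feed them the same instructions one by one,
    and return the index of the first differing trace record (None if none)."""
    impl_a.reset()
    impl_b.reset()
    for index, insn in enumerate(sequence):
        if impl_a.execute(insn) != impl_b.execute(insn):
            return index
    return None


mismatch = run_lockstep(ToyModel(), ToyModel(buggy=True),
                        [("addi", 1, 0, 5), ("addi", 2, 1, -3)])
```

Because instructions are injected directly rather than fetched from memory, a harness of this shape does not need a valid in-memory program, which is the property that makes later sequence reduction simpler.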
Practical drawbacks
The cited sources identify several known limitations of random instruction generation:
- Repetitive inputs. Without coverage guidance, generators tend to produce inputs that exercise the same processor regions repeatedly.
- Long, convoluted counterexamples. Automatically generated failing sequences can be long and hard to understand, whereas hand-written tests are typically shorter and more readable.
- Branch target validity. The generator must ensure that useful instructions exist at the targets of randomly generated branches, otherwise the program may not be valid or may not exercise the intended behavior.
- Engineering effort to target gaps. A verification engineer can manually adjust instruction-mix constraints (e.g., increasing the load/store ratio) when coverage is missing in a particular unit, but this significantly increases engineering effort and slows the verification process.
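The branch-target constraint in the list above can be illustrated with a small sketch in which every branch is given a forward target that is guaranteed to exist. The function name, label scheme, and restriction to forward branches are assumptions made for this example; production generators handle backward branches and loops with additional bookkeeping.

```python
import random


def generate_with_valid_branches(length, branch_probability, seed):
    """Return assembly-like lines in which every branch targets a label in the program."""
    rng = random.Random(seed)
    lines = []
    for index in range(length):
        # The final slot cannot branch forward because no later slot exists.
        can_branch = index < length - 1
        if can_branch and rng.random() < branch_probability:
            # Forward-only targets keep the program free of unintended infinite loops.
            target = rng.randrange(index + 1, length)
            body = f"beq x{rng.randrange(32)}, x{rng.randrange(32)}, L{target}"
        else:
            imm = rng.randint(-2048, 2047)  # 12-bit signed immediate range
            body = f"addi x{rng.randrange(1, 32)}, x{rng.randrange(32)}, {imm}"
        lines.append(f"L{index}: {body}")
    return lines


listing = generate_with_valid_branches(40, 0.3, seed=7)
```

Labelling every slot is wasteful but keeps the sketch obviously correct: any index drawn as a target is certain to carry a label and a real instruction.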
To address debugging difficulty, the cited RISC-V work discusses automated reduction of failing randomly generated instruction sequences. PyH2P reportedly often reduces failing sequences to fewer than five instructions, each of which is still meaningful for reproducing the error. It has known shortcomings, however: it does not perform full trace comparison, has difficulty shrinking through branches because it must keep a valid in-memory program, and does not use community-standard interfaces.
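The idea of reducing a failing sequence can be sketched with a simple greedy loop that removes one instruction at a time for as long as the failure persists. This is a generic simplification written for illustration, not the shrinking algorithm used by PyH2P or any other cited tool; the `still_fails` oracle and the example failure condition are hypothetical.

```python
def shrink(sequence, still_fails):
    """Greedily drop single instructions while `still_fails` keeps returning True."""
    current = list(sequence)
    changed = True
    while changed:
        changed = False
        for index in range(len(current)):
            candidate = current[:index] + current[index + 1:]
            if still_fails(candidate):
                current = candidate  # keep the shorter failing sequence
                changed = True
                break                # restart the scan on the reduced sequence
    return current


def example_oracle(seq):
    """Hypothetical failure: a "div" executed at any point after a "csrw"."""
    return "csrw" in seq and "div" in seq[seq.index("csrw"):]


reduced = shrink(["add", "csrw", "lw", "sub", "div", "xor"], example_oracle)
```

In a real flow the oracle would rerun the candidate sequence on the DUT and the reference model, so each call is expensive and the number of oracle calls dominates the cost of reduction.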
Coverage-directed generation as an extension
Coverage-directed test generation (CDG) mechanisms are described as an extension of random instruction generation: the constraints of a random test generator are automatically steered by coverage feedback so that the next round of test generation targets uncovered RTL regions. Cited CDG approaches include MicroGP (genetic programming with statement coverage as fitness), a Bayesian-network-based mechanism (Fine and Ziv), and a Markov-chain-based framework (Wagner et al.) whose weights are tuned from collected coverage. In general, CDG approaches must balance the domain knowledge required to set up the framework against their general applicability.
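The feedback loop can be illustrated with a deliberately simple heuristic that raises the weight of instruction classes whose associated coverage is low. It is not MicroGP, the Bayesian-network mechanism, or the Markov-chain framework; the `steer_weights` function, the `boost` parameter, and the assumption that coverage can be attributed to instruction classes are all made up for this sketch.

```python
def steer_weights(weights, coverage, boost=2.0):
    """Return normalized weights that favor classes with more uncovered bins.

    weights:  class name -> current positive weight
    coverage: class name -> fraction of that class's coverage bins already hit (0..1);
              classes missing from the mapping are treated as fully uncovered
    """
    steered = {}
    for name, weight in weights.items():
        uncovered = 1.0 - coverage.get(name, 0.0)
        steered[name] = weight * (1.0 + boost * uncovered)
    total = sum(steered.values())
    return {name: value / total for name, value in steered.items()}


next_round = steer_weights(
    {"arith": 1.0, "load": 1.0, "store": 1.0},
    {"arith": 1.0, "load": 0.2, "store": 0.9},
)
```

The resulting weights could feed a weighted random choice in the next generation round. Even in this toy form the trade-off noted above is visible: the mapping from coverage bins to instruction classes is domain knowledge that someone has to supply.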
Example: ElectraIC EAVS applied to the cv32e40p core
The cited EAVS study applied a wide range of randomly generated tests to the cv32e40p RISC-V core and ran the same tests on Spike. After converting the resulting logs to CSV and comparing them, the authors report that each random test produced by EAVS-DV successfully executed on the cv32e40p core.
The same study notes that Spike's log file shows memory contents only for load instructions, so EAVS compares both data and addresses for loads but only addresses for stores. The mode field in the RVFI Agent log corresponds to the privilege level at which the program is running (the M column in RVFI; lines labelled with 3, the machine-mode privilege level, in Spike's log).
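The asymmetric load/store comparison can be sketched over CSV text as follows. The column names (`kind`, `addr`, `data`) and the sample rows are assumptions for illustration; they are not the actual column layout of the EAVS, RVFI Agent, or Spike logs.

```python
import csv
import io


def compare_memory_rows(dut_csv, ref_csv):
    """Return the row indices at which the two memory-access logs disagree.

    Loads are compared on address and data; stores on address only, because the
    reference log is assumed to carry no data for stores.
    """
    dut_rows = list(csv.DictReader(io.StringIO(dut_csv)))
    ref_rows = list(csv.DictReader(io.StringIO(ref_csv)))
    mismatches = []
    for index, (dut, ref) in enumerate(zip(dut_rows, ref_rows)):
        fields = ["kind", "addr"]
        if ref["kind"] == "load":
            fields.append("data")
        if any(dut[field] != ref[field] for field in fields):
            mismatches.append(index)
    # Rows present in only one log are reported as mismatches as well.
    shorter, longer = sorted((len(dut_rows), len(ref_rows)))
    mismatches.extend(range(shorter, longer))
    return mismatches


DUT_LOG = "kind,addr,data\nload,0x100,0xAA\nstore,0x104,0xBB\nload,0x108,0xCC\n"
REF_LOG = "kind,addr,data\nload,0x100,0xAA\nstore,0x104,\nload,0x108,0xCD\n"
differences = compare_memory_rows(DUT_LOG, REF_LOG)
```

In this example the store row is accepted even though the reference has no data for it, while the final load is flagged because its data differs.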
The cited work also documents a set of known issues with the cv32e40p tracer that led the authors to disable it: srai is improperly decoded; compressed instructions are logged with the binary format of their uncompressed counterparts; lui and auipc have three zeros appended at the least-significant end of their immediate values, causing operand-comparison errors; and pseudo-instructions are decoded in their normal form rather than following Spike's convention.
Connections
- EAVS-DV is a tool that implements random instruction generation as part of the ElectraIC Advanced Verification Suite, layered on top of COREV-DV and Google riscv-dv.
- Instruction Set Generator (ISG) is the underlying concept: a component that produces an assembly file whose instructions are randomly generated under scenario constraints. Random instruction generation is the specific mode of operation of an ISG.