Instruction generation algorithm Wiki

Overview

The instruction generation algorithm is the instruction-stream generator described in the paper "Efficient Cross-Level Testing for Processor Verification: A RISC-V Case-Study". It is used in a co-simulation-based testing setup that generates an endless, unrestricted instruction stream and uses it to compare an RTL core against an instruction set simulator (ISS). The paper takes fully randomized instruction generation as the baseline, then adds modifications that steer generation toward instructions that are more interesting and more often legal RISC-V instructions. [C1]

A key motivation is that pure randomization tends to produce illegal instructions because the illegal-instruction state space is much larger than the legal-instruction state space. The algorithm therefore begins with a random 32-bit instruction word but often injects a valid opcode and sometimes applies field-level mutation rules derived from RISC-V instruction structure. [C2]

Algorithm flow

The pseudocode for InstrGenerator::next() combines sequence handling with single-instruction generation: [C3]

If an existing instruction sequence is active and has a next instruction, the generator returns the next instruction from that sequence.
Otherwise, with probability 1%, the generator starts a new sequence by choosing a random sequence generator and returning the first instruction of that sequence.
If no sequence is continued or started, it generates a random 32-bit instruction word.
With probability 98%, it injects a random valid opcode into that word while keeping the remaining instruction fields random.
With probability 20%, it applies a randomly chosen field mutation rule to the instruction fields.
It returns the resulting word as a single independent instruction.

This design keeps the baseline of unrestricted random instruction generation while biasing the stream toward legal instructions and special architectural cases. [C1]
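The flow above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the class layout, the small VALID_OPCODES subset, the callable interfaces for sequence generators and mutation rules, and the choice to treat the 98% and 20% decisions as independent draws are all assumptions made for this example.

```python
import random

# Assumption: a small illustrative subset of RV32I major opcodes (bits 6:0).
VALID_OPCODES = [0b0110011, 0b0010011, 0b0000011, 0b0100011, 0b1100011]


class InstrGenerator:
    """Hypothetical sketch of the generator loop described in the text."""

    def __init__(self, sequence_generators=(), mutation_rules=(), rng=None):
        self.rng = rng or random.Random()
        # Assumed interface: each sequence generator is a callable taking the
        # RNG and returning an iterable of 32-bit instruction words.
        self.sequence_generators = list(sequence_generators)
        # Assumed interface: each rule maps (word, rng) to a mutated word.
        self.mutation_rules = list(mutation_rules)
        self.active_sequence = None

    def next(self):
        # Continue an active sequence while it still has instructions.
        if self.active_sequence is not None:
            instr = next(self.active_sequence, None)
            if instr is not None:
                return instr
            self.active_sequence = None

        # With probability 1%, start a new sequence and return its first word.
        if self.sequence_generators and self.rng.random() < 0.01:
            make_sequence = self.rng.choice(self.sequence_generators)
            self.active_sequence = iter(make_sequence(self.rng))
            first = next(self.active_sequence, None)
            if first is not None:
                return first
            self.active_sequence = None

        # Baseline: a fully random 32-bit instruction word.
        instr = self.rng.getrandbits(32)

        # With probability 98%, overwrite the opcode bits with a valid opcode.
        if self.rng.random() < 0.98:
            instr = (instr & ~0x7F) | self.rng.choice(VALID_OPCODES)

        # With probability 20%, apply one randomly chosen field mutation rule.
        if self.mutation_rules and self.rng.random() < 0.20:
            instr = self.rng.choice(self.mutation_rules)(instr, self.rng)

        return instr & 0xFFFFFFFF
```

A caller would construct the generator once and call next() repeatedly to obtain the instruction stream. Passing a seeded random.Random makes a run reproducible, which is a convenience of this sketch rather than something the source describes.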

Opcode injection

Opcode injection is the first modification applied to the fully randomized baseline. It chooses a random valid opcode and writes it into the instruction word while leaving the other instruction fields randomized. The paper describes this as simple, generic, and effective, and notes that it helps ensure that a large set of legal instructions is considered. [C1]

The paper illustrates opcode injection with an ADDI example: a fully randomized 32-bit word is modified by injecting the ADDI opcode, producing a randomized ADDI instruction whose remaining fields are still random. [C2]
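As a rough sketch of the ADDI example, opcode injection can be written as a mask-and-match operation on the 32-bit word. The sketch assumes that "opcode" covers all fixed bits that identify the instruction: for ADDI that is the 7-bit major opcode 0010011 plus funct3 = 000. The function and constant names are hypothetical.

```python
import random

# Fixed bits that identify ADDI: major opcode (bits 6:0) and funct3 (bits 14:12).
ADDI_MASK = 0x0000707F
ADDI_MATCH = 0x00000013


def inject_opcode(word, mask, match):
    """Overwrite the identifying bits of `word`; all other bits stay as they were."""
    return ((word & ~mask) | (match & mask)) & 0xFFFFFFFF


random_word = random.getrandbits(32)
addi_word = inject_opcode(random_word, ADDI_MASK, ADDI_MATCH)

# RD (bits 11:7), RS1 (bits 19:15) and the immediate (bits 31:20) are untouched.
assert (addi_word & ~ADDI_MASK) == (random_word & ~ADDI_MASK)
```

Keeping the mask and match separate lets the same helper serve instructions whose identifying bits differ in width, for example R-type instructions that are also distinguished by funct7.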

Field mutation rules

Instruction field mutation is the second modification. It applies predefined rules, derived from the RISC-V instruction format, that reason about instruction structure and field values. The cited rules include: [C4]

injecting special immediate values such as MIN, -1, 0, 1, and MAX into the appropriate immediate field;
mutating RD to zero, reflecting the hardwired RISC-V zero register as a special case;
mutating RD to equal RS1 and/or RS2;
mutating RS1 to match RS2;
mutating the CSR selector field to a supported CSR.

The ADDI example also demonstrates a register-field mutation: after opcode injection creates a randomized ADDI instruction, the RD field is mutated to match the RS1 field. The register index itself is still random, but RD and RS1 now name the same register. [C4]
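The listed rules can be sketched as small bit-field helpers. The field positions follow the standard RV32 encoding (RD in bits 11:7, RS1 in bits 19:15, RS2 in bits 24:20, I-type immediate and CSR selector in bits 31:20). The function names, the (word, rng) rule signature, and the SUPPORTED_CSRS list are assumptions made for illustration; the CSRs a real core supports depend on that core.

```python
import random


def get_field(word, lo, width):
    """Extract `width` bits starting at bit `lo`."""
    return (word >> lo) & ((1 << width) - 1)


def set_field(word, lo, width, value):
    """Return `word` with the field at bit `lo` replaced by `value`."""
    mask = ((1 << width) - 1) << lo
    return ((word & ~mask) | ((value << lo) & mask)) & 0xFFFFFFFF


# 12-bit signed I-type immediate: MIN, -1, 0, 1, MAX.
SPECIAL_IMMEDIATES = [-2048, -1, 0, 1, 2047]
# Assumption: example machine-mode CSRs (mstatus, mtvec, mepc).
SUPPORTED_CSRS = [0x300, 0x305, 0x341]


def rd_to_zero(word, rng):
    return set_field(word, 7, 5, 0)


def rd_to_rs1(word, rng):
    return set_field(word, 7, 5, get_field(word, 15, 5))


def rd_to_rs2(word, rng):
    return set_field(word, 7, 5, get_field(word, 20, 5))


def rs1_to_rs2(word, rng):
    return set_field(word, 15, 5, get_field(word, 20, 5))


def special_immediate(word, rng):
    # Two's-complement truncation to the 12-bit immediate field.
    return set_field(word, 20, 12, rng.choice(SPECIAL_IMMEDIATES) & 0xFFF)


def csr_to_supported(word, rng):
    return set_field(word, 20, 12, rng.choice(SUPPORTED_CSRS))


MUTATION_RULES = [rd_to_zero, rd_to_rs1, rd_to_rs2, rs1_to_rs2,
                  special_immediate, csr_to_supported]
```

Each rule changes exactly one field and leaves every other bit of the word alone, so a rule can be applied to any word regardless of which opcode was injected before it.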

Instruction sequences

Instruction sequence generation is the third modification described in the paper. A sequence consists of a fixed number of instructions. In the pseudocode, the generator chooses a random sequence generator with 1% probability and returns the first instruction of the new sequence; later calls continue the active sequence by returning its next instruction until the sequence is exhausted. [C3]
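A sequence generator can be modeled as a Python generator function that yields a fixed number of instruction words. The concrete sequence below (a short run of ADDI instructions that all read and write the same register) is purely hypothetical and is not taken from the paper; it only illustrates the fixed-length, one-instruction-per-call shape described above.

```python
import random

SEQUENCE_LENGTH = 4  # Assumption: the fixed length is chosen for illustration.


def same_register_addi_sequence(rng, length=SEQUENCE_LENGTH):
    """Hypothetical sequence: `length` ADDI instructions with RD == RS1."""
    reg = rng.randrange(1, 32)  # skip x0, the hardwired zero register
    for _ in range(length):
        imm = rng.getrandbits(12)
        # I-type layout: imm[31:20] | rs1[19:15] | funct3[14:12] | rd[11:7] | opcode[6:0]
        yield (imm << 20) | (reg << 15) | (0b000 << 12) | (reg << 7) | 0b0010011


sequence = same_register_addi_sequence(random.Random())
first_instruction = next(sequence)      # returned when the sequence is started
remaining_instructions = list(sequence)  # returned by the following calls
```

Because the generator is lazy, a caller can hold it as the active sequence and pull one instruction per call, which matches the continue-or-start behavior in the pseudocode.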

Verification context

The instruction generation algorithm is part of the paper's cross-level testing approach. The surrounding co-simulation setup is designed to enable endless generation of unrestricted instructions, while matching and adapter logic manage the differences between RTL-core instruction fetching and ISS fetching. The approach was implemented and applied to verification of a pipelined 32-bit industrial RISC-V TGF series core implemented in SpinalHDL. [C5]