Model-based Random Testing Wiki

Overview

Model-based random testing is a functional verification technique that compares an implementation against a model while executing generated test sequences. In the cited RISC-V context, the approach is motivated by the difficulty of formally proving equivalence for complex microarchitectures. Rather than proving equivalence between a formal model and an implementation, model-based random testing can detect divergences and refute equivalence with counterexamples. [C1]

Method

A typical model-based random-testing workflow generates test programs or instruction sequences and executes them on both a golden model and a processor implementation under development. Divergence is commonly detected by comparing execution traces. [C2]
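The workflow above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two-instruction toy ISA, the function names (`run`, `random_program`, `first_divergence`, `find_counterexample`), and the injected bug are hypothetical and are not taken from RISCV-DV or any other cited tool. The sketch generates random programs, executes each on a reference model and on a deliberately buggy variant, and reports the first trace entry at which they diverge.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical toy ISA: (opcode, dest, src1, src2) over four registers.
Instr = Tuple[str, int, int, int]
TraceEntry = Tuple[int, int]  # (destination register, value written)
MASK = 0xFFFFFFFF


def run(program: List[Instr], wrap_sub: bool = True) -> List[TraceEntry]:
    """Execute a program and return its trace of register writes.

    wrap_sub=True is the golden model (32-bit wrap-around arithmetic).
    wrap_sub=False injects a bug: SUB results are not wrapped.
    """
    regs = [0, 1, 2, 3]
    trace = []
    for op, rd, rs1, rs2 in program:
        if op == "add":
            value = (regs[rs1] + regs[rs2]) & MASK
        elif wrap_sub:
            value = (regs[rs1] - regs[rs2]) & MASK
        else:
            value = regs[rs1] - regs[rs2]  # injected bug: no wrap-around
        regs[rd] = value
        trace.append((rd, value))
    return trace


def random_program(rng: random.Random, length: int) -> List[Instr]:
    """Generate a random straight-line program."""
    return [
        (rng.choice(["add", "sub"]), rng.randrange(4), rng.randrange(4), rng.randrange(4))
        for _ in range(length)
    ]


def first_divergence(a: List[TraceEntry], b: List[TraceEntry]) -> Optional[int]:
    """Return the index of the first differing trace entry, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


def find_counterexample(rng: random.Random, attempts: int = 100, length: int = 8):
    """Search random programs for one whose traces diverge."""
    for _ in range(attempts):
        program = random_program(rng, length)
        index = first_divergence(run(program), run(program, wrap_sub=False))
        if index is not None:
            return program, index
    return None


if __name__ == "__main__":
    print(find_counterexample(random.Random(0)))
```

A returned program is a counterexample in the sense used above: it refutes equivalence between the two executions without proving anything about programs that were not generated.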

Directed-random test-sequence generation has been used to find pipeline and memory bugs and to uncover unexpected divergences in implementation behavior. In the RISC-V ecosystem, generators include RISC-V RTG and RISCV-DV; the cited source describes RISCV-DV as an advanced RISC-V sequence generator that works well where detailed traces can be compared. [C3]

RISC-V use case

For RISC-V, RISCV-DV generates assembly programs that are converted into in-memory images for execution. Its generators cover RV32IMAFDC and RV64IMAFDC and include support for page-table interactions, privileged CSR use, and traps or interrupts. The generated programs are executed on both a golden model and a processor in development, and divergence is typically detected through trace comparison. [C4]

Relationship to formal verification

Model-based random testing complements, but does not replace, model-based formal verification. Formal approaches using RVFI tracing and tools such as JasperGold can prove equivalence between traces from a simple HDL model and a pipelined HDL implementation, but the cited source notes limitations: such tools can handle only in-order pipelines and require specialist knowledge. As a result, formal verification does not yet replace functional testing for entire processors in that context. [C5]

Limitations

The cited source identifies several drawbacks of randomly generated tests. Automatically generated counterexamples can be long and convoluted compared with hand-written tests, and the generator must ensure that useful instructions exist at the targets of randomly generated branches. [C6]
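The branch-target constraint can be sketched as follows. This is one simple way a generator might satisfy it, not the method of any cited tool: the function names, the label scheme (`L0`, `L1`, ...), and the forward-only restriction are assumptions made for illustration. Each branch picks its target from the instructions that follow it in the same sequence, so every target is a generated instruction and the sequence cannot loop.

```python
import random
from typing import List

ALU_OPS = ["add", "sub", "xor", "or", "and"]
BRANCH_OPS = ["beq", "bne", "blt", "bge"]


def generate_sequence(rng: random.Random, length: int, branch_prob: float = 0.3) -> List[str]:
    """Generate labelled assembly lines whose branches only target
    later instructions inside the same sequence."""
    lines = []
    for i in range(length):
        rd, rs1, rs2 = (rng.randrange(1, 32) for _ in range(3))
        # The last instruction is never a branch: nothing follows it.
        if i < length - 1 and rng.random() < branch_prob:
            target = rng.randrange(i + 1, length)  # forward and in range
            lines.append(f"L{i}: {rng.choice(BRANCH_OPS)} x{rs1}, x{rs2}, L{target}")
        else:
            lines.append(f"L{i}: {rng.choice(ALU_OPS)} x{rd}, x{rs1}, x{rs2}")
    return lines


def targets_are_valid(lines: List[str]) -> bool:
    """Check that every branch targets a later line of the sequence."""
    for i, line in enumerate(lines):
        tokens = line.split()
        if tokens[1] in BRANCH_OPS:
            target = int(tokens[-1][1:])  # strip the leading "L"
            if not i < target < len(lines):
                return False
    return True


if __name__ == "__main__":
    for line in generate_sequence(random.Random(1), 8):
        print(line)
```

Restricting branches to forward targets trades coverage for simplicity: backward branches and loops are never exercised, which a real generator would need to handle separately.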

Automated reduction can mitigate this problem. PyH2P is described as applying automated test-case reduction to randomly generated RISC-V instruction sequences, often producing sequences with fewer than five instructions where each instruction is meaningful for reproducing the error. However, the same source notes shortcomings: PyH2P does not perform full trace comparison with its internal PyMTL3 model, has difficulty shrinking through branches because it must produce a valid in-memory program, and does not use community-standard interfaces proven across a range of implementations. [C7]
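The idea of automated reduction can be shown with a minimal sketch. It uses a plain greedy strategy (drop one instruction at a time and keep the shorter sequence whenever the failure still reproduces); this is an assumption chosen for brevity and is not claimed to be the algorithm PyH2P uses. The names `shrink` and `triggers_bug`, and the bug itself, are hypothetical. The sketch operates on a flat list with no branches, so it sidesteps the difficulty of keeping branch targets valid while shrinking.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def shrink(sequence: Sequence[T], still_fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily remove elements while the failure still reproduces.

    Repeats full passes until no single removal preserves the failure,
    so every remaining element is needed to reproduce it.
    """
    current = list(sequence)
    if not still_fails(current):
        raise ValueError("the starting sequence does not fail")
    changed = True
    while changed:
        changed = False
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate  # keep the shorter failing sequence
                changed = True
            else:
                i += 1  # this element is needed; try the next one
    return current


def triggers_bug(seq: List[str]) -> bool:
    """Hypothetical failure: a 'div' executed at some point after a 'mul'."""
    return "mul" in seq and "div" in seq[seq.index("mul") + 1:]


if __name__ == "__main__":
    print(shrink(["add", "mul", "xor", "sub", "div", "or"], triggers_bug))
```

In a real setting `still_fails` would rerun the candidate sequence on both the model and the implementation and report whether their traces still diverge.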

Use in TestRIG

TestRIG applies this style of testing through an interactive Verification Engine. In TestRIG, the Verification Engine stimulates RISC-V implementations over RVFI-DII sockets, injects instruction sequences, and compares execution traces until it finds a divergence. A Verification Engine can drive one or more RVFI-DII-compatible implementations, either using an internal RISC-V model or comparing traces from two independent implementations. Its instruction sequences may be loaded from disk, generated randomly, or produced by interactive architecture-driven state-space exploration. [C8]
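The inject-and-compare loop can be sketched in a few lines. This is an illustration only: it does not use sockets or the RVFI-DII packet format, and the `Implementation` protocol, the accumulator "instructions", the class names, and the injected bug are all hypothetical. What it shows is the control flow described above: the engine pushes one instruction at a time into two implementations and stops at the first step where their reported trace entries differ.

```python
from typing import Iterable, Optional, Protocol, Tuple

TraceEntry = Tuple[int, int]  # (instructions retired, accumulator value)


class Implementation(Protocol):
    """Anything that accepts one injected instruction and reports a trace entry."""

    def inject(self, instruction: int) -> TraceEntry: ...


class ReferenceModel:
    """Toy model: each 'instruction' is an immediate added to an 8-bit accumulator."""

    def __init__(self) -> None:
        self.count = 0
        self.acc = 0

    def inject(self, instruction: int) -> TraceEntry:
        self.acc = (self.acc + instruction) & 0xFF  # wraps at 8 bits
        self.count += 1
        return (self.count, self.acc)


class DeviceUnderTest(ReferenceModel):
    """Same machine with an injected bug: the accumulator saturates."""

    def inject(self, instruction: int) -> TraceEntry:
        self.acc = min(self.acc + instruction, 0xFF)  # bug: saturates instead of wrapping
        self.count += 1
        return (self.count, self.acc)


def run_lockstep(
    a: Implementation, b: Implementation, instructions: Iterable[int]
) -> Optional[Tuple[int, TraceEntry, TraceEntry]]:
    """Inject each instruction into both implementations.

    Returns (step, entry_a, entry_b) at the first divergence, else None.
    """
    for step, instruction in enumerate(instructions):
        entry_a = a.inject(instruction)
        entry_b = b.inject(instruction)
        if entry_a != entry_b:
            return step, entry_a, entry_b
    return None


if __name__ == "__main__":
    print(run_lockstep(ReferenceModel(), DeviceUnderTest(), [100, 100, 100]))
```

Because the engine supplies every instruction itself, the instruction list could equally come from a file, a random generator, or an interactive search, as the paragraph above describes.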

The cited TestRIG work focuses on comparing executable formal models, software ISA simulators, and simulated hardware designs rather than fabricated chips. Tandem verification of this kind requires instrumenting CPU designs with a Direct Instruction Injection interface. [C9]